Birds of a feather vote together (part II)
Last week, “aquienvoto.uy” [whoIvote.uy] became a media sensation that everybody was talking about. It is an application that recommends to users whom to vote for in the upcoming elections.
Users rate 26 statements on the economy, security, and social affairs on a scale from 1 to 5 (1 meaning ‘disagree’, 3 ‘neutral/don’t know’ and 5 ‘totally agree’). Based on their answers, they are then told whom to vote for.
In the first part of this article (see here), I used an example to explain how the algorithm works. Here, I will analyse users’ answers, and in the third part I will develop other classification models, based on algorithms other than KNeighborsClassifier.
Hands-on data!
I had all the users’ answers before my eyes, and I couldn’t help looking into the data (sorry about this, it is an occupational hazard).
I cloned the GitHub repository and built a small transformation in Pentaho Data Integration to load the CSV data into SQL Server and “pivot” it into a table with one record for each user who filled in the form:
This is how the survey data looks once it has been “pivoted”, with a single row for each completed form:
Note that this data comes from 123,119 forms. Unfortunately, only 25,166 of those users selected a candidate at the end of the form.
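For readers without Pentaho or SQL Server at hand, a comparable pivot can be sketched in pandas. This is not the transformation I actually ran: the column names (`form_id`, `statement`, `score`, `candidate`) and the toy values below are assumptions made up for illustration, since the real CSV layout is not reproduced here.

```python
import pandas as pd

# Hypothetical long-format layout: one row per (form, statement) answer.
# Column names and values are invented for illustration, not the real CSV schema.
answers = pd.DataFrame({
    "form_id":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "statement": ["q1", "q2", "q3"] * 3,
    "score":     [5, 1, 3, 4, 2, 3, 1, 5, 2],
})

# Candidate chosen at the end of the form; form 3 left it blank.
choices = pd.DataFrame({
    "form_id":   [1, 2, 3],
    "candidate": ["A", "B", None],
})

# Pivot: one row per form, one column per statement.
wide = answers.pivot(index="form_id", columns="statement", values="score")
wide = wide.join(choices.set_index("form_id"))

# Only forms with a selected candidate can be related to a voting preference.
labelled = wide.dropna(subset=["candidate"])

print(wide)
print(len(wide), "forms,", len(labelled), "with a candidate")
```

The same split between all forms and forms with a candidate is what separates the 123,119 submissions from the 25,166 that can be tied to a preference.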
The first notable finding was which political party the survey respondents prefer:
We can run the same analysis for each candidate.
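A breakdown like this boils down to counting forms per group. Here is a minimal pandas sketch, assuming a table with hypothetical `party` and `candidate` columns; the labels and counts are toy values, not the survey's results.

```python
import pandas as pd

# Toy data with hypothetical column names; not the survey's real values.
forms = pd.DataFrame({
    "party":     ["P1", "P1", "P2", "P2", "P2", "P3"],
    "candidate": ["A", "B", "C", "C", "D", "E"],
})

# Share of forms per party, largest first.
party_share = forms["party"].value_counts(normalize=True)

# Same breakdown per candidate, keeping the party for context.
candidate_counts = (
    forms.groupby(["party", "candidate"]).size().sort_values(ascending=False)
)

print(party_share)
print(candidate_counts)
```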
What do Uruguayans “totally agree” on?
It was interesting to find out which statements received the highest scores:
The four statements in the picture above were the ones that averaged a score of 4.
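Ranking statements by average score is a one-liner once the data is pivoted. The sketch below assumes a table with one row per form and one column per statement; the column names `q1` to `q4` and the scores are invented for illustration.

```python
import pandas as pd

# Toy pivoted table: one row per form, one column per statement.
# Column names and scores are hypothetical.
wide = pd.DataFrame({
    "q1": [5, 4, 5, 4],
    "q2": [1, 2, 1, 1],
    "q3": [3, 3, 4, 2],
    "q4": [5, 2, 5, 1],
})

# Average score per statement, highest first.
mean_scores = wide.mean().sort_values(ascending=False)

top = mean_scores.head(2)      # statements people agree with most
bottom = mean_scores.tail(2)   # statements people agree with least

print(mean_scores)
```

Taking the tail of the same ranking gives the statements with the lowest average scores.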
What do Uruguayans “totally disagree” on?
The four statements with the lowest scores were:
Let’s see how the scores for “Legalizing abortion was a mistake” are distributed:
Which statements can Uruguayans not agree on?
I thought it would be interesting to see which statements are the most controversial, so I calculated the standard deviation of the scores for each one.
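The idea is that a statement where answers pile up at both ends of the scale has a larger spread than one where most people give similar scores. A minimal sketch with invented data and hypothetical column names follows; note that pandas computes the sample standard deviation (`ddof=1`) by default, which may or may not match the variant used for the charts.

```python
import pandas as pd

# Toy pivoted table with hypothetical statement columns.
wide = pd.DataFrame({
    "q1": [5, 4, 5, 4],   # broad agreement
    "q2": [1, 5, 1, 5],   # split between the extremes
    "q3": [3, 3, 3, 3],   # everyone neutral
})

# Standard deviation per statement, most divisive first.
spread = wide.std().sort_values(ascending=False)

print(spread)
```

In this toy table the polarised statement `q2` comes out on top, while `q3`, where everyone answered the same, has zero spread.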
Now, let’s focus on the score given to the most controversial statement: “Death penalty should be an option for major crimes”.
There is a clear-cut division of votes.
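A distribution like this can be tabulated by counting how many answers fall on each score. The sketch below uses made-up scores for a single statement (the series name `death_penalty` is hypothetical) and shows what a polarised answer pattern looks like in numbers.

```python
import pandas as pd

# Toy scores for a single statement (invented values, hypothetical name).
scores = pd.Series([1, 1, 1, 5, 5, 5, 3, 2, 5, 1], name="death_penalty")

# Share of answers at each score from 1 to 5, including scores nobody chose.
distribution = (
    scores.value_counts(normalize=True)
    .reindex(range(1, 6), fill_value=0.0)
)

print(distribution)
```

When most of the mass sits on 1 and 5, as in this toy series, the statement splits respondents into two camps rather than clustering them around the middle.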
In the next part of this article, I will analyse the statements in more depth, particularly to spot which ones separate the governing party from the opposition, and I will create classification models using Random Forest and rule-based algorithms.
Héctor Cotelo (@CoteloHector)
Data Analytics & Information consultant