What is the point of history if not to learn from the past and apply that knowledge to the future? Our rush to amass ever larger silos of data, the data that’s writing modern history, cannot outpace the gallop to build tools capable of making sense of it. Accumulation is only really useful in conjunction with analysis. And the ultimate goal of Big Data Analysis should be Big Data Extrapolation. To use data about the past and the present to tell us something useful about the future.
In this two-parter I’m going to tell you about my attempts to use social data to predict the future, specifically to anticipate the results of a national election the day before it takes place. In the first part I’ll explain the methodology I used to (successfully) predict the 2014 European Parliament Elections back in May. Then, in an as-yet-unwritten second part, I’ll look at whether the trick can be repeated with the Scottish Independence Referendum vote in September, and the UK General Election in 2015. Ahead of that second part I’m sharing the Scottish Independence data I’m gathering, so you can conduct your own experiments.
But first, we go back to May, and the night the UK came second to the French in a xenophobia contest…
If I were to see the world only through my Twitter stream, I’d imagine it to be an erudite, groovy and socially conscientious place. Everyone within my filter bubble is tolerant of their neighbours, has a good understanding of our socio-economic problems and their causes, and is, generally, intelligent, witty and reasoned. No-one in my Twitter stream would ever vote for a far-right party campaigning primarily on an anti-immigration platform. This would be unthinkable.
But I know there is a bigger world out there. I know my local newsagent has a pile of Daily Mails ten times the height of all its broadsheets combined. And I know that on Thursday 22nd May, 4,352,251 of my UK compatriots did that thing that was so unthinkable from within my comfy, skewed bubble – voted for a borderline xenophobic political party. The far-right UKIP got the biggest share of the vote in a national election, the first time a minority party has outperformed both Labour and the Conservatives, the two deeply entrenched leading parties that dominate UK politics, in 110 years. Could anyone have seen this coming?
Well, yes. We did see it coming. I called the result on the day of the poll. And I got UKIP’s share of the vote right to within a 1% margin of error. If you keep reading I’ll show you my workings.
I was experimenting with the Brandwatch API. Brandwatch provide social media monitoring and analytics tools on an industrial scale. They pride themselves on the quality of their data – which they claim is spam-free, relevant and current. I could think of no better test of their capabilities than to see if the sentiment and volume derived from their data would match the sentiment and volume we could interpret from a national election result.
My findings were better than expected.
My premise was a little shaky. I was going to look at Twitter, the most immediate and now-obsessed of the social media platforms, to see if there was a correlation between the level of chatter about the five leading parties and the share of the vote. It was a two-step correlation, based on two big assumptions. First, I needed to assume the volume of mentions was indicative of the volume of interest. Second, that that volume of interest would translate into the volume of votes.
Volume of mentions = Volume of interest = Volume of votes
It would be easy to argue that either of these premises could be flawed. And even if both were sound, they would be near impossible to validate. But this shouldn’t matter. The premises didn’t need to be watertight, they just needed to be good enough. Twitter, in this case, is just treated as a representative sample.
Representative samples are the basis of most opinion polling, which is traditionally conducted on small, carefully chosen groups. Any poll result should always be judged with a consideration for the “representative”-ness of its sample. With my experiment the sample was large. Very large. But I still needed to assess how good it was, if only because if I didn’t ask that question someone else would. The most flippant of my answers was: about as good as any other. At least as good as the 34% “representative sample” who chose to exercise their democratic right that day.
With only 34% of voters expressing their opinion, both my Twitter data and the election result itself may be regarded as representative samples of public preference. One representative sample gave me my prediction; the other sent 73 euro-MPs to Brussels.
I tried to head off as many arguments against my methodology as I could preempt. And there were many. Joaquin at Brandwatch pointed me at this article for example.
The biggest and most damning of the common criticisms of previous Twitter predictions was that they all seemed to have been written after the fact (like this article), so were not really predictions at all. This was an easy one to quash. For it to be worth doing, I knew I had to make the prediction ahead of the result. So I made a point of tweeting the prediction on the day of polling, so the figure was time-stamped.
I also had to defend against the criticism that my sample, though large, was not representative. We could not be sure the authors were from the correct demographic – many may not even have been entitled to vote (a twitterer’s entitlement to spout opinion is not dependent on age, country of residency, criminal record or sound state of mind). Not everyone is on Twitter. The data doesn’t take into account “lurkers” and the “silent majority”. It includes disinformation (i.e. lies). Tweets cannot be filtered for sarcasm. Twitter content doesn’t represent its author’s true self, only a public front. Etc. Etc. Etc.
In short, it was expected that the query would contain a lot of unwanted data.
My counter-argument was that there was no such thing as unwanted data in this context. The “unwanted” data was as much a part of the picture as the wanted. I could trust Brandwatch’s algorithms to clean out the spam and duplicates (one reason I favoured their app over gathering data from Twitter’s API direct), so I could be assured my data was a rich mess of human-generated thought and opinion. At least as rich and messy as the “real” world.
We can’t say all voters make their decisions on sound, rational, fully-informed grounds, so an election could easily contain the same amount of mess and noise as an average day on twitter. So, I was arguing, this one representative sample could be just as valid as the other.
The complaints that “not everyone is on Twitter”, or that what people say in public is not the same as what they really think, are the same criticisms we could level at any other poll. But with Twitter we have a slight advantage, as we have grown to know it rather well through these early years, so we might be able to make better assumptions about the demographic than we could about a random poll. I’ll look at some such assumptions below when it comes to explaining the disparities.
But before we start fudging the figures, let’s have a look at what the data predicted.
I looked at four data sets:
1. Total mentions that day
2. Total positive mentions that day
3. Total mentions for the last 7 days
4. Total positive mentions for the last 7 days.
I presented snapshots of these data sets here and updated them every few hours throughout that Thursday (22nd May 2014). I took the final snapshot of the data once the polls had closed at 10pm, and presented that as my final prediction. The result wouldn’t be known until the Sunday, once the rest of Europe had voted in their various elections, so the prediction stood at this for four days. Below are the figures at close of polling.
Polling Day: Thu 22nd May 2014 midnight-10:19pm
Seven Days: Fri 16th – Thu 22nd May 2014
I then took an aggregate prediction, by averaging these values. I thought an average between the bigger, more random, volume figures and the smaller, more specific, positive sentiment would give a better balance than just going with the biggest number.
Initially I took:
(mentions that day) + (positive sentiment that day)
+ (mentions for the week in total)
+ (positive sentiment for the week)
…and divided the total by four.
My thinking here was to weight the prediction slightly towards the very latest data, expecting this to say something about the snap decisions made on the day. But I changed my mind as the day went on, and instead decided to just go with the 7 day data (see “Trusting the Data” below). So my final prediction was based on:
(seven-day positive sentiment + seven-day total mentions), divided by two.
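The aggregation described above can be sketched as a few lines of Python. The party names and mention counts below are hypothetical placeholders to show the mechanics, not the actual figures from polling day:

```python
# Sketch of the aggregate prediction. All counts are illustrative
# placeholders, not the real Brandwatch figures from May 2014.

def shares(counts):
    """Convert raw mention counts into percentage shares of the total."""
    total = sum(counts.values())
    return {party: 100 * n / total for party, n in counts.items()}

def average(*datasets):
    """Average the percentage shares across any number of data sets."""
    share_sets = [shares(d) for d in datasets]
    return {party: sum(s[party] for s in share_sets) / len(share_sets)
            for party in share_sets[0]}

# The four data sets: mentions and positive mentions, for the day and the week
day_mentions  = {"UKIP": 50000, "Labour": 40000, "Conservative": 20000,
                 "LibDem": 15000, "Green": 30000}
day_positive  = {"UKIP": 9000, "Labour": 8000, "Conservative": 4000,
                 "LibDem": 3000, "Green": 9000}
week_mentions = {"UKIP": 300000, "Labour": 250000, "Conservative": 140000,
                 "LibDem": 90000, "Green": 120000}
week_positive = {"UKIP": 60000, "Labour": 50000, "Conservative": 28000,
                 "LibDem": 18000, "Green": 30000}

# Initial approach: average the shares across all four data sets
four_way = average(day_mentions, day_positive, week_mentions, week_positive)

# Final prediction: just the two seven-day data sets
final = average(week_mentions, week_positive)
```

Because each data set is first converted to percentage shares, the final averaged prediction always sums to 100% across the five parties, whatever the raw volumes.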
This was then the final result on Sunday:
The 9.7% = other parties (SNP, Sinn Fein, Plaid Cymru & 21 others)
… And this is how my prediction compared:
Adding it up, the total difference was 9.8% – which is, near enough, the figure unaccounted for in this five-party split: those who voted for other parties. So, I corrected the figures to take this into account, by making the total percentage 109.7%.
This gave me:
val / 109.7 * 100
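The correction amounts to rescaling each party’s predicted share so that room is left for the 9.7% “other” vote. A minimal sketch, with placeholder shares rather than the actual prediction:

```python
# Rescale a five-party prediction (which sums to 100) to leave room
# for the "other" share. The predicted shares here are illustrative.

OTHER = 9.7  # share of the vote that went to parties outside the five monitored

predicted = {"UKIP": 27.0, "Labour": 26.0, "Conservative": 22.0,
             "LibDem": 8.0, "Green": 17.0}  # five-party split summing to 100

# Treat the five-party total plus "other" as 109.7, then renormalise:
# val / 109.7 * 100
corrected = {party: share / (100 + OTHER) * 100
             for party, share in predicted.items()}
```

After the rescale the five parties sum to roughly 91.2%, with the remaining 9.7-ish points reserved for “other” – matching the retrospective adjustment described above.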
So what does this tell us?
First up, we should note that I really shouldn’t be allowed to get away with that retrospective adjustment for “other”. I’d be no more capable of guessing 9.7% as a reasonable figure for “other” ahead of the event than I would be of speculating on any of the other parties’ results. This is clearly a fudge.
It might be an excusable fudge though, because the failing is not with the data, only my laziness in setting up the query and my haste to produce a figure on the day. The data would have been capable of discovering the 9.7% figure if I had monitored the mentions of all the other 24 parties fielding candidates (including regionally popular parties such as the Scottish Nationalists, Sinn Fein and Plaid Cymru). I just didn’t have time to set up queries on every bugger in the game.
Still, I’m going to work on the faith that the data would have got this right, even though I cannot prove that it would without having actually done the query. The Brandwatch App does enable me to do this retrospectively if I want, as BW’s databases store everything (EVERYTHING!). But I’m still not sure I could trust myself to do this without prejudice after the event. So I hope you’ll forgive me if I leave this one figure to faith.
Second problem – the Tory figure is way out. If this was more accurate would it have balanced the Labour and Green figures to their correct positions? This is probably a big assumption too.
My only explanation for this discrepancy: Tories don’t tweet. Even looking at Twitter from outside my filter bubble, I’d be tempted to say it has a very slight left-wing bias. Simply because conservatives, by their very nature, tend to be … um … conservative – i.e. they hold traditional values, so they are less likely to be expressing their opinion through such a hideous new-fangled abomination as Twitter.
Unfortunately, this assertion is slightly undermined by the fact that my Twitter data was not the only place this was evident. The Tory share was underestimated, to a lesser degree, in many of the polls too.
YouGov/Mirror/Times – 21st May 2014
Opinium – 21st May 2014
Survation/Mirror – 20th May 2014
So, treading very carefully here, might we make a further assumption about Tory voters – that they are more likely to hide their voting intention than supporters of other parties? Is voting Tory, to some, a secret shame? The centre-right ideology has become, particularly in the post-Thatcher UK, a narrative of the self: self-reliance, self-worth and self-protection. Does this individualism subtly influence their likelihood to disguise voting intention? If one’s vote is something done for oneself, in one’s own interests, rather than for others, why should it be anyone else’s business? We can only speculate.
Positive Sentiment – Polling Day
Moving on swiftly, let’s look at the third noticeable discrepancy – the amount of chatter around the Green party on polling day. This was partly the reason I switched from averaging all four figures to just using the 7 day data as, to my not-at-all-impartial human eyes, the positive Green sentiment was looking like an aberrant spike. And even averaged in with the rest of the week, the spike was enough to overestimate their share of the vote by almost a third.
Again, I can only speculate as to the reasons. The Greens, who did see big gains that day, may have been consciously over-compensating for their lack of coverage in other media. Or it may simply have been that they were the party with the best social media strategy. 28.5% of all positive mentions on polling day is a figure of which any social media agency would be proud. Deliberate and concerted social media strategy is something that is difficult to account for with a prediction of this nature. If all parties were equal, it would average itself out, but if any one party excels at it (as maybe the Greens do), it throws the figures.
To be honest, in a perverse way I was glad the prediction threw up these little discrepancies. If only because it gave me something to write about and some lessons to be learned (as I’ll discuss in Part 2, when I look at reproducing this). If it had been spot on, I’d have dismissed it as a freak coincidence. But with the figure so close, I should probably also look at what I did right.
The thing I did right, I believe, was to trust the data.
Unavoidably, I brought my own expectations to the project. The biggest of these was a worry that the UKIP chatter would overestimate their share of the vote. My own Twitter stream, full of the erudite, tolerant and groovy, contained a torrent of UKIP/Farage ridicule. Would the volume of UKIP mentions be disproportionate simply because they were easy targets, too easy to make jokes about? Especially in the land of the smart-arses we call Twitter.
But somehow, of all the figures, this was the one that had the closest correlation. Which strongly reinforces my initial questionable premise:
Volume of mentions = Volume of interest = Volume of votes
It’s interesting though that most of the polls also overestimated UKIP’s share. The very thing I was expecting from the Twitter data, but which didn’t appear. I wondered if this might be the opposite of the Tory “secret shame” effect, that people declared an intent to vote UKIP when questioned as a form of posturing or self-aggrandisement. Yet when it came to the day, they didn’t follow through. Although, if this was the case, why didn’t we see this in the Twitter data too?
Throughout that Thursday I was monitoring my algorithm and tweaking the prediction by changing the way the aggregate was calculated. Meaning, of course, that it was prone to my own prejudices. I could equally have chosen just positive sentiment. Or just volume. Or just the day’s figures, rather than the 7 day average. But of the options I toyed with, I was perhaps lucky that the one I picked came the closest. If I had plumped for just 7 day volume, which retrospectively looks like the second closest, the split (adjusted for “other”) would have been:
… which, again, wouldn’t have been far off. And is more even in its discrepancy. But it would also have predicted a Labour victory, which would have been the wrong headline.
The Brandwatch App contains far more sophisticated filters for the data, which I could have used to allay the demographic criticism I mentioned above (“opinions are only relevant if they’re from those entitled to vote”). The app can break tweets down by gender, age and profession, for example, even when those haven’t been specified by the author. But I chose not to look at that, instead taking the volume, with all its mess of “unwanted data”, as my measuring stick. I trusted the chaos to more accurately reflect the chaos of a voting public, with their mess of motives and idiosyncratic reasoning.
If this experiment proved one thing, it was that the concept of redundancy in this kind of data (especially when it is as “clean” as the Brandwatch data) is meaningless.
The real proof of this pudding would be to replicate the experiment. Knowing what we do, and what we’ve learned from above, can I rerun this for other elections? Say, the 2015 UK General Election next year? Or the Scottish Referendum next month?
This will be the subject of part 2, where I expect the excuses for my data-fails will get considerably more convoluted. Ahead of that you can see what I’m monitoring on the Scottish Referendum here. That link also includes pointers to the raw data, so you can attempt your own predictions before September 18th.