In recent years, big data and analytics have become deeply embedded in the way we make decisions. People rely on data, they believe in it, and they use it to identify ways to move their businesses forward.
But for big data believers, Tuesday’s election came as a huge shock, and now data scientists are left wondering what happened. How could all of these advanced algorithms and complex models have been so wrong? And what lessons can marketers learn from this?
A big (data) failure
While any data scientist will tell you that no forecasting model is ever 100% accurate, the margin by which the forecasts were off is incredibly concerning. A group of data scientists was convened back in April to study the election and is expected to complete its analysis by May 2017, with the aim of understanding exactly why the pre-election polls were so dramatically wrong.
However, even in the immediate aftermath of the election there are a few clues that may offer an explanation for what happened.
Big data doesn’t account for environmental changes and nuances. This presents a problem when forecasting behavior, where these factors play a huge role.
In the case of the election, a number of factors could have affected the results, such as voter turnout, voters changing their minds, the collapse of third-party support, voters being dissuaded from voting, and Trump supporters not admitting they were voting for him.
Princeton University neuroscience professor Sam Wang told the New York Times “that polls may have failed to capture Republican loyalists who initially vowed not to vote for Mr. Trump, but changed their minds in the voting booth.”
If a major segment of the population was unaccounted for in the prediction model, it would undeniably skew the results.
Unlike the majority of forecasts, however, there was one model that got it right. Trump’s data and analytics team based out of San Antonio predicted Trump’s victory by accounting for a huge shift in the polling sample that was not factored into earlier projections.
In this case, the missing information proved to be pivotal in the election. The majority of prediction models likely accounted for the polls correctly but didn’t have enough empirical data to support the results.
According to Bloomberg, Trump’s numbers were different because they were forecasting an entirely different electorate than other models, one that was “older, whiter, more rural, more populist. And much angrier at what they perceive to be an overclass of entitled elites.”
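To make the "different electorate" point concrete, here is a minimal sketch of how the same per-group poll results can produce different topline numbers depending on who is assumed to turn out. Every group name and figure below is hypothetical and chosen only for illustration; none of it comes from actual polling data or from any campaign's model.

```python
# Hypothetical illustration: the same poll answers, weighted by two
# different assumptions about who actually shows up to vote.

def weighted_support(support_by_group, electorate_share):
    """Return overall support as the share-weighted average of group support."""
    total_share = sum(electorate_share.values())
    weighted = sum(
        support_by_group[group] * electorate_share[group]
        for group in support_by_group
    )
    return weighted / total_share

# Made-up support for a candidate within each (made-up) group.
support = {"urban": 0.35, "suburban": 0.48, "rural": 0.62}

# Two made-up turnout assumptions: a baseline, and one with more rural voters.
assumed_electorate = {"urban": 0.35, "suburban": 0.45, "rural": 0.20}
alternative_electorate = {"urban": 0.30, "suburban": 0.43, "rural": 0.27}

baseline = weighted_support(support, assumed_electorate)
shifted = weighted_support(support, alternative_electorate)

print(f"Baseline electorate:    {baseline:.1%}")
print(f"Alternative electorate: {shifted:.1%}")
```

Nothing about the individual responses changes between the two runs. Only the assumed makeup of the electorate does, and that alone moves the projected result.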
In the weeks leading up to the election, Trump tailored his message to this group. And, unlike the Clinton campaign, he spent a considerable amount of time in the week leading up to Nov. 8 sharing these sentiments in the six states that proved to be pivotal in this election — Wisconsin, Michigan, Ohio, Florida, Pennsylvania, and North Carolina.
Head of Product at Cambridge Analytica Matt Oczkowski, who was part of Trump’s data team, told Wired that “for the past 10 days, the campaign saw a tightening in its internal polls. When absentee votes and early votes started coming in, his team noticed a decrease in black turnout, an increase in Hispanic turnout, and an increase in turnout among those over 55.”
Through this early insight, Oczkowski and his team realized that the pollsters were wrong in terms of their samples and who they considered voters. They reworked the models and quickly saw a clear path to Trump’s victory, particularly within Rust Belt states like Ohio, Michigan, Iowa, and Wisconsin.
“The rural vote is the story tonight,” Oczkowski says. “The amount of disenfranchised voters who came out to vote in rural America has been significant.”
There is an obvious lesson to be learned here: empirical research is key.
With more empirical research, pollsters and the Clinton campaign would have had a better understanding of the beliefs, concerns, and hopes of voters in the aforementioned states, and could have used it to inform campaign strategy and more accurately gauge uncertainty in the prediction models.
What can marketers learn from this?
The election highlighted some of the challenges with big data… the biggest being that it is still in its infancy. For marketers, the lessons are clear.
Context is key.
Data doesn’t take context into consideration. Although there are complex algorithms out there that help analyze and factor in context through social media data, we still have a long way to go before we can really leverage that data to make behavioral predictions.
Remain relentlessly objective.
Continue to pull in as much data from as many different sources as possible, don’t fall in love with your thesis, and challenge your assumptions consistently with every piece of new data. You never know what the final piece to unlocking the puzzle will be, but with each additional piece the picture becomes clearer and clearer.
Don’t underestimate the role of empirical research.
Empirical research is essential for identifying cause and effect. For marketers, causality is critical. Moreover, it helps guide the interpretation of the data and make sense of the story. This is also very important when you’re looking at massive data sets, and without causality you may be looking in the wrong direction.
Although incredibly valuable, data is not infallible.
Data provides incredible value to marketers and should always be a consideration in the decision-making process. However, it is important to keep in mind that it’s not telling you what is or is not going to happen, but rather how probable an outcome is based on the information provided.
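A quick way to see what a probability does and does not promise is to simulate it. The sketch below assumes a hypothetical forecast that gives the favorite an 85% chance of winning; that figure is invented for illustration and is not taken from any actual election model.

```python
import random

def simulate_upsets(win_probability, trials=10_000, seed=0):
    """Return the fraction of simulated contests that the favorite loses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    upsets = sum(1 for _ in range(trials) if rng.random() >= win_probability)
    return upsets / trials

# Hypothetical forecast: the favorite is given an 85% chance of winning.
upset_rate = simulate_upsets(0.85)
print(f"The favorite lost in {upset_rate:.1%} of simulated contests")
```

Even when the forecast is exactly right about the odds, the favorite still loses a meaningful share of the time. A surprising outcome is not, by itself, proof that the model was broken.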
It will be months before we fully understand what exactly caused the dramatic inaccuracy in the pre-election polls, but hopefully we can learn from this. It’s not the first time big data has been wrong and it certainly won’t be the last. The best we can do is take this as a friendly reminder that although indispensable, it’s not perfect.