Just days before the election, my final forecast went against the wisdom of professional forecasters and pollsters alike and projected a razor-thin electoral margin for Joe Biden. While the election results surprised many people on the night of November 3, my probabilistic model produced a point prediction with an even closer race in the Electoral College (273 electoral votes for Biden, compared to his actual 306) but a wider spread in the popular vote (52.8%, compared to his actual 52.3%).
The statistical aphorism that “all models are wrong, but some are useful” served as my guiding philosophy in constructing this model. As I discussed in my final prediction, I did not expect this model to perfectly forecast all outcomes in the election. Rather, this forecast aimed to provide a range of state-level probabilities and outcomes. Then, I used the most probable state-level popular vote counts to produce point predictions for the Electoral College and national popular vote. While I presented these numbers as my “final prediction”, I would have been incredibly shocked if the point predictions perfectly matched the electoral outcomes since my model’s probabilities indicated a fair amount of uncertainty.
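To make that aggregation step concrete, here is a minimal Python sketch of how simulated state-level vote shares could be rolled up into an Electoral College point prediction. The toy simulation array, the handful of states, and the use of the median as the "most probable" state-level outcome are illustrative assumptions, not the code behind my model.

```python
import numpy as np

# Illustrative electoral-vote counts for a handful of states (not the full map).
electoral_votes = {"GA": 16, "AZ": 11, "NV": 6, "PA": 20, "WI": 10}
states = list(electoral_votes)

rng = np.random.default_rng(0)
n_sims = 10_000

# Stand-in for the model's simulations: each row is one simulated election and
# each column a state's Democratic two-party vote share.
simulated_share = rng.normal(loc=[0.47, 0.47, 0.49, 0.51, 0.50],
                             scale=0.03, size=(n_sims, len(states)))

# Win probability per state = share of simulations where Biden clears 50%.
win_prob = (simulated_share > 0.5).mean(axis=0)

# One way to form a point prediction: take the median simulated vote share in
# each state and award that state's electoral votes to the simulated winner.
median_share = np.median(simulated_share, axis=0)
biden_ev = sum(ev for i, (state, ev) in enumerate(electoral_votes.items())
               if median_share[i] > 0.5)

for state, p, m in zip(states, win_prob, median_share):
    print(f"{state}: P(Biden win) = {p:.1%}, median share = {m:.1%}")
print("Electoral votes for Biden (these states only):", biden_ev)
```

In the full model, the simulated vote shares come from the fitted state-level models rather than a toy normal draw, but the roll-up from simulations to win probabilities and point predictions works the same way.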
All in all, I’m quite happy with how closely this model paralleled the election outcomes. It misclassified the winner in only three states, GA, NV, and AZ, which were among the final states called after election night. Even though the forecast gave Donald Trump a greater probability of winning those three states, the simulated vote shares were incredibly close, and either candidate had a fair shot at winning: Joe Biden won GA, NV, and AZ in 19.2%, 43.9%, and 20.5% of simulations, respectively.
Forecasters cannot predict the election outcome with absolute certainty, but models provide a range of possible scenarios. This model successfully anticipated a close Electoral College race alongside a larger popular vote margin, and the actual outcome occurred more than a handful of times in my simulations.
The actual Electoral College outcome, with each candidate winning the exact combination of states that they won on Election Day, occurred in just 53 of my simulations. To put that into perspective, the scenario from my point prediction occurred in 5,080 of my simulations, which equates to only 0.051% of my simulations. Under a frequentist[^1] interpretation, my forecast may have correctly assigned the probabilities to each outcome, and we simply happened to observe one of the 53 scenarios in which each candidate won that exact grouping of states. Unfortunately, only one iteration of each election plays out in the real world, so we cannot determine the true probabilities of each outcome.
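As an illustration of that frequentist counting, the sketch below estimates the probability of an exact combination of state winners as its relative frequency among simulated elections. The win probabilities for GA, NV, and AZ come from the numbers above; the remaining states, the independence assumption, and the variable names are mine for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 100_000

# Toy example with a few battleground states; the real model simulates every
# state, and its simulations need not treat states as independent.
states = ["GA", "NV", "AZ", "PA", "WI", "MI"]
p_biden = np.array([0.192, 0.439, 0.205, 0.60, 0.55, 0.60])  # first three from
                                                              # the post; rest assumed

# True wherever Biden wins the state in that simulated election.
biden_wins = rng.random((n_sims, len(states))) < p_biden

# The combination of these states that Biden actually carried in 2020 (all six).
actual_map = np.full(len(states), True)

# Frequentist estimate of the probability of that exact combination:
matches = (biden_wins == actual_map).all(axis=1).sum()
print(f"Exact map occurred in {matches} of {n_sims} simulations "
      f"({matches / n_sims:.3%})")
```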
The predicted state-level two-party vote shares correlate strongly with the actual state-level outcomes, with a correlation of 0.962 between the actual and predicted two-party popular vote in each state. That said, the inaccuracies follow a couple of distinct patterns:
On average, Joe Biden underperformed his predicted vote share by 0.242 percentage points. As the scatterplot below shows, Joe Biden’s actual vote share fell short of the model’s predictions in the Democrat-leaning states and exceeded the predicted vote share in the Republican-leaning states.
Although the model overpredicted Joe Biden’s vote share in most states, it underestimated his performance in the three misclassified states. In other words, the model overestimated Joe Biden’s vote share overall but underestimated it precisely in the states with incorrect point predictions.
The maps below illustrate the areas with the greatest error. Notice that safe blue and red states such as New York and Louisiana have relatively large errors, while battleground states such as Texas and Ohio have extremely slim errors. For a closer look at the data, the table below contains the actual and predicted two-party vote shares for Joe Biden in every state, ordered by the magnitude of the error:
State | Actual Democratic Two-Party Vote Share (%) | Predicted Democratic Two-Party Vote Share (%) | Error (pp) |
---|---|---|---|
NY | 57.48345 | 69.61151 | -12.1280547 |
RI | 60.50962 | 69.52752 | -9.0179003 |
HI | 65.03266 | 72.32903 | -7.2963701 |
LA | 40.53556 | 33.93166 | 6.6039010 |
SC | 44.07339 | 37.81854 | 6.2548501 |
DE | 59.62674 | 65.87647 | -6.2497293 |
AR | 35.79247 | 29.62813 | 6.1643451 |
AK | 44.76601 | 40.02248 | 4.7435307 |
CA | 65.03768 | 69.50769 | -4.4700118 |
CT | 60.15808 | 64.60562 | -4.4475457 |
MS | 41.62040 | 37.20396 | 4.4164364 |
NJ | 58.14343 | 62.50579 | -4.3623519 |
WA | 59.94656 | 64.18870 | -4.2421353 |
OR | 58.21289 | 62.25845 | -4.0455593 |
ND | 32.78259 | 36.80377 | -4.0211801 |
MA | 66.86069 | 70.77922 | -3.9185289 |
NE | 40.24716 | 44.02699 | -3.7798311 |
WV | 30.20202 | 33.80620 | -3.6041767 |
KS | 42.25143 | 38.65813 | 3.5932950 |
MN | 53.63371 | 50.05711 | 3.5765990 |
GA | 50.12756 | 47.01611 | 3.1114503 |
ME | 55.12922 | 52.09217 | 3.0370501 |
SD | 36.56522 | 39.42158 | -2.8563608 |
AZ | 50.15683 | 47.34474 | 2.8120875 |
MT | 41.60282 | 38.79975 | 2.8030690 |
MO | 42.16658 | 39.50587 | 2.6607056 |
AL | 37.03289 | 34.62090 | 2.4119937 |
KY | 36.79990 | 34.47128 | 2.3286170 |
IN | 41.79340 | 39.56968 | 2.2237237 |
VA | 55.15251 | 57.35345 | -2.2009380 |
NV | 51.22312 | 49.36777 | 1.8553463 |
TN | 38.11647 | 36.32844 | 1.7880338 |
NM | 55.51569 | 53.81917 | 1.6965192 |
CO | 56.93964 | 58.48604 | -1.5463978 |
UT | 39.30639 | 37.82175 | 1.4846465 |
NC | 49.31589 | 48.10486 | 1.2110359 |
IA | 45.81652 | 46.91440 | -1.0978824 |
VT | 68.29919 | 67.26910 | 1.0300846 |
IL | 58.62478 | 59.62805 | -1.0032772 |
MI | 51.41355 | 50.53204 | 0.8815117 |
NH | 53.74835 | 53.06014 | 0.6882049 |
WY | 27.51957 | 26.85758 | 0.6619846 |
FL | 48.30525 | 48.95294 | -0.6476975 |
TX | 47.17236 | 46.69227 | 0.4800943 |
OK | 33.05996 | 32.60532 | 0.4546372 |
ID | 34.12328 | 33.75413 | 0.3691501 |
MD | 66.84738 | 67.16643 | -0.3190540 |
PA | 50.59915 | 50.68815 | -0.0890044 |
OH | 45.94309 | 45.99202 | -0.0489300 |
WI | 50.31728 | 50.35326 | -0.0359886 |
Since this model’s errors were not uniformly biased in one direction like those of most other forecast models, its average error is considerably closer to zero than that of other popular forecasts, and its errors are distributed more evenly around zero:
Model | Mean Error (pp) | Root Mean Squared Error (pp) | Classification Accuracy (%) | Missed States |
---|---|---|---|---|
Kayla Manning | -0.2417201 | 3.882507 | 94 | AZ, GA, NV |
The Economist | -2.3310087 | 2.803927 | 96 | FL, NC |
FiveThirtyEight | -2.4447961 | 3.019431 | 96 | FL, NC |
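For reference, the summary statistics in these tables follow directly from the per-state actual and predicted vote shares. Below is a small sketch of those calculations, using a few rows from the state table above as example inputs; the function and variable names are illustrative and are not taken from my forecasting code.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Mean error, RMSE, classification accuracy, and correlation for two-party vote shares (%)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    error = actual - predicted                      # positive = Biden outperformed the forecast
    mean_error = error.mean()
    rmse = np.sqrt((error ** 2).mean())
    correct_calls = (actual > 50) == (predicted > 50)
    accuracy = 100 * correct_calls.mean()
    correlation = np.corrcoef(actual, predicted)[0, 1]
    return mean_error, rmse, accuracy, correlation

# Tiny example using a few rows from the table above (NY, GA, AZ, NV, PA, OH).
actual    = [57.48345, 50.12756, 50.15683, 51.22312, 50.59915, 45.94309]
predicted = [69.61151, 47.01611, 47.34474, 49.36777, 50.68815, 45.99202]
me, rmse, acc, corr = forecast_metrics(actual, predicted)
print(f"mean error {me:+.3f} pp, RMSE {rmse:.3f} pp, "
      f"accuracy {acc:.0f}%, correlation {corr:.3f}")
```

Applied to the full table, the same calculations should reproduce the mean error, RMSE, classification accuracy, and correlation reported above.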
To assess my vote share trend hypothesis, I would follow the same procedures outlined in my final prediction to reconstruct the model with the additional variable. Once I had constructed this model, I would follow a series of steps to assess its validity:
These assessments should provide enough metrics to determine whether the new model performs better or worse than my original model under both in-sample and out-of-sample validation. If the new model performed better than the original on at least two of the three measures, then I would know that the absence of that particular variable weakened my original model. However, if my previous model performed better or approximately the same on these measures, then I would stick with my original, more parsimonious model in the future.
To assess the validity of my voter registration hypothesis, I would follow the steps outlined above, but with the variable capturing the change in Democratic voter registration in place of the voting-shift variable. If both of these new models performed better than the original model, I would follow the same process to assess the strength of an additional model containing both variables.
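To sketch what that comparison could look like in code, the snippet below fits an original and an expanded linear specification and compares them with leave-one-out cross-validation. The synthetic data, variable names, and the use of scikit-learn are placeholders; the real assessment would use the data and modeling choices described in my final prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(2)
n = 51  # one row per state (plus DC) in the real data

# Placeholder columns: a polling average, the hypothesized shift variable,
# and the Democratic two-party vote share to be predicted.
avg_poll = rng.normal(50, 8, n)
vote_shift = rng.normal(0, 2, n)
dem_share = 0.9 * avg_poll + 0.5 * vote_shift + rng.normal(0, 2, n)

X_original = avg_poll.reshape(-1, 1)                   # original specification
X_expanded = np.column_stack([avg_poll, vote_shift])   # adds the new variable

def loo_rmse(X, y):
    """Leave-one-out cross-validated RMSE for a linear model."""
    scores = cross_val_score(LinearRegression(), X, y,
                             cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    return np.sqrt(-scores.mean())

print("LOO RMSE, original model:", round(loo_rmse(X_original, dem_share), 3))
print("LOO RMSE, expanded model:", round(loo_rmse(X_expanded, dem_share), 3))
```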
Aside from the absence of the aforementioned variables to capture partisan trends within states, I also plan to make several methodological changes to this model for the future. I touched on many of these in greater detail in my final prediction post, but here is a brief overview:
While my forecast failed to predict the election outcomes with absolute precision, it correctly projected a relatively close race in the Electoral College with a larger margin in the popular vote. Furthermore, the outcomes of November 3 all fall reasonably within the vote shares and win probabilities estimated by the model. Even in GA, NV, and AZ, the three misclassified states, the actual vote shares were not far from the predictions, and the simulations gave both candidates a fair probability of winning all three of those states. Although the model predicted this election exceptionally well, I hypothesize that its failure to capture partisan trends within states led to the remaining shortcomings in the predictions.
[^1]: Unlike rolling dice, we cannot experience multiple occurrences of the same election to uncover the true probability of each event. Frequentist probability describes the relative frequency of an event over many trials; running many simulations in my model takes a frequentist approach to estimating the probability of each outcome. However, we can never really know whether any of the probabilities were correct because the 2020 election only happened once (thank goodness!). Declaring a probabilistic forecast right or wrong is like rolling a six on a single die and concluding that your prior probabilities of 1/6 for a six and 5/6 for anything else were incorrect because you observed the less probable outcome on a single trial.
[^2]: I explain my weighting scheme in further detail in my final prediction, but I essentially took a weighted average of each state’s polling numbers, favoring polls with higher grades much more heavily than polls with lower grades (a toy illustration of this kind of weighting appears in the sketch below).
[^3]: However, any changes would have to keep in mind that FL, OH, WI, etc. were more conservative than most forecasts anticipated, and this model correctly anticipated the winner in these highly contentious battleground states.
[^4]: To remain consistent with my final forecast, I would not use polls from after 3 PM EST on November 1, which is the last time I used FiveThirtyEight’s state-level polling data for my original model.
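For illustration, here is a tiny sketch of the kind of grade-weighted polling average described in note 2; the grade-to-weight mapping and the example polls are invented for demonstration and do not reflect the actual weights behind my forecast.

```python
# Hypothetical grade-to-weight mapping; the real scheme favored
# higher-graded polls much more heavily than lower-graded ones.
grade_weights = {"A+": 10, "A": 8, "B": 3, "C": 1}

# (poll grade, Democratic two-party share) for one state -- made-up numbers.
polls = [("A+", 51.2), ("A", 50.4), ("B", 49.8), ("C", 53.0)]

weighted_avg = (sum(grade_weights[g] * share for g, share in polls)
                / sum(grade_weights[g] for g, _ in polls))
print(f"Weighted polling average: {weighted_avg:.2f}%")
```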