What Is Probability?

There are two legitimate interpretations of probability; both are useful, and they are often confused. The frequentist view treats probability as long-run frequency; the Bayesian view treats it as a degree of belief. The distinction matters every time you read a forecast, a poll, or a risk figure in public life.

Time: 12 minutes
Requires: Unit 0.5

Opening Hook

On the morning of 8 November 2016, the HuffPost election forecast model gave Hillary Clinton a 98 percent chance of winning the presidency. The Princeton Election Consortium had her at 93 percent. FiveThirtyEight, considered the cautious outlier, had her at 71 percent. By the following morning, Donald Trump had won.

The forecasters were not, for the most part, wrong. At least, not in the way the headlines suggested. A 70 percent chance of something happening is also a 30 percent chance of it not happening. A one-in-three chance of rain is not a forecast of sunshine. But millions of people who read those probabilities on election morning genuinely believed they were being told the result in advance. When the result arrived, the reaction was not “well, the 30 percent scenario came in.” The reaction was “the forecasters were lying to us.”

This is what happens when probability is used in public discourse without any shared understanding of what it means. And it happens constantly: in weather forecasts, in medical risk statistics, in criminal trials, in financial marketing. You are handed numbers expressed as chances or likelihoods or probabilities, and you are expected to make decisions with them, without anyone explaining what they are actually measuring.

This unit explains that.

The Concept

Probability, at its most basic, is a number between 0 and 1 (or equivalently, between 0 percent and 100 percent) that represents how likely something is to occur. Zero means it cannot happen. One means it is certain. Everything in between is uncertain, to some degree.

So far, so simple. The problem begins when you ask what “likely” actually means. Mathematicians and statisticians have argued about this for roughly three centuries, and the argument has never been fully resolved. But there are two main answers, and both of them are useful in the right context.

The frequentist interpretation says that probability is the long-run frequency of an event across many repeated trials. If you flip a fair coin a thousand times, you will get heads roughly 500 times, give or take. The probability of heads is 0.5 because, in the long run, across many tosses, heads comes up about half the time. This is probability as something you can measure by counting.

The engine that makes this work is the law of large numbers. In a small number of trials, the actual results can deviate significantly from the theoretical probability: you might flip ten heads in a row. But as the number of trials increases, the observed frequency converges toward the theoretical probability. With a thousand coin flips, you are very likely to be close to 50 percent heads. With a million, you are virtually certain to be close. The frequentist definition is reliable precisely because of this convergence.
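The convergence described above is easy to see directly. The sketch below (the function name and the seed are illustrative choices, not anything from the unit) simulates fair coin flips and reports the observed fraction of heads at several sample sizes:

```python
import random

def observed_frequency(n_flips, seed=0):
    """Flip a fair coin n_flips times and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# Small samples can wander far from 0.5; large samples settle close to it.
for n in (10, 1_000, 1_000_000):
    print(n, observed_frequency(n))
```

With ten flips the result can easily be 0.3 or 0.8; with a million flips it will sit within a fraction of a percentage point of 0.5, which is the law of large numbers doing its work.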

Notice what the frequentist interpretation requires: repetition. The event has to be the kind of thing that can happen over and over again, under roughly the same conditions, so that you can observe its long-run frequency. A coin toss qualifies. Rolling a die qualifies. The failure rate of a particular type of engine, measured over thousands of hours of operation, qualifies.

The Bayesian interpretation takes a completely different view. It says that probability is a degree of belief, a measure of how confident you are that something is true, given everything you currently know. Under this interpretation, probability is not a property of a repeatable process. It is a property of a mind’s relationship to a claim.

The Bayesian approach is named after Thomas Bayes, an eighteenth-century English minister and mathematician whose work on updating beliefs in light of evidence was published posthumously in 1763. The core idea is that you start with some prior belief about the probability of an event, then update it as new evidence arrives. The updated belief is called the posterior probability. (We will encounter Bayes’ theorem properly in Unit 1.7; for now, the name and the general shape of the idea are enough.)
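The general shape of that update can be shown with a tiny numeric sketch. The numbers here are hypothetical, chosen only to illustrate: a 0.3 prior belief in rain, a cloudy morning seen on 90 percent of rainy days but only 40 percent of dry days. (Unit 1.7 covers the theorem itself.)

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: revise a prior belief after observing a piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Hypothetical numbers: prior belief in rain 0.3; cloudy mornings occur
# on 90% of rainy days and 40% of dry days. Seeing clouds raises the belief.
print(round(posterior(0.3, 0.9, 0.4), 3))  # → 0.491
```

The evidence moved the belief from 0.30 to about 0.49: prior in, posterior out, which is the whole shape of the idea.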

Under the Bayesian interpretation, saying there is a 30 percent chance of rain tomorrow is not a claim about how often it rains on days with this kind of atmospheric reading. It is a claim about the forecaster’s current state of knowledge, their degree of confidence, after incorporating all available information: satellite images, pressure readings, historical patterns, model outputs. Tomorrow only happens once. There is no long-run repetition. The Bayesian interpretation allows probability to be applied to single events, which is exactly what makes it useful for forecasting.

Both interpretations are valid. Neither is wrong. They are answering slightly different questions. The frequentist is asking: in the long run, across many similar situations, how often does this kind of event occur? The Bayesian is asking: given everything I currently know, how confident should I be that this will happen?

The trouble arises when the two are confused, or when a number is presented as if it belongs to one framework when it actually belongs to the other.

Why It Matters

Weather forecasts are, in practice, a blend of both. When the Met Office says there is a 70 percent chance of rain tomorrow, they mean that on roughly 70 out of every 100 occasions when the atmospheric conditions resemble today’s, it rains. That has a frequentist flavour: it is derived from historical frequencies. But it is being applied to a single event, tomorrow’s weather, which gives it a Bayesian flavour. BBC Weather had to clarify this directly on social media in 2022, explaining that a 70 percent rain symbol does not mean it will rain for 70 percent of the day, nor that 70 percent of the area will see rain. Most readers had inferred one of those meanings.

The confusion matters because the two interpretations lead to different behaviours. If you think 70 percent means it will rain for most of the day, you stay home. If you understand it as a 70-in-100 chance of any rain, you might take an umbrella and go about your plans. The number is the same. The action it implies is not.

Election forecasting puts the confusion under an especially harsh light. The 98 percent Clinton probability that HuffPost published on election day was a model output: given all available polling data and historical patterns, their model found that in 98 of 100 simulated elections, Clinton won. This is a frequentist statement about a simulation, applied to a one-time real-world event, which requires you to accept some Bayesian reasoning about the model’s assumptions. Many readers received it as a certainty bordering on a guarantee. Even FiveThirtyEight’s more cautious 71 percent left roughly a 30-in-100 chance of the Trump outcome, and that is not a small chance. It is roughly the same as the probability that a randomly selected person has blood type O positive.

The phrase “probability of X” appears constantly in public life. Medical trial results, crime statistics, financial projections, scientific findings. Each time, the number was produced by a method that assumes either a frequentist or a Bayesian framework, or some mixture of both. Almost never is that method disclosed, and almost never is the distinction explained. You are handed a number and expected to know what to do with it.

How to Spot It

The 2016 election forecasts are the clearest case of the confusion causing real harm, because the aftermath was so public and so thoroughly documented.

Multiple models produced probabilities in the 70 to 98 percent range for a Clinton victory. Nate Silver, whose FiveThirtyEight model was at the cautious end with 71 percent, wrote after the election that the public had systematically misread what probability forecasts mean. He described conversations in which people told him the models had “said Clinton was a sure thing.” A 71 percent probability is not a sure thing. It means Trump wins roughly once in every three such elections. Silver had explicitly built that uncertainty into his model; most readers filtered it out.

The tell in cases like this is a probability without a stated method. When HuffPost said “98 percent,” that number came from running their simulation 10 million times and counting how often Clinton won. The number was a frequentist output from a simulation, being applied to a one-off event as if it were a Bayesian degree of belief. The two are not the same, and the gap between them is exactly where public confusion lives.
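The simulate-and-count method behind numbers like HuffPost’s can be sketched in a few lines. This is a deliberately minimal toy, not any forecaster’s actual model: the function name, inputs, and the normal-error assumption are all illustrative. It draws a plausible “true” margin for each simulated election from a polled lead plus polling error, then reports the fraction of simulations the leading candidate wins.

```python
import random

def win_probability(poll_margin, polling_error_sd, n_sims=100_000, seed=1):
    """Monte Carlo forecast sketch: sample a plausible true margin for each
    simulated election and count how often the leading candidate wins."""
    rng = random.Random(seed)  # fixed seed for a reproducible run
    wins = sum(rng.gauss(poll_margin, polling_error_sd) > 0
               for _ in range(n_sims))
    return wins / n_sims

# Hypothetical inputs: a 3-point polling lead with a 4-point error spread
# yields a win probability of roughly 0.77, not a certainty.
print(win_probability(3.0, 4.0))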

The question to ask when anyone quotes a probability is: what is the reference class? A frequentist probability requires a comparison set: 98 out of 100 what, exactly? 98 out of 100 simulations of a model that may have flaws? 98 out of 100 elections with similar polling margins? Those are very different claims. A Bayesian probability requires transparency about the prior: what did this person believe before they gathered the evidence, and how much did the evidence move them? Neither question is usually answered in a headline.

Your Challenge

In March 2023, a UK government minister was asked about the probability of a new form of taxation being introduced before the next general election. The minister replied: “I would say the probability of that is very low. We have no plans to introduce such a tax.”

Take that claim apart. What interpretation of probability is the minister using? Is this a frequentist or a Bayesian claim? What would each interpretation require to make it well-founded? What information is the minister relying on, and what information is being left out? Is “very low” a probability claim at all, or is it something else entirely?

There is no answer on this page.

References

HuffPost 2016 presidential forecast (98 percent probability for Clinton): “HuffPost Forecasts Hillary Clinton Will Win With 323 Electoral Votes,” HuffPost, 7 November 2016. URL: https://www.huffpost.com/entry/polls-hillary-clinton-win_n_5821074ce4b0e80b02cc2a94 — Model methodology: simulated the election 10 million times using state-by-state polling averages; Clinton won 9.8 million simulations.

FiveThirtyEight 2016 presidential forecast (71 percent probability for Clinton at close of polls): Nate Silver and the FiveThirtyEight team, post-election analysis summarised in: “Nate Silver says conventional wisdom, not data, killed 2016 election forecasts,” Harvard Gazette, March 2017. URL: https://news.harvard.edu/gazette/story/2017/03/nate-silver-says-conventional-wisdom-not-data-killed-2016-election-forecasts/

Princeton Election Consortium 2016 final projection (93 percent probability for Clinton): Sam Wang, “Final Projections 2016,” Princeton Election Consortium. URL: https://election.princeton.edu/articles/final-projections-2016/

BBC Weather clarification on the meaning of percentage rain probability: @bbcweather on X (Twitter), 1 July 2022. URL: https://x.com/bbcweather/status/1542456396230410240 — Thread clarifies that a 70 percent rain symbol does not mean rain for 70 percent of the time or across 70 percent of the area; it represents the probability that measurable rainfall will occur at a given location during the forecast period.

Frequentist vs. Bayesian interpretations of weather probability: Andrew Gelman, “What does it mean when they say there’s a 30% chance of rain?”, Statistical Modeling, Causal Inference, and Social Science blog, 14 December 2019. URL: https://statmodeling.stat.columbia.edu/2019/12/14/what-does-it-mean-when-they-say-theres-a-30-chance-of-rain/

Public misunderstanding of election probability forecasts: “Did Trump win in 2016 because people are bad at probability?”, Washington Post, 28 February 2020. URL: https://www.washingtonpost.com/politics/2020/02/28/did-trump-win-2016-because-people-are-bad-probability/

Law of large numbers: Wikipedia, “Law of large numbers.” URL: https://en.wikipedia.org/wiki/Law_of_large_numbers

Thomas Bayes and the origins of Bayesian probability: Sharon Bertsch McGrayne, The Theory That Would Not Die (Yale University Press, 2011). Bayes’ original paper: “An Essay towards solving a Problem in the Doctrine of Chances,” Philosophical Transactions of the Royal Society, 1763.