Bayes's Theorem
Posted on Mon, Feb 20 2023 in Bob's Journal
I was surprised that a quick web search for Bayes's Theorem didn't turn up a simple example with an explanation. I could find interesting examples and stories that didn't actually describe how you would apply Bayes's Theorem, and complex math for scenarios that most people never encounter, but there didn't seem to be anything in the middle.
How is it that you can go from trusting a coin toss implicitly to accusing your friend of using a two-headed coin without even examining it? How does your email program know if a message it's never seen before is spam or not? Both of these can be achieved with Bayesian reasoning. Let's walk through the coin toss example.
Coin tosses are considered a fair way to pick between two options. In a football game, a toss decides which team gets to pick their side of the stadium or who receives the kickoff. In other cases, it can decide who gets the last piece of cake or rides shotgun. It's a simple, unbiased process for making decisions between two options where there isn't a good reason to prefer one over the other.
Suppose that you and a friend are making such a decision, and she suggests you settle it with a coin toss. If it's heads, you'll both do what she wants to do. If it's tails, you'll both do what you want to do. You agree. The coin comes up heads and you've got your decision. No arguing or hurt feelings. However, if she keeps getting heads over and over, at some point you're going to think you're being tricked. How did you arrive at that conclusion? Perhaps it's just sheer intuition, and yet there's a lot of math that your intuition is doing (sometimes sloppily) behind the scenes.
Two-headed coins are rare, and (as the name suggests) they can only ever result in heads. I'm not sure how rare, but we can pick a number that sounds about right. I'd say that 99.9% of the time, the coin used in a coin toss is fair. (If you're in a place where people are more frequently using two-headed coins, you can adjust this number accordingly.)
Bayes's Theorem basically says that the probability that a hypothesis is correct depends on its prior probability (in this case, how likely people are to use fair coins), its likelihood (how well the hypothesis fits the data), and the marginal probability (how probable the data is across all the hypotheses). On the first coin toss, when we get heads, the probability that the coin is fair drops only slightly, from 99.9% to 99.8%. A fair coin gives heads 50% of the time, so getting one on the first toss isn't that far out of the realm of possibility.
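For anyone who wants to see that first step as arithmetic, here is a minimal sketch in Python. Written out, the theorem is P(fair | heads) = P(heads | fair) × P(fair) / P(heads). The variable names are mine, and the 99.9% prior is the guess from above, not a measured figure.

```python
# One application of Bayes's Theorem to a single toss that came up heads.
# All names are illustrative; the prior is the assumed 99.9% from the text.

prior_fair = 0.999                    # assumed share of coin tosses that use a fair coin
prior_two_headed = 1 - prior_fair

likelihood_heads_if_fair = 0.5        # a fair coin shows heads half the time
likelihood_heads_if_two_headed = 1.0  # a two-headed coin always shows heads

# Marginal probability of heads: add up every way heads could have happened.
marginal_heads = (likelihood_heads_if_fair * prior_fair
                  + likelihood_heads_if_two_headed * prior_two_headed)

posterior_fair = likelihood_heads_if_fair * prior_fair / marginal_heads

print(f"P(fair | one heads) = {posterior_fair:.1%}")  # prints 99.8%
```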
If the second toss turns out to be heads as well, the probability is still 99.6%. However, by the time you've got ten heads in a row, the probability that the coin is fair has dropped to less than half, and after twelve heads in a row it is less than 20%. This is based on two-headed coins being rare. If your friend had just been telling you the week before about a website she found that sells trick coins, you might have started with a probability of only two-thirds that it's fair, in which case even the first heads drops your confidence in a fair coin to one half. Your choice of prior can have a big impact on the final probability.
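The run of heads can be checked with a short function. This is only a sketch that assumes the two hypotheses discussed so far (fair or two-headed); the function name is made up for this post.

```python
def prob_fair_after_heads(prior_fair, heads_in_a_row):
    """Probability the coin is fair after seeing only heads (two hypotheses assumed)."""
    fair = prior_fair * 0.5 ** heads_in_a_row  # fair coin: each heads has probability 1/2
    two_headed = (1 - prior_fair) * 1.0        # two-headed coin: heads is certain
    return fair / (fair + two_headed)

# Starting from the assumed 99.9% prior.
for n in (1, 2, 10, 12):
    print(f"{n:2d} heads: P(fair) = {prob_fair_after_heads(0.999, n):.1%}")

# A more suspicious starting point: only two-thirds confident the coin is fair.
print(f"prior 2/3, 1 heads: P(fair) = {prob_fair_after_heads(2 / 3, 1):.1%}")
```

Changing the first argument is the quickest way to see how much the starting assumption matters.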
You might be thinking at this point, "but there are more than just 'fair' and 'two-headed' coins in the world!" Of course you're correct. To be rigorous, we should compute the probability of every possible kind of trick coin. This is one of the weaknesses of Bayesian reasoning, though an entirely reasonable one. It can only pick between known hypotheses, not create them for you. If you've never heard of the type of trick coin your friend has, she can continue to trick you with impunity and you'll be none the wiser.
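Adding another hypothesis is mostly bookkeeping. The sketch below includes a weighted coin that lands heads 75% of the time. That coin and all three priors are numbers I made up for illustration, so treat the output as a demonstration of the method and not as a claim about real coins.

```python
# Hypothetical set of coins: name -> (prior probability, chance of heads per toss).
# The weighted coin and all three priors are invented for this example.
hypotheses = {
    "fair":       (0.998, 0.5),
    "two-headed": (0.001, 1.0),
    "weighted":   (0.001, 0.75),
}

def posterior_after_heads(hypotheses, heads_in_a_row):
    """Return {hypothesis: probability} after a run of consecutive heads."""
    # Unnormalised weight for each hypothesis: prior times likelihood of the run.
    weights = {name: prior * p_heads ** heads_in_a_row
               for name, (prior, p_heads) in hypotheses.items()}
    total = sum(weights.values())  # marginal probability of the run
    return {name: w / total for name, w in weights.items()}

for name, prob in posterior_after_heads(hypotheses, 10).items():
    print(f"{name:>10}: {prob:.1%}")
```

A coin that isn't in the dictionary never receives any probability, no matter what the tosses show, which is exactly the limitation described above.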
It's also worth keeping in mind that the results are probabilities. Just because the results lean towards one hypothesis, that doesn't mean it's correct, just that it's the most reasonable. Probability does allow for your friend getting ten heads on ten consecutive tosses of a fair coin. It's just uncommon. More evidence can also dramatically shift your understanding. For example, if your friend tosses the coin and it comes up tails even once, the two-headed coin hypothesis's probability drops to zero. There are also trick coins that just make heads much more likely, but detecting one of those requires a new calculation.
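To watch a single tails wipe out the two-headed hypothesis, you can update one toss at a time. This is a sketch with the same two hypotheses and the assumed 99.9% prior; the `update` helper is my own name, not something from a library.

```python
def update(beliefs, p_heads, toss):
    """Apply Bayes's Theorem for one toss ('H' or 'T') and return the new beliefs."""
    weights = {}
    for name, prior in beliefs.items():
        # Likelihood of this toss under the hypothesis.
        likelihood = p_heads[name] if toss == "H" else 1 - p_heads[name]
        weights[name] = prior * likelihood
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

p_heads = {"fair": 0.5, "two-headed": 1.0}      # chance of heads under each hypothesis
beliefs = {"fair": 0.999, "two-headed": 0.001}  # assumed starting prior

for toss in "HHHHHHHHHHT":                      # ten heads, then a single tails
    beliefs = update(beliefs, p_heads, toss)
    print(toss, {name: round(p, 4) for name, p in beliefs.items()})
```

Each toss uses the previous result as its new prior, so the final line shows the two-headed coin at zero as soon as the tails arrives.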
Bayesian reasoning allows us to assign numerical values to our hunches. It allows you to break down your intuition about a hypothesis into smaller pieces that you can examine independently. How confident are you in the likelihood or prior probability you chose? Would someone else agree? Putting those together into a formula allows you to see if your intuition is reasonable. For example, if your friend gets six heads in a row, is she definitely using a two-headed coin? If you think the prior probability of a fair coin is 99.9%, you'll discover the probability that she has a fair coin is still 94%, even after six heads. It's best to wait for more data before you accuse your friend of cheating.
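You can also turn the question around and ask how many heads in a row it takes before a fair coin becomes hard to believe. Here is a rough sketch, again assuming only the fair and two-headed hypotheses. The function name and the thresholds are my own choices.

```python
def heads_needed(prior_fair, threshold):
    """Smallest run of heads that pushes P(fair) below `threshold` (two hypotheses only)."""
    if not (0 < prior_fair < 1 and 0 < threshold <= 1):
        raise ValueError("prior_fair must be strictly between 0 and 1, threshold in (0, 1]")
    heads = 0
    while True:
        fair = prior_fair * 0.5 ** heads
        prob_fair = fair / (fair + (1 - prior_fair))
        if prob_fair < threshold:
            return heads
        heads += 1

print(heads_needed(0.999, 0.5))   # prints 10: a fair coin becomes the less likely option
print(heads_needed(0.999, 0.05))  # prints 15: a fair coin is now under 5%
```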
If you're interested in the math, the Wikipedia article is a good place to start. The key idea behind Bayes's Theorem is that the reasonableness of a hypothesis does not rest on the hypothesis itself, but on how it compares to the reasonableness of all competing hypotheses. The results can be surprising, but the logic is really simple. The most reasonable hypothesis is the one that has the highest combination of commonness and explanatory power.