Monday, January 27, 2014

Bayes' Theorem, Part 1: Not Just a Mnemonic for Apostrophe Placement

If you're intimately familiar with Bayes' Theorem or profoundly bored of it, you may still find value in this post by taking a shot every time you read the words "theorem" and "disease."

I first encountered Bayes' Theorem in a high school conversation about email spam filters. I didn't retain much about either the theorem or spam filters, but promptly added the term "Bayes' Theorem" to my mental list of Things That Sound Vaguely Technical And Also Possibly Sinister. (That list includes the names of every military and/or aerospace contractor that ever existed. If you think of any exceptions, send them my way.)

Years afterward, Bayes' Theorem started cropping up in my medical biophysics studies and after-hours discussions about airport and border security. More recently, I used Bayes' Theorem to weigh forensic evidence in the upcoming documentary series To Catch a Killer. The theorem seems to appear everywhere and makes you sound smart, but just what is it?

Basically, Bayes' Theorem tells you how to update your beliefs using new information. That's the best plain-English definition I can think of. More generally, Bayes' Theorem tells you how to manipulate conditional probabilities, saving you from fallacious logic along the lines of "most Pabst drinkers are hipsters, so most hipsters drink Pabst." (It may well be that most Pabst drinkers aren't hipsters in the first place, but that's not the point of the fallacy; the problem is flipping the conditional around. The lesson for me is that I come up with poor examples.)

Bayes' Theorem follows directly from basic probability principles, but proper derivations tend to look like field notes by Will Hunting on how to outperform pompous Harvard grad students at impressing Minnie Driver. Accordingly, this post shall include zero equations, which is a shame, since I only just figured out how to embed equations in my last post. Instead, I'll try to show the importance of Bayes' Theorem by posing the following brain teaser to you, dear reader.


Brain Teaser: You Tested Positive for a Rare Disease; Do You Really Have It?

Imagine that a disease afflicts 0.1% of the general population, or 1 in 1000 people. A particular diagnostic test returns either "positive" or "negative" to indicate the presence or absence of the disease. Let's say you know that this test is 99% sensitive. That's a compact way of saying that out of 100 people who truly do have the disease, 99 of them will correctly test positive, whereas 1 will erroneously test negative, even though they actually have the disease. Let's also say you know that the test is 95% specific. That means that out of 100 disease-free people, 95 will correctly test negative, but 5 of these healthy people will erroneously be told that they have the disease.
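If a concrete picture helps, here's a minimal Python sketch of that setup. It's purely my own illustration (the variable names and the simulate_one_person function are made up; the numbers are just the ones in the puzzle), and it simulates testing a single randomly chosen person:

    import random

    prevalence  = 0.001   # 1 in 1000 people have the disease
    sensitivity = 0.99    # chance a diseased person tests positive
    specificity = 0.95    # chance a healthy person tests negative

    def simulate_one_person():
        """Draw one random person and return (has_disease, tests_positive)."""
        has_disease = random.random() < prevalence
        if has_disease:
            tests_positive = random.random() < sensitivity        # correct positive 99% of the time
        else:
            tests_positive = random.random() < (1 - specificity)  # false positive 5% of the time
        return has_disease, tests_positive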

Suppose you run this test on yourself, and sweet buttery Jesus, it says you're positive. This deeply distresses you, as it should if the disease in question were, say, dancing plague. As psychosomatic head-bobbing sets in, you ask yourself the following question: given the positive test result, what are the chances that I actually have dancing plague?

Take another look at those goddamn numbers. The test is 99% sensitive and 95% specific. Should you embrace your groovy fate and invest in a bell-bottomed suit and unnervingly realistic John Travolta mask? Is all hope lost? Is the jig up?!

Think it over and decide on your final answer before reading on. You needn't bother with a precise number; at the very least, decide whether you think the chance of actually having dancing plague is more or less than 50%, given your positive diagnosis.

If you haven't seen this kind of question before, the chance that your answer exceeds 50% exceeds 50%. It turns out that even though you tested positive, the chance that you have the disease is only about 2%! Choreographed celebrations are in order.


Explanation

You don't actually need to know anything about Bayes' Theorem to correctly answer the above question, though you might end up stepping through the theorem without knowing it. Here's one way to proceed.

Pick a sample of 1000 people from the general population. On average, only 1 of these people will actually have the disease. The vast majority, 999 out of 1000, will be healthy. Our initial sample thus consists of 999 healthy people and 1 sick person. Now, test them all.

Our test is 99% sensitive. That means that when the one diseased guy in our sample gets tested, he'll be correctly identified as sick 99 times out of 100. Very rarely, 1 time in 100, the test will mess up and give him a negative result.

The specificity of 95% means that most healthy people will test negative, as they should. 95% of the initial 999 healthy people, or 949.05 of them on average, will correctly be told that they're disease-free. However, the remaining 49.95 healthy people will erroneously receive positive test results, even though they're fine.

Therefore, by testing each of our starting 1000 people, we'd find an average of 0.99 correct positive diagnoses and 49.95 incorrect positive diagnoses, giving 50.94 positive diagnoses in total. Rounding off the numbers, it's obvious that about 51 people in our initial 1000 will be freaked out by positive test results. However, only one of these people will actually have the disease.

If you test positive, you could be any one of those 51 people, so try not to panic: the chance that you're the one person who actually has dancing plague is 1/51, or 1.96%.
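If you'd rather let a computer do the bookkeeping, the same counting argument fits in a few lines of Python. This is just my sketch of the arithmetic above, using the puzzle's numbers, not anything specific to a real diagnostic test:

    population  = 1000
    prevalence  = 0.001   # 1 in 1000
    sensitivity = 0.99    # chance a diseased person tests positive
    specificity = 0.95    # chance a healthy person tests negative

    sick    = population * prevalence        # 1 person, on average
    healthy = population - sick              # 999 people

    true_positives  = sick * sensitivity            # 0.99 correct positive diagnoses
    false_positives = healthy * (1 - specificity)   # 49.95 incorrect positive diagnoses

    chance_actually_sick = true_positives / (true_positives + false_positives)
    print(round(chance_actually_sick, 4))   # 0.0194, i.e. about 2%
                                            # (the 1/51 = 1.96% figure comes from rounding the counts first)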


Final Remarks

What was that about Bayes' Theorem helping to update your beliefs? "Belief" here refers to one way of interpreting what it means for a random outcome to have a particular numerical probability. In the above disease example, it's sensible to think of the chance that someone is ill as a measure of how firmly you believe that they're ill.

If you randomly chose one person from the general population and didn't test them, you'd be pretty skeptical that they're ill, since the disease is so rare. The chance that you picked someone with the disease is 1/1000. Running the test then gives you new information -- specifically, the test outcome. That outcome is sometimes wrong, but you can still use the new information to update your prior belief that the person has the disease. If the person tests positive, your belief just jumped from a prior value of 1/1000 to a "posterior" value of 1/51, a 20-fold increase.
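For the impatient, that jump can be computed in one line of arithmetic. Here's a small Python sketch of the update (the function and argument names are mine; the proper derivation waits for Part 2):

    def update_belief(prior, sensitivity, specificity):
        """Posterior probability of disease given a positive test result."""
        positive_and_sick    = sensitivity * prior              # true-positive slice of the population
        positive_and_healthy = (1 - specificity) * (1 - prior)  # false-positive slice of the population
        return positive_and_sick / (positive_and_sick + positive_and_healthy)

    posterior = update_belief(prior=0.001, sensitivity=0.99, specificity=0.95)
    print(posterior, posterior / 0.001)   # about 0.0194, roughly a 20-fold jump from the prior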

Cliffhanger Ending

In a future post, we'll derive Bayes' Theorem and show how it applies to this and other problems. Until next time!

EDIT: Part 2 is here.

2 comments:

  1. Hi there! Oh my, I'm a tad embarrassed that you read (hopefully just skimmed) my posts about, well, nothing, and you're over here discussing Bayes' theorem. To be honest, I followed you because I nerded out when I saw you had an interest in cardiovascular physiology! Anyway, thanks for stopping by and saying hi :)

  2. Your writing is expressive in a way I'd like mine to be and deals with topics I can definitely relate to, so I, for one, encourage you to keep on blogging. Happy to know you share an interest in marvels of the science! Do you happen to have a background in CV research yourself?
