This semester I am taking a class in Bayesian inference and today we came across an interesting example that I would like to share with you. The post will assume a basic knowledge of probability theory, but it will be nothing a visit to Wikipedia can’t handle.

**Bayes’ Theorem**

Throughout I will assume we have appropriate measure spaces lurking in the background, but suppress reference to them where possible.

Given two events $A, B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is defined to be $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.

This intuitively captures the proportion of the measure of $B$ taken up by the event $A$, so given that $B$ has happened, this tells you how likely $A$ is.

Applying this definition twice relates the two conditional probabilities, and we call the result *Bayes’ Theorem*: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$.

One intuitive way to think of this is where $A$ represents a hypothesis, and $B$ represents data or evidence. Then Bayes’ Theorem describes how to update your belief in the hypothesis based on this evidence.

We often call $P(A)$ the *prior*, our initial degree of belief in $A$. In this spirit we call $P(A \mid B)$ the *posterior*, describing our beliefs now that we’ve found out that $B$ has occurred. The conditional $P(B \mid A)$ is called the *likelihood*; it measures how likely the evidence is given the hypothesis. This is often easier to calculate than the other way around, and this asymmetry is the reason Bayes’ theorem is useful.
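To make the prior, likelihood and posterior concrete, here is a small sketch that checks Bayes’ Theorem by brute-force counting on a finite sample space. The two-dice setup and the names `prob`, `A` and `B` are my own illustrative assumptions, not part of the class example.

```python
from fractions import Fraction
from itertools import product

# Assumed toy sample space: all outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a predicate on outcomes) under the uniform measure."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))


def A(w):
    """Hypothesis: the first die shows a six."""
    return w[0] == 6


def B(w):
    """Evidence: the total is at least ten."""
    return w[0] + w[1] >= 10


def A_and_B(w):
    return A(w) and B(w)


prior = prob(A)                              # P(A)
likelihood = prob(A_and_B) / prob(A)         # P(B | A), from the definition
posterior_direct = prob(A_and_B) / prob(B)   # P(A | B), from the definition
posterior_bayes = likelihood * prior / prob(B)  # P(A | B), via Bayes' Theorem

print(prior, likelihood, posterior_direct, posterior_bayes)
```

Both routes to the posterior agree exactly because `Fraction` keeps the arithmetic rational, so there is no floating point noise to worry about.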

The theorem may be naturally rephrased for the distributions of random variables, where

$$f_{X \mid Y}(x \mid y) = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{f_Y(y)}$$

for continuous $X, Y$, and similarly for discrete random variables or combinations of the two.

**A First Example**

Let’s suppose that $X$ is a Boolean random variable so that $P(X = 1) = p$ and $P(X = 0) = 1 - p$, for some probability $p \in [0, 1]$. In other words $X$ has a Bernoulli distribution with parameter $p$. In our example $X$ takes as its argument a member of a population and returns $1$ if they have a certain disease, and $0$ otherwise, so a randomly chosen person has the disease with probability $p$. We call $p$ the *prevalence* of the disease.

Now we introduce another random variable which is a test for the disease, call it $T$. This also takes values in $\{0, 1\}$, where 1 means the test predicts that a person has the disease, and 0 that they do not. Like most tests, however, it is imperfect. To describe this we specify some conditional distributions: $P(T = 1 \mid X = 1) = a$ and $P(T = 0 \mid X = 0) = b$. In medical parlance, $a$ is the *sensitivity* of the test, and $b$ the *specificity*.

Now we want to know, given these parameters, what is the probability that a person has the disease, given that the test says they do?

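Translating this, we are looking for $P(X = 1 \mid T = 1)$. So let’s use Bayes’ Theorem to work it out:

$$P(X = 1 \mid T = 1) = \frac{P(T = 1 \mid X = 1)\,P(X = 1)}{P(T = 1)} = \frac{ap}{P(T = 1)},$$

just plugging in for the numerator. Next we use the law of total probability to deduce that

$$P(T = 1) = P(T = 1, X = 1) + P(T = 1, X = 0),$$

and rearrange using the conditional probability formula, noting that $P(T = 1 \mid X = 0) = 1 - b$, to give

$$P(X = 1 \mid T = 1) = \frac{ap}{ap + (1 - b)(1 - p)}.$$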

Now suppose that the sensitivity $a$ and the specificity $b$ are both high, a pretty good test I’m sure you’d agree.

Let’s suppose that the prevalence $p$ is low. Then, when we calculate the probability $P(X = 1 \mid T = 1)$ of a person having the disease given a positive test, it turns out that a person with a positive result has a less than 5 percent chance of having the disease! I thought that this was extremely surprising, and it comes down to the prevalence of the disease: the less prevalent the disease, the better your test needs to be to detect it above the noise.
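Here is a minimal sketch of that calculation. The function name `posterior_positive` and the numbers plugged in (sensitivity and specificity of 0.99, prevalence of 0.0005) are assumptions I picked for illustration; they are not necessarily the figures from the class.

```python
def posterior_positive(p, a, b):
    """P(disease | positive test) for prevalence p, sensitivity a, specificity b."""
    true_positive = a * p               # P(T = 1, X = 1)
    false_positive = (1 - b) * (1 - p)  # P(T = 1, X = 0)
    return true_positive / (true_positive + false_positive)


# Assumed, purely illustrative values: a rare disease and a seemingly good test.
rare = posterior_positive(p=0.0005, a=0.99, b=0.99)

# The same test applied to a much more common disease (also an assumed value).
common = posterior_positive(p=0.1, a=0.99, b=0.99)

print(f"rare disease:   {rare:.4f}")
print(f"common disease: {common:.4f}")
```

With these assumed numbers the rare-disease posterior comes out at roughly 4.7 percent, while the same test on the more common disease gives a posterior above 90 percent, which is exactly the prevalence effect described above.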

**An example**

This got me thinking: how many positive tests would you need to conclude that a person has a disease? In other words, how fast, if at all, does the probability of a person having the disease go to 1 as the number of consecutive positive tests goes to infinity?

To do this we’ll think about a guy Bob who has just walked into the clinic to get tested. Because of recent legislation, everyone in the entire world has to get tested for this disease, and so the prevalence amongst those getting tested is just the same as the prevalence of the disease in general. We call this prevalence $p$ as before and treat it as known.

This time $X$ will be a random variable that is our prior belief about whether Bob has the disease; this will again be the Bernoulli distribution with parameter $p$.

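The difference this time is that we will perform multiple tests on the same person, where the test returns positive or negative each time ‘independently’. Now for a sequence $T_1, T_2, \dots$ of identically distributed tests it is of course *not* the case that they are independent, as it is plain that $P(T_2 = 1 \mid T_1 = 1) \neq P(T_2 = 1)$ in general. What we want instead is *conditional independence*, namely that

$$P(T_1 = t_1, \dots, T_n = t_n \mid X = x) = \prod_{i=1}^{n} P(T_i = t_i \mid X = x)$$

for all $n \geq 1$ and all $t_1, \dots, t_n, x \in \{0, 1\}$.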

What this means is that, once we know whether or not the person has the disease, the probability of a test getting it wrong or right does not depend on the results of the other tests, or on anything else about the person being tested.
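A quick numerical check of the distinction between independence and conditional independence, as a sketch only. The helper `joint_positive` and the parameter values are my own assumptions; the tests are taken to be conditionally independent given the disease status, with $P(T_i = 1 \mid X = 1) = a$ and $P(T_i = 1 \mid X = 0) = 1 - b$.

```python
def joint_positive(k, p, a, b):
    """P(T_1 = 1, ..., T_k = 1), assuming conditional independence given X.

    Uses the law of total probability over X = 1 (prob. p) and X = 0 (prob. 1 - p).
    """
    return a**k * p + (1 - b)**k * (1 - p)


# Assumed illustrative values: prevalence, sensitivity, specificity.
p, a, b = 0.0005, 0.99, 0.99

p_one = joint_positive(1, p, a, b)    # P(T_1 = 1), same as P(T_2 = 1)
p_two = joint_positive(2, p, a, b)    # P(T_1 = 1, T_2 = 1)
p_second_given_first = p_two / p_one  # P(T_2 = 1 | T_1 = 1)

print(f"P(T_2 = 1)           = {p_one:.5f}")
print(f"P(T_2 = 1 | T_1 = 1) = {p_second_given_first:.5f}")
```

A first positive result makes a second one noticeably more likely, so the tests are not independent, even though they are conditionally independent once the disease status is fixed.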

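Now we update our prior given that we have $n$ positive tests, that is $T_1 = 1, \dots, T_n = 1$, which we’ll call $E_n$ for brevity:

$$P(X = 1 \mid E_n) = \frac{P(E_n \mid X = 1)\,P(X = 1)}{P(E_n)},$$

now using the conditional independence, and the law of total probability as before, gives:

$$P(X = 1 \mid E_n) = \frac{a^n p}{a^n p + (1 - b)^n (1 - p)},$$

and isolating the dependence on $n$:

$$P(X = 1 \mid E_n) = \frac{1}{1 + \frac{1 - p}{p}\left(\frac{1 - b}{a}\right)^n},$$

and hence $P(X = 1 \mid E_n) \to 1$ as $n \to \infty$ iff $\left(\frac{1 - b}{a}\right)^n \to 0$ as $n \to \infty$ iff $a + b > 1$. Also notice that we need to assume that $p$ is not 1 or 0, otherwise there is nothing to do since the prevalence is total or non-existent.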

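So how fast does it converge? As you can see, it goes to 1 at a rate of $O\left(\left(\frac{1 - b}{a}\right)^n\right)$: the gap $1 - P(X = 1 \mid E_n)$ shrinks faster than any inverse polynomial in $n$! So we say that the event $X = 1$, given $E_n$, *occurs with overwhelming probability*.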