Table of contents

1 Historical remarks
2 Bayes' theorem in probability theory
3 Bayesianism
4 Examples
5 References
6 See also
Bayes' theorem is a result in probability theory, named after the Reverend Thomas Bayes (1702--61). Bayes worked on the problem of computing a distribution for the parameter of a binomial distribution (to use modern terminology); his work was edited and presented posthumously (1763) by his friend Richard Price, in An Essay towards solving a Problem in the Doctrine of Chances. Bayes' results were replicated and extended in a 1774 essay by Laplace, who was apparently unaware of Bayes' work.
The main result (Proposition 9 in the essay) derived by Bayes is the following: assuming a uniform distribution for the prior distribution of the binomial parameter p, the probability that p is between two values a and b is

P(a < p < b) = (∫_a^b p^m (1 − p)^n dp) / (∫_0^1 p^m (1 − p)^n dp),

where m is the number of observed successes and n the number of observed failures.
What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter p. That is, not only can one compute probabilities for experimental outcomes, but also for the parameter which governs them, and the same algebra is used to make inferences of either kind. In contrast, according to the frequentist interpretation, there is no such thing as a probability distribution for p and therefore some non-probabilistic inference mechanism must be used to reason about p.
Bayes' theorem is used in statistical inference to update estimates of the probability that different hypotheses are true, based on observations and a knowledge of how likely those observations are, given each hypothesis. There are discrete and continuous versions of the theorem.
Bayesianism, as a school of thought, is based on the notion that Bayes' theorem can be applied to any propositions, whether they are statements about experimental variables or any other kind of statements. In contrast, according to the frequentist school, Bayes' theorem can only be applied to problems in which all statements are about experimental variables alone.
In probability theory, Bayes' theorem is a statement about conditional probabilities that allows the order of conditioning between two events to be exchanged. If A and B are two events, Bayes' theorem allows us to calculate the probability of A given B if we know the probability of B given A and the probabilities of each event alone. It is a simple theorem with broad applicability.
To derive Bayes' theorem, note first from the definition of conditional probability that

P(A|B) P(B) = P(A,B) = P(B|A) P(A),

where P(A,B) is the joint probability of A and B. Dividing both sides by P(B) (assumed non-zero), we obtain

P(A|B) = P(B|A) P(A) / P(B),

which is Bayes' theorem.
Each term in Bayes' theorem has a conventional name.
The term P(A) is called the prior probability of A.
It is "prior" in the sense that it precedes any information about B.
Equivalently, P(A) is also called the marginal probability of A.
The term P(A|B) is called the posterior probability of A, given B. It is "posterior" in the sense that it is derived from or entailed by the specified value of B.
The term P(B|A), for a specific value of B, is called the likelihood function for A.
The term P(B) is called the prior or marginal probability of B.
Bayes' theorem is often embellished by noting that

P(B) = P(B|A) P(A) + P(B|A^C) P(A^C),

where A^C is the complementary event of A, so the theorem can be restated as

P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|A^C) P(A^C)).
See also the law of total probability.
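The discrete form of the theorem, with P(B) expanded by the law of total probability, is easy to check numerically. A minimal Python sketch follows; the input probabilities are illustrative assumptions, not taken from the article:

```python
def bayes(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B), expanding P(B) by the law of total probability
    over the partition {A, complement of A}."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b

# Assumed illustrative values: P(A) = 0.3, P(B|A) = 0.8, P(B|A^C) = 0.2.
posterior = bayes(0.3, 0.8, 0.2)
print(round(posterior, 4))  # prints 0.6316
```

The posterior (about 0.63) exceeds the prior (0.3) because the observation B is four times as likely under A as under its complement.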
There is also a version of Bayes' theorem for continuous distributions.
It has the same form as in the discrete case,
but it is somewhat harder to derive, since probability densities,
strictly speaking, are not probabilities,
so Bayes' theorem has to be established by a limit process.
See Papoulis (citation below), Section 4.5 for a derivation.
The continuous case of Bayes' theorem also says the posterior distribution
results from multiplying the prior by the likelihood and then normalizing.
The prior and posterior distributions are usually identified with their
probability density functions.
For example, suppose the proportion of voters who will vote "yes" is
an unknown number p between 0 and 1. A sample of n voters is
drawn randomly from the population, and it is observed that x of
those n voters will vote "yes". The likelihood function is then

L(p) = C(n, x) p^x (1 − p)^(n−x),

where C(n, x) is the binomial coefficient.
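Multiplying this likelihood by a uniform prior on [0, 1] and normalizing can be sketched numerically. In the Python sketch below, the sample numbers (x = 60 "yes" votes out of n = 100) are illustrative assumptions, and normalization is done by a simple Riemann sum on a grid:

```python
def posterior_on_grid(p_grid, x, n):
    """Posterior density of p on a grid, under a uniform prior:
    proportional to p^x (1 - p)^(n - x), normalized by a Riemann sum."""
    unnorm = [p**x * (1.0 - p)**(n - x) for p in p_grid]
    dp = p_grid[1] - p_grid[0]
    total = sum(unnorm) * dp
    return [u / total for u in unnorm]

grid = [i / 1000 for i in range(1001)]
dens = posterior_on_grid(grid, 60, 100)  # assumed poll: 60 of 100 say "yes"
mode = grid[max(range(len(grid)), key=lambda i: dens[i])]
print(mode)  # the posterior mode lands at x/n = 0.6
```

Note that the binomial coefficient in the likelihood is constant in p, so it cancels in the normalization and can be omitted.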
Bayesianism is the philosophical school which holds that the rules of mathematical
probability apply not only when probabilities are relative frequencies
assigned to random events, but also when they are degrees of belief
assigned to uncertain propositions. Updating these degrees of belief in light of new evidence almost invariably involves application of Bayes' theorem.
To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?
Intuitively, it seems clear that the answer should be more than 50%, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let H1 correspond to bowl #1, and H2 to bowl #2.
It is given that the bowls are identical from Fred's point of view, thus P(H1) = P(H2), and the two must add up to 1, so both are equal to 50%.
The "data" D consists in the observation of a plain cookie. From the contents of the bowls, we know that P(D | H1) = 30/40 = 75% and P(D | H2) = 20/40 = 50%. Bayes' formula then yields

P(H1 | D) = P(H1) P(D | H1) / (P(H1) P(D | H1) + P(H2) P(D | H2)) = (0.5 × 0.75) / (0.5 × 0.75 + 0.5 × 0.5) = 0.375 / 0.625 = 60%.
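The arithmetic of this example can be checked directly in Python, using only the numbers given above:

```python
# Priors: the bowls are equally likely a priori.
p_h1, p_h2 = 0.5, 0.5
# Likelihoods of drawing a plain cookie from each bowl.
p_d_given_h1 = 30 / 40   # bowl #1: 30 plain of 40 cookies
p_d_given_h2 = 20 / 40   # bowl #2: 20 plain of 40 cookies

# Bayes' theorem, with P(D) expanded over the partition {H1, H2}.
p_d = p_d_given_h1 * p_h1 + p_d_given_h2 * p_h2
p_h1_given_d = p_d_given_h1 * p_h1 / p_d
print(p_h1_given_d)  # prints 0.6
```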
False positives are a problem in any kind of test: no test is perfect, and sometimes the test will incorrectly report a positive result. For example, if a test for a particular disease is performed on a patient, then there is a chance (usually small) that the test will return a positive result even if the patient does not have the disease. The problem lies, however, not just in the chance of a false positive prior to testing, but in determining the chance that a positive result is in fact a false positive. As we will demonstrate using Bayes' theorem, if a condition is rare, then the majority of positive results may be false positives, even if the test for that condition is (otherwise) reasonably accurate.
Suppose that a test for a particular disease has a very high success rate: if a tested patient has the disease, the test reports a positive result 99% of the time, while if the patient does not have the disease, it reports a false positive only 5% of the time.
Let A be the event that the patient has the disease, and B be the event that the test returns a positive result. Then, using the second form of Bayes' theorem (above), the probability of a true positive is

P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|A^C) P(A^C)) = (0.99 × 0.001) / (0.99 × 0.001 + 0.05 × 0.999) ≈ 0.019.
Despite the apparent high accuracy of the test, the incidence of the disease is so low (one in a thousand) that the vast majority of patients who test positive (98 in a hundred) do not have the disease. (Nonetheless, this is 20 times the proportion before we knew the outcome of the test! The test is not useless, and re-testing may improve the reliability of the result.) In this case, Bayes' theorem helps show that the accuracy of tests for rare conditions must be very high in order to produce reliable results from a single test, due to the possibility of false positives.
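The figures in this example can be reproduced with a short Python sketch. The 0.1% prevalence is given in the text; the 99% detection rate and 5% false-positive rate are assumptions chosen to be consistent with the quoted result of about 0.019:

```python
p_disease = 0.001            # P(A): prevalence, one in a thousand
p_pos_given_disease = 0.99   # P(B|A): assumed detection rate
p_pos_given_healthy = 0.05   # P(B|A^C): assumed false-positive rate

# Second form of Bayes' theorem: expand P(B) over {A, A^C}.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1.0 - p_disease))
p_true_positive = p_pos_given_disease * p_disease / p_pos
print(round(p_true_positive, 3))        # prints 0.019
print(round(1.0 - p_true_positive, 3))  # prints 0.981
```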
Historical remarks
where m is the number of observed successes and n the number of observed failures. His preliminary results, in particular Propositions 3, 4, and 5, imply the result now called Bayes' Theorem (as described below), but it does not appear that Bayes himself emphasized or focused on that result.

Bayes' theorem in probability theory
denoting by P(A,B) the joint probability of A and B.
Dividing the left- and right-hand sides by P(B), we obtain

P(A|B) = P(B|A) P(A) / P(B),

which is Bayes' theorem.

Alternative forms of Bayes' theorem
so the theorem can be restated as

P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|A^C) P(A^C)),

where A^C is the complementary event of A. More generally, where {Ai} forms a partition of the event space,

P(Ai|B) = P(B|Ai) P(Ai) / Σj P(B|Aj) P(Aj),

for any Ai in the partition.

Bayes' theorem for probability densities
Multiplying that by the prior probability density function of p
and then normalizing gives the posterior probability distribution
of p, and thus updates probabilities in light of the new data
given by the opinion poll. Thus if the prior probability distribution
of p is uniform on the interval [0,1], then the posterior
probability distribution would have a density of the form

(constant) × p^x (1 − p)^(n−x),

and this "constant" would be different from the one that appears
in the likelihood function.

Bayesianism
Examples
From which bowl is the cookie?
Before observing the cookie, the probability that Fred chose bowl #1 is the prior probability, P(H1), which is 50%.
After observing the cookie, we revise the probability to P(H1|D), which is 60%.

False positives in a medical test
Suppose also, however, that only 0.1% of the population have that disease (i.e. with probability 0.001). We now have all the information required to use Bayes' theorem to calculate the probability that, given a positive test result, it is a false positive.
and hence the probability of a false positive is about (1 − 0.019) = 0.981.

References
Versions of the essay
Commentaries
Additional material
See also