4.2: Conditional Probability
Often we need to compute the probability of an event given that another event has occurred.
Example \(\PageIndex{1}\)
What is the probability that two cards drawn at random from a deck of playing cards will both be Aces?
Solution
It might seem that you could use the formula for the probability of two independent events and simply multiply \(\dfrac{4}{52} \cdot \dfrac{4}{52}=\dfrac{1}{169}\). This would be incorrect, however, because the two events are not independent. If the first card drawn is an Ace, then the probability that the second card is also an Ace would be lower because there would only be three Aces left in the deck.
Once the first card chosen is an Ace, the probability that the second card chosen is also an Ace is called the conditional probability of drawing an Ace. In this case the "condition" is that the first card is an Ace. Symbolically, we write this as:
\[P(\text {Ace on second draw} \mid \text {an Ace on the first draw}) \nonumber \]
The vertical bar "|" is read as "given," so the above expression is short for "The probability that an Ace is drawn on the second draw given that an Ace was drawn on the first draw." What is this probability? After an Ace is drawn on the first draw, there are 3 Aces out of 51 total cards left. This means that the conditional probability of drawing an Ace after one Ace has already been drawn is \(\dfrac{3}{51}=\dfrac{1}{17}\).
Thus, the probability of both cards being Aces is \(\dfrac{4}{52} \cdot \dfrac{3}{51}=\dfrac{12}{2652}=\dfrac{1}{221}\).
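As a quick check of this result, the short Python sketch below computes the product directly and then counts every ordered two-card draw from a full deck. The names `RANKS`, `SUITS`, and `deck` are illustrative choices, not part of the original example.

```python
from fractions import Fraction
from itertools import permutations

# A standard 52-card deck as (rank, suit) pairs; the labels are illustrative.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Multiplication rule: P(Ace first) * P(Ace second | Ace first).
p_rule = Fraction(4, 52) * Fraction(3, 51)

# Brute force: every ordered draw of two different cards is equally likely.
draws = list(permutations(deck, 2))
both_aces = sum(1 for first, second in draws
                if first[0] == "A" and second[0] == "A")
p_count = Fraction(both_aces, len(draws))

print(p_rule, p_count)  # 1/221 1/221
```

Both approaches agree, and neither gives the \(\dfrac{1}{169}\) that treating the draws as independent would produce.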
Conditional Probability
The probability that event \(B\) occurs, given that event \(A\) has happened, is represented as \(P(B \mid A)\). This is read as "the probability of \(B\) given \(A\)."
Then, the probability that \(A\) and \(B\) both occur is this conditional probability multiplied by the probability that \(A\) occurs, i.e., \(P(A \text { and } B)=P(A) P(B \mid A)\).
Example \(\PageIndex{2}\)
Find the probability that a die rolled shows a 6, given that a flipped coin shows a head.
Solution
These are two independent events, so the probability of the die showing a 6 is \(\dfrac{1}{6}\), regardless of the result of the coin flip.
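One way to see this independence concretely is to list all 12 equally likely (coin, die) outcomes and compare the conditional probability with the unconditional one. The following is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes.
outcomes = list(product(["H", "T"], range(1, 7)))

# Restrict to the outcomes where the coin shows a head.
heads = [o for o in outcomes if o[0] == "H"]
heads_and_six = [o for o in heads if o[1] == 6]
sixes = [o for o in outcomes if o[1] == 6]

p_six_given_head = Fraction(len(heads_and_six), len(heads))
p_six = Fraction(len(sixes), len(outcomes))

print(p_six_given_head, p_six)  # 1/6 1/6
```

Knowing the coin shows a head leaves the probability of a 6 unchanged, which is what independence means here.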
Example \(\PageIndex{3}\)
The table below shows the number of survey subjects who have received a speeding ticket in the last year and the color of their car. Find the probability that a randomly chosen person:
- Has a speeding ticket given they have a red car
- Has a red car given they have a speeding ticket
Solution
| | Speeding ticket | No speeding ticket | Total |
|---|---|---|---|
| Red car | 15 | 135 | 150 |
| Not red car | 45 | 470 | 515 |
| Total | 60 | 605 | 665 |
- Since we know the person has a red car, we are only considering the 150 people in the first row of the table. Of those, 15 have a speeding ticket, so \[P(\text { ticket } \mid \text { red car })=\dfrac{15}{150}=\dfrac{1}{10}=0.1 \nonumber \]
- Since we know the person has a speeding ticket, we are only considering the 60 people in the first column of the table. Of those, 15 have a red car, so \[P(\text { red car } \mid \text { ticket })=\dfrac{15}{60}=\dfrac{1}{4}=0.25 \nonumber \]
Notice from the last example that \(P(B|A)\) is not equal to \(P(A|B)\).
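The "restrict to a row or column, then count" reasoning used in this example can be sketched in Python as follows. The dictionary `counts` holds the four interior cells of the survey table; the helper `conditional` and the two predicate functions are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# Interior cells of the survey table: (car color, ticket status) -> count.
counts = {
    ("red", "ticket"): 15,
    ("red", "no ticket"): 135,
    ("not red", "ticket"): 45,
    ("not red", "no ticket"): 470,
}

def conditional(counts, event, given):
    """Return P(event | given) by counting only the cells where `given` holds."""
    given_total = sum(n for key, n in counts.items() if given(key))
    both = sum(n for key, n in counts.items() if given(key) and event(key))
    return Fraction(both, given_total)

def is_red(key):
    return key[0] == "red"

def has_ticket(key):
    return key[1] == "ticket"

p_ticket_given_red = conditional(counts, has_ticket, is_red)
p_red_given_ticket = conditional(counts, is_red, has_ticket)

print(p_ticket_given_red, p_red_given_ticket)  # 1/10 1/4
```

Swapping the roles of the two arguments changes the denominator from 150 to 60, which is why the two conditional probabilities differ.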
These kinds of conditional probabilities are what insurance companies use to determine your insurance rates. They look at the conditional probability of your having an accident, given your age, your car, your car color, your driving history, etc., and price your policy based on that likelihood.
Example \(\PageIndex{4}\)
If you pull 2 cards out of a deck, what is the probability that both are spades?
Solution
The probability that the first card is a spade is \(\dfrac{13}{52}=\dfrac{1}{4}\).
The probability that the second card is a spade, given the first was a spade, is \(\dfrac{12}{51}\), since there is one fewer spade in the deck and one fewer card in total. Note that \(\dfrac{12}{51}=\dfrac{4}{17}\).
The probability that both cards are spades is \(\dfrac{1}{4} \cdot \dfrac{4}{17}=\dfrac{1}{17} \approx 0.0588\).
Example \(\PageIndex{5}\)
If you draw two cards from a deck, what is the probability that you will get the Ace of Diamonds and a black card?
Solution
You can satisfy this condition by having Case A or Case B, as follows:
Case A) you can get the Ace of Diamonds first and then a black card or
Case B) you can get a black card first and then the Ace of Diamonds.
Let's calculate the probability of Case A:
The probability that the first card is the Ace of Diamonds is \(\dfrac{1}{52}\). The probability that the second card is black given that the first card is the Ace of Diamonds is \(\dfrac{26}{51}\) because 26 of the remaining 51 cards are black. The probability is therefore \(\dfrac{1}{52} \cdot \dfrac{26}{51}=\dfrac{1}{102}\)
Now for Case B:
The probability that the first card is black is \(\dfrac{26}{52}=\dfrac{1}{2}\). The probability that the second card is the Ace of Diamonds given that the first card is black is \(\dfrac{1}{51}\). The probability of Case B is therefore \(\dfrac{1}{2} \cdot \dfrac{1}{51}=\dfrac{1}{102}\), the same as the probability of Case A.
Recall that the probability of \(A\) or \(B\) is \(P(A) + P(B) - P(\text { A and B })\). In this problem, \(P(\text { A and B }) = 0\) since it is clear that Case A and Case B cannot both occur. Therefore, the probability of Case A or Case B is \(\dfrac{1}{102}+\dfrac{1}{102}=\dfrac{2}{102}=\dfrac{1}{51}\). The probability that you will get the Ace of Diamonds and a black card when drawing two cards from a deck is \(\dfrac{1}{51}\).
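The two-case calculation can be double-checked with exact fractions, and then cross-checked by a different route: counting unordered two-card hands. This is only a sketch; the counting argument (1 way to pick the Ace of Diamonds times 26 ways to pick a black card) is an alternative to the method in the text, not a replacement for it.

```python
from fractions import Fraction
from math import comb

# Case A: Ace of Diamonds first, then a black card.
p_case_a = Fraction(1, 52) * Fraction(26, 51)
# Case B: a black card first, then the Ace of Diamonds.
p_case_b = Fraction(26, 52) * Fraction(1, 51)

# The cases cannot both occur, so their probabilities simply add.
p_either = p_case_a + p_case_b

# Cross-check: favorable unordered hands over all unordered two-card hands.
favorable_hands = 1 * 26
p_by_counting = Fraction(favorable_hands, comb(52, 2))

print(p_case_a, p_case_b, p_either, p_by_counting)  # 1/102 1/102 1/51 1/51
```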
Try it Now 4
In your drawer you have 10 pairs of socks, 6 of which are white. If you reach in and randomly grab two pairs of socks, what is the probability that both are white?
Example \(\PageIndex{6}\)
A home pregnancy test was given to women, then pregnancy was verified through blood tests. The following table shows the home pregnancy test results. Find
- \(P(\text { not pregnant } \mid \text { positive test result })\)
- \(P(\text { positive test result } \mid \text { not pregnant })\)
Solution
| | Positive test | Negative test | Total |
|---|---|---|---|
| Pregnant | 70 | 4 | 74 |
| Not Pregnant | 5 | 14 | 19 |
| Total | 75 | 18 | 93 |
- Since we know the test result was positive, we’re limited to the 75 women in the first column, of which 5 were not pregnant. \[P(\text { not pregnant }\mid \text { positive test result })=\dfrac{5}{75} \approx 0.067 \nonumber \]
- Since we know the woman is not pregnant, we are limited to the 19 women in the second row, of which 5 had a positive test. \[P(\text { positive test result }\mid \text { not pregnant })=\dfrac{5}{19} \approx 0.263 \nonumber \]
The second result is what is usually called a false positive: a positive result when the woman is not actually pregnant. Because false positives are so important in everyday life, we elaborate on this idea below.
Let us now concentrate on the more complex conditional probability problems we began looking at above.
Example \(\PageIndex{7}\)
Suppose a certain disease has an incidence rate of 0.1% (that is, it afflicts 0.1% of the population). A test has been devised to detect this disease. The test does not produce false negatives (that is, anyone who has the disease will test positive for it), but the false positive rate is 5% (that is, about 5% of people who take the test will test positive even though they do not have the disease). Suppose a randomly selected person takes the test and tests positive. What is the probability that this person actually has the disease?
Solution
There are two ways to approach the solution to this problem. One involves an important result in probability theory called Bayes' theorem, which we will discuss a bit later, but for now we will use an alternative and, we hope, much more intuitive approach.
Let's break down the information in the problem piece by piece.
Suppose a certain disease has an incidence rate of 0.1% (that is, it afflicts 0.1% of the population). The percentage 0.1% can be converted to a decimal number by moving the decimal place two places to the left, to get 0.001. In turn, 0.001 can be rewritten as a fraction: 1/1000. This tells us that about 1 in every 1000 people has the disease. (If we wanted, we could write \(P(\text {disease })=0.001\))
A test has been devised to detect this disease. The test does not produce false negatives (that is, anyone who has the disease will test positive for it). This part is fairly straightforward: everyone who has the disease will test positive, or alternatively everyone who tests negative does not have the disease. (We could also say \(P(\text { positive } \mid \text { disease })=1\))
The false positive rate is 5% (that is, about 5% of people who take the test will test positive even though they do not have the disease). This is even more straightforward. Another way of looking at it is that out of every 100 people who are tested and do not have the disease, 5 will test positive even though they do not have the disease. (We could also say that \(P(\text { positive } \mid \text { no disease })=0.05\).)
Suppose a randomly selected person takes the test and tests positive. What is the probability that this person actually has the disease? Here we want to compute \(P(\text { disease } \mid \text { positive })\). We already know that \(P(\text { positive } \mid \text { disease })=1\), but remember that conditional probabilities are not equal if the conditions are switched.
Rather than thinking in terms of all these probabilities we have developed, let's create a hypothetical situation and apply the facts as set out above. First, suppose we randomly select 1000 people and administer the test. How many do we expect to have the disease? Since about 1/1000 of all people are afflicted with the disease, 1/1000 of 1000 people is 1. (Now you know why we chose 1000.) Only 1 of 1000 test subjects actually has the disease; the other 999 do not.
We also know that 5% of all people who do not have the disease will test positive. There are 999 disease-free people, so we would expect (0.05)(999)=49.95 (so, about 50) people to test positive who do not have the disease.
Now back to the original question, computing \(P(\text { disease } \mid \text { positive })\). There are 51 people who test positive in our example (the one unfortunate person who actually has the disease, plus the 50 people who tested positive but don't). Only one of these people has the disease, so
\[P(\text { disease } \mid \text { positive }) \approx \dfrac{1}{51} \approx 0.0196 \nonumber \]
or less than 2%. Does this surprise you? This means that of all people who test positive, over 98% do not have the disease.
The answer we got was slightly approximate, since we rounded 49.95 to 50. We could redo the problem with 100,000 test subjects, 100 of whom would have the disease and (0.05)(99,900)=4995 test positive but do not have the disease, so the exact probability of having the disease if you test positive is
\[P(\text { disease } \mid \text { positive }) = \dfrac{100}{5095} \approx 0.0196 \nonumber \]
which is pretty much the same answer.
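The hypothetical-population reasoning above can be written as a small Python function. This is a sketch under the assumptions of this example (no false negatives); the function name `natural_frequency_table` and its parameters are illustrative. Using exact fractions avoids the rounding of 49.95 to 50.

```python
from fractions import Fraction

def natural_frequency_table(population, incidence, false_positive_rate):
    """Expected counts for a test with no false negatives, as in this example."""
    diseased = population * incidence
    healthy = population - diseased
    true_positives = diseased  # everyone with the disease tests positive
    false_positives = healthy * false_positive_rate
    share = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share

incidence = Fraction(1, 1000)           # 0.1%
false_positive_rate = Fraction(5, 100)  # 5%

for population in (1_000, 100_000):
    tp, fp, share = natural_frequency_table(population, incidence, false_positive_rate)
    print(population, float(tp), float(fp), share, round(float(share), 4))
```

Without rounding, both population sizes give the same fraction, \(\dfrac{20}{1019}\), so the size of the hypothetical group only affects how convenient the counts are to work with.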
But back to the surprising result. Of all people who test positive, over 98% do not have the disease. If your guess for the probability that a person who tests positive has the disease was wildly different from the right answer (2%), don't feel bad. The exact same problem was posed to doctors and medical students at the Harvard Medical School 25 years ago, and the results were reported in a 1978 New England Journal of Medicine article. Only about 18% of the participants got the right answer. Most of the rest thought the answer was closer to 95% (perhaps they were misled by the false positive rate of 5%).
So at least you should feel a little better that a bunch of doctors didn't get the right answer either (assuming you thought the answer was much higher). But the significance of this finding and similar results from other studies in the intervening years lies not in making math students feel better but in the possibly catastrophic consequences it might have for patient care. If a doctor thinks that a positive test result nearly guarantees that a patient has a disease, they might begin an unnecessary and possibly harmful treatment regimen on a healthy patient. Or worse, as in the early days of the AIDS crisis when being HIV-positive was often equated with a death sentence, the patient might take a drastic action and commit suicide.
As we have seen in this hypothetical example, the most responsible course of action for treating a patient who tests positive would be to counsel the patient that they most likely do not have the disease and to order further, more reliable, tests to verify the diagnosis.
One of the reasons that the doctors and medical students in the study did so poorly is that such problems, when presented in the types of statistics courses that medical students often take, are solved by use of Bayes' theorem, which is stated as follows:
Bayes' Theorem
\[P(A \mid B)=\dfrac{P(A) P(B \mid A)}{P(A) P(B \mid A)+P(\bar{A}) P(B \mid \bar{A})} \nonumber \]
In our earlier example, this translates to
\[P(\text { disease } \mid \text { positive })=\dfrac{P(\text { disease }) P(\text { positive } \mid \text { disease })}{P(\text { disease }) P(\text { positive } \mid \text { disease })+P(\text { no disease }) P(\text { positive } \mid \text { no disease })} \nonumber \]
Plugging in the numbers gives
\[P(\text { disease } \mid \text { positive })=\dfrac{(0.001)(1)}{(0.001)(1)+(0.999)(0.05)} \approx 0.0196 \nonumber \]
which is exactly the same answer as our original solution.
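For readers who want to experiment with other numbers, Bayes' theorem as stated above translates into a few lines of Python. The function name `bayes` and its parameter names are illustrative; `prior` plays the role of \(P(A)\), `p_b_given_a` of \(P(B \mid A)\), and `p_b_given_not_a` of \(P(B \mid \bar{A})\).

```python
def bayes(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using the two-case form of Bayes' theorem."""
    numerator = prior * p_b_given_a
    denominator = numerator + (1 - prior) * p_b_given_not_a
    return numerator / denominator

# Disease example: P(disease) = 0.001, P(positive | disease) = 1,
# P(positive | no disease) = 0.05.
p = bayes(prior=0.001, p_b_given_a=1.0, p_b_given_not_a=0.05)
print(round(p, 4))  # 0.0196
```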
The problem is that you (or the typical medical student, or even the typical math professor) are much more likely to be able to remember the original solution than to remember Bayes' theorem. Psychologists, such as Gerd Gigerenzer, author of Calculated Risks: How to Know When Numbers Deceive You, have advocated that the method involved in the original solution (which Gigerenzer calls the method of "natural frequencies") be employed in place of Bayes' theorem. Gigerenzer performed a study and found that those educated in the natural frequency method were able to recall it far longer than those who were taught Bayes' theorem. When one considers the possible life-and-death consequences associated with such calculations, it seems wise to heed his advice.
Example \(\PageIndex{8}\)
A certain disease has an incidence rate of 2%. If the false negative rate is 10% and the false positive rate is 1%, compute the probability that a person who tests positive actually has the disease.
Solution
Imagine 10,000 people who are tested. Of these 10,000, 200 will have the disease; 10% of them, or 20, will test negative and the remaining 180 will test positive. Of the 9800 who do not have the disease, 98 will test positive. So of the 278 total people who test positive, 180 will have the disease. Thus
\[P(\text { disease } \mid \text { positive }) = \dfrac{180}{278} \approx 0.647 \nonumber \]
so about 65% of the people who test positive will have the disease.
Using Bayes' theorem directly would give the same result:
\[P(\text { disease } \mid \text { positive })=\dfrac{(0.02)(0.90)}{(0.02)(0.90)+(0.98)(0.01)}=\dfrac{0.018}{0.0278} \approx 0.647 \nonumber \]
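A third way to build confidence in this answer is to simulate a large number of test-takers and look at the fraction of positive results that belong to people with the disease. The sketch below is only an illustration: the function name `simulate`, the number of trials, and the fixed seed are arbitrary choices, and the result is an estimate that varies slightly with the seed.

```python
import random

def simulate(trials, incidence, false_negative_rate, false_positive_rate, seed=0):
    """Estimate P(disease | positive) by simulating random test-takers."""
    rng = random.Random(seed)
    positives = 0
    true_positives = 0
    for _ in range(trials):
        has_disease = rng.random() < incidence
        if has_disease:
            # A diseased person tests positive unless a false negative occurs.
            tests_positive = rng.random() >= false_negative_rate
        else:
            # A healthy person tests positive only through a false positive.
            tests_positive = rng.random() < false_positive_rate
        if tests_positive:
            positives += 1
            true_positives += has_disease
    return true_positives / positives

estimate = simulate(200_000, incidence=0.02,
                    false_negative_rate=0.10, false_positive_rate=0.01)
print(round(estimate, 3))
```

With this many trials the estimate should land close to the exact value \(\dfrac{180}{278} \approx 0.647\).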
Try it Now 5
A certain disease has an incidence rate of 0.5%. If there are no false negatives and if the false positive rate is 3%, compute the probability that a person who tests positive actually has the disease.
Contributors and Attributions
- Saburo Matsumoto (CC-BY-4.0)