
7.1: Bayes' Theorem


In this section, we will explore an extremely powerful result called Bayes' Theorem. This result is so important that its core idea has been generalized and an entire course can be dedicated to it. Before we state Bayes' Theorem, let us note why it is important.

    1. NVIDIA is a company which specializes in hardware and software, especially those relating to artificial intelligence. According to NVIDIA, "NB [Naive Bayes] algorithms have been shown to work well on text classification use cases. They are often applied to tasks such as filtering spam emails; predicting categories and sentiment for tweets, web pages, blog posts, user ratings, and forum posts; or ranking documents and web pages". More information can be found here: https://developer.nvidia.com/blog/faster-text-classification-with-naive-bayes-and-gpus/
    2. An excellent resource to read in your spare time is The Theory That Would Not Die (How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy). https://yalebooks.yale.edu/book/9780300188226/the-theory-that-would-not-die/

    In order to understand Bayes' Theorem, we must recall a few definitions.

    Definition: Mutually Exclusive Events, Exhaustive Events, Forms a Partition on \( S \)
    1. The events \(E_1, E_2, \ldots, E_k\) are said to be mutually exclusive events if \( E_i \cap E_j = \emptyset \) for all \(i \neq j \).
    2. The events \(E_1, E_2, \ldots, E_k\) are said to be exhaustive on \(S\) if \(E_1 \cup E_2 \cup \ldots \cup E_k = S\).
    3. The events \(E_1, E_2, \ldots, E_k\) are said to form a partition on \(S\) if \(E_1, E_2, \ldots, E_k\) are mutually exclusive events and exhaustive on \(S\).
    Example 1 \(\PageIndex{1}\)
    1. If \( S = \{ 1, 2, 3, 4, 5, 6 \} \) and \(E_1 = \{ 1, 2, 3 \}\) and \(E_2 = \{ 4, 5, 6 \}\) then \(E_1\) and \(E_2\) form a partition on \(S \).
2. If \( S = \{ 1, 2, 3, 4, 5, 6 \} \) and \(E_1 = \{ 1, 2 \}\) and \(E_2 = \{ 3, 4 \}\) and \(E_3 = \{ 5, 6 \}\) then \(E_1\), \(E_2\), \(E_3\) form a partition on \(S\).
3. If \(S\) is the sample space for some experiment, then \(A\) and \(A^c\) form a partition on \(S\).
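
To see these conditions in code, here is a minimal Java sketch (the class and method names are our own, for illustration only) that represents events as sets of outcomes and checks whether they are mutually exclusive and exhaustive on a finite sample space:

import java.util.*;

public class PartitionCheck
{
    // Returns true if the events are pairwise disjoint and their union is S.
    public static boolean isPartition(Set<Integer> S, List<Set<Integer>> events)
    {
        Set<Integer> union = new HashSet<>();
        for (int i = 0; i < events.size(); i++)
        {
            // Mutually exclusive: E_i intersect E_j is empty for all i != j.
            for (int j = i + 1; j < events.size(); j++)
            {
                Set<Integer> intersection = new HashSet<>(events.get(i));
                intersection.retainAll(events.get(j));
                if (!intersection.isEmpty())
                    return false;
            }
            union.addAll(events.get(i));
        }
        // Exhaustive: E_1 union E_2 union ... union E_k equals S.
        return union.equals(S);
    }

    public static void main(String[] args)
    {
        Set<Integer> S = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        List<Set<Integer>> events = Arrays.asList(
            new HashSet<>(Arrays.asList(1, 2, 3)),
            new HashSet<>(Arrays.asList(4, 5, 6)));
        System.out.println(isPartition(S, events)); // prints true, matching part 1 above
    }
}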

At this point, we might wonder what exactly a partition does to a sample space. The answer is that a partition splits or fragments the sample space into disjoint pieces. For instance, if \(B_1, B_2, B_3 \) and \(B_4\) partition \(S\), then the picture might look like this:

[Figure: a sample space \(S\) divided into four disjoint regions \(B_1, B_2, B_3, B_4\)]

    Notice that \(B_1, B_2, B_3 \) and \(B_4\) are all mutually exclusive and that their union gives back the entire sample space \(S\).

The question we will now consider is: why are partitions important? Quite simply, partitions are important because they will help us compute the probability of an event \( A \). How so? Well, let us assume that \(B_1, B_2, B_3 \) and \(B_4\) form a partition on \(S\) and suppose we have some event \(A\) in \(S\). Consider the following diagram:

[Figure: the event \(A\) overlaid on the partitioned sample space, cut into the pieces \(A_1 = B_1 \cap A\), \(A_2 = B_2 \cap A\), \(A_3 = B_3 \cap A\) and \(A_4 = B_4 \cap A\)]

    We see that the event \(A = A_1 \cup A_2 \cup A_3 \cup A_4 = (B_1 \cap A) \cup (B_2 \cap A) \cup (B_3 \cap A) \cup (B_4 \cap A) \). Notice that \((B_1 \cap A), (B_2 \cap A), (B_3 \cap A)\) and \((B_4 \cap A) \) are all mutually exclusive. This means we have the following:

\begin{align*} P(A) & = P(A_1 \cup A_2 \cup A_3 \cup A_4) \\ & = P \big[(B_1 \cap A) \cup (B_2 \cap A) \cup (B_3 \cap A) \cup (B_4 \cap A) \big] \\ &= P(B_1 \cap A) + P(B_2 \cap A) + P(B_3 \cap A) + P(B_4 \cap A) ~~~ \text{by mutual exclusivity} \\&= P(B_1) P(A|B_1) + P(B_2) P(A|B_2) + P(B_3) P(A|B_3)+ P(B_4) P(A|B_4) ~~~ \text{by the Basic Conditioning Rule} \\ &= \sum_{j=1}^{4} P(B_j) P(A | B_j) \end{align*}

    The generalization of the above discussion yields The Law of Total Probability.

    Theorem: The Law of Total Probability \(\PageIndex{2}\)

Theorem: Suppose the events \( B_1, B_2, \ldots, B_k \) form a partition on a sample space \(S\) such that \( P(B_j) > 0 \) for \( j = 1, 2, 3, \ldots, k \). Then for every event \(A\) in \(S\),

    \[ P(A) = \sum_{j=1}^{k} P(B_j) P(A | B_j) \]

    Proof

Since \( B_1, B_2, \ldots, B_k \) form a partition on the sample space \(S\), the events \((B_1 \cap A), (B_2 \cap A), \ldots \) and \((B_k \cap A) \) form a partition on \(A\). That is, \((B_1 \cap A) \cup (B_2 \cap A) \cup \ldots \cup (B_k \cap A) = A\) and the events \((B_1 \cap A), (B_2 \cap A), \ldots \) and \((B_k \cap A) \) are mutually exclusive. Hence, we obtain the following:

\begin{align*} P(A) & = P \big[ (B_1 \cap A) \cup (B_2 \cap A) \cup \ldots \cup (B_k \cap A) \big] \\ &= P(B_1 \cap A) + P(B_2 \cap A) + \ldots + P(B_k \cap A) ~~~ \text{by mutual exclusivity} \\&= P(B_1) P(A|B_1) + P(B_2) P(A|B_2) + \ldots + P(B_k) P(A|B_k) ~~~ \text{by the Basic Conditioning Rule} \\ &= \sum_{j=1}^{k} P(B_j) P(A | B_j) \end{align*}
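
The Law of Total Probability also translates directly into code. Below is a minimal Java sketch (the function and array names are our own, for illustration only) that evaluates \( \sum_{j=1}^{k} P(B_j) P(A|B_j) \) from the partition probabilities and the conditional probabilities:

public class TotalProbability
{
    // priors[j] = P(B_j) and conditionals[j] = P(A | B_j), where the B_j partition S.
    public static double totalProbability(double[] priors, double[] conditionals)
    {
        double pA = 0.0;
        for (int j = 0; j < priors.length; j++)
            pA += priors[j] * conditionals[j];
        return pA;
    }

    public static void main(String[] args)
    {
        // The rapid-test numbers from Example 3 below:
        // P(B_1) = 0.0339, P(B_2) = 0.9661, P(A|B_1) = 0.802, P(A|B_2) = 0.08.
        System.out.println(totalProbability(new double[]{0.0339, 0.9661},
                                            new double[]{0.802, 0.08})); // approximately 0.1045
    }
}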

    Example 2 \(\PageIndex{3}\)

    Suppose Player 1 draws a card at random, without replacement, from a regular deck of 52 cards and the card remains facedown. Player 2 then draws a card at random from the deck of 51 cards. Show that the probability that Player 2 draws an ace is \( \frac{4}{52} \).

    Answer 1

Although this question has a simple counting solution (can you find it?), we will opt for a solution rooted in conditional probabilities. The difficulty in this problem is that we are unaware of the card that Player 1 has drawn. That is, either Player 1 drew an ace or Player 1 drew a non-ace. Let

    \begin{align*} B_1 &= \{ \text{Player 1 drew an ace} \} \\ B_2 &= \{ \text{Player 1 drew a non-ace} \} \end{align*}

Notice that \( B_1 \) and \( B_2 \) form a partition on \(S\). We wish to find the probability that Player 2 draws an ace and so we let \(A = \{ \text{Player 2 draws an ace} \} \). By The Law of Total Probability, we obtain the following:

    \begin{align*} P(A) &= P(B_1) P(A | B_1) + P(B_2) P( A | B_2 ) \\ &= \frac{4}{52} \bigg( \frac{3}{51} \bigg) + \frac{48}{52} \bigg( \frac{4}{51} \bigg) \\ &= \frac{4}{52} \\ &= \frac{1}{13} \end{align*}

    Answer 2

    Alternatively, we may make a tree diagram. To do so, our first level of branches represents our partition while the second level of branches represents the probability of the event we are interested in studying.

[Figure: tree diagram with first-level branches "Player 1 drew an ace" (4/52) and "Player 1 drew a non-ace" (48/52), each followed by a second-level branch for Player 2 drawing an ace (3/51 and 4/51, respectively)]

    We are asked to find the probability that Player 2 has an ace and so we sum the products of the appropriate branches together. Doing so yields:

\[ \frac{4}{52} \bigg( \frac{3}{51} \bigg) + \frac{48}{52} \bigg( \frac{4}{51} \bigg) = \frac{4}{52} = \frac{1}{13} \nonumber \]
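
Since we will later verify the Monty Hall problem by simulation, we can do the same here. The following is a minimal Java sketch (our own illustration, not part of the original example) that repeatedly shuffles a 52-card deck and records how often the second card drawn is an ace; the relative frequency should hover near \( \frac{1}{13} \approx 0.0769 \):

import java.util.*;

public class SecondCardAce
{
    public static void main(String[] args)
    {
        // Represent the deck by ranks 1 through 13, four copies of each; rank 1 is an ace.
        List<Integer> deck = new ArrayList<>();
        for (int rank = 1; rank <= 13; rank++)
            for (int suit = 0; suit < 4; suit++)
                deck.add(rank);

        int numSimulations = 1000000;
        int aceCount = 0;
        for (int i = 0; i < numSimulations; i++)
        {
            Collections.shuffle(deck);
            // Player 1 takes deck.get(0) facedown; Player 2 takes deck.get(1).
            if (deck.get(1) == 1)
                aceCount++;
        }
        // Should print a value close to 1/13, approximately 0.0769.
        System.out.println((double) aceCount / numSimulations);
    }
}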

    Example 3 \(\PageIndex{4}\)

Suppose a 15 minute rapid antigen test for the SARS-CoV-2 virus is 80.2% effective in detecting the virus when it is present. However, the test also yields a false positive 8% of the time, meaning that if you do not have the virus, the test will say you are positive with probability 8%. Assume that 3.39% of people living in Queens, New York have the virus. Suppose a person living in Queens takes the test. Find the probability that the test comes back positive.

    Answer 1

As usual, we first must introduce a partition. Let \(B_1 = \{ \text{the person has the virus} \} \) and \(B_2 = \{ \text{the person does not have the virus} \} \). Notice that \( B_1 \) and \( B_2 \) form a partition on \(S\). We wish to find the probability that the test comes back positive and so let \(A = \{ \text{the person tests positive} \} \). By The Law of Total Probability, we obtain the following:

\begin{align*} P(A) &= P(B_1) P(A | B_1) + P(B_2) P( A | B_2 ) \\ &= 0.0339(0.802) + 0.9661(0.08) \\ & \approx 0.1045 \end{align*}

    Answer 2

    Alternatively, we may make a tree diagram. To do so, our first level of branches represents our partition while the second level of branches represents the probability of the event we are interested in studying.

[Figure: tree diagram with first-level branches "has the virus" (0.0339) and "does not have the virus" (0.9661), each followed by a second-level branch for testing positive (0.802 and 0.08, respectively)]

    We are asked to find the probability that the person tests positive and so we sum the products of the appropriate branches together. Doing so yields:

\[ 0.0339(0.802) + 0.9661(0.08) \approx 0.1045 \nonumber \]

    Example 4 \(\PageIndex{5}\)

A helicopter is missing and it is presumed that the helicopter was equally likely to have crash landed in any of three possible regions. There is a 60% chance the helicopter will be found upon a search of region 1 when the helicopter is, in fact, in that region. This probability is 80% and 20% for regions 2 and 3, respectively. What is the probability that the helicopter will not be found in region 1?

Note: Some students may wonder why there isn't a 100% chance the helicopter will be found upon a search of region 1 when the helicopter is, in fact, in that region. The answer is that just because you know an object is located within a certain region, you are not necessarily guaranteed to find it! Think about how many times you may have lost an item in your house or apartment. Just because you know the object is in your house, it does not mean that you are guaranteed to find it; there is some probability that you will overlook the item. The same is true here. Just because we may know that the helicopter crash landed in region 1, it is not guaranteed that we will find the helicopter when we search region 1, due to environmental and geographic features. For instance, if region 1 is a heavily forested area, then it may be difficult to spot the helicopter from above.

    Answer 1

As usual, we must first introduce a partition. Let \(B_i = \{ \text{the helicopter landed in region} ~ i \} \). Notice that the events \(B_1, B_2 \) and \(B_3\) form a partition on \(S\). We wish to find the probability that the helicopter will not be found in region 1 and so let \(A = \{ \text{the helicopter is not found in region 1} \} \). By The Law of Total Probability, we obtain the following:

    \begin{align*} P(A) &= P(B_1) P(A | B_1) + P(B_2) P( A | B_2 ) + P(B_3) P( A | B_3 ) \\ &= \frac{1}{3} (0.4) + \frac{1}{3} (1) + \frac{1}{3} (1) \\ & = 0.8 \end{align*}

    Answer 2

    Alternatively, we may make a tree diagram. To do so, our first level of branches represents our partition while the second level of branches represents the probability of the event we are interested in studying.

[Figure: tree diagram with first-level branches \(B_1\), \(B_2\), \(B_3\) (each 1/3), each followed by a second-level branch for the helicopter not being found in region 1 (0.4, 1 and 1, respectively)]

    We are asked to find the probability that the helicopter is not found in region 1 and so we sum the products of the appropriate branches together. Doing so yields \begin{align*} \frac{1}{3} (0.4) + \frac{1}{3} (1) + \frac{1}{3} (1) = 0.8 \end{align*}

    Answer 3

    Alternatively, we may make the probability tree where the first level still represents the partition, but the second level is determined by the information directly stated in the question.

[Figure: tree diagram with first-level branches \(B_1\), \(B_2\), \(B_3\) (each 1/3) and second-level branches "found"/"not found" upon a search of the region containing the helicopter: 0.6/0.4 for region 1, 0.8/0.2 for region 2 and 0.2/0.8 for region 3]

    We are asked to find the probability that the helicopter is not found in region 1 and so we sum the products of the appropriate branches together. Doing so yields \begin{align*} \frac{1}{3} (0.4) + \frac{1}{3} (0.8) + \frac{1}{3} (0.2) + \frac{1}{3} (0.2) + \frac{1}{3} (0.8) = 0.8 \end{align*}

    Note: You may equivalently note that \( 1 - \frac{1}{3}(0.6) = 1 - 0.2 = 0.8 \).

    With The Law of Total Probability under our belts, we are now in a position to state one of the most important results in probability and statistics - Bayes' Theorem.

    Theorem: Bayes' Theorem \(\PageIndex{6}\)

Theorem: Suppose the events \(B_1, B_2, \ldots, B_k \) form a partition on a sample space \(S\) such that \( P(B_j)>0 \) for \( j = 1, 2, \ldots, k \) and also suppose that \(A\) is an event in \(S\) such that \(P(A)>0\). Then for \(i = 1, 2, \ldots, k \), \[ P(B_i | A) = \frac{P(B_i)P(A|B_i)}{\sum_{j=1}^{k} P(B_j) P(A | B_j)} \]

    Proof:

\begin{align*} P(B_i | A) &= \frac{P(B_i \cap A)}{P(A)} ~~~ \text{by the definition of conditional probability} \\ \\ &= \frac{P(B_i)P(A|B_i)}{P(A)} ~~~ \text{by the Basic Conditioning Rule} \\ \\ &= \frac{P(B_i)P(A|B_i)}{\sum_{j=1}^{k} P(B_j) P(A | B_j)} ~~~ \text{by The Law of Total Probability} \end{align*}
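
Like The Law of Total Probability, Bayes' Theorem is straightforward to compute. Here is a minimal Java sketch (again, the names are our own, for illustration only) that returns the posterior probability \( P(B_i | A) \) from the priors and conditionals:

public class BayesPosterior
{
    // priors[j] = P(B_j) and conditionals[j] = P(A | B_j); returns P(B_i | A).
    public static double posterior(int i, double[] priors, double[] conditionals)
    {
        // The denominator is P(A), computed via The Law of Total Probability.
        double pA = 0.0;
        for (int j = 0; j < priors.length; j++)
            pA += priors[j] * conditionals[j];
        return priors[i] * conditionals[i] / pA;
    }

    public static void main(String[] args)
    {
        // The rapid-test numbers from Example 5 below; should print approximately 0.2602.
        System.out.println(posterior(0, new double[]{0.0339, 0.9661},
                                        new double[]{0.802, 0.08}));
    }
}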

    Example 5 \(\PageIndex{7}\)

Suppose a 15 minute rapid antigen test for the SARS-CoV-2 virus is 80.2% effective in detecting the virus when it is present. However, the test also yields a false positive 8% of the time. Assume that 3.39% of people living in Queens, New York have the virus. Suppose a person living in Queens takes the test and it is learned that the test comes back positive. Find the probability that the person actually has the virus.

    Answer 1

We will stick to the same notation used in Example 3. Let \(B_1 = \{ \text{the person has the virus} \} \), \(B_2 = \{ \text{the person does not have the virus} \} \) and \(A = \{ \text{the person tests positive} \} \). We are asked to find \(P(B_1 | A) \). By Bayes' Theorem,

\begin{align*} P(B_1 | A) &= \frac{P(B_1)P(A|B_1)}{\sum_{j=1}^{2} P(B_j) P(A | B_j)} \\ \\ &= \frac{P(B_1)P(A|B_1)}{P(B_1)P(A|B_1) + P(B_2)P(A|B_2)} \\ \\ &= \frac{0.0339(0.802)}{0.0339(0.802) + 0.9661(0.08) } \\ \\ & \approx 0.2602 \end{align*}

    Answer 2

    Alternatively, we may use our diagram to answer this question.

[Figure: the tree diagram from Example 3]

    We are asked to find \( P( \text{have the virus} | \text{tested positive}) \). By Bayes' Theorem, we obtain the following:

\begin{align*} P( \text{have the virus} | \text{tested positive} ) &= \frac{P(\text{has the virus AND tests positive})}{P(\text{tests positive})} \\ \\ &= \frac{0.0339(0.802)}{0.0339(0.802) + 0.9661(0.08) } \\ \\ & \approx 0.2602 \end{align*}
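
We can also check this posterior empirically. The sketch below (our own illustration) simulates many people, infects each with probability 0.0339, applies the imperfect test, and estimates \( P( \text{have the virus} | \text{tested positive} ) \) as a ratio of counts; the printed value should be close to 0.2602:

import java.util.Random;

public class RapidTestPosterior
{
    public static void main(String[] args)
    {
        Random r = new Random();
        int numSimulations = 1000000;
        int positives = 0;       // number of positive tests
        int sickAndPositive = 0; // number of positive tests among people who have the virus

        for (int i = 0; i < numSimulations; i++)
        {
            boolean hasVirus = r.nextDouble() < 0.0339;
            // The test detects the virus with probability 0.802 and
            // yields a false positive with probability 0.08.
            boolean testsPositive = hasVirus ? (r.nextDouble() < 0.802)
                                             : (r.nextDouble() < 0.08);
            if (testsPositive)
            {
                positives++;
                if (hasVirus)
                    sickAndPositive++;
            }
        }
        // Conditional relative frequency estimating P(have the virus | tested positive).
        System.out.println((double) sickAndPositive / positives);
    }
}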

    Example \(\PageIndex{8}\)

A helicopter is missing and it is presumed that the helicopter was equally likely to have crash landed in any of three possible regions. There is a 60% chance the helicopter will be found upon a search of region 1 when the helicopter is, in fact, in that region. This probability is 80% and 20% for regions 2 and 3, respectively. What is the probability that the helicopter is in region 1 given that the helicopter was not found upon an initial search of region 1?

    Answer 1

    We stick to the same notation as in Example 4. Let \(B_i = \{ \text{the helicopter landed in region} ~ i \} \) and let \(A = \{ \text{the helicopter is not found in region 1} \} \). By Bayes' Theorem, we obtain the following:

    \begin{align*} P(B_1 | A) &= \frac{P(B_1 \cap A)}{P(A)} \\ \\ &= \frac{P(B_1 \cap A)}{P(B_1) P(A | B_1) + P(B_2) P( A | B_2 ) + P(B_3) P( A | B_3 )} \\ \\ &= \frac{\frac{1}{3} \times 0.4}{\bigg( \frac{1}{3} \times 0.4 \bigg) + \bigg(\frac{1}{3} \times 1 \bigg)+ \bigg(\frac{1}{3} \times 1 \bigg)} \\ \\ &= \frac{1}{6}\end{align*}

    Answer 2

    Alternatively, we may use the following tree diagram.

[Figure: the tree diagram from Answer 2 of Example 4]

    We are asked to find \( P( \text{the helicopter is in region 1} | \text{the helicopter is not found in region 1}) \). By Bayes' Theorem, we obtain the following:

\begin{align*} P( \text{the helicopter is in region 1} | \text{the helicopter is not found in region 1}) &= \frac{P(\text{the helicopter is in region 1 AND the helicopter is not found in region 1})}{P(\text{the helicopter is not found in region 1})} \\ \\ &= \frac{ \frac{1}{3} \times 0.4}{ \frac{1}{3} (0.4) + \frac{1}{3} (1) + \frac{1}{3} (1) } \\ \\ & = \frac{1}{6} \end{align*}

    Answer 3

    With regards to the alternative tree diagram:

[Figure: the alternative tree diagram from Answer 3 of Example 4]

\begin{align*} P( \text{the helicopter is in region 1} | \text{the helicopter is not found in region 1}) &= \frac{P(\text{the helicopter is in region 1 AND the helicopter is not found in region 1})}{P(\text{the helicopter is not found in region 1})} \\ \\ &= \frac{ \frac{1}{3} \times 0.4}{ \frac{1}{3} (0.4) + \frac{1}{3} (0.8) + \frac{1}{3} (0.2) + \frac{1}{3} (0.2) + \frac{1}{3} (0.8) } \\ \\ & = \frac{1}{6} \end{align*}

    Example \(\PageIndex{9}\)

A life insurance agency believes that its clients who are senior citizens can be divided into two classes: those who are in good health and those who are not in good health. The agency reports that a senior citizen in good health will pass away within a one year period with probability 0.09, whereas this probability is 0.26 for a senior citizen who is not in good health. Suppose that 10% of the agency's senior citizens are in good health. Given that a senior citizen passed away, what is the probability that the senior citizen was not in good health?

    Answer

Here, we only present the tree diagram.

[Figure: tree diagram with first-level branches "in good health" (0.1) and "not in good health" (0.9), each followed by a second-level branch for passing away within one year (0.09 and 0.26, respectively)]

    Looking at the above diagram, we see that

    \begin{align*} P( \text{senior citizen is not in good health} | \text{senior citizen passed away} ) &= \frac{P( \text{senior citizen is not in good health AND passes away} ) }{P(\text{senior citizen passed away})} \\ \\ &= \frac{0.9(0.26)}{0.1(0.09) + (0.9)(0.26)} \\ \\ & \approx 0.9630 \end{align*}

    We will end this section with a discussion of The Monty Hall Problem.

    Example: The Monty Hall Problem \(\PageIndex{10}\)

Suppose that you are a contestant on a game show and you have to choose between three closed doors. Whatever is behind the chosen door is your prize. Behind one door is a brand new car, which is what you would like to have, and behind the other two doors are goats. The host of the show, Monty, knows where the car and the goats are located, but you, being the contestant, do not. Suppose you pick a door at random. For concreteness, let us suppose that you pick Door 1. Before Monty opens the door you chose, he randomly opens up a different door which reveals a goat behind it. (Note that he never opens the door with the car first.) For concreteness, let us suppose that Monty opens up Door 3 to reveal the goat. At this point in time, there are only two doors remaining - your door, Door 1, and the other door, Door 2. Monty then asks you if you would like to keep your door or if you would like to switch to the remaining door. What should you do and why?

    Take a moment and think about the question before reading further!

    Caution

Students who are unfamiliar with the above problem often have the following thought process: after Monty opens a door to reveal a goat, there are only two possible outcomes:

    1. My initial door has the car and the other door has the goat.
    2. My initial door has the goat and the other door has the car.

    Hence if I keep my door, there is a 1 out of 2 chance or a 50% chance that I win. Similarly, if I switch, there is a 50% chance that I win and so I am indifferent to switching since everything is equally likely.

    If your thought process was similar to the above argument, then you are not alone.

According to Wikipedia (https://en.Wikipedia.org/wiki/Monty_Hall_problem):

    "The Monty Hall problem is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let's Make a Deal and named after its original host, Monty Hall. The problem was originally posed (and solved) in a letter by Steve Selvin to the American Statistician in 1975. It became famous as a question from reader Craig F. Whitaker's letter quoted in Marilyn vos Savant's "Ask Marilyn" column in Parade magazine in 1990:

    Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

    Savant's response was that the contestant should switch to the other door. Under the standard assumptions, the switching strategy has a 2/3 probability of winning the car, while the strategy of sticking with the initial choice has only a 1/3 probability.

    Many readers of Savant's column refused to believe switching is beneficial and rejected her explanation. After the problem appeared in Parade, approximately 10,000 readers, including nearly 1,000 with PhDs, wrote to the magazine, most of them calling Savant wrong. Even when given explanations, simulations, and formal mathematical proofs, many people still did not accept that switching is the best strategy. Paul Erdős, one of the most prolific mathematicians in history, remained unconvinced until he was shown a computer simulation demonstrating Savant's predicted result.

    The problem is a paradox of the veridical type, because the solution is so counterintuitive it can seem absurd but is nevertheless demonstrably true."

Let us suppose that there is a 50% chance of winning if we stay with our door. This means that if we were to conduct the experiment 1000 times, we would expect to win, on average, 500 times. Let us see if that is indeed true by running a simulation in Java.

    Using a Java compiler such as https://www.jdoodle.com/online-java-compiler/, run the following code provided by Miguel R. Osorio, Queens College:

    Code

import java.util.*;

public class Main
{
    public static void main(String[] args)
    {
        // this variable holds the number of correct outcomes from running our simulations
        int correctOutcomes = 0;

        // this is the number of simulations we want to run
        int numSimulations = 1000;

        // this for loop runs our simulation of a single Monty Hall problem "numSimulations"
        // times; if a simulation was successful, we increase our count of correct outcomes
        // by 1, otherwise we do not add anything
        for (int i = 0; i < numSimulations; i++)
        {
            if (runSimOnce() == 1)
                correctOutcomes++;
            // correctOutcomes += (runSimOnce() == 1) ? 1 : 0; is another way to do this
        }

        System.out.println("Number of times we win the car if we stay with our door (out of 1000): " + correctOutcomes);
        System.out.println("Number of times we lose the car if we stay with our door (out of 1000): " + (numSimulations - correctOutcomes));

        // reset the number of correct outcomes after running the simulation without swapping
        correctOutcomes = 0;

        // this for loop runs the simulation "numSimulations" more times, now with the
        // strategy of switching doors
        for (int i = 0; i < numSimulations; i++)
        {
            if (runSimOnceWithSwap() == 1)
                correctOutcomes++;
        }

        System.out.println("\nNumber of times we win the car if we switch our door (out of 1000): " + correctOutcomes);
        System.out.println("Number of times we lose the car if we switch our door (out of 1000): " + (numSimulations - correctOutcomes));
        System.out.println("Total number of simulations ran: " + (numSimulations * 2));
        System.out.println(numSimulations + " ran with staying aka NOT switching");
        System.out.println(numSimulations + " ran with switching");
    }

    /**
     * @function runSimOnce
     *
     * this function simulates the Monty Hall problem by choosing a gate at random and
     * staying with it, giving around a 33% chance of success; the outcome is returned as
     * an integer, where 1 means we picked the gate that had the car and 0 means we
     * picked wrong.
     */
    public static int runSimOnce()
    {
        // this array is our set of gates; all start at 0, meaning the car is not behind
        // that gate
        int[] gates = {0, 0, 0};

        // calls our random number generator so that we can use it
        Random r = new Random();

        // this chooses a random gate x, where x is a random number in [0, gates.length),
        // and assigns it the value 1, effectively "placing the car behind that gate"
        gates[r.nextInt(gates.length)] = 1;

        // this selects a gate at random, all with equal probability
        int choice = r.nextInt(gates.length);

        // return 1 ("success") if the value of our gate is the car (1), or 0 otherwise
        return (gates[choice] == 1) ? 1 : 0;
    }

    /**
     * @function runSimOnceWithSwap
     *
     * this function simulates the Monty Hall problem by choosing a gate at random and then
     * swapping our original choice after one of the incorrect gates is revealed.
     *
     * @return 1 if we got the car or 0 if we did not get it after swapping
     */
    public static int runSimOnceWithSwap()
    {
        // this array is our set of gates; all start at 0, meaning the car is not behind
        // that gate
        int[] gates = {0, 0, 0};

        // calls our random number generator so that we can use it
        Random r = new Random();

        // "place the car" behind a uniformly random gate, as in runSimOnce
        gates[r.nextInt(gates.length)] = 1;

        // this selects a gate at random, all with equal probability
        int choice = r.nextInt(gates.length);

        // this section first chooses which gate to reveal, then changes our "choice" to
        // the gate that was NOT revealed AND that we did not choose first
        int reveal = findReveal(gates, choice);
        if ((reveal == 2 && choice == 1) || (reveal == 1 && choice == 2))
            choice = 0;
        else if ((reveal == 2 && choice == 0) || (reveal == 0 && choice == 2))
            choice = 1;
        else if ((reveal == 1 && choice == 0) || (reveal == 0 && choice == 1))
            choice = 2;

        // return 1 ("success") if, after swapping our choice, our gate holds the car (1),
        // or 0 otherwise
        return (gates[choice] == 1) ? 1 : 0;
    }

    /**
     * @function findReveal
     * this function finds the gate that Monty should reveal and returns its index, bound
     * by [0, gates.length)
     *
     * @param gates this is the set of gates that we started with
     * @param originalChoice this is the index of the gate that we originally chose
     * @return the index of the gate that is NOT the one we chose AND does not have the
     * car behind it
     */
    public static int findReveal(int[] gates, int originalChoice)
    {
        int reveal = 0;
        for (int i = 0; i < gates.length; i++)
            reveal = (gates[i] == 0 && i != originalChoice) ? i : reveal;
        return reveal;
    }
}

Running the code, we see that if we stay with our door, we win only around 300 times out of 1000. However, if we switch doors, we win well over 600 times. How can we see this mathematically?

    Example \(\PageIndex{11}\)

We revisit the Monty Hall problem stated in Example \(\PageIndex{10}\): you pick Door 1, Monty opens Door 3 to reveal a goat, and you must decide whether to keep Door 1 or switch to Door 2. What should you do and why?

    Answer

As usual, we introduce a partition. Let \(C_i = \{ \text{the car is behind Door} ~ i \} \) and let \(A = \{ \text{Monty opened up Door 3} \} \). Before computing, note that if the car is behind Door 1 (the door you chose), Monty chooses between Doors 2 and 3 at random, so \( P(A|C_1) = \frac{1}{2} \); if the car is behind Door 2, Monty is forced to open Door 3, so \( P(A|C_2) = 1 \); and if the car is behind Door 3, Monty never opens it, so \( P(A|C_3) = 0 \). We wish to find the probability that the car is behind Door 1 given this information \(A\) and compare it to the probability that the car is behind Door 2 given this information \(A\). By Bayes' Theorem, we obtain the following:

\begin{align*} P(C_1 | A) &= \frac{P(C_1)P(A|C_1)}{P(C_1)P(A|C_1) + P(C_2)P(A|C_2) + P(C_3)P(A|C_3)} \\ \\ &= \frac{ \frac{1}{3} P(A|C_1)}{ \frac{1}{3} P(A|C_1) + \frac{1}{3} P(A|C_2) + \frac{1}{3} P(A|C_3)} \\ \\ &= \frac{ \frac{1}{3} \bigg( \frac{1}{2} \bigg) }{ \frac{1}{3} \bigg( \frac{1}{2} \bigg) + \frac{1}{3} (1) + \frac{1}{3} (0)} \\ \\ &= \frac{1}{3} \end{align*}

    This shows us that the probability that the car is behind Door 1, given that Monty opened up Door 3 is \( \frac{1}{3} \). Thus the probability that the car is behind Door 2, given that Monty opened up Door 3 must be \( \frac{2}{3} \). Alternatively, note by Bayes' Theorem that

\begin{align*} P(C_2 | A) &= \frac{P(C_2)P(A|C_2)}{P(C_1)P(A|C_1) + P(C_2)P(A|C_2) + P(C_3)P(A|C_3)} \\ \\ &= \frac{ \frac{1}{3} P(A|C_2)}{ \frac{1}{3} P(A|C_1) + \frac{1}{3} P(A|C_2) + \frac{1}{3} P(A|C_3)} \\ \\ &= \frac{ \frac{1}{3} ( 1 ) }{ \frac{1}{3} \bigg( \frac{1}{2} \bigg) + \frac{1}{3} (1) + \frac{1}{3} (0)} \\ \\ &= \frac{2}{3} \end{align*}

    Hence, we are twice as likely to win the car by switching doors.

    If you are still not convinced of the result proven above, allow us to make a simple observation about the Monty Hall problem. We can have one of two strategies:

    Strategy 1: Our strategy is to switch doors.

    Strategy 2: Our strategy is to keep our door.

Under Strategy 1, notice that we always win by switching whenever we initially select a door with a goat behind it. This forces Monty to open up the other door with a goat, and so by switching we must land on the car.

    Under Strategy 2, the only time we win by keeping our door is if the initial door we select has the car.

Notice that under Strategy 1, there is a \( \frac{2}{3} \) chance we select a door with a goat behind it, whereas under Strategy 2, there is only a \( \frac{1}{3} \) chance we select the door with the car. Since we are more likely to select a door with a goat, Strategy 1 is far more beneficial for us.

    If you wish to see the tree diagram, then here is one of (many!) explanations: https://www.youtube.com/watch?v=cphYs1bCeDs


    7.1: Bayes' Theorem is shared under a CC BY license and was authored, remixed, and/or curated by LibreTexts.
