4.5: Total probability


There’s a very useful fact that goes by the grandiose name “The Law of Total Probability.” It goes like this. If there’s an event whose probability we’d like to know, we can split it up into pieces and add up their probabilities, as long as we do it in the right way.

“The right way” bit is the key, of course. And it has to do with partitions. Recall from section 12 that a partition of a set is a mutually exclusive and collectively exhaustive group of subsets. One example is that every set and its complement together form a partition of \(\Omega\). By the same token, for any sets \(A\) and \(B\), these two sets together form a partition of \(A\):

\[\begin{aligned} A \cap B \\ A \cap \overline{B}\end{aligned}\]

This is worth taking a moment to understand completely. Suppose \(A\) is the set of all WWE professional wrestling fans, and \(B\) is the set of all people born in southern states. The first set listed above, \(A \cap B\), contains the professional wrestling fans born in southern states, and the second set, \(A \cap \overline{B}\), contains the wrestling fans not born in southern states. Clearly, every wrestling fan is in one of these two sets, and no fan is in both. So it’s a partition of \(A\). This works for any two sets \(A\) and \(B\): \(A \cap B\) and \(A \cap \overline{B}\) are a partition of \(A\). We’re just dividing up the A’s into the A’s that are also B’s, and the A’s that are not B’s. Every A is in one (and just one) of those groups.

This idea can be extended to more than two sets. Let \(C_1\) be the set of all people born in southern states, \(C_2\) the set of people born in western states, and \(C_3\) those not born in either region. (The set \(C_3\) includes lots of things: people born in Ohio, people born in Taiwan, and ham sandwiches, among others.) The following three sets, then, together form another partition of \(A\): \(A \cap C_1\), \(A \cap C_2\), and \(A \cap C_3\). This is because every professional wrestling fan is either born in the south, or born in the west, or neither one.
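The partition claim can be checked mechanically with Python sets. This is only a sketch: the toy universe `omega`, the people in it, and their birthplaces are all made up for illustration.

```python
# Hypothetical toy universe; names and birthplaces are invented for this sketch.
omega = {"Ann", "Bo", "Cy", "Di", "Ed", "Flo"}
A = {"Ann", "Bo", "Cy", "Di"}   # wrestling fans
C1 = {"Ann", "Ed"}              # born in a southern state
C2 = {"Bo", "Cy"}               # born in a western state
C3 = omega - C1 - C2            # born in neither region

pieces = [A & C1, A & C2, A & C3]

# Collectively exhaustive: the pieces together give back all of A.
covers_A = set().union(*pieces) == A

# Mutually exclusive: no element is counted in two different pieces,
# so the piece sizes add up to the size of their union.
disjoint = sum(len(p) for p in pieces) == len(set().union(*pieces))

print(covers_A, disjoint)
```

Both checks come out true here, which is exactly what it means for the three intersections to partition \(A\).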

Okay, now back to probability. In the two-set case, no matter what the event \(A\) is, we can divide up its probability like this:

\[\begin{aligned} \text{Pr}(A) &= \text{Pr}(A \cap B) + \text{Pr}(A \cap \overline{B}) \\ &= \text{Pr}(A|B) \text{Pr}(B) + \text{Pr}(A|\overline{B}) \text{Pr}(\overline{B})\end{aligned}\]

where \(B\) is any other event. The last step makes use of the conditional probability definition from above. We’re dividing up A into the B’s and the non-B’s, in a strategy to determine A’s probability. In the general case, if \(N\) sets named \(C_k\) (where \(k\) is a number from 1 to \(N\)) make up a partition of \(\Omega\), then:

\[\begin{aligned} \text{Pr}(A) &= \text{Pr}(A \cap C_1) + \text{Pr}(A \cap C_2) + \cdots + \text{Pr}(A \cap C_N) \\ &= \text{Pr}(A|C_1) \text{Pr}(C_1) + \text{Pr}(A|C_2) \text{Pr}(C_2) + \cdots + \text{Pr}(A|C_N) \text{Pr}(C_N) \\ &= \sum_{k=1}^N{\text{Pr}(A|C_k) \text{Pr}(C_k)}\end{aligned}\]

is the formula.¹
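As a rough sketch, the general formula translates into a few lines of Python. The function name `total_probability` and the numbers in the example are assumptions made for illustration, not something from the text.

```python
import math

def total_probability(cond_probs, part_probs):
    """Return the sum over k of Pr(A|C_k) * Pr(C_k).

    cond_probs[k] holds Pr(A|C_k); part_probs[k] holds Pr(C_k).
    """
    if len(cond_probs) != len(part_probs):
        raise ValueError("need one conditional probability per partition member")
    # The C_k partition the whole sample space, so their probabilities sum to 1.
    if not math.isclose(sum(part_probs), 1.0):
        raise ValueError("the Pr(C_k) values must sum to 1")
    return sum(a * c for a, c in zip(cond_probs, part_probs))

# Two-set case with made-up numbers:
# Pr(A|B) = 0.9, Pr(A|not B) = 0.2, Pr(B) = 0.3, Pr(not B) = 0.7.
pr_a = total_probability([0.9, 0.2], [0.3, 0.7])
print(round(pr_a, 2))  # 0.41
```

The sum-to-1 check is a cheap guard against forgetting a member of the partition; it cannot detect sets that overlap.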

Let’s take an example of this approach. Suppose that as part of a promotion for Muvico Cinemas movie theatre, we’re planning to give a door prize to the \(1000^{\text{th}}\) customer this Saturday afternoon. We want to know, though, the probability that this person will be a minor. Figuring out how many patrons overall will be under 18 might be difficult. But suppose we’re showing these three films on Saturday: The Avengers, Black Swan, and Dr. Seuss’s The Lorax. We can estimate the fraction of each movie’s viewers that will be minors: .6, .01, and .95, respectively. We can also predict how many tickets will be sold for each film: 2,000 for The Avengers, 500 for Black Swan, and 1,000 for The Lorax.

Applying frequentist principles, we can compute the probability that a particular visitor will be seeing each of the movies:

\[\begin{aligned} \text{Pr(Avengers)} &= \frac{2000}{2000+500+1000} = .571 \\ \text{Pr(BlackSwan)} &= \frac{500}{2000+500+1000} = .143 \\ \text{Pr(Lorax)} &= \frac{1000}{2000+500+1000} = .286\end{aligned}\]

To be clear: this is saying that if we select a visitor at random on Saturday, the probability that they will be seeing The Avengers is .571.
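The same arithmetic can be reproduced in a short Python sketch, using the predicted ticket sales given earlier (2,000, 500, and 1,000). The dictionary names are just illustrative choices.

```python
# Predicted ticket sales from the example.
tickets = {"Avengers": 2000, "BlackSwan": 500, "Lorax": 1000}
total = sum(tickets.values())  # 3500 tickets overall

# Frequentist estimate: each movie's share of all tickets sold.
pr_movie = {movie: sold / total for movie, sold in tickets.items()}

for movie, p in pr_movie.items():
    print(f"Pr({movie}) = {p:.3f}")
# Pr(Avengers) = 0.571
# Pr(BlackSwan) = 0.143
# Pr(Lorax) = 0.286
```

The three probabilities sum to 1, as they must if every visitor sees exactly one film.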

But (and this is the trick) we can also compute the conditional probability that an attendee of each of these films will be a minor:

\[\begin{aligned} \text{Pr(minor}|\text{Avengers)} &= .6 \\ \text{Pr(minor}|\text{BlackSwan)} &= .01 \\ \text{Pr(minor}|\text{Lorax)} &= .95\end{aligned}\]

In words: “If we know that a visitor is coming to see The Avengers, there’s a .6 probability that they’ll be a minor.” We’re using the background knowledge to determine the conditional probability. It might be hard to figure out the probability of minors in general, but easier to figure out the probability of minors watching a specific movie.

Now, it’s just a matter of stitching together the parts:

\[\begin{aligned} \text{Pr(minor)} = &\ \text{Pr(minor}|\text{Avengers) Pr(Avengers)} + \\ &\ \text{Pr(minor}|\text{BlackSwan) Pr(BlackSwan)} + \\ &\ \text{Pr(minor}|\text{Lorax) Pr(Lorax)}\\ = &\ .6 \cdot .571 + .01 \cdot .143 + .95 \cdot .286 \\ = &\ .343 + .00143 + .272 \approx .616\end{aligned}\]

In words, there are three different ways for a visitor to be a minor: they could be an Avengers fan and a minor (pretty likely, since there’s lots of Avengers fans), or a Black Swan fan and a minor (not likely), or a Lorax fan and a minor (fairly likely, since although there’s not a ton of Lorax fans overall, most of them are minors). Adding up these probabilities is legit only because the three movies form a partition of the visitors (i.e., every visitor is there to see one and only one movie).
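One way to convince yourself of this result is to compare it against a plain headcount. The sketch below computes Pr(minor) twice: once with the Law of Total Probability, and once by dividing the expected number of minors by the total number of visitors. The variable names are illustrative, and the "expected minors" figures simply take the estimated fractions at face value.

```python
tickets = {"Avengers": 2000, "BlackSwan": 500, "Lorax": 1000}
pr_minor_given = {"Avengers": 0.6, "BlackSwan": 0.01, "Lorax": 0.95}
total = sum(tickets.values())

# Law of Total Probability: sum of Pr(minor|movie) * Pr(movie).
by_formula = sum(pr_minor_given[m] * (tickets[m] / total) for m in tickets)

# Headcount view: expected minors at each film (1200 + 5 + 950 = 2155),
# divided by all 3500 visitors.
expected_minors = sum(pr_minor_given[m] * tickets[m] for m in tickets)
by_headcount = expected_minors / total

print(round(by_formula, 3), round(by_headcount, 3))  # 0.616 0.616
```

The two routes agree because each minor is counted under exactly one movie, which is the partition requirement again.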

The Law of Total Probability comes in handy in scenarios where there’s more than one “way” for an event to occur. It lets you break that event apart into the different ways, then apply your knowledge of the likelihood of each of those ways in order to compute the grand, overall probability of the event.

¹ If you’re not familiar with the notation in that last line, realize that Σ (a capital Greek “sigma”) just represents a sort of loop with a counter. The “k = 1” under the sign means that the counter is k and starts at 1; the “N” above the sign means the counter goes up to N, which is its last value. And what does the loop do? It adds up a cumulative sum. The thing being added to the total each time through the loop is the expression to the right of the sign. The last line with the Σ is just a more compact way of expressing the preceding line.
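To make the loop analogy concrete, here is a minimal Python sketch. The helper name `sigma` and the example term are made up for illustration.

```python
def sigma(term, n):
    """Add up term(k) for k = 1, 2, ..., n, as the Σ sign prescribes."""
    total = 0
    for k in range(1, n + 1):  # the counter k starts at 1; its last value is n
        total += term(k)       # add the expression to the running sum
    return total

# Example: the sum of k^2 for k from 1 to 4 is 1 + 4 + 9 + 16.
print(sigma(lambda k: k * k, 4))  # 30
```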