4.1: Simple Sample Spaces
Recall that the main topic of the last few sections was combinatorial analysis. In this section, we show how counting relates to probability. We first need the following two definitions.
If \(A\) is a finite set, then we define the cardinality of \(A\), denoted by \(|A|\), to be the number of elements in \(A\).
Compute the cardinalities of the following sets:
- \(S = \{ H, T \} \)
- \(S = \{ 1, 2, 3, 4, 5, 6 \} \)
- \(S = \{ \text{all possible selections of a president, vice president, secretary and treasurer from a group of 30 people} \} \).
Solution
- \( |S| = 2 \)
- \( |S| = 6 \)
- \( |S| = P_{30,4} \)
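The three cardinalities above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later so that `math.perm` is available; the variable names are chosen here for illustration only.

```python
from math import perm

# The first two sample spaces are small enough to list explicitly.
coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

# Ordered selections of 4 officers from 30 people: P_{30,4} = 30 * 29 * 28 * 27.
officer_slates = perm(30, 4)

print(len(coin), len(die), officer_slates)
```

Listing the set works for the first two cases, but the third is already too large to enumerate comfortably by hand, which is why the counting formulas matter.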
A simple sample space refers to a finite sample space where all of the outcomes are equally likely.
- Suppose an experiment consists of flipping a fair coin. Then \(S = \{ H, T \} \). This sample space is finite and since the coin is fair, both outcomes are equally likely. Hence, under this experiment, \(S\) is a simple sample space.
- Suppose an experiment consists of rolling a regular 6-sided (fair) die. Then \(S = \{ 1, 2, 3, 4, 5, 6 \} \). This sample space is finite and since the die is fair, all six outcomes are equally likely. Hence, under this experiment, \(S\) is a simple sample space.
- Suppose an experiment consists of rolling a 6-sided loaded die. That is, the die is weighted in such a way that it is more likely to roll a 6 as opposed to a 1. In this example, \(S \) is still \( \{ 1, 2, 3, 4, 5, 6 \} \). However, under this experiment, \(S\) is not a simple sample space since the outcome of rolling a 6 is more likely than rolling a 1.
We will now discuss an important result concerning simple sample spaces which will be our focus for the remainder of this section.
If \( S = \{ s_1, s_2, \ldots, s_n \} \) is a simple sample space then \[ P( \{ s_1 \} ) = P( \{ s_2 \} ) = \ldots = P( \{ s_n \} )= \frac{1}{n} \nonumber\ \]
- Proof
-
Since \(S = \{s_1\} \cup \{s_2\} \cup \ldots \cup \{s_n\} \) then \[ P(S) = P(\{s_1\} \cup \{s_2\} \cup \ldots \cup \{s_n\}) \nonumber\ \] The \(n\) singleton events \( \{s_1\}, \{s_2\}, \ldots, \{s_n\} \) are mutually exclusive and so by Axiom 3, we have \[ P(S) = P(\{s_1\}) + P(\{s_2\}) + \ldots + P(\{s_n\}) \label{f} \] By the definition of a simple sample space, all of the outcomes are equally likely and so we will denote their common probability by lowercase \(p\). That is, \( P(\{s_1\}) = P(\{s_2\}) = \ldots = P(\{s_n\}) = p\) and so Equation \ref{f} becomes
\begin{align*}
P(S) & = \underbrace{p + p + \ldots + p}_{n ~ \text{times}} \\[4pt]
1 & = np \\[4pt] p & = \frac{1}{n}
\end{align*}
If \( S = \{ s_1, s_2, \ldots, s_n \} \) is a simple sample space and \(A\) is any event in \(S\) such that \( |A| = m \), then \[ P(A) = \frac{m}{n} = \frac{|A|}{|S|} \]
- Proof
-
Since \( |A| = m \) then \(A\) has the following form \( A = \{ s_{i_{1}}, s_{i_{2}}, \ldots, s_{i_{m}} \} \). Thus, we have
\begin{align*}
P(A) & = P( \{s_{i_{1}}\} \cup \{s_{i_{2}}\} \cup \ldots \cup \{s_{i_{m}}\} ) \\[4pt]
P(A) & = \underbrace{P( \{s_{i_{1}}\}) + P(\{s_{i_{2}}\}) + \ldots + P(\{s_{i_{m}}\})}_{\text{by disjointness and Axiom 3}} \\[4pt]
P(A) & = \underbrace{\frac{1}{n} + \frac{1}{n} + \frac{1}{n} + \ldots + \frac{1}{n}}_{m ~ \text{times, since each probability is} ~ 1/n ~ \text{by the above theorem}} \\[4pt]
P(A) & =\frac{m}{n} \\[4pt]
P(A) & =\frac{|A|}{|S|}
\end{align*}
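The theorem above reduces probability to counting. The following is a minimal sketch of that idea; the helper name `prob` and the "even roll" event are illustrative choices and not part of the text. It is only valid when \(S\) is a simple sample space.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Return |A| / |S|; only meaningful if all outcomes in S are equally likely."""
    sample_space = set(sample_space)
    event = set(event) & sample_space  # an event is a subset of S
    return Fraction(len(event), len(sample_space))

# Hypothetical illustration: rolling an even number on a fair 6-sided die.
S = {1, 2, 3, 4, 5, 6}
A = {s for s in S if s % 2 == 0}
p_even = prob(A, S)
print(p_even)
```

Using `Fraction` keeps the answer exact, which makes it easy to compare against a hand computation.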
In order to answer questions concerning simple sample spaces, we will use the following framework:
- First define the event \(A\).
- Then define the sample space \(S\) by saying "Let \(S\) be the set of all outcomes where an outcome is _________" and fill in the blank.
- Argue that \(S\) is indeed a simple sample space. We typically do this by asking ourselves, "is there any reason to suspect that a particular outcome in \(S\) is more likely than another outcome in \(S\)?" If the answer is no then we are in a simple sample space. If the answer is yes, then we are not in a simple sample space and we should go back to step 2 and consider a different way to define an outcome.
- Count \(|S|\) by proceeding in steps if necessary.
- Count \(|A|\) by proceeding in steps if necessary.
- Our answer is given by \( \displaystyle P(A) = \frac{|A|}{|S|} \).
In the first example, we reconsider the problem which we posed in our very first discussion concerning probabilities.
Suppose a fair coin is tossed. What is the probability that a heads is obtained?
- Answer
-
Allow us to apply the framework outlined above.
- Let \(A = \{ \text{a heads is obtained} \} \).
- Let \(S\) be the set of all outcomes where an outcome is a result of the coin flip. Hence \(S = \{ H, T \} \) where \(H\) denotes that a head was obtained and \(T\) denotes that a tail was obtained.
- Is there any reason to suspect that one outcome in \(S\) is more likely than another outcome in \(S\)? Since the coin is fair, there is no reason to believe this. Hence we are in a simple sample space.
- The cardinality of \(S\) is simple enough to compute without having to introduce permutations, combinations, or multinomials. Quite simply, \(|S| = 2\).
- Similarly, \( |A| = 1 \).
- Hence, \( P(A) = \frac{|A|}{|S|} = \frac{1}{2} = 0.5 \).
From a group of 10 women and 12 men, 5 people are to be selected at random to be on a committee. What is the probability that we obtain 2 women and 3 men?
- Answer
-
Allow us to apply the framework outlined above.
- Let \(A = \{ \text{we obtain 2 women and 3 men} \} \).
- Let \(S\) be the set of all outcomes where an outcome is a subset of 5 people.
- Is there any reason to suspect that one outcome in \(S\) is more likely than another outcome in \(S\)? That is, is there any reason to suspect that a certain subset of 5 specific people is more likely than another subset of 5 specific people? Since the selection happens at random, there is no reason to believe this. Hence we are in a simple sample space.
- The cardinality of \(S\) is \(|S| = \binom{22}{5} \).
- To find the cardinality of \(A\), we recall the corresponding example from Section 3.1: we choose 2 of the 10 women and then 3 of the 12 men. Hence, \( |A| = \binom{10}{2} \binom{12}{3} \).
- Hence, \( \displaystyle P(A) = \frac{\binom{10}{2} \binom{12}{3}}{\binom{22}{5}} \approx 0.3759 \).
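Because \(|S|\) is only a few tens of thousands here, the counts above can be confirmed by brute force. The following is a rough sketch that enumerates every 5-person subset; the labels `"W"` and `"M"` and the variable names are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

# Index the people so that all 22 are distinct: 10 women and 12 men.
people = [("W", i) for i in range(10)] + [("M", i) for i in range(12)]

# S: every subset of 5 people. A: those subsets containing exactly 2 women.
S = list(combinations(people, 5))
A = [group for group in S if sum(1 for sex, _ in group if sex == "W") == 2]

p = len(A) / len(S)
print(len(S), len(A), round(p, 4))
```

Enumeration like this does not scale to large problems, but it is a useful sanity check on a counting argument.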
In the above example, it was natural for an outcome to be a subset of 5 people since there was nothing special about the order in which we selected the elements. That being said, let us consider the following question: if for some reason, we did care about the order of selection, would the probability be any different? That is, if we regard an outcome as an ordered 5-tuple, then would the probability change? Let us take a closer look:
From a group of 10 women and 12 men, 5 people are to be selected at random to be on a committee. What is the probability that we obtain 2 women and 3 men?
- Answer
-
Allow us to apply the framework outlined above.
- Let \(A = \{ \text{we obtain 2 women and 3 men} \} \).
- Let \(S\) be the set of all outcomes where an outcome is an ordered 5-tuple of distinct people.
- Is there any reason to suspect that one outcome in \(S\) is more likely than another outcome in \(S\)? That is, is there any reason to suspect that a certain ordering of 5 specific people is more likely than another ordering of 5 specific people? Since the selection happens at random, there is no reason to believe this. Hence we are in a simple sample space.
- The cardinality of \(S\) is \(|S| = P_{22,5} \).
-
To find the cardinality of \(A\), we proceed in steps.
- We first choose a subset of two women. There are \( \binom{10}{2} = 45 \) ways to do this.
- We now select a subset of three men. There are \( \binom{12}{3} = 220 \) ways to do this.
- Since an outcome is ordered, we now must order these 5 distinct people. There are 5! ways to do so.
- So \(|A| = \binom{10}{2} \times \binom{12}{3} \times 5! \). Hence, \( \displaystyle P(A) = \frac{\binom{10}{2} \times \binom{12}{3} \times 5!}{P_{22,5}} \approx 0.3759 \), which is the same answer as above.
Notice that \[ \frac{\binom{10}{2} \binom{12}{3}}{\binom{22}{5}} = \frac{\binom{10}{2} \times \binom{12}{3} \times 5!}{P_{22,5}} \nonumber\ \] and so it turns out that it does not matter if we allow the five people to have an ordering - the probability will be the same. More generally, this is always the case for any experiment that involves a random selection of \(k\) elements from \(n\). We summarize this in the following remark.
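The equality of the two expressions can be checked exactly. The following is a small sketch, assuming Python 3.8 or later for `math.comb` and `math.perm`; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Outcomes as subsets of 5 people.
p_subsets = Fraction(comb(10, 2) * comb(12, 3), comb(22, 5))

# Outcomes as ordered 5-tuples of distinct people.
p_tuples = Fraction(comb(10, 2) * comb(12, 3) * factorial(5), perm(22, 5))

print(p_subsets == p_tuples, float(p_subsets))
```

The factor \(5!\) appears in both the numerator and the denominator of the ordered version, since \(P_{22,5} = 5! \binom{22}{5}\), so it cancels.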
Whenever an experiment consists of randomly selecting \(k\) elements from \(n\) elements, we have some flexibility in defining the outcomes of \(S\). We may either
- let an outcome be a subset of \(k\) elements, or
- let an outcome be an ordered \(k\) -tuple.
The choice is completely up to us. Either way, we will obtain the same probability.
The above remark essentially tells us that it is okay to imagine that the experiment has some sort of additional structure where we care about the ordering of the elements. Again, this only applies if the experiment requires us to randomly select \(k\) elements from \(n\) elements. Moreover, notice that the above remark is general in the sense that it does not matter whether the \(n\) elements are distinct. In the case where the \(n\) elements are not distinct, we can make them distinct by indexing them as previously discussed.
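One way to see why the two choices agree is sketched below, under the assumption that the event \(A\) depends only on which \(k\) elements are selected and not on the order in which they are selected. The notation \(A_{\text{sub}}, S_{\text{sub}}\) (outcomes are subsets) and \(A_{\text{ord}}, S_{\text{ord}}\) (outcomes are ordered \(k\)-tuples) is introduced here only for illustration. Every subset of \(k\) distinct elements can be ordered in \(k!\) ways, so \( |A_{\text{ord}}| = k! \, |A_{\text{sub}}| \) and \( |S_{\text{ord}}| = P_{n,k} = k! \binom{n}{k} = k! \, |S_{\text{sub}}| \). Therefore
\[ \frac{|A_{\text{ord}}|}{|S_{\text{ord}}|} = \frac{k! \, |A_{\text{sub}}|}{k! \, |S_{\text{sub}}|} = \frac{|A_{\text{sub}}|}{|S_{\text{sub}}|} \nonumber \]
and the factor \(k!\) cancels regardless of the values of \(n\) and \(k\).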
In the above example, some may argue that the subset route was simpler. However, this is not always the case. Consider the following problem which we first tackle via permutations.
From a group of 8 married couples, 4 people are chosen at random. Find the probability that the four people who are chosen are not related, meaning that no two of the four people belong to the same household.
- Answer
-
Allow us to apply our usual framework.
- Let \(A = \{ \text{the four chosen people are unrelated} \} \).
- Let \(S\) be the set of all outcomes where an outcome is an ordered 4-tuple of distinct people.
- Is there any reason to suspect that one outcome in \(S\) is more likely than another outcome in \(S\)? That is, is there any reason to suspect that a certain ordering of 4 specific people is more likely than another ordering of 4 specific people? Since the selection happens at random, there is no reason to believe this. Hence we are in a simple sample space.
- The cardinality of \(S\) is \(|S| = P_{16,4} \).
-
To find the cardinality of \(A\), we proceed in steps.
- We first choose any person. There are 16 ways to do this.
- We now must choose a person who is not related to the person above. There are 14 ways to do this.
- We now must choose a person who is not related to any previously selected person. There are 12 ways to do this.
- We now must choose a person who is not related to any previously selected person. There are 10 ways to do this.
- So \(|A| = 16 \times 14 \times 12 \times 10 \). Hence, \( \displaystyle P(A) = \frac{16 \times 14 \times 12 \times 10}{P_{16,4}} \approx 0.6154 \).
We now consider the same problem where the outcomes are subsets of four people.
From a group of 8 married couples, 4 people are chosen at random. Find the probability that the four people who are chosen are not related, meaning that no two of the four people belong to the same household.
- Answer
-
Allow us to apply our usual framework.
- Let \(A = \{ \text{the four chosen people are unrelated} \} \).
- Let \(S\) be the set of all outcomes where an outcome is a subset of 4 people.
- Is there any reason to suspect that one outcome in \(S\) is more likely than another outcome in \(S\)? That is, is there any reason to suspect that a certain subset of 4 specific people is more likely than another subset of 4 specific people? Since the selection happens at random, there is no reason to believe this. Hence we are in a simple sample space.
- The cardinality of \(S\) is \(|S| = \binom{16}{4} \).
-
To find the cardinality of \(A\), we proceed in steps. (Remember, we need a set of steps that will generate all subsets of 4 people where the 4 people are not related to each other. This can be difficult and requires some thought!)
- We first choose four couples. There are \( \binom{8}{4} \) ways to do this.
- Within each of the four couples, we will select one person. This will ensure that we end up with four people who are not related to each other. There are \( \binom{2}{1} \times \binom{2}{1} \times \binom{2}{1} \times \binom{2}{1} \) ways to do this.
- So \( |A| = \binom{8}{4} \times \binom{2}{1} \times \binom{2}{1} \times \binom{2}{1} \times \binom{2}{1} \). Hence, \( \displaystyle P(A) = \frac{ \binom{8}{4} \times \binom{2}{1} \times \binom{2}{1} \times \binom{2}{1} \times \binom{2}{1}}{\binom{16}{4}} \approx 0.6154 \).
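Both solutions to the couples problem can be verified by direct enumeration. The following is a sketch under the assumption that each person is encoded as a pair `(couple, spouse)`; that encoding and the helper `unrelated` are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import combinations, permutations

# 8 couples, 2 people per couple: 16 distinct people.
people = [(couple, spouse) for couple in range(8) for spouse in range(2)]

def unrelated(group):
    """True if no two people in the group come from the same couple."""
    couples = [couple for couple, _ in group]
    return len(set(couples)) == len(couples)

# Outcomes as subsets of 4 people.
subsets = list(combinations(people, 4))
p_subsets = Fraction(sum(unrelated(g) for g in subsets), len(subsets))

# Outcomes as ordered 4-tuples of distinct people.
tuples = list(permutations(people, 4))
p_tuples = Fraction(sum(unrelated(g) for g in tuples), len(tuples))

print(p_subsets, p_tuples, float(p_subsets))
```

Both enumerations give the same exact fraction, in line with the remark that either choice of outcome is acceptable.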
Reflecting back on both solutions, it appears that the permutation route was more straightforward. In general, you should choose the method best-suited to the problem.
If 3 marbles are randomly withdrawn from a bowl consisting of 997 green marbles and 3 red marbles, then what is the probability that one marble is green and the other two are red?
Critique this solution:
Allow us to apply our usual framework.
- Let \( A = \{ \text{we obtain 1 green marble and 2 red marbles} \} \).
- Let \(S \) denote the set of all outcomes where an outcome is the number of green marbles we get after selecting 3 marbles. Then, \( S = \{ 0 ~ \text{greens}, 1 ~ \text{green}, 2 ~ \text{greens}, 3 ~ \text{greens} \} \).
- Since the selection happens at random, there is no reason to believe that one outcome is more likely than another outcome.
- \( |S| = 4 \).
- \( |A| = 1 \).
- Hence \( P(A) = \frac{|A|}{|S|} = \frac{1}{4} \).
Is the above solution correct? Why or why not?
- Answer
-
The above answer is incorrect! Just because the selection happens at random, it does not mean that \(S\) is a simple sample space. After some reflection it should be clear that, based on the composition of the bowl, it is far more likely that we get 3 greens as opposed to 0 greens and so \(S\) is not a simple sample space. As we mentioned in our framework, whenever \(S\) is not a simple sample space, we should redefine \(S\). Here is how we can do this:
1) Let \( A = \{ \text{we obtain 1 green marble and 2 red marbles} \} \).
2) Notice in this experiment, we are selecting 3 elements at random from 1000 elements. By our remark, we can let an outcome be a subset of 3 elements or we may let an outcome be an ordered 3-tuple. For concreteness, we will let an outcome be a subset of 3 elements. For example, here are a few possible outcomes: \( \{ G_{17}, G_{409}, R_{2} \} \) or \( \{ G_{1}, G_{241}, G_{65} \} \) or \( \{ R_{1}, R_{2}, R_{3} \} \). Notice it is implied that the elements are indexed or distinct since a combination requires all elements to be distinct.
3) Since the selection happens at random, there is no reason to believe that one subset of three specific marbles is more likely than another subset of three specific marbles. Thus \(S\) is a simple sample space.
4) \( |S| = \binom{1000}{3} \).
5) To find \( |A| \), we proceed in steps. We need to make sure our outcome looks like this: \( \{ \underbrace{ \_\_ }_{green}, \underbrace{ \_\_, \_\_ }_{red} \} \). We first select one green. There are \( \binom{997}{1} \) ways to do this. We then must select two reds. There are \( \binom{3}{2} \) ways to do this. Hence, \( |A| = \binom{997}{1} \times \binom{3}{2} \).
6) Putting everything together, we obtain that \( P(A) = \frac{|A|}{|S|} = \frac{ \binom{997}{1} \times \binom{3}{2} }{\binom{1000}{3}} \approx 0.000018\).
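To see how far the four "number of greens" outcomes are from being equally likely, the counting argument in step 5 can be repeated for each possible number of greens. This is a sketch under that assumption; the constants and the dictionary `p_greens` are names chosen for illustration.

```python
from fractions import Fraction
from math import comb

GREEN, RED, DRAWN = 997, 3, 3

# |S| when an outcome is a subset of 3 of the 1000 indexed marbles.
size_S = comb(GREEN + RED, DRAWN)

# For each g: choose g greens and (DRAWN - g) reds, then divide by |S|.
p_greens = {
    g: Fraction(comb(GREEN, g) * comb(RED, DRAWN - g), size_S)
    for g in range(DRAWN + 1)
}

for g, p in p_greens.items():
    print(g, float(p))
```

The four probabilities sum to 1 but differ by many orders of magnitude, which is exactly why the first choice of \(S\) was not a simple sample space.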
Suppose two fair dice are rolled. What is the probability that we obtain a sum of 7? (You may assume that the dice are regular 6-sided dice).
Critique this solution:
Allow us to apply our usual framework.
- Let \( A = \{ \text{we obtain a sum of 7} \} \).
- Let \(S \) denote the set of all outcomes where an outcome is a possible sum that results from rolling a pair of fair dice. Then, \( S = \{ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 \} \).
- Since the dice are fair, there is no reason to believe that one outcome is more likely than another outcome.
- \( |S| = 11 \).
- \( |A| = 1 \).
- Hence \( P(A) = \frac{|A|}{|S|} = \frac{1}{11} \).
Is the above solution correct? Why or why not?
- Answer
-
This is incorrect because \(S\) is not a simple sample space. For if it were, then all of the outcomes would be equally likely. This means the chances of getting a sum of 2 would be the same as getting a sum of 7, but clearly this is not the case! There is only one configuration of the dice that would yield a sum of 2. However, there are multiple configurations of the dice that yield a sum of 7. If you are not convinced, then ask yourself if the following is a fair bet. If you roll a sum of 2 then I will give you \$100, but if I roll a sum of 7 then you have to give me \$100. Clearly, this is not a fair bet since there is only one way you can possibly win but I have multiple ways to win!
Since \(S\) is not a simple sample space, our framework tells us that we should re-define \(S\) in order to make it a simple sample space. How can we do this? Looking back at the problem, we notice that we are not selecting \(k\) elements from \(n\) and so the same trick that worked in the above problem no longer applies. In situations like these, our construction of \(S\) should be based on what is actually fair in the experiment. For this experiment, the dice themselves are fair and so an outcome should be a specification of the number obtained on each die. That is, we can think of an outcome as an ordered pair \( ( \underbrace{\_\_ }_{\text{Die}~1}, \underbrace{\_\_ }_{\text{Die}~2} ) \).
1. Let \( A = \{ \text{we obtain a sum of 7} \} \).
2. Let \(S \) denote the set of all outcomes where an outcome is an ordered pair. The first entry in the ordered pair denotes the outcome of rolling Die 1 and the second entry in the ordered pair denotes the outcome of rolling Die 2.
3. \(S\) is a simple sample space because there is no reason to believe that a certain ordered pair is more likely than another ordered pair since the dice are fair. Visually we can list out the outcomes in \(S\) via the following chart:
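| Die 1 (rows) / Die 2 (columns) | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | (1,1) | (1,2) | (1,3) | (1,4) | (1,5) | (1,6) |
| 2 | (2,1) | (2,2) | (2,3) | (2,4) | (2,5) | (2,6) |
| 3 | (3,1) | (3,2) | (3,3) | (3,4) | (3,5) | (3,6) |
| 4 | (4,1) | (4,2) | (4,3) | (4,4) | (4,5) | (4,6) |
| 5 | (5,1) | (5,2) | (5,3) | (5,4) | (5,5) | (5,6) |
| 6 | (6,1) | (6,2) | (6,3) | (6,4) | (6,5) | (6,6) |
4. \( |S| = 6^2 = 36 \).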
5. \( |A| = 6 \).
6. \( P(A) = \frac{|A|}{|S|} = \frac{6}{36} = \frac{1}{6} \).
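The ordered-pair sample space is small enough to enumerate, which also shows why the eleven possible sums are not equally likely. The following is a minimal sketch; the names `sum_counts` and `p_sum` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# S: all ordered pairs (Die 1, Die 2), each entry from 1 to 6.
S = list(product(range(1, 7), repeat=2))

# Count how many ordered pairs produce each sum.
sum_counts = Counter(d1 + d2 for d1, d2 in S)
p_sum = {total: Fraction(count, len(S)) for total, count in sorted(sum_counts.items())}

for total, p in p_sum.items():
    print(total, sum_counts[total], p)
```

Only one ordered pair gives a sum of 2, while six give a sum of 7, so the sums themselves cannot serve as outcomes of a simple sample space.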
The following two problems will make use of the theorems we proved in Section 1.3. Before we get into the next problem, allow us to make the following informal comment.
In Section 2.1, we stated the following two remarks:
1) The word "or" corresponds to the set theoretic operation of union and the word "and" corresponds to the set theoretic operation of intersection.
2) In rephrasing an event, we should always ask ourselves "can we describe this event by using the words or/and ?"
Sometimes when rephrasing the event, our rephrasing will use the word "or" many times. As such, we will be left with many unions when we describe the event in terms of sets. If the events in the union are disjoint, then we should be able to proceed directly. However, in the case that we have many unions and the events are not disjoint, the computation can be extremely complicated. For such a situation, it is better to proceed indirectly by studying the complement.
Our next problem illustrates the above comment.
If \( k \leq 365\) people are in a room, then what is the probability that at least two people will have the same birthday?
You may make the following assumptions:
- The same birthday means the same month and day, not year.
- No twins, triplets, etc. are present.
- Each of the 365 days is equally likely to be someone's birthday.
- The year is not a leap year.
- If someone's birthday is 2/29 then we will consider their birthday to be 3/01.
Solution
1. Let \( A = \{ \text{at least two people share the same birthday} \} \). This event can be rephrased as \(A = \{ \text{2 people share the same birthday} \} \cup \{ \text{3 people share the same birthday} \} \cup \ldots \cup \{ k ~ \text{people share the same birthday} \} \). Notice that the events are not disjoint! It is possible to have a room of \(k\) people where 2 people share the same birthday and then a group of 3 people also share the same birthday. Looking back at \(A\), we see that \(A\) is the union of many non-disjoint events and so proceeding directly is ill-advised. Rather, we will study its complement. Notice that
\begin{align*}
A^c &= \{ \text{it is not the case that at least two people share the same birthday} \} \\[4pt] & = \{ \text{no one shares the same birthday} \} \\[4pt] & = \{ \text{everyone has different birthdays} \} \\[4pt] & = \{ \text{Person 1 and Person 2 and Person 3 and} \ldots \text{and Person} ~ k ~ \text{all have different birthdays} \}
\end{align*}
2. Let \(S\) be the set of all outcomes where an outcome is a list of the birthdays for each person. That is, an outcome can be thought of as the following ordered \(k-\)tuple: \( ( \underbrace{\_\_}_{\text{Person 1's birthday}} , \underbrace{\_\_}_{\text{Person 2's birthday}}, \ldots, \underbrace{\_\_}_{\text{Person k's birthday}} ) \).
3. Based off the assumptions that we have outlined above, \(S\) is a simple sample space since there is no reason to believe that one list of birthdays is more likely than another list of birthdays.
4. \( |S| = 365^k \).
5. To find \( |A^c| \), we will proceed in steps.
- Select a birthday for Person 1. There are 365 choices.
- Select a different birthday for Person 2. There are 364 choices.
- Select a different birthday for Person 3. There are 363 choices.
....
k. Select a different birthday for Person k. There are \(365 - (k-1) = 365 - k + 1 \) choices.
By The Generalized Principle of Counting, \( |A^c| = 365 \times 364 \times \ldots \times (365 - k + 1) = P_{365,k} \).
6. Hence \( P(A^c) = \frac{|A^c|}{|S|} = \frac{ P_{365,k}}{365^k} \) and so
\[ P(A) = 1 - P(A^c) = 1 - \frac{ P_{365,k}}{365^k} \nonumber \]
Let us plot \(P(A) \) as a function of \(k\).
It seems that the probability approaches 1 rather quickly! Zooming in, we obtain the following graph:
Let us summarize some of the selected probabilities in the form of a chart.
| The number of people in the room, \( k \) | The probability that at least two people share the same birthday, \( P(A) \) |
|---|---|
| 1 | 0 |
| 5 | 0.027 |
| 10 | 0.117 |
| 15 | 0.253 |
| 20 | 0.411 |
| 25 | 0.569 |
| 30 | 0.706 |
| 40 | 0.891 |
| 50 | 0.970 |
| 60 | 0.994 |
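The entries in the chart can be reproduced from the formula \( P(A) = 1 - \frac{P_{365,k}}{365^k} \). The following is a short sketch, assuming Python 3.8 or later for `math.perm`; the function name `p_shared_birthday` and its `days` parameter are illustrative.

```python
from math import perm

def p_shared_birthday(k, days=365):
    """P(at least two of k people share a birthday) = 1 - P_{days,k} / days^k."""
    return 1 - perm(days, k) / days**k

for k in (1, 5, 10, 15, 20, 25, 30, 40, 50, 60):
    print(k, round(p_shared_birthday(k), 3))
```

Python integers have arbitrary precision, so `perm(365, k)` and `365**k` are computed exactly before the final division.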
Ten marbles are randomly withdrawn without replacement from a bowl that contains 10 red, 15 blue, and 20 green marbles. Find the probability that either exactly 4 red marbles or exactly 4 blue marbles are withdrawn.
- Answer
-
As usual, we apply our counting framework while keeping a keen eye out for ways to simplify the event of interest.
1) Let \(A = \{ \text{we obtain exactly 4 red marbles or exactly 4 blue marbles} \} \). The word "or" indicates that we have a union of two events. As such, let \(A_R = \{ \text{we obtain exactly 4 red marbles} \} \) and let \( A_B= \{ \text{we obtain exactly 4 blue marbles} \} \). Then \( A = A_R \cup A_B \). Notice that \(A_R \) and \( A_B \) are not disjoint since it is possible to select ten marbles and obtain 4 reds and 4 blues. However, we only have one union as opposed to many of them as seen in the previous example and so we will deal with this directly. Doing so yields the following:
\begin{align*} P(A) = P(A_R \cup A_B) &= P(A_R) + P(A_B) - P(A_R \cap A_B) \\ & = \frac{|A_R|}{|S|} + \frac{|A_B|}{|S|} - \frac{|A_R \cap A_B|}{|S|} ~~~~\text{provided we are in a simple sample space} \end{align*}
2) Since this experiment involves a random selection of \(k\) elements from \(n\), we have some flexibility in defining \(S\). We may either let an outcome be a subset of ten marbles or we may let an outcome be an ordered 10-tuple. For convenience, we will let an outcome be a subset of 10 marbles.
3) \(S\) is a simple sample space since there is no reason to believe a certain subset of 10 specific marbles is more likely than another subset of 10 specific marbles.
4) \( |S| = \binom{45}{10} \).
5) \( |A_R| = \binom{10}{4} \binom{35}{6} \), \( |A_B| = \binom{15}{4} \binom{30}{6} \), and \( |A_R \cap A_B| = \binom{10}{4} \binom{15}{4} \binom{20}{2} \).
Putting everything together yields:
\begin{align*} \displaystyle P(A) = \frac{\binom{10}{4} \binom{35}{6}}{\binom{45}{10}} + \frac{ \binom{15}{4} \binom{30}{6} }{\binom{45}{10}} - \frac{\binom{10}{4} \binom{15}{4} \ \binom{20}{2} }{\binom{45}{10}} \approx 0.3438 \end{align*}
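The inclusion-exclusion computation above can be evaluated exactly. The following is a minimal sketch with illustrative variable names, assuming Python 3.8 or later for `math.comb`.

```python
from fractions import Fraction
from math import comb

size_S = comb(45, 10)                                 # subsets of 10 from 45 marbles

size_AR = comb(10, 4) * comb(35, 6)                   # exactly 4 red, 6 non-red
size_AB = comb(15, 4) * comb(30, 6)                   # exactly 4 blue, 6 non-blue
size_both = comb(10, 4) * comb(15, 4) * comb(20, 2)   # 4 red, 4 blue, 2 green

# P(A_R or A_B) = P(A_R) + P(A_B) - P(A_R and A_B)
p = Fraction(size_AR + size_AB - size_both, size_S)
print(float(p))
```

Subtracting the intersection once corrects for the outcomes with 4 reds and 4 blues, which are counted in both \(|A_R|\) and \(|A_B|\).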