6.1: Discrete Random Variables
By the end of this section, the student should be able to:
- Recognize and use discrete random variables.
- Calculate and interpret expected values.
Consider the following two situations:
- A student takes a ten-question, true-false quiz. Because the student had such a busy schedule, he or she could not study and guessed randomly at each answer. What is the probability that the student passes the quiz with a score of at least 70%?
- Small companies might be interested in the number of long-distance phone calls their employees make during the peak time of the day. Suppose the average is 20 calls. What is the probability that the employees make more than 20 long-distance phone calls during the peak time?
These two examples illustrate two different types of probability problems involving discrete random variables. Recall that discrete data are data that you can count. A random variable describes the outcomes of a statistical experiment in words. The values of a random variable can vary with each repetition of an experiment.
Uppercase letters such as \(X\) or \(Y\) denote a random variable. Lowercase letters like \(x\) or \(y\) denote the value of a random variable. If \(X\) is a random variable and \(x\) is a given number, then \(X=x\) is an event, and its probability is written \(P(X=x)\).
For example, let \(X\) be the number of heads you get when you toss three fair coins. The sample space for the toss of three fair coins is TTT; THH; HTH; HHT; HTT; THT; TTH; HHH. Then, \(x = 0, 1, 2, 3\) are all possible outcomes of \(X\). Notice that for this example, the \(x\) values are countable outcomes. Because you can count the possible values that \(X\) can take on and the outcomes are random (the \(x\) values 0, 1, 2, 3), \(X\) is a discrete random variable. The event \(X=2\) is the event in which you get two heads when you toss three fair coins, and its probability is expressed as \(P(X=2)\).
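To make the counting concrete, here is a short Python sketch (our own illustration, not part of the original text) that lists the eight outcomes and tallies the number of heads in each. It assumes the coins are fair and independent, so each of the eight outcomes has probability \(\dfrac{1}{8}\); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of tossing three fair coins, e.g. ('H', 'T', 'T').
sample_space = list(product("HT", repeat=3))

# X = number of heads in an outcome; count how many outcomes give each value x.
counts = Counter(outcome.count("H") for outcome in sample_space)

# Each outcome has probability 1/8, so P(X = x) = (number of outcomes with x heads) / 8.
pmf = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

Running it prints \(P(X=0)=\dfrac{1}{8}\), \(P(X=1)=\dfrac{3}{8}\), \(P(X=2)=\dfrac{3}{8}\), and \(P(X=3)=\dfrac{1}{8}\), so the event \(X=2\) has probability \(\dfrac{3}{8}\).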
Discrete Random Variables
A discrete probability distribution table (function) lists possible values of the random variable along with the probability of each outcome. It has two characteristics:
- Each probability is between zero and one, inclusive.
- The sum of the probabilities is one.
A child psychologist is interested in the number of times a newborn baby's crying wakes its mother after midnight. For a random sample of 50 mothers, the following information was obtained. Let \(X\) be the number of times per week a newborn baby's crying wakes its mother after midnight. For this example, \(x = 0, 1, 2, 3, 4, 5\).
\(P(X=x)\) is the probability that \(X\) takes on a value \(x\).
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | \(P(X = 0) = \dfrac{2}{50}\) |
| 1 | \(P(X = 1) = \dfrac{11}{50}\) |
| 2 | \(P(X = 2) = \dfrac{23}{50}\) |
| 3 | \(P(X = 3) = \dfrac{9}{50}\) |
| 4 | \(P(X = 4) = \dfrac{4}{50}\) |
| 5 | \(P(X = 5) = \dfrac{1}{50}\) |
\(X\) takes on the values 0, 1, 2, 3, 4, 5. This is a discrete probability distribution function (PDF) because:
- Each \(P(X=x)\) is between zero and one, inclusive.
- The sum of the probabilities is one, that is,
\[\dfrac{2}{50} + \dfrac{11}{50} + \dfrac{23}{50} + \dfrac{9}{50} + \dfrac{4}{50} + \dfrac{1}{50} = 1\]
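The two characteristics can be checked mechanically. The Python sketch below uses a hypothetical helper, `is_valid_discrete_pmf`, written only for this illustration; it stores probabilities as exact `Fraction` values from the standard library so that the sum-to-one test is not thrown off by floating-point rounding.

```python
from fractions import Fraction

def is_valid_discrete_pmf(probabilities):
    """Check the two characteristics of a discrete probability distribution:
    every probability is between 0 and 1 inclusive, and the probabilities sum to 1."""
    each_in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = sum(probabilities) == 1
    return each_in_range and sums_to_one

# Newborn-crying example: P(X = x) for x = 0, 1, 2, 3, 4, 5.
newborn = [Fraction(n, 50) for n in (2, 11, 23, 9, 4, 1)]
print(is_valid_discrete_pmf(newborn))  # True
```

If the probabilities are stored as floats instead, the exact `== 1` comparison can fail because of rounding; comparing the sum with `math.isclose(total, 1)` is then a safer choice.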
A hospital researcher is interested in the number of times the average post-op patient will ring the nurse during a 12-hour shift. For a random sample of 50 patients, the following information was obtained. Let \(X\) be the number of times a patient rings the nurse during a 12-hour shift. For this exercise, \(x = 0, 1, 2, 3, 4, 5\). \(P(X=x)\) is the probability that \(X\) takes on value \(x\). Why is this a discrete probability distribution function (two reasons)?
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | \(P(X = 0) = \dfrac{4}{50}\) |
| 1 | \(P(X = 1) = \dfrac{8}{50}\) |
| 2 | \(P(X = 2) = \dfrac{16}{50}\) |
| 3 | \(P(X = 3) = \dfrac{14}{50}\) |
| 4 | \(P(X = 4) = \dfrac{6}{50}\) |
| 5 | \(P(X = 5) = \dfrac{2}{50}\) |
Answer
Each \(P(X=x)\) is between 0 and 1, inclusive, and the sum of the probabilities is 1, that is:
\[\dfrac{4}{50} + \dfrac{8}{50} +\dfrac{16}{50} +\dfrac{14}{50} +\dfrac{6}{50} + \dfrac{2}{50} = 1\]
Suppose Nancy has classes three days a week. She attends classes three days a week 80% of the time, two days 15% of the time, one day 4% of the time, and no days 1% of the time. Suppose one week is randomly selected.
- Let \(X\) be the number of days Nancy ____________________.
- \(X\) takes on what values?
- Construct a probability distribution table (called a PDF table) like the ones earlier in this section. The table should have two columns labeled \(x\) and \(P(X=x)\). What does the \(P(X=x)\) column sum to?
Solutions
a. Let \(X\) be the number of days Nancy attends class per week.
b. 0, 1, 2, and 3
c. The \(P(X=x)\) column sums to one: \(0.01 + 0.04 + 0.15 + 0.80 = 1\).
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | 0.01 |
| 1 | 0.04 |
| 2 | 0.15 |
| 3 | 0.80 |
Jeremiah has basketball practice two days a week. Ninety percent of the time, he attends both practices. Eight percent of the time, he attends one practice. Two percent of the time, he does not attend either practice. What is \(X\) and what values does it take on?
Answer
\(X\) is the number of days Jeremiah attends basketball practice per week. \(X\) takes on the values 0, 1, and 2.
A company wants to evaluate its attrition rate, that is, how long new hires stay with the company. Over the years, the company has established the following probability distribution. Let \(X\) be the number of years a new hire will stay with the company. Let \(P(X=x)\) be the probability that a new hire will stay with the company \(x\) years.
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | 0.12 |
| 1 | 0.18 |
| 2 | 0.30 |
| 3 | 0.15 |
| 4 | |
| 5 | 0.10 |
| 6 | 0.05 |
1. Complete the table using the data provided, that is, find \(P(X = 4)\).
2. Find \(P(X \geq 5)\).
Answer
1. \(P(X=4)=1-(0.12+0.18+0.30+0.15+0.10+0.05)=1-0.90=0.10\). Thus, the complete probability distribution table for \(X\) is:
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | 0.12 |
| 1 | 0.18 |
| 2 | 0.30 |
| 3 | 0.15 |
| 4 | 0.10 |
| 5 | 0.10 |
| 6 | 0.05 |
2. \(P(X\geq5)=0.10 + 0.05 = 0.15\)
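Both answers follow the two rules used above: the missing entry is whatever makes the probabilities sum to one, and \(P(X \geq 5)\) adds the probabilities of the qualifying values. A minimal Python sketch, assuming the table is stored as a dictionary that maps each \(x\) to \(P(X=x)\) (a representation chosen only for this illustration):

```python
from fractions import Fraction

# Attrition example; P(X = 4) is the unknown entry. Fraction("0.12") stores 0.12 exactly.
known = {x: Fraction(p) for x, p in
         {0: "0.12", 1: "0.18", 2: "0.30", 3: "0.15", 5: "0.10", 6: "0.05"}.items()}

# Probabilities must sum to 1, so the missing entry is 1 minus the sum of the others.
pmf = dict(known)
pmf[4] = 1 - sum(known.values())

# P(X >= 5): add the probabilities of every value x that satisfies x >= 5.
p_at_least_5 = sum(p for x, p in pmf.items() if x >= 5)

print(float(pmf[4]), float(p_at_least_5))  # 0.1 0.15
```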
Javier volunteers in community events each month. He does not do more than five events in a month. Over the years, he has established the following probability distribution. Let \(X\) be the number of times Javier volunteers in a randomly selected month and let \(P(X=x)\) be the probability that Javier volunteers \(x\) times.
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | 0.05 |
| 1 | 0.05 |
| 2 | |
| 3 | 0.20 |
| 4 | 0.25 |
| 5 | 0.35 |
1. Complete the table using the data provided, that is, find \(P(X = 2)\).
2. Find \(P(X \leq 3)\).
Answer
1. \(P(X=2)=1-(0.05+0.05+0.20+0.25+0.35)=1-0.90=0.10\). Thus, the complete probability distribution table for \(X\) is:
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | 0.05 |
| 1 | 0.05 |
| 2 | 0.10 |
| 3 | 0.20 |
| 4 | 0.25 |
| 5 | 0.35 |
2. \(P(X \leq 3) = 0.05+0.05+0.10+0.20=0.40\).
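\(P(X \leq 3)\) is a running total of the probabilities up to \(x = 3\). The Python sketch below (an illustration, not from the original text) computes every running total at once with the standard library's `itertools.accumulate`, which gives \(P(X \leq x)\) for each \(x\) in Javier's table; part 2 is the entry for \(x = 3\).

```python
from fractions import Fraction
from itertools import accumulate

# Javier's completed table, values x = 0..5 in increasing order.
xs = [0, 1, 2, 3, 4, 5]
probs = [Fraction(p) for p in ("0.05", "0.05", "0.10", "0.20", "0.25", "0.35")]

# Running totals give P(X <= x) for each x (the cumulative distribution).
cdf = dict(zip(xs, accumulate(probs)))

for x in xs:
    print(f"P(X <= {x}) = {float(cdf[x]):.2f}")
```

For this table the running totals are 0.05, 0.10, 0.20, 0.40, 0.65, and 1.00; the last one is 1 because the probabilities sum to one.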
Expected Value
The expected value is often referred to as the "long-term" average or mean. This means that if you repeat an experiment over and over, you would expect the average of the outcomes to be close to this value in the long run.
You toss a coin and record the result. What is the probability that the result is heads? If you flip a coin two times, does probability tell you that these flips will result in one head and one tail? You might toss a fair coin ten times and record nine heads. As you learned in Chapter 3, probability does not describe the short-term results of an experiment. It gives information about what can be expected in the long term. To demonstrate this, Karl Pearson once tossed a fair coin 24,000 times! He recorded the results of each toss, obtaining heads 12,012 times. In his experiment, Pearson illustrated the Law of Large Numbers.
The Law of Large Numbers states that, as the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and the relative frequency approaches zero (the theoretical probability and the relative frequency get closer and closer together). When evaluating the long-term results of statistical experiments, we often want to know the “average” outcome. This “long-term average” is known as the mean or expected value of the experiment and is denoted by the Greek letter \(\mu\). In other words, after conducting many trials of an experiment, you would expect this average value.
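The Law of Large Numbers is easy to see in a simulation. The Python sketch below uses the standard `random` module as a stand-in for a fair coin (an assumption: pseudo-random numbers, not physical tosses) and reports the relative frequency of heads for several numbers of tosses, including Pearson's 24,000. The function name and the seed are illustrative choices.

```python
import random

def relative_frequency_of_heads(num_tosses, seed=None):
    """Simulate num_tosses tosses of a fair coin; return the fraction that land heads."""
    rng = random.Random(seed)  # separate generator so a run is reproducible with a seed
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# Compare short and long runs; longer runs typically land closer to 0.5.
for n in (10, 100, 1_000, 24_000):
    print(n, relative_frequency_of_heads(n, seed=1))
```

The exact numbers depend on the seed, and short runs can stray far from 0.5; as the number of tosses grows, the relative frequency typically settles closer to the theoretical probability 0.5.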
To find the expected value or long-term average, \(\mu\), multiply each value of the random variable by its probability and add the products. It is easier to learn by seeing the process than by looking at the formula.
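In symbols, the procedure just described is the following sum, taken over every value \(x\) that \(X\) can take on:
\[\mu = \sum_{x} x \cdot P(X=x)\]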
A men's soccer team plays soccer zero, one, or two days a week. The probability that they play zero days is 0.2, the probability that they play one day is 0.5, and the probability that they play two days is 0.3. The expected value, \(\mu\), in this case is the long-term average of the number of days per week the men's soccer team plays soccer. To find it, create the probability distribution table and add a column \(x\cdot P(X=x)\). In this column, multiply each \(x\) value by its probability.
| \(x\) | \(P(X=x)\) | \(x\cdot P(X=x)\) |
|---|---|---|
| 0 | 0.2 | (0)(0.2) = 0 |
| 1 | 0.5 | (1)(0.5) = 0.5 |
| 2 | 0.3 | (2)(0.3) = 0.6 |
Add the last column \(x\cdot P(X=x)\) to find the long term average or expected value:
\[(0)(0.2) + (1)(0.5) + (2)(0.3) = 0 + 0.5 + 0.6 = 1.1. \nonumber\]
The expected value is 1.1. The men's soccer team would, on the average, expect to play soccer 1.1 days per week. The number 1.1 is the long-term average or expected value if the men's soccer team plays soccer week after week after week. We say \(\mu = 1.1\).
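The recipe above translates directly into code. This Python sketch defines a hypothetical helper, `expected_value`, that takes a distribution as a dictionary mapping each \(x\) to \(P(X=x)\), multiplies, and adds. Exact `Fraction` values are used so that results such as 1.1 come out without rounding error; the newborn-crying table from earlier in the section is included as a second check.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return mu, the sum of x * P(X = x), for a distribution given as {x: P(X = x)}."""
    return sum(x * p for x, p in pmf.items())

# Soccer example: days per week the team plays, with their probabilities.
soccer = {0: Fraction("0.2"), 1: Fraction("0.5"), 2: Fraction("0.3")}
print(float(expected_value(soccer)))  # 1.1

# Newborn-crying example: P(X = x) = (count for x) / 50 for x = 0, 1, ..., 5.
newborn = {x: Fraction(n, 50) for x, n in enumerate((2, 11, 23, 9, 4, 1))}
print(float(expected_value(newborn)))  # 2.1
```

Applied to the post-op nurse table (counts 4, 8, 16, 14, 6, 2 out of 50), the same helper gives \(\dfrac{116}{50} = 2.32\).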
Find the expected value of the number of times a newborn baby's crying wakes its mother after midnight. The expected value is the expected number of times per week a newborn baby's crying wakes its mother after midnight.
| \(x\) | \(P(X=x)\) | \(x\cdot P(X=x)\) |
|---|---|---|
| 0 | \(P(X = 0) = \dfrac{2}{50}\) | \((0)\left(\dfrac{2}{50}\right) = 0\) |
| 1 | \(P(X = 1) = \dfrac{11}{50}\) | \((1)\left(\dfrac{11}{50}\right) = \dfrac{11}{50}\) |
| 2 | \(P(X = 2) = \dfrac{23}{50}\) | \((2)\left(\dfrac{23}{50}\right) = \dfrac{46}{50}\) |
| 3 | \(P(X = 3) = \dfrac{9}{50}\) | \((3)\left(\dfrac{9}{50}\right) = \dfrac{27}{50}\) |
| 4 | \(P(X = 4) = \dfrac{4}{50}\) | \((4)\left(\dfrac{4}{50}\right) = \dfrac{16}{50}\) |
| 5 | \(P(X = 5) = \dfrac{1}{50}\) | \((5)\left(\dfrac{1}{50}\right) = \dfrac{5}{50}\) |
Add the values in the third column of the table to find the expected value of \(X\):
\[\mu = \text{Expected Value} = \dfrac{105}{50} = 2.1 \nonumber\]
A hospital researcher is interested in the number of times the average post-op patient will ring the nurse during a 12-hour shift. For a random sample of 50 patients, the following information was obtained. What is the expected value?
| \(x\) | \(P(X=x)\) |
|---|---|
| 0 | \(P(X = 0) = \dfrac{4}{50}\) |
| 1 | \(P(X = 1) = \dfrac{8}{50}\) |
| 2 | \(P(X = 2) = \dfrac{16}{50}\) |
| 3 | \(P(X = 3) = \dfrac{14}{50}\) |
| 4 | \(P(X = 4) = \dfrac{6}{50}\) |
| 5 | \(P(X = 5) = \dfrac{2}{50}\) |
Answer
The expected value is 2.32:
\[(0)\dfrac{4}{50} + (1)\dfrac{8}{50} + (2)\dfrac{16}{50} + (3)\dfrac{14}{50} + (4)\dfrac{6}{50} + (5)\dfrac{2}{50} = 0 + \dfrac{8}{50} + \dfrac{32}{50} + \dfrac{42}{50} + \dfrac{24}{50} + \dfrac{10}{50} = \dfrac{116}{50} = 2.32\]
Suppose you play a game of chance in which five numbers are chosen from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. A computer randomly selects five numbers from zero to nine with replacement. You pay $2 to play and could profit $100,000 if you match all five numbers in order (you get your $2 back plus $100,000). Over the long term, what is your expected profit from playing the game?
To do this problem, set up an expected value table for the amount of money you can profit.
Let \(X\) be the amount of money you profit. The values of \(x\) are not 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Since you are interested in your profit (or loss), the values of \(x\) are 100,000 dollars and −2 dollars.
To win, you must get all five numbers correct, in order. The probability of choosing one correct number is \(\dfrac{1}{10}\) because there are ten numbers. You may choose a number more than once. The probability of choosing all five numbers correctly and in order is
\[\begin{align*} \left(\dfrac{1}{10}\right) \left(\dfrac{1}{10}\right) \left(\dfrac{1}{10}\right) \left(\dfrac{1}{10}\right) \left(\dfrac{1}{10}\right) &= (1)(10^{-5}) \\[5pt] &= 0.00001. \end{align*}\]
Therefore, the probability of winning is 0.00001 and the probability of losing is
\[1-0.00001 = 0.99999.\nonumber\]
The expected value table is as follows:
| Outcome | \(x\) | \(P(X=x)\) | \(x\cdot P(X=x)\) |
|---|---|---|---|
| Loss | –2 | 0.99999 | (–2)(0.99999) = –1.99998 |
| Profit | 100,000 | 0.00001 | (100000)(0.00001) = 1 |
Adding the last column gives \(-1.99998 + 1 = -0.99998\). Since \(-0.99998\) is about \(-1\), you would, on average, expect to lose approximately $1 for each game you play. However, each time you play, you either lose $2 or profit $100,000. The $1 is the average, or expected, LOSS per game after playing this game over and over.
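The same table can be built in code, deriving the winning probability from the five independent draws instead of typing it in. In this Python sketch (our own, not part of the original), the dictionary keys are the profit values \(x\) in dollars and the values are their probabilities; exact fractions keep the tiny probability 0.00001 free of rounding error.

```python
from fractions import Fraction

# Probability of matching all five digits in order: each draw is 1 of 10 digits, with replacement.
p_win = Fraction(1, 10) ** 5
p_lose = 1 - p_win

# Profit values: win -> +100,000 dollars; lose -> -2 dollars (the cost to play).
game = {100_000: p_win, -2: p_lose}

# Expected profit = sum of x * P(X = x) over both outcomes.
expected_profit = sum(x * p for x, p in game.items())
print(p_win, float(expected_profit))  # 1/100000 -0.99998
```

The printed expected profit, \(-0.99998\) dollars per game, matches the sum of the last column of the table.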
Suppose you play a game with a biased coin. You play each game by tossing the coin once. \(P(\text{heads}) = \dfrac{2}{3}\) and \(P(\text{tails}) = \dfrac{1}{3}\). If you toss a head, you pay $6. If you toss a tail, you win $10. If you play this game many times, will you come out ahead?
a. Define a random variable \(X\).
b. Complete the following expected value table.
| Outcome | \(x\) | ____ | ____ |
|---|---|---|---|
| WIN | 10 | \(\dfrac{1}{3}\) | ____ |
| LOSE | ____ | ____ | \(\dfrac{-12}{3}\) |
c. What is the expected value, \(\mu\)? Do you come out ahead?
Answer
a. \(X\) = amount of profit
b.
| Outcome | \(x\) | \(P(X=x)\) | \(x\cdot P(X=x)\) |
|---|---|---|---|
| WIN | 10 | \(\dfrac{1}{3}\) | \(\dfrac{10}{3}\) |
| LOSE | –6 | \(\dfrac{2}{3}\) | \(\dfrac{-12}{3}\) |
c. Add the last column of the table. The expected value is \(\mu = \dfrac{10}{3} + \dfrac{-12}{3} = -\dfrac{2}{3} \approx -0.67\). You lose, on average, about 67 cents each time you play the game, so you do not come out ahead.
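To answer "will you come out ahead?" both exactly and empirically, the Python sketch below computes \(\mu\) with exact fractions and then simulates many plays with the standard `random` module. The function name, the number of games, and the seed are illustrative assumptions; the simulated average varies with the seed but should land near \(-\dfrac{2}{3}\) when the number of games is large.

```python
import random
from fractions import Fraction

# Biased-coin game: heads (probability 2/3) costs $6; tails (probability 1/3) wins $10.
exact_mu = 10 * Fraction(1, 3) + (-6) * Fraction(2, 3)

def average_profit(num_games, seed=None):
    """Simulate num_games plays of the game and return the average profit per game."""
    rng = random.Random(seed)  # seeded generator makes a run reproducible
    total = 0
    for _ in range(num_games):
        heads = rng.random() < 2 / 3  # True with probability about 2/3
        total += -6 if heads else 10
    return total / num_games

print(exact_mu, float(exact_mu))  # -2/3 -0.6666666666666666
print(average_profit(100_000, seed=42))
```

The exact value answers the question by itself; the simulation simply shows the long-term average of many plays drifting toward that value, as the Law of Large Numbers describes.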
Some of the more common discrete probability distributions are binomial, geometric, hypergeometric, and Poisson. A probability distribution function is a pattern. You try to fit a probability problem into a pattern or distribution in order to perform the necessary calculations. These distributions are tools to make solving probability problems easier. Each distribution has its own special characteristics. Learning the characteristics enables you to distinguish among the different distributions.