12.1: The Central Limit Theorem
In the last section, we introduced the normal random variable and primarily discussed how to find the probabilities associated with a normal random variable. We also mentioned that the normal random variable is arguably the most important continuous random variable. We shall now find out why. We first make the following definition:
Definition: Let \( X_1, X_2, X_3, \ldots, X_n \) be random variables. We call the new random variable,
\begin{align*}
\frac{X_1 + X_2 + X_3 + \ldots + X_n}{n}
\end{align*}
the sample mean, and we denote it by \( \overline{X}_n \).
1) Suppose \(X_i\) models the outcome of our \( i^{th} \) roll of a fair die. Then the random variable \( \dfrac{X_1 + X_2 + \ldots + X_{100}}{100} \) is the sample mean of the 100 rolls.
2) Suppose there are 30 people in a classroom and let \(X_i\) model the amount of time person \( i \) spends watching TV each day. Then the random variable \( \dfrac{X_1 + X_2 + \ldots + X_{30}}{30} \) is the sample mean of the 30 viewing times.
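To make the first example concrete, here is a minimal sketch in Python, assuming only the standard library. The seed and the variable names `rolls` and `sample_mean` are illustrative choices, not part of the text.

```python
import random
from statistics import fmean

rng = random.Random(0)                           # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100)]  # one observed value for each of X_1, ..., X_100
sample_mean = fmean(rolls)                       # (X_1 + X_2 + ... + X_100) / 100
print(sample_mean)
```

Each run with a different seed gives a different value, which is exactly why the sample mean is itself a random variable.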
The question we will consider is this: if the random variables \( X_1, X_2, X_3, \ldots, X_n \) are all independent and identically distributed, can we say anything about their sample mean? The answer is yes, and the details are given by one of the most important theorems in all of probability and statistics: the Central Limit Theorem.
Theorem: Suppose the random variables \( X_1, X_2, X_3, \ldots, X_n\) are all independent and identically distributed, where each random variable has mean \( \mu \) and finite variance \(\sigma^2 > 0\). Let \( \Phi \) denote the CDF of the standard normal random variable. Then for each fixed \(x\),
\begin{align*}
\lim_{n \rightarrow \infty} P \bigg[ \frac{\overline{X}_n - \mu}{\sigma / \sqrt{n}} \leq x \bigg] = \Phi(x)
\end{align*}
Or equivalently
\begin{align*}
\lim_{n \rightarrow \infty} P \bigg[ \frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma \sqrt{n}} \leq x \bigg] = \Phi(x)
\end{align*}
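One way to see that the two statements agree is to multiply the numerator and denominator of the first fraction by \(n\) and use \( n\overline{X}_n = X_1 + X_2 + \ldots + X_n \):
\begin{align*}
\frac{\overline{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{n\overline{X}_n - n\mu}{n\sigma / \sqrt{n}} = \frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma \sqrt{n}}
\end{align*}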
Suppose we have a collection of random variables \( X_1, X_2, X_3, \ldots, X_n\) that are all independent and have the same distribution. (For instance, all of them are Uniform random variables with the same parameters, or all of them are Exponential random variables with the same parameters.) Since they come from the same distribution, they all have the same mean, \( \mu \), and the same variance, \( \sigma^2 \). The Central Limit Theorem tells us that, for large \(n\):
1) the new random variable, \( \dfrac{X_1 + X_2 + \ldots + X_n}{n} = \overline{X}_n \) will approximately be \( \mathcal{N}(\mu, \frac{\sigma^2}{n}) \).
2) the new random variable, \( X_1 + X_2 + \ldots + X_n \) will be approximately \( \mathcal{N}(n\mu, n \sigma^2) \).
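The parameters in both approximations can be checked directly from the rules for expectation and variance. The variance step relies on independence:
\begin{align*}
\mathbb{E}[\overline{X}_n] &= \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{n\mu}{n} = \mu \\
\mathbb{V}ar[\overline{X}_n] &= \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{V}ar[X_i] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{align*}
Multiplying the sample mean by \(n\) gives the sum, whose mean is \(n\mu\) and whose variance is \(n^2 \cdot \frac{\sigma^2}{n} = n\sigma^2\).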
Additionally, notice how general the Central Limit Theorem is! The common distribution of \( X_1, X_2, X_3, \ldots, X_n\) can be almost anything we wish, as long as the random variables are independent, all have the same distribution, and that distribution has a finite variance. Regardless of its shape, the distribution of the sample mean (as well as of the sum) will be approximately normal when \(n\) is large.
Before we consider our examples, allow us to take a few moments to use the following applet. First select a distribution and then select the value of \(n\). Notice that regardless of the starting distribution, the sample mean will tend to a normal distribution as \(n\) increases.
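The same behaviour can also be explored numerically. The following is a minimal sketch, assuming Python's standard library. The starting distribution (Exponential with rate 1, so \(\mu = 1\) and \(\sigma = 1\)), the number of repetitions, and the helper names `simulate_sample_means` and `fraction_below` are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, variance

MU, SIGMA = 1.0, 1.0  # mean and standard deviation of the Exponential(1) distribution

def simulate_sample_means(n, reps=20_000, seed=0):
    """Return `reps` simulated values of the sample mean of n i.i.d. Exponential(1) draws."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def fraction_below(means, n, x):
    """Fraction of simulated sample means whose standardized value is at most x."""
    return sum((m - MU) / (SIGMA / n ** 0.5) <= x for m in means) / len(means)

for n in (1, 5, 30):
    means = simulate_sample_means(n)
    # Columns: n, simulated variance (theory: 1/n), simulated P(Z_n <= 1), CLT limit Phi(1).
    print(n, round(variance(means), 4),
          round(fraction_below(means, n, 1.0), 4),
          round(NormalDist().cdf(1.0), 4))
```

As \(n\) grows, the simulated variance should track \(\sigma^2 / n\), and the simulated probability can be compared against \(\Phi(1)\) from the theorem.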
Let \( \overline{X}_{25} \) denote the sample mean of 25 independent random variables, where each random variable comes from a distribution with a mean of 15 and a variance of 4. Approximate \(P(14.4 \leq \overline{X}_{25} \leq 15.6) \).
- Answer
In this problem \(\mu = 15\) and \( \sigma^2 = 4\). By the Central Limit Theorem, \( \overline{X}_{25} \) will approximately be
\begin{align*}
\mathcal{N} \bigg(\mu, \frac{\sigma^2}{n} \bigg) = \mathcal{N}\bigg(15, \frac{4}{25}\bigg)
\end{align*}
To finish the problem, we note that
\begin{align*}
P(14.4 \leq \overline{X}_{25} \leq 15.6) &\approx \text{normalcdf}(-1E99, 15.6, 15, 2/5) - \text{normalcdf}(-1E99, 14.4, 15, 2/5) \approx 0.8663855426
\end{align*}
If you do not have a calculator, you can instead standardize and use the theorem from the last section with the CDF chart:
\begin{align*}
P(14.4 \leq \overline{X}_{25} \leq 15.6) &= P \bigg( \frac{14.4-15}{2/5} \leq \frac{\overline{X}_{25}-15}{2/5} \leq \frac{15.6-15}{2/5} \bigg) \\
& \approx P(-1.5 \leq Z \leq 1.5) \\
&= 0.9332 - 0.0668 \\
&= 0.8664
\end{align*}
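The calculator step can be reproduced in software. This is a minimal sketch, assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma2, n = 15, 4, 25

# CLT approximation: the sample mean is roughly N(mu, sigma^2 / n).
# NormalDist takes the standard deviation, not the variance.
xbar = NormalDist(mu, (sigma2 / n) ** 0.5)

prob = xbar.cdf(15.6) - xbar.cdf(14.4)
print(round(prob, 4))
```

Rounded to four decimal places, this agrees with the CDF chart answer of 0.8664.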
Let \( \overline{X}_{25} \) denote the sample mean of 25 independent random variables, each coming from a distribution whose pdf is given by
\[
f(x) =
\begin{cases}
\frac{x^3}{4} & \text{if} ~ 0 < x < 2 \\
0 & \text{otherwise}
\end{cases}
\]
Find/approximate \( P(1.5 \leq \overline{X}_{25} \leq 1.65)\).
- Answer
Before we can answer this question, we have to understand what it is asking. We have 25 independent random variables, \( X_1, X_2, \ldots, X_{25} \), each with the above density, and we are asked to find or approximate
\begin{align*}
P \bigg(1.5 \leq \frac{X_1 + X_2 + \ldots + X_{25}}{25} \leq 1.65 \bigg)
\end{align*}
By the Central Limit Theorem, \( \dfrac{X_1 + X_2 + \ldots + X_{25}}{25} = \overline{X}_{25} \) will approximately be \( \mathcal{N}(\mu, \frac{\sigma^2}{25}) \). Let us find \(\mu \) and \( \sigma^2 \).
\begin{align*}
\mu = \mathbb{E}[X_i] = \int_{-\infty}^{\infty} xf(x) ~ dx = \int_{0}^{2} x \bigg(\frac{x^3}{4} \bigg) ~ dx = \frac{8}{5} = 1.6
\end{align*}
\begin{align*}
\sigma^2 = \mathbb{V}ar[X_i] &= \mathbb{E}[X_i^2] - (\mathbb{E}[X_i])^2 \\
&= \mathbb{E}[X_i^2] - 1.6^2 \\
&= \int_{0}^{2} x^2 \bigg(\frac{x^3}{4} \bigg) ~ dx - 1.6^2 \\
&= \frac{8}{3} - 1.6^2 \\
&= \frac{8}{75}
\end{align*}
Substituting these values, \( \overline{X}_{25} \) will approximately be \( \mathcal{N}(\mu, \frac{\sigma^2}{25}) = \mathcal{N}(1.6, \frac{8/75}{25}) = \mathcal{N}(1.6, \frac{8}{1875}) \). Since normalcdf expects the standard deviation rather than the variance, we pass \( \sqrt{8/1875} \). Hence,
\[ P(1.5 \leq \overline{X}_{25} \leq 1.65) \approx \text{normalcdf} \bigg(1.5, 1.65, 1.6, \sqrt{\frac{8}{1875}} \bigg) \approx 0.7151096523 \]
Or alternatively, if we wish to use the standardization process, we obtain the following:
\begin{align*}
P(1.5 \leq \overline{X}_{25} \leq 1.65) &= P \bigg( \frac{1.5-1.6}{\sqrt{\frac{8}{1875}}} \leq \frac{\overline{X}_{25} -1.6}{\sqrt{\frac{8}{1875}}} \leq \frac{1.65-1.6}{\sqrt{\frac{8}{1875}}} \bigg) \\
&\approx P(-1.531 \leq Z \leq 0.765) \\
&\approx P(-1.53 \leq Z \leq 0.77) \\
&= \Phi(0.77) - \Phi(-1.53) \\
&=0.7794 - 0.0630 \\
&= 0.7164
\end{align*}
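The whole calculation can be double-checked in a few lines. This is a minimal sketch, assuming Python 3.8 or later. The helper `moment` is hypothetical and simply encodes the closed form \( \mathbb{E}[X^k] = \int_0^2 \frac{x^{k+3}}{4} ~ dx = \frac{2^{k+4}}{4(k+4)} \) for this particular pdf.

```python
from fractions import Fraction
from statistics import NormalDist

def moment(k):
    """E[X^k] for the pdf f(x) = x^3 / 4 on (0, 2), as an exact fraction."""
    return Fraction(2 ** (k + 4), 4 * (k + 4))

n = 25
mu = moment(1)                # exact mean, 8/5
sigma2 = moment(2) - mu ** 2  # exact variance, 8/75

# CLT approximation: the sample mean is roughly N(mu, sigma^2 / n).
xbar = NormalDist(float(mu), float(sigma2 / n) ** 0.5)
prob = xbar.cdf(1.65) - xbar.cdf(1.5)
print(mu, sigma2, round(prob, 4))
```

The result matches the normalcdf value to four decimal places. The chart answer of 0.7164 differs slightly only because the z-values were rounded to two decimal places before the lookup.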
Suppose \( X_1, X_2, \ldots, X_{20} \) are all independent and identically distributed as \( \mathcal{U}[0,1] \). Approximate \(P(X_1 + X_2 + \ldots + X_{20} < 9.1) \).
- Answer
Recall that if \( X \sim \mathcal{U} [ a , b ] \) then \( \mathbb{E}[X] = \dfrac{a+b}{2} \) and \( \mathbb{V}ar[X] = \dfrac{(b-a)^2}{12} \). Hence, for the \( \mathcal{U}[0,1] \) distribution, \( \mathbb{E}[X] = \dfrac{0+1}{2} = \dfrac{1}{2} = \mu \) and \( \mathbb{V}ar[X] = \dfrac{(1-0)^2}{12} = \dfrac{1}{12} = \sigma^2 \).
By the Central Limit Theorem, \( X_1 + X_2 + \ldots + X_n \) will be approximately \( \mathcal{N}(n\mu, n \sigma^2) \), and so \( X_1 + X_2 + \ldots + X_{20} \) will be approximately \( \mathcal{N} \big( 20( \frac{1}{2}), 20 (\frac{1}{12}) \big) = \mathcal{N} ( 10, \frac{5}{3} ) \). Thus, we have the following:
\begin{align*}
P(X_1 + X_2 + \ldots + X_{20} < 9.1) &\approx \text{normalcdf} \bigg(-1E99, 9.1, 10, \sqrt{\frac{5}{3}} \bigg) \\
&\approx 0.2428584558
\end{align*}
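Because uniform random numbers are easy to generate, this answer can be compared against a direct simulation. This is a minimal sketch, assuming Python 3.8 or later. The seed and the number of repetitions are arbitrary illustrative choices, and the simulated value will vary slightly with them.

```python
import random
from statistics import NormalDist

n, mu, sigma2 = 20, 0.5, 1 / 12

# CLT approximation: the sum is roughly N(n * mu, n * sigma^2).
approx = NormalDist(n * mu, (n * sigma2) ** 0.5).cdf(9.1)

# Monte Carlo estimate: simulate the sum of 20 uniforms many times
# and record how often it falls below 9.1.
rng = random.Random(0)
reps = 100_000
hits = sum(sum(rng.random() for _ in range(n)) < 9.1 for _ in range(reps))
estimate = hits / reps

print(round(approx, 4), round(estimate, 4))
```

The two numbers should be close, which suggests that \(n = 20\) is already large enough for the normal approximation to work well for uniform random variables.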