12.1: The Central Limit Theorem
In the last section, we introduced the normal random variable and primarily discussed how to find the probabilities associated with a normal random variable. We also mentioned that the normal random variable is arguably the most important continuous random variable. We shall now find out why. We first make the following definition:
Definition: Let \(X_1, X_2, X_3, \ldots, X_n\) be random variables. We call the new random variable
\[ \frac{X_1 + X_2 + X_3 + \ldots + X_n}{n} \]
the sample mean and we denote it by \(\bar{X}_n\).
1) Suppose \(X_i\) models the outcome of our \(i\)th roll of a fair die. Then the random variable \(\frac{X_1 + X_2 + \ldots + X_{100}}{100}\) is the sample mean of the 100 rolls.
2) Suppose there are 30 people in a classroom and let \(X_i\) model the amount of time person \(i\) spends watching TV on a daily basis. Then the random variable \(\frac{X_1 + X_2 + \ldots + X_{30}}{30}\) is the sample mean of the 30 viewing times.
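To make the first example concrete, here is a minimal Python sketch that simulates 100 rolls of a fair die and computes one realization of the sample mean. The function name `sample_mean_of_rolls` and the seed value are illustrative choices, not part of the text.

```python
import random

def sample_mean_of_rolls(n, seed=None):
    """Roll a fair six-sided die n times and return the sample mean."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    rolls = [rng.randint(1, 6) for _ in range(n)]  # X_1, ..., X_n
    return sum(rolls) / n  # (X_1 + ... + X_n) / n

xbar = sample_mean_of_rolls(100, seed=42)
print(xbar)
```

Each call with a different seed gives a different value, which is exactly why the sample mean is itself a random variable.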
The question we will consider is this: if the random variables \(X_1, X_2, X_3, \ldots, X_n\) are all independent and identically distributed, can we say anything about their sample mean? The answer is yes, and the details are given by one of the most important theorems in all of probability and statistics: the Central Limit Theorem.
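Theorem: Suppose the random variables \(X_1, X_2, X_3, \ldots, X_n\) are all independent and identically distributed, where each random variable has mean \(\mu\) and variance \(\sigma^2\). Then for each fixed \(x\),
\[ \lim_{n \to \infty} P\left[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le x \right] = \Phi(x) \]
Or equivalently
\[ \lim_{n \to \infty} P\left[ \frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma\sqrt{n}} \le x \right] = \Phi(x) \]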
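To see why the two statements are equivalent, multiply the numerator and denominator of the standardized sample mean by \(n\) and use \(n\bar{X}_n = X_1 + X_2 + \ldots + X_n\):
\[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{n\bar{X}_n - n\mu}{n\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \ldots + X_n - n\mu}{\sigma\sqrt{n}} \]
The two quantities are the same random variable, so their probabilities agree for every \(x\).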
Suppose we have a bunch of random variables \(X_1, X_2, X_3, \ldots, X_n\) that are all independent and have the same distribution. (For instance, all of them are Uniform random variables with the same parameters, or all of them are Exponential random variables with the same parameters.) Since they come from the same distribution, they all have the same mean, \(\mu\), and the same variance, \(\sigma^2\). The Central Limit Theorem tells us that, for large \(n\):
1) the new random variable \(\frac{X_1 + X_2 + \ldots + X_n}{n} = \bar{X}_n\) will be approximately \(N\left(\mu, \frac{\sigma^2}{n}\right)\).
2) the new random variable \(X_1 + X_2 + \ldots + X_n\) will be approximately \(N(n\mu, n\sigma^2)\).
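The parameters in these two approximations can be checked directly. The derivation below uses only linearity of expectation and the fact that variances of independent random variables add:
\[ E[X_1 + X_2 + \ldots + X_n] = n\mu, \qquad \text{Var}[X_1 + X_2 + \ldots + X_n] = n\sigma^2 \]
\[ E[\bar{X}_n] = \frac{1}{n}(n\mu) = \mu, \qquad \text{Var}[\bar{X}_n] = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n} \]
These identities hold exactly for every \(n\); what the Central Limit Theorem adds is that the shape of the distribution is approximately normal.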
Additionally, notice how general the Central Limit Theorem is! The common distribution of \(X_1, X_2, X_3, \ldots, X_n\) can be anything we wish, as long as the random variables are independent, all have the same distribution, and that distribution has a mean \(\mu\) and a variance \(\sigma^2\) as in the theorem. Regardless, the sample mean (as well as the sum) will tend to a normal distribution.
Before we consider our examples, allow us to take a few moments to use the following applet. First select a distribution and then select the value of n. Notice that regardless of the starting distribution, the sample mean will tend to a normal distribution as n increases.
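The same behaviour can be checked numerically. The following is a minimal sketch, assuming an Exponential starting distribution with rate 1 (so \(\mu = 1\) and \(\sigma = 1\)); the function names, seed, and trial counts are illustrative choices. It standardizes simulated sample means and compares the fraction that fall at or below \(x = 1\) with \(\Phi(1)\).

```python
import random
from statistics import NormalDist

def standardized_sample_means(n, trials, rate=1.0, seed=0):
    """Simulate `trials` sample means of n Exponential(rate) draws,
    each standardized as (xbar - mu) / (sigma / sqrt(n))."""
    rng = random.Random(seed)
    mu = 1.0 / rate      # mean of Exponential(rate)
    sigma = 1.0 / rate   # standard deviation of Exponential(rate)
    results = []
    for _ in range(trials):
        xbar = sum(rng.expovariate(rate) for _ in range(n)) / n
        results.append((xbar - mu) / (sigma / n ** 0.5))
    return results

def empirical_cdf(values, x):
    """Fraction of simulated values that are <= x."""
    return sum(v <= x for v in values) / len(values)

phi_1 = NormalDist().cdf(1.0)  # Phi(1) for the standard normal
for n in (1, 5, 30):
    z = standardized_sample_means(n, trials=2000)
    print(n, round(empirical_cdf(z, 1.0), 3), round(phi_1, 3))
```

As \(n\) grows, the empirical fraction should move toward \(\Phi(1)\), up to simulation noise.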
Let \(\bar{X}_{25}\) denote the sample mean of 25 random variables where each random variable comes from a distribution with a mean of 15 and variance of 4. Find \(P(14.4 \le \bar{X}_{25} \le 15.6)\).
- Answer
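In this problem \(\mu = 15\) and \(\sigma^2 = 4\). By the Central Limit Theorem, \(\bar{X}_{25}\) will approximately be
\[ N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(15, \frac{4}{25}\right), \]
which has standard deviation \(\sqrt{4/25} = 2/5\). To finish the problem, we note that
\[ P(14.4 \le \bar{X}_{25} \le 15.6) \approx \text{normalcdf}(-1\text{E}99,\ 15.6,\ 15,\ 2/5) - \text{normalcdf}(-1\text{E}99,\ 14.4,\ 15,\ 2/5) = 0.8663855426 \]
If you do not have a calculator, you can use the theorem from the last section with the CDF chart:
\[ P(14.4 \le \bar{X}_{25} \le 15.6) = P\left( \frac{14.4 - 15}{2/5} \le \frac{\bar{X}_{25} - 15}{2/5} \le \frac{15.6 - 15}{2/5} \right) \approx P(-1.5 \le Z \le 1.5) = 0.9332 - 0.0668 = 0.8664 \]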
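If neither a graphing calculator nor a CDF chart is at hand, the same computation can be sketched in Python with the standard library's `statistics.NormalDist`. The helper name `sample_mean_interval_prob` is an illustrative choice; it applies the approximation \(N(\mu, \sigma^2/n)\) from the Central Limit Theorem.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_interval_prob(mu, var, n, lo, hi):
    """Approximate P(lo <= sample mean <= hi) using N(mu, var / n)."""
    approx = NormalDist(mu, sqrt(var / n))  # NormalDist takes the standard deviation
    return approx.cdf(hi) - approx.cdf(lo)

p = sample_mean_interval_prob(mu=15, var=4, n=25, lo=14.4, hi=15.6)
print(round(p, 4))  # 0.8664
```

Note that `NormalDist` expects the standard deviation \(2/5\), not the variance \(4/25\), just as `normalcdf` does.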
Let \(\bar{X}_{25}\) denote the sample mean when the underlying random variables come from a distribution whose pdf is given by
\[ f(x) =
\begin{cases}
\frac{x^3}{4} & \text{if} ~ 0 < x < 2 \\
0 & \text{otherwise}
\end{cases} \]
Find/approximate \(P(1.5 \le \bar{X}_{25} \le 1.65)\).
- Answer
Before we can answer this question, we have to understand what the problem is asking. We have random variables \(X_1, X_2, \ldots, X_{25}\), each with the above density, and we are asked to find or approximate
\[ P\left( 1.5 \le \frac{X_1 + X_2 + \ldots + X_{25}}{25} \le 1.65 \right) \]
By the Central Limit Theorem, \(\frac{X_1 + X_2 + \ldots + X_{25}}{25} = \bar{X}_{25}\) will approximately be \(N\left(\mu, \frac{\sigma^2}{25}\right)\). Let us find \(\mu\) and \(\sigma^2\).
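\[ \mu = E[X_i] = \int_{\text{all } x} x f(x) ~ dx = \int_0^2 x \left( \frac{x^3}{4} \right) dx = \frac{8}{5} = 1.6 \]
\[ \sigma^2 = \text{Var}[X_i] = E[X_i^2] - (E[X_i])^2 = E[X_i^2] - 1.6^2 = \int_0^2 x^2 \left( \frac{x^3}{4} \right) dx - 1.6^2 = \frac{8}{3} - 1.6^2 = \frac{8}{75} \]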
Substituting these values, \(\bar{X}_{25}\) will approximately be \(N\left(\mu, \frac{\sigma^2}{25}\right) = N\left(1.6, \frac{8/75}{25}\right) = N\left(1.6, \frac{8}{1875}\right)\). Hence,
\[ P(1.5 \le \bar{X}_{25} \le 1.65) \approx \text{normalcdf}\left(1.5,\ 1.65,\ 1.6,\ \sqrt{\tfrac{8}{1875}}\right) \approx 0.7151096523 \]
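Alternatively, if we wish to use the standardization process, we obtain the following:
\[ \begin{aligned}
P(1.5 \le \bar{X}_{25} \le 1.65) &= P\left( \frac{1.5 - 1.6}{\sqrt{8/1875}} \le \frac{\bar{X}_{25} - 1.6}{\sqrt{8/1875}} \le \frac{1.65 - 1.6}{\sqrt{8/1875}} \right) \\
&\approx P(-1.531 \le Z \le 0.765) \\
&\approx P(-1.53 \le Z \le 0.77) \\
&= \Phi(0.77) - \Phi(-1.53) \\
&= 0.7794 - 0.0630 = 0.7164
\end{aligned} \]
The small difference from the calculator answer comes from rounding the \(z\)-values to two decimal places for the chart.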
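The moment calculations and the final probability can be double-checked with a short Python sketch. It relies on the closed form \(E[X^k] = \int_0^2 x^k \cdot \frac{x^3}{4} ~ dx = \frac{2^{k+4}}{4(k+4)}\) for this particular pdf; the helper name `moment` is an illustrative choice.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

def moment(k):
    """E[X^k] for the pdf f(x) = x^3 / 4 on (0, 2), as an exact fraction."""
    return Fraction(2 ** (k + 4), 4 * (k + 4))

mu = moment(1)               # E[X]   = 8/5
var = moment(2) - mu ** 2    # Var[X] = 8/3 - (8/5)^2 = 8/75
n = 25

# CLT approximation: sample mean is roughly N(mu, var / n)
approx = NormalDist(float(mu), sqrt(float(var) / n))
p = approx.cdf(1.65) - approx.cdf(1.5)

print(mu, var)       # 8/5 8/75
print(round(p, 4))   # 0.7151
```

Using exact fractions for the moments avoids rounding until the final normal CDF evaluation.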
Suppose \(X_1, X_2, \ldots, X_{20}\) are all independent and identically distributed as \(U[0,1]\). Find \(P(X_1 + X_2 + \ldots + X_{20} < 9.1)\).
- Answer
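Recall that if \(X \sim U[a,b]\) then \(E[X] = \frac{a+b}{2}\) and \(\text{Var}[X] = \frac{(b-a)^2}{12}\). Hence, for \(U[0,1]\), \(E[X] = \frac{0+1}{2} = \frac{1}{2} = \mu\) and \(\text{Var}[X] = \frac{(1-0)^2}{12} = \frac{1}{12} = \sigma^2\).
By the Central Limit Theorem, \(X_1 + X_2 + \ldots + X_n\) will be approximately \(N(n\mu, n\sigma^2)\), and so \(X_1 + X_2 + \ldots + X_{20}\) will be approximately \(N\left(20\left(\frac{1}{2}\right), 20\left(\frac{1}{12}\right)\right) = N\left(10, \frac{5}{3}\right)\). Thus, we have the following:
\[ P(X_1 + X_2 + \ldots + X_{20} < 9.1) \approx \text{normalcdf}\left(-1\text{E}99,\ 9.1,\ 10,\ \sqrt{\tfrac{5}{3}}\right) \approx 0.2428584558 \]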
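As a sanity check on this approximation, the sketch below computes the normal approximation in Python and compares it with a direct Monte Carlo simulation of the sum of 20 uniform random variables. The function name `simulate_sum_prob`, the seed, and the number of trials are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist

n = 20
mu, var = 0.5, 1 / 12  # mean and variance of U[0, 1]

# CLT approximation: the sum is roughly N(n * mu, n * var) = N(10, 5/3)
approx = NormalDist(n * mu, sqrt(n * var))
p_clt = approx.cdf(9.1)

def simulate_sum_prob(threshold, trials=50_000, seed=1):
    """Estimate P(X_1 + ... + X_n < threshold) by direct simulation."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() for _ in range(n)) < threshold
        for _ in range(trials)
    )
    return hits / trials

p_sim = simulate_sum_prob(9.1)
print(round(p_clt, 4))  # 0.2429
print(p_sim)            # simulation estimate; should be close to p_clt
```

The simulated value varies with the seed and the number of trials, but it should land close to the normal approximation even though \(n = 20\) is fairly small.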