
# 11 The Central Limit Theorem, Stirling's formula and the de Moivre-Laplace theorem

\label{chapter:stirling}

Our goal in the next few chapters will be to formulate and prove one of the fundamental results of probability theory, known as the Central Limit Theorem. Roughly speaking, this theorem establishes the normal distribution as the universal limiting law for the distribution of sums of independent and identically distributed random variables, and therefore explains the central role that the normal distribution plays in probability theory and statistics, and why it appears in virtually all applied sciences and is applicable to the study of many real-life phenomena.

We start with a motivating example that was also historically the first instance in which the phenomenon that came to be known as the Central Limit Theorem was observed. Let $$X_1, X_2, \ldots$$ be an i.i.d. sequence of $$\textrm{Binom}(1,p)$$ random variables, and let $$S_n = \sum_{k=1}^n X_k$$, a r.v. with distribution $$\textrm{Binom}(n,p)$$.

\begin{thm}[The de Moivre-Laplace theorem] For any $$t\in \R$$,
\[ \prob \left(\frac{S_n - n p}{\sqrt{n p(1-p)}} \le t \right) \xrightarrow[n\to\infty]{} \Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^t e^{-x^2/2}\,dx. \]
\label{thm-demoivre-laplace}
\end{thm}

Since this is such a concrete example, the proof will simply require us to estimate a sum of the form $$\sum_{0\le k\le t} \binom{n}{k} p^k (1-p)^{n-k}$$. Knowing how to estimate such sums is a useful skill in its own right. Since the binomial coefficients are involved, we also need some preparation related to Stirling's formula.

\begin{lem} The limit $$C = \lim_{n\to\infty} \frac{n!}{\sqrt{n} (n/e)^n}$$ exists.
\label{lem-prestirling}
\end{lem}

\begin{proof}
\begin{eqnarray*}
\log n! &=& \sum_{k=1}^n \log k = \sum_{k=1}^n \int_1^k \frac{dx}{x} = \int_1^n \frac{n-\lfloor x\rfloor}{x}\,dx
\\ &=& \int_1^n \frac{n+\frac12 + (\{ x\}-\frac12)-x}{x}\, dx = (n+1/2)\log n - n + 1 + \int_1^n \frac{\{x\}-\frac12}{x}\,dx
\\ &=& (n+1/2)\log n - n + 1 + \int_1^\infty \frac{\{x\}-\frac12}{x}\,dx + o(1),
\end{eqnarray*}
where the last integral converges because $$\int_1^t (\{x\}-\frac12)\, dx$$ is bounded and $$1/x$$ decreases monotonically to 0 as $$x\to\infty$$. Subtracting $$(n+1/2)\log n - n$$ from both sides shows that $$\log \frac{n!}{\sqrt{n}(n/e)^n}$$ converges, and hence so does the ratio itself.
\end{proof}

Note that an easy consequence of Lemma~\ref{lem-prestirling} is that $$\binom{2n}{n} = (1+o(1))\frac{2^{2n}}{C\sqrt{n/2}}$$. We shall now use this to find the value of $$C$$.

\begin{lem} Let $$f:\R\to\R$$ be an $$n+1$$ times continuously-differentiable function. Then for all $$x\in\R$$, we have
\[ f(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2 + \ldots + \frac{f^{(n)}(0)}{n!} x^n + R_n(x), \]
where
\[ R_n(x) = \frac{1}{n!} \int_0^x f^{(n+1)}(t)(x-t)^n\,dt. \]
\label{lem-prevlemma}
\end{lem}

\begin{proof} This follows by induction on $$n$$, using integration by parts.
\end{proof}

\begin{lem} $$C=\sqrt{2\pi}$$.
\end{lem}

\begin{proof} Apply Lemma~\ref{lem-prevlemma} with $$f(x) = (1+x)^{2n+1}$$ to compute $$R_n(1)$$:
\begin{eqnarray*}
\frac{1}{2^{2n+1}} R_n(1) &=& \frac{1}{2^{2n+1}} \cdot \frac{1}{n!}\int_0^1 (2n+1)(2n)\cdots (n+1)(1+t)^n (1-t)^n \,dt
\\ &=& \frac{2 \binom{2n}{n}}{2^{2n+1}} \left(n+\frac12\right) \int_0^1 (1-t^2)^n\, dt = \frac{ \binom{2n}{n} \sqrt{n}}{2^{2n}} \left(1+\frac{1}{2n}\right) \int_0^{\sqrt{n}} \left(1-\frac{u^2}{n}\right)^n\,du
\\ &\xrightarrow[n\to\infty]{} & \frac{\sqrt{2}}{C} \int_0^\infty e^{-u^2}\,du = \frac{\sqrt{2}}{C} \cdot \frac{\sqrt{\pi}}{2}.
\end{eqnarray*}
The convergence of the integrals is justified by the fact that $$(1-u^2/n)^n \le e^{-u^2}$$ for all $$0\le u\le \sqrt{n}$$, and $$(1-u^2/n)^n \to e^{-u^2}$$ as $$n\to\infty$$, uniformly on compact intervals. To finish the proof, note that
\[ \frac{1}{2^{2n+1}} R_n(1) = \sum_{n<k\le 2n+1} \frac{\binom{2n+1}{k}}{2^{2n+1}} = \frac12 \]
(this is the probability that a $$\textrm{Binom}(2n+1,1/2)$$ random variable takes a value $$> n$$). Therefore $$C=\sqrt{2\pi}$$, as claimed.
\end{proof}
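The identity that drives the proof above, namely that the integral form of $$R_n(1)$$ for $$f(x)=(1+x)^{2n+1}$$ equals the upper half of the binomial sum, can be checked in exact rational arithmetic for small $$n$$. The following is only a sanity-check sketch; the function names are illustrative and do not come from the text.

```python
from fractions import Fraction
from math import comb, factorial


def integral_one_minus_t2_pow(n: int) -> Fraction:
    """Exact integral of (1 - t^2)^n over [0, 1].

    Expands (1 - t^2)^n by the binomial theorem and integrates term by term.
    """
    return sum(Fraction((-1) ** j * comb(n, j), 2 * j + 1) for j in range(n + 1))


def scaled_remainder(n: int) -> Fraction:
    """R_n(1) / 2^(2n+1) for f(x) = (1 + x)^(2n+1), using the integral form of R_n."""
    # f^(n+1)(t) = (2n+1)(2n)...(n+1) (1+t)^n, so the constant in front of the
    # integral of (1+t)^n (1-t)^n is (2n+1)! / (n!)^2.
    prefactor = Fraction(factorial(2 * n + 1), factorial(n) ** 2)
    return prefactor * integral_one_minus_t2_pow(n) / 2 ** (2 * n + 1)


def upper_tail(n: int) -> Fraction:
    """P(Binom(2n+1, 1/2) > n), summed exactly."""
    total = sum(comb(2 * n + 1, k) for k in range(n + 1, 2 * n + 2))
    return Fraction(total, 2 ** (2 * n + 1))


for n in range(6):
    print(n, scaled_remainder(n), upper_tail(n))
```

Both columns should agree for every $$n$$, which is the two-way computation of $$\prob(S_{2n+1}>n)$$ that pins down the constant $$C$$.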

\begin{cor}[Stirling's formula] $$\lim_{n\to\infty} \frac{n!}{\sqrt{2\pi n} (n/e)^n}=1$$.
\end{cor}
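As a numerical illustration of Stirling's formula, one can compute the ratio $$n!/\bigl(\sqrt{2\pi n}\,(n/e)^n\bigr)$$ for increasing $$n$$ and watch it approach 1. This is a minimal sketch; the helper name `stirling_ratio` is an assumption made for the example, and logarithms are used only to avoid floating-point overflow.

```python
import math


def stirling_ratio(n: int) -> float:
    """Return n! / (sqrt(2*pi*n) * (n/e)^n), computed through logarithms."""
    log_factorial = math.lgamma(n + 1)  # log(n!)
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    return math.exp(log_factorial - log_stirling)


for n in (1, 10, 100, 1000):
    print(n, stirling_ratio(n))
```

The printed ratios decrease towards 1 as $$n$$ grows, consistent with the corollary.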

Note that the proof is based on computing $$\prob(S_{2n+1} > n)$$ in two different ways, when $$S_{2n+1}\sim \textrm{Binom}(2n+1,1/2)$$. This is just the special case $$p=1/2$$, $$t=0$$ of Theorem~\ref{thm-demoivre-laplace}. In this very special case, by symmetry the probability is equal to $$1/2$$; on the other hand, Lemma~\ref{lem-prevlemma} enables us to relate this to the asymptotic behavior of $$n!$$ and to (half of) the Gaussian integral $$\int_{-\infty}^\infty e^{-x^2}\,dx$$. The evaluation of the constant $$C$$ in Stirling's formula is the part that is attributed to James Stirling. The form that appears in Lemma~\ref{lem-prestirling} is due to Abraham de Moivre (1733).

With this preparation, it is now possible to apply the same technique to prove Theorem~\ref{thm-demoivre-laplace}. Instead of the function $$f(x)=(1+x)^{2n+1}$$, take the function $$g(x)=((1-p) + p x)^n=\sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} x^k$$, and compute the remainder $$R_k(1)$$ of the Taylor expansion of $$g$$, where $$k \approx np + t \sqrt{np(1-p)}$$. This should converge to $$1-\Phi(t)$$, and indeed, this follows without too much difficulty from Lemma~\ref{lem-prevlemma}. The computation is left as an exercise.

We also sketch another way of proving Theorem~\ref{thm-demoivre-laplace} by directly approximating the probabilities $$\binom{n}{k} p^k (1-p)^{n-k}$$ by Gaussian densities.

\begin{proof}[Sketch of Proof of Theorem~\ref{thm-demoivre-laplace}] Denote $$q=1-p$$. For a large $$n$$, let $$k$$ be approximately equal to $$np+t\sqrt{npq}$$, and use Stirling's formula to estimate the probability $$\prob(S_n=k)$$, as follows:
\begin{eqnarray*}
\prob(S_n = k) &=& \binom{n}{k} p^k q^{n-k} = (1+o(1)) \frac{\sqrt{2\pi n} (n/e)^n p^k q^{n-k}}{ \sqrt{2\pi k} (k/e)^k \sqrt{2\pi (n-k)} ((n-k)/e)^{n-k}}
\\ &=& \frac{1+o(1)}{\sqrt{2\pi n p q}} \left(\frac{np}{k}\right)^k \left( \frac{nq}{n-k}\right)^{n-k}
\\ &=& \frac{1+o(1)}{\sqrt{2\pi n p q}} \left(1+\frac{t\sqrt{q}}{\sqrt{np}}\right)^{-k} \left(1- \frac{t\sqrt{p}}{\sqrt{nq}}\right)^{-(n-k)}.
\end{eqnarray*}
Taking the logarithm of the product of the last two factors, using the facts that $$k\approx np+t\sqrt{npq}$$, $$n-k\approx nq-t\sqrt{npq}$$, and that $$\log(1+x)=x-x^2/2+O(x^3)$$ when $$x\to0$$, we see that
\begin{eqnarray*}
\lefteqn{\log\left[\left(1+\frac{t\sqrt{q}}{\sqrt{np}}\right)^{-k} \left(1- \frac{t\sqrt{p}}{\sqrt{nq}}\right)^{-(n-k)}\right]}
\\ &=& -(np+t\sqrt{npq}) \log\left(1+\frac{t\sqrt{q}}{\sqrt{np}} \right)
- (nq-t\sqrt{npq}) \log\left(1-\frac{t\sqrt{p}}{\sqrt{nq}}\right)
\\ &=& -(np+t\sqrt{npq}) \left(\frac{t\sqrt{q}}{\sqrt{np}}-\frac{t^2 q}{2np} \right)
- (nq-t\sqrt{npq}) \left( -\frac{t\sqrt{p}}{\sqrt{nq}} - \frac{t^2 p}{2nq} \right) + O\left(\frac{t^3}{\sqrt{n}}\right)
\\ &=& -t\sqrt{npq} -t^2q + \frac{t^2 q}{2} + t \sqrt{npq} - t^2 p + \frac{t^2 p}{2} + O\left(\frac{t^3}{\sqrt{n}}\right)
\\ &=& -\frac{t^2}{2} + O\left(\frac{t^3}{\sqrt{n}}\right).
\end{eqnarray*}
It follows that
\[ \prob(S_n = k) = \frac{1+o(1)}{\sqrt{2\pi n p q}} e^{-t^2/2}. \]
In other words, the individual probabilities for $$S_n$$ approximate a normal density! From here, it is not too hard to show that the probability
\[ \prob\left(a\le \frac{S_n - np}{\sqrt{npq}} \le b\right) = \sum_{np+a\sqrt{npq}\le k\le np+b\sqrt{npq}} \prob(S_n=k) \]
is approximately a Riemann sum, with mesh $$1/\sqrt{npq}$$, for the integral $$(2\pi)^{-1/2}\int_a^b e^{-x^2/2}\,dx = \Phi(b)-\Phi(a)$$. In fact, this is true since for $$a,b$$ fixed and $$k$$ ranging between $$np+a\sqrt{npq}$$ and $$np+b\sqrt{npq}$$, the error concealed by the $$o(1)$$ term is uniformly small (smaller than any $$\epsilon>0$$, say, when $$n$$ is sufficiently large). This is because the error term originates with three applications of Stirling's approximation formula (for $$n!$$, for $$k!$$ and for $$(n-k)!$$), followed by the second-order Taylor expansion of the $$\log$$ function above.
% This requires having slightly better estimates for $$n!$$ that are uniform in $$n$$ -- for example, the proof can be completed using the precise bounds
% \[ 1 \le \frac{n!}{\sqrt{2\pi n}(n/e)^n} \le e^{1/(12n)}. \]
\end{proof}
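The two approximations used in the sketch, the local estimate $$\prob(S_n=k)\approx e^{-t^2/2}/\sqrt{2\pi npq}$$ and the resulting convergence of the distribution function to $$\Phi$$, can be observed numerically. The code below is an illustrative sketch only: the function names and the parameter choices $$n=1000$$, $$p=0.3$$ are assumptions made for the example, not part of the text.

```python
import math


def binom_pmf(n: int, p: float, k: int) -> float:
    """P(S_n = k) for S_n ~ Binom(n, p), via log-gamma to stay in float range."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def local_approx(n: int, p: float, k: int) -> float:
    """Gaussian approximation e^{-t^2/2} / sqrt(2 pi n p q), t = (k - np)/sqrt(npq)."""
    sigma = math.sqrt(n * p * (1 - p))
    t = (k - n * p) / sigma
    return math.exp(-t * t / 2) / (sigma * math.sqrt(2 * math.pi))


def Phi(t: float) -> float:
    """Standard normal distribution function, expressed through erf."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


def standardized_cdf(n: int, p: float, t: float) -> float:
    """P((S_n - np)/sqrt(npq) <= t), summed exactly over the binomial pmf."""
    sigma = math.sqrt(n * p * (1 - p))
    k_max = min(math.floor(n * p + t * sigma), n)
    return sum(binom_pmf(n, p, k) for k in range(0, k_max + 1))


n, p = 1000, 0.3
for k in (280, 300, 320):
    print(k, binom_pmf(n, p, k), local_approx(n, p, k))
for t in (-1.0, 0.5, 1.0, 2.0):
    print(t, standardized_cdf(n, p, t), Phi(t))
```

For a fixed $$t$$ the two columns in each loop move closer together as $$n$$ is increased, although the distribution functions agree only up to an error of order $$1/\sqrt{n}$$ because $$S_n$$ is lattice-valued.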

One lesson that can be learned from this proof is that doing computations for specific distributions can be \emph{messy}! So we might be better off looking for more general, and therefore more conceptual, techniques for proving convergence to the normal distribution that require fewer explicit computations. Fortunately, such techniques exist, and they will lead us to the much more general Central Limit Theorem.