12 Convergence in distribution
Definition
Since we will be talking about convergence of the distribution of random variables to the normal distribution, it makes sense to develop the general theory of convergence of distributions to a limiting distribution.
Definition: Converging Distribution Functions 

Let \((F_n)_{n=1}^\infty\) be a sequence of distribution functions. We say that \(F_n\) converges to a limiting distribution function \(F\), and denote this by \(F_n \implies F\), if \(F_n(x)\to F(x)\) as \(n\to\infty\) for any \(x\in\R\) which is a continuity point of \(F\). If \(X, (X_n)_{n=1}^\infty\) are random variables, we say that \(X_n\) converges in distribution to \(X\) (or, interchangeably, converges in distribution to \(F_X\)) if \(F_{X_n}\implies F_X\).
This definition, which may seem unnatural at first sight, will become more reasonable after we prove the following lemma.
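One way to see why the definition only asks for convergence at continuity points is the following small example, which is not part of the original notes: take the constant random variables \(X_n = 1/n\) and \(X = 0\). Intuitively \(X_n\) should converge to \(X\), yet \(F_{X_n}(0) = 0\) for every \(n\) while \(F_X(0) = 1\). The sketch below (function names are illustrative) evaluates both distribution functions at a few points; convergence fails only at \(x = 0\), the single discontinuity point of \(F_X\).

```python
def cdf_point_mass(a, x):
    """Distribution function of the constant random variable X = a, evaluated at x."""
    return 1.0 if x >= a else 0.0


def F_n(n, x):
    """Distribution function of X_n = 1/n (an illustrative choice, not from the notes)."""
    return cdf_point_mass(1.0 / n, x)


def F(x):
    """Distribution function of the candidate limit X = 0."""
    return cdf_point_mass(0.0, x)


for x in (-0.5, 0.0, 0.5):
    values = [F_n(n, x) for n in (1, 10, 100, 1000)]
    print(f"x = {x:5.2f}  F_n(x) for n = 1, 10, 100, 1000: {values}  F(x) = {F(x)}")
```

At \(x = \pm 0.5\) the values of \(F_n(x)\) eventually agree with \(F(x)\); at \(x = 0\) they stay at 0 although \(F(0) = 1\), and the definition deliberately ignores that point.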
Lemma 

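The following are equivalent:

 1. \(X_n\) converges in distribution to \(X\).
 2. \(\expec f(X_n) \xrightarrow[n\to\infty]{} \expec f(X)\) for any bounded continuous function \(f:\R\to\R\).
 3. There exist random variables \(Y, (Y_n)_{n=1}^\infty\), all defined on a single probability space \((\Omega,{\cal F},\prob)\), such that \(F_Y \equiv F_X\), \(F_{Y_n}\equiv F_{X_n}\) for all \(n\), and \(Y_n\to Y\) almost surely.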

Proof 

Proof that \(2\implies 1\): Assume that \(\expec f(X_n) \xrightarrow[n\to\infty]{} \expec f(X)\) for any bounded continuous function \(f:\R\to\R\), and fix \(x\in \R\). For any \(t\in \R\) and \(\epsilon>0\), define a bounded continuous function \(g_{t,\epsilon}:\R\to\R\) by \[ g_{t,\epsilon}(u) = \begin{cases} 1 & u < t, \\ \dfrac{t+\epsilon-u}{\epsilon} & t \le u \le t+\epsilon, \\ 0 & u > t+\epsilon. \end{cases} \] Then we have that \[ \expec (g_{x-\epsilon,\epsilon}(X_n)) \le F_{X_n}(x) = \expec(\ind_{(-\infty,x]}(X_n)) \le \expec(g_{x,\epsilon}(X_n)).\] Letting \(n\to\infty\) and applying the assumption to the functions \(g_{x-\epsilon,\epsilon}\) and \(g_{x,\epsilon}\) gives the chain of inequalities \[ F_{X}(x-\epsilon) \le \expec(g_{x-\epsilon,\epsilon}(X)) \le \liminf_{n\to\infty} F_{X_n}(x) \le \limsup_{n\to\infty} F_{X_n}(x) \le \expec(g_{x,\epsilon}(X)) \le F_X(x+\epsilon). \] Now if \(x\) is a point of continuity of \(F_X\), letting \(\epsilon \downarrow 0\) gives that \(\lim_{n\to\infty}F_{X_n}(x) = F_X(x)\).
Proof that \(3\implies 2\): Let \(f\) be bounded and continuous. Since \(f\) is continuous, \(f(Y_n)\to f(Y)\) almost surely, so the bounded convergence theorem applied to the sequence \(f(Y_n)\) gives \(\expec f(Y_n)\to \expec f(Y)\). Because \(Y_n\) has the same distribution as \(X_n\) and \(Y\) has the same distribution as \(X\), this is the same as \(\expec f(X_n)\to\expec f(X)\).
Proof that \(1 \implies 3\): Take \((\Omega,{\cal F},\prob) = ((0,1),{\cal B}(0,1), \textrm{Leb})\). For each \(n\ge 1\), let \(Y_n(x) = \sup\{ y : F_{X_n}(y) < x \}\) be the lower quantile function of \(X_n\), as discussed in a previous lecture, and similarly let \(Y(x)=\sup\{ y : F_X(y)<x\}\) be the lower quantile function of \(X\). Then as we previously showed, we have \(F_Y \equiv F_X\) and \(F_{Y_n}\equiv F_{X_n}\) for all \(n\). It remains to show that \(Y_n(x)\to Y(x)\) for almost all \(x\in(0,1)\). In fact, we show that this is true for all but a countable set of \(x\)'s. Denote \(Y^*(x) = \inf\{ y : F_X(y)>x\}\) (the upper quantile function of \(X\)). As we have seen, we always have \(Y(x) \le Y^*(x)\), and \(Y(x) = Y^*(x)\) for all \(x\in(0,1)\) except on a countable set of \(x\)'s (the exceptional \(x\)'s correspond to intervals where \(F_X\) is constant; these intervals are disjoint and each one contains a rational point). Let \(x\in(0,1)\) be such that \(Y(x)=Y^*(x)\).
This means that for any \(y<Y(x)\) we have \(F_X(y)<x\), and for any \(z>Y(x)\) we have \(F_X(z)>x\). Now, take a \(y<Y(x)\) which is a continuity point of \(F_X\). Then \(F_{X_n}(y)\to F_X(y)\) as \(n\to\infty\), so also \(F_{X_n}(y)< x\) for sufficiently large \(n\), which means (by the definition of \(Y_n\)) that \(Y_n(x)\ge y\) for such large \(n\). This establishes that \(\liminf_{n\to\infty} Y_n(x)\ge y\), and therefore that \(\liminf_{n\to\infty} Y_n(x)\ge Y(x)\), since we have continuity points \(y<Y(x)\) that are arbitrarily close to \(Y(x)\). Similarly, take a \(z>Y(x)\) which is a continuity point of \(F_X\). Then \(F_{X_n}(z)\to F_X(z)\) as \(n\to\infty\), so also \(F_{X_n}(z)>x\) for large \(n\), which implies that \(Y_n(x)\le z\). Again, by taking continuity points \(z>Y(x)\) that are arbitrarily close to \(Y(x)\) we get that \(\limsup_{n\to\infty} Y_n(x) \le Y(x)\). Combining these last two results shows that \(Y_n(x)\to Y(x)\), which is what we wanted.
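The quantile coupling used in the proof that \(1\implies 3\) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the notes: it uses the scaled geometric variables \(pX_p\) with \(\prob(X_p\ge n)=(1-p)^{n-1}\) from the examples section, for which a short computation from the definition \(Y(x)=\sup\{y : F(y)<x\}\) gives the lower quantile function \(p\lceil \log(1-x)/\log(1-p)\rceil\), and compares it with the quantile function \(-\log(1-x)\) of \(\textrm{Exp}(1)\). The function names are hypothetical.

```python
import math


def scaled_geom_quantile(p, x):
    """Lower quantile sup{y : F(y) < x} of p * X_p, where P(X_p >= n) = (1 - p)**(n - 1).

    F(y) = 1 - (1 - p)**floor(y / p) for y >= 0, so F(y) < x exactly when
    floor(y / p) < c with c = log(1 - x) / log(1 - p), i.e. when y < p * ceil(c).
    """
    c = math.log(1.0 - x) / math.log(1.0 - p)
    return p * math.ceil(c)


def exp_quantile(x):
    """Quantile function of Exp(1): the solution y of 1 - exp(-y) = x."""
    return -math.log(1.0 - x)


for x in (0.1, 0.5, 0.9):
    row = [round(scaled_geom_quantile(p, x), 5) for p in (0.1, 0.01, 0.001)]
    print(f"x = {x}: Y_p(x) for p = 0.1, 0.01, 0.001 -> {row}, limit {exp_quantile(x):.5f}")
```

For each fixed \(x\in(0,1)\) the quantiles \(Y_p(x)\) approach the limiting quantile as \(p\downarrow 0\), which is the pointwise convergence on \(((0,1),{\cal B}(0,1),\textrm{Leb})\) that the proof establishes in general.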
Examples
 Normal convergence: We showed that if \(X_1,X_2,\ldots \) are i.i.d. \(\textrm{Binom}(1,p)\) r.v.'s and \(S_n=\sum_{k=1}^n X_k\), then \[ \frac{S_n - n \expec(X_1)}{\sqrt{n}\, \sigma(X_1)} \implies N(0,1). \] Similarly, using explicit computations (see the homework) it is not too difficult to see that this is also true when \(X_1\sim \textrm{Poisson}(1)\), \(X_1 \sim \textrm{Exp}(1)\), and in other specific examples. The central limit theorem generalizes this claim to any i.i.d. sequence with finite variance.
 Waiting for rare events: If for each \(0<p<1\) we have a r.v. \(X_p \sim \textrm{Geom}_0(p)\), then \(\prob(X_p \ge n) = (1-p)^{n-1}\). It follows that \[ \prob(p X_p > x) = (1-p)^{\lfloor x/p \rfloor} \xrightarrow[p\downarrow 0]{} e^{-x}, \qquad (x > 0),\] so \[ p X_p \implies \textrm{Exp}(1)\qquad \textrm{ as }p\downarrow 0. \]
 Pólya's urn: Let \(X_n\) be the number of white balls in the Pólya urn experiment after starting with one white ball and one black ball and performing the experiment for \(n\) steps (so that there are \(n+2\) balls). In a homework exercise we showed that \(X_n\) is a discrete uniform r.v. on \(\{1,2,\ldots,n+1\}\). It follows easily that the proportion of white balls in the urn converges in distribution: \[ \frac{X_n}{n+2} \implies U(0,1). \]
 Gumbel distribution: If \(X_1,X_2,\ldots\) are i.i.d. \(\textrm{Exp}(1)\) random variables, and \(M_n=\max(X_1,\ldots,X_n)\), we showed in a homework exercise that \[ \prob(M_n-\log n \le x) \xrightarrow[n\to\infty]{} e^{-e^{-x}}, \qquad x\in\R. \] It follows that \[ M_n-\log n \implies F, \] where \(F(x) = \exp\left(-e^{-x}\right)\) is called the Gumbel distribution.
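Three of the limits above can be checked numerically from exact formulas, without simulation. The following is a minimal sketch with illustrative function names; it assumes the formula \(\prob(pX_p>x)=(1-p)^{\lfloor x/p\rfloor}\) and the uniform law of \(X_n\) stated in the examples, together with the standard identity \(\prob(M_n\le t)=(1-e^{-t})^n\) for \(t\ge 0\), which follows from independence.

```python
import math


def geom_scaled_survival(p, x):
    """P(p * X_p > x) = (1 - p)**floor(x / p), as stated in the rare-events example."""
    return (1.0 - p) ** math.floor(x / p)


def polya_proportion_cdf(n, x):
    """P(X_n / (n + 2) <= x) when X_n is uniform on {1, ..., n + 1}."""
    k = min(max(math.floor(x * (n + 2)), 0), n + 1)
    return k / (n + 1)


def max_exp_cdf(n, x):
    """P(M_n - log n <= x) for the maximum of n i.i.d. Exp(1) variables."""
    t = x + math.log(n)
    return (1.0 - math.exp(-t)) ** n if t > 0 else 0.0


def gumbel_cdf(x):
    """Limiting Gumbel distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


x = 1.0
for p in (0.1, 0.01, 0.0001):
    print("geometric:", p, geom_scaled_survival(p, x), "limit", math.exp(-x))
for n in (10, 1000, 10**6):
    print("Polya urn:", n, polya_proportion_cdf(n, 0.3), "limit", 0.3)
    print("maximum:  ", n, max_exp_cdf(n, x), "limit", gumbel_cdf(x))
```

In each case the printed values approach the stated limit as \(p\downarrow 0\) or \(n\to\infty\).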
Compactness and tightness
Theorem: Helly's Selection Theorem 

If \((F_n)_{n=1}^\infty\) is a sequence of distribution functions, then there is a subsequence \(F_{n_k}\) and a right-continuous, nondecreasing function \(H:\R\to[0,1]\) such that \[ F_{n_k}(x)\xrightarrow[k\to\infty]{} H(x)\] holds for any \(x\in\R\) which is a continuity point of \(H\).
Note. The subsequential limit \(H\) need not be a distribution function, since it may not satisfy the properties \(\lim_{x\to-\infty} H(x) = 0\) or \(\lim_{x\to\infty} H(x)=1\). For example, taking \(F_n = F_{X_n}\), where \(X_n \sim U[-n,n]\), we see that \(F_n(x)\to 1/2\) for all \(x\in\R\). For a more interesting example, take \(G_n = (F_n + F_{Z_n})/2\) where \(F_n\) are as in the previous example, and \(Z_n\) is some sequence of r.v.'s that converges in distribution.
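The \(U[-n,n]\) example in the note can be made concrete with a short computation. The sketch below (function names are illustrative) evaluates \(F_n(x)\) at a fixed point and the probability that \(U[-n,n]\) assigns to a fixed window \([-M,M]\); the first tends to \(1/2\) and the second to \(0\), which is the sense in which the mass escapes to infinity.

```python
def uniform_cdf(n, x):
    """Distribution function of U[-n, n] evaluated at x."""
    if x <= -n:
        return 0.0
    if x >= n:
        return 1.0
    return (x + n) / (2.0 * n)


def mass_in_window(n, M):
    """Probability that U[-n, n] assigns to the fixed interval [-M, M]."""
    return uniform_cdf(n, M) - uniform_cdf(n, -M)


for n in (1, 10, 1000, 10**6):
    print(n, uniform_cdf(n, 3.0), mass_in_window(n, 5.0))
```

The limit function is the constant \(1/2\), which is nondecreasing and right-continuous but has the wrong limits at \(\pm\infty\).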
Proof 

First, note that we can find a subsequence \((n_k)_{k=1}^\infty\) such that \(F_{n_k}(r)\) converges to a limit \(G(r)\) at least for any rational number \(r\). This is done by combining the compactness of the interval \([0,1]\) (which implies that for any specific \(a\in\R\) we can always take a subsequence to make the sequence of numbers \(F_n(a)\) converge to a limit) with a diagonal argument (for some enumeration \(r_1, r_2, r_3, \ldots\) of the rationals, first take a subsequence to force convergence at \(r_1\); then take a subsequence of that subsequence to force convergence at \(r_2\), etc.; now form a subsequence whose \(k\)th term is the \(k\)th term of the \(k\)th subsequence in this series).
Now, use \(G(\cdot)\), which is defined only on the rationals and not necessarily right-continuous (but is nondecreasing), to define a function \(H:\R \to \R\) by \[ H(x) = \inf\{ G(r) : r\in\mathbb{Q}, r>x \}. \] This function is clearly nondecreasing, and is also right-continuous, since we have \[ \lim_{x_n \downarrow x} H(x_n) = \inf\{ G(r) : r\in\mathbb{Q}, r>x_n\textrm{ for some }n \} = \inf\{ G(r) : r\in\mathbb{Q}, r>x \} = H(x). \]
Finally, let \(x\) be a continuity point of \(H\). To show that \(F_{n_k}(x)\to H(x)\), fix some \(\epsilon>0\) and let \(r_1,r_2,s\) be rationals such that \(r_1 < r_2 < x < s\) and \[ H(x)-\epsilon < H(r_1) \le H(r_2) \le H(x) \le H(s) < H(x)+\epsilon. \] Then since \(F_{n_k}(r_2)\to G(r_2)\ge H(r_1)\), and \(F_{n_k}(s)\to G(s)\le H(s)\), it follows that for sufficiently large \(k\) we have \[ H(x)-\epsilon < F_{n_k}(r_2) \le F_{n_k}(x) \le F_{n_k}(s) < H(x)+\epsilon. \] Therefore \[ H(x)-\epsilon \le \liminf_{k\to\infty} F_{n_k}(x) \le \limsup_{k\to\infty} F_{n_k}(x) \le H(x)+\epsilon, \] and since \(\epsilon\) was arbitrary this proves the claim.
Helly's selection theorem can be thought of as a kind of compactness property for probability distributions, except that the subsequential limit guaranteed to exist by the theorem is not necessarily a distribution function. To ensure that we get a distribution function, it turns out that a certain property called tightness has to hold.
Definition 

A sequence \((\mu_n)_{n=1}^\infty\) of probability measures on \((\R,{\cal B})\) is called tight if for any \(\epsilon>0\) there exists an \(M>0\) such that \[ \liminf_{n\to\infty} \mu_n([-M,M]) \ge 1-\epsilon. \] A sequence of distribution functions \((F_n)_{n=1}^\infty\) is called tight if the associated probability measures determined by \(F_n\) form a tight sequence, or, more explicitly, if for any \(\epsilon>0\) there exists an \(M>0\) such that \[ \limsup_{n\to\infty} (1-F_n(M)+F_n(-M)) < \epsilon. \] A sequence of random variables is called tight if the sequence of their distribution functions is tight.
Theorem 

If \((F_n)_{n=1}^\infty\) is a tight sequence of distribution functions, then there exists a subsequence \((F_{n_k})_{k=1}^\infty\) and a distribution function \(F\) such that \(F_{n_k} \implies F\). In fact, any subsequential limit \(H\) as guaranteed to exist in the previous theorem is a distribution function. 
Exercise 

Prove that the converse is also true, i.e., if a sequence is not tight then it must have at least one subsequential limit \(H\) (in the sense of the subsequence converging to \(H\) at any continuity point of \(H\)) that is not a proper distribution function. In particular, it is worth noting that a sequence that converges in distribution is tight. 
Proof 

Let \(H\) be a nondecreasing, right-continuous function that arises as a subsequential limit in distribution of a subsequence \(F_{n_k}\), which we know exists by Helly's selection theorem. To show that \(H\) is a distribution function, fix \(\epsilon>0\), and let \(M>0\) be the constant guaranteed to exist in the definition of tightness. Let \(x<-M\) be a continuity point of \(H\). We have \[ H(x)=\lim_{k\to\infty} F_{n_k}(x) \le \limsup_{k\to\infty} F_{n_k}(-M) \le \limsup_{k\to\infty} (F_{n_k}(-M)+(1-F_{n_k}(M)) ) < \epsilon, \] so this shows that \(\lim_{x\to-\infty} H(x) = 0\). Similarly, let \(x>M\) be a continuity point of \(H\). Then \[ H(x)=\lim_{k\to\infty} F_{n_k}(x) \ge \liminf_{k\to\infty} F_{n_k}(M) \ge \liminf_{k\to\infty} (F_{n_k}(M)-F_{n_k}(-M) ) > 1-\epsilon, \] which shows that \(\lim_{x\to\infty} H(x)=1.\)
The condition of tightness is not very restrictive, and in practical situations it is usually quite easy to verify. The following lemma gives an example that is relevant for our purposes.
Lemma 

If \(X_1,X_2,\ldots\) are r.v.'s such that \(\expec X_n=0\) and \(\var(X_n)<C\) for all \(n\), then \((X_n)_n\) is a tight sequence. 
Proof 

Use Chebyshev's inequality: \[\prob(|X_n|>M) \le \frac{\var(X_n)}{M^2} \le \frac{C}{M^2},\] so, if \(\epsilon>0\) is given, taking \(M=\sqrt{C/\epsilon}\) ensures that the left-hand side is bounded by \(\epsilon\) for every \(n\), that is, \(\prob(|X_n|\le M)\ge 1-\epsilon\) for every \(n\).
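The choice \(M=\sqrt{C/\epsilon}\) in the proof can be tried out on a concrete family. The sketch below is only an illustration under stated assumptions: it takes \(X_n\sim N(0,\sigma_n^2)\) with \(\sigma_n^2 = C\,n/(n+1) < C\) (a hypothetical family, chosen because its tail probabilities are available exactly through `math.erfc`) and confirms that every two-sided tail beyond \(M\) stays below \(\epsilon\). The function names are illustrative.

```python
import math


def chebyshev_radius(C, eps):
    """The radius M = sqrt(C / eps) used in the proof, so that C / M**2 = eps."""
    return math.sqrt(C / eps)


def normal_two_sided_tail(sigma, M):
    """Exact P(|X| > M) for X ~ N(0, sigma**2), via the complementary error function."""
    return math.erfc(M / (sigma * math.sqrt(2.0)))


C, eps = 4.0, 0.05
M = chebyshev_radius(C, eps)

# Hypothetical family with mean 0 and variance C * n / (n + 1), which is always below C.
tails = [normal_two_sided_tail(math.sqrt(C * n / (n + 1)), M) for n in range(1, 201)]
print("M =", M, " largest tail =", max(tails), " Chebyshev bound =", C / M**2)
```

For this family the actual tails are far smaller than the Chebyshev bound; the bound is crude, but it is uniform in \(n\), which is all that tightness requires.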
Contributors
 Dan Romik (UC Davis)