
10 Applications and further examples

The Weierstrass approximation theorem

As an application of WLLN (or, rather, of Chebyshev's inequality), we prove the following theorem in analysis, which seems to have no connection to probability whatsoever.

Theorem: The Weierstrass Approximation Theorem
If \(f:[0,1]\to\R\) is a continuous function, then \(f\) can be uniformly approximated by polynomials. That is, for any \(\epsilon>0\) there exists a polynomial \(p\) such that \(||f-p||:=\max_{x\in[0,1]}|f(x)-p(x)|<\epsilon\).
Proof

Let \(f:[0,1]\to\R\) be a continuous function. Define the sequence of Bernstein polynomials of \(f\) by

\[ B_n(x)= B_n^f(x) = \sum_{k=0}^n f\left(\frac{k}{n}\right) \binom{n}{k} x^k (1-x)^{n-k}. \]

We claim that \(||B_n-f||\to 0\) as \(n\to\infty\), which will prove the theorem. We will prove this by giving a probabilistic interpretation to \(B_n\). Indeed, fix \(x\in [0,1]\), and let \(X_1,X_2,\ldots\) be a sequence of i.i.d.\ r.v.'s with the Bernoulli distribution \(\textrm{Binom}(1,x)\). Denote \(S_n=\sum_{k=1}^n X_k\). Then it is easy to see that

\[ B_n(x) = \expec_x \left[f\left(\frac{S_n}{n}\right)\right], \]

where the notation ``\(\expec_x\)'' just means taking the expectation, while at the same time highlighting the fact that the distribution of \(S_n\) depends on the parameter \(x\).
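As a quick numerical illustration of this identity, the following sketch (with an arbitrary choice of test function \(f\) and of the parameters \(n\), \(x\); the use of Python/NumPy is purely for illustration) compares a direct evaluation of \(B_n(x)\) with a Monte Carlo estimate of \(\expec_x\left[f(S_n/n)\right]\):

```python
import numpy as np
from math import comb

def f(u):
    # an arbitrary continuous test function on [0, 1]
    return np.sin(2 * np.pi * u)

def bernstein(f, n, x):
    # direct evaluation of B_n(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

rng = np.random.default_rng(0)
n, x = 50, 0.3
s_n = rng.binomial(n, x, size=200_000)   # many independent copies of S_n ~ Binom(n, x)
print(bernstein(f, n, x))                # exact value of B_n(x)
print(f(s_n / n).mean())                 # Monte Carlo estimate of E_x[f(S_n / n)]
```

The two printed numbers should agree up to the Monte Carlo error.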

The idea is the following: the law of large numbers implies that \(S_n/n\) is with high probability close to its mean \(x\); since \(f\) is continuous, \(f(S_n/n)\) should then with high probability be close to \(f(x)\), and therefore its average value, namely \(B_n(x)\), should also be close to \(f(x)\), which is what we want. However, we want to make this claim uniformly in \(x\), so instead of invoking the WLLN (Theorem~\ref{thm-wlln}) we have to go back and look ``under the hood'' at Chebyshev's inequality, which we used to prove it. We estimate \(|B_n(x)-f(x)|\) as follows. Fix some arbitrary \(\epsilon>0\). Let \(\delta>0\) be such that for any \(u,v\in[0,1]\), if \(|u-v|<\delta\) then \(|f(u)-f(v)|<\epsilon\) (such a \(\delta\) exists because \(f\) is uniformly continuous on \([0,1]\)). Then

\begin{eqnarray*}
|B_n(x)-f(x)|\!\!\! &=&\!\!\! \left|\expec_x f\left(\frac{S_n}{n}\right)-f(x)\right| \le \expec_x \left|f\left(\frac{S_n}{n}\right)-f(x)\right|
\\ &=&\!\!\! \expec_x \left[\left|f\left(\frac{S_n}{n}\right)-f(x)\right|\ind_{\{ \left|\frac{S_n}{n}-x\right|>\delta \}}\right] +
\expec_x \left[\left|f\left(\frac{S_n}{n}\right)-f(x)\right|\ind_{\{ \left|\frac{S_n}{n}-x\right|\le\delta \}}\right]
\end{eqnarray*}

In this last expression, each of the two expectations is small for a different reason.

The second expectation is bounded by \(\epsilon\), since on the event that \(|S_n/n-x|\le \delta\), we have that \(|f(S_n/n)-f(x)|<\epsilon\). To bound the first expectation, denote \(M=||f||:=\max_{x\in[0,1]}|f(x)|<\infty\). Then, by bounding the difference of \(f\)-values by \(2M\) and then using Chebyshev's inequality, we get

 \[
\expec_x \left[\left|f\left(\frac{S_n}{n}\right)-f(x)\right|\ind_{\{ \left|\frac{S_n}{n}-x\right|>\delta \}} \right]
\le 2M\cdot \prob_x\!\left( \left|\frac{S_n}{n}-x\right|>\delta\right)
\le \frac{2 M \sigma^2(X_1)}{n \delta^2} = \frac{2 M x(1-x)}{n\delta^2}.
\]

This bound converges to \(0\), not just for a single \(x\) but (fortunately for us) uniformly in \(x\in[0,1]\), since \(x(1-x)\le 1/4\) and therefore the bound is at most \(M/(2n\delta^2)\). So we have shown that

\[ ||B_n-f|| = \max_{x\in[0,1]} |B_n(x)-f(x)| \le \epsilon + \dfrac{M}{2n\delta^2}. \]

It follows that \(\limsup_{n\to\infty} ||B_n-f|| \le \epsilon\), and since \(\epsilon\) was an arbitrary positive number the result follows.
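The uniform convergence can also be observed numerically. The short sketch below (again with an arbitrary choice of \(f\); it is an illustration, not part of the proof) evaluates the sup-norm error \(\max_x|B_n(x)-f(x)|\) on a fine grid for a few values of \(n\):

```python
import numpy as np
from math import comb

def f(u):
    return np.abs(u - 0.5)      # continuous on [0, 1], not differentiable at 1/2

def bernstein_values(n, xs):
    # evaluate B_n at all grid points at once
    ks = np.arange(n + 1)
    coeffs = np.array([comb(n, k) for k in ks], dtype=float)
    basis = coeffs * xs[:, None]**ks * (1 - xs[:, None])**(n - ks)
    return basis @ f(ks / n)

xs = np.linspace(0.0, 1.0, 1001)
for n in (10, 50, 200):
    print(n, np.max(np.abs(bernstein_values(n, xs) - f(xs))))   # sup-norm error shrinks with n
```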

Infinite expectations and triangular arrays

After treating the ``classical'' case of an i.i.d.\ sequence with finite expectations, let's turn to slightly more exotic situations. First, what happens if the expectation is infinite? For the strong law, the following result shows that we have no hope of having convergence in a meaningful sense.

Theorem: Converse to SLLN

If \(X_1,X_2,\ldots,\) is an i.i.d.\ sequence of r.v.'s with \(\expec|X_1|=\infty\), then

\[ \prob\left( \exists \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n X_k \right) = 0. \]

What about the weak law? It turns out that a weak law can still hold in certain situations with infinite expectations, although one has to consider a generalized empirical average where the sum of the first \(n\) samples is divided by a quantity growing faster than \(n\). Rather than develop a complete theory, we will consider the particular example of the St. Petersburg Lottery. In this example, the amount won in a single round of the lottery is an integer-valued random variable with the following distribution:

\[ \prob(X=2^k) = 2^{-k}, \qquad k=1,2,3,\ldots \]

Let \(X_1,X_2,\ldots,\) be an i.i.d.\ sequence with the same distribution, and let \(S_n = \sum_{k=1}^n X_k\). How much should you agree to pay to be allowed to play this lottery \(n\) times? (Even the seemingly simple case \(n=1\) of this question has been the subject of considerable debate among economists! See \verb+http://en.wikipedia.org/wiki/St...sburg_paradox+). In other words, how big should we expect \(S_n\) to be, with probability close to 1?

 

Theorem
\[ \frac{S_n}{n\log_2 n} \xrightarrow[n\to\infty]{\prob} 1. \]

In other words, to be allowed to play the game \(n\) times when \(n\) is large (and assuming the payoff is in dollars), it may be considered reasonable to pay exactly (or even better, slightly less than) \(\log_2 n\) dollars per round played. For example, if \(n=1024\) you would be paying $10 per round.
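To get a feel for this statement, here is a small simulation sketch (the sample sizes are arbitrary; since the convergence is only in probability and quite slow, the printed ratios fluctuate around \(1\), sometimes by a lot):

```python
import numpy as np

rng = np.random.default_rng(1)

def st_petersburg_total(n):
    # one round pays 2^K where P(K = k) = 2^{-k}, k = 1, 2, ...;
    # numpy's geometric with p = 0.5 has exactly this distribution for K
    k = rng.geometric(0.5, size=n)
    return np.sum(2.0 ** k)

for n in (10**3, 10**5, 10**7):
    print(n, st_petersburg_total(n) / (n * np.log2(n)))   # ratio S_n / (n log2 n)
```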


\begin{proof}
The proof uses a truncation idea similar to the one we saw in the proof of SLLN, except that for each \(n\) we will truncate the first \(n\) variables at a level which is a function of \(n\). Denote \(b_n = n\log_2 n\), \(Y_{n,k} = X_k \ind_{\{X_k<b_n\}}\), \(T_n=\sum_{k=1}^n Y_{n,k}\), and \(a_n = \expec(T_n)\). We will prove that

\begin{equation}\label{eq:thenprove}
\frac{S_n-a_n}{b_n}\xrightarrow[n\to\infty]{\prob} 0.
\end{equation}

First we check that this is enough, by estimating \(a_n\):
\begin{eqnarray*}
a_n &=& \sum_{k=1}^n \expec(Y_{n,k}) = \sum_{k=1}^n \expec(X_k \ind_{\{X_k<b_n\}})
= \sum_{k=1}^n \left(\frac12\cdot 2 + \frac14\cdot 4 + \frac18 \cdot 8 + \ldots + \frac{1}{2^{m_{n}}}\cdot 2^{m_{n}}
\right)
\end{eqnarray*}

where \(m_n\) is the largest integer such that \(2^{m_n} < b_n\), or in other words \( m_n = \lfloor \log_2 n + \log_2 \log_2 n \rfloor\), which gives

\[ a_n = \sum_{k=1}^n \lfloor \log_2 n + \log_2 \log_2 n \rfloor = n \log_2 n + O(n \log_2 \log_2 n), \]

so \(a_n\) indeed behaves like \(n\log_2 n\) up to first-order asymptotics.

Now to prove \eqref{eq:thenprove}, note that for any \(\epsilon>0\),

\begin{equation} \label{eq:lhs}
\prob\left( \left|\frac{S_n-a_n}{b_n}\right| > \epsilon\right) \le
\prob(T_n\neq S_n) + \prob\left( \left|\frac{T_n-a_n}{b_n}\right| > \epsilon\right).
\end{equation}

In this bound, the first term is bounded by

\[ \sum_{k=1}^n \prob(Y_{n,k}\neq X_k) = \sum_{k=1}^n \prob(X_k\ge b_n) \le \sum_{k=1}^n \frac{2}{b_n} = \frac{2}{\log_2 n} \xrightarrow[n\to\infty]{} 0. \]

To bound the second term, use Chebyshev's inequality and the fact that \(\var(Y_{n,k})\le \expec(Y_{n,k}^2)\) to write

\begin{eqnarray*}
\prob\left( \left|\frac{T_n-a_n}{b_n}\right| > \epsilon\right) &\le& \frac{\var(T_n)}{\epsilon^2 b_n^2}
\le \frac{1}{\epsilon^2 b_n^2} \sum_{k=1}^n \expec(Y_{n,k}^2) \\
&\le& \frac{1}{\epsilon^2 b_n^2} \sum_{k=1}^n
\left(\frac12\cdot 2^2 + \frac14\cdot 4^2 + \frac18 \cdot 8^2 + \ldots + \frac{1}{2^{m_{n}}}\cdot 2^{2m_{n}} \right)
\\ &\le&
\frac{1}{\epsilon^2 b_n^2} \sum_{k=1}^n 2\cdot 2^{m_n}
\le \frac{2}{\epsilon^2 b_n^2} \sum_{k=1}^n b_n = \frac{2}{\epsilon^2 \log_2 n} \xrightarrow[n\to\infty]{} 0.
\end{eqnarray*}

We conclude that the left-hand side of \eqref{eq:lhs} converges to 0, as \(n\to\infty\), which finishes the proof.
\end{proof}

Another twist on laws of large numbers comes when we replace the notion of an i.i.d.\ sequence by the more general notion of a triangular array. In this case, for each \(n\) we have a sequence \(X_{n,1},X_{n,2},\ldots,X_{n,n}\) of independent, but not necessarily identically distributed, r.v.'s, and we denote \(S_n=\sum_{k=1}^n X_{n,k}\) -- this is the sum of the samples in the \(n\)-th experiment. Here, for each \(n\) there could be a separate experiment involving \(n\) different r.v.'s, and the r.v.'s \(X_{n,k}\) and \(X_{m,j}\) for \(n\neq m\) are not even assumed to be defined on the same probability space, let alone to be independent of each other.

Again, instead of giving general conditions for a law of large numbers to hold, consider the following example of the so-called coupon collector's problem: a brand of breakfast cereal comes with a small toy chosen uniformly at random from a set of \(n\) possible kinds of toys. A collector keeps buying boxes of cereal until she has collected all \(n\) different toys. Denote by \(T_n\) the number of boxes she ends up buying. What can we say about the size of \(T_n\)? Fortunately, we can represent it as a sum of independent r.v.'s, by writing

\[ T_n = X_{n,1} + X_{n,2} + X_{n,3} + \ldots + X_{n,n}, \]

where
\begin{eqnarray*}
X_{n,1} &=& \textrm{number of boxes purchased to get one kind of toy} \ \ =\ \ 1, \\
X_{n,2} &=& \textrm{number of boxes purchased after having one toy to get a different kind}, \\
X_{n,3} &=& \textrm{number of boxes purchased after having two kinds of toys to get a third kind}, \\
& \vdots&  \\
X_{n,n} &=& \textrm{number of boxes purchased after having \(n-1\) kinds of toys to get the last kind}.
\end{eqnarray*}
Clearly these r.v.'s are independent. Furthermore, \(X_{n,k}\) is a geometric r.v. with parameter \(p_{n,k} = (n-k+1)/n\). This gives us that
\[ \expec(T_n) = \sum_{k=1}^n \expec(X_{n,k}) = \sum_{k=1}^n \frac{n}{n-k+1} = n\left(\frac{1}{n}+\frac{1}{n-1}+\ldots+\frac{1}{2}+\frac{1}{1}\right) = n H_n, \]
where \( H_n = \sum_{k=1}^n 1/k\) is the \emph{\(n\)-th harmonic number}, and
\[ \var(T_n) = \sum_{k=1}^n \var(X_{n,k}) =
\sum_{k=1}^n \frac{k-1}{n}\left(\frac{n}{n-k+1}\right)^2 \le n^2 \sum_{k=1}^n \frac{1}{k^2} \le 2 n^2
\]
(in this example, we only need a bound for \(\var(T_n)\), but it is also possible to get more precise asymptotics for this quantity). It follows using Chebyshev's inequality that for each \(\epsilon>0\),

\[ \prob\left( \left|\frac{T_n-n H_n}{n \log n} \right| > \epsilon\right) \le \frac{\var(T_n)}{\epsilon^2 n^2 (\log n)^2} \le \frac{2}{\epsilon^2 (\log n)^2} \xrightarrow[n\to\infty]{} 0,\]

so \((T_n-n H_n)/(n\log n)\) converges in probability to \(0\), and therefore we get that

\[ \frac{T_n}{n\log n} \xrightarrow[n\to\infty]{\prob} 1. \]
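Here is a short simulation sketch of the coupon collector's problem (the values of \(n\) and the number of trials are arbitrary illustrative choices), comparing the empirical behavior of \(T_n\) with \(nH_n\) and \(n\log n\):

```python
import numpy as np

rng = np.random.default_rng(2)

def boxes_until_complete(n):
    # buy boxes, each containing a toy uniform on {0, ..., n-1}, until all n kinds are seen
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(int(rng.integers(n)))
        boxes += 1
    return boxes

n = 10_000
trials = [boxes_until_complete(n) for _ in range(20)]
h_n = sum(1.0 / k for k in range(1, n + 1))
print(np.mean(trials), n * h_n, n * np.log(n))   # empirical mean of T_n vs n*H_n vs n*log(n)
```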

The lesson to be learned from the above examples is that some naturally-occurring problems in real life lead to more complicated situations than can be modeled with an i.i.d.\ sequence with finite mean; but often such problems can be analyzed anyway using the same ideas and techniques that we developed. In probability textbooks you can find a general treatment of various conditions under which a triangular array of independent random variables satisfies a (strong or weak) law of large numbers.

Random series of independent samples

We now look at the related topic of infinite series of independent r.v.'s. When can such a series be said to converge? A key technical result that will help us answer this question in some cases is the following beautiful inequality due to Kolmogorov.

Theorem: Kolmogorov's maximal inequality

Assume that \(X_1,X_2,\ldots,X_n\) are independent r.v.'s with finite variances, and let \(S_k=\sum_{j=1}^k X_j\). Then

\[ \prob\left( \max_{1\le k\le n} |S_k-\expec (S_k)| > t \right) \le \frac{\var(S_n)}{t^2} = \frac{\sum_{k=1}^n \sigma^2(X_k)}{t^2}. \]

Before we start the proof, note that the bound on the right-hand side is the usual variance bound that follows from Chebyshev's inequality; except that the event on the left-hand side whose probability this quantity bounds is a much bigger event than the usual deviation event \(\{ |S_n-\expec(S_n)|>t\}\) for which we know the Chebyshev bound holds!

\begin{proof} We may assume without loss of generality that \(\expec(X_k)=0\) for all \(k\). Denote

\[ A = \left\{ \max_{1\le k\le n} |S_k| > t\right\}, \]

and define events \(A_1,A_2,\ldots,A_n\) by

\[ A_k = \left\{ |S_k| > t, \ \ \max_{1\le j<k} |S_j| \le t \right\}. \]

In words, \(A_k\) is the event that the sequence of cumulative sums \((S_j)_{j=1}^n\) exceeded \(t\) in absolute value for the first time at time \(k\). Note that these events are disjoint and their union is the event \(A\) whose probability we are trying to bound. As a consequence, we can lower-bound the variance \(\var(S_n)=\expec(S_n^2)\) of \(S_n\), as follows:

\begin{eqnarray*}
\expec(S_n^2) &\ge& \expec(S_n^2 1_A) = \sum_{k=1}^n \expec \left(S_n^2 1_{A_k}\right)
= \sum_{k=1}^n \expec \left[ \left(S_k+(S_n-S_k)\right)^2 1_{A_k} \right]
\\ &=&
\sum_{k=1}^n \Big[\expec (S_k^2 1_{A_k}) + \expec\left[ (S_n-S_k)^2 1_{A_k} \right]
+ 2 \expec \left[(S_k 1_{A_k}) (S_n-S_k) \right] \Big].
\end{eqnarray*}

In this last expression, the terms \(\expec \left[(S_k 1_{A_k}) (S_n-S_k) \right]\) are equal to \(0\): \(S_k 1_{A_k}\) is a random variable that depends only on \(X_1,\ldots,X_k\), and is therefore independent of \(S_n-S_k\), which depends only on \(X_{k+1},\ldots,X_n\); hence the expectation of the product equals the product of the expectations, which is \(0\). Furthermore, the middle terms \(\expec\left[ (S_n-S_k)^2 1_{A_k} \right]\) are all nonnegative, and each of the first terms \(\expec (S_k^2 1_{A_k})\) satisfies

\[ \expec (S_k^2 1_{A_k}) \ge \expec (t^2 1_{A_k}) = t^2 \prob(A_k), \]

since on the event \(A_k\) we know that \(S_k^2\) is at least \(t^2\) (look again at the definition of \(A_k\)). Combining these observations, we get that

\[ \var(S_n)  \ge t^2\sum_{k=1}^n \prob(A_k) = t^2 \prob(A), \]

which is exactly the claim that was to be proved.
\end{proof}
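As an illustration, the following sketch (with arbitrary choices of the step distribution and of \(n\), \(t\)) estimates the left-hand side empirically for centered \(\pm 1\) increments, for which \(\var(S_n)=n\), and compares it with the bound \(n/t^2\):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t, runs = 200, 30.0, 20_000

steps = rng.choice([-1.0, 1.0], size=(runs, n))      # independent mean-zero increments
partial_sums = np.cumsum(steps, axis=1)              # S_1, ..., S_n for each run
empirical = np.mean(np.max(np.abs(partial_sums), axis=1) > t)
print(empirical, n / t**2)                           # estimated probability vs Kolmogorov bound
```

The empirical probability is typically much smaller than the bound; the inequality only claims an upper estimate.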

As a corollary, we get a result on convergence of random series.

\begin{thm} Let \(X_1,X_2,\ldots\) be a sequence of independent r.v.'s such that \(\expec(X_n)=0\) for all \(n\), and assume that \(\sum_{n=1}^\infty \var(X_n) < \infty\). Then the random series \(\sum_{n=1}^\infty X_n\) converges almost surely.
\label{thm-one-series}
\end{thm}

\begin{proof} Denote as usual \(S_n = \sum_{k=1}^n X_k\). We have the following equality of events:
\begin{eqnarray*}
\left\{ \sum_{n=1}^\infty X_n\textrm{ converges}
\right\}
&=&
\left\{ \left(\sum_{n=1}^N X_n\right)_{N\ge 1} \textrm{ is a Cauchy sequence}
\right\}
\\ &=&
\bigcap_{\epsilon>0}
\bigcup_{N\ge 1}
\bigcap_{n\ge N}
\Big\{
|S_n-S_N|<\epsilon
\Big\}.
\end{eqnarray*}

Or, put differently, we can look at the complement of this event and represent it as

\begin{eqnarray*}
\left\{ \sum_{n=1}^\infty X_n\textrm{ does not converge}
\right\}
&=&
\bigcup_{\epsilon>0}
\bigcap_{N\ge 1}
\bigcup_{n\ge N}
\Big\{
|S_n-S_N|\ge\epsilon
\Big\}
\\ &=&
\bigcup_{\epsilon>0}
\bigcap_{N\ge 1}
\left\{ \sup_{n\ge N}
|S_n-S_N|\ge\epsilon
\right\}.
\end{eqnarray*}

This form is exactly suitable for an application of Kolmogorov's maximal inequality, except that here we have an infinite sequence of partial sums instead of a finite maximum. However, by the ``continuity from below'' property of probability measures, we see that this does not matter. More precisely, for any \(\epsilon>0\) and \(N\ge 1\), we have

\begin{eqnarray*}
 \prob\left(
 \sup_{n\ge N}
|S_n-S_N|\ge\epsilon
\right) &=& \lim_{M\to\infty} \prob\left(
 \sup_{N\le n\le M}
|S_n-S_N|\ge\epsilon
\right) \le \lim_{M\to\infty} \frac{\var(S_M-S_N)}{\epsilon^2} \\ &=& \frac{1}{\epsilon^2} \sum_{n=N}^\infty \var(X_n).
\end{eqnarray*}
Therefore also
\begin{eqnarray*}
 \prob\left( \bigcap_{N\ge 1} \left\{
 \sup_{n\ge N}
|S_n-S_N|\ge\epsilon
\right\} \right) &\le& \inf_{N\ge 1}
 \prob\left(
 \sup_{n\ge N}
|S_n-S_N|\ge\epsilon
\right) \\ &\le&
\inf_{N\ge 1} \frac{1}{\epsilon^2} \sum_{n=N}^\infty \var(X_n) = 0,
\end{eqnarray*}

because of our assumption that the sum of the variances converges. Finally, this is true for all \(\epsilon>0\); since the events above only grow as \(\epsilon\) decreases, the uncountable union over \(\epsilon>0\) can be replaced by the countable union over \(\epsilon=1/m\), \(m=1,2,\ldots\), and we get the claim that

\[ \prob\left(\sum_{n=1}^\infty X_n\textrm{ does not converge}\right) = 0. \]

\end{proof}

One could ask whether the sufficient condition given by the theorem above is also necessary (it is, at least when the variables are uniformly bounded). More generally, what happens for random variables with non-zero expectations? What happens for r.v.'s with infinite expectations, or infinite variances? Kolmogorov formulated a general theorem that gives a necessary and sufficient condition for a series of independent random variables to converge almost surely.


Theorem: The Kolmogorov three-series theorem
If \(X_1,X_2,\ldots\) is a sequence of independent random variables, then the random series \(\sum_{n=1}^\infty X_n\) converges almost surely if and only if the following three conditions hold:
  1. \(\sum_{n=1}^\infty \prob(|X_n|>1) < \infty\).
  2. The series \(\sum_{n=1}^\infty \expec(X_n \ind_{\{|X_n|\le 1\}})\) converges.
  3. \(\sum_{n=1}^\infty \var(X_n \ind_{\{|X_n| \le 1\}}) < \infty\).
If one of the conditions does not hold, then the series \(\sum X_n\) diverges almost surely.

We postpone the proof of Theorem~\ref{thm-three-series} until later; we will prove it in Section~\ref{sec-k-3-series} as an application of a generalized version of the central limit theorem. Note that since the convergence of the series \(\sum X_n\) is equivalent to the convergence of \(\sum a X_n\) for any constant \(a\neq 0\), the value \(1\) chosen for the truncation of \(X_n\) in the theorem is arbitrary and can be replaced by any other positive constant.

\begin{example} Let \((c_n)_{n=1}^\infty\) be a sequence of real numbers. Consider the series with random signs associated with the sequence \((c_n)\), which we denote \(\sum_{n=1}^\infty \pm c_n\), and which more precisely represents the series

\[ \sum_{n=1}^\infty c_n X_n, \]

where \(X_1, X_2, \ldots\) is a sequence of i.i.d.\ r.v.'s taking the values \(-1,+1\) with respective probabilities \(1/2,1/2\). By Theorem~\ref{thm-one-series} it follows that if \(\sum_{n=1}^\infty c_n^2 < \infty\) then the series \(\sum_{n=1}^\infty \pm c_n\) converges almost surely. From Theorem~\ref{thm-three-series}, one can check easily that this condition is also necessary, in other words that the series with random signs converges a.s.\ if and only if the series of squares \(\sum c_n^2\) converges. Thus, for example, the \textbf{harmonic series with random signs} \(\sum \pm \frac{1}{n}\) converges a.s., but the analogous series of square root reciprocals \(\sum_n \pm \frac{1}{\sqrt{n}}\) diverges a.s. Compare this to the series with alternating signs \(\sum_n \frac{(-1)^n}{n}\) and \(\sum_n \frac{(-1)^n}{\sqrt{n}}\), both of which are known to converge! Try to develop your intuition by thinking about why the behavior of series with alternating signs differs from that of series with random signs.
\end{example}
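The different behavior of the two random-sign series is easy to observe in a simulation. The sketch below (one random sample path of each series; the checkpoints are arbitrary) prints a few partial sums of \(\sum \pm 1/n\) and \(\sum \pm 1/\sqrt{n}\):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10**6
signs = rng.choice([-1.0, 1.0], size=N)
idx = np.arange(1, N + 1)

harmonic = np.cumsum(signs / idx)             # partial sums of sum +-1/n
sqrt_recip = np.cumsum(signs / np.sqrt(idx))  # partial sums of sum +-1/sqrt(n)

for m in (10**2, 10**4, 10**6):
    print(m, harmonic[m - 1], sqrt_recip[m - 1])
# the first column of values settles down quickly; the second keeps wandering
# (its typical size grows like sqrt(log m), so the drift is slow but unbounded)
```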
