$$\newcommand{\id}{\mathrm{id}}$$ $$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\kernel}{\mathrm{null}\,}$$ $$\newcommand{\range}{\mathrm{range}\,}$$ $$\newcommand{\RealPart}{\mathrm{Re}}$$ $$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$ $$\newcommand{\Argument}{\mathrm{Arg}}$$ $$\newcommand{\norm}{\| #1 \|}$$ $$\newcommand{\inner}{\langle #1, #2 \rangle}$$ $$\newcommand{\Span}{\mathrm{span}}$$

# 15 Random number generation

$$\newcommand{\vecs}{\overset { \rightharpoonup} {\mathbf{#1}} }$$

$$\newcommand{\vecd}{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}}$$

An important problem in applied probability theory is to produce one or more random variables distributed according to some specified distribution, starting from a source of random information whose distribution is fixed. This is referred to as random number generation and is also known as statistical simulation or sampling. In this chapter we survey several interesting methods for random number generation

Note that we assume that our source of randomness provides us with truly random information as the input for the computation. In the absence of such random information, the problem of producing information that \emph{appears} to be random using a deterministic computation applied to a non-random input, is a very different problem, known as pseudorandom number generation, which we will not discuss (although it is also very interesting!).

## Unbiasing a source of randomness: simulating an unbiased coin using biased coin tosses

Assume that your source of randomness produces a sequence $$X_1, X_2, \ldots$$ of Bernoulli random variables with some bias $$0<p<1$$. Such random bits are much more practical for use in applications if they are unbiased, that is, if $$p=1/2$$. Can we use the $$p$$-biased coin tosses to produce an unbiased coin toss? Yes: the famous mathematician John von Neumann suggested the following simple method in 1951.

\paragraph{Simulation Method 1.}

1. Sample a pair $$X,Y$$ of the $$p$$-biased coin tosses.
2. If $$(X,Y)=(1,0)$$, output $$0$$''.
3. If $$(X,Y)=(0,1)$$, output $$1$$''.
4. If $$(X,Y)=(0,0)$$ or $$(X,Y)=(1,1)$$, go back to step 1.
Exercise 15.1
Show that the method works; that is, that with probability $$1$$ the method eventually outputs a random variable $$Z \sim \berdist(1/2)$$.

An interesting and useful feature of Von Neumann's method is that it does not even require knowing the bias $$p$$ of the source; that is, it is a universal unbiasing method.''

What about the efficiency of the method? It seems interesting to ask how many samples of the biased sequence we will need to produce our unbiased random bit. This number is random, and unbounded --- we may have to wait an arbitrarily long time for the simulation to finish --- so perhaps it makes more sense to ask about the mean number of samples required.

Exercise 15.2
Show that on average the method requires $$\frac{1}{p(1-p)}$$ samples of the sequence $$(X_n)_{n=1}^\infty$$. (In particular, when $$p$$ is very close to $$0$$ or $$1$$ the average wait becomes very long.)

## Simulating a biased coin using unbiased coin tosses

Let us now consider the reverse problem of simulating a biased coin toss with bias $$p$$ when our source of randomness produces independent unbiased bits $$X_1,X_2,\ldots \sim \berdist(1/2)$$. Of course, here we assume that the value $$p$$ of the desired bias is known, otherwise the question makes no sense. It turns out that in this case too there is a simple method, and moreover the method works extremely well even if $$p$$ is a very complicated number such as $$1/\sqrt{2}$$ or $$\pi-3$$.

The method is based on the easy-to-prove observation that the random variable $$U=\sum_{n=1}^\infty \frac{X_n}{2^n}$$ has distribution $$U[0,1]$$. Then the indicator random variable $$Z = 1_{\{U\le p\}}$$ is a $$\berdist(p)$$ random variable. It only remains to note that $$Z$$ can be computed efficiently by uncovering the random bits $$X_1,X_2,\ldots$$ one at a time and stopping the computation as soon as it becomes apparent whether the event $$\{U\le p\}$$ occurred or its complement $$\{U>p\}$$. The way this question is settled is described as follows.

\paragraph{Simulation Method 2.} Let $$p=(0.\alpha_1 \alpha_2 \alpha_3 \ldots )_2$$ be the binary expansion of $$p$$, that is, $$\alpha_n\in \{0,1\}$$ and $$p=\sum_{n=1}^\infty \alpha_n/2^{n}$$.

1. Sample the random bits $$X_1,X_2,\ldots$$ one at a time until the first time $$m\ge1$$ such that $$X_m \neq \alpha_m$$.
2. If $$(X_m,\alpha_m)=(0,1)$$, output $$1$$''.
3. If $$(X_m,\alpha_m)=(1,0)$$, output $$0$$''.
Exercise 15.3
Prove that the output is $$1$$ if $$U< p$$; $$0$$ if $$U>p$$; and the algorithm never terminates in the event (which has probability 0) that $$U=p$$.
Exercise 15.4
Show that the number $$N$$ of bits which had to be sampled to obtain an answer satisfies $$\expec N = 2$$. What is the distribution of $$N$$?

## Simulating an arbitrary discrete distribution using unbiased coin tosses

We can generalize Simulation Method 2 described above to get a method for simulating an arbitrary discrete random variable taking values $$\alpha_1,\ldots,\alpha_k$$ with respective probabilities $$p_1,\ldots,p_k$$, using a sequence of independent unbiased coin tosses $$X_1,X_2,\ldots \sim \berdist(1/2)$$. Let $$U=\sum_{n=1}^\infty X_n/2^n$$ as before, and for $$m\ge 1$$ let $$U_m = \sum_{k=1}^m X_k/2^k$$ be the partial sums of the binary expansion of $$U$$.

\paragraph{Simulation Method 3.}
Denote $$c_0 = 0, c_j = p_1+\ldots+p_j$$. Note that the intervals $$(c_0,c_1), (c_1,c_2),\ldots, (c_{k-1},c_k)$$ form a partition of the interval $$(0,1)$$ into subintervals of lengths $$p_1,\ldots,p_k$$. To produce a sample from the discrete distribution with atoms of size $$p_j$$ at $$\alpha_j$$ for $$j=1,\ldots,k$$:

1. Sample $$X_1,X_2,\ldots$$ one at a time until the first time $$m$$ for which there is a $$j$$ such that the condition $c_{j-1} < U_m < U_m+\frac{1}{2^m} < c_j$ holds. Note that this condition can be checked by comparing the binary string $$(X_1,\ldots,X_m)$$ against the first $$m$$ digits in the binary expansions of the numbers $$c_0,\ldots,c_k$$.
2. Output $$\alpha_j$$.
Exercise 15.5
Show that the condition $$c_{j-1} < U_m < U_m+\frac{1}{2^m} < c_j$$ is equivalent to $$c_{j-1} < U < c_j$$, and that therefore for each $$1\le j\le k$$ the output $$\alpha_j$$ is obtained with probability $$p_j$$.

The average number of samples from $$X_1,X_2,\ldots$$ required for the simulation looks like a highly nontrivial function of the probabilities $$p_1,\ldots,p_k$$. Amazingly, to a good approximation it is equal to the entropy of the distribution $$p_1,\ldots,p_k$$, a well-known function from information theory that is known to measure the information content of a random variable. The precise result to that effect is as follows.

Theorem 15.6

The entropy of the discrete probability distribution $$p_1,\ldots,p_k$$ is defined by $$H(p_1,\ldots,p_j) = -\sum_{j=1}^k p_j \log_2 p_j$$. The number of samples $$N$$ required in the simulation satisfies

$H(p_1,\ldots,p_k) \le \expec N \le H(p_1,\ldots,p_k) + 4.$

For the proof, see Sharp entropy bounds for discrete statistical simulation (Stat. Prob. Lett. 42 (1999), 219--227), or section 5.11 of the book Elements of Information Theory, 2nd Ed. by T.~M.~Cover and J.~A.~Thomas, where a similar result is proved for a different simulation method for which $$\expec N$$ can be bounded from above by the slightly better bound $$H(p_1,\ldots,p_k)+2$$.

## Simulating a general r.v.\ using a uniform r.v.

Theorem~\ref{thm-rvexistence} from section~\ref{sec:dist-funs}, which we proved earlier for its theoretical importance, can be interpreted about a statement about how to simulate a random variable with a specified distribution $$F$$ using a $$U[0,1]$$ random variable.

\paragraph{Simulation Method 4.} Given a c.d.f.\ $$F$$ and a $$U[0,1]$$-distributed random variable $$U$$, set $$X=g(U)$$, where

$g(p) = \sup \{ x\in\R\,:\, F(x) < p \} \qquad (0<p<1).$

(this is the lower quantile function associated with $$F$$; see section~\ref{sec:dist-funs}).

Exercise 15.7
Explain why the above method works and its connection to Theorem~\ref{thm-rvexistence}.

## Simulating an exponential random variable

As an illustration of Simulation Method 4 described above, in the case where $$F$$ is the $$\expdist(\lambda)$$ distribution, the function $$g$$ is easily computed to be $$g(p)=-\frac{1}{\lambda} \log(1-p)$$, so taking $$X=-\frac{1}{\lambda} \log (1-U)$$ produces an $$\expdist(\lambda)$$ r.v. We can simplify this slightly by noting that if $$U\sim U[0,1]$$ then also $$1-U \sim U[0,1]$$, so one can use the simpler function $$g(1-p)=-\frac{1}{\lambda}\log p$$.

\paragraph{Simulation Method 5.} Given a random variable $$U\sim U[0,1]$$, the random variable $$X = -\frac{1}{\lambda}\log U$$ has distribution $$\expdist(\lambda)$$.

## Simulating a normal random variable

Although Simulation Method 4 is very general, in many practical cases it is hard or annoying to compute the associated quantile function $$g(p)$$ (which in the case of an absolutely continuous distribution amounts to inverting the c.d.f.). The normal distribution is an example where there exists a more practical method that is based instead on the polar decomposition of a standard bivariate normal vector (section~\ref{sec:normal-dist}).

\paragraph{Simulation Method 6.} Given two independent r.v.s $$U_1,U_2 \sim U[0,1]$$, define
\begin{align*}
\Theta &= 2\pi U_1, \\
R &= \sqrt{-2 \log U_2}, \\
X &= R\cos \Theta, \\
Y &= R \sin \Theta.
\end{align*}

Then $$\Theta \sim U[0,2\pi]$$, $$R^2 \sim \expdist(1/2)$$, and $$R$$ and $$\Theta$$ are independent. Therefore $$X,Y$$ are independent and have the standard normal distribution $$N(0,1)$$.

## Simulating a Poisson random variable

Given a sequence $$U_1,U_2,\ldots$$ of i.i.d. $$U[0,1]$$ random variables, the following method simulates a $$\poissondist(\lambda)$$ r.v.

\paragraph{Simulation Method 7.}

1. Sample the $$U_k$$'s one at a time until the first time $$m$$ when $$M_m =\prod_{k=1}^m U_k$$ satisfies $$M_m \le e^{-\lambda}$$.
2. Output $$m-1$$.
Exercise 15.8
Explain why the output $$Z$$ of this algorithm has distribution $$\poissondist(\lambda)$$, using the connection between the Poisson distribution and the cumulative sums of i.i.d. exponential random variables (see section~\ref{sec:exp-dist}).

Note that the number of samples this method requires is equal to one plus the random variable $$Z$$ being simulated. In particular, the mean number of samples required is $$\expec Z = \lambda+1$$.