5.5: Random Genetic Drift
Up to now, our simplified genetic models have all assumed an infinite population, neglecting stochastic effects. Here, we consider a finite population: the resulting stochastic effects on the allelic frequencies is called random genetic drift. The simplest genetic model incorporating random genetic drift assumes a fixed-sized population of \(N\) individuals, and models the evolution of a diallelic haploid genetic locus.
There are two widely used genetic models for finite populations: the WrightFisher model and the Moran model. The Wright-Fisher model is most similar to our infinite-population discrete-generation model. In the Wright-Fisher model, \(N\) adult individuals release a very large number of gametes into a gene pool, and the next generation is formed from \(N\) random gametes independently chosen from the gene pool. The Moran model takes a different approach. In the Moran model, a single evolution step consists of one random individual in the population reproducing, and another random individual dying, with the population always maintained at the constant size \(N\) . Because two random events occur every generation in the Moran model, and \(N\) random events occur every generation in the Wright-Fisher model, a total of \(N / 2\) evolution steps in the Moran model is comparable, but not exactly identical, to a single discrete generation in the Wright-Fisher model. It has been shown, however, that the two models become identical in the limit of large \(N\) . For our purposes, the Moran model is mathematically more tractable and we adopt it here.
We develop our model analogously to the stochastic population growth model derived in \(\S 3.1\) . We let \(n\) denote the number of \(A\) -alleles in the population, and \(N-n\) the number of \(a\) -alleles. With \(n=0,1,2, \ldots, N\) a discrete random variable, the probability mass function \(p_{n}(g)\) denotes the probability of having \(n A\) -alleles at evolution step \(g .\) With \(n\) A-alleles in a population of size \(N\) , the probability of an individual with \(A\) either reproducing or dying is given by \(s_{n}=n / N\) ; the corresponding probability for \(a\) is \(1-s_{n}\) . There are three ways to obtain a population of \(n A\) -alleles at evolution step \(g+1\) . First, there were \(n A\) -alleles at evolution step \(g\) , and the individual reproducing carried the same allele as the individual dying. Second, there were \(n-1\) -alleles at evolution step \(g\) , and the individual reproducing has \(A\) and the individual dying has \(a\) . And third, there were \(n+1 A\) -alleles at evolution step \(g\) , and the individual reproducing has \(a\) and the individual dying has \(A\) . Multiplying the probabilities and summing the three cases results in
\[\begin{align} \nonumber p_{n}(g+1)=\left(s_{n}^{2}+\left(1-s_{n}\right)^{2}\right) p_{n}(g)+s_{n-1}\left(1-s_{n-1}\right) p_{n-1}(g) & \\[4pt] &+s_{n+1}\left(1-s_{n+1}\right) p_{n+1}(g) \end{align} \nonumber \]
Note that this equation is valid for \(0<n<N\) , and that the equations at the boundaries-representing the probabilities that one of the alleles is fixed-are
\[\begin{align} p_{0}(g+1) &=p_{0}(g)+s_{1}\left(1-s_{1}\right) p_{1}(g) \\[4pt] p_{N}(g+1) &=p_{N}(g)+s_{N-1}\left(1-s_{N-1}\right) p_{N-1}(g) \end{align} \nonumber \]
The boundaries are called absorbing and the probability of fixation of an allele monotonically increases with each birth and death. Once the probability of fixation of an allele is unity, there are no further changes in allele frequencies.
We illustrate the solution of \((5.5.1)\) -(5.5.3) in Fig. \(5.3\) for a small population of size \(N=20\) , and where the number of \(A\) -alleles is precisely known in the founding generation, with either (a) \(p_{10}(0)=1\) , or; (b) \(p_{13}(0)=1\) . We plot the probability mass density of the number of \(A\) -alleles every \(N\) evolution steps, up to \(7 N\) steps, corresponding to approximately fourteen discrete generations of evolution in the Wright-Fisher model. Notice how the probability distribution diffuses away from its initial value, and how the probabilities eventually concentrate on the boundaries, with both \(p_{0}\) and \(p_{20}\) monotonically increasing. In fact, after a large number of generations, \(p_{0}\) approaches the initial frequency of the \(a\) -allele and \(p_{20}\) approaches the initial frequency of the \(A\) -allele (not shown in figures).
Now, the number of \(A\) -alleles chosen when forming the next Wright-Fisher generation is a binomial random variable \(n^{\prime}\) with parameters \((N, n / N)\) . Therefore, \(E\left[n^{\prime}\right]=n\) and \(\operatorname{Var}\left[n^{\prime}\right]=n(1-n / N)\) . With \(x^{\prime}=n^{\prime} / N\) and \(x=n / N\) , we have \(E\left[x^{\prime}\right]=x\) , and \(\operatorname{Var}\left[x^{\prime}\right]=x(1-x) / N\) . The function \(V(x)\) can therefore be interpreted as the variance of \(x\) over a single Wright-Fisher generation.
Although (5.5.9) and (5.5.10) depend explicitly on the population size \(N\) , the population size can be eliminated by a simple change of variables. If we let
\[\tau=t / N \text {, } \nonumber \]
then the differential equation (5.5.9) transforms to
\[\frac{\partial P(x, \tau)}{\partial \tau}=\frac{1}{2} \frac{\partial^{2}}{\partial x^{2}}(x(1-x) P(x, \tau)) \nonumber \]
independent of \(N\) . The change-of-variables given by (5.5.11) simply states that a doubling of the population size will lengthen the time scale of evolution by a corresponding factor of two. Remember that here we are already working under the assumption that population sizes are large.
Equation (5.5.12) is a diffusion-like equation for the probability distribution function \(P(x, \tau)\) . A diffusion approximation for studying genetic drift was first introduced into population genetics by its founders Fisher and Wright, and was later extensively developed in the \(1950^{\prime}\) s by the Japanese biologist Motoo Kimura. Here, our analysis of this equation relies on a more recent paper by McKane and Waxman (2007), who showed how to construct the analytical solution of (5.5.12) at the boundaries of \(x\) . For \(0<x<1\) , it is suggestive from the numerical solutions shown in Fig. \(5.3\) that the probability distribution might asymptotically become independent of \(x\) . Accordingly, we look for an asymptotic solution to \((5.5.12)\) at the interior values of \(x\) satisfying \(P(x, \tau)=P(\tau)\) . Equation (5.5.12) becomes
\[\begin{aligned} \frac{d P(\tau)}{d \tau} &=\frac{1}{2}(x(1-x))^{\prime \prime} P(\tau) \\[4pt] &=-P(\tau) \end{aligned} \nonumber \]
with asymptotic solution
\[P(\tau)=c e^{-\tau} \nonumber \]
and we observe that the total probability over the interior values of \(x\) decays exponentially.
To understand how the solution behaves at the boundaries of \(x\) , we require boundary conditions for \(P(x, \tau)\) . In fact, because \(P(x, \tau)\) is singular at the boundaries of \(x\) , appropriate boundary conditions can not be obtained directly from the difference equations given by (5.5.2 and 5.5.3). Rather, boundary conditions are most easily obtained by first recasting (5.5.12) into the form of a continuity equation.
We let \(j(x, \tau)\) denote the so-called probability current. In a small region of size \(\Delta x\) lying in the interval \((x, x+\Delta x)\) , the time-rate of change of the probability \(P(x, \tau) \Delta x\) is due to the flow of probability into and out of this region. With an appropriate definition of the probability current, we have in general
\[\frac{\partial}{\partial \tau}(P(x, \tau) \Delta x)=j(x, t)-j(x+\Delta x) \nonumber \]
or as \(\Delta x \rightarrow 0\) ,
\[\frac{\partial P(x, \tau)}{\partial \tau}+\frac{\partial j(x, \tau)}{\partial x}=0 \nonumber \]
which is the usual form of a continuity equation. Identification of (5.5.15) with (5.5.12) shows that the probability current of our problem is given by
\[j(x, \tau)=-\frac{1}{2} \frac{\partial}{\partial x}(x(1-x) P(x, \tau)) . \nonumber \]
Now, since the total probability is unity; that is,
\[\int_{0}^{1} P(x, \tau) d x=1 \nonumber \]
probability can not flow in or out of the boundaries of \(x\) , and we must therefore have
\[j(0, \tau)=0, \quad j(1, \tau)=0, \nonumber \]
which are the appropriate boundary conditions for (5.5.12).
We can look for stationary solutions of (5.5.12). Use of the continuity equation (5.5.15) together with the boundary conditions (5.5.18) shows that the stationary solution has zero probability current; that is, \(j(x)=0\) . Integration of \(j(x)\) using (5.5.16) results in
\[x(1-x) P(x)=c_{1} \nonumber \]
where \(c_{1}\) is an integration constant. The readily apparent solution is \(P(x)=\) \(c_{1} /[x(1-x)]\) , but there are also two less obvious solutions. Probability distribution functions are allowed to contain singular solutions corresponding to Dirac delta functions, and here we consider the possibility of there being Dirac delta functions at the boundaries. By noticing that both \(x \delta(x)=0\) and \((1-x) \delta(1-x)=0\) , we can see that the general solution for \(P(x)\) can be written as
\[P(x)=\frac{c_{1}}{x(1-x)}+c_{2} \delta(x)+c_{3} \delta(1-x) . \nonumber \]
Requiring \(P(x)\) to satisfy (5.5.17) results in \(c_{1}=0\) and \(c_{2}+c_{3}=1\) . To determine the remaining free constant, we can compute the mean frequency of the \(A\) -allele in the population. From the continuity equation \((5.5.15)\) , we have
\[\begin{aligned} \frac{\partial}{\partial \tau} \int_{0}^{1} x P(x, \tau) d x &=-\int_{0}^{1} x \frac{\partial j(x, \tau)}{\partial x} d x \\[4pt] &=\int_{0}^{1} j(x, \tau) d x \\[4pt] &=0 \end{aligned} \nonumber \]
where the first integral on the right-hand-side was done by parts using the boundary conditions given by (5.5.18), and the second integral was done using (5.5.16) and the vanishing of \(x(1-x) P(x)\) on the boundaries. The mean frequency of the \(A\) allele is therefore a constant-as one would expect for a nondirectional random genetic drift-and we can assume that its initial value is \(p\) . We therefore obtain for our stationary solution
\[P(x)=(1-p) \delta(x)+p \delta(1-x) \nonumber \]
The eventual probability of fixing the \(A\) -allele is therefore simply equal to its initial frequency. For example, suppose that within a population of \(N\) individuals homogeneous for \(a\) , a single neutral mutation occurs so that one individual now carries the \(A\) -allele. What is the probability that the \(A\) -allele eventually becomes fixed? Our result would yield the probability \(1 / N\) , which is the initial frequency of the \(A\) -allele. Intuitively, after a sufficient number of generations has passed, all living individuals should be descendant from a single ancestral individual living at the time the single mutation occurred. The probability that that single individual carried the \(A\) -allele is just \(1 / N\) .
We further note that Kimura was the first to find an analytical solution of the diffusion equation (5.5.12). A solution method using Fourier transforms can be found in the appendix of McKane and Waxman (2007). In addition to making use of Dirac delta functions, these authors require Heaviside step functions, Bessel functions, spherical Bessel functions, hypergeometric functions, Legendre polynomials, and Gegenbauer polynomials. The resulting solution is of the form
\[P(x, \tau)=\Pi_{0}(\tau) \delta(x)+\Pi_{1}(\tau) \delta(1-x)+f(x, \tau) \nonumber \]
If the number of \(A\) -alleles are known at the initial instant, and the frequency is \(p\) , then
\[P(x, 0)=\delta(x-p) \nonumber \]
and \(\Pi_{0}(0)=\Pi_{1}(0)=0 ; f(x, 0)=\delta(x-p)\) . As \(\tau \rightarrow \infty\) , we have \(f(x, \tau) \rightarrow 0\) ; \(\Pi_{0}(\tau) \rightarrow 1-p\) and \(\Pi_{1}(\tau) \rightarrow p\)
We can easily demonstrate at least one exact solution of the form (5.5.22). For an initial uniform probability distribution given by
\[P(x, 0)=1, \nonumber \]
the probability distribution at the interior points remains uniform and is given by \((5.5.13)\) with \(c=1\) . If we assume the form of solution given by (5.5.22) together with the requirement of unit probability given by (5.5.17), we can obtain the exact result
\[P(x, \tau)=\frac{1}{2}\left(1-e^{-\tau}\right)(\delta(x)+\delta(1-x))+e^{-\tau} \nonumber \]