5.1: Haploid Genetics
We first consider the modeling of selection in a population of haploid organisms. Selection is modeled by fitness coefficients, with different genotypes having different fitnesses. We begin with a simple model that counts the number of individuals in the next generation, and then show how this model can be reformulated in terms of allele frequencies and relative fitness coefficients.
Table 5.1 formulates the basic model. We assume that there are two alleles \(A\) and \(a\) for a particular haploid gene. These alleles are carried in the population by \(n_{A}\) and \(n_{a}\) individuals, respectively. A fraction \(g_{A}\) (\(g_{a}\)) of the individuals carrying allele \(A\) (\(a\)) is assumed to survive to reproductive age, and each survivor contributes \(f_{A}\) (\(f_{a}\)) offspring to the next generation. These are of course average values, but under the assumption of an (almost) infinite population, our model is deterministic. Accordingly, with \(n_{A}^{(i)}\) (\(n_{a}^{(i)}\)) representing the number of individuals carrying allele \(A\) (\(a\)) in the \(i\)th generation, and formulating a discrete generation model, we have
\[n_{A}^{(i+1)}=f_{A} g_{A} n_{A}^{(i)}, \quad n_{a}^{(i+1)}=f_{a} g_{a} n_{a}^{(i)} \tag{5.1.1} \]
It is mathematically easier and more transparent to work with allele frequencies rather than individual numbers. We denote the frequency (or more accurately, proportion) of allele \(A\) (\(a\)) in the \(i\)th generation by \(p_{i}\) (\(q_{i}\)); that is,
\[p_{i}=\frac{n_{A}^{(i)}}{n_{A}^{(i)}+n_{a}^{(i)}}, \quad q_{i}=\frac{n_{a}^{(i)}}{n_{A}^{(i)}+n_{a}^{(i)}} \nonumber \]
where evidently \(p_{i}+q_{i}=1\). Now, from (5.1.1),
\[n_{A}^{(i+1)}+n_{a}^{(i+1)}=f_{A} g_{A} n_{A}^{(i)}+f_{a} g_{a} n_{a}^{(i)}, \tag{5.1.3} \]
so that dividing the first equation in (5.1.1) by (5.1.3) yields
\[\begin{align} \nonumber p_{i+1} &=\frac{f_{A} g_{A} n_{A}^{(i)}}{f_{A} g_{A} n_{A}^{(i)}+f_{a} g_{a} n_{a}^{(i)}} \\[4pt] &=\frac{f_{A} g_{A} p_{i}}{f_{A} g_{A} p_{i}+f_{a} g_{a} q_{i}} \tag{5.1.4} \\[4pt] &=\frac{\left(\frac{f_{A} g_{A}}{f_{a} g_{a}}\right) p_{i}}{\left(\frac{f_{A} g_{A}}{f_{a} g_{a}}\right) p_{i}+q_{i}}\nonumber \end{align} \nonumber \]
where the second equality comes from dividing the numerator and denominator by \(n_{A}^{(i)}+n_{a}^{(i)}\), and the third equality from dividing the numerator and denominator by \(f_{a} g_{a}\).

genotype | \(A\) | \(a\) |
---|---|---|
freq. of gamete | \(p\) | \(q\) |
relative fitness | \(1+s\) | \(1\) |
freq. after selection | \((1+s) p / w\) | \(q / w\) |
normalization | \(w=(1+s) p+q\) | |

Table 5.2: Haploid genetic model of the spread of a favored allele.

Similarly,
\[q_{i+1}=\frac{q_{i}}{\left(\frac{f_{A} g_{A}}{f_{a} g_{a}}\right) p_{i}+q_{i}} \tag{5.1.5} \]
which could also be derived using \(q_{i+1}=1-p_{i+1}\). We observe from the evolution equations for the allele frequencies, (5.1.4) and (5.1.5), that only the relative fitness \(f_{A} g_{A} /\left(f_{a} g_{a}\right)\) of the alleles matters. Accordingly, in our models, we will consider only relative fitnesses, and we will arbitrarily set one fitness to unity to simplify the algebra and make the final result more transparent.
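As a quick numerical check of this observation, the following sketch iterates the count model (5.1.1) and the frequency recursion (5.1.4) side by side. The function names and all parameter values are illustrative assumptions and do not come from the text.

```python
def next_counts(n_A, n_a, f_A, g_A, f_a, g_a):
    """One generation of the count model: survivors times offspring per survivor."""
    return f_A * g_A * n_A, f_a * g_a * n_a


def next_frequency(p, ratio):
    """One generation of the frequency recursion; ratio = f_A g_A / (f_a g_a)."""
    return ratio * p / (ratio * p + (1.0 - p))


# Illustrative (assumed) parameter values.
f_A, g_A = 3.0, 0.5    # absolute fitness of A: f_A * g_A = 1.5
f_a, g_a = 2.0, 0.5    # absolute fitness of a: f_a * g_a = 1.0
n_A, n_a = 10.0, 990.0

ratio = (f_A * g_A) / (f_a * g_a)
p = n_A / (n_A + n_a)

for generation in range(20):
    n_A, n_a = next_counts(n_A, n_a, f_A, g_A, f_a, g_a)
    p = next_frequency(p, ratio)

# Frequency of A computed from the counts, for comparison with the recursion.
p_from_counts = n_A / (n_A + n_a)
print(p_from_counts, p)  # the two values should agree up to rounding
```

Because only the ratio enters `next_frequency`, multiplying both absolute fitnesses by the same factor changes the population size but not the allele frequencies.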
Spread of a favored allele
We consider a simple model for the spread of a favored allele in Table 5.2, with \(s>0\). Denoting by \(p^{\prime}\) the frequency of \(A\) in the next generation (not the derivative of \(p\)), the evolution equation is given by
\[\begin{align} \nonumber p^{\prime} &=\frac{(1+s) p}{w} \\[4pt] &=\frac{(1+s) p}{1+s p} \tag{5.1.6} \end{align} \nonumber \]
where we have used \((1+s) p+q=1+s p\), since \(p+q=1\). Note that (5.1.6) is the same as (5.1.4) with \(p^{\prime}=p_{i+1}\), \(p=p_{i}\), and \(f_{A} g_{A} /\left(f_{a} g_{a}\right)=1+s\). Fixed points of (5.1.6) are determined from \(p^{\prime}=p\). We find two fixed points: \(p_{*}=0\), corresponding to a population in which allele \(A\) is absent; and \(p_{*}=1\), corresponding to a population in which allele \(A\) is fixed. Intuitively, \(p_{*}=0\) is unstable while \(p_{*}=1\) is stable.
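As a worked step, the two fixed points can be read off directly. Setting \(p^{\prime}=p=p_{*}\) in (5.1.6) and multiplying through by \(1+s p_{*}\), which is positive for \(s>0\) and \(0 \leq p_{*} \leq 1\), gives
\[p_{*}\left(1+s p_{*}\right)=(1+s) p_{*} \quad \Longrightarrow \quad s p_{*}\left(p_{*}-1\right)=0, \nonumber \]
so that \(p_{*}=0\) or \(p_{*}=1\) whenever \(s \neq 0\).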
To illustrate how a stability analysis is performed analytically for a difference equation (instead of a differential equation), consider the general difference equation
\[p^{\prime}=f(p) \tag{5.1.7} \]
With \(p=p_{*}\) a fixed point such that \(p_{*}=f\left(p_{*}\right)\), we write \(p=p_{*}+\epsilon\) and \(p^{\prime}=p_{*}+\epsilon^{\prime}\), so that (5.1.7) becomes
\[\begin{aligned} p_{*}+\epsilon^{\prime} &=f\left(p_{*}+\epsilon\right) \\[4pt] &=f\left(p_{*}\right)+\epsilon f^{\prime}\left(p_{*}\right)+\ldots \\[4pt] &=p_{*}+\epsilon f^{\prime}\left(p_{*}\right)+\ldots, \end{aligned} \nonumber \]
where \(f^{\prime}\left(p_{*}\right)\) denotes the derivative of \(f\) evaluated at \(p_{*}\). Therefore, to leading order in \(\epsilon\),
\[\left|\epsilon^{\prime} / \epsilon\right|=\left|f^{\prime}\left(p_{*}\right)\right|, \nonumber \]
and the fixed point is stable provided that \(\left|f^{\prime}\left(p_{*}\right)\right|<1\). For our haploid model,
\[f(p)=\frac{(1+s) p}{1+s p}, \quad f^{\prime}(p)=\frac{1+s}{(1+s p)^{2}} \nonumber \]
so that \(f^{\prime}\left(p_{*}=0\right)=1+s>1\), and \(f^{\prime}\left(p_{*}=1\right)=1 /(1+s)<1\), confirming that \(p_{*}=0\) is unstable and \(p_{*}=1\) is stable.

genotype | \(A\) | \(a\) |
---|---|---|
freq. of gamete | \(p\) | \(q\) |
relative fitness | \(1\) | \(1-s\) |
freq. after selection | \(p / w\) | \((1-s) q / w\) |
freq. after mutation | \((1-u) p / w\) | \([(1-s) q+u p] / w\) |
normalization | \(w=p+(1-s) q\) | |

Table 5.3: Haploid genetic model of mutation-selection balance.
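As a worked check, the derivative of the favored-allele map \(f(p)=(1+s) p /(1+s p)\) follows from the quotient rule:
\[f^{\prime}(p)=\frac{(1+s)(1+s p)-(1+s) p \cdot s}{(1+s p)^{2}}=\frac{1+s}{(1+s p)^{2}} \nonumber \]
Evaluating at \(p=0\) gives \(1+s\), and evaluating at \(p=1\) gives \((1+s) /(1+s)^{2}=1 /(1+s)\), in agreement with the values used in the stability argument.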
If the selection coefficient \(s\) is small, the model equation (5.1.6) simplifies further. We have
\[\begin{aligned} p^{\prime} &=\frac{(1+s) p}{1+s p} \\[4pt] &=(1+s) p\left(1-s p+\mathrm{O}\left(s^{2}\right)\right) \\[4pt] &=p+\left(p-p^{2}\right) s+\mathrm{O}\left(s^{2}\right) \end{aligned} \nonumber \]
so that to leading order in \(s\),
\[p^{\prime}-p=s p(1-p) \nonumber \]
If \(p^{\prime}-p \ll 1\), which holds for \(s \ll 1\), we can treat the generation number \(n\) as a continuous variable and approximate this difference equation by the differential equation
\[d p / d n=s p(1-p) \nonumber \]
which shows that the frequency of allele \(A\) satisfies the now very familiar logistic equation.
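To get a feel for how good the continuous approximation is, the sketch below compares the exact iteration of (5.1.6) with the standard closed-form solution of the logistic equation, \(p(n)=p_{0} e^{s n} /\left(1-p_{0}+p_{0} e^{s n}\right)\). The function names, the initial frequency, and the value of \(s\) are assumptions chosen for illustration.

```python
import math


def iterate_exact(p0, s, generations):
    """Iterate p' = (1+s) p / (1 + s p) and return the full trajectory."""
    trajectory = [p0]
    for _ in range(generations):
        p = trajectory[-1]
        trajectory.append((1.0 + s) * p / (1.0 + s * p))
    return trajectory


def logistic_solution(p0, s, n):
    """Closed-form solution of dp/dn = s p (1 - p) with p(0) = p0."""
    growth = math.exp(s * n)
    return p0 * growth / (1.0 - p0 + p0 * growth)


# Illustrative (assumed) values: weak selection, initially rare favored allele.
p0, s, generations = 0.01, 0.01, 1000

exact = iterate_exact(p0, s, generations)
max_gap = max(abs(exact[n] - logistic_solution(p0, s, n))
              for n in range(generations + 1))
print(f"largest gap over {generations} generations: {max_gap:.4f}")
```

For small \(s\) the two curves stay close, because the exact recursion grows the odds \(p / q\) by a factor \(1+s\) per generation while the logistic solution grows them by \(e^{s}\).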
Although a polymorphism for this gene exists in the population as the new allele spreads, eventually \(A\) becomes fixed in the population and the polymorphism is lost. In the next section, we consider how a polymorphism can be maintained in a haploid population by a balance between mutation and selection.
Mutation-selection balance
We consider a gene with two alleles: a wildtype allele \(A\) and a mutant allele \(a\). We view the mutant allele as a defective genotype, which confers on the carrier a lowered fitness \(1-s\) relative to the wildtype. Although not all mutant alleles need have identical DNA sequences, we assume that they share the same phenotype of reduced fitness. We model the opposing effects of two evolutionary forces: natural selection, which favors the wildtype allele \(A\) over the mutant allele \(a\), and mutation, which confers a small probability \(u\) that allele \(A\) mutates to allele \(a\) in each newborn individual. Schematically,
\[A \stackrel{u}{\underset{s}{\rightleftharpoons}} a \nonumber \]
where \(u\) represents mutation and \(s\) represents selection. The model is shown in Table 5.3. The equations for \(p\) and \(q\) in the next generation are
\[\begin{aligned} p^{\prime} &=\frac{(1-u) p}{w} \\[4pt] &=\frac{(1-u) p}{1-s(1-p)} \end{aligned} \tag{5.1.13} \]
and
\[\begin{align} \nonumber q^{\prime} &=\frac{(1-s) q+u p}{w} \\[4pt] &=\frac{(1-s-u) q+u}{1-s q} \end{align} \nonumber \]
where we have used \(p+q=1\) to eliminate \(q\) from the equation for \(p^{\prime}\) and \(p\) from the equation for \(q^{\prime}\). The equations for \(p^{\prime}\) and \(q^{\prime}\) are linearly dependent since \(p^{\prime}+q^{\prime}=1\), and we need solve only one of them.

genotype | \(A A\) | \(A a\) | \(a a\) |
---|---|---|---|
referred to as | wildtype homozygote | heterozygote | mutant homozygote |
frequency | \(P\) | \(Q\) | \(R\) |

Considering (5.1.13), the fixed points determined from \(p^{\prime}=p\) are \(p_{*}=0\), for which the mutant allele \(a\) is fixed in the population and there is no polymorphism, and the solution to
\[1-s\left(1-p_{*}\right)=1-u \nonumber \]
which is \(p_{*}=1-u / s\); for \(u<s\) this gives \(0<p_{*}<1\), so that both alleles are present and there is a polymorphism. The stabilities of these two fixed points are determined by considering \(p^{\prime}=f(p)\), with \(f(p)\) given by the right-hand side of (5.1.13). Taking the derivative of \(f\),
\[f^{\prime}(p)=\frac{(1-u)(1-s)}{[1-s(1-p)]^{2}} \nonumber \]
so that
\[f^{\prime}\left(p_{*}=0\right)=\frac{1-u}{1-s}, \quad f^{\prime}\left(p_{*}=1-u / s\right)=\frac{1-s}{1-u} . \nonumber \]
Applying the criterion \(\left|f^{\prime}\left(p_{*}\right)\right|<1\) for stability, \(p_{*}=0\) is stable for \(s<u\) and \(p_{*}=1-u / s\) is stable for \(s>u\). A polymorphism is therefore possible under mutation-selection balance when \(s>u>0\).
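The stability result can be illustrated numerically. The sketch below iterates the recursion (5.1.13) for one choice of parameters with \(s>u\) and one with \(s<u\). The function names, the starting frequency, the number of generations, and the values of \(s\) and \(u\) are all assumptions made for illustration.

```python
def next_p(p, s, u):
    """One generation of selection followed by mutation A -> a."""
    w = 1.0 - s * (1.0 - p)   # normalization, w = p + (1 - s) q with q = 1 - p
    return (1.0 - u) * p / w


def iterate_to_equilibrium(p0, s, u, generations=20000):
    """Apply the recursion repeatedly and return the final frequency of A."""
    p = p0
    for _ in range(generations):
        p = next_p(p, s, u)
    return p


# Illustrative (assumed) values with s > u: a polymorphism is expected.
s, u = 0.1, 0.001
p_polymorphic = iterate_to_equilibrium(0.5, s, u)
print(p_polymorphic, 1.0 - u / s)   # both should be close to 0.99

# Illustrative (assumed) values with s < u: the wildtype allele should be lost.
p_lost = iterate_to_equilibrium(0.5, s=0.001, u=0.01)
print(p_lost)                       # should be close to 0
```

In the first case the iteration settles at \(p_{*}=1-u / s\), and in the second it decays toward \(p_{*}=0\), consistent with the criterion \(\left|f^{\prime}\left(p_{*}\right)\right|<1\).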