Skip to main content
\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)
Mathematics LibreTexts

11.4: Fundamental Limit Theorem for Regular Chains**

 

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

The fundamental limit theorem for regular Markov chains states that if \(\mat{P}\) is a regular transition matrix then \[\lim_{n \to \infty} \mat {P}^n = \mat {W}\ ,\] where \(\mat{W}\) is a matrix with each row equal to the unique fixed probability row vector \(\mat{w}\) for \(\mat{P}\). In this section we shall give two very different proofs of this theorem.

Our first proof is carried out by showing that, for any column vector \(\mat{y}\), \(\mat{P}^n \mat {y}\) tends to a constant vector. As indicated in Section 1.3, this will show that \(\mat{P}^n\) converges to a matrix with constant columns or, equivalently, to a matrix with all rows the same.

The following lemma says that if an \(r\)-by-\(r\) transition matrix has no zero entries, and \(\mat {y}\) is any column vector with \(r\) entries, then the vector \(\mat {P}\mat{y}\) has entries which are “closer together" than the entries are in \(\mat {y}\).

Let \(\mat{P}\) be an \(r\)-by-\(r\) transition matrix with no zero entries. Let \(d\) be the smallest entry of the matrix. Let \(\mat{y}\) be a column vector with \(r\) components, the largest of which is \(M_0\) and the smallest \(m_0\). Let \(M_1\) and \(m_1\) be the largest and smallest component, respectively, of the vector \(\mat {P} \mat {y}\). Then \[M_1 - m_1 \leq (1 - 2d)(M_0 - m_0)\ .\]

In the discussion following Theorem[thm 11.3.6], it was noted that each entry in the vector \(\mat {P}\mat{y}\) is a weighted average of the entries in \(\mat {y}\). The largest weighted average that could be obtained in the present case would occur if all but one of the entries of \(\mat {y}\) have value \(M_0\) and one entry has value \(m_0\), and this one small entry is weighted by the smallest possible weight, namely \(d\). In this case, the weighted average would equal \[dm_0 + (1-d)M_0\ .\] Similarly, the smallest possible weighted average equals \[dM_0 + (1-d)m_0\ .\] Thus, \[\begin{aligned} M_1 - m_1 &\le& \Bigl(dm_0 + (1-d)M_0\Bigr) - \Bigl(dM_0 + (1-d)m_0\Bigr) \\ &=& (1 - 2d)(M_0 - m_0)\ .\end{aligned}\] This completes the proof of the lemma.

We turn now to the proof of the fundamental limit theorem for regular Markov chains.

(Fundamental Limit Theorem for Regular Chains) If \(\mat{P}\) is the transition matrix for a regular Markov chain, then \[\lim_{n \to \infty} \mat {P}^n = \mat {W}\ ,\] where \(\mat {W}\) is matrix with all rows equal. Furthermore, all entries in \(\mat{W}\) are strictly positive.

We prove this theorem for the special case that \(\mat{P}\) has no 0 entries. The extension to the general case is indicated in Exercise [exer 11.4.6]. Let be any \(r\)-component column vector, where \(r\) is the number of states of the chain. We assume that \(r > 1\), since otherwise the theorem is trivial. Let \(M_n\) and \(m_n\) be, respectively, the maximum and minimum components of the vector \(\mat {P}^n \mat { y}\). The vector \(\mat {P}^n \mat {y}\) is obtained from the vector \(\mat {P}^{n - 1} \mat {y}\) by multiplying on the left by the matrix \(\mat{P}\). Hence each component of \(\mat {P}^n \mat {y}\) is an average of the components of \(\mat {P}^{n - 1} \mat {y}\). Thus \[M_0 \geq M_1 \geq M_2 \geq\cdots\] and \[m_0 \leq m_1 \leq m_2 \leq\cdots\ .\] Each sequence is monotone and bounded: \[m_0 \leq m_n \leq M_n \leq M_0\ .\] Hence, each of these sequences will have a limit as \(n\) tends to infinity.

Let \(M\) be the limit of \(M_n\) and \(m\) the limit of \(m_n\). We know that \(m \leq M\). We shall prove that \(M - m = 0\). This will be the case if \(M_n - m_n\) tends to 0. Let \(d\) be the smallest element of \(\mat{P}\). Since all entries of \(\mat{P}\) are strictly positive, we have \(d > 0\). By our lemma \[M_n - m_n \leq (1 - 2d)(M_{n - 1} - m_{n - 1})\ .\] From this we see that \[M_n - m_n \leq (1 - 2d)^n(M_0 - m_0)\ .\] Since \(r \ge 2\), we must have \(d \leq 1/2\), so \(0 \leq 1 - 2d < 1\), so the difference \(M_n - m_n\) tends to 0 as \(n\) tends to infinity. Since every component of \(\mat {P}^n \mat {y}\) lies between \(m_n\) and \(M_n\), each component must approach the same number \(u = M = m\). This shows that \[\lim_{n \to \infty} \mat {P}^n \mat {y} = \mat{u}\ , \label{eq 11.4.4}\] where \(\mat{u}\) is a column vector all of whose components equal \(u\).

Now let \(\mat{y}\) be the vector with \(j\)th component equal to 1 and all other components equal to 0. Then \(\mat {P}^n \mat {y}\) is the \(j\)th column of \(\mat {P}^n\). Doing this for each \(j\) proves that the columns of \(\mat {P}^n\) approach constant column vectors. That is, the rows of \(\mat {P}^n\) approach a common row vector \(\mat{w}\), or, \[\lim_{n \to \infty} \mat {P}^n = \mat {W}\ . \label{eq 11.4.5}\]

It remains to show that all entries in \(\mat{W}\) are strictly positive. As before, let \(\mat{y}\) be the vector with \(j\)th component equal to 1 and all other components equal to 0. Then \(\mat{P}\mat{y}\) is the \(j\)th column of \(\mat{P}\), and this column has all entries strictly positive. The minimum component of the vector \(\mat{P}\mat{y}\) was defined to be \(m_1\), hence \(m_1 > 0\). Since \(m_1 \le m\), we have \(m > 0\). Note finally that this value of \(m\) is just the \(j\)th component of \(\mat{w}\), so all components of \(\mat{w}\) are strictly positive.

Doeblin’s Proof

We give now a very different proof of the main part of the fundamental limit theorem for regular Markov chains. This proof was first given by Doeblin,17 a brilliant young mathematician who was killed in his twenties in the Second World War.

[thm 11.4.1] Let \(\mat {P}\) be the transition matrix for a regular Markov chain with fixed vector \(\mat {w}\). Then for any initial probability vector \(\mat {u}\), \(\mat {uP}^n \rightarrow \mat {w}\) as \(n \rightarrow \infty.\)

Let \(X_0,\ X_1,\ \ldots\) be a Markov chain with transition matrix \(\mat {P}\) started in state \(s_i\). Let \(Y_0,\ Y_1,\ \ldots\) be a Markov chain with transition probability \(\mat {P}\) started with initial probabilities given by \(\mat {w}\). The \(X\) and \(Y\) processes are run independently of each other.

We consider also a third Markov chain \(\mat{P}^*\) which consists of watching both the \(X\) and \(Y\) processes. The states for \(\mat{P}^*\) are pairs \((s_i, s_j)\). The transition probabilities are given by \[\mat{P}^{*}[(i,j),(k,l)] = \mat{P}(i,k) \cdot \mat{P}(j,l)\ .\] Since \(\mat{P}\) is regular there is an \(N\) such that \(\mat{P}^{N}(i,j) > 0\) for all \(i\) and \(j\). Thus for the \(\mat{P}^*\) chain it is also possible to go from any state \((s_i, s_j)\) to any other state \((s_k,s_l)\) in at most \(N\) steps. That is \(\mat{P}^*\) is also a regular Markov chain.

We know that a regular Markov chain will reach any state in a finite time. Let \(T\) be the first time the the chain \(\mat{P}^*\) is in a state of the form \((s_k,s_k)\). In other words, \(T\) is the first time that the \(X\) and the \(Y\) processes are in the same state. Then we have shown that \[P[T > n] \rightarrow 0 \;\;\mbox{as}\;\; n \rightarrow \infty\ .\] If we watch the \(X\) and \(Y\) processes after the first time they are in the same state we would not predict any difference in their long range behavior. Since this will happen no matter how we started these two processes, it seems clear that the long range behaviour should not depend upon the starting state. We now show that this is true.

We first note that if \(n \ge T\), then since \(X\) and \(Y\) are both in the same state at time \(T\), \[P(X_n = j\ |\ n \ge T) = P(Y_n = j\ |\ n \ge T)\ .\] If we multiply both sides of this equation by \(P(n \ge T)\), we obtain \[P(X_n = j,\ n \ge T) = P(Y_n = j,\ n \ge T)\ . \label{eq 11.4.1}\] We know that for all \(n\), \[P(Y_n = j) = w_j\ .\] But \[P(Y_n = j) = P(Y_n = j,\ n \ge T) + P(Y_n = j,\ n < T)\ ,\] and the second summand on the right-hand side of this equation goes to 0 as \(n\) goes to \(\infty\), since \(P(n < T)\) goes to 0 as \(n\) goes to \(\infty\). So, \[P(Y_n = j,\ n \ge T) \rightarrow w_j\ ,\] as \(n\) goes to \(\infty\). From Equation [eq 11.4.1], we see that \[P(X_n = j,\ n \ge T) \rightarrow w_j\ ,\] as \(n\) goes to \(\infty\). But by similar reasoning to that used above, the difference between this last expression and \(P(X_n = j)\) goes to 0 as \(n\) goes to \(\infty\). Therefore, \[P(X_n = j) \rightarrow w_j\ ,\] as \(n\) goes to \(\infty\). This completes the proof.

In the above proof, we have said nothing about the rate at which the distributions of the \(X_n\)’s approach the fixed distribution \(\mat {w}\). In fact, it can be shown that18 \[\sum ^{r}_{j = 1} \mid P(X_{n} = j) - w_j \mid \leq 2 P(T > n)\ .\] The left-hand side of this inequality can be viewed as the distance between the distribution of the Markov chain after \(n\) steps, starting in state \(s_i\), and the limiting distribution \(\mat {w}\).

i[exer 11.4.1] Define \(\mat{P}\) and \(\mat{y}\) by \[\mat {P} = \pmatrix{ .5 & .5 \cr.25 & .75 }, \qquad \mat {y} = \pmatrix{ 1 \cr 0 }\ .\] Compute \(\mat {P}\mat{y}\), \(\mat {P}^2 \mat {y}\), and \(\mat {P}^4 \mat {y}\) and show that the results are approaching a constant vector. What is this vector?

i[exer 11.4.2] Let \(\mat{P}\) be a regular \(r \times r\) transition matrix and \(\mat{y}\) any \(r\)-component column vector. Show that the value of the limiting constant vector for \(\mat {P}^n \mat {y}\) is \(\mat{w}\mat{y}\).

i[exer 11.4.3] Let \[\mat {P} = \pmatrix{ 1 & 0 & 0 \cr .25 & 0 & .75 \cr 0 & 0 & 1 }\] be a transition matrix of a Markov chain. Find two fixed vectors of \(\mat {P}\) that are linearly independent. Does this show that the Markov chain is not regular?

i[exer 11.4.4] Describe the set of all fixed column vectors for the chain given in Exercise [exer 11.4.3].

i[exer 11.4.6] The theorem that \(\mat {P}^n \to \mat {W}\) was proved only for the case that \(\mat{P}\) has no zero entries. Fill in the details of the following extension to the case that \(\mat{P}\) is regular. Since \(\mat{P}\) is regular, for some \(N, \mat {P}^N\) has no zeros. Thus, the proof given shows that \(M_{nN} - m_{nN}\) approaches 0 as \(n\) tends to infinity. However, the difference \(M_n - m_n\) can never increase. (Why?) Hence, if we know that the differences obtained by looking at every \(N\)th time tend to 0, then the entire sequence must also tend to 0.

i[exer 11.4.7] Let \(\mat{P}\) be a regular transition matrix and let \(\mat{w}\) be the unique non-zero fixed vector of \(\mat{P}\). Show that no entry of \(\mat{w}\) is 0.

i[exer 11.4.8] Here is a trick to try on your friends. Shuffle a deck of cards and deal them out one at a time. Count the face cards each as ten. Ask your friend to look at one of the first ten cards; if this card is a six, she is to look at the card that turns up six cards later; if this card is a three, she is to look at the card that turns up three cards later, and so forth. Eventually she will reach a point where she is to look at a card that turns up \(x\) cards later but there are not \(x\) cards left. You then tell her the last card that she looked at even though you did not know her starting point. You tell her you do this by watching her, and she cannot disguise the times that she looks at the cards. In fact you just do the same procedure and, even though you do not start at the same point as she does, you will most likely end at the same point. Why?

i[exer 11.4.9] Write a program to play the game in Exercise [exer 11.4.8].

i[exer 11.4.10] (Suggested by Peter Doyle) In the proof of Theorem [thm 11.4.1], we assumed the existence of a fixed vector \(\mat{w}\). To avoid this assumption, beef up the coupling argument to show (without assuming the existenceof a stationary distribution \(\mat{w}\)) that for appropriate constants\(C\) and \(r<1\), the distance between \(\alpha P^n\) and \(\beta P^n\) is at most\(C r^n\) for any starting distributions \(\alpha\) and \(\beta\).Apply this in the case where \(\beta = \alpha P\) toconclude that the sequence \(\alpha P^n\) is a Cauchy sequence,and that its limit is a matrix \(W\) whose rows are all equal to a probabilityvector \(w\) with \(wP=w\). Note that the distance between \(\alpha P^n\) and\(w\) is at most \(C r^n\), so in freeing ourselves from the assumption abouthaving a fixed vector we’ve proved that the convergence to equilibriumtakes place exponentially fast.