7.3: Applications of Spectral Theory
- Use diagonalization to find a high power of a matrix.
- Use diagonalization to solve dynamical systems.
Raising a Matrix to a High Power
Suppose we have a matrix \(A\) and we want to find \(A^{50}\). One could try to multiply \(A\) with itself 50 times, but this is computationally extremely intensive (try it!). However diagonalization allows us to compute high powers of a matrix relatively easily. Suppose \(A\) is diagonalizable, so that \(P^{-1}AP=D\). We can rearrange this equation to write \(A=PDP^{-1}\).
Now, consider \(A^{2}\). Since \(A=PDP^{-1}\), it follows that \[A^{2} = \left( PDP^{-1}\right) ^{2}=PDP^{-1}PDP^{-1}=PD^{2}P^{-1}\nonumber\]
Similarly, \[A^3 = \left( PDP^{-1}\right) ^{3}=PDP^{-1}PDP^{-1}PDP^{-1}=PD^{3}P^{-1}\nonumber\]
In general, \[A^n = \left( PDP^{-1}\right) ^{n}=PD^{n}P^{-1}\nonumber\]
Therefore, we have reduced the problem to finding \(D^{n}\). In order to compute \(D^{n}\), then because \(D\) is diagonal we only need to raise every entry on the main diagonal of \(D\) to the power of \(n\).
Below is a video on powers of a diagonal matrix.
Below is a video on diagonalizing a 2x2 matrix to determine a power.
Through this method, we can compute large powers of matrices. Consider the following example.
Let \(A=\left [ \begin{array}{rrr} 2 & 1 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{array} \right ].\) Find \(A^{50}.\)
Solution
We will first diagonalize \(A\). The steps are left as an exercise and you may wish to verify that the eigenvalues of \(A\) are \(\lambda_1 =1, \lambda_2=1\), and \(\lambda_3=2\).
The basic eigenvectors corresponding to \(\lambda_1, \lambda_2 = 1\) are \[X_1 = \left [ \begin{array}{r} 0 \\ 0 \\ 1 \end{array} \right ] , X_2 = \left [ \begin{array}{r} -1 \\ 1 \\ 0 \end{array} \right ]\nonumber\]
The basic eigenvector corresponding to \(\lambda_3 = 2\) is \[X_3 = \left [ \begin{array}{r} -1 \\ 0 \\ 1 \end{array} \right ]\nonumber\]
Now we construct \(P\) by using the basic eigenvectors of \(A\) as the columns of \(P\). Thus \[P= \left [ \begin{array}{rrr} X_1 & X_2 & X_3 \end{array} \right ] = \left [ \begin{array}{rrr} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right ]\nonumber\] Then also \[P^{-1}=\left [ \begin{array}{rrr} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{array} \right ]\nonumber\] which you may wish to verify.
Then, \[\begin{aligned} P^{-1}AP &=\left [ \begin{array}{rrr} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{array} \right ] \left [ \begin{array}{rrr} 2 & 1 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{array} \right ] \left [ \begin{array}{rrr} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right ] \\ &=\left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{array} \right ] \\ &= D\end{aligned}\]
Now it follows by rearranging the equation that \[A=PDP^{-1}=\left [ \begin{array}{rrr} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right ] \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{array} \right ] \left [ \begin{array}{rrr} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{array} \right ]\nonumber\]
Therefore, \[\begin{aligned} A^{50} &=PD^{50}P^{-1} \\ &=\left [ \begin{array}{rrr} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right ] \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{array} \right ] ^{50}\left [ \begin{array}{rrr} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{array} \right ] \end{aligned}\]
By our discussion above, \(D^{50}\) is found as follows. \[\left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{array} \right ] ^{50}=\left [ \begin{array}{rrr} 1^{50} & 0 & 0 \\ 0 & 1^{50} & 0 \\ 0 & 0 & 2^{50} \end{array} \right ]\nonumber\]
It follows that \[\begin{aligned} A^{50} &=\left [ \begin{array}{rrr} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right ] \left [ \begin{array}{rrr} 1^{50} & 0 & 0 \\ 0 & 1^{50} & 0 \\ 0 & 0 & 2^{50} \end{array} \right ] \left [ \begin{array}{rrr} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{array} \right ] \\ &=\left [ \begin{array}{ccc} 2^{50} & -1+2^{50} & 0 \\ 0 & 1 & 0 \\ 1-2^{50} & 1-2^{50} & 1 \end{array} \right ] \end{aligned}\]
Below is a video on diagonalizing a 2x2 matrix to find the matrix raised to a high power.
Through diagonalization, we can efficiently compute a high power of \(A\). Without this, we would be forced to multiply this by hand!
The next section explores another interesting application of diagonalization.
Raising a Symmetric Matrix to a High Power
We already have seen how to use matrix diagonalization to compute powers of matrices. This requires computing eigenvalues of the matrix \(A\), and finding an invertible matrix of eigenvectors \(P\) such that \(P^{-1}AP\) is diagonal. In this section we will see that if the matrix \(A\) is symmetric (see Definition 2.5.2 ), then we can actually find such a matrix \(P\) that is an orthogonal matrix of eigenvectors. Thus \(P^{-1}\) is simply its transpose \(P^T\), and \(P^TAP\) is diagonal. When this happens we say that \(A\) is orthogonally diagonalizable
In fact this happens if and only if \(A\) is a symmetric matrix as shown in the following important theorem.
The following conditions are equivalent for an \(n \times n\) matrix \(A\):
- \(A\) is symmetric.
- \(A\) has an orthonormal set of eigenvectors.
- \(A\) is orthogonally diagonalizable.
- Proof
-
The complete proof is beyond this course, but to give an idea assume that \(A\) has an orthonormal set of eigenvectors, and let \(P\) consist of these eigenvectors as columns. Then \(P^{-1}=P^T\), and \(P^TAP=D\) a diagonal matrix. But then \(A=PDP^T\), and \[A^T=(PDP^T)^T = (P^T)^TD^TP^T=PDP^T=A\nonumber\] so \(A\) is symmetric.
Now given a symmetric matrix \(A\), one shows that eigenvectors corresponding to different eigenvalues are always orthogonal. So it suffices to apply the Gram-Schmidt process on the set of basic eigenvectors of each eigenvalue to obtain an orthonormal set of eigenvectors.
We demonstrate this in the following example.
Let \(A=\left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{array} \right ] .\) Find an orthogonal matrix \(P\) such that \(P^{T}AP\) is a diagonal matrix.
Solution
In this case, verify that the eigenvalues are 2 and 1. First we will find an eigenvector for the eigenvalue \(2\). This involves row reducing the following augmented matrix. \[\left [ \begin{array}{ccc|c} 2 - 1 & 0 & 0 & 0 \\ 0 & 2- \frac{3}{2} & - \frac{1}{2} & 0 \\ 0 & - \frac{1}{2} & 2- \frac{3}{2} & 0 \end{array} \right ]\nonumber\] The reduced row-echelon form is \[\left [ \begin{array}{rrr|r} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right ]\nonumber\] and so an eigenvector is \[\left [ \begin{array}{c} 0 \\ 1 \\ 1 \end{array} \right ]\nonumber\] Finally to obtain an eigenvector of length one (unit eigenvector) we simply divide this vector by its length to yield: \[\left [ \begin{array}{c} 0 \\ \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \right ]\nonumber\]
Next consider the case of the eigenvalue \(1\). To obtain basic eigenvectors, the matrix which needs to be row reduced in this case is \[\left [ \begin{array}{ccc|c} 1-1 & 0 & 0 & 0 \\ 0 & 1- \frac{3}{2} & - \frac{1}{2} & 0 \\ 0 & - \frac{1}{2} & 1- \frac{3}{2} & 0 \end{array} \right ]\nonumber\] The reduced row-echelon form is \[\left [ \begin{array}{rrr|r} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right ]\nonumber\] Therefore, the eigenvectors are of the form \[\left [ \begin{array}{c} s \\ -t \\ t \end{array} \right ]\nonumber\] Note that all these vectors are automatically orthogonal to eigenvectors corresponding to the first eigenvalue. This follows from the fact that \(A\) is symmetric, as mentioned earlier.
We obtain basic eigenvectors \[\left [ \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \right ] \text{ and }\left [ \begin{array}{r} 0 \\ -1 \\ 1 \end{array} \right ]\nonumber\] Since they are themselves orthogonal (by luck here) we do not need to use the Gram-Schmidt process and instead simply normalize these vectors to obtain \[\left [ \begin{array}{r} 1 \\ 0 \\ 0 \end{array} \right ] \text{ and }\left [ \begin{array}{c} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{array} \right ]\nonumber\] An orthogonal matrix \(P\) to orthogonally diagonalize \(A\) is then obtained by letting these basic vectors be the columns. \[P= \left [ \begin{array}{ccc} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{array} \right ]\nonumber\] We verify this works. \(P^{T}AP\) is of the form \[\left [ \begin{array}{ccc} 0 & - \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array} \right ] \left [ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{array} \right ] \left [ \begin{array}{ccc} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{array} \right ]\nonumber\] \[= \left [ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{array} \right ]\nonumber\] which is the desired diagonal matrix.
We can now apply this technique to efficiently compute high powers of a symmetric matrix.
Let \(A=\left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{array} \right ] .\) Compute \(A^7\).
Solution
We found in Example \(\PageIndex{2}\) that \(P^TAP=D\) is diagonal, where
\[P= \left [ \begin{array}{ccc} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{array} \right ] \text{ and } D = \left [ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{array} \right ]\nonumber \]
Thus \(A=PDP^T\) and \(A^7=PDP^T \; PDP^T \; \cdots PDP^T = PD^7P^T\) which gives:
\[\begin{array}{rr} A^7 & = \left [ \begin{array}{ccc} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{array} \right ] \left [ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{array} \right ] ^7 \left [ \begin{array}{ccc} 0 & - \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array} \right ] \\ & = \left [ \begin{array}{ccc} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{array} \right ] \left [ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2^7 \end{array} \right ] \left [ \begin{array}{ccc} 0 & - \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array} \right ] \\ & = \left [ \begin{array}{ccc} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{array} \right ] \left [ \begin{array}{ccc} 0 & - \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ 1 & 0 & 0 \\ 0 & \frac{2^7}{\sqrt{2}} & \frac{2^7}{\sqrt{2}} \end{array} \right ] \\ & = \left [ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \frac{2^7+1}{2} & \frac{2^7-1}{2}\\ 0 & \frac{2^7-1}{2} & \frac{2^7+1}{2} \end{array} \right ] \\ \end{array}\nonumber\]
Markov Matrices
There are applications of great importance which feature a special type of matrix. Matrices whose columns consist of non-negative numbers that sum to one are called Markov matrices . An important application of Markov matrices is in population migration, as illustrated in the following definition.
Let \(m\) locations be denoted by the numbers \(1,2,\cdots ,m.\) Suppose it is the case that each year the proportion of residents in location \(j\) which move to location \(i\) is \(a_{ij}\). Also suppose no one escapes or emigrates from without these \(m\) locations. This last assumption requires \(\sum_{i}a_{ij}=1\), and means that the matrix \(A\), such that \(A = \left [ a_{ij} \right ]\), is a Markov matrix. In this context, \(A\) is also called a migration matrix .
Consider the following example which demonstrates this situation.
Let \(A\) be a Markov matrix given by \[A = \left [ \begin{array}{rr} .4 & .2 \\ .6 & .8 \end{array} \right ]\nonumber\] Verify that \(A\) is a Markov matrix and describe the entries of \(A\) in terms of population migration.
Solution
The columns of \(A\) are comprised of non-negative numbers which sum to \(1\). Hence, \(A\) is a Markov matrix.
Now, consider the entries \(a_{ij}\) of \(A\) in terms of population. The entry \(a_{11} = .4\) is the proportion of residents in location one which stay in location one in a given time period. Entry \(a_{21} = .6\) is the proportion of residents in location 1 which move to location 2 in the same time period. Entry \(a_{12} = .2\) is the proportion of residents in location 2 which move to location 1. Finally, entry \(a_{22} = .8\) is the proportion of residents in location 2 which stay in location 2 in this time period.
Considered as a Markov matrix, these numbers are usually identified with probabilities. Hence, we can say that the probability that a resident of location one will stay in location one in the time period is \(.4\).
Observe that in Example \(\PageIndex{4}\) if there was initially say 15 thousand people in location 1 and 10 thousands in location 2, then after one year there would be \(.4 \times 15 + .2 \times 10 = 8\) thousands people in location 1 the following year, and similarly there would be \(.6 \times 15 + .8 \times 10 = 17\) thousands people in location 2 the following year.
More generally let \(X_n=\left [ x_{1n} \cdots x_{mn}\right ] ^{T}\) where \(x_{in}\) is the population of location \(i\) at time period \(n\). We call \(X_n\) the state vector at period \(n\) . In particular, we call \(X_0\) the initial state vector. Letting \(A\) be the migration matrix, we compute the population in each location \(i\) one time period later by \(AX_n\). In order to find the population of location \(i\) after \(k\) years, we compute the \(i^{th}\) component of \(A^{k}X.\) This discussion is summarized in the following theorem.
Let \(A\) be the migration matrix of a population and let \(X_n\) be the vector whose entries give the population of each location at time period \(n\). Then \(X_n\) is the state vector at period \(n\) and it follows that \[X_{n+1} = A X_n\nonumber\]
The sum of the entries of \(X_n\) will equal the sum of the entries of the initial vector \(X_{0}\). Since the columns of \(A\) sum to \(1\), this sum is preserved for every multiplication by \(A\) as demonstrated below. \[\sum_{i}\sum_{j}a_{ij}x_{j}=\sum_{j}x_{j}\left( \sum_{i}a_{ij}\right) =\sum_{j}x_{j}\nonumber\]
Consider the following example.
Consider the migration matrix \[A = \left [ \begin{array}{rrr} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{array} \right ]\nonumber\] for locations \(1,2,\) and \(3.\) Suppose initially there are \(100\) residents in location \(1\), \(200\) in location \(2\) and \(400\) in location \(3\). Find the population in the three locations after \(1,2,\) and \(10\) units of time.
Solution
Using Theorem \(\PageIndex{2}\) we can find the population in each location using the equation \(X_{n+1} = AX_n\). For the population after \(1\) unit, we calculate \(X_1 = AX_0\) as follows. \[\begin{aligned} X_1 &= AX_0 \\ \left [ \begin{array}{r} x_{11} \\ x_{21} \\ x_{31} \end{array}\right ] &= \left [ \begin{array}{rrr} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{array} \right ] \left [ \begin{array}{r} 100 \\ 200 \\ 400 \end{array} \right ] \\ &= \left [ \begin{array}{r} 100 \\ 180 \\ 420 \end{array}\right ]\end{aligned}\] Therefore after one time period, location \(1\) has \(100\) residents, location \(2\) has \(180\), and location \(3\) has \(420\). Notice that the total population is unchanged, it simply migrates within the given locations. We find the locations after two time periods in the same way. \[\begin{aligned} X_2 &= AX_1 \\ \left [ \begin{array}{r} x_{12} \\ x_{22} \\ x_{32} \end{array}\right ] &= \left [ \begin{array}{rrr} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{array} \right ] \left [ \begin{array}{r} 100 \\ 180 \\ 420 \end{array} \right ] \\ &= \left [ \begin{array}{r} 102 \\ 164 \\ 434 \end{array}\right ]\end{aligned}\]
We could progress in this manner to find the populations after \(10\) time periods. However from our above discussion, we can simply calculate \(\left( A^{n}X_0\right) _{i}\), where \(n\) denotes the number of time periods which have passed. Therefore, we compute the populations in each location after \(10\) units of time as follows. \[\begin{aligned} X_{10} &= A^{10}X_0 \\ \left [ \begin{array}{r} x_{1 10} \\ x_{2 10} \\ x_{3 10} \end{array} \right ] &= \left [ \begin{array}{rrr} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{array} \right ] ^{10}\left [ \begin{array}{r} 100 \\ 200 \\ 400 \end{array} \right ] \\ &= \left [ \begin{array}{c} 115.\, 085\,829\,22 \\ 120.\, 130\,672\,44 \\ 464.\, 783\,498\,34 \end{array} \right ]\end{aligned}\] Since we are speaking about populations, we would need to round these numbers to provide a logical answer. Therefore, we can say that after \(10\) units of time, there will be \(115\) residents in location one, \(120\) in location two, and \(465\) in location three.
A second important application of Markov matrices is the concept of random walks. Suppose a walker has \(m\) locations to choose from, denoted \(1, 2, \cdots, m\). Let \(a_{ij}\) refer to the probability that the person will travel to location \(i\) from location \(j\). Again, this requires that \[\sum_{i=1}^{k}a_{ij}=1\nonumber\] In this context, the vector \(X_n=\left [ x_{1n} \cdots x_{mn}\right ] ^{T}\) contains the probabilities \(x_{in}\) the walker ends up in location \(i, 1\leq i \leq m\) at time \(n\).
Suppose three locations exist, referred to as locations \(1, 2\) and \(3\). The Markov matrix of probabilities \(A = [a_{ij}]\) is given by \[\left [ \begin{array}{rrr} 0.4 & 0.1 & 0.5 \\ 0.4 & 0.6 & 0.1 \\ 0.2 & 0.3 & 0.4 \end{array} \right ]\nonumber\] If the walker starts in location \(1\), calculate the probability that he ends up in location \(3\) at time \(n = 2\).
Solution
Since the walker begins in location \(1\), we have \[X_{0} = \left [ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right ]\nonumber\] The goal is to calculate \(x_{32}\). To do this we calculate \(X_{2}\), using \(X_{n+1} = AX_{n}\). \[\begin{aligned} X_{1} &= A X_{0} \\ &= \left [ \begin{array}{rrr} 0.4 & 0.1 & 0.5 \\ 0.4 & 0.6 & 0.1 \\ 0.2 & 0.3 & 0.4 \end{array} \right ] \left [ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right ] \\ &= \left [ \begin{array}{r} 0.4 \\ 0.4 \\ 0.2 \end{array} \right ] \\\end{aligned}\] \[\begin{aligned} X_{2} &= A X_{1} \\ &= \left [ \begin{array}{rrr} 0.4 & 0.1 & 0.5 \\ 0.4 & 0.6 & 0.1 \\ 0.2 & 0.3 & 0.4 \end{array} \right ] \left [ \begin{array}{c} 0.4 \\ 0.4 \\ 0.2 \end{array} \right ] \\ &= \left [ \begin{array}{r} 0.3 \\ 0.42 \\ 0.28 \end{array} \right ] \\\end{aligned}\] This gives the probabilities that our walker ends up in locations 1, 2, and 3. For this example we are interested in location 3, with a probability on \(0.28\).
Returning to the context of migration, suppose we wish to know how many residents will be in a certain location after a very long time. It turns out that if some power of the migration matrix has all positive entries, then there is a vector \(X_s\) such that \(A^{n}X_{0}\) approaches \(X_s\) as \(n\) becomes very large. Hence as more time passes and \(n\) increases, \(A^{n}X_{0}\) will become closer to the vector \(X_s\).
Consider Theorem \(\PageIndex{2}\) . Let \(n\) increase so that \(X_n\) approaches \(X_s\). As \(X_n\) becomes closer to \(X_s\), so too does \(X_{n+1}\). For sufficiently large \(n\), the statement \(X_{n+1} = AX_n\) can be written as \(X_s = AX_s\).
This discussion motivates the following theorem.
Let \(A\) be a migration matrix. Then there exists a steady state vector written \(X_s\) such that \[X_s = AX_s\nonumber \] where \(X_s\) has positive entries which have the same sum as the entries of \(X_0\).
As \(n\) increases, the state vectors \(X_n\) will approach \(X_s\).
Note that the condition in Theorem \(\PageIndex{3}\) can be written as \((I - A)X_s=0\), representing a homogeneous system of equations.
Consider the following example. Notice that it is the same example as the Example \(\PageIndex{5}\) but here it will involve a longer time frame.
Consider the migration matrix \[A = \left [ \begin{array}{rrr} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{array} \right ]\nonumber \] for locations \(1,2,\) and \(3.\) Suppose initially there are 100 residents in location 1, 200 in location 2 and 400 in location 4. Find the population in the three locations after a long time.
Solution
By Theorem \(\PageIndex{3}\) the steady state vector \(X_s\) can be found by solving the system \((I-A)X_s = 0\).
Thus we need to find a solution to \[\left( \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ] -\left [ \begin{array}{rrr} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{array} \right ] \right) \left [ \begin{array}{c} x_{1s} \\ x_{2s}\\ x_{3s} \end{array} \right ] =\left [ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right ]\nonumber \] The augmented matrix and the resulting reduced row-echelon form are given by \[\left [ \begin{array}{rrr|r} 0.4 & 0 & -0.1 & 0 \\ -0.2 & 0.2 & 0 & 0 \\ -0.2 & -0.2 & 0.1 & 0 \end{array} \right ] \rightarrow \cdots \rightarrow \left [ \begin{array}{rrr|r} 1 & 0 & -0.25 & 0 \\ 0 & 1 & -0.25 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right ]\nonumber\] Therefore, the eigenvectors are \[t\left [ \begin{array}{c} 0.25 \\ 0.25 \\ 1 \end{array} \right ]\nonumber\]
The initial vector \(X_0\) is given by \[\left [ \begin{array}{r} 100 \\ 200 \\ 400 \end{array} \right ]\nonumber\]
Now all that remains is to choose the value of \(t\) such that \[0.25t+0.25t+t=100+200+400\nonumber\] Solving this equation for \(t\) yields \(t= \ \frac{1400}{3}\). Therefore the population in the long run is given by \[ \ \frac{1400}{3}\left [ \begin{array}{c} 0.25 \\ 0.25 \\ 1 \end{array} \right ] = \left [ \begin{array}{c} 116. 666\,666\,666\, 666\,7 \\ 116. 666\,666\,666\, 666\,7 \\ 466. 666\,666\,666\, 666\,7 \end{array} \right ]\nonumber\]
Again, because we are working with populations, these values need to be rounded. The steady state vector \(X_s\) is given by \[\left [ \begin{array}{c} 117 \\ 117 \\ 466 \end{array} \right ]\nonumber\]
We can see that the numbers we calculated in Example \(\PageIndex{5}\) for the populations after the \(10^{th}\) unit of time are not far from the long term values.
Consider another example.
Suppose a migration matrix is given by \[A = \left [ \begin{array}{ccc} \ \frac{1}{5} & \ \frac{1}{2} & \ \frac{1}{5} \\ \ \frac{1}{4} & \ \frac{1}{4} & \ \frac{1}{2} \\ \ \frac{11}{20} & \ \frac{1}{4} & \ \frac{3}{10} \end{array} \right ]\nonumber\] Find the comparison between the populations in the three locations after a long time.
Solution
In order to compare the populations in the long term, we want to find the steady state vector \(X_s\). Solve \[\left( \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ] -\left [ \begin{array}{ccc} \ \frac{1}{5} & \ \frac{1}{2} & \ \frac{1}{5} \\ \ \frac{1}{4} & \ \frac{1}{4} & \ \frac{1}{2} \\ \ \frac{11}{20} & \ \frac{1}{4} & \ \frac{3}{10} \end{array} \right ] \right) \left [ \begin{array}{c} x_{1s} \\ x_{2s} \\ x_{3s} \end{array} \right ] =\left [ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right ]\nonumber\] The augmented matrix and the resulting reduced row-echelon form are given by \[\left [ \begin{array}{rrr|r} \ \frac{4}{5} & - \ \frac{1}{2} & - \ \frac{1}{5} & 0 \\ - \ \frac{1}{4} & \ \frac{3}{4} & - \ \frac{1}{2} & 0 \\ - \ \frac{11}{20} & - \ \frac{1}{4} & \ \frac{7}{10} & 0 \end{array} \right ] \rightarrow \cdots \rightarrow \left [ \begin{array}{rrr|r} 1 & 0 & - \ \frac{16}{19} & 0 \\ 0 & 1 & - \ \frac{18}{19} & 0 \\ 0 & 0 & 0 & 0 \end{array} \right ]\nonumber\] and so an eigenvector is \[\left [ \begin{array}{c} 16 \\ 18 \\ 19 \end{array} \right ]\nonumber\]
Therefore, the proportion of population in location 2 to location 1 is given by \( \ \frac{18}{16}\). The proportion of population 3 to location 2 is given by \( \ \frac{19}{18}\).
Eigenvalues of Markov Matrices
The following is an important proposition.
Let \(A=\left [ a_{ij}\right ]\) be a migration matrix. Then \(1\) is always an eigenvalue for \(A.\)
- Proof
-
Remember that the determinant of a matrix always equals that of its transpose. Therefore, \[\det \left( \lambda I - A\right) =\det \left( \left( \lambda I - A\right) ^{T}\right) =\det \left( \lambda I - A^T\right)\nonumber\] because \(I^{T}=I.\) Thus the characteristic equation for \(A\) is the same as the characteristic equation for \(A^{T}\). Consequently, \(A\) and \(A^{T}\) have the same eigenvalues. We will show that \(1\) is an eigenvalue for \(A^{T}\) and then it will follow that \(1\) is an eigenvalue for \(A\).
Remember that for a migration matrix, \(\sum_{i}a_{ij}=1.\) Therefore, if \(A^{T}=\left [ b_{ij}\right ]\) with \(b_{ij}=a_{ji},\) it follows that \[\sum_{j}b_{ij}=\sum_{j}a_{ji}=1\nonumber\]
Therefore, from matrix multiplication, \[A^{T}\left [ \begin{array}{r} 1 \\ \vdots \\ 1 \end{array} \right ] =\left [ \begin{array}{c} \sum_{j}b_{ij} \\ \vdots \\ \sum_{j}b_{ij} \end{array} \right ] =\left [ \begin{array}{r} 1 \\ \vdots \\ 1 \end{array} \right ]\nonumber\]
Notice that this shows that \(\left [ \begin{array}{r} 1 \\ \vdots \\ 1 \end{array} \right ]\) is an eigenvector for \(A^{T}\) corresponding to the eigenvalue, \(\lambda =1.\) As explained above, this shows that \(\lambda =1\) is an eigenvalue for \(A\) because \(A\) and \(A^{T}\) have the same eigenvalues.
Below is a video on powers of matrices and Markov matrices.
Dynamical Systems
The migration matrices discussed above give an example of a discrete dynamical system. We call them discrete because they involve discrete values taken at a sequence of points rather than on a continuous interval of time.
An example of a situation which can be studied in this way is a predator prey model. Consider the following model where \(x\) is the number of prey and \(y\) the number of predators in a certain area at a certain time. These are functions of \(n\in \mathbb{N}\) where \(n=1,2,\cdots\) are the ends of intervals of time which may be of interest in the problem. In other words, \(x \left( n \right)\) is the number of prey at the end of the \(n^{th}\) interval of time. An example of this situation may be modeled by the following equation \[\left [ \begin{array}{c} x\left( n+1\right) \\ y\left( n+1\right) \end{array} \right ] =\left [ \begin{array}{rr} 2 & -3 \\ 1 & 4 \end{array} \right ] \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\nonumber\] This says that from time period \(n\) to \(n+1\), \(x\) increases if there are more \(x\) and decreases as there are more \(y\). In the context of this example, this means that as the number of predators increases, the number of prey decreases. As for \(y,\) it increases if there are more \(y\) and also if there are more \(x\).
This is an example of a matrix recurrence which we define now.
Suppose a dynamical system is given by \[\begin{aligned} x_{n+1} &= a x_n + b y_n \\ y_{n+1} &= c x_n + d y_n\end{aligned}\]
This system can be expressed as \(V_{n+1} = A V_{n}\) where \(V_{n} = \left [ \begin{array}{r} x_n \\ y_n \end{array} \right ]\) and \(A = \left [ \begin{array}{rr} a & b \\ c & d \end{array} \right ]\).
In this section, we will examine how to find solutions to a dynamical system given certain initial conditions. This process involves several concepts previously studied, including matrix diagonalization and Markov matrices. The procedure is given as follows. Recall that when diagonalized, we can write \(A^{n} = PD^{n}P^{-1}\).
Suppose a dynamical system is given by \[\begin{aligned} x_{n+1} &= a x_n + b y_n \\ y_{n+1} &= c x_n + d y_n\end{aligned}\]
Given initial conditions \(x_0\) and \(y_0\), the solutions to the system are found as follows:
- Express the dynamical system in the form \(V_{n+1} = AV_n\).
- Diagonalize \(A\) to be written as \(A = PDP^{-1}\).
- Then \(V_{n} = PD^{n} P^{-1} V_{0}\) where \(V_{0}\) is the vector containing the initial conditions.
- If given specific values for \(n\), substitute into this equation. Otherwise, find a general solution for \(n\).
We will now consider an example in detail.
Suppose a dynamical system is given by \[\begin{aligned} x_{n+1} &= 1.5 x_n - 0.5y_n\\ y_{n+1} &= 1.0 x_n\end{aligned}\]
Express this system as a matrix recurrence and find solutions to the dynamical system for initial conditions \(x_0=20, y_0=10\).
Solution
First, we express the system as a matrix recurrence. \[\begin{aligned} V_{n+1} &= AV_{n}\\ \left [ \begin{array}{c} x\left( n+1\right) \\ y\left( n+1\right) \end{array} \right ] &=\left [ \begin{array}{rr} 1.5 & -0.5 \\ 1.0 & 0 \end{array} \right ] \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\end{aligned}\]
Then \[A = \left [ \begin{array}{rr} 1.5 & -0.5 \\ 1.0 & 0 \end{array} \right ]\nonumber \] You can verify that the eigenvalues of \(A\) are \(1\) and \(.5\). By diagonalizing, we can write \(A\) in the form \[P^{-1} D P = \left [ \begin{array}{rr} 1 & 1 \\ 1 & 2 \end{array} \right ] \left [ \begin{array}{rr} 1 & 0 \\ 0 & .5 \end{array} \right ] \left [ \begin{array}{rr} 2 & -1 \\ -1 & 1 \end{array} \right ]\nonumber\]
Now given an initial condition \[V_0 = \left [ \begin{array}{c} x_{0} \\ y_{0} \end{array} \right ]\nonumber\] the solution to the dynamical system is given by \[\begin{aligned} V_n &= P D^n P^{-1} V_0\\ \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ] &=\left [ \begin{array}{rr} 1 & 1 \\ 1 & 2 \end{array} \right ] \left [ \begin{array}{rr} 1 & 0 \\ 0 & .5 \end{array} \right ] ^{n}\left [ \begin{array}{rr} 2 & -1 \\ -1 & 1 \end{array} \right ] \left [ \begin{array}{c} x_{0} \\ y_{0} \end{array} \right ] \\ &=\left [ \begin{array}{rr} 1 & 1 \\ 1 & 2 \end{array} \right ] \left [ \begin{array}{rr} 1 & 0 \\ 0 & \left( .5\right) ^{n} \end{array} \right ] \left [ \begin{array}{rr} 2 & -1 \\ -1 & 1 \end{array} \right ] \left [ \begin{array}{c} x_{0} \\ y_{0} \end{array} \right ] \\ &=\left [ \begin{array}{c} y_{0}\left( \left( .5\right) ^{n}-1\right) -x_{0}\left( \left( .5\right) ^{n}-2\right) \\ y_{0}\left( 2\left( .5\right) ^{n}-1\right) -x_{0}\left( 2\left( .5\right) ^{n}-2\right) \end{array} \right ] \end{aligned}\]
If we let \(n\) become arbitrarily large, this vector approaches \[\left [ \begin{array}{c} 2x_{0}-y_{0} \\ 2x_{0}-y_{0} \end{array} \right ]\nonumber\]
Thus for large \(n,\) \[\left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ] \approx \left [ \begin{array}{c} 2x_{0}-y_{0} \\ 2x_{0}-y_{0} \end{array} \right ]\nonumber\]
Now suppose the initial condition is given by \[\left [ \begin{array}{c} x_{0} \\ y_{0} \end{array} \right ] = \left [ \begin{array}{r} 20 \\ 10 \end{array} \right ]\nonumber\]
Then, we can find solutions for various values of \(n\). Here are the solutions for values of \(n\) between \(1\) and \(5\) \[n=1: \left [ \begin{array}{r} 25.0 \\ 20.0 \end{array} \right ], n=2: \left [ \begin{array}{r} 27.5 \\ 25.0 \end{array} \right ], n=3: \left [ \begin{array}{r} 28.75 \\ 27.5 \end{array} \right ]\nonumber\] \[n=4: \left [ \begin{array}{r} 29.375 \\ 28.75 \end{array} \right ], n=5: \left [ \begin{array}{r} 29.688 \\ 29.375 \end{array} \right ]\nonumber\]
Notice that as \(n\) increases, we approach the vector given by \[\left [ \begin{array}{c} 2x_{0}-y_{0} \\ 2x_{0}-y_{0} \end{array} \right ] = \left [ \begin{array}{r} 2\left(20\right)- 10\\ 2\left( 20 \right)-10 \end{array} \right ] = \left [ \begin{array}{r} 30\\ 30 \end{array} \right ]\nonumber\]
These solutions are graphed in the following figure.
The following example demonstrates another system which exhibits some interesting behavior. When we graph the solutions, it is possible for the ordered pairs to spiral around the origin.
Suppose a dynamical system is of the form \[\left [ \begin{array}{c} x\left( n+1\right) \\ y\left( n+1\right) \end{array} \right ] =\left [ \begin{array}{rr} 0.7 & 0.7 \\ -0.7 & 0.7 \end{array} \right ] \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\nonumber \] Find solutions to the dynamical system for given initial conditions.
Solution
Let \[A = \left [ \begin{array}{rr} 0.7 & 0.7 \\ -0.7 & 0.7 \end{array} \right ]\nonumber\] To find solutions, we must diagonalize \(A\). You can verify that the eigenvalues of \(A\) are complex and are given by \(\lambda_1 = .7+.7i\) and \(\lambda_2 = .7-.7i\). The eigenvector for \(\lambda_1 = .7+.7i\) is \[\left [ \begin{array}{r} 1 \\ i \end{array} \right ]\nonumber\] and that the eigenvector for \(\lambda_2 = .7-.7i\) is \[\left [ \begin{array}{r} 1 \\ -i \end{array} \right ]\nonumber\]
Thus the matrix \(A\) can be written in the form \[\left [ \begin{array}{rr} 1 & 1 \\ i & -i \end{array} \right ] \left [ \begin{array}{cc} .7+.7i & 0 \\ 0 & .7-.7i \end{array} \right ] \left [ \begin{array}{rr} \frac{1}{2} & - \frac{1}{2}i \\ \frac{1}{2} & \frac{1}{2}i \end{array} \right ]\nonumber\] and so, \[\begin{aligned} V_n &= PD^nP^{-1}V_0 \\ \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ] &=\left [ \begin{array}{rr} 1 & 1 \\ i & -i \end{array} \right ] \left [ \begin{array}{cc} \left( .7+.7i\right) ^{n} & 0 \\ 0 & \left( .7-.7i\right) ^{n} \end{array} \right ] \left [ \begin{array}{rr} \frac{1}{2} & - \frac{1}{2}i \\ \frac{1}{2} & \frac{1}{2}i \end{array} \right ] \left [ \begin{array}{c} x_{0} \\ y_{0} \end{array} \right ]\end{aligned}\]
The explicit solution is given by \[\left [ \begin{array}{c} x_{0}\left( \frac{1}{2}\left( 0.7-0.7i \right) ^{n}+ \frac{1}{2} \left( 0.7+0.7i\right) ^{n}\right) + y_{0}\left( \frac{1}{2} i\left( 0.7-0.7i\right) ^{n}-\frac{1}{2}i \left( 0.7+0.7i\right) ^{n}\right) \\ y_{0}\left( \frac{1}{2} \left( 0.7-0.7i \right) ^{n}+ \frac{1}{2} \left( 0.7+0.7i\right) ^{n}\right) - x_{0}\left( \frac{1}{2} i\left( 0.7-0.7i\right) ^{n}- \frac{1}{2}i\left( 0.7+0.7i\right) ^{n}\right) \end{array} \right ]\nonumber\]
Suppose the initial condition is \[\left [ \begin{array}{c} x_{0} \\ y_{0} \end{array} \right ] =\left [ \begin{array}{r} 10 \\ 10 \end{array} \right ]\nonumber\] Then one obtains the following sequence of values which are graphed below by letting \(n=1,2,\cdots ,20\)
In this picture, the dots are the values and the dashed line is to help to picture what is happening.
These points are getting gradually closer to the origin, but they are circling the origin in the clockwise direction as they do so. As \(n\) increases, the vector \(\left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\) approaches \(\left [ \begin{array}{r} 0 \\ 0 \end{array} \right ]\)
This type of behavior along with complex eigenvalues is typical of the deviations from an equilibrium point in the Lotka Volterra system of differential equations which is a famous model for predator-prey interactions. These differential equations are given by \[\begin{aligned} x^{\prime } &=x\left( a-by\right) \\ y^{\prime } &=-y\left( c-dx\right)\end{aligned}\] where \(a,b,c,d\) are positive constants. For example, you might have \(X\) be the population of moose and \(Y\) the population of wolves on an island.
Note that these equations make logical sense. The top says that the rate at which the moose population increases would be \(aX\) if there were no predators \(Y\). However, this is modified by multiplying instead by \(\left( a-bY\right)\) because if there are predators, these will militate against the population of moose. The more predators there are, the more pronounced is this effect. As to the predator equation, you can see that the equations predict that if there are many prey around, then the rate of growth of the predators would seem to be high. However, this is modified by the term \(-cY\) because if there are many predators, there would be competition for the available food supply and this would tend to decrease \(Y^{\prime }.\)
The behavior near an equilibrium point, which is a point where the right side of the differential equations equals zero, is of great interest. In this case, the equilibrium point is \[x=\frac{c}{d}, y=\frac{a}{b}\nonumber\] Then one defines new variables according to the formula \[x+\frac{c}{d}=x,\ y=y+\frac{a}{b}\nonumber\] In terms of these new variables, the differential equations become \[\begin{aligned} x^{\prime } &=\left( x+\frac{c}{d}\right) \left( a-b\left( y+\frac{a}{b} \right) \right) \\ y^{\prime } &=-\left( y+\frac{a}{b}\right) \left( c-d\left( x+\frac{c}{d} \right) \right)\end{aligned}\] Multiplying out the right sides yields \[\begin{aligned} x^{\prime } &=-bxy-b\frac{c}{d}y \\ y^{\prime } &=dxy+\frac{a}{b}dx\end{aligned}\] The interest is for \(x,y\) small and so these equations are essentially equal to \[x^{\prime }=-b\frac{c}{d}y,\ y^{\prime }=\frac{a}{b}dx\nonumber\]
Replace \(x^{\prime }\) with the difference quotient \(\frac{x\left( t+h\right) -x\left( t\right) }{h}\) where \(h\) is a small positive number and \(y^{\prime }\) with a similar difference quotient. For example one could have \(h\) correspond to one day or even one hour. Thus, for \(h\) small enough, the following would seem to be a good approximation to the differential equations. \[\begin{aligned} x\left( t+h\right) &=x\left( t\right) -hb\frac{c}{d}y \\ y\left( t+h\right) &=y\left( t\right) +h\frac{a}{b}dx\end{aligned}\] Let \(1,2,3,\cdots\) denote the ends of discrete intervals of time having length \(h\) chosen above. Then the above equations take the form \[\left [ \begin{array}{c} x\left( n+1\right) \\ y\left( n+1\right) \end{array} \right ] =\left [ \begin{array}{cc} 1 & - \frac{hbc}{d} \\ \frac{had}{b} & 1 \end{array} \right ] \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\nonumber\] Note that the eigenvalues of this matrix are always complex.
We are not interested in time intervals of length \(h\) for \(h\) very small. Instead, we are interested in much longer lengths of time. Thus, replacing the time interval with \(mh,\) \[\left [ \begin{array}{c} x\left( n+m\right) \\ y\left( n+m\right) \end{array} \right ] =\left [ \begin{array}{cc} 1 & - \frac{hbc}{d} \\ \frac{had}{b} & 1 \end{array} \right ] ^{m}\left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\nonumber\] For example, if \(m=2,\) you would have \[\left [ \begin{array}{c} x\left( n+2\right) \\ y\left( n+2\right) \end{array} \right ] =\left [ \begin{array}{cc} 1-ach^{2} & -2b \frac{c}{d}h \\ 2 \frac{a}{b}dh & 1-ach^{2} \end{array} \right ] \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\nonumber\] Note that most of the time, the eigenvalues of the new matrix will be complex.
You can also notice that the upper right corner will be negative by considering higher powers of the matrix. Thus letting \(1,2,3,\cdots\) denote the ends of discrete intervals of time, the desired discrete dynamical system is of the form \[\left [ \begin{array}{c} x\left( n+1\right) \\ y\left( n+1\right) \end{array} \right ] =\left [ \begin{array}{rr} a & -b \\ c & d \end{array} \right ] \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ]\nonumber\] where \(a,b,c,d\) are positive constants and the matrix will likely have complex eigenvalues because it is a power of a matrix which has complex eigenvalues.
You can see from the above discussion that if the eigenvalues of the matrix used to define the dynamical system are less than 1 in absolute value, then the origin is stable in the sense that as \(n\rightarrow \infty ,\) the solution converges to the origin. If either eigenvalue is larger than 1 in absolute value, then the solutions to the dynamical system will usually be unbounded, unless the initial condition is chosen very carefully. The next example exhibits the case where one eigenvalue is larger than 1 and the other is smaller than 1.
The following example demonstrates a familiar concept as a dynamical system.
The Fibonacci sequence is the sequence given by \[1, 1, 2, 3, 5, \cdots\nonumber\] which is defined recursively in the form \[x\left( 0\right) =1=x\left( 1\right) ,\ x\left( n+2\right) =x\left( n+1\right) +x\left( n\right)\nonumber\] Show how the Fibonacci Sequence can be considered a dynamical system.
Solution
This sequence is extremely important in the study of reproducing rabbits. It can be considered as a dynamical system as follows. Let \(y\left( n\right) =x\left( n+1\right) .\) Then the above recurrence relation can be written as \[\left [ \begin{array}{c} x\left( n+1\right) \\ y\left( n+1\right) \end{array} \right ] =\left [ \begin{array}{rr} 0 & 1 \\ 1 & 1 \end{array} \right ] \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ] ,\ \left [ \begin{array}{c} x\left( 0\right) \\ y\left( 0\right) \end{array} \right ] =\left [ \begin{array}{r} 1 \\ 1 \end{array} \right ]\nonumber \]
Let \[A = \left [ \begin{array}{rr} 0 & 1 \\ 1 & 1 \end{array} \right ]\nonumber \]
The eigenvalues of the matrix \(A\) are \(\lambda_1 = \frac{1}{2}-\frac{1}{2}\sqrt{5}\) and \(\lambda_2 = \frac{1}{2}\sqrt{5}+\frac{1}{2}\). The corresponding eigenvectors are, respectively, \[X_1 = \left [ \begin{array}{c} - \frac{1}{2}\sqrt{5}- \frac{1}{2} \\ 1 \end{array} \right ] , X_2 = \left [ \begin{array}{c} \frac{1}{2}\sqrt{5}- \frac{1}{2} \\ 1 \end{array} \right ]\nonumber \]
You can see from a short computation that one of the eigenvalues is smaller than 1 in absolute value while the other is larger than 1 in absolute value. Now, diagonalizing \(A\) gives us \[\left [ \begin{array}{cc} \frac{1}{2}\sqrt{5}- \frac{1}{2} & - \frac{1}{2}\sqrt{5}- \frac{1}{2} \\ 1 & 1 \end{array} \right ] ^{-1}\left [ \begin{array}{rr} 0 & 1 \\ 1 & 1 \end{array} \right ] \left [ \begin{array}{cc} \frac{1}{2}\sqrt{5}- \frac{1}{2} & - \frac{1}{2}\sqrt{5}- \frac{1}{2} \\ 1 & 1 \end{array} \right ]\nonumber \] \[=\left [ \begin{array}{cc} \frac{1}{2}\sqrt{5}+ \frac{1}{2} & 0 \\ 0 & \frac{1}{2}- \frac{1}{2}\sqrt{5} \end{array} \right ]\nonumber \]
Then it follows that for a given initial condition, the solution to this dynamical system is of the form \[\begin{aligned} \left [ \begin{array}{c} x\left( n\right) \\ y\left( n\right) \end{array} \right ] &=\left [ \begin{array}{cc} \frac{1}{2}\sqrt{5}- \frac{1}{2} & - \frac{1}{2}\sqrt{5}- \frac{1}{2} \\ 1 & 1 \end{array} \right ] \left [ \begin{array}{cc} \left( \frac{1}{2}\sqrt{5}+ \frac{1}{2}\right) ^{n} & 0 \\ 0 & \left( \frac{1}{2}- \frac{1}{2}\sqrt{5}\right) ^{n} \end{array} \right ] \cdot \\ &\left [ \begin{array}{cc} \frac{1}{5}\sqrt{5} & \frac{1}{10}\sqrt{5}+ \frac{1}{2} \\ - \frac{1}{5}\sqrt{5} & \frac{1}{5}\sqrt{5}\left( \frac{1}{2}\sqrt{5}- \frac{1 }{2}\right) \end{array} \right ] \left [ \begin{array}{r} 1 \\ 1 \end{array} \right ]\end{aligned}\] It follows that \[x\left( n\right) =\left( \frac{1}{2}\sqrt{5}+\frac{1}{2}\right) ^{n}\left( \frac{1}{10}\sqrt{5}+\frac{1}{2}\right) +\left( \frac{1}{2}-\frac{1}{2}\sqrt{5}\right) ^{n}\left( \frac{1}{2}-\frac{1}{10}\sqrt{5}\right)\nonumber \]
Here is a picture of the ordered pairs \(\left( x\left( n\right) ,y\left( n\right) \right)\) for \(n=0,1,\cdots ,n\).
There is so much more that can be said about dynamical systems. It is a major topic of study in differential equations and what is given above is just an introduction.
Below is a video Fibonacci numbers using linear algebra.
The Matrix Exponential
The goal of this section is to use the concept of the matrix exponential to solve first order linear differential equations. We begin by proving the matrix exponential.
Suppose \(A\) is a diagonalizable matrix. Then the matrix exponential , written \(e^{A}\), can be easily defined. Recall that if \(D\) is a diagonal matrix, then \[P^{-1}AP=D\nonumber \] \(D\) is of the form \[\left [ \begin{array}{ccc} \lambda _{1} & & 0 \\ & \ddots & \\ 0 & & \lambda _{n} \end{array} \right ] \label{diagonalmatrix}\] and it follows that \[D^{m}=\left [ \begin{array}{ccc} \lambda _{1}^{m} & & 0 \\ & \ddots & \\ 0 & & \lambda _{n}^{m} \end{array} \right ]\nonumber \]
Since \(A\) is diagonalizable, \[A=PDP^{-1}\nonumber \] and \[A^{m}=PD^{m}P^{-1}\nonumber \]
Recall why this is true. \[A=PDP^{-1}\nonumber \] and so \[\begin{aligned} A^{m} &=\overset{ \text{m times}}{\overbrace{PDP^{-1}PDP^{-1}PDP^{-1}\cdots PDP^{-1}}} \\ &=PD^{m}P^{-1}\end{aligned}\]
We now will examine what is meant by the matrix exponental \(e^{A}\). Begin by formally writing the following power series for \(e^{A}\): \[e^{A} = \sum_{k=0}^{\infty }\frac{A^{k}}{k!}=\sum_{k=0}^{\infty }\frac{PD^{k}P^{-1}}{k!}=P \left( \sum_{k=0}^{\infty }\frac{D^{k}}{k!} \right)P^{-1}\nonumber \] If \(D\) is given above in \(\eqref{diagonalmatrix}\), the above sum is of the form \[P \left( \sum_{k=0}^{\infty }\left [ \begin{array}{ccc} \frac{1}{k!}\lambda _{1}^{k} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{k!}\lambda _{n}^{k} \end{array} \right ] \right) P^{-1}\nonumber \] This can be rearranged as follows: \[e^{A}=P\left [ \begin{array}{ccc} \sum_{k=0}^{\infty }\frac{1}{k!}\lambda _{1}^{k} & & 0 \\ & \ddots & \\ 0 & & \sum_{k=0}^{\infty }\frac{1}{k!}\lambda _{n}^{k} \end{array} \right ] P^{-1}\nonumber \] \[=P\left [ \begin{array}{ccc} e^{\lambda _{1}} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda _{n}} \end{array} \right ] P^{-1}\nonumber \]
This justifies the following theorem.
Let \(A\) be a diagonalizable matrix, with eigenvalues \(\lambda_1, ..., \lambda_n\)and corresponding matrix of eigenvectors \(P\). Then the matrix exponential, \(e^{A}\), is given by \[e^{A} = P\left [ \begin{array}{ccc} e^{\lambda _{1}} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda _{n}} \end{array} \right ] P^{-1}\nonumber \]
Let \[A=\left [ \begin{array}{rrr} 2 & -1 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{array} \right ]\nonumber \] Find \(e^{A}\).
Solution
The eigenvalues work out to be \(1,2,3\) and eigenvectors associated with these eigenvalues are \[\left [ \begin{array}{r} 0 \\ -1 \\ 1 \end{array} \right ] \leftrightarrow 1, \left [ \begin{array}{r} -1 \\ -1 \\ 1 \end{array} \right ] \leftrightarrow 2,\left [ \begin{array}{r} -1 \\ 0 \\ 1 \end{array} \right ] \leftrightarrow 3\nonumber \] Then let \[D=\left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{array} \right ], P=\left [ \begin{array}{rrr} 0 & -1 & -1 \\ -1 & -1 & 0 \\ 1 & 1 & 1 \end{array} \right ]\nonumber \] and so \[P^{-1}=\left [ \begin{array}{rrr} 1 & 0 & 1 \\ -1 & -1 & -1 \\ 0 & 1 & 1 \end{array} \right ]\nonumber \]
Then the matrix exponential is \[e^{At} = \left [ \begin{array}{rrr} 0 & -1 & -1 \\ -1 & -1 & 0 \\ 1 & 1 & 1 \end{array} \right ] \left [ \begin{array}{ccc} e^{1} & 0 & 0 \\ 0 & e^{2} & 0 \\ 0 & 0 & e^{3} \end{array} \right ] \left [ \begin{array}{rrr} 1 & 0 & 1 \\ -1 & -1 & -1 \\ 0 & 1 & 1 \end{array} \right ]\nonumber \] \[\left [ \begin{array}{ccc} e^{2} & e^{2}-e^{3} & e^{2}-e^{3} \\ e^{2}-e & e^{2} & e^{2}-e \\ -e^{2}+e & -e^{2}+e^{3} & -e^{2}+e+e^{3} \end{array} \right ]\nonumber \]
The matrix exponential is a useful tool to solve autonomous systems of first order linear differential equations. These are equations which are of the form \[X^{\prime }=AX, X(0) = C\nonumber \] where \(A\) is a diagonalizable \(n\times n\) matrix and \(C\) is a constant vector. \(X\) is a vector of functions in one variable, \(t\): \[X = X(t) = \left [ \begin{array}{c} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{array} \right ]\nonumber \] Then \(X^{\prime }\) refers to the first derivative of \(X\) and is given by \[X^{\prime} = X^{\prime}(t) = \left [ \begin{array}{c} x_1^{\prime}(t) \\ x_2^{\prime}(t) \\ \vdots \\ x_n^{\prime}(t) \end{array} \right ], \; x_i^{\prime}(t) = \text{the derivative of}\; x_i(t)\nonumber \]
Then it turns out that the solution to the above system of equations is \(X\left( t\right) =e^{At}C\). To see this, suppose \(A\) is diagonalizable so that \[A=P\left [ \begin{array}{ccc} \lambda _{1} & & \\ & \ddots & \\ & & \lambda _{n} \end{array} \right ] P^{-1}\nonumber \] Then \[e^{At}=P\left [ \begin{array}{ccc} e^{\lambda _{1}t} & & \\ & \ddots & \\ & & e^{\lambda _{n}t} \end{array} \right ] P^{-1}\nonumber \] \[e^{At}C=P\left [ \begin{array}{ccc} e^{\lambda _{1}t} & & \\ & \ddots & \\ & & e^{\lambda _{n}t} \end{array} \right ] P^{-1}C\nonumber \]
Differentiating \(e^{At}C\) yields \[X^{\prime} = \left( e^{At}C\right) ^{\prime }=P\left [ \begin{array}{ccc} \lambda _{1}e^{\lambda _{1}t} & & \\ & \ddots & \\ & & \lambda _{n}e^{\lambda _{n}t} \end{array} \right ] P^{-1}C\nonumber \] \[=P\left [ \begin{array}{ccc} \lambda _{1} & & \\ & \ddots & \\ & & \lambda _{n} \end{array} \right ] \left [ \begin{array}{ccc} e^{\lambda _{1}t} & & \\ & \ddots & \\ & & e^{\lambda _{n}t} \end{array} \right ] P^{-1}C\nonumber \] \[\begin{aligned} &=P\left [ \begin{array}{ccc} \lambda _{1} & & \\ & \ddots & \\ & & \lambda _{n} \end{array} \right ] P^{-1}P\left [ \begin{array}{ccc} e^{\lambda _{1}t} & & \\ & \ddots & \\ & & e^{\lambda _{n}t} \end{array} \right ] P^{-1}C \\ &=A\left( e^{At}C\right) = AX\end{aligned}\] Therefore \(X = X(t) = e^{At}C\) is a solution to \(X^{\prime }=AX\).
To prove that \(X(0) = C\) if \(X(t) = e^{At}C\): \[X(0) = e^{A0}C=P\left [ \begin{array}{ccc} 1 & & \\ & \ddots & \\ & & 1 \end{array} \right ] P^{-1}C=C\nonumber \]
Solve the initial value problem \[\left [ \begin{array}{c} x \\ y \end{array} \right ] ^{\prime }=\left [ \begin{array}{rr} 0 & -2 \\ 1 & 3 \end{array} \right ] \left [ \begin{array}{c} x \\ y \end{array} \right ] ,\ \left [ \begin{array}{c} x(0)\\ y(0) \end{array} \right ] =\left [ \begin{array}{c} 1 \\ 1 \end{array} \right ]\nonumber \]
Solution
The matrix is diagonalizable and can be written as \[\begin{aligned} A &= PDP^{-1} \\ \left [ \begin{array}{rr} 0 & -2 \\ 1 & 3 \end{array} \right ] &=\left [ \begin{array}{rr} 1 & 1 \\ -\frac{1}{2} & -1 \end{array} \right ] \left [ \begin{array}{rr} 1 & 0 \\ 0 & 2 \end{array} \right ] \left [ \begin{array}{rr} 2 & 2 \\ -1 & -2 \end{array} \right ]\end{aligned}\] Therefore, the matrix exponential is of the form \[e^{At} = \left [ \begin{array}{rr} 1 & 1 \\ -\frac{1}{2} & -1 \end{array} \right ] \left [ \begin{array}{cc} e^{t} & 0 \\ 0 & e^{2t} \end{array} \right ] \left [ \begin{array}{rr} 2 & 2 \\ -1 & -2 \end{array} \right ]\nonumber \] The solution to the initial value problem is \[\begin{aligned} X(t) &= e^{At}C \\ \left [ \begin{array}{c} x\left( t\right) \\ y\left( t\right) \end{array} \right ] &= \left [ \begin{array}{rr} 1 & 1 \\ -\frac{1}{2} & -1 \end{array} \right ] \left [ \begin{array}{cc} e^{t} & 0 \\ 0 & e^{2t} \end{array} \right ] \left [ \begin{array}{rr} 2 & 2 \\ -1 & -2 \end{array} \right ] \left [ \begin{array}{c} 1 \\ 1 \end{array} \right ] \\ &=\left [ \begin{array}{c} 4e^{t}-3e^{2t} \\ 3e^{2t}-2e^{t} \end{array} \right ]\end{aligned}\] We can check that this works: \[\begin{aligned} \left [ \begin{array}{c} x\left( 0\right) \\ y\left( 0\right) \end{array} \right ] &= \left [ \begin{array}{c} 4e^{0}-3e^{2(0)} \\ 3e^{2(0)}-2e^{0} \end{array} \right ] \\ &= \left [ \begin{array}{c} 1 \\ 1 \end{array} \right ]\end{aligned}\]
Lastly, \[X^{\prime} = \left [ \begin{array}{c} 4e^{t}-3e^{2t} \\ 3e^{2t}-2e^{t} \end{array} \right ] ^{\prime }=\left [ \begin{array}{c} 4e^{t}-6e^{2t} \\ 6e^{2t}-2e^{t} \end{array} \right ]\nonumber \] and \[AX = \left [ \begin{array}{rr} 0 & -2 \\ 1 & 3 \end{array} \right ] \left [ \begin{array}{c} 4e^{t}-3e^{2t} \\ 3e^{2t}-2e^{t} \end{array} \right ] =\left [ \begin{array}{c} 4e^{t}-6e^{2t} \\ 6e^{2t}-2e^{t} \end{array} \right ]\nonumber \] which is the same thing. Thus this is the solution to the initial value problem.
Below is a video on the using linear algebra to solve systems of differential equations.