8.2: Diagonalization
Outcomes
- Determine when it is possible to diagonalize a matrix.
- When possible, diagonalize a matrix.
Similarity and Diagonalization
We begin this section by recalling the definition of similar matrices. Recall that if \(A,B\) are two \(n\times n\) matrices, then they are similar if and only if there exists an invertible matrix \(P\) such that \[A=P^{-1}BP\nonumber \]
In this case we write \(A \sim B\). The concept of similarity is an example of an equivalence relation.
Lemma \(\PageIndex{1}\): Similarity is an Equivalence Relation
Similarity is an equivalence relation, i.e. for \(n \times n\) matrices \(A,B,\) and \(C\),
- \(A \sim A\) (reflexive)
- If \(A \sim B\), then \(B \sim A\) (symmetric)
- If \(A \sim B\) and \(B \sim C\), then \(A \sim C\) (transitive)
Proof
It is clear that \(A\sim A\), taking \(P=I\).
Now, if \(A\sim B,\) then for some \(P\) invertible, \[A=P^{-1}BP\nonumber \] and so \[PAP^{-1}=B\nonumber \] But then \[\left( P^{-1}\right) ^{-1}AP^{-1}=B\nonumber \] which shows that \(B\sim A\).
Now suppose \(A\sim B\) and \(B\sim C\). Then there exist invertible matrices \(P,Q\) such that \[A=P^{-1}BP,\ B=Q^{-1}CQ\nonumber \] Then, \[A=P^{-1} \left( Q^{-1}CQ \right)P=\left( QP\right) ^{-1}C\left( QP\right)\nonumber \] showing that \(A\) is similar to \(C\).
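Another important concept necessary to this section is the trace of a matrix. If \(A=\left[a_{ij}\right]\) is an \(n\times n\) matrix, then the trace of \(A\) is \[\mathrm{trace}(A)=\sum_{i=1}^{n}a_{ii}\nonumber \]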
In words, the trace of a matrix is the sum of the entries on the main diagonal.
The following theorem includes a reference to the characteristic polynomial of a matrix. Recall that for any \(n \times n\) matrix \(A\), the characteristic polynomial of \(A\) is \(c_A(x)=\det(xI-A)\).
We now proceed to the main concept of this section. When a matrix is similar to a diagonal matrix, the matrix is said to be diagonalizable. We define a diagonal matrix \(D\) as a matrix containing a zero in every entry except those on the main diagonal. More precisely, if \(d_{ij}\) is the \(ij^{th}\) entry of a diagonal matrix \(D\), then \(d_{ij}=0\) unless \(i=j\). Such matrices look like the following. \[D = \left[ \begin{array}{ccc} \ast & & 0 \\ & \ddots & \\ 0 & & \ast \end{array} \right]\nonumber \] where \(\ast\) is a number which might not be zero.
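The following is the formal definition of a diagonalizable matrix: an \(n\times n\) matrix \(A\) is diagonalizable if there exists an invertible matrix \(P\) such that \[P^{-1}AP=D\nonumber \] where \(D\) is a diagonal matrix.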
Notice that the above equation can be rearranged as \(A=PDP^{-1}\). Suppose we wanted to compute \(A^{100}\). By diagonalizing \(A\) first it suffices to then compute \(\left(PDP^{-1}\right)^{100}\), which reduces to \(PD^{100}P^{-1}\). This last computation is much simpler than \(A^{100}\). While this process is described in detail later, it provides motivation for diagonalization.
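To see why the power collapses, it may help to write out a small case. The sketch below uses the exponent \(3\) rather than \(100\), purely for illustration; each inner product \(P^{-1}P\) equals \(I\) and drops out. \[\left(PDP^{-1}\right)^{3}=PD\left(P^{-1}P\right)D\left(P^{-1}P\right)DP^{-1}=PD^{3}P^{-1}\nonumber \] The same cancellation gives \(\left(PDP^{-1}\right)^{k}=PD^{k}P^{-1}\) for any positive integer \(k\). Moreover, \(D^{k}\) is found by raising each diagonal entry of \(D\) to the \(k^{th}\) power: \[D^{k}=\left[\begin{array}{ccc} d_{11}^{k} & & 0 \\ & \ddots & \\ 0 & & d_{nn}^{k} \end{array} \right]\nonumber \]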
Diagonalizing a Matrix
The following theorem is the central result on diagonalizability.
Theorem \(\PageIndex{2}\): Eigenvectors and Diagonalizable Matrices
An \(n\times n\) matrix \(A\) is diagonalizable if and only if there is an invertible matrix \(P\) given by \[P=\left[\begin{array}{cccc} X_{1} & X_{2} & \cdots & X_{n} \end{array} \right]\nonumber\] where the \(X_{k}\) are eigenvectors of \(A\).
Moreover if \(A\) is diagonalizable, the corresponding eigenvalues of \(A\) are the diagonal entries of the diagonal matrix \(D\).
Proof
Suppose \(P\) is given as above as an invertible matrix whose columns are eigenvectors of \(A\). Then \(P^{-1}\) is of the form \[P^{-1}=\left[\begin{array}{c} W_{1}^{T} \\ W_{2}^{T} \\ \vdots \\ W_{n}^{T} \end{array} \right]\nonumber \] where, because \(P^{-1}P=I\), we have \(W_{k}^{T}X_{j}=\delta _{kj}\). Here \(\delta _{kj}\) is the Kronecker delta, defined by \[\delta _{kj}=\left\{ \begin{array}{ll} 1 & \text{if }k=j \\ 0 & \text{if }k\neq j \end{array} \right.\nonumber \]
Then \[\begin{aligned} P^{-1}AP & = \left[\begin{array}{c} W_{1}^{T} \\ W_{2}^{T} \\ \vdots \\ W_{n}^{T} \end{array} \right] \left[\begin{array}{cccc} AX_{1} & AX_{2} & \cdots & AX_{n} \end{array} \right] \\ & = \left[\begin{array}{c} W_{1}^{T} \\ W_{2}^{T} \\ \vdots \\ W_{n}^{T} \end{array} \right] \left[\begin{array}{cccc} \lambda _{1}X_{1} & \lambda _{2}X_{2} & \cdots & \lambda _{n}X_{n} \end{array} \right] \\ &= \left[\begin{array}{ccc} \lambda _{1} & & 0 \\ & \ddots & \\ 0 & & \lambda _{n} \end{array} \right] \end{aligned}\]
Conversely, suppose \(A\) is diagonalizable so that \(P^{-1}AP=D.\) Let \[P=\left[\begin{array}{cccc} X_{1} & X_{2} & \cdots & X_{n} \end{array} \right]\nonumber \] where the columns are the \(X_{k}\) and \[D=\left[\begin{array}{ccc} \lambda _{1} & & 0 \\ & \ddots & \\ 0 & & \lambda _{n} \end{array} \right]\nonumber \] Then \[AP=PD=\left[\begin{array}{cccc} X_{1} & X_{2} & \cdots & X_{n} \end{array} \right] \left[\begin{array}{ccc} \lambda _{1} & & 0 \\ & \ddots & \\ 0 & & \lambda _{n} \end{array} \right]\nonumber\] and so \[\left[\begin{array}{cccc} AX_{1} & AX_{2} & \cdots & AX_{n} \end{array} \right] =\left[\begin{array}{cccc} \lambda _{1}X_{1} & \lambda _{2}X_{2} & \cdots & \lambda _{n}X_{n} \end{array} \right]\nonumber\] showing the \(X_{k}\) are eigenvectors of \(A\) and the \(\lambda _{k}\) are the corresponding eigenvalues.
Notice that because the matrix \(P\) defined above is invertible, it follows that the set of eigenvectors of \(A\), \(\left\{ X_1, X_2, \ldots, X_n \right\}\), forms a basis of \(\mathbb{R}^n\).
We demonstrate the concept given in the above theorem in the next example. Note that it is not enough for the columns of \(P\) to be eigenvectors of \(A\): \(P\) must also be invertible, so its columns must be linearly independent. We achieve this by using basic eigenvectors for the columns of \(P\).
Example \(\PageIndex{1}\): Diagonalize a Matrix
Let \[A=\left[\begin{array}{rrr} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{array} \right]\nonumber\] Find an invertible matrix \(P\) and a diagonal matrix \(D\) such that \(P^{-1}AP=D\).
Solution
By Theorem \(\PageIndex{2}\) we use the eigenvectors of \(A\) as the columns of \(P\), and the corresponding eigenvalues of \(A\) as the diagonal entries of \(D\).
First, we will find the eigenvalues of \(A\). To do so, we solve \(\det \left( \lambda I -A \right) =0\) as follows. \[\det \left( \lambda \left[\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] - \left[\begin{array}{rrr} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{array} \right] \right) = 0\nonumber \]
This computation is left as an exercise, and you should verify that the eigenvalues are \(\lambda_1 =2, \lambda_2 = 2\), and \(\lambda_3 = 6\).
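For readers who want to check the exercise, one possible route is to expand the determinant along the first row, which has a single nonzero entry. \[\det \left( \lambda I - A \right) = \det \left[\begin{array}{ccc} \lambda - 2 & 0 & 0 \\ -1 & \lambda - 4 & 1 \\ 2 & 4 & \lambda - 4 \end{array} \right] = \left( \lambda - 2 \right) \left[ \left( \lambda - 4 \right)^{2} - 4 \right] = \left( \lambda - 2 \right) \left( \lambda^{2} - 8\lambda + 12 \right) = \left( \lambda - 2 \right)^{2} \left( \lambda - 6 \right)\nonumber \] Setting this product equal to zero gives the root \(\lambda = 2\) with multiplicity two and the root \(\lambda = 6\) with multiplicity one.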
Next, we need to find the eigenvectors. We first find the eigenvectors for \(\lambda_1, \lambda_2 = 2\). Solving \(\left(2I - A \right)X = 0\) to find the eigenvectors, we find that the eigenvectors are \[t\left[\begin{array}{r} -2 \\ 1 \\ 0 \end{array} \right] +s\left[\begin{array}{r} 1 \\ 0 \\ 1 \end{array} \right]\nonumber \] where \(t,s\) are scalars. Hence there are two basic eigenvectors which are given by \[X_1 = \left[\begin{array}{r} -2 \\ 1 \\ 0 \end{array} \right], X_2 = \left[\begin{array}{r} 1 \\ 0 \\ 1 \end{array} \right]\nonumber \]
You can verify that the basic eigenvector for \(\lambda_3 =6\) is \(X_3 = \left[\begin{array}{r} 0 \\ 1 \\ -2 \end{array} \right]\).
Then, we construct the matrix \(P\) as follows. \[P= \left[\begin{array}{rrr} X_1 & X_2 & X_3 \end{array} \right] = \left[\begin{array}{rrr} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & -2 \end{array} \right]\nonumber \] That is, the columns of \(P\) are the basic eigenvectors of \(A\). Then, you can verify that \[P^{-1}=\left[\begin{array}{rrr} - \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{4} & \frac{1}{2} & - \frac{1}{4} \end{array} \right]\nonumber \] Thus, \[\begin{aligned} P^{-1}AP &=\left[\begin{array}{rrr} - \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{4} & \frac{1}{2} & - \frac{1}{4} \end{array} \right] \left[\begin{array}{rrr} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{array} \right] \left[\begin{array}{rrr} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & -2 \end{array} \right] \\ &=\left[\begin{array}{rrr} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{array} \right] \end{aligned}\]
You can see that the result here is a diagonal matrix where the entries on the main diagonal are the eigenvalues of \(A\). We expected this based on Theorem \(\PageIndex{2}\). Notice that eigenvalues on the main diagonal must be in the same order as the corresponding eigenvectors in \(P\).
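The verification steps in this example can also be checked by machine. The following is a minimal sketch in Python using exact fractions from the standard library. The helper names `matmul`, `power_by_diagonalization`, and `power_by_repeated_multiplication` are illustrative choices and do not come from the text. The matrices \(A\), \(P\), and \(P^{-1}\) are copied from the example.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Matrices copied from the example.
A = [[2, 0, 0], [1, 4, -1], [-2, -4, 4]]
P = [[-2, 1, 0], [1, 0, 1], [0, 1, -2]]
P_inv = [[F(-1, 4), F(1, 2), F(1, 4)],
         [F(1, 2), F(1, 1), F(1, 2)],
         [F(1, 4), F(1, 2), F(-1, 4)]]

# Check that P_inv really is the inverse of P.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(P, P_inv) == identity

# P^{-1} A P should be diagonal with the eigenvalues 2, 2, 6.
D = matmul(matmul(P_inv, A), P)
print([[int(x) for x in row] for row in D])  # [[2, 0, 0], [0, 2, 0], [0, 0, 6]]

def power_by_diagonalization(k):
    """Compute A^k as P D^k P^{-1}; D^k only needs powers of the diagonal."""
    D_k = [[D[i][j] ** k if i == j else 0 for j in range(3)]
           for i in range(3)]
    return matmul(matmul(P, D_k), P_inv)

def power_by_repeated_multiplication(k):
    """Compute A^k by k successive multiplications, for comparison."""
    result = identity
    for _ in range(k):
        result = matmul(result, A)
    return result

# Both routes agree, here for the (arbitrarily chosen) exponent 5.
assert power_by_diagonalization(5) == power_by_repeated_multiplication(5)
```

Exact fractions are used instead of floating-point numbers so that the comparisons with the identity matrix and with \(D\) hold exactly rather than up to rounding.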
Consider the next important theorem.
Theorem \(\PageIndex{3}\): Linearly Independent Eigenvectors
Let \(A\) be an \(n\times n\) matrix, and suppose that \(A\) has distinct eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_m\). For each \(i\), let \(X_i\) be a \(\lambda_i\)-eigenvector of \(A\). Then \(\{ X_1, X_2, \ldots, X_m\}\) is linearly independent.
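The corollary that follows from this theorem gives a useful tool in determining if \(A\) is diagonalizable.
Corollary \(\PageIndex{1}\): Distinct Eigenvalues
Let \(A\) be an \(n \times n\) matrix and suppose it has \(n\) distinct eigenvalues. Then \(A\) is diagonalizable.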
It is possible that a matrix \(A\) cannot be diagonalized. In other words, we cannot find an invertible matrix \(P\) so that \(P^{-1}AP=D\).
Consider the following example.
Example \(\PageIndex{2}\): A Matrix which cannot be Diagonalized
Let \[A = \left[\begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array} \right]\nonumber\] If possible, find an invertible matrix \(P\) and diagonal matrix \(D\) so that \(P^{-1}AP=D\).
Solution
Through the usual procedure, we find that the eigenvalues of \(A\) are \(\lambda_1 =1, \lambda_2=1.\) To find the eigenvectors, we solve the equation \(\left(\lambda I - A \right) X = 0\). The matrix \(\left(\lambda I -A \right)\) is given by \[\left[\begin{array}{cc} \lambda - 1 & -1 \\ 0 & \lambda - 1 \end{array} \right]\nonumber\]
Substituting in \(\lambda = 1\), we have the matrix \[\left[\begin{array}{cc} 1 - 1 & -1 \\ 0 & 1 - 1 \end{array} \right] = \left[\begin{array}{rr} 0 & -1 \\ 0 & 0 \end{array} \right]\nonumber\]
Then, solving the equation \(\left(\lambda I - A\right) X = 0\) involves carrying the following augmented matrix to its reduced row-echelon form. \[\left[\begin{array}{rr|r} 0 & -1 & 0 \\ 0 & 0 & 0 \end{array} \right] \rightarrow \cdots \rightarrow \left[\begin{array}{rr|r} 0 & 1 & 0 \\ 0 & 0 & 0 \end{array} \right]\nonumber\]
Then the eigenvectors are of the form \[t\left[\begin{array}{r} 1 \\ 0 \end{array} \right]\nonumber\] and the basic eigenvector is \[X_1 = \left[\begin{array}{r} 1 \\ 0 \end{array} \right]\nonumber\]
In this case, the matrix \(A\) has one eigenvalue of multiplicity two, but only one basic eigenvector. In order to diagonalize \(A\), we need to construct an invertible \(2\times 2\) matrix \(P\). However, because \(A\) only has one basic eigenvector, we cannot construct this \(P\). Notice that if we were to use \(X_1\) as both columns of \(P\), \(P\) would not be invertible. For this reason, we cannot repeat eigenvectors in \(P\).
Hence this matrix cannot be diagonalized.
The idea that a matrix may not be diagonalizable suggests that conditions exist to determine when it is possible to diagonalize a matrix. We saw earlier in Corollary \(\PageIndex{1}\) that an \(n \times n\) matrix with \(n\) distinct eigenvalues is diagonalizable. It turns out that there are other useful diagonalizability tests.
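First we need the following definition. Let \(A\) be an \(n\times n\) matrix and let \(\lambda\) be a scalar. The eigenspace of \(A\) corresponding to \(\lambda\), written \(E_{\lambda}(A)\), is the set of all eigenvectors of \(A\) corresponding to \(\lambda\), together with the zero vector.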
In other words, the eigenspace \(E_{\lambda}(A)\) is all \(X\) such that \(AX = \lambda X\). Notice that this set can be written \(E_{\lambda}(A) = \mathrm{null}(\lambda I - A)\), showing that \(E_{\lambda}(A)\) is a subspace of \(\mathbb{R}^n\).
Recall that the multiplicity of an eigenvalue \(\lambda\) is the number of times that it occurs as a root of the characteristic polynomial.
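Consider now the following lemma: if \(\lambda\) is an eigenvalue of \(A\) of multiplicity \(m\), then \(\dim\left(E_{\lambda}(A)\right)\leq m\).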
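This result tells us that if \(\lambda\) is an eigenvalue of \(A\), then the number of linearly independent \(\lambda\)-eigenvectors is never more than the multiplicity of \(\lambda\). We now use this fact to provide a useful diagonalizability condition: an \(n\times n\) matrix \(A\) is diagonalizable if and only if, for each eigenvalue \(\lambda\) of \(A\), \(\dim\left(E_{\lambda}(A)\right)\) is equal to the multiplicity of \(\lambda\).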
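Comparing the dimension of each eigenspace with the multiplicity of its eigenvalue can be automated. The sketch below is one possible approach, not a definitive implementation: it computes \(\dim\left(E_{\lambda}(A)\right)\) as \(n\) minus the rank of \(\lambda I - A\), using Gaussian elimination over exact fractions. The function names `rank` and `eigenspace_dim` are assumptions made for illustration. The eigenvalues and their multiplicities are taken from the two examples earlier in this section rather than computed.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0),
                     None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def eigenspace_dim(A, lam):
    """Dimension of null(lam*I - A), i.e. n - rank(lam*I - A)."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)]
         for i in range(n)]
    return n - rank(M)

# Matrix from the example that could be diagonalized.
A1 = [[2, 0, 0], [1, 4, -1], [-2, -4, 4]]
print(eigenspace_dim(A1, 2))  # 2, equal to the multiplicity of 2
print(eigenspace_dim(A1, 6))  # 1, equal to the multiplicity of 6

# Matrix from the example that could not be diagonalized.
A2 = [[1, 1], [0, 1]]
print(eigenspace_dim(A2, 1))  # 1, less than the multiplicity 2
```

For the first matrix every eigenspace dimension matches the multiplicity, while for the second matrix the single eigenspace is too small to supply two independent columns for \(P\).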
Complex Eigenvalues
In some applications, a matrix may have eigenvalues which are complex numbers. For example, this often occurs in differential equations. Such matrices are handled with the same methods as above.
Consider the following example.
Example \(\PageIndex{3}\): A Real Matrix with Complex Eigenvalues
Let \[A=\left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{array} \right ]\nonumber \] Find the eigenvalues and eigenvectors of \(A\).
Solution
We will first find the eigenvalues as usual by solving the following equation.
\[\det \left( \lambda \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ] - \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{array} \right ] \right) =0\nonumber \] This reduces to \(\left( \lambda -1\right) \left( \lambda^{2}-4 \lambda +5\right) =0.\) The solutions are \(\lambda_1 =1,\lambda_2 = 2+i\) and \(\lambda_3 =2-i.\)
There is nothing new about finding the eigenvectors for \(\lambda_1 =1\) so this is left as an exercise.
Consider now the eigenvalue \(\lambda_2 =2+i.\) As usual, we solve the equation \(\left(\lambda I -A \right) X = 0\) as given by \[\left( \left( 2+i\right) \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ] - \left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{array} \right ] \right) X =\left [ \begin{array}{r} 0 \\ 0 \\ 0 \end{array} \right ]\nonumber \] In other words, we need to solve the system represented by the augmented matrix \[\left [ \begin{array}{crr|r} 1+i & 0 & 0 & 0 \\ 0 & i & 1 & 0 \\ 0 & -1 & i & 0 \end{array} \right ]\nonumber \]
We now use our row operations to solve the system. Divide the first row by \(\left( 1+i\right)\) and then take \(-i\) times the second row and add to the third row. This yields \[\left [ \begin{array}{rrr|r} 1 & 0 & 0 & 0 \\ 0 & i & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right ]\nonumber\] Now multiply the second row by \(-i\) to obtain the reduced row-echelon form, given by \[\left [ \begin{array}{rrr|r} 1 & 0 & 0 & 0 \\ 0 & 1 & -i & 0 \\ 0 & 0 & 0 & 0 \end{array} \right ]\nonumber \] Therefore, the eigenvectors are of the form \[t\left [ \begin{array}{r} 0 \\ i \\ 1 \end{array} \right ]\nonumber\] and the basic eigenvector is given by \[X_2 = \left [ \begin{array}{r} 0 \\ i \\ 1 \end{array} \right ]\nonumber\]
As an exercise, verify that the eigenvectors for \(\lambda_3 =2-i\) are of the form \[t\left [ \begin{array}{r} 0 \\ -i \\ 1 \end{array} \right ]\nonumber\] Hence, the basic eigenvector is given by \[X_3 = \left [ \begin{array}{r} 0 \\ -i \\ 1 \end{array} \right ]\nonumber\]
As usual, be sure to check your answers! To verify, we check that \(AX_3 = \left(2 - i \right) X_3\) as follows. \[\left [ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{array} \right ] \left [ \begin{array}{r} 0 \\ -i \\ 1 \end{array} \right ] = \left [ \begin{array}{c} 0 \\ -1-2i \\ 2-i \end{array} \right ] =\left( 2-i\right) \left [ \begin{array}{r} 0 \\ -i \\ 1 \end{array} \right ]\nonumber \]
Therefore, we know that this eigenvector and eigenvalue are correct.
Notice that in Example \(\PageIndex{3}\), two of the eigenvalues were given by \(\lambda_2 = 2 + i\) and \(\lambda_3 = 2-i\). You may recall that these two complex numbers are conjugates. It turns out that whenever a matrix containing real entries has a complex eigenvalue \(\lambda\), it also has an eigenvalue equal to \(\overline{\lambda}\), the conjugate of \(\lambda\).
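The conjugate relationship can be checked numerically for the matrix in Example \(\PageIndex{3}\). The following is a small sketch using Python's built-in complex numbers, where `1j` denotes \(i\). The helpers `matvec` and `is_eigenpair` are hypothetical names introduced here. The sketch takes the eigenpair \(\left(\lambda_2, X_2\right)\) from the example, conjugates both the eigenvalue and every entry of the eigenvector, and confirms that the result is again an eigenpair of \(A\).

```python
# Real matrix from the example.
A = [[1, 0, 0], [0, 2, -1], [0, 1, 2]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(M[i][j] * v[j] for j in range(len(v)))
            for i in range(len(M))]

def is_eigenpair(M, lam, v):
    """Return True when M v equals lam v and v is not the zero vector."""
    return any(x != 0 for x in v) and matvec(M, v) == [lam * x for x in v]

# Eigenpair found in the example for lambda_2 = 2 + i.
lam2 = 2 + 1j
X2 = [0, 1j, 1]

# Conjugate the eigenvalue and each entry of the eigenvector.
lam3 = lam2.conjugate()
X3 = [complex(x).conjugate() for x in X2]

print(is_eigenpair(A, lam2, X2))  # True
print(is_eigenpair(A, lam3, X3))  # True
```

The exact equality test is safe here only because all entries are small integers, so no rounding occurs; for general matrices a tolerance-based comparison would be the more cautious choice.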