5.5: Similarity and Diagonalization
In Section [sec:3_3] we studied diagonalization of a square matrix A, and found important applications (for example to linear dynamical systems). We can now utilize the concepts of subspace, basis, and dimension to clarify the diagonalization process, reveal some new results, and prove some theorems which could not be demonstrated in Section [sec:3_3].
Before proceeding, we introduce a notion that simplifies the discussion of diagonalization, and is used throughout the book.
Similar Matrices
Definition (Similar Matrices). If A and B are n×n matrices, we say that A and B are similar, and write A∼B, if B = P^{-1}AP for some invertible matrix P.
Note that A∼B if and only if B = QAQ^{-1} where Q is invertible (write Q = P^{-1}). The language of similarity is used throughout linear algebra. For example, a matrix A is diagonalizable if and only if it is similar to a diagonal matrix.
If A∼B, then necessarily B∼A. To see why, suppose that B = P^{-1}AP. Then A = PBP^{-1} = Q^{-1}BQ where Q = P^{-1} is invertible. This proves the second of the following properties of similarity (the others are left as an exercise):

1. A∼A for all square matrices A.
2. If A∼B, then B∼A.
3. If A∼B and B∼C, then A∼C.
These properties are often expressed by saying that the similarity relation ∼ is an equivalence relation on the set of n×n matrices. Here is an example showing how these properties are used.
Example. If A is similar to B and either A or B is diagonalizable, show that the other is also diagonalizable.
We have A∼B. Suppose that A is diagonalizable, say A∼D where D is diagonal. Since A∼B gives B∼A by (2) of ([eq:equivalence_properties]), we have both B∼A and A∼D. Hence B∼D by (3) of ([eq:equivalence_properties]), so B is diagonalizable too. An analogous argument works if we assume instead that B is diagonalizable.
Similarity is compatible with inverses, transposes, and powers:
If A∼B then A^{-1}∼B^{-1}, A^{T}∼B^{T}, and A^{k}∼B^{k} for all integers k≥1.
The proofs are routine matrix computations using Theorem [thm:008997]. Thus, for example, if A is diagonalizable, so also are A^{T}, A^{-1} (if it exists), and A^{k} (for each k≥1). Indeed, if A∼D where D is a diagonal matrix, we obtain A^{T}∼D^{T}, A^{-1}∼D^{-1}, and A^{k}∼D^{k}, and each of the matrices D^{T}, D^{-1}, and D^{k} is diagonal.
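For readers who like to experiment, here is a minimal numerical sketch of these compatibility properties, assuming NumPy is available; the random matrices are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))            # almost surely invertible
B = np.linalg.inv(P) @ A @ P               # B ~ A by construction

k = 3
# A^k ~ B^k via the same P, and A^T ~ B^T via Q = (P^{-1})^T
assert np.allclose(np.linalg.matrix_power(B, k),
                   np.linalg.inv(P) @ np.linalg.matrix_power(A, k) @ P)
assert np.allclose(B.T, P.T @ A.T @ np.linalg.inv(P).T)
```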
We pause to introduce a simple matrix function that will be referred to later.
Definition (Trace of a Matrix). The trace tr A of an n×n matrix A is defined to be the sum of the main diagonal elements of A.
In other words:
If A = [a_{ij}], then tr A = a_{11} + a_{22} + ⋯ + a_{nn}.
It is evident that tr(A + B) = tr A + tr B and that tr(cA) = c tr A hold for all n×n matrices A and B and all scalars c. The following fact is more surprising.
Lemma. Let A and B be n×n matrices. Then tr(AB) = tr(BA).
Write A = [a_{ij}] and B = [b_{ij}]. For each i, the (i,i)-entry d_{i} of the matrix AB is given by d_{i} = a_{i1}b_{1i} + a_{i2}b_{2i} + ⋯ + a_{in}b_{ni} = \sum_{j} a_{ij}b_{ji}. Hence

tr(AB) = d_1 + d_2 + ⋯ + d_n = \sum_{i} d_{i} = \sum_{i} \left( \sum_{j} a_{ij}b_{ji} \right)
Similarly, tr(BA) = \sum_{i} \left( \sum_{j} b_{ij}a_{ji} \right). These two double sums are the same: each sums the products a_{ij}b_{ji} over all pairs i, j. This proves Lemma [lem:015978].
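The lemma is easy to test numerically; the following sketch (assuming NumPy) checks it on a pair of random matrices that almost never commute.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# AB != BA in general, yet their traces agree
assert not np.allclose(A @ B, B @ A)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```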
As the name indicates, similar matrices share many properties, some of which are collected in the next theorem for reference.
Theorem. If A and B are similar n×n matrices, then A and B have the same determinant, rank, trace, characteristic polynomial, and eigenvalues.
Let B = P^{-1}AP for some invertible matrix P. Then we have

det B = det(P^{-1}) det A det P = det A, because det(P^{-1}) = 1/det P.

Similarly, rank B = rank(P^{-1}AP) = rank A by Corollary [cor:015519]. Next, Lemma [lem:015978] gives

tr B = tr(P^{-1}AP) = tr[P^{-1}(AP)] = tr[(AP)P^{-1}] = tr A
As to the characteristic polynomial,

c_B(x) = det(xI − B) = det[x(P^{-1}IP) − P^{-1}AP] = det[P^{-1}(xI − A)P] = det(xI − A) = c_A(x)
Finally, this shows that A and B have the same eigenvalues because the eigenvalues of a matrix are the roots of its characteristic polynomial.
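As a numerical illustration of the theorem (a sketch assuming NumPy; the matrices are random and purely illustrative), one can build B = P^{-1}AP and compare the five invariants directly.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))
B = np.linalg.inv(P) @ A @ P

assert np.isclose(np.linalg.det(A), np.linalg.det(B))
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B)
assert np.isclose(np.trace(A), np.trace(B))
# same characteristic polynomial coefficients, hence the same eigenvalues
assert np.allclose(np.poly(A), np.poly(B))
assert np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                   np.sort_complex(np.linalg.eigvals(B)))
```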
Example. Sharing the five properties in Theorem [thm:016008] does not guarantee that two matrices are similar. The matrices A = \left[ \begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array} \right] and I = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array} \right] have the same determinant, rank, trace, characteristic polynomial, and eigenvalues, but they are not similar because P^{-1}IP = I for any invertible matrix P.
Diagonalization Revisited
Recall that a square matrix A is diagonalizable if there exists an invertible matrix P such that P^{-1}AP = D is a diagonal matrix, that is, if A is similar to a diagonal matrix D. Unfortunately, not all matrices are diagonalizable, for example \left[ \begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array} \right] (see Example [exa:009318]). Determining whether A is diagonalizable is closely related to the eigenvalues and eigenvectors of A. Recall that a number λ is called an eigenvalue of A if Ax = λx for some nonzero column x in Rn, and any such nonzero vector x is called an eigenvector of A corresponding to λ (or simply a λ-eigenvector of A). The eigenvalues and eigenvectors of A are closely related to the characteristic polynomial c_A(x) of A, defined by
cA(x)=det(xI−A)
If A is n×n this is a polynomial of degree n, and its relationship to the eigenvalues is given in the following theorem (a repeat of Theorem [thm:009033]).
Theorem. Let A be an n×n matrix.
- The eigenvalues λ of A are the roots of the characteristic polynomial cA(x) of A.
- The λ-eigenvectors x are the nonzero solutions to the homogeneous system (λI−A)x=0
Example. Show that the eigenvalues of a triangular matrix are the main diagonal entries.
Assume that A is triangular. Then the matrix xI − A is also triangular and has diagonal entries (x − a_{11}), (x − a_{22}), …, (x − a_{nn}) where A = [a_{ij}]. Hence Theorem [thm:007885] gives

c_A(x) = (x − a_{11})(x − a_{22}) ⋯ (x − a_{nn})
and the result follows because the eigenvalues are the roots of cA(x).
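A quick numerical check of this fact (a sketch assuming NumPy; the triangular matrix below is just an illustration):

```python
import numpy as np

# an upper triangular matrix: its eigenvalues are its diagonal entries
A = np.array([[2., 7., -1.],
              [0., -3., 4.],
              [0., 0., 5.]])

print(sorted(np.linalg.eigvals(A)))   # -3, 2, 5
print(sorted(np.diag(A)))             # -3, 2, 5
```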
Theorem [thm:009214] asserts (in part) that an n×n matrix A is diagonalizable if and only if it has n eigenvectors x_1, …, x_n such that the matrix P = [x_1 ⋯ x_n] with the x_i as columns is invertible. This is equivalent to requiring that {x_1, …, x_n} is a basis of Rn consisting of eigenvectors of A. Hence we can restate Theorem [thm:009214] as follows:
Theorem. Let A be an n×n matrix.

- A is diagonalizable if and only if Rn has a basis {x_1, x_2, …, x_n} consisting of eigenvectors of A.
- When this is the case, the matrix P = [x_1 x_2 ⋯ x_n] is invertible and P^{-1}AP = diag(λ_1, λ_2, …, λ_n) where, for each i, λ_i is the eigenvalue of A corresponding to x_i.
The next result is a basic tool for determining when a matrix is diagonalizable. It reveals an important connection between eigenvalues and linear independence: Eigenvectors corresponding to distinct eigenvalues are necessarily linearly independent.
Theorem. Let x_1, x_2, …, x_k be eigenvectors corresponding to distinct eigenvalues λ_1, λ_2, …, λ_k of an n×n matrix A. Then {x_1, x_2, …, x_k} is a linearly independent set.
We use induction on k. If k = 1, then {x_1} is independent because x_1 ≠ 0. In general, suppose the theorem is true for some k ≥ 1. Given eigenvectors {x_1, x_2, …, x_{k+1}}, suppose a linear combination vanishes:

t_1 x_1 + t_2 x_2 + ⋯ + t_{k+1} x_{k+1} = 0

We must show that each t_i = 0. Left multiply ([eq:thm_proof_5_5_4_1]) by A and use the fact that Ax_i = λ_i x_i to get

t_1 λ_1 x_1 + t_2 λ_2 x_2 + ⋯ + t_{k+1} λ_{k+1} x_{k+1} = 0

If we multiply ([eq:thm_proof_5_5_4_1]) by λ_1 and subtract the result from ([eq:thm_proof_5_5_4_2]), the first terms cancel and we obtain

t_2(λ_2 − λ_1) x_2 + t_3(λ_3 − λ_1) x_3 + ⋯ + t_{k+1}(λ_{k+1} − λ_1) x_{k+1} = 0

Since x_2, x_3, …, x_{k+1} correspond to distinct eigenvalues λ_2, λ_3, …, λ_{k+1}, the set {x_2, x_3, …, x_{k+1}} is independent by the induction hypothesis. Hence,

t_2(λ_2 − λ_1) = 0, t_3(λ_3 − λ_1) = 0, …, t_{k+1}(λ_{k+1} − λ_1) = 0

and so t_2 = t_3 = ⋯ = t_{k+1} = 0 because the λ_i are distinct. Hence ([eq:thm_proof_5_5_4_1]) becomes t_1 x_1 = 0, which implies that t_1 = 0 because x_1 ≠ 0. This is what we wanted.
Theorem [thm:016090] will be applied several times; we begin by using it to give a useful condition for when a matrix is diagonalizable.
Theorem. If A is an n×n matrix with n distinct eigenvalues, then A is diagonalizable.
Choose one eigenvector for each of the n distinct eigenvalues. Then these eigenvectors are independent by Theorem [thm:016090], and so are a basis of Rn by Theorem [thm:014436]. Now use Theorem [thm:016068].
Example. Show that A = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 1 & 2 & 3 \\ -1 & 1 & 0 \end{array} \right] is diagonalizable.
A routine computation shows that c_A(x) = (x − 1)(x − 3)(x + 1), so A has distinct eigenvalues 1, 3, and −1. Hence Theorem [thm:016145] applies.
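The diagonalization itself can be carried out numerically; the following sketch (assuming NumPy) computes eigenvectors for this A and verifies that P^{-1}AP is diagonal.

```python
import numpy as np

A = np.array([[1., 0., 0.],
              [1., 2., 3.],
              [-1., 1., 0.]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors
eigenvalues, P = np.linalg.eig(A)
print(np.round(eigenvalues, 10))          # 1, 3, -1 in some order

D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag(eigenvalues))
```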
However, a matrix can have multiple eigenvalues as we saw in Section [sec:3_3]. To deal with this situation, we prove an important lemma which formalizes a technique that is basic to diagonalization, and which will be used three times below.
Lemma. Let {x_1, x_2, …, x_k} be a linearly independent set of eigenvectors of an n×n matrix A, extend it to a basis {x_1, x_2, …, x_k, …, x_n} of Rn, and let

P = [x_1 x_2 ⋯ x_n]

be the (invertible) n×n matrix with the x_i as its columns. If λ_1, λ_2, …, λ_k are the (not necessarily distinct) eigenvalues of A corresponding to x_1, x_2, …, x_k respectively, then P^{-1}AP has block form

P^{-1}AP = \left[ \begin{array}{cc} \mbox{diag}(λ_1, λ_2, …, λ_k) & B \\ 0 & A_1 \end{array} \right]

where B has size k×(n−k) and A_1 has size (n−k)×(n−k).
If {e_1, e_2, …, e_n} is the standard basis of Rn, then

[e_1 e_2 ⋯ e_n] = I_n = P^{-1}P = P^{-1}[x_1 x_2 ⋯ x_n] = [P^{-1}x_1 P^{-1}x_2 ⋯ P^{-1}x_n]

Comparing columns, we have P^{-1}x_i = e_i for each 1 ≤ i ≤ n. On the other hand, observe that

P^{-1}AP = P^{-1}A[x_1 x_2 ⋯ x_n] = [(P^{-1}A)x_1 (P^{-1}A)x_2 ⋯ (P^{-1}A)x_n]

Hence, if 1 ≤ i ≤ k, column i of P^{-1}AP is

(P^{-1}A)x_i = P^{-1}(λ_i x_i) = λ_i(P^{-1}x_i) = λ_i e_i

This describes the first k columns of P^{-1}AP, and Lemma [lem:016161] follows.
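Here is a small numerical illustration of the lemma (a sketch assuming NumPy). It uses a single known eigenvector of a 3×3 matrix (the matrix B that appears in an example later in this section), extends it to a basis with standard basis vectors, and checks that the first column of P^{-1}AP is λ e_1.

```python
import numpy as np

A = np.array([[2., 1., 1.],
              [2., 1., -2.],
              [-1., 0., -2.]])
lam = -1.0
x1 = np.array([-1., 2., 1.])               # an eigenvector: A @ x1 == lam * x1
assert np.allclose(A @ x1, lam * x1)

# extend {x1} to a basis of R^3 using the standard basis vectors e2, e3
P = np.column_stack([x1, [0., 1., 0.], [0., 0., 1.]])
M = np.linalg.inv(P) @ A @ P
print(np.round(M, 10))
# first column is lam * e1; the remaining columns form the blocks B and A1 of the lemma
assert np.allclose(M[:, 0], [lam, 0., 0.])
```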
Note that Lemma [lem:016161] (with k=n) shows that an n×n matrix A is diagonalizable if Rn has a basis of eigenvectors of A, as in (1) of Theorem [thm:016068].
Definition (Eigenspace of a Matrix). If λ is an eigenvalue of an n×n matrix A, define the eigenspace of A corresponding to λ by

E_λ(A) = {x in Rn ∣ Ax = λx}
This is a subspace of Rn and the eigenvectors corresponding to λ are just the nonzero vectors in E_λ(A). In fact E_λ(A) is the null space of the matrix (λI − A):

E_λ(A) = {x ∣ (λI − A)x = 0} = null(λI − A)

Hence, by Theorem [thm:015561], the basic solutions of the homogeneous system (λI − A)x = 0 given by the gaussian algorithm form a basis for E_λ(A). In particular,

dim E_λ(A) is the number of basic solutions x of (λI − A)x = 0
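Numerically, this dimension is easy to obtain as n minus the rank of λI − A. The following sketch (assuming NumPy) does this for a 3×3 matrix that reappears in an example later in this section.

```python
import numpy as np

A = np.array([[5., 8., 16.],
              [4., 1., 8.],
              [-4., -4., -11.]])
lam = -3.0
n = A.shape[0]

# dim E_lam(A) = number of basic solutions of (lam*I - A)x = 0
#              = n - rank(lam*I - A)
dim_eigenspace = n - np.linalg.matrix_rank(lam * np.eye(n) - A)
print(dim_eigenspace)                      # 2 for this matrix and eigenvalue
```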
Now recall (Definition [def:009289]) that the multiplicity1 of an eigenvalue λ of A is the number of times λ occurs as a root of the characteristic polynomial cA(x) of A. In other words, the multiplicity of λ is the largest integer m≥1 such that
c_A(x) = (x − λ)^m g(x)
for some polynomial g(x). Because of ([eq:number_of_basic_solutions]), the assertion (without proof) in Theorem [thm:009296] can be stated as follows: A square matrix is diagonalizable if and only if the multiplicity of each eigenvalue λ equals dim[Eλ(A)]. We are going to prove this, and the proof requires the following result which is valid for any square matrix, diagonalizable or not.
Lemma. Let λ be an eigenvalue of multiplicity m of a square matrix A. Then dim[E_λ(A)] ≤ m.
Write dim[E_λ(A)] = d. It suffices to show that c_A(x) = (x − λ)^d g(x) for some polynomial g(x), because m is the highest power of (x − λ) that divides c_A(x). To this end, let {x_1, x_2, …, x_d} be a basis of E_λ(A). Then Lemma [lem:016161] shows that an invertible n×n matrix P exists such that

P^{-1}AP = \left[ \begin{array}{cc} λI_d & B \\ 0 & A_1 \end{array} \right]

in block form, where I_d denotes the d×d identity matrix. Now write A′ = P^{-1}AP and observe that c_{A′}(x) = c_A(x) by Theorem [thm:016008]. But Theorem [thm:007890] gives

c_A(x) = c_{A′}(x) = det(xI_n − A′) = det \left[ \begin{array}{cc} (x − λ)I_d & −B \\ 0 & xI_{n−d} − A_1 \end{array} \right] = det[(x − λ)I_d] det(xI_{n−d} − A_1) = (x − λ)^d g(x)

where g(x) = c_{A_1}(x). This is what we wanted.
It is natural to ask when equality holds in Lemma [lem:016219] for each eigenvalue λ. It turns out that this characterizes the diagonalizable n×n matrices A for which c_A(x) factors completely over \mathbb{R}. By this we mean that c_A(x) = (x − λ_1)(x − λ_2) ⋯ (x − λ_n), where the λ_i are real numbers (not necessarily distinct); in other words, every eigenvalue of A is real. This need not happen (consider A = \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right]), and we investigate the general case below.
Theorem. The following are equivalent for a square matrix A for which c_A(x) factors completely.
- A is diagonalizable.
- dim[Eλ(A)] equals the multiplicity of λ for every eigenvalue λ of the matrix A.
Let A be n×n and let λ_1, λ_2, …, λ_k be the distinct eigenvalues of A. For each i, let m_i denote the multiplicity of λ_i and write d_i = dim[E_{λ_i}(A)]. Then

c_A(x) = (x − λ_1)^{m_1}(x − λ_2)^{m_2} ⋯ (x − λ_k)^{m_k}

so m_1 + ⋯ + m_k = n because c_A(x) has degree n. Moreover, d_i ≤ m_i for each i by Lemma [lem:016219].
(1) ⇒ (2). By (1), Rn has a basis of n eigenvectors of A, so let t_{i} of them lie in E_{\lambda_i}(A) for each i. Since the subspace spanned by these t_{i} eigenvectors has dimension t_{i}, we have t_{i} \leq d_{i} for each i by Theorem [thm:014254]. Hence
n = t_1 + \dots + t_k \leq d_1 + \dots + d_k \leq m_1 + \dots + m_k = n \nonumber
It follows that d_1 + \dots + d_k = m_1 + \dots + m_k so, since d_{i} \leq m_{i} for each i, we must have d_{i} = m_{i}. This is (2).
(2) \Rightarrow (1). Let B_{i} denote a basis of E_{\lambda_i}(A) for each i, and let B = B_{1} \cup \dots \cup B_{k}. Since each B_{i} contains m_{i} vectors by (2), and since the B_{i} are pairwise disjoint (the \lambda_{i} are distinct), it follows that B contains n vectors. So it suffices to show that B is linearly independent (then B is a basis of \mathbb{R}^n). Suppose a linear combination of the vectors in B vanishes, and let \mathbf{y}_{i} denote the sum of all terms that come from B_{i}. Then \mathbf{y}_{i} lies in E_{\lambda_i}(A), so the nonzero \mathbf{y}_{i} are independent by Theorem [thm:016090] (as the \lambda_{i} are distinct). Since the sum of the \mathbf{y}_{i} is zero, it follows that \mathbf{y}_{i} = \mathbf{0} for each i. Hence all coefficients of terms in \mathbf{y}_{i} are zero (because B_{i} is independent). Since this holds for each i, it shows that B is independent.
Example. If A = \left[ \begin{array}{rrr} 5 & 8 & 16 \\ 4 & 1 & 8 \\ -4 & -4 & -11 \end{array} \right] and B = \left[ \begin{array}{rrr} 2 & 1 & 1 \\ 2 & 1 & -2 \\ -1 & 0 & -2 \end{array} \right], show that A is diagonalizable but B is not.
We have c_{A}(x) = (x + 3)^{2}(x - 1) so the eigenvalues are \lambda_{1} = -3 and \lambda_{2} = 1. The corresponding eigenspaces are E_{\lambda_1}(A) = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}\} and E_{\lambda_2}(A) = span \;\{\mathbf{x}_{3}\} where
\mathbf{x}_1 = \left[ \begin{array}{r} -1\\ 1\\ 0 \end{array} \right] , \mathbf{x}_2 = \left[ \begin{array}{r} -2\\ 0\\ 1 \end{array} \right] , \mathbf{x}_3 = \left[ \begin{array}{r} 2\\ 1\\ -1 \end{array} \right] \nonumber
as the reader can verify. Since \{\mathbf{x}_{1}, \mathbf{x}_{2}\} is independent, we have dim \;(E_{\lambda_1}(A)) = 2 which is the multiplicity of \lambda_{1}. Similarly, dim \;(E_{\lambda_2}(A)) = 1 equals the multiplicity of \lambda_{2}. Hence A is diagonalizable by Theorem [thm:016250], and a diagonalizing matrix is P = \left[ \begin{array}{ccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \mathbf{x}_{3} \end{array} \right].
Turning to B, c_{B}(x) = (x + 1)^{2}(x - 3) so the eigenvalues are \lambda_{1} = -1 and \lambda_{2} = 3. The corresponding eigenspaces are E_{\lambda_1}(B) =span \;\{\mathbf{y}_{1}\} and E_{\lambda_2}(B) = span \;\{\mathbf{y}_{2}\} where
\mathbf{y}_1 = \left[ \begin{array}{r} -1\\ 2\\ 1 \end{array} \right] , \mathbf{y}_2 = \left[ \begin{array}{r} 5\\ 6\\ -1 \end{array} \right] \nonumber
Here dim \;(E_{\lambda_1}(B)) = 1 is smaller than the multiplicity of \lambda_{1}, so the matrix B is not diagonalizable, again by Theorem [thm:016250]. The fact that dim \;(E_{\lambda_1}(B)) = 1 means that there is no possibility of finding three linearly independent eigenvectors.
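These conclusions can be double-checked numerically by comparing each eigenvalue's multiplicity with the dimension of its eigenspace; here is a sketch (assuming NumPy) for the matrices A and B of this example.

```python
import numpy as np

def eigenspace_dim(M, lam):
    """dim E_lam(M) = n - rank(lam*I - M)."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(lam * np.eye(n) - M)

A = np.array([[5., 8., 16.], [4., 1., 8.], [-4., -4., -11.]])
B = np.array([[2., 1., 1.], [2., 1., -2.], [-1., 0., -2.]])

# A: eigenvalue -3 has multiplicity 2 and a 2-dimensional eigenspace -> diagonalizable
print(eigenspace_dim(A, -3.0), eigenspace_dim(A, 1.0))   # 2 1
# B: eigenvalue -1 has multiplicity 2 but only a 1-dimensional eigenspace -> not diagonalizable
print(eigenspace_dim(B, -1.0), eigenspace_dim(B, 3.0))   # 1 1
```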
Complex Eigenvalues
All the matrices we have considered have had real eigenvalues. But this need not be the case: The matrix A = \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right] has characteristic polynomial c_{A}(x) = x^{2} + 1 which has no real roots. Nonetheless, this matrix is diagonalizable; the only difference is that we must use a larger set of scalars, the complex numbers. The basic properties of these numbers are outlined in Appendix [chap:appacomplexnumbers].
Indeed, nearly everything we have done for real matrices can be done for complex matrices. The methods are the same; the only difference is that the arithmetic is carried out with complex numbers rather than real ones. For example, the gaussian algorithm works in exactly the same way to solve systems of linear equations with complex coefficients, matrix multiplication is defined the same way, and the matrix inversion algorithm works in the same way.
But the complex numbers are better than the real numbers in one respect: While there are polynomials like x^{2} + 1 with real coefficients that have no real root, this problem does not arise with the complex numbers: Every nonconstant polynomial with complex coefficients has a complex root, and hence factors completely as a product of linear factors. This fact is known as the fundamental theorem of algebra.2
Example. Diagonalize the matrix A = \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right].
The characteristic polynomial of A is
c_A(x) = \det (xI - A) = x^2 + 1 = (x - i)(x + i) \nonumber
where i^{2} = -1. Hence the eigenvalues are \lambda_{1} = i and \lambda_{2} = -i, with corresponding eigenvectors \mathbf{x}_1 = \left[ \begin{array}{r} 1 \\ -i \end{array} \right] and \mathbf{x}_2 = \left[ \begin{array}{r} 1 \\ i \end{array} \right]. Hence A is diagonalizable by the complex version of Theorem [thm:016145], and the complex version of Theorem [thm:016068] shows that P = \left[ \begin{array}{cc} \mathbf{x}_1\ & \mathbf{x}_2 \end{array} \right]= \left[ \begin{array}{rr} 1 & 1 \\ -i & i \end{array} \right] is invertible and P^{-1}AP = \left[ \begin{array}{cc} \lambda_1 & 0 \\ 0 & \lambda_2 \end{array} \right] = \left[ \begin{array}{cc} i & 0 \\ 0 & -i \end{array} \right]. Of course, this can be checked directly.
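Numerical software handles this complex diagonalization transparently; here is a minimal sketch assuming NumPy.

```python
import numpy as np

A = np.array([[0., -1.],
              [1., 0.]])

eigenvalues, P = np.linalg.eig(A)          # NumPy switches to complex arithmetic automatically
print(eigenvalues)                         # i and -i (in some order)

D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag(eigenvalues))
```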
We shall return to complex linear algebra in Section [sec:8_6].
Symmetric Matrices3
On the other hand, many of the applications of linear algebra involve a real matrix A and, while A will have complex eigenvalues by the fundamental theorem of algebra, it is always of interest to know when the eigenvalues are, in fact, real. While this can happen in a variety of ways, it turns out to hold whenever A is symmetric. This important theorem will be used extensively later. Surprisingly, the theory of complex eigenvalues can be used to prove this useful result about real eigenvalues.
Let \overline{z} denote the conjugate of a complex number z. If A is a complex matrix, the conjugate matrix \overline{A} is defined to be the matrix obtained from A by conjugating every entry. Thus, if A = \left[ z_{ij}\right], then \overline{A} = \left[ \overline{z}_{ij} \right]. For example,
\mbox{If } A = \left[ \begin{array}{cc} -i + 2 & 5 \\ i & 3 + 4i \end{array} \right] \mbox{ then } \overline{A} = \left[ \begin{array}{cc} i + 2 & 5 \\ -i & 3 - 4i \end{array} \right] \nonumber
Recall that \overline{z + w} = \overline{z} + \overline{w} and \overline{zw} = \overline{z}\ \overline{w} hold for all complex numbers z and w. It follows that if A and B are two complex matrices, then
\overline{A + B} = \overline{A} + \overline{B},\quad \overline{AB} = \overline{A}\ \overline{B}\quad \mbox{ and } \overline{\lambda A} = \overline{\lambda}\ \overline{A} \nonumber
hold for all complex scalars \lambda. These facts are used in the proof of the following theorem.
Theorem. Let A be a symmetric real matrix. If \lambda is any complex eigenvalue of A, then \lambda is real.
Observe that \overline{A} = A because A is real. If \lambda is an eigenvalue of A, we show that \lambda is real by showing that \overline{\lambda} = \lambda. Let \mathbf{x} be a (possibly complex) eigenvector corresponding to \lambda, so that \mathbf{x} \neq \mathbf{0} and A\mathbf{x} = \lambda\mathbf{x}. Define c = \mathbf{x}^T\overline{\mathbf{x}}.
If we write \mathbf{x} = \left[ \begin{array}{c} z_1 \\ z_2 \\ \vdots \\ z_n \end{array}\right] where the z_{i} are complex numbers, we have
c = \mathbf{x}^T\overline{\mathbf{x}} = z_1\overline{z_1} + z_2\overline{z_2} + \dotsb + z_n\overline{z_n} = |z_1|^2 + |z_2|^2 + \dotsb + |z_n|^2 \nonumber
Thus c is a real number, and c > 0 because at least one of the z_{i} \neq 0 (as \mathbf{x} \neq \mathbf{0}). We show that \overline{\lambda} = \lambda by verifying that \lambda c = \overline{\lambda}c. We have
\lambda c = \lambda(\mathbf{x}^T\overline{\mathbf{x}}) = (\lambda \mathbf{x})^T\overline{\mathbf{x}} = (A\mathbf{x})^T\overline{\mathbf{x}} = \mathbf{x}^TA^T\overline{\mathbf{x}} \nonumber
At this point we use the hypothesis that A is symmetric and real. This means A^T = A = \overline{A} so we continue the calculation:
\begin{aligned} \lambda c = \mathbf{x}^TA^T\overline{\mathbf{x}} = \mathbf{x}^T(\overline{A}\ \overline{\mathbf{x}}) = \mathbf{x}^T(\overline{A\mathbf{x}}) &= \mathbf{x}^T(\overline{\lambda \mathbf{x}}) \\ &= \mathbf{x}^T(\overline{\lambda}\ \overline{\mathbf{x}}) \\ &= \overline{\lambda}\mathbf{x}^T\overline{\mathbf{x}} \\ &= \overline{\lambda}c\end{aligned} \nonumber
as required.
The technique in the proof of Theorem [thm:016397] will be used again when we return to complex linear algebra in Section [sec:8_6].
Example. Verify Theorem [thm:016397] for every real, symmetric 2 \times 2 matrix A.
If A = \left[ \begin{array}{cc} a & b \\ b & c \end{array} \right] we have c_{A}(x) = x^{2} - (a + c)x + (ac - b^{2}), so the eigenvalues are given by \lambda = \frac{1}{2} [(a + c) \pm \sqrt{(a + c)^2 - 4(ac - b^2)}]. But here
(a + c)^2 - 4(ac -b^2) = (a - c)^2 + 4b^2 \geq 0 \nonumber
for any choice of a, b, and c. Hence, the eigenvalues are real numbers.
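For larger symmetric matrices the same conclusion can be observed numerically; here is a quick sketch assuming NumPy, with a random symmetric matrix as the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                          # a random real symmetric matrix

w = np.linalg.eigvals(A)
print(np.max(np.abs(np.imag(w))))          # essentially zero: all eigenvalues are real
# np.linalg.eigvalsh exploits symmetry and returns real eigenvalues directly
print(np.linalg.eigvalsh(A))
```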
1. This is often called the algebraic multiplicity of \lambda.
2. This was a famous open problem in 1799 when Gauss solved it at the age of 22 in his Ph.D. dissertation.
3. This discussion uses complex conjugation and absolute value. These topics are discussed in Appendix [chap:appacomplexnumbers].