
5.5: Similarity and Diagonalization


    In Section [sec:3_3] we studied diagonalization of a square matrix \(A\), and found important applications (for example to linear dynamical systems). We can now utilize the concepts of subspace, basis, and dimension to clarify the diagonalization process, reveal some new results, and prove some theorems which could not be demonstrated in Section [sec:3_3].

    Before proceeding, we introduce a notion that simplifies the discussion of diagonalization, and is used throughout the book.

    Similar Matrices

Definition (Similar Matrices). If \(A\) and \(B\) are \(n \times n\) matrices, we say that \(A\) and \(B\) are similar, and write \(A \sim B\), if \(B = P^{-1}AP\) for some invertible matrix \(P\).

    Note that \(A \sim B\) if and only if \(B = QAQ^{-1}\) where \(Q\) is invertible (write \(P^{-1} = Q\)). The language of similarity is used throughout linear algebra. For example, a matrix \(A\) is diagonalizable if and only if it is similar to a diagonal matrix.

    If \(A \sim B\), then necessarily \(B \sim A\). To see why, suppose that \(B = P^{-1}AP\). Then \(A = PBP^{-1} = Q^{-1}BQ\) where \(Q = P^{-1}\) is invertible. This proves the second of the following properties of similarity (the others are left as an exercise):

\[\begin{aligned} \label{eq:equivalence_properties} &1.\ A \sim A \mbox{ for all square matrices } A. \\ &2.\ \mbox{If } A \sim B\mbox{, then } B \sim A. \\ &3.\ \mbox{If } A \sim B \mbox{ and } B \sim C\mbox{, then } A \sim C. \end{aligned} \nonumber \]

    These properties are often expressed by saying that the similarity relation \(\sim\) is an equivalence relation on the set of \(n \times n\) matrices. Here is an example showing how these properties are used.

Example. If \(A\) is similar to \(B\) and either \(A\) or \(B\) is diagonalizable, show that the other is also diagonalizable.

We have \(A \sim B\). Suppose that \(A\) is diagonalizable, say \(A \sim D\) where \(D\) is diagonal. Since \(B \sim A\) by (2) of ([eq:equivalence_properties]) and \(A \sim D\), property (3) of ([eq:equivalence_properties]) gives \(B \sim D\), so \(B\) is diagonalizable too. An analogous argument works if we assume instead that \(B\) is diagonalizable.

    Similarity is compatible with inverses, transposes, and powers:

    \[\mbox{If } A \sim B\mbox{ then } \quad A^{-1} \sim B^{-1}, \quad A^T \sim B^T,\quad \mbox{ and } \quad A^k \sim B^k\mbox{ for all integers } k \geq 1. \nonumber \]

    The proofs are routine matrix computations using Theorem [thm:008997]. Thus, for example, if \(A\) is diagonalizable, so also are \(A^{T}\), \(A^{-1}\) (if it exists), and \(A^{k}\) (for each \(k \geq 1\)). Indeed, if \(A \sim D\) where \(D\) is a diagonal matrix, we obtain \(A^{T} \sim D^{T}\), \(A^{-1} \sim D^{-1}\), and \(A^{k} \sim D^{k}\), and each of the matrices \(D^{T}\), \(D^{-1}\), and \(D^{k}\) is diagonal.
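For readers who like to check such identities numerically, here is a minimal sketch in Python/NumPy (the matrices \(P\) and \(D\) below are arbitrary illustrative choices, not taken from the text): starting from \(A = PDP^{-1}\), it confirms that \(A^{k}\), \(A^{T}\), and \(A^{-1}\) are similar to \(D^{k}\), \(D\), and \(D^{-1}\) respectively.

```python
import numpy as np

# Hypothetical diagonalizable matrix built as A = P D P^{-1}
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])                      # invertible (det = 1)
D = np.diag([2.0, -3.0])
A = P @ D @ np.linalg.inv(P)

# A^k is similar to D^k via the same P
k = 5
print(np.allclose(np.linalg.inv(P) @ np.linalg.matrix_power(A, k) @ P,
                  np.linalg.matrix_power(D, k)))         # True

# A^T is similar to D (= D^T), conjugating by Q = (P^T)^{-1}
Q = np.linalg.inv(P.T)
print(np.allclose(np.linalg.inv(Q) @ A.T @ Q, D))        # True

# A^{-1} is similar to D^{-1} via P
print(np.allclose(np.linalg.inv(P) @ np.linalg.inv(A) @ P,
                  np.linalg.inv(D)))                     # True
```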

    We pause to introduce a simple matrix function that will be referred to later.

Definition (Trace of a Matrix). The trace \(\func{tr} A\) of an \(n \times n\) matrix \(A\) is defined to be the sum of the main diagonal elements of \(A\).

    In other words:

\[\mbox{If } A = \left[ a_{ij}\right],\mbox{ then } \func{tr} A = a_{11} + a_{22} + \dots + a_{nn}. \nonumber \]

    It is evident that \(\func{tr}(A + B) = \func{tr} A + \func{tr} B\) and that \(\func{tr}(cA) = c \func{tr} A\) holds for all \(n \times n\) matrices \(A\) and \(B\) and all scalars \(c\). The following fact is more surprising.

Lemma [lem:015978]. Let \(A\) and \(B\) be \(n \times n\) matrices. Then \(\func{tr}(AB) = \func{tr}(BA)\).

    Write \(A = \left[ a_{ij} \right]\) and \(B = \left[ b_{ij} \right]\). For each \(i\), the \((i, i)\)-entry \(d_{i}\) of the matrix \(AB\) is given as follows: \(d_{i} = a_{i1}b_{1i} + a_{i2}b_{2i} + \dots + a_{in}b_{ni} = \sum_{j}a_{ij}b_{ji}\). Hence

    \[\func{tr}(AB) = d_1 + d_2 + \dots + d_n = \sum_{i}d_i = \sum_{i}\left(\sum_{j}a_{ij}b_{ji}\right) \nonumber \]

    Similarly we have \(\func{tr}(BA) = \sum_{i}(\sum_{j}b_{ij}a_{ji})\). Since these two double sums are the same, Lemma [lem:015978] is proved.
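As a quick numerical sanity check of Lemma [lem:015978] (a sketch only; the random matrices carry no special meaning), the two traces agree even though \(AB \neq BA\) in general:

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.allclose(np.trace(A @ B), np.trace(B @ A)))   # True
print(np.allclose(A @ B, B @ A))                       # False (typically)
```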

    As the name indicates, similar matrices share many properties, some of which are collected in the next theorem for reference.

Theorem [thm:016008]. If \(A\) and \(B\) are similar \(n \times n\) matrices, then \(A\) and \(B\) have the same determinant, rank, trace, characteristic polynomial, and eigenvalues.

    Let \(B = P^{-1}AP\) for some invertible matrix \(P\). Then we have

    \[\det B = \det (P^{-1}) \det A \det P = \det A\mbox{ because }\det (P^{-1}) = 1/ \det P \nonumber \]

Similarly, \(\func{rank} B = \func{rank}(P^{-1}AP) = \func{rank} A\) by Corollary [cor:015519]. Next Lemma [lem:015978] gives

\[\func{tr} (P^{-1}AP) = \func{tr}\left[ P^{-1}(AP)\right] = \func{tr}\left[ (AP)P^{-1}\right] = \func{tr} A \nonumber \]

    As to the characteristic polynomial,

    \[\begin{aligned} c_B(x) = \det (xI - B) &= \det \{x(P^{-1}IP) - P^{-1}AP\} \\ &= \det \{ P^{-1}(xI - A)P\} \\ &= \det (xI - A) \\ &= c_A(x)\end{aligned} \nonumber \]

    Finally, this shows that \(A\) and \(B\) have the same eigenvalues because the eigenvalues of a matrix are the roots of its characteristic polynomial.

Example. Sharing the five properties in Theorem [thm:016008] does not guarantee that two matrices are similar. The matrices \(A = \left[ \begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array} \right]\) and \(I = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array} \right]\) have the same determinant, rank, trace, characteristic polynomial, and eigenvalues, but they are not similar because \(P^{-1}IP = I \neq A\) for every invertible matrix \(P\).
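A numerical illustration of Theorem [thm:016008] (a sketch; the random matrices are assumptions made only for illustration): conjugating \(A\) by an invertible \(P\) leaves the determinant, rank, trace, and characteristic polynomial unchanged. Here np.poly returns the coefficients of the characteristic polynomial of a square matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))               # almost surely invertible
B = np.linalg.inv(P) @ A @ P                  # B is similar to A

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))         # True
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True
print(np.isclose(np.trace(A), np.trace(B)))                   # True
print(np.allclose(np.poly(A), np.poly(B)))                    # same characteristic polynomial
```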

    Diagonalization Revisited

    Recall that a square matrix \(A\) is diagonalizable if there exists an invertible matrix \(P\) such that \(P^{-1}AP = D\) is a diagonal matrix, that is if \(A\) is similar to a diagonal matrix \(D\). Unfortunately, not all matrices are diagonalizable, for example \(\left[ \begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array} \right]\) (see Example [exa:009318]). Determining whether \(A\) is diagonalizable is closely related to the eigenvalues and eigenvectors of \(A\). Recall that a number \(\lambda\) is called an eigenvalue of \(A\) if \(A\mathbf{x} = \lambda\mathbf{x}\) for some nonzero column \(\mathbf{x}\) in \(\mathbb{R}^n\), and any such nonzero vector \(\mathbf{x}\) is called an eigenvector of \(A\) corresponding to \(\lambda\) (or simply a \(\lambda\)-eigenvector of \(A\)). The eigenvalues and eigenvectors of \(A\) are closely related to the characteristic polynomial \(c_{A}(x)\) of \(A\), defined by

    \[c_A(x) = \det (xI - A) \nonumber \]

    If \(A\) is \(n \times n\) this is a polynomial of degree \(n\), and its relationship to the eigenvalues is given in the following theorem (a repeat of Theorem [thm:009033]).

Theorem. Let \(A\) be an \(n \times n\) matrix.

    1. The eigenvalues \(\lambda\) of \(A\) are the roots of the characteristic polynomial \(c_{A}(x)\) of \(A\).
    2. The \(\lambda\)-eigenvectors \(\mathbf{x}\) are the nonzero solutions to the homogeneous system \[(\lambda I - A)\mathbf{x} = \mathbf{0} \nonumber \]
    Exercise \(\PageIndex{1}\)

    Show that the eigenvalues of a triangular matrix are the main diagonal entries.

    Assume that \(A\) is triangular. Then the matrix \(xI - A\) is also triangular and has diagonal entries \((x - a_{11}), (x - a_{22}), \dots, (x - a_{nn})\) where \(A = \left[ a_{ij}\right]\). Hence Theorem [thm:007885] gives

    \[c_A(x) = (x - a_{11})(x - a_{22})\cdots(x - a_{nn}) \nonumber \]

    and the result follows because the eigenvalues are the roots of \(c_{A}(x)\).
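A small numerical check of this exercise (the upper triangular matrix below is an arbitrary choice):

```python
import numpy as np

A = np.array([[4.0, 7.0, -2.0],
              [0.0, -1.0, 3.0],
              [0.0, 0.0, 5.0]])                # upper triangular

# The eigenvalues are exactly the main diagonal entries
print(np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.diag(A))))   # True
```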

    Theorem [thm:009214] asserts (in part) that an \(n \times n\) matrix \(A\) is diagonalizable if and only if it has \(n\) eigenvectors \(\mathbf{x}_{1}, \dots, \mathbf{x}_{n}\) such that the matrix \(P = \left[ \begin{array}{ccc} \mathbf{x}_{1} & \cdots & \mathbf{x}_{n} \end{array} \right]\) with the \(\mathbf{x}_{i}\) as columns is invertible. This is equivalent to requiring that \(\{\mathbf{x}_{1}, \dots, \mathbf{x}_{n}\}\) is a basis of \(\mathbb{R}^n\) consisting of eigenvectors of \(A\). Hence we can restate Theorem [thm:009214] as follows:

Theorem [thm:016068]. Let \(A\) be an \(n \times n\) matrix.

    1. \(A\) is diagonalizable if and only if \(\mathbb{R}^n\) has a basis \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\}\) consisting of eigenvectors of \(A\).
    2. When this is the case, the matrix \(P = \left[ \begin{array}{cccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \cdots & \mathbf{x}_{n} \end{array} \right]\) is invertible and \(P^{-1}AP = \func{diag}(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n})\) where, for each \(i\), \(\lambda_{i}\) is the eigenvalue of \(A\) corresponding to \(\mathbf{x}_{i}\).

    The next result is a basic tool for determining when a matrix is diagonalizable. It reveals an important connection between eigenvalues and linear independence: Eigenvectors corresponding to distinct eigenvalues are necessarily linearly independent.

Theorem [thm:016090]. Let \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k}\) be eigenvectors corresponding to distinct eigenvalues \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{k}\) of an \(n \times n\) matrix \(A\). Then \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k}\}\) is a linearly independent set.

    We use induction on \(k\). If \(k = 1\), then \(\{\mathbf{x}_{1}\}\) is independent because \(\mathbf{x}_{1} \neq \mathbf{0}\). In general, suppose the theorem is true for some \(k \geq 1\). Given eigenvectors \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k+1}\}\), suppose a linear combination vanishes:

    \[\label{eq:thm_proof_5_5_4_1} t_1\mathbf{x}_1 + t_2\mathbf{x}_2 + \dots + t_{k+1}\mathbf{x}_{k+1} = \mathbf{0} \]

    We must show that each \(t_{i} = 0\). Left multiply ([eq:thm_proof_5_5_4_1]) by \(A\) and use the fact that \(A\mathbf{x}_{i} = \lambda_{i}\mathbf{x}_{i}\) to get

    \[\label{eq:thm_proof_5_5_4_2} t_1\lambda_1\mathbf{x}_1 + t_2\lambda_2\mathbf{x}_2 + \dots + t_{k+1}\lambda_{k+1}\mathbf{x}_{k+1} = \mathbf{0} \]

    If we multiply ([eq:thm_proof_5_5_4_1]) by \(\lambda_{1}\) and subtract the result from ([eq:thm_proof_5_5_4_2]), the first terms cancel and we obtain

    \[t_2(\lambda_2 - \lambda_1)\mathbf{x}_2 + t_3(\lambda_3 - \lambda_1)\mathbf{x}_3 + \dots + t_{k+1}(\lambda_{k+1} - \lambda_1)\mathbf{x}_{k+1} = \mathbf{0} \nonumber \]

    Since \(\mathbf{x}_{2}, \mathbf{x}_{3}, \dots, \mathbf{x}_{k+1}\) correspond to distinct eigenvalues \(\lambda_{2}, \lambda_{3}, \dots, \lambda_{k+1}\), the set \(\{\mathbf{x}_{2}, \mathbf{x}_{3}, \dots, \mathbf{x}_{k+1}\}\) is independent by the induction hypothesis. Hence,

    \[t_2(\lambda_2 - \lambda_1) = 0, \quad t_3(\lambda_3 - \lambda_1) = 0, \quad \dots, \quad t_{k+1}(\lambda_{k+1} - \lambda_1) = 0 \nonumber \]

    and so \(t_{2} = t_{3} = \dots = t_{k+1} = 0\) because the \(\lambda_{i}\) are distinct. Hence ([eq:thm_proof_5_5_4_1]) becomes \(t_{1}\mathbf{x}_{1} = \mathbf{0}\), which implies that \(t_{1} = 0\) because \(\mathbf{x}_{1} \neq \mathbf{0}\). This is what we wanted.

    Theorem [thm:016090] will be applied several times; we begin by using it to give a useful condition for when a matrix is diagonalizable.

Theorem [thm:016145]. If \(A\) is an \(n \times n\) matrix with \(n\) distinct eigenvalues, then \(A\) is diagonalizable.

    Choose one eigenvector for each of the \(n\) distinct eigenvalues. Then these eigenvectors are independent by Theorem [thm:016090], and so are a basis of \(\mathbb{R}^n\) by Theorem [thm:014436]. Now use Theorem [thm:016068].

Example. Show that \(A = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 1 & 2 & 3 \\ -1 & 1 & 0 \\ \end{array} \right]\) is diagonalizable.

A routine computation shows that \(c_{A}(x) = (x - 1)(x - 3)(x + 1)\), so \(A\) has distinct eigenvalues \(1\), \(3\), and \(-1\). Hence Theorem [thm:016145] applies.
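The computation in this example can be reproduced symbolically. The sketch below uses Python/SymPy (charpoly and diagonalize are SymPy methods; the printed ordering of factors and eigenvalues may differ):

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[ 1, 0, 0],
               [ 1, 2, 3],
               [-1, 1, 0]])

# Characteristic polynomial factors into three distinct linear factors
print(sp.factor(A.charpoly(x).as_expr()))     # (x - 3)*(x - 1)*(x + 1), up to ordering

# Distinct eigenvalues, so A is diagonalizable: P^{-1} A P = D
P, D = A.diagonalize()
print(D)                                       # diagonal matrix with entries -1, 1, 3 (in some order)
print(sp.simplify(P.inv() * A * P - D))        # zero matrix
```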

    However, a matrix can have multiple eigenvalues as we saw in Section [sec:3_3]. To deal with this situation, we prove an important lemma which formalizes a technique that is basic to diagonalization, and which will be used three times below.

Lemma [lem:016161]. Let \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k}\}\) be a linearly independent set of eigenvectors of an \(n \times n\) matrix \(A\), extend it to a basis \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k}, \dots, \mathbf{x}_{n}\}\) of \(\mathbb{R}^n\), and let

    \[P = \left[ \begin{array}{cccc} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{array} \right] \nonumber \]

    be the (invertible) \(n \times n\) matrix with the \(\mathbf{x}_{i}\) as its columns. If \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{k}\) are the (not necessarily distinct) eigenvalues of \(A\) corresponding to \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k}\) respectively, then \(P^{-1}AP\) has block form

    \[P^{-1}AP = \left[ \begin{array}{cr} \func{diag}(\lambda_1, \lambda_2, \dots, \lambda_k) & B \\ 0 & A_1 \end{array} \right] \nonumber \]

    where \(B\) has size \(k \times (n - k)\) and \(A_1\) has size \((n - k) \times (n - k)\).

    If \(\{\mathbf{e}_{1}, \mathbf{e}_{2}, \dots, \mathbf{e}_{n}\}\) is the standard basis of \(\mathbb{R}^n\), then

    \[\begin{aligned} \left[ \begin{array}{cccc} \mathbf{e}_1 & \mathbf{e}_2 & \dots & \mathbf{e}_n \end{array} \right] = I_n = P^{-1}P &= P^{-1} \left[ \begin{array}{cccc} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{array} \right] \\ &= \left[ \begin{array}{cccc} P^{-1}\mathbf{x}_1 & P^{-1}\mathbf{x}_2 & \cdots & P^{-1}\mathbf{x}_n \end{array} \right]\end{aligned} \nonumber \]

    Comparing columns, we have \(P^{-1}\mathbf{x}_{i} = \mathbf{e}_{i}\) for each \(1 \leq i \leq n\). On the other hand, observe that

    \[P^{-1}AP = P^{-1}A \left[ \begin{array}{cccc} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{array} \right] = \left[ \begin{array}{cccc} (P^{-1}A)\mathbf{x}_1 & (P^{-1}A)\mathbf{x}_2 & \cdots & (P^{-1}A)\mathbf{x}_n \end{array} \right] \nonumber \]

    Hence, if \(1 \leq i \leq k\), column \(i\) of \(P^{-1}AP\) is

    \[(P^{-1}A)\mathbf{x}_i = P^{-1}(\lambda_i\mathbf{x}_i) = \lambda_i(P^{-1}\mathbf{x}_i) = \lambda_i\mathbf{e}_i \nonumber \]

    This describes the first \(k\) columns of \(P^{-1}AP\), and Lemma [lem:016161] follows.

    Note that Lemma [lem:016161] (with \(k = n\)) shows that an \(n \times n\) matrix \(A\) is diagonalizable if \(\mathbb{R}^n\) has a basis of eigenvectors of \(A\), as in (1) of Theorem [thm:016068].
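The following SymPy sketch illustrates Lemma [lem:016161] with \(k = 1\); the matrix and the particular extension of \(\{\mathbf{x}_{1}\}\) to a basis are arbitrary choices made for illustration. Conjugating by \(P\) produces the promised zeros below the \(1 \times 1\) block \(\left[ \lambda_1 \right]\).

```python
import sympy as sp

# Hypothetical matrix with known eigenvector x1 = (1, 1, 1) for lambda_1 = 5
A = sp.Matrix([[3, 1, 1],
               [1, 3, 1],
               [1, 1, 3]])
x1 = sp.Matrix([1, 1, 1])
print(A * x1 == 5 * x1)                        # True

# Extend {x1} to a basis of R^3 with two standard basis vectors
P = sp.Matrix.hstack(x1, sp.Matrix([0, 1, 0]), sp.Matrix([0, 0, 1]))
print(P.det())                                 # 1, so P is invertible

# Block form: first column is 5*e1, zeros below the top-left block
print(P.inv() * A * P)                         # Matrix([[5, 1, 1], [0, 2, 0], [0, 0, 2]])
```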

Definition (Eigenspace of a Matrix). If \(\lambda\) is an eigenvalue of an \(n \times n\) matrix \(A\), define the eigenspace of \(A\) corresponding to \(\lambda\) by

    \[E_{\lambda}(A) = \{\mathbf{x}\mbox{ in } \mathbb{R}^n \mid A\mathbf{x} = \lambda\mathbf{x} \} \nonumber \]

    This is a subspace of \(\mathbb{R}^n\) and the eigenvectors corresponding to \(\lambda\) are just the nonzero vectors in \(E_{\lambda}(A)\). In fact \(E_{\lambda}(A)\) is the null space of the matrix \((\lambda I - A)\):

    \[E_{\lambda}(A) = \{\mathbf{x} \mid (\lambda I - A)\mathbf{x} = \mathbf{0} \} = \func{null}(\lambda I - A) \nonumber \]

    Hence, by Theorem [thm:015561], the basic solutions of the homogeneous system \((\lambda I - A)\mathbf{x} = \mathbf{0}\) given by the gaussian algorithm form a basis for \(E_{\lambda}(A)\). In particular

    \[\label{eq:number_of_basic_solutions} dim \; E_\lambda(A) \mbox{ is the number of basic solutions }\mathbf{x}\mbox{ of } (\lambda I - A)\mathbf{x} = \mathbf{0} \]
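In practice a basis of \(E_{\lambda}(A)\) is found exactly as described: solve \((\lambda I - A)\mathbf{x} = \mathbf{0}\) and keep the basic solutions. A short SymPy sketch, reusing the illustrative matrix from the previous sketch (there \(\lambda = 2\) is an eigenvalue):

```python
import sympy as sp

A = sp.Matrix([[3, 1, 1],
               [1, 3, 1],
               [1, 1, 3]])

# E_2(A) = null(2I - A); nullspace() returns the basic solutions
basis = (2 * sp.eye(3) - A).nullspace()
print(basis)          # two basic solutions, e.g. (-1, 1, 0) and (-1, 0, 1)
print(len(basis))     # dim E_2(A) = 2
```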

    Now recall (Definition [def:009289]) that the multiplicity1 of an eigenvalue \(\lambda\) of \(A\) is the number of times \(\lambda\) occurs as a root of the characteristic polynomial \(c_{A}(x)\) of \(A\). In other words, the multiplicity of \(\lambda\) is the largest integer \(m \geq 1\) such that

    \[c_A(x) = (x - \lambda)^mg(x) \nonumber \]

    for some polynomial \(g(x)\). Because of ([eq:number_of_basic_solutions]), the assertion (without proof) in Theorem [thm:009296] can be stated as follows: A square matrix is diagonalizable if and only if the multiplicity of each eigenvalue \(\lambda\) equals \(dim \;\left[ E_{\lambda}(A)\right]\). We are going to prove this, and the proof requires the following result which is valid for any square matrix, diagonalizable or not.

Lemma [lem:016219]. Let \(\lambda\) be an eigenvalue of multiplicity \(m\) of a square matrix \(A\). Then \(dim \;\left[ E_{\lambda}(A)\right] \leq m\).

    Write \(dim \;\left[ E_{\lambda}(A)\right] = d\). It suffices to show that \(c_{A}(x) = (x - \lambda)^{d}g(x)\) for some polynomial \(g(x)\), because \(m\) is the highest power of \((x - \lambda)\) that divides \(c_{A}(x)\). To this end, let \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{d}\}\) be a basis of \(E_{\lambda}(A)\). Then Lemma [lem:016161] shows that an invertible \(n \times n\) matrix \(P\) exists such that

    \[P^{-1}AP = \left[ \begin{array}{cc} \lambda I_d & B \\ 0 & A_1 \end{array} \right] \nonumber \]

    in block form, where \(I_{d}\) denotes the \(d \times d\) identity matrix. Now write \(A^\prime = P^{-1}AP\) and observe that \(c_{A^\prime}(x) = c_A(x)\) by Theorem [thm:016008]. But Theorem [thm:007890] gives

    \[\begin{aligned} c_A(x) = c_{A^\prime}(x) = \det (xI_n - A^\prime) &= \det \left[ \begin{array}{cc} (x - \lambda)I_d & -B \\ 0 & xI_{n-d} - A_1 \end{array} \right] \\ &= \det \left[ (x - \lambda)I_d\right] \det \left[ (xI_{n-d} - A_1)\right] \\ &= (x - \lambda)^dg(x)\end{aligned} \nonumber \]

where \(g(x) = c_{A_1}(x)\) is the characteristic polynomial of \(A_1\). This is what we wanted.

It is natural to ask when equality holds in Lemma [lem:016219] for each eigenvalue \(\lambda\). It turns out that this characterizes the diagonalizable \(n \times n\) matrices \(A\) for which \(c_{A}(x)\) factors completely over \(\mathbb{R}\). By this we mean that \(c_{A}(x) = (x - \lambda_{1})(x - \lambda_{2}) \cdots (x - \lambda_{n})\), where the \(\lambda_{i}\) are real numbers (not necessarily distinct); in other words, every eigenvalue of \(A\) is real. This need not happen (consider \(A = \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right]\)), and we investigate the general case below.

Theorem [thm:016250]. The following are equivalent for a square matrix \(A\) for which \(c_{A}(x)\) factors completely.

    1. \(A\) is diagonalizable.
    2. \(dim \;[E_{\lambda}(A)]\) equals the multiplicity of \(\lambda\) for every eigenvalue \(\lambda\) of the matrix \(A\).

    Let \(A\) be \(n \times n\) and let \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{k}\) be the distinct eigenvalues of \(A\). For each \(i\), let \(m_{i}\) denote the multiplicity of \(\lambda_{i}\) and write \(d_{i} = dim \;\left[ E_{\lambda_i}(A) \right]\). Then

    \[c_A(x) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2}\dots (x - \lambda_k)^{m_k} \nonumber \]

    so \(m_{1} + \dots + m_{k} = n\) because \(c_{A}(x)\) has degree \(n\). Moreover, \(d_{i} \leq m_{i}\) for each \(i\) by Lemma [lem:016219].

    (1) \(\Rightarrow\) (2). By (1), \(\mathbb{R}^n\) has a basis of \(n\) eigenvectors of \(A\), so let \(t_{i}\) of them lie in \(E_{\lambda_i}(A)\) for each \(i\). Since the subspace spanned by these \(t_{i}\) eigenvectors has dimension \(t_{i}\), we have \(t_{i} \leq d_{i}\) for each \(i\) by Theorem [thm:014254]. Hence

    \[n = t_1 + \dots + t_k \leq d_1 + \dots + d_k \leq m_1 + \dots + m_k = n \nonumber \]

    It follows that \(d_1 + \dots + d_k = m_1 + \dots + m_k\) so, since \(d_{i} \leq m_{i}\) for each \(i\), we must have \(d_{i} = m_{i}\). This is (2).

    (2) \(\Rightarrow\) (1). Let \(B_{i}\) denote a basis of \(E_{\lambda_i}(A)\) for each \(i\), and let \(B = B_{1} \cup \dots \cup B_{k}\). Since each \(B_{i}\) contains \(m_{i}\) vectors by (2), and since the \(B_{i}\) are pairwise disjoint (the \(\lambda_{i}\) are distinct), it follows that \(B\) contains \(n\) vectors. So it suffices to show that \(B\) is linearly independent (then \(B\) is a basis of \(\mathbb{R}^n\)). Suppose a linear combination of the vectors in \(B\) vanishes, and let \(\mathbf{y}_{i}\) denote the sum of all terms that come from \(B_{i}\). Then \(\mathbf{y}_{i}\) lies in \(E_{\lambda_i}(A)\), so the nonzero \(\mathbf{y}_{i}\) are independent by Theorem [thm:016090] (as the \(\lambda_{i}\) are distinct). Since the sum of the \(\mathbf{y}_{i}\) is zero, it follows that \(\mathbf{y}_{i} = \mathbf{0}\) for each \(i\). Hence all coefficients of terms in \(\mathbf{y}_{i}\) are zero (because \(B_{i}\) is independent). Since this holds for each \(i\), it shows that \(B\) is independent.

Example. If \(A = \left[ \begin{array}{rrr} 5 & 8 & 16 \\ 4 & 1 & 8 \\ -4 & -4 & -11 \end{array} \right]\) and \(B = \left[ \begin{array}{rrr} 2 & 1 & 1 \\ 2 & 1 & -2 \\ -1 & 0 & -2 \end{array} \right]\), show that \(A\) is diagonalizable but \(B\) is not.

    We have \(c_{A}(x) = (x + 3)^{2}(x - 1)\) so the eigenvalues are \(\lambda_{1} = -3\) and \(\lambda_{2} = 1\). The corresponding eigenspaces are \(E_{\lambda_1}(A) = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}\}\) and \(E_{\lambda_2}(A) = span \;\{\mathbf{x}_{3}\}\) where

    \[\mathbf{x}_1 = \left[ \begin{array}{r} -1\\ 1\\ 0 \end{array} \right] , \mathbf{x}_2 = \left[ \begin{array}{r} -2\\ 0\\ 1 \end{array} \right] , \mathbf{x}_3 = \left[ \begin{array}{r} 2\\ 1\\ -1 \end{array} \right] \nonumber \]

    as the reader can verify. Since \(\{\mathbf{x}_{1}, \mathbf{x}_{2}\}\) is independent, we have \(dim \;(E_{\lambda_1}(A)) = 2\) which is the multiplicity of \(\lambda_{1}\). Similarly, \(dim \;(E_{\lambda_2}(A)) = 1\) equals the multiplicity of \(\lambda_{2}\). Hence \(A\) is diagonalizable by Theorem [thm:016250], and a diagonalizing matrix is \(P = \left[ \begin{array}{ccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \mathbf{x}_{3} \end{array} \right]\).

    Turning to \(B\), \(c_{B}(x) = (x + 1)^{2}(x - 3)\) so the eigenvalues are \(\lambda_{1} = -1\) and \(\lambda_{2} = 3\). The corresponding eigenspaces are \(E_{\lambda_1}(B) =span \;\{\mathbf{y}_{1}\}\) and \(E_{\lambda_2}(B) = span \;\{\mathbf{y}_{2}\}\) where

    \[\mathbf{y}_1 = \left[ \begin{array}{r} -1\\ 2\\ 1 \end{array} \right] , \mathbf{y}_2 = \left[ \begin{array}{r} 5\\ 6\\ -1 \end{array} \right] \nonumber \]

    Here \(dim \;(E_{\lambda_1}(B)) = 1\) is smaller than the multiplicity of \(\lambda_{1}\), so the matrix \(B\) is not diagonalizable, again by Theorem [thm:016250]. The fact that \(dim \;(E_{\lambda_1}(B)) = 1\) means that there is no possibility of finding three linearly independent eigenvectors.
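The eigenspace dimensions in this example can be checked symbolically. In the sketch below (Python/SymPy; eigenvals, nullspace, and is_diagonalizable are SymPy methods), each eigenvalue's multiplicity is compared with the dimension of the corresponding eigenspace:

```python
import sympy as sp

A = sp.Matrix([[ 5,  8,  16],
               [ 4,  1,   8],
               [-4, -4, -11]])
B = sp.Matrix([[ 2,  1,  1],
               [ 2,  1, -2],
               [-1,  0, -2]])

for name, M in [("A", A), ("B", B)]:
    for lam, mult in M.eigenvals().items():
        dim = len((lam * sp.eye(3) - M).nullspace())   # dim E_lambda(M)
        print(name, lam, mult, dim)
    print(name, "diagonalizable:", M.is_diagonalizable())

# A: every multiplicity equals the eigenspace dimension -> diagonalizable True
# B: lambda = -1 has multiplicity 2 but eigenspace dimension 1 -> diagonalizable False
```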

    Complex Eigenvalues

    All the matrices we have considered have had real eigenvalues. But this need not be the case: The matrix \(A = \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right]\) has characteristic polynomial \(c_{A}(x) = x^{2} + 1\) which has no real roots. Nonetheless, this matrix is diagonalizable; the only difference is that we must use a larger set of scalars, the complex numbers. The basic properties of these numbers are outlined in Appendix [chap:appacomplexnumbers].

    Indeed, nearly everything we have done for real matrices can be done for complex matrices. The methods are the same; the only difference is that the arithmetic is carried out with complex numbers rather than real ones. For example, the gaussian algorithm works in exactly the same way to solve systems of linear equations with complex coefficients, matrix multiplication is defined the same way, and the matrix inversion algorithm works in the same way.

    But the complex numbers are better than the real numbers in one respect: While there are polynomials like \(x^{2} + 1\) with real coefficients that have no real root, this problem does not arise with the complex numbers: Every nonconstant polynomial with complex coefficients has a complex root, and hence factors completely as a product of linear factors. This fact is known as the fundamental theorem of algebra.2

Example. Diagonalize the matrix \(A = \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right].\)

    The characteristic polynomial of \(A\) is

    \[c_A(x) = \det (xI - A) = x^2 + 1 = (x - i)(x + i) \nonumber \]

    where \(i^{2} = -1\). Hence the eigenvalues are \(\lambda_{1} = i\) and \(\lambda_{2} = -i\), with corresponding eigenvectors \(\mathbf{x}_1 = \left[ \begin{array}{r} 1 \\ -i \end{array} \right]\) and \(\mathbf{x}_2 = \left[ \begin{array}{r} 1 \\ i \end{array} \right].\) Hence \(A\) is diagonalizable by the complex version of Theorem [thm:016145], and the complex version of Theorem [thm:016068] shows that \(P = \left[ \begin{array}{cc} \mathbf{x}_1\ & \mathbf{x}_2 \end{array} \right]= \left[ \begin{array}{rr} 1 & 1 \\ -i & i \end{array} \right]\) is invertible and \(P^{-1}AP = \left[ \begin{array}{cc} \lambda_1 & 0 \\ 0 & \lambda_2 \end{array} \right] = \left[ \begin{array}{cc} i & 0 \\ 0 & -i \end{array} \right]\). Of course, this can be checked directly.
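This example is also easy to confirm numerically; in the NumPy sketch below, complex arithmetic is handled automatically (the ordering of the computed eigenvalues may differ):

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

print(np.linalg.eigvals(A))                    # i and -i (order may vary)

# The P from the example diagonalizes A over the complex numbers
P = np.array([[1, 1],
              [-1j, 1j]])
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))                         # diag(i, -i)
```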

    We shall return to complex linear algebra in Section [sec:8_6].

    Symmetric Matrices3

    On the other hand, many of the applications of linear algebra involve a real matrix \(A\) and, while \(A\) will have complex eigenvalues by the fundamental theorem of algebra, it is always of interest to know when the eigenvalues are, in fact, real. While this can happen in a variety of ways, it turns out to hold whenever \(A\) is symmetric. This important theorem will be used extensively later. Surprisingly, the theory of complex eigenvalues can be used to prove this useful result about real eigenvalues.

    Let \(\overline{z}\) denote the conjugate of a complex number \(z\). If \(A\) is a complex matrix, the conjugate matrix \(\overline{A}\) is defined to be the matrix obtained from \(A\) by conjugating every entry. Thus, if \(A = \left[ z_{ij}\right]\), then \(\overline{A} = \left[ \overline{z}_{ij} \right]\). For example,

    \[\mbox{If } A = \left[ \begin{array}{cc} -i + 2 & 5 \\ i & 3 + 4i \end{array} \right] \mbox{ then } \overline{A} = \left[ \begin{array}{cc} i + 2 & 5 \\ -i & 3 - 4i \end{array} \right] \nonumber \]

    Recall that \(\overline{z + w} = \overline{z} + \overline{w}\) and \(\overline{zw} = \overline{z}\ \overline{w}\) hold for all complex numbers \(z\) and \(w\). It follows that if \(A\) and \(B\) are two complex matrices, then

    \[\overline{A + B} = \overline{A} + \overline{B},\quad \overline{AB} = \overline{A}\ \overline{B}\quad \mbox{ and } \overline{\lambda A} = \overline{\lambda}\ \overline{A} \nonumber \]

    hold for all complex scalars \(\lambda\). These facts are used in the proof of the following theorem.

Theorem [thm:016397]. Let \(A\) be a symmetric real matrix. If \(\lambda\) is any complex eigenvalue of \(A\), then \(\lambda\) is real.

    Observe that \(\overline{A} = A\) because \(A\) is real. If \(\lambda\) is an eigenvalue of \(A\), we show that \(\lambda\) is real by showing that \(\overline{\lambda} = \lambda\). Let \(\mathbf{x}\) be a (possibly complex) eigenvector corresponding to \(\lambda\), so that \(\mathbf{x} \neq \mathbf{0}\) and \(A\mathbf{x} = \lambda\mathbf{x}\). Define \(c = \mathbf{x}^T\overline{\mathbf{x}}\).

    If we write \(\mathbf{x} = \left[ \begin{array}{c} z_1 \\ z_2 \\ \vdots \\ z_n \end{array}\right]\) where the \(z_{i}\) are complex numbers, we have

\[c = \mathbf{x}^T\overline{\mathbf{x}} = z_1\overline{z_1} + z_2\overline{z_2} + \dotsb + z_n\overline{z_n} = |z_1|^2 + |z_2|^2 + \dotsb + |z_n|^2 \nonumber \]

    Thus \(c\) is a real number, and \(c > 0\) because at least one of the \(z_{i} \neq 0\) (as \(\mathbf{x} \neq \mathbf{0}\)). We show that \(\overline{\lambda} = \lambda\) by verifying that \(\lambda c = \overline{\lambda}c\). We have

    \[\lambda c = \lambda(\mathbf{x}^T\overline{\mathbf{x}}) = (\lambda \mathbf{x})^T\overline{\mathbf{x}} = (A\mathbf{x})^T\overline{\mathbf{x}} = \mathbf{x}^TA^T\overline{\mathbf{x}} \nonumber \]

    At this point we use the hypothesis that \(A\) is symmetric and real. This means \(A^T = A = \overline{A}\) so we continue the calculation:

    \[\begin{aligned} \lambda c = \mathbf{x}^TA^T\overline{\mathbf{x}} = \mathbf{x}^T(\overline{A}\ \overline{\mathbf{x}}) = \mathbf{x}^T(\overline{A\mathbf{x}}) &= \mathbf{x}^T(\overline{\lambda \mathbf{x}}) \\ &= \mathbf{x}^T(\overline{\lambda}\ \overline{\mathbf{x}}) \\ &= \overline{\lambda}\mathbf{x}^T\overline{\mathbf{x}} \\ &= \overline{\lambda}c\end{aligned} \nonumber \]

    as required.

    The technique in the proof of Theorem [thm:016397] will be used again when we return to complex linear algebra in Section [sec:8_6].

Example. Verify Theorem [thm:016397] for every real, symmetric \(2 \times 2\) matrix \(A\).

    If \(A = \left[ \begin{array}{cc} a & b \\ b & c \end{array} \right]\) we have \(c_{A}(x) = x^{2} - (a + c)x + (ac - b^{2})\), so the eigenvalues are given by \(\lambda = \frac{1}{2} [(a + c) \pm \sqrt{(a + c)^2 - 4(ac - b^2)}]\). But here

    \[(a + c)^2 - 4(ac -b^2) = (a - c)^2 + 4b^2 \geq 0 \nonumber \]

    for any choice of \(a\), \(b\), and \(c\). Hence, the eigenvalues are real numbers.
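Beyond the \(2 \times 2\) case, Theorem [thm:016397] is easy to test numerically. A sketch (the symmetric matrix is built from an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                              # an arbitrary real symmetric matrix

vals = np.linalg.eigvals(A)
print(np.allclose(np.imag(vals), 0))           # True: every eigenvalue is real

# eigvalsh exploits symmetry and returns real eigenvalues directly
print(np.linalg.eigvalsh(A))
```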


1. This is often called the algebraic multiplicity of \(\lambda\).
2. This was a famous open problem in 1799 when Gauss solved it at the age of 22 in his Ph.D. dissertation.
3. This discussion uses complex conjugation and absolute value. These topics are discussed in Appendix [chap:appacomplexnumbers].

    This page titled 5.5: Similarity and Diagonalization is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by W. Keith Nicholson (Lyryx Learning Inc.) via source content that was edited to the style and standards of the LibreTexts platform.