
3.4: Diagonalization


An \(n \times n\) matrix \(D\) is called a diagonal matrix if all its entries off the main diagonal are zero, that is, if \(D\) has the form

    \[D = \left[ \begin{array}{cccc} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{array}\right] = \text{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n) \nonumber \]

    where \(\lambda_{1}, \lambda_{2}, \dots , \lambda_{n}\) are numbers. Calculations with diagonal matrices are very easy. Indeed, if \(D = \text{diag}(\lambda_{1}, \lambda_{2}, \dots , \lambda_{n})\) and \(E = \text{diag}(\mu_{1}, \mu_{2}, \dots , \mu_{n})\) are two diagonal matrices, their product \(DE\) and sum \(D + E\) are again diagonal, and are obtained by doing the same operations to corresponding diagonal elements:

    \[\begin{aligned} DE &= \text{diag}(\lambda_1 \mu_1, \lambda_2 \mu_2, \dots, \lambda_n \mu_n) \\ D + E &= \text{diag}(\lambda_1 + \mu_1, \lambda_2 + \mu_2, \dots, \lambda_n + \mu_n)\end{aligned} \nonumber \]
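For readers who like to experiment, here is a minimal numerical sketch of these two formulas using NumPy (not part of the original text; the particular diagonal entries are arbitrary):

```python
# Entrywise product and sum of diagonal matrices, illustrating the formulas above.
import numpy as np

D = np.diag([2.0, -1.0, 3.0])   # diag(2, -1, 3)
E = np.diag([5.0, 4.0, 0.5])    # diag(5, 4, 0.5)

# The product and sum are again diagonal, obtained entry by entry.
print(np.allclose(D @ E, np.diag([10.0, -4.0, 1.5])))   # True
print(np.allclose(D + E, np.diag([7.0, 3.0, 3.5])))     # True
```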

    Because of the simplicity of these formulas, and with an eye on Theorem 3.3.1 and the discussion preceding it, we make another definition:

    Definition: Diagonalizable Matrices

    An \(n \times n\) matrix \(A\) is called diagonalizable if

    \[P^{-1}AP \mbox{ is diagonal for some invertible } n \times n \mbox{ matrix } P \nonumber \]

    Here the invertible matrix \(P\) is called a diagonalizing matrix for \(A\).

    To discover when such a matrix \(P\) exists, we let \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots , \mathbf{x}_{n}\) denote the columns of \(P\) and look for ways to determine when such \(\mathbf{x}_{i}\) exist and how to compute them. To this end, write \(P\) in terms of its columns as follows:

    \[P = \left[ \mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n \right] \nonumber \]

    Observe that \(P^{-1}AP = D\) for some diagonal matrix \(D\) holds if and only if

    \[AP = PD \nonumber \]

    If we write \(D = \text{diag}(\lambda_{1}, \lambda_{2}, \dots , \lambda_{n})\), where the \(\lambda_{i}\) are numbers to be determined, the equation \(AP = PD\) becomes

    \[A\left[ \mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n \right] = \left[ \mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_n \right] \left[ \begin{array}{cccc} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{array}\right] \nonumber \]

By the definition of matrix multiplication, each side simplifies as follows:

    \[\left[ \begin{array}{cccc} A\mathbf{x}_1 & A\mathbf{x}_2 & \cdots & A\mathbf{x}_n \end{array}\right] = \left[ \begin{array}{cccc} \lambda_1 \mathbf{x}_1 & \lambda_2 \mathbf{x}_2 & \cdots & \lambda_n \mathbf{x}_n \end{array}\right] \nonumber \]

    Comparing columns shows that \(A\mathbf{x}_{i} = \lambda_{i}\mathbf{x}_{i}\) for each \(i\), so

    \[P^{-1}AP = D \quad \mbox{ if and only if } A\mathbf{x}_i = \lambda_i \mathbf{x}_i \mbox{ for each } i \nonumber \]

    In other words, \(P^{-1}AP = D\) holds if and only if the diagonal entries of \(D\) are eigenvalues of \(A\) and the columns of \(P\) are corresponding eigenvectors. This proves the following fundamental result.

    Theorem \(\PageIndex{1}\)

    Let \(A\) be an \(n \times n\) matrix.

    1. \(A\) is diagonalizable if and only if it has eigenvectors \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots , \mathbf{x}_{n}\) such that the matrix \(P = \left[ \begin{array}{cccc} \mathbf{x}_{1}& \mathbf{x}_{2} & \dots & \mathbf{x}_{n} \end{array}\right]\) is invertible.
    2. When this is the case, \(P^{-1}AP = \text{diag}(\lambda_{1}, \lambda_{2}, \dots , \lambda_{n})\) where, for each \(i\), \(\lambda_{i}\) is the eigenvalue of \(A\) corresponding to \(\mathbf{x}_{i}\).
    Example \(\PageIndex{1}\)

    Diagonalize the matrix \(A = \left[ \begin{array}{rrr} 2 & 0 & 0 \\ 1 & 2 & -1 \\ 1 & 3 & -2 \end{array}\right]\) in Example 3.3.4.

    Solution

    By Example 3.3.4, the eigenvalues of \(A\) are \(\lambda_{1} = 2\), \(\lambda_{2} = 1\), and \(\lambda_{3} = -1\), with corresponding basic eigenvectors \(\mathbf{x}_1 = \left[ \begin{array}{r} 1 \\ 1 \\ 1 \end{array}\right], \mathbf{x}_2 = \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array}\right]\), and \(\mathbf{x}_3 = \left[ \begin{array}{r} 0 \\ 1 \\ 3 \end{array}\right]\) respectively. Since the matrix \(P= \left[ \begin{array}{cccc} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 \end{array}\right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 3 \end{array}\right]\) is invertible, Theorem 3.4.1 guarantees that

    \[P^{-1} AP = \left[ \begin{array}{ccc} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{array}\right] = \left[ \begin{array}{ccc} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{array}\right] =D \nonumber \]

The reader can verify this directly; it is perhaps easier to check the equivalent equation \(AP = PD\), which avoids computing \(P^{-1}\).
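As a quick numerical sanity check (not part of the original text), the following NumPy sketch verifies both \(AP = PD\) and \(P^{-1}AP = D\) for this example.

```python
# Numerical check of Example 3.4.1: A, the eigenvector matrix P, and D = diag(2, 1, -1).
import numpy as np

A = np.array([[2, 0, 0],
              [1, 2, -1],
              [1, 3, -2]], dtype=float)
P = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 1, 3]], dtype=float)
D = np.diag([2.0, 1.0, -1.0])

print(np.allclose(A @ P, P @ D))                   # True: AP = PD
print(np.allclose(np.linalg.inv(P) @ A @ P, D))    # True: P^{-1}AP = D
```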

    In Example 3.4.1, suppose we let \(Q = \left[ \begin{array}{ccc} \mathbf{x}_{2} & \mathbf{x}_{1} & \mathbf{x}_{3} \end{array} \right]\) be the matrix formed from the eigenvectors \(\mathbf{x}_{1}\), \(\mathbf{x}_{2}\), and \(\mathbf{x}_{3}\) of \(A\), but in a different order than that used to form \(P\). Then \(Q^{-1}AQ = \text{diag}(\lambda_{2}, \lambda_{1}, \lambda_{3})\) is diagonal by Theorem 3.4.1, but the eigenvalues are in the new order. Hence we can choose the diagonalizing matrix \(P\) so that the eigenvalues \(\lambda_{i}\) appear in any order we want along the main diagonal of \(D\).

    In every example above each eigenvalue has had only one basic eigenvector. Here is a diagonalizable matrix where this is not the case.

    Example \(\PageIndex{2}\)

Diagonalize the matrix \(A = \left[ \begin{array}{rrr} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{array}\right]\).

    Solution

To compute the characteristic polynomial of \(A\), first add rows 2 and 3 of \(xI - A\) to row 1, and then subtract column 1 from columns 2 and 3:

    \[\begin{aligned} c_A(x) &= \det \left[ \begin{array}{ccc} x & -1 & -1 \\ -1 & x & -1\\ -1 & -1 & x \end{array}\right] = \det \left[ \begin{array}{ccc} x-2 & x-2 & x-2 \\ -1 & x & -1\\ -1 & -1 & x \end{array}\right] \\ & = \det \left[ \begin{array}{ccc} x-2 & 0 & 0 \\ -1 & x+1 & 0\\ -1 & 0 & x+1 \end{array}\right] = (x-2)(x+1)^2\end{aligned} \nonumber \]

    Hence the eigenvalues are \(\lambda_{1} = 2\) and \(\lambda_{2} = -1\), with \(\lambda_{2}\) repeated twice (we say that \(\lambda_{2}\) has multiplicity two). However, \(A\) is diagonalizable. For \(\lambda_{1} = 2\), the system of equations \((\lambda_{1}I - A)\mathbf{x} = \mathbf{0}\) has general solution \(\mathbf{x} = t \left[ \begin{array}{r} 1\\ 1\\ 1 \end{array}\right]\) as the reader can verify, so a basic \(\lambda_{1}\)-eigenvector is \(\mathbf{x}_1 = \left[ \begin{array}{r} 1\\ 1\\ 1 \end{array}\right]\).

Turning to the repeated eigenvalue \(\lambda_{2} = -1\), we must solve \((\lambda_{2}I - A)\mathbf{x} = \mathbf{0}\). By gaussian elimination, the general solution is \(\mathbf{x} = s \left[ \begin{array}{r} -1 \\ 1 \\ 0 \end{array}\right] + t \left[ \begin{array}{r} -1\\ 0\\ 1 \end{array}\right]\) where \(s\) and \(t\) are arbitrary. Hence the gaussian algorithm produces two basic \(\lambda_{2}\)-eigenvectors \(\mathbf{x}_2 = \left[ \begin{array}{r} -1\\ 1\\ 0 \end{array}\right]\) and \(\mathbf{y}_2 = \left[ \begin{array}{r} -1\\ 0\\ 1 \end{array}\right]\). If we take \(P = \left[ \begin{array}{ccc} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{y}_2 \end{array}\right] = \left[ \begin{array}{rrr} 1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{array}\right]\) we find that \(P\) is invertible. Hence \(P^{-1}AP = \text{diag}(2, -1, -1)\) by Theorem 3.4.1.
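Again, a short NumPy sketch (an optional check, not from the text) confirms that \(P\) is invertible and that \(P^{-1}AP = \text{diag}(2, -1, -1)\), even though the eigenvalue \(-1\) is repeated.

```python
# Numerical check of Example 3.4.2: the three basic eigenvectors give an invertible P.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
P = np.array([[1, -1, -1],
              [1,  1,  0],
              [1,  0,  1]], dtype=float)
D = np.diag([2.0, -1.0, -1.0])

print(abs(np.linalg.det(P)) > 1e-10)               # True: P is invertible (det = 3)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))    # True: P^{-1}AP = diag(2, -1, -1)
```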

    Example 3.4.2 typifies every diagonalizable matrix. To describe the general case, we need some terminology.

    Definition: Multiplicity of an Eigenvalue

    An eigenvalue \(\lambda\) of a square matrix \(A\) is said to have multiplicity \(m\) if it occurs \(m\) times as a root of the characteristic polynomial \(c_{A}(x)\).

    For example, the eigenvalue \(\lambda_{2} = -1\) in Example 3.4.2 has multiplicity \(2\). In that example the gaussian algorithm yields two basic \(\lambda_{2}\)-eigenvectors, the same number as the multiplicity. This works in general.

    Theorem \(\PageIndex{2}\)

    A square matrix \(A\) is diagonalizable if and only if every eigenvalue \(\lambda\) of multiplicity \(m\) yields exactly \(m\) basic eigenvectors; that is, if and only if the general solution of the system \((\lambda I - A)\mathbf{x} = \mathbf{0}\) has exactly \(m\) parameters.

    One case of Theorem 3.4.2 deserves mention.

    Theorem \(\PageIndex{3}\)

    An \(n \times n\) matrix with \(n\) distinct eigenvalues is diagonalizable.

    The proofs of Theorem 3.4.2 and Theorem 3.4.3 require more advanced techniques and are given in Chapter 5. The following procedure summarizes the method.

    Theorem: Diagonalization Algorithm

    To diagonalize an \(n \times n\) matrix \(A\):

    • Step 1. Find the distinct eigenvalues \(\lambda\) of \(A\).
    • Step 2. Compute a set of basic eigenvectors corresponding to each of these eigenvalues \(\lambda\) as basic solutions of the homogeneous system \((\lambda I - A)\mathbf{x} = \mathbf{0}\).
• Step 3. The matrix \(A\) is diagonalizable if and only if there are \(n\) basic eigenvectors in all.
    • Step 4. If \(A\) is diagonalizable, the \(n \times n\) matrix \(P\) with these basic eigenvectors as its columns is a diagonalizing matrix for \(A\), that is, \(P\) is invertible and \(P^{-1}AP\) is diagonal.

    The diagonalization algorithm is valid even if the eigenvalues are nonreal complex numbers. In this case the eigenvectors will also have complex entries, but we will not pursue this here.
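The following SymPy sketch (an illustration only; the helper name `diagonalize_or_report` is ours, not from the text) carries out Steps 1 through 4 with exact arithmetic, using `eigenvects()` to produce the basic eigenvectors for each eigenvalue.

```python
# A sketch of the diagonalization algorithm using exact (symbolic) arithmetic.
import sympy as sp

def diagonalize_or_report(A):
    """Steps 1-4: collect basic eigenvectors; if there are n of them in all,
    return a diagonalizing matrix P and the diagonal matrix D = P^{-1} A P."""
    n = A.rows
    columns, diagonal = [], []
    # A.eigenvects() returns triples (eigenvalue, multiplicity, basic eigenvectors).
    for eigenvalue, multiplicity, vectors in A.eigenvects():
        columns.extend(vectors)                      # Step 2: basic eigenvectors
        diagonal.extend([eigenvalue] * len(vectors))
    if len(columns) < n:                             # Step 3: need n in all
        return None
    P = sp.Matrix.hstack(*columns)                   # Step 4: eigenvectors as columns
    return P, sp.diag(*diagonal)

A = sp.Matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])     # matrix of Example 3.4.2
P, D = diagonalize_or_report(A)
print(P.inv() * A * P == D)                          # True
```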

    Example \(\PageIndex{3}\)

    Show that \(A = \left[ \begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array}\right]\) is not diagonalizable.

    Solution

The characteristic polynomial is \(c_{A}(x) = (x - 1)^{2}\), so \(A\) has only one eigenvalue \(\lambda_{1} = 1\) of multiplicity \(2\). But the system of equations \((\lambda_{1}I - A)\mathbf{x} = \mathbf{0}\) has general solution \(t \left[ \begin{array}{r} 1 \\ 0 \end{array}\right]\), so there is only one parameter, and so only one basic eigenvector \(\left[ \begin{array}{r} 1 \\ 0 \end{array}\right]\). Hence \(A\) is not diagonalizable.

Alternatively, since \(c_{A}(x) = (x - 1)^{2}\), the only eigenvalue of \(A\) is \(\lambda = 1\). Hence, if \(A\) were diagonalizable, Theorem 3.4.1 would give \(P^{-1}AP = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array}\right] = I\) for some invertible matrix \(P\). But then \(A = PIP^{-1} = I\), which is not the case. So \(A\) cannot be diagonalizable.
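A brief SymPy check (optional, not from the text) shows the shortage of basic eigenvectors directly.

```python
# The matrix of Example 3.4.3 fails the eigenvector count of Theorem 3.4.2.
import sympy as sp

A = sp.Matrix([[1, 1], [0, 1]])
print(A.eigenvects())         # [(1, 2, [Matrix([[1], [0]])])]  -- one basic eigenvector
print(A.is_diagonalizable())  # False
```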

    Diagonalizable matrices share many properties of their eigenvalues. The following example illustrates why.

    Example \(\PageIndex{4}\)

    If \(\lambda^{3} = 5\lambda\) for every eigenvalue of the diagonalizable matrix \(A\), show that \(A^{3} = 5A\).

    Solution

Let \(P^{-1}AP = D = \text{diag}(\lambda_{1}, \dots , \lambda_{n})\). Because \(\lambda_i^3 = 5\lambda_{i}\) for each \(i\), we obtain

    \[D^3 = \text{diag}(\lambda_1^3 ,\dots, \lambda_n^3) = \text{diag}(5\lambda_1, \dots, 5 \lambda_n) = 5D \nonumber \]

    Hence \(A^{3} = (PDP^{-1})^{3} = PD^{3}P^{-1} = P(5D)P^{-1} = 5(PDP^{-1}) = 5A\) using Theorem 3.3.1. This is what we wanted.
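A numerical sanity check (ours, with an arbitrarily chosen invertible \(P\)): build a diagonalizable \(A\) whose eigenvalues \(0\), \(\sqrt{5}\), \(-\sqrt{5}\) all satisfy \(\lambda^{3} = 5\lambda\), and confirm that \(A^{3} = 5A\).

```python
# Construct A = P D P^{-1} with eigenvalues satisfying lambda^3 = 5*lambda, then check A^3 = 5A.
import numpy as np

P = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 1, 3]], dtype=float)          # any invertible matrix will do
D = np.diag([0.0, np.sqrt(5), -np.sqrt(5)])     # eigenvalues with lambda^3 = 5*lambda
A = P @ D @ np.linalg.inv(P)

print(np.allclose(np.linalg.matrix_power(A, 3), 5 * A))   # True
```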

If \(p(x)\) is any polynomial and \(p(\lambda) = 0\) for every eigenvalue of the diagonalizable matrix \(A\), an argument similar to that in Example 3.4.4 shows that \(p(A) = 0\). Thus Example 3.4.4 deals with the case \(p(x) = x^{3} - 5x\). In general, \(p(A)\) is called the evaluation of the polynomial \(p(x)\) at the matrix \(A\). For example, if \(p(x) = 2x^{3} - 3x + 5\), then \(p(A) = 2A^{3} - 3A + 5I\); note that the constant term is multiplied by the identity matrix.
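For instance, this evaluation can be computed directly, as in the following sketch (the matrix from Example 3.4.1 is reused here only for illustration):

```python
# Evaluate p(x) = 2x^3 - 3x + 5 at a matrix: the constant 5 becomes 5I.
import numpy as np

A = np.array([[2, 0, 0],
              [1, 2, -1],
              [1, 3, -2]], dtype=float)
I = np.eye(3)
p_of_A = 2 * np.linalg.matrix_power(A, 3) - 3 * A + 5 * I
print(p_of_A)
```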

    In particular, if \(c_{A}(x)\) denotes the characteristic polynomial of \(A\), we certainly have \(c_{A}(\lambda) = 0\) for each eigenvalue \(\lambda\) of \(A\) (Theorem 3.3.2). Hence \(c_{A}(A) = 0\) for every diagonalizable matrix \(A\). This is, in fact, true for any square matrix, diagonalizable or not, and the general result is called the Cayley-Hamilton theorem. It is proved in Section 8.6 and again in Section 11.1.
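As an optional illustration (not part of the text), the matrix of Example 3.4.1 has \(c_{A}(x) = (x-2)(x-1)(x+1) = x^{3} - 2x^{2} - x + 2\), and evaluating this polynomial at \(A\) indeed gives the zero matrix:

```python
# Numerical illustration of the Cayley-Hamilton theorem: c_A(A) = 0.
import numpy as np

A = np.array([[2, 0, 0],
              [1, 2, -1],
              [1, 3, -2]], dtype=float)
I = np.eye(3)
c_of_A = (np.linalg.matrix_power(A, 3)
          - 2 * np.linalg.matrix_power(A, 2)
          - A + 2 * I)
print(np.allclose(c_of_A, np.zeros((3, 3))))   # True
```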


    This page titled 3.4: Diagonalization is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by W. Keith Nicholson.
