8.2: Orthogonal Diagonalization
Recall (Theorem [thm:016068]) that an \(n \times n\) matrix \(A\) is diagonalizable if and only if it has \(n\) linearly independent eigenvectors. Moreover, the matrix \(P\) with these eigenvectors as columns is a diagonalizing matrix for \(A\), that is
\[P^{-1}AP \mbox{ is diagonal.} \nonumber \]
As we have seen, the really nice bases of \(\mathbb{R}^n\) are the orthogonal ones, so a natural question is: which \(n \times n\) matrices have an orthogonal basis of eigenvectors? These turn out to be precisely the symmetric matrices, and this is the main result of this section.
Before proceeding, recall that an orthogonal set of vectors is called orthonormal if \(\|\mathbf{v}\| = 1\) for each vector \(\mathbf{v}\) in the set, and that any orthogonal set \(\{\mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k}\}\) can be “normalized”, that is converted into an orthonormal set \(\lbrace \frac{1}{\| \mathbf{v}_{1} \|}\mathbf{v}_{1}, \frac{1}{\| \mathbf{v}_{2} \|}\mathbf{v}_{2}, \dots, \frac{1}{\| \mathbf{v}_{k} \|}\mathbf{v}_{k} \rbrace\). In particular, if a matrix \(A\) has \(n\) orthogonal eigenvectors, they can (by normalizing) be taken to be orthonormal. The corresponding diagonalizing matrix \(P\) has orthonormal columns, and such matrices are very easy to invert.
Theorem 024227. The following conditions are equivalent for an \(n \times n\) matrix \(P\).
1. \(P\) is invertible and \(P^{-1} = P^{T}\).
2. The rows of \(P\) are orthonormal.
3. The columns of \(P\) are orthonormal.
Proof. First recall that condition (1) is equivalent to \(PP^{T} = I\) by Corollary [cor:004612] of Theorem [thm:004553]. Let \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\) denote the rows of \(P\). Then \(\mathbf{x}_{j}^{T}\) is the \(j\)th column of \(P^{T}\), so the \((i, j)\)-entry of \(PP^{T}\) is \(\mathbf{x}_{i}\bullet \mathbf{x}_{j}\). Thus \(PP^{T} = I\) means that \(\mathbf{x}_{i}\bullet \mathbf{x}_{j} = 0\) if \(i \neq j\) and \(\mathbf{x}_{i}\bullet \mathbf{x}_{j} = 1\) if \(i = j\). Hence condition (1) is equivalent to (2). The proof of the equivalence of (1) and (3) is similar.
Definition 024256 (Orthogonal Matrices). An \(n \times n\) matrix \(P\) is called an orthogonal matrix if it satisfies one (and hence all) of the conditions in Theorem [thm:024227].
Example 024259. The rotation matrix \(\left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\right]\) is orthogonal for any angle \(\theta\).
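As a quick check against condition (1) of Theorem [thm:024227], write \(P\) for the rotation matrix (a name used only for this check) and multiply it by its transpose:
\[PP^{T} = \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\right] \left[ \begin{array}{rr} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{array}\right] = \left[ \begin{array}{cc} \cos^{2}\theta + \sin^{2}\theta & \cos\theta\sin\theta - \sin\theta\cos\theta \\ \sin\theta\cos\theta - \cos\theta\sin\theta & \sin^{2}\theta + \cos^{2}\theta \end{array}\right] = \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}\right] \nonumber \]
Hence \(P^{-1} = P^{T}\), which is the rotation through \(-\theta\).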
These orthogonal matrices have the virtue that they are easy to invert—simply take the transpose. But they have many other important properties as well. If \(T : \mathbb{R}^n \to \mathbb{R}^n\) is a linear operator, we will prove (Theorem [thm:032147]) that \(T\) is distance preserving if and only if its matrix is orthogonal. In particular, the matrices of rotations and reflections about the origin in \(\mathbb{R}^2\) and \(\mathbb{R}^3\) are all orthogonal (see Example [exa:024259]).
It is not enough that the rows of a matrix \(A\) are merely orthogonal for \(A\) to be an orthogonal matrix. Here is an example.
Example 024269. The matrix \(\left[ \begin{array}{rrr} 2 & 1 & 1 \\ -1 & 1 & 1 \\ 0 & -1 & 1 \end{array}\right]\) has orthogonal rows but the columns are not orthogonal. However, if the rows are normalized, the resulting matrix \(\def\arraystretch{1.5}\left[ \begin{array}{rrr} \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{-1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array}\right]\) is orthogonal (so the columns are now orthonormal as the reader can verify).
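The verification left to the reader in Example [exa:024269] can also be done numerically. The following is a minimal Python sketch using only the standard library. The helper names `dot`, `normalize`, `transpose`, and `is_orthonormal` are illustrative choices rather than anything from the text, and the tolerance `1e-12` is an assumed allowance for floating-point rounding.

```python
import math

# Matrix from the example: its rows are orthogonal but not unit vectors.
A = [[2, 1, 1],
     [-1, 1, 1],
     [0, -1, 1]]

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def is_orthonormal(vectors, tol=1e-12):
    """True if the vectors are pairwise orthogonal unit vectors (up to tol)."""
    return all(
        abs(dot(u, v) - (1.0 if i == j else 0.0)) < tol
        for i, u in enumerate(vectors)
        for j, v in enumerate(vectors)
    )

# Exact integer checks on the original matrix.
cols = transpose(A)
rows_orthogonal_before = all(dot(A[i], A[j]) == 0 for i in range(3) for j in range(i))
cols_orthogonal_before = all(dot(cols[i], cols[j]) == 0 for i in range(3) for j in range(i))

# Normalize each row; the result should have orthonormal rows AND columns.
P = [normalize(row) for row in A]
```

In this sketch the rows of `A` pass the orthogonality check while its columns do not, and after normalization both `is_orthonormal(P)` and `is_orthonormal(transpose(P))` hold, in line with Theorem [thm:024227].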
Theorem 024275. If \(P\) and \(Q\) are orthogonal matrices, then \(PQ\) is also orthogonal, as is \(P^{-1} = P^{T}\).
Proof. \(P\) and \(Q\) are invertible, so \(PQ\) is also invertible and
\[(PQ)^{-1} = Q^{-1}P^{-1} = Q^{T}P^{T} = (PQ)^{T} \nonumber \]
Hence \(PQ\) is orthogonal. Similarly,
\[(P^{-1})^{-1} = P = (P^{T})^{T} = (P^{-1})^{T} \nonumber \]
shows that \(P^{-1}\) is orthogonal.
Definition 024297 (Orthogonally Diagonalizable Matrices). An \(n \times n\) matrix \(A\) is said to be orthogonally diagonalizable when an orthogonal matrix \(P\) can be found such that \(P^{-1}AP = P^{T}AP\) is diagonal.
This condition turns out to characterize the symmetric matrices.
Theorem 024303 (Principal Axes Theorem). The following conditions are equivalent for an \(n \times n\) matrix \(A\).
1. \(A\) has an orthonormal set of \(n\) eigenvectors.
2. \(A\) is orthogonally diagonalizable.
3. \(A\) is symmetric.
Proof. (1) \(\Leftrightarrow\) (2). Given (1), let \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\) be orthonormal eigenvectors of \(A\). Then \(P = \left[ \begin{array}{cccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \dots & \mathbf{x}_{n} \end{array}\right]\) is orthogonal, and \(P^{-1}AP\) is diagonal by Theorem [thm:009214]. This proves (2). Conversely, given (2) let \(P^{-1}AP\) be diagonal where \(P\) is orthogonal. If \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\) are the columns of \(P\) then \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\}\) is an orthonormal basis of \(\mathbb{R}^n\) that consists of eigenvectors of \(A\) by Theorem [thm:009214]. This proves (1).
(2) \(\Rightarrow\) (3). If \(P^{T}AP = D\) is diagonal, where \(P^{-1} = P^{T}\), then \(A = PDP^{T}\). But \(D^{T} = D\), so this gives \(A^{T} = P^{TT}D^{T}P^{T} = PDP^{T} = A\).
(3) \(\Rightarrow\) (2). If \(A\) is an \(n \times n\) symmetric matrix, we proceed by induction on \(n\). If \(n = 1\), \(A\) is already diagonal. If \(n > 1\), assume that (3) \(\Rightarrow\) (2) for \((n - 1) \times (n - 1)\) symmetric matrices. By Theorem [thm:016397] let \(\lambda_{1}\) be a (real) eigenvalue of \(A\), and let \(A\mathbf{x}_{1} = \lambda_{1}\mathbf{x}_{1}\), where \(\|\mathbf{x}_{1}\| = 1\). Use the Gram-Schmidt algorithm to find an orthonormal basis \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\}\) for \(\mathbb{R}^n\). Let \(P_{1} = \left[ \begin{array}{cccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \dots & \mathbf{x}_{n} \end{array}\right]\), so \(P_{1}\) is an orthogonal matrix and \(P_{1}^TAP_{1} = \left[ \begin{array}{cc} \lambda_{1} & B \\ 0 & A_{1} \end{array}\right]\) in block form by Lemma [lem:016161]. But \(P_{1}^TAP_{1}\) is symmetric (\(A\) is), so it follows that \(B = 0\) and \(A_{1}\) is symmetric. Then, by induction, there exists an \((n - 1) \times (n - 1)\) orthogonal matrix \(Q\) such that \(Q^{T}A_{1}Q = D_{1}\) is diagonal. Observe that \(P_{2} = \left[ \begin{array}{cc} 1 & 0\\ 0 & Q \end{array}\right]\) is orthogonal, and compute:
\[\begin{aligned} (P_{1}P_{2})^TA(P_{1}P_{2}) &= P_{2}^T(P_{1}^TAP_{1})P_{2} \\ &= \left[ \begin{array}{cc} 1 & 0 \\ 0 & Q^T \end{array}\right] \left[ \begin{array}{cc} \lambda_{1} & 0 \\ 0 & A_{1} \end{array}\right]\left[ \begin{array}{cc} 1 & 0 \\ 0 & Q \end{array}\right]\\ &= \left[ \begin{array}{cc} \lambda_{1} & 0 \\ 0 & D_{1} \end{array}\right]\end{aligned} \nonumber \]
is diagonal. Because \(P_{1}P_{2}\) is orthogonal, this proves (2).
A set of orthonormal eigenvectors of a symmetric matrix \(A\) is called a set of principal axes for \(A\). The name comes from geometry, and this is discussed in Section [sec:8_8]. Because the eigenvalues of a (real) symmetric matrix are real, Theorem [thm:024303] is also called the real spectral theorem, and the set of distinct eigenvalues is called the spectrum of the matrix. In full generality, the spectral theorem is a similar result for matrices with complex entries (Theorem [thm:025860]).
Example 024374. Find an orthogonal matrix \(P\) such that \(P^{-1}AP\) is diagonal, where \(A = \left[ \begin{array}{rrr} 1 & 0 & -1 \\ 0 & 1 & 2 \\ -1 & 2 & 5 \end{array}\right]\).
Solution. The characteristic polynomial of \(A\) is (adding twice row 1 to row 2 to evaluate the determinant):
\[c_{A}(x) = \det \left[ \begin{array}{ccc} x - 1 & 0 & 1 \\ 0 & x - 1 & -2 \\ 1 & -2 & x - 5 \end{array}\right] = x(x - 1)(x - 6) \nonumber \]
Thus the eigenvalues are \(\lambda = 0\), \(1\), and \(6\), and corresponding eigenvectors are
\[\mathbf{x}_{1} = \left[ \begin{array}{r} 1 \\ -2 \\ 1 \end{array}\right] \; \mathbf{x}_{2} = \left[ \begin{array}{r} 2 \\ 1 \\ 0 \end{array}\right] \; \mathbf{x}_{3} = \left[ \begin{array}{r} -1 \\ 2 \\ 5 \end{array}\right] \nonumber \]
respectively. Moreover, by what appears to be remarkably good luck, these eigenvectors are orthogonal. We have \(\|\mathbf{x}_{1}\|^{2} = 6\), \(\|\mathbf{x}_{2}\|^{2} = 5\), and \(\|\mathbf{x}_{3}\|^{2} = 30\), so
\[P = \left[ \begin{array}{ccc} \frac{1}{\sqrt{6}}\mathbf{x}_{1} & \frac{1}{\sqrt{5}}\mathbf{x}_{2} & \frac{1}{\sqrt{30}}\mathbf{x}_{3} \end{array}\right] = \frac{1}{\sqrt{30}} \left[ \begin{array}{ccc} \sqrt{5} & 2\sqrt{6} & -1 \\ -2\sqrt{5} & \sqrt{6} & 2 \\ \sqrt{5} & 0 & 5 \end{array}\right] \nonumber \]
is an orthogonal matrix. Thus \(P^{-1} = P^{T}\) and
\[P^TAP = \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{array}\right] \nonumber \]
by the diagonalization algorithm.
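The result of Example [exa:024374] can be confirmed by direct multiplication. Below is a minimal Python sketch using only the standard library: it normalizes the three eigenvectors from the example, places them as the columns of `P`, and forms \(P^{T}AP\). The helper names (`dot`, `normalize`, `transpose`, `matmul`) are illustrative and not from the text, and the entries are floating-point approximations.

```python
import math

# Symmetric matrix and eigenvectors taken from the example.
A = [[1, 0, -1],
     [0, 1, 2],
     [-1, 2, 5]]
eigvecs = [[1, -2, 1],   # eigenvalue 0
           [2, 1, 0],    # eigenvalue 1
           [-1, 2, 5]]   # eigenvalue 6

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product X Y for matrices stored as lists of rows."""
    return [[dot(row, col) for col in zip(*Y)] for row in X]

# The normalized eigenvectors become the COLUMNS of P.
P = transpose([normalize(v) for v in eigvecs])

# P^T A P should be diag(0, 1, 6) up to rounding error.
D = matmul(matmul(transpose(P), A), P)
```

Up to rounding, `D` has diagonal entries 0, 1, 6 and zeros elsewhere, matching the displayed matrix.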
Actually, the fact that the eigenvectors in Example [exa:024374] are orthogonal is no coincidence. Theorem [thm:016090] guarantees they are linearly independent (they correspond to distinct eigenvalues); the fact that the matrix is symmetric implies that they are orthogonal. To prove this we need the following useful fact about symmetric matrices.
Theorem 024396. If \(A\) is an \(n \times n\) symmetric matrix, then
\[(A\mathbf{x})\bullet \mathbf{y} = \mathbf{x}\bullet (A\mathbf{y}) \nonumber \]
for all columns \(\mathbf{x}\) and \(\mathbf{y}\) in \(\mathbb{R}^n\).
Proof. Recall that \(\mathbf{x}\bullet \mathbf{y} = \mathbf{x}^{T} \mathbf{y}\) for all columns \(\mathbf{x}\) and \(\mathbf{y}\). Because \(A^{T} = A\), we get
\[(A\mathbf{x})\bullet \mathbf{y} = (A\mathbf{x})^T\mathbf{y} = \mathbf{x}^TA^T\mathbf{y} = \mathbf{x}^TA\mathbf{y} = \mathbf{x}\bullet (A\mathbf{y}) \nonumber \]
Theorem 024407. If \(A\) is a symmetric matrix, then eigenvectors of \(A\) corresponding to distinct eigenvalues are orthogonal.
Proof. Let \(A\mathbf{x} = \lambda \mathbf{x}\) and \(A\mathbf{y} = \mu \mathbf{y}\), where \(\lambda \neq \mu\). Using Theorem [thm:024396], we compute
\[\lambda(\mathbf{x}\bullet \mathbf{y}) = (\lambda\mathbf{x})\bullet \mathbf{y} = (A\mathbf{x})\bullet \mathbf{y} = \mathbf{x}\bullet (A\mathbf{y}) = \mathbf{x}\bullet (\mu\mathbf{y}) = \mu(\mathbf{x}\bullet \mathbf{y}) \nonumber \]
Hence \((\lambda - \mu)(\mathbf{x}\bullet \mathbf{y}) = 0\), and so \(\mathbf{x}\bullet \mathbf{y} = 0\) because \(\lambda \neq \mu\).
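For instance, the eigenvectors \(\mathbf{x}_{1}\), \(\mathbf{x}_{2}\), \(\mathbf{x}_{3}\) found in Example [exa:024374] correspond to the distinct eigenvalues \(0\), \(1\), and \(6\), and a direct computation confirms what the theorem predicts:
\[\begin{aligned} \mathbf{x}_{1}\bullet \mathbf{x}_{2} &= (1)(2) + (-2)(1) + (1)(0) = 0 \\ \mathbf{x}_{1}\bullet \mathbf{x}_{3} &= (1)(-1) + (-2)(2) + (1)(5) = 0 \\ \mathbf{x}_{2}\bullet \mathbf{x}_{3} &= (2)(-1) + (1)(2) + (0)(5) = 0 \end{aligned} \nonumber \]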
Now the procedure for diagonalizing a symmetric \(n \times n\) matrix is clear. Find the distinct eigenvalues (all real by Theorem [thm:016397]) and find orthonormal bases for each eigenspace (the Gram-Schmidt algorithm may be needed). Then the set of all these basis vectors is orthonormal (by Theorem [thm:024407]) and contains \(n\) vectors. Here is an example.
Example 024416. Orthogonally diagonalize the symmetric matrix \(A = \left[ \begin{array}{rrr} 8 & -2 & 2 \\ -2 & 5 & 4 \\ 2 & 4 & 5 \end{array}\right]\).
Solution. The characteristic polynomial is
\[c_{A}(x) = \det \left[ \begin{array}{ccc} x-8 & 2 & -2 \\ 2 & x-5 & -4 \\ -2 & -4 & x-5 \end{array}\right] = x(x-9)^2 \nonumber \]
Hence the distinct eigenvalues are \(0\) and \(9\) of multiplicities \(1\) and \(2\), respectively, so \(\dim(E_{0}) = 1\) and \(\dim(E_{9}) = 2\) by Theorem [thm:016250] (\(A\) is diagonalizable, being symmetric). Gaussian elimination gives
\[E_{0}(A) = \operatorname{span}\{\mathbf{x}_{1}\}, \enskip \mathbf{x}_{1} = \left[ \begin{array}{r} 1 \\ 2 \\ -2 \end{array}\right], \quad \mbox{ and } \quad E_{9}(A) = \operatorname{span}\left\lbrace \left[ \begin{array}{r} -2 \\ 1 \\ 0 \end{array}\right], \left[ \begin{array}{r} 2 \\ 0 \\ 1 \end{array}\right] \right\rbrace \nonumber \]
The eigenvectors in \(E_{9}\) are both orthogonal to \(\mathbf{x}_{1}\) as Theorem [thm:024407] guarantees, but not to each other. However, the Gram-Schmidt process yields an orthogonal basis
\[\{\mathbf{x}_{2}, \mathbf{x}_{3}\} \mbox{ of } E_{9}(A) \quad \mbox{ where } \quad \mathbf{x}_{2} = \left[ \begin{array}{r} -2 \\ 1 \\ 0 \end{array}\right] \mbox{ and } \mathbf{x}_{3} = \left[ \begin{array}{r} 2 \\ 4 \\ 5 \end{array}\right] \nonumber \]
Normalizing gives orthonormal vectors \(\{\frac{1}{3}\mathbf{x}_{1}, \frac{1}{\sqrt{5}}\mathbf{x}_{2}, \frac{1}{3\sqrt{5}}\mathbf{x}_{3}\}\), so
\[P = \left[ \begin{array}{rrr} \frac{1}{3}\mathbf{x}_{1} & \frac{1}{\sqrt{5}}\mathbf{x}_{2} & \frac{1}{3\sqrt{5}}\mathbf{x}_{3} \end{array}\right] = \frac{1}{3\sqrt{5}}\left[ \begin{array}{rrr} \sqrt{5} & -6 & 2 \\ 2\sqrt{5} & 3 & 4 \\ -2\sqrt{5} & 0 & 5 \end{array}\right] \nonumber \]
is an orthogonal matrix such that \(P^{-1}AP\) is diagonal.
It is worth noting that other, more convenient, diagonalizing matrices \(P\) exist. For example, \(\mathbf{y}_{2} = \left[ \begin{array}{r} 2 \\ 1 \\ 2 \end{array}\right]\) and \(\mathbf{y}_{3} = \left[ \begin{array}{r} -2 \\ 2 \\ 1 \end{array}\right]\) lie in \(E_{9}(A)\) and they are orthogonal. Moreover, they both have norm \(3\) (as does \(\mathbf{x}_{1}\)), so
\[Q = \left[ \begin{array}{ccc} \frac{1}{3}\mathbf{x}_{1} & \frac{1}{3}\mathbf{y}_{2} & \frac{1}{3}\mathbf{y}_{3} \end{array}\right] = \frac{1}{3}\left[ \begin{array}{rrr} 1 & 2 & -2 \\ 2 & 1 & 2 \\ -2 & 2 & 1 \end{array}\right] \nonumber \]
is a nicer orthogonal matrix with the property that \(Q^{-1}AQ\) is diagonal.
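The procedure used in Example [exa:024416] (orthonormalize a basis of each eigenspace, then collect the vectors as columns) can be sketched in Python with the standard library alone. This is only an illustration under stated assumptions: the eigenvalues and eigenspace bases are taken as given from the example rather than computed, the dictionary `eigenspaces` and all helper names are hypothetical, and `gram_schmidt` uses the modified Gram-Schmidt ordering for better numerical behaviour.

```python
import math

A = [[8, -2, 2],
     [-2, 5, 4],
     [2, 4, 5]]

# Eigenvalue -> basis of its eigenspace, as found in the example.
eigenspaces = {
    0: [[1, 2, -2]],
    9: [[-2, 1, 0], [2, 0, 1]],
}

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

def gram_schmidt(basis):
    """Return an orthonormal basis for the span of a linearly independent list."""
    ortho = []
    for v in basis:
        w = [float(a) for a in v]
        for u in ortho:
            c = dot(w, u)  # component of w along the unit vector u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ortho.append(normalize(w))
    return ortho

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product X Y for matrices stored as lists of rows."""
    return [[dot(row, col) for col in zip(*Y)] for row in X]

# Orthonormalize within each eigenspace; vectors from different
# eigenspaces are automatically orthogonal because A is symmetric.
columns = []
for lam in sorted(eigenspaces):
    columns.extend(gram_schmidt(eigenspaces[lam]))

P = transpose(columns)                   # orthonormal eigenvectors as columns
D = matmul(matmul(transpose(P), A), P)   # approximately diag(0, 9, 9)
```

With the bases listed in that order, the third column of `P` is the normalized vector \(\frac{1}{3\sqrt{5}}\mathbf{x}_{3}\) from the example. Supplying the orthogonal pair \(\mathbf{y}_{2}\), \(\mathbf{y}_{3}\) as the basis for eigenvalue 9 would instead reproduce the columns of \(Q\).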
If \(A\) is symmetric and a set of orthogonal eigenvectors of \(A\) is given, the eigenvectors are called principal axes of \(A\). The name comes from geometry. An expression \(q = ax_{1}^2 + bx_{1}x_{2} + cx_{2}^2\) is called a quadratic form in the variables \(x_{1}\) and \(x_{2}\), and the graph of the equation \(q = 1\) is called a conic in these variables. For example, if \(q = x_{1}x_{2}\), the graph of \(q = 1\) is given in the first diagram.
But if we introduce new variables \(y_{1}\) and \(y_{2}\) by setting \(x_{1} = y_{1} + y_{2}\) and \(x_{2} = y_{1} - y_{2}\), then \(q\) becomes \(q = y_{1}^2 - y_{2}^2\), a diagonal form with no cross term \(y_{1}y_{2}\) (see the second diagram). Because of this, the \(y_{1}\) and \(y_{2}\) axes are called the principal axes for the conic (hence the name). Orthogonal diagonalization provides a systematic method for finding principal axes. Here is an illustration.
Example 024463. Find principal axes for the quadratic form \(q = x_{1}^2 -4x_{1}x_{2} + x_{2}^2\).
Solution. In order to utilize diagonalization, we first express \(q\) in matrix form. Observe that
\[q = \left[ \begin{array}{cc} x_{1} & x_{2} \end{array}\right] \left[ \begin{array}{rr} 1 & -4 \\ 0 & 1 \end{array}\right] \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array}\right] \nonumber \]
The matrix here is not symmetric, but we can remedy that by writing
\[q = x_{1}^2 -2x_{1}x_{2} - 2x_{2}x_{1} + x_{2}^2 \nonumber \]
Then we have
\[q = \left[ \begin{array}{cc} x_{1} & x_{2} \end{array}\right] \left[ \begin{array}{rr} 1 & -2 \\ -2 & 1 \end{array}\right] \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array}\right] = \mathbf{x}^TA\mathbf{x} \nonumber \]
where \(\mathbf{x} = \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array}\right]\) and \(A = \left[ \begin{array}{rr} 1 & -2 \\ -2 & 1 \end{array}\right]\) is symmetric. The eigenvalues of \(A\) are \(\lambda_{1} = 3\) and \(\lambda_{2} = -1\), with corresponding (orthogonal) eigenvectors \(\mathbf{x}_{1} = \left[ \begin{array}{r} 1 \\ -1 \end{array}\right]\) and \(\mathbf{x}_{2} = \left[ \begin{array}{c} 1 \\ 1 \end{array}\right]\). Since \(\| \mathbf{x}_{1} \| = \| \mathbf{x}_{2} \| = \sqrt{2}\), the matrix
\[P = \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array}\right] \mbox{ is orthogonal and } P^TAP = D = \left[ \begin{array}{rr} 3 & 0 \\ 0 & -1 \end{array}\right] \nonumber \]
Now define new variables \(\left[ \begin{array}{c} y_{1} \\ y_{2} \end{array}\right] = \mathbf{y}\) by \(\mathbf{y} = P^{T}\mathbf{x}\), equivalently \(\mathbf{x} = P\mathbf{y}\) (since \(P^{-1} = P^{T}\)). Hence
\[y_{1} = \frac{1}{\sqrt{2}}(x_{1} - x_{2}) \quad \mbox{ and } \quad y_{2} = \frac{1}{\sqrt{2}}(x_{1} + x_{2}) \nonumber \]
In terms of \(y_{1}\) and \(y_{2}\), \(q\) takes the form
\[q = \mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^T(P^TAP)\mathbf{y} = \mathbf{y}^TD\mathbf{y} = 3y_{1}^2 - y_{2}^2 \nonumber \]
Note that \(\mathbf{y} = P^{T}\mathbf{x}\) is obtained from \(\mathbf{x}\) by a counterclockwise rotation of \(\frac{\pi}{4}\) (see Theorem [thm:004693]).
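As a check on the change of variables, substituting the expressions for \(y_{1}\) and \(y_{2}\) back into the diagonal form recovers the original quadratic form:
\[\begin{aligned} 3y_{1}^2 - y_{2}^2 &= \frac{3}{2}(x_{1} - x_{2})^2 - \frac{1}{2}(x_{1} + x_{2})^2 \\ &= \frac{3}{2}(x_{1}^2 - 2x_{1}x_{2} + x_{2}^2) - \frac{1}{2}(x_{1}^2 + 2x_{1}x_{2} + x_{2}^2) \\ &= x_{1}^2 - 4x_{1}x_{2} + x_{2}^2 \end{aligned} \nonumber \]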
Observe that the quadratic form \(q\) in Example [exa:024463] can be diagonalized in other ways. For example
\[q = x_{1}^2 - 4x_{1}x_2 + x_{2}^2 = z_{1}^2 - \frac{1}{3}z_{2}^2 \nonumber \]
where \(z_{1} = x_{1} -2x_{2}\) and \(z_{2} = 3x_{2}\). We examine this more carefully in Section [sec:8_8].
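This alternative form can be verified by direct expansion:
\[z_{1}^2 - \frac{1}{3}z_{2}^2 = (x_{1} - 2x_{2})^2 - \frac{1}{3}(3x_{2})^2 = x_{1}^2 - 4x_{1}x_{2} + 4x_{2}^2 - 3x_{2}^2 = x_{1}^2 - 4x_{1}x_{2} + x_{2}^2 \nonumber \]
Note that here the change of variables has matrix \(\left[ \begin{array}{rr} 1 & -2 \\ 0 & 3 \end{array}\right]\), whose columns are not orthogonal, so unlike \(\mathbf{y} = P^{T}\mathbf{x}\) it is not an orthogonal change of variables.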
If we are willing to replace “diagonal” by “upper triangular” in the principal axes theorem, we can weaken the requirement that \(A\) is symmetric to insisting only that \(A\) has real eigenvalues.
Theorem 024503 (Triangulation Theorem). If \(A\) is an \(n \times n\) matrix with \(n\) real eigenvalues, an orthogonal matrix \(P\) exists such that \(P^{T}AP\) is upper triangular.
Proof. We modify the proof of Theorem [thm:024303], again using induction on \(n\). If \(A\mathbf{x}_{1} = \lambda_{1}\mathbf{x}_{1}\) where \(\|\mathbf{x}_{1}\| = 1\), let \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\}\) be an orthonormal basis of \(\mathbb{R}^n\), and let \(P_{1} = \left[ \begin{array}{cccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \cdots & \mathbf{x}_{n} \end{array}\right]\). Then \(P_{1}\) is orthogonal and \(P_{1}^TAP_{1} = \left[ \begin{array}{cc} \lambda_{1} & B \\ 0 & A_{1} \end{array}\right]\) in block form. Because \(P_{1}^TAP_{1}\) is similar to \(A\), we have \(c_{A}(x) = (x - \lambda_{1})c_{A_{1}}(x)\), so the eigenvalues of \(A_{1}\) are also real. Hence, by induction, let \(Q^{T}A_{1}Q = T_{1}\) be upper triangular where \(Q\) is of size \((n-1)\times(n-1)\) and orthogonal. Then \(P_{2} = \left[ \begin{array}{cc} 1 & 0 \\ 0 & Q \end{array}\right]\) is orthogonal, so \(P = P_{1}P_{2}\) is also orthogonal and \(P^TAP = \left[ \begin{array}{cc} \lambda_{1} & BQ \\ 0 & T_{1} \end{array}\right]\) is upper triangular.
The proof of Theorem [thm:024503] gives no way to construct the matrix \(P\). However, an algorithm will be given in Section [sec:11_1] where an improved version of Theorem [thm:024503] is presented. In a different direction, a version of Theorem [thm:024503] holds for an arbitrary matrix with complex entries (Schur’s theorem in Section [sec:8_6]).
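When an eigenvector happens to be known in advance, the first step of the proof can still be carried out directly, and for a \(2 \times 2\) matrix that single step already yields an upper triangular matrix. The Python sketch below illustrates this on a small non-symmetric matrix chosen purely for illustration (it does not appear in the text). The eigenvector `x1` for the eigenvalue 1 is supplied by hand, and the helper names are hypothetical.

```python
import math

# Illustrative non-symmetric matrix with real eigenvalues 1 and 2.
A = [[3, 1],
     [-2, 0]]
x1 = [1, -2]  # known eigenvector: A x1 = 1 * x1

def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product X Y for matrices stored as lists of rows."""
    return [[dot(row, col) for col in zip(*Y)] for row in X]

# Extend the unit eigenvector to an orthonormal basis of R^2 by
# rotating it through 90 degrees.
u1 = normalize(x1)
u2 = [-u1[1], u1[0]]

P = transpose([u1, u2])                  # orthogonal, first column is u1
T = matmul(matmul(transpose(P), A), P)   # upper triangular up to rounding
```

Here `T` is approximately \(\left[ \begin{array}{rr} 1 & 3 \\ 0 & 2 \end{array}\right]\), with the eigenvalues of `A` on the diagonal. For larger matrices the same step would have to be repeated on the block \(A_{1}\), which again requires knowing an eigenvector of \(A_{1}\).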
As for a diagonal matrix, the eigenvalues of an upper triangular matrix are displayed along the main diagonal. Because \(A\) and \(P^{T}AP\) have the same determinant and trace whenever \(P\) is orthogonal, Theorem [thm:024503] gives:
Corollary 024536. If \(A\) is an \(n \times n\) matrix with real eigenvalues \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\) (possibly not all distinct), then \(\det A = \lambda_{1}\lambda_{2} \cdots \lambda_{n}\) and \(\operatorname{tr} A = \lambda_{1} + \lambda_{2} + \dots + \lambda_{n}\).
This corollary remains true even if the eigenvalues are not real (using Schur’s theorem).
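The determinant and trace formulas can be checked against the matrices of this section, whose eigenvalues were found above. The following Python sketch (standard library only) does so with a small cofactor-expansion determinant. The names `det`, `trace`, and `examples` are illustrative, and the cofactor expansion is chosen for brevity, not efficiency.

```python
import math

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def trace(M):
    """Sum of the main-diagonal entries."""
    return sum(M[k][k] for k in range(len(M)))

# (matrix, eigenvalues listed with multiplicity), taken from the examples above.
examples = [
    ([[1, 0, -1], [0, 1, 2], [-1, 2, 5]], [0, 1, 6]),
    ([[8, -2, 2], [-2, 5, 4], [2, 4, 5]], [0, 9, 9]),
    ([[1, -2], [-2, 1]], [3, -1]),
]

# For each example, compare det with the product and trace with the sum
# of the eigenvalues; integer arithmetic keeps the comparison exact.
checks = [
    (det(M) == math.prod(eigs), trace(M) == sum(eigs))
    for M, eigs in examples
]
```

Every pair in `checks` is `(True, True)`. For the \(2 \times 2\) matrix, for example, \(\det A = -3 = (3)(-1)\) and \(\operatorname{tr} A = 2 = 3 + (-1)\).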


