8.2: Orthogonal Diagonalization


Recall (Theorem [thm:016068]) that an n \times n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors. Moreover, the matrix P with these eigenvectors as columns is a diagonalizing matrix for A, that is

P^{-1}AP \mbox{ is diagonal.} \nonumber

As we have seen, the really nice bases of \mathbb{R}^n are the orthogonal ones, so a natural question is: which n \times n matrices have an orthogonal basis of eigenvectors? These turn out to be precisely the symmetric matrices, and this is the main result of this section.

Before proceeding, recall that an orthogonal set of vectors is called orthonormal if \|\mathbf{v}\| = 1 for each vector \mathbf{v} in the set, and that any orthogonal set \{\mathbf{v}_{1}, \mathbf{v}_{2}, \dots, \mathbf{v}_{k}\} can be “normalized”, that is converted into an orthonormal set \{\frac{1}{\|\mathbf{v}_{1}\|}\mathbf{v}_{1}, \frac{1}{\|\mathbf{v}_{2}\|}\mathbf{v}_{2}, \dots, \frac{1}{\|\mathbf{v}_{k}\|}\mathbf{v}_{k}\}. In particular, if a matrix A has n orthogonal eigenvectors, they can (by normalizing) be taken to be orthonormal. The corresponding diagonalizing matrix P has orthonormal columns, and such matrices are very easy to invert.
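
For readers who want a numerical check, here is a minimal NumPy sketch of the normalization step (NumPy is an outside tool, not part of the text, and the vectors are just a sample orthogonal set):

import numpy as np

# A sample orthogonal (but not orthonormal) set in R^3.
v1 = np.array([1.0,  1.0,  0.0])
v2 = np.array([1.0, -1.0,  2.0])
v3 = np.array([1.0, -1.0, -1.0])

# Divide each vector by its norm to obtain an orthonormal set.
u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))

print(np.linalg.norm(u1), np.linalg.norm(u2), np.linalg.norm(u3))  # each is 1.0
print(np.dot(u1, u2), np.dot(u1, u3), np.dot(u2, u3))              # each is ~0.0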

Theorem 024227. The following conditions are equivalent for an n \times n matrix P.

  1. P is invertible and P^{-1} = P^{T}.
  2. The rows of P are orthonormal.
  3. The columns of P are orthonormal.

First recall that condition (1) is equivalent to PP^{T} = I by Corollary [cor:004612] of Theorem [thm:004553]. Let \mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n} denote the rows of P. Then \mathbf{x}_{j}^{T} is the jth column of P^{T}, so the (i, j)-entry of PP^{T} is \mathbf{x}_{i} \bullet \mathbf{x}_{j}. Thus PP^{T} = I means that \mathbf{x}_{i} \bullet \mathbf{x}_{j} = 0 if i \neq j and \mathbf{x}_{i} \bullet \mathbf{x}_{j} = 1 if i = j. Hence condition (1) is equivalent to (2). The proof of the equivalence of (1) and (3) is similar.

Orthogonal Matrices 024256. An n \times n matrix P is called an orthogonal matrix if it satisfies one (and hence all) of the conditions in Theorem [thm:024227].

Example 024259. The rotation matrix \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array}\right] is orthogonal for any angle \theta.
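
A quick NumPy check of this example for one sample angle (the angle is arbitrary; this is an illustration, not a proof):

import numpy as np

theta = 0.7  # any angle works
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(P @ P.T, np.eye(2)))      # True, so P^T is the inverse of P
print(np.allclose(np.linalg.inv(P), P.T))   # True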

These orthogonal matrices have the virtue that they are easy to invert—simply take the transpose. But they have many other important properties as well. If T : \mathbb{R}^n \to \mathbb{R}^n is a linear operator, we will prove (Theorem [thm:032147]) that T is distance preserving if and only if its matrix is orthogonal. In particular, the matrices of rotations and reflections about the origin in \mathbb{R}^2 and \mathbb{R}^3 are all orthogonal (see Example [exa:024259]).

It is not enough that the rows of a matrix A are merely orthogonal for A to be an orthogonal matrix. Here is an example.

Example 024269. The matrix \left[ \begin{array}{rrr} 2 & 1 & 1 \\ -1 & 1 & 1 \\ 0 & -1 & 1 \end{array}\right] has orthogonal rows but the columns are not orthogonal. However, if the rows are normalized, the resulting matrix \left[ \begin{array}{ccc} \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{array}\right] is orthogonal (so the columns are now orthonormal, as the reader can verify).
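
The claims in Example 024269 can be verified numerically; a short NumPy sketch using the matrix above:

import numpy as np

A = np.array([[ 2.0,  1.0,  1.0],
              [-1.0,  1.0,  1.0],
              [ 0.0, -1.0,  1.0]])

# The rows are orthogonal: A A^T is diagonal (but not the identity).
print(np.round(A @ A.T, 10))        # diag(6, 3, 2)

# The columns are not orthogonal: A^T A has nonzero off-diagonal entries.
print(np.round(A.T @ A, 10))

# Normalizing the rows produces an orthogonal matrix.
Q = A / np.linalg.norm(A, axis=1, keepdims=True)
print(np.allclose(Q @ Q.T, np.eye(3)))   # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are now orthonormal too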

Example 024275. If P and Q are orthogonal matrices, then PQ is also orthogonal, as is P^{-1} = P^{T}.

P and Q are invertible, so PQ is also invertible and

(PQ)^{-1} = Q^{-1}P^{-1} = Q^{T}P^{T} = (PQ)^{T} \nonumber

Hence PQ is orthogonal. Similarly,

(P^{-1})^{-1} = P = (P^{T})^{T} = (P^{-1})^{T} \nonumber

shows that P^{-1} is orthogonal.
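
A numerical illustration of Example 024275 with a sample rotation and a sample reflection (any orthogonal matrices would do):

import numpy as np

def is_orthogonal(M):
    # M is orthogonal exactly when M M^T is the identity.
    return np.allclose(M @ M.T, np.eye(M.shape[0]))

t = 0.3
P = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])   # a rotation
Q = np.array([[0.0, 1.0],
              [1.0, 0.0]])                # a reflection

print(is_orthogonal(P @ Q))   # True: the product is orthogonal
print(is_orthogonal(P.T))     # True: P^{-1} = P^T is orthogonal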

Orthogonally Diagonalizable Matrices 024297. An n \times n matrix A is said to be orthogonally diagonalizable when an orthogonal matrix P can be found such that P^{-1}AP = P^{T}AP is diagonal.

This condition turns out to characterize the symmetric matrices.

Principal Axes Theorem 024303. The following conditions are equivalent for an n \times n matrix A.

  1. A has an orthonormal set of n eigenvectors.
  2. A is orthogonally diagonalizable.
  3. A is symmetric.

(1) \Rightarrow (2). Given (1), let \mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n} be orthonormal eigenvectors of A. Then P = \left[ \begin{array}{cccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \cdots & \mathbf{x}_{n} \end{array}\right] is orthogonal, and P^{-1}AP is diagonal by Theorem [thm:009214]. This proves (2). Conversely, given (2) let P^{-1}AP be diagonal where P is orthogonal. If \mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n} are the columns of P then \{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\} is an orthonormal basis of \mathbb{R}^n that consists of eigenvectors of A by Theorem [thm:009214]. This proves (1).

(2) \Rightarrow (3). If P^{T}AP = D is diagonal, where P^{-1} = P^{T}, then A = PDP^{T}. But D^{T} = D, so this gives A^{T} = P^{TT}D^{T}P^{T} = PDP^{T} = A.

(3) \Rightarrow (2). If A is an n \times n symmetric matrix, we proceed by induction on n. If n = 1, A is already diagonal. If n > 1, assume that (3) \Rightarrow (2) holds for (n-1) \times (n-1) symmetric matrices. By Theorem [thm:016397] let \lambda_{1} be a (real) eigenvalue of A, and let A\mathbf{x}_{1} = \lambda_{1}\mathbf{x}_{1}, where \|\mathbf{x}_{1}\| = 1. Use the Gram-Schmidt algorithm to find an orthonormal basis \{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\} for \mathbb{R}^n. Let P_{1} = \left[ \begin{array}{cccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \cdots & \mathbf{x}_{n} \end{array}\right], so P_{1} is an orthogonal matrix and P_{1}^{T}AP_{1} = \left[ \begin{array}{cc} \lambda_{1} & B \\ 0 & A_{1} \end{array}\right] in block form by Lemma [lem:016161]. But P_{1}^{T}AP_{1} is symmetric (A is), so it follows that B = 0 and A_{1} is symmetric. Then, by induction, there exists an (n-1) \times (n-1) orthogonal matrix Q such that Q^{T}A_{1}Q = D_{1} is diagonal. Observe that P_{2} = \left[ \begin{array}{cc} 1 & 0 \\ 0 & Q \end{array}\right] is orthogonal, and compute:

(P_{1}P_{2})^{T}A(P_{1}P_{2}) = P_{2}^{T}(P_{1}^{T}AP_{1})P_{2} = \left[ \begin{array}{cc} 1 & 0 \\ 0 & Q^{T} \end{array}\right] \left[ \begin{array}{cc} \lambda_{1} & 0 \\ 0 & A_{1} \end{array}\right] \left[ \begin{array}{cc} 1 & 0 \\ 0 & Q \end{array}\right] = \left[ \begin{array}{cc} \lambda_{1} & 0 \\ 0 & D_{1} \end{array}\right] \nonumber

is diagonal. Because P_{1}P_{2} is orthogonal, this proves (2).

A set of orthonormal eigenvectors of a symmetric matrix A is called a set of principal axes for A. The name comes from geometry, and this is discussed in Section [sec:8_8]. Because the eigenvalues of a (real) symmetric matrix are real, Theorem [thm:024303] is also called the real spectral theorem, and the set of distinct eigenvalues is called the spectrum of the matrix. In full generality, the spectral theorem is a similar result for matrices with complex entries (Theorem [thm:025860]).

Example 024374. Find an orthogonal matrix P such that P^{-1}AP is diagonal, where A = \left[ \begin{array}{rrr} 1 & 0 & -1 \\ 0 & 1 & 2 \\ -1 & 2 & 5 \end{array}\right].

The characteristic polynomial of A is (adding twice row 1 to row 2):

c_{A}(x) = \det \left[ \begin{array}{ccc} x-1 & 0 & 1 \\ 0 & x-1 & -2 \\ 1 & -2 & x-5 \end{array}\right] = x(x-1)(x-6) \nonumber

Thus the eigenvalues are \lambda = 0, 1, and 6, and corresponding eigenvectors are

\mathbf{x}_{1} = \left[ \begin{array}{r} 1 \\ -2 \\ 1 \end{array}\right] \; \mathbf{x}_{2} = \left[ \begin{array}{r} 2 \\ 1 \\ 0 \end{array}\right] \; \mathbf{x}_{3} = \left[ \begin{array}{r} -1 \\ 2 \\ 5 \end{array}\right] \nonumber

respectively. Moreover, by what appears to be remarkably good luck, these eigenvectors are orthogonal. We have \|\mathbf{x}_{1}\|^{2} = 6, \|\mathbf{x}_{2}\|^{2} = 5, and \|\mathbf{x}_{3}\|^{2} = 30, so

P = \left[ \begin{array}{ccc} \frac{1}{\sqrt{6}}\mathbf{x}_{1} & \frac{1}{\sqrt{5}}\mathbf{x}_{2} & \frac{1}{\sqrt{30}}\mathbf{x}_{3} \end{array}\right] = \frac{1}{\sqrt{30}} \left[ \begin{array}{ccc} \sqrt{5} & 2\sqrt{6} & -1 \\ -2\sqrt{5} & \sqrt{6} & 2 \\ \sqrt{5} & 0 & 5 \end{array}\right] \nonumber

is an orthogonal matrix. Thus P^{-1} = P^{T} and

P^TAP = \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{array}\right] \nonumber

by the diagonalization algorithm.
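
A NumPy check of Example 024374, using the matrix and the normalized eigenvectors found above:

import numpy as np

A = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 1.0,  2.0],
              [-1.0, 2.0,  5.0]])

x1 = np.array([ 1.0, -2.0, 1.0])
x2 = np.array([ 2.0,  1.0, 0.0])
x3 = np.array([-1.0,  2.0, 5.0])

# The columns of P are the normalized eigenvectors.
P = np.column_stack([x1 / np.sqrt(6), x2 / np.sqrt(5), x3 / np.sqrt(30)])

print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))         # diag(0, 1, 6)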

Actually, the fact that the eigenvectors in Example [exa:024374] are orthogonal is no coincidence. Theorem [thm:016090] guarantees they are linearly independent (they correspond to distinct eigenvalues); the fact that the matrix is symmetric implies that they are orthogonal. To prove this we need the following useful fact about symmetric matrices.

Theorem 024396. If A is an n \times n symmetric matrix, then

(A\mathbf{x})\bullet \mathbf{y} = \mathbf{x}\bullet (A\mathbf{y}) \nonumber

for all columns \mathbf{x} and \mathbf{y} in \mathbb{R}^n.

Recall that \mathbf{x}\bullet \mathbf{y} = \mathbf{x}^{T} \mathbf{y} for all columns \mathbf{x} and \mathbf{y}. Because A^{T} = A, we get

(A\mathbf{x})\bullet \mathbf{y} = (A\mathbf{x})^T\mathbf{y} = \mathbf{x}^TA^T\mathbf{y} = \mathbf{x}^TA\mathbf{y} = \mathbf{x}\bullet (A\mathbf{y}) \nonumber
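
A random numerical check of Theorem 024396 (the seed and the size are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                  # a symmetric matrix
x = rng.standard_normal(4)
y = rng.standard_normal(4)

# (Ax).y equals x.(Ay) because A^T = A.
print(np.isclose((A @ x) @ y, x @ (A @ y)))   # True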

Theorem 024407. If A is a symmetric matrix, then eigenvectors of A corresponding to distinct eigenvalues are orthogonal.

Let A\mathbf{x} = \lambda \mathbf{x} and A\mathbf{y} = \mu \mathbf{y}, where \lambda \neq \mu. Using Theorem [thm:024396], we compute

\lambda(\mathbf{x}\bullet \mathbf{y}) = (\lambda\mathbf{x})\bullet \mathbf{y} = (A\mathbf{x})\bullet \mathbf{y} = \mathbf{x}\bullet (A\mathbf{y}) = \mathbf{x}\bullet (\mu\mathbf{y}) = \mu(\mathbf{x}\bullet \mathbf{y}) \nonumber

Hence (\lambda - \mu)(\mathbf{x}\bullet \mathbf{y}) = 0, and so \mathbf{x}\bullet \mathbf{y} = 0 because \lambda \neq \mu.

Now the procedure for diagonalizing a symmetric n \times n matrix is clear. Find the distinct eigenvalues (all real by Theorem [thm:016397]) and find orthonormal bases for each eigenspace (the Gram-Schmidt algorithm may be needed). Then the set of all these basis vectors is orthonormal (by Theorem [thm:024407]) and contains n vectors. Here is an example.

Example 024416. Orthogonally diagonalize the symmetric matrix A = \left[ \begin{array}{rrr} 8 & -2 & 2 \\ -2 & 5 & 4 \\ 2 & 4 & 5 \end{array}\right].

The characteristic polynomial is

c_{A}(x) = \det \left[ \begin{array}{ccc} x-8 & 2 & -2 \\ 2 & x-5 & -4 \\ -2 & -4 & x-5 \end{array}\right] = x(x-9)^2 \nonumber

Hence the distinct eigenvalues are 0 and 9 of multiplicities 1 and 2, respectively, so dim \;(E_{0}) = 1 and dim \;(E_{9}) = 2 by Theorem [thm:016250] (A is diagonalizable, being symmetric). Gaussian elimination gives

E_{0}(A) = span \;\{\mathbf{x}_{1}\}, \enskip \mathbf{x}_{1} = \left[ \begin{array}{r} 1 \\ 2 \\ -2 \end{array}\right], \quad \mbox{ and } \quad E_{9}(A) = span \; \left\lbrace \left[ \begin{array}{r} -2 \\ 1 \\ 0 \end{array}\right], \left[ \begin{array}{r} 2 \\ 0 \\ 1 \end{array}\right] \right\rbrace \nonumber

The eigenvectors in E_{9} are both orthogonal to \mathbf{x}_{1} as Theorem [thm:024407] guarantees, but not to each other. However, the Gram-Schmidt process yields an orthogonal basis

\{\mathbf{x}_{2}, \mathbf{x}_{3}\} \mbox{ of } E_{9}(A) \quad \mbox{ where } \quad \mathbf{x}_{2} = \left[ \begin{array}{r} -2 \\ 1 \\ 0 \end{array}\right] \mbox{ and } \mathbf{x}_{3} = \left[ \begin{array}{r} 2 \\ 4 \\ 5 \end{array}\right] \nonumber

Normalizing gives orthonormal vectors \{\frac{1}{3}\mathbf{x}_{1}, \frac{1}{\sqrt{5}}\mathbf{x}_{2}, \frac{1}{3\sqrt{5}}\mathbf{x}_{3}\}, so

P = \left[ \begin{array}{rrr} \frac{1}{3}\mathbf{x}_{1} & \frac{1}{\sqrt{5}}\mathbf{x}_{2} & \frac{1}{3\sqrt{5}}\mathbf{x}_{3} \end{array}\right] = \frac{1}{3\sqrt{5}}\left[ \begin{array}{rrr} \sqrt{5} & -6 & 2 \\ 2\sqrt{5} & 3 & 4 \\ -2\sqrt{5} & 0 & 5 \end{array}\right] \nonumber

is an orthogonal matrix such that P^{-1}AP is diagonal.
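
A NumPy check of the diagonalization just obtained:

import numpy as np

A = np.array([[ 8.0, -2.0, 2.0],
              [-2.0,  5.0, 4.0],
              [ 2.0,  4.0, 5.0]])

x1 = np.array([ 1.0, 2.0, -2.0]) / 3
x2 = np.array([-2.0, 1.0,  0.0]) / np.sqrt(5)
x3 = np.array([ 2.0, 4.0,  5.0]) / (3 * np.sqrt(5))

P = np.column_stack([x1, x2, x3])
print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))         # diag(0, 9, 9)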

It is worth noting that other, more convenient, diagonalizing matrices P exist. For example, \mathbf{y}_{2} = \left[ \begin{array}{r} 2 \\ 1 \\ 2 \end{array}\right] and \mathbf{y}_{3} = \left[ \begin{array}{r} -2 \\ 2 \\ 1 \end{array}\right] lie in E_{9}(A) and they are orthogonal. Moreover, they both have norm 3 (as does \mathbf{x}_{1}), so

Q = \left[ \begin{array}{ccc} \frac{1}{3}\mathbf{x}_{1} & \frac{1}{3}\mathbf{y}_{2} & \frac{1}{3}\mathbf{y}_{3} \end{array}\right] = \frac{1}{3}\left[ \begin{array}{rrr} 1 & 2 & -2 \\ 2 & 1 & 2 \\ -2 & 2 & 1 \end{array}\right] \nonumber

is a nicer orthogonal matrix with the property that Q^{-1}AQ is diagonal.
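
In practice, the whole procedure (real eigenvalues, orthonormal eigenvectors, and an orthogonal diagonalizing matrix) is exactly what numpy.linalg.eigh computes for a symmetric matrix; a sketch with the same A:

import numpy as np

A = np.array([[ 8.0, -2.0, 2.0],
              [-2.0,  5.0, 4.0],
              [ 2.0,  4.0, 5.0]])

# eigh is intended for symmetric (Hermitian) matrices; the eigenvector columns
# are returned already orthonormal, so P is an orthogonal diagonalizing matrix.
evals, P = np.linalg.eigh(A)
print(np.round(evals, 10))                        # [0, 9, 9] up to rounding
print(np.allclose(P.T @ P, np.eye(3)))            # True
print(np.allclose(P.T @ A @ P, np.diag(evals)))   # True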

If A is symmetric and a set of orthogonal eigenvectors of A is given, the eigenvectors are called principal axes of A. The name comes from geometry. An expression q = ax_{1}^2 + bx_{1}x_{2} + cx_{2}^2 is called a quadratic form in the variables x_{1} and x_{2}, and the graph of the equation q = 1 is called a conic in these variables. For example, if q = x_{1}x_{2}, the graph of q = 1 is given in the first diagram.

But if we introduce new variables y_{1} and y_{2} by setting x_{1} = y_{1} + y_{2} and x_{2} = y_{1} - y_{2}, then q becomes q = y_{1}^2 - y_{2}^2, a diagonal form with no cross term y_{1}y_{2} (see the second diagram). Because of this, the y_{1} and y_{2} axes are called the principal axes for the conic (hence the name). Orthogonal diagonalization provides a systematic method for finding principal axes. Here is an illustration.

Example 024463. Find principal axes for the quadratic form q = x_{1}^2 -4x_{1}x_{2} + x_{2}^2.

In order to utilize diagonalization, we first express q in matrix form. Observe that

q = \left[ \begin{array}{cc} x_{1} & x_{2} \end{array}\right] \left[ \begin{array}{rr} 1 & -4 \\ 0 & 1 \end{array}\right] \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array}\right] \nonumber

The matrix here is not symmetric, but we can remedy that by writing

q = x_{1}^2 -2x_{1}x_{2} - 2x_{2}x_{1} + x_{2}^2 \nonumber

Then we have

q = \left[ \begin{array}{cc} x_{1} & x_{2} \end{array}\right] \left[ \begin{array}{rr} 1 & -2 \\ -2 & 1 \end{array}\right] \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array}\right] = \mathbf{x}^TA\mathbf{x} \nonumber

where \mathbf{x} = \left[ \begin{array}{c} x_{1} \\ x_{2} \end{array}\right] and A = \left[ \begin{array}{rr} 1 & -2 \\ -2 & 1 \end{array}\right] is symmetric. The eigenvalues of A are \lambda_{1} = 3 and \lambda_{2} = -1, with corresponding (orthogonal) eigenvectors \mathbf{x}_{1} = \left[ \begin{array}{r} 1 \\ -1 \end{array}\right] and \mathbf{x}_{2} = \left[ \begin{array}{c} 1 \\ 1 \end{array}\right]. Since \| \mathbf{x}_{1} \| = \| \mathbf{x}_{2} \| = \sqrt{2},

P = \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array}\right] \mbox{ is orthogonal and } P^TAP = D = \left[ \begin{array}{rr} 3 & 0 \\ 0 & -1 \end{array}\right] \nonumber

Now define new variables \left[ \begin{array}{c} y_{1} \\ y_{2} \end{array}\right] = \mathbf{y} by \mathbf{y} = P^{T}\mathbf{x}, equivalently \mathbf{x} = P\mathbf{y} (since P^{-1} = P^{T}). Hence

y_{1} = \frac{1}{\sqrt{2}}(x_{1} - x_{2}) \quad \mbox{ and } \quad y_{2} = \frac{1}{\sqrt{2}}(x_{1} + x_{2}) \nonumber

In terms of y_{1} and y_{2}, q takes the form

q = \mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^T(P^TAP)\mathbf{y} = \mathbf{y}^TD\mathbf{y} = 3y_{1}^2 - y_{2}^2 \nonumber

Note that \mathbf{y} = P^{T}\mathbf{x} is obtained from \mathbf{x} by a counterclockwise rotation of \frac{\pi}{4} (see Theorem [thm:004693]).
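
A NumPy check that the change of variables really removes the cross term (the test point is an arbitrary choice):

import numpy as np

A = np.array([[ 1.0, -2.0],
              [-2.0,  1.0]])
P = np.array([[ 1.0, 1.0],
              [-1.0, 1.0]]) / np.sqrt(2)

x = np.array([0.8, -1.3])        # any point (x1, x2)
y = P.T @ x                      # the new coordinates (y1, y2)

q_x = x @ A @ x                  # x1^2 - 4 x1 x2 + x2^2
q_y = 3 * y[0]**2 - y[1]**2      # 3 y1^2 - y2^2
print(np.isclose(q_x, q_y))      # True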

Observe that the quadratic form q in Example [exa:024463] can be diagonalized in other ways. For example

q = x_{1}^2 - 4x_{1}x_2 + x_{2}^2 = z_{1}^2 - \frac{1}{3}z_{2}^2 \nonumber

where z_{1} = x_{1} -2x_{2} and z_{2} = 3x_{2}. We examine this more carefully in Section [sec:8_8].

If we are willing to replace “diagonal” by “upper triangular” in the principal axes theorem, we can weaken the requirement that A is symmetric to insisting only that A has real eigenvalues.

Triangulation Theorem 024503. If A is an n \times n matrix with n real eigenvalues, an orthogonal matrix P exists such that P^{T}AP is upper triangular.

We modify the proof of Theorem [thm:024303]. If A\mathbf{x}_{1} = \lambda_{1}\mathbf{x}_{1} where \|\mathbf{x}_{1}\| = 1, let \{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}\} be an orthonormal basis of \mathbb{R}^n, and let P_{1} = \left[ \begin{array}{cccc} \mathbf{x}_{1} & \mathbf{x}_{2} & \cdots & \mathbf{x}_{n} \end{array}\right]. Then P_{1} is orthogonal and P_{1}^TAP_{1} = \left[ \begin{array}{cc} \lambda_{1} & B \\ 0 & A_{1} \end{array}\right] in block form. By induction, let Q^{T}A_{1}Q = T_{1} be upper triangular where Q is of size (n-1)\times(n-1) and orthogonal. Then P_{2} = \left[ \begin{array}{cc} 1 & 0 \\ 0 & Q \end{array}\right] is orthogonal, so P = P_{1}P_{2} is also orthogonal and P^TAP = \left[ \begin{array}{cc} \lambda_{1} & BQ \\ 0 & T_{1} \end{array}\right] is upper triangular.

The proof of Theorem [thm:024503] gives no way to construct the matrix P. However, an algorithm will be given in Section [sec:11_1] where an improved version of Theorem [thm:024503] is presented. In a different direction, a version of Theorem [thm:024503] holds for an arbitrary matrix with complex entries (Schur’s theorem in Section [sec:8_6]).
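
For readers with SciPy available, its real Schur decomposition plays the role of Theorem 024503: it returns an orthogonal P with P^{T}AP upper triangular whenever all eigenvalues of A are real (complex pairs would instead appear as 2 \times 2 blocks on the diagonal). A sketch with a sample non-symmetric matrix (SciPy is an outside library, not part of the text):

import numpy as np
from scipy.linalg import schur

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # not symmetric; its eigenvalues 5 and 2 are real

T, P = schur(A, output='real')        # A = P T P^T with P orthogonal
print(np.allclose(P @ P.T, np.eye(2)))   # True: P is orthogonal
print(np.allclose(P.T @ A @ P, T))       # True: P^T A P = T is upper triangular
print(np.round(np.diag(T), 10))          # the eigenvalues of A on the diagonal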

As for a diagonal matrix, the eigenvalues of an upper triangular matrix are displayed along the main diagonal. Because A and P^{T}AP have the same determinant and trace whenever P is orthogonal, Theorem [thm:024503] gives:

Corollary 024536. If A is an n \times n matrix with real eigenvalues \lambda_{1}, \lambda_{2}, \dots, \lambda_{n} (possibly not all distinct), then \det A = \lambda_{1}\lambda_{2} \cdots \lambda_{n} and \mbox{tr } A = \lambda_{1} + \lambda_{2} + \dots + \lambda_{n}.

This corollary remains true even if the eigenvalues are not real (using Schur’s theorem).
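
A quick numerical illustration of the corollary, using the same sample matrix as the sketch above (any square matrix would do):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])            # eigenvalues 2 and 5

evals = np.linalg.eigvals(A)
print(np.isclose(np.prod(evals), np.linalg.det(A)))   # True: det A = 2 * 5
print(np.isclose(np.sum(evals),  np.trace(A)))        # True: tr A = 2 + 5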


This page titled 8.2: Orthogonal Diagonalization is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by W. Keith Nicholson (Lyryx Learning Inc.) via source content that was edited to the style and standards of the LibreTexts platform.
