
8.5: Computing Eigenvalues


    In practice, the eigenvalues of a matrix are virtually never found by computing the roots of the characteristic polynomial: this is difficult for large matrices, and iterative methods are much better. Two such methods are described briefly in this section.

    The Power Method

    In Chapter 3 our initial rationale for diagonalizing matrices was to be able to compute the powers of a square matrix, and the eigenvalues were needed to do this. In this section, we are interested in efficiently computing eigenvalues, and it may come as no surprise that the first method we discuss uses the powers of a matrix.

    Recall that an eigenvalue \(\lambda\) of an \(n \times n\) matrix \(A\) is called a dominant eigenvalue if \(\lambda\) has multiplicity \(1\), and

    \[|\lambda| > |\mu| \quad \mbox{ for all eigenvalues } \mu \neq \lambda \nonumber \]

    Any corresponding eigenvector is called a dominant eigenvector of \(A\). When such an eigenvalue exists, one technique for finding it is as follows: Let \(\mathbf{x}_{0}\) in \(\mathbb{R}^n\) be a first approximation to a dominant eigenvector of \(A\), and compute successive approximations \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots\) as follows:

    \[\mathbf{x}_{1} = A\mathbf{x}_{0} \quad \mathbf{x}_{2} = A\mathbf{x}_{1} \quad \mathbf{x}_{3} = A\mathbf{x}_{2} \quad \cdots \nonumber \]

    In general, we define

    \[\mathbf{x}_{k+1} = A\mathbf{x}_{k} \quad \mbox{ for each } k \geq 0 \nonumber \]

    If the first estimate \(\mathbf{x}_{0}\) is good enough, the vectors \(\mathbf{x}_{k}\) will approximate a dominant eigenvector of \(A\) (see below). This technique is called the power method (because \(\mathbf{x}_{k} = A^{k}\mathbf{x}_{0}\) for each \(k \geq 1\)). Observe that if \(\mathbf{z}\) is any eigenvector corresponding to \(\lambda\), then

    \[\frac{\mathbf{z}\bullet (A\mathbf{z})}{\| \mathbf{z} \|^2} = \frac{\mathbf{z}\bullet (\lambda\mathbf{z})}{\| \mathbf{z} \|^2} = \lambda \nonumber \]

    Because the vectors \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{n}, \dots\) approximate dominant eigenvectors, this suggests that we define the Rayleigh quotients as follows:

    \[r_{k} = \frac{\mathbf{x}_{k}\bullet \mathbf{x}_{k+1}}{\| \mathbf{x}_{k} \|^2} \quad \mbox{ for } k \geq 1 \nonumber \]

    Then the numbers \(r_{k}\) approximate the dominant eigenvalue \(\lambda\).
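
    The iteration and the Rayleigh quotients translate directly into a short program. The following is a minimal sketch in Python with NumPy; the function name power_method and the parameter num_steps are illustrative choices, not part of the text.

```python
import numpy as np

def power_method(A, x0, num_steps=20):
    """Power method sketch: x_{k+1} = A x_k, with Rayleigh quotients
    r_k = (x_k . x_{k+1}) / ||x_k||^2 approximating the dominant eigenvalue."""
    x = np.array(x0, dtype=float)
    rayleigh = []
    for _ in range(num_steps):
        x_next = A @ x
        rayleigh.append(np.dot(x, x_next) / np.dot(x, x))
        x = x_next
    return x, rayleigh   # x approximates a dominant eigenvector
```

    Note that the iterates grow like \(\lambda_{1}^k\), so a practical implementation would rescale \(\mathbf{x}_{k}\) at each step (for example, divide by its norm); this does not change the Rayleigh quotients, which are scale-invariant.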

    Example 025326. Use the power method to approximate a dominant eigenvector and eigenvalue of \(A = \left[ \begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array}\right]\).

    The eigenvalues of \(A\) are \(2\) and \(-1\), with eigenvectors \(\left[ \begin{array}{rr} 1 \\ 1 \end{array}\right]\) and \(\left[ \begin{array}{rr} 1 \\ -2 \end{array}\right]\). Take \(\mathbf{x}_{0} = \left[ \begin{array}{rr} 1 \\ 0 \end{array}\right]\) as the first approximation and compute \(\mathbf{x}_{1}, \mathbf{x}_{2}, \dots,\) successively, from \(\mathbf{x}_{1} = A\mathbf{x}_{0}, \mathbf{x}_{2} = A\mathbf{x}_{1}, \dots\). The result is

    \[\mathbf{x}_{1} = \left[ \begin{array}{rr} 1 \\ 2 \end{array}\right], \ \mathbf{x}_{2} = \left[ \begin{array}{rr} 3 \\ 2 \end{array}\right], \ \mathbf{x}_{3} = \left[ \begin{array}{rr} 5 \\ 6 \end{array}\right], \ \mathbf{x}_{4} = \left[ \begin{array}{rr} 11 \\ 10 \end{array}\right], \ \mathbf{x}_{5} = \left[ \begin{array}{rr} 21 \\ 22 \end{array}\right], \ \dots \nonumber \]

    These vectors are approaching scalar multiples of the dominant eigenvector \(\left[ \begin{array}{rr} 1 \\ 1 \end{array}\right]\). Moreover, the Rayleigh quotients are

    \[r_{1} = \frac{7}{5}, r_{2} = \frac{27}{13}, r_{3} = \frac{115}{61}, r_{4} = \frac{451}{221}, \dots \nonumber \]

    and these are approaching the dominant eigenvalue \(2\).
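
    For illustration, running the sketch above on this matrix reproduces these numbers (the call below uses the hypothetical power_method function and is an assumption, not part of the text):

```python
A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
x, r = power_method(A, [1.0, 0.0], num_steps=5)
# x is x_5 = [21, 22], nearly a multiple of the dominant eigenvector [1, 1]
# r[1:] is approximately [1.4, 2.077, 1.885, 2.041], i.e. r_1, ..., r_4 above
# (r[0] is the extra quotient x_0 . x_1 / ||x_0||^2 = 1 for k = 0)
```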

    To see why the power method works, let \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{m}\) be eigenvalues of \(A\) with \(\lambda_{1}\) dominant and let \(\mathbf{y}_{1}, \mathbf{y}_{2}, \dots, \mathbf{y}_{m}\) be corresponding eigenvectors. What is required is that the first approximation \(\mathbf{x}_{0}\) be a linear combination of these eigenvectors:

    \[\mathbf{x}_{0} = a_{1}\mathbf{y}_{1} + a_{2}\mathbf{y}_{2} + \dots + a_{m}\mathbf{y}_{m} \quad \mbox{ with } a_{1} \neq 0 \nonumber \]

    If \(k \geq 1\), the fact that \(\mathbf{x}_{k} = A^{k}\mathbf{x}_{0}\) and \(A^k\mathbf{y}_{i} = \lambda_{i}^k\mathbf{y}_{i}\) for each \(i\) gives

    \[\mathbf{x}_{k} = a_{1}\lambda_{1}^k\mathbf{y}_{1} + a_{2}\lambda_{2}^k\mathbf{y}_{2} + \dots + a_{m}\lambda_{m}^k\mathbf{y}_{m} \quad \mbox{ for } k \geq 1 \nonumber \]

    Hence

    \[\frac{1}{\lambda_{1}^k}\mathbf{x}_{k} = a_{1}\mathbf{y}_{1} + a_{2}\left(\frac{\lambda_{2}}{\lambda_{1}}\right)^k\mathbf{y}_{2} + \dots + a_{m}\left(\frac{\lambda_{m}}{\lambda_{1}}\right)^k\mathbf{y}_{m} \nonumber \]

    The right side approaches \(a_{1}\mathbf{y}_{1}\) as \(k\) increases because \(\lambda_{1}\) is dominant \(\left( \left|\frac{\lambda_{i}}{\lambda_{1}} \right| < 1 \mbox{ for each } i > 1 \right)\). Because \(a_{1} \neq 0\), this means that \(\mathbf{x}_{k}\) approximates the dominant eigenvector \(a_{1}\lambda_{1}^k\mathbf{y}_{1}\).

    The power method requires that the first approximation \(\mathbf{x}_{0}\) be a linear combination of eigenvectors. (In Example 025326 the eigenvectors form a basis of \(\mathbb{R}^2\).) But even in this case the method fails if \(a_{1} = 0\), where \(a_{1}\) is the coefficient of the dominant eigenvector (try \(\mathbf{x}_{0} = \left[ \begin{array}{rr} -1 \\ 2 \end{array}\right]\) in Example 025326). In general, the rate of convergence is quite slow if any of the ratios \(\left| \frac{\lambda_{i}}{\lambda_{1}} \right|\) is near \(1\). Also, because the method requires repeated multiplications by \(A\), it is not recommended unless these multiplications are easy to carry out (for example, if most of the entries of \(A\) are zero).
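
    The failure when \(a_{1} = 0\) is easy to see numerically. Using the hypothetical power_method sketch above with the suggested starting vector \(\mathbf{x}_{0} = \left[ \begin{array}{rr} -1 \\ 2 \end{array}\right]\) (an eigenvector for \(-1\)):

```python
# A as in Example 025326
x_bad, r_bad = power_method(A, [-1.0, 2.0], num_steps=5)
# r_bad == [-1.0, -1.0, -1.0, -1.0, -1.0]: every Rayleigh quotient is -1,
# because x_0 has no component along the dominant eigenvector [1, 1];
# the iterates simply alternate between [-1, 2] and [1, -2].
```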

    QR-Algorithm

    A much better method for approximating the eigenvalues of an invertible matrix \(A\) depends on the factorization (using the Gram-Schmidt algorithm) of \(A\) in the form

    \[A = QR \nonumber \]

    where \(Q\) is orthogonal and \(R\) is invertible and upper triangular (see Theorem 025166). The QR-algorithm uses this repeatedly to create a sequence of matrices \(A_{1} = A, A_{2}, A_{3}, \dots,\) as follows:

    1. Define \(A_{1} = A\) and factor it as \(A_{1} = Q_{1}R_{1}\).
    2. Define \(A_{2} = R_{1}Q_{1}\) and factor it as \(A_{2} = Q_{2}R_{2}\).
    3. Define \(A_{3} = R_{2}Q_{2}\) and factor it as \(A_{3} = Q_{3}R_{3}\).
      \(\vdots\)

    In general, \(A_{k}\) is factored as \(A_{k} = Q_{k}R_{k}\) and we define \(A_{k + 1} = R_{k}Q_{k}\). Then \(A_{k + 1}\) is similar to \(A_{k}\) [in fact, \(A_{k+1} = R_{k}Q_{k} = (Q_{k}^{-1}A_{k})Q_{k}\)], and hence each \(A_{k}\) has the same eigenvalues as \(A\). If the eigenvalues of \(A\) are real and have distinct absolute values, the remarkable thing is that the sequence of matrices \(A_{1}, A_{2}, A_{3}, \dots\) converges to an upper triangular matrix with these eigenvalues on the main diagonal. [See below for the case of complex eigenvalues.]
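
    The three steps above amount to a short loop: factor, swap the factors, repeat. Below is a minimal sketch in Python using NumPy's built-in QR factorization; the function name qr_algorithm and the fixed step count are illustrative assumptions.

```python
import numpy as np

def qr_algorithm(A, num_steps=50):
    """Basic (unshifted) QR iteration: factor A_k = Q_k R_k, then set
    A_{k+1} = R_k Q_k.  Each A_k is similar to A, so the eigenvalues are preserved."""
    Ak = np.array(A, dtype=float)
    for _ in range(num_steps):
        Q, R = np.linalg.qr(Ak)   # A_k = Q_k R_k
        Ak = R @ Q                # A_{k+1} = R_k Q_k = Q_k^{-1} A_k Q_k
    return Ak                     # approximate eigenvalues appear on the diagonal
```

    (NumPy's QR routine may choose the signs of \(Q\) and \(R\) differently than a hand computation, but \(R_{k}Q_{k}\) is similar to \(A_{k}\) either way, so the diagonal entries converge to the same eigenvalues.)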

    Example 025425. If \(A = \left[ \begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array}\right]\) as in Example 025326, use the QR-algorithm to approximate the eigenvalues.

    The matrices \(A_{1}\), \(A_{2}\), and \(A_{3}\) are as follows:

    \[\begin{aligned} A_{1} = & \left[ \begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array}\right] = Q_{1}R_{1} \quad \mbox{ where } Q_{1} = \frac{1}{\sqrt{5}}\left[ \begin{array}{rr} 1 & 2 \\ 2 & -1 \end{array}\right] \mbox{ and } R_{1} = \frac{1}{\sqrt{5}}\left[ \begin{array}{rr} 5 & 1 \\ 0 & 2 \end{array}\right] \\ A_{2} = & \frac{1}{5}\left[ \begin{array}{rr} 7 & 9 \\ 4 & -2 \end{array}\right] = \left[ \begin{array}{rr} 1.4 & 1.8 \\ 0.8 & -0.4 \end{array}\right] = Q_{2}R_{2} \\ &\mbox{ where } Q_{2} = \frac{1}{\sqrt{65}}\left[ \begin{array}{rr} 7 & 4 \\ 4 & -7 \end{array}\right] \mbox{ and } R_{2} = \frac{1}{\sqrt{65}}\left[ \begin{array}{rr} 13 & 11 \\ 0 & 10 \end{array}\right] \\ A_{3} = &\frac{1}{13}\left[ \begin{array}{rr} 27 & -5 \\ 8 & -14 \end{array}\right] = \left[ \begin{array}{rr} 2.08 & -0.38 \\ 0.62 & -1.08 \end{array}\right]\end{aligned} \nonumber \]

    This is converging to \(\left[ \begin{array}{rr} 2 & \ast \\ 0 & -1 \end{array}\right]\) and so is approximating the eigenvalues \(2\) and \(-1\) on the main diagonal.
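
    For illustration, continuing the iteration numerically with the hypothetical qr_algorithm sketch above (this particular run is an assumption, not part of the text):

```python
A = np.array([[1.0, 1.0],
              [2.0, 0.0]])
print(qr_algorithm(A, num_steps=20))
# roughly [[ 2.0,  * ],
#          [ 0.0, -1.0]] -- the (2,1) entry shrinks like (1/2)^k,
# so the diagonal settles near the eigenvalues 2 and -1
```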

    It is beyond the scope of this book to pursue a detailed discussion of these methods. The reader is referred to J. H. Wilkinson, The Algebraic Eigenvalue Problem (Oxford, England: Oxford University Press, 1965) or G. W. Stewart, Introduction to Matrix Computations (New York: Academic Press, 1973). We conclude with some remarks on the QR-algorithm.

    Shifting. Convergence is accelerated if, at stage \(k\) of the algorithm, a number \(s_{k}\) is chosen and \(A_{k} - s_{k}I\) is factored in the form \(Q_{k}R_{k}\) rather than \(A_{k}\) itself. Then

    \[Q_{k}^{-1}A_{k}Q_{k} = Q_{k}^{-1}(Q_{k}R_{k} + s_{k}I)Q_{k} = R_{k}Q_{k} + s_{k}I \nonumber \]

    so we take \(A_{k+1} = R_{k}Q_{k} + s_{k}I\). If the shifts \(s_{k}\) are carefully chosen, convergence can be greatly improved.
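
    In code, the shift changes only two lines of the basic loop: subtract \(s_{k}I\) before factoring and add it back afterwards. The sketch below uses the bottom-right entry of \(A_{k}\) as the shift; this particular choice (a Rayleigh-quotient shift) is an illustrative assumption, since the text does not specify how the \(s_{k}\) are chosen.

```python
import numpy as np

def qr_algorithm_shifted(A, num_steps=50):
    """Shifted QR iteration: factor A_k - s_k I = Q_k R_k and set
    A_{k+1} = R_k Q_k + s_k I, which is similar to A_k by the identity above."""
    Ak = np.array(A, dtype=float)
    n = Ak.shape[0]
    for _ in range(num_steps):
        s = Ak[-1, -1]                          # illustrative shift choice
        Q, R = np.linalg.qr(Ak - s * np.eye(n))
        Ak = R @ Q + s * np.eye(n)
    return Ak
```

    Practical implementations use more carefully chosen shifts (for example, the Wilkinson shift) together with deflation once an eigenvalue has converged.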

    Preliminary Preparation. A matrix such as

    \[\left[ \begin{array}{rrrrr} \ast & \ast & \ast & \ast & \ast \\ \ast & \ast & \ast & \ast & \ast \\ 0 & \ast & \ast & \ast & \ast \\ 0 & 0 & \ast & \ast & \ast \\ 0 & 0 & 0 & \ast & \ast \\ \end{array}\right] \nonumber \]

    is said to be in upper Hessenberg form, and the QR-factorizations of such matrices are greatly simplified. Given an \(n \times n\) matrix \(A\), a series of orthogonal matrices \(H_{1}, H_{2}, \dots, H_{m}\) (called Householder matrices) can be easily constructed such that

    \[B = H_{m}^T \cdots H_{1}^TAH_{1} \cdots H_{m} \nonumber \]

    is in upper Hessenberg form. Then the QR-algorithm can be efficiently applied to \(B\) and, because \(B\) is similar to \(A\), it produces the eigenvalues of \(A\).
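
    The following is a sketch of the Householder reduction to upper Hessenberg form, written out in NumPy for transparency (in practice one would call a library routine such as scipy.linalg.hessenberg). The function name and the details of the sign convention are illustrative choices.

```python
import numpy as np

def to_hessenberg(A):
    """Reduce A to upper Hessenberg form B = H^T A H, where H is a product of
    Householder reflections, so B is similar to A."""
    B = np.array(A, dtype=float)
    n = B.shape[0]
    H = np.eye(n)
    for k in range(n - 2):
        x = B[k + 1:, k]
        if np.linalg.norm(x[1:]) == 0.0:       # column already in Hessenberg shape
            continue
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)
        Hk = np.eye(n)
        Hk[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)   # Householder reflection
        B = Hk @ B @ Hk                              # Hk is symmetric and orthogonal
        H = H @ Hk
    return B, H   # B is upper Hessenberg; applying the QR iteration to B gives the eigenvalues of A
```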

    Complex Eigenvalues. If some of the eigenvalues of a real matrix \(A\) are not real, the QR-algorithm converges to a block upper triangular matrix where the diagonal blocks are either \(1 \times 1\) (the real eigenvalues) or \(2 \times 2\) (each providing a pair of conjugate complex eigenvalues of \(A\)).
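
    A small illustration of this behaviour, using the hypothetical qr_algorithm sketch above on a rotation matrix whose eigenvalues are the conjugate pair \(\pm i\):

```python
C = np.array([[0.0, -1.0],
              [1.0,  0.0]])      # eigenvalues are i and -i
print(qr_algorithm(C, num_steps=50))
# the iterates remain a single 2 x 2 block (never upper triangular);
# the block's eigenvalues are still the conjugate pair +i and -i
```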


    This page titled 8.5: Computing Eigenvalues is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by W. Keith Nicholson (Lyryx Learning Inc.) via source content that was edited to the style and standards of the LibreTexts platform.