8.4: QR-Factorization
One of the main virtues of orthogonal matrices is that they are easily inverted: the transpose is the inverse. This fact, combined with the factorization theorem in this section, provides a useful way to simplify many matrix calculations (for example, in least squares approximation).
Definition: QR-Factorization

Let \(A\) be an \(m \times n\) matrix with independent columns. A QR-factorization\(^{1}\) of \(A\) expresses it as \(A = QR\), where \(Q\) is \(m \times n\) with orthonormal columns and \(R\) is an invertible, upper triangular matrix with positive diagonal entries.
The importance of the factorization lies in the fact that there are computer algorithms that accomplish it with good control over round-off error, making it particularly useful in matrix calculations. The factorization is a matrix version of the Gram-Schmidt process.
Suppose \(A = \left[ \begin{array}{cccc} \mathbf{c}_{1} & \mathbf{c}_{2} & \cdots & \mathbf{c}_{n} \end{array}\right]\) is an \(m \times n\) matrix with linearly independent columns \(\mathbf{c}_{1}, \mathbf{c}_{2}, \dots, \mathbf{c}_{n}\). The Gram-Schmidt algorithm can be applied to these columns to provide orthogonal columns \(\mathbf{f}_{1}, \mathbf{f}_{2}, \dots, \mathbf{f}_{n}\) where \(\mathbf{f}_{1} = \mathbf{c}_{1}\) and
\[\mathbf{f}_{k} = \mathbf{c}_{k} - \frac{\mathbf{c}_{k}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} - \frac{\mathbf{c}_{k}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} - \dots - \frac{\mathbf{c}_{k}\bullet \mathbf{f}_{k-1}}{\| \mathbf{f}_{k-1} \|^2}\mathbf{f}_{k-1} \nonumber \]
for each \(k = 2, 3, \dots, n\). Now write \(\mathbf{q}_{k} = \frac{1}{\| \mathbf{f}_{k} \|}\mathbf{f}_{k}\) for each \(k\). Then \(\mathbf{q}_{1}, \mathbf{q}_{2}, \dots, \mathbf{q}_{n}\) are orthonormal columns, and the above equation becomes
\[\| \mathbf{f}_{k} \| \mathbf{q}_{k} = \mathbf{c}_{k} - (\mathbf{c}_{k}\bullet \mathbf{q}_{1})\mathbf{q}_{1} - (\mathbf{c}_{k}\bullet \mathbf{q}_{2})\mathbf{q}_{2} - \dots - (\mathbf{c}_{k}\bullet \mathbf{q}_{k-1})\mathbf{q}_{k-1} \nonumber \]
Using these equations, express each \(\mathbf{c}_{k}\) as a linear combination of the \(\mathbf{q}_{i}\):
\[\begin{array}{ccl} \mathbf{c}_{1} &=& \| \mathbf{f}_{1} \| \mathbf{q}_{1} \\ \mathbf{c}_{2} &=& (\mathbf{c}_{2}\bullet \mathbf{q}_{1})\mathbf{q}_{1} + \| \mathbf{f}_{2} \| \mathbf{q}_{2} \\ \mathbf{c}_{3} &=& (\mathbf{c}_{3}\bullet \mathbf{q}_{1})\mathbf{q}_{1} + (\mathbf{c}_{3}\bullet \mathbf{q}_{2})\mathbf{q}_{2} + \| \mathbf{f}_{3} \| \mathbf{q}_{3} \\ \vdots && \vdots \\ \mathbf{c}_{n} &=& (\mathbf{c}_{n}\bullet \mathbf{q}_{1})\mathbf{q}_{1} + (\mathbf{c}_{n}\bullet \mathbf{q}_{2})\mathbf{q}_{2} + (\mathbf{c}_{n}\bullet \mathbf{q}_{3})\mathbf{q}_{3} + \dots + \| \mathbf{f}_{n} \| \mathbf{q}_{n} \end{array} \nonumber \]
These equations have a matrix form that gives the required factorization:
\[\begin{aligned} A & = \left[ \begin{array}{ccccc} \mathbf{c}_{1} & \mathbf{c}_{2} & \mathbf{c}_{3} & \cdots & \mathbf{c}_{n} \end{array} \right] \nonumber \\ &= \left[ \begin{array}{ccccc} \mathbf{q}_{1} & \mathbf{q}_{2} & \mathbf{q}_{3} & \cdots & \mathbf{q}_{n} \end{array} \right] \left[ \begin{array}{ccccc} \| \mathbf{f}_{1} \| & \mathbf{c}_{2}\bullet \mathbf{q}_{1} & \mathbf{c}_{3}\bullet \mathbf{q}_{1} & \cdots & \mathbf{c}_{n}\bullet \mathbf{q}_{1} \\ 0 & \| \mathbf{f}_{2} \| & \mathbf{c}_{3}\bullet \mathbf{q}_{2} & \cdots & \mathbf{c}_{n}\bullet \mathbf{q}_{2} \\ 0 & 0 & \| \mathbf{f}_{3} \| & \cdots & \mathbf{c}_{n}\bullet \mathbf{q}_{3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \| \mathbf{f}_{n} \| \end{array} \right] \label{matrixFactEq}\end{aligned} \]
Here the first factor \(Q = \left[ \begin{array}{ccccc} \mathbf{q}_{1} & \mathbf{q}_{2} & \mathbf{q}_{3} & \cdots & \mathbf{q}_{n} \end{array}\right]\) has orthonormal columns, and the second factor is an \(n \times n\) upper triangular matrix \(R\) with positive diagonal entries (and so is invertible). We record this in the following theorem.
Theorem 1: QR-Factorization

Every \(m \times n\) matrix \(A\) with linearly independent columns has a QR-factorization \(A = QR\) where \(Q\) has orthonormal columns and \(R\) is upper triangular with positive diagonal entries.
The matrices \(Q\) and \(R\) in Theorem 1 are uniquely determined by \(A\); we return to this below.
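The construction above is itself an algorithm, so it translates directly into code. The following is a minimal sketch in Python (NumPy assumed; the function name qr_gram_schmidt is ours). It mirrors the classical Gram-Schmidt computation and is meant for illustration only: as noted earlier, production algorithms achieve better control over round-off error, for example by using Householder reflections.

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR-factorization by the classical Gram-Schmidt process.
    Assumes the columns of A are linearly independent; returns Q with
    orthonormal columns and upper triangular R (positive diagonal)
    satisfying A = Q @ R."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        f = A[:, k].copy()                # start from c_k
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]   # entry c_k . q_i of R
            f -= R[i, k] * Q[:, i]        # subtract the projection on q_i
        R[k, k] = np.linalg.norm(f)       # ||f_k|| > 0 by independence
        Q[:, k] = f / R[k, k]             # normalize: q_k = f_k / ||f_k||
    return Q, R
```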
Example 1

Find the QR-factorization of \(A = \left[ \begin{array}{rrr} 1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{array}\right]\).
Denote the columns of \(A\) as \(\mathbf{c}_{1}\), \(\mathbf{c}_{2}\), and \(\mathbf{c}_{3}\), and observe that \(\{\mathbf{c}_{1}, \mathbf{c}_{2}, \mathbf{c}_{3}\}\) is independent. If we apply the Gram-Schmidt algorithm to these columns, the result is:
\[\mathbf{f}_{1} = \mathbf{c}_{1} = \left[ \begin{array}{r} 1 \\ -1 \\ 0 \\ 0 \end{array}\right], \quad \mathbf{f}_{2} = \mathbf{c}_{2} - \frac{1}{2}\mathbf{f}_{1} = \left[ \def\arraystretch{1.2} \begin{array}{r} \frac{1}{2} \\ \frac{1}{2} \\ 1 \\ 0 \end{array}\right], \mbox{ and } \quad \mathbf{f}_{3} = \mathbf{c}_{3} + \frac{1}{2}\mathbf{f}_{1} - \mathbf{f}_{2} = \left[ \begin{array}{r} 0 \\ 0 \\ 0 \\ 1 \end{array}\right]. \nonumber \]
Write \(\mathbf{q}_{j} = \frac{1}{\| \mathbf{f}_{j} \|}\mathbf{f}_{j}\) for each \(j\), so \(\{\mathbf{q}_{1}, \mathbf{q}_{2}, \mathbf{q}_{3}\}\) is orthonormal. Then the matrix equation preceding Theorem 1 gives \(A = QR\) where
\[\begin{aligned} Q &= \left[ \begin{array}{ccc} \mathbf{q}_{1} & \mathbf{q}_{2} & \mathbf{q}_{3} \end{array}\right] = \left[ \def\arraystretch{1.3}\begin{array}{ccc} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & 0 \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & 0 \\ 0 & \frac{2}{\sqrt{6}} & 0 \\ 0 & 0 & 1 \end{array}\right] = \frac{1}{\sqrt{6}} \left[ \begin{array}{ccc} \sqrt{3} & 1 & 0 \\ -\sqrt{3} & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & \sqrt{6} \end{array} \right] \\ R &= \left[ \begin{array}{ccc} \| \mathbf{f}_{1} \| & \mathbf{c}_{2}\bullet \mathbf{q}_{1} & \mathbf{c}_{3}\bullet \mathbf{q}_{1} \\ 0 & \| \mathbf{f}_{2} \| & \mathbf{c}_{3}\bullet \mathbf{q}_{2} \\ 0 & 0 & \| \mathbf{f}_{3} \| \\ \end{array} \right] = \left[ \def\arraystretch{1.5} \begin{array}{ccc} \sqrt{2} & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \\ 0 & \frac{\sqrt{3}}{\sqrt{2}} & \frac{\sqrt{3}}{\sqrt{2}} \\ 0 & 0 & 1 \end{array} \right] = \frac{1}{\sqrt{2}}\left[ \begin{array}{ccc} 2 & 1 & -1 \\ 0 & \sqrt{3} & \sqrt{3} \\ 0 & 0 & \sqrt{2} \end{array} \right] \end{aligned} \nonumber \]
The reader can verify that indeed \(A = QR\).
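The verification can also be run numerically. Here is a small check using NumPy's built-in np.linalg.qr (we assume NumPy here). The underlying LAPACK routine does not guarantee positive diagonal entries in \(R\), so we first flip signs; the result must then agree with the \(Q\) and \(R\) computed by hand, by the uniqueness result proved at the end of this section.

```python
import numpy as np

# The matrix A of the example above
A = np.array([[ 1., 1., 0.],
              [-1., 0., 1.],
              [ 0., 1., 1.],
              [ 0., 0., 1.]])

Q, R = np.linalg.qr(A)            # reduced QR: Q is 4x3, R is 3x3
S = np.diag(np.sign(np.diag(R)))  # flip signs so that diag(R) > 0
Q, R = Q @ S, S @ R

# R computed by hand above: (1/sqrt(2)) * [[2,1,-1],[0,s3,s3],[0,0,s2]]
R_hand = np.array([[2.0, 1.0, -1.0],
                   [0.0, np.sqrt(3), np.sqrt(3)],
                   [0.0, 0.0, np.sqrt(2)]]) / np.sqrt(2)
assert np.allclose(R, R_hand) and np.allclose(Q @ R, A)
```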
If a matrix \(A\) has independent rows and we apply QR-factorization to \(A^{T}\), the result is:
Corollary 1

If \(A\) has independent rows, then \(A\) factors uniquely as \(A = LP\) where \(P\) has orthonormal rows and \(L\) is an invertible lower triangular matrix with positive main diagonal entries.
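Taking transposes makes this corollary computational: if \(A^{T} = QR\), then \(A = R^{T}Q^{T}\), so \(L = R^{T}\) and \(P = Q^{T}\). A minimal sketch (the helper name lp_factorization is ours, and NumPy is assumed):

```python
import numpy as np

def lp_factorization(A):
    """A = L @ P with L lower triangular (positive diagonal) and P with
    orthonormal rows, obtained by QR-factorizing A^T.  Assumes the rows
    of A are linearly independent."""
    Q, R = np.linalg.qr(np.asarray(A, dtype=float).T)  # A^T = Q R
    S = np.diag(np.sign(np.diag(R)))                   # make diag(R) > 0
    Q, R = Q @ S, S @ R
    return R.T, Q.T                                    # L = R^T, P = Q^T
```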
Since a square matrix with orthonormal columns is orthogonal, we have
Corollary 2

Every square, invertible matrix \(A\) has factorizations \(A = QR\) and \(A = LP\) where \(Q\) and \(P\) are orthogonal, \(R\) is upper triangular with positive diagonal entries, and \(L\) is lower triangular with positive diagonal entries.
In Section 5.6 we showed how to find a best approximation \(\mathbf{z}\) to a solution of a (possibly inconsistent) system \(A\mathbf{x} = \mathbf{b}\) of linear equations: take \(\mathbf{z}\) to be any solution of the “normal” equations \((A^{T}A)\mathbf{z} = A^{T}\mathbf{b}\). If \(A\) has independent columns, this \(\mathbf{z}\) is unique because \(A^{T}A\) is then invertible, so it is often desirable to compute \((A^{T}A)^{-1}\). This is particularly useful in least squares approximation, and it is simplified if we have a QR-factorization of \(A\); indeed, this is one of the main reasons for the importance of Theorem 1. For if \(A = QR\) is such a factorization, then \(Q^{T}Q = I_{n}\) because \(Q\) has orthonormal columns (verify), so we obtain
\[A^TA = R^TQ^TQR = R^TR \nonumber \]
Hence computing \((A^{T}A)^{-1}\) amounts to finding \(R^{-1}\), and this is a routine matter because \(R\) is upper triangular. Better still, since \(R^{T}\) is invertible, the normal equations \(R^{T}R\mathbf{z} = R^{T}Q^{T}\mathbf{b}\) reduce to \(R\mathbf{z} = Q^{T}\mathbf{b}\), which back substitution solves directly. Thus the difficulty in computing \((A^{T}A)^{-1}\) lies in obtaining the QR-factorization of \(A\).
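In code, this observation gives the standard QR route to least squares. The sketch below (the function name least_squares_qr is ours; NumPy assumed) solves \(R\mathbf{z} = Q^{T}\mathbf{b}\) by back substitution rather than forming \((A^{T}A)^{-1}\) at all:

```python
import numpy as np

def least_squares_qr(A, b):
    """Best approximation z to Ax = b, assuming A has independent
    columns: the normal equations (A^T A) z = A^T b reduce to
    R z = Q^T b, which back substitution solves directly."""
    Q, R = np.linalg.qr(np.asarray(A, dtype=float))  # reduced QR
    y = Q.T @ b
    n = R.shape[0]
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):   # back substitution: R is upper triangular
        z[i] = (y[i] - R[i, i + 1:] @ z[i + 1:]) / R[i, i]
    return z

# Sanity check: np.linalg.lstsq(A, b, rcond=None)[0] should agree
# with least_squares_qr(A, b) up to round-off.
```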
We conclude by proving the uniqueness of the QR-factorization.
Theorem 2: Uniqueness of the QR-Factorization

Let \(A\) be an \(m \times n\) matrix with independent columns. If \(A = QR\) and \(A = Q_{1}R_{1}\) are QR-factorizations of \(A\), then \(Q_{1} = Q\) and \(R_{1} = R\).
Proof. Write \(Q = \left[ \begin{array}{cccc} \mathbf{c}_{1} & \mathbf{c}_{2} & \cdots & \mathbf{c}_{n} \end{array} \right]\) and \(Q_{1} = \left[ \begin{array}{cccc} \mathbf{d}_{1} & \mathbf{d}_{2} & \cdots & \mathbf{d}_{n} \end{array} \right]\) in terms of their columns, and observe first that \(Q^TQ = I_{n} = Q_{1}^TQ_{1}\) because \(Q\) and \(Q_{1}\) have orthonormal columns. It suffices to show that \(Q_{1} = Q\) (for then \(R_{1} = Q_{1}^TA = Q^TA = R\)). Since \(Q_{1}^TQ_{1} = I_{n}\), the equation \(QR = Q_{1}R_{1}\) gives \(Q_{1}^TQ = R_{1}R^{-1}\); for convenience we write this matrix as
\[Q_{1}^TQ = R_{1}R^{-1} = \left[ t_{ij} \right] \nonumber \]
This matrix is upper triangular with positive diagonal elements (this is true for \(R\) and \(R_{1}\), hence for \(R^{-1}\), and hence for the product \(R_{1}R^{-1}\)), so \(t_{ii} > 0\) for each \(i\) and \(t_{ij} = 0\) if \(i > j\). On the other hand, the \((i, j)\)-entry of \(Q_{1}^TQ\) is \(\mathbf{d}_{i}^T\mathbf{c}_{j} = \mathbf{d}_{i}\bullet \mathbf{c}_{j}\), so we have \(\mathbf{d}_{i}\bullet \mathbf{c}_{j} = t_{ij}\) for all \(i\) and \(j\). But each \(\mathbf{c}_{j}\) is in \(\text{span}\{\mathbf{d}_{1}, \mathbf{d}_{2}, \dots, \mathbf{d}_{n}\}\) because \(Q = Q_{1}(R_{1}R^{-1})\). Hence the expansion theorem gives
\[\mathbf{c}_{j} = (\mathbf{d}_{1}\bullet \mathbf{c}_{j})\mathbf{d}_{1} + (\mathbf{d}_{2}\bullet \mathbf{c}_{j})\mathbf{d}_{2} + \dots + (\mathbf{d}_{n}\bullet \mathbf{c}_{j})\mathbf{d}_{n} = t_{1j}\mathbf{d}_{1} + t_{2j}\mathbf{d}_{2} + \dots + t_{jj}\mathbf{d}_{j} \nonumber \]
because \(\mathbf{d}_{i}\bullet \mathbf{c}_{j} = t_{ij} = 0\) if \(i > j\). The first few equations here are
\[\begin{array}{ccl} \mathbf{c}_{1} &=& t_{11}\mathbf{d}_{1} \\ \mathbf{c}_{2} &=& t_{12}\mathbf{d}_{1} + t_{22}\mathbf{d}_{2} \\ \mathbf{c}_{3} &=& t_{13}\mathbf{d}_{1} + t_{23}\mathbf{d}_{2} + t_{33}\mathbf{d}_{3} \\ \mathbf{c}_{4} &=& t_{14}\mathbf{d}_{1} + t_{24}\mathbf{d}_{2} + t_{34}\mathbf{d}_{3} + t_{44}\mathbf{d}_{4} \\ \vdots && \vdots \end{array} \nonumber \]
The first of these equations gives \(1 = \| \mathbf{c}_{1} \| = \| t_{11}\mathbf{d}_{1} \| = | t_{11} | \| \mathbf{d}_{1} \| = t_{11}\), whence \(\mathbf{c}_{1} = \mathbf{d}_{1}\). But then \(t_{12} = \mathbf{d}_{1}\bullet \mathbf{c}_{2} = \mathbf{c}_{1}\bullet \mathbf{c}_{2} = 0\), so the second equation becomes \(\mathbf{c}_{2} = t_{22}\mathbf{d}_{2}\). A similar argument gives \(\mathbf{c}_{2} = \mathbf{d}_{2}\), and then \(t_{13} = 0\) and \(t_{23} = 0\) follow in the same way. Hence \(\mathbf{c}_{3} = t_{33}\mathbf{d}_{3}\), so \(\mathbf{c}_{3} = \mathbf{d}_{3}\). Continuing in this way, we get \(\mathbf{c}_{i} = \mathbf{d}_{i}\) for all \(i\). This means \(Q_{1} = Q\), which is what we wanted.
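Uniqueness also connects back to the equation \(A^{T}A = R^{T}R\) above: that equation says \(R^{T}\) is exactly the Cholesky factor of \(A^{T}A\), which is itself unique when its diagonal is positive. A small numerical illustration of this (NumPy assumed; the seed and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # columns independent with probability 1

# Householder-based QR, sign-normalized so that diag(R) > 0
Q, R = np.linalg.qr(A)
S = np.diag(np.sign(np.diag(R)))
Q, R = Q @ S, S @ R

# A^T A = R^T R says R^T is a Cholesky factor of A^T A; since the
# Cholesky factor with positive diagonal is unique, R must equal L^T.
L = np.linalg.cholesky(A.T @ A)   # lower triangular, positive diagonal
assert np.allclose(R, L.T) and np.allclose(Q, A @ np.linalg.inv(R))
```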
1. This section is not used elsewhere in the book.