8.1: Orthogonal Complements and Projections
If \(\{\mathbf{v}_{1}, \dots , \mathbf{v}_{m}\}\) is linearly independent in a general vector space, and if \(\mathbf{v}_{m+1}\) is not in \(span \;\{\mathbf{v}_{1}, \dots , \mathbf{v}_{m}\}\), then \(\{\mathbf{v}_{1}, \dots , \mathbf{v}_{m}, \mathbf{v}_{m+1}\}\) is independent (Lemma [lem:019357]). Here is the analog for orthogonal sets in \(\mathbb{R}^n\).
Orthogonal Lemma023597 Let \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots , \mathbf{f}_{m}\}\) be an orthogonal set in \(\mathbb{R}^n\). Given \(\mathbf{x}\) in \(\mathbb{R}^n\), write
\[\mathbf{f}_{m+1} = \mathbf{x} - \frac{\mathbf{x}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} - \frac{\mathbf{x}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} - \dots - \frac{\mathbf{x}\bullet \mathbf{f}_{m}}{\| \mathbf{f}_{m} \|^2}\mathbf{f}_{m} \nonumber \]
Then:
- \(\mathbf{f}_{m+1}\bullet \mathbf{f}_{k} = 0\) for \(k = 1, 2, \dots , m\).
- If \(\mathbf{x}\) is not in \(span \;\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\}\), then \(\mathbf{f}_{m+1} \neq \mathbf{0}\) and \(\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}, \mathbf{f}_{m+1}\}\) is an orthogonal set.
For convenience, write \(t_{i} = (\mathbf{x}\bullet \mathbf{f}_{i}) / \|\mathbf{f}_{i}\|^{2}\) for each \(i\). Given \(1 \leq k \leq m\):
\[\begin{aligned} \mathbf{f}_{m+1}\bullet \mathbf{f}_{k} &= (\mathbf{x} - t_{1}\mathbf{f}_{1} - \dots -t_{k}\mathbf{f}_{k} - \dots -t_{m}\mathbf{f}_{m})\bullet \mathbf{f}_{k} \\ &= \mathbf{x}\bullet \mathbf{f}_{k} - t_{1}(\mathbf{f}_{1}\bullet \mathbf{f}_{k}) - \dots -t_{k}(\mathbf{f}_{k}\bullet \mathbf{f}_{k}) - \dots -t_{m}(\mathbf{f}_{m}\bullet \mathbf{f}_{k}) \\ &= \mathbf{x}\bullet \mathbf{f}_{k} - t_{k} \| \mathbf{f}_{k} \|^2 \\ &= 0 \end{aligned} \nonumber \]
This proves (1). For (2), note that if \(\mathbf{f}_{m+1} = \mathbf{0}\), the defining equation would express \(\mathbf{x}\) as a linear combination of \(\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\), contrary to the assumption that \(\mathbf{x}\) is not in \(span \;\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\}\). Hence \(\mathbf{f}_{m+1} \neq \mathbf{0}\), and \(\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}, \mathbf{f}_{m+1}\}\) is an orthogonal set by (1).
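As a quick numerical illustration (a sketch assuming NumPy; the vectors below are our own choice, not from the text), the vector \(\mathbf{f}_{m+1}\) produced by the lemma is indeed orthogonal to each \(\mathbf{f}_{k}\):

```python
import numpy as np

# Start from an orthogonal set {f1, f2} in R^4 and a vector x outside their span.
f1 = np.array([1.0, 1.0, 0.0, 0.0])
f2 = np.array([1.0, -1.0, 2.0, 0.0])
assert abs(f1 @ f2) < 1e-12          # {f1, f2} is orthogonal

x = np.array([1.0, 2.0, 3.0, 4.0])   # not in span{f1, f2}: its last entry is nonzero

# The vector f3 = x - sum_i ((x . fi)/||fi||^2) fi from the lemma:
f3 = x - (x @ f1) / (f1 @ f1) * f1 - (x @ f2) / (f2 @ f2) * f2

print(f3 @ f1, f3 @ f2)              # both zero up to rounding
```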
The orthogonal lemma has three important consequences for \(\mathbb{R}^n\). The first is an extension for orthogonal sets of the fundamental fact that any independent set is part of a basis (Theorem [thm:019430]).
023635 Let \(U\) be a subspace of \(\mathbb{R}^n\).
- Every orthogonal subset \(\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\}\) in \(U\) is a subset of an orthogonal basis of \(U\).
- \(U\) has an orthogonal basis.
- If \(span \;\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\} = U\), it is already a basis. Otherwise, there exists \(\mathbf{x}\) in \(U\) outside \(span \;\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\}\). If \(\mathbf{f}_{m+1}\) is as given in the orthogonal lemma, then \(\mathbf{f}_{m+1}\) is in \(U\) and \(\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}, \mathbf{f}_{m+1}\}\) is orthogonal. If \(span \;\{\mathbf{f}_{1}, \dots, \mathbf{f}_{m}, \mathbf{f}_{m+1}\} = U\), we are done. Otherwise, the process continues to create larger and larger orthogonal subsets of \(U\). They are all independent by Theorem [thm:015056], so we have a basis when we reach a subset containing dim \(U\) vectors.
- If \(U = \{\mathbf{0}\}\), the empty basis is orthogonal. Otherwise, if \(\mathbf{f} \neq \mathbf{0}\) is in \(U\), then \(\{\mathbf{f}\}\) is orthogonal, so (2) follows from (1).
We can improve upon (2) of Theorem [thm:023635]. In fact, the second consequence of the orthogonal lemma is a procedure by which any basis \(\{\mathbf{x}_{1}, \dots , \mathbf{x}_{m}\}\) of a subspace \(U\) of \(\mathbb{R}^n\) can be systematically modified to yield an orthogonal basis \(\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\}\) of \(U\). The \(\mathbf{f}_{i}\) are constructed one at a time from the \(\mathbf{x}_{i}\).
To start the process, take \(\mathbf{f}_{1} = \mathbf{x}_{1}\). Then \(\mathbf{x}_{2}\) is not in \(span \;\{\mathbf{f}_{1}\}\) because \(\{\mathbf{x}_{1}, \mathbf{x}_{2}\}\) is independent, so take
\[\mathbf{f}_{2} = \mathbf{x}_{2} - \frac{\mathbf{x}_{2}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} \nonumber \]
Thus \(\{\mathbf{f}_{1}, \mathbf{f}_{2}\}\) is orthogonal by Lemma [lem:023597]. Moreover, \(span \;\{\mathbf{f}_{1}, \mathbf{f}_{2}\} = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}\}\) (verify), so \(\mathbf{x}_{3}\) is not in \(span \;\{\mathbf{f}_{1}, \mathbf{f}_{2}\}\). Hence \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \mathbf{f}_{3}\}\) is orthogonal where
\[\mathbf{f}_{3} = \mathbf{x}_{3} - \frac{\mathbf{x}_{3}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} - \frac{\mathbf{x}_{3}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} \nonumber \]
Again, \(span \;\{\mathbf{f}_{1}, \mathbf{f}_{2}, \mathbf{f}_{3}\} = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}, \mathbf{x}_{3}\}\), so \(\mathbf{x}_{4}\) is not in \(span \;\{\mathbf{f}_{1}, \mathbf{f}_{2}, \mathbf{f}_{3}\}\) and the process continues. At the \(m\)th iteration we construct an orthogonal set \(\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\}\) such that
\[span \;\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots, \mathbf{f}_{m}\} = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{m}\} = U \nonumber \]
Hence \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots , \mathbf{f}_{m}\}\) is the desired orthogonal basis of \(U\). The procedure can be summarized as follows.
Gram-Schmidt Orthogonalization Algorithm023713 If \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots , \mathbf{x}_{m}\}\) is any basis of a subspace \(U\) of \(\mathbb{R}^n\), construct \(\mathbf{f}_{1}, \mathbf{f}_{2}, \dots , \mathbf{f}_{m}\) in \(U\) successively as follows:
\[\begin{array}{ccl} \mathbf{f}_{1} &=& \mathbf{x}_{1} \\ \mathbf{f}_{2} &=& \mathbf{x}_{2} - \frac{\mathbf{x}_{2}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} \\ \mathbf{f}_{3} &=& \mathbf{x}_{3} - \frac{\mathbf{x}_{3}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} - \frac{\mathbf{x}_{3}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} \\ \vdots &&\\ \mathbf{f}_{k} &=& \mathbf{x}_{k} - \frac{\mathbf{x}_{k}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} - \frac{\mathbf{x}_{k}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} - \dots -\frac{\mathbf{x}_{k}\bullet \mathbf{f}_{k-1}}{\| \mathbf{f}_{k-1} \|^2}\mathbf{f}_{k-1} \end{array} \nonumber \]
for each \(k = 2, 3, \dots , m\). Then
- \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots , \mathbf{f}_{m}\}\) is an orthogonal basis of \(U\).
- \(span \;\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots , \mathbf{f}_{k}\} = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots , \mathbf{x}_{k}\}\) for each \(k = 1, 2, \dots , m\).
The process (for \(k = 3\)) is depicted in the diagrams. Of course, the algorithm converts any basis of \(\mathbb{R}^n\) itself into an orthogonal basis.
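The algorithm translates directly into code. Below is a minimal sketch in Python with NumPy (the function name is our own); it subtracts the projections against the already-built vectors one at a time, the numerically friendlier "modified" form, which agrees with the formulas above in exact arithmetic because the \(\mathbf{f}_{i}\) are mutually orthogonal.

```python
import numpy as np

def gram_schmidt(xs):
    """Convert a list of linearly independent vectors into an orthogonal
    list with the same span, following the algorithm above."""
    fs = []
    for x in xs:
        f = np.array(x, dtype=float)
        for g in fs:
            f -= (f @ g) / (g @ g) * g   # remove the component of f along g
        fs.append(f)
    return fs

# The rows of the matrix A in the example that follows:
fs = gram_schmidt([[1, 1, -1, -1], [3, 2, 0, 1], [1, 0, 1, 0]])
print(fs[1])   # [2. 1. 1. 2.]
print(fs[2])   # [ 0.4 -0.3  0.7 -0.6]  =  (1/10)(4, -3, 7, -6)
```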
023743 Find an orthogonal basis of the row space of \(A = \left[ \begin{array}{rrrr} 1 & 1 & -1 & -1\\ 3 & 2 & 0 & 1\\ 1 & 0 & 1 & 0 \end{array} \right]\).
Let \(\mathbf{x}_{1}\), \(\mathbf{x}_{2}\), \(\mathbf{x}_{3}\) denote the rows of \(A\) and observe that \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \mathbf{x}_{3}\}\) is linearly independent. Take \(\mathbf{f}_{1} = \mathbf{x}_{1}\). The algorithm gives
\[\begin{aligned} \mathbf{f}_{2} &= \mathbf{x}_{2} - \frac{\mathbf{x}_{2}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} = (3, 2, 0, 1) - \frac{4}{4}(1, 1, -1, -1) = (2, 1, 1, 2) \\ \mathbf{f}_{3} &= \mathbf{x}_{3} - \frac{\mathbf{x}_{3}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} - \frac{\mathbf{x}_{3}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} = \mathbf{x}_{3} - \frac{0}{4}\mathbf{f}_{1} - \frac{3}{10}\mathbf{f}_{2} = \frac{1}{10}(4, -3, 7, -6)\end{aligned} \nonumber \]
Hence \(\{(1, 1, -1, -1), (2, 1, 1, 2), \frac{1}{10}(4, -3, 7, -6)\}\) is the orthogonal basis provided by the algorithm. In hand calculations it may be convenient to eliminate fractions (see the Remark below), so \(\{(1, 1, -1, -1), (2, 1, 1, 2), (4, -3, 7, -6)\}\) is also an orthogonal basis for row \(A\).
Observe that the vector \(\frac{\mathbf{x}\bullet \mathbf{f}_{i}}{\| \mathbf{f}_{i} \|^2}\mathbf{f}_{i}\) is unchanged if a nonzero scalar multiple of \(\mathbf{f}_{i}\) is used in place of \(\mathbf{f}_{i}\). Hence, if a newly constructed \(\mathbf{f}_{i}\) is multiplied by a nonzero scalar at some stage of the Gram-Schmidt algorithm, the subsequent \(\mathbf{f}\)s will be unchanged. This is useful in actual calculations.
Projections
Suppose a point \(\mathbf{x}\) and a plane \(U\) through the origin in \(\mathbb{R}^3\) are given, and we want to find the point \(\mathbf{p}\) in the plane that is closest to \(\mathbf{x}\). Our geometric intuition assures us that such a point \(\mathbf{p}\) exists. In fact (see the diagram), \(\mathbf{p}\) must be chosen in such a way that \(\mathbf{x} - \mathbf{p}\) is perpendicular to the plane.
Now we make two observations: first, the plane \(U\) is a subspace of \(\mathbb{R}^3\) (because \(U\) contains the origin); and second, the condition that \(\mathbf{x} - \mathbf{p}\) is perpendicular to the plane \(U\) means that \(\mathbf{x} - \mathbf{p}\) is orthogonal to every vector in \(U\). In these terms the whole discussion makes sense in \(\mathbb{R}^n\). Furthermore, the orthogonal lemma provides exactly what is needed to find \(\mathbf{p}\) in this more general setting.
Orthogonal Complement of a Subspace of \(\mathbb{R}^n\)023776 If \(U\) is a subspace of \(\mathbb{R}^n\), define the orthogonal complement \(U^\perp\) of \(U\) (pronounced “\(U\)-perp”) by
\[U^\perp = \{\mathbf{x} \mbox{ in } \mathbb{R}^n \mid \mathbf{x}\bullet \mathbf{y} = 0 \mbox{ for all } \mathbf{y} \mbox{ in } U\} \nonumber \]
The following lemma collects some useful properties of the orthogonal complement; the proof of (1) and (2) is left as Exercise [ex:8_1_6].
023783 Let \(U\) be a subspace of \(\mathbb{R}^n\).
- \(U^\perp\) is a subspace of \(\mathbb{R}^n\).
- \(\{\mathbf{0}\}^\perp = \mathbb{R}^n\) and \((\mathbb{R}^n)^\perp = \{\mathbf{0}\}\).
- If \(U = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k}\}\), then \(U^\perp = \{\mathbf{x} \mbox{ in } \mathbb{R}^n \mid \mathbf{x}\bullet \mathbf{x}_{i} = 0 \mbox{ for } i = 1, 2, \dots, k\}\).
- Let \(U = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{k}\}\); we must show that \(U^\perp = \{\mathbf{x} \mid \mathbf{x}\bullet \mathbf{x}_{i} = 0 \mbox{ for each } i\}\). If \(\mathbf{x}\) is in \(U^\perp\) then \(\mathbf{x}\bullet \mathbf{x}_{i} = 0\) for all \(i\) because each \(\mathbf{x}_{i}\) is in \(U\). Conversely, suppose that \(\mathbf{x}\bullet \mathbf{x}_{i} = 0\) for all \(i\); we must show that \(\mathbf{x}\) is in \(U^\perp\), that is, \(\mathbf{x}\bullet \mathbf{y} = 0\) for each \(\mathbf{y}\) in \(U\). Write \(\mathbf{y} = r_{1}\mathbf{x}_{1} + r_{2}\mathbf{x}_{2} + \dots + r_{k}\mathbf{x}_{k}\), where each \(r_{i}\) is in \(\mathbb{R}\). Then, using Theorem [thm:014833],
\[\mathbf{x}\bullet \mathbf{y} = r_{1}(\mathbf{x}\bullet \mathbf{x}_{1}) + r_{2}(\mathbf{x}\bullet \mathbf{x}_{2})+ \dots +r_{k}(\mathbf{x}\bullet \mathbf{x}_{k}) = r_{1}0 + r_{2}0 + \dots + r_{k}0 = 0 \nonumber \]
023829 Find \(U^\perp\) if \(U = span \;\{(1, -1, 2, 0), (1, 0, -2, 3)\}\) in \(\mathbb{R}^4\).
By Lemma [lem:023783], \(\mathbf{x} = (x, y, z, w)\) is in \(U^\perp\) if and only if it is orthogonal to both \((1, -1, 2, 0)\) and \((1, 0, -2, 3)\); that is,
\[\def\arraycolsep{1.5pt} \begin{array}{rrrrrrrr} x & - & y & + & 2z & & & =0\\ x & & & - & 2z & +& 3w & =0 \end{array} \nonumber \]
Gaussian elimination gives \(U^\perp = span \;\{(2, 4, 1, 0), (3, 3, 0, -1)\}\).
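Part (3) of the lemma says that \(U^\perp\) is the null space of the matrix whose rows span \(U\), so it can be computed mechanically. Here is a sketch using SymPy for exact arithmetic (assuming SymPy is available):

```python
from sympy import Matrix

# The rows span U; by the lemma, U-perp is the null space of this matrix.
A = Matrix([[1, -1, 2, 0],
            [1,  0, -2, 3]])

basis = A.nullspace()   # a basis of U-perp, found by Gaussian elimination
for v in basis:
    print(v.T)
```

Any basis of the same null space is acceptable; SymPy's choice may differ from the one in the text by signs or scalar multiples, but it spans the same subspace \(U^\perp\).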
Now consider vectors \(\mathbf{x}\) and \(\mathbf{d} \neq \mathbf{0}\) in \(\mathbb{R}^3\). The projection \(\mathbf{p} = \proj{\mathbf{d}}{\mathbf{x}}\) of \(\mathbf{x}\) on \(\mathbf{d}\) was defined in Section [sec:4_2] as in the diagram.
The following formula for \(\mathbf{p}\) was derived in Theorem [thm:011958]
\[\mathbf{p} = \proj{\mathbf{d}}{\mathbf{x}} = \left(\frac{\mathbf{x}\bullet \mathbf{d}}{\|\mathbf{d}\|^2} \right)\mathbf{d} \nonumber \]
where it is shown that \(\mathbf{x} - \mathbf{p}\) is orthogonal to \(\mathbf{d}\). Now observe that the line \(U = \mathbb{R}\mathbf{d} = \{t\mathbf{d} \mid t \in \mathbb{R}\}\) is a subspace of \(\mathbb{R}^3\), that \(\{\mathbf{d}\}\) is an orthogonal basis of \(U\), and that \(\mathbf{p} \in U\) and \(\mathbf{x} - \mathbf{p} \in U^\perp\) (by Theorem [thm:011958]).
In this form, this makes sense for any vector \(\mathbf{x}\) in \(\mathbb{R}^n\) and any subspace \(U\) of \(\mathbb{R}^n\), so we generalize it as follows. If \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots, \mathbf{f}_{m}\}\) is an orthogonal basis of \(U\), we define the projection \(\mathbf{p}\) of \(\mathbf{x}\) on \(U\) by the formula
\[\label{projPofXonUeq} \mathbf{p} = \left(\frac{\mathbf{x}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\right)\mathbf{f}_{1} + \left(\frac{\mathbf{x}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\right)\mathbf{f}_{2}+ \dots +\left(\frac{\mathbf{x}\bullet \mathbf{f}_{m}}{\| \mathbf{f}_{m} \|^2}\right)\mathbf{f}_{m} \nonumber \]
Then \(\mathbf{p} \in U\) and (by the orthogonal lemma) \(\mathbf{x} - \mathbf{p} \in U^\perp\), so it looks like we have a generalization of Theorem [thm:011958].
However there is a potential problem: the formula ([projPofXonUeq]) for \(\mathbf{p}\) must be shown to be independent of the choice of the orthogonal basis \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots, \mathbf{f}_{m}\}\). To verify this, suppose that \(\{\mathbf{f}_1^\prime, \mathbf{f}_2^\prime, \dots, \mathbf{f}_m^\prime \}\) is another orthogonal basis of \(U\), and write
\[\mathbf{p}^{\prime} = \left(\frac{\mathbf{x}\bullet \mathbf{f}^{\prime}_{1}}{\| \mathbf{f}^{\prime}_{1} \|^2}\right)\mathbf{f}^{\prime}_{1} + \left(\frac{\mathbf{x}\bullet \mathbf{f}^{\prime}_{2}}{\| \mathbf{f}^{\prime}_{2} \|^2}\right)\mathbf{f}^{\prime}_{2} + \dots +\left(\frac{\mathbf{x}\bullet \mathbf{f}^{\prime}_{m}}{\| \mathbf{f}^{\prime}_{m} \|^2}\right)\mathbf{f}^{\prime}_{m} \nonumber \]
As before, \(\mathbf{p}^{\prime} \in U\) and \(\mathbf{x} - \mathbf{p}^{\prime} \in U^\perp\), and we must show that \(\mathbf{p}^{\prime} = \mathbf{p}\). To see this, write the vector \(\mathbf{p} - \mathbf{p}^\prime\) as follows:
\[\mathbf{p} - \mathbf{p}^{\prime} = (\mathbf{x} - \mathbf{p}^{\prime}) - (\mathbf{x} - \mathbf{p}) \nonumber \]
This vector is in \(U\) (because \(\mathbf{p}\) and \(\mathbf{p}^\prime\) are in \(U\)) and it is in \(U^\perp\) (because \(\mathbf{x} - \mathbf{p}^\prime\) and \(\mathbf{x} - \mathbf{p}\) are in \(U^\perp\)), and so it must be zero (it is orthogonal to itself!). This means \(\mathbf{p}^\prime = \mathbf{p}\) as desired.
Hence, the vector \(\mathbf{p}\) in equation ([projPofXonUeq]) depends only on \(\mathbf{x}\) and the subspace \(U\), and not on the choice of orthogonal basis \(\{\mathbf{f}_{1}, \dots, \mathbf{f}_{m}\}\) of \(U\) used to compute it. Thus, we are entitled to make the following definition:
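This basis-independence is easy to check numerically as well. The sketch below (assuming NumPy; the two bases are our own illustrative choice) computes the projection of one vector using two different orthogonal bases of the same plane \(U\) in \(\mathbb{R}^4\) and confirms the results agree.

```python
import numpy as np

def proj(x, basis):
    """Projection of x on U, computed from an orthogonal basis of U
    by the formula above."""
    return sum((x @ f) / (f @ f) * f for f in basis)

x = np.array([3.0, -1.0, 0.0, 2.0])

# Two different orthogonal bases of the same plane U (illustrative choice):
B1 = [np.array([1.0, 1.0, 0.0, 1.0]), np.array([-1.0, 0.0, 1.0, 1.0])]
B2 = [B1[0] + B1[1], B1[0] - B1[1]]   # still orthogonal since ||f1|| = ||f2||

print(proj(x, B1))   # the same vector either way
print(proj(x, B2))
```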
Projection onto a Subspace of \(\mathbb{R}^n\)023874 Let \(U\) be a subspace of \(\mathbb{R}^n\) with orthogonal basis \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots, \mathbf{f}_{m}\}\). If \(\mathbf{x}\) is in \(\mathbb{R}^n\), the vector
\[{\normalfont \proj{U}{\mathbf{x}} =\frac{\mathbf{x}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} + \frac{\mathbf{x}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2}+ \dots +\frac{\mathbf{x}\bullet \mathbf{f}_{m}}{\| \mathbf{f}_{m} \|^2}\mathbf{f}_{m}} \nonumber \]
is called the orthogonal projection of \(\mathbf{x}\) on \(U\). For the zero subspace \(U = \{\mathbf{0}\}\), we define
\[{\normalfont \proj{\{\mathbf{0}\}}{\mathbf{x}} = \mathbf{0}} \nonumber \]
The preceding discussion proves (1) of the following theorem.
Projection Theorem023885 If \(U\) is a subspace of \(\mathbb{R}^n\) and \(\mathbf{x}\) is in \(\mathbb{R}^n\), write \(\mathbf{p} = \proj{U}{\mathbf{x}}\). Then:
- \(\mathbf{p}\) is in \(U\) and \(\mathbf{x} - \mathbf{p}\) is in \(U^\perp\).
- \(\mathbf{p}\) is the vector in \(U\) closest to \(\mathbf{x}\) in the sense that
\[\| \mathbf{x} - \mathbf{p} \| < \| \mathbf{x} - \mathbf{y} \| \quad \mbox{ for all }\mathbf{y} \in U, \mathbf{y} \neq \mathbf{p} \nonumber \]
- This is proved in the preceding discussion (it is clear if \(U = \{\mathbf{0}\}\)).
- Write \(\mathbf{x} - \mathbf{y} = (\mathbf{x} - \mathbf{p}) + (\mathbf{p} - \mathbf{y})\). Then \(\mathbf{p} - \mathbf{y}\) is in \(U\) and so is orthogonal to \(\mathbf{x} - \mathbf{p}\) by (1). Hence, the Pythagorean theorem gives
\[\| \mathbf{x} - \mathbf{y} \|^2 = \| \mathbf{x} - \mathbf{p} \|^2 +\| \mathbf{p} - \mathbf{y} \|^2 > \| \mathbf{x} - \mathbf{p} \|^2 \nonumber \]
023908 Let \(U = span \;\{\mathbf{x}_{1}, \mathbf{x}_{2}\}\) in \(\mathbb{R}^4\) where \(\mathbf{x}_{1} = (1, 1, 0, 1)\) and \(\mathbf{x}_{2} = (0, 1, 1, 2)\). If \(\mathbf{x} = (3, -1, 0, 2)\), find the vector in \(U\) closest to \(\mathbf{x}\) and express \(\mathbf{x}\) as the sum of a vector in \(U\) and a vector orthogonal to \(U\).
\(\{\mathbf{x}_{1}, \mathbf{x}_{2}\}\) is independent but not orthogonal. The Gram-Schmidt process gives an orthogonal basis \(\{\mathbf{f}_{1}, \mathbf{f}_{2}\}\) of \(U\) where \(\mathbf{f}_{1} = \mathbf{x}_{1} = (1, 1, 0, 1)\) and
\[\mathbf{f}_{2} =\mathbf{x}_{2} - \frac{\mathbf{x}_{2}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} = \mathbf{x}_{2} - \frac{3}{3}\mathbf{f}_{1} = (-1, 0, 1, 1) \nonumber \]
Hence, we can compute the projection using \(\{\mathbf{f}_{1}, \mathbf{f}_{2}\}\):
\[\mathbf{p} = \proj{U}{\mathbf{x}} =\frac{\mathbf{x}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} + \frac{\mathbf{x}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} = \frac{4}{3}\mathbf{f}_{1} + \frac{-1}{3}\mathbf{f}_{2} = \frac{1}{3}(5, 4, -1, 3) \nonumber \]
Thus, \(\mathbf{p}\) is the vector in \(U\) closest to \(\mathbf{x}\), and \(\mathbf{x} - \mathbf{p} = \frac{1}{3}(4, -7, 1, 3)\) is orthogonal to every vector in \(U\). (This can be verified by checking that it is orthogonal to the generators \(\mathbf{x}_{1}\) and \(\mathbf{x}_{2}\) of \(U\).) The required decomposition of \(\mathbf{x}\) is thus
\[\mathbf{x} = \mathbf{p} + (\mathbf{x} - \mathbf{p}) = \frac{1}{3}(5, 4, -1, 3) + \frac{1}{3}(4, -7, 1, 3) \nonumber \]
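The arithmetic in this example can be verified mechanically. The following NumPy sketch recomputes \(\mathbf{p}\), checks that \(\mathbf{x} - \mathbf{p}\) is orthogonal to the generators of \(U\), and samples other points of \(U\) to confirm that none is closer to \(\mathbf{x}\):

```python
import numpy as np

x1 = np.array([1.0, 1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0, 2.0])
x  = np.array([3.0, -1.0, 0.0, 2.0])

f1 = x1
f2 = x2 - (x2 @ f1) / (f1 @ f1) * f1            # Gram-Schmidt step
p  = (x @ f1) / (f1 @ f1) * f1 + (x @ f2) / (f2 @ f2) * f2

assert np.allclose(p, np.array([5, 4, -1, 3]) / 3)          # p = (1/3)(5,4,-1,3)
assert abs((x - p) @ x1) < 1e-12 and abs((x - p) @ x2) < 1e-12

# Any other y = s*x1 + t*x2 in U is farther from x than p is:
rng = np.random.default_rng(0)
for s, t in rng.normal(size=(100, 2)):
    y = s * x1 + t * x2
    assert np.linalg.norm(x - y) >= np.linalg.norm(x - p) - 1e-9
```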
023934 Find the point in the plane with equation \(2x + y - z = 0\) that is closest to the point \((2, -1, -3)\).
We write \(\mathbb{R}^3\) as rows. The plane is the subspace \(U\) whose points \((x, y, z)\) satisfy \(z = 2x + y\). Hence
\[U = \{(s, t, 2s + t) \mid s,t \mbox{ in }\mathbb{R}\} = span \;\{(0, 1, 1),(1, 0, 2)\} \nonumber \]
The Gram-Schmidt process produces an orthogonal basis \(\{\mathbf{f}_{1}, \mathbf{f}_{2}\}\) of \(U\) where \(\mathbf{f}_{1} = (0, 1, 1)\) and \(\mathbf{f}_{2} = (1, -1, 1)\). Hence, the vector in \(U\) closest to \(\mathbf{x} = (2, -1, -3)\) is
\[\proj{U}{\mathbf{x}} =\frac{\mathbf{x}\bullet \mathbf{f}_{1}}{\| \mathbf{f}_{1} \|^2}\mathbf{f}_{1} + \frac{\mathbf{x}\bullet \mathbf{f}_{2}}{\| \mathbf{f}_{2} \|^2}\mathbf{f}_{2} = -2\mathbf{f}_{1} + 0\mathbf{f}_{2} = (0, -2, -2) \nonumber \]
Thus, the point in \(U\) closest to \((2, -1, -3)\) is \((0, -2, -2)\).
The next theorem shows that projection on a subspace of \(\mathbb{R}^n\) is actually a linear operator \(\mathbb{R}^n \to \mathbb{R}^n\).
023953 Let \(U\) be a fixed subspace of \(\mathbb{R}^n\), and define \(T : \mathbb{R}^n \to \mathbb{R}^n\) by
\[T(\mathbf{x}) = \proj{U}{\mathbf{x}} \quad \mbox{ for all }\mathbf{x}\mbox{ in }\mathbb{R}^n \nonumber \]
Then:
- \(T\) is a linear operator.
- \(im \;T = U\) and \(\text{ker}T = U^\perp\).
- \(dim \;U + dim \;U^\perp = n\).
If \(U = \{\mathbf{0}\}\), then \(U^\perp = \mathbb{R}^n\), and so \(T(\mathbf{x}) = \proj{\{\mathbf{0}\}}{\mathbf{x}} = \mathbf{0}\) for all \(\mathbf{x}\). Thus \(T = 0\) is the zero (linear) operator, so (1), (2), and (3) hold. Hence assume that \(U \neq \{\mathbf{0}\}\).
- If \(\{\mathbf{f}_{1}, \mathbf{f}_{2}, \dots, \mathbf{f}_{m}\}\) is an orthonormal basis of \(U\), then
\[\label{orthonormalUeq} T(\mathbf{x}) = (\mathbf{x}\bullet \mathbf{f}_{1})\mathbf{f}_{1} + (\mathbf{x}\bullet \mathbf{f}_{2})\mathbf{f}_{2} + \dots + (\mathbf{x}\bullet \mathbf{f}_{m})\mathbf{f}_{m} \quad \mbox{ for all }\mathbf{x} \mbox{ in } \mathbb{R}^n \nonumber \]
by the definition of the projection. Thus \(T\) is linear because
\[(\mathbf{x} + \mathbf{y})\bullet \mathbf{f}_{i} = \mathbf{x}\bullet \mathbf{f}_{i} + \mathbf{y}\bullet \mathbf{f}_{i} \quad \mbox{ and } \quad (r\mathbf{x})\bullet \mathbf{f}_{i} = r(\mathbf{x}\bullet \mathbf{f}_{i}) \quad \mbox{ for each } i \nonumber \]
- Now suppose that \(\mathbf{x}\) is in \(U^\perp\). Then \(\mathbf{x}\bullet \mathbf{f}_{i} = 0\) for each \(i\) (again because each \(\mathbf{f}_{i}\) is in \(U\)), so \(\mathbf{x}\) is in \(\text{ker}T\) by ([orthonormalUeq]). Hence \(U^\perp \subseteq \text{ker} T\). On the other hand, if \(T(\mathbf{x}) = \mathbf{0}\), then \(\mathbf{x} = \mathbf{x} - T(\mathbf{x})\) is in \(U^\perp\) by Theorem [thm:023885], so \(\text{ker}T \subseteq U^\perp\). Hence \(\text{ker}T = U^\perp\). Finally, \(T(\mathbf{x})\) is in \(U\) for all \(\mathbf{x}\), so \(im \;T \subseteq U\); and if \(\mathbf{u}\) is in \(U\), then \(T(\mathbf{u}) = \mathbf{u}\) because \(\mathbf{u} - T(\mathbf{u})\) lies in both \(U\) and \(U^\perp\) and hence is zero. Thus \(U \subseteq im \;T\), and (2) follows.
- This follows from (1), (2), and the dimension theorem (Theorem [thm:021499]).
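Because \(T\) is linear, it is given by a matrix: with an orthogonal basis \(\{\mathbf{f}_{1}, \dots, \mathbf{f}_{m}\}\) of \(U\), that matrix is \(P = \sum_{i} \frac{1}{\|\mathbf{f}_{i}\|^2}\mathbf{f}_{i}\mathbf{f}_{i}^{T}\). The NumPy sketch below (the plane \(U\) is our own choice) checks the expected properties: \(P^2 = P\), \(P = P^{T}\), and the ranks of \(P\) and \(I - P\) add to \(n\), mirroring \(dim \;U + dim \;U^\perp = n\).

```python
import numpy as np

# An orthogonal basis {f1, f2} of a plane U in R^4 (illustrative choice):
f1 = np.array([1.0, 1.0, 0.0, 1.0])
f2 = np.array([-1.0, 0.0, 1.0, 1.0])

# Matrix of T(x) = proj_U(x):  P = f1 f1^T/||f1||^2 + f2 f2^T/||f2||^2
P = np.outer(f1, f1) / (f1 @ f1) + np.outer(f2, f2) / (f2 @ f2)

assert np.allclose(P @ P, P)                       # projecting twice changes nothing
assert np.allclose(P, P.T)                         # P is symmetric
assert np.linalg.matrix_rank(P) == 2               # dim(im T) = dim U = 2
assert np.linalg.matrix_rank(np.eye(4) - P) == 2   # dim(ker T) = dim U-perp = 4 - 2
```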