8.6: The Singular Value Decomposition
When working with a square matrix \(A\) it is clearly useful to be able to “diagonalize” \(A\), that is to find a factorization \(A=Q^{-1}DQ\) where \(Q\) is invertible and \(D\) is diagonal. Unfortunately such a factorization may not exist for \(A\). However, even if \(A\) is not square, Gaussian elimination provides a factorization of the form \(A=PDQ\) where \(P\) and \(Q\) are invertible and \(D\) is diagonal (the Smith Normal Form, Theorem [thm:005369]). Remarkably, if \(A\) is real we can choose \(P\) and \(Q\) to be orthogonal real matrices and \(D\) to be real with non-negative entries. Such a factorization is called a singular value decomposition (SVD) for \(A\), one of the most useful tools in applied linear algebra. In this section we show how to explicitly compute an SVD for any real matrix \(A\), and illustrate some of its many applications.
We need a fact about two subspaces associated with an \(m\times n\) matrix \(A\):
\[im \;A=\{A\mathbf{x}\mid \mathbf{x} \mbox{ in } \mathbb{R}^{n}\}\qquad \mbox{and}\qquad \func{col}A=span \;\{\mathbf{a}\mid\mathbf{a} \mbox{ is a column of }A\} \nonumber \]
Then \(im \;A\) is called the image of \(A\) (so named because of the linear transformation \(\mathbb{R}^{n}\rightarrow \mathbb{R}^{m}\) with \(\mathbf{x}\mapsto A\mathbf{x}\)); and \(\func{col}A\) is called the column space of \(A\) (Definition [def:015376]). Surprisingly, these spaces are equal:
Lemma [lem:svdlemma1]. For any \(m\times n\) matrix \(A\), \(im \; A=\func{col} A\).
Let \(A=\left[ \begin{array}{cccc} \mathbf{a}_{1} & \mathbf{a}_{2} & \cdots & \mathbf{a}_{n} \end{array}\right]\) in terms of its columns. Let \(\mathbf{x}\in im \; A\), say \(\mathbf{x}=A\mathbf{y}\), \(\mathbf{y}\) in \(\mathbb{R}^{n}\). If \(\mathbf{y}=\left[ \begin{array}{cccc} y_{1} & y_{2} & \cdots & y_{n}\end{array}\right]^{T}\), then \(A\mathbf{y}=y_{1}\mathbf{a}_{1}+y_{2}\mathbf{a}_{2}+\cdots +y_{n}\mathbf{a}_{n}\in \func{col} A\) by Definition [def:002668]. This shows that \(im \; A\subseteq \func{col}A\). For the other inclusion, each \(\mathbf{a}_{k}=A\mathbf{e}_{k}\) where \(\mathbf{e}_{k}\) is column \(k\) of \(I_{n}\), so \(\mathbf{a}_{k}\in im \; A\) for each \(k\), whence \(\func{col}A\subseteq im \; A\).
Singular Value Decompositions
We know a lot about any real symmetric matrix: Its eigenvalues are real (Theorem [thm:016397]), and it is orthogonally diagonalizable by the Principal Axes Theorem (Theorem [thm:024303]). So for any real matrix \(A\) (square or not), the fact that both \(A^{T}A\) and \(AA^{T}\) are real and symmetric suggests that we can learn a lot about \(A\) by studying them. This section shows just how true this is.
The following Lemma reveals some similarities between \(A^{T}A\) and \(AA^{T}\) which simplify the statement and the proof of the SVD we are constructing.
Lemma [lem:svdlemma2]. Let \(A\) be a real \(m\times n\) matrix. Then:
- The eigenvalues of \(A^{T}A\) and \(AA^{T}\) are real and non-negative.
- \(A^{T}A\) and \(AA^{T}\) have the same set of positive eigenvalues.
- If \(\lambda\) is an eigenvalue of \(A^{T}A\) with eigenvector \(\mathbf{q}\neq \mathbf{0}\), then \(\lambda \| \mathbf{q}\|^{2}=\mathbf{q}^{T}(\lambda \mathbf{q})=\mathbf{q}^{T}(A^{T}A\mathbf{q})=(A\mathbf{q})^{T}(A\mathbf{q})=\| A\mathbf{q}\|^{2}\geq 0\), so \(\lambda \geq 0\). Then (1.) follows for \(A^{T}A\), and the case \(AA^{T}\) follows by replacing \(A\) by \(A^{T}\).
- Write \(N(B)\) for the set of positive eigenvalues of a matrix \(B\). If \(\lambda \in N(A^{T}A)\) with eigenvector \(\mathbf{q}\neq \mathbf{0}\), then \(AA^{T}(A\mathbf{q})=A(A^{T}A\mathbf{q})=A(\lambda \mathbf{q})=\lambda (A\mathbf{q})\). Moreover, \(A\mathbf{q}\neq \mathbf{0}\) since \(A^{T}A\mathbf{q}=\lambda \mathbf{q}\neq \mathbf{0}\) and both \(\lambda \neq 0\) and \(\mathbf{q}\neq \mathbf{0}\). Hence \(\lambda\) is an eigenvalue of \(AA^{T}\), proving \(N(A^{T}A)\subseteq N(AA^{T})\). For the other inclusion replace \(A\) by \(A^{T}\).
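For readers who want to experiment, Lemma [lem:svdlemma2] is easy to check numerically. The following sketch uses Python with NumPy; the test matrix, seed, and tolerance are illustrative choices, not part of the text:

```python
import numpy as np

# A random 3x5 test matrix (almost surely of rank 3).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

# Eigenvalues of A^T A (5x5) and A A^T (3x3), sorted in descending order.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]

assert np.all(eig_AtA > -1e-10)           # (1.): all eigenvalues non-negative
assert np.allclose(eig_AtA[:3], eig_AAt)  # (2.): same positive eigenvalues
```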
To analyze an \(m\times n\) matrix \(A\) we have two symmetric matrices to work with: \(A^{T}A\) and \(AA^{T}\). In view of Lemma [lem:svdlemma2], we choose \(A^{T}A\) (sometimes called the Gram matrix of \(A\)), and derive a series of facts which we will need. This narrative is a bit long, but trust that it will be worth the effort. We parse it out in several steps:
- The \(n\times n\) matrix \(A^{T}A\) is real and symmetric so, by the Principal Axes Theorem [thm:024303], let \(\{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\}\subseteq \mathbb{R}^{n}\) be an orthonormal basis of eigenvectors of \(A^{T}A,\) with corresponding eigenvalues \(\lambda_{1},\lambda_{2},\ldots ,\lambda_{n}\). By Lemma [lem:svdlemma2](1), \(\lambda_{i}\) is real for each \(i\) and \(\lambda_{i}\geq 0\). By re-ordering the \(\mathbf{q}_{i}\) we may (and do) assume that
\[\label{svdeqni} \lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0 \quad \mbox{ and } \quad \lambda_{i}=0 \mbox{ if } i>r \tag{\textbf{i}} \]
Of course the \(\lambda_{i}\) could all be positive (\(r=n\)) or all zero (so \(A^{T}A=0\), and hence \(A=0\) by Exercise [ex:5.3.9]). Moreover,
\[\label{svdeqnii} Q=\left[ \begin{array}{cccc} \mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n} \end{array}\right] \mbox{ is orthogonal }\quad \mbox{and} \quad \mbox{ orthogonally diagonalizes }A^{T}A \tag{\textbf{ii}} \]
- For all \(i\) and \(j\) we have \(A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=(A\mathbf{q}_{i})^{T}A\mathbf{q}_{j}=\mathbf{q}_{i}^{T}(A^{T}A\mathbf{q}_{j})=\lambda_{j}(\mathbf{q}_{i}\dotprod \mathbf{q}_{j})\). Because \(\{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\}\) is an orthonormal set, this gives
\[\label{svdeqniii} A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=0 \mbox{ if } i\neq j \quad \quad \mbox{and} \quad \quad \| A\mathbf{q}_{i}\|^{2}=\lambda_{i}\| \mathbf{q}_{i}\| ^{2}=\lambda_{i} \mbox{ for each } i \tag{\textbf{iii}} \]
We can extract two conclusions from ([svdeqniii]) and ([svdeqni]):
\[\label{svdeqniv} \{A\mathbf{q}_{1}, A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq im \; A \mbox{ is an orthogonal set}\quad \mbox{and} \quad A\mathbf{q}_{i}=\mathbf{0} \mbox{ if }i>r \tag{\textbf{iv}} \]
With this write \(U=span \;\{A\mathbf{q}_{1}, A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq im \;A\); we claim that \(U=im \;A\), that is \(im \;A\subseteq U\). For this we must show that \(A\mathbf{x}\in U\) for each \(\mathbf{x}\in \mathbb{R}^{n}\). Since \(\{\mathbf{q}_{1},\ldots ,\mathbf{q}_{r},\ldots ,\mathbf{q}_{n}\}\) is a basis of \(\mathbb{R}^{n}\) (it is orthonormal), we can write \(\mathbf{x}=t_{1}\mathbf{q}_{1}+\cdots +t_{r}\mathbf{q}_{r}+\cdots +t_{n}\mathbf{q}_{n}\) where each \(t_{j}\in \mathbb{R}\). Then, using ([svdeqniv]) we obtain
\[A\mathbf{x}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}+\cdots +t_{n}A\mathbf{q}_{n}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}\in U \nonumber \]
This shows that \(U=im \;A\), and so
\[\label{svdeqnv} \{A\mathbf{q}_{1}, A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\} \mbox{ is an \emph{orthogonal} basis of } im \;(A) \tag{\textbf{v}} \]
But \(\func{col}A=im \;A\) by Lemma [lem:svdlemma1], and \(rank \;A=dim \;(\func{col}A)\) by Theorem [thm:015444], so
\[\label{svdeqnvi} rank \; A=dim \;(\func{col}A)=dim \;(im \;A)\overset{(\textbf{v})}{=}r \tag{\textbf{vi}} \]
- Definition [def:svddef1]. The real numbers \(\sigma_{i}=\sqrt{\lambda_{i}}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\|\) for \(i=1,2,\ldots,n\), are called the singular values of the matrix \(A\).
Clearly \(\sigma_{1},\sigma_{2},\ldots ,\sigma_{r}\) are the positive singular values of \(A\). By ([svdeqni]) we have
\[\label{svdeqnvii} \sigma_{1}\geq \sigma _{2}\geq \cdots \geq \sigma_{r}>0 \qquad \mbox{and} \qquad \sigma_{i}=0 \mbox{ if } i>r \tag{\textbf{vii}} \]
With ([svdeqnvi]) this makes the following definitions depend only upon \(A\).
Definition [def:svddef2]. Let \(A\) be a real, \(m\times n\) matrix of rank \(r\), with positive singular values \(\sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0\) and \(\sigma_{i}=0\) if \(i>r\). Define:
\[D_{A}=\func{diag}(\sigma_{1},\ldots ,\sigma_{r})\quad \qquad \mbox{and}\quad \qquad \Sigma_{A}= \left[ \begin{array}{cc} D_{A} & 0 \\ 0 & 0 \end{array} \right]_{m\times n} \nonumber \]
Here \(\Sigma_{A}\) is in block form and is called the singular matrix of \(A\).
The singular values \(\sigma_{i}\) and the matrices \(D_{A}\) and \(\Sigma_{A}\) will be referred to frequently below.
- Returning to our narrative, normalize the vectors \(A\mathbf{q}_{1}\), \(A\mathbf{q}_{2},\ldots, A\mathbf{q}_{r}\), by defining
\[\label{svdeqnviii} \mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\| }A\mathbf{q}_{i}\in \mathbb{R}^{m} \quad \mbox{for each } i=1,2,\ldots ,r \tag{\textbf{viii}} \]
By ([svdeqnv]) and Lemma [lem:svdlemma1] it follows that
\[\label{svdeqnix} \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} \mbox{ is an \emph{orthonormal} basis of } \func{col}A\subseteq \mathbb{R}^{m} \tag{\textbf{ix}} \]
Employing the Gram-Schmidt algorithm (or otherwise), construct \(\mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m}\) so that
\[\label{svdeqnx} \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots ,\mathbf{p}_{m}\} \mbox{ is an orthonormal basis of } \mathbb{R}^{m} \tag{\textbf{x}} \]
- By ([svdeqnx]), \(P=\left[ \begin{array}{cccc}\mathbf{p}_{1} & \mathbf{p}_{2} & \cdots & \mathbf{p}_{m}\end{array}\right]\) is an orthogonal \(m\times m\) matrix. The matrices \(A\), \(Q\), \(P\), and \(\Sigma_{A}\) are related. In fact we have:
\[\label{svdeqnxi} \sigma_{i}\mathbf{p}_{i}=\sqrt{\lambda_{i}} \mathbf{p}_{i}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\| \mathbf{p}_{i}\overset{(\textbf{viii})}{=}A\mathbf{q}_{i} \quad \mbox{ for each } i=1,2,\ldots ,r \tag{\textbf{xi}} \]
This yields the following expression for \(AQ\) in terms of its columns:
\[\label{svdeqnxii} AQ=\left[ \begin{array}{cccccc} A\mathbf{q}_{1} & \cdots & A\mathbf{q}_{r} & A\mathbf{q}_{r+1} & \cdots & A\mathbf{q}_{n}\end{array}\right] \overset{(\textbf{iv}),\ (\textbf{xi})}{=}\left[ \begin{array}{cccccc}\sigma _{1} \mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r}& \mathbf{0}& \cdots &\mathbf{0}\end{array}\right] \tag{\textbf{xii}} \]
Then we compute:
\[\begin{aligned} P\Sigma _{A} & = \left[ \begin{array}{cccccc}\mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \mathbf{p}_{r+1} & \cdots & \mathbf{p}_{m}\end{array}\right] \left[ \begin{array}{cc} \begin{array}{ccc} \sigma _{1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma _{r} \end{array} & \begin{array}{ccc} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{array} \\ \begin{array}{ccc} 0 & \cdots & ~0 \\ \vdots & & \vdots \\ 0 & \cdots & ~0 \end{array} & \begin{array}{ccc} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{array} \end{array} \right] \\ &= \left[ \begin{array}{cccccc}\sigma_{1}\mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r} & \mathbf{0} & \cdots & \mathbf{0}\end{array}\right] \\ & \overset{(\textbf{xii})}{=} AQ\end{aligned} \nonumber \]
Finally, as \(Q^{-1}=Q^{T}\) it follows that \(A=P\Sigma _{A}Q^{T}\).
With this we can state the main theorem of this Section.
Theorem [thm:svdtheorem1]. Let \(A\) be a real \(m\times n\) matrix, and let \(\sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0\) be the positive singular values of \(A\). Then \(r\) is the rank of \(A\) and we have the factorization
\[A=P\Sigma_{A}Q^{T}\qquad \mbox{where } P \mbox{ and } Q \mbox{ are orthogonal matrices} \nonumber \]
The factorization \(A=P\Sigma _{A}Q^{T}\) in Theorem [thm:svdtheorem1], where \(P\) and \(Q\) are orthogonal matrices, is called a Singular Value Decomposition (SVD) of \(A\). This decomposition is not unique. For example if \(r<m\) then the vectors \(\mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m}\) can be any extension of \(\{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r}\}\) to an orthonormal basis of \(\mathbb{R}^{m}\), and each will lead to a different matrix \(P\) in the decomposition. For a more dramatic example, if \(A=I_{n}\) then \(\Sigma_{A}=I_{n}\), and \(A=P\Sigma_{A}P^{T}\) is an SVD of \(A\) for any orthogonal \(n\times n\) matrix \(P\).
Example. Find a singular value decomposition for \(A= \left[ \begin{array}{rrr} 1 & 0 & 1 \\ -1 & 1 & 0 \end{array} \right]\).
We have \(A^{T}A= \left[ \begin{array}{rrr} 2 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right]\), so the characteristic polynomial is
\[c_{A^{T}A}(x)=\det \left[ \begin{array}{ccc} x-2 & 1 & -1 \\ 1 & x-1 & 0 \\ -1 & 0 & x-1 \end{array} \right] =(x-3)(x-1)x \nonumber \]
Hence the eigenvalues of \(A^{T}A\) (in descending order) are \(\lambda_{1}=3\), \(\lambda_{2}=1\) and \(\lambda_{3}=0\) with, respectively, unit eigenvectors
\[\mathbf{q}_{1}=\frac{1}{\sqrt{6}} \left[ \begin{array}{r} 2 \\ -1 \\ 1 \end{array} \right], \quad \mathbf{q}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right],\quad \mbox{and} \quad \mathbf{q}_{3}=\frac{1}{\sqrt{3}} \left[ \begin{array}{r} -1 \\ -1 \\ 1 \end{array} \right] \nonumber \]
It follows that the orthogonal matrix \(Q\) in Theorem [thm:svdtheorem1] is
\[Q=\left[ \begin{array}{ccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \mathbf{q}_{3}\end{array}\right]=\frac{1}{\sqrt{6}} \left[ \begin{array}{rrr} 2 & 0 & -\sqrt{2} \\ -1 & \sqrt{3} & -\sqrt{2} \\ 1 & \sqrt{3} & \sqrt{2} \end{array} \right] \nonumber \]
The singular values here are \(\sigma_{1}=\sqrt{3}\), \(\sigma_{2}=1\) and \(\sigma_{3}=0,\) so \(rank \;(A)=2\)—clear in this case—and the singular matrix is
\[\Sigma_{A}= \left[ \begin{array}{ccc} \sigma_{1} & 0 & 0 \\ 0 & \sigma_{2} & 0 \end{array} \right] =\left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \nonumber \]
So it remains to find the \(2\times 2\) orthogonal matrix \(P\) in Theorem [thm:svdtheorem1]. This involves the vectors
\[A\mathbf{q}_{1}=\frac{\sqrt{6}}{2} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right],\quad A\mathbf{q}_{2}=\frac{\sqrt{2}}{2} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] ,\quad \mbox{and} \quad A\mathbf{q}_{3}= \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \nonumber \]
Normalize \(A\mathbf{q}_{1}\) and \(A\mathbf{q}_{2}\) to get
\[\mathbf{p}_{1}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right] \quad \mbox{and} \quad \mathbf{p}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \nonumber \]
In this case, \(\{\mathbf{p}_{1},\mathbf{p}_{2}\}\) is already a basis of \(\mathbb{R}^{2}\) (so the Gram-Schmidt algorithm is not needed), and we have the \(2\times 2\) orthogonal matrix
\[P=\left[ \begin{array}{cc} \mathbf{p}_{1} & \mathbf{p}_{2} \end{array}\right]=\frac{1}{\sqrt{2}} \left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \nonumber \]
Finally (by Theorem [thm:svdtheorem1]) the singular value decomposition for \(A\) is
\[A=P\Sigma_{A}Q^{T}= \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \frac{1}{\sqrt{6}}\left[ \begin{array}{rrr} 2 & -1 & 1 \\ 0 & \sqrt{3} & \sqrt{3} \\ -\sqrt{2} & -\sqrt{2} & \sqrt{2} \end{array} \right] \nonumber \]
Of course this can be confirmed by direct matrix multiplication.
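Indeed, a short NumPy computation (an optional aside, not part of the example) confirms the factorization and the orthogonality of \(P\) and \(Q\):

```python
import numpy as np

# The three factors computed in the example above.
A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
P = np.array([[1.0, 1.0],
              [-1.0, 1.0]]) / np.sqrt(2)
Sigma = np.array([[np.sqrt(3.0), 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
Q = np.array([[2.0, 0.0, -np.sqrt(2.0)],
              [-1.0, np.sqrt(3.0), -np.sqrt(2.0)],
              [1.0, np.sqrt(3.0), np.sqrt(2.0)]]) / np.sqrt(6)

assert np.allclose(P @ Sigma @ Q.T, A)     # A = P Sigma_A Q^T
assert np.allclose(P.T @ P, np.eye(2))     # P is orthogonal
assert np.allclose(Q.T @ Q, np.eye(3))     # Q is orthogonal
print(np.linalg.svd(A, compute_uv=False))  # [1.732..., 1.0], the sigma_i
```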
Thus, computing an SVD for a real matrix \(A\) is a routine matter, and we now describe a systematic procedure for doing so.
SVD Algorithm. Given a real \(m\times n\) matrix \(A\), find an SVD \(A=P\Sigma_{A}Q^{T}\) as follows:
- Use the Diagonalization Algorithm to find the (real and non-negative) eigenvalues \(\lambda_{1},\lambda_{2},\ldots,\lambda_{n}\) of \(A^{T}A\) with corresponding (orthonormal) eigenvectors \(\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\). Reorder the \(\mathbf{q}_{i}\) (if necessary) to ensure that the nonzero eigenvalues are \(\lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0\) and \(\lambda_{i}=0\) if \(i>r\).
- The integer \(r\) is the rank of the matrix \(A\).
- The \(n\times n\) orthogonal matrix \(Q\) in the SVD is \(Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right]\).
- Define \(\mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\|}A\mathbf{q}_{i}\) for \(i=1,2,\ldots ,r\) (where \(r\) is as in step 1). Then \(\{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\}\) is orthonormal in \(\mathbb{R}^{m}\) so (using Gram-Schmidt or otherwise) extend it to an orthonormal basis \(\{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots , \mathbf{p}_{m}\}\) in \(\mathbb{R}^{m}\).
- The \(m\times m\) orthogonal matrix \(P\) in the SVD is \(P=\left[ \begin{array}{ccccc} \mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \cdots & \mathbf{p}_{m}\end{array}\right]\).
- The singular values for \(A\) are \(\sigma_{1},\sigma_{2},\ldots,\sigma_{n}\) where \(\sigma_{i}=\sqrt{\lambda_{i}}\) for each \(i\). Hence the nonzero singular values are \(\sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0\), and so the singular matrix of \(A\) in the SVD is \(\Sigma_{A}= \left[ \begin{array}{cc} \func{diag}(\sigma_{1},\ldots ,\sigma_{r}) & 0 \\ 0 & 0 \end{array} \right]_{m\times n}\).
- Thus \(A=P\Sigma_{A} Q^{T}\) is an SVD for \(A\).
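The algorithm can be transcribed almost line by line into code. The sketch below is a pedagogical NumPy implementation; the helper name svd_via_gram_matrix and the tolerance are our own choices, and (as noted next) serious software computes the SVD quite differently:

```python
import numpy as np

def svd_via_gram_matrix(A, tol=1e-10):
    """Textbook SVD A = P @ Sigma @ Q.T built from the eigenvectors of A^T A.

    A pedagogical sketch of the algorithm above; production code should call
    np.linalg.svd, which is more accurate and never forms A^T A explicitly.
    """
    m, n = A.shape
    lam, Qmat = np.linalg.eigh(A.T @ A)        # eigenvalues, ascending order
    order = np.argsort(lam)[::-1]              # step 1: reorder, descending
    lam, Qmat = lam[order], Qmat[:, order]
    r = int(np.sum(lam > tol))                 # step 2: r = rank A
    sigma = np.sqrt(np.clip(lam, 0.0, None))   # step 6: singular values
    # Step 4: p_i = A q_i / ||A q_i|| for i <= r, then extend to a basis of R^m.
    P = np.zeros((m, m))
    P[:, :r] = (A @ Qmat[:, :r]) / sigma[:r]
    # QR factorization performs the Gram-Schmidt extension for us.
    basis, _ = np.linalg.qr(np.hstack([P[:, :r], np.eye(m)]))
    P[:, r:] = basis[:, r:m]                   # step 5: P is m x m orthogonal
    Sigma = np.zeros((m, n))                   # step 6: the singular matrix
    Sigma[:r, :r] = np.diag(sigma[:r])
    return P, Sigma, Qmat

A = np.array([[1.0, 0.0, 1.0], [-1.0, 1.0, 0.0]])
P, Sigma, Q = svd_via_gram_matrix(A)
assert np.allclose(P @ Sigma @ Q.T, A)         # step 7: A = P Sigma_A Q^T
```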
In practice the singular values \(\sigma_{i}\), the matrices \(P\) and \(Q\), and even the rank of an \(m\times n\) matrix are not calculated this way. There are sophisticated numerical algorithms for calculating them to a high degree of accuracy. The reader is referred to books on numerical linear algebra.
So the main virtue of Theorem [thm:svdtheorem1] is that it provides a way of constructing an SVD for every real matrix \(A\). In particular it shows that every real matrix \(A\) has a singular value decomposition [1] in the following, more general, sense:
Definition [def:svddef3]. A Singular Value Decomposition (SVD) of an \(m\times n\) matrix \(A\) of rank \(r\) is a factorization \(A=U\Sigma V^{T}\) where \(U\) and \(V\) are orthogonal and \(\Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n}\) in block form, where \(D=\func{diag}(d_{1},d_{2},\ldots ,d_{r})\) with each \(d_{i}>0\), and \(r\leq m\) and \(r\leq n\).
Note that for any SVD \(A=U\Sigma V^{T}\) we immediately obtain some information about \(A\):
Lemma [lem:svdlemma3]. If \(A=U\Sigma V^{T}\) is any SVD for \(A\) as in Definition [def:svddef3], then:
- \(r=rank \;A\).
- The numbers \(d_{1},d_{2},\dots ,d_{r}\) are the positive singular values of \(A\) in some order.
Use the notation of Definition [def:svddef3]. We have
\[A^{T}A=(V\Sigma^{T}U^{T})(U\Sigma V^{T})=V(\Sigma^{T}\Sigma)V^{T} \nonumber \]
so \(\Sigma^{T}\Sigma\) and \(A^{T}A\) are similar \(n\times n\) matrices (Definition [def:015930]). Hence \(r=rank \; A\) by Corollary [cor:015519], proving (1.). Furthermore, \(\Sigma^{T}\Sigma\) and \(A^{T}A\) have the same eigenvalues by Theorem [thm:016008]; that is (using (1.)):
\[\{d_{1}^{2},d_{2}^{2},\dots ,d_{r}^{2}\}=\{\lambda_{1},\lambda_{2},\dots ,\lambda_{r}\} \quad \mbox{are equal as sets} \nonumber \]
where \(\lambda_{1},\lambda_{2},\dots ,\lambda_{r}\) are the positive eigenvalues of \(A^{T}A\). Hence there is a permutation \(\tau\) of \(\{1,2,\cdots ,r\}\) such that \(d_{i}^{2}=\lambda_{i\tau }\) for each \(i=1,2,\dots ,r\). Hence \(d_{i}=\sqrt{\lambda_{i\tau }}=\sigma_{i\tau }\) for each \(i\) by Definition [def:svddef1]. This proves (2.).
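As a numerical illustration of the lemma, one can compare the diagonal entries \(d_{i}\) returned by a library SVD with the eigenvalues of \(A^{T}A\). A NumPy sketch (the random test matrix and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))   # arbitrary test matrix, almost surely rank 4

d = np.linalg.svd(A, compute_uv=False)            # the d_i from one SVD of A
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A

assert np.sum(d > 1e-10) == np.linalg.matrix_rank(A)       # (1.): r = rank A
assert np.allclose(d, np.sqrt(np.clip(lam[:4], 0, None)))  # (2.): d_i = sigma_i
```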
We note in passing that more is true. Let \(A\) be \(m\times n\) of rank \(r\), and let \(A=U\Sigma V^{T}\) be any SVD for \(A\). Using the proof of Lemma [lem:svdlemma3] we have \(d_{i}=\sigma_{i\tau }\) for some permutation \(\tau\) of \(\{1,2,\dots ,r\}.\) In fact, it can be shown that there exist orthogonal matrices \(U_{1}\) and \(V_{1}\) obtained from \(U\) and \(V\) by \(\tau\)-permuting columns and rows respectively, such that \(A=U_{1}\Sigma_{A}V_{1}^{T}\) is an SVD of \(A\).
Fundamental Subspaces
It turns out that any singular value decomposition contains a great deal of information about an \(m\times n\) matrix \(A\) and the subspaces associated with \(A\). For example, in addition to Lemma [lem:svdlemma3], the set \(\{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\}\) of vectors constructed in the proof of Theorem [thm:svdtheorem1] is an orthonormal basis of \(\func{col} A\) (by ([svdeqnv]) and ([svdeqnviii]) in the proof). There are more such examples, which is the thrust of this subsection. In particular, there are four subspaces associated to a real \(m\times n\) matrix \(A\) that have come to be called fundamental:
Definition [def:svddef4]. The fundamental subspaces of an \(m\times n\) matrix \(A\) are:
- \(\func{row}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a row of } A\}\)
- \(\func{col}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a column of } A\}\)
- \(\func{null}A=\{\mathbf{x}\in \mathbb{R}^{n}\mid A\mathbf{x}=\mathbf{0}\}\)
- \(\func{null}A^{T}=\{\mathbf{x}\in \mathbb{R}^{m}\mid A^{T}\mathbf{x}=\mathbf{0}\}\)
If \(A=U\Sigma V^{T}\) is any SVD for the real \(m\times n\) matrix \(A\), the columns of \(U\) and \(V\) provide orthonormal bases for each of these fundamental subspaces. We are going to prove this, but first we need three properties related to the orthogonal complement \(U^{\perp}\) of a subspace \(U\) of \(\mathbb{R}^{n}\), where (Definition [def:023776]):
\[U^{\perp}=\{\mathbf{x}\in \mathbb{R}^{n}\mid \mathbf{u}\bullet \mathbf{x}=0 \mbox{ for all } \mathbf{u}\in U\} \nonumber \]
The orthogonal complement plays an important role in the Projection Theorem (Theorem [thm:023885]), and we return to it in Section [sec:10_2]. For now we need:
Lemma [lem:svdlemma4]. If \(A\) is any matrix then:
- \((\func{row}A)^{\perp}=\func{null}A\) and \((\func{col}A)^{\perp}=\func{null}A^{T}\).
- If \(U\) is any subspace of \(\mathbb{R}^{n}\) then \(U^{\perp\perp}=U\).
- Let \(\{\mathbf{f}_{1},\ldots ,\mathbf{f}_{m}\}\) be an orthonormal basis of \(\mathbb{R}^{m}\). If \(U=span \;\{\mathbf{f}_{1},\ldots ,\mathbf{f}_{k}\}\), then
\[U^{\perp}=span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\} \nonumber \]
- If \(\mathbf{x}\in \mathbb{R}^{n}\), then \(A\mathbf{x}=\mathbf{0}\) if and only if \(\mathbf{r}\dotprod \mathbf{x}=0\) for every row \(\mathbf{r}\) of \(A\), that is if and only if \(\mathbf{x}\in (\func{row}A)^{\perp}\). Hence \(\func{null}A=(\func{row}A)^{\perp}\). Now replace \(A\) by \(A^{T}\) to get \(\func{null}A^{T}=(\func{row}A^{T})^{\perp}=(\func{col}A)^{\perp}\), which is the other identity in (1).
- If \(\mathbf{x}\in U\) then \(\mathbf{y}\dotprod \mathbf{x}=0\) for all \(\mathbf{y}\in U^{\perp}\), that is \(\mathbf{x}\in U^{\perp\perp}\). This proves that \(U\subseteq U^{\perp\perp}\), so it is enough to show that \(dim \;U=dim \;U^{\perp\perp}\). By Theorem [thm:023953] we see that \(dim \;V^{\perp}=n-dim \;V\) for any subspace \(V\subseteq \mathbb{R}^{n}\). Hence
\[dim \;U^{\perp\perp}=n-dim \;U^{\perp}=n-(n-dim \;U)=dim \;U, \mbox{ as required} \nonumber \]
- We have \(span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\}\subseteq U^{\perp}\) because \(\{\mathbf{f}_{1},\ldots ,\mathbf{f}_{m}\}\) is orthogonal. Conversely, let \(\mathbf{x}\in U^{\perp}\), so that \(\mathbf{f}_{i}\dotprod \mathbf{x}=0\) for \(i=1,\ldots ,k\). Then the expansion theorem gives
\[\begin{array}{ccccccccccccc} \mathbf{x} & = & (\mathbf{f}_{1}\dotprod \mathbf{x})\mathbf{f}_{1} & + & \cdots & + & (\mathbf{f}_{k}\dotprod \mathbf{x})\mathbf{f}_{k} & + &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1} & + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \\ & = & \mathbf{0} &+ &\cdots &+ &\mathbf{0} &+ &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1}& + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \end{array} \nonumber \]
Hence \(\mathbf{x}\in span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\}\), proving (3.).
With this we can see how any SVD for a matrix \(A\) provides orthonormal bases for each of the four fundamental subspaces of \(A\).
Theorem [thm:svdtheorem2]. Let \(A\) be an \(m\times n\) real matrix, let \(A=U\Sigma V^{T}\) be any SVD for \(A\) where \(U\) and \(V\) are orthogonal of size \(m\times m\) and \(n\times n\) respectively, and let
\[\Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n}\qquad \mbox{where} \qquad D=\func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}), \mbox{ with each } \lambda_{i}>0 \nonumber \]
Write \(U=\left[ \begin{array}{ccccc}\mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m}\end{array}\right]\) and \(V=\left[ \begin{array}{ccccc}\mathbf{v}_{1} & \cdots & \mathbf{v}_{r} & \cdots & \mathbf{v}_{n}\end{array}\right],\) so \(\{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r},\ldots ,\mathbf{u}_{m}\}\) and \(\{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r},\ldots ,\mathbf{v}_{n}\}\) are orthonormal bases of \(\mathbb{R}^{m}\) and \(\mathbb{R}^{n}\) respectively. Then
- \(r=rank \;A\), and \(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}\) are the positive singular values of \(A\) (in some order).
- The fundamental spaces are described as follows:
- \(\{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\}\) is an orthonormal basis of \(\func{col}A\).
- \(\{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\}\) is an orthonormal basis of \(\func{null}A^{T}\).
- \(\{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\}\) is an orthonormal basis of \(\func{null}A\).
- \(\{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\}\) is an orthonormal basis of \(\func{row}A\).
- This is Lemma [lem:svdlemma3].
- As \(\func{col}A=\func{col}(AV)\) by Lemma [lem:015527] and \(AV=U\Sigma\), (a.) follows from
\[U\Sigma =\left[ \begin{array}{ccccc} \mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m} \end{array}\right] \left[ \begin{array}{cc} \func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}) & 0 \\ 0 & 0 \end{array} \right] = \left[ \begin{array}{cccccc} \lambda_{1}\mathbf{u}_{1} & \cdots & \lambda_{r}\mathbf{u}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \nonumber \]
- We have \((\func{col}A)^{\perp}\overset{\text{(a.)}}{=}(span \;\{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\})^{\perp}=span \;\{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\}\) by Lemma [lem:svdlemma4](3). This proves (b.) because \((\func{col}A)^{\perp}=\func{null}A^{T}\) by Lemma [lem:svdlemma4](1).
- The set \(\{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\}\) is orthonormal and contains \(n-r\) vectors, and \(dim \;(\func{null}A)=n-r\) by (1.). So to prove (c.) it is enough to show that \(\mathbf{v}_{j}\in \func{null}A\) whenever \(j>r\). To this end write
\[\lambda_{r+1}=\cdots =\lambda_{n}=0,\quad \mbox{so} \quad \Sigma^{T}\Sigma=\func{diag}(\lambda_{1}^{2},\ldots ,\lambda_{r}^{2},\lambda_{r+1}^{2},\ldots,\lambda_{n}^{2}) \nonumber \]
Observe that each \(\lambda_{j}^{2}\) is an eigenvalue of \(\Sigma^{T}\Sigma\) with eigenvector \(\mathbf{e}_{j}=\) column \(j\) of \(I_{n}\). Thus \(\mathbf{v}_{j}=V\mathbf{e}_{j}\) for each \(j\). As \(A^{T}A=V\Sigma^{T}\Sigma V^{T}\) (proof of Lemma [lem:svdlemma3]), we obtain
\[(A^{T}A)\mathbf{v}_{j}=(V\Sigma^{T}\Sigma V^{T})(V\mathbf{e}_{j})=V(\Sigma^{T}\Sigma \mathbf{e}_{j})=V\left( \lambda_{j}^{2}\mathbf{e}_{j}\right) =\lambda_{j}^{2}V\mathbf{e}_{j}=\lambda_{j}^{2}\mathbf{v}_{j} \nonumber \]
for \(1\leq j\leq n\). Thus each \(\mathbf{v}_{j}\) is an eigenvector of \(A^{T}A\) corresponding to \(\lambda_{j}^{2}\). But then
\[\| A\mathbf{v}_{j}\|^{2}=(A\mathbf{v}_{j})^{T}A\mathbf{v}_{j}=\mathbf{v}_{j}^{T}(A^{T}A\mathbf{v}_{j})=\mathbf{v}_{j}^{T}(\lambda _{j}^{2}\mathbf{v}_{j})=\lambda_{j}^{2}\| \mathbf{v}_{j}\|^{2}=\lambda_{j}^{2}\quad \mbox{for } j=1,\ldots ,n \nonumber \]
In particular, \(A\mathbf{v}_{j}=\mathbf{0}\) whenever \(j>r\), so \(\mathbf{v}_{j}\in \func{null}A\) if \(j>r\), as desired. This proves (c).
- Finally, \(\func{row}A=(\func{null}A)^{\perp}\) by Lemma [lem:svdlemma4](1) and (2). But \(\func{null}A=span \;\{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\}\) by (c.), so Lemma [lem:svdlemma4](3) gives \((\func{null}A)^{\perp}=span \;\{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\}\). This proves (d.), and hence Theorem [thm:svdtheorem2].
Example. Consider the homogeneous linear system
\[A\mathbf{x}=\mathbf{0} \mbox{ of } m \mbox{ equations in } n \mbox{ variables} \nonumber \]
Then the set of all solutions is \(\func{null} A\). Hence if \(A=U\Sigma V^{T}\) is any SVD for \(A\) then (in the notation of Theorem [thm:svdtheorem2]) \(\{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\}\) is an orthonormal basis of the set of solutions for the system. As such they form a set of basic solutions for the system in the sense of Chapter [chap:1].
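In the same spirit, a library SVD hands us orthonormal bases of all four fundamental subspaces at once. A NumPy sketch, using the matrix from the first example (the variable names are our own):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])   # the matrix from the first example
U, s, Vt = np.linalg.svd(A)        # full SVD: U is 2x2, Vt is 3x3
r = int(np.sum(s > 1e-10))         # rank of A (here r = 2)

col_A   = U[:, :r]       # orthonormal basis of col A
null_At = U[:, r:]       # orthonormal basis of null A^T (empty here, r = m)
row_A   = Vt[:r, :].T    # orthonormal basis of row A
null_A  = Vt[r:, :].T    # orthonormal basis of null A

# The columns of null_A are orthonormal basic solutions of A x = 0.
assert np.allclose(A @ null_A, 0)
assert np.allclose(null_A.T @ null_A, np.eye(3 - r))
```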
The Polar Decomposition of a Real Square Matrix
If \(A\) is real and \(n\times n\), any SVD for \(A\) yields a related factorization of \(A\) called the polar decomposition. Unlike the SVD, this decomposition is uniquely determined by \(A\).
Recall (Section [sec:8_3]) that a symmetric matrix \(A\) is called positive definite if and only if \(\mathbf{x}^{T}A\mathbf{x}>0\) for every column \(\mathbf{x}\neq \mathbf{0}\) in \(\mathbb{R}^{n}\). Before proceeding, we must explore the following weaker notion:
Definition [def:svddef5]. A real \(n\times n\) matrix \(G\) is called positive if it is symmetric and
\[\mathbf{x}^{T}G\mathbf{x}\geq 0 \quad \mbox{for all } \mathbf{x}\in \mathbb{R}^{n} \nonumber \]
Clearly every positive definite matrix is positive, but the converse fails. Indeed, \(A= \left[ \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right]\) is positive because, if \(\mathbf{x}=\left[ \begin{array}{cc} a & b \end{array}\right]^{T}\) in \(\mathbb{R}^{2}\), then \(\mathbf{x}^{T}A\mathbf{x}=(a+b)^{2}\geq 0\). But \(\mathbf{y}^{T}A\mathbf{y}=0\) if \(\mathbf{y}=\left[ \begin{array}{cc} 1 & -1 \end{array}\right]^{T}\), so \(A\) is not positive definite.
Lemma [lem:svdlemma5]. Let \(G\) denote an \(n\times n\) positive matrix.
- If \(A\) is any \(n\times m\) matrix, then \(A^{T}GA\) is positive (and \(m\times m\)).
- If \(G=\func{diag}(d_{1}, d_{2},\cdots ,d_{n})\) and each \(d_{i}\geq 0\) then \(G\) is positive.
- \(A^{T}GA\) is symmetric because \(G\) is, and \(\mathbf{x}^{T}(A^{T}GA)\mathbf{x}=(A\mathbf{x})^{T}G(A\mathbf{x})\geq 0\) because \(G\) is positive.
- If \(\mathbf{x}=\left[ \begin{array}{cccc}x_{1}& x_{2} & \cdots & x_{n}\end{array}\right]^{T}\), then
\[\mathbf{x}^{T}G\mathbf{x}=d_{1}x_{1}^{2}+d_{2}x_{2}^{2}+\cdots +d_{n}x_{n}^{2}\geq 0 \nonumber \]
Definition [def:svddef6]. If \(A\) is a real \(n\times n\) matrix, a factorization
\[A=GQ \mbox{ where } G \mbox{ is positive and } Q \mbox{ is orthogonal} \nonumber \]
is called a polar decomposition for \(A\).
Any SVD for a real square matrix \(A\) yields a polar form for \(A\).
Theorem [thm:svdtheorem3]. Every square real matrix has a polar form.
Let \(A=U\Sigma V^{T}\) be an SVD for \(A\) with \(\Sigma\) as in Definition [def:svddef3] and \(m=n\). Since \(U^{T}U=I_{n}\) here we have
\[A=U\Sigma V^{T}=(U\Sigma )(U^{T}U)V^{T}=(U\Sigma U^{T})(UV^{T}) \nonumber \]
So if we write \(G=U\Sigma U^{T}\) and \(Q=UV^{T}\), then \(Q\) is orthogonal, and it remains to show that \(G\) is positive. But \(\Sigma\) is positive by Lemma [lem:svdlemma5](2), so \(G=(U^{T})^{T}\Sigma (U^{T})\) is positive by Lemma [lem:svdlemma5](1).
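A NumPy sketch of this construction (the test matrix is an arbitrary choice, and np.linalg.svd supplies the SVD used in the proof):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])         # an arbitrary square test matrix
U, s, Vt = np.linalg.svd(A)

G = U @ np.diag(s) @ U.T           # G = U Sigma U^T, positive
Q = U @ Vt                         # Q = U V^T, orthogonal

assert np.allclose(G @ Q, A)                    # A = G Q
assert np.allclose(Q @ Q.T, np.eye(2))          # Q is orthogonal
assert np.allclose(G, G.T)                      # G is symmetric...
assert np.all(np.linalg.eigvalsh(G) >= -1e-10)  # ...with eigenvalues >= 0
```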
The SVD for a square matrix \(A\) is not unique (\(I_{n}=PI_{n}P^{T}\) for any orthogonal matrix \(P\)). But given the proof of Theorem [thm:svdtheorem3] it is surprising that the polar decomposition is unique [2]. We omit the proof.
The name “polar form” is reminiscent of the same form for complex numbers (see Appendix [chap:appacomplexnumbers]). This is no coincidence. To see why, we represent the complex numbers as real \(2\times 2\) matrices. Write \(\mathbf{M}_{2}(\mathbb{R})\) for the set of all real \(2\times 2\) matrices, and define
\[\sigma :\mathbb{C}\rightarrow \mathbf{M}_{2}(\mathbb{R})\quad \mbox{by} \quad \sigma (a+bi)= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \mbox{ for all } a+bi \mbox{ in } \mathbb{C} \nonumber \]
One verifies that \(\sigma\) preserves addition and multiplication in the sense that
\[\sigma (zw)=\sigma (z)\sigma (w)\qquad \mbox{and}\qquad \sigma (z+w)=\sigma (z)+\sigma (w) \nonumber \]
for all complex numbers \(z\) and \(w\). Since \(\sigma\) is one-to-one we may identify each complex number \(a+bi\) with the matrix \(\sigma (a+bi)\), that is we write
\[a+bi= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \quad \mbox{for all } a+bi \mbox{ in } \mathbb{C} \nonumber \]
Thus \(0= \left[ \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right]\), \(1= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] =I_{2}\), \(i= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right]\), and \(r = \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right]\) if \(r\) is real.
If \(z=a+bi\) is nonzero then the absolute value \(r=|z| =\sqrt{a^{2}+b^{2}}\neq 0\). If \(\theta\) is the angle of \(z\) in standard position, then \(\cos\theta =a/r\) and \(\sin\theta =b/r\). Observe:
\[\label{svdeqnxiii} \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} a/r & -b/r \\ b/r & a/r \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right] = GQ \tag{\textbf{xiii}} \]
where \(G= \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right]\) is positive and \(Q= \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right]\) is orthogonal. But in \(\mathbb{C}\) we have \(G=r\) and \(Q=\cos\theta +i\sin\theta\) so ([svdeqnxiii]) reads \(z=r(\cos\theta +i\sin\theta )=re^{i\theta }\) which is the classical polar form for the complex number \(a+bi\). This is why ([svdeqnxiii]) is called the polar form of the matrix \(\left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right]\); Definition [def:svddef6] simply adopts the terminology for \(n\times n\) matrices.
The Pseudoinverse of a Matrix
It is impossible for a non-square matrix \(A\) to have an inverse (see the footnote to Definition [def:004202]). Nonetheless, one candidate for an “inverse” of an \(m\times n\) matrix \(A\) is an \(n\times m\) matrix \(B\) such that
\[ABA=A \qquad \mbox{and} \qquad BAB=B \nonumber \]
Such a matrix \(B\) is called a middle inverse for \(A\). If \(A\) is invertible then \(A^{-1}\) is the unique middle inverse for \(A\), but a middle inverse is not unique in general, even for square matrices. For example, if \(A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right]\) then \(B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right]\) is a middle inverse for \(A\) for any \(b\).
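A quick NumPy check of this example (an optional aside; the values of \(b\) are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
for b in (0.0, 5.0):
    B = np.array([[1.0, 0.0, 0.0],
                  [b,   0.0, 0.0]])
    assert np.allclose(A @ B @ A, A)   # ABA = A
    assert np.allclose(B @ A @ B, B)   # BAB = B
    # AB is symmetric for every b, but BA is symmetric only when b = 0:
    print(b, np.allclose(B @ A, (B @ A).T))   # True only for b = 0.0
```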
If \(ABA=A\) and \(BAB=B\) it is easy to see that \(AB\) and \(BA\) are both idempotent matrices. In 1955 Roger Penrose observed that the middle inverse is unique if both \(AB\) and \(BA\) are symmetric. We omit the proof.
Penrose's Theorem
Given any real \(m\times n\) matrix \(A\), there is exactly one \(n\times m\) matrix \(B\) such that \(A\) and \(B\) satisfy the following conditions:
- P1. \(ABA=A\) and \(BAB=B\).
- P2. Both \(AB\) and \(BA\) are symmetric.
Definition [def:svddef7]. Let \(A\) be a real \(m\times n\) matrix. The pseudoinverse of \(A\) is the unique \(n\times m\) matrix \(A^{+}\) such that \(A\) and \(A^{+}\) satisfy P1 and P2, that is:
\[AA^{+}A=A,\qquad A^{+}AA^{+}=A^{+},\qquad \mbox{and both } AA^{+} \mbox{ and } A^{+}A \mbox{ are symmetric} \nonumber \]
If \(A\) is invertible then \(A^{+}=A^{-1}\) as expected. In general, the symmetry in conditions P1 and P2 shows that \(A\) is the pseudoinverse of \(A^{+}\), that is \(A^{++}=A\).
Theorem [thm:svdtheorem4]. Let \(A\) be an \(m\times n\) matrix.
- If \(rank \;A=m\) then \(AA^{T}\) is invertible and \(A^{+}=A^{T}(AA^{T})^{-1}\).
- If \(rank \;A=n\) then \(A^{T}A\) is invertible and \(A^{+}=(A^{T}A)^{-1}A^{T}\).
Here \(AA^{T}\) (respectively \(A^{T}A)\) is invertible by Theorem [thm:015711] (respectively Theorem [thm:015672]). The rest is a routine verification.
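For instance, part (1.) applies to the rank-\(2\) matrix from the first example. A NumPy sketch comparing the formula with NumPy's built-in pseudoinverse:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])            # rank 2 = m, so part (1.) applies
A_plus = A.T @ np.linalg.inv(A @ A.T)       # A+ = A^T (A A^T)^{-1}

assert np.allclose(A_plus, np.linalg.pinv(A))  # agrees with NumPy's pinv
assert np.allclose(A @ A_plus, np.eye(2))      # here A A+ = I_m
```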
In general, given an \(m\times n\) matrix \(A\), the pseudoinverse \(A^{+}\) can be computed from any SVD for \(A\). To see how, we need some notation. Let \(A=U\Sigma V^{T}\) be an SVD for \(A\) (as in Definition [def:svddef3]) where \(U\) and \(V\) are orthogonal and \(\Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n}\) in block form, where \(D=\func{diag}(d_{1},d_{2},\ldots ,d_{r})\) with each \(d_{i}>0\). Hence \(D\) is invertible, so we make the following definition:
Definition [def:svddef8]. \(\Sigma^{\prime }= \left[ \begin{array}{cc} D^{-1} & 0 \\ 0 & 0 \end{array} \right]_{n\times m}.\)
A routine calculation gives:
Lemma [lem:svdlemma6].
- \(\Sigma \Sigma^{\prime }\Sigma =\Sigma\)
- \(\Sigma^{\prime }\Sigma \Sigma^{\prime }=\Sigma^{\prime }\)
- \(\Sigma \Sigma^{\prime }= \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{m\times m}\)
- \(\Sigma^{\prime }\Sigma = \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{n\times n}\)
That is, \(\Sigma^{\prime }\) is the pseudoinverse of \(\Sigma\).
Now given \(A=U\Sigma V^{T}\), define \(B=V\Sigma^{\prime }U^{T}\). Then
\[ABA=(U\Sigma V^{T})(V\Sigma^{\prime }U^{T})(U\Sigma V^{T})=U(\Sigma \Sigma^{\prime }\Sigma )V^{T}=U\Sigma V^{T}=A \nonumber \]
by Lemma [lem:svdlemma6]. Similarly \(BAB=B\). Moreover \(AB=U(\Sigma \Sigma^{\prime })U^{T}\) and \(BA=V(\Sigma^{\prime }\Sigma )V^{T}\) are both symmetric again by Lemma [lem:svdlemma6]. This proves
Theorem [thm:svdtheorem5]. Let \(A\) be real and \(m\times n\), and let \(A=U\Sigma V^{T}\) be any SVD for \(A\) as in Definition [def:svddef3]. Then \(A^{+}=V\Sigma^{\prime }U^{T}\).
Of course we can always use the SVD constructed in Theorem [thm:svdtheorem1] to find the pseudoinverse. If \(A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right]\), we observed above that \(B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right]\) is a middle inverse for \(A\) for any \(b\). Furthermore \(AB\) is symmetric but \(BA\) is not, so \(B\neq A^{+}\).
Example. Find \(A^{+}\) if \(A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right]\).
\(A^{T}A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right]\) with eigenvalues \(\lambda_{1}=1\) and \(\lambda_{2}=0\) and corresponding eigenvectors \(\mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \end{array} \right]\) and \(\mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 1 \end{array} \right]\). Hence \(Q=\left[ \begin{array}{cc}\mathbf{q}_{1}&\mathbf{q}_{2}\end{array}\right]=I_{2}\). Also \(A\) has rank \(1\) with singular values \(\sigma_{1}=1\) and \(\sigma_{2}=0,\) so \(\Sigma_{A}= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right]=A\) and \(\Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]=A^{T}\) in this case.
Since \(A\mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right]\) and \(A\mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right]\), we have \(\mathbf{p}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right]\) which extends to an orthonormal basis \(\{\mathbf{p}_{1},\mathbf{p}_{2},\mathbf{p}_{3}\}\) of \(\mathbb{R}^{3}\) where (say) \(\mathbf{p}_{2}= \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right]\) and \(\mathbf{p}_{3}= \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right]\). Hence \(P=\left[ \begin{array}{ccc} \mathbf{p}_{1}&\mathbf{p}_{2}&\mathbf{p}_{3}\end{array}\right]=I_{3}\), so the SVD for \(A\) is \(A=P\Sigma _{A}Q^{T}\). Finally, the pseudoinverse of \(A\) is \(A^{+}=Q\Sigma^{\prime}_{A}P^{T}=\Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]\). Note that \(A^{+}=A^{T}\) in this case.
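The same computation in NumPy, building \(\Sigma^{\prime}\) from a library SVD as in Theorem [thm:svdtheorem5] (the variable names are our own):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])          # the matrix from the example
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

Sigma_prime = np.zeros((2, 3))              # Sigma' is n x m
Sigma_prime[:r, :r] = np.diag(1.0 / s[:r])  # D^{-1} block

A_plus = Vt.T @ Sigma_prime @ U.T           # A+ = V Sigma' U^T
assert np.allclose(A_plus, np.linalg.pinv(A))
print(A_plus)   # [[1. 0. 0.] [0. 0. 0.]], i.e. A^T, as found above
```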
The following Lemma collects some properties of the pseudoinverse that mimic those of the inverse. The verifications are left as exercises.
Lemma [lem:svdlemma7]. Let \(A\) be an \(m\times n\) matrix of rank \(r\).
- \(A^{++}=A\).
- If \(A\) is invertible then \(A^{+}=A^{-1}\).
- \((A^{T})^{+}=(A^{+})^{T}\).
- \((kA)^{+}=\frac{1}{k}A^{+}\) for any real \(k\neq 0\).
- \((UAV)^{+}=V^{T}(A^{+})U^{T}\) whenever \(U\) and \(V\) are orthogonal.
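These identities are easy to spot-check numerically; here is a NumPy sketch for properties (1.), (3.), (4.), and (5.) (random test matrices and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))             # arbitrary test matrix
pinv = np.linalg.pinv

assert np.allclose(pinv(pinv(A)), A)            # (1.): A++ = A
assert np.allclose(pinv(A.T), pinv(A).T)        # (3.): (A^T)+ = (A+)^T
k = 2.5
assert np.allclose(pinv(k * A), pinv(A) / k)    # (4.): (kA)+ = (1/k) A+
U = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # random orthogonal 3x3
V = np.linalg.qr(rng.standard_normal((5, 5)))[0]   # random orthogonal 5x5
assert np.allclose(pinv(U @ A @ V), V.T @ pinv(A) @ U.T)   # (5.)
```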
1. In fact every complex matrix has an SVD; see J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997.
2. See J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997, page 379.