8.6: The Singular Value Decomposition
When working with a square matrix A it is clearly useful to be able to “diagonalize” A, that is to find a factorization A=Q^{-1}DQ where Q is invertible and D is diagonal. Unfortunately such a factorization may not exist for A. However, even if A is not square, Gaussian elimination provides a factorization of the form A=PDQ where P and Q are invertible and D is diagonal—the Smith Normal Form (Theorem [thm:005369]). However, if A is real we can choose P and Q to be orthogonal real matrices and D to be real. Such a factorization is called a singular value decomposition (SVD) for A, one of the most useful tools in applied linear algebra. In this section we show how to explicitly compute an SVD for any real matrix A, and illustrate some of its many applications.
We need a fact about two subspaces associated with an m\times n matrix A:
\func{im}A=\{A\mathbf{x}\mid \mathbf{x} \mbox{ in } \mathbb{R}^{n}\}\quad \mbox{and} \quad \func{col}A= span \;\{\mathbf{a}\mid \mathbf{a} \mbox{ is a column of } A\} \nonumber
Then \func{im}A is called the image of A (so named because of the linear transformation \mathbb{R}^{n}\rightarrow \mathbb{R}^{m} with \mathbf{x}\mapsto A\mathbf{x}); and \func{col}A is called the column space of A (Definition [def:015376]). Surprisingly, these spaces are equal:
svdlemma1 For any m\times n matrix A, \func{im}A=\func{col}A.
Let A=\left[ \begin{array}{cccc}\mathbf{a}_{1} & \mathbf{a}_{2} & \cdots & \mathbf{a}_{n}\end{array}\right] in terms of its columns. Let \mathbf{x}\in \func{im}A, say \mathbf{x}=A\mathbf{y}, \mathbf{y} in \mathbb{R}^{n}. If \mathbf{y}=\left[ \begin{array}{cccc}y_{1} & y_{2} & \cdots & y_{n}\end{array}\right]^{T}, then A\mathbf{y}=y_{1}\mathbf{a}_{1}+y_{2}\mathbf{a}_{2}+\cdots +y_{n}\mathbf{a}_{n}\in \func{col}A by Definition [def:002668]. This shows that \func{im}A\subseteq \func{col}A. For the other inclusion, each \mathbf{a}_{k}=A\mathbf{e}_{k} where \mathbf{e}_{k} is column k of I_{n}, so each \mathbf{a}_{k}\in \func{im}A and hence \func{col}A\subseteq \func{im}A.
Singular Value Decompositions
We know a lot about any real symmetric matrix: Its eigenvalues are real (Theorem [thm:016397]), and it is orthogonally diagonalizable by the Principal Axes Theorem (Theorem [thm:024303]). So for any real matrix A (square or not), the fact that both A^{T}A and AA^{T} are real and symmetric suggests that we can learn a lot about A by studying them. This section shows just how true this is.
The following Lemma reveals some similarities between A^{T}A and AA^{T} which simplify the statement and the proof of the SVD we are constructing.
svdlemma2 Let A be a real m\times n matrix. Then:
- The eigenvalues of A^{T}A and AA^{T} are real and non-negative.
- A^{T}A and AA^{T} have the same set of positive eigenvalues.
- Let \lambda be an eigenvalue of A^{T}A with eigenvector \mathbf{q}\neq \mathbf{0} in \mathbb{R}^{n}. Then \lambda is real (Theorem [thm:016397]) and \lambda \| \mathbf{q}\|^{2}=\mathbf{q}^{T}(\lambda \mathbf{q})=\mathbf{q}^{T}(A^{T}A\mathbf{q})=(A\mathbf{q})^{T}(A\mathbf{q})=\| A\mathbf{q}\|^{2}\geq 0, so \lambda \geq 0. Then (1.) follows for A^{T}A, and the case AA^{T} follows by replacing A by A^{T}.
- Suppose \lambda >0 is an eigenvalue of A^{T}A, with eigenvector \mathbf{q}\neq \mathbf{0}, so that A^{T}A\mathbf{q}=\lambda \mathbf{q}. Then (AA^{T})(A\mathbf{q})=A(A^{T}A\mathbf{q})=A(\lambda \mathbf{q})=\lambda (A\mathbf{q}). Moreover, A\mathbf{q}\neq \mathbf{0} since A^{T}A\mathbf{q}=\lambda \mathbf{q}\neq \mathbf{0} and both \lambda \neq 0 and \mathbf{q}\neq \mathbf{0}. Hence \lambda is an eigenvalue of AA^{T}, proving that every positive eigenvalue of A^{T}A is a positive eigenvalue of AA^{T}. For the other inclusion replace A by A^{T}.
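As an aside, Lemma [lem:svdlemma2] is easy to illustrate numerically. The NumPy sketch below is an illustration only (the matrix is an arbitrary choice, not from the text): it checks that the eigenvalues of A^{T}A and AA^{T} are non-negative and that the positive ones agree.

```python
import numpy as np

# An arbitrary 3x2 real matrix; any real matrix would do here.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])

eig_AtA = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of A^T A (2 of them)
eig_AAt = np.linalg.eigvalsh(A @ A.T)   # eigenvalues of A A^T (3 of them)

print(np.round(eig_AtA, 6))  # all >= 0
print(np.round(eig_AAt, 6))  # all >= 0; same positive values, plus extra zeros
```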
To analyze an m\times n matrix A we have two symmetric matrices to work with: A^{T}A and AA^{T}. In view of Lemma [lem:svdlemma2], we choose A^{T}A (sometimes called the Gram matrix of A), and derive a series of facts which we will need. This narrative is a bit long, but trust that it will be worth the effort. We parse it out in several steps:
- The n\times n matrix A^{T}A is real and symmetric so, by the Principal Axes Theorem [thm:024303], let \{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\}\subseteq \mathbb{R}^{n} be an orthonormal basis of eigenvectors of A^{T}A, with corresponding eigenvalues \lambda_{1},\lambda_{2},\ldots ,\lambda_{n}. By Lemma [lem:svdlemma2](1), \lambda_{i} is real for each i and \lambda_{i}\geq 0. By re-ordering the \mathbf{q}_{i} we may (and do) assume that
\label{svdeqni} \lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0 \quad \mbox{and} \quad \lambda_{i}=0 \mbox{ if } i>r \tag{\textbf{i}}
(Of course they could all be positive (r=n) or all zero, so that A^{T}A=0 and hence A=0 by Exercise ???.)
\label{svdeqnii} Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right] \mbox{ is orthogonal and orthogonally diagonalizes } A^{T}A \tag{\textbf{ii}}
- The vectors A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{n} are in \mathbb{R}^{m}, and for all i and j we have A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=(A\mathbf{q}_{i})^{T}(A\mathbf{q}_{j})=\mathbf{q}_{i}^{T}(A^{T}A\mathbf{q}_{j})=\mathbf{q}_{i}^{T}(\lambda_{j}\mathbf{q}_{j})=\lambda_{j}(\mathbf{q}_{i}\dotprod \mathbf{q}_{j}). Because \{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\} is an orthonormal set, this gives
\label{svdeqniii} A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=0 \mbox{ if } i\neq j \quad \mbox{and} \quad \| A\mathbf{q}_{i}\|^{2}=\lambda_{i}\| \mathbf{q}_{i}\|^{2}=\lambda_{i} \mbox{ for each } i \tag{\textbf{iii}}
We can extract two conclusions from ([svdeqniii]) and ([svdeqni]):
\label{svdeqniv} \{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq \func{im}A \mbox{ is an orthogonal set} \quad \mbox{and} \quad A\mathbf{q}_{i}=\mathbf{0} \mbox{ if } i>r \tag{\textbf{iv}}
With this write U= span \;\{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq \func{im}A; we claim that U=\func{im}A, that is \func{im}A\subseteq U. For this we must show that A\mathbf{x}\in U for each \mathbf{x}\in \mathbb{R}^{n}. Since \{\mathbf{q}_{1},\ldots ,\mathbf{q}_{r},\ldots ,\mathbf{q}_{n}\} is a basis of \mathbb{R}^{n} (it is orthonormal), we can write \mathbf{x}=t_{1}\mathbf{q}_{1}+\cdots +t_{r}\mathbf{q}_{r}+\cdots +t_{n}\mathbf{q}_{n} where each t_{j}\in \mathbb{R}. Then, using ([svdeqniv]) we obtain
A\mathbf{x}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}+\cdots +t_{n}A\mathbf{q}_{n}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}\in U \nonumber
This shows that U=\func{im}A, and so
\label{svdeqnv} \{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\} \mbox{ is an \emph{orthogonal} basis of } \func{im}A \tag{\textbf{v}}
But \func{col}A=\func{im}A by Lemma [lem:svdlemma1], and rank \;A= dim \;(\func{col}A) by Theorem [thm:015444], so
\label{svdeqnvi} rank \;A= dim \;(\func{col}A)= dim \;(\func{im}A)\overset{(\textbf{v})}{=}r \tag{\textbf{vi}}
- svddef1 The real numbers \sigma_{i}=\sqrt{\lambda_{i}}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\| for i=1,2,\ldots ,n, are called the singular values of the matrix A.
Clearly \sigma_{1},\sigma_{2},\ldots ,\sigma_{r} are the positive singular values of A. By ([svdeqni]) we have
\label{svdeqnvii} \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 \quad \mbox{and} \quad \sigma_{i}=0 \mbox{ if } i>r \tag{\textbf{vii}}
With ([svdeqnvi]) this makes the following definitions depend only upon A.
svddef2 Let A be a real m\times n matrix of rank r, with positive singular values \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 and \sigma_{i}=0 if i>r. Define:
D_{A}=\func{diag}(\sigma_{1},\ldots ,\sigma_{r})\quad \mbox{and} \quad \Sigma_{A}= \left[ \begin{array}{cc} D_{A} & 0 \\ 0 & 0 \end{array} \right]_{m\times n} \nonumber
Here \Sigma_{A} is in block form and is called the singular matrix of A.
The singular values \sigma_{i} and the matrices D_{A} and \Sigma_{A} will be referred to frequently below.
- Returning to our narrative, normalize the vectors A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r} by defining
\label{svdeqnviii} \mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\|}A\mathbf{q}_{i}\in \mathbb{R}^{m}\quad \mbox{for each } i=1,2,\ldots ,r \tag{\textbf{viii}}
\label{svdeqnix} \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} \mbox{ is an \emph{orthonormal} basis of } \func{col}A\subseteq \mathbb{R}^{m} \tag{\textbf{ix}}
Employing the Gram-Schmidt algorithm (or otherwise), construct \mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m} so that
\label{svdeqnx} \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots ,\mathbf{p}_{m}\} \mbox{ is an orthonormal basis of } \mathbb{R}^{m} \tag{\textbf{x}}
- By ([svdeqnii]) and ([svdeqnx]) we have two orthogonal matrices Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right] and P=\left[ \begin{array}{ccccc}\mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \cdots & \mathbf{p}_{m}\end{array}\right] of sizes n\times n and m\times m respectively. These matrices are related. In fact we have:
\label{svdeqnxi} \sigma_{i}\mathbf{p}_{i}=\sqrt{\lambda_{i}}\mathbf{p}_{i}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\| \mathbf{p}_{i}\overset{(\textbf{viii})}{=}A\mathbf{q}_{i}\quad \mbox{for each } i=1,2,\ldots ,r \tag{\textbf{xi}}
This yields the following expression for AQ in terms of its columns:
\label{svdeqnxii} AQ=\left[ \begin{array}{cccccc} A\mathbf{q}_{1} & \cdots & A\mathbf{q}_{r} & A\mathbf{q}_{r+1} & \cdots & A\mathbf{q}_{n} \end{array}\right]\overset{(\textbf{iv})}{=}\left[ \begin{array}{cccccc} \sigma_{1}\mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \tag{\textbf{xii}}
Then we compute:
P\Sigma_{A}=\left[ \begin{array}{cccccc} \mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \mathbf{p}_{r+1} & \cdots & \mathbf{p}_{m} \end{array}\right] \left[ \begin{array}{cccccc} \sigma_{1} & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & \sigma_{r} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{array} \right] =\left[ \begin{array}{cccccc} \sigma_{1}\mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right]\overset{(\textbf{xii})}{=}AQ \nonumber
Finally, as Q^{-1}=Q^{T} it follows that A=P\Sigma_{A}Q^{T}.
With this we can state the main theorem of this Section.
svdtheorem1 Let A be a real m\times n matrix, and let \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 be the positive singular values of A. Then r is the rank of A and we have the factorization
A=P\Sigma_{A}Q^{T}\quad \mbox{where } P \mbox{ and } Q \mbox{ are orthogonal matrices} \nonumber
The factorization A=P\Sigma_{A}Q^{T} in Theorem [thm:svdtheorem1], where P and Q are orthogonal matrices, is called a Singular Value Decomposition (SVD) of A. This decomposition is not unique. For example if r<m then the vectors \mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m} can be any extension of \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r}\} to an orthonormal basis of \mathbb{R}^{m}, and each will lead to a different matrix P in the decomposition. For a more dramatic example, if A=I_{n} then \Sigma_{A}=I_{n}, and A=P\Sigma_{A}P^{T} is an SVD of A for any orthogonal n\times n matrix P.
svdexample1 Find a singular value decomposition for A= \left[ \begin{array}{rrr} 1 & 0 & 1 \\ -1 & 1 & 0 \end{array} \right].
We have A^{T}A= \left[ \begin{array}{rrr} 2 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right], so the characteristic polynomial is
c_{A^{T}A}(x)=\det \left( xI-A^{T}A\right) =x^{3}-4x^{2}+3x=x(x-1)(x-3) \nonumber
Hence the eigenvalues of A^{T}A (in descending order) are \lambda_{1}=3, \lambda_{2}=1 and \lambda_{3}=0 with, respectively, unit eigenvectors
\mathbf{q}_{1}=\frac{1}{\sqrt{6}} \left[ \begin{array}{r} 2 \\ -1 \\ 1 \end{array} \right], \quad \mathbf{q}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right],\quad \mbox{and} \quad \mathbf{q}_{3}=\frac{1}{\sqrt{3}} \left[ \begin{array}{r} -1 \\ -1 \\ 1 \end{array} \right] \nonumber
It follows that the orthogonal matrix Q in Theorem [thm:svdtheorem1] is
Q=\left[ \begin{array}{ccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \mathbf{q}_{3}\end{array}\right]=\frac{1}{\sqrt{6}} \left[ \begin{array}{rrr} 2 & 0 & -\sqrt{2} \\ -1 & \sqrt{3} & -\sqrt{2} \\ 1 & \sqrt{3} & \sqrt{2} \end{array} \right] \nonumber
The singular values here are \sigma_{1}=\sqrt{3}, \sigma_{2}=1 and \sigma_{3}=0, so rank \;(A)=2—clear in this case—and the singular matrix is
\Sigma_{A}= \left[ \begin{array}{ccc} \sigma_{1} & 0 & 0 \\ 0 & \sigma_{2} & 0 \end{array} \right] =\left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \nonumber
So it remains to find the 2\times 2 orthogonal matrix P in Theorem [thm:svdtheorem1]. This involves the vectors
A\mathbf{q}_{1}=\frac{\sqrt{6}}{2} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right],\quad A\mathbf{q}_{2}=\frac{\sqrt{2}}{2} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] ,\quad \mbox{and} \quad A\mathbf{q}_{3}= \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \nonumber
Normalize A\mathbf{q}_{1} and A\mathbf{q}_{2} to get
\mathbf{p}_{1}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right] \quad \mbox{and} \quad \mathbf{p}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \nonumber
In this case, \{\mathbf{p}_{1},\mathbf{p}_{2}\} is already a basis of \mathbb{R}^{2} (so the Gram-Schmidt algorithm is not needed), and we have the 2\times 2 orthogonal matrix
P=\left[ \begin{array}{cc} \mathbf{p}_{1} & \mathbf{p}_{2} \end{array}\right]=\frac{1}{\sqrt{2}} \left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \nonumber
Finally (by Theorem [thm:svdtheorem1]) the singular value decomposition for A is
A=P\Sigma_{A}Q^{T}= \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \frac{1}{\sqrt{6}}\left[ \begin{array}{rrr} 2 & -1 & 1 \\ 0 & \sqrt{3} & \sqrt{3} \\ -\sqrt{2} & -\sqrt{2} & \sqrt{2} \end{array} \right] \nonumber
Of course this can be confirmed by direct matrix multiplication.
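Indeed, the factorization is easy to confirm numerically. The NumPy sketch below is illustrative only: it rebuilds P, \Sigma_{A} and Q from the example and checks that P\Sigma_{A}Q^{T}=A, that Q is orthogonal, and that a library routine reports the same singular values.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])

P = (1 / np.sqrt(2)) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])
Sigma_A = np.array([[np.sqrt(3), 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
Q = (1 / np.sqrt(6)) * np.array([[2.0, 0.0, -np.sqrt(2)],
                                 [-1.0, np.sqrt(3), -np.sqrt(2)],
                                 [1.0, np.sqrt(3), np.sqrt(2)]])

print(np.allclose(P @ Sigma_A @ Q.T, A))                 # True: A = P Sigma_A Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))                   # Q is orthogonal
print(np.round(np.linalg.svd(A, compute_uv=False), 6))   # [1.732051, 1.0] = [sqrt(3), 1]
```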
Thus, computing an SVD for a real matrix A is a routine matter, and we now describe a systematic procedure for doing so.
SVD Algorithm svdalgorithm Given a real m\times n matrix A, find an SVD A=P\Sigma_{A}Q^{T} as follows:
- Use the Diagonalization Algorithm (see page ) to find the (real and non-negative) eigenvalues \lambda_{1},\lambda_{2},\ldots,\lambda_{n} of A^{T}A with corresponding (orthonormal) eigenvectors \mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}. Reorder the \mathbf{q}_{i} (if necessary) to ensure that the nonzero eigenvalues are \lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0 and \lambda_{i}=0 if i>r.
- The integer r is the rank of the matrix A.
- The n\times n orthogonal matrix Q in the SVD is Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right].
- Define \mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\|}A\mathbf{q}_{i} for i=1,2,\ldots ,r (where r is as in step 1). Then \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} is orthonormal in \mathbb{R}^{m} so (using Gram-Schmidt or otherwise) extend it to an orthonormal basis \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots , \mathbf{p}_{m}\} in \mathbb{R}^{m}.
- The m\times m orthogonal matrix P in the SVD is P=\left[ \begin{array}{ccccc} \mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \cdots & \mathbf{p}_{m}\end{array}\right].
- The singular values for A are \sigma_{1},\sigma_{2},\ldots,\sigma_{n} where \sigma_{i}=\sqrt{\lambda_{i}} for each i. Hence the nonzero singular values are \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0, and so the singular matrix of A in the SVD is \Sigma_{A}= \left[ \begin{array}{cc} \func{diag}(\sigma_{1},\ldots ,\sigma_{r}) & 0 \\ 0 & 0 \end{array} \right]_{m\times n}.
- Thus A=P\Sigma_{A}Q^{T} is an SVD for A (a computational sketch of these steps follows this list).
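The steps above translate almost directly into code. The NumPy sketch below is a minimal illustration of the SVD Algorithm (the function name and the tolerance are our choices, not part of the text): it diagonalizes A^{T}A, orders the eigenpairs, builds Q, \Sigma_{A} and P, and uses a QR factorization in place of the Gram-Schmidt extension in Step 4.

```python
import numpy as np

def svd_by_algorithm(A, tol=1e-12):
    """Construct an SVD A = P @ Sigma @ Q.T following the textbook steps."""
    m, n = A.shape
    # Step 1: orthonormal eigenvectors of A^T A, eigenvalues reordered to be decreasing.
    lam, Qmat = np.linalg.eigh(A.T @ A)           # eigh returns ascending order
    order = np.argsort(lam)[::-1]
    lam, Qmat = lam[order], Qmat[:, order]
    lam[lam < tol] = 0.0                          # clip roundoff / tiny negatives
    r = int(np.sum(lam > 0))                      # Step 2: r = rank A
    sigma = np.sqrt(lam)                          # Step 6: singular values
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    # Step 4: p_i = A q_i / ||A q_i|| for i = 1..r, then extend to an orthonormal basis of R^m.
    P_cols = A @ Qmat[:, :r] / sigma[:r]
    P, _ = np.linalg.qr(np.hstack([P_cols, np.eye(m)]))   # QR used in place of Gram-Schmidt
    P[:, :r] = P_cols                             # keep the prescribed first r columns
    return P, Sigma, Qmat

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
P, Sigma, Q = svd_by_algorithm(A)
print(np.allclose(P @ Sigma @ Q.T, A))   # True
```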
In practice the singular values \sigma_{i}, the matrices P and Q, and even the rank of an m\times n matrix are not calculated this way. There are sophisticated numerical algorithms for calculating them to a high degree of accuracy. The reader is referred to books on numerical linear algebra.
So the main virtue of Theorem [thm:svdtheorem1] is that it provides a way of constructing an SVD for every real matrix A. In particular it shows that every real matrix A has a singular value decomposition1 in the following, more general, sense:
svddef3 A Singular Value Decomposition (SVD) of an m\times n matrix A of rank r is a factorization A=U\Sigma V^{T} where U and V are orthogonal and \Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n} in block form, where D=\func{diag}(d_{1},d_{2},\ldots ,d_{r}) with each d_{i}>0, and r\leq m and r\leq n.
Note that for any SVD A=U\Sigma V^{T} we immediately obtain some information about A:
svdlemma3 If A=U\Sigma V^{T} is any SVD for A as in Definition [def:svddef3], then:
- r=rank \;A.
- The numbers d_{1},d_{2},\dots ,d_{r} are the singular values of A in some order.
Use the notation of Definition [def:svddef3]. We have
A^{T}A=(V\Sigma^{T}U^{T})(U\Sigma V^{T})=V(\Sigma^{T}\Sigma)V^{T} \nonumber
so \Sigma^{T}\Sigma and A^{T}A are similar n\times n matrices (Definition [def:015930]). Hence r=rank \; A by Corollary [cor:015519], proving (1.). Furthermore, \Sigma^{T}\Sigma and A^{T}A have the same eigenvalues by Theorem [thm:016008]; that is (using (1.)):
\{d_{1}^{2},d_{2}^{2},\dots ,d_{r}^{2}\}=\{\lambda_{1},\lambda_{2},\dots ,\lambda_{r}\} \quad \mbox{are equal as sets} \nonumber
where \lambda_{1},\lambda_{2},\dots ,\lambda_{r} are the positive eigenvalues of A^{T}A. Hence there is a permutation \tau of \{1,2,\cdots ,r\} such that d_{i}^{2}=\lambda_{i\tau } for each i=1,2,\dots ,r. Hence d_{i}=\sqrt{\lambda_{i\tau }}=\sigma_{i\tau } for each i by Definition [def:svddef1]. This proves (2.).
We note in passing that more is true. Let A be m\times n of rank r, and let A=U\Sigma V^{T} be any SVD for A. Using the proof of Lemma [lem:svdlemma3] we have d_{i}=\sigma_{i\tau } for some permutation \tau of \{1,2,\dots ,r\}. In fact, it can be shown that there exist orthogonal matrices U_{1} and V_{1} obtained from U and V by \tau-permuting columns and rows respectively, such that A=U_{1}\Sigma_{A}V_{1}^{T} is an SVD of A.
Fundamental Subspaces
It turns out that any singular value decomposition contains a great deal of information about an m\times n matrix A and the subspaces associated with A. For example, in addition to Lemma [lem:svdlemma3], the set \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} of vectors constructed in the proof of Theorem [thm:svdtheorem1] is an orthonormal basis of \func{col} A (by ([svdeqnv]) and ([svdeqnviii]) in the proof). There are more such examples, which is the thrust of this subsection. In particular, there are four subspaces associated to a real m\times n matrix A that have come to be called fundamental:
svddef4 The fundamental subspaces of an m\times n matrix A are:
- \func{row}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a row of } A\}
- \func{col}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a column of } A\}
- \func{null}A=\{\mathbf{x}\in \mathbb{R}^{n}\mid A\mathbf{x}=\mathbf{0}\}
- \func{null}A^{T}=\{\mathbf{x}\in \mathbb{R}^{m}\mid A^{T}\mathbf{x}=\mathbf{0}\}
If A=U\Sigma V^{T} is any SVD for the real m\times n matrix A, the columns of U and V provide orthonormal bases for each of these fundamental subspaces. We are going to prove this, but first we need three properties related to the orthogonal complement U^{\perp} of a subspace U of \mathbb{R}^{n}, where (Definition [def:023776]):
U^{\perp}=\{\mathbf{x}\in \mathbb{R}^{n}\mid \mathbf{u}\bullet \mathbf{x}=0 \mbox{ for all } \mathbf{u}\in U\} \nonumber
The orthogonal complement plays an important role in the Projection Theorem (Theorem [thm:023885]), and we return to it in Section [sec:10_2]. For now we need:
svdlemma4 If A is any matrix then:
- (\func{row}A)^{\perp}=\func{null}A and (\func{col}A)^{\perp}=\func{null}A^{T}.
- If U is any subspace of \mathbb{R}^{n} then U^{\perp\perp}=U.
- Let \{\mathbf{f}_{1},\ldots ,\mathbf{f}_{m}\} be an orthonormal basis of \mathbb{R}^{m}. If U=span \;\{\mathbf{f}_{1},\ldots ,\mathbf{f}_{k}\}, then
U^{\perp}=span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\} \nonumber
- Assume A is m\times n and let \mathbf{x}\in \mathbb{R}^{n}. Then \mathbf{x}\in \func{null}A if and only if A\mathbf{x}=\mathbf{0}, that is if and only if \mathbf{b}\dotprod \mathbf{x}=0 for every row \mathbf{b} of A, that is if and only if \mathbf{x}\in (\func{row}A)^{\perp}. Hence \func{null}A=(\func{row}A)^{\perp}. Now replace A by A^{T} to get \func{null}A^{T}=(\func{row}A^{T})^{\perp}=(\func{col}A)^{\perp}, which is the other identity in (1).
- If \mathbf{x}\in U then \mathbf{y}\dotprod \mathbf{x}=0 for all \mathbf{y}\in U^{\perp}, that is \mathbf{x}\in U^{\perp\perp}. This proves that U\subseteq U^{\perp\perp}, so it is enough to show that dim \;U=dim \;U^{\perp\perp}. By Theorem [thm:023953] we see that dim \;V^{\perp}=n-dim \;V for any subspace V\subseteq \mathbb{R}^{n}. Hence
dim \;U^{\perp\perp}=n-dim \;U^{\perp}=n-(n-dim \;U)=dim \;U, \mbox{ as required} \nonumber
- Since \mathbf{f}_{j}\dotprod \mathbf{f}_{i}=0 whenever j>k and i\leq k, each \mathbf{f}_{j} with j>k lies in U^{\perp}, so span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\}\subseteq U^{\perp}. Conversely, let \mathbf{x}\in U^{\perp}, so that \mathbf{f}_{i}\dotprod \mathbf{x}=0 for i=1,2,\ldots ,k. By the expansion theorem,
\begin{array}{ccccccccccccc} \mathbf{x} & = & (\mathbf{f}_{1}\dotprod \mathbf{x})\mathbf{f}_{1} & + & \cdots & + & (\mathbf{f}_{k}\dotprod \mathbf{x})\mathbf{f}_{k} & + &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1} & + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \\ & = & \mathbf{0} &+ &\cdots &+ &\mathbf{0} &+ &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1}& + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \end{array} \nonumber
Hence \mathbf{x}\in span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\}, proving (3).
With this we can see how any SVD for a matrix A provides orthonormal bases for each of the four fundamental subspaces of A.
svdtheorem2 Let A be an m\times n real matrix, let A=U\Sigma V^{T} be any SVD for A where U and V are orthogonal of size m\times m and n\times n respectively, and let
\Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n}\qquad \mbox{where} \qquad D=\func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}), \mbox{ with each } \lambda_{i}>0 \nonumber
Write U=\left[ \begin{array}{ccccc}\mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m}\end{array}\right] and V=\left[ \begin{array}{ccccc}\mathbf{v}_{1} & \cdots & \mathbf{v}_{r} & \cdots & \mathbf{v}_{n}\end{array}\right], so \{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r},\ldots ,\mathbf{u}_{m}\} and \{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r},\ldots ,\mathbf{v}_{n}\} are orthonormal bases of \mathbb{R}^{m} and \mathbb{R}^{n} respectively. Then
- r=rank \;A, and the singular values of A are \lambda_{1},\lambda_{2},\ldots ,\lambda_{r}.
- The fundamental spaces are described as follows:
- \{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\} is an orthonormal basis of \func{col}A.
- \{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\} is an orthonormal basis of \func{null}A^{T}.
- \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is an orthonormal basis of \func{null}A.
- \{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\} is an orthonormal basis of \func{row}A.
- This is Lemma [lem:svdlemma3].
- As \func{col}A=\func{col}(AV) by Lemma [lem:015527] and AV=U\Sigma, (a.) follows from
U\Sigma =\left[ \begin{array}{ccccc} \mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m} \end{array}\right] \left[ \begin{array}{cc} \func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}) & 0 \\ 0 & 0 \end{array} \right] = \left[ \begin{array}{cccccc} \lambda_{1}\mathbf{u}_{1} & \cdots & \lambda_{r}\mathbf{u}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \nonumber
- We have (\func{col}A)^{\perp}\overset{\text{(a.)}}{=}(span \;\{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\})^{\perp}=span \;\{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\} by Lemma [lem:svdlemma4](3). This proves (b.) because (\func{col}A)^{\perp}=\func{null}A^{T} by Lemma [lem:svdlemma4](1).
- The set \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is orthonormal (hence independent), and dim \;(\func{null}A)=n- rank \;A=n-r. So to prove (c.) it is enough to show that \mathbf{v}_{j}\in \func{null}A whenever j>r. To this end write
\lambda_{r+1}=\cdots =\lambda_{n}=0,\quad \mbox{so} \quad \Sigma^{T}\Sigma =\func{diag}(\lambda_{1}^{2},\ldots ,\lambda_{r}^{2},\lambda_{r+1}^{2},\ldots,\lambda_{n}^{2}) \nonumber
Observe that each \lambda_{j}^{2} is an eigenvalue of \Sigma^{T}\Sigma with eigenvector \mathbf{e}_{j}= column j of I_{n}. Thus \mathbf{v}_{j}=V\mathbf{e}_{j} for each j. As A^{T}A=V\Sigma^{T}\Sigma V^{T} (proof of Lemma [lem:svdlemma3]), we obtain
(A^{T}A)\mathbf{v}_{j}=(V\Sigma^{T}\Sigma V^{T})(V\mathbf{e}_{j})=V(\Sigma^{T}\Sigma \mathbf{e}_{j})=V\left( \lambda_{j}^{2}\mathbf{e}_{j}\right) =\lambda_{j}^{2}V\mathbf{e}_{j}=\lambda_{j}^{2}\mathbf{v}_{j} \nonumber
for 1\leq j\leq n. Thus each \mathbf{v}_{j} is an eigenvector of A^{T}A corresponding to \lambda_{j}^{2}. But then
\| A\mathbf{v}_{j}\|^{2}=(A\mathbf{v}_{j})^{T}A\mathbf{v}_{j}=\mathbf{v}_{j}^{T}(A^{T}A\mathbf{v}_{j})=\mathbf{v}_{j}^{T}(\lambda _{j}^{2}\mathbf{v}_{j})=\lambda_{j}^{2}\| \mathbf{v}_{j}\|^{2}=\lambda_{j}^{2}\quad \mbox{for } j=1,\ldots ,n \nonumber
In particular, A\mathbf{v}_{j}=\mathbf{0} whenever j>r, so \mathbf{v}_{j}\in \func{null}A if j>r, as desired. This proves (c).
- Finally, (\func{row}A)^{\perp}=\func{null}A by Lemma [lem:svdlemma4](1), so \func{row}A=(\func{null}A)^{\perp} by Lemma [lem:svdlemma4](2). Since \func{null}A= span \;\{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} by (c.), Lemma [lem:svdlemma4](3) gives \func{row}A= span \;\{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\}. This proves (d.), and hence Theorem [thm:svdtheorem2].
svdexample2 Consider the homogeneous linear system
A\mathbf{x}=\mathbf{0} \mbox{ of } m \mbox{ equations in } n \mbox{ variables} \nonumber
Then the set of all solutions is \func{null} A. Hence if A=U\Sigma V^{T} is any SVD for A then (in the notation of Theorem [thm:svdtheorem2]) \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is an orthonormal basis of the set of solutions for the system. As such they are a set of basic solutions for the system, the most basic notion in Chapter [chap:1].
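In computations this observation is used directly: any library SVD yields orthonormal bases of all four fundamental subspaces at once. The NumPy sketch below (illustrative only; the rank tolerance is our choice) splits the columns of U and V as in Theorem [thm:svdtheorem2], using the matrix of Example [exa:svdexample1].

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])

U, s, Vt = np.linalg.svd(A)        # A = U @ diag(s) @ Vt, with full U (2x2) and Vt (3x3)
r = int(np.sum(s > 1e-12))         # numerical rank

col_A   = U[:, :r]     # orthonormal basis of col A
null_At = U[:, r:]     # orthonormal basis of null A^T
row_A   = Vt[:r, :].T  # orthonormal basis of row A
null_A  = Vt[r:, :].T  # orthonormal basis of null A (basic solutions of Ax = 0)

print(r)                               # 2
print(np.allclose(A @ null_A, 0))      # columns of null_A solve Ax = 0
print(np.allclose(A.T @ null_At, 0))   # columns of null_At solve A^T x = 0
```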
The Polar Decomposition of a Real Square Matrix
If A is real and n\times n, the factorization in the title is called the polar decomposition of A. Unlike the SVD, in this case the decomposition is uniquely determined by A.
Recall (Section [sec:8_3]) that a symmetric matrix A is called positive definite if and only if \mathbf{x}^{T}A\mathbf{x}>0 for every column \mathbf{x} \neq \mathbf{0} \in \mathbb{R}^{n}. Before proceeding, we must explore the following weaker notion:
svddef5 A real n\times n matrix G is called positive if it is symmetric and
\mathbf{x}^{T}G\mathbf{x}\geq 0 \quad \mbox{for all } \mathbf{x}\in \mathbb{R}^{n} \nonumber
Clearly every positive definite matrix is positive, but the converse fails. Indeed, A= \left[ \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right] is positive because, if \mathbf{x}=\left[ \begin{array}{cc} a & b \end{array}\right]^{T} in \mathbb{R}^{2}, then \mathbf{x}^{T}A\mathbf{x}=(a+b)^{2}\geq 0. But \mathbf{y}^{T}A\mathbf{y}=0 if \mathbf{y}=\left[ \begin{array}{cc} 1 & -1 \end{array}\right]^{T}, so A is not positive definite.
svdlemma5 Let G denote an n\times n positive matrix.
- If A is any n\times m matrix and G is positive, then A^{T}GA is positive (and m\times m).
- If G=\func{diag}(d_{1}, d_{2},\cdots ,d_{n}) and each d_{i}\geq 0 then G is positive.
- The matrix A^{T}GA is symmetric because G is, and \mathbf{x}^{T}(A^{T}GA)\mathbf{x}=(A\mathbf{x})^{T}G(A\mathbf{x})\geq 0 because G is positive.
- If \mathbf{x}=\left[ \begin{array}{cccc}x_{1}& x_{2} & \cdots & x_{n}\end{array}\right]^{T}, then
\mathbf{x}^{T}G\mathbf{x}=d_{1}x_{1}^{2}+d_{2}x_{2}^{2}+\cdots +d_{n}x_{n}^{2}\geq 0 \nonumber
svddef6 If A is a real n\times n matrix, a factorization
A=GQ \mbox{ where } G \mbox{ is positive and } Q \mbox{ is orthogonal} \nonumber
is called a polar decomposition for A.
Any SVD for a real square matrix A yields a polar form for A.
svdtheorem3 Every square real matrix has a polar form.
Let A=U\Sigma V^{T} be a SVD for A with \Sigma as in Definition [def:svddef3] and m=n. Since U^{T}U=I_{n} here we have
A=U\Sigma V^{T}=(U\Sigma )(U^{T}U)V^{T}=(U\Sigma U^{T})(UV^{T}) \nonumber
So if we write G=U\Sigma U^{T} and Q=UV^{T}, then Q is orthogonal, and it remains to show that G is positive. But this follows from Lemma [lem:svdlemma5].
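The proof is constructive, and the construction is easy to carry out numerically. The NumPy sketch below is an illustration only (the matrix is an arbitrary choice): it builds G=U\Sigma U^{T} and Q=UV^{T} from a library SVD and checks that A=GQ with G positive and Q orthogonal.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0, 3.0]])          # any real square matrix

U, s, Vt = np.linalg.svd(A)         # A = U @ diag(s) @ Vt
G = U @ np.diag(s) @ U.T            # positive: symmetric with eigenvalues s >= 0
Q = U @ Vt                          # orthogonal

print(np.allclose(A, G @ Q))                     # True: A = GQ
print(np.allclose(G, G.T))                       # G is symmetric
print(np.all(np.linalg.eigvalsh(G) >= -1e-12))   # eigenvalues of G are non-negative
print(np.allclose(Q.T @ Q, np.eye(2)))           # Q is orthogonal
```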
The SVD for a square matrix A is not unique (I_{n}=PI_{n}P^{T} for any orthogonal matrix P). But given the proof of Theorem [thm:svdtheorem3] it is surprising that the polar decomposition is unique.2 We omit the proof.
The name “polar form” is reminiscent of the same form for complex numbers (see Appendix [chap:appacomplexnumbers]). This is no coincidence. To see why, we represent the complex numbers as real 2\times 2 matrices. Write \mathbf{M}_{2}(\mathbb{R}) for the set of all real 2\times 2 matrices, and define
\sigma :\mathbb{C}\rightarrow \mathbf{M}_{2}(\mathbb{R})\quad \mbox{by} \quad \sigma (a+bi)= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \mbox{ for all } a+bi \mbox{ in } \mathbb{C} \nonumber
One verifies that \sigma preserves addition and multiplication in the sense that
\sigma (zw)=\sigma (z)\sigma (w)\qquad \mbox{and}\qquad \sigma (z+w)=\sigma (z)+\sigma (w) \nonumber
for all complex numbers z and w. Since \sigma is one-to-one we may identify each complex number a+bi with the matrix \sigma (a+bi), that is we write
a+bi= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \quad \mbox{for all } a+bi \mbox{ in } \mathbb{C} \nonumber
Thus 0= \left[ \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right], 1= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] =I_{2}, i= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right], and r = \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] if r is real.
If z=a+bi is nonzero then the absolute value r=|z| =\sqrt{a^{2}+b^{2}}\neq 0. If \theta is the angle of z in standard position, then \cos\theta =a/r and \sin\theta =b/r. Observe:
\label{svdeqnxiii} \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} a/r & -b/r \\ b/r & a/r \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right] = GQ \tag{\textbf{xiii}}
where G= \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] is positive and Q= \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right] is orthogonal. But in \mathbb{C} we have G=r and Q=\cos\theta +i\sin\theta so ([svdeqnxiii]) reads z=r(\cos\theta +i\sin\theta )=re^{i\theta } which is the classical polar form for the complex number a+bi. This is why ([svdeqnxiii]) is called the polar form of the matrix \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right]; Definition [def:svddef6] simply adopts the terminology for n\times n matrices.
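The identification of a+bi with the matrix \sigma(a+bi) is also easy to test numerically. The short NumPy sketch below is illustrative only (the helper function is ours): it checks that the matrix representation preserves multiplication and that the factorization ([svdeqnxiii]) agrees with the classical polar form.

```python
import numpy as np

def as_matrix(z):
    """Represent the complex number z = a + bi as the 2x2 real matrix [[a, -b], [b, a]]."""
    return np.array([[z.real, -z.imag],
                     [z.imag,  z.real]])

z, w = 3 - 4j, 1 + 2j
print(np.allclose(as_matrix(z) @ as_matrix(w), as_matrix(z * w)))  # multiplication is preserved

# Polar form of the matrix of z: G = rI (positive) times Q = rotation by theta (orthogonal).
r, theta = abs(z), np.angle(z)
G = r * np.eye(2)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(G @ Q, as_matrix(z)))   # matches z = r(cos theta + i sin theta)
```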
The Pseudoinverse of a Matrix
It is impossible for a non-square matrix A to have an inverse (see the footnote to Definition [def:004202]). Nonetheless, one candidate for an “inverse” of an m\times n matrix A is an n\times m matrix B such that
ABA=A \qquad \mbox{and} \qquad BAB=B \nonumber
Such a matrix B is called a middle inverse for A. If A is invertible then A^{-1} is the unique middle inverse for A, but a middle inverse is not unique in general, even for square matrices. For example, if A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right] then B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right] is a middle inverse for A for any b.
If ABA=A and BAB=B it is easy to see that AB and BA are both idempotent matrices. In 1955 Roger Penrose observed that the middle inverse is unique if both AB and BA are symmetric. We omit the proof.
Penrose's Theorem penrosetheorem
Given any real m\times n matrix A, there is exactly one n\times m matrix B such that A and B satisfy the following conditions:
- P1 ABA=A and BAB=B.
- P2 Both AB and BA are symmetric.
svddef7 Let A be a real m\times n matrix. The pseudoinverse of A is the unique n\times m matrix A^{+} such that A and A^{+} satisfy P1 and P2, that is:
AA^{+}A=A,\qquad A^{+}AA^{+}=A^{+},\qquad \mbox{and both } AA^{+} \mbox{ and } A^{+}A \mbox{ are symmetric} \nonumber
If A is invertible then A^{+}=A^{-1} as expected. In general, the symmetry in conditions P1 and P2 shows that A is the pseudoinverse of A^{+}, that is A^{++}=A.
svdtheorem4 Let A be an m\times n matrix.
- If rank \;A=m then AA^{T} is invertible and A^{+}=A^{T}(AA^{T})^{-1}.
- If rank \;A=n then A^{T}A is invertible and A^{+}=(A^{T}A)^{-1}A^{T}.
Here AA^{T} (respectively A^{T}A) is invertible by Theorem [thm:015711] (respectively Theorem [thm:015672]). The rest is a routine verification.
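Both formulas in Theorem [thm:svdtheorem4] can be checked against a library pseudoinverse routine. The NumPy sketch below is illustrative only; the matrix is an arbitrary full-column-rank choice, and its transpose supplies the full-row-rank case.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])          # 3x2 with rank 2 = n, so A^T A is invertible

A_plus = np.linalg.inv(A.T @ A) @ A.T            # Theorem: A^+ = (A^T A)^{-1} A^T
print(np.allclose(A_plus, np.linalg.pinv(A)))    # agrees with the Moore-Penrose pseudoinverse

B = A.T                               # 2x3 with rank 2 = m, so B B^T is invertible
B_plus = B.T @ np.linalg.inv(B @ B.T)            # Theorem: B^+ = B^T (B B^T)^{-1}
print(np.allclose(B_plus, np.linalg.pinv(B)))    # True
```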
In general, given an m\times n matrix A, the pseudoinverse A^{+} can be computed from any SVD for A. To see how, we need some notation. Let A=U\Sigma V^{T} be an SVD for A (as in Definition [def:svddef3]) where U and V are orthogonal and \Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n} in block form, where D=\func{diag}(d_{1},d_{2},\ldots ,d_{r}) with each d_{i}>0. Hence D is invertible, so we define:
svddef8 \Sigma^{\prime }= \left[ \begin{array}{cc} D^{-1} & 0 \\ 0 & 0 \end{array} \right]_{n\times m}.
A routine calculation gives:
svdlemma6
- \Sigma \Sigma^{\prime }\Sigma =\Sigma
- \Sigma^{\prime }\Sigma \Sigma^{\prime }=\Sigma^{\prime }
- \Sigma \Sigma^{\prime }= \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{m\times m}
- \Sigma^{\prime }\Sigma = \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{n\times n}
That is, \Sigma^{\prime } is the pseudoinverse of \Sigma.
Now given A=U\Sigma V^{T}, define B=V\Sigma^{\prime }U^{T}. Then
ABA=(U\Sigma V^{T})(V\Sigma^{\prime }U^{T})(U\Sigma V^{T})=U(\Sigma \Sigma^{\prime }\Sigma )V^{T}=U\Sigma V^{T}=A \nonumber
by Lemma [lem:svdlemma6]. Similarly BAB=B. Moreover AB=U(\Sigma \Sigma^{\prime })U^{T} and BA=V(\Sigma^{\prime }\Sigma )V^{T} are both symmetric again by Lemma [lem:svdlemma6]. This proves
svdtheorem5 Let A be real and m\times n, and let A=U\Sigma V^{T} be any SVD for A as in Definition [def:svddef3]. Then A^{+}=V\Sigma^{\prime }U^{T}.
Of course we can always use the SVD constructed in Theorem [thm:svdtheorem1] to find the pseudoinverse. If A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right], we observed above that B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right] is a middle inverse for A for any b. Furthermore AB is symmetric but BA is not when b\neq 0, so B\neq A^{+} in that case.
svdexample3 Find A^{+} if A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right].
A^{T}A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right] with eigenvalues \lambda_{1}=1 and \lambda_{2}=0 and corresponding eigenvectors \mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] and \mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 1 \end{array} \right]. Hence Q=\left[ \begin{array}{cc}\mathbf{q}_{1}&\mathbf{q}_{2}\end{array}\right]=I_{2}. Also A has rank 1 with singular values \sigma_{1}=1 and \sigma_{2}=0, so \Sigma_{A}= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right]=A and \Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]=A^{T} in this case.
Since A\mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] and A\mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right], we have \mathbf{p}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] which extends to an orthonormal basis \{\mathbf{p}_{1},\mathbf{p}_{2},\mathbf{p}_{3}\} of \mathbb{R}^{3} where (say) \mathbf{p}_{2}= \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right] and \mathbf{p}_{3}= \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right]. Hence P=\left[ \begin{array}{ccc} \mathbf{p}_{1}&\mathbf{p}_{2}&\mathbf{p}_{3}\end{array}\right]=I, so the SVD for A is A=P\Sigma _{A}Q^{T}. Finally, the pseudoinverse of A is A^{+}=Q\Sigma^{\prime}_{A}P^{T}=\Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]. Note that A^{+}=A^{T} in this case.
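Theorem [thm:svdtheorem5] also gives a practical recipe for A^{+}: invert the positive singular values inside \Sigma^{\prime} and multiply. The NumPy sketch below (illustrative only) carries this out for the matrix of Example [exa:svdexample3] and compares the result with NumPy's built-in pseudoinverse.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
m, n = A.shape

U, s, Vt = np.linalg.svd(A)            # A = U @ Sigma @ Vt with full U (3x3), Vt (2x2)
r = int(np.sum(s > 1e-12))             # rank of A

Sigma_prime = np.zeros((n, m))         # the n x m matrix Sigma' of Definition svddef8
Sigma_prime[:r, :r] = np.diag(1.0 / s[:r])

A_plus = Vt.T @ Sigma_prime @ U.T      # Theorem: A^+ = V Sigma' U^T
print(A_plus)                          # [[1. 0. 0.] [0. 0. 0.]]
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```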
The following Lemma collects some properties of the pseudoinverse that mimic those of the inverse. The verifications are left as exercises.
svdlemma7 Let A be an m\times n matrix of rank r.
- A^{++}=A.
- If A is invertible then A^{+}=A^{-1}.
- (A^{T})^{+}=(A^{+})^{T}.
- (kA)^{+}=\frac{1}{k}A^{+} for any real k\neq 0.
- (UAV)^{+}=V^{T}(A^{+})U^{T} whenever U and V are orthogonal.
- In fact every complex matrix has an SVD. [J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997]
- See J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997, page 379.