11.1: The Singular Value Decomposition
The singular value decomposition is another name for the spectral representation of a rectangular matrix. Of course if $A$ is $m$-by-$n$ and $m \ne n$ then it does not make sense to speak of the eigenvalues of $A$. We may, however, rely on the previous section to give us relevant spectral representations of the two symmetric matrices
- $A^TA$
- $AA^T$
That these two matrices together may indeed tell us 'everything' about $A$ can be gleaned from

\[\mathcal{N}(A^TA) = \mathcal{N}(A)\]

\[\mathcal{N}(AA^T) = \mathcal{N}(A^T)\]

\[\mathcal{R}(A^TA) = \mathcal{R}(A^T)\]

\[\mathcal{R}(AA^T) = \mathcal{R}(A)\]
You have proven the first of these in a previous exercise. The proof of the second is identical. The row and column space results follow from the first two via orthogonality.
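These subspace identities are easy to test numerically. Here is a minimal Matlab sketch, using Matlab's built-in null and an arbitrary rectangular test matrix:

```matlab
% Check N(A'A) = N(A): the orthonormal bases returned by null() must
% have the same number of columns, and A must annihilate every basis
% vector of null(A'*A).
A = randn(3, 5);                  % an arbitrary rectangular test matrix
Z1 = null(A'*A);                  % orthonormal basis for N(A'A)
Z2 = null(A);                     % orthonormal basis for N(A)
disp(size(Z1,2) == size(Z2,2))    % the dimensions agree
disp(norm(A*Z1))                  % ~ 0: each basis vector lies in N(A)
```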
On the spectral side, we shall now see that the eigenvalues of $AA^T$ and $A^TA$ are nonnegative and that their nonzero eigenvalues coincide. Let us first confirm this on the adjacency matrix associated with the unstable swing

\[A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\]

The respective products are

\[AA^T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

\[A^TA = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\]
Analysis of the first is particularly simple. Its null space is clearly just the zero vector, while $\lambda_1 = 2$ and $\lambda_2 = 1$ are its eigenvalues, with geometric multiplicities $n_1 = 1$ and $n_2 = 2$. In $A^TA$ we recognize the $S$ matrix from the exercise in another module and recall that its eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 1$, and $\lambda_3 = 0$, with multiplicities $n_1 = 1$, $n_2 = 2$, and $n_3 = 1$. Hence, at least for this $A$, the eigenvalues of $AA^T$ and $A^TA$ are nonnegative and their nonzero eigenvalues coincide. In addition, the geometric multiplicities of the nonzero eigenvalues sum to 3, the rank of $A$.
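A quick Matlab confirmation of these spectral facts, using the built-in eig and rank:

```matlab
% The nonzero eigenvalues of AA' and A'A coincide, and their
% multiplicities sum to the rank of A.
A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
eig(A*A')    % the eigenvalues 1, 1, 2
eig(A'*A)    % the eigenvalues 0, 1, 1, 2
rank(A)      % 3, the number of nonzero eigenvalues counted with multiplicity
```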
The eigenvalues of $AA^T$ and $A^TA$ are nonnegative. Their nonzero eigenvalues, including geometric multiplicities, coincide. The geometric multiplicities of the nonzero eigenvalues sum to the rank of $A$.

If $A^TAx = \lambda x$ then $x^TA^TAx = \lambda x^Tx$, i.e., $\|Ax\|^2 = \lambda \|x\|^2$, and so $\lambda \ge 0$. A similar argument works for $AA^T$.
Now suppose that $\lambda_j > 0$ and that $\{x_{j,k}\}_{k=1}^{n_j}$ constitutes an orthonormal basis for the eigenspace $\mathcal{R}(P_j)$. Starting from

\[A^TAx_{j,k} = \lambda_j x_{j,k}\]

we find, on multiplying through (from the left) by $A$, that

\[AA^TAx_{j,k} = \lambda_j Ax_{j,k}\]

i.e., $\lambda_j$ is an eigenvalue of $AA^T$ with eigenvector $Ax_{j,k}$, so long as $Ax_{j,k} \ne 0$. It follows from the first paragraph of this proof that $\|Ax_{j,k}\| = \sqrt{\lambda_j}$, which, by hypothesis, is nonzero. Hence,

\[y_{j,k} \equiv \frac{Ax_{j,k}}{\sqrt{\lambda_j}}, \qquad 1 \le k \le n_j\]

is a collection of unit eigenvectors of $AA^T$ associated with $\lambda_j$. Let us now show that these vectors are orthonormal for fixed $j$. For $i \ne k$,

\[y_{j,i}^T y_{j,k} = \frac{1}{\lambda_j} x_{j,i}^T A^TA x_{j,k} = x_{j,i}^T x_{j,k} = 0\]
We have now demonstrated that if $\lambda_j > 0$ is an eigenvalue of $A^TA$ of geometric multiplicity $n_j$, then it is an eigenvalue of $AA^T$ of geometric multiplicity at least $n_j$. Reversing the argument, i.e., generating eigenvectors of $A^TA$ from those of $AA^T$, we find that the geometric multiplicities must indeed coincide.
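The construction at the heart of this argument is easy to reproduce numerically. A minimal Matlab sketch on the swing matrix (for a symmetric input, Matlab's eig returns orthonormal eigenvectors):

```matlab
% Build eigenvectors of AA' from those of A'A via y = A*x/sqrt(lambda),
% as in the proof above.
A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
[X, L] = eig(A'*A);                    % orthonormal eigenvectors of A'A
lam = diag(L);
k = find(lam > 1e-12);                 % keep the positive eigenvalues
Y = A*X(:,k) * diag(1./sqrt(lam(k)));  % y_{j,k} = A x_{j,k} / sqrt(lambda_j)
norm(A*A'*Y - Y*diag(lam(k)))          % ~ 0: columns are eigenvectors of AA'
norm(Y'*Y - eye(numel(k)))             % ~ 0: and they are orthonormal
```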
Regarding the rank statement, we discern from the eigenvalue equation above that if $\lambda_j > 0$ then $x_{j,k} \in \mathcal{R}(A^TA)$. The union of these vectors indeed constitutes a basis for $\mathcal{R}(A^TA)$, for anything orthogonal to each of these $x_{j,k}$ necessarily lies in the eigenspace corresponding to a zero eigenvalue, i.e., in $\mathcal{N}(A^TA)$. As $\mathcal{R}(A^TA) = \mathcal{R}(A^T)$, it follows that $\dim \mathcal{R}(A^TA) = r$ and hence the $n_j$, for $\lambda_j > 0$, sum to $r$.
Let us now gather together some of the separate pieces of the proof. For starters, we order the eigenvalues of $A^TA$ from high to low,

\[\lambda_1 > \lambda_2 > \cdots > \lambda_h\]

and write

\[A^TA = X\Lambda_n X^T\]

where

\[X = \begin{pmatrix} X_1 & \cdots & X_h \end{pmatrix}, \qquad X_j = \begin{pmatrix} x_{j,1} & \cdots & x_{j,n_j} \end{pmatrix}\]

and $\Lambda_n$ is the $n$-by-$n$ diagonal matrix with $\lambda_1$ in the first $n_1$ slots, $\lambda_2$ in the next $n_2$ slots, etc. Similarly

\[AA^T = Y\Lambda_m Y^T\]

where

\[Y = \begin{pmatrix} Y_1 & \cdots & Y_h \end{pmatrix}, \qquad Y_j = \begin{pmatrix} y_{j,1} & \cdots & y_{j,n_j} \end{pmatrix}\]

and $\Lambda_m$ is the $m$-by-$m$ diagonal matrix with $\lambda_1$ in the first $n_1$ slots, $\lambda_2$ in the next $n_2$ slots, etc. The $y_{j,k}$ were defined above under the assumption that $\lambda_j > 0$. If $\lambda_j = 0$, let $Y_j$ denote an orthonormal basis for $\mathcal{N}(AA^T)$. Finally, call

\[\sigma_j = \sqrt{\lambda_j}\]

and let $\Sigma$ denote the $m$-by-$n$ diagonal matrix with $\sigma_1$ in the first $n_1$ slots, $\sigma_2$ in the next $n_2$ slots, etc. Notice that

\[\Sigma^T\Sigma = \Lambda_n, \qquad \Sigma\Sigma^T = \Lambda_m\]
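These two identities can be checked directly from the output of Matlab's built-in svd; a minimal sketch:

```matlab
% SIG'*SIG and SIG*SIG' reproduce the eigenvalue matrices of A'A and AA'.
A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
[~, SIG, ~] = svd(A);
SIG'*SIG    % 4-by-4 diagonal: 2, 1, 1, 0 (this is Lambda_n)
SIG*SIG'    % 3-by-3 diagonal: 2, 1, 1 (this is Lambda_m)
```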
Now recognize that the definition of $y_{j,k}$ may be written

\[Ax_{j,k} = \sigma_j y_{j,k}\]

and that this is simply the column-by-column rendition of

\[AX = Y\Sigma\]

As $XX^T = I$, we may multiply through (from the right) by $X^T$ and arrive at the singular value decomposition of $A$:

\[A = Y\Sigma X^T\]
Let us confirm this on the swing matrix $A$ above. We have

\[\Lambda_4 = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\]

\[X = \frac{1}{\sqrt{2}}\begin{pmatrix} -1 & 0 & 0 & 1 \\ 0 & \sqrt{2} & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & \sqrt{2} & 0 \end{pmatrix}\]

\[\Lambda_3 = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

\[Y = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

Hence

\[\Sigma = \begin{pmatrix} \sqrt{2} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}\]

and so $A = Y\Sigma X^T$ says that $A$ should coincide with

\[\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \sqrt{2} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} -1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \end{pmatrix}\]
This indeed agrees with $A$. It also agrees (up to sign changes on the columns of $X$) with what one receives upon typing [Y, SIG, X] = svd(A) in Matlab.
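For completeness, here is a minimal sketch that verifies the factorization end to end:

```matlab
% Verify A = Y*SIG*X' for the swing matrix.
A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
[Y, SIG, X] = svd(A);
norm(A - Y*SIG*X')   % ~ 0: the decomposition reproduces A
diag(SIG)            % the singular values: sqrt(2), 1, 1
```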
You now ask what we get for our troubles. I express the first dividend as a proposition that looks to me like a quantitative version of the fundamental theorem of linear algebra.
If $Y\Sigma X^T$ is the singular value decomposition of $A$ then

- The rank of $A$, call it $r$, is the number of nonzero elements in $\Sigma$.
- The first $r$ columns of $X$ constitute an orthonormal basis for $\mathcal{R}(A^T)$. The last $n-r$ columns of $X$ constitute an orthonormal basis for $\mathcal{N}(A)$.
- The first $r$ columns of $Y$ constitute an orthonormal basis for $\mathcal{R}(A)$. The last $m-r$ columns of $Y$ constitute an orthonormal basis for $\mathcal{N}(A^T)$.
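Each claim in this proposition is easy to check numerically. A minimal Matlab sketch on the swing matrix (here $r = m = 3$, so $Y$ has no trailing columns and $\mathcal{N}(A^T)$ is trivial):

```matlab
% Read off the rank and the four fundamental subspaces from the SVD.
A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
[Y, SIG, X] = svd(A);
r = nnz(diag(SIG) > 1e-12)   % the rank, here 3
norm(A*X(:, r+1:end))        % ~ 0: last n-r columns of X span N(A)
norm(A'*Y(:, r+1:end))       % ~ 0: last m-r columns of Y span N(A')
```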
Let us now 'solve' $Ax = b$ with the help of the pseudo-inverse of $A$. You know the 'right' thing to do, namely reciprocate all of the nonzero singular values. Because $m$ is not necessarily $n$, we must also be careful with dimensions. To be precise, let $\Sigma^+$ denote the $n$-by-$m$ diagonal matrix whose first $n_1$ diagonal elements are $1/\sigma_1$, whose next $n_2$ diagonal elements are $1/\sigma_2$, and so on. In the case that $\sigma_h = 0$, set the final $n_h$ diagonal elements of $\Sigma^+$ to zero. Now, one defines the pseudo-inverse of $A$ to be

\[A^+ \equiv X\Sigma^+ Y^T\]
In the case that $A$ is the swing matrix above, we find

\[\Sigma^+ = \begin{pmatrix} 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}\]

and so

\[A^+ = \frac{1}{\sqrt{2}}\begin{pmatrix} -1 & 0 & 0 & 1 \\ 0 & \sqrt{2} & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & \sqrt{2} & 0 \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

therefore

\[A^+ = \begin{pmatrix} 0 & -1/2 & 0 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]
in agreement with what appears from pinv(A) in Matlab. Let us now investigate the sense in which $A^+$ is the inverse of $A$. Suppose that $b \in \mathbb{R}^m$ and that we wish to solve $Ax = b$. We suspect that $A^+b$ should be a good candidate. Observe, from the decompositions above, that

\[(A^TA)A^+b = X\Lambda_n X^T X\Sigma^+ Y^T b\]

Because $X^TX = I$,

\[(A^TA)A^+b = X\Lambda_n \Sigma^+ Y^T b\]

\[(A^TA)A^+b = X\Sigma^T\Sigma\Sigma^+ Y^T b\]

Because $\Sigma^T\Sigma\Sigma^+ = \Sigma^T$,

\[(A^TA)A^+b = X\Sigma^T Y^T b\]

\[(A^TA)A^+b = A^Tb\]

That is, $A^+b$ satisfies the least-squares problem $A^TAx = A^Tb$.
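A final numerical confirmation, using Matlab's built-in pinv and an arbitrary right-hand side:

```matlab
% Check that x = pinv(A)*b satisfies the normal equations A'Ax = A'b.
A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
b = randn(3, 1);        % any right-hand side
x = pinv(A)*b;          % the pseudo-inverse candidate solution
norm(A'*A*x - A'*b)     % ~ 0: the least-squares equations hold
```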