11.1: The Singular Value Decomposition
The singular value decomposition is another name for the spectral representation of a rectangular matrix. Of course if \(A\) is m-by-n and \(m \ne n\) then it does not make sense to speak of the eigenvalues of \(A\). We may, however, rely on the previous section to give us relevant spectral representations of the two symmetric matrices
- \(A^{T}A\)
- \(AA^{T}\)
That these two matrices together may indeed tell us 'everything' about \(A\) can be gleaned from
\[\mathscr{N}(A^{T}A) = \mathscr{N}(A) \nonumber\]
\[\mathscr{N}(AA^{T}) = \mathscr{N}(A^T) \nonumber\]
\[\mathscr{R}(A^{T}A) = \mathscr{R}(A^T) \nonumber\]
\[\mathscr{R}(AA^{T}) = \mathscr{R}(A) \nonumber\]
You have proven the first of these in a previous exercise. The proof of the second is identical. The row and column space results follow from the first two via orthogonality.
On the spectral side, we shall now see that the eigenvalues of \(AA^{T}\) and \(A^{T}A\) are nonnegative and that their nonzero eigenvalues coincide. Let us first confirm this on the adjacency matrix associated with the unstable swing
\[A = \begin{pmatrix} {0}&{1}&{0}&{0}\\ {-1}&{0}&{1}&{0}\\ {0}&{0}&{0}&{1} \end{pmatrix} \nonumber\]
The respective products are
\[AA^{T} = \begin{pmatrix} {1}&{0}&{0}\\ {0}&{2}&{0}\\ {0}&{0}&{1} \end{pmatrix} \nonumber\]
\[A^{T}A = \begin{pmatrix} {1}&{0}&{-1}&{0}\\ {0}&{1}&{0}&{0}\\ {-1}&{0}&{1}&{0}\\ {0}&{0}&{0}&{1} \end{pmatrix} \nonumber\]
Analysis of the first is particularly simple. Its null space is clearly just the zero vector while \(\lambda_{1} = 2\) and \(\lambda_{2} = 1\) are its eigenvalues. Their geometric multiplicities are \(n_{1} = 1\) and \(n_{2} = 2\). In \(A^{T}A\) we recognize the \(S\) matrix from the exercise in another module and recall that its eigenvalues are \(\lambda_{1} = 2\), \(\lambda_{2} = 1\), and \(\lambda_{3} = 0\) with multiplicities \(n_{1} = 1\), \(n_{2} = 2\), and \(n_{3} = 1\). Hence, at least for this \(A\), the eigenvalues of \(AA^{T}\) and \(A^{T}A\) are nonnegative and their nonzero eigenvalues coincide. In addition, the geometric multiplicities of the nonzero eigenvalues sum to 3, the rank of \(A\).
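These products are easy to confirm by hand, and just as easy to confirm in a few lines of code. The sketch below (pure Python, with small `matmul` and `transpose` helpers written here only for illustration) forms \(AA^{T}\) and \(A^{T}A\) for the swing matrix and compares them with the displays above.

```python
# Sketch: verify the products A A^T and A^T A for the swing matrix above.
# matmul and transpose are tiny helpers defined here for illustration.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[0, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, 0, 0, 1]]

AAT = matmul(A, transpose(A))   # expect diag(1, 2, 1)
ATA = matmul(transpose(A), A)   # expect the 4-by-4 S matrix above

print(AAT)  # [[1, 0, 0], [0, 2, 0], [0, 0, 1]]
```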
The eigenvalues of \(AA^{T}\) and \(A^{T}A\) are nonnegative. Their nonzero eigenvalues, including geometric multiplicities, coincide. The geometric multiplicities of the nonzero eigenvalues sum to the rank of \(A\).
If \(A^{T}Ax = \lambda x\) then \(x^{T}A^{T}Ax = \lambda x^{T}x\), i.e., \(\|Ax\|^2 = \lambda \|x\|^2\), and so \(\lambda \ge 0\). A similar argument works for \(AA^{T}\).
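This norm identity can be seen numerically: for an eigenvector the ratio \(\|Ax\|^2 / \|x\|^2\) recovers the eigenvalue, which is therefore nonnegative. A small sketch using the swing example (the eigenpair is taken from the computation above):

```python
# Sketch: the identity ||Ax||^2 = lambda ||x||^2 forces lambda >= 0.
# Here x is an eigenvector of A^T A for lambda = 2 in the swing example.
A = [[0, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, 0, 0, 1]]
x = [-1, 0, 1, 0]  # satisfies A^T A x = 2 x

Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
lam = sum(v * v for v in Ax) / sum(v * v for v in x)
print(lam)  # 2.0 -- the eigenvalue, recovered as a ratio of squared norms
```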
Now suppose that \(\lambda_{j} > 0\) and that \(\{x_{j,k}\}^{n_{j}}_{k = 1}\) constitutes an orthonormal basis for the eigenspace \(\mathscr{R}(P_{j})\). Starting from
\[A^{T}Ax_{j,k} = \lambda_{j} x_{j,k} \nonumber\]
we find, on multiplying through (from the left) by \(A\) that
\[AA^{T}Ax_{j,k} = \lambda_{j} A x_{j,k} \nonumber\]
i.e., \(\lambda_{j}\) is an eigenvalue of \(AA^{T}\) with eigenvector \(Ax_{j,k}\), so long as \(Ax_{j,k} \ne 0\).
It follows from the first paragraph of this proof that \(||Ax_{j,k}|| = \sqrt{\lambda_{j}}\), which, by hypothesis, is nonzero. Hence,
\[y_{j,k} \equiv \frac{Ax_{j,k}}{\sqrt{\lambda_{j}}}, \quad 1 \le k \le n_{j} \nonumber\]
is a collection of unit eigenvectors of \(AA^{T}\) associated with \(\lambda_{j}\). Let us now show that these vectors are orthonormal for fixed \(j\). For \(i \ne k\),
\[y^{T}_{j,i} y_{j,k} = \frac{1}{\lambda_{j}} x^{T}_{j,i} A^{T}Ax_{j,k} = x^{T}_{j,i}x_{j,k} = 0 \nonumber\]
We have now demonstrated that if \(\lambda_{j} > 0\) is an eigenvalue of \(A^{T}A\) of geometric multiplicity \(n_{j}\), then it is an eigenvalue of \(AA^{T}\) of geometric multiplicity at least \(n_{j}\). Reversing the argument, i.e., generating eigenvectors of \(A^{T}A\) from those of \(AA^{T}\), we find that the geometric multiplicities must indeed coincide.
Regarding the rank statement, we discern from the eigenvalue equation \(A^{T}Ax_{j,k} = \lambda_{j} x_{j,k}\) that if \(\lambda_{j} > 0\) then \(x_{j,k} \in \mathscr{R}(A^{T}A)\). The union of these vectors indeed constitutes a basis for \(\mathscr{R}(A^{T}A)\), for anything orthogonal to each of these \(x_{j,k}\) necessarily lies in the eigenspace corresponding to a zero eigenvalue, i.e., in \(\mathscr{N}(A^{T}A)\). As \(\mathscr{R}(A^{T}A) = \mathscr{R}(A^{T})\) it follows that \(\dim \mathscr{R}(A^{T}A) = \dim \mathscr{R}(A^{T}) = r\) and hence the \(n_{j}\), for \(\lambda_{j} > 0\), sum to \(r\).
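The central construction of the proof, \(y = Ax / \sqrt{\lambda}\), can be exercised on the swing example. The sketch below takes the unit eigenvector of \(A^{T}A\) for \(\lambda = 2\), builds \(y\), and confirms that \(y\) is a unit eigenvector of \(AA^{T}\) with the same eigenvalue.

```python
import math

# Sketch: from a unit eigenvector x of A^T A with lambda > 0, build
# y = A x / sqrt(lambda) and confirm y is a unit eigenvector of A A^T.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[0, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, 0, 0, 1]]
lam = 2.0
x = [-1 / math.sqrt(2), 0, 1 / math.sqrt(2), 0]  # unit eigenvector of A^T A

Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
y = [v / math.sqrt(lam) for v in Ax]             # expect (0, 1, 0)

AAT = matmul(A, [list(c) for c in zip(*A)])
AATy = [sum(a * yi for a, yi in zip(row, y)) for row in AAT]

unit = abs(sum(v * v for v in y) - 1) < 1e-12
eig = all(abs(u - lam * v) < 1e-12 for u, v in zip(AATy, y))
print(unit, eig)  # True True
```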
Let us now gather together some of the separate pieces of the proof. For starters, we order the eigenvalues of \(A^{T}A\) from high to low,
\[\lambda_{1} > \lambda_{2} > \cdots > \lambda_{h} \nonumber\]
and write
\[A^{T} A = X \Lambda_{n} X^T \nonumber\]
where
\[X_{j} = \begin{pmatrix} {x_{j,1}}&{\cdots}&{x_{j,n_{j}}} \end{pmatrix}, \qquad X = \begin{pmatrix} {X_{1}}&{\cdots}&{X_{h}} \end{pmatrix} \nonumber\]
and \(\Lambda_{n}\) is the \(n\)-by-\(n\) diagonal matrix with \(\lambda_{1}\) in the first \(n_{1}\) slots, \(\lambda_{2}\) in the next \(n_{2}\) slots, etc. Similarly
\[AA^{T} = Y \Lambda_{m} Y^T \nonumber\]
where
\[Y_{j} = \begin{pmatrix} {y_{j,1}}&{\cdots}&{y_{j,n_{j}}} \end{pmatrix}, \qquad Y = \begin{pmatrix} {Y_{1}}&{\cdots}&{Y_{h}} \end{pmatrix} \nonumber\]
and \(\Lambda_{m}\) is the \(m\)-by-\(m\) diagonal matrix with \(\lambda_{1}\) in the first \(n_{1}\) slots, \(\lambda_{2}\) in the next \(n_{2}\) slots, etc. The \(y_{j, k}\) were defined above under the assumption that \(\lambda_{j} > 0\). If \(\lambda_{j} = 0\) let \(Y_{j}\) denote an orthonormal basis for \(\mathscr{N}(AA^{T})\). Finally, call
\[\sigma_{j} = \sqrt{\lambda_{j}} \nonumber\]
and let \(\Sigma\) denote the \(m\)-by-\(n\) diagonal matrix with \(\sigma_{1}\) in the first \(n_{1}\) slots, \(\sigma_{2}\) in the next \(n_{2}\) slots, etc. Notice that
\[\Sigma^{T} \Sigma = \Lambda_{n} \nonumber\]
\[\Sigma \Sigma^{T} = \Lambda_{m} \nonumber\]
Now recognize that the definition of \(y_{j,k}\) may be written
\[Ax_{j,k} = \sigma_{j} y_{j,k} \nonumber\]
and that this is simply the column by column rendition of
\[AX = Y\Sigma \nonumber\]
As \(XX^{T} = I\) we may multiply through (from the right) by \(X^{T}\) and arrive at the singular value decomposition of \(A\)
\[A = Y \Sigma X^{T} \nonumber\]
Let us confirm this on the swing matrix \(A\) above. We have
\[\Lambda_{4} = \begin{pmatrix} {2}&{0}&{0}&{0}\\ {0}&{1}&{0}&{0}\\ {0}&{0}&{1}&{0}\\ {0}&{0}&{0}&{0} \end{pmatrix} \nonumber\]
\[X = \frac{1}{\sqrt{2}} \begin{pmatrix} {-1}&{0}&{0}&{1}\\ {0}&{\sqrt{2}}&{0}&{0}\\ {1}&{0}&{0}&{1}\\ {0}&{0}&{\sqrt{2}}&{0} \end{pmatrix} \nonumber\]
\[\Lambda_{3} = \begin{pmatrix} {2}&{0}&{0}\\ {0}&{1}&{0}\\ {0}&{0}&{1} \end{pmatrix} \nonumber\]
\[Y = \begin{pmatrix} {0}&{1}&{0}\\ {1}&{0}&{0}\\ {0}&{0}&{1} \end{pmatrix} \nonumber\]
Hence
\[\Sigma = \begin{pmatrix} {\sqrt{2}}&{0}&{0}&{0}\\ {0}&{1}&{0}&{0}\\ {0}&{0}&{1}&{0} \end{pmatrix} \nonumber\]
and so \(A = Y \Sigma X^T\) says that \(A\) should coincide with
\[\begin{pmatrix} {0}&{1}&{0}\\ {1}&{0}&{0}\\ {0}&{0}&{1} \end{pmatrix} \begin{pmatrix} {\sqrt{2}}&{0}&{0}&{0}\\ {0}&{1}&{0}&{0}\\ {0}&{0}&{1}&{0} \end{pmatrix} \begin{pmatrix} {-\frac{1}{\sqrt{2}}}&{0}&{\frac{1}{\sqrt{2}}}&{0}\\ {0}&{1}&{0}&{0}\\ {0}&{0}&{0}&{1}\\ {\frac{1}{\sqrt{2}}}&{0}&{\frac{1}{\sqrt{2}}}&{0} \end{pmatrix}\]
This indeed agrees with \(A\). It also agrees (up to sign changes on the columns of \(X\)) with what one receives upon typing
[Y, SIG, X] = svd(A)
in Matlab.
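The factorization can also be checked numerically. Below is a pure-Python sketch (with an inline `matmul` helper; nothing is assumed beyond the matrices already displayed) that multiplies out \(Y \Sigma X^{T}\) and compares the result with \(A\).

```python
import math

# Sketch: check numerically that Y Sigma X^T reproduces A (swing example).
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

s = 1 / math.sqrt(2)
X = [[-s, 0, 0, s],
     [0, 1, 0, 0],
     [s, 0, 0, s],
     [0, 0, 1, 0]]
Y = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
Sigma = [[math.sqrt(2), 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]
A = [[0, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, 0, 0, 1]]

XT = [list(col) for col in zip(*X)]
product = matmul(matmul(Y, Sigma), XT)
ok = all(abs(p - a) < 1e-12
         for prow, arow in zip(product, A) for p, a in zip(prow, arow))
print(ok)  # True
```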
You may now ask what we get for our troubles. I express the first dividend as a proposition that looks to me like a quantitative version of the fundamental theorem of linear algebra.
If \(Y \Sigma X^T\) is the singular value decomposition of \(A\) then
- The rank of \(A\), call it \(r\), is the number of nonzero elements in \(\Sigma\)
- The first \(r\) columns of \(X\) constitute an orthonormal basis for \(\mathscr{R}(A^T)\). The \(n-r\) last columns of \(X\) constitute an orthonormal basis for \(\mathscr{N}(A)\)
- The first \(r\) columns of \(Y\) constitute an orthonormal basis for \(\mathscr{R}(A)\). The \(m-r\) last columns of \(Y\) constitute an orthonormal basis for \(\mathscr{N}(A^T)\)
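This proposition can be exercised on the swing example: the rank is read directly off the diagonal of \(\Sigma\), and the trailing column of \(X\) should annihilate \(A\). A small sketch:

```python
import math

# Sketch: read the rank off Sigma, and check that the trailing column of X
# really lies in the null space of A (swing example: r = 3, n = 4).
s = 1 / math.sqrt(2)
A = [[0, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, 0, 0, 1]]
Sigma = [[math.sqrt(2), 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]
X = [[-s, 0, 0, s],
     [0, 1, 0, 0],
     [s, 0, 0, s],
     [0, 0, 1, 0]]

r = sum(1 for i in range(min(len(Sigma), len(Sigma[0]))) if Sigma[i][i] != 0)
x_last = [row[-1] for row in X]                 # candidate null vector of A
Ax = [sum(a * v for a, v in zip(row, x_last)) for row in A]
print(r, all(abs(v) < 1e-12 for v in Ax))  # 3 True
```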
Let us now 'solve' \(A \textbf{x} = \textbf{b}\) with the help of the pseudo-inverse of \(A\). You know the 'right' thing to do, namely reciprocate all of the nonzero singular values. Because \(m\) is not necessarily \(n\) we must also be careful with dimensions. To be precise, let \(\Sigma^{+}\) denote the \(n\)-by-\(m\) diagonal matrix whose first \(n_{1}\) diagonal elements are \(\frac{1}{\sigma_{1}}\), whose next \(n_{2}\) diagonal elements are \(\frac{1}{\sigma_{2}}\), and so on. In the case that \(\sigma_{h} = 0\), set the final \(n_{h}\) diagonal elements of \(\Sigma^{+}\) to zero. Now, one defines the pseudo-inverse of \(A\) to be
\[A^{+} \equiv X \Sigma^{+}Y^{T} \nonumber\]
For the swing matrix \(A\) above we find
\[\Sigma^{+} = \begin{pmatrix} {\frac{1}{\sqrt{2}}}&{0}&{0}\\ {0}&{1}&{0}\\ {0}&{0}&{1}\\ {0}&{0}&{0} \end{pmatrix} \nonumber\]
and so
\[\begin{pmatrix} {-\frac{1}{\sqrt{2}}}&{0}&{0}&{\frac{1}{\sqrt{2}}}\\ {0}&{1}&{0}&{0}\\ {\frac{1}{\sqrt{2}}}&{0}&{0}&{\frac{1}{\sqrt{2}}}\\ {0}&{0}&{1}&{0} \end{pmatrix} \begin{pmatrix} {\frac{1}{\sqrt{2}}}&{0}&{0}\\ {0}&{1}&{0}\\ {0}&{0}&{1}\\ {0}&{0}&{0} \end{pmatrix} \begin{pmatrix} {0}&{1}&{0}\\ {1}&{0}&{0}\\ {0}&{0}&{1} \end{pmatrix}\]
therefore
\[A^{+} = \begin{pmatrix} {0}&{\frac{-1}{2}}&{0}\\ {1}&{0}&{0}\\ {0}&{\frac{1}{2}}&{0}\\ {0}&{0}&{1} \end{pmatrix} \nonumber\]
in agreement with what one receives upon typing
pinv(A)
in Matlab. Let us now investigate the sense in which \(A^{+}\) is the inverse of \(A\). Suppose that \(b \in \mathbb{R}^m\) and that we wish to solve \(A \textbf{x} = \textbf{b}\). We suspect that \(A^{+}b\) should be a good candidate. Observe that
\[(A^{T}A)A^{+} b = X \Lambda_{n} X^{T} X \Sigma^{+}Y^{T} b \nonumber\]
because \(X^{T}X = I\)
\[(A^{T}A)A^{+} b = X \Lambda_{n} \Sigma^{+}Y^{T} b \nonumber\]
\[(A^{T}A)A^{+} b = X \Sigma^{T} \Sigma \Sigma^{+} Y^{T} b \nonumber\]
because \(\Sigma^{T} \Sigma \Sigma^{+} = \Sigma^{T}\)
\[(A^{T}A)A^{+} b = X \Sigma^{T} Y^{T} b \nonumber\]
\[(A^{T}A)A^{+} b = A^{T} b \nonumber\]
that is, \(x = A^{+}b\) satisfies the least-squares problem \(A^{T}Ax = A^{T} b\).
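The whole chain can be verified at once: a pure-Python sketch that forms \(A^{+} = X \Sigma^{+} Y^{T}\) for the swing example, compares it with the matrix displayed above, and checks the normal equations \(A^{T}A(A^{+}b) = A^{T}b\) for a sample right-hand side.

```python
import math

# Sketch: form A^+ = X Sigma^+ Y^T for the swing example and confirm both
# the displayed pseudo-inverse and the normal equations A^T A (A^+ b) = A^T b.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

s = 1 / math.sqrt(2)
A = [[0, 1, 0, 0], [-1, 0, 1, 0], [0, 0, 0, 1]]
X = [[-s, 0, 0, s], [0, 1, 0, 0], [s, 0, 0, s], [0, 0, 1, 0]]
Y = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
Sigma_plus = [[s, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]

A_plus = matmul(matmul(X, Sigma_plus), transpose(Y))
expected = [[0, -0.5, 0], [1, 0, 0], [0, 0.5, 0], [0, 0, 1]]
close = lambda P, Q: all(abs(p - q) < 1e-12
                         for pr, qr in zip(P, Q) for p, q in zip(pr, qr))

b = [[1], [2], [3]]                      # any right-hand side will do
ATA = matmul(transpose(A), A)
lhs = matmul(ATA, matmul(A_plus, b))     # A^T A  A^+ b
rhs = matmul(transpose(A), b)            # A^T b
print(close(A_plus, expected), close(lhs, rhs))  # True True
```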