8.6: The Singular Value Decomposition


When working with a square matrix A it is clearly useful to be able to “diagonalize” A, that is, to find a factorization A=Q^{-1}DQ where Q is invertible and D is diagonal. Unfortunately such a factorization may not exist for A. However, even if A is not square, Gaussian elimination provides a factorization of the form A=PDQ where P and Q are invertible and D is diagonal (the Smith Normal Form, Theorem [thm:005369]). Moreover, if A is real we can choose P and Q to be orthogonal real matrices and D to be real. Such a factorization is called a singular value decomposition (SVD) for A, one of the most useful tools in applied linear algebra. In this section we show how to explicitly compute an SVD for any real matrix A, and illustrate some of its many applications.

We need a fact about two subspaces associated with an m×n matrix A:

\func{im}A=\{A\mathbf{x}\mid \mathbf{x} \mbox{ in } \mathbb{R}^{n}\}\quad \mbox{and} \quad \func{col}A=span \;\{\mathbf{a}\mid \mathbf{a} \mbox{ is a column of } A\} \nonumber

Then \func{im}A is called the image of A (so named because of the linear transformation \mathbb{R}^{n}\rightarrow \mathbb{R}^{m} with \mathbf{x}\mapsto A\mathbf{x}); and \func{col}A is called the column space of A (Definition [def:015376]). Surprisingly, these spaces are equal:

svdlemma1 For any m\times n matrix A, \func{im}A=\func{col}A.

Let A=\left[ \begin{array}{cccc}\mathbf{a}_{1} & \mathbf{a}_{2} & \cdots & \mathbf{a}_{n}\end{array}\right] in terms of its columns. Let \mathbf{x}\in \func{im}A, say \mathbf{x}=A\mathbf{y}, \mathbf{y} in \mathbb{R}^{n}. If \mathbf{y}=\left[ \begin{array}{cccc}y_{1} & y_{2} & \cdots & y_{n}\end{array}\right]^{T}, then A\mathbf{y}=y_{1}\mathbf{a}_{1}+y_{2}\mathbf{a}_{2}+\cdots +y_{n}\mathbf{a}_{n}\in \func{col}A by Definition [def:002668]. This shows that \func{im}A\subseteq \func{col}A. For the other inclusion, each \mathbf{a}_{k}=A\mathbf{e}_{k} where \mathbf{e}_{k} is column k of I_{n}.

Singular Value Decompositions

We know a lot about any real symmetric matrix: its eigenvalues are real (Theorem [thm:016397]), and it is orthogonally diagonalizable by the Principal Axes Theorem (Theorem [thm:024303]). So for any real matrix A (square or not), the fact that both A^{T}A and AA^{T} are real and symmetric suggests that we can learn a lot about A by studying them. This section shows just how true this is.

The following lemma reveals some similarities between A^{T}A and AA^{T} which simplify the statement and the proof of the SVD we are constructing.

svdlemma2 Let A be a real m\times n matrix. Then:

  1. The eigenvalues of A^{T}A and AA^{T} are real and non-negative.
  2. A^{T}A and AA^{T} have the same set of positive eigenvalues.
  1. Let \lambda be an eigenvalue of A^{T}A, say A^{T}A\mathbf{q}=\lambda \mathbf{q} where \mathbf{q}\neq \mathbf{0}. Then \lambda is real because A^{T}A is real and symmetric, and \lambda \| \mathbf{q}\|^{2}=\mathbf{q}^{T}(\lambda \mathbf{q})=\mathbf{q}^{T}(A^{T}A\mathbf{q})=(A\mathbf{q})^{T}(A\mathbf{q})=\| A\mathbf{q}\|^{2}\geq 0, so \lambda \geq 0. Then (1.) follows for A^{T}A, and the case AA^{T} follows by replacing A by A^{T}.
  2. Let \lambda >0 be an eigenvalue of A^{T}A, say A^{T}A\mathbf{q}=\lambda \mathbf{q} where \mathbf{q}\neq \mathbf{0}. Then AA^{T}(A\mathbf{q})=A(A^{T}A\mathbf{q})=A(\lambda \mathbf{q})=\lambda (A\mathbf{q}). Moreover, A\mathbf{q}\neq \mathbf{0} since A^{T}A\mathbf{q}=\lambda \mathbf{q}\neq \mathbf{0} and both \lambda \neq 0 and \mathbf{q}\neq \mathbf{0}. Hence \lambda is an eigenvalue of AA^{T}, proving that every positive eigenvalue of A^{T}A is a positive eigenvalue of AA^{T}. For the other inclusion replace A by A^{T}.
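Although the lemma is proved in general above, it can be instructive to check it numerically. The following short Python sketch (using NumPy; the random 3 by 5 matrix and the 1e-10 tolerance are illustrative choices of ours) verifies both parts of Lemma [lem:svdlemma2] for one example. It is an illustration, not a proof.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # one arbitrary 3 x 5 real matrix

eig_AtA = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of A^T A (5 of them)
eig_AAt = np.linalg.eigvalsh(A @ A.T)    # eigenvalues of A A^T (3 of them)

print(np.all(eig_AtA > -1e-10))          # all eigenvalues are (numerically) non-negative
print(np.all(eig_AAt > -1e-10))

# the positive eigenvalues of A^T A and A A^T agree
pos_AtA = np.sort(eig_AtA[eig_AtA > 1e-10])
pos_AAt = np.sort(eig_AAt[eig_AAt > 1e-10])
print(np.allclose(pos_AtA, pos_AAt))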

To analyze an m\times n matrix A we have two symmetric matrices to work with: A^{T}A and AA^{T}. In view of Lemma [lem:svdlemma2], we choose A^{T}A (sometimes called the Gram matrix of A), and derive a series of facts which we will need. This narrative is a bit long, but it will be worth the effort. We parse it out in several steps:

  1. The n\times n matrix A^{T}A is real and symmetric so, by the Principal Axes Theorem [thm:024303], let \{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\}\subseteq \mathbb{R}^{n} be an orthonormal basis of eigenvectors of A^{T}A, with corresponding eigenvalues \lambda_{1},\lambda_{2},\ldots ,\lambda_{n}. By Lemma [lem:svdlemma2](1), each \lambda_{i} is real and \lambda_{i}\geq 0. By re-ordering the \mathbf{q}_{i} we may (and do) assume that

    \label{svdeqni} \lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0 \quad \mbox{and} \quad \lambda_{i}=0 \mbox{ if } i>r \tag{\textbf{i}}

    Of course the \lambda_{i} could all be positive (r=n) or all zero (so A^{T}A=0, and hence A=0 by Exercise ???). Moreover,

    \label{svdeqnii} Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right] \mbox{ is orthogonal and orthogonally diagonalizes } A^{T}A \tag{\textbf{ii}}

  2. Observe first that, for all i and j,

    A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=(A\mathbf{q}_{i})^{T}(A\mathbf{q}_{j})=\mathbf{q}_{i}^{T}(A^{T}A\mathbf{q}_{j})=\mathbf{q}_{i}^{T}(\lambda_{j}\mathbf{q}_{j})=\lambda_{j}(\mathbf{q}_{i}\dotprod \mathbf{q}_{j}) \nonumber

    Because \{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\} is an orthonormal set, this gives

    \label{svdeqniii} A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=0 \mbox{ if } i\neq j \quad \mbox{and} \quad \| A\mathbf{q}_{i}\|^{2}=\lambda_{i}\| \mathbf{q}_{i}\|^{2}=\lambda_{i} \mbox{ for each } i \tag{\textbf{iii}}

    We can extract two conclusions from ([svdeqniii]) and ([svdeqni]):

    \label{svdeqniv} \{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq \func{im}A \mbox{ is an orthogonal set} \quad \mbox{and} \quad A\mathbf{q}_{i}=\mathbf{0} \mbox{ if } i>r \tag{\textbf{iv}}

    With this write U=span \;\{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq \func{im}A; we claim that U=\func{im}A, that is \func{im}A\subseteq U. For this we must show that A\mathbf{x}\in U for each \mathbf{x}\in \mathbb{R}^{n}. Since \{\mathbf{q}_{1},\ldots ,\mathbf{q}_{r},\ldots ,\mathbf{q}_{n}\} is a basis of \mathbb{R}^{n} (it is orthonormal), we can write \mathbf{x}=t_{1}\mathbf{q}_{1}+\cdots +t_{r}\mathbf{q}_{r}+\cdots +t_{n}\mathbf{q}_{n} where each t_{j}\in \mathbb{R}. Then, using ([svdeqniv]) we obtain

    A\mathbf{x}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}+\cdots +t_{n}A\mathbf{q}_{n}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}\in U \nonumber

    This shows that U=\func{im}A, and so

    \label{svdeqnv} \{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\} \mbox{ is an \emph{orthogonal} basis of } \func{im}A \tag{\textbf{v}}

    But \func{col}A=\func{im}A by Lemma [lem:svdlemma1], and rank \;A=dim \;(\func{col}A) by Theorem [thm:015444], so

    \label{svdeqnvi} rank \;A=dim \;(\func{col}A)=dim \;(\func{im}A)\overset{(\mathbf{v})}{=}r \tag{\textbf{vi}}

  3. svddef1 The real numbers \sigma_{i}=\sqrt{\lambda_{i}}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\|, for i=1,2,\ldots ,n, are called the singular values of the matrix A.

    Clearly \sigma_{1},\sigma_{2},\ldots ,\sigma_{r} are the positive singular values of A. By ([svdeqni]) we have

    \label{svdeqnvii} \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 \quad \mbox{and} \quad \sigma_{i}=0 \mbox{ if } i>r \tag{\textbf{vii}}

    With ([svdeqnvi]) this makes the following definitions depend only upon A.

    svddef2 Let A be a real, m\times n matrix of rank r, with positive singular values \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 and \sigma_{i}=0 if i>r. Define:

    D_{A}=\func{diag}(\sigma_{1},\ldots ,\sigma_{r}) \quad \mbox{and} \quad \Sigma_{A}= \left[ \begin{array}{cc} D_{A} & 0 \\ 0 & 0 \end{array} \right]_{m\times n} \nonumber

    Here \Sigma_{A} is in block form and is called the singular matrix of A.

    The singular values \sigma_{i} and the matrices D_{A} and \Sigma_{A} will be referred to frequently below.

  4. Returning to our narrative, normalize the vectors A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r} by defining

    \label{svdeqnviii} \mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\|}A\mathbf{q}_{i}\in \mathbb{R}^{m} \quad \mbox{for each } i=1,2,\ldots ,r \tag{\textbf{viii}}

    By ([svdeqnv]) and ([svdeqnviii]),

    \label{svdeqnix} \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} \mbox{ is an \emph{orthonormal} basis of } \func{col}A\subseteq \mathbb{R}^{m} \tag{\textbf{ix}}

    Employing the Gram-Schmidt algorithm (or otherwise), construct \mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m} so that

    \label{svdeqnx} \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots ,\mathbf{p}_{m}\} \mbox{ is an orthonormal basis of } \mathbb{R}^{m} \tag{\textbf{x}}

  5. By ([svdeqnii]) and ([svdeqnx]) we have two orthogonal matrices: Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right] of size n\times n, and P=\left[ \begin{array}{cccc}\mathbf{p}_{1} & \mathbf{p}_{2} & \cdots & \mathbf{p}_{m}\end{array}\right] of size m\times m. These matrices are related. In fact we have:

    \label{svdeqnxi} \sigma_{i}\mathbf{p}_{i}=\sqrt{\lambda_{i}}\mathbf{p}_{i}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\| \mathbf{p}_{i}\overset{(\textbf{viii})}{=}A\mathbf{q}_{i} \quad \mbox{for each } i=1,2,\ldots ,r \tag{\textbf{xi}}

    This yields the following expression for AQ in terms of its columns:

    \label{svdeqnxii} AQ=\left[ \begin{array}{cccccc} A\mathbf{q}_{1} & \cdots & A\mathbf{q}_{r} & A\mathbf{q}_{r+1} & \cdots & A\mathbf{q}_{n} \end{array}\right]\overset{(\textbf{iv}),\ (\textbf{xi})}{=}\left[ \begin{array}{cccccc} \sigma_{1}\mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \tag{\textbf{xii}}

    Then we compute:

    P\Sigma_{A}=\left[ \begin{array}{ccccc} \mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \cdots & \mathbf{p}_{m} \end{array}\right] \left[ \begin{array}{cc} \func{diag}(\sigma_{1},\ldots ,\sigma_{r}) & 0 \\ 0 & 0 \end{array} \right] = \left[ \begin{array}{cccccc} \sigma_{1}\mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right]\overset{(\textbf{xii})}{=}AQ \nonumber

    Finally, as Q^{-1}=Q^{T} it follows that A=P\Sigma_{A}Q^{T}.

With this we can state the main theorem of this Section.

svdtheorem1 Let A be a real m\times n matrix, and let \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 be the positive singular values of A. Then r is the rank of A and we have the factorization

A=P\Sigma_{A}Q^{T}\quad \mbox{where } P \mbox{ and } Q \mbox{ are orthogonal matrices} \nonumber

The factorization A=P\Sigma_{A}Q^{T} in Theorem [thm:svdtheorem1], where P and Q are orthogonal matrices, is called a Singular Value Decomposition (SVD) of A. This decomposition is not unique. For example if r<m then the vectors \mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m} can be any extension of \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r}\} to an orthonormal basis of \mathbb{R}^{m}, and each will lead to a different matrix P in the decomposition. For a more dramatic example, if A=I_{n} then \Sigma_{A}=I_{n}, and A=P\Sigma_{A}P^{T} is an SVD of A for any orthogonal n\times n matrix P.

svdexample1 Find a singular value decomposition for A= \left[ \begin{array}{rrr} 1 & 0 & 1 \\ -1 & 1 & 0 \end{array} \right].

We have A^{T}A= \left[ \begin{array}{rrr} 2 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right], so the characteristic polynomial is

c_{A^{T}A}(x)=\det \left( xI-A^{T}A\right) =x(x-1)(x-3) \nonumber

Hence the eigenvalues of A^{T}A (in descending order) are \lambda_{1}=3, \lambda_{2}=1 and \lambda_{3}=0 with, respectively, unit eigenvectors

\mathbf{q}_{1}=\frac{1}{\sqrt{6}} \left[ \begin{array}{r} 2 \\ -1 \\ 1 \end{array} \right], \quad \mathbf{q}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right],\quad \mbox{and} \quad \mathbf{q}_{3}=\frac{1}{\sqrt{3}} \left[ \begin{array}{r} -1 \\ -1 \\ 1 \end{array} \right] \nonumber

It follows that the orthogonal matrix Q in Theorem [thm:svdtheorem1] is

Q=\left[ \begin{array}{ccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \mathbf{q}_{3}\end{array}\right]=\frac{1}{\sqrt{6}} \left[ \begin{array}{rrr} 2 & 0 & -\sqrt{2} \\ -1 & \sqrt{3} & -\sqrt{2} \\ 1 & \sqrt{3} & \sqrt{2} \end{array} \right] \nonumber

The singular values here are \sigma_{1}=\sqrt{3}, \sigma_{2}=1 and \sigma_{3}=0, so rank \;(A)=2—clear in this case—and the singular matrix is

\Sigma_{A}= \left[ \begin{array}{ccc} \sigma_{1} & 0 & 0 \\ 0 & \sigma_{2} & 0 \end{array} \right] =\left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \nonumber

So it remains to find the 2\times 2 orthogonal matrix P in Theorem [thm:svdtheorem1]. This involves the vectors

A\mathbf{q}_{1}=\frac{\sqrt{6}}{2} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right],\quad A\mathbf{q}_{2}=\frac{\sqrt{2}}{2} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] ,\quad \mbox{and} \quad A\mathbf{q}_{3}= \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \nonumber

Normalize A\mathbf{q}_{1} and A\mathbf{q}_{2} to get

\mathbf{p}_{1}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right] \quad \mbox{and} \quad \mathbf{p}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \nonumber

In this case, \{\mathbf{p}_{1},\mathbf{p}_{2}\} is already a basis of \mathbb{R}^{2} (so the Gram-Schmidt algorithm is not needed), and we have the 2\times 2 orthogonal matrix

P=\left[ \begin{array}{cc} \mathbf{p}_{1} & \mathbf{p}_{2} \end{array}\right]=\frac{1}{\sqrt{2}} \left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \nonumber

Finally (by Theorem [thm:svdtheorem1]) the singular value decomposition for A is

A=P\Sigma_{A}Q^{T}= \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \frac{1}{\sqrt{6}}\left[ \begin{array}{rrr} 2 & -1 & 1 \\ 0 & \sqrt{3} & \sqrt{3} \\ -\sqrt{2} & -\sqrt{2} & \sqrt{2} \end{array} \right] \nonumber

Of course this can be confirmed by direct matrix multiplication.
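For readers working along with software, here is a minimal NumPy check of the example above (not part of the original text): the entries of P, \Sigma_{A} and Q are copied from the worked solution, and the sketch simply confirms that P\Sigma_{A}Q^{T} reproduces A and that P and Q are orthogonal.

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])

P = (1 / np.sqrt(2)) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])
Sigma = np.array([[np.sqrt(3), 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
Q = (1 / np.sqrt(6)) * np.array([[2.0, 0.0, -np.sqrt(2)],
                                 [-1.0, np.sqrt(3), -np.sqrt(2)],
                                 [1.0, np.sqrt(3), np.sqrt(2)]])

print(np.allclose(P @ Sigma @ Q.T, A))   # True: P Sigma_A Q^T reproduces A
print(np.allclose(Q.T @ Q, np.eye(3)))   # Q is orthogonal
print(np.allclose(P.T @ P, np.eye(2)))   # P is orthogonal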

Thus, computing an SVD for a real matrix A is a routine matter, and we now describe a systematic procedure for doing so.

SVD Algorithm svdalgorithm Given a real m\times n matrix A, find an SVD A=P\Sigma_{A}Q^{T} as follows:

  1. Use the Diagonalization Algorithm to find the (real and non-negative) eigenvalues \lambda_{1},\lambda_{2},\ldots,\lambda_{n} of A^{T}A with corresponding (orthonormal) eigenvectors \mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}. Reorder the \mathbf{q}_{i} (if necessary) to ensure that the nonzero eigenvalues are \lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0 and \lambda_{i}=0 if i>r.
  2. The integer r is the rank of the matrix A.
  3. The n\times n orthogonal matrix Q in the SVD is Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right].
  4. Define \mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\|}A\mathbf{q}_{i} for i=1,2,\ldots ,r (where r is as in step 1). Then \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} is orthonormal in \mathbb{R}^{m} so (using Gram-Schmidt or otherwise) extend it to an orthonormal basis \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots , \mathbf{p}_{m}\} in \mathbb{R}^{m}.
  5. The m\times m orthogonal matrix P in the SVD is P=\left[ \begin{array}{ccccc} \mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \cdots & \mathbf{p}_{m}\end{array}\right].
  6. The singular values for A are \sigma_{1},\sigma_{2},\ldots,\sigma_{n} where \sigma_{i}=\sqrt{\lambda_{i}} for each i. Hence the nonzero singular values are \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0, and so the singular matrix of A in the SVD is \Sigma_{A}= \left[ \begin{array}{cc} \func{diag}(\sigma_{1},\ldots ,\sigma_{r}) & 0 \\ 0 & 0 \end{array} \right]_{m\times n}.
  7. Thus A=P\Sigma_{A}Q^{T} is an SVD for A.
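The SVD Algorithm can be carried out almost verbatim in any language with an eigenvalue routine. The sketch below (Python with NumPy; the function name svd_textbook, the tolerance tol, and the use of a complete QR factorization in place of Gram-Schmidt are our own choices, and a nonzero A is assumed) follows steps 1 to 7 above.

import numpy as np

def svd_textbook(A, tol=1e-12):
    # A sketch of the SVD Algorithm above; assumes A is nonzero.
    A = np.asarray(A, dtype=float)
    m, n = A.shape

    # Step 1: eigenvalues and orthonormal eigenvectors of A^T A, in descending order.
    lam, Q = np.linalg.eigh(A.T @ A)          # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    lam, Q = lam[order], Q[:, order]
    lam = np.clip(lam, 0.0, None)             # remove tiny negative round-off

    # Step 2: r = rank A = number of positive eigenvalues.
    r = int(np.sum(lam > tol))
    sigma = np.sqrt(lam)                      # the singular values (step 6)

    # Step 4: p_i = A q_i / ||A q_i|| for i = 1,...,r (already orthonormal).
    P_r = np.column_stack([A @ Q[:, i] / sigma[i] for i in range(r)])

    # Extend {p_1,...,p_r} to an orthonormal basis of R^m: the last m - r columns
    # of a complete QR factorization of P_r span the orthogonal complement.
    P_rest = np.linalg.qr(P_r, mode="complete")[0][:, r:]
    P = np.column_stack([P_r, P_rest])        # step 5

    # Step 6: the singular matrix Sigma_A.
    Sigma_A = np.zeros((m, n))
    Sigma_A[:r, :r] = np.diag(sigma[:r])
    return P, Sigma_A, Q

# usage: the matrix of the worked example above
A = np.array([[1.0, 0.0, 1.0], [-1.0, 1.0, 0.0]])
P, S, Q = svd_textbook(A)
print(np.allclose(P @ S @ Q.T, A))            # True

Note that the eigenvectors returned by the eigenvalue routine may differ in sign from those chosen by hand, so P and Q need not match the worked example exactly, but the product P\Sigma_{A}Q^{T} still equals A.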

In practice the singular values \sigma_{i}, the matrices P and Q, and even the rank of an m\times n matrix are not calculated this way. There are sophisticated numerical algorithms for calculating them to a high degree of accuracy. The reader is referred to books on numerical linear algebra.

So the main virtue of Theorem [thm:svdtheorem1] is that it provides a way of constructing an SVD for every real matrix A. In particular it shows that every real matrix A has a singular value decomposition1 in the following, more general, sense:

svddef3 A Singular Value Decomposition (SVD) of an m\times n matrix A of rank r is a factorization A=U\Sigma V^{T} where U and V are orthogonal and \Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n} in block form, where D=\func{diag}(d_{1},d_{2},\ldots ,d_{r}) with each d_{i}>0, and r\leq m and r\leq n.

Note that for any SVD A=U\Sigma V^{T} we immediately obtain some information about A:

svdlemma3 If A=U\Sigma V^{T} is any SVD for A as in Definition [def:svddef3], then:

  1. r=rank \;A.
  2. The numbers d_{1},d_{2},\dots ,d_{r} are the singular values of A in some order.

Use the notation of Definition [def:svddef3]. We have

A^{T}A=(V\Sigma^{T}U^{T})(U\Sigma V^{T})=V(\Sigma^{T}\Sigma)V^{T} \nonumber

so \Sigma^{T}\Sigma and A^{T}A are similar n\times n matrices (Definition [def:015930]). Hence r=rank \; A by Corollary [cor:015519], proving (1.). Furthermore, \Sigma^{T}\Sigma and A^{T}A have the same eigenvalues by Theorem [thm:016008]; that is (using (1.)):

\{d_{1}^{2},d_{2}^{2},\dots ,d_{r}^{2}\}=\{\lambda_{1},\lambda_{2},\dots ,\lambda_{r}\} \quad \mbox{are equal as sets} \nonumber

where \lambda_{1},\lambda_{2},\dots ,\lambda_{r} are the positive eigenvalues of A^{T}A. Hence there is a permutation \tau of \{1,2,\cdots ,r\} such that d_{i}^{2}=\lambda_{i\tau } for each i=1,2,\dots ,r. Hence d_{i}=\sqrt{\lambda_{i\tau }}=\sigma_{i\tau } for each i by Definition [def:svddef1]. This proves (2.).

We note in passing that more is true. Let A be m\times n of rank r, and let A=U\Sigma V^{T} be any SVD for A. Using the proof of Lemma [lem:svdlemma3] we have d_{i}=\sigma_{i\tau } for some permutation \tau of \{1,2,\dots ,r\}. In fact, it can be shown that there exist orthogonal matrices U_{1} and V_{1} obtained from U and V by \tau-permuting columns and rows respectively, such that A=U_{1}\Sigma_{A}V_{1}^{T} is an SVD of A.
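As a numerical illustration of Lemma [lem:svdlemma3], the following sketch compares the singular values returned by NumPy's built-in np.linalg.svd (one particular SVD of A) with the square roots of the positive eigenvalues of A^{T}A; the random 4 by 6 matrix and the 1e-10 tolerance are arbitrary choices of ours.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# np.linalg.svd returns factors with the singular values s in descending order
U, s, Vt = np.linalg.svd(A)
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]     # eigenvalues of A^T A, descending

print(np.linalg.matrix_rank(A) == np.sum(s > 1e-10))             # r = rank A
print(np.allclose(s, np.sqrt(np.clip(lam[:len(s)], 0, None))))   # d_i = sqrt(lambda_i)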

Fundamental Subspaces

It turns out that any singular value decomposition contains a great deal of information about an m\times n matrix A and the subspaces associated with A. For example, in addition to Lemma [lem:svdlemma3], the set \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} of vectors constructed in the proof of Theorem [thm:svdtheorem1] is an orthonormal basis of \func{col} A (by ([svdeqnv]) and ([svdeqnviii]) in the proof). There are more such examples, which is the thrust of this subsection. In particular, there are four subspaces associated to a real m\times n matrix A that have come to be called fundamental:

svddef4 The fundamental subspaces of an m\times n matrix A are:

  • \func{row}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a row of } A\}
  • \func{col}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a column of } A\}
  • \func{null}A=\{\mathbf{x}\in \mathbb{R}^{n}\mid A\mathbf{x}=\mathbf{0}\}
  • \func{null}A^{T}=\{\mathbf{x}\in \mathbb{R}^{m}\mid A^{T}\mathbf{x}=\mathbf{0}\}

If A=U\Sigma V^{T} is any SVD for the real m\times n matrix A, the columns of U and V provide orthonormal bases for each of these fundamental subspaces. We are going to prove this, but first we need three properties related to the orthogonal complement U^{\perp} of a subspace U of \mathbb{R}^{n}, where (Definition [def:023776]):

U^{\perp}=\{\mathbf{x}\in \mathbb{R}^{n}\mid \mathbf{u}\bullet \mathbf{x}=0 \mbox{ for all } \mathbf{u}\in U\} \nonumber

The orthogonal complement plays an important role in the Projection Theorem (Theorem [thm:023885]), and we return to it in Section [sec:10_2]. For now we need:

svdlemma4 If A is any matrix then:

  1. (\func{row}A)^{\perp}=\func{null}A and (\func{col}A)^{\perp}=\func{null}A^{T}.
  2. If U is any subspace of \mathbb{R}^{n} then U^{\perp\perp}=U.
  3. Let \{\mathbf{f}_{1},\ldots ,\mathbf{f}_{m}\} be an orthonormal basis of \mathbb{R}^{m}. If U=span \;\{\mathbf{f}_{1},\ldots ,\mathbf{f}_{k}\}, then

    U^{\perp}=span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\} \nonumber

  1. Let \mathbf{b}_{1},\ldots ,\mathbf{b}_{m} denote the rows of A. If \mathbf{x}\in \mathbb{R}^{n}, then A\mathbf{x}=\mathbf{0} if and only if \mathbf{b}_{i}\dotprod \mathbf{x}=0 for each i, that is, if and only if \mathbf{x}\in (\func{row}A)^{\perp}. Hence \func{null}A=(\func{row}A)^{\perp}. Now replace A by A^{T} to get \func{null}A^{T}=(\func{row}A^{T})^{\perp}=(\func{col}A)^{\perp}, which is the other identity in (1).
  2. If \mathbf{x}\in U then \mathbf{y}\dotprod \mathbf{x}=0 for all \mathbf{y}\in U^{\perp}, that is \mathbf{x}\in U^{\perp\perp}. This proves that U\subseteq U^{\perp\perp}, so it is enough to show that dim \;U=dim \;U^{\perp\perp}. By Theorem [thm:023953] we see that dim \;V^{\perp}=n-dim \;V for any subspace V\subseteq \mathbb{R}^{n}. Hence

    dim \;U^{\perp\perp}=n-dim \;U^{\perp}=n-(n-dim \;U)=dim \;U, \mbox{ as required} \nonumber

  3. If \mathbf{x}\in U^{\perp}, then \mathbf{f}_{i}\dotprod \mathbf{x}=0 for i=1,\ldots ,k, so the Expansion Theorem gives

    \begin{array}{ccccccccccccc} \mathbf{x} & = & (\mathbf{f}_{1}\dotprod \mathbf{x})\mathbf{f}_{1} & + & \cdots & + & (\mathbf{f}_{k}\dotprod \mathbf{x})\mathbf{f}_{k} & + &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1} & + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \\ & = & \mathbf{0} &+ &\cdots &+ &\mathbf{0} &+ &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1}& + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \end{array} \nonumber

    Hence U^{\perp}\subseteq span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\}. For the other inclusion, each \mathbf{f}_{j} with j>k is orthogonal to every \mathbf{f}_{i} with i\leq k, and hence to every vector in U=span \;\{\mathbf{f}_{1},\ldots ,\mathbf{f}_{k}\}; that is, \mathbf{f}_{j}\in U^{\perp}.

With this we can see how any SVD for a matrix A provides orthonormal bases for each of the four fundamental subspaces of A.

svdtheorem2 Let A be an m\times n real matrix, let A=U\Sigma V^{T} be any SVD for A where U and V are orthogonal of size m\times m and n\times n respectively, and let

\Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n}\qquad \mbox{where} \qquad D=\func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}), \mbox{ with each } \lambda_{i}>0 \nonumber

Write U=\left[ \begin{array}{ccccc}\mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m}\end{array}\right] and V=\left[ \begin{array}{ccccc}\mathbf{v}_{1} & \cdots & \mathbf{v}_{r} & \cdots & \mathbf{v}_{n}\end{array}\right], so \{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r},\ldots ,\mathbf{u}_{m}\} and \{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r},\ldots ,\mathbf{v}_{n}\} are orthonormal bases of \mathbb{R}^{m} and \mathbb{R}^{n} respectively. Then

  1. r=rank \;A, and the singular values of A are \lambda_{1},\lambda_{2},\ldots ,\lambda_{r} in some order.
  2. The fundamental spaces are described as follows:
    1. \{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\} is an orthonormal basis of \func{col}A.
    2. \{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\} is an orthonormal basis of \func{null}A^{T}.
    3. \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is an orthonormal basis of \func{null}A.
    4. \{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\} is an orthonormal basis of \func{row}A.
  1. This is Lemma [lem:svdlemma3].
    1. As \func{col}A=\func{col}(AV) by Lemma [lem:015527] and AV=U\Sigma, (a.) follows from

      U\Sigma =\left[ \begin{array}{ccccc} \mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m} \end{array}\right] \left[ \begin{array}{cc} \func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}) & 0 \\ 0 & 0 \end{array} \right] = \left[ \begin{array}{cccccc} \lambda_{1}\mathbf{u}_{1} & \cdots & \lambda_{r}\mathbf{u}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \nonumber

    2. We have (\func{col}A)^{\perp}\overset{\text{(a.)}}{=}(span \;\{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\})^{\perp}=span \;\{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\} by Lemma [lem:svdlemma4](3). This proves (b.) because (\func{col}A)^{\perp}=\func{null}A^{T} by Lemma [lem:svdlemma4](1).
    3. The set \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is orthonormal and contains n-r vectors, and dim \;(\func{null}A)=n-r because rank \;A=r. So to prove (c.) it is enough to show that \mathbf{v}_{j}\in \func{null}A whenever j>r. To this end write

      \lambda_{r+1}=\cdots =\lambda_{n}=0,\quad \mbox{so} \quad \Sigma^{T}\Sigma=\func{diag}(\lambda_{1}^{2},\ldots ,\lambda_{r}^{2},\lambda_{r+1}^{2},\ldots,\lambda_{n}^{2}) \nonumber

      Observe that each \lambda_{j}^{2} is an eigenvalue of \Sigma^{T}\Sigma with eigenvector \mathbf{e}_{j}= column j of I_{n}. Thus \mathbf{v}_{j}=V\mathbf{e}_{j} for each j. As A^{T}A=V\Sigma^{T}\Sigma V^{T} (proof of Lemma [lem:svdlemma3]), we obtain

      (A^{T}A)\mathbf{v}_{j}=(V\Sigma^{T}\Sigma V^{T})(V\mathbf{e}_{j})=V(\Sigma^{T}\Sigma \mathbf{e}_{j})=V\left( \lambda_{j}^{2}\mathbf{e}_{j}\right) =\lambda_{j}^{2}V\mathbf{e}_{j}=\lambda_{j}^{2}\mathbf{v}_{j} \nonumber

      for 1\leq j\leq n. Thus each \mathbf{v}_{j} is an eigenvector of A^{T}A corresponding to \lambda_{j}^{2}. But then

      \| A\mathbf{v}_{j}\|^{2}=(A\mathbf{v}_{j})^{T}A\mathbf{v}_{j}=\mathbf{v}_{j}^{T}(A^{T}A\mathbf{v}_{j})=\mathbf{v}_{j}^{T}(\lambda _{j}^{2}\mathbf{v}_{j})=\lambda_{j}^{2}\| \mathbf{v}_{j}\|^{2}=\lambda_{j}^{2}\quad \mbox{for } j=1,\ldots ,n \nonumber

      In particular, A\mathbf{v}_{j}=\mathbf{0} whenever j>r, so \mathbf{v}_{j}\in \func{null}A if j>r, as desired. This proves (c).

    4. Finally, (c.) together with Lemma [lem:svdlemma4] gives \func{row}A=((\func{row}A)^{\perp})^{\perp}=(\func{null}A)^{\perp}=(span \;\{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\})^{\perp}=span \;\{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\}. This proves (d.), and hence Theorem [thm:svdtheorem2].

svdexample2 Consider the homogeneous linear system

A\mathbf{x}=\mathbf{0} \mbox{ of } m \mbox{ equations in } n \mbox{ variables} \nonumber

Then the set of all solutions is \func{null} A. Hence if A=U\Sigma V^{T} is any SVD for A then (in the notation of Theorem [thm:svdtheorem2]) \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is an orthonormal basis of the set of solutions for the system. As such they are a set of basic solutions for the system, the most basic notion in Chapter [chap:1].
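In software, the bases of Theorem [thm:svdtheorem2] can be read off directly from the factors returned by np.linalg.svd, as in the following sketch (the matrix is the one from the earlier worked example, and the 1e-10 tolerance is an arbitrary choice of ours).

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])          # rank 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))

col_A   = U[:, :r]        # columns u_1,...,u_r     : orthonormal basis of col A
null_At = U[:, r:]        # columns u_{r+1},...,u_m : orthonormal basis of null A^T
row_A   = Vt[:r, :].T     # columns v_1,...,v_r     : orthonormal basis of row A
null_A  = Vt[r:, :].T     # columns v_{r+1},...,v_n : orthonormal basis of null A

print(np.allclose(A @ null_A, 0))         # basic solutions of Ax = 0
print(np.allclose(A.T @ null_At, 0))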

The Polar Decomposition of a Real Square Matrix

If A is real and n\times n, the factorization in the title, called the polar decomposition of A, is closely related to the SVD of A. Unlike the SVD, in this case the decomposition is uniquely determined by A.

Recall (Section [sec:8_3]) that a symmetric matrix A is called positive definite if and only if \mathbf{x}^{T}A\mathbf{x}>0 for every column \mathbf{x} \neq \mathbf{0} \in \mathbb{R}^{n}. Before proceeding, we must explore the following weaker notion:

svddef5 A real n\times n matrix G is called positive if it is symmetric and

\mathbf{x}^{T}G\mathbf{x}\geq 0 \quad \mbox{for all } \mathbf{x}\in \mathbb{R}^{n} \nonumber

Clearly every positive definite matrix is positive, but the converse fails. Indeed, A= \left[ \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right] is positive because, if \mathbf{x}=\left[ \begin{array}{cc} a & b \end{array}\right]^{T} in \mathbb{R}^{2}, then \mathbf{x}^{T}A\mathbf{x}=(a+b)^{2}\geq 0. But \mathbf{y}^{T}A\mathbf{y}=0 if \mathbf{y}=\left[ \begin{array}{cc} 1 & -1 \end{array}\right]^{T}, so A is not positive definite.

svdlemma5 Let G denote an n\times n positive matrix.

  1. If A is any n\times m matrix, then A^{T}GA is positive (and m\times m).
  2. If G=\func{diag}(d_{1}, d_{2},\cdots ,d_{n}) and each d_{i}\geq 0 then G is positive.
  1. \mathbf{x}^{T}(A^{T}GA)\mathbf{x}=(A\mathbf{x})^{T}G(A\mathbf{x})\geq 0 for all \mathbf{x}\in \mathbb{R}^{m} because G is positive; since A^{T}GA is clearly symmetric, this proves (1.).
  2. If \mathbf{x}=\left[ \begin{array}{cccc}x_{1}& x_{2} & \cdots & x_{n}\end{array}\right]^{T}, then

    \mathbf{x}^{T}G\mathbf{x}=d_{1}x_{1}^{2}+d_{2}x_{2}^{2}+\cdots +d_{n}x_{n}^{2}\geq 0 \nonumber

svddef6 If A is a real n\times n matrix, a factorization

A=GQ \mbox{ where } G \mbox{ is positive and } Q \mbox{ is orthogonal} \nonumber

is called a polar decomposition for A.

Any SVD for a real square matrix A yields a polar form for A.

svdtheorem3 Every square real matrix has a polar form.

Let A=U\Sigma V^{T} be an SVD for A with \Sigma as in Definition [def:svddef3] and m=n. Since U^{T}U=I_{n} here we have

A=U\Sigma V^{T}=(U\Sigma )(U^{T}U)V^{T}=(U\Sigma U^{T})(UV^{T}) \nonumber

So if we write G=U\Sigma U^{T} and Q=UV^{T}, then Q is orthogonal, and it remains to show that G is positive. But \Sigma is positive by Lemma [lem:svdlemma5](2), so G=(U^{T})^{T}\Sigma (U^{T}) is positive by Lemma [lem:svdlemma5](1).

The SVD for a square matrix A is not unique (I_{n}=PI_{n}P^{T} for any orthogonal matrix P). But given the proof of Theorem [thm:svdtheorem3] it is surprising that the polar decomposition is unique.2 We omit the proof.
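For completeness, here is how the construction in the proof of Theorem [thm:svdtheorem3] looks numerically: G=U\Sigma U^{T} and Q=UV^{T} are built from an SVD returned by np.linalg.svd, and the sketch checks that GQ=A, that Q is orthogonal, and that G is positive. The random 3 by 3 matrix is an arbitrary choice of ours.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

U, s, Vt = np.linalg.svd(A)
G = U @ np.diag(s) @ U.T                  # positive: symmetric with eigenvalues s_i >= 0
Q = U @ Vt                                # orthogonal

print(np.allclose(G @ Q, A))                       # a polar form A = GQ
print(np.allclose(Q @ Q.T, np.eye(3)))
print(np.all(np.linalg.eigvalsh(G) >= -1e-10))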

The name “polar form” is reminiscent of the same form for complex numbers (see Appendix [chap:appacomplexnumbers]). This is no coincidence. To see why, we represent the complex numbers as real 2\times 2 matrices. Write \mathbf{M}_{2}(\mathbb{R}) for the set of all real 2\times 2 matrices, and define

\sigma :\mathbb{C}\rightarrow \mathbf{M}_{2}(\mathbb{R})\quad \mbox{by} \quad \sigma (a+bi)= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \mbox{ for all } a+bi \mbox{ in } \mathbb{C} \nonumber

One verifies that \sigma preserves addition and multiplication in the sense that

\sigma (zw)=\sigma (z)\sigma (w)\qquad \mbox{and}\qquad \sigma (z+w)=\sigma (z)+\sigma (w) \nonumber

for all complex numbers z and w. Since \sigma is one-to-one we may identify each complex number a+bi with the matrix \sigma (a+bi), that is we write

a+bi= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \quad \mbox{for all } a+bi \mbox{ in } \mathbb{C} \nonumber

Thus 0= \left[ \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right], 1= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] =I_{2}, i= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right], and r = \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] if r is real.

If z=a+bi is nonzero then the absolute value r=|z| =\sqrt{a^{2}+b^{2}}\neq 0. If \theta is the angle of z in standard position, then \cos\theta =a/r and \sin\theta =b/r. Observe:

\label{svdeqnxiii} \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} a/r & -b/r \\ b/r & a/r \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right] = GQ \tag{\textbf{xiii}}

where G= \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] is positive and Q= \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right] is orthogonal. But in \mathbb{C} we have G=r and Q=\cos\theta +i\sin\theta so ([svdeqnxiii]) reads z=r(\cos\theta +i\sin\theta )=re^{i\theta } which is the classical polar form for the complex number a+bi. This is why ([svdeqnxiii]) is called the polar form of the matrix \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right]; Definition [def:svddef6] simply adopts the terminology for n\times n matrices.

The Pseudoinverse of a Matrix

It is impossible for a non-square matrix A to have an inverse (see the footnote to Definition [def:004202]). Nonetheless, one candidate for an “inverse” of an m\times n matrix A is an n\times m matrix B such that

ABA=A \qquad \mbox{and} \qquad BAB=B \nonumber

Such a matrix B is called a middle inverse for A. If A is invertible then A^{-1} is the unique middle inverse for A, but a middle inverse is not unique in general, even for square matrices. For example, if A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right] then B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right] is a middle inverse for A for any b.

If ABA=A and BAB=B it is easy to see that AB and BA are both idempotent matrices. In 1955 Roger Penrose observed that the middle inverse is unique if both AB and BA are symmetric. We omit the proof.

Penrose’s Theorem penrosetheorem

Given any real m\times n matrix A, there is exactly one n\times m matrix B such that A and B satisfy the following conditions:

  1. ABA=A and BAB=B.
  2. Both AB and BA are symmetric.

svddef7 Let A be a real m\times n matrix. The pseudoinverse of A is the unique n\times m matrix A^{+} such that A and A^{+} satisfy P1 and P2, that is:

AA^{+}A=A,\qquad A^{+}AA^{+}=A^{+},\qquad \mbox{and both } AA^{+} \mbox{ and } A^{+}A \mbox{ are symmetric} \nonumber

If A is invertible then A^{+}=A^{-1} as expected. In general, the symmetry in conditions P1 and P2 shows that A is the pseudoinverse of A^{+}, that is A^{++}=A.

svdtheorem4 Let A be an m\times n matrix.

  1. If rank \;A=m then AA^{T} is invertible and A^{+}=A^{T}(AA^{T})^{-1}.
  2. If rank \;A=n then A^{T}A is invertible and A^{+}=(A^{T}A)^{-1}A^{T}.

Here AA^{T} (respectively A^{T}A) is invertible by Theorem [thm:015711] (respectively Theorem [thm:015672]). The rest is a routine verification.
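The following sketch checks Theorem [thm:svdtheorem4](2) for one matrix of full column rank (rank A = n); the random 5 by 3 matrix is an arbitrary choice of ours, and np.linalg.pinv is used only as an independent reference value for A^{+}.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))                    # rank 3 = n (almost surely)

A_plus = np.linalg.inv(A.T @ A) @ A.T              # (A^T A)^{-1} A^T
print(np.allclose(A_plus, np.linalg.pinv(A)))
print(np.allclose(A_plus @ A, np.eye(3)))          # here A^+ is a left inverse of A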

In general, given an m\times n matrix A, the pseudoinverse A^{+} can be computed from any SVD for A. To see how, we need some notation. Let A=U\Sigma V^{T} be an SVD for A (as in Definition [def:svddef3]) where U and V are orthogonal and \Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n} in block form, where D=\func{diag}(d_{1},d_{2},\ldots ,d_{r}) with each d_{i}>0. Hence D is invertible, so we make the following definition:

svddef8 \Sigma^{\prime }= \left[ \begin{array}{cc} D^{-1} & 0 \\ 0 & 0 \end{array} \right]_{n\times m}.

A routine calculation gives:

svdlemma6

  • \Sigma \Sigma^{\prime }\Sigma =\Sigma
  • \Sigma^{\prime }\Sigma \Sigma^{\prime }=\Sigma^{\prime }
  • \Sigma \Sigma^{\prime }= \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{m\times m}
  • \Sigma^{\prime }\Sigma = \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{n\times n}

That is, \Sigma^{\prime } is the pseudoinverse of \Sigma.

Now given A=U\Sigma V^{T}, define B=V\Sigma^{\prime }U^{T}. Then

ABA=(U\Sigma V^{T})(V\Sigma^{\prime }U^{T})(U\Sigma V^{T})=U(\Sigma \Sigma^{\prime }\Sigma )V^{T}=U\Sigma V^{T}=A \nonumber

by Lemma [lem:svdlemma6]. Similarly BAB=B. Moreover AB=U(\Sigma \Sigma^{\prime })U^{T} and BA=V(\Sigma^{\prime }\Sigma )V^{T} are both symmetric again by Lemma [lem:svdlemma6]. This proves

svdtheorem5 Let A be real and m\times n, and let A=U\Sigma V^{T} be any SVD for A as in Definition [def:svddef3]. Then A^{+}=V\Sigma^{\prime }U^{T}.

Of course we can always use the SVD constructed in Theorem [thm:svdtheorem1] to find the pseudoinverse. If A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right], we observed above that B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right] is a middle inverse for A for any b. Furthermore AB is symmetric but BA is not, so B\neq A^{+}.

svdexample3 Find A^{+} if A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right].

A^{T}A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right] with eigenvalues \lambda_{1}=1 and \lambda_{2}=0 and corresponding eigenvectors \mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] and \mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 1 \end{array} \right]. Hence Q=\left[ \begin{array}{cc}\mathbf{q}_{1}&\mathbf{q}_{2}\end{array}\right]=I_{2}. Also A has rank 1 with singular values \sigma_{1}=1 and \sigma_{2}=0, so \Sigma_{A}= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right]=A and \Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]=A^{T} in this case.

Since A\mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] and A\mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right], we have \mathbf{p}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] which extends to an orthonormal basis \{\mathbf{p}_{1},\mathbf{p}_{2},\mathbf{p}_{3}\} of \mathbb{R}^{3} where (say) \mathbf{p}_{2}= \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right] and \mathbf{p}_{3}= \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right]. Hence P=\left[ \begin{array}{ccc} \mathbf{p}_{1}&\mathbf{p}_{2}&\mathbf{p}_{3}\end{array}\right]=I_{3}, so the SVD for A is A=P\Sigma _{A}Q^{T}. Finally, the pseudoinverse of A is A^{+}=Q\Sigma^{\prime}_{A}P^{T}=\Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]. Note that A^{+}=A^{T} in this case.
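Theorem [thm:svdtheorem5] translates directly into code: build \Sigma^{\prime} from the singular values and multiply. The sketch below does this for the matrix of the example just given, using the SVD returned by np.linalg.svd, and compares the result with np.linalg.pinv; any valid SVD gives the same A^{+}.

import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])

U, s, Vt = np.linalg.svd(A)
m, n = A.shape
r = int(np.sum(s > 1e-10))

Sigma_prime = np.zeros((n, m))
Sigma_prime[:r, :r] = np.diag(1.0 / s[:r])         # block diag(D^{-1}) padded with zeros

A_plus = Vt.T @ Sigma_prime @ U.T                  # A^+ = V Sigma' U^T
print(A_plus)                                      # approximately [[1, 0, 0], [0, 0, 0]]
print(np.allclose(A_plus, np.linalg.pinv(A)))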

The following Lemma collects some properties of the pseudoinverse that mimic those of the inverse. The verifications are left as exercises.

svdlemma7 Let A be an m\times n matrix of rank r.

  1. A^{++}=A.
  2. If A is invertible then A^{+}=A^{-1}.
  3. (A^{T})^{+}=(A^{+})^{T}.
  4. (kA)^{+}=kA^{+} for any real k.
  5. (UAV)^{+}=V^{T}(A^{+})U^{T} whenever U and V are orthogonal (of sizes m\times m and n\times n respectively).
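Property (5.) can also be checked numerically. In the sketch below, U and V are random orthogonal matrices obtained from QR factorizations (an arbitrary way of producing them, chosen here for convenience), and np.linalg.pinv supplies the pseudoinverses.

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))
U = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # 3 x 3 orthogonal
V = np.linalg.qr(rng.standard_normal((4, 4)))[0]   # 4 x 4 orthogonal

lhs = np.linalg.pinv(U @ A @ V)
rhs = V.T @ np.linalg.pinv(A) @ U.T
print(np.allclose(lhs, rhs))                       # (UAV)^+ = V^T A^+ U^T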

  1. In fact every complex matrix has an SVD [J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997]↩
  2. See J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997, page 379.↩

This page titled 8.6: The Singular Value Decomposition is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by W. Keith Nicholson (Lyryx Learning Inc.) via source content that was edited to the style and standards of the LibreTexts platform.
