8.6: The Singular Value Decomposition
When working with a square matrix A it is clearly useful to be able to “diagonalize” A, that is to find a factorization A=Q^{-1}DQ where Q is invertible and D is diagonal. Unfortunately such a factorization may not exist for A. However, even if A is not square, Gaussian elimination provides a factorization of the form A=PDQ where P and Q are invertible and D is diagonal—the Smith Normal Form (Theorem [thm:005369]). However, if A is real we can choose P and Q to be orthogonal real matrices and D to be real. Such a factorization is called a singular value decomposition (SVD) for A, one of the most useful tools in applied linear algebra. In this section we show how to explicitly compute an SVD for any real matrix A, and illustrate some of its many applications.
We need a fact about two subspaces associated with an m\times n matrix A:
\func{im}A=\{A\mathbf{x}\mid \mathbf{x} \mbox{ in } \mathbb{R}^{n}\}\quad \mbox{and} \quad \func{col}A= span \;\{\mathbf{a}\mid \mathbf{a} \mbox{ is a column of } A\} \nonumber
Then \func{im}A is called the image of A (so named because of the linear transformation \mathbb{R}^{n}\rightarrow \mathbb{R}^{m} with \mathbf{x}\mapsto A\mathbf{x}); and \func{col}A is called the column space of A (Definition [def:015376]). Surprisingly, these spaces are equal:
svdlemma1 For any m\times n matrix A, \func{im}A=\func{col}A.
Let A=\left[ \begin{array}{cccc}\mathbf{a}_{1} & \mathbf{a}_{2} & \cdots & \mathbf{a}_{n}\end{array}\right] in terms of its columns. Let \mathbf{x}\in \func{im}A, say \mathbf{x}=A\mathbf{y}, \mathbf{y} in \mathbb{R}^{n}. If \mathbf{y}=\left[ \begin{array}{cccc}y_{1} & y_{2} & \cdots & y_{n}\end{array}\right]^{T}, then A\mathbf{y}=y_{1}\mathbf{a}_{1}+y_{2}\mathbf{a}_{2}+\cdots +y_{n}\mathbf{a}_{n}\in \func{col}A by Definition [def:002668]. This shows that \func{im}A\subseteq \func{col}A. For the other inclusion, each \mathbf{a}_{k}=A\mathbf{e}_{k} where \mathbf{e}_{k} is column k of I_{n}, so each \mathbf{a}_{k}\in \func{im}A and hence \func{col}A\subseteq \func{im}A.
Singular Value Decompositions
We know a lot about any real symmetric matrix: Its eigenvalues are real (Theorem [thm:016397]), and it is orthogonally diagonalizable by the Principal Axes Theorem (Theorem [thm:024303]). So for any real matrix A (square or not), the fact that both A^{T}A and AA^{T} are real and symmetric suggests that we can learn a lot about A by studying them. This section shows just how true this is.
The following Lemma reveals some similarities between A^{T}A and AA^{T} which simplify the statement and the proof of the SVD we are constructing.
svdlemma2 Let A be a real m\times n matrix. Then:
- The eigenvalues of A^{T}A and AA^{T} are real and non-negative.
- A^{T}A and AA^{T} have the same set of positive eigenvalues.
- Let \lambda be an eigenvalue of A^{T}A with eigenvector \mathbf{q}\neq \mathbf{0} in \mathbb{R}^{n}. Then \lambda is real (Theorem [thm:016397]) and \lambda \| \mathbf{q}\|^{2}=\mathbf{q}^{T}(\lambda \mathbf{q})=\mathbf{q}^{T}(A^{T}A\mathbf{q})=(A\mathbf{q})^{T}(A\mathbf{q})=\| A\mathbf{q}\|^{2}\geq 0, so \lambda \geq 0. Then (1.) follows for A^{T}A, and the case AA^{T} follows by replacing A by A^{T}.
- Suppose \lambda >0 is an eigenvalue of A^{T}A, with eigenvector \mathbf{q}\neq \mathbf{0}, so that A^{T}A\mathbf{q}=\lambda \mathbf{q}. Then (AA^{T})(A\mathbf{q})=A(A^{T}A\mathbf{q})=A(\lambda \mathbf{q})=\lambda (A\mathbf{q}). Moreover, A\mathbf{q}\neq \mathbf{0} since A^{T}A\mathbf{q}=\lambda \mathbf{q}\neq \mathbf{0} and both \lambda \neq 0 and \mathbf{q}\neq \mathbf{0}. Hence \lambda is an eigenvalue of AA^{T}, proving that every positive eigenvalue of A^{T}A is a positive eigenvalue of AA^{T}. For the other inclusion replace A by A^{T}.
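As an aside, Lemma [lem:svdlemma2] is easy to illustrate numerically. The NumPy sketch below is an illustration only (the matrix is an arbitrary choice, not from the text): it checks that the eigenvalues of A^{T}A and AA^{T} are non-negative and that the positive ones agree.

```python
import numpy as np

# An arbitrary 3x2 real matrix; any real matrix would do here.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])

eig_AtA = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of A^T A (2 of them)
eig_AAt = np.linalg.eigvalsh(A @ A.T)   # eigenvalues of A A^T (3 of them)

print(np.round(eig_AtA, 6))  # all >= 0
print(np.round(eig_AAt, 6))  # all >= 0; same positive values, plus extra zeros
```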
To analyze an m\times n matrix A we have two symmetric matrices to work with: A^{T}A and AA^{T}. In view of Lemma [lem:svdlemma2], we choose A^{T}A (sometimes called the Gram matrix of A), and derive a series of facts which we will need. This narrative is a bit long, but trust that it will be worth the effort. We parse it out in several steps:
- The n\times n matrix A^{T}A is real and symmetric so, by the Principal Axes Theorem [thm:024303], let \{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\}\subseteq \mathbb{R}^{n} be an orthonormal basis of eigenvectors of A^{T}A, with corresponding eigenvalues \lambda_{1},\lambda_{2},\ldots ,\lambda_{n}. By Lemma [lem:svdlemma2](1), \lambda_{i} is real for each i and \lambda_{i}\geq 0. By re-ordering the \mathbf{q}_{i} we may (and do) assume that
\label{svdeqni} \lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0 \quad \mbox{and} \quad \lambda_{i}=0 \mbox{ if } i>r \tag{\textbf{i}}
(Of course they could all be positive (r=n) or all zero, so that A^{T}A=0 and hence A=0 by Exercise ???.)
\label{svdeqnii} Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right] \mbox{ is orthogonal and orthogonally diagonalizes } A^{T}A \tag{\textbf{ii}}
- The vectors A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{n} are in \mathbb{R}^{m}, and for all i and j we have A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=(A\mathbf{q}_{i})^{T}(A\mathbf{q}_{j})=\mathbf{q}_{i}^{T}(A^{T}A\mathbf{q}_{j})=\mathbf{q}_{i}^{T}(\lambda_{j}\mathbf{q}_{j})=\lambda_{j}(\mathbf{q}_{i}\dotprod \mathbf{q}_{j}). Because \{\mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}\} is an orthonormal set, this gives
\label{svdeqniii} A\mathbf{q}_{i}\dotprod A\mathbf{q}_{j}=0 \mbox{ if } i\neq j \quad \mbox{and} \quad \| A\mathbf{q}_{i}\|^{2}=\lambda_{i}\| \mathbf{q}_{i}\|^{2}=\lambda_{i} \mbox{ for each } i \tag{\textbf{iii}}
We can extract two conclusions from ([svdeqniii]) and ([svdeqni]):
\label{svdeqniv} \{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq \func{im}A \mbox{ is an orthogonal set} \quad \mbox{and} \quad A\mathbf{q}_{i}=\mathbf{0} \mbox{ if } i>r \tag{\textbf{iv}}
With this write U= span \;\{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\}\subseteq \func{im}A; we claim that U=\func{im}A, that is \func{im}A\subseteq U. For this we must show that A\mathbf{x}\in U for each \mathbf{x}\in \mathbb{R}^{n}. Since \{\mathbf{q}_{1},\ldots ,\mathbf{q}_{r},\ldots ,\mathbf{q}_{n}\} is a basis of \mathbb{R}^{n} (it is orthonormal), we can write \mathbf{x}=t_{1}\mathbf{q}_{1}+\cdots +t_{r}\mathbf{q}_{r}+\cdots +t_{n}\mathbf{q}_{n} where each t_{j}\in \mathbb{R}. Then, using ([svdeqniv]) we obtain
A\mathbf{x}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}+\cdots +t_{n}A\mathbf{q}_{n}=t_{1}A\mathbf{q}_{1}+\cdots +t_{r}A\mathbf{q}_{r}\in U \nonumber
This shows that U=\func{im}A, and so
\label{svdeqnv} \{A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r}\} \mbox{ is an \emph{orthogonal} basis of } \func{im}A \tag{\textbf{v}}
But \func{col}A=\func{im}A by Lemma [lem:svdlemma1], and rank \;A= dim \;(\func{col}A) by Theorem [thm:015444], so
\label{svdeqnvi} rank \;A= dim \;(\func{col}A)= dim \;(\func{im}A)\overset{(\textbf{v})}{=}r \tag{\textbf{vi}}
- svddef1 The real numbers \sigma_{i}=\sqrt{\lambda_{i}}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\| for i=1,2,\ldots ,n, are called the singular values of the matrix A.
Clearly \sigma_{1},\sigma_{2},\ldots ,\sigma_{r} are the positive singular values of A. By ([svdeqni]) we have
\label{svdeqnvii} \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 \quad \mbox{and} \quad \sigma_{i}=0 \mbox{ if } i>r \tag{\textbf{vii}}
With ([svdeqnvi]) this makes the following definitions depend only upon A.
svddef2 Let A be a real m\times n matrix of rank r, with positive singular values \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 and \sigma_{i}=0 if i>r. Define:
D_{A}=\func{diag}(\sigma_{1},\ldots ,\sigma_{r})\quad \mbox{and} \quad \Sigma_{A}= \left[ \begin{array}{cc} D_{A} & 0 \\ 0 & 0 \end{array} \right]_{m\times n} \nonumber
Here \Sigma_{A} is in block form and is called the singular matrix of A.
The singular values \sigma_{i} and the matrices D_{A} and \Sigma_{A} will be referred to frequently below.
- Returning to our narrative, normalize the vectors A\mathbf{q}_{1},A\mathbf{q}_{2},\ldots ,A\mathbf{q}_{r} by defining
\label{svdeqnviii} \mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\|}A\mathbf{q}_{i}\in \mathbb{R}^{m}\quad \mbox{for each } i=1,2,\ldots ,r \tag{\textbf{viii}}
\label{svdeqnix} \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} \mbox{ is an \emph{orthonormal} basis of } \func{col}A\subseteq \mathbb{R}^{m} \tag{\textbf{ix}}
Employing the Gram-Schmidt algorithm (or otherwise), construct \mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m} so that
\label{svdeqnx} \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots ,\mathbf{p}_{m}\} \mbox{ is an orthonormal basis of } \mathbb{R}^{m} \tag{\textbf{x}}
- By ([svdeqnii]) and ([svdeqnx]) we have two orthogonal matrices Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right] and P=\left[ \begin{array}{ccccc}\mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \cdots & \mathbf{p}_{m}\end{array}\right] of sizes n\times n and m\times m respectively. These matrices are related. In fact we have:
\label{svdeqnxi} \sigma_{i}\mathbf{p}_{i}=\sqrt{\lambda_{i}}\mathbf{p}_{i}\overset{(\textbf{iii})}{=}\| A\mathbf{q}_{i}\| \mathbf{p}_{i}\overset{(\textbf{viii})}{=}A\mathbf{q}_{i}\quad \mbox{for each } i=1,2,\ldots ,r \tag{\textbf{xi}}
This yields the following expression for AQ in terms of its columns:
\label{svdeqnxii} AQ=\left[ \begin{array}{cccccc} A\mathbf{q}_{1} & \cdots & A\mathbf{q}_{r} & A\mathbf{q}_{r+1} & \cdots & A\mathbf{q}_{n} \end{array}\right]\overset{(\textbf{iv})}{=}\left[ \begin{array}{cccccc} \sigma_{1}\mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \tag{\textbf{xii}}
Then we compute:
P\Sigma_{A}=\left[ \begin{array}{cccccc} \mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \mathbf{p}_{r+1} & \cdots & \mathbf{p}_{m} \end{array}\right] \left[ \begin{array}{cccccc} \sigma_{1} & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & \sigma_{r} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{array} \right] =\left[ \begin{array}{cccccc} \sigma_{1}\mathbf{p}_{1} & \cdots & \sigma_{r}\mathbf{p}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right]\overset{(\textbf{xii})}{=}AQ \nonumber
Finally, as Q^{-1}=Q^{T} it follows that A=P\Sigma_{A}Q^{T}.
With this we can state the main theorem of this Section.
svdtheorem1 Let A be a real m\times n matrix, and let \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0 be the positive singular values of A. Then r is the rank of A and we have the factorization
A=P\Sigma_{A}Q^{T}\quad \mbox{where } P \mbox{ and } Q \mbox{ are orthogonal matrices} \nonumber
The factorization A=P\Sigma_{A}Q^{T} in Theorem [thm:svdtheorem1], where P and Q are orthogonal matrices, is called a Singular Value Decomposition (SVD) of A. This decomposition is not unique. For example if r<m then the vectors \mathbf{p}_{r+1},\ldots ,\mathbf{p}_{m} can be any extension of \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r}\} to an orthonormal basis of \mathbb{R}^{m}, and each will lead to a different matrix P in the decomposition. For a more dramatic example, if A=I_{n} then \Sigma_{A}=I_{n}, and A=P\Sigma_{A}P^{T} is an SVD of A for any orthogonal n\times n matrix P.
svdexample1 Find a singular value decomposition for A= \left[ \begin{array}{rrr} 1 & 0 & 1 \\ -1 & 1 & 0 \end{array} \right].
We have A^{T}A= \left[ \begin{array}{rrr} 2 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{array} \right], so the characteristic polynomial is
c_{A^{T}A}(x)=\det \left( xI-A^{T}A\right) =x^{3}-4x^{2}+3x=x(x-1)(x-3) \nonumber
Hence the eigenvalues of A^{T}A (in descending order) are \lambda_{1}=3, \lambda_{2}=1 and \lambda_{3}=0 with, respectively, unit eigenvectors
\mathbf{q}_{1}=\frac{1}{\sqrt{6}} \left[ \begin{array}{r} 2 \\ -1 \\ 1 \end{array} \right], \quad \mathbf{q}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 0 \\ 1 \\ 1 \end{array} \right],\quad \mbox{and} \quad \mathbf{q}_{3}=\frac{1}{\sqrt{3}} \left[ \begin{array}{r} -1 \\ -1 \\ 1 \end{array} \right] \nonumber
It follows that the orthogonal matrix Q in Theorem [thm:svdtheorem1] is
Q=\left[ \begin{array}{ccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \mathbf{q}_{3}\end{array}\right]=\frac{1}{\sqrt{6}} \left[ \begin{array}{rrr} 2 & 0 & -\sqrt{2} \\ -1 & \sqrt{3} & -\sqrt{2} \\ 1 & \sqrt{3} & \sqrt{2} \end{array} \right] \nonumber
The singular values here are \sigma_{1}=\sqrt{3}, \sigma_{2}=1 and \sigma_{3}=0, so rank \;(A)=2—clear in this case—and the singular matrix is
\Sigma_{A}= \left[ \begin{array}{ccc} \sigma_{1} & 0 & 0 \\ 0 & \sigma_{2} & 0 \end{array} \right] =\left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \nonumber
So it remains to find the 2\times 2 orthogonal matrix P in Theorem [thm:svdtheorem1]. This involves the vectors
A\mathbf{q}_{1}=\frac{\sqrt{6}}{2} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right],\quad A\mathbf{q}_{2}=\frac{\sqrt{2}}{2} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] ,\quad \mbox{and} \quad A\mathbf{q}_{3}= \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] \nonumber
Normalize A\mathbf{q}_{1} and A\mathbf{q}_{2} to get
\mathbf{p}_{1}=\frac{1}{\sqrt{2}} \left[ \begin{array}{r} 1 \\ -1 \end{array} \right] \quad \mbox{and} \quad \mathbf{p}_{2}=\frac{1}{\sqrt{2}} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \nonumber
In this case, \{\mathbf{p}_{1},\mathbf{p}_{2}\} is already a basis of \mathbb{R}^{2} (so the Gram-Schmidt algorithm is not needed), and we have the 2\times 2 orthogonal matrix
P=\left[ \begin{array}{cc} \mathbf{p}_{1} & \mathbf{p}_{2} \end{array}\right]=\frac{1}{\sqrt{2}} \left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \nonumber
Finally (by Theorem [thm:svdtheorem1]) the singular value decomposition for A is
A=P\Sigma_{A}Q^{T}= \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1 & 1 \\ -1 & 1 \end{array} \right] \left[ \begin{array}{ccc} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{array} \right] \frac{1}{\sqrt{6}}\left[ \begin{array}{rrr} 2 & -1 & 1 \\ 0 & \sqrt{3} & \sqrt{3} \\ -\sqrt{2} & -\sqrt{2} & \sqrt{2} \end{array} \right] \nonumber
Of course this can be confirmed by direct matrix multiplication.
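Indeed, the factorization is easy to confirm numerically. The NumPy sketch below is illustrative only: it rebuilds P, \Sigma_{A} and Q from the example and checks that P\Sigma_{A}Q^{T}=A, that Q is orthogonal, and that a library routine reports the same singular values.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])

P = (1 / np.sqrt(2)) * np.array([[1.0, 1.0],
                                 [-1.0, 1.0]])
Sigma_A = np.array([[np.sqrt(3), 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
Q = (1 / np.sqrt(6)) * np.array([[2.0, 0.0, -np.sqrt(2)],
                                 [-1.0, np.sqrt(3), -np.sqrt(2)],
                                 [1.0, np.sqrt(3), np.sqrt(2)]])

print(np.allclose(P @ Sigma_A @ Q.T, A))                 # True: A = P Sigma_A Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))                   # Q is orthogonal
print(np.round(np.linalg.svd(A, compute_uv=False), 6))   # [1.732051, 1.0] = [sqrt(3), 1]
```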
Thus, computing an SVD for a real matrix A is a routine matter, and we now describe a systematic procedure for doing so.
SVD Algorithm svdalgorithm Given a real m\times n matrix A, find an SVD A=P\Sigma_{A}Q^{T} as follows:
- Use the Diagonalization Algorithm (see page ) to find the (real and non-negative) eigenvalues \lambda_{1},\lambda_{2},\ldots,\lambda_{n} of A^{T}A with corresponding (orthonormal) eigenvectors \mathbf{q}_{1},\mathbf{q}_{2},\ldots ,\mathbf{q}_{n}. Reorder the \mathbf{q}_{i} (if necessary) to ensure that the nonzero eigenvalues are \lambda_{1}\geq \lambda_{2}\geq \cdots \geq \lambda_{r}>0 and \lambda_{i}=0 if i>r.
- The integer r is the rank of the matrix A.
- The n\times n orthogonal matrix Q in the SVD is Q=\left[ \begin{array}{cccc}\mathbf{q}_{1} & \mathbf{q}_{2} & \cdots & \mathbf{q}_{n}\end{array}\right].
- Define \mathbf{p}_{i}=\frac{1}{\| A\mathbf{q}_{i}\|}A\mathbf{q}_{i} for i=1,2,\ldots ,r (where r is as in step 1). Then \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} is orthonormal in \mathbb{R}^{m} so (using Gram-Schmidt or otherwise) extend it to an orthonormal basis \{\mathbf{p}_{1},\ldots ,\mathbf{p}_{r},\ldots , \mathbf{p}_{m}\} in \mathbb{R}^{m}.
- The m\times m orthogonal matrix P in the SVD is P=\left[ \begin{array}{ccccc} \mathbf{p}_{1} & \cdots & \mathbf{p}_{r} & \cdots & \mathbf{p}_{m}\end{array}\right].
- The singular values for A are \sigma_{1},\sigma_{2},\ldots,\sigma_{n} where \sigma_{i}=\sqrt{\lambda_{i}} for each i. Hence the nonzero singular values are \sigma_{1}\geq \sigma_{2}\geq \cdots \geq \sigma_{r}>0, and so the singular matrix of A in the SVD is \Sigma_{A}= \left[ \begin{array}{cc} \func{diag}(\sigma_{1},\ldots ,\sigma_{r}) & 0 \\ 0 & 0 \end{array} \right]_{m\times n}.
- Thus A=P\Sigma_{A}Q^{T} is an SVD for A (a computational sketch of these steps follows this list).
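The steps above translate almost directly into code. The NumPy sketch below is a minimal illustration of the SVD Algorithm (the function name and the tolerance are our choices, not part of the text): it diagonalizes A^{T}A, orders the eigenpairs, builds Q, \Sigma_{A} and P, and uses a QR factorization in place of the Gram-Schmidt extension in Step 4.

```python
import numpy as np

def svd_by_algorithm(A, tol=1e-12):
    """Construct an SVD A = P @ Sigma @ Q.T following the textbook steps."""
    m, n = A.shape
    # Step 1: orthonormal eigenvectors of A^T A, eigenvalues reordered to be decreasing.
    lam, Qmat = np.linalg.eigh(A.T @ A)           # eigh returns ascending order
    order = np.argsort(lam)[::-1]
    lam, Qmat = lam[order], Qmat[:, order]
    lam[lam < tol] = 0.0                          # clip roundoff / tiny negatives
    r = int(np.sum(lam > 0))                      # Step 2: r = rank A
    sigma = np.sqrt(lam)                          # Step 6: singular values
    Sigma = np.zeros((m, n))
    Sigma[:r, :r] = np.diag(sigma[:r])
    # Step 4: p_i = A q_i / ||A q_i|| for i = 1..r, then extend to an orthonormal basis of R^m.
    P_cols = A @ Qmat[:, :r] / sigma[:r]
    P, _ = np.linalg.qr(np.hstack([P_cols, np.eye(m)]))   # QR used in place of Gram-Schmidt
    P[:, :r] = P_cols                             # keep the prescribed first r columns
    return P, Sigma, Qmat

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
P, Sigma, Q = svd_by_algorithm(A)
print(np.allclose(P @ Sigma @ Q.T, A))   # True
```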
In practice the singular values \sigma_{i}, the matrices P and Q, and even the rank of an m\times n matrix are not calculated this way. There are sophisticated numerical algorithms for calculating them to a high degree of accuracy. The reader is referred to books on numerical linear algebra.
So the main virtue of Theorem [thm:svdtheorem1] is that it provides a way of constructing an SVD for every real matrix A. In particular it shows that every real matrix A has a singular value decomposition1 in the following, more general, sense:
svddef3 A Singular Value Decomposition (SVD) of an m\times n matrix A of rank r is a factorization A=U\Sigma V^{T} where U and V are orthogonal and \Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n} in block form, where D=\func{diag}(d_{1},d_{2},\ldots ,d_{r}) with each d_{i}>0, and r\leq m and r\leq n.
Note that for any SVD A=U\Sigma V^{T} we immediately obtain some information about A:
svdlemma3 If A=U\Sigma V^{T} is any SVD for A as in Definition [def:svddef3], then:
- r=rank \;A.
- The numbers d_{1},d_{2},\dots ,d_{r} are the singular values of A in some order.
Use the notation of Definition [def:svddef3]. We have
A^{T}A=(V\Sigma^{T}U^{T})(U\Sigma V^{T})=V(\Sigma^{T}\Sigma)V^{T} \nonumber
so \Sigma^{T}\Sigma and A^{T}A are similar n\times n matrices (Definition [def:015930]). Hence r=rank \; A by Corollary [cor:015519], proving (1.). Furthermore, \Sigma^{T}\Sigma and A^{T}A have the same eigenvalues by Theorem [thm:016008]; that is (using (1.)):
\{d_{1}^{2},d_{2}^{2},\dots ,d_{r}^{2}\}=\{\lambda_{1},\lambda_{2},\dots ,\lambda_{r}\} \quad \mbox{are equal as sets} \nonumber
where \lambda_{1},\lambda_{2},\dots ,\lambda_{r} are the positive eigenvalues of A^{T}A. Hence there is a permutation \tau of \{1,2,\cdots ,r\} such that d_{i}^{2}=\lambda_{i\tau } for each i=1,2,\dots ,r. Hence d_{i}=\sqrt{\lambda_{i\tau }}=\sigma_{i\tau } for each i by Definition [def:svddef1]. This proves (2.).
We note in passing that more is true. Let A be m\times n of rank r, and let A=U\Sigma V^{T} be any SVD for A. Using the proof of Lemma [lem:svdlemma3] we have d_{i}=\sigma_{i\tau } for some permutation \tau of \{1,2,\dots ,r\}. In fact, it can be shown that there exist orthogonal matrices U_{1} and V_{1} obtained from U and V by \tau-permuting columns and rows respectively, such that A=U_{1}\Sigma_{A}V_{1}^{T} is an SVD of A.
Fundamental Subspaces
It turns out that any singular value decomposition contains a great deal of information about an m\times n matrix A and the subspaces associated with A. For example, in addition to Lemma [lem:svdlemma3], the set \{\mathbf{p}_{1},\mathbf{p}_{2},\ldots ,\mathbf{p}_{r}\} of vectors constructed in the proof of Theorem [thm:svdtheorem1] is an orthonormal basis of \func{col} A (by ([svdeqnv]) and ([svdeqnviii]) in the proof). There are more such examples, which is the thrust of this subsection. In particular, there are four subspaces associated to a real m\times n matrix A that have come to be called fundamental:
svddef4 The fundamental subspaces of an m\times n matrix A are:
- \func{row}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a row of } A\}
- \func{col}A=span \;\{\mathbf{x}\mid \mathbf{x} \mbox{ is a column of } A\}
- \func{null}A=\{\mathbf{x}\in \mathbb{R}^{n}\mid A\mathbf{x}=\mathbf{0}\}
- \func{null}A^{T}=\{\mathbf{x}\in \mathbb{R}^{m}\mid A^{T}\mathbf{x}=\mathbf{0}\}
If A=U\Sigma V^{T} is any SVD for the real m\times n matrix A, the columns of U and V provide orthonormal bases for each of these fundamental subspaces. We are going to prove this, but first we need three properties related to the orthogonal complement U^{\perp} of a subspace U of \mathbb{R}^{n}, where (Definition [def:023776]):
U^{\perp}=\{\mathbf{x}\in \mathbb{R}^{n}\mid \mathbf{u}\bullet \mathbf{x}=0 \mbox{ for all } \mathbf{u}\in U\} \nonumber
The orthogonal complement plays an important role in the Projection Theorem (Theorem [thm:023885]), and we return to it in Section [sec:10_2]. For now we need:
svdlemma4 If A is any matrix then:
- (\func{row}A)^{\perp}=\func{null}A and (\func{col}A)^{\perp}=\func{null}A^{T}.
- If U is any subspace of \mathbb{R}^{n} then U^{\perp\perp}=U.
- Let \{\mathbf{f}_{1},\ldots ,\mathbf{f}_{m}\} be an orthonormal basis of \mathbb{R}^{m}. If U=span \;\{\mathbf{f}_{1},\ldots ,\mathbf{f}_{k}\}, then
U^{\perp}=span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\} \nonumber
- Assume A is m\times n and let \mathbf{x}\in \mathbb{R}^{n}. Then \mathbf{x}\in \func{null}A if and only if A\mathbf{x}=\mathbf{0}, that is if and only if \mathbf{b}\dotprod \mathbf{x}=0 for every row \mathbf{b} of A, that is if and only if \mathbf{x}\in (\func{row}A)^{\perp}. Hence \func{null}A=(\func{row}A)^{\perp}. Now replace A by A^{T} to get \func{null}A^{T}=(\func{row}A^{T})^{\perp}=(\func{col}A)^{\perp}, which is the other identity in (1).
- If \mathbf{x}\in U then \mathbf{y}\dotprod \mathbf{x}=0 for all \mathbf{y}\in U^{\perp}, that is \mathbf{x}\in U^{\perp\perp}. This proves that U\subseteq U^{\perp\perp}, so it is enough to show that dim \;U=dim \;U^{\perp\perp}. By Theorem [thm:023953] we see that dim \;V^{\perp}=n-dim \;V for any subspace V\subseteq \mathbb{R}^{n}. Hence
dim \;U^{\perp\perp}=n-dim \;U^{\perp}=n-(n-dim \;U)=dim \;U, \mbox{ as required} \nonumber
- Since \mathbf{f}_{j}\dotprod \mathbf{f}_{i}=0 whenever j>k and i\leq k, each \mathbf{f}_{j} with j>k lies in U^{\perp}, so span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\}\subseteq U^{\perp}. Conversely, let \mathbf{x}\in U^{\perp}, so that \mathbf{f}_{i}\dotprod \mathbf{x}=0 for i=1,2,\ldots ,k. By the expansion theorem,
\begin{array}{ccccccccccccc} \mathbf{x} & = & (\mathbf{f}_{1}\dotprod \mathbf{x})\mathbf{f}_{1} & + & \cdots & + & (\mathbf{f}_{k}\dotprod \mathbf{x})\mathbf{f}_{k} & + &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1} & + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \\ & = & \mathbf{0} &+ &\cdots &+ &\mathbf{0} &+ &(\mathbf{f}_{k+1}\dotprod \mathbf{x})\mathbf{f}_{k+1}& + &\cdots &+&(\mathbf{f}_{m}\dotprod \mathbf{x})\mathbf{f}_{m} \end{array} \nonumber
Hence \mathbf{x}\in span \;\{\mathbf{f}_{k+1},\ldots ,\mathbf{f}_{m}\}, proving (3).
With this we can see how any SVD for a matrix A provides orthonormal bases for each of the four fundamental subspaces of A.
svdtheorem2 Let A be an m\times n real matrix, let A=U\Sigma V^{T} be any SVD for A where U and V are orthogonal of size m\times m and n\times n respectively, and let
\Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n}\qquad \mbox{where} \qquad D=\func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}), \mbox{ with each } \lambda_{i}>0 \nonumber
Write U=\left[ \begin{array}{ccccc}\mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m}\end{array}\right] and V=\left[ \begin{array}{ccccc}\mathbf{v}_{1} & \cdots & \mathbf{v}_{r} & \cdots & \mathbf{v}_{n}\end{array}\right], so \{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r},\ldots ,\mathbf{u}_{m}\} and \{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r},\ldots ,\mathbf{v}_{n}\} are orthonormal bases of \mathbb{R}^{m} and \mathbb{R}^{n} respectively. Then
- r=rank \;A, and the singular values of A are \lambda_{1},\lambda_{2},\ldots ,\lambda_{r}.
- The fundamental spaces are described as follows:
- \{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\} is an orthonormal basis of \func{col}A.
- \{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\} is an orthonormal basis of \func{null}A^{T}.
- \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is an orthonormal basis of \func{null}A.
- \{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\} is an orthonormal basis of \func{row}A.
- This is Lemma [lem:svdlemma3].
- As \func{col}A=\func{col}(AV) by Lemma [lem:015527] and AV=U\Sigma, (a.) follows from
U\Sigma =\left[ \begin{array}{ccccc} \mathbf{u}_{1} & \cdots & \mathbf{u}_{r} & \cdots & \mathbf{u}_{m} \end{array}\right] \left[ \begin{array}{cc} \func{diag}(\lambda_{1},\lambda_{2},\ldots ,\lambda_{r}) & 0 \\ 0 & 0 \end{array} \right] = \left[ \begin{array}{cccccc} \lambda_{1}\mathbf{u}_{1} & \cdots & \lambda_{r}\mathbf{u}_{r} & \mathbf{0} & \cdots & \mathbf{0} \end{array}\right] \nonumber
- We have (\func{col}A)^{\perp}\overset{\text{(a.)}}{=}(span \;\{\mathbf{u}_{1},\ldots ,\mathbf{u}_{r}\})^{\perp}=span \;\{\mathbf{u}_{r+1},\ldots ,\mathbf{u}_{m}\} by Lemma [lem:svdlemma4](3). This proves (b.) because (\func{col}A)^{\perp}=\func{null}A^{T} by Lemma [lem:svdlemma4](1).
- The set \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is orthonormal (hence independent), and dim \;(\func{null}A)=n- rank \;A=n-r. So to prove (c.) it is enough to show that \mathbf{v}_{j}\in \func{null}A whenever j>r. To this end write
\lambda_{r+1}=\cdots =\lambda_{n}=0,\quad \mbox{so} \quad \Sigma^{T}\Sigma =\func{diag}(\lambda_{1}^{2},\ldots ,\lambda_{r}^{2},\lambda_{r+1}^{2},\ldots,\lambda_{n}^{2}) \nonumber
Observe that each \lambda_{j}^{2} is an eigenvalue of \Sigma^{T}\Sigma with eigenvector \mathbf{e}_{j}= column j of I_{n}. Thus \mathbf{v}_{j}=V\mathbf{e}_{j} for each j. As A^{T}A=V\Sigma^{T}\Sigma V^{T} (proof of Lemma [lem:svdlemma3]), we obtain
(A^{T}A)\mathbf{v}_{j}=(V\Sigma^{T}\Sigma V^{T})(V\mathbf{e}_{j})=V(\Sigma^{T}\Sigma \mathbf{e}_{j})=V\left( \lambda_{j}^{2}\mathbf{e}_{j}\right) =\lambda_{j}^{2}V\mathbf{e}_{j}=\lambda_{j}^{2}\mathbf{v}_{j} \nonumber
for 1\leq j\leq n. Thus each \mathbf{v}_{j} is an eigenvector of A^{T}A corresponding to \lambda_{j}^{2}. But then
\| A\mathbf{v}_{j}\|^{2}=(A\mathbf{v}_{j})^{T}A\mathbf{v}_{j}=\mathbf{v}_{j}^{T}(A^{T}A\mathbf{v}_{j})=\mathbf{v}_{j}^{T}(\lambda _{j}^{2}\mathbf{v}_{j})=\lambda_{j}^{2}\| \mathbf{v}_{j}\|^{2}=\lambda_{j}^{2}\quad \mbox{for } j=1,\ldots ,n \nonumber
In particular, A\mathbf{v}_{j}=\mathbf{0} whenever j>r, so \mathbf{v}_{j}\in \func{null}A if j>r, as desired. This proves (c).
- Finally, (\func{row}A)^{\perp}=\func{null}A by Lemma [lem:svdlemma4](1), so \func{row}A=(\func{null}A)^{\perp} by Lemma [lem:svdlemma4](2). Since \func{null}A= span \;\{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} by (c.), Lemma [lem:svdlemma4](3) gives \func{row}A= span \;\{\mathbf{v}_{1},\ldots ,\mathbf{v}_{r}\}. This proves (d.), and hence Theorem [thm:svdtheorem2].
svdexample2 Consider the homogeneous linear system
A\mathbf{x}=\mathbf{0} \mbox{ of } m \mbox{ equations in } n \mbox{ variables} \nonumber
Then the set of all solutions is \func{null} A. Hence if A=U\Sigma V^{T} is any SVD for A then (in the notation of Theorem [thm:svdtheorem2]) \{\mathbf{v}_{r+1},\ldots ,\mathbf{v}_{n}\} is an orthonormal basis of the set of solutions for the system. As such they are a set of basic solutions for the system, the most basic notion in Chapter [chap:1].
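In computations this observation is used directly: any library SVD yields orthonormal bases of all four fundamental subspaces at once. The NumPy sketch below (illustrative only; the rank tolerance is our choice) splits the columns of U and V as in Theorem [thm:svdtheorem2], using the matrix of Example [exa:svdexample1].

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])

U, s, Vt = np.linalg.svd(A)        # A = U @ diag(s) @ Vt, with full U (2x2) and Vt (3x3)
r = int(np.sum(s > 1e-12))         # numerical rank

col_A   = U[:, :r]     # orthonormal basis of col A
null_At = U[:, r:]     # orthonormal basis of null A^T
row_A   = Vt[:r, :].T  # orthonormal basis of row A
null_A  = Vt[r:, :].T  # orthonormal basis of null A (basic solutions of Ax = 0)

print(r)                               # 2
print(np.allclose(A @ null_A, 0))      # columns of null_A solve Ax = 0
print(np.allclose(A.T @ null_At, 0))   # columns of null_At solve A^T x = 0
```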
The Polar Decomposition of a Real Square Matrix
If A is real and n\times n, the factorization in the title is called the polar decomposition of A. Unlike the SVD, in this case the decomposition is uniquely determined by A.
Recall (Section [sec:8_3]) that a symmetric matrix A is called positive definite if and only if \mathbf{x}^{T}A\mathbf{x}>0 for every column \mathbf{x} \neq \mathbf{0} \in \mathbb{R}^{n}. Before proceeding, we must explore the following weaker notion:
svddef5 A real n\times n matrix G is called positive if it is symmetric and
\mathbf{x}^{T}G\mathbf{x}\geq 0 \quad \mbox{for all } \mathbf{x}\in \mathbb{R}^{n} \nonumber
Clearly every positive definite matrix is positive, but the converse fails. Indeed, A= \left[ \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right] is positive because, if \mathbf{x}=\left[ \begin{array}{cc} a & b \end{array}\right]^{T} in \mathbb{R}^{2}, then \mathbf{x}^{T}A\mathbf{x}=(a+b)^{2}\geq 0. But \mathbf{y}^{T}A\mathbf{y}=0 if \mathbf{y}=\left[ \begin{array}{cc} 1 & -1 \end{array}\right]^{T}, so A is not positive definite.
svdlemma5 Let G denote an n\times n positive matrix.
- If A is any n\times m matrix and G is positive, then A^{T}GA is positive (and m\times m).
- If G=\func{diag}(d_{1}, d_{2},\cdots ,d_{n}) and each d_{i}\geq 0 then G is positive.
- The matrix A^{T}GA is symmetric because G is, and \mathbf{x}^{T}(A^{T}GA)\mathbf{x}=(A\mathbf{x})^{T}G(A\mathbf{x})\geq 0 because G is positive.
- If \mathbf{x}=\left[ \begin{array}{cccc}x_{1}& x_{2} & \cdots & x_{n}\end{array}\right]^{T}, then
\mathbf{x}^{T}G\mathbf{x}=d_{1}x_{1}^{2}+d_{2}x_{2}^{2}+\cdots +d_{n}x_{n}^{2}\geq 0 \nonumber
svddef6 If A is a real n\times n matrix, a factorization
A=GQ \mbox{ where } G \mbox{ is positive and } Q \mbox{ is orthogonal} \nonumber
is called a polar decomposition for A.
Any SVD for a real square matrix A yields a polar form for A.
svdtheorem3 Every square real matrix has a polar form.
Let A=U\Sigma V^{T} be a SVD for A with \Sigma as in Definition [def:svddef3] and m=n. Since U^{T}U=I_{n} here we have
A=U\Sigma V^{T}=(U\Sigma )(U^{T}U)V^{T}=(U\Sigma U^{T})(UV^{T}) \nonumber
So if we write G=U\Sigma U^{T} and Q=UV^{T}, then Q is orthogonal, and it remains to show that G is positive. But this follows from Lemma [lem:svdlemma5].
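The proof is constructive, and the construction is easy to carry out numerically. The NumPy sketch below is an illustration only (the matrix is an arbitrary choice): it builds G=U\Sigma U^{T} and Q=UV^{T} from a library SVD and checks that A=GQ with G positive and Q orthogonal.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0, 3.0]])          # any real square matrix

U, s, Vt = np.linalg.svd(A)         # A = U @ diag(s) @ Vt
G = U @ np.diag(s) @ U.T            # positive: symmetric with eigenvalues s >= 0
Q = U @ Vt                          # orthogonal

print(np.allclose(A, G @ Q))                     # True: A = GQ
print(np.allclose(G, G.T))                       # G is symmetric
print(np.all(np.linalg.eigvalsh(G) >= -1e-12))   # eigenvalues of G are non-negative
print(np.allclose(Q.T @ Q, np.eye(2)))           # Q is orthogonal
```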
The SVD for a square matrix A is not unique (I_{n}=PI_{n}P^{T} for any orthogonal matrix P). But given the proof of Theorem [thm:svdtheorem3] it is surprising that the polar decomposition is unique.2 We omit the proof.
The name “polar form” is reminiscent of the same form for complex numbers (see Appendix [chap:appacomplexnumbers]). This is no coincidence. To see why, we represent the complex numbers as real 2\times 2 matrices. Write \mathbf{M}_{2}(\mathbb{R}) for the set of all real 2\times 2 matrices, and define
\sigma :\mathbb{C}\rightarrow \mathbf{M}_{2}(\mathbb{R})\quad \mbox{by} \quad \sigma (a+bi)= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \mbox{ for all } a+bi \mbox{ in } \mathbb{C} \nonumber
One verifies that \sigma preserves addition and multiplication in the sense that
\sigma (zw)=\sigma (z)\sigma (w)\qquad \mbox{and}\qquad \sigma (z+w)=\sigma (z)+\sigma (w) \nonumber
for all complex numbers z and w. Since \sigma is one-to-one we may identify each complex number a+bi with the matrix \sigma (a+bi), that is we write
a+bi= \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] \quad \mbox{for all } a+bi \mbox{ in } \mathbb{C} \nonumber
Thus 0= \left[ \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right], 1= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] =I_{2}, i= \left[ \begin{array}{rr} 0 & -1 \\ 1 & 0 \end{array} \right], and r = \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] if r is real.
If z=a+bi is nonzero then the absolute value r=|z| =\sqrt{a^{2}+b^{2}}\neq 0. If \theta is the angle of z in standard position, then \cos\theta =a/r and \sin\theta =b/r. Observe:
\label{svdeqnxiii} \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} a/r & -b/r \\ b/r & a/r \end{array} \right] =\left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right] = GQ \tag{\textbf{xiii}}
where G= \left[ \begin{array}{cc} r & 0 \\ 0 & r \end{array} \right] is positive and Q= \left[ \begin{array}{rr} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{array} \right] is orthogonal. But in \mathbb{C} we have G=r and Q=\cos\theta +i\sin\theta so ([svdeqnxiii]) reads z=r(\cos\theta +i\sin\theta )=re^{i\theta } which is the classical polar form for the complex number a+bi. This is why ([svdeqnxiii]) is called the polar form of the matrix \left[ \begin{array}{rr} a & -b \\ b & a \end{array} \right]; Definition [def:svddef6] simply adopts the terminology for n\times n matrices.
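The identification of a+bi with the matrix \sigma(a+bi) is also easy to test numerically. The short NumPy sketch below is illustrative only (the helper function is ours): it checks that the matrix representation preserves multiplication and that the factorization ([svdeqnxiii]) agrees with the classical polar form.

```python
import numpy as np

def as_matrix(z):
    """Represent the complex number z = a + bi as the 2x2 real matrix [[a, -b], [b, a]]."""
    return np.array([[z.real, -z.imag],
                     [z.imag,  z.real]])

z, w = 3 - 4j, 1 + 2j
print(np.allclose(as_matrix(z) @ as_matrix(w), as_matrix(z * w)))  # multiplication is preserved

# Polar form of the matrix of z: G = rI (positive) times Q = rotation by theta (orthogonal).
r, theta = abs(z), np.angle(z)
G = r * np.eye(2)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(G @ Q, as_matrix(z)))   # matches z = r(cos theta + i sin theta)
```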
The Pseudoinverse of a Matrix
It is impossible for a non-square matrix A to have an inverse (see the footnote to Definition [def:004202]). Nonetheless, one candidate for an “inverse” of an m\times n matrix A is an n\times m matrix B such that
ABA=A \qquad \mbox{and} \qquad BAB=B \nonumber
Such a matrix B is called a middle inverse for A. If A is invertible then A^{-1} is the unique middle inverse for A, but a middle inverse is not unique in general, even for square matrices. For example, if A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right] then B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right] is a middle inverse for A for any b.
If ABA=A and BAB=B it is easy to see that AB and BA are both idempotent matrices. In 1955 Roger Penrose observed that the middle inverse is unique if both AB and BA are symmetric. We omit the proof.
Penrose's Theorem penrosetheorem
Given any real m\times n matrix A, there is exactly one n\times m matrix B such that A and B satisfy the following conditions:
- P1 ABA=A and BAB=B.
- P2 Both AB and BA are symmetric.
svddef7 Let A be a real m\times n matrix. The pseudoinverse of A is the unique n\times m matrix A^{+} such that A and A^{+} satisfy P1 and P2, that is:
AA^{+}A=A,\qquad A^{+}AA^{+}=A^{+},\qquad \mbox{and both } AA^{+} \mbox{ and } A^{+}A \mbox{ are symmetric} \nonumber
If A is invertible then A^{+}=A^{-1} as expected. In general, the symmetry in conditions P1 and P2 shows that A is the pseudoinverse of A^{+}, that is A^{++}=A.
svdtheorem4 Let A be an m\times n matrix.
- If rank \;A=m then AA^{T} is invertible and A^{+}=A^{T}(AA^{T})^{-1}.
- If rank \;A=n then A^{T}A is invertible and A^{+}=(A^{T}A)^{-1}A^{T}.
Here AA^{T} (respectively A^{T}A) is invertible by Theorem [thm:015711] (respectively Theorem [thm:015672]). The rest is a routine verification.
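Both formulas in Theorem [thm:svdtheorem4] can be checked against a library pseudoinverse routine. The NumPy sketch below is illustrative only; the matrix is an arbitrary full-column-rank choice, and its transpose supplies the full-row-rank case.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])          # 3x2 with rank 2 = n, so A^T A is invertible

A_plus = np.linalg.inv(A.T @ A) @ A.T            # Theorem: A^+ = (A^T A)^{-1} A^T
print(np.allclose(A_plus, np.linalg.pinv(A)))    # agrees with the Moore-Penrose pseudoinverse

B = A.T                               # 2x3 with rank 2 = m, so B B^T is invertible
B_plus = B.T @ np.linalg.inv(B @ B.T)            # Theorem: B^+ = B^T (B B^T)^{-1}
print(np.allclose(B_plus, np.linalg.pinv(B)))    # True
```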
In general, given an m\times n matrix A, the pseudoinverse A^{+} can be computed from any SVD for A. To see how, we need some notation. Let A=U\Sigma V^{T} be an SVD for A (as in Definition [def:svddef3]) where U and V are orthogonal and \Sigma = \left[ \begin{array}{cc} D & 0 \\ 0 & 0 \end{array} \right]_{m\times n} in block form, where D=\func{diag}(d_{1},d_{2},\ldots ,d_{r}) with each d_{i}>0. Hence D is invertible, so we define:
svddef8 \Sigma^{\prime }= \left[ \begin{array}{cc} D^{-1} & 0 \\ 0 & 0 \end{array} \right]_{n\times m}.
A routine calculation gives:
svdlemma6
- \Sigma \Sigma^{\prime }\Sigma =\Sigma
- \Sigma^{\prime }\Sigma \Sigma^{\prime }=\Sigma^{\prime }
- \Sigma \Sigma^{\prime }= \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{m\times m}
- \Sigma^{\prime }\Sigma = \left[ \begin{array}{cc} I_{r} & 0 \\ 0 & 0 \end{array} \right]_{n\times n}
That is, \Sigma^{\prime } is the pseudoinverse of \Sigma.
Now given A=U\Sigma V^{T}, define B=V\Sigma^{\prime }U^{T}. Then
ABA=(U\Sigma V^{T})(V\Sigma^{\prime }U^{T})(U\Sigma V^{T})=U(\Sigma \Sigma^{\prime }\Sigma )V^{T}=U\Sigma V^{T}=A \nonumber
by Lemma [lem:svdlemma6]. Similarly BAB=B. Moreover AB=U(\Sigma \Sigma^{\prime })U^{T} and BA=V(\Sigma^{\prime }\Sigma )V^{T} are both symmetric again by Lemma [lem:svdlemma6]. This proves
svdtheorem5 Let A be real and m\times n, and let A=U\Sigma V^{T} be any SVD for A as in Definition [def:svddef3]. Then A^{+}=V\Sigma^{\prime }U^{T}.
Of course we can always use the SVD constructed in Theorem [thm:svdtheorem1] to find the pseudoinverse. If A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right], we observed above that B= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ b & 0 & 0 \end{array} \right] is a middle inverse for A for any b. Furthermore AB is symmetric but BA is not when b\neq 0, so B\neq A^{+} in that case.
svdexample3 Find A^{+} if A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right].
A^{T}A= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \end{array} \right] with eigenvalues \lambda_{1}=1 and \lambda_{2}=0 and corresponding eigenvectors \mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] and \mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 1 \end{array} \right]. Hence Q=\left[ \begin{array}{cc}\mathbf{q}_{1}&\mathbf{q}_{2}\end{array}\right]=I_{2}. Also A has rank 1 with singular values \sigma_{1}=1 and \sigma_{2}=0, so \Sigma_{A}= \left[ \begin{array}{cc} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{array} \right]=A and \Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]=A^{T} in this case.
Since A\mathbf{q}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] and A\mathbf{q}_{2}= \left[ \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right], we have \mathbf{p}_{1}= \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right] which extends to an orthonormal basis \{\mathbf{p}_{1},\mathbf{p}_{2},\mathbf{p}_{3}\} of \mathbb{R}^{3} where (say) \mathbf{p}_{2}= \left[ \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right] and \mathbf{p}_{3}= \left[ \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right]. Hence P=\left[ \begin{array}{ccc} \mathbf{p}_{1}&\mathbf{p}_{2}&\mathbf{p}_{3}\end{array}\right]=I, so the SVD for A is A=P\Sigma _{A}Q^{T}. Finally, the pseudoinverse of A is A^{+}=Q\Sigma^{\prime}_{A}P^{T}=\Sigma^{\prime}_{A}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]. Note that A^{+}=A^{T} in this case.
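Theorem [thm:svdtheorem5] also gives a practical recipe for A^{+}: invert the positive singular values inside \Sigma^{\prime} and multiply. The NumPy sketch below (illustrative only) carries this out for the matrix of Example [exa:svdexample3] and compares the result with NumPy's built-in pseudoinverse.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
m, n = A.shape

U, s, Vt = np.linalg.svd(A)            # A = U @ Sigma @ Vt with full U (3x3), Vt (2x2)
r = int(np.sum(s > 1e-12))             # rank of A

Sigma_prime = np.zeros((n, m))         # the n x m matrix Sigma' of Definition svddef8
Sigma_prime[:r, :r] = np.diag(1.0 / s[:r])

A_plus = Vt.T @ Sigma_prime @ U.T      # Theorem: A^+ = V Sigma' U^T
print(A_plus)                          # [[1. 0. 0.] [0. 0. 0.]]
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```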
The following Lemma collects some properties of the pseudoinverse that mimic those of the inverse. The verifications are left as exercises.
svdlemma7 Let A be an m\times n matrix of rank r.
- A^{++}=A.
- If A is invertible then A^{+}=A^{-1}.
- (A^{T})^{+}=(A^{+})^{T}.
- (kA)^{+}=\frac{1}{k}A^{+} for any real k\neq 0.
- (UAV)^{+}=V^{T}(A^{+})U^{T} whenever U and V are orthogonal.
- In fact every complex matrix has an SVD. [J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997]
- See J.T. Scheick, Linear Algebra with Applications, McGraw-Hill, 1997, page 379.