8.7: Complex Matrices
If \(A\) is an \(n \times n\) matrix, the characteristic polynomial \(c_{A}(x)\) is a polynomial of degree \(n\) and the eigenvalues of \(A\) are just the roots of \(c_{A}(x)\). In most of our examples these roots have been real numbers (in fact, the examples have been carefully chosen so this will be the case!); but it need not happen, even when the characteristic polynomial has real coefficients. For example, if \(A = \left[ \begin{array}{rr} 0 & 1 \\ -1 & 0 \end{array}\right]\) then \(c_{A}(x) = x^{2} + 1\) has roots \(i\) and \(-i\), where \(i\) is a complex number satisfying \(i^{2} = -1\). Therefore, we have to deal with the possibility that the eigenvalues of a (real) square matrix might be complex numbers.
In fact, nearly everything in this book would remain true if the phrase real number were replaced by complex number wherever it occurs. Then we would deal with matrices with complex entries, systems of linear equations with complex coefficients (and complex solutions), determinants of complex matrices, and vector spaces with scalar multiplication by any complex number allowed. Moreover, the proofs of most theorems about (the real version of) these concepts extend easily to the complex case. It is not our intention here to give a full treatment of complex linear algebra. However, we will carry the theory far enough to give another proof that the eigenvalues of a real symmetric matrix \(A\) are real (Theorem [thm:016397]) and to prove the spectral theorem, an extension of the principal axes theorem (Theorem [thm:024303]).
The set of complex numbers is denoted \(\mathbb{C}\). We will use only the most basic properties of these numbers (mainly conjugation and absolute values), and the reader can find this material in Appendix [chap:appacomplexnumbers].
If \(n \geq 1\), we denote the set of all \(n\)-tuples of complex numbers by \(\mathbb{C}^n\). As with \(\mathbb{R}^n\), these \(n\)-tuples will be written either as row or column matrices and will be referred to as vectors. We define vector operations on \(\mathbb{C}^n\) as follows:
\[\begin{aligned} (v_{1}, v_{2}, \dots, v_{n}) + (w_{1}, w_{2}, \dots, w_{n}) &= (v_{1} + w_{1}, v_{2} + w_{2}, \dots, v_{n} + w_{n}) \\ u(v_{1}, v_{2}, \dots, v_{n}) &= (uv_{1}, uv_{2}, \dots, uv_{n}) \quad \mbox{ for } u \mbox{ in } \mathbb{C}\end{aligned} \nonumber \]
With these definitions, \(\mathbb{C}^n\) satisfies the axioms for a vector space (with complex scalars) given in Chapter [chap:6]. Thus we can speak of spanning sets for \(\mathbb{C}^n\), of linearly independent subsets, and of bases. In all cases, the definitions are identical to the real case, except that the scalars are allowed to be complex numbers. In particular, the standard basis of \(\mathbb{R}^n\) remains a basis of \(\mathbb{C}^n\), called the standard basis of \(\mathbb{C}^n\).
A matrix \(A = \left[ a_{ij} \right]\) is called a complex matrix if every entry \(a_{ij}\) is a complex number. The notion of conjugation for complex numbers extends to matrices as follows: Define the conjugate of \(A = \left[ a_{ij} \right]\) to be the matrix
\[\overline{A} = \left[\, \overline{a}_{ij} \,\right] \nonumber \]
obtained from \(A\) by conjugating every entry. Then (using Appendix [chap:appacomplexnumbers])
\[\overline{A + B} = \overline{A} + \overline{B} \quad \mbox{ and } \quad \overline{AB} = \overline{A} \; \overline{B} \nonumber \]
holds for all (complex) matrices of appropriate size.
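These rules are easy to check numerically. The following minimal NumPy sketch (an illustration external to the text, not part of the theory) verifies both identities for a pair of small complex matrices:

```python
import numpy as np

# Illustrative check of the conjugation rules for complex matrices.
A = np.array([[1 + 2j, 3], [0, 1 - 1j]])
B = np.array([[2j, 1 - 1j], [4, 5 + 1j]])

print(np.allclose(np.conj(A + B), np.conj(A) + np.conj(B)))  # True
print(np.allclose(np.conj(A @ B), np.conj(A) @ np.conj(B)))  # True
```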
The Standard Inner Product
There is a natural generalization to \(\mathbb{C}^n\) of the dot product in \(\mathbb{R}^n\).
Definition (Standard Inner Product in \(\mathbb{C}^n\)). Given \(\mathbf{z} = (z_{1}, z_{2}, \dots, z_{n})\) and \(\mathbf{w} = (w_{1}, w_{2}, \dots, w_{n})\) in \(\mathbb{C}^n\), define their standard inner product \(\langle \mathbf{z}, \mathbf{w} \rangle\) by
\[\langle \mathbf{z}, \mathbf{w} \rangle = z_{1}\overline{w}_{1} + z_{2}\overline{w}_{2} + \dots + z_{n}\overline{w}_{n} = \mathbf{z}\bullet \overline{\mathbf{w}} \nonumber \]
where \(\overline{w}\) is the conjugate of the complex number \(w\).
Clearly, if \(\mathbf{z}\) and \(\mathbf{w}\) actually lie in \(\mathbb{R}^n\), then \(\langle \mathbf{z}, \mathbf{w} \rangle = \mathbf{z}\bullet \mathbf{w}\) is the usual dot product.
Example. If \(\mathbf{z} = (2, 1 - i, 2i, 3 - i)\) and \(\mathbf{w} = (1 - i, -1, -i, 3 + 2i)\), then
\[\begin{aligned} \langle \mathbf{z}, \mathbf{w} \rangle &= 2(1 + i) + (1 - i)(-1) + (2i)(i) + (3 - i)(3 - 2i) = 6 -6i \\ \langle \mathbf{z}, \mathbf{z} \rangle &= 2 \cdot 2 + (1 - i)(1 + i) + (2i)(-2i) + (3 - i)(3 + i) = 20\end{aligned} \nonumber \]
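For readers following along in software, here is a minimal NumPy check of this example. Note the convention: the definition above conjugates the second argument, so \(\langle \mathbf{z}, \mathbf{w} \rangle\) is `z @ np.conj(w)`; NumPy's `np.vdot` conjugates its first argument, so in our notation it computes \(\langle \mathbf{w}, \mathbf{z} \rangle\).

```python
import numpy as np

# <z, w> = z_1*conj(w_1) + ... + z_n*conj(w_n): conjugate the SECOND vector.
z = np.array([2, 1 - 1j, 2j, 3 - 1j])
w = np.array([1 - 1j, -1, -1j, 3 + 2j])

print(z @ np.conj(w))   # (6-6j)
print(z @ np.conj(z))   # (20+0j)
print(np.vdot(w, z))    # (6-6j): np.vdot conjugates its first argument
```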
Note that \(\langle \mathbf{z}, \mathbf{w} \rangle\) is a complex number in general. However, if \(\mathbf{w} = \mathbf{z} = (z_{1}, z_{2}, \dots, z_{n})\), the definition gives \(\langle \mathbf{z}, \mathbf{z} \rangle = |z_{1}|^{2} + \dots + |z_{n}|^{2}\) which is a nonnegative real number, equal to \(0\) if and only if \(\mathbf{z} = \mathbf{0}\). This explains the conjugation in the definition of \(\langle \mathbf{z}, \mathbf{w} \rangle\), and it gives (4) of the following theorem.
Theorem. Let \(\mathbf{z}\), \(\mathbf{z}_{1}\), \(\mathbf{w}\), and \(\mathbf{w}_{1}\) denote vectors in \(\mathbb{C}^n\), and let \(\lambda\) denote a complex number. Then:
- \(\langle \mathbf{z} + \mathbf{z}_{1}, \mathbf{w}\rangle = \langle \mathbf{z}, \mathbf{w} \rangle + \langle \mathbf{z}_{1}, \mathbf{w} \rangle\) and \(\langle \mathbf{z}, \mathbf{w} + \mathbf{w}_{1} \rangle = \langle \mathbf{z}, \mathbf{w} \rangle + \langle \mathbf{z}, \mathbf{w}_{1} \rangle\).
- \(\langle \lambda \mathbf{z}, \mathbf{w} \rangle = \lambda \langle \mathbf{z}, \mathbf{w} \rangle\) and \(\langle \mathbf{z}, \lambda \mathbf{w} \rangle = \overline{\lambda} \langle \mathbf{z}, \mathbf{w} \rangle\).
- \(\langle \mathbf{z}, \mathbf{w} \rangle = \overline{\langle \mathbf{w}, \mathbf{z} \rangle}\).
- \(\langle \mathbf{z}, \mathbf{z} \rangle \geq 0\), and \(\langle \mathbf{z}, \mathbf{z} \rangle = 0\) if and only if \(\mathbf{z} = \mathbf{0}\).
Proof. We leave (1) and (2) to the reader (Exercise [ex:8_6_10]), and (4) has already been proved. To prove (3), write \(\mathbf{z} = (z_{1}, z_{2}, \dots, z_{n})\) and \(\mathbf{w} = (w_{1}, w_{2}, \dots, w_{n})\). Then
\[\begin{aligned} \overline{\langle \mathbf{w}, \mathbf{z} \rangle} = (\overline{w_{1}\overline{z}_{1} + \dots + w_{n}\overline{z}_{n}}) &= \overline{w}_{1}\overline{\overline{z}}_{1} + \dots + \overline{w}_{n}\overline{\overline{z}}_{n} \\ &= z_{1}\overline{w}_{1} + \dots + z_{n}\overline{w}_{n} = \langle \mathbf{z}, \mathbf{w} \rangle\end{aligned} \nonumber \]
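Each of these properties can be spot-checked numerically; the sketch below tests (2) and (3) with randomly generated vectors (a sanity check, of course, not a proof):

```python
import numpy as np

# Random spot-check of properties (2) and (3) of the standard inner product.
rng = np.random.default_rng(0)
z = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)
lam = 2 - 3j

def inner(u, v):
    return u @ np.conj(v)  # <u, v> as defined in this section

print(np.isclose(inner(lam * z, w), lam * inner(z, w)))           # True
print(np.isclose(inner(z, lam * w), np.conj(lam) * inner(z, w)))  # True
print(np.isclose(inner(z, w), np.conj(inner(w, z))))              # True
```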
Definition (Norm and Length in \(\mathbb{C}^n\)). As for the dot product on \(\mathbb{R}^n\), property (4) enables us to define the norm or length \(\|\mathbf{z}\|\) of a vector \(\mathbf{z} = (z_{1}, z_{2}, \dots, z_{n})\) in \(\mathbb{C}^n\):
\[\| \mathbf{z} \| = \sqrt{\langle \mathbf{z}, \mathbf{z} \rangle} = \sqrt{|z_{1}|^2 + |z_{2}|^2 + \dots + |z_{n}|^2} \nonumber \]
The only properties of the norm function we will need are the following (the proofs are left to the reader):
Theorem. If \(\mathbf{z}\) is any vector in \(\mathbb{C}^n\), then
- \(\| \mathbf{z} \| \geq 0\) and \(\| \mathbf{z} \| = 0\) if and only if \(\mathbf{z} = \mathbf{0}\).
- \(\| \lambda\mathbf{z} \| = |\lambda| \| \mathbf{z} \|\) for all complex numbers \(\lambda\).
A vector \(\mathbf{u}\) in \(\mathbb{C}^n\) is called a unit vector if \(\|\mathbf{u}\| = 1\). Property (2) in Theorem [thm:025616] then shows that if \(\mathbf{z}\) is any nonzero vector in \(\mathbb{C}^n\), then \(\mathbf{u} = \frac{1}{\| \mathbf{z} \|}\mathbf{z}\) is a unit vector.
Example. In \(\mathbb{C}^4\), find a unit vector \(\mathbf{u}\) that is a positive real multiple of \(\mathbf{z} = (1 - i, i, 2, 3 + 4i)\).
\(\| \mathbf{z} \| = \sqrt{2+1+4+25} = \sqrt{32} = 4\sqrt{2}\), so take \(\mathbf{u} = \frac{1}{4\sqrt{2}}\mathbf{z}\).
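NumPy's `np.linalg.norm` computes exactly this quantity for complex vectors, so the example can be checked directly (an illustrative sketch):

```python
import numpy as np

# For complex vectors, np.linalg.norm returns sqrt(|z_1|^2 + ... + |z_n|^2).
z = np.array([1 - 1j, 1j, 2, 3 + 4j])
print(np.linalg.norm(z))    # 5.6568... = 4*sqrt(2)

u = z / np.linalg.norm(z)   # the unit vector of the example
print(np.linalg.norm(u))    # 1.0
```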
Transposition of complex matrices is defined just as in the real case, and the following notion is fundamental.
Definition (Conjugate Transpose in \(\mathbb{C}^n\)). The conjugate transpose \(A^{H}\) of a complex matrix \(A\) is defined by
\[A^H = (\overline{A})^T = \overline{(A^T)} \nonumber \]
Observe that \(A^{H} = A^{T}\) when \(A\) is real.\(^{1}\)
Example.
\[\left[ \begin{array}{ccr} 3 & 1 - i & 2 + i \\ 2i & 5 + 2i & -i \end{array}\right]^H = \left[ \begin{array}{cc} 3 & -2i \\ 1 + i & 5 - 2i \\ 2 - i & i \end{array}\right] \nonumber \]
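In NumPy the conjugate transpose is obtained by composing conjugation and transposition, e.g. `A.conj().T`; the following sketch reproduces the example:

```python
import numpy as np

# The conjugate transpose A^H is conjugation followed by transposition.
A = np.array([[3, 1 - 1j, 2 + 1j],
              [2j, 5 + 2j, -1j]])
AH = A.conj().T
print(AH)   # [[3, -2i], [1+i, 5-2i], [2-i, i]], as computed above
```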
The following properties of \(A^{H}\) follow easily from the rules for transposition of real matrices and extend these rules to complex matrices. Note the conjugate in property (3).
Theorem. Let \(A\) and \(B\) denote complex matrices, and let \(\lambda\) be a complex number. Then:
- \((A^{H})^{H} = A\).
- \((A + B)^{H} = A^{H} + B^{H}\).
- \((\lambda A)^H = \overline{\lambda}A^H\).
- \((AB)^{H} = B^{H}A^{H}\).
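A quick numerical spot-check of rules (3) and (4), again with random matrices (illustration only):

```python
import numpy as np

# Spot-check of (lambda*A)^H = conj(lambda)*A^H and (AB)^H = B^H A^H.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
lam = 1 + 2j

def H(M):
    return M.conj().T

print(np.allclose(H(lam * A), np.conj(lam) * H(A)))  # True
print(np.allclose(H(A @ B), H(B) @ H(A)))            # True
```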
Hermitian and Unitary Matrices
If \(A\) is a real symmetric matrix, it is clear that \(A^{H} = A\). The complex matrices that satisfy this condition turn out to be the most natural generalization of the real symmetric matrices:
Definition (Hermitian Matrix). A square complex matrix \(A\) is called hermitian if \(A^{H} = A\), equivalently if \(\overline{A} = A^T\).
Hermitian matrices are easy to recognize because the entries on the main diagonal must be real, and the “reflection” of each nondiagonal entry in the main diagonal must be the conjugate of that entry.
Example. \(\left[ \begin{array}{ccc} 3 & i & 2 + i \\ -i & -2 & -7 \\ 2 - i & -7 & 1 \end{array}\right]\) is hermitian, whereas \(\left[ \begin{array}{rr} 1 & i \\ i & -2 \end{array}\right]\) and \(\left[ \begin{array}{rr} 1 & i \\ -i & i \end{array}\right]\) are not.
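This recognition test is one line in NumPy: compare the matrix with its conjugate transpose. The sketch below checks the three matrices of the example:

```python
import numpy as np

# A is hermitian exactly when A equals its conjugate transpose.
def is_hermitian(M):
    return np.allclose(M, M.conj().T)

A = np.array([[3, 1j, 2 + 1j], [-1j, -2, -7], [2 - 1j, -7, 1]])
B = np.array([[1, 1j], [1j, -2]])
C = np.array([[1, 1j], [-1j, 1j]])
print(is_hermitian(A), is_hermitian(B), is_hermitian(C))  # True False False
```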
The following theorem extends Theorem [thm:024396] and gives a very useful characterization of hermitian matrices in terms of the standard inner product in \(\mathbb{C}^n\).
Theorem. An \(n \times n\) complex matrix \(A\) is hermitian if and only if
\[\langle A\mathbf{z}, \mathbf{w} \rangle = \langle \mathbf{z}, A\mathbf{w} \rangle \nonumber \]
for all \(n\)-tuples \(\mathbf{z}\) and \(\mathbf{w}\) in \(\mathbb{C}^n\).
Proof. If \(A\) is hermitian, we have \(A^T = \overline{A}\). If \(\mathbf{z}\) and \(\mathbf{w}\) are columns in \(\mathbb{C}^n\), then \(\langle \mathbf{z}, \mathbf{w} \rangle = \mathbf{z}^T\overline{\mathbf{w}}\), so
\[\langle A\mathbf{z}, \mathbf{w} \rangle =(A\mathbf{z})^T\overline{\mathbf{w}} = \mathbf{z}^TA^T\overline{\mathbf{w}} = \mathbf{z}^T\overline{A}\overline{\mathbf{w}} = \mathbf{z}^T(\overline{A\mathbf{w}}) = \langle \mathbf{z}, A\mathbf{w} \rangle \nonumber \]
To prove the converse, let \(\mathbf{e}_{j}\) denote column \(j\) of the identity matrix. If \(A = \left[ a_{ij} \right]\), the condition gives
\[\overline{a}_{ij} = \langle \mathbf{e}_{i}, A\mathbf{e}_{j} \rangle = \langle A\mathbf{e}_{i}, \mathbf{e}_{j} \rangle = a_{ji} \nonumber \]
Hence \(\overline{A} = A^T\), so \(A\) is hermitian.
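The identity \(\langle A\mathbf{z}, \mathbf{w} \rangle = \langle \mathbf{z}, A\mathbf{w} \rangle\) is easy to confirm numerically for a hermitian matrix built at random (illustration only; \(M + M^H\) is hermitian for any square \(M\)):

```python
import numpy as np

# For hermitian A, <Az, w> = <z, Aw> for all z and w.
rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = M + M.conj().T                      # hermitian by construction
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def inner(u, v):
    return u @ np.conj(v)

print(np.isclose(inner(A @ z, w), inner(z, A @ w)))  # True
```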
Let \(A\) be an \(n \times n\) complex matrix. As in the real case, a complex number \(\lambda\) is called an eigenvalue of \(A\) if \(A\mathbf{x} = \lambda \mathbf{x}\) holds for some column \(\mathbf{x} \neq \mathbf{0}\) in \(\mathbb{C}^n\). In this case \(\mathbf{x}\) is called an eigenvector of \(A\) corresponding to \(\lambda\). The characteristic polynomial \(c_{A}(x)\) is defined by
\[c_{A}(x) = \det (xI - A) \nonumber \]
This polynomial has complex coefficients (possibly nonreal). However, the proof of Theorem [thm:009033] goes through to show that the eigenvalues of \(A\) are the roots (possibly complex) of \(c_{A}(x)\).
It is at this point that the advantage of working with complex numbers becomes apparent. The real numbers are incomplete in the sense that the characteristic polynomial of a real matrix may fail to have all its roots real. However, this difficulty does not occur for the complex numbers. The so-called fundamental theorem of algebra ensures that every polynomial of positive degree with complex coefficients has a complex root. Hence every square complex matrix \(A\) has a (complex) eigenvalue. Indeed (Appendix [chap:appacomplexnumbers]), \(c_{A}(x)\) factors completely as follows:
\[c_{A}(x) = (x -\lambda_{1})(x -\lambda_{2}) \cdots (x -\lambda_{n}) \nonumber \]
where \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\) are the eigenvalues of \(A\) (with possible repetitions due to multiple roots).
The next result shows that, for hermitian matrices, the eigenvalues are actually real. Because symmetric real matrices are hermitian, this re-proves Theorem [thm:016397]. It also extends Theorem [thm:024407], which asserts that eigenvectors of a symmetric real matrix corresponding to distinct eigenvalues are actually orthogonal. In the complex context, two \(n\)-tuples \(\mathbf{z}\) and \(\mathbf{w}\) in \(\mathbb{C}^n\) are said to be orthogonal if \(\langle \mathbf{z}, \mathbf{w} \rangle = 0\).
Theorem. Let \(A\) denote a hermitian matrix. Then:
- The eigenvalues of \(A\) are real.
- Eigenvectors of \(A\) corresponding to distinct eigenvalues are orthogonal.
Proof. Let \(\lambda\) and \(\mu\) be eigenvalues of \(A\) with (nonzero) eigenvectors \(\mathbf{z}\) and \(\mathbf{w}\). Then \(A\mathbf{z} = \lambda \mathbf{z}\) and \(A\mathbf{w} = \mu \mathbf{w}\), so Theorem [thm:025697] gives
\[\label{eigenvalEq} \lambda \langle \mathbf{z}, \mathbf{w} \rangle = \langle \lambda \mathbf{z}, \mathbf{w} \rangle = \langle A\mathbf{z}, \mathbf{w} \rangle = \langle \mathbf{z}, A\mathbf{w} \rangle = \langle \mathbf{z}, \mu \mathbf{w} \rangle = \overline{\mu} \langle \mathbf{z}, \mathbf{w} \rangle \]
If \(\mu = \lambda\) and \(\mathbf{w} = \mathbf{z}\), this becomes \(\lambda \langle \mathbf{z}, \mathbf{z} \rangle = \overline{\lambda} \langle \mathbf{z}, \mathbf{z} \rangle\). Because \(\langle \mathbf{z}, \mathbf{z} \rangle = \|\mathbf{z}\|^{2} \neq 0\), this implies \(\lambda = \overline{\lambda}\). Thus \(\lambda\) is real, proving (1). Similarly, \(\mu\) is real, so equation ([eigenvalEq]) gives \(\lambda \langle \mathbf{z}, \mathbf{w} \rangle = \mu \langle \mathbf{z}, \mathbf{w} \rangle\). If \(\lambda \neq \mu\), this implies \(\langle \mathbf{z}, \mathbf{w} \rangle = 0\), proving (2).
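Numerical libraries exploit this theorem: NumPy's `np.linalg.eigh`, intended for hermitian matrices, returns real eigenvalues and an orthonormal set of eigenvectors. A small illustration:

```python
import numpy as np

# Eigenvalues of a hermitian matrix are real; eigenvectors for distinct
# eigenvalues are orthogonal (eigh returns an orthonormal set).
A = np.array([[2, 1 - 1j], [1 + 1j, 3]])
evals, U = np.linalg.eigh(A)
print(evals)                                   # [1. 4.] -- real
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True
```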
The principal axes theorem (Theorem [thm:024303]) asserts that every real symmetric matrix \(A\) is orthogonally diagonalizable—that is \(P^{T}AP\) is diagonal where \(P\) is an orthogonal matrix \((P^{-1} = P^{T})\). The next theorem identifies the complex analogs of these orthogonal real matrices.
Definition (Orthogonal and Orthonormal Vectors in \(\mathbb{C}^n\)). As in the real case, a set of nonzero vectors \(\{\mathbf{z}_{1}, \mathbf{z}_{2}, \dots, \mathbf{z}_{m}\}\) in \(\mathbb{C}^n\) is called orthogonal if \(\langle \mathbf{z}_{i}, \mathbf{z}_{j}\rangle = 0\) whenever \(i \neq j\), and it is orthonormal if, in addition, \(\|\mathbf{z}_{i} \| = 1\) for each \(i\).
Theorem. The following are equivalent for an \(n \times n\) complex matrix \(A\).
- \(A\) is invertible and \(A^{-1} = A^{H}\).
- The rows of \(A\) are an orthonormal set in \(\mathbb{C}^n\).
- The columns of \(A\) are an orthonormal set in \(\mathbb{C}^n\).
Proof. If \(A = \left[ \begin{array}{cccc} \mathbf{c}_{1} & \mathbf{c}_{2} & \cdots & \mathbf{c}_{n} \end{array}\right]\) is a complex matrix with \(j\)th column \(\mathbf{c}_{j}\), then \(A^T\overline{A} = \left[ \langle \mathbf{c}_{i}, \mathbf{c}_{j}\rangle \right]\), as in Theorem [thm:024227]. Since \(A^T\overline{A} = \overline{A^HA}\), it follows that \(A^T\overline{A} = I\) if and only if \(A^HA = I\), that is, if and only if (1) holds. This gives (1) \(\Leftrightarrow\) (3), and (1) \(\Leftrightarrow\) (2) is proved in the same way using the rows of \(A\).
Definition (Unitary Matrix). A square complex matrix \(U\) is called unitary if \(U^{-1} = U^{H}\).
Thus a real matrix is unitary if and only if it is orthogonal.
Example. The matrix \(A = \left[ \begin{array}{rr} 1 + i & 1 \\ 1 - i & i \end{array}\right]\) has orthogonal columns, but the rows are not orthogonal. Normalizing the columns gives the unitary matrix \(\frac{1}{2}\left[ \begin{array}{rr} 1 + i & \sqrt{2} \\ 1 - i & \sqrt{2}i \end{array}\right]\).
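Checking unitarity amounts to verifying \(U^HU = I\); the sketch below does this for the normalized matrix just obtained:

```python
import numpy as np

# The normalized matrix of the example is unitary: U^H U = U U^H = I.
U = 0.5 * np.array([[1 + 1j, np.sqrt(2)], [1 - 1j, np.sqrt(2) * 1j]])
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True
print(np.allclose(U @ U.conj().T, np.eye(2)))  # True
```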
Given a real symmetric matrix \(A\), the diagonalization algorithm in Section [sec:3_3] leads to a procedure for finding an orthogonal matrix \(P\) such that \(P^{T}AP\) is diagonal (see Example [exa:024374]). The following example illustrates Theorem [thm:025729] and shows that the technique works for complex matrices.
Example. Consider the hermitian matrix \(A = \left[ \begin{array}{cc} 3 & 2 + i \\ 2 - i & 7 \end{array}\right]\). Find the eigenvalues of \(A\), find two orthonormal eigenvectors, and so find a unitary matrix \(U\) such that \(U^{H}AU\) is diagonal.
The characteristic polynomial of \(A\) is
\[c_{A}(x) = \det (xI - A) = \det \left[ \begin{array}{rr} x - 3 & -2 - i \\ -2 + i & x - 7 \end{array}\right] = (x-2)(x-8) \nonumber \]
Hence the eigenvalues are \(2\) and \(8\) (both real, as expected), and corresponding eigenvectors are \(\left[ \begin{array}{c} 2 + i \\ -1 \end{array}\right]\) and \(\left[ \begin{array}{c} 1 \\ 2 - i \end{array}\right]\) (orthogonal, as expected). Each has length \(\sqrt{6}\) so, as in the (real) diagonalization algorithm, let \(U = \frac{1}{\sqrt{6}}\left[ \begin{array}{cc} 2 + i & 1 \\ -1 & 2 - i \end{array}\right]\) be the unitary matrix with the normalized eigenvectors as columns.
Then \(U^HAU = \left[ \begin{array}{rr} 2 & 0 \\ 0 & 8 \end{array}\right]\) is diagonal.
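The whole computation can be replayed numerically (a sketch; `np.linalg.eigh` would also produce suitable columns for \(U\) directly):

```python
import numpy as np

# Reproducing the example: U^H A U = diag(2, 8).
A = np.array([[3, 2 + 1j], [2 - 1j, 7]])
U = (1 / np.sqrt(6)) * np.array([[2 + 1j, 1], [-1, 2 - 1j]])
D = U.conj().T @ A @ U
print(np.round(D, 10))   # [[2, 0], [0, 8]]
```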
Unitary Diagonalization
An \(n \times n\) complex matrix \(A\) is called unitarily diagonalizable if \(U^{H}AU\) is diagonal for some unitary matrix \(U\). As Example [exa:025794] suggests, we are going to prove that every hermitian matrix is unitarily diagonalizable. However, with only a little extra effort, we can get a very important theorem that has this result as an easy consequence.
A complex matrix is called upper triangular if every entry below the main diagonal is zero. We owe the following theorem to Issai Schur.\(^{2}\)
Schur’s Theorem. If \(A\) is any \(n \times n\) complex matrix, there exists a unitary matrix \(U\) such that
\[U^HAU = T \nonumber \]
is upper triangular. Moreover, the entries on the main diagonal of \(T\) are the eigenvalues \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\) of \(A\) (including multiplicities).
Proof. We use induction on \(n\). If \(n = 1\), \(A\) is already upper triangular. If \(n > 1\), assume the theorem is valid for \((n - 1) \times (n - 1)\) complex matrices. Let \(\lambda_{1}\) be an eigenvalue of \(A\), and let \(\mathbf{y}_{1}\) be an eigenvector with \(\|\mathbf{y}_{1}\| = 1\). Then \(\mathbf{y}_{1}\) is part of a basis of \(\mathbb{C}^n\) (by the analog of Theorem [thm:019430]), so the (complex analog of the) Gram-Schmidt process provides \(\mathbf{y}_{2}, \dots, \mathbf{y}_{n}\) such that \(\{\mathbf{y}_{1}, \mathbf{y}_{2}, \dots, \mathbf{y}_{n}\}\) is an orthonormal basis of \(\mathbb{C}^n\). If \(U_{1} = \left[ \begin{array}{cccc} \mathbf{y}_{1} & \mathbf{y}_{2} & \cdots & \mathbf{y}_{n} \end{array}\right]\) is the matrix with these vectors as its columns, then (see Lemma [lem:015527])
\[U_{1}^HAU_{1} = \left[ \begin{array}{cc} \lambda_{1} & X_{1} \\ 0 & A_{1} \end{array}\right] \nonumber \]
in block form. Now apply induction to find a unitary \((n - 1) \times (n - 1)\) matrix \(W_{1}\) such that \(W_{1}^HA_{1}W_{1} = T_{1}\) is upper triangular. Then \(U_{2} = \left[ \begin{array}{cc} 1 & 0 \\ 0 & W_{1} \end{array}\right]\) is a unitary \(n \times n\) matrix. Hence \(U = U_{1}U_{2}\) is unitary (using Theorem [thm:025759]), and
\[\begin{aligned} U^HAU &= U_{2}^H(U_{1}^HAU_{1})U_{2} \\ &= \left[ \begin{array}{cc} 1 & 0 \\ 0 & W_{1}^H \end{array}\right] \left[ \begin{array}{cc} \lambda_{1} & X_{1} \\ 0 & A_{1} \end{array}\right] \left[ \begin{array}{cc} 1 & 0 \\ 0 & W_{1} \end{array}\right] = \left[ \begin{array}{cc} \lambda_{1} & X_{1}W_{1} \\ 0 & T_{1} \end{array}\right]\end{aligned} \nonumber \]
is upper triangular. Finally, \(A\) and \(U^{H}AU = T\) have the same eigenvalues by (the complex version of) Theorem [thm:016008], and they are the diagonal entries of \(T\) because \(T\) is upper triangular.
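Schur decompositions are available in standard software; SciPy's `scipy.linalg.schur` with `output='complex'` returns an upper triangular \(T\) and a unitary \(Z\) with \(A = ZTZ^H\) (so \(Z^HAZ = T\)). A brief illustration:

```python
import numpy as np
from scipy.linalg import schur

# Complex Schur decomposition: A = Z T Z^H with T upper triangular, Z unitary.
A = np.array([[0, 1], [-1, 0]], dtype=complex)
T, Z = schur(A, output='complex')
print(np.round(T, 10))                         # i and -i on the diagonal (order may vary)
print(np.allclose(Z @ T @ Z.conj().T, A))      # True
print(np.allclose(Z.conj().T @ Z, np.eye(2)))  # True: Z is unitary
```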
The fact that similar matrices have the same traces and determinants gives the following consequence of Schur’s theorem.
Corollary. Let \(A\) be an \(n \times n\) complex matrix, and let \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\) denote the eigenvalues of \(A\), including multiplicities. Then
\[\det A = \lambda_1\lambda_2 \cdots \lambda_n \quad \mbox{and} \quad \operatorname{tr} A = \lambda_1 + \lambda_2 + \cdots + \lambda_n \nonumber \]
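A random complex matrix makes a quick check of this corollary (illustration only):

```python
import numpy as np

# det A = product of eigenvalues; tr A = sum of eigenvalues.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
evals = np.linalg.eigvals(A)

print(np.isclose(np.linalg.det(A), np.prod(evals)))  # True
print(np.isclose(np.trace(A), np.sum(evals)))        # True
```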
Schur’s theorem asserts that every complex matrix can be “unitarily triangularized.” However, we cannot substitute “unitarily diagonalized” here. In fact, if \(A = \left[ \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array}\right]\), there is no invertible complex matrix \(U\) at all such that \(U^{-1}AU\) is diagonal. However, the situation is much better for hermitian matrices.
The Spectral Theorem. If \(A\) is hermitian, there is a unitary matrix \(U\) such that \(U^{H}AU\) is diagonal.
Proof. By Schur’s theorem, let \(U^{H}AU = T\) be upper triangular where \(U\) is unitary. Since \(A\) is hermitian, this gives
\[T^H = (U^HAU)^H = U^HA^HU^{HH} = U^HAU = T \nonumber \]
This means that \(T\) is both upper and lower triangular. Hence \(T\) is actually diagonal.
The principal axes theorem asserts that a real matrix \(A\) is symmetric if and only if it is orthogonally diagonalizable (that is, \(P^{T}AP\) is diagonal for some real orthogonal matrix \(P\)). Theorem [thm:025860] is the complex analog of half of this result. However, the converse is false for complex matrices: There exist unitarily diagonalizable matrices that are not hermitian.
Example. Show that the non-hermitian matrix \(A = \left[ \begin{array}{rr} 0 & 1 \\ -1 & 0 \end{array}\right]\) is unitarily diagonalizable.
The characteristic polynomial is \(c_{A}(x) = x^{2} + 1\). Hence the eigenvalues are \(i\) and \(-i\), and it is easy to verify that \(\left[ \begin{array}{r} i \\ -1 \end{array}\right]\) and \(\left[ \begin{array}{r} -1 \\ i \end{array}\right]\) are corresponding eigenvectors. Moreover, these eigenvectors are orthogonal and both have length \(\sqrt{2}\), so \(U = \frac{1}{\sqrt{2}}\left[ \begin{array}{rr} i & -1 \\ -1 & i \end{array}\right]\) is a unitary matrix such that \(U^HAU = \left[ \begin{array}{rr} i & 0 \\ 0 & -i \end{array}\right]\) is diagonal.
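The claimed diagonalization can be verified directly (sketch):

```python
import numpy as np

# Verifying the example: U is unitary and U^H A U = diag(i, -i).
A = np.array([[0, 1], [-1, 0]])
U = (1 / np.sqrt(2)) * np.array([[1j, -1], [-1, 1j]])
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True
print(np.round(U.conj().T @ A @ U, 10))        # [[i, 0], [0, -i]]
```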
There is a very simple way to characterize those complex matrices that are unitarily diagonalizable. To this end, an \(n \times n\) complex matrix \(N\) is called normal if \(NN^{H} = N^{H}N\). It is clear that every hermitian or unitary matrix is normal, as is the matrix \(\left[ \begin{array}{rr} 0 & 1 \\ -1 & 0 \end{array}\right]\) in Example [exa:025874]. In fact we have the following result.
Theorem. An \(n \times n\) complex matrix \(A\) is unitarily diagonalizable if and only if \(A\) is normal.
Proof. Assume first that \(U^{H}AU = D\), where \(U\) is unitary and \(D\) is diagonal. Then \(DD^{H} = D^{H}D\), as is easily verified. Because \(DD^{H} = U^{H}(AA^{H})U\) and \(D^{H}D = U^{H}(A^{H}A)U\), it follows by cancellation that \(AA^{H} = A^{H}A\).
Conversely, assume \(A\) is normal—that is, \(AA^{H} = A^{H}A\). By Schur’s theorem, let \(U^{H}AU = T\), where \(T\) is upper triangular and \(U\) is unitary. Then \(T\) is normal too:
\[TT^H = U^H(AA^H)U = U^H(A^HA)U = T^HT \nonumber \]
Hence it suffices to show that a normal \(n \times n\) upper triangular matrix \(T\) must be diagonal. We induct on \(n\); it is clear if \(n = 1\). If \(n > 1\) and \(T = \left[ t_{ij} \right]\), then equating \((1, 1)\)-entries in \(TT^{H}\) and \(T^{H}T\) gives
\[|t_{11}|^2 + |t_{12}|^2 + \dots + |t_{1n}|^2 = |t_{11}|^2 \nonumber \]
This implies \(t_{12} = t_{13} = \dots = t_{1n} = 0\), so \(T = \left[ \begin{array}{cc} t_{11} & 0 \\ 0 & T_{1} \end{array}\right]\) in block form. Hence \(T^H = \left[ \begin{array}{cc} \overline{t}_{11} & 0 \\ 0 & T_{1}^H \end{array}\right]\), so \(TT^{H} = T^{H}T\) implies \(T_{1}T_{1}^H = T_{1}^HT_{1}\). Thus \(T_{1}\) is normal and upper triangular, so \(T_{1}\) is diagonal by induction, and the proof is complete.
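The normality criterion is easy to test in software; the sketch below confirms that the matrix of Example [exa:025874] is normal, while the non-diagonalizable matrix \(\left[ \begin{array}{cc} 1 & 1 \\ 0 & 1 \end{array}\right]\) mentioned earlier is not:

```python
import numpy as np

# A is unitarily diagonalizable if and only if A A^H = A^H A.
def is_normal(M):
    return np.allclose(M @ M.conj().T, M.conj().T @ M)

print(is_normal(np.array([[0, 1], [-1, 0]])))  # True: normal, though not hermitian
print(is_normal(np.array([[1, 1], [0, 1]])))   # False: not unitarily diagonalizable
```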
We conclude this section by using Schur’s theorem (Theorem [thm:025814]) to prove a famous theorem about matrices. Recall that the characteristic polynomial of a square matrix \(A\) is defined by \(c_{A}(x) = \det (xI - A)\), and that the eigenvalues of \(A\) are just the roots of \(c_{A}(x)\).
Cayley-Hamilton Theorem. If \(A\) is an \(n \times n\) complex matrix, then \(c_{A}(A) = 0\); that is, \(A\) is a root of its characteristic polynomial.
Proof. If \(p(x)\) is any polynomial with complex coefficients, then \(p(P^{-1}AP) = P^{-1}p(A)P\) for any invertible complex matrix \(P\). Hence, by Schur’s theorem, we may assume that \(A\) is upper triangular. Then the eigenvalues \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\) of \(A\) appear along the main diagonal, so
\[c_{A}(x) = (x - \lambda_{1})(x - \lambda_{2})(x - \lambda_{3}) \cdots (x -\lambda_{n}) \nonumber \]
Thus
\[c_{A}(A) = (A - \lambda_{1}I)(A - \lambda_{2}I)(A - \lambda_{3}I) \cdots (A - \lambda_{n}I) \nonumber \]
Note that each matrix \(A - \lambda_{i}I\) is upper triangular. Now observe:
- \(A - \lambda_{1}I\) has zero first column because column 1 of \(A\) is \((\lambda_{1}, 0, 0, \dots, 0)^{T}\).
- Then \((A - \lambda_{1}I)(A - \lambda_{2}I)\) has the first two columns zero because the second column of \((A - \lambda_{2}I)\) is \((b, 0, 0, \dots, 0)^{T}\) for some constant \(b\).
- Next \((A - \lambda_{1}I)(A - \lambda_{2}I)(A - \lambda_{3}I)\) has the first three columns zero because column 3 of \((A -\lambda_{3}I)\) is \((c, d, 0, \dots, 0)^{T}\) for some constants \(c\) and \(d\).
Continuing in this way we see that \((A - \lambda_{1}I)(A - \lambda_{2}I)(A - \lambda_{3}I) \cdots (A - \lambda_{n}I)\) has all \(n\) columns zero; that is, \(c_{A}(A) = 0\).
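The theorem is also easy to witness numerically: compute the coefficients of \(c_{A}(x)\) (NumPy's `np.poly` returns them for a square matrix, leading coefficient first) and evaluate the matrix polynomial at \(A\). A sketch:

```python
import numpy as np

# Cayley-Hamilton: evaluating the characteristic polynomial at A gives 0.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
n = A.shape[0]

coeffs = np.poly(A)  # coefficients of det(xI - A), highest power first
cA = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.allclose(cA, np.zeros((n, n))))  # True: c_A(A) = 0
```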
1. Other notations for \(A^{H}\) are \(A^\ast\) and \(A^\dagger\).
2. Issai Schur (1875–1941) was a German mathematician who did fundamental work in the theory of representations of groups as matrices.