1.3: General Notation, Transposes, and Inverses

    A useful notation for writing a general \(m\)-by-\(n\) matrix \(\text{A}\) is

    \[\text{A}=\left(\begin{array}{cccc}a_{11}&a_{12}&\cdots&a_{1n} \\ a_{21}&a_{22}&\cdots&a_{2n} \\ \vdots&\vdots&\ddots&\vdots \\ a_{m1}&a_{m2}&\cdots&a_{mn}\end{array}\right).\label{eq:1} \]

    Here, the matrix element of \(\text{A}\) in the \(i\)th row and the \(j\)th column is denoted as \(a_{ij}\).

    Matrix multiplication can be written in terms of the matrix elements. Let \(\text{A}\) be an \(m\)-by-\(n\) matrix and let \(\text{B}\) be an \(n\)-by-\(p\) matrix. Then \(\text{C} = \text{AB}\) is an \(m\)-by-\(p\) matrix, and its \(ij\) element can be written as

    \[c_{ij}=\sum\limits_{k=1}^na_{ik}b_{kj}.\label{eq:2} \]

    Notice that the second index of \(a\) and the first index of \(b\) are summed over.
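
As a concrete illustration (a minimal sketch assuming Python with NumPy, neither of which the text itself prescribes; the sizes and entries are arbitrary), the triple loop below builds \(\text{C}=\text{AB}\) directly from the sum for \(c_{ij}\) and checks it against the built-in product:

```python
import numpy as np

# Arbitrary example sizes: A is m-by-n, B is n-by-p, so C = AB is m-by-p.
m, n, p = 2, 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

# Build C element by element, mirroring c_ij = sum_k a_ik b_kj.
C = np.zeros((m, p))
for i in range(m):
    for j in range(p):
        for k in range(n):
            C[i, j] += A[i, k] * B[k, j]

assert np.allclose(C, A @ B)  # matches NumPy's built-in matrix product
```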

    We can define the transpose of the matrix \(\text{A}\), denoted by \(\text{A}^{\text{T}}\) and spoken as A-transpose, as the matrix for which the rows become the columns and the columns become the rows. Here, using \(\eqref{eq:1}\),

    \[\text{A}^{\text{T}}=\left(\begin{array}{cccc}a_{11}&a_{21}&\cdots&a_{m1}\\a_{12}&a_{22}&\cdots&a_{m2} \\ \vdots&\vdots&\ddots&\vdots \\ a_{1n}&a_{2n}&\cdots&a_{mn}\end{array}\right),\nonumber \]

    where we could write

    \[a_{ij}^{\text{T}}=a_{ji}.\nonumber \]

Evidently, if \(\text{A}\) is \(m\)-by-\(n\) then \(\text{A}^{\text{T}}\) is \(n\)-by-\(m\). As a simple example, consider the following pair:

\[\text{A}=\left(\begin{array}{cc}a&d\\b&e\\c&f\end{array}\right),\quad\text{A}^{\text{T}}=\left(\begin{array}{ccc}a&b&c\\d&e&f\end{array}\right).\label{eq:3} \]

If \(\text{A}\) is a square matrix and \(\text{A}^{\text{T}} = \text{A}\), then we say that \(\text{A}\) is symmetric. For example, the \(3\)-by-\(3\) matrix

    \[\text{A}=\left(\begin{array}{ccc}a&b&c\\b&d&e\\c&e&f\end{array}\right)\nonumber \]

is symmetric. A matrix that satisfies \(\text{A}^{\text{T}} = -\text{A}\) is called skew-symmetric. For example,

    \[\text{A}=\left(\begin{array}{ccc}0&b&c\\-b&0&e\\-c&-e&0\end{array}\right)\nonumber \]

is skew-symmetric. Notice that the diagonal elements of a skew-symmetric matrix must be zero. A sometimes useful fact is that every square matrix can be written as the sum of a symmetric and a skew-symmetric matrix using

    \[\text{A}=\frac{1}{2}(\text{A}+\text{A}^{\text{T}})+\frac{1}{2}(\text{A}-\text{A}^{\text{T}}).\nonumber \]

    This is just like the fact that every function can be written as the sum of an even and an odd function.
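
This decomposition is easy to verify numerically; the following sketch (assuming NumPy, with an arbitrary example matrix) splits \(\text{A}\) into its symmetric and skew-symmetric parts and checks the defining properties:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],   # an arbitrary square example
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

S = (A + A.T) / 2   # symmetric part:      S.T == S
K = (A - A.T) / 2   # skew-symmetric part: K.T == -K

assert np.allclose(S, S.T)
assert np.allclose(K, -K.T)
assert np.allclose(A, S + K)
```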

    How do we write the transpose of the product of two matrices? Again, let \(\text{A}\) be an \(m\)-by-\(n\) matrix, \(\text{B}\) be an \(n\)-by-\(p\) matrix, and \(\text{C} = \text{AB}\). We have

    \[c_{ij}^{\text{T}}=c_{ji}=\sum\limits_{k=1}^na_{jk}b_{ki}=\sum\limits_{k=1}^nb_{ik}^{\text{T}}a_{kj}^{\text{T}}.\nonumber \]

    With \(\text{C}^{\text{T}}=(\text{AB})^{\text{T}}\), we have

    \[(\text{AB})^{\text{T}}=\text{B}^{\text{T}}\text{A}^{\text{T}}.\nonumber \]

    In words, the transpose of the product of matrices is equal to the product of the transposes with the order of multiplication reversed.
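
A quick numerical check of this rule (again a NumPy sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# The transpose of a product is the product of the transposes, reversed.
assert np.allclose((A @ B).T, B.T @ A.T)
```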

    The transpose of a column vector is a row vector. The inner product (or dot product) between two vectors is obtained by the product of a row vector and a column vector. With column vectors

    \[\text{u}=\left(\begin{array}{c}u_1\\u_2\\u_3\end{array}\right),\quad\text{v}=\left(\begin{array}{c}v_1\\v_2\\v_3\end{array}\right),\nonumber \]

    the inner product between these two vectors becomes

    \[\text{u}^{\text{T}}\text{v}=\left(\begin{array}{ccc}u_1&u_2&u_3\end{array}\right)\left(\begin{array}{c}v_1\\v_2\\v_3\end{array}\right)=u_1v_1+u_2v_2+u_3v_3.\nonumber \]

    The norm-squared of a vector becomes

    \[\text{u}^{\text{T}}\text{u}=\left(\begin{array}{ccc}u_1&u_2&u_3\end{array}\right)\left(\begin{array}{c}u_1\\u_2\\u_3\end{array}\right)=u_1^2+u_2^2+u_3^2.\nonumber \]

    We say that two column vectors are orthogonal if their inner product is zero. We say that a column vector is normalized if it has a norm of one. A set of column vectors that are normalized and mutually orthogonal are said to be orthonormal.
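
These definitions can be checked with a small NumPy sketch; the vectors below are arbitrary examples chosen so that the inner product vanishes:

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, -1.0, 0.0])

assert np.isclose(u @ v, 0.0)   # inner product is zero: u and v are orthogonal
assert np.isclose(u @ u, 9.0)   # norm-squared of u is 1 + 4 + 4

u_hat = u / np.linalg.norm(u)   # dividing by the norm normalizes a vector
assert np.isclose(u_hat @ u_hat, 1.0)
```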

    When the vectors are complex, the inner product needs to be defined differently. Instead of a transpose of a matrix, one defines the conjugate transpose as the transpose together with taking the complex conjugate of every element of the matrix. The symbol used is that of a dagger, so that

    \[\text{u}^\dagger =\left(\begin{array}{ccc}\overline{u}_1&\overline{u}_2&\overline{u}_3\end{array}\right).\nonumber \]

    Then

    \[\text{u}^\dagger \text{u}=\left(\begin{array}{ccc}\overline{u}_1&\overline{u}_2&\overline{u}_3\end{array}\right)\left(\begin{array}{c}u_1\\u_2\\u_3\end{array}\right)=|u_1|^2+|u_2|^2+|u_3|^2.\nonumber \]

    When a real matrix is equal to its transpose we say that the matrix is symmetric. When a complex matrix is equal to its conjugate transpose, we say that the matrix is Hermitian. Hermitian matrices play a fundamental role in quantum physics.
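
A small sketch of the complex case (assuming NumPy; note that np.vdot conjugates its first argument, which matches the dagger):

```python
import numpy as np

u = np.array([1 + 2j, 3j, -1 + 0j])

# u-dagger u equals the sum of the squared moduli of the components.
assert np.isclose(np.vdot(u, u).real, np.sum(np.abs(u) ** 2))

# A Hermitian example: equal to its own conjugate transpose.
H = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])
assert np.allclose(H, H.conj().T)
```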

    An outer product is also defined, and is used in some applications. The outer product between \(\text{u}\) and \(\text{v}\) is given by

\[\text{uv}^{\text{T}}=\left(\begin{array}{c}u_1\\u_2\\u_3\end{array}\right)\left(\begin{array}{ccc}v_1&v_2&v_3\end{array}\right)=\left(\begin{array}{ccc}u_1v_1&u_1v_2&u_1v_3 \\ u_2v_1&u_2v_2&u_2v_3\\u_3v_1&u_3v_2&u_3v_3\end{array}\right).\nonumber \]

    Notice that every column is a multiple of the single vector \(\text{u}\), and every row is a multiple of the single vector \(\text{v}^{\text{T}}\).
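
Both observations can be confirmed numerically (a NumPy sketch with arbitrary vectors; np.outer forms the matrix of products \(u_iv_j\)):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

M = np.outer(u, v)   # the 3-by-3 matrix with entries u_i * v_j

# Every column is a multiple of u, and every row is a multiple of v.
assert np.allclose(M[:, 0], v[0] * u)
assert np.allclose(M[1, :], u[1] * v)
```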

The transpose operation can also be used to make square matrices. If \(\text{A}\) is an \(m\)-by-\(n\) matrix, then \(\text{A}^{\text{T}}\) is \(n\)-by-\(m\) and \(\text{A}^{\text{T}}\text{A}\) is an \(n\)-by-\(n\) matrix. For example, using \(\eqref{eq:3}\), we find

\[\text{A}^{\text{T}}\text{A}=\left(\begin{array}{cc}a^2+b^2+c^2&ad+be+cf \\ ad+be+cf&d^2+e^2+f^2\end{array}\right).\nonumber \]

    Notice that \(\text{A}^{\text{T}}\text{A}\) is symmetric because

    \[(\text{A}^{\text{T}}\text{A})^{\text{T}}=\text{A}^{\text{T}}\text{A}.\nonumber \]

    The trace of a square matrix \(\text{A}\), denoted as \(\text{Tr A}\), is the sum of the diagonal elements of \(\text{A}\). So if \(\text{A}\) is an \(n\)-by-\(n\) matrix, then

    \[\text{Tr A}=\sum\limits_{i=1}^na_{ii}.\nonumber \]

    Example \(\PageIndex{1}\)

    Let \(\text{A}\) be an \(m\)-by-\(n\) matrix. Prove that \(\text{Tr}(\text{A}^{\text{T}}\text{A})\) is the sum of the squares of all the elements of \(\text{A}\).

    Solution

    Note that \(\text{A}^{\text{T}}\text{A}\) is an \(n\)-by-\(n\) matrix. We have

    \[\begin{aligned}\text{Tr}(\text{A}^{\text{T}}\text{A})&=\sum\limits_{i=1}^n(\text{A}^{\text{T}}\text{A})_{ii} \\ &=\sum\limits_{i=1}^n\sum\limits_{j=1}^ma_{ij}^{\text{T}}a_{ji} \\ &=\sum\limits_{i=1}^n\sum\limits_{j=1}^ma_{ji}a_{ji} \\ &=\sum\limits_{i=1}^m\sum\limits_{j=1}^na_{ij}^2.\end{aligned} \nonumber \]
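
The identity is easy to spot-check numerically (a NumPy sketch with an arbitrary matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))   # an arbitrary m-by-n example

# The trace of A^T A equals the sum of the squares of all entries of A.
assert np.isclose(np.trace(A.T @ A), np.sum(A ** 2))
```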

    Square matrices may also have inverses. Later, we will see that for a matrix to have an inverse its determinant, which we will define in general, must be nonzero. Here, if an \(n\)-by-\(n\) matrix \(\text{A}\) has an inverse, denoted as \(\text{A}^{−1}\), then

    \[\text{AA}^{-1}=\text{A}^{-1}\text{A}=\text{I}.\nonumber \]

If both of the \(n\)-by-\(n\) matrices \(\text{A}\) and \(\text{B}\) have inverses, we can ask for the inverse of their product. Observe that, from the definition of an inverse,

    \[(\text{AB})^{-1}(\text{AB})=\text{I}.\nonumber \]

We can first multiply on the right by \(\text{B}^{-1}\), and then by \(\text{A}^{-1}\), to obtain

    \[(\text{AB})^{-1}=\text{B}^{-1}\text{A}^{-1}.\nonumber \]

    Again in words, the inverse of the product of matrices is equal to the product of the inverses with the order of multiplication reversed. Be careful here: this rule applies only if both matrices in the product are invertible.
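
A numerical spot-check of this rule (a NumPy sketch; random square matrices are invertible with probability one, so the example is safe in practice):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

inv = np.linalg.inv
assert np.allclose(inv(A @ B), inv(B) @ inv(A))
```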

    Example \(\PageIndex{2}\)

    Assume that \(\text{A}\) is an invertible matrix. Prove that \((\text{A}^{−1})^{\text{T}} = (\text{A}^{\text{T}})^{−1}\). In words: the transpose of the inverse matrix is the inverse of the transpose matrix.

    Solution

    We know that

    \[\text{AA}^{−1} = \text{I}\quad\text{ and }\quad\text{A}^{−1}\text{A} = \text{I}.\nonumber \]

    Taking the transpose of these equations, and using \((\text{AB})^{\text{T}} = \text{B}^{\text{T}}\text{A}^{\text{T}}\) and \(\text{I}^{\text{T}} = \text{I}\), we obtain

    \[(\text{A}^{-1})^{\text{T}}\text{A}^{\text{T}}=\text{I}\quad\text{ and }\quad\text{A}^{\text{T}}(\text{A}^{-1})^{\text{T}}=\text{I}.\nonumber \]

    We can therefore conclude that \((\text{A}^{-1})^{\text{T}}=(\text{A}^{\text{T}})^{-1}\).
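
This result can likewise be spot-checked (a NumPy sketch with a small invertible example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # a small invertible example

# The transpose of the inverse is the inverse of the transpose.
assert np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))
```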

    It is illuminating to derive the inverse of a two-by-two matrix. To find the inverse of \(\text{A}\) given by

    \[\text{A}=\left(\begin{array}{cc}a&b\\c&d\end{array}\right),\nonumber \]

    the most direct approach would be to write

    \[\left(\begin{array}{cc}a&b\\c&d\end{array}\right)\left(\begin{array}{cc}x_1&x_2\\y_1&y_2\end{array}\right)=\left(\begin{array}{cc}1&0\\0&1\end{array}\right)\nonumber \]

    and solve for \(x_1,\: x_2,\: y_1,\) and \(y_2\). There are two inhomogeneous and two homogeneous equations given by

    \[\begin{array}{ll}ax_1+by_1=1,&cx_1+dy_1=0, \\ cx_2+dy_2=1,&ax_2+by_2=0.\end{array}\nonumber \]

    To solve, we can eliminate \(y_1\) and \(y_2\) using the two homogeneous equations, and then solve for \(x_1\) and \(x_2\) using the two inhomogeneous equations. Finally, we use the two homogeneous equations to solve for \(y_1\) and \(y_2\). The solution for \(\text{A}^{−1}\) is found to be

    \[\text{A}^{-1}=\frac{1}{ad-bc}\left(\begin{array}{rr}d&-b\\-c&a\end{array}\right).\label{eq:4} \]

    The factor in front of the matrix is the definition of the determinant for our two-by-two matrix \(\text{A}\):

    \[\det\text{ A}=\left|\begin{array}{cc}a&b\\c&d\end{array}\right|=ad-bc.\nonumber \]

    The determinant of a two-by-two matrix is the product of the diagonals minus the product of the off-diagonals. Evidently, \(\text{A}\) is invertible only if \(\det\text{ A}\neq 0\). Notice that the inverse of a two-by-two matrix, in words, is found by switching the diagonal elements of the matrix, negating the off-diagonal elements, and dividing by the determinant. It can be useful in a linear algebra course to remember this formula.
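
The formula translates directly into code; in the sketch below (assuming NumPy, and with inv2x2 as our own illustrative function name, not a library routine), the result is compared against NumPy's general-purpose inverse:

```python
import numpy as np

def inv2x2(A):
    """Invert a 2-by-2 matrix by the formula above: swap the diagonal
    entries, negate the off-diagonals, and divide by the determinant."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: det A = 0")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
assert np.allclose(inv2x2(A), np.linalg.inv(A))
assert np.allclose(A @ inv2x2(A), np.eye(2))
```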

