2.3: Elementary Matrices
- Use multiplication by an elementary matrix to apply row operations.
- Write a matrix as a product of elementary matrices.
We now turn our attention to a special type of matrix called an elementary matrix. An elementary matrix is always square. Recall the row operations given in Definition 1.3.2. Any elementary matrix, which we often denote by \(E\), is obtained by applying one row operation to the identity matrix of the same size.
For example, the matrix \[E = \left[ \begin{array}{rr} 0 & 1 \\ 1 & 0 \end{array} \right]\nonumber \] is the elementary matrix obtained by switching the two rows of the \(2 \times 2\) identity matrix. The matrix \[E = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{array} \right]\nonumber \] is the elementary matrix obtained by multiplying the second row of the \(3 \times 3\) identity matrix by \(3\). The matrix \[E = \left[ \begin{array}{rr} 1 & 0 \\ -3 & 1 \end{array} \right]\nonumber \] is the elementary matrix obtained by adding \(-3\) times the first row to the second row of the \(2 \times 2\) identity matrix.
You may construct an elementary matrix from any row operation, but remember that each elementary matrix encodes exactly one operation.
Consider the following definition.
Let \(E\) be an \(n \times n\) matrix. Then \(E\) is an elementary matrix if it is the result of applying one row operation to the \(n \times n\) identity matrix \(I_n\).
Those which involve switching rows of the identity matrix are called permutation matrices.
Therefore, the matrix \(E\) constructed above by switching the two rows of \(I_2\) is a permutation matrix.
Elementary matrices can be used in place of row operations and are therefore very useful. It turns out that multiplying a matrix on the left by an elementary matrix \(E\) has the same effect as performing the row operation used to obtain \(E\).
The following theorem is an important result which we will use throughout this text.
To perform any of the three row operations on a matrix \(A\), it suffices to take the product \(EA\), where \(E\) is the elementary matrix obtained by applying the desired row operation to the identity matrix.
Therefore, instead of performing row operations on a matrix \(A\), we can row reduce through matrix multiplication with the appropriate elementary matrix. We will examine this theorem in detail for each of the three row operations given in Definition 1.3.2.
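Before examining each operation, here is an informal numerical illustration of this theorem (a sketch of our own, assuming NumPy is available; it is not part of the original examples). Each elementary matrix below is built by applying one row operation to \(I_3\), and left multiplication by it performs that same operation on \(A\):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])

# Build each elementary matrix by applying one row operation to I_3.
P12 = np.eye(3)
P12[[0, 1]] = P12[[1, 0]]   # type 1: switch rows 1 and 2

E_scale = np.eye(3)
E_scale[1] *= 5.0           # type 2: multiply row 2 by 5

E_add = np.eye(3)
E_add[2, 0] = 2.0           # type 3: add 2 times row 1 to row 3

# Left multiplication by each E performs the same operation on A.
print(P12 @ A)      # rows 1 and 2 of A switched
print(E_scale @ A)  # row 2 of A multiplied by 5
print(E_add @ A)    # 2 times row 1 of A added to row 3
```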
First, consider the following lemma.
Let \(P^{ij}\) denote the elementary matrix which involves switching the \(i^{th}\) and the \(j^{th}\) rows. Then \(P^{ij}\) is a permutation matrix and \[P^{ij}A=B\nonumber \] where \(B\) is obtained from \(A\) by switching the \(i^{th}\) and the \(j^{th}\) rows.
We will explore this idea more in the following example.
Let \[P^{12} = \left[ \begin{array}{rrr} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right], A = \left[ \begin{array}{cc} a & b \\ g & d \\ e & f \end{array} \right] \nonumber \]
Find \(B\) where \(B = P^{12}A\).
Solution
You can see that the matrix \(P^{12}\) is obtained by switching the first and second rows of the \(3 \times 3\) identity matrix \(I\).
Using our usual procedure, compute the product \(P^{12}A = B\). The result is given by
\[B =\left[ \begin{array}{cc} g & d \\ a & b \\ e & f \end{array} \right] \nonumber\]
Notice that \(B\) is the matrix obtained by switching rows \(1\) and \(2\) of \(A\). Therefore, by multiplying \(A\) on the left by \(P^{12}\), the row operation which was applied to \(I\) to obtain \(P^{12}\) is applied to \(A\) to obtain \(B\).
Theorem \(\PageIndex{1}\) applies to all three row operations, and we now look at the row operation of multiplying a row by a scalar. Consider the following lemma.
Let \(E\left( k,i\right)\) denote the elementary matrix corresponding to the row operation in which the \(i^{th}\) row is multiplied by the nonzero scalar \(k\). Then
\[E\left( k,i\right) A=B \nonumber\]
where \(B\) is obtained from \(A\) by multiplying the \(i^{th}\) row of \(A\) by \(k\).
We will explore this lemma further in the following example.
Let
\[E \left(5, 2\right) = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{array} \right], A = \left[ \begin{array}{cc} a & b \\ c & d \\ e & f \end{array} \right] \nonumber\]
Find the matrix \(B\) where \(B = E \left(5, 2\right)A\).
Solution
You can see that \(E \left(5, 2\right)\) is obtained by multiplying the second row of the identity matrix by \(5\).
Using our usual procedure for multiplication of matrices, we can compute the product \(E \left(5, 2\right)A\). The resulting matrix is given by
\[B =\left[ \begin{array}{cc} a & b \\ 5c & 5d \\ e & f \end{array} \right] \nonumber\]
Notice that \(B\) is obtained by multiplying the second row of \(A\) by the scalar \(5\).
There is one last row operation to consider. The following lemma discusses the final operation of adding a multiple of a row to another row.
Let \(E\left( k \times i+j\right)\) denote the elementary matrix obtained from \(I\) by adding \(k\) times the \(i^{th}\) row to the \(j^{th}\) row. Then
\[E\left( k \times i+j\right) A=B\nonumber \]
where \(B\) is obtained from \(A\) by adding \(k\) times the \(i^{th}\) row to the \(j^{th}\) row of \(A\).
Consider the following example.
Let
\[E\left( 2 \times 1+3\right) = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{array} \right], A = \left[ \begin{array}{cc} a & b \\ c & d \\ e & f \end{array} \right] \nonumber\]
Find \(B\) where \(B = E\left( 2 \times 1+3\right)A\).
Solution
You can see that the matrix \(E\left( 2 \times 1+3\right)\) was obtained by adding \(2\) times the first row of \(I\) to the third row of \(I\).
Using our usual procedure, we can compute the product \(E\left( 2 \times 1+3\right)A\). The resulting matrix \(B\) is given by \[B = \left[ \begin{array}{cc} a & b \\ c & d \\ 2a+e & 2b+f \end{array} \right] \nonumber\]
You can see that \(B\) is the matrix obtained by adding \(2\) times the first row of \(A\) to the third row.
Suppose we have applied a row operation to a matrix \(A\). Consider the row operation required to return \(A\) to its original form, that is, to undo the row operation. It turns out that this reverse operation is how we find the inverse of an elementary matrix \(E\).
Consider the following theorem.
Every elementary matrix is invertible and its inverse is also an elementary matrix.
In fact, the inverse of an elementary matrix is constructed by performing the reverse row operation on \(I\): \(E^{-1}\) is obtained by performing the row operation which carries \(E\) back to \(I\).
- If \(E\) is obtained by switching rows \(i\) and \(j\), then \(E^{-1}\) is also obtained by switching rows \(i\) and \(j\).
- If \(E\) is obtained by multiplying row \(i\) by the scalar \(k\), then \(E^{-1}\) is obtained by multiplying row \(i\) by the scalar \(\frac{1}{k}\).
- If \(E\) is obtained by adding \(k\) times row \(i\) to row \(j\), then \(E^{-1}\) is obtained by subtracting \(k\) times row \(i\) from row \(j\).
Consider the following example.
Let \[E = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 2 \end{array} \right]\nonumber \]
Find \(E^{-1}\).
Solution
Consider the elementary matrix \(E\) given by
\[E = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 2 \end{array} \right]\nonumber \]
Here, \(E\) is obtained from the \(2 \times 2\) identity matrix by multiplying the second row by \(2\). In order to carry \(E\) back to the identity, we need to multiply the second row of \(E\) by \(\frac{1}{2}\). Hence,
\(E^{-1}\) is given by \[E^{-1} = \left[ \begin{array}{rr} 1 & 0 \\ 0 & \frac{1}{2} \end{array} \right] \nonumber\]
We can verify that \(EE^{-1}=I\). Take the product \(EE^{-1}\), given by
\[EE^{-1} = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 2 \end{array} \right] \left[ \begin{array}{rr} 1 & 0 \\ 0 & \frac{1}{2} \end{array} \right] = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array} \right] \nonumber\]
This equals \(I\), so we know that we have computed \(E^{-1}\) properly.
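The same verification can be done numerically. The following is a minimal sketch of our own (assuming NumPy), confirming that the reverse row operation produces the inverse:

```python
import numpy as np

E = np.array([[1., 0.],
              [0., 2.]])       # multiply row 2 of I_2 by 2

E_inv = np.array([[1., 0.],
                  [0., 0.5]])  # reverse operation: multiply row 2 by 1/2

print(E @ E_inv)                             # the 2 x 2 identity
print(np.allclose(np.linalg.inv(E), E_inv))  # True
```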
Suppose an \(m \times n\) matrix \(A\) is row reduced to its reduced row-echelon form. By tracking each row operation used, this row reduction can be expressed as multiplication by elementary matrices.
Consider the following definition.
Let \(A\) be an \(m \times n\) matrix and let \(B\) be the reduced row-echelon form of \(A\). Then we can write \(B = UA\) where \(U\) is the product of all elementary matrices representing the row operations done to \(A\) to obtain \(B\).
Consider the following example.
Let \(A = \left[ \begin{array}{rr} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{array} \right]\). Find \(B\), the reduced row-echelon form of \(A\), and write it in the form \(B=UA\).
Solution
To find \(B\), row reduce \(A\). For each step, we will record the appropriate elementary matrix. First, switch rows \(1\) and \(2\).
\[\left[ \begin{array}{rr} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{array} \right] \rightarrow \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{array} \right]\nonumber \]
This row operation is equivalent to multiplying \(A\) on the left by \(P^{12} =\left[ \begin{array}{rrr} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right]\).
Next, add \((-2)\) times row \(1\) to row \(3\).
\[\left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{array} \right] \rightarrow \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{array} \right]\nonumber \]
This is equivalent to multiplying on the left by the matrix \(E(-2 \times 1 + 3) = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right]\). Notice that the resulting matrix is \(B\), the required reduced row-echelon form of \(A\).
We can then write
\[\begin{aligned} B &= E(-2 \times 1 + 3) \left( P^{12} A \right) \\ &= \left( E(-2 \times 1 + 3) P^{12} \right) A \\ &= U A\end{aligned}\]
It remains to find the matrix \(U\).
\[\begin{aligned} U &= E(-2 \times 1 + 3) P^{12} \\ &= \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right] \left[ \begin{array}{rrr} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right] \\ &= \left[ \begin{array}{rrr} 0 & 1 & 0\\ 1 & 0 & 0 \\ 0 & -2 & 1 \end{array} \right]\end{aligned}\]
We can verify that \(B = UA\) holds for this matrix \(U\): \[\begin{aligned} UA &= \left[ \begin{array}{rrr} 0 & 1 & 0\\ 1 & 0 & 0 \\ 0 & -2 & 1 \end{array} \right] \left[ \begin{array}{rr} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{array} \right] \\ &= \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{array} \right] \\ &= B \end{aligned}\]
While the process used in the above example is reliable and simple when only a few row operations are used, it becomes cumbersome in a case where many row operations are needed to carry \(A\) to \(B\). The following theorem provides an alternate way to find the matrix \(U\).
Let \(A\) be an \(m \times n\) matrix and let \(B\) be its reduced row-echelon form. Then \(B = UA\) where \(U\) is an invertible \(m \times m\) matrix found by forming the matrix \(\left[ A | I_m \right]\) and row reducing to \(\left[ B | U \right]\).
Let’s revisit the above example using the process outlined in Theorem \(\PageIndex{3}\).
Let \(A = \left[ \begin{array}{rr} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{array}\right]\). Using the process outlined in Theorem \(\PageIndex{3}\), find \(U\) such that \(B=UA\).
Solution
First, set up the matrix \(\left[ A | I_m \right]\). \[\left[ \begin{array}{rr|rrr} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 2 & 0 & 0 & 0 & 1 \end{array}\right]\nonumber \] Now, row reduce this matrix until the left side equals the reduced row-echelon form of \(A\).
\[\begin{aligned} \left[ \begin{array}{rr|rrr} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 2 & 0 & 0 & 0 & 1 \end{array}\right] &\rightarrow \left[ \begin{array}{rr|rrr} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 2 & 0 & 0 & 0 & 1 \end{array}\right] \\ &\rightarrow \left[ \begin{array}{rr|rrr} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & -2 & 1 \end{array}\right]\end{aligned}\]
The left side of this matrix is \(B\), and the right side is \(U\). Comparing this to the matrix \(U\) found above in Example \(\PageIndex{5}\), you can see that the same matrix is obtained regardless of which process is used.
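The computation in Theorem \(\PageIndex{3}\) is easy to mirror numerically. In this sketch of our own (assuming NumPy), we apply the two row operations from the example directly to the augmented matrix \(\left[ A | I_3 \right]\) and read \(U\) off the right-hand block:

```python
import numpy as np

A = np.array([[0., 1.],
              [1., 0.],
              [2., 0.]])
M = np.hstack([A, np.eye(3)])  # the augmented matrix [A | I_3]

M[[0, 1]] = M[[1, 0]]          # switch rows 1 and 2
M[2] += -2.0 * M[0]            # add -2 times row 1 to row 3

B, U = M[:, :2], M[:, 2:]
print(U)                       # [[0,1,0],[1,0,0],[0,-2,1]], as found above
print(np.allclose(U @ A, B))   # True
```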
Recall from Algorithm 2.2.1 that an \(n \times n\) matrix \(A\) is invertible if and only if \(A\) can be carried to the \(n \times n\) identity matrix using the usual row operations. This leads to an important consequence related to the above discussion.
Suppose \(A\) is an \(n \times n\) invertible matrix. Then, set up the matrix \(\left[ A | I_n \right]\) as done above, and row reduce until it is of the form \(\left[ B | U \right]\). In this case, \(B = I_n\) because \(A\) is invertible.
\[\begin{aligned} B &= UA \\ I_n &= UA\end{aligned}\]
Since \(U\) is invertible (being a product of elementary matrices), multiplying both sides of \(I_n = UA\) on the left by \(U^{-1}\) gives \[U^{-1} = A \nonumber\]
Now suppose that \(U = E_1 E_2 \cdots E_k\) where each \(E_i\) is an elementary matrix representing a row operation used to carry \(A\) to \(I\). Then,
\[U^{-1} = \left( E_1 E_2 \cdots E_k \right) ^{-1} = E_k^{-1} \cdots E_2^{-1} E_1^{-1}\nonumber \]
Remember that if \(E_i\) is an elementary matrix, so too is \(E_i^{-1}\). It follows that
\[\begin{aligned} A&= U^{-1} \\ &= E_k^{-1} \cdots E_2^{-1} E_1^{-1}\end{aligned}\]
and \(A\) can be written as a product of elementary matrices.
Let \(A\) be an \(n \times n\) matrix. Then \(A\) is invertible if and only if it can be written as a product of elementary matrices.
Consider the following example.
Let \(A = \left[ \begin{array}{rrr} 0 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & -2 & 1 \end{array} \right]\). Write \(A\) as a product of elementary matrices.
Solution
We will use the process outlined in Theorem \(\PageIndex{3}\) to write \(A\) as a product of elementary matrices. We will set up the matrix \(\left[ A | I \right]\) and row reduce, recording each row operation as an elementary matrix.
First:
\[\left[ \begin{array}{rrr|rrr} 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right] \rightarrow \left[ \begin{array}{rrr|rrr} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right] \nonumber\]
represented by the elementary matrix \(E_1= \left[ \begin{array}{rrr} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right]\).
Secondly:
\[\left[ \begin{array}{rrr|rrr} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right] \rightarrow \left[ \begin{array}{rrr|rrr} 1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right] \nonumber\]
represented by the elementary matrix \(E_2 = \left[ \begin{array}{rrr} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]\).
Finally:
\[\left[ \begin{array}{rrr|rrr} 1 & 0& 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right] \rightarrow \left[ \begin{array}{rrr|rrr} 1 & 0 & 0 &-1 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array} \right] \nonumber\]
represented by the elementary matrix \(E_3= \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right]\).
Notice that the reduced row-echelon form of \(A\) is \(I\). Hence \(I = UA\) where \(U\) is the product of the above elementary matrices. It follows that \(A = U^{-1}\). Since we want to write \(A\) as a product of elementary matrices, we wish to express \(U^{-1}\) as a product of elementary matrices.
\[\begin{aligned} U^{-1} &= \left( E_3 E_2 E_1 \right)^{-1}\\ &= E_1^{-1} E_2^{-1} E_3^{-1} \\ &= \left[ \begin{array}{rrr} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{rrr} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{array} \right] \\ &= A\end{aligned}\]
This gives \(A\) written as a product of elementary matrices. By Theorem \(\PageIndex{4}\) it follows that \(A\) is invertible.
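As a quick numerical check (a sketch of our own, assuming NumPy), the product of these three inverse elementary matrices does reproduce \(A\):

```python
import numpy as np

E1_inv = np.array([[0., 1., 0.],
                   [1., 0., 0.],
                   [0., 0., 1.]])   # switch rows 1 and 2
E2_inv = np.array([[1., 1., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])   # add row 2 to row 1
E3_inv = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., -2., 1.]])  # add -2 times row 2 to row 3

A = np.array([[0., 1., 0.],
              [1., 1., 0.],
              [0., -2., 1.]])

print(np.allclose(E1_inv @ E2_inv @ E3_inv, A))  # True
```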
More on Matrix Inverses
In this section, we will prove three theorems which will clarify the concept of matrix inverses. In order to do this, first recall some important properties of elementary matrices.
Recall that an elementary matrix is a square matrix obtained by performing an elementary operation on an identity matrix. Each elementary matrix is invertible, and its inverse is also an elementary matrix. If \(E\) is an \(m \times m\) elementary matrix and \(A\) is an \(m \times n\) matrix, then the product \(EA\) is the result of applying to \(A\) the same elementary row operation that was applied to the \(m \times m\) identity matrix in order to obtain \(E\).
Let \(R\) be the reduced row-echelon form of an \(m \times n\) matrix \(A\). \(R\) is obtained by iteratively applying a sequence of elementary row operations to \(A\). Denote by \(E_1, E_2, \cdots, E_k\) the elementary matrices associated with the elementary row operations which were applied, in order, to the matrix \(A\) to obtain the resulting \(R\). We then have that \(R = \left( E_k \cdots \left( E_2 \left( E_1A \right) \right)\right) = E_k \cdots E_2E_1A\). Let \(E\) denote the product matrix \(E_k \cdots E_2E_1\) so that we can write \(R=EA\) where \(E\) is an invertible matrix whose inverse is the product \((E_1)^{-1}(E_2)^{-1} \cdots (E_k)^{-1}\).
Now, we will consider some preliminary lemmas.
Suppose that \(A\) and \(B\) are matrices such that the product \(AB\) is an identity matrix. Then the reduced row-echelon form of \(A\) does not have a row of zeros.
- Proof
-
Let \(R\) be the reduced row-echelon form of \(A\). Then \(R=EA\) for some invertible square matrix \(E\) as described above. By hypothesis \(AB=I\) where \(I\) is an identity matrix, so we have the chain of equalities \[R(BE^{-1}) = (EA)(BE^{-1}) = E(AB)E^{-1} = EIE^{-1} = EE^{-1} = I\nonumber \] If \(R\) had a row of zeros, then so would the product \(R(BE^{-1})\). But since the identity matrix \(I\) does not have a row of zeros, neither can \(R\).
We now consider a second important lemma.
Suppose that \(A\) and \(B\) are matrices such that the product \(AB\) is an identity matrix. Then \(A\) has at least as many columns as it has rows.
- Proof
-
Let \(R\) be the reduced row-echelon form of \(A\). By Lemma \(\PageIndex{1}\), we know that \(R\) does not have a row of zeros, and therefore each row of \(R\) has a leading \(1\). Since each column of \(R\) contains at most one of these leading \(1\)s, \(R\) must have at least as many columns as it has rows.
An important theorem follows from this lemma.
Only square matrices can be invertible.
- Proof
-
Suppose that \(A\) and \(B\) are matrices such that both products \(AB\) and \(BA\) are identity matrices. We will show that \(A\) and \(B\) must be square matrices of the same size. Let the matrix \(A\) have \(m\) rows and \(n\) columns, so that \(A\) is an \(m \times n\) matrix. Since the product \(AB\) exists, \(B\) must have \(n\) rows, and since the product \(BA\) exists, \(B\) must have \(m\) columns so that \(B\) is an \(n \times m\) matrix. To finish the proof, we need only verify that \(m=n\).
We first apply Lemma \(\PageIndex{2}\) with \(A\) and \(B\), to obtain the inequality \(m \leq n\). We then apply Lemma \(\PageIndex{2}\) again (switching the order of the matrices), to obtain the inequality \(n \leq m\). It follows that \(m=n\), as we wanted.
Of course, not all square matrices are invertible. In particular, zero matrices are not invertible, along with many other square matrices.
The following proposition will be useful in proving the next theorem.
If \(R\) is the reduced row-echelon form of a square matrix, then either \(R\) has a row of zeros or \(R\) is an identity matrix.
The proof of this proposition is left as an exercise to the reader. We now consider the second important theorem of this section.
Suppose \(A\) and \(B\) are square matrices such that \(AB=I\) where \(I\) is an identity matrix. Then it follows that \(BA=I\). Further, both \(A\) and \(B\) are invertible and \(B=A^{-1}\) and \(A=B^{-1}\).
- Proof
-
Let \(R\) be the reduced row-echelon form of a square matrix \(A\). Then, \(R=EA\) where \(E\) is an invertible matrix. Since \(AB=I\), Lemma \(\PageIndex{1}\) gives us that \(R\) does not have a row of zeros. By noting that \(R\) is a square matrix and applying Proposition \(\PageIndex{1}\), we see that \(R=I\). Hence, \(EA=I\).
Using both that \(EA=I\) and \(AB=I\), we can finish the proof with a chain of equalities as given by \[\begin{aligned} BA = IBIA &= (EA)B(E^{-1}E)A \\ &= E(AB)E^{-1}(EA) \\ &= EIE^{-1}I \\ &= EE^{-1} = I\end{aligned}\]
It follows from the definition of the inverse of a matrix that \(B=A^{-1}\) and \(A=B^{-1}\).
This theorem is very useful, since with it we need only test one of the products \(AB\) or \(BA\) in order to check that \(B\) is the inverse of \(A\). The hypothesis that \(A\) and \(B\) are square matrices is very important, and without this the theorem does not hold.
We will now consider an example.
Let \[A = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{array} \right]\nonumber \] Show that \(A^{T}A = I\) but \(AA^{T} \neq I\).
Solution
Consider the product \(A^{T}A\) given by \[\left[ \begin{array}{rrr} 1 & 0 & 0\\ 0 & 1 & 0 \end{array} \right] \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{array} \right] = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array} \right]\] Therefore, \(A^{T}A = I_2\), where \(I_2\) is the \(2 \times 2\) identity matrix. However, the product \(AA^{T}\) is \[\left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{rrr} 1 & 0 & 0\\ 0 & 1 & 0 \end{array} \right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{array} \right]\] Hence \(AA^{T}\) is not the \(3 \times 3\) identity matrix. This shows that for Theorem \(\PageIndex{2}\), it is essential that both matrices be square and of the same size.
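Both products are easy to reproduce in a minimal sketch of our own (assuming NumPy):

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])

print(A.T @ A)  # the 2 x 2 identity: A has a left inverse
print(A @ A.T)  # [[1,0,0],[0,1,0],[0,0,0]]: not the 3 x 3 identity
```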
Is it possible to have matrices \(A\) and \(B\) such that \(AB=I\) while \(BA=0\)? This question is left to the reader; take a moment to consider why or why not.
We conclude this section with an important theorem.
For any matrix \(A\) the following conditions are equivalent:
- \(A\) is invertible
- The reduced row-echelon form of \(A\) is an identity matrix
- Proof
-
In order to prove this, we show that for any given matrix \(A\), each condition implies the other. We first show that if \(A\) is invertible, then its reduced row-echelon form is an identity matrix, then we show that if the reduced row-echelon form of \(A\) is an identity matrix, then \(A\) is invertible.
If \(A\) is invertible, there is some matrix \(B\) such that \(AB = I\). By Lemma \(\PageIndex{1}\), we get that the reduced row-echelon form of \(A\) does not have a row of zeros. Then by Theorem \(\PageIndex{1}\), it follows that \(A\) and the reduced row-echelon form of \(A\) are square matrices. Finally, by Proposition \(\PageIndex{1}\), this reduced row-echelon form of \(A\) must be an identity matrix. This proves the first implication.
Now suppose the reduced row-echelon form of \(A\) is an identity matrix \(I\). Then \(I=EA\) for some product \(E\) of elementary matrices. By Theorem \(\PageIndex{2}\), we can conclude that \(A\) is invertible.
Theorem \(\PageIndex{3}\) corresponds to Algorithm 2.2.1, which claims that \(A^{-1}\) is found by row reducing the augmented matrix \(\left[ A|I\right]\) to the form \(\left[ I|A^{-1}\right]\). This will be a matrix product \(E\left[ A|I\right]\) where \(E\) is a product of elementary matrices. By the rules of matrix multiplication, we have that \(E\left[ A|I\right] = \left[ EA|EI\right] = \left[ EA|E\right]\).
It follows that the reduced row-echelon form of \(\left[ A|I\right]\) is \(\left[ EA|E\right]\), where \(EA\) gives the reduced row-echelon form of \(A\). By Theorem \(\PageIndex{3}\), if \(EA \neq I\), then \(A\) is not invertible, and if \(EA=I\), \(A\) is invertible. If \(EA=I\), then by Theorem \(\PageIndex{2}\), \(E=A^{-1}\). This proves that Algorithm 2.2.1 does in fact find \(A^{-1}\).
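To close, here is a hedged sketch of Algorithm 2.2.1 as described above (our own illustration, assuming NumPy; the function name inverse_via_row_reduction is ours): form \(\left[ A|I\right]\), row reduce using only the three elementary row operations, and read \(A^{-1}\) off the right-hand block. The sketch assumes \(A\) is invertible and does no error handling for singular input.

```python
import numpy as np

def inverse_via_row_reduction(A):
    """Row reduce [A | I] to [I | A^-1] using only elementary row operations.

    A sketch assuming A is square and invertible (no handling of
    singular or badly scaled input).
    """
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])  # the augmented matrix [A | I]
    for col in range(n):
        # Type 1: switch in the row with the largest pivot candidate.
        pivot = col + np.argmax(np.abs(M[col:, col]))
        M[[col, pivot]] = M[[pivot, col]]
        # Type 2: scale the pivot row so the leading entry is 1.
        M[col] /= M[col, col]
        # Type 3: add multiples of the pivot row to clear the rest of the column.
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]
    return M[:, n:]  # the right-hand block is now A^-1

A = np.array([[0., 1., 0.],
              [1., 1., 0.],
              [0., -2., 1.]])
A_inv = inverse_via_row_reduction(A)
print(np.allclose(A_inv @ A, np.eye(3)))  # True
```

Each step uses exactly one of the three row operations, so the whole reduction is a product of elementary matrices applied on the left, in agreement with the discussion above.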