2.3: Elementary Matrices
( \newcommand{\kernel}{\mathrm{null}\,}\)
- Use multiplication by an elementary matrix to apply row operations.
- Write a matrix as a product of elementary matrices.
We now turn our attention to a special type of matrix called an elementary matrix. An elementary matrix is always a square matrix. Recall the row operations given in Definition 1.2.2. Any elementary matrix, which we often denote by E, is obtained from applying one row operation to the identity matrix of the same size.
For example, the matrix E=[0110] is the elementary matrix obtained from switching the two rows. The matrix E=[100030001] is the elementary matrix obtained from multiplying the second row of the 3×3 identity matrix by 3. The matrix E=[10−31] is the elementary matrix obtained from adding −3 times the first row to the third row.
You may construct an elementary matrix from any row operation, but remember that you can only apply one operation.
Consider the following definition.
Let E be an n×n matrix. Then E is an elementary matrix if it is the result of applying one row operation to the n×n identity matrix In.
Those which involve switching rows of the identity matrix are called permutation matrices.
Therefore, E constructed above by switching the two rows of I2 is called a permutation matrix.
Elementary matrices can be used in place of row operations and therefore are very useful. It turns out that multiplying (on the left hand side) by an elementary matrix E will have the same effect as doing the row operation used to obtain E.
The following theorem is an important result which we will use throughout this text.
To perform any of the three row operations on a matrix A it suffices to take the product EA, where E is the elementary matrix obtained by using the desired row operation on the identity matrix.
Therefore, instead of performing row operations on a matrix A, we can row reduce through matrix multiplication with the appropriate elementary matrix. We will examine this theorem in detail for each of the three row operations given in Definition 1.3.2.
First, consider the following lemma.
Let Pij denote the elementary matrix which involves switching the ith and the jth rows. Then Pij is a permutation matrix and PijA=B where B is obtained from A by switching the ith and the jth rows.
We will explore this idea more in the following example.
Let P12=[010100001],A=[abgdef]
Find B where B=P12A.
Solution
You can see that the matrix P12 is obtained by switching the first and second rows of the 3×3 identity matrix I.
Using our usual procedure, compute the product P12A=B. The result is given by
B=[gdabef]
Notice that B is the matrix obtained by switching rows 1 and 2 of A. Therefore by multiplying A by P12, the row operation which was applied to I to obtain P12 is applied to A to obtain B.
Theorem 2.3.1 applies to all three row operations, and we now look at the row operation of multiplying a row by a scalar. Consider the following lemma.
Let E(k,i) denote the elementary matrix corresponding to the row operation in which the ith row is multiplied by the nonzero scalar, k. Then
E(k,i)A=B
where B is obtained from A by multiplying the ith row of A by k.
We will explore this lemma further in the following example.
Let
E(5,2)=[100050001],A=[abcdef]
Find the matrix B where B=E(5,2)A
Solution
You can see that E(5,2) is obtained by multiplying the second row of the identity matrix by 5.
Using our usual procedure for multiplication of matrices, we can compute the product E(5,2)A. The resulting matrix is given by
B=[ab5c5def]
Notice that B is obtained by multiplying the second row of A by the scalar 5.
There is one last row operation to consider. The following lemma discusses the final operation of adding a multiple of a row to another row.
Let E(k×i+j) denote the elementary matrix obtained from I by adding k times the ith row to the jth. Then
E(k×i+j)A=B
where B is obtained from A by adding k times the ith row to the jth row of A.
Consider the following example.
Let
E(2×1+3)=[100010201],A=[abcdef]
Find B where B=E(2×1+3)A.
Solution
You can see that the matrix E(2×1+3) was obtained by adding 2 times the first row of I to the third row of I.
Using our usual procedure, we can compute the product E(2×1+3)A. The resulting matrix B is given by B=[abcd2a+e2b+f]
You can see that B is the matrix obtained by adding 2 times the first row of A to the third row.
Suppose we have applied a row operation to a matrix A. Consider the row operation required to return A to its original form, to undo the row operation. It turns out that this action is how we find the inverse of an elementary matrix E.
Consider the following theorem.
Every elementary matrix is invertible and its inverse is also an elementary matrix.
In fact, the inverse of an elementary matrix is constructed by doing the reverse row operation on I. E−1 will be obtained by performing the row operation which would carry E back to I.
- If E is obtained by switching rows i and j, then E−1 is also obtained by switching rows i and j.
- If E is obtained by multiplying row i by the scalar k, then E−1 is obtained by multiplying row i by the scalar 1k.
- If E is obtained by adding k times row i to row j, then E−1 is obtained by subtracting k times row i from row j.
Consider the following example.
Let E=[1002]
Find E−1.
Solution
Consider the elementary matrix E given by
E=[1002]
Here, E is obtained from the 2×2 identity matrix by multiplying the second row by 2. In order to carry E back to the identity, we need to multiply the second row of E by 12. Hence,
E−1 is given by E−1=[10012]
We can verify that EE−1=I. Take the product EE−1, given by
EE−1=[1002][10012]=[1001]
This equals I so we know that we have compute E−1 properly.
Suppose an m×n matrix A is row reduced to its reduced row-echelon form. By tracking each row operation completed, this row reduction can be completed through multiplication by elementary matrices.
Consider the following definition.
Let A be an m×n matrix and let B be the reduced row-echelon form of A. Then we can write B=UA where U is the product of all elementary matrices representing the row operations done to A to obtain B.
Consider the following example.
Let A=[011020]. Find B, the reduced row-echelon form of A and write it in the form B=UA.
Solution
To find B, row reduce A. For each step, we will record the appropriate elementary matrix. First, switch rows 1 and 2.
[011020]→[100120]
The resulting matrix is equivalent to finding the product of P12=[010100001] and A.
Next, add (−2) times row 1 to row 3.
[100120]→[100100]
This is equivalent to multiplying by the matrix E(−2×1+3)=[100010−201]. Notice that the resulting matrix is B, the required reduced row-echelon form of A.
We can then write
B=E(−2×1+2)(P12A)=(E(−2×1+2)P12)A=UA
It remains to find the matrix U.
U=E(−2×1+2)P12=[100010−201][010100001]=[0101000−21]
We can verify that B=UA holds for this matrix U: UA=[0101000−21][011020]=[100100]=B
While the process used in the above example is reliable and simple when only a few row operations are used, it becomes cumbersome in a case where many row operations are needed to carry A to B. The following theorem provides an alternate way to find the matrix U.
Let A be an m×n matrix and let B be its reduced row-echelon form. Then B=UA where U is an invertible m×m matrix found by forming the matrix [A|Im] and row reducing to [B|U].
Let’s revisit the above example using the process outlined in Theorem 2.3.3.
Let A=[011020]. Using the process outlined in Theorem 2.3.3, find U such that B=UA.
Solution
First, set up the matrix [A|Im]. [011001001020001] Now, row reduce this matrix until the left side equals the reduced row-echelon form of A.
[011001001020001]→[100100110020001]→[1001001100000−21]
The left side of this matrix is B, and the right side is U. Comparing this to the matrix U found above in Example 2.3.5, you can see that the same matrix is obtained regardless of which process is used.
Recall from Algorithm 2.2.1 that an n×n matrix A is invertible if and only if A can be carried to the n×n identity matrix using the usual row operations. This leads to an important consequence related to the above discussion.
Suppose A is an n×n invertible matrix. Then, set up the matrix [A|In] as done above, and row reduce until it is of the form [B|U]. In this case, B=In because A is invertible.
B=UAIn=UAU−1=A
Now suppose that U=E1E2⋯Ek where each Ei is an elementary matrix representing a row operation used to carry A to I. Then,
U−1=(E1E2⋯Ek)−1=E−1k⋯E−12E−11
Remember that if Ei is an elementary matrix, so too is E−1i. It follows that
A=U−1=E−1k⋯E−12E−11
and A can be written as a product of elementary matrices.
Let A be an n×n matrix. Then A is invertible if and only if it can be written as a product of elementary matrices.
Consider the following example.
Let A=[0101100−21]. Write A as a product of elementary matrices.
Solution
We will use the process outlined in Theorem 2.3.3 to write A as a product of elementary matrices. We will set up the matrix [A|I] and row reduce, recording each row operation as an elementary matrix.
First:
[0101001100100−21001]→[1100100101000−21001]
represented by the elementary matrix E1=[010100001].
Secondly:
[1100100101000−21001]→[100−1100101000−21001]
represented by the elementary matrix E2=[1−10010001].
Finally:
[100−1100101000−21001]→[100−110010100001201]
represented by the elementary matrix E3=[100010021].
Notice that the reduced row-echelon form of A is I. Hence I=UA where U is the product of the above elementary matrices. It follows that A=U−1. Since we want to write A as a product of elementary matrices, we wish to express U−1 as a product of elementary matrices.
U−1=(E3E2E1)−1=E−11E−12E−13=[010100001][110010001][1000100−21]=A
This gives A written as a product of elementary matrices. By Theorem 2.3.4 it follows that A is invertible.
More on Matrix Inverses
In this section, we will prove three theorems which will clarify the concept of matrix inverses. In order to do this, first recall some important properties of elementary matrices.
Recall that an elementary matrix is a square matrix obtained by performing an elementary operation on an identity matrix. Each elementary matrix is invertible, and its inverse is also an elementary matrix. If E is an m×m elementary matrix and A is an m×n matrix, then the product EA is the result of applying to A the same elementary row operation that was applied to the m×m identity matrix in order to obtain E.
Let R be the reduced row-echelon form of an m×n matrix A. R is obtained by iteratively applying a sequence of elementary row operations to A. Denote by E1,E2,⋯,Ek the elementary matrices associated with the elementary row operations which were applied, in order, to the matrix A to obtain the resulting R. We then have that R=(Ek⋯(E2(E1A)))=Ek⋯E2E1A. Let E denote the product matrix Ek⋯E2E1 so that we can write R=EA where E is an invertible matrix whose inverse is the product (E1)−1(E2)−1⋯(Ek)−1.
Now, we will consider some preliminary lemmas.
Suppose that A and B are matrices such that the product AB is an identity matrix. Then the reduced row-echelon form of A does not have a row of zeros.
- Proof
-
Let R be the reduced row-echelon form of A. Then R=EA for some invertible square matrix E as described above. By hypothesis AB=I where I is an identity matrix, so we have a chain of equalities R(BE−1)=(EA)(BE−1)=E(AB)E−1=EIE−1=EE−1=I If R would have a row of zeros, then so would the product R(BE−1). But since the identity matrix I does not have a row of zeros, neither can R have one.
We now consider a second important lemma.
Suppose that A and B are matrices such that the product AB is an identity matrix. Then A has at least as many columns as it has rows.
- Proof
-
Let R be the reduced row-echelon form of A. By Lemma 2.3.1, we know that R does not have a row of zeros, and therefore each row of R has a leading 1. Since each column of R contains at most one of these leading 1s, R must have at least as many columns as it has rows.
An important theorem follows from this lemma.
Only square matrices can be invertible.
- Proof
-
Suppose that A and B are matrices such that both products AB and BA are identity matrices. We will show that A and B must be square matrices of the same size. Let the matrix A have m rows and n columns, so that A is an m×n matrix. Since the product AB exists, B must have n rows, and since the product BA exists, B must have m columns so that B is an n×m matrix. To finish the proof, we need only verify that m=n.
We first apply Lemma 2.3.2 with A and B, to obtain the inequality m≤n. We then apply Lemma 2.3.2 again (switching the order of the matrices), to obtain the inequality n≤m. It follows that m=n, as we wanted.
Of course, not all square matrices are invertible. In particular, zero matrices are not invertible, along with many other square matrices.
The following proposition will be useful in proving the next theorem.
If R is the reduced row-echelon form of a square matrix, then either R has a row of zeros or R is an identity matrix.
The proof of this proposition is left as an exercise to the reader. We now consider the second important theorem of this section.
Suppose A and B are square matrices such that AB=I where I is an identity matrix. Then it follows that BA=I. Further, both A and B are invertible and B=A−1 and A=B−1.
- Proof
-
Let R be the reduced row-echelon form of a square matrix A. Then, R=EA where E is an invertible matrix. Since AB=I, Lemma 2.3.1 gives us that R does not have a row of zeros. By noting that R is a square matrix and applying Proposition 2.3.1, we see that R=I. Hence, EA=I.
Using both that EA=I and AB=I, we can finish the proof with a chain of equalities as given by BA=IBIA=(EA)B(E−1E)A=E(AB)E−1(EA)=EIE−1I=EE−1=I
It follows from the definition of the inverse of a matrix that B=A−1 and A=B−1.
This theorem is very useful, since with it we need only test one of the products AB or BA in order to check that B is the inverse of A. The hypothesis that A and B are square matrices is very important, and without this the theorem does not hold.
We will now consider an example.
Let A=[100100], Show that ATA=I but AAT≠0.
Solution
Consider the product ATA given by [100010][100100]=[1001] Therefore, ATA=I2, where I2 is the 2×2 identity matrix. However, the product AAT is [100100][100010]=[100010000] Hence AAT is not the 3×3 identity matrix. This shows that for Theorem 2.3.2, it is essential that both matrices be square and of the same size.
Is it possible to have matrices A and B such that AB=I, while BA=0? This question is left to the reader to answer, and you should take a moment to consider the answer.
We conclude this section with an important theorem.
For any matrix A the following conditions are equivalent:
- A is invertible
- The reduced row-echelon form of A is an identity matrix
- Proof
-
In order to prove this, we show that for any given matrix A, each condition implies the other. We first show that if A is invertible, then its reduced row-echelon form is an identity matrix, then we show that if the reduced row-echelon form of A is an identity matrix, then A is invertible.
If A is invertible, there is some matrix B such that AB=I. By Lemma 2.3.1, we get that the of A does not have a row of zeros. Then by Theorem 2.3.1, it follows that A and the reduced row-echelon form of A are square matrices. Finally, by Proposition 2.3.1, this reduced row-echelon form of A must be an identity matrix. This proves the first implication.
Now suppose the reduced row-echelon form of A is an identity matrix I. Then I=EA for some product E of elementary matrices. By Theorem 2.3.2, we can conclude that A is invertible.
Theorem 2.3.3 corresponds to Algorithm 2.2.1, which claims that A−1 is found by row reducing the augmented matrix [A|I] to the form [I|A−1]. This will be a matrix product E[A|I] where E is a product of elementary matrices. By the rules of matrix multiplication, we have that E[A|I]=[EA|EI]=[EA|E].
It follows that the reduced row-echelon form of [A|I] is [EA|E], where EA gives the reduced row-echelon form of A. By Theorem 2.3.3, if EA≠I, then A is not invertible, and if EA=I, A is invertible. If EA=I, then by Theorem 2.3.2, E=A−1. This proves that Algorithm 2.2.1 does in fact find A−1.