12.3: Solving linear systems by factoring the coefficient matrix
There are many ways in which you might try to solve a given system of linear equations. This section is primarily devoted to describing two particularly popular techniques, both of which involve factoring the coefficient matrix for the system into a product of simpler matrices. These techniques are also at the heart of many frequently used numerical (i.e., computer-assisted) applications of Linear Algebra.
You should note that the factorization of complicated objects into simpler components is an extremely common problem-solving technique in mathematics. E.g., we will often factor a given polynomial into several polynomials of lower degree, and one can similarly use the prime factorization of an integer in order to simplify certain numerical computations.
A.3.1 Factorizing matrices using Gaussian elimination
In this section, we discuss a particularly significant factorization for matrices known as Gaussian elimination (a.k.a. Gauss-Jordan elimination). Gaussian elimination can be used to express any matrix as a product involving one matrix in so-called reduced row-echelon form and one or more so-called elementary matrices. Then, once such a factorization has been found, we can immediately solve any linear system that has the factorized matrix as its coefficient matrix. Moreover, the underlying technique for arriving at such a factorization is essentially an extension of the techniques already familiar to you for solving small systems of linear equations by hand.
Let m,n∈Z+ denote positive integers, and suppose that A∈Fm×n is an m×n matrix over F. Then, following Section A.2.2, we will make extensive use of A(i,⋅) and A(⋅,j) to
denote the row vectors and column vectors of A, respectively.
Definition A.3.1. Let A∈Fm×n be an m×n matrix over F. Then we say that A is in
row-echelon form (abbreviated REF) if the rows of A satisfy the following conditions:
1. Either A(1,⋅) is the zero vector, or the first non-zero entry in A(1,⋅) (when read from left to right) is a one.
2. For i=1,…,m, if any row vector A(i,⋅) is the zero vector, then each subsequent row vector A(i+1,⋅),…,A(m,⋅) is also the zero vector.
3. For i=2,…,m, if some A(i,⋅) is not the zero vector, then the first non-zero entry (when read from left to right) is a one and occurs to the right of the initial one in A(i−1,⋅).
The initial leading one in each non-zero row is called a pivot. We furthermore say that A is in reduced row-echelon form (abbreviated RREF) if
4. for each column vector A(⋅,j) containing a pivot (j=2,…,n), the pivot is the only non-zero element in A(⋅,j).
The motivation behind Definition A.3.1 is that matrix equations having their coefficient matrix in RREF (and, in some sense, also REF) are particularly easy to solve. Note, in particular, that the only square matrix in RREF without zero rows is the identity matrix.
Example A.3.2. The following matrices are all in REF:
\[
A_1=\begin{bmatrix}1&1&1&1\\0&1&1&1\\0&0&0&1\end{bmatrix},\quad
A_2=\begin{bmatrix}1&1&1&0\\0&1&1&0\\0&0&0&0\end{bmatrix},\quad
A_3=\begin{bmatrix}1&1&0&1\\0&1&1&0\\0&0&0&1\end{bmatrix},\quad
A_4=\begin{bmatrix}1&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix},
\]
\[
A_5=\begin{bmatrix}1&0&1&0\\0&0&0&1\\0&0&0&0\end{bmatrix},\quad
A_6=\begin{bmatrix}0&0&1&0\\0&0&0&1\\0&0&0&0\end{bmatrix},\quad
A_7=\begin{bmatrix}0&0&0&1\\0&0&0&0\\0&0&0&0\end{bmatrix},\quad
A_8=\begin{bmatrix}0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}.
\]
However, only A4 through A8 are in RREF, as you should verify. Moreover, if we take the transpose of each of these matrices (as defined in Section A.5.1), then only A6^T, A7^T, and A8^T are in RREF.
Example A.3.3.
1. Consider the following matrix in RREF:
\[
A=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}.
\]
Given any vector b=(bi)∈F4, the matrix equation Ax=b corresponds to the system of equations
\[
\left.\begin{aligned}
x_1 &= b_1\\
x_2 &= b_2\\
x_3 &= b_3\\
x_4 &= b_4
\end{aligned}\right\}.
\]
Since A is in RREF (in fact, A=I4 is the 4×4 identity matrix), we can immediately conclude that the matrix equation Ax=b has the solution x=b for any choice of b.
Moreover, as we will see in Section A.4.2, x=b is the only solution to this system.
2. Consider the following matrix in RREF:
\[
A=\begin{bmatrix}
1&6&0&0&4&-2\\
0&0&1&0&3&1\\
0&0&0&1&5&2\\
0&0&0&0&0&0
\end{bmatrix}.
\]
Given any vector b=(bi)∈F4, the matrix equation Ax=b corresponds to the system
\[
\left.\begin{aligned}
x_1+6x_2+4x_5-2x_6 &= b_1\\
x_3+3x_5+x_6 &= b_2\\
x_4+5x_5+2x_6 &= b_3\\
0 &= b_4
\end{aligned}\right\}.
\]
Since A is in RREF, we can immediately conclude a number of facts about solutions to this system. First of all, solutions exist if and only if b4=0. Moreover, by “solving for the pivots”, we see that the system reduces to
\[
\left.\begin{aligned}
x_1 &= b_1-6x_2-4x_5+2x_6\\
x_3 &= b_2-3x_5-x_6\\
x_4 &= b_3-5x_5-2x_6
\end{aligned}\right\},
\]
and so there is only enough information to specify values for x1,x3, and x4 in terms of the otherwise arbitrary values for x2,x5, and x6.
In this context, x1,x3, and x4 are called leading variables since these are the variables corresponding to the pivots in A. We similarly call x2,x5, and x6 free variables since the leading variables have been expressed in terms of these remaining variables. In particular, given any scalars α,β,γ∈F, it follows that the vector
\[
x=\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix}
=\begin{bmatrix}b_1-6\alpha-4\beta+2\gamma\\ \alpha\\ b_2-3\beta-\gamma\\ b_3-5\beta-2\gamma\\ \beta\\ \gamma\end{bmatrix}
=\begin{bmatrix}b_1\\0\\b_2\\b_3\\0\\0\end{bmatrix}
+\begin{bmatrix}-6\alpha\\ \alpha\\0\\0\\0\\0\end{bmatrix}
+\begin{bmatrix}-4\beta\\0\\-3\beta\\-5\beta\\ \beta\\0\end{bmatrix}
+\begin{bmatrix}2\gamma\\0\\-\gamma\\-2\gamma\\0\\ \gamma\end{bmatrix}
\]
must satisfy the matrix equation Ax=b. One can also verify that every solution to the matrix equation must be of this form. It then follows that the set of all solutions should somehow be "three dimensional".
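Reading the leading and free variables off of an RREF matrix is an entirely mechanical process, so it can be instructive to see it written out algorithmically. The following short Python sketch (the function and variable names are our own, chosen purely for illustration) recovers the pivot columns and free columns of the RREF matrix from part 2 above.

```python
# A minimal sketch: given a matrix already assumed to be in RREF, report which
# columns hold pivots (leading variables) and which columns are free.

def pivot_and_free_columns(rref_rows):
    """rref_rows: list of rows of a matrix assumed to be in RREF."""
    num_cols = len(rref_rows[0])
    pivot_cols = []
    for row in rref_rows:
        for j, entry in enumerate(row):
            if entry != 0:            # first non-zero entry of a non-zero row
                pivot_cols.append(j)  # ...is a pivot by the RREF conditions
                break
    free_cols = [j for j in range(num_cols) if j not in pivot_cols]
    return pivot_cols, free_cols

# The RREF matrix from Example A.3.3(2).
A = [[1, 6, 0, 0, 4, -2],
     [0, 0, 1, 0, 3,  1],
     [0, 0, 0, 1, 5,  2],
     [0, 0, 0, 0, 0,  0]]

pivots, free = pivot_and_free_columns(A)
print("leading variables:", [f"x{j+1}" for j in pivots])  # x1, x3, x4
print("free variables:   ", [f"x{j+1}" for j in free])    # x2, x5, x6
```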
As the above examples illustrate, a matrix equation having its coefficient matrix in RREF corresponds to a system of equations that can be solved with only a small amount of computation. Somewhat amazingly, any matrix can be factored into a product that involves exactly one matrix in RREF and one or more of the matrices defined as follows.
Definition A.3.4. A square matrix E∈Fm×m is called an elementary matrix if it has
one of the following forms:
1. (row exchange, a.k.a. "row swap", matrix) E is obtained from the identity matrix Im by interchanging the row vectors Im(r,⋅) and Im(s,⋅) for some particular choice of positive integers r,s∈{1,2,...,m}. I.e., in the case that r<s,
\[
E = I_m - E_{rr} - E_{ss} + E_{rs} + E_{sr},
\]
where Ers denotes the matrix having "r,s entry" equal to one and all other entries equal to zero (cf. Section A.2.1).
2. (row scaling matrix) E is obtained from the identity matrix Im by replacing the row vector Im(r,⋅) with αIm(r,⋅) for some choice of non-zero scalar 0≠α∈F and some choice of positive integer r∈{1,2,...,m}. I.e.,
\[
E = I_m + (\alpha - 1)E_{rr},
\]
where Err is the matrix having "r,r entry" equal to one and all other entries equal to zero. (Recall that Err was defined in Section A.2.1 as a standard basis vector for the vector space Fm×m.)
3. (row combination, a.k.a. "row sum", matrix) E is obtained from the identity matrix Im by replacing the row vector Im(r,⋅) with Im(r,⋅)+αIm(s,⋅) for some choice of scalar α∈F and some choice of positive integers r,s∈{1,2,…,m}. I.e., in the case that r<s,
\[
E = I_m + \alpha E_{rs},
\]
where Ers is the matrix having "r,s entry" equal to one and all other entries equal to zero. (Ers was also defined in Section A.2.1 as a standard basis vector for Fm×m.)
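The three kinds of elementary matrices in Definition A.3.4 are easy to generate on a computer. The following sketch (assuming the NumPy library; the function names are ours) builds one example of each type and checks numerically that each is invertible.

```python
# A small sketch of Definition A.3.4 using NumPy; row indices r, s are 0-based here.
import numpy as np

def row_swap(m, r, s):
    """Elementary matrix exchanging rows r and s of the m x m identity."""
    E = np.eye(m)
    E[[r, s]] = E[[s, r]]
    return E

def row_scale(m, r, alpha):
    """Elementary matrix scaling row r by a non-zero scalar alpha."""
    E = np.eye(m)
    E[r, r] = alpha
    return E

def row_combine(m, r, s, alpha):
    """Elementary matrix replacing row r by (row r) + alpha * (row s)."""
    E = np.eye(m)
    E[r, s] = alpha
    return E

for E in (row_swap(3, 0, 1), row_scale(3, 2, -1.0), row_combine(3, 1, 0, -2.0)):
    # Each elementary matrix is invertible: its determinant is non-zero.
    assert abs(np.linalg.det(E)) > 1e-12
    print(E)
```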
The “elementary” in the name “elementary matrix” comes from the correspondence between these matrices and so-called “elementary operations” on systems of equations. In particular, each of the elementary matrices is clearly invertible (in the sense defined in Section A.2.3), just as each “elementary operation” is itself completely reversible. We illustrate this correspondence in the following example.
Example A.3.5. Define A,x, and b by
\[
A=\begin{bmatrix}2&5&3\\1&2&3\\1&0&8\end{bmatrix},\quad
x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}, \text{ and }
b=\begin{bmatrix}5\\4\\9\end{bmatrix}.
\]
We illustrate the correspondence between elementary matrices and “elementary” operations
on the system of linear equations corresponding to the matrix equation Ax=b, as follows.
System of Equations
\[
\begin{aligned}
2x_1+5x_2+3x_3 &= 5\\
x_1+2x_2+3x_3 &= 4\\
x_1+8x_3 &= 9
\end{aligned}
\]
Corresponding Matrix Equation
\[
Ax=b.
\]
To begin solving this system, one might want to either multiply the first equation through by 1/2 or interchange the first equation with one of the other equations. From a computational perspective, it is preferable to perform an interchange since multiplying through by 1/2 would unnecessarily introduce fractions. Thus, we choose to interchange the first and second equation in order to obtain
System of Equations
\[
\begin{aligned}
x_1+2x_2+3x_3 &= 4\\
2x_1+5x_2+3x_3 &= 5\\
x_1+8x_3 &= 9
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_0Ax=E_0b, \quad\text{where } E_0=\begin{bmatrix}0&1&0\\1&0&0\\0&0&1\end{bmatrix}.
\]
Another reason for performing the above interchange is that it now allows us to use more convenient “row combination” operations when eliminating the variable x1 from all but one of the equations. In particular, we can multiply the first equation through by −2 and add it to the second equation in order to obtain
System of Equations
\[
\begin{aligned}
x_1+2x_2+3x_3 &= 4\\
x_2-3x_3 &= -3\\
x_1+8x_3 &= 9
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_1E_0Ax=E_1E_0b, \quad\text{where } E_1=\begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}.
\]
Similarly, in order to eliminate the variable x1 from the third equation, we can next multiply the first equation through by −1 and add it to the third equation in order to obtain
System of Equations
\[
\begin{aligned}
x_1+2x_2+3x_3 &= 4\\
x_2-3x_3 &= -3\\
-2x_2+5x_3 &= 5
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_2E_1E_0Ax=E_2E_1E_0b, \quad\text{where } E_2=\begin{bmatrix}1&0&0\\0&1&0\\-1&0&1\end{bmatrix}.
\]
Now that the variable x1 only appears in the first equation, we can somewhat similarly isolate the variable x2 by multiplying the second equation through by 2 and adding it to the third equation in order to obtain
System of Equations
\[
\begin{aligned}
x_1+2x_2+3x_3 &= 4\\
x_2-3x_3 &= -3\\
-x_3 &= -1
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_3\cdots E_0Ax=E_3\cdots E_0b, \quad\text{where } E_3=\begin{bmatrix}1&0&0\\0&1&0\\0&2&1\end{bmatrix}.
\]
Finally, in order to complete the process of transforming the coefficient matrix into REF, we need only rescale row three by −1. This corresponds to multiplying the third equation through by −1 in order to obtain
System of Equations
\[
\begin{aligned}
x_1+2x_2+3x_3 &= 4\\
x_2-3x_3 &= -3\\
x_3 &= 1
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_4\cdots E_0Ax=E_4\cdots E_0b, \quad\text{where } E_4=\begin{bmatrix}1&0&0\\0&1&0\\0&0&-1\end{bmatrix}.
\]
Now that the coefficient matrix is in REF, we can already solve for the variables x1,x2, and x3 using a process called back substitution. In other words, it should be clear from the third equation that x3=1. Using this value and solving for x2 in the second equation, it then follows that
x2=−3+3x3=−3+3=0.
Similarly, by solving the first equation for x1 , it follows that
x1=4−2x2−3x3=4−3=1.
From a computational perspective, this process of back substitution can be applied to solve any system of equations when the coefficient matrix of the corresponding matrix equation is in REF. However, from an algorithmic perspective, it is often more useful to continue “row reducing” the coefficient matrix in order to produce a coefficient matrix in full RREF.
Here, there are several natural next steps that we could now perform in order to move toward RREF. Since we have so far worked "from the top down, from left to right", we choose to now work "from the bottom up, from right to left". In other words, we now multiply the third equation through by 3 and then add it to the second equation in order to obtain
System of Equations
\[
\begin{aligned}
x_1+2x_2+3x_3 &= 4\\
x_2 &= 0\\
x_3 &= 1
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_5\cdots E_0Ax=E_5\cdots E_0b, \quad\text{where } E_5=\begin{bmatrix}1&0&0\\0&1&3\\0&0&1\end{bmatrix}.
\]
Next, we can multiply the third equation through by −3 and add it to the first equation in order to obtain
System of Equations
\[
\begin{aligned}
x_1+2x_2 &= 1\\
x_2 &= 0\\
x_3 &= 1
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_6\cdots E_0Ax=E_6\cdots E_0b, \quad\text{where } E_6=\begin{bmatrix}1&0&-3\\0&1&0\\0&0&1\end{bmatrix}.
\]
Finally, we can multiply the second equation through by −2 and add it to the first equation in order to obtain
System of Equations
\[
\begin{aligned}
x_1 &= 1\\
x_2 &= 0\\
x_3 &= 1
\end{aligned}
\]
Corresponding Matrix Equation
\[
E_7\cdots E_0Ax=E_7\cdots E_0b, \quad\text{where } E_7=\begin{bmatrix}1&-2&0\\0&1&0\\0&0&1\end{bmatrix}.
\]
Now it should be extremely clear that we obtained a correct solution when using back substitution on the linear system
E4⋯E0Ax=E4⋯E0b.
However, in many applications, it is not enough to merely find a solution. Instead, it is important to describe every solution. As we will see in the remaining sections of these notes, this requires the machinery of Linear Algebra. In particular, we will need the theory of vector spaces and linear maps.
To close this section, we take a closer look at the following expression obtained from the above analysis:
E7E6⋯E1E0A=I3.
Because of the way in which we have defined elementary matrices, it should be clear that each of the matrices E0,E1,…,E7 is invertible. Thus, we can use Theorem A.2.9 in order to “solve” for A:
\[
A=(E_7E_6\cdots E_1E_0)^{-1}I_3=E_0^{-1}E_1^{-1}\cdots E_7^{-1}I_3.
\]
In effect, since the inverse of an elementary matrix is itself easily seen to be an elementary matrix, this has factored A into the product of eight elementary matrices (namely, E0^{-1}, E1^{-1}, …, E7^{-1}) and one matrix in RREF (namely, I3). Moreover, because each elementary matrix is invertible, we can conclude that x solves Ax=b if and only if x solves
(E7E6⋯E1E0A)x=(I3)x=(E7E6⋯E1E0)b.
Consequently, given any linear system, one can use Gaussian elimination in order to reduce the problem to solving a linear system whose coefficient matrix is in RREF.
Similarly, we can conclude that the inverse of A is
\[
A^{-1}=E_7E_6\cdots E_1E_0=\begin{bmatrix}16&-40&9\\-5&13&-3\\-2&5&-1\end{bmatrix}.
\]
Having computed this product, one could essentially "reuse" much of the above computation in order to solve the matrix equation Ax=b′ for several different right-hand sides b′∈F3. This process of "re-solving" a linear system for new right-hand side data is a common practice in applied mathematics.
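As a quick numerical sanity check of Example A.3.5 (a sketch assuming NumPy; the variable names are ours), one can multiply out E7⋯E0, confirm that the product is indeed A^{-1}, and then "reuse" it on a new right-hand side b′.

```python
import numpy as np

A = np.array([[2., 5., 3.],
              [1., 2., 3.],
              [1., 0., 8.]])
b = np.array([5., 4., 9.])

# The elementary matrices E0, ..., E7 from Example A.3.5.
E = [
    np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 1.]]),   # E0: swap rows 1 and 2
    np.array([[1., 0., 0.], [-2., 1., 0.], [0., 0., 1.]]),  # E1
    np.array([[1., 0., 0.], [0., 1., 0.], [-1., 0., 1.]]),  # E2
    np.array([[1., 0., 0.], [0., 1., 0.], [0., 2., 1.]]),   # E3
    np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., -1.]]),  # E4
    np.array([[1., 0., 0.], [0., 1., 3.], [0., 0., 1.]]),   # E5
    np.array([[1., 0., -3.], [0., 1., 0.], [0., 0., 1.]]),  # E6
    np.array([[1., -2., 0.], [0., 1., 0.], [0., 0., 1.]]),  # E7
]

# Accumulate the product E7 E6 ... E1 E0.
M = np.eye(3)
for Ei in E:
    M = Ei @ M

assert np.allclose(M @ A, np.eye(3))      # E7...E0 A = I3
assert np.allclose(M, np.linalg.inv(A))   # so E7...E0 = A^{-1}
print(M @ b)                              # the solution x = (1, 0, 1)
print(M @ np.array([1., 1., 1.]))         # "re-solving" with a new right-hand side b'
```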
A.3.2 Solving homogeneous linear systems
In this section, we study the solutions for an important special class of linear systems. As we will see in the next section, though, solving any linear system is fundamentally dependent upon knowing how to solve these so-called homogeneous systems.
As usual, we use m, n∈Z+ to denote arbitrary positive integers.
Definition A.3.6. The system of linear equations, System (A.1.1), is called a homogeneous system if the right-hand side of each equation is zero. In other words, a homogeneous system corresponds to a matrix equation of the form
Ax=0,
where A∈Fm×n is an m×n matrix and x is an n-tuple of unknowns. We also call the set
N={v∈Fn|Av=0}
the solution space for the homogeneous system corresponding to Ax=0.
When describing the solution space for a homogeneous linear system, there are three important cases to keep in mind:
Definition A.3.7. The system of linear equations System (A.1.1) is called
- overdetermined if m>n.
- square if m=n.
- underdetermined if m<n.
In particular, we can say a great deal about underdetermined homogeneous systems, which
we state as a corollary to the following more general result.
Theorem A.3.8. Let N be the solution space for the homogeneous linear system corresponding to the matrix equation Ax=0, where A∈Fm×n. Then
1. the zero vector 0∈N.
2. N is a subspace of the vector space Fn.
This is an amazing theorem. Since N is a subspace of Fn , we know that either N will contain exactly one element (namely, the zero vector) or N will contain infinitely many elements.
Corollary A.3.9. Every homogeneous system of linear equations is solved by the zero vector. Moreover, every underdetermined homogeneous system has infinitely many solutions.
We call the zero vector the trivial solution for a homogeneous linear system. The fact that every homogeneous linear system has the trivial solution thus reduces solving such a system to determining if solutions other than the trivial solution exist.
One method for finding the solution space of a homogeneous system is to first use Gaussian elimination (as demonstrated in Example A.3.5) in order to factor the coefficient matrix of the system. Then, because the original linear system is homogeneous, the homogeneous system corresponding to the resulting RREF matrix will have the same solutions as the original system. In other words, if a given matrix A satisfies
EkEk−1⋯E0A=A0,
where each Ei is an elementary matrix and A0 is an RREF matrix, then the matrix equation Ax=0 has the exact same solution set as A0x=0 since E0^{-1}E1^{-1}⋯Ek^{-1}0=0.
Example A.3.10. In the following examples, we illustrate the process of determining the solution space for a homogeneous linear system having coefficient matrix in RREF.
1. Consider the matrix equation Ax=0, where A is the matrix given by
\[
A=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\\0&0&0\end{bmatrix}.
\]
This corresponds to an overdetermined homogeneous system of linear equations. Moreover, since there are no free variables (as defined in Example A.3.3), it should be clear that this system has only the trivial solution. Thus, N={0}.
2. Consider the matrix equation Ax=0, where A is the matrix given by
\[
A=\begin{bmatrix}1&0&1\\0&1&1\\0&0&0\\0&0&0\end{bmatrix}.
\]
This corresponds to an overdetermined homogeneous system of linear equations. Unlike the above example, we see that x3 is a free variable for this system, and so we would expect the solution space to contain more than just the zero vector. As in Example A.3.3, we can solve for the leading variables in terms of the free variable in order to obtain
\[
\left.\begin{aligned}
x_1 &= -x_3\\
x_2 &= -x_3
\end{aligned}\right\}.
\]
It follows that, given any scalar α∈F, every vector of the form
\[
x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}-\alpha\\-\alpha\\ \alpha\end{bmatrix}=\alpha\begin{bmatrix}-1\\-1\\1\end{bmatrix}
\]
is a solution to Ax=0. Therefore,
N={(x1,x2,x3)∈F3 | x1=−x3,x2=−x3}=span((−1,−1,1)).
3. Consider the matrix equation Ax=0, where A is the matrix given by
\[
A=\begin{bmatrix}1&1&1\\0&0&0\\0&0&0\end{bmatrix}.
\]
This corresponds to a square homogeneous system of linear equations with two free variables. Thus, using the same technique as in the previous example, we can solve for the leading variable in order to obtain x1=−x2−x3. It follows that, given any scalars α,β∈F, every vector of the form
\[
x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}-\alpha-\beta\\ \alpha\\ \beta\end{bmatrix}=\alpha\begin{bmatrix}-1\\1\\0\end{bmatrix}+\beta\begin{bmatrix}-1\\0\\1\end{bmatrix}
\]
is a solution to Ax=0. Therefore,
N={(x1,x2,x3)∈F3 | x1+x2+x3=0}=span((−1,1,0),(−1,0,1)).
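Solution spaces like these can also be checked with a computer algebra system. The brief sketch below, assuming the SymPy library is available, reproduces the spanning vectors found in parts 2 and 3 above.

```python
# A sketch using SymPy's exact (rational) matrices to compute solution spaces.
import sympy as sp

A2 = sp.Matrix([[1, 0, 1],
                [0, 1, 1],
                [0, 0, 0],
                [0, 0, 0]])
A3 = sp.Matrix([[1, 1, 1],
                [0, 0, 0],
                [0, 0, 0]])

# nullspace() returns a basis for N = {v : A v = 0}.
print(A2.nullspace())  # one basis vector: (-1, -1, 1), so N = span((-1, -1, 1))
print(A3.nullspace())  # two basis vectors: (-1, 1, 0) and (-1, 0, 1)
```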
A.3.3 Solving inhomogeneous linear systems
In this section, we demonstrate the relationship between arbitrary linear systems and homogeneous linear systems. Specifically, we will see that it takes little more work to solve a general linear system than it does to solve the homogeneous system associated to it.
As usual, we use m,n∈Z+ to denote arbitrary positive integers.
Definition A.3.11. The system of linear equations System (A.1.1) is called an inhomogeneous system if the right-hand side of at least one equation is not zero. In other words, an inhomogeneous system corresponds to a matrix equation of the form
Ax=b,
where A∈Fm×n is an m×n matrix, x is an n-tuple of unknowns, and b∈Fm is a vector having at least one non-zero component. We also call the set
U={v∈Fn | Av=b}
the solution set for the linear system corresponding to Ax=b.
As illustrated in Example A.3.3, the zero vector cannot be a solution for an inhomogeneous system. Consequently, the solution set U for an inhomogeneous linear system will never be a subspace of any vector space. Instead, it will be a related algebraic structure as described in the following theorem.
Theorem A.3.12. Let U be the solution space for the inhomogeneous linear system corresponding to the matrix equation Ax=b, where A∈Fm×n and b∈Fm is a vector having at least one non-zero component. Then, given any element u∈U, we have that
U=u+N={u+n | n∈N},
where N is the solution space to Ax=0. In other words, if B=(n(1),n(2),…,n(k)) is a list of vectors forming a basis for N, then every element of U can be written in the form
u+α1n(1)+α2n(2)+…+αkn(k)
for some choice of scalars α1,α2,…,αk∈F.
As a consequence of this theorem, we can conclude that inhomogeneous linear systems behave a lot like homogeneous systems. The main difference is that inhomogeneous systems are not necessarily solvable. This, then, creates three possibilities: an inhomogeneous linear system will either have no solution, a unique solution, or infinitely many solutions. An important special case is as follows.
Corollary A.3.13. Every overdetermined inhomogeneous linear system will necessarily be unsolvable for some choice of values for the right-hand sides of the equations.
The solution set U for an inhomogeneous linear system is called an affine subspace of Fn since it is a genuine subspace of Fn that has been "offset" by a vector u∈Fn. Any set having this structure might also be called a coset (when used in the context of Group Theory) or a linear manifold (when used in a geometric context such as a discussion of intersecting hyperplanes).
In order to actually find the solution set for an inhomogeneous linear system, we rely on Theorem A.3.12. Given an m×n matrix A∈Fm×n and a non-zero vector b∈Fm , we call Ax=0 the associated homogeneous matrix equation to the inhomogeneous matrix equation Ax=b. Then, according to Theorem A.3.12, U can be found by first finding the solution space N for the associated equation Ax=0 and then finding any so-called particular solution u∈Fn to Ax=b.
As with homogeneous systems, one can first use Gaussian elimination in order to factorize A, and so we restrict the following examples to the special case of RREF matrices.
Example A.3.14. The following examples use the same matrices as in Example A.3.10.
1. Consider the matrix equation Ax=b, where A is the matrix given by
\[
A=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\\0&0&0\end{bmatrix}
\]
and b∈F4 has at least one non-zero component. Then Ax=b corresponds to an overdetermined inhomogeneous system of linear equations and will not necessarily be solvable for all possible choices of b.
In particular, note that the bottom row A(4,⋅) of A corresponds to the equation
0=b4,
from which Ax=b has no solution unless the fourth component of b is zero. Furthermore, the remaining rows of A correspond to the equations
x1=b1,x2=b2, and x3=b3.
It follows that, given any vector b∈F4 with fourth component zero, x=(b1,b2,b3) is the only solution to Ax=b. In other words, U={(b1,b2,b3)}.
2. Consider the matrix equation Ax=b, where A is the matrix given by
\[
A=\begin{bmatrix}1&0&1\\0&1&1\\0&0&0\\0&0&0\end{bmatrix}
\]
and b∈F4. This corresponds to an overdetermined inhomogeneous system of linear equations. Note, in particular, that the bottom two rows of the matrix correspond to the equations 0=b3 and 0=b4, from which Ax=b has no solution unless the third and fourth components of the vector b are both zero. Furthermore, we conclude from the remaining rows of the matrix that x3 is a free variable for this system and that
\[
\left.\begin{aligned}
x_1 &= b_1-x_3\\
x_2 &= b_2-x_3
\end{aligned}\right\}.
\]
It follows that, given any scalar α∈F, every vector of the form
\[
x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}b_1-\alpha\\b_2-\alpha\\ \alpha\end{bmatrix}=\begin{bmatrix}b_1\\b_2\\0\end{bmatrix}+\alpha\begin{bmatrix}-1\\-1\\1\end{bmatrix}=u+\alpha n
\]
is a solution to Ax=b. Recall from Example A.3.10 that the solution space for the associated homogeneous matrix equation Ax=0 is
N={(x1,x2,x3)∈F3 | x1=−x3,x2=−x3}=span((−1,−1,1)).
Thus, in the language of Theorem A.3.12, we have that u is a particular solution for
Ax=b and that (n) is a basis for N. Therefore, the solution set for Ax=b is
U=(b1,b2,0)+N={(x1,x2,x3)∈F3 | x1=b1−x3,x2=b2−x3}.
3. Consider the matrix equation Ax=b, where A is the matrix given by
\[
A=\begin{bmatrix}1&1&1\\0&0&0\\0&0&0\end{bmatrix}
\]
and b∈F3. This corresponds to a square inhomogeneous system of linear equations with two free variables. As above, this system has no solutions unless b2=b3=0, and, given any scalars α,β∈F, every vector of the form
\[
x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}=\begin{bmatrix}b_1-\alpha-\beta\\ \alpha\\ \beta\end{bmatrix}=\begin{bmatrix}b_1\\0\\0\end{bmatrix}+\alpha\begin{bmatrix}-1\\1\\0\end{bmatrix}+\beta\begin{bmatrix}-1\\0\\1\end{bmatrix}=u+\alpha n^{(1)}+\beta n^{(2)}
\]
is a solution to Ax=b. Recall from Example A.3.10 that the solution space for the associated homogeneous matrix equation Ax=0 is
N={(x1,x2,x3)∈F3 | x1+x2+x3=0}=span((−1,1,0),(−1,0,1)).
Thus, in the language of Theorem A.3.12, we have that u is a particular solution for Ax=b and that (n(1),n(2)) is a basis for N. Therefore, the solution set for Ax=b is
U=(b1,0,0)+N={(x1,x2,x3)∈F3 | x1+x2+x3=b1}.
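The decomposition U = u + N can also be verified numerically. The following sketch (assuming NumPy; the particular numbers chosen for b are our own) uses the matrix from part 2 with b3 = b4 = 0 and checks that the particular solution plus any multiple of the homogeneous basis vector again solves Ax = b.

```python
import numpy as np

# Example A.3.14(2): a particular solution plus homogeneous solutions.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 0.],
              [0., 0., 0.]])
b = np.array([7., -3., 0., 0.])   # solvable since b3 = b4 = 0

u = np.array([7., -3., 0.])       # particular solution (b1, b2, 0)
n = np.array([-1., -1., 1.])      # basis vector for the solution space N of Ax = 0

for alpha in (-2.0, 0.0, 3.5):
    x = u + alpha * n
    assert np.allclose(A @ x, b)  # every u + alpha*n solves Ax = b
print("all checks passed")
```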
A.3.4 Solving linear systems with LU-factorization
Let n∈Z+ be a positive integer, and suppose that A∈Fn×n is an upper triangular matrix and that b∈Fn is a column vector. Then, in order to solve the matrix equation Ax=b, there is no need to apply Gaussian elimination. Instead, we can exploit the triangularity of A in order to directly obtain a solution.
Using the notation in System (A.1.1), note that the last equation in the linear system corresponding to Ax=b can only involve the single unknown xn, and so we can obtain the solution
\[
x_n=\frac{b_n}{a_{nn}}
\]
as long as ann≠0. If ann=0, then we must be careful to distinguish between the two cases in which bn=0 or bn≠0. Thus, for reasons that will become clear below, we assume that the diagonal elements of A are all nonzero. Under this assumption, there is no ambiguity in substituting the solution for xn into the penultimate (a.k.a. second-to-last) equation. Since A is upper triangular, the penultimate equation involves only the single unknown xn−1, and so we obtain the solution
\[
x_{n-1}=\frac{b_{n-1}-a_{n-1,n}x_n}{a_{n-1,n-1}}.
\]
We can then similarly substitute the solutions for xn and xn−1 into the antepenultimate (a.k.a. third-to-last) equation in order to solve for xn−2, and so on until a complete solution is found. In particular,
\[
x_1=\frac{b_1-\sum_{k=2}^{n}a_{1k}x_k}{a_{11}}.
\]
As in Example A.3.5, we call this process back substitution. Given an arbitrary linear system, back substitution essentially allows us to halt the Gaussian elimination procedure and immediately obtain a solution for the system as soon as an upper triangular matrix (possibly in REF or even RREF) has been obtained from the coefficient matrix.
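Back substitution is straightforward to implement directly from the formulas above. Here is a minimal Python sketch (the function name is ours, and no pivoting or error handling beyond an assertion is attempted) for an upper triangular coefficient matrix with non-zero diagonal entries.

```python
def back_substitution(U, b):
    """Solve U x = b for upper triangular U with non-zero diagonal entries.

    U is a list of rows; b is a list of right-hand sides. A sketch only,
    not a production routine.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # work upward from the last equation
        assert U[i][i] != 0, "zero diagonal entry"
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

# The REF system from Example A.3.5: x1 + 2x2 + 3x3 = 4, x2 - 3x3 = -3, x3 = 1.
print(back_substitution([[1, 2, 3], [0, 1, -3], [0, 0, 1]], [4, -3, 1]))  # [1.0, 0.0, 1.0]
```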
A similar procedure can be applied when A is lower triangular. Again using the notation in System (A.1.1), the first equation contains only x1 , and so
\[
x_1=\frac{b_1}{a_{11}}.
\]
We are again assuming that the diagonal entries of A are all nonzero. Then, acting similarly
to back substitution, we can substitute the solution for x1 into the second equation in order
to obtain
\[
x_2=\frac{b_2-a_{21}x_1}{a_{22}}.
\]
Continuing this process, we have created a forward substitution procedure. In particular,
\[
x_n=\frac{b_n-\sum_{k=1}^{n-1}a_{nk}x_k}{a_{nn}}.
\]
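The analogous forward substitution routine for a lower triangular coefficient matrix looks nearly identical (again a minimal sketch, with a function name of our own choosing).

```python
def forward_substitution(L, b):
    """Solve L y = b for lower triangular L with non-zero diagonal entries."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # work downward from the first equation
        assert L[i][i] != 0, "zero diagonal entry"
        s = sum(L[i][k] * y[k] for k in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

# A quick check on a 3 x 3 lower triangular system.
print(forward_substitution([[2, 0, 0], [1, 3, 0], [4, -1, 5]], [2, 7, 8]))  # [1.0, 2.0, 1.2]
```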
More generally, suppose that A∈Fn×n is an arbitrary square matrix for which there exists a lower triangular matrix L∈Fn×n and an upper triangular matrix U∈Fn×n such that A=LU. When such matrices exist, we call A=LU an LU-factorization (a.k.a. LU- decomposition) of A. The benefit of such a factorization is that it allows us to exploit the triangularity of L and U when solving linear systems having coefficient matrix A.
To see this, suppose that b∈Fn is a column vector. (As above, we also assume that none of the diagonal entries in either L or U is zero.) Furthermore, set y=Ux, where x is the as yet unknown solution of Ax=b. Then, by substitution, y must satisfy
Ly=b,
and so, since L is lower triangular, we can immediately solve for y via forward substitution. In other words, we are using the associativity of matrix multiplication (cf. Theorem A.2.6) in order to conclude that
\[
Ax=(LU)x=L(Ux)=Ly.
\]
Then, once we have obtained y∈Fn, we can apply back substitution in order to solve for x in the matrix equation
\[
Ux=y.
\]
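Putting the two routines together gives the standard two-stage solve just described. The sketch below reuses the forward_substitution and back_substitution functions from the earlier sketches and assumes that L and U have already been computed; production LU solvers would additionally handle row pivoting.

```python
def lu_solve(L, U, b):
    """Solve (LU) x = b: forward substitution for L y = b, then back substitution for U x = y."""
    y = forward_substitution(L, b)   # first stage:  L y = b
    return back_substitution(U, y)   # second stage: U x = y
```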
In general, one can only obtain an LU-factorization for a matrix A∈Fn×n when there exist elementary “row combination” matrices E1,E2,…,Ek∈Fn×n and an upper triangular matrix U such that
EkEk−1⋯E1A=U.
There are various generalizations of LU-factorization that allow for more than just elementary “row combinations” matrices in this product, but we do not mention them here. Instead, we provide a detailed example that illustrates how to obtain an LU-factorization and then
how to use such a factorization in solving linear systems.
Example A.3.15. Consider the matrix A∈F3×3 given by
\[
A=\begin{bmatrix}2&3&4\\4&5&10\\4&8&2\end{bmatrix}.
\]
Using the techniques illustrated in Example A.3.5, we have the following matrix product:
\[
\begin{bmatrix}1&0&0\\0&1&0\\0&2&1\end{bmatrix}
\begin{bmatrix}1&0&0\\0&1&0\\-2&0&1\end{bmatrix}
\begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}
\begin{bmatrix}2&3&4\\4&5&10\\4&8&2\end{bmatrix}
=\begin{bmatrix}2&3&4\\0&-1&2\\0&0&-2\end{bmatrix}=U.
\]
In particular, we have found three elementary “row combination” matrices, which, when multiplied by A, produce an upper triangular matrix U.
Now, in order to produce a lower triangular matrix L such that A=LU, we rely on two facts about lower triangular matrices. First of all, any lower triangular matrix with an entirely non-zero diagonal is invertible, and, second, the product of lower triangular matrices is always lower triangular. (Cf. Theorem A.2.7.) More specifically, we have that
\[
\begin{bmatrix}2&3&4\\4&5&10\\4&8&2\end{bmatrix}
=\begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}^{-1}
\begin{bmatrix}1&0&0\\0&1&0\\-2&0&1\end{bmatrix}^{-1}
\begin{bmatrix}1&0&0\\0&1&0\\0&2&1\end{bmatrix}^{-1}
\begin{bmatrix}2&3&4\\0&-1&2\\0&0&-2\end{bmatrix},
\]
where
\[
\begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}^{-1}
\begin{bmatrix}1&0&0\\0&1&0\\-2&0&1\end{bmatrix}^{-1}
\begin{bmatrix}1&0&0\\0&1&0\\0&2&1\end{bmatrix}^{-1}
=\begin{bmatrix}1&0&0\\2&1&0\\0&0&1\end{bmatrix}
\begin{bmatrix}1&0&0\\0&1&0\\2&0&1\end{bmatrix}
\begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix}
=\begin{bmatrix}1&0&0\\2&1&0\\2&-2&1\end{bmatrix}.
\]
We call the resulting lower triangular matrix L and note that A=LU, as desired.
Now, define x, y, and b by
\[
x=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix},\quad
y=\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}, \text{ and }
b=\begin{bmatrix}6\\16\\2\end{bmatrix}.
\]
Applying forward substitution to Ly=b, we obtain the solution
\[
\left.\begin{aligned}
y_1 &= b_1 = 6\\
y_2 &= b_2-2y_1 = 4\\
y_3 &= b_3-2y_1+2y_2 = -2
\end{aligned}\right\}.
\]
Then, given this unique solution y to Ly=b, we can apply backward substitution to Ux=y in order to obtain
\[
\left.\begin{aligned}
2x_1 &= y_1-3x_2-4x_3 = 8\\
-x_2 &= y_2-2x_3 = 2\\
-2x_3 &= y_3 = -2
\end{aligned}\right\}.
\]
It follows that the unique solution to Ax=b is
\[
\left.\begin{aligned}
x_1 &= 4\\
x_2 &= -2\\
x_3 &= 1
\end{aligned}\right\}.
\]
In summary, we have given an algorithm for solving any matrix equation Ax=b in which A=LU, where L is lower triangular, U is upper triangular, and both L and U have nothing but non-zero entries along their diagonals.
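Using the routines sketched earlier in this section, all of Example A.3.15 can be checked in a few lines (this assumes the lu_solve sketch above; NumPy is used here only to confirm that LU really does equal A).

```python
import numpy as np

A = [[2, 3, 4], [4, 5, 10], [4, 8, 2]]
L = [[1, 0, 0], [2, 1, 0], [2, -2, 1]]
U = [[2, 3, 4], [0, -1, 2], [0, 0, -2]]
b = [6, 16, 2]

assert np.allclose(np.array(L) @ np.array(U), np.array(A))  # A = LU
print(lu_solve(L, U, b))                                     # [4.0, -2.0, 1.0]
```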
We note in closing that the simple procedures of back and forward substitution can also be used for computing the inverses of lower and upper triangular matrices. E.g., the inverse U^{-1}=(uij) of the matrix
\[
U=\begin{bmatrix}2&3&4\\0&-1&2\\0&0&-2\end{bmatrix}
\]
must satisfy
\[
\begin{bmatrix}
2u_{11}+3u_{21}+4u_{31} & 2u_{12}+3u_{22}+4u_{32} & 2u_{13}+3u_{23}+4u_{33}\\
-u_{21}+2u_{31} & -u_{22}+2u_{32} & -u_{23}+2u_{33}\\
-2u_{31} & -2u_{32} & -2u_{33}
\end{bmatrix}
=UU^{-1}=I_3=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix},
\]
from which we obtain the linear system
\[
\left.\begin{aligned}
2u_{11}+3u_{21}+4u_{31} &= 1\\
2u_{12}+3u_{22}+4u_{32} &= 0\\
2u_{13}+3u_{23}+4u_{33} &= 0\\
-u_{21}+2u_{31} &= 0\\
-u_{22}+2u_{32} &= 1\\
-u_{23}+2u_{33} &= 0\\
-2u_{31} &= 0\\
-2u_{32} &= 0\\
-2u_{33} &= 1
\end{aligned}\right\}
\]
in the nine variables u11,u12,…,u33. Since this linear system has an upper triangular coefficient matrix, we can apply back substitution in order to directly solve for the entries of U^{-1}.
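In code, this amounts to solving Ux=e_j by back substitution for each standard basis vector e_j and assembling the results as the columns of U^{-1}. A small sketch reusing the back_substitution function from above (the new function name is ours):

```python
def upper_triangular_inverse(U):
    """Invert an upper triangular matrix with non-zero diagonal, column by column."""
    n = len(U)
    cols = []
    for j in range(n):
        e = [1.0 if i == j else 0.0 for i in range(n)]  # standard basis vector e_j
        cols.append(back_substitution(U, e))            # j-th column of the inverse
    # Transpose the list of columns into rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# The inverse of the matrix U from Example A.3.15.
print(upper_triangular_inverse([[2, 3, 4], [0, -1, 2], [0, 0, -2]]))
```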
The only condition we imposed upon our triangular matrices above was that all diagonal entries were non-zero. It should be clear to you that this non-zero diagonal restriction is a necessary and sufficient condition for a triangular matrix to be non-singular. Moreover, once the inverses of both L and U in an LU-factorization have been obtained, then we can immediately calculate the inverse for A=LU by applying Theorem A.2.9(4):
\[
A^{-1}=(LU)^{-1}=U^{-1}L^{-1}.
\]
Contributors
- Isaiah Lankham, Mathematics Department at UC Davis
- Bruno Nachtergaele, Mathematics Department at UC Davis
- Anne Schilling, Mathematics Department at UC Davis
Both hardbound and softbound versions of this textbook are available online at WorldScientific.com.