2.7: LU-Factorization
( \newcommand{\kernel}{\mathrm{null}\,}\)
The solution to a system Ax=b of linear equations can be solved quickly if A can be factored as A=LU where L and U are of a particularly nice form. In this section we show that Gaussian elimination can be used to find such factorizations.
Triangular Matrices
As for square matrices, if A=[aij] is an m×n matrix, the elements a11,a22,a33,… form the main diagonal of A. Then A is called upper triangular if every entry below and to the left of the main diagonal is zero. Every row-echelon matrix is upper triangular, as are the matrices
[1−103021100−30][021050003100101][1110−11000000]
By analogy, a matrix A is called lower triangular if its transpose is upper triangular, that is if each entry above and to the right of the main diagonal is zero. A matrix is called triangular if it is upper or lower triangular.
006519 Solve the system
x1+2x2−3x3−x4+5x5=35x3+x4+x5=82x5=6
where the coefficient matrix is upper triangular.
As in gaussian elimination, let the “non-leading” variables be parameters: x2=s and x4=t. Then solve for x5, x3, and x1 in that order as follows. The last equation gives
x5=62=3
Substitution into the second last equation gives
x3=1−15t
Finally, substitution of both x5 and x3 into the first equation gives
x1=−9−2s+25t
The method used in Example [exa:006519] is called back substitution because later variables are substituted into earlier equations. It works because the coefficient matrix is upper triangular. Similarly, if the coefficient matrix is lower triangular the system can be solved by forward substitution where earlier variables are substituted into later equations. As observed in Section [sec:1_2], these procedures are more numerically efficient than gaussian elimination.
Now consider a system Ax=b where A can be factored as A=LU where L is lower triangular and U is upper triangular. Then the system Ax=b can be solved in two stages as follows:
- First solve Ly=b for y by forward substitution.
- Then solve Ux=y for x by back substitution.
Then x is a solution to Ax=b because Ax=LUx=Ly=b. Moreover, every solution x arises this way (take y=Ux). Furthermore the method adapts easily for use in a computer.
This focuses attention on efficiently obtaining such factorizations A=LU. The following result will be needed; the proof is straightforward and is left as Exercises [ex:ex2_7_7] and [ex:ex2_7_8].
006547 Let A and B denote matrices.
- If A and B are both lower (upper) triangular, the same is true of AB.
- If A is n×n and lower (upper) triangular, then A is invertible if and only if every main diagonal entry is nonzero. In this case A−1 is also lower (upper) triangular.
LU-Factorization
Let A be an m×n matrix. Then A can be carried to a row-echelon matrix U (that is, upper triangular). As in Section [sec:2_5], the reduction is
A→E1A→E2E1A→E3E2E1A→⋯→EkEk−1⋯E2E1A=U
where E1,E2,…,Ek are elementary matrices corresponding to the row operations used. Hence
A=LU
where L=(EkEk−1⋯E2E1)−1=E−11E−12⋯E−1k−1E−1k. If we do not insist that U is reduced then, except for row interchanges, none of these row operations involve adding a row to a row above it. Thus, if no row interchanges are used, all the Ei are lower triangular, and so L is lower triangular (and invertible) by Lemma [lem:006547]. This proves the following theorem. For convenience, let us say that A can be lower reduced if it can be carried to row-echelon form using no row interchanges.
006566 If A can be lower reduced to a row-echelon matrix U, then
A=LU
where L is lower triangular and invertible and U is upper triangular and row-echelon.
LU-factorization006570 A factorization A=LU as in Theorem [thm:006566] is called an LU-factorization of A.
Such a factorization may not exist (Exercise [ex:ex2_7_4]) because A cannot be carried to row-echelon form using no row interchange. A procedure for dealing with this situation will be outlined later. However, if an LU-factorization A=LU does exist, then the gaussian algorithm gives U and also leads to a procedure for finding L. Example [exa:006575] provides an illustration. For convenience, the first nonzero column from the left in a matrix A is called the leading column of A.
006575 Find an LU-factorization of A=[02−6−240−13320−13710].
We lower reduce A to row-echelon form as follows:
A=[0\tnreduce1A2−6−240−13320\tnreduce1B−13710]→[01−3−12000\tnreduce2A24000\tnreduce2B612]→[01−3−120001200000]=U
The circled columns are determined as follows: The first is the leading column of A, and is used (by lower reduction) to create the first leading 1 and create zeros below it. This completes the work on row 1, and we repeat the procedure on the matrix consisting of the remaining rows. Thus the second circled column is the leading column of this smaller matrix, which we use to create the second leading 1 and the zeros below it. As the remaining row is zero here, we are finished. Then A=LU where
L=[200−120−161]
This matrix L is obtained from I3 by replacing the bottom of the first two columns by the circled columns in the reduction. Note that the rankof A is 2 here, and this is the number of circled columns.
The calculation in Example [exa:006575] works in general. There is no need to calculate the elementary matrices Ei, and the method is suitable for use in a computer because the circled columns can be stored in memory as they are created. The procedure can be formally stated as follows:
LU-Algorithm006589 Let A be an m×n matrix of rankr, and suppose that A can be lower reduced to a row-echelon matrix U. Then A=LU where the lower triangular, invertible matrix L is constructed as follows:
- If A=0, take L=Im and U=0.
- If A≠0, write A1=A and let c1 be the leading column of A1. Use c1 to create the first leading 1 and create zeros below it (using lower reduction). When this is completed, let A2 denote the matrix consisting of rows 2 to m of the matrix just created.
- If A2≠0, let c2 be the leading column of A2 and repeat Step 2 on A2 to create A3.
- Continue in this way until U is reached, where all rows below the last leading 1 consist of zeros. This will happen after r steps.
- Create L by placing c1,c2,…,cr at the bottom of the first r columns of Im.
A proof of the LU-algorithm is given at the end of this section.
LU-factorization is particularly important if, as often happens in business and industry, a series of equations Ax=B1,Ax=B2,…,Ax=Bk, must be solved, each with the same coefficient matrix A. It is very efficient to solve the first system by gaussian elimination, simultaneously creating an LU-factorization of A, and then using the factorization to solve the remaining systems by forward and back substitution.
006623 Find an LU-factorization for A=[5−51005−33221−220−101−11025].
The reduction to row-echelon form is
[\tnexa3a5−51005−33221−220−10\tnexa3b1−11025]→[1−120100\tnexa3c824004−1200\tnexa3d824]→[1−12010011412000\tnexa3e−20000\tnexa3f00]→[1−120100114120001000000]=U
If U denotes this row-echelon matrix, then A=LU, where
L=[5000−3800−24−201801]
The next example deals with a case where no row of zeros is present in U (in fact, A is invertible).
006634 Find an LU-factorization for A=[242112−102].
The reduction to row-echelon form is
[\tnexa4a242112\tnexa4b−102]→[1210\tnexa4c−110\tnexa4d23]→[12101−100\tnexa4e5]→[12101−1001]=U
Hence A=LU where L=[2001−10−125].
There are matrices (for example [0110]) that have no LU-factorization and so require at least one row interchange when being carried to row-echelon form via the gaussian algorithm. However, it turns out that, if all the row interchanges encountered in the algorithm are carried out first, the resulting matrix requires no interchanges and so has an LU-factorization. Here is the precise result.
006646 Suppose an m×n matrix A is carried to a row-echelon matrix U via the gaussian algorithm. Let P1,P2,…,Ps be the elementary matrices corresponding (in order) to the row interchanges used, and write P=Ps⋯P2P1. (If no interchanges are used take P=Im.) Then:
- PA is the matrix obtained from A by doing these interchanges (in order) to A.
- PA has an LU-factorization.
The proof is given at the end of this section.
A matrix P that is the product of elementary matrices corresponding to row interchanges is called a permutation matrix. Such a matrix is obtained from the identity matrix by arranging the rows in a different order, so it has exactly one 1 in each row and each column, and has zeros elsewhere. We regard the identity matrix as a permutation matrix. The elementary permutation matrices are those obtained from I by a single row interchange, and every permutation matrix is a product of elementary ones.
006663 If A=[00−12−1−11221−3601−14], find a permutation matrix P such that PA has an LU-factorization, and then find the factorization.
Apply the gaussian algorithm to A:
A∗→[−1−11200−1221−3601−14]→[11−1−200−120−1−11001−14]∗→[11−1−20−1−11000−1201−14]→[11−1−2011−1000−1200−214]→[11−1−2011−10001−200010]
Two row interchanges were needed (marked with ∗), first rows 1 and 2 and then rows 2 and 3. Hence, as in Theorem [thm:006646],
P=[1000001001000001][0100100000100001]=[0100001010000001]
If we do these interchanges (in order) to A, the result is PA. Now apply the LU-algorithm to PA:
PA=[\tnexa5a−1−11221−3600−12\tnexa5b01−14]→[11−1−20\tnexa5c−1−11000−120\tnexa5d1−14]→[11−1−2011−1000\tnexa5e−1200\tnexa5f−214]→[11−1−2011−10001−2000\tnexa5g10]→[11−1−2011−10001−20001]=U
Hence, PA=LU, where L=[−10002−10000−1001−210] and U=[11−1−2011−10001−20001].
Theorem [thm:006646] provides an important general factorization theorem for matrices. If A is any m×n matrix, it asserts that there exists a permutation matrix P and an LU-factorization PA=LU. Moreover, it shows that either P=I or P=Ps⋯P2P1, where P1,P2,…,Ps are the elementary permutation matrices arising in the reduction of A to row-echelon form. Now observe that P−1i=Pi for each i (they are elementary row interchanges). Thus, P−1=P1P2⋯Ps, so the matrix A can be factored as
A=P−1LU
where P−1 is a permutation matrix, L is lower triangular and invertible, and U is a row-echelon matrix. This is called a PLU-factorization of A.
The LU-factorization in Theorem [thm:006566] is not unique. For example,
[1032][1−23000]=[1031][1−23000]
However, it is necessary here that the row-echelon matrix has a row of zeros. Recall that the rank of a matrix A is the number of nonzero rows in any row-echelon matrix U to which A can be carried by row operations. Thus, if A is m×n, the matrix U has no row of zeros if and only if A has rankm.
006697 Let A be an m×n matrix that has an LU-factorization
A=LU
If A has rankm (that is, U has no row of zeros), then L and U are uniquely determined by A.
Suppose A=MV is another LU-factorization of A, so M is lower triangular and invertible and V is row-echelon. Hence LU=MV, and we must show that L=M and U=V. We write N=M−1L. Then N is lower triangular and invertible (Lemma [lem:006547]) and NU=V, so it suffices to prove that N=I. If N is m×m, we use induction on m. The case m=1 is left to the reader. If m>1, observe first that column 1 of V is N times column 1 of U. Thus if either column is zero, so is the other (N is invertible). Hence, we can assume (by deleting zero columns) that the (1,1)-entry is 1 in both U and V.
Now we write N=[a0XN1],U=[1Y0U1], and V=[1Z0V1] in block form. Then NU=V becomes [aaYXXY+N1U1]=[1Z0V1]. Hence a=1, Y=Z, X=0, and N1U1=V1. But N1U1=V1 implies N1=I by induction, whence N=I.
If A is an m×m invertible matrix, then A has rankm by Theorem [thm:004553]. Hence, we get the following important special case of Theorem [thm:006697].
006719 If an invertible matrix A has an LU-factorization A=LU, then L and U are uniquely determined by A.
Of course, in this case U is an upper triangular matrix with 1s along the main diagonal.
Proofs of Theorems
If c1,c2,…,cr are columns of lengths m,m−1,…,m−r+1, respectively, write L(m)(c1,c2,…,cr) for the lower triangular m×m matrix obtained from Im by placing c1,c2,…,cr at the bottom of the first r columns of Im.
Proceed by induction on n. If A=0 or n=1, it is left to the reader. If n>1, let c1 denote the leading column of A and let k1 denote the first column of the m×m identity matrix. There exist elementary matrices E1,…,Ek such that, in block form,
(Ek⋯E2E1)A=[\multirow2∗$0$\multirow2∗$k1$X1\cline3−3A1] where (Ek⋯E2E1)c1=k1
Moreover, each Ej can be taken to be lower triangular (by assumption). Write
G=(Ek⋯E2E1)−1=E−11E−12⋯E−1k
Then G is lower triangular, and Gk1=c1. Also, each Ej (and so each E−1j) is the result of either multiplying row 1 of Im by a constant or adding a multiple of row 1 to another row. Hence,
G=(E−11E−12⋯E−1k)Im=[\multirow2∗$c1$0\cline2−2Im−1]
in block form. Now, by induction, let A1=L1U1 be an LU-factorization of A1, where L1=L(m−1)[c2,…,cr] and U1 is row-echelon. Then block multiplication gives
G−1A=[\multirow2∗$0$\multirow2∗$k1$X1\cline3−3L1U1]=[100L1][01X100U1]
Hence A=LU, where U=[01X100U1] is row-echelon and
L=[\multirow2∗$c1$0\cline2−2Im−1][100L1]=[\multirow2∗$c1$0\cline2−2L]=L(m)[c1,c2,…,cr]
This completes the proof.
Let A be a nonzero m×n matrix and let kj denote column j of Im. There is a permutation matrix P1 (where either P1 is elementary or P1=Im) such that the first nonzero column c1 of P1A has a nonzero entry on top. Hence, as in the LU-algorithm,
L(m)[c1]−1⋅P1⋅A=[01X100A1]
in block form. Then let P2 be a permutation matrix (either elementary or Im) such that
P2⋅L(m)[c1]−1⋅P1⋅A=[01X100A′1]
and the first nonzero column c2 of A′1 has a nonzero entry on top. Thus,
L(m)[k1,c2]−1⋅P2⋅L(m)[c1]−1⋅P1⋅A=[01X10001X200A2]
in block form. Continue to obtain elementary permutation matrices P1,P2,…,Pr and columns c1,c2,…,cr of lengths m,m−1,…, such that
(LrPrLr−1Pr−1⋯L2P2L1P1)A=U
where U is a row-echelon matrix and Lj=L(m)[k1,…,kj−1,cj]−1 for each j, where the notation means the first j−1 columns are those of Im. It is not hard to verify that each Lj has the form Lj=L(m)[k1,…,kj−1,c′j] where c′j is a column of length m−j+1. We now claim that each permutation matrix Pk can be “moved past” each matrix Lj to the right of it, in the sense that
PkLj=L′jPk
where L′j=L(m)[k1,…,kj−1,c′′j] for some column c′′j of length m−j+1. Given that this is true, we obtain a factorization of the form
(LrL′r−1⋯L′2L′1)(PrPr−1⋯P2P1)A=U
If we write P=PrPr−1⋯P2P1, this shows that PA has an LU-factorization because LrL′r−1⋯L′2L′1 is lower triangular and invertible. All that remains is to prove the following rather technical result.
006829 Let Pk result from interchanging row k of Im with a row below it. If j<k, let cj be a column of length m−j+1. Then there is another column c′j of length m−j+1 such that
Pk⋅L(m)[k1,…,kj−1,cj]=L(m)[k1,…,kj−1,c′j]⋅Pk
The proof is left as Exercise [ex:ex2_7_11].