2.7: LU-Factorization
- Page ID
- 58932
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)The solution to a system \(A\mathbf{x} = \mathbf{b}\) of linear equations can be solved quickly if \(A\) can be factored as \(A = LU\) where \(L\) and \(U\) are of a particularly nice form. In this section we show that Gaussian elimination can be used to find such factorizations.
Triangular Matrices
As for square matrices, if \(A = \left[ a_{ij} \right]\) is an \(m \times n\) matrix, the elements \(a_{11}, a_{22}, a_{33}, \dots\) form the main diagonal of \(A\). Then \(A\) is called upper triangular if every entry below and to the left of the main diagonal is zero. Every row-echelon matrix is upper triangular, as are the matrices
\[\left[ \begin{array}{rrrr} 1 & -1 & 0 & 3 \\ 0 & 2 & 1 & 1 \\ 0 & 0 & -3 & 0 \end{array} \right] \quad \left[ \begin{array}{rrrrr} 0 & 2 & 1 & 0 & 5 \\ 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 1 & 0 & 1 \end{array} \right] \quad \left[ \begin{array}{rrr} 1 & 1 & 1 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{array} \right] \nonumber \]
By analogy, a matrix \(A\) is called lower triangular if its transpose is upper triangular, that is if each entry above and to the right of the main diagonal is zero. A matrix is called triangular if it is upper or lower triangular.
006519 Solve the system
\[ \begin{array}{rrrrrrrrrrr} x_{1} & + & 2x_{2} & - & 3x_{3} & - & x_{4} & + & 5x_{5} & = & 3 \\ & & & & 5x_{3} & + & x_{4} & + & x_{5} & = & 8 \\ & & & & & & & & 2x_{5} & = & 6 \\ \end{array} \nonumber \]
where the coefficient matrix is upper triangular.
As in gaussian elimination, let the “non-leading” variables be parameters: \(x_{2} = s\) and \(x_{4} = t\). Then solve for \(x_{5}\), \(x_{3}\), and \(x_{1}\) in that order as follows. The last equation gives
\[x_{5} = \frac{6}{2} = 3 \nonumber \]
Substitution into the second last equation gives
\[x_{3} = 1 - \frac{1}{5}t \nonumber \]
Finally, substitution of both \(x_{5}\) and \(x_{3}\) into the first equation gives
\[x_{1} = -9 -2s + \frac{2}{5}t \nonumber \]
The method used in Example [exa:006519] is called back substitution because later variables are substituted into earlier equations. It works because the coefficient matrix is upper triangular. Similarly, if the coefficient matrix is lower triangular the system can be solved by forward substitution where earlier variables are substituted into later equations. As observed in Section [sec:1_2], these procedures are more numerically efficient than gaussian elimination.
Now consider a system \(A\mathbf{x} = \mathbf{b}\) where \(A\) can be factored as \(A = LU\) where \(L\) is lower triangular and \(U\) is upper triangular. Then the system \(A\mathbf{x} = \mathbf{b}\) can be solved in two stages as follows:
- First solve \(L\mathbf{y} = \mathbf{b}\) for \(\mathbf{y}\) by forward substitution.
- Then solve \(U\mathbf{x} = \mathbf{y}\) for \(\mathbf{x}\) by back substitution.
Then \(\mathbf{x}\) is a solution to \(A\mathbf{x} = \mathbf{b}\) because \(A\mathbf{x} = LU\mathbf{x} = L\mathbf{y} = \mathbf{b}\). Moreover, every solution \(\mathbf{x}\) arises this way (take \(\mathbf{y} = U\mathbf{x}\)). Furthermore the method adapts easily for use in a computer.
This focuses attention on efficiently obtaining such factorizations \(A = LU\). The following result will be needed; the proof is straightforward and is left as Exercises [ex:ex2_7_7] and [ex:ex2_7_8].
006547 Let \(A\) and \(B\) denote matrices.
- If \(A\) and \(B\) are both lower (upper) triangular, the same is true of \(AB\).
- If \(A\) is \(n \times n\) and lower (upper) triangular, then A is invertible if and only if every main diagonal entry is nonzero. In this case \(A^{-1}\) is also lower (upper) triangular.
LU-Factorization
Let \(A\) be an \(m \times n\) matrix. Then \(A\) can be carried to a row-echelon matrix \(U\) (that is, upper triangular). As in Section [sec:2_5], the reduction is
\[A \rightarrow E_{1}A \rightarrow E_{2}E_{1}A \rightarrow E_{3}E_{2}E_{1}A \rightarrow \cdots \rightarrow E_{k}E_{k-1} \cdots E_{2}E_{1}A = U \nonumber \]
where \(E_{1}, E_{2}, \dots, E_{k}\) are elementary matrices corresponding to the row operations used. Hence
\[A = LU \nonumber \]
where \(L = \left(E_{k}E_{k-1} \cdots E_{2}E_{1}\right)^{-1} = E_{1}^{-1}E_{2}^{-1} \cdots E_{k-1}^{-1}E_{k}^{-1}\). If we do not insist that \(U\) is reduced then, except for row interchanges, none of these row operations involve adding a row to a row above it. Thus, if no row interchanges are used, all the \(E_{i}\) are lower triangular, and so \(L\) is lower triangular (and invertible) by Lemma [lem:006547]. This proves the following theorem. For convenience, let us say that \(A\) can be lower reduced if it can be carried to row-echelon form using no row interchanges.
006566 If \(A\) can be lower reduced to a row-echelon matrix \(U\), then
\[A = LU \nonumber \]
where \(L\) is lower triangular and invertible and \(U\) is upper triangular and row-echelon.
LU-factorization006570 A factorization \(A = LU\) as in Theorem [thm:006566] is called an LU-factorization of \(A\).
Such a factorization may not exist (Exercise [ex:ex2_7_4]) because \(A\) cannot be carried to row-echelon form using no row interchange. A procedure for dealing with this situation will be outlined later. However, if an LU-factorization \(A = LU\) does exist, then the gaussian algorithm gives \(U\) and also leads to a procedure for finding \(L\). Example [exa:006575] provides an illustration. For convenience, the first nonzero column from the left in a matrix \(A\) is called the leading column of \(A\).
006575 Find an LU-factorization of \(A = \left[ \begin{array}{rrrrr} 0 & 2 & -6 & -2 & 4 \\ 0 & -1 & 3 & 3 & 2 \\ 0 & -1 & 3 & 7 & 10 \end{array} \right]\).
We lower reduce \(A\) to row-echelon form as follows:
\[A = \left[ \begin{array}{rrrrr} 0 & \tn{reduce1A}{2} & -6 & -2 & 4 \\ 0 & -1 & 3 & 3 & 2 \\ 0 & \tn{reduce1B}{-1} & 3 & 7 & 10 \end{array} \right] \rightarrow \left[ \begin{array}{rrrrr} 0 & 1 & -3 & -1 & 2 \\ 0 & 0 & 0 & \tn{reduce2A}{2} & 4 \\ 0 & 0 & 0 & \tn{reduce2B}{6} & 12 \end{array} \right] \rightarrow \left[ \begin{array}{rrrrr} 0 & 1 & -3 & -1 & 2 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right] = U \nonumber \]
The circled columns are determined as follows: The first is the leading column of \(A\), and is used (by lower reduction) to create the first leading \(1\) and create zeros below it. This completes the work on row 1, and we repeat the procedure on the matrix consisting of the remaining rows. Thus the second circled column is the leading column of this smaller matrix, which we use to create the second leading \(1\) and the zeros below it. As the remaining row is zero here, we are finished. Then \(A = LU\) where
\[L = \left[ \begin{array}{rrr} 2 & 0 & 0 \\ -1 & 2 & 0 \\ -1 & 6 & 1 \end{array} \right] \nonumber \]
This matrix \(L\) is obtained from \(I_{3}\) by replacing the bottom of the first two columns by the circled columns in the reduction. Note that the \(rank \;\)of \(A\) is \(2\) here, and this is the number of circled columns.
The calculation in Example [exa:006575] works in general. There is no need to calculate the elementary matrices \(E_{i}\), and the method is suitable for use in a computer because the circled columns can be stored in memory as they are created. The procedure can be formally stated as follows:
LU-Algorithm006589 Let \(A\) be an \(m \times n\) matrix of \(rank \;r\), and suppose that \(A\) can be lower reduced to a row-echelon matrix \(U\). Then \(A = LU\) where the lower triangular, invertible matrix \(L\) is constructed as follows:
- If \(A = 0\), take \(L = I_{m}\) and \(U = 0\).
- If \(A \neq 0\), write \(A_{1} = A\) and let \(\mathbf{c}_{1}\) be the leading column of \(A_{1}\). Use \(\mathbf{c}_{1}\) to create the first leading \(1\) and create zeros below it (using lower reduction). When this is completed, let \(A_{2}\) denote the matrix consisting of rows 2 to \(m\) of the matrix just created.
- If \(A_{2} \neq 0\), let \(\mathbf{c}_{2}\) be the leading column of \(A_{2}\) and repeat Step 2 on \(A_{2}\) to create \(A_{3}\).
- Continue in this way until \(U\) is reached, where all rows below the last leading \(1\) consist of zeros. This will happen after \(r\) steps.
- Create \(L\) by placing \(\mathbf{c}_{1}, \mathbf{c}_{2}, \dots, \mathbf{c}_{r}\) at the bottom of the first \(r\) columns of \(I_{m}\).
A proof of the LU-algorithm is given at the end of this section.
LU-factorization is particularly important if, as often happens in business and industry, a series of equations \(A\mathbf{x} = B_{1}, A\mathbf{x} = B_{2}, \dots, A\mathbf{x} = B_{k}\), must be solved, each with the same coefficient matrix \(A\). It is very efficient to solve the first system by gaussian elimination, simultaneously creating an LU-factorization of \(A\), and then using the factorization to solve the remaining systems by forward and back substitution.
006623 Find an LU-factorization for \(A = \left[ \begin{array}{rrrrr} 5 & -5 & 10 & 0 & 5 \\ -3 & 3 & 2 & 2 & 1 \\ -2 & 2 & 0 & -1 & 0 \\ 1 & -1 & 10 & 2 & 5 \end{array} \right]\).
The reduction to row-echelon form is
\[\begin{aligned} \left[ \begin{array}{rrrrr} \tn{exa3a}{5} & -5 & 10 & 0 & 5 \\ -3 & 3 & 2 & 2 & 1 \\ -2 & 2 & 0 & -1 & 0 \\ \tn{exa3b}{1} & -1 & 10 & 2 & 5 \end{array} \right] &\rightarrow \left[ \begin{array}{rrrrr} 1 & -1 & 2 & 0 & 1 \\ 0 & 0 & \tn{exa3c}{8} & 2 & 4 \\ 0 & 0 & 4 & -1 & 2 \\ 0 & 0 & \tn{exa3d}{8} & 2 & 4 \end{array} \right] \\ &\rightarrow \left[ \def\arraystretch{1.5}\begin{array}{rrrrr} 1 & -1 & 2 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{4} & \frac{1}{2} \\ 0 & 0 & 0 & \tn{exa3e}{-2} & 0 \\ 0 & 0 & 0 & \tn{exa3f}{0} & 0 \end{array} \right] \\ &\rightarrow \left[ \def\arraystretch{1.5}\begin{array}{rrrrr} 1 & -1 & 2 & 0 & 1 \\ 0 & 0 & 1 & \frac{1}{4} & \frac{1}{2} \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right] = U\end{aligned} \nonumber \]
If \(U\) denotes this row-echelon matrix, then \(A = LU\), where
\[L = \left[ \begin{array}{rrrrr} 5 & 0 & 0 & 0 \\ -3 & 8 & 0 & 0 \\ -2 & 4 & -2 & 0 \\ 1 & 8 & 0 & 1 \end{array} \right] \nonumber \]
The next example deals with a case where no row of zeros is present in \(U\) (in fact, \(A\) is invertible).
006634 Find an LU-factorization for \(A = \left[ \begin{array}{rrr} 2 & 4 & 2 \\ 1 & 1 & 2 \\ -1 & 0 & 2 \end{array} \right]\).
The reduction to row-echelon form is
\[\left[ \begin{array}{rrr} \tn{exa4a}{2} & 4 & 2 \\ 1 & 1 & 2 \\ \tn{exa4b}{-1} & 0 & 2 \end{array} \right] \rightarrow \left[ \begin{array}{rrr} 1 & 2 & 1 \\ 0 & \tn{exa4c}{-1} & 1 \\ 0 & \tn{exa4d}{2} & 3 \end{array} \right] \rightarrow \left[ \begin{array}{rrr} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & \tn{exa4e}{5} \end{array} \right] \rightarrow \left[ \begin{array}{rrr} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{array} \right] = U \nonumber \]
Hence \(A = LU\) where \(L = \left[ \begin{array}{rrr} 2 & 0 & 0 \\ 1 & -1 & 0 \\ -1 & 2 & 5 \end{array} \right]\).
There are matrices (for example \(\left[ \begin{array}{rr} 0 & 1 \\ 1 & 0 \end{array} \right]\)) that have no LU-factorization and so require at least one row interchange when being carried to row-echelon form via the gaussian algorithm. However, it turns out that, if all the row interchanges encountered in the algorithm are carried out first, the resulting matrix requires no interchanges and so has an LU-factorization. Here is the precise result.
006646 Suppose an \(m \times n\) matrix \(A\) is carried to a row-echelon matrix \(U\) via the gaussian algorithm. Let \(P_{1}, P_{2}, \dots, P_{s}\) be the elementary matrices corresponding (in order) to the row interchanges used, and write \(P = P_{s} \cdots P_{2}P_{1}\). (If no interchanges are used take \(P = I_{m}\).) Then:
- \(PA\) is the matrix obtained from \(A\) by doing these interchanges (in order) to \(A\).
- \(PA\) has an LU-factorization.
The proof is given at the end of this section.
A matrix \(P\) that is the product of elementary matrices corresponding to row interchanges is called a permutation matrix. Such a matrix is obtained from the identity matrix by arranging the rows in a different order, so it has exactly one \(1\) in each row and each column, and has zeros elsewhere. We regard the identity matrix as a permutation matrix. The elementary permutation matrices are those obtained from \(I\) by a single row interchange, and every permutation matrix is a product of elementary ones.
006663 If \(A = \left[ \begin{array}{rrrr} 0 & 0 & -1 & 2 \\ -1 & -1 & 1 & 2 \\ 2 & 1 & -3 & 6 \\ 0 & 1 & -1 & 4 \end{array} \right]\), find a permutation matrix \(P\) such that \(PA\) has an LU-factorization, and then find the factorization.
Apply the gaussian algorithm to \(A\):
\[\begin{aligned} A \xrightarrow{*} \left[ \begin{array}{rrrr} -1 & -1 & 1 & 2 \\ 0 & 0 & -1 & 2 \\ 2 & 1 & -3 & 6 \\ 0 & 1 & -1 & 4 \end{array} \right] &\rightarrow \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & 0 & -1 & 2 \\ 0 & -1 & -1 & 10 \\ 0 & 1 & -1 & 4 \end{array} \right] \xrightarrow{*} \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & -1 & -1 & 10 \\ 0 & 0 & -1 & 2 \\ 0 & 1 & -1 & 4 \end{array} \right] \\ &\rightarrow \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & -1 & 2 \\ 0 & 0 & -2 & 14 \end{array} \right] \rightarrow \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 10 \end{array} \right]\end{aligned} \nonumber \]
Two row interchanges were needed (marked with \(*\)), first rows 1 and 2 and then rows 2 and 3. Hence, as in Theorem [thm:006646],
\[P = \left[ \begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{rrrr} 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] = \left[ \begin{array}{rrrr} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right] \nonumber \]
If we do these interchanges (in order) to \(A\), the result is \(PA\). Now apply the LU-algorithm to \(PA\):
\[\begin{aligned} PA = \left[ \begin{array}{rrrr} \tn{exa5a}{-1} & -1 & 1 & 2 \\ 2 & 1 & -3 & 6 \\ 0 & 0 & -1 & 2 \\ \tn{exa5b}{0} & 1 & -1 & 4 \end{array} \right] &\rightarrow \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & \tn{exa5c}{-1} & -1 & 10 \\ 0 & 0 & -1 & 2 \\ 0 & \tn{exa5d}{1} & -1 & 4 \end{array} \right] \rightarrow \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & \tn{exa5e}{-1} & 2 \\ 0 & 0 & \tn{exa5f}{-2} & 14 \end{array} \right] \\ &\rightarrow \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & \tn{exa5g}{10} \end{array} \right] \rightarrow \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{array} \right] = U\end{aligned} \nonumber \]
Hence, \(PA = LU\), where \(L = \left[ \begin{array}{rrrr} -1 & 0 & 0 & 0 \\ 2 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & -2 & 10 \end{array} \right]\) and \(U = \left[ \begin{array}{rrrr} 1 & 1 & -1 & -2 \\ 0 & 1 & 1 & -10 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 \end{array} \right]\).
Theorem [thm:006646] provides an important general factorization theorem for matrices. If \(A\) is any \(m \times n\) matrix, it asserts that there exists a permutation matrix \(P\) and an LU-factorization \(PA = LU\). Moreover, it shows that either \(P = I\) or \(P = P_{s} \cdots P_{2}P_{1}\), where \(P_{1}, P_{2}, \dots, P_{s}\) are the elementary permutation matrices arising in the reduction of \(A\) to row-echelon form. Now observe that \(P_{i}^{-1} = P_{i}\) for each \(i\) (they are elementary row interchanges). Thus, \(P^{-1} = P_{1}P_{2} \cdots P_{s}\), so the matrix \(A\) can be factored as
\[A = P^{-1}LU \nonumber \]
where \(P^{-1}\) is a permutation matrix, \(L\) is lower triangular and invertible, and \(U\) is a row-echelon matrix. This is called a PLU-factorization of \(A\).
The LU-factorization in Theorem [thm:006566] is not unique. For example,
\[\left[ \begin{array}{rr} 1 & 0 \\ 3 & 2 \end{array} \right] \left[ \begin{array}{rrr} 1 & -2 & 3 \\ 0 & 0 & 0 \end{array} \right] = \left[ \begin{array}{rr} 1 & 0 \\ 3 & 1 \end{array} \right] \left[ \begin{array}{rrr} 1 & -2 & 3 \\ 0 & 0 & 0 \end{array} \right] \nonumber \]
However, it is necessary here that the row-echelon matrix has a row of zeros. Recall that the \(rank \;\) of a matrix \(A\) is the number of nonzero rows in any row-echelon matrix \(U\) to which \(A\) can be carried by row operations. Thus, if \(A\) is \(m \times n\), the matrix \(U\) has no row of zeros if and only if \(A\) has \(rank \;m\).
006697 Let \(A\) be an \(m \times n\) matrix that has an LU-factorization
\[A = LU \nonumber \]
If \(A\) has \(rank \;m\) (that is, \(U\) has no row of zeros), then \(L\) and \(U\) are uniquely determined by \(A\).
Suppose \(A = MV\) is another LU-factorization of \(A\), so \(M\) is lower triangular and invertible and \(V\) is row-echelon. Hence \(LU = MV\), and we must show that \(L = M\) and \(U = V\). We write \(N = M^{-1}L\). Then \(N\) is lower triangular and invertible (Lemma [lem:006547]) and \(NU = V\), so it suffices to prove that \(N = I\). If \(N\) is \(m \times m\), we use induction on \(m\). The case \(m = 1\) is left to the reader. If \(m > 1\), observe first that column 1 of \(V\) is \(N\) times column 1 of \(U\). Thus if either column is zero, so is the other (\(N\) is invertible). Hence, we can assume (by deleting zero columns) that the \((1, 1)\)-entry is \(1\) in both \(U\) and \(V\).
Now we write \(N = \left[ \begin{array}{cc} a & 0 \\ X & N_{1} \end{array} \right], U = \left[ \begin{array}{cc} 1 & Y \\ 0 & U_{1} \end{array} \right]\), and \(V = \left[ \begin{array}{cc} 1 & Z \\ 0 & V_{1} \end{array} \right]\) in block form. Then \(NU = V\) becomes \(\left[ \begin{array}{cc} a & aY \\ X & XY + N_{1}U_{1} \end{array} \right] = \left[ \begin{array}{cc} 1 & Z \\ 0 & V_{1} \end{array} \right]\). Hence \(a = 1\), \(Y = Z\), \(X = 0\), and \(N_{1}U_{1} = V_{1}\). But \(N_{1}U_{1} = V_{1}\) implies \(N_{1} = I\) by induction, whence \(N = I\).
If \(A\) is an \(m \times m\) invertible matrix, then \(A\) has \(rank \;m\) by Theorem [thm:004553]. Hence, we get the following important special case of Theorem [thm:006697].
006719 If an invertible matrix \(A\) has an LU-factorization \(A = LU\), then \(L\) and \(U\) are uniquely determined by \(A\).
Of course, in this case \(U\) is an upper triangular matrix with \(1\)s along the main diagonal.
Proofs of Theorems
If \(\mathbf{c}_{1}, \mathbf{c}_{2}, \dots, \mathbf{c}_{r}\) are columns of lengths \(m, m - 1, \dots, m - r + 1\), respectively, write \(L^{(m)}(\mathbf{c}_{1}, \mathbf{c}_{2}, \dots, \mathbf{c}_{r})\) for the lower triangular \(m \times m\) matrix obtained from \(I_{m}\) by placing \(\mathbf{c}_{1}, \mathbf{c}_{2}, \dots, \mathbf{c}_{r}\) at the bottom of the first \(r\) columns of \(I_{m}\).
Proceed by induction on \(n\). If \(A = 0\) or \(n = 1\), it is left to the reader. If \(n > 1\), let \(\mathbf{c}_{1}\) denote the leading column of \(A\) and let \(\mathbf{k}_{1}\) denote the first column of the \(m \times m\) identity matrix. There exist elementary matrices \(E_{1}, \dots, E_{k}\) such that, in block form,
\[(E_{k} \cdots E_{2}E_{1})A = \left[ \begin{array}{c|c|c} \multirow{2}{*}{$0$} & \multirow{2}{*}{$\mathbf{k}_1$} & X_1 \\\cline{3-3} & & A_1 \end{array}\right] \quad \mbox{ where } (E_{k} \cdots E_{2}E_{1})\mathbf{c}_{1} = \mathbf{k}_{1} \nonumber \]
Moreover, each \(E_{j}\) can be taken to be lower triangular (by assumption). Write
\[G = (E_{k} \cdots E_{2}E_{1})^{-1} = E_{1}^{-1}E_{2}^{-1} \cdots E_{k}^{-1} \nonumber \]
Then \(G\) is lower triangular, and \(G\mathbf{k}_{1} = \mathbf{c}_{1}\). Also, each \(E_{j}\) (and so each \(E_{j}^{-1}\)) is the result of either multiplying row 1 of \(I_{m}\) by a constant or adding a multiple of row 1 to another row. Hence,
\[G = (E_{1}^{-1}E_{2}^{-1} \cdots E_{k}^{-1})I_{m} = \left[ \begin{array}{c|c} \multirow{2}{*}{$\mathbf{c}_{1}$} & 0 \\\cline{2-2} & I_{m-1} \end{array}\right] \nonumber \]
in block form. Now, by induction, let \(A_{1} = L_{1}U_{1}\) be an LU-factorization of \(A_{1}\), where \(L_{1} = L^{(m-1)} \left[ \mathbf{c}_{2}, \dots, \mathbf{c}_{r} \right]\) and \(U_{1}\) is row-echelon. Then block multiplication gives
\[G^{-1}A = \left[ \begin{array}{c|c|c} \multirow{2}{*}{$0$} & \multirow{2}{*}{$\mathbf{k}_1$} & X_1 \\\cline{3-3} & & L_{1}U_{1} \end{array}\right] = \left[ \begin{array}{c|c} 1 & 0 \\ \hline 0 & L_{1} \end{array} \right] \left[ \begin{array}{c|c|c} 0 & 1 & X_{1} \\ \hline 0 & 0 & U_{1} \end{array} \right] \nonumber \]
Hence \(A = LU\), where \(U = \left[ \begin{array}{c|c|c} 0 & 1 & X_{1} \\ \hline 0 & 0 & U_{1} \end{array} \right]\) is row-echelon and
\[L = \left[ \begin{array}{c|c} \multirow{2}{*}{$\mathbf{c}_{1}$} & 0 \\\cline{2-2} & I_{m-1} \end{array}\right] \left[ \begin{array}{c|c} 1 & 0 \\ \hline 0 & L_{1} \end{array} \right] = \left[ \begin{array}{c|c} \multirow{2}{*}{$\mathbf{c}_{1}$} & 0 \\\cline{2-2} & L \end{array}\right] = L^{(m)} \left[ \mathbf{c}_{1}, \mathbf{c}_{2}, \dots, \mathbf{c}_{r} \right] \nonumber \]
This completes the proof.
Let \(A\) be a nonzero \(m \times n\) matrix and let \(\mathbf{k}_{j}\) denote column \(j\) of \(I_{m}\). There is a permutation matrix \(P_{1}\) (where either \(P_{1}\) is elementary or \(P_{1} = I_{m}\)) such that the first nonzero column \(\mathbf{c}_{1}\) of \(P_{1}A\) has a nonzero entry on top. Hence, as in the LU-algorithm,
\[L^{(m)} \left[ \mathbf{c}_{1} \right]^{-1} \cdot P_{1} \cdot A = \left[ \begin{array}{c|c|c} 0 & 1 & X_{1} \\ \hline 0 & 0 & A_{1} \end{array} \right] \nonumber \]
in block form. Then let \(P_{2}\) be a permutation matrix (either elementary or \(I_{m}\)) such that
\[P_{2} \cdot L^{(m)} \left[ \mathbf{c}_{1} \right]^{-1} \cdot P_{1} \cdot A = \left[ \begin{array}{c|c|c} 0 & 1 & X_{1} \\ \hline 0 & 0 & A_{1}^\prime \end{array} \right] \nonumber \]
and the first nonzero column \(\mathbf{c}_{2}\) of \(A^\prime_{1}\) has a nonzero entry on top. Thus,
\[L^{(m)} \left[ \mathbf{k}_{1}, \mathbf{c}_{2} \right]^{-1} \cdot P_{2} \cdot L^{(m)} \left[ \mathbf{c}_{1} \right]^{-1} \cdot P_{1} \cdot A = \left[ \begin{array}{c|c|c} 0 & 1 & X_{1} \\ \hline 0 & 0 & \begin{array}{c|c|c} 0 & 1 & X_{2} \\ \hline 0 & 0 & A_{2} \end{array} \end{array} \right] \nonumber \]
in block form. Continue to obtain elementary permutation matrices \(P_{1}, P_{2}, \dots, P_{r}\) and columns \(\mathbf{c}_{1}, \mathbf{c}_{2}, \dots, \mathbf{c}_{r}\) of lengths \(m, m - 1, \dots\), such that
\[(L_{r}P_{r}L_{r-1}P_{r-1} \cdots L_{2}P_{2}L_{1}P_{1})A = U \nonumber \]
where \(U\) is a row-echelon matrix and \(L_{j} = L^{(m)} \left[ \mathbf{k}_{1}, \dots, \mathbf{k}_{j-1}, \mathbf{c}_{j} \right]^{-1}\) for each \(j\), where the notation means the first \(j - 1\) columns are those of \(I_{m}\). It is not hard to verify that each \(L_{j}\) has the form \(L_{j} = L^{(m)} \left[ \mathbf{k}_{1}, \dots, \mathbf{k}_{j-1}, \mathbf{c}^\prime_{j} \right]\) where \(\mathbf{c}^\prime_{j}\) is a column of length \(m - j + 1\). We now claim that each permutation matrix \(P_{k}\) can be “moved past” each matrix \(L_{j}\) to the right of it, in the sense that
\[P_{k}L_{j} = L_{j}^\prime P_{k} \nonumber \]
where \(L^\prime_{j} = L^{(m)} \left[ \mathbf{k}_{1}, \dots, \mathbf{k}_{j-1}, \mathbf{c}^\prime \prime_{j} \right]\) for some column \(\mathbf{c}^\prime \prime_{j}\) of length \(m - j + 1\). Given that this is true, we obtain a factorization of the form
\[(L_{r}L_{r-1}^\prime \cdots L_{2}^\prime L_{1}^\prime)(P_{r}P_{r-1} \cdots P_{2}P_{1})A = U \nonumber \]
If we write \(P = P_{r}P_{r-1} \cdots P_{2}P_{1}\), this shows that \(PA\) has an LU-factorization because \(L_{r}L^\prime_{r-1} \cdots L^\prime_{2}L^\prime_{1}\) is lower triangular and invertible. All that remains is to prove the following rather technical result.
006829 Let \(P_{k}\) result from interchanging row \(k\) of \(I_{m}\) with a row below it. If \(j < k\), let \(c_{j}\) be a column of length \(m - j + 1\). Then there is another column \(\mathbf{c}_{j}^\prime\) of length \(m - j + 1\) such that
\[P_{k} \cdot L^{(m)} \left[ \mathbf{k}_{1}, \dots, \mathbf{k}_{j-1}, \mathbf{c}_{j} \right] = L^{(m)} \left[ \mathbf{k}_{1}, \dots, \mathbf{k}_{j-1}, \mathbf{c}_{j}^\prime \right] \cdot P_{k} \nonumber \]
The proof is left as Exercise [ex:ex2_7_11].