# A.3 Solving linear systems by factoring the coefficient matrix

There are many ways in which you might try to solve a given system of linear equations. This section is primarily devoted to describing two particularly popular techniques, both of which involve factoring the coeﬃcient matrix for the system into a product of simpler matrices. These techniques are also at the heart of many frequently used numerical (i.e., computer-assisted) applications of Linear Algebra.

You should note that the factorization of complicated objects into simpler components is an extremely common problem solving technique in mathematics. E.g., we will often factor a given polynomial into several polynomials of lower degree, and one can similarly use the prime factorization for an integer in order to simplify certain numerical computations.

**A.3.1 Factorizing matrices using Gaussian elimination**

In this section, we discuss a particularly significant factorization procedure for matrices known as **Gaussian elimination** (a.k.a. **Gauss-Jordan elimination**). Gaussian elimination can be used to express any matrix as a product involving one matrix in so-called **reduced row-echelon** form and one or more so-called **elementary matrices**. Then, once such a factorization has been found, we can immediately solve any linear system that has the factorized matrix as its coefficient matrix. Moreover, the underlying technique for arriving at such a factorization is essentially an extension of the techniques already familiar to you for solving small systems of linear equations by hand.

Let \(m, n \in \mathbb{Z}_+\) denote positive integers, and suppose that \(A \in \mathbb{F}^{m \times n}\) is an \(m \times n\) matrix over \(\mathbb{F}.\) Then, following Section A.2.2, we will make extensive use of \(A^{(i, \cdot)}\) and \(A^{(\cdot ,j)}\) to denote the row vectors and column vectors of \(A\), respectively.

**Definition A.3.1.** Let \(A \in \mathbb{F}^{m \times n}\) be an \(m \times n\) matrix over \(\mathbb{F}.\) Then we say that \(A\) is in **row-echelon form** (abbreviated **REF**) if the rows of \(A\) satisfy the following conditions:

- either \(A^{(1,\cdot)}\) is the zero vector or the ﬁrst non-zero entry in \(A^{(1,\cdot)}\) (when read from left to right) is a one.
- for \(i = 1, \ldots, m,\) if any row vector \(A^{(i,\cdot)}\) is the zero vector, then each subsequent row vector \(A^{(i+1,\cdot)} ,\ldots , A^{(m,\cdot)}\) is also the zero vector.
- for \(i = 2, \ldots , m,\) if some \(A^{(i,\cdot)}\) is not the zero vector, then its first non-zero entry (when read from left to right) is a one and occurs to the right of the initial one in \(A^{(i-1,\cdot)}\).

The initial leading one in each non-zero row is called a **pivot.** We furthermore say that \(A\) is in **reduced row-echelon form** (abbreviated **RREF**) if

(4) for each column vector \(A^{(\cdot,j)}\) containing a pivot \((j = 2, \ldots , n),\) the pivot is the only non-zero element in \(A^{(\cdot,j)}\).

The motivation behind Definition A.3.1 is that matrix equations having their coefficient matrix in RREF (and, in some sense, also REF) are particularly easy to solve. Note, in particular, that the only square matrix in RREF without zero rows is the identity matrix.
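The conditions of Definition A.3.1 are mechanical enough to check by machine. As a rough illustration, the following Python sketch (the helper names `pivot_col` and `is_rref` are ours, not standard library functions) encodes conditions (1)–(4) for a matrix given as a list of rows:

```python
def pivot_col(row):
    """Return the index of the first non-zero entry of a row, or None."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None

def is_rref(A):
    """Check the REF conditions of Definition A.3.1 plus condition (4)."""
    pivots = []             # (row, column) positions of the leading ones
    seen_zero_row = False
    last_pivot_col = -1
    for i, row in enumerate(A):
        j = pivot_col(row)
        if j is None:
            seen_zero_row = True       # all later rows must also be zero
            continue
        if seen_zero_row:              # a non-zero row below a zero row
            return False
        if row[j] != 1:                # each leading entry must be a one
            return False
        if j <= last_pivot_col:        # pivots must move strictly right
            return False
        last_pivot_col = j
        pivots.append((i, j))
    # condition (4): a pivot is the only non-zero entry in its column
    for (i, j) in pivots:
        if any(A[k][j] != 0 for k in range(len(A)) if k != i):
            return False
    return True

# the matrices A_1 and A_4 of Example A.3.2 below: A_1 is in REF only,
# while A_4 is in RREF
A1 = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
A4 = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
assert not is_rref(A1) and is_rref(A4)
```

A separate REF-only checker would simply omit the final loop over pivot columns.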

**Example A.3.2.** The following matrices are all in REF:

\[A_1=\left[ \begin{array}{cccc} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{array} \right], A_2= \left[ \begin{array}{cccc} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right], A_3= \left[ \begin{array}{cccc} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right], A_4= \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right],\\ A_5= \left[ \begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array} \right], A_6= \left[ \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array} \right], A_7=\left[ \begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right], A_8=\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right].\]

However, only \(A_4\) through \(A_8\) are in RREF, as you should verify. Moreover, if we take the transpose of each of these matrices (as defined in Section A.5.1), then only \(A^T_8\) remains in RREF, since each of the other transposes has a zero row lying above a non-zero row.

**Example A.3.3.**

1. Consider the following matrix in RREF:

\[A = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right].\]

Given any vector \(b = (b_i ) \in \mathbb{F}^4 \), the matrix equation \(Ax = b\) corresponds to the system of equations

\[\left. \begin{array}{cccccc} x_1 &~~ &~~~~ &~~~ & = & b_1 \\ & x_2 & & & = & b_2 \\ & & x_3 & & = &b_3 \\ & & & x_4 & = & b_4\end{array} \right\}.\]

Since \(A\) is in RREF (in fact, \(A = I_4\) is the \(4 \times 4\) identity matrix), we can immediately conclude that the matrix equation \(Ax = b\) has the solution \(x = b\) for any choice of \(b\).

Moreover, as we will see in Section A.4.2, \(x = b\) is the only solution to this system.

2. Consider the following matrix in RREF:

\[A = \left[ \begin{array}{cccccc} 1 & 6 & 0 & 0 & 4 & -2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 5 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right].\]

Given any vector \(b = (b_i ) \in \mathbb{F}^4 \), the matrix equation \(Ax = b\) corresponds to the system

\[ \left. \begin{array}{cccccccccccc} x_1 & + & 6x_2 & & & & + & 4x_5 & - & 2x_6 & = & b_1 \\ & & & & x_3 & &+ &3x_5 &+& x_6 & =& b_2 \\ && & & & x_4 & + & 5x_5 &+& 2x_6 &= &b_3 \\ & & & & & & & & & 0 &=&b_4\end{array} \right\}.\]

Since \(A\) is in RREF, we can immediately conclude a number of facts about solutions to this system. First of all, solutions exist if and only if \(b_4 = 0.\) Moreover, by “solving for the pivots”, we see that the system reduces to

\[\left. \begin{array}{cccccccc} x_1 & = & b_1 & -\ 6x_2 & - & 4x_5 & + & 2x_6 \\ x_3 & = & b_2 & & - & 3x_5 & - & x_6 \\ x_4 & = & b_3 & & - & 5x_5 & - & 2x_6 \end{array} \right\},\]

and so there is only enough information to specify values for \(x_1, x_3,\) and \(x_4\) in terms of the otherwise arbitrary values for \(x_2 , x_5 ,\) and \(x_6 .\)

In this context, \(x_1 , x_3 ,\) and \(x_4\) are called **leading variables** since these are the variables corresponding to the pivots in \(A\). We similarly call \(x_2 , x_5 \), and \(x_6\) **free variables** since the leading variables have been expressed in terms of these remaining variables. In particular, given any scalars \(\alpha , \beta, \gamma \in \mathbb{F},\) it follows that the vector

\[x= \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{array} \right]=\left[ \begin{array}{c} b_1 - 6\alpha -4\beta+2\gamma \\ \alpha \\ b_2-3\beta -\gamma \\ b_3-5\beta -2\gamma \\ \beta \\ \gamma \end{array} \right]= \left[ \begin{array}{c} b_1 \\ 0 \\ b_2 \\ b_3 \\ 0 \\ 0 \end{array} \right]+\left[ \begin{array}{c} -6\alpha \\ \alpha \\ 0 \\ 0 \\ 0 \\ 0 \end{array} \right]+\left[ \begin{array}{c} -4\beta \\ 0 \\ -3\beta \\ -5\beta \\ \beta \\ 0 \end{array} \right]+\left[ \begin{array}{c} 2\gamma \\ 0 \\ -\gamma \\ -2\gamma \\ 0 \\ \gamma \end{array} \right]\]

must satisfy the matrix equation \(Ax = b.\) One can also verify that every solution to the matrix equation must be of this form. It then follows that the set of all solutions should somehow be “three dimensional”.

As the above examples illustrate, a matrix equation having coefficient matrix in RREF corresponds to a system of equations that can be solved with only a small amount of computation. Somewhat amazingly, any matrix can be factored into a product that involves exactly one matrix in RREF and one or more of the matrices defined as follows.

**Definition A.3.4**. A square matrix \(E \in \mathbb{F}^{m \times m}\) is called an **elementary matrix** if it has one of the following forms:

1. (row exchange, a.k.a. “row swap”, matrix) \(E\) is obtained from the identity matrix \(I_m\) by interchanging the row vectors \(I_m^{(r, \cdot)}\) and \(I_m^{(s, \cdot)}\) for some particular choice of positive integers \(r, s \in \{1, 2, \ldots , m\}.\) I.e.,

\[E = I_m - E_{rr} - E_{ss} + E_{rs} + E_{sr},\]

where each \(E_{ij}\) denotes the standard basis vector for \(\mathbb{F}^{m \times m}\) having “\(i, j\) entry” equal to one and all other entries equal to zero, as defined in Section A.2.1.


2. (row scaling matrix) \(E\) is obtained from the identity matrix \(I_m\) by replacing the row vector \(I_m^{(r, \cdot)}\) with \(\alpha I_m^{(r, \cdot)}\) for some choice of non-zero scalar \(0 \neq \alpha \in \mathbb{F}\) and some choice of positive integer \(r \in \{1, 2, \ldots , m\}.\) I.e.,

\[E = I_m + (\alpha - 1) E_{rr},\]

where \(E_{rr}\) is the matrix having “\(r, r\) entry” equal to one and all other entries equal to zero. (Recall that \(E_{rr}\) was deﬁned in Section A.2.1 as a standard basis vector for the vector space \(\mathbb{F}^{m \times m} .\))

3. (row combination, a.k.a. “row sum”, matrix) \(E\) is obtained from the identity matrix \(I_m\) by replacing the row vector \(I_m^{(r, \cdot)}\) with \(I_m^{(r, \cdot)} + \alpha I_m^{(s, \cdot)}\) for some choice of scalar \(\alpha \in \mathbb{F}\) and some choice of positive integers \(r, s \in \{1, 2, \ldots , m\}.\) I.e.,

\[E = I_m + \alpha E_{rs},\]

where \(E_{rs}\) is the matrix having “\(r, s\) entry” equal to one and all other entries equal to zero. (\(E_{rs}\) was also deﬁned in Section A.2.1 as a standard basis vector for \(\mathbb{F}^{m \times m}\) .)

The “elementary” in the name “elementary matrix” comes from the correspondence between these matrices and so-called “elementary operations” on systems of equations. In particular, each of the elementary matrices is clearly invertible (in the sense deﬁned in Section A.2.3), just as each “elementary operation” is itself completely reversible. We illustrate this correspondence in the following example.
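This correspondence is easy to experiment with numerically. The following Python/NumPy sketch (the constructor names are ours) builds the three types of elementary matrices and confirms that each is invertible, with the inverse again an elementary matrix of the same type:

```python
import numpy as np

def row_swap(m, r, s):
    """Elementary matrix exchanging rows r and s of I_m (0-indexed)."""
    E = np.eye(m)
    E[[r, s]] = E[[s, r]]
    return E

def row_scale(m, r, alpha):
    """Elementary matrix scaling row r by a non-zero scalar alpha."""
    E = np.eye(m)
    E[r, r] = alpha
    return E

def row_combine(m, r, s, alpha):
    """Elementary matrix replacing row r by (row r) + alpha * (row s)."""
    E = np.eye(m)
    E[r, s] = alpha
    return E

# each elementary operation is reversible: undoing a swap is the same
# swap, undoing a scaling by alpha is scaling by 1/alpha, and undoing
# the addition of alpha*(row s) is the addition of -alpha*(row s)
m = 4
assert np.allclose(row_swap(m, 0, 2) @ row_swap(m, 0, 2), np.eye(m))
assert np.allclose(row_scale(m, 1, 5.0) @ row_scale(m, 1, 0.2), np.eye(m))
assert np.allclose(row_combine(m, 0, 2, 3.0) @ row_combine(m, 0, 2, -3.0),
                   np.eye(m))
```

Left-multiplying any matrix by one of these performs the corresponding row operation on it.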

**Example A.3.5.** Deﬁne \(A, x,\) and \(b\) by

\[ A= \left[ \begin{array}{ccc} 2 & 5 & 3 \\ 1 & 2 & 3 \\ 1 & 0 & 8 \end{array} \right],x= \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right], \rm{~and~} b= \left[ \begin{array}{c} 5 \\ 4 \\ 9 \end{array} \right].\]

We illustrate the correspondence between elementary matrices and “elementary” operations on the system of linear equations corresponding to the matrix equation \(Ax = b\), as follows.

__System of Equations__

\[\left. \begin{array}{ccccccc} 2x_1 &+& 5x_2 &+ & 3x_3&=&5 \\ x_1&+ & 2x_2&+ & 3x_3&=&4 \\ x_1 &&& + & 8x_3&=&9 \end{array} \right.\]

__Corresponding Matrix Equation__

\[ Ax = b\]

To begin solving this system, one might want to either multiply the ﬁrst equation through by \(1/2\) or interchange the ﬁrst equation with one of the other equations. From a computational perspective, it is preferable to perform an interchange since multiplying through by \(1/2\) would unnecessarily introduce fractions. Thus, we choose to interchange the ﬁrst and second equation in order to obtain

__System of Equations__

\[\left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4 \\ 2x_1 &+& 5x_2 &+ & 3x_3&=&5\\ x_1 &&& + & 8x_3&=&9 \end{array} \right.\]

__Corresponding Matrix Equation__

\[ E_0Ax=E_0b, \rm{~where~} E_0= \left[ \begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right].\]

Another reason for performing the above interchange is that it now allows us to use more convenient “row combination” operations when eliminating the variable \(x_1\) from all but one of the equations. In particular, we can multiply the ﬁrst equation through by \(−2\) and add it to the second equation in order to obtain

__System of Equations__

\[ \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4 \\ && x_2 &- & 3x_3&=&-3\\ x_1 &&& + & 8x_3&=&9 \end{array} \right. \]

__Corresponding Matrix Equation__

\[ E_1E_0Ax=E_1E_0b, \rm{~where~} E_1= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]. \]

Similarly, in order to eliminate the variable \(x_1\) from the third equation, we can next multiply the ﬁrst equation through by \(−1\) and add it to the third equation in order to obtain

__System of Equations__

\[ \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4 \\ && x_2 &- & 3x_3&=&-3\\ &&-2x_2& + & 5x_3&=&5 \end{array} \right. \]

__Corresponding Matrix Equation__

\[ E_2E_1E_0Ax=E_2E_1E_0b, \rm{~where~} E_2= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{array} \right]. \]

Now that the variable \(x_1\) only appears in the ﬁrst equation, we can somewhat similarly isolate the variable \(x_2\) by multiplying the second equation through by \(2\) and adding it to the third equation in order to obtain

__System of Equations__

\[ \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4 \\ && x_2 &- & 3x_3&=&-3\\ &&& & -x_3&=&-1 \end{array} \right. \]

__Corresponding Matrix Equation__

\[ E_3 \cdots E_0Ax= E_3 \cdots E_0b, \rm{~where~} E_3= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right]. \]

Finally, in order to complete the process of transforming the coeﬃcient matrix into REF, we need only rescale row three by \(−1\). This corresponds to multiplying the third equation through by \(−1\) in order to obtain

__System of Equations__

\[ \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4 \\ && x_2 &- & 3x_3&=&-3\\ &&& & x_3&=&1 \end{array} \right. \]

__Corresponding Matrix Equation__

\[E_4 \cdots E_0Ax= E_4 \cdots E_0b, \rm{~where~} E_4= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{array} \right]. \]

Now that the coeﬃcient matrix is in REF, we can already solve for the variables \(x_1 , x_2 ,\) and \(x_3\) using a process called back substitution. In other words, it should be clear from the third equation that \(x_3 = 1.\) Using this value and solving for \(x_2\) in the second equation, it then follows that

\[ x_2 = -3 + 3x_3 = -3 + 3 = 0.\]

Similarly, by solving the ﬁrst equation for \(x_1\) , it follows that

\[x_1 = 4 - 2x_2 - 3x_3 = 4 - 3 = 1.\]

From a computational perspective, this process of back substitution can be applied to solve any system of equations when the coeﬃcient matrix of the corresponding matrix equation is in REF. However, from an algorithmic perspective, it is often more useful to continue “row reducing” the coeﬃcient matrix in order to produce a coeﬃcient matrix in full RREF.

Here, there are several next natural steps that we could now perform in order to move toward RREF. Since we have so far worked “from the top down, from left to right”, we choose to now work “from bottom up, from right to left”. In other words, we now multiply the third equation through by \(3\) and then add it to the second equation in order to obtain

__System of Equations__

\[ \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4 \\ && x_2 & & &=&0\\ &&& & x_3&=&1 \end{array} \right. \]

__Corresponding Matrix Equation__

\[ E_5 \cdots E_0Ax= E_5 \cdots E_0b, \rm{~where~} E_5= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{array} \right]. \]

Next, we can multiply the third equation through by \(−3\) and add it to the ﬁrst equation in order to obtain

__System of Equations__

\[ \left. \begin{array}{ccccccc} x_1&+ & 2x_2& & &=&1 \\ && x_2 & & &=&0\\ &&& & x_3&=&1 \end{array} \right. \]

__Corresponding Matrix Equation__

\[ E_6 \cdots E_0Ax= E_6 \cdots E_0b, \rm{~where~} E_6= \left[ \begin{array}{ccc} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]. \]

Finally, we can multiply the second equation through by \(−2\) and add it to the ﬁrst equation in order to obtain

__System of Equations__

\[ \left. \begin{array}{ccccccc} x_1& & & & &=&1 \\ && x_2 & & &=&0\\ &&& & x_3&=&1 \end{array} \right. \]

__Corresponding Matrix Equation__

\[ E_7 \cdots E_0Ax= E_7 \cdots E_0b, \rm{~where~} E_7= \left[ \begin{array}{ccc} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]. \]

Now it should be extremely clear that we obtained a correct solution when using back substitution on the linear system

\[ E_4 \cdots E_0Ax= E_4 \cdots E_0b. \]

However, in many applications, it is not enough to merely ﬁnd a solution. Instead, it is important to describe every solution. As we will see in the remaining sections of these notes, this requires the machinery of Linear Algebra. In particular, we will need the theory of vector spaces and linear maps.

To close this section, we take a closer look at the following expression obtained from the above analysis:

\[E_7 E_6 \cdots E_1 E_0 A = I_3 .\]

Because of the way in which we have deﬁned elementary matrices, it should be clear that each of the matrices \(E_0 , E_1 ,\ldots , E_7\) is invertible. Thus, we can use Theorem A.2.9 in order to “solve” for \(A\):

\[A = (E_7 E_6 \cdots E_1 E_0 )^{-1} I_3 = E_0^{-1} E_1^{-1} \cdots E_7^{-1} I_3 .\]

In eﬀect, since the inverse of an elementary matrix is itself easily seen to be an elementary matrix, this has factored \(A\) into the product of eight elementary matrices (namely, \(E_0^{-1} , E_1^{-1} , \ldots , E_7^{-1}\) ) and one matrix in RREF (namely, \(I_3\) ). Moreover, because each elementary matrix is invertible, we can conclude that \(x\) solves \(Ax = b\) if and only if \(x\) solves

\[(E_7 E_6 \cdots E_1 E_0 A )x = (I_3)x = (E_7 E_6 \cdots E_1 E_0)b.\]

Consequently, given any linear system, one can use Gaussian elimination in order to reduce the problem to solving a linear system whose coeﬃcient matrix is in RREF.
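The sequence of row operations illustrated above can be automated. The following Python/NumPy function is a simplified sketch (it adds partial pivoting, a numerical-stability refinement that is unnecessary in exact hand computation) that reduces a matrix to RREF using only the three kinds of elementary row operations:

```python
import numpy as np

def rref(A, tol=1e-12):
    """Row-reduce a copy of A to reduced row-echelon form by
    Gaussian elimination with partial pivoting."""
    R = A.astype(float)
    m, n = R.shape
    pivot_row = 0
    for j in range(n):
        if pivot_row >= m:
            break
        # pick the largest entry in column j at or below pivot_row
        k = pivot_row + np.argmax(np.abs(R[pivot_row:, j]))
        if abs(R[k, j]) < tol:
            continue                           # no pivot in this column
        R[[pivot_row, k]] = R[[k, pivot_row]]  # row exchange
        R[pivot_row] /= R[pivot_row, j]        # row scaling
        for i in range(m):                     # row combinations
            if i != pivot_row:
                R[i] -= R[i, j] * R[pivot_row]
        pivot_row += 1
    return R

A = np.array([[2., 5, 3],
              [1, 2, 3],
              [1, 0, 8]])
# A is invertible, so its RREF is the identity, as in Example A.3.5
assert np.allclose(rref(A), np.eye(3))
```

A production implementation would also return the list of elementary operations performed, which is exactly the factorization discussed next.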

Similarly, we can conclude that the inverse of \(A\) is

\[A^{-1} = E_7 E_6 \cdots E_1 E_0=\left[ \begin{array}{ccc} 16 & -40 & 9 \\ -5 & 13 & -3 \\ -2 & 5 & -1 \end{array} \right].\]

Having computed this product, one could essentially “reuse” much of the above computation in order to solve the matrix equation \(Ax = b'\) for several different right-hand sides \(b' \in \mathbb{F}^3\). The process of “re-solving” a linear system is a common practice in applied mathematics.
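As a numerical sanity check (a Python/NumPy sketch), one can multiply together the eight elementary matrices from Example A.3.5 and then reuse the resulting product to solve \(Ax = b'\) for several right-hand sides:

```python
import numpy as np

A = np.array([[2., 5, 3],
              [1, 2, 3],
              [1, 0, 8]])

# the eight elementary matrices from Example A.3.5, applied in order
E0 = np.array([[0., 1, 0], [1, 0, 0], [0, 0, 1]])   # swap rows 1 and 2
E1 = np.array([[1., 0, 0], [-2, 1, 0], [0, 0, 1]])  # R2 <- R2 - 2 R1
E2 = np.array([[1., 0, 0], [0, 1, 0], [-1, 0, 1]])  # R3 <- R3 - R1
E3 = np.array([[1., 0, 0], [0, 1, 0], [0, 2, 1]])   # R3 <- R3 + 2 R2
E4 = np.array([[1., 0, 0], [0, 1, 0], [0, 0, -1]])  # R3 <- -R3
E5 = np.array([[1., 0, 0], [0, 1, 3], [0, 0, 1]])   # R2 <- R2 + 3 R3
E6 = np.array([[1., 0, -3], [0, 1, 0], [0, 0, 1]])  # R1 <- R1 - 3 R3
E7 = np.array([[1., -2, 0], [0, 1, 0], [0, 0, 1]])  # R1 <- R1 - 2 R2

A_inv = E7 @ E6 @ E5 @ E4 @ E3 @ E2 @ E1 @ E0       # equals A^{-1}
assert np.allclose(A_inv @ A, np.eye(3))

# the product can now be "reused" for several right-hand sides b'
for b in [np.array([5., 4, 9]), np.array([1., 0, 0]), np.array([0., 2, -1])]:
    x = A_inv @ b
    assert np.allclose(A @ x, b)
```

Note that each right-hand side now costs only a matrix-vector product, not a fresh elimination.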

**A.3.2 Solving homogeneous linear systems**

In this section, we study the solutions for an important special class of linear systems. As we will see in the next section, though, solving any linear system is fundamentally dependent upon knowing how to solve these so-called **homogeneous systems**.

As usual, we use \(m, n \in \mathbb{Z}_+\) to denote arbitrary positive integers.

**Definition A.3.6.** The system of linear equations, System (A.1.1), is called a **homogeneous system** if the right-hand side of each equation is zero. In other words, a homogeneous system corresponds to a matrix equation of the form

\[Ax = 0,\]

where \(A \in \mathbb{F}^{m \times n}\) is an \(m \times n\) matrix and \(x\) is an \(n\)-tuple of unknowns. We also call the set

\[N = \{v \in \mathbb{F}^n | Av = 0\}\]

the **solution space** for the homogeneous system corresponding to \(Ax = 0.\)

When describing the solution space for a homogeneous linear system, there are three important cases to keep in mind:

**Deﬁnition A.3.7.** The system of linear equations System (A.1.1) is called

- **overdetermined** if \(m > n.\)
- **square** if \(m = n.\)
- **underdetermined** if \(m < n.\)

In particular, we can say a great deal about underdetermined homogeneous systems, which we state as a corollary to the following more general result.

**Theorem A.3.8**.* Let* \(N\) *be the solution space for the homogeneous linear system corresponding to the matrix equation* \(Ax = 0,\) *where* \(A \in \mathbb{F}^{m \times n} .\) *Then*

1. *the zero vector* \(0 \in N.\)

2. \(N\) *is a subspace of the vector space* \(\mathbb{F}^n.\)

This is an amazing theorem. Since \(N\) is a subspace of \(\mathbb{F}^n\) , we know that either \(N\) will contain exactly one element (namely, the zero vector) or \(N\) will contain inﬁnitely many elements.

**Corollary A.3.9.** *Every homogeneous system of linear equations is solved by the zero vector. Moreover, every underdetermined homogeneous system has infinitely many solutions.*

We call the zero vector the **trivial solution** for a homogeneous linear system. The fact that every homogeneous linear system has the trivial solution thus reduces solving such a system to determining whether solutions other than the trivial solution exist.

One method for ﬁnding the solution space of a homogeneous system is to ﬁrst use Gaussian elimination (as demonstrated in Example A.3.5) in order to factor the coeﬃcient matrix of the system. Then, because the original linear system is homogeneous, the homogeneous system corresponding to the resulting RREF matrix will have the same solutions as the original system. In other words, if a given matrix \(A\) satisﬁes

\[E_k E_{k-1} \cdots E_0 A = A_0 ,\]

where each \(E_i\) is an elementary matrix and \(A_0\) is an RREF matrix, then the matrix equation \(Ax = 0\) has the exact same solution set as \(A_0 x = 0\) since \(E_0^{-1} E_1^{-1} \cdots E_k^{-1} 0 = 0.\)
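Both facts are easy to check numerically on a small homogeneous system (the same coefficient matrix reappears in Example A.3.10 below). The following Python/NumPy sketch verifies that every multiple of a spanning vector solves the system, and that left-multiplying by an invertible matrix does not change the solution set:

```python
import numpy as np

A = np.array([[1., 0, 1],
              [0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]])

n = np.array([-1., -1, 1])   # claimed spanning vector for the solution space N

# every scalar multiple of n is a solution of the homogeneous system
for alpha in [0.0, 1.0, -2.5, 7.0]:
    assert np.allclose(A @ (alpha * n), 0)

# an elementary (hence invertible) matrix does not change the solution
# set: if (E A) v = 0, then A v = E^{-1} 0 = 0
E = np.array([[1., 0, 0, 0],
              [0, 1, 0, 0],
              [2, 0, 1, 0],   # row combination: R3 <- R3 + 2 R1
              [0, 0, 0, 1]])
assert np.allclose((E @ A) @ n, 0)
```

This is precisely why Gaussian elimination can be applied to \(A\) before reading off \(N\).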

**Example A.3.10.** In the following examples, we illustrate the process of determining the solution space for a homogeneous linear system having coeﬃcient matrix in RREF.

1. Consider the matrix equation \(Ax = 0\), where \(A\) is the matrix given by

\[ A= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right].\]

This corresponds to an overdetermined homogeneous system of linear equations. Moreover, since there are no free variables (as deﬁned in Example A.3.3), it should be clear that this system has only the trivial solution. Thus, \(N = \{0\}.\)

2. Consider the matrix equation \(Ax = 0,\) where \(A\) is the matrix given by

\[A= \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right].\]

This corresponds to an overdetermined homogeneous system of linear equations. Unlike the above example, we see that \(x_3\) is a free variable for this system, and so we would expect the solution space to contain more than just the zero vector. As in Example A.3.3, we can solve for the leading variables in terms of the free variable in order to obtain

\[ \left. \begin{array}{cccc} x_1 & = & -& x_3 \\ x_2 & = & -& x_3 \\ \end{array} \right\}.\]

It follows that, given any scalar \(\alpha \in \mathbb{F}\), every vector of the form

\[ x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} -\alpha \\ -\alpha \\ \alpha \end{array} \right]=\alpha \left[ \begin{array}{c} -1 \\ -1 \\ 1 \end{array} \right] \]

is a solution to \(Ax = 0.\) Therefore,

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 = -x_3 , x_2 = -x_3\} = span ((-1, -1, 1)) .\]

3. Consider the matrix equation \(Ax = 0,\) where \(A\) is the matrix given by

\[ A= \left[ \begin{array}{ccc} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]. \]

This corresponds to a square homogeneous system of linear equations with two free variables. Thus, using the same technique as in the previous example, we can solve for the leading variable in order to obtain \(x_1 = -x_2 - x_3 .\) It follows that, given any scalars \(\alpha , \beta \in \mathbb{F},\) every vector of the form

\[ x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} -\alpha - \beta \\ \alpha \\ \beta \end{array} \right]=\alpha \left[ \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \right] + \beta \left[ \begin{array}{c} -1 \\ 0 \\ 1 \end{array} \right] \]

is a solution to \(Ax = 0.\) Therefore,

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 + x_2 + x_3 = 0\} = span ((-1, 1, 0), (-1, 0, 1)).\]

**A.3.3 Solving inhomogeneous linear systems**

In this section, we demonstrate the relationship between arbitrary linear systems and homogeneous linear systems. Speciﬁcally, we will see that it takes little more work to solve a general linear system than it does to solve the homogeneous system associated to it.

As usual, we use \(m, n \in \mathbb{Z}_+\) to denote arbitrary positive integers.

**Definition A.3.11.** The system of linear equations System (A.1.1) is called an **inhomogeneous system** if the right-hand side of at least one equation is not zero. In other words, an inhomogeneous system corresponds to a matrix equation of the form

\[Ax = b,\]

where \(A \in \mathbb{F}^{m \times n}\) is an \(m \times n\) matrix, \(x\) is an \(n\)-tuple of unknowns, and \(b \in \mathbb{F}^m\) is a vector having at least one non-zero component. We also call the set

\[U = \{v \in \mathbb{F}^n ~|~ Av = b\}\]

the solution set for the linear system corresponding to \(Ax = b.\)

As illustrated in Example A.3.3, the zero vector cannot be a solution for an inhomogeneous system. Consequently, the solution set \(U\) for an inhomogeneous linear system will never be a subspace of any vector space. Instead, it will be a related algebraic structure as described in the following theorem.

**Theorem A.3.12.** *Let* \(U\) *be the solution set for the inhomogeneous linear system corresponding to the matrix equation* \(Ax = b,\) *where* \(A \in \mathbb{F}^{m \times n}\) *and* \(b \in \mathbb{F}^m\) *is a vector having at least one non-zero component. Then, given any element* \(u \in U,\) *we have that*

\[U = u + N = \{u + n ~|~ n \in N\} ,\]

*where* \(N\) *is the solution space to* \(Ax = 0.\) *In other words, if* \(B = (n^{(1)} , n^{(2)} , \ldots , n^{(k)} )\) *is a list of vectors forming a basis for* \(N\), *then every element of* \(U\) *can be written in the form*

\[u + \alpha_1 n^{(1)} + \alpha_2 n^{(2)} + \ldots + \alpha_k n^{(k)}\]

*for some choice of scalars* \(\alpha_1 , \alpha_2 , \ldots , \alpha_k \in \mathbb{F}.\)

As a consequence of this theorem, we can conclude that inhomogeneous linear systems behave a lot like homogeneous systems. The main diﬀerence is that inhomogeneous systems are not necessarily solvable. This, then, creates three possibilities: an inhomogeneous linear system will either have no solution, a unique solution, or inﬁnitely many solutions. An important special case is as follows.
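The decomposition \(U = u + N\) of Theorem A.3.12 is easy to verify numerically on a small example. In the following Python/NumPy sketch (the values of \(b\) and the particular solution are illustrative choices of ours), \(u\) is a particular solution and \(n\) spans the associated homogeneous solution space:

```python
import numpy as np

A = np.array([[1., 0, 1],
              [0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]])
b = np.array([4., 7, 0, 0])   # third and fourth components zero, so solvable

u = np.array([4., 7, 0])      # a particular solution: A @ u == b
n = np.array([-1., -1, 1])    # basis vector for the solution space N of Ax = 0

assert np.allclose(A @ u, b)
assert np.allclose(A @ n, 0)

# every element of u + N solves Ax = b, since A(u + alpha n) = b + 0
for alpha in [0.0, 1.0, -3.0]:
    assert np.allclose(A @ (u + alpha * n), b)
```

Changing \(b\) shifts the particular solution \(u\) but leaves \(N\) unchanged.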

**Corollary A.3.13.** *Every overdetermined inhomogeneous linear system will necessarily be unsolvable for some choice of values for the right-hand sides of the equations. *

The solution set \(U\) for an inhomogeneous linear system is called an **affine** subspace of \(\mathbb{F}^n\) since it is a genuine subspace of \(\mathbb{F}^n\) that has been “offset” by a vector \(u \in \mathbb{F}^n.\) Any set having this structure might also be called a **coset** (when used in the context of Group Theory) or a **linear manifold** (when used in a geometric context such as a discussion of intersecting hyperplanes).

In order to actually ﬁnd the solution set for an inhomogeneous linear system, we rely on Theorem A.3.12. Given an \(m \times n\) matrix \(A \in \mathbb{F}^{m \times n}\) and a non-zero vector \(b \in \mathbb{F}^m\) , we call \(Ax = 0\) the **associated homogeneous matrix equation** to the inhomogeneous matrix equation \(Ax = b.\) Then, according to Theorem A.3.12, \(U\) can be found by ﬁrst ﬁnding the solution space \(N\) for the associated equation \(Ax = 0\) and then ﬁnding any so-called **particular solution** \(u \in \mathbb{F}^n\) to \(Ax = b.\)

As with homogeneous systems, one can ﬁrst use Gaussian elimination in order to factorize \(A,\) and so we restrict the following examples to the special case of RREF matrices.

**Example A.3.14**. The following examples use the same matrices as in Example A.3.10.

1. Consider the matrix equation \(Ax = b,\) where \(A\) is the matrix given by

\[ A= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right] \]

and \(b \in \mathbb{F}^4\) has at least one non-zero component. Then \(Ax = b\) corresponds to an overdetermined inhomogeneous system of linear equations and will not necessarily be solvable for all possible choices of \(b.\)

In particular, note that the bottom row \(A^{(4, \cdot)}\) of \(A\) corresponds to the equation

\[0 = b_4 ,\]

from which \(Ax = b\) has no solution unless the fourth component of \(b\) is zero. Furthermore, the remaining rows of \(A\) correspond to the equations

\[x_1 = b_1 , x_2 = b_2 , \rm{~and~} x_3 = b_3 .\]

It follows that, given any vector \(b \in \mathbb{F}^4\) with fourth component zero, \(x = (b_1 , b_2 , b_3)\) is the only solution to \(Ax = b.\) In other words, \(U = \{(b_1 , b_2 , b_3)\}.\)

2. Consider the matrix equation \(Ax = b,\) where \(A\) is the matrix given by

\[ A= \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right] \]

and \(b \in \mathbb{F}^4 .\) This corresponds to an overdetermined inhomogeneous system of linear equations. Note, in particular, that the bottom two rows of the matrix correspond to the equations \(0 = b_3\) and \(0 = b_4 ,\) from which \(Ax = b\) has no solution unless the third and fourth components of the vector \(b\) are both zero. Furthermore, we conclude from the remaining rows of the matrix that \(x_3\) is a free variable for this system and that

\[ \left. \begin{array}{ccccc} x_1 & =&b_1 & -& x_3 \\ x_2 & = &b_2& -& x_3 \\ \end{array} \right\}. \]

It follows that, given any scalar \(\alpha \in \mathbb{F},\) every vector of the form

\[ x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} b_1 - \alpha \\ b_2-\alpha \\ \alpha \end{array} \right]= \left[ \begin{array}{c} b_1 \\ b_2 \\ 0 \end{array} \right] + \alpha \left[ \begin{array}{c} -1 \\ -1 \\ 1 \end{array} \right] = u+ \alpha n \]

is a solution to \(Ax = b.\) Recall from Example A.3.10 that the solution space for the associated homogeneous matrix equation \(Ax = 0\) is

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 = -x_3 , x_2 = -x_3\} = span ((-1, -1, 1)) .\]

Thus, in the language of Theorem A.3.12, we have that \(u\) is a particular solution for

\(Ax = b\) and that \((n)\) is a basis for \(N.\) Therefore, the solution set for \(Ax = b\) is

\[U = (b_1 , b_2 , 0) + N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 = b_1 - x_3 , x_2 = b_2 - x_3\} .\]

3. Consider the matrix equation \(Ax = b,\) where \(A\) is the matrix given by

\[ A= \left[ \begin{array}{ccc} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right] \]

and \(b \in \mathbb{F}^3\) . This corresponds to a square inhomogeneous system of linear equations with two free variables. As above, this system has no solutions unless \(b_2 = b_3 = 0,\) and, given any scalars \(\alpha , \beta \in \mathbb{F},\) every vector of the form

\[ x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} b_1 - \alpha -\beta \\ \alpha \\ \beta \end{array} \right]= \left[ \begin{array}{c} b_1 \\ 0 \\ 0 \end{array} \right] + \alpha \left[ \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \right] + \beta \left[ \begin{array}{c} -1 \\ 0 \\ 1 \end{array} \right] = u+ \alpha n^{(1)}+ \beta n^{(2)} \]

is a solution to \(Ax = b.\) Recall from Example A.3.10 that the solution space for the associated homogeneous matrix equation \(Ax = 0\) is

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 + x_2 + x_3 = 0\} = span ((-1, 1, 0), (-1, 0, 1)) .\]

Thus, in the language of Theorem A.3.12, we have that \(u\) is a particular solution for \(Ax = b\) and that \((n^{(1)} , n^{(2)} )\) is a basis for \(N.\) Therefore, the solution set for \(Ax = b\) is

\[U = (b_1 , 0, 0) + N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 + x_2 + x_3 = b_1\} .\]

**A.3.4 Solving linear systems with LU-factorization**

Let \(n \in \mathbb{Z}_+\) be a positive integer, and suppose that \(A \in \mathbb{F}^{n \times n}\) is an upper triangular matrix and that \(b \in \mathbb{F}^n\) is a column vector. Then, in order to solve the matrix equation \(Ax = b\), there is no need to apply Gaussian elimination. Instead, we can exploit the triangularity of \(A\) in order to directly obtain a solution.

Using the notation in System (A.1.1), note that the last equation in the linear system corresponding to \(Ax = b\) can only involve the single unknown \(x_n \), and so we can obtain the solution

\[x_n = \frac{b_n}{a_{nn}}\]

as long as \(a_{nn} \neq 0.\) If \(a_{nn} = 0,\) then we must be careful to distinguish between the two cases in which \(b_n = 0\) or \(b_n \neq 0.\) Thus, for reasons that will become clear below, we assume that the diagonal elements of \(A\) are all nonzero. Under this assumption, there is no ambiguity in substituting the solution for \(x_n\) into the penultimate (a.k.a. second-to-last) equation. Since \(A\) is upper triangular, the penultimate equation then involves only the single remaining unknown \(x_{n-1} ,\) and so we obtain the solution

\[ x_{n-1} = \frac{b_{n-1}-a_{n-1,n}x_n }{a_{n-1,n-1}}. \]

We can then similarly substitute the solutions for \(x_n\) and \(x_{n-1}\) into the antepenultimate (a.k.a. third-to-last) equation in order to solve for \(x_{n-2},\) and so on until a complete solution is found. In particular,

\[ x_1 = \frac{b_1-\sum_{k=2}^{n}a_{1k}x_{k} }{a_{1 1}}. \]

As in Example A.3.5, we call this process **back substitution**. Given an arbitrary linear system, back substitution essentially allows us to halt the Gaussian elimination procedure and immediately obtain a solution for the system as soon as an upper triangular matrix (possibly in REF or even RREF) has been obtained from the coeﬃcient matrix.
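The back substitution process described above can be sketched computationally. The following is a minimal illustration, not part of the original text; the function name and the plain list-of-lists matrix representation are our own choices, and we assume (as in the text) that every diagonal entry is nonzero.

```python
def back_substitute(A, b):
    """Solve Ax = b for an upper triangular A with nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # solve for x_n first, then x_{n-1}, ...
        # subtract the already-determined terms a_{ik} x_k for k > i
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# A small upper triangular example:
U = [[2, 3, 4], [0, -1, 2], [0, 0, -2]]
y = [6, 4, -2]
print(back_substitute(U, y))  # [4.0, -2.0, 1.0]
```

Note that the loop runs from the last row to the first, exactly mirroring the order in which the unknowns become available.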

A similar procedure can be applied when \(A\) is lower triangular. Again using the notation in System (A.1.1), the ﬁrst equation contains only \(x_1\) , and so

\[ x_1= \frac{b_1}{a_{1 1}}.\]

We are again assuming that the diagonal entries of \(A\) are all nonzero. Then, acting similarly to back substitution, we can substitute the solution for \(x_1\) into the second equation in order to obtain

\[ x_2= \frac{b_2-a_{2 1}x_1}{a_{2 2}}.\]

Continuing this process, we have created a **forward substitution** procedure. In particular,

\[ x_n = \frac{b_n-\sum_{k=1}^{n-1}a_{nk}x_{k} }{a_{n n}}. \]
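Forward substitution can be sketched in the same style. As before, this is our own illustration (the function name is not from the text), and we assume all diagonal entries are nonzero.

```python
def forward_substitute(A, b):
    """Solve Ax = b for a lower triangular A with nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):  # solve for x_1 first, then x_2, ...
        # subtract the already-determined terms a_{ik} x_k for k < i
        s = sum(A[i][k] * x[k] for k in range(i))
        x[i] = (b[i] - s) / A[i][i]
    return x

# A small lower triangular example:
L = [[1, 0, 0], [2, 1, 0], [2, -2, 1]]
b = [6, 16, 2]
print(forward_substitute(L, b))  # [6.0, 4.0, -2.0]
```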

More generally, suppose that \(A \in \mathbb{F}^{n \times n}\) is an arbitrary square matrix for which there exist a lower triangular matrix \(L \in \mathbb{F}^{n \times n}\) and an upper triangular matrix \(U \in \mathbb{F}^{n \times n}\) such that \(A = LU.\) When such matrices exist, we call \(A = LU\) an **LU-factorization** (a.k.a. **LU-decomposition**) of \(A\). The benefit of such a factorization is that it allows us to exploit the triangularity of \(L\) and \(U\) when solving linear systems having coefficient matrix \(A\).

To see this, suppose that \(b \in \mathbb{F}^n\) is a column vector. (As above, we also assume that none of the diagonal entries in either \(L\) or \(U\) is zero.) Furthermore, set \(y = Ux\), where \(x\) is the as-yet-unknown solution of \(Ax = b.\) Then, by substitution, \(y\) must satisfy

\[Ly = b,\]

and so, since \(L\) is lower triangular, we can immediately solve for \(y\) via forward substitution. In other words, we are using the associativity of matrix multiplication (cf. Theorem A.2.6) in order to conclude that

\[b = Ax = (LU)x = L(Ux) = Ly.\]

Then, once we have obtained \(y \in \mathbb{F}^n,\) we can apply back substitution in order to solve for \(x\) in the matrix equation

\[Ux = y.\]
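This two-pass procedure can be sketched end to end. The function below is our own minimal illustration (the name `solve_lu` is not from the text); it assumes \(L\) and \(U\) both have nonzero diagonals.

```python
def solve_lu(L, U, b):
    """Solve (LU)x = b: forward substitution for Ly = b, then
    back substitution for Ux = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                     # forward substitution: Ly = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution: Ux = y
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

L = [[1, 0, 0], [2, 1, 0], [2, -2, 1]]
U = [[2, 3, 4], [0, -1, 2], [0, 0, -2]]
print(solve_lu(L, U, [6, 16, 2]))  # [4.0, -2.0, 1.0]
```

Because each pass touches each matrix entry once, solving a system from a known LU-factorization costs only on the order of \(n^2\) operations, which is why the factorization is worth computing when many right-hand sides \(b\) must be handled.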

In general, one can only obtain an **LU**-factorization for a matrix \(A \in \mathbb{F}^{n \times n}\) when there exist elementary “row combination” matrices \(E_1 , E_2 , \ldots , E_k \in \mathbb{F}^{n \times n}\) and an upper triangular matrix \(U\) such that

\[E_k E_{k-1} \cdots E_1 A = U.\]

There are various generalizations of **LU**-factorization that allow for more than just elementary “row combination” matrices in this product, but we do not mention them here. Instead, we provide a detailed example that illustrates how to obtain an **LU**-factorization and then how to use such a factorization in solving linear systems.

**Example A.3.15.** Consider the matrix \(A \in \mathbb{F}^{3 \times 3}\) given by

\[ A = \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{array} \right]. \]

Using the techniques illustrated in Example A.3.5, we have the following matrix product:

\[ \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]\left[ \begin{array}{ccc} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{array} \right]= \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{array} \right]=U. \]

In particular, we have found three elementary “row combination” matrices, which, when multiplied by \(A\), produce an upper triangular matrix \(U.\)

Now, in order to produce a lower triangular matrix \(L\) such that \(A = LU\), we rely on two facts about lower triangular matrices. First of all, any lower triangular matrix with an entirely nonzero diagonal is invertible, and, second, the product of lower triangular matrices is always lower triangular. (Cf. Theorem A.2.7.) More specifically, we have that

\[ \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{array} \right]=\left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{array} \right], \]

where

\[ \left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right]^{-1} = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{array} \right]= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 2 & -2 & 1 \end{array} \right].\]

We call the resulting lower triangular matrix \(L\) and note that \(A = LU,\) as desired.
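The bookkeeping in this example can be automated: each multiplier used to clear an entry below a pivot becomes the corresponding entry of \(L\). The sketch below is our own (the name `lu_factor` is not from the text), and it assumes the elimination never requires a row interchange, i.e. every pivot encountered is nonzero.

```python
def lu_factor(A):
    """Compute L, U with A = LU via Gaussian elimination without
    row interchanges, assuming every pivot is nonzero."""
    n = len(A)
    U = [row[:] for row in A]                                   # working copy of A
    L = [[float(i == j) for j in range(n)] for i in range(n)]   # identity
    for j in range(n):                  # eliminate below pivot U[j][j]
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]       # row-combination multiplier
            L[i][j] = m                 # record it as an entry of L
            for k in range(j, n):
                U[i][k] -= m * U[j][k]
    return L, U

A = [[2, 3, 4], [4, 5, 10], [4, 8, 2]]
L, U = lu_factor(A)
# L == [[1, 0, 0], [2, 1, 0], [2, -2, 1]] and
# U == [[2, 3, 4], [0, -1, 2], [0, 0, -2]] (entries as floats)
```

Storing the multiplier \(m\) directly in \(L\) is exactly the observation used above: the product of the inverted elementary matrices is lower triangular with the multipliers as its subdiagonal entries.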

Now, deﬁne \(x, y,\) and \(b\) by

\[x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right], y= \left[ \begin{array}{c} y_1 \\ y_2 \\ y_3 \end{array} \right], \rm{~and~} b= \left[ \begin{array}{c} 6 \\ 16 \\ 2 \end{array} \right].\]

Applying forward substitution to \(Ly = b,\) we obtain the solution

\[ \left. \begin{array}{ccccccccc} y_1& = &b_1&&&&&=&6 \\ y_2& =& b_2& - &2y_1&&&=&4 \\ y_3& = &b_3& -& 2y_1& +& 2y_2& =& -2\end{array} \right\}. \]

Then, given this unique solution \(y\) to \(Ly = b,\) we can apply back substitution to \(Ux = y\) in order to obtain

\[ \left. \begin{array}{ccccccccc} 2x_1& = &y_1&-&3x_2&-&4x_3&=&8 \\ -x_2& =& y_2& &&-&2x_3&=&2 \\ -2x_3& = &y_3& & & & & =& -2\end{array} \right\}. \]

It follows that the unique solution to \(Ax = b\) is

\[ \left. \begin{array}{ccccccccc} x_1& =&4 \\ x_2& =&-2 \\ x_3& = & 1 \end{array} \right\}. \]

In summary, we have given an algorithm for solving any matrix equation \(Ax = b\) in which \(A = LU,\) where \(L\) is lower triangular, \(U\) is upper triangular, and both \(L\) and \(U\) have nothing but non-zero entries along their diagonals.

We note in closing that the simple procedures of back and forward substitution can also be used for computing the inverses of lower and upper triangular matrices. E.g., the inverse \(U^{-1} = (u_{ij} )\) of the matrix

\[ U = \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{array} \right] \]

must satisfy

\[ \left[ \begin{array}{ccc} 2u_{1 1}+ 3u_{2 1}+4u_{3 1}& 2u_{1 2}+ 3u_{2 2}+4u_{3 2} & 2u_{1 3}+ 3u_{2 3}+4u_{3 3} \\ -u_{2 1}+2u_{3 1} & -u_{2 2}+2u_{3 2} & -u_{2 3}+2u_{3 3} \\ -2u_{3 1} & -2u_{3 2} & -2u_{3 3} \end{array} \right]=U U^{-1} = I_{3}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right], \]

from which we obtain the linear system

\[ \left. \begin{array}{ccccccccccc} 2u_{1 1}&&&+3u_{2 1}&&& +4u_{3 1}&&& =&1 \\ &2u_{1 2}&&&+3u_{2 2} &&& +4u_{3 2} &&=&0 \\ &&2u_{1 3}& &&+3u_{2 3}&&& +4u_{3 3} &= & 0 \\ &&&-u_{2 1}&&& +2u_{3 1}&&& =& 0 \\ &&&& -u_{2 2} &&&+2u_{3 2}&& = &1 \\ &&&&&-u_{2 3}&&& +2u_{3 3} &=& 0 \\ &&&&&&-2u_{3 1} &&&=& 0 \\ &&&&&&&-2u_{3 2}&& =& 0 \\ &&&&&&&&-2u_{3 3}& = &1\end{array} \right\} \]

in the nine variables \(u_{11} , u_{12} , \ldots , u_{33} .\) Since this linear system has an upper triangular coefficient matrix, we can apply back substitution in order to directly solve for the entries in \(U^{-1}.\)
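Rather than solving all nine equations at once, one can observe that the \(j\)-th column of \(U^{-1}\) is precisely the solution of \(Ux = e_j\), where \(e_j\) is the \(j\)-th standard basis vector, so each column can be found by one back substitution. The sketch below is our own illustration of this (the function name is not from the text), again under the nonzero-diagonal assumption.

```python
def invert_upper(U):
    """Invert an upper triangular U (nonzero diagonal) column by column:
    column j of the inverse solves U x = e_j."""
    n = len(U)
    cols = []
    for j in range(n):
        e = [float(i == j) for i in range(n)]   # standard basis vector e_j
        x = [0.0] * n
        for i in range(n - 1, -1, -1):          # back substitution
            s = sum(U[i][k] * x[k] for k in range(i + 1, n))
            x[i] = (e[i] - s) / U[i][i]
        cols.append(x)
    # assemble the columns into a matrix (transpose the list of columns)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

U = [[2, 3, 4], [0, -1, 2], [0, 0, -2]]
Uinv = invert_upper(U)
# Uinv == [[0.5, 1.5, 2.5], [0, -1, -1], [0, 0, -0.5]]
```

The same idea with forward substitution inverts a lower triangular matrix.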

The only condition we imposed upon our triangular matrices above was that all diagonal entries were non-zero. It should be clear to you that this non-zero diagonal restriction is a necessary and suﬃcient condition for a triangular matrix to be non-singular. Moreover, once the inverses of both \(L\) and \(U\) in an **LU**-factorization have been obtained, then we can immediately calculate the inverse for \(A = LU\) by applying Theorem A.2.9(4):

\[ A^{-1} = (LU)^{-1} = U^{-1} L^{-1}. \]
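As a numerical check of this identity (our own sketch, not part of the text), one can multiply the two triangular inverses for the matrix of Example A.3.15 and verify that the product really does invert \(A\); the inverses below were obtained by back and forward substitution and are easily checked by multiplying back.

```python
def matmul(X, Y):
    """Multiply two n x n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 3, 4], [4, 5, 10], [4, 8, 2]]
Uinv = [[0.5, 1.5, 2.5], [0, -1, -1], [0, 0, -0.5]]    # inverse of U
Linv = [[1, 0, 0], [-2, 1, 0], [-6, 2, 1]]             # inverse of L

Ainv = matmul(Uinv, Linv)   # A^{-1} = U^{-1} L^{-1}
print(matmul(A, Ainv))      # the 3x3 identity matrix
```

Note the reversal of factors: \((LU)^{-1} = U^{-1}L^{-1}\), not \(L^{-1}U^{-1}\).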

### Contributors

- Isaiah Lankham, Mathematics Department at UC Davis
- Bruno Nachtergaele, Mathematics Department at UC Davis
- Anne Schilling, Mathematics Department at UC Davis

Both hardbound and softbound versions of this textbook are available online at WorldScientific.com.