Mathematics LibreTexts

A.3 Solving linear systems by factoring the coefficient matrix

There are many ways in which you might try to solve a given system of linear equations. This section is primarily devoted to describing two particularly popular techniques, both of which involve factoring the coefficient matrix for the system into a product of simpler matrices. These techniques are also at the heart of many frequently used numerical (i.e., computer-assisted) applications of Linear Algebra.

    You should note that the factorization of complicated objects into simpler components is an extremely common problem solving technique in mathematics. E.g., we will often factor a given polynomial into several polynomials of lower degree, and one can similarly use the prime factorization for an integer in order to simplify certain numerical computations.

A.3.1 Factorizing matrices using Gaussian elimination

In this section, we discuss a particularly significant technique for factorizing matrices known as Gaussian elimination (a.k.a. Gauss-Jordan elimination). Gaussian elimination can be used to express any matrix as a product involving one matrix in so-called reduced row-echelon form and one or more so-called elementary matrices. Then, once such a factorization has been found, we can immediately solve any linear system that has the factorized matrix as its coefficient matrix. Moreover, the underlying technique for arriving at such a factorization is essentially an extension of the techniques already familiar to you for solving small systems of linear equations by hand.

 Let \(m, n \in \mathbb{Z}_+\) denote positive integers, and suppose that \(A \in \mathbb{F}^{m \times n}\) is an \(m \times n\) matrix over \(\mathbb{F}.\) Then, following Section A.2.2, we will make extensive use of \(A^{(i, \cdot)}\) and \(A^{(\cdot ,j)}\) to
denote the row vectors and column vectors of \(A\), respectively.

Definition A.3.1. Let \(A \in \mathbb{F}^{m \times n}\) be an \(m \times n\) matrix over \(\mathbb{F}.\) Then we say that \(A\) is in row-echelon form (abbreviated REF) if the rows of \(A\) satisfy the following conditions:

  1.  either \(A^{(1,\cdot)}\) is the zero vector or the first non-zero entry in  \(A^{(1,\cdot)}\) (when read from left to right) is a one.
  2.  for \(i = 1, \ldots, m,\) if any row vector \(A^{(i,\cdot)}\) is the zero vector, then each subsequent row vector \(A^{(i+1,\cdot)} ,\ldots , A^{(m,\cdot)}\) is also the zero vector.
  3. for \(i = 2, \ldots , m,\) if some \(A^{(i,\cdot)}\) is not the zero vector, then the first non-zero entry (when read from left to right) is a one and occurs to the right of the initial one in \(A^{(i-1,\cdot)}\).

The initial leading one in each non-zero row is called a pivot. We furthermore say that \(A\) is in reduced row-echelon form (abbreviated RREF) if

  4. for each column vector \(A^{(\cdot,j)}\) containing a pivot \((j = 2, \ldots , n),\) the pivot is the only non-zero entry in \(A^{(\cdot,j)}\).

The motivation behind Definition A.3.1 is that matrix equations having their coefficient matrix in RREF (and, in some sense, also REF) are particularly easy to solve. Note, in particular, that the only square matrix in RREF without zero rows is the identity matrix.

Example A.3.2. The following matrices are all in REF:

\[A_1=\left[ \begin{array}{cccc} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{array} \right], A_2= \left[ \begin{array}{cccc} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right], A_3= \left[ \begin{array}{cccc} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right], A_4= \left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right],\\ A_5= \left[ \begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array} \right], A_6= \left[ \begin{array}{cccc} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array} \right], A_7=\left[ \begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right], A_8=\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right].\]

However, only \(A_4\) through \(A_8\) are in RREF, as you should verify. Moreover, if we take the
transpose of each of these matrices (as defined in Section A.5.1), then only \(A^T_6 , A^T_7 ,\) and \(A^T_8\) are in RREF.
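Conditions (1) through (4) are mechanical enough to check by computer. The following Python sketch (the function names is_ref and is_rref are our own) tests Definition A.3.1 for a matrix given as a list of rows:

```python
def is_ref(A):
    """Check row-echelon form (Definition A.3.1): each non-zero row leads
    with a one, strictly to the right of the pivot in the row above, and
    zero rows come last."""
    last_pivot_col = -1
    seen_zero_row = False
    for row in A:
        nonzero = [j for j, a in enumerate(row) if a != 0]
        if not nonzero:
            seen_zero_row = True       # all later rows must also be zero
            continue
        if seen_zero_row:
            return False               # non-zero row below a zero row
        j = nonzero[0]
        if row[j] != 1 or j <= last_pivot_col:
            return False
        last_pivot_col = j
    return True

def is_rref(A):
    """RREF = REF plus: each pivot is the only non-zero entry in its column."""
    if not is_ref(A):
        return False
    for i, row in enumerate(A):
        nonzero = [j for j, a in enumerate(row) if a != 0]
        if nonzero:
            j = nonzero[0]
            if any(A[k][j] != 0 for k in range(len(A)) if k != i):
                return False
    return True
```

For instance, applied to \(A_1\) and \(A_4\) from Example A.3.2, is_ref accepts both, while is_rref accepts only \(A_4\).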

Example A.3.3.
1. Consider the following matrix in RREF:

\[A = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right].\]

 

Given any vector \(b = (b_i) \in \mathbb{F}^4\), the matrix equation \(Ax = b\) corresponds to the system
of equations

\[\left. \begin{array}{cccccc} x_1 &~~ &~~~~ &~~~ & = & b_1 \\  & x_2 &  & & = & b_2 \\  &  & x_3 & & = &b_3 \\  &  &  & x_4 & = & b_4\end{array} \right\}.\]


Since \(A\) is in RREF (in fact, \(A = I_4\) is the \(4 \times 4\) identity matrix), we can immediately conclude that the matrix equation \(Ax = b\) has the solution \(x = b\) for any choice of \(b\).
Moreover, as we will see in Section A.4.2, \(x = b\) is the only solution to this system.

2. Consider the following matrix in RREF:

\[A = \left[ \begin{array}{cccccc} 1 & 6 & 0 & 0 & 4 & -2 \\ 0 & 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 & 5 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right].\]

Given any vector \(b = (b_i ) \in \mathbb{F}^4 \), the matrix equation \(Ax = b\) corresponds to the system

\[ \left. \begin{array}{cccccccccccc} x_1 & + &  6x_2 &  &   & & + & 4x_5 & - & 2x_6 & = & b_1 \\  &  &  & & x_3 &   &+ &3x_5 &+& x_6 & =& b_2 \\ && &  &  & x_4 & + & 5x_5 & + & 2x_6 &= &b_3 \\ & & & & & & & &  & 0 &=&b_4\end{array} \right\}.\]

Since \(A\) is in RREF, we can immediately conclude a number of facts about solutions to this system. First of all, solutions exist if and only if \(b_4 = 0.\) Moreover, by “solving for the pivots”, we see that the system reduces to


\[\left. \begin{array}{cccccccc} x_1 & = & b_1 & -\; 6x_2 & - & 4x_5 & + & 2x_6 \\ x_3 & = & b_2 & & - & 3x_5 & - & x_6 \\ x_4 & = & b_3 & & - & 5x_5 & - & 2x_6  \end{array} \right\},\]


and so there is only enough information to specify values for \(x_1, x_3,\) and \(x_4\) in terms of the otherwise arbitrary values for \(x_2 , x_5 ,\) and \(x_6 .\)

In this context, \(x_1 , x_3 ,\) and \(x_4\) are called leading variables since these are the variables corresponding to the pivots in \(A\). We similarly call \(x_2 , x_5 \), and \(x_6\) free variables since the leading variables have been expressed in terms of these remaining variables. In particular, given any scalars \(\alpha , \beta, \gamma \in \mathbb{F},\) it follows that the vector

\[x= \left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{array} \right]=\left[ \begin{array}{c} b_1 - 6\alpha -4\beta+2\gamma \\ \alpha \\ b_2-3\beta -\gamma \\ b_3-5\beta -2\gamma \\ \beta \\ \gamma \end{array} \right]= \left[ \begin{array}{c} b_1 \\ 0 \\ b_2 \\ b_3 \\ 0 \\ 0 \end{array} \right]+\left[ \begin{array}{c} -6\alpha \\ \alpha \\ 0 \\ 0 \\ 0 \\ 0 \end{array} \right]+\left[ \begin{array}{c} -4\beta \\ 0 \\ -3\beta \\ -5\beta \\ \beta \\ 0 \end{array} \right]+\left[ \begin{array}{c} 2\gamma \\ 0 \\ -\gamma \\ -2\gamma \\ 0 \\ \gamma \end{array} \right]\]

must satisfy the matrix equation \(Ax = b.\) One can also verify that every solution to the matrix equation must be of this form. It then follows that the set of all solutions should somehow be “three dimensional”.

As the above examples illustrate, a matrix equation having its coefficient matrix in RREF corresponds to a system of equations that can be solved with only a small amount of computation. Somewhat amazingly, any matrix can be factored into a product that involves exactly one matrix in RREF and one or more of the matrices defined as follows.
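The parametrized solution in part 2 of Example A.3.3 can be verified numerically. The following Python/NumPy sketch builds \(x\) from \(\alpha, \beta, \gamma\) and an arbitrarily chosen \(b\) with \(b_4 = 0\), then checks that \(Ax = b\):

```python
import numpy as np

# Coefficient matrix from Example A.3.3, part 2 (already in RREF).
A = np.array([[1, 6, 0, 0, 4, -2],
              [0, 0, 1, 0, 3,  1],
              [0, 0, 0, 1, 5,  2],
              [0, 0, 0, 0, 0,  0]], dtype=float)

def solution(b, alpha, beta, gamma):
    """Solution of Ax = b with free variables x2 = alpha, x5 = beta,
    x6 = gamma (a solution exists only when b[3] == 0)."""
    b1, b2, b3, _ = b
    return np.array([b1 - 6*alpha - 4*beta + 2*gamma,   # x1 (leading)
                     alpha,                             # x2 (free)
                     b2 - 3*beta - gamma,               # x3 (leading)
                     b3 - 5*beta - 2*gamma,             # x4 (leading)
                     beta,                              # x5 (free)
                     gamma])                            # x6 (free)

b = np.array([7.0, -1.0, 2.0, 0.0])   # arbitrary right-hand side with b4 = 0
for alpha, beta, gamma in [(0, 0, 0), (1, 2, 3), (-2, 5, 0.5)]:
    assert np.allclose(A @ solution(b, alpha, beta, gamma), b)
```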

Definition A.3.4. A square matrix \(E \in \mathbb{F}^{m \times m}\) is called an elementary matrix if it has
one of the following forms:

1. (row exchange, a.k.a. “row swap”, matrix) \(E\) is obtained from the identity matrix \(I_m\) by interchanging the row vectors \(I_m^{(r, \cdot)}\) and \(I_m^{(s, \cdot)}\) for some particular choice of positive integers \(r, s \in \{1, 2, . . . , m\}.\) I.e., in the case that \(r < s,\)

\[E = I_m - E_{rr} - E_{ss} + E_{rs} + E_{sr},\]

where, for each \(i, j \in \{1, 2, \ldots, m\},\) the matrix \(E_{ij}\) has “\(i, j\) entry” equal to one and all other entries equal to zero.


2. (row scaling matrix) \(E\) is obtained from the identity matrix \(I_m\) by replacing the row vector \(I_m^{(r, \cdot)}\) with \(\alpha I_m^{(r, \cdot)}\) for some choice of non-zero scalar \(0 \neq \alpha \in \mathbb{F}\) and some choice of positive integer \(r \in \{1, 2, . . . , m\}.\) I.e.,

\[E = I_m + (\alpha - 1) E_{rr},\]

where \(E_{rr}\) is the matrix having “\(r, r\) entry” equal to one and all other entries equal to zero. (Recall that \(E_{rr}\) was defined in Section A.2.1 as a standard basis vector for the vector space \(\mathbb{F}^{m \times m} .\))

3. (row combination, a.k.a. “row sum”, matrix) \(E\) is obtained from the identity matrix \(I_m\) by replacing the row vector \(I_m^{(r, \cdot)}\) with \(I_m^{(r, \cdot)} + \alpha I_m^{(s, \cdot)}\) for some choice of scalar \(\alpha \in \mathbb{F}\) and some choice of positive integers \(r, s \in \{1, 2, \ldots , m\}.\) I.e., in the case that \(r < s,\)

\[E = I_m + \alpha E_{rs},\]

where \(E_{rs}\) is the matrix having “\(r, s\) entry” equal to one and all other entries equal to zero. (\(E_{rs}\) was also defined in Section A.2.1 as a standard basis vector for \(\mathbb{F}^{m \times m}\) .)

       The “elementary” in the name “elementary matrix” comes from the correspondence between these matrices and so-called “elementary operations” on systems of equations. In particular, each of the elementary matrices is clearly invertible (in the sense defined in Section A.2.3), just as each “elementary operation” is itself completely reversible. We illustrate this correspondence in the following example.
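Each type of elementary matrix is easy to construct directly from the identity matrix. The following Python/NumPy sketch (the function names are our own) builds all three types and confirms that each is invertible, with inverse an elementary matrix of the same type:

```python
import numpy as np

def row_swap(m, r, s):
    """Elementary matrix exchanging rows r and s of the identity (0-indexed)."""
    E = np.eye(m)
    E[[r, s]] = E[[s, r]]
    return E

def row_scale(m, r, alpha):
    """Elementary matrix scaling row r of the identity by a non-zero alpha."""
    E = np.eye(m)
    E[r, r] = alpha
    return E

def row_combine(m, r, s, alpha):
    """Elementary matrix replacing row r of the identity with
    row r + alpha * row s (r != s)."""
    E = np.eye(m)
    E[r, s] = alpha
    return E

# Each elementary operation is reversible, so each elementary matrix is
# invertible; the inverse undoes the operation and has the same type.
m = 4
assert np.allclose(row_swap(m, 0, 2) @ row_swap(m, 0, 2), np.eye(m))
assert np.allclose(row_scale(m, 1, 5.0) @ row_scale(m, 1, 1/5.0), np.eye(m))
assert np.allclose(row_combine(m, 0, 3, 2.0) @ row_combine(m, 0, 3, -2.0),
                   np.eye(m))
```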

Example A.3.5. Define \(A, x,\) and \(b\) by

\[ A= \left[ \begin{array}{ccc} 2 & 5 & 3 \\ 1 & 2 & 3 \\ 1 & 0 & 8 \end{array} \right],x= \left[ \begin{array}{c} x_1  \\ x_2  \\ x_3  \end{array} \right], \rm{~and~} b= \left[ \begin{array}{c} 5  \\ 4  \\ 9  \end{array} \right].\]

We illustrate the correspondence between elementary matrices and “elementary” operations
on the system of linear equations corresponding to the matrix equation \(Ax = b\), as follows.

System of Equations

\[\left. \begin{array}{ccccccc} 2x_1 &+& 5x_2 &+ & 3x_3&=&5 \\ x_1&+ & 2x_2&+ & 3x_3&=&4 \\ x_1 &&& + & 8x_3&=&9 \end{array} \right.\]

Corresponding Matrix Equation

\[ Ax = b\]

To begin solving this system, one might want to either multiply the first equation through by \(1/2\) or interchange the first equation with one of the other equations. From a computational perspective, it is preferable to perform an interchange since multiplying through by \(1/2\) would unnecessarily introduce fractions. Thus, we choose to interchange the first and second equation in order to obtain

System of Equations

\[\left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4  \\  2x_1 &+& 5x_2 &+ & 3x_3&=&5\\ x_1 &&& + & 8x_3&=&9 \end{array} \right.\]

Corresponding Matrix Equation

\[ E_0Ax=E_0b, \rm{~where~} E_0= \left[ \begin{array}{ccc} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{array} \right].\]

Another reason for performing the above interchange is that it now allows us to use more convenient “row combination” operations when eliminating the variable \(x_1\) from all but one of the equations. In particular, we can multiply the first equation through by \(−2\) and add it to the second equation in order to obtain

System of Equations

\[   \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4  \\   && x_2 &- & 3x_3&=&-3\\ x_1 &&& + & 8x_3&=&9 \end{array} \right.  \]

Corresponding Matrix Equation

\[   E_1E_0Ax=E_1E_0b, \rm{~where~} E_1= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right].  \]

Similarly, in order to eliminate the variable \(x_1\) from the third equation, we can next multiply the first equation through by \(−1\) and add it to the third equation in order to obtain

System of Equations

\[ \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4  \\   && x_2 &- & 3x_3&=&-3\\  &&-2x_2& + & 5x_3&=&5 \end{array} \right. \]


Corresponding Matrix Equation

\[  E_2E_1E_0Ax=E_2E_1E_0b, \rm{~where~} E_2= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{array} \right].  \]


Now that the variable \(x_1\) only appears in the first equation, we can somewhat similarly isolate the variable \(x_2\) by multiplying the second equation through by \(2\) and adding it to the third equation in order to obtain

System of Equations

\[    \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4  \\   && x_2 &- & 3x_3&=&-3\\  &&&  & -x_3&=&-1 \end{array} \right.  \]

Corresponding Matrix Equation

\[   E_3 \cdots E_0Ax= E_3 \cdots E_0b, \rm{~where~} E_3= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right].  \]

Finally, in order to complete the process of transforming the coefficient matrix into REF, we need only rescale row three by \(−1\). This corresponds to multiplying the third equation through by \(−1\) in order to obtain

System of Equations

\[   \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4  \\   && x_2 &- & 3x_3&=&-3\\  &&&  & x_3&=&1 \end{array} \right.  \]
Corresponding Matrix Equation

\[E_4 \cdots E_0Ax= E_4 \cdots E_0b, \rm{~where~} E_4= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{array} \right]. \]

Now that the coefficient matrix is in REF, we can already solve for the variables \(x_1 , x_2 ,\) and \(x_3\) using a process called back substitution. In other words, it should be clear from the third equation that \(x_3 = 1.\) Using this value and solving for \(x_2\) in the second equation, it then follows that

\[ x_2 = -3 + 3x_3 = -3 + 3 = 0.\]

Similarly, by solving the first equation for \(x_1\) , it follows that

\[x_1 = 4 - 2x_2 - 3x_3 = 4 - 3 = 1.\]

From a computational perspective, this process of back substitution can be applied to solve any system of equations when the coefficient matrix of the corresponding matrix equation is in REF. However, from an algorithmic perspective, it is often more useful to continue “row reducing” the coefficient matrix in order to produce a coefficient matrix in full RREF.
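As a check on the arithmetic, the back substitution performed above can be reproduced numerically. A Python/NumPy sketch, using the REF system obtained after applying \(E_0\) through \(E_4\):

```python
import numpy as np

# REF coefficient matrix and right-hand side from Example A.3.5
# after applying E_0, ..., E_4.
U = np.array([[1, 2,  3],
              [0, 1, -3],
              [0, 0,  1]], dtype=float)
c = np.array([4.0, -3.0, 1.0])

# Back substitution, working from the last equation upward.
x3 = c[2]                     # third equation:  x3 = 1
x2 = c[1] + 3 * x3            # second equation: x2 = -3 + 3*x3
x1 = c[0] - 2 * x2 - 3 * x3   # first equation:  x1 = 4 - 2*x2 - 3*x3
x = np.array([x1, x2, x3])

assert np.allclose(U @ x, c)
assert np.allclose(x, [1.0, 0.0, 1.0])
```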

Here, there are several next natural steps that we could now perform in order to move toward RREF. Since we have so far worked “from the top down, from left to right”, we choose to now work “from bottom up, from right to left”. In other words, we now multiply the third equation through by \(3\) and then add it to the second equation in order to obtain

System of Equations

\[    \left. \begin{array}{ccccccc} x_1&+ & 2x_2&+ & 3x_3&=&4  \\   && x_2 & & &=&0\\  &&&  & x_3&=&1 \end{array} \right.   \]

Corresponding Matrix Equation

\[  E_5 \cdots E_0Ax= E_5 \cdots E_0b, \rm{~where~} E_5= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{array} \right].   \]


Next, we can multiply the third equation through by \(−3\) and add it to the first equation in order to obtain

System of Equations

\[  \left. \begin{array}{ccccccc} x_1&+ & 2x_2& & &=&1  \\   && x_2 & & &=&0\\  &&&  & x_3&=&1 \end{array} \right.    \]

Corresponding Matrix Equation

\[  E_6 \cdots E_0Ax= E_6 \cdots E_0b, \rm{~where~} E_6= \left[ \begin{array}{ccc} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right].   \]


Finally, we can multiply the second equation through by \(−2\) and add it to the first equation in order to obtain

System of Equations

\[   \left. \begin{array}{ccccccc} x_1& & & & &=&1  \\   && x_2 & & &=&0\\  &&&  & x_3&=&1 \end{array} \right.  \]

Corresponding Matrix Equation

\[  E_7 \cdots E_0Ax= E_7 \cdots E_0b, \rm{~where~} E_7= \left[ \begin{array}{ccc} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right].    \]

Now it should be extremely clear that we obtained a correct solution when using back substitution on the linear system

\[  E_4 \cdots E_0Ax= E_4 \cdots E_0b.  \]

However, in many applications, it is not enough to merely find a solution. Instead, it is important to describe every solution. As we will see in the remaining sections of these notes, this requires the machinery of Linear Algebra. In particular, we will need the theory of vector spaces and linear maps.

      To close this section, we take a closer look at the following expression obtained from the above analysis:

\[E_7 E_6 \cdots E_1 E_0 A = I_3 .\]

Because of the way in which we have defined elementary matrices, it should be clear that each of the matrices \(E_0 , E_1 ,\ldots , E_7\) is invertible. Thus, we can use Theorem A.2.9 in order to “solve” for \(A\):


\[A = (E_7 E_6 \cdots E_1 E_0 )^{-1} I_3 = E_0^{-1} E_1^{-1} \cdots E_7^{-1} I_3 .\]

In effect, since the inverse of an elementary matrix is itself easily seen to be an elementary matrix, this has factored \(A\) into the product of eight elementary matrices (namely, \(E_0^{-1} , E_1^{-1} , \ldots , E_7^{-1}\) ) and one matrix in RREF (namely, \(I_3\) ). Moreover, because each elementary matrix is invertible, we can conclude that \(x\) solves \(Ax = b\) if and only if \(x\) solves

\[(E_7 E_6 \cdots E_1 E_0 A )x = (I_3)x = (E_7 E_6 \cdots E_1 E_0)b.\]

Consequently, given any linear system, one can use Gaussian elimination in order to reduce the problem to solving a linear system whose coefficient matrix is in RREF.
      Similarly, we can conclude that the inverse of \(A\) is

\[A^{-1} = E_7 E_6 \cdots E_1 E_0=\left[ \begin{array}{ccc} 16 & -40 & 9 \\ -5 & 13 & -3 \\ -2 & 5 & -1 \end{array} \right].\]

Having computed this product, one could essentially “reuse” much of the above computation in order to solve the matrix equation \(Ax = b'\) for several different right-hand sides \(b' \in \mathbb{F}^3\). The process of “re-solving” a linear system is a common practice in applied mathematics.
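As a sanity check, the product \(E_7 E_6 \cdots E_1 E_0\) can be accumulated numerically and compared against \(A^{-1}\). A Python/NumPy sketch:

```python
import numpy as np

A = np.array([[2, 5, 3],
              [1, 2, 3],
              [1, 0, 8]], dtype=float)

# The eight elementary matrices from Example A.3.5, in order of application.
E = [np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float),   # E_0: swap rows 1, 2
     np.array([[1, 0, 0], [-2, 1, 0], [0, 0, 1]], dtype=float),  # E_1
     np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 1]], dtype=float),  # E_2
     np.array([[1, 0, 0], [0, 1, 0], [0, 2, 1]], dtype=float),   # E_3
     np.array([[1, 0, 0], [0, 1, 0], [0, 0, -1]], dtype=float),  # E_4
     np.array([[1, 0, 0], [0, 1, 3], [0, 0, 1]], dtype=float),   # E_5
     np.array([[1, 0, -3], [0, 1, 0], [0, 0, 1]], dtype=float),  # E_6
     np.array([[1, -2, 0], [0, 1, 0], [0, 0, 1]], dtype=float)]  # E_7

# Accumulate E_7 E_6 ... E_1 E_0 by left-multiplying in order of application.
P = np.eye(3)
for Ei in E:
    P = Ei @ P

assert np.allclose(P @ A, np.eye(3))      # E_7 ... E_0 A = I_3
assert np.allclose(P, np.linalg.inv(A))   # hence E_7 ... E_0 = A^{-1}
```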

A.3.2 Solving homogeneous linear systems

In this section, we study the solutions for an important special class of linear systems. As we will see in the next section, though, solving any linear system is fundamentally dependent upon knowing how to solve these so-called homogeneous systems.

     As usual, we use \(m, n \in \mathbb{Z}_+\) to denote arbitrary positive integers.

Definition A.3.6. The system of linear equations, System (A.1.1), is called a homogeneous system if the right-hand side of each equation is zero. In other words, a homogeneous system corresponds to a matrix equation of the form

\[Ax = 0,\]

where \(A \in \mathbb{F}^{m \times n}\) is an \(m \times n\) matrix and \(x\) is an \(n\)-tuple of unknowns. We also call the set

\[N = \{v \in \mathbb{F}^n | Av = 0\}\]

the solution space for the homogeneous system corresponding to \(Ax = 0.\)

       When describing the solution space for a homogeneous linear system, there are three important cases to keep in mind:

Definition A.3.7. The system of linear equations System (A.1.1) is called

  1. overdetermined if \(m > n.\)
  2. square if \(m = n.\)
  3. underdetermined if \(m < n.\)

In particular, we can say a great deal about underdetermined homogeneous systems, which
we state as a corollary to the following more general result.

Theorem A.3.8. Let \(N\) be the solution space for the homogeneous linear system corresponding to the matrix equation \(Ax = 0,\) where \(A \in \mathbb{F}^{m \times n} .\) Then
1. the zero vector \(0 \in N\).
2. \(N\) is a subspace of the vector space \(\mathbb{F}^n.\) 

This is an amazing theorem. Since \(N\) is a subspace of \(\mathbb{F}^n\) , we know that either \(N\) will contain exactly one element (namely, the zero vector) or \(N\) will contain infinitely many elements.

Corollary A.3.9. Every homogeneous system of linear equations is solved by the zero vector. Moreover, every underdetermined homogeneous system has infinitely many solutions.

We call the zero vector the trivial solution for a homogeneous linear system. The fact that every homogeneous linear system has the trivial solution thus reduces solving such a system to determining if solutions other than the trivial solution exist.

      One method for finding the solution space of a homogeneous system is to first use Gaussian elimination (as demonstrated in Example A.3.5) in order to factor the coefficient matrix of the system. Then, because the original linear system is homogeneous, the homogeneous system corresponding to the resulting RREF matrix will have the same solutions as the original system. In other words, if a given matrix \(A\) satisfies

\[E_k E_{k-1} \cdots E_0 A = A_0 ,\]

where each \(E_i\) is an elementary matrix and \(A_0\) is an RREF matrix, then the matrix equation \(Ax = 0\) has the exact same solution set as \(A_0 x = 0\) since \(E_0^{-1} E_1^{-1} \cdots E_k^{-1} 0 = 0.\)

Example A.3.10. In the following examples, we illustrate the process of determining the solution space for a homogeneous linear system having coefficient matrix in RREF.

1. Consider the matrix equation \(Ax = 0\), where \(A\) is the matrix given by

\[  A= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right].\]

This corresponds to an overdetermined homogeneous system of linear equations. Moreover, since there are no free variables (as defined in Example A.3.3), it should be clear that this system has only the trivial solution. Thus, \(N = \{0\}.\)

2. Consider the matrix equation \(Ax = 0,\) where \(A\) is the matrix given by

\[A= \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right].\]

This corresponds to an overdetermined homogeneous system of linear equations. Unlike the above example, we see that \(x_3\) is a free variable for this system, and so we would expect the solution space to contain more than just the zero vector. As in Example A.3.3, we can solve for the leading variables in terms of the free variable in order to obtain

\[   \left. \begin{array}{cccc} x_1 & = & -& x_3 \\ x_2 & = & -& x_3 \\  \end{array} \right\}.\]

It follows that, given any scalar \(\alpha \in \mathbb{F}\), every vector of the form

\[ x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} -\alpha \\ -\alpha \\ \alpha \end{array} \right]=\alpha \left[ \begin{array}{c} -1 \\ -1 \\ 1 \end{array} \right]  \] 


is a solution to \(Ax = 0.\) Therefore,

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 = -x_3 , x_2 = -x_3\} = span ((-1, -1, 1)) .\]


3. Consider the matrix equation \(Ax = 0,\) where \(A\) is the matrix given by

\[    A= \left[ \begin{array}{ccc} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right].  \]


This corresponds to a square homogeneous system of linear equations with two free variables. Thus, using the same technique as in the previous example, we can solve for the leading variable in order to obtain \(x_1 = -x_2 - x_3 .\) It follows that, given any scalars \(\alpha , \beta \in \mathbb{F},\) every vector of the form

\[   x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} -\alpha - \beta \\ \alpha \\ \beta \end{array} \right]=\alpha \left[ \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \right] + \beta \left[ \begin{array}{c} -1 \\ 0 \\ 1 \end{array} \right]  \]


is a solution to \(Ax = 0.\) Therefore,

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 + x_2 + x_3 = 0\} = span ((-1, 1, 0), (-1, 0, 1)).\]
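The spanning description of \(N\) in this last example is easy to verify numerically. A brief Python/NumPy sketch:

```python
import numpy as np

# Coefficient matrix from part 3 of the example above.
A = np.array([[1, 1, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)

n1 = np.array([-1.0, 1.0, 0.0])
n2 = np.array([-1.0, 0.0, 1.0])

# Every combination alpha*n1 + beta*n2 lies in the solution space N.
for alpha, beta in [(0, 0), (1, 0), (0, 1), (2.5, -4)]:
    x = alpha * n1 + beta * n2
    assert np.allclose(A @ x, 0)

# Conversely, any (x1, x2, x3) with x1 + x2 + x3 = 0 equals x2*n1 + x3*n2.
x = np.array([-7.0, 3.0, 4.0])        # satisfies x1 + x2 + x3 = 0
assert np.allclose(x, x[1] * n1 + x[2] * n2)
```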

A.3.3 Solving inhomogeneous linear systems

In this section, we demonstrate the relationship between arbitrary linear systems and homogeneous linear systems. Specifically, we will see that it takes little more work to solve a general linear system than it does to solve the homogeneous system associated to it.
    As usual, we use \(m, n \in \mathbb{Z}_+\) to denote arbitrary positive integers.

Definition A.3.11. The system of linear equations System (A.1.1) is called an inhomogeneous system if the right-hand side of at least one equation is not zero. In other words, an inhomogeneous system corresponds to a matrix equation of the form

\[Ax = b,\]

where \(A \in \mathbb{F}^{m \times n}\) is an \(m \times n\) matrix, \(x\) is an \(n\)-tuple of unknowns, and \(b \in \mathbb{F}^m\) is a vector having at least one non-zero component. We also call the set

\[U = \{v \in \mathbb{F}^n ~|~ Av = b\}\]

the solution set for the linear system corresponding to \(Ax = b.\)

 

     As illustrated in Example A.3.3, the zero vector cannot be a solution for an inhomogeneous system. Consequently, the solution set \(U\) for an inhomogeneous linear system will never be a subspace of any vector space. Instead, it will be a related algebraic structure as described in the following theorem.

Theorem A.3.12. Let \(U\) be the solution set for the inhomogeneous linear system corresponding to the matrix equation \(Ax = b,\) where \(A \in \mathbb{F}^{m \times n}\) and \(b \in \mathbb{F}^m\) is a vector having at least one non-zero component. Then, given any element \(u \in U,\) we have that

\[U = u + N = \{u + n ~|~ n \in N\} ,\]

where \(N\) is the solution space to \(Ax = 0.\) In other words, if \(B = (n^{(1)} , n^{(2)} , \ldots , n^{(k)} )\) is a list of vectors forming a basis for \(N\), then every element of \(U\) can be written in the form

\[u + \alpha_1 n^{(1)} + \alpha_2 n^{(2)} + \ldots + \alpha_k n^{(k)}\]

for some choice of scalars \(\alpha_1 , \alpha_2 , \ldots , \alpha_k \in \mathbb{F}.\)

As a consequence of this theorem, we can conclude that inhomogeneous linear systems behave a lot like homogeneous systems. The main difference is that inhomogeneous systems are not necessarily solvable. This, then, creates three possibilities: an inhomogeneous linear system will either have no solution, a unique solution, or infinitely many solutions. An important special case is as follows.

Corollary A.3.13. Every overdetermined inhomogeneous linear system will necessarily be unsolvable for some choice of values for the right-hand sides of the equations.

      The solution set \(U\) for an inhomogeneous linear system is called an affine subspace of \(\mathbb{F}^n\) since it is a genuine subspace of \(\mathbb{F}^n\) that has been “offset” by a vector \(u \in \mathbb{F}^n.\) Any set having this structure might also be called a coset (when used in the context of Group Theory) or a linear manifold (when used in a geometric context such as a discussion of intersecting hyperplanes).

      In order to actually find the solution set for an inhomogeneous linear system, we rely on Theorem A.3.12. Given an \(m \times n\) matrix \(A \in \mathbb{F}^{m \times n}\) and a non-zero vector \(b \in \mathbb{F}^m\) , we call \(Ax = 0\) the associated homogeneous matrix equation to the inhomogeneous matrix equation \(Ax = b.\) Then, according to Theorem A.3.12, \(U\) can be found by first finding the solution space \(N\) for the associated equation \(Ax = 0\) and then finding any so-called particular solution \(u \in \mathbb{F}^n\) to \(Ax = b.\)

      As with homogeneous systems, one can first use Gaussian elimination in order to factorize \(A,\) and so we restrict the following examples to the special case of RREF matrices. 

Example A.3.14. The following examples use the same matrices as in Example A.3.10.

      1. Consider the matrix equation \(Ax = b,\) where \(A\) is the matrix given by

\[ A= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right] \]

and \(b \in \mathbb{F}^4\) has at least one non-zero component. Then \(Ax = b\) corresponds to an overdetermined inhomogeneous system of linear equations and will not necessarily be solvable for all possible choices of \(b.\)

     In particular, note that the bottom row \(A^{(4, \cdot)}\) of \(A\) corresponds to the equation

\[0 = b_4,\]

from which \(Ax = b\) has no solution unless the fourth component of \(b\) is zero. Furthermore, the remaining rows of \(A\) correspond to the equations

\[x_1 = b_1 , x_2 = b_2 , \rm{~and~} x_3 = b_3 .\]

It follows that, given any vector \(b \in \mathbb{F}^4\) with fourth component zero, \(x = b\) is the only solution to \(Ax = b.\) In other words, \(U = \{b\}.\)

    2. Consider the matrix equation \(Ax = b,\) where \(A\) is the matrix given by

\[   A= \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]   \]

and \(b \in \mathbb{F}^4 .\) This corresponds to an overdetermined inhomogeneous system of linear equations. Note, in particular, that the bottom two rows of the matrix correspond to the equations \(0 = b_3\) and \(0 = b_4 ,\) from which \(Ax = b\) has no solution unless the third and fourth components of the vector \(b\) are both zero. Furthermore, we conclude from the remaining rows of the matrix that \(x_3\) is a free variable for this system and that

\[      \left. \begin{array}{ccccc} x_1 & =&b_1 & -& x_3 \\ x_2 & = &b_2& -& x_3 \\  \end{array} \right\}.    \]


It follows that, given any scalar \(\alpha \in \mathbb{F},\) every vector of the form

\[   x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} b_1 - \alpha \\ b_2-\alpha \\ \alpha \end{array} \right]= \left[ \begin{array}{c} b_1 \\ b_2 \\ 0 \end{array} \right] + \alpha \left[ \begin{array}{c} -1 \\ -1 \\ 1 \end{array} \right] = u+ \alpha n   \]

is a solution to \(Ax = b.\) Recall from Example A.3.10 that the solution space for the associated homogeneous matrix equation \(Ax = 0\) is

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 = -x_3 , x_2 = -x_3\} = span ((-1, -1, 1)) .\]

Thus, in the language of Theorem A.3.12, we have that \(u\) is a particular solution for
\(Ax = b\) and that \((n)\) is a basis for \(N.\) Therefore, the solution set for \(Ax = b\) is

\[U = (b_1 , b_2 , 0) + N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 = b_1 - x_3 , x_2 = b_2 - x_3\} .\]
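The decomposition \(U = u + N\) from Theorem A.3.12 can be checked on this example. A Python/NumPy sketch, with \(b_1\) and \(b_2\) chosen arbitrarily:

```python
import numpy as np

# Coefficient matrix from part 2 of Example A.3.14.
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)

b1, b2 = 3.0, -2.0                     # arbitrary; b3 = b4 = 0 is required
b = np.array([b1, b2, 0.0, 0.0])

u = np.array([b1, b2, 0.0])            # particular solution to Ax = b
n = np.array([-1.0, -1.0, 1.0])        # basis vector for N

# Every element of U = u + N solves Ax = b (Theorem A.3.12).
for alpha in [0.0, 1.0, -2.5, 10.0]:
    assert np.allclose(A @ (u + alpha * n), b)
```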

     3. Consider the matrix equation \(Ax = b,\) where \(A\) is the matrix given by

\[    A= \left[ \begin{array}{ccc} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right]    \]

and \(b \in \mathbb{F}^3\) . This corresponds to a square inhomogeneous system of linear equations with two free variables. As above, this system has no solutions unless \(b_2 = b_3 = 0,\) and, given any scalars \(\alpha , \beta \in \mathbb{F},\) every vector of the form

\[   x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{c} b_1 - \alpha -\beta \\ \alpha \\ \beta \end{array} \right]= \left[ \begin{array}{c} b_1 \\ 0 \\ 0 \end{array} \right] +     \alpha \left[ \begin{array}{c} -1 \\ 1 \\ 0 \end{array} \right]  +  \beta \left[ \begin{array}{c} -1 \\ 0 \\ 1 \end{array} \right] = u+ \alpha n^{(1)}+ \beta n^{(2)}     \]

is a solution to \(Ax = b.\) Recall from Example A.3.10, that the solution space for the associated homogeneous matrix equation \(Ax = 0\) is

\[N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 + x_2 + x_3 = 0\} = span ((-1, 1, 0), (-1, 0, 1)) .\]

Thus, in the language of Theorem A.3.12, we have that u is a particular solution for \(Ax = b\) and that \((n^{(1)} , n^{(2)} )\) is a basis for \(N.\) Therefore, the solution set for \(Ax = b\) is

\[U = (b_1 , 0, 0) + N = \{(x_1 , x_2 , x_3 ) \in \mathbb{F}^3 ~|~ x_1 + x_2 + x_3 = b_1\} .\]

A.3.4 Solving linear systems with LU-factorization

Let \(n \in \mathbb{Z}_+\) be a positive integer, and suppose that \(A \in \mathbb{F}^{n \times n}\) is an upper triangular matrix and that \(b \in \mathbb{F}^n\) is a column vector. Then, in order to solve the matrix equation \(Ax = b\), there is no need to apply Gaussian elimination. Instead, we can exploit the triangularity of \(A\) in order to directly obtain a solution.

    Using the notation in System (A.1.1), note that the last equation in the linear system corresponding to \(Ax = b\) can only involve the single unknown \(x_n \), and so we can obtain the solution

\[x_n = \frac{b_n}{a_{nn}}\]

as long as \(a_{nn} \neq 0.\) If \(a_{nn} = 0,\) then we must be careful to distinguish between the two cases in which \(b_n = 0\) or \(b_n \neq 0.\) Thus, for reasons that will become clear below, we assume that the diagonal elements of \(A\) are all nonzero. Under this assumption, there is no ambiguity in substituting the solution for \(x_n\) into the penultimate (a.k.a. second-to-last) equation. Since \(A\) is upper triangular, the penultimate equation involves only the single unknown \(x_{n-1},\) and so we obtain the solution

\[   x_{n-1} = \frac{b_{n-1}-a_{n-1,n}x_n }{a_{n-1,n-1}}.  \]

We can then similarly substitute the solutions for \(x_n\) and \(x_{n-1}\) into the antepenultimate (a.k.a. third-to-last) equation in order to solve for \(x_{n-2}\), and so on until a complete solution is found. In particular,

\[   x_1 = \frac{b_1-\sum_{k=2}^{n}a_{1 k}x_{k} }{a_{1 1}}.  \]

As in Example A.3.5, we call this process back substitution. Given an arbitrary linear system, back substitution essentially allows us to halt the Gaussian elimination procedure and immediately obtain a solution for the system as soon as an upper triangular matrix (possibly in REF or even RREF) has been obtained from the coefficient matrix.
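The back substitution procedure just described can be written out in a few lines. Here is a minimal Python sketch (the function name is our own), assuming the coefficient matrix is square, upper triangular, stored as a list of rows, and has non-zero diagonal entries:

```python
# Back substitution for an upper triangular system Ax = b, assuming
# all diagonal entries of A are non-zero.
def back_substitution(A, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):        # solve from the last row upward
        # Subtract the contributions of the already-computed unknowns.
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

Since each unknown is computed exactly once, back substitution costs on the order of \(n^2\) arithmetic operations, which is far cheaper than running Gaussian elimination from scratch.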

     A similar procedure can be applied when \(A\) is lower triangular. Again using the notation in System (A.1.1), the first equation contains only \(x_1\) , and so

\[  x_1= \frac{b_1}{a_{1 1}}.\]

We are again assuming that the diagonal entries of \(A\) are all nonzero. Then, acting similarly
to back substitution, we can substitute the solution for \(x_1\) into the second equation in order
to obtain

\[    x_2= \frac{b_2-a_{2 1}x_1}{a_{2 2}}.\]

Continuing this process, we obtain a forward substitution procedure. In particular,

\[    x_n = \frac{b_n-\sum_{k=1}^{n-1}a_{n k}x_{k} }{a_{n n}}. \]
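Forward substitution admits an equally short sketch in Python (function name ours), under the same assumption of a non-zero diagonal:

```python
# Forward substitution for a lower triangular system Ax = b, assuming
# all diagonal entries of A are non-zero.
def forward_substitution(A, b):
    n = len(b)
    x = [0.0] * n
    for i in range(n):                    # solve from the first row downward
        # Subtract the contributions of the already-computed unknowns.
        s = sum(A[i][k] * x[k] for k in range(i))
        x[i] = (b[i] - s) / A[i][i]
    return x
```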

More generally, suppose that \(A \in \mathbb{F}^{n \times n}\) is an arbitrary square matrix for which there exists a lower triangular matrix \(L \in \mathbb{F}^{n \times n}\) and an upper triangular matrix \(U \in \mathbb{F}^{n \times n}\) such that \(A = LU.\) When such matrices exist, we call \(A = LU\) an LU-factorization (a.k.a. LU-decomposition) of \(A\). The benefit of such a factorization is that it allows us to exploit the triangularity of \(L\) and \(U\) when solving linear systems having coefficient matrix \(A\).

To see this, suppose that \(b \in \mathbb{F}^n\) is a column vector. (As above, we also assume that none of the diagonal entries in either \(L\) or \(U\) is zero.) Furthermore, set \(y = Ux\), where \(x\) is the as yet unknown solution of \(Ax = b.\) Then, by substitution, \(y\) must satisfy

\[Ly = b,\]

and so, since \(L\) is lower triangular, we can immediately solve for \(y\) via forward substitution. In other words, we are using the associativity of matrix multiplication (cf. Theorem A.2.6) in order to conclude that

\[b = Ax = (LU)x = L(Ux) = Ly.\]

Then, once we have obtained \(y \in \mathbb{F}^n\), we can apply back substitution in order to solve for \(x\) in the matrix equation

\[Ux = y.\]
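The two-stage procedure just described can be sketched as a single routine. Here is a minimal Python version (function name ours), assuming \(L\) and \(U\) are given as lists of rows with non-zero diagonal entries:

```python
# Solve Ax = b given a factorization A = LU: forward substitution
# on Ly = b, followed by back substitution on Ux = y.
def solve_lu(L, U, b):
    n = len(b)
    y = [0.0] * n
    for i in range(n):                    # forward substitution: Ly = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):        # back substitution: Ux = y
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x
```

Once the factorization is known, each solve costs only on the order of \(n^2\) operations, which is what makes LU-factorization attractive when the same coefficient matrix must be used with many different right-hand sides \(b\).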

          In general, one can only obtain an LU-factorization for a matrix \(A \in \mathbb{F}^{n \times n}\) when there exist elementary “row combination” matrices \(E_1 , E_2 , \ldots , E_k \in \mathbb{F}^{n \times n}\) and an upper triangular matrix \(U\) such that

\[E_k E_{k-1} \cdots E_1 A = U.\]

There are various generalizations of LU-factorization that allow for more than just elementary “row combination” matrices in this product, but we do not mention them here. Instead, we provide a detailed example that illustrates how to obtain an LU-factorization and then how to use such a factorization in solving linear systems.

Example A.3.15. Consider the matrix \(A \in \mathbb{F}^{3 \times 3}\) given by
\[   A = \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{array} \right].  \]

Using the techniques illustrated in Example A.3.5, we have the following matrix product:

\[   \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right]  \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]\left[ \begin{array}{ccc} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{array} \right]= \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{array} \right]=U. \]

In particular, we have found three elementary “row combination” matrices, which, when multiplied by \(A\), produce an upper triangular matrix \(U.\)

Now, in order to produce a lower triangular matrix \(L\) such that \(A = LU\), we rely on two facts about lower triangular matrices. First of all, any lower triangular matrix with an entirely non-zero diagonal is invertible, and, second, the product of lower triangular matrices is always lower triangular. (Cf. Theorem A.2.7.) More specifically, we have that

\[   \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 4 & 5 & 10 \\ 4 & 8 & 2 \end{array} \right]=\left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right]^{-1}    \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{array} \right], \]

where

\[      \left[ \begin{array}{ccc} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{array} \right]^{-1} \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{array} \right]^{-1} =    \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{array} \right]=  \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 2 & -2 & 1 \end{array} \right].\]

We call the resulting lower triangular matrix \(L\) and note that \(A = LU,\) as desired.
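The elimination performed above can be mechanized: each “row combination” multiplier used to clear an entry below the diagonal is precisely the corresponding entry of \(L\). Here is a minimal Python sketch (the function name is ours; no row exchanges are attempted, so we assume the pivots never vanish during elimination):

```python
# LU-factorization by Gaussian elimination without row exchanges.
# Each multiplier m that eliminates an entry below the diagonal is
# recorded as the corresponding entry of L.
def lu_factorize(A):
    n = len(A)
    U = [row[:] for row in A]             # working copy; becomes U
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]         # multiplier for row_i -= m * row_j
            L[i][j] = m                   # the inverse row combination lands in L
            for k in range(j, n):
                U[i][k] -= m * U[j][k]
    return L, U
```

Applied to the matrix \(A\) of this example, the routine reproduces the factors \(L\) and \(U\) obtained above by hand.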
      Now, define \(x, y,\) and \(b\) by

\[x=\left[ \begin{array}{c} x_1 \\ x_2 \\ x_3 \end{array} \right], y= \left[ \begin{array}{c} y_1 \\ y_2 \\ y_3 \end{array} \right], \rm{~and~} b= \left[ \begin{array}{c} 6 \\ 16 \\ 2 \end{array} \right].\]

Applying forward substitution to \(Ly = b,\) we obtain the solution

\[ \left. \begin{array}{ccccccccc} y_1& = &b_1&&&&&=&6 \\  y_2& =& b_2& - &2y_1&&&=&4 \\ y_3& = &b_3& -& 2y_1& +& 2y_2& =& -2\end{array} \right\}.  \]

Then, given this unique solution \(y\) to \(Ly = b,\) we can apply backward substitution to \(Ux = y\) in order to obtain

\[    \left. \begin{array}{ccccccccc} 2x_1& = &y_1&-&3x_2&-&4x_3&=&8 \\  -x_2& =& y_2&  &&-&2x_3&=&2 \\ -2x_3& = &y_3& & & & & =& -2\end{array} \right\}.   \]


It follows that the unique solution to \(Ax = b\) is

\[     \left. \begin{array}{ccccccccc} x_1& =&4 \\  x_2& =&-2 \\ x_3& = & 1 \end{array} \right\}.    \]
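As a quick numerical check of this example, one can multiply \(L\) by \(U\) to confirm that the product recovers \(A\), and then apply \(A\) to the solution to confirm that it recovers \(b\); a short Python fragment:

```python
# Sanity check for Example A.3.15: L times U should recover A,
# and A applied to the solution x should recover b.
L = [[1, 0, 0], [2, 1, 0], [2, -2, 1]]
U = [[2, 3, 4], [0, -1, 2], [0, 0, -2]]
x = [4, -2, 1]

# Matrix product A = LU, entry by entry.
A = [[sum(L[i][k] * U[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
# Matrix-vector product b = Ax.
b = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
```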

         In summary, we have given an algorithm for solving any matrix equation \(Ax = b\) in which \(A = LU,\) where \(L\) is lower triangular, \(U\) is upper triangular, and both \(L\) and \(U\) have nothing but non-zero entries along their diagonals.

We note in closing that the simple procedures of back and forward substitution can also be used for computing the inverses of lower and upper triangular matrices. E.g., the inverse \(U^{-1} = (u_{ij} )\) of the matrix

\[ U = \left[ \begin{array}{ccc} 2 & 3 & 4 \\ 0 & -1 & 2 \\ 0 & 0 & -2 \end{array} \right] \]

must satisfy

\[   \left[ \begin{array}{ccc} 2u_{1 1}+ 3u_{2 1}+4u_{3 1}& 2u_{1 2}+ 3u_{2 2}+4u_{3 2} & 2u_{1 3}+ 3u_{2 3}+4u_{3 3} \\ -u_{2 1}+2u_{3 1} & -u_{2 2}+2u_{3 2} & -u_{2 3}+2u_{3 3} \\ -2u_{3 1} & -2u_{3 2} & -2u_{3 3} \end{array} \right]=UU^{-1} = I_{3}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right],  \]

from which we obtain the linear system

\[   \left. \begin{array}{ccccccccccc} 2u_{1 1}&&&+3u_{2 1}&&& +4u_{3 1}&&& =&1 \\  &2u_{1 2}&&&+3u_{2 2} &&&  +4u_{3 2} &&=&0 \\ &&2u_{1 3}& &&+3u_{2 3}&&& +4u_{3 3} &= & 0 \\ &&&-u_{2 1}&&& +2u_{3 1}&&& =& 0 \\ &&&& -u_{2 2} &&&+2u_{3 2}&& = &1 \\ &&&&&-u_{2 3}&&& +2u_{3 3} &=& 0 \\ &&&&&&-2u_{3 1} &&&=& 0 \\ &&&&&&&-2u_{3 2}&& =& 0 \\  &&&&&&&&-2u_{3 3}& = &1\end{array} \right\}.    \]

in the nine variables \(u_{11} , u_{12} , \ldots , u_{33} .\) Since this linear system has an upper triangular coefficient matrix, we can apply back substitution in order to directly solve for the entries of \(U^{-1}.\)
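In code, the same idea amounts to applying back substitution once per column of the identity matrix: the \(j\)-th column of \(U^{-1}\) is the solution of \(Ux = e_j\). A minimal Python sketch (function name ours):

```python
# Invert an upper triangular matrix column by column: each column of the
# inverse solves U x = e_j, which back substitution handles directly.
def upper_inverse(U):
    n = len(U)
    cols = []
    for j in range(n):
        e = [1.0 if i == j else 0.0 for i in range(n)]   # standard basis vector
        x = [0.0] * n
        for i in range(n - 1, -1, -1):                   # back substitution
            x[i] = (e[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
        cols.append(x)
    # Transpose the list of columns into the rows of the inverse.
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

The analogous column-by-column procedure with forward substitution inverts a lower triangular matrix.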

      The only condition we imposed upon our triangular matrices above was that all diagonal entries were non-zero. It should be clear to you that this non-zero diagonal restriction is a necessary and sufficient condition for a triangular matrix to be non-singular. Moreover, once the inverses of both \(L\) and \(U\) in an LU-factorization have been obtained, then we can immediately calculate the inverse for \(A = LU\) by applying Theorem A.2.9(4):

\[ A^{-1} = (LU)^{-1} = U^{-1} L^{-1}. \]