2.1: Gaussian Elimination

Last updated
Save as PDF

Page ID: 1751

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $ $ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $$\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$ $\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$

Systems of linear equations can be written as matrix equations. Now you will learn an efficient algorithm for (maximally) simplifying a system of linear equations (or a matrix equation) -- Gaussian elimination.

Augmented Matrix Notation

Efficiency demands a new notation, called an augmented matrix, which we introduce via examples:

The linear system

\[\left\{\begin{array}{rcr}
x\ +\ y & = & 27 \\
2x-\ y & = &\ 0\, ,
\end{array}\right.
\]

is denoted by the augmented matrix

\[\left(\begin{array}{rr|r}
1 &1 &27 \\ 2 &-1 & 0
\end{array}\right)\]

This notation is simpler than the matrix one,

\[
\begin{pmatrix}
1 &1 \\
2 &-1
\end{pmatrix}
\begin{pmatrix} x \\ y\end{pmatrix}
=
\begin{pmatrix} 27 \\ 0\end{pmatrix} ,
\]

although all three of the above equations denote the same thing.

Another interesting rewriting is

\[
x\begin{pmatrix} 1\\2\end{pmatrix}+y\begin{pmatrix} 1\\-1\end{pmatrix}=\begin{pmatrix} 27\\0\end{pmatrix} .
\]

This tells us that we are trying to find which combination of the vectors $\begin{pmatrix}1\\2\end{pmatrix}$ and $\begin{pmatrix} 1\\-1\end{pmatrix}$ adds up to $\begin{pmatrix} 27\\0\end{pmatrix}$;
the answer is "clearly'' $9\begin{pmatrix}1\\2\end{pmatrix}+18\begin{pmatrix}1\\-1\end{pmatrix}$.

Here is a larger example.
The system
\begin{eqnarray*}
1x + 3y + 2z + 0w &=&9 \\
6x + 2y + 0z -2w &=&0 \\
-1x+ 0 y + 1 z + 1w &=&3 \, ,
\end{eqnarray*}
is denoted by the augmented matrix

\[\left(\begin{array}{rrrr|r}
1 &3 &2 &0 &9 \\ 6 &2 &0 &-2 &0 \\ -1 &0 &1 &1 &3
\end{array}\right),\]

which is equivalent to the matrix equation

\[\begin{pmatrix} 1 &3 &2 &0 \\ 6 &2 &0 &-2 \\ -1 &0 &1 &1\end{pmatrix}
\begin{pmatrix} x\\ y\\z\\w\end{pmatrix}
=\begin{pmatrix}9\\0\\3\end{pmatrix}.
\]

Again, we are trying to find which combination of the columns of the matrix adds up to the vector on the right hand side.

For the the general case of $r$ linear equations in $k$ unknowns, the number of equations is the number of rows $r$ in the augmented matrix, and the number of columns $k$ in the matrix left of the vertical line is the number of unknowns:

\[\left(\begin{array}{rrrr|r}
a^1_1 & a^1_2 & \cdots & a^1_k & b^1 \\ a^2_1 & a^2_2 & \cdots & a^2_k & b^2 \\ \vdots & \vdots & & \vdots & \vdots \\ a^r_1 & a^r_2 & \cdots & a^r_k & b^r
\end{array}\right),\]

Entries left of the divide carry two indices; subscripts denote column number and superscripts row number. We emphasize, the superscripts here do $\textit {not}$ denote exponents.

We now have three ways of writing the same question. Let's put them side by side as we solve the system by strategically adding and subtracting equations. We will not tell you the motivation for this particular series of steps yet, but let you develop some intuition first.

Example $\PageIndex{1}$: How matrix equations and augmented matrices change in elimination

\[\left.\begin{matrix}x + y = 27\\ 2x - y = 0\end{matrix}\right\}\Leftrightarrow
\begin{pmatrix}
1 & 1\\
2 & -1
\end{pmatrix}
\begin{pmatrix}
x\\
y
\end{pmatrix}
=
\begin{pmatrix}
27\\
0
\end{pmatrix}
\Leftrightarrow \left(\begin{array}{rr|r}
1 &1 &27 \\ 2 &-1 &0
\end{array}\right).\]

With the first equation replaced by the sum of the two equations this becomes

\[\left.\begin{matrix}3x + 0 = 27\\ 2x - y = 0\end{matrix}\right\}\Leftrightarrow
\begin{pmatrix}
3 & 0\\
2 & -1
\end{pmatrix}
\begin{pmatrix}
x\\
y
\end{pmatrix}
=
\begin{pmatrix}
27\\
0
\end{pmatrix}
\Leftrightarrow \left(\begin{array}{rr|r}
3 &0 &27 \\ 2 &-1 &0
\end{array}\right).\]

Let the new first equation be the old first equation divided by 3:

\[\left.\begin{matrix}x + 0 = 9\\ 2x - y = 0\end{matrix}\right\}\Leftrightarrow
\begin{pmatrix}
1 & 0\\
2 & -1
\end{pmatrix}
\begin{pmatrix}
x\\
y
\end{pmatrix}
=
\begin{pmatrix}
9\\
0
\end{pmatrix}
\Leftrightarrow \left(\begin{array}{rr|r}
1 &0 &9 \\ 2 &-1 &0
\end{array}\right).\]

Replace the second equation by the second equation minus two times the first equation:

\[\left.\begin{matrix}x + 0 = 9\\ 0 - y = -18\end{matrix}\right\}\Leftrightarrow
\begin{pmatrix}
1 & 0\\
& -1
\end{pmatrix}
\begin{pmatrix}
x\\
y
\end{pmatrix}
=
\begin{pmatrix}
9\\
-18
\end{pmatrix}
\Leftrightarrow \left(\begin{array}{rr|r}
1 &0 &9 \\ 0 &-1 &-18
\end{array}\right).\]

Let the new second equation be the old second equation divided by -1:

\[\left.\begin{matrix}x + 0 = 9\\ 0 - y = 18\end{matrix}\right\}\Leftrightarrow
\begin{pmatrix}
1 & 0\\
0 & 1
\end{pmatrix}
\begin{pmatrix}
x\\
y
\end{pmatrix}
=
\begin{pmatrix}
9\\
18
\end{pmatrix}
\Leftrightarrow \left(\begin{array}{rr|r}
1 &0 &9 \\ 0 &1 &18
\end{array}\right).\]

Did you see what the strategy was? To eliminate $y$ from the first equation and then eliminate $x$ from the second. The result was the solution to the system.

Here is the big idea: Everywhere in the instructions above we can replace the word "equation" with the word "row" and interpret them as telling us what to do with the augmented matrix instead of the system of equations. Performed systemically, the result is the $\textit {Gaussian elimination}$ algorithm.

Equivalence and the Act of Solving

We introduce the symbol $\sim$ which is called "tilde" but should be read as "is (row) equivalent to'' because at each step the augmented matrix changes by an operation on its rows but its solutions do not. For example, we found above that

\[\left(\begin{array}{rr|r}
1 &1 &27 \\ 2 &-1 & 0
\end{array}\right)\sim\left(\begin{array}{rr|r} 1 &0 &9 \\ 2 &-1 &0\end{array}\right)\sim\left(\begin{array}{rr|r} 1 &0 &9 \\ 0 &1 &18\end{array}\right)\]

The last of these augmented matrices is our favorite!

Setting up a string of equivalences like this is a means of solving a system of linear equations. This is the main idea of Section 2.1.3. This next example hints at the main trick:

Example $\PageIndex{2}$: Using Gaussian elimination

\[\left.\begin{matrix}x + y = 5\\ x + 2y = 8\end{matrix}\right\}\Leftrightarrow\left(\begin{array}{rr|r} 1 &1 &5 \\ 1 &2 &8\end{array}\right)\sim\left(\begin{array}{rr|r}1 &1 &5 \\ 0 &1 &3\end{array}\right)\sim\left(\begin{array}{rr|r} 1 &0 &2 \\ 0 &1 &3\end{array}\right)\Leftrightarrow\left\{\begin{matrix}x + 0 = 2\\0 + y = 3 \end{matrix}\right.\]

Note that in going from the first to second augmented matrix, we used the top left $1$ to make the bottom left entry zero. For this reason we call the top left entry a pivot. Similarly, to get from the second to third augmented matrix, the bottom right entry (before the divide) was used to make the top right one vanish; so the bottom right entry is also called a pivot.

This name $\textit {pivot}$ is used to indicate the matrix entry used to "zero out'' the other entries in its column.

Reduced Row Echelon Form

For a system of two linear equations, the goal of Gaussian elimination is to convert the part of the augmented matrix left of the dividing line into the matrix

\[I= \begin{pmatrix}
1 &0 \\
0 &1
\end{pmatrix}\, ,\]

called the $\textit{Identity Matrix}$, since this would give the simple statement of a solution $x=a,y=b$. The same goes for larger systems of equations for which the identity matrix $I$ has 1's along its diagonal and all off-diagonal entries vanish:

\[I=\left(\begin{array}{ccccc}1&0&\cdots&0\\0&1&&0\\ \vdots&&\ddots&\vdots\\0&0&\cdots&1\end{array}\right).\]

For many systems, it is not possible to reach the identity in the augmented matrix via Gaussian elimination. In any case, a certain version of the matrix that has the maximum number of components eliminated is said to be the Row Reduced Echelon Form (RREF).

Example $\PageIndex{3}$: Redundant equations

\[\left.\begin{matrix}x + y = 2\\ 2x + 2y = 4\end{matrix}\right\}\Leftrightarrow\left(\begin{array}{rr|r} 1 &1 &2 \\ 2 &2 &4\end{array}\right)\sim\left(\begin{array}{rr|r}1 &1 &2 \\ 0 &0 &0\end{array}\right)\Leftrightarrow\left\{\begin{matrix}x + y = 2\\0 + 0 = 0 \end{matrix}\right.\]

This example demonstrates if one equation is a multiple of the other the identity matrix can not be a reached. This is because the first step in elimination will make the second row a row of zeros. Notice that solutions still exists $x=1,y=1$ is a solution. The last augmented matrix here is in RREF.

Example $\PageIndex{4}$: Inconsistent equations

\[\left.\begin{matrix}x + y = 2\\ 2x + 2y = 5\end{matrix}\right\}\Leftrightarrow\left(\begin{array}{rr|r} 1 &1 &2 \\ 2 &2 &5\end{array}\right)\sim\left(\begin{array}{rr|r}1 &1 &2 \\ 0 &0 &1\end{array}\right)\Leftrightarrow\left\{\begin{matrix}x + y = 2\\0 + 0 = 1 \end{matrix}\right.\]

This system of equation has a solution if there exists two numbers $x$, and $y$ such that $0+0=1$. That is a tricky way of saying there are no solutions. The last form of the augmented matrix here is in RREF.

Example $\PageIndex{5}$: Silly order of equations

A robot might make this mistake:

$$\left.\begin{matrix}0x + y = -2\\ x + y = 7\end{matrix}\right\}\Leftrightarrow\left(\begin{array}{rr|r} 0 &1 &-2 \\ 1 &1 &7\end{array}\right)\sim\cdots,\]

and then give up because the the upper left slot can not function as a pivot since the 0 that lives there can not be used to eliminate the zero below it. Of course, the right thing to do is to change the order of the equations before starting

$$\left.\begin{matrix}x + y = 7\\ 0x + y = -2\end{matrix}\right\}\Leftrightarrow\left(\begin{array}{rr|r} 1 &1 &7 \\ 0 &1 &-2\end{array}\right)\sim\left(\begin{array}{rr|r}1 &0 &9 \\ 0 &1 &-2\end{array}\right)\Leftrightarrow\left\{\begin{matrix}x + 0 = 9\\0 + y = -2 \end{matrix}\right.\]

The third augmented matrix above is the RREF of the first and second. That is to say, you can swap rows on your way to RREF.

For larger systems of matrices, these three kinds of problems are the obstruction to obtaining the identity matrix, and hence to a simple statement of a solution in the form $x=a,y=b,\ldots$. What can we do to maximally simplify a system of equations in general? We need to perform operations that simplify our system $\textit {without changing its solutions}$. Because, exchanging the order of equations, multiplying one equation by a non-zero constant or adding equations does not change the system's solutions, we are lead to three operations:

Row Swap:Exchange any two rows.
Scalar Multiplication: Multiply any row by a non-zero constant.
Row Sum: Add a multiple of one row to another row.

These are called Elementary Row Operations, or EROs for short, and are studied in detail in Section 2.3. Suppose now we have a general augmented matrix for which the first entry in the first row does not vanish. Then, using just the three EROs, we could then perform the following algorithm:

Make the leftmost nonzero entry in the top row 1 by multiplication}.
Then use that 1 as a pivot to eliminate everything below it}.
Then go to the next row and make the leftmost non zero entry 1}.
Use that 1 as a pivot to eliminate everything below and above it}!
Go to the next row and make the leftmost nonzero entry 1... etc}.

In the case that the first entry of the first row is zero, we may first interchange the first row with another row whose first entry is non-vanishing and then perform the above algorithm. If the entire first column vanishes, we may still apply the algorithm on the remaining columns.

This algorithm is known as Gaussian elimination, its endpoint is an augmented matrix of the form

\[\left(\begin{array}{cccccccc|c}
1 & * & 0 & * & 0 & \cdots& 0 &* & b^1 \\
0 & 0 & 1 & * & 0 & \cdots& 0 &* & b^2 \\
0 & 0& 0 & 0 & 1 & \cdots& 0 &* & b^3 \\
\vdots & \vdots& \vdots & & \vdots & & &
\vdots & \vdots \\
0 & 0& 0 & 0& 0 & \cdots & 1 &* & b^k \\
0 & 0 & 0 & 0 & 0 & \cdots& 0 &0 & b^{k+1} \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots &\vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots& 0 & 0& b^r \\
\end{array}\!\right)\]

This is called Reduced Row Echelon Form (RREF). The asterisks denote the possibility of arbitrary numbers ($\textit {e.g.}$, the second 1 in the top line of example12).

The following properties define RREF:

In every row the left most non-zero entry is 1 (and is called a pivot ).
The pivot of any given row is always to the right of the pivot of the row above it.
The pivot is the only non-zero entry in its column.

Here are some examples:

Example $\PageIndex{6}$: Augmented matrix in RREF

\[\left(\begin{array}{rrrr|r} 1 &0 &7 &0 \\ 0 &1 &3 &0 \\ 0 &0 &0 &1 \\ 0 &0 &0 &0\end{array}\right)\]

Example $\PageIndex{7}$: Augmented matrix NOT in RREF

\[\left(\begin{array}{rrr|r} 1 &0 &3 &0 \\ 0 &0 &2 &0 \\ 0 &1 &0 &1 \\ 0 &0 &0 &1\end{array}\right)\]

Actually, this NON-example breaks all three of the rules!

The reason we need the asterisks in the general form of RREF is that not every column need have a pivot, as demonstrated in examples 12 and 15. Here is an example where multiple columns have no pivot:

Example $\PageIndex{8}$: Consecutive columns with no pivot in RREF

\[\left.\begin{matrix}x + y + z + 0w = 2\\ 2x + 2y + 2z + 2w = 4\end{matrix}\right\}\Leftrightarrow\left(\begin{array}{rrrr|r} 1 &1 &1 &0 &2 \\ 2 &2 &2 &1 &4\end{array}\right)\sim\left(\begin{array}{rrrr|r}1 &1 &1 &0 &2 \\ 0 &0 &0 &1 &0\end{array}\right)\Leftrightarrow\left\{\begin{matrix}x + y + z = 2\\w = 0 \end{matrix}\right.\]

Note that there was no hope of reaching the identity matrix, because of the shape of the augmented matrix we started with.

It is important that you are able to convert RREF back into a set of equations. The first thing you might notice is that if any of the numbers $b^{k+1},\dots, b^r$ are non-zero then the system of equations is inconsistent and has no solutions. Our next task is to extract all possible solutions from an RREF augmented matrix.

Solution Sets and RREF

RREF is a maximally simplified version of the original system of equations in the following sense:

As many coefficients as possible of the variables vanish.
As many coefficients as possible of the variables is unity.

It is easier to read off solutions from the maximally simplified equations than from the original equations, even when there are infinitely many solutions.

Example $\PageIndex{9}$: Standard approach from a system of equations to the solution set

\[\left.\begin{matrix} x + y + 5w = 1\\ y + 2w = 6\\ z + 4w = 8\end{matrix}\right\}\Leftrightarrow\left(\begin{array}{rrrr|r} 1 &1 &0 &5 &1 \\ 0 &1 &0 &2 &6\\ 0 &0 &1 &4 &8\end{array}\right)\sim\left(\begin{array}{rrrr|r}1 &0 &0 &3 &-5 \\ 0 &1 &0 &2 &6\\ 0 &0 &1 &4 &8\end{array}\right)\Leftrightarrow\begin{Bmatrix} x + 3w = -5\\ y + 2w = 6\\ z + 4w = 8\end{Bmatrix}\Leftrightarrow\begin{Bmatrix} x = -5 - 3w\\ y = 6 - 2w\\ z = 8 - 4w\\ w = w\end{Bmatrix}\]

\[\Leftrightarrow\begin{pmatrix} x \\y \\z \\w \end{pmatrix} = \begin{pmatrix} -5\\ 6 \\8 \\0 \end{pmatrix} +w\begin{pmatrix} -3\\ -2\\ -4\\ 1\end{pmatrix}.\]

There is one solution for each value of $w$, so the solution set is

\[\begin{Bmatrix} \begin{pmatrix} -5\\ 6\\ 8\\ 0\end{pmatrix} + \alpha\begin{pmatrix} -3\\ -2\\ -4\\ 1\end{pmatrix} : \alpha \in \mathbb{R}\end{Bmatrix}\]

Here is a verbal description of the preceding example of the standard approach. We say that $x, y$, and $z$ are pivot variables because they appeared with a pivot coefficient in RREF. Since w never appears with a pivot coefficient, it is not a pivot variable. In the second line we put all the pivot variables on one side and all the non-pivot variables on the other side and added the trivial equation $w = w$ to obtain a system that allowed us to easily read off solutions.

The last example demonstrated the standard approach for solving a system of linear equations in its entirety:

Write the augmented matrix.
Perform EROs to reach RREF.
Express the non-pivot variables in terms of the pivot variables.

There are always exactly enough non-pivot variables to index your solutions. In any approach, the variables which are not expressed in terms of the other variables are called free variables. The standard approach is to use the non-pivot variables as free variables.

Non-standard approach: solve for $w$ in terms of $z$ and substitute into the other equations. You now have an expression for each component in terms of $z$. But why pick $z$ instead of $y$ or $x$? (or $x+y$?) The standard approach not only feels natural, but is $\textit {canonical}$, meaning that everyone will get the same RREF and hence choose the same variables to be free. However, it is important to remember that so long as their $\textit {set}$ of solutions is the same, any two choices of free variables is fine. (You might think of this as the difference between using Google Maps$^{\sf TM}$ or Mapquest$^{\sf TM}$; although their maps may look different, the place $\langle$home $\textit {sic}$$\rangle$ they are describing is the same!)

When you see an RREF augmented matrix with two columns that have no pivot, you know there will be two free variables.

Example $\PageIndex{10}$: Standard approach, multiple free variables

\[\left(\begin{array}{rrrr|r} 1 &0 &7 &0 &4 \\ 0 &1 &3 &4 &1\\ 0 &0 &0 &0 &0 \\ 0 &0 &0 &0 &0\end{array}\right)\Leftrightarrow\begin{Bmatrix} x + 7z = 4 \\ y + 3z + 4w = 1\end{Bmatrix}\Leftrightarrow\begin{Bmatrix} x = 4 - 7z \\ y = 1 - 3z - 4w \\ z = z \\ w = w\end{Bmatrix}\]

\[\Leftrightarrow\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ 0 \\ 0 \end{pmatrix} + z\begin{pmatrix} -7 \\ -3 \\ 1 \\ 0 \end{pmatrix} + w\begin{pmatrix} 0 \\ -4 \\ 0 \\ 1 \end{pmatrix}\]

So the solution set is

\[\begin{Bmatrix} \begin{pmatrix} 4\\ 1\\ 0\\ 0\end{pmatrix} + z\begin{pmatrix} -7\\ -3\\ 1\\ 0\end{pmatrix} + w\begin{pmatrix} 0\\ -4\\ 0\\ 1\end{pmatrix} : z, w \in \mathbb{R}\end{Bmatrix}\]

There are infinitely many solutions; one for each pair of numbers $z,w$. You can imagine having three, four, or fifty-six non-pivot columns and the same number of free variables indexing your solutions set. In general a solution set to a system of equations with $n$ free variables will be of the form

\[\begin{Bmatrix} P + \mu_{1}H_{1} + \mu_{2}H_{2} + \cdots + \mu_{n}H_{n} : \mu_{1},\cdots,\mu_{n} \in \mathbb{R}\end{Bmatrix}\]

The parts of these solutions play special roles in the associated matrix equation. This will come up again and again long after this discussion of basic calculation methods, so the general language of Linear Algebra will be used to give names to these parts now.

Definition: homogeneous solution

A homogeneous solution to a linear equation $Lx = v$, with $L$ and $v$ known is a vector $H$ such that $LH = 0$ where $0$ is the zero vector.

If you have a particular solution P to a linear equation and add a sum of multiples of homogeneous solutions to it you obtain another particular solution.

Check now that the parts of the solutions with free variables as coefficients from the previous examples are homogeneous solutions, and that by adding a homogeneous solution to a particular solution one obtains a solution to the matrix equation. This will come up over and over again. As an example without matrices, consider the differential equation $\frac{\mathrm{d}^2 f}{\mathrm{d} x^2} = 3$. A particular solution is $\frac{3}{2}x^2$ while $x$ and $1$ are homogeneous solutions. The solution set is $\begin{Bmatrix} \frac{3}{2}x^2 + ax + c1 : a, b \in \mathbb{R}\end{Bmatrix}$. You can imagine similar differential equations with more homogeneous solutions.

Contributor

David Cherney, Tom Denton, and Andrew Waldron (UC Davis)