# 12.1: From linear systems to matrix equations

- Page ID
- 294

We begin this section by reviewing the deﬁnition of and notation for matrices. We then review several different conventions for denoting and studying systems of linear equations, the most fundamental being as a single matrix equation. This point of view has a long history of exploration, and numerous computational devices — including several computer programming languages — have been developed and optimized specifically for analyzing matrix equations.

## A.1.1 Definition of and notation for matrices

Let \(m, n \in \mathbb{Z}_{+}\) be positive integers, and, as usual, let \(\mathbb{F}\) denote either \(\mathbb{R}\) or \(\mathbb{C}\). Then we begin by defining an \(m \times n\) **matrix** \(A\) to be a rectangular array of numbers

\[ A = (a_{i j})_{i,j=1}^{m,n} = (A^{(i, j)})_{i,j=1}^{m,n} = \,

\left.

\begin{bmatrix}

a_{1 1} & \cdots & a_{1 n} \\

\vdots & \ddots & \vdots \\

a_{m 1} & \cdots & a_{m n}

\end{bmatrix}

\right\}

m \mbox{ numbers}

\hspace{-5.675cm}

\underbrace{

\phantom{

\begin{bmatrix}

a & a & a & a_{a_{a}}\\

a & a & a\\

a & a & a\\

a & a & a\\

\end{bmatrix}

}

}_{\textstyle n \mbox{ numbers}}

\]

where each element \(a_{i j} \in \mathbb{F}\) in the array is called an **entry** of \(A\) (specifically, \(a_{i j}\) is called the ``\(i, j\) entry''). We say that \(i\) indexes the **rows** of \(A\) as it ranges over the set \(\{1, \ldots, m\}\) and that \(j\) indexes the **columns** of \(A\) as it ranges over the set \(\{1, \ldots, n\}\). We also say that the matrix \(A\) has **size** \(m \times n\) and note that it is a (finite) sequence of doubly-subscripted numbers for which the two subscripts in no way depend upon each other.

**Definition A.1.1.** Given positive integers \(m, n \in \mathbb{Z}_{+}\), we use \(\mathbb{F}^{m \times n}\) to denote the set of all \(m \times n\) matrices having entries over \(\mathbb{F}.\)

**Example A.1.2. **The matrix \( A = \begin{bmatrix} 1 & 0 & 2 \\ -1 & 3 & i \end{bmatrix} \in \mathbb{C}^{2 \times 3}\), but \(A \notin \mathbb{R}^{2 \times 3}\) since the "\(2,3\)"' entry of \(A\) is not in \(\mathbb{R}.\)

Given the ubiquity of matrices in both abstract and applied mathematics, a rich vocabulary has been developed for describing various properties and features of matrices. In addition, there is also a rich set of equivalent notations. For the purposes of these notes, we will use the above notation unless the size of the matrix is understood from context or is unimportant. In this case, we will drop much of this notation and denote a matrix simply as

\[ A = (a_{i j}) \mbox{ or } A = (a_{i j})_{m \times n}. \]

To get a sense of the essential vocabulary, suppose that we have an \(m \times n\) matrix \(A = (a_{i j})\) with \(m = n\). Then we call \(A\) a **square** matrix. The elements \(a_{1 1}, a_{2 2}, \ldots, a_{n n}\) in a square matrix form the **main diagonal** of \(A\), and the elements \(a_{1 n}, a_{2, n-1}, \ldots, a_{n 1}\) form what is sometimes called the **skew main diagonal** of \(A\). Entries not on the main diagonal are also often called **off-diagonal** entries, and a matrix whose off-diagonal entries are all zero is called a **diagonal matrix**. It is common to call \(a_{1 2}, a_{2 3}, \ldots, a_{n-1,n}\) the **superdiagonal** of \(A\) and \(a_{2 1}, a_{3 2}, \ldots, a_{n,n-1}\) the **subdiagonal** of \(A\). The motivation for this terminology should be clear if you create a sample square matrix and trace the entries within these particular subsequences of the matrix.

Square matrices are important because they are fundamental to applications of Linear Algebra. In particular, virtually every use of Linear Algebra either involves square matrices directly or employs them in some indirect manner. In addition, virtually every usage also involves the notion of **vector**, where here we mean either an \(m \times 1\) matrix (a.k.a.~a **column vector**) or a \(1 \times n\) matrix (a.k.a. a **row vector**).

**Example A.1.3.** Suppose that \(A = (a_{i j})\), \(B = (b_{i j})\), \(C = (c_{ij})\), \(D = (d_{i j})\), and \(E = (e_{i j})\) are the following matrices over \(\mathbb{F}\):

\[ A = \left[

\begin{array}{r}

3 \\

-1 \\

1

\end{array}

\right]\hspace{-.1cm},\hspace{.1cm}

B =

\left[

\begin{array}{rr}

4 & -1 \\

0 & 2

\end{array}

\right]\hspace{-.1cm},\hspace{.1cm}

C =

\left[

\begin{array}{rrr}

1, & 4, & 2

\end{array}

\right]\hspace{-.1cm},\hspace{.1cm}

D =

\left[

\begin{array}{rrr}

1 & 5 & 2 \\

-1 & 0 & 1 \\

3 & 2 & 4

\end{array}

\right]\hspace{-.1cm},\hspace{.1cm}

E =

\left[

\begin{array}{rrr}

6 & 1 & 3 \\

-1 & 1 & 2 \\

4 & 1 & 3

\end{array}

\right]\hspace{-.1cm}.

\]

Then we say that \(A\) is a \(3 \times 1\) matrix (a.k.a.~a column vector), \(B\) is a \(2 \times 2\) square matrix, \(C\) is a \(1 \times 3\) matrix (a.k.a. a row vector), and both \(D\) and \(E\) are square \(3 \times 3\) matrices. Moreover, only \(B\) is an upper-triangular matrix (as defined below), and none of the matrices in this example are diagonal matrices.

We can discuss individual entries in each matrix. E.g.,

- the \(2^{\text{th}}\) row of \(D\) is \(d_{2 1} = -1\), \(d_{2 2} = 0\), and \(d_{2 3} = 1\).
- the main diagonal of \(D\) is the sequence \(d_{1 1} = 1, d_{2 2} = 0, d_{3 3} = 4\).
- the skew main diagonal of \(D\) is the sequence \(d_{1 3} = 2, d_{2 2} = 0, d_{3 1} = 3\).
- the off-diagonal entries of \(D\) are (by row) \(d_{1 2}\), \(d_{1 3}\), \(d_{2 1}\), \(d_{2 3}\), \(d_{3 1}\), and \(d_{3 2}\).
- the \(2^{\text{th}}\) column of \(E\) is \(e_{1 2} = e_{2 2} = e_{3 2} = 1\).
- the superdiagonal of \(E\) is the sequence \(e_{1 2} = 1, e_{2 3} = 2\).
- the subdiagonal of \(E\) is the sequence \(e_{2 1} = -1, e_{3 2} = 1\).

A square matrix \(A = (a_{i j}) \in \mathbb{F}^{n \times n}\) is called **upper triangular** (resp. **lower triangular**) if \(a_{i j} = 0\) for each pair of integers \(i,j \in \{1, \ldots, n\}\) such that \(i > j\) (resp. \(i < j\)). In other words, \(A\) is triangular if it has the form

\[

\begin{bmatrix}

a_{1 1} & a_{1 2} & a_{1 3} & \cdots & a_{1 n} \\

0 & a_{2 2} & a_{2 3} & \cdots & a_{2 n} \\

0 & 0 & a_{3 3} & \cdots & a_{3 n} \\

\vdots & \vdots & \vdots & \ddots & \vdots \\

0 & 0 & 0 & \cdots & a_{n n}

\end{bmatrix}

\ \

\text{or}

\ \

\begin{bmatrix}

a_{1 1} & 0 & 0 & \cdots & 0 \\

a_{2 1} & a_{2 2} & 0 & \cdots & 0 \\

a_{3 1} & a_{3 2} & a_{3 3} & \cdots & 0 \\

\vdots & \vdots & \vdots & \ddots & \vdots \\

a_{n 1} & a_{n 2} & a_{n 3} & \cdots & a_{n n}

\end{bmatrix}.

\]

Note that a diagonal matrix is simultaneously both an upper triangular matrix and a lower triangular matrix.

Two particularly important examples of diagonal matrices are defined as follows: Given any positive integer \(n \in \mathbb{Z}_{+}\), we can construct the **identity matrix** \(I_{n}\) and the **zero matrix** \(0_{n \times n}\) by setting

\[

I_{n} =

\begin{bmatrix}

1 & 0 & 0 & \cdots & 0 & 0 \\

0 & 1 & 0 & \cdots & 0 & 0 \\

0 & 0 & 1 & \cdots & 0 & 0 \\

\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\

0 & 0 & 0 & \cdots & 1 & 0 \\

0 & 0 & 0 & \cdots & 0 & 1 \\

\end{bmatrix}

\mbox{ and }

0_{n \times n} =

\begin{bmatrix}

0 & 0 & 0 & \cdots & 0 & 0 \\

0 & 0 & 0 & \cdots & 0 & 0 \\

0 & 0 & 0 & \cdots & 0 & 0 \\

\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\

0 & 0 & 0 & \cdots & 0 & 0 \\

0 & 0 & 0 & \cdots & 0 & 0 \\

\end{bmatrix},

\]

where each of these matrices is understood to be a square matrix of size \(n \times n\). The zero matrix \(0_{m \times n}\) is analogously defined for any \(m, n \in \mathbb{Z}_{+}\) and has size \(m \times n\). I.e.,

\[

0_{m \times n} =

\left.

\begin{bmatrix}

0 & 0 & 0 & \cdots & 0 & 0 \\

0 & 0 & 0 & \cdots & 0 & 0 \\

0 & 0 & 0 & \cdots & 0 & 0 \\

\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\

0 & 0 & 0 & \cdots & 0 & 0 \\

0 & 0 & 0 & \cdots & 0 & 0 \\

\end{bmatrix}

\right\}

m \mbox{ rows}

\hspace{-5.675cm}

\underbrace{

\phantom{

\begin{bmatrix}

a & a & a & a_{a_{a}} & a_{a} \\

a & a & a\\

a & a & a\\

a & a & a\\

a & a & a\\

a & a & a\\

a & a & a\\

\end{bmatrix}

}

}_{\textstyle n \mbox{ columns}}

\]

## A.1.2 Using matrices to encode linear systems

Let \(m, n \in \mathbb{Z}_{+}\) be positive integers. Then a **system of** \(m\) **linear equations in** \(n\) **unknowns** \(x_{1}, \ldots, x_{n}\) looks like

\begin{equation}

\label{eqn:GenericLinearSystem}

\left.

\begin{aligned}

a_{1 1}x_{1} + a_{1 2}x_{2} + a_{1 3}x_{3} + \cdots + a_{1 n}x_{n} & = b_{1} \\

a_{2 1}x_{1} + a_{2 2}x_{2} + a_{2 3}x_{3} + \cdots + a_{2 n}x_{n} & = b_{2} \\

a_{3 1}x_{1} + a_{3 2}x_{2} + a_{3 3}x_{3} + \cdots + a_{3 n}x_{n} & = b_{3} \\

& \ \,\vdots \\

a_{m 1}x_{1} + a_{m 2}x_{2} + a_{m 3}x_{3} + \cdots + a_{m n}x_{n} & = b_{m}

\end{aligned}

\right\}, \tag{A.1.1}

\end{equation}

where each \(a_{i j}, b_{i} \in \mathbb{F}\) is a scalar for \(i = 1, 2, \ldots, m\) and \(j = 1, 2, \ldots, n\). In other words, each scalar \(b_{1}, \ldots, b_{m} \in \mathbb{F}\) is being written as a linear combination of the unknowns \(x_{1}, \ldots, x_{n}\) using coefficients from the field \(\mathbb{F}\). To **solve** System (A.1.1) means to describe the set of all possible values for \(x_{1}, \ldots, x_{n}\) (when thought of as scalars in \(\mathbb{F}\)) such that each of the \(m\) equations in System (A.1.1) is satisfied simultaneously.

Rather than dealing directly with a given linear system, it is often convenient to first encode the system using less cumbersome notation. Specifically, System (A.1.1) can be summarized using exactly three matrices. First, we collect the coefficients from each equation into the \(m \times n\) matrix \(A = (a_{i j}) \in \mathbb{F}^{m \times n}\), which we call the **coefficient matrix** for the linear system. Similarly, we assemble the unknowns \(x_{1}, x_{2}, \ldots, x_{n}\) into an \(n \times 1\) column vector \(x = (x_{i}) \in \mathbb{F}^{n}\), and the right-hand sides \(b_{1}, b_{2}, \ldots, b_{m}\) of the equation are used to form an \(m \times 1\) column vector \(b = (b_{i}) \in \mathbb{F}^{m}\). In other words,

\[

A =

\begin{bmatrix}

a_{1 1} & a_{1 2} & \cdots & a_{1 n} \\

a_{2 1} & a_{2 2} & \cdots & a_{2 n} \\

\vdots & \vdots & \ddots & \vdots \\

a_{m 1} & a_{m 2} & \cdots & a_{m n}

\end{bmatrix}, \ \

x =

\begin{bmatrix}

x_{1}\\

x_{2}\\

\vdots\\

x_{n}

\end{bmatrix},

\text{ and }

b =

\begin{bmatrix}

b_{1}\\

b_{2}\\

\vdots\\

b_{m}

\end{bmatrix}.

\]

Then the left-hand side of the \(i^{\text{th}}\) equation in System (A.1.1) can be recovered by taking the dot product (a.k.a.** Euclidean inner product**) of \(x\) with the \(i^{\text{th}}\) row in \(A\):

\[

\begin{bmatrix}

a_{i 1} & a_{i 2} & \cdots & a_{i n}

\end{bmatrix}

\cdot x

=

\sum_{j = 1}^{n} a_{i j}x_{j}

=

a_{i 1}x_{1} + a_{i 2}x_{2} + a_{i 3}x_{3} + \cdots + a_{i n}x_{n}.

\]

In general, we can extend the dot product between two vectors in order to form the product of any two matrices (as in Section A.2.2). For the purposes of this section, though, it suffices to simply define the product of the matrix \(A \in \mathbb{F}^{m \times n}\) and the vector \(x \in \mathbb{F}^{n}\) to be

\begin{equation}

\label{eqn:MatrixVectorProduct}

Ax =

\begin{bmatrix}

a_{1 1} & a_{1 2} & \cdots & a_{1 n} \\

a_{2 1} & a_{2 2} & \cdots & a_{2 n} \\

\vdots & \vdots & \ddots & \vdots \\

a_{m 1} & a_{m 2} & \cdots & a_{m n}

\end{bmatrix}

\begin{bmatrix}

x_{1}\\

x_{2}\\

\vdots\\

x_{n}

\end{bmatrix}

=

\begin{bmatrix}

a_{1 1}x_{1} + a_{1 2}x_{2} + \cdots + a_{1 n}x_{n} \\

a_{2 1}x_{1} + a_{2 2}x_{2} + \cdots + a_{2 n}x_{n} \\

\vdots\\

a_{m 1}x_{1} + a_{m 2}x_{2} + \cdots + a_{m n}x_{n} \\

\end{bmatrix}. \tag{A.1.2}

\end{equation}

Then, since each entry in the resulting \(m \times 1\) column vector \(Ax \in \mathbb{F}^{m}\) corresponds exactly to the left-hand side of each equation in System (A.1.1), we have effectively encoded System (A.1.1) as the single **matrix equation**

\begin{equation}

Ax =

\begin{bmatrix}

a_{1 1}x_{1} + a_{1 2}x_{2} + \cdots + a_{1 n}x_{n} \\

a_{2 1}x_{1} + a_{2 2}x_{2} + \cdots + a_{2 n}x_{n} \\

\vdots\\

a_{m 1}x_{1} + a_{m 2}x_{2} + \cdots + a_{m n}x_{n} \\

\end{bmatrix}

=

\begin{bmatrix}

b_{1}\\

\vdots\\

b_{m}

\end{bmatrix}

= b. \tag{A.1.3}

\end{equation}

**Example A.1.4.** The linear system

\[

\left.

\begin{array}{rrrrrrrrrrrr}

x_{1} & + & 6 x_{2} & ~ & ~ & ~ & + & 4 x_{5} & - & 2 x_{6} & = & 14 \\

~ & ~ & ~ & ~ & x_{3} & ~ & + & 3 x_{5} & + & x_{6} & = & -3 \\

~ & ~ & ~ & ~ & ~ & x_{4} & + & 5 x_{5} & + & 2 x_{6} & = & 11

\end{array}

\right\}.

\]

has three equations and involves the six variables \(x_{1}, x_{2}, \ldots, x_{6}\). One can check that possible solutions to this system include

\[

\begin{bmatrix}

x_{1} \\

x_{2} \\

x_{3} \\

x_{4} \\

x_{6} \\

x_{6}

\end{bmatrix}

=

\begin{bmatrix}

14 \\

0 \\

-3 \\

11 \\

0 \\

0

\end{bmatrix}

\text{ and }

\begin{bmatrix}

x_{1} \\

x_{2} \\

x_{3} \\

x_{4} \\

x_{6} \\

x_{6}

\end{bmatrix}

=

\begin{bmatrix}

6 \\

1 \\

-9 \\

-5 \\

2 \\

3

\end{bmatrix}.

\]

Note that, in describing these solutions, we have used the six unknowns \(x_{1}, x_{2}, \ldots, x_{6}\) to form the \(6 \times 1\) column vector \(x = (x_{i}) \in \mathbb{F}^{6}\). We can similarly form the coefficient matrix \(A \in \mathbb{F}^{3 \times 6}\) and the \(3 \times 1\) column vector \(b \in \mathbb{F}^{3}\), where

\[ A =

\begin{bmatrix}

1 & 6 & 0 & 0 & 4 & -2 \\

0 & 0 & 1 & 0 & 3 & 1 \\

0 & 0 & 0 & 1 & 5 & 2

\end{bmatrix}

\text{ and }

\begin{bmatrix}

b_{1} \\

b_{2} \\

b_{3}

\end{bmatrix}

=

\begin{bmatrix}

14 \\

-3 \\

11

\end{bmatrix}.

\]

You should check that, given these matrices, each of the solutions given above satisfies Equation (A.1.3).

We close this section by mentioning another common conventions for encoding linear systems. Specifically, rather than attempt to solve Equation (A.1.1) directly, one can instead look at the equivalent problem of describing all coefficients \(x_{1}, \ldots, x_{n} \in \mathbb{F}\) for which the following **vector equation** is satisfied:

\begin{equation}

\label{eqn:GenericVectorSystem}

x_{1}

\begin{bmatrix}

a_{1 1} \\

a_{2 1} \\

a_{3 1} \\

\vdots \\

a_{m 1} \\

\end{bmatrix}

+ x_{2}

\begin{bmatrix}

a_{1 2} \\

a_{2 2} \\

a_{3 2} \\

\vdots \\

a_{m 2} \\

\end{bmatrix}

+ x_{3}

\begin{bmatrix}

a_{1 3} \\

a_{2 3} \\

a_{3 3} \\

\vdots \\

a_{m 3} \\

\end{bmatrix}

+ \cdots + x_{n}

\begin{bmatrix}

a_{1 n} \\

a_{2 n} \\

a_{3 n} \\

\vdots \\

a_{m n} \\

\end{bmatrix}

=

\begin{bmatrix}

b_{1} \\

b_{2} \\

b_{3} \\

\vdots\\

b_{m} \\

\end{bmatrix}. \tag{A.1.4}

\end{equation}

This approach emphasizes analysis of the so-called **column vectors** \(A^{(\cdot, j)}\) \(j = 1, \ldots, n \) of the coefficient matrix \(A\) in the matrix equation \(A x = b\). (See in Section A.2.1 for more details about how Equation (A.1.4). Conversely, it is also common to directly encounter Equation (A.1.4) when studying certain questions about vectors in \(\mathbb{F}^{n}\).

It is important to note that System (A.1.1) differs from Equations (A.1.3) and (A.1.4) only in terms of notation. The common aspect of these different representations is that the left-hand side of each equation in System (A.1.1) is a linear sum. Because of this, it is also common to rewrite System (A.1.1) using more compact notation such as

\[

\sum_{k = 1}^{n}a_{1 k}x_{k} = b_{1},\

\sum_{k = 1}^{n}a_{2 k}x_{k} = b_{2},\

\sum_{k = 1}^{n}a_{3 k}x_{k} = b_{3},\

\ldots,\

\sum_{k = 1}^{n}a_{m k}x_{k} = b_{m}.

\]

## Contributors

- Isaiah Lankham, Mathematics Department at UC Davis
- Bruno Nachtergaele, Mathematics Department at UC Davis
- Anne Schilling, Mathematics Department at UC Davis

Both hardbound and softbound versions of this textbook are available online at WorldScientific.com.