Mathematics LibreTexts

12.1: From linear systems to matrix equations

We begin this section by reviewing the definition of and notation for matrices. We then review several different conventions for denoting and studying systems of linear equations, the most fundamental being as a single matrix equation. This point of view has a long history of exploration, and numerous computational devices — including several computer programming languages — have been developed and optimized specifically for analyzing matrix equations.

A.1.1 Definition of and notation for matrices

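Let \(m, n \in \mathbb{Z}_+\) be positive integers, and, as usual, let \(\mathbb{F}\) denote either \(\mathbb{R}\) or \(\mathbb{C}\). Then we begin by defining an \(m \times n\) matrix \(A\) to be a rectangular array of numbers

\[
A = (a_{ij})_{i,j=1}^{m,n} = \bigl(A^{(i,j)}\bigr)_{i,j=1}^{m,n} =
\left.\underbrace{\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}}_{n \text{ numbers}}\right\} m \text{ numbers}
\]

where each element \(a_{ij} \in \mathbb{F}\) in the array is called an entry of \(A\) (specifically, \(a_{ij}\) is called the "\(i, j\) entry"). We say that \(i\) indexes the rows of \(A\) as it ranges over the set \(\{1, \ldots, m\}\) and that \(j\) indexes the columns of \(A\) as it ranges over the set \(\{1, \ldots, n\}\). We also say that the matrix \(A\) has size \(m \times n\) and note that it is a (finite) sequence of doubly-subscripted numbers for which the two subscripts in no way depend upon each other.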

Definition A.1.1. Given positive integers \(m, n \in \mathbb{Z}_+\), we use \(\mathbb{F}^{m \times n}\) to denote the set of all \(m \times n\) matrices having entries over \(\mathbb{F}\).

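Example A.1.2. The matrix \(A = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 3 & i \end{bmatrix} \in \mathbb{C}^{2 \times 3}\), but \(A \notin \mathbb{R}^{2 \times 3}\) since the "2,3" entry of \(A\) is not in \(\mathbb{R}\).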
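As a rough illustration of Definition A.1.1 and Example A.1.2, a matrix can be modeled in Python as a list of rows. This is only a sketch under stated assumptions: the helper names `shape` and `has_real_entries` are hypothetical (they are not part of these notes), the entries are copied as they are displayed in Example A.1.2, and Python's `1j` stands in for \(i\).

```python
# Toy model: a matrix is a list of rows, each row a list of numbers.
# The helper names below are hypothetical, chosen only for this sketch.

def shape(matrix):
    """Return the size (m, n) of a rectangular list-of-rows matrix."""
    m = len(matrix)
    n = len(matrix[0])
    if any(len(row) != n for row in matrix):
        raise ValueError("all rows must have the same length")
    return m, n


def has_real_entries(matrix):
    """True if every entry has zero imaginary part, i.e. lies in R."""
    return all(complex(entry).imag == 0 for row in matrix for entry in row)


# Entries as displayed in Example A.1.2; 1j is Python's imaginary unit.
A = [[1, 0, 2],
     [1, 3, 1j]]

print(shape(A))             # size of A as a pair (m, n)
print(has_real_entries(A))  # whether A could belong to R^{2x3}
print(A[2 - 1][3 - 1])      # the "2,3" entry; Python indexes from 0
```

The subtraction of 1 in the last line is a design choice that keeps the mathematical (1-based) indices visible next to Python's 0-based ones.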

Given the ubiquity of matrices in both abstract and applied mathematics, a rich vocabulary has been developed for describing various properties and features of matrices. In addition, there is also a rich set of equivalent notations. For the purposes of these notes, we will use the above notation unless the size of the matrix is understood from context or is unimportant. In this case, we will drop much of this notation and denote a matrix simply as

\[
A = (a_{ij}) \quad \text{or} \quad A = (a_{ij})_{m \times n}.
\]

To get a sense of the essential vocabulary, suppose that we have an \(m \times n\) matrix \(A = (a_{ij})\) with \(m = n\). Then we call \(A\) a square matrix. The elements \(a_{11}, a_{22}, \ldots, a_{nn}\) in a square matrix form the main diagonal of \(A\), and the elements \(a_{1n}, a_{2,n-1}, \ldots, a_{n1}\) form what is sometimes called the skew main diagonal of \(A\). Entries not on the main diagonal are also often called off-diagonal entries, and a matrix whose off-diagonal entries are all zero is called a diagonal matrix. It is common to call \(a_{12}, a_{23}, \ldots, a_{n-1,n}\) the superdiagonal of \(A\) and \(a_{21}, a_{32}, \ldots, a_{n,n-1}\) the subdiagonal of \(A\). The motivation for this terminology should be clear if you create a sample square matrix and trace the entries within these particular subsequences of the matrix.

Square matrices are important because they are fundamental to applications of Linear Algebra. In particular, virtually every use of Linear Algebra either involves square matrices directly or employs them in some indirect manner. In addition, virtually every usage also involves the notion of vector, where here we mean either an \(m \times 1\) matrix (a.k.a. a column vector) or a \(1 \times n\) matrix (a.k.a. a row vector).

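Example A.1.3. Suppose that \(A = (a_{ij})\), \(B = (b_{ij})\), \(C = (c_{ij})\), \(D = (d_{ij})\), and \(E = (e_{ij})\) are the following matrices over \(\mathbb{F}\):

\[
A = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 1 \\ 0 & 2 \end{bmatrix}, \quad
C = \begin{bmatrix} 1, & 4, & 2 \end{bmatrix}, \quad
D = \begin{bmatrix} 1 & 5 & 2 \\ 1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad
E = \begin{bmatrix} 6 & 1 & 3 \\ 1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}.
\]

Then we say that \(A\) is a \(3 \times 1\) matrix (a.k.a. a column vector), \(B\) is a \(2 \times 2\) square matrix, \(C\) is a \(1 \times 3\) matrix (a.k.a. a row vector), and both \(D\) and \(E\) are square \(3 \times 3\) matrices. Moreover, only \(B\) is an upper-triangular matrix (as defined below), and none of the matrices in this example are diagonal matrices.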

We can discuss individual entries in each matrix. E.g.,

  1. the 2nd row of \(D\) is \(d_{21} = 1\), \(d_{22} = 0\), and \(d_{23} = 1\).
  2. the main diagonal of \(D\) is the sequence \(d_{11} = 1, d_{22} = 0, d_{33} = 4\).
  3. the skew main diagonal of \(D\) is the sequence \(d_{13} = 2, d_{22} = 0, d_{31} = 3\).
  4. the off-diagonal entries of \(D\) are (by row) \(d_{12}\), \(d_{13}\), \(d_{21}\), \(d_{23}\), \(d_{31}\), and \(d_{32}\).
  5. the 2nd column of \(E\) is \(e_{12} = e_{22} = e_{32} = 1\).
  6. the superdiagonal of \(E\) is the sequence \(e_{12} = 1, e_{23} = 2\).
  7. the subdiagonal of \(E\) is the sequence \(e_{21} = 1, e_{32} = 1\).
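The index patterns behind these named subsequences can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, matrices are modeled as lists of rows, and the entries of \(D\) and \(E\) are copied as they are printed in this example.

```python
# Hypothetical helpers that extract the named subsequences of a square
# matrix stored as a list of rows. Python indices start at 0, so the
# mathematical entry a_{ij} is a[i - 1][j - 1].

def main_diagonal(a):
    return [a[i][i] for i in range(len(a))]


def skew_main_diagonal(a):
    n = len(a)
    return [a[i][n - 1 - i] for i in range(n)]


def superdiagonal(a):
    return [a[i][i + 1] for i in range(len(a) - 1)]


def subdiagonal(a):
    return [a[i + 1][i] for i in range(len(a) - 1)]


def off_diagonal_positions(n):
    """1-based index pairs (i, j) with i != j, listed row by row."""
    return [(i, j) for i in range(1, n + 1)
            for j in range(1, n + 1) if i != j]


# Entries as printed in Example A.1.3.
D = [[1, 5, 2], [1, 0, 1], [3, 2, 4]]
E = [[6, 1, 3], [1, 1, 2], [4, 1, 3]]

print(main_diagonal(D))           # [1, 0, 4]
print(skew_main_diagonal(D))      # [2, 0, 3]
print(off_diagonal_positions(3))  # the six off-diagonal positions
print(superdiagonal(E))           # [1, 2]
print(subdiagonal(E))             # [1, 1]
```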

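A square matrix \(A = (a_{ij}) \in \mathbb{F}^{n \times n}\) is called upper triangular (resp. lower triangular) if \(a_{ij} = 0\) for each pair of integers \(i, j \in \{1, \ldots, n\}\) such that \(i > j\) (resp. \(i < j\)). In other words, \(A\) is triangular if it has the form

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{bmatrix}
\quad \text{or} \quad
\begin{bmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
a_{21} & a_{22} & 0 & \cdots & 0 \\
a_{31} & a_{32} & a_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{bmatrix}.
\]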

Note that a diagonal matrix is simultaneously both an upper triangular matrix and a lower triangular matrix.
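The triangularity conditions translate directly into index tests. The following is a minimal sketch, assuming square matrices stored as lists of rows; the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical predicates for square matrices stored as lists of rows.
# The conditions mirror the definition: a_{ij} = 0 whenever i > j
# (upper triangular) or whenever i < j (lower triangular).

def is_upper_triangular(a):
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i > j)


def is_lower_triangular(a):
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i < j)


def is_diagonal(a):
    # A diagonal matrix is both upper and lower triangular.
    return is_upper_triangular(a) and is_lower_triangular(a)


upper = [[4, 1],
         [0, 2]]   # only the entry below the main diagonal is zero
diag = [[2, 0],
        [0, 5]]    # every off-diagonal entry is zero

print(is_upper_triangular(upper), is_lower_triangular(upper))  # True False
print(is_diagonal(diag))                                       # True
```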

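Two particularly important examples of diagonal matrices are defined as follows: Given any positive integer \(n \in \mathbb{Z}_+\), we can construct the identity matrix \(I_n\) and the zero matrix \(0_{n \times n}\) by setting

\[
I_n = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\quad \text{and} \quad
0_{n \times n} = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{bmatrix},
\]

where each of these matrices is understood to be a square matrix of size \(n \times n\). The zero matrix \(0_{m \times n}\) is analogously defined for any \(m, n \in \mathbb{Z}_+\) and has size \(m \times n\). I.e.,

\[
0_{m \times n} =
\left.\underbrace{\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{bmatrix}}_{n \text{ columns}}\right\} m \text{ rows}
\]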
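For concreteness, the identity and zero matrices can be built entry by entry from their definitions. The sketch below assumes the list-of-rows model of a matrix; the names `identity` and `zeros` are hypothetical helpers, not functions from any particular library.

```python
# Hypothetical constructors for the identity and zero matrices,
# with a matrix stored as a list of rows.

def identity(n):
    """I_n: ones on the main diagonal, zeros everywhere else."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def zeros(m, n):
    """0_{m x n}: m rows and n columns of zeros."""
    # A fresh list is created for each row so that rows are independent.
    return [[0] * n for _ in range(m)]


print(identity(3))   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(zeros(2, 3))   # [[0, 0, 0], [0, 0, 0]]
```

Building each row separately (rather than repeating one row object) is a deliberate choice: changing an entry in one row then leaves the other rows untouched.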

A.1.2 Using matrices to encode linear systems

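Let \(m, n \in \mathbb{Z}_+\) be positive integers. Then a system of \(m\) linear equations in \(n\) unknowns \(x_1, \ldots, x_n\) looks like

\[
\left.
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\right\}, \tag{A.1.1}
\]

where each \(a_{ij}, b_i \in \mathbb{F}\) is a scalar for \(i = 1, 2, \ldots, m\) and \(j = 1, 2, \ldots, n\). In other words, each scalar \(b_1, \ldots, b_m \in \mathbb{F}\) is being written as a linear combination of the unknowns \(x_1, \ldots, x_n\) using coefficients from the field \(\mathbb{F}\). To solve System (A.1.1) means to describe the set of all possible values for \(x_1, \ldots, x_n\) (when thought of as scalars in \(\mathbb{F}\)) such that each of the \(m\) equations in System (A.1.1) is satisfied simultaneously.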

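Rather than dealing directly with a given linear system, it is often convenient to first encode the system using less cumbersome notation. Specifically, System (A.1.1) can be summarized using exactly three matrices. First, we collect the coefficients from each equation into the \(m \times n\) matrix \(A = (a_{ij}) \in \mathbb{F}^{m \times n}\), which we call the coefficient matrix for the linear system. Similarly, we assemble the unknowns \(x_1, x_2, \ldots, x_n\) into an \(n \times 1\) column vector \(x = (x_i) \in \mathbb{F}^n\), and the right-hand sides \(b_1, b_2, \ldots, b_m\) of the equations are used to form an \(m \times 1\) column vector \(b = (b_i) \in \mathbb{F}^m\). In other words,

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.
\]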

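Then the left-hand side of the \(i\)th equation in System (A.1.1) can be recovered by taking the dot product (a.k.a. Euclidean inner product) of \(x\) with the \(i\)th row in \(A\):

\[
\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix} \cdot x
= \sum_{j=1}^{n} a_{ij}x_j
= a_{i1}x_1 + a_{i2}x_2 + a_{i3}x_3 + \cdots + a_{in}x_n.
\]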

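In general, we can extend the dot product between two vectors in order to form the product of any two matrices (as in Section A.2.2). For the purposes of this section, though, it suffices to simply define the product of the matrix \(A \in \mathbb{F}^{m \times n}\) and the vector \(x \in \mathbb{F}^n\) to be

\[
Ax = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}.
\]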

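Then, since each entry in the resulting \(m \times 1\) column vector \(Ax \in \mathbb{F}^m\) corresponds exactly to the left-hand side of each equation in System (A.1.1), we have effectively encoded System (A.1.1) as the single matrix equation

\[
Ax = \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = b. \tag{A.1.3}
\]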
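The row-by-row definition of the product \(Ax\) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `dot`, `mat_vec`, and `is_solution` are hypothetical helper names, matrices are lists of rows, and the small \(2 \times 3\) system is made up for the example rather than taken from these notes.

```python
# Hypothetical helpers: the i-th entry of Ax is the dot product of the
# i-th row of A with x, exactly as in the definition of Ax.

def dot(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(a, x):
    return [dot(row, x) for row in a]


def is_solution(a, x, b):
    """True if x satisfies the matrix equation Ax = b."""
    return mat_vec(a, x) == b


# A made-up system of m = 2 equations in n = 3 unknowns.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]

print(mat_vec(A, x))               # [-2, -2]
print(is_solution(A, x, [-2, -2])) # True
```

Exact equality is used in `is_solution` because the entries here are integers; with floating-point entries a tolerance-based comparison would be the safer choice.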

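Example A.1.4. The linear system

\[
\left.
\begin{aligned}
x_1 + 6x_2 + 4x_5 - 2x_6 &= 14 \\
x_3 + 3x_5 + x_6 &= 3 \\
x_4 + 5x_5 + 2x_6 &= 11
\end{aligned}
\right\}
\]

has three equations and involves the six variables \(x_1, x_2, \ldots, x_6\). One can check that possible solutions to this system include

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 14 \\ 0 \\ 3 \\ 11 \\ 0 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 6 \\ 1 \\ -6 \\ -5 \\ 2 \\ 3 \end{bmatrix}.
\]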

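Note that, in describing these solutions, we have used the six unknowns \(x_1, x_2, \ldots, x_6\) to form the \(6 \times 1\) column vector \(x = (x_i) \in \mathbb{F}^6\). We can similarly form the coefficient matrix \(A \in \mathbb{F}^{3 \times 6}\) and the \(3 \times 1\) column vector \(b \in \mathbb{F}^3\), where

\[
A = \begin{bmatrix}
1 & 6 & 0 & 0 & 4 & -2 \\
0 & 0 & 1 & 0 & 3 & 1 \\
0 & 0 & 0 & 1 & 5 & 2
\end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
= \begin{bmatrix} 14 \\ 3 \\ 11 \end{bmatrix}.
\]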

You should check that, given these matrices, each of the solutions given above satisfies Equation (A.1.3).
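The suggested check can also be carried out mechanically. The sketch below rests on assumptions that should be read carefully: the helper `mat_vec` is hypothetical, and the signs of the entries are a reconstruction. Specifically, the coefficient of \(x_6\) in the first equation is taken to be \(-2\) and the fourth entry of the second vector to be \(-5\), since these are the choices consistent with the stated right-hand sides; the right-hand side of the second equation is taken to be \(3\), and the third entry of the second vector is taken to be \(-6\), the value that the second equation then forces once \(x_5 = 2\) and \(x_6 = 3\).

```python
# Hypothetical helper: row-by-row product of a matrix (list of rows)
# with a vector (list of numbers).

def mat_vec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


# Coefficient matrix and right-hand side for Example A.1.4.
# ASSUMPTION: signs are reconstructed as described in the text above.
A = [[1, 6, 0, 0, 4, -2],
     [0, 0, 1, 0, 3, 1],
     [0, 0, 0, 1, 5, 2]]
b = [14, 3, 11]

# Two candidate solutions (second one under the stated assumptions).
first = [14, 0, 3, 11, 0, 0]
second = [6, 1, -6, -5, 2, 3]

for candidate in (first, second):
    # Each candidate solves the system exactly when A * candidate == b.
    print(candidate, mat_vec(A, candidate) == b)
```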

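We close this section by mentioning another common convention for encoding linear systems. Specifically, rather than attempt to solve Equation (A.1.1) directly, one can instead look at the equivalent problem of describing all coefficients \(x_1, \ldots, x_n \in \mathbb{F}\) for which the following vector equation is satisfied:

\[
x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \\ \vdots \\ a_{m2} \end{bmatrix}
+ x_3 \begin{bmatrix} a_{13} \\ a_{23} \\ a_{33} \\ \vdots \\ a_{m3} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ a_{3n} \\ \vdots \\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{bmatrix}. \tag{A.1.4}
\]

This approach emphasizes analysis of the so-called column vectors \(A^{(\cdot, j)}\) (\(j = 1, \ldots, n\)) of the coefficient matrix \(A\) in the matrix equation \(Ax = b\). (See Section A.2.1 for more details about how Equation (A.1.4) is formed.) Conversely, it is also common to directly encounter Equation (A.1.4) when studying certain questions about vectors in \(\mathbb{F}^n\).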
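To see that the column point of view produces the same vector as the row-by-row product, one can compare the two computations directly. The sketch below is illustrative only: `column`, `combine_columns`, and `mat_vec` are hypothetical helper names, and the \(3 \times 2\) matrix is made up for the example.

```python
# Hypothetical helpers comparing two ways of computing Ax for a matrix
# stored as a list of rows.

def column(a, j):
    """The j-th column of a (0-based j), as a list."""
    return [row[j] for row in a]


def combine_columns(a, x):
    """x_1 * (column 1) + ... + x_n * (column n), built up term by term."""
    result = [0] * len(a)
    for j, xj in enumerate(x):
        result = [r + xj * c for r, c in zip(result, column(a, j))]
    return result


def mat_vec(a, x):
    """Row-by-row product: entry i is the dot product of row i with x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


# A made-up 3 x 2 matrix and a vector of coefficients.
A = [[1, 2],
     [3, 4],
     [5, 6]]
x = [2, -1]

print(combine_columns(A, x))                   # [0, 2, 4]
print(combine_columns(A, x) == mat_vec(A, x))  # True
```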

It is important to note that System (A.1.1) differs from Equations (A.1.3) and (A.1.4) only in terms of notation. The common aspect of these different representations is that the left-hand side of each equation in System (A.1.1) is a linear sum. Because of this, it is also common to rewrite System (A.1.1) using more compact notation such as

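\[
\sum_{k=1}^{n} a_{1k}x_k = b_1, \quad
\sum_{k=1}^{n} a_{2k}x_k = b_2, \quad
\sum_{k=1}^{n} a_{3k}x_k = b_3, \quad \ldots, \quad
\sum_{k=1}^{n} a_{mk}x_k = b_m.
\]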


This page titled 12.1: From linear systems to matrix equations is shared under a not declared license and was authored, remixed, and/or curated by Isaiah Lankham, Bruno Nachtergaele, & Anne Schilling.
