3.2 Matrix Operations
Introduction
In Chapter 2 matrices were introduced to represent systems of linear equations. The coefficients of a linear system were collected in the coefficient matrix \(A \), and the system as a whole could be condensed into the augmented matrix. In the section on linear transformations we used matrices to construct linear transformations. In this chapter we will study matrices as entities in their own right, though every now and then we will keep in mind their role in the two contexts just mentioned.
Sum, Scalar Multiple and Transpose
In this section we will define the sum and the product of two matrices, the multiplication of a matrix by a scalar, and the transpose of a matrix. Recall that an \(m\times n \) matrix has \(m \) (horizontal) rows of size \(n \) or, equivalently, \(n \) (vertical) columns of size \(m \).
Two matrices are said to have the same size if they have the same number of rows and the same number of columns. Two matrices \(A \) and \(B \) are equal if they have the same size, say \(m \) rows and \(n \) columns, and all the corresponding entries are equal, i.e. \[ a_{ij} = b_{ij}, \text{ for }i = 1,\ldots,m, j = 1,\ldots,n. \nonumber\]
A zero matrix \(O \) is a matrix with all entries equal to 0. If the context requires clarity as to its size it may be denoted by \(O_{mn} \).
If \(A \) is an \(m\times n \) matrix and \(c \) is a scalar, then \(cA \) is the \(m \times n \) matrix that is the result of multiplying each entry of \(A \) by \(c \): \[ c \left[\begin{array}{cccc} a_{11} & a_{12}& \ldots& a_{1n} \\ a_{21} & a_{22}& \ldots& a_{2n} \\ \vdots & \vdots& \cdots& \vdots \\ a_{m1} & a_{m2}& \ldots& a_{mn} \end{array} \right] = \left[\begin{array}{cccc} ca_{11} & ca_{12}& \ldots& ca_{1n} \\ ca_{21} & ca_{22}& \ldots& ca_{2n} \\ \vdots & \vdots& \cdots& \vdots \\ ca_{m1} & ca_{m2}& \ldots& ca_{mn} \end{array} \right]. \nonumber\]
If \(A \) and \(B \) are two \(m\times n \) matrices then the sum \(A+B \) is the \(m\times n \) matrix whose entry in position \((i,j) \) is the sum of the corresponding entries of \(A \) and \(B \): \[ \left[\begin{array}{cccc} a_{11} & a_{12}& \ldots& a_{1n} \\ a_{21} & a_{22}& \ldots& a_{2n} \\ \vdots & \vdots& \cdots& \vdots \\ a_{m1} & a_{m2}& \ldots& a_{mn} \end{array} \right] + \left[\begin{array}{cccc} b_{11} & b_{12}& \ldots& b_{1n} \\ b_{21} & b_{22}& \ldots& b_{2n} \\ \vdots & \vdots& \cdots& \vdots \\ b_{m1} & b_{m2}& \ldots& b_{mn} \end{array} \right] = \left[\begin{array}{cccc} a_{11}+b_{11} & a_{12}+b_{12}& \ldots& a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22}& \ldots& a_{2n}+b_{2n} \\ \vdots & \vdots& \cdots& \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2}& \ldots& a_{mn}+b_{mn} \end{array} \right]. \nonumber\] If \(A \) and \(B \) are not of the same size their sum is not defined.
\[ \left[\begin{array}{rr} 1 & 3 \\ 5 & 2 \\ 6 & -4 \end{array}\right] + \left[\begin{array}{rr} 3 & 2 \\ 4 & -5 \\ 2 & 5 \end{array}\right] = \left[\begin{array}{rr} 4 & 5 \\ 9 & -3 \\ 8 & 1 \end{array}\right], \nonumber\] \[ \left[\begin{array}{rr} 1 & 3 \\ 5 & 2 \\ 6 & -4 \end{array}\right] + \left[\begin{array}{rr} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{array}\right] = \left[\begin{array}{rr} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{array}\right] + \left[\begin{array}{rr} 1 & 3 \\ 5 & 2 \\ 6 & -4 \end{array}\right] = \left[\begin{array}{rr} 1 & 3 \\ 5 & 2 \\ 6 & -4 \end{array}\right], \nonumber\] \[ \begin{array}{lcl} \left[\begin{array}{rrr} 1 & 3 & 5 \\ 2 & 4 & 1 \end{array}\right] + (-1) \left[\begin{array}{rrr} 1 & 3 & 5 \\ 2 & 4 & 1 \end{array}\right] &=& \left[\begin{array}{rrr} 1 & 3 & 5 \\ 2 & 4 & 1 \end{array}\right] + \left[\begin{array}{rrr} -1 & -3 &-5 \\ -2 & -4 & -1 \end{array}\right] \\ &=& \left[\begin{array}{rrr} 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right]. \end{array} \nonumber\]
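The componentwise definitions translate directly into array arithmetic. Below is a minimal sketch, assuming Python with NumPy (a choice of tool, not something the text prescribes), that reproduces the first and third computation above:

```python
import numpy as np

A = np.array([[1, 3], [5, 2], [6, -4]])
B = np.array([[3, 2], [4, -5], [2, 5]])

# Componentwise sum: entry (i, j) of A + B is a_ij + b_ij.
print(A + B)            # [[ 4  5] [ 9 -3] [ 8  1]]

# Scalar multiple: every entry is multiplied by the scalar.
C = np.array([[1, 3, 5], [2, 4, 1]])
print(C + (-1) * C)     # the 2x3 zero matrix
```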
The multiple \((-1)A \) is also written as \(-A \). An obvious property, illustrated in the third example, is: \[ A + (-A) = O, \nonumber\] where \(O \) is the zero matrix.
\[ \left[\begin{array}{rr} 1 & 3 \\ 5 & 2 \\ 6 & -4 \end{array}\right] + \left[\begin{array}{rrr} 1 & 3 & 5 \\ 2 & 4 & 1 \end{array}\right] \nonumber\] is not defined, because the matrices do not have the same size.
The two definitions of sum and scalar multiple are called componentwise definitions. They are completely analogous to the definitions of the scalar multiple of a vector and the sum of two vectors. Hence it is not surprising that they obey exactly the same rules, as is summarized in the next proposition (cf. the section on vectors).
Suppose \(A, B \) and \(C \) are \(m\times n \) matrices and let \(c_{1},c_{2} \) be two real numbers. Then we have:
- \(A+O_{mn}=A=O_{mn}+A \)
- \((A+B)+C=A+(B+C) \)
- \(A+B=B+A \)
- \(A+(-A)=O \)
- \(1A=A \)
- \(c_{1}(A+B)=c_{1}A+c_{1}B \)
- \((c_{1}+c_{2})A=c_{1}A+c_{2}A \)
- \(c_{1}(c_{2}A)=(c_{1}c_{2})A \)
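These rules are easy to spot-check numerically. The following is a minimal sketch, again assuming Python with NumPy, verifying commutativity of the sum and the rule \((c_1+c_2)A = c_1A+c_2A \) for one instance:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(2, 3))
c1, c2 = 3, -2

print(np.array_equal(A + B, B + A))                    # True: A+B = B+A
print(np.array_equal((c1 + c2) * A, c1 * A + c2 * A))  # True: (c1+c2)A = c1A + c2A
```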
An operation whose usefulness is not immediately clear, but which fits well with the other matrix operations in this section, is the following:
The transpose of an \(m \times n \) matrix \(A \) with entries \(a_{ij} \) is the \(n \times m \) matrix \(B \) with entries \(b_{ij} \) defined by \( b_{ij} = a_{ji} \). It is denoted by \(B = A^T \).
\[ \left[\begin{array}{rr} 1 & 3 \\ 5 & 2 \\ 6 & 4 \end{array}\right]^T = \left[\begin{array}{rrr} 1 & 5 & 6 \\ 3 & 2 & 4 \end{array}\right] \quad \text{and} \quad \left[\begin{array}{rrrr} -1 & 2 & -4 & 0\end{array}\right]^T = \left[\begin{array}{r} -1 \\ 2 \\ -4 \\ 0\end{array}\right]. \nonumber\]
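In NumPy (assumed here as the computational tool) the transpose is the `.T` attribute; the sketch below mirrors the two examples:

```python
import numpy as np

A = np.array([[1, 3], [5, 2], [6, 4]])
print(A.T)                       # [[1 5 6] [3 2 4]]

# A row vector transposes to a column vector.
r = np.array([[-1, 2, -4, 0]])   # a 1x4 matrix
print(r.T)                       # the corresponding 4x1 column
```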
The following rules involving the three operators defined so far in this section are easy to prove:
Let \(A \) and \(B \) be \(m\times n \) matrices and \(c \) a scalar. Then we have
- \((cA)^T = c A^T \)
- \((A+B)^T = A^T + B^T \)
- \((A^T)^T = A \).
Proof
We will prove the second statement and leave the other two to the diligent reader (see Exercise 13). So, suppose \(A \) and \(B \) are two \(m \times n \) matrices. Then \(A+B \) is an \(m \times n \) matrix too, hence \((A+B)^T \) is an \(n \times m \) matrix. The matrix \(A^T + B^T \) on the right-hand side of the equation is the sum of two \(n \times m \) matrices, which is again an \(n \times m \) matrix. So the matrices on both sides of the equation have the same size. Next we have to show that they have equal entries on the corresponding positions. If we put \[ E = (A+B)^T \quad \text{and}\quad F = A^T + B^T \nonumber\] we see that \[ e_{ij} = \text{ entry of } (A+B) \text{ on position }(j,i) \nonumber\] and \[ \begin{array}{rl} f_{ij} &= \text{ entry of } A^T \text{ on position }(i,j) + \text{ entry of } B^T \text{ on position }(i,j) \\ &= \text{ entry of } A \text{ on position }(j,i) + \text{ entry of } B \text{ on position }(j,i)\\ &= \text{ entry of } (A+B) \text{ on position }(j,i)\\ &= e_{ij}, \end{array} \nonumber\] so we are done. If you get lost in the forest of indices, have a look at Example 12.
Find \(X \) if \(A + 2X^T + B = C \), where \[ A = \left[\begin{array}{rrr} 1 & 1 & 2 \\ 3 & 1 & 0 \end{array}\right], \quad B = \left[\begin{array}{rrr} 2 & 0 & 3 \\ 2 & 3 & 4 \end{array}\right], \quad \text{and} \quad C = \left[\begin{array}{rrr} 7 & 5 & 1 \\ 1 & 4 & 2 \end{array}\right]. \nonumber\] We will extricate \(X \) step by step: \[ A + 2X^T + B = C \iff 2X^T = C-A-B \iff X^T = \tfrac12(C-A-B). \nonumber\] Next we transpose both sides to find \[ X = \tfrac12(C-A-B)^T = \tfrac12 \left[\begin{array}{rrr} 4 & 4 & -4 \\ -4 & 0 & -2 \end{array}\right]^T = \left[\begin{array}{rr} 2 & -2 \\ 2 & 0 \\ -2 & -1 \end{array}\right]. \nonumber\]
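The same steps can be retraced numerically; a minimal sketch, again assuming NumPy:

```python
import numpy as np

A = np.array([[1, 1, 2], [3, 1, 0]])
B = np.array([[2, 0, 3], [2, 3, 4]])
C = np.array([[7, 5, 1], [1, 4, 2]])

# Isolate X: 2 X^T = C - A - B, so X = ((C - A - B) / 2)^T.
X = ((C - A - B) / 2).T
print(X)                                   # [[ 2. -2.] [ 2.  0.] [-2. -1.]]
# Check the original equation:
print(np.allclose(A + 2 * X.T + B, C))     # True
```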
The product of two matrices
Next we turn our attention to the most important matrix operation, namely the product \(AB \) of two matrices. In the previous chapter we already saw the special case where \(B \) is a matrix of just one column, i.e., \[ B = \mathbf{x} = \left[\begin{array}{r}x_1 \\ x_2 \\ \vdots \\ x_n \end{array}\right], \nonumber\] a vector in \(\mathbb{R}^n \), which we can identify with an \(n \times 1 \) matrix. Of course we want the definition to be consistent with this special case.
The product of an \(m\times n \) matrix \(A \) and an \(n\times p \) matrix \(B = [\,\mathbf{b}_1 \ \mathbf{b}_2 \ \ldots \ \mathbf{b}_p\,] \) is defined by \[ AB = [\,A\mathbf{b}_1 \ A\mathbf{b}_2 \ \ldots \ A\mathbf{b}_p\,]. \nonumber\] So we have \[ j\text{-th column of } AB = A \text{ times } j\text{-th column of } B, \quad j = 1,2,\ldots,p. \nonumber\] Note that this makes \(AB \) an \(m \times p \) matrix. If the number of columns of \(A \) is not equal to the number of rows of \(B \), the product \(AB \) is not defined.
\[ \left[\begin{array}{rr} 1 & -3 \\ -1 & 2 \\ 3 & -2 \end{array}\right] \left[\begin{array}{rrr} 2 & 1 & 1\\ 3 & 0 & 2 \end{array}\right] = \left[\begin{array}{rrr} -7 & 1 & -5 \\ 4 & -1 & 3 \\ 0 & 3 & -1 \end{array}\right]. \nonumber\] For instance, the third column is computed as \[ \left[\begin{array}{rr} 1 & -3 \\ -1 & 2 \\ 3 & -2 \end{array}\right] \left[\begin{array}{r} 1\\ 2 \end{array}\right] = 1 \left[\begin{array}{r} 1 \\ -1 \\ 3\end{array}\right] + 2 \left[\begin{array}{r} -3 \\ 2 \\ -2 \end{array}\right] = \left[\begin{array}{r} -5 \\ 3 \\ -1\end{array}\right]. \nonumber\]
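The column-by-column definition can be imitated literally in code. The following sketch (NumPy assumed) assembles \(AB \) one column at a time and compares the result with NumPy's built-in matrix product `A @ B`:

```python
import numpy as np

A = np.array([[1, -3], [-1, 2], [3, -2]])
B = np.array([[2, 1, 1], [3, 0, 2]])

# j-th column of AB  =  A times the j-th column of B.
cols = [A @ B[:, j] for j in range(B.shape[1])]
AB = np.column_stack(cols)

print(AB)                         # [[-7  1 -5] [ 4 -1  3] [ 0  3 -1]]
print(np.array_equal(AB, A @ B))  # True
```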
The entries of the product can also be computed one at a time: the entry in position \((i,j) \) of \(C = AB \) is the row-column product of the \(i \)-th row of \(A \) and the \(j \)-th column of \(B \), \[ c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \ldots + a_{in}b_{nj}. \nonumber\]
Proof
We already saw this row-column expansion in the section on the matrix-vector product.
The following scheme nicely visualizes the row-column expansion \[ \begin{array}{cc} & \left[\begin{array}{cccccc} b_{11} & b_{12}& \ldots& {\color{blue}b_{1j}} & \ldots& b_{1p} \\ b_{21} & b_{22}& \ldots& {\color{blue}b_{2j}} & \ldots& b_{2p} \\ \vdots & \vdots& & \vdots & & \vdots \\ b_{n1} & b_{n2}& \ldots& {\color{blue}b_{nj}} & \ldots& b_{np} \end{array}\right] \\ \left[\begin{array}{ccccc} a_{11} & a_{12}& \ldots& \ldots& a_{1n} \\ a_{21} & a_{22}& \ldots& \ldots& a_{2n} \\ \vdots & \vdots& & & \vdots \\ {\color{blue}a_{i1}} & {\color{blue}a_{i2}}& {\color{blue}\cdots}& {\color{blue}\cdots}& {\color{blue}a_{in}} \\ \vdots & \vdots& & & \vdots \\ a_{m1} & a_{m2}& \ldots& \ldots& a_{mn} \end{array}\right] & \left[\begin{array}{cccccc} c_{11} & c_{12}& \ldots& c_{1j} &\ldots& c_{1p} \\ c_{21} & c_{22}& \ldots& c_{2j} &\ldots& c_{2p} \\ \vdots & \vdots& & \vdots & & \vdots \\ c_{i1} & c_{i2}& \ldots&{\color{blue}c_{ij}} &\ldots& c_{ip} \\ \vdots & \vdots& & \vdots & & \vdots \\ c_{m1} & c_{m2}& \ldots& c_{mj} &\ldots& c_{mp} \end{array}\right] \end{array} \nonumber\]
Let us consider the same matrix product \[ \left[\begin{array}{rr} 1 & -3 \\ -1 & 2 \\ 3 & -2 \end{array}\right] \left[\begin{array}{rrr} 2 & 1 & 1\\ 3 & 0 & 2 \end{array}\right] = \left[\begin{array}{rrr} -7 & 1 & -5 \\ 4 & -1 & 3 \\ 0 & 3 & -1 \end{array}\right]. \nonumber\] The \(-5 \) in position \((1,3) \) and the \(3 \) in position \((3,2) \) of the product come from \[ -5 = \left[\begin{array}{rr} 1 & -3 \end{array}\right] \left[\begin{array}{r} 1\\ 2 \end{array}\right] \quad\text{and}\quad 3 = \left[\begin{array}{rr} 3 & -2 \end{array}\right] \left[\begin{array}{r} 1\\ 0 \end{array}\right]. \nonumber\]
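The two entries can be recomputed as row-column products; a minimal NumPy sketch:

```python
import numpy as np

A = np.array([[1, -3], [-1, 2], [3, -2]])
B = np.array([[2, 1, 1], [3, 0, 2]])

# Entry (1,3) of AB: row 1 of A times column 3 of B.
print(A[0, :] @ B[:, 2])   # -5
# Entry (3,2) of AB: row 3 of A times column 2 of B.
print(A[2, :] @ B[:, 1])   # 3
```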
The product of a matrix \(A \) with itself is only defined if \(A \) is an \(n \times n \) matrix. In that case we use the obvious notation \[ A^2 = A\cdot A. \nonumber\]
Matrix multiplication has a 'unit element'. To identify it we first introduce some more terminology.
An \(n\times n \) matrix \(A \) is called a square matrix. So it is a matrix where the number of columns is equal to the number of rows. For a square matrix \(A \) we call the elements \(a_{ii} \) the diagonal elements. Together the diagonal elements form the (main) diagonal of \(A \). A square matrix where all non-diagonal elements are equal to 0 is called a diagonal matrix.
The other diagonal of a square matrix, the one from bottom left to top right, plays a minor role. For this reason we don't reserve a name for it. By 'diagonal' we will always mean: main diagonal.
Consider the matrices \[ A = \left[\begin{array}{rr} 2 & 2 \\ 3 & 3 \end{array}\right], \quad B = \left[\begin{array}{rrr} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 6 \end{array}\right], \quad C = \left[\begin{array}{rr} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{array}\right]. \nonumber\] The matrices \(A \) and \(B \) are square, and only \(B \) is a diagonal matrix.
The identity matrix \(I_n \) is the \(n \times n \) diagonal matrix with 1's on the diagonal. If the size is irrelevant or clear from the context, we denote it simply by \(I \).
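A quick numerical illustration (NumPy assumed) of \(I \) acting as a unit element:

```python
import numpy as np

A = np.array([[2, 2], [3, 3]])
I = np.eye(2, dtype=int)         # the 2x2 identity matrix

print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```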
The definition of the product of two matrices and the earlier definition of the product of a matrix and a vector immediately imply that the columns of the product of two matrices are linear combinations of the columns of the first matrix. As is often the case in linear algebra, things can be looked at from a different perspective. From Proposition 17 it follows that the elements \(c_{i1},c_{i2},\ldots,c_{ip} \) of the \(i \)-th row of the product \(C = AB \) depend, as far as \(A \) is concerned, only on the elements \(a_{ik} \) of its \(i \)-th row. The following proposition explains in which way.
The \(i \)-th row of \(C = AB \) is the linear combination of the rows of \(B \) with the elements of the \(i \)-th row of \(A \) as coefficients: \[ \left[\begin{array}{cccc} c_{i1} & c_{i2} & \ldots & c_{ip} \end{array}\right] = a_{i1}\,[\;\text{row } 1 \text{ of } B\;] + a_{i2}\,[\;\text{row } 2 \text{ of } B\;] + \ldots + a_{in}\,[\;\text{row } n \text{ of } B\;]. \nonumber\]
Proof
The indicated linear combination yields: \[ a_{i1} \left[\begin{array}{cccc}b_{11} & b_{12} & \ldots &b_{1p} \end{array}\right] + a_{i2} \left[\begin{array}{cccc}b_{21} & b_{22} & \ldots &b_{2p} \end{array}\right] + \ldots + a_{in} \left[\begin{array}{cccc}b_{n1} & b_{n2} & \ldots &b_{np} \end{array}\right] \nonumber\] \[ = \left[\begin{array}{ccc} (a_{i1}b_{11} + a_{i2}b_{21}+ \ldots +a_{in}b_{n1}) & \ldots & (a_{i1}b_{1p} + a_{i2} b_{2p} + \ldots + a_{in}b_{np}) \end{array}\right]. \nonumber\] This is a row vector whose \(j \)-th entry is the number \[ a_{i1}b_{1j} + a_{i2} b_{2j} + \ldots + a_{in}b_{nj}, \nonumber\] and that is precisely the entry \(c_{ij} \) of the matrix \(C = AB \).
Interestingly, this opens the way to describing the row operations of Chapter 2 via matrix multiplication. The following example illustrates this for the three basic row operations.
The following multiplication adds the first row of the matrix \[ A = \left[\begin{array}{rrr} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ a_{31}& a_{32} & a_{33} \end{array}\right] \nonumber\] four times to the second row: \[ \left[\begin{array}{rrr} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & 0 & 1\end{array}\right] \left[\begin{array}{rrr} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ a_{31}& a_{32} & a_{33} \end{array}\right] = \left[\begin{array}{rrr} a_{11}& a_{12} & a_{13} \\ 4a_{11}+a_{21}& 4a_{12}+a_{22}& 4a_{13}+a_{23} \\ a_{31}& a_{32} & a_{33} \end{array}\right]. \nonumber\] Here the third row is scaled by a factor of 5: \[ \left[\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5\end{array}\right] \left[\begin{array}{rrr} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right] = \left[\begin{array}{rrr} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ 5a_{31}& 5a_{32} & 5a_{33} \end{array}\right]. \nonumber\] And with the following multiplication the first and third row of \(A \) are swapped: \[ \left[\begin{array}{rrr} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0\end{array}\right] \left[\begin{array}{rrr} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ a_{31}& a_{32} & a_{33} \end{array}\right] = \left[\begin{array}{rrr} a_{31}& a_{32} & a_{33} \\ a_{21}& a_{22} & a_{23} \\ a_{11}& a_{12} & a_{13} \end{array}\right]. \nonumber\]
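The three multiplications are easy to replay in code. A minimal sketch, assuming NumPy, that builds each of the three matrices from a copy of the identity and applies it to a generic 3x3 matrix:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)       # stand-in for a generic 3x3 matrix

E_add = np.eye(3);   E_add[1, 0] = 4     # adds 4 times row 1 to row 2
E_scale = np.eye(3); E_scale[2, 2] = 5   # scales row 3 by a factor of 5
E_swap = np.eye(3)[[2, 1, 0]]            # swaps rows 1 and 3

print(E_add @ A)     # row 2 replaced by 4*(row 1) + row 2
print(E_scale @ A)   # row 3 multiplied by 5
print(E_swap @ A)    # rows 1 and 3 interchanged
```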
For future reference we give these matrices a name: a matrix that is the result of applying a single elementary row operation to the identity matrix is called an elementary matrix.
The product of a single column (an \(m \times 1 \) matrix) with a single row (a \(1 \times p \) matrix) is the building block for yet another way to look at the matrix product. The next exercise explains how.
Properties of the matrix product
Now let us have a look at which of the rules for products of numbers also hold for products of matrices, and which do not.
- \(A(B_1+B_2) = AB_1 + AB_2 \) and \((A_1+A_2)B = A_1B+A_2B \);
- \(A(cB) = c(AB) = (cA)B \);
- \(AI_n = A \) and \(I_mA = A \) (the identity matrix \(I \) acts as a unit element);
- \(A(BC) = (AB)C \) (matrix multiplication is associative).
As an illustration of rule (iv) we compute the two triple products for the three matrices \[ A = \left[\begin{array}{rr} 3 & 1 \\ 2 & 1 \\ 0 & 5 \end{array}\right], \quad B = \left[\begin{array}{rr} 1 & 2 \\ 3 & 0 \end{array}\right], \quad C = \left[\begin{array}{rrr} 1 & 2 & 0 \\ 2 & 1 & 2 \end{array}\right]. \nonumber\] On the one hand \[ A(BC) = \left[\begin{array}{rr} 3 & 1 \\ 2 & 1 \\ 0 & 5 \end{array}\right] \left[\begin{array}{rrr} 5 & 4 & 4\\ 3 & 6 & 0 \end{array}\right] = \left[\begin{array}{rrr} 18 & 18 & 12\\ 13 & 14 & 8 \\ 15 & 30 & 0 \end{array}\right], \nonumber\] and on the other hand \[ (AB)C = \left[\begin{array}{rr} 6 & 6 \\ 5 & 4 \\ 15 & 0 \end{array}\right] \left[\begin{array}{rrr} 1 & 2 & 0 \\ 2 & 1 & 2 \end{array}\right] = \left[\begin{array}{rrr} 18 & 18 & 12 \\ 13 & 14 & 8 \\ 15 & 30 & 0 \end{array}\right]. \nonumber\] So the products are indeed equal. But it is not immediately clear how: the value 14 in position \((2,2) \) comes about in two ways: \[ \text{via } A(BC)\!: 14 = 2\cdot4 + 1\cdot 6, \quad \text{via } (AB)C\!: 14 = 5\cdot2 + 4\cdot1. \nonumber\] We need a good perspective to give a proof of the general case.
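The associativity check of this example takes one line per product in code; a NumPy sketch:

```python
import numpy as np

A = np.array([[3, 1], [2, 1], [0, 5]])
B = np.array([[1, 2], [3, 0]])
C = np.array([[1, 2, 0], [2, 1, 2]])

print(np.array_equal(A @ (B @ C), (A @ B) @ C))  # True
print(A @ (B @ C))   # [[18 18 12] [13 14  8] [15 30  0]]
```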
Proof
Rules (i) and (ii) are checked in a straightforward way; see the exercises.
- We saw instances of this property already in Example 23 and Exercise 29. For the general case, one way to show the validity of the first statement is to note that the \(j \)-th column of \(AI_n \) is \(A\mathbf{e}_j \), where \(\mathbf{e}_j \) is the \(j \)-th column of the identity matrix \(I_n \). This gives the linear combination \[ A\mathbf{e}_j = 0\mathbf{a}_1 + 0\mathbf{a}_2 + \ldots + 1\mathbf{a}_j + \ldots + 0\mathbf{a}_n = \mathbf{a}_j, \nonumber\] which shows that the \(j \)-th column of \(AI_n \) is equal to the \(j \)-th column of \(A \). And this holds for any column. The identity \[ I_mA = A \nonumber\] is shown in an analogous way, working row by row.
- First we observe that both triple products yield \(m \times q \) matrices. Then the identity can be proved column by column, like the previous one. We are done if we can show that \[ \begin{array}{rcl} k\text{-th column of }A(BC) &=& k\text{-th column of }(AB)C \\ &=& (AB)(k\text{-th column of }C) = (AB)\mathbf{c}_k, \end{array} \nonumber\] for \( k = 1,2,\ldots,q \). Now recall that (by definition) \[ k\text{-th column of }BC = B\mathbf{c}_k, \nonumber\] so \[ k\text{-th column of }A(BC) = A(B\mathbf{c}_k). \nonumber\] Making extensive use of the rule \[ A(c_1\mathbf{x} + c_2\mathbf{y}) = c_1A\mathbf{x} + c_2A\mathbf{y} \nonumber\] we find \[ \begin{array}{ccl} A (B\mathbf{c}_k) & = & A (c_{1k}\mathbf{b}_1 +c_{2k}\mathbf{b}_2 + \ldots + c_{pk}\mathbf{b}_p)\\ & = & c_{1k}(A\mathbf{b}_1) +c_{2k}(A\mathbf{b}_2) + \ldots + c_{pk}(A\mathbf{b}_p)\\ & = & \left[\begin{array}{cccc} A\mathbf{b}_1 & A\mathbf{b}_2 & \ldots & A\mathbf{b}_p \end{array}\right] \left[\begin{array}{c} c_{1k} \\ \vdots \\ c_{pk} \end{array}\right] \\ & = & (AB)\mathbf{c}_k. \end{array} \nonumber\]
So far so good: matrix multiplication behaves as multiplication of numbers. However, in two important respects the concepts deviate. First of all, commutativity no longer holds.
For the matrices \[ A = \left[\begin{array}{rrr} 2 & 2 & 1\\ 3 & 3 & 0 \end{array}\right] \quad \text{and} \quad B = \left[\begin{array}{rr} 1 & 3 \\ 3 & 1 \\ 4 & 0 \end{array}\right] \nonumber\] it is clear that \[ AB \neq BA, \nonumber\] simply because the two products are not of the same size: \(AB \) is a \(2\times 2 \) matrix, \(BA \) a \(3\times3 \) matrix. The following example illustrates that \(AB = BA \) is not even guaranteed for two \(n\times n \) matrices \(A \) and \(B \): \[ \left[\begin{array}{rr} 1 & 3 \\ 2 & 1 \end{array}\right] \left[\begin{array}{rr} 0 & 1 \\ 1 & 2 \end{array}\right] = \left[\begin{array}{rr} 3 & 7 \\ 1 & 4 \end{array}\right] \neq \left[\begin{array}{rr} 2 & 1 \\ 5 & 5 \end{array}\right] = \left[\begin{array}{rr} 0 & 1 \\ 1 & 2 \end{array}\right] \left[\begin{array}{rr} 1 & 3 \\ 2 & 1 \end{array}\right]. \nonumber\]
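A direct numerical check of the non-commuting \(2 \times 2 \) pair above (NumPy assumed):

```python
import numpy as np

A = np.array([[1, 3], [2, 1]])
B = np.array([[0, 1], [1, 2]])

print(A @ B)                         # [[3 7] [1 4]]
print(B @ A)                         # [[2 1] [5 5]]
print(np.array_equal(A @ B, B @ A))  # False
```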
The fact that in general \(AB \neq BA \) can be understood by thinking about the composition of the two linear transformations corresponding to \(A \) and \(B \) (see the section on linear transformations). The following two exercises shed some light on the non-commutativity.
Figure 1: The transformations corresponding to \(AB \) and \(BA \).
Note that \(T_A \) is a transformation that 'stretches' horizontally, and \(T_B \) is a reflection. Figure 1 visualizes the transformations corresponding to \(AB \) and \(BA \). When we apply the transformations one after another, the order in which we do this is important.
- Describe in words the row operations corresponding to \(E_1 \) and \(E_2 \).
- Describe in words the combined row operations corresponding to \(E_1E_2 \) and \(E_2E_1 \). Can you explain why \(E_1E_2 \neq E_2E_1 \)?
- Compute \(E_1E_2 \) and \(E_2E_1 \) to double check the last non-identity.
The second major difference between the product of numbers and the product of matrices: for two (e.g. real) numbers \(a \) and \(b \) it is known that \[ \text{if } a \neq 0 \text{ and } b \neq 0 \text{ then } ab \neq 0, \nonumber\] or, equivalently, \[ ab = 0 \Rightarrow a = 0 \text{ or } b = 0. \nonumber\] As the following example shows, things are different in the realm of matrices.
\[ \left[\begin{array}{rr} 1 & 2 \\ 2 & 4 \end{array}\right] \left[\begin{array}{rr} 2 & 6 \\ -1 & -3 \end{array}\right] = \left[\begin{array}{rr} 0 & 0 \\ 0 & 0 \end{array}\right]. \nonumber\] So the product of two nonzero matrices may be the zero matrix.
The following example shows that things are even 'worse':
\[ \left[\begin{array}{rrr} 1 & -3 & 2 \\ 1 & -3 & 2 \\ 1 & -3 & 2 \end{array}\right] \left[\begin{array}{rrr} 1 & -3 & 2 \\ 1 & -3 & 2 \\ 1 & -3 & 2 \end{array}\right] = \left[\begin{array}{rrr} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right], \nonumber\] which shows that we cannot even conclude from \(A\cdot A = O \) that \(A \) itself must be the zero matrix.
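Both phenomena from the last two examples can be reproduced in a few lines; a minimal NumPy sketch:

```python
import numpy as np

# Two nonzero matrices whose product is the zero matrix:
A = np.array([[1, 2], [2, 4]])
B = np.array([[2, 6], [-1, -3]])
print(A @ B)        # [[0 0] [0 0]]

# A nonzero matrix whose square is the zero matrix:
N = np.array([[1, -3, 2], [1, -3, 2], [1, -3, 2]])
print(N @ N)        # the 3x3 zero matrix
```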
The next list gives six situations where matrix multiplication acts differently than multiplication of numbers. In fact, all statements can be related to one of the first two.
- In general, \(AB = BA \) does not hold for two \(n\times n \) matrices \(A \) and \(B \).
- In general, from \(AB = O \) it does not follow that either \(A =O \) or \(B = O \).
- In general, \((A+B)(A+B) = A^2 + 2AB + B^2 \) does not hold for two \(n\times n \) matrices \(A \) and \(B \).
- In general, \((A+B)(A-B) = A^2 - B^2 \) does not hold for two \(n\times n \) matrices \(A \) and \(B \).
- In general, from \(AB = AC \) and \(A \neq O \) it does not follow that \(B = C \).
- In general, from \(A^2 = I \) it does not follow that either \(A = I \) or \(A = -I \).

For each statement counterexamples can be given, as we already did for the first two. To get more insight into what is really going on, we can also try to find out how the third through the sixth statements relate to the first two. For instance, the third statement is closely related to the first. Let us check where things go wrong: \[ \begin{array}{cl} (A+B)(A+B)& = A(A+B) +B(A+B)\\ & = A^2 + AB + BA + B^2. \end{array} \nonumber\] The last expression is equal to \[ A^2 + 2AB + B^2 \nonumber\] if and only if \[ AB + BA = 2AB \iff BA = AB. \nonumber\] So any pair of matrices \(A \) and \(B \) with \[ AB \neq BA \nonumber\] provides a counterexample where \[ (A+B)(A+B) \neq A^2 + 2AB + B^2, \nonumber\] as the sketch after the next exercise also shows. Likewise, (v) follows from (ii): \[ AB = AC \iff AB - AC = O \iff A(B-C) = O. \nonumber\] According to (ii), from the last equation we cannot deduce that \[ \text{either } A = O \quad \text{or}\quad B-C = O. \nonumber\] We can create a counterexample by taking for \(A \) and \(B \) nonzero matrices for which \[ AB = O, \nonumber\] and we let \(C \) be the zero matrix. Then \(B \neq C \), whereas \[ AB = AC = O \quad\text{and (by assumption)}\quad A \neq O. \nonumber\] Statement (vi) also relates to (ii): \[ A^2 = I \iff A^2 - I = (A+I)(A-I) = O, \nonumber\] from which we cannot conclude that one of the factors \((A+I) \) or \((A-I) \) must be the zero matrix. In this case we do not get a counterexample for free. You are asked to construct counterexamples in Exercise 46.
- Give a \(2 \times 2 \) matrix \(A \neq \pm I \) for which \(A^2 = I \).
- Give a \(2 \times 2 \) matrix \(A \) not containing any zeros, for which \(A^2 = I \).
- Give a \(2 \times 2 \) matrix \(B \) for which \(B^2 = -I \).
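Statement (iii) of the list above can be checked numerically with the non-commuting pair from the earlier example; a minimal NumPy sketch (the matrices are the ones used before, not new data):

```python
import numpy as np

A = np.array([[1, 3], [2, 1]])
B = np.array([[0, 1], [1, 2]])

lhs = (A + B) @ (A + B)
rhs = A @ A + 2 * (A @ B) + B @ B
print(np.array_equal(lhs, rhs))   # False, because AB != BA
print(lhs - rhs)                  # equals BA - AB
```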
The following property connects the operations of matrix transposition and matrix multiplication.
If \(A \) is an \(m\times n \) matrix and \(B \) an \(n\times p \) matrix, then \[ (AB)^T = B^TA^T. \nonumber\]
Before we present the proof, it is worth noting (as Example 48 illustrates) that the rule is not restricted to square matrices \(A \) and \(B \). The proof for general matrices \(A \) and \(B \) for which the product \(AB \) is well defined is as follows.
Proof
To show that \[ (AB)^T = B^TA^T \nonumber\] we have to show that the matrices have the same size, and are equal entry by entry. First, we see that \(AB \) is an \(m \times p \) matrix, so \((AB)^T \) is a \(p \times m \) matrix, and \(B^TA^T \), being the product of a \(p \times n \) matrix with an \(n \times m \) matrix, is also a \(p \times m \) matrix. Second, the \((i,j) \) entry of \((AB)^T \) is the \((j,i) \) entry of \(AB \), which is the (row-column) product of the \(j \)-th row of \(A \) and the \(i \)-th column of \(B \): \[ [(AB)^T]_{ij} = \left[\begin{array}{cccc} a_{j1} & a_{j2} & \ldots & a_{jn} \end{array}\right] \left[\begin{array}{c} b_{1i} \\ b_{2i} \\ \vdots \\ b_{ni} \end{array}\right]. \nonumber\] The \((i,j) \) entry of \(B^TA^T \) is the product of the \(i \)-th row of \(B^T \) and the \(j \)-th column of \(A^T \). Now the \(i \)-th row of \(B^T \) is the \(i \)-th column of \(B \) written as a row, and the \(j \)-th column of \(A^T \) is the \(j \)-th row of \(A \) written as a column: \[ [B^TA^T]_{ij} = \left[\begin{array}{cccc} b_{1i} & b_{2i} & \ldots & b_{ni} \end{array}\right] \left[\begin{array}{c} a_{j1} \\ a_{j2} \\ \vdots \\ a_{jn} \end{array}\right]. \nonumber\] Both row-column products end up as the same value: \[ a_{j1}b_{1i} + a_{j2}b_{2i} + \ldots + a_{jn}b_{ni} = b_{1i}a_{j1} + b_{2i}a_{j2} + \ldots + b_{ni}a_{jn}. \nonumber\]
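As a final numerical illustration (NumPy assumed), the rule \((AB)^T = B^TA^T \) checked for the non-square pair used earlier in this section:

```python
import numpy as np

A = np.array([[1, -3], [-1, 2], [3, -2]])   # 3x2
B = np.array([[2, 1, 1], [3, 0, 2]])        # 2x3

print(np.array_equal((A @ B).T, B.T @ A.T))  # True
# Note the reversed order: A.T @ B.T happens to be defined here
# (2x3 times 3x2 gives a 2x2 matrix), but it is not equal to
# (A @ B).T, which is 3x3.
```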