2.4: Matrix Multiplication
In Section [sec:2_2] matrix-vector products were introduced. If A is an m×n matrix, the product Ax was defined for any n-column x in Rn as follows: If A=[a1a2⋯an] where the aj are the columns of A, and if x=[x1x2⋮xn], Definition [def:002668] reads
Ax=x1a1+x2a2+⋯+xnan
This was motivated as a way of describing systems of linear equations with coefficient matrix A. Indeed every such system has the form Ax=b where b is the column of constants.
In this section we extend this matrix-vector multiplication to a way of multiplying matrices in general, and then investigate matrix algebra for its own sake. While it shares several properties of ordinary arithmetic, it will soon become clear that matrix arithmetic is different in a number of ways.
Matrix multiplication is closely related to composition of transformations.
Composition and Matrix Multiplication
Sometimes two transformations “link” together as follows:
\mathbb{R}^k \xrightarrow{T} \mathbb{R}^n \xrightarrow{S} \mathbb{R}^m
In this case we can apply T first and then apply S, and the result is a new transformation
S∘T:Rk→Rm
called the composite of S and T, defined by
(S∘T)(x)=S[T(x)] for all x in Rk
The action of S∘T can be described as “first T then S ” (note the order!)1. This new transformation is described in the diagram. The reader will have encountered composition of ordinary functions: For example, consider \mathbb{R} \xrightarrow{g} \mathbb{R} \xrightarrow{f} \mathbb{R} where f(x)=x^2 and g(x)=x+1 for all x in R. Then
(f∘g)(x) = f[g(x)] = f(x+1) = (x+1)^2
(g∘f)(x) = g[f(x)] = g(x^2) = x^2+1
for all x in R.
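Composition order is easy to experiment with computationally. Here is a minimal Python sketch of the same f and g (the helper name compose is our own):

```python
# Composition of ordinary functions: (s ∘ t)(x) = s(t(x)), "first t, then s".
f = lambda x: x ** 2   # f(x) = x^2
g = lambda x: x + 1    # g(x) = x + 1

def compose(s, t):
    return lambda x: s(t(x))

f_after_g = compose(f, g)   # (f ∘ g)(x) = (x + 1)^2
g_after_f = compose(g, f)   # (g ∘ f)(x) = x^2 + 1

print(f_after_g(3), g_after_f(3))  # 16 10 -- composition order matters
```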
Our concern here is with matrix transformations. Suppose that A is an m×n matrix and B is an n×k matrix, and let \mathbb{R}^k \xrightarrow{T_B} \mathbb{R}^n \xrightarrow{T_A} \mathbb{R}^m be the matrix transformations induced by B and A respectively, that is:
TB(x)=Bx for all x in Rk and TA(y)=Ay for all y in Rn
Write B=[b1b2⋯bk] where bj denotes column j of B for each j. Hence each bj is an n-vector (B is n×k) so we can form the matrix-vector product Abj. In particular, we obtain an m×k matrix
[Ab1Ab2⋯Abk]
with columns Ab1,Ab2,⋯,Abk. Now compute (TA∘TB)(x) for any x=[x1x2⋮xk] in Rk:
(TA∘TB)(x) = TA[TB(x)]    (definition of TA∘TB)
 = A(Bx)    (A and B induce TA and TB)
 = A(x1b1+x2b2+⋯+xkbk)    (Definition [def:002668])
 = A(x1b1)+A(x2b2)+⋯+A(xkbk)    (Theorem [thm:002811])
 = x1(Ab1)+x2(Ab2)+⋯+xk(Abk)    (Theorem [thm:002811])
 = [Ab1 Ab2 ⋯ Abk]x    (Definition [def:002668])
Because x was an arbitrary vector in Rk, this shows that TA∘TB is the matrix transformation induced by the matrix [Ab1Ab2⋯Abk]. This motivates the following definition.
Matrix Multiplication003447 Let A be an m×n matrix, let B be an n×k matrix, and write B=[b1b2⋯bk] where bj is column j of B for each j. The product matrix AB is the m×k matrix defined as follows:
AB=A[b1b2⋯bk]=[Ab1Ab2⋯Abk]
Thus the product matrix AB is given in terms of its columns Ab1,Ab2,…,Abk: Column j of AB is the matrix-vector product Abj of A and the corresponding column bj of B. Note that each such product Abj makes sense by Definition [def:002668] because A is m×n and each bj is in Rn (since B has n rows). Note also that if B is a column matrix, this definition reduces to Definition [def:002668] for matrix-vector multiplication.
Given matrices A and B, Definition [def:003447] and the above computation give
A(Bx)=[Ab1Ab2⋯Abk]x=(AB)x
for all x in Rk. We record this for reference.
003469 Let A be an m×n matrix and let B be an n×k matrix. Then the product matrix AB is m×k and satisfies
A(Bx)=(AB)x for all x in Rk
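Definition [def:003447] translates directly into an algorithm: build AB one column at a time. Here is a minimal sketch in Python using plain nested lists (the helper names mat_vec and mat_mul are our own, and no claim of efficiency is made):

```python
def mat_vec(A, x):
    # Ax: entry i is the dot product of row i of A with x
    # (equivalently, Ax = x1*a1 + x2*a2 + ... + xn*an).
    return [sum(A[i][p] * x[p] for p in range(len(x))) for i in range(len(A))]

def mat_mul(A, B):
    # Column j of AB is the matrix-vector product of A with column j of B.
    m, k = len(A), len(B[0])
    cols = [mat_vec(A, [row[j] for row in B]) for j in range(k)]
    # Reassemble the k columns into an m x k matrix stored row by row.
    return [[cols[j][i] for j in range(k)] for i in range(m)]
```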
Here is an example of how to compute the product AB of two matrices using Definition [def:003447].
003474 Compute AB if A=\left[ \begin{array}{rrr} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{array} \right] and B=\left[ \begin{array}{rr} 8 & 9 \\ 7 & 2 \\ 6 & 1 \end{array} \right].
The columns of B are b1=\left[ \begin{array}{r} 8 \\ 7 \\ 6 \end{array} \right] and b2=\left[ \begin{array}{r} 9 \\ 2 \\ 1 \end{array} \right], so Definition [def:002668] gives
Ab1=\left[ \begin{array}{rrr} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{array} \right]\left[ \begin{array}{r} 8 \\ 7 \\ 6 \end{array} \right]=\left[ \begin{array}{r} 67 \\ 78 \\ 55 \end{array} \right] and Ab2=\left[ \begin{array}{rrr} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{array} \right]\left[ \begin{array}{r} 9 \\ 2 \\ 1 \end{array} \right]=\left[ \begin{array}{r} 29 \\ 24 \\ 10 \end{array} \right]
Hence Definition [def:003447] above gives AB=\left[ \begin{array}{cc} Ab_1 & Ab_2 \end{array} \right]=\left[ \begin{array}{rr} 67 & 29 \\ 78 & 24 \\ 55 & 10 \end{array} \right].
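The arithmetic can also be checked with numpy, if it is available; @ is Python's matrix-multiplication operator:

```python
import numpy as np

A = np.array([[2, 3, 5], [1, 4, 7], [0, 1, 8]])
B = np.array([[8, 9], [7, 2], [6, 1]])
print(A @ B)   # [[67 29]
               #  [78 24]
               #  [55 10]]
```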
example232 If A is m×n and B is n×k, Theorem [thm:003469] gives a simple formula for the composite of the matrix transformations TA and TB:
TA∘TB=TAB
Given any x in Rk,
(TA∘TB)(x)=TA[TB(x)]=A[Bx]=(AB)x=TAB(x)
While Definition [def:003447] is important, there is another way to compute the matrix product AB that gives a way to calculate each individual entry. In Section [sec:2_2] we defined the dot product of two n-tuples to be the sum of the products of corresponding entries. We went on to show (Theorem [thm:002903]) that if A is an m×n matrix and x is an n-vector, then entry j of the product Ax is the dot product of row j of A with x. This observation was called the “dot product rule” for matrix-vector multiplication, and the next theorem shows that it extends to matrix multiplication in general.
Dot Product Rule003488 Let A and B be matrices of sizes m×n and n×k, respectively. Then the (i,j)-entry of AB is the dot product of row i of A with column j of B.
Write B=[b1b2⋯bk] in terms of its columns. Then Abj is column j of AB for each j. Hence the (i,j)-entry of AB is entry i of Abj, which is the dot product of row i of A with bj. This proves the theorem.
Thus to compute the (i,j)-entry of AB, proceed as follows (see the diagram):
Go across row i of A, and down column j of B, multiply corresponding entries, and add the results.
(Diagram: row i of A is paired with column j of B; their dot product is the (i,j)-entry of AB.)
Note that this requires that the rows of A be the same length as the columns of B. The following rule is useful for remembering this and for deciding the size of the product matrix AB.
Let A and B denote matrices. If A is m×n and B is n′×k, the product AB can be formed if and only if n=n′. In this case the size of the product matrix AB is m×k, and we say that AB is defined, or that A and B are compatible for multiplication.
The diagram provides a useful mnemonic for remembering this. We adopt the following convention:
Whenever a product of matrices is written, it is tacitly assumed that the sizes of the factors are such that the product is defined.
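Both the dot product rule and the compatibility requirement appear naturally in an entry-by-entry implementation. A minimal Python sketch (plain lists; the function name and error message are our own):

```python
def mat_mul_entries(A, B):
    m, n = len(A), len(A[0])
    n2, k = len(B), len(B[0])
    if n != n2:
        # A is m x n and B is n' x k: the product needs n = n'.
        raise ValueError(f"incompatible sizes: {m}x{n} times {n2}x{k}")
    # (i, j)-entry of AB: go across row i of A, down column j of B,
    # multiply corresponding entries, and add the results.
    return [[sum(A[i][p] * B[p][j] for p in range(n)) for j in range(k)]
            for i in range(m)]
```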
To illustrate the dot product rule, we recompute the matrix product in Example [exa:003474].
003516 Compute AB if A=\left[ \begin{array}{rrr} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{array} \right] and B=\left[ \begin{array}{rr} 8 & 9 \\ 7 & 2 \\ 6 & 1 \end{array} \right].
Here A is 3×3 and B is 3×2, so the product matrix AB is defined and will be of size 3×2. Theorem [thm:003488] gives each entry of AB as the dot product of the corresponding row of A with the corresponding column of B; that is,
AB=\left[ \begin{array}{rrr} 2 & 3 & 5 \\ 1 & 4 & 7 \\ 0 & 1 & 8 \end{array} \right]\left[ \begin{array}{rr} 8 & 9 \\ 7 & 2 \\ 6 & 1 \end{array} \right]=\left[ \begin{array}{cc} 2\cdot 8+3\cdot 7+5\cdot 6 & 2\cdot 9+3\cdot 2+5\cdot 1 \\ 1\cdot 8+4\cdot 7+7\cdot 6 & 1\cdot 9+4\cdot 2+7\cdot 1 \\ 0\cdot 8+1\cdot 7+8\cdot 6 & 0\cdot 9+1\cdot 2+8\cdot 1 \end{array} \right]=\left[ \begin{array}{rr} 67 & 29 \\ 78 & 24 \\ 55 & 10 \end{array} \right]
Of course, this agrees with Example [exa:003474].
003527 Compute the (1,3)- and (2,4)-entries of AB where
A=\left[ \begin{array}{rrr} 3 & -1 & 2 \\ 0 & 1 & 4 \end{array} \right] and B=\left[ \begin{array}{rrrr} 2 & 1 & 6 & 0 \\ 0 & 2 & 3 & 4 \\ -1 & 0 & 5 & 8 \end{array} \right].
Then compute AB.
The (1,3)-entry of AB is the dot product of row 1 of A and column 3 of B (highlighted in the following display), computed by multiplying corresponding entries and adding the results.
(1,3)-entry = 3⋅6+(−1)⋅3+2⋅5 = 18−3+10 = 25, using row 1 of A (entries 3, −1, 2) and column 3 of B (entries 6, 3, 5).
Similarly, the (2,4)-entry of AB involves row 2 of A and column 4 of B.
(2,4)-entry = 0⋅0+1⋅4+4⋅8 = 0+4+32 = 36, using row 2 of A (entries 0, 1, 4) and column 4 of B (entries 0, 4, 8).
Since A is 2×3 and B is 3×4, the product is 2×4.
AB=\left[ \begin{array}{rrr} 3 & -1 & 2 \\ 0 & 1 & 4 \end{array} \right]\left[ \begin{array}{rrrr} 2 & 1 & 6 & 0 \\ 0 & 2 & 3 & 4 \\ -1 & 0 & 5 & 8 \end{array} \right]=\left[ \begin{array}{rrrr} 4 & 1 & 25 & 12 \\ -4 & 2 & 23 & 36 \end{array} \right]
003540 If A=\left[ \begin{array}{rrr} 1 & 3 & 2 \end{array} \right] and B=\left[ \begin{array}{r} 5 \\ 6 \\ 4 \end{array} \right], compute A2, AB, BA, and B2 when they are defined.
Here, A is a 1×3 matrix and B is a 3×1 matrix, so A2 and B2 are not defined. However, the compatibility rule reads
AB : (1×3)(3×1) and BA : (3×1)(1×3)
so both AB and BA can be formed and these are 1×1 and 3×3 matrices, respectively.
AB=\left[ \begin{array}{rrr} 1 & 3 & 2 \end{array} \right]\left[ \begin{array}{r} 5 \\ 6 \\ 4 \end{array} \right]=\left[ 1\cdot 5+3\cdot 6+2\cdot 4 \right]=\left[ 31 \right]
BA=\left[ \begin{array}{r} 5 \\ 6 \\ 4 \end{array} \right]\left[ \begin{array}{rrr} 1 & 3 & 2 \end{array} \right]=\left[ \begin{array}{rrr} 5\cdot 1 & 5\cdot 3 & 5\cdot 2 \\ 6\cdot 1 & 6\cdot 3 & 6\cdot 2 \\ 4\cdot 1 & 4\cdot 3 & 4\cdot 2 \end{array} \right]=\left[ \begin{array}{rrr} 5 & 15 & 10 \\ 6 & 18 & 12 \\ 4 & 12 & 8 \end{array} \right]
Unlike numerical multiplication, matrix products AB and BA need not be equal. In fact they need not even be the same size, as Example [exa:003540] shows. It turns out to be rare that AB=BA (although it is by no means impossible), and A and B are said to commute when this happens.
003554 Let A=\left[ \begin{array}{rr} 6 & 9 \\ -4 & -6 \end{array} \right] and B=\left[ \begin{array}{rr} 1 & 2 \\ -1 & 0 \end{array} \right]. Compute A2, AB, BA.
A2=\left[ \begin{array}{rr} 6 & 9 \\ -4 & -6 \end{array} \right]\left[ \begin{array}{rr} 6 & 9 \\ -4 & -6 \end{array} \right]=\left[ \begin{array}{rr} 0 & 0 \\ 0 & 0 \end{array} \right], so A2=0 can occur even if A≠0. Next,
AB=\left[ \begin{array}{rr} 6 & 9 \\ -4 & -6 \end{array} \right]\left[ \begin{array}{rr} 1 & 2 \\ -1 & 0 \end{array} \right]=\left[ \begin{array}{rr} -3 & 12 \\ 2 & -8 \end{array} \right] and BA=\left[ \begin{array}{rr} 1 & 2 \\ -1 & 0 \end{array} \right]\left[ \begin{array}{rr} 6 & 9 \\ -4 & -6 \end{array} \right]=\left[ \begin{array}{rr} -2 & -3 \\ -6 & -9 \end{array} \right]
Hence AB≠BA, even though AB and BA are the same size.
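Both phenomena, A2=0 with A≠0 and AB≠BA, are easy to confirm numerically (numpy assumed):

```python
import numpy as np

A = np.array([[6, 9], [-4, -6]])
B = np.array([[1, 2], [-1, 0]])

print(A @ A)                          # [[0 0] [0 0]]: A^2 = 0 although A != 0
print(np.array_equal(A @ B, B @ A))   # False: A and B do not commute
```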
003567 If A is any matrix, then IA=A and AI=A, where I denotes an identity matrix of a size such that the multiplications are defined.
These both follow from the dot product rule as the reader should verify. For a more formal proof, write A=[a1a2⋯an] where aj is column j of A. Then Definition [def:003447] and Example [exa:002949] give
IA=[Ia1Ia2⋯Ian]=[a1a2⋯an]=A
If ej denotes column j of I, then Aej=aj for each j by Example [exa:002964]. Hence Definition [def:003447] gives:
AI=A[e1e2⋯en]=[Ae1Ae2⋯Aen]=[a1a2⋯an]=A
The following theorem collects several results about matrix multiplication that are used everywhere in linear algebra.
003584 Assume that a is any scalar, and that A, B, and C are matrices of sizes such that the indicated matrix products are defined. Then:
- IA=A and AI=A where I denotes an identity matrix.
- A(BC)=(AB)C.
- A(B+C)=AB+AC.
- (B+C)A=BA+CA.
- a(AB)=(aA)B=A(aB).
- (AB)T=BTAT.
Condition (1) is Example [exa:003567]; we prove (2), (4), and (6) and leave (3) and (5) as exercises.
- If C=[c1c2⋯ck] in terms of its columns, then BC=[Bc1Bc2⋯Bck] by Definition [def:003447], so
A(BC) = [A(Bc1) A(Bc2) ⋯ A(Bck)]    (Definition [def:003447])
 = [(AB)c1 (AB)c2 ⋯ (AB)ck]    (Theorem [thm:003469])
 = (AB)C    (Definition [def:003447])
- We know (Theorem [thm:002811]) that (B+C)x=Bx+Cx holds for every column x. If we write A=[a1a2⋯an] in terms of its columns, we get
(B+C)A = [(B+C)a1 (B+C)a2 ⋯ (B+C)an]    (Definition [def:003447])
 = [Ba1+Ca1 Ba2+Ca2 ⋯ Ban+Can]    (Theorem [thm:002811])
 = [Ba1 Ba2 ⋯ Ban] + [Ca1 Ca2 ⋯ Can]    (adding columns)
 = BA + CA    (Definition [def:003447])
- As in Section [sec:2_1], write A=[aij] and B=[bij], so that AT=[a′ij] and BT=[b′ij] where a′ij=aji and b′ij=bji for all i and j. If cij denotes the (i,j)-entry of BTAT, then cij is the dot product of row i of BT with column j of AT. Hence
cij = b′i1a′1j + b′i2a′2j + ⋯ + b′ina′nj = b1iaj1 + b2iaj2 + ⋯ + bniajn = aj1b1i + aj2b2i + ⋯ + ajnbni
But this is the dot product of row j of A with column i of B; that is, the (j,i)-entry of AB; that is, the (i,j)-entry of (AB)T. Hence BTAT=(AB)T, proving (6).
Property 2 in Theorem [thm:003584] is called the associative law of matrix multiplication. It asserts that the equation A(BC)=(AB)C holds for all matrices (if the products are defined). Hence this product is the same no matter how it is formed, and so is written simply as ABC. This extends: The product ABCD of four matrices can be formed several ways—for example, (AB)(CD), [A(BC)]D, and A[B(CD)]—but the associative law implies that they are all equal and so are written as ABCD. A similar remark applies in general: Matrix products can be written unambiguously with no parentheses.
However, a note of caution is in order: the fact that AB and BA need not be equal means that the order of the factors is important in a product of matrices. For example, ABCD and ADCB may not be equal.
If the order of the factors in a product of matrices is changed, the product matrix may change (or may not be defined). Ignoring this warning is a source of many errors by students of linear algebra!
Properties 3 and 4 in Theorem [thm:003584] are called distributive laws. They assert that A(B+C)=AB+AC and (B+C)A=BA+CA hold whenever the sums and products are defined. These rules extend to more than two terms and, together with Property 5, ensure that many manipulations familiar from ordinary algebra extend to matrices. For example
A(2B−3C+D−5E) = 2AB−3AC+AD−5AE
(A+3C−2D)B = AB+3CB−2DB
Note again that the warning is in effect: For example A(B−C) need not equal AB−CA. These rules make possible a lot of simplification of matrix expressions.
003663 Simplify the expression A(BC−CD)+A(C−B)D−AB(C−D).
A(BC−CD)+A(C−B)D−AB(C−D) = A(BC)−A(CD)+(AC−AB)D−(AB)C+(AB)D
 = ABC−ACD+ACD−ABD−ABC+ABD
 = 0
Example [exa:003671] and Example [exa:003678] below show how we can use the properties in Theorem [thm:003584] to deduce other facts about matrix multiplication. Matrices A and B are said to commute if AB=BA.
003671 Suppose that A, B, and C are n×n matrices and that both A and B commute with C; that is, AC=CA and BC=CB. Show that AB commutes with C.
Showing that AB commutes with C means verifying that (AB)C=C(AB). The computation uses the associative law several times, as well as the given facts that AC=CA and BC=CB.
(AB)C=A(BC)=A(CB)=(AC)B=(CA)B=C(AB)
003678 Show that AB=BA if and only if (A−B)(A+B)=A2−B2.
The following always holds:
(A−B)(A+B)=A(A+B)−B(A+B)=A2+AB−BA−B2
Hence if AB=BA, then (A−B)(A+B)=A2−B2 follows. Conversely, if this last equation holds, then the displayed equation above becomes
A2−B2=A2+AB−BA−B2
This gives 0=AB−BA, and AB=BA follows.
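A quick numerical spot check: any polynomial in A commutes with A, so building B that way should make both sides agree. A sketch with our own choice of A (numpy assumed):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = 2 * (A @ A) + 3 * np.eye(2, dtype=int)   # a polynomial in A, so AB = BA

lhs = (A - B) @ (A + B)
rhs = A @ A - B @ B
print(np.array_equal(lhs, rhs))   # True, because A and B commute
```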
In Section [sec:2_2] we saw (in Theorem [thm:002684]) that every system of linear equations has the form
Ax=b
where A is the coefficient matrix, x is the column of variables, and b is the constant matrix. Thus the system of linear equations becomes a single matrix equation. Matrix multiplication can yield information about such a system.
003694 Consider a system Ax=b of linear equations where A is an m×n matrix. Assume that a matrix C exists such that CA=In. If the system Ax=b has a solution, show that this solution must be Cb. Give a condition guaranteeing that Cb is in fact a solution.
Suppose that x is any solution to the system, so that Ax=b. Multiply both sides of this matrix equation by C to obtain, successively,
C(Ax)=Cb,(CA)x=Cb,Inx=Cb,x=Cb
This shows that if the system has a solution x, then that solution must be x=Cb, as required. But it does not guarantee that the system has a solution. However, if we write x1=Cb, then
Ax1=A(Cb)=(AC)b
Thus x1=Cb will be a solution if the condition AC=Im is satisfied.
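Example [exa:003694] can be explored numerically. If A has independent columns, numpy's pseudoinverse np.linalg.pinv(A) supplies a matrix C with CA=In, so Cb is the only candidate solution; whether it actually solves the system must still be checked, exactly as in the example. A sketch (the particular A and b below are our own choices):

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3x2, independent columns
C = np.linalg.pinv(A)                                # then CA = I2 (to round-off)

b_good = np.array([1.0, 2.0, 3.0])   # consistent: b = A @ [1, 2]
b_bad = np.array([1.0, 2.0, 0.0])    # inconsistent system

for b in (b_good, b_bad):
    x = C @ b                        # the only possible solution
    print(np.allclose(A @ x, b))     # True for b_good, False for b_bad
```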
The ideas in Example [exa:003694] lead to important information about matrices; this will be pursued in the next section.
Block Multiplication
Block Partition of a Matrix003711 It is often useful to consider matrices whose entries are themselves matrices (called blocks). A matrix viewed in this way is said to be partitioned into blocks.
For example, writing a matrix B in the form
B=[b1b2⋯bk] where the bj are the columns of B
is such a block partition of B. Here is another example.
Consider the matrices
A=\left[ \begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 2 & -1 & 4 & 2 & 1 \\ 3 & 1 & -1 & 7 & 5 \end{array} \right]=\left[ \begin{array}{cc} I_2 & 0_{23} \\ P & Q \end{array} \right] and B=\left[ \begin{array}{rr} 4 & -2 \\ 5 & 6 \\ 7 & 3 \\ -1 & 0 \\ 1 & 6 \end{array} \right]=\left[ \begin{array}{c} X \\ Y \end{array} \right]
where the blocks have been labelled as indicated. This is a natural way to partition A into blocks in view of the blocks I2 and 023 that occur. This notation is particularly useful when we are multiplying the matrices A and B because the product AB can be computed in block form as follows:
AB=\left[ \begin{array}{cc} I & 0 \\ P & Q \end{array} \right]\left[ \begin{array}{c} X \\ Y \end{array} \right]=\left[ \begin{array}{c} IX+0Y \\ PX+QY \end{array} \right]=\left[ \begin{array}{c} X \\ PX+QY \end{array} \right]=\left[ \begin{array}{rr} 4 & -2 \\ 5 & 6 \\ 30 & 8 \\ 8 & 27 \end{array} \right]
This is easily checked to be the product AB, computed in the conventional manner.
In other words, we can compute the product AB by ordinary matrix multiplication, using blocks as entries. The only requirement is that the blocks be compatible. That is, the sizes of the blocks must be such that all (matrix) products of blocks that occur make sense. This means that the number of columns in each block of A must equal the number of rows in the corresponding block of B.
Block Multiplication003723 If matrices A and B are partitioned compatibly into blocks, the product AB can be computed by matrix multiplication using blocks as entries.
We omit the proof.
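Even without the proof, Theorem [thm:003723] is easy to test numerically on the example above; np.block assembles a matrix from its blocks:

```python
import numpy as np

I2, Z = np.eye(2, dtype=int), np.zeros((2, 3), dtype=int)
P = np.array([[2, -1], [3, 1]])
Q = np.array([[4, 2, 1], [-1, 7, 5]])
X = np.array([[4, -2], [5, 6]])
Y = np.array([[7, 3], [-1, 0], [1, 6]])

A = np.block([[I2, Z], [P, Q]])              # the matrix A of the example
B = np.block([[X], [Y]])                     # B, stacked from X and Y

blockwise = np.block([[X], [P @ X + Q @ Y]]) # [X; PX + QY]
print(np.array_equal(A @ B, blockwise))      # True: blocks multiply like entries
```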
We have been using two cases of block multiplication. If B=[b1b2⋯bk] is a matrix where the bj are the columns of B, and if the matrix product AB is defined, then we have
AB=A[b1b2⋯bk]=[Ab1Ab2⋯Abk]
This is Definition [def:003447] and is a block multiplication where A=[A] has only one block. As another illustration,
Bx=[b1b2⋯bk][x1x2⋮xk]=x1b1+x2b2+⋯+xkbk
where x is any k×1 column matrix (this is Definition [def:002668]).
It is not our intention to pursue block multiplication in detail here. However, we give one more example because it will be used below.
003738 Suppose matrices A=\left[ \begin{array}{cc} B & X \\ 0 & C \end{array} \right] and A_1=\left[ \begin{array}{cc} B_1 & X_1 \\ 0 & C_1 \end{array} \right] are partitioned as shown where B and B1 are square matrices of the same size, and C and C1 are also square of the same size. These are compatible partitionings and block multiplication gives
AA_1=\left[ \begin{array}{cc} B & X \\ 0 & C \end{array} \right]\left[ \begin{array}{cc} B_1 & X_1 \\ 0 & C_1 \end{array} \right]=\left[ \begin{array}{cc} BB_1 & BX_1+XC_1 \\ 0 & CC_1 \end{array} \right]
003746 Obtain a formula for Ak where A=\left[ \begin{array}{cc} I & X \\ 0 & 0 \end{array} \right] is square and I is an identity matrix.
We have A2=\left[ \begin{array}{cc} I & X \\ 0 & 0 \end{array} \right]\left[ \begin{array}{cc} I & X \\ 0 & 0 \end{array} \right]=\left[ \begin{array}{cc} I^2 & IX+X0 \\ 0 & 0^2 \end{array} \right]=\left[ \begin{array}{cc} I & X \\ 0 & 0 \end{array} \right]=A. Hence A3=AA2=AA=A2=A. Continuing in this way, we see that Ak=A for every k≥1.
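The conclusion Ak=A can be spot-checked for a concrete I and X (the sizes below are our own choice; numpy assumed):

```python
import numpy as np

I = np.eye(2, dtype=int)
X = np.array([[1, 2, 3], [4, 5, 6]])          # any 2x3 block works
A = np.block([[I, X],
              [np.zeros((3, 5), dtype=int)]]) # [I X; 0 0], a 5x5 matrix

print(np.array_equal(np.linalg.matrix_power(A, 4), A))  # True: A^4 = A
```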
Block multiplication has theoretical uses as we shall see. However, it is also useful in computing products of matrices in a computer with limited memory capacity. The matrices are partitioned into blocks in such a way that each product of blocks can be handled. Then the blocks are stored in auxiliary memory and their products are computed one by one.
Directed Graphs
The study of directed graphs illustrates how matrix multiplication arises in ways other than the study of linear equations or matrix transformations.
A directed graph consists of a set of points (called vertices) connected by arrows (called edges). For example, the vertices could represent cities and the edges available flights. If the graph has n vertices v1,v2,…,vn, the adjacency matrix A=[aij] is the n×n matrix whose (i,j)-entry aij is 1 if there is an edge from vj to vi (note the order), and zero otherwise. For example, the adjacency matrix of the directed graph shown is A=\left[ \begin{array}{rrr} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right].
A path of length r (or an r-path) from vertex j to vertex i is a sequence of r edges leading from vj to vi. Thus v1→v2→v1→v1→v3 is a 4-path from v1 to v3 in the given graph. The edges are just the paths of length 1, so the (i,j)-entry aij of the adjacency matrix A is the number of 1-paths from vj to vi. This observation has an important extension:
003785 If A is the adjacency matrix of a directed graph with n vertices, then the (i,j)-entry of Ar is the number of r-paths vj→vi.
As an illustration, consider the adjacency matrix A in the graph shown. Then
A=\left[ \begin{array}{rrr} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right],\quad A^2=\left[ \begin{array}{rrr} 2 & 1 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 0 \end{array} \right],\quad A^3=\left[ \begin{array}{rrr} 4 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & 1 & 1 \end{array} \right]
Hence, since the (2,1)-entry of A2 is 2, there are two 2-paths v1→v2 (in fact they are v1→v1→v2 and v1→v3→v2). Similarly, the (2,3)-entry of A2 is zero, so there are no 2-paths v3→v2, as the reader can verify. The fact that no entry of A3 is zero shows that it is possible to go from any vertex to any other vertex in exactly three steps.
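Theorem [thm:003785] invites computation: powers of the adjacency matrix count paths. A sketch for the graph above (numpy assumed; note that Python indices are 0-based, so the (2,1)-entry of A2 is A2[1, 0]):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 0, 0]])   # a_ij = 1 when there is an edge v_j -> v_i

A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)

print(A2[1, 0])         # 2: two 2-paths v1 -> v2
print(A2[1, 2])         # 0: no 2-paths v3 -> v2
print((A3 > 0).all())   # True: any vertex reaches any other in exactly 3 steps
```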
To see why Theorem [thm:003785] is true, observe that it asserts that
the (i,j)-entry of Ar equals the number of r-paths vj→vi    (∗)
holds for each r≥1. We proceed by induction on r (see Appendix [chap:appcinduction]). The case r=1 is the definition of the adjacency matrix. So assume inductively that (∗) is true for some r≥1; we must prove that (∗) also holds for r+1. But every (r+1)-path vj→vi is the result of an r-path vj→vk for some k, followed by a 1-path vk→vi. Writing A=[aij] and Ar=[bij], there are bkj paths of the former type (by induction) and aik of the latter type, and so there are aikbkj such paths in all. Summing over k, this shows that there are
ai1b1j + ai2b2j + ⋯ + ainbnj paths vj→vi of length r+1
But this sum is the dot product of the ith row [ai1ai2⋯ain] of A with the jth column [b1jb2j⋯bnj]T of Ar. As such, it is the (i,j)-entry of the matrix product ArA=Ar+1. This shows that (∗) holds for r+1, as required.
- When reading the notation S∘T, we read S first and then T, even though the action is “first T, then S”. This annoying state of affairs results because we write T(x) for the effect of the transformation T on x, with T on the left. If we wrote it instead as (x)T, the confusion would not occur. However, the notation T(x) is well established.