$$\newcommand{\id}{\mathrm{id}}$$ $$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\kernel}{\mathrm{null}\,}$$ $$\newcommand{\range}{\mathrm{range}\,}$$ $$\newcommand{\RealPart}{\mathrm{Re}}$$ $$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$ $$\newcommand{\Argument}{\mathrm{Arg}}$$ $$\newcommand{\norm}{\| #1 \|}$$ $$\newcommand{\inner}{\langle #1, #2 \rangle}$$ $$\newcommand{\Span}{\mathrm{span}}$$

# 7.3: Properties of Matrices


The objects of study in linear algebra are linear operators. We have seen that linear operators can be represented as matrices through choices of ordered bases, and that matrices provide a means of efficient computation. We now begin an in-depth study of matrices.

Definition: Matrix, Column and Row Vectors

An $$r \times k$$ matrix $$M=(m^{i}_{j})$$ for $$i=1, \ldots, r; j=1, \ldots, k$$ is a rectangular array of real (or complex) numbers:

$M = \begin{pmatrix} m_{1}^{1} & m_{2}^{1} & \cdots & m_{k}^{1} \\ m_{1}^{2} & m_{2}^{2} & \cdots & m_{k}^{2} \\ \vdots & \vdots & & \vdots \\ m_{1}^{r} & m_{2}^{r} & \cdots & m_{k}^{r} \\ \end{pmatrix}\, .$

The numbers $$m^{i}_{j}$$ are called entries. The superscript indexes the row of the matrix and the subscript indexes the column of the matrix in which $$m_{j}^{i}$$ appears.

An $$r\times 1$$ matrix $$v = (v^{r}_{1}) = (v^{r})$$ is called a column vector, written

$$v = \begin{pmatrix}v^{1}\\v^{2}\\ \vdots \\ v^{r} \end{pmatrix}\, .$$

A $$1\times k$$ matrix $$v = (v^{1}_{k}) = (v_{k})$$ is called a row vector, written

$$v = \begin{pmatrix}v_{1} & v_{2} & \cdots & v_{k} \end{pmatrix}\, .$$

The transpose of a column vector is the corresponding row vector and vice versa:

Example 73

Let

$$v=\begin{pmatrix}1\\2\\3\end{pmatrix}\, .$$

Then

$$v^{T}=\begin{pmatrix}1 &2 &3\end{pmatrix}\, ,$$

and $$(v^{T})^{T}=v$$.

A matrix is an efficient way to store information:

Example 74: Gif images

In computer graphics, you may have encountered image files with a .gif extension. These files are actually just matrices: at the start of the file the size of the matrix is given, after which each number is a matrix entry indicating the color of a particular pixel in the image.

This matrix then has its rows shuffled a bit: by listing, say, every eighth row, a web browser downloading the file can start displaying an incomplete version of the picture before the download is complete.

Finally, a compression algorithm is applied to the matrix to reduce the file size.

Example 75

Graphs occur in many applications, ranging from telephone networks to airline routes. In the subject of graph theory, a graph is just a collection of vertices and some edges connecting vertices. A matrix can be used to indicate how many edges attach one vertex to another. For example, the graph pictured above would have the following matrix, where $$m^{i}_{j}$$ indicates the number of edges between the vertices labeled $$i$$ and $$j$$:

$M = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 3 \\ \end{pmatrix}$

This is an example of a $$\textit{symmetric matrix}$$, since $$m_{j}^{i} = m_{i}^{j}$$.

The set of all $$r\times k$$ matrices

$$\mathbb{M}_{k}^{r}:=\{(m^{i}_{j})|m^{i}_{j}\in \mathbb{R};\, i=1,\ldots,r;\, j=1,\ldots,k\}\, ,$$

is itself a vector space with addition and scalar multiplication defined as follows:

$$M+N = (m_{j}^{i}) + (n_{j}^{i}) = ( m_{j}^{i} + n_{j}^{i} )$$

$$rM = r(m_{j}^{i}) = (rm_{j}^{i})$$

In other words, addition just adds corresponding entries in two matrices, and scalar multiplication multiplies every entry.
Notice that $$\mathbb{M}_{1}^{n} = \Re^{n}$$ is just the vector space of column vectors.
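
As a minimal sketch of these two operations, assume a matrix is stored as a Python list of rows; the function names `mat_add` and `scalar_mul` are chosen here for illustration only.

```python
def mat_add(M, N):
    """Entrywise sum of two matrices of the same size (lists of rows)."""
    if len(M) != len(N) or any(len(a) != len(b) for a, b in zip(M, N)):
        raise ValueError("matrices must have the same size")
    return [[m + n for m, n in zip(row_m, row_n)] for row_m, row_n in zip(M, N)]


def scalar_mul(r, M):
    """Multiply every entry of M by the scalar r."""
    return [[r * m for m in row] for row in M]


M = [[1, 2], [3, 4]]
N = [[0, 1], [1, 0]]
print(mat_add(M, N))     # [[1, 3], [4, 4]]
print(scalar_mul(2, M))  # [[2, 4], [6, 8]]
```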

Recall that we can multiply an $$r \times k$$ matrix by a $$k \times 1$$ column vector to produce a $$r \times 1$$ column vector using the rule

$MV = \left(\sum_{j=1}^{k} m_{j}^{i} v^{j}\right)\, .$

This suggests the rule for multiplying an $$r \times k$$ matrix $$M$$ by a $$k \times s$$ matrix $$N$$: our $$k \times s$$ matrix $$N$$ consists of $$s$$ column vectors side-by-side, each of dimension $$k \times 1.$$ We can multiply our $$r \times k$$ matrix $$M$$ by each of these $$s$$ column vectors using the rule we already know, obtaining $$s$$ column vectors each of dimension $$r \times 1.$$ If we place these $$s$$ column vectors side-by-side, we obtain an $$r \times s$$ matrix $$MN.$$

That is, let

$N = \begin{pmatrix} n_{1}^{1} & n_{2}^{1} & \cdots & n_{s}^{1} \\ n_{1}^{2} & n_{2}^{2} & \cdots & n_{s}^{2} \\ \vdots & \vdots & & \vdots \\ n_{1}^{k} & n_{2}^{k} & \cdots & n_{s}^{k} \\ \end{pmatrix}$

and call the columns $$N_{1}$$ through $$N_{s}$$:

$N_{1} = \begin{pmatrix}n_{1}^{1}\\n_{1}^{2}\\\vdots\\n_{1}^{k}\end{pmatrix}\, ,\: N_{2} = \begin{pmatrix}n_{2}^{1}\\n_{2}^{2}\\\vdots\\n_{2}^{k}\end{pmatrix}\, ,\: \ldots,\: N_{s} = \begin{pmatrix}n_{s}^{1}\\n_{s}^{2}\\\vdots\\n_{s}^{k}\end{pmatrix}.$

Then

$MN=M \begin{pmatrix} | & | & & | \\ N_{1} & N_{2} & \cdots & N_{s} \\ | & | & & | \\ \end{pmatrix} = \begin{pmatrix} | & | & & | \\ MN_{1} & MN_{2} & \cdots & MN_{s} \\ | & | & & | \\ \end{pmatrix}$

Concisely: If $$M=(m^{i}_{j})$$ for $$i=1, \ldots, r; j=1, \ldots, k$$ and $$N=(n^{i}_{j})$$ for $$i=1, \ldots, k; j=1, \ldots, s,$$ then $$MN=L$$ where $$L=(\ell^{i}_{j})$$ for $$i=1, \ldots, r; j=1, \ldots, s$$ is given by

$\ell^{i}_{j} = \sum_{p=1}^{k} m^{i}_{p} n^{p}_{j}.$

This rule obeys linearity.

Notice that in order for the multiplication to make sense, the columns and rows must match. For an $$r\times k$$ matrix $$M$$ and an $$s\times m$$ matrix $$N$$, the product $$MN$$ requires $$k=s$$. Likewise, the product $$NM$$ requires $$m=r$$. A common shorthand for keeping track of the sizes of the matrices involved in a given product is:

$$\left(r \times k\right)\times \left(k\times m\right) = \left(r\times m\right)$$

Example 76

Multiplying a $$(3\times 1)$$ matrix and a $$(1\times 2)$$ matrix yields a $$(3\times 2)$$ matrix.

$\begin{pmatrix}1\\3\\2\end{pmatrix} \begin{pmatrix}2 & 3\end{pmatrix} = \begin{pmatrix} 1\cdot 2 & 1\cdot 3 \\ 3\cdot 2 & 3\cdot 3 \\ 2\cdot 2 & 2\cdot 3 \\ \end{pmatrix} = \begin{pmatrix} 2 & 3 \\ 6 & 9 \\ 4 & 6 \\ \end{pmatrix}$
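
The multiplication rule and the size check can be sketched in a few lines of Python. This is only an illustration: matrices are assumed to be lists of rows, and `mat_mul` is a name introduced here.

```python
def mat_mul(M, N):
    """Product of an r x k matrix M and a k x s matrix N (lists of rows)."""
    r, k, s = len(M), len(N), len(N[0])
    if any(len(row) != k for row in M):
        raise ValueError("the columns of M must match the rows of N")
    # Entry (i, j) of the product is the sum over p of M[i][p] * N[p][j].
    return [[sum(M[i][p] * N[p][j] for p in range(k)) for j in range(s)]
            for i in range(r)]


col = [[1], [3], [2]]  # a 3 x 1 matrix
row = [[2, 3]]         # a 1 x 2 matrix
print(mat_mul(col, row))  # [[2, 3], [6, 9], [4, 6]]
```

Calling `mat_mul(row, col)` raises `ValueError`, since a $$1\times 2$$ matrix cannot be multiplied on the right by a $$3\times 1$$ matrix.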

Another way to view matrix multiplication is in terms of dot products:

$$\textit{The entries of } MN \textit{ are made from the dot products of the rows of } M \textit{ with the columns of } N.$$

Example 77

Let

$$M=\begin{pmatrix}1&3\\3&5\\2&6\end{pmatrix}=:\begin{pmatrix}u^{T}\\v^{T}\\w^{T}\end{pmatrix} \mbox{ and } N=\begin{pmatrix}2&3&1\\0&1&0\end{pmatrix}=:\begin{pmatrix}a & b & c\end{pmatrix}$$

where

$$u=\begin{pmatrix}1\\3\end{pmatrix}\, ,\quad v=\begin{pmatrix}3\\5\end{pmatrix}\, ,\quad w=\begin{pmatrix}2\\6\end{pmatrix}\, ,\quad a=\begin{pmatrix}2\\0\end{pmatrix}\, ,\quad b=\begin{pmatrix}3\\1\end{pmatrix}\, ,\quad c=\begin{pmatrix}1\\0\end{pmatrix}\, .$$

Then
$$MN=\left(\!\begin{array}{ccc} u\cdot a & u\cdot b & u\cdot c\\ v\cdot a & v\cdot b & v\cdot c\\ w\cdot a & w\cdot b & w\cdot c\\ \end{array}\!\right) = \begin{pmatrix} 2&6&1\\ 6&14&3\\ 4&12&2 \end{pmatrix}\, .$$
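
The dot product view translates directly into code. The sketch below (with hypothetical helper names `dot` and `mat_mul_by_dots`, and matrices stored as lists of rows) recomputes the product of this example.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def mat_mul_by_dots(M, N):
    """Each entry of MN is a row of M dotted with a column of N."""
    columns = list(zip(*N))  # the columns of N, as tuples
    return [[dot(row, col) for col in columns] for row in M]


M = [[1, 3], [3, 5], [2, 6]]
N = [[2, 3, 1], [0, 1, 0]]
print(mat_mul_by_dots(M, N))  # [[2, 6, 1], [6, 14, 3], [4, 12, 2]]
```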

This fact has an obvious yet important consequence:

Theorem: orthogonal

Let $$M$$ be a matrix and $$x$$ a column vector. If

$$Mx=0$$

then the vector $$x$$ is orthogonal to the rows of $$M$$.

Remark

Remember that the set of all vectors that can be obtained by adding up scalar multiples of the columns of a matrix is called its $$\textit{column space}$$. Similarly the $$\textit{row space}$$ is the set of all row vectors obtained by adding up multiples of the rows of a matrix. The above theorem says that if $$Mx=0$$, then the vector $$x$$ is orthogonal to every vector in the row space of $$M$$.
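
A small numerical check makes the remark concrete. The matrix and vector below are chosen here purely for illustration; they do not come from the text.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


M = [[1, 2, 3], [4, 5, 6]]
x = [1, -2, 1]

# Mx is the list of dot products of the rows of M with x.
Mx = [dot(row, x) for row in M]
print(Mx)  # [0, 0]

# Any vector in the row space, e.g. 2*(row 1) - 3*(row 2), is also orthogonal to x.
combo = [2 * a - 3 * b for a, b in zip(M[0], M[1])]
print(combo, dot(combo, x))  # [-10, -11, -12] 0
```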

We know that $$r\times k$$ matrices can be used to represent linear transformations $$\Re^{k} \rightarrow \Re^{r}$$ via $$MV = \sum_{j=1}^{k} m_{j}^{i}v^{j} ,$$ which is the same rule used when we multiply an $$r\times k$$ matrix by a $$k\times 1$$ vector to produce an $$r\times1$$ vector.

Likewise, we can use a matrix $$N=(n^{i}_{j})$$ to define a linear transformation of a vector space of matrices. For example
$L \colon \mathbb{M}^{s}_{k} \stackrel{N}{\longrightarrow} \mathbb{M}^{r}_{k}\, ,$
$L(M)=(l^{i}_{k}) \mbox{ where } l^{i}_{k}= \sum_{j=1}^{s} n_{j}^{i}m^{j}_{k}.$
This is the same as the rule we use to multiply matrices. In other words, $$L(M)=NM$$ is a linear transformation.

### Matrix Terminology

Let $$M=(m^{i}_{j})$$ be a matrix. The entries $$m_{i}^{i}$$ are called $$\textit{diagonal}$$, and the set $$\{m_{1}^{1}, m_{2}^{2}, \ldots \}$$ is called the $$\textit{diagonal of the matrix}$$.

Any $$r\times r$$ matrix is called a $$\textit{square matrix}$$. A square matrix that is zero for all non-diagonal entries is called a diagonal matrix. An example of a square diagonal matrix is
$$\begin{pmatrix} 2 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 0\\ \end{pmatrix}\, .$$

The $$r\times r$$ diagonal matrix with all diagonal entries equal to $$1$$ is called the $$\textit{identity matrix}$$, $$I_{r}$$, or just $$I$$. An identity matrix looks like

$I= \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$

The identity matrix is special because $$I_{r}M=MI_{k}=M$$ for all $$M$$ of size $$r\times k$$.

Definition

The $$\textit{transpose}$$ of an $$r\times k$$ matrix $$M = (m_{j}^{i})$$ is the $$k\times r$$ matrix with entries
$M^{T} = (\hat{m}_{j}^{i})$
with $$\hat{m}_{j}^{i} = m_{i}^{j}$$.

A matrix $$M$$ is $$\textit{symmetric}$$ if $$M=M^{T}$$.

Example 78

$$\begin{pmatrix} 2 & 5 & 6\\ 1 & 3 & 4\\ \end{pmatrix}^{T} = \begin{pmatrix} 2 & 1 \\ 5 & 3 \\ 6 & 4 \\ \end{pmatrix}\, ,$$
and
$$\begin{pmatrix} 2 & 5 & 6\\ 1 & 3 & 4\\ \end{pmatrix} \begin{pmatrix} 2 & 5 & 6\\ 1 & 3 & 4\\ \end{pmatrix}^{T} = \begin{pmatrix} 65&41\\41&26 \end{pmatrix}\, ,$$
is symmetric.
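
As a sketch, the transpose and the symmetry of $$MM^{T}$$ can be checked in Python, again assuming matrices are lists of rows; `transpose` and `mat_mul` are helper names introduced here.

```python
def transpose(M):
    """Swap rows and columns: entry (i, j) of the result is entry (j, i) of M."""
    return [list(col) for col in zip(*M)]


def mat_mul(M, N):
    """Matrix product: rows of M dotted with columns of N."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


A = [[2, 5, 6], [1, 3, 4]]
At = transpose(A)
print(At)  # [[2, 1], [5, 3], [6, 4]]

P = mat_mul(A, At)
print(P)                  # [[65, 41], [41, 26]]
print(P == transpose(P))  # True, so P is symmetric
```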

Observations

1. Only square matrices can be symmetric.
2. The transpose of a column vector is a row vector, and vice-versa.
3. Taking the transpose of a matrix twice does nothing, $$\textit{i.e.}$$, $$(M^{T})^{T}=M$$.

Theorem: Transpose and Multiplication

Let $$M, N$$ be matrices such that $$MN$$ makes sense. Then

$$(MN)^{T}= N^{T}M^{T}.$$

The proof of this theorem is left to Review Question 2.
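
A numerical check is not a proof, but it can help to see the theorem at work on one example. The sketch below reuses the matrices of Example 77; the helper names are assumptions of this illustration.

```python
def transpose(M):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def mat_mul(M, N):
    """Matrix product: rows of M dotted with columns of N."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


M = [[1, 3], [3, 5], [2, 6]]  # 3 x 2
N = [[2, 3, 1], [0, 1, 0]]    # 2 x 3

lhs = transpose(mat_mul(M, N))             # (MN)^T
rhs = mat_mul(transpose(N), transpose(M))  # N^T M^T
print(lhs)         # [[2, 6, 4], [6, 14, 12], [1, 3, 2]]
print(lhs == rhs)  # True
```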

## 7.3.1 Associativity and Non-Commutativity

Many properties of matrices follow from the same property for real numbers. Here is an example.

Example 79

$$\textit{Associativity of matrix multiplication.}$$ We know for real numbers $$x$$, $$y$$ and $$z$$ that
$$x(yz)=(xy)z\, ,$$
$$\textit{i.e.}$$, the order of bracketing does not matter. The same property holds for matrix multiplication. Let us show why.
Suppose $$M=\left( m^{i}_{j} \right)$$, $$N=\left( n^{j}_{k} \right)$$ and $$R=\left( r^{k}_{l} \right)$$ are, respectively, $$m\times n$$, $$n\times r$$ and $$r\times t$$ matrices. Then from the rule for matrix multiplication we have
$$MN=\left(\sum_{j=1}^{n} m^{i}_{j} n^{j}_{k}\right)\mbox{ and } NR=\left(\sum_{k=1}^{r} n^{j}_{k} r^{k}_{l}\right)\, .$$
So first we compute
$$(MN)R=\left(\sum_{k=1}^{r} \Big[\sum_{j=1}^{n} m^{i}_{j} n^{j}_{k}\Big] r^{k}_{l} \right) = \left(\sum_{k=1}^{r} \sum_{j=1}^{n} \Big[ m^{i}_{j} n^{j}_{k}\Big] r^{k}_{l} \right) =\left(\sum_{k=1}^{r} \sum_{j=1}^{n} m^{i}_{j} n^{j}_{k} r^{k}_{l} \right)\, .$$
In the first step we just wrote out the definition for matrix multiplication, in the second step we moved the summation symbol outside the bracket (this is just the distributive property $$x(y+z)=xy+xz$$ for numbers) and in the last step we used the associativity property for real numbers to remove the square brackets. Exactly the same reasoning shows that
$$M(NR)=\left(\sum_{j=1}^{n} m^{i}_{j}\Big[\sum_{k=1}^{r} n^{j}_{k} r^{k}_{l}\Big]\right) = \left(\sum_{k=1}^{r} \sum_{j=1}^{n} m^{i}_{j} \Big[n^{j}_{k}r^{k}_{l} \Big] \right) =\left(\sum_{k=1}^{r} \sum_{j=1}^{n} m^{i}_{j} n^{j}_{k} r^{k}_{l} \right)\, .$$
This is the same as above so we are done. $$\textit{As a fun remark, note that Einstein would simply have written}$$
$$(MN)R=(m^{i}_{j} n^{j}_{k}) r^{k}_{l}= m^{i}_{j} n^{j}_{k} r^{k}_{l} = m^{i}_{j} (n^{j}_{k} r^{k}_{l} ) = M(NR)$$.
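
To see associativity on concrete numbers, here is a short sketch with matrices of sizes $$2\times 3$$, $$3\times 2$$ and $$2\times 2$$. The entries are chosen here for illustration only.

```python
def mat_mul(M, N):
    """Matrix product: rows of M dotted with columns of N."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


M = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
N = [[1, 0], [2, 1], [0, 3]]  # 3 x 2
R = [[2, 1], [1, 1]]          # 2 x 2

left = mat_mul(mat_mul(M, N), R)   # (MN)R
right = mat_mul(M, mat_mul(N, R))  # M(NR)
print(left)           # [[21, 16], [51, 37]]
print(left == right)  # True
```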

Sometimes matrices do not share the properties of regular numbers. In particular, for $$\textit{generic}$$ $$n\times n$$ square matrices $$M$$ and $$N$$,

$$MN\neq NM\, .$$

Example 80 (Matrix multiplication does $$\textit{not}$$ commute.)

$\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ \end{pmatrix}$
On the other hand:
$\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ \end{pmatrix}\, .$

Since $$n\times n$$ matrices are linear transformations $$\Re^{n} \rightarrow \Re^{n}$$, we can see that the order of successive linear transformations matters. Here is an example of matrices acting on objects in three dimensions that also shows matrices not commuting.

Example 81

In Review Problem 3, you learned that the matrix
$$M=\begin{pmatrix}\cos\theta & \sin\theta \\ -\sin \theta & \cos\theta\end{pmatrix}\, ,$$
rotates vectors in the plane by an angle $$\theta$$. We can generalize this, using block matrices, to three dimensions. In fact the following matrices built from a $$2\times 2$$ rotation matrix, a $$1\times 1$$ identity matrix and zeroes everywhere else
$$M=\begin{pmatrix}\cos\theta & \sin\theta &0\\ -\sin \theta & \cos\theta&0\\0&0&1\end{pmatrix}\qquad\mbox{and}\qquad N=\begin{pmatrix}1&0&0\\0&\cos\theta & \sin\theta \\ 0&-\sin \theta & \cos\theta\end{pmatrix}\, ,$$
perform rotations by an angle $$\theta$$ in the $$xy$$ and $$yz$$ planes, respectively. Because they rotate single vectors, you can also use them to rotate objects built from a collection of vectors like pretty colored blocks! Here is a picture of $$M$$ and then $$N$$ acting on such a block, compared with the case of $$N$$ followed by $$M$$. The special case of $$\theta=90^{\circ}$$ is shown. Notice how the end products of $$MN$$ and $$NM$$ are different, so $$MN\neq NM$$ here.
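
For $$\theta=90^{\circ}$$ we have $$\cos\theta=0$$ and $$\sin\theta=1$$, so both matrices have integer entries and the two products can be compared exactly. The following is a minimal sketch, with `mat_mul` a helper name introduced here.

```python
def mat_mul(A, B):
    """Matrix product: rows of A dotted with columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# The rotation matrices M and N evaluated at theta = 90 degrees.
M = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]
N = [[1, 0, 0], [0, 0, 1], [0, -1, 0]]

MN = mat_mul(M, N)
NM = mat_mul(N, M)
print(MN)        # [[0, 0, 1], [-1, 0, 0], [0, -1, 0]]
print(NM)        # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(MN == NM)  # False
```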

## 7.3.2 Block Matrices

It is often convenient to partition a matrix $$M$$ into smaller matrices called $$\textit{blocks}$$, like so:

$M=\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 1 \\\hline 0 & 1 & 2 & 0 \\ \end{array}\right) = \left(\begin{array}{c|c} A & B \\ \hline C & D \\ \end{array}\right)$
Here $$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ \end{pmatrix}$$, $$B=\begin{pmatrix}1\\0\\1\end{pmatrix}$$, $$C=\begin{pmatrix}0 & 1 & 2\end{pmatrix}$$, $$D=(0)$$.

1. The blocks of a block matrix must fit together to form a rectangle. So $$\left(\begin{array}{c|c} B & A \\ \hline D & C \\ \end{array}\right)$$ makes sense, but
$$\left(\begin{array}{c|c} C & B \\ \hline D & A \\ \end{array}\right)$$ does not.

2. There are many ways to cut up an $$n\times n$$ matrix into blocks. Often context or the entries of the matrix will suggest a useful way to divide the matrix into blocks. For example, if there are large blocks of zeros in a matrix, or blocks that look like an identity matrix, it can be useful to partition the matrix accordingly.

3. Matrix operations on block matrices can be carried out by treating the blocks as matrix entries. In the example above,
\begin{eqnarray*}
M^{2} & = & \left(\begin{array}{c|c}
A & B \\
\hline
C & D \\
\end{array}\right)
\left(\begin{array}{c|c}
A & B \\
\hline
C & D \\
\end{array}\right) \\
& = & \left(\begin{array}{c|c}
A^{2}+BC & AB+BD \\
\hline
CA+DC & CB+D^{2} \\
\end{array}\right) \\
\end{eqnarray*}

Computing the individual blocks, we get:
\begin{eqnarray*}
A^{2}+BC &=& \begin{pmatrix}
30 & 37 & 44 \\
66 & 81 & 96 \\
102&127 &152 \\
\end{pmatrix} \\
AB+BD &=& \begin{pmatrix} 4 \\ 10 \\ 16 \end{pmatrix} \\
CA+DC &=& \begin{pmatrix} 18 & 21 & 24 \end{pmatrix} \\
CB+D^{2} &=& (2)
\end{eqnarray*}

Assembling these pieces into a block matrix gives:
$\left(\begin{array}{ccc|c} 30 & 37 & 44 & 4 \\ 66 & 81 & 96 & 10 \\ 102 & 127 & 152 & 16 \\ \hline 18 & 21 & 24 & 2 \\ \end{array}\right)$

This is exactly $$M^{2}$$.
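
The block computation can be compared against a direct computation of $$M^{2}$$. The sketch below assumes matrices are stored as lists of rows and uses helper names (`mat_mul`, `mat_add`) introduced for this illustration.

```python
def mat_mul(M, N):
    """Matrix product: rows of M dotted with columns of N."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


def mat_add(M, N):
    """Entrywise sum of two matrices of the same size."""
    return [[m + n for m, n in zip(r1, r2)] for r1, r2 in zip(M, N)]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1], [0], [1]]
C = [[0, 1, 2]]
D = [[0]]

# Multiply block by block, treating the blocks as matrix entries.
top_left = mat_add(mat_mul(A, A), mat_mul(B, C))
top_right = mat_add(mat_mul(A, B), mat_mul(B, D))
bottom_left = mat_add(mat_mul(C, A), mat_mul(D, C))
bottom_right = mat_add(mat_mul(C, B), mat_mul(D, D))

# Glue the blocks back together row by row.
assembled = ([l + r for l, r in zip(top_left, top_right)]
             + [l + r for l, r in zip(bottom_left, bottom_right)])

M = [[1, 2, 3, 1], [4, 5, 6, 0], [7, 8, 9, 1], [0, 1, 2, 0]]
print(assembled[3])               # [18, 21, 24, 2]
print(assembled == mat_mul(M, M)) # True
```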

## 7.3.3 The Algebra of Square Matrices

Not every pair of matrices can be multiplied. When multiplying two matrices, the number of columns in the left matrix must equal the number of rows in the right. For an $$r\times k$$ matrix $$M$$ and an $$s\times l$$ matrix $$N$$, the product $$MN$$ requires $$k=s$$.

This is not a problem for square matrices of the same size, though. Two $$n\times n$$ matrices can be multiplied in either order. For a single matrix $$M \in \mathbb{M}^{n}_{n}$$, we can form $$M^{2}=MM$$, $$M^{3}=MMM$$, and so on. It is useful to define $$M^{0}=I\, ,$$ the identity matrix, just like $$x^{0}=1$$ for numbers.

As a result, any polynomial can be evaluated on a matrix.

Example 82

Let $$f(x) = x - 2x^{2} + 3x^{3}$$
and $$M=\begin{pmatrix} 1 & t \\ 0 & 1 \\ \end{pmatrix}\, .$$ Then:
$M^{2} = \begin{pmatrix} 1 & 2t \\ 0 & 1 \\ \end{pmatrix}\, ,\:\: M^{3} = \begin{pmatrix} 1 & 3t \\ 0 & 1 \\ \end{pmatrix}\, ,\: \ldots$
Hence:
\begin{eqnarray*}
f(M) &=& \begin{pmatrix}
1 & t \\
0 & 1 \\
\end{pmatrix}
- 2 \begin{pmatrix}
1 & 2t \\
0 & 1 \\
\end{pmatrix}
+ 3 \begin{pmatrix}
1 & 3t \\
0 & 1 \\
\end{pmatrix} \\
&=& \begin{pmatrix}
2 & 6t \\
0 & 2 \\
\end{pmatrix}
\end{eqnarray*}
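
The same evaluation can be sketched in Python for a particular value of $$t$$. The choice $$t=5$$ and the helper names `mat_mul`, `mat_pow` and `f` are assumptions of this illustration.

```python
def mat_mul(M, N):
    """Matrix product: rows of M dotted with columns of N."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


def mat_pow(M, n):
    """n-th power of a square matrix, with M^0 defined as the identity."""
    size = len(M)
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result


def f(M):
    """Evaluate f(x) = x - 2x^2 + 3x^3 at a square matrix M."""
    P1, P2, P3 = mat_pow(M, 1), mat_pow(M, 2), mat_pow(M, 3)
    size = len(M)
    return [[P1[i][j] - 2 * P2[i][j] + 3 * P3[i][j] for j in range(size)]
            for i in range(size)]


t = 5
M = [[1, t], [0, 1]]
print(mat_pow(M, 3))  # [[1, 15], [0, 1]]
print(f(M))           # [[2, 30], [0, 2]]
```

With $$t=5$$ the result agrees with the general formula, whose upper right entry is $$6t=30$$.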

Suppose $$f(x)$$ is any function defined by a convergent Taylor Series:

$f(x) = f(0) + f'(0)x + \frac{1}{2!}f''(0)x^{2} + \cdots\, .$

Then we can define the matrix function by just plugging in $$M$$:

$f(M) = f(0)I + f'(0)M + \frac{1}{2!}f''(0)M^{2} + \cdots\, .$

There are additional techniques to determine the convergence of Taylor Series of matrices, based on the fact that the convergence problem is simple for diagonal matrices. It also turns out that the matrix exponential

$$\exp (M) = I + M + \frac{1}{2}M^{2} + \frac{1}{3!}M^{3} + \cdots\, ,$$

always converges.
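
One rough way to explore the matrix exponential numerically is to add up finitely many terms of the series. The sketch below does exactly that; the function name `mat_exp` and the cutoff of 30 terms are choices made here, and a truncated sum is only an approximation in general.

```python
import math


def mat_mul(M, N):
    """Matrix product: rows of M dotted with columns of N."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


def mat_exp(M, terms=30):
    """Partial sum I + M + M^2/2! + ... using the given number of terms."""
    n = len(M)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = identity  # holds M^p / p! for the current p
    for p in range(1, terms):
        term = [[x / p for x in row] for row in mat_mul(term, M)]
        result = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(result, term)]
    return result


# For this matrix M^2 = 0, so the series stops after two terms.
print(mat_exp([[0, 2], [0, 0]]))  # [[1.0, 2.0], [0.0, 1.0]]

# For a diagonal matrix the exponential acts on each diagonal entry.
E = mat_exp([[1, 0], [0, 2]])
print(math.isclose(E[0][0], math.e), math.isclose(E[1][1], math.e ** 2))  # True True
```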

## 7.3.4 Trace

A large matrix contains a great deal of information, some of which often reflects the fact that you have not set up your problem efficiently. For example, a clever choice of basis can often make the matrix of a linear transformation very simple. Therefore, finding ways to extract the essential information of a matrix is useful. Here we assume that the matrices are $$n\times n$$ with $$n < \infty$$; otherwise there are subtleties with convergence that we'd have to address.

Definition: Trace

The $$\textit{trace}$$ of a square matrix $$M=(m_{j}^{i})$$ is the sum of its diagonal entries:
$\textit{tr}M = \sum_{i=1}^{n}m_{i}^{i}\, .$

While matrix multiplication does not commute, the trace of a product of matrices does not depend on the order of multiplication:

\begin{eqnarray*}
\textit{tr}(MN) & = & \textit{tr}( \sum_{l} M_{l}^{i} N_{j}^{l} ) \\
& = & \sum_{i} \sum_{l} M_{l}^{i} N_{i}^{l} \\
& = & \sum_{l} \sum_{i} N_{i}^{l} M_{l}^{i} \\
& = & \textit{tr}( \sum_{i} N_{i}^{l} M_{l}^{i} ) \\
& = & \textit{tr}( NM ).
\end{eqnarray*}

Thus we have a Theorem:

Theorem

$$\textit{tr}(MN)=\textit{tr}(NM)$$ for any square matrices $$M$$ and $$N$$ of the same size.

Example 83

$M= \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ \end{pmatrix}, N= \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ \end{pmatrix}.$
so
$MN = \begin{pmatrix} 2 & 1 \\ 1 & 1 \\ \end{pmatrix} \neq NM = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ \end{pmatrix}.$
However, $$\textit{tr}(MN) = 2+1 = 3 = 1+2 = \textit{tr}(NM)$$.
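
The example can be reproduced with a short sketch in Python; `trace` and `mat_mul` are helper names introduced here, and matrices are assumed to be lists of rows.

```python
def mat_mul(M, N):
    """Matrix product: rows of M dotted with columns of N."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    if any(len(row) != len(M) for row in M):
        raise ValueError("the trace is only defined for square matrices")
    return sum(M[i][i] for i in range(len(M)))


M = [[1, 1], [0, 1]]
N = [[1, 0], [1, 1]]
MN = mat_mul(M, N)
NM = mat_mul(N, M)
print(MN == NM)              # False
print(trace(MN), trace(NM))  # 3 3
```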

Another useful property of the trace is that:

$\textit{tr}M = \textit{tr}M^{T}$

This is true because the trace only uses the diagonal entries, which are fixed by the transpose. For example:

$$\textit{tr}\begin{pmatrix} 1 & 1 \\ 2 & 3 \\ \end{pmatrix} = 4 = \textit{tr}\begin{pmatrix} 1 & 2 \\ 1 & 3 \\ \end{pmatrix} = \textit{tr}\begin{pmatrix} 1 & 1 \\ 2 & 3 \\ \end{pmatrix}^{T}\, .$$
Finally, trace is a linear transformation from matrices to the real numbers. This is easy to check.
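
For completeness, here is the check written out. The scalars $$a$$, $$b$$ and the $$n\times n$$ matrices $$M=(m^{i}_{j})$$, $$N=(n^{i}_{j})$$ are symbols introduced for this illustration:

$\textit{tr}(aM+bN) = \sum_{i=1}^{n} \left(a\, m_{i}^{i} + b\, n_{i}^{i}\right) = a\sum_{i=1}^{n} m_{i}^{i} + b\sum_{i=1}^{n} n_{i}^{i} = a\,\textit{tr}M + b\,\textit{tr}N\, .$

The first equality uses the definitions of matrix addition and scalar multiplication entry by entry, and the second is the distributive property for numbers.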