7.1: Linear Transformations and Matrices

Ordered bases for finite-dimensional vector spaces allow us to express linear operators as matrices.

Basis Notation

A basis allows us to efficiently label arbitrary vectors in terms of column vectors. Here is an example.

Example \(\PageIndex{1}\):

Let \[V=\left\{\begin{pmatrix}a&b\\c&d\end{pmatrix}\middle|a,b,c,d\in \mathbb{R}\right\}\]be the vector space of \(2\times 2\) real matrices, with addition and scalar multiplication defined componentwise. One choice of basis is the ordered set (or list) of matrices

\[B=\left(\begin{pmatrix}1&0\\0&0\end{pmatrix},\begin{pmatrix}0&1\\0&0\end{pmatrix},\begin{pmatrix}0&0\\ 1&0\end{pmatrix},\begin{pmatrix}0&0\\ 0&1\end{pmatrix}\right)=:(e_{1}^{1},e_{2}^{1},e^{2}_{1},e^{2}_{2})\, .\]

Given a particular vector and a basis, your job is to write that vector as a sum of multiples of basis elements. Here an arbitrary vector \(v\in V\) is just a matrix, so we write

\begin{eqnarray*}
v\ =\ \begin{pmatrix}a&b\\c&d\end{pmatrix}&=&\quad\!\! \begin{pmatrix}a&0\\0&0\end{pmatrix}+\begin{pmatrix}0&b\\0&0\end{pmatrix}+\begin{pmatrix}0&0\\ c&0\end{pmatrix}+\begin{pmatrix}0&0\\0&d\end{pmatrix}\\
&=&a\begin{pmatrix}1&0\\0&0\end{pmatrix}+b\begin{pmatrix}0&1\\0&0\end{pmatrix}+c\begin{pmatrix}0&0\\ 1&0\end{pmatrix}
+d\begin{pmatrix}0&0\\0&1\end{pmatrix}\\
&=&a\, e^{1}_{1}+b \, e^{1}_{2}+c \, e^{2}_{1}+d \, e^{2}_{2}\, .
\end{eqnarray*}

The coefficients \((a,b,c,d)\) of the basis vectors \((e^{1}_{1},e^{1}_{2},e^{2}_{1},e^{2}_{2})\) encode the information of which matrix the vector \(v\) is. We store them in a column vector by writing

\[v= ae_{1}^{1} + be_{2}^{1} + ce_{1}^{2} + de_{2}^{2} =: (e_{1}^{1}, e_{2}^{1}, e_{1}^{2}, e_{2}^{2}) \begin{pmatrix}a\\b\\c\\d\end{pmatrix} =: \begin{pmatrix}a\\b\\c\\d\end{pmatrix}_{B}\]

The column vector \(\begin{pmatrix}a\\b\\c\\d\end{pmatrix}\) encodes the vector \(v\) but is NOT equal to it! (After all, \(v\) is a matrix so could not equal a column vector.) Both notations on the right hand side of the above equation really stand for the vector obtained by multiplying the coefficients stored in the column vector by the corresponding basis element and then summing over them.
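As a concrete instance (the particular entries are chosen here purely for illustration), take \(a=1\), \(b=2\), \(c=3\), \(d=4\):

\[\begin{pmatrix}1&2\\3&4\end{pmatrix} = 1\, e^{1}_{1}+2\, e^{1}_{2}+3\, e^{2}_{1}+4\, e^{2}_{2} = \begin{pmatrix}1\\2\\3\\4\end{pmatrix}_{B}\, .\]

The left hand side is a \(2\times 2\) matrix, while the column on the right only lists its coefficients in the order fixed by \(B\).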

Next, let's consider a tautological example showing how to label column vectors in terms of column vectors:

Example \(\PageIndex{2}\): (Standard Basis of \(\Re^{2}\))

The vectors

\[
e_{1}=\begin{pmatrix}1\\0\end{pmatrix},~~~
e_{2}=\begin{pmatrix}0\\1\end{pmatrix}
\]

are called the standard basis vectors of \(\Re^{2}=\Re^{\{1,2\}}\). Their descriptions as functions of \(\{1,2\}\) are

\[e_{1}(k) = \left\{\begin{matrix}1,~~ \text{if } k = 1\\ 0,~~ \text{if } k = 2\end{matrix}\right.\]

\[e_{2}(k) = \left\{\begin{matrix}0,~~ \text{if } k = 1\\ 1,~~ \text{if } k = 2\end{matrix}\right.\]

It is natural to assign these the order: \(e_{1}\) is first and \(e_{2}\) is second. An arbitrary vector \(v\) of \(\Re^{2}\) can be written as

\[
v=\begin{pmatrix}x\\y\end{pmatrix}=x e_{1}+ ye_{2} .
\]

To emphasize that we are using the standard basis we define the list (or ordered set)

\[E=(e_{1},e_{2})\, ,\]

and write

\[\begin{pmatrix}x\\y\end{pmatrix}_{E}:=(e_{1},e_{2})\begin{pmatrix}x\\y\end{pmatrix}:=x e_{1}+ ye_{2}=v .\]

You should read this equation by saying:

"The column vector of the vector \(v\) in the basis \(E\) is \(\begin{pmatrix}x\\y\end{pmatrix}\)."

Again, the first notation of a column vector with a subscript \(E\) refers to the vector obtained by multiplying each basis vector
by the corresponding scalar listed in the column and then summing these, \(\textit {i.e.}\) \(xe_{1}+y e_{2}\). The second notation denotes exactly the same thing but we first list the basis elements and then the column vector; a useful trick because this can be read in the same way as matrix multiplication of a row vector times a column vector--except that the entries of the row vector are themselves vectors!

You should already try to write down the standard basis vectors for \(\Re^{n}\) for other values of \(n\) and express an arbitrary vector in \(\Re^{n}\) in terms of them.

The last example probably seems pedantic because column vectors are already just ordered lists of numbers and the basis notation has simply allowed us to "re-express" these as lists of numbers. Of course, this objection does not apply to more complicated vector spaces like our first matrix example. Moreover, as we saw earlier, there are infinitely many other pairs of vectors in \(\Re^{2}\) that form a basis.

Example \(\PageIndex{3}\): (A Non-Standard Basis of \(\Re^{2}=\Re^{\{1,2\}}\))

\[b=\begin{pmatrix}1\\1\end{pmatrix}\, ,\quad \beta=\begin{pmatrix}1\\-1\end{pmatrix}.\]

As functions of \(\{1,2\}\) they read

\[b(k) = \left\{\begin{matrix}1,~~ \text{if } k = 1\\ 1,~~ \text{if } k = 2\end{matrix}\right.\]

\[\beta(k) = \left\{\begin{matrix}1,~~ \text{if } k = 1\\ -1,~~ \text{if } k = 2\end{matrix}\right.\]

Notice something important: there is no reason to say that \(\beta\) comes before \(b\) or vice versa. That is, there is no \(\textit{a priori}\) reason to give these basis elements one order or the other. However, it will be necessary to give the basis elements an order if we want to use them to encode other vectors. We choose one arbitrarily; let

\[B=(b,\beta)\]

be the ordered basis. Note that for an unordered set we use braces \(\{\}\), while for lists or ordered sets we use parentheses \(()\).

As before we define

\[\begin{pmatrix}x\\y\end{pmatrix}_{B} :=(b,\beta)\begin{pmatrix}x \\ y\end{pmatrix}:= xb+ y \beta\, . \]

You might think that the numbers \(x\) and \(y\) denote exactly the same vector as in the previous example. However, they do not. Inserting the actual vectors that \(b\) and \(\beta\) represent we have

\[xb+ y \beta=x\begin{pmatrix}1\\1\end{pmatrix}+y\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}x+y\\x-y\end{pmatrix}\, .\]

Thus, to contrast, we have

\[\begin{pmatrix}x\\y\end{pmatrix}_{B}=\begin{pmatrix}x+y\\x-y\end{pmatrix} \mbox{ and } \begin{pmatrix}x\\y\end{pmatrix}_{E}=\begin{pmatrix}x\\y\end{pmatrix}\]

Only in the standard basis \(E\) does the column vector of \(v\) agree with the column vector that \(v\) actually is!
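Going in the other direction, here is a sketch of how one might find the column vector in the basis \(B\) of a given vector of \(\Re^{2}\); the symbols \(p\) and \(q\) are introduced here only for illustration. Requiring

\[\begin{pmatrix}x\\y\end{pmatrix}_{B}=\begin{pmatrix}x+y\\x-y\end{pmatrix}=\begin{pmatrix}p\\q\end{pmatrix}\]

gives the linear system \(x+y=p\), \(x-y=q\), whose solution is \(x=\frac{p+q}{2}\), \(y=\frac{p-q}{2}\). Hence

\[\begin{pmatrix}p\\q\end{pmatrix}=\begin{pmatrix}\frac{p+q}{2}\\[2pt]\frac{p-q}{2}\end{pmatrix}_{B}\, .\]

For instance, \(\begin{pmatrix}3\\1\end{pmatrix}=\begin{pmatrix}2\\1\end{pmatrix}_{B}\), since \(2b+\beta=\begin{pmatrix}3\\1\end{pmatrix}\).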

Based on the above example, you might think that our aim would be to find the "standard basis" for any problem. In fact, this is far from the truth. Notice, for example, that the vector

\[v=\begin{pmatrix}1\\1\end{pmatrix}=e_{1}+e_{2}=b\]

written in the standard basis \(E\) is just

\[v=\begin{pmatrix}1\\1\end{pmatrix}_{E}\, ,\]

which was easy to calculate. But in the basis \(B\) we find

\[v=\begin{pmatrix}1\\0\end{pmatrix}_{B}\, ,\]

which is actually a simpler column vector! The fact that there are many bases for any given vector space allows us to choose a basis in which our computation is easiest. In any case, the standard basis only makes sense for \(\Re^{n}\). Suppose your vector space was the set of solutions to a differential equation--what would a standard basis then be?

Example \(\PageIndex{4}\): A Basis For a Hyperplane

Let's again consider the hyperplane

\[V=\left\{ c_{1}\begin{pmatrix}1\\1\\0\end{pmatrix} +c_{2}\begin{pmatrix}0\\1\\1\end{pmatrix} \middle| c_{1},c_{2}\in \Re \right\} \]

One possible choice of ordered basis is

\[b_{1}=\begin{pmatrix}1\\1\\0\end{pmatrix},~~~b_{2}=\begin{pmatrix}0\\1\\1\end{pmatrix}, ~~~
B=(b_{1},b_{2}).
\]

With this choice

\[
\begin{pmatrix}x\\y\end{pmatrix}_{B} := xb_{1}+ y b_{2}
=x\begin{pmatrix}1\\1\\0\end{pmatrix}+y \begin{pmatrix}0\\1\\1\end{pmatrix}=\begin{pmatrix}x\\x+y\\y\end{pmatrix}_{E}.
\]

With the other choice of order \(B'=(b_{2}, b_{1})\)

\[
\begin{pmatrix}x\\y\end{pmatrix}_{\!B'} := xb_{2}+ y b_{1}
=x\begin{pmatrix}0\\1\\1\end{pmatrix}+y \begin{pmatrix}1\\1\\0\end{pmatrix}=\begin{pmatrix}y\\x+y\\x\end{pmatrix}_{E}.
\]

We see that the order of basis elements matters.
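The effect of reordering can also be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper name `from_basis` and the sample coefficients are assumptions made for illustration, and vectors are stored as plain tuples.

```python
def from_basis(coeffs, basis):
    """Return sum_i coeffs[i] * basis[i], with vectors stored as tuples."""
    dim = len(basis[0])
    return tuple(sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(dim))

b1 = (1, 1, 0)
b2 = (0, 1, 1)
B = (b1, b2)        # ordered basis B
B_prime = (b2, b1)  # same vectors, opposite order

x, y = 2, 5  # illustrative coefficients (hypothetical values)
print(from_basis((x, y), B))        # (2, 7, 5), i.e. (x, x+y, y)
print(from_basis((x, y), B_prime))  # (5, 7, 2), i.e. (y, x+y, x)
```

The same column of coefficients produces two different vectors of the hyperplane, depending only on the order in which the basis elements are listed.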

Finding the column vector of a given vector in a given basis usually amounts to a linear systems problem:

Example \(\PageIndex{5}\): Pauli Matrices

Let

\[V=\left\{\begin{pmatrix}z&u\\v&-z\end{pmatrix}\middle| z,u,v\in\mathbb{C}\right\}\]

be the vector space of trace-free complex-valued matrices (over \(\mathbb{C}\)) with basis \(B=(\sigma_{x},\sigma_{y},\sigma_{z})\), where

\[
\sigma_{x}=\begin{pmatrix}0&1\\1&0\end{pmatrix}\, ,\quad
\sigma_{y}=\begin{pmatrix}0&-i\\i&0\end{pmatrix}\, ,\quad
\sigma_{z}=\begin{pmatrix}1&0\\0&-1\end{pmatrix}\, .
\]

These three matrices are the famous \(\textit{Pauli matrices}\); they are used to describe electrons in quantum theory.

Let
\[
v=\begin{pmatrix}
-2+i&1+i\\3-i&2-i
\end{pmatrix}\, .
\]

Find the column vector of \(v\) in the basis \(B\).

For this we must solve the equation

\[
\begin{pmatrix}
-2+i&1+i\\3-i&2-i
\end{pmatrix}
=\alpha^{x} \begin{pmatrix}0&1\\1&0\end{pmatrix}+\alpha^{y}
\begin{pmatrix}0&-i\\i&0\end{pmatrix}+\alpha^{z}
\begin{pmatrix}1&0\\0&-1\end{pmatrix}\, .
\]

This gives three equations, \(\textit{i.e.}\) a linear systems problem, for the \(\alpha\)'s

\[
\left\{
\begin{array}{rrrrr}
\alpha^{x}&\!\!-\ i\alpha^{y}&&=&1+i\\
\alpha^{x}&\!\!+\ i\alpha^{y}&&=&3-i\\
&&\alpha^{z}&=&-2+i
\end{array}
\right.
\]

with solution

\[
\alpha^{x}=2\, ,\quad \alpha^{y}=-1-i\, ,\quad \alpha^{z}=-2+i\, .
\]

Hence

\[v=\begin{pmatrix}2\\-1-i\\-2+i\end{pmatrix}_{B}\, .\]
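The linear system above can be cross-checked with a few lines of Python using built-in complex numbers. This is only a sketch: the function names `components` and `combine` are hypothetical, matrices are stored as nested tuples, and the lower-right entry of \(v\) is taken to be the negative of the upper-left one, as membership in \(V\) requires.

```python
# Pauli matrices as nested tuples of Python complex numbers.
sigma_x = ((0, 1), (1, 0))
sigma_y = ((0, -1j), (1j, 0))
sigma_z = ((1, 0), (0, -1))

def components(v):
    """Components of a trace-free 2x2 matrix v in (sigma_x, sigma_y, sigma_z)."""
    u, w, z = v[0][1], v[1][0], v[0][0]
    # Solve  ax - i*ay = u  and  ax + i*ay = w  by adding and subtracting.
    ax = (u + w) / 2
    ay = (w - u) / 2j
    az = z
    return ax, ay, az

def combine(ax, ay, az):
    """Rebuild the matrix ax*sigma_x + ay*sigma_y + az*sigma_z entry by entry."""
    return tuple(
        tuple(ax * sigma_x[r][c] + ay * sigma_y[r][c] + az * sigma_z[r][c]
              for c in range(2))
        for r in range(2)
    )

v = ((-2 + 1j, 1 + 1j), (3 - 1j, 2 - 1j))
ax, ay, az = components(v)
print(ax, ay, az)               # the values 2, -1-i, -2+i as Python complex numbers
print(combine(ax, ay, az) == v)  # True: the components reproduce v
```

Substituting the components back into the sum of multiples of the Pauli matrices is a cheap way to catch arithmetic slips in the complex algebra.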

To summarize, the \(\textit{column vector of a vector}\) \(v\) in an ordered basis \(B=(b_{1},b_{2},\ldots,b_{n})\),

\[\begin{pmatrix}\alpha^{1}\\\alpha^{2}\\\vdots\\\alpha^{n}\end{pmatrix}\, ,\]

is defined by solving the linear systems problem

\[v=\alpha^{1} b_{1} + \alpha^{2}b_{2} +\cdots + \alpha^{n} b_{n} = \sum_{i=1}^{n} \alpha^{i} b_{i}\, .\]
The numbers \((\alpha^{1},\alpha^{2},\ldots,\alpha^{n})\) are called the \(\textit{components of the vector}\) \(v\). Two useful shorthand notations for this are

\[v=\begin{pmatrix}\alpha^{1}\\\alpha^{2}\\\vdots\\\alpha^{n}\end{pmatrix}_{B} = (b_{1},b_{2},\ldots,b_{n})\begin{pmatrix}\alpha^{1}\\\alpha^{2}\\\vdots\\\alpha^{n}\end{pmatrix}\, .\]

From Linear Operators to Matrices

Chapter 6 showed that linear functions are very special kinds of functions; they are fully specified by their values on any basis for their domain. A matrix records how a linear operator maps each element of the input basis to a sum of multiples of the target space basis vectors.

More carefully, if \(L\) is a linear operator from \(V\) to \(W\) then the matrix for \(L\) in the ordered bases \(B=(b_{1},b_{2},\cdots)\) for \(V\) and \(B'=(\beta_{1},\beta_{2},\cdots)\) for \(W\) is the array of numbers \(m_{i}^{j}\) specified by
\[L(b_{i})= m_{i}^{1}\beta_{1}^{\phantom{1}}+\dots +m_{i}^{j} \beta_{j}^{\phantom{1}}+\cdots\]

Remark

To calculate the matrix of a linear transformation you must compute what the linear transformation does to every input basis vector and then write the answers in terms of the output basis vectors:

\begin{equation*}
\begin{split}
\big(L(b_{1})&,L(b_{2}),\ldots,L(b_{i}),\ldots\big)\\
&
={
\Big((\beta_{1},\beta_{2},\ldots,\beta_{j},\ldots)\begin{pmatrix}m^{1}_{1}\\m^{2}_{1}\\\vdots\\m^{j}_{1}\\\vdots\end{pmatrix},
(\beta_{1},\beta_{2},\ldots,\beta_{j},\ldots)\begin{pmatrix}m^{1}_{2}\\m^{2}_{2}\\\vdots\\m^{j}_{2}\\\vdots\end{pmatrix},\cdots,
(\beta_{1},\beta_{2},\ldots,\beta_{j},\ldots)\begin{pmatrix}m^{1}_{i}\\m^{2}_{i}\\\vdots\\m^{j}_{i}\\\vdots\end{pmatrix},\cdots\Big)}\\&
=(\beta_{1},\beta_{2},\ldots,\beta_{j},\ldots)\begin{pmatrix}
m_{1}^{1} & m_{2}^{1} & \cdots & m_{i}^{1} &\cdots\\
m_{1}^{2} & m_{2}^{2} & \cdots & m_{i}^{2} &\cdots\\
\vdots& \vdots & & \vdots &\\
m_{1}^{j} & m_{2}^{j} & \cdots & m_{i}^{j}& \cdots \\
\vdots&\vdots&&\vdots
\end{pmatrix}
\end{split}
\end{equation*}
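To see why this array of numbers acts on column vectors by matrix multiplication, here is a short derivation sketch (assuming, for simplicity, that both bases are finite so that all sums are finite). Write \(v=\sum_{i}\alpha^{i}b_{i}\) and use linearity together with the defining relation \(L(b_{i})=\sum_{j}m_{i}^{j}\beta_{j}\):

\[L(v)=\sum_{i}\alpha^{i}L(b_{i})=\sum_{i}\alpha^{i}\sum_{j}m_{i}^{j}\beta_{j}=\sum_{j}\Big(\sum_{i}m_{i}^{j}\alpha^{i}\Big)\beta_{j}\, .\]

So the \(j\)th component of \(L(v)\) in the basis \(B'\) is \(\sum_{i}m_{i}^{j}\alpha^{i}\), which is the \(j\)th entry of the product of the matrix having \(m_{i}^{j}\) in row \(j\) and column \(i\) with the column vector of \(v\) in the basis \(B\).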

Example \(\PageIndex{6}\):

Consider \(L:V\to \Re^{3}\) defined by
\[
L\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}\, ,\quad
L\begin{pmatrix}0\\1\\1\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}\, .
\]
By linearity this specifies the action of \(L\) on any vector from \(V\) as
\[
L\left[ c_{1}\begin{pmatrix}1\\1\\0\end{pmatrix} + c_{2} \begin{pmatrix}0\\1\\1\end{pmatrix} \right]= (c_{1}+c_{2})\begin{pmatrix}0\\1\\0\end{pmatrix}.
\]
We had trouble expressing this linear operator as a matrix. Let's take input basis
\[
B=\left(\begin{pmatrix}1\\1\\0\end{pmatrix} ,\begin{pmatrix}0\\1\\1\end{pmatrix}\right)=:(b_{1},b_{2})\, ,
\]
and output basis
\[
E=\left(\begin{pmatrix}1\\0\\0\end{pmatrix} ,\begin{pmatrix}0\\1\\0\end{pmatrix} ,\begin{pmatrix}0\\0\\1\end{pmatrix}\right)=:(e_{1},e_{2},e_{3})\, .
\]
Then
\[
L b_{1} = 0\cdot e_{1} + 1\cdot e_{2}+ 0\cdot e_{3} = L b_{2}\, ,
\]
or
\[
\big(Lb_{1}, L b_{2}\big) = \big( (e_{1},e_{2},e_{3})\begin{pmatrix}0\\1\\0\end{pmatrix}, (e_{1},e_{2},e_{3})\begin{pmatrix}0\\1\\0\end{pmatrix}\big)=(e_{1},e_{2},e_{3})\begin{pmatrix}0&0\\1&1\\0&0
\end{pmatrix}\, .
\]
The matrix on the right is the matrix of \(L\) in these bases. More succinctly we could write
\[
L\begin{pmatrix}x\\y\end{pmatrix}_{B}=(x+y) \begin{pmatrix}0\\1\\0\end{pmatrix}_{E}
\]
and thus see that \(L\) acts like the matrix
\[
\begin{pmatrix}
0&0\\
1&1\\
0&0
\end{pmatrix}\, .
\]

Hence

\[L\begin{pmatrix}x\\y\end{pmatrix}_{B}
=\left( \begin{pmatrix}
0&0\\
1&1\\
0&0
\end{pmatrix}
\begin{pmatrix}x\\y\end{pmatrix} \right)_E\, ;
\]

given input and output bases, the linear operator is now encoded by a matrix.
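As a quick consistency check (the coefficients \(x=2\), \(y=3\) are chosen here only for illustration), the input vector is

\[\begin{pmatrix}2\\3\end{pmatrix}_{B}=2\begin{pmatrix}1\\1\\0\end{pmatrix}+3\begin{pmatrix}0\\1\\1\end{pmatrix}=\begin{pmatrix}2\\5\\3\end{pmatrix}\, ,\]

and by linearity \(L\) sends it to \((2+3)\begin{pmatrix}0\\1\\0\end{pmatrix}=\begin{pmatrix}0\\5\\0\end{pmatrix}\). The matrix gives the same answer:

\[\begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}\begin{pmatrix}2\\3\end{pmatrix}=\begin{pmatrix}0\\5\\0\end{pmatrix}\, ,\quad\text{so}\quad L\begin{pmatrix}2\\3\end{pmatrix}_{B}=\begin{pmatrix}0\\5\\0\end{pmatrix}_{E}\, .\]

Note that the matrix multiplies the column \(\begin{pmatrix}2\\3\end{pmatrix}\) of coefficients in \(B\), not the three-component vector \(\begin{pmatrix}2\\5\\3\end{pmatrix}\) itself.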

This is the general rule for this chapter:
\[\textit{Linear operators become matrices when given ordered input and output bases.}\]

Example \(\PageIndex{7}\):

Let's compute a matrix for the derivative operator acting on the vector space of polynomials of degree 2 or less:

\[V = \{a_{0}1 + a_{1}x + a_{2} x^{2} \,|\, a_{0},a_{1},a_{2} \in \Re \}\, .\]

In the ordered basis \(B=( 1,x,x^{2})\) we write

\[\begin{pmatrix}a\\b\\c\end{pmatrix}_{\!B}= a\cdot 1 + b x + c x^{2}\]

and

\[\frac{d}{dx} \begin{pmatrix}a\\b\\c\end{pmatrix}_{\!B}=b\cdot 1 +2c x +0x^{2}=\begin{pmatrix}b\\2c\\0\end{pmatrix}_B\]

In the ordered basis \(B\) for both domain and range

\[
\frac{d}{dx}
=
\begin{pmatrix}
0&1&0\\
0&0&2\\
0&0&0
\end{pmatrix}\]

Notice this last equation makes no sense without explaining which bases we are using!
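A small Python sketch can confirm that this matrix reproduces differentiation on coefficient columns in the basis \(B=(1,x,x^{2})\). The names `D`, `mat_vec` and `derivative_coeffs`, and the sample polynomial, are assumptions introduced for illustration only.

```python
# Matrix of d/dx in the ordered basis B = (1, x, x^2) for input and output.
D = [
    [0, 1, 0],
    [0, 0, 2],
    [0, 0, 0],
]

def mat_vec(M, column):
    """Multiply matrix M (list of rows) by a column given as a list."""
    return [sum(m * c for m, c in zip(row, column)) for row in M]

def derivative_coeffs(coeffs):
    """Differentiate a0 + a1*x + a2*x^2 directly; result is again in basis B."""
    a0, a1, a2 = coeffs
    return [a1, 2 * a2, 0]

p = [5, -3, 4]                # the polynomial 5 - 3x + 4x^2 (hypothetical example)
print(mat_vec(D, p))          # [-3, 8, 0], i.e. -3 + 8x
print(derivative_coeffs(p))   # [-3, 8, 0], the same column
```

Choosing a different ordered basis, for example \((x^{2},x,1)\), would change both the coefficient columns and the matrix, even though the derivative operator itself is unchanged.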

Contributor

David Cherney, Tom Denton, and Andrew Waldron (UC Davis)