6.2: Linear Maps and Functionals. Matrices

Elias Zakon
University of Windsor via The Trilla Group (support by Saylor Foundation)

For an adequate definition of differentiability, we need the notion of a linear map. Below, \(E^{\prime}, E^{\prime \prime},\) and \(E\) denote normed spaces over the same scalar field, \(E^{1}\) or \(C.\)

Definition 1

A function \(f : E^{\prime} \rightarrow E\) is a linear map if and only if for all \(\vec{x}, \vec{y} \in E^{\prime}\) and scalars \(a, b\)

\[f(a \vec{x}+b \vec{y})=a f(\vec{x})+b f(\vec{y}); \tag{1}\]

equivalently, iff for all such \(\vec{x}, \vec{y},\) and \(a\)

\[f(\vec{x}+\vec{y})=f(\vec{x})+f(\vec{y}) \text { and } f(a \vec{x})=a f(\vec{x}). \quad \text {(Verify!)} \tag{2}\]

If \(E=E^{\prime},\) such a map is also called a linear operator.

If the range space \(E\) is the scalar field of \(E^{\prime},\) (i.e., \(E^{1}\) or \(C,)\) the linear \(f\) is also called a (real or complex) linear functional on \(E^{\prime}.\)

Note 1. Induction extends formula (1) to any "linear combinations":

\[f\left(\sum_{i=1}^{m} a_{i} \vec{x}_{i}\right)=\sum_{i=1}^{m} a_{i} f\left(\vec{x}_{i}\right)\]

for all \(\vec{x}_{i} \in E^{\prime}\) and scalars \(a_{i}\).

Briefly: A linear map \(f\) preserves linear combinations.

Note 2. Taking \(a=b=0\) in (1), we obtain \(f(\overrightarrow{0})=0\) if \(f\) is linear.

Examples

(a) Let \(E^{\prime}=E^{n}\left(C^{n}\right).\) Fix a vector \(\vec{v}=\left(v_{1}, \ldots, v_{n}\right)\) in \(E^{\prime}\) and set

\[\left(\forall \vec{x} \in E^{\prime}\right) \quad f(\vec{x})=\vec{x} \cdot \vec{v}\]

(inner product; see Chapter 3, §§1-3 and §9).

Then

\[\begin{aligned} f(a \vec{x}+b \vec{y}) &=(a \vec{x}) \cdot \vec{v}+(b \vec{y}) \cdot \vec{v} \\ &=a(\vec{x} \cdot \vec{v})+b(\vec{y} \cdot \vec{v}) \\ &=a f(\vec{x})+b f(\vec{y}); \end{aligned}\]

so \(f\) is linear. Note that if \(E^{\prime}=E^{n},\) then by definition,

\[f(\vec{x})=\vec{x} \cdot \vec{v}=\sum_{k=1}^{n} x_{k} v_{k}=\sum_{k=1}^{n} v_{k} x_{k}.\]

If, however, \(E^{\prime}=C^{n},\) then

\[f(\vec{x})=\vec{x} \cdot \vec{v}=\sum_{k=1}^{n} x_{k} \overline{v}_{k}=\sum_{k=1}^{n} \overline{v}_{k} x_{k},\]

where \(\overline{v}_{k}\) is the conjugate of the complex number \(v_{k}\).

By Theorem 3 in Chapter 4, §3, \(f\) is continuous (a polynomial!).

Moreover, \(f(\vec{x})=\vec{x} \cdot \vec{v}\) is a scalar (in \(E^{1}\) or \(C).\) Thus the range of \(f\) lies in the scalar field of \(E^{\prime};\) so \(f\) is a linear functional on \(E^{\prime}.\)
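As a numerical illustration of Example (a), the following minimal Python sketch builds the functional \(f(\vec{x})=\vec{x} \cdot \vec{v}\) with the conjugate placed on \(v_{k}\), as in the \(C^{n}\) formula above, and compares both sides of the linearity condition for one choice of data. The helper names (`dot`, `make_functional`, `combo`) and the sample vectors are assumptions made only for this illustration; a single check does not prove linearity.

```python
def dot(x, v):
    """Inner product of Example (a): sum of x_k times the conjugate of v_k."""
    return sum(xk * complex(vk).conjugate() for xk, vk in zip(x, v))


def make_functional(v):
    """Return the functional f(x) = x . v for a fixed vector v."""
    return lambda x: dot(x, v)


def combo(a, x, b, y):
    """Return the linear combination a*x + b*y, computed componentwise."""
    return [a * xk + b * yk for xk, yk in zip(x, y)]


# Hypothetical sample data in C^2.
v = [1 + 2j, 3 - 1j]
f = make_functional(v)
x = [2 + 0j, 1j]
y = [-1 + 1j, 4 + 0j]
a, b = 2 - 1j, 0.5j

lhs = f(combo(a, x, b, y))   # f(a x + b y)
rhs = a * f(x) + b * f(y)    # a f(x) + b f(y)
print(abs(lhs - rhs) < 1e-9)
```

For real vectors the conjugation has no effect, so the same sketch covers \(E^{n}\).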

(b) Let \(I=[0,1].\) Let \(E^{\prime}\) be the set of all functions \(u : I \rightarrow E\) that are of class \(CD^{\infty}\) (Chapter 5, §6) on \(I\), hence bounded there (Theorem 2 of Chapter 4, §8).

As in Example (C) in Chapter 3, §10, \(E^{\prime}\) is a normed linear space, with norm

\[\|u\|=\sup _{x \in I}|u(x)|.\]

Here each function \(u \in E^{\prime}\) is treated as a single "point" in \(E^{\prime}.\) The distance between two such points, \(u\) and \(v,\) equals \(\|u-v\|,\) by definition.

Now define a map \(D\) on \(E^{\prime}\) by setting \(D(u)=u^{\prime}\) (derivative of \(u\) on \(I\)). As every \(u \in E^{\prime}\) is of class \(CD^{\infty},\) so is \(u^{\prime}.\)

Thus \(D(u)=u^{\prime} \in E^{\prime},\) and so \(D : E^{\prime} \rightarrow E^{\prime}\) is a linear operator. (Its linearity follows from Theorem 4 in Chapter 5, §1.)

(c) Let again \(I=[0,1].\) Let \(E^{\prime}\) be the set of all functions \(u : I \rightarrow E\) that are bounded and have antiderivatives (Chapter 5, §5) on \(I.\) With norm \(\|u\|\) as in Example (b), \(E^{\prime}\) is a normed linear space.

Now define \(\phi : E^{\prime} \rightarrow E\) by

\[\phi(u)=\int_{0}^{1} u,\]

with \(\int u\) as in Chapter 5, §5. (Recall that \(\int_{0}^{1} u\) is an element of \(E\) if \(u : I \rightarrow E.\) ) By Corollary 1 in Chapter 5, §5, \(\phi\) is a linear map of \(E^{\prime}\) into \(E\). (Why?)

(d) The zero map \(f=0\) on \(E^{\prime}\) is always linear. (Why?)

Theorem 1

A linear map \(f : E^{\prime} \rightarrow E\) is continuous (even uniformly so) on all of \(E^{\prime}\) iff it is continuous at \(\overrightarrow{0};\) equivalently, iff there is a real \(c>0\) such that

\[\left(\forall \vec{x} \in E^{\prime}\right) \quad|f(\vec{x})| \leq c|\vec{x}|. \tag{3}\]

(We call this property linear boundedness.)

Proof

Assume that \(f\) is continuous at \(\overrightarrow{0}.\) Then, given \(\varepsilon>0,\) there is \(\delta>0\) such that

\[|f(\vec{x})-f(\overrightarrow{0})|=|f(\vec{x})| \leq \varepsilon\]

whenever \(|\vec{x}-\overrightarrow{0}|=|\vec{x}|<\delta\).

Now, for any \(\vec{x} \neq \overrightarrow{0},\) we surely have

\[\left|\frac{\delta \vec{x}}{2|\vec{x}|}\right|=\frac{\delta}{2}<\delta.\]

Hence

\[(\forall \vec{x} \neq \overrightarrow{0}) \quad\left|f\left(\frac{\delta \vec{x}}{2|\vec{x}|}\right)\right| \leq \varepsilon,\]

or, by linearity,

\[\frac{\delta}{2|\vec{x}|}|f(\vec{x})| \leq \varepsilon,\]

i.e.,

\[|f(\vec{x})| \leq \frac{2 \varepsilon}{\delta}|\vec{x}|.\]

By Note 2, this also holds if \(\vec{x}=\overrightarrow{0}\).

Thus, taking \(c=2 \varepsilon / \delta,\) we obtain

\[\left(\forall \vec{x} \in E^{\prime}\right) \quad|f(\vec{x})| \leq c|\vec{x}| \quad \text {(linear boundedness).}\]

Now assume (3). Then

\[\left(\forall \vec{x}, \vec{y} \in E^{\prime}\right) \quad|f(\vec{x}-\vec{y})| \leq c|\vec{x}-\vec{y}|;\]

or, by linearity,

\[\left(\forall \vec{x}, \vec{y} \in E^{\prime}\right) \quad|f(\vec{x})-f(\vec{y})| \leq c|\vec{x}-\vec{y}|.\]

Hence \(f\) is uniformly continuous (given \(\varepsilon>0,\) take \(\delta=\varepsilon / c).\) This, in turn, implies continuity at \(\overrightarrow{0};\) so all conditions are equivalent, as claimed. \(\quad \square\)
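The second half of the proof can be spot-checked numerically. The sketch below takes a hypothetical linear map on \(E^{2}\) given by a \(2 \times 2\) array `A` and uses the constant \(c=\sqrt{\sum a_{i k}^{2}}\); that this \(c\) satisfies the linear boundedness condition of Theorem 1 follows from the Cauchy-Schwarz inequality applied to each row, a standard fact not stated in the text above. The code then tests \(|f(\vec{x})-f(\vec{y})| \leq c|\vec{x}-\vec{y}|\) on random pairs. The function names and the array are illustrative assumptions.

```python
import math
import random

# Hypothetical matrix of a linear map f on E^2.
A = [[2.0, -1.0], [0.5, 3.0]]


def f(x):
    """Apply the linear map with matrix A to the vector x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]


def norm(x):
    """Euclidean norm of a real vector."""
    return math.sqrt(sum(t * t for t in x))


# A valid constant for linear boundedness (Cauchy-Schwarz on each row).
c = math.sqrt(sum(a * a for row in A for a in row))


def lipschitz_holds(trials=1000, seed=0):
    """Check |f(x) - f(y)| <= c |x - y| on randomly sampled pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(2)]
        y = [rng.uniform(-10, 10) for _ in range(2)]
        diff_images = [p - q for p, q in zip(f(x), f(y))]
        diff_points = [p - q for p, q in zip(x, y)]
        # Small tolerance guards against floating-point rounding.
        if norm(diff_images) > c * norm(diff_points) + 1e-9:
            return False
    return True


print(lipschitz_holds())
```

A sampling check of this kind can only fail to find a counterexample; the inequality itself is what the proof establishes.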

A linear map need not be continuous. But, for \(E^{n}\) and \(C^{n},\) we have the following result.

Theorem 2

(i) Any linear map on \(E^{n}\) or \(C^{n}\) is uniformly continuous.

(ii) Every linear functional on \(E^{n}\left(C^{n}\right)\) has the form

\[f(\vec{x})=\vec{x} \cdot \vec{v} \quad \text {(dot product)} \tag{4}\]

for some unique vector \(\vec{v} \in E^{n}\left(C^{n}\right),\) dependent on \(f\) only.

Proof

Suppose \(f : E^{n} \rightarrow E\) is linear; so \(f\) preserves linear combinations.

But every \(\vec{x} \in E^{n}\) is such a combination,

\[\vec{x}=\sum_{k=1}^{n} x_{k} \vec{e}_{k} \quad \text {(Theorem 2 in Chapter 3, §§1-3).}\]

Thus, by Note 1,

\[f(\vec{x})=f\left(\sum_{k=1}^{n} x_{k} \vec{e}_{k}\right)=\sum_{k=1}^{n} x_{k} f\left(\vec{e}_{k}\right).\]

Here the function values \(f\left(\vec{e}_{k}\right)\) are fixed vectors in the range space \(E,\) say,

\[f\left(\vec{e}_{k}\right)=v_{k} \in E,\]

so that

\[f(\vec{x})=\sum_{k=1}^{n} x_{k} f\left(\vec{e}_{k}\right)=\sum_{k=1}^{n} x_{k} v_{k}, \quad v_{k} \in E. \tag{5}\]

Thus \(f\) is a polynomial in \(n\) real variables \(x_{k},\) hence continuous (even uniformly so, by Theorem 1).

In particular, if \(E=E^{1}\) (i.e., \(f\) is a linear functional) then all \(v_{k}\) in (5) are real numbers; so they form a vector

\[\vec{v}=\left(v_{1}, \ldots, v_{n}\right) \text { in } E^{n},\]

and (5) can be written as

\[f(\vec{x})=\vec{x} \cdot \vec{v}.\]

The vector \(\vec{v}\) is unique. For suppose there are two vectors, \(\vec{u}\) and \(\vec{v},\) such that

\[\left(\forall \vec{x} \in E^{n}\right) \quad f(\vec{x})=\vec{x} \cdot \vec{v}=\vec{x} \cdot \vec{u}.\]

Then

\[\left(\forall \vec{x} \in E^{n}\right) \quad \vec{x} \cdot(\vec{v}-\vec{u})=0.\]

By Problem 10 of Chapter 3, §§1-3, this yields \(\vec{v}-\vec{u}=\overrightarrow{0},\) or \(\vec{v}=\vec{u}.\) This completes the proof for \(E^{n}.\)

It is analogous for \(C^{n};\) only in (ii) the \(v_{k}\) are complex and one has to replace them by their conjugates \(\overline{v}_{k}\) when forming the vector \(\vec{v}\) to obtain \(f(\vec{x})=\vec{x} \cdot \vec{v}\). Thus all is proved. \(\quad \square\)
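By way of contrast with Theorem 2, the following worked example (an illustration, not part of the original argument) suggests why finite dimensionality matters. Take the operator \(D\) of Example (b) with \(E=E^{1},\) and let \(u_{n}(x)=x^{n}\) on \(I=[0,1],\) \(n=1,2, \ldots.\) Under the sup norm of Example (b),

\[\left\|u_{n}\right\|=\sup _{x \in I}\left|x^{n}\right|=1 \quad \text { and } \quad\left\|D\left(u_{n}\right)\right\|=\sup _{x \in I}\left|n x^{n-1}\right|=n.\]

Hence no real \(c\) can satisfy \(\|D(u)\| \leq c\|u\|\) for all \(u \in E^{\prime},\) and so, by Theorem 1, the linear operator \(D\) is not continuous on \(E^{\prime}.\)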

Note 3. Formula (5) shows that a linear map \(f : E^{n}\left(C^{n}\right) \rightarrow E\) is uniquely determined by the \(n\) function values \(v_{k}=f\left(\vec{e}_{k}\right)\).

If further \(E=E^{m}\left(C^{m}\right),\) the vectors \(v_{k}\) are \(m\)-tuples of scalars,

\[v_{k}=\left(v_{1 k}, \ldots, v_{m k}\right).\]

We often write such vectors vertically, as the \(n\) "columns" in an array of \(m\) "rows" and \(n\) "columns":

\[\left(\begin{array}{cccc}{v_{11}} & {v_{12}} & {\dots} & {v_{1 n}} \\ {v_{21}} & {v_{22}} & {\dots} & {v_{2 n}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {v_{m 1}} & {v_{m 2}} & {\dots} & {v_{m n}} \end{array}\right). \tag{6}\]

Formally, (6) is a double sequence of \(m n\) terms, called an \(m \times n\) matrix. We denote it by \([f]=\left(v_{i k}\right),\) where for \(k=1,2, \ldots, n\),

\[f\left(\vec{e}_{k}\right)=v_{k}=\left(v_{1 k}, \ldots, v_{m k}\right).\]

Thus linear maps \(f : E^{n} \rightarrow E^{m}\) (or \(f : C^{n} \rightarrow C^{m})\) correspond one-to-one to their matrices \([f].\)
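The correspondence between a linear map and its matrix can be sketched in a few lines of Python. Assuming a hypothetical linear map `f` from \(E^{3}\) to \(E^{2}\), the helper `matrix_of` evaluates \(f\) at the basic unit vectors and stores the values \(f\left(\vec{e}_{k}\right)\) as columns, and `apply` recovers \(f(\vec{x})\) as \(\sum_{k} x_{k} v_{k}.\) All names here are assumptions for illustration.

```python
def matrix_of(f, n):
    """Return [f] as a list of m rows; column k holds the value f(e_k)."""
    columns = []
    for k in range(n):
        e_k = [1 if i == k else 0 for i in range(n)]  # k-th basic unit vector
        columns.append(f(e_k))
    m = len(columns[0])
    return [[columns[k][i] for k in range(n)] for i in range(m)]


def apply(matrix, x):
    """Compute the sum over k of x_k v_k, i.e. the matrix acting on x."""
    return [sum(v_ik * x_k for v_ik, x_k in zip(row, x)) for row in matrix]


def f(x):
    """A hypothetical linear map from E^3 to E^2."""
    x1, x2, x3 = x
    return [2 * x1 - x3, x1 + 4 * x2 + 3 * x3]


M = matrix_of(f, 3)
print(M)                                    # rows of [f]
print(apply(M, [1, 2, 3]) == f([1, 2, 3]))  # both routes give the same vector
```

Note that `matrix_of` only reads off \(n\) function values; it reproduces \(f\) correctly only when \(f\) is in fact linear.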

The easy proof of Corollaries 1 to 3 below is left to the reader.

Corollary 1

If \(f, g : E^{\prime} \rightarrow E\) are linear, so is

\[h=a f+b g\]

for any scalars \(a, b\).

If further \(E^{\prime}=E^{n}\left(C^{n}\right)\) and \(E=E^{m}\left(C^{m}\right),\) with \([f]=\left(v_{i k}\right)\) and \([g]=\left(w_{i k}\right)\), then

\[[h]=\left(a v_{i k}+b w_{i k}\right).\]

Corollary 2

A map \(f : E^{n}\left(C^{n}\right) \rightarrow E\) is linear iff

\[f(\vec{x})=\sum_{k=1}^{n} v_{k} x_{k},\]

where \(v_{k}=f\left(\vec{e}_{k}\right)\).

Hint: For the "if," use Corollary 1. For the "only if," use formula (5) above.

Corollary 3

If \(f : E^{\prime} \rightarrow E^{\prime \prime}\) and \(g : E^{\prime \prime} \rightarrow E\) are linear, so is the composite \(h=g \circ f.\)

Our next theorem deals with the matrix of the composite linear map \(g \circ f.\)

Theorem 3

Let \(f : E^{\prime} \rightarrow E^{\prime \prime}\) and \(g : E^{\prime \prime} \rightarrow E\) be linear, with

\[E^{\prime}=E^{n}\left(C^{n}\right), E^{\prime \prime}=E^{m}\left(C^{m}\right), \text { and } E=E^{r}\left(C^{r}\right).\]

If \([f]=\left(v_{i k}\right)\) and \([g]=\left(w_{j i}\right),\) then

\[[h]=[g \circ f]=\left(z_{j k}\right),\]

where

\[z_{j k}=\sum_{i=1}^{m} w_{j i} v_{i k}, \quad j=1,2, \ldots, r, \quad k=1,2, \ldots, n. \tag{7}\]

Proof

Denote the basic unit vectors in \(E^{\prime}\) by

\[e_{1}^{\prime}, \ldots, e_{n}^{\prime},\]

those in \(E^{\prime \prime}\) by

\[e_{1}^{\prime \prime}, \ldots, e_{m}^{\prime \prime},\]

and those in \(E\) by

\[e_{1}, \ldots, e_{r}.\]

Then for \(k=1,2, \ldots, n\),

\[f\left(e_{k}^{\prime}\right)=v_{k}=\sum_{i=1}^{m} v_{i k} e_{i}^{\prime \prime} \text { and } h\left(e_{k}^{\prime}\right)=\sum_{j=1}^{r} z_{j k} e_{j},\]

and for \(i=1, \ldots, m\),

\[g\left(e_{i}^{\prime \prime}\right)=\sum_{j=1}^{r} w_{j i} e_{j}.\]

Also,

\[h\left(e_{k}^{\prime}\right)=g\left(f\left(e_{k}^{\prime}\right)\right)=g\left(\sum_{i=1}^{m} v_{i k} e_{i}^{\prime \prime}\right)=\sum_{i=1}^{m} v_{i k} g\left(e_{i}^{\prime \prime}\right)=\sum_{i=1}^{m} v_{i k}\left(\sum_{j=1}^{r} w_{j i} e_{j}\right).\]

Thus

\[h\left(e_{k}^{\prime}\right)=\sum_{j=1}^{r} z_{j k} e_{j}=\sum_{j=1}^{r}\left(\sum_{i=1}^{m} w_{j i} v_{i k}\right) e_{j}.\]

But the representation in terms of the \(e_{j}\) is unique (Theorem 2 in Chapter 3, §§1-3), so, equating coefficients, we get (7). \(\quad \square\)

Note 4. Observe that \(z_{j k}\) is obtained, so to say, by "dot-multiplying" the \(j\)th row of \([g]\) (an \(r \times m\) matrix) by the \(k\)th column of \([f]\) (an \(m \times n\) matrix).

It is natural to set

\[[g][f]=[g \circ f],\]

\[\left(w_{j i}\right)\left(v_{i k}\right)=\left(z_{j k}\right),\]

with \(z_{j k}\) as in (7).

Caution. Matrix multiplication, so defined, is not commutative.
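The product rule of Theorem 3 and the caution about commutativity can be illustrated with a short Python sketch. The function `mat_mul` implements \(z_{j k}=\sum_{i} w_{j i} v_{i k}\) directly; the two \(2 \times 2\) arrays `F` and `G` are hypothetical examples chosen so that the two products differ.

```python
def mat_mul(W, V):
    """Product of an r x m array W and an m x n array V, entry by entry."""
    r, m, n = len(W), len(V), len(V[0])
    assert all(len(row) == m for row in W), "inner dimensions must agree"
    return [
        [sum(W[j][i] * V[i][k] for i in range(m)) for k in range(n)]
        for j in range(r)
    ]


def apply(M, x):
    """Apply the linear map with matrix M to the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in M]


# Hypothetical matrices [f] and [g] of two linear operators on E^2.
F = [[1, 2], [0, 1]]
G = [[0, 1], [1, 0]]

GF = mat_mul(G, F)  # matrix of g o f
FG = mat_mul(F, G)  # matrix of f o g

x = [3, 5]
print(apply(GF, x) == apply(G, apply(F, x)))  # [g][f] acts as g o f
print(GF != FG)                               # the two products differ
```

The first check compares the matrix product with the composite map on one vector; the second exhibits a pair for which \([g][f] \neq[f][g].\)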

Definition 2

The set of all continuous linear maps \(f : E^{\prime} \rightarrow E\) (for fixed \(E^{\prime} \) and \(E\)) is denoted \(L(E^{\prime}, E).\)

If \(E=E^{\prime},\) we write \(L(E)\) instead.

For each \(f\) in \(L\left(E^{\prime}, E\right),\) we define its norm by

\[\|f\|=\sup _{|\vec{x}| \leq 1}|f(\vec{x})|.\]

Note that \(\|f\|<+\infty,\) by Theorem 1.

Theorem 4

\(L(E^{\prime}, E)\) is a normed linear space under the norm defined above and under the usual operations on functions, as in Corollary 1.

Proof

Corollary 1 easily implies that \(L(E^{\prime}, E)\) is a vector space. We now show that \(\|\cdot\|\) is a genuine norm.

The triangle law,

\[\|f+g\| \leq\|f\|+\|g\|,\]

follows exactly as in Example (C) of Chapter 3, §10. (Verify!)

Also, by Problem 5 in Chapter 2, §§8-9, \(\sup |a f(\vec{x})|=|a| \sup |f(\vec{x})|.\) Hence \(\|a f\|=|a|\|f\|\) for any scalar \(a.\)

As noted above, \(0 \leq\|f\|<+\infty\).

It remains to show that \(\|f\|=0\) iff \(f\) is the zero map. If

\[\|f\|=\sup _{|\vec{x}| \leq 1}|f(\vec{x})|=0,\]

then \(|f(\vec{x})|=0\) when \(|\vec{x}| \leq 1.\) Hence, if \(\vec{x} \neq \overrightarrow{0}\),

\[f\left(\frac{\vec{x}}{|\vec{x}|}\right)=\frac{1}{|\vec{x}|} f(\vec{x})=0.\]

As \(f(\overrightarrow{0})=0,\) we have \(f(\vec{x})=0\) for all \(\vec{x} \in E^{\prime}\).

Thus \(\|f\|=0\) implies \(f=0,\) and the converse is clear. Thus all is proved. \(\quad \square\)

Note 5. A similar proof, via \(f\left(\frac{\vec{x}}{|\vec{x}|}\right)\) and properties of lub, shows that

\[\|f\|=\sup _{\vec{x} \neq \overrightarrow{0}}\left|\frac{f(\vec{x})}{|\vec{x}|}\right|\]

and

\[(\forall \vec{x} \in E^{\prime}) \quad|f(\vec{x})| \leq\|f\||\vec{x}|.\]

It also follows that \(\|f\|\) is the least real \(c\) such that

\[(\forall \vec{x} \in E^{\prime}) \quad|f(\vec{x})| \leq c|\vec{x}|.\]

Verify. (See Problem 3'.)
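As a worked illustration of Note 5 (added here as an example, using the Cauchy-Schwarz inequality), let \(f(\vec{x})=\vec{x} \cdot \vec{v}\) be the linear functional of Example (a) on \(E^{n},\) with \(\vec{v} \neq \overrightarrow{0}.\) Then

\[\left(\forall \vec{x} \in E^{n}\right) \quad|f(\vec{x})|=|\vec{x} \cdot \vec{v}| \leq|\vec{v}||\vec{x}|,\]

so \(\|f\| \leq|\vec{v}|.\) On the other hand, the unit vector \(\vec{x}=\vec{v} /|\vec{v}|\) gives

\[f\left(\frac{\vec{v}}{|\vec{v}|}\right)=\frac{\vec{v} \cdot \vec{v}}{|\vec{v}|}=|\vec{v}|,\]

so \(\|f\| \geq|\vec{v}|.\) Hence \(\|f\|=|\vec{v}|\) in this case, and \(c=|\vec{v}|\) is the least constant in the inequality above.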

As in any normed space, we define distances in \(L(E^{\prime}, E)\) by

\[\rho(f, g)=\|f-g\|,\]

making it a metric space; so we may speak of convergence, limits, etc. in it.

Corollary 4

If \(f \in L(E^{\prime}, E^{\prime \prime})\) and \(g \in L(E^{\prime \prime}, E),\) then

\[\|g \circ f\| \leq\|g\|\|f\|.\]

Proof

By Note 5,

\[\left(\forall \vec{x} \in E^{\prime}\right) \quad|g(f(\vec{x}))| \leq\|g\||f(\vec{x})| \leq\|g\|\|f\||\vec{x}|.\]

Hence

\[(\forall \vec{x} \neq \overrightarrow{0}) \quad\left|\frac{(g \circ f)(\vec{x})}{|\vec{x}|}\right| \leq\|g\|\|f\|,\]

and so

\[\|g\|\|f\| \geq \sup _{\vec{x} \neq \overrightarrow{0}} \frac{|(g \circ f)(\vec{x})|}{|\vec{x}|}=\|g \circ f\|. \quad \square\]
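The inequality of Corollary 4 can be strict. As a simple illustration (not part of the original text), define \(f, g \in L\left(E^{2}\right)\) by

\[f\left(x_{1}, x_{2}\right)=\left(x_{1}, 0\right) \quad \text { and } \quad g\left(x_{1}, x_{2}\right)=\left(0, x_{2}\right).\]

Then \(|f(\vec{x})|=\left|x_{1}\right| \leq|\vec{x}|,\) with equality at \(\vec{x}=\vec{e}_{1},\) so \(\|f\|=1;\) similarly, \(\|g\|=1.\) But \(g(f(\vec{x}))=g\left(x_{1}, 0\right)=\overrightarrow{0}\) for all \(\vec{x},\) so

\[\|g \circ f\|=0<1=\|g\|\|f\|.\]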