6.2: Linear Maps and Functionals. Matrices
For an adequate definition of differentiability, we need the notion of a linear map. Below, \(E^{\prime}, E^{\prime \prime},\) and \(E\) denote normed spaces over the same scalar field, \(E^{1}\) or \(C.\)
A function \(f : E^{\prime} \rightarrow E\) is a linear map if and only if for all \(\vec{x}, \vec{y} \in E^{\prime}\) and scalars \(a, b\)
\[f(a \vec{x}+b \vec{y})=a f(\vec{x})+b f(\vec{y}); \tag{1}\]
equivalently, iff for all such \(\vec{x}, \vec{y},\) and \(a\)
\[f(\vec{x}+\vec{y})=f(\vec{x})+f(\vec{y}) \text { and } f(a \vec{x})=a f(\vec{x}). \text { (Verify!)}\]
If \(E=E^{\prime},\) such a map is also called a linear operator.
If the range space \(E\) is the scalar field of \(E^{\prime}\) (i.e., \(E^{1}\) or \(C\)), the linear \(f\) is also called a (real or complex) linear functional on \(E^{\prime}.\)
Note 1. Induction extends formula (1) to any "linear combinations":
\[f\left(\sum_{i=1}^{m} a_{i} \vec{x}_{i}\right)=\sum_{i=1}^{m} a_{i} f\left(\vec{x}_{i}\right) \tag{2}\]
for all \(\vec{x}_{i} \in E^{\prime}\) and scalars \(a_{i}\).
Briefly: A linear map \(f\) preserves linear combinations.
Note 2. Taking \(a=b=0\) in (1), we obtain \(f(\overrightarrow{0})=0\) if \(f\) is linear.
Examples.
(a) Let \(E^{\prime}=E^{n}\left(C^{n}\right).\) Fix a vector \(\vec{v}=\left(v_{1}, \ldots, v_{n}\right)\) in \(E^{\prime}\) and set
\[\left(\forall \vec{x} \in E^{\prime}\right) \quad f(\vec{x})=\vec{x} \cdot \vec{v}\]
(inner product; see Chapter 3, §§1-3 and §9).
Then
\[\begin{aligned} f(a \vec{x}+b \vec{y}) &=(a \vec{x}) \cdot \vec{v}+(b \vec{y}) \cdot \vec{v} \\ &=a(\vec{x} \cdot \vec{v})+b(\vec{y} \cdot \vec{v}) \\ &=a f(\vec{x})+b f(\vec{y}); \end{aligned}\]
so \(f\) is linear. Note that if \(E^{\prime}=E^{n},\) then by definition,
\[f(\vec{x})=\vec{x} \cdot \vec{v}=\sum_{k=1}^{n} x_{k} v_{k}=\sum_{k=1}^{n} v_{k} x_{k}.\]
If, however, \(E^{\prime}=C^{n},\) then
\[f(\vec{x})=\vec{x} \cdot \vec{v}=\sum_{k=1}^{n} x_{k} \overline{v}_{k}=\sum_{k=1}^{n} \overline{v}_{k} x_{k},\]
where \(\overline{v}_{k}\) is the conjugate of the complex number \(v_{k}\).
By Theorem 3 in Chapter 4, §3, \(f\) is continuous (a polynomial!).
Moreover, \(f(\vec{x})=\vec{x} \cdot \vec{v}\) is a scalar (in \(E^{1}\) or \(C).\) Thus the range of \(f\) lies in the scalar field of \(E^{\prime};\) so \(f\) is a linear functional on \(E^{\prime}.\)
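The linearity computation in Example (a) is easy to sanity-check numerically. The following sketch (the vector \(\vec{v},\) the test points, and the scalars are arbitrary choices for illustration) verifies \(f(a \vec{x}+b \vec{y})=a f(\vec{x})+b f(\vec{y})\) for the dot-product functional on \(E^{3}\):

```python
# Example (a) numerically: f(x) = x . v on R^3 is linear.
# The vector v and the test data below are arbitrary illustrations.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

v = [2.0, -1.0, 3.0]

def f(x):
    return dot(x, v)

x = [1.0, 4.0, -2.0]
y = [0.5, 0.0, 7.0]
a, b = 3.0, -2.0

lhs = f([a * xi + b * yi for xi, yi in zip(x, y)])  # f(ax + by)
rhs = a * f(x) + b * f(y)                           # a f(x) + b f(y)
assert abs(lhs - rhs) < 1e-12
```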
(b) Let \(I=[0,1].\) Let \(E^{\prime}\) be the set of all functions \(u : I \rightarrow E\) that are of class \(CD^{\infty}\) (Chapter 5, §6) on \(I\), hence bounded there (Theorem 2 of Chapter 4, §8).
As in Example (C) in Chapter 3, §10, \(E^{\prime}\) is a normed linear space, with norm
\[\|u\|=\sup _{x \in I}|u(x)|.\]
Here each function \(u \in E^{\prime}\) is treated as a single "point" in \(E^{\prime}.\) The
distance between two such points, \(u\) and \(v,\) equals \(\|u-v\|,\) by definition.
Now define a map \(D\) on \(E^{\prime}\) by setting \(D(u)=u^{\prime}\) (derivative of \(u\) on \(I\)). As every \(u \in E^{\prime}\) is of class \(CD^{\infty},\) so is \(u^{\prime}.\)
Thus \(D(u)=u^{\prime} \in E^{\prime},\) and so \(D : E^{\prime} \rightarrow E^{\prime}\) is a linear operator. (Its linearity follows from Theorem 4 in Chapter 5, §1.)
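The linearity of \(D\) can be seen concretely by restricting to polynomials on \(I\) (a small subspace of the \(CD^{\infty}\) functions), each represented by its coefficient list; the sketch below, with arbitrary sample polynomials, checks \(D(a u+b w)=a D(u)+b D(w)\):

```python
# Example (b) on a tiny subspace: polynomials on I = [0, 1], stored as
# coefficient lists [c0, c1, c2, ...]. D is the derivative map; real
# CD^infinity functions are far more general, so this is only a sketch.

def D(u):
    # derivative of c0 + c1 x + c2 x^2 + ... is c1 + 2 c2 x + ...
    return [k * c for k, c in enumerate(u)][1:] or [0.0]

def lin_comb(a, u, b, w):
    # a*u + b*w, padding the shorter coefficient list with zeros
    n = max(len(u), len(w))
    u = u + [0.0] * (n - len(u))
    w = w + [0.0] * (n - len(w))
    return [a * ui + b * wi for ui, wi in zip(u, w)]

u = [1.0, 2.0, 3.0]        # 1 + 2x + 3x^2
w = [0.0, -1.0, 0.0, 4.0]  # -x + 4x^3
a, b = 2.0, 5.0

assert D(lin_comb(a, u, b, w)) == lin_comb(a, D(u), b, D(w))
```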
(c) Let again \(I=[0,1].\) Let \(E^{\prime}\) be the set of all functions \(u : I \rightarrow E\) that are bounded and have antiderivatives (Chapter 5, §5) on \(I.\) With norm \(\|u\|\) as in Example (b), \(E^{\prime}\) is a normed linear space.
Now define \(\phi : E^{\prime} \rightarrow E\) by
\[\phi(u)=\int_{0}^{1} u,\]
with \(\int u\) as in Chapter 5, §5. (Recall that \(\int_{0}^{1} u\) is an element of \(E\) if \(u : I \rightarrow E.\) ) By Corollary 1 in Chapter 5, §5, \(\phi\) is a linear map of \(E^{\prime}\) into \(E\). (Why?)
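The linearity of \(\phi\) can likewise be illustrated on polynomials, where \(\phi(u)=\int_{0}^{1} u\) reduces to a finite sum of coefficients; the sample polynomials below are arbitrary:

```python
# Example (c) sketched on polynomials: for u = c0 + c1 x + c2 x^2 + ...,
# phi(u) = integral over [0, 1] = sum c_k / (k + 1), which is linear in u.

def phi(u):
    return sum(c / (k + 1) for k, c in enumerate(u))

u = [1.0, 2.0]        # 1 + 2x,  phi(u) = 2
w = [0.0, 0.0, 3.0]   # 3x^2,    phi(w) = 1
a, b = 4.0, -1.0

n = max(len(u), len(w))
comb = [a * (u[k] if k < len(u) else 0.0) + b * (w[k] if k < len(w) else 0.0)
        for k in range(n)]  # coefficients of a*u + b*w

assert abs(phi(comb) - (a * phi(u) + b * phi(w))) < 1e-12
```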
(d) The zero map \(f=0\) on \(E^{\prime}\) is always linear. (Why?)
Theorem 1. A linear map \(f : E^{\prime} \rightarrow E\) is continuous (even uniformly so) on all of \(E^{\prime}\) iff it is continuous at \(\overrightarrow{0};\) equivalently, iff there is a real \(c>0\) such that
\[\left(\forall \vec{x} \in E^{\prime}\right) \quad|f(\vec{x})| \leq c|\vec{x}|. \tag{3}\]
(We call this property linear boundedness.)
Proof.
Assume that \(f\) is continuous at \(\overrightarrow{0}.\) Then, given \(\varepsilon>0,\) there is \(\delta>0\) such that
\[|f(\vec{x})-f(\overrightarrow{0})|=|f(\vec{x})| \leq \varepsilon\]
whenever \(|\vec{x}-\overrightarrow{0}|=|\vec{x}|<\delta\).
Now, for any \(\vec{x} \neq \overrightarrow{0},\) we surely have
\[\left|\frac{\delta \vec{x}}{2|\vec{x}|}\right|=\frac{\delta}{2}<\delta.\]
Hence
\[(\forall \vec{x} \neq \overrightarrow{0}) \quad\left|f\left(\frac{\delta \vec{x}}{2|\vec{x}|}\right)\right| \leq \varepsilon,\]
or, by linearity,
\[\frac{\delta}{2|\vec{x}|}|f(\vec{x})| \leq \varepsilon,\]
i.e.,
\[|f(\vec{x})| \leq \frac{2 \varepsilon}{\delta}|\vec{x}|.\]
By Note 2, this also holds if \(\vec{x}=\overrightarrow{0}\).
Thus, taking \(c=2 \varepsilon / \delta,\) we obtain
\[\left(\forall \vec{x} \in E^{\prime}\right) \quad|f(\vec{x})| \leq c|\vec{x}| \quad \text {(linear boundedness).}\]
Now assume (3). Then
\[\left(\forall \vec{x}, \vec{y} \in E^{\prime}\right) \quad|f(\vec{x}-\vec{y})| \leq c|\vec{x}-\vec{y}|;\]
or, by linearity,
\[\left(\forall \vec{x}, \vec{y} \in E^{\prime}\right) \quad|f(\vec{x})-f(\vec{y})| \leq c|\vec{x}-\vec{y}|.\]
Hence \(f\) is uniformly continuous (given \(\varepsilon>0,\) take \(\delta=\varepsilon / c).\) This, in turn, implies continuity at \(\overrightarrow{0};\) so all conditions are equivalent, as claimed. \(\quad \square\)
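For a concrete instance of the theorem just proved, take the functional \(f(\vec{x})=\vec{x} \cdot \vec{v}\) of Example (a): the Cauchy-Schwarz inequality gives \(|f(\vec{x})| \leq|\vec{v}||\vec{x}|,\) so \(c=|\vec{v}|\) witnesses linear boundedness. A numerical sketch, with an arbitrary \(\vec{v}\) and random sample points:

```python
# Linear boundedness of f(x) = x . v with c = |v| (Cauchy-Schwarz).
# The vector v and the random samples are arbitrary illustrations.
import math, random

v = [2.0, -1.0, 3.0]
c = math.sqrt(sum(t * t for t in v))   # c = |v|

def f(x):
    return sum(a * b for a, b in zip(x, v))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(3)]
    norm_x = math.sqrt(sum(t * t for t in x))
    assert abs(f(x)) <= c * norm_x + 1e-9   # |f(x)| <= c |x|
```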
A linear map need not be continuous. But, for \(E^{n}\) and \(C^{n},\) we have the following result.
Theorem 2.
(i) Any linear map on \(E^{n}\) or \(C^{n}\) is uniformly continuous.
(ii) Every linear functional on \(E^{n}\left(C^{n}\right)\) has the form
\[f(\vec{x})=\vec{x} \cdot \vec{v} \quad \text {(dot product)} \tag{4}\]
for some unique vector \(\vec{v} \in E^{n}\left(C^{n}\right),\) dependent on \(f\) only.
Proof.
Suppose \(f : E^{n} \rightarrow E\) is linear; so \(f\) preserves linear combinations.
But every \(\vec{x} \in E^{n}\) is such a combination,
\[\vec{x}=\sum_{k=1}^{n} x_{k} \vec{e}_{k} \quad \text {(Theorem 2 in Chapter 3, §§1-3).}\]
Thus, by Note 1,
\[f(\vec{x})=f\left(\sum_{k=1}^{n} x_{k} \vec{e}_{k}\right)=\sum_{k=1}^{n} x_{k} f\left(\vec{e}_{k}\right).\]
Here the function values \(f\left(\vec{e}_{k}\right)\) are fixed vectors in the range space \(E,\) say,
\[f\left(\vec{e}_{k}\right)=v_{k} \in E,\]
so that
\[f(\vec{x})=\sum_{k=1}^{n} x_{k} f\left(\vec{e}_{k}\right)=\sum_{k=1}^{n} x_{k} v_{k}, \quad v_{k} \in E. \tag{5}\]
Thus \(f\) is a polynomial in \(n\) real variables \(x_{k},\) hence continuous (even uniformly so, by Theorem 1).
In particular, if \(E=E^{1}\) (i.e., \(f\) is a linear functional) then all \(v_{k}\) in (5) are real numbers; so they form a vector
\[\vec{v}=\left(v_{1}, \ldots, v_{n}\right) \text { in } E^{n},\]
and (5) can be written as
\[f(\vec{x})=\vec{x} \cdot \vec{v}.\]
The vector \(\vec{v}\) is unique. For suppose there are two vectors, \(\vec{u}\) and \(\vec{v},\) such that
\[\left(\forall \vec{x} \in E^{n}\right) \quad f(\vec{x})=\vec{x} \cdot \vec{v}=\vec{x} \cdot \vec{u}.\]
Then
\[\left(\forall \vec{x} \in E^{n}\right) \quad \vec{x} \cdot(\vec{v}-\vec{u})=0.\]
By Problem 10 of Chapter 3, §§1-3, this yields \(\vec{v}-\vec{u}=\overrightarrow{0},\) or \(\vec{v}=\vec{u}.\) This completes the proof for \(E=E^{n}.\)
It is analogous for \(C^{n};\) only in (ii) the \(v_{k}\) are complex and one has to replace them by their conjugates \(\overline{v}_{k}\) when forming the vector \(\vec{v}\) to obtain \(f(\vec{x})=\vec{x} \cdot \vec{v}\). Thus all is proved. \(\quad \square\)
Note 3. Formula (5) shows that a linear map \(f : E^{n}\left(C^{n}\right) \rightarrow E\) is uniquely determined by the \(n\) function values \(v_{k}=f\left(\vec{e}_{k}\right)\).
If further \(E=E^{m}\left(C^{m}\right),\) the vectors \(v_{k}\) are \(m\)-tuples of scalars,
\[v_{k}=\left(v_{1 k}, \ldots, v_{m k}\right).\]
We often write such vectors vertically, as the \(n\) "columns" in an array of \(m\) "rows" and \(n\) "columns":
\[\left(\begin{array}{cccc}{v_{11}} & {v_{12}} & {\dots} & {v_{1 n}} \\ {v_{21}} & {v_{22}} & {\dots} & {v_{2 n}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {v_{m 1}} & {v_{m 2}} & {\dots} & {v_{m n}} \end{array}\right). \tag{6}\]
Formally, (6) is a double sequence of \(m n\) terms, called an \(m \times n\) matrix. We denote it by \([f]=\left(v_{i k}\right),\) where for \(k=1,2, \ldots, n\),
\[f\left(\vec{e}_{k}\right)=v_{k}=\left(v_{1 k}, \ldots, v_{m k}\right).\]
Thus linear maps \(f : E^{n} \rightarrow E^{m}\) (or \(f : C^{n} \rightarrow C^{m})\) correspond one-to-one to their matrices \([f].\)
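Note 3 and this correspondence can be illustrated in code: with \([f]\) stored row by row, \(f(\vec{x})\) is recovered from the basis images \(f\left(\vec{e}_{k}\right)\) (the columns of \([f]\)) alone. The matrix and test vector below are arbitrary:

```python
# A linear f : R^3 -> R^2 is pinned down by the columns v_k = f(e_k).
# [f] is stored as a list of rows; all concrete numbers are arbitrary.

F = [[1.0, 2.0, 0.0],    # m x n matrix [f]: column k is f(e_k)
     [0.0, -1.0, 3.0]]

def apply(F, x):
    # f(x) = sum_k x_k f(e_k), computed row by row
    return [sum(r * xk for r, xk in zip(row, x)) for row in F]

e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [2.0, -1.0, 4.0]

# reconstruct f(x) from the basis images alone
recon = [0.0, 0.0]
for k in range(3):
    fek = apply(F, e[k])               # column k of [f]
    recon = [r + x[k] * c for r, c in zip(recon, fek)]

assert recon == apply(F, x)
```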
The easy proof of Corollaries 1 to 3 below is left to the reader.
Corollary 1. If \(f, g : E^{\prime} \rightarrow E\) are linear, so is
\[h=a f+b g\]
for any scalars \(a, b\).
If further \(E^{\prime}=E^{n}\left(C^{n}\right)\) and \(E=E^{m}\left(C^{m}\right),\) with \([f]=\left(v_{i k}\right)\) and \([g]=\left(w_{i k}\right)\), then
\[[h]=\left(a v_{i k}+b w_{i k}\right).\]
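A quick entrywise check of this matrix formula, on arbitrary \(2 \times 2\) matrices:

```python
# With [f] = (v_ik) and [g] = (w_ik), the map h = a f + b g has matrix
# (a v_ik + b w_ik). Matrices are lists of rows; all numbers are arbitrary.

F = [[1.0, 2.0], [3.0, 4.0]]
G = [[0.0, -1.0], [5.0, 2.0]]
a, b = 2.0, -3.0

def apply(M, x):
    return [sum(r * xk for r, xk in zip(row, x)) for row in M]

# entrywise combination a[f] + b[g]
H = [[a * v + b * w for v, w in zip(rf, rg)] for rf, rg in zip(F, G)]

x = [1.5, -2.0]
hx = [a * p + b * q for p, q in zip(apply(F, x), apply(G, x))]  # (af+bg)(x)
assert apply(H, x) == hx
```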
Corollary 2. A map \(f : E^{n}\left(C^{n}\right) \rightarrow E\) is linear iff
\[f(\vec{x})=\sum_{k=1}^{n} v_{k} x_{k},\]
where \(v_{k}=f\left(\vec{e}_{k}\right)\).
Hint: For the "if," use Corollary 1. For the "only if," use formula (5) above.
Corollary 3. If \(f : E^{\prime} \rightarrow E^{\prime \prime}\) and \(g : E^{\prime \prime} \rightarrow E\) are linear, so is the composite \(h=g \circ f.\)
Our next theorem deals with the matrix of the composite linear map \(g \circ f.\)
Theorem 3. Let \(f : E^{\prime} \rightarrow E^{\prime \prime}\) and \(g : E^{\prime \prime} \rightarrow E\) be linear, with
\[E^{\prime}=E^{n}\left(C^{n}\right), E^{\prime \prime}=E^{m}\left(C^{m}\right), \text { and } E=E^{r}\left(C^{r}\right).\]
If \([f]=\left(v_{i k}\right)\) and \([g]=\left(w_{j i}\right),\) then
\[[h]=[g \circ f]=\left(z_{j k}\right),\]
where
\[z_{j k}=\sum_{i=1}^{m} w_{j i} v_{i k}, \quad j=1,2, \ldots, r, \; k=1,2, \ldots, n. \tag{7}\]
Proof.
Denote the basic unit vectors in \(E^{\prime}\) by
\[e_{1}^{\prime}, \ldots, e_{n}^{\prime},\]
those in \(E^{\prime \prime}\) by
\[e_{1}^{\prime \prime}, \ldots, e_{m}^{\prime \prime},\]
and those in \(E\) by
\[e_{1}, \ldots, e_{r}.\]
Then for \(k=1,2, \ldots, n\),
\[f\left(e_{k}^{\prime}\right)=v_{k}=\sum_{i=1}^{m} v_{i k} e_{i}^{\prime \prime} \text { and } h\left(e_{k}^{\prime}\right)=\sum_{j=1}^{r} z_{j k} e_{j},\]
and for \(i=1, \ldots, m\),
\[g\left(e_{i}^{\prime \prime}\right)=\sum_{j=1}^{r} w_{j i} e_{j}.\]
Also,
\[h\left(e_{k}^{\prime}\right)=g\left(f\left(e_{k}^{\prime}\right)\right)=g\left(\sum_{i=1}^{m} v_{i k} e_{i}^{\prime \prime}\right)=\sum_{i=1}^{m} v_{i k} g\left(e_{i}^{\prime \prime}\right)=\sum_{i=1}^{m} v_{i k}\left(\sum_{j=1}^{r} w_{j i} e_{j}\right).\]
Thus
\[h\left(e_{k}^{\prime}\right)=\sum_{j=1}^{r} z_{j k} e_{j}=\sum_{j=1}^{r}\left(\sum_{i=1}^{m} w_{j i} v_{i k}\right) e_{j}.\]
But the representation in terms of the \(e_{j}\) is unique (Theorem 2 in Chapter 3, §§1-3), so, equating coefficients, we get (7). \(\quad \square\)
Note 4. Observe that \(z_{j k}\) is obtained, so to say, by "dot-multiplying" the \(j\)th row of \([g]\) (an \(r \times m\) matrix) by the \(k\)th column of \([f]\) (an \(m \times n\) matrix).
It is natural to set
\[[g][f]=[g \circ f],\]
or
\[\left(w_{j i}\right)\left(v_{i k}\right)=\left(z_{j k}\right),\]
with \(z_{j k}\) as in (7).
Caution. Matrix multiplication, so defined, is not commutative.
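Formula (7) and the caution can both be seen in a short sketch: multiplying the (arbitrary) matrices below reproduces the action of the composite map, while reversing the factors gives a different product.

```python
# [g][f] is the matrix of g o f; matmul implements z_jk = sum_i w_ji v_ik.
# The concrete 2 x 2 matrices are arbitrary illustrations.

def matmul(W, V):
    # (r x m) times (m x n) -> (r x n)
    return [[sum(W[j][i] * V[i][k] for i in range(len(V)))
             for k in range(len(V[0]))] for j in range(len(W))]

def apply(M, x):
    return [sum(r * xk for r, xk in zip(row, x)) for row in M]

F = [[1.0, 2.0], [0.0, 1.0]]   # [f]
G = [[2.0, 0.0], [1.0, 3.0]]   # [g]

x = [4.0, -1.0]
assert apply(matmul(G, F), x) == apply(G, apply(F, x))  # [g][f] acts as g o f

# the Caution in action: matrix multiplication is not commutative
assert matmul(G, F) != matmul(F, G)
```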
The set of all continuous linear maps \(f : E^{\prime} \rightarrow E\) (for fixed \(E^{\prime} \) and \(E\)) is denoted \(L(E^{\prime}, E).\)
If \(E=E^{\prime},\) we write \(L(E)\) instead.
For each \(f\) in \(L\left(E^{\prime}, E\right),\) we define its norm by
\[\|f\|=\sup _{|\vec{x}| \leq 1}|f(\vec{x})|.\]
Note that \(\|f\|<+\infty,\) by Theorem 1.
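As a worked instance of this norm, take again \(f(\vec{x})=\vec{x} \cdot \vec{v}\): by Cauchy-Schwarz, \(|f(\vec{x})| \leq|\vec{v}|\) whenever \(|\vec{x}| \leq 1,\) with equality at \(\vec{x}=\vec{v} /|\vec{v}|,\) so \(\|f\|=|\vec{v}|.\) A numerical sketch (the vector \(\vec{v}\) is an arbitrary choice):

```python
# ||f|| = sup over |x| <= 1 of |f(x)| for f(x) = x . v on R^2.
# Cauchy-Schwarz bounds the sup by |v|; equality holds at x = v/|v|.
import math, random

v = [3.0, -4.0]
norm_v = math.sqrt(sum(t * t for t in v))   # |v| = 5

def f(x):
    return sum(a * b for a, b in zip(x, v))

# sampled sup over unit vectors never exceeds |v| ...
random.seed(1)
best = 0.0
for _ in range(2000):
    t = random.uniform(0, 2 * math.pi)
    x = [math.cos(t), math.sin(t)]          # |x| = 1
    best = max(best, abs(f(x)))
assert best <= norm_v + 1e-9

# ... and the sup is attained at x = v/|v|
unit_v = [t / norm_v for t in v]
assert abs(f(unit_v) - norm_v) < 1e-12
```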
Theorem 4. \(L(E^{\prime}, E)\) is a normed linear space under the norm defined above and under the usual operations on functions, as in Corollary 1.
Proof.
Corollary 1 easily implies that \(L(E^{\prime}, E)\) is a vector space. We now show that \(\|\cdot\|\) is a genuine norm.
The triangle law,
\[\|f+g\| \leq\|f\|+\|g\|,\]
follows exactly as in Example (C) of Chapter 3, §10. (Verify!)
Also, by Problem 5 in Chapter 2, §§8-9, \(\sup |a f(\vec{x})|=|a| \sup |f(\vec{x})|.\) Hence \(\|a f\|=|a|\|f\|\) for any scalar \(a.\)
As noted above, \(0 \leq\|f\|<+\infty\).
It remains to show that \(\|f\|=0\) iff \(f\) is the zero map. If
\[\|f\|=\sup _{|\vec{x}| \leq 1}|f(\vec{x})|=0,\]
then \(|f(\vec{x})|=0\) when \(|\vec{x}| \leq 1.\) Hence, if \(\vec{x} \neq \overrightarrow{0}\),
\[f\left(\frac{\vec{x}}{|\vec{x}|}\right)=\frac{1}{|\vec{x}|} f(\vec{x})=0.\]
As \(f(\overrightarrow{0})=0,\) we have \(f(\vec{x})=0\) for all \(\vec{x} \in E^{\prime}\).
Thus \(\|f\|=0\) implies \(f=0,\) and the converse is clear. Thus all is proved. \(\quad \square\)
Note 5. A similar proof, via \(f\left(\frac{\vec{x}}{|\vec{x}|}\right)\) and properties of lub, shows that
\[\|f\|=\sup _{\vec{x} \neq \overrightarrow{0}}\left|\frac{f(\vec{x})}{|\vec{x}|}\right|\]
and
\[(\forall \vec{x} \in E^{\prime}) \quad|f(\vec{x})| \leq\|f\||\vec{x}|.\]
It also follows that \(\|f\|\) is the least real \(c\) such that
\[(\forall \vec{x} \in E^{\prime}) \quad|f(\vec{x})| \leq c|\vec{x}|.\]
Verify. (See Problem 3'.)
As in any normed space, we define distances in \(L(E^{\prime}, E)\) by
\[\rho(f, g)=\|f-g\|,\]
making it a metric space; so we may speak of convergence, limits, etc. in it.
Theorem 5. If \(f \in L(E^{\prime}, E^{\prime \prime})\) and \(g \in L(E^{\prime \prime}, E),\) then
\[\|g \circ f\| \leq\|g\|\|f\|.\]
Proof.
By Note 5,
\[\left(\forall \vec{x} \in E^{\prime}\right) \quad|g(f(\vec{x}))| \leq\|g\||f(\vec{x})| \leq\|g\|\|f\||\vec{x}|.\]
Hence
\[(\forall \vec{x} \neq \overrightarrow{0}) \quad\left|\frac{(g \circ f)(\vec{x})}{|\vec{x}|}\right| \leq\|g\|\|f\|,\]
and so
\[\|g\|\|f\| \geq \sup _{\vec{x} \neq \overrightarrow{0}} \frac{|(g \circ f)(\vec{x})|}{|\vec{x}|}=\|g \circ f\|. \quad \square\]
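The inequality \(\|g \circ f\| \leq\|g\|\|f\|\) can also be checked numerically for maps on \(E^{2}\): for a \(2 \times 2\) matrix \(A,\) the operator norm equals \(\sqrt{\lambda_{\max }\left(A^{T} A\right)},\) and for the \(2 \times 2\) symmetric matrix \(A^{T} A\) the largest eigenvalue is computable in closed form. The matrices below are arbitrary illustrations:

```python
# Check ||g o f|| <= ||g|| ||f|| on R^2 with the Euclidean operator norm.
import math

def matmul(X, Y):
    return [[sum(X[j][i] * Y[i][k] for i in range(2)) for k in range(2)]
            for j in range(2)]

def op_norm(A):
    # S = A^T A is symmetric 2x2; its largest eigenvalue is
    # (tr(S) + sqrt(tr(S)^2 - 4 det(S))) / 2, and ||A|| = sqrt of that.
    At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    S = matmul(At, A)
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    lam_max = (tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2
    return math.sqrt(lam_max)

A = [[1.0, 2.0], [-1.0, 3.0]]   # [f]
B = [[0.0, 1.0], [4.0, -2.0]]   # [g]

assert op_norm(matmul(B, A)) <= op_norm(B) * op_norm(A) + 1e-9
```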