# 8.1: Vector Spaces, linear Mappings, and Convexity

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

The euclidean space \({\mathbb{R}}^n\) has already made an appearance in the metric space chapter. In this chapter, we will extend the differential calculus we created for one variable to several variables. The key idea in differential calculus is to approximate functions by lines and linear functions. In several variables we must introduce a little bit of linear algebra before we can move on. So let us start with vector spaces and linear functions on vector spaces. While it is common to use \(\vec{x}\) or the bold \(\mathbf{x}\) for elements of \({\mathbb{R}}^n\), especially in the applied sciences, we use just plain \(x\), which is common in mathematics. That is \(x \in {\mathbb{R}}^n\) is a vector, which means that \(x = (x^1,x^2,\ldots,x^n)\) is an \(n\)-tuple of real numbers. We use upper indices for identifying components, leaving us the lower index for sequences of vectors. For example, we can have vectors \(x_1\) and \(x_2\) in \({\mathbb{R}}^n\) and then \(x_1 = (x_1^1,x_1^2,\ldots,x_1^n)\) and \(x_2 = (x_2^1,x_2^2,\ldots,x_2^n)\). It is common to write vectors as column vectors, that is, \(n \times 1\) matrices:

\[x = (x^1,x^2,\ldots,x^n) = \mbox{ \scriptsize $\begin{bmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{bmatrix}$ }.\]

We will do so when convenient. We call real numbers scalars to distinguish them from vectors. Let \(X\) be a set together with operations of addition, \(+ \colon X \times X \to X\), and multiplication, \(\cdot \colon {\mathbb{R}}\times X \to X\), (we write \(ax\) instead of \(a \cdot x\)). \(X\) is called a vector space (or a real vector space) if the following conditions are satisfied: (Addition is associative) If \(u, v, w \in X\), then \(u+(v+w) = (u+v)+w\). (Addition is commutative) If \(u, v \in X\), then \(u+v = v+u\). (Additive identity) There is a \(0 \in X\) such that \(v+0=v\) for all \(v \in X\). (Additive inverse) For every \(v \in X\), there is a \(-v \in X\), such that \(v+(-v)=0\). (Distributive law) If \(a \in {\mathbb{R}}\), \(u,v \in X\), then \(a(u+v) = au+av\). (Distributive law) If \(a,b \in {\mathbb{R}}\), \(v \in X\), then \((a+b)v = av+bv\). (Multiplication is associative) If \(a,b \in {\mathbb{R}}\), \(v \in X\), then \((ab)v = a(bv)\). (Multiplicative identity) \(1v = v\) for all \(v \in X\). Elements of a vector space are usually called vectors, even if they are not elements of \({\mathbb{R}}^n\) (vectors in the “traditional” sense). An example vector space is \({\mathbb{R}}^n\), where addition and multiplication by a constant is done componentwise: if \(\alpha \in {\mathbb{R}}\) and \(x,y \in {\mathbb{R}}^n\), then \[\begin{aligned} & x+y := (x^1,x^2,\ldots,x^n) + (y^1,y^2,\ldots,y^n) = (x^1+y^1,x^2+y^2,\ldots,x^n+y^n) , \\ & \alpha x := \alpha (x^1,x^2,\ldots,x^n) = (\alpha x^1, \alpha x^2,\ldots, \alpha x^n) .\end{aligned}\] In this book we mostly deal with vector spaces that can be regarded as subsets of \({\mathbb{R}}^n\), but there are other vector spaces that are useful in analysis. For example, the space \(C([0,1],{\mathbb{R}})\) of continuous functions on the interval \([0,1]\) is a vector space. A trivial example of a vector space (the smallest one in fact) is just \(X = \{ 0 \}\). The operations are defined in the obvious way. You always need a zero vector to exist, so all vector spaces are nonempty sets. It is also possible to use other fields than \({\mathbb{R}}\) in the definition (for example it is common to use the complex numbers \({\mathbb{C}}\)), but let us stick with the real numbers1. A function \(f \colon X \to Y\), when \(Y\) is not \({\mathbb{R}}\) is often called a mapping or a map rather than a function. Linear combinations and dimension If we have vectors \(x_1, \ldots, x_k \in {\mathbb{R}}^n\) and scalars \(a^1, \ldots, a^k \in {\mathbb{R}}\), then \[a^1 x_1 + a^2 x_2 + \cdots + a^k x_k\] is called a linear combination of the vectors \(x_1, \ldots, x_k\). If \(Y \subset {\mathbb{R}}^n\) is a set then the span of \(Y\), or in notation \(\operatorname{span}(Y)\), is the set of all linear combinations of some finite number of elements of \(Y\). We also say \(Y\) spans \(\operatorname{span}(Y)\). Let \(Y := \{ (1,1) \} \subset {\mathbb{R}}^2\). Then

\[\operatorname{span}(Y) = \{ (x,x) \in {\mathbb{R}}^2 : x \in {\mathbb{R}}\} .\]

That is, \(\operatorname{span}(Y)\) is the line through the origin and the point \((1,1)\). [example:vecspr2span] Let \(Y := \{ (1,1), (0,1) \} \subset {\mathbb{R}}^2\). Then

\[\operatorname{span}(Y) = {\mathbb{R}}^2 ,\]

as any point \((x,y) \in {\mathbb{R}}^2\) can be written as a linear combination \[(x,y) = x (1,1) + (y-x) (0,1) .\] A sum of two linear combinations is again a linear combination, and a scalar multiple of a linear combination is a linear combination, which proves the following proposition. Let \(X\) be a vector space. For any \(Y \subset X\), the set \(\operatorname{span}(Y)\) is a vector space itself. If \(Y\) is already a vector space then \(\operatorname{span}(Y) = Y\). A set of vectors \(\{ x_1, x_2, \ldots, x_k \}\) is linearly independent, if the only solution to

\[a^1 x_1 + a^2 x_2 + \cdots + a^k x_k = 0\]

is the trivial solution \(a^1 = a^2 = \cdots = a^k = 0\). A set that is not linearly independent, is linearly dependent. A linearly independent set \(B\) of vectors such that \(\operatorname{span}(B) = X\) is called a basis of \(X\). For example the set \(Y\) of the two vectors in is a basis of \({\mathbb{R}}^2\). If a vector space \(X\) contains a linearly independent set of \(d\) vectors, but no linearly independent set of \(d+1\) vectors then we say the dimension or \(\dim \, X := d\). If for all \(d \in {\mathbb{N}}\) the vector space \(X\) contains a set of \(d\) linearly independent vectors, we say \(X\) is infinite dimensional and write \(\dim \, X := \infty\). Clearly for the trivial vector space, \(\dim \, \{ 0 \} = 0\). We will see in a moment that any vector space that is a subset of \({\mathbb{R}}^n\) has a finite dimension, and that dimension is less than or equal to \(n\). If a set is linearly dependent, then one of the vectors is a linear combination of the others. In other words, if \(a^j \not= 0\), then we can solve for \(x_j\)

\[x_j = \frac{a^1}{a^j} x_1 + \cdots + \frac{a^{j-1}}{a^j} x_{j-1} + \frac{a^{j+1}}{a^j} x_{j+1} + \cdots + \frac{a^k}{a^k} x_k .\]

Clearly then the vector \(x_j\) has at least two different representations as linear combinations of \(\{ x_1,x_2,\ldots,x_k \}\). If \(B = \{ x_1, x_2, \ldots, x_k \}\) is a basis of a vector space \(X\), then every point \(y \in X\) has a unique representation of the form \[y = \sum_{j=1}^k \alpha^j x_j\] for some numbers \(\alpha^1, \alpha^2, \ldots, \alpha^k\). Every \(y \in X\) is a linear combination of elements of \(B\) since \(X\) is the span of \(B\). For uniqueness suppose \[y = \sum_{j=1}^k \alpha^j x_j = \sum_{j=1}^k \beta^j x_j ,\] then \[\sum_{j=1}^k (\alpha^j-\beta^j) x_j = 0 .\] By linear independence of the basis \(\alpha^j = \beta^j\) for all \(j\). For \({\mathbb{R}}^n\) we define \[e_1 := (1,0,0,\ldots,0) , \quad e_2 := (0,1,0,\ldots,0) , \quad \ldots, \quad e_n := (0,0,0,\ldots,1) ,\] and call this the standard basis of \({\mathbb{R}}^n\). We use the same letters \(e_j\) for any \({\mathbb{R}}^n\), and which space \({\mathbb{R}}^n\) we are working in is understood from context. A direct computation shows that \(\{ e_1, e_2, \ldots, e_n \}\) is really a basis of \({\mathbb{R}}^n\); it is easy to show that it spans \({\mathbb{R}}^n\) and is linearly independent. In fact,

\[x = (x^1,x^2,\ldots,x^n) = \sum_{j=1}^n x^j e_j .\]

Let \(X\) be a vector space. If \(X\) is spanned by \(d\) vectors, then \(\dim \, X \leq d\). \(\dim \, X = d\) if and only if \(X\) has a basis of \(d\) vectors (and so every basis has \(d\) vectors). In particular, \(\dim \, {\mathbb{R}}^n = n\). If \(Y \subset X\) is a vector space and \(\dim \, X = d\), then \(\dim \, Y \leq d\). If \(\dim \, X = d\) and a set \(T\) of \(d\) vectors spans \(X\), then \(T\) is linearly independent. If \(\dim \, X = d\) and a set \(T\) of \(m\) vectors is linearly independent, then there is a set \(S\) of \(d-m\) vectors such that \(T \cup S\) is a basis of \(X\). Let us start with (i). Suppose \(S = \{ x_1 , x_2, \ldots, x_d \}\) spans \(X\), and \(T = \{ y_1, y_2, \ldots, y_m \}\) is a set of linearly independent vectors of \(X\). We wish to show that \(m \leq d\). Write \[y_1 = \sum_{k=1}^d \alpha_1^k x_k ,\] which we can do as \(S\) spans \(X\). One of the \(\alpha_1^k\) is nonzero (otherwise \(y_1\) would be zero), so suppose without loss of generality that this is \(\alpha_1^1\). Then we can solve \[x_1 = \frac{1}{\alpha_1^1} y_1 - \sum_{k=2}^d \frac{\alpha_1^k}{\alpha_1^1} x_k .\] In particular \(\{ y_1 , x_2, \ldots, x_d \}\) span \(X\), since \(x_1\) can be obtained from \(\{ y_1 , x_2, \ldots, x_d \}\). Next, \[y_2 = \alpha_2^1 y_1 + \sum_{k=2}^d \alpha_2^k x_k .\] As \(T\) is linearly independent, we must have that one of the \(\alpha_2^k\) for \(k \geq 2\) must be nonzero. Without loss of generality suppose \(\alpha_2^2 \not= 0\). Proceed to solve for \[x_2 = \frac{1}{\alpha_2^2} y_2 - \frac{\alpha_2^1}{\alpha_2^2} y_1 - \sum_{k=3}^d \frac{\alpha_2^k}{\alpha_2^2} x_k .\] In particular \(\{ y_1 , y_2, x_3, \ldots, x_d \}\) spans \(X\). The astute reader will think back to linear algebra and notice that we are row-reducing a matrix. We continue this procedure. If \(m < d\), then we are done. So suppose \(m \geq d\). After \(d\) steps we obtain that \(\{ y_1 , y_2, \ldots, y_d \}\) spans \(X\). Any other vector \(v\) in \(X\) is a linear combination of \(\{ y_1 , y_2, \ldots, y_d \}\), and hence cannot be in \(T\) as \(T\) is linearly independent. So \(m = d\). Let us look at (ii). First notice that if we have a set \(T\) of \(k\) linearly independent vectors that do not span \(X\), then we can always choose a vector \(v \in X \setminus \operatorname{span}(T)\). The set \(T \cup \{ v \}\) is linearly independent (exercise). If \(\dim \, X = d\), then there must exist some linearly independent set of \(d\) vectors \(T\), and it must span \(X\), otherwise we could choose a larger set of linearly independent vectors. So we have a basis of \(d\) vectors. On the other hand if we have a basis of \(d\) vectors, it is linearly independent and spans \(X\). By (i) we know there is no set of \(d+1\) linearly independent vectors, so dimension must be \(d\). For (iii) notice that \(\{ e_1, e_2, \ldots, e_n \}\) is a basis of \({\mathbb{R}}^n\). To see (iv), suppose \(Y\) is a vector space and \(Y \subset X\), where \(\dim \, X = d\). As \(X\) cannot contain \(d+1\) linearly independent vectors, neither can \(Y\). For (v) suppose \(T\) is a set of \(m\) vectors that is linearly dependent and spans \(X\). Then one of the vectors is a linear combination of the others. Therefore if we remove it from \(T\) we obtain a set of \(m-1\) vectors that still span \(X\) and hence \(\dim \, X \leq m-1\). For (vi) suppose \(T = \{ x_1, \ldots, x_m \}\) is a linearly independent set. We follow the procedure above in the proof of (ii) to keep adding vectors while keeping the set linearly independent. As the dimension is \(d\) we can add a vector exactly \(d-m\) times. Linear mappings A mapping \(A \colon X \to Y\) of vector spaces \(X\) and \(Y\) is linear (or a linear transformation) if for every \(a \in {\mathbb{R}}\) and \(x,y \in X\) we have \[A(a x) = a A(x) \qquad A(x+y) = A(x)+A(y) .\] We usually write \(Ax\) instead of \(A(x)\) if \(A\) is linear. If \(A\) is one-to-one an onto then we say \(A\) is invertible and we denote the inverse by \(A^{-1}\). If \(A \colon X \to X\) is linear then we say \(A\) is a linear operator on \(X\). We write \(L(X,Y)\) for the set of all linear transformations from \(X\) to \(Y\), and just \(L(X)\) for the set of linear operators on \(X\). If \(a,b \in {\mathbb{R}}\) and \(A, B \in L(X,Y)\), define the transformation \(aA+bB\) \[(aA+bB)(x) = aAx+bBx .\] If \(A \in L(Y,Z)\) and \(B \in L(X,Y)\), define the transformation \(AB\) as \[ABx := A(Bx) .\] Finally denote by \(I \in L(X)\) the identity: the linear operator such that \(Ix = x\) for all \(x\). It is not hard to see that \(aA+bB \in L(X,Y)\), and that \(AB \in L(X,Z)\). In particular, \(L(X,Y)\) is a vector space. It is obvious that if \(A\) is linear then \(A0 = 0\). If \(A \colon X \to Y\) is invertible, then \(A^{-1}\) is linear. Let \(a \in {\mathbb{R}}\) and \(y \in Y\). As \(A\) is onto, then there is an \(x\) such that \(y = Ax\), and further as it is also one-to-one \(A^{-1}(Az) = z\) for all \(z \in X\). So

\[A^{-1}(ay) = A^{-1}(aAx) = A^{-1}\bigl(A(ax)\bigr) = ax = aA^{-1}(y).\]

Similarly let \(y_1,y_2 \in Y\), and \(x_1, x_2 \in X\) such that \(Ax_1 = y_1\) and \(Ax_2 = y_2\), then \[A^{-1}(y_1+y_2) = A^{-1}(Ax_1+Ax_2) = A^{-1}\bigl(A(x_1+x_2)\bigr) = x_1+x_2 = A^{-1}(y_1) + A^{-1}(y_2). \qedhere\] If \(A \colon X \to Y\) is linear then it is completely determined by its values on a basis of \(X\). Furthermore, if \(B\) is a basis, then any function \(\tilde{A} \colon B \to Y\) extends to a linear function on \(X\). For infinite dimensional spaces, the proof is essentially the same, but a little trickier to write, so let us stick with finitely many dimensions. We leave the infinite dimensional case to the reader. Let \(\{ x_1, x_2, \ldots, x_n \}\) be a basis and suppose \(A(x_j) = y_j\). Then every \(x \in X\) has a unique representation \[x = \sum_{j=1}^n b^j x_j\] for some numbers \(b^1,b^2,\ldots,b^n\). Then by linearity \[Ax = A\sum_{j=1}^n b^j x_j = \sum_{j=1}^n b^j Ax_j = \sum_{j=1}^n b^j y_j .\] The “furthermore” follows by defining the extension \(Ax = \sum_{j=1}^n b^j y_j\), and noting that this is well defined by uniqueness of the representation of \(x\). If \(X\) is a finite dimensional vector space and \(A \colon X \to X\) is linear, then \(A\) is one-to-one if and only if it is onto. Let \(\{ x_1,x_2,\ldots,x_n \}\) be a basis for \(X\). Suppose \(A\) is one-to-one. Now suppose

\[\sum_{j=1}^n c^j Ax_j = A\sum_{j=1}^n c^j x_j = 0 .\]

As \(A\) is one-to-one, the only vector that is taken to 0 is 0 itself. Hence, \[0 = \sum_{j=1}^n c^j x_j\] and so \(c^j = 0\) for all \(j\). Therefore, \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) is linearly independent. By an above proposition and the fact that the dimension is \(n\), we have that \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) span \(X\). As any point \(x \in X\) can be written as \[x = \sum_{j=1}^n a^j Ax_j = A\sum_{j=1}^n a^j x_j ,\] so \(A\) is onto. Now suppose \(A\) is onto. As \(A\) is determined by the action on the basis we see that every element of \(X\) has to be in the span of \(\{ Ax_1, \ldots, Ax_n \}\). Suppose \[A\sum_{j=1}^n c^j x_j = \sum_{j=1}^n c^j Ax_j = 0 .\] By the same proposition as \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) span \(X\), the set is independent, and hence \(c^j = 0\) for all \(j\). This means that \(A\) is one-to-one. If \(Ax = Ay\), then \(A(x-y) = 0\) and so \(x=y\). Convexity A subset \(U\) of a vector space is convex if whenever \(x,y \in U\), the line segment from \(x\) to \(y\) lies in \(U\). That is, if the convex combination \((1-t)x+ty\) is in \(U\) for all \(t \in [0,1]\). See . Note that in \({\mathbb{R}}\), every connected interval is convex. In \({\mathbb{R}}^2\) (or higher dimensions) there are lots of nonconvex connected sets. For example the set \({\mathbb{R}}^2 \setminus \{0\}\) is not convex but it is connected. To see this simply take any \(x \in {\mathbb{R}}^2 \setminus \{0\}\) and let \(y:=-x\). Then \((\nicefrac{1}{2})x + (\nicefrac{1}{2})y = 0\), which is not in the set. On the other hand, the ball \(B(x,r) \subset {\mathbb{R}}^n\) (using the standard metric on \({\mathbb{R}}^n\)) is always convex by the triangle inequality. Show that in \({\mathbb{R}}^n\) any ball \(B(x,r)\) for \(x \in {\mathbb{R}}^n\) and \(r > 0\) is convex. Any subspace \(V\) of a vector space \(X\) is convex. A somewhat more complicated example is given by the following. Let \(C([0,1],{\mathbb{R}})\) be the vector space of continuous real valued functions on \({\mathbb{R}}\). Let \(X \subset C([0,1],{\mathbb{R}})\) be the set of those \(f\) such

\[\int_0^1 f(x)~dx \leq 1 \qquad \text{and} \qquad f(x) \geq 0 \text{ for all $x \in [0,1]$} .\]

Then \(X\) is convex. Take \(t \in [0,1]\) and note that if \(f,g \in X\) then \(t f(x) + (1-t) g(x) \geq 0\) for all \(x\). Furthermore \[\int_0^1 \bigl(tf(x) + (1-t)g(x)\bigr) ~dx = t \int_0^1 f(x) ~dx + (1-t)\int_0^1 g(x) ~dx \leq 1 .\] Note that \(X\) is not a subspace of \(C([0,1],{\mathbb{R}})\). The intersection two closed sets is convex. In fact, If \(\{ C_\lambda \}_{\lambda \in I}\) is an arbitrary collection of convex sets, then

\[C := \bigcap_{\lambda \in I} C_\lambda\]

is convex. The proof is easy. If \(x, y \in C\), then \(x,y \in C_\lambda\) for all \(\lambda \in I\), and hence if \(t \in [0,1]\), then \(tx + (1-t)y \in C_\lambda\) for all \(\lambda \in I\). Therefore \(tx + (1-t)y \in C\) and \(C\) is convex. Let \(T \colon V \to W\) be a linear mapping between two vector spaces and let \(C \subset V\) be a convex set. Then \(T(C)\) is convex. Take any two points \(p,q \in T(C)\). Then pick \(x,y \in C\) such that \(T(x) = p\) and \(T(y)=q\). As \(C\) is convex then for all \(t \in [0,1]\) we have \(tx+(1-t)y \in C\), so \[T\bigl(tx+(1-t)y\bigr) = tT(x)+(1-t)T(y) = tp+(1-t)q\] is in \(T(C)\). For completeness, let us A very useful construction is the convex hull. Given any set \(S \subset V\) of a vector space, define the convex hull of \(S\), by

\[\operatorname{co}(S) := \bigcap \{ C \subset V : S \subset C, \text{ and $C$ is convex} \} .\]

That is, the convex hull is the smallest convex set containing \(S\). Note that by a proposition above, the intersection of convex sets is convex and hence, the convex hull is convex. The convex hull of 0 and 1 in \({\mathbb{R}}\) is \([0,1]\). Proof: Any convex set containing 0 and 1 must contain \([0,1]\). The set \([0,1]\) is convex, therefore it must be the convex hull. Exercises Verify that \({\mathbb{R}}^n\) is a vector space. Let \(X\) be a vector space. Prove that a finite set of vectors \(\{ x_1,\ldots,x_n \} \subset X\) is linearly independent if and only if for every \(j=1,2,\ldots,n\) \[\operatorname{span}( \{ x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_n \}) \subsetneq \operatorname{span}( \{ x_1,\ldots,x_n \}) .\] That is, the span of the set with one vector removed is strictly smaller. Prove that \(C([0,1],{\mathbb{R}})\) is an infinite dimensional vector space where the operations are defined in the obvious way: \(s=f+g\) and \(m=fg\) are defined as \(s(x) := f(x)+g(x)\) and \(m(x) := f(x)g(x)\). Hint: for the dimension, think of functions that are only nonzero on the interval \((\nicefrac{1}{n+1},\nicefrac{1}{n})\). Let \(k \colon [0,1]^2 \to {\mathbb{R}}\) be continuous. Show that \(L \colon C([0,1],{\mathbb{R}}) \to C([0,1],{\mathbb{R}})\) defined by \[Lf(y) := \int_0^1 k(x,y)f(x)~dx\] is a linear operator. That is, show that \(L\) is well defined (that \(Lf\) is continuous), and that \(L\) is linear.