# 8.2: Analysis with Vector spaces

- Last updated

- Save as PDF

- Page ID
- 32322

- Jiří Lebl
- Associate Professor (Mathematics) at Oklahoma State University

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)

Norms
Let us start measuring distance.
If \(X\) is a vector space, then we say a real valued function \(\left\lVert {\cdot} \right\rVert\) is a norm if:
\(\left\lVert {x} \right\rVert \geq 0\), with \(\left\lVert {x} \right\rVert=0\) if and only if \(x=0\).
\(\left\lVert {cx} \right\rVert = \left\lvert {c} \right\rvert\left\lVert {x} \right\rVert\) for all \(c \in {\mathbb{R}}\) and \(x \in X\).
\(\left\lVert {x+y} \right\rVert \leq \left\lVert {x} \right\rVert+\left\lVert {y} \right\rVert\) for all \(x,y \in X\) (Triangle inequality).
Before defining the standard norm on \({\mathbb{R}}^n\), let us define the standard scalar dot product on \({\mathbb{R}}^n\). For two vectors if \(x=(x^1,x^2,\ldots,x^n) \in {\mathbb{R}}^n\) and \(y=(y^1,y^2,\ldots,y^n) \in {\mathbb{R}}^n\), define \[x \cdot y := \sum_{j=1}^n x^j y^j .\] It is easy to see that the dot product is linear in each variable separately, that is, it is a linear mapping when you keep one of the variables constant. The Euclidean norm is then defined as \[\left\lVert {x} \right\rVert := \sqrt{x \cdot x} = \sqrt{(x^1)^2+(x^2)^2 + \cdots + (x^n)^2}.\] It is easy to see that the Euclidean norm satisfies (i) and (ii). To prove that (iii) holds, the key key inequality in the so-called Cauchy-Schwarz inequality that we have seen before. As this inequality is so important let us restate and reprove it using the notation of this chapter.
Let \(x, y \in {\mathbb{R}}^n\), then \[\left\lvert {x \cdot y} \right\rvert \leq \left\lVert {x} \right\rVert\left\lVert {y} \right\rVert = \sqrt{x\cdot x}\, \sqrt{y\cdot y},\] with equality if and only if the vectors are scalar multiples of each other.
If \(x=0\) or \(y = 0\), then the theorem holds trivially. So assume \(x\not= 0\) and \(y \not= 0\).
If \(x\) is a scalar multiple of \(y\), that is \(x = \lambda y\) for some \(\lambda \in {\mathbb{R}}\), then the theorem holds with equality: \[\left\lvert {\lambda y \cdot y} \right\rvert = \left\lvert {\lambda} \right\rvert \, \left\lvert {y\cdot y} \right\rvert = \left\lvert {\lambda} \right\rvert \, \left\lVert {y} \right\rVert^2 = \left\lVert {\lambda y} \right\rVert \left\lVert {y} \right\rVert .\]
Next take \(x+ty\), \[\left\lVert {x+ty} \right\rVert^2 = (x+ty) \cdot (x+ty) = x \cdot x + x \cdot ty + ty \cdot x + ty \cdot ty = \left\lVert {x} \right\rVert^2 + 2t(x \cdot y) + t^2 \left\lVert {y} \right\rVert^2 .\] If \(x\) is not a scalar multiple of \(y\), then \(\left\lVert {x+ty} \right\rVert^2 > 0\) for all \(t\). So the above polynomial in \(t\) is never zero. From elementary algebra it follows that the discriminant must be negative: \[4 {(x \cdot y)}^2 - 4 \left\lVert {x} \right\rVert^2\left\lVert {y} \right\rVert^2 < 0,\] or in other words \({(x \cdot y)}^2 < \left\lVert {x} \right\rVert^2\left\lVert {y} \right\rVert^2\).
Item (iii), the triangle inequality, follows via a simple computation: \[\left\lVert {x+y} \right\rVert^2 = x \cdot x + y \cdot y + 2 (x \cdot y) \leq \left\lVert {x} \right\rVert^2 + \left\lVert {y} \right\rVert^2 + 2 (\left\lVert {x} \right\rVert+\left\lVert {y} \right\rVert) = {(\left\lVert {x} \right\rVert + \left\lVert {y} \right\rVert)}^2 .\]
The distance \(d(x,y) := \left\lVert {x-y} \right\rVert\) is the standard distance function on \({\mathbb{R}}^n\) that we used when we talked about metric spaces.
In fact, on any vector space \(X\), once we have a norm (any norm), we define a distance \(d(x,y) := \left\lVert {x-y} \right\rVert\) that makes \(X\) into a metric space (an easy exercise).
Let \(A \in L(X,Y)\). Define \[\left\lVert {A} \right\rVert := \sup \{ \left\lVert {Ax} \right\rVert : x \in X ~ \text{with} ~ \left\lVert {x} \right\rVert = 1 \} .\] The number \(\left\lVert {A} \right\rVert\) is called the operator norm. We will see below that indeed it is a norm (at least for finite dimensional spaces). By linearity we get \[\left\lVert {A} \right\rVert = \sup \{ \left\lVert {Ax} \right\rVert : x \in X ~ \text{with} ~ \left\lVert {x} \right\rVert = 1 \} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\left\lVert {Ax} \right\rVert}{\left\lVert {x} \right\rVert} .\] This implies that \[\left\lVert {Ax} \right\rVert \leq \left\lVert {A} \right\rVert \left\lVert {x} \right\rVert .\]
It is not hard to see from the definition that \(\left\lVert {A} \right\rVert = 0\) if and only if \(A = 0\), that is, if \(A\) takes every vector to the zero vector.
For finite dimensional spaces \(\left\lVert {A} \right\rVert\) is always finite as we prove below. This also implies that \(A\) is continuous. For infinite dimensional spaces neither statement needs to be true. For a simple example, take the vector space of continuously differentiable functions on \([0,1]\) and as the norm use the uniform norm. The functions \(\sin(nx)\) have norm 1, but the derivatives have norm \(n\). So differentiation (which is a linear operator) has unbounded norm on this space. But let us stick to finite dimensional spaces now.
If \(A \in L({\mathbb{R}}^n,{\mathbb{R}}^m)\), then \(\left\lVert {A} \right\rVert < \infty\) and \(A\) is uniformly continuous (Lipschitz with constant \(\left\lVert {A} \right\rVert\)).
If \(A,B \in L({\mathbb{R}}^n,{\mathbb{R}}^m)\) and \(c \in {\mathbb{R}}\), then \[\left\lVert {A+B} \right\rVert \leq \left\lVert {A} \right\rVert+\left\lVert {B} \right\rVert, \qquad \left\lVert {cA} \right\rVert = \left\lvert {c} \right\rvert\left\lVert {A} \right\rVert .\] In particular \(L({\mathbb{R}}^n,{\mathbb{R}}^m)\) is a metric space with distance \(\left\lVert {A-B} \right\rVert\).
If \(A \in L({\mathbb{R}}^n,{\mathbb{R}}^m)\) and \(B \in L({\mathbb{R}}^m,{\mathbb{R}}^k)\), then \[\left\lVert {BA} \right\rVert \leq \left\lVert {B} \right\rVert \left\lVert {A} \right\rVert .\]
For (i), let \(x \in {\mathbb{R}}^n\). We know that \(A\) is defined by its action on a basis. Write \[x = \sum_{j=1}^n c^j e_j .\] Then \[\left\lVert {Ax} \right\rVert = \left\lVert {\sum_{j=1}^n c^j Ae_j} \right\rVert \leq \sum_{j=1}^n \left\lvert {c^j} \right\rvert \left\lVert {Ae_j} \right\rVert .\] If \(\left\lVert {x} \right\rVert = 1\), then it is easy to see that \(\left\lvert {c^j} \right\rvert \leq 1\) for all \(j\), so \[\left\lVert {Ax} \right\rVert \leq \sum_{j=1}^n \left\lvert {c^j} \right\rvert \left\lVert {Ae_j} \right\rVert \leq \sum_{j=1}^n \left\lVert {Ae_j} \right\rVert .\] The right hand side does not depend on \(x\) and so we are done, we have found a finite upper bound. Next, \[\left\lVert {A(x-y)} \right\rVert \leq \left\lVert {A} \right\rVert \left\lVert {x-y} \right\rVert\] as we mentioned above. So if \(\left\lVert {A} \right\rVert < \infty\), then this says that \(A\) is Lipschitz with constant \(\left\lVert {A} \right\rVert\).
For (ii), let us note that \[\left\lVert {(A+B)x} \right\rVert = \left\lVert {Ax+Bx} \right\rVert \leq \left\lVert {Ax} \right\rVert+\left\lVert {Bx} \right\rVert \leq \left\lVert {A} \right\rVert \left\lVert {x} \right\rVert+\left\lVert {B} \right\rVert\left\lVert {x} \right\rVert = (\left\lVert {A} \right\rVert+\left\lVert {B} \right\rVert) \left\lVert {x} \right\rVert .\] So \(\left\lVert {A+B} \right\rVert \leq \left\lVert {A} \right\rVert+\left\lVert {B} \right\rVert\).
Similarly \[\left\lVert {(cA)x} \right\rVert = \left\lvert {c} \right\rvert \left\lVert {Ax} \right\rVert \leq (\left\lvert {c} \right\rvert\left\lVert {A} \right\rVert) \left\lVert {x} \right\rVert .\] Thus \(\left\lVert {cA} \right\rVert \leq \left\lvert {c} \right\rvert\left\lVert {A} \right\rVert\). Next note \[\left\lvert {c} \right\rvert \left\lVert {Ax} \right\rVert = \left\lVert {cAx} \right\rVert \leq \left\lVert {cA} \right\rVert \left\lVert {x} \right\rVert .\] Hence \(\left\lvert {c} \right\rvert\left\lVert {A} \right\rVert \leq \left\lVert {cA} \right\rVert\).
That we have a metric space follows pretty easily, and is left to student.
For (iii) write \[\left\lVert {BAx} \right\rVert \leq \left\lVert {B} \right\rVert \left\lVert {Ax} \right\rVert \leq \left\lVert {B} \right\rVert \left\lVert {A} \right\rVert \left\lVert {x} \right\rVert . \qedhere\]
As a norm defines a metric, we have defined a metric space topology on \(L({\mathbb{R}}^n,{\mathbb{R}}^m)\) so we can talk about open/closed sets, continuity, and convergence. Note that we have defined a norm only on \({\mathbb{R}}^n\) and not on an arbitrary finite dimensional vector space. However, after picking bases, we can define a norm on any vector space in the same way. So we really have a topology on any \(L(X,Y)\), although the precise metric would depend on the basis picked.
Let \(U \subset L({\mathbb{R}}^n)\) be the set of invertible linear operators.
If \(A \in U\) and \(B \in L({\mathbb{R}}^n)\), and \[\label{eqcontineq} \left\lVert {A-B} \right\rVert < \frac{1}{\left\lVert {A^{-1}} \right\rVert},\] then \(B\) is invertible.
\(U\) is open and \(A \mapsto A^{-1}\) is a continuous function on \(U\).
The proposition says that \(U\) is an open set and \(A \mapsto A^{-1}\) is continuous on \(U\).
You should always think back to \({\mathbb{R}}^1\), where linear operators are just numbers \(a\). The operator \(a\) is invertible (\(a^{-1} = \nicefrac{1}{a}\)) whenever \(a \not=0\). Of course \(a \mapsto \nicefrac{1}{a}\) is continuous. When \(n > 1\), then there are other noninvertible operators, and in general things are a bit more difficult.
Let us prove (i). First a straight forward computation \[\left\lVert {x} \right\rVert = \left\lVert {A^{-1}Ax} \right\rVert \leq \left\lVert {A^{-1}} \right\rVert \left\lVert {Ax} \right\rVert \leq \left\lVert {A^{-1}} \right\rVert ( \left\lVert {(A-B)x} \right\rVert + \left\lVert {Bx} \right\rVert ) \leq \left\lVert {A^{-1}} \right\rVert\left\lVert {A-B} \right\rVert \left\lVert {x} \right\rVert + \left\lVert {A^{-1}} \right\rVert\left\lVert {Bx} \right\rVert .\] Now assume \(x \neq 0\) and so \(\left\lVert {x} \right\rVert \neq 0\). Using [eqcontineq] we obtain \[\left\lVert {x} \right\rVert < \left\lVert {x} \right\rVert + \left\lVert {A^{-1}} \right\rVert\left\lVert {Bx} \right\rVert ,\] or in other words \(\left\lVert {Bx} \right\rVert \not= 0\) for all nonzero \(x\), and hence \(Bx \not= 0\) for all nonzero \(x\). This is enough to see that \(B\) is one-to-one (if \(Bx = By\), then \(B(x-y) = 0\), so \(x=y\)). As \(B\) is one-to-one operator from \({\mathbb{R}}^n\) to \({\mathbb{R}}^n\) it is onto and hence invertible.
Let us look at (ii). Let \(B\) be invertible and near \(A^{-1}\), that is [eqcontineq] is satisfied. In fact, suppose \(\left\lVert {A-B} \right\rVert \left\lVert {A^{-1}} \right\rVert < \nicefrac{1}{2}\). Then we have shown above (using \(B^{-1}y\) instead of \(x\)) \[\left\lVert {B^{-1}y} \right\rVert \leq \left\lVert {A^{-1}} \right\rVert\left\lVert {A-B} \right\rVert \left\lVert {B^{-1}y} \right\rVert + \left\lVert {A^{-1}} \right\rVert\left\lVert {y} \right\rVert \leq \nicefrac{1}{2} \left\lVert {B^{-1}y} \right\rVert + \left\lVert {A^{-1}} \right\rVert\left\lVert {y} \right\rVert ,\] or \[\left\lVert {B^{-1}y} \right\rVert \leq %\frac{1}{1- \norm{A^{-1}}\norm{A-B}) \norm{A^{-1}}\norm{y} . 2\left\lVert {A^{-1}} \right\rVert\left\lVert {y} \right\rVert .\] So \(\left\lVert {B^{-1}} \right\rVert \leq 2 \left\lVert {A^{-1}} \right\rVert %\frac{\norm{A^{-1}}}{1- \norm{A^{-1}}\norm{A-B})} .\).
Now note that \[A^{-1}(A-B)B^{-1} = A^{-1}(AB^{-1}-I) = B^{-1}-A^{-1} ,\] and \[\left\lVert {B^{-1}-A^{-1}} \right\rVert = \left\lVert {A^{-1}(A-B)B^{-1}} \right\rVert \leq \left\lVert {A^{-1}} \right\rVert\left\lVert {A-B} \right\rVert\left\lVert {B^{-1}} \right\rVert \leq %\frac{\norm{A^{-1}}^2}{1- \norm{A^{-1}}\norm{A-B})} %\norm{A-B} %\leq 2\left\lVert {A^{-1}} \right\rVert^2 \left\lVert {A-B} \right\rVert . \qedhere\]
FIXME: continuity of vector space
Matrices
Finally let us get to matrices, which are a convenient way to represent finite-dimensional operators. If we have bases \(\{ x_1, x_2, \ldots, x_n \}\) and \(\{ y_1, y_2, \ldots, y_m \}\) for vector spaces \(X\) and \(Y\), then we know that a linear operator is determined by its values on the basis. Given \(A \in L(X,Y)\), define the numbers \(\{ a_i^j \}\) as follows \[A x_j = \sum_{i=1}^m a_j^i y_i ,\] and write them as a matrix \[A = \begin{bmatrix} a_1^1 & a_2^1 & \cdots & a_n^1 \\ a_1^2 & a_2^2 & \cdots & a_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ a_1^m & a_2^m & \cdots & a_n^m \end{bmatrix} .\] Note that the columns of the matrix are precisely the coefficients that represent \(A x_j\). Let us derive the familiar rule for matrix multiplication.
When \[x = \sum_{j=1}^n \gamma^j x_j ,\] then \[A x = \sum_{j=1}^n \sum_{i=1}^m \gamma^j a_j^i y_i , = \sum_{i=1}^m \left(\sum_{j=1}^n \gamma^j a_j^i \right) y_i ,\] which gives rise to the familiar rule for matrix multiplication.
There is a one-to-one correspondence between matrices and linear operators in \(L(X,Y)\). That is, once we fix a basis in \(X\) and in \(Y\). If we would choose a different basis, we would get different matrices. This is important, the operator \(A\) acts on elements of \(X\), the matrix is something that works with \(n\)-tuples of numbers.
If \(B\) is an \(r\)-by-\(m\) matrix with entries \(b_k^j\), then the matrix for \(BA\) has the \(i,k\)th entry \(c_k^i\) being \[c_k^i = \sum_{j=1}^m b_k^ja_j^i .\] Note how upper and lower indices line up.
A linear mapping changing one basis to another is then just a square matrix in which the columns represent basis elements of the second basis in terms of the first basis. We call such a linear mapping an change of basis.
Now suppose all the bases are just the standard bases and \(X={\mathbb{R}}^n\) and \(Y={\mathbb{R}}^m\). If we recall the Cauchy-Schwarz inequality we note that \[\left\lVert {Ax} \right\rVert^2 = \sum_{i=1}^m { \left(\sum_{j=1}^n \gamma^j a_j^i \right)}^2 \leq \sum_{i=1}^m { \left(\sum_{j=1}^n {(\gamma^j)}^2 \right) \left(\sum_{j=1}^n {(a_j^i)}^2 \right) } = \sum_{i=1}^m \left(\sum_{j=1}^n {(a_j^i)}^2 \right) \left\lVert {x} \right\rVert^2 .\] In other words, we have a bound on the operator norm \[\left\lVert {A} \right\rVert \leq \sqrt{\sum_{i=1}^m \sum_{j=1}^n {(a_j^i)}^2} .\] If the entries go to zero, then \(\left\lVert {A} \right\rVert\) goes to zero. In particular, if \(A\) if fixed and \(B\) is changing such that the entries of \(A-B\) go to zero then \(B\) goes to \(A\) in operator norm. That is \(B\) goes to \(A\) in the metric space topology induced by the operator norm. We have proved the first part of:
If \(f \colon S \to {\mathbb{R}}^{nm}\) is a continuous function for a metric space \(S\), then taking the components of \(f\) as the entries of a matrix, \(f\) is a continuous mapping from \(S\) to \(L({\mathbb{R}}^n,{\mathbb{R}}^m)\). Conversely if \(f \colon S \to L({\mathbb{R}}^n,{\mathbb{R}}^m)\) is a continuous function then the entries of the matrix are continuous functions.
The proof of the second part is rather easy. Take \(f(x) e_j\) and note that is a continuous function to \({\mathbb{R}}^m\) with standard Euclidean norm (Note \(\left\lVert {(A-B)e_j} \right\rVert \leq \left\lVert {A-B} \right\rVert\)). Such a function recall from last semester that such a function is continuous if and only if its components are continuous and these are the components of the \(j\)th column of the matrix \(f(x)\).
Determinants
It would be nice to have an easy test for when is a matrix invertible. This is where determinants come in. First define the symbol \(\operatorname{sgn}(x)\) for a number is defined by \[\operatorname{sgn}(x) := \begin{cases} -1 & \text{ if $x < 0$} , \\ 0 & \text{ if $x = 0$} , \\ 1 & \text{ if $x > 0$} . \end{cases}\] Suppose \(\sigma = (\sigma_1,\ldots,\sigma_n)\) is a permutation of the integers \((1,\ldots,n)\). It is not hard to see that any permutation can be obtained by a sequence of transpositions (switchings of two elements). Call a permutation even (resp. odd) if it takes an even (resp. odd) number of transpositions to get from \(\sigma\) to \((1,\ldots,n)\). It can be shown that this is well defined, in fact it is not hard to show that \[\operatorname{sgn}(\sigma) := \operatorname{sgn}(\sigma_1,\ldots,\sigma_n) = \prod_{p < q} \operatorname{sgn}(\sigma_q-\sigma_p)\] is \(1\) if \(\sigma\) is even and \(-1\) if \(\sigma\) is odd. This fact can be proved by noting that applying a transposition changes the sign, which is not hard to prove by induction on \(n\). Then note that the sign of \((1,2,\ldots,n)\) is 1.
Let \(S_n\) be the set of all permutations on \(n\) elements (the symmetric group). Let \(A= [a_j^i]\) be a matrix. Define the determinant of \(A\) \[\det(A) := \sum_{\sigma \in S_n} \operatorname{sgn} (\sigma) \prod_{i=1}^n a_{\sigma_i}^i .\]
\(\det(I) = 1\).
\(\det([x_1 x_2 \ldots x_n ])\) as a function of column vectors \(x_j\) is linear in each variable \(x_j\) separately.
If two columns of a matrix are interchanged, then the determinant changes sign.
If two columns of \(A\) are equal, then \(\det(A) = 0\).
If a column is zero, then \(\det(A) = 0\).
\(A \mapsto \det(A)\) is a continuous function.
\(\det\left[\begin{smallmatrix} a & b \\ c &d \end{smallmatrix}\right] = ad-bc\) and \(\det [a] = a\).
In fact, the determinant is the unique function that satisfies (i), (ii), and (iii). But we digress.
We go through the proof quickly, as you have likely seen this before.
(i) is trivial. For (ii) Notice that each term in the definition of the determinant contains exactly one factor from each column.
Part (iii) follows by noting that switching two columns is like switching the two corresponding numbers in every element in \(S_n\). Hence all the signs are changed. Part (iv) follows because if two columns are equal and we switch them we get the same matrix back and so part (iii) says the determinant must have been 0.
Part (v) follows because the product in each term in the definition includes one element from the zero column. Part (vi) follows as \(\det\) is a polynomial in the entries of the matrix and hence continuous. We have seen that a function defined on matrices is continuous in the operator norm if it is continuous in the entries. Finally, part (vii) is a direct computation.
If \(A\) and \(B\) are \(n\times n\) matrices, then \(\det(AB) = \det(A)\det(B)\). In particular, \(A\) is invertible if and only if \(\det(A) \not= 0\) and in this case, \(\det(A^{-1}) = \frac{1}{\det(A)}\).
Let \(b_1,b_2,\ldots,b_n\) be the columns of \(B\). Then \[AB = [ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ] .\] That is, the columns of \(AB\) are \(Ab_1,Ab_2,\ldots,Ab_n\).
Let \(b_j^i\) denote the elements of \(B\) and \(a_j\) the columns of \(A\). Note that \(Ae_j = a_j\). By linearity of the determinant as proved above we have \[\begin{split} \det(AB) & = \det ([ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ]) = \det \left(\left[ \sum_{j=1}^n b_1^ja_j \quad Ab_2 \quad \cdots \quad Ab_n \right]\right) \\ & = \sum_{j=1}^n b_1^j \det ([ a_j \quad Ab_2 \quad \cdots \quad Ab_n ]) \\ & = \sum_{1 \leq j_1,j_2,\ldots,j_n \leq n} b_1^{j_1} b_2^{j_2} \cdots b_n^{j_n} \det ([ a_{j_1} \quad a_{j_2} \quad \cdots \quad a_{j_n} ]) \\ & = \left( \sum_{(j_1,j_2,\ldots,j_n) \in S_n} b_1^{j_1} b_2^{j_2} \cdots b_n^{j_n} \operatorname{sgn}(j_1,j_2,\ldots,j_n) \right) \det ([ a_{1} \quad a_{2} \quad \cdots \quad a_{n} ]) . \end{split}\] In the above, go from all integers between 1 and \(n\), to just elements of \(S_n\) by noting that when two columns in the determinant are the same then the determinant is zero. We then reorder the columns to the original ordering and obtain the sgn.
The conclusion follows by recognizing the determinant of \(B\). The rows and columns are swapped, but a moment’s reflection reveals it does not matter. We could also just plug in \(A=I\) above.
For the second part of the theorem note that if \(A\) is invertible, then \(A^{-1}A = I\) and so \(\det(A^{-1})\det(A) = 1\). If \(A\) is not invertible, then the columns are linearly dependent. That is, suppose \[\sum_{j=1}^n c^j a_j = 0 .\] Without loss of generality suppose \(c^1\neq 1\). Take \[B := \begin{bmatrix} c^1 & 0 & 0 & \cdots & 0 \\ c^2 & 1 & 0 & \cdots & 0 \\ c^3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c^n & 0 & 0 & \cdots & 1 \end{bmatrix} .\] It is not hard to see from the definition that \(\det(B) = c^1 \not= 0\). Then \(\det(AB) = \det(A)\det(B) = c^1\det(A)\). Note that the first column of \(AB\) is zero, and hence \(\det(AB) = 0\). Thus \(\det(A) = 0\).
There are tree types of so-called elementary matrices. First for some \(j = 1,2,\ldots,n\) and some \(\lambda \in {\mathbb{R}}\), \(\lambda \neq 0\), an \(n \times n\) matrix \(E\) defined by \[Ee_i = \begin{cases} e_i & \text{if $i \neq j$} , \\ \lambda e_i & \text{if $i = j$} . \end{cases}\] Given any \(n \times m\) matrix \(M\) the matrix \(EM\) is the same matrix as \(M\) except with the \(k\)th row multiplied by \(\lambda\). It is an easy computation (exercise) that \(\det(E) = \lambda\).
Second, for some \(j\) and \(k\) with \(j\neq k\), and \(\lambda \in {\mathbb{R}}\) an \(n \times n\) matrix \(E\) defined by \[Ee_i = \begin{cases} e_i & \text{if $i \neq j$} , \\ e_i + \lambda e_k & \text{if $i = j$} . \end{cases}\] Given any \(n \times m\) matrix \(M\) the matrix \(EM\) is the same matrix as \(M\) except with \(\lambda\) times the \(k\)th row added to the \(j\)th row. It is an easy computation (exercise) that \(\det(E) = 1\).
Finally for some \(j\) and \(k\) with \(j\neq k\) an \(n \times n\) matrix \(E\) defined by \[Ee_i = \begin{cases} e_i & \text{if $i \neq j$ and $i \neq k$} , \\ e_k & \text{if $i = j$} , \\ e_j & \text{if $i = k$} . \end{cases}\] Given any \(n \times m\) matrix \(M\) the matrix \(EM\) is the same matrix with \(j\)th and \(k\)th rows swapped. It is an easy computation (exercise) that \(\det(E) = -1\).
Elementary matrices are useful for computing the determinant. The proof of the following proposition is left as an exercise.
[prop:elemmatrixdecomp] Let \(T\) be an \(n \times n\) invertible matrix. Then there exists a finite sequence of elementary matrices \(E_1, E_2, \ldots, E_k\) such that \[T = E_1 E_2 \cdots E_k ,\] and \[\det(T) = \det(E_1)\det(E_2)\cdots \det(E_k) .\]
Determinant is independent of the basis. In other words, if \(B\) is invertible then, \[\det(A) = \det(B^{-1}AB) .\]
The proof is immediate. If in one basis \(A\) is the matrix representing a linear operator, then for another basis we can find a matrix \(B\) such that the matrix \(B^{-1}AB\) takes us to the first basis, applies \(A\) in the first basis, and takes us back to the basis we started with. Therefore, the determinant can be defined as a function on the space \(L(X)\) for some finite dimensional metric space \(X\), not just on matrices. We choose a basis on \(X\), and we can represent a linear mapping using a matrix with respect to this basis. We obtain the same determinant as if we had used any other basis. It follows from the two propositions that \[\det \colon L(X) \to {\mathbb{R}}\] is a well-defined and continuous function.
Exercises
If \(X\) is a vector space with a norm \(\left\lVert {\cdot} \right\rVert\), then show that \(d(x,y) := \left\lVert {x-y} \right\rVert\) makes \(X\) a metric space.
Verify the computation of the determinant for the three types of elementary matrices.