3.3 The inverse of a matrix
Introduction
In the section on matrix operations we defined the sum and product of matrices (of compatible sizes), and we saw that to a certain extent matrix algebra is guided by the same rules as the arithmetic of real numbers. We can also subtract two matrices via \[ A - B = A + (-1)B, \] but we did not mention division of matrices. For two numbers \(a\) and \(b\), with \(a \neq 0\), the equation \[ ax = b \] has the unique solution \[ x = \frac{b}{a} = a^{-1}b = ba^{-1}, \] where \[ a^{-1} = \frac1a \] is the (unique) solution of the equation \[ ax = 1. \] The bad news: \[ \frac{A}{B} \text{ cannot be defined in any useful way!} \] First of all, the corresponding matrix equation \[ AX = B, \quad A \neq O, \] does not always have a solution, and when it does the solution need not be unique, not even in the case of two \(n \times n\) matrices \(A\) and \(B\). Two examples to illustrate this:
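For instance (made-up \(2 \times 2\) stand-ins; any pair with singular \(A\) exhibits the same behaviour): with \[ A = \left[\begin{array}{rr} 1 & 0 \\ 0 & 0 \end{array}\right] \quad \text{and} \quad B = \left[\begin{array}{rr} 0 & 0 \\ 0 & 1 \end{array}\right] \] the equation \(AX = B\) has no solution at all, since the second row of every product \(AX\) is a zero row. And with the same \(A\) and \(B = A\), both \(X = I\) and \(X = \left[\begin{array}{rr} 1 & 0 \\ 0 & 5 \end{array}\right]\) satisfy \(AX = B\), so when a solution exists it need not be unique.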
And lastly, if there were a matrix \(C\) for which \[ CA = I \] and we adopted the notation \[ C = A^{-1}, \] then \[ \begin{array}{rcl} AX = B &\Rightarrow& C(AX) = CB \\ &\Rightarrow& (CA)X = IX = X = CB = A^{-1}B. \end{array} \] So \(X = A^{-1}B\). However, it is in no way clear why \(A^{-1}B\) and \(BA^{-1}\) should be equal, and in general they are not, as the example below shows. So the notation \[ \dfrac{B}{A} \] would still be ambiguous: does it stand for \(A^{-1}B\) or for \(BA^{-1}\)? For non-square matrices things are even worse. In this section we will only consider square matrices.
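For instance, for the invertible matrix \(A = \left[\begin{array}{rr} 1 & 1 \\ 0 & 1 \end{array}\right]\), with \(A^{-1} = \left[\begin{array}{rr} 1 & -1 \\ 0 & 1 \end{array}\right]\), and \(B = \left[\begin{array}{rr} 0 & 1 \\ 1 & 0 \end{array}\right]\), we find \[ A^{-1}B = \left[\begin{array}{rr} -1 & 1 \\ 1 & 0 \end{array}\right] \quad \neq \quad \left[\begin{array}{rr} 0 & 1 \\ 1 & -1 \end{array}\right] = BA^{-1}. \]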
Definition and basic properties of the inverse
Definition. A square matrix \(A\) is called invertible (or regular) if there exists a matrix \(B\) for which \[ AB = BA = I. \] In that case \(B\) is called the inverse of \(A\), and it is denoted by \(A^{-1}\).

Note the use of the definite article the in the sentence '\(B\) is called the inverse of \(A\)'. The following proposition justifies this choice of word.
The proof is very short once we plug in the right idea at the right place:
Proof.
Suppose \(B\) and \(C\) are two matrices that satisfy the properties of being an inverse of \(A\), i.e. \[ AB = BA = I \quad \text{and} \quad AC = CA = I. \] Then the following chain of identities proves that \(B\) and \(C\) must be equal: \[ B = B I = B (AC) = (BA) C = I C = C. \]
Actually, the proof shows slightly more, as the assumptions \[ CA = I, \quad AB = I \] are not used. In fact it shows that for three \(n \times n\) matrices \(A\), \(B\) and \(C\): \[ \text{if} \quad BA = I \quad \text{and} \quad AC = I, \quad \text{then} \quad B = C. \]
The first example can be generalized:
We leave the verification as an exercise.
The following proposition shows that the above considerations can be generalized.
Proof.
As with earlier equivalences, we have to prove two implications: \[ AX = I \text{ has a unique solution} \;\Longrightarrow\; A \text{ has linearly independent columns} \] and \[ A \text{ has linearly independent columns} \;\Longrightarrow\; AX = I \text{ has a unique solution.} \]

For the first part, assume that \(AX = I\) has a (unique) solution. That means that every column \(\vect{e}_j\) of the identity matrix is a linear combination of the columns \(\vect{a}_1, \ldots, \vect{a}_n\) of \(A\). So the span of the columns of \(A\) contains the span of \(\vect{e}_1, \ldots, \vect{e}_n\), which is the whole of \(\mathbb{R}^n\). Thus every linear system \[ A\vect{x} = \vect{b}, \quad \vect{b} \in \mathbb{R}^{n}, \] has a solution. Then the reduced echelon form of \(A\) must have a pivot in every row, and, since \(A\) is a square matrix, that reduced echelon form must be the identity matrix. Consequently, it has a pivot in every column, so the linear system \[ A\vect{x} = \vect{0} \] has only the trivial solution, which proves that the columns of \(A\) are indeed linearly independent.

For the converse, suppose that \(A\) has linearly independent columns. Then the reduced echelon form of \(A\) must be the identity matrix. This implies that for each \(\vect{b}\) in \(\mathbb{R}^n\) \[ [\, A \,|\, \vect{b}\, ] \sim [\, I \,|\, \vect{b}'\, ], \] and in particular each linear system \[ A\vect{x} = \vect{e}_j \] has a unique solution. If we denote this solution by \(\vect{c}_j\), we have \[ A[\, \vect{c}_1 \quad \vect{c}_2 \quad \ldots \quad \vect{c}_n\, ] = [\, A\vect{c}_1 \quad A\vect{c}_2 \quad \ldots \quad A\vect{c}_n\, ] = [\, \vect{e}_1 \quad \vect{e}_2 \quad \ldots \quad \vect{e}_n\, ] = I. \] Since all the solutions \(\vect{c}_j\) are unique, the solution of the matrix equation \[ AX = I \] is unique as well.
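The second half of this proof is constructive: solve \(A\vect{x} = \vect{e}_j\) for each \(j\) and place the solutions side by side. A minimal numerical sketch of this idea (assuming NumPy; the matrix is an arbitrary invertible example, not one from the text):

```python
import numpy as np

# An invertible example matrix (any matrix with independent columns works).
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 0.0]])

n = A.shape[0]
I = np.eye(n)

# Solve A c_j = e_j for each standard basis vector e_j and assemble
# the solutions as the columns of X, so that A X = I.
X = np.column_stack([np.linalg.solve(A, I[:, j]) for j in range(n)])

print(np.allclose(A @ X, I))  # True: X solves AX = I
print(np.allclose(X @ A, I))  # True as well: X is the (two-sided) inverse
```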
It makes sense that the solution \(B\) of this matrix equation will be the inverse of \(A\), and it is; but it takes some effort to show that the other requirement, \[ BA = I, \] is also fulfilled. In the next subsection we will see that the matrix equation \[ AX = I \] leads the way to an algorithm to compute the inverse of a matrix. Before we go there, we look at some general properties of invertible matrices.
Proof.
We multiply both sides of the equation \[ AX = B \] by \(A^{-1}\) and use the fact that the matrix product is associative: \[ AX = B \;\Longrightarrow\; A^{-1}(AX) = A^{-1}B \;\Longrightarrow\; (A^{-1}A)X = IX = A^{-1}B \;\Longrightarrow\; X = A^{-1}B. \]
We illustrate the proposition by an example.
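For instance (a made-up instance): take \[ A = \left[\begin{array}{rr} 1 & 2 \\ 3 & 5 \end{array}\right], \quad B = \left[\begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array}\right]. \] Here \(A\) is invertible with \(A^{-1} = \left[\begin{array}{rr} -5 & 2 \\ 3 & -1 \end{array}\right]\), so the unique solution of \(AX = B\) is \[ X = A^{-1}B = \left[\begin{array}{rr} -5 & 2 \\ 3 & -1 \end{array}\right] \left[\begin{array}{rr} 1 & 1 \\ 2 & 0 \end{array}\right] = \left[\begin{array}{rr} -1 & -5 \\ 1 & 3 \end{array}\right], \] and substituting back confirms \(AX = B\).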
A note of warning: the proof of Proposition 11 relies on the existence of the inverse of the matrix \(A\). Beware of this: never start using the expression \(A^{-1}\) unless you have first made sure that the matrix \(A\) is indeed invertible. If not, you may run into inconsistencies like the one in the following example:
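One possible inconsistency of this kind: for \[ A = \left[\begin{array}{rr} 1 & 1 \\ 1 & 1 \end{array}\right], \quad X = \left[\begin{array}{rr} 1 & 0 \\ 0 & 0 \end{array}\right], \quad Y = \left[\begin{array}{rr} 0 & 0 \\ 1 & 0 \end{array}\right] \] we have \(AX = AY\), so blindly 'multiplying by \(A^{-1}\)' would yield the false conclusion \(X = Y\). The fallacy: this \(A\) is not invertible, so \(A^{-1}\) simply does not exist.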
The next proposition contains a few rules to manipulate inverse matrices.
Suppose \(A\) is an invertible matrix and \(c \neq 0\). Then the following hold.
- (i) The matrix \(cA\) is invertible, and \[ (cA)^{-1} = \dfrac1c A^{-1}. \]
- (ii) The matrix \(A^T\) is invertible, and \[ (A^T)^{-1} = (A^{-1})^T. \]
- (iii) The matrix \(A^{-1}\) is invertible, and \[ (A^{-1})^{-1} = A. \]
Proof.
All statements can be proved by verifying that the relevant products are equal to \(I \).
- (i) The matrix \(A^{-1}\) exists, and so does \(\dfrac1c A^{-1}\). We find: \[ (cA) \cdot \dfrac1c A^{-1} = \left(c \cdot \dfrac1c\right) A A^{-1} = 1 \cdot I = I, \] and likewise \(\dfrac1c A^{-1} \cdot (cA) = I\), which proves that indeed \(\dfrac1c A^{-1} = (cA)^{-1}\).
- (ii) Since it is given that \(A^{-1}\) exists, we can proceed as follows, using the rule \((AB)^T = B^TA^T\): \[ (A^{-1})^TA^T = (AA^{-1})^T = I^T = I \quad \text{and} \quad A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I, \] which settles the second statement. For (iii), see Exercise 15.
The next example gives an illustration of property (ii).
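For instance (a made-up \(2 \times 2\) instance): for \[ A = \left[\begin{array}{rr} 1 & 2 \\ 0 & 1 \end{array}\right], \quad A^{-1} = \left[\begin{array}{rr} 1 & -2 \\ 0 & 1 \end{array}\right], \] we find \[ (A^T)^{-1} = \left[\begin{array}{rr} 1 & 0 \\ 2 & 1 \end{array}\right]^{-1} = \left[\begin{array}{rr} 1 & 0 \\ -2 & 1 \end{array}\right] = (A^{-1})^T. \]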
The last property we mention and prove is the product rule for the matrix inverse:
Proof.
Again we just check that the properties of the definition hold: suppose that \(A\) and \(B\) are invertible with inverses \(A^{-1}\) and \(B^{-1}\). Then using the associative property we find \[ (B^{-1}A^{-1})(AB) = B^{-1}A^{-1}AB = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I, \] and along the same lines \[ (AB)B^{-1}A^{-1} = I. \] This shows that \(B^{-1}A^{-1}\) is indeed the inverse of \(AB\).
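A quick numerical check of this product rule (a sketch assuming NumPy; the two matrices are arbitrary invertible examples):

```python
import numpy as np

# Two invertible example matrices (both have nonzero determinant).
A = np.array([[1.0, 2.0], [3.0, 5.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])

lhs = np.linalg.inv(A @ B)                 # (AB)^{-1}
rhs = np.linalg.inv(B) @ np.linalg.inv(A)  # B^{-1} A^{-1}

print(np.allclose(lhs, rhs))  # True
# Note the order: A^{-1} B^{-1} is in general NOT the inverse of AB.
print(np.allclose(np.linalg.inv(A) @ np.linalg.inv(B), lhs))  # False for this pair
```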
How to compute the inverse
In fact the construction of the inverse of a matrix was already implicitly present in Proposition 10. The inverse of the matrix \(A\) must satisfy the equation \(AX = I\). Written out column by column this means that \[ AX = I \quad \iff \quad A[\, \vect{x}_1 \quad \vect{x}_2 \quad \ldots \quad \vect{x}_n\, ] = [\, \vect{e}_1 \quad \vect{e}_2 \quad \ldots \quad \vect{e}_n\, ]. \] For the existence of a solution of this equation, Proposition 10 tells us that it is necessary that \(A\) has independent columns, and we can furthermore read off that the columns of the matrix \(X\) will be the (unique) solutions of the linear systems \[ A\vect{x}_k = \vect{e}_k, \quad k = 1, 2, \ldots, n. \] So let us first focus on this equation by considering a fairly general \(3 \times 3\) matrix \(A\).
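As a concrete stand-in for that computation (a specific matrix rather than a fairly general one), take \[ A = \left[\begin{array}{rrr} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{array}\right]. \] Solving the three systems \(A\vect{x}_k = \vect{e}_k\) simultaneously amounts to row reducing the augmented matrix \([\, A \,|\, I\,]\): \[ \left[\begin{array}{rrr|rrr} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{rrr|rrr} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & -1 & 1 \\ 0 & 1 & 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right], \] so that \[ X = \left[\begin{array}{rrr} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{array}\right] \] satisfies \(AX = I\), and one checks directly that \(XA = I\) holds as well.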
Now, was this just beginners' luck? It wasn't, as the next proposition shows.
Proof.
We have already seen (Proposition 10) that an invertible matrix has independent columns, which implies that the reduced echelon form of \(A\) is indeed the identity matrix. And then it is clear that via row operations we get \[ [\, A \,|\, I\,] \sim \;\cdots\; \sim [\, I \,|\, B\,], \] where the matrix \(B\) satisfies \(AB = I\). What we have to show is that \[ BA = I \] as well. To see that this is indeed true, we recall (see the definition of elementary matrices) that row operations can be carried out via multiplication by elementary matrices. Furthermore, since the matrix product is defined column by column, i.e. \[ MX = M\left[\begin{array}{cccc} \vect{x}_1 & \vect{x}_2 & \ldots & \vect{x}_p \end{array}\right] = \left[\begin{array}{cccc} M\vect{x}_1 & M\vect{x}_2 & \ldots & M\vect{x}_p \end{array}\right], \] we also have \[ E[\, A_1 \,|\, A_2\,] = [\, EA_1 \,|\, EA_2\,]. \] A series of \(k\) row operations can be mimicked by \(k\) multiplications with elementary matrices: \[ \begin{array}{ccl} [\, A \,|\, I\,] &\sim& [\, E_1A \,|\, E_1I\,] \sim [\, E_2E_1A \,|\, E_2E_1I\,] \sim \ldots \\ &\sim& [\, E_k\cdots E_2E_1A \,|\, E_k\cdots E_2E_1I\,] = [\, I \,|\, B\,]. \end{array} \] So the matrix \(B\) that was found as the solution of the matrix equation \[ AX = I \] is the product of all the elementary matrices by which \(A\) is reduced to the identity matrix. Thus we have shown that indeed \[ BA = (E_k\cdots E_2E_1)A = I. \]
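For instance (a small made-up case): the row operations \(R_2 - 3R_1\), then \(-\tfrac12 R_2\), then \(R_1 - 2R_2\) reduce \(A = \left[\begin{array}{rr} 1 & 2 \\ 3 & 4 \end{array}\right]\) to \(I\). The corresponding elementary matrices are \[ E_1 = \left[\begin{array}{rr} 1 & 0 \\ -3 & 1 \end{array}\right], \quad E_2 = \left[\begin{array}{rr} 1 & 0 \\ 0 & -\frac12 \end{array}\right], \quad E_3 = \left[\begin{array}{rr} 1 & -2 \\ 0 & 1 \end{array}\right], \] and indeed \[ E_3E_2E_1 = \left[\begin{array}{rr} -2 & 1 \\ \frac32 & -\frac12 \end{array}\right] = A^{-1}. \]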
If \(A\) is not invertible, then the row reduction of \[ [\, A \,|\, I\,] \] will still settle the matter: as soon as it is clear that \(A\) cannot be row reduced to \(I\), we can conclude that \(A\) is not invertible.
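For instance, for the (non-invertible) matrix \(A = \left[\begin{array}{rr} 1 & 2 \\ 2 & 4 \end{array}\right]\) one row operation already settles the matter: \[ \left[\begin{array}{rr|rr} 1 & 2 & 1 & 0 \\ 2 & 4 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{rr|rr} 1 & 2 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{array}\right]. \] The zero row in the left block shows that \(A\) cannot be row reduced to \(I\), so \(A\) is not invertible.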
To help understand the above exposition, let us run through the whole procedure for a specific matrix.
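The procedure also translates directly into code. Here is a minimal sketch of the \([\, A \,|\, I\,]\) row-reduction algorithm (plain Python with NumPy; an illustrative implementation, not a library routine):

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    """Invert a square matrix by row reducing [A | I] to [I | A^{-1}].

    Raises ValueError if A turns out not to be invertible,
    i.e. if no usable pivot can be found in some column.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])  # the augmented matrix [A | I]

    for col in range(n):
        # Partial pivoting: pick the largest entry in this column (for stability).
        pivot_row = col + np.argmax(np.abs(M[col:, col]))
        if abs(M[pivot_row, col]) < tol:
            raise ValueError("matrix is not invertible")
        M[[col, pivot_row]] = M[[pivot_row, col]]  # swap rows
        M[col] /= M[col, col]                      # scale pivot row to a leading 1
        for row in range(n):                       # eliminate the column elsewhere
            if row != col:
                M[row] -= M[row, col] * M[col]

    return M[:, n:]  # the right block is now A^{-1}

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
print(inverse_by_row_reduction(A))                              # matches the hand computation above
print(np.allclose(A @ inverse_by_row_reduction(A), np.eye(3)))  # True
```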
Characterizations of invertibility
In the previous subsections quite a few properties of invertible matrices came along, either explicitly or implicitly. For future reference we list them in a theorem. Recall: by definition a (square) matrix \(A\) is invertible (or: regular) if and only if there exists a matrix \(B\) for which \[ AB = BA = I. \] For an \(n \times n\) matrix \(A\), the following statements are equivalent:
- \(A \) is invertible;
- there exists a matrix \(B \) for which \(AB = I \);
- for each \(\vect{b}\in\mathbb{R}^n \) the linear system \(A\vect{x} = \vect{b} \) has a unique solution;
- \(A \) is row equivalent to the identity matrix \(I_n \);
- \(A \) has independent columns;
- \(A\) can be written as a product of elementary matrices: \[ A = E_1E_2\cdots E_k. \]
Proof.
It is a good exercise to find out where each characterization was established in the preceding text, and wherever necessary to fill in the missing details.
There are many variations on Theorem 25. The following exercise contains a few.
- there is a matrix \(B\) such that \(BA = I\);
- \(A\) has independent rows;
- each column of the matrix \(A\) is a pivot column;
- the columns of \(A\) span the whole of \(\mathbb{R}^n\).

Again, it may very well be that you have to resort to previous sections.