
6.1: Linear Transformations and Matrices


    In this and subsequent sections it will often be convenient to write vectors vertically; thus, instead of \(\mathbf{X}=(x_1,x_2, \dots,x_n)\) we will write \[ \mathbf{X}=\left[\begin{array}{c} x_1\\ x_2\\\vdots\\ x_n\end{array}\right] \nonumber \] when dealing with matrix operations. Although we assume that you have completed a course in linear algebra, we will review the pertinent matrix operations.

    We have defined vector-valued functions as ordered \(n\)-tuples of real-valued functions, in connection with composite functions \(h=f\circ \mathbf{G}\), where \(f\) is real-valued and \(\mathbf{G}\) is vector-valued. We now consider vector-valued functions as objects of interest on their own.

    If \(f_1\), \(f_2\), \(\dots\), \(f_m\) are real-valued functions defined on a set \(D\) in \(\R^n\), then \[ \mathbf{F}=\left[\begin{array}{c} f_1\\ f_2\\\vdots\\ f_m\end{array}\right] \nonumber \] assigns to every \(\mathbf{X}\) in \(D\) an \(m\)-vector \[ \mathbf{F}(\mathbf{X})=\left[\begin{array}{c} f_1(\mathbf{X})\\ f_2(\mathbf{X}) \\\vdots\\ f_m(\mathbf{X})\end{array}\right]. \nonumber \] Recall that \(f_1\), \(f_2\), \(\dots\), \(f_m\) are the component functions, or simply components, of \(\mathbf{F}\). We write \[ \mathbf{F}: \R^n \to \R^m \nonumber \] to indicate that the domain of \(\mathbf{F}\) is in \(\R^n\) and the range of \(\mathbf{F}\) is in \(\R^m\). We also say that \(\mathbf{F}\) is a transformation from \(\R^n\) to \(\R^m\). If \(m=1\), we identify \(\mathbf{F}\) with its single component function \(f_1\) and regard it as a real-valued function.


    The simplest interesting transformations from \(\R^n\) to \(\R^m\) are the linear transformations: a transformation \(\mathbf{L}:\R^n\to\R^m\) is linear if \[ \mathbf{L}(\mathbf{X}+\mathbf{Y})=\mathbf{L}(\mathbf{X})+\mathbf{L}(\mathbf{Y}) \mbox{\quad and\quad} \mathbf{L}(a\mathbf{X})=a\mathbf{L}(\mathbf{X}) \nonumber \] for all \(\mathbf{X}\) and \(\mathbf{Y}\) in \(\R^n\) and all real numbers \(a\).


    It can be seen by induction (Exercise~) that if \(\mathbf{L}\) is linear, then \[\begin{equation}\label{eq:6.1.2} \mathbf{L}(a_1\mathbf{X}_1+a_2\mathbf{X}_2+\cdots+a_k\mathbf{X}_k)= a_1\mathbf{L}(\mathbf{X}_1)+a_2\mathbf{L}(\mathbf{X}_2)+\cdots+a_k\mathbf{L}(\mathbf{X}_k) \end{equation} \nonumber \] for any vectors \(\mathbf{X}_1\), \(\mathbf{X}_2\), \(\dots\), \(\mathbf{X}_k\) and real numbers \(a_1\), \(a_2\), \(\dots\), \(a_k\). Any \(\mathbf{X}\) in \(\R^n\) can be written as \[\begin{aligned} \mathbf{X}&=\left[\begin{array}{c} x_1\\ x_2\\\vdots\\ x_n\end{array}\right] =x_1\left[\begin{array}{c} 1\\ 0\\\vdots\\ 0\end{array}\right] +x_2\left[\begin{array}{c} 0\\ 1\\\vdots\\ 0\end{array}\right]+\cdots +x_n\left[\begin{array}{c} 0\\ 0\\\vdots\\ 1\end{array}\right]\\ &=x_1\mathbf{E}_1+x_2\mathbf{E}_2+\cdots+x_n\mathbf{E}_n. \end{aligned} \nonumber \] Applying \eqref{eq:6.1.2} with \(k=n\), \(\mathbf{X}_i=\mathbf{E}_i\), and \(a_i=x_i\) yields \[\begin{equation}\label{eq:6.1.3} \mathbf{L}(\mathbf{X})=x_1\mathbf{L}(\mathbf{E}_1)+x_2\mathbf{L}(\mathbf{E}_2) +\cdots+x_n\mathbf{L}(\mathbf{E}_n). \end{equation} \nonumber \] Now denote \[ \mathbf{L}(\mathbf{E}_j)=\left[\begin{array}{c} a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj}\end{array}\right], \nonumber \] so \eqref{eq:6.1.3} becomes \[ \mathbf{L}(\mathbf{X})=x_1\left[\begin{array}{c} a_{11}\\ a_{21}\\\vdots\\ a_{m1} \end{array}\right] +x_2\left[\begin{array}{c} a_{12}\\ a_{22}\\\vdots\\ a_{m2}\end{array} \right]+\cdots +x_n\left[\begin{array}{c} a_{1n}\\ a_{2n}\\\vdots\\ a_{mn}\end{array} \right]; \nonumber \] that is, the \(i\)th component of \(\mathbf{L}(\mathbf{X})\) is \(a_{i1}x_1+a_{i2}x_2+\cdots+a_{in}x_n\), \(1\le i\le m\). This proves that if \(\mathbf{L}\) is linear, then \(\mathbf{L}\) has this form. We leave the proof of the converse to you (Exercise~).

    We call the rectangular array \[\begin{equation}\label{eq:6.1.4} \mathbf{A}=\left[\begin{array}{cccc} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn}\end{array}\right] \end{equation} \nonumber \]

    the matrix of the linear transformation \(\mathbf{L}\). The number \(a_{ij}\) in the \(i\)th row and \(j\)th column of \(\mathbf{A}\) is called the \((i,j)\)th entry of \(\mathbf{A}\). We say that \(\mathbf{A}\) is an \(m\times n\) matrix, since \(\mathbf{A}\) has \(m\) rows and \(n\) columns. We will sometimes abbreviate \eqref{eq:6.1.4} as \[ \mathbf{A}=[a_{ij}]. \nonumber \]
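
    For example, the linear transformation \(\mathbf{L}:\R^2\to\R^3\) defined by \[ \mathbf{L}(x_1,x_2)=\left[\begin{array}{c} x_1+2x_2\\ 3x_1-x_2\\ x_2\end{array}\right] \nonumber \] satisfies \[ \mathbf{L}(\mathbf{E}_1)=\left[\begin{array}{c} 1\\ 3\\ 0\end{array}\right] \mbox{\quad and\quad} \mathbf{L}(\mathbf{E}_2)=\left[\begin{array}{r} 2\\ -1\\ 1\end{array}\right], \nonumber \] so its matrix is the \(3\times2\) matrix \[ \mathbf{A}=\left[\begin{array}{rr} 1&2\\ 3&-1\\ 0&1\end{array}\right], \nonumber \] and \(\mathbf{L}(\mathbf{X})=\mathbf{A}\mathbf{X}\) for every \(\mathbf{X}\) in \(\R^2\).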

    We will now recall the matrix operations that we need to study the differential calculus of transformations.

    We leave the proofs of the next three theorems to you as exercises.

    The next theorem shows why Definition~ is appropriate. We leave the proof to you (Exercise~).

    If a real-valued function \(f: \R^n\to \R\) is differentiable at \(\mathbf{X}_0\), then \[ d_{\mathbf{X}_0}f=f_{x_1}(\mathbf{X}_0)\,dx_1+f_{x_2} (\mathbf{X}_0)\,dx_2+\cdots+f_{x_n} (\mathbf{X}_0)\,dx_n. \nonumber \] This can be written as a matrix product \[\begin{equation}\label{eq:6.1.5} d_{\mathbf{X}_0}f=[f_{x_1}(\mathbf{X}_0)\quad f_{x_2}(\mathbf{X}_0)\quad \cdots\quad f_{x_n}(\mathbf{X}_0)] \left[\begin{array}{c} dx_1\\ dx_2\\\vdots\\ dx_n\end{array}\right]. \end{equation} \nonumber \] We define the differential matrix of \(f\) at \(\mathbf{X}_0\) by \[\begin{equation}\label{eq:6.1.6} f'(\mathbf{X}_0)=[f_{x_1}(\mathbf{X}_0)\quad f_{x_2}(\mathbf{X}_0)\quad\cdots\quad f_{x_n}(\mathbf{X}_0)] \end{equation} \nonumber \] and we define \(d\mathbf{X}\) by \[ d\mathbf{X}=\left[\begin{array}{c} dx_1\\ dx_2\\\vdots\\ dx_n\end{array}\right]. \nonumber \]

    Then \eqref{eq:6.1.5} can be rewritten as \[\begin{equation}\label{eq:6.1.7} d_{\mathbf{X}_0}f=f'(\mathbf{X}_0)\,d\mathbf{X}. \end{equation} \nonumber \]

    This is analogous to the corresponding formula for functions of one variable (Example~), and shows that the differential matrix \(f'(\mathbf{X}_0)\) is a natural generalization of the derivative. With this new notation we can express the defining property of the differential in a way similar to the form that applies for \(n=1\): \[ \lim_{\mathbf{X}\to\mathbf{X}_0} \frac{f(\mathbf{X})-f(\mathbf{X}_0)-f'(\mathbf{X}_{0}) (\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|}=0, \nonumber \] where \(\mathbf{X}_0=(x_{10},x_{20}, \dots,x_{n0})\) and \(f'(\mathbf{X}_0)(\mathbf{X}-\mathbf{X}_{0})\) is the matrix product \[ [f_{x_1}(\mathbf{X}_0)\quad f_{x_2}(\mathbf{X}_0)\quad\cdots\quad f_{x_n}(\mathbf{X}_0)] \left[\begin{array}{c} x_1-x_{10}\\ x_2-x_{20}\\\vdots\\ x_n-x_{n0}\end{array} \right]. \nonumber \]

    As before, we omit the \(\mathbf{X}_0\) in \eqref{eq:6.1.6} and \eqref{eq:6.1.7} when it is not necessary to emphasize the specific point; thus, we write \[ f'= \left[\begin{array}{cccc} f_{x_1}&f_{x_2}&\cdots&f_{x_n} \end{array}\right] \mbox{\quad and \quad}df=f'\,d\mathbf{X}. \nonumber \]
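
    For example, if \(f(x,y,z)=x^2y+yz\), then \[ f'=[2xy\quad x^2+z\quad y], \nonumber \] so at \(\mathbf{X}_0=(1,2,3)\) we have \(f'(\mathbf{X}_0)=[4\quad 4\quad 2]\) and \[ d_{\mathbf{X}_0}f=4\,dx+4\,dy+2\,dz. \nonumber \]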

    We will need the following definition in the next section.

    To justify this definition, we must show that \(\|\mathbf{A}\|\) exists; that is, that there is a smallest number \(K\) such that \(|\mathbf{A}\mathbf{X}|\le K|\mathbf{X}|\) for all \(\mathbf{X}\) in \(\R^n\). The components of \(\mathbf{Y}=\mathbf{AX}\) are \[ y_i=a_{i1}x_1+a_{i2}x_2+\cdots+a_{in}x_n,\quad 1\le i\le m. \nonumber \] By Schwarz’s inequality, \[ y^2_i\le (a^2_{i1}+a^2_{i2}+\cdots+a^2_{in}) |\mathbf{X}|^2. \nonumber \] Summing this over \(1\le i\le m\) yields \[ |\mathbf{Y}|^2\le\left(\sum^m_{i=1}\sum^n_{j=1} a^2_{ij}\right) |\mathbf{X}|^2. \nonumber \] Therefore, the set \[ B=\set{K}{|\mathbf{AX}|\le K|\mathbf{X}|\mbox{\quad for all $\mathbf{X}$ in $\R^n$}} \nonumber \] is nonempty. Since \(B\) is bounded below by zero, \(B\) has an infimum \(\alpha\). If \(\epsilon>0\), then \(\alpha+\epsilon\) is in \(B\); for if not, then (since \(B\) contains every number greater than any of its members) no number less than \(\alpha+\epsilon\) could be in \(B\), so \(\alpha+\epsilon\) would be a lower bound for \(B\), contradicting the definition of \(\alpha\). Hence, \[ |\mathbf{AX}|\le (\alpha+\epsilon)|\mathbf{X}|,\quad\mathbf{X}\in \R^n. \nonumber \] Since \(\epsilon\) is an arbitrary positive number, this implies that \[ |\mathbf{AX}|\le\alpha |\mathbf{X}|,\quad\mathbf{X}\in \R^n, \nonumber \] so \(\alpha\in B\). Since no smaller number is in \(B\), we conclude that \(\|\mathbf{A}\|=\alpha\).

    In our applications we will not have to actually compute the norm of a matrix \(\mathbf{A}\); rather, it will be sufficient to know that the norm exists (finite).
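
    For instance, if \[ \mathbf{A}=\left[\begin{array}{rr} 1&2\\ 2&-2\end{array}\right], \nonumber \] then the inequality above gives \(|\mathbf{A}\mathbf{X}|^2\le(1+4+4+4)|\mathbf{X}|^2=13|\mathbf{X}|^2\), so \(\|\mathbf{A}\|\le\sqrt{13}\). (The exact value of \(\|\mathbf{A}\|\) may be smaller; the point is only that some such bound exists.)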

    Linear transformations from \(\R^n\) to \(\R^n\) will be important when we discuss the inverse function theorem in Section~6.3 and change of variables in multiple integrals in Section~7.3. The matrix of such a transformation is square; that is, it has the same number of rows and columns.

    We assume that you know the definition of the determinant \[ \det(\mathbf{A})=\left|\begin{array}{cccc} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn}\end{array}\right| \nonumber \] of an \(n\times n\) matrix \[ \mathbf{A}=\left[\begin{array}{cccc} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn}\end{array}\right]. \nonumber \]

    The {}, \(\mathbf{A}^t\), of a matrix \(\mathbf{A}\) (square or not) is the matrix obtained by interchanging the rows and columns of \(\mathbf{A}\); thus, if \[ \mathbf{A}=\left[\begin{array}{ccr} 1&2 &3\\3&1 &4\\ 0&1 &-2\end{array}\right],\mbox{\quad then\quad} \mathbf{A}^t=\left[\begin{array}{ccr} 1&3 &0\\2&1 &1\\ 3&4 &-2\end{array}\right]. \nonumber \] A square matrix and its transpose have the same determinant; thus, \[ \det(\mathbf{A}^t)=\det(\mathbf{A}). \nonumber \]

    We take the next theorem from linear algebra as given.


    The entries \(a_{ii}\), \(1\le i\le n\), of an \(n\times n\) matrix \(\mathbf{A}\) are on the main diagonal of \(\mathbf{A}\). The \(n\times n\) matrix with ones on the main diagonal and zeros elsewhere is called the identity matrix and is denoted by \(\mathbf{I}\); thus, if \(n=3\), \[ \mathbf{I}=\left[\begin{array}{ccc} 1&0 &0\\0&1 &0\\ 0&0 &1\end{array}\right]. \nonumber \] We call \(\mathbf{I}\) the identity matrix because \(\mathbf{A}\mathbf{I}= \mathbf{A}\) and \(\mathbf{I}\mathbf{A}=\mathbf{A}\) if \(\mathbf{A}\) is any \(n\times n\) matrix. We say that an \(n\times n\) matrix \(\mathbf{A}\) is nonsingular if there is an \(n\times n\) matrix \(\mathbf{A}^{-1}\), the inverse of \(\mathbf{A}\), such that \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}\). Otherwise, we say that \(\mathbf{A}\) is singular.

    Our main objective is to show that an \(n\times n\) matrix \(\mathbf{A}\) is nonsingular if and only if \(\det(\mathbf{A})\ne0\). We will also find a formula for the inverse.


    For a proof of the following theorem, see any elementary linear algebra text.

    If we compute \(\det(\mathbf{A})\) from the formula \[ \det(\mathbf{A})=\sum_{k=1}^na_{ik}c_{ik}, \nonumber \]

    we say that we are expanding the determinant in cofactors of its \(i\)th row. Since we can choose \(i\) arbitrarily from \(\{1, \dots,n\}\), there are \(n\) ways to do this. If we compute \(\det(\mathbf{A})\) from the formula \[ \det(\mathbf{A})=\sum_{k=1}^na_{kj}c_{kj}, \nonumber \] we say that we are expanding the determinant in cofactors of its \(j\)th column. There are also \(n\) ways to do this.
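
    For example, expanding the determinant of the matrix in the transpose example above in cofactors of its first row yields \[ \left|\begin{array}{ccr} 1&2 &3\\3&1 &4\\ 0&1 &-2\end{array}\right| =1\left|\begin{array}{cr} 1&4\\ 1&-2\end{array}\right| -2\left|\begin{array}{cr} 3&4\\ 0&-2\end{array}\right| +3\left|\begin{array}{cc} 3&1\\ 0&1\end{array}\right| =1(-6)-2(-6)+3(3)=15. \nonumber \]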

    In particular, we note that \(\det(\mathbf{I})=1\) for all \(n\ge1\).

    If \(\det(\mathbf{A})=0\), then \(\det(\mathbf{A}\mathbf{B})=\det(\mathbf{A})\det(\mathbf{B})=0\) for any \(n\times n\) matrix \(\mathbf{B}\). Therefore, since \(\det(\mathbf{I})=1\), there is no \(n\times n\) matrix \(\mathbf{B}\) such that \(\mathbf{A}\mathbf{B}=\mathbf{I}\); that is, \(\mathbf{A}\) is singular if \(\det(\mathbf{A})=0\). Now suppose that \(\det(\mathbf{A})\ne0\). Since \[ \mathbf{A}\adj(\mathbf{A})=\det(\mathbf{A})\mathbf{I} \nonumber \] and \[ \adj(\mathbf{A})\mathbf{A}=\det(\mathbf{A})\mathbf{I}, \nonumber \] dividing both sides of these two equations by \(\det(\mathbf{A})\) shows that if \[ \mathbf{A}^{-1}=\frac{1}{\det(\mathbf{A})}\adj(\mathbf{A}), \nonumber \] then \(\mathbf{A}\mathbf{A}^{-1}=\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}\). Therefore, \(\mathbf{A}^{-1}\) is an inverse of \(\mathbf{A}\). To see that it is the only inverse, suppose that \(\mathbf{B}\) is an \(n\times n\) matrix such that \(\mathbf{A}\mathbf{B}=\mathbf{I}\). Then \(\mathbf{A}^{-1}(\mathbf{A}\mathbf{B})=\mathbf{A}^{-1}\), so \((\mathbf{A}^{-1}\mathbf{A})\mathbf{B}=\mathbf{A}^{-1}\). Since \(\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}\) and \(\mathbf{I}\mathbf{B}=\mathbf{B}\), it follows that \(\mathbf{B}=\mathbf{A}^{-1}\).
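
    For example, if \[ \mathbf{A}=\left[\begin{array}{rr} 1&-1\\ 1&1\end{array}\right], \nonumber \] then \(\det(\mathbf{A})=2\), the cofactors are \(c_{11}=1\), \(c_{12}=-1\), \(c_{21}=1\), \(c_{22}=1\), and \[ \mathbf{A}^{-1}=\frac{1}{\det(\mathbf{A})}\adj(\mathbf{A}) =\frac{1}{2}\left[\begin{array}{rr} 1&1\\ -1&1\end{array}\right] =\left[\begin{array}{rr}\frac{1}{2}&\frac{1}{2}\\ [3\jot] -\frac{1}{2}&\frac{1}{2}\end{array}\right]. \nonumber \]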

    Now consider the equation \[\begin{equation} \label{eq:6.1.11} \mathbf{A}\mathbf{X}=\mathbf{Y} \end{equation} \nonumber \] with \[ \mathbf{A}= \left[\begin{array}{cccc} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{array}\right],\quad \mathbf{X}= \left[\begin{array}{c} x_1\\x_2\\\vdots\\x_n \end{array}\right], \mbox{\quad and \quad} \mathbf{Y}= \left[\begin{array}{c} y_1\\y_2\\\vdots\\y_n \end{array}\right]. \nonumber \] Here \(\mathbf{A}\) and \(\mathbf{Y}\) are given, and the problem is to find \(\mathbf{X}\).

    Suppose that \(\mathbf{A}\) is nonsingular, and let \(\mathbf{X}=\mathbf{A}^{-1}\mathbf{Y}\). Then \[ \mathbf{A}\mathbf{X}=\mathbf{A}(\mathbf{A}^{-1}\mathbf{Y})= (\mathbf{A}\mathbf{A}^{-1})\mathbf{Y} =\mathbf{I}\mathbf{Y}=\mathbf{Y}; \nonumber \] that is, \(\mathbf{X}\) is a solution of \eqref{eq:6.1.11}. To see that \(\mathbf{X}\) is the only solution of \eqref{eq:6.1.11}, suppose that \(\mathbf{A}\mathbf{X}_1=\mathbf{Y}\). Then \(\mathbf{A}\mathbf{X}_1=\mathbf{A} \mathbf{X}\), so \[ \mathbf{A}^{-1}(\mathbf{A}\mathbf{X})= \mathbf{A}^{-1}(\mathbf{A}\mathbf{X}_1) \mbox{\quad and\quad} (\mathbf{A}^{-1}\mathbf{A})\mathbf{X}= (\mathbf{A}^{-1}\mathbf{A})\mathbf{X}_1, \nonumber \] which is equivalent to \(\mathbf{I}\mathbf{X}=\mathbf{I}\mathbf{X}_1\), or \(\mathbf{X}=\mathbf{X}_1\).

    Conversely, suppose that \eqref{eq:6.1.11} has a solution for every \(\mathbf{Y}\), and let \(\mathbf{X}_i\) satisfy \(\mathbf{A}\mathbf{X}_i=\mathbf{E}_i\), \(1\le i\le n\). Let \[ \mathbf{B}= [\mathbf{X}_1\,\mathbf{X}_2\,\cdots\,\mathbf{X}_n]; \nonumber \] that is, \(\mathbf{X}_1\), \(\mathbf{X}_2\), \(\dots\), \(\mathbf{X}_n\) are the columns of \(\mathbf{B}\). Then \[ \mathbf{A}\mathbf{B}= [\mathbf{A}\mathbf{X}_1\,\mathbf{A}\mathbf{X}_2\,\cdots\,\mathbf{A}\mathbf{X}_n]= [\mathbf{E}_1\,\mathbf{E}_2\,\cdots\,\mathbf{E}_n] =\mathbf{I}. \nonumber \] To show that \(\mathbf{B}=\mathbf{A}^{-1}\), we must still show that \(\mathbf{B}\mathbf{A}=\mathbf{I}\). We first note that, since \(\mathbf{A}\mathbf{B} =\mathbf{I}\) and \(\det(\mathbf{B}\mathbf{A})=\det(\mathbf{A}\mathbf{B})=1\) (Theorem~), \(\mathbf{B}\mathbf{A}\) is nonsingular (Theorem~). Now note that \[ (\mathbf{B}\mathbf{A})(\mathbf{B}\mathbf{A})= \mathbf{B}(\mathbf{A}\mathbf{B})\mathbf{A}=\mathbf{B}\mathbf{I}\mathbf{A}; \nonumber \] that is, \[ (\mathbf{B}\mathbf{A})(\mathbf{B}\mathbf{A})=(\mathbf{B}\mathbf{A}). \nonumber \] Multiplying both sides of this equation on the left by \((\mathbf{B}\mathbf{A})^{-1}\) yields \(\mathbf{B}\mathbf{A}=\mathbf{I}\).

    The following theorem gives a useful formula for the components of the solution of \eqref{eq:6.1.11}.

    From the two preceding theorems, the solution of \(\mathbf{A}\mathbf{X}=\mathbf{Y}\) is \[\begin{aligned} \left[\begin{array}{c} x_1\\x_2\\\vdots\\x_n \end{array}\right] =\mathbf{A}^{-1}\mathbf{Y} &=\frac{1}{\det(\mathbf{A})} \left[\begin{array}{cccc} c_{11}&c_{21}&\cdots&c_{n1}\\ c_{12}&c_{22}&\cdots&c_{n2}\\ \vdots&\vdots&\ddots&\vdots\\ c_{1n}&c_{2n}&\cdots&c_{nn} \end{array}\right] \left[\begin{array}{c} y_1\\y_2\\\vdots\\y_n \end{array}\right]\\ &=\frac{1}{\det(\mathbf{A})} \left[\begin{array}{c} c_{11}y_1+c_{21}y_2+\cdots+c_{n1}y_n\\ c_{12}y_1+c_{22}y_2+\cdots+c_{n2}y_n\\ \vdots\\ c_{1n}y_1+c_{2n}y_2+\cdots+c_{nn}y_n \end{array}\right]. \end{aligned} \nonumber \] But \[ c_{11}y_1+c_{21}y_2+\cdots+c_{n1}y_n= \left|\begin{array}{cccc} y_1&a_{12}&\cdots&a_{1n}\\ y_2&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ y_n&a_{n2}&\cdots&a_{nn}\end{array}\right|, \nonumber \]

    as can be seen by expanding the determinant on the right in cofactors of its first column. Similarly, \[ c_{12}y_1+c_{22}y_2+\cdots+c_{n2}y_n= \left|\begin{array}{ccccc} a_{11}&y_1&a_{13}&\cdots&a_{1n}\\ a_{21}&y_2&a_{23}&\cdots&a_{2n}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ a_{n1}&y_n&a_{n3}&\cdots&a_{nn}\end{array}\right|, \nonumber \] as can be seen by expanding the determinant on the right in cofactors of its second column. Continuing in this way completes the proof.
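
    For example, for the system \[ \begin{array}{rcl} x_1+2x_2&=&5\\ 3x_1+4x_2&=&6, \end{array} \nonumber \] we have \(\det(\mathbf{A})=-2\), so the formula just derived gives \[ x_1=\frac{1}{-2}\left|\begin{array}{cc} 5&2\\ 6&4\end{array}\right|=\frac{8}{-2}=-4 \mbox{\quad and\quad} x_2=\frac{1}{-2}\left|\begin{array}{cc} 1&5\\ 3&6\end{array}\right|=\frac{-9}{-2}=\frac{9}{2}, \nonumber \] as can be checked by substitution.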

    A system of \(n\) equations in \(n\) unknowns \[\begin{equation} \label{eq:6.1.12} \begin{array}{rcl} a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n&=&0\\ a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n&=&0\\ &\vdots&\\ a_{n1}x_1+a_{n2}x_2+\cdots+a_{nn}x_n&=&0 \end{array} \end{equation} \nonumber \] (or, in matrix form, \(\mathbf{A}\mathbf{X}=\mathbf{0}\)) is homogeneous. It is obvious that \(\mathbf{X}_0=\mathbf{0}\) satisfies this system. We call this the trivial solution of \eqref{eq:6.1.12}. Any other solutions of \eqref{eq:6.1.12}, if they exist, are nontrivial.

    We will need the following theorems. The proofs may be found in any linear algebra text.

    Throughout the rest of this chapter, transformations \(\mathbf{F}\) and points \(\mathbf{X}\) should be considered as written in vertical form when they occur in connection with matrix operations. However, we will write \(\mathbf{X}=(x_1,x_2, \dots,x_n)\) when \(\mathbf{X}\) is the argument of a function.

    In Section~5.2 we defined a vector-valued function (transformation) to be continuous at \(\mathbf{X}_0\) if each of its component functions is continuous at \(\mathbf{X}_0\). We leave it to you to show that this implies the following theorem (Exercise~1).

    This theorem is the same as Theorem~ except that the ``absolute value'' in the definition now stands for distance in \(\R^m\) rather than in \(\R\).

    If \(\mathbf{C}\) is a constant vector, then ``\(\lim_{\mathbf{X}\to\mathbf{X}_0}\mathbf{F}(\mathbf{X})=\mathbf{C}\)’’ means that \[ \lim_{\mathbf{X}\to\mathbf{X}_0}|\mathbf{F}(\mathbf{X})-\mathbf{C}|=0. \nonumber \] Theorem~ implies that \(\mathbf{F}\) is continuous at \(\mathbf{X}_{0}\) if and only if \[ \lim_{\mathbf{X}\to\mathbf{X}_0}\mathbf{F}(\mathbf{X})=\mathbf{F}(\mathbf{X}_0). \nonumber \]
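
    For example, every linear transformation \(\mathbf{L}(\mathbf{X})=\mathbf{A}\mathbf{X}\) is continuous at every \(\mathbf{X}_0\), since \[ |\mathbf{L}(\mathbf{X})-\mathbf{L}(\mathbf{X}_0)|=|\mathbf{A}(\mathbf{X}-\mathbf{X}_0)|\le \|\mathbf{A}\|\,|\mathbf{X}-\mathbf{X}_0|\to0 \mbox{\quad as\quad}\mathbf{X}\to\mathbf{X}_0. \nonumber \]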

    In Section~5.4 we defined a vector-valued function (transformation) to be differentiable at \(\mathbf{X}_0\) if each of its components is differentiable at \(\mathbf{X}_0\) (Definition~). The next theorem characterizes this property in a useful way.

    Let \(\mathbf{X}_0=(x_{10},x_{20}, \dots,x_{n0})\). If \(\mathbf{F}\) is differentiable at \(\mathbf{X}_0\), then so are \(f_1\), \(f_2\), \(\dots\), \(f_m\) (Definition~). Hence, \[ \lim_{\mathbf{X}\to\mathbf{X}_0} \frac{\dst{f_i(\mathbf{X})-f_i(\mathbf{X}_0) - \sum_{j=1}^n \frac{\partial f_i(\mathbf{X}_0)}{\partial x_j} (x_j-x_{j0})}} { |\mathbf{X}-\mathbf{X}_{0}|}=0, \quad 1\le i\le m, \nonumber \] which implies that \[ \lim_{\mathbf{X}\to\mathbf{X}_0} \frac{\mathbf{F}(\mathbf{X})-\mathbf{F}(\mathbf{X}_0)-\mathbf{A}(\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|}=\mathbf{0} \nonumber \] with \(\mathbf{A}=\left[\partial f_i(\mathbf{X}_0)/\partial x_j\right]\).

    Now suppose, conversely, that this limit relation holds with \(\mathbf{A}=[a_{ij}]\). Since each component of the vector-valued quotient approaches zero as \(\mathbf{X}\) approaches \(\mathbf{X}_0\), it follows that \[ \lim_{\mathbf{X}\to\mathbf{X}_0} \frac{\dst{f_i(\mathbf{X})-f_i(\mathbf{X}_0) -\dst{\sum_{j=1}^n} a_{ij} (x_j-x_{j0})}}{ |\mathbf{X}-\mathbf{X}_0|} =0,\quad 1\le i\le m, \nonumber \] so each \(f_i\) is differentiable at \(\mathbf{X}_0\), and therefore so is \(\mathbf{F}\) (Definition~). By Theorem~, \[ a_{ij}=\frac{\partial f_i (\mathbf{X}_0)}{\partial x_j},\quad 1\le i\le m, \quad 1\le j\le n, \nonumber \] which implies that \(\mathbf{A}=\left[\partial f_i(\mathbf{X}_0)/\partial x_j\right]\).

    A transformation \(\mathbf{T}: \R^n\to\R^m\) of the form \[ \mathbf{T}(\mathbf{X})=\mathbf{U}+\mathbf{A}(\mathbf{X}-\mathbf{X}_0), \nonumber \] where \(\mathbf{U}\) is a constant vector in \(\R^m\), \(\mathbf{X}_0\) is a constant vector in \(\R^n\), and \(\mathbf{A}\) is a constant \(m \times n\) matrix, is said to be affine. Theorem~ says that if \(\mathbf{F}\) is differentiable at \(\mathbf{X}_0\), then \(\mathbf{F}\) can be well approximated by an affine transformation.

    If \(\mathbf{F}=(f_1,f_2, \dots,f_m)\) is differentiable at \(\mathbf{X}_0\), we define the differential of \(\mathbf{F}\) at \(\mathbf{X}_0\) to be the linear transformation \[\begin{equation}\label{eq:6.2.4} d_{\mathbf{X}_0}\mathbf{F}=\left[\begin{array}{c} d_{\mathbf{X}_0}f_1\\ d_{\mathbf{X}_0}f_2\\ \vdots\\ d_{\mathbf{X}_0}f_m\end{array}\right]. \end{equation} \nonumber \] We call the matrix \(\mathbf{A}\) of Theorem~ the differential matrix of \(\mathbf{F}\) at \(\mathbf{X}_0\) and denote it by \(\mathbf{F}'(\mathbf{X}_{0})\); thus, \[\begin{equation}\label{eq:6.2.5} \mathbf{F}'(\mathbf{X}_0)=\left[\begin{array}{cccc}\dst{\frac{\partial f_1(\mathbf{X}_0)}{\partial x_1}}&\dst{\frac{\partial f_1(\mathbf{X}_0)}{\partial x_2}}& \cdots&\dst{\frac{\partial f_1(\mathbf{X}_0)}{\partial x_n}}\\ [3\jot] \dst{\frac{\partial f_2(\mathbf{X}_0)}{\partial x_1}}& \dst{\frac{\partial f_2(\mathbf{X}_0)}{\partial x_2}}&\cdots& \dst{\frac{\partial f_2(\mathbf{X}_0)}{\partial x_n}}\\ [3\jot] \vdots&\vdots&\ddots&\vdots\\ [3\jot] \dst{\frac{\partial f_m(\mathbf{X}_0)}{\partial x_1}}& \dst{\frac{\partial f_m(\mathbf{X}_0)}{\partial x_2}}&\cdots& \dst{\frac{\partial f_m(\mathbf{X}_0)}{\partial x_n}}\end{array} \right]. \end{equation} \nonumber \]

    (It is important to bear in mind that while \(\mathbf{F}\) is a function from \(\R^n\) to \(\R^m\), \(\mathbf{F'}\) is not such a function; \(\mathbf{F}'\) is an \(m\times n\) matrix.) From Theorem~, the differential can be written in terms of the differential matrix as \[\begin{equation}\label{eq:6.2.6} d_{\mathbf{X}_0}\mathbf{F}=\mathbf{F}'(\mathbf{X}_0)\left[\begin{array}{c} dx_1\\ dx_2\\ \vdots\\ dx_n\end{array}\right] \end{equation} \nonumber \] or, more succinctly, as \[ d_{\mathbf{X}_0}\mathbf{F}=\mathbf{F}'(\mathbf{X}_0)\,d\mathbf{X}, \nonumber \] where \[ d\mathbf{X}=\left[\begin{array}{c} dx_1\\ dx_2\\\vdots\\ dx_n\end{array} \right], \nonumber \] as defined earlier.

    When it is not necessary to emphasize the particular point \(\mathbf{X}_0\), we write \eqref{eq:6.2.4} as \[ d\mathbf{F}=\left[\begin{array}{c} df_1\\ df_2\\\vdots\\ df_m\end{array}\right], \nonumber \] \eqref{eq:6.2.5} as \[ \mathbf{F}'=\left[\begin{array}{cccc}\dst{\frac{\partial f_1}{\partial x_1}}&\dst{\frac{\partial f_1}{\partial x_2}}&\cdots& \dst{\frac{\partial f_1}{\partial x_n}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial x_1}}& \dst{\frac{\partial f_2}{\partial x_2}}&\cdots& \dst{\frac{\partial f_2}{\partial x_n}}\\ [3\jot] \vdots&\vdots&\ddots&\vdots\\ [3\jot] \dst{\frac{\partial f_m}{\partial x_1}}& \dst{\frac{\partial f_m}{\partial x_2}}&\cdots& \dst{\frac{\partial f_m}{\partial x_n}}\end{array}\right], \nonumber \] and \eqref{eq:6.2.6} as \[ d\mathbf{F}=\mathbf{F}'\,d\mathbf{X}. \nonumber \]

    With the differential notation, the defining limit relation for differentiability can be rewritten as \[ \lim_{\mathbf{X}\to\mathbf{X}_0} \frac{\mathbf{F} (\mathbf{X})-\mathbf{F}(\mathbf{X}_{0})- \mathbf{F}'(\mathbf{X}_0)(\mathbf{X}-\mathbf{X}_0)} { |\mathbf{X}-\mathbf{X}_0|}=\mathbf{0}. \nonumber \]

    If \(m=n\), the differential matrix is square and its determinant is called the Jacobian of \(\mathbf{F}\). The standard notation for this determinant is \[ \frac{\partial (f_1,f_2, \dots,f_n)}{\partial (x_1,x_2, \dots,x_n)}= \left|\begin{array}{cccc}\dst{\frac{\partial f_1}{\partial x_1}}& \dst{\frac{\partial f_1}{\partial x_2}}&\cdots& \dst{\frac{\partial f_1}{\partial x_n}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial x_1}}&\dst{\frac{\partial f_2}{\partial x_2}}&\cdots&\dst{\frac{\partial f_2}{ \partial x_n}}\\ [3\jot] \vdots&\vdots&\ddots&\vdots\\ [3\jot] \dst{\frac{\partial f_n}{\partial x_1}}&\dst{\frac{\partial f_n}{\partial x_2}}&\cdots&\dst{\frac{\partial f_n}{ \partial x_n}}\end{array}\right|. \nonumber \] We will often write the Jacobian of \(\mathbf{F}\) more simply as \(J(\mathbf{F})\), and its value at \(\mathbf{X}_0\) as \(J\mathbf{F}(\mathbf{X}_0)\).
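
    For example, if \[ \mathbf{F}(x,y)=\left[\begin{array}{c} x^2-y^2\\ 2xy\end{array}\right], \nonumber \] then \[ \mathbf{F}'(x,y)=\left[\begin{array}{rr} 2x&-2y\\ 2y&2x\end{array}\right] \mbox{\quad and\quad} J\mathbf{F}(x,y)=\frac{\partial(f_1,f_2)}{\partial(x,y)}=4x^2+4y^2, \nonumber \] which vanishes only at the origin.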

    Since an \(n\times n\) matrix is nonsingular if and only if its determinant is nonzero, it follows that if \(\mathbf{F}: \R^n\to \R^n\) is differentiable at \(\mathbf{X}_0\), then \(\mathbf{F}'(\mathbf{X}_{0})\) is nonsingular if and only if \(J\mathbf{F}(\mathbf{X}_0)\ne0\). We will soon use this important fact.

    We leave the proof of the following theorem to you (Exercise~).

    Theorem~ and Definition~ imply the following theorem.

    We say that \(\mathbf{F}\) is continuously differentiable on a set \(S\) if \(S\) is contained in an open set on which the partial derivatives in \eqref{eq:6.2.5} are continuous. The next three lemmas give properties of continuously differentiable transformations that we will need later.

    Consider the auxiliary function \[\begin{equation} \label{eq:6.2.9} \mathbf{G}(\mathbf{X})=\mathbf{F}(\mathbf{X})-\mathbf{F}'(\mathbf{X}_0)\mathbf{X}. \end{equation} \nonumber \] The components of \(\mathbf{G}\) are \[ g_i(\mathbf{X})=f_i(\mathbf{X})-\sum_{j=1}^n \frac{\partial f_i(\mathbf{X}_{0})}{\partial x_j} x_j, \nonumber \] so \[ \frac{\partial g_i(\mathbf{X})}{\partial x_j}= \frac{\partial f_i(\mathbf{X})} {\partial x_j}-\frac{\partial f_i(\mathbf{X}_0)}{\partial x_j}. \nonumber \]

    Thus, \(\partial g_i/\partial x_j\) is continuous on \(N\) and zero at \(\mathbf{X}_0\). Therefore, there is a \(\delta>0\) such that \[\begin{equation}\label{eq:6.2.10} \left|\frac{\partial g_i(\mathbf{X})}{\partial x_j}\right|<\frac{\epsilon}{ \sqrt{mn}}\mbox{\quad for \quad}1\le i\le m,\quad 1\le j\le n, \mbox{\quad if \quad} |\mathbf{X}-\mathbf{X}_0|<\delta. \end{equation} \nonumber \] Now suppose that \(\mathbf{X}\), \(\mathbf{Y}\in B_\delta(\mathbf{X}_0)\). By Theorem~, \[\begin{equation}\label{eq:6.2.11} g_i(\mathbf{X})-g_i(\mathbf{Y})=\sum_{j=1}^n \frac{\partial g_i(\mathbf{X}_i)}{\partial x_j}(x_j-y_j), \end{equation} \nonumber \] where \(\mathbf{X}_i\) is on the line segment from \(\mathbf{X}\) to \(\mathbf{Y}\), so \(\mathbf{X}_i\in B_\delta(\mathbf{X}_0)\). From \eqref{eq:6.2.10}, \eqref{eq:6.2.11}, and Schwarz’s inequality, \[ (g_i(\mathbf{X})-g_i(\mathbf{Y}))^2\le\left(\sum_{j=1}^n\left[\frac{\partial g_i (\mathbf{X}_i)}{\partial x_j}\right]^2\right) |\mathbf{X}-\mathbf{Y}|^2 <\frac{\epsilon^2}{ m} |\mathbf{X}-\mathbf{Y}|^2. \nonumber \] Summing this from \(i=1\) to \(i=m\) and taking square roots yields \[\begin{equation}\label{eq:6.2.12} |\mathbf{G}(\mathbf{X})-\mathbf{G}(\mathbf{Y})|<\epsilon |\mathbf{X}-\mathbf{Y}| \mbox{\quad if\quad}\mathbf{X}, \mathbf{Y}\in B_\delta(\mathbf{X}_0). \end{equation} \nonumber \] To complete the proof, we note that \[\begin{equation}\label{eq:6.2.13} \mathbf{F}(\mathbf{X})-\mathbf{F}(\mathbf{Y})= \mathbf{G}(\mathbf{X})-\mathbf{G}(\mathbf{Y})+\mathbf{F}'(\mathbf{X}_0)(\mathbf{X}-\mathbf{Y}), \end{equation} \nonumber \] so \eqref{eq:6.2.12}, \eqref{eq:6.2.13}, and the triangle inequality imply the conclusion of the lemma.

    Let \(\mathbf{X}\) and \(\mathbf{Y}\) be arbitrary points in \(D_\mathbf{F}\) and let \(\mathbf{G}\) be as in \eqref{eq:6.2.9}. From \eqref{eq:6.2.13}, \[\begin{equation} \label{eq:6.2.16} |\mathbf{F}(\mathbf{X})-\mathbf{F}(\mathbf{Y})|\ge\big| |\mathbf{F}'(\mathbf{X}_0)(\mathbf{X} -\mathbf{Y})|-|\mathbf{G}(\mathbf{X})-\mathbf{G}(\mathbf{Y})|\big|. \end{equation} \nonumber \] Since \[ \mathbf{X}-\mathbf{Y}=[\mathbf{F}'(\mathbf{X}_0)]^{-1} \mathbf{F}'(\mathbf{X}_{0}) (\mathbf{X}-\mathbf{Y}), \nonumber \] it follows that \[ |\mathbf{X}-\mathbf{Y}|\le \frac{1}{ r} |\mathbf{F}'(\mathbf{X}_0) (\mathbf{X}-\mathbf{Y})|, \nonumber \] so \[\begin{equation}\label{eq:6.2.17} |\mathbf{F}'(\mathbf{X}_0)(\mathbf{X}-\mathbf{Y})|\ge r|\mathbf{X}-\mathbf{Y}|. \end{equation} \nonumber \] Now choose \(\delta>0\) so that \eqref{eq:6.2.12} holds. Then \eqref{eq:6.2.16} and \eqref{eq:6.2.17} imply the conclusion of the lemma.

    See Exercise~ for a stronger conclusion in the case where \(\mathbf{F}\) is linear.

    On \[ S=\set{(\mathbf{X},\mathbf{Y})}{\mathbf{X},\mathbf{Y}\in D}\subset \R^{2n} \nonumber \] define \[ g(\mathbf{X},\mathbf{Y})=\left\{\casespace\begin{array}{ll} \dst{\frac{|\mathbf{F}(\mathbf{Y})- \mathbf{F}(\mathbf{X}) -\mathbf{F}'(\mathbf{X})(\mathbf{Y}-\mathbf{X})|}{ |\mathbf{Y}-\mathbf{X}|}},& \mathbf{Y}\ne\mathbf{X},\\[2\jot] 0,&\mathbf{Y}=\mathbf{X}.\end{array}\right. \nonumber \] Then \(g\) is continuous for all \((\mathbf{X},\mathbf{Y})\) in \(S\) such that \(\mathbf{X}\ne \mathbf{Y}\). We now show that if \(\mathbf{X}_0\in D\), then

    \[\begin{equation}\label{eq:6.2.19} \lim_{(\mathbf{X},\mathbf{Y})\to (\mathbf{X}_0,\mathbf{X}_0)} g(\mathbf{X},\mathbf{Y})=0 =g(\mathbf{X}_0,\mathbf{X}_0); \end{equation} \nonumber \]

    that is, \(g\) is also continuous at points \((\mathbf{X}_0,\mathbf{X}_0)\) in \(S\).

    Suppose that \(\epsilon>0\) and \(\mathbf{X}_0\in D\). Since the partial derivatives of \(f_1\), \(f_2\), \(\dots\), \(f_m\) are continuous on an open set containing \(D\), there is a \(\delta>0\) such that \[\begin{equation}\label{eq:6.2.20} \left|\frac{\partial f_i(\mathbf{Y})}{\partial x_j}-\frac{\partial f_i(\mathbf{X}) }{\partial x_j}\right|<\frac{\epsilon}{\sqrt{mn}}\mbox{\quad if\quad} \mathbf{X},\mathbf{Y}\in B_\delta (\mathbf{X}_0),\ 1\le i\le m,\ 1\le j\le n. \end{equation} \nonumber \] (Note that \(\partial f_i/\partial x_j\) is uniformly continuous on \(\overline{B_\delta(\mathbf{X}_0)}\) for \(\delta\) sufficiently small, from Theorem~.) Applying Theorem~ to \(f_1\), \(f_2\), \(\dots\), \(f_m\), we find that if \(\mathbf{X}\), \(\mathbf{Y}\in B_\delta (\mathbf{X}_0)\), then \[ f_i(\mathbf{Y})-f_i(\mathbf{X})=\sum_{j=1}^n \frac{\partial f_i(\mathbf{X}_{i})} {\partial x_j} (y_j-x_j), \nonumber \] where \(\mathbf{X}_i\) is on the line segment from \(\mathbf{X}\) to \(\mathbf{Y}\). From this, \[\begin{aligned} \left[f_i(\mathbf{Y})-f_i(\mathbf{X}) -\sum_{j=1}^n \frac{\partial f_i(\mathbf{X})}{\partial x_j} (y_j-x_j)\right]^2 &=\left[\sum_{j=1}^n\left[\frac{\partial f_i(\mathbf{X}_i)}{\partial x_j}- \frac{\partial f_i(\mathbf{X})}{\partial x_j}\right] (y_j-x_j)\right]^2\\ &\le |\mathbf{Y}-\mathbf{X}|^2\sum_{j=1}^n \left[\frac{\partial f_i(\mathbf{X}_{i})} {\partial x_j} -\frac{\partial f_i(\mathbf{X})}{\partial x_j}\right]^2 \mbox{\quad (by Schwarz's inequality)}\\ &< \frac{\epsilon^2}{ m} |\mathbf{Y}-\mathbf{X}|^2\mbox{\quad (by \eqref{eq:6.2.20})}. \end{aligned} \nonumber \] Summing from \(i=1\) to \(i=m\) and taking square roots yields \[ |\mathbf{F}(\mathbf{Y})-\mathbf{F}(\mathbf{X})-\mathbf{F}'(\mathbf{X}) (\mathbf{Y}-\mathbf{X})| <\epsilon |\mathbf{Y}-\mathbf{X}|\mbox{\quad if\quad} \mathbf{X},\mathbf{Y}\in B_\delta(\mathbf{X}_0). \nonumber \] This implies \eqref{eq:6.2.19} and completes the proof that \(g\) is continuous on \(S\).

    Since \(D\) is compact, so is \(S\) (Exercise~). Therefore, \(g\) is bounded on \(S\) (Theorem~); thus, for some \(M_1\), \[ |\mathbf{F}(\mathbf{Y})-\mathbf{F}(\mathbf{X})-\mathbf{F}'(\mathbf{X}) (\mathbf{Y} -\mathbf{X})|\le M_1|\mathbf{X}-\mathbf{Y}|\mbox{\quad if\quad} \mathbf{X},\mathbf{Y}\in D. \nonumber \] But \[\begin{equation}\label{eq:6.2.21} \begin{array}{rcl} |\mathbf{F}(\mathbf{Y})-\mathbf{F}(\mathbf{X}) |&\le& |\mathbf{F}(\mathbf{Y})-\mathbf{F}(\mathbf{X})-\mathbf{F}'(\mathbf{X}) (\mathbf{Y}-\mathbf{X})|+|\mathbf{F}'(\mathbf{X})(\mathbf{Y}-\mathbf{X})|\\ &\le& (M_1+\|\mathbf{F}'(\mathbf{X})\|) |\mathbf{Y}-\mathbf{X}|. \end{array} \end{equation} \nonumber \] Since \[ \|\mathbf{F}'(\mathbf{X})\| \le\left(\sum_{i=1}^m\sum_{j=1}^n\left[\frac{\partial f_i(\mathbf{X}) }{\partial x_j}\right]^2\right)^{1/2} \nonumber \] and the partial derivatives \(\{\partial f_i/\partial x_j\}\) are bounded on \(D\), it follows that \(\|\mathbf{F}'(\mathbf{X})\|\) is bounded on \(D\); that is, there is a constant \(M_2\) such that \[ \|\mathbf{F}'(\mathbf{X})\|\le M_2,\quad\mathbf{X}\in D. \nonumber \] Now \eqref{eq:6.2.21} implies the conclusion of the lemma, with \(M=M_1+M_2\).

    By using differential matrices, we can write the chain rule for transformations in a form analogous to the form of the chain rule for real-valued functions of one variable (Theorem~).

    The components of \(\mathbf{H}\) are \(h_1\), \(h_2\), \(\dots\), \(h_m\), where \[ h_i(\mathbf{U})=f_i(\mathbf{G}(\mathbf{U})). \nonumber \] Applying Theorem~ to \(h_i\) yields \[\begin{equation}\label{eq:6.2.24} d_{\mathbf{U}_0}h_i=\sum_{j=1}^n \frac{\partial f_i(\mathbf{X}_{0})} {\partial x_j} d_{\mathbf{U}_0}g_j,\quad 1\le i\le m. \end{equation} \nonumber \]

    Since \[ d_{\mathbf{U}_0}\mathbf{H}=\left[\begin{array}{c} d_{\mathbf{U}_0}h_1\\ d_{\mathbf{U}_0}h_2\\ \vdots\\ d_{\mathbf{U}_0} h_m\end{array}\right]\mbox{ \quad and\quad} d_{\mathbf{U}_0}\mathbf{G}= \left[\begin{array}{c} d_{\mathbf{U}_0}g_1\\ d_{\mathbf{U}_0}g_2\\ \vdots\\ d_{\mathbf{U}_0}g_n \end{array}\right], \nonumber \] the \(m\) equations in \eqref{eq:6.2.24} can be written in matrix form as \[\begin{equation}\label{eq:6.2.25} d_{\mathbf{U}_0}\mathbf{H}=\mathbf{F}'(\mathbf{X}_0)d_{\mathbf{U}_0}\mathbf{G}= \mathbf{F}'(\mathbf{G}(\mathbf{U}_0)) d_{\mathbf{U}_0}\mathbf{G}. \end{equation} \nonumber \] But \[ d_{\mathbf{U}_0}\mathbf{G}=\mathbf{G}'(\mathbf{U}_0)\,d\mathbf{U}, \nonumber \] where \[ d\mathbf{U}=\left[\begin{array}{c} du_1\\ du_2\\\vdots\\ du_k\end{array}\right], \nonumber \] so \eqref{eq:6.2.25} can be rewritten as \[ d_{\mathbf{U}_0}\mathbf{H}= \mathbf{F}'(\mathbf{G}(\mathbf{U}_0)) \mathbf{G}'(\mathbf{U}_0)\,d\mathbf{U}. \nonumber \] On the other hand, \[ d_{\mathbf{U}_0}\mathbf{H}=\mathbf{H}'(\mathbf{U}_0)\,d\mathbf{U}. \nonumber \] Comparing the last two equations yields \[ \mathbf{H}'(\mathbf{U}_0)=\mathbf{F}'(\mathbf{G}(\mathbf{U}_0))\,\mathbf{G}'(\mathbf{U}_0). \nonumber \] Since \(\mathbf{G}'(\mathbf{U}_0)\) is the matrix of \(d_{\mathbf{U}_0}\mathbf{G}\) and \(\mathbf{F}'(\mathbf{G}(\mathbf{U}_0))=\mathbf{F}'(\mathbf{X}_0)\) is the matrix of \(d_{\mathbf{X}_0}\mathbf{F}\), it follows that \(d_{\mathbf{U}_0}\mathbf{H}\) is the composition of \(d_{\mathbf{X}_0}\mathbf{F}\) and \(d_{\mathbf{U}_0}\mathbf{G}\).
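
    For example, let \[ \mathbf{G}(u,v)=\left[\begin{array}{c} u+v\\ uv\end{array}\right] \mbox{\quad and\quad} \mathbf{F}(x,y)=\left[\begin{array}{c} xy\\ x^2\end{array}\right], \nonumber \] so \[ \mathbf{G}'(u,v)=\left[\begin{array}{cc} 1&1\\ v&u\end{array}\right] \mbox{\quad and\quad} \mathbf{F}'(x,y)=\left[\begin{array}{cc} y&x\\ 2x&0\end{array}\right]. \nonumber \] If \(\mathbf{H}=\mathbf{F}\circ\mathbf{G}\), then \[ \mathbf{H}'(u,v)=\mathbf{F}'(\mathbf{G}(u,v))\,\mathbf{G}'(u,v)= \left[\begin{array}{cc} uv&u+v\\ 2(u+v)&0\end{array}\right] \left[\begin{array}{cc} 1&1\\ v&u\end{array}\right]= \left[\begin{array}{cc} 2uv+v^2&u^2+2uv\\ 2(u+v)&2(u+v)\end{array}\right], \nonumber \] which agrees with differentiating \(\mathbf{H}(u,v)=(u^2v+uv^2,(u+v)^2)\) directly.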

    So far our discussion of transformations has dealt mainly with properties that could just as well be defined and studied by considering the component functions individually. Now we turn to questions involving a transformation as a whole, that cannot be studied by regarding it as a collection of independent component functions.

    In this section we restrict our attention to transformations from \(\R^n\) to itself. It is useful to interpret such transformations geometrically. If \(\mathbf{F}=(f_1,f_2, \dots,f_n)\), we can think of the components of \[ \mathbf{F}(\mathbf{X})=(f_1(\mathbf{X}),f_2(\mathbf{X}), \dots,f_n (\mathbf{X})) \nonumber \] as the coordinates of a point \(\mathbf{U}=\mathbf{F}(\mathbf{X})\) in another ``copy'' of \(\R^n\). Thus, \(\mathbf{U}=(u_1,u_2, \dots,u_n)\), with \[ u_1=f_1(\mathbf{X}),\quad u_2=f_2(\mathbf{X}),\quad\dots,\quad u_n= f_n(\mathbf{X}). \nonumber \] We say that \(\mathbf{F}\) maps \(\mathbf{X}\) to \(\mathbf{U}\), and that \(\mathbf{U}\) is the image of \(\mathbf{X}\) under \(\mathbf{F}\). Occasionally we will also write \(\partial u_i/ \partial x_j\) to mean \(\partial f_i/\partial x_j\). If \(S\subset D_\mathbf{F}\), then the set \[ \mathbf{F}(S)=\set{\mathbf{U}}{\mathbf{U}=\mathbf{F}(\mathbf{X}),\,\mathbf{X}\in S} \nonumber \] is the image of \(S\) under \(\mathbf{F}\).

    We will often denote the components of \(\mathbf{X}\) by \(x\), \(y\), \(\dots\), and the components of \(\mathbf{U}\) by \(u\), \(v\), \(\dots\).


    A transformation \(\mathbf{F}\) is one-to-one, or invertible, if \(\mathbf{F}(\mathbf{X}_1)\) and \(\mathbf{F}(\mathbf{X}_2)\) are distinct whenever \(\mathbf{X}_1\) and \(\mathbf{X}_2\) are distinct points of \(D_\mathbf{F}\). In this case, we can define a function \(\mathbf{G}\) on the range \[ R(\mathbf{F})=\set{\mathbf{U}}{\mathbf{U}=\mathbf{F}(\mathbf{X})\mbox{ for some }\mathbf{X}\in D_\mathbf{F}} \nonumber \] of \(\mathbf{F}\) by defining \(\mathbf{G}(\mathbf{U})\) to be the unique point \(\mathbf{X}\) in \(D_\mathbf{F}\) such that \(\mathbf{F}(\mathbf{X}) =\mathbf{U}\). Then \[ D_\mathbf{G}=R(\mathbf{F})\mbox{\quad and\quad} R(\mathbf{G})=D_\mathbf{F}. \nonumber \] Moreover, \(\mathbf{G}\) is one-to-one, \[ \mathbf{G}(\mathbf{F}(\mathbf{X}))=\mathbf{X},\quad \mathbf{X}\in D_\mathbf{F}, \nonumber \] and \[ \mathbf{F}(\mathbf{G}(\mathbf{U}))=\mathbf{U},\quad \mathbf{U}\in D_\mathbf{G}. \nonumber \] We say that \(\mathbf{G}\) is the inverse of \(\mathbf{F}\), and write \(\mathbf{G}=\mathbf{F}^{-1}\). The relation between \(\mathbf{F}\) and \(\mathbf{G}\) is symmetric; that is, \(\mathbf{F}\) is also the inverse of \(\mathbf{G}\), and we write \(\mathbf{F}= \mathbf{G}^{-1}\).

    The crucial difference between the transformations of the two preceding examples is that the matrix of \(\mathbf{L}\) is nonsingular while the matrix of \(\mathbf{L}_1\) is singular. Thus, \(\mathbf{L}\) can be written as \[\begin{equation} \label{eq:6.3.5} \left[\begin{array}{c} u\\ v\end{array}\right]=\left[\begin{array}{rr} 1&-1\\ 1& 1\end{array}\right]\left[\begin{array}{c} x\\ y\end{array}\right], \end{equation} \nonumber \] where the matrix has the inverse \[ \left[\begin{array}{rr}\frac{1}{2}&\frac{1}{2}\\ [3\jot] -\frac{1}{2}&\frac{1}{2}\end{array}\right]. \nonumber \] (Verify.) Multiplying both sides of \eqref{eq:6.3.5} by this matrix yields \[ \left[\begin{array}{rr}\frac{1}{2}&\frac{1}{2}\\ [3\jot] -\frac{1}{2}&\frac{1}{2}\end{array}\right] \left[\begin{array}{c} u\\ v\end{array}\right]=\left[\begin{array}{c} x\\ y\end{array} \right], \nonumber \] which expresses \((x,y)\) uniquely in terms of \((u,v)\).

    Since the matrix \[ \left[\begin{array}{cc} 1&1\\ 2&2\end{array}\right] \nonumber \] of \(\mathbf{L}_1\) is singular, the corresponding system \[ u=x+y,\quad v=2x+2y \nonumber \] cannot be solved uniquely for \((x,y)\) in terms of \((u,v)\). In fact, it cannot be solved at all unless \(v=2u\).

    The following theorem settles the question of invertibility of linear transformations from \(\R^n\) to \(\R^n\). We leave the proof to you (Exercise~).

    We will now briefly review polar coordinates, which we will use in some of the following examples.

    The coordinates of any point \((x,y)\) can be written in infinitely many ways as \[\begin{equation} \label{eq:6.3.6} x=r\cos\theta,\quad y=r\sin\theta, \end{equation} \nonumber \]

    where \[ r^2=x^2+y^2 \nonumber \] and, if \(r>0\), \(\theta\) is the angle from the \(x\)-axis to the line segment from \((0,0)\) to \((x,y)\), measured counterclockwise (Figure~).


    For each \((x,y)\ne (0,0)\) there are infinitely many values of \(\theta\), differing by integral multiples of \(2\pi\), that satisfy \eqref{eq:6.3.6}. If \(\theta\) is any of these values, we say that \(\theta\) is an argument of \((x,y)\), and write \[ \theta=\arg(x,y). \nonumber \] By itself, this does not define a function. However, if \(\phi\) is an arbitrary fixed number, then \[ \theta=\arg(x,y),\quad \phi\le\theta<\phi+2\pi, \nonumber \] does define a function, since every half-open interval \([\phi,\phi+2\pi)\) contains exactly one argument of \((x,y)\).

    We do not define \(\arg(0,0)\), since \eqref{eq:6.3.6} places no restriction on \(\theta\) if \((x,y)=(0,0)\) and therefore \(r=0\).

    The transformation \[ \left[\begin{array}{c} r\\\theta\end{array}\right]=\mathbf{G}(x,y)= \left[\begin{array}{c}\sqrt{x^2+y^2}\\ [3\jot] \arg(x,y)\end{array}\right],\quad \phi\le\arg(x,y)<\phi+2\pi, \nonumber \] is defined and one-to-one on \[ D_\mathbf{G}=\set{(x,y)}{(x,y)\ne (0,0)}, \nonumber \] and its range is \[ R(\mathbf{G})=\set{(r,\theta)}{ r>0,\, \phi\le\theta<\phi+2\pi}. \nonumber \]

    For example, if \(\phi=0\), then \[ \mathbf{G}(1,1)=\left[\begin{array}{c}\sqrt{2}\\ [3\jot] \dst{\frac{\pi}{4}}\end{array}\right], \nonumber \] since \(\pi/4\) is the unique argument of \((1,1)\) in \([0,2\pi)\). If \(\phi=\pi\), then \[ \mathbf{G}(1,1)=\left[\begin{array}{c}\sqrt{2}\\ [3\jot] \dst{\frac{9\pi}{4}}\end{array}\right], \nonumber \] since \(9\pi/4\) is the unique argument of \((1,1)\) in \([\pi,3\pi)\).

    If \(\arg(x_0,y_0)=\phi\), then \((x_0,y_0)\) is on the half-line shown in Figure~ and \(\mathbf{G}\) is not continuous at \((x_0,y_0)\), since every neighborhood of \((x_0,y_0)\) contains points \((x,y)\) for which the second component of \(\mathbf{G}(x,y)\) is arbitrarily close to \(\phi+2\pi\), while the second component of \(\mathbf{G}(x_0,y_0)\) is \(\phi\). We will show later, however, that \(\mathbf{G}\) is continuous, in fact, continuously differentiable, on the plane with this half-line deleted.


    A transformation \(\mathbf{F}\) may fail to be one-to-one, but be one-to-one on a subset \(S\) of \(D_\mathbf{F}\). By this we mean that \(\mathbf{F}(\mathbf{X}_1)\) and \(\mathbf{F}(\mathbf{X}_2)\) are distinct whenever \(\mathbf{X}_{1}\) and \(\mathbf{X}_2\) are distinct points of \(S\). In this case, \(\mathbf{F}\) is not invertible, but if \(\mathbf{F}_S\) is defined on \(S\) by \[ \mathbf{F}_S(\mathbf{X})=\mathbf{F}(\mathbf{X}),\quad \mathbf{X}\in S, \nonumber \] and left undefined for \(\mathbf{X}\not\in S\), then \(\mathbf{F}_S\) is invertible. We say that \(\mathbf{F}_S\) is the restriction of \(\mathbf{F}\) to \(S\), and that \(\mathbf{F}^{-1}_S\) is the inverse of \(\mathbf{F}\) restricted to \(S\). The domain of \(\mathbf{F}^{-1}_S\) is \(\mathbf{F}(S)\).

    If \(\mathbf{F}\) is one-to-one on a neighborhood of \(\mathbf{X}_0\), we say that \(\mathbf{F}\) is locally invertible at \(\mathbf{X}_0\). If this is true for every \(\mathbf{X}_0\) in a set \(S\), then \(\mathbf{F}\) is locally invertible on \(S\).

    The question of invertibility of an arbitrary transformation \(\mathbf{F}: \R^n\to \R^n\) is too general to have a useful answer. However, there is a useful and easily applicable sufficient condition which implies that one-to-one restrictions of continuously differentiable transformations have continuously differentiable inverses.

    To motivate our study of this question, let us first consider the linear transformation \[ \mathbf{F}(\mathbf{X})=\mathbf{A}\mathbf{X}=\left[\begin{array}{cccc} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\\vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn}\end{array}\right] \left[\begin{array}{c} x_1\\ x_2\\\vdots\\ x_n\end{array}\right]. \nonumber \] From Theorem~, \(\mathbf{F}\) is invertible if and only if \(\mathbf{A}\) is nonsingular, in which case \(R(\mathbf{F})=\R^n\) and \[ \mathbf{F}^{-1}(\mathbf{U})=\mathbf{A}^{-1}\mathbf{U}. \nonumber \] Since \(\mathbf{A}\) and \(\mathbf{A}^{-1}\) are the differential matrices of \(\mathbf{F}\) and \(\mathbf{F}^{-1}\), respectively, we can say that a linear transformation is invertible if and only if its differential matrix \(\mathbf{F}'\) is nonsingular, in which case the differential matrix of \(\mathbf{F}^{-1}\) is given by \[ (\mathbf{F}^{-1})'=(\mathbf{F}')^{-1}. \nonumber \] Because of this, it is tempting to conjecture that if \(\mathbf{F}: \R^n\to \R^n\) is continuously differentiable and \(\mathbf{F}'(\mathbf{X})\) is nonsingular, or, equivalently, \(J\mathbf{F}(\mathbf{X})\ne0\), for \(\mathbf{X}\) in a set \(S\), then \(\mathbf{F}\) is one-to-one on \(S\). However, this is false. For example, if \[ \mathbf{F}(x,y)=\left[\begin{array}{c} e^x\cos y\\ e^x\sin y\end{array}\right], \nonumber \] then \[\begin{equation} \label{eq:6.3.19} J\mathbf{F}(x,y)=\left|\begin{array}{cr} e^x\cos y&-e^x\sin y\\ e^x\sin y&e^x\cos y\end{array}\right|=e^{2x}\ne0, \end{equation} \nonumber \] but \(\mathbf{F}\) is not one-to-one on \(\R^2\), since \(\cos\) and \(\sin\) have period \(2\pi\) and therefore \(\mathbf{F}(x,y+2\pi)=\mathbf{F}(x,y)\) for all \((x,y)\). The best that can be said in general is that if \(\mathbf{F}\) is continuously differentiable and \(J\mathbf{F}(\mathbf{X}) \ne0\) in an open set \(S\), then \(\mathbf{F}\) is locally invertible on \(S\), and the local inverses are continuously differentiable. This is part of the inverse function theorem, which we will prove presently. First, we need the following definition.

    We first show that if \(\mathbf{X}_{0} \in S\), then a neighborhood of \(\mathbf{F}(\mathbf{X}_0)\) is in \(\mathbf{F}(S)\). This implies that \(\mathbf{F}(S)\) is open.

    Since \(S\) is open, there is a \(\rho>0\) such that \(\overline{B_\rho(\mathbf{X}_0)}\subset S\). Let \(B\) be the boundary of \(B_\rho(\mathbf{X}_0)\); thus, \[\begin{equation} \label{eq:6.3.20} B=\set{\mathbf{X}}{|\mathbf{X}-\mathbf{X}_0|=\rho}. \end{equation} \nonumber \] The function \[ \sigma(\mathbf{X})=|\mathbf{F}(\mathbf{X})-\mathbf{F}(\mathbf{X}_0)| \nonumber \] is continuous on \(S\) and therefore on \(B\), which is compact. Hence, by Theorem~, there is a point \(\mathbf{X}_1\) in \(B\) where \(\sigma(\mathbf{X})\) attains its minimum value, say \(m\), on \(B\). Moreover, \(m>0\), since \(\mathbf{X}_1\ne\mathbf{X}_0\) and \(\mathbf{F}\) is one-to-one on \(S\). Therefore, \[\begin{equation} \label{eq:6.3.21} |\mathbf{F}(\mathbf{X})-\mathbf{F}(\mathbf{X}_0)|\ge m>0\mbox{\quad if\quad} |\mathbf{X}-\mathbf{X}_0|=\rho. \end{equation} \nonumber \] The set \[ \set{\mathbf{U}}{|\mathbf{U}-\mathbf{F}(\mathbf{X}_0)|<m/2} \nonumber \] is a neighborhood of \(\mathbf{F}(\mathbf{X}_0)\). We will show that it is a subset of \(\mathbf{F}(S)\). To see this, let \(\mathbf{U}\) be a fixed point in this set; thus, \[\begin{equation} \label{eq:6.3.22} |\mathbf{U}-\mathbf{F}(\mathbf{X}_0)|<m/2. \end{equation} \nonumber \] Consider the function \[ \sigma_1(\mathbf{X})=|\mathbf{U}-\mathbf{F}(\mathbf{X})|^2, \nonumber \] which is continuous on \(S\). Note that \[\begin{equation} \label{eq:6.3.23} \sigma_1(\mathbf{X})\ge\frac{m^2}{4}\mbox{\quad if \quad} |\mathbf{X}-\mathbf{X}_0|=\rho, \end{equation} \nonumber \] since if \(|\mathbf{X}-\mathbf{X}_0|=\rho\), then \[\begin{aligned} |\mathbf{U}-\mathbf{F}(\mathbf{X})|&=|(\mathbf{U}-\mathbf{F}(\mathbf{X}_0)) +(\mathbf{F}(\mathbf{X}_0)-\mathbf{F}(\mathbf{X}))| \\ &\ge \big||\mathbf{F}(\mathbf{X}_0)-\mathbf{F}(\mathbf{X})| -|\mathbf{U}-\mathbf{F}(\mathbf{X}_0)|\big|\\ &\ge m-\frac{m}{2}=\frac{m}{2}, \end{aligned} \nonumber \] from \eqref{eq:6.3.21} and \eqref{eq:6.3.22}.

    Since \(\sigma_1\) is continuous on \(S\), \(\sigma_1\) attains a minimum value \(\mu\) on the compact set \(\overline{B_\rho(\mathbf{X}_0)}\) (Theorem~); that is, there is an \(\overline{\mathbf{X}}\) in \(\overline{B_\rho(\mathbf{X}_0)}\) such that \[ \sigma_1(\mathbf{X})\ge\sigma_1(\overline{\mathbf{X}})=\mu, \quad\mathbf{X}\in\overline{B_\rho (\mathbf{X}_0)}. \nonumber \] Setting \(\mathbf{X}=\mathbf{X}_0\), we conclude from this and \eqref{eq:6.3.22} that \[ \sigma_1(\overline{\mathbf{X}})=\mu\le\sigma_1(\mathbf{X}_0)<\frac{m^2}{4}. \nonumber \] Because of this and \eqref{eq:6.3.23}, \(\overline{\mathbf{X}}\) cannot be in \(B\), so \(\overline{\mathbf{X}}\in B_\rho(\mathbf{X}_0)\).

    Now we want to show that \(\mu=0\); that is, \(\mathbf{U}= \mathbf{F}(\overline{\mathbf{X}})\). To this end, we note that \(\sigma_1(\mathbf{X})\) can be written as \[ \sigma_1(\mathbf{X})=\sum^n_{j=1}(u_j-f_j(\mathbf{X}))^2, \nonumber \] so \(\sigma_1\) is differentiable on \(B_\rho(\mathbf{X}_{0})\). Therefore, the first partial derivatives of \(\sigma_1\) are all zero at the local minimum point \(\overline{\mathbf{X}}\) (Theorem~), so \[ \sum^n_{j=1}\frac{\partial f_j(\overline{\mathbf{X}})}{\partial x_i} (u_j-f_j(\overline{\mathbf{X}}))=0,\quad 1\le i\le n, \nonumber \] or, in matrix form, \[ [\mathbf{F}'(\overline{\mathbf{X}})]^t(\mathbf{U}-\mathbf{F}(\overline{\mathbf{X}}))=\mathbf{0}. \nonumber \] Since \(\mathbf{F}'(\overline{\mathbf{X}})\) is nonsingular, so is its transpose, and this implies that \(\mathbf{U}= \mathbf{F}(\overline{\mathbf{X}})\) (Theorem~). Thus, we have shown that every \(\mathbf{U}\) that satisfies \eqref{eq:6.3.22} is in \(\mathbf{F}(S)\). Therefore, since \(\mathbf{X}_0\) is an arbitrary point of \(S\), \(\mathbf{F}(S)\) is open.

    Next, we show that \(\mathbf{G}\) is continuous on \(\mathbf{F}(S)\). Suppose that \(\mathbf{U}_0\in\mathbf{F}(S)\) and \(\mathbf{X}_0\) is the unique point in \(S\) such that \(\mathbf{F}(\mathbf{X}_0)=\mathbf{U}_0\). Since \(\mathbf{F}'(\mathbf{X}_{0})\) is invertible, Lemma~ implies that there is a \(\lambda>0\) and an open neighborhood \(N\) of \(\mathbf{X}_0\) such that \(N\subset S\) and \[\begin{equation} \label{eq:6.3.24} |\mathbf{F}(\mathbf{X})-\mathbf{F}(\mathbf{X}_0)|\ge\lambda |\mathbf{X}-\mathbf{X}_0| \mbox{\quad if\quad}\mathbf{X}\in N. \end{equation} \nonumber \] (Exercise~ also implies this.) Since \(\mathbf{F}\) satisfies the hypotheses of the present theorem on \(N\), the first part of this proof shows that \(\mathbf{F}(N)\) is an open set containing \(\mathbf{U}_0=\mathbf{F} (\mathbf{X}_0)\). Therefore, there is a \(\delta>0\) such that \(\mathbf{X}=\mathbf{G}(\mathbf{U})\) is in \(N\) if \(\mathbf{U}\in B_\delta(\mathbf{U}_{0})\). Setting \(\mathbf{X}=\mathbf{G}(\mathbf{U})\) and \(\mathbf{X}_0 = \mathbf{G}(\mathbf{U}_0)\) in \eqref{eq:6.3.24} yields \[ |\mathbf{F}(\mathbf{G}(\mathbf{U}))-\mathbf{F}(\mathbf{G}(\mathbf{U}_0)) |\ge\lambda |\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|\mbox{\quad if \quad} \mathbf{U}\in B_\delta (\mathbf{U}_0). \nonumber \] Since \(\mathbf{F}(\mathbf{G}(\mathbf{U}))=\mathbf{U}\), this can be rewritten as \[\begin{equation} \label{eq:6.3.25} |\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|\le\frac{1}{\lambda} |\mathbf{U}- \mathbf{U}_0|\mbox{\quad if\quad}\mathbf{U}\in B_\delta(\mathbf{U}_0), \end{equation} \nonumber \] which means that \(\mathbf{G}\) is continuous at \(\mathbf{U}_0\). Since \(\mathbf{U}_0\) is an arbitrary point in \(\mathbf{F}(S)\), it follows that \(\mathbf{G}\) is continuous on \(\mathbf{F}(S)\).

    We will now show that \(\mathbf{G}\) is differentiable at \(\mathbf{U}_0\). Since \[ \mathbf{G}(\mathbf{F}(\mathbf{X}))=\mathbf{X},\quad\mathbf{X}\in S, \nonumber \] the chain rule (Theorem~) implies that if \(\mathbf{G}\) is differentiable at \(\mathbf{U}_0\), then \[ \mathbf{G}'(\mathbf{U}_0)\mathbf{F}'(\mathbf{X}_0)=\mathbf{I} \nonumber \]

    (Example~). Therefore, if \(\mathbf{G}\) is differentiable at \(\mathbf{U}_0\), the differential matrix of \(\mathbf{G}\) must be \[ \mathbf{G}'(\mathbf{U}_0)=[\mathbf{F}'(\mathbf{X}_0)]^{-1}, \nonumber \] so to show that \(\mathbf{G}\) is differentiable at \(\mathbf{U}_0\), we must show that if \[\begin{equation} \label{eq:6.3.26} \mathbf{H}(\mathbf{U})= \frac{\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)- [\mathbf{F}'(\mathbf{X} _0)]^{-1} (\mathbf{U}-\mathbf{U}_0)}{ |\mathbf{U}-\mathbf{U}_0|}\quad (\mathbf{U}\ne\mathbf{U}_0), \end{equation} \nonumber \] then \[\begin{equation} \label{eq:6.3.27} \lim_{\mathbf{U}\to\mathbf{U}_0}\mathbf{H}(\mathbf{U})=\mathbf{0}. \end{equation} \nonumber \]

    Since \(\mathbf{F}\) is one-to-one on \(S\) and \(\mathbf{F} (\mathbf{G}(\mathbf{U})) =\mathbf{U}\), it follows that if \(\mathbf{U}\ne\mathbf{U}_0\), then \(\mathbf{G}(\mathbf{U})\ne\mathbf{G}(\mathbf{U}_0)\). Therefore, we can multiply the numerator and denominator of \eqref{eq:6.3.26} by \(|\mathbf{G}(\mathbf{U}) -\mathbf{G}(\mathbf{U}_0)|\) to obtain \[ \begin{array}{rcl} \mathbf{H}(\mathbf{U})&=& \dst\frac{|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_{0})|} {|\mathbf{U}-\mathbf{U}_0|} \left(\frac{\mathbf{G}(\mathbf{U})-\mathbf{G} (\mathbf{U}_0)- \left[\mathbf{F}'(\mathbf{X}_{0}) \right]^{-1}(\mathbf{U}-\mathbf{U}_0)} {|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|}\right)\\\\ &=&-\dst\frac{|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|}{ |\mathbf{U}-\mathbf{U}_0|} \left[\mathbf{F}'(\mathbf{X}_0)\right]^{-1} \left(\frac{\mathbf{U}-\mathbf{U}_0-\mathbf{F}'(\mathbf{X}_0) (\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)) }{|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|}\right) \end{array} \nonumber \] if \(0<|\mathbf{U}-\mathbf{U}_0|<\delta\). Because of \eqref{eq:6.3.25}, this implies that \[ |\mathbf{H}(\mathbf{U})|\le\frac{1}{\lambda} \|[\mathbf{F}'(\mathbf{X}_0)]^{-1}\| \left|\frac{\mathbf{U}-\mathbf{U}_0-\mathbf{F}'(\mathbf{X}_0) (\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0))}{ |\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|}\right| \nonumber \] if \(0<|\mathbf{U}-\mathbf{U}_0|<\delta\). Now let \[ \mathbf{H}_1(\mathbf{U})=\frac{\mathbf{U}-\mathbf{U}_0-\mathbf{F}'(\mathbf{X}_0) (\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0))}{ |\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|}. \nonumber \] To complete the proof of \eqref{eq:6.3.27}, we must show that \[\begin{equation} \label{eq:6.3.28} \lim_{\mathbf{U}\to\mathbf{U}_0}\mathbf{H}_1(\mathbf{U})=\mathbf{0}. \end{equation} \nonumber \] Since \(\mathbf{F}\) is differentiable at \(\mathbf{X}_0\), we know that if \[ \mathbf{H}_2(\mathbf{X})= \frac{\mathbf{F}(\mathbf{X})-\mathbf{F}(\mathbf{X}_0)-\mathbf{F}'(\mathbf{X}_0) (\mathbf{X}-\mathbf{X}_0)}{ |\mathbf{X}-\mathbf{X}_0|}, \nonumber \] then \[\begin{equation} \label{eq:6.3.29} \lim_{\mathbf{X}\to\mathbf{X}_0}\mathbf{H}_2(\mathbf{X})=\mathbf{0}. \end{equation} \nonumber \] Since \(\mathbf{F}(\mathbf{G}(\mathbf{U}))=\mathbf{U}\) and \(\mathbf{X}_0= \mathbf{G}(\mathbf{U}_0)\), \[ \mathbf{H}_1(\mathbf{U})=\mathbf{H}_2(\mathbf{G}(\mathbf{U})). \nonumber \]

    Now suppose that \(\epsilon>0\). From \eqref{eq:6.3.29}, there is a \(\delta_1>0\) such that \[\begin{equation} \label{eq:6.3.30} |\mathbf{H}_2(\mathbf{X})|<\epsilon\mbox{\quad if \quad} 0< |\mathbf{X}-\mathbf{X}_{0}| =|\mathbf{X}-\mathbf{G}(\mathbf{U}_0)|<\delta_1. \end{equation} \nonumber \] Since \(\mathbf{G}\) is continuous at \(\mathbf{U}_0\), there is a \(\delta_2\in(0,\delta)\) such that \[ |\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|<\delta_1\mbox{\quad if \quad} 0<|\mathbf{U}-\mathbf{U}_0|<\delta_2. \nonumber \] This and \eqref{eq:6.3.30} imply that \[ |\mathbf{H}_1(\mathbf{U})|=|\mathbf{H}_2(\mathbf{G}(\mathbf{U}))|<\epsilon \mbox{\quad if \quad} 0<|\mathbf{U}-\mathbf{U}_0|<\delta_2. \nonumber \] Since this implies \eqref{eq:6.3.28}, \(\mathbf{G}\) is differentiable at \(\mathbf{U}_0\).

    Since \(\mathbf{U}_0\) is an arbitrary member of \(\mathbf{F}(N)\), we can now drop the zero subscript and conclude that \(\mathbf{G}\) is continuous and differentiable on \(\mathbf{F}(N)\), and \[ \mathbf{G}'(\mathbf{U})=[\mathbf{F}'(\mathbf{X})]^{-1},\quad\mathbf{U}\in\mathbf{F}(N). \nonumber \] To see that \(\mathbf{G}\) is continuously differentiable on \(\mathbf{F}(N)\), we observe that by Theorem~, each entry of \(\mathbf{G}'(\mathbf{U})\) (that is, each partial derivative \(\partial g_i(\mathbf{U})/\partial u_j\), \(1\le i, j\le n\)) can be written as the ratio, with nonzero denominator, of determinants with entries of the form \[\begin{equation} \label{eq:6.3.31} \frac{\partial f_r(\mathbf{G}(\mathbf{U}))}{\partial x_s}. \end{equation} \nonumber \] Since \(\partial f_r/\partial x_s\) is continuous on \(N\) and \(\mathbf{G}\) is continuous on \(\mathbf{F}(N)\), Theorem~ implies that \eqref{eq:6.3.31} is continuous on \(\mathbf{F}(N)\). Since a determinant is a continuous function of its entries, it now follows that the entries of \(\mathbf{G}'(\mathbf{U})\) are continuous on \(\mathbf{F}(N)\).

    If \(\mathbf{F}\) is regular on an open set \(S\), we say that \(\mathbf{F}^{-1}_S\) is a branch of \(\mathbf{F}^{-1}\). (This is a convenient terminology but is not meant to imply that \(\mathbf{F}\) actually has an inverse.) From this definition, it is possible to define a branch of \(\mathbf{F}^{-1}\) on a set \(T \subset R(\mathbf{F})\) if and only if \(T=\mathbf{F}(S)\), where \(\mathbf{F}\) is regular on \(S\). There may be open subsets of \(R(\mathbf{F})\) that do not have this property, and therefore no branch of \(\mathbf{F}^{-1}\) can be defined on them. It is also possible that \(T=\mathbf{F}(S_1)= \mathbf{F}(S_2)\), where \(S_1\) and \(S_2\) are distinct subsets of \(D_\mathbf{F}\). In this case, more than one branch of \(\mathbf{F}^{-1}\) is defined on \(T\). Thus, we saw in Example~ that two branches of \(\mathbf{F}^{-1}\) may be defined on a set \(T\). In Example~ infinitely many branches of \(\mathbf{F}^{-1}\) are defined on the same set.

    It is useful to define branches of the argument. To do this, we think of the relationship between polar and rectangular coordinates in terms of the transformation \[\begin{equation} \label{eq:6.3.32} \left[\begin{array}{c} x\\ y\end{array}\right]=\mathbf{F}(r,\theta)= \left[\begin{array}{c} r \cos\theta\\ r\sin\theta\end{array}\right], \end{equation} \nonumber \] where for the moment we regard \(r\) and \(\theta\) as rectangular coordinates of a point in an \(r\theta\)-plane. Let \(S\) be an open subset of the right half of this plane (that is, \(S\subset\set{(r,\theta)}{r>0}\))

    that does not contain any pair of points \((r,\theta)\) and \((r,\theta+2k\pi)\), where \(k\) is a nonzero integer. Then \(\mathbf{F}\) is one-to-one and continuously differentiable on \(S\), with \[\begin{equation} \label{eq:6.3.33} \mathbf{F}'(r,\theta)=\left[\begin{array}{rr}\cos\theta&-r\sin\theta\\\sin\theta& r\cos\theta\end{array}\right] \end{equation} \nonumber \] and \[\begin{equation} \label{eq:6.3.34} J\mathbf{F}(r,\theta)=r>0,\quad (r,\theta)\in S. \end{equation} \nonumber \] Hence, \(\mathbf{F}\) is regular on \(S\). Now let \(T=\mathbf{F}(S)\), the set of points in the \(xy\)-plane with polar coordinates in \(S\). Theorem~ states that \(T\) is open and \(\mathbf{F}_S\) has a continuously differentiable inverse (which we denote by \(\mathbf{G}\), rather than \(\mathbf{F}^{-1}_S\), for typographical reasons) \[ \left[\begin{array}{c} r\\\theta\end{array}\right]= \mathbf{G}(x,y)=\left[\begin{array}{c} \sqrt{x^2+y^2}\\ [3\jot]\arg_S(x,y)\end{array}\right],\quad (x,y)\in T, \nonumber \] where \(\arg_S(x,y)\) is the unique value of \(\arg(x,y)\) such that \[ (r,\theta)=\left(\sqrt{x^2+y^2},\,\arg_S(x,y)\right)\in S. \nonumber \] We say that \(\arg_S(x,y)\) is a branch of the argument defined on \(T\). Theorem~ also implies that \[\begin{aligned} \mathbf{G}'(x,y)=\left[ \mathbf{F}'(r,\theta)\right]^{-1}&=\left[\begin{array}{rr}\cos\theta& \sin\theta\\ \dst{-\frac{\sin\theta}{ r}}&\dst{\frac{\cos\theta}{ r}}\end{array}\right]\mbox{\quad (see \eqref{eq:6.3.33})}\\ &=\left[\begin{array}{rr}\dst{\frac{x}{\sqrt{x^2+y^2}}}& \dst{\frac{y}{\sqrt{x^2+y^2}}}\\ [3\jot] \dst{-\frac{y}{ x^2+y^2}}&\dst{\frac{x}{ x^2+y^2}} \end{array}\right] \mbox{\quad (see \eqref{eq:6.3.32})}. \end{aligned} \nonumber \] Therefore, \[\begin{equation} \label{eq:6.3.35} \frac{\partial\arg_S(x,y)}{\partial x}=-\frac{y}{ x^2+y^2},\quad \frac{\partial\arg_S(x,y)}{\partial y}=\frac{x}{ x^2+y^2}. \end{equation} \nonumber \]
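
    For example, if \(S=\set{(r,\theta)}{r>0,\ -\pi<\theta<\pi}\), then \(T\) is the \(xy\)-plane with the nonpositive \(x\)-axis (including the origin) removed, and \(\arg_S(x,y)\) is the usual principal angle; thus, \(\arg_S(1,1)=\pi/4\), and \eqref{eq:6.3.35} gives \[ \frac{\partial\arg_S(1,1)}{\partial x}=-\frac{1}{2},\quad \frac{\partial\arg_S(1,1)}{\partial y}=\frac{1}{2}. \nonumber \]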

A branch of \(\arg(x,y)\) can be defined on an open set \(T\) of the \(xy\)-plane if and only if the polar coordinates of the points in \(T\) form an open subset of the \(r\theta\)-plane that does not intersect the \(\theta\)-axis or contain any two points of the form \((r,\theta)\) and \((r,\theta+2k\pi)\), where \(k\) is a nonzero integer. No subset containing the origin \((x,y)=(0,0)\) has this property, nor does any deleted neighborhood of the origin (Exercise~), so there are open sets on which no branch of the argument can be defined. However, if one branch can be defined on \(T\), then so can infinitely many others. (Why?) All branches of \(\arg(x,y)\) have the same partial derivatives, given in \eqref{eq:6.3.35}.

We leave it to you (Exercise~) to verify that these formulas can also be obtained by direct differentiation.
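For instance, on a region where \(x>0\), any branch of the argument differs from \(\arctan(y/x)\) by a fixed integer multiple of \(2\pi\), so differentiating directly gives \[ \frac{\partial}{\partial x}\arctan\frac{y}{x}=\frac{1}{1+y^2/x^2}\left(-\frac{y}{x^2}\right)=-\frac{y}{x^2+y^2},\quad \frac{\partial}{\partial y}\arctan\frac{y}{x}=\frac{1}{1+y^2/x^2}\cdot\frac{1}{x}=\frac{x}{x^2+y^2}, \nonumber \] in agreement with \eqref{eq:6.3.35}; the cases \(x<0\) and \(y\ne0\) are handled similarly.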

\begin{example}\rm If \[ \left[\begin{array}{c} u\\ v\end{array}\right]= \mathbf{F}(x,y)=\left[\begin{array}{c} e^x\cos y\\ e^x\sin y\end{array}\right] \nonumber \] (Example~), we can also define a branch \(\mathbf{G}\) of \(\mathbf{F}^{-1}\) on any subset \(T\) of the \(uv\)-plane on which a branch of \(\arg(u,v)\) can be defined, and \(\mathbf{G}\) has the form \[\begin{equation} \label{eq:6.3.39} \left[\begin{array}{r} x\\ y\end{array}\right]= \mathbf{G}(u,v)=\left[\begin{array}{c} \log(u^2+v^2)^{1/2}\\\arg(u,v)\end{array}\right]. \end{equation} \nonumber \] Since the branches of the argument differ by integral multiples of \(2\pi\), \eqref{eq:6.3.39} implies that if \(\mathbf{G}_1\) and \(\mathbf{G}_2\) are branches of \(\mathbf{F}^{-1}\), both defined on \(T\), then \[ \mathbf{G}_1(u,v)-\mathbf{G}_2(u,v)=\left[\begin{array}{c} 0 \\ 2k\pi\end{array}\right] \mbox{\quad ($k=$ integer)}. \nonumber \]

    From Theorem~, \[\begin{eqnarray*} \mathbf{G}'(u,v)=\left[ \mathbf{F}'(x,y)\right]^{-1}\ar=\left[\begin{array}{rr} e^x\cos y &-e^x\sin y\\ e^x\sin y&e^x\cos y\end{array}\right]^{-1}\\[2\jot] \ar=\left[\begin{array}{rr}e^{-x}\cos y&e^{-x}\sin y\\ -e^{-x}\sin y&e^{-x}\cos y\end{array}\right]. \end{eqnarray*} \nonumber \] Substituting for \(x\) and \(y\) in terms of \(u\) and \(v\) from , we find that \[\begin{eqnarray*} \frac{\partial x}{\partial u}\ar=\frac{\partial y}{\partial v}= e^{-x}\cos y=e^{-2x}u=\frac{u}{u^2+v^2}\\ \arraytext{and}\\ \frac{\partial x}{\partial v}\ar=-\frac{\partial y}{\partial u}= e^{-x}\sin y=e^{-2x}v=\frac{v}{u^2+v^2}. \end{eqnarray*} \nonumber \] \end{example}
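As a quick consistency check of these formulas, take \((x,y)=(0,0)\), so that \((u,v)=(1,0)\). Both expressions then give \[ \frac{\partial x}{\partial u}=\frac{\partial y}{\partial v}=1,\quad \frac{\partial x}{\partial v}=\frac{\partial y}{\partial u}=0, \nonumber \] since \(e^{-x}\cos y=1\) and \(e^{-x}\sin y=0\) there, while \(u/(u^2+v^2)=1\) and \(v/(u^2+v^2)=0\).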

    Examples~ and show that a continuously differentiable function \(\mathbf{F}\) may fail to have an inverse on a set \(S\) even if \(J\mathbf{F}(\mathbf{X})\ne0\) on \(S\). However, the next theorem shows that in this case \(\mathbf{F}\) is locally invertible on \(S\).

\begin{theorem} [The Inverse Function Theorem] Let \(\mathbf{F}: \R^n\to \R^n\) be continuously differentiable on an open set \(S,\) and suppose that \(J\mathbf{F}(\mathbf{X})\ne0\) on \(S.\) Then\(,\) if \(\mathbf{X}_0\in S,\) there is an open neighborhood \(N\) of \(\mathbf{X}_0\) on which \(\mathbf{F}\) is regular\(.\) Moreover\(,\) \(\mathbf{F}(N)\) is open and \(\mathbf{G}= \mathbf{F}^{-1}_N\) is continuously differentiable on \(\mathbf{F}(N),\) with \[ \mathbf{G}'(\mathbf{U})=\left[\mathbf{F}'(\mathbf{X})\right]^{-1}\quad\mbox{(where } \mathbf{U}=\mathbf{F}(\mathbf{X})\mbox{)},\quad \mathbf{U}\in\mathbf{F}(N). \nonumber \] \end{theorem}

    Lemma~ implies that there is an open neighborhood \(N\) of \(\mathbf{X}_0\) on which \(\mathbf{F}\) is one-to-one. The rest of the conclusions then follow from applying Theorem~ to \(\mathbf{F}\) on \(N\).

By continuity, since \(J\mathbf{F}(\mathbf{X}_0)\ne0\), \(J\mathbf{F}(\mathbf{X})\) is nonzero for all \(\mathbf{X}\) in some open neighborhood \(S\) of \(\mathbf{X}_0\). Now apply Theorem~.

Theorem~ and \eqref{eq:6.3.34} imply that the transformation \eqref{eq:6.3.32} is locally invertible on \(S=\set{(r,\theta)}{ r>0}\), which means that it is possible to define a branch of \(\arg(x,y)\) in a neighborhood of any point \((x_0,y_0)\ne (0,0)\). It also implies, as we have already seen, that the transformation of Example~ is locally invertible everywhere except at \((0,0)\), where its Jacobian equals zero, and that the transformation of Example~ is locally invertible everywhere.
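For a simple one-dimensional illustration of the inverse function theorem, take \(F(x)=x^2\) and \(S=\set{x}{x\ne0}\), so that \(JF(x)=F'(x)=2x\ne0\) on \(S\), although \(F\) is not one-to-one on \(S\) (since \(F(-x)=F(x)\)). If \(x_0=2\), we may take \(N=(0,\infty)\); then \(F(N)=(0,\infty)\) is open, the branch of \(F^{-1}\) on \(F(N)\) is \(G(u)=\sqrt{u}\), and \[ G'(u)=\frac{1}{2\sqrt{u}}=\frac{1}{F'(x)}\quad\mbox{where}\quad u=F(x)=x^2, \nonumber \] as the theorem asserts.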

In this section we consider transformations from \(\R^{n+m}\) to \(\R^m\). It will be convenient to denote points in \(\R^{n+m}\) by \[ (\mathbf{X},\mathbf{U})=(x_1,x_2, \dots,x_n, u_1,u_2, \dots,u_m). \nonumber \] We will often denote the components of \(\mathbf{X}\) by \(x\), \(y\), \(\dots\), and the components of \(\mathbf{U}\) by \(u\), \(v\), \(\dots\).

To motivate the problem we are interested in, we first ask whether the linear system of \(m\) equations in \(m+n\) variables \[\begin{equation} \label{eq:6.4.1} \begin{array}{rcl} a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n+b_{11}u_1+b_{12}u_2+\cdots+b_{1m}u_m\ar=0\\ a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n+b_{21}u_1+b_{22}u_2+\cdots+b_{2m}u_m\ar=0\\&\vdots& \\ a_{m1}x_1+a_{m2}x_2+\cdots+a_{mn}x_n+b_{m1}u_1+b_{m2}u_2+\cdots+ b_{mm}u_m\ar=0\end{array} \end{equation} \nonumber \]

determines \(u_1\), \(u_2\), \(\dots\), \(u_m\) uniquely in terms of \(x_1\), \(x_2\), \(\dots\), \(x_n\). By rewriting the system in matrix form as \[ \mathbf{AX}+\mathbf{BU}=\mathbf{0}, \nonumber \] where \[ \mathbf{A}=\left[\begin{array}{cccc} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\cdots&a_{mn}\end{array}\right],\quad \mathbf{B}=\left[\begin{array}{cccc} b_{11}&b_{12}&\cdots&b_{1m}\\ b_{21}&b_{22}&\cdots&b_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ b_{m1}&b_{m2}&\cdots&b_{mm}\end{array}\right], \nonumber \] \[ \mathbf{X}=\left[\begin{array}{c} x_1\\ x_2\\\vdots\\ x_n\end{array}\right], \mbox{\quad and \quad} \mathbf{U}=\left[\begin{array}{c} u_1\\ u_2\\\vdots\\ u_m\end{array}\right], \nonumber \] we see that \eqref{eq:6.4.1} can be solved uniquely for \(\mathbf{U}\) in terms of \(\mathbf{X}\) if the square matrix \(\mathbf{B}\) is nonsingular. In this case the solution is \[ \mathbf{U}=-\mathbf{B}^{-1}\mathbf{AX}. \nonumber \] For our purposes it is convenient to restate this: If \[\begin{equation} \label{eq:6.4.2} \mathbf{F}(\mathbf{X},\mathbf{U})=\mathbf{AX}+\mathbf{BU}, \end{equation} \nonumber \] where \(\mathbf{B}\) is nonsingular, then the system \[ \mathbf{F}(\mathbf{X},\mathbf{U})=\mathbf{0} \nonumber \] determines \(\mathbf{U}\) as a function of \(\mathbf{X}\), for all \(\mathbf{X}\) in \(\R^n\).
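For instance, with \(n=1\) and \(m=2\), consider the system \[ x+u_1+u_2=0,\quad 2x+u_2=0. \nonumber \] Here \[ \mathbf{A}=\left[\begin{array}{c} 1\\ 2\end{array}\right],\quad \mathbf{B}=\left[\begin{array}{cc} 1&1\\ 0&1\end{array}\right],\quad \mathbf{B}^{-1}=\left[\begin{array}{rr} 1&-1\\ 0&1\end{array}\right], \nonumber \] so \(\mathbf{B}\) is nonsingular and \[ \mathbf{U}=-\mathbf{B}^{-1}\mathbf{A}x=\left[\begin{array}{r} 1\\ -2\end{array}\right]x; \nonumber \] that is, \(u_1=x\) and \(u_2=-2x\), as substitution into the original equations confirms.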

Notice that \(\mathbf{F}\) in \eqref{eq:6.4.2} is a linear transformation. If \(\mathbf{F}\) is a more general transformation from \(\R^{n+m}\) to \(\R^m\), we can still ask whether the system \[ \mathbf{F}(\mathbf{X},\mathbf{U})=\mathbf{0}, \nonumber \] or, in terms of components, \[\begin{eqnarray*} f_1(x_1,x_2, \dots,x_n,u_1,u_2, \dots,u_m)\ar=0\\ f_2(x_1,x_2, \dots,x_n,u_1,u_2, \dots,u_m)\ar=0\\ &\vdots& \\ f_m(x_1,x_2, \dots,x_n, u_1,u_2, \dots,u_m)\ar=0, \end{eqnarray*} \nonumber \] can be solved for \(\mathbf{U}\) in terms of \(\mathbf{X}\). However, the situation is now more complicated, even if \(m=1\). For example, suppose that \(m=1\) and \[ f(x,y,u)=1-x^2-y^2-u^2. \nonumber \]

If \(x^2+y^2>1\), then no value of \(u\) satisfies \[\begin{equation} \label{eq:6.4.3} f(x,y,u)=0. \end{equation} \nonumber \] However, infinitely many functions \(u=u(x,y)\) satisfy \eqref{eq:6.4.3} on the set \[ S=\set{(x,y)}{x^2+y^2\le1}. \nonumber \] They are of the form \[ u(x,y)=\epsilon(x,y)\sqrt{1-x^2-y^2}, \nonumber \] where \(\epsilon(x,y)\) can be chosen arbitrarily, for each \((x,y)\) in \(S\), to be \(1\) or \(-1\). We can narrow the choice of functions to two by requiring that \(u\) be continuous on \(S\); then \[\begin{eqnarray} u(x,y)\ar=\sqrt{1-x^2-y^2} \label{eq:6.4.4}\\ \arraytext{or}\nonumber\\ u(x,y)\ar=-\sqrt{1-x^2-y^2}.\nonumber \end{eqnarray} \nonumber \] We can define a unique continuous solution \(u\) of \eqref{eq:6.4.3} by specifying its value at a single interior point of \(S\). For example, if we require that \[ u\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right)=\frac{1}{\sqrt{3}}, \nonumber \] then \(u\) must be as defined by \eqref{eq:6.4.4}.
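For the continuous solution \eqref{eq:6.4.4}, note that at interior points of \(S\), where \(u(x,y)>0\), differentiating the identity \(f(x,y,u(x,y))=0\) with respect to \(x\) gives \(-2x-2u\,u_x=0\), so \[ u_x(x,y)=-\frac{x}{u(x,y)}=-\frac{x}{\sqrt{1-x^2-y^2}}, \nonumber \] the same result as differentiating \(u(x,y)=\sqrt{1-x^2-y^2}\) directly; similarly, \(u_y=-y/u\). The implicit function theorem below makes this kind of computation systematic.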

    The question of whether an arbitrary system \[ \mathbf{F}(\mathbf{X},\mathbf{U})=\mathbf{0} \nonumber \] determines \(\mathbf{U}\) as a function of \(\mathbf{X}\) is too general to have a useful answer. However, there is a theorem, the implicit function theorem, that answers this question affirmatively in an important special case. To facilitate the statement of this theorem, we partition the differential matrix of \(\mathbf{F}: \R^{n+m}\to \R^m\): \[\begin{equation} \label{eq:6.4.5} \begin{array}{rcl}\mathbf{F}'=\left[\begin{array}{ccccccccc} \dst{\frac{\partial f_1}{\partial x_1}}& \dst{\frac{\partial f_1}{\partial x_2}}&\cdots& \dst{\frac{\partial f_1}{\partial x_n}}&|& \dst{\frac{\partial f_1}{\partial u_1}}& \dst{\frac{\partial f_1}{\partial u_2}}&\cdots& \dst{\frac{\partial f_1}{\partial u_m}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial x_1}}& \dst{\frac{\partial f_2}{\partial x_2}}&\cdots& \dst{\frac{\partial f_2}{\partial x_n}}&|& \dst{\frac{\partial f_2}{\partial u_1}}& \dst{\frac{\partial f_2}{\partial u_2}}&\cdots& \dst{\frac{\partial f_2}{\partial u_m}}\\ [3\jot] \vdots&\vdots&\ddots&\vdots&|&\vdots&\vdots&\ddots&\vdots\\ \dst{\frac{\partial f_m}{\partial x_1}}& \dst{\frac{\partial f_m}{\partial x_2}}&\cdots& \dst{\frac{\partial f_m}{\partial x_n}}&|& \dst{\frac{\partial f_m}{\partial u_1}}& \dst{\frac{\partial f_m}{\partial u_2}}&\cdots& \dst{\frac{\partial f_m}{\partial u_m}}\end{array}\right]\\ \end{array} \end{equation} \nonumber \] or \[ \mathbf{F}'=[\mathbf{F}_\mathbf{X},\mathbf{F}_\mathbf{U}], \nonumber \] where \(\mathbf{F}_\mathbf{X}\) is the submatrix to the left of the dashed line in and \(\mathbf{F}_\mathbf{U}\) is to the right.

For the linear transformation \eqref{eq:6.4.2}, \(\mathbf{F}_\mathbf{X}=\mathbf{A}\) and \(\mathbf{F}_\mathbf{U}=\mathbf{B}\), and we have seen that the system \(\mathbf{F}(\mathbf{X},\mathbf{U})=\mathbf{0}\) defines \(\mathbf{U}\) as a function of \(\mathbf{X}\) for all \(\mathbf{X}\) in \(\R^n\) if \(\mathbf{F}_\mathbf{U}\) is nonsingular. The next theorem shows that a related result holds for more general transformations.

Define \(\boldsymbol{\Phi}:\R^{n+m}\to \R^{n+m}\) by \[\begin{equation} \label{eq:6.4.8} \boldsymbol{\Phi}(\mathbf{X},\mathbf{U})=\left[\begin{array}{c} x_1\\ x_2\\\vdots\\ x_n\\ f_1(\mathbf{X},\mathbf{U})\\ [3\jot] f_2(\mathbf{X},\mathbf{U})\\\vdots\\ f_m(\mathbf{X},\mathbf{U})\end{array} \right] \end{equation} \nonumber \] or, in ``horizontal'' notation, by \[\begin{equation} \label{eq:6.4.9} \boldsymbol{\Phi}(\mathbf{X},\mathbf{U})=(\mathbf{X},\mathbf{F}(\mathbf{X},\mathbf{U})). \end{equation} \nonumber \] Then \(\boldsymbol{\Phi}\) is continuously differentiable on \(S\) and, since \(\mathbf{F}(\mathbf{X}_0,\mathbf{U}_0)=\mathbf{0}\), \[\begin{equation} \label{eq:6.4.10} \boldsymbol{\Phi}(\mathbf{X}_0,\mathbf{U}_0)=(\mathbf{X}_0,\mathbf{0}). \end{equation} \nonumber \] The differential matrix of \(\boldsymbol{\Phi}\) is \[ \boldsymbol{\Phi}'=\left[\begin{array}{cccccccc} 1&0&\cdots&0&0&0&\cdots&0\\ [3\jot] 0&1&\cdots&0&0&0&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&1&0&0&\cdots&0\\ [3\jot] \dst{\frac{\partial f_1}{\partial x_1}}& \dst{\frac{\partial f_1}{\partial x_2}}&\cdots& \dst{\frac{\partial f_1}{\partial x_n}}& \dst{\frac{\partial f_1}{\partial u_1}}& \dst{\frac{\partial f_1}{\partial u_2}}&\cdots& \dst{\frac{\partial f_1}{\partial u_m}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial x_1}}& \dst{\frac{\partial f_2}{\partial x_2}}&\cdots& \dst{\frac{\partial f_2}{\partial x_n}}& \dst{\frac{\partial f_2}{\partial u_1}}& \dst{\frac{\partial f_2}{\partial u_2}}&\cdots& \dst{\frac{\partial f_2}{\partial u_m}}\\ [3\jot] \vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\ [3\jot] \dst{\frac{\partial f_m}{\partial x_1}}& \dst{\frac{\partial f_m}{\partial x_2}}&\cdots& \dst{\frac{\partial f_m}{\partial x_n}}& \dst{\frac{\partial f_m}{\partial u_1}}& \dst{\frac{\partial f_m}{\partial u_2}}&\cdots& \dst{\frac{\partial f_m}{\partial u_m}}\end{array}\right]= \left[\begin{array}{cc}\mathbf{I}&\mathbf{0}\\\mathbf{F}_\mathbf{X}&\mathbf{F}_\mathbf{U} \end{array}\right], \nonumber \]

where \(\mathbf{I}\) is the \(n\times n\) identity matrix, \(\mathbf{0}\) is the \(n\times m\) matrix with all zero entries, and \(\mathbf{F}_\mathbf{X}\) and \(\mathbf{F}_\mathbf{U}\) are as in \eqref{eq:6.4.5}. By expanding \(\det(\boldsymbol{\Phi}')\) and the determinants that evolve from it in terms of the cofactors of their first rows, it can be shown in \(n\) steps that \[ J\boldsymbol{\Phi}=\det(\boldsymbol{\Phi}')=\left|\begin{array}{cccc} \dst{\frac{\partial f_1}{\partial u_1}}& \dst{\frac{\partial f_1}{\partial u_2}}&\cdots& \dst{\frac{\partial f_1}{\partial u_m}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial u_1}}& \dst{\frac{\partial f_2}{\partial u_2}}&\cdots& \dst{\frac{\partial f_2}{\partial u_m}}\\ [3\jot] \vdots&\vdots&\ddots&\vdots\\ \dst{\frac{\partial f_m}{\partial u_1}}& \dst{\frac{\partial f_m}{\partial u_2}}&\cdots& \dst{\frac{\partial f_m}{\partial u_m}}\end{array}\right|= \det(\mathbf{F}_\mathbf{U}). \nonumber \] In particular, \[ J\boldsymbol{\Phi}(\mathbf{X}_0,\mathbf{U}_0)=\det\left(\mathbf{F}_\mathbf{U} (\mathbf{X}_0,\mathbf{U}_{0})\right)\ne0. \nonumber \] Since \(\boldsymbol{\Phi}\) is continuously differentiable on \(S\), Corollary~ implies that \(\boldsymbol{\Phi}\) is regular on some open neighborhood \(M\) of \((\mathbf{X}_0,\mathbf{U}_0)\) and that \(\widehat{M}=\boldsymbol{\Phi}(M)\) is open.
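To see how this determinant reduction works in the simplest nontrivial case, take \(n=2\) and \(m=1\); expanding along the first row at each step gives \[ J\boldsymbol{\Phi}=\left|\begin{array}{ccc} 1&0&0\\ 0&1&0\\ \dst{\frac{\partial f_1}{\partial x_1}}&\dst{\frac{\partial f_1}{\partial x_2}}&\dst{\frac{\partial f_1}{\partial u_1}}\end{array}\right| =\left|\begin{array}{cc} 1&0\\ \dst{\frac{\partial f_1}{\partial x_2}}&\dst{\frac{\partial f_1}{\partial u_1}}\end{array}\right| =\frac{\partial f_1}{\partial u_1}=\det(\mathbf{F}_\mathbf{U}). \nonumber \]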

Because of the form of \(\boldsymbol{\Phi}\) (see \eqref{eq:6.4.8} or \eqref{eq:6.4.9}), we can write points of \(\widehat{M}\) as \((\mathbf{X},\mathbf{V})\), where \(\mathbf{V}\in \R^m\). Corollary~ also implies that \(\boldsymbol{\Phi}\) has a continuously differentiable inverse \(\boldsymbol{\Gamma}(\mathbf{X},\mathbf{V})\) defined on \(\widehat{M}\) with values in \(M\). Since \(\boldsymbol{\Phi}\) leaves the ``\(\mathbf{X}\) part'' of \((\mathbf{X},\mathbf{U})\) fixed, a local inverse of \(\boldsymbol{\Phi}\) must also have this property. Therefore, \(\boldsymbol{\Gamma}\) must have the form \[ \boldsymbol{\Gamma}(\mathbf{X},\mathbf{V})=\left[\begin{array}{c} x_1\\ x_2\\\vdots\\ x_n\\[3\jot] h_1(\mathbf{X},\mathbf{V})\\[3\jot] h_2(\mathbf{X},\mathbf{V})\\ \vdots\\ [3\jot] h_m(\mathbf{X},\mathbf{V})\end{array}\right] \nonumber \] or, in ``horizontal'' notation, \[ \boldsymbol{\Gamma}(\mathbf{X},\mathbf{V})=(\mathbf{X},\mathbf{H}(\mathbf{X},\mathbf{V})), \nonumber \] where \(\mathbf{H}:\R^{n+m}\to \R^m\) is continuously differentiable on \(\widehat{M}\). We will show that \(\mathbf{G}(\mathbf{X})=\mathbf{H}(\mathbf{X},\mathbf{0})\) has the stated properties.

From \eqref{eq:6.4.10}, \((\mathbf{X}_0,\mathbf{0})\in\widehat{M}\) and, since \(\widehat{M}\) is open, there is a neighborhood \(N\) of \(\mathbf{X}_0\) in \(\R^n\) such that \((\mathbf{X},\mathbf{0})\in\widehat{M}\) if \(\mathbf{X}\in N\) (Exercise~). Therefore, \((\mathbf{X},\mathbf{G}(\mathbf{X})) =\boldsymbol{\Gamma}(\mathbf{X},\mathbf{0})\in M\) if \(\mathbf{X}\in N\). Since \(\boldsymbol{\Gamma}=\boldsymbol{\Phi}^{-1}\), \((\mathbf{X},\mathbf{0}) =\boldsymbol{\Phi}(\mathbf{X},\mathbf{G}(\mathbf{X}))\). Setting \(\mathbf{X}=\mathbf{X}_0\) and recalling \eqref{eq:6.4.10} shows that \(\mathbf{G}(\mathbf{X}_0)=\mathbf{U}_0\), since \(\boldsymbol{\Phi}\) is one-to-one on \(M\).

    Henceforth we assume that \(\mathbf{X}\in N\). Now, \[ \begin{array}{rcll} (\mathbf{X},\mathbf{0})\ar= \boldsymbol{\Phi}(\boldsymbol{\Gamma}(\mathbf{X},\mathbf{0})) &\mbox{ (since $\boldsymbol{\Phi}=\boldsymbol{\Gamma}^{-1})$}\\ \ar=\boldsymbol{\Phi}(\mathbf{X},\mathbf{G}(\mathbf{X}))&\mbox{ (since $\boldsymbol{\Gamma}(\mathbf{X},\mathbf{0})=(\mathbf{X},\mathbf{G}(\mathbf{X}))$)}\\ \ar=(\mathbf{X},\mathbf{F}(\mathbf{X},\mathbf{G}(\mathbf{X})))&\mbox{ (since $\boldsymbol{\Phi}(\mathbf{X},\mathbf{U})= (\mathbf{X},\mathbf{F}(\mathbf{X},\mathbf{U} ))$)}. \end{array} \nonumber \] Therefore, \(\mathbf{F}(\mathbf{X},\mathbf{G}(\mathbf{X}))=\mathbf{0}\); that is, \(\mathbf{G}\) satisfies . To see that \(\mathbf{G}\) is unique, suppose that \(\mathbf{G}_1:\R^n\to \R^m\) also satisfies . Then \[ \boldsymbol{\Phi}(\mathbf{X},\mathbf{G}(\mathbf{X}))= (\mathbf{X},\mathbf{F} (\mathbf{X},\mathbf{G}(\mathbf{X})))=(\mathbf{X},\mathbf{0}) \nonumber \] and \[ \boldsymbol{\Phi}(\mathbf{X},\mathbf{G}_1(\mathbf{X}))=(\mathbf{X},\mathbf{F} (\mathbf{X},\mathbf{G}_1(\mathbf{X})))=(\mathbf{X},\mathbf{0}) \nonumber \] for all \(\mathbf{X}\) in \(N\). Since \(\boldsymbol{\Phi}\) is one-to-one on \(M\), this implies that \(\mathbf{G}(\mathbf{X})= \mathbf{G}_1(\mathbf{X})\).

    Since the partial derivatives \[ \frac{\partial h_i}{\partial x_j},\quad 1\le i\le m,\quad 1\le j\le n, \nonumber \] are continuous functions of \((\mathbf{X},\mathbf{V})\) on \(\widehat{M}\), they are continuous with respect to \(\mathbf{X}\) on the subset \(\set{(\mathbf{X},\mathbf{0})}{\mathbf{X} \in N}\) of \(\widehat{M}\). Therefore, \(\mathbf{G}\) is continuously differentiable on \(N\). To verify , we write \(\mathbf{F}(\mathbf{X},\mathbf{G}(\mathbf{X}))=\mathbf{0}\) in terms of components; thus, \[ f_i(x_1,x_2, \dots,x_n,g_1(\mathbf{X}),g_2(\mathbf{X}), \dots,g_m(\mathbf{X})) =0,\quad 1\le i\le m,\quad\mathbf{X}\in N. \nonumber \] Since \(f_i\) and \(g_1\), \(g_2\), , \(g_m\) are continuously differentiable on their respective domains, the chain rule (Theorem~) implies that \[\begin{equation} \label{eq:6.4.11} \frac{\partial f_i(\mathbf{X},\mathbf{G}(\mathbf{X}))}{\partial x_j}+ \sum^m_{r=1} \frac{\partial f_i(\mathbf{X},\mathbf{G}(\mathbf{X}))}{\partial u_r} \frac{\partial g_r(\mathbf{X}) }{\partial x_j}=0,\quad 1\le i\le m,\ 1\le j\le n, \end{equation} \nonumber \] or, in matrix form, \[\begin{equation} \label{eq:6.4.12} \mathbf{F}_\mathbf{X}(\mathbf{X},\mathbf{G}(\mathbf{X}))+\mathbf{F}_\mathbf{U} (\mathbf{X},\mathbf{G}(\mathbf{X}))\mathbf{G}'(\mathbf{X})=\mathbf{0}. \end{equation} \nonumber \] Since \((\mathbf{X},\mathbf{G}(\mathbf{X}))\in M\) for all \(\mathbf{X}\) in \(N\) and \(\mathbf{F}_\mathbf{U}(\mathbf{X},\mathbf{U})\) is nonsingular when \((\mathbf{X},\mathbf{U})\in M\), we can multiply on the left by \(\mathbf{F}^{-1}_\mathbf{U}(\mathbf{X},\mathbf{G}(\mathbf{X}))\) to obtain . This completes the proof.

In Theorem~ we denoted the implicitly defined transformation by \(\mathbf{G}\) for reasons of clarity in the proof. However, in applying the theorem it is convenient to denote the transformation more informally by \(\mathbf{U}=\mathbf{U}(\mathbf{X})\); thus, \(\mathbf{U}(\mathbf{X}_0)=\mathbf{U}_0\), and we replace and by \[ (\mathbf{X},\mathbf{U}(\mathbf{X}))\in M\mbox{\quad and \quad} \mathbf{F}(\mathbf{X},\mathbf{U}(\mathbf{X}))=\mathbf{0}\mbox{\quad if}\quad\mathbf{X}\in N, \nonumber \] and \[ \mathbf{U}'(\mathbf{X})=-[\mathbf{F}_\mathbf{U}(\mathbf{X},\mathbf{U}(\mathbf{X}))]^{-1} \mathbf{F}_\mathbf{X}(\mathbf{X},\mathbf{U}(\mathbf{X})),\quad \mathbf{X}\in N, \nonumber \]

while \eqref{eq:6.4.11} becomes \[\begin{equation} \label{eq:6.4.13} \frac{\partial f_i}{\partial x_j}+ \sum^m_{r=1} \frac{\partial f_i}{\partial u_r} \frac{\partial u_r }{\partial x_j}=0,\quad 1\le i\le m,\ 1\le j\le n, \end{equation} \nonumber \] it being understood that the partial derivatives of \(u_r\) and \(f_i\) are evaluated at \(\mathbf{X}\) and \((\mathbf{X},\mathbf{U}(\mathbf{X}))\), respectively.

    The following corollary is the implicit function theorem for \(m=1\).


    It is not necessary to memorize formulas like and . Since we know that \(f\) and \(u\) are differentiable, we can obtain and by applying the chain rule to the identity \[ f(x,y,u(x,y))=0. \nonumber \]
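Explicitly, differentiating this identity with respect to \(x\) and then with respect to \(y\) gives \[ f_x+f_uu_x=0 \quad\mbox{and}\quad f_y+f_uu_y=0, \nonumber \] so, wherever \(f_u\ne0\), \[ u_x=-\frac{f_x}{ f_u},\quad u_y=-\frac{f_y}{ f_u}. \nonumber \]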

If we try to solve for \(u\), we see very clearly that Theorem~ and Corollary~ are existence theorems; that is, they tell us that there is a function \(u=u(x,y)\) that satisfies , but not how to find it. In this case there is no convenient formula for the function, although its partial derivatives can be expressed conveniently in terms of \(x\), \(y\), and \(u(x,y)\): \[ u_x(x,y)=-\frac{f_x(x,y,u(x,y))}{ f_u(x,y,u(x,y))},\quad u_y(x,y)=-\frac{f_y(x,y,u(x,y))}{ f_u(x,y,u(x,y))}. \nonumber \] In particular, since \(u(1,-1)=1\), \[ u_x(1,-1)=-\frac{0}{-7}=0,\quad u_y(1,-1)=-\frac{4}{-7}=\frac{4}{7}. \nonumber \]

Again, it is not necessary to memorize such formulas, since the partial derivatives of an implicitly defined function can be obtained from the chain rule and Cramer's rule, as in the next example.

It is convenient to extend the notation introduced in Section 6.2 for the Jacobian of a transformation \(\mathbf{F}:\R^m\to \R^m\). If \(f_1\), \(f_2\), \(\dots\), \(f_m\) are real-valued functions of \(k\) variables, \(k\ge m\), and \(\xi_1\), \(\xi_2\), \(\dots\), \(\xi_m\) are any \(m\) of the variables, then we call the determinant \[ \left|\begin{array}{cccc} \dst{\frac{\partial f_1}{\partial\xi_1}}& \dst{\frac{\partial f_1}{\partial\xi_2}}&\cdots& \dst{\frac{\partial f_1}{\partial\xi_m}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial\xi_1}}& \dst{\frac{\partial f_2}{\partial\xi_2}}&\cdots& \dst{\frac{\partial f_2}{\partial\xi_m}}\\ [3\jot] \vdots&\vdots&\ddots&\vdots\\ [3\jot] \dst{\frac{\partial f_m}{\partial\xi_1}}& \dst{\frac{\partial f_m}{\partial\xi_2}}&\cdots& \dst{\frac{\partial f_m}{\partial\xi_m}}\end{array}\right|, \nonumber \] the Jacobian of \(f_1\), \(f_2\), \(\dots\), \(f_m\) with respect to \(\xi_1\), \(\xi_2\), \(\dots\), \(\xi_m\). We denote this Jacobian by \[ \frac{\partial (f_1,f_2, \dots,f_m)}{\partial(\xi_1,\xi_2, \dots,\xi_m)}, \nonumber \] and we denote the value of the Jacobian at a point \(\mathbf{P}\) by \[ \frac{\partial(f_1,f_2, \dots,f_m)}{\partial (\xi_1,\xi_2, \dots,\xi_m)}\Bigg|_\mathbf{P}. \nonumber \]

The requirement in Theorem~ that \(\mathbf{F}_\mathbf{U}(\mathbf{X}_0,\mathbf{U}_0)\) be nonsingular is equivalent to \[ \frac{\partial(f_1,f_2, \dots, f_m)}{\partial (u_1,u_2, \dots,u_m)} \Bigg|_{(\mathbf{X}_0,\mathbf{U}_0)}\ne0. \nonumber \] If this is so then, for a fixed \(j\), Cramer's rule allows us to write the solution of \eqref{eq:6.4.13} as \[ \frac{\partial u_i}{\partial x_j}=- \frac{\dst{\frac{\partial(f_1,f_2, \dots,f_i, \dots,f_m)}{\partial ( u_1, u_2, \dots, x_j, \dots, u_m)}}}{ \dst{\frac{\partial(f_1,f_2, \dots,f_i, \dots,f_m)}{ \partial( u_1, u_2, \dots, u_i, \dots, u_m)}}},\quad 1\le i\le m. \nonumber \] Notice that the determinant in the numerator on the right is obtained by replacing the \(i\)th column of the determinant in the denominator, which is \[ \left[\begin{array}{c}\dst{\frac{\partial f_1}{\partial u_i}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial u_i}}\\\vdots\\ \dst{\frac{\partial f_m}{\partial u_i}}\end{array}\right], \mbox{\quad by\quad}\left[\begin{array}{c}\dst{\frac{\partial f_1}{ \partial x_j}}\\ [3\jot] \dst{\frac{\partial f_2}{\partial x_j}}\\\vdots\\ \dst{\frac{\partial f_m}{\partial x_j}}\end{array}\right]. \nonumber \]
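For instance, if \(m=2\) and the unknowns are \(u\) and \(v\), so that the system is \(f(\mathbf{X},u,v)=0\), \(g(\mathbf{X},u,v)=0\), then differentiating with respect to \(x_j\) gives the linear system \[ f_{x_j}+f_uu_{x_j}+f_vv_{x_j}=0,\quad g_{x_j}+g_uu_{x_j}+g_vv_{x_j}=0, \nonumber \] and Cramer's rule yields \[ \frac{\partial u}{\partial x_j}=-\frac{\dst{\frac{\partial(f,g)}{\partial(x_j,v)}}}{\dst{\frac{\partial(f,g)}{\partial(u,v)}}},\quad \frac{\partial v}{\partial x_j}=-\frac{\dst{\frac{\partial(f,g)}{\partial(u,x_j)}}}{\dst{\frac{\partial(f,g)}{\partial(u,v)}}}, \nonumber \] provided that \(\dst{\frac{\partial(f,g)}{\partial(u,v)}}\ne0\).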

So far we have considered only the problem of solving a continuously differentiable system \[\begin{equation} \label{eq:6.4.21} \mathbf{F}(\mathbf{X},\mathbf{U})=\mathbf{0}\quad(\mathbf{F}:\R^{n+m}\to \R^m) \end{equation} \nonumber \] for the last \(m\) variables, \(u_1\), \(u_2\), \(\dots\), \(u_m\), in terms of the first \(n\), \(x_1\), \(x_2\), \(\dots\), \(x_n\). This was merely for convenience; \eqref{eq:6.4.21} can be solved near \((\mathbf{X}_0,\mathbf{U}_0)\) for any \(m\) of the variables in terms of the other \(n\), provided only that the Jacobian of \(f_1\), \(f_2\), \(\dots\), \(f_m\) with respect to the chosen \(m\) variables is nonzero at \((\mathbf{X}_0,\mathbf{U}_0)\). This can be seen by renaming the variables and applying Theorem~.

We close this section by observing that the functions \(u_1\), \(u_2\), \(\dots\), \(u_m\) defined in Theorem~ have higher derivatives if \(f_1,f_2, \dots,f_m\) do, and they may be obtained by repeated differentiation, using the chain rule (Exercise~).

