# 8.5: Inverse and Implicit Function Theorem

- Jiří Lebl
- Associate Professor (Mathematics) at Oklahoma State University


Inverse and implicit function theorem
Note: FIXME lectures
To prove the inverse function theorem we use the contraction mapping principle we have seen in FIXME and that we have used to prove Picard’s theorem. Recall that a mapping \(f \colon X \to X'\) between two metric spaces \((X,d)\) and \((X',d')\) is called a contraction if there exists a \(k < 1\) such that \[d'\bigl(f(x),f(y)\bigr) \leq k d(x,y) \ \ \ \ \text{for all } x,y \in X.\] The contraction mapping principle says that if \(f \colon X \to X\) is a contraction and \(X\) is a nonempty complete metric space, then there exists a unique fixed point, that is, there exists a unique \(x \in X\) such that \(f(x) = x\).
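The contraction mapping principle is constructive: iterating the map from any starting point converges to the fixed point. The following is a minimal sketch of that iteration, assuming the illustrative choice \(f = \cos\) on \(X = [0,1]\) (not an example from the text); the helper name `fixed_point` is hypothetical.

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = f(x_n) until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# Assumed example: cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin(1) < 1
# there, so it is a contraction on the complete metric space [0, 1].
x_star = fixed_point(math.cos, 0.5)
print(x_star, math.cos(x_star))
```

The two printed numbers should agree to within the tolerance, since \(x\) is a fixed point exactly when \(\cos(x) = x\).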
Intuitively if a function is differentiable, then it locally “behaves like” the derivative (which is a linear function). The idea of the inverse function theorem is that if a function is differentiable and the derivative is invertible, the function is (locally) invertible.
Let \(U \subset {\mathbb{R}}^n\) be an open set and let \(f \colon U \to {\mathbb{R}}^n\) be a continuously differentiable function. Also suppose \(x_0 \in U\), \(f(x_0) = y_0\), and \(f'(x_0)\) is invertible (that is, \(J_f(x_0) \not=0\)). Then there exist open sets \(V, W \subset {\mathbb{R}}^n\) such that \(x_0 \in V \subset U\), \(f(V) = W\) and \(f|_V\) is one-to-one and onto. Furthermore, the inverse \(g(y) = (f|_V)^{-1}(y)\) is continuously differentiable and \[g'(y) = {\bigl(f'(x)\bigr)}^{-1}, \qquad \text{ for all $x \in V$, $y = f(x)$.}\]
Write \(A = f'(x_0)\). As \(f'\) is continuous, there exists an open ball \(V\) around \(x_0\) such that \[\left\lVert {A-f'(x)} \right\rVert < \frac{1}{2\left\lVert {A^{-1}} \right\rVert} \qquad \text{for all $x \in V$.}\] Note that \(f'(x)\) is invertible for all \(x \in V\).
Given \(y \in {\mathbb{R}}^n\) we define \(\varphi_y \colon V \to {\mathbb{R}}^n\) by \[\varphi_y (x) = x + A^{-1}\bigl(y-f(x)\bigr) .\] As \(A^{-1}\) is one-to-one, \(\varphi_y(x) = x\) (\(x\) is a fixed point) if and only if \(y-f(x) = 0\), or in other words \(f(x)=y\). Using the chain rule we obtain \[\varphi_y'(x) = I - A^{-1} f'(x) = A^{-1} \bigl( A-f'(x) \bigr) .\] So for \(x \in V\) we have \[\left\lVert {\varphi_y'(x)} \right\rVert \leq \left\lVert {A^{-1}} \right\rVert \left\lVert {A-f'(x)} \right\rVert < \frac{1}{2} .\] As \(V\) is a ball it is convex, and hence \[\left\lVert {\varphi_y(x_1)-\varphi_y(x_2)} \right\rVert \leq \frac{1}{2} \left\lVert {x_1-x_2} \right\rVert \qquad \text{for all $x_1,x_2 \in V$}.\] In other words \(\varphi_y\) is a contraction defined on \(V\), though so far we do not know what the range of \(\varphi_y\) is. We cannot apply the fixed point theorem, but we can say that \(\varphi_y\) has at most one fixed point (note the proof of uniqueness in the contraction mapping principle). That is, there exists at most one \(x \in V\) such that \(f(x) = y\), and so \(f|_V\) is one-to-one.
Let \(W = f(V)\). We need to show that \(W\) is open. Take a \(y_1 \in W\), then there is a unique \(x_1 \in V\) such that \(f(x_1) = y_1\). Let \(r > 0\) be small enough such that the closed ball \(C(x_1,r) \subset V\) (such \(r > 0\) exists as \(V\) is open).
Suppose \(y\) is such that \[\left\lVert {y-y_1} \right\rVert < \frac{r}{2\left\lVert {A^{-1}} \right\rVert} .\] If we can show that \(y \in W\), then we have shown that \(W\) is open. Define \(\varphi_y(x) = x+A^{-1}\bigl(y-f(x)\bigr)\) as before. If \(x \in C(x_1,r)\), then \[\begin{split} \left\lVert {\varphi_y(x)-x_1} \right\rVert & \leq \left\lVert {\varphi_y(x)-\varphi_y(x_1)} \right\rVert + \left\lVert {\varphi_y(x_1)-x_1} \right\rVert \\ & \leq \frac{1}{2}\left\lVert {x-x_1} \right\rVert + \left\lVert {A^{-1}(y-y_1)} \right\rVert \\ & \leq \frac{1}{2}r + \left\lVert {A^{-1}} \right\rVert\left\lVert {y-y_1} \right\rVert \\ & < \frac{1}{2}r + \left\lVert {A^{-1}} \right\rVert \frac{r}{2\left\lVert {A^{-1}} \right\rVert} = r . \end{split}\] So \(\varphi_y\) takes \(C(x_1,r)\) into \(B(x_1,r) \subset C(x_1,r)\). It is a contraction on \(C(x_1,r)\) and \(C(x_1,r)\) is complete (a closed subset of \({\mathbb{R}}^n\) is complete). Apply the contraction mapping principle to obtain a fixed point \(x\), i.e. \(\varphi_y(x) = x\). That is, \(f(x) = y\). So \(y \in f\bigl(C(x_1,r)\bigr) \subset f(V) = W\). Therefore \(W\) is open.
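The fixed point of \(\varphi_y\) can also be found numerically by simply iterating \(\varphi_y\). The sketch below does this for an assumed example map \(f(x,y) = (x^2-y^2,2xy)\) near \(x_0 = (1,0)\), where \(A = f'(x_0) = 2I\); the function names and the target value are hypothetical choices for illustration.

```python
def f(p):
    """Assumed example map f(x, y) = (x^2 - y^2, 2xy)."""
    x, y = p
    return (x * x - y * y, 2.0 * x * y)

def apply_A_inv(v):
    # At x0 = (1, 0) the derivative is A = [[2, 0], [0, 2]], so A^{-1} = (1/2) I.
    return (0.5 * v[0], 0.5 * v[1])

def solve_near_x0(y, x_start=(1.0, 0.0), tol=1e-13, max_iter=200):
    """Iterate phi_y(x) = x + A^{-1}(y - f(x)) to find x with f(x) = y."""
    x = x_start
    for _ in range(max_iter):
        fx = f(x)
        step = apply_A_inv((y[0] - fx[0], y[1] - fx[1]))
        x = (x[0] + step[0], x[1] + step[1])
        if max(abs(step[0]), abs(step[1])) < tol:
            return x
    raise RuntimeError("iteration did not converge; y may be too far from f(x0)")

target = (1.1, 0.1)          # a point near f(x0) = (1, 0)
x = solve_near_x0(target)
print(x, f(x))
```

The iteration uses only the fixed matrix \(A^{-1}\), not \(f'(x)^{-1}\) at each step, which is exactly the map used in the proof; it converges only for \(y\) close enough to \(f(x_0)\).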
Next we need to show that \(g\) is continuously differentiable and compute its derivative. First let us show that it is differentiable. Let \(y \in W\) and \(k \in {\mathbb{R}}^n\), \(k\not= 0\), such that \(y+k \in W\). Then there are unique \(x \in V\) and \(h \in {\mathbb{R}}^n\), \(h \not= 0\) and \(x+h \in V\), such that \(f(x) = y\) and \(f(x+h) = y+k\) as \(f|_V\) is a one-to-one and onto mapping of \(V\) onto \(W\). In other words, \(g(y) = x\) and \(g(y+k) = x+h\). We can still squeeze some information from the fact that \(\varphi_y\) is a contraction. \[\varphi_y(x+h)-\varphi_y(x) = h + A^{-1} \bigl( f(x)-f(x+h) \bigr) = h - A^{-1} k .\] So \[\left\lVert {h-A^{-1}k} \right\rVert = \left\lVert {\varphi_y(x+h)-\varphi_y(x)} \right\rVert \leq \frac{1}{2}\left\lVert {x+h-x} \right\rVert = \frac{\left\lVert {h} \right\rVert}{2}.\] By the reverse triangle inequality \(\left\lVert {h} \right\rVert - \left\lVert {A^{-1}k} \right\rVert \leq \frac{1}{2}\left\lVert {h} \right\rVert\) so \[\left\lVert {h} \right\rVert \leq 2 \left\lVert {A^{-1}k} \right\rVert \leq 2 \left\lVert {A^{-1}} \right\rVert \left\lVert {k} \right\rVert.\] In particular as \(k\) goes to 0, so does \(h\).
As \(x \in V\), then \(f'(x)\) is invertible. Let \(B = \bigl(f'(x)\bigr)^{-1}\), which is what we think the derivative of \(g\) at \(y\) is. Then \[\begin{split} \frac{\left\lVert {g(y+k)-g(y)-Bk} \right\rVert}{\left\lVert {k} \right\rVert} & = \frac{\left\lVert {h-Bk} \right\rVert}{\left\lVert {k} \right\rVert} \\ & = \frac{\left\lVert {h-B\bigl(f(x+h)-f(x)\bigr)} \right\rVert}{\left\lVert {k} \right\rVert} \\ & = \frac{\left\lVert {B\bigl(f(x+h)-f(x)-f'(x)h\bigr)} \right\rVert}{\left\lVert {k} \right\rVert} \\ & \leq \left\lVert {B} \right\rVert \frac{\left\lVert {h} \right\rVert}{\left\lVert {k} \right\rVert}\, \frac{\left\lVert {f(x+h)-f(x)-f'(x)h} \right\rVert}{\left\lVert {h} \right\rVert} \\ & \leq 2\left\lVert {B} \right\rVert\left\lVert {A^{-1}} \right\rVert \frac{\left\lVert {f(x+h)-f(x)-f'(x)h} \right\rVert}{\left\lVert {h} \right\rVert} . \end{split}\] As \(k\) goes to 0, so does \(h\). So the right hand side goes to 0 as \(f\) is differentiable, and hence the left hand side also goes to 0. And \(B\) is precisely what we wanted \(g'(y)\) to be.
We have shown that \(g\) is differentiable; let us show it is \(C^1(W)\). Now, \(g \colon W \to V\) is continuous (it is differentiable), \(f'\) is a continuous function from \(V\) to \(L({\mathbb{R}}^n)\), and \(X \mapsto X^{-1}\) is a continuous function on the invertible operators. So \(g'(y) = {\bigl( f'\bigl(g(y)\bigr)\bigr)}^{-1}\) is the composition of these three continuous functions and hence is continuous.
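The formula \(g'(y) = \bigl(f'(x)\bigr)^{-1}\) can be checked numerically in one dimension. The sketch below assumes the illustrative function \(f(x) = x^3 + x\) (not from the text), computes its inverse by bisection, and compares a central difference quotient of \(g\) with \(1/f'(x)\).

```python
def f(x):
    return x ** 3 + x          # assumed example; f'(x) = 3x^2 + 1 > 0

def f_prime(x):
    return 3.0 * x ** 2 + 1.0

def g(y, lo=-10.0, hi=10.0):
    """Inverse of f on [lo, hi], computed by bisection (f is increasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 1.5
y = f(x)
k = 1e-5
numeric = (g(y + k) - g(y - k)) / (2.0 * k)   # central difference for g'(y)
exact = 1.0 / f_prime(x)                       # (f'(x))^{-1}
print(numeric, exact)
```

The two values should agree up to the discretization error of the difference quotient.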
Suppose \(U \subset {\mathbb{R}}^n\) is open and \(f \colon U \to {\mathbb{R}}^n\) is a continuously differentiable mapping such that \(f'(x)\) is invertible for all \(x \in U\). Then given any open set \(V \subset U\), \(f(V)\) is open. (\(f\) is an open mapping).
Without loss of generality, suppose \(U=V\). For each point \(y \in f(V)\), we pick \(x \in f^{-1}(y)\) (there could be more than one such point), then by the inverse function theorem there is a neighbourhood of \(x\) in \(V\) that maps onto a neighbourhood of \(y\). Hence \(f(V)\) is open.
The theorem and the corollary are not true if \(f'(x)\) is not invertible for some \(x\). For example, the map \(f(x,y) = (x,xy)\) maps \({\mathbb{R}}^2\) onto the set \({\mathbb{R}}^2 \setminus \{ (0,y) : y \neq 0 \}\), which is neither open nor closed. In fact \(f^{-1}(0,0) = \{ (0,y) : y \in {\mathbb{R}}\}\). Note that this bad behaviour only occurs on the \(y\)-axis; everywhere else the function is locally invertible. In fact if we avoid the \(y\)-axis it is even one-to-one.
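To see where invertibility fails for \(f(x,y) = (x,xy)\), one can compute the derivative directly (a short check added for illustration): \[f'(x,y) = \begin{bmatrix} 1 & 0 \\ y & x \end{bmatrix}, \qquad \det f'(x,y) = x .\] So \(f'(x,y)\) fails to be invertible exactly when \(x = 0\), that is, on the \(y\)-axis, which is where the bad behaviour occurs.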
Also note that just because \(f'(x)\) is invertible everywhere does not mean that \(f\) is one-to-one globally. It is “locally” one-to-one but perhaps not “globally.” For an example, take the map \(f \colon {\mathbb{R}}^2 \setminus \{ 0 \} \to {\mathbb{R}}^2\) defined by \(f(x,y) = (x^2-y^2,2xy)\). It is left to the student to show that \(f\) is differentiable and the derivative is invertible at every point.
On the other hand, the mapping is 2-to-1 globally. For every \((a,b)\) that is not the origin, there are exactly two solutions to \(x^2-y^2=a\) and \(2xy=b\). We leave it to the student to show that there is at least one solution, and then notice that replacing \(x\) and \(y\) with \(-x\) and \(-y\) we obtain another solution.
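A quick numerical illustration of the 2-to-1 behaviour, as a sketch only: identifying \((x,y)\) with the complex number \(x+iy\), the map is \(z \mapsto z^2\), so the two solutions are the two complex square roots. The helper names and the sample point \((a,b) = (-3,4)\) are assumptions made for this example.

```python
import cmath

def f(x, y):
    return (x * x - y * y, 2.0 * x * y)

def jacobian_det(x, y):
    # f'(x, y) = [[2x, -2y], [2y, 2x]], so the determinant is 4(x^2 + y^2).
    return 4.0 * (x * x + y * y)

def preimages(a, b):
    """Both solutions of f(x, y) = (a, b), using f(x, y) = (x + iy)^2."""
    z = cmath.sqrt(complex(a, b))
    return [(z.real, z.imag), (-z.real, -z.imag)]

a, b = -3.0, 4.0
for (x, y) in preimages(a, b):
    # Each line shows a solution, its image under f, and the Jacobian determinant.
    print((x, y), f(x, y), jacobian_det(x, y))
```

Both printed points map to \((a,b)\), and the determinant is positive at each, consistent with \(f\) being locally invertible but not globally one-to-one.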
Also note that the invertibility of the derivative is not a necessary condition, just sufficient, for having a continuous inverse and being an open mapping. For example the function \(f(x) = x^3\) is an open mapping from \({\mathbb{R}}\) to \({\mathbb{R}}\) and is globally one-to-one with a continuous inverse, even though \(f'(0) = 0\) is not invertible.
Implicit function theorem
The inverse function theorem is really a special case of the implicit function theorem, which we prove next, although somewhat ironically we prove the implicit function theorem using the inverse function theorem. What we were showing in the inverse function theorem was that the equation \(x-f(y) = 0\) was solvable for \(y\) in terms of \(x\) if the derivative in terms of \(y\) was invertible, that is if \(f'(y)\) was invertible. That is, there was locally a function \(g\) such that \(x-f\bigl(g(x)\bigr) = 0\).
OK, so how about we look at the equation \(f(x,y) = 0\). Obviously this is not solvable for \(y\) in terms of \(x\) in every case. For example, when \(f(x,y)\) does not actually depend on \(y\). For a slightly more complicated example, notice that \(x^2+y^2-1 = 0\) defines the unit circle, and we can locally solve for \(y\) in terms of \(x\) when 1) we are near a point which lies on the unit circle and 2) when we are not at a point where the circle has a vertical tangency, or in other words where \(\frac{\partial f}{\partial y} = 0\).
To make things simple we fix some notation. We let \((x,y) \in {\mathbb{R}}^{n+m}\) denote the coordinates \((x^1,\ldots,x^n,y^1,\ldots,y^m)\). A linear transformation \(A \in L({\mathbb{R}}^{n+m},{\mathbb{R}}^m)\) can then always be written as \(A = [ A_x ~ A_y ]\) so that \(A(x,y) = A_x x + A_y y\), where \(A_x \in L({\mathbb{R}}^n,{\mathbb{R}}^m)\) and \(A_y \in L({\mathbb{R}}^m)\).
Let \(A = [A_x~A_y] \in L({\mathbb{R}}^{n+m},{\mathbb{R}}^m)\) and suppose \(A_y\) is invertible, then let \(B = - {(A_y)}^{-1} A_x\) and note that \[0 = A ( x, Bx) = A_x x + A_y Bx .\]
The proof is obvious. We simply solve and obtain \(y = Bx\). Let us therefore show that the same can be done for \(C^1\) functions.
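The linear case can be checked with a small computation. The sketch below assumes hypothetical data with \(n = 1\) and \(m = 2\), forms \(B = -{(A_y)}^{-1} A_x\), and verifies that \(A(x,Bx) = A_x x + A_y Bx\) vanishes; the helper functions are illustrative and written for this size only.

```python
def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("A_y is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Hypothetical data with n = 1 and m = 2: A = [A_x  A_y] maps R^3 to R^2.
A_x = [[1.0], [4.0]]
A_y = [[2.0, 1.0], [1.0, 3.0]]

A_y_inv = inverse_2x2(A_y)
# B = -(A_y)^{-1} A_x is a 2x1 matrix; with n = 1 it has a single column.
col = mat_vec(A_y_inv, [A_x[0][0], A_x[1][0]])
B = [[-col[0]], [-col[1]]]

x = [0.7]
y = mat_vec(B, x)                               # y = Bx
Ax = mat_vec(A_x, x)
Ay = mat_vec(A_y, y)
residual = [Ax[i] + Ay[i] for i in range(2)]    # A(x, Bx) = A_x x + A_y Bx
print(B, residual)
```

The residual should be zero up to rounding for any choice of \(x\), since \(y = Bx\) is the unique solution of \(A_x x + A_y y = 0\).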
Let \(U \subset {\mathbb{R}}^{n+m}\) be an open set and let \(f \colon U \to {\mathbb{R}}^m\) be a \(C^1(U)\) mapping. Let \((x_0,y_0) \in U\) be a point such that \(f(x_0,y_0) = 0\) and such that \[\frac{\partial(f^1,\ldots,f^m)}{\partial(y^1,\ldots,y^m)} (x_0,y_0) \neq 0 .\] Then there exists an open set \(W \subset {\mathbb{R}}^n\) with \(x_0 \in W\), an open set \(W' \subset {\mathbb{R}}^m\) with \(y_0 \in W'\), with \(W \times W' \subset U\), and a \(C^1(W)\) mapping \(g \colon W \to W'\), with \(g(x_0) = y_0\), and for all \(x \in W\), the point \(g(x)\) is the unique point in \(W'\) such that \[f\bigl(x,g(x)\bigr) = 0 .\] Furthermore, if \([ A_x ~ A_y ] = f'(x_0,y_0)\), then \[g'(x_0) = -{(A_y)}^{-1}A_x .\]
FIXME: and these are ALL the points where \(f\) vanishes near \(x_0,y_0\).
The condition \(\frac{\partial(f^1,\ldots,f^m)}{\partial(y^1,\ldots,y^m)} (x_0,y_0) = \det(A_y) \neq 0\) simply means that \(A_y\) is invertible.
Define \(F \colon U \to {\mathbb{R}}^{n+m}\) by \(F(x,y) := \bigl(x,f(x,y)\bigr)\). It is clear that \(F\) is \(C^1\), and we want to show that the derivative at \((x_0,y_0)\) is invertible.
Let us compute the derivative. We know that \[\frac{\left\lVert {f(x_0+h,y_0+k) - f(x_0,y_0) - A_x h - A_y k} \right\rVert}{\left\lVert {(h,k)} \right\rVert}\] goes to zero as \(\left\lVert {(h,k)} \right\rVert = \sqrt{\left\lVert {h} \right\rVert^2+\left\lVert {k} \right\rVert^2}\) goes to zero. But then so does \[\frac{\left\lVert {\bigl(h,f(x_0+h,y_0+k)-f(x_0,y_0)\bigr) - (h,A_x h+A_y k)} \right\rVert}{\left\lVert {(h,k)} \right\rVert} = \frac{\left\lVert {f(x_0+h,y_0+k) - f(x_0,y_0) - A_x h - A_y k} \right\rVert}{\left\lVert {(h,k)} \right\rVert} .\] So the derivative of \(F\) at \((x_0,y_0)\) takes \((h,k)\) to \((h,A_x h+A_y k)\). If \((h,A_x h+A_y k) = (0,0)\), then \(h=0\), and so \(A_y k = 0\). As \(A_y\) is one-to-one, then \(k=0\). Therefore \(F'(x_0,y_0)\) is one-to-one or in other words invertible and we can apply the inverse function theorem.
That is, there exists some open set \(V \subset {\mathbb{R}}^{n+m}\) with \((x_0,0) \in V\), and an inverse mapping \(G \colon V \to {\mathbb{R}}^{n+m}\), that is \(F\bigl(G(x,s)\bigr) = (x,s)\) for all \((x,s) \in V\) (where \(x \in {\mathbb{R}}^n\) and \(s \in {\mathbb{R}}^m\)). Write \(G = (G_1,G_2)\) (the first \(n\) and the second \(m\) components of \(G\)). Then \[F\bigl(G_1(x,s),G_2(x,s)\bigr) = \bigl(G_1(x,s),f(G_1(x,s),G_2(x,s))\bigr) = (x,s) .\] So \(x = G_1(x,s)\) and \(f\bigl(G_1(x,s),G_2(x,s)\bigr) = f\bigl(x,G_2(x,s)\bigr) = s\). Plugging in \(s=0\) we obtain \[f\bigl(x,G_2(x,0)\bigr) = 0 .\] The set \(G(V)\) contains a whole neighbourhood of the point \((x_0,y_0)\), and therefore there exist some open sets \(\tilde{W}\) and \(W'\) such that \(\tilde{W} \times W' \subset G(V)\) with \(x_0 \in \tilde{W}\) and \(y_0 \in W'\). Then take \(W = \{ x \in \tilde{W} : G_2(x,0) \in W' \}\). The function that takes \(x\) to \(G_2(x,0)\) is continuous and therefore \(W\) is open. We define \(g \colon W \to {\mathbb{R}}^m\) by \(g(x) := G_2(x,0)\), which is the \(g\) in the theorem. The fact that \(g(x)\) is the unique point in \(W'\) follows because \(W \times W' \subset G(V)\) and \(G\) is one-to-one and onto \(G(V)\).
Next differentiate \[x\mapsto f\bigl(x,g(x)\bigr) ,\] at \(x_0\), which should be the zero map. The derivative is done in the same way as above. We get that for all \(h \in {\mathbb{R}}^{n}\) \[0 = A\bigl(h,g'(x_0)h\bigr) = A_xh + A_yg'(x_0)h ,\] and we obtain the desired derivative for \(g\) as well.
In other words, in the context of the theorem we have \(m\) equations in \(n+m\) unknowns. \[\begin{aligned} & f^1 (x^1,\ldots,x^n,y^1,\ldots,y^m) = 0 \\ & \qquad \qquad \qquad \vdots \\ & f^m (x^1,\ldots,x^n,y^1,\ldots,y^m) = 0\end{aligned}\] And the condition guaranteeing a solution is that this is a \(C^1\) mapping (that all the components are \(C^1\), or in other words all the partial derivatives exist and are continuous), and the matrix \[\begin{bmatrix} \frac{\partial f^1}{\partial y^1} & \ldots & \frac{\partial f^1}{\partial y^m} \\ \vdots & \ddots & \vdots \\ \frac{\partial f^m}{\partial y^1} & \ldots & \frac{\partial f^m}{\partial y^m} \end{bmatrix}\] is invertible at \((x_0,y_0)\).
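As a simple check of the derivative formula, consider the unit circle \(f(x,y) = x^2+y^2-1 = 0\) at a point \((x_0,y_0)\) with \(y_0 > 0\) (a worked illustration with \(n=m=1\)). Here \[A_x = \frac{\partial f}{\partial x}(x_0,y_0) = 2x_0, \qquad A_y = \frac{\partial f}{\partial y}(x_0,y_0) = 2y_0 \neq 0 ,\] so the theorem gives \(g'(x_0) = -{(A_y)}^{-1}A_x = -\frac{x_0}{y_0}\). In this case we can solve explicitly, \(g(x) = \sqrt{1-x^2}\), and indeed \[g'(x_0) = \frac{-x_0}{\sqrt{1-x_0^2}} = -\frac{x_0}{y_0} .\]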
Consider the set \(x^2+y^2-{(z+1)}^3 = -1\), \(e^x+e^y+e^z = 3\) near the point \((0,0,0)\). The function we are looking at is \[f(x,y,z) = (x^2+y^2-{(z+1)}^3+1,e^x+e^y+e^z-3) .\] We find that \[Df = \begin{bmatrix} 2x & 2y & -3{(z+1)}^2 \\ e^x & e^y & e^z \end{bmatrix} .\] The matrix of partial derivatives with respect to \(y\) and \(z\) at the origin, \[\begin{bmatrix} 2(0) & -3{(0+1)}^2 \\ e^0 & e^0 \end{bmatrix} = \begin{bmatrix} 0 & -3 \\ 1 & 1 \end{bmatrix} ,\] is invertible. Hence near \((0,0,0)\) we can find \(y\) and \(z\) as \(C^1\) functions of \(x\) such that for \(x\) near 0 we have \[x^2+y(x)^2-{(z(x)+1)}^3 = -1, \qquad e^x+e^{y(x)}+e^{z(x)} = 3 .\] The theorem does not tell us how to find \(y(x)\) and \(z(x)\) explicitly, it just tells us they exist. In other words, near the origin the set of solutions is a smooth curve in \({\mathbb{R}}^3\) that goes through the origin.
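Although the theorem gives no formula for \(y(x)\) and \(z(x)\), they can be approximated numerically. The sketch below uses Newton's method in \((y,z)\) for fixed \(x\) (a technique swapped in for illustration, not the method of the proof) and estimates the derivatives at \(x=0\) by a central difference. For comparison, the formula \(g'(x_0) = -{(A_y)}^{-1}A_x\) with \(A_x = \left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]\) gives \(y'(0) = -1\) and \(z'(0) = 0\). The function names are hypothetical.

```python
import math

def F(x, y, z):
    """The two equations from the example, written as F = 0."""
    return (x * x + y * y - (z + 1.0) ** 3 + 1.0,
            math.exp(x) + math.exp(y) + math.exp(z) - 3.0)

def solve_yz(x, tol=1e-13, max_iter=50):
    """Newton's method in (y, z) for fixed x, started at (0, 0)."""
    y, z = 0.0, 0.0
    for _ in range(max_iter):
        r1, r2 = F(x, y, z)
        # Matrix of partial derivatives with respect to (y, z): [[a, b], [c, d]].
        a, b = 2.0 * y, -3.0 * (z + 1.0) ** 2
        c, d = math.exp(y), math.exp(z)
        det = a * d - b * c
        # Newton step: solve [[a, b], [c, d]] (dy, dz) = -(r1, r2).
        dy = -(d * r1 - b * r2) / det
        dz = -(-c * r1 + a * r2) / det
        y, z = y + dy, z + dz
        if max(abs(dy), abs(dz)) < tol:
            return y, z
    raise RuntimeError("Newton iteration did not converge")

h = 1e-3
y_plus, z_plus = solve_yz(h)
y_minus, z_minus = solve_yz(-h)
dy_dx = (y_plus - y_minus) / (2.0 * h)   # estimate of y'(0)
dz_dx = (z_plus - z_minus) / (2.0 * h)   # estimate of z'(0)
print(dy_dx, dz_dx)
```

The estimates should be close to \(-1\) and \(0\) respectively, up to the error of the difference quotient; the iteration is only expected to work for \(x\) near 0, where the theorem guarantees a solution.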
Note that there are versions of the theorem for arbitrarily many derivatives. If \(f\) has \(k\) continuous derivatives, then the solution also has \(k\) continuous derivatives.
Exercises