8.3: The Derivative

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $ $ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $$\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$ $\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$

Recall that when we had a function $f \colon {\mathbb{R}}\to {\mathbb{R}}$, we defined the derivative at $x$ as \[\lim_{h \to 0} \frac{f(x+h)-f(x)}{h} .\] In other words, there was a number $a$ (the derivative of $f$ at $x$) such that \[\lim_{h \to 0} \left\lvert {\frac{f(x+h)-f(x)}{h} - a} \right\rvert = \lim_{h \to 0} \left\lvert {\frac{f(x+h)-f(x) - ah}{h}} \right\rvert = \lim_{h \to 0} \frac{\left\lvert {f(x+h)-f(x) - ah} \right\rvert}{\left\lvert {h} \right\rvert} = 0.\] Multiplying by $a$ is a linear map in one dimension. That is, we think of $a \in L({\mathbb{R}}^1,{\mathbb{R}}^1)$. We use this definition to extend differentiation to more variables.

Let $U \subset {\mathbb{R}}^n$ be an open subset and $f \colon U \to {\mathbb{R}}^m$. We say $f$ is differentiable at $x \in U$ if there exists an $A \in L({\mathbb{R}}^n,{\mathbb{R}}^m)$ such that \[\lim_{\substack{h \to 0\\h\in {\mathbb{R}}^n}} \frac{\left\lVert {f(x+h)-f(x) - Ah} \right\rVert}{\left\lVert {h} \right\rVert} = 0 .\] We define $Df(x) := A$, or $f'(x) := A$, and we say $A$ is the derivative of $f$ at $x$. When $f$ is differentiable at all $x \in U$, we say simply that $f$ is differentiable.

For a differentiable function, the derivative of $f$ is a function from $U$ to $L({\mathbb{R}}^n,{\mathbb{R}}^m)$. Compare to the one dimensional case, where the derivative is a function from $U$ to ${\mathbb{R}}$, but we really want to think of ${\mathbb{R}}$ here as $L({\mathbb{R}}^1,{\mathbb{R}}^1)$.

The norms above must be in the right spaces, of course. The norm in the numerator is in ${\mathbb{R}}^m$, and the norm in the denominator is in ${\mathbb{R}}^n$, where $h$ lives. Normally it is understood from context that $h \in {\mathbb{R}}^n$. We will not explicitly say so from now on.

We have again cheated somewhat and said that $A$ is the derivative. We have not yet shown that there is only one, so let us do that now.

Let $U \subset {\mathbb{R}}^n$ be an open subset and $f \colon U \to {\mathbb{R}}^m$. 
Suppose $x \in U$ and there exist $A,B \in L({\mathbb{R}}^n,{\mathbb{R}}^m)$ such that \[\lim_{h \to 0} \frac{\left\lVert {f(x+h)-f(x) - Ah} \right\rVert}{\left\lVert {h} \right\rVert} = 0 \qquad \text{and} \qquad \lim_{h \to 0} \frac{\left\lVert {f(x+h)-f(x) - Bh} \right\rVert}{\left\lVert {h} \right\rVert} = 0 .\] Then $A=B$.

By the triangle inequality, \[\begin{split} \frac{\left\lVert {(A-B)h} \right\rVert}{\left\lVert {h} \right\rVert} & = \frac{\left\lVert {f(x+h)-f(x) - Ah - (f(x+h)-f(x) - Bh)} \right\rVert}{\left\lVert {h} \right\rVert} \\ & \leq \frac{\left\lVert {f(x+h)-f(x) - Ah} \right\rVert}{\left\lVert {h} \right\rVert} + \frac{\left\lVert {f(x+h)-f(x) - Bh} \right\rVert}{\left\lVert {h} \right\rVert} . \end{split}\] So $\frac{\left\lVert {(A-B)h} \right\rVert}{\left\lVert {h} \right\rVert} \to 0$ as $h \to 0$. That is, given $\epsilon > 0$, for all nonzero $h$ in some $\delta$-ball around the origin \[\epsilon > \frac{\left\lVert {(A-B)h} \right\rVert}{\left\lVert {h} \right\rVert} = \left\lVert {(A-B)\frac{h}{\left\lVert {h} \right\rVert}} \right\rVert .\] For any $y$ with $\left\lVert {y} \right\rVert=1$ let $h = (\delta/2) \, y$. Then $\left\lVert {h} \right\rVert < \delta$ and $\frac{h}{\left\lVert {h} \right\rVert} = y$, and so $\left\lVert {(A-B)y} \right\rVert < \epsilon$. Taking the supremum over all such $y$ gives $\left\lVert {A-B} \right\rVert \leq \epsilon$. As $\epsilon > 0$ was arbitrary, $\left\lVert {A-B} \right\rVert = 0$, so $A = B$.

If $f(x) = Ax$ for a linear mapping $A$, then $f'(x) = A$. This is easily seen: \[\frac{\left\lVert {f(x+h)-f(x) - Ah} \right\rVert}{\left\lVert {h} \right\rVert} = \frac{\left\lVert {A(x+h)-Ax - Ah} \right\rVert}{\left\lVert {h} \right\rVert} = \frac{0}{\left\lVert {h} \right\rVert} = 0 .\]

Let $U \subset {\mathbb{R}}^n$ be open and $f \colon U \to {\mathbb{R}}^m$ be differentiable at $x_0$. Then $f$ is continuous at $x_0$.

Another way to write the differentiability is to write \[r(h) := f(x_0+h)-f(x_0) - f'(x_0) h .\] As $\frac{\left\lVert {r(h)} \right\rVert}{\left\lVert {h} \right\rVert}$ must go to zero as $h \to 0$, $r(h)$ itself must go to zero. 
The mapping $h \mapsto f'(x_0) h$ is a linear mapping between finite dimensional spaces. Therefore it is continuous and goes to zero as $h \to 0$. Therefore, $f(x_0+h)$ must go to $f(x_0)$ as $h \to 0$. That is, $f$ is continuous at $x_0$.

Let $U \subset {\mathbb{R}}^n$ be open and let $f \colon U \to {\mathbb{R}}^m$ be differentiable at $x_0 \in U$. Let $V \subset {\mathbb{R}}^m$ be open, $f(U) \subset V$ and let $g \colon V \to {\mathbb{R}}^\ell$ be differentiable at $f(x_0)$. Then \[F(x) = g\bigl(f(x)\bigr)\] is differentiable at $x_0$ and \[F'(x_0) = g'\bigl(f(x_0)\bigr) f'(x_0) .\]

Without the points this is sometimes written as $F' = {(g \circ f)}' = g' f'$. The way to understand it is that the derivative of the composition $g \circ f$ is the composition of the derivatives of $g$ and $f$. That is, if $A := f'(x_0)$ and $B := g'\bigl(f(x_0)\bigr)$, then $F'(x_0) = BA$.

Let $A := f'(x_0)$ and $B := g'\bigl(f(x_0)\bigr)$. Take $h \in {\mathbb{R}}^n$ and write $y_0 = f(x_0)$, $k = f(x_0+h)-f(x_0)$. Let \[r(h) := f(x_0+h)-f(x_0) - A h = k - Ah.\] Then \[\begin{split} \frac{\left\lVert {F(x_0+h)-F(x_0) - BAh} \right\rVert}{\left\lVert {h} \right\rVert} & = \frac{\left\lVert {g\bigl(f(x_0+h)\bigr)-g\bigl(f(x_0)\bigr) - BAh} \right\rVert}{\left\lVert {h} \right\rVert} \\ & = \frac{\left\lVert {g(y_0+k)-g(y_0) - B\bigl(k-r(h)\bigr)} \right\rVert}{\left\lVert {h} \right\rVert} \\ & \leq \frac {\left\lVert {g(y_0+k)-g(y_0) - Bk} \right\rVert} {\left\lVert {h} \right\rVert} + \left\lVert {B} \right\rVert \frac {\left\lVert {r(h)} \right\rVert} {\left\lVert {h} \right\rVert} \\ & = \frac {\left\lVert {g(y_0+k)-g(y_0) - Bk} \right\rVert} {\left\lVert {k} \right\rVert} \frac {\left\lVert {f(x_0+h)-f(x_0)} \right\rVert} {\left\lVert {h} \right\rVert} + \left\lVert {B} \right\rVert \frac {\left\lVert {r(h)} \right\rVert} {\left\lVert {h} \right\rVert} . \end{split}\] First, $\left\lVert {B} \right\rVert$ is constant and $f$ is differentiable at $x_0$, so the term $\left\lVert {B} \right\rVert\frac{\left\lVert {r(h)} \right\rVert}{\left\lVert {h} \right\rVert}$ goes to 0. Next, as $f$ is continuous at $x_0$, we have that as $h$ goes to 0, then $k$ goes to 0. Therefore $\frac {\left\lVert {g(y_0+k)-g(y_0) - Bk} \right\rVert} {\left\lVert {k} \right\rVert}$ goes to 0 because $g$ is differentiable at $y_0$. Finally \[\frac {\left\lVert {f(x_0+h)-f(x_0)} \right\rVert} {\left\lVert {h} \right\rVert} \leq \frac {\left\lVert {f(x_0+h)-f(x_0)-Ah} \right\rVert} {\left\lVert {h} \right\rVert} + \frac {\left\lVert {Ah} \right\rVert} {\left\lVert {h} \right\rVert} \leq \frac {\left\lVert {f(x_0+h)-f(x_0)-Ah} \right\rVert} {\left\lVert {h} \right\rVert} + \left\lVert {A} \right\rVert .\] As $f$ is differentiable at $x_0$, the term $\frac {\left\lVert {f(x_0+h)-f(x_0)} \right\rVert} {\left\lVert {h} \right\rVert}$ stays bounded as $h$ goes to 0. Therefore, $\frac{\left\lVert {F(x_0+h)-F(x_0) - BAh} \right\rVert}{\left\lVert {h} \right\rVert}$ goes to zero, and $F'(x_0) = BA$, which is what was claimed.

Partial derivatives

There is another way to generalize the derivative from one dimension. We can hold all but one variable constant and take the regular derivative.

Let $f \colon U \to {\mathbb{R}}$ be a function on an open set $U \subset {\mathbb{R}}^n$. If the following limit exists we write \[\frac{\partial f}{\partial x^j} (x) := \lim_{h\to 0}\frac{f(x^1,\ldots,x^{j-1},x^j+h,x^{j+1},\ldots,x^n)-f(x)}{h} = \lim_{h\to 0}\frac{f(x+h e_j)-f(x)}{h} .\] We call $\frac{\partial f}{\partial x^j} (x)$ the partial derivative of $f$ with respect to $x^j$. Sometimes we write $D_j f$ instead.

For a mapping $f \colon U \to {\mathbb{R}}^m$ we write $f = (f^1,f^2,\ldots,f^m)$, where $f^k$ are real-valued functions. Then we define $\frac{\partial f^k}{\partial x^j}$ (or write it as $D_j f^k$). 
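
As a quick illustration of the definition (the function below is our own illustrative choice, not one taken from the text), take $f \colon {\mathbb{R}}^2 \to {\mathbb{R}}$ given by $f(x,y) := x^2 y + \sin y$. Holding $y$ fixed, \[\frac{\partial f}{\partial x}(x,y) = \lim_{h \to 0} \frac{(x+h)^2 y + \sin y - x^2 y - \sin y}{h} = \lim_{h \to 0} (2xy + hy) = 2xy ,\] and holding $x$ fixed and using the one-variable derivative of $\sin$, \[\frac{\partial f}{\partial y}(x,y) = x^2 + \cos y .\]
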
Partial derivatives are easier to compute with all the machinery of calculus, and they provide a way to compute the total derivative of a function.

Let $U \subset {\mathbb{R}}^n$ be open and let $f \colon U \to {\mathbb{R}}^m$ be differentiable at $x_0 \in U$. Then all the partial derivatives at $x_0$ exist and in terms of the standard basis of ${\mathbb{R}}^n$ and ${\mathbb{R}}^m$, $f'(x_0)$ is represented by the matrix \[\begin{bmatrix} \frac{\partial f^1}{\partial x^1}(x_0) & \frac{\partial f^1}{\partial x^2}(x_0) & \ldots & \frac{\partial f^1}{\partial x^n}(x_0) \\ \frac{\partial f^2}{\partial x^1}(x_0) & \frac{\partial f^2}{\partial x^2}(x_0) & \ldots & \frac{\partial f^2}{\partial x^n}(x_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f^m}{\partial x^1}(x_0) & \frac{\partial f^m}{\partial x^2}(x_0) & \ldots & \frac{\partial f^m}{\partial x^n}(x_0) \end{bmatrix} .\]

In other words \[f'(x_0) \, e_j = \sum_{k=1}^m \frac{\partial f^k}{\partial x^j}(x_0) \,e_k .\] If $h = \sum_{j=1}^n c^j e_j$, then \[f'(x_0) \, h = \sum_{j=1}^n \sum_{k=1}^m c^j \frac{\partial f^k}{\partial x^j}(x_0) \,e_k .\] Again note the up-down pattern with the indices being summed over. That is on purpose.

Fix a $j$ and note that \[\begin{split} \left\lVert {\frac{f(x_0+h e_j)-f(x_0)}{h} - f'(x_0) e_j} \right\rVert & = \left\lVert {\frac{f(x_0+h e_j)-f(x_0) - f'(x_0) h e_j}{h}} \right\rVert \\ & = \frac{\left\lVert {f(x_0+h e_j)-f(x_0) - f'(x_0) h e_j} \right\rVert}{\left\lVert {h e_j} \right\rVert} . \end{split}\] As $h$ goes to 0, the right hand side goes to zero by differentiability of $f$, and hence \[\lim_{h \to 0} \frac{f(x_0+h e_j)-f(x_0)}{h} = f'(x_0) e_j .\] Note that $f$ is vector valued. So represent $f$ by components $f = (f^1,f^2,\ldots,f^m)$, and note that taking a limit in ${\mathbb{R}}^m$ is the same as taking the limit in each component separately. 
Therefore for any $k$ the partial derivative \[\frac{\partial f^k}{\partial x^j} (x_0) = \lim_{h \to 0} \frac{f^k(x_0+h e_j)-f^k(x_0)}{h}\] exists and is equal to the $k$th component of $f'(x_0) e_j$, and we are done.

One of the consequences of the theorem is that if $f$ is differentiable on $U$, then $f' \colon U \to L({\mathbb{R}}^n,{\mathbb{R}}^m)$ is a continuous function if and only if all the $\frac{\partial f^k}{\partial x^j}$ are continuous functions.

Gradient and directional derivatives

Let $U \subset {\mathbb{R}}^n$ be open and let $f \colon U \to {\mathbb{R}}$ be a differentiable function. We define the gradient as \[\nabla f (x) := \sum_{j=1}^n \frac{\partial f}{\partial x^j} (x)\, e_j .\] Here the upper-lower indices do not really match up.

Suppose $\gamma \colon (a,b) \subset {\mathbb{R}}\to {\mathbb{R}}^n$ is a differentiable function and the image $\gamma\bigl((a,b)\bigr) \subset U$. Write $\gamma = (\gamma^1,\gamma^2,\ldots,\gamma^n)$. Let \[g(t) := f\bigl(\gamma(t)\bigr) .\] The function $g$ is differentiable and the derivative is \[g'(t) = \sum_{j=1}^n \frac{\partial f}{\partial x^j} \bigl(\gamma(t)\bigr) \frac{d\gamma^j}{dt} (t) = \sum_{j=1}^n \frac{\partial f}{\partial x^j} \frac{d\gamma^j}{dt} .\] For convenience, we sometimes leave out the points where we are evaluating, as on the right hand side above. Notice \[g'(t) = (\nabla f) \bigl(\gamma(t)\bigr) \cdot \gamma'(t) = \nabla f \cdot \gamma' ,\] where the dot is the standard scalar dot product.

We use this idea to define derivatives in a specific direction. A direction is simply a vector pointing in that direction. So pick a vector $u \in {\mathbb{R}}^n$ such that $\left\lVert {u} \right\rVert = 1$. Fix $x \in U$. Then define \[\gamma(t) := x + tu .\] It is easy to compute that $\gamma'(t) = u$ for all $t$. By the chain rule \[\frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] = (\nabla f) (x) \cdot u ,\] where the notation $\frac{d}{dt}\big|_{t=0}$ represents the derivative evaluated at $t=0$. 
We also compute directly \[\frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] = \lim_{h\to 0} \frac{f(x+hu)-f(x)}{h} .\] We obtain the directional derivative, denoted by \[D_u f (x) := \frac{d}{dt}\Big|_{t=0} \bigl[ f(x+tu) \bigr] ,\] which can be computed by one of the methods above.

Let us suppose $(\nabla f)(x) \neq 0$. By the Cauchy-Schwarz inequality we have \[\left\lvert {D_u f(x)} \right\rvert \leq \left\lVert {(\nabla f)(x)} \right\rVert .\] Equality is achieved when $u$ is a scalar multiple of $(\nabla f)(x)$. That is, when \[u = \frac{(\nabla f)(x)}{\left\lVert {(\nabla f)(x)} \right\rVert} ,\] we get $D_u f(x) = \left\lVert {(\nabla f)(x)} \right\rVert$. The gradient points in the direction in which the function grows fastest, in other words, in the direction in which $D_u f(x)$ is maximal.

Bounding the derivative

Let us prove a “mean value theorem” for vector valued functions.

If $\varphi \colon [a,b] \to {\mathbb{R}}^n$ is differentiable on $(a,b)$ and continuous on $[a,b]$, then there exists a $t \in (a,b)$ such that \[\left\lVert {\varphi(b)-\varphi(a)} \right\rVert \leq (b-a) \left\lVert {\varphi'(t)} \right\rVert .\]

By the mean value theorem applied to the function $t \mapsto \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(t)$ (the dot is the scalar dot product again), there is a $t \in (a,b)$ such that \[\bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(b) - \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(a) = \left\lVert {\varphi(b)-\varphi(a)} \right\rVert^2 = (b-a) \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t) ,\] where we treat $\varphi'$ simply as a column vector of numbers by abuse of notation. Note that in this case, it is not hard to see that $\left\lVert {\varphi'(t)} \right\rVert_{L({\mathbb{R}},{\mathbb{R}}^n)} = \left\lVert {\varphi'(t)} \right\rVert_{{\mathbb{R}}^n}$. By the Cauchy-Schwarz inequality, \[\left\lVert {\varphi(b)-\varphi(a)} \right\rVert^2 = (b-a) \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t) \leq (b-a) \left\lVert {\varphi(b)-\varphi(a)} \right\rVert \left\lVert {\varphi'(t)} \right\rVert .\] Dividing by $\left\lVert {\varphi(b)-\varphi(a)} \right\rVert$ gives the claim (which is trivial if this norm is zero).

Let $U \subset {\mathbb{R}}^n$ be a convex open set, $f \colon U \to {\mathbb{R}}^m$ a differentiable function, and $M$ a number such that \[\left\lVert {f'(x)} \right\rVert \leq M\] for all $x \in U$. Then $f$ is Lipschitz with constant $M$, that is \[\left\lVert {f(x)-f(y)} \right\rVert \leq M \left\lVert {x-y} \right\rVert\] for all $x,y \in U$.

Fix $x$ and $y$ in $U$ and note that $(1-t)x+ty \in U$ for all $t \in [0,1]$ by convexity. Next \[\frac{d}{dt} \Bigl[f\bigl((1-t)x+ty\bigr)\Bigr] = f'\bigl((1-t)x+ty\bigr) (y-x) .\] By the mean value theorem above we get, for some $t \in (0,1)$, \[\left\lVert {f(x)-f(y)} \right\rVert \leq \left\lVert {\frac{d}{dt} \Bigl[ f\bigl((1-t)x+ty\bigr) \Bigr] } \right\rVert \leq \left\lVert {f'\bigl((1-t)x+ty\bigr)} \right\rVert \left\lVert {y-x} \right\rVert \leq M \left\lVert {y-x} \right\rVert .\]

If $U$ is not convex the proposition is not true. To see this fact, take the set \[U = \{ (x,y) : 0.9 < x^2+y^2 < 1.1 \} \setminus \{ (x,0) : x < 0 \} .\] Let $f(x,y)$ be the angle that the line from the origin to $(x,y)$ makes with the positive $x$ axis. You can even write the formula for $f$: \[f(x,y) = 2 \operatorname{arctan}\left( \frac{y}{x+\sqrt{x^2+y^2}}\right) .\] Think spiral staircase with room in the middle. The function is differentiable, and the derivative is bounded on $U$, which is not hard to see. Thinking of what happens near where the negative $x$-axis cuts the annulus in half, we see that the conclusion cannot hold.

Let us solve the differential equation $f' = 0$.

If $U \subset {\mathbb{R}}^n$ is connected and $f \colon U \to {\mathbb{R}}^m$ is differentiable and $f'(x) = 0$ for all $x \in U$, then $f$ is constant.

For any $x \in U$, there is a ball $B(x,\delta) \subset U$. The ball $B(x,\delta)$ is convex. Since $\left\lVert {f'(y)} \right\rVert \leq 0$ for all $y \in B(x,\delta)$, then by the theorem, $\left\lVert {f(x)-f(y)} \right\rVert \leq 0 \left\lVert {x-y} \right\rVert = 0$, so $f(x) = f(y)$ for all $y \in B(x,\delta)$. 
This means that $f^{-1}(c)$ is open for any $c \in {\mathbb{R}}^m$. Suppose $f^{-1}(c)$ is nonempty. The two sets \[U' = f^{-1}(c), \qquad U'' = f^{-1}({\mathbb{R}}^m\setminus\{c\}) = \bigcup_{\substack{a \in {\mathbb{R}}^m\\a\neq c}} f^{-1}(a)\] are open and disjoint, and further $U = U' \cup U''$. So as $U'$ is nonempty, and $U$ is connected, we have that $U'' = \emptyset$. So $f(x) = c$ for all $x \in U$.

Continuously differentiable functions

We say $f \colon U \subset {\mathbb{R}}^n \to {\mathbb{R}}^m$ is continuously differentiable, or $C^1(U)$, if $f$ is differentiable and $f' \colon U \to L({\mathbb{R}}^n,{\mathbb{R}}^m)$ is continuous.

Let $U \subset {\mathbb{R}}^n$ be open and $f \colon U \to {\mathbb{R}}^m$. The function $f$ is continuously differentiable if and only if all the partial derivatives exist and are continuous.

Without continuity the theorem does not hold. Just because partial derivatives exist does not mean that $f$ is differentiable; in fact, $f$ may not even be continuous. See the exercises.

We have seen that if $f$ is differentiable, then the partial derivatives exist. Furthermore, the partial derivatives are the entries of the matrix of $f'(x)$. So if $f' \colon U \to L({\mathbb{R}}^n,{\mathbb{R}}^m)$ is continuous, then the entries are continuous, hence the partial derivatives are continuous.

To prove the opposite direction, suppose the partial derivatives exist and are continuous. Fix $x \in U$. If we can show that $f'(x)$ exists we are done, because the entries of the matrix $f'(x)$ are then the partial derivatives and if the entries are continuous functions, the matrix valued function $f'$ is continuous.

Let us do induction on dimension. First let us note that the conclusion is true when $n=1$. In this case the derivative is just the regular derivative (exercise: you should check that the fact that the function is vector valued is not a problem). 
Suppose the conclusion is true for ${\mathbb{R}}^{n-1}$, that is, if we restrict to the first $n-1$ variables, the conclusion is true. It is easy to see that the first $n-1$ partial derivatives of $f$ restricted to the set where the last coordinate is fixed are the same as those for $f$. In the following we think of ${\mathbb{R}}^{n-1}$ as a subset of ${\mathbb{R}}^n$, that is the set in ${\mathbb{R}}^n$ where $x^n = 0$. Let \[A = \begin{bmatrix} \frac{\partial f^1}{\partial x^1}(x) & \ldots & \frac{\partial f^1}{\partial x^n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f^m}{\partial x^1}(x) & \ldots & \frac{\partial f^m}{\partial x^n}(x) \end{bmatrix} , \qquad A_1 = \begin{bmatrix} \frac{\partial f^1}{\partial x^1}(x) & \ldots & \frac{\partial f^1}{\partial x^{n-1}}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f^m}{\partial x^1}(x) & \ldots & \frac{\partial f^m}{\partial x^{n-1}}(x) \end{bmatrix} , \qquad v = \begin{bmatrix} \frac{\partial f^1}{\partial x^n}(x) \\ \vdots \\ \frac{\partial f^m}{\partial x^n}(x) \end{bmatrix} .\]

Let $\epsilon > 0$ be given. Let $\delta > 0$ be such that for any $k \in {\mathbb{R}}^{n-1}$ with $\left\lVert {k} \right\rVert < \delta$ we have \[\frac{\left\lVert {f(x+k) - f(x) - A_1k} \right\rVert}{\left\lVert {k} \right\rVert} < \epsilon .\] By continuity of the partial derivatives, suppose $\delta$ is small enough so that \[\left\lvert {\frac{\partial f^j}{\partial x^n}(x+h) - \frac{\partial f^j}{\partial x^n}(x)} \right\rvert < \epsilon ,\] for all $j$ and all $h$ with $\left\lVert {h} \right\rVert < \delta$.

Let $h = h_1 + t e_n$ be a vector in ${\mathbb{R}}^n$ where $h_1 \in {\mathbb{R}}^{n-1}$ such that $\left\lVert {h} \right\rVert < \delta$. Then $\left\lVert {h_1} \right\rVert \leq \left\lVert {h} \right\rVert < \delta$. Note that $Ah = A_1h_1 + tv$. 
\[\begin{split} \left\lVert {f(x+h) - f(x) - Ah} \right\rVert & = \left\lVert {f(x+h_1 + t e_n) - f(x+h_1) - tv + f(x+h_1) - f(x) - A_1h_1} \right\rVert \\ & \leq \left\lVert {f(x+h_1 + t e_n) - f(x+h_1) -tv} \right\rVert + \left\lVert {f(x+h_1) - f(x) - A_1h_1} \right\rVert \\ & \leq \left\lVert {f(x+h_1 + t e_n) - f(x+h_1) -tv} \right\rVert + \epsilon \left\lVert {h_1} \right\rVert . \end{split}\] As all the partial derivatives exist, by the mean value theorem for each $j$ there is some $\theta_j \in [0,t]$ (or $[t,0]$ if $t < 0$), such that \[f^j(x+h_1 + t e_n) - f^j(x+h_1) = t \frac{\partial f^j}{\partial x^n}(x+h_1+\theta_j e_n).\] Note that if $\left\lVert {h} \right\rVert < \delta$ then $\left\lVert {h_1+\theta_j e_n} \right\rVert \leq \left\lVert {h} \right\rVert < \delta$. So to finish the estimate \[\begin{split} \left\lVert {f(x+h) - f(x) - Ah} \right\rVert & \leq \left\lVert {f(x+h_1 + t e_n) - f(x+h_1) -tv} \right\rVert + \epsilon \left\lVert {h_1} \right\rVert \\ & \leq \sqrt{\sum_{j=1}^m {\left(t\frac{\partial f^j}{\partial x^n}(x+h_1+\theta_j e_n) - t \frac{\partial f^j}{\partial x^n}(x)\right)}^2} + \epsilon \left\lVert {h_1} \right\rVert \\ & \leq \sqrt{m}\, \epsilon \left\lvert {t} \right\rvert + \epsilon \left\lVert {h_1} \right\rVert \\ & \leq (\sqrt{m}+1)\epsilon \left\lVert {h} \right\rVert . \end{split}\] As $\epsilon > 0$ was arbitrary, $f$ is differentiable at $x$ with $f'(x) = A$.

The Jacobian

Let $U \subset {\mathbb{R}}^n$ be open and let $f \colon U \to {\mathbb{R}}^n$ be a differentiable mapping. Then define the Jacobian of $f$ at $x$ as \[J_f(x) := \det\bigl( f'(x) \bigr) .\] Sometimes this is written as \[\frac{\partial(f^1,\ldots,f^n)}{\partial(x^1,\ldots,x^n)} .\] This last piece of notation may seem somewhat confusing, but it is useful when you need to specify the exact variables and function components used.

The Jacobian $J_f$ is a real valued function, and when $n=1$ it is simply the derivative. When $f$ is $C^1$, then $J_f$ is a continuous function. 
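
For example (a sketch with a mapping chosen by us for illustration, not one taken from the text), consider the polar coordinate mapping $f \colon {\mathbb{R}}^2 \to {\mathbb{R}}^2$ given by $f(r,\theta) := (r\cos\theta, r\sin\theta)$. Its partial derivatives exist and are continuous, so $f$ is $C^1$, and \[f'(r,\theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix} , \qquad J_f(r,\theta) = r\cos^2\theta + r\sin^2\theta = r .\] In particular, $J_f$ is continuous, and it vanishes exactly when $r = 0$.
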
From the chain rule it follows that: \[J_{f \circ g} (x) = J_f\bigl(g(x)\bigr) J_g(x) .\]

It can be computed directly that the determinant tells us what happens to area/volume. Suppose we are in ${\mathbb{R}}^2$. Then if $A$ is a linear transformation, it follows by direct computation that the direct image of the unit square $A([0,1]^2)$ has area $\left\lvert {\det(A)} \right\rvert$. Note that the sign of the determinant determines “orientation”. If the determinant is negative, then the two sides of the unit square will be flipped in the image. We claim without proof that this follows for arbitrary figures, not just the square.

Similarly, the Jacobian measures how much a differentiable mapping stretches things locally, and whether it flips orientation.

Exercises

Let $f \colon {\mathbb{R}}^2 \to {\mathbb{R}}$ be given by $f(x,y) = \sqrt{x^2+y^2}$. Show that $f$ is not differentiable at the origin.

Define a function $f \colon {\mathbb{R}}^2 \to {\mathbb{R}}$ by \[f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{ if $(x,y) \not= (0,0)$}, \\ 0 & \text{ if $(x,y) = (0,0)$}. \end{cases}\] a) Show that partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist at all points (including the origin). b) Show that $f$ is not continuous at the origin (and hence not differentiable).

Define a function $f \colon {\mathbb{R}}^2 \to {\mathbb{R}}$ by \[f(x,y) := \begin{cases} \frac{x^2y}{x^2+y^2} & \text{ if $(x,y) \not= (0,0)$}, \\ 0 & \text{ if $(x,y) = (0,0)$}. \end{cases}\] a) Show that partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist at all points. b) Show that $f$ is continuous at the origin. c) Show that $f$ is not differentiable at the origin.