Mathematics LibreTexts

3.3: Best Affine Approximations


    Best Affine Approximations

    Given a function \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) and a point \(\mathbf{c}\), we wish to find the affine function \(A: \mathbb{R}^{n} \rightarrow \mathbb{R}\) which best approximates \(f\) for points close to \(\mathbf{c}\). As before, best will mean that the remainder function,

    \[ R(\mathbf{h})=f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h}) , \]

    approaches 0 at a sufficiently fast rate. In this context, since \(R(\mathbf{h})\) is a scalar and \(\mathbf{h}\) is a vector, sufficiently fast will mean that

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}} \frac{R(\mathbf{h})}{\|\mathbf{h}\|}=0 . \label{3.3.2}\]

    Generalizing our previous notation, we will say that a function \(R: \mathbb{R}^{n} \rightarrow \mathbb{R}\) satisfying Equation \ref{3.3.2} is \(o(\mathbf{h})\). Note that if \(n=1\) this extended definition of \(o(\mathbf{h})\) is equivalent to the definition given in Section 2.2.
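
    As a quick illustration (the two functions below are our own choices, not taken from the text), \(R(\mathbf{h})=\|\mathbf{h}\|^{2}\) is \(o(\mathbf{h})\), since

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}} \frac{\|\mathbf{h}\|^{2}}{\|\mathbf{h}\|}=\lim _{\mathbf{h} \rightarrow \mathbf{0}}\|\mathbf{h}\|=0 , \nonumber \]

    whereas \(R(\mathbf{h})=\|\mathbf{h}\|\) is not, since in that case the quotient equals 1 for every \(\mathbf{h} \neq \mathbf{0}\).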

    Definition \(\PageIndex{1}\)

    Suppose \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is defined on an open ball containing the point \(\mathbf{c}\). We call an affine function \(A: \mathbb{R}^{n} \rightarrow \mathbb{R}\) the best affine approximation to \(f\) at \(\mathbf{c}\) if (1) \(A(\mathbf{c})=f(\mathbf{c})\) and (2) \(R(\mathbf{h})\) is \(o(\mathbf{h})\), where

    \[ R(\mathbf{h})=f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h}) . \]

    Suppose \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) and suppose \(A: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is the best affine approximation to \(f\) at \(\mathbf{c}\). Since \(A\) is affine, there exists a linear function \(L: \mathbb{R}^{n} \rightarrow \mathbb{R}\) and a scalar \(b\) such that

    \[ A(\mathbf{x})=L(\mathbf{x})+b \]

    for all \(\mathbf{x}\) in \(\mathbb{R}^n\). Since \(A(\mathbf{c})=f(\mathbf{c})\), we have

    \[ f(\mathbf{c})=L(\mathbf{c})+b, \]

    which implies that

    \[ b=f(\mathbf{c})-L(\mathbf{c}) . \]

    Hence

    \[ A(\mathbf{x})=L(\mathbf{x})+f(\mathbf{c})-L(\mathbf{c})=L(\mathbf{x}-\mathbf{c})+f(\mathbf{c}) \]

    for all \(\mathbf{x}\) in \(\mathbb{R}^n\). Moreover, if we let

    \[ \mathbf{a}=\left(L\left(\mathbf{e}_{1}\right), L\left(\mathbf{e}_{2}\right), \ldots, L\left(\mathbf{e}_{n}\right)\right) , \]

    where \(\mathbf{e}_{1}, \mathbf{e}_{2}, \ldots, \mathbf{e}_{n}\) are, as usual, the standard basis vectors for \(\mathbb{R}^n\), then, from our results in Section 1.5,

    \[ L(\mathbf{x})=\mathbf{a} \cdot \mathbf{x} \]

    for all \(\mathbf{x}\) in \(\mathbb{R}^n\). Hence

    \[ A(\mathbf{x})=\mathbf{a} \cdot(\mathbf{x}-\mathbf{c})+f(\mathbf{c}) , \]

    for all \(\mathbf{x}\) in \(\mathbb{R}^n\), and we see that \(A\) is completely determined by the vector \(\mathbf{a}\).

    Definition \(\PageIndex{2}\)

    Suppose \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is defined on an open ball containing the point \(\mathbf{c}\). If \(f\) has a best affine approximation at \(\mathbf{c}\), then we say \(f\) is differentiable at \(\mathbf{c}\). Moreover, if the best affine approximation to \(f\) at \(\mathbf{c}\) is given by

    \[ A(\mathbf{x})=\mathbf{a} \cdot(\mathbf{x}-\mathbf{c})+f(\mathbf{c}) , \]

    then we call \(\mathbf{a}\) the derivative of \(f\) at \(\mathbf{c}\) and write \(D f(\mathbf{c})=\mathbf{a}\).

    Now suppose \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is differentiable at \(\mathbf{c}\) with best affine approximation \(A\) and let \(\mathbf{a}=\left(a_{1}, a_{2}, \ldots, a_{n}\right)=D f(\mathbf{c})\). Since

    \[ R(\mathbf{h})=f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h})=f(\mathbf{c}+\mathbf{h})-\mathbf{a} \cdot \mathbf{h}-f(\mathbf{c}) \]

    is \(o(\mathbf{h})\), we must have

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}} \frac{R(\mathbf{h})}{\|\mathbf{h}\|}=0 . \]

    In particular, for \(k=1,2, \ldots, n\), if we let \(\mathbf{h}=t \mathbf{e}_{k}\), then \(\mathbf{h}\) approaches \(\mathbf{0}\) as \(t\) approaches 0, so

    \[ 0=\lim _{t \rightarrow 0} \frac{R\left(t \mathbf{e}_{k}\right)}{\left\|t \mathbf{e}_{k}\right\|}=\lim _{t \rightarrow 0} \frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-t\left(\mathbf{a} \cdot \mathbf{e}_{k}\right)-f(\mathbf{c})}{|t|}=\lim _{t \rightarrow 0} \frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-t a_{k}-f(\mathbf{c})}{|t|} \nonumber \]

    First considering \(t>0\), we have

    \[ 0=\lim _{t \rightarrow 0^{+}} \frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-t a_{k}-f(\mathbf{c})}{t}=\lim _{t \rightarrow 0^{+}}\left(\frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-f(\mathbf{c})}{t}-a_{k}\right) , \]

    implying that

    \[ a_{k}=\lim _{t \rightarrow 0^{+}} \frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-f(\mathbf{c})}{t} . \]

    With \(t<0\), we have

    \[ 0=\lim _{t \rightarrow 0^{-}} \frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-t a_{k}-f(\mathbf{c})}{-t}=-\lim _{t \rightarrow 0^{-}}\left(\frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-f(\mathbf{c})}{t}-a_{k}\right), \]

    implying that

    \[ a_{k}=\lim _{t \rightarrow 0^{-}} \frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-f(\mathbf{c})}{t} . \]

    Hence

    \[ a_{k}=\lim _{t \rightarrow 0} \frac{f\left(\mathbf{c}+t \mathbf{e}_{k}\right)-f(\mathbf{c})}{t}=\frac{\partial}{\partial x_{k}} f(\mathbf{c}) . \]

    Thus we have shown that

    \[ \mathbf{a}=\left(\frac{\partial}{\partial x_{1}} f(\mathbf{c}), \frac{\partial}{\partial x_{2}} f(\mathbf{c}), \ldots, \frac{\partial}{\partial x_{n}} f(\mathbf{c})\right)=\nabla f(\mathbf{c}) . \]

    Theorem \(\PageIndex{1}\)

    If \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is differentiable at \(\mathbf{c}\), then

    \[ D f(\mathbf{c})=\nabla f(\mathbf{c}) \]

    It now follows that if \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is differentiable at \(\mathbf{c}\), then the best affine approximation to \(f\) at \(\mathbf{c}\) is

    \[ A(\mathbf{x})=\nabla f(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+f(\mathbf{c}) . \]

    However, the converse does not hold: it is possible for \(\nabla f(\mathbf{c})\) to exist even when \(f\) is not differentiable at \(\mathbf{c}\). Before looking at an example, note that if \(f\) is differentiable at \(\mathbf{c}\) and \(A\) is the best affine approximation to \(f\) at \(\mathbf{c}\), then, since \(R(\mathbf{h})=f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h})\) is \(o(\mathbf{h})\),

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}}(f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h}))=\lim _{\mathbf{h} \rightarrow \mathbf{0}} \frac{R(\mathbf{h})}{\|\mathbf{h}\|}\|\mathbf{h}\|=0\|\mathbf{0}\|=0 . \]

    Now \(A\) is continuous at \(\mathbf{c}\), so it follows that

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}} f(\mathbf{c}+\mathbf{h})=\lim _{\mathbf{h} \rightarrow \mathbf{0}} A(\mathbf{c}+\mathbf{h})=A(\mathbf{c})=f(\mathbf{c}) . \]

    In other words, \(f\) is continuous at \(\mathbf{c}\).

    Theorem \(\PageIndex{2}\)

    If \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is differentiable at \(\mathbf{c}\), then \(f\) is continuous at \(\mathbf{c}\).

    Example \(\PageIndex{1}\)

    Consider the function 

    \[ g(x, y)= \begin{cases}\frac{x y}{x^{2}+y^{2}}, & \text { if }(x, y) \neq(0,0), \\ 0, & \text { if }(x, y)=(0,0) .\end{cases} \nonumber \]

    In Section 3.1 we showed that \(g\) is not continuous at (0,0) and in Section 3.2 we saw that \(\nabla g(0,0)=(0,0)\). Since \(g\) is not continuous at (0,0), it now follows, from the previous theorem, that \(g\) is not differentiable at (0,0), even though the gradient exists at that point. From the graph of \(g\) in Figure 3.3.1 (originally seen in Figure 3.1.7), we can see that the fact that \(g\) is not differentiable, in fact, not even continuous, at the origin shows up geometrically as a tear in the surface.

    Figure 3.3.1 The graph of a nondifferentiable function
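
    The failure can also be checked directly against Definition \(\PageIndex{1}\); the following computation is a sketch of our own rather than part of the original example. Since \(\nabla g(0,0)=(0,0)\) and \(g(0,0)=0\), the only candidate for a best affine approximation is \(A(x, y)=0\), so \(R(\mathbf{h})=g(\mathbf{h})\). Taking \(\mathbf{h}=(t, t)\) with \(t \neq 0\),

    \[ \frac{R(t, t)}{\|(t, t)\|}=\frac{1}{\sqrt{2}|t|} \cdot \frac{t^{2}}{t^{2}+t^{2}}=\frac{1}{2 \sqrt{2}|t|} , \nonumber \]

    which grows without bound as \(t\) approaches 0, so \(R(\mathbf{h})\) is not \(o(\mathbf{h})\).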

    From this example we see that the differentiability of a function \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) at a point \(\mathbf{c}\) requires more than just the existence of the gradient of \(f\) at \(\mathbf{c}\). It turns out that continuity of the partial derivatives of \(f\) on an open ball containing \(\mathbf{c}\) suffices to show that \(f\) is differentiable at \(\mathbf{c}\). Note that the partial derivatives of \(g\) in the previous example are not continuous (see Exercise 8 of Section 3.2).

    So we will now assume that \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is \(C^1\) on some open ball containing \(\mathbf{c}\). If we define an affine function \(A: \mathbb{R}^{n} \rightarrow \mathbb{R}\) by 

    \[ A(\mathbf{x})=\nabla f(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+f(\mathbf{c}) , \]

    then the remainder function is

    \[ R(\mathbf{h})=f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h})=f(\mathbf{c}+\mathbf{h})-f(\mathbf{c})-\nabla f(\mathbf{c}) \cdot \mathbf{h} . \]

    We need to show that \(R(\mathbf{h})\) is \(o(\mathbf{h})\). Toward that end, for a fixed \(\mathbf{h} \neq \mathbf{0}\), define \(\varphi: \mathbb{R} \rightarrow \mathbb{R}\) by

    \[ \varphi(t)=f(\mathbf{c}+t \mathbf{h}) . \]

    We first note that \(\varphi\) is differentiable with

    \[ \begin{align}
    \varphi^{\prime}(t) &=\lim _{s \rightarrow 0} \frac{\varphi(t+s)-\varphi(t)}{s} \nonumber \\
    &=\lim _{s \rightarrow 0} \frac{f(\mathbf{c}+(t+s) \mathbf{h})-f(\mathbf{c}+t \mathbf{h})}{s} \nonumber \\
    &=\|\mathbf{h}\| \lim _{s \rightarrow 0} \frac{f\left(\mathbf{c}+t \mathbf{h}+s\|\mathbf{h}\| \frac{\mathbf{h}}{\|\mathbf{h}\|}\right)-f(\mathbf{c}+t \mathbf{h})}{s\|\mathbf{h}\|} \nonumber \\
    &=\|\mathbf{h}\| D_{\frac{\mathbf{h}}{\|\mathbf{h}\|}} f(\mathbf{c}+t \mathbf{h}) \nonumber \\
    &=\|\mathbf{h}\|\left(\nabla f(\mathbf{c}+t \mathbf{h}) \cdot \frac{\mathbf{h}}{\|\mathbf{h}\|}\right) \nonumber \\
    &=\nabla f(\mathbf{c}+t \mathbf{h}) \cdot \mathbf{h} .
    \end{align} \]

    From the Mean Value Theorem of single-variable calculus, it follows that there exists a number \(s\) between 0 and 1 such that

    \[ \varphi^{\prime}(s)=\varphi(1)-\varphi(0)=f(\mathbf{c}+\mathbf{h})-f(\mathbf{c}) . \]

    Hence we may write

    \[ R(\mathbf{h})=\nabla f(\mathbf{c}+s \mathbf{h}) \cdot \mathbf{h}-\nabla f(\mathbf{c}) \cdot \mathbf{h}=(\nabla f(\mathbf{c}+s \mathbf{h})-\nabla f(\mathbf{c})) \cdot \mathbf{h} . \]

    Applying the Cauchy-Schwarz inequality to (3.3.29),

    \[ |R(\mathbf{h})| \leq\|\nabla f(\mathbf{c}+s \mathbf{h})-\nabla f(\mathbf{c})\|\|\mathbf{h}\| , \]

    and so

    \[ \frac{|R(\mathbf{h})|}{\|\mathbf{h}\|} \leq\|\nabla f(\mathbf{c}+s \mathbf{h})-\nabla f(\mathbf{c})\| . \]

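    Now the partial derivatives of \(f\) are continuous and, since \(0<s<1\), \(s \mathbf{h}\) approaches \(\mathbf{0}\) as \(\mathbf{h}\) approaches \(\mathbf{0}\) (even though \(s\) depends on \(\mathbf{h}\)), so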

    \[ \begin{align}
    \lim _{\mathbf{h} \rightarrow \mathbf{0}}\|\nabla f(\mathbf{c}+s \mathbf{h})-\nabla f(\mathbf{c})\| &=\|\nabla f(\mathbf{c}+s \mathbf{0})-\nabla f(\mathbf{c})\| \nonumber \\
    &=\|\nabla f(\mathbf{c})-\nabla f(\mathbf{c})\| \nonumber \\
    &=0.
    \end{align}\] 

    Hence

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}} \frac{R(\mathbf{h})}{\|\mathbf{h}\|}=0 . \]

    That is, \(R(\mathbf{h})\) is \(o(\mathbf{h})\) and \(A\) is the best affine approximation to \(f\) at \(\mathbf{c}\). Thus we have the following fundamental theorem.

    Theorem \(\PageIndex{3}\)

    If \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is \(C^1\) on an open ball containing the point \(\mathbf{c}\), then \(f\) is differentiable at \(\mathbf{c}\).

    Example \(\PageIndex{2}\)

    Suppose \(f: \mathbb{R}^{2} \rightarrow \mathbb{R}\) is defined by

    \[ f(x, y)=4-2 x^{2}-y^{2} . \nonumber \]

    To find the best affine approximation to \(f\) at (1,1), we first compute

    \[ \nabla f(x, y)=(-4 x,-2 y) . \nonumber \]

    Thus \(\nabla f(1,1)=(-4,-2)\) and \(f(1,1)=1\), so the best affine approximation is

    \[ A(x, y)=(-4,-2) \cdot(x-1, y-1)+1 . \nonumber\]

    Simplifying, we have

    \[ A(x, y)=-4 x-2 y+7 . \nonumber \]
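
    As a check (our own computation, writing \(\mathbf{h}=\left(h_{1}, h_{2}\right)\) as an assumed name for the displacement from (1,1)), the remainder here can be found explicitly:

    \[ \begin{aligned}
    R\left(h_{1}, h_{2}\right) &=f\left(1+h_{1}, 1+h_{2}\right)-A\left(1+h_{1}, 1+h_{2}\right) \\
    &=\left(1-4 h_{1}-2 h_{2}-2 h_{1}^{2}-h_{2}^{2}\right)-\left(1-4 h_{1}-2 h_{2}\right) \\
    &=-2 h_{1}^{2}-h_{2}^{2} .
    \end{aligned} \nonumber \]

    Hence \(|R(\mathbf{h})| \leq 2\|\mathbf{h}\|^{2}\), so \(|R(\mathbf{h})| /\|\mathbf{h}\| \leq 2\|\mathbf{h}\|\), which approaches 0 as \(\mathbf{h}\) approaches \(\mathbf{0}\), as the definition requires.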

    Example \(\PageIndex{3}\)

    Suppose \(f: \mathbb{R}^{3} \rightarrow \mathbb{R}\) is defined by

    \[ f(x, y, z)=\sqrt{x^{2}+y^{2}+z^{2}}. \nonumber \]

    Then

    \[ \nabla f(x, y, z)=\frac{1}{\sqrt{x^{2}+y^{2}+z^{2}}}(x, y, z) . \nonumber \]

    Thus, for example, the best affine approximation to \(f\) at (2,1,2) is

    \[ \begin{aligned}
    A(x, y, z) &=\nabla f(2,1,2) \cdot(x-2, y-1, z-2)+f(2,1,2) \\
    &=\frac{1}{3}(2,1,2) \cdot(x-2, y-1, z-2)+3 \\
    &=\frac{2}{3}(x-2)+\frac{1}{3}(y-1)+\frac{2}{3}(z-2)+3 \\
    &=\frac{2}{3} x+\frac{1}{3} y+\frac{2}{3} z.
    \end{aligned} \nonumber \]

    Now suppose we let \(x\), \(y\), and \(z\) be the lengths of the three sides of a solid block, in which case \(f(x,y,z)\) represents the length of the diagonal of the block. Moreover, suppose we measure the sides of the block and find them to have lengths \(x=2+\epsilon_{x}\), \(y=1+\epsilon_{y}\), and \(z=2+\epsilon_{z}\), where \(\left|\epsilon_{x}\right| \leq h\), \(\left|\epsilon_{y}\right| \leq h\), and \(\left|\epsilon_{z}\right| \leq h\) for some positive number \(h\) representing the limit of the accuracy of our measuring device. We now estimate the diagonal of the block to be

    \[ f(2,1,2)=3 \nonumber\]

    with an error of

    \[ \begin{aligned}
    \left|f\left(2+\epsilon_{x}, 1+\epsilon_{y}, 2+\epsilon_{z}\right)-f(2,1,2)\right| & \approx\left|A\left(2+\epsilon_{x}, 1+\epsilon_{y}, 2+\epsilon_{z}\right)-3\right| \\
    &=\left|\frac{2}{3} \epsilon_{x}+\frac{1}{3} \epsilon_{y}+\frac{2}{3} \epsilon_{z}\right| \\
    & \leq \frac{2}{3}\left|\epsilon_{x}\right|+\frac{1}{3}\left|\epsilon_{y}\right|+\frac{2}{3}\left|\epsilon_{z}\right| \\
    & \leq h\left(\frac{2}{3}+\frac{1}{3}+\frac{2}{3}\right) \\
    &=\frac{5}{3} h.
    \end{aligned} \]

    That is, we expect our error in estimating the diagonal of the block to be no more than \(\frac{5}{3}\) times the maximum error in our measurements of the sides of the block. For example, if our length measurements are off by no more than ±0.1 centimeters, then our estimate of the diagonal of the block is off by no more than about ±0.17 centimeters.
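
    To see how the estimate behaves in a specific case (the measurement errors chosen here are hypothetical), take \(\epsilon_{x}=\epsilon_{y}=\epsilon_{z}=0.1\). Then

    \[ f(2.1,1.1,2.1)=\sqrt{4.41+1.21+4.41}=\sqrt{10.03} \approx 3.1670 , \nonumber \]

    so the actual error is about 0.1670, slightly more than \(\frac{5}{3}(0.1) \approx 0.1667\). The small excess is a reminder that \(\frac{5}{3} h\) bounds the error of the affine approximation, and so is only a first-order estimate of the true error.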

    Note that if \(A: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is the best affine approximation to \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) at \(\mathbf{c}=\left(c_{1}, c_{2}, \ldots, c_{n}\right) \), then the graph of \(A\) is the set of all points \(\left(x_{1}, x_{2}, \ldots, x_{n}, z\right)\) in \(\mathbb{R}^{n+1}\) satisfying

    \[ z=\nabla f(\mathbf{c}) \cdot\left(x_{1}-c_{1}, x_{2}-c_{2}, \ldots, x_{n}-c_{n}\right)+f(\mathbf{c}) .\]

    Letting

    \[ \mathbf{n}=\left(\frac{\partial}{\partial x_{1}} f(\mathbf{c}), \frac{\partial}{\partial x_{2}} f(\mathbf{c}), \ldots, \frac{\partial}{\partial x_{n}} f(\mathbf{c}),-1\right) , \]

    we may describe the graph of \(A\) as the set of all points in \(\mathbb{R}^{n+1}\) satisfying

    \[ \mathbf{n} \cdot\left(x_{1}-c_{1}, x_{2}-c_{2}, \ldots, x_{n}-c_{n}, z-f(\mathbf{c})\right)=0 .\]

    Thus the graph of \(A\) is a hyperplane in \(\mathbb{R}^{n+1}\) passing through the point \(\left(c_{1}, c_{2}, \ldots, c_{n}, f(\mathbf{c})\right)\) (a point on the graph of \(f\)) with normal vector \(\mathbf{n}\). 

    Definition \(\PageIndex{3}\)

    If \(A: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is the best affine approximation to \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) at \(\mathbf{c} = \left(c_{1}, c_{2}, \ldots, c_{n}\right) \), then we call the graph of \(A\) the tangent hyperplane to the graph of \(f\) at \(\left(c_{1}, c_{2}, \ldots, c_{n}, f(\mathbf{c})\right)\).

    Example \(\PageIndex{4}\)

    We saw above that the best affine approximation to

    \[ f(x, y)=4-2 x^{2}-y^{2} \nonumber \]

    at (1,1) is

    \[ A(x, y)=7-4 x-2 y . \nonumber \]

    Hence the equation of the tangent plane to the graph of \(f\) at (1,1,1) is

    \[ z=7-4 x-2 y , \nonumber \]

    or

    \[ 4 x+2 y+z=7 . \nonumber \]

    Note that the vector \(\mathbf{n}=(4,2,1)\) is normal to the tangent plane, and hence normal to the graph of \(f\) at (1,1,1). The graph of \(f\) along with the tangent plane at (1,1,1) is shown in Figure 3.3.2.

    Figure 3.3.2 A plane tangent to the graph of \(f(x, y)=4-2 x^{2}-y^{2}\)
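
    This agrees with the general description of the tangent hyperplane given earlier in this section (the following check is our own): with \(\mathbf{c}=(1,1)\), the vector

    \[ \left(\frac{\partial}{\partial x} f(1,1), \frac{\partial}{\partial y} f(1,1),-1\right)=(-4,-2,-1)=-(4,2,1) \nonumber \]

    is a scalar multiple of \(\mathbf{n}=(4,2,1)\), so both vectors are normal to the same plane.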

    The chain rule

    Suppose \(\varphi: \mathbb{R} \rightarrow \mathbb{R}^{n}\) is differentiable at a point \(c\) and \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is differentiable at the point \(\varphi(c)\). Then the composition of \(f\) and \(\varphi\) is a function \(f \circ \varphi: \mathbb{R} \rightarrow \mathbb{R}\). To compute the derivative of \(f \circ \varphi\) at \(c\), we must evaluate

    \[ (f \circ \varphi)^{\prime}(c)=\lim _{h \rightarrow 0} \frac{f \circ \varphi(c+h)-f \circ \varphi(c)}{h}=\lim _{h \rightarrow 0} \frac{f(\varphi(c+h))-f(\varphi(c))}{h} . \]

    Let \(A\) be the best affine approximation to \(f\) at \(\mathbf{a}=\varphi(c)\) and let \(\mathbf{k}=\varphi(c+h)-\varphi(c)\). Then

    \[ f(\varphi(c+h))=f(\mathbf{a}+\mathbf{k})=A(\mathbf{a}+\mathbf{k})+R(\mathbf{k}) , \]

    where \(R(\mathbf{k})\) is \(o(\mathbf{k})\). Now

    \[ A(\mathbf{a}+\mathbf{k})=\nabla f(\mathbf{a}) \cdot \mathbf{k}+f(\mathbf{a}) , \]

    so

    \[\begin{align}
    f(\varphi(c+h))-f(\varphi(c)) &=f(\mathbf{a}+\mathbf{k})-f(\mathbf{a}) \nonumber \\
    &=\nabla f(\mathbf{a}) \cdot \mathbf{k}+R(\mathbf{k}) \nonumber \\
    &=\nabla f(\mathbf{a}) \cdot(\varphi(c+h)-\varphi(c))+R(\mathbf{k}).
    \end{align} \]

    Substituting (3.3.40) into (3.3.37), we have

    \[ \begin{align}
    (f \circ \varphi)^{\prime}(c) &=\lim _{h \rightarrow 0} \frac{\nabla f(\mathbf{a}) \cdot(\varphi(c+h)-\varphi(c))+R(\mathbf{k})}{h} \nonumber \\
    &=\lim _{h \rightarrow 0} \nabla f(\mathbf{a}) \cdot \frac{\varphi(c+h)-\varphi(c)}{h}+\lim _{h \rightarrow 0} \frac{R(\mathbf{k})}{h} \nonumber \\
    &=\nabla f(\mathbf{a}) \cdot D \varphi(c)+\lim _{h \rightarrow 0} \frac{R(\mathbf{k})}{h} .
    \end{align} \]

    Now \(R(\mathbf{k})\) is \(o(\mathbf{k})\), so

    \[ \lim _{\mathbf{k} \rightarrow \mathbf{0}} \frac{R(\mathbf{k})}{\|\mathbf{k}\|}=0 , \nonumber \]

    from which it follows that, for any given \(\epsilon>0\), we have

    \[ \frac{|R(\mathbf{k})|}{\|\mathbf{k}\|}<\epsilon \]

    for sufficiently small \(\mathbf{k} \neq \mathbf{0}\). Since \(R(\mathbf{0})=0\), it follows that

    \[ |R(\mathbf{k})| \leq \epsilon\|\mathbf{k}\| \]

    for all \(\mathbf{k}\) sufficiently small. Moreover, \(\varphi\) is continuous at \(c\), so we may choose \(h\) small enough to guarantee that

    \[ \mathbf{k}=\varphi(c+h)-\varphi(c) \nonumber \]

    is small enough for (3.3.43) to hold. Hence for sufficiently small \(h \neq 0\),

    \[ \frac{|R(\mathbf{k})|}{|h|} \leq \frac{\epsilon\|\mathbf{k}\|}{|h|} . \]

    Now

    \[ \lim _{h \rightarrow 0} \frac{\|\mathbf{k}\|}{|h|}=\lim _{h \rightarrow 0} \frac{\|\varphi(c+h)-\varphi(c)\|}{|h|}=\|D \varphi(c)\| \]

    and the choice of \(\epsilon\) was arbitrary, so it follows that

    \[\lim _{h \rightarrow 0} \frac{R(\mathbf{k})}{h}=0 .\]

    Hence

    \[ (f \circ \varphi)^{\prime}(c)=\nabla f(\mathbf{a}) \cdot D \varphi(c) . \]

    This is a version of the chain rule.

    Theorem \(\PageIndex{4}\)

    Suppose \(\varphi: \mathbb{R} \rightarrow \mathbb{R}^{n}\) is differentiable at \(c\) and \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is differentiable at \(\varphi (c)\). Then

    \[ (f \circ \varphi)^{\prime}(c)=\nabla f(\varphi(c)) \cdot D \varphi(c) . \]

    If we imagine a particle moving along the curve \(C\) parametrized by \(\varphi\), with velocity \(\mathbf{v}(t)\) and unit tangent vector \(T(t)\) at time \(t\), then (3.3.48) says that the rate of change of \(f\) along \(C\) at \(\varphi (c)\) is

    \[ \nabla f(\varphi(c)) \cdot \mathbf{v}(c)=\|\mathbf{v}(c)\| \nabla f(\varphi(c)) \cdot T(c)=\|\mathbf{v}(c)\| D_{T(c)} f(\varphi(c)) . \]

    In other words, the rate of change of \(f\) along \(C\) is the rate of change of \(f\) in the direction of \(T(t)\) multiplied by the speed of the particle moving along the curve.
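
    For a simple illustration (with \(f\) and \(\varphi\) chosen here for convenience, not taken from the text), let \(f(x, y)=x^{2}+y^{2}\) and \(\varphi(t)=(\cos (t), \sin (t))\). Then

    \[ (f \circ \varphi)^{\prime}(t)=\nabla f(\varphi(t)) \cdot D \varphi(t)=(2 \cos (t), 2 \sin (t)) \cdot(-\sin (t), \cos (t))=0 , \nonumber \]

    as expected, since \(f(\varphi(t))=\cos ^{2}(t)+\sin ^{2}(t)=1\) for all \(t\).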

    Example \(\PageIndex{5}\)

    Suppose that the temperature at a point \((x,y,z)\) inside a cubical region of space is given by

    \[ T(x, y, z)=80-20 x e^{-\frac{1}{20}\left(x^{2}+y^{2}+z^{2}\right)} . \nonumber \]

    Moreover, suppose a bug flies through this region along the elliptical helix parametrized by 

    \[ \varphi(t)=(\cos (\pi t), 2 \sin (\pi t), t) . \nonumber \]

    Then

    \[ \nabla T(x, y, z)=e^{-\frac{1}{20}\left(x^{2}+y^{2}+z^{2}\right)}\left(2 x^{2}-20,2 x y, 2 x z\right) \nonumber \]

    and

    \[ D \varphi(t)=(-\pi \sin (\pi t), 2 \pi \cos (\pi t), 1) . \nonumber \]

    Hence, for example, if we want to know the rate of change of temperature for the bug at \(t=\frac{1}{3}\), we would evaluate

    \[ D \varphi\left(\frac{1}{3}\right)=\left(-\frac{\sqrt{3} \pi}{2}, \pi, 1\right) \nonumber \]

    and

    \[ \nabla T\left(\varphi\left(\frac{1}{3}\right)\right)=\nabla T\left(\frac{1}{2}, \sqrt{3}, \frac{1}{3}\right)=e^{-\frac{121}{720}}\left(-\frac{39}{2}, \sqrt{3}, \frac{1}{3}\right),  \nonumber \]

    so

    \[ \begin{aligned}
    (T \circ \varphi)^{\prime}\left(\frac{1}{3}\right) &=e^{-\frac{121}{720}}\left(-\frac{39}{2}, \sqrt{3}, \frac{1}{3}\right) \cdot\left(-\frac{\sqrt{3} \pi}{2}, \pi, 1\right) \\
    &=e^{-\frac{121}{720}}\left(\frac{39 \pi \sqrt{3}}{4}+\sqrt{3} \pi+\frac{1}{3}\right) \\
    &\approx 49.73,
    \end{aligned} \nonumber \]

    where the final value has been rounded to two decimal places. Hence at that moment the temperature for the bug is increasing at a rate of 49.73° per second. We could also express this as

    \[ \left.\frac{d T}{d t}\right|_{t=\frac{1}{3}}=49.73^{\circ} . \nonumber \]

    For an alternative formulation of the chain rule, suppose \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) and \(x_{i}: \mathbb{R} \rightarrow \mathbb{R}\), \(i=1,2, \ldots, n\), are all differentiable and let \(w=f\left(x_{1}, x_{2}, \ldots, x_{n}\right)\). If \(x_{1}, x_{2}, \ldots, x_{n}\) are all functions of \(t\), then, by the chain rule,

    \[ \begin{align}
    \frac{d w}{d t} &=\left(\frac{\partial w}{\partial x_{1}}, \frac{\partial w}{\partial x_{2}}, \ldots, \frac{\partial w}{\partial x_{n}}\right) \cdot\left(\frac{d x_{1}}{d t}, \frac{d x_{2}}{d t}, \ldots, \frac{d x_{n}}{d t}\right) \nonumber \\
    &=\frac{\partial w}{\partial x_{1}} \frac{d x_{1}}{d t}+\frac{\partial w}{\partial x_{2}} \frac{d x_{2}}{d t}+\cdots+\frac{\partial w}{\partial x_{n}} \frac{d x_{n}}{d t} .
    \end{align} \]

    Example \(\PageIndex{6}\)

    Suppose the dimensions of a box are increasing so that its length, width, and height at time \(t\) are, in centimeters,

    \[ \begin{aligned}
    &x=3 t, \\
    &y=t^{2},
    \end{aligned} \]

    and

    \[ z=t^{3}, \nonumber \]

    respectively. Since the volume of the box is

    \[ V=x y z , \nonumber \]

    the rate of change of the volume is

    \[ \frac{d V}{d t}=\frac{\partial V}{\partial x} \frac{d x}{d t}+\frac{\partial V}{\partial y} \frac{d y}{d t}+\frac{\partial V}{\partial z} \frac{d z}{d t}=3 y z+2 x z t+3 x y t^{2} . \nonumber \]

    Hence, for example, at \(t = 2\) we have \(x = 6\), \(y = 4\), and \(z = 8\), so

    \[ \left.\frac{d V}{d t}\right|_{t=2}=96+192+288=576 \mathrm{~cm}^{3} / \mathrm{sec} . \nonumber \]
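
    As a check, and assuming nothing beyond the formulas above, we may substitute first and then differentiate: \(V=(3 t)\left(t^{2}\right)\left(t^{3}\right)=3 t^{6}\), so

    \[ \left.\frac{d V}{d t}\right|_{t=2}=\left.18 t^{5}\right|_{t=2}=18 \cdot 32=576 \mathrm{~cm}^{3} / \mathrm{sec} , \nonumber \]

    in agreement with the chain rule computation.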

    The gradient and level sets

    Now consider a differentiable function \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) and a point \(\mathbf{a}\) on the level set \(S\) specified by \(f(\mathbf{x})=c\) for some scalar \(c\). Suppose \(\varphi: \mathbb{R} \rightarrow \mathbb{R}^{n}\) is a smooth parametrization of a curve \(C\) which lies entirely on \(S\) and passes through \(\mathbf{a}\). Let \(\varphi(b)=\mathbf{a}\). Then the composition of \(f\) and \(\varphi\) is a constant function; that is,

    \[ g(t)=f \circ \varphi(t)=f(\varphi(t))=c \]

    for all values of \(t\). Thus, using the chain rule,

    \[ 0=g^{\prime}(b)=\nabla f(\varphi(b)) \cdot D \varphi(b)=\nabla f(\mathbf{a}) \cdot D \varphi(b) . \]

    Hence

    \[ \nabla f(\mathbf{a}) \perp D \varphi(b) . \]

    Now \(D \varphi(b)\) is tangent to \(C\) at \(\mathbf{a}\); moreover, since (3.3.53) holds for any curve in \(S\) passing through \(\mathbf{a}\), \(\nabla f(\mathbf{a})\) is orthogonal to every vector tangent to \(S\). In other words, \(\nabla f(\mathbf{a})\) is normal to the hyperplane tangent to \(S\) at \(\mathbf{a}\). Thus we have the following theorem.

    Theorem \(\PageIndex{5}\)

    Suppose \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is differentiable on an open ball containing the point \(\mathbf{a}\), and let \(S\) be the set of all points in \(\mathbb{R}^n\) such that \(f(\mathbf{x})=f(\mathbf{a})\). If \(\nabla f(\mathbf{a}) \neq \mathbf{0}\), then the hyperplane with equation

    \[ \nabla f(\mathbf{a}) \cdot(\mathbf{x}-\mathbf{a})=0 \]

    is tangent to \(S\) at \(\mathbf{a}\). 

    For \(n = 2\), the hyperplane described by (3.3.54) will be a tangent line to a curve; for \(n = 3\), it will be a tangent plane to a surface.
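
    For instance, in the case \(n=2\) (the curve and point here are our own illustrative choices), the circle \(x^{2}+y^{2}=25\) is a level set of \(f(x, y)=x^{2}+y^{2}\), and \(\nabla f(3,4)=(6,8)\). The theorem gives the tangent line at \((3,4)\) as

    \[ (6,8) \cdot(x-3, y-4)=0 , \nonumber \]

    or, after simplifying, \(3 x+4 y=25\).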

    Example \(\PageIndex{7}\)

    The set \(S\) of all points in \(\mathbb{R}^3\) satisfying

    \[ x^{2}+y^{2}+z^{2}=9 \nonumber \]

    is a sphere with radius 3 centered at the origin. We will find an equation for the plane tangent to \(S\) at (2,−1,2). First note that \(S\) is a level surface for the function

    \[ f(x, y, z)=x^{2}+y^{2}+z^{2} . \nonumber \]

    Now

    \[ \nabla f(x, y, z)=(2 x, 2 y, 2 z) , \nonumber \]

    so

    \[ \nabla f(2,-1,2)=(4,-2,4) . \nonumber \]

    Thus an equation for the tangent plane is

    \[ (4,-2,4) \cdot(x-2, y+1, z-2)=0 , \nonumber \]

    or

    \[ 4 x-2 y+4 z=18 . \nonumber \]

    See Figure 3.3.3.

    Figure 3.3.3 Sphere with tangent plane
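
    As a geometric check (a remark of our own), the normal vector found in this example satisfies

    \[ \nabla f(2,-1,2)=(4,-2,4)=2(2,-1,2) , \nonumber \]

    so it is parallel to the radius from the origin to \((2,-1,2)\), consistent with the familiar fact that a plane tangent to a sphere is perpendicular to the radius at the point of tangency.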