Skip to main content
Mathematics LibreTexts

4.2: Best Affine Approximations

  • Page ID
    22940
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Best affine approximations

    The following definitions should look very familiar.

    Definition \(\PageIndex{1}\)

    Suppose \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) is defined on an open ball containing the point \(\mathbf{c}\). We call an affine function \(A: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) the best affine approximation to \(f\) at \(\mathbf{c}\) if (1) \(A(\mathbf{c})=f(\mathbf{c})\) and (2) \(\|R(\mathbf{h})\|\) is \(o(\mathbf{h})\), where

    \[ R(\mathbf{h})=f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h}) . \]

    Suppose \(A: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}\) is the best affine approximation to \(f\) at \(\mathbf{c}\). Then, from our work in Section 1.5, there exists an \(n \times m\) matrix \(M\) and a vector \(\mathbf{b}\) in \(\mathbb{R}^n\) such that

    \[ A(\mathbf{x})=M \mathbf{x}+\mathbf{b} \]

    for all \(\mathbf{x}\) in \(\mathbb{R}^m\). Moreover, the condition \(A(\mathbf{c})=f(\mathbf{c})\) implies \(f(\mathbf{c})=M \mathbf{c}+\mathbf{b}\), and so \(\mathbf{b}=f(\mathbf{c})-M \mathbf{c}\). Hence we have

    \[ A(\mathbf{x})=M \mathbf{x}+f(\mathbf{c})-M \mathbf{c}=M(\mathbf{x}-\mathbf{c})+f(\mathbf{c}) \label{4.2.3} \]

    for all \(\mathbf{x}\) in \(\mathbb{R}^m\). Thus to find the best affine approximation we need only identify the matrix \(M\) in (\(\ref{4.2.3}\)).

    Definition \(\PageIndex{2}\)

    Suppose \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) is defined on an open ball containing the point \(\mathbf{c}\). If \(f\) has a best affine approximation at \(\mathbf{c}\), then we say \(f\) is differentiable at \(\mathbf{c}\). Moreover, if the best affine approximation to \(f\) at \(\mathbf{c}\) is given by

    \[ A(\mathbf{x})=M(\mathbf{x}-\mathbf{c})+f(\mathbf{c}) , \]

    then we call \(M\) the derivative of \(f\) at \(\mathbf{c}\) and write \(D f(\mathbf{c})=M\).

    Now suppose \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) and \(A\) is an affine function with \(A(\mathbf{c})=f(\mathbf{c})\). Let \(f_k\) and \(A_k\) be the \(k\)th coordinate functions of \(f\) and \(A\), respectively, for \(k=1,2, \ldots, n\), and let \(R\) be the remainder function

    \[ \begin{aligned}
    R(\mathbf{h}) &=f(\mathbf{c}+\mathbf{h})-A(\mathbf{c}+\mathbf{h}) \\
    &=\left(f_{1}(\mathbf{c}+\mathbf{h})-A_{1}(\mathbf{c}+\mathbf{h}), f_{2}(\mathbf{c}+\mathbf{h})-A_{2}(\mathbf{c}+\mathbf{h}), \ldots, f_{n}(\mathbf{c}+\mathbf{h})-A_{n}(\mathbf{c}+\mathbf{h})\right).
    \end{aligned} \]

    Then

    \[ \frac{R(\mathbf{h})}{\|\mathbf{h}\|}=\left(\frac{f_{1}(\mathbf{c}+\mathbf{h})-A_{1}(\mathbf{c}+\mathbf{h})}{\|\mathbf{h}\|}, \frac{f_{2}(\mathbf{c}+\mathbf{h})-A_{2}(\mathbf{c}+\mathbf{h})}{\|\mathbf{h}\|}, \ldots, \frac{f_{n}(\mathbf{c}+\mathbf{h})-A_{n}(\mathbf{c}+\mathbf{h})}{\|\mathbf{h}\|}\right) , \nonumber \]

    and so

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}} \frac{\| R(\mathbf{h} \|}{\|\mathbf{h}\|}=0 , \]

    that is, \(A\) is the best affine approximation to \(f\) at \(\mathbf{c}\), if and only if

    \[ \lim _{\mathbf{h} \rightarrow \mathbf{0}} \frac{f_{k}(\mathbf{c}+\mathbf{h})-A_{k}(\mathbf{c}+\mathbf{h})}{\|\mathbf{h}\|}=0 \label{4.2.6} \]

    for \(k=1,2, \ldots, n\). But (\(\ref{4.2.6}\)) is the statement that \(A_k\) is the best affine approximation to \(f_k\) at \(\mathbf{c}\). In other words, \(A\) is the best affine approximation to \(f\) at \(\mathbf{c}\) if and only if \(A_k\) is the best affine approximation to \(f_k\) at \(\mathbf{c}\) for \(k=1,2, \ldots, n\). This result has many interesting consequences.

    Proposition \(\PageIndex{1}\)

    If \(f_{k}: \mathbb{R}^{m} \rightarrow \mathbb{R}\) is the \(k\)th coordinate function of \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\), then \(f\) is differentiable at a point \(\mathbf{c}\) if and only if \(f_k\) is differentiable at \(\mathbf{c}\) for \(k=1,2, \ldots, n\).

    Definition \(\PageIndex{3}\)

    If \(f_{k}: \mathbb{R}^{m} \rightarrow \mathbb{R}\) is the \(k\)th coordinate function of \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\), then we say \(f\) is \(C^1\) on an open set \(U\) if \(f_k\) is \(C^1\) on \(U\) for \(k=1,2, \ldots, n\).

    Putting our results in Section 3.3 together with the previous proposition and definition, we have the following basic result.

    Theorem \(\PageIndex{1}\)

    If \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) is \(C^1\) on an open ball containing the point \(\mathbf{c}\), then \(f\) is differentiable at \(\mathbf{c}\).

    Suppose \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) is differentiable at \(\mathbf{c}=\left(c_{1}, c_{2}, \ldots, c_{m}\right)\) with best affine approximation \(A\) and \(f_{k}: \mathbb{R}^{m} \rightarrow \mathbb{R}\) and \(A_{k}: \mathbb{R}^{m} \rightarrow \mathbb{R}\) are the coordinate functions of \(f\) and \(A\), respectively, for \(k=1,2, \ldots, n\). Since \(A_k\) is the best affine approximation to \(f_k\) at \(\mathbf{c}\), we know from Section 3.3 that

    \[ A_{k}(\mathbf{x})=\nabla f_{k}(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+f_{k}(\mathbf{c}) \]

    for all \(\mathbf{x}\) in \(\mathbb{R}^m\). Hence, writing the vectors as column vectors, we have

    \[ \begin{align}
    A(\mathbf{x})=&\left[\begin{array}{c}
    A_{1}(\mathbf{x}) \\
    A_{2}(\mathbf{x}) \\
    \vdots \\
    A_{n}(\mathbf{x})
    \end{array}\right] \nonumber \\
    =&\left[\begin{array}{cccc}
    \nabla f_{1}(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+f_{1}(\mathbf{c}) \\
    \nabla f_{2}(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+f_{2}(\mathbf{c}) \\
    \vdots \\
    \nabla f_{n}(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+f_{n}(\mathbf{c})
    \end{array}\right] \nonumber \\
    =&\left[\begin{array}{cccc}
    \frac{\partial}{\partial x_{1}} f_{1}(\mathbf{c}) & \frac{\partial}{\partial x_{2}} f_{1}(\mathbf{c}) & \cdots & \frac{\partial}{x_{m}} f_{1}(\mathbf{c}) \\
    \frac{\partial}{\partial x_{1}} f_{2}(\mathbf{c}) & \frac{\partial}{\partial x_{2}} f_{2}(\mathbf{c}) & \cdots & \frac{\partial}{x_{m}} f_{2}(\mathbf{c}) \\
    \vdots & \vdots & \ddots & \vdots \\
    \frac{\partial}{\partial x_{1}} f_{n}(\mathbf{c}) & \frac{\partial}{\partial x_{2}} f_{n}(\mathbf{c}) & \cdots & \frac{\partial}{x_{m}} f_{n}(\mathbf{c})
    \end{array}\right]\left[\begin{array}{c}
    x_{1}-c_{1} \\
    x_{2}-c_{2} \\
    \vdots \\
    x_{m}-c_{m}
    \end{array}\right]+\left[\begin{array}{c}
    f_{1}(\mathbf{c}) \\
    f_{2}(\mathbf{c}) \\
    \vdots \\
    f_{m}(\mathbf{c})
    \end{array}\right] . \label{4.2.8}
    \end{align} \]

    It follows that the \(n \times m \) matrix in (\(\ref{4.2.8}\)) is the derivative of \(f\).

    Theorem \(\PageIndex{2}\)

    If \(f: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) is differentiable at a point \(\mathbf{c}\), then the derivative of \(f\) at \(\mathbf{c}\) is given by

    \[ D f(\mathbf{c})=\left[\begin{array}{cccc}
    \frac{\partial}{\partial x_{1}} f_{1}(\mathbf{c}) & \frac{\partial}{\partial x_{2}} f_{1}(\mathbf{c}) & \cdots & \frac{\partial}{x_{m}} f_{1}(\mathbf{c}) \\
    \frac{\partial}{\partial x_{1}} f_{2}(\mathbf{c}) & \frac{\partial}{\partial x_{2}} f_{2}(\mathbf{c}) & \cdots & \frac{\partial}{x_{m}} f_{2}(\mathbf{c}) \\
    \vdots & \vdots & \ddots & \vdots \\
    \frac{\partial}{\partial x_{1}} f_{n}(\mathbf{c}) & \frac{\partial}{\partial x_{2}} f_{n}(\mathbf{c}) & \cdots & \frac{\partial}{x_{m}} f_{n}(\mathbf{c})
    \end{array}\right] . \label{4.2.9} \]

    We call the matrix in (\(\ref{4.2.9}\)) the Jacobian matrix of \(f\), after the German mathematician Carl Gustav Jacob Jacobi (1804-1851). Note that we have seen this matrix before in our discussion of change of variables in integrals in Section 3.7.

    Example \(\PageIndex{1}\)

    Consider the function \(f: \mathbb{R}^{3} \rightarrow \mathbb{R}^{2}\) defined by

    \[ f(x, y, z)=(x y z, 3 x-2 y z) . \nonumber \]

    The coordinate functions of \(f\) are

    \[ f_{1}(x, y, z)=x y z \nonumber \]

    and

    \[ f_{2}(x, y, z)=3 x-2 y z \text {. } \nonumber \]

    Now

    \[ \nabla f_{1}(x, y, z)=(y z, x z, x y) \nonumber \]

    and

    \[ \nabla f_{2}(x, y, z)=(3,-2 z,-2 y) , \nonumber \]

    so the Jacobian of \(f\) is

    \[ D f(x, y, z)=\left[\begin{array}{ccc}
    y z & x z & x y \\
    3 & -2 z & -2 y
    \end{array}\right] . \nonumber \]

    Hence, for example,

    \[ D f(1,2,-1)=\left[\begin{array}{rrr}
    -2 & -1 & 2 \\
    3 & 2 & -4
    \end{array}\right] . \nonumber \]

    Since \(f(1,2,-1)=(-2,7)\), the best affine approximation to \(f\) at (1,2,−1) is

    \[ \begin{aligned}
    A(x, y, z) &=\left[\begin{array}{rrr}
    -2 & -1 & 2 \\
    3 & 2 & -4
    \end{array}\right]\left[\begin{array}{l}
    x-1 \\
    y-2 \\
    z+1
    \end{array}\right]+\left[\begin{array}{r}
    -2 \\
    7
    \end{array}\right] \\
    &=\left[\begin{array}{c}
    -2(x-1)-(y-2)+2(z+1)-2 \\
    3(x-1)+2(y-2)-4(z+1)+7
    \end{array}\right] \\
    &=\left[\begin{array}{c}
    -2 x-y+2 z+4 \\
    3 x+2 y-4 z-4
    \end{array}\right] .
    \end{aligned} \]

    Tangent planes

    Suppose \(f: \mathbb{R}^{2} \rightarrow \mathbb{R}^{3}\) parametrizes a surface \(S\) in \(\mathbb{R}^3\). If \(f_1\), \(f_2\), and \(f_3\) are the coordinate functions of \(f\), then the best affine approximation to \(f\) at a point \((s_0,t_0)\) is given by

    \[ \begin{align}
    &A(s, t)=\left[\begin{array}{ll}
    \frac{\partial}{\partial s} f_{1}\left(t_{0}, s_{0}\right) & \frac{\partial}{\partial t} f_{1}\left(t_{0}, s_{0}\right) \\
    \frac{\partial}{\partial s} f_{2}\left(t_{0}, s_{0}\right) & \frac{\partial}{\partial t} f_{2}\left(t_{0}, s_{0}\right) \\
    \frac{\partial}{\partial s} f_{3}\left(t_{0}, s_{0}\right) & \frac{\partial}{\partial t} f_{3}\left(t_{0}, s_{0}\right)
    \end{array}\right]\left[\begin{array}{c}
    s-s_{0} \\
    t-t_{0}
    \end{array}\right]+\left[\begin{array}{l}
    f_{1}\left(s_{0}, t_{0}\right) \\
    f_{2}\left(s_{0}, t_{0}\right) \\
    f_{3}\left(s_{0}, t_{0}\right)
    \end{array}\right] \nonumber \\
    &=\left[\begin{array}{l}
    \frac{\partial}{\partial s} f_{1}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial s} f_{2}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial s} f_{3}\left(s_{0}, t_{0}\right)
    \end{array}\right]\left(s-s_{0}\right)+\left[\begin{array}{l}
    \frac{\partial}{\partial t} f_{1}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial t} f_{2}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial t} f_{3}\left(s_{0}, t_{0}\right)
    \end{array}\right]\left(t-t_{0}\right)+\left[\begin{array}{l}
    f_{1}\left(s_{0}, t_{0}\right) \\
    f_{2}\left(s_{0}, t_{0}\right) \\
    f_{3}\left(s_{0}, t_{0}\right)
    \end{array}\right] \label{4.2.10}
    \end{align} \]

    If the vectors

    \[ \mathbf{v}=\left[\begin{array}{l}
    \frac{\partial}{\partial s} f_{1}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial s} f_{2}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial s} f_{3}\left(s_{0}, t_{0}\right)
    \end{array}\right] \label{4.2.11} \]

    and

    \[ \mathbf{w}=\left[\begin{array}{l}
    \frac{\partial}{\partial t} f_{1}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial t} f_{2}\left(s_{0}, t_{0}\right) \\
    \frac{\partial}{\partial t} f_{3}\left(s_{0}, t_{0}\right)
    \end{array}\right] \label{4.2.12} \]

    are linearly independent, then (\(\ref{4.2.10}\)) implies that the image of \(A\) is a plane in \(\mathbb{R}^3\) which passes through the point \(f\left(s_{0}, t_{0}\right)\) on the surface \(S\). Moreover, if we let \(C_1\) be the curve on \(S\) through the point \(f\left(s_{0}, t_{0}\right)\) parametrized by \(\varphi_{1}(s)=f\left(s, t_{0}\right)\) and \(C_2\) be the curve on \(S\) through the point \(f\left(s_{0}, t_{0}\right)\) parametrized by \(\varphi_{2}(t)=f\left(s_{0}, t\right)\), then \(\mathbf{v}\) is tangent to \(C_1\) at \(f\left(s_{0}, t_{0}\right)\) and \(\mathbf{w}\) is tangent to \(C_2\) at \(f\left(s_{0}, t_{0}\right)\). Hence we call the image of \(A\) the tangent plane to the surface \(S\) at the point \(f\left(s_{0}, t_{0}\right)\).

    Example \(\PageIndex{2}\)

    Let \(T\) be the torus parametrized by

    \[ f(s, t)=((3+\cos (t)) \cos (s),(3+\cos (t)) \sin (s), \sin (t)) \nonumber \]

    for \(0 \leq s \leq 2 \pi\) and \(0 \leq t \leq 2 \pi\). Then

    \[ D f(s, t)=\left[\begin{array}{cc}
    -(3+\cos (t)) \sin (s) & -\sin (t) \cos (s) \\
    (3+\cos (t)) \cos (s) & -\sin (t) \sin (s) \\
    0 & \cos (t)
    \end{array}\right] . \nonumber \]

    Thus, for example,

    \[ \operatorname{Df}\left(\frac{\pi}{2}, \frac{\pi}{4}\right)=\left[\begin{array}{cc}
    -\left(3+\frac{1}{\sqrt{2}}\right) & 0 \\
    0 & -\frac{1}{\sqrt{2}} \\
    0 & \frac{1}{\sqrt{2}}
    \end{array}\right] . \nonumber \]

    Since

    \[ f\left(\frac{\pi}{2}, \frac{\pi}{4}\right)=\left(0,3+\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \text {, } \nonumber \]

    the best affine approximation to \(f\) at \(\left(\frac{\pi}{2}, \frac{\pi}{4}\right)\) is

    \[ A(s, t)=\left[\begin{array}{cc}
    -\left(3+\frac{1}{\sqrt{2}}\right) & 0 \\
    0 & -\frac{1}{\sqrt{2}} \\
    0 & \frac{1}{\sqrt{2}}
    \end{array}\right]\left[\begin{array}{c}
    s-\frac{\pi}{2} \\
    t-\frac{\pi}{4}
    \end{array}\right]+\left[\begin{array}{c}
    0 \\
    3+\frac{1}{\sqrt{2}} \\
    \frac{1}{\sqrt{2}}
    \end{array}\right] \nonumber \]

    \[ =\left[\begin{array}{c}
    -\left(3+\frac{1}{\sqrt{2}}\right) \\
    0 \\
    0
    \end{array}\right]\left(s-\frac{\pi}{2}\right)+\left[\begin{array}{c}
    0 \\
    -\frac{1}{\sqrt{2}} \\
    \frac{1}{\sqrt{2}}
    \end{array}\right]\left(t-\frac{\pi}{4}\right)+\left[\begin{array}{c}
    0 \\
    3+\frac{1}{\sqrt{2}} \\
    \frac{1}{\sqrt{2}}
    \end{array}\right] . \nonumber \]

    Hence

    \[ \begin{aligned}
    &x=-\left(3+\frac{1}{\sqrt{2}}\right)\left(s-\frac{\pi}{2}\right), \\
    &y=-\frac{1}{\sqrt{2}}\left(t-\frac{\pi}{4}\right)+3+\frac{1}{\sqrt{2}}, \\
    &z=\frac{1}{\sqrt{2}}\left(t-\frac{\pi}{4}\right)+\frac{1}{\sqrt{2}},
    \end{aligned} \]

    are parametric equations for the plane \(P\) tangent to \(T\) at \(\left(0,3+\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)\). See Figure 4.2.1.

    Screen Shot 2021-08-18 at 08.36.37.png
    Figure \(\PageIndex{1}\): Torus with a tangent plane

    Chain rule

    We are now in a position to state the chain rule in its most general form. Consider functions \(g: \mathbb{R}^{m} \rightarrow \mathbb{R}^{q}\) and \(f: \mathbb{R}^{q} \rightarrow \mathbb{R}^{n}\) and suppose \(g\) is differentiable at \(\mathbf{c}\) and \(f\) is differentiable at \(g(\mathbf{c})\). Let \(h: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\) be the composition \(h(\mathbf{x})=f(g(\mathbf{x}))\) and denote the coordinate functions of \(f\), \(g\), and \(h\) by \(f_{i}, i=1,2, \ldots, n, g_{j}, j=1,2 \ldots, q\), and \(h_{k}, k=1,2, \ldots, n\), respectively. Then, for \(k=1,2, \ldots, n\),

    \[ h_{k}\left(x_{1}, x_{2}, \ldots, x_{m}\right)=f_{k}\left(g_{1}\left(x_{1}, x_{2}, \ldots, x_{m}\right), g_{2}\left(x_{1}, x_{2}, \ldots, x_{m}\right), \ldots, g_{q}\left(x_{1}, x_{2}, \ldots, x_{m}\right)\right) . \nonumber \]

    Now if we fix \(m-1\) of the variables \(x_{1}, x_{2}, \ldots, x_{m}\), say, all but \(x_j\), then \(h_k\) is the composition of a function from \(\mathbb{R}\) to \(\mathbb{R}^q\) with a function from \(\mathbb{R}^q\) to \(\mathbb{R}\). Thus we may use the chain rule from Section 3.3 to compute \(\frac{\partial}{\partial x_{j}} h_{k}(\mathbf{c})\), namely,

    \[ \begin{gather}
    \frac{\partial}{\partial x_{j}} h_{k}(\mathbf{c})=\nabla f_{k}(g(\mathbf{c})) \cdot\left(\frac{\partial}{\partial x_{j}} g_{1}(\mathbf{c}), \frac{\partial}{\partial x_{j}} g_{2}(\mathbf{c}), \ldots, \frac{\partial}{x_{j}} g_{q}(\mathbf{c})\right) \nonumber \\
    =\frac{\partial}{\partial x_{1}} f_{k}(g(\mathbf{c})) \frac{\partial}{\partial x_{j}} g_{1}(\mathbf{c})+\frac{\partial}{\partial x_{2}} f_{k}(g(\mathbf{c})) \frac{\partial}{\partial x_{j}} g_{2}(\mathbf{c})+ \label{4.2.13} \\
    \cdots+\frac{\partial}{\partial x_{q}} f_{k}(g(\mathbf{c})) \frac{\partial}{\partial x_{j}} g_{q}(\mathbf{c}). \nonumber
    \end{gather} \]

    Hence \(\frac{\partial}{\partial x_{j}} h_{k}(\mathbf{c})\) is equal to the dot product of the \(k\)th row of \(D f(g(\mathbf{c}))\) with the \(j\)th column of \(D g(\mathbf{c})\). Moreover, if \(g\) is \(C^1\) on an open ball about \(\mathbf{c}\) and \(f\) is \(C^1\) on an open ball about \(g(\mathbf{c})\), then (\(\ref{4.2.13}\)) shows that \(\frac{\partial}{\partial x_{j}} h_{k}\) is continuous on an open ball about \(\mathbf{c}\). It follows from our results in Section 3.3 that \(h\) is differentiable at \(\mathbf{c}\). Since \(\frac{\partial}{\partial x_{j}} h_{k}\) is the entry in the \(k\)th row and \(j\)th column of \(\operatorname{Dh}(\mathbf{c})\), (\(\ref{4.2.13}\)) implies \(\operatorname{Dh}(\mathbf{c})=D f(g(\mathbf{c})) D g(\mathbf{c})\). This result, the chain rule, may be proven without assuming that \(f\) and \(g\) are both \(C^1\), and so we state the more general result in the following theorem.

    Theorem: Chain Rule \(\PageIndex{3}\)

    If \(g: \mathbb{R}^{m} \rightarrow \mathbb{R}^{q}\) is differentiable at \(\mathbf{c}\) and \(f: \mathbb{R}^{q} \rightarrow \mathbb{R}^{n}\) is differentiable at \(g(\mathbf{c})\), then \(f \circ g\) is differentiable at \(\mathbf{c}\) and

    \[ D(f \circ g)(\mathbf{c})=D f(g(\mathbf{c})) D g(\mathbf{c}) . \]

    Equivalently, the chain rule says that if \(A\) is the best affine approximation to \(g\) at \(\mathbf{c}\) and \(B\) is the best affine approximation to \(f\) at \(g(\mathbf{c})\), then \(B \circ A\) is the best affine approximation to \(f \circ g\) at \(\mathbf{c}\). That is, the best affine approximation to a composition of functions is the composition of the individual best affine approximations.

    Example \(\PageIndex{3}\)

    Suppose \(g: \mathbb{R}^{2} \rightarrow \mathbb{R}^{3}\) is defined by

    \[ g(s, t)=(\cos (s) \sin (t), \sin (s) \sin (t), \cos (t)) \nonumber \]

    and \(f: \mathbb{R}^{3} \rightarrow \mathbb{R}^{2}\) is defined by

    \[ f(x, y, z)=\left(10 x y z, x^{2}-y z\right) . \nonumber \]

    Then

    \[ D g(s, t)=\left[\begin{array}{cc}
    -\sin (s) \sin (t) & \cos (s) \cos (t) \\
    \cos (s) \sin (t) & \sin (s) \cos (t) \\
    0 & -\sin (t)
    \end{array}\right] \nonumber \]

    and

    \[ D f(x, y, z)=\left[\begin{array}{ccc}
    10 y z & 10 x z & 10 x y \\
    2 x & -z & -y
    \end{array}\right] . \nonumber \]

    Let \(h(s, t)=f(g(s, t))\). To find \(D h\left(\frac{\pi}{4}, \frac{\pi}{4}\right)\), we first note that

    \[ g\left(\frac{\pi}{4}, \frac{\pi}{4}\right)=\left(\frac{1}{2}, \frac{1}{2}, \frac{1}{\sqrt{2}}\right) , \nonumber \]

    \[ D g\left(\frac{\pi}{4}, \frac{\pi}{4}\right)=\left[\begin{array}{rr}
    -\frac{1}{2} & \frac{1}{2} \\
    \frac{1}{2} & \frac{1}{2} \\
    0 & -\frac{1}{\sqrt{2}}
    \end{array}\right] \nonumber \]

    and

    \[ D f\left(g\left(\frac{\pi}{4}, \frac{\pi}{4}\right)\right)=D f\left(\frac{1}{2}, \frac{1}{2}, \frac{1}{\sqrt{2}}\right)=\left[\begin{array}{ccc}
    \frac{5}{\sqrt{2}} & \frac{5}{\sqrt{2}} & \frac{5}{2} \\
    1 & -\frac{1}{\sqrt{2}} & -\frac{1}{2}
    \end{array}\right] . \nonumber \]

    Thus

    \[ \begin{aligned}
    \operatorname{Dh}\left(\frac{\pi}{4}, \frac{\pi}{4}\right) &=D f\left(g\left(\frac{\pi}{4}, \frac{\pi}{4}\right)\right) D g\left(\frac{\pi}{4}, \frac{\pi}{4}\right) \\
    &=\left[\begin{array}{ccc}
    \frac{5}{\sqrt{2}} & \frac{5}{\sqrt{2}} & \frac{5}{2} \\
    1 & -\frac{1}{\sqrt{2}} & -\frac{1}{2}
    \end{array}\right]\left[\begin{array}{rr}
    -\frac{1}{2} & \frac{1}{2} \\
    \frac{1}{2} & \frac{1}{2} \\
    0 & -\frac{1}{\sqrt{2}}
    \end{array}\right] \\
    &=\left[\begin{array}{cc}
    0 & \frac{5}{2 \sqrt{2}} \\
    -\frac{1+\sqrt{2}}{2 \sqrt{2}} & \frac{1}{2}
    \end{array}\right] .
    \end{aligned} \]


    This page titled 4.2: Best Affine Approximations is shared under a CC BY-NC-SA 1.0 license and was authored, remixed, and/or curated by Dan Sloughter via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.