14.4: The Chain Rule
Consider the surface \(z=x^2y+xy^2\), and suppose that \(x=2+t^4\) and \(y=1-t^3\). We can think of the latter two equations as describing how \(x\) and \(y\) change relative to, say, time. Then
\[z=x^2y+xy^2=(2+t^4)^2(1-t^3)+(2+t^4)(1-t^3)^2\]
tells us explicitly how the \(z\) coordinate of the corresponding point on the surface depends on \(t\). If we want to know \(dz/dt\) we can compute it more or less directly from this formula, but it's actually a bit simpler to use the chain rule:
\[\eqalign{ {dz\over dt}&=x^2y'+2xx'y+2xyy'+x'y^2\cr &=(2xy+y^2)x'+(x^2+2xy)y'\cr &=(2(2+t^4)(1-t^3)+(1-t^3)^2)(4t^3)+((2+t^4)^2+2(2+t^4)(1-t^3))(-3t^2).\cr }\]
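As a quick sanity check on the last line (the value \(t=1\) is chosen here only for illustration), note that at \(t=1\) we have \(x=3\), \(y=0\), \(x'=4\), and \(y'=-3\), so
\[{dz\over dt}\bigg|_{t=1}=(2\cdot3\cdot0+0^2)(4)+(3^2+2\cdot3\cdot0)(-3)=-27.\]
Differentiating the explicit formula for \(z\) in terms of \(t\) gives the same value: every term that contains a factor of \(1-t^3\) vanishes at \(t=1\), leaving only \((2+t^4)^2(-3t^2)=9\cdot(-3)=-27\).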
If we look carefully at the middle step, \(dz/dt=(2xy+y^2)x'+(x^2+2xy)y'\), we notice that \(2xy+y^2\) is \(\partial z/\partial x\), and \(x^2+2xy\) is \(\partial z/\partial y\). This turns out to be true in general, and gives us a new chain rule:
Theorem 14.4.1
Suppose that \(z=f(x,y)\), \(f\) is differentiable, \(x=g(t)\), and \(y=h(t)\). Assuming that the relevant derivatives exist,
\[{dz\over dt}={\partial z\over \partial x}{dx\over dt}+ {\partial z\over \partial y}{dy\over dt}. \]
Proof
If \(f\) is differentiable, then
\[\Delta z=f_x(x_0,y_0)\Delta x+f_y(x_0,y_0)\Delta y+\epsilon_1\Delta x + \epsilon_2\Delta y,\]
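where \(\Delta x=x-x_0\), \(\Delta y=y-y_0\), \(\Delta z=f(x,y)-f(x_0,y_0)\), and \(\epsilon_1\) and \(\epsilon_2\) approach 0 as \((x,y)\) approaches \((x_0,y_0)\). Dividing by \(\Delta t\) gives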
\[\eqalignno{ {\Delta z\over\Delta t}&= f_x{\Delta x\over\Delta t}+f_y{\Delta y\over\Delta t}+\epsilon_1{\Delta x\over\Delta t} + \epsilon_2{\Delta y\over\Delta t}.& (14.4.1)\cr }\]
As \(\Delta t\) approaches 0, \((x,y)\) approaches \((x_0,y_0)\) and so
\[\eqalign{ \lim_{\Delta t\to0}{\Delta z\over\Delta t} &= {dz\over dt}\cr \lim_{\Delta t\to0}\epsilon_1{\Delta x\over\Delta t} &= 0\cdot{dx\over dt} \cr \lim_{\Delta t\to0}\epsilon_2{\Delta y\over\Delta t} &= 0\cdot{dy\over dt} \cr }\]
and so taking the limit of (14.4.1) as \(\Delta t\) goes to 0 gives
\[ {dz\over dt}= f_x{dx\over dt}+f_y{dy\over dt}, \]
as desired.
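To see the theorem at work on a case that is easy to check by hand, here is a small illustration; the functions are chosen only for this purpose. Let \(z=xy\) with \(x=\cos t\) and \(y=\sin t\). Then
\[{dz\over dt}={\partial z\over\partial x}{dx\over dt}+{\partial z\over\partial y}{dy\over dt}=(y)(-\sin t)+(x)(\cos t)=\cos^2t-\sin^2t=\cos 2t.\]
Substituting first gives \(z=\sin t\cos t={1\over2}\sin 2t\), whose derivative is also \(\cos 2t\).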
We can write the chain rule in a way that is somewhat closer to the single variable chain rule:
\[{df\over dt}=\langle f_x,f_y\rangle\cdot\langle x',y'\rangle,\]
or (roughly) the derivatives of the outside function "times" the derivatives of the inside functions. Not surprisingly, essentially the same chain rule works for functions of more than two variables. For example, given a function of three variables \(f(x,y,z)\), where each of \(x\), \(y\) and \(z\) is a function of \(t\),
\[{df\over dt}=\langle f_x,f_y,f_z\rangle\cdot\langle x',y',z'\rangle.\]
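For instance (an illustrative choice of functions), take \(f(x,y,z)=xyz\) with \(x=t\), \(y=t^2\), and \(z=t^3\). Then
\[{df\over dt}=\langle yz,xz,xy\rangle\cdot\langle 1,2t,3t^2\rangle=t^5+2t^5+3t^5=6t^5,\]
which agrees with differentiating \(f=t\cdot t^2\cdot t^3=t^6\) directly.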
We can even extend the idea further. Suppose that \(f(x,y)\) is a function and \(x=g(s,t)\) and \(y=h(s,t)\) are functions of two variables \(s\) and \(t\). Then \(f\) is "really" a function of \(s\) and \(t\) as well, and
\[{\partial f\over\partial s}=f_xg_s+f_yh_s\qquad {\partial f\over\partial t}=f_xg_t+f_yh_t.\]
The natural extension of this to \(f(x,y,z)\) works as well.
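As a brief sketch of the two-variable version, with functions chosen only for illustration, let \(f(x,y)=x^2+y^2\), \(x=g(s,t)=s\cos t\), and \(y=h(s,t)=s\sin t\). Then
\[\eqalign{ {\partial f\over\partial s}&=f_xg_s+f_yh_s=(2x)(\cos t)+(2y)(\sin t)=2s\cos^2t+2s\sin^2t=2s,\cr {\partial f\over\partial t}&=f_xg_t+f_yh_t=(2x)(-s\sin t)+(2y)(s\cos t)=-2s^2\sin t\cos t+2s^2\sin t\cos t=0.\cr }\]
This matches what we get by substituting first: \(f=s^2\cos^2t+s^2\sin^2t=s^2\), which has partial derivatives \(2s\) and \(0\) with respect to \(s\) and \(t\).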
Recall that we used the ordinary chain rule to do implicit differentiation. We can do the same with the new chain rule.
Example 14.4.2
The equation \(x^2+y^2+z^2 = 4\) defines a sphere, which is not the graph of a function of \(x\) and \(y\), though it can be thought of as the graphs of two functions, the top and bottom hemispheres. We can think of \(z\) as one of these two functions, so really \(z=z(x,y)\), and we can think of \(x\) and \(y\) as particularly simple functions of \(x\) and \(y\), and let \(f(x,y,z)=x^2+y^2+z^2\). Since \(f(x,y,z)=4\) at every point of the sphere, \(f\) is constant when regarded as a function of \(x\) and \(y\) alone, so \(\partial f/\partial x=0\). On the other hand, using the chain rule:
\[\eqalign{ 0={\partial f\over\partial x}&=f_x{\partial x\over\partial x}+ f_y{\partial y\over\partial x}+f_z{\partial z\over \partial x}\cr &=(2x)(1)+(2y)(0)+(2z){\partial z\over\partial x},\cr }\]
noting that since \(y\) is temporarily held constant its derivative \({\partial y/\partial x}=0\). Now we can solve for \(\partial z/\partial x\):
\[{\partial z\over \partial x}=-{2x\over 2z}=-{x\over z}. \]
In a similar manner we can compute \(\partial z/\partial y\).
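Carrying out that computation as a sketch, with \(x\) now held constant so that \(\partial x/\partial y=0\):
\[\eqalign{ 0={\partial f\over\partial y}&=f_x{\partial x\over\partial y}+ f_y{\partial y\over\partial y}+f_z{\partial z\over \partial y}\cr &=(2x)(0)+(2y)(1)+(2z){\partial z\over\partial y},\cr }\]
so \(\partial z/\partial y=-y/z\). Both formulas require \(z\not=0\). As a check, on the top hemisphere \(z=\sqrt{4-x^2-y^2}\), and differentiating directly gives \(\partial z/\partial y=-y/\sqrt{4-x^2-y^2}=-y/z\).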
Contributors
Integrated by Justin Marshall.