3.5: The Chain Rule



So far we have seen how to compute the derivative of a function built up from other functions by addition, subtraction, multiplication and division. There is another very important way that we combine simple functions to make more complicated functions: function composition, as discussed in Section 2.3. For example, consider \( \sqrt{625-x^2}\). This function has many simpler components, like 625 and \( x^2\), and then there is that square root symbol, so the square root function \( \sqrt{x}=x^{1/2}\) is involved. The obvious question is: can we compute the derivative using the derivatives of the constituents \( 625-x^2\) and \( \sqrt{x}\)? We can indeed. In general, if \(f(x)\) and \(g(x)\) are functions, we can compute the derivatives of \(f(g(x))\) and \(g(f(x))\) in terms of \(f'(x)\) and \(g'(x)\).

Example \(\PageIndex{1}\)

Form the two possible compositions of \( f(x)=\sqrt{x}\) and \( g(x)=625-x^2\) and compute the derivatives.

Solution

First, \( f(g(x))=\sqrt{625-x^2}\), and the derivative is \( -x/\sqrt{625-x^2}\) as we have seen.

Second, \( g(f(x))=625-(\sqrt{x})^2=625-x\) with derivative \(-1\). Of course, these calculations do not use anything new, and in particular the derivative of \(f(g(x))\) was somewhat tedious to compute from the definition.
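As a sanity check on both derivatives, here is a minimal Python sketch that compares them with symmetric difference quotients. The helper name `central_difference`, the step size `h`, and the sample point `x0 = 7` are illustrative choices, not part of the text.

```python
import math


def central_difference(func, x, h=1e-6):
    """Estimate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return math.sqrt(x)


def g(x):
    return 625 - x**2


def f_of_g(x):
    return f(g(x))  # sqrt(625 - x^2)


def g_of_f(x):
    return g(f(x))  # 625 - (sqrt(x))^2, which equals 625 - x for x >= 0


x0 = 7.0  # assumed sample point; any 0 < x0 < 25 keeps both compositions defined

estimate_fg = central_difference(f_of_g, x0)
exact_fg = -x0 / math.sqrt(625 - x0**2)  # the formula from the text
estimate_gf = central_difference(g_of_f, x0)  # the text says this is -1

print("f(g(x)): estimate", estimate_fg, "formula", exact_fg)
print("g(f(x)): estimate", estimate_gf)
```

At \(x=7\) the formula gives exactly \(-7/24\), since \(\sqrt{625-49}=24\); the numerical estimates should agree with the formulas to several decimal places.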

Suppose we want the derivative of \(f(g(x))\). Again, let's set up the derivative and play some algebraic tricks:

\[\eqalign{ {d\over dx}f(g(x))& =\lim_{\Delta x\to0} {f(g(x+\Delta x))-f(g(x))\over\Delta x}\cr& =\lim_{\Delta x\to0} {f(g(x+\Delta x))-f(g(x))\over g(x+\Delta x)-g(x)}\cdot {g(x+\Delta x)-g(x)\over\Delta x}\cr } \nonumber \]

Now we see immediately that the second fraction turns into \(g'(x)\) when we take the limit. The first fraction is more complicated, but it too looks something like a derivative. The denominator, \(g(x+\Delta x)-g(x)\), is a change in the value of \(g\), so let's abbreviate it as \(\Delta g=g(x+\Delta x)-g(x)\), which also means \(g(x+\Delta x)=g(x)+\Delta g\). With this abbreviation, the limit of the first fraction is

\[\lim_{\Delta x\to0} {f(g(x)+\Delta g)-f(g(x))\over \Delta g}. \nonumber \]

As \(\Delta x\) goes to 0, it is also true that \(\Delta g\) goes to 0: \(g\) is differentiable and hence continuous, so \(g(x+\Delta x)\) goes to \(g(x)\). So we can rewrite this limit as

\[\lim_{\Delta g\to0} {f(g(x)+\Delta g)-f(g(x))\over \Delta g}. \nonumber \]

Now this looks exactly like a derivative, namely \(f'(g(x))\), that is, the function \(f'(x)\) with \(x\) replaced by \(g(x)\). If this all withstands scrutiny, we then get

\[{d\over dx}f(g(x))=f'(g(x))g'(x). \nonumber \]

Unfortunately, there is a small flaw in the argument. Recall that what we mean by \(\lim_{\Delta x\to0}\) involves what happens when \(\Delta x\) is close to 0, but not equal to 0. The qualification is very important, since we must be able to divide by \(\Delta x\). But when \(\Delta x\) is close to 0 but not equal to 0, \(\Delta g=g(x+\Delta x)-g(x)\) is close to 0 and possibly equal to 0. This means it doesn't really make sense to divide by \(\Delta g\).

Fortunately, it is possible to recast the argument to avoid this difficulty, but it is a bit tricky; we will not include the details, which can be found in many calculus books. Note that many functions \(g\) do have the property that \(g(x+\Delta x)-g(x)\not=0\) when \(\Delta x\) is small, and for these functions the argument above is fine.
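A simple case where the difficulty actually occurs is a constant inner function, say \(g(x)=c\) (an illustrative choice, not an example from the text). Then \(\Delta g=0\) for every \(\Delta x\), so the first fraction is never defined. The formula still gives the right answer, assuming \(f\) is differentiable at \(c\), because \(f(g(x))=f(c)\) is constant:

\[{d\over dx}f(g(x))={d\over dx}f(c)=0=f'(c)\cdot 0=f'(g(x))g'(x). \nonumber \]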

The chain rule has a particularly simple expression if we use the Leibniz notation for the derivative. The quantity \(f'(g(x))\) is the derivative of \(f\) with \(x\) replaced by \(g\); this can be written \(df/dg\). As usual, \(g'(x)=dg/dx\). Then the chain rule becomes

\[{df\over dx} = {df\over dg}{dg\over dx}. \label{chaineq1} \]

Equation \ref{chaineq1} looks like trivial arithmetic, but it is not: \(dg/dx\) is not a fraction, that is, not literal division, but a single symbol that means \(g'(x)\). Nevertheless, it turns out that what looks like trivial arithmetic, and is therefore easy to remember, is really true.
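To see the Leibniz form at work numerically, the sketch below estimates \(df/dg\) and \(dg/dx\) separately and compares their product with a direct estimate of \(df/dx\). The functions are the ones used in this section; the helper `difference_quotient`, the step size, and the sample point `x0 = 15` are assumptions made for illustration.

```python
import math


def difference_quotient(func, point, step=1e-6):
    """Symmetric difference quotient approximating func'(point)."""
    return (func(point + step) - func(point - step)) / (2 * step)


def g(x):
    """Inner function from the text."""
    return 625 - x**2


def f(u):
    """Outer function from the text."""
    return math.sqrt(u)


x0 = 15.0   # assumed sample point, chosen so that g(x0) = 400
u0 = g(x0)  # the value at which df/dg must be evaluated

df_dg = difference_quotient(f, u0)   # rate of change of f with respect to g
dg_dx = difference_quotient(g, x0)   # rate of change of g with respect to x
df_dx = difference_quotient(lambda x: f(g(x)), x0)  # direct estimate

product = df_dg * dg_dx
print("df/dg * dg/dx =", product)
print("df/dx         =", df_dx)
```

The point to notice is that \(df/dg\) is evaluated at \(g(x_0)\), not at \(x_0\); this is the "replace \(x\) by \(g(x)\)" step in \(f'(g(x))\).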

It will take a bit of practice to make the use of the chain rule come naturally; it is more complicated than the earlier differentiation rules we have seen.

Example \(\PageIndex{2}\)

Compute the derivative of

\[ \sqrt{625-x^2}. \nonumber \]

Solution

We already know that the answer is \( -x/ \sqrt{625-x^2}\), computed directly from the limit. In the context of the chain rule, we have \( f(x)=\sqrt{x}\), \(g(x)=625-x^2\).

We know that \( f'(x)=(1/2)x^{-1/2}\), so \( f'(g(x))= (1/2)(625-x^2)^{-1/2}\). Note that this is a two-step computation: first compute \(f'(x)\), then replace \(x\) by \(g(x)\). Since \(g'(x)=-2x\) we have

\[f'(g(x))g'(x)={1\over 2\sqrt{625-x^2}}(-2x)={-x\over \sqrt{625-x^2}}. \nonumber \]

Example \(\PageIndex{3}\)

Compute the derivative of \( 1/\sqrt{625-x^2}\).

Solution

This is a quotient with a constant numerator, so we could use the quotient rule, but it is simpler to use the chain rule. The function is \( (625-x^2)^{-1/2}\), the composition of \( f(x)=x^{-1/2}\) and \( g(x)=625-x^2\). We compute \( f'(x)=(-1/2)x^{-3/2}\) using the power rule, and then

\[f'(g(x))g'(x)={-1\over 2(625-x^2)^{3/2}}(-2x)={x\over (625-x^2)^{3/2}}. \nonumber \]
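For comparison, the quotient-rule route mentioned above can be sketched as follows. It reuses the derivative of \( \sqrt{625-x^2}\) from the previous example, so the chain rule is still needed along the way:

\[{d\over dx}{1\over\sqrt{625-x^2}}={0\cdot\sqrt{625-x^2}-1\cdot\left({-x\over\sqrt{625-x^2}}\right)\over 625-x^2}={x\over (625-x^2)^{3/2}}. \nonumber \]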

Example \(\PageIndex{4}\)

Compute the derivative of

\[f(x)={x^2-1\over x\sqrt{x^2+1}}. \nonumber \]

Solution

The "last" operation here is division, so to get started we need to use the quotient rule first. This gives

\[ \eqalign{ f'(x)&={(x^2-1)'x\sqrt{x^2+1}-(x^2-1)(x\sqrt{x^2+1})'\over x^2(x^2+1)}\cr& ={2x^2\sqrt{x^2+1}-(x^2-1)(x\sqrt{x^2+1})'\over x^2(x^2+1)}.\cr } \nonumber \]

Now we need to compute the derivative of \( x\sqrt{x^2+1}\). This is a product, so we use the product rule:

\[{d\over dx}x\sqrt{x^2+1}=x{d\over dx}\sqrt{x^2+1}+\sqrt{x^2+1}. \nonumber \]

Finally, we use the chain rule:

\[{d\over dx}\sqrt{x^2+1}={d\over dx}(x^2+1)^{1/2}= {1\over 2}(x^2+1)^{-1/2}(2x)={x\over \sqrt{x^2+1}}. \nonumber \]

And putting it all together:

\[ \eqalign{ f'(x)&={2x^2\sqrt{x^2+1}-(x^2-1)(x\sqrt{x^2+1})'\over x^2(x^2+1)}\cr& ={2x^2\sqrt{x^2+1}-(x^2-1)\left(x{ {x\over \sqrt{x^2+1}}} +\sqrt{x^2+1}\right)\over x^2(x^2+1)}.\cr } \nonumber \]

This can be simplified of course, but we have done all the calculus, so that only algebra is left.
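For readers who want the simplified form, one way to finish the algebra is to multiply the numerator and denominator by \( \sqrt{x^2+1}\) and expand:

\[ \eqalign{ f'(x)&={2x^2(x^2+1)-(x^2-1)\left(x^2+(x^2+1)\right)\over x^2(x^2+1)^{3/2}}\cr& ={2x^4+2x^2-(2x^4-x^2-1)\over x^2(x^2+1)^{3/2}} ={3x^2+1\over x^2(x^2+1)^{3/2}}.\cr } \nonumber \]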

In practice, of course, you will need to use more than one of the rules we have developed to compute the derivative of a complicated function.

Example \(\PageIndex{5}\)

Compute the derivative of \( \sqrt{1+\sqrt{1+\sqrt{x}}}\).

Solution

Here we have a more complicated chain of compositions, so we use the chain rule twice. At the outermost "layer" we have the function \( g(x)=1+\sqrt{1+\sqrt{x}}\) plugged into \( f(x)=\sqrt{x}\), so applying the chain rule once gives

\[{d\over dx}\sqrt{1+\sqrt{1+\sqrt{x}}}= {1\over 2}\left(1+\sqrt{1+\sqrt{x}}\right)^{-1/2}{d\over dx} \left(1+\sqrt{1+\sqrt{x}}\right). \nonumber \]

Now we need the derivative of \( \sqrt{1+\sqrt{x}}\). Using the chain rule again:

\[{d\over dx}\sqrt{1+\sqrt{x}}={1\over 2}\left(1+\sqrt{x}\right)^{-1/2}{1\over 2}x^{-1/2}. \nonumber \]

So the original derivative is

\[ \eqalign{ {d\over dx}\sqrt{1+\sqrt{1+\sqrt{x}}}&= {1\over 2}\left(1+\sqrt{1+\sqrt{x}}\right)^{-1/2} {1\over 2}\left(1+\sqrt{x}\right)^{-1/2}{1\over 2}x^{-1/2}\cr& ={1\over 8 \sqrt{x}\sqrt{1+\sqrt{x}}\sqrt{1+\sqrt{1+\sqrt{x}}}} .} \nonumber \]
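The repeated use of the chain rule here amounts to multiplying the derivatives of the individual layers, each evaluated at the value fed into it. The following is a minimal Python sketch of that idea, not a general differentiation tool; the names `sqrt_layer`, `add_one_layer`, and `derivative_of_chain`, and the sample point `x0 = 9`, are hypothetical choices made for illustration.

```python
import math


def sqrt_layer(u):
    """Return (value, derivative) of the square root function at u."""
    root = math.sqrt(u)
    return root, 1 / (2 * root)


def add_one_layer(u):
    """Return (value, derivative) of u -> 1 + u at u."""
    return 1 + u, 1.0


def derivative_of_chain(layers, x):
    """Apply the layers innermost first, multiplying the local derivatives."""
    value, derivative = x, 1.0
    for layer in layers:
        value, local_derivative = layer(value)
        derivative *= local_derivative
    return value, derivative


# sqrt(1 + sqrt(1 + sqrt(x))), listed from the innermost layer outward
layers = [sqrt_layer, add_one_layer, sqrt_layer, add_one_layer, sqrt_layer]

x0 = 9.0  # assumed sample point; must be positive
value, slope = derivative_of_chain(layers, x0)

# Closed form from the text, evaluated at the same point
closed_form = 1 / (
    8
    * math.sqrt(x0)
    * math.sqrt(1 + math.sqrt(x0))
    * math.sqrt(1 + math.sqrt(1 + math.sqrt(x0)))
)

print("layer-by-layer slope:", slope)
print("closed-form slope:   ", closed_form)
```

At \(x=9\) the intermediate values are 3, 4, 2, 3, and \(\sqrt{3}\), so the three square root layers contribute the factors \(1/6\), \(1/4\), and \(1/(2\sqrt{3})\), whose product is \(1/(48\sqrt{3})\).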

Using the chain rule, the power rule, and the product rule, it is possible to avoid using the quotient rule entirely.

Example \(\PageIndex{6}\)

Compute the derivative of \( f(x)={x^3\over x^2+1}\).

Solution

Write \( f(x)=x^3(x^2+1)^{-1}\), then

\[ \eqalign{ f'(x)&=x^3{d\over dx}(x^2+1)^{-1}+3x^2(x^2+1)^{-1}\cr& =x^3(-1)(x^2+1)^{-2}(2x)+3x^2(x^2+1)^{-1}\cr& =-2x^4(x^2+1)^{-2}+3x^2(x^2+1)^{-1}\cr& ={-2x^4\over (x^2+1)^{2}}+{3x^2\over x^2+1}\cr& ={-2x^4\over (x^2+1)^{2}}+{3x^2(x^2+1)\over (x^2+1)^{2}}\cr& ={-2x^4+3x^4+3x^2\over (x^2+1)^{2}}={x^4+3x^2\over (x^2+1)^{2}}.\cr } \nonumber \]

Note that we already had the derivative on the second line; all the rest is simplification. It is easier to get to this answer by using the quotient rule, so there's a trade-off: more work for fewer memorized formulas.
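For comparison, a sketch of the quotient-rule computation for the same function reaches the simplified answer in one line:

\[f'(x)={3x^2(x^2+1)-x^3(2x)\over (x^2+1)^2}={3x^4+3x^2-2x^4\over (x^2+1)^2}={x^4+3x^2\over (x^2+1)^2}. \nonumber \]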

Contributors

David Guichard (Whitman College)
Integrated by Justin Marshall.
