6.4: The Chain Rule. The Cauchy Invariant Rule


Elias Zakon
University of Windsor via The Trilla Group (support by Saylor Foundation)


To generalize the chain rule (Chapter 5, §1), we consider the composite \(h=g \circ f\) of two functions, \(f : E^{\prime} \rightarrow E^{\prime \prime}\) and \(g : E^{\prime \prime} \rightarrow E,\) with \(E^{\prime}, E^{\prime \prime},\) and \(E\) as before.

Theorem 1 (chain rule)

If

\[f : E^{\prime} \rightarrow E^{\prime \prime} \text { and } g : E^{\prime \prime} \rightarrow E\]

are differentiable at \(\vec{p}\) and \(\vec{q}=f(\vec{p}),\) respectively, then

\[h=g \circ f\]

is differentiable at \(\vec{p},\) and

\[d h(\vec{p} ; \cdot)=d g(\vec{q} ; \cdot) \circ d f(\vec{p} ; \cdot). \tag{1}\]

Briefly: "The differential of the composite is the composite of differentials."

Proof

Let \(U=d f(\vec{p} ; \cdot), V=d g(\vec{q} ; \cdot),\) and \(\phi=V \circ U\).

As \(U\) and \(V\) are linear continuous maps, so is \(\phi.\) We must show that \(\phi= d h(\vec{p} ; \cdot).\)

Here it is more convenient to write \(\Delta \vec{x}\) or \(\vec{x}-\vec{p}\) for the "\(\vec{t}\)" of Definition 1 in §3. For brevity, we set (with \(\vec{q}=f(\vec{p})\))

\[\begin{aligned} w(\vec{x}) &=\Delta h-\phi(\Delta \vec{x})=h(\vec{x})-h(\vec{p})-\phi(\vec{x}-\vec{p}), \quad \vec{x} \in E^{\prime}, \qquad (2) \\ u(\vec{x}) &=\Delta f-U(\Delta \vec{x})=f(\vec{x})-f(\vec{p})-U(\vec{x}-\vec{p}), \quad \vec{x} \in E^{\prime}, \qquad (3) \\ v(\vec{y}) &=\Delta g-V(\Delta \vec{y})=g(\vec{y})-g(\vec{q})-V(\vec{y}-\vec{q}), \quad \vec{y} \in E^{\prime \prime}. \qquad (4) \end{aligned}\]

Then what we have to prove (see Definition 1 in §3) reduces to

\[\lim _{\vec{x} \rightarrow \vec{p}} \frac{w(\vec{x})}{|\vec{x}-\vec{p}|}=0, \tag{5}\]

while the assumed existence of \(d f(\vec{p};\cdot)=U\) and \(d g(\vec{q};\cdot)=V\) can be expressed as

\[\lim _{\vec{x} \rightarrow \vec{p}} \frac{u(\vec{x})}{|\vec{x}-\vec{p}|}=0, \tag{5'}\]

and

\[\lim _{\vec{y} \rightarrow \vec{q}} \frac{v(\vec{y})}{|\vec{y}-\vec{q}|}=0, \quad \vec{q}=f(\vec{p}). \tag{5''}\]

From (2) and (3), recalling that \(h=g \circ f\) and \(\phi=V \circ U,\) we obtain

\[\begin{aligned} w(\vec{x}) &=g(f(\vec{x}))-g(\vec{q})-V(U(\vec{x}-\vec{p})) \\ &=g(f(\vec{x}))-g(\vec{q})-V(f(\vec{x})-f(\vec{p})-u(\vec{x})). \end{aligned} \tag{6}\]

Using (4), with \(\vec{y}=f(\vec{x}),\) and the linearity of \(V,\) we rewrite (6) as

\[\begin{aligned} w(\vec{x}) &=g(f(\vec{x}))-g(\vec{q})-V(f(\vec{x})-f(\vec{p}))+V(u(\vec{x})) \\ &=v(f(\vec{x}))+V(u(\vec{x})). \end{aligned}\]

(Verify!) Thus the desired formula (5) will be proved if we show that

\[\lim _{\vec{x} \rightarrow \vec{p}} \frac{V(u(\vec{x}))}{|\vec{x}-\vec{p}|}=0 \tag{6'}\]

and

\[\lim _{\vec{x} \rightarrow \vec{p}} \frac{v(f(\vec{x}))}{|\vec{x}-\vec{p}|}=0. \tag{6''}\]

Now, as \(V\) is linear and continuous, formula (5') yields (6'). Indeed,

\[\lim _{\vec{x} \rightarrow \vec{p}} \frac{V(u(\vec{x}))}{|\vec{x}-\vec{p}|}=\lim _{\vec{x} \rightarrow \vec{p}} V\left(\frac{u(\vec{x})}{|\vec{x}-\vec{p}|}\right)=V(0)=0\]

by Corollary 2 in Chapter 4, §2. (Why?)

Similarly, (5") implies (6") by substituting \(\vec{y}=f(\vec{x}),\) since

\[|f(\vec{x})-f(\vec{p})| \leq K|\vec{x}-\vec{p}|\]

by Problem 3(iii) in §3. (Explain!) Thus all is proved. \(\quad \square\)
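One way to fill in this last step, offered as a sketch rather than as the intended solution: since \(f\) is differentiable at \(\vec{p},\) it is continuous there, so \(f(\vec{x}) \rightarrow \vec{q}\) as \(\vec{x} \rightarrow \vec{p}.\) For \(\vec{x} \neq \vec{p}\) near \(\vec{p}\) with \(f(\vec{x}) \neq \vec{q},\)

\[\frac{|v(f(\vec{x}))|}{|\vec{x}-\vec{p}|}=\frac{|v(f(\vec{x}))|}{|f(\vec{x})-\vec{q}|} \cdot \frac{|f(\vec{x})-f(\vec{p})|}{|\vec{x}-\vec{p}|} \leq K \frac{|v(f(\vec{x}))|}{|f(\vec{x})-\vec{q}|},\]

and the right side tends to \(0\) by (5") with \(\vec{y}=f(\vec{x}).\) If instead \(f(\vec{x})=\vec{q},\) then \(v(f(\vec{x}))=v(\vec{q})=0,\) so the quotient is \(0\) outright. In either case, (6") follows.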

Note 1 (Cauchy invariant rule). Under the same assumptions, we also have

\[d h(\vec{p} ; \vec{t})=d g(\vec{q} ; \vec{s}) \tag{7}\]

if \(\vec{s}=d f(\vec{p} ; \vec{t}), \vec{t} \in E^{\prime}\).

For with \(U\) and \(V\) as above,

\[d h(\vec{p} ; \cdot)=\phi=V \circ U.\]

Thus if

\[\vec{s}=d f(\vec{p} ; \vec{t})=U(\vec{t}),\]

we have

\[d h(\vec{p} ; \vec{t})=\phi(\vec{t})=V(U(\vec{t}))=V(\vec{s})=d g(\vec{q} ; \vec{s}),\]

proving (7).

Note 2. If

\[E^{\prime}=E^{n}\left(C^{n}\right), E^{\prime \prime}=E^{m}\left(C^{m}\right), \text { and } E=E^{r}\left(C^{r}\right)\]

then by Theorem 3 of §2 and Definition 2 in §3, we can write (1) in matrix form,

\[\left[h^{\prime}(\vec{p})\right]=\left[g^{\prime}(\vec{q})\right]\left[f^{\prime}(\vec{p})\right],\]

resembling Theorem 3 in Chapter 5, §1 (with \(f\) and \(g\) interchanged). Moreover, we have the following theorem.

Theorem 2

With all as in Theorem 1, let

\[E^{\prime}=E^{n}\left(C^{n}\right), E^{\prime \prime}=E^{m}\left(C^{m}\right),\]

and

\[f=\left(f_{1}, \ldots, f_{m}\right).\]

Then

\[D_{k} h(\vec{p})=\sum_{i=1}^{m} D_{i} g(\vec{q}) D_{k} f_{i}(\vec{p});\]

or, in classical notation,

\[\frac{\partial}{\partial x_{k}} h(\vec{p})=\sum_{i=1}^{m} \frac{\partial}{\partial y_{i}} g(\vec{q}) \cdot \frac{\partial}{\partial x_{k}} f_{i}(\vec{p}), \quad k=1,2, \ldots, n. \tag{8}\]

Proof

Fix any basic vector \(\vec{e}_{k}\) in \(E^{\prime}\) and set

\[\vec{s}=d f\left(\vec{p} ; \vec{e}_{k}\right), \quad \vec{s}=\left(s_{1}, \ldots, s_{m}\right) \in E^{\prime \prime}.\]

As \(f\) is differentiable at \(\vec{p},\) so are its components \(f_{i}\) (Problem 9 in §3), and

\[s_{i}=d f_{i}\left(\vec{p} ; \vec{e}_{k}\right)=D_{k} f_{i}(\vec{p})\]

by Theorem 2(ii) in §3. Using also Corollary 1 in §3, we get

\[d g(\vec{q} ; \vec{s})=\sum_{i=1}^{m} s_{i} D_{i} g(\vec{q})=\sum_{i=1}^{m} D_{k} f_{i}(\vec{p}) D_{i} g(\vec{q}).\]

But as \(\vec{s}=d f\left(\vec{p} ; \vec{e}_{k}\right),\) formula (7) yields

\[d g(\vec{q} ; \vec{s})=d h\left(\vec{p} ; \vec{e}_{k}\right)=D_{k} h(\vec{p})\]

by Theorem 2(ii) in §3. Thus the result follows. \(\quad \square\)

Note 3. Theorem 2 is often called the chain rule for functions of several variables. It yields Theorem 3 in Chapter 5, §1, if \(m=n=1\).
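The matrix form of the chain rule in Note 2, whose entries are the sums appearing in Theorem 2, can be checked numerically. The sketch below is only an illustration: the maps `f` and `g`, the point `p`, and the helper names `jacobian` and `matmul` are hypothetical choices, and the derivatives are approximated by central differences rather than computed exactly.

```python
import math

# Hypothetical example maps, not taken from the text.
def f(x):
    # f : E^2 -> E^3
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x1 + x2 ** 2]

def g(y):
    # g : E^3 -> E^2
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def h(x):
    # h = g o f : E^2 -> E^2
    return g(f(x))

def jacobian(func, p, eps=1e-6):
    """Approximate the matrix of partials [func'(p)] by central differences."""
    m, n = len(func(p)), len(p)
    J = [[0.0] * n for _ in range(m)]
    for k in range(n):
        plus, minus = list(p), list(p)
        plus[k] += eps
        minus[k] -= eps
        fp, fm = func(plus), func(minus)
        for i in range(m):
            J[i][k] = (fp[i] - fm[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

p = [0.5, -1.2]
q = f(p)

lhs = jacobian(h, p)                            # [h'(p)]
rhs = matmul(jacobian(g, q), jacobian(f, p))    # [g'(q)][f'(p)]

# Largest entrywise discrepancy; small up to finite-difference error.
max_gap = max(abs(lhs[i][k] - rhs[i][k]) for i in range(2) for k in range(2))
print(max_gap)
```

Note that the Jacobian of `g` is evaluated at `q = f(p)`, not at `p`; evaluating it at the wrong point is a common source of error when applying the theorem by hand.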

In classical calculus one often speaks of derivatives and differentials of variables \(y=f\left(x_{1}, \ldots, x_{n}\right)\) rather than those of mappings. Thus Theorem 2 is stated as follows.

Let \(u=g\left(y_{1}, \ldots, y_{m}\right)\) be differentiable. If, in turn, each

\[y_{i}=f_{i}\left(x_{1}, \dots, x_{n}\right)\]

is differentiable for \(i=1, \ldots, m,\) then \(u\) is also differentiable as a composite function of the \(n\) variables \(x_{k},\) and ("simplifying" formula (8)) we have

\[\frac{\partial u}{\partial x_{k}}=\sum_{i=1}^{m} \frac{\partial u}{\partial y_{i}} \frac{\partial y_{i}}{\partial x_{k}}, \quad k=1,2, \ldots, n. \tag{8'}\]

It is understood that the partials

\[\frac{\partial u}{\partial x_{k}} \text { and } \frac{\partial y_{i}}{\partial x_{k}} \text { are taken at some } \vec{p} \in E^{\prime},\]

while the \(\partial u / \partial y_{i}\) are at \(\vec{q}=f(\vec{p}),\) where \(f=\left(f_{1}, \ldots, f_{m}\right).\) This "variable" notation is convenient in computations, but may cause ambiguities (see the next example).

Example

Let \(u=g(x, y, z),\) where \(z\) depends on \(x\) and \(y:\)

\[z=f_{3}(x, y).\]

Set \(f_{1}(x, y)=x, f_{2}(x, y)=y, f=\left(f_{1}, f_{2}, f_{3}\right),\) and \(h=g \circ f;\) so

\[h(x, y)=g(x, y, z).\]

By (8'),

\[\frac{\partial u}{\partial x}=\frac{\partial u}{\partial x} \frac{\partial x}{\partial x}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial x}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial x}.\]

Here

\[\frac{\partial x}{\partial x}=\frac{\partial f_{1}}{\partial x}=1 \text { and } \frac{\partial y}{\partial x}=0,\]

for \(f_{2}\) does not depend on \(x.\) Thus we obtain

\[\frac{\partial u}{\partial x}=\frac{\partial u}{\partial x}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial x}. \tag{9}\]

(Question: Is \((\partial u / \partial z)(\partial z / \partial x)=0?\))

The trouble with (9) is that the variable \(u\) "poses" as both \(g\) and \(h.\) On the left, it is \(h;\) on the right, it is \(g.\)

To avoid this, our method is to differentiate well-defined mappings, not "variables." Thus in (9), we have the maps

\[g : E^{3} \rightarrow E \text { and } f : E^{2} \rightarrow E^{3},\]

with \(f_{1}, f_{2}, f_{3}\) as indicated. Then if \(h=g \circ f,\) Theorem 2 states (9) unambiguously as

\[D_{1} h(\vec{p})=D_{1} g(\vec{q})+D_{3} g(\vec{q}) \cdot D_{1} f_{3}(\vec{p}),\]

where \(\vec{p} \in E^{2}\) and

\[\vec{q}=f(\vec{p})=\left(p_{1}, p_{2}, f_{3}(\vec{p})\right).\]

(Why?) In classical notation,

\[\frac{\partial h}{\partial x}=\frac{\partial g}{\partial x}+\frac{\partial g}{\partial z} \frac{\partial f_{3}}{\partial x}\]

(avoiding the "paradox" of (9)).
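A concrete instance may make the distinction between \(\partial h / \partial x\) and \(\partial g / \partial x\) tangible. In the sketch below, the particular choices \(g(x, y, z)=x z+y\) and \(f_{3}(x, y)=x y,\) the point \((2,3),\) and the helper `partial` are all assumptions made for illustration; the text leaves \(g\) and \(f_{3}\) unspecified.

```python
# Hypothetical choices of g and f3, used only to illustrate the example.
def g(x, y, z):
    return x * z + y

def f3(x, y):
    return x * y

def h(x, y):
    # h = g o f with f = (f1, f2, f3), f1(x, y) = x, f2(x, y) = y
    return g(x, y, f3(x, y))

def partial(func, args, k, eps=1e-6):
    """Central-difference approximation of the k-th partial of func at args."""
    plus, minus = list(args), list(args)
    plus[k] += eps
    minus[k] -= eps
    return (func(*plus) - func(*minus)) / (2 * eps)

p = (2.0, 3.0)
q = (p[0], p[1], f3(*p))      # q = f(p) = (p1, p2, f3(p))

D1h = partial(h, p, 0)        # "du/dx" on the left of (9): u read as h
D1g = partial(g, q, 0)        # "du/dx" on the right of (9): u read as g
D3g = partial(g, q, 2)
D1f3 = partial(f3, p, 0)

print(D1h, D1g, D1g + D3g * D1f3)
```

Here \(h(x, y)=x^{2} y+y,\) so \(D_{1} h(\vec{p})=2 \cdot 2 \cdot 3=12,\) while \(D_{1} g(\vec{q})=q_{3}=6.\) The two quantities written "\(\partial u / \partial x\)" in the variable notation therefore differ, and the product \(D_{3} g(\vec{q}) \cdot D_{1} f_{3}(\vec{p})=2 \cdot 3=6\) accounts for the gap.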

Nonetheless, with due caution, one may use the "variable" notation where convenient. The reader should practice both (see the Problems).

Note 4. The Cauchy rule (7), in "variable" notation, turns into

\[d u=\sum_{i=1}^{m} \frac{\partial u}{\partial y_{i}} d y_{i}=\sum_{k=1}^{n} \frac{\partial u}{\partial x_{k}} d x_{k}, \tag{10}\]

where \(d x_{k}=t_{k}\) and \(d y_{i}=d f_{i}(\vec{p} ; \vec{t})\).

Indeed, by Corollary 1 in §3,

\[d h(\vec{p} ; \vec{t})=\sum_{k=1}^{n} D_{k} h(\vec{p}) \cdot t_{k} \text { and } d g(\vec{q} ; \vec{s})=\sum_{i=1}^{m} D_{i} g(\vec{q}) \cdot s_{i}.\]

Now, in (7),

\[\vec{s}=\left(s_{1}, \ldots, s_{m}\right)=d f(\vec{p} ; \vec{t});\]

so by Problem 9 in §3,

\[d f_{i}(\vec{p} ; \vec{t})=s_{i}, \quad i=1, \ldots, m.\]

Rewriting all in the "variable" notation, we obtain (10).
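As a small illustration (the functions here are chosen only for the example and do not come from the text), let \(u=y_{1} y_{2}\) with \(y_{1}=x_{1}+x_{2}\) and \(y_{2}=x_{1} x_{2}.\) Then \(d y_{1}=d x_{1}+d x_{2}\) and \(d y_{2}=x_{2}\, d x_{1}+x_{1}\, d x_{2},\) so the first sum in (10) gives

\[\begin{aligned} d u &=y_{2}\, d y_{1}+y_{1}\, d y_{2} \\ &=x_{1} x_{2}\left(d x_{1}+d x_{2}\right)+\left(x_{1}+x_{2}\right)\left(x_{2}\, d x_{1}+x_{1}\, d x_{2}\right) \\ &=\left(2 x_{1} x_{2}+x_{2}^{2}\right) d x_{1}+\left(x_{1}^{2}+2 x_{1} x_{2}\right) d x_{2}. \end{aligned}\]

On the other hand, substituting first yields \(u=x_{1}^{2} x_{2}+x_{1} x_{2}^{2},\) whose partials are \(\partial u / \partial x_{1}=2 x_{1} x_{2}+x_{2}^{2}\) and \(\partial u / \partial x_{2}=x_{1}^{2}+2 x_{1} x_{2}.\) These are exactly the coefficients found above, so the two sums in (10) agree in this case.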

The "advantage" of (10) is that \(d u\) has the same form, independently of whether \(u\) is treated as a function of the \(x_{k}\) or of the \(y_{i}\) (hence the name "invariant" rule). However, one must remember the meaning of \(d x_{k}\) and \(d y_{i},\) which are quite different.

The "invariance" also fails completely for differentials of higher order (§5).

The advantages of the "variable" notation vanish unless one is able to "translate" it into precise formulas.