6.10: More on Implicit Differentiation. Conditional Extrema


Elias Zakon
University of Windsor via The Trilla Group (support by Saylor Foundation)


I. Implicit differentiation was sketched in §7. Under suitable assumptions (Theorem 4 in §7), one can differentiate a given system of equations,

\[g_{k}\left(x_{1}, \ldots, x_{n}, y_{1}, \ldots, y_{m}\right)=0, \quad k=1,2, \ldots, n, \tag{1}\]

treating the \(x_{j}\) as implicit functions of the \(y_{i}\) without seeking an explicit solution of the form

\[x_{j}=H_{j}\left(y_{1}, \dots, y_{m}\right).\]

This yields a new system of equations from which the partials \(D_{i} H_{j}=\frac{\partial x_{j}}{\partial y_{i}}\) can be found directly.

We now supplement Theorem 4 in §7 (review it!) by showing that this new system is linear in the partials involved and that its determinant is \(\neq 0\). Thus, in general, it is simpler to solve than (1).

As in Part IV of §7, we set

\[(\vec{x}, \vec{y})=\left(x_{1}, \ldots, x_{n}, y_{1}, \ldots, y_{m}\right) \text { and } g=\left(g_{1}, \ldots, g_{n}\right),\]

replacing the \(f\) of §7 by \(g.\) Then equations (1) simplify to

\[g(\vec{x}, \vec{y})=\overrightarrow{0}, \tag{2}\]

where \(g : E^{n+m} \rightarrow E^{n}\) (or \(g : C^{n+m} \rightarrow C^{n}\)).

Theorem \(\PageIndex{1}\) (implicit differentiation)

Adopt all assumptions of Theorem 4 in §7, replacing \(f\) by \(g\), and set \(H=\left(H_{1}, \ldots, H_{n}\right)\) and

\[D_{j} g_{k}(\vec{p}, \vec{q})=a_{j k}, \quad j \leq n+m, \quad k \leq n.\]

Then for each \(i=1, \ldots, m,\) we have \(n\) linear equations,

\[\sum_{j=1}^{n} a_{j k} D_{i} H_{j}(\vec{q})=-a_{n+i, k}, \quad k \leq n, \tag{3}\]

with

\[\operatorname{det}\left(a_{j k}\right) \neq 0, \quad(j, k \leq n),\]

that uniquely determine the partials \(D_{i} H_{j}(\vec{q})\) for \(j=1,2, \ldots, n\).

Proof

As usual, extend the map \(H : Q \rightarrow P\) of Theorem 4 in §7 to \(H : E^{m} \rightarrow E^{n}\) (or \(C^{m} \rightarrow C^{n}\)) by setting \(H=\overrightarrow{0}\) on \(-Q\).

Also, define \(\sigma : E^{m} \rightarrow E^{n+m}\) (or \(C^{m} \rightarrow C^{n+m}\)) by

\[\sigma(\vec{y})=(H(\vec{y}), \vec{y})=\left(H_{1}(\vec{y}), \ldots, H_{n}(\vec{y}), y_{1}, \ldots, y_{m}\right), \quad \vec{y} \in E^{m}\left(C^{m}\right). \tag{4}\]

Then \(\sigma\) is differentiable at \(\vec{q} \in Q\), as are its \(n+m\) components. (Why?) Since \(\vec{x}=H(\vec{y})\) is a solution of (2), equations (1) and (2) become identities when \(\vec{x}\) is replaced by \(H(\vec{y})\). Also, \(\sigma(\vec{q})=(H(\vec{q}), \vec{q})=(\vec{p}, \vec{q})\) since \(H(\vec{q})=\vec{p}\). Moreover,

\[g(\sigma(\vec{y}))=g(H(\vec{y}), \vec{y})=\overrightarrow{0} \text { for } \vec{y} \in Q;\]

i.e., \(g \circ \sigma=\overrightarrow{0}\) on \(Q\).

Now, by assumption, \(g \in C D^{1}\) at \((\vec{p}, \vec{q});\) so the chain rule (Theorem 2 in §4) applies, with \(f, \vec{p}, \vec{q}, n,\) and \(m\) replaced by \(\sigma, \vec{q},(\vec{p}, \vec{q}), m,\) and \(n+m,\) respectively.

As \(h=g \circ \sigma=\overrightarrow{0}\) on \(Q,\) an open set, the partials of \(h\) vanish on \(Q.\) So by Theorem 2 of §4, writing \(\sigma_{j}\) for the \(j\)th component of \(\sigma,\)

\[\overrightarrow{0}=\sum_{j=1}^{n+m} D_{j} g(\vec{p}, \vec{q}) \cdot D_{i} \sigma_{j}(\vec{q}), \quad i \leq m. \tag{5}\]

By (4), \(\sigma_{j}=H_{j}\) if \(j \leq n\), and \(\sigma_{j}(\vec{y})=y_{i}\) if \(j=n+i\). Thus \(D_{i} \sigma_{j}=D_{i} H_{j}\) for \(j \leq n\); but for \(j>n\), we have \(D_{i} \sigma_{j}=1\) if \(j=n+i\), and \(D_{i} \sigma_{j}=0\) otherwise. Hence by (5),

\[\overrightarrow{0}=\sum_{j=1}^{n} D_{j} g(\vec{p}, \vec{q}) \cdot D_{i} H_{j}(\vec{q})+D_{n+i} g(\vec{p}, \vec{q}), \quad i=1,2, \ldots, m.\]

As \(g=\left(g_{1}, \ldots, g_{n}\right),\) each of these vector equations splits into \(n\) scalar ones:

\[0=\sum_{j=1}^{n} D_{j} g_{k}(\vec{p}, \vec{q}) \cdot D_{i} H_{j}(\vec{q})+D_{n+i} g_{k}(\vec{p}, \vec{q}), \quad i \leq m, k \leq n. \tag{6}\]

With \(D_{j} g_{k}(\vec{p}, \vec{q})=a_{j k},\) this yields (3), where \(\operatorname{det}\left(a_{j k}\right)=\operatorname{det}\left(D_{j} g_{k}(\vec{p}, \vec{q})\right) \neq 0\) by hypothesis (see Theorem 4 in §7).

Thus all is proved.\(\quad \square\)

Note 1. By continuity (Note 1 in §6), we have \(\operatorname{det}\left(D_{j} g_{k}(\vec{x}, \vec{y})\right) \neq 0\) \((j, k \leq n)\) for all \((\vec{x}, \vec{y})\) in a sufficiently small neighborhood of \((\vec{p}, \vec{q})\). Thus Theorem 1 also holds with \((\vec{p}, \vec{q})\) replaced by any such \((\vec{x}, \vec{y})\). In practice, one need not memorize (3); one obtains it by implicitly differentiating equations (1).
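As Note 1 indicates, in practice one gets (3) by differentiating (1). The Python sketch below (an illustration, not part of the original text) does this numerically for a hypothetical system with \(n=2\), \(m=1\): \(g_{1}=x_{1} x_{2}-y\), \(g_{2}=x_{1}-x_{2}-1\), at the point \((\vec{p}, \vec{q})=(2,1,2)\). It approximates \(a_{jk}=D_{j} g_{k}(\vec{p}, \vec{q})\) by central differences, solves the \(2 \times 2\) system (3) by Cramer's rule, and compares the result with the derivative of the explicit solution \(H_{2}(y)=\left(-1+\sqrt{1+4 y}\right)/2\), \(H_{1}=H_{2}+1\). The helper names `partial` and `implicit_partials_2x1` are our own.

```python
import math

# Hypothetical example system (n = 2 unknowns x1, x2; m = 1 parameter y):
#   g1(x1, x2, y) = x1*x2 - y = 0
#   g2(x1, x2, y) = x1 - x2 - 1 = 0
# It is satisfied at (p, q) = (2, 1, 2).
def g1(v):
    x1, x2, y = v
    return x1 * x2 - y


def g2(v):
    x1, x2, y = v
    return x1 - x2 - 1


def partial(func, v, j, h=1e-5):
    """Central-difference approximation of D_j func at v (j is 0-based)."""
    up, dn = list(v), list(v)
    up[j] += h
    dn[j] -= h
    return (func(up) - func(dn)) / (2 * h)


def implicit_partials_2x1(g_pair, point):
    """Solve the linear system (3) for D H_1, D H_2 (n = 2, m = 1).

    a[j][k] approximates D_j g_k at the point (j = 0, 1, 2; k = 0, 1).
    Equation k reads  a[0][k]*DH1 + a[1][k]*DH2 = -a[2][k].
    """
    a = [[partial(gk, point, j) for gk in g_pair] for j in range(3)]
    det = a[0][0] * a[1][1] - a[1][0] * a[0][1]  # det(a_jk), j, k <= n; nonzero
    r0, r1 = -a[2][0], -a[2][1]
    dH1 = (r0 * a[1][1] - a[1][0] * r1) / det
    dH2 = (a[0][0] * r1 - r0 * a[0][1]) / det
    return dH1, dH2


dH1, dH2 = implicit_partials_2x1((g1, g2), (2.0, 1.0, 2.0))

# Explicit solution for comparison: H2(y) = (-1 + sqrt(1 + 4y)) / 2, H1 = H2 + 1,
# so H1'(y) = H2'(y) = 1 / sqrt(1 + 4y).
exact = 1 / math.sqrt(1 + 4 * 2.0)
print(dH1, dH2, exact)
```

Both computed partials should agree with \(1/3\) up to rounding; note that only \(g\), not \(H\), enters the computation of `dH1` and `dH2`.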

II. We shall now apply Theorem 1 to the theory of conditional extrema.

Definition 1

We say that \(f : E^{n+m} \rightarrow E^{1}\) has a local conditional maximum (minimum) at \(\vec{p} \in E^{n+m},\) with constraints

\[g=\left(g_{1}, \ldots, g_{n}\right)=\overrightarrow{0}\]

\(\left(g : E^{n+m} \rightarrow E^{n}\right)\) iff in some neighborhood \(G\) of \(\vec{p}\) we have

\[\Delta f=f(\vec{x})-f(\vec{p}) \leq 0 \quad(\geq 0, \text { respectively)}\]

for all \(\vec{x} \in G\) for which \(g(\vec{x})=\overrightarrow{0}\).

In §9 (Example (B) and Problems), we found such conditional extrema by using the constraint equations \(g=\overrightarrow{0}\) to eliminate some variables, thus reducing the problem to finding the unconditional extrema of a function of fewer (independent) variables.

Often, however, such elimination is cumbersome since it involves solving a system (1) of possibly nonlinear equations. It is here that implicit differentiation (based on Theorem 1) is useful.

Lagrange devised a method (the method of multipliers) for finding the critical points at which such extrema may occur. It runs as follows.

Given \(f : E^{n+m} \rightarrow E^{1},\) set

\[F=f+\sum_{k=1}^{n} c_{k} g_{k}, \tag{7}\]

where the constants \(c_{k}\) are to be determined and \(g_{k}\) are as above.

Then find the partials \(D_{j} F\) \((j \leq n+m)\) and solve the system of \(2 n+m\) equations

\[D_{j} F(\vec{x})=0, \quad j \leq n+m, \quad \text { and } \quad g_{k}(\vec{x})=0, \quad k \leq n, \tag{8}\]

for the \(2 n+m\) "unknowns" \(x_{j}(j \leq n+m)\) and \(c_{k}(k \leq n),\) the \(c_{k}\) originating from (7).

Any \(\vec{x}\) satisfying (8), with the \(c_{k}\) so determined, is a critical point (still to be tested). The method is based on Theorem 2 below, where we again write \((\vec{p}, \vec{q})\) for \(\vec{p}\) and \((\vec{x}, \vec{y})\) for \(\vec{x}\) (we call this "double notation").
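Solving (8) is usually the laborious step. As a hedged illustration (the problem and the use of Newton's method are our own choices, not the text's), take the hypothetical problem \(f(x, y)=x+y\) with the single constraint \(g(x, y)=x^{2}+y^{2}-1=0\) (\(n=m=1\)), so that \(F=x+y+c\left(x^{2}+y^{2}-1\right)\) and (8) consists of three equations in \(x, y, c\). The sketch applies Newton's method with a hand-coded Jacobian; as the text stresses, any root it finds is only a critical point, still to be tested.

```python
import math


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def system8(u):
    """Left-hand sides of (8) for F = x + y + c*(x^2 + y^2 - 1)."""
    x, y, c = u
    return [1 + 2 * c * x,        # D_1 F = 0
            1 + 2 * c * y,        # D_2 F = 0
            x * x + y * y - 1]    # g = 0


def jacobian8(u):
    """Partial derivatives of system8 with respect to (x, y, c)."""
    x, y, c = u
    return [[2 * c, 0.0, 2 * x],
            [0.0, 2 * c, 2 * y],
            [2 * x, 2 * y, 0.0]]


def newton(u, tol=1e-12, max_iter=50):
    """Newton's method for system8, starting from the guess u = [x, y, c]."""
    for _ in range(max_iter):
        r = system8(u)
        if max(abs(v) for v in r) < tol:
            break
        step = solve_linear(jacobian8(u), [-v for v in r])
        u = [ui + si for ui, si in zip(u, step)]
    return u


x, y, c = newton([0.5, 0.5, -0.5])
print(x, y, c)
```

Starting from \((0.5,0.5,-0.5)\), the iteration should approach \(x=y=1/\sqrt{2}\), \(c=-1/\sqrt{2}\); by the symmetry \((x, y, c) \mapsto(-x,-y,-c)\) of the system, a start at \((-0.5,-0.5,0.5)\) would instead find the other critical point \(x=y=-1/\sqrt{2}\).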

Theorem \(\PageIndex{2}\) (Lagrange multipliers)

Suppose \(f : E^{n+m} \rightarrow E^{1}\) is differentiable at

\[(\vec{p}, \vec{q})=\left(p_{1}, \ldots, p_{n}, q_{1}, \ldots, q_{m}\right)\]

and has a local extremum at \((\vec{p}, \vec{q})\) subject to the constraints

\[g=\left(g_{1}, \ldots, g_{n}\right)=\overrightarrow{0},\]

with \(g\) as in Theorem 1, \(g : E^{n+m} \rightarrow E^{n}.\) Then

\[\sum_{k=1}^{n} c_{k} D_{j} g_{k}(\vec{p}, \vec{q})=-D_{j} f(\vec{p}, \vec{q}), \quad j=1,2, \ldots, n+m, \tag{9}\]

for certain multipliers \(c_{k}\) (determined by the first \(n\) equations in (9)).

Proof

The first \(n\) equations in (9) admit a unique solution for the \(c_{k}\), since they are linear in the \(c_{k}\) and

\[\operatorname{det}\left(D_{j} g_{k}(\vec{p}, \vec{q})\right) \neq 0 \quad(j, k \leq n)\]

by hypothesis. With the \(c_{k}\) so determined, (9) holds for \(j \leq n.\) It remains to prove (9) for \(n<j \leq n+m.\)

Now, since \(f\) has a conditional extremum at \((\vec{p}, \vec{q})\) as stated, we have

\[f(\vec{x}, \vec{y})-f(\vec{p}, \vec{q}) \leq 0 \quad(\text { or } \quad \geq 0) \tag{10}\]

for all \((\vec{x}, \vec{y}) \in P \times Q\) with \(g(\vec{x}, \vec{y})=\overrightarrow{0},\) provided we make the neighborhood \(P \times Q\) small enough.

Define \(H\) and \(\sigma\) as in the previous proof (see (4)); so \(\vec{x}=H(\vec{y})\) is equivalent to \(g(\vec{x}, \vec{y})=\overrightarrow{0}\) for \((\vec{x}, \vec{y}) \in P \times Q\).

Then, for all such \((\vec{x}, \vec{y}),\) with \(\vec{x}=H(\vec{y}),\) we surely have \(g(\vec{x}, \vec{y})=\overrightarrow{0}\) and also

\[f(\vec{x}, \vec{y})=f(H(\vec{y}), \vec{y})=f(\sigma(\vec{y})).\]

Set \(h=f \circ \sigma, h : E^{m} \rightarrow E^{1}.\) Then (10) reduces to

\[h(\vec{y})-h(\vec{q}) \leq 0 \quad(\text { or } \geq 0) \quad \text { for all } \vec{y} \in Q.\]

This means that \(h\) has an unconditional extremum at \(\vec{q},\) an interior point of \(Q.\) Thus, by Theorem 1 in §9,

\[D_{i} h(\vec{q})=0, \quad i=1, \ldots, m.\]

Hence, applying the chain rule (Theorem 2 of §4) to \(h=f \circ \sigma,\) we get, much as in the previous proof,

\[\begin{aligned} 0 &=\sum_{j=1}^{n+m} D_{j} f(\vec{p}, \vec{q}) D_{i} \sigma_{j}(\vec{q}) \\ &=\sum_{j=1}^{n} D_{j} f(\vec{p}, \vec{q}) D_{i} H_{j}(\vec{q})+D_{n+i} f(\vec{p}, \vec{q}), \quad i \leq m. \end{aligned} \tag{11}\]

(Verify!)

Next, as \(g\) satisfies the hypotheses of Theorem 1, we get equations (3), or equivalently (6). Multiplying (6) by \(c_{k}\), summing over \(k=1, \ldots, n\), and adding the result to (11), we obtain

\[\begin{array}{l}{\sum_{j=1}^{n}\left[D_{j} f(\vec{p}, \vec{q})+\sum_{k=1}^{n} c_{k} D_{j} g_{k}(\vec{p}, \vec{q})\right] D_{i} H_{j}(\vec{q})} \\ {\qquad+D_{n+i} f(\vec{p}, \vec{q})+\sum_{k=1}^{n} c_{k} D_{n+i} g_{k}(\vec{p}, \vec{q})=0, \quad i \leq m}. \end{array}\]

(Verify!) But the square-bracketed expression is \(0\), since we chose the \(c_{k}\) so as to satisfy (9) for \(j \leq n\). Thus all simplifies to

\[\sum_{k=1}^{n} c_{k} D_{n+i} g_{k}(\vec{p}, \vec{q})=-D_{n+i} f(\vec{p}, \vec{q}), \quad i=1,2, \ldots, m.\]

Hence (9) holds for \(n<j \leq n+m,\) too, and all is proved.\(\quad \square\)
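The proof first fixes the \(c_{k}\) by the equations (9) with \(j \leq n\) and then shows that the remaining \(m\) equations hold automatically. The sketch below checks this in exact rational arithmetic for hypothetical data of our own choosing: \(f(x, y, z)=x+2 y+2 z\) with the single constraint \(g=x^{2}+y^{2}+z^{2}-9=0\) (\(n=1\), \(m=2\)) at \((1,2,2)\), where \(f\) attains its constrained maximum \(9\) (by the Cauchy-Schwarz inequality).

```python
from fractions import Fraction

# Hypothetical data: n = 1 constraint, m = 2, point (1, 2, 2).
point = (Fraction(1), Fraction(2), Fraction(2))


def grad_f(v):
    # f(x, y, z) = x + 2y + 2z has a constant gradient
    return [Fraction(1), Fraction(2), Fraction(2)]


def grad_g(v):
    # g(x, y, z) = x^2 + y^2 + z^2 - 9
    x, y, z = v
    return [2 * x, 2 * y, 2 * z]


df, dg = grad_f(point), grad_g(point)

# Step 1 of the proof: determine c from the first n = 1 equation of (9),
#   c * D_1 g = -D_1 f   (D_1 g != 0 plays the role of det != 0).
c = -df[0] / dg[0]

# Step 2: the remaining equations of (9) (j = 2, 3) should now hold automatically;
# residuals[j] = c * D_j g + D_j f.
residuals = [c * dg[j] + df[j] for j in range(3)]
print(c, residuals)   # c = -1/2 and all residuals are 0
```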

Remarks. Lagrange's method has the advantage that all variables (the \(x_{k}\) and \(y_{i}\)) are treated equally, without singling out the dependent ones. Thus in applications, one uses only \(F,\) i.e., \(f\) and \(g\) (not \(H\)).

One can also write \(\vec{x}=\left(x_{1}, \ldots, x_{n+m}\right)\) for \((\vec{x}, \vec{y})=\left(x_{1}, \ldots, x_{n}, y_{1}, \ldots, y_{m}\right)\) (the "double" notation was good for the proof only).

On the other hand, one still must solve equations (8).

Theorem 2 yields only a necessary condition (9) for extrema with constraints. There are also various sufficient conditions, but in practice one mostly relies on geometric and other considerations instead (as we did in §9). Therefore, we limit ourselves to one proposition (using "single" notation this time).

Theorem \(\PageIndex{3}\) (sufficient conditions)

Let

\[F=f+\sum_{k=1}^{n} c_{k} g_{k},\]

with \(f : E^{n+m} \rightarrow E^{1}, g : E^{n+m} \rightarrow E^{n},\) and \(c_{k}\) as in Theorem 2.

Then \(f\) has a local conditional maximum (minimum) at \(\vec{p}=\left(p_{1}, \ldots, p_{n+m}\right)\), with constraints \(g=\left(g_{1}, \ldots, g_{n}\right)=\overrightarrow{0}\), whenever \(F\) does. (A fortiori, this is the case if \(F\) has an unconditional extremum at \(\vec{p}\).)

Proof

Suppose \(F\) has a maximum at \(\vec{p},\) with constraints \(g=\overrightarrow{0}.\) Then

\[0 \geq F(\vec{x})-F(\vec{p})=f(\vec{x})-f(\vec{p})+\sum_{k=1}^{n} c_{k}\left[g_{k}(\vec{x})-g_{k}(\vec{p})\right]\]

for those \(\vec{x}\) near \(\vec{p}\) (including \(\vec{x}=\vec{p}\)) for which \(g(\vec{x})=\overrightarrow{0}\).

But for such \(\vec{x}\), we have \(g_{k}(\vec{x})=g_{k}(\vec{p})=0\), hence \(c_{k}\left[g_{k}(\vec{x})-g_{k}(\vec{p})\right]=0\), and so

\[0 \geq F(\vec{x})-F(\vec{p})=f(\vec{x})-f(\vec{p}).\]

Hence \(f\) has a maximum at \(\vec{p},\) with constraints as stated.

The case of a conditional minimum of \(F\) at \(\vec{p}\) is analogous, since again \(\Delta F=\Delta f\).\(\quad \square\)

Example 1

Find the local extrema of

\[f(x, y, z, t)=x+y+z+t\]

on the condition that

\[g(x, y, z, t)=x y z t-a^{4}=0,\]

with \(a>0\) and \(x, y, z, t>0.\) (Note that inequalities do not count as "constraints" in the sense of Theorems 2 and 3.) Here one can simply eliminate \(t=a^{4} /(x y z),\) but it is still easier to use Lagrange's method.

Set \(F(x, y, z, t)=x+y+z+t+c x y z t\). (We drop the constant term \(-c a^{4}\), since it disappears upon differentiation.) Equations (8) then read

\[0=1+c y z t=1+c x z t=1+c x y t=1+c x y z, \quad x y z t-a^{4}=0.\]

Solving for \(x, y, z, t\), and \(c\), we get \(c=-a^{-3}\) and \(x=y=z=t=a\).

Thus \(F(x, y, z, t)=x+y+z+t-x y z t / a^{3},\) and the only critical point is \(\vec{p}=(a, a, a, a).\) (Verify!)
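A quick check of the "(Verify!)" above, as a sketch in exact rational arithmetic: for a few sample values of \(a\) (our own choice), it evaluates the left-hand sides of (8) at \(x=y=z=t=a\), \(c=-a^{-3}\). The helper name `residuals_8` is hypothetical. This only confirms that \(\vec{p}\) is a critical point; uniqueness is not checked here.

```python
from fractions import Fraction


def residuals_8(a):
    """Left-hand sides of (8) for Example 1 at x = y = z = t = a, c = -a**(-3)."""
    c = -1 / a**3
    x = y = z = t = a
    return [1 + c * y * z * t,      # D_x F
            1 + c * x * z * t,      # D_y F
            1 + c * x * y * t,      # D_z F
            1 + c * x * y * z,      # D_t F
            x * y * z * t - a**4]   # the constraint g


# A few arbitrary values of a > 0, in exact rational arithmetic.
samples = [Fraction(1, 2), Fraction(1), Fraction(2), Fraction(7, 3)]
results = {a: residuals_8(a) for a in samples}
print(results)   # every residual is 0
```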

By Theorem 3, one can now explore the sign of \(F(\vec{x})-F(\vec{p})\), where \(\vec{x}=(x, y, z, t)\). For \(\vec{x}\) near \(\vec{p}\), this sign agrees with that of \(d^{2} F(\vec{p} ; \cdot)\) (see the proof of Theorem 2 in §9). We shall carry this out below, using yet another device, which we now explain.

Elimination of dependent differentials. If all partials of \(F\) vanish at \(\vec{p}\) (e.g., if \(\vec{p}\) satisfies (9)), then \(d^{1} F(\vec{p} ; \cdot)=0\) on \(E^{n+m}\) (briefly, \(d F \equiv 0\)).

Conversely, if \(d^{1} f(\vec{p} ; \cdot)=0\) on a globe \(G_{\vec{p}},\) for some function \(f\) of \(n\) independent variables, then

\[D_{k} f(\vec{p})=0, \quad k=1,2, \ldots, n,\]

since \(d^{1} f(\vec{p} ; \cdot)\) is a linear polynomial with coefficients \(D_{k} f(\vec{p})\), and such a polynomial can vanish on a globe only if all its coefficients vanish. (This fails, however, if the variables are interdependent.)

Thus, instead of working with the partials, one can set the differential \(d F\) (or \(d f\)) equal to \(0\). Using the "variable" notation and the invariance of \(d f\) (Note 4 in §4), one then writes \(d x, d y, \ldots\) for the "differentials" of dependent and independent variables alike, and tries to eliminate the differentials of the dependent variables. We now redo Example 1 using this method.

Example 2

With \(f\) and \(g\) as in Example 1, we treat \(t\) as the dependent variable, i.e., an implicit function of \(x, y, z\),

\[t=a^{4} /(x y z)=H(x, y, z),\]

and differentiate the identity \(x y z t-a^{4}=0\) to obtain

\[0=y z t\, d x+x z t\, d y+x y t\, d z+x y z\, d t;\]

dividing by \(x y z\), we get

\[d t=-t\left(\frac{d x}{x}+\frac{d y}{y}+\frac{d z}{z}\right). \tag{12}\]

Substituting this value of \(d t\) in \(d f=d x+d y+d z+d t=0\) (the equation for critical points), we eliminate \(d t\) and find:

\[\left(1-\frac{t}{x}\right) d x+\left(1-\frac{t}{y}\right) d y+\left(1-\frac{t}{z}\right) d z \equiv 0.\]

As \(x, y, z\) are independent variables, this identity implies that the coefficients of \(d x, d y,\) and \(d z\) must vanish, as pointed out above. Thus

\[1-\frac{t}{x}=1-\frac{t}{y}=1-\frac{t}{z}=0.\]

Hence \(x=y=z=t=a\). (Why?) Thus again, the only critical point is \(\vec{p}=(a, a, a, a).\)
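Formula (12) says that \(\partial t / \partial x=-t / x\), and similarly for \(y\) and \(z\). A numerical sketch, assuming the sample value \(a=1.5\) and an arbitrary point of our own choosing, compares these coefficients with central-difference partials of \(H(x, y, z)=a^{4}/(x y z)\) and confirms that the coefficients \(1-t/x\), \(1-t/y\), \(1-t/z\) vanish at \(x=y=z=a\). The helper `central` is hypothetical.

```python
a = 1.5   # sample value of a > 0 (an assumption for illustration)


def H(x, y, z):
    """t as a function of x, y, z on the constraint xyzt = a^4."""
    return a**4 / (x * y * z)


def central(fun, args, i, h=1e-6):
    """Central-difference approximation of the i-th partial of fun at args."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (fun(*up) - fun(*dn)) / (2 * h)


pt = (1.2, 0.9, 1.7)                          # arbitrary point with x, y, z > 0
t = H(*pt)
numeric = [central(H, pt, i) for i in range(3)]
predicted = [-t / pt[i] for i in range(3)]    # coefficients of dx, dy, dz in (12)

# At x = y = z = a we get t = a, so the coefficients 1 - t/x, 1 - t/y, 1 - t/z vanish.
t_at_p = H(a, a, a)
coeffs_at_p = [1 - t_at_p / a] * 3
print(numeric, predicted, coeffs_at_p)
```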

Now, returning to Lagrange's method, we use formula (5) in §5 to compute

\[d^{2} F=-\frac{2}{a}(d x d y+d x d z+d z d t+d x d t+d y d z+d y d t). \tag{13}\]

(Verify!)
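To verify (13), one can compute the second partials of \(F=x+y+z+t-x y z t / a^{3}\) at \(\vec{p}\). The sketch below, assuming the sample value \(a=2\), uses a four-point difference formula in exact rational arithmetic; because \(F\) has degree at most 1 in each variable separately, this formula gives the second partials exactly. The expected Hessian has zeros on the diagonal and \(-1/a\) off it, which, via formula (5) in §5, gives \(-\frac{2}{a}\) times the sum of the six mixed products, as in (13). The helper `second_partial` is hypothetical.

```python
from fractions import Fraction

a = Fraction(2)              # sample value of a > 0 (an assumption)
p = [a, a, a, a]             # the critical point (a, a, a, a)


def F(v):
    """F = x + y + z + t - xyzt/a^3, as found in Example 1."""
    x, y, z, t = v
    return x + y + z + t - x * y * z * t / a**3


def second_partial(func, v, j, k, h=Fraction(1, 10)):
    """Four-point central difference for D_j D_k func at v.

    For a function of degree <= 1 in each variable separately (such as F)
    this formula is exact, so with Fractions the result has no error.
    """
    def shifted(sj, sk):
        w = list(v)
        w[j] += sj * h
        w[k] += sk * h
        return func(w)

    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


hessian = [[second_partial(F, p, j, k) for k in range(4)] for j in range(4)]
for row in hessian:
    print([str(entry) for entry in row])
```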

We shall show that \(d^{2} F\) in (13) is sign-constant near the critical point \(\vec{p}\), subject to \(x y z t=a^{4}\). Indeed, setting \(x=y=z=t=a\) in (12), we get \(d t=-(d x+d y+d z)\), and (13) turns into

\[\begin{aligned} d^{2} F &=-\frac{2}{a}\left[d x\, d y+d x\, d z+d y\, d z-(d x+d y+d z)^{2}\right] \\ &=\frac{1}{a}\left[d x^{2}+d y^{2}+d z^{2}+(d x+d y+d z)^{2}\right]. \end{aligned}\]

This expression is \(>0\) (since \(d x, d y\), and \(d z\) are not all \(0\)). Thus \(f\) has a local conditional minimum at \(\vec{p}=(a, a, a, a)\).
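The last two steps can also be probed numerically. The sketch below (sample value \(a=2\) and random directions, both our own choices) checks that substituting \(d t=-(d x+d y+d z)\) into (13) gives the stated positive form, and that at nearby points of the surface \(x y z t=a^{4}\) one has \(\Delta f>0\) and \(\Delta F=\Delta f\), as in Theorem 3. Random sampling of this kind illustrates the conclusion but proves nothing.

```python
import random

a = 2.0                       # sample value of a > 0 (an assumption)
p = (a, a, a, a)


def f(x, y, z, t):
    return x + y + z + t


def F(x, y, z, t):
    return f(x, y, z, t) - x * y * z * t / a**3


def d2F_at_p(dx, dy, dz, dt):
    """Formula (13): the second differential of F at p = (a, a, a, a)."""
    return -2 / a * (dx * dy + dx * dz + dz * dt + dx * dt + dy * dz + dy * dt)


random.seed(0)
max_mismatch = 0.0            # |(13) after substitution - closed form|
min_closed_form = float("inf")
min_delta_f = float("inf")    # smallest f(x) - f(p) seen on the constraint
max_dF_minus_df = 0.0         # |Delta F - Delta f| on the constraint (Theorem 3)
for _ in range(1000):
    dx, dy, dz = (random.uniform(-1, 1) for _ in range(3))
    dt = -(dx + dy + dz)      # (12) with x = y = z = t = a
    closed = (dx**2 + dy**2 + dz**2 + (dx + dy + dz) ** 2) / a
    max_mismatch = max(max_mismatch, abs(d2F_at_p(dx, dy, dz, dt) - closed))
    min_closed_form = min(min_closed_form, closed)

    # A nearby point on the constraint surface xyzt = a^4.
    s = 1e-2
    x, y, z = a + s * dx, a + s * dy, a + s * dz
    t = a**4 / (x * y * z)
    delta_f = f(x, y, z, t) - f(*p)
    min_delta_f = min(min_delta_f, delta_f)
    max_dF_minus_df = max(max_dF_minus_df, abs((F(x, y, z, t) - F(*p)) - delta_f))

print(max_mismatch, min_closed_form, min_delta_f, max_dF_minus_df)
```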

Caution: here we cannot infer that \(f(\vec{p})\) is the least value of \(f\) under the imposed conditions \(x, y, z>0\) and \(x y z t=a^{4}\).

The simplification due to the Cauchy invariant rule (Note 4 in §4) makes the "variable" notation attractive, though caution is mandatory.

Note 2. When using Theorem 2, it suffices to ascertain that some \(n\) of the equations in (9) admit a unique solution for the \(c_{k}\); for then, by renumbering, one can make these the first \(n\) equations, as was assumed. This means that the \(n \times(n+m)\) matrix \(\left(D_{j} g_{k}(\vec{p}, \vec{q})\right)\) must have rank \(n\), i.e., it must contain an \(n \times n\) submatrix (obtained by deleting \(m\) columns) with a nonzero determinant.
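Note 2 reduces the solvability condition to a rank test. As a sketch (exact arithmetic; the function names and the sample matrix are hypothetical), one can search the \(n \times n\) column submatrices of the \(n \times(n+m)\) matrix \(\left(D_{j} g_{k}\right)\), with rows indexed by \(k\) and columns by \(j\), for one with nonzero determinant; the columns found tell which \(n\) variables to number first. Checking all \(\binom{n+m}{n}\) column subsets is fine for small examples but is not an efficient rank algorithm.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant of a small square matrix by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            sign = -sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n):
                M[r][k] -= factor * M[col][k]
    return sign * result


def nonsingular_columns(J):
    """Return n column indices of the n x (n+m) matrix J whose n x n
    submatrix has nonzero determinant, or None if rank J < n."""
    n, total = len(J), len(J[0])
    for cols in combinations(range(total), n):
        sub = [[row[c] for c in cols] for row in J]
        if det(sub) != 0:
            return cols
    return None


# Hypothetical 2 x 4 matrix (D_j g_k): row k = constraint, column j = variable.
J = [[1, 2, 0, 1],
     [2, 4, 1, 0]]
print(nonsingular_columns(J))   # (0, 2)
```

For the sample matrix, columns 0 and 1 are proportional, so the search returns `(0, 2)`: one would renumber the variables so that the first and third come first.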

In the Problems we often use \(r, s, t, \ldots\) for Lagrange multipliers.