6.9: Local Extrema. Maxima and Minima
We say that \(f : E^{\prime} \rightarrow E^{1}\) has a local maximum (minimum) at \(\vec{p} \in E^{\prime}\) iff \(f(\vec{p})\) is the largest (least) value of \(f\) on some globe \(G\) about \(\vec{p};\) more precisely, iff
\[(\forall \vec{x} \in G-\{\vec{p}\}) \quad \Delta f=f(\vec{x})-f(\vec{p})<0 \quad(>0).\]
We speak of an improper extremum if we only have \(\Delta f \leq 0\) \((\geq 0)\) on \(G.\) In any case, all depends on the sign of \(\Delta f.\)
From Problem 6 in §1, recall the following necessary condition.
Theorem 1. If \(f : E^{\prime} \rightarrow E^{1}\) has a local extremum at \(\vec{p}\) then \(D_{\vec{u}} f(\vec{p})=0\) for all \(\vec{u} \neq \overrightarrow{0}\) in \(E^{\prime}.\)
In the case \(E^{\prime}=E^{n}\left(C^{n}\right),\) this means that \(d^{1} f(\vec{p} ; \cdot)=0\) on \(E^{\prime}\).
(Recall that \(d^{1} f(\vec{p} ; \vec{t})=\sum_{k=1}^{n} D_{k} f(\vec{p}) t_{k}.\) It vanishes if the \(D_{k} f(\vec{p})\) do.)
Note 1. This condition is only necessary, not sufficient. For example, if \(f(x, y)=x y,\) then \(d^{1} f(\overrightarrow{0} ; \cdot)=0 ;\) yet \(f\) has no extremum at \(\overrightarrow{0}.\) (Verify!)
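The verification asked for in Note 1 can be sketched numerically. The snippet below is only an illustration, not a proof; the helper name `takes_both_signs` is an assumption introduced here. It checks that \(f(x, y)=x y\) takes values of both signs inside globes of several radii about \(\overrightarrow{0}\), so that \(f(\overrightarrow{0})=0\) is neither the largest nor the least value on any of them.

```python
def f(x, y):
    """The function from Note 1: f(x, y) = x*y."""
    return x * y


def takes_both_signs(radius):
    """Return True if f is > 0 and < 0 at points within `radius` of the origin."""
    t = radius / 2.0  # (t, t) and (t, -t) have norm t*sqrt(2) < radius
    return f(t, t) > 0 > f(t, -t)


for r in (1.0, 1e-3, 1e-6):
    print(r, takes_both_signs(r))
```

The two sample points lie on the lines \(y=x\) and \(y=-x,\) where \(f\) equals \(t^{2}\) and \(-t^{2},\) respectively.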
Sufficient conditions were given in Theorem 2 of §5, for \(E^{\prime}=E^{1}.\) We now take up \(E^{\prime}=E^{2}.\)
Theorem 2. Let \(f : E^{2} \rightarrow E^{1}\) be of class \(C D^{2}\) on a globe \(G=G_{\vec{p}}(\delta).\) Suppose \(d^{1} f(\vec{p} ; \cdot)=0\) on \(E^{2}.\) Set \(A=D_{11} f(\vec{p}), B=D_{12} f(\vec{p}),\) and \(C=D_{22} f(\vec{p})\).
Then the following statements are true.
(i) If \(A C>B^{2},\) then \(f\) has a maximum or minimum at \(\vec{p},\) according to whether \(A<0\) or \(A>0.\)
(ii) If \(A C<B^{2}, f\) has no extremum at \(\vec{p}\).
The case \(A C=B^{2}\) is unresolved.
Proof.
Let \(\vec{x} \in G\) and \(\vec{u}=\vec{x}-\vec{p} \neq \overrightarrow{0}\).
As \(d^{1} f(\vec{p} ; \cdot)=0,\) Theorem 2 in §5 yields
\[\Delta f=f(\vec{x})-f(\vec{p})=R_{1}=\frac{1}{2} d^{2} f(\vec{s} ; \vec{u}),\]
with \(\vec{s} \in L(\vec{p}, \vec{x}) \subseteq G\) (see Corollary 1 of §5). As \(f \in C D^{2},\) we have \(D_{12} f=D_{21} f\) on \(G\) (Theorem 1 in §5). Thus by formula (4) in §5,
\[\Delta f=\frac{1}{2} d^{2} f(\vec{s} ; \vec{u})=\frac{1}{2}\left[D_{11} f(\vec{s}) u_{1}^{2}+2 D_{12} f(\vec{s}) u_{1} u_{2}+D_{22} f(\vec{s}) u_{2}^{2}\right]. \tag{1}\]
Now, as the partials involved are continuous, we can choose \(G=G_{\vec{p}}(\delta)\) so small that the sign of expression (1) will not change if \(\vec{s}\) is replaced by \(\vec{p}\). Then the crucial sign of \(\Delta f\) on \(G\) coincides with that of
\[D=A u_{1}^{2}+2 B u_{1} u_{2}+C u_{2}^{2} \tag{2}\]
(with \(A, B,\) and \(C\) as stated in the theorem).
From (2) we obtain, by elementary algebra,
\[A D=\left(A u_{1}+B u_{2}\right)^{2}+\left(A C-B^{2}\right) u_{2}^{2}, \tag{3}\]
\[C D=\left(B u_{1}+C u_{2}\right)^{2}+\left(A C-B^{2}\right) u_{1}^{2}. \tag{3'}\]
Clearly, if \(A C>B^{2},\) the right-side expression in (3) is \(>0;\) so \(A D>0\), i.e., \(D\) has the same sign as \(A.\)
Hence if \(A<0,\) we also have \(\Delta f<0\) on \(G,\) and \(f\) has a maximum at \(\vec{p}.\) If \(A>0,\) then \(\Delta f>0,\) and \(f\) has a minimum at \(\vec{p}\).
Now let \(A C<B^{2}\). We claim that no matter how small \(G=G_{\vec{p}}(\delta), \Delta f\) changes sign as \(\vec{x}\) varies in \(G,\) and so \(f\) has no extremum at \(\vec{p}\).
Indeed, we have \(\vec{x}=\vec{p}+\vec{u},\) \(\vec{u}=\left(u_{1}, u_{2}\right) \neq \overrightarrow{0}.\) If \(u_{2}=0,\) (3) shows that \(D\) and \(\Delta f\) have the same sign as \(A\) (provided \(A \neq 0\)).
But if \(u_{2} \neq 0\) and \(u_{1}=-B u_{2} / A\) (assuming \(A \neq 0\)), then \(D\) and \(\Delta f\) have the sign opposite to that of \(A;\) and \(\vec{x}\) is still in \(G\) if \(u_{2}\) is small enough (how small?).
One proceeds similarly if \(C \neq 0\) (interchange \(A\) and \(C,\) as well as \(u_{1}\) and \(u_{2},\) and use (3')).
Finally, if \(A=C=0,\) then by (2), \(D=2 B u_{1} u_{2}\) and \(B \neq 0\) (since \(A C<B^{2})\). Again \(D\) and \(\Delta f\) change sign as \(u_{1} u_{2}\) does; so \(f\) has no extremum at \(\vec{p}.\) Thus all is proved.\(\quad \square\)
Briefly, the proof utilizes the fact that the trinomial (2) is sign-changing iff its discriminant \(B^{2}-A C\) is positive, i.e., \(\left|\begin{array}{cc}{A} & {B} \\ {B} & {C}\end{array}\right|<0\).
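The case distinction of Theorem 2 can be written as a short routine. This is a minimal sketch under stated assumptions: the function name `classify_critical_point` is hypothetical, the caller is assumed to have checked already that the first-order partials vanish at \(\vec{p},\) and the comparisons are exact, so floating-point inputs very close to the borderline \(A C=B^{2}\) would need a tolerance that is not modeled here.

```python
def classify_critical_point(A, B, C):
    """Apply the second-derivative test in E^2 at a critical point p.

    A, B, C stand for D11 f(p), D12 f(p), D22 f(p).
    """
    disc = A * C - B * B
    if disc > 0:
        # AC > B^2 >= 0 forces A != 0, so the sign of A decides.
        return "maximum" if A < 0 else "minimum"
    if disc < 0:
        return "no extremum"
    return "unresolved"  # AC = B^2: the test gives no information


# f(x, y) = x^2 + y^2 at the origin: A = 2, B = 0, C = 2
print(classify_critical_point(2, 0, 2))  # minimum
# f(x, y) = x*y at the origin (Note 1): A = 0, B = 1, C = 0
print(classify_critical_point(0, 1, 0))  # no extremum
# f(x, y) = x^4 + y^4 at the origin: all second-order partials vanish
print(classify_critical_point(0, 0, 0))  # unresolved
```

The last call shows why the borderline case stays open: \(x^{4}+y^{4}\) does have a minimum at the origin, but the second-order data alone cannot detect it.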
Note 2. Functions \(f : C \rightarrow E^{1}\) (of one complex variable) are likewise covered by Theorem 2 if one treats them as functions on \(E^{2}\) (of two real variables).
Functions of \(n\) variables. Here we must rely on the algebraic theory of so-called symmetric quadratic forms, i.e., polynomials \(P : E^{n} \rightarrow E^{1}\) of the form
\[P(\vec{u})=\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} u_{i} u_{j},\]
where \(\vec{u}=\left(u_{1}, \ldots, u_{n}\right) \in E^{n}\) and \(a_{i j}=a_{j i} \in E^{1}\).
We take for granted a theorem due to J. J. Sylvester (see S. Perlis, Theory of Matrices, 1952, p. 197), which may be stated as follows.
Let \(P : E^{n} \rightarrow E^{1}\) be a symmetric quadratic form,
\[P(\vec{u})=\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} u_{i} u_{j}.\]
(i) \(P>0\) on all of \(E^{n}-\{\overrightarrow{0}\}\) iff the following \(n\) determinants \(A_{k}\) are positive:
\[A_{k}=\left|\begin{array}{cccc}{a_{11}} & {a_{12}} & {\dots} & {a_{1 k}} \\ {a_{21}} & {a_{22}} & {\dots} & {a_{2 k}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {a_{k 1}} & {a_{k 2}} & {\dots} & {a_{k k}}\end{array}\right|, \quad k=1,2, \ldots, n. \tag{4}\]
(ii) We have \(P<0\) on \(E^{n}-\{\overrightarrow{0}\}\) iff \((-1)^{k} A_{k}>0\) for \(k=1,2, \ldots, n\).
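The two criteria can be tried out on small matrices. The sketch below is an illustration only: the names `det`, `leading_minors`, and `sylvester_sign` are introduced here, the determinant uses plain cofactor expansion (adequate for small \(n\) but slow for large ones), and exact integer entries are assumed so that the sign comparisons are reliable.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def leading_minors(a):
    """Return [A_1, ..., A_n], the determinants of the upper-left k-by-k blocks."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def sylvester_sign(a):
    """Report which of the criteria (i), (ii) the symmetric matrix `a` satisfies."""
    minors = leading_minors(a)
    if all(m > 0 for m in minors):
        return "positive"  # P > 0 on all nonzero vectors
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative"  # P < 0 on all nonzero vectors
    return "neither"


print(leading_minors([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))  # [2, 3, 4]
print(sylvester_sign([[2, 1], [1, 2]]))      # positive
print(sylvester_sign([[-2, 1], [1, -2]]))    # negative
print(sylvester_sign([[1, 2], [2, 1]]))      # neither
```

The result "neither" only says that both criteria fail; it does not by itself distinguish a sign-changing form from one that vanishes at some nonzero vector without changing sign.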
Now we can extend Theorem 2 to the case \(f : E^{n} \rightarrow E^{1}\). (This will also cover \(f : C^{n} \rightarrow E^{1},\) treated as \(f : E^{2 n} \rightarrow E^{1}.)\) The proof resembles that of Theorem 2.
Theorem 3. Let \(f : E^{n} \rightarrow E^{1}\) be of class \(C D^{2}\) on some \(G=G_{\vec{p}}(\delta).\) Suppose \(d^{1} f(\vec{p} ; \cdot)=0\) on \(E^{n}.\) Define the \(A_{k}\) as in (4), with \(a_{i j}=D_{i j} f(\vec{p})\) for \(i, j, k \leq n.\) Then the following statements hold.
(i) \(f\) has a local minimum at \(\vec{p}\) if \(A_{k}>0\) for \(k=1,2, \ldots, n\).
(ii) \(f\) has a local maximum at \(\vec{p}\) if \((-1)^{k} A_{k}>0\) for \(k=1, \ldots, n\).
(iii) \(f\) has no extremum at \(\vec{p}\) if the expression
\[P(\vec{u})=\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} u_{i} u_{j}\]
is \(>0\) for some \(\vec{u} \in E^{n}\) and \(<0\) for others (i.e., \(P\) changes sign on \(E^{n})\).
Proof.
Let again \(\vec{x} \in G, \vec{u}=\vec{x}-\vec{p} \neq \overrightarrow{0},\) and use Taylor's theorem to obtain
\[\Delta f=f(\vec{x})-f(\vec{p})=R_{1}=\frac{1}{2} d^{2} f(\vec{s} ; \vec{u})=\frac{1}{2} \sum_{j=1}^{n} \sum_{i=1}^{n} D_{i j} f(\vec{s}) u_{i} u_{j}, \tag{5}\]
with \(\vec{s} \in L(\vec{x}, \vec{p})\).
As \(f \in C D^{2},\) the partials \(D_{i j} f\) are continuous on \(G.\) Thus we can make \(G\) so small that the sign of the last double sum does not change if \(\vec{s}\) is replaced by \(\vec{p}\). Hence the sign of \(\Delta f\) on \(G\) is the same as that of \(P(\vec{u})=\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} u_{i} u_{j}\), with the \(a_{i j}\) as stated in the theorem.
The quadratic form \(P\) is symmetric since \(a_{i j}=a_{j i}\) by Theorem 1 in §5. Thus by Sylvester's theorem stated above, one easily obtains our assertions (i) and (ii). Indeed, they are immediate from clauses (i) and (ii) of that theorem.
Now, for (iii), suppose \(P(\vec{u})>0>P(\vec{v}),\) i.e.,
\[\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} u_{i} u_{j}>0>\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} v_{i} v_{j} \quad \text { for some } \vec{u}, \vec{v} \in E^{n}-\{\overrightarrow{0}\}.\]
If here \(\vec{u}\) and \(\vec{v}\) are replaced by \(t \vec{u}\) and \(t \vec{v}(t \neq 0),\) then \(u_{i} u_{j}\) and \(v_{i} v_{j}\) turn into \(t^{2} u_{i} u_{j}\) and \(t^{2} v_{i} v_{j},\) respectively. Hence
\[P(t \vec{u})=t^{2} P(\vec{u})>0>t^{2} P(\vec{v})=P(t \vec{v}).\]
Now, for any \(t \in(0, \delta /|\vec{u}|),\) the point \(\vec{x}=\vec{p}+t \vec{u}\) lies on the \(\vec{u}\)-directed line through \(\vec{p},\) inside \(G=G_{\vec{p}}(\delta).\) (Why?) Similarly for the point \(\vec{x}^{\prime}=\vec{p}+t \vec{v}.\)
Hence for such \(\vec{x}\) and \(\vec{x}^{\prime},\) Taylor's theorem again yields formulas analogous to (5) for some \(\vec{s} \in L(\vec{p}, \vec{x})\) and \(\vec{s}^{\prime} \in L\left(\vec{p}, \vec{x}^{\prime}\right)\) lying on the same two lines. It again follows that for small \(\delta\),
\[f(\vec{x})-f(\vec{p})>0>f\left(\vec{x}^{\prime}\right)-f(\vec{p}),\]
just as \(P(\vec{u})>0>P(\vec{v})\).
Thus \(\Delta f\) changes sign on \(G_{\vec{p}}(\delta),\) and (iii) is proved.\(\quad \square\)
Note 3. Still unresolved are cases in which \(P(\vec{u})\) vanishes for some \(\vec{u} \neq \overrightarrow{0},\) without changing its sign; e.g., \(P(\vec{u})=\left(u_{1}+u_{2}+u_{3}\right)^{2}=0\) for \(\vec{u}=(1,1,-2)\). Then the answer depends on higher-order terms of the Taylor formula. In particular, if \(d^{1} f(\vec{p} ; \cdot)=d^{2} f(\vec{p} ; \cdot)=0\) on \(E^{n},\) then \(\Delta f=R_{2}=\frac{1}{6} d^{3} f(\vec{s} ; \vec{u}),\) with \(\vec{s} \in L(\vec{p}, \vec{x}),\) etc.
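As a small illustration of Note 3 (the two functions below are chosen here for the purpose and are not taken from the text), consider on \(E^{2}\)

\[f(x, y)=x^{2}+y^{4} \quad \text { and } \quad g(x, y)=x^{2}-y^{4}.\]

Both have vanishing first-order partials at \(\overrightarrow{0},\) and both have \(a_{11}=2,\) \(a_{12}=a_{21}=a_{22}=0\) there, hence the same form

\[P(\vec{u})=2 u_{1}^{2} \geq 0, \quad \text { with } P(0,1)=0.\]

Yet \(f(x, y)>0=f(\overrightarrow{0})\) for \((x, y) \neq \overrightarrow{0},\) so \(f\) has a minimum at \(\overrightarrow{0},\) whereas

\[g(x, 0)=x^{2}>0>-y^{4}=g(0, y) \quad(x \neq 0, y \neq 0),\]

so \(g\) has no extremum there. The second-order terms alone cannot tell the two apart.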
Note 4. The largest or least value of \(f\) on a set \(A\) (sometimes called the absolute maximum or minimum) may occur at some noninterior (e.g., boundary) point \(\vec{p} \in A,\) and then fails to be among the local extrema (where, by definition, a globe \(G_{\vec{p}} \subseteq A\) is presupposed). Thus to find absolute extrema, one must also explore the behaviour of \(f\) at noninterior points of \(A.\)
By Theorem 1, local extrema can occur only at so-called critical points \(\vec{p}\), i.e., those at which all directional derivatives vanish (or fail to exist, in which case \(D_{\vec{u}} f(\vec{p})=0\) by convention).
In practice, to find such points in \(E^{n}\left(C^{n}\right),\) one equates the partials \(D_{k} f\) \((k \leq n)\) to \(0.\) Then one uses Theorems 2 and 3 or other considerations to determine whether an extremum really exists.
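The second step of this procedure can be sketched numerically. The code below is a rough illustration under several assumptions: the helper names `hessian` and `second_order_test` are introduced here, the second-order partials are only estimated by central differences with a fixed step `h`, the point passed in is assumed to be a critical point already, and no tolerance is applied to minors that are close to zero.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def hessian(f, p, h=1e-3):
    """Estimate the matrix of second-order partials D_ij f(p) by central differences."""
    n = len(p)

    def shifted(i, di, j, dj):
        q = list(p)
        q[i] += di
        q[j] += dj
        return f(q)

    return [[(shifted(i, h, j, h) - shifted(i, h, j, -h)
              - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4 * h * h)
             for j in range(n)] for i in range(n)]


def second_order_test(f, p):
    """Apply clauses (i) and (ii) of Theorem 3 to the estimated second partials."""
    a = hessian(f, p)
    minors = [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]
    if all(m > 0 for m in minors):
        return "minimum"
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "maximum"
    return "not decided by (i) or (ii)"


def f(v):
    x, y, z = v
    return x * x + y * y + z * z - x * y - y * z


origin = [0.0, 0.0, 0.0]
print(second_order_test(f, origin))
print(second_order_test(lambda v: -f(v), origin))
print(second_order_test(lambda v: v[0] * v[1] - v[2] ** 2, origin))
```

For the sample function, the partials \(2 x-y,\) \(2 y-x-z,\) and \(2 z-y\) all vanish at the origin, so the assumption that it is a critical point holds.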
(A) Find the largest value of
\[f(x, y)=\sin x+\sin y-\sin (x+y)\]
on the set \(A \subseteq E^{2}\) bounded by the lines \(x=0, y=0\) and \(x+y=2 \pi\).
We have
\[D_{1} f(x, y)=\cos x-\cos (x+y) \text { and } D_{2} f(x, y)=\cos y-\cos (x+y).\]
Inside the triangle \(A,\) both partials vanish only at the point \(\left(\frac{2 \pi}{3}, \frac{2 \pi}{3}\right),\) at which \(f=\frac{3}{2} \sqrt{3}.\) On the boundary of \(A\) (i.e., on the lines \(x=0, y=0,\) and \(x+y=2 \pi\)), \(f=0.\) Thus even without using Theorem 2, it is evident that \(f\) attains its largest value,
\[f\left(\frac{2 \pi}{3}, \frac{2 \pi}{3}\right)=\frac{3}{2} \sqrt{3},\]
at this unique critical point.
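The result of Example (A) can be cross-checked by brute force. The sketch below is only a sanity check, not part of the argument; the helper `grid_max` and the grid size are assumptions made here. It evaluates \(f\) on a triangular grid covering \(A\) and compares the largest value found with \(\frac{3}{2} \sqrt{3}.\) The grid size is a multiple of 3 so that \(\left(\frac{2 \pi}{3}, \frac{2 \pi}{3}\right)\) is itself a grid point.

```python
import math


def f(x, y):
    return math.sin(x) + math.sin(y) - math.sin(x + y)


def grid_max(n=300):
    """Largest value of f over grid points of the triangle A, with its location."""
    best = (-math.inf, 0.0, 0.0)
    h = 2 * math.pi / n
    for i in range(n + 1):
        for j in range(n + 1 - i):  # keeps x + y <= 2*pi
            x, y = i * h, j * h
            best = max(best, (f(x, y), x, y))
    return best


value, x, y = grid_max()
print(value, x, y)
print(1.5 * math.sqrt(3), 2 * math.pi / 3)
```

Both printed lines should agree to within rounding error.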
(B) Find the largest and the least value of
\[f(x, y, z)=a^{2} x^{2}+b^{2} y^{2}+c^{2} z^{2}-\left(a x^{2}+b y^{2}+c z^{2}\right)^{2},\]
on the condition that \(x^{2}+y^{2}+z^{2}=1\) and \(a>b>c>0\).
As \(z^{2}=1-x^{2}-y^{2},\) we can eliminate \(z\) from \(f(x, y, z)\) and replace \(f\) by \(F : E^{2} \rightarrow E^{1}:\)
\[F(x, y)=\left(a^{2}-c^{2}\right) x^{2}+\left(b^{2}-c^{2}\right) y^{2}+c^{2}-\left[(a-c) x^{2}+(b-c) y^{2}+c\right]^{2}.\]
(Explain!) For \(F,\) we seek the extrema on the disc \(\overline{G}=\overline{G}_{0}(1) \subset E^{2},\) where \(x^{2}+y^{2} \leq 1\) (so as not to violate the condition \(x^{2}+y^{2}+z^{2}=1)\).
Equating to 0 the two partials
\[\begin{array}{l}{D_{1} F(x, y)=2 x(a-c)\left\{(a+c)-2\left[(a-c) x^{2}+(b-c) y^{2}+c\right]\right\}=0,} \\ {D_{2} F(x, y)=2 y(b-c)\left\{(b+c)-2\left[(a-c) x^{2}+(b-c) y^{2}+c\right]\right\}=0}\end{array}\]
and solving this system of equations, we find these critical points inside \(G:\)
(1) \(x=y=0\) (\(F=0)\);
(2) \(x=0, y=\pm 2^{-\frac{1}{2}}\left(F=\frac{1}{4}(b-c)^{2}\right);\) and
(3) \(x=\pm 2^{-\frac{1}{2}}, y=0\left(F=\frac{1}{4}(a-c)^{2}\right)\).
(Verify!)
Now, for the boundary of \(\overline{G},\) i.e., the circle \(x^{2}+y^{2}=1,\) repeat this process: substitute \(y^{2}=1-x^{2}\) in the formula for \(F(x, y),\) thus reducing it to
\[h(x)=\left(a^{2}-b^{2}\right) x^{2}+b^{2}-\left[(a-b) x^{2}+b\right]^{2}, \quad h : E^{1} \rightarrow E^{1},\]
on the interval \([-1,1] \subset E^{1}.\) In \((-1,1)\) the derivative
\[h^{\prime}(x)=2(a-b)^{2} x\left(1-2 x^{2}\right)\]
vanishes only when
(4) \(x=0\) (\(h=0),\) and
(5) \(x=\pm 2^{-\frac{1}{2}}\left(h=\frac{1}{4}(a-b)^{2}\right)\).
Finally, at the endpoints of \([-1,1],\) we have
(6) \(x=\pm 1\) (\(h=0)\).
Comparing the resulting function values in all six cases, we conclude that the least of them is \(0,\) while the largest is \(\frac{1}{4}(a-c)^{2}.\) These are the desired least and largest values of \(f,\) subject to the conditions stated. They are attained, respectively, at the points
\[(0,0, \pm 1),(0, \pm 1,0),( \pm 1,0,0), \text { and }\left( \pm 2^{-\frac{1}{2}}, 0, \pm 2^{-\frac{1}{2}}\right).\]
Again, the use of Theorems 2 and 3 was redundant. However, we suggest as an exercise that the reader test the critical points of \(F\) by using Theorem 2.
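The values found in Example (B) can also be checked numerically for a particular choice of constants. The sketch below is an illustration only: the sample values \(a=3, b=2, c=1,\) the helper `sphere_extremes`, and the grid resolution are assumptions made here. It scans a latitude-longitude grid on the unit sphere and compares the extreme values of \(f\) with \(0\) and \(\frac{1}{4}(a-c)^{2}.\)

```python
import math


def f(x, y, z, a, b, c):
    q = a * x * x + b * y * y + c * z * z
    return a * a * x * x + b * b * y * y + c * c * z * z - q * q


def sphere_extremes(a, b, c, n=180):
    """Scan a latitude-longitude grid on the unit sphere; return (min, max) of f."""
    lo, hi = math.inf, -math.inf
    for i in range(n + 1):
        theta = math.pi * i / n  # polar angle in [0, pi]
        for j in range(2 * n):
            phi = math.pi * j / n  # azimuth in [0, 2*pi)
            x = math.sin(theta) * math.cos(phi)
            y = math.sin(theta) * math.sin(phi)
            z = math.cos(theta)
            v = f(x, y, z, a, b, c)
            lo, hi = min(lo, v), max(hi, v)
    return lo, hi


a, b, c = 3.0, 2.0, 1.0  # sample values with a > b > c > 0
lo, hi = sphere_extremes(a, b, c)
print(lo, hi, 0.25 * (a - c) ** 2)
```

With \(n=180\) the grid contains the points \(\left( \pm 2^{-\frac{1}{2}}, 0, \pm 2^{-\frac{1}{2}}\right)\) up to rounding, so the largest grid value should agree with \(\frac{1}{4}(a-c)^{2}\) to within rounding error.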
Caution. Theorems 1 to 3 apply to functions of independent variables only. In Example (B), \(x, y, z\) were made interdependent by the imposed equation
\[x^{2}+y^{2}+z^{2}=1\]
(which geometrically limits all to the surface of \(G_{\overrightarrow{0}}(1)\) in \(E^{3}),\) so that one of them, \(z,\) could be eliminated. Only then can Theorems 1 to 3 be used.