
5.1: Structure of Rn


In this chapter we study functions defined on subsets of the real $n$-dimensional space $\R^n$, which consists of all ordered $n$-tuples $\mathbf{X}=(x_1,x_2,\dots,x_n)$ of real numbers, called the \emph{coordinates} or \emph{components} of $\mathbf{X}$. This space is sometimes called \emph{Euclidean $n$-space}.

In this section we introduce an algebraic structure for $\R^n$. We also consider its \emph{topological} properties; that is, properties that can be described in terms of a special class of subsets, the neighborhoods in $\R^n$. In Section~1.3 we studied the topological properties of $\R^1$, which we will continue to denote simply as $\R$. Most of the definitions and proofs in Section~1.3 were stated in terms of neighborhoods in $\R$. We will see that they carry over to $\R^n$ if the concept of neighborhood in $\R^n$ is suitably defined.

Members of $\R$ have dual interpretations: geometric, as points on the real line, and algebraic, as real numbers. We assume that you are familiar with the geometric interpretation of members of $\R^2$ and $\R^3$ as the rectangular coordinates of points in a plane and three-dimensional space, respectively. Although $\R^n$ cannot be visualized geometrically if $n\ge4$, geometric ideas from $\R$, $\R^2$, and $\R^3$ often help us to interpret the properties of $\R^n$ for arbitrary $n$.

As we said in Section~1.3, the idea of neighborhood is always associated with some definition of ``closeness'' of points. The following definition imposes an algebraic structure on $\R^n$, in terms of which the distance between two points can be defined in a natural way. In addition, this algebraic structure will be useful later for other purposes.


Note that ``$+$'' has two distinct meanings in \eqref{eq:5.1.1}: on the left, ``$+$'' stands for the newly defined addition of members of $\R^n$ and, on the right, for addition of real numbers. However, this can never lead to confusion, since the meaning of ``$+$'' can always be deduced from the symbols on either side of it. A similar comment applies to the use of juxtaposition to indicate scalar multiplication on the left and multiplication of real numbers on the right.

We leave the proof of the following theorem to you (Exercise~).

Clearly, $\mathbf{0}=(0,0,\dots,0)$ and, if $\mathbf{X}=(x_1,x_2,\dots,x_n)$, then $-\mathbf{X}=(-x_1,-x_2,\dots,-x_n)$. We write $\mathbf{X}+(-\mathbf{Y})$ as $\mathbf{X}-\mathbf{Y}$. The point $\mathbf{0}$ is called the \emph{origin}.

A nonempty set $V=\{\mathbf{X},\mathbf{Y},\mathbf{Z},\dots\}$, together with rules such as those just defined, one associating a unique member of $V$ with every ordered pair of its members, and one associating a unique member of $V$ with every real number and member of $V$, is said to be a \emph{vector space} if it has the properties listed in Theorem~. The members of a vector space are called \emph{vectors}. When we wish to emphasize that we are regarding a member of $\R^n$ as part of this algebraic structure, we will speak of it as a vector; otherwise, we will speak of it as a point.

If $n=1$, this definition of length reduces to the familiar absolute value, and the distance between two points is the length of the interval having them as endpoints; for $n=2$ and $n=3$, the length and distance of Definition~ reduce to the familiar definitions for the plane and three-dimensional space.
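For example, if $\mathbf{X}=(1,2,2)$ and $\mathbf{Y}=(3,0,1)$ in $\R^3$, then
\[
|\mathbf{X}|=\sqrt{1^2+2^2+2^2}=3\mbox{\quad and\quad}|\mathbf{X}-\mathbf{Y}|=|(-2,2,1)|=\sqrt{4+4+1}=3.
\]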

If $\mathbf{Y}=\mathbf{0}$, then both sides of the inequality are $0$, so it holds, with equality. In this case, $\mathbf{Y}=0\mathbf{X}$. Now suppose that $\mathbf{Y}\ne\mathbf{0}$ and $t$ is any real number. Then
\[
\begin{array}{rcl}
0\ar\le\dst\sum_{i=1}^n(x_i-ty_i)^2\\[2\jot]
\ar=\dst\sum_{i=1}^n x_i^2-2t\sum_{i=1}^n x_iy_i+t^2\sum_{i=1}^n y_i^2\\[2\jot]
\ar=|\mathbf{X}|^2-2(\mathbf{X}\cdot\mathbf{Y})t+t^2|\mathbf{Y}|^2.
\end{array}
\]
The last expression is a second-degree polynomial $p$ in $t$. From the quadratic formula, the zeros of $p$ are
\[
t=\frac{(\mathbf{X}\cdot\mathbf{Y})\pm\sqrt{(\mathbf{X}\cdot\mathbf{Y})^2-|\mathbf{X}|^2|\mathbf{Y}|^2}}{|\mathbf{Y}|^2}.
\]
Hence,
\[
(\mathbf{X}\cdot\mathbf{Y})^2\le|\mathbf{X}|^2|\mathbf{Y}|^2,
\]
because if not, then $p$ would have two distinct real zeros and therefore be negative between them (Figure~), contradicting the inequality $p(t)\ge0$. Taking square roots yields Schwarz's inequality if $\mathbf{Y}\ne\mathbf{0}$.

If $\mathbf{X}=t\mathbf{Y}$, then $|\mathbf{X}\cdot\mathbf{Y}|=|\mathbf{X}|\,|\mathbf{Y}|=|t|\,|\mathbf{Y}|^2$ (verify), so equality holds in Schwarz's inequality. Conversely, if equality holds, then $p$ has the real zero $t_0=(\mathbf{X}\cdot\mathbf{Y})/|\mathbf{Y}|^2$, and $\sum_{i=1}^n(x_i-t_0y_i)^2=0$; therefore, $\mathbf{X}=t_0\mathbf{Y}$.


By definition,
\begin{equation}\label{eq:5.1.7}
\begin{array}{rcl}
|\mathbf{X}+\mathbf{Y}|^2\ar=\dst\sum^n_{i=1}(x_i+y_i)^2=\sum^n_{i=1}x^2_i+2\sum^n_{i=1}x_iy_i+\sum^n_{i=1}y^2_i\\[4\jot]
\ar=|\mathbf{X}|^2+2(\mathbf{X}\cdot\mathbf{Y})+|\mathbf{Y}|^2\\[2\jot]
\ar\le|\mathbf{X}|^2+2|\mathbf{X}|\,|\mathbf{Y}|+|\mathbf{Y}|^2\mbox{\quad(by Schwarz's inequality)}\\[2\jot]
\ar=(|\mathbf{X}|+|\mathbf{Y}|)^2.
\end{array}
\end{equation}
Hence, $|\mathbf{X}+\mathbf{Y}|^2\le(|\mathbf{X}|+|\mathbf{Y}|)^2$. Taking square roots yields the triangle inequality.

From the third line of \eqref{eq:5.1.7}, equality holds if and only if $\mathbf{X}\cdot\mathbf{Y}=|\mathbf{X}|\,|\mathbf{Y}|$, which is true if and only if one of the vectors $\mathbf{X}$ and $\mathbf{Y}$ is a nonnegative scalar multiple of the other (Lemma~).
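For example, if $\mathbf{X}=(1,2,2)$ and $\mathbf{Y}=(2,2,1)$, then $\mathbf{X}\cdot\mathbf{Y}=8<9=|\mathbf{X}|\,|\mathbf{Y}|$ and
\[
|\mathbf{X}+\mathbf{Y}|=|(3,4,3)|=\sqrt{34}<6=|\mathbf{X}|+|\mathbf{Y}|;
\]
the inequalities are strict because neither vector is a nonnegative scalar multiple of the other.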

Write $\mathbf{X}-\mathbf{Z}=(\mathbf{X}-\mathbf{Y})+(\mathbf{Y}-\mathbf{Z})$, and apply Theorem~ with $\mathbf{X}$ and $\mathbf{Y}$ replaced by $\mathbf{X}-\mathbf{Y}$ and $\mathbf{Y}-\mathbf{Z}$.

Since $\mathbf{X}=\mathbf{Y}+(\mathbf{X}-\mathbf{Y})$, Theorem~ implies that $|\mathbf{X}|\le|\mathbf{Y}|+|\mathbf{X}-\mathbf{Y}|$, which is equivalent to $|\mathbf{X}|-|\mathbf{Y}|\le|\mathbf{X}-\mathbf{Y}|$. Interchanging $\mathbf{X}$ and $\mathbf{Y}$ yields $|\mathbf{Y}|-|\mathbf{X}|\le|\mathbf{Y}-\mathbf{X}|$. Since $|\mathbf{X}-\mathbf{Y}|=|\mathbf{Y}-\mathbf{X}|$, the last two inequalities imply the stated conclusion.

The next theorem lists properties of length, distance, and inner product that follow directly from Definitions~ and . We leave the proof to you (Exercise~).

The equation of a line through a point $\mathbf{X}_0=(x_0,y_0,z_0)$ in $\R^3$ can be written parametrically as
\[
x=x_0+u_1t,\quad y=y_0+u_2t,\quad z=z_0+u_3t,\quad -\infty<t<\infty,
\]
where $u_1$, $u_2$, and $u_3$ are not all zero. We write this in vector form as $\mathbf{X}=\mathbf{X}_0+t\mathbf{U}$, $-\infty<t<\infty$, with $\mathbf{U}=(u_1,u_2,u_3)$, and we say that the line is \emph{through $\mathbf{X}_0$ in the direction of $\mathbf{U}$}.

There are many ways to represent a given line parametrically. For example, $\mathbf{X}=\mathbf{X}_0+s\mathbf{V}$, $-\infty<s<\infty$, represents the same line if and only if $\mathbf{V}=a\mathbf{U}$ for some nonzero real number $a$. Then the line is traversed in the same direction as $s$ and $t$ vary from $-\infty$ to $\infty$ if $a>0$, or in opposite directions if $a<0$.

To write the parametric equation of a line through two points $\mathbf{X}_0$ and $\mathbf{X}_1$ in $\R^3$, we take $\mathbf{U}=\mathbf{X}_1-\mathbf{X}_0$, which yields
\[
\mathbf{X}=\mathbf{X}_0+t(\mathbf{X}_1-\mathbf{X}_0)=t\mathbf{X}_1+(1-t)\mathbf{X}_0,\quad -\infty<t<\infty.
\]
The line segment from $\mathbf{X}_0$ to $\mathbf{X}_1$ consists of those points for which $0\le t\le1$.
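For example, the line through $\mathbf{X}_0=(1,0,2)$ and $\mathbf{X}_1=(3,1,1)$ in $\R^3$ is given by
\[
\mathbf{X}=(1,0,2)+t(2,1,-1),\quad -\infty<t<\infty,
\]
and the segment from $\mathbf{X}_0$ to $\mathbf{X}_1$ corresponds to $0\le t\le1$.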

These familiar notions can be generalized to $\R^n$, as follows:

Having defined distance in $\R^n$, we are now able to say what we mean by a neighborhood of a point in $\R^n$.

An $\epsilon$-neighborhood of a point $\mathbf{X}_0$ in $\R^2$ is the inside, but not the circumference, of the circle of radius $\epsilon$ about $\mathbf{X}_0$. In $\R^3$ it is the inside, but not the surface, of the sphere of radius $\epsilon$ about $\mathbf{X}_0$.

In Section~1.3 we stated several other definitions in terms of $\epsilon$-neighborhoods: \emph{interior point}, \emph{interior}, \emph{open set}, \emph{closed set}, \emph{limit point}, \emph{boundary point}, \emph{boundary}, \emph{closure}, \emph{isolated point}, \emph{exterior point}, \emph{exterior}, and \emph{deleted neighborhood}. Since these definitions are the same for $\R^n$ as for $\R$, we will not repeat them. We advise you to read them again in Section~1.3, substituting $\R^n$ for $\R$ and $\mathbf{X}_0$ for $x_0$.


Open and closed $n$-balls are generalizations to $\R^n$ of open and closed intervals.

The following lemma will be useful later in this section, when we consider connected sets.

The line segment is given by
\[
\mathbf{X}=t\mathbf{X}_2+(1-t)\mathbf{X}_1,\quad 0<t<1.
\]
Suppose that $r>0$. If $|\mathbf{X}_1-\mathbf{X}_0|<r$, $|\mathbf{X}_2-\mathbf{X}_0|<r$, and $0<t<1$, then
\[
\begin{array}{rcl}
|\mathbf{X}-\mathbf{X}_0|\ar=|t\mathbf{X}_2+(1-t)\mathbf{X}_1-t\mathbf{X}_0-(1-t)\mathbf{X}_0|\\[2\jot]
\ar=|t(\mathbf{X}_2-\mathbf{X}_0)+(1-t)(\mathbf{X}_1-\mathbf{X}_0)|\\[2\jot]
\ar\le t|\mathbf{X}_2-\mathbf{X}_0|+(1-t)|\mathbf{X}_1-\mathbf{X}_0|\\[2\jot]
\ar<tr+(1-t)r=r.
\end{array}
\]

The proofs in Section~1.3 of Theorem~ (the union of open sets is open, the intersection of closed sets is closed) and Theorem~ and its Corollary~ (a set is closed if and only if it contains all its limit points) are also valid in $\R^n$. You should reread them now.

The Heine–Borel theorem (Theorem~) also holds in $\R^n$, but the proof in Section~1.3 is valid only for $n=1$. To prove the Heine–Borel theorem for general $n$, we need some preliminary definitions and results that are of interest in their own right.

The next two theorems follow from this, the definition of distance in $\R^n$, and what we already know about convergence in $\R$. We leave the proofs to you (Exercises~ and ~).

The next definition generalizes the definition of the diameter of a circle or sphere.

Let $\{\mathbf{X}_r\}$ be a sequence such that $\mathbf{X}_r\in S_r$ $(r\ge1)$. Since $S_{k+1}\subset S_k$, we have $\mathbf{X}_r\in S_k$ if $r\ge k$, so
\[
|\mathbf{X}_r-\mathbf{X}_s|<d(S_k)\mbox{\quad if\quad}r,s\ge k.
\]
From this and Theorem~, $\{\mathbf{X}_r\}$ converges to a limit $\overline{\mathbf{X}}$. Since $\overline{\mathbf{X}}$ is a limit point of every $S_k$ and every $S_k$ is closed, $\overline{\mathbf{X}}$ is in every $S_k$ (Corollary~). Therefore, $\overline{\mathbf{X}}\in I$, so $I\ne\emptyset$. Moreover, $\overline{\mathbf{X}}$ is the only point in $I$, since if $\mathbf{Y}\in I$, then
\[
|\overline{\mathbf{X}}-\mathbf{Y}|\le d(S_k),\quad k\ge1,
\]
and $d(S_k)\to0$ as $k\to\infty$ implies that $\mathbf{Y}=\overline{\mathbf{X}}$.
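For example, if $S_k=\set{\mathbf{X}}{|\mathbf{X}|\le1/k}$ in $\R^n$, then $S_{k+1}\subset S_k$, each $S_k$ is closed and nonempty, and $d(S_k)=2/k\to0$; here $I=\bigcap_{k=1}^\infty S_k=\{\mathbf{0}\}$, as the theorem requires.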

We can now prove the Heine–Borel theorem for $\R^n$. This theorem concerns \emph{compact} sets. As in $\R$, a compact set in $\R^n$ is a closed and bounded set.

Recall that a collection $\mathcal{H}$ of open sets is an open covering of a set $S$ if $S\subset\bigcup\set{H}{H\in\mathcal{H}}$.

The proof is by contradiction. We first consider the case where $n=2$, so that you can visualize the method. Suppose that there is a covering $\mathcal{H}$ for $S$ from which it is impossible to select a finite subcovering. Since $S$ is bounded, $S$ is contained in a closed square
\[
T=\set{(x,y)}{a_1\le x\le a_1+L,\ a_2\le y\le a_2+L}
\]
with sides of length $L$ (Figure~).


Bisecting the sides of $T$ as shown by the dashed lines in Figure~ leads to four closed squares, $T^{(1)}$, $T^{(2)}$, $T^{(3)}$, and $T^{(4)}$, with sides of length $L/2$. Let
\[
S^{(i)}=S\cap T^{(i)},\quad 1\le i\le4.
\]
Each $S^{(i)}$, being the intersection of closed sets, is closed, and
\[
S=\bigcup_{i=1}^4 S^{(i)}.
\]
Moreover, $\mathcal{H}$ covers each $S^{(i)}$, but at least one $S^{(i)}$ cannot be covered by any finite subcollection of $\mathcal{H}$, since if all the $S^{(i)}$ could be, then so could $S$. Let $S_1$ be a set with this property, chosen from $S^{(1)}$, $S^{(2)}$, $S^{(3)}$, and $S^{(4)}$. We are now back to the situation we started from: a compact set $S_1$ covered by $\mathcal{H}$, but not by any finite subcollection of $\mathcal{H}$. However, $S_1$ is contained in a square $T_1$ with sides of length $L/2$ instead of $L$. Bisecting the sides of $T_1$ and repeating the argument, we obtain a subset $S_2$ of $S_1$ that has the same properties as $S$, except that it is contained in a square with sides of length $L/4$. Continuing in this way produces a sequence of nonempty closed sets $S_0\,(=S)$, $S_1$, $S_2$, \dots, such that $S_k\supset S_{k+1}$ and $d(S_k)\le L/2^{k-1/2}$ $(k\ge0)$. From Theorem~, there is a point $\overline{\mathbf{X}}$ in $\bigcap_{k=1}^\infty S_k$. Since $\overline{\mathbf{X}}\in S$, there is an open set $H$ in $\mathcal{H}$ that contains $\overline{\mathbf{X}}$, and this $H$ must also contain some $\epsilon$-neighborhood of $\overline{\mathbf{X}}$. Since every $\mathbf{X}$ in $S_k$ satisfies the inequality
\[
|\mathbf{X}-\overline{\mathbf{X}}|\le 2^{-k+1/2}L,
\]
it follows that $S_k\subset H$ for $k$ sufficiently large. This contradicts our assumption on $\mathcal{H}$, which led us to believe that no $S_k$ could be covered by a finite number of sets from $\mathcal{H}$. Consequently, this assumption must be false: $\mathcal{H}$ must have a finite subcollection that covers $S$. This completes the proof for $n=2$.

The idea of the proof is the same for $n>2$. The counterpart of the square $T$ is the \emph{hypercube} with sides of length $L$:
\[
T=\set{(x_1,x_2,\dots,x_n)}{a_i\le x_i\le a_i+L,\ i=1,2,\dots,n}.
\]
Halving the intervals of variation of the $n$ coordinates $x_1$, $x_2$, \dots, $x_n$ divides $T$ into $2^n$ closed hypercubes with sides of length $L/2$:
\[
T^{(i)}=\set{(x_1,x_2,\dots,x_n)}{b_i\le x_i\le b_i+L/2,\ 1\le i\le n},
\]
where $b_i=a_i$ or $b_i=a_i+L/2$. If no finite subcollection of $\mathcal{H}$ covers $S$, then at least one of these smaller hypercubes must contain a subset of $S$ that is not covered by any finite subcollection of $\mathcal{H}$. Now the proof proceeds as for $n=2$.

The Bolzano–Weierstrass theorem is valid in $\R^n$; its proof is the same as in $\R$.

Although it is legitimate to consider functions defined on arbitrary domains, we restricted our study of functions of one variable mainly to functions defined on intervals. There are good reasons for this. If we wish to raise questions of continuity and differentiability at every point of the domain $D$ of a function $f$, then every point of $D$ must be a limit point of $D^0$. Intervals have this property. Moreover, the definition of $\int_a^b f(x)\,dx$ is obviously applicable only if $f$ is defined on $[a,b]$.

It is not productive to consider questions of continuity and differentiability of functions defined on the union of disjoint intervals, since many important results simply do not hold for such domains. For example, the intermediate value theorem (Theorem~; see also Exercise~) says that if $f$ is continuous on an interval $I$ and $f(x_1)<\mu<f(x_2)$ for some $x_1$ and $x_2$ in $I$, then $f(\overline{x})=\mu$ for some $\overline{x}$ in $I$. Theorem~ says that $f$ is constant on an interval $I$ if $f'\equiv0$ on $I$. Neither of these results holds if $I$ is the union of disjoint intervals rather than a single interval; thus, if $f$ is defined on $I=(0,1)\cup(2,3)$ by
\[
f(x)=\left\{\begin{array}{ll}1,&0<x<1,\\0,&2<x<3,\end{array}\right.
\]
then $f$ is continuous on $I$, but does not assume any value between $0$ and $1$, and $f'\equiv0$ on $I$, but $f$ is not constant.

It is not difficult to see why these results fail to hold for this function: the domain of f consists of two disconnected pieces. It would be more sensible to regard f as two entirely different functions, one defined on (0,1) and the other on (2,3). The two results mentioned are valid for each of these functions.

As we will see when we study functions defined on subsets of $\R^n$, considerations like those just cited as making it natural to consider functions defined on intervals in $\R$ lead us to single out a preferred class of subsets as domains of functions of $n$ variables. These subsets are called \emph{regions}. To define this term, we first need the following definition.


If $\mathbf{X}_1,\mathbf{X}_2,\dots,\mathbf{X}_k$ are points in $\R^n$ and $L_i$ is the line segment from $\mathbf{X}_i$ to $\mathbf{X}_{i+1}$, $1\le i\le k-1$, we say that $L_1$, $L_2$, \dots, $L_{k-1}$ form a \emph{polygonal path} from $\mathbf{X}_1$ to $\mathbf{X}_k$, and that $\mathbf{X}_1$ and $\mathbf{X}_k$ are \emph{connected} by the polygonal path. For example, Figure~ shows a polygonal path in $\R^2$ connecting $(0,0)$ to $(3,3)$. A set $S$ is \emph{polygonally connected} if every pair of points in $S$ can be connected by a polygonal path lying entirely in $S$.

For sufficiency, we will show that if $S$ is disconnected, then $S$ is not polygonally connected. Let $S=A\cup B$, where $A$ and $B$ are as in the definition of a disconnected set. Suppose that $\mathbf{X}_1\in A$ and $\mathbf{X}_2\in B$, and assume that there is a polygonal path in $S$ connecting $\mathbf{X}_1$ to $\mathbf{X}_2$. Then some line segment $L$ in this path must contain a point $\mathbf{Y}_1$ in $A$ and a point $\mathbf{Y}_2$ in $B$. The line segment
\[
\mathbf{X}=t\mathbf{Y}_2+(1-t)\mathbf{Y}_1,\quad 0\le t\le1,
\]
is part of $L$ and therefore in $S$. Now define
\[
\rho=\sup\set{\tau}{t\mathbf{Y}_2+(1-t)\mathbf{Y}_1\in A,\ 0\le t\le\tau\le1},
\]
and let $\mathbf{X}_\rho=\rho\mathbf{Y}_2+(1-\rho)\mathbf{Y}_1$. Then $\mathbf{X}_\rho\in\overline{A}\cap\overline{B}$. However, since $\mathbf{X}_\rho\in A\cup B$ and $\overline{A}\cap B=A\cap\overline{B}=\emptyset$, this is impossible. Therefore, the assumption that there is a polygonal path in $S$ from $\mathbf{X}_1$ to $\mathbf{X}_2$ must be false.

For necessity, suppose that $S$ is a connected open set and $\mathbf{X}_0\in S$. Let $A$ be the set consisting of $\mathbf{X}_0$ and the points in $S$ that can be connected to $\mathbf{X}_0$ by polygonal paths in $S$. Let $B$ be the set of points in $S$ that cannot be connected to $\mathbf{X}_0$ by polygonal paths. If $\mathbf{Y}_0\in S$, then $S$ contains an $\epsilon$-neighborhood $N_\epsilon(\mathbf{Y}_0)$ of $\mathbf{Y}_0$, since $S$ is open. Any point $\mathbf{Y}_1$ in $N_\epsilon(\mathbf{Y}_0)$ can be connected to $\mathbf{Y}_0$ by the line segment
\[
\mathbf{X}=t\mathbf{Y}_1+(1-t)\mathbf{Y}_0,\quad 0\le t\le1,
\]
which lies in $N_\epsilon(\mathbf{Y}_0)$ (Lemma~) and therefore in $S$. This implies that $\mathbf{Y}_0$ can be connected to $\mathbf{X}_0$ by a polygonal path in $S$ if and only if every member of $N_\epsilon(\mathbf{Y}_0)$ can also. Thus, $N_\epsilon(\mathbf{Y}_0)\subset A$ if $\mathbf{Y}_0\in A$, and $N_\epsilon(\mathbf{Y}_0)\subset B$ if $\mathbf{Y}_0\in B$. Therefore, $A$ and $B$ are open. Since $A\cap B=\emptyset$, this implies that $A\cap\overline{B}=\overline{A}\cap B=\emptyset$ (Exercise~). Since $A$ is nonempty ($\mathbf{X}_0\in A$), it now follows that $B=\emptyset$, since if $B\ne\emptyset$, $S$ would be disconnected (Definition~). Therefore, $A=S$, which completes the proof of necessity.

We did not use the assumption that S is open in the proof of sufficiency. In fact, we actually proved that any polygonally connected set, open or not, is connected. The converse is false. A set (not open) may be connected but not polygonally connected (Exercise~).

Our study of functions on \Rn will deal mostly with functions whose domains are regions, defined next.

From Definition~, a sequence $\{\mathbf{X}_r\}$ of points in $\R^n$ converges to a limit $\overline{\mathbf{X}}$ if and only if for every $\epsilon>0$ there is an integer $K$ such that
\[
|\mathbf{X}_r-\overline{\mathbf{X}}|<\epsilon\mbox{\quad if\quad}r\ge K.
\]
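For example, the sequence $\{\mathbf{X}_r\}$ in $\R^2$ with $\mathbf{X}_r=(1/r,\,r/(r+1))$ converges to $\overline{\mathbf{X}}=(0,1)$, since
\[
|\mathbf{X}_r-\overline{\mathbf{X}}|=\sqrt{\frac{1}{r^2}+\frac{1}{(r+1)^2}}<\frac{\sqrt2}{r}.
\]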

The $\R^n$ definitions of divergence, boundedness, subsequence, and sums, differences, and constant multiples of sequences are analogous to those given in Sections~4.1 and 4.2 for the case where $n=1$. Since $\R^n$ is not ordered for $n>1$, monotonicity, limits inferior and superior of sequences in $\R^n$, and divergence to $\pm\infty$ are undefined for $n>1$. Products and quotients of members of $\R^n$ are also undefined if $n>1$.


Several theorems from Sections~4.1 and 4.2 remain valid for sequences in $\R^n$, with proofs unchanged, provided that ``$|\cdot|$'' is interpreted as distance in $\R^n$. (A trivial change is required: the subscript $n$, used in Sections~4.1 and 4.2 to identify the terms of the sequence, must be replaced, since $n$ here stands for the dimension of the space.) These include Theorems~ (uniqueness of the limit), (boundedness of a convergent sequence), parts of (concerning limits of sums, differences, and constant multiples of convergent sequences), and (every subsequence of a convergent sequence converges to the limit of the sequence).


We now study real-valued functions of $n$ variables. We denote the domain of a function $f$ by $D_f$ and the value of $f$ at a point $\mathbf{X}=(x_1,x_2,\dots,x_n)$ by $f(\mathbf{X})$ or $f(x_1,x_2,\dots,x_n)$. We continue the convention adopted in Section~2.1 for functions of one variable: if a function is defined by a formula such as
\[
f(\mathbf{X})=(1-x_1^2-x_2^2-\cdots-x_n^2)^{1/2}
\]
or
\[
g(\mathbf{X})=(1-x_1^2-x_2^2-\cdots-x_n^2)^{-1}
\]
without specification of its domain, it is to be understood that its domain is the largest subset of $\R^n$ for which the formula defines a unique real number. Thus, in the absence of any other stipulation, the domain of $f$ is the closed $n$-ball $\set{\mathbf{X}}{|\mathbf{X}|\le1}$, while the domain of $g$ is the set $\set{\mathbf{X}}{|\mathbf{X}|\ne1}$.
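Similarly, if $h(x,y)=1/(x-y)$, then the domain of $h$ is understood to be $\set{(x,y)}{x\ne y}$.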

The main objective of this section is to study limits and continuity of functions of n variables. The proofs of many of the theorems here are similar to the proofs of their counterparts in Sections~2.1 and . We leave most of them to you.

Definition~ does not require that $f$ be defined at $\mathbf{X}_0$, or even on a deleted neighborhood of $\mathbf{X}_0$.


The following theorem is analogous to Theorem~2.1.3. We leave its proof to you (Exercise~).

When investigating whether a function has a limit at a point $\mathbf{X}_0$, no restriction can be made on the way in which $\mathbf{X}$ approaches $\mathbf{X}_0$, except that $\mathbf{X}$ must be in $D_f$. The next example shows that incorrect restrictions can lead to incorrect conclusions.
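A standard illustration: let
\[
f(x,y)=\frac{2xy}{x^2+y^2},\quad (x,y)\ne(0,0).
\]
Along the line $y=mx$ we have $f(x,mx)=2m/(1+m^2)$, so restricting $\mathbf{X}$ to any single line through the origin produces a ``limit'' that depends on the line; hence $\lim_{(x,y)\to(0,0)}f(x,y)$ does not exist.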

The sum, difference, and product of functions of n variables are defined in the same way as they are for functions of one variable (Definition~), and the proof of the next theorem is the same as the proof of Theorem~.


We leave it to you to define $\lim_{|\mathbf{X}|\to\infty}f(\mathbf{X})=\infty$ and $\lim_{|\mathbf{X}|\to\infty}f(\mathbf{X})=-\infty$ (Exercise~).

We will continue the convention adopted in Section~2.1: ``$\lim_{\mathbf{X}\to\mathbf{X}_0}f(\mathbf{X})$ exists'' means that $\lim_{\mathbf{X}\to\mathbf{X}_0}f(\mathbf{X})=L$, where $L$ is finite; to leave open the possibility that $L=\pm\infty$, we will say that ``$\lim_{\mathbf{X}\to\mathbf{X}_0}f(\mathbf{X})$ exists in the extended reals.'' A similar convention applies to limits as $|\mathbf{X}|\to\infty$.

Theorem~ remains valid if ``$\lim_{\mathbf{X}\to\mathbf{X}_0}$'' is replaced by ``$\lim_{|\mathbf{X}|\to\infty}$,'' provided that $D$ is unbounded. Moreover, the conclusions on sums, differences, and products are valid in either version of Theorem~ if either or both of $L_1$ and $L_2$ is infinite, provided that their right sides are not indeterminate, and the conclusion on quotients remains valid if $L_2\ne0$ and $L_1/L_2$ is not indeterminate.

We now define continuity for functions of n variables. The definition is quite similar to the definition for functions of one variable.

The next theorem follows from this and Definition~.

In applying this theorem when $\mathbf{X}_0\in D_f^0$, we will usually omit ``and $\mathbf{X}\in D_f$,'' it being understood that $S_\delta(\mathbf{X}_0)\subset D_f$.

We will say that $f$ is \emph{continuous on} $S$ if $f$ is continuous at every point of $S$.

Theorem~ implies the next theorem, which is analogous to Theorem~ and, like the latter, permits us to investigate continuity of a given function by regarding the function as the result of addition, subtraction, multiplication, and division of simpler functions.

Suppose that $g_1$, $g_2$, \dots, $g_n$ are real-valued functions defined on a subset $T$ of $\R^m$, and define the \emph{vector-valued function} $\mathbf{G}$ on $T$ by
\[
\mathbf{G}(\mathbf{U})=(g_1(\mathbf{U}),g_2(\mathbf{U}),\dots,g_n(\mathbf{U})),\quad \mathbf{U}\in T.
\]
Then $g_1$, $g_2$, \dots, $g_n$ are the \emph{component functions} of $\mathbf{G}=(g_1,g_2,\dots,g_n)$. We say that
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}\mathbf{G}(\mathbf{U})=\mathbf{L}=(L_1,L_2,\dots,L_n)
\]
if
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}g_i(\mathbf{U})=L_i,\quad 1\le i\le n,
\]
and that $\mathbf{G}$ is \emph{continuous} at $\mathbf{U}_0$ if $g_1$, $g_2$, \dots, $g_n$ are each continuous at $\mathbf{U}_0$.

The next theorem follows from Theorem~ and Definitions~ and . We omit the proof.

The following theorem on the continuity of a composite function is analogous to Theorem~.


Suppose that $\epsilon>0$. Since $f$ is continuous at $\mathbf{X}_0=\mathbf{G}(\mathbf{U}_0)$, there is an $\epsilon_1>0$ such that
\[
|f(\mathbf{X})-f(\mathbf{G}(\mathbf{U}_0))|<\epsilon\mbox{\quad if\quad}|\mathbf{X}-\mathbf{G}(\mathbf{U}_0)|<\epsilon_1\mbox{\quad and\quad}\mathbf{X}\in D_f.
\]
Since $\mathbf{G}$ is continuous at $\mathbf{U}_0$, there is a $\delta>0$ such that
\[
|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|<\epsilon_1\mbox{\quad if\quad}|\mathbf{U}-\mathbf{U}_0|<\delta\mbox{\quad and\quad}\mathbf{U}\in D_{\mathbf{G}}.
\]
By taking $\mathbf{X}=\mathbf{G}(\mathbf{U})$ in the two preceding inequalities, we see that
\[
|h(\mathbf{U})-h(\mathbf{U}_0)|=|f(\mathbf{G}(\mathbf{U}))-f(\mathbf{G}(\mathbf{U}_0))|<\epsilon\mbox{\quad if\quad}|\mathbf{U}-\mathbf{U}_0|<\delta\mbox{\quad and\quad}\mathbf{U}\in T.
\]

The definitions of \emph{bounded above} and \emph{bounded below} on a set $S$ are the same for functions of $n$ variables as for functions of one variable, as are the definitions of the \emph{supremum} and \emph{infimum} of a function on a set $S$ (Section~2.2). The proofs of the next two theorems are similar to those of Theorems~ and ~ (Exercises~ and ~).

The next theorem is analogous to Theorem~.

If there is no such $\mathbf{C}$, then $S=R\cup T$, where
\[
R=\set{\mathbf{X}}{\mathbf{X}\in S\mbox{ and }f(\mathbf{X})<u}
\mbox{\quad and\quad}
T=\set{\mathbf{X}}{\mathbf{X}\in S\mbox{ and }f(\mathbf{X})>u}.
\]
If $\mathbf{X}_0\in R$, the continuity of $f$ implies that there is a $\delta>0$ such that $f(\mathbf{X})<u$ if $|\mathbf{X}-\mathbf{X}_0|<\delta$ and $\mathbf{X}\in S$. This means that $\mathbf{X}_0\notin\overline{T}$. Therefore, $R\cap\overline{T}=\emptyset$. Similarly, $\overline{R}\cap T=\emptyset$. Therefore, $S$ is disconnected (Definition~), which contradicts the assumption that $S$ is a region (Exercise~). Hence, we conclude that $f(\mathbf{C})=u$ for some $\mathbf{C}$ in $S$.

The definition of uniform continuity for functions of $n$ variables is the same as for functions of one variable; $f$ is uniformly continuous on a subset $S$ of its domain in $\R^n$ if for every $\epsilon>0$ there is a $\delta>0$ such that
\[
|f(\mathbf{X})-f(\mathbf{X}')|<\epsilon
\]
whenever $|\mathbf{X}-\mathbf{X}'|<\delta$ and $\mathbf{X},\mathbf{X}'\in S$. We emphasize again that $\delta$ must depend only on $\epsilon$ and $S$, and not on the particular points $\mathbf{X}$ and $\mathbf{X}'$.
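For example, $f(\mathbf{X})=|\mathbf{X}|$ is uniformly continuous on all of $\R^n$, since
\[
\bigl||\mathbf{X}|-|\mathbf{X}'|\bigr|\le|\mathbf{X}-\mathbf{X}'|
\]
(a consequence of the triangle inequality); hence, we may take $\delta=\epsilon$.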

The proof of the next theorem is analogous to that of Theorem~. We leave it to you (Exercise~).


To say that a function of one variable has a derivative at $x_0$ is the same as to say that it is differentiable at $x_0$. The situation is not so simple for a function $f$ of more than one variable. First, there is no specific number that can be called \emph{the} derivative of $f$ at a point $\mathbf{X}_0$ in $\R^n$. In fact, there are infinitely many numbers, called the \emph{directional derivatives of $f$ at} $\mathbf{X}_0$ (defined below), that are analogous to the derivative of a function of one variable. Second, we will see that the existence of directional derivatives at $\mathbf{X}_0$ does not imply that $f$ is differentiable at $\mathbf{X}_0$, if differentiability at $\mathbf{X}_0$ is to imply (as it does for functions of one variable) that $f(\mathbf{X})-f(\mathbf{X}_0)$ can be approximated well near $\mathbf{X}_0$ by a simple linear function, or even that $f$ is continuous at $\mathbf{X}_0$.

We will now define directional derivatives and partial derivatives of functions of several variables. However, we will still have occasion to refer to derivatives of functions of one variable. We will call them \emph{ordinary} derivatives when we wish to distinguish between them and the partial derivatives that we are about to define.


The directional derivatives that we are most interested in are those in the directions of the unit vectors
\[
\mathbf{E}_1=(1,0,\dots,0),\quad \mathbf{E}_2=(0,1,0,\dots,0),\quad\dots,\quad \mathbf{E}_n=(0,\dots,0,1).
\]
(All components of $\mathbf{E}_i$ are zero except for the $i$th, which is $1$.) Since $\mathbf{X}$ and $\mathbf{X}+t\mathbf{E}_i$ differ only in the $i$th coordinate, $\partial f(\mathbf{X})/\partial\mathbf{E}_i$ is called the \emph{partial derivative of $f$ with respect to $x_i$}. It is also denoted by $\partial f(\mathbf{X})/\partial x_i$ or $f_{x_i}(\mathbf{X})$; thus,
\[
\frac{\partial f(\mathbf{X})}{\partial x_1}=f_{x_1}(\mathbf{X})=\lim_{t\to0}\frac{f(x_1+t,x_2,\dots,x_n)-f(x_1,x_2,\dots,x_n)}{t},
\]
\[
\frac{\partial f(\mathbf{X})}{\partial x_i}=f_{x_i}(\mathbf{X})=\lim_{t\to0}\frac{f(x_1,\dots,x_{i-1},x_i+t,x_{i+1},\dots,x_n)-f(x_1,x_2,\dots,x_n)}{t}
\]
if $2\le i\le n-1$, and
\[
\frac{\partial f(\mathbf{X})}{\partial x_n}=f_{x_n}(\mathbf{X})=\lim_{t\to0}\frac{f(x_1,\dots,x_{n-1},x_n+t)-f(x_1,\dots,x_{n-1},x_n)}{t},
\]
if the limits exist.

If we write $\mathbf{X}=(x,y)$, then we denote the partial derivatives accordingly; thus,
\[
\frac{\partial f(x,y)}{\partial x}=f_x(x,y)=\lim_{h\to0}\frac{f(x+h,y)-f(x,y)}{h}
\]
and
\[
\frac{\partial f(x,y)}{\partial y}=f_y(x,y)=\lim_{k\to0}\frac{f(x,y+k)-f(x,y)}{k}.
\]

It can be seen from these definitions that to compute $f_{x_i}(\mathbf{X})$ we simply differentiate $f$ with respect to $x_i$ according to the rules for ordinary differentiation, while treating the other variables as constants.
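For example, if $f(x,y)=x^2y+\sin xy$, then
\[
f_x(x,y)=2xy+y\cos xy\mbox{\quad and\quad}f_y(x,y)=x^2+x\cos xy.
\]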


The next theorem follows from the rule just given for calculating partial derivatives.

If $f_{x_i}(\mathbf{X})$ exists at every point of a set $D$, then it defines a function $f_{x_i}$ on $D$. If this function has a partial derivative with respect to $x_j$ on a subset of $D$, we denote the partial derivative by
\[
\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)=\frac{\partial^2 f}{\partial x_j\,\partial x_i}=f_{x_ix_j}.
\]
Similarly,
\[
\frac{\partial}{\partial x_k}\left(\frac{\partial^2 f}{\partial x_j\,\partial x_i}\right)=\frac{\partial^3 f}{\partial x_k\,\partial x_j\,\partial x_i}=f_{x_ix_jx_k}.
\]
The function obtained by differentiating $f$ successively with respect to $x_{i_1}$, $x_{i_2}$, \dots, $x_{i_r}$ is denoted by
\[
\frac{\partial^r f}{\partial x_{i_r}\,\partial x_{i_{r-1}}\cdots\partial x_{i_1}}=f_{x_{i_1}x_{i_2}\cdots x_{i_r}};
\]
it is an \emph{$r$th-order partial derivative}.
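A standard example: let
\[
f(x,y)=\left\{\begin{array}{ll}\dst\frac{xy(x^2-y^2)}{x^2+y^2},&(x,y)\ne(0,0),\\[2\jot]
0,&(x,y)=(0,0).\end{array}\right.
\]
Then $f_x(0,y)=-y$ for all $y$ and $f_y(x,0)=x$ for all $x$ (verify), so $f_{xy}(0,0)=-1$ while $f_{yx}(0,0)=1$.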

This example shows that f_{xy}(\mathbf{X}_0) and f_{yx}(\mathbf{X}_0) may differ. However, the next theorem shows that they are equal if f satisfies a fairly mild condition.

Suppose that $\epsilon>0$. Choose $\delta>0$ so that the open square
\[
S_\delta=\set{(x,y)}{|x-x_0|<\delta,\ |y-y_0|<\delta}
\]
is in $N$ and
\begin{equation}\label{eq:5.3.6}
|f_{xy}(\widehat{x},\widehat{y})-f_{xy}(x_0,y_0)|<\epsilon\mbox{\quad if\quad}(\widehat{x},\widehat{y})\in S_\delta.
\end{equation}
This is possible because of the continuity of $f_{xy}$ at $(x_0,y_0)$. The function
\begin{equation}\label{eq:5.3.7}
A(h,k)=f(x_0+h,y_0+k)-f(x_0+h,y_0)-f(x_0,y_0+k)+f(x_0,y_0)
\end{equation}
is defined if $-\delta<h,k<\delta$; moreover,
\begin{equation}\label{eq:5.3.8}
A(h,k)=\phi(x_0+h)-\phi(x_0),
\end{equation}
where
\[
\phi(x)=f(x,y_0+k)-f(x,y_0).
\]
Since
\[
\phi'(x)=f_x(x,y_0+k)-f_x(x,y_0),\quad |x-x_0|<\delta,
\]
\eqref{eq:5.3.8} and the mean value theorem imply that
\begin{equation}\label{eq:5.3.9}
A(h,k)=\left[f_x(\widehat{x},y_0+k)-f_x(\widehat{x},y_0)\right]h,
\end{equation}
where $\widehat{x}$ is between $x_0$ and $x_0+h$. The mean value theorem, applied to $f_x(\widehat{x},y)$ (where $\widehat{x}$ is regarded as constant), also implies that
\[
f_x(\widehat{x},y_0+k)-f_x(\widehat{x},y_0)=f_{xy}(\widehat{x},\widehat{y})k,
\]
where $\widehat{y}$ is between $y_0$ and $y_0+k$. From this and \eqref{eq:5.3.9},
\[
A(h,k)=f_{xy}(\widehat{x},\widehat{y})hk.
\]
Now \eqref{eq:5.3.6} implies that
\begin{equation}\label{eq:5.3.10}
\left|\frac{A(h,k)}{hk}-f_{xy}(x_0,y_0)\right|=\left|f_{xy}(\widehat{x},\widehat{y})-f_{xy}(x_0,y_0)\right|<\epsilon\mbox{\quad if\quad}0<|h|,|k|<\delta.
\end{equation}
Since \eqref{eq:5.3.7} implies that
\begin{eqnarray*}
\lim_{k\to0}\frac{A(h,k)}{hk}\ar=\lim_{k\to0}\frac{f(x_0+h,y_0+k)-f(x_0+h,y_0)}{hk}\\
\ar{}-\lim_{k\to0}\frac{f(x_0,y_0+k)-f(x_0,y_0)}{hk}\\
\ar=\frac{f_y(x_0+h,y_0)-f_y(x_0,y_0)}{h},
\end{eqnarray*}
it follows from \eqref{eq:5.3.10} that
\[
\left|\frac{f_y(x_0+h,y_0)-f_y(x_0,y_0)}{h}-f_{xy}(x_0,y_0)\right|\le\epsilon\mbox{\quad if\quad}0<|h|<\delta.
\]

Taking the limit as $h\to0$ yields
\[
|f_{yx}(x_0,y_0)-f_{xy}(x_0,y_0)|\le\epsilon.
\]
Since $\epsilon$ is an arbitrary positive number, this proves the conclusion.

Theorem~ implies the following theorem. We leave the proof to you (Exercises~ and ).

For example, if f satisfies the hypotheses of Theorem~ with k=4 at a point \mathbf{X}_0 in \R^n (n\ge2), then f_{xxyy}(\mathbf{X}_0)=f_{xyxy}(\mathbf{X}_0)=f_{xyyx}(\mathbf{X}_0)=f_{yyxx}(\mathbf{X}_0)= f_{yxyx}(\mathbf{X}_0)=f_{yxxy}(\mathbf{X}_0), \nonumber and their common value is denoted by \frac{\partial^4f(\mathbf{X}_0)}{\partial x^2\partial y^2}. \nonumber

It can be shown (Exercise~) that if $f$ is a function of $(x_1,x_2,\dots,x_n)$ and $(r_1,r_2,\dots,r_n)$ is a fixed ordered $n$-tuple of nonnegative integers with $r_1+r_2+\cdots+r_n=r$, then the number of partial derivatives $f_{x_{i_1}x_{i_2}\cdots x_{i_r}}$ that involve differentiation $r_i$ times with respect to $x_i$, $1\le i\le n$, equals the \emph{multinomial coefficient}
\[
\frac{r!}{r_1!\,r_2!\cdots r_n!}.
\]
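For example, if $r=3$, $n=2$, $r_1=2$, and $r_2=1$, the coefficient is $3!/(2!\,1!)=3$, corresponding to the three derivatives $f_{xxy}$, $f_{xyx}$, and $f_{yxx}$.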

A function of several variables may have first-order partial derivatives at a point \mathbf{X}_0 but fail to be continuous at \mathbf{X}_0. For example, if

\begin{equation}\label{eq:5.3.15} f(x,y)=\left\{\casespace\begin{array}{ll}\dst\frac{xy}{ x^2+y^2},&(x,y)\ne (0,0),\\[2\jot] 0,&(x,y)=(0,0),\end{array}\right. \end{equation} \nonumber

then
\begin{eqnarray*}
f_x(0,0)\ar=\lim_{h\to0}\frac{f(h,0)-f(0,0)}{h}=\lim_{h\to0}\frac{0-0}{h}=0\\
\arraytext{and}\\
f_y(0,0)\ar=\lim_{k\to0}\frac{f(0,k)-f(0,0)}{k}=\lim_{k\to0}\frac{0-0}{k}=0,
\end{eqnarray*}
but $f$ is not continuous at $(0,0)$. (See Examples~ and ~.) Therefore, if differentiability of a function of several variables is to be a stronger property than continuity, as it is for functions of one variable, the definition of differentiability must require more than the existence of first partial derivatives. Exercise~ characterizes differentiability of a function $f$ of one variable in a way that suggests the proper generalization: $f$ is differentiable at $x_0$ if and only if
\[
\lim_{x\to x_0}\frac{f(x)-f(x_0)-m(x-x_0)}{x-x_0}=0
\]
for some constant $m$, in which case $m=f'(x_0)$.

The generalization to functions of n variables is as follows.

From this, $m_1=f_x(x_0,y_0)$ and $m_2=f_y(x_0,y_0)$ in Example~. The next theorem shows that this is not a coincidence.

Let i be a given integer in \{1,2, \dots,n\}. Let \mathbf{X}=\mathbf{X}_0+t\mathbf{E}_i, so that x_i=x_{i0}+t, x_j =x_{j0} if j\ne i, and |\mathbf{X}-\mathbf{X}_0|=|t|. Then and the differentiability of f at \mathbf{X}_0 imply that \lim_{t\to 0}\frac{f(\mathbf{X}_0+t\mathbf{E}_i)-f(\mathbf{X}_0)-m_it}{ t}=0. \nonumber

Hence,
\[
\lim_{t\to0}\frac{f(\mathbf{X}_0+t\mathbf{E}_i)-f(\mathbf{X}_0)}{t}=m_i.
\]

This proves , since the limit on the left is f_{x_i} (\mathbf{X}_0), by definition.

A \emph{linear function} is a function of the form
\begin{equation}\label{eq:5.3.19}
L(\mathbf{X})=m_1x_1+m_2x_2+\cdots+m_nx_n,
\end{equation}
where $m_1$, $m_2$, \dots, $m_n$ are constants. From Definition~, $f$ is differentiable at $\mathbf{X}_0$ if and only if there is a linear function $L$ such that $f(\mathbf{X})-f(\mathbf{X}_0)$ can be approximated so well near $\mathbf{X}_0$ by
\[
L(\mathbf{X})-L(\mathbf{X}_0)=L(\mathbf{X}-\mathbf{X}_0)
\]
that
\begin{equation}\label{eq:5.3.20}
f(\mathbf{X})-f(\mathbf{X}_0)=L(\mathbf{X}-\mathbf{X}_0)+E(\mathbf{X})(|\mathbf{X}-\mathbf{X}_0|),
\end{equation}
where
\begin{equation}\label{eq:5.3.21}
\lim_{\mathbf{X}\to\mathbf{X}_0}E(\mathbf{X})=0.
\end{equation}


From \eqref{eq:5.3.19} and Schwarz's inequality,
\[
|L(\mathbf{X}-\mathbf{X}_0)|\le M|\mathbf{X}-\mathbf{X}_0|,
\]
where
\[
M=(m^2_1+m^2_2+\cdots+m^2_n)^{1/2}.
\]
This and \eqref{eq:5.3.20} imply that
\[
|f(\mathbf{X})-f(\mathbf{X}_0)|\le(M+|E(\mathbf{X})|)|\mathbf{X}-\mathbf{X}_0|,
\]
which, with \eqref{eq:5.3.21}, implies that $f$ is continuous at $\mathbf{X}_0$.

Theorem~ implies that the function $f$ defined by \eqref{eq:5.3.15} is not differentiable at $(0,0)$, since it is not continuous at $(0,0)$. However, $f_x(0,0)$ and $f_y(0,0)$ exist, so the converse of Theorem~ is false; that is, a function may have partial derivatives at a point without being differentiable at the point.

Theorem~ implies that if $f$ is differentiable at $\mathbf{X}_0$, then there is exactly one linear function $L$ that satisfies \eqref{eq:5.3.20} and \eqref{eq:5.3.21}:
\[
L(\mathbf{X})=f_{x_1}(\mathbf{X}_0)x_1+f_{x_2}(\mathbf{X}_0)x_2+\cdots+f_{x_n}(\mathbf{X}_0)x_n.
\]

This function is called the \emph{differential of $f$ at $\mathbf{X}_0$}. We will denote it by $d_{\mathbf{X}_0}f$ and its value by $(d_{\mathbf{X}_0}f)(\mathbf{X})$; thus,
\begin{equation}\label{eq:5.3.22}
(d_{\mathbf{X}_0}f)(\mathbf{X})=f_{x_1}(\mathbf{X}_0)x_1+f_{x_2}(\mathbf{X}_0)x_2+\cdots+f_{x_n}(\mathbf{X}_0)x_n.
\end{equation}
In terms of the differential, this can be rewritten as
\[
\lim_{\mathbf{X}\to\mathbf{X}_0}\frac{f(\mathbf{X})-f(\mathbf{X}_0)-(d_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|}=0.
\]

For convenience in writing $d_{\mathbf{X}_0}f$, and to conform with standard notation, we introduce the function $dx_i$, defined by
\[
dx_i(\mathbf{X})=x_i;
\]
that is, $dx_i$ is the function whose value at a point in $\R^n$ is the $i$th coordinate of the point. It is the differential of the function $g_i(\mathbf{X})=x_i$. From \eqref{eq:5.3.22},
\begin{equation}\label{eq:5.3.23}
d_{\mathbf{X}_0}f=f_{x_1}(\mathbf{X}_0)\,dx_1+f_{x_2}(\mathbf{X}_0)\,dx_2+\cdots+f_{x_n}(\mathbf{X}_0)\,dx_n.
\end{equation}

If we write $\mathbf{X}=(x,y,\dots)$, then we write
\[
d_{\mathbf{X}_0}f=f_x(\mathbf{X}_0)\,dx+f_y(\mathbf{X}_0)\,dy+\cdots,
\]
where $dx$, $dy$, \dots\ are the functions defined by
\[
dx(\mathbf{X})=x,\quad dy(\mathbf{X})=y,\quad\dots
\]

When it is not necessary to emphasize the specific point $\mathbf{X}_0$, \eqref{eq:5.3.23} can be written more simply as
\[
df=f_{x_1}\,dx_1+f_{x_2}\,dx_2+\cdots+f_{x_n}\,dx_n.
\]
When dealing with a specific function at an arbitrary point of its domain, we may use the hybrid notation
\[
df=f_{x_1}(\mathbf{X})\,dx_1+f_{x_2}(\mathbf{X})\,dx_2+\cdots+f_{x_n}(\mathbf{X})\,dx_n.
\]
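For example, if $f(x,y)=x^2y$, then
\[
df=2xy\,dx+x^2\,dy,
\]
and at $\mathbf{X}_0=(1,2)$, $d_{\mathbf{X}_0}f=4\,dx+dy$.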

Unfortunately, the notation for the differential is so complicated that it obscures the simplicity of the concept. The peculiar symbols $df$, $dx$, $dy$, etc., were introduced in the early stages of the development of calculus to represent very small (``infinitesimal'') increments in the variables. However, in modern usage they are not quantities at all, but linear functions. This meaning of the symbol $dx$ differs from its meaning in $\int_a^b f(x)\,dx$, where it serves merely to identify the variable of integration; indeed, some authors omit it in the latter context and write simply $\int_a^b f$.

Theorem~ implies the following lemma, which is analogous to Lemma~. We leave the proof to you (Exercise~).

Theorems~ and ~ and the definition of the differential imply the following theorem.

The next theorem provides a widely applicable sufficient condition for differentiability.

Let $\mathbf{X}_0=(x_{10},x_{20},\dots,x_{n0})$ and suppose that $\epsilon>0$. Our assumptions imply that there is a $\delta>0$ such that $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ are defined in the $n$-ball
\[
S_\delta(\mathbf{X}_0)=\set{\mathbf{X}}{|\mathbf{X}-\mathbf{X}_0|<\delta}
\]
and
\begin{equation}\label{eq:5.3.24}
|f_{x_j}(\mathbf{X})-f_{x_j}(\mathbf{X}_0)|<\epsilon\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta,\quad 1\le j\le n.
\end{equation}
Let $\mathbf{X}=(x_1,x_2,\dots,x_n)$ be in $S_\delta(\mathbf{X}_0)$. Define
\[
\mathbf{X}_j=(x_1,\dots,x_j,x_{j+1,0},\dots,x_{n0}),\quad 1\le j\le n-1,
\]
and $\mathbf{X}_n=\mathbf{X}$. Thus, for $1\le j\le n$, $\mathbf{X}_j$ differs from $\mathbf{X}_{j-1}$ in the $j$th component only, and the line segment from $\mathbf{X}_{j-1}$ to $\mathbf{X}_j$ is in $S_\delta(\mathbf{X}_0)$. Now write
\begin{equation}\label{eq:5.3.25}
f(\mathbf{X})-f(\mathbf{X}_0)=f(\mathbf{X}_n)-f(\mathbf{X}_0)=\sum^n_{j=1}\,[f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})],
\end{equation}
and consider the auxiliary functions
\begin{equation}\label{eq:5.3.26}
\begin{array}{rcl}
g_1(t)\ar=f(t,x_{20},\dots,x_{n0}),\\[2\jot]
g_j(t)\ar=f(x_1,\dots,x_{j-1},t,x_{j+1,0},\dots,x_{n0}),\quad 2\le j\le n-1,\\[2\jot]
g_n(t)\ar=f(x_1,\dots,x_{n-1},t),
\end{array}
\end{equation}
where, in each case, all variables except $t$ are temporarily regarded as constants. Since
\[
f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})=g_j(x_j)-g_j(x_{j0}),
\]
the mean value theorem implies that
\[
f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})=g'_j(\tau_j)(x_j-x_{j0}),
\]

where $\tau_j$ is between $x_j$ and $x_{j0}$. From \eqref{eq:5.3.26},
\[
g'_j(\tau_j)=f_{x_j}(\widehat{\mathbf{X}}_j),
\]
where $\widehat{\mathbf{X}}_j$ is on the line segment from $\mathbf{X}_{j-1}$ to $\mathbf{X}_j$. Therefore,
\[
f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})=f_{x_j}(\widehat{\mathbf{X}}_j)(x_j-x_{j0}),
\]
and \eqref{eq:5.3.25} implies that
\begin{eqnarray*}
f(\mathbf{X})-f(\mathbf{X}_0)\ar=\sum^n_{j=1} f_{x_j}(\widehat{\mathbf{X}}_j)(x_j-x_{j0})\\
\ar=\sum^n_{j=1} f_{x_j}(\mathbf{X}_0)(x_j-x_{j0})+\sum^n_{j=1}\,[f_{x_j}(\widehat{\mathbf{X}}_j)-f_{x_j}(\mathbf{X}_0)](x_j-x_{j0}).
\end{eqnarray*}
From this and \eqref{eq:5.3.24},
\[
\left|f(\mathbf{X})-f(\mathbf{X}_0)-\sum^n_{j=1} f_{x_j}(\mathbf{X}_0)(x_j-x_{j0})\right|\le\epsilon\sum^n_{j=1}|x_j-x_{j0}|\le n\epsilon|\mathbf{X}-\mathbf{X}_0|,
\]
which implies that $f$ is differentiable at $\mathbf{X}_0$.

We say that $f$ is \emph{continuously differentiable} on a subset $S$ of $\R^n$ if $S$ is contained in an open set on which $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ are continuous. Theorem~ implies that such a function is differentiable at each $\mathbf{X}_0$ in $S$.

In Section~2.3 we saw that if a function f of one variable is differentiable at x_0, then the curve y=f(x) has a tangent line y=T(x)=f(x_0)+f'(x_0)(x-x_0) \nonumber that approximates it so well near x_0 that \lim_{x\to x_0}\frac{f(x)-T(x)}{ x-x_0}=0. \nonumber

Moreover, the tangent line is the ``limit'' of the secant line through the points $(x_1,f(x_1))$ and $(x_0,f(x_0))$ as $x_1$ approaches $x_0$.


Differentiability of a function of $n$ variables has an analogous geometric interpretation. We will illustrate it for $n=2$. If $f$ is defined in a region $D$ in $\R^2$, then the set of points $(x,y,z)$ such that
\begin{equation}\label{eq:5.3.27}
z=f(x,y),\quad (x,y)\in D,
\end{equation}
is a \emph{surface} in $\R^3$ (Figure~).


If f is differentiable at \mathbf{X}_0=(x_0,y_0), then the plane \begin{equation}\label{eq:5.3.28} z=T(x,y)=f(\mathbf{X}_0)+f_x(\mathbf{X}_0)(x-x_0)+f_y(\mathbf{X}_0)(y-y_0) \end{equation} \nonumber intersects the surface at (x_0,y_0,f(x_0,y_0)) and approximates the surface so well near (x_0,y_0) that

\[
\lim_{(x,y)\to(x_0,y_0)}\frac{f(x,y)-T(x,y)}{\sqrt{(x-x_0)^2+(y-y_0)^2}}=0
\]
(Figure~). Moreover, \eqref{eq:5.3.28} is the only plane in $\R^3$ with these properties (Exercise~). We say that this plane is \emph{tangent to the surface $z=f(x,y)$ at the point} $(x_0,y_0,f(x_0,y_0))$. We will now show that it is the ``limit'' of ``secant planes'' associated with the surface $z=f(x,y)$, just as a tangent line to a curve $y=f(x)$ in $\R^2$ is the limit of secant lines to the curve (Section~2.3).
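For example, if $f(x,y)=x^2+y^2$, then $f_x(1,1)=f_y(1,1)=2$, and the plane tangent to the surface $z=x^2+y^2$ at $(1,1,2)$ is
\[
z=2+2(x-1)+2(y-1).
\]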

Let $\mathbf{X}_i=(x_i,y_i)$ $(i=0,1,2)$. The equation of the ``secant plane'' through the points $(x_i,y_i,f(x_i,y_i))$ $(i=0,1,2)$ on the surface $z=f(x,y)$ (Figure~) is of the form
\begin{equation}\label{eq:5.3.29}
z=f(\mathbf{X}_0)+A(x-x_0)+B(y-y_0),
\end{equation}
where $A$ and $B$ satisfy the system
\begin{eqnarray*}
f(\mathbf{X}_1)\ar=f(\mathbf{X}_0)+A(x_1-x_0)+B(y_1-y_0),\\
f(\mathbf{X}_2)\ar=f(\mathbf{X}_0)+A(x_2-x_0)+B(y_2-y_0).
\end{eqnarray*}
Solving for $A$ and $B$ yields
\begin{eqnarray}
A\ar=\frac{(f(\mathbf{X}_1)-f(\mathbf{X}_0))(y_2-y_0)-(f(\mathbf{X}_2)-f(\mathbf{X}_0))(y_1-y_0)}{(x_1-x_0)(y_2-y_0)-(x_2-x_0)(y_1-y_0)}\label{eq:5.3.30}\\
\arraytext{and}\nonumber\\
B\ar=\frac{(f(\mathbf{X}_2)-f(\mathbf{X}_0))(x_1-x_0)-(f(\mathbf{X}_1)-f(\mathbf{X}_0))(x_2-x_0)}{(x_1-x_0)(y_2-y_0)-(x_2-x_0)(y_1-y_0)}\label{eq:5.3.31}
\end{eqnarray}
if
\begin{equation}\label{eq:5.3.32}
(x_1-x_0)(y_2-y_0)-(x_2-x_0)(y_1-y_0)\ne0,
\end{equation}
which is equivalent to the requirement that $\mathbf{X}_0$, $\mathbf{X}_1$, and $\mathbf{X}_2$ do not lie on a line (Exercise~). If we write
\[
\mathbf{X}_1=\mathbf{X}_0+t\mathbf{U}\mbox{\quad and\quad}\mathbf{X}_2=\mathbf{X}_0+t\mathbf{V},
\]
where $\mathbf{U}=(u_1,u_2)$ and $\mathbf{V}=(v_1,v_2)$ are fixed nonzero vectors (Figure~), then \eqref{eq:5.3.30}, \eqref{eq:5.3.31}, and \eqref{eq:5.3.32} take the more convenient forms
\begin{eqnarray}
A\ar=\frac{\dst{\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t}v_2-\frac{f(\mathbf{X}_0+t\mathbf{V})-f(\mathbf{X}_0)}{t}u_2}}{u_1v_2-u_2v_1},\label{eq:5.3.33}\\
B\ar=\frac{\dst{\frac{f(\mathbf{X}_0+t\mathbf{V})-f(\mathbf{X}_0)}{t}u_1-\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t}v_1}}{u_1v_2-u_2v_1},\label{eq:5.3.34}
\end{eqnarray}
and
\[
u_1v_2-u_2v_1\ne0.
\]


If $f$ is differentiable at $\mathbf{X}_0$, then
\begin{equation}\label{eq:5.3.35}
f(\mathbf{X})-f(\mathbf{X}_0)=f_x(\mathbf{X}_0)(x-x_0)+f_y(\mathbf{X}_0)(y-y_0)+\epsilon(\mathbf{X})|\mathbf{X}-\mathbf{X}_0|,
\end{equation}
where
\begin{equation}\label{eq:5.3.36}
\lim_{\mathbf{X}\to\mathbf{X}_0}\epsilon(\mathbf{X})=0.
\end{equation}
Substituting first $\mathbf{X}=\mathbf{X}_0+t\mathbf{U}$ and then $\mathbf{X}=\mathbf{X}_0+t\mathbf{V}$ in \eqref{eq:5.3.35} and dividing by $t$ yields
\begin{equation}\label{eq:5.3.37}
\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t}=f_x(\mathbf{X}_0)u_1+f_y(\mathbf{X}_0)u_2+E_1(t)|\mathbf{U}|
\end{equation}
and
\begin{equation}\label{eq:5.3.38}
\frac{f(\mathbf{X}_0+t\mathbf{V})-f(\mathbf{X}_0)}{t}=f_x(\mathbf{X}_0)v_1+f_y(\mathbf{X}_0)v_2+E_2(t)|\mathbf{V}|,
\end{equation}
where
\[
E_1(t)=\epsilon(\mathbf{X}_0+t\mathbf{U})|t|/t\mbox{\quad and\quad}E_2(t)=\epsilon(\mathbf{X}_0+t\mathbf{V})|t|/t,
\]
so
\begin{equation}\label{eq:5.3.39}
\lim_{t\to0}E_i(t)=0,\quad i=1,2,
\end{equation}
because of \eqref{eq:5.3.36}. Substituting \eqref{eq:5.3.37} and \eqref{eq:5.3.38} into \eqref{eq:5.3.33} and \eqref{eq:5.3.34} yields
\begin{equation}\label{eq:5.3.40}
A=f_x(\mathbf{X}_0)+\Delta_1(t),\quad B=f_y(\mathbf{X}_0)+\Delta_2(t),
\end{equation}
where

\[
\Delta_1(t)=\frac{v_2|\mathbf{U}|E_1(t)-u_2|\mathbf{V}|E_2(t)}{u_1v_2-u_2v_1}
\]
and
\[
\Delta_2(t)=\frac{u_1|\mathbf{V}|E_2(t)-v_1|\mathbf{U}|E_1(t)}{u_1v_2-u_2v_1},
\]
so
\begin{equation}\label{eq:5.3.41}
\lim_{t\to0}\Delta_i(t)=0,\quad i=1,2,
\end{equation}
because of \eqref{eq:5.3.39}.

From \eqref{eq:5.3.29} and \eqref{eq:5.3.40}, the equation of the secant plane is
\[
z=f(\mathbf{X}_0)+[f_x(\mathbf{X}_0)+\Delta_1(t)](x-x_0)+[f_y(\mathbf{X}_0)+\Delta_2(t)](y-y_0).
\]
Therefore, because of \eqref{eq:5.3.41}, the secant plane ``approaches'' the tangent plane as $t$ approaches zero.

We say that $\mathbf{X}_0$ is a \emph{local extreme point} of $f$ if there is a $\delta>0$ such that
\[
f(\mathbf{X})-f(\mathbf{X}_0)
\]
does not change sign in $S_\delta(\mathbf{X}_0)\cap D_f$. More specifically, $\mathbf{X}_0$ is a \emph{local maximum point} if
\[
f(\mathbf{X})\le f(\mathbf{X}_0)
\]
or a \emph{local minimum point} if
\[
f(\mathbf{X})\ge f(\mathbf{X}_0)
\]
for all $\mathbf{X}$ in $S_\delta(\mathbf{X}_0)\cap D_f$.

The next theorem is analogous to Theorem~.

Let \mathbf{E}_1=(1,0, \dots,0),\quad \mathbf{E}_{2} =(0,1,0, \dots,0),\dots,\quad \mathbf{E}_n= (0,0, \dots,1), \nonumber and g_i(t)=f(\mathbf{X}_0+t\mathbf{E}_i),\quad 1\le i\le n. \nonumber Then g_i is differentiable at t=0, with g'_i(0)=f_{x_i}(\mathbf{X}_0) \nonumber

(Definition~). Since $\mathbf{X}_0$ is a local extreme point of $f$, $t_0=0$ is a local extreme point of $g_i$. Now Theorem~ implies that $g'_i(0)=0$, and this implies the conclusion.

The converse of Theorem~ is false, since the condition may hold at a point $\mathbf{X}_0$ that is not a local extreme point of $f$. For example, let $\mathbf{X}_0=(0,0)$ and
\[
f(x,y)=x^3+y^3.
\]
We say that a point $\mathbf{X}_0$ where the condition holds is a \emph{critical point} of $f$. Thus, if $f$ is defined in a neighborhood of a local extreme point $\mathbf{X}_0$, then $\mathbf{X}_0$ is a critical point of $f$; however, a critical point need not be a local extreme point of $f$.

The use of Theorem~ for finding local extreme points is covered in calculus, so we will not pursue it here.

We now consider the problem of differentiating a composite function h(\mathbf{U})=f(\mathbf{G}(\mathbf{U})), \nonumber where \mathbf{G}=(g_1,g_2, \dots,g_n) is a vector-valued function, as defined in Section~5.2. We begin with the following definition.

We need the following lemma to prove the main result of the section.

Since $g_1$, $g_2$, \dots, $g_n$ are differentiable at $\mathbf{U}_0$, applying Lemma~ to $g_i$ shows that
\begin{equation}\label{eq:5.4.1}
\begin{array}{rcl}
g_i(\mathbf{U})-g_i(\mathbf{U}_0)\ar=(d_{\mathbf{U}_0}g_i)(\mathbf{U}-\mathbf{U}_0)+E_i(\mathbf{U})\,|\mathbf{U}-\mathbf{U}_0|\\[2\jot]
\ar=\dst\sum_{j=1}^m\frac{\partial g_i(\mathbf{U}_0)}{\partial u_j}(u_j-u_{j0})+E_i(\mathbf{U})\,|\mathbf{U}-\mathbf{U}_0|,
\end{array}
\end{equation}

where
\begin{equation}\label{eq:5.4.2}
\lim_{\mathbf{U}\to\mathbf{U}_0}E_i(\mathbf{U})=0,\quad 1\le i\le n.
\end{equation}
From Schwarz's inequality,
\[
|g_i(\mathbf{U})-g_i(\mathbf{U}_0)|\le(M_i+|E_i(\mathbf{U})|)|\mathbf{U}-\mathbf{U}_0|,
\]
where
\[
M_i=\left(\sum_{j=1}^m\left(\frac{\partial g_i(\mathbf{U}_0)}{\partial u_j}\right)^2\right)^{1/2}.
\]
Therefore,
\[
\frac{|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|}{|\mathbf{U}-\mathbf{U}_0|}\le\left(\sum_{i=1}^n(M_i+|E_i(\mathbf{U})|)^2\right)^{1/2}.
\]
From \eqref{eq:5.4.2},
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}\left(\sum_{i=1}^n(M_i+|E_i(\mathbf{U})|)^2\right)^{1/2}=\left(\sum_{i=1}^n M_i^2\right)^{1/2}=M,
\]
which implies the conclusion.

The following theorem is analogous to Theorem~.

We leave it to you to show that \mathbf{U}_0 is an interior point of the domain of h (Exercise~), so it is legitimate to ask if h is differentiable at \mathbf{U}_0.

Let \mathbf{X}_0=(x_{10},x_{20}, \dots,x_{n0}). Note that x_{i0}=g_i(\mathbf{U}_0),\quad 1\le i\le n, \nonumber by assumption. Since f is differentiable at \mathbf{X}_0, Lemma~ implies that \begin{equation} \label{eq:5.4.5} f(\mathbf{X})-f(\mathbf{X}_0)=\sum_{i=1}^n f_{x_i} (\mathbf{X}_0) (x_i-x_{i0})+E(\mathbf{X})|\mathbf{X}-\mathbf{X}_0|, \end{equation} \nonumber where \lim_{\mathbf{X}\to\mathbf{X}_0}E(\mathbf{X})=0. \nonumber

Substituting $\mathbf{X}=\mathbf{G}(\mathbf{U})$ and $\mathbf{X}_0=\mathbf{G}(\mathbf{U}_0)$ in \eqref{eq:5.4.5} and recalling that $h(\mathbf{U})=f(\mathbf{G}(\mathbf{U}))$ yields
\begin{equation}\label{eq:5.4.6}
h(\mathbf{U})-h(\mathbf{U}_0)=\dst{\sum_{i=1}^n}\,f_{x_i}(\mathbf{X}_0)(g_i(\mathbf{U})-g_i(\mathbf{U}_0))+E(\mathbf{G}(\mathbf{U}))|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|.
\end{equation}
Substituting \eqref{eq:5.4.1} into \eqref{eq:5.4.6} yields
\[
\begin{array}{rcl}
h(\mathbf{U})-h(\mathbf{U}_0)\ar=\dst{\sum_{i=1}^n}f_{x_i}(\mathbf{X}_0)(d_{\mathbf{U}_0}g_i)(\mathbf{U}-\mathbf{U}_0)+\dst{\left(\sum_{i=1}^n f_{x_i}(\mathbf{X}_0)E_i(\mathbf{U})\right)}|\mathbf{U}-\mathbf{U}_0|\\\\
\ar{}+E(\mathbf{G}(\mathbf{U}))|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|.
\end{array}
\]
Since
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}E(\mathbf{G}(\mathbf{U}))=\lim_{\mathbf{X}\to\mathbf{X}_0}E(\mathbf{X})=0,
\]
this, \eqref{eq:5.4.2}, and Lemma~ imply that
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}\frac{h(\mathbf{U})-h(\mathbf{U}_0)-\dst\sum_{i=1}^n f_{x_i}(\mathbf{X}_0)(d_{\mathbf{U}_0}g_i)(\mathbf{U}-\mathbf{U}_0)}{|\mathbf{U}-\mathbf{U}_0|}=0.
\]
Therefore, $h$ is differentiable at $\mathbf{U}_0$, and $d_{\mathbf{U}_0}h$ is given by the stated formula.

Substituting d_{\mathbf{U}_0}g_i=\frac{\partial g_i(\mathbf{U}_0)}{\partial u_1} \,du_1+\frac{\partial g_i(\mathbf{U}_0)}{\partial u_2} \,du_2+\cdots+ \frac{\partial g_i(\mathbf{U}_0)}{ \partial u_m} \,du_m,\quad 1\le i\le n, \nonumber into and collecting multipliers of du_1, du_2, , du_m yields d_{\mathbf{U}_0}h=\sum_{i=1}^m\left(\sum_{j=1}^n \frac{\partial f(\mathbf{X}_0)}{\partial x_j} \frac{\partial g_j(\mathbf{U}_0)}{\partial u_i}\right)\,du_i. \nonumber However, from Theorem~, d_{\mathbf{U}_0} h=\sum_{i=1}^m\frac{\partial h(\mathbf{U}_0)}{\partial u_i} \,du_i. \nonumber Comparing the last two equations yields .

When it is not important to emphasize the particular point \mathbf{X}_0, we write less formally as \begin{equation} \label{eq:5.4.9} \frac{\partial h}{\partial u_i}=\sum_{j=1}^n\frac{\partial f}{\partial x_j} \frac{\partial g_j}{\partial u_i},\quad 1\le i\le m, \end{equation} \nonumber with the understanding that in calculating \partial h(\mathbf{U}_0)/\partial u_i, \partial g_j/\partial u_i is evaluated at \mathbf{U}_0 and \partial f/\partial x_j at \mathbf{X}_0=\mathbf{G}(\mathbf{U}_0).

The formulas and can also be simplified by replacing the symbol \mathbf{G} with \mathbf{X}=\mathbf{X}(\mathbf{U}); then we write h(\mathbf{U})=f(\mathbf{X}(\mathbf{U})) \nonumber and \frac{\partial h(\mathbf{U}_0)}{\partial u_i}=\sum_{j=1}^n \frac{\partial f(\mathbf{X}_0)}{ \partial x_j} \frac{\partial x_j(\mathbf{U}_0)}{\partial u_i}, \nonumber or simply \begin{equation} \label{eq:5.4.10} \frac{\partial h}{\partial u_i}=\sum_{j=1}^n\frac{\partial f}{\partial x_j} \frac{\partial x_j}{\partial u_i}. \end{equation} \nonumber
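For example, if $h(r,\theta)=f(r\cos\theta,\,r\sin\theta)$, so that $x_1=r\cos\theta$ and $x_2=r\sin\theta$, then \eqref{eq:5.4.10} yields
\[
\frac{\partial h}{\partial r}=f_{x_1}\cos\theta+f_{x_2}\sin\theta\mbox{\quad and\quad}
\frac{\partial h}{\partial\theta}=-f_{x_1}r\sin\theta+f_{x_2}r\cos\theta.
\]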

The proof of Corollary~ suggests a straightforward way to calculate the partial derivatives of a composite function without using the formula explicitly. If $h(\mathbf{U})=f(\mathbf{X}(\mathbf{U}))$, then Theorem~, in the more casual notation introduced before Example~, implies that
\begin{equation}\label{eq:5.4.12}
dh=f_{x_1}\,dx_1+f_{x_2}\,dx_2+\cdots+f_{x_n}\,dx_n,
\end{equation}
where $dx_1$, $dx_2$, \dots, $dx_n$ must be written in terms of the differentials $du_1$, $du_2$, \dots, $du_m$ of the independent variables; thus,

\[
dx_i=\frac{\partial x_i}{\partial u_1}\,du_1+\frac{\partial x_i}{\partial u_2}\,du_2+\cdots+\frac{\partial x_i}{\partial u_m}\,du_m.
\]
Substituting this into \eqref{eq:5.4.12} and collecting the multipliers of $du_1$, $du_2$, \dots, $du_m$ yields~\eqref{eq:5.4.10}.

Higher derivatives of composite functions can be computed by repeatedly applying the chain rule. For example, differentiating \eqref{eq:5.4.10} with respect to $u_k$ yields
\begin{equation}\label{eq:5.4.16}
\begin{array}{rcl}
\dst\frac{\partial^2h}{\partial u_k\,\partial u_i}\ar=\dst{\sum_{j=1}^n\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial u_i}\right)}\\[2\jot]
\ar=\dst{\sum_{j=1}^n\frac{\partial f}{\partial x_j}\frac{\partial^2 x_j}{\partial u_k\,\partial u_i}+\sum_{j=1}^n\frac{\partial x_j}{\partial u_i}\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\right)}.
\end{array}
\end{equation}
We must be careful finding
\[
\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\right),
\]
which really stands here for
\begin{equation}\label{eq:5.4.17}
\frac{\partial}{\partial u_k}\left(\frac{\partial f(\mathbf{X}(\mathbf{U}))}{\partial x_j}\right).
\end{equation}
The safest procedure is to write temporarily
\[
g(\mathbf{X})=\frac{\partial f(\mathbf{X})}{\partial x_j};
\]
then \eqref{eq:5.4.17} becomes
\[
\frac{\partial g(\mathbf{X}(\mathbf{U}))}{\partial u_k}=\sum_{s=1}^n\frac{\partial g(\mathbf{X}(\mathbf{U}))}{\partial x_s}\frac{\partial x_s(\mathbf{U})}{\partial u_k}.
\]
Since
\[
\frac{\partial g}{\partial x_s}=\frac{\partial^2f}{\partial x_s\,\partial x_j},
\]
this yields
\[
\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\right)=\sum_{s=1}^n\frac{\partial^2f}{\partial x_s\,\partial x_j}\frac{\partial x_s}{\partial u_k}.
\]
Substituting this into \eqref{eq:5.4.16} yields
\begin{equation}\label{eq:5.4.18}
\frac{\partial^2h}{\partial u_k\,\partial u_i}=\sum_{j=1}^n\frac{\partial f}{\partial x_j}\frac{\partial^2x_j}{\partial u_k\,\partial u_i}+\sum_{j=1}^n\frac{\partial x_j}{\partial u_i}\sum_{s=1}^n\frac{\partial^2f}{\partial x_s\,\partial x_j}\frac{\partial x_s}{\partial u_k}.
\end{equation}

To compute $h_{u_iu_k}(\mathbf{U}_0)$ from this formula, we evaluate the partial derivatives of $x_1$, $x_2$, \dots, $x_n$ at $\mathbf{U}_0$ and those of $f$ at $\mathbf{X}_0=\mathbf{X}(\mathbf{U}_0)$. The formula is valid if $x_1$, $x_2$, \dots, $x_n$ and their first partial derivatives are differentiable at $\mathbf{U}_0$ and $f$, $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ and their first partial derivatives are differentiable at $\mathbf{X}_0$.

Instead of memorizing , you should understand how it is derived and use the method, rather than the formula, when calculating second partial derivatives of composite functions. The same method applies to the calculation of higher derivatives.

For a composite function of the form
\[
h(t)=f(x_1(t),x_2(t),\dots,x_n(t)),
\]
where $t$ is a real variable, $x_1$, $x_2$, \dots, $x_n$ are differentiable at $t_0$, and $f$ is differentiable at $\mathbf{X}_0=\mathbf{X}(t_0)$, the chain rule \eqref{eq:5.4.9} takes the form
\begin{equation}\label{eq:5.4.20}
h'(t_0)=\sum_{j=1}^n f_{x_j}(\mathbf{X}(t_0))x'_j(t_0).
\end{equation}
This will be useful in the proof of the following theorem.

An equation of $L$ is
\[
\mathbf{X}=\mathbf{X}(t)=t\mathbf{X}_2+(1-t)\mathbf{X}_1,\quad 0\le t\le1.
\]
Our hypotheses imply that the function
\[
h(t)=f(\mathbf{X}(t))
\]
is continuous on $[0,1]$ and differentiable on $(0,1)$. Since
\[
x_i(t)=tx_{i2}+(1-t)x_{i1},
\]
\eqref{eq:5.4.20} implies that
\[
h'(t)=\sum_{i=1}^n f_{x_i}(\mathbf{X}(t))(x_{i2}-x_{i1}),\quad 0<t<1.
\]
From the mean value theorem for functions of one variable (Theorem~),
\[
h(1)-h(0)=h'(t_0)
\]
for some $t_0\in(0,1)$. Since $h(1)=f(\mathbf{X}_2)$ and $h(0)=f(\mathbf{X}_1)$, this implies the conclusion with $\mathbf{X}_0=\mathbf{X}(t_0)$.

We will show that if $\mathbf{X}_0$ and $\mathbf{X}$ are in $S$, then $f(\mathbf{X})=f(\mathbf{X}_0)$. Since $S$ is an open region, $S$ is polygonally connected (Theorem~). Therefore, there are points
\[
\mathbf{X}_0,\ \mathbf{X}_1,\ \dots,\ \mathbf{X}_n=\mathbf{X}
\]
such that the line segment $L_i$ from $\mathbf{X}_{i-1}$ to $\mathbf{X}_i$ is in $S$, $1\le i\le n$. From Theorem~,
\[
f(\mathbf{X}_i)-f(\mathbf{X}_{i-1})=(d_{\widetilde{\mathbf{X}}_i}f)(\mathbf{X}_i-\mathbf{X}_{i-1}),
\]
where $\widetilde{\mathbf{X}}_i$ is on $L_i$ and therefore in $S$. Therefore,
\[
f_{x_1}(\widetilde{\mathbf{X}}_i)=f_{x_2}(\widetilde{\mathbf{X}}_i)=\cdots=f_{x_n}(\widetilde{\mathbf{X}}_i)=0,
\]

which means that d_{\widetilde{\mathbf{X}}_i} f\equiv0. Hence, f(\mathbf{X}_0)=f(\mathbf{X}_1)=\cdots=f(\mathbf{X}_n); \nonumber that is, f(\mathbf{X})=f(\mathbf{X}_0) for every \mathbf{X} in S.

Suppose that $f$ is defined in an $n$-ball $B_\rho(\mathbf{X}_0)$, with $\rho>0$. If $\mathbf{X}\in B_\rho(\mathbf{X}_0)$, then
\[
\mathbf{X}(t)=\mathbf{X}_0+t(\mathbf{X}-\mathbf{X}_0)\in B_\rho(\mathbf{X}_0),\quad 0\le t\le1,
\]
so the function
\[
h(t)=f(\mathbf{X}(t))
\]
is defined for $0\le t\le1$. From Theorem~ (see also \eqref{eq:5.4.20}),
\[
h'(t)=\sum_{i=1}^n f_{x_i}(\mathbf{X}(t))(x_i-x_{i0})
\]
if $f$ is differentiable in $B_\rho(\mathbf{X}_0)$, and
\begin{eqnarray*}
h''(t)\ar=\sum_{j=1}^n\frac{\partial}{\partial x_j}\left(\sum_{i=1}^n\frac{\partial f(\mathbf{X}(t))}{\partial x_i}(x_i-x_{i0})\right)(x_j-x_{j0})\\
\ar=\sum_{i,j=1}^n\frac{\partial^2f(\mathbf{X}(t))}{\partial x_j\,\partial x_i}(x_i-x_{i0})(x_j-x_{j0})
\end{eqnarray*}
if $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ are differentiable in $B_\rho(\mathbf{X}_0)$. Continuing in this way, we see that
\begin{equation}\label{eq:5.4.22}
h^{(r)}(t)=\sum_{i_1,i_2,\dots,i_r=1}^n\frac{\partial^r f(\mathbf{X}(t))}{\partial x_{i_r}\,\partial x_{i_{r-1}}\cdots\partial x_{i_1}}(x_{i_1}-x_{i_1,0})(x_{i_2}-x_{i_2,0})\cdots(x_{i_r}-x_{i_r,0})
\end{equation}
if all partial derivatives of $f$ of order $\le r-1$ are differentiable in $B_\rho(\mathbf{X}_0)$.

This motivates the following definition.

Under the assumptions of Definition~, the value of
\[
\frac{\partial^rf(\mathbf{X}_0)}{\partial x_{i_r}\,\partial x_{i_{r-1}}\cdots\partial x_{i_1}}
\]
depends only on the number of times $f$ is differentiated with respect to each variable, and not on the order in which the differentiations are performed (Exercise~). Hence, Exercise~ implies that the $r$th differential can be rewritten as
\begin{equation}\label{eq:5.4.24}
d^{(r)}_{\mathbf{X}_0}f=\sum_r\frac{r!}{r_1!\,r_2!\cdots r_n!}\frac{\partial^rf(\mathbf{X}_0)}{\partial x^{r_1}_1\,\partial x^{r_2}_2\cdots\partial x^{r_n}_n}(dx_1)^{r_1}(dx_2)^{r_2}\cdots(dx_n)^{r_n},
\end{equation}
where $\sum_r$ indicates summation over all ordered $n$-tuples $(r_1,r_2,\dots,r_n)$ of nonnegative integers such that
\[
r_1+r_2+\cdots+r_n=r
\]
and $\partial x_i^{r_i}$ is omitted from the ``denominators'' of all terms in \eqref{eq:5.4.24} for which $r_i=0$. In particular, if $n=2$,
\[
d^{(r)}_{\mathbf{X}_0}f=\sum_{j=0}^r\binom{r}{j}\frac{\partial^rf(x_0,y_0)}{\partial x^j\,\partial y^{r-j}}(dx)^j(dy)^{r-j}.
\]
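For example, with $r=2$ and $n=2$, \eqref{eq:5.4.24} reduces to
\[
d^{(2)}_{\mathbf{X}_0}f=f_{xx}(x_0,y_0)(dx)^2+2f_{xy}(x_0,y_0)\,dx\,dy+f_{yy}(x_0,y_0)(dy)^2.
\]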

The next theorem is analogous to Taylor’s theorem for functions of one variable (Theorem~).

Define \begin{equation} \label{eq:5.4.26} h(t)=f(\mathbf{X}_0+t(\mathbf{X}-\mathbf{X}_0)). \end{equation} \nonumber With \boldsymbol{\Phi}=\mathbf{X}-\mathbf{X}_0, our assumptions and the discussion preceding Definition~ imply that h, h', , h^{(k+1)} exist on [0,1]. From Taylor’s theorem for functions of one variable, \begin{equation} \label{eq:5.4.27} h(1)=\sum_{r=0}^k\frac{h^{(r)}(0)}{ r!}+\frac{h^{(k+1)}(\tau)}{(k+1)!}, \end{equation} \nonumber for some \tau\in(0,1). From , \begin{equation} \label{eq:5.4.28} h(0)=f(\mathbf{X}_0)\mbox{\quad and\quad} h(1)=f(\mathbf{X}). \end{equation} \nonumber From and with \boldsymbol{\Phi}=\mathbf{X}-\mathbf{X}_0, \begin{eqnarray} h^{(r)}(0)\ar=(d^{(r)}_{\mathbf{X}_0}f) (\mathbf{X}-\mathbf{X}_0),\quad 1\le r\le k, \label{eq:5.4.29}\\ \arraytext{and}\nonumber\\ h^{(k+1)}(\tau)\ar=\left(d^{k+1}_{\widetilde{\mathbf{X}}} f\right) (\mathbf{X}-\mathbf{X}_0) \label{eq:5.4.30} \end{eqnarray} \nonumber

where
\[
\widetilde{\mathbf{X}}=\mathbf{X}_0+\tau(\mathbf{X}-\mathbf{X}_0)
\]
is on $L$ and distinct from $\mathbf{X}_0$ and $\mathbf{X}$. Substituting \eqref{eq:5.4.28}, \eqref{eq:5.4.29}, and \eqref{eq:5.4.30} into \eqref{eq:5.4.27} yields the conclusion.

By analogy with the situation for functions of one variable, we define the $k$th \emph{Taylor polynomial of $f$ about} $\mathbf{X}_0$ by
\[
T_k(\mathbf{X})=\sum_{r=0}^k\frac{1}{r!}(d^{(r)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)
\]
if the differentials exist; then the conclusion of Theorem~ can be rewritten as

f(\mathbf{X})=T_k(\mathbf{X})+\frac{1}{(k+1)!} (d^{(k+1)}_{\widetilde{\mathbf{X}}} f)(\mathbf{X}-\mathbf{X}_0). \nonumber
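For example, if $f(x,y)=e^{x+y}$ and $\mathbf{X}_0=(0,0)$, then every partial derivative of $f$ at $(0,0)$ equals $1$, so
\[
T_2(x,y)=1+(x+y)+\frac{1}{2}(x+y)^2.
\]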

The next theorem leads to a useful sufficient condition for local maxima and minima. It is related to Theorem~. Strictly speaking, however, it is not a generalization of Theorem (Exercise).

If $\epsilon>0$, there is a $\delta>0$ such that $B_\delta(\mathbf{X}_0)\subset N$ and all $k$th-order partial derivatives of $f$ satisfy the inequality
\begin{equation}\label{eq:5.4.32}
\left|\frac{\partial^kf(\widetilde{\mathbf{X}})}{\partial x_{i_k}\,\partial x_{i_{k-1}}\cdots\partial x_{i_1}}-\frac{\partial^kf(\mathbf{X}_0)}{\partial x_{i_k}\,\partial x_{i_{k-1}}\cdots\partial x_{i_1}}\right|<\epsilon,\quad \widetilde{\mathbf{X}}\in B_\delta(\mathbf{X}_0).
\end{equation}
Now suppose that $\mathbf{X}\in B_\delta(\mathbf{X}_0)$. From Theorem~ with $k$ replaced by $k-1$,
\begin{equation}\label{eq:5.4.33}
f(\mathbf{X})=T_{k-1}(\mathbf{X})+\frac{1}{k!}(d^{(k)}_{\widetilde{\mathbf{X}}}f)(\mathbf{X}-\mathbf{X}_0),
\end{equation}
where $\widetilde{\mathbf{X}}$ is some point on the line segment from $\mathbf{X}_0$ to $\mathbf{X}$ and is therefore in $B_\delta(\mathbf{X}_0)$. We can rewrite \eqref{eq:5.4.33} as
\begin{equation}\label{eq:5.4.34}
f(\mathbf{X})=T_k(\mathbf{X})+\frac{1}{k!}\left[(d^{(k)}_{\widetilde{\mathbf{X}}}f)(\mathbf{X}-\mathbf{X}_0)-(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)\right].
\end{equation}
But \eqref{eq:5.4.32} implies that
\begin{equation}\label{eq:5.4.35}
\left|(d^{(k)}_{\widetilde{\mathbf{X}}}f)(\mathbf{X}-\mathbf{X}_0)-(d^{(k)}_{{\mathbf{X}}_0}f)(\mathbf{X}-\mathbf{X}_0)\right|<n^k\epsilon|\mathbf{X}-\mathbf{X}_0|^k
\end{equation}
(Exercise~), which implies that
\[
\frac{|f(\mathbf{X})-T_k(\mathbf{X})|}{|\mathbf{X}-\mathbf{X}_0|^k}<\frac{n^k\epsilon}{k!},\quad\mathbf{X}\in B_\delta(\mathbf{X}_0),
\]
from \eqref{eq:5.4.34}. This implies the conclusion.

Let $r$ be a positive integer and $\mathbf{X}_0=(x_{10},x_{20},\dots,x_{n0})$. A function of the form
\begin{equation}\label{eq:5.4.36}
p(\mathbf{X})=\sum_r a_{r_1r_2\dots r_n}(x_1-x_{10})^{r_1}(x_2-x_{20})^{r_2}\cdots(x_n-x_{n0})^{r_n},
\end{equation}
where the coefficients $\{a_{r_1r_2\dots r_n}\}$ are constants and the summation is over all $n$-tuples of nonnegative integers $(r_1,r_2,\dots,r_n)$ such that
\[
r_1+r_2+\cdots+r_n=r,
\]
is a \emph{homogeneous polynomial of degree $r$ in $\mathbf{X}-\mathbf{X}_0$}, provided that at least one of the coefficients is nonzero. For example, if $f$ satisfies the conditions of Definition~, then the function
\[
p(\mathbf{X})=(d^{(r)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)
\]

is such a polynomial if at least one of the rth-order mixed partial derivatives of f at \mathbf{X}_0 is nonzero.

Clearly, $p(\mathbf{X}_0)=0$ if $p$ is a homogeneous polynomial of degree $r\ge1$ in $\mathbf{X}-\mathbf{X}_0$. If $p(\mathbf{X})\ge0$ for all $\mathbf{X}$, we say that $p$ is \emph{positive semidefinite}; if $p(\mathbf{X})>0$ except when $\mathbf{X}=\mathbf{X}_0$, $p$ is \emph{positive definite}.

Similarly, $p$ is \emph{negative semidefinite} if $p(\mathbf{X})\le0$ or \emph{negative definite} if $p(\mathbf{X})<0$ for all $\mathbf{X}\ne\mathbf{X}_0$. In all these cases, $p$ is \emph{semidefinite}.
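For example, with $n=2$ and $\mathbf{X}_0=(0,0)$, $p(u,v)=u^2+v^2$ is positive definite, $p(u,v)=u^2$ is positive semidefinite but not positive definite (it vanishes whenever $u=0$), and $p(u,v)=u^2-v^2$ is not semidefinite.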

With $p$ as in \eqref{eq:5.4.36},
\[
p(-\mathbf{X}+2\mathbf{X}_0)=(-1)^r p(\mathbf{X}),
\]
so $p$ cannot be semidefinite if $r$ is odd.

From Theorem~, if $f$ is differentiable and attains a local extreme value at $\mathbf{X}_0$, then
\begin{equation}\label{eq:5.4.37}
d_{\mathbf{X}_0}f=0,
\end{equation}
since $f_{x_1}(\mathbf{X}_0)=f_{x_2}(\mathbf{X}_0)=\cdots=f_{x_n}(\mathbf{X}_0)=0$. However, the converse is false. The next theorem provides a method for deciding whether a point satisfying \eqref{eq:5.4.37} is an extreme point. It is related to Theorem~.

From the hypotheses and Theorem~,
\begin{equation}\label{eq:5.4.39}
\lim_{\mathbf{X}\to\mathbf{X}_0}\frac{f(\mathbf{X})-f(\mathbf{X}_0)-\dst\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|^k}=0.
\end{equation}
If $\mathbf{X}=\mathbf{X}_0+t\mathbf{U}$, where $\mathbf{U}$ is a constant vector, then
\[
(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)=t^k(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U}),
\]
so \eqref{eq:5.4.39} implies that
\[
\lim_{t\to0}\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)-\dst\frac{t^k}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U})}{t^k}=0,
\]
or, equivalently,
\begin{equation}\label{eq:5.4.40}
\lim_{t\to0}\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t^k}=\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U})
\end{equation}
for any constant vector $\mathbf{U}$.

To prove the first statement, suppose that $d^{(k)}_{\mathbf{X}_0}f$ is not semidefinite. Then there are vectors $\mathbf{U}_1$ and $\mathbf{U}_2$ such that
\[
(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U}_1)>0\mbox{\quad and\quad}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U}_2)<0.
\]
This and \eqref{eq:5.4.40} imply that
\[
f(\mathbf{X}_0+t\mathbf{U}_1)>f(\mathbf{X}_0)\mbox{\quad and\quad}f(\mathbf{X}_0+t\mathbf{U}_2)<f(\mathbf{X}_0)
\]
for $t$ sufficiently small. Hence, $\mathbf{X}_0$ is not a local extreme point of $f$.

To prove the second statement, first assume that $d^{(k)}_{\mathbf{X}_0}f$ is positive definite. Then it can be shown that there is a $\rho>0$ such that
\begin{equation}\label{eq:5.4.41}
\frac{(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{k!}\ge\rho|\mathbf{X}-\mathbf{X}_0|^k
\end{equation}
for all $\mathbf{X}$ (Exercise~). From \eqref{eq:5.4.39}, there is a $\delta>0$ such that
\[
\frac{f(\mathbf{X})-f(\mathbf{X}_0)-\dst\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|^k}>-\frac{\rho}{2}\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta.
\]
Therefore,
\[
f(\mathbf{X})-f(\mathbf{X}_0)>\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)-\frac{\rho}{2}|\mathbf{X}-\mathbf{X}_0|^k\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta.
\]
This and \eqref{eq:5.4.41} imply that
\[
f(\mathbf{X})-f(\mathbf{X}_0)>\frac{\rho}{2}|\mathbf{X}-\mathbf{X}_0|^k\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta,
\]
which implies that $\mathbf{X}_0$ is a local minimum point of $f$. This proves half of the second statement; we leave the other half to you (Exercise~).

The third statement merely requires examples; see Exercise~.

Write $(x-x_0,y-y_0)=(u,v)$ and
\[
p(u,v)=(d^{(2)}_{\mathbf{X}_0}f)(u,v)=Au^2+2Buv+Cv^2,
\]
where $A=f_{xx}(x_0,y_0)$, $B=f_{xy}(x_0,y_0)$, and $C=f_{yy}(x_0,y_0)$, so
\[
D=AC-B^2.
\]
If $D>0$, then $A\ne0$, and we can write
\begin{eqnarray*}
p(u,v)\ar=A\left(u^2+\frac{2B}{A}uv+\frac{B^2}{A^2}v^2\right)+\left(C-\frac{B^2}{A}\right)v^2\\
\ar=A\left(u+\frac{B}{A}v\right)^2+\frac{D}{A}v^2.
\end{eqnarray*}
This cannot vanish unless $u=v=0$. Hence, $d^{(2)}_{\mathbf{X}_0}f$ is positive definite if $A>0$ or negative definite if $A<0$, and Theorem~ implies the first conclusion.

If $D<0$, there are three possibilities:

1. $A\ne0$; then $p(1,0)=A$ and $p(B,-A)=A(AC-B^2)=AD$.

2. $C\ne0$; then $p(0,1)=C$ and $p(-C,B)=C(AC-B^2)=CD$.

3. $A=C=0$; then $B\ne0$ and $p(1,1)=2B$ and $p(1,-1)=-2B$.

In each case the two given values of $p$ differ in sign, so $\mathbf{X}_0$ is not a local extreme point of $f$, from Theorem~.
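For example, if $f(x,y)=x^2+xy+y^2$, then $(0,0)$ is a critical point of $f$, with $A=2$, $B=1$, $C=2$, and $D=AC-B^2=3>0$; since $A>0$, $(0,0)$ is a local minimum point of $f$.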



This page titled 5.1: Structure of Rn is shared under a CC BY-NC-SA 3.0 license and was authored, remixed, and/or curated by William F. Trench via source content that was edited to the style and standards of the LibreTexts platform.
