
5.1: Structure of Rn


In this chapter we study functions defined on subsets of the real $n$-dimensional space $\R^n$, which consists of all ordered $n$-tuples $\mathbf{X}=(x_1,x_2,\dots,x_n)$ of real numbers, called the \emph{coordinates} or \emph{components} of $\mathbf{X}$. This space is sometimes called \emph{Euclidean $n$-space}.

In this section we introduce an algebraic structure for $\R^n$. We also consider its \emph{topological} properties; that is, properties that can be described in terms of a special class of subsets, the neighborhoods in $\R^n$. In Section~1.3 we studied the topological properties of $\R^1$, which we will continue to denote simply as $\R$. Most of the definitions and proofs in Section~1.3 were stated in terms of neighborhoods in $\R$. We will see that they carry over to $\R^n$ if the concept of neighborhood in $\R^n$ is suitably defined.

Members of $\R$ have dual interpretations: geometric, as points on the real line, and algebraic, as real numbers. We assume that you are familiar with the geometric interpretation of members of $\R^2$ and $\R^3$ as the rectangular coordinates of points in a plane and three-dimensional space, respectively. Although $\R^n$ cannot be visualized geometrically if $n\ge4$, geometric ideas from $\R$, $\R^2$, and $\R^3$ often help us to interpret the properties of $\R^n$ for arbitrary $n$.

As we said in Section~1.3, the idea of neighborhood is always associated with some definition of ``closeness'' of points. The following definition imposes an algebraic structure on $\R^n$, in terms of which the distance between two points can be defined in a natural way. In addition, this algebraic structure will be useful later for other purposes.


Note that ``$+$'' has two distinct meanings in \eqref{eq:5.1.1}: on the left, ``$+$'' stands for the newly defined addition of members of $\R^n$ and, on the right, for addition of real numbers. However, this can never lead to confusion, since the meaning of ``$+$'' can always be deduced from the symbols on either side of it. A similar comment applies to the use of juxtaposition to indicate scalar multiplication on the left and multiplication of real numbers on the right.

We leave the proof of the following theorem to you (Exercise~).

Clearly, $\mathbf{0}=(0,0,\dots,0)$ and, if $\mathbf{X}=(x_1,x_2,\dots,x_n)$, then $-\mathbf{X}=(-x_1,-x_2,\dots,-x_n)$. We write $\mathbf{X}+(-\mathbf{Y})$ as $\mathbf{X}-\mathbf{Y}$. The point $\mathbf{0}$ is called the \emph{origin}.

A nonempty set $V=\{\mathbf{X},\mathbf{Y},\mathbf{Z},\dots\}$, together with rules such as those just defined, one associating a unique member of $V$ with every ordered pair of its members, and one associating a unique member of $V$ with every real number and member of $V$, is said to be a \emph{vector space} if it has the properties listed in Theorem~. The members of a vector space are called \emph{vectors}. When we wish to emphasize that we are regarding a member of $\R^n$ as part of this algebraic structure, we will speak of it as a vector; otherwise, we will speak of it as a point.

If $n=1$, this definition of length reduces to the familiar absolute value, and the distance between two points is the length of the interval having them as endpoints; for $n=2$ and $n=3$, the length and distance of Definition~ reduce to the familiar definitions for the plane and three-dimensional space.
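For example, if $\mathbf{X}=(1,2,2)$ and $\mathbf{Y}=(3,0,1)$ in $\R^3$, then
\[
|\mathbf{X}|=\sqrt{1^2+2^2+2^2}=3\mbox{\quad and\quad}|\mathbf{X}-\mathbf{Y}|=|(-2,2,1)|=\sqrt{4+4+1}=3.
\]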

If $\mathbf{Y}=\mathbf{0}$, then both sides of the inequality are $0$, so it holds, with equality. In this case, $\mathbf{Y}=0\mathbf{X}$. Now suppose that $\mathbf{Y}\ne\mathbf{0}$ and $t$ is any real number. Then
\[
\begin{array}{rcl}
0\ar\le\dst\sum_{i=1}^n(x_i-ty_i)^2\\[2\jot]
\ar=\dst\sum_{i=1}^n x_i^2-2t\sum_{i=1}^n x_iy_i+t^2\sum_{i=1}^n y_i^2\\[2\jot]
\ar=|\mathbf{X}|^2-2(\mathbf{X}\cdot\mathbf{Y})t+t^2|\mathbf{Y}|^2.
\end{array}
\]
The last expression is a second-degree polynomial $p$ in $t$. From the quadratic formula, the zeros of $p$ are
\[
t=\frac{(\mathbf{X}\cdot\mathbf{Y})\pm\sqrt{(\mathbf{X}\cdot\mathbf{Y})^2-|\mathbf{X}|^2|\mathbf{Y}|^2}}{|\mathbf{Y}|^2}.
\]
Hence,
\[
(\mathbf{X}\cdot\mathbf{Y})^2\le|\mathbf{X}|^2|\mathbf{Y}|^2,
\]
because if not, then $p$ would have two distinct real zeros and therefore be negative between them (Figure~), contradicting the inequality $p(t)\ge0$. Taking square roots yields Schwarz's inequality if $\mathbf{Y}\ne\mathbf{0}$.

If $\mathbf{X}=t\mathbf{Y}$, then $|\mathbf{X}\cdot\mathbf{Y}|=|\mathbf{X}|\,|\mathbf{Y}|=|t|\,|\mathbf{Y}|^2$ (verify), so equality holds in Schwarz's inequality. Conversely, if equality holds, then $p$ has the real zero $t_0=(\mathbf{X}\cdot\mathbf{Y})/|\mathbf{Y}|^2$, and $\sum_{i=1}^n(x_i-t_0y_i)^2=0$; therefore, $\mathbf{X}=t_0\mathbf{Y}$.


By definition,
\begin{equation}\label{eq:5.1.7}
\begin{array}{rcl}
|\mathbf{X}+\mathbf{Y}|^2\ar=\dst\sum^n_{i=1}(x_i+y_i)^2=\sum^n_{i=1}x^2_i+2\sum^n_{i=1}x_iy_i+\sum^n_{i=1}y^2_i\\[4\jot]
\ar=|\mathbf{X}|^2+2(\mathbf{X}\cdot\mathbf{Y})+|\mathbf{Y}|^2\\[2\jot]
\ar\le|\mathbf{X}|^2+2|\mathbf{X}|\,|\mathbf{Y}|+|\mathbf{Y}|^2\mbox{\quad(by Schwarz's inequality)}\\[2\jot]
\ar=(|\mathbf{X}|+|\mathbf{Y}|)^2.
\end{array}
\end{equation}
Hence, $|\mathbf{X}+\mathbf{Y}|^2\le(|\mathbf{X}|+|\mathbf{Y}|)^2$. Taking square roots yields the triangle inequality.

From the third line of \eqref{eq:5.1.7}, equality holds if and only if $\mathbf{X}\cdot\mathbf{Y}=|\mathbf{X}|\,|\mathbf{Y}|$, which is true if and only if one of the vectors $\mathbf{X}$ and $\mathbf{Y}$ is a nonnegative scalar multiple of the other (Lemma~).
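For example, if $\mathbf{X}=(1,2,2)$ and $\mathbf{Y}=(2,2,1)$, then $\mathbf{X}\cdot\mathbf{Y}=8<9=|\mathbf{X}|\,|\mathbf{Y}|$ and
\[
|\mathbf{X}+\mathbf{Y}|=|(3,4,3)|=\sqrt{34}<6=|\mathbf{X}|+|\mathbf{Y}|;
\]
the inequalities are strict because neither vector is a nonnegative scalar multiple of the other.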

Write $\mathbf{X}-\mathbf{Z}=(\mathbf{X}-\mathbf{Y})+(\mathbf{Y}-\mathbf{Z})$, and apply Theorem~ with $\mathbf{X}$ and $\mathbf{Y}$ replaced by $\mathbf{X}-\mathbf{Y}$ and $\mathbf{Y}-\mathbf{Z}$.

Since $\mathbf{X}=\mathbf{Y}+(\mathbf{X}-\mathbf{Y})$, Theorem~ implies that $|\mathbf{X}|\le|\mathbf{Y}|+|\mathbf{X}-\mathbf{Y}|$, which is equivalent to $|\mathbf{X}|-|\mathbf{Y}|\le|\mathbf{X}-\mathbf{Y}|$. Interchanging $\mathbf{X}$ and $\mathbf{Y}$ yields $|\mathbf{Y}|-|\mathbf{X}|\le|\mathbf{Y}-\mathbf{X}|$. Since $|\mathbf{X}-\mathbf{Y}|=|\mathbf{Y}-\mathbf{X}|$, the last two inequalities imply the stated conclusion.

The next theorem lists properties of length, distance, and inner product that follow directly from Definitions~ and . We leave the proof to you (Exercise~).

The equation of a line through a point $\mathbf{X}_0=(x_0,y_0,z_0)$ in $\R^3$ can be written parametrically as
\[
x=x_0+u_1t,\quad y=y_0+u_2t,\quad z=z_0+u_3t,\quad -\infty<t<\infty,
\]
where $u_1$, $u_2$, and $u_3$ are not all zero. We write this in vector form as $\mathbf{X}=\mathbf{X}_0+t\mathbf{U}$, $-\infty<t<\infty$, with $\mathbf{U}=(u_1,u_2,u_3)$, and we say that the line is \emph{through $\mathbf{X}_0$ in the direction of $\mathbf{U}$}.

There are many ways to represent a given line parametrically. For example, $\mathbf{X}=\mathbf{X}_0+s\mathbf{V}$, $-\infty<s<\infty$, represents the same line if and only if $\mathbf{V}=a\mathbf{U}$ for some nonzero real number $a$. Then the line is traversed in the same direction as $s$ and $t$ vary from $-\infty$ to $\infty$ if $a>0$, or in opposite directions if $a<0$.

To write the parametric equation of a line through two points $\mathbf{X}_0$ and $\mathbf{X}_1$ in $\R^3$, we take $\mathbf{U}=\mathbf{X}_1-\mathbf{X}_0$, which yields
\[
\mathbf{X}=\mathbf{X}_0+t(\mathbf{X}_1-\mathbf{X}_0)=t\mathbf{X}_1+(1-t)\mathbf{X}_0,\quad -\infty<t<\infty.
\]
The line segment from $\mathbf{X}_0$ to $\mathbf{X}_1$ consists of those points for which $0\le t\le1$.
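For example, the line through $\mathbf{X}_0=(1,0,2)$ and $\mathbf{X}_1=(3,1,1)$ in $\R^3$ is given by
\[
\mathbf{X}=(1,0,2)+t(2,1,-1),\quad -\infty<t<\infty,
\]
and the segment from $\mathbf{X}_0$ to $\mathbf{X}_1$ corresponds to $0\le t\le1$.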

These familiar notions can be generalized to $\R^n$, as follows:

Having defined distance in $\R^n$, we are now able to say what we mean by a neighborhood of a point in $\R^n$.

An $\epsilon$-neighborhood of a point $\mathbf{X}_0$ in $\R^2$ is the inside, but not the circumference, of the circle of radius $\epsilon$ about $\mathbf{X}_0$. In $\R^3$ it is the inside, but not the surface, of the sphere of radius $\epsilon$ about $\mathbf{X}_0$.

In Section~1.3 we stated several other definitions in terms of $\epsilon$-neighborhoods: \emph{interior point}, \emph{interior}, \emph{open set}, \emph{closed set}, \emph{limit point}, \emph{boundary point}, \emph{boundary}, \emph{closure}, \emph{isolated point}, \emph{exterior point}, \emph{exterior}, and \emph{deleted neighborhood}. Since these definitions are the same for $\R^n$ as for $\R$, we will not repeat them. We advise you to read them again in Section~1.3, substituting $\R^n$ for $\R$ and $\mathbf{X}_0$ for $x_0$.


Open and closed $n$-balls are generalizations to $\R^n$ of open and closed intervals.

The following lemma will be useful later in this section, when we consider connected sets.

The line segment is given by
\[
\mathbf{X}=t\mathbf{X}_2+(1-t)\mathbf{X}_1,\quad 0<t<1.
\]
Suppose that $r>0$. If $|\mathbf{X}_1-\mathbf{X}_0|<r$, $|\mathbf{X}_2-\mathbf{X}_0|<r$, and $0<t<1$, then
\[
\begin{array}{rcl}
|\mathbf{X}-\mathbf{X}_0|\ar=|t\mathbf{X}_2+(1-t)\mathbf{X}_1-t\mathbf{X}_0-(1-t)\mathbf{X}_0|\\[2\jot]
\ar=|t(\mathbf{X}_2-\mathbf{X}_0)+(1-t)(\mathbf{X}_1-\mathbf{X}_0)|\\[2\jot]
\ar\le t|\mathbf{X}_2-\mathbf{X}_0|+(1-t)|\mathbf{X}_1-\mathbf{X}_0|\\[2\jot]
\ar<tr+(1-t)r=r.
\end{array}
\]

The proofs in Section~1.3 of Theorem~ (the union of open sets is open, the intersection of closed sets is closed) and Theorem~ and its Corollary~ (a set is closed if and only if it contains all its limit points) are also valid in $\R^n$. You should reread them now.

The Heine–Borel theorem (Theorem~) also holds in $\R^n$, but the proof in Section~1.3 is valid only for $n=1$. To prove the Heine–Borel theorem for general $n$, we need some preliminary definitions and results that are of interest in their own right.

The next two theorems follow from this, the definition of distance in $\R^n$, and what we already know about convergence in $\R$. We leave the proofs to you (Exercises~ and ~).

The next definition generalizes the definition of the diameter of a circle or sphere.

Let $\{\mathbf{X}_r\}$ be a sequence such that $\mathbf{X}_r\in S_r$ $(r\ge1)$. Since $S_{k+1}\subset S_k$, we have $\mathbf{X}_r\in S_k$ if $r\ge k$, so
\[
|\mathbf{X}_r-\mathbf{X}_s|<d(S_k)\mbox{\quad if\quad}r,s\ge k.
\]
From this and Theorem~, $\{\mathbf{X}_r\}$ converges to a limit $\overline{\mathbf{X}}$. Since $\overline{\mathbf{X}}$ is a limit point of every $S_k$ and every $S_k$ is closed, $\overline{\mathbf{X}}$ is in every $S_k$ (Corollary~). Therefore, $\overline{\mathbf{X}}\in I$, so $I\ne\emptyset$. Moreover, $\overline{\mathbf{X}}$ is the only point in $I$, since if $\mathbf{Y}\in I$, then
\[
|\overline{\mathbf{X}}-\mathbf{Y}|\le d(S_k),\quad k\ge1,
\]
and $d(S_k)\to0$ as $k\to\infty$ implies that $\mathbf{Y}=\overline{\mathbf{X}}$.
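For example, if $S_k=\set{\mathbf{X}}{|\mathbf{X}|\le1/k}$ in $\R^n$, then $S_{k+1}\subset S_k$, each $S_k$ is closed and nonempty, and $d(S_k)=2/k\to0$; here $I=\bigcap_{k=1}^\infty S_k=\{\mathbf{0}\}$, as the theorem requires.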

We can now prove the Heine–Borel theorem for $\R^n$. This theorem concerns \emph{compact} sets. As in $\R$, a compact set in $\R^n$ is a closed and bounded set.

Recall that a collection $\mathcal{H}$ of open sets is an open covering of a set $S$ if $S\subset\bigcup\set{H}{H\in\mathcal{H}}$.

The proof is by contradiction. We first consider the case where $n=2$, so that you can visualize the method. Suppose that there is a covering $\mathcal{H}$ for $S$ from which it is impossible to select a finite subcovering. Since $S$ is bounded, $S$ is contained in a closed square
\[
T=\set{(x,y)}{a_1\le x\le a_1+L,\ a_2\le y\le a_2+L}
\]
with sides of length $L$ (Figure~).


Bisecting the sides of $T$ as shown by the dashed lines in Figure~ leads to four closed squares, $T^{(1)}$, $T^{(2)}$, $T^{(3)}$, and $T^{(4)}$, with sides of length $L/2$. Let
\[
S^{(i)}=S\cap T^{(i)},\quad 1\le i\le4.
\]
Each $S^{(i)}$, being the intersection of closed sets, is closed, and
\[
S=\bigcup_{i=1}^4 S^{(i)}.
\]
Moreover, $\mathcal{H}$ covers each $S^{(i)}$, but at least one $S^{(i)}$ cannot be covered by any finite subcollection of $\mathcal{H}$, since if all the $S^{(i)}$ could be, then so could $S$. Let $S_1$ be a set with this property, chosen from $S^{(1)}$, $S^{(2)}$, $S^{(3)}$, and $S^{(4)}$. We are now back to the situation we started from: a compact set $S_1$ covered by $\mathcal{H}$, but not by any finite subcollection of $\mathcal{H}$. However, $S_1$ is contained in a square $T_1$ with sides of length $L/2$ instead of $L$. Bisecting the sides of $T_1$ and repeating the argument, we obtain a subset $S_2$ of $S_1$ that has the same properties as $S$, except that it is contained in a square with sides of length $L/4$. Continuing in this way produces a sequence of nonempty closed sets $S_0\,(=S)$, $S_1$, $S_2$, \dots, such that $S_k\supset S_{k+1}$ and $d(S_k)\le L/2^{k-1/2}$ $(k\ge0)$. From Theorem~, there is a point $\overline{\mathbf{X}}$ in $\bigcap_{k=1}^\infty S_k$. Since $\overline{\mathbf{X}}\in S$, there is an open set $H$ in $\mathcal{H}$ that contains $\overline{\mathbf{X}}$, and this $H$ must also contain some $\epsilon$-neighborhood of $\overline{\mathbf{X}}$. Since every $\mathbf{X}$ in $S_k$ satisfies the inequality
\[
|\mathbf{X}-\overline{\mathbf{X}}|\le 2^{-k+1/2}L,
\]
it follows that $S_k\subset H$ for $k$ sufficiently large. This contradicts our assumption on $\mathcal{H}$, which led us to believe that no $S_k$ could be covered by a finite number of sets from $\mathcal{H}$. Consequently, this assumption must be false: $\mathcal{H}$ must have a finite subcollection that covers $S$. This completes the proof for $n=2$.

The idea of the proof is the same for $n>2$. The counterpart of the square $T$ is the \emph{hypercube} with sides of length $L$:
\[
T=\set{(x_1,x_2,\dots,x_n)}{a_i\le x_i\le a_i+L,\ i=1,2,\dots,n}.
\]
Halving the intervals of variation of the $n$ coordinates $x_1$, $x_2$, \dots, $x_n$ divides $T$ into $2^n$ closed hypercubes with sides of length $L/2$:
\[
T^{(i)}=\set{(x_1,x_2,\dots,x_n)}{b_i\le x_i\le b_i+L/2,\ 1\le i\le n},
\]
where $b_i=a_i$ or $b_i=a_i+L/2$. If no finite subcollection of $\mathcal{H}$ covers $S$, then at least one of these smaller hypercubes must contain a subset of $S$ that is not covered by any finite subcollection of $\mathcal{H}$. Now the proof proceeds as for $n=2$.

The Bolzano–Weierstrass theorem is valid in $\R^n$; its proof is the same as in $\R$.

Although it is legitimate to consider functions defined on arbitrary domains, we restricted our study of functions of one variable mainly to functions defined on intervals. There are good reasons for this. If we wish to raise questions of continuity and differentiability at every point of the domain $D$ of a function $f$, then every point of $D$ must be a limit point of $D^0$. Intervals have this property. Moreover, the definition of $\int_a^b f(x)\,dx$ is obviously applicable only if $f$ is defined on $[a,b]$.

It is not productive to consider questions of continuity and differentiability of functions defined on the union of disjoint intervals, since many important results simply do not hold for such domains. For example, the intermediate value theorem (Theorem~; see also Exercise~) says that if $f$ is continuous on an interval $I$ and $f(x_1)<\mu<f(x_2)$ for some $x_1$ and $x_2$ in $I$, then $f(\overline{x})=\mu$ for some $\overline{x}$ in $I$. Theorem~ says that $f$ is constant on an interval $I$ if $f'\equiv0$ on $I$. Neither of these results holds if $I$ is the union of disjoint intervals rather than a single interval; thus, if $f$ is defined on $I=(0,1)\cup(2,3)$ by
\[
f(x)=\left\{\begin{array}{ll}1,&0<x<1,\\0,&2<x<3,\end{array}\right.
\]
then $f$ is continuous on $I$, but does not assume any value between $0$ and $1$, and $f'\equiv0$ on $I$, but $f$ is not constant.

It is not difficult to see why these results fail to hold for this function: the domain of f consists of two disconnected pieces. It would be more sensible to regard f as two entirely different functions, one defined on (0,1) and the other on (2,3). The two results mentioned are valid for each of these functions.

As we will see when we study functions defined on subsets of $\R^n$, considerations like those just cited as making it natural to consider functions defined on intervals in $\R$ lead us to single out a preferred class of subsets as domains of functions of $n$ variables. These subsets are called \emph{regions}. To define this term, we first need the following definition.


If $\mathbf{X}_1,\mathbf{X}_2,\dots,\mathbf{X}_k$ are points in $\R^n$ and $L_i$ is the line segment from $\mathbf{X}_i$ to $\mathbf{X}_{i+1}$, $1\le i\le k-1$, we say that $L_1$, $L_2$, \dots, $L_{k-1}$ form a \emph{polygonal path} from $\mathbf{X}_1$ to $\mathbf{X}_k$, and that $\mathbf{X}_1$ and $\mathbf{X}_k$ are \emph{connected} by the polygonal path. For example, Figure~ shows a polygonal path in $\R^2$ connecting $(0,0)$ to $(3,3)$. A set $S$ is \emph{polygonally connected} if every pair of points in $S$ can be connected by a polygonal path lying entirely in $S$.

For sufficiency, we will show that if $S$ is disconnected, then $S$ is not polygonally connected. Let $S=A\cup B$, where $A$ and $B$ are as in the definition of a disconnected set. Suppose that $\mathbf{X}_1\in A$ and $\mathbf{X}_2\in B$, and assume that there is a polygonal path in $S$ connecting $\mathbf{X}_1$ to $\mathbf{X}_2$. Then some line segment $L$ in this path must contain a point $\mathbf{Y}_1$ in $A$ and a point $\mathbf{Y}_2$ in $B$. The line segment
\[
\mathbf{X}=t\mathbf{Y}_2+(1-t)\mathbf{Y}_1,\quad 0\le t\le1,
\]
is part of $L$ and therefore in $S$. Now define
\[
\rho=\sup\set{\tau}{t\mathbf{Y}_2+(1-t)\mathbf{Y}_1\in A,\ 0\le t\le\tau\le1},
\]
and let $\mathbf{X}_\rho=\rho\mathbf{Y}_2+(1-\rho)\mathbf{Y}_1$. Then $\mathbf{X}_\rho\in\overline{A}\cap\overline{B}$. However, since $\mathbf{X}_\rho\in A\cup B$ and $\overline{A}\cap B=A\cap\overline{B}=\emptyset$, this is impossible. Therefore, the assumption that there is a polygonal path in $S$ from $\mathbf{X}_1$ to $\mathbf{X}_2$ must be false.

For necessity, suppose that $S$ is a connected open set and $\mathbf{X}_0\in S$. Let $A$ be the set consisting of $\mathbf{X}_0$ and the points in $S$ that can be connected to $\mathbf{X}_0$ by polygonal paths in $S$. Let $B$ be the set of points in $S$ that cannot be connected to $\mathbf{X}_0$ by polygonal paths. If $\mathbf{Y}_0\in S$, then $S$ contains an $\epsilon$-neighborhood $N_\epsilon(\mathbf{Y}_0)$ of $\mathbf{Y}_0$, since $S$ is open. Any point $\mathbf{Y}_1$ in $N_\epsilon(\mathbf{Y}_0)$ can be connected to $\mathbf{Y}_0$ by the line segment
\[
\mathbf{X}=t\mathbf{Y}_1+(1-t)\mathbf{Y}_0,\quad 0\le t\le1,
\]
which lies in $N_\epsilon(\mathbf{Y}_0)$ (Lemma~) and therefore in $S$. This implies that $\mathbf{Y}_0$ can be connected to $\mathbf{X}_0$ by a polygonal path in $S$ if and only if every member of $N_\epsilon(\mathbf{Y}_0)$ can also. Thus, $N_\epsilon(\mathbf{Y}_0)\subset A$ if $\mathbf{Y}_0\in A$, and $N_\epsilon(\mathbf{Y}_0)\subset B$ if $\mathbf{Y}_0\in B$. Therefore, $A$ and $B$ are open. Since $A\cap B=\emptyset$, this implies that $A\cap\overline{B}=\overline{A}\cap B=\emptyset$ (Exercise~). Since $A$ is nonempty ($\mathbf{X}_0\in A$), it now follows that $B=\emptyset$, since if $B\ne\emptyset$, $S$ would be disconnected (Definition~). Therefore, $A=S$, which completes the proof of necessity.

We did not use the assumption that S is open in the proof of sufficiency. In fact, we actually proved that any polygonally connected set, open or not, is connected. The converse is false. A set (not open) may be connected but not polygonally connected (Exercise~).

Our study of functions on \Rn will deal mostly with functions whose domains are regions, defined next.

From Definition~, a sequence $\{\mathbf{X}_r\}$ of points in $\R^n$ converges to a limit $\overline{\mathbf{X}}$ if and only if for every $\epsilon>0$ there is an integer $K$ such that
\[
|\mathbf{X}_r-\overline{\mathbf{X}}|<\epsilon\mbox{\quad if\quad}r\ge K.
\]
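For example, the sequence $\{\mathbf{X}_r\}$ in $\R^2$ with $\mathbf{X}_r=(1/r,\,r/(r+1))$ converges to $\overline{\mathbf{X}}=(0,1)$, since
\[
|\mathbf{X}_r-\overline{\mathbf{X}}|=\sqrt{\frac{1}{r^2}+\frac{1}{(r+1)^2}}<\frac{\sqrt2}{r}.
\]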

The $\R^n$ definitions of divergence, boundedness, subsequence, and sums, differences, and constant multiples of sequences are analogous to those given in Sections~4.1 and 4.2 for the case where $n=1$. Since $\R^n$ is not ordered for $n>1$, monotonicity, limits inferior and superior of sequences in $\R^n$, and divergence to $\pm\infty$ are undefined for $n>1$. Products and quotients of members of $\R^n$ are also undefined if $n>1$.


Several theorems from Sections~4.1 and 4.2 remain valid for sequences in $\R^n$, with proofs unchanged, provided that ``$|\cdot|$'' is interpreted as distance in $\R^n$. (A trivial change is required: the subscript $n$, used in Sections~4.1 and 4.2 to identify the terms of the sequence, must be replaced, since $n$ here stands for the dimension of the space.) These include Theorems~ (uniqueness of the limit), (boundedness of a convergent sequence), parts of (concerning limits of sums, differences, and constant multiples of convergent sequences), and (every subsequence of a convergent sequence converges to the limit of the sequence).


We now study real-valued functions of $n$ variables. We denote the domain of a function $f$ by $D_f$ and the value of $f$ at a point $\mathbf{X}=(x_1,x_2,\dots,x_n)$ by $f(\mathbf{X})$ or $f(x_1,x_2,\dots,x_n)$. We continue the convention adopted in Section~2.1 for functions of one variable: if a function is defined by a formula such as
\[
f(\mathbf{X})=(1-x_1^2-x_2^2-\cdots-x_n^2)^{1/2}
\]
or
\[
g(\mathbf{X})=(1-x_1^2-x_2^2-\cdots-x_n^2)^{-1}
\]
without specification of its domain, it is to be understood that its domain is the largest subset of $\R^n$ for which the formula defines a unique real number. Thus, in the absence of any other stipulation, the domain of $f$ is the closed $n$-ball $\set{\mathbf{X}}{|\mathbf{X}|\le1}$, while the domain of $g$ is the set $\set{\mathbf{X}}{|\mathbf{X}|\ne1}$.
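Similarly, if $h(x,y)=1/(x-y)$, then the domain of $h$ is understood to be $\set{(x,y)}{x\ne y}$.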

The main objective of this section is to study limits and continuity of functions of n variables. The proofs of many of the theorems here are similar to the proofs of their counterparts in Sections~2.1 and . We leave most of them to you.

Definition~ does not require that $f$ be defined at $\mathbf{X}_0$, or even on a deleted neighborhood of $\mathbf{X}_0$.


The following theorem is analogous to Theorem~2.1.3. We leave its proof to you (Exercise~).

When investigating whether a function has a limit at a point $\mathbf{X}_0$, no restriction can be made on the way in which $\mathbf{X}$ approaches $\mathbf{X}_0$, except that $\mathbf{X}$ must be in $D_f$. The next example shows that incorrect restrictions can lead to incorrect conclusions.
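A standard illustration: let
\[
f(x,y)=\frac{2xy}{x^2+y^2},\quad (x,y)\ne(0,0).
\]
Along the line $y=mx$ we have $f(x,mx)=2m/(1+m^2)$, so restricting $\mathbf{X}$ to any single line through the origin produces a ``limit'' that depends on the line; hence $\lim_{(x,y)\to(0,0)}f(x,y)$ does not exist.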

The sum, difference, and product of functions of n variables are defined in the same way as they are for functions of one variable (Definition~), and the proof of the next theorem is the same as the proof of Theorem~.


We leave it to you to define $\lim_{|\mathbf{X}|\to\infty}f(\mathbf{X})=\infty$ and $\lim_{|\mathbf{X}|\to\infty}f(\mathbf{X})=-\infty$ (Exercise~).

We will continue the convention adopted in Section~2.1: ``$\lim_{\mathbf{X}\to\mathbf{X}_0}f(\mathbf{X})$ exists'' means that $\lim_{\mathbf{X}\to\mathbf{X}_0}f(\mathbf{X})=L$, where $L$ is finite; to leave open the possibility that $L=\pm\infty$, we will say that ``$\lim_{\mathbf{X}\to\mathbf{X}_0}f(\mathbf{X})$ exists in the extended reals.'' A similar convention applies to limits as $|\mathbf{X}|\to\infty$.

Theorem~ remains valid if ``$\lim_{\mathbf{X}\to\mathbf{X}_0}$'' is replaced by ``$\lim_{|\mathbf{X}|\to\infty}$,'' provided that $D$ is unbounded. Moreover, the conclusions on sums, differences, and products are valid in either version of Theorem~ if either or both of $L_1$ and $L_2$ is infinite, provided that their right sides are not indeterminate, and the conclusion on quotients remains valid if $L_2\ne0$ and $L_1/L_2$ is not indeterminate.

We now define continuity for functions of n variables. The definition is quite similar to the definition for functions of one variable.

The next theorem follows from this and Definition~.

In applying this theorem when $\mathbf{X}_0\in D_f^0$, we will usually omit ``and $\mathbf{X}\in D_f$,'' it being understood that $S_\delta(\mathbf{X}_0)\subset D_f$.

We will say that $f$ is \emph{continuous on} $S$ if $f$ is continuous at every point of $S$.

Theorem~ implies the next theorem, which is analogous to Theorem~ and, like the latter, permits us to investigate continuity of a given function by regarding the function as the result of addition, subtraction, multiplication, and division of simpler functions.

Suppose that $g_1$, $g_2$, \dots, $g_n$ are real-valued functions defined on a subset $T$ of $\R^m$, and define the \emph{vector-valued function} $\mathbf{G}$ on $T$ by
\[
\mathbf{G}(\mathbf{U})=(g_1(\mathbf{U}),g_2(\mathbf{U}),\dots,g_n(\mathbf{U})),\quad \mathbf{U}\in T.
\]
Then $g_1$, $g_2$, \dots, $g_n$ are the \emph{component functions} of $\mathbf{G}=(g_1,g_2,\dots,g_n)$. We say that
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}\mathbf{G}(\mathbf{U})=\mathbf{L}=(L_1,L_2,\dots,L_n)
\]
if
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}g_i(\mathbf{U})=L_i,\quad 1\le i\le n,
\]
and that $\mathbf{G}$ is \emph{continuous} at $\mathbf{U}_0$ if $g_1$, $g_2$, \dots, $g_n$ are each continuous at $\mathbf{U}_0$.

The next theorem follows from Theorem~ and Definitions~ and . We omit the proof.

The following theorem on the continuity of a composite function is analogous to Theorem~.


Suppose that $\epsilon>0$. Since $f$ is continuous at $\mathbf{X}_0=\mathbf{G}(\mathbf{U}_0)$, there is an $\epsilon_1>0$ such that
\[
|f(\mathbf{X})-f(\mathbf{G}(\mathbf{U}_0))|<\epsilon\mbox{\quad if\quad}|\mathbf{X}-\mathbf{G}(\mathbf{U}_0)|<\epsilon_1\mbox{\quad and\quad}\mathbf{X}\in D_f.
\]
Since $\mathbf{G}$ is continuous at $\mathbf{U}_0$, there is a $\delta>0$ such that
\[
|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|<\epsilon_1\mbox{\quad if\quad}|\mathbf{U}-\mathbf{U}_0|<\delta\mbox{\quad and\quad}\mathbf{U}\in D_{\mathbf{G}}.
\]
By taking $\mathbf{X}=\mathbf{G}(\mathbf{U})$ in the two preceding inequalities, we see that
\[
|h(\mathbf{U})-h(\mathbf{U}_0)|=|f(\mathbf{G}(\mathbf{U}))-f(\mathbf{G}(\mathbf{U}_0))|<\epsilon\mbox{\quad if\quad}|\mathbf{U}-\mathbf{U}_0|<\delta\mbox{\quad and\quad}\mathbf{U}\in T.
\]

The definitions of \emph{bounded above} and \emph{bounded below} on a set $S$ are the same for functions of $n$ variables as for functions of one variable, as are the definitions of the \emph{supremum} and \emph{infimum} of a function on a set $S$ (Section~2.2). The proofs of the next two theorems are similar to those of Theorems~ and ~ (Exercises~ and ~).

The next theorem is analogous to Theorem~.

If there is no such $\mathbf{C}$, then $S=R\cup T$, where
\[
R=\set{\mathbf{X}}{\mathbf{X}\in S\mbox{ and }f(\mathbf{X})<u}
\mbox{\quad and\quad}
T=\set{\mathbf{X}}{\mathbf{X}\in S\mbox{ and }f(\mathbf{X})>u}.
\]
If $\mathbf{X}_0\in R$, the continuity of $f$ implies that there is a $\delta>0$ such that $f(\mathbf{X})<u$ if $|\mathbf{X}-\mathbf{X}_0|<\delta$ and $\mathbf{X}\in S$. This means that $\mathbf{X}_0\notin\overline{T}$. Therefore, $R\cap\overline{T}=\emptyset$. Similarly, $\overline{R}\cap T=\emptyset$. Therefore, $S$ is disconnected (Definition~), which contradicts the assumption that $S$ is a region (Exercise~). Hence, we conclude that $f(\mathbf{C})=u$ for some $\mathbf{C}$ in $S$.

The definition of uniform continuity for functions of $n$ variables is the same as for functions of one variable; $f$ is uniformly continuous on a subset $S$ of its domain in $\R^n$ if for every $\epsilon>0$ there is a $\delta>0$ such that
\[
|f(\mathbf{X})-f(\mathbf{X}')|<\epsilon
\]
whenever $|\mathbf{X}-\mathbf{X}'|<\delta$ and $\mathbf{X},\mathbf{X}'\in S$. We emphasize again that $\delta$ must depend only on $\epsilon$ and $S$, and not on the particular points $\mathbf{X}$ and $\mathbf{X}'$.
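For example, $f(\mathbf{X})=|\mathbf{X}|$ is uniformly continuous on all of $\R^n$, since
\[
\bigl||\mathbf{X}|-|\mathbf{X}'|\bigr|\le|\mathbf{X}-\mathbf{X}'|
\]
(a consequence of the triangle inequality); hence, we may take $\delta=\epsilon$.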

The proof of the next theorem is analogous to that of Theorem~. We leave it to you (Exercise~).


To say that a function of one variable has a derivative at $x_0$ is the same as to say that it is differentiable at $x_0$. The situation is not so simple for a function $f$ of more than one variable. First, there is no specific number that can be called \emph{the} derivative of $f$ at a point $\mathbf{X}_0$ in $\R^n$. In fact, there are infinitely many numbers, called the \emph{directional derivatives of $f$ at} $\mathbf{X}_0$ (defined below), that are analogous to the derivative of a function of one variable. Second, we will see that the existence of directional derivatives at $\mathbf{X}_0$ does not imply that $f$ is differentiable at $\mathbf{X}_0$, if differentiability at $\mathbf{X}_0$ is to imply (as it does for functions of one variable) that $f(\mathbf{X})-f(\mathbf{X}_0)$ can be approximated well near $\mathbf{X}_0$ by a simple linear function, or even that $f$ is continuous at $\mathbf{X}_0$.

We will now define directional derivatives and partial derivatives of functions of several variables. However, we will still have occasion to refer to derivatives of functions of one variable. We will call them \emph{ordinary} derivatives when we wish to distinguish between them and the partial derivatives that we are about to define.


The directional derivatives that we are most interested in are those in the directions of the unit vectors
\[
\mathbf{E}_1=(1,0,\dots,0),\quad \mathbf{E}_2=(0,1,0,\dots,0),\quad\dots,\quad \mathbf{E}_n=(0,\dots,0,1).
\]
(All components of $\mathbf{E}_i$ are zero except for the $i$th, which is $1$.) Since $\mathbf{X}$ and $\mathbf{X}+t\mathbf{E}_i$ differ only in the $i$th coordinate, $\partial f(\mathbf{X})/\partial\mathbf{E}_i$ is called the \emph{partial derivative of $f$ with respect to $x_i$}. It is also denoted by $\partial f(\mathbf{X})/\partial x_i$ or $f_{x_i}(\mathbf{X})$; thus,
\[
\frac{\partial f(\mathbf{X})}{\partial x_1}=f_{x_1}(\mathbf{X})=\lim_{t\to0}\frac{f(x_1+t,x_2,\dots,x_n)-f(x_1,x_2,\dots,x_n)}{t},
\]
\[
\frac{\partial f(\mathbf{X})}{\partial x_i}=f_{x_i}(\mathbf{X})=\lim_{t\to0}\frac{f(x_1,\dots,x_{i-1},x_i+t,x_{i+1},\dots,x_n)-f(x_1,x_2,\dots,x_n)}{t}
\]
if $2\le i\le n-1$, and
\[
\frac{\partial f(\mathbf{X})}{\partial x_n}=f_{x_n}(\mathbf{X})=\lim_{t\to0}\frac{f(x_1,\dots,x_{n-1},x_n+t)-f(x_1,\dots,x_{n-1},x_n)}{t},
\]
if the limits exist.

If we write $\mathbf{X}=(x,y)$, then we denote the partial derivatives accordingly; thus,
\[
\frac{\partial f(x,y)}{\partial x}=f_x(x,y)=\lim_{h\to0}\frac{f(x+h,y)-f(x,y)}{h}
\]
and
\[
\frac{\partial f(x,y)}{\partial y}=f_y(x,y)=\lim_{k\to0}\frac{f(x,y+k)-f(x,y)}{k}.
\]

It can be seen from these definitions that to compute $f_{x_i}(\mathbf{X})$ we simply differentiate $f$ with respect to $x_i$ according to the rules for ordinary differentiation, while treating the other variables as constants.
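For example, if $f(x,y)=x^2y+\sin xy$, then
\[
f_x(x,y)=2xy+y\cos xy\mbox{\quad and\quad}f_y(x,y)=x^2+x\cos xy.
\]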


The next theorem follows from the rule just given for calculating partial derivatives.

If $f_{x_i}(\mathbf{X})$ exists at every point of a set $D$, then it defines a function $f_{x_i}$ on $D$. If this function has a partial derivative with respect to $x_j$ on a subset of $D$, we denote the partial derivative by
\[
\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)=\frac{\partial^2 f}{\partial x_j\,\partial x_i}=f_{x_ix_j}.
\]
Similarly,
\[
\frac{\partial}{\partial x_k}\left(\frac{\partial^2 f}{\partial x_j\,\partial x_i}\right)=\frac{\partial^3 f}{\partial x_k\,\partial x_j\,\partial x_i}=f_{x_ix_jx_k}.
\]
The function obtained by differentiating $f$ successively with respect to $x_{i_1}$, $x_{i_2}$, \dots, $x_{i_r}$ is denoted by
\[
\frac{\partial^r f}{\partial x_{i_r}\,\partial x_{i_{r-1}}\cdots\partial x_{i_1}}=f_{x_{i_1}x_{i_2}\cdots x_{i_r}};
\]
it is an \emph{$r$th-order partial derivative}.
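A standard example: let
\[
f(x,y)=\left\{\begin{array}{ll}\dst\frac{xy(x^2-y^2)}{x^2+y^2},&(x,y)\ne(0,0),\\[2\jot]
0,&(x,y)=(0,0).\end{array}\right.
\]
Then $f_x(0,y)=-y$ for all $y$ and $f_y(x,0)=x$ for all $x$ (verify), so $f_{xy}(0,0)=-1$ while $f_{yx}(0,0)=1$.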

This example shows that f_{xy}(\mathbf{X}_0) and f_{yx}(\mathbf{X}_0) may differ. However, the next theorem shows that they are equal if f satisfies a fairly mild condition.

Suppose that $\epsilon>0$. Choose $\delta>0$ so that the open square
\[
S_\delta=\set{(x,y)}{|x-x_0|<\delta,\ |y-y_0|<\delta}
\]
is in $N$ and
\begin{equation}\label{eq:5.3.6}
|f_{xy}(\widehat{x},\widehat{y})-f_{xy}(x_0,y_0)|<\epsilon\mbox{\quad if\quad}(\widehat{x},\widehat{y})\in S_\delta.
\end{equation}
This is possible because of the continuity of $f_{xy}$ at $(x_0,y_0)$. The function
\begin{equation}\label{eq:5.3.7}
A(h,k)=f(x_0+h,y_0+k)-f(x_0+h,y_0)-f(x_0,y_0+k)+f(x_0,y_0)
\end{equation}
is defined if $-\delta<h,k<\delta$; moreover,
\begin{equation}\label{eq:5.3.8}
A(h,k)=\phi(x_0+h)-\phi(x_0),
\end{equation}
where
\[
\phi(x)=f(x,y_0+k)-f(x,y_0).
\]
Since
\[
\phi'(x)=f_x(x,y_0+k)-f_x(x,y_0),\quad |x-x_0|<\delta,
\]
\eqref{eq:5.3.8} and the mean value theorem imply that
\begin{equation}\label{eq:5.3.9}
A(h,k)=\left[f_x(\widehat{x},y_0+k)-f_x(\widehat{x},y_0)\right]h,
\end{equation}
where $\widehat{x}$ is between $x_0$ and $x_0+h$. The mean value theorem, applied to $f_x(\widehat{x},y)$ (where $\widehat{x}$ is regarded as constant), also implies that
\[
f_x(\widehat{x},y_0+k)-f_x(\widehat{x},y_0)=f_{xy}(\widehat{x},\widehat{y})k,
\]
where $\widehat{y}$ is between $y_0$ and $y_0+k$. From this and \eqref{eq:5.3.9},
\[
A(h,k)=f_{xy}(\widehat{x},\widehat{y})hk.
\]
Now \eqref{eq:5.3.6} implies that
\begin{equation}\label{eq:5.3.10}
\left|\frac{A(h,k)}{hk}-f_{xy}(x_0,y_0)\right|=\left|f_{xy}(\widehat{x},\widehat{y})-f_{xy}(x_0,y_0)\right|<\epsilon\mbox{\quad if\quad}0<|h|,|k|<\delta.
\end{equation}
Since \eqref{eq:5.3.7} implies that
\begin{eqnarray*}
\lim_{k\to0}\frac{A(h,k)}{hk}\ar=\lim_{k\to0}\frac{f(x_0+h,y_0+k)-f(x_0+h,y_0)}{hk}\\
\ar{}-\lim_{k\to0}\frac{f(x_0,y_0+k)-f(x_0,y_0)}{hk}\\
\ar=\frac{f_y(x_0+h,y_0)-f_y(x_0,y_0)}{h},
\end{eqnarray*}
it follows from \eqref{eq:5.3.10} that
\[
\left|\frac{f_y(x_0+h,y_0)-f_y(x_0,y_0)}{h}-f_{xy}(x_0,y_0)\right|\le\epsilon\mbox{\quad if\quad}0<|h|<\delta.
\]

Taking the limit as $h\to0$ yields
\[
|f_{yx}(x_0,y_0)-f_{xy}(x_0,y_0)|\le\epsilon.
\]
Since $\epsilon$ is an arbitrary positive number, this proves the conclusion.

Theorem~ implies the following theorem. We leave the proof to you (Exercises~ and ).

For example, if f satisfies the hypotheses of Theorem~ with k=4 at a point \mathbf{X}_0 in \R^n (n\ge2), then f_{xxyy}(\mathbf{X}_0)=f_{xyxy}(\mathbf{X}_0)=f_{xyyx}(\mathbf{X}_0)=f_{yyxx}(\mathbf{X}_0)= f_{yxyx}(\mathbf{X}_0)=f_{yxxy}(\mathbf{X}_0), \nonumber and their common value is denoted by \frac{\partial^4f(\mathbf{X}_0)}{\partial x^2\partial y^2}. \nonumber

It can be shown (Exercise~) that if $f$ is a function of $(x_1,x_2,\dots,x_n)$ and $(r_1,r_2,\dots,r_n)$ is a fixed ordered $n$-tuple of nonnegative integers with $r_1+r_2+\cdots+r_n=r$, then the number of partial derivatives $f_{x_{i_1}x_{i_2}\cdots x_{i_r}}$ that involve differentiation $r_i$ times with respect to $x_i$, $1\le i\le n$, equals the \emph{multinomial coefficient}
\[
\frac{r!}{r_1!\,r_2!\cdots r_n!}.
\]
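For example, if $r=3$, $n=2$, $r_1=2$, and $r_2=1$, the coefficient is $3!/(2!\,1!)=3$, corresponding to the three derivatives $f_{xxy}$, $f_{xyx}$, and $f_{yxx}$.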

A function of several variables may have first-order partial derivatives at a point \mathbf{X}_0 but fail to be continuous at \mathbf{X}_0. For example, if

\begin{equation}\label{eq:5.3.15} f(x,y)=\left\{\casespace\begin{array}{ll}\dst\frac{xy}{ x^2+y^2},&(x,y)\ne (0,0),\\[2\jot] 0,&(x,y)=(0,0),\end{array}\right. \end{equation} \nonumber

then
\begin{eqnarray*}
f_x(0,0)\ar=\lim_{h\to0}\frac{f(h,0)-f(0,0)}{h}=\lim_{h\to0}\frac{0-0}{h}=0\\
\arraytext{and}\\
f_y(0,0)\ar=\lim_{k\to0}\frac{f(0,k)-f(0,0)}{k}=\lim_{k\to0}\frac{0-0}{k}=0,
\end{eqnarray*}
but $f$ is not continuous at $(0,0)$. (See Examples~ and ~.) Therefore, if differentiability of a function of several variables is to be a stronger property than continuity, as it is for functions of one variable, the definition of differentiability must require more than the existence of first partial derivatives. Exercise~ characterizes differentiability of a function $f$ of one variable in a way that suggests the proper generalization: $f$ is differentiable at $x_0$ if and only if
\[
\lim_{x\to x_0}\frac{f(x)-f(x_0)-m(x-x_0)}{x-x_0}=0
\]
for some constant $m$, in which case $m=f'(x_0)$.

The generalization to functions of n variables is as follows.

From this, $m_1=f_x(x_0,y_0)$ and $m_2=f_y(x_0,y_0)$ in Example~. The next theorem shows that this is not a coincidence.

Let i be a given integer in \{1,2, \dots,n\}. Let \mathbf{X}=\mathbf{X}_0+t\mathbf{E}_i, so that x_i=x_{i0}+t, x_j =x_{j0} if j\ne i, and |\mathbf{X}-\mathbf{X}_0|=|t|. Then and the differentiability of f at \mathbf{X}_0 imply that \lim_{t\to 0}\frac{f(\mathbf{X}_0+t\mathbf{E}_i)-f(\mathbf{X}_0)-m_it}{ t}=0. \nonumber

Hence,
\[
\lim_{t\to0}\frac{f(\mathbf{X}_0+t\mathbf{E}_i)-f(\mathbf{X}_0)}{t}=m_i.
\]

This proves , since the limit on the left is f_{x_i} (\mathbf{X}_0), by definition.

A \emph{linear function} is a function of the form
\begin{equation}\label{eq:5.3.19}
L(\mathbf{X})=m_1x_1+m_2x_2+\cdots+m_nx_n,
\end{equation}
where $m_1$, $m_2$, \dots, $m_n$ are constants. From Definition~, $f$ is differentiable at $\mathbf{X}_0$ if and only if there is a linear function $L$ such that $f(\mathbf{X})-f(\mathbf{X}_0)$ can be approximated so well near $\mathbf{X}_0$ by
\[
L(\mathbf{X})-L(\mathbf{X}_0)=L(\mathbf{X}-\mathbf{X}_0)
\]
that
\begin{equation}\label{eq:5.3.20}
f(\mathbf{X})-f(\mathbf{X}_0)=L(\mathbf{X}-\mathbf{X}_0)+E(\mathbf{X})(|\mathbf{X}-\mathbf{X}_0|),
\end{equation}
where
\begin{equation}\label{eq:5.3.21}
\lim_{\mathbf{X}\to\mathbf{X}_0}E(\mathbf{X})=0.
\end{equation}


From \eqref{eq:5.3.19} and Schwarz's inequality,
\[
|L(\mathbf{X}-\mathbf{X}_0)|\le M|\mathbf{X}-\mathbf{X}_0|,
\]
where
\[
M=(m^2_1+m^2_2+\cdots+m^2_n)^{1/2}.
\]
This and \eqref{eq:5.3.20} imply that
\[
|f(\mathbf{X})-f(\mathbf{X}_0)|\le(M+|E(\mathbf{X})|)|\mathbf{X}-\mathbf{X}_0|,
\]
which, with \eqref{eq:5.3.21}, implies that $f$ is continuous at $\mathbf{X}_0$.

Theorem~ implies that the function $f$ defined by \eqref{eq:5.3.15} is not differentiable at $(0,0)$, since it is not continuous at $(0,0)$. However, $f_x(0,0)$ and $f_y(0,0)$ exist, so the converse of Theorem~ is false; that is, a function may have partial derivatives at a point without being differentiable at the point.

Theorem~ implies that if $f$ is differentiable at $\mathbf{X}_0$, then there is exactly one linear function $L$ that satisfies \eqref{eq:5.3.20} and \eqref{eq:5.3.21}:
\[
L(\mathbf{X})=f_{x_1}(\mathbf{X}_0)x_1+f_{x_2}(\mathbf{X}_0)x_2+\cdots+f_{x_n}(\mathbf{X}_0)x_n.
\]

This function is called the \emph{differential of $f$ at $\mathbf{X}_0$}. We will denote it by $d_{\mathbf{X}_0}f$ and its value by $(d_{\mathbf{X}_0}f)(\mathbf{X})$; thus,
\begin{equation}\label{eq:5.3.22}
(d_{\mathbf{X}_0}f)(\mathbf{X})=f_{x_1}(\mathbf{X}_0)x_1+f_{x_2}(\mathbf{X}_0)x_2+\cdots+f_{x_n}(\mathbf{X}_0)x_n.
\end{equation}
In terms of the differential, this can be rewritten as
\[
\lim_{\mathbf{X}\to\mathbf{X}_0}\frac{f(\mathbf{X})-f(\mathbf{X}_0)-(d_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|}=0.
\]

For convenience in writing $d_{\mathbf{X}_0}f$, and to conform with standard notation, we introduce the function $dx_i$, defined by
\[
dx_i(\mathbf{X})=x_i;
\]
that is, $dx_i$ is the function whose value at a point in $\R^n$ is the $i$th coordinate of the point. It is the differential of the function $g_i(\mathbf{X})=x_i$. From \eqref{eq:5.3.22},
\begin{equation}\label{eq:5.3.23}
d_{\mathbf{X}_0}f=f_{x_1}(\mathbf{X}_0)\,dx_1+f_{x_2}(\mathbf{X}_0)\,dx_2+\cdots+f_{x_n}(\mathbf{X}_0)\,dx_n.
\end{equation}

If we write $\mathbf{X}=(x,y,\dots)$, then we write
\[
d_{\mathbf{X}_0}f=f_x(\mathbf{X}_0)\,dx+f_y(\mathbf{X}_0)\,dy+\cdots,
\]
where $dx$, $dy$, \dots\ are the functions defined by
\[
dx(\mathbf{X})=x,\quad dy(\mathbf{X})=y,\quad\dots
\]

When it is not necessary to emphasize the specific point $\mathbf{X}_0$, \eqref{eq:5.3.23} can be written more simply as
\[
df=f_{x_1}\,dx_1+f_{x_2}\,dx_2+\cdots+f_{x_n}\,dx_n.
\]
When dealing with a specific function at an arbitrary point of its domain, we may use the hybrid notation
\[
df=f_{x_1}(\mathbf{X})\,dx_1+f_{x_2}(\mathbf{X})\,dx_2+\cdots+f_{x_n}(\mathbf{X})\,dx_n.
\]
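For example, if $f(x,y)=x^2y$, then
\[
df=2xy\,dx+x^2\,dy,
\]
and at $\mathbf{X}_0=(1,2)$, $d_{\mathbf{X}_0}f=4\,dx+dy$.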

Unfortunately, the notation for the differential is so complicated that it obscures the simplicity of the concept. The peculiar symbols $df$, $dx$, $dy$, etc., were introduced in the early stages of the development of calculus to represent very small (``infinitesimal'') increments in the variables. However, in modern usage they are not quantities at all, but linear functions. This meaning of the symbol $dx$ differs from its meaning in $\int_a^b f(x)\,dx$, where it serves merely to identify the variable of integration; indeed, some authors omit it in the latter context and write simply $\int_a^b f$.

Theorem~ implies the following lemma, which is analogous to Lemma~. We leave the proof to you (Exercise~).

Theorems~ and ~ and the definition of the differential imply the following theorem.

The next theorem provides a widely applicable sufficient condition for differentiability.

Let $\mathbf{X}_0=(x_{10},x_{20},\dots,x_{n0})$ and suppose that $\epsilon>0$. Our assumptions imply that there is a $\delta>0$ such that $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ are defined in the $n$-ball
\[
S_\delta(\mathbf{X}_0)=\set{\mathbf{X}}{|\mathbf{X}-\mathbf{X}_0|<\delta}
\]
and
\begin{equation}\label{eq:5.3.24}
|f_{x_j}(\mathbf{X})-f_{x_j}(\mathbf{X}_0)|<\epsilon\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta,\quad 1\le j\le n.
\end{equation}
Let $\mathbf{X}=(x_1,x_2,\dots,x_n)$ be in $S_\delta(\mathbf{X}_0)$. Define
\[
\mathbf{X}_j=(x_1,\dots,x_j,x_{j+1,0},\dots,x_{n0}),\quad 1\le j\le n-1,
\]
and $\mathbf{X}_n=\mathbf{X}$. Thus, for $1\le j\le n$, $\mathbf{X}_j$ differs from $\mathbf{X}_{j-1}$ in the $j$th component only, and the line segment from $\mathbf{X}_{j-1}$ to $\mathbf{X}_j$ is in $S_\delta(\mathbf{X}_0)$. Now write
\begin{equation}\label{eq:5.3.25}
f(\mathbf{X})-f(\mathbf{X}_0)=f(\mathbf{X}_n)-f(\mathbf{X}_0)=\sum^n_{j=1}\,[f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})],
\end{equation}
and consider the auxiliary functions
\begin{equation}\label{eq:5.3.26}
\begin{array}{rcl}
g_1(t)\ar=f(t,x_{20},\dots,x_{n0}),\\[2\jot]
g_j(t)\ar=f(x_1,\dots,x_{j-1},t,x_{j+1,0},\dots,x_{n0}),\quad 2\le j\le n-1,\\[2\jot]
g_n(t)\ar=f(x_1,\dots,x_{n-1},t),
\end{array}
\end{equation}
where, in each case, all variables except $t$ are temporarily regarded as constants. Since
\[
f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})=g_j(x_j)-g_j(x_{j0}),
\]
the mean value theorem implies that
\[
f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})=g'_j(\tau_j)(x_j-x_{j0}),
\]

where $\tau_j$ is between $x_j$ and $x_{j0}$. From \eqref{eq:5.3.26},
\[
g'_j(\tau_j)=f_{x_j}(\widehat{\mathbf{X}}_j),
\]
where $\widehat{\mathbf{X}}_j$ is on the line segment from $\mathbf{X}_{j-1}$ to $\mathbf{X}_j$. Therefore,
\[
f(\mathbf{X}_j)-f(\mathbf{X}_{j-1})=f_{x_j}(\widehat{\mathbf{X}}_j)(x_j-x_{j0}),
\]
and \eqref{eq:5.3.25} implies that
\begin{eqnarray*}
f(\mathbf{X})-f(\mathbf{X}_0)\ar=\sum^n_{j=1} f_{x_j}(\widehat{\mathbf{X}}_j)(x_j-x_{j0})\\
\ar=\sum^n_{j=1} f_{x_j}(\mathbf{X}_0)(x_j-x_{j0})+\sum^n_{j=1}\,[f_{x_j}(\widehat{\mathbf{X}}_j)-f_{x_j}(\mathbf{X}_0)](x_j-x_{j0}).
\end{eqnarray*}
From this and \eqref{eq:5.3.24},
\[
\left|f(\mathbf{X})-f(\mathbf{X}_0)-\sum^n_{j=1} f_{x_j}(\mathbf{X}_0)(x_j-x_{j0})\right|\le\epsilon\sum^n_{j=1}|x_j-x_{j0}|\le n\epsilon|\mathbf{X}-\mathbf{X}_0|,
\]
which implies that $f$ is differentiable at $\mathbf{X}_0$.

We say that $f$ is \emph{continuously differentiable} on a subset $S$ of $\R^n$ if $S$ is contained in an open set on which $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ are continuous. Theorem~ implies that such a function is differentiable at each $\mathbf{X}_0$ in $S$.

In Section~2.3 we saw that if a function f of one variable is differentiable at x_0, then the curve y=f(x) has a tangent line y=T(x)=f(x_0)+f'(x_0)(x-x_0) \nonumber that approximates it so well near x_0 that \lim_{x\to x_0}\frac{f(x)-T(x)}{ x-x_0}=0. \nonumber

Moreover, the tangent line is the ``limit'' of the secant line through the points $(x_1,f(x_1))$ and $(x_0,f(x_0))$ as $x_1$ approaches $x_0$.


Differentiability of a function of $n$ variables has an analogous geometric interpretation. We will illustrate it for $n=2$. If $f$ is defined in a region $D$ in $\R^2$, then the set of points $(x,y,z)$ such that
\begin{equation}\label{eq:5.3.27}
z=f(x,y),\quad (x,y)\in D,
\end{equation}
is a \emph{surface} in $\R^3$ (Figure~).


If f is differentiable at \mathbf{X}_0=(x_0,y_0), then the plane \begin{equation}\label{eq:5.3.28} z=T(x,y)=f(\mathbf{X}_0)+f_x(\mathbf{X}_0)(x-x_0)+f_y(\mathbf{X}_0)(y-y_0) \end{equation} \nonumber intersects the surface at (x_0,y_0,f(x_0,y_0)) and approximates the surface so well near (x_0,y_0) that

\[
\lim_{(x,y)\to(x_0,y_0)}\frac{f(x,y)-T(x,y)}{\sqrt{(x-x_0)^2+(y-y_0)^2}}=0
\]
(Figure~). Moreover, \eqref{eq:5.3.28} is the only plane in $\R^3$ with these properties (Exercise~). We say that this plane is \emph{tangent to the surface $z=f(x,y)$ at the point} $(x_0,y_0,f(x_0,y_0))$. We will now show that it is the ``limit'' of ``secant planes'' associated with the surface $z=f(x,y)$, just as a tangent line to a curve $y=f(x)$ in $\R^2$ is the limit of secant lines to the curve (Section~2.3).
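For example, if $f(x,y)=x^2+y^2$, then $f_x(1,1)=f_y(1,1)=2$, and the plane tangent to the surface $z=x^2+y^2$ at $(1,1,2)$ is
\[
z=2+2(x-1)+2(y-1).
\]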

Let $\mathbf{X}_i=(x_i,y_i)$ $(i=0,1,2)$. The equation of the ``secant plane'' through the points $(x_i,y_i,f(x_i,y_i))$ $(i=0,1,2)$ on the surface $z=f(x,y)$ (Figure~) is of the form
\begin{equation}\label{eq:5.3.29}
z=f(\mathbf{X}_0)+A(x-x_0)+B(y-y_0),
\end{equation}
where $A$ and $B$ satisfy the system
\begin{eqnarray*}
f(\mathbf{X}_1)\ar=f(\mathbf{X}_0)+A(x_1-x_0)+B(y_1-y_0),\\
f(\mathbf{X}_2)\ar=f(\mathbf{X}_0)+A(x_2-x_0)+B(y_2-y_0).
\end{eqnarray*}
Solving for $A$ and $B$ yields
\begin{eqnarray}
A\ar=\frac{(f(\mathbf{X}_1)-f(\mathbf{X}_0))(y_2-y_0)-(f(\mathbf{X}_2)-f(\mathbf{X}_0))(y_1-y_0)}{(x_1-x_0)(y_2-y_0)-(x_2-x_0)(y_1-y_0)}\label{eq:5.3.30}\\
\arraytext{and}\nonumber\\
B\ar=\frac{(f(\mathbf{X}_2)-f(\mathbf{X}_0))(x_1-x_0)-(f(\mathbf{X}_1)-f(\mathbf{X}_0))(x_2-x_0)}{(x_1-x_0)(y_2-y_0)-(x_2-x_0)(y_1-y_0)}\label{eq:5.3.31}
\end{eqnarray}
if
\begin{equation}\label{eq:5.3.32}
(x_1-x_0)(y_2-y_0)-(x_2-x_0)(y_1-y_0)\ne0,
\end{equation}
which is equivalent to the requirement that $\mathbf{X}_0$, $\mathbf{X}_1$, and $\mathbf{X}_2$ do not lie on a line (Exercise~). If we write
\[
\mathbf{X}_1=\mathbf{X}_0+t\mathbf{U}\mbox{\quad and\quad}\mathbf{X}_2=\mathbf{X}_0+t\mathbf{V},
\]
where $\mathbf{U}=(u_1,u_2)$ and $\mathbf{V}=(v_1,v_2)$ are fixed nonzero vectors (Figure~), then \eqref{eq:5.3.30}, \eqref{eq:5.3.31}, and \eqref{eq:5.3.32} take the more convenient forms
\begin{eqnarray}
A\ar=\frac{\dst{\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t}v_2-\frac{f(\mathbf{X}_0+t\mathbf{V})-f(\mathbf{X}_0)}{t}u_2}}{u_1v_2-u_2v_1},\label{eq:5.3.33}\\
B\ar=\frac{\dst{\frac{f(\mathbf{X}_0+t\mathbf{V})-f(\mathbf{X}_0)}{t}u_1-\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t}v_1}}{u_1v_2-u_2v_1},\label{eq:5.3.34}
\end{eqnarray}
and
\[
u_1v_2-u_2v_1\ne0.
\]


If $f$ is differentiable at $\mathbf{X}_0$, then
\begin{equation}\label{eq:5.3.35}
f(\mathbf{X})-f(\mathbf{X}_0)=f_x(\mathbf{X}_0)(x-x_0)+f_y(\mathbf{X}_0)(y-y_0)+\epsilon(\mathbf{X})|\mathbf{X}-\mathbf{X}_0|,
\end{equation}
where
\begin{equation}\label{eq:5.3.36}
\lim_{\mathbf{X}\to\mathbf{X}_0}\epsilon(\mathbf{X})=0.
\end{equation}
Substituting first $\mathbf{X}=\mathbf{X}_0+t\mathbf{U}$ and then $\mathbf{X}=\mathbf{X}_0+t\mathbf{V}$ in \eqref{eq:5.3.35} and dividing by $t$ yields
\begin{equation}\label{eq:5.3.37}
\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t}=f_x(\mathbf{X}_0)u_1+f_y(\mathbf{X}_0)u_2+E_1(t)|\mathbf{U}|
\end{equation}
and
\begin{equation}\label{eq:5.3.38}
\frac{f(\mathbf{X}_0+t\mathbf{V})-f(\mathbf{X}_0)}{t}=f_x(\mathbf{X}_0)v_1+f_y(\mathbf{X}_0)v_2+E_2(t)|\mathbf{V}|,
\end{equation}
where
\[
E_1(t)=\epsilon(\mathbf{X}_0+t\mathbf{U})|t|/t\mbox{\quad and\quad}E_2(t)=\epsilon(\mathbf{X}_0+t\mathbf{V})|t|/t,
\]
so
\begin{equation}\label{eq:5.3.39}
\lim_{t\to0}E_i(t)=0,\quad i=1,2,
\end{equation}
because of \eqref{eq:5.3.36}. Substituting \eqref{eq:5.3.37} and \eqref{eq:5.3.38} into \eqref{eq:5.3.33} and \eqref{eq:5.3.34} yields
\begin{equation}\label{eq:5.3.40}
A=f_x(\mathbf{X}_0)+\Delta_1(t),\quad B=f_y(\mathbf{X}_0)+\Delta_2(t),
\end{equation}
where

\[
\Delta_1(t)=\frac{v_2|\mathbf{U}|E_1(t)-u_2|\mathbf{V}|E_2(t)}{u_1v_2-u_2v_1}
\]
and
\[
\Delta_2(t)=\frac{u_1|\mathbf{V}|E_2(t)-v_1|\mathbf{U}|E_1(t)}{u_1v_2-u_2v_1},
\]
so
\begin{equation}\label{eq:5.3.41}
\lim_{t\to0}\Delta_i(t)=0,\quad i=1,2,
\end{equation}
because of \eqref{eq:5.3.39}.

From \eqref{eq:5.3.29} and \eqref{eq:5.3.40}, the equation of the secant plane is
\[
z=f(\mathbf{X}_0)+[f_x(\mathbf{X}_0)+\Delta_1(t)](x-x_0)+[f_y(\mathbf{X}_0)+\Delta_2(t)](y-y_0).
\]
Therefore, because of \eqref{eq:5.3.41}, the secant plane ``approaches'' the tangent plane as $t$ approaches zero.

We say that $\mathbf{X}_0$ is a \emph{local extreme point} of $f$ if there is a $\delta>0$ such that
\[
f(\mathbf{X})-f(\mathbf{X}_0)
\]
does not change sign in $S_\delta(\mathbf{X}_0)\cap D_f$. More specifically, $\mathbf{X}_0$ is a \emph{local maximum point} if
\[
f(\mathbf{X})\le f(\mathbf{X}_0)
\]
or a \emph{local minimum point} if
\[
f(\mathbf{X})\ge f(\mathbf{X}_0)
\]
for all $\mathbf{X}$ in $S_\delta(\mathbf{X}_0)\cap D_f$.

The next theorem is analogous to Theorem~.

Let \mathbf{E}_1=(1,0, \dots,0),\quad \mathbf{E}_{2} =(0,1,0, \dots,0),\dots,\quad \mathbf{E}_n= (0,0, \dots,1), \nonumber and g_i(t)=f(\mathbf{X}_0+t\mathbf{E}_i),\quad 1\le i\le n. \nonumber Then g_i is differentiable at t=0, with g'_i(0)=f_{x_i}(\mathbf{X}_0) \nonumber

(Definition~). Since $\mathbf{X}_0$ is a local extreme point of $f$, $t_0=0$ is a local extreme point of $g_i$. Now Theorem~ implies that $g'_i(0)=0$, and this implies the conclusion.

The converse of Theorem~ is false, since the condition may hold at a point $\mathbf{X}_0$ that is not a local extreme point of $f$. For example, let $\mathbf{X}_0=(0,0)$ and
\[
f(x,y)=x^3+y^3.
\]
We say that a point $\mathbf{X}_0$ where the condition holds is a \emph{critical point} of $f$. Thus, if $f$ is defined in a neighborhood of a local extreme point $\mathbf{X}_0$, then $\mathbf{X}_0$ is a critical point of $f$; however, a critical point need not be a local extreme point of $f$.

The use of Theorem~ for finding local extreme points is covered in calculus, so we will not pursue it here.

We now consider the problem of differentiating a composite function h(\mathbf{U})=f(\mathbf{G}(\mathbf{U})), \nonumber where \mathbf{G}=(g_1,g_2, \dots,g_n) is a vector-valued function, as defined in Section~5.2. We begin with the following definition.

We need the following lemma to prove the main result of the section.

Since $g_1$, $g_2$, \dots, $g_n$ are differentiable at $\mathbf{U}_0$, applying Lemma~ to $g_i$ shows that
\begin{equation}\label{eq:5.4.1}
\begin{array}{rcl}
g_i(\mathbf{U})-g_i(\mathbf{U}_0)\ar=(d_{\mathbf{U}_0}g_i)(\mathbf{U}-\mathbf{U}_0)+E_i(\mathbf{U})\,|\mathbf{U}-\mathbf{U}_0|\\[2\jot]
\ar=\dst\sum_{j=1}^m\frac{\partial g_i(\mathbf{U}_0)}{\partial u_j}(u_j-u_{j0})+E_i(\mathbf{U})\,|\mathbf{U}-\mathbf{U}_0|,
\end{array}
\end{equation}

where
\begin{equation}\label{eq:5.4.2}
\lim_{\mathbf{U}\to\mathbf{U}_0}E_i(\mathbf{U})=0,\quad 1\le i\le n.
\end{equation}
From Schwarz's inequality,
\[
|g_i(\mathbf{U})-g_i(\mathbf{U}_0)|\le(M_i+|E_i(\mathbf{U})|)|\mathbf{U}-\mathbf{U}_0|,
\]
where
\[
M_i=\left(\sum_{j=1}^m\left(\frac{\partial g_i(\mathbf{U}_0)}{\partial u_j}\right)^2\right)^{1/2}.
\]
Therefore,
\[
\frac{|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|}{|\mathbf{U}-\mathbf{U}_0|}\le\left(\sum_{i=1}^n(M_i+|E_i(\mathbf{U})|)^2\right)^{1/2}.
\]
From \eqref{eq:5.4.2},
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}\left(\sum_{i=1}^n(M_i+|E_i(\mathbf{U})|)^2\right)^{1/2}=\left(\sum_{i=1}^n M_i^2\right)^{1/2}=M,
\]
which implies the conclusion.

The following theorem is analogous to Theorem~.

We leave it to you to show that \mathbf{U}_0 is an interior point of the domain of h (Exercise~), so it is legitimate to ask if h is differentiable at \mathbf{U}_0.

Let \mathbf{X}_0=(x_{10},x_{20}, \dots,x_{n0}). Note that x_{i0}=g_i(\mathbf{U}_0),\quad 1\le i\le n, \nonumber by assumption. Since f is differentiable at \mathbf{X}_0, Lemma~ implies that \begin{equation} \label{eq:5.4.5} f(\mathbf{X})-f(\mathbf{X}_0)=\sum_{i=1}^n f_{x_i} (\mathbf{X}_0) (x_i-x_{i0})+E(\mathbf{X})|\mathbf{X}-\mathbf{X}_0|, \end{equation} \nonumber where \lim_{\mathbf{X}\to\mathbf{X}_0}E(\mathbf{X})=0. \nonumber

Substituting $\mathbf{X}=\mathbf{G}(\mathbf{U})$ and $\mathbf{X}_0=\mathbf{G}(\mathbf{U}_0)$ in \eqref{eq:5.4.5} and recalling that $h(\mathbf{U})=f(\mathbf{G}(\mathbf{U}))$ yields
\begin{equation}\label{eq:5.4.6}
h(\mathbf{U})-h(\mathbf{U}_0)=\dst{\sum_{i=1}^n}\,f_{x_i}(\mathbf{X}_0)(g_i(\mathbf{U})-g_i(\mathbf{U}_0))+E(\mathbf{G}(\mathbf{U}))|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|.
\end{equation}
Substituting \eqref{eq:5.4.1} into \eqref{eq:5.4.6} yields
\[
\begin{array}{rcl}
h(\mathbf{U})-h(\mathbf{U}_0)\ar=\dst{\sum_{i=1}^n}f_{x_i}(\mathbf{X}_0)(d_{\mathbf{U}_0}g_i)(\mathbf{U}-\mathbf{U}_0)+\dst{\left(\sum_{i=1}^n f_{x_i}(\mathbf{X}_0)E_i(\mathbf{U})\right)}|\mathbf{U}-\mathbf{U}_0|\\\\
\ar{}+E(\mathbf{G}(\mathbf{U}))|\mathbf{G}(\mathbf{U})-\mathbf{G}(\mathbf{U}_0)|.
\end{array}
\]
Since
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}E(\mathbf{G}(\mathbf{U}))=\lim_{\mathbf{X}\to\mathbf{X}_0}E(\mathbf{X})=0,
\]
this, \eqref{eq:5.4.2}, and Lemma~ imply that
\[
\lim_{\mathbf{U}\to\mathbf{U}_0}\frac{h(\mathbf{U})-h(\mathbf{U}_0)-\dst\sum_{i=1}^n f_{x_i}(\mathbf{X}_0)(d_{\mathbf{U}_0}g_i)(\mathbf{U}-\mathbf{U}_0)}{|\mathbf{U}-\mathbf{U}_0|}=0.
\]
Therefore, $h$ is differentiable at $\mathbf{U}_0$, and $d_{\mathbf{U}_0}h$ is given by the stated formula.

Substituting d_{\mathbf{U}_0}g_i=\frac{\partial g_i(\mathbf{U}_0)}{\partial u_1} \,du_1+\frac{\partial g_i(\mathbf{U}_0)}{\partial u_2} \,du_2+\cdots+ \frac{\partial g_i(\mathbf{U}_0)}{ \partial u_m} \,du_m,\quad 1\le i\le n, \nonumber into and collecting multipliers of du_1, du_2, , du_m yields d_{\mathbf{U}_0}h=\sum_{i=1}^m\left(\sum_{j=1}^n \frac{\partial f(\mathbf{X}_0)}{\partial x_j} \frac{\partial g_j(\mathbf{U}_0)}{\partial u_i}\right)\,du_i. \nonumber However, from Theorem~, d_{\mathbf{U}_0} h=\sum_{i=1}^m\frac{\partial h(\mathbf{U}_0)}{\partial u_i} \,du_i. \nonumber Comparing the last two equations yields .

When it is not important to emphasize the particular point \mathbf{X}_0, we write less formally as \begin{equation} \label{eq:5.4.9} \frac{\partial h}{\partial u_i}=\sum_{j=1}^n\frac{\partial f}{\partial x_j} \frac{\partial g_j}{\partial u_i},\quad 1\le i\le m, \end{equation} \nonumber with the understanding that in calculating \partial h(\mathbf{U}_0)/\partial u_i, \partial g_j/\partial u_i is evaluated at \mathbf{U}_0 and \partial f/\partial x_j at \mathbf{X}_0=\mathbf{G}(\mathbf{U}_0).

The formulas and can also be simplified by replacing the symbol \mathbf{G} with \mathbf{X}=\mathbf{X}(\mathbf{U}); then we write h(\mathbf{U})=f(\mathbf{X}(\mathbf{U})) \nonumber and \frac{\partial h(\mathbf{U}_0)}{\partial u_i}=\sum_{j=1}^n \frac{\partial f(\mathbf{X}_0)}{ \partial x_j} \frac{\partial x_j(\mathbf{U}_0)}{\partial u_i}, \nonumber or simply \begin{equation} \label{eq:5.4.10} \frac{\partial h}{\partial u_i}=\sum_{j=1}^n\frac{\partial f}{\partial x_j} \frac{\partial x_j}{\partial u_i}. \end{equation} \nonumber
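For example, if $h(r,\theta)=f(r\cos\theta,\,r\sin\theta)$, so that $x_1=r\cos\theta$ and $x_2=r\sin\theta$, then \eqref{eq:5.4.10} yields
\[
\frac{\partial h}{\partial r}=f_{x_1}\cos\theta+f_{x_2}\sin\theta\mbox{\quad and\quad}
\frac{\partial h}{\partial\theta}=-f_{x_1}r\sin\theta+f_{x_2}r\cos\theta.
\]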

The proof of Corollary~ suggests a straightforward way to calculate the partial derivatives of a composite function without using the formula explicitly. If $h(\mathbf{U})=f(\mathbf{X}(\mathbf{U}))$, then Theorem~, in the more casual notation introduced before Example~, implies that
\begin{equation}\label{eq:5.4.12}
dh=f_{x_1}\,dx_1+f_{x_2}\,dx_2+\cdots+f_{x_n}\,dx_n,
\end{equation}
where $dx_1$, $dx_2$, \dots, $dx_n$ must be written in terms of the differentials $du_1$, $du_2$, \dots, $du_m$ of the independent variables; thus,

\[
dx_i=\frac{\partial x_i}{\partial u_1}\,du_1+\frac{\partial x_i}{\partial u_2}\,du_2+\cdots+\frac{\partial x_i}{\partial u_m}\,du_m.
\]
Substituting this into \eqref{eq:5.4.12} and collecting the multipliers of $du_1$, $du_2$, \dots, $du_m$ yields~\eqref{eq:5.4.10}.

Higher derivatives of composite functions can be computed by repeatedly applying the chain rule. For example, differentiating \eqref{eq:5.4.10} with respect to $u_k$ yields
\begin{equation}\label{eq:5.4.16}
\begin{array}{rcl}
\dst\frac{\partial^2h}{\partial u_k\,\partial u_i}\ar=\dst{\sum_{j=1}^n\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial u_i}\right)}\\[2\jot]
\ar=\dst{\sum_{j=1}^n\frac{\partial f}{\partial x_j}\frac{\partial^2 x_j}{\partial u_k\,\partial u_i}+\sum_{j=1}^n\frac{\partial x_j}{\partial u_i}\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\right)}.
\end{array}
\end{equation}
We must be careful finding
\[
\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\right),
\]
which really stands here for
\begin{equation}\label{eq:5.4.17}
\frac{\partial}{\partial u_k}\left(\frac{\partial f(\mathbf{X}(\mathbf{U}))}{\partial x_j}\right).
\end{equation}
The safest procedure is to write temporarily
\[
g(\mathbf{X})=\frac{\partial f(\mathbf{X})}{\partial x_j};
\]
then \eqref{eq:5.4.17} becomes
\[
\frac{\partial g(\mathbf{X}(\mathbf{U}))}{\partial u_k}=\sum_{s=1}^n\frac{\partial g(\mathbf{X}(\mathbf{U}))}{\partial x_s}\frac{\partial x_s(\mathbf{U})}{\partial u_k}.
\]
Since
\[
\frac{\partial g}{\partial x_s}=\frac{\partial^2f}{\partial x_s\,\partial x_j},
\]
this yields
\[
\frac{\partial}{\partial u_k}\left(\frac{\partial f}{\partial x_j}\right)=\sum_{s=1}^n\frac{\partial^2f}{\partial x_s\,\partial x_j}\frac{\partial x_s}{\partial u_k}.
\]
Substituting this into \eqref{eq:5.4.16} yields
\begin{equation}\label{eq:5.4.18}
\frac{\partial^2h}{\partial u_k\,\partial u_i}=\sum_{j=1}^n\frac{\partial f}{\partial x_j}\frac{\partial^2x_j}{\partial u_k\,\partial u_i}+\sum_{j=1}^n\frac{\partial x_j}{\partial u_i}\sum_{s=1}^n\frac{\partial^2f}{\partial x_s\,\partial x_j}\frac{\partial x_s}{\partial u_k}.
\end{equation}

To compute $h_{u_iu_k}(\mathbf{U}_0)$ from this formula, we evaluate the partial derivatives of $x_1$, $x_2$, \dots, $x_n$ at $\mathbf{U}_0$ and those of $f$ at $\mathbf{X}_0=\mathbf{X}(\mathbf{U}_0)$. The formula is valid if $x_1$, $x_2$, \dots, $x_n$ and their first partial derivatives are differentiable at $\mathbf{U}_0$ and $f$, $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ and their first partial derivatives are differentiable at $\mathbf{X}_0$.

Instead of memorizing , you should understand how it is derived and use the method, rather than the formula, when calculating second partial derivatives of composite functions. The same method applies to the calculation of higher derivatives.

For a composite function of the form
\[
h(t)=f(x_1(t),x_2(t),\dots,x_n(t)),
\]
where $t$ is a real variable, $x_1$, $x_2$, \dots, $x_n$ are differentiable at $t_0$, and $f$ is differentiable at $\mathbf{X}_0=\mathbf{X}(t_0)$, the chain rule \eqref{eq:5.4.9} takes the form
\begin{equation}\label{eq:5.4.20}
h'(t_0)=\sum_{j=1}^n f_{x_j}(\mathbf{X}(t_0))x'_j(t_0).
\end{equation}
This will be useful in the proof of the following theorem.

An equation of $L$ is
\[
\mathbf{X}=\mathbf{X}(t)=t\mathbf{X}_2+(1-t)\mathbf{X}_1,\quad 0\le t\le1.
\]
Our hypotheses imply that the function
\[
h(t)=f(\mathbf{X}(t))
\]
is continuous on $[0,1]$ and differentiable on $(0,1)$. Since
\[
x_i(t)=tx_{i2}+(1-t)x_{i1},
\]
\eqref{eq:5.4.20} implies that
\[
h'(t)=\sum_{i=1}^n f_{x_i}(\mathbf{X}(t))(x_{i2}-x_{i1}),\quad 0<t<1.
\]
From the mean value theorem for functions of one variable (Theorem~),
\[
h(1)-h(0)=h'(t_0)
\]
for some $t_0\in(0,1)$. Since $h(1)=f(\mathbf{X}_2)$ and $h(0)=f(\mathbf{X}_1)$, this implies the conclusion with $\mathbf{X}_0=\mathbf{X}(t_0)$.

We will show that if $\mathbf{X}_0$ and $\mathbf{X}$ are in $S$, then $f(\mathbf{X})=f(\mathbf{X}_0)$. Since $S$ is an open region, $S$ is polygonally connected (Theorem~). Therefore, there are points
\[
\mathbf{X}_0,\ \mathbf{X}_1,\ \dots,\ \mathbf{X}_n=\mathbf{X}
\]
such that the line segment $L_i$ from $\mathbf{X}_{i-1}$ to $\mathbf{X}_i$ is in $S$, $1\le i\le n$. From Theorem~,
\[
f(\mathbf{X}_i)-f(\mathbf{X}_{i-1})=(d_{\widetilde{\mathbf{X}}_i}f)(\mathbf{X}_i-\mathbf{X}_{i-1}),
\]
where $\widetilde{\mathbf{X}}_i$ is on $L_i$ and therefore in $S$. Therefore,
\[
f_{x_1}(\widetilde{\mathbf{X}}_i)=f_{x_2}(\widetilde{\mathbf{X}}_i)=\cdots=f_{x_n}(\widetilde{\mathbf{X}}_i)=0,
\]

which means that d_{\widetilde{\mathbf{X}}_i} f\equiv0. Hence, f(\mathbf{X}_0)=f(\mathbf{X}_1)=\cdots=f(\mathbf{X}_n); \nonumber that is, f(\mathbf{X})=f(\mathbf{X}_0) for every \mathbf{X} in S.

Suppose that $f$ is defined in an $n$-ball $B_\rho(\mathbf{X}_0)$, with $\rho>0$. If $\mathbf{X}\in B_\rho(\mathbf{X}_0)$, then
\[
\mathbf{X}(t)=\mathbf{X}_0+t(\mathbf{X}-\mathbf{X}_0)\in B_\rho(\mathbf{X}_0),\quad 0\le t\le1,
\]
so the function
\[
h(t)=f(\mathbf{X}(t))
\]
is defined for $0\le t\le1$. From Theorem~ (see also \eqref{eq:5.4.20}),
\[
h'(t)=\sum_{i=1}^n f_{x_i}(\mathbf{X}(t))(x_i-x_{i0})
\]
if $f$ is differentiable in $B_\rho(\mathbf{X}_0)$, and
\begin{eqnarray*}
h''(t)\ar=\sum_{j=1}^n\frac{\partial}{\partial x_j}\left(\sum_{i=1}^n\frac{\partial f(\mathbf{X}(t))}{\partial x_i}(x_i-x_{i0})\right)(x_j-x_{j0})\\
\ar=\sum_{i,j=1}^n\frac{\partial^2f(\mathbf{X}(t))}{\partial x_j\,\partial x_i}(x_i-x_{i0})(x_j-x_{j0})
\end{eqnarray*}
if $f_{x_1}$, $f_{x_2}$, \dots, $f_{x_n}$ are differentiable in $B_\rho(\mathbf{X}_0)$. Continuing in this way, we see that
\begin{equation}\label{eq:5.4.22}
h^{(r)}(t)=\sum_{i_1,i_2,\dots,i_r=1}^n\frac{\partial^r f(\mathbf{X}(t))}{\partial x_{i_r}\,\partial x_{i_{r-1}}\cdots\partial x_{i_1}}(x_{i_1}-x_{i_1,0})(x_{i_2}-x_{i_2,0})\cdots(x_{i_r}-x_{i_r,0})
\end{equation}
if all partial derivatives of $f$ of order $\le r-1$ are differentiable in $B_\rho(\mathbf{X}_0)$.

This motivates the following definition.

Under the assumptions of Definition~, the value of
\[
\frac{\partial^rf(\mathbf{X}_0)}{\partial x_{i_r}\,\partial x_{i_{r-1}}\cdots\partial x_{i_1}}
\]
depends only on the number of times $f$ is differentiated with respect to each variable, and not on the order in which the differentiations are performed (Exercise~). Hence, Exercise~ implies that the $r$th differential can be rewritten as
\begin{equation}\label{eq:5.4.24}
d^{(r)}_{\mathbf{X}_0}f=\sum_r\frac{r!}{r_1!\,r_2!\cdots r_n!}\frac{\partial^rf(\mathbf{X}_0)}{\partial x^{r_1}_1\,\partial x^{r_2}_2\cdots\partial x^{r_n}_n}(dx_1)^{r_1}(dx_2)^{r_2}\cdots(dx_n)^{r_n},
\end{equation}
where $\sum_r$ indicates summation over all ordered $n$-tuples $(r_1,r_2,\dots,r_n)$ of nonnegative integers such that
\[
r_1+r_2+\cdots+r_n=r
\]
and $\partial x_i^{r_i}$ is omitted from the ``denominators'' of all terms in \eqref{eq:5.4.24} for which $r_i=0$. In particular, if $n=2$,
\[
d^{(r)}_{\mathbf{X}_0}f=\sum_{j=0}^r\binom{r}{j}\frac{\partial^rf(x_0,y_0)}{\partial x^j\,\partial y^{r-j}}(dx)^j(dy)^{r-j}.
\]
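For example, with $r=2$ and $n=2$, \eqref{eq:5.4.24} reduces to
\[
d^{(2)}_{\mathbf{X}_0}f=f_{xx}(x_0,y_0)(dx)^2+2f_{xy}(x_0,y_0)\,dx\,dy+f_{yy}(x_0,y_0)(dy)^2.
\]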

The next theorem is analogous to Taylor’s theorem for functions of one variable (Theorem~).

Define \begin{equation} \label{eq:5.4.26} h(t)=f(\mathbf{X}_0+t(\mathbf{X}-\mathbf{X}_0)). \end{equation} \nonumber With \boldsymbol{\Phi}=\mathbf{X}-\mathbf{X}_0, our assumptions and the discussion preceding Definition~ imply that h, h', , h^{(k+1)} exist on [0,1]. From Taylor’s theorem for functions of one variable, \begin{equation} \label{eq:5.4.27} h(1)=\sum_{r=0}^k\frac{h^{(r)}(0)}{ r!}+\frac{h^{(k+1)}(\tau)}{(k+1)!}, \end{equation} \nonumber for some \tau\in(0,1). From , \begin{equation} \label{eq:5.4.28} h(0)=f(\mathbf{X}_0)\mbox{\quad and\quad} h(1)=f(\mathbf{X}). \end{equation} \nonumber From and with \boldsymbol{\Phi}=\mathbf{X}-\mathbf{X}_0, \begin{eqnarray} h^{(r)}(0)\ar=(d^{(r)}_{\mathbf{X}_0}f) (\mathbf{X}-\mathbf{X}_0),\quad 1\le r\le k, \label{eq:5.4.29}\\ \arraytext{and}\nonumber\\ h^{(k+1)}(\tau)\ar=\left(d^{k+1}_{\widetilde{\mathbf{X}}} f\right) (\mathbf{X}-\mathbf{X}_0) \label{eq:5.4.30} \end{eqnarray} \nonumber

where
\[
\widetilde{\mathbf{X}}=\mathbf{X}_0+\tau(\mathbf{X}-\mathbf{X}_0)
\]
is on $L$ and distinct from $\mathbf{X}_0$ and $\mathbf{X}$. Substituting \eqref{eq:5.4.28}, \eqref{eq:5.4.29}, and \eqref{eq:5.4.30} into \eqref{eq:5.4.27} yields the conclusion.

By analogy with the situation for functions of one variable, we define the $k$th \emph{Taylor polynomial of $f$ about} $\mathbf{X}_0$ by
\[
T_k(\mathbf{X})=\sum_{r=0}^k\frac{1}{r!}(d^{(r)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)
\]
if the differentials exist; then the conclusion of Theorem~ can be rewritten as

f(\mathbf{X})=T_k(\mathbf{X})+\frac{1}{(k+1)!} (d^{(k+1)}_{\widetilde{\mathbf{X}}} f)(\mathbf{X}-\mathbf{X}_0). \nonumber
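For example, if $f(x,y)=e^{x+y}$ and $\mathbf{X}_0=(0,0)$, then every partial derivative of $f$ at $(0,0)$ equals $1$, so
\[
T_2(x,y)=1+(x+y)+\frac{1}{2}(x+y)^2.
\]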

The next theorem leads to a useful sufficient condition for local maxima and minima. It is related to Theorem~. Strictly speaking, however, it is not a generalization of Theorem (Exercise).

If $\epsilon>0$, there is a $\delta>0$ such that $B_\delta(\mathbf{X}_0)\subset N$ and all $k$th-order partial derivatives of $f$ satisfy the inequality
\begin{equation}\label{eq:5.4.32}
\left|\frac{\partial^kf(\widetilde{\mathbf{X}})}{\partial x_{i_k}\,\partial x_{i_{k-1}}\cdots\partial x_{i_1}}-\frac{\partial^kf(\mathbf{X}_0)}{\partial x_{i_k}\,\partial x_{i_{k-1}}\cdots\partial x_{i_1}}\right|<\epsilon,\quad \widetilde{\mathbf{X}}\in B_\delta(\mathbf{X}_0).
\end{equation}
Now suppose that $\mathbf{X}\in B_\delta(\mathbf{X}_0)$. From Theorem~ with $k$ replaced by $k-1$,
\begin{equation}\label{eq:5.4.33}
f(\mathbf{X})=T_{k-1}(\mathbf{X})+\frac{1}{k!}(d^{(k)}_{\widetilde{\mathbf{X}}}f)(\mathbf{X}-\mathbf{X}_0),
\end{equation}
where $\widetilde{\mathbf{X}}$ is some point on the line segment from $\mathbf{X}_0$ to $\mathbf{X}$ and is therefore in $B_\delta(\mathbf{X}_0)$. We can rewrite \eqref{eq:5.4.33} as
\begin{equation}\label{eq:5.4.34}
f(\mathbf{X})=T_k(\mathbf{X})+\frac{1}{k!}\left[(d^{(k)}_{\widetilde{\mathbf{X}}}f)(\mathbf{X}-\mathbf{X}_0)-(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)\right].
\end{equation}
But \eqref{eq:5.4.32} implies that
\begin{equation}\label{eq:5.4.35}
\left|(d^{(k)}_{\widetilde{\mathbf{X}}}f)(\mathbf{X}-\mathbf{X}_0)-(d^{(k)}_{{\mathbf{X}}_0}f)(\mathbf{X}-\mathbf{X}_0)\right|<n^k\epsilon|\mathbf{X}-\mathbf{X}_0|^k
\end{equation}
(Exercise~), which implies that
\[
\frac{|f(\mathbf{X})-T_k(\mathbf{X})|}{|\mathbf{X}-\mathbf{X}_0|^k}<\frac{n^k\epsilon}{k!},\quad\mathbf{X}\in B_\delta(\mathbf{X}_0),
\]
from \eqref{eq:5.4.34}. This implies the conclusion.

Let $r$ be a positive integer and $\mathbf{X}_0=(x_{10},x_{20},\dots,x_{n0})$. A function of the form
\begin{equation}\label{eq:5.4.36}
p(\mathbf{X})=\sum_r a_{r_1r_2\dots r_n}(x_1-x_{10})^{r_1}(x_2-x_{20})^{r_2}\cdots(x_n-x_{n0})^{r_n},
\end{equation}
where the coefficients $\{a_{r_1r_2\dots r_n}\}$ are constants and the summation is over all $n$-tuples of nonnegative integers $(r_1,r_2,\dots,r_n)$ such that
\[
r_1+r_2+\cdots+r_n=r,
\]
is a \emph{homogeneous polynomial of degree $r$ in $\mathbf{X}-\mathbf{X}_0$}, provided that at least one of the coefficients is nonzero. For example, if $f$ satisfies the conditions of Definition~, then the function
\[
p(\mathbf{X})=(d^{(r)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)
\]

is such a polynomial if at least one of the rth-order mixed partial derivatives of f at \mathbf{X}_0 is nonzero.

Clearly, $p(\mathbf{X}_0)=0$ if $p$ is a homogeneous polynomial of degree $r\ge1$ in $\mathbf{X}-\mathbf{X}_0$. If $p(\mathbf{X})\ge0$ for all $\mathbf{X}$, we say that $p$ is \emph{positive semidefinite}; if $p(\mathbf{X})>0$ except when $\mathbf{X}=\mathbf{X}_0$, $p$ is \emph{positive definite}.

Similarly, $p$ is \emph{negative semidefinite} if $p(\mathbf{X})\le0$ or \emph{negative definite} if $p(\mathbf{X})<0$ for all $\mathbf{X}\ne\mathbf{X}_0$. In all these cases, $p$ is \emph{semidefinite}.
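For example, with $n=2$ and $\mathbf{X}_0=(0,0)$, $p(u,v)=u^2+v^2$ is positive definite, $p(u,v)=u^2$ is positive semidefinite but not positive definite (it vanishes whenever $u=0$), and $p(u,v)=u^2-v^2$ is not semidefinite.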

With $p$ as in \eqref{eq:5.4.36},
\[
p(-\mathbf{X}+2\mathbf{X}_0)=(-1)^r p(\mathbf{X}),
\]
so $p$ cannot be semidefinite if $r$ is odd.

From Theorem~, if $f$ is differentiable and attains a local extreme value at $\mathbf{X}_0$, then
\begin{equation}\label{eq:5.4.37}
d_{\mathbf{X}_0}f=0,
\end{equation}
since $f_{x_1}(\mathbf{X}_0)=f_{x_2}(\mathbf{X}_0)=\cdots=f_{x_n}(\mathbf{X}_0)=0$. However, the converse is false. The next theorem provides a method for deciding whether a point satisfying \eqref{eq:5.4.37} is an extreme point. It is related to Theorem~.

From the hypotheses and Theorem~,
\begin{equation}\label{eq:5.4.39}
\lim_{\mathbf{X}\to\mathbf{X}_0}\frac{f(\mathbf{X})-f(\mathbf{X}_0)-\dst\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|^k}=0.
\end{equation}
If $\mathbf{X}=\mathbf{X}_0+t\mathbf{U}$, where $\mathbf{U}$ is a constant vector, then
\[
(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)=t^k(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U}),
\]
so \eqref{eq:5.4.39} implies that
\[
\lim_{t\to0}\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)-\dst\frac{t^k}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U})}{t^k}=0,
\]
or, equivalently,
\begin{equation}\label{eq:5.4.40}
\lim_{t\to0}\frac{f(\mathbf{X}_0+t\mathbf{U})-f(\mathbf{X}_0)}{t^k}=\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U})
\end{equation}
for any constant vector $\mathbf{U}$.

To prove the first statement, suppose that $d^{(k)}_{\mathbf{X}_0}f$ is not semidefinite. Then there are vectors $\mathbf{U}_1$ and $\mathbf{U}_2$ such that
\[
(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U}_1)>0\mbox{\quad and\quad}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{U}_2)<0.
\]
This and \eqref{eq:5.4.40} imply that
\[
f(\mathbf{X}_0+t\mathbf{U}_1)>f(\mathbf{X}_0)\mbox{\quad and\quad}f(\mathbf{X}_0+t\mathbf{U}_2)<f(\mathbf{X}_0)
\]
for $t$ sufficiently small. Hence, $\mathbf{X}_0$ is not a local extreme point of $f$.

To prove the second statement, first assume that $d^{(k)}_{\mathbf{X}_0}f$ is positive definite. Then it can be shown that there is a $\rho>0$ such that
\begin{equation}\label{eq:5.4.41}
\frac{(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{k!}\ge\rho|\mathbf{X}-\mathbf{X}_0|^k
\end{equation}
for all $\mathbf{X}$ (Exercise~). From \eqref{eq:5.4.39}, there is a $\delta>0$ such that
\[
\frac{f(\mathbf{X})-f(\mathbf{X}_0)-\dst\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)}{|\mathbf{X}-\mathbf{X}_0|^k}>-\frac{\rho}{2}\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta.
\]
Therefore,
\[
f(\mathbf{X})-f(\mathbf{X}_0)>\frac{1}{k!}(d^{(k)}_{\mathbf{X}_0}f)(\mathbf{X}-\mathbf{X}_0)-\frac{\rho}{2}|\mathbf{X}-\mathbf{X}_0|^k\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta.
\]
This and \eqref{eq:5.4.41} imply that
\[
f(\mathbf{X})-f(\mathbf{X}_0)>\frac{\rho}{2}|\mathbf{X}-\mathbf{X}_0|^k\mbox{\quad if\quad}|\mathbf{X}-\mathbf{X}_0|<\delta,
\]
which implies that $\mathbf{X}_0$ is a local minimum point of $f$. This proves half of the second statement; we leave the other half to you (Exercise~).

The third statement merely requires examples; see Exercise~.

Write $(x-x_0,y-y_0)=(u,v)$ and
\[
p(u,v)=(d^{(2)}_{\mathbf{X}_0}f)(u,v)=Au^2+2Buv+Cv^2,
\]
where $A=f_{xx}(x_0,y_0)$, $B=f_{xy}(x_0,y_0)$, and $C=f_{yy}(x_0,y_0)$, so
\[
D=AC-B^2.
\]
If $D>0$, then $A\ne0$, and we can write
\begin{eqnarray*}
p(u,v)\ar=A\left(u^2+\frac{2B}{A}uv+\frac{B^2}{A^2}v^2\right)+\left(C-\frac{B^2}{A}\right)v^2\\
\ar=A\left(u+\frac{B}{A}v\right)^2+\frac{D}{A}v^2.
\end{eqnarray*}
This cannot vanish unless $u=v=0$. Hence, $d^{(2)}_{\mathbf{X}_0}f$ is positive definite if $A>0$ or negative definite if $A<0$, and Theorem~ implies the first conclusion.

If $D<0$, there are three possibilities:

1. $A\ne0$; then $p(1,0)=A$ and $p(B,-A)=A(AC-B^2)=AD$.

2. $C\ne0$; then $p(0,1)=C$ and $p(-C,B)=C(AC-B^2)=CD$.

3. $A=C=0$; then $B\ne0$ and $p(1,1)=2B$ and $p(1,-1)=-2B$.

In each case the two given values of $p$ differ in sign, so $\mathbf{X}_0$ is not a local extreme point of $f$, from Theorem~.
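For example, if $f(x,y)=x^2+xy+y^2$, then $(0,0)$ is a critical point of $f$, with $A=2$, $B=1$, $C=2$, and $D=AC-B^2=3>0$; since $A>0$, $(0,0)$ is a local minimum point of $f$.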



This page titled 5.1: Structure of Rn is shared under a CC BY-NC-SA 3.0 license and was authored, remixed, and/or curated by William F. Trench via source content that was edited to the style and standards of the LibreTexts platform.
