3.4: Second-Order Approximations
In one-variable calculus, Taylor polynomials provide a natural way to extend best affine approximations to higher-order polynomial approximations. It is possible to generalize these ideas to scalar-valued functions of two or more variables, but the theory rapidly becomes involved and technical. In this section we will be content merely to point the way with a discussion of second-degree Taylor polynomials. Even at this level, it is best to leave full explanations for a course in advanced calculus.
Higher-order derivatives
The first step is to introduce higher-order derivatives. If f: \mathbb{R}^{n} \rightarrow \mathbb{R} has partial derivatives which exist on an open set U, then, for any i=1,2,3, \ldots, n, \frac{\partial f}{\partial x_{i}} is itself a function from \mathbb{R}^{n} to \mathbb{R}. The partial derivatives of \frac{\partial f}{\partial x_{i}}, if they exist, are called second-order partial derivatives of f. We may denote the partial derivative of \frac{\partial f}{\partial x_{i}} with respect to x_{j}, j=1,2,3, \ldots, n, evaluated at a point \mathbf{x}, by either
\frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f(\mathbf{x}), \nonumber
or f_{x_{i} x_{j}}(\mathbf{x}), or D_{x_{i} x_{j}} f(\mathbf{x}). Note the order in which the variables are written; it is possible that differentiating first with respect to x_i and second with respect to x_j will yield a different result than if the order were reversed.
If j=i, we will write \frac{\partial^{2}}{\partial x_{i}^{2}} f(\mathbf{x}) for \frac{\partial^{2}}{\partial x_{i} \partial x_{i}} f(\mathbf{x}). It is, of course, possible to extend this notation to third, fourth, and higher-order derivatives.
Example 3.4.1
Suppose f(x, y)=x^{2} y-3 x \sin (2 y). Then
f_{x}(x, y)=2 x y-3 \sin (2 y) \nonumber
and
f_{y}(x, y)=x^{2}-6 x \cos (2 y), \nonumber
so
f_{x x}(x, y)=2 y, \quad f_{x y}(x, y)=2 x-6 \cos (2 y), \quad f_{y y}(x, y)=12 x \sin (2 y), \nonumber
and
f_{y x}(x, y)=2 x-6 \cos (2 y). \nonumber
Note that, in this example, f_{x y}(x, y)=f_{y x}(x, y). For an example of a third-order derivative,
f_{y x y}(x, y)=12 \sin (2 y). \nonumber
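These computations can be checked mechanically with a computer algebra system. The following minimal sketch uses the sympy library (an assumption of this illustration, not part of the text; any symbolic differentiator would do):

```python
# Symbolic check of Example 3.4.1 (illustrative only).
from sympy import symbols, sin, diff

x, y = symbols('x y')
f = x**2 * y - 3 * x * sin(2 * y)

print(diff(f, x))        # f_x  = 2*x*y - 3*sin(2*y)
print(diff(f, x, y))     # f_xy = 2*x - 6*cos(2*y)
print(diff(f, y, x))     # f_yx = 2*x - 6*cos(2*y), matching f_xy
print(diff(f, y, x, y))  # f_yxy = 12*sin(2*y)
```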
Example 3.4.2
Suppose w=x y^{2} z^{3}-4 x y \log (z). Then, for example,
\frac{\partial^{2} w}{\partial y \partial x}=\frac{\partial}{\partial y}\left(\frac{\partial w}{\partial x}\right)=\frac{\partial}{\partial y}\left(y^{2} z^{3}-4 y \log (z)\right)=2 y z^{3}-4 \log (z) \nonumber
and
\frac{\partial^{2} w}{\partial z^{2}}=\frac{\partial}{\partial z}\left(\frac{\partial w}{\partial z}\right)=\frac{\partial}{\partial z}\left(3 x y^{2} z^{2}-\frac{4 x y}{z}\right)=6 x y^{2} z+\frac{4 x y}{z^{2}}. \nonumber
Also,
\frac{\partial^{2} w}{\partial x \partial y}=\frac{\partial}{\partial x}\left(\frac{\partial w}{\partial y}\right)=\frac{\partial}{\partial x}\left(2 x y z^{3}-4 x \log (z)\right)=2 y z^{3}-4 \log (z), \nonumber
and so
\frac{\partial^{2} w}{\partial y \partial x}=\frac{\partial^{2} w}{\partial x \partial y}. \nonumber
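The same kind of check works with three variables. A brief sketch, again assuming sympy is available:

```python
# Symbolic check of Example 3.4.2 (illustrative only).
from sympy import symbols, log, diff

x, y, z = symbols('x y z', positive=True)
w = x * y**2 * z**3 - 4 * x * y * log(z)

print(diff(w, x, y))                   # 2*y*z**3 - 4*log(z)
print(diff(w, z, 2))                   # 6*x*y**2*z + 4*x*y/z**2
print(diff(w, x, y) == diff(w, y, x))  # True: the mixed partials agree
```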
In both of our examples we have seen instances where mixed second partial derivatives, that is, second-order partial derivatives with respect to two different variables, taken in different orders are equal. This is not always the case, but does follow if we assume that both of the mixed partial derivatives in question are continuous.
Definition 3.4.1
We say a function f: \mathbb{R}^{n} \rightarrow \mathbb{R} is C^{2} on an open set U if f_{x_{j} x_{i}} is continuous on U for each i=1,2, \ldots, n and j=1,2, \ldots, n.
Theorem 3.4.1
If f is C^{2} on an open ball containing a point \mathbf{c}, then
\frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f(\mathbf{c})=\frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f(\mathbf{c}) \nonumber
for i=1,2, \ldots, n and j=1,2, \ldots, n.
Although we have the tools to verify this result, we will leave the justification for a more advanced course.
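Theorem 3.4.1 can also be illustrated numerically: for the function of Example 3.4.1, central-difference estimates of f_{xy}(2,0) and f_{yx}(2,0) agree to within discretization error. The sketch below illustrates one function at one point; it is not a verification of the theorem:

```python
# Numerical illustration of the equality of mixed partials (not a proof).
import math

def f(x, y):
    return x**2 * y - 3 * x * math.sin(2 * y)

def f_x(x, y, d=1e-4):
    # Central-difference estimate of f_x(x, y).
    return (f(x + d, y) - f(x - d, y)) / (2 * d)

def f_y(x, y, d=1e-4):
    # Central-difference estimate of f_y(x, y).
    return (f(x, y + d) - f(x, y - d)) / (2 * d)

d = 1e-4
f_xy = (f_x(2.0, 0.0 + d) - f_x(2.0, 0.0 - d)) / (2 * d)  # d/dy of f_x at (2, 0)
f_yx = (f_y(2.0 + d, 0.0) - f_y(2.0 - d, 0.0)) / (2 * d)  # d/dx of f_y at (2, 0)
print(f_xy, f_yx)  # both approximately f_xy(2, 0) = 4 - 6*cos(0) = -2
```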
We shall see that it is convenient to use a matrix to arrange the second partial derivatives of a function f. If f: \mathbb{R}^{n} \rightarrow \mathbb{R}, there are n^{2} second partial derivatives and this matrix will be n \times n.
Definition 3.4.2
Suppose the second-order partial derivatives of f: \mathbb{R}^{n} \rightarrow \mathbb{R} all exist at the point \mathbf{c}. We call the n \times n matrix
H f(\mathbf{c})=\left[\begin{array}{ccccc} \frac{\partial^{2}}{\partial x_{1}^{2}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2} \partial x_{1}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3} \partial x_{1}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n} \partial x_{1}} f(\mathbf{c}) \\ \frac{\partial^{2}}{\partial x_{1} \partial x_{2}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2}^{2}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3} \partial x_{2}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n} \partial x_{2}} f(\mathbf{c}) \\ \frac{\partial^{2}}{\partial x_{1} \partial x_{3}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2} \partial x_{3}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3}^{2}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n} \partial x_{3}} f(\mathbf{c}) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2}}{\partial x_{1} \partial x_{n}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2} \partial x_{n}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3} \partial x_{n}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n}^{2}} f(\mathbf{c}) \end{array}\right] \nonumber
the Hessian of f at \mathbf{c}.
Put another way, the Hessian of f at \mathbf{c} is the n \times n matrix whose ith row is \nabla f_{x_{i}}(\mathbf{c}).
Example 3.4.3
Suppose f(x, y)=x^{2} y-3 x \sin (2 y). Then, using our results from above,
H f(x, y)=\left[\begin{array}{cc} f_{x x}(x, y) & f_{x y}(x, y) \\ f_{y x}(x, y) & f_{y y}(x, y) \end{array}\right]=\left[\begin{array}{cc} 2 y & 2 x-6 \cos (2 y) \\ 2 x-6 \cos (2 y) & 12 x \sin (2 y) \end{array}\right]. \nonumber
Thus, for example,
H f(2,0)=\left[\begin{array}{rr} 0 & -2 \\ -2 & 0 \end{array}\right]. \nonumber
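sympy provides a built-in hessian function that can be used to check this example (a sketch assuming sympy is available):

```python
# Checking Example 3.4.3 with sympy's hessian (illustrative only).
from sympy import symbols, sin, hessian

x, y = symbols('x y')
f = x**2 * y - 3 * x * sin(2 * y)

H = hessian(f, (x, y))
print(H)                     # Matrix([[2*y, 2*x - 6*cos(2*y)],
                             #         [2*x - 6*cos(2*y), 12*x*sin(2*y)]])
print(H.subs({x: 2, y: 0}))  # Matrix([[0, -2], [-2, 0]])
```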
Suppose f: \mathbb{R}^{2} \rightarrow \mathbb{R} is C^{2} on an open ball B^{2}(\mathbf{c}, r) and let \mathbf{h}=\left(h_{1}, h_{2}\right) be a point with \|\mathbf{h}\|<r. If we define \varphi: \mathbb{R} \rightarrow \mathbb{R} by \varphi(t)=f(\mathbf{c}+t \mathbf{h}), then \varphi(0)=f(\mathbf{c}) and \varphi(1)=f(\mathbf{c}+\mathbf{h}). From the one-variable calculus version of Taylor's theorem, we know that
\varphi(1)=\varphi(0)+\varphi^{\prime}(0)+\frac{1}{2} \varphi^{\prime \prime}(s), \nonumber
where s is a real number between 0 and 1. Using the chain rule, we have
\varphi^{\prime}(t)=\nabla f(\mathbf{c}+t \mathbf{h}) \cdot \frac{d}{d t}(\mathbf{c}+t \mathbf{h})=\nabla f(\mathbf{c}+t \mathbf{h}) \cdot \mathbf{h}=f_{x}(\mathbf{c}+t \mathbf{h}) h_{1}+f_{y}(\mathbf{c}+t \mathbf{h}) h_{2} \nonumber
and
\begin{align} \varphi^{\prime \prime}(t) &=h_{1} \nabla f_{x}(\mathbf{c}+t \mathbf{h}) \cdot \mathbf{h}+h_{2} \nabla f_{y}(\mathbf{c}+t \mathbf{h}) \cdot \mathbf{h} \nonumber \\ &=\left(h_{1} \nabla f_{x}(\mathbf{c}+t \mathbf{h})+h_{2} \nabla f_{y}(\mathbf{c}+t \mathbf{h})\right) \cdot \mathbf{h} \nonumber \\ &=\left[\begin{array}{ll} h_{1} & h_{2} \end{array}\right]\left[\begin{array}{ll} f_{x x}(\mathbf{c}+t \mathbf{h}) & f_{x y}(\mathbf{c}+t \mathbf{h}) \\ f_{y x}(\mathbf{c}+t \mathbf{h}) & f_{y y}(\mathbf{c}+t \mathbf{h}) \end{array}\right]\left[\begin{array}{l} h_{1} \\ h_{2} \end{array}\right] \nonumber \\ &=\mathbf{h}^{T} H f(\mathbf{c}+t \mathbf{h}) \mathbf{h}, \nonumber \end{align}
where we have used the notation
\mathbf{h}=\left[\begin{array}{l} h_{1} \\ h_{2} \end{array}\right] \nonumber
and
\mathbf{h}^{T}=\left[\begin{array}{ll} h_{1} & h_{2} \end{array}\right], \nonumber
the latter being called the transpose of \mathbf{h} (see Problem 12 of Section 1.6). Hence
\varphi^{\prime}(0)=\nabla f(\mathbf{c}) \cdot \mathbf{h} \nonumber
and
\varphi^{\prime \prime}(s)=\mathbf{h}^{T} H f(\mathbf{c}+s \mathbf{h}) \mathbf{h}, \nonumber
so, substituting into our expression for \varphi(1) above, we have
f(\mathbf{c}+\mathbf{h})=\varphi(1)=f(\mathbf{c})+\nabla f(\mathbf{c}) \cdot \mathbf{h}+\frac{1}{2} \mathbf{h}^{T} H f(\mathbf{c}+s \mathbf{h}) \mathbf{h}. \nonumber
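The key identity in this derivation, \varphi^{\prime \prime}(t)=\mathbf{h}^{T} H f(\mathbf{c}+t \mathbf{h}) \mathbf{h}, can be confirmed symbolically for a particular choice of f and \mathbf{c}; the sketch below reuses the function of Example 3.4.1 with the arbitrarily chosen base point (2,0), and assumes sympy is available:

```python
# Symbolic check of phi''(t) = h^T Hf(c + t h) h for one choice of f and c.
from sympy import symbols, sin, hessian, Matrix, diff, simplify

x, y, t, h1, h2 = symbols('x y t h1 h2')
f = x**2 * y - 3 * x * sin(2 * y)  # the function from Example 3.4.1
c1, c2 = 2, 0                      # an arbitrary base point c

phi = f.subs({x: c1 + t * h1, y: c2 + t * h2})  # phi(t) = f(c + t h)
lhs = diff(phi, t, 2)

h = Matrix([h1, h2])
H = hessian(f, (x, y)).subs({x: c1 + t * h1, y: c2 + t * h2})
rhs = (h.T * H * h)[0, 0]

print(simplify(lhs - rhs))  # 0
```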
The formula just derived, a version of Taylor's theorem, is easily generalized to higher dimensions.
Theorem 3.4.2
Suppose f: \mathbb{R}^{n} \rightarrow \mathbb{R} is C^{2} on an open ball B^{n}(\mathbf{c}, r) and let \mathbf{h} be a point with \|\mathbf{h}\|<r. Then there exists a real number s between 0 and 1 such that
f(\mathbf{c}+\mathbf{h})=f(\mathbf{c})+\nabla f(\mathbf{c}) \cdot \mathbf{h}+\frac{1}{2} \mathbf{h}^{T} H f(\mathbf{c}+s \mathbf{h}) \mathbf{h}. \nonumber
If we let \mathbf{x}=\mathbf{c}+\mathbf{h} and evaluate the Hessian at \mathbf{c}, the formula of Theorem 3.4.2 becomes a polynomial approximation for f.
Definition 3.4.3
If f: \mathbb{R}^{n} \rightarrow \mathbb{R} is C^{2} on an open ball about the point \mathbf{c}, then we call
P_{2}(\mathbf{x})=f(\mathbf{c})+\nabla f(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+\frac{1}{2}(\mathbf{x}-\mathbf{c})^{T} H f(\mathbf{c})(\mathbf{x}-\mathbf{c}) \nonumber
the second-order Taylor polynomial for f at \mathbf{c}.
Example 3.4.4
To find the second-order Taylor polynomial for f(x, y)=e^{-2 x+y} at (0,0), we compute
\nabla f(x, y)=\left(-2 e^{-2 x+y}, e^{-2 x+y}\right) \nonumber
and
H f(x, y)=\left[\begin{array}{cc} 4 e^{-2 x+y} & -2 e^{-2 x+y} \\ -2 e^{-2 x+y} & e^{-2 x+y} \end{array}\right], \nonumber
from which it follows that
\nabla f(0,0)=(-2,1) \nonumber
and
H f(0,0)=\left[\begin{array}{rr} 4 & -2 \\ -2 & 1 \end{array}\right]. \nonumber
Then
\begin{align} P_{2}(x, y) &=f(0,0)+\nabla f(0,0) \cdot(x, y)+\frac{1}{2}\left[\begin{array}{ll} x & y \end{array}\right] H f(0,0)\left[\begin{array}{l} x \\ y \end{array}\right] \nonumber \\ &=1+(-2,1) \cdot(x, y)+\frac{1}{2}\left[\begin{array}{ll} x & y \end{array}\right]\left[\begin{array}{rr} 4 & -2 \\ -2 & 1 \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right] \nonumber \\ &=1-2 x+y+\frac{1}{2}\left[\begin{array}{ll} x & y \end{array}\right]\left[\begin{array}{r} 4 x-2 y \\ -2 x+y \end{array}\right] \nonumber \\ &=1-2 x+y+\frac{1}{2}\left(4 x^{2}-2 x y-2 x y+y^{2}\right) \nonumber \\ &=1-2 x+y+2 x^{2}-2 x y+\frac{1}{2} y^{2}. \nonumber \end{align}
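Applying Definition 3.4.3 directly in sympy reproduces this polynomial. A sketch, assuming sympy; the variable names are our own:

```python
# Building the second-order Taylor polynomial of Example 3.4.4 from the definition.
from sympy import symbols, exp, hessian, Matrix, expand

x, y = symbols('x y')
f = exp(-2 * x + y)

f0 = f.subs({x: 0, y: 0})                                  # f(0, 0) = 1
grad0 = Matrix([f.diff(x), f.diff(y)]).subs({x: 0, y: 0})  # gradient at (0, 0)
H0 = hessian(f, (x, y)).subs({x: 0, y: 0})                 # Hessian at (0, 0)

v = Matrix([x, y])  # x - c, with c = (0, 0)
P2 = f0 + (grad0.T * v)[0, 0] + (v.T * H0 * v)[0, 0] / 2
print(expand(P2))  # 2*x**2 - 2*x*y - 2*x + y**2/2 + y + 1
```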
Symmetric matrices
Note that if f: \mathbb{R}^{n} \rightarrow \mathbb{R} is C^{2} on an open ball about the point \mathbf{c}, then the entry in the ith row and jth column of H f(\mathbf{c}) is equal to the entry in the jth row and ith column of H f(\mathbf{c}) since
\frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f(\mathbf{c})=\frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f(\mathbf{c}). \nonumber
Definition 3.4.4
We call a matrix M=\left[a_{i j}\right] with the property that a_{i j}=a_{j i} for all i \neq j a symmetric matrix.
Example 3.4.5
The matrices
\left[\begin{array}{ll} 2 & 1 \\ 1 & 5 \end{array}\right] \nonumber
and
\left[\begin{array}{rrr} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & -7 \end{array}\right] \nonumber
are both symmetric, while the matrices
\left[\begin{array}{rr} 2 & -1 \\ 3 & 4 \end{array}\right] \nonumber
and
\left[\begin{array}{rrr} 2 & 1 & 3 \\ 2 & 3 & 4 \\ -2 & 4 & -6 \end{array}\right] \nonumber
are not symmetric.
Example 3.4.6
The Hessian of any C^2 scalar-valued function is a symmetric matrix. For example, the Hessian of f(x, y)=e^{-2 x+y}, namely,
H f(x, y)=\left[\begin{array}{cc} 4 e^{-2 x+y} & -2 e^{-2 x+y} \\ -2 e^{-2 x+y} & e^{-2 x+y} \end{array}\right] , \nonumber
is symmetric for any value of (x,y).
Given an n \times n symmetric matrix M, the function q: \mathbb{R}^{n} \rightarrow \mathbb{R} defined by
q(\mathbf{x})=\mathbf{x}^{T} M \mathbf{x} \nonumber
is a quadratic polynomial. When M is the Hessian of some function f, this is the form of the quadratic term in the second-order Taylor polynomial for f. In the next section it will be important to be able to determine when this term is positive for all \mathbf{x} \neq \mathbf{0} or negative for all \mathbf{x} \neq \mathbf{0}.
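Numerically, evaluating q(\mathbf{x})=\mathbf{x}^{T} M \mathbf{x} amounts to a pair of matrix products. A minimal numpy sketch (numpy is an assumption of this illustration):

```python
# Evaluating the quadratic form q(x) = x^T M x (illustrative only).
import numpy as np

def q(M, v):
    # v^T M v for a symmetric matrix M and vector v.
    return float(v @ M @ v)

M = np.array([[2.0, 1.0], [1.0, 5.0]])  # the first symmetric matrix of Example 3.4.5
print(q(M, np.array([1.0, 0.0])))   # 2.0
print(q(M, np.array([1.0, -1.0])))  # 2 - 2 + 5 = 5.0
```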
Definition 3.4.5
Let M be an n \times n symmetric matrix and define q: \mathbb{R}^{n} \rightarrow \mathbb{R} by
q(\mathbf{x})=\mathbf{x}^{T} M \mathbf{x} . \nonumber
We say M is positive definite if q(\mathbf{x})>0 for all \mathbf{x} \neq \mathbf{0} in \mathbb{R}^n, negative definite if q(\mathbf{x})<0 for all \mathbf{x} \neq \mathbf{0} in \mathbb{R}^n, and indefinite if there exists an \mathbf{x} \neq \mathbf{0} for which q(\mathbf{x})>0 and an \mathbf{x} \neq \mathbf{0} for which q(\mathbf{x})<0. Otherwise, we say M is nondefinite.
In general it is not easy to determine to which of these categories a given symmetric matrix belongs. However, the important special case of 2 \times 2 matrices is straightforward. Consider
M=\left[\begin{array}{ll} a & b \\ b & c \end{array}\right] \nonumber
and let
q(x, y)=\left[\begin{array}{ll} x & y \end{array}\right] M\left[\begin{array}{l} x \\ y \end{array}\right]=a x^{2}+2 b x y+c y^{2} . \label{3.4.10}
If a \neq 0, then we may complete the square in (\ref{3.4.10}) to obtain
\begin{align} q(x, y) &=a\left(x^{2}+\frac{2 b}{a} x y\right)+c y^{2} \nonumber \\ &=a\left(\left(x+\frac{b}{a} y\right)^{2}-\frac{b^{2}}{a^{2}} y^{2}\right)+c y^{2} \nonumber \\ &=a\left(x+\frac{b}{a} y\right)^{2}+\left(c-\frac{b^{2}}{a}\right) y^{2} \nonumber \\ &=a\left(x+\frac{b}{a} y\right)^{2}+\frac{a c-b^{2}}{a} y^{2} \nonumber \\ &=a\left(x+\frac{b}{a} y\right)^{2}+\frac{\operatorname{det}(M)}{a} y^{2} . \label{3.4.11} \end{align}
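The completed-square identity (\ref{3.4.11}) can be verified symbolically; the sketch below assumes sympy and is valid whenever a \neq 0:

```python
# Symbolic check of the completed-square identity (3.4.11).
from sympy import symbols, simplify

a, b, c, x, y = symbols('a b c x y')
q = a * x**2 + 2 * b * x * y + c * y**2
completed = a * (x + (b / a) * y)**2 + ((a * c - b**2) / a) * y**2
print(simplify(q - completed))  # 0, provided a != 0
```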
Now suppose \operatorname{det}(M)>0. Then from (\ref{3.4.11}) we see that q(x, y)>0 for all (x, y) \neq(0,0) if a>0 and q(x, y)<0 for all (x, y) \neq(0,0) if a<0. That is, M is positive definite if a>0 and negative definite if a<0. If \operatorname{det}(M)<0, then q(1,0) and q\left(-\frac{b}{a}, 1\right) will have opposite signs, and so M is indefinite. Finally, suppose \operatorname{det}(M)=0. Then
q(x, y)=a\left(x+\frac{b}{a} y\right)^{2} , \nonumber
so q(x,y) = 0 when x=-\frac{b}{a} y. Moreover, q(x,y) has the same sign as a for all other values of (x,y). Hence in this case M is nondefinite.
Similar analyses for the case a=0 give us the following result.
Theorem 3.4.3
Suppose
M=\left[\begin{array}{ll} a & b \\ b & c \end{array}\right] . \nonumber
If \operatorname{det}(M)>0, then M is positive definite if a>0 and negative definite if a<0. If \operatorname{det}(M)<0, then M is indefinite. If \operatorname{det}(M)=0, then M is nondefinite.
Example 3.4.7
The matrix
M=\left[\begin{array}{ll} 2 & 1 \\ 1 & 3 \end{array}\right] \nonumber
is positive definite since \operatorname{det}(M)=5>0 and 2>0.
Example 3.4.8
The matrix
M=\left[\begin{array}{rr} -2 & 1 \\ 1 & -4 \end{array}\right] \nonumber
is negative definite since \operatorname{det}(M)=7>0 and -2<0.
Example 3.4.9
The matrix
M=\left[\begin{array}{rr} -3 & 1 \\ 1 & 2 \end{array}\right] \nonumber
is indefinite since \operatorname{det}(M)=-7<0.
Example 3.4.10
The matrix
M=\left[\begin{array}{ll} 4 & 2 \\ 2 & 1 \end{array}\right] \nonumber
is nondefinite since \operatorname{det}(M)=0.
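The test of Theorem 3.4.3 is straightforward to implement. The following sketch, with names of our own choosing, classifies the matrices of Examples 3.4.7 through 3.4.10:

```python
# A small classifier implementing Theorem 3.4.3 for 2x2 symmetric matrices.
def classify(a, b, c):
    # Classify M = [[a, b], [b, c]] using det(M) = a*c - b**2.
    det = a * c - b * b
    if det > 0:
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "nondefinite"

print(classify(2, 1, 3))    # positive definite (Example 3.4.7)
print(classify(-2, 1, -4))  # negative definite (Example 3.4.8)
print(classify(-3, 1, 2))   # indefinite (Example 3.4.9)
print(classify(4, 2, 1))    # nondefinite (Example 3.4.10)
```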
In the next section we will see how these ideas help us identify local extreme values for scalar valued functions of two variables.