3.4: Second-Order Approximations

In one-variable calculus, Taylor polynomials provide a natural way to extend best affine approximations to higher-order polynomial approximations. It is possible to generalize these ideas to scalar-valued functions of two or more variables, but the theory rapidly becomes involved and technical. In this section we will be content merely to point the way with a discussion of second-order Taylor polynomials. Even at this level, it is best to leave full explanations for a course in advanced calculus.

    Higher-order derivatives

The first step is to introduce higher-order derivatives. If \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) has partial derivatives which exist on an open set \(U\), then, for any \(i=1,2,3, \ldots, n\), \(\frac{\partial f}{\partial x_{i}}\) is itself a function from \(\mathbb{R}^n\) to \(\mathbb{R}\). The partial derivatives of \(\frac{\partial f}{\partial x_{i}}\), if they exist, are called second-order partial derivatives of \(f\). We may denote the partial derivative of \(\frac{\partial f}{\partial x_{i}}\) with respect to \(x_{j}\), \(j=1,2,3, \ldots, n\), evaluated at a point \(\mathbf{x}\), by either \(\frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f(\mathbf{x})\), \(f_{x_{i} x_{j}}(\mathbf{x})\), or \(D_{x_{i} x_{j}} f(\mathbf{x})\). Note the order in which the variables are written: it is possible that differentiating first with respect to \(x_i\) and then with respect to \(x_j\) will yield a different result than differentiating in the reverse order.

    If \(j=i\), we will write \(\frac{\partial^{2}}{\partial x_{i}^{2}} f(\mathbf{x})\) for \(\frac{\partial^{2}}{\partial x_{i} \partial x_{i}} f(\mathbf{x})\). It is, of course, possible to extend this notation to third, fourth, and higher-order derivatives.

    Example \(\PageIndex{1}\)

    Suppose \(f(x, y)=x^{2} y-3 x \sin (2 y)\). Then

    \[ f_{x}(x, y)=2 x y-3 \sin (2 y) \nonumber \]

    and

    \[ f_{y}(x, y)=x^{2}-6 x \cos (2 y) , \nonumber \]

    so

    \[ \begin{gathered}
    f_{x x}(x, y)=2 y, \\
    f_{x y}(x, y)=2 x-6 \cos (2 y), \\
    f_{y y}(x, y)=12 x \sin (2 y),
    \end{gathered} \]

    and

    \[ f_{y x}(x, y)=2 x-6 \cos (2 y) . \nonumber \]

    Note that, in this example, \(f_{x y}(x, y)=f_{y x}(x, y)\). For an example of a third-order derivative,

    \[ f_{y x y}(x, y)=12 \sin (2 y) . \nonumber \]
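These computations are easy to verify with a computer algebra system. Here is a minimal sketch using SymPy (our own illustration; SymPy is not part of the text), where sp.diff(f, x, y) differentiates first with respect to \(x\) and then with respect to \(y\):

```python
# Symbolic check of the partial derivatives in Example 1.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y - 3 * x * sp.sin(2 * y)

f_xy = sp.diff(f, x, y)      # first x, then y: 2*x - 6*cos(2*y)
f_yx = sp.diff(f, y, x)      # first y, then x: 2*x - 6*cos(2*y)
f_yxy = sp.diff(f, y, x, y)  # third order: 12*sin(2*y)

assert sp.simplify(f_xy - f_yx) == 0               # mixed partials agree
assert sp.simplify(f_yxy - 12 * sp.sin(2 * y)) == 0
```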

    Example \(\PageIndex{2}\)

    Suppose \(w=x y^{2} z^{3}-4 x y \log (z)\). Then, for example,

    \[ \frac{\partial^{2} w}{\partial y \partial x}=\frac{\partial}{\partial y}\left(\frac{\partial w}{\partial x}\right)=\frac{\partial}{\partial y}\left(y^{2} z^{3}-4 y \log (z)\right)=2 y z^{3}-4 \log (z) \nonumber \]

    and

    \[ \frac{\partial^{2} w}{\partial z^{2}}=\frac{\partial}{\partial z}\left(\frac{\partial w}{\partial z}\right)=\frac{\partial}{\partial z}\left(3 x y^{2} z^{2}-\frac{4 x y}{z}\right)=6 x y^{2} z+\frac{4 x y}{z^{2}} . \nonumber \]

    Also,

    \[ \frac{\partial^{2} w}{\partial x \partial y}=\frac{\partial}{\partial x}\left(\frac{\partial w}{\partial y}\right)=\frac{\partial}{\partial x}\left(2 x y z^{3}-4 x \log (z)\right)=2 y z^{3}-4 \log (z) , \nonumber \]

    and so

    \[ \frac{\partial^{2} w}{\partial y \partial x}=\frac{\partial^{2} w}{\partial x \partial y} . \nonumber \]

In both of our examples we have seen instances where mixed second-order partial derivatives, that is, second-order partial derivatives with respect to two different variables, agree regardless of the order of differentiation. This is not always the case, but it does hold whenever both of the mixed partial derivatives in question are continuous.

    Definition \(\PageIndex{1}\)

    We say a function \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is \(C^2\) on an open set \(U\) if \(f_{x_{j} x_{i}}\) is continuous on \(U\) for each \(i=1,2, \ldots, n\) and \(j=1,2, \ldots, n\).

    Theorem \(\PageIndex{1}\)

    If \(f\) is \(C^2\) on an open ball containing a point \(\mathbf{c}\), then

    \[ \frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f(\mathbf{c})=\frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f(\mathbf{c}) \nonumber \]

    for \(i=1,2, \ldots, n\) and \(j=1,2, \ldots, n\).

    Although we have the tools to verify this result, we will leave the justification for a more advanced course.

    We shall see that it is convenient to use a matrix to arrange the second partial derivatives of a function \(f\). If \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\), there are \(n^2\) second partial derivatives and this matrix will be \(n \times n\).

    Definition \(\PageIndex{2}\)

    Suppose the second-order partial derivatives of \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) all exist at the point \(\mathbf{c}\). We call the \(n \times n\) matrix

    \[ H f(\mathbf{c})=\left[\begin{array}{ccccc}
    \frac{\partial^{2}}{\partial x_{1}^{2}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2} \partial x_{1}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3} \partial x_{1}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n} \partial x_{1}} f(\mathbf{c}) \\
    \frac{\partial^{2}}{\partial x_{1} \partial x_{2}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2}^{2}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3} \partial x_{2}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n} \partial x_{2}} f(\mathbf{c}) \\
    \frac{\partial^{2}}{\partial x_{1} \partial x_{3}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2} \partial x_{3}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3}^{2}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n} \partial x_{3}} f(\mathbf{c}) \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial^{2}}{\partial x_{1} \partial x_{n}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{2} \partial x_{n}} f(\mathbf{c}) & \frac{\partial^{2}}{\partial x_{3} \partial x_{n}} f(\mathbf{c}) & \cdots & \frac{\partial^{2}}{\partial x_{n}^{2}} f(\mathbf{c})
    \end{array}\right] \]

    the Hessian of \(f\) at \(\mathbf{c}\).

    Put another way, the Hessian of \(f\) at \(\mathbf{c}\) is the \(n \times n \) matrix whose \(i\)th row is \(\nabla f_{x_{i}}(\mathbf{c})\).

    Example \(\PageIndex{3}\)

    Suppose \(f(x, y)=x^{2} y-3 x \sin (2 y)\). Then, using our results from above,

    \[ H f(x, y)=\left[\begin{array}{cc}
    f_{x x}(x, y) & f_{x y}(x, y) \\
    f_{y x}(x, y) & f_{y y}(x, y)
    \end{array}\right]=\left[\begin{array}{cc}
2 y & 2 x-6 \cos (2 y) \\
    2 x-6 \cos (2 y) & 12 x \sin (2 y)
    \end{array}\right] . \nonumber \]

    Thus, for example,

    \[ H f(2,0)=\left[\begin{array}{rr}
    0 & -2 \\
    -2 & 0
    \end{array}\right] . \nonumber \]
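The Hessian itself can be checked the same way, using SymPy's built-in hessian helper (again a sketch of our own, not part of the text):

```python
# Building and evaluating the Hessian of Example 3.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y - 3 * x * sp.sin(2 * y)

H = sp.hessian(f, (x, y))
print(H)                      # Matrix([[2*y, 2*x - 6*cos(2*y)], [2*x - 6*cos(2*y), 12*x*sin(2*y)]])
print(H.subs({x: 2, y: 0}))   # Matrix([[0, -2], [-2, 0]]), matching Hf(2, 0)
```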

Suppose \(f: \mathbb{R}^{2} \rightarrow \mathbb{R}\) is \(C^2\) on an open ball \(B^{2}(\mathbf{c}, r)\) and let \(\mathbf{h}=\left(h_{1}, h_{2}\right)\) be a point with \(\|\mathbf{h}\|<r\). If we define \(\varphi: \mathbb{R} \rightarrow \mathbb{R}\) by \(\varphi(t)=f(\mathbf{c}+t \mathbf{h})\), then \(\varphi(0)=f(\mathbf{c})\) and \(\varphi(1)=f(\mathbf{c}+\mathbf{h})\). From the one-variable calculus version of Taylor’s theorem, we know that

    \[ \varphi(1)=\varphi(0)+\varphi^{\prime}(0)+\frac{1}{2} \varphi^{\prime \prime}(s) , \label{3.4.2} \]

    where \(s\) is a real number between 0 and 1. Using the chain rule, we have

    \[ \varphi^{\prime}(t)=\nabla f(\mathbf{c}+t \mathbf{h}) \cdot \frac{d}{d t}(\mathbf{c}+t \mathbf{h})=\nabla f(\mathbf{c}+t \mathbf{h}) \cdot \mathbf{h}=f_{x}(\mathbf{c}+t \mathbf{h}) h_{1}+f_{y}(\mathbf{c}+t \mathbf{h}) h_{2} \label{3.4.3} \]

    and

\[ \begin{align}
\varphi^{\prime \prime}(t) &=h_{1} \nabla f_{x}(\mathbf{c}+t \mathbf{h}) \cdot \mathbf{h}+h_{2} \nabla f_{y}(\mathbf{c}+t \mathbf{h}) \cdot \mathbf{h} \nonumber \\
&=\left(h_{1} \nabla f_{x}(\mathbf{c}+t \mathbf{h})+h_{2} \nabla f_{y}(\mathbf{c}+t \mathbf{h})\right) \cdot \mathbf{h} \nonumber \\
&=\left[\begin{array}{ll}
h_{1} & h_{2}
\end{array}\right]\left[\begin{array}{ll}
f_{x x}(\mathbf{c}+t \mathbf{h}) & f_{x y}(\mathbf{c}+t \mathbf{h}) \\
f_{y x}(\mathbf{c}+t \mathbf{h}) & f_{y y}(\mathbf{c}+t \mathbf{h})
\end{array}\right]\left[\begin{array}{l}
h_{1} \\
h_{2}
\end{array}\right] \nonumber \\
&=\mathbf{h}^{T} H f(\mathbf{c}+t \mathbf{h}) \mathbf{h}, \label{3.4.4}
\end{align} \]

    where we have used the notation

    \[ \mathbf{h}=\left[\begin{array}{l}
    h_{1} \\
    h_{2}
    \end{array}\right] \nonumber \]

    and

    \[ \mathbf{h}^{T}=\left[\begin{array}{ll}
    h_{1} & h_{2}
    \end{array}\right] , \nonumber \]

    the latter being called the transpose of \(\mathbf{h}\) (see Problem 12 of Section 1.6). Hence

    \[ \varphi^{\prime}(0)=\nabla f(\mathbf{c}) \cdot \mathbf{h} \label{3.4.5} \]

    and

\[ \varphi^{\prime \prime}(s)=\mathbf{h}^{T} H f(\mathbf{c}+s \mathbf{h}) \mathbf{h} , \label{3.4.6} \]

    so, substituting into (\(\ref{3.4.2}\)), we have

    \[ f(\mathbf{c}+\mathbf{h})=\varphi(1)=f(\mathbf{c})+\nabla f(\mathbf{c}) \cdot \mathbf{h}+\frac{1}{2} \mathbf{h}^{T} H f(\mathbf{c}+s \mathbf{h}) \mathbf{h} . \label{3.4.7} \]

    This result, a version of Taylor’s theorem, is easily generalized to higher dimensions.

    Theorem \(\PageIndex{2}\)

    Suppose \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is \(C^2\) on an open ball \(B^{n}(\mathbf{c}, r)\) and let \(\mathbf{h}\) be a point with \(\|\mathbf{h}\|<r\). Then there exists a real number \(s\) between 0 and 1 such that

    \[ f(\mathbf{c}+\mathbf{h})=f(\mathbf{c})+\nabla f(\mathbf{c}) \cdot \mathbf{h}+\frac{1}{2} \mathbf{h}^{T} H f(\mathbf{c}+s \mathbf{h}) \mathbf{h} . \label{3.4.8} \]

If we let \(\mathbf{x}=\mathbf{c}+\mathbf{h}\), so that \(\mathbf{h}=\mathbf{x}-\mathbf{c}\), and approximate \(H f(\mathbf{c}+s \mathbf{h})\) by \(H f(\mathbf{c})\), then (\(\ref{3.4.8}\)) yields a polynomial approximation for \(f\).

    Definition \(\PageIndex{3}\)

    If \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is \(C^2\) on an open ball about the point \(\mathbf{c}\), then we call

    \[ P_{2}(\mathbf{x})=f(\mathbf{c})+\nabla f(\mathbf{c}) \cdot(\mathbf{x}-\mathbf{c})+\frac{1}{2}(\mathbf{x}-\mathbf{c})^{T} H f(\mathbf{c})(\mathbf{x}-\mathbf{c}) \]

    the second-order Taylor polynomial for \(f\) at \(\mathbf{c}\).

    Example \(\PageIndex{4}\)

    To find the second-order Taylor polynomial for \(f(x, y)=e^{-2 x+y}\) at (0,0), we compute

    \[ \nabla f(x, y)=\left(-2 e^{-2 x+y}, e^{-2 x+y}\right) \nonumber \]

    and

    \[ H f(x, y)=\left[\begin{array}{cc}
    4 e^{-2 x+y} & -2 e^{-2 x+y} \\
    -2 e^{-2 x+y} & e^{-2 x+y}
    \end{array}\right] , \nonumber \]

    from which it follows that

    \[ \nabla f(0,0)=(-2,1) \nonumber \]

    and

    \[ H f(0,0)=\left[\begin{array}{rr}
    4 & -2 \\
    -2 & 1
    \end{array}\right] . \nonumber \]

    Then

    \[ \begin{aligned}
    P_{2}(x, y) &=f(0,0)+\nabla f(0,0) \cdot(x, y)+\frac{1}{2}\left[\begin{array}{ll}
    x & y
    \end{array}\right] H f(0,0)\left[\begin{array}{l}
    x \\
    y
    \end{array}\right] \\
    &=1+(-2,1) \cdot(x, y)+\frac{1}{2}\left[\begin{array}{ll}
    x & y
    \end{array}\right]\left[\begin{array}{rr}
    4 & -2 \\
    -2 & 1
    \end{array}\right]\left[\begin{array}{l}
    x \\
    y
    \end{array}\right] \\
&=1-2 x+y+\frac{1}{2}\left[\begin{array}{ll}
    x & y
    \end{array}\right]\left[\begin{array}{c}
    4 x-2 y \\
    -2 x+y
    \end{array}\right] \\
    &=1-2 x+y+\frac{1}{2}\left(4 x^{2}-2 x y-2 x y+y^{2}\right) \\
    &=1-2 x+y+2 x^{2}-2 x y+\frac{1}{2} y^{2}.
    \end{aligned} \]
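Since \(P_2\) agrees with \(f\) to second order at (0,0), the error \(\left|f(x, y)-P_{2}(x, y)\right|\) should shrink like the cube of the distance from the origin. A quick numeric sketch (our own illustration, assuming NumPy is available):

```python
# Numeric check that P2 approximates f(x, y) = e^(-2x + y) to third order near (0, 0).
import numpy as np

def f(x, y):
    return np.exp(-2 * x + y)

def P2(x, y):
    return 1 - 2 * x + y + 2 * x**2 - 2 * x * y + 0.5 * y**2

for h in (0.1, 0.01, 0.001):
    print(h, abs(f(h, h) - P2(h, h)))  # error shrinks roughly like h**3
```

Indeed, along the diagonal \(f(h, h)=e^{-h}\) and \(P_{2}(h, h)=1-h+\frac{1}{2} h^{2}\), so the error is approximately \(\frac{h^{3}}{6}\).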

    Symmetric matrices

Note that if \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) is \(C^2\) on an open ball about the point \(\mathbf{c}\), then the entry in the \(i\)th row and \(j\)th column of \(Hf(\mathbf{c})\) is equal to the entry in the \(j\)th row and \(i\)th column of \(Hf(\mathbf{c})\), since

    \[ \frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f(\mathbf{c})=\frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f(\mathbf{c}) . \nonumber \]

    Definition \(\PageIndex{4}\)

We call an \(n \times n\) matrix \(M=\left[a_{i j}\right]\) with the property that \(a_{i j}=a_{j i}\) for all \(i\) and \(j\) a symmetric matrix.

    Example \(\PageIndex{5}\)

    The matrices

    \[ \left[\begin{array}{ll}
    2 & 1 \\
    1 & 5
    \end{array}\right] \nonumber \]

    and

    \[ \left[\begin{array}{rrr}
    1 & 2 & 3 \\
    2 & 4 & 5 \\
    3 & 5 & -7
    \end{array}\right] \nonumber \]

    are both symmetric, while the matrices

    \[ \left[\begin{array}{rr}
    2 & -1 \\
    3 & 4
    \end{array}\right] \nonumber \]

    and

    \[ \left[\begin{array}{rrr}
    2 & 1 & 3 \\
    2 & 3 & 4 \\
    -2 & 4 & -6
    \end{array}\right] \nonumber \]

    are not symmetric.

    Example \(\PageIndex{6}\)

The Hessian of any \(C^2\) scalar-valued function is a symmetric matrix. For example, the Hessian of \(f(x, y)=e^{-2 x+y}\), namely,

    \[ H f(x, y)=\left[\begin{array}{cc}
    4 e^{-2 x+y} & -2 e^{-2 x+y} \\
    -2 e^{-2 x+y} & e^{-2 x+y}
    \end{array}\right] , \nonumber \]

    is symmetric for any value of \((x,y)\).

    Given an \(n \times n \) symmetric matrix \(M\), the function \(q: \mathbb{R}^{n} \rightarrow \mathbb{R}\) defined by

    \[ q(\mathbf{x})=\mathbf{x}^{T} M \mathbf{x} \nonumber \]

    is a quadratic polynomial. When \(M\) is the Hessian of some function \(f\), this is the form of the quadratic term in the second-order Taylor polynomial for \(f\). In the next section it will be important to be able to determine when this term is positive for all \(\mathbf{x} \neq \mathbf{0}\) or negative for all \(\mathbf{x} \neq \mathbf{0}\).
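In coordinates, evaluating \(q\) is just a row-vector, matrix, column-vector product. For example, with the first symmetric matrix of Example 5 (a NumPy sketch of our own):

```python
# Evaluating the quadratic form q(x) = x^T M x.
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 5.0]])  # the first symmetric matrix of Example 5
v = np.array([1.0, -3.0])
print(v @ M @ v)            # 2(1)^2 + 2(1)(-3) + 5(-3)^2 = 41.0
```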

    Definition \(\PageIndex{5}\)

    Let \(M\) be an \(n \times n \) symmetric matrix and define \(q: \mathbb{R}^{n} \rightarrow \mathbb{R}\) by

    \[q(\mathbf{x})=\mathbf{x}^{T} M \mathbf{x} . \nonumber \]

We say \(M\) is positive definite if \(q(\mathbf{x})>0\) for all \(\mathbf{x} \neq \mathbf{0}\) in \(\mathbb{R}^n\), negative definite if \(q(\mathbf{x})<0\) for all \(\mathbf{x} \neq \mathbf{0}\) in \(\mathbb{R}^n\), and indefinite if there exist an \(\mathbf{x} \neq \mathbf{0}\) for which \(q(\mathbf{x})>0\) and an \(\mathbf{x} \neq \mathbf{0}\) for which \(q(\mathbf{x})<0\). Otherwise, we say \(M\) is nondefinite.

    In general it is not easy to determine to which of these categories a given symmetric matrix belongs. However, the important special case of \(2 \times 2\) matrices is straightforward. Consider

    \[ M=\left[\begin{array}{ll}
    a & b \\
    b & c
    \end{array}\right] \nonumber \]

    and let

    \[ q(x, y)=\left[\begin{array}{ll}
    x & y
    \end{array}\right] M\left[\begin{array}{l}
    x \\
    y
    \end{array}\right]=a x^{2}+2 b x y+c y^{2} . \label{3.4.10} \]

    If \(a \neq 0\), then we may complete the square in (\(\ref{3.4.10}\)) to obtain

    \[ \begin{align}
    q(x, y) &=a\left(x^{2}+\frac{2 b}{a} x y\right)+c y^{2} \nonumber \\
    &=a\left(\left(x+\frac{b}{a} y\right)^{2}-\frac{b^{2}}{a^{2}} y^{2}\right)+c y^{2} \nonumber \\
    &=a\left(x+\frac{b}{a} y\right)^{2}+\left(c-\frac{b^{2}}{a}\right) y^{2} \nonumber \\
    &=a\left(x+\frac{b}{a} y\right)^{2}+\frac{a c-b^{2}}{a} y^{2} \nonumber \\
    &=a\left(x+\frac{b}{a} y\right)^{2}+\frac{\operatorname{det}(M)}{a} y^{2} . \label{3.4.11}
    \end{align}\]

    Now suppose \(\operatorname{det}(M)>0\). Then from (\(\ref{3.4.11}\)) we see that \(q(x, y)>0\) for all \((x, y) \neq(0,0)\) if \(a>0\) and \(q(x, y)<0\) for all \((x, y) \neq(0,0)\) if \(a<0\). That is, \(M\) is positive definite if \(a>0\) and negative definite if \(a<0\). If \(\operatorname{det}(M)<0\), then \(q(1,0)\) and \(q\left(-\frac{b}{a}, 1\right)\) will have opposite signs, and so \(M\) is indefinite. Finally, suppose \(\operatorname{det}(M)=0\). Then

    \[ q(x, y)=a\left(x+\frac{b}{a} y\right)^{2} , \nonumber \]

    so \(q(x,y) = 0\) when \(x=-\frac{b}{a} y\). Moreover, \(q(x,y)\) has the same sign as \(a\) for all other values of \((x,y)\). Hence in this case \(M\) is nondefinite.
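The completing-the-square identity (\(\ref{3.4.11}\)) is itself easy to confirm symbolically; a minimal SymPy sketch (our own check, valid for \(a \neq 0\)):

```python
# Symbolic verification of the completing-the-square identity (3.4.11).
import sympy as sp

a, b, c, x, y = sp.symbols('a b c x y')
q = a * x**2 + 2 * b * x * y + c * y**2
completed = a * (x + (b / a) * y)**2 + ((a * c - b**2) / a) * y**2

assert sp.expand(q - completed) == 0
```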

    Similar analyses for the case \(a=0\) give us the following result.

    Theorem \(\PageIndex{3}\)

    Suppose

    \[ M=\left[\begin{array}{ll}
    a & b \\
    b & c
    \end{array}\right] . \nonumber \]

    If \(\operatorname{det}(M)>0\), then \(M\) is positive definite if \(a>0\) and negative definite if \(a<0\). If \(\operatorname{det}(M)<0\), then \(M\) is indefinite. If \(\operatorname{det}(M)=0\), then \(M\) is nondefinite.

    Example \(\PageIndex{7}\)

    The matrix

    \[ M=\left[\begin{array}{ll}
    2 & 1 \\
    1 & 3
    \end{array}\right] \nonumber \]

    is positive definite since \(\operatorname{det}(M)=5>0\) and \(2>0\).

    Example \(\PageIndex{8}\)

    The matrix

    \[M=\left[\begin{array}{rr}
    -2 & 1 \\
    1 & -4
    \end{array}\right] \nonumber \]

    is negative definite since \(\operatorname{det}(M)=7>0\) and \(-2<0\).

    Example \(\PageIndex{9}\)

    The matrix

    \[ M=\left[\begin{array}{rr}
    -3 & 1 \\
    1 & 2
    \end{array}\right] \nonumber \]

    is indefinite since \(\operatorname{det}(M)=-7<0\).

    Example \(\PageIndex{10}\)

    The matrix

    \[ M=\left[\begin{array}{ll}
    4 & 2 \\
    2 & 1
    \end{array}\right] \nonumber \]

    is nondefinite since \(\operatorname{det}(M)=0\).
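Theorem 3 is simple enough to automate. The following sketch (the function name classify2x2 is ours, not the text's) reproduces the verdicts of Examples 7 through 10:

```python
# Classifying a symmetric 2x2 matrix [[a, b], [b, c]] according to Theorem 3.
def classify2x2(a, b, c):
    det = a * c - b * b
    if det > 0:
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "nondefinite"

print(classify2x2(2, 1, 3))    # positive definite  (Example 7)
print(classify2x2(-2, 1, -4))  # negative definite  (Example 8)
print(classify2x2(-3, 1, 2))   # indefinite         (Example 9)
print(classify2x2(4, 2, 1))    # nondefinite        (Example 10)
```

For the last matrix, completing the square gives \(q(x, y)=4\left(x+\frac{1}{2} y\right)^{2}=(2 x+y)^{2}\), which is zero along the line \(y=-2 x\) and positive elsewhere, exactly the nondefinite behavior described above.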

In the next section we will see how these ideas help us identify local extreme values of scalar-valued functions of two variables.


    This page titled 3.4: Second-Order Approximations is shared under a CC BY-NC-SA 1.0 license and was authored, remixed, and/or curated by Dan Sloughter via source content that was edited to the style and standards of the LibreTexts platform.