2.6: Unconstrained Optimization- Numerical Methods

Last updated
Save as PDF

Page ID: 2255

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\dsum}{\displaystyle\sum\limits} \)

\( \newcommand{\dint}{\displaystyle\int\limits} \)

\( \newcommand{\dlim}{\displaystyle\lim\limits} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\(\newcommand{\longvect}{\overrightarrow}\)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

The types of problems that we solved in the previous section were examples of unconstrained optimization problems. That is, we tried to find local (and perhaps even global) maximum and minimum points of real-valued functions \(f (x, y)\), where the points \((x, y)\) could be any points in the domain of \(f\). The method we used required us to find the critical points of \(f\), which meant having to solve the equation \(\nabla f = \textbf{0}\), which in general is a system of two equations in two unknowns (\(x \text{ and }y\)). While this was relatively simple for the examples we did, in general this will not be the case. If the equations involve polynomials in \(x \text{ and }y\) of degree three or higher, or complicated expressions involving trigonometric, exponential, or logarithmic functions, then solving even one such equation, let alone two, could be impossible by elementary means.

For example, if one of the equations that had to be solved was

\[\nonumber x^3 +9x−2 = 0 ,\]

you may have a hard time getting the exact solutions. Trial and error would not help much, especially since the only real solution turns out to be

\[\nonumber \sqrt[3]{\sqrt{28}+1-\sqrt[3]{\sqrt{28}-1}}.\]

In a situation such as this, the only choice may be to find a solution using some numerical method which gives a sequence of numbers which converge to the actual solution. For example, Newton’s method for solving equations \(f (x) = 0\), which you probably learned in single-variable calculus. In this section we will describe another method of Newton for finding critical points of real-valued functions of two variables.

Let \(f (x, y)\) be a smooth real-valued function, and define

\[\nonumber D(x, y) = \dfrac{∂^2 f}{ ∂x^2} (x, y) \dfrac{∂^2 f}{ ∂y^2} (x, y)− \left (\dfrac{∂^2 f}{∂y∂x} (x, y)\right )^2 \]

Newton’s algorithm: Pick an initial point \((x_0 , y_0)\). For \(n\) = 0,1,2,3,..., define:

\[ x_{n+1}=x_n - \dfrac{\begin{vmatrix} \dfrac{∂^2 f}{ ∂y^2} (x_n, y_n) & \dfrac{∂^2 f}{∂x∂y} (x_n, y_n) \\[4pt] \dfrac{∂f}{∂y} (x_n, y_n) & \dfrac{∂f}{∂x} (x_n, y_n)\\[4pt] \end{vmatrix}}{D(x_n, y_n)},\quad y_{n+1}=y_n - \dfrac{\begin{vmatrix} \dfrac{∂^2 f}{ ∂x^2} (x_n, y_n) & \dfrac{∂^2 f}{∂x∂y} (x_n, y_n) \\[4pt] \dfrac{∂f}{∂x} (x_n, y_n) & \dfrac{∂f}{∂y} (x_n, y_n)\\[4pt] \end{vmatrix}}{D(x_n, y_n)}\label{Eq2.14}\]

Then the sequence of points \((x_n, y_n)_{n=1}^{\infty}\) converges to a critical point. If there are several critical points, then you will have to try different initial points to find them.

\(f (x, y) = x^ 3 − x y− x+ x y^3 − y^ 4\text{ for }−1 ≤ x ≤ 0 \text{ and }0 ≤ y ≤ 1\)

The derivation of Newton’s algorithm, and the proof that it converges (given a “reasonable” choice for the initial point) requires techniques beyond the scope of this text. See RALSTON and RABINOWITZ for more detail and for discussion of other numerical methods. Our description of Newton’s algorithm is the special two-variable case of a more general algorithm that can be applied to functions of \(n \ge 2\) variables.

In the case of functions which have a global maximum or minimum, Newton’s algorithm can be used to find those points. In general, global maxima and minima tend to be more interesting than local versions, at least in practical applications. A maximization problem can always be turned into a minimization problem (why?), so a large number of methods have been developed to find the global minimum of functions of any number of variables. This field of study is called nonlinear programming. Many of these methods are based on the steepest descent technique, which is based on an idea that we discussed in Section 2.4. Recall that the negative gradient \(- \nabla f\) gives the direction of the fastest rate of decrease of a function \(f\). The crux of the steepest descent idea, then, is that starting from some initial point, you move a certain amount in the direction of \(-\nabla f\) at that point. Wherever that takes you becomes your new point, and you then just keep repeating that procedure until eventually (hopefully) you reach the point where f has its smallest value. There is a “pure” steepest descent method, and a multitude of variations on it that improve the rate of convergence, ease of calculation, etc. In fact, Newton’s algorithm can be interpreted as a modified steepest descent method. For more discussion of this, and of nonlinear programming in general, see BAZARAA, SHERALI and SHETTY.

Search

Text Color

Text Size

Margin Size

Font Type