$$\newcommand{\id}{\mathrm{id}}$$ $$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\kernel}{\mathrm{null}\,}$$ $$\newcommand{\range}{\mathrm{range}\,}$$ $$\newcommand{\RealPart}{\mathrm{Re}}$$ $$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$ $$\newcommand{\Argument}{\mathrm{Arg}}$$ $$\newcommand{\norm}[1]{\| #1 \|}$$ $$\newcommand{\inner}[2]{\langle #1, #2 \rangle}$$ $$\newcommand{\Span}{\mathrm{span}}$$

# 2.6: Unconstrained Optimization: Numerical Methods

The types of problems that we solved in the previous section were examples of unconstrained optimization problems. That is, we tried to find local (and perhaps even global) maximum and minimum points of real-valued functions $$f (x, y)$$, where the points $$(x, y)$$ could be any points in the domain of $$f$$ . The method we used required us to find the critical points of $$f$$ , which meant having to solve the equation $$\nabla f = \textbf{0}$$, which in general is a system of two equations in two unknowns ($$x \text{ and }y$$). While this was relatively simple for the examples we did, in general this will not be the case. If the equations involve polynomials in $$x \text{ and }y$$ of degree three or higher, or complicated expressions involving trigonometric, exponential, or logarithmic functions, then solving even one such equation, let alone two, could be impossible by elementary means.

For example, if one of the equations that had to be solved was

$\nonumber x^3 +9x−2 = 0 ,$

you may have a hard time getting the exact solutions. Trial and error would not help much, especially since the only real solution turns out to be

$\nonumber \sqrt[3]{\sqrt{28}+1-\sqrt[3]{\sqrt{28}-1}}.$

In a situation such as this, the only choice may be to find a solution using some numerical method which gives a sequence of numbers which converge to the actual solution. For example, Newton’s method for solving equations $$f (x) = 0$$, which you probably learned in single-variable calculus. In this section we will describe another method of Newton for finding critical points of real-valued functions of two variables.

Let $$f (x, y)$$ be a smooth real-valued function, and define

$\nonumber D(x, y) = \dfrac{∂^2 f}{ ∂x^2} (x, y) \dfrac{∂^2 f}{ ∂y^2} (x, y)− \left (\dfrac{∂^2 f}{∂y∂x} (x, y)\right )^2$

Newton’s algorithm: Pick an initial point $$(x_0 , y_0)$$. For $$n$$ = 0,1,2,3,..., define:

$x_{n+1}=x_n - \dfrac{\begin{vmatrix} \dfrac{∂^2 f}{ ∂y^2} (x_n, y_n) & \dfrac{∂^2 f}{∂x∂y} (x_n, y_n) \\ \dfrac{∂f}{∂y} (x_n, y_n) & \dfrac{∂f}{∂x} (x_n, y_n)\\ \end{vmatrix}}{D(x_n, y_n)},\quad y_{n+1}=y_n - \dfrac{\begin{vmatrix} \dfrac{∂^2 f}{ ∂x^2} (x_n, y_n) & \dfrac{∂^2 f}{∂x∂y} (x_n, y_n) \\ \dfrac{∂f}{∂x} (x_n, y_n) & \dfrac{∂f}{∂y} (x_n, y_n)\\ \end{vmatrix}}{D(x_n, y_n)}\label{Eq2.14}$

Then the sequence of points $$(x_n, y_n)_{n=1}^{\infty}$$ converges to a critical point. If there are several critical points, then you will have to try different initial points to find them.

Example 2.23

Find all local maxima and minima of $$f (x, y) = x^3 − x y− x+ x y^3 − y^4$$.

Solution

First calculate the necessary partial derivatives:

$\nonumber \dfrac{∂f}{∂x} = 3x^2 − y−1+ y^3 , \quad \dfrac{∂f}{∂y} = −x+3x y^2 −4y^3$

$\nonumber \dfrac{∂^2 f}{ ∂x^2} = 6x ,\quad \dfrac{∂^2 f}{ ∂y^2} = 6x y−12y^2 ,\quad \dfrac{∂^2 f}{∂y∂x} = −1+3y^2$

Notice that solving $$\nabla f = \textbf{0}$$ would involve solving two third-degree polynomial equations in $$x \text{ and }y$$, which in this case can not be done easily.

We need to pick an initial point $$(x_0 , y_0)$$ for our algorithm. Looking at the graph of $$z = f (x, y)$$ over a large region may help (see Figure 2.6.1 below), though it may be hard to tell where the critical points are.

Figure 2.6.1 $$f (x, y) = x^ 3 − x y− x+ x y^3 − y^ 4 \text{ for }−20 ≤ x ≤ 20 \text{ and }−20 ≤ y ≤ 20$$

Notice in the Formulas Equation \ref{Eq2.14} that we divide by $$D$$, so we should pick an initial point where $$D$$ is not zero. And we can see that $$D(0,0) = (0)(0)−(−1)2 = −1 \neq 0$$, so take $$(0,0)$$ as our initial point. Since it may take a large number of iterations of Newton’s algorithm to be sure that we are close enough to the actual critical point, and since the computations are quite tedious, we will let a computer do the computing. For this, we will write a simple program, using the Java programming language, which will take a given initial point as a parameter and then perform 100 iterations of Newton’s algorithm. In each iteration the new point will be printed, so that we can see if there is convergence. The full code is shown in Listing 2.1.

Listing 2.1 Program listing for newton.java

To use this program, you should first save the code in Listing 2.1 in a plain text file called newton.java. You will need the Java Development Kit to compile the code. In the directory where newton.java is saved, run this command at a command prompt to compile the code: javac newton.java

Then run the program with the initial point (0,0) with this command:

java newton 0 0

Below is the output of the program using (0,0) as the initial point, truncated to show the first 10 lines and the last 5 lines:

As you can see, we appear to have converged fairly quickly (after only 8 iterations) to what appears to be an actual critical point (up to Java’s level of precision), namely the point (0.4711356343449874,−0.39636433796318005). It is easy to confirm that $$∇f = 0$$ at this point, either by evaluating $$\dfrac{∂f}{ ∂x} \text{ and }\dfrac{∂f}{ ∂y}$$ at the point ourselves or by modifying our program to also print the values of the partial derivatives at the point. It turns out that both partial derivatives are indeed close enough to zero to be considered zero:

$\nonumber \dfrac{∂f}{ ∂x} (0.4711356343449874,−0.39636433796318005) = 4.85722573273506×10^{−17}$

$\nonumber \dfrac{∂f}{ ∂y} (0.4711356343449874,−0.39636433796318005) = −8.326672684688674×10^{−17}$

We also have $$D(0.4711356343449874,−0.39636433796318005) = −8.776075636032301 < 0$$, so by Theorem 2.6 we know that (0.4711356343449874,−0.39636433796318005) is a saddle point.

Since $$∇f$$ consists of cubic polynomials, it seems likely that there may be three critical points. The computer program makes experimenting with other initial points easy, and trying different values does indeed lead to different sequences which converge:

Again, it is easy to confirm that both $$\dfrac{∂f}{ ∂x} \text{ and }\dfrac{∂f}{ ∂y}$$ vanish at the point $$(−0.6703832459238667,0.42501465652420045)$$, which means it is a critical point. And

$\nonumber D(−0.6703832459238667,0.42501465652420045) = 15.3853578526055 > 0$

$\nonumber \dfrac{∂^ 2 f}{ ∂x^ 2} (−0.6703832459238667,0.42501465652420045) = −4.0222994755432 < 0$

so we know that $$(−0.6703832459238667,0.42501465652420045)$$ is a local maximum. An idea of what the graph of f looks like near that point is shown in Figure 2.6.2, which does suggest a local maximum around that point.

Finally, running the computer program with the initial point $$(−5,−5)$$ yields the critical point $$(−7.540962756992551,−5.595509445899435), \text{ with }D < 0$$ at that point, which makes it a saddle point.

Figure 2.6.2 $$f (x, y) = x^ 3 − x y− x+ x y^3 − y^ 4\text{ for }−1 ≤ x ≤ 0 \text{ and }0 ≤ y ≤ 1$$

The derivation of Newton’s algorithm, and the proof that it converges (given a “reasonable” choice for the initial point) requires techniques beyond the scope of this text. See RALSTON and RABINOWITZ for more detail and for discussion of other numerical methods. Our description of Newton’s algorithm is the special two-variable case of a more general algorithm that can be applied to functions of $$n \ge 2$$ variables.

In the case of functions which have a global maximum or minimum, Newton’s algorithm can be used to find those points. In general, global maxima and minima tend to be more interesting than local versions, at least in practical applications. A maximization problem can always be turned into a minimization problem (why?), so a large number of methods have been developed to find the global minimum of functions of any number of variables. This field of study is called nonlinear programming. Many of these methods are based on the steepest descent technique, which is based on an idea that we discussed in Section 2.4. Recall that the negative gradient $$- \nabla f$$ gives the direction of the fastest rate of decrease of a function $$f$$ . The crux of the steepest descent idea, then, is that starting from some initial point, you move a certain amount in the direction of $$-\nabla f$$ at that point. Wherever that takes you becomes your new point, and you then just keep repeating that procedure until eventually (hopefully) you reach the point where f has its smallest value. There is a “pure” steepest descent method, and a multitude of variations on it that improve the rate of convergence, ease of calculation, etc. In fact, Newton’s algorithm can be interpreted as a modified steepest descent method. For more discussion of this, and of nonlinear programming in general, see BAZARAA, SHERALI and SHETTY.