C - Root Finding


\(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)

To this point you have found solutions to equations almost exclusively by algebraic manipulation. This is possible only for the artificially simple equations of problem sets and tests. In the “real world” it is very common to encounter equations that cannot be solved by algebraic manipulation. For example, you found, by completing a square, that the solutions to the quadratic equation \(ax^2+bx+c=0\) are \(x=\big(-b\pm\sqrt{b^2-4ac}\big)/(2a)\text{.}\) But it is known that there simply does not exist a corresponding formula, built from arithmetic operations and roots, for the roots of a general polynomial of degree five or more. Fortunately, encountering such an equation is not the end of the world, because usually one does not need to know the solutions exactly. One only needs to know them to within some specified degree of accuracy. For example, one rarely needs to know \(\pi\) to more than a few decimal places. There is a whole subject, called numerical analysis, that concerns using algorithms to solve equations (and perform other tasks) approximately, to any desired degree of accuracy.

We have already had, in Examples 1.6.14 and 1.6.15, and the lead-up to them, a really quick introduction to the bisection method, which is a crude, but effective, algorithm for finding approximate solutions to equations of the form \(f(x)=0\text{.}\) We shall shortly use a little calculus to derive a very efficient algorithm for finding approximate solutions to such equations. But first here is a simple example which provides a review of some of the basic ideas of root finding and the bisection method.

Example C.0.1 Bisection method

Suppose that we are given some function \(f(x)\) and we have to find solutions to the equation \(f(x)=0\text{.}\) To be concrete, suppose that \(f(x) = 8x^3+12x^2+6x-15\text{.}\) How do we go about solving \(f(x)=0\text{?}\) To get a rough idea of the lay of the land, sketch the graph of \(f(x)\text{.}\) First observe that

when \(x\) is very large and negative, \(f(x)\) is very large and negative
when \(x\) is very large and positive, \(f(x)\) is very large and positive
when \(x=0\text{,}\) \(f(x) =f(0) = -15\lt 0\)
when \(x=1\text{,}\) \(f(x) =f(1) = 11\gt 0\)
\(f'(x) = 24x^2+24x+6 = 24\big(x^2+x+\frac{1}{4}\big) =24\big(x+\frac{1}{2}\big)^2\ge 0\) for all \(x\text{.}\) So \(f(x)\) increases monotonically with \(x\text{.}\) The graph has a tangent of slope \(0\) at \(x=-\frac{1}{2}\) and tangents of strictly positive slope everywhere else.
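The observations in this list can be spot-checked with a few lines of Python. This is only a sanity check on a handful of sample points, not a proof, and the helper names `f`, `fprime` and `samples` are chosen here purely for illustration. Exact rational arithmetic (`fractions.Fraction`) is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction


def f(x):
    """The cubic from the example."""
    return 8 * x**3 + 12 * x**2 + 6 * x - 15


def fprime(x):
    """Its derivative, computed term by term."""
    return 24 * x**2 + 24 * x + 6


# The two sign checks from the list: f(0) < 0 and f(1) > 0.
print(f(0), f(1))

# Sample points x = -3, -2.75, ..., 3 (an arbitrary illustrative grid).
samples = [Fraction(k, 4) for k in range(-12, 13)]

# The derivative agrees with 24 (x + 1/2)^2 at every sample point ...
assert all(fprime(x) == 24 * (x + Fraction(1, 2)) ** 2 for x in samples)
# ... and vanishes only at x = -1/2 among the samples.
assert [x for x in samples if fprime(x) == 0] == [Fraction(-1, 2)]
# f is strictly increasing along the sample grid.
assert all(f(x) < f(y) for x, y in zip(samples, samples[1:]))
```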

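This tells us that the graph of \(f(x)\) looks like the following.

[Figure: sketch of the graph of \(y=f(x)\text{,}\) increasing throughout, with a horizontal tangent at \(x=-\frac{1}{2}\text{.}\)]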

Since \(f'(x)\gt 0\) for every \(x\ne-\frac{1}{2}\text{,}\) \(f(x)\) strictly increases¹ as \(x\) increases, so \(f(x)\) can take the value zero for at most one value of \(x\text{.}\)

Since \(f(0)\lt 0\) and \(f(1)\gt 0\) and \(f\) is continuous, \(f(x)\) must pass through \(0\) as \(x\) travels from \(x=0\) to \(x=1\text{,}\) by Theorem 1.6.12 (the intermediate value theorem). So \(f(x)\) takes the value zero for some \(x\) between \(0\) and \(1\text{.}\) We will often write this as “the root is \(x=0.5\pm 0.5\)” to indicate the uncertainty.
To get closer to the root, we evaluate \(f(x)\) halfway between \(0\) and \(1\text{.}\)
\[ f\big(\tfrac{1}{2}\big) = 8\big(\tfrac{1}{2}\big)^3+12\big(\tfrac{1}{2}\big)^2 +6\big(\tfrac{1}{2}\big)-15 = -8 \nonumber \]
Since \(f\big(\frac{1}{2}\big)\lt 0\) and \(f(1)\gt 0\) and \(f\) is continuous, \(f(x)\) must take the value zero for some \(x\) between \(\frac{1}{2}\) and \(1\text{.}\) The root is \(0.75\pm 0.25\text{.}\)
To get still closer to the root, we evaluate \(f(x)\) halfway between \(\frac{1}{2}\) and \(1\text{.}\)
\[ f\big(\tfrac{3}{4}\big) = 8\big(\tfrac{3}{4}\big)^3+12\big(\tfrac{3}{4}\big)^2 +6\big(\tfrac{3}{4}\big)-15 = -\tfrac{3}{8} \nonumber \]
Since \(f\big(\frac{3}{4}\big)\lt 0\) and \(f(1)\gt 0\) and \(f\) is continuous, \(f(x)\) must take the value zero for some \(x\) between \(\frac{3}{4}\) and \(1\text{.}\) The root is \(0.875\pm 0.125\text{.}\)
And so on.
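The steps of this example can be replayed in Python. The following is a minimal sketch, assuming the starting interval \([0,1]\) from the example. The function name `bisection_steps` and the choice of five halvings are illustrative only. Exact fractions are used so that the computed values match the hand calculations above.

```python
from fractions import Fraction


def f(x):
    """The cubic from Example C.0.1."""
    return 8 * x**3 + 12 * x**2 + 6 * x - 15


def bisection_steps(a, b, steps):
    """Return the intervals produced by repeatedly halving [a, b].

    Assumes f(a) < 0 < f(b), which holds for a = 0, b = 1 here.
    """
    intervals = [(a, b)]
    for _ in range(steps):
        c = (a + b) / 2
        fc = f(c)
        if fc == 0:          # landed exactly on the root
            intervals.append((c, c))
            break
        if fc < 0:
            a = c            # f changes sign on the right half
        else:
            b = c            # f changes sign on the left half
        intervals.append((a, b))
    return intervals


intervals = bisection_steps(Fraction(0), Fraction(1), 5)
for a, b in intervals:
    mid, half_width = (a + b) / 2, (b - a) / 2
    print(f"root = {float(mid)} ± {float(half_width)}")
```

The first three printed lines reproduce the error bars found above, \(0.5\pm 0.5\text{,}\) \(0.75\pm 0.25\) and \(0.875\pm 0.125\text{.}\) As an independent check that is special to this particular cubic, note that \(8x^3+12x^2+6x-15=(2x+1)^3-16\text{,}\) so the root is \(\big(\sqrt[3]{16}-1\big)/2\approx 0.76\text{,}\) which does lie in every interval the sketch produces.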

The root finding strategy used in Example C.0.1 is called the bisection method. The bisection method will home in on a root of the function \(f(x)\) whenever

\(f(x)\) is continuous (\(f(x)\) need not have a derivative) and
you can find two numbers \(a_1\lt b_1\) with \(f(a_1)\) and \(f(b_1)\) being of opposite sign.

Denote by \(I_1\) the interval \([a_1,b_1]=\big\{x\ \big|\ a_1\le x\le b_1\big\}\text{.}\) Once you have found the interval \(I_1\text{,}\) the bisection method generates a sequence \(I_1\text{,}\) \(I_2\text{,}\) \(I_3\text{,}\) \(\cdots\) of intervals by the following rule.

Equation C.0.2 (bisection method)

Denote by \(c_n=\frac{a_n+b_n}{2}\) the midpoint of the interval \(I_n=[a_n,b_n]\text{.}\) If \(f(c_n)\) has the same sign as \(f(a_n)\text{,}\) then

\[ I_{n+1}=[a_{n+1},b_{n+1}]\quad\text{with}\quad a_{n+1}=c_n,\ b_{n+1}=b_n \nonumber \]

and if \(f(c_n)\) and \(f(a_n)\) have opposite signs, then

\[ I_{n+1}=[a_{n+1},b_{n+1}]\quad\text{with}\quad a_{n+1}=a_n,\ b_{n+1}=c_n \nonumber \]

This rule was chosen so that \(f(a_n)\) and \(f(b_n)\) have opposite sign for every \(n\text{.}\) (If it ever happens that \(f(c_n)=0\text{,}\) then \(c_n\) is itself a root and we can stop.) Since \(f(x)\) is continuous, \(f(x)\) has a zero in each interval \(I_n\text{.}\) Each interval \(I_{n+1}\) is half as long as \(I_n\text{,}\) so each step reduces the error bars by a factor of \(2\text{.}\) That isn't too bad, but we can come up with something that is much more efficient. We just need a little calculus.
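The rule of Equation C.0.2 can be sketched as a general-purpose Python function. This is one possible implementation under stated assumptions, not a definitive one. The name `bisect`, the stopping tolerance `tol` and the safety cap `max_steps` are all choices made here for illustration and are not part of the text. The function returns the midpoint of the final interval together with its half-width, which plays the role of the error bar.

```python
def bisect(f, a, b, tol=1e-10, max_steps=200):
    """Approximate a root of a continuous function f on [a, b] by bisection.

    Requires f(a) and f(b) to have opposite signs. Returns (midpoint,
    half_width) for the last interval, so the root is midpoint ± half_width.
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a, 0.0
    if fb == 0:
        return b, 0.0
    if (fa > 0) == (fb > 0):
        raise ValueError("f(a) and f(b) must have opposite signs")

    for _ in range(max_steps):
        if (b - a) / 2 <= tol:   # error bar is small enough
            break
        c = (a + b) / 2          # midpoint c_n of I_n
        fc = f(c)
        if fc == 0:              # c_n is exactly a root
            return c, 0.0
        if (fc > 0) == (fa > 0):
            a, fa = c, fc        # same sign as f(a_n): a_{n+1} = c_n
        else:
            b = c                # opposite signs:      b_{n+1} = c_n
    return (a + b) / 2, (b - a) / 2


# Usage on the cubic from Example C.0.1, asking for an error bar of 1e-6.
root, err = bisect(lambda x: 8 * x**3 + 12 * x**2 + 6 * x - 15, 0.0, 1.0, tol=1e-6)
print(f"root = {root} ± {err}")
```

Because each step halves the error bar, starting from \(\pm 0.5\) it takes \(19\) halvings to get below \(10^{-6}\text{,}\) since \(2^{-20}\approx 9.5\times 10^{-7}\text{.}\) Every additional decimal digit of accuracy costs a little more than three further steps.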
