D.3: Variable Step Size Methods
We now introduce a family of procedures that decide by themselves what step size to use. In all of these procedures the user specifies an acceptable error rate and the procedure attempts to adjust the step size so that each step introduces error at no more than that rate. That way the procedure uses a small step size when it is hard to get an accurate approximation, and a large step size when it is easy to get a good approximation.
Suppose that we wish to generate an approximation to the initial value problem
\[ y'=f(t,y),\qquad y(t_0)=y_0 \nonumber \]
for some range of \(t\)'s, and we want the error introduced per unit increase of \(t\) to be no more than about some small fixed number \(\varepsilon\text{.}\) (We know that the error will get larger the further we go in \(t\text{,}\) so it makes sense to try to limit the error per unit increase in \(t\text{.}\)) This means that if \(y_n\approx y(t_0+nh)\) and \(y_{n+1}\approx y(t_0+(n+1)h)\text{,}\) then we want the local truncation error in the step from \(y_n\) to \(y_{n+1}\) to be no more than about \(\varepsilon h\text{.}\) Suppose further that we have already produced the approximate solution as far as \(t_n\text{.}\) The rough strategy is as follows. We do the step from \(t_n\) to \(t_n+h\) twice using two different algorithms, giving two different approximations to \(y(t_{n+1})\text{,}\) which we call \(A_{1,n+1}\) and \(A_{2,n+1}\text{.}\) The two algorithms are chosen so that
- we can use \(A_{1,n+1}-A_{2,n+1}\) to compute an approximate local truncation error and
- for efficiency, the two algorithms use almost the same evaluations of \(f\text{.}\) Remember that evaluating the function \(f\) is typically the most time-consuming part of our computation.
In the event that the local truncation error divided by \(h\) (i.e. the error per unit increase of \(t\)) is smaller than \(\varepsilon\text{,}\) we set \(t_{n+1}=t_n+h\text{,}\) accept \(A_{2,n+1}\) as the approximate value for \(y(t_{n+1})\) (better still, accept \(A_{2,n+1}\) minus the computed approximate error in \(A_{2,n+1}\)), and move on to the next step. Otherwise we pick, using what we have learned from \(A_{1,n+1}-A_{2,n+1}\text{,}\) a new trial step size \(h\) and start over again at \(t_n\text{.}\)
Now for the details. We start with a very simple procedure. We will later soup it up to get a much more efficient procedure.
Euler and Euler-2step (preliminary version)
Denote by \(\phi(t)\) the exact solution to \(y'=f(t,y)\) that satisfies the initial condition \(\phi(t_n)=y_n\text{.}\) If we apply one step of Euler with step size \(h\text{,}\) giving
\[ A_{1,n+1}=y_n+hf(t_n,y_n) \nonumber \]
we know, from (D.2.4), that
\[ A_{1,n+1}=\phi(t_n+h)+Kh^2+O(h^3) \nonumber \]
The problem, of course, is that we don't know what the error is, even approximately, because we don't know what the constant \(K\) is. But we can estimate \(K\) simply by redoing the step from \(t_n\) to \(t_n+h\) using a judiciously chosen second algorithm. There are a number of different second algorithms that will work. We call the simple algorithm that we use in this subsection Euler-2step. (This name is begging for a dance-related footnote, and we invite the reader to supply their own.) One step of Euler-2step with step size \(h\) just consists of doing two steps of Euler with step size \(h/2\text{:}\)
\[ A_{2,n+1} = y_n+\tfrac{h}{2}f(t_n,y_n) +\tfrac{h}{2}f\big(t_n+\tfrac{h}{2},y_n+\tfrac{h}{2}f(t_n,y_n)\big) \nonumber \]
Here, the first half-step took us from \(y_n\) to \(y_{\rm mid}=y_n+\frac{h}{2}f(t_n,y_n)\) and the second half-step took us from \(y_{\rm mid}\) to \(y_{\rm mid}+\frac{h}{2}f\big(t_n+\frac{h}{2},y_{\rm mid}\big)\text{.}\) The local truncation error introduced in the first half-step is \(K(h/2)^2+O(h^3)\text{.}\) That for the second half-step is \(K(h/2)^2+O(h^3)\) with the same \(K\text{,}\) though with a different \(O(h^3)\text{.}\) (Because the two half-steps start at values of \(t\) only \(h/2\) apart, and we are thinking of \(h\) as being very small, it should not be surprising that we can use the same value of \(K\) in both. In case you don't believe us, we have included a derivation of the local truncation error for Euler-2step later in this appendix.) All together
\begin{align*} A_{2,n+1}&=\phi(t_n+h)+\big[ K\big(\tfrac{h}{2}\big)^2+O(h^3)\big] + \big[K\big(\tfrac{h}{2}\big)^2+O(h^3)\big] \\ &=\phi(t_n+h)+\half Kh^2+O(h^3) \end{align*}
The difference is
\begin{align*} A_{1,n+1}-A_{2,n+1}&=\big[\phi(t_n+h)+Kh^2+O(h^3)\big] -\big[\phi(t_n+h)+\half Kh^2+O(h^3)\big] \\ &=\half Kh^2+O(h^3) \end{align*}
So if we do one step of both Euler and Euler-2step, we can estimate
\[ \half Kh^2=A_{1,n+1}-A_{2,n+1}+O(h^3) \nonumber \]
We now know that in the step just completed Euler-2step introduced an error of about \(\half Kh^2\approx A_{1,n+1}-A_{2,n+1}\text{.}\) That is, the current error rate is about \(r=\frac{|A_{1,n+1}-A_{2,n+1}|}{h}\approx\half |K| h\) per unit increase of \(t\text{.}\)
- If this \(r=\frac{|A_{1,n+1}-A_{2,n+1}|}{h}>\varepsilon\text{,}\) the measured error rate \(r\) is bigger than the desired error rate \(\varepsilon\text{:}\) it is harder to get the accuracy we want than we thought, and we have to take smaller steps. So we reject \(A_{2,n+1}\) and repeat the current step with a new trial step size chosen so that \(\half |K|(\text{new }h)\lt\varepsilon\text{,}\) i.e. \(\frac{r}{h}(\text{new }h)\lt\varepsilon\text{.}\) To give ourselves a small safety margin (we don't want the new \(h\) too close to \(\frac{\varepsilon}{r}h\text{,}\) since we are only estimating things and might end up with an error rate bigger than \(\varepsilon\text{;}\) on the other hand, we don't want the new \(h\) too small, because that means too much work), we could use
\[ \text{new }h=0.9\,\frac{\varepsilon}{r}\,h \nonumber \]
- If \(r=\frac{|A_{1,n+1}-A_{2,n+1}|}{h}\lt\varepsilon\text{,}\) the measured error rate is smaller than the desired error rate \(\varepsilon\text{:}\) it is easier to get the accuracy we want than we thought, and we can make the next step larger. So we accept \(A_{2,n+1}\) as an approximate value for \(y(t_{n+1})\text{,}\) with \(t_{n+1}=t_n+h\text{,}\) and move on to the next step, starting with the new trial step size (note that in this case \(\frac{\varepsilon}{r}>1\text{,}\) so the new \(h\) can be bigger than the last \(h\))
\[ \text{new } h=0.9\,\frac{\varepsilon}{r}\,h \nonumber \]
That is our preliminary version of the Euler/Euler-2step variable step size method. We call it the preliminary version, because we will shortly tweak it to get a much more efficient procedure.
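Before moving on, here is a minimal Python sketch of one accepted step of this preliminary procedure, assuming only that `f` is a callable `f(t, y)` for a single scalar ODE. The function name, the `max_tries` guard, and the handling of \(r=0\) are our own additions, not part of the method as stated above.

```python
def euler_euler2step_prelim(f, t, y, h, eps, max_tries=50):
    """One accepted step of the preliminary Euler/Euler-2step method.

    Repeats the trial step with a smaller h until the estimated error
    rate r = |A1 - A2|/h drops below eps.  Returns (t_new, y_new, h_next).
    """
    for _ in range(max_tries):
        f1 = f(t, y)                        # shared by both algorithms
        A1 = y + h * f1                     # one Euler step of size h
        y_mid = y + 0.5 * h * f1            # first Euler half-step
        A2 = y_mid + 0.5 * h * f(t + 0.5 * h, y_mid)   # second half-step
        r = abs(A1 - A2) / h                # error per unit increase of t
        if r < eps:
            h_next = 0.9 * (eps / r) * h if r > 0 else 2.0 * h
            return t + h, A2, h_next        # accept A2 and move on
        h = 0.9 * (eps / r) * h             # reject and retry from (t, y)
    raise RuntimeError("no acceptable step size found")
```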
As a concrete example, suppose that our problem is
\[ y(0)=e^{-2},\ y'=8(1-2t)y,\ \varepsilon=0.1 \nonumber \]
and that we have gotten as far as
\[ t_n=0.33,\ y_n=0.75,\ \ \text{trial }h=0.094 \nonumber \]
Then, using \(E=|A_{1,n+1}-A_{2,n+1}|\) to denote the magnitude of the estimated local truncation error in \(A_{2,n+1}\text{,}\) and \(r\) to denote the corresponding error rate, we compute
\begin{align*} f(t_n,y_n)&=8(1-2\times 0.33) 0.75=2.04 \\ A_{1,n+1}&=y_n+hf(t_n,y_n)=0.75+0.094\times 2.04=0.942 \\ y_{\rm mid}&=y_n+\tfrac{h}{2}f(t_n,y_n) =0.75+\tfrac{0.094}{2}\times 2.04=0.846 \\ f\big(t_n+\tfrac{h}{2},y_{\rm mid}\big) &=8\Big[1-2\big(0.33+\tfrac{0.094}{2}\big)\Big]0.846=1.66 \\ A_{2,n+1}&=y_{\rm mid}+\tfrac{h}{2}f(t_n+\tfrac{h}{2},y_{\rm mid}) =0.846+\tfrac{0.094}{2}1.66=0.924\\ E&=|A_{1,n+1}-A_{2,n+1}|=|0.942-0.924|=0.018\\ r&=\frac{|E|}{h}=\frac{0.018}{0.094}=0.19 \end{align*}
Since \(r=0.19>\varepsilon=0.1\,\text{,}\) the current step size is unacceptable and we have to recompute with the new step size
\[ \text{new } h=0.9\frac{\varepsilon}{r}(\text{old }h) =0.9\ \frac{0.1}{0.19}\ 0.094 =0.045 \nonumber \]
to give
\begin{align*} f(t_n,y_n)&=8(1-2\times0.33)0.75=2.04 \\ A_{1,n+1}&=y_n+hf(t_n,y_n)=0.75+0.045\times 2.04=0.842 \\ y_{\rm mid}&=y_n+\tfrac{h}{2}f(t_n,y_n) =0.75+\tfrac{0.045}{2}\times 2.04=0.796 \\ f\big(t_n+\tfrac{h}{2},y_{\rm mid}\big) &=8\Big[1-2\big(0.33+\tfrac{0.045}{2}\big)\Big]0.796=1.88 \\ A_{2,n+1}&=y_{\rm mid} +\tfrac{h}{2}f(t_n +\tfrac{h}{2},y_{\rm mid}) =0.796+\tfrac{0.045}{2}1.88 =0.838 \\ E&=|A_{1,n+1}-A_{2,n+1}|=|0.842-0.838|=0.004 \\ r&=\frac{|E|}{h}=\frac{0.004}{0.045}=0.09 \end{align*}
This time \(\,r=0.09\lt \varepsilon=0.1\,\text{,}\) is acceptable so we set \(t_{n+1}=0.33+0.045=0.375\) and
\[ y_{n+1}=A_{2,n+1}=0.838 \nonumber \]
The initial trial step size from \(t_{n+1}\) to \(t_{n+2}\) is
\[ \text{new }h = 0.9\frac{\varepsilon}{r}(\text{old }h) =0.9\,\frac{0.1}{0.09}\,0.045=0.045 \nonumber \]
By a fluke, it has turned out that the new \(h\) is the same as the old \(h\) (to three decimal places). If \(r\) had been significantly smaller than \(\varepsilon\text{,}\) then the new \(h\) would have been significantly bigger than the old \(h\text{,}\) indicating that it is (relatively) easy to estimate things in this region, making a larger step size sufficient.
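As a check, the sketch above reproduces this arithmetic; small differences in the suggested next \(h\) are expected because the text rounds intermediate quantities to three decimal places while the code does not.

```python
f = lambda t, y: 8 * (1 - 2 * t) * y
t_next, y_next, h_next = euler_euler2step_prelim(f, t=0.33, y=0.75, h=0.094, eps=0.1)
print(t_next, y_next)   # approximately 0.375 and 0.838, as in the text
```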
As we said above, we will shortly upgrade the above variable step size method, that we are calling the preliminary version of the Euler/Euler-2step method, to get a much more efficient procedure. Before we do so, let's pause to investigate a little how well our preliminary procedure does at controlling the rate of error production.
We have been referring, loosely, to \(\varepsilon\) as the desired rate at which our variable step size method introduces error as \(t\) advances. If the rate of increase of error were exactly \(\varepsilon\text{,}\) then at final time \(t_f\) the accumulated error would be exactly \(\varepsilon(t_f-t_0)\text{.}\) But our algorithm actually chooses the step size \(h\) for each step so that the estimated local truncation error in \(A_{2,n+1}\) for that step is about \(\varepsilon h\text{.}\) We have seen that, once some local truncation error has been introduced, its contribution to the global truncation error can grow exponentially with \(t_f\text{.}\)
Here are the results of a numerical experiment that illustrate this effect. In this experiment, the above preliminary Euler/Euler-2step method is applied to the initial value problem \(\ y'=t-2y,\ y(0)=3 \ \) for \(\varepsilon=\frac{1}{16},\frac{1}{32},\cdots\) (ten different values) and for \(t_f=0.2,\ 0.4,\ \cdots,\ 3.8\text{.}\) Here is a plot of the resulting \(\frac{\text{actual error at }t=t_f}{\varepsilon t_f}\) against \(t_f\text{.}\)
If the rate of introduction of error were exactly \(\varepsilon\text{,}\) we would have \(\frac{\text{actual error at }t=t_f}{\varepsilon t_f}=1\text{.}\) There is a small square on the graph for each different pair \(\varepsilon,t_f\text{.}\) So for each value of \(t_f\) there are ten (possibly overlapping) squares on the line \(x=t_f\text{.}\) This numerical experiment suggests that \(\frac{\text{actual error at }t=t_f}{\varepsilon t_f}\) is relatively independent of \(\varepsilon\) and starts, when \(t_f\) is small, at about one, as we want, but grows (perhaps exponentially) with \(t_f\text{.}\)
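A rough reproduction of this experiment, reusing the step function sketched earlier, might look as follows. The driver `solve_adaptive`, the initial trial step size, and the closed-form exact solution \(y(t)=\frac{t}{2}-\frac{1}{4}+\frac{13}{4}e^{-2t}\) (obtained by standard linear first-order methods) are our own scaffolding.

```python
import math

def solve_adaptive(f, t0, y0, tf, eps, h0=0.1, step=euler_euler2step_prelim):
    """Integrate y' = f(t, y) from t0 to tf with a variable step size."""
    t, y, h = t0, y0, h0
    while t < tf:
        h = min(h, tf - t)          # never step past the final time
        t, y, h = step(f, t, y, h, eps)
    return y

f = lambda t, y: t - 2 * y
exact = lambda t: 0.5 * t - 0.25 + 3.25 * math.exp(-2 * t)

eps = 1 / 64
for k in range(1, 20):              # tf = 0.2, 0.4, ..., 3.8
    tf = 0.2 * k
    err = abs(solve_adaptive(f, 0.0, 3.0, tf, eps) - exact(tf))
    print(f"tf = {tf:.1f}   error/(eps*tf) = {err / (eps * tf):.3f}")
```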
Euler and Euler-2step (final version)
We are now ready to use a sneaky bit of arithmetic to supercharge our Euler/Euler-2step method. As in our development of the preliminary version of the method, denote by \(\phi(t)\) the exact solution to \(y'=f(t,y)\) that satisfies the initial condition \(\phi(t_n)=y_n\text{.}\) We have seen, at the beginning of §D.3.1, that applying one step of Euler with step size \(h\) gives
\begin{align*} A_{1,n+1}&=y_n+hf(t_n,y_n) \notag\\ &=\phi(t_n+h)+Kh^2+O(h^3) \tag{E5} \end{align*}
and applying one step of Euler-2step with step size \(h\) (i.e. applying two steps of Euler with step size \(h/2\)) gives
\begin{align*} A_{2,n+1} &= y_n+\tfrac{h}{2}f(t_n,y_n) +\tfrac{h}{2}f\big(t_n+\tfrac{h}{2}\,,\,y_n+\tfrac{h}{2}f(t_n,y_n)\big)\notag\\ &=\phi(t_n+h)+\tfrac{1}{2} Kh^2+O(h^3) \tag{E6} \end{align*}
because the local truncation error introduced in the first half-step was \(K(h/2)^2+O(h^3)\) and that introduced in the second half-step was \(K(h/2)^2+O(h^3)\text{.}\) Now here is the sneaky bit. Equations (E5) and (E6) are very similar and we can eliminate all \(Kh^2\)'s by subtracting (E5) from \(2\) times (E6). This gives
\[ 2\text{(E6)} - \text{(E5):}\qquad 2A_{2,n+1}-A_{1,n+1} = \phi(t_n+h) +O(h^3) \nonumber \]
(no more \(h^2\) term!) or
\[ \phi(t_n+h)= 2A_{2,n+1}-A_{1,n+1}+O(h^3) \tag{E7} \]
which tells us that choosing
\[ y_{n+1}=2A_{2,n+1}-A_{1,n+1} \tag{E8} \]
would give a local truncation error of order \(h^3\text{,}\) rather than the order \(h^2\) of the preliminary Euler/Euler-2step method. To convert the preliminary version to the final version, we just replace \(y_{n+1}=A_{2,n+1}\) by \(y_{n+1} = 2A_{2,n+1}-A_{1,n+1}\text{:}\)
Given \(\varepsilon>0\text{,}\) \(t_n\text{,}\) \(y_n\) and the current step size \(h\)
- compute
\begin{align*} A_{1,n+1}&=y_n+hf(t_n,y_n) \\ A_{2,n+1} &= y_n+\tfrac{h}{2}f(t_n,y_n) +\tfrac{h}{2}f\big(t_n+\tfrac{h}{2}\,,\,y_n+\tfrac{h}{2}f(t_n,y_n)\big)\\ r&=\frac{|A_{1,n+1}-A_{2,n+1}|}{h} \end{align*}
- If \(r>\varepsilon\text{,}\) repeat the first bullet but with the new step size
\[ (\text{new }h)=0.9\,\frac{\varepsilon}{r}\,(\text{old }h) \nonumber \]
- If \(r\lt\varepsilon\) set
\begin{align*} t_{n+1}&=t_n+h \\ y_{n+1}&=2A_{2,n+1}-A_{1,n+1}\quad\text{and the new trial step size} \\ (\text{new } h)&=0.9\,\frac{\varepsilon}{r}\,(\text{old }h) \end{align*}
and move on to the next step. Note that since \(r\lt\varepsilon\text{,}\) we have \(\frac{\varepsilon}{r}h\gt h\text{,}\) so the new \(h\) can be larger than the old \(h\text{.}\) We include the \(0.9\) to be careful not to make the error of the next step too big.
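In code, the final version differs from the preliminary sketch in a single line: the accepted value is the extrapolated \(2A_{2,n+1}-A_{1,n+1}\text{.}\) As before, the function name and the \(r=0\) and `max_tries` guards are our own additions.

```python
def euler_euler2step(f, t, y, h, eps, max_tries=50):
    """One accepted step of the final Euler/Euler-2step method (D.3.2)."""
    for _ in range(max_tries):
        f1 = f(t, y)
        A1 = y + h * f1                                # one Euler step
        y_mid = y + 0.5 * h * f1
        A2 = y_mid + 0.5 * h * f(t + 0.5 * h, y_mid)   # two half-steps
        r = abs(A1 - A2) / h
        if r < eps:
            h_next = 0.9 * (eps / r) * h if r > 0 else 2.0 * h
            return t + h, 2 * A2 - A1, h_next          # extrapolated value
        h = 0.9 * (eps / r) * h
    raise RuntimeError("no acceptable step size found")
```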
Let's think a bit about how our final Euler/Euler-2step method should perform.
- The step size here, as in the preliminary version, is chosen so that the local truncation error in \(A_{2,n+1}\) per unit increase of \(t\text{,}\) namely \(r=\frac{|A_{1,n+1}-A_{2,n+1}|}{h}\approx\frac{Kh^2/2}{h}=\frac{K}{2}h\text{,}\) is approximately \(\varepsilon\text{.}\) So \(h\) is roughly proportional to \(\varepsilon\text{.}\)
- On the other hand, (E7) shows that, in the full method, local truncation error is being added to \(y_{n+1}\) at a rate of \(\frac{O(h^3)}{h}=O(h^2)\) per unit increase in \(t\text{.}\)
- So one would expect that local truncation error increases the total error at a rate proportional to \(\varepsilon^2\) per unit increase in \(t\text{.}\)
- If the rate of increase of error were exactly a constant times \(\varepsilon^2\text{,}\) then the error accumulated between the initial time \(t=0\) and the final time \(t=t_f\) would be exactly a constant times \(\varepsilon^2\,t_f\text{.}\)
- However we have seen that, once some local truncation error has been introduced, its contribution to the global error can grow exponentially with \(t_f\text{.}\) So we would expect, under the full Euler/Euler-2step method, \(\frac{{\rm actual\ error\ at\ }t=t_f}{\varepsilon^2 t_f}\) to be more or less independent of \(\varepsilon\text{,}\) but still growing exponentially in \(t_f\text{.}\)
Here are the results of a numerical experiment that illustrate this. In this experiment, the above final Euler/Euler-2step method, (D.3.2), is applied to the initial value problem \(\ y'=t-2y,\ y(0)=3 \ \) for \(\varepsilon=\frac{1}{16},\frac{1}{32},\cdots\) (ten different values) and for \(t_f=0.2,\ 0.4,\ \cdots,\ 3.8\text{.}\) In the following plot, there is a small square for the resulting \(\frac{\text{actual error at }t=t_f}{\varepsilon^2 t_f}\) for each different pair \(\varepsilon,t_f\text{.}\)
It does indeed look like \(\frac{{\rm actual\ error\ at\ }t=t_f}{\varepsilon^2 t_f}\) is relatively independent of \(\varepsilon\) but grows (perhaps exponentially) with \(t_f\text{.}\) Note that \(\frac{{\rm actual\ error\ at\ }t=t_f}{\varepsilon^2 t_f}\) contains a factor of \(\varepsilon^2\) in the denominator. The actual error rate \(\frac{{\rm actual\ error\ at\ }t=t_f}{t_f}\) is much smaller than is suggested by the graph.
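Since only the accepted value changes, the experiment driver sketched earlier can be reused for the final method simply by swapping in the new step function, e.g.

```python
# reusing solve_adaptive, f and exact from the earlier sketch
eps, tf = 1 / 64, 3.8
err = abs(solve_adaptive(f, 0.0, 3.0, tf, eps, step=euler_euler2step) - exact(tf))
print(err / (eps ** 2 * tf))   # the ratio plotted above
```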
Fehlberg's Method
Of course, in practice more efficient and more accurate methods than Euler and Euler-2step are used. (There are a very large number of such methods. We will only look briefly at a couple of the simpler ones. The interested reader can find more by search engining for such keywords as “Runge-Kutta methods” and “adaptive step size”.) Fehlberg's method (E. Fehlberg, NASA Technical Report R315 (1969) and NASA Technical Report R287 (1968)) uses improved Euler and a second more accurate method. Each step involves three calculations of \(f\text{:}\)
\begin{align*} f_{1,n}&=f(t_n,y_n) \\ f_{2,n}&=f(t_n+h,y_n+hf_{1,n}) \\ f_{3,n}&=f\left(t_n+\tfrac{h}{2},y_n+\tfrac{h}{4}[f_{1,n}+f_{2,n}]\right) \end{align*}
Once these three evaluations have been made, the method generates two approximations for \(y(t_n+h)\text{:}\)
\begin{align*} A_{1,n+1}&=y_n+\tfrac{h}{2}\left[f_{1,n}+f_{2,n}\right] \\ A_{2,n+1}&=y_n+\tfrac{h}{6}\left[f_{1,n}+f_{2,n}+4f_{3,n}\right] \end{align*}
Denote by \(\phi(t)\) the exact solution to \(y'=f(t,y)\) that satisfies the initial condition \(\phi(t_n)=y_n\text{.}\) Now \(A_{1,n+1}\) is just the \(y_{n+1}\) produced by the improved Euler's method. The local truncation error for the improved Euler's method is of order \(h^3\text{,}\) one power of \(h\) smaller than that for Euler's method. So
\[ A_{1,n+1} = \phi(t_n+h) + Kh^3+O(h^4) \nonumber \]
and it turns out (the interested reader can find Fehlberg's original paper online, at NASA, and follow the derivation; it requires careful Taylor expansions and then clever algebra to cancel the bigger error terms) that
\[ A_{2,n+1} = \phi(t_n+h) + O(h^4) \nonumber \]
So the error in \(A_{1,n+1}\) is
\begin{align*} E&=\big|Kh^3+O(h^4)\big| =\big|A_{1,n+1}-\phi(t_n+h)\big|+O(h^4) \\ &=\big|A_{1,n+1}-A_{2,n+1}\big|+O(h^4) \end{align*}
and our estimate for the rate at which error is being introduced into \(A_{1,n+1}\) is
\[ r=\frac{|A_{1,n+1}-A_{2,n+1}|}{h}\approx |K|h^2 \nonumber \]
per unit increase of \(t\text{.}\)
- If \(r>\varepsilon\) we redo this step with a new trial step size chosen so that \(|K|(\text{new }h)^2\lt\varepsilon\text{,}\) i.e. \(\frac{r}{h^2}(\text{new }h)^2\lt\varepsilon\text{.}\) With our traditional safety factor, we take
\[ \text{new }h=0.9\sqrt{\frac{\varepsilon}{r}}\,h\qquad \text{(the new }h\text{ is smaller)} \nonumber \]
- If \(r\le \varepsilon\) we set \(t_{n+1}=t_n+h\) and \(y_{n+1}=A_{2,n+1}\) (since \(A_{2,n+1}\) should be considerably more accurate than \(A_{1,n+1}\)) and move on to the next step with trial step size
\[ \text{new }h=0.9\sqrt{\frac{\varepsilon}{r}}\,h\qquad \text{(the new }h\text{ is usually bigger)} \nonumber \]
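Here is a minimal sketch of one accepted Fehlberg step, following the formulas above. As with the earlier sketches, the function name and the guards are our own, and `f` is assumed to be a scalar callable `f(t, y)`.

```python
import math

def fehlberg_step(f, t, y, h, eps, max_tries=50):
    """One accepted step of Fehlberg's method as described above."""
    for _ in range(max_tries):
        f1 = f(t, y)
        f2 = f(t + h, y + h * f1)
        f3 = f(t + 0.5 * h, y + 0.25 * h * (f1 + f2))
        A1 = y + 0.5 * h * (f1 + f2)             # improved Euler
        A2 = y + (h / 6) * (f1 + f2 + 4 * f3)    # more accurate companion
        r = abs(A1 - A2) / h                     # estimate of |K| h^2
        if r <= eps:
            h_next = 0.9 * math.sqrt(eps / r) * h if r > 0 else 2.0 * h
            return t + h, A2, h_next             # accept the better A2
        h = 0.9 * math.sqrt(eps / r) * h
    raise RuntimeError("no acceptable step size found")
```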
The Kutta-Merson Process
The Kutta-Merson process (R.H. Merson, “An operational method for the study of integration processes”, Proc. Symp. Data Processing, Weapons Res. Establ. Salisbury, Salisbury (1957), pp. 110–125) uses two variations of the Runge-Kutta method. (Like the other methods described above, the coefficients \(1/3\text{,}\) \(1/6\text{,}\) \(1/8\) etc. are chosen so as to cancel larger error terms. While determining the correct choice of coefficients is not conceptually difficult, it does take some work and is beyond the scope of this appendix. The interested reader should search-engine their way to a discussion of adaptive Runge-Kutta methods.) Each step involves five calculations of \(f\text{:}\)
\begin{align*} k_{1,n}&=f(t_n,y_n) \\ k_{2,n}&=f\big(t_n+\tfrac{1}{3}h,y_n+\tfrac{1}{3}hk_{1,n}\big)\\ k_{3,n}&= f\big(t_n+\tfrac{1}{3}h,y_n+\tfrac{1}{6}hk_{1,n}+\tfrac{1}{6}hk_{2,n}\big) \\ k_{4,n}&= f\big(t_n+\tfrac{1}{2}h,y_n+\tfrac{1}{8}hk_{1,n}+\tfrac{3}{8}hk_{3,n}\big) \\ k_{5,n}&= f\big(t_n+h,y_n+\tfrac{1}{2}hk_{1,n}-\tfrac{3}{2}hk_{3,n}+2hk_{4,n}\big) \end{align*}
Once these five evaluations have been made, the process generates two approximations for \(y(t_n+h)\text{:}\)
\begin{align*} A_{1,n+1}&=y_n+h\left[\tfrac{1}{2}k_{1,n}-\tfrac{3}{2}k_{3,n}+2k_{4,n}\right] \\ A_{2,n+1}&=y_n+h\left[\tfrac{1}{6}k_{1,n}+\tfrac{2}{3}k_{4,n} +\tfrac{1}{6}k_{5,n}\right] \end{align*}
The (signed) error in \(A_{1,n+1}\) is \(\frac{1}{120}h^5K+O(h^6)\) while that in \(A_{2,n+1}\) is \(\frac{1}{720}h^5K+O(h^6)\) with the same constant \(K\text{.}\) So \(A_{1,n+1}-A_{2,n+1} = \frac{5}{720}Kh^5+O(h^6)\) and the unknown constant \(K\) can be determined, to within an error \(O(h)\text{,}\) by
\[ K=\frac{720}{5\,h^5}(A_{1,n+1}-A_{2,n+1}) \nonumber \]
and the approximate (signed) error in \(A_{2,n+1}\) and its corresponding rate per unit increase of \(t\) are
\begin{align*} E&=\frac{1}{720}K h^5=\frac{1}{5}(A_{1,n+1}-A_{2,n+1}) \\ r=\frac{|E|}{h}&=\frac{1}{720}|K| h^4=\frac{1}{5\,h}\big|A_{1,n+1}-A_{2,n+1}\big| \end{align*}
- If \(r>\varepsilon\) we redo this step with a new trial step size chosen so that \(\frac{1}{720}|K|(\text{new }h)^4\lt\varepsilon\text{,}\) i.e. \(\frac{r}{h^4}(\text{new }h)^4\lt\varepsilon\text{.}\) With our traditional safety factor, we take
\[ \text{new }h=0.9\left(\frac{\varepsilon}{r}\right)^{1/4}\,h \nonumber \]
- If \(r\le \varepsilon\) we set \(t_{n+1}=t_n+h\) and \(y_{n+1}=A_{2,n+1}-E\) (since \(E\) is our estimate of the signed error in \(A_{2,n+1}\)) and move on to the next step with trial step size
\[ \text{new }h=0.9\left(\frac{\varepsilon}{r}\right)^{1/4}\,h \nonumber \]
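And a corresponding sketch of one accepted Kutta-Merson step, again with our own naming and guards; note that the accepted value here is \(A_{2,n+1}-E\text{,}\) with the estimated signed error removed.

```python
def kutta_merson_step(f, t, y, h, eps, max_tries=50):
    """One accepted step of the Kutta-Merson process as described above."""
    for _ in range(max_tries):
        k1 = f(t, y)
        k2 = f(t + h / 3, y + h * k1 / 3)
        k3 = f(t + h / 3, y + h * (k1 + k2) / 6)
        k4 = f(t + h / 2, y + h * (k1 + 3 * k3) / 8)
        k5 = f(t + h, y + h * (k1 / 2 - 3 * k3 / 2 + 2 * k4))
        A1 = y + h * (k1 / 2 - 3 * k3 / 2 + 2 * k4)
        A2 = y + h * (k1 / 6 + 2 * k4 / 3 + k5 / 6)
        E = (A1 - A2) / 5                  # signed error estimate for A2
        r = abs(E) / h
        if r <= eps:
            h_next = 0.9 * (eps / r) ** 0.25 * h if r > 0 else 2.0 * h
            return t + h, A2 - E, h_next   # subtract the estimated error
        h = 0.9 * (eps / r) ** 0.25 * h
    raise RuntimeError("no acceptable step size found")
```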
The Local Truncation Error for Euler-2step
In our description of Euler/Euler-2step above we simply stated the local truncation error without an explanation. In this section, we show how it may be derived. We note that very similar calculations underpin the other methods we have described.
In this section, we will be using partial derivatives and, in particular, the chain rule for functions of two variables. That material is covered in Chapter 2 of the CLP-3 text. If you are not yet comfortable with it, you can either take our word for those bits, or you can delay reading this section until you have learned a little multivariable calculus.
Recall that, by definition, the local truncation error for an algorithm is the (signed) error generated by a single step of the algorithm, under the assumptions that we start the step with the exact solution and that there is no roundoff error. (We should note that in serious big numerical computations, one really does have to take rounding errors into account because they can cause serious problems. The interested reader should search-engine their way to the story of Edward Lorenz's numerical simulations and the beginnings of chaos theory. Unfortunately we simply do not have space in this text to discuss all aspects of mathematics.) Denote by \(\phi(t)\) the exact solution to
\begin{align*} y'(t)&=f(t,y) \\ y(t_n)&=y_n \end{align*}
In other words, \(\phi(t)\) obeys
\begin{align*} \phi'(t) &= f\big(t,\phi(t)\big)\qquad\text{ for all }t \\ \phi(t_n)&=y_n \end{align*}
In particular \(\phi'(t_n)=f\big(t_n,\phi(t_n)\big)=f(t_n,y_n)\) and, carefully using the chain rule, which is (2.4.2) in the CLP-3 text,
\begin{align*} \phi''(t_n)&=\dfrac{d}{dt}f\big(t,\phi(t)\big)\Big|_{t=t_n} =\Big[f_t\big(t,\phi(t)\big)+f_y\big(t,\phi(t)\big)\phi'(t)\Big]_{t=t_n} \\ &=f_t(t_n,y_n)+f_y(t_n,y_n)\,f(t_n,y_n) \tag{E9} \end{align*}
Remember that \(f_t\) is the partial derivative of \(f\) with respect to \(t\text{,}\) and that \(f_y\) is the partial derivative of \(f\) with respect to \(y\text{.}\) We'll need (E9) below.
By definition, the local truncation error for Euler is
\[ E_1(h)=\phi(t_n+h)-y_n-hf\big(t_n,y_n\big) \nonumber \]
while that for Euler-2step is
\[ E_2(h)=\phi(t_n+h)-y_n-\tfrac{h}{2}f(t_n,y_n) -\tfrac{h}{2}f\big(t_n+\tfrac{h}{2},y_n+\tfrac{h}{2}f(t_n,y_n)\big) \nonumber \]
To understand how \(E_1(h)\) and \(E_2(h)\) behave for small \(h\) we can use Taylor expansions ((3.4.10) in the CLP-1 text) to write them as power series in \(h\text{.}\) To be precise, we use
\[ g(h)=g(0)+g'(0)\,h+\half g''(0)\,h^2+O(h^3) \nonumber \]
to expand both \(E_1(h)\) and \(E_2(h)\) in powers of \(h\) to order \(h^2\text{.}\) Note that, in the expression for \(E_1(h)\text{,}\) \(t_n\) and \(y_n\) are constants — they do not vary with \(h\text{.}\) So computing derivatives of \(E_1(h)\) with respect to \(h\) is actually quite simple.
\begin{alignat*}{2} E_1(h)&=\phi(t_n+h)-y_n-hf\big(t_n,y_n\big)\qquad & E_1(0)&=\phi(t_n)-y_n=0 \\ E'_1(h)&=\phi'(t_n+h)-f\big(t_n,y_n\big) & E'_1(0)&=\phi'(t_n)-f\big(t_n,y_n\big)=0 \\ E''_1(h)&=\phi''(t_n+h) & E''_1(0)&=\phi''(t_n) \end{alignat*}
By Taylor, the local truncation error for Euler obeys
\[ E_1(h)=\half\phi''(t_n)h^2+O(h^3)=Kh^2+O(h^3)\qquad\text{with } K=\half\phi''(t_n) \tag{D.3.3} \]
Computing derivatives of \(E_2(h)\) with respect to \(h\) is a little harder, since \(h\) now appears in the arguments of the function \(f\text{.}\) As a consequence, we have to include some partial derivatives.
\begin{align*} E_2(h)&=\phi(t_n+h)-y_n-\frac{h}{2}f(t_n,y_n) -\frac{h}{2}f\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big) \\ E'_2(h)&=\phi'(t_n+h)-\frac{1}{2}f(t_n,y_n) -\frac{1}{2}f\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big) \\ &\hskip2.5in-\frac{h}{2} \underbrace{\dfrac{d}{dh} f\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big)}_{\text{leave this expression as is for now}} \\ E''_2(h)&=\phi''(t_n+h) -2\times\frac{1}{2} \underbrace{\dfrac{d}{dh}f\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big)}_{\text{leave this one too}} \\ &\hskip2.5in-\frac{h}{2} \underbrace{\frac{\mathrm{d^2}}{\mathrm{d}h^2}f\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big)}_{\text{and leave this one too}} \end{align*}
Since we only need \(E_2(h)\) and its derivatives at \(h=0\text{,}\) we don't have to compute the \(\frac{\mathrm{d}^2}{\mathrm{d}h^2}f\) term (thankfully), and we also do not need to compute the \(\frac{\mathrm{d}}{\mathrm{d}h}f\) term in \(E_2'\text{.}\) We do, however, need \(\frac{\mathrm{d}}{\mathrm{d}h}f\Big|_{h=0}\) for \(E_2''(0)\text{.}\)
\begin{align*} E_2(0)&=\phi(t_n)-y_n=0 \\ E'_2(0)&=\phi'(t_n)-\frac{1}{2}f(t_n,y_n)-\frac{1}{2}f(t_n,y_n)=0 \\ E''_2(0)&=\phi''(t_n)-\dfrac{d}{dh} f\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big)\Big|_{h=0} \\ &=\phi''(t_n)- \frac{1}{2} f_t\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big)\Big|_{h=0} \\ &\hskip1.5in -\frac{1}{2} f(t_n,y_n)\, f_y\Big(t_n+\frac{h}{2},y_n+\frac{h}{2}f(t_n,y_n)\Big)\Big|_{h=0} \\ &=\phi''(t_n)- \frac{1}{2}f_t(t_n,y_n)-\frac{1}{2}f_y(t_n,y_n)\, f(t_n,y_n) \\ &=\half\phi''(t_n)\qquad\text{by (E9)} \end{align*}
By Taylor, the local truncation error for Euler-2step obeys
\[ E_2(h)=\frac{1}{4}\phi''(t_n)\,h^2+O(h^3) =\half Kh^2+O(h^3)\qquad\text{with } K=\half\phi''(t_n) \tag{D.3.4} \]
Observe that the \(K\) in (D.3.4) is identical to the \(K\) in (D.3.3). This is exactly what we needed in our analysis of Sections D.3.1 and D.3.2.
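As a sanity check on this derivation, here is a small sympy verification using the concrete right-hand side \(f(t,y)=t-2y\) (our choice; any smooth \(f\) would do), whose exact solution through \((t_n,y_n)\) is easy to write down. The series coefficients should come out in the ratio \(K\) to \(K/2\text{,}\) as claimed.

```python
import sympy as sp

t, h = sp.symbols("t h")
tn, yn = sp.Rational(1, 3), sp.Rational(3, 4)      # an arbitrary starting point

# Test problem y' = t - 2y; exact solution through (tn, yn):
f = lambda t_, y_: t_ - 2 * y_
C = yn - tn / 2 + sp.Rational(1, 4)
phi = t / 2 - sp.Rational(1, 4) + C * sp.exp(-2 * (t - tn))

# Local truncation errors for Euler and Euler-2step, as defined above.
E1 = phi.subs(t, tn + h) - yn - h * f(tn, yn)
y_mid = yn + h / 2 * f(tn, yn)
E2 = phi.subs(t, tn + h) - y_mid - h / 2 * f(tn + h / 2, y_mid)

K = sp.diff(phi, t, 2).subs(t, tn) / 2             # K = phi''(tn)/2
print(sp.series(E1, h, 0, 3))                      # K*h**2 + O(h**3)
print(sp.series(E2, h, 0, 3))                      # (K/2)*h**2 + O(h**3)
print(K)                                           # here K = 5/3
```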