1.1: Introduction
Solving ordinary differential equations
Solving ordinary differential equations (ODEs) is a major focus of numerical computing. We start by considering first-order ODEs of the form
\[\tag{eq:1.1} y' = f(t, y)\]
where \(y(t)\) is the unknown (i.e. the thing we want to find) and \(t\) is the independent variable. When dealing with such an ODE, we usually think of \(t\) as time and \(y\) as something that depends upon time, like the population of an animal species or the voltage of a node in an electronic circuit. Equation [eq:1.1] says that the way \(y\) changes in time is determined by some function \(f\). In this booklet, we assume the function \(f\) is smooth and well-behaved, and that it is easy to compute in a program.
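In a program, the right-hand side \(f(t, y)\) usually ends up as a plain function that takes \(t\) and \(y\) and returns the slope. Below is a minimal Python sketch of this idea; the function name `f` and the constant `a = -0.5` are illustrative assumptions, not anything fixed by the equation above.

```python
# A minimal sketch of the right-hand side f(t, y) of eq. 1.1 as an ordinary
# Python function. The constant a = -0.5 is an illustrative choice.
def f(t, y):
    a = -0.5
    return a * y

print(f(0.0, 1.0))   # -0.5: the slope y' at t = 0 when y = 1
```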
One may write down solutions to [eq:1.1], but to get a unique solution an additional piece of information is required: an initial condition. That is, we need to know the value of \(y(t)\) at one point in time so we can select one individual solution from all possible solutions which satisfy [eq:1.1]. One traditionally uses the value of \(y(t)\) at time \(t=0\) as the initial condition, though other choices are possible (if uncommon).
Here’s a very basic example. Consider the ODE \[\tag{eq:1.2} y' = a y\] You likely encountered this ODE early in your mathematical career, and you know the solution by heart: \[\tag{eq:1.3} y(t) = C e^{a t}\] where \(C\) is some constant. The solution [eq:1.3] obviously satisfies equation [eq:1.2], but it is not useful in practice as the solution to the equation. Why? The reason is that there are infinitely many possible solutions, depending upon which value of \(C\) is chosen. That is, [eq:1.3] defines a family of solutions, but in a real, practical problem we only want one of those solutions. For example, how would you make a plot of [eq:1.3]? The answer is, you can’t, until you choose a specific value of \(C\).
In a practical problem, you usually know the value of your function at some starting point in time, almost always \(t = 0\). Call that value \(y_0\). In this case, the specific solution which satisfies [eq:1.2] and also satisfies \(y(t=0) = y_0\) is \[\tag{eq:1.4} y(t) = y_0 e^{a t}\] The point is that numerically solving an ODE requires two pieces of information: 1. The ODE itself, and 2. An initial condition. The initial value picks one solution out of the entire family of solutions which satisfy [eq:1.2]. The concept is shown in Figure 1.1.
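To make the point concrete, here is a small Python sketch that evaluates the specific solution [eq:1.4] once an initial condition is chosen. The particular values of \(a\), \(y_0\), and the time grid are hypothetical, picked only so the snippet runs.

```python
import numpy as np

# Sketch: evaluate eq. 1.4, y(t) = y0 * exp(a*t), on a grid of times.
# The values a = -0.5 and y0 = 2.0 are illustrative assumptions.
a, y0 = -0.5, 2.0
t = np.linspace(0.0, 5.0, 11)   # times from t = 0 out to t_end = 5
y = y0 * np.exp(a * t)          # the one family member picked by y(0) = y0

print(y[0])                     # 2.0 -- the curve passes through the initial condition
```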
Such a problem is called an "initial value problem" (IVP) to distinguish it from other types of differential equations (which we will encounter later in the class). "Initial value problem" simply means we know the beginning (initial) value of \(y\), and want to find its behavior as we move forward in time. Once the equation [eq:1.1] and an initial condition for \(y\) are specified, then we have all the information required to find a unique solution \(y(t)\) numerically.
Methods to find numerical solutions of IVPs are the focus of this booklet. As a matter of nomenclature, we generally say "solve the ODE", or "solve the IVP". This means we compute the function \(y(t)\) starting from time \(t=0\) to some point in the future, \(t_{end}\). Another way to say the same thing is we "integrate the ODE", or "integrate the IVP". When speaking of ODEs, "integrate" means the same thing as "solve".
Sampled functions
In mathematics we become familiar with the notion of a function. Consider the simplest case of a function \(y(t)\) which takes a real scalar \(t\) and returns a real scalar value \(y(t)\). This is frequently written formally as \(y:\mathbb{R}\rightarrow \mathbb{R}\). This means, take any arbitrary real number \(t\), plug it into \(y\), and out will come a new real number \(y(t)\). The important idea to glean from this is that the function is like a machine into which one can input any \(t\) and expect to get an output. That is, the function is defined on every real number \(t\).
Unfortunately, when working on real data using a computer this nice abstraction doesn’t always hold. Rather, it is very common for us to have only certain, discrete points \(t_n\) where the function is defined. Digital audio is a simple example: the sounds we hear are related to the sound pressure waves moving our eardrums. The sound pressure has a value for every point in time, but when processing audio on a computer the sound pressure is sampled at regular, discrete times and turned into numbers representing the sound pressure at each sample time. This process is often referred to as "discretization", meaning that the continuously-defined \(y(t)\) is replaced by a set of discrete samples \(y_n\). This is depicted in Figure 1.2, which shows the relationship between a continuous-time function and its discrete-time (discretized) replacement.
It turns out that the ODE solvers we will study all work with sampled functions. That is, the solvers compute the solution \(y(t)\) at a sequence of discrete time values \(t_n\), similar to the situation shown in Figure 1.2. The output of the solver will be a vector of discrete values \(y_n\) representing samples of the actual, underlying continuous function \(y(t)\).
Here is an implementation hint when you write programs processing sampled functions: The sampled function itself is generally represented by a vector, \(y_n\). This is the object your algorithm will use during its work. On top of \(y_n\) I recommend also carrying around a vector representing the sample times, \(t_n\), in your program. Having your sample times readily accessible can help decrease confusion when you want to plot \(y_n\) vs. time, for example. For moderately sized vectors, the extra memory required to hold \(t_n\) is a small price to pay to keep confusion at a minimum while writing your code.
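As an illustration of this hint, here is a small Python sketch that stores a sampled function as the pair of vectors \(t_n\) and \(y_n\). The sine wave and the sample spacing of 0.1 are illustrative choices, not anything prescribed above.

```python
import numpy as np

# Sketch of the hint above: carry the sample times t_n alongside the samples y_n.
# The sampled function (a sine sampled every 0.1 time units) is illustrative.
h = 0.1
tn = np.arange(0.0, 2.0 + h, h)     # vector of sample times t_n
yn = np.sin(2 * np.pi * tn)         # vector of samples y_n

# With tn at hand, plotting y_n against time is a one-liner, e.g.:
# import matplotlib.pyplot as plt; plt.plot(tn, yn, "o-"); plt.show()
```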
Differentiation on the computer
Next, before diving into solving [eq:1.1], we need to review how differentiation is done on the computer. To start, recall the definition of the derivative of \(y(t)\) which you probably learned in high school: \[\tag{eq:1.5} \frac{d y}{d t} = \lim_{{\Delta t}\to 0} \frac{y(t + \Delta t) - y(t)}{\Delta t}\] In doing numerical computation, we can’t represent the limiting transition \(\lim_{{\Delta t}\to 0}\) – our data is usually discrete (e.g. a sampled function). More importantly, floating point numbers don’t smoothly transition to zero – there is no "infinitesimally small but nonzero" number representable on a computer. Instead, we use the definition [eq:1.5] to form an approximation to the derivative, \[\tag{eq:1.6} \frac{d y}{d t} \approx \frac{y(t + h) - y(t)}{h}\] where \(h\) is a small number. This is called the "forward difference" approximation of the derivative.
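A quick way to see [eq:1.6] in action is to apply it to a function whose derivative is already known. The Python sketch below uses \(\sin(t)\) as an assumed test function; the point \(t = 1\) and step \(h = 10^{-4}\) are illustrative.

```python
import numpy as np

# Sketch of the forward-difference approximation (eq. 1.6) applied to a
# function whose derivative we know. The test function sin(t) is illustrative.
def forward_difference(y, t, h):
    return (y(t + h) - y(t)) / h

t, h = 1.0, 1e-4
approx = forward_difference(np.sin, t, h)
exact = np.cos(t)                            # d/dt sin(t) = cos(t)
print(approx, exact, abs(approx - exact))    # the error is roughly of size h
```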
First derivatives from Taylor’s series
It’s easy to write down the approximation [eq:1.6] by remembering the high school definition of a derivative. However, we gain some appreciation for the accuracy of approximating derivatives by deriving them using Taylor series expansions. Suppose we know the value of a function \(y\) at the time \(t\) and want to know its value at a later time \(t+h\). How to find \(y(t+h)\)? The Taylor’s series expansion says we can find the value at the later time by \[\tag{eq:1.7} y(t+h) = y(t) + h \frac{d y}{d t} \biggr\rvert_{t} + \frac{h^2}{2} \frac{d^2 y}{d t^2} \biggr\rvert_{t} + \frac{h^3}{6} \frac{d^3 y}{d t^3} \biggr\rvert_{t} + O(h^4)\] The notation \(O(h^4)\) means that there are terms of power \(h^4\) and above in the sum, but we will consider them details which we can ignore.
We can rearrange [eq:1.7] to put the first derivative on the LHS, yielding \[\nonumber \frac{d y}{d t} \biggr\rvert_{t} = \frac{y(t+h) - y(t)}{h} - \frac{h}{2} \frac{d^2 y}{d t^2} \biggr\rvert_{t} - \frac{h^2}{6} \frac{d^3 y}{d t^3} \biggr\rvert_{t} - O(h^3)\] Now we assume \(h\) is very small, so we can throw away the terms of order \(h\) and above on the RHS. Equivalently, truncating [eq:1.7] after the first-order term gives \[\nonumber y(t+h) \approx y(t) + h \frac{d y}{d t} \biggr\rvert_{t}\] which can be rearranged to give \[\tag{eq:1.8} \frac{d y}{d t} \approx \frac{y(t + h) - y(t)}{h}\] i.e. the same forward difference expression as found in [eq:1.6] earlier. However, note that to get this approximation we threw away all terms of order \(h\) and above. This implies that when we use this approximation we incur an error! That error will be on the order of \(h\). In the limit \(h \rightarrow 0\) the error disappears, but since computers can only use a finite \(h\), there will always be an error in our computations, and the error is proportional to \(h\). We express this concept by saying the error scales as \(O(h)\).
Using Taylor’s series expansions we can discover other results. Consider sitting at the time point \(t\) and asking for the value of \(y\) at time \(t - h\) in the past. In this case, we can write an expression for the past value of \(y\) as \[\tag{eq:1.9} y(t-h) = y(t) - h \frac{d y}{d t} \biggr\rvert_{t} + \frac{h^2}{2} \frac{d^2 y}{d t^2} \biggr\rvert_{t} - \frac{h^3}{6} \frac{d^3 y}{d t^3} \biggr\rvert_{t} + O(h^4)\] Note the negative signs on the odd-order terms. Similar to the forward difference expression above, it’s easy to rearrange this expression and then throw away terms of \(O(h)\) and above to get a different approximation for the derivative, \[\tag{eq:1.10} \frac{d y}{d t} \approx \frac{y(t) - y(t-h)}{h}\] This is called the "backward difference" approximation to the derivative. Since we threw away terms of \(O(h)\) and above, this expression for the derivative also carries an error penalty of \(O(h)\).
Now consider forming the difference between the Taylor series [eq:1.7] and [eq:1.9]. When we subtract one from the other we get \[\tag{eq:TaylorsExpansionDifference} \nonumber y(t+h) - y(t-h) = 2 h \frac{d y}{d t} \biggr\rvert_{t} + 2 \frac{h^3}{6} \frac{d^3 y}{d t^3} \biggr\rvert_{t} + O(h^5)\] This expression may be rearranged to become \[\nonumber \frac{d y}{d t} \biggr\rvert_{t} = \frac{y(t+h) - y(t-h)}{2 h} - \frac{h^2}{6} \frac{d^3 y}{d t^3} \biggr\rvert_{t} + O(h^4)\] Now if we throw away terms of \(O(h^2)\) and above, we get the so-called "central difference" approximation for the derivative, \[\tag{eq:1.11} \frac{d y}{d t} \biggr\rvert_{t} \approx \frac{y(t+h) - y(t-h)}{2 h}\] Note that to get this expression we discarded \(O(h^2)\) terms. This implies that the error incurred upon using this expression for a derivative is \(O(h^2)\). Since we usually have \(h \ll 1\), the central difference error \(O(h^2)\) is smaller than the \(O(h)\) incurred when using either the forward or backward difference formulas. That is, the central difference expression [eq:1.11] provides a more accurate approximation to the derivative than either the forward or the backward difference formulas, and should be your preferred derivative approximation when you can use it.
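The different error orders are easy to observe numerically: shrink \(h\) and watch how fast each approximation converges. Here is a hedged Python sketch using \(\sin(t)\) at \(t = 1\) as an assumed test case.

```python
import numpy as np

# Sketch: watch the forward-difference error shrink like O(h) and the
# central-difference error shrink like O(h^2). The test point t = 1 and the
# test function sin(t) are illustrative choices.
t, exact = 1.0, np.cos(1.0)
for h in (1e-1, 1e-2, 1e-3):
    fwd = (np.sin(t + h) - np.sin(t)) / h            # eq. 1.8
    ctr = (np.sin(t + h) - np.sin(t - h)) / (2 * h)  # eq. 1.11
    print(f"h={h:g}   forward error={abs(fwd - exact):.2e}   central error={abs(ctr - exact):.2e}")
# Shrinking h by 10x cuts the forward error by ~10x but the central error by ~100x.
```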
In each of these derivations we threw away terms of some order \(p\) (and above). Discarding terms of order \(p\) and above means we no longer have a complete Taylor series. This operation is called "truncation", meaning we have truncated (chopped off) the remaining terms of the full Taylor series. The error incurred by this operation is therefore called "truncation error".
Second derivatives from Taylor’s series
We can also get an approximation for the second derivative from expressions [eq:1.7] and [eq:1.9]. This time, form the sum of the two expressions to get \[\nonumber \label{eq:TaylorsExpansionSum} y(t+h) + y(t-h) = 2 y(t) + 2 \frac{h^2}{2} \frac{d^2 y}{d t^2} \biggr\rvert_{t} +O(h^4)\] This may be rearranged to get an expression for the second derivative, \[\nonumber \frac{d^2 y}{d t^2} \biggr\rvert_{t} = \frac{y(t+h) -2 y(t) + y(t-h)} {h^2} -O(h^2)\] If we drop the \(O(h^2)\) term, we get the approximation to the second derivative, \[\tag{eq:1.12} \frac{d^2 y}{d t^2} \biggr\rvert_{t} \approx \frac{y(t+h) -2 y(t) + y(t-h)} {h^2}\] Since we dropped an \(O(h^2)\) term to make this approximation, using [eq:1.12] incurs an error of \(O(h^2)\).
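A short Python sketch of [eq:1.12], again using \(\sin(t)\) as an assumed test function (its second derivative is \(-\sin(t)\)); the point and step size are illustrative.

```python
import numpy as np

# Sketch of the second-derivative approximation (eq. 1.12). The test function
# sin(t), whose second derivative is -sin(t), is an illustrative choice.
def second_difference(y, t, h):
    return (y(t + h) - 2 * y(t) + y(t - h)) / h**2

t, h = 1.0, 1e-3
approx = second_difference(np.sin, t, h)
exact = -np.sin(t)
print(abs(approx - exact))     # roughly of size h**2
```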
Chapter summary
Here are the important points made in this chapter:
- Solving an ODE numerically needs two pieces of information: The ODE and an initial condition. Such problems are called "initial value problems", or IVPs.
- As mathematicians, we are used to functions \(y(t)\) which accept any continuous value of \(t\) and emit a value. However, when dealing with computers, we must frequently use "sampled functions" in which the continuous \(y(t)\) is replaced with samples \(y_n\) taken at regular, discrete points in time, \(t_n\). The ODE solvers presented in this booklet fall into this category – they provide samples of the solution at discrete times, similar to snapshots of the continuous function they try to approximate.
- The table below summarizes the four different approximations for derivatives we have encountered so far. There are many more derivative approximations of higher order, built using more terms. The higher order approximations give better accuracy at the cost of additional complexity, and are used less frequently than the four shown here.
| Derivative | Method name | Expression | Error order |
| --- | --- | --- | --- |
| \(\frac{d y}{d t}\) | Forward difference | \(\frac{y_{n+1}-y_n}{h}\) | \(h\) |
| \(\frac{d y}{d t}\) | Backward difference | \(\frac{y_{n}-y_{n-1}}{h}\) | \(h\) |
| \(\frac{d y}{d t}\) | Central difference | \(\frac{y_{n+1}-y_{n-1}}{2 h}\) | \(h^2\) |
| \(\frac{d^2 y}{d t^2}\) | Second derivative | \(\frac{y_{n+1}-2 y_{n}+y_{n-1}}{h^2}\) | \(h^2\) |
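For reference, here is a hedged Python sketch implementing the table's four formulas on a sampled vector \(y_n\) with uniform spacing \(h\); the grid, the test function \(e^t\), and the chosen index are illustrative assumptions.

```python
import numpy as np

# Sketch: the four difference formulas from the table, applied to a sampled
# function y_n on a uniform grid with spacing h. The index n must be an
# interior point. The sampled function exp(t) is illustrative.
def forward_diff(y, n, h):  return (y[n + 1] - y[n]) / h                    # error O(h)
def backward_diff(y, n, h): return (y[n] - y[n - 1]) / h                    # error O(h)
def central_diff(y, n, h):  return (y[n + 1] - y[n - 1]) / (2 * h)          # error O(h^2)
def second_diff(y, n, h):   return (y[n + 1] - 2 * y[n] + y[n - 1]) / h**2  # error O(h^2)

h = 0.01
tn = np.arange(0.0, 1.0 + h, h)
yn = np.exp(tn)                                # all derivatives of exp(t) equal exp(t)
n = 50
print(central_diff(yn, n, h), np.exp(tn[n]))   # should agree to about h^2
```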