4.2: Fitting to a Linear Combination of Functions
Consider the general fitting function
\[y(x)=\sum_{j=1}^{m} c_{j} f_{j}(x) \nonumber \]
where we assume \(m\) functions \(f_{j}(x)\). For example, if we want to fit a cubic polynomial to the data, then we would have \(m=4\) and take \(f_{1}=1, f_{2}=x, f_{3}=x^{2}\) and \(f_{4}=x^{3}\). Typically, the number of functions \(f_{j}\) is less than the number of data points; that is, \(m<n\), so that a direct attempt to solve for the \(c_{j}\)'s such that the fitting function exactly passes through the \(n\) data points would result in \(n\) equations and \(m\) unknowns. This would be an over-determined linear system that in general has no solution.
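Written out for the cubic example, the fitting function is
\[y(x)=c_{1}+c_{2} x+c_{3} x^{2}+c_{4} x^{3}. \nonumber \]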
We now define the vectors
\[\mathbf{y}=\left(\begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{array}\right), \quad \mathbf{c}=\left(\begin{array}{c} c_{1} \\ c_{2} \\ \vdots \\ c_{m} \end{array}\right) \nonumber \]
and the \(n\)-by-\(m\) matrix
\[\mathrm{A}=\left(\begin{array}{cccc} f_{1}\left(x_{1}\right) & f_{2}\left(x_{1}\right) & \cdots & f_{m}\left(x_{1}\right) \\ f_{1}\left(x_{2}\right) & f_{2}\left(x_{2}\right) & \cdots & f_{m}\left(x_{2}\right) \\ \vdots & \vdots & \ddots & \vdots \\ f_{1}\left(x_{n}\right) & f_{2}\left(x_{n}\right) & \cdots & f_{m}\left(x_{n}\right) \end{array}\right) \nonumber \]
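In MATLAB, for instance, the matrix \(\mathrm{A}\) for the cubic example can be assembled in a single line. The sketch below assumes the data abscissas are stored in a column vector x of length \(n\):

% design matrix for f1 = 1, f2 = x, f3 = x^2, f4 = x^3,
% assuming x is a column vector of the n data points x_i
A = [ones(size(x)), x, x.^2, x.^3];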
With \(m<n\), the equation \(A \mathbf{c}=\mathbf{y}\) is over-determined. We let
\[\mathbf{r}=\mathbf{y}-\mathrm{A} \mathbf{c} \nonumber \]
be the residual vector, and let
\[\rho=\sum_{i=1}^{n} r_{i}^{2} \nonumber \]
be the sum of the squares of the residuals.
The method of least squares minimizes \(\rho\) with respect to the components of \(\mathbf{c}\). Now, using \(T\) to signify the transpose of a matrix, we have
\[\begin{aligned} \rho &=\mathbf{r}^{T} \mathbf{r} \\ &=(\mathbf{y}-\mathrm{A} \mathbf{c})^{T}(\mathbf{y}-\mathrm{A} \mathbf{c}) \\ &=\mathbf{y}^{T} \mathbf{y}-\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}-\mathbf{y}^{T} \mathrm{~A} \mathbf{c}+\mathbf{c}^{T} \mathrm{~A}^{T} \mathrm{~A} \mathbf{c} . \end{aligned} \nonumber \]
Since \(\rho\) is a scalar, each term in the above expression must be a scalar, and since the transpose of a scalar is equal to the scalar, we have
\[\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}=\left(\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}\right)^{T}=\mathbf{y}^{T} \mathrm{~A} \mathbf{c} . \nonumber \]
Therefore,
\[\rho=\mathbf{y}^{T} \mathbf{y}-2 \mathbf{y}^{T} \mathrm{~A} \mathbf{c}+\mathbf{c}^{T} \mathrm{~A}^{T} \mathrm{~A} \mathbf{c} . \nonumber \]
To find the minimum of \(\rho\), we will need to solve \(\partial \rho / \partial c_{j}=0\) for \(j=1, \ldots, m\). To take the derivative of \(\rho\), we switch to a tensor notation, using the Einstein summation convention, where repeated indices are summed over their allowable range; for example, \(y_{i} y_{i}\) stands for \(\sum_{i=1}^{n} y_{i}^{2}\). We can write
\[\rho=y_{i} y_{i}-2 y_{i} \mathrm{~A}_{i k} c_{k}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} c_{l} \nonumber \]
Taking the partial derivative, we have
\[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i k} \frac{\partial c_{k}}{\partial c_{j}}+\frac{\partial c_{i}}{\partial c_{j}} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} c_{l}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} \frac{\partial c_{l}}{\partial c_{j}} \nonumber \]
Now,
\[\frac{\partial c_{i}}{\partial c_{j}}= \begin{cases}1, & \text { if } i=j \\ 0, & \text { otherwise }\end{cases} \nonumber \]
Therefore,
\[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i j}+\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k j} \nonumber \]
Now,
\[\begin{aligned} c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k j} &=c_{i} \mathrm{~A}_{k i} \mathrm{~A}_{k j} \\ &=\mathrm{A}_{k j} \mathrm{~A}_{k i} c_{i} \\ &=\mathrm{A}_{j k}^{T} \mathrm{~A}_{k i} c_{i} \\ &=\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l} \end{aligned} \nonumber \]
Therefore,
\[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i j}+2 \mathrm{~A}_{j k}^{T} \mathrm{~A}_{k l} c_{l} \nonumber \]
With the partials set equal to zero, we have
\[\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}=y_{i} \mathrm{~A}_{i j} \nonumber \]
or
\[\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}=\mathrm{A}_{j i}^{T} y_{i} \nonumber \]
In vector notation, we have
\[\mathrm{A}^{T} \mathrm{~A} \mathbf{c}=\mathrm{A}^{T} \mathbf{y} . \nonumber \]
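For example, for the straight-line fit with \(m=2\), \(f_{1}=1\) and \(f_{2}=x\), this system reduces to the familiar two-by-two system
\[\left(\begin{array}{cc} n & \sum_{i=1}^{n} x_{i} \\ \sum_{i=1}^{n} x_{i} & \sum_{i=1}^{n} x_{i}^{2} \end{array}\right)\left(\begin{array}{l} c_{1} \\ c_{2} \end{array}\right)=\left(\begin{array}{c} \sum_{i=1}^{n} y_{i} \\ \sum_{i=1}^{n} x_{i} y_{i} \end{array}\right). \nonumber \]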
The equation \(\mathrm{A}^{T} \mathrm{A} \mathbf{c}=\mathrm{A}^{T} \mathbf{y}\) is the so-called normal equation, and can be solved for \(\mathbf{c}\) by Gaussian elimination using the MATLAB backslash operator. After constructing the matrix \(\mathrm{A}\) and the vector \(\mathbf{y}\) from the data, one can code in MATLAB
c = (A'*A)\(A'*y);
But in fact the MATLAB backslash operator will automatically compute the least-squares solution when the matrix \(\mathrm{A}\) is not square, so that the MATLAB code
c = A\y;
yields the same result.
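As an end-to-end illustration (the data and variable names below are invented for this sketch, not taken from the text), one might fit a cubic to noisy samples and check that the two MATLAB solution methods agree:

% generate some illustrative data: noisy samples of a cubic
n = 50;
x = linspace(0, 2, n)';                    % column vector of abscissas
y = 1 - 2*x + 0.5*x.^3 + 0.1*randn(n, 1);  % ordinates with added noise

% design matrix for f1 = 1, f2 = x, f3 = x^2, f4 = x^3
A = [ones(n, 1), x, x.^2, x.^3];

% solve the normal equations directly ...
c_normal = (A'*A)\(A'*y);

% ... or let backslash compute the least-squares solution
c_backslash = A\y;

% the two coefficient vectors should agree to within rounding error
disp([c_normal, c_backslash])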