4.2: Fitting to a Linear Combination of Functions
Consider the general fitting function
\[y(x)=\sum_{j=1}^{m} c_{j} f_{j}(x) \nonumber \]
where we assume \(m\) functions \(f_{j}(x)\). For example, if we want to fit a cubic polynomial to the data, then we would have \(m=4\) and take \(f_{1}=1\), \(f_{2}=x\), \(f_{3}=x^{2}\) and \(f_{4}=x^{3}\). Typically, the number of functions \(f_{j}\) is less than the number of data points; that is, \(m<n\), so that a direct attempt to solve for the \(c_{j}\)'s such that the fitting function passes exactly through the \(n\) data points would result in \(n\) equations and only \(m\) unknowns. This would be an over-determined linear system that in general has no solution.
We now define the vectors
\[\mathbf{y}=\left(\begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{array}\right), \quad \mathbf{c}=\left(\begin{array}{c} c_{1} \\ c_{2} \\ \vdots \\ c_{m} \end{array}\right) \nonumber \]
and the \(n\)-by-\(m\) matrix
\[\mathrm{A}=\left(\begin{array}{cccc} f_{1}\left(x_{1}\right) & f_{2}\left(x_{1}\right) & \cdots & f_{m}\left(x_{1}\right) \\ f_{1}\left(x_{2}\right) & f_{2}\left(x_{2}\right) & \cdots & f_{m}\left(x_{2}\right) \\ \vdots & \vdots & \ddots & \vdots \\ f_{1}\left(x_{n}\right) & f_{2}\left(x_{n}\right) & \cdots & f_{m}\left(x_{n}\right) \end{array}\right) \nonumber \]
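As a concrete illustration, the matrix \(\mathrm{A}\) for the cubic-polynomial example above can be assembled column by column, \(\mathrm{A}_{ij}=f_j(x_i)\). This is a NumPy sketch (the chapter's own code is MATLAB), and the data values are made up purely for illustration:

```python
import numpy as np

# Hypothetical sample data (x_i, y_i); any n > m points would do.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.0, 1.6, 2.9, 5.1, 8.2, 12.4])

# Basis functions f_1 = 1, f_2 = x, f_3 = x^2, f_4 = x^3 (m = 4).
basis = [lambda t: np.ones_like(t), lambda t: t,
         lambda t: t**2, lambda t: t**3]

# A[i, j] = f_j(x_i): each column evaluates one basis function at all x_i.
A = np.column_stack([f(x) for f in basis])
print(A.shape)  # (6, 4): n-by-m with n = 6 data points, m = 4 functions
```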
With \(m<n\) , the equation \(A \mathbf{c}=\mathbf{y}\) is over-determined. We let
\[\mathbf{r}=\mathbf{y}-\mathrm{A} \mathbf{c} \nonumber \]
be the residual vector, and let
\[\rho=\sum_{i=1}^{n} r_{i}^{2} \nonumber \]
The method of least squares minimizes \(\rho\) with respect to the components of \(\mathbf{c}\). Now, using \(T\) to signify the transpose of a matrix, we have
\[\begin{aligned} \rho &=\mathbf{r}^{T} \mathbf{r} \\ &=(\mathbf{y}-\mathrm{A} \mathbf{c})^{T}(\mathbf{y}-\mathrm{A} \mathbf{c}) \\ &=\mathbf{y}^{T} \mathbf{y}-\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}-\mathbf{y}^{T} \mathrm{~A} \mathbf{c}+\mathbf{c}^{T} \mathrm{~A}^{T} \mathrm{~A} \mathbf{c} . \end{aligned} \nonumber \]
Since \(\rho\) is a scalar, each term in the above expression must be a scalar, and since the transpose of a scalar is equal to the scalar, we have
\[\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}=\left(\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}\right)^{T}=\mathbf{y}^{T} \mathrm{~A} \mathbf{c} . \nonumber \]
Therefore,
\[\rho=\mathbf{y}^{T} \mathbf{y}-2 \mathbf{y}^{T} \mathrm{~A} \mathbf{c}+\mathbf{c}^{T} \mathrm{~A}^{T} \mathrm{~A} \mathbf{c} . \nonumber \]
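The expanded quadratic form can be verified numerically: both expressions for \(\rho\) below agree for arbitrary \(\mathrm{A}\), \(\mathbf{y}\), and \(\mathbf{c}\). A NumPy sketch with randomly generated values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # arbitrary n-by-m matrix, n > m
y = rng.standard_normal(6)
c = rng.standard_normal(4)

# rho computed directly from the residual vector r = y - A c
r = y - A @ c
rho_direct = r @ r

# rho from the expanded form y^T y - 2 y^T A c + c^T A^T A c
rho_expanded = y @ y - 2 * y @ (A @ c) + c @ (A.T @ A @ c)

print(np.isclose(rho_direct, rho_expanded))  # True
```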
To find the minimum of \(\rho\) , we will need to solve \(\partial \rho / \partial c_{j}=0\) for \(j=1, \ldots, m\) . To take the derivative of \(\rho\) , we switch to a tensor notation, using the Einstein summation convention, where repeated indices are summed over their allowable range. We can write
\[\rho=y_{i} y_{i}-2 y_{i} \mathrm{~A}_{i k} c_{k}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} c_{l} \nonumber \]
Taking the partial derivative, we have
\[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i k} \frac{\partial c_{k}}{\partial c_{j}}+\frac{\partial c_{i}}{\partial c_{j}} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} c_{l}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} \frac{\partial c_{l}}{\partial c_{j}} \nonumber \]
Now,
\[\frac{\partial c_{i}}{\partial c_{j}}= \begin{cases}1, & \text { if } i=j \\ 0, & \text { otherwise }\end{cases} \nonumber \]
Therefore,
\[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i j}+\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k j} \nonumber \]
Now,
\[\begin{aligned} c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k j} &=c_{i} \mathrm{~A}_{k i} \mathrm{~A}_{k j} \\ &=\mathrm{A}_{k j} \mathrm{~A}_{k i} c_{i} \\ &=\mathrm{A}_{j k}^{T} \mathrm{~A}_{k i} c_{i} \\ &=\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l} \end{aligned} \nonumber \]
Therefore,
\[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i j}+2 \mathrm{~A}_{j k}^{T} \mathrm{~A}_{k l} c_{l} \nonumber \]
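In matrix form, the gradient just derived reads \(\nabla_{\mathbf{c}}\,\rho = -2\mathrm{A}^T\mathbf{y} + 2\mathrm{A}^T\mathrm{A}\mathbf{c}\), which can be checked against a finite-difference approximation. A sketch with arbitrary made-up \(\mathrm{A}\), \(\mathbf{y}\), and \(\mathbf{c}\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
y = rng.standard_normal(8)
c = rng.standard_normal(3)

# Analytic gradient from the derivation: d(rho)/dc = -2 A^T y + 2 A^T A c
grad = -2 * A.T @ y + 2 * A.T @ (A @ c)

def rho(c):
    r = y - A @ c
    return r @ r

# Central finite differences, one component at a time
eps = 1e-6
fd = np.array([(rho(c + eps * np.eye(3)[j]) - rho(c - eps * np.eye(3)[j]))
               / (2 * eps) for j in range(3)])

print(np.allclose(grad, fd, atol=1e-5))  # True
```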
With the partials set equal to zero, we have
\[\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}=y_{i} \mathrm{~A}_{i j} \nonumber \]
or
\[\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}=\mathrm{A}_{j i}^{T} y_{i} \nonumber \]
In vector notation, we have
\[\mathrm{A}^{T} \mathrm{~A} \mathbf{c}=\mathrm{A}^{T} \mathbf{y} . \nonumber \]
This is the so-called normal equation, and it can be solved for \(\mathbf{c}\) by Gaussian elimination using the MATLAB backslash operator. After constructing the matrix \(\mathrm{A}\) defined above, and the vector \(\mathbf{y}\) from the data, one can code in MATLAB
\[\texttt{c = (A'*A)\textbackslash(A'*y);} \nonumber \]
But in fact the MATLAB backslash operator will automatically solve the least-squares problem when the matrix \(\mathrm{A}\) is not square, so that the MATLAB code
\[\texttt{c = A\textbackslash y;} \nonumber \]
yields the same result.
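The same comparison can be made in NumPy, where \texttt{np.linalg.lstsq} plays the role of the backslash operator for a non-square \(\mathrm{A}\) and agrees with the normal-equation solve. A sketch with made-up near-cubic data:

```python
import numpy as np

# Hypothetical data and cubic design matrix (columns 1, x, x^2, x^3).
x = np.linspace(0.0, 2.0, 9)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.01 * np.sin(5 * x)
A = np.vander(x, 4, increasing=True)

# Solve the normal equations A^T A c = A^T y by Gaussian elimination ...
c_normal = np.linalg.solve(A.T @ A, A.T @ y)

# ... or let the library solve the least-squares problem directly,
# the analog of MATLAB's c = A \ y.
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(c_normal, c_lstsq))  # True
```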