
4.2: Fitting to a Linear Combination of Functions


    Consider the general fitting function

    \[y(x)=\sum_{j=1}^{m} c_{j} f_{j}(x) \nonumber \]

    where we assume \(m\) functions \(f_{j}(x)\). For example, if we want to fit a cubic polynomial to the data, then we would have \(m=4\) and take \(f_{1}=1, f_{2}=x, f_{3}=x^{2}\) and \(f_{4}=x^{3}\). Typically, the number of functions \(f_{j}\) is less than the number of data points; that is, \(m<n\), so that a direct attempt to solve for the \(c_{j}\)'s such that the fitting function passes exactly through the \(n\) data points would result in \(n\) equations and \(m\) unknowns. This would be an over-determined linear system that in general has no solution.
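    To make the cubic example concrete, the following MATLAB sketch evaluates \(y(x)=c_{1}+c_{2} x+c_{3} x^{2}+c_{4} x^{3}\) at a few points; the coefficient and abscissa values are hypothetical, chosen only for illustration.

        % Evaluate the fitting function y(x) = c1 + c2*x + c3*x^2 + c4*x^3
        c = [1; -2; 0.5; 0.1];          % hypothetical coefficients c_1, ..., c_4
        x = linspace(0, 2, 5)';         % hypothetical evaluation points
        y = c(1) + c(2)*x + c(3)*x.^2 + c(4)*x.^3;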

    We now define the vectors

    \[\mathbf{y}=\left(\begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{array}\right), \quad \mathbf{c}=\left(\begin{array}{c} c_{1} \\ c_{2} \\ \vdots \\ c_{m} \end{array}\right) \nonumber \]

    and the \(n\)-by-\(m\) matrix

    \[\mathrm{A}=\left(\begin{array}{cccc} f_{1}\left(x_{1}\right) & f_{2}\left(x_{1}\right) & \cdots & f_{m}\left(x_{1}\right) \\ f_{1}\left(x_{2}\right) & f_{2}\left(x_{2}\right) & \cdots & f_{m}\left(x_{2}\right) \\ \vdots & \vdots & \ddots & \vdots \\ f_{1}\left(x_{n}\right) & f_{2}\left(x_{n}\right) & \cdots & f_{m}\left(x_{n}\right) \end{array}\right) \nonumber \]
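    For the cubic example, the matrix \(\mathrm{A}\) has one column per basis function. A minimal MATLAB sketch, with a hypothetical data vector x, is

        % Assemble the n-by-m matrix A, one column per basis function
        x = [0.0; 0.5; 1.0; 1.5; 2.0; 2.5];    % hypothetical data abscissas, n = 6
        A = [ones(size(x)), x, x.^2, x.^3];    % m = 4 columns: 1, x, x^2, x^3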

    With \(m<n\), the equation \(A \mathbf{c}=\mathbf{y}\) is over-determined. We let

    \[\mathbf{r}=\mathbf{y}-\mathrm{A} \mathbf{c} \nonumber \]

    be the residual vector, and let

    \[\rho=\sum_{i=1}^{n} r_{i}^{2} \nonumber \]
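    For any trial coefficient vector \(\mathbf{c}\), the residual vector and the sum of squared residuals can be computed directly. The sketch below assumes hypothetical data values:

        % Residual vector and sum of squared residuals for a trial c
        x = [0.0; 0.5; 1.0; 1.5; 2.0; 2.5];    % hypothetical data
        y = [1.1; 0.4; -0.2; 0.3; 1.8; 4.9];
        A = [ones(size(x)), x, x.^2, x.^3];
        c = [1; -2; 0.5; 0.1];                 % hypothetical trial coefficients
        r = y - A*c;                           % residual vector
        rho = r'*r;                            % equivalently sum(r.^2)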

    The method of least squares minimizes \(\rho\) with respect to the components of \(\mathbf{c}\). Now, using \(T\) to signify the transpose of a matrix, we have

    \[\begin{aligned} \rho &=\mathbf{r}^{T} \mathbf{r} \\ &=(\mathbf{y}-\mathrm{A} \mathbf{c})^{T}(\mathbf{y}-\mathrm{A} \mathbf{c}) \\ &=\mathbf{y}^{T} \mathbf{y}-\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}-\mathbf{y}^{T} \mathrm{~A} \mathbf{c}+\mathbf{c}^{T} \mathrm{~A}^{T} \mathrm{~A} \mathbf{c} . \end{aligned} \nonumber \]

    Since \(\rho\) is a scalar, each term in the above expression must be a scalar, and since the transpose of a scalar is equal to the scalar, we have

    \[\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}=\left(\mathbf{c}^{T} \mathrm{~A}^{T} \mathbf{y}\right)^{T}=\mathbf{y}^{T} \mathrm{~A} \mathbf{c} . \nonumber \]

    Therefore,

    \[\rho=\mathbf{y}^{T} \mathbf{y}-2 \mathbf{y}^{T} \mathrm{~A} \mathbf{c}+\mathbf{c}^{T} \mathrm{~A}^{T} \mathrm{~A} \mathbf{c} . \nonumber \]

    To find the minimum of \(\rho\), we will need to solve \(\partial \rho / \partial c_{j}=0\) for \(j=1, \ldots, m\). To take the derivative of \(\rho\), we switch to a tensor notation, using the Einstein summation convention, where repeated indices are summed over their allowable range. We can write

    \[\rho=y_{i} y_{i}-2 y_{i} \mathrm{~A}_{i k} c_{k}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} c_{l} \nonumber \]

    Taking the partial derivative, we have

    \[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i k} \frac{\partial c_{k}}{\partial c_{j}}+\frac{\partial c_{i}}{\partial c_{j}} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} c_{l}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k l} \frac{\partial c_{l}}{\partial c_{j}} \nonumber \]

    Now,

    \[\frac{\partial c_{i}}{\partial c_{j}}= \begin{cases}1, & \text { if } i=j \\ 0, & \text { otherwise }\end{cases} \nonumber \]

    Therefore,

    \[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i j}+\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}+c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k j} \nonumber \]

    Now,

    \[\begin{aligned} c_{i} \mathrm{~A}_{i k}^{T} \mathrm{~A}_{k j} &=c_{i} \mathrm{~A}_{k i} \mathrm{~A}_{k j} \\ &=\mathrm{A}_{k j} \mathrm{~A}_{k i} c_{i} \\ &=\mathrm{A}_{j k}^{T} \mathrm{~A}_{k i} c_{i} \\ &=\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l} \end{aligned} \nonumber \]

    Therefore,

    \[\frac{\partial \rho}{\partial c_{j}}=-2 y_{i} \mathrm{~A}_{i j}+2 \mathrm{~A}_{j k}^{T} \mathrm{~A}_{k l} c_{l} \nonumber \]
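    As a check on the algebra, the gradient in vector form is \(\nabla_{\mathbf{c}} \rho=-2 \mathrm{A}^{T} \mathbf{y}+2 \mathrm{A}^{T} \mathrm{A} \mathbf{c}\), which can be compared against a finite-difference approximation of \(\partial \rho / \partial c_{j}\). This sketch uses hypothetical data and an arbitrary test point \(\mathbf{c}\):

        % Compare the analytic gradient of rho with central finite differences
        x = [0.0; 0.5; 1.0; 1.5; 2.0; 2.5];    % hypothetical data
        y = [1.1; 0.4; -0.2; 0.3; 1.8; 4.9];
        A = [ones(size(x)), x, x.^2, x.^3];
        c = [1; -2; 0.5; 0.1];                 % arbitrary test point
        grad_exact = -2*A'*y + 2*(A'*A)*c;     % gradient from the derivation above
        rho = @(c) (y - A*c)'*(y - A*c);
        h = 1e-6;
        grad_fd = zeros(4,1);
        for j = 1:4
            e = zeros(4,1); e(j) = h;
            grad_fd(j) = (rho(c+e) - rho(c-e)) / (2*h);
        end
        disp(max(abs(grad_exact - grad_fd)))   % should be near zero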

    With the partials set equal to zero, we have

    \[\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}=y_{i} \mathrm{~A}_{i j} \nonumber \]

    or

    \[\mathrm{A}_{j k}^{T} \mathrm{~A}_{k l} c_{l}=\mathrm{A}_{j i}^{T} y_{i} \nonumber \]

    In vector notation, we have

    \[\mathrm{A}^{T} \mathrm{~A} \mathbf{c}=\mathrm{A}^{T} \mathbf{y} . \nonumber \]

    This last equation is the so-called normal equation, and can be solved for \(\mathbf{c}\) by Gaussian elimination using the MATLAB backslash operator. After constructing the matrix \(\mathrm{A}\) from the fitting functions evaluated at the data points, and the vector \(\mathbf{y}\) from the data, one can code in MATLAB

        c = (A'*A)\(A'*y);

    In fact, the MATLAB backslash operator automatically solves this least-squares problem when the matrix \(\mathrm{A}\) is not square, so that the shorter MATLAB code

        c = A\y;

    yields the same result.
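
    Putting the pieces together, a complete fit of a cubic to synthetic data might look as follows. This is a sketch under assumed (randomly generated) data; for a polynomial basis, MATLAB's built-in polyfit returns the same coefficients, listed in descending powers of x.

        % Least-squares fit of a cubic to synthetic noisy data
        n = 50;
        x = linspace(0, 2, n)';
        y = 1 - 2*x + 0.5*x.^2 + 0.1*x.^3 + 0.1*randn(n,1);   % synthetic data
        A = [ones(n,1), x, x.^2, x.^3];    % n-by-4 matrix of basis functions
        c1 = (A'*A)\(A'*y);                % solve the normal equations
        c2 = A\y;                          % backslash least-squares solution
        c3 = flipud(polyfit(x, y, 3)');    % polyfit coefficients, reordered to ascending powers
        disp([c1, c2, c3])                 % the three columns should agree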


    This page titled 4.2: Fitting to a Linear Combination of Functions is shared under a CC BY 3.0 license and was authored, remixed, and/or curated by Jeffrey R. Chasnov via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.