4.2: Fitting to a Linear Combination of Functions
Consider the general fitting function
\[y(x)=\sum_{j=1}^{m} c_j f_j(x),\]
where we assume \(m\) functions \(f_j(x)\). For example, if we want to fit a cubic polynomial to the data, then we would have \(m=4\) and take \(f_1=1\), \(f_2=x\), \(f_3=x^2\) and \(f_4=x^3\). Typically, the number of functions \(f_j\) is less than the number of data points; that is, \(m<n\), so that a direct attempt to solve for the \(c_j\)'s such that the fitting function exactly passes through the \(n\) data points would result in \(n\) equations and \(m\) unknowns. This would be an over-determined linear system that in general has no solution.
We now define the vectors
\[y=\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad c=\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix},\]
and the \(n\)-by-\(m\) matrix
\[A=\begin{pmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_m(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_m(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(x_n) & f_2(x_n) & \cdots & f_m(x_n) \end{pmatrix}. \tag{4.1}\]
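Although the text uses MATLAB, the construction of \(A\) can be sketched in NumPy as well. A minimal example for the cubic basis above, using made-up data points:

```python
import numpy as np

# Hypothetical data abscissas x_i; any sample values work here.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

# Basis functions for a cubic fit: f1 = 1, f2 = x, f3 = x^2, f4 = x^3.
basis = [lambda t: np.ones_like(t),
         lambda t: t,
         lambda t: t**2,
         lambda t: t**3]

# A is n-by-m with A[i, j] = f_{j+1}(x_i).
A = np.column_stack([f(x) for f in basis])
print(A.shape)  # (5, 4): n = 5 data points, m = 4 basis functions
```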
With \(m<n\), the equation \(Ac=y\) is over-determined. We let
\[r=y-Ac\]
be the residual vector, and let
\[\rho=\sum_{i=1}^{n} r_i^2.\]
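For a trial coefficient vector \(c\), the residual vector and \(\rho\) can be computed directly. A small NumPy sketch with made-up numbers, using the basis \(f_1=1\), \(f_2=x\):

```python
import numpy as np

# Made-up example: n = 3 data points, m = 2 basis functions (1 and x).
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 2.0])
A = np.column_stack([np.ones_like(x), x])   # n-by-m design matrix

c = np.array([1.0, 0.5])        # a trial coefficient vector
r = y - A @ c                   # residual vector r = y - A c
rho = np.sum(r**2)              # rho = sum over i of r_i^2
print(rho)
```

Here \(Ac = (1, 1.5, 2)^T\), so only the middle residual is nonzero.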
The method of least squares minimizes \(\rho\) with respect to the components of \(c\). Now, using \(^T\) to signify the transpose of a matrix, we have
\[\rho = r^T r = (y-Ac)^T(y-Ac) = y^T y - c^T A^T y - y^T A c + c^T A^T A c.\]
Since \(\rho\) is a scalar, each term in the above expression must be a scalar, and since the transpose of a scalar is equal to the scalar itself, we have
\[c^T A^T y = \left(c^T A^T y\right)^T = y^T A c.\]
Therefore,
\[\rho = y^T y - 2 y^T A c + c^T A^T A c.\]
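The expanded quadratic form can be checked numerically against \(r^T r\); a quick sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
A = rng.standard_normal((n, m))
y = rng.standard_normal(n)
c = rng.standard_normal(m)

# rho computed directly from the residual vector.
r = y - A @ c
rho_direct = r @ r

# rho from the expanded form y'y - 2 y'Ac + c'A'Ac.
rho_expanded = y @ y - 2 * (y @ A @ c) + c @ (A.T @ A) @ c
print(np.isclose(rho_direct, rho_expanded))  # True
```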
To find the minimum of \(\rho\), we need to solve \(\partial\rho/\partial c_j=0\) for \(j=1,\ldots,m\). To take the derivative of \(\rho\), we switch to tensor notation, using the Einstein summation convention, in which repeated indices are summed over their allowable range. We can write
\[\rho = y_i y_i - 2 y_i A_{ik} c_k + c_i A^T_{ik} A_{kl} c_l.\]
Taking the partial derivative, we have
\[\frac{\partial\rho}{\partial c_j} = -2 y_i A_{ik}\frac{\partial c_k}{\partial c_j} + \frac{\partial c_i}{\partial c_j} A^T_{ik} A_{kl} c_l + c_i A^T_{ik} A_{kl}\frac{\partial c_l}{\partial c_j}.\]
Now,
\[\frac{\partial c_i}{\partial c_j} = \begin{cases} 1, & \text{if } i=j; \\ 0, & \text{otherwise}. \end{cases}\]
Therefore,
\[\frac{\partial\rho}{\partial c_j} = -2 y_i A_{ij} + A^T_{jk} A_{kl} c_l + c_i A^T_{ik} A_{kj}.\]
Now,
\[c_i A^T_{ik} A_{kj} = c_i A_{ki} A_{kj} = A_{kj} A_{ki} c_i = A^T_{jk} A_{ki} c_i = A^T_{jk} A_{kl} c_l.\]
Therefore,
\[\frac{\partial\rho}{\partial c_j} = -2 y_i A_{ij} + 2 A^T_{jk} A_{kl} c_l.\]
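In matrix form, this gradient is \(-2A^T y + 2A^T A c\). As a sanity check on the derivation, it can be compared against a central finite-difference approximation of \(\partial\rho/\partial c_j\); a sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3
A = rng.standard_normal((n, m))
y = rng.standard_normal(n)
c = rng.standard_normal(m)

# Analytic gradient from the derivation: -2 A^T y + 2 A^T A c.
grad = -2 * A.T @ y + 2 * A.T @ A @ c

def rho(c):
    r = y - A @ c
    return r @ r

# Central finite differences as an independent check.
h = 1e-6
fd = np.array([(rho(c + h*e) - rho(c - h*e)) / (2*h)
               for e in np.eye(m)])
print(np.allclose(grad, fd, atol=1e-5))  # True
```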
With the partials set equal to zero, we have
\[A^T_{jk} A_{kl} c_l = y_i A_{ij},\]
or
\[A^T_{jk} A_{kl} c_l = A^T_{ji} y_i.\]
In vector notation, we have
\[A^T A c = A^T y. \tag{4.2}\]
Equation (4.2) is the so-called normal equation, and can be solved for \(c\) by Gaussian elimination using the MATLAB backslash operator. After constructing the matrix \(A\) given by (4.1) and the vector \(y\) from the data, one can code in MATLAB
c = (A'*A)\(A'*y);
But in fact the MATLAB backslash operator will automatically compute the least-squares solution when the matrix \(A\) is not square, so that the MATLAB code
c = A\y;
yields the same result.
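The same equivalence can be illustrated in NumPy (as a stand-in for the MATLAB code above): solving the normal equations directly and calling the general least-squares solver, which plays the role of backslash for non-square systems, return the same coefficients. A sketch with made-up noisy cubic data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 10)
y = 1 + 2*x - x**3 + 0.1 * rng.standard_normal(x.size)

# A for the cubic fit: columns 1, x, x^2, x^3 (equation (4.1)).
A = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Analogue of c = (A'*A)\(A'*y): solve the normal equations directly.
c_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Analogue of c = A\y: NumPy's least-squares solver for non-square A.
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(c_normal, c_lstsq))  # True
```

For well-conditioned problems the two agree to machine precision; when \(A^T A\) is ill-conditioned, the direct least-squares solver is the numerically safer route.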