4.1: Fitting a Straight Line
Suppose the fitting curve is a straight line. We write the fitting curve as
y(x)=\alpha x+\beta . \nonumber
The residual r_{i}, the signed vertical distance between the data point \left(x_{i}, y_{i}\right) and the fitting curve, is given by
\begin{aligned} r_{i} &=y_{i}-y\left(x_{i}\right) \\ &=y_{i}-\left(\alpha x_{i}+\beta\right) . \end{aligned} \nonumber
A least-squares fit minimizes the sum of the squares of the r_{i}'s. If the errors in the y_{i} are independent and normally distributed with a common variance, this minimum can be shown to give the most probable values of \alpha and \beta.
We define
\begin{aligned} \rho &=\sum_{i=1}^{n} r_{i}^{2} \\ &=\sum_{i=1}^{n}\left(y_{i}-\left(\alpha x_{i}+\beta\right)\right)^{2} \end{aligned} \nonumber
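To make the definitions of r_{i} and \rho concrete, here is a minimal Python sketch that evaluates both for a trial line. The function names residuals and rho and the four data points are hypothetical, chosen only for illustration.

```python
def residuals(x, y, alpha, beta):
    """Return r_i = y_i - (alpha * x_i + beta) for each data point."""
    return [yi - (alpha * xi + beta) for xi, yi in zip(x, y)]


def rho(x, y, alpha, beta):
    """Sum of squared residuals, the quantity a least-squares fit minimizes."""
    return sum(r * r for r in residuals(x, y, alpha, beta))


# Hypothetical sample data, used only to illustrate the definitions.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]

print(residuals(x, y, 2.0, 1.0))  # [0.0, 0.0, -1.0, 1.0]
print(rho(x, y, 2.0, 1.0))        # 2.0
```

Trying other values of \alpha and \beta changes \rho; the least-squares line is the pair that makes it as small as possible.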
To minimize \rho with respect to \alpha and \beta, we solve
\frac{\partial \rho}{\partial \alpha}=0, \quad \frac{\partial \rho}{\partial \beta}=0 \nonumber
Taking the partial derivatives, we have
\begin{aligned} &\frac{\partial \rho}{\partial \alpha}=\sum_{i=1}^{n} 2\left(-x_{i}\right)\left(y_{i}-\left(\alpha x_{i}+\beta\right)\right)=0 \\ &\frac{\partial \rho}{\partial \beta}=\sum_{i=1}^{n} 2(-1)\left(y_{i}-\left(\alpha x_{i}+\beta\right)\right)=0 \end{aligned} \nonumber
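As a sanity check on the signs in these derivatives, the following sketch compares the analytic partial derivatives, written term by term as above, with central finite differences of \rho. Because \rho is quadratic in \alpha and \beta, the central difference is exact apart from floating-point rounding, so the two should agree closely. The helper names and the data are hypothetical, for illustration only.

```python
def rho(x, y, alpha, beta):
    """Sum of squared residuals for the line y = alpha*x + beta."""
    return sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))


def grad_rho(x, y, alpha, beta):
    """Analytic partial derivatives of rho with respect to alpha and beta."""
    d_alpha = sum(2 * (-xi) * (yi - (alpha * xi + beta)) for xi, yi in zip(x, y))
    d_beta = sum(2 * (-1) * (yi - (alpha * xi + beta)) for xi, yi in zip(x, y))
    return d_alpha, d_beta


def grad_rho_numeric(x, y, alpha, beta, h=1e-6):
    """Central finite-difference estimate of the same two derivatives."""
    d_alpha = (rho(x, y, alpha + h, beta) - rho(x, y, alpha - h, beta)) / (2 * h)
    d_beta = (rho(x, y, alpha, beta + h) - rho(x, y, alpha, beta - h)) / (2 * h)
    return d_alpha, d_beta


# Hypothetical sample data.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]

print(grad_rho(x, y, 2.0, 1.0))          # (-2.0, 0.0)
print(grad_rho_numeric(x, y, 2.0, 1.0))  # agrees up to rounding error
```

The nonzero \partial\rho/\partial\alpha shows that \alpha = 2, \beta = 1 is not yet the minimum for these data.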
These equations form a system of two linear equations in the two unknowns \alpha and \beta, which becomes evident when they are rewritten in the form
\begin{aligned} \alpha \sum_{i=1}^{n} x_{i}^{2}+\beta \sum_{i=1}^{n} x_{i} &=\sum_{i=1}^{n} x_{i} y_{i} \\ \alpha \sum_{i=1}^{n} x_{i}+\beta n &=\sum_{i=1}^{n} y_{i} \end{aligned} \nonumber
These equations can be solved either analytically or numerically (for example, in MATLAB). In matrix form they read
\left(\begin{array}{cc} \sum_{i=1}^{n} x_{i}^{2} & \sum_{i=1}^{n} x_{i} \\ \sum_{i=1}^{n} x_{i} & n \end{array}\right)\left(\begin{array}{c} \alpha \\ \beta \end{array}\right)=\left(\begin{array}{c} \sum_{i=1}^{n} x_{i} y_{i} \\ \sum_{i=1}^{n} y_{i} \end{array}\right) . \nonumber
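Solving this 2-by-2 system by Cramer's rule gives the analytic solution
\alpha=\frac{n \sum_{i=1}^{n} x_{i} y_{i}-\sum_{i=1}^{n} x_{i} \sum_{i=1}^{n} y_{i}}{n \sum_{i=1}^{n} x_{i}^{2}-\left(\sum_{i=1}^{n} x_{i}\right)^{2}}, \quad \beta=\frac{\sum_{i=1}^{n} x_{i}^{2} \sum_{i=1}^{n} y_{i}-\sum_{i=1}^{n} x_{i} \sum_{i=1}^{n} x_{i} y_{i}}{n \sum_{i=1}^{n} x_{i}^{2}-\left(\sum_{i=1}^{n} x_{i}\right)^{2}} . \nonumber
The common denominator equals n \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}, where \bar{x} is the mean of the x_{i}, so it is nonzero whenever at least two of the x_{i} differ. The minimal pure-Python sketch below (the name fit_line and the data are hypothetical) forms the sums appearing in the matrix equation and applies these formulas. In MATLAB the numerical route would typically use the backslash operator on the matrix and right-hand side; the sketch uses the closed form so that it needs no external libraries.

```python
def fit_line(x, y):
    """Least-squares line y = alpha*x + beta from the 2-by-2 normal equations."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    sx = sum(x)                                 # sum of x_i
    sy = sum(y)                                 # sum of y_i
    sxx = sum(xi * xi for xi in x)              # sum of x_i^2
    sxy = sum(xi * yi for xi, yi in zip(x, y))  # sum of x_i * y_i
    # Matrix [[sxx, sx], [sx, n]] times (alpha, beta) equals (sxy, sy).
    det = n * sxx - sx * sx
    if det == 0:
        # Exact-zero guard: all x_i are identical, so no unique line exists.
        raise ValueError("need at least two distinct x values")
    alpha = (n * sxy - sx * sy) / det   # Cramer's rule, first unknown
    beta = (sxx * sy - sx * sxy) / det  # Cramer's rule, second unknown
    return alpha, beta


# Hypothetical sample data.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]
alpha, beta = fit_line(x, y)
print(alpha, beta)  # 2.2 0.7
```

For these data the sums are \sum x_{i}=6, \sum x_{i}^{2}=14, \sum y_{i}=16, \sum x_{i} y_{i}=35, and n=4, so \alpha=44/20=2.2 and \beta=14/20=0.7. At this solution both \sum r_{i} and \sum x_{i} r_{i} vanish, which is exactly what the two normal equations state.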
A proper statistical treatment of this problem should also consider an estimate of the errors in \alpha and \beta as well as an estimate of the goodness-of-fit of the data to the model. We leave these further considerations to a statistics class.