3.10: Solution of the Least-Squares Problem
View Solution of the Least-Squares Problem by the Normal Equations on YouTube
The least-squares problem can be cast as the problem of solving an overdetermined matrix equation \(\text{Ax} = \text{b}\) when \(\text{b}\) is not in the column space of \(\text{A}\). Replacing \(\text{b}\) by its orthogonal projection onto the column space of \(\text{A}\) yields the solution \(\text{x}\) that minimizes the norm \(||\text{Ax} - \text{b}||\).
Now \(\text{b} = \text{b}_{\text{proj}_{\text{Col(A)}}} + (\text{b} - \text{b}_{\text{proj}_{\text{Col(A)}}})\), where \(\text{b}_{\text{proj}_{\text{Col(A)}}}\) is the projection of \(\text{b}\) onto the column space of \(\text{A}\). Since \((\text{b} - \text{b}_{\text{proj}_{\text{Col(A)}}})\) is orthogonal to the column space of \(\text{A}\), it is in the nullspace of \(\text{A}^{\text{T}}\). Therefore, \(\text{A}^{\text{T}}(\text{b} - \text{b}_{\text{proj}_{\text{Col(A)}}}) = 0\), and it pays to multiply \(\text{Ax} = \text{b}\) by \(\text{A}^{\text{T}}\) to obtain
\[\text{A}^{\text{T}}\text{Ax}=\text{A}^{\text{T}}\text{b}.\nonumber \]
These equations, called the normal equations for \(\text{Ax} = \text{b}\), determine the least-squares solution for \(\text{x}\), which can be found by Gaussian elimination. When \(\text{A}\) is an \(m\)-by-\(n\) matrix, then \(\text{A}^{\text{T}}\text{A}\) is an \(n\)-by-\(n\) matrix, and it can be shown that \(\text{A}^{\text{T}}\text{A}\) is invertible when the columns of \(\text{A}\) are linearly independent. When this is the case, one can rewrite the normal equations by multiplying both sides by \(\text{A}(\text{A}^{\text{T}}\text{A})^{-1}\) to obtain
\[\text{Ax}=\text{A}(\text{A}^{\text{T}}\text{A})^{-1}\text{A}^{\text{T}}\text{b},\nonumber \]
where the matrix
\[\text{P}=\text{A}(\text{A}^{\text{T}}\text{A})^{-1}\text{A}^{\text{T}}\nonumber \]
projects a vector onto the column space of \(\text{A}\). It is easy to prove that \(\text{P}^2 = \text{P}\), which states that projecting twice is the same as projecting once. We have
\[\begin{aligned}\text{P}^2&=(\text{A}(\text{A}^{\text{T}}\text{A})^{-1}\text{A}^{\text{T}})(\text{A}(\text{A}^{\text{T}}\text{A})^{-1}\text{A}^{\text{T}}) \\ &=\text{A}\left[(\text{A}^{\text{T}}\text{A})^{-1}(\text{A}^{\text{T}}\text{A})\right](\text{A}^{\text{T}}\text{A})^{-1}\text{A}^{\text{T}} \\ &=\text{A}(\text{A}^{\text{T}}\text{A})^{-1}\text{A}^{\text{T}}=\text{P}.\end{aligned} \nonumber \]
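To see these formulas in action, here is a minimal numerical sketch in Python with NumPy (the code, matrix sizes, and random seed are illustrative additions, not part of the original text): it solves the normal equations for a small overdetermined system, then checks that \(\text{P}\) is idempotent and that the residual is orthogonal to the column space of \(\text{A}\).

```python
import numpy as np

# A small overdetermined system: m equations, n unknowns (m > n).
# The sizes and seed here are illustrative choices.
rng = np.random.default_rng(0)
m, n = 5, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Solve the normal equations A^T A x = A^T b for the least-squares x.
x = np.linalg.solve(A.T @ A, A.T @ b)

# Form the projection matrix P = A (A^T A)^{-1} A^T.
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Projecting twice is the same as projecting once: P^2 = P.
assert np.allclose(P @ P, P)

# The residual b - Pb is orthogonal to the column space of A.
assert np.allclose(A.T @ (b - P @ b), 0.0)

print(x)
```

In floating-point practice one usually avoids forming \(\text{A}^{\text{T}}\text{A}\) explicitly, since its condition number is the square of that of \(\text{A}\); library routines such as numpy.linalg.lstsq solve the same problem through a more stable factorization.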
As an example, consider the toy least-squares problem of fitting a line through the three data points \((1, 1),\: (2, 3)\) and \((3, 2)\). With the line given by \(y = \beta_0 + \beta_1x\), the overdetermined system of equations is given by
\[\left(\begin{array}{cc}1&1\\1&2\\1&3\end{array}\right)\left(\begin{array}{c}\beta_0\\ \beta_1\end{array}\right)=\left(\begin{array}{c}1\\3\\2\end{array}\right).\nonumber \]
The least-squares solution is determined by solving
\[\left(\begin{array}{ccc}1&1&1\\1&2&3\end{array}\right)\left(\begin{array}{cc}1&1\\1&2\\1&3\end{array}\right)\left(\begin{array}{c}\beta_0\\ \beta_1\end{array}\right)=\left(\begin{array}{ccc}1&1&1\\1&2&3\end{array}\right)\left(\begin{array}{c}1\\3\\2\end{array}\right),\nonumber \]
or
\[\left(\begin{array}{cc}3&6\\6&14\end{array}\right)\left(\begin{array}{c}\beta_0 \\ \beta_1\end{array}\right)=\left(\begin{array}{c}6\\13\end{array}\right).\nonumber \]
We can solve this system either by directly inverting the two-by-two matrix or by Gaussian elimination. Inverting the two-by-two matrix, we have
\[\left(\begin{array}{c}\beta_0 \\ \beta_1\end{array}\right)=\frac{1}{6}\left(\begin{array}{rr}14&-6 \\ -6&3\end{array}\right)\left(\begin{array}{c}6\\13\end{array}\right)=\left(\begin{array}{c}1\\1/2\end{array}\right),\nonumber \]
so that the least-squares line is given by
\[y=1+\frac{1}{2}x.\nonumber \]
The graph of the data and the line is shown below.
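The figure can be reproduced with a short script. The following sketch, using NumPy and Matplotlib (an illustrative addition, not part of the original text), solves the normal equations for the three data points and plots the data together with the least-squares line \(y = 1 + \frac{1}{2}x\).

```python
import numpy as np
import matplotlib.pyplot as plt

# Data points (1, 1), (2, 3), (3, 2).
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0])

# Design matrix for the line y = beta_0 + beta_1 x.
A = np.column_stack([np.ones_like(x), x])

# Solve the normal equations A^T A beta = A^T y.
beta = np.linalg.solve(A.T @ A, A.T @ y)
print(beta)  # expect [1.0, 0.5]

# Plot the data and the least-squares line.
xs = np.linspace(0.5, 3.5, 100)
plt.plot(x, y, "o", label="data")
plt.plot(xs, beta[0] + beta[1] * xs, label="least-squares line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```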