6.5: The Method of Least Squares

$$\newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} }$$

$$\newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}}$$

$$\newcommand{\id}{\mathrm{id}}$$ $$\newcommand{\Span}{\mathrm{span}}$$

( \newcommand{\kernel}{\mathrm{null}\,}\) $$\newcommand{\range}{\mathrm{range}\,}$$

$$\newcommand{\RealPart}{\mathrm{Re}}$$ $$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$

$$\newcommand{\Argument}{\mathrm{Arg}}$$ $$\newcommand{\norm}[1]{\| #1 \|}$$

$$\newcommand{\inner}[2]{\langle #1, #2 \rangle}$$

$$\newcommand{\Span}{\mathrm{span}}$$

$$\newcommand{\id}{\mathrm{id}}$$

$$\newcommand{\Span}{\mathrm{span}}$$

$$\newcommand{\kernel}{\mathrm{null}\,}$$

$$\newcommand{\range}{\mathrm{range}\,}$$

$$\newcommand{\RealPart}{\mathrm{Re}}$$

$$\newcommand{\ImaginaryPart}{\mathrm{Im}}$$

$$\newcommand{\Argument}{\mathrm{Arg}}$$

$$\newcommand{\norm}[1]{\| #1 \|}$$

$$\newcommand{\inner}[2]{\langle #1, #2 \rangle}$$

$$\newcommand{\Span}{\mathrm{span}}$$ $$\newcommand{\AA}{\unicode[.8,0]{x212B}}$$

$$\newcommand{\vectorA}[1]{\vec{#1}} % arrow$$

$$\newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow$$

$$\newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} }$$

$$\newcommand{\vectorC}[1]{\textbf{#1}}$$

$$\newcommand{\vectorD}[1]{\overrightarrow{#1}}$$

$$\newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}}$$

$$\newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}}$$

$$\newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} }$$

$$\newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}}$$

$$\newcommand{\avec}{\mathbf a}$$ $$\newcommand{\bvec}{\mathbf b}$$ $$\newcommand{\cvec}{\mathbf c}$$ $$\newcommand{\dvec}{\mathbf d}$$ $$\newcommand{\dtil}{\widetilde{\mathbf d}}$$ $$\newcommand{\evec}{\mathbf e}$$ $$\newcommand{\fvec}{\mathbf f}$$ $$\newcommand{\nvec}{\mathbf n}$$ $$\newcommand{\pvec}{\mathbf p}$$ $$\newcommand{\qvec}{\mathbf q}$$ $$\newcommand{\svec}{\mathbf s}$$ $$\newcommand{\tvec}{\mathbf t}$$ $$\newcommand{\uvec}{\mathbf u}$$ $$\newcommand{\vvec}{\mathbf v}$$ $$\newcommand{\wvec}{\mathbf w}$$ $$\newcommand{\xvec}{\mathbf x}$$ $$\newcommand{\yvec}{\mathbf y}$$ $$\newcommand{\zvec}{\mathbf z}$$ $$\newcommand{\rvec}{\mathbf r}$$ $$\newcommand{\mvec}{\mathbf m}$$ $$\newcommand{\zerovec}{\mathbf 0}$$ $$\newcommand{\onevec}{\mathbf 1}$$ $$\newcommand{\real}{\mathbb R}$$ $$\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}$$ $$\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}$$ $$\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}$$ $$\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}$$ $$\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}$$ $$\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}$$ $$\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}$$ $$\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}$$ $$\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}$$ $$\newcommand{\laspan}[1]{\text{Span}\{#1\}}$$ $$\newcommand{\bcal}{\cal B}$$ $$\newcommand{\ccal}{\cal C}$$ $$\newcommand{\scal}{\cal S}$$ $$\newcommand{\wcal}{\cal W}$$ $$\newcommand{\ecal}{\cal E}$$ $$\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}$$ $$\newcommand{\gray}[1]{\color{gray}{#1}}$$ $$\newcommand{\lgray}[1]{\color{lightgray}{#1}}$$ $$\newcommand{\rank}{\operatorname{rank}}$$ $$\newcommand{\row}{\text{Row}}$$ $$\newcommand{\col}{\text{Col}}$$ $$\renewcommand{\row}{\text{Row}}$$ $$\newcommand{\nul}{\text{Nul}}$$ $$\newcommand{\var}{\text{Var}}$$ $$\newcommand{\corr}{\text{corr}}$$ $$\newcommand{\len}[1]{\left|#1\right|}$$ $$\newcommand{\bbar}{\overline{\bvec}}$$ $$\newcommand{\bhat}{\widehat{\bvec}}$$ $$\newcommand{\bperp}{\bvec^\perp}$$ $$\newcommand{\xhat}{\widehat{\xvec}}$$ $$\newcommand{\vhat}{\widehat{\vvec}}$$ $$\newcommand{\uhat}{\widehat{\uvec}}$$ $$\newcommand{\what}{\widehat{\wvec}}$$ $$\newcommand{\Sighat}{\widehat{\Sigma}}$$ $$\newcommand{\lt}{<}$$ $$\newcommand{\gt}{>}$$ $$\newcommand{\amp}{&}$$ $$\definecolor{fillinmathshade}{gray}{0.9}$$
Learning Objectives
1. Learn examples of best-fit problems.
2. Learn to turn a best-fit problem into a least-squares problem.
3. Recipe: find a least-squares solution (two ways). solution.
4. Picture: geometry of a least-squares solution.
5. Vocabulary words: least-squares solution.

In this section, we answer the following important question:

Suppose that $$Ax=b$$ does not have a solution. What is the best approximate solution?

For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

Least-Squares Solutions

We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation $$Ax=b$$.

Definition $$\PageIndex{1}$$: Least-Squares Solution

Let $$A$$ be an $$m\times n$$ matrix and let $$b$$ be a vector in $$\mathbb{R}^m$$. A least-squares solution of the matrix equation $$Ax=b$$ is a vector $$\hat x$$ in $$\mathbb{R}^n$$ such that

$\text{dist}(b,\,A\hat x) \leq \text{dist}(b,\,Ax) \nonumber$

for all other vectors $$x$$ in $$\mathbb{R}^n$$.

Recall that $$\text{dist}(v,w) = \|v-w\|$$ is the distance, Definition 6.1.2 in Section 6.1, between the vectors $$v$$ and $$w$$. The term “least squares” comes from the fact that $$\text{dist}(b,Ax) = \|b-A\hat x\|$$ is the square root of the sum of the squares of the entries of the vector $$b-A\hat x$$. So a least-squares solution minimizes the sum of the squares of the differences between the entries of $$A\hat x$$ and $$b$$. In other words, a least-squares solution solves the equation $$Ax=b$$ as closely as possible, in the sense that the sum of the squares of the difference $$b-Ax$$ is minimized.

Least Squares: Picture

Suppose that the equation $$Ax=b$$ is inconsistent. Recall from Note 2.3.6 in Section 2.3 that the column space of $$A$$ is the set of all other vectors $$c$$ such that $$Ax=c$$ is consistent. In other words, $$\text{Col}(A)$$ is the set of all vectors of the form $$Ax.$$ Hence, the closest vector, Note 6.3.1 in Section 6.3, of the form $$Ax$$ to $$b$$ is the orthogonal projection of $$b$$ onto $$\text{Col}(A)$$. This is denoted $$b_{\text{Col}(A)}\text{,}$$ following Definition 6.3.1 in Section 6.3

Figure $$\PageIndex{1}$$

Note $$\PageIndex{1}$$

A least-squares solution of $$Ax=b$$ is a solution $$\hat x$$ of the consistent equation $$Ax=b_{\text{Col}(A)}$$

Note $$\PageIndex{2}$$

If $$Ax=b$$ is consistent, then $$b_{\text{Col}(A)} = b\text{,}$$ so that a least-squares solution is the same as a usual solution.

Where is $$\hat x$$ in this picture? If $$v_1,v_2,\ldots,v_n$$ are the columns of $$A\text{,}$$ then

$A\hat x = A\left(\begin{array}{c}\hat{x}_1 \\ \hat{x}_2 \\ \vdots \\ \hat{x}_{n}\end{array}\right) = \hat x_1v_1 + \hat x_2v_2 + \cdots + \hat x_nv_n. \nonumber$

Hence the entries of $$\hat x$$ are the “coordinates” of $$b_{\text{Col}(A)}$$ with respect to the spanning set $$\{v_1,v_2,\ldots,v_m\}$$ of $$\text{Col}(A)$$. (They are honest $$\mathcal{B}$$-coordinates if the columns of $$A$$ are linearly independent.)

Figure $$\PageIndex{2}$$

We learned to solve this kind of orthogonal projection problem in Section 6.3.

Theorem $$\PageIndex{1}$$

Let $$A$$ be an $$m\times n$$ matrix and let $$b$$ be a vector in $$\mathbb{R}^m$$. The least-squares solutions of $$Ax=b$$ are the solutions of the matrix equation

$A^TAx = A^Tb \nonumber$

Proof

By Theorem 6.3.2 in Section 6.3, if $$\hat x$$ is a solution of the matrix equation $$A^TAx = A^Tb\text{,}$$ then $$A\hat x$$ is equal to $$b_{\text{Col}(A)}$$. We argued above that a least-squares solution of $$Ax=b$$ is a solution of $$Ax = b_{\text{Col}(A)}.$$

In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

Recipe 1: Compute a Least-Squares Solution

Let $$A$$ be an $$m\times n$$ matrix and let $$b$$ be a vector in $$\mathbb{R}^n$$. Here is a method for computing a least-squares solution of $$Ax=b\text{:}$$

1. Compute the matrix $$A^TA$$ and the vector $$A^Tb$$.
2. Form the augmented matrix for the matrix equation $$A^TAx = A^Tb\text{,}$$ and row reduce.
3. This equation is always consistent, and any solution $$\hat x$$ is a least-squares solution.

To reiterate: once you have found a least-squares solution $$\hat x$$ of $$Ax=b\text{,}$$ then $$b_{\text{Col}(A)}$$ is equal to $$A\hat x$$.

Example $$\PageIndex{1}$$

Find the least-squares solutions of $$Ax=b$$ where:

$A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) \qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber$

What quantity is being minimized?

Solution

We have

$A^TA = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) = \left(\begin{array}{cc}5&3\\3&3\end{array}\right) \nonumber$

and

$A^T b = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{c}6\\0\\0\end{array}\right) = \left(\begin{array}{c}0\\6\end{array}\right). \nonumber$

We form an augmented matrix and row reduce:

$\left(\begin{array}{cc|c}5&3&0\\3&3&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&-3\\0&1&5\end{array}\right). \nonumber$

Therefore, the only least-squares solution is $$\hat x = {-3\choose 5}.$$

This solution minimizes the distance from $$A\hat x$$ to $$b\text{,}$$ i.e., the sum of the squares of the entries of $$b-A\hat x = b-b_{\text{Col}(A)} = b_{\text{Col}(A)^\perp}$$. In this case, we have

$||b-A\hat{x}||=\left|\left|\left(\begin{array}{c}6\\0\\0\end{array}\right)-\left(\begin{array}{c}5\\2\\-1\end{array}\right)\right|\right|=\left|\left|\left(\begin{array}{c}1\\-2\\1\end{array}\right)\right|\right|=\sqrt{1^2+(-2)^2+1^2}=\sqrt{6}.\nonumber$

Therefore, $$b_{\text{Col}(A)} = A\hat x$$ is $$\sqrt 6$$ units from $$b.$$

In the following picture, $$v_1,v_2$$ are the columns of $$A\text{:}$$

Figure $$\PageIndex{4}$$

Example $$\PageIndex{2}$$

Find the least-squares solutions of $$Ax=b$$ where:

$A = \left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right) \qquad b = \left(\begin{array}{c}1\\0\\-1\end{array}\right). \nonumber$

Solution

We have

$A^T A = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right) = \left(\begin{array}{cc}5&-1\\-1&5\end{array}\right) \nonumber$

and

$A^T b = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{c}1\\0\\-1\end{array}\right) = \left(\begin{array}{c}2\\-2\end{array}\right). \nonumber$

We form an augmented matrix and row reduce:

$\left(\begin{array}{cc|c}5&-1&2\\-1&5&-2\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&1/3\\0&1&-1/3\end{array}\right). \nonumber$

Therefore, the only least-squares solution is $$\hat x = \frac 13{1\choose -1}.$$

The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of $$A$$ are linearly dependent, then $$Ax=b_{\text{Col}(A)}$$ has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of Corollary 6.3.1 in Section 6.3.

Theorem $$\PageIndex{2}$$

Let $$A$$ be an $$m\times n$$ matrix and let $$b$$ be a vector in $$\mathbb{R}^m$$. The following are equivalent:

1. $$Ax=b$$ has a unique least-squares solution.
2. The columns of $$A$$ are linearly independent.
3. $$A^TA$$ is invertible.

In this case, the least-squares solution is

$\hat x = (A^TA)^{-1} A^Tb. \nonumber$

Proof

The set of least-squares solutions of $$Ax=b$$ is the solution set of the consistent equation $$A^TAx=A^Tb\text{,}$$ which is a translate of the solution set of the homogeneous equation $$A^TAx=0$$. Since $$A^TA$$ is a square matrix, the equivalence of 1 and 3 follows from Theorem 5.1.1 in Section 5.1. The set of least squares-solutions is also the solution set of the consistent equation $$Ax = b_{\text{Col}(A)}\text{,}$$ which has a unique solution if and only if the columns of $$A$$ are linearly independent by Recipe: Checking Linear Independence in Section 2.5.

Example $$\PageIndex{3}$$: Infinitely many least-squares solutions

Find the least-squares solutions of $$Ax=b$$ where:

$A = \left(\begin{array}{ccc}1&0&1\\1&1&-1\\1&2&-3\end{array}\right) \qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber$

Solution

We have

$A^TA = \left(\begin{array}{ccc}3&3&-3\\3&5&-7\\-3&-7&11\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}6\\0\\6\end{array}\right). \nonumber$

We form an augmented matrix and row reduce:

$\left(\begin{array}{ccc|c}3&3&-3&6\\3&5&-7&0\\-3&-7&11&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&1&5\\0&1&-2&-3\\0&0&0&0\end{array}\right). \nonumber$

The free variable is $$x_3\text{,}$$ so the solution set is

$\left\{\begin{array}{rrrrr}x_1 &=& -x_3 &+& 5\\ x_2 &=& 2x_3 &-& 3\\ x_3 &=& x_3&{}&{}\end{array}\right. \quad\xrightarrow[\text{vector form}]{\text{parametric}}\quad \hat{x}=\left(\begin{array}{c}x_1\\x_2\\x_3\end{array}\right)=x_3\left(\begin{array}{c}-1\\2\\1\end{array}\right)+\left(\begin{array}{c}5\\-3\\0\end{array}\right).\nonumber$

For example, taking $$x_3 = 0$$ and $$x_3=1$$ gives the least-squares solutions

$\hat x = \left(\begin{array}{c}5\\-3\\0\end{array}\right)\quad\text{and}\quad \hat x =\left(\begin{array}{c}4\\-1\\1\end{array}\right). \nonumber$

Geometrically, we see that the columns $$v_1,v_2,v_3$$ of $$A$$ are coplanar:

Figure $$\PageIndex{7}$$

Therefore, there are many ways of writing $$b_{\text{Col}(A)}$$ as a linear combination of $$v_1,v_2,v_3$$.

As usual, calculations involving projections become easier in the presence of an orthogonal set. Indeed, if $$A$$ is an $$m\times n$$ matrix with orthogonal columns $$u_1,u_2,\ldots,u_m\text{,}$$ then we can use the Theorem 6.4.1 in Section 6.4 to write

$b_{\text{Col}(A)} = \frac{b\cdot u_1}{u_1\cdot u_1}\,u_1 + \frac{b\cdot u_2}{u_2\cdot u_2}\,u_2 + \cdots + \frac{b\cdot u_m}{u_m\cdot u_m}\,u_m = A\left(\begin{array}{c}(b\cdot u_1)/(u_1\cdot u_1) \\ (b\cdot u_2)/(u_2\cdot u_2) \\ \vdots \\ (b\cdot u_m)/(u_m\cdot u_m)\end{array}\right). \nonumber$

Note that the least-squares solution is unique in this case, since an orthogonal set is linearly independent, Fact 6.4.1 in Section 6.4.

Recipe 2: Compute a Least-Squares Solution

Let $$A$$ be an $$m\times n$$ matrix with orthogonal columns $$u_1,u_2,\ldots,u_m\text{,}$$ and let $$b$$ be a vector in $$\mathbb{R}^n$$. Then the least-squares solution of $$Ax=b$$ is the vector

$\hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \ldots,\; \frac{b\cdot u_m}{u_m\cdot u_m} \right). \nonumber$

This formula is particularly useful in the sciences, as matrices with orthogonal columns often arise in nature.

Example $$\PageIndex{4}$$

Find the least-squares solution of $$Ax=b$$ where:

$A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber$

Solution

Let $$u_1,u_2,u_3$$ be the columns of $$A$$. These form an orthogonal set, so

$\hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber$

Compare Example 6.4.8 in Section 6.4.

Best-Fit Problems

In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

Example $$\PageIndex{5}$$: Best-Fit Line

Suppose that we have measured three data points

$(0,6),\quad (1,0),\quad (2,0), \nonumber$

and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?

Figure $$\PageIndex{9}$$

The general equation for a (non-vertical) line is

$y = Mx + B. \nonumber$

If our three data points were to lie on this line, then the following equations would be satisfied:

\begin{align} 6&=M\cdot 0+B\nonumber \\ 0&=M\cdot 1+B\label{eq:1} \\ 0&=M\cdot 2+B.\nonumber\end{align}

In order to find the best-fit line, we try to solve the above equations in the unknowns $$M$$ and $$B$$. As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

Putting our linear equations into matrix form, we are trying to solve $$Ax=b$$ for

$A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) \qquad x = \left(\begin{array}{c}M\\B\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber$

We solved this least-squares problem in Example $$\PageIndex{1}$$: the only least-squares solution to $$Ax=b$$ is $$\hat x = {M\choose B} = {-3\choose 5}\text{,}$$ so the best-fit line is

$y = -3x + 5. \nonumber$

Figure $$\PageIndex{10}$$

What exactly is the line $$y= f(x) = -3x+5$$ minimizing? The least-squares solution $$\hat x$$ minimizes the sum of the squares of the entries of the vector $$b-A\hat x$$. The vector $$b$$ is the left-hand side of $$\eqref{eq:1}$$, and

$A\left(\begin{array}{c}-3\\5\end{array}\right) = \left(\begin{array}{c}-3(0)+5\\-3(1)+5\\-3(2)+5\end{array}\right) = \left(\begin{array}{c}f(0)\\f(1)\\f(2)\end{array}\right). \nonumber$

In other words, $$A\hat x$$ is the vector whose entries are the $$y$$-coordinates of the graph of the line at the values of $$x$$ we specified in our data points, and $$b$$ is the vector whose entries are the $$y$$-coordinates of those data points. The difference $$b-A\hat x$$ is the vertical distance of the graph from the data points:

Figure $$\PageIndex{11}$$

$\color{blue}{b-A\hat{x}=\left(\begin{array}{c}6\\0\\0\end{array}\right)-A\left(\begin{array}{c}-3\\5\end{array}\right)=\left(\begin{array}{c}-1\\2\\-1\end{array}\right)}\nonumber$

The best-fit line minimizes the sum of the squares of these vertical distances.

Example $$\PageIndex{7}$$: Best-fit parabola

Find the parabola that best approximates the data points

$(-1,\,1/2),\quad(1,\,-1),\quad(2,\,-1/2),\quad(3,\,2). \nonumber$

Figure $$\PageIndex{13}$$

What quantity is being minimized?

Solution

The general equation for a parabola is

$y = Bx^2 + Cx + D. \nonumber$

If the four points were to lie on this parabola, then the following equations would be satisfied:

\begin{align} \frac{1}{2}&=B(-1)^2+C(-1)+D\nonumber \\ -1&=B(1)^2+C(1)+D\nonumber \\ -\frac{1}{2}&=B(2)^2+C(2)+D\label{eq:2} \\ 2&=B(3)^2+C(3)+D.\nonumber\end{align}

We treat this as a system of equations in the unknowns $$B,C,D$$. In matrix form, we can write this as $$Ax=b$$ for

$A = \left(\begin{array}{ccc}1&-1&1\\1&1&1\\4&2&1\\9&3&1\end{array}\right) \qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right) \qquad b = \left(\begin{array}{c}1/2 \\ -1\\-1/2 \\ 2\end{array}\right). \nonumber$

We find a least-squares solution by multiplying both sides by the transpose:

$A^TA = \left(\begin{array}{ccc}99&35&15\\35&15&5\\15&5&4\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}31/2\\7/2\\1\end{array}\right), \nonumber$

then forming an augmented matrix and row reducing:

$\left(\begin{array}{ccc|c}99&35&15&31/2 \\ 35&15&5&7/2 \\ 15&5&4&1\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&0&53/88 \\ 0&1&0&-379/440 \\ 0&0&1&-41/44\end{array}\right) \implies \hat x = \left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44 \end{array}\right). \nonumber$

The best-fit parabola is

$y = \frac{53}{88}x^2 - \frac{379}{440}x - \frac{41}{44}. \nonumber$

Multiplying through by $$88\text{,}$$ we can write this as

$88y = 53x^2 - \frac{379}{5}x - 82. \nonumber$

Figure $$\PageIndex{14}$$

Now we consider what exactly the parabola $$y = f(x)$$ is minimizing. The least-squares solution $$\hat x$$ minimizes the sum of the squares of the entries of the vector $$b-A\hat x$$. The vector $$b$$ is the left-hand side of $$\eqref{eq:2}$$, and

$A\hat{x}=\left(\begin{array}{c} \frac{53}{88}(-1)^2-\frac{379}{440}(-1)-\frac{41}{44} \\ \frac{53}{88}(1)^2-\frac{379}{440}(1)-\frac{41}{44} \\ \frac{53}{88}(2)^2-\frac{379}{440}(2)-\frac{41}{44} \\ \frac{53}{88}(3)^2-\frac{379}{440}(3)-\frac{41}{44}\end{array}\right)=\left(\begin{array}{c}f(-1) \\ f(1) \\ f(2) \\ f(3)\end{array}\right).\nonumber$

In other words, $$A\hat x$$ is the vector whose entries are the $$y$$-coordinates of the graph of the parabola at the values of $$x$$ we specified in our data points, and $$b$$ is the vector whose entries are the $$y$$-coordinates of those data points. The difference $$b-A\hat x$$ is the vertical distance of the graph from the data points:

Figure $$\PageIndex{15}$$

$\color{blue}{b-A\hat{x}=\left(\begin{array}{c}1/2 \\ -1 \\ -1/2 \\ 2\end{array}\right)-A\left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44\end{array}\right)=\left(\begin{array}{c}-7/220 \\ 21/110 \\ -14/55 \\ 21/220\end{array}\right)}\nonumber$

The best-fit parabola minimizes the sum of the squares of these vertical distances.

Example $$\PageIndex{8}$$: Best-fit linear function

Find the linear function $$f(x,y)$$ that best approximates the following data:

$\begin{array}{r|r|c} x & y & f(x,y) \\\hline 1 & 0 & 0 \\ 0 & 1 & 1 \\ -1 & 0 & 3 \\ 0 & -1 & 4 \end{array} \nonumber$

What quantity is being minimized?

Solution

The general equation for a linear function in two variables is

$f(x,y) = Bx + Cy + D. \nonumber$

We want to solve the following system of equations in the unknowns $$B,C,D\text{:}$$

\begin{align} B(1)+C(0)+D&=0 \nonumber \\ B(0)+C(1)+D&=1 \nonumber \\ B(-1)+C(0)+D&=3\label{eq:3} \\ B(0)+C(-1)+D&=4\nonumber\end{align}

In matrix form, we can write this as $$Ax=b$$ for

$A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber$

We observe that the columns $$u_1,u_2,u_3$$ of $$A$$ are orthogonal, so we can use Recipe 2: Compute a Least-Squares Solution:

$\hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber$ We find a least-squares solution by multiplying both sides by the transpose:

$A^TA = \left(\begin{array}{ccc}2&0&0\\0&2&0\\0&0&4\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}-3\\-3\\8\end{array}\right). \nonumber$ The matrix $$A^TA$$ is diagonal (do you see why that happened?), so it is easy to solve the equation $$A^TAx = A^Tb\text{:}$$

$\left(\begin{array}{cccc}2&0&0&-3 \\ 0&2&0&-3 \\ 0&0&4&8\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{cccc}1&0&0&-3/2 \\ 0&1&0&-3/2 \\ 0&0&1&2\end{array}\right) \implies \hat x = \left(\begin{array}{c}-3/2 \\ -3/2 \\ 2\end{array}\right). \nonumber$ Therefore, the best-fit linear equation is

$f(x,y) = -\frac 32x - \frac32y + 2. \nonumber$

Here is a picture of the graph of $$f(x,y)\text{:}$$

Figure $$\PageIndex{17}$$

Now we consider what quantity is being minimized by the function $$f(x,y)$$. The least-squares solution $$\hat x$$ minimizes the sum of the squares of the entries of the vector $$b-A\hat x$$. The vector $$b$$ is the right-hand side of $$\eqref{eq:3}$$, and

$A\hat{x}=\left(\begin{array}{rrrrr}-\frac32(1) &-& \frac32(0) &+& 2\\ -\frac32(0) &-& \frac32(1) &+& 2\\ -\frac32(-1) &-& \frac32(0) &+& 2\\ -\frac32(0) &-& \frac32(-1) &+& 2\end{array}\right)=\left(\begin{array}{c}f(1,0) \\ f(0,1) \\ f(-1,0) \\ f(0,-1)\end{array}\right).\nonumber$

In other words, $$A\hat x$$ is the vector whose entries are the values of $$f$$ evaluated on the points $$(x,y)$$ we specified in our data table, and $$b$$ is the vector whose entries are the desired values of $$f$$ evaluated at those points. The difference $$b-A\hat x$$ is the vertical distance of the graph from the data points, as indicated in the above picture. The best-fit linear function minimizes the sum of these vertical distances.

All of the above examples have the following form: some number of data points $$(x,y)$$ are specified, and we want to find a function

$y = B_1g_1(x) + B_2g_2(x) + \cdots + B_mg_m(x) \nonumber$

that best approximates these points, where $$g_1,g_2,\ldots,g_m$$ are fixed functions of $$x$$. Indeed, in the best-fit line example we had $$g_1(x)=x$$ and $$g_2(x)=1\text{;}$$ in the best-fit parabola example we had $$g_1(x)=x^2\text{,}$$ $$g_2(x)=x\text{,}$$ and $$g_3(x)=1\text{;}$$ and in the best-fit linear function example we had $$g_1(x_1,x_2)=x_1\text{,}$$ $$g_2(x_1,x_2)=x_2\text{,}$$ and $$g_3(x_1,x_2)=1$$ (in this example we take $$x$$ to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns $$B_1,B_2,\ldots,B_m$$—once we evaluate the $$g_i\text{,}$$ they just become numbers, so it does not matter what they are—and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of $$y = f(x)$$ to our original data points.

To emphasize that the nature of the functions $$g_i$$ really is irrelevant, consider the following example.

Example $$\PageIndex{9}$$: Best-fit trigonometric function

What is the best-fit function of the form

$y=B+C\cos(x)+D\sin(x)+E\cos(2x)+F\sin(2x)+G\cos(3x)+H\sin(3x) \nonumber$

passing through the points

$\left(\begin{array}{c}-4\\ -1\end{array}\right),\; \left(\begin{array}{c}-3\\ 0\end{array}\right),\; \left(\begin{array}{c}-2\\ -1.5\end{array}\right),\; \left(\begin{array}{c}-1\\ .5\end{array}\right),\; \left(\begin{array}{c}0\\1\end{array}\right),\; \left(\begin{array}{c}1\\-1\end{array}\right),\; \left(\begin{array}{c}2\\-.5\end{array}\right),\; \left(\begin{array}{c}3\\2\end{array}\right),\; \left(\begin{array}{c}4 \\-1\end{array}\right)? \nonumber$

Figure $$\PageIndex{19}$$

Solution

We want to solve the system of equations

$\begin{array}{rrrrrrrrrrrrrrr} -1 &=& B &+& C\cos(-4) &+& D\sin(-4) &+& E\cos(-8) &+& F\sin(-8) &+& G\cos(-12) &+& H\sin(-12)\\ 0 &=& B &+& C\cos(-3) &+& D\sin(-3) &+& E\cos(-6) &+& F\sin(-6) &+& G\cos(-9) &+& H\sin(-9)\\ -1.5 &=& B &+& C\cos(-2) &+& D\sin(-2) &+& E\cos(-4) &+& F\sin(-4) &+& G\cos(-6) &+& H\sin(-6) \\ 0.5 &=& B &+& C\cos(-1) &+& D\sin(-1) &+& E\cos(-2) &+& F\sin(-2) &+& G\cos(-3) &+& H\sin(-3)\\ 1 &=& B &+& C\cos(0) &+& D\sin(0) &+& E\cos(0) &+& F\sin(0) &+& G\cos(0) &+& H\sin(0)\\ -1 &=& B &+& C\cos(1) &+& D\sin(1) &+& E\cos(2) &+& F\sin(2) &+& G\cos(3) &+& H\sin(3)\\ -0.5 &=& B &+& C\cos(2) &+& D\sin(2) &+& E\cos(4) &+& F\sin(4) &+& G\cos(6) &+& H\sin(6)\\ 2 &=& B &+& C\cos(3) &+& D\sin(3) &+& E\cos(6) &+& F\sin(6) &+& G\cos(9) &+& H\sin(9)\\ -1 &=& B &+& C\cos(4) &+& D\sin(4) &+& E\cos(8) &+& F\sin(8) &+& G\cos(12) &+& H\sin(12).\end{array}\nonumber$

All of the terms in these equations are numbers, except for the unknowns $$B,C,D,E,F,G,H\text{:}$$

$\begin{array}{rrrrrrrrrrrrrrr} -1 &=& B &-& 0.6536C &+& 0.7568D &-& 0.1455E &-& 0.9894F &+& 0.8439G &+& 0.5366H\\ 0 &=& B &-& 0.9900C &-& 0.1411D &+& 0.9602E &+& 0.2794F &-& 0.9111G &-& 0.4121H\\ -1.5 &=& B &-& 0.4161C &-& 0.9093D &-& 0.6536E &+& 0.7568F &+& 0.9602G &+& 0.2794H\\ 0.5 &=& B &+& 0.5403C &-& 0.8415D &-& 0.4161E &-& 0.9093F &-& 0.9900G &-& 0.1411H\\ 1&=&B&+&C&{}&{}&+&E&{}&{}&+&G&{}&{}\\ -1 &=& B &+& 0.5403C &+& 0.8415D &-& 0.4161E &+& 0.9093F &-& 0.9900G &+& 0.1411H\\ -0.5 &=& B &-& 0.4161C &+& 0.9093D &-& 0.6536E &-& 0.7568F &+& 0.9602G &-& 0.2794H\\ 2 &=& B &-& 0.9900C &+& 0.1411D &+& 0.9602E &-& 0.2794F &-& 0.9111G &+& 0.4121H\\ -1 &=& B &-& 0.6536C &-& 0.7568D &-& 0.1455E &+& 0.9894F &+& 0.8439G &-& 0.5366H.\end{array}\nonumber$

Hence we want to solve the least-squares problem

$\left(\begin{array}{rrrrrrr} 1 &-0.6536& 0.7568& -0.1455& -0.9894& 0.8439 &0.5366\\ 1& -0.9900& -0.1411 &0.9602 &0.2794& -0.9111& -0.4121\\ 1& -0.4161& -0.9093& -0.6536& 0.7568& 0.9602& 0.2794\\ 1& 0.5403& -0.8415 &-0.4161& -0.9093& -0.9900 &-0.1411\\ 1& 1& 0& 1& 0& 1& 0\\ 1& 0.5403& 0.8415& -0.4161& 0.9093& -0.9900 &0.1411\\ 1& -0.4161& 0.9093& -0.6536& -0.7568& 0.9602& -0.2794\\ 1& -0.9900 &0.1411 &0.9602& -0.2794& -0.9111& 0.4121\\ 1& -0.6536& -0.7568& -0.1455& 0.9894 &0.8439 &-0.5366\end{array}\right)\left(\begin{array}{c}B\\C\\D\\E\\F\\G\\H\end{array}\right)=\left(\begin{array}{c}-1\\0\\-1.5\\0.5\\1\\-1\\-0.5\\2\\-1\end{array}\right).\nonumber$

We find the least-squares solution with the aid of a computer:

$\hat{x}\approx\left(\begin{array}{c} -0.1435 \\0.2611 \\-0.2337\\ 1.116\\ -0.5997\\ -0.2767 \\0.1076\end{array}\right).\nonumber$

Therefore, the best-fit function is

$\begin{split} y \amp\approx -0.1435 + 0.2611\cos(x) -0.2337\sin(x) + 1.116\cos(2x) -0.5997\sin(2x) \\ \amp\qquad\qquad -0.2767\cos(3x) + 0.1076\sin(3x). \end{split} \nonumber$

Figure $$\PageIndex{20}$$

As in the previous examples, the best-fit function minimizes the sum of the squares of the vertical distances from the graph of $$y = f(x)$$ to the data points.

The next example has a somewhat different flavor from the previous ones.

Example $$\PageIndex{10}$$: Best-fit ellipse

Find the best-fit ellipse through the points

$(0,2),\, (2,1),\, (1,-1),\, (-1,-2),\, (-3,1),\, (-1,-1). \nonumber$

Figure $$\PageIndex{22}$$

What quantity is being minimized?

Solution

The general equation for an ellipse (actually, for a nondegenerate conic section) is

$x^2 + By^2 + Cxy + Dx + Ey + F = 0. \nonumber$

This is an implicit equation: the ellipse is the set of all solutions of the equation, just like the unit circle is the set of solutions of $$x^2+y^2=1.$$ To say that our data points lie on the ellipse means that the above equation is satisfied for the given values of $$x$$ and $$y\text{:}$$

$\label{eq:4} \begin{array}{rrrrrrrrrrrrl} (0)^2 &+& B(2)^2 &+& C(0)(2)&+& D(0) &+& E(2)&+& F&=& 0 \\ (2)^2 &+& B(1)^2 &+& C(2)(1)&+& D(2)&+&E(1)&+&F&=& 0 \\ (1)^2&+& B(-1)^2&+&C(1)(-1)&+&D(1)&+&E(-1)&+&F&=&0 \\ (-1)^2&+&B(-2)^2&+&C(-1)(2)&+&D(-1)&+&E(-2)&+&F&=&0 \\ (-3)^2&+&B(1)^2&+&C(-3)(1)&+&D(-3)&+&E(1)&+&F&=&0 \\ (-1)^2&+&B(-1)^2&+&C(-1)(-1)&+&D(-1)&+&E(-1)&+&F&=&0.\end{array}$

To put this in matrix form, we move the constant terms to the right-hand side of the equals sign; then we can write this as $$Ax=b$$ for

$A=\left(\begin{array}{ccccc}4&0&0&2&1\\1&2&2&1&1\\1&-1&1&-1&1 \\ 4&2&-1&-2&1 \\ 1&-3&-3&1&1 \\ 1&1&-1&-1&1\end{array}\right)\quad x=\left(\begin{array}{c}B\\C\\D\\E\\F\end{array}\right)\quad b=\left(\begin{array}{c}0\\-4\\-1\\-1\\-9\\-1\end{array}\right).\nonumber$

We compute

$A^TA = \left(\begin{array}{ccccc}36&7&-5&0&12 \\ 7&19&9&-5&1 \\ -5&9&16&1&-2 \\ 0&-5&1&12&0 \\ 12&1&-2&0&6\end{array}\right) \qquad A^T b = \left(\begin{array}{c}-19\\17\\20\\-9\\-16\end{array}\right). \nonumber$

We form an augmented matrix and row reduce:

$\left(\begin{array}{ccccc|c} 36 &7& -5& 0& 12& -19\\ 7& 19& 9& -5& 1& 17\\ -5& 9& 16& 1& -2& 20\\ 0& -5& 1 &12& 0& -9\\ 12& 1& -2& 0& 6& -16\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccccc|c} 1& 0& 0& 0& 0& 405/266\\ 0& 1& 0& 0& 0& -89/133\\ 0& 0& 1& 0& 0& 201/133\\ 0& 0& 0& 1& 0& -123/266\\ 0& 0& 0& 0& 1& -687/133\end{array}\right).\nonumber$

The least-squares solution is

$\hat{x}=\left(\begin{array}{c}405/266\\ -89/133\\ 201/133\\ -123/266\\ -687/133\end{array}\right),\nonumber$

so the best-fit ellipse is

$x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} = 0. \nonumber$

Multiplying through by $$266\text{,}$$ we can write this as

$266 x^2 + 405 y^2 - 178 xy + 402 x - 123 y - 1374 = 0. \nonumber$

Figure $$\PageIndex{23}$$

Now we consider the question of what quantity is minimized by this ellipse. The least-squares solution $$\hat x$$ minimizes the sum of the squares of the entries of the vector $$b-A\hat x\text{,}$$ or equivalently, of $$A\hat x-b$$. The vector $$-b$$ contains the constant terms of the left-hand sides of $$\eqref{eq:4}$$, and

$A\hat{x}=\left(\begin{array}{rrrrrrrrr} \frac{405}{266}(2)^2 &-& \frac{89}{133}(0)(2)&+&\frac{201}{133}(0)&-&\frac{123}{266}(2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-& \frac{89}{133}(2)(1)&+&\frac{201}{133}(2)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2 &-&\frac{89}{133}(1)(-1)&+&\frac{201}{133}(1)&-&\frac{123}{266}(-1)&-&\frac{687}{133} \\ \frac{405}{266}(-2)^2&-&\frac{89}{133}(-1)(-2)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-&\frac{89}{133}(-3)(1)&+&\frac{201}{133}(-3)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2&-&\frac{89}{133}(-1)(-1)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-1)&-&\frac{687}{133}\end{array}\right)\nonumber$

contains the rest of the terms on the left-hand side of $$\eqref{eq:4}$$. Therefore, the entries of $$A\hat x-b$$ are the quantities obtained by evaluating the function

$f(x,y) = x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} \nonumber$

on the given data points.

If our data points actually lay on the ellipse defined by $$f(x,y)=0\text{,}$$ then evaluating $$f(x,y)$$ on our data points would always yield zero, so $$A\hat x-b$$ would be the zero vector. This is not the case; instead, $$A\hat x-b$$ contains the actual values of $$f(x,y)$$ when evaluated on our data points. The quantity being minimized is the sum of the squares of these values:

$\begin{split} \amp\text{minimized} = \\ \amp\quad f(0,2)^2 + f(2,1)^2 + f(1,-1)^2 + f(-1,-2)^2 + f(-3,1)^2 + f(-1,-1)^2. \end{split} \nonumber$

One way to visualize this is as follows. We can put this best-fit problem into the framework of Example $$\PageIndex{8}$$ by asking to find an equation of the form

$f(x,y) = x^2 + By^2 + Cxy + Dx + Ey + F \nonumber$

which best approximates the data table

$\begin{array}{r|r|c} x & y & f(x,y) \\\hline 0 & 2 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 0 \\ -1 & -2 & 0 \\ -3 & 1 & 0 \\ -1 & -1 & 0\rlap. \end{array} \nonumber$

The resulting function minimizes the sum of the squares of the vertical distances from these data points $$(0,2,0),\,(2,1,0),\,\ldots\text{,}$$ which lie on the $$xy$$-plane, to the graph of $$f(x,y)$$.

Note $$\PageIndex{3}$$

Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.

6.5: The Method of Least Squares is shared under a GNU Free Documentation License 1.3 license and was authored, remixed, and/or curated by LibreTexts.