6.5: The Method of Least Squares

Last updated
Save as PDF

Page ID: 79396

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $

$ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $

$ \newcommand{\dsum}{\displaystyle\sum\limits} $

$ \newcommand{\dint}{\displaystyle\int\limits} $

$ \newcommand{\dlim}{\displaystyle\lim\limits} $

$ \newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$

( \newcommand{\kernel}{\mathrm{null}\,}\) $ \newcommand{\range}{\mathrm{range}\,}$

$ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$

$ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$

$ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$

$ \newcommand{\Span}{\mathrm{span}}$

$ \newcommand{\id}{\mathrm{id}}$

$ \newcommand{\Span}{\mathrm{span}}$

$ \newcommand{\kernel}{\mathrm{null}\,}$

$ \newcommand{\range}{\mathrm{range}\,}$

$ \newcommand{\RealPart}{\mathrm{Re}}$

$ \newcommand{\ImaginaryPart}{\mathrm{Im}}$

$ \newcommand{\Argument}{\mathrm{Arg}}$

$ \newcommand{\norm}[1]{\| #1 \|}$

$ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$

$ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\AA}{\unicode[.8,0]{x212B}}$

$ \newcommand{\vectorA}[1]{\vec{#1}} % arrow$

$ \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow$

$ \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $

$ \newcommand{\vectorC}[1]{\textbf{#1}} $

$ \newcommand{\vectorD}[1]{\overrightarrow{#1}} $

$ \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} $

$ \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} $

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $

$\newcommand{\longvect}{\overrightarrow}$

$ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $

$\newcommand{\avec}{\mathbf a}$ $\newcommand{\bvec}{\mathbf b}$ $\newcommand{\cvec}{\mathbf c}$ $\newcommand{\dvec}{\mathbf d}$ $\newcommand{\dtil}{\widetilde{\mathbf d}}$ $\newcommand{\evec}{\mathbf e}$ $\newcommand{\fvec}{\mathbf f}$ $\newcommand{\nvec}{\mathbf n}$ $\newcommand{\pvec}{\mathbf p}$ $\newcommand{\qvec}{\mathbf q}$ $\newcommand{\svec}{\mathbf s}$ $\newcommand{\tvec}{\mathbf t}$ $\newcommand{\uvec}{\mathbf u}$ $\newcommand{\vvec}{\mathbf v}$ $\newcommand{\wvec}{\mathbf w}$ $\newcommand{\xvec}{\mathbf x}$ $\newcommand{\yvec}{\mathbf y}$ $\newcommand{\zvec}{\mathbf z}$ $\newcommand{\rvec}{\mathbf r}$ $\newcommand{\mvec}{\mathbf m}$ $\newcommand{\zerovec}{\mathbf 0}$ $\newcommand{\onevec}{\mathbf 1}$ $\newcommand{\real}{\mathbb R}$ $\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}$ $\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}$ $\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}$ $\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}$ $\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}$ $\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}$ $\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}$ $\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}$ $\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}$ $\newcommand{\laspan}[1]{\text{Span}\{#1\}}$ $\newcommand{\bcal}{\cal B}$ $\newcommand{\ccal}{\cal C}$ $\newcommand{\scal}{\cal S}$ $\newcommand{\wcal}{\cal W}$ $\newcommand{\ecal}{\cal E}$ $\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}$ $\newcommand{\gray}[1]{\color{gray}{#1}}$ $\newcommand{\lgray}[1]{\color{lightgray}{#1}}$ $\newcommand{\rank}{\operatorname{rank}}$ $\newcommand{\row}{\text{Row}}$ $\newcommand{\col}{\text{Col}}$ $\renewcommand{\row}{\text{Row}}$ $\newcommand{\nul}{\text{Nul}}$ $\newcommand{\var}{\text{Var}}$ $\newcommand{\corr}{\text{corr}}$ $\newcommand{\len}[1]{\left|#1\right|}$ $\newcommand{\bbar}{\overline{\bvec}}$ $\newcommand{\bhat}{\widehat{\bvec}}$ $\newcommand{\bperp}{\bvec^\perp}$ $\newcommand{\xhat}{\widehat{\xvec}}$ $\newcommand{\vhat}{\widehat{\vvec}}$ $\newcommand{\uhat}{\widehat{\uvec}}$ $\newcommand{\what}{\widehat{\wvec}}$ $\newcommand{\Sighat}{\widehat{\Sigma}}$ $\newcommand{\lt}{<}$ $\newcommand{\gt}{>}$ $\newcommand{\amp}{&}$ $\definecolor{fillinmathshade}{gray}{0.9}$

Learning Objectives

Learn examples of best-fit problems.
Learn to turn a best-fit problem into a least-squares problem.
Recipe: find a least-squares solution (two ways). solution.
Picture: geometry of a least-squares solution.
Vocabulary words: least-squares solution.

In this section, we answer the following important question:

Suppose that $Ax=b$ does not have a solution. What is the best approximate solution?

For our purposes, the best approximate solution is called the least-squares solution. We will present two methods for finding least-squares solutions, and we will give several applications to best-fit problems.

Least-Squares Solutions

We begin by clarifying exactly what we will mean by a “best approximate solution” to an inconsistent matrix equation $Ax=b$.

Definition $\PageIndex{1}$: Least-Squares Solution

Let $A$ be an $m\times n$ matrix and let $b$ be a vector in $\mathbb{R}^m $. A least-squares solution of the matrix equation $Ax=b$ is a vector $\hat x$ in $\mathbb{R}^n $ such that

\[ \text{dist}(b,\,A\hat x) \leq \text{dist}(b,\,Ax) \nonumber \]

for all other vectors $x$ in $\mathbb{R}^n $.

Recall that $\text{dist}(v,w) = \|v-w\|$ is the distance, Definition 6.1.2 in Section 6.1, between the vectors $v$ and $w$. The term “least squares” comes from the fact that $\text{dist}(b,Ax) = \|b-A\hat x\|$ is the square root of the sum of the squares of the entries of the vector $b-A\hat x$. So a least-squares solution minimizes the sum of the squares of the differences between the entries of $A\hat x$ and $b$. In other words, a least-squares solution solves the equation $Ax=b$ as closely as possible, in the sense that the sum of the squares of the difference $b-Ax$ is minimized.

Least Squares: Picture

Suppose that the equation $Ax=b$ is inconsistent. Recall from Note 2.3.6 in Section 2.3 that the column space of $A$ is the set of all other vectors $c$ such that $Ax=c$ is consistent. In other words, $\text{Col}(A)$ is the set of all vectors of the form $Ax.$ Hence, the closest vector, Note 6.3.1 in Section 6.3, of the form $Ax$ to $b$ is the orthogonal projection of $b$ onto $\text{Col}(A)$. This is denoted $b_{\text{Col}(A)}\text{,}$ following Definition 6.3.1 in Section 6.3.

$A 3D graph of a vector b projected onto a plane, showing components $b - \hat{A}x = a_{\text{closest}}$, $\hat{A}^TAx$, and orthogonal projection lines.$

Figure $\PageIndex{1}$

Note $\PageIndex{1}$

A least-squares solution of $Ax=b$ is a solution $\hat x$ of the consistent equation $Ax=b_{\text{Col}(A)}$

Note $\PageIndex{2}$

If $Ax=b$ is consistent, then $b_{\text{Col}(A)} = b\text{,}$ so that a least-squares solution is the same as a usual solution.

Where is $\hat x$ in this picture? If $v_1,v_2,\ldots,v_n$ are the columns of $A\text{,}$ then

\[ A\hat x = A\left(\begin{array}{c}\hat{x}_1 \\ \hat{x}_2 \\ \vdots \\ \hat{x}_{n}\end{array}\right) = \hat x_1v_1 + \hat x_2v_2 + \cdots + \hat x_nv_n. \nonumber \]

Hence the entries of $\hat x$ are the “coordinates” of $b_{\text{Col}(A)}$ with respect to the spanning set $\{v_1,v_2,\ldots,v_m\}$ of $\text{Col}(A)$. (They are honest $\mathcal{B}$-coordinates if the columns of $A$ are linearly independent.)

$A diagram showing two vectors, n1 and n2, on a purple plane labeled Ceil.N, with angles and forces annotated. A force labeled AF in force points vertically upwards.$

Figure $\PageIndex{2}$

$A 3D plot illustrating a transformation of vectors and a matrix equation with an overlay of a purple plane.$

Figure $\PageIndex{3}$: The violet plane is $\text{Col}(A)$. The closest that $Ax$ can get to $b$ is the closest vector on $\text{Col}(A)$ to $b\text{,}$ which is the orthogonal projection $b_{\text{Col}(A)}$ (in blue). The vectors $v_1,v_2$ are the columns of $A\text{,}$ and the coefficients of $\hat x$ are the lengths of the green lines. Click and drag $b$ to move it.

We learned to solve this kind of orthogonal projection problem in Section 6.3.

Theorem $\PageIndex{1}$

Let $A$ be an $m\times n$ matrix and let $b$ be a vector in $\mathbb{R}^m $. The least-squares solutions of $Ax=b$ are the solutions of the matrix equation

\[ A^TAx = A^Tb \nonumber \]

Proof: By Theorem 6.3.2 in Section 6.3, if $\hat x$ is a solution of the matrix equation $A^TAx = A^Tb\text{,}$ then $A\hat x$ is equal to $b_{\text{Col}(A)}$. We argued above that a least-squares solution of $Ax=b$ is a solution of $Ax = b_{\text{Col}(A)}.$

In particular, finding a least-squares solution means solving a consistent system of linear equations. We can translate the above theorem into a recipe:

Recipe 1: Compute a Least-Squares Solution

Let $A$ be an $m\times n$ matrix and let $b$ be a vector in $\mathbb{R}^n $. Here is a method for computing a least-squares solution of $Ax=b\text{:}$

Compute the matrix $A^TA$ and the vector $A^Tb$.
Form the augmented matrix for the matrix equation $A^TAx = A^Tb\text{,}$ and row reduce.
This equation is always consistent, and any solution $\hat x$ is a least-squares solution.

To reiterate: once you have found a least-squares solution $\hat x$ of $Ax=b\text{,}$ then $b_{\text{Col}(A)}$ is equal to $A\hat x$.

Example $\PageIndex{1}$

Find the least-squares solutions of $Ax=b$ where:

\[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) \qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

What quantity is being minimized?

Solution

We have

\[ A^TA = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) = \left(\begin{array}{cc}5&3\\3&3\end{array}\right) \nonumber \]

and

\[ A^T b = \left(\begin{array}{ccc}0&1&2\\1&1&1\end{array}\right)\left(\begin{array}{c}6\\0\\0\end{array}\right) = \left(\begin{array}{c}0\\6\end{array}\right). \nonumber \]

We form an augmented matrix and row reduce:

\[ \left(\begin{array}{cc|c}5&3&0\\3&3&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&-3\\0&1&5\end{array}\right). \nonumber \]

Therefore, the only least-squares solution is $\hat x = {-3\choose 5}.$

This solution minimizes the distance from $A\hat x$ to $b\text{,}$ i.e., the sum of the squares of the entries of $b-A\hat x = b-b_{\text{Col}(A)} = b_{\text{Col}(A)^\perp}$. In this case, we have

\[||b-A\hat{x}||=\left|\left|\left(\begin{array}{c}6\\0\\0\end{array}\right)-\left(\begin{array}{c}5\\2\\-1\end{array}\right)\right|\right|=\left|\left|\left(\begin{array}{c}1\\-2\\1\end{array}\right)\right|\right|=\sqrt{1^2+(-2)^2+1^2}=\sqrt{6}.\nonumber\]

Therefore, $b_{\text{Col}(A)} = A\hat x$ is $\sqrt 6$ units from $b.$

In the following picture, $v_1,v_2$ are the columns of $A\text{:}$

$A 3D coordinate plane with vectors labeled v1 and v2 on the surface. The vector b is pointing outwards, perpendicular. The plane is labeled ColA, and an equation involving b is shown.$

Figure $\PageIndex{4}$

$A 3D plot with a purple plane representing a linear equation. Includes vectors and matrices with values, and labeled points in red, blue, and orange.$

Figure $\PageIndex{5}$: The violet plane is $\text{Col}(A)$. The closest that $Ax$ can get to $b$ is the closest vector on $\text{Col}(A)$ to $b\text{,}$ which is the orthogonal projection $b_{\text{Col}(A)}$ (in blue). The vectors $v_1,v_2$ are the columns of $A\text{,}$ and the coefficients of $\hat x$ are the $\mathcal{B}$-coordinates of $b_{\text{Col}(A)}\text{,}$ where $\mathcal{B} = \{v_1,v_2\}$. lengths of the green lines.

Example $\PageIndex{2}$

Find the least-squares solutions of $Ax=b$ where:

\[ A = \left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right) \qquad b = \left(\begin{array}{c}1\\0\\-1\end{array}\right). \nonumber \]

Solution

We have

\[ A^T A = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{cc}2&0\\-1&1\\0&2\end{array}\right) = \left(\begin{array}{cc}5&-1\\-1&5\end{array}\right) \nonumber \]

and

\[ A^T b = \left(\begin{array}{ccc}2&-1&0\\0&1&2\end{array}\right)\left(\begin{array}{c}1\\0\\-1\end{array}\right) = \left(\begin{array}{c}2\\-2\end{array}\right). \nonumber \]

We form an augmented matrix and row reduce:

\[ \left(\begin{array}{cc|c}5&-1&2\\-1&5&-2\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{cc|c}1&0&1/3\\0&1&-1/3\end{array}\right). \nonumber \]

Therefore, the only least-squares solution is $\hat x = \frac 13{1\choose -1}.$

$3D coordinate plane with translucent purple plane and vectors in red, blue, green, orange. Equations and matrices are displayed at the top within a boxed area.$

Figure $\PageIndex{6}$: The red plane is $\text{Col}(A)$. The closest that $Ax$ can get to $b$ is the closest vector on $\text{Col}(A)$ to $b\text{,}$ which is the orthogonal projection $b_{\text{Col}(A)}$ (in blue). The vectors $v_1,v_2$ are the columns of $A\text{,}$ and the coefficients of $\hat x$ are the $\mathcal{B}$-coordinates of $b_{\text{Col}(A)}\text{,}$ where $\mathcal{B} = \{v_1,v_2\}$. lengths of the green lines.

The reader may have noticed that we have been careful to say “the least-squares solutions” in the plural, and “a least-squares solution” using the indefinite article. This is because a least-squares solution need not be unique: indeed, if the columns of $A$ are linearly dependent, then $Ax=b_{\text{Col}(A)}$ has infinitely many solutions. The following theorem, which gives equivalent criteria for uniqueness, is an analogue of Corollary 6.3.1 in Section 6.3.

Theorem $\PageIndex{2}$

Let $A$ be an $m\times n$ matrix and let $b$ be a vector in $\mathbb{R}^m $. The following are equivalent:

$Ax=b$ has a unique least-squares solution.
The columns of $A$ are linearly independent.
$A^TA$ is invertible.

In this case, the least-squares solution is

\[ \hat x = (A^TA)^{-1} A^Tb. \nonumber \]

Proof: The set of least-squares solutions of $Ax=b$ is the solution set of the consistent equation $A^TAx=A^Tb\text{,}$ which is a translate of the solution set of the homogeneous equation $A^TAx=0$. Since $A^TA$ is a square matrix, the equivalence of 1 and 3 follows from Theorem 5.1.1 in Section 5.1. The set of least squares-solutions is also the solution set of the consistent equation $Ax = b_{\text{Col}(A)}\text{,}$ which has a unique solution if and only if the columns of $A$ are linearly independent by Recipe: Checking Linear Independence in Section 2.5.

Example $\PageIndex{3}$: Infinitely many least-squares solutions

Find the least-squares solutions of $Ax=b$ where:

\[ A = \left(\begin{array}{ccc}1&0&1\\1&1&-1\\1&2&-3\end{array}\right) \qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

Solution

We have

\[ A^TA = \left(\begin{array}{ccc}3&3&-3\\3&5&-7\\-3&-7&11\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}6\\0\\6\end{array}\right). \nonumber \]

We form an augmented matrix and row reduce:

\[ \left(\begin{array}{ccc|c}3&3&-3&6\\3&5&-7&0\\-3&-7&11&6\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&1&5\\0&1&-2&-3\\0&0&0&0\end{array}\right). \nonumber \]

The free variable is $x_3\text{,}$ so the solution set is

\[\left\{\begin{array}{rrrrr}x_1 &=& -x_3 &+& 5\\ x_2 &=& 2x_3 &-& 3\\ x_3 &=& x_3&{}&{}\end{array}\right. \quad\xrightarrow[\text{vector form}]{\text{parametric}}\quad \hat{x}=\left(\begin{array}{c}x_1\\x_2\\x_3\end{array}\right)=x_3\left(\begin{array}{c}-1\\2\\1\end{array}\right)+\left(\begin{array}{c}5\\-3\\0\end{array}\right).\nonumber\]

For example, taking $x_3 = 0$ and $x_3=1$ gives the least-squares solutions

\[ \hat x = \left(\begin{array}{c}5\\-3\\0\end{array}\right)\quad\text{and}\quad \hat x =\left(\begin{array}{c}4\\-1\\1\end{array}\right). \nonumber \]

Geometrically, we see that the columns $v_1,v_2,v_3$ of $A$ are coplanar:

$A 3D diagram showing a tilted purple plane labeled Child with two vector arrows: one labeled Child.A at the center and another labeled v.Aim - AE pointing diagonally upward.$

Figure $\PageIndex{7}$

Therefore, there are many ways of writing $b_{\text{Col}(A)}$ as a linear combination of $v_1,v_2,v_3$.

$A 3D plot showing a plane and labeled vectors. The equation $ \textbf{A}\mathbf{x} = \textbf{b}_{\text{est}} $ is illustrated with matrices and vectors. The vector $\textbf{b}_{\text{est}}$ is highlighted.$

Figure $\PageIndex{8}$: The three columns of $A$ are coplanar, so there are many least-squares solutions. (The demo picks one solution when you move $b$.)

As usual, calculations involving projections become easier in the presence of an orthogonal set. Indeed, if $A$ is an $m\times n$ matrix with orthogonal columns $u_1,u_2,\ldots,u_m\text{,}$ then we can use the Theorem 6.4.1 in Section 6.4 to write

\[ b_{\text{Col}(A)} = \frac{b\cdot u_1}{u_1\cdot u_1}\,u_1 + \frac{b\cdot u_2}{u_2\cdot u_2}\,u_2 + \cdots + \frac{b\cdot u_m}{u_m\cdot u_m}\,u_m = A\left(\begin{array}{c}(b\cdot u_1)/(u_1\cdot u_1) \\ (b\cdot u_2)/(u_2\cdot u_2) \\ \vdots \\ (b\cdot u_m)/(u_m\cdot u_m)\end{array}\right). \nonumber \]

Note that the least-squares solution is unique in this case, since an orthogonal set is linearly independent, Fact 6.4.1 in Section 6.4.

Recipe 2: Compute a Least-Squares Solution

Let $A$ be an $m\times n$ matrix with orthogonal columns $u_1,u_2,\ldots,u_m\text{,}$ and let $b$ be a vector in $\mathbb{R}^n $. Then the least-squares solution of $Ax=b$ is the vector

\[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \ldots,\; \frac{b\cdot u_m}{u_m\cdot u_m} \right). \nonumber \]

This formula is particularly useful in the sciences, as matrices with orthogonal columns often arise in nature.

Example $\PageIndex{4}$

Find the least-squares solution of $Ax=b$ where:

\[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

Solution

Let $u_1,u_2,u_3$ be the columns of $A$. These form an orthogonal set, so

\[ \hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\; \frac{b\cdot u_2}{u_2\cdot u_2},\; \frac{b\cdot u_3}{u_3\cdot u_3} \right) = \left(\frac{-3}{2},\;\frac{-3}{2},\;\frac{8}{4}\right) = \left(-\frac32,\;-\frac32,\;2\right). \nonumber \]

Compare Example 6.4.8 in Section 6.4.

Best-Fit Problems

In this subsection we give an application of the method of least squares to data modeling. We begin with a basic example.

Example $\PageIndex{5}$: Best-Fit Line

Suppose that we have measured three data points

\[ (0,6),\quad (1,0),\quad (2,0), \nonumber \]

and that our model for these data asserts that the points should lie on a line. Of course, these three points do not actually lie on a single line, but this could be due to errors in our measurement. How do we predict which line they are supposed to lie on?

$Graph with three points: (0,6) at the top, (1,0) in the middle, (2,0) at the bottom, plotted on a grid with labeled coordinates.$

Figure $\PageIndex{9}$

The general equation for a (non-vertical) line is

\[ y = Mx + B. \nonumber \]

If our three data points were to lie on this line, then the following equations would be satisfied:

\[\begin{align} 6&=M\cdot 0+B\nonumber \\ 0&=M\cdot 1+B\label{eq:1} \\ 0&=M\cdot 2+B.\nonumber\end{align}\]

In order to find the best-fit line, we try to solve the above equations in the unknowns $M$ and $B$. As the three points do not actually lie on a line, there is no actual solution, so instead we compute a least-squares solution.

Putting our linear equations into matrix form, we are trying to solve $Ax=b$ for

\[ A = \left(\begin{array}{cc}0&1\\1&1\\2&1\end{array}\right) \qquad x = \left(\begin{array}{c}M\\B\end{array}\right)\qquad b = \left(\begin{array}{c}6\\0\\0\end{array}\right). \nonumber \]

We solved this least-squares problem in Example $\PageIndex{1}$: the only least-squares solution to $Ax=b$ is $\hat x = {M\choose B} = {-3\choose 5}\text{,}$ so the best-fit line is

\[ y = -3x + 5. \nonumber \]

$A graph with a red line labeled y + 3x = -5 intersects at points (0,6), (1,0), and (2,0) on a grid.$

Figure $\PageIndex{10}$

What exactly is the line $y= f(x) = -3x+5$ minimizing? The least-squares solution $\hat x$ minimizes the sum of the squares of the entries of the vector $b-A\hat x$. The vector $b$ is the left-hand side of $\eqref{eq:1}$, and

\[ A\left(\begin{array}{c}-3\\5\end{array}\right) = \left(\begin{array}{c}-3(0)+5\\-3(1)+5\\-3(2)+5\end{array}\right) = \left(\begin{array}{c}f(0)\\f(1)\\f(2)\end{array}\right). \nonumber \]

In other words, $A\hat x$ is the vector whose entries are the $y$-coordinates of the graph of the line at the values of $x$ we specified in our data points, and $b$ is the vector whose entries are the $y$-coordinates of those data points. The difference $b-A\hat x$ is the vertical distance of the graph from the data points:

$Graph with a red line for equation $y = -3x + 5$, passing through points (0,6) and (2,0). Black points marked at (0,6), (1,0), and (2,0). Grid background with vertical and horizontal lines.$

Figure $\PageIndex{11}$

\[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}6\\0\\0\end{array}\right)-A\left(\begin{array}{c}-3\\5\end{array}\right)=\left(\begin{array}{c}-1\\2\\-1\end{array}\right)}\nonumber\]

The best-fit line minimizes the sum of the squares of these vertical distances.

Example $\PageIndex{6}$: Interactive: Best-fit line

$Graph of the function $f(x) = -3x + 5$ with a red line crossing the y-axis at 5 and x-axis at approximately 1.67. Grid lines and some mathematical expressions are shown in the background.$

Figure $\PageIndex{12}$: The best-fit line minimizes the sum of the squares of the vertical distances (violet). Click and drag the points to see how the best-fit line changes.

Example $\PageIndex{7}$: Best-fit parabola

Find the parabola that best approximates the data points

\[ (-1,\,1/2),\quad(1,\,-1),\quad(2,\,-1/2),\quad(3,\,2). \nonumber \]

$A graph with labeled points: (-1, 1/2) in the second quadrant, (3, 2) in the first quadrant, (1, -1) and (2, -1/2) in the fourth quadrant.$

Figure $\PageIndex{13}$

What quantity is being minimized?

Solution

The general equation for a parabola is

\[ y = Bx^2 + Cx + D. \nonumber \]

If the four points were to lie on this parabola, then the following equations would be satisfied:

\[\begin{align} \frac{1}{2}&=B(-1)^2+C(-1)+D\nonumber \\ -1&=B(1)^2+C(1)+D\nonumber \\ -\frac{1}{2}&=B(2)^2+C(2)+D\label{eq:2} \\ 2&=B(3)^2+C(3)+D.\nonumber\end{align}\]

We treat this as a system of equations in the unknowns $B,C,D$. In matrix form, we can write this as $Ax=b$ for

\[ A = \left(\begin{array}{ccc}1&-1&1\\1&1&1\\4&2&1\\9&3&1\end{array}\right) \qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right) \qquad b = \left(\begin{array}{c}1/2 \\ -1\\-1/2 \\ 2\end{array}\right). \nonumber \]

We find a least-squares solution by multiplying both sides by the transpose:

\[ A^TA = \left(\begin{array}{ccc}99&35&15\\35&15&5\\15&5&4\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}31/2\\7/2\\1\end{array}\right), \nonumber \]

then forming an augmented matrix and row reducing:

\[ \left(\begin{array}{ccc|c}99&35&15&31/2 \\ 35&15&5&7/2 \\ 15&5&4&1\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccc|c}1&0&0&53/88 \\ 0&1&0&-379/440 \\ 0&0&1&-41/44\end{array}\right) \implies \hat x = \left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44 \end{array}\right). \nonumber \]

The best-fit parabola is

\[ y = \frac{53}{88}x^2 - \frac{379}{440}x - \frac{41}{44}. \nonumber \]

Multiplying through by $88\text{,}$ we can write this as

\[ 88y = 53x^2 - \frac{379}{5}x - 82. \nonumber \]

$A red parabola graph labeled with the equation $88y = 53x^2 - \frac{379}{5}x - 82$, passing through points (-1, 0.5), (1, -1), (2, -0.5), and (3, 2) on a grid.$

Figure $\PageIndex{14}$

Now we consider what exactly the parabola $y = f(x)$ is minimizing. The least-squares solution $\hat x$ minimizes the sum of the squares of the entries of the vector $b-A\hat x$. The vector $b$ is the left-hand side of $\eqref{eq:2}$, and

\[A\hat{x}=\left(\begin{array}{c} \frac{53}{88}(-1)^2-\frac{379}{440}(-1)-\frac{41}{44} \\ \frac{53}{88}(1)^2-\frac{379}{440}(1)-\frac{41}{44} \\ \frac{53}{88}(2)^2-\frac{379}{440}(2)-\frac{41}{44} \\ \frac{53}{88}(3)^2-\frac{379}{440}(3)-\frac{41}{44}\end{array}\right)=\left(\begin{array}{c}f(-1) \\ f(1) \\ f(2) \\ f(3)\end{array}\right).\nonumber\]

In other words, $A\hat x$ is the vector whose entries are the $y$-coordinates of the graph of the parabola at the values of $x$ we specified in our data points, and $b$ is the vector whose entries are the $y$-coordinates of those data points. The difference $b-A\hat x$ is the vertical distance of the graph from the data points:

$A red quadratic curve on a graph with labeled points at (-7, 22), (-3, 20), (0, -82), (14, 54), and the vertex at (71, 110). Equation: 88y = 53x² - 379/5x - 82.$

Figure $\PageIndex{15}$

\[\color{blue}{b-A\hat{x}=\left(\begin{array}{c}1/2 \\ -1 \\ -1/2 \\ 2\end{array}\right)-A\left(\begin{array}{c}53/88 \\ -379/440 \\ -41/44\end{array}\right)=\left(\begin{array}{c}-7/220 \\ 21/110 \\ -14/55 \\ 21/220\end{array}\right)}\nonumber\]

The best-fit parabola minimizes the sum of the squares of these vertical distances.

$Graph of a polynomial curve (red line) fitted against six data points (blue). Equation and values are shown above the plot. The curve is parabolic, indicating a quadratic function.$

Figure $\PageIndex{16}$: The best-fit parabola minimizes the sum of the squares of the vertical distances (violet). Click and drag the points to see how the best-fit parabola changes.

Example $\PageIndex{8}$: Best-fit linear function

Find the linear function $f(x,y)$ that best approximates the following data:

\[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 1 & 0 & 0 \\ 0 & 1 & 1 \\ -1 & 0 & 3 \\ 0 & -1 & 4 \end{array} \nonumber \]

What quantity is being minimized?

Solution

The general equation for a linear function in two variables is

\[ f(x,y) = Bx + Cy + D. \nonumber \]

We want to solve the following system of equations in the unknowns $B,C,D\text{:}$

\[\begin{align} B(1)+C(0)+D&=0 \nonumber \\ B(0)+C(1)+D&=1 \nonumber \\ B(-1)+C(0)+D&=3\label{eq:3} \\ B(0)+C(-1)+D&=4\nonumber\end{align}\]

In matrix form, we can write this as $Ax=b$ for

\[ A = \left(\begin{array}{ccc}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{array}\right) \qquad x = \left(\begin{array}{c}B\\C\\D\end{array}\right) \qquad b = \left(\begin{array}{c}0\\1\\3\\4\end{array}\right). \nonumber \]

We observe that the columns $u_1,u_2,u_3$ of $A$ are orthogonal, so we can use Recipe 2: Compute a Least-Squares Solution:

\[ A^TA = \left(\begin{array}{ccc}2&0&0\\0&2&0\\0&0&4\end{array}\right) \qquad A^Tb = \left(\begin{array}{c}-3\\-3\\8\end{array}\right). \nonumber \] The matrix $A^TA$ is diagonal (do you see why that happened?), so it is easy to solve the equation $A^TAx = A^Tb\text{:}$

\[ \left(\begin{array}{cccc}2&0&0&-3 \\ 0&2&0&-3 \\ 0&0&4&8\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{cccc}1&0&0&-3/2 \\ 0&1&0&-3/2 \\ 0&0&1&2\end{array}\right) \implies \hat x = \left(\begin{array}{c}-3/2 \\ -3/2 \\ 2\end{array}\right). \nonumber \] Therefore, the best-fit linear equation is

\[ f(x,y) = -\frac 32x - \frac32y + 2. \nonumber \]

Here is a picture of the graph of $f(x,y)\text{:}$

$3D geometric diagram with labeled points on a rhombohedral lattice. Points include (0,-1,0), (0,1,0), (0,0,1), (0,0,-1), (1,0,0), (-1,0,0), (1,-1,0), (0,-1,-1), (0,-1,1), shaded pink.$

Figure $\PageIndex{17}$

Now we consider what quantity is being minimized by the function $f(x,y)$. The least-squares solution $\hat x$ minimizes the sum of the squares of the entries of the vector $b-A\hat x$. The vector $b$ is the right-hand side of $\eqref{eq:3}$, and

\[A\hat{x}=\left(\begin{array}{rrrrr}-\frac32(1) &-& \frac32(0) &+& 2\\ -\frac32(0) &-& \frac32(1) &+& 2\\ -\frac32(-1) &-& \frac32(0) &+& 2\\ -\frac32(0) &-& \frac32(-1) &+& 2\end{array}\right)=\left(\begin{array}{c}f(1,0) \\ f(0,1) \\ f(-1,0) \\ f(0,-1)\end{array}\right).\nonumber\]

In other words, $A\hat x$ is the vector whose entries are the values of $f$ evaluated on the points $(x,y)$ we specified in our data table, and $b$ is the vector whose entries are the desired values of $f$ evaluated at those points. The difference $b-A\hat x$ is the vertical distance of the graph from the data points, as indicated in the above picture. The best-fit linear function minimizes the sum of these vertical distances.

$A 3D plot shows a red plane intersecting a grid. Blue points and arrows mark specific locations. The equation f(x,y) = -1.50x - 1.50y + 2.00 is displayed above the plot.$

Figure $\PageIndex{18}$: The best-fit linear function minimizes the sum of the squares of the vertical distances (violet). Click and drag the points to see how the best-fit linear function changes.

All of the above examples have the following form: some number of data points $(x,y)$ are specified, and we want to find a function

\[ y = B_1g_1(x) + B_2g_2(x) + \cdots + B_mg_m(x) \nonumber \]

that best approximates these points, where $g_1,g_2,\ldots,g_m$ are fixed functions of $x$. Indeed, in the best-fit line example we had $g_1(x)=x$ and $g_2(x)=1\text{;}$ in the best-fit parabola example we had $g_1(x)=x^2\text{,}$ $g_2(x)=x\text{,}$ and $g_3(x)=1\text{;}$ and in the best-fit linear function example we had $g_1(x_1,x_2)=x_1\text{,}$ $g_2(x_1,x_2)=x_2\text{,}$ and $g_3(x_1,x_2)=1$ (in this example we take $x$ to be a vector with two entries). We evaluate the above equation on the given data points to obtain a system of linear equations in the unknowns $B_1,B_2,\ldots,B_m$—once we evaluate the $g_i\text{,}$ they just become numbers, so it does not matter what they are—and we find the least-squares solution. The resulting best-fit function minimizes the sum of the squares of the vertical distances from the graph of $y = f(x)$ to our original data points.

To emphasize that the nature of the functions $g_i$ really is irrelevant, consider the following example.

Example $\PageIndex{9}$: Best-fit trigonometric function

What is the best-fit function of the form

\[ y=B+C\cos(x)+D\sin(x)+E\cos(2x)+F\sin(2x)+G\cos(3x)+H\sin(3x) \nonumber \]

passing through the points

\[ \left(\begin{array}{c}-4\\ -1\end{array}\right),\; \left(\begin{array}{c}-3\\ 0\end{array}\right),\; \left(\begin{array}{c}-2\\ -1.5\end{array}\right),\; \left(\begin{array}{c}-1\\ .5\end{array}\right),\; \left(\begin{array}{c}0\\1\end{array}\right),\; \left(\begin{array}{c}1\\-1\end{array}\right),\; \left(\begin{array}{c}2\\-.5\end{array}\right),\; \left(\begin{array}{c}3\\2\end{array}\right),\; \left(\begin{array}{c}4 \\-1\end{array}\right)? \nonumber \]

$Graph with labeled points: (-4,6), (-1,5), (0,1), (3,2), (-4,-1), (-2,-3), (1,-1), (4,-1). X and Y axes intersect at (0,0).$

Figure $\PageIndex{19}$

Solution

We want to solve the system of equations

\[\begin{array}{rrrrrrrrrrrrrrr} -1 &=& B &+& C\cos(-4) &+& D\sin(-4) &+& E\cos(-8) &+& F\sin(-8) &+& G\cos(-12) &+& H\sin(-12)\\ 0 &=& B &+& C\cos(-3) &+& D\sin(-3) &+& E\cos(-6) &+& F\sin(-6) &+& G\cos(-9) &+& H\sin(-9)\\ -1.5 &=& B &+& C\cos(-2) &+& D\sin(-2) &+& E\cos(-4) &+& F\sin(-4) &+& G\cos(-6) &+& H\sin(-6) \\ 0.5 &=& B &+& C\cos(-1) &+& D\sin(-1) &+& E\cos(-2) &+& F\sin(-2) &+& G\cos(-3) &+& H\sin(-3)\\ 1 &=& B &+& C\cos(0) &+& D\sin(0) &+& E\cos(0) &+& F\sin(0) &+& G\cos(0) &+& H\sin(0)\\ -1 &=& B &+& C\cos(1) &+& D\sin(1) &+& E\cos(2) &+& F\sin(2) &+& G\cos(3) &+& H\sin(3)\\ -0.5 &=& B &+& C\cos(2) &+& D\sin(2) &+& E\cos(4) &+& F\sin(4) &+& G\cos(6) &+& H\sin(6)\\ 2 &=& B &+& C\cos(3) &+& D\sin(3) &+& E\cos(6) &+& F\sin(6) &+& G\cos(9) &+& H\sin(9)\\ -1 &=& B &+& C\cos(4) &+& D\sin(4) &+& E\cos(8) &+& F\sin(8) &+& G\cos(12) &+& H\sin(12).\end{array}\nonumber\]

All of the terms in these equations are numbers, except for the unknowns $B,C,D,E,F,G,H\text{:}$

\[\begin{array}{rrrrrrrrrrrrrrr} -1 &=& B &-& 0.6536C &+& 0.7568D &-& 0.1455E &-& 0.9894F &+& 0.8439G &+& 0.5366H\\ 0 &=& B &-& 0.9900C &-& 0.1411D &+& 0.9602E &+& 0.2794F &-& 0.9111G &-& 0.4121H\\ -1.5 &=& B &-& 0.4161C &-& 0.9093D &-& 0.6536E &+& 0.7568F &+& 0.9602G &+& 0.2794H\\ 0.5 &=& B &+& 0.5403C &-& 0.8415D &-& 0.4161E &-& 0.9093F &-& 0.9900G &-& 0.1411H\\ 1&=&B&+&C&{}&{}&+&E&{}&{}&+&G&{}&{}\\ -1 &=& B &+& 0.5403C &+& 0.8415D &-& 0.4161E &+& 0.9093F &-& 0.9900G &+& 0.1411H\\ -0.5 &=& B &-& 0.4161C &+& 0.9093D &-& 0.6536E &-& 0.7568F &+& 0.9602G &-& 0.2794H\\ 2 &=& B &-& 0.9900C &+& 0.1411D &+& 0.9602E &-& 0.2794F &-& 0.9111G &+& 0.4121H\\ -1 &=& B &-& 0.6536C &-& 0.7568D &-& 0.1455E &+& 0.9894F &+& 0.8439G &-& 0.5366H.\end{array}\nonumber\]

Hence we want to solve the least-squares problem

\[\left(\begin{array}{rrrrrrr} 1 &-0.6536& 0.7568& -0.1455& -0.9894& 0.8439 &0.5366\\ 1& -0.9900& -0.1411 &0.9602 &0.2794& -0.9111& -0.4121\\ 1& -0.4161& -0.9093& -0.6536& 0.7568& 0.9602& 0.2794\\ 1& 0.5403& -0.8415 &-0.4161& -0.9093& -0.9900 &-0.1411\\ 1& 1& 0& 1& 0& 1& 0\\ 1& 0.5403& 0.8415& -0.4161& 0.9093& -0.9900 &0.1411\\ 1& -0.4161& 0.9093& -0.6536& -0.7568& 0.9602& -0.2794\\ 1& -0.9900 &0.1411 &0.9602& -0.2794& -0.9111& 0.4121\\ 1& -0.6536& -0.7568& -0.1455& 0.9894 &0.8439 &-0.5366\end{array}\right)\left(\begin{array}{c}B\\C\\D\\E\\F\\G\\H\end{array}\right)=\left(\begin{array}{c}-1\\0\\-1.5\\0.5\\1\\-1\\-0.5\\2\\-1\end{array}\right).\nonumber\]

We find the least-squares solution with the aid of a computer:

\[\hat{x}\approx\left(\begin{array}{c} -0.1435 \\0.2611 \\-0.2337\\ 1.116\\ -0.5997\\ -0.2767 \\0.1076\end{array}\right).\nonumber\]

Therefore, the best-fit function is

\[ \begin{split} y \amp\approx -0.1435 + 0.2611\cos(x) -0.2337\sin(x) + 1.116\cos(2x) -0.5997\sin(2x) \\ \amp\qquad\qquad -0.2767\cos(3x) + 0.1076\sin(3x). \end{split} \nonumber \]

$A sine-like graph in red with labeled points: (-4,-1), (-2,1.5), (0,1), (1,1.5), (2,-5), (4,-1). Below is the equation y = -8.43 sin(0.33x) - 0.92 sin(x) + 1.7 sin(2.6x) + 1.9.$

Figure $\PageIndex{20}$

As in the previous examples, the best-fit function minimizes the sum of the squares of the vertical distances from the graph of $y = f(x)$ to the data points.

$A graph of a trigonometric function with red waves, blue dots on peaks and troughs, and a displayed equation with coefficients. X and Y axes are marked, with a textbox above showing the equation.$

Figure $\PageIndex{21}$: The best-fit function minimizes the sum of the squares of the vertical distances (violet). Click and drag the points to see how the best-fit function changes.

The next example has a somewhat different flavor from the previous ones.

Example $\PageIndex{10}$: Best-fit ellipse

Find the best-fit ellipse through the points

\[ (0,2),\, (2,1),\, (1,-1),\, (-1,-2),\, (-3,1),\, (-1,-1). \nonumber \]

$A graph with points labeled: (0,2), (-2,1), (2,1), (-3,1), (1,-1), and (-1,-2) on a coordinate grid.$

Figure $\PageIndex{22}$

What quantity is being minimized?

Solution

The general equation for an ellipse (actually, for a nondegenerate conic section) is

\[ x^2 + By^2 + Cxy + Dx + Ey + F = 0. \nonumber \]

This is an implicit equation: the ellipse is the set of all solutions of the equation, just like the unit circle is the set of solutions of $x^2+y^2=1.$ To say that our data points lie on the ellipse means that the above equation is satisfied for the given values of $x$ and $y\text{:}$

\[\label{eq:4} \begin{array}{rrrrrrrrrrrrl} (0)^2 &+& B(2)^2 &+& C(0)(2)&+& D(0) &+& E(2)&+& F&=& 0 \\ (2)^2 &+& B(1)^2 &+& C(2)(1)&+& D(2)&+&E(1)&+&F&=& 0 \\ (1)^2&+& B(-1)^2&+&C(1)(-1)&+&D(1)&+&E(-1)&+&F&=&0 \\ (-1)^2&+&B(-2)^2&+&C(-1)(2)&+&D(-1)&+&E(-2)&+&F&=&0 \\ (-3)^2&+&B(1)^2&+&C(-3)(1)&+&D(-3)&+&E(1)&+&F&=&0 \\ (-1)^2&+&B(-1)^2&+&C(-1)(-1)&+&D(-1)&+&E(-1)&+&F&=&0.\end{array}\]

To put this in matrix form, we move the constant terms to the right-hand side of the equals sign; then we can write this as $Ax=b$ for

\[A=\left(\begin{array}{ccccc}4&0&0&2&1\\1&2&2&1&1\\1&-1&1&-1&1 \\ 4&2&-1&-2&1 \\ 1&-3&-3&1&1 \\ 1&1&-1&-1&1\end{array}\right)\quad x=\left(\begin{array}{c}B\\C\\D\\E\\F\end{array}\right)\quad b=\left(\begin{array}{c}0\\-4\\-1\\-1\\-9\\-1\end{array}\right).\nonumber\]

We compute

\[ A^TA = \left(\begin{array}{ccccc}36&7&-5&0&12 \\ 7&19&9&-5&1 \\ -5&9&16&1&-2 \\ 0&-5&1&12&0 \\ 12&1&-2&0&6\end{array}\right) \qquad A^T b = \left(\begin{array}{c}-19\\17\\20\\-9\\-16\end{array}\right). \nonumber \]

We form an augmented matrix and row reduce:

\[\left(\begin{array}{ccccc|c} 36 &7& -5& 0& 12& -19\\ 7& 19& 9& -5& 1& 17\\ -5& 9& 16& 1& -2& 20\\ 0& -5& 1 &12& 0& -9\\ 12& 1& -2& 0& 6& -16\end{array}\right)\xrightarrow{\text{RREF}}\left(\begin{array}{ccccc|c} 1& 0& 0& 0& 0& 405/266\\ 0& 1& 0& 0& 0& -89/133\\ 0& 0& 1& 0& 0& 201/133\\ 0& 0& 0& 1& 0& -123/266\\ 0& 0& 0& 0& 1& -687/133\end{array}\right).\nonumber\]

The least-squares solution is

\[\hat{x}=\left(\begin{array}{c}405/266\\ -89/133\\ 201/133\\ -123/266\\ -687/133\end{array}\right),\nonumber\]

so the best-fit ellipse is

\[ x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} = 0. \nonumber \]

Multiplying through by $266\text{,}$ we can write this as

\[ 266 x^2 + 405 y^2 - 178 xy + 402 x - 123 y - 1374 = 0. \nonumber \]

$Graph showing an ellipse centered near (0,2) with points (-3,1), (-1,2), (1,-1), (2,1). Equation: 25x² + 4y² - 175xy + 40x - 12y - 1374 = 0 in red below.$

Figure $\PageIndex{23}$

Now we consider the question of what quantity is minimized by this ellipse. The least-squares solution $\hat x$ minimizes the sum of the squares of the entries of the vector $b-A\hat x\text{,}$ or equivalently, of $A\hat x-b$. The vector $-b$ contains the constant terms of the left-hand sides of $\eqref{eq:4}$, and

\[A\hat{x}=\left(\begin{array}{rrrrrrrrr} \frac{405}{266}(2)^2 &-& \frac{89}{133}(0)(2)&+&\frac{201}{133}(0)&-&\frac{123}{266}(2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-& \frac{89}{133}(2)(1)&+&\frac{201}{133}(2)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2 &-&\frac{89}{133}(1)(-1)&+&\frac{201}{133}(1)&-&\frac{123}{266}(-1)&-&\frac{687}{133} \\ \frac{405}{266}(-2)^2&-&\frac{89}{133}(-1)(-2)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-2)&-&\frac{687}{133} \\ \frac{405}{266}(1)^2&-&\frac{89}{133}(-3)(1)&+&\frac{201}{133}(-3)&-&\frac{123}{266}(1)&-&\frac{687}{133} \\ \frac{405}{266}(-1)^2&-&\frac{89}{133}(-1)(-1)&+&\frac{201}{133}(-1)&-&\frac{123}{266}(-1)&-&\frac{687}{133}\end{array}\right)\nonumber\]

contains the rest of the terms on the left-hand side of $\eqref{eq:4}$. Therefore, the entries of $A\hat x-b$ are the quantities obtained by evaluating the function

\[ f(x,y) = x^2 + \frac{405}{266} y^2 -\frac{89}{133} xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} \nonumber \]

on the given data points.

If our data points actually lay on the ellipse defined by $f(x,y)=0\text{,}$ then evaluating $f(x,y)$ on our data points would always yield zero, so $A\hat x-b$ would be the zero vector. This is not the case; instead, $A\hat x-b$ contains the actual values of $f(x,y)$ when evaluated on our data points. The quantity being minimized is the sum of the squares of these values:

\[ \begin{split} \amp\text{minimized} = \\ \amp\quad f(0,2)^2 + f(2,1)^2 + f(1,-1)^2 + f(-1,-2)^2 + f(-3,1)^2 + f(-1,-1)^2. \end{split} \nonumber \]

One way to visualize this is as follows. We can put this best-fit problem into the framework of Example $\PageIndex{8}$ by asking to find an equation of the form

\[ f(x,y) = x^2 + By^2 + Cxy + Dx + Ey + F \nonumber \]

which best approximates the data table

\[ \begin{array}{r|r|c} x & y & f(x,y) \\\hline 0 & 2 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 0 \\ -1 & -2 & 0 \\ -3 & 1 & 0 \\ -1 & -1 & 0\rlap. \end{array} \nonumber \]

The resulting function minimizes the sum of the squares of the vertical distances from these data points $(0,2,0),\,(2,1,0),\,\ldots\text{,}$ which lie on the $xy$-plane, to the graph of $f(x,y)$.

$3D graph and 2D contour plot of a quadratic equation: $z = x^2 + 1.82y^2 - 0.87xy + 1.31x - 0.8y - 0.17$. Points indicate minimum on the surface and zero set.$

Figure $\PageIndex{24}$: The best-fit ellipse minimizes the sum of the squares of the vertical distances (violet) from the points $(x,y,0)$ to the graph of $f(x,y)$ on the left. The ellipse itself is the zero set of $f(x,y)\text{,}$ on the right. Click and drag the points on the right to see how the best-fit ellipse changes. Can you arrange the points so that the best-fit conic section is actually a hyperbola?

Note $\PageIndex{3}$

Gauss invented the method of least squares to find a best-fit ellipse: he correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind the sun in 1801.

Search

Text Color

Text Size

Margin Size

Font Type

Definition \(\PageIndex{1}\): Least-Squares Solution

Note \(\PageIndex{1}\)

Note \(\PageIndex{2}\)

Theorem \(\PageIndex{1}\)

Recipe 1: Compute a Least-Squares Solution

Example \(\PageIndex{1}\)

Solution

Example \(\PageIndex{2}\)

Solution

Theorem \(\PageIndex{2}\)

Example \(\PageIndex{3}\): Infinitely many least-squares solutions

Solution

Recipe 2: Compute a Least-Squares Solution

Example \(\PageIndex{4}\)

Solution

Example \(\PageIndex{5}\): Best-Fit Line

Example \(\PageIndex{6}\): Interactive: Best-fit line

Example \(\PageIndex{7}\): Best-fit parabola

Solution

Example \(\PageIndex{8}\): Best-fit linear function

Solution

Example \(\PageIndex{9}\): Best-fit trigonometric function

Solution

Example \(\PageIndex{10}\): Best-fit ellipse

Solution

Note \(\PageIndex{3}\)