Least Squares

    Definition and Derivations

    We have already spent much time finding solutions to

            Ax  =  b

    If there is no solution, we instead seek the x that comes closest to being a solution, that is, the x for which Ax is as close to b as possible.  This is the x such that

            \( Ax = \text{proj}_W b \)

    where W is the column space of A.

            [Figure: a plane W, a vector b pointing off the plane, its projection \( \text{proj}_W b \) onto the plane, and the perpendicular from the projection up to b.]

    Notice that \( b - \text{proj}_W b \) is in the orthogonal complement of W, hence in the null space of \( A^T \).  So if x is this closest vector, then

            \( A^T(b - Ax) = 0 \qquad \Longrightarrow \qquad A^TAx = A^Tb \)

    Now we need to show that \( A^TA \) is nonsingular, so that we can solve for x.


    Lemma

    If A is an \( m \times n \) matrix of rank n, then \( A^TA \) is nonsingular.

     

    Proof

    We want to show that the null space of \( A^TA \) is zero.  If

            \( 0 = A^TAx \)

    then multiplying both sides by \( x^T \), we get

            \( 0 = x^TA^TAx = (Ax)^TAx = Ax \cdot Ax = \|Ax\|^2 \)

    If the magnitude of a vector is zero, then the vector is zero, hence

            Ax  =  0

    Since

            rank(A)  =  n

    the columns of A are linearly independent, so we can conclude that

            x  =  0
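
    A quick numerical illustration of the lemma (a sketch; the matrix here is just an arbitrary full-column-rank example, and matrix_rank and det are standard numpy.linalg routines):

        import numpy as np

        A = np.array([[1., 3.], [2., 4.], [1., 6.]])   # a 3 x 2 matrix of rank 2
        G = A.T @ A                                    # the 2 x 2 Gram matrix A^T A
        print(np.linalg.matrix_rank(A))                # 2, so A has full column rank
        print(np.linalg.det(G))                        # 77.0, nonzero, so A^T A is nonsingular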


    We can now state the main theorem.

    Theorem

    Let A be an \( m \times n \) matrix of rank n.  Then the system

            Ax  =  b

    has the unique least squares solution

            \( x = (A^TA)^{-1}A^Tb \)
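
    In code it is better to solve the normal equations (or use a dedicated least squares routine) than to invert \( A^TA \) explicitly.  Here is a minimal sketch in NumPy, assuming A has full column rank (the helper name least_squares is ours, not a library function):

        import numpy as np

        def least_squares(A, b):
            """Least squares solution of Ax = b when A has full column rank.

            Solves the normal equations A^T A x = A^T b; np.linalg.solve is
            numerically preferable to forming the inverse of A^T A.
            """
            A = np.asarray(A, dtype=float)
            b = np.asarray(b, dtype=float)
            return np.linalg.solve(A.T @ A, A.T @ b)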


    Examples

    Example

    Find the least squares solution to 

            Ax  =  b

    with

           \( A = \begin{pmatrix} 1 & 3 \\ 2 & 4 \\ 1 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 4 \\ 1 \\ 3 \end{pmatrix} \)

    Solution

    We can quickly check that A has rank 2: the first two rows are not multiples of each other, so the rank is at least 2, and it cannot exceed the number of columns.  Hence we can compute

     

           \( x = (A^T A)^{-1}A^T b = \begin{pmatrix} -0.377 \\ 0.662 \end{pmatrix} \)

    Notice that

           \( Ax = \begin{pmatrix} 1.61 \\ 1.90 \\ 3.60 \end{pmatrix} \)

    is not exactly b, but it is as close as we are going to get.
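
    Checking this example numerically (a sketch; np.linalg.lstsq is NumPy's built-in SVD-based least squares routine):

        import numpy as np

        A = np.array([[1., 3.], [2., 4.], [1., 6.]])
        b = np.array([4., 1., 3.])
        x, *_ = np.linalg.lstsq(A, b, rcond=None)   # least squares solution
        print(x)      # approximately [-0.377  0.662]
        print(A @ x)  # approximately [1.61  1.90  3.60]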


    Least Squares Regression Line

    Of fundamental importance in statistical analysis is finding the least squares regression line. 

    Example

    An engineer is tracking the friction index over mileage of the braking system of a vehicle.  She expects that the mileage-friction relationship is approximately linear.  She collects five data points, shown in the table below.

            

    Mileage           2000   6000   20,000   30,000   40,000
    Friction Index      20     18       10        6        2

    The graph below shows these points

            [Figure: scatter plot of mileage and friction with the points from the table and the regression line, which falls from about (0, 21) to (43, 0).]

    We are interested in the line that best fits the data.  More specifically, if b is the vector of friction index values and y is the vector of values the line predicts at the given mileages, then we want the line that minimizes the distance between b and y.  If the equation of the line is

            ax + b  =  y

    then, measuring the mileage x in thousands of miles, we get the five equations

            2a + b  =  20
            6a + b  =  18
            20a + b  =  10
            30a + b  =  6
            40a + b  =  2

    The corresponding matrix equation is  

            Ax  =  b 

    or

           \(\begin{pmatrix} 2 & 1 \\ 6 & 1 \\ 20 & 1 \\ 30 & 1 \\ 40 & 1 \end{pmatrix} \begin{pmatrix} a \\ b  \end{pmatrix} = \begin{pmatrix} 20 \\ 18 \\ 10 \\ 6 \\  2 \end{pmatrix} \)

    Although this system has no exact solution, it has a unique least squares solution.  We have

            \( \begin{pmatrix} a \\ b  \end{pmatrix} = (A^T A)^{-1}A^T b = \begin{pmatrix} -0.48 \\ 20.6 \end{pmatrix}\)

    We can conclude that the equation of the regression line is

            y  =  -0.48x + 20.6

    where x is the mileage in thousands of miles.
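
    The same fit in NumPy (a sketch; np.polyfit computes the least squares polynomial coefficients, highest degree first):

        import numpy as np

        miles = np.array([2., 6., 20., 30., 40.])    # mileage in thousands
        friction = np.array([20., 18., 10., 6., 2.])
        a, b = np.polyfit(miles, friction, deg=1)    # degree-1 (line) fit
        print(a, b)   # approximately -0.48 and 20.6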


    Best Fitting Curves

    Often a line is not the best model for the data.  Fortunately, the same technique works for any model that is linear in its unknown coefficients, which includes every polynomial.  Here we will explain how to find the least squares cubic; the process for other polynomials is similar.

     

    Example

    A bioengineer is studying the growth of a genetically engineered bacteria culture and suspects that it approximately follows a cubic model.  He collects the six data points listed below.

    Time in Days    1     2     3     4     5     6
    Grams         2.1   3.5   4.2   3.1   4.4   6.8

    He assumes the equation has the form 

            \( ax^3 + bx^2 + cx + d = y \)

    This gives six equations in four unknowns:

              a +   b +   c + d  =  2.1
             8a +  4b +  2c + d  =  3.5
            27a +  9b +  3c + d  =  4.2
            64a + 16b +  4c + d  =  3.1
           125a + 25b +  5c + d  =  4.4
           216a + 36b +  6c + d  =  6.8

    The corresponding matrix equation is 

           \(\begin{pmatrix} 1 & 1 & 1 & 1 \\ 8 & 4 & 2 & 1 \\ 27 & 9 & 3 & 1 \\ 64 & 16 & 4 & 1 \\ 125 & 25 & 5 & 1 \\ 216 & 36 & 6 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 2.1 \\ 3.5 \\ 4.2 \\ 3.1 \\  4.4 \\ 6.8 \end{pmatrix} \)

    We can use the least squares equation to find the best solution 

            \( \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = (A^T A)^{-1}A^T b = \begin{pmatrix} 0.2 \\ -2.0 \\  6.1 \\  -2.3  \end{pmatrix}\)

    So the best fitting cubic is

            \( y  =  0.2x^3 - 2.0x^2 + 6.1x - 2.3 \)

    The graph is shown below.

            [Figure: the data points plotted with the cubic regression curve; all points lie close to the curve.]
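
    The corresponding NumPy sketch (np.polyfit again, now with a degree-3 fit; the printed values match the text's coefficients after rounding):

        import numpy as np

        days = np.array([1., 2., 3., 4., 5., 6.])
        grams = np.array([2.1, 3.5, 4.2, 3.1, 4.4, 6.8])
        a, b, c, d = np.polyfit(days, grams, deg=3)   # cubic least squares fit
        print(a, b, c, d)   # roughly 0.2, -2.0, 6.1, -2.3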




    Least Squares is shared under a CC BY license and was authored, remixed, and/or curated by LibreTexts.
