6.03: Linear Regression
Lesson 1: Introduction to Linear Regression
Learning Objectives
After successful completion of this lesson, you should be able to:
1) Define a residual for a linear regression model,
2) Explain the concept of the least-squares method as an optimization approach,
3) Explain why other criteria for finding the regression model do not work.
Introduction
The problem statement for a regression model is as follows. Given {n} data pairs \left( x_{1},y_{1} \right), \left( x_{2},y_{2} \right), \ldots, \left( x_{n},y_{n} \right), best fit y = f\left( x \right) to the data (Figure \PageIndex{1.1}).

Linear regression is the most popular regression model. In this model, we wish to predict the response to n data points \left( x_{1},y_{1} \right),\left( x_{2},y_{2} \right),\ldots,\left( x_{n},y_{n} \right) by a regression model given by
y = a_{0} + a_{1}x\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{1.1}) \nonumber
where a_{0} and a_{1} are the constants of the regression model.
A measure of goodness of fit, that is, how well a_{0} + a_{1}x predicts the response variable y, is the magnitude of the residual E_{i} at each of the n data points.
E_{i} = y_{i} - \left( a_{0} + a_{1}x_{i} \right)\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{1.2}) \nonumber
Ideally, if all the residuals E_{i} are zero, one has found an equation in which all the points lie on the model. Thus, minimization of the residuals is an objective of obtaining regression coefficients.
The most popular method to minimize the residual is the least-squares method, where the estimates of the constants of the models are chosen such that the sum of the squared residuals is minimized, that is, minimize
S_{r}=\sum_{i = 1}^{n}{E_{i}}^{2}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{1.3}) \nonumber
Why minimize the sum of the square of the residuals, S_{r}?
Why not, for instance, minimize the sum of the residual errors or the sum of the absolute values of the residuals? Alternatively, constants of the model can be chosen such that the average residual is zero without making individual residuals small. Would any of these criteria yield unbiased parameters with the smallest variance? All of these questions will be answered. Look at the example data in Table \PageIndex{1.1}.
x | y |
---|---|
2.0 | 4.0 |
3.0 | 6.0 |
2.0 | 6.0 |
3.0 | 8.0 |
Suppose we wish to explain this data by the straight-line regression model
y = a_{0} + a_{1}x\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{1.4}) \nonumber
Let us use minimizing \displaystyle \sum_{i = 1}^{n}E_{i} as a criterion to find a_{0} and a_{1}. Arbitrarily assume that
y = 4x - 4\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{1.5}) \nonumber
as the resulting regression model (Figure \PageIndex{1.2}).

The sum of the residuals \displaystyle \sum_{i = 1}^{4}E_{i} = 0 is shown in Table \PageIndex{1.2}.
x | y | y_{predicted} | E = y - y_{predicted} |
---|---|---|---|
2.0 | 4.0 | 4.0 | 0.0 |
3.0 | 6.0 | 8.0 | -2.0 |
2.0 | 6.0 | 4.0 | 2.0 |
3.0 | 8.0 | 8.0 | 0.0 |
\displaystyle \sum_{i = 1}^{4}E_{i} = 0 |
So does this give us the smallest possible sum of residuals? For this data, it does, as \displaystyle \sum_{i = 1}^{4}E_{i} = 0, and the magnitude of the sum cannot be made any smaller. But does it give unique values for the parameters of the regression model? No, because, for example, a straight-line model (Figure \PageIndex{1.3})
y = 6\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{1.6}) \nonumber
also gives \displaystyle \sum_{i = 1}^{4}E_{i} = 0 as shown in Table \PageIndex{1.3}.
In fact, there are many other straight lines for this data for which the sum of the residuals \displaystyle \sum_{i = 1}^{4}E_{i} = 0. We hence find the regression models are not unique, and therefore this criterion of minimizing the sum of the residuals is a bad one.

x | y | y_{\text{predicted}} | E = y - y_{predicted} |
---|---|---|---|
2.0 | 4.0 | 6.0 | -2.0 |
3.0 | 6.0 | 6.0 | 0.0 |
2.0 | 6.0 | 6.0 | 0.0 |
3.0 | 8.0 | 6.0 | 2.0 |
\displaystyle \sum_{i = 1}^{4}E_{i} = 0 |
You may think that the criterion of minimizing \displaystyle \sum_{i = 1}^{n}E_{i} does not work because negative residuals cancel with positive residuals. So, is minimizing the sum of the absolute values of the residuals, that is, \displaystyle \sum_{i = 1}^{n}\left| E_{i} \right|, better? Let us look at the same example data given in Table \PageIndex{1.1}. For the regression model y = 4x - 4, the sum of the absolute values of the residuals \displaystyle \sum_{i = 1}^{4}\left| E_{i} \right| = 4 is shown in Table \PageIndex{1.4}.
x | y | y_{predicted} | \left| E \right| = \left| y - y_{predicted} \right| |
---|---|---|---|
2.0 | 4.0 | 4.0 | 0.0 |
3.0 | 6.0 | 8.0 | 2.0 |
2.0 | 6.0 | 4.0 | 2.0 |
3.0 | 8.0 | 8.0 | 0.0 |
\displaystyle \sum_{i = 1}^{4}\left| E_{i} \right| = 4 |
The straight-line model y = 6 also gives \displaystyle \sum_{i = 1}^{4}\left| E_{i} \right| = 4, as shown in Table \PageIndex{1.5}.
x | y | y_{predicted} | E = y - y_{predicted} |
---|---|---|---|
2.0 | 4.0 | 6.0 | -2.0 |
3.0 | 6.0 | 6.0 | 0.0 |
2.0 | 6.0 | 6.0 | 0.0 |
3.0 | 8.0 | 6.0 | 2.0 |
\displaystyle \sum_{i = 1}^{4}\left| E_{i} \right| = 4 |
No other straight-line model that you may choose for this data has \displaystyle \sum_{i = 1}^{4}\left| E_{i} \right| < 4. And there are many other straight lines for this data for which the sum of the absolute values of the residuals \displaystyle \sum_{i = 1}^{4}\left| E_{i} \right| = 4. We hence find that the regression models are not unique, and therefore the criterion of minimizing the sum of the absolute values of the residuals is also a bad one.
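The non-uniqueness of these two criteria is easy to verify numerically. The short sketch below (added here for illustration and not part of the original lessons; Python is assumed) evaluates \sum E_{i} and \sum \left| E_{i} \right| for the two candidate lines y = 4x - 4 and y = 6 on the data of Table \PageIndex{1.1}, and shows that both criteria produce a tie.

```python
# A minimal sketch (assumed Python, not from the original lesson) that checks
# the two rejected criteria on the data of Table 1.1 for the two candidate
# lines y = 4x - 4 and y = 6.

x = [2.0, 3.0, 2.0, 3.0]
y = [4.0, 6.0, 6.0, 8.0]

candidate_lines = {
    "y = 4x - 4": lambda xi: 4 * xi - 4,
    "y = 6": lambda xi: 6.0,
}

for name, f in candidate_lines.items():
    residuals = [yi - f(xi) for xi, yi in zip(x, y)]
    print(name,
          "sum of residuals =", sum(residuals),
          "sum of |residuals| =", sum(abs(e) for e in residuals))

# Both lines give a sum of residuals of 0 and a sum of absolute residuals of 4,
# so neither criterion selects a unique line.
```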
To get a unique regression model, the least-squares criterion where we minimize the sum of the square of the residuals
\begin{split} S_{r} &= \sum_{i = 1}^{n}{E_{i}}^{2}\\ &= \sum_{i = 1}^{n}(y_i-a_0- a_1x_i)^{2}\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{1.7}) \end{split}
is recommended. The formulas obtained for the regression constants a_0 and a_1 are given below and will be derived in the next lesson.
\displaystyle a_{0} = \frac{\displaystyle\sum_{i = 1}^{n}y_{i}\sum_{i = 1}^{n}x_{i}^{2} - \sum_{i = 1}^{n}x_{i}\sum_{i = 1}^{n}{x_{i}y_{i}}}{\displaystyle n\sum_{i = 1}^{n}x_{i}^{2} \ -\left( \sum_{i = 1}^{n}x_{i} \right)^{2}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{1.8}) \nonumber
\displaystyle a_{1} = \frac{\displaystyle n\sum_{i = 1}^{n}{x_{i}y_{i}} - \sum_{i = 1}^{n}x_{i}\sum_{i = 1}^{n}y_{i}}{\displaystyle n\sum_{i = 1}^{n}x_{i}^{2}-\left( \sum_{i = 1}^{n}x_{i} \right)^{2}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{1.9}) \nonumber
The formula for a_0 can also be written as
\begin {split} \displaystyle a_{0} &= \frac{\displaystyle \sum_{i = 1}^{n}y_{i}}{n} -a_1\frac{\displaystyle \sum_{i = 1}^{n}x_{i}}{n} \\ &= \bar{y} - a_{1}\bar{x} \end{split}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{1.10}) \nonumber
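As a check on the formulas above, the following sketch (added for illustration; Python and the variable names are assumptions, not part of the original text) applies Equations (\PageIndex{1.8}) and (\PageIndex{1.9}) to the data of Table \PageIndex{1.1} and, unlike the earlier two criteria, yields a single line.

```python
# A minimal sketch applying Equations (1.8) and (1.9) to the data of Table 1.1.
# The variable names are illustrative only.

x = [2.0, 3.0, 2.0, 3.0]
y = [4.0, 6.0, 6.0, 8.0]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

denominator = n * sum_xx - sum_x ** 2
a1 = (n * sum_xy - sum_x * sum_y) / denominator        # Equation (1.9)
a0 = (sum_y * sum_xx - sum_x * sum_xy) / denominator   # Equation (1.8)
print(a0, a1)   # 1.0 and 2.0, that is, the unique least-squares line y = 1 + 2x
```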
Audiovisual Lecture
Title: Linear Regression - Background
Summary: This video covers the background of linear regression and how the minimization criterion is selected to find the constants of the model.
Lesson 2: Straight-Line Regression Model without an Intercept
Learning Objectives
After successful completion of this lesson, you should be able to:
1) derive the constant of a linear regression model without an intercept,
2) use the derived formula to find the constant of the regression model without an intercept from given data.
Introduction
In this model, we wish to predict response to n data points \left( x_{1},y_{1} \right),\left( x_{2},y_{2} \right),\ldots\ldots,\left( x_{n},y_{n} \right) by a regression model given by
y = a_{1}x\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{2.1}) \nonumber
where a_{1} is the only constant of the regression model.
A measure of goodness of fit, that is, how well a_{1}x predicts the response variable y is the sum of the square of the residuals, S_{r}
\begin{split} S_{r} &= \sum_{i = 1}^{n}{E_{i}}^{2}\\ &= \sum_{i = 1}^{n}\left( y_{i} - a_{1}x_{i} \right)^{2}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{2.2}) \end{split}
To find a_{1}, we look for the value of a_{1} for which S_{r} is the absolute minimum.
We will begin by conducting the first derivative test. Taking the derivative of Equation (\PageIndex{2.2}) with respect to a_{1} gives
\frac{dS_{r}}{da_{1}} = 2\sum_{i = 1}^{n}{\left( y_{i} - a_{1}x_{i} \right)\left( - x_{i} \right)}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{2.3}) \nonumber
Now putting
\frac{dS_{r}}{da_{1}} = 0 \nonumber
gives
2\sum_{i = 1}^{n}{\left( y_{i} - a_{1}x_{i} \right)\left( - x_{i} \right)} = 0 \nonumber
giving
- 2\sum_{i = 1}^{n}{y_{i}x_{i} + 2\sum_{i = 1}^{n}{a_{1}x_{i}^{2}}} = 0 \nonumber
- 2\sum_{i = 1}^{n}{y_{i}x_{i} + {2a}_{1}\sum_{i = 1}^{n}x_{i}^{2}} = 0 \nonumber
Solving the above equation for a_{1} gives
a_{1} = \frac{\displaystyle \sum_{i = 1}^{n}{y_{i}x_{i}}}{\displaystyle \sum_{i = 1}^{n}x_{i}^{2}}\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{2.4}) \nonumber
Let’s conduct the second derivative test.
\begin{split} \frac{d^{2}S_{r}}{d{a_{1}}^{2}} &= \frac{d}{da_{1}}\left( 2\sum_{i = 1}^{n}{\left( y_{i} - a_{1}x_{i} \right)\left( - x_{i} \right)} \right)\\ &= \frac{d}{da_{1}} \sum_{i = 1}^{n} (-2 x_{i}y_{i} + 2a_{1}{x_{i}}^{2}) \\ &= \sum_{i = 1}^{n} 2x_{i}^{2} > 0\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{2.5}) \end{split}
provided at least one x_{i} \neq 0, which is a pragmatic assumption, since not all the x-values can be zero.
This inequality shows that the value of a_{1} given by Equation (\PageIndex{2.4}) corresponds to a local minimum. Since the sum of the squares of the residuals, S_{r}, is a continuous function of a_{1}, has only one point where \displaystyle \frac{dS_{r}}{da_{1}} = 0, and satisfies \displaystyle \frac{d^{2}S_{r}}{d{a_{1}}^{2}} > 0 at that point, this point corresponds not only to a local minimum but to the absolute minimum as well. Hence, Equation (\PageIndex{2.4}) gives us the value of the constant, a_1, of the regression model y=a_1x.
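A direct translation of Equation (\PageIndex{2.4}) into code is given below (a sketch added for illustration; Python and the function name are assumptions). Note that it uses only \sum x_{i}y_{i} and \sum x_{i}^{2}, unlike the general straight-line formulas of Lesson 1.

```python
# A minimal sketch of Equation (2.4): the least-squares slope of the
# no-intercept model y = a1*x. The function name is illustrative only.

def slope_no_intercept(x, y):
    """Return a1 = sum(x_i * y_i) / sum(x_i^2) for the model y = a1 * x."""
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    return sum_xy / sum_xx

print(slope_no_intercept([1.0, 2.0, 3.0], [2.1, 3.9, 6.0]))  # close to 2
```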
To find the longitudinal modulus of a composite material, the following data, as given in Table \PageIndex{2.1}, is collected.
Strain (%) | Stress (\text{MPa}) |
---|---|
0 | 0 |
0.183 | 306 |
0.36 | 612 |
0.5324 | 917 |
0.702 | 1223 |
0.867 | 1529 |
1.0244 | 1835 |
1.1774 | 2140 |
1.329 | 2446 |
1.479 | 2752 |
1.5 | 2767 |
1.56 | 2896 |
Find the longitudinal modulus E using the following regression model:
\sigma = E\varepsilon \nonumber
Solution
The data from Table \PageIndex{2.1}, rewritten in the base SI system of units, is given in Table \PageIndex{2.2}.
Strain (\text{m/m}) | Stress (\text{Pa}) |
---|---|
0.0000 | 0.0000 |
1.8300 \times 10^{- 3} | 3.0600 \times 10^{8} |
3.6000 \times 10^{- 3} | 6.1200 \times 10^{8} |
5.3240 \times 10^{- 3} | 9.1700 \times 10^{8} |
7.0200 \times 10^{- 3} | 1.2230 \times 10^{9} |
8.6700 \times 10^{- 3} | 1.5290 \times 10^{9} |
1.0244 \times 10^{- 2} | 1.8350 \times 10^{9} |
1.1774 \times 10^{- 2} | 2.1400 \times 10^{9} |
1.3290 \times 10^{- 2} | 2.4460 \times 10^{9} |
1.4790 \times 10^{- 2} | 2.7520 \times 10^{9} |
1.5000 \times 10^{- 2} | 2.7670 \times 10^{9} |
1.5600 \times 10^{- 2} | 2.8960 \times 10^{9} |
Using Equation (\PageIndex{2.4}) gives
E = \frac{\displaystyle \sum_{i = 1}^{n}{\sigma_{i}\varepsilon_{i}}}{\displaystyle \sum_{i = 1}^{n}{\varepsilon_{i}}^{2}}\;\;\;\;\;\;\;\;\;\;\;\;(\PageIndex{2.E1.1}) \nonumber
The summations used in Equation (\PageIndex{2.E1.1}) are given in Table \PageIndex{2.3}.
i | \varepsilon | \sigma | \varepsilon^2 | \varepsilon\sigma |
---|---|---|---|---|
1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
2 | 1.8300\times10^{-3} | 3.0600\times10^8 | 3.3489\times10^{-6} | 5.5998\times10^5 |
3 | 3.6000\times10^{-3} | 6.1200\times10^8 | 1.2960\times10^{-5} | 2.2032\times10^6 |
4 | 5.3240\times10^{-3} | 9.1700\times10^8 | 2.8345\times10^{-5} | 4.8821\times10^6 |
5 | 7.0200\times10^{-3} | 1.2230\times10^9 | 4.9280\times10^{-5} | 8.5855\times10^6 |
6 | 8.6700\times10^{-3} | 1.5290\times10^9 | 7.5169\times10^{-5} | 1.3256\times10^7 |
7 | 1.0244\times10^{-2} | 1.8350\times10^9 | 1.0494\times10^{-4} | 1.8798\times10^7 |
8 | 1.1774\times10^{-2} | 2.1400\times10^9 | 1.3863\times10^{-4} | 2.5196\times10^7 |
9 | 1.3290\times10^{-2} | 2.4460\times10^9 | 1.7662\times10^{-4} | 3.2507\times10^7 |
10 | 1.4790\times10^{-2} | 2.7520\times10^9 | 2.1874\times10^{-4} | 4.0702\times10^7 |
11 | 1.5000\times10^{-2} | 2.7670\times10^9 | 2.2500\times10^{-4} | 4.1505\times10^7 |
12 | 1.5600\times10^{-2} | 2.8960\times10^9 | 2.4336\times10^{-4} | 4.5178\times10^7 |
\displaystyle \sum_{i=1}^{12} | | | 1.2764\times10^{-3} | 2.3337\times10^8 |
n = 12 \nonumber
\sum_{i = 1}^{12}{\varepsilon_{i}^{2} = 1.2764 \times 10^{- 3}} \nonumber
\sum_{i = 1}^{12}{\sigma_{i}\varepsilon_{i} = 2.3337 \times 10^{8}} \nonumber
From Equation (\PageIndex{2.E1.1})
\begin{split} E &= \frac{\displaystyle \sum_{i = 1}^{12}{\sigma_{i}\varepsilon_{i}}}{\displaystyle \sum_{i = 1}^{12}{\varepsilon_{i}}^{2}} \\ &= \frac{2.3337 \times 10^{8}}{1.2764 \times 10^{- 3}}\\ &= 1.8284 \times 10^{11}\ \text{Pa}\\ &= 182.84 \text{ GPa}\end{split}
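The calculation above can be reproduced with a few lines of code (a sketch added for illustration, assuming Python); the data are the SI values of Table \PageIndex{2.2}.

```python
# A minimal sketch reproducing Equation (2.E1.1) with the SI data of Table 2.2.

strain = [0.0, 1.83e-3, 3.6e-3, 5.324e-3, 7.02e-3, 8.67e-3,
          1.0244e-2, 1.1774e-2, 1.329e-2, 1.479e-2, 1.5e-2, 1.56e-2]   # m/m
stress = [0.0, 3.06e8, 6.12e8, 9.17e8, 1.223e9, 1.529e9,
          1.835e9, 2.14e9, 2.446e9, 2.752e9, 2.767e9, 2.896e9]         # Pa

numerator = sum(s * e for s, e in zip(stress, strain))   # sum of sigma_i * eps_i
denominator = sum(e * e for e in strain)                 # sum of eps_i^2
E = numerator / denominator
print(E / 1e9, "GPa")   # approximately 182.8 GPa
```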

Audiovisual Lecture
Title: Linear Regression with Zero Intercept: Derivation
Summary: This video discusses how to regress data to a linear polynomial with zero constant term (no intercept). This segment shows you the derivation and also explains why using the formula for a general straight line is not valid for this case.
Audiovisual Lecture
Title: Linear Regression with Zero Intercept: Example
Summary: This video shows an example of how to conduct linear regression with zero intercept.
Lesson 3: Theory of General Straight-Line Regression Model
Learning Objectives
After successful completion of this lesson, you should be able to:
1) derive the constants of a linear regression model based on the least-squares method criterion.
Introduction
In this model, we best fit a general straight line y=a_0 +a_1x to the n data points (x_1,y_1),\ (x_2,y_2),\ldots,\ (x_n,y_n).
Let us use the least-squares criterion where we minimize the sum of the square of the residuals, S_{r}:
\begin{split} S_{r} &= \sum_{i = 1}^{n}{E_{i}}^{2}\\&= \sum_{i = 1}^{n}\left( y_{i} - a_{0} - a_{1}x_{i} \right)^{2}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.1}) \end{split}

To find a_{0} and a_{1}, we need to calculate where the sum of the square of the residuals, S_{r}, is the absolute minimum. We start this process of finding the absolute minimum by
- taking the partial derivatives of S_{r} with respect to a_{0} and a_{1} and setting them equal to zero, and
- conducting the second derivative test.
Taking the partial derivatives of S_{r} with respect to a_{0} and a_{1} and setting them equal to zero gives
\frac{\partial S_{r}}{\partial a_{0}} = 2\sum_{i = 1}^{n}{\left( y_{i} - a_{0} - a_{1}x_{i} \right)\left( - 1 \right)} = 0\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.2}) \nonumber
\frac{\partial S_{r}}{\partial a_{1}} = 2\sum_{i = 1}^{n}{\left( y_{i} - a_{0} - a_{1}x_{i} \right)\left( - x_{i} \right)} = 0\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.3}) \nonumber
Dividing both sides by 2 and expanding the summations in Equations (\PageIndex{3.2}) and (\PageIndex{3.3}) gives
- \sum_{i = 1}^{n}{y_{i} + \sum_{i = 1}^{n}a_{0} + \sum_{i = 1}^{n}{a_{1}x_{i}}} = 0 \nonumber
- \sum_{i = 1}^{n}{y_{i}x_{i} + \sum_{i = 1}^{n}{a_{0}x_{i}} + \sum_{i = 1}^{n}{a_{1}x_{i}^{2}}} = 0 \nonumber
Noting that
\sum_{i = 1}^{n}a_{0} = a_{0} + a_{0} + \ldots + a_{0} = na_{0} \nonumber
we get
na_{0} + a_{1}\sum_{i = 1}^{n}x_{i} = \sum_{i = 1}^{n}y_{i}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.4}) \nonumber
a_{0}\sum_{i = 1}^{n}x_{i} + a_{1}\sum_{i = 1}^{n}x_{i}^{2} = \sum_{i = 1}^{n}{x_{i}y_{i}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.5}) \nonumber
Solving the above two simultaneous linear equations (\PageIndex{3.4}) and (\PageIndex{3.5}) gives
a_{1} = \frac{n \displaystyle \sum_{i = 1}^{n}{x_{i}y_{i}} - \sum_{i = 1}^{n}x_{i} \sum_{i = 1}^{n}y_{i}}{n \displaystyle \sum_{i = 1}^{n}x_{i}^{2} - \left( \sum_{i = 1}^{n}x_{i} \right)^{2}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.6}) \nonumber
a_{0} = \frac{\displaystyle \sum_{i = 1}^{n}x_{i}^{2}\ \sum_{i = 1}^{n}y_{i} - \sum_{i = 1}^{n}x_{i} \sum_{i = 1}^{n}{x_{i}y_{i}}}{n\displaystyle \sum_{i = 1}^{n}x_{i}^{2} - \left( \sum_{i = 1}^{n}x_{i} \right)^{2}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.7}) \nonumber
Redefining
S_{xy} = \sum_{i = 1}^{n}{x_{i}y_{i}} - n\bar{x}\bar{y} \nonumber
S_{xx} = \sum_{i = 1}^{n}x_{i}^{2} - n \bar{x}^{2} \nonumber
\bar{x} = \frac{\displaystyle \sum_{i = 1}^{n}x_{i}}{n} \nonumber
\bar{y} = \frac{\displaystyle \sum_{i = 1}^{n}y_{i}}{n} \nonumber
we can also rewrite the constants a_{0} and a_{1} from Equations (\PageIndex{3.6}) and (\PageIndex{3.7}) as
a_{1} = \frac{S_{xy}}{S_{xx}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.8}) \nonumber
a_{0} = \bar{y} - a_{1}\bar{x}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{3.9}) \nonumber
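For reference, the sketch below (added for illustration; Python and the function name are assumptions) implements Equations (\PageIndex{3.8}) and (\PageIndex{3.9}).

```python
# A minimal sketch of Equations (3.8) and (3.9) for the general straight-line
# model y = a0 + a1*x. The function name is illustrative only.

def fit_straight_line(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    s_xx = sum(xi * xi for xi in x) - n * xbar ** 2
    a1 = s_xy / s_xx           # Equation (3.8)
    a0 = ybar - a1 * xbar      # Equation (3.9)
    return a0, a1

# The Table 1.1 data from Lesson 1 gives a0 = 1 and a1 = 2, that is, y = 1 + 2x.
print(fit_straight_line([2.0, 3.0, 2.0, 3.0], [4.0, 6.0, 6.0, 8.0]))
```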
Putting the first derivative equations equal to zero only gives us a critical point. For a general function, it could be a local minimum, a local maximum, a saddle point, or none of these. The second derivative test, though, given in the “optional” appendix below, shows that it is a local minimum. Now, is this local minimum also the absolute minimum? Yes, because the first derivative test gave us only one solution, and S_{r} is a continuous function of a_{0} and a_{1}.
Appendix
Question
Given n data pairs, \left( x_{1},y_{1} \right),\ldots,\left( x_{n},y_{n} \right), do the values of the two constants a_{0} and a_{1} in the least-squares straight-line regression model y = a_{0} + a_{1}x correspond to the absolute minimum of the sum of the squares of the residuals? Are these constants of regression unique?
Solution
Given n data pairs \left( x_{1},y_{1} \right),\ldots,\left( x_{n},y_{n} \right), the best fit for the straight-line regression model
y = a_{0} + a_{1}x\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.1}) \nonumber
is found by the method of least squares. Starting with the sum of the squares of the residuals S_{r}
S_{r} = \sum_{i = 1}^{n}\left( y_{i} - a_{0} - a_{1}x_{i} \right)^{2}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.2}) \nonumber
and using
\frac{\partial S_{r}}{\partial a_{0}} = 0\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.3}) \nonumber
\frac{\partial S_{r}}{\partial a_{1}} = 0\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.4}) \nonumber
gives two simultaneous linear equations whose solution is
a_{1} = \frac{\displaystyle n\sum_{i = 1}^{n}{x_{i}y_{i}} - \sum_{i = 1}^{n}x_{i}\sum_{i = 1}^{n}y_{i}}{\displaystyle n\sum_{i = 1}^{n}x_{i}^{2} - \left( \sum_{i = 1}^{n}x_{i} \right)^{2}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.5a}) \nonumber
a_{0} = \frac{\displaystyle \sum_{i = 1}^{n}x_{i}^{2}\sum_{i = 1}^{n}y_{i} - \sum_{i = 1}^{n}x_{i}\sum_{i = 1}^{n}{x_{i}y_{i}}}{\displaystyle n\sum_{i = 1}^{n}x_{i}^{2} - \left( \sum_{i = 1}^{n}x_{i} \right)^{2}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.5b}) \nonumber
But do these values of a_{0} and a_{1} give the absolute minimum value of S_{r} (Equation (\PageIndex{A.2}))? The first derivative analysis only tells us that these values correspond to a critical point of S_{r}; it does not tell us whether they give a local minimum, let alone an absolute minimum. So, we still need to figure out if they correspond to an absolute minimum.
We first need to conduct a second derivative test to find out whether the point (a_{0},a_{1}) from Equations (\PageIndex{A.5a}) and (\PageIndex{A.5b}) gives a local minimum of S_r. Only then can we examine whether this local minimum also corresponds to the absolute minimum.
What is the second derivative test for a local minimum of a function of two variables?
If you have a function f\left( x,y \right) and we found a critical point \left( a,b \right) from the first derivative test, then \left( a,b \right) is a minimum point if
\frac{\partial^{2}f}{\partial x^{2}}\frac{\partial^{2}f}{\partial y^{2}} - \left( \frac{\partial^{2}f}{\partial x\partial y} \right)^{2} > 0,\ \text{and}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.6}) \nonumber
\frac{\partial^{2}f}{\partial x^{2}} > 0\ \text{or}\ \frac{\partial^{2}f}{\partial y^{2}} > 0\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.7}) \nonumber
From Equation (\PageIndex{A.2})
\begin{split} \frac{\partial S_{r}}{\partial a_{0}} &= \sum_{i = 1}^{n}{2\left( y_{i} - a_{0} - a_{1}x_{i} \right)( - 1)}\\ &= - 2\sum_{i = 1}^{n}\left( y_{i} - a_{0} - a_{1}x_{i} \right)\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.8}) \end{split}
\begin{split} \frac{\partial S_{r}}{\partial a_{1}} &= \sum_{i = 1}^{n}{2\left( y_{i} - a_{0} - a_{1}x_{i} \right)}( - x_{i})\\ &= - 2\sum_{i = 1}^{n}\left( x_{i}y_{i} - a_{0}x_{i} - a_{1}x_{i}^{2} \right)\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.9}) \end{split}
then
\begin{split} \frac{\partial^{2}S_{r}}{\partial a_{0}^{2}} &= - 2\sum_{i = 1}^{n}{- 1}\\ &= 2n\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.10}) \end{split}
\frac{\partial^{2}S_{r}}{\partial a_{1}^{2}} = 2\sum_{i = 1}^{n}x_{i}^{2}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.11}) \nonumber
\frac{\partial^{2}S_{r}}{\partial a_{0}\partial a_{1}} = 2\sum_{i = 1}^{n}x_{i}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.12}) \nonumber
So, condition (\PageIndex{A.7}) is satisfied, because from Equation (\PageIndex{A.10}), we see that 2n is a positive number. Although not required, from Equation (\PageIndex{A.11}), we see that \displaystyle 2\sum_{i = 1}^{n}{x_{i}^{2}\ } is also a positive number, since it is reasonable to assume that not all the x-values are zero.
Is the other condition (Equation (\PageIndex{A.6})) for S_{r} being a minimum met? Yes, it is, as shown below (the pairwise-difference identity used in the last step, which establishes that the term is positive, is stated without proof).
\begin{split} \frac{\partial^{2}S_{r}}{\partial a_{0}^{2}}\frac{\partial^{2}S_{r}}{\partial a_{1}^{2}} - \left( \frac{\partial^{2}S_{r}}{\partial a_{0}\partial a_{1}} \right)^{2} &= \left( 2n \right)\left( 2\sum_{i = 1}^{n}x_{i}^{2} \right) - \left( 2\sum_{i = 1}^{n}x_{i} \right)^{2}\\ &= 4\left\lbrack n\sum_{i = 1}^{n}x_{i}^{2} - \left( \sum_{i = 1}^{n}x_{i} \right)^{2} \right\rbrack\\ &= 4\sum_{\begin{matrix} i = 1 \\ i < j \\ \end{matrix}}^{n}{(x_{i} - x_{j})^{2}} > 0\;\;\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{A.13}) \end{split}
So, the values of a_{0} and a_{1} that we have in Equation (\PageIndex{A.5}) do correspond to a local minimum of S_r. Now, is this local minimum also the absolute minimum? Yes, because the first derivative test gave us only one solution, and that S_{r} is a continuous function of a_{0} and a_{1}.
As a side note, the denominator in Equations (\PageIndex{A.5a}) and (\PageIndex{A.5b}) is nonzero, as shown by Equation (\PageIndex{A.13}), provided the x-values are not all identical. This nonzero value shows that a_{0} and a_{1} are finite numbers.
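The pairwise-difference identity used in the last step of Equation (\PageIndex{A.13}) can at least be spot-checked numerically; the sketch below (added for illustration, assuming Python) does so for one small data set.

```python
# A small numerical spot-check (not a proof) of the identity used in (A.13):
# n*sum(x_i^2) - (sum(x_i))^2 == sum over pairs i < j of (x_i - x_j)^2.

x = [2.0, 3.0, 2.0, 3.0]   # any sample data will do
n = len(x)

lhs = n * sum(xi * xi for xi in x) - sum(x) ** 2
rhs = sum((x[i] - x[j]) ** 2 for i in range(n) for j in range(i + 1, n))
print(lhs, rhs)   # both equal 4.0; positive unless all x-values are identical
```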
Audiovisual Lecture
Title: Derivation of Linear Regression
Summary: This video is on learning how the linear regression formula is derived.
Lesson 4: Application of General Straight-Line Regression Model
Learning Objectives
After successful completion of this lesson, you should be able to:
1) calculate the constants of a linear regression model.
Recap
In the previous lesson, we derived the formulas for the linear regression model. In this lesson, we show the application of the formulas to an applied engineering problem.
The torque, T, needed to turn the torsional spring of a mousetrap through an angle, \theta, is given in Table \PageIndex{4.1} below.
Angle, \theta\ (\text{rad}) | Torque, T\ (\text{N} \cdot \text{m}) |
---|---|
0.698132 | 0.188224 |
0.959931 | 0.209138 |
1.134464 | 0.230052 |
1.570796 | 0.250965 |
1.919862 | 0.313707 |
Find the constants k_{1} and k_{2} of the regression model
T = k_{1} + k_{2}\theta\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{4.E1.1}) \nonumber
Solution
For the linear regression model,
T = k_{1} + k_{2}\theta \nonumber
the constants of the regression model are given by
k_{2} = \frac{\displaystyle n\sum_{i = 1}^{5}{\theta_{i}T_{i}} - \sum_{i = 1}^{5}\theta_{i}\sum_{i = 1}^{5}T_{i}}{\displaystyle n\sum_{i = 1}^{5}\theta_{i}^{2} - \left( \sum_{i = 1}^{5}\theta_{i} \right)^{2}}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{4.E1.2}) \nonumber
k_{1} = \bar{T} - k_{2}\bar{\theta}\;\;\;\;\;\;\;\;\;\;\;\; (\PageIndex{4.E1.3}) \nonumber
Table \PageIndex{4.2} shows the summations needed for the calculation of the above two constants k_{1} and k_{2} of the regression model.
i | \theta\ (\text{rad}) | T\ (\text{N} \cdot \text{m}) | \theta^2\ (\text{rad}^2) | T\theta\ (\text{N} \cdot \text{m} \cdot \text{rad}) |
---|---|---|---|---|
1 | 0.698132 | 0.188224 | 4.87388 \times 10^{-1} | 1.31405\times10^{-1} |
2 | 0.959931 | 0.209138 | 9.21468 \times 10^{-1} | 2.00758\times10^{-1} |
3 | 1.134464 | 0.230052 | 1.2870 | 2.60986\times10^{-1} |
4 | 1.570796 | 0.250965 | 2.4674 | 3.94215\times10^{-1} |
5 | 1.919862 | 0.313707 | 3.6859 | 6.02274\times10^{-1} |
\displaystyle \sum_{i = 1}^{5} | 6.2831 | 1.1921 | 8.8491 | 1.5896 |
Using the summations from the last row of Table \PageIndex{4.2}, we get
n = 5 \nonumber
From Equation (\PageIndex{4.E1.2}),
\begin{split} k_{2} &= \frac{\displaystyle n\sum_{i = 1}^{5}{\theta_{i}T_{i}} - \sum_{i = 1}^{5}\theta_{i}\sum_{i = 1}^{5}T_{i}}{\displaystyle n\sum_{i = 1}^{5}\theta_{i}^{2} - \left( \sum_{i = 1}^{5}\theta_{i} \right)^{2}}\\[4pt] &= \frac{5(1.5896) - (6.2831)(1.1921)}{5(8.8491) - (6.2831)^{2}}\\[4pt] &= 9.6091 \times 10^{- 2}\ \text{N} \cdot \text{m/rad} \end{split}
To find k_{1}
\begin{split} \bar{T} &= \frac{\displaystyle \sum_{i = 1}^{5}T_{i}}{n}\\ &= \frac{1.1921}{5}\\ &= 2.3842 \times 10^{- 1}\ \text{N} \cdot \text{m} \end{split}
\begin{split} \bar{\theta} &= \frac{\displaystyle \sum_{i = 1}^{5}\theta_{i}}{n}\\ &= \frac{6.2831}{5}\\ &= 1.2566\ \text{rad} \end{split}
From Equation (\PageIndex{4.E1.3}),
\begin{split} k_{1} &= \bar{T} - k_{2}\bar{\theta}\\ &= 2.3842 \times 10^{- 1} - (9.6091 \times 10^{- 2})(1.2566)\\ &= 1.1767 \times 10^{- 1}\ \text{N} \cdot \text{m} \end{split}
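The same constants can be obtained with a library least-squares fit as a cross-check (a sketch added for illustration; the use of numpy.polyfit is an assumption, one of several equivalent routines).

```python
# A minimal cross-check of k1 and k2 for this example using numpy.
import numpy as np

theta = np.array([0.698132, 0.959931, 1.134464, 1.570796, 1.919862])   # rad
torque = np.array([0.188224, 0.209138, 0.230052, 0.250965, 0.313707])  # N*m

k2, k1 = np.polyfit(theta, torque, 1)   # polyfit returns slope first, then intercept
print(k1, k2)   # approximately 0.11767 N*m and 0.096091 N*m/rad
```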
Audiovisual Lecture
Title: Linear Regression Applications
Summary: This video will teach you, through an example, how to regress data to a straight line.
Multiple Choice Test
(1). Given \left( x_{1},y_{1} \right),\left( x_{2},y_{2} \right),\ldots,\left( x_{n},y_{n} \right), best fitting data to y = f\left( x \right) by least squares requires minimization of
(A) \displaystyle \sum_{i = 1}^{n}\left\lbrack y_{i} - f\left( x_{i} \right) \right\rbrack
(B) \displaystyle \sum_{i = 1}^{n}\left| y_{i} - f\left( x_{i} \right) \right|
(C) \displaystyle \sum_{i = 1}^{n}\left\lbrack y_{i} - f\left( x_{i} \right) \right\rbrack^{2}
(D) \displaystyle \sum_{i = 1}^{n}(y_{i} - \bar{y})^{2},\ \bar{y} = \frac{\displaystyle \sum_{i = 1}^{n}y_{i}}{n}
(2). The following data
x | 1 | 20 | 30 | 40 |
---|---|---|---|---|
y | 1 | 400 | 800 | 1300 |
is regressed with least squares regression to y = a_{0} + a_{1}x. The value of a_{1} most nearly is
(A) 27.480
(B) 28.956
(C) 32.625
(D) 40.000
(3). The following data
x | 1 | 20 | 30 | 40 |
---|---|---|---|---|
y | 1 | 400 | 800 | 1300 |
is regressed with least squares regression to y = a_{1}x. The value of a_{1} most nearly is
(A) 27.480
(B) 28.956
(C) 32.625
(D) 40.000
(4). An instructor gives the same y vs. x data as given below to four students and asks them to regress the data with least squares regression to y = a_{0} + a_{1}x.
x | 1 | 10 | 20 | 30 | 40 |
---|---|---|---|---|---|
y | 1 | 100 | 400 | 600 | 1200 |
The four students come up with four different answers for the straight-line regression model. Only one is correct. Which one is the correct model? (Additional exercise: without using the regression formulas for a_0 and a_1, can you find the correct model?)
(A) y = 60x - 1200
(B) y = 30x - 200
(C) y = - 139.43 + 29.684x
(D) y = 1 + 22.782x
(5). A torsion spring of a mousetrap is twisted through an angle of 180^\circ. The torque vs. angle data is given below.
\text{Torsion}, T (\text{N} \cdot \text{m}) | 0.110 | 0.189 | 0.230 | 0.250 |
---|---|---|---|---|
\text{Angle}, \theta (\text{rad}) | 0.10 | 0.50 | 1.1 | 1.5 |
The relationship between the torque and the angle is T = a_{0} + a_{1}\theta.
The amount of strain energy stored in the mousetrap spring in Joules is
(A) 0.29872
(B) 0.41740
(C) 0.84208
(D) 1561.8
(6). A scientist finds that regressing the y vs. x data given below to y = a_{0} + a_{1}x results in the coefficient of determination for the straight-line model, r^{2}, being zero.
x | 1 | 3 | 11 | 17 |
---|---|---|---|---|
y | 2 | 6 | 22 | ? |
The missing value for y at x = 17 most nearly is
(A) -2.4444
(B) 2.0000
(C) 6.8889
(D) 34.000
For the complete solution, go to
http://nm.mathforcollege.com/mcquizzes/06reg/quiz_06reg_linear_solution.pdf
Problem Set
(1). Given the following data of y vs. x
x | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
y | 1 | 4 | 9 | 16 | 25 |
The data is regressed to a straight line y = - 7 + 6x. What is the residual at x = 4?
- Answer
-1
(2). The force vs. displacement data for a linear spring is given below. F is the force in Newtons and x is the displacement in meters. Assume displacement data is known more accurately.
\text{Displacement},\ x\ (\text{m}) | 10 | 15 | 20 |
---|---|---|---|
\text{Force},\ F\ (\text{N}) | 100 | 200 | 400 |
If the F vs x data is regressed to F = a + kx, what is the value of k by minimizing the sum of the square of the residuals?
- Answer
30\ \text{N}/\text{m}
(3). A torsion spring of a mousetrap is twisted through an angle of 180^{\circ}. The torque vs. angle data is given below.
\Theta\ (\text{rad}) | 0.12 | 0.50 | 1.1 |
---|---|---|---|
T\ (\text{N} \cdot \text{m}) | 0.25 | 1.00 | 2.0 |
Assuming that the torque and the angle are related via a general straight line as T = k_{0} + k_{1}\ \theta, regress the above data to the straight-line model.
- Answer
0.06567+1.7750\theta
(4). The force vs. displacement data for a linear spring is given below. F is the force in Newtons and x is the displacement in meters. Assume displacement data is known more accurately.
\text{Displacement},\ x\ (\text{m}) | 10 | 15 | 20 |
---|---|---|---|
\text{Force},\ F\ (\text{N}) | 100 | 200 | 400 |
If the F vs. x data is regressed to F = kx, what is the value of k by minimizing the sum of the square of the residuals?
- Answer
16.55\ \text{N}/\text{m}
(5). Given the following data of y vs. x
x | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
y | 1 | 1.1 | 0.9 | 0.96 | 1.01 |
If the y vs. x data is regressed to a constant line given by y = a, where a is a constant, what is the value of a by minimizing the sum of the square of the residuals?
- Answer
0.994
(6). To find the longitudinal modulus of a composite material, the following data is given.
Strain (%) | Stress (MPa) |
---|---|
0 | 0 |
0.183 | 306 |
0.36 | 612 |
0.5324 | 917 |
0.702 | 1223 |
0.867 | 1529 |
1.0244 | 1835 |
1.1774 | 2140 |
1.329 | 2446 |
1.479 | 2752 |
1.5 | 2767 |
1.56 | 2896 |
Find the longitudinal modulus, E, using the regression model. (Hint: \sigma = E\varepsilon)
- Answer
182.8\ \text{GPa}