6.01: Prerequisites to Regression
Lesson 1: Simple Statistics
Learning Objectives
After successful completion of this lesson, you should be able to:
1) compute the average of a set of numbers,
2) compute the total sum of squares of a set of numbers,
3) compute the variance of a set of numbers,
4) compute the standard deviation of a set of numbers.
Introduction
In regression analysis, we are required to find the average, variance, and standard deviation of a set of numbers. In this lesson, we cover some simple descriptive statistics.
For a given set of \(n\) numbers \((y_1, y_2, \ldots, y_n)\), the average or arithmetic mean \(\bar{y}\) is defined by
\[\bar{y}=\frac{\sum_{i=1}^{n}y_i}{n}\qquad(6.1.1.1)\]
The total sum of the squares of the differences between the numbers and the mean (sometimes just called the total sum of squares) \(S_t\) is defined as
\[S_t=\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2\qquad(6.1.1.2)\]
The variance of the numbers \(\sigma^2\) is defined by
\[\sigma^2=\frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}{n-1}\qquad(6.1.1.3)\]
The standard deviation \(\sigma\) of the numbers is defined by
\[\sigma=\sqrt{\frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}{n-1}}\qquad(6.1.1.4)\]
The standard deviation is a measure of the dispersion of a set of data about its mean. If the data points are farther from the mean, the standard deviation is high; if the data points are close to the mean, the standard deviation is low.
Given the numbers (5, 8, 50, 3, 7), calculate the average, total sum of the squares, variance, and standard deviation of the numbers.
Solution
The average of a set of numbers is given by
\[\bar{y}=\frac{\sum_{i=1}^{n}y_i}{n}\]
In the problem,
\[n=5,\ y_1=5,\ y_2=8,\ y_3=50,\ y_4=3,\ y_5=7\]
The average of the numbers is
\[\bar{y}=\frac{5+8+50+3+7}{5}=14.6\]
The total sum of the squares is
\[\begin{aligned}S_t&=\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2=\sum_{i=1}^{n}\left(y_i-14.6\right)^2\\ &=(5-14.6)^2+(8-14.6)^2+(50-14.6)^2+(3-14.6)^2+(7-14.6)^2\\ &=1581.2\end{aligned}\]
The variance is
\[\sigma^2=\frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}{n-1}=\frac{S_t}{n-1}=\frac{1581.2}{5-1}=395.3\]
The standard deviation is
\[\sigma=\sqrt{\sigma^2}=\sqrt{395.3}=19.88\]
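The arithmetic above can be checked numerically. Below is a minimal sketch, assuming NumPy is available; the variable names are illustrative.

```python
# Verify the average, total sum of squares, variance, and standard deviation
# for the data set (5, 8, 50, 3, 7) from the worked example.
import numpy as np

y = np.array([5.0, 8.0, 50.0, 3.0, 7.0])

y_bar = np.mean(y)                    # average, Eq. (6.1.1.1)
S_t = np.sum((y - y_bar) ** 2)        # total sum of squares, Eq. (6.1.1.2)
variance = S_t / (len(y) - 1)         # variance, Eq. (6.1.1.3)
std_dev = np.sqrt(variance)           # standard deviation, Eq. (6.1.1.4)

print(y_bar, S_t, variance, std_dev)  # 14.6, 1581.2, 395.3, 19.88...
```

Note that `np.var(y, ddof=1)` and `np.std(y, ddof=1)` return the same variance and standard deviation, since `ddof=1` selects the \(n-1\) divisor used in Eqs. (6.1.1.3) and (6.1.1.4).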
Lesson 2: Absolute Minimum of a Function of One Variable
Learning Objectives
After successful completion of this lesson, you should be able to:
1) Find the local minimum of a twice-differentiable continuous single-variable function. This knowledge will assist you in deriving the parameters of regression models in the lessons to follow.
Minimum of a twice-differentiable continuous function
In regression, we are asked to minimize a differentiable, continuous function of one or more variables. In this primer, we cover the basics of finding the minimum of a continuous function that is twice differentiable with domain D.
Absolute Minimum Value: Given a function \(f(x)\) with domain \(D\), \(f(c)\) is the absolute minimum of \(f\) on \(D\) if and only if \(f(x)\ge f(c)\) for all \(x\) in \(D\).

Look at Figure 6.1.2.1: if the domain \(D\) is given by the interval \([a,g]\), then \(C\) is the absolute minimum of the function, as it is the smallest value of the function on the domain \(D\). To find the absolute minimum of a continuous function with domain \(D\), we look at the value of the function at the endpoints of \(D\) and also check where \(f'(x)=0\).
These points where \(f'(x)=0\) are critical points and could be local extreme values (local minima or local maxima). If \(f''(x)>0\) at a critical point, then it corresponds to a local minimum. Out of all the local minima and the values at the ends of the domain, one can find the smallest of all such values. The point where this smallest value occurs is the location of the absolute minimum, and the value of the function at that point is the absolute minimum of the function.
Find the location of the minimum of the polynomial \(f(x)=25-20x+4x^2\).
Solution
\[\begin{aligned}f'(x)&=\frac{d}{dx}\left(25-20x+4x^2\right)\\ &=\frac{d}{dx}(25)+\frac{d}{dx}(-20x)+\frac{d}{dx}\left(4x^2\right)\\ &=0-20+4\frac{d}{dx}\left(x^2\right)\\ &=-20+4(2x)\\ &=-20+8x\end{aligned}\]
Check where \(f'(x)=0\):
\[-20+8x=0\]
gives
\[x=\frac{20}{8}=2.5\]
Now check \(f''(x)\):
\[\begin{aligned}f'(x)&=-20+8x\\ f''(x)&=\frac{d}{dx}(-20+8x)=8\\ f''(2.5)&=8\end{aligned}\]
Since \(f'(2.5)=0\) and \(f''(2.5)>0\), the function has a local minimum at \(x=2.5\). Is this point also the location of the absolute minimum? Yes, for two reasons. First, the value of the function approaches infinity at the ends of the domain \((-\infty,\infty)\); this check looks for extreme values at the endpoints of the domain. Second, there is only one critical point, at which \(f'(x)=0\) and \(f''(x)>0\).
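The same procedure (solve \(f'(x)=0\), then check the sign of \(f''(x)\)) can be carried out symbolically. Below is a minimal sketch, assuming SymPy is available:

```python
# Locate the local minimum of f(x) = 25 - 20x + 4x^2 by solving f'(x) = 0
# and checking the sign of f''(x) at the critical point.
import sympy as sp

x = sp.symbols('x')
f = 25 - 20 * x + 4 * x**2

f1 = sp.diff(f, x)                 # first derivative: -20 + 8x
f2 = sp.diff(f, x, 2)              # second derivative: 8

critical_points = sp.solve(f1, x)  # [5/2]
for c in critical_points:
    print(c, f2.subs(x, c) > 0)    # 5/2 True -> local (and absolute) minimum
```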
Given the \((x,y)\) data points \((5,10)\), \((6,15)\), and \((10,20)\), and
\[S=\sum_{i=1}^{3}\left(y_i-ax_i\right)^2,\]
find the value of \(a\) at which the minimum of the sum occurs.
Solution
The (x,y) data pairs are given as follows
\[x_1=5,\ x_2=6,\ x_3=10,\ y_1=10,\ y_2=15,\ y_3=20\]
Calculating S
\[\begin{aligned}S&=\sum_{i=1}^{3}\left(y_i-ax_i\right)^2\\ &=(y_1-ax_1)^2+(y_2-ax_2)^2+(y_3-ax_3)^2\\ &=(10-5a)^2+(15-6a)^2+(20-10a)^2\\ &=161a^2-680a+725\end{aligned}\]
Finding the first derivative
\[\frac{dS}{da}=\frac{d}{da}\left(161a^2-680a+725\right)=161(2a)-680=322a-680\]
Using
\[\frac{dS}{da}=0\]
gives
\[322a-680=0\]
\[a=\frac{680}{322}=2.111\]
For the second derivative test
\[\frac{d^2S}{da^2}=\frac{d}{da}\left(\frac{dS}{da}\right)=\frac{d}{da}(322a-680)=322\]
\[\left.\frac{d^2S}{da^2}\right|_{a=2.11}=322\]
Since
\[\frac{dS}{da}=0 \text{ at } a=2.11 \text{ and}\]
\[\left.\frac{d^2S}{da^2}\right|_{a=2.11}>0,\]
a local minimum exists at \(a=2.11\). Also, since \(S\) is a continuous function of \(a\) and it has only one point where \(\dfrac{dS}{da}=0\) and \(\dfrac{d^2S}{da^2}>0\), this local minimum is the absolute minimum.
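The minimization of \(S(a)\) can also be verified symbolically. Below is a minimal sketch, assuming SymPy is available:

```python
# Minimize S(a) = sum_{i=1}^{3} (y_i - a*x_i)^2 for the data pairs
# (5,10), (6,15), (10,20) by solving dS/da = 0 and checking d^2S/da^2 > 0.
import sympy as sp

a = sp.symbols('a')
x_data = [5, 6, 10]
y_data = [10, 15, 20]

S = sum((yi - a * xi) ** 2 for xi, yi in zip(x_data, y_data))
print(sp.expand(S))           # 161*a**2 - 680*a + 725

dS = sp.diff(S, a)            # 322*a - 680
d2S = sp.diff(S, a, 2)        # 322 > 0, so the critical point is a minimum

a_min = sp.solve(dS, a)[0]
print(a_min, float(a_min))    # 340/161, approximately 2.11
```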
Alternative Solution
Look at the solution if we had not expanded the summation.
\[S=\sum_{i=1}^{3}\left(y_i-ax_i\right)^2\]
Using the chain rule, if u is a function of variable a
\[\frac{d}{da}\left(u^2\right)=2u\frac{du}{da}\]
then for
\[\frac{dS}{da}=0\]
we get
\[\sum_{i=1}^{3}2\left(y_i-ax_i\right)(-x_i)=0\]
\[\sum_{i=1}^{3}\left(-2y_ix_i+2ax_i^2\right)=0\]
\[\sum_{i=1}^{3}\left(-2y_ix_i\right)+\sum_{i=1}^{3}2ax_i^2=0\]
\[-2\sum_{i=1}^{3}y_ix_i+2a\sum_{i=1}^{3}x_i^2=0\]
\[2a\sum_{i=1}^{3}x_i^2=2\sum_{i=1}^{3}y_ix_i\]
\[a=\frac{\sum_{i=1}^{3}y_ix_i}{\sum_{i=1}^{3}x_i^2}\]
Substituting the given values of \((x_i,y_i)\) gives
\[a=\frac{(10\times 5)+(15\times 6)+(20\times 10)}{5^2+6^2+10^2}=\frac{340}{161}=2.11\]
We found
\[\frac{dS}{da}=\sum_{i=1}^{3}\left(-2y_ix_i+2ax_i^2\right)\]
Then
\[\frac{d^2S}{da^2}=\sum_{i=1}^{3}2x_i^2=2(5)^2+2(6)^2+2(10)^2=322\]
Hence, a local minimum exists at \(a=2.11\). Also, since \(S\) is a continuous function of \(a\) and it has only one point where \(\dfrac{dS}{da}=0\) and \(\dfrac{d^2S}{da^2}>0\), this local minimum is the absolute minimum.
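The closed-form expression for \(a\) from the alternative solution can be evaluated directly. A minimal sketch, assuming NumPy is available:

```python
# Evaluate a = (sum x_i*y_i) / (sum x_i^2) for the data of the example.
import numpy as np

x = np.array([5.0, 6.0, 10.0])
y = np.array([10.0, 15.0, 20.0])

a = np.sum(x * y) / np.sum(x ** 2)   # 340 / 161
print(a)                             # approximately 2.11
```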
Lesson 3: Partial Derivatives
Learning Objectives
After successful completion of this lesson, you should be able to:
1) know the definition of a partial derivative,
2) find partial derivatives of a function.
Introduction
In regression, we are asked to find partial derivatives of functions. In this lesson, we cover the definition of a partial derivative and find partial derivatives of a simple function.
Given a function \(f(x)\) of one variable \(x\), we already know how to find the derivative \(f'(x)\). However, functions can have more than one independent variable. How do we then calculate the rate of change of a function with respect to each variable? This is done by defining partial derivatives, where one finds the derivative with respect to one variable while treating the other variables as constants. For example, for a function \(f(x,y)\), the partial derivative with respect to \(x\) is defined as
\[\frac{\partial f}{\partial x}=\lim_{\Delta x\to 0}\frac{f(x+\Delta x,y)-f(x,y)}{\Delta x}\qquad(6.1.3.1)\]
and the partial derivative with respect to y is defined as
\[\frac{\partial f}{\partial y}=\lim_{\Delta y\to 0}\frac{f(x,y+\Delta y)-f(x,y)}{\Delta y}\qquad(6.1.3.2)\]
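The limit definitions can be approximated numerically by using a small but finite \(\Delta x\) and \(\Delta y\). Below is a minimal sketch in plain Python; the sample function \(f(x,y)=x^2y\) and the evaluation point are illustrative choices, not taken from the text:

```python
# Approximate the partial derivatives of f(x, y) = x**2 * y at (x, y) = (3, 2)
# with forward differences, mimicking definitions (6.1.3.1) and (6.1.3.2).
def f(x, y):
    return x**2 * y

x0, y0 = 3.0, 2.0
h = 1.0e-6                                # small step standing in for the limit

df_dx = (f(x0 + h, y0) - f(x0, y0)) / h   # exact value is 2*x*y = 12
df_dy = (f(x0, y0 + h) - f(x0, y0)) / h   # exact value is x**2 = 9

print(df_dx, df_dy)
```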
Given
\[f(x,y)=2x^3y^2+7x^2y^2,\]
find
\(\dfrac{\partial f}{\partial x}\) and \(\dfrac{\partial f}{\partial y}\).
Solution
\[f(x,y)=2x^3y^2+7x^2y^2\]
\[\frac{\partial f}{\partial x}=\frac{\partial}{\partial x}\left(2x^3y^2+7x^2y^2\right)\]
Since all variables other than x are considered to be constant,
\[\begin{aligned}\frac{\partial f}{\partial x}&=2y^2\frac{\partial}{\partial x}\left(x^3\right)+7y^2\frac{\partial}{\partial x}\left(x^2\right)\\ &=2y^2\left(3x^2\right)+7y^2(2x)\\ &=6x^2y^2+14xy^2\end{aligned}\]
Now
\[\frac{\partial f}{\partial y}=\frac{\partial}{\partial y}\left(2x^3y^2+7x^2y^2\right)\]
Since all the variables other than y are considered to be constant,
\[\begin{aligned}\frac{\partial f}{\partial y}&=2x^3\frac{\partial}{\partial y}\left(y^2\right)+7x^2\frac{\partial}{\partial y}\left(y^2\right)\\ &=2x^3(2y)+7x^2(2y)\\ &=4x^3y+14x^2y\end{aligned}\]
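These partial derivatives can be verified symbolically. A minimal sketch, assuming SymPy is available:

```python
# Verify the partial derivatives of f(x, y) = 2x^3*y^2 + 7x^2*y^2 symbolically.
import sympy as sp

x, y = sp.symbols('x y')
f = 2 * x**3 * y**2 + 7 * x**2 * y**2

print(sp.diff(f, x))   # 6*x**2*y**2 + 14*x*y**2
print(sp.diff(f, y))   # 4*x**3*y + 14*x**2*y
```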
Lesson 4: Absolute Minimum of a Function of Multiple Variables
Learning Objectives
After successful completion of this lesson, you should be able to:
1) explain the first-order optimality condition,
2) apply the first-order optimality condition to find potential local minima of a multivariable, continuous, and twice-differentiable function.
Recap: Local Minimum of a Single-Variable Function
In a previous lesson, we learned about the optimality conditions for finding extreme points and how to apply them to find a local minimum of a single-variable function. The optimality conditions are:
- First-order optimality condition: \(f'(x)=0\) (zero slope)
- Second-order optimality condition: \(f''(x)>0\) (bowl-shaped, also called concave up)
Local Minimum of a Multivariable Function
The ideas for finding a local minimum of a multivariable function \(y=f(x_1,x_2,x_3,\ldots,x_n)\) are very similar to those for a single-variable function. The first- and second-order derivatives of the function can help us find local minima.
First-Order Optimality Condition
The condition remains the same: \(f'(x)=0\), where \(x\) is a vector of \(n\) independent variables \((x_1,x_2,x_3,\ldots,x_n)\). However, since \(f(x)\) is now a multivariable function \(f(x)=f(x_1,x_2,x_3,\ldots,x_n)\), \(f'(x)\) is now a vector, called the gradient. Each element of the gradient vector is the corresponding partial derivative. The first-order optimality condition can now be written more explicitly as:
\[f'(x)=\begin{bmatrix}\dfrac{\partial f}{\partial x_1}\\[1.5ex]\dfrac{\partial f}{\partial x_2}\\ \vdots\\ \dfrac{\partial f}{\partial x_n}\end{bmatrix}=\begin{bmatrix}0\\0\\ \vdots\\0\end{bmatrix}\qquad(6.1.4.1)\]
In other words, we have
\[\frac{\partial f}{\partial x_i}=0,\quad\forall\, i=1,2,\ldots,n\qquad(6.1.4.2)\]
where the symbol \(\forall\) means “for all.”
Second-Order Optimality Condition
Similarly, the second-order optimality condition has to do with the second-order derivative of the function, \(f''(x)\). For a multivariable function, the second-order derivative is defined by the Hessian matrix:
\[H(x)=\begin{bmatrix}\dfrac{\partial^2 f}{\partial x_1^2}&\dfrac{\partial^2 f}{\partial x_1\,\partial x_2}&\cdots&\dfrac{\partial^2 f}{\partial x_1\,\partial x_n}\\[1.5ex]\dfrac{\partial^2 f}{\partial x_2\,\partial x_1}&\dfrac{\partial^2 f}{\partial x_2^2}&\cdots&\dfrac{\partial^2 f}{\partial x_2\,\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1}&\dfrac{\partial^2 f}{\partial x_n\,\partial x_2}&\cdots&\dfrac{\partial^2 f}{\partial x_n^2}\end{bmatrix}\qquad(6.1.4.3)\]
The shape of the function is determined by the definiteness of \(H(x)\). The second-order optimality discussion is beyond the scope of this course.
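As an illustration of the gradient vector (6.1.4.1) and the Hessian matrix (6.1.4.3), below is a minimal SymPy sketch; the sample function \(f(x_1,x_2)=x_1^2+3x_2^2+x_1x_2\) is an illustrative choice, not taken from the text:

```python
# Build the gradient vector (6.1.4.1) and Hessian matrix (6.1.4.3) of a
# sample two-variable function f(x1, x2) = x1**2 + 3*x2**2 + x1*x2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 3 * x2**2 + x1 * x2

gradient = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])
H = sp.hessian(f, (x1, x2))

print(gradient)                             # Matrix([[2*x1 + x2], [x1 + 6*x2]])
print(H)                                    # Matrix([[2, 1], [1, 6]])
print(sp.solve(list(gradient), [x1, x2]))   # {x1: 0, x2: 0} satisfies (6.1.4.2)
```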
Critical Points of a Function of Two Variables
Let \(z=f(x,y)\) be a function of two variables \(x\) and \(y\). Then \((x_0,y_0)\) is a critical point of \(f\) if \(f\) is differentiable on an open set containing \((x_0,y_0)\) and if one of the two conditions below is true.
\[\begin{aligned}(1)\quad &\left.\frac{\partial f}{\partial x}\right|_{x_0,y_0}=0;\ \left.\frac{\partial f}{\partial y}\right|_{x_0,y_0}=0\\ (2)\quad &\left.\frac{\partial f}{\partial x}\right|_{x_0,y_0}\ \text{or}\ \left.\frac{\partial f}{\partial y}\right|_{x_0,y_0}\ \text{does not exist}\end{aligned}\qquad(6.1.4.4)\]
Find the critical points of the function
\[f(x,y)=y^4+8xy-4y-4x\]
Solution
Now
\[\begin{aligned}\frac{\partial f}{\partial x}&=\frac{\partial}{\partial x}\left(y^4+8xy-4y-4x\right)=8y-4\\ \frac{\partial f}{\partial y}&=\frac{\partial}{\partial y}\left(y^4+8xy-4y-4x\right)=4y^3+8x-4\end{aligned}\]
Putting
\[\frac{\partial f}{\partial x}=0\]
gives
\[8y-4=0\]
\[y=\frac{4}{8}=0.5\]
Putting
\[\frac{\partial f}{\partial y}=0\]
gives
\[4y^3+8x-4=0\]
\[x=\frac{4-4y^3}{8}\]
Then at
\[y=0.5,\]
since
\[\frac{\partial f}{\partial y}=0,\]
we get
\[x=\frac{4-4(0.5)^3}{8}=0.4375\]
Hence \((x,y)=(0.4375,\ 0.5)\) is a critical point of the function. This critical point could correspond to a local minimum, a local maximum, or a saddle point.
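The critical point can be verified by solving the two equations of condition (1) in Eq. (6.1.4.4) simultaneously. A minimal sketch, assuming SymPy is available:

```python
# Find the critical points of f(x, y) = y^4 + 8*x*y - 4*y - 4*x by solving
# df/dx = 0 and df/dy = 0 simultaneously, as in Eq. (6.1.4.4), condition (1).
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = y**4 + 8 * x * y - 4 * y - 4 * x

fx = sp.diff(f, x)    # 8*y - 4
fy = sp.diff(f, y)    # 4*y**3 + 8*x - 4

print(sp.solve([fx, fy], [x, y], dict=True))   # [{x: 7/16, y: 1/2}], i.e. (0.4375, 0.5)
```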
The following appendix is not used for the course but is included here for completeness. It presents the second derivative test, which establishes whether a critical point corresponds to a local minimum. One can then find the absolute minimum by looking at all the local minima and the values of the function at the boundary of the domain.
Appendix
Let \(z=f(x,y)\) be a function of two variables \(x\) and \(y\), and let the first- and second-order partial derivatives of \(f\) be continuous on a domain containing \((x_0,y_0)\). Then \(f(x,y)\) has a local minimum at \((x_0,y_0)\) if all the conditions below are met:
\[\left.\frac{\partial f}{\partial x}\right|_{x_0,y_0}=0\ \text{ and }\ \left.\frac{\partial f}{\partial y}\right|_{x_0,y_0}=0,\]
\[\left(\left.\frac{\partial^2 f}{\partial x^2}\right|_{x_0,y_0}\right)\left(\left.\frac{\partial^2 f}{\partial y^2}\right|_{x_0,y_0}\right)-\left(\left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_{x_0,y_0}\right)^2>0,\ \text{and}\]
\[\left.\frac{\partial^2 f}{\partial x^2}\right|_{x_0,y_0}>0\qquad(6.1.A.1)\]
To find the absolute minimum, one needs to choose the smallest value among all the local minima and the values of the function at the boundary of the domain.
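As an illustration only (not part of the lesson), the appendix test can be applied to the critical point found in the earlier example. There, \(D=f_{xx}f_{yy}-f_{xy}^2=-64<0\), which indicates a saddle point rather than a local minimum. A minimal sketch, assuming SymPy is available:

```python
# Apply the second-derivative test of the appendix to the critical point
# (x0, y0) = (7/16, 1/2) of f(x, y) = y^4 + 8*x*y - 4*y - 4*x found earlier.
import sympy as sp

x, y = sp.symbols('x y')
f = y**4 + 8 * x * y - 4 * y - 4 * x
x0, y0 = sp.Rational(7, 16), sp.Rational(1, 2)

fxx = sp.diff(f, x, 2).subs({x: x0, y: y0})   # 0
fyy = sp.diff(f, y, 2).subs({x: x0, y: y0})   # 12*y0**2 = 3
fxy = sp.diff(f, x, y).subs({x: x0, y: y0})   # 8

D = fxx * fyy - fxy**2
print(D, fxx)   # D = -64 < 0, so this critical point is not a local minimum
```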
Multiple Choice Test
(1). The average of 7 numbers is given as 12.6. If 6 of the numbers are 5, 7, 9, 12, 17, and 10, the remaining number is
(A) −47.9
(B) −47.4
(C) 15.6
(D) 28.2
(2). The average and standard deviation of 7 numbers are given as 8.142 and 5.005, respectively. If 5 of the numbers are 5, 7, 9, 12, and 17, the other two numbers are
(A) −0.1738, 7.175
(B) 3.396, 12.890
(C) 3.500, 3.500
(D) 4.488, 2.512
(3). A local minimum of a continuous function in the interval (−∞,∞) exists at x=a if
(A) f′(a)=0,f′′(a)=0
(B) f′(a)=0,f′′(a)<0
(C) f′(a)=0,f′′(a)>0
(D) f′(a)=0,f′′(a) does not exist
(4). The absolute minimum of a function \(f(x)=x^2+2x-15\) in the interval \((-\infty,\infty)\) exists at \(x=\)________ and is ________.
(A) x=−1, f(−1)=−16
(B) x=−1, f(−1)=0
(C) x=3, f(3)=0
(D) x=5, f(5)=0
(5). The first-order partial derivative with respect to \(x\) of \(u(x,y)=x^2y^3+6x^3e^{2y}\) is
(A) \(y^3+6e^{2y}\)
(B) \(3x^2y^2+18x^3e^{2y}\)
(C) \(2xy^3+18x^2e^{2y}\)
(D) \(2xy^3+24x^2e^{2y}\)
(6). The critical point(s) \((x,y)\) of the function \(f(x,y)=y^3+4xy-16y-4x^2\) is (are)
(A) (−4/3,1),(−8/3,2)
(B) (4/3,8/3) and (−1,−2)
(C) (−4/3,−8/3) and (1,2)
(D) (0,0)
For the complete solution, go to
http://nm.mathforcollege.com/mcquizzes/06reg/quiz_06reg_background_solution_new.pdf