Things to do in Lecture 1
• Outline basic concepts of causality
• Go through the ideas & principles underlying ordinary least squares (OLS) estimation
• Derive formally how to estimate the values of the coefficients we are interested in using OLS
• Run an OLS regression
DERIVING OLS REGRESSION COEFFICIENTS
• Regression analysis is primarily concerned with quantifying the relationship between variables
• It does more than measures of association such as the correlation coefficient, since it implies that one variable depends on another
• hence there is a causal relationship between the variable whose behaviour we would like to explain
- the dependent variable (which we label Y)
and the explanatory (or independent) variable that we think can explain this behaviour (which we label X)
- Given data on both the dependent and explanatory variable we can establish that causal relationship using regression analysis
- How?
- fit a straight line through the data that best summarises the relationship
- Since the equation of a straight line is given by
Y = b0 + b1X
we can obtain an estimate of the values for the intercept b0 (the constant) and the slope b1 that together give the “line of best fit”
How?
[Scatter diagram: suppose we plot a set of observations on the Y variable and the X variable of interest. A candidate straight line is drawn through the points, with intercept b0 on the Y axis and slope b1.]
Which line (and hence which slope & intercept) to choose?
- The one that minimises the sum of squared residuals
Using the principle of Ordinary Least Squares (OLS)
- Try to fit a (straight) line through the data based on
“minimising the sum of squared residuals”
What is a residual?
If we fit a line through some data then this will give a
predicted value for the dependent variable based on the
value of the X variable and the values for the constant
and the slope
[Diagram: a fitted straight line with intercept b0 and slope b1. At the value Xi we can read off both the actual value yi and the predicted value ŷi on the line.]
Given any straight line we can always read off the actual and predicted values of the variable of interest for every individual i in the data set:
$\hat{y}_i = \hat{b}_0 + \hat{b}_1 X_i$
For example given the equation of a straight line
Y = b0 + b1X
we know when X = 1, then the predicted value of Y is
$\hat{y} = \hat{b}_0 + \hat{b}_1 (1)$
and when X = 2, then the predicted value of Y is
$\hat{y} = \hat{b}_0 + \hat{b}_1 (2)$
We can then compare this predicted value with the actual value of the dependent variable and the difference between the actual and predicted value gives the residual
$\hat{u}_i = y_i - \hat{y}_i$
which, since
$\hat{y}_i = \hat{b}_0 + \hat{b}_1 X_i$
gives
$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{b}_0 - \hat{b}_1 X_i$
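As a small illustration (not from the lecture), the residual calculation can be written out directly in Python; the data values and the candidate intercept and slope below are invented purely for demonstration.

```python
# Sketch: predicted values and residuals for a candidate line y_hat = b0 + b1*X.
# The data and the coefficients are made up for illustration only.
X = [1.0, 2.0, 3.0, 4.0]
y = [2.9, 5.1, 6.8, 9.2]

b0, b1 = 1.0, 2.0  # a candidate intercept and slope (assumed, not estimated)

y_hat = [b0 + b1 * x for x in X]                    # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # u_i = y_i - y_hat_i

for xi, yi, yh, ui in zip(X, y, y_hat, residuals):
    print(f"X={xi}: actual={yi}, predicted={yh}, residual={ui:+.2f}")
```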
The difference between the actual and predicted value is the residual
[Diagram: at Xi, the actual value yi and the predicted value ŷi on the fitted line (intercept b0, slope b1) are marked; the vertical distance between them is the residual ûi.]
That is,
$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{b}_0 - \hat{b}_1 X_i$
(where the i subscript refers to the ith individual or firm or time period in the data set)
Things to know about residuals
$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{b}_0 - \hat{b}_1 X_i$
1. The larger the residual, the worse the prediction
So intuitively the line of best fit should be the one that delivers the smallest residual values for each observation in the data set
2. Since the difference between the actual and predicted value gives the residual
$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{b}_0 - \hat{b}_1 X_i$
a positive residual
$\hat{u}_i = y_i - \hat{y}_i > 0$
means the model underpredicts (the actual value is larger than the predicted value)
and similarly a negative residual
$\hat{u}_i = y_i - \hat{y}_i < 0$
means the model overpredicts (the predicted value is larger than the actual value)
Given this..
Suppose we tried to minimise the sum of all the residuals in the data in an attempt to get the line of best fit
$\sum_{i=1}^{N} \hat{u}_i$
Whilst this might seem intuitive it will not work, because it is possible that any positive residual will be offset by a negative residual in the summation, and so the sum could be close to zero even if the overall fit of the regression were poor
We can avoid this problem if we use instead the principle of Ordinary Least Squares (OLS):
Rather than minimise the sum of residuals, minimise the sum of squared residuals
$\sum_{i=1}^{N} \hat{u}_i^2$
- Squaring ensures that values are always positive and so can never cancel each other out
- It also gives more “weight” to larger residuals, so it is harder to get away with a poor fit (the larger any one residual is in absolute value, the larger is the resulting residual sum of squares (RSS))
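To see the cancellation problem concretely, here is a minimal Python sketch (the data and the two candidate lines are invented for illustration): a line that fits badly can still have a sum of residuals of zero, while the sum of squared residuals exposes the poor fit.

```python
# Sketch: sum of residuals vs sum of squared residuals for two candidate
# lines through the same made-up data (which lie exactly on y = 2x).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

def residuals(b0, b1):
    return [yi - (b0 + b1 * xi) for xi, yi in zip(X, y)]

candidates = {"good fit, y = 2x": (0.0, 2.0),
              "poor fit, flat line y = 6": (6.0, 0.0)}

for label, (b0, b1) in candidates.items():
    u = residuals(b0, b1)
    print(label,
          "| sum of residuals =", sum(u),
          "| sum of squared residuals =", sum(ui ** 2 for ui in u))

# Both candidate lines have a sum of residuals of exactly zero, but only the
# squared sum (RSS = 0 vs RSS = 40) tells the good fit from the poor one.
```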
Things to do in Lecture 2
• Derive formally how to estimate the values of the coefficients we are interested in using OLS
• Run an OLS regression
• Interpret the regression output
• Measure how well the model fits the data
The Idea Behind Ordinary Least Squares (OLS)
[Recap of the scatter diagram from Lecture 1: a set of (X, Y) observations with a candidate straight line, intercept b0 and slope b1, drawn through them]
Which line (and hence which slope & intercept) to choose?
- The one that minimises the sum of squared residuals
The difference between the actual and predicted value is the residual
[Recap diagram: at Xi the vertical distance between the actual value yi and the predicted value ŷi on the fitted line is the residual ûi]
Consider the following simple example
N = 2 and we want to fit a straight line y = b0 + b1X through the following data points using the principle of OLS (min sum of squared residuals):
(Y1 = 3, X1 = 1)
(Y2 = 5, X2 = 2)
It follows that we can write the estimated residual for the 1st observation
$\hat{u}_1 = y_1 - \hat{y}_1$
using
$\hat{y}_1 = \hat{b}_0 + \hat{b}_1 X_1 = \hat{b}_0 + \hat{b}_1 (1)$
and $y_1 = 3$
as
$\hat{u}_1 = 3 - \hat{b}_0 - \hat{b}_1 (1)$
and similarly for the 2nd observation
$\hat{u}_2 = y_2 - \hat{y}_2 = 5 - \hat{b}_0 - \hat{b}_1 (2)$
OLS: minimise the sum of squared residuals
$S = \hat{u}_1^2 + \hat{u}_2^2$
$S = (3 - \hat{b}_0 - \hat{b}_1)^2 + (5 - \hat{b}_0 - 2\hat{b}_1)^2$
Expanding the terms in brackets
$S = (9 + \hat{b}_0^2 + \hat{b}_1^2 - 6\hat{b}_0 - 6\hat{b}_1 + 2\hat{b}_0 \hat{b}_1) + (25 + \hat{b}_0^2 + 4\hat{b}_1^2 - 10\hat{b}_0 - 20\hat{b}_1 + 4\hat{b}_0 \hat{b}_1)$
Adding together like terms
$S = 2\hat{b}_0^2 + 5\hat{b}_1^2 - 16\hat{b}_0 - 26\hat{b}_1 + 6\hat{b}_0 \hat{b}_1 + 34$   (A)
Now we need to find the values of $\hat{b}_0$ and $\hat{b}_1$ which minimise this sum.
Using the rules of calculus we know the first order conditions for minimisation are:
$\frac{dS}{d\hat{b}_0} = 0 \qquad \frac{dS}{d\hat{b}_1} = 0$
→
$4\hat{b}_0 + 6\hat{b}_1 - 16 = 0$
$6\hat{b}_0 + 10\hat{b}_1 - 26 = 0$
This gives 2 simultaneous equations
$2\hat{b}_0 + 3\hat{b}_1 = 8$
$3\hat{b}_0 + 5\hat{b}_1 = 13$
which we can solve for the unknown values of $\hat{b}_0$ and $\hat{b}_1$ using the rules for simultaneous equations:
$\hat{b}_0 = 1 \qquad \hat{b}_1 = 2$
So the estimated regression line becomes $\hat{Y} = 1 + 2X$
ie the intercept (constant) with the y axis is at 1 and the slope of the straight line is 2
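As a quick check (not part of the lecture notes), the two-observation example can be verified numerically in Python; NumPy is assumed to be available.

```python
import numpy as np

# Solve the two simultaneous equations from the worked example:
#   2*b0 + 3*b1 = 8
#   3*b0 + 5*b1 = 13
A = np.array([[2.0, 3.0],
              [3.0, 5.0]])
c = np.array([8.0, 13.0])
b0_hat, b1_hat = np.linalg.solve(A, c)
print(b0_hat, b1_hat)            # 1.0 2.0

# Cross-check with a least-squares fit to the raw data (Y1=3, X1=1), (Y2=5, X2=2)
X = np.array([1.0, 2.0])
Y = np.array([3.0, 5.0])
slope, intercept = np.polyfit(X, Y, 1)
print(intercept, slope)          # 1.0 2.0, i.e. the line Y_hat = 1 + 2X
```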
Basic idea underlying OLS is to choose a “line of best fit”
- Choose a straight line that passes through the data and minimises the sum of squared residuals
Now we need to do this more generally so we can apply the technique to any possible combination of (X, Y) data pairs and any number of observations
If we wish to fit a (straight) line through N (rather than 2) observations, then the OLS principle is still the same, ie choose $\hat{b}_0$ and $\hat{b}_1$ to minimise
$S = \hat{u}_1^2 + \hat{u}_2^2 + \dots + \hat{u}_N^2 = \sum_{i=1}^{N} \hat{u}_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
(where now the summation runs from 1 to N rather than 1 to 2)
Sub. in $\hat{y}_i = \hat{b}_0 + \hat{b}_1 X_i$
$S = (Y_1 - \hat{b}_0 - \hat{b}_1 X_1)^2 + \dots + (Y_N - \hat{b}_0 - \hat{b}_1 X_N)^2$
$= Y_1^2 + \hat{b}_0^2 + \hat{b}_1^2 X_1^2 - 2\hat{b}_0 Y_1 - 2\hat{b}_1 X_1 Y_1 + 2\hat{b}_0 \hat{b}_1 X_1 + \dots + Y_N^2 + \hat{b}_0^2 + \hat{b}_1^2 X_N^2 - 2\hat{b}_0 Y_N - 2\hat{b}_1 X_N Y_N + 2\hat{b}_0 \hat{b}_1 X_N$
$= \sum Y_i^2 + N \hat{b}_0^2 + \hat{b}_1^2 \sum X_i^2 - 2\hat{b}_0 \sum Y_i - 2\hat{b}_1 \sum X_i Y_i + 2\hat{b}_0 \hat{b}_1 \sum X_i$
This is just a generalised version of (A) above
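As a sanity check on this algebra (the data and trial coefficients below are invented, and NumPy is assumed), the expanded sum form should return exactly the same value of S as summing the squared residuals directly.

```python
import numpy as np

# Made-up data and trial coefficients for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.9, 5.1, 6.8, 9.2])
b0, b1 = 1.0, 2.0
N = len(X)

# S computed directly as the sum of squared residuals
S_direct = np.sum((Y - b0 - b1 * X) ** 2)

# S computed from the expanded form derived above
S_expanded = (np.sum(Y ** 2) + N * b0 ** 2 + b1 ** 2 * np.sum(X ** 2)
              - 2 * b0 * np.sum(Y) - 2 * b1 * np.sum(X * Y)
              + 2 * b0 * b1 * np.sum(X))

print(S_direct, S_expanded)   # identical values, confirming the expansion
```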
Again, find the values of $\hat{b}_0$ and $\hat{b}_1$ which minimise this sum, using the same simple calculus rules
$\frac{dS}{d\hat{b}_0} = 0 \quad\text{and}\quad \frac{dS}{d\hat{b}_1} = 0$
Now these two (1st order) minimisation conditions give
$\frac{\partial S}{\partial \hat{b}_0} = 0 \;\Rightarrow\; 2N\hat{b}_0 - 2\sum Y_i + 2\hat{b}_1 \sum X_i = 0$   (1)
$\frac{\partial S}{\partial \hat{b}_1} = 0 \;\Rightarrow\; 2\hat{b}_1 \sum X_i^2 - 2\sum X_i Y_i + 2\hat{b}_0 \sum X_i = 0$   (2)
and again we have 2 simultaneous equations (called the normal equations) which we can again solve for $\hat{b}_0$ and $\hat{b}_1$
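For a general sample the normal equations can be solved numerically. The sketch below (invented data, NumPy assumed) divides (1) and (2) through by 2 and writes them as a 2×2 linear system in the two unknown coefficients.

```python
import numpy as np

# Made-up sample for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
N = len(X)

# Normal equations (1) and (2), each divided through by 2:
#   N*b0      + sum(X)*b1    = sum(Y)
#   sum(X)*b0 + sum(X**2)*b1 = sum(X*Y)
A = np.array([[N,       X.sum()],
              [X.sum(), (X ** 2).sum()]])
c = np.array([Y.sum(), (X * Y).sum()])

b0_hat, b1_hat = np.linalg.solve(A, c)
print(b0_hat, b1_hat)   # OLS intercept and slope for this sample
```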
Using the fact that the sample means of Y and X are
$\bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N} \;\Rightarrow\; N\bar{Y} = \sum_{i=1}^{N} y_i \qquad\qquad \bar{X} = \frac{\sum_{i=1}^{N} x_i}{N} \;\Rightarrow\; N\bar{X} = \sum_{i=1}^{N} x_i$
we can re-write (1)
$2N\hat{b}_0 - 2\sum Y_i + 2\hat{b}_1 \sum X_i = 0$
as
$2N\hat{b}_0 - 2N\bar{Y} + 2\hat{b}_1 N\bar{X} = 0$
and so obtain the formula to calculate the OLS estimate of the intercept
$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$   (3)
(*** learn this ***)
Substituting
$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$
into (2)
$2\hat{b}_1 \sum X_i^2 - 2\sum X_i Y_i + 2\hat{b}_0 \sum X_i = 0$
gives
$\hat{b}_1 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - \hat{b}_1 \bar{X}) \sum X_i = 0$
and simplifying, using $\sum X_i = N\bar{X}$,
$\hat{b}_1 \sum X_i^2 - \sum X_i Y_i + (\bar{Y} - \hat{b}_1 \bar{X}) N\bar{X} = 0$
Collecting terms gives
$\hat{b}_1 \left[ \sum X_i^2 - N\bar{X}^2 \right] = \sum X_i Y_i - N\bar{X}\bar{Y}$
Dividing both sides by N
$\hat{b}_1 \left[ \frac{1}{N} \sum X_i^2 - \bar{X}^2 \right] = \frac{1}{N} \sum X_i Y_i - \bar{X}\bar{Y}$
$\hat{b}_1 \left[ \frac{1}{N} \sum (X_i - \bar{X})^2 \right] = \frac{1}{N} \sum (X_i - \bar{X})(Y_i - \bar{Y})$
which gives the formula to calculate the OLS estimate of the slope
$\hat{b}_1 \, Var(X) = Cov(X, Y)$
$\hat{b}_1 = \frac{Cov(X, Y)}{Var(X)}$
(**** learn this ****)
So
$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \qquad\qquad \hat{b}_1 = \frac{Cov(X, Y)}{Var(X)}$
are how the computer determines the size of the intercept and the slope respectively in an OLS regression
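A minimal sketch of exactly this calculation (the sample below is invented and NumPy is assumed): the slope as Cov(X, Y)/Var(X), the intercept as Ȳ − b̂1·X̄, checked against a standard least-squares routine.

```python
import numpy as np

# Made-up sample for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS formulas derived above (moments formed by dividing by N)
cov_xy = np.mean((X - X.mean()) * (Y - Y.mean()))
var_x = np.mean((X - X.mean()) ** 2)
b1_hat = cov_xy / var_x                  # slope = Cov(X, Y) / Var(X)
b0_hat = Y.mean() - b1_hat * X.mean()    # intercept = Ybar - b1_hat * Xbar
print(b0_hat, b1_hat)

# The same numbers come out of a standard least-squares fit
slope, intercept = np.polyfit(X, Y, 1)
print(intercept, slope)
```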
The OLS equations give a nice, clear intuitive meaning about the influence of the variable X on the size of the slope, since they show that:
i) the greater the covariance between X and Y, the larger the (absolute value of) the slope
ii) the smaller the variance of X, the larger the (absolute value of) the slope
It is equally important to be able to interpret the effect of an estimated regression coefficient
Given OLS essentially passes a straight line through the data, then given
$\hat{y} = \hat{b}_0 + \hat{b}_1 X$
$\frac{dY}{dX} = \hat{b}_1$
So the OLS estimate of the slope will give an estimate of the unit change in the dependent variable y following a unit change in the level of the explanatory variable
(so you need to be aware of the units of measurement of your variables in order to be able to interpret what the OLS coefficient is telling you)
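To illustrate the units point (this example is not from the lecture; the data are invented and NumPy is assumed), rescaling the explanatory variable changes the size of the estimated slope even though the underlying relationship is unchanged.

```python
import numpy as np

# Made-up example: the same X measured in thousands of units vs single units
X_thousands = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. income in 000s
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope_thousands, _ = np.polyfit(X_thousands, Y, 1)
slope_single, _ = np.polyfit(X_thousands * 1000.0, Y, 1)

print(slope_thousands)   # change in Y per one-unit (= 1000) change in X
print(slope_single)      # change in Y per single-unit change in X: 1000 times smaller
```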