Econ 427 lecture 3 slides Byron Gangnes


Econ 427 lecture 3 slides
Byron Gangnes
A scatterplot
[Figure: scatterplot of Y against X]
A regression line
[Figure: Y vs. X, with the regression line]
Linear Regression
• We assume that y is linearly related to x,
with an independently and identically
distributed (iid) disturbance term with a
zero mean and constant variance:
$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$
$\varepsilon_t \overset{iid}{\sim} (0, \sigma^2)$, $t = 1, \dots, T$
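To make the assumed model concrete, here is a minimal simulation sketch. The parameter values, the grid of x values, and the name `simulate_linear_model` are illustrative assumptions, and normal disturbances are used only for convenience (the slide requires iid with zero mean and constant variance, not normality).

```python
import numpy as np

def simulate_linear_model(beta0, beta1, sigma, x, seed=0):
    """Draw y_t = beta0 + beta1 * x_t + eps_t with iid zero-mean disturbances."""
    rng = np.random.default_rng(seed)
    # Normality is an extra assumption made here for convenience.
    eps = rng.normal(loc=0.0, scale=sigma, size=len(x))
    return beta0 + beta1 * x + eps

# Hypothetical values, chosen only to resemble the range of the plots.
x = np.linspace(-3.0, 4.0, 50)
y = simulate_linear_model(beta0=10.0, beta1=1.0, sigma=1.0, x=x)
```

Setting `sigma=0.0` removes the disturbance, so every point falls exactly on the line.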
Linear Regression
• The regression function gives an estimate of y,
given x, which is just the conditional
expectation of y given x = x*:
$E(y \mid x^*) = E(\beta_0 + \beta_1 x^*) + E(\varepsilon_t)$
$E(y \mid x^*) = \beta_0 + \beta_1 x^*$
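The conditional expectation can be checked by simulation: holding x fixed at x*, the average of many draws of y should be close to β0 + β1 x*. This is a sketch with hypothetical parameter values and normal disturbances assumed for convenience.

```python
import numpy as np

beta0, beta1, sigma = 10.0, 1.0, 1.0   # hypothetical parameter values
x_star = 2.0

rng = np.random.default_rng(0)
# Many draws of y at the same x*: only the disturbance varies across draws.
y_draws = beta0 + beta1 * x_star + rng.normal(0.0, sigma, size=100_000)

conditional_mean = beta0 + beta1 * x_star   # E(y | x*) from the regression function
sample_mean = y_draws.mean()                # simulation estimate of the same quantity
```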
Linear Regression
• Since we don’t know the true
(population) relationship, we estimate it
from the data by calculating the
parameters that minimize squared errors:
$\min_{\beta} \sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_t)^2$
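For the one-regressor case, the minimization has a well-known closed-form solution: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept makes the line pass through the sample means. The sketch below uses made-up data, and `ols_fit` is a hypothetical helper name.

```python
import numpy as np

def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) that minimize the sum of squared errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    beta1_hat = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    return beta0_hat, beta1_hat

# Made-up data for illustration only.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([7.2, 7.9, 9.1, 10.3, 10.8, 12.1, 12.7, 14.2])
b0, b1 = ols_fit(x, y)
```

The result should agree with `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.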
Linear Regression
• Then we use the estimated parameters to get a
fitted value (also called an “in-sample forecast”)
of y, given x:
$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t$
• where the “hats” indicate estimated values.
• The in-sample forecast errors are just:
$\hat{e}_t = y_t - \hat{y}_t = y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t$
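A short sketch of fitted values and in-sample forecast errors, using made-up data and `np.polyfit` to obtain the estimates. With an intercept in the model, the OLS residuals sum to zero and are uncorrelated with x, which is a handy sanity check.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([7.2, 7.9, 9.1, 10.3, 10.8, 12.1, 12.7, 14.2])

# np.polyfit returns the highest-degree coefficient first: (slope, intercept).
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)

y_hat = beta0_hat + beta1_hat * x   # fitted values (in-sample forecasts)
resid = y - y_hat                   # in-sample forecast errors
```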

Sum of squared residuals
• SSR is the sum of squared residuals of
the regression (the minimized value that
OLS searches for)
$SSR = \sum_{t=1}^{T} e_t^2$
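To illustrate that SSR is the quantity being minimized, the sketch below evaluates it at the OLS estimates and at an arbitrary nearby parameter pair. The data and the perturbation sizes are made up, and `ssr` is a hypothetical helper name.

```python
import numpy as np

def ssr(beta0, beta1, x, y):
    """Sum of squared residuals for a candidate (beta0, beta1)."""
    resid = y - beta0 - beta1 * x
    return float(np.sum(resid ** 2))

# Made-up data for illustration only.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([7.2, 7.9, 9.1, 10.3, 10.8, 12.1, 12.7, 14.2])

beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
ssr_ols = ssr(beta0_hat, beta1_hat, x, y)
# Any other parameter pair gives a larger SSR.
ssr_other = ssr(beta0_hat + 0.5, beta1_hat - 0.2, x, y)
```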
R-squared
• A standard measure of overall goodness of
fit is $R^2$ (R-squared), technically the
percentage of the variance of y explained
by the variables in the model:
$R^2 = 1 - \dfrac{\frac{1}{T}\sum_{t=1}^{T} e_t^2}{\frac{1}{T}\sum_{t=1}^{T} (y_t - \bar{y})^2}$
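The R-squared formula translates directly into code. This is a minimal sketch; `r_squared` is a hypothetical helper that takes the observed y and the regression residuals as inputs.

```python
import numpy as np

def r_squared(y, resid):
    """R^2 = 1 - mean squared residual / mean squared deviation of y."""
    y = np.asarray(y, dtype=float)
    resid = np.asarray(resid, dtype=float)
    T = len(y)
    residual_variance = np.sum(resid ** 2) / T
    total_variance = np.sum((y - y.mean()) ** 2) / T
    return 1.0 - residual_variance / total_variance

perfect_fit = r_squared([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])   # residuals all zero
mean_only = r_squared([1.0, 2.0, 3.0], [-1.0, 0.0, 1.0])    # residuals = y - mean(y)
```

The two calls show the extremes: zero residuals give an R-squared of 1, and a model that explains nothing beyond the mean gives 0.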
Adjusted R-squared
• The problem with $R^2$ is that it never goes down
when you add more variables. To avoid
“overfitting” (any model will fit the data if there
are enough RHS variables), we normally adjust for
the degrees of freedom using the “adjusted R-squared”:
$\bar{R}^2 = 1 - \dfrac{\frac{1}{T-k}\sum_{t=1}^{T} e_t^2}{\frac{1}{T-1}\sum_{t=1}^{T} (y_t - \bar{y})^2}$
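A sketch of the degrees-of-freedom adjustment. Here k is assumed to count all estimated parameters including the intercept, which is consistent with the (k − 1) and (T − k) terms in the F-statistic slide; `adjusted_r_squared` is a hypothetical helper name.

```python
import numpy as np

def adjusted_r_squared(y, resid, k):
    """Adjusted R^2 with k estimated parameters (assumed to include the intercept)."""
    y = np.asarray(y, dtype=float)
    resid = np.asarray(resid, dtype=float)
    T = len(y)
    residual_variance = np.sum(resid ** 2) / (T - k)
    total_variance = np.sum((y - y.mean()) ** 2) / (T - 1)
    return 1.0 - residual_variance / total_variance

# Made-up numbers: T = 5, sum of squared residuals = 1, total sum of squares = 10.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
resid = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
adj_k2 = adjusted_r_squared(y, resid, k=2)
adj_k3 = adjusted_r_squared(y, resid, k=3)
```

Holding the residuals fixed, a larger k lowers the adjusted value, so an extra variable has to reduce the squared residuals enough to pay for the lost degree of freedom.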
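F-statistic
• The F-statistic is a test of whether all model
slope coefficients are jointly zero, an overall
test of the significance of the regression:
$F = \dfrac{(SSR_{res} - SSR)/(k - 1)}{SSR/(T - k)}$
where $SSR_{res}$ is the sum of squared residuals from the
restricted regression (a constant only).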
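A sketch of the F-statistic calculation for a one-regressor model (k = 2). It assumes the restricted regression contains only a constant, so its sum of squared residuals is the sum of squared deviations of y from its mean. The data are made up and `f_statistic` is a hypothetical helper name.

```python
import numpy as np

def f_statistic(ssr_restricted, ssr_unrestricted, T, k):
    """F = [(SSR_res - SSR) / (k - 1)] / [SSR / (T - k)]."""
    numerator = (ssr_restricted - ssr_unrestricted) / (k - 1)
    denominator = ssr_unrestricted / (T - k)
    return numerator / denominator

# Made-up data for illustration only.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([7.2, 7.9, 9.1, 10.3, 10.8, 12.1, 12.7, 14.2])

beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
ssr_unrestricted = np.sum((y - beta0_hat - beta1_hat * x) ** 2)
# Restricted model (assumed): constant only, so the fitted value is the mean of y.
ssr_restricted = np.sum((y - y.mean()) ** 2)

F = f_statistic(ssr_restricted, ssr_unrestricted, T=len(y), k=2)
```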
Durbin-Watson statistic
• The Durbin-Watson statistic is a measure
of serial correlation in the regression
errors. (Why do we care about whether
errors are serially correlated?)
$DW = \dfrac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$
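The DW formula is a ratio of two sums over the residuals and can be sketched in a few lines; `durbin_watson` is a hypothetical helper name.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    resid = np.asarray(resid, dtype=float)
    # np.diff gives e_t - e_{t-1} for t = 2, ..., T.
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # sign flips every period
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])        # no change between periods
```

Residuals that flip sign every period push the statistic above 2, while residuals that barely change from one period to the next push it toward 0.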
Durbin-Watson statistic
• DW is a test of whether there is first-order
autocorrelation of the model errors, i.e.
$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t$
$\varepsilon_t = \varphi \varepsilon_{t-1} + v_t$
$v_t \overset{iid}{\sim} N(0, \sigma^2)$
• Values for DW fall in the [0,4] interval and
values significantly below 2 (below 1.5 say)
are indicative of positive serial correlation.
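A simulation sketch of how DW responds to first-order autocorrelation. For simplicity it applies the statistic to simulated AR(1) errors directly rather than to regression residuals, and the values of φ, the sample size, and the start-up value are illustrative assumptions.

```python
import numpy as np

def simulate_ar1(phi, T, sigma=1.0, seed=0):
    """Simulate eps_t = phi * eps_{t-1} + v_t with v_t iid N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, sigma, size=T)
    eps = np.zeros(T)
    eps[0] = v[0]   # simple start-up value (an assumption)
    for t in range(1, T):
        eps[t] = phi * eps[t - 1] + v[t]
    return eps

def durbin_watson(e):
    """DW = sum of squared first differences / sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

dw_iid = durbin_watson(simulate_ar1(phi=0.0, T=5000))   # no serial correlation
dw_pos = durbin_watson(simulate_ar1(phi=0.8, T=5000))   # positive serial correlation
```

In large samples DW is approximately 2(1 − φ), so the first value should be near 2 and the second well below 1.5.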
Moments
• Mean
$E(y) = \mu = \sum_i p_i y_i$
• Variance
$\sigma^2 = \mathrm{var}(y) = E(y - \mu)^2$
• Skewness
$S = \dfrac{E(y - \mu)^3}{\sigma^3}$
• Kurtosis
$K = \dfrac{E(y - \mu)^4}{\sigma^4}$
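The four moments can be computed for a discrete distribution with outcomes y_i and probabilities p_i, as in the formula for the mean. This is a minimal sketch; `moments` is a hypothetical helper name and the two-point distribution is made up.

```python
import numpy as np

def moments(values, probs):
    """Mean, variance, skewness, and kurtosis of a discrete distribution."""
    y = np.asarray(values, dtype=float)
    p = np.asarray(probs, dtype=float)
    mu = np.sum(p * y)                              # E(y)
    var = np.sum(p * (y - mu) ** 2)                 # E(y - mu)^2
    sigma = np.sqrt(var)
    skew = np.sum(p * (y - mu) ** 3) / sigma ** 3   # E(y - mu)^3 / sigma^3
    kurt = np.sum(p * (y - mu) ** 4) / sigma ** 4   # E(y - mu)^4 / sigma^4
    return mu, var, skew, kurt

# Symmetric two-point distribution: -1 or +1, each with probability one half.
mu, var, skew, kurt = moments([-1.0, 1.0], [0.5, 0.5])
```

A symmetric distribution like this one has zero skewness.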