Transcript Chapter 11

Autocorrelation
The Nature of Autocorrelation
- The randomness of the sample implies that the error terms for different observations will be uncorrelated.
- When we have time-series data, where the observations follow a natural ordering through time, there is always a possibility that successive errors will be correlated with each other.
- In any one period, the current error term contains not only the effects of current shocks but also the carryover from previous shocks. This carryover will be related to, or correlated with, the effects of the earlier shocks. When circumstances such as these lead to error terms that are correlated, we say that autocorrelation exists.
- The possibility of autocorrelation should always be entertained when we are dealing with time-series data.
For efficiency (accurate estimation and prediction), all systematic information needs to be incorporated into the regression model.

Autocorrelation is a systematic pattern in the errors, which can be either attracting (positive autocorrelation) or repelling (negative autocorrelation).
[Figure: plots of the error e_t over time illustrating positive autocorrelation, no autocorrelation, and negative autocorrelation.]
[Figure: three scatter plots of residuals against time t. With positive (attracting) autocorrelation the residuals cross the zero line not enough; with no autocorrelation they cross the line randomly; with negative (repelling) autocorrelation they cross the line too much.]
Regression Model

y_t = β1 + β2 x_t + e_t

zero mean:           E(e_t) = 0
homoskedasticity:    var(e_t) = σ²
nonautocorrelation:  cov(e_t, e_s) = 0 for t ≠ s
autocorrelation:     cov(e_t, e_s) ≠ 0 for t ≠ s
Order of Autocorrelation

y_t = β1 + β2 x_t + e_t

1st Order: e_t = ρ e_{t-1} + ν_t
2nd Order: e_t = ρ1 e_{t-1} + ρ2 e_{t-2} + ν_t
3rd Order: e_t = ρ1 e_{t-1} + ρ2 e_{t-2} + ρ3 e_{t-3} + ν_t

We will assume First Order Autocorrelation, AR(1):

e_t = ρ e_{t-1} + ν_t
First Order Autocorrelation

y_t = β1 + β2 x_t + e_t
e_t = ρ e_{t-1} + ν_t,   where -1 < ρ < 1

E(ν_t) = 0,   var(ν_t) = σ_ν²,   cov(ν_t, ν_s) = 0 for t ≠ s

These assumptions about ν_t imply the following about e_t:

E(e_t) = 0
var(e_t) = σ_e² = σ_ν² / (1 - ρ²)
cov(e_t, e_{t-k}) = σ_e² ρ^k    for k > 0
corr(e_t, e_{t-k}) = ρ^k        for k > 0
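As an aside (not from the slides), these implied moments can be checked by simulation. The numpy sketch below uses illustrative values ρ = 0.8 and σ_ν = 1, which are assumptions chosen only for the example.

```python
import numpy as np

# Simulate AR(1) errors e_t = rho*e_{t-1} + v_t and compare sample moments
# with the formulas above. rho and sigma_v are illustrative values.
rng = np.random.default_rng(0)
rho, sigma_v, T = 0.8, 1.0, 100_000

v = rng.normal(0.0, sigma_v, T)
e = np.empty(T)
e[0] = rng.normal(0.0, sigma_v / np.sqrt(1.0 - rho**2))  # draw e_1 from the stationary distribution
for t in range(1, T):
    e[t] = rho * e[t - 1] + v[t]

print("sample  var(e):", e.var())
print("implied var(e):", sigma_v**2 / (1.0 - rho**2))      # sigma_v^2 / (1 - rho^2)
print("sample  corr(e_t, e_{t-1}):", np.corrcoef(e[1:], e[:-1])[0, 1])  # close to rho
```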
Autocorrelation creates some Problems for Least Squares:

If we have an equation whose errors exhibit autocorrelation, but we ignore it, or are simply unaware of it, what effect does this have on the properties of the least squares estimates?

1. The least squares estimator is still linear and unbiased, but it is not efficient.

2. The formulas normally used to compute the least squares standard errors are no longer correct, and confidence intervals and hypothesis tests that use them will be wrong.
y_t = β1 + β2 x_t + e_t

Autocorrelation:  E(e_t) = 0,  var(e_t) = σ²,  cov(e_t, e_s) ≠ 0 for t ≠ s

b2 = Σ w_t y_t = β2 + Σ w_t e_t        (Linear)

where  w_t = (x_t - x̄) / Σ(x_t - x̄)²

E(b2) = E(β2 + Σ w_t e_t) = β2 + Σ w_t E(e_t) = β2        (Unbiased)
y_t = β1 + β2 x_t + e_t

Autocorrelation:  cov(e_t, e_s) ≠ 0 for t ≠ s

Incorrect formula for the least squares variance:

var(b2) = σ_e² / Σ(x_t - x̄)²

Correct formula for the least squares variance:

var(b2) = Σ w_t² var(e_t) + Σ_{t≠s} w_t w_s cov(e_t, e_s)
        = σ_e² Σ w_t² + σ_e² Σ_{t≠s} w_t w_s ρ^|t-s|
Generalized Least Squares

AR(1):  e_t = ρ e_{t-1} + ν_t

y_t = β1 + β2 x_t + e_t

Substituting in for e_t:

y_t = β1 + β2 x_t + ρ e_{t-1} + ν_t

Now we need to get rid of e_{t-1}.   (continued)
y_t = β1 + β2 x_t + ρ e_{t-1} + ν_t

From  y_t = β1 + β2 x_t + e_t,  the error is  e_t = y_t - β1 - β2 x_t.

Lag the errors once:  e_{t-1} = y_{t-1} - β1 - β2 x_{t-1}

Substituting for e_{t-1}:

y_t = β1 + β2 x_t + ρ y_{t-1} - ρ β1 - ρ β2 x_{t-1} + ν_t   (continued)
y_t = β1 + β2 x_t + ρ y_{t-1} - ρ β1 - ρ β2 x_{t-1} + ν_t

y_t - ρ y_{t-1} = β1 (1 - ρ) + β2 (x_t - ρ x_{t-1}) + ν_t

y*_t = β1 x*_{t1} + β2 x*_{t2} + ν_t,   t = 2, 3, …, T

where  y*_t = y_t - ρ y_{t-1},   x*_{t1} = (1 - ρ),   x*_{t2} = x_t - ρ x_{t-1}
y*_t = β1 x*_{t1} + β2 x*_{t2} + ν_t

Problems estimating this model with least squares:

1. One observation is used up in creating the transformed (lagged) variables, leaving only (T - 1) observations for estimating the model (the Cochrane-Orcutt method drops the first observation).

2. The value of ρ is not known. We must find some way to estimate it.
(Option) Recovering the 1st Observation
Dropping the 1st observation and applying least squares
is not the best linear unbiased estimation method.
Efficiency is lost because the variance of the
error associated with the 1st observation is not
equal to that of the other errors.
This is a special case of the heteroskedasticity
problem except that here all errors are assumed
to have equal variance except the 1st error.
Recovering the 1st Observation

The 1st observation should fit the original model as:

y_1 = β1 + β2 x_1 + e_1,   with error variance  var(e_1) = σ_e² = σ_ν² / (1 - ρ²).

We could include this as the 1st observation for our estimation procedure, but we must first transform it so that it has the same error variance as the other observations.

Note: the other observations all have error variance σ_ν².

Given any constant c:  var(c e_1) = c² var(e_1).

If c = √(1 - ρ²), then

var(√(1 - ρ²) e_1) = (1 - ρ²) var(e_1)
                   = (1 - ρ²) σ_e²
                   = (1 - ρ²) σ_ν² / (1 - ρ²)
                   = σ_ν².

The transformed error  ν_1 = √(1 - ρ²) e_1  has variance σ_ν².
y_1 = β1 + β2 x_1 + e_1

Multiply through by √(1 - ρ²) to get:

√(1 - ρ²) y_1 = √(1 - ρ²) β1 + √(1 - ρ²) β2 x_1 + √(1 - ρ²) e_1

The transformed error  ν_1 = √(1 - ρ²) e_1  has variance σ_ν².

This transformed first observation may now be added to the other (T - 1) observations to obtain the fully restored set of T observations.
We can summarize these results by saying that, provided ρ is known, we can find the Best Linear Unbiased Estimator for β1 and β2 by applying least squares to the transformed model:

y*_t = β1 x*_{t1} + β2 x*_{t2} + ν_t,   t = 1, 2, 3, …, T

where the transformed variables are defined by

y*_1 = √(1 - ρ²) y_1,   x*_{11} = √(1 - ρ²),   x*_{12} = √(1 - ρ²) x_1

for the first observation, and

y*_t = y_t - ρ y_{t-1},   x*_{t1} = 1 - ρ,   x*_{t2} = x_t - ρ x_{t-1}

for the remaining t = 2, 3, …, T observations.
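A minimal numpy sketch of this transformation, assuming ρ is known; the function name gls_ar1_transform and all numeric values are illustrative, not from the slides.

```python
import numpy as np

def gls_ar1_transform(y, x, rho):
    """Transform (y, x) for AR(1) errors with a known rho, keeping the 1st
    observation. OLS of y* on (x1*, x2*) with no intercept then gives the
    GLS estimates of beta1 and beta2."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    c = np.sqrt(1.0 - rho**2)
    y_star  = np.r_[c * y[0],  y[1:] - rho * y[:-1]]
    x1_star = np.r_[c,         np.full(y.size - 1, 1.0 - rho)]
    x2_star = np.r_[c * x[0],  x[1:] - rho * x[:-1]]
    return y_star, x1_star, x2_star

# Hypothetical data and rho, purely for illustration.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(size=50)
y_s, x1_s, x2_s = gls_ar1_transform(y, x, rho=0.6)
b1, b2 = np.linalg.lstsq(np.column_stack([x1_s, x2_s]), y_s, rcond=None)[0]
print("GLS estimates of beta1, beta2:", b1, b2)
```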
Estimating the Unknown ρ Value

If we had values for the e_t, we could estimate:

e_t = ρ e_{t-1} + ν_t

First, use least squares to estimate the model:

y_t = β1 + β2 x_t + e_t

The residuals from this estimation are:

ê_t = y_t - b1 - b2 x_t
Next, estimate the following by least squares:

ê_t = ρ ê_{t-1} + ν_t

The least squares solution is:

ρ̂ = Σ_{t=2}^{T} ê_t ê_{t-1}  /  Σ_{t=2}^{T} ê²_{t-1}
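The two-step procedure can be sketched in a few lines of numpy; the data and the true ρ = 0.7 used to generate them are made-up assumptions for illustration.

```python
import numpy as np

# Step 1: OLS on the original model. Step 2: regress the residuals on their
# own lag to estimate rho, as in the formula above.
rng = np.random.default_rng(2)
T = 100
x = rng.uniform(0.0, 10.0, T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.7 * e[t - 1] + rng.normal()   # errors with true rho = 0.7
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(T), x])
b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS estimates
e_hat = y - b1 - b2 * x                         # OLS residuals

rho_hat = (e_hat[1:] @ e_hat[:-1]) / (e_hat[:-1] @ e_hat[:-1])
print("rho_hat:", rho_hat)
```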
Durbin-Watson Test

The Durbin-Watson test is by far the most important one for detecting AR(1) errors.

It is assumed that the ν_t are independent random errors with distribution N(0, σ_ν²).

The assumption of normally distributed random errors is needed to derive the probability distribution of the test statistic used in the Durbin-Watson test.
The Durbin-Watson test statistic, d, is:

d = Σ_{t=2}^{T} (ê_t - ê_{t-1})²  /  Σ_{t=1}^{T} ê_t²

For a null hypothesis of no autocorrelation, we can use H0: ρ = 0.

For an alternative hypothesis we could use H1: ρ > 0, H1: ρ < 0, or H1: ρ ≠ 0.
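A direct numpy translation of the statistic (a sketch, not from the slides); applied to uncorrelated residuals it should return a value near 2.

```python
import numpy as np

def durbin_watson(e_hat):
    """d = sum_{t=2}^T (e_hat_t - e_hat_{t-1})^2 / sum_{t=1}^T e_hat_t^2."""
    e_hat = np.asarray(e_hat, float)
    return np.sum(np.diff(e_hat)**2) / np.sum(e_hat**2)

# Quick check: for uncorrelated residuals d should come out near 2.
rng = np.random.default_rng(3)
print(durbin_watson(rng.normal(size=200)))
```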
Testing for Autocorrelation

The test statistic d is approximately related to ρ̂ as:

d ≈ 2 (1 - ρ̂),   so  0 ≤ d ≤ 4.

When ρ̂ = 0, the Durbin-Watson statistic is d ≈ 2.
When ρ̂ = 1, the Durbin-Watson statistic is d ≈ 0.
When ρ̂ = -1, the Durbin-Watson statistic is d ≈ 4.

Tables of critical values for d are not always readily available, so it is easier to use the p-value that most computer programs provide for d. Reject H0 if the p-value < α, the significance level.
Test for first-order autocorrelation:

Alternative    Reject H0               Inconclusive                           Do not reject H0
H1: ρ > 0      d < dL                  dL < d < dU                            d > dU
H1: ρ < 0      d > 4 - dL              4 - dU < d < 4 - dL                    d < 4 - dU
H1: ρ ≠ 0      d < dL or d > 4 - dL    dL < d < dU or 4 - dU < d < 4 - dL     dU < d < 4 - dU

Note: The lower and upper bounds (dL and dU) depend on the sample size n and the number of explanatory variables k (not including the intercept).
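The decision rules in the table can be wrapped in a small helper (a hypothetical function, not from the slides); dL and dU must still be looked up in Durbin-Watson tables for the given n and k, and the bounds in the example call are made-up numbers.

```python
def dw_decision(d, d_L, d_U, alternative=">"):
    """Apply the Durbin-Watson decision rules from the table above.
    d_L and d_U come from Durbin-Watson tables for the given n and k."""
    if alternative == ">":                       # H1: rho > 0
        if d < d_L:
            return "reject H0"
        if d > d_U:
            return "do not reject H0"
        return "inconclusive"
    if alternative == "<":                       # H1: rho < 0
        if d > 4 - d_L:
            return "reject H0"
        if d < 4 - d_U:
            return "do not reject H0"
        return "inconclusive"
    # H1: rho != 0
    if d < d_L or d > 4 - d_L:
        return "reject H0"
    if d_U < d < 4 - d_U:
        return "do not reject H0"
    return "inconclusive"

# Illustrative bounds only, not taken from an actual Durbin-Watson table.
print(dw_decision(0.9, d_L=1.20, d_U=1.41, alternative=">"))   # -> "reject H0"
```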
[Figure: three number-line diagrams of d from 0 to 4.
A. Test for positive autocorrelation: reject H0 (ρ > 0) for d < dL, inconclusive for dL < d < dU, no evidence of positive autocorrelation for d > dU.
B. Test for negative autocorrelation: no evidence of negative autocorrelation for d < 4 - dU, inconclusive for 4 - dU < d < 4 - dL, reject H0 (ρ < 0) for d > 4 - dL.
C. Two-sided test for autocorrelation: reject H0 for d < dL or d > 4 - dL, inconclusive for dL < d < dU or 4 - dU < d < 4 - dL, no evidence of autocorrelation for dU < d < 4 - dU.]
Prediction with AR(1) Errors

When errors are autocorrelated, the previous period's error may help us predict the next period's error.

The best predictor, ŷ_{T+1}, for the next period is:

ŷ_{T+1} = β̂1 + β̂2 x_{T+1} + ρ̂ ẽ_T

where β̂1 and β̂2 are generalized least squares estimates and ẽ_T is given by:

ẽ_T = y_T - β̂1 - β̂2 x_T
For h periods ahead, the best predictor is:

ŷ_{T+h} = β̂1 + β̂2 x_{T+h} + ρ̂^h ẽ_T

Assuming |ρ̂| < 1, the influence of the term ρ̂^h ẽ_T diminishes the further we go into the future (the larger h becomes).
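A small numpy sketch of this predictor, with assumed (not estimated) values for β̂1, β̂2, ρ̂, and the last residual ẽ_T; all numbers are illustrative.

```python
import numpy as np

def predict_ar1(x_future, b1, b2, rho, e_T):
    """y_hat_{T+h} = b1 + b2*x_{T+h} + rho**h * e_T for h = 1, 2, ..."""
    x_future = np.asarray(x_future, float)
    h = np.arange(1, x_future.size + 1)
    return b1 + b2 * x_future + rho**h * e_T

# Assumed values for the (G)LS estimates and the last observation.
b1, b2, rho = 2.0, 0.5, 0.6
y_T, x_T = 7.3, 10.0
e_T = y_T - b1 - b2 * x_T            # last-period residual, here 0.3
print(predict_ar1([10.5, 11.0, 11.5], b1, b2, rho, e_T))
```

As h grows, the correction term ρ̂^h ẽ_T shrinks toward zero, so the forecasts converge to the ordinary regression prediction β̂1 + β̂2 x_{T+h}.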