Chapter 6: Autocorrelation
What is in this Chapter?
• How do we detect the problem of autocorrelation (serially correlated errors)?
• What are the consequences?
• What are the solutions?
What is in this Chapter?
• Regarding the problem of detection, we start with the Durbin-Watson (DW) statistic and discuss its several limitations and extensions:
– Durbin's h-test for models with lagged dependent variables
– Tests for higher-order serial correlation.
• We discuss (in Section 6.5) the consequences of serially correlated errors for OLS estimators.
What is in this Chapter?
• The solutions to the problem of serial correlation are discussed in:
– Section 6.3: estimation in levels versus first differences
– Section 6.9: strategies when the DW test statistic is significant
– Section 6.10: trends and random walks
• This chapter is very important and its several ideas have to be understood thoroughly.
6.1 Introduction
• The order of autocorrelation
• In the following sections we discuss how to:
1. Test for the presence of serial correlation.
2. Estimate the regression equation when the errors are serially correlated.
6.2 Durbin-Watson Test
• The Durbin-Watson (DW) statistic is defined as

$$d = \frac{\sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n}\hat{u}_t^2} = \frac{\sum \hat{u}_t^2 + \sum \hat{u}_{t-1}^2 - 2\sum \hat{u}_t\hat{u}_{t-1}}{\sum \hat{u}_t^2} \approx 2(1 - \hat{\rho})$$

where ρ̂ = Σû_t û_{t-1} / Σû_t² is the estimated first-order serial correlation of the OLS residuals û_t.
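As a concrete illustration, here is a minimal numerical sketch of the statistic in Python/numpy (the function name, the simulated AR(1) residuals, and the value ρ = 0.7 are illustrative assumptions, not part of the original example):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Illustrative check: AR(1) residuals with rho = 0.7 should give d near 2(1 - 0.7).
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
u = np.zeros_like(e)
for t in range(1, len(u)):
    u[t] = 0.7 * u[t - 1] + e[t]
print(durbin_watson(u))  # roughly 0.6
```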
6.2 Durbin-Watson Test
• The sampling distribution of d depends on the values of the explanatory variables, and hence Durbin and Watson derived upper (d_U) and lower (d_L) limits for the significance levels of d.
• There are tables to test the hypothesis of zero autocorrelation against the hypothesis of first-order positive autocorrelation. (For negative autocorrelation we interchange d_L and d_U.)
6.2 Durbin-Watson Test
• If d < d_L, we reject the null hypothesis of no autocorrelation.
• If d > d_U, we do not reject the null hypothesis.
• If d_L < d < d_U, the test is inconclusive.
6.2 Durbin-Watson Test
• Although we have said that d ≈ 2(1 − ρ̂), this approximation is valid only in large samples.
• The mean of d when ρ = 0 has been shown to be given approximately by (the proof is rather complicated for our purpose)

$$E(d) \approx 2 + \frac{2(k-1)}{n-k}$$

where k is the number of regression parameters estimated (including the constant term) and n is the sample size.
6.2 Durbin-Watson Test
• Thus, even for zero serial correlation, the statistic is biased upward from 2.
• If k = 5 and n = 15, the bias is as large as 0.8.
• We illustrate the use of the DW test with an example.
6.2 Durbin-Watson Test
Illustrative Example
• Consider the data in Table 3.11. The estimated production function is

$$\log X = \underset{(0.237)}{-3.938} + \underset{(0.083)}{1.451}\,\log L_1 + \underset{(0.048)}{0.384}\,\log K_1$$

with R² = 0.9946, DW = 0.858, ρ̂ = 0.559.
• Referring to the DW tables with k′ = 2 and n = 39 for the 5% significance level, we see that d_L = 1.38.
• Since the observed d = 0.858 < d_L, we reject the hypothesis ρ = 0 at the 5% level.
6.2 Limitations of the D-W Test
1. It tests for only first-order serial correlation.
2. The test is inconclusive if the computed value lies between d_L and d_U.
3. The test cannot be applied in models with lagged dependent variables.
6.3 Estimation in Levels Versus First Differences
• A simple solution to the serial correlation problem: first differencing.
• If the DW test rejects the hypothesis of zero serial correlation, what is the next step?
• In such cases one estimates a regression by transforming all the variables by:
– ρ-differencing (quasi-first differencing)
– first differencing
6.3 Estimation in Levels Versus First Differences
• For a model without a trend, first differencing eliminates the intercept:

$$y_t = \alpha + \beta x_t + u_t$$
$$y_{t-1} = \alpha + \beta x_{t-1} + u_{t-1}$$
$$(y_t - y_{t-1}) = \beta(x_t - x_{t-1}) + (u_t - u_{t-1})$$
6.3 Estimation in Levels Versus First Differences
• With a time trend, the trend coefficient becomes the intercept of the differenced equation:

$$y_t = \alpha + \gamma t + \beta x_t + u_t$$
$$y_{t-1} = \alpha + \gamma(t-1) + \beta x_{t-1} + u_{t-1}$$
$$(y_t - y_{t-1}) = \gamma + \beta(x_t - x_{t-1}) + (u_t - u_{t-1})$$
6.3 Estimation in Levels Versus First Differences
• When comparing equations in levels and in first differences, one cannot compare the R²'s because the explained variables are different.
• One can compare the residual sums of squares, but only after making a rough adjustment. (Please refer to p. 231.)
6.3 Estimation in Levels Versus First Differences
• The error variance of the first-difference equation is

$$\sigma_e^2 = \operatorname{var}(u_t - u_{t-1}) = \operatorname{var}(u_t) + \operatorname{var}(u_{t-1}) - 2\operatorname{cov}(u_t, u_{t-1}) = \sigma_u^2 + \sigma_u^2 - 2\rho\sigma_u^2 = 2\sigma_u^2(1 - \rho)$$

• Since d ≈ 2(1 − ρ), the residual sum of squares from the levels equation is made comparable by multiplying it by the factor

$$\left(\frac{n-k-1}{n-k}\right) 2(1-\rho) \approx \left(\frac{n-k-1}{n-k}\right) d$$
6.3 Estimation in Levels Versus First Differences
• For instance, if the residual sum of squares is, say, 1.2 for the levels equation and 0.8 for the difference equation, with n = 11, k = 1, and DW = 0.9, then the adjusted residual sum of squares for the levels equation is (9/10)(0.9)(1.2) = 0.97, which is the number to be compared with 0.8.
6.3 Estimation in Levels Versus First Differences
• Since we now have comparable residual sums of squares (RSS), we can get comparable R²'s as well, using the relationship RSS = S_yy(1 − R²).
6.3 Estimation in Levels Versus First Differences
• Let
– R₁² = R² from the first-difference equation
– RSS₀ = residual sum of squares from the levels equation
– RSS₁ = residual sum of squares from the first-difference equation
– R_D² = comparable R² for the levels equation

Then

$$1 - R_D^2 = \frac{RSS_0}{RSS_1}\left(\frac{n-k-1}{n-k}\right) d\,(1 - R_1^2)$$
6.3 Estimation in Levels Versus First Differences
Illustrative Examples
• Consider the simple Keynesian model discussed by Friedman and Meiselman. The equation estimated in levels is

$$C_t = \alpha + \beta A_t + \varepsilon_t \qquad t = 1, 2, \ldots, T$$

where C_t = personal consumption expenditure (current dollars) and A_t = autonomous expenditure (current dollars).
6.3 Estimation in Levels Versus First Differences
• The model fitted for the 1929-1939 period gave (figures in parentheses are standard errors):

1. Levels: $C_t = 58{,}335.0 + \underset{(0.312)}{2.4984}\,A_t$, with R² = 0.8771, DW = 0.89, RSS₀ = 11,943 × 10⁴

2. First differences: $\Delta C_t = \underset{(0.324)}{1.993}\,\Delta A_t$, with R₁² = 0.8096, DW = 1.51, RSS₁ = 8,387 × 10⁴
6.3 Estimation in Levels Versus First Differences
• Applying the adjustment:

$$R_D^2 = 1 - \frac{RSS_0}{RSS_1}\left(\frac{n-k-1}{n-k}\right) d\,(1 - R_1^2) = 1 - \left(\frac{11.943}{8.387}\right)\left(\frac{9}{10}\right)(0.89)(1 - 0.8096) = 1 - 0.2172 = 0.7828$$

• This is to be compared with R₁² = 0.8096 from the equation in first differences.
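The adjustment is easy to script. A minimal sketch in Python, reproducing the Friedman-Meiselman figures above (the function name is ours; the numbers come from the text):

```python
def comparable_r2(rss0, rss1, d, r2_diff, n, k):
    """Comparable R^2 for the levels equation:
    1 - R_D^2 = (RSS0/RSS1) * ((n-k-1)/(n-k)) * d * (1 - R1^2)."""
    return 1.0 - (rss0 / rss1) * ((n - k - 1) / (n - k)) * d * (1.0 - r2_diff)

# Friedman-Meiselman: RSS0 = 11,943, RSS1 = 8,387 (both x 10^4), d = 0.89,
# R1^2 = 0.8096, n = 11, k = 1.
print(comparable_r2(11.943, 8.387, 0.89, 0.8096, 11, 1))  # about 0.7828
```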
6.3 Estimation in Levels Versus First Differences
• For the production function data in Table 3.11, the first-difference equation is

$$\Delta \log X = \underset{(0.158)}{0.987}\,\Delta \log L_1 + \underset{(0.134)}{0.502}\,\Delta \log K_1$$

with R₁² = 0.8405, DW = 1.177, RSS₁ = 0.0278.
• The comparable figures for the levels equation reported earlier in Chapter 4, equation (4.24), are R₀² = 0.9946, DW = 0.858, RSS₀ = 0.0434.
6.3 Estimation in Levels Versus First Differences
$$R_D^2 = 1 - \left(\frac{0.0434}{0.0278}\right)\left(\frac{36}{37}\right)(0.858)(1 - 0.8405) = 1 - 0.2079 = 0.7921$$

• This is to be compared with R₁² = 0.8405 from the equation in first differences.
6.3 Estimation in Levels Versus First Differences
• Harvey gives a different definition of R_D². He defines it as

$$R_D^2 = 1 - \frac{RSS_0}{RSS_1}(1 - R_1^2)$$

• This does not adjust for the fact that the error variances in the levels equation and the first-difference equation are not the same.
• The arguments for his suggestion are given in his paper.
6.3 Estimation in Levels Versus First Differences
• In the example with the Friedman-Meiselman data, his measure of R_D² is given by

$$R_D^2 = 1 - \frac{119{,}430}{83{,}872}(1 - 0.8096) = 0.7289$$

• Although R_D² cannot be greater than 1, it can be negative.
• This would be the case when $\sum(\Delta y_t - \overline{\Delta y})^2 < RSS_0$, that is, when the levels model is giving a poorer explanation than the naïve model, which says that Δy_t is a constant.
6.3 Estimation in Levels Versus First Differences
• Usually, with time-series data, one gets high R² values if the regressions are estimated in the levels y_t and x_t, but low R² values if the regressions are estimated in the first differences (y_t − y_{t-1}) and (x_t − x_{t-1}).
6.3 Estimation in Levels Versus First Differences
• Since a high R² is usually considered proof of a strong relationship between the variables under investigation, there is a strong tendency to estimate the equations in levels rather than in first differences.
• This is sometimes called the "R² syndrome."
• An example
6.3 Estimation in Levels Versus First Differences
• However, if the DW statistic is very low, it often implies a misspecified equation, no matter what the value of the R² is.
• In such cases one should estimate the regression equation in first differences; if the R² is then low, this merely indicates that the variables y and x are not related to each other.
6.3 Estimation in Levels Versus First Differences
• Granger and Newbold present some examples with artificially generated data where y, x, and the error u are each generated independently, so that there is no relationship between y and x.
• But the correlations between y_t and y_{t-1}, x_t and x_{t-1}, and u_t and u_{t-1} are very high.
• Although there is no relationship between y and x, the regression of y on x gives a high R² but a low DW statistic.
6.3 Estimation in Levels Versus First Differences
• When the regression is run in first differences, the R² is close to zero and the DW statistic is close to 2, thus demonstrating that there is indeed no relationship between y and x and that the R² obtained earlier is spurious.
• Thus regressions in first differences might often reveal the true nature of the relationship between y and x.
• An example of such a simulation is sketched below.
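A minimal simulation of the Granger-Newbold experiment (the variable names, seed, and sample size are illustrative; only numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 200
# Two independent random walks: y and x are unrelated by construction.
y = np.cumsum(rng.standard_normal(T))
x = np.cumsum(rng.standard_normal(T))

def r2_and_dw(y, x):
    """OLS of y on a constant and x; return (R^2, Durbin-Watson d)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    return r2, dw

print(r2_and_dw(y, x))                    # levels: typically high R^2, very low DW
print(r2_and_dw(np.diff(y), np.diff(x)))  # differences: R^2 near 0, DW near 2
```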
Homework
• Find the data
– Y is the Taiwan stock index
– X is the U.S. stock index
• Run two equations
– The equation in levels (log-based prices)
– The equation in first differences
• Compare the two equations on
– The beta estimate and its significance
– The R²
– The value of the DW statistic
• Q: Should we adopt the equation in levels or the one in first differences?
6.3 Estimation in Levels Versus First Differences
• For instance, suppose that we have quarterly data; then it is possible that the errors in any quarter this year are most highly correlated with the errors in the corresponding quarter last year, rather than with the errors in the preceding quarter.
• That is, u_t could be uncorrelated with u_{t-1} but highly correlated with u_{t-4}.
• If this is the case, the DW statistic will fail to detect it.
6.3 Estimation in Levels Versus First Differences
• What we should be using is a modified statistic defined as

$$d_4 = \frac{\sum(\hat{u}_t - \hat{u}_{t-4})^2}{\sum \hat{u}_t^2}$$

• Quarterly data (e.g., GDP): use d₄.
• Monthly data (e.g., an industrial production index): use the analogous lag-12 statistic, as sketched below.
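A sketch of the lag-s generalization (the function name and interface are our own):

```python
import numpy as np

def dw_lag(resid, s=4):
    """Generalized Durbin-Watson d_s, e.g. s=4 for quarterly data,
    s=12 for monthly data."""
    resid = np.asarray(resid, dtype=float)
    return np.sum((resid[s:] - resid[:-s]) ** 2) / np.sum(resid ** 2)
```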
6.4 Estimation Procedures with Autocorrelated Errors
$$u_t = \rho u_{t-1} + e_t \qquad (6.1)$$
• Now we will derive var(u_t) and the correlations between u_t and lagged values of u_t.
• From equation (6.1) note that u_t depends on e_t and u_{t-1}; u_{t-1} depends on e_{t-1} and u_{t-2}; and so on.
6.4 Estimation Procedures with Autocorrelated Errors
• Thus u_t depends on e_t, e_{t-1}, e_{t-2}, .... Since the e_t are serially independent, and u_{t-1} depends on e_{t-1}, e_{t-2}, and so on, but not on e_t, we have E(u_{t-1} e_t) = 0.
• Since E(e_t) = 0, we have E(u_t) = 0 for all t.
6.4 Estimation Procedures with Autocorrelated Errors
• If we denote var(u_t) by σ_u², we have

$$\sigma_u^2 = \operatorname{var}(u_t) = E(u_t^2) = E(\rho u_{t-1} + e_t)^2 = \rho^2\sigma_u^2 + \sigma_e^2 \quad\text{since}\quad \operatorname{cov}(u_{t-1}, e_t) = 0$$

• Thus we have

$$\sigma_u^2 = \frac{\sigma_e^2}{1 - \rho^2}$$

• This gives the variance of u_t in terms of the variance of e_t and the parameter ρ.
6.4 Estimation Procedures with Autocorrelated Errors
• Let us now derive the correlations. Denoting the correlation between u_t and u_{t-s} (which is called the correlation of lag s) by ρ_s, we get

$$E(u_t u_{t-s}) = \sigma^2 \rho_s$$

• But

$$E(u_t u_{t-s}) = \rho E(u_{t-1} u_{t-s}) + E(e_t u_{t-s})$$

• Hence σ²ρ_s = ρσ²ρ_{s-1} + 0, or ρ_s = ρ·ρ_{s-1}.
6.4 Estimation Procedures with Autocorrelated Errors
• Since ρ₀ = 1, we get by successive substitution

$$\rho_1 = \rho, \quad \rho_2 = \rho^2, \quad \rho_3 = \rho^3, \ldots$$

• Thus the lag correlations are all powers of ρ and decline geometrically, as the simulation below illustrates.
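This geometric decline is easy to verify numerically (ρ = 0.6, the seed, and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, T = 0.6, 100_000
e = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]

# Sample lag-s correlations should be close to rho**s.
for s in (1, 2, 3):
    print(s, round(np.corrcoef(u[s:], u[:-s])[0, 1], 3), round(rho ** s, 3))
```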
6.4 Estimation Procedures with Autocorrelated Errors
• GLS (generalized least squares) applies to the model

$$y_t = \alpha + \beta x_t + u_t \qquad t = 1, 2, \ldots, T \qquad (6.2)$$
$$u_t = \rho u_{t-1} + e_t, \qquad e_t \sim \text{iid}(0, \sigma_e^2)$$
6.4 Estimation Procedures with Autocorrelated Errors
• Multiplying the lagged equation by ρ and subtracting:

$$\rho y_{t-1} = \rho\alpha + \rho\beta x_{t-1} + \rho u_{t-1} \qquad (6.4)$$
$$y_t - \rho y_{t-1} = \alpha(1 - \rho) + \beta(x_t - \rho x_{t-1}) + e_t \qquad (6.5)$$

• Define the transformed variables

$$y_t^* = y_t - \rho y_{t-1}, \qquad x_t^* = x_t - \rho x_{t-1} \qquad t = 2, 3, \ldots, T \qquad (6.6)$$
$$y_1^* = \sqrt{1 - \rho^2}\; y_1, \qquad x_1^* = \sqrt{1 - \rho^2}\; x_1 \qquad (6.6')$$
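A minimal sketch of this transformation in code, assuming the AR(1) structure above (the function name is ours):

```python
import numpy as np

def quasi_difference(y, x, rho):
    """Apply (6.6)/(6.6'): quasi-difference y and x for a given rho,
    keeping the first observation with the sqrt(1 - rho^2) weight."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    w = np.sqrt(1.0 - rho ** 2)
    y_star = np.concatenate([[w * y[0]], y[1:] - rho * y[:-1]])
    x_star = np.concatenate([[w * x[0]], x[1:] - rho * x[:-1]])
    return y_star, x_star
```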
6.4 Estimation Procedures with Autocorrelated Errors
• In actual practice ρ is not known.
• There are two types of procedures for estimating ρ:
– 1. Iterative procedures
– 2. Grid-search procedures
6.4 Estimation Procedures with Autocorrelated Errors
Iterative Procedures
• Among the iterative procedures, the earliest was the Cochrane-Orcutt (C-O) procedure.
• In the Cochrane-Orcutt procedure we estimate equation (6.2) by OLS, get the estimated residuals û_t, and estimate ρ by ρ̂ = Σû_t û_{t-1} / Σû_t². A sketch of the full iteration is given below.
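A minimal sketch of the iteration, dropping the first observation as the original C-O procedure does (the function name, starting value, and tolerance are our own choices):

```python
import numpy as np

def cochrane_orcutt(y, x, n_iter=50, tol=1e-6):
    """Iterative Cochrane-Orcutt for y_t = alpha + beta*x_t + u_t, AR(1) u_t."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    rho = 0.0
    for _ in range(n_iter):
        ys = y[1:] - rho * y[:-1]           # quasi-differenced data
        xs = x[1:] - rho * x[:-1]
        X = np.column_stack([np.ones_like(xs), xs])
        (a_star, beta), *_ = np.linalg.lstsq(X, ys, rcond=None)
        alpha = a_star / (1.0 - rho)        # intercept of (6.5) is alpha*(1 - rho)
        u = y - alpha - beta * x            # residuals of the original equation
        rho_new = np.sum(u[1:] * u[:-1]) / np.sum(u ** 2)
        if abs(rho_new - rho) < tol:
            return alpha, beta, rho_new
        rho = rho_new
    return alpha, beta, rho
```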
6.4 Estimation Procedures with Autocorrelated Errors
• Durbin suggested an alternative method of estimating ρ.
• In this procedure, we write equation (6.5) as

$$y_t = \alpha(1 - \rho) + \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + e_t \qquad (6.7)$$

• We regress y_t on y_{t-1}, x_t, and x_{t-1} and take the estimated coefficient of y_{t-1} as an estimate of ρ.
6.4 Estimation Procedures with Autocorrelated Errors
• Use equations (6.6) and (6.6′) and estimate a regression of y* on x*.
• The only thing to note is that the slope coefficient in this equation is β, but the intercept is α(1 − ρ).
• Thus after estimating the regression of y* on x*, we have to adjust the constant term appropriately to get estimates of the parameters of the original equation (6.2).
6.4 Estimation Procedures with Autocorrelated Errors
• Further, the standard errors we compute from the regression of y* on x* are now "asymptotic" standard errors, because ρ has been estimated.
• If there are lagged values of y among the explanatory variables, these standard errors are not correct even asymptotically.
• The adjustment needed in this case is discussed in Section 6.7.
6.4 Estimation Procedures with Autocorrelated Errors
Grid-Search Procedures
• One of the first grid-search procedures is the Hildreth-Lu procedure, suggested in 1960.
• The procedure is as follows. Calculate y_t* and x_t* in equation (6.6) for different values of ρ at intervals of 0.1 in the range −1 < ρ < 1.
• Estimate the regression of y_t* on x_t* and calculate the residual sum of squares RSS in each case.
6.4 Estimation Procedures with Autocorrelated Errors
• Choose the value of ρ for which the RSS is minimum.
• Then repeat this procedure for smaller intervals of ρ around this value.
• For instance, if the value of ρ for which RSS is minimum is −0.4, repeat this search procedure for values of ρ at intervals of 0.01 in the range −0.5 < ρ < −0.3. A sketch of the search is given below.
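A sketch of the coarse pass of the grid search (refining around the minimum, as described above, is simply a second call with a finer grid; names are ours):

```python
import numpy as np

def hildreth_lu(y, x, grid=None):
    """For each rho in the grid, quasi-difference, run OLS, record the RSS;
    return the rho with the smallest RSS and its coefficients."""
    if grid is None:
        grid = np.arange(-0.9, 1.0, 0.1)
    y, x = np.asarray(y, float), np.asarray(x, float)
    best = None
    for rho in grid:
        ys = y[1:] - rho * y[:-1]
        xs = x[1:] - rho * x[:-1]
        X = np.column_stack([np.ones_like(xs), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        resid = ys - X @ beta
        rss = resid @ resid
        if best is None or rss < best[0]:
            best = (rss, rho, beta)
    return best  # (RSS, rho_hat, [intercept*, slope])
```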
6.4 Estimation Procedures with Autocorrelated Errors
• This procedure is not the same as the ML procedure. If the errors e_t are normally distributed, we can write the log-likelihood function as (derivation is omitted)

$$\log L = \text{const.} - \frac{T}{2}\log \sigma_e^2 + \frac{1}{2}\log(1 - \rho^2) - \frac{Q}{2\sigma_e^2} \qquad (6.8)$$

where

$$Q = \sum\left[y_t - \rho y_{t-1} - \alpha(1 - \rho) - \beta(x_t - \rho x_{t-1})\right]^2$$

• Thus minimizing Q is not the same as maximizing log L.
• We can use the grid-search procedure to get the ML estimate.
6.4 Estimation Procedures with Autocorrelated Errors
• Consider the data in Table 3.11 and the estimation of the production function

$$\log X = \alpha + \beta_1 \log L_1 + \beta_2 \log K_1 + u$$

• The OLS estimation gave a DW statistic of 0.86, suggesting significant positive autocorrelation.
• Assuming that the errors were AR(1), two estimation procedures were used: the Hildreth-Lu grid search and the iterative Cochrane-Orcutt (C-O).
6.4 Estimation Procedures with Autocorrelated Errors
• The Hildreth-Lu procedure gave ρ̂ = 0.77.
• The iterative C-O procedure gave ρ̂ = 0.80.
• The DW test statistic implied that ρ̂ = (1/2)(2 − 0.86) = 0.57.
6.4 Estimation Procedures with Autocorrelated Errors
• The estimates of the parameters (with standard errors in parentheses) were as follows:
6.4 Estimation Procedures with Autocorrelated Errors
• In this example the parameter estimates given by the Hildreth-Lu and the iterative C-O procedures are pretty close to each other.
• Correcting for the autocorrelation in the errors has resulted in a significant change in the estimates of β₁ and β₂.
6.5 Effect of AR(1) Errors on OLS Estimates
• In Section 6.4 we described different procedures for the estimation of regression models with AR(1) errors.
• We will now answer two questions that might arise with the use of these procedures:
– 1. What do we gain from using these procedures?
– 2. When should we not use these procedures?
6.5 Effect of AR(1) Errors on OLS Estimates
• First, in the case we are considering (i.e., the case where the explanatory variable x_t is independent of the error u_t), the OLS estimates are unbiased.
• However, they will not be efficient.
• Further, the tests of significance we apply, which will be based on the wrong covariance matrix, will be wrong.
6.5 Effect of AR(1) Errors on OLS Estimates
• In the case where the explanatory variables include lagged dependent variables, we will have some further problems, which we discuss in Section 6.7.
• For the present, let us consider the simple regression model below.
6.5 Effect of AR(1) Errors on OLS Estimates
$$y_t = \beta x_t + u_t \qquad (6.9)$$

• Let var(u_t) = σ_u² and cov(u_t, u_{t-j}) = ρ_j σ_u².
• If the u_t are AR(1), we have ρ_j = ρ^j.
• The OLS estimator is

$$\hat\beta = \frac{\sum x_t y_t}{\sum x_t^2}$$
6.5 Effect of AR(1) Errors on OLS Estimates
• Substituting (6.9) into the formula for β̂ gives

$$\hat\beta - \beta = \frac{\sum x_t u_t}{\sum x_t^2} \qquad\text{and}\qquad E(\hat\beta - \beta) = 0$$

$$V(\hat\beta) = \frac{1}{(\sum x_t^2)^2}\operatorname{var}\!\left(\sum x_t u_t\right)$$

$$V(\hat\beta) = \frac{\sigma_u^2}{\sum x_t^2}\left(1 + 2\rho\,\frac{\sum x_t x_{t-1}}{\sum x_t^2} + 2\rho^2\,\frac{\sum x_t x_{t-2}}{\sum x_t^2} + \cdots\right) \qquad (6.10)$$
6.5 Effect of AR(1) Errors on OLS Estimates
• If we ignore the autocorrelation problem, we would be computing V(β̂) as σ_u²/Σx_t². Thus we would be ignoring the expression in the parentheses of equation (6.10).
• To get an idea of the magnitude of this expression, let us assume that the x_t series also follows an AR(1) process, with var(x_t) = σ_x² and corr(x_t, x_{t-1}) = r.
6.5 Effect of AR(1) Errors on OLS Estimates
• Since we are now assuming x_t to be stochastic, we will consider the asymptotic variance of β̂.
• The expression in parentheses in equation (6.10) is now

$$1 + 2\rho r + 2\rho^2 r^2 + \cdots = 1 + \frac{2\rho r}{1 - \rho r} = \frac{1 + \rho r}{1 - \rho r}$$
6.5 Effect of AR(1) Errors on OLS Estimates
• Thus

$$V(\hat\beta) = \frac{\sigma_u^2}{T\sigma_x^2}\cdot\frac{1 + \rho r}{1 - \rho r}$$

where T is the number of observations.
• If r = ρ = 0.8, then (1 + ρr)/(1 − ρr) = 1.64/0.36 = 4.56.
• Thus ignoring the expression in the parentheses of equation (6.10) results in an underestimation of close to 78% for the variance of β̂ (since 1/4.56 ≈ 0.22).
6.5 Effect of AR(1) Errors on OLS Estimates
• Consider the usual estimate of σ_u² from the OLS residuals, Σû_t²/(T − 1).
• If ρ = 0, this is an unbiased estimate.
• If ρ ≠ 0, then under the assumptions we are making, we have approximately

$$E\left(\sum \hat{u}_t^2\right) = \sigma_u^2\left(T - \frac{1 + \rho r}{1 - \rho r}\right)$$

• Again, if ρ = r = 0.8 and T = 20, we have

$$E\left(\frac{\sum \hat{u}_t^2}{T - 1}\right) = \frac{15.45\,\sigma^2}{19} \approx 0.81\,\sigma^2$$
6.5 Effect of AR(1) Errors on OLS Estimates
• We can also derive the asymptotic variance of the ML estimator β̃ when both x and u are first-order autoregressive, as follows. Note that the ML estimator of β is asymptotically equivalent to the estimator obtained from a regression of (y_t − ρy_{t-1}) on (x_t − ρx_{t-1}).
6.5 Effect of AR(1) Errors on OLS Estimates
• Hence

$$V(\tilde\beta) = \operatorname{var}\!\left[\frac{\sum(x_t - \rho x_{t-1})(u_t - \rho u_{t-1})}{\sum(x_t - \rho x_{t-1})^2}\right] = \operatorname{var}\!\left[\frac{\sum(x_t - \rho x_{t-1})\,e_t}{\sum(x_t - \rho x_{t-1})^2}\right] = \frac{\sigma_e^2}{\sum(x_t - \rho x_{t-1})^2}$$

where σ_e² = var(e_t).
6.5 Effect of AR(1) Errors on OLS Estimates
• When x_t is autoregressive we have

$$\operatorname{plim}\frac{1}{T}\sum(x_t - \rho x_{t-1})^2 = \sigma_x^2(1 + \rho^2 - 2\rho r)$$

$$\operatorname{var}(u) = \sigma^2 = \frac{\sigma_e^2}{1 - \rho^2} \qquad\text{or}\qquad \sigma_e^2 = \sigma^2(1 - \rho^2)$$

• Hence by substitution we get the asymptotic variance of β̃ as

$$V(\tilde\beta) = \frac{\sigma^2}{T\sigma_x^2}\cdot\frac{1 - \rho^2}{1 + \rho^2 - 2\rho r}$$
6.5 Effect of AR(1) Errors on OLS Estimates
• Thus the efficiency of the OLS estimator is

$$\frac{V(\tilde\beta)}{V(\hat\beta)} = \frac{1 - \rho r}{1 + \rho r}\cdot\frac{1 - \rho^2}{1 + \rho^2 - 2\rho r}$$

• One can compute this for different values of r and ρ, as in the sketch below.
• For r = ρ = 0.8 this efficiency is 0.21.
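A one-line check of this efficiency formula (the function name is ours):

```python
def ols_efficiency(rho, r):
    """V(beta_tilde)/V(beta_hat) when both the errors and x are AR(1)."""
    return ((1 - rho * r) / (1 + rho * r)) * (1 - rho ** 2) / (1 + rho ** 2 - 2 * rho * r)

print(ols_efficiency(0.8, 0.8))  # about 0.22, matching the 0.21 in the text up to rounding
```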
6.5 Effect of AR(1) Errors on OLS Estimates
• Thus the consequences of autocorrelated errors are:
1. The least squares estimators are unbiased but are not efficient. Sometimes they are considerably less efficient than the procedures that take account of the autocorrelation.
2. The sampling variances are biased and sometimes likely to be seriously understated. Thus R² as well as the t and F statistics tend to be exaggerated.
6.5 Effect of AR(1) Errors on OLS Estimates
• The solution to these problems is to use the maximum likelihood procedure (a one-step procedure) or some other procedure mentioned earlier (a two-step procedure) that takes account of the autocorrelation.
• However, there are four important points to note:
6.5 Effect of AR(1) Errors on OLS Estimates
1. If ρ is known, it is true that one can get estimators better than OLS that take account of autocorrelation. However, in practice ρ is not known and has to be estimated. In small samples it is not necessarily true that one gains (in terms of mean-square error for β̂) by estimating ρ.
This problem has been investigated by Rao and Griliches, who suggest the rule of thumb (for samples of size 20) that one can use the methods that take account of autocorrelation if |ρ̂| > 0.3, where ρ̂ is the estimated first-order serial correlation from an OLS regression. In samples of larger sizes it would be worthwhile using these methods for ρ̂ smaller than 0.3.
6.5 Effect of AR(1) Errors on OLS Estimates
2. The discussion above assumes that the true errors are first-order autoregressive. If they have a more complicated structure (e.g., second-order autoregressive), it might be thought that it would still be better to proceed on the assumption that the errors are first-order autoregressive rather than ignore the problem completely and use the OLS method.
– Engle shows that this is not necessarily true (i.e., sometimes one can be worse off making the assumption of first-order autocorrelation than ignoring the problem completely).
6.5 Effect of AR(1) Errors on OLS Estimates
3. In regressions with quarterly (or monthly) data, one might find that the errors exhibit fourth (or twelfth)-order autocorrelation because of not making adequate allowance for seasonal effects. In such cases, if one looks for only first-order autocorrelation, one might not find any. This does not mean that autocorrelation is not a problem. In this case the appropriate specification for the error term may be u_t = ρu_{t-4} + e_t for quarterly data and u_t = ρu_{t-12} + e_t for monthly data.
6.5 Effect of AR(1) Errors on OLS Estimates
4. Finally, and most important, it is often possible to confuse misspecified dynamics with serial correlation in the errors. For instance, a static regression model with first-order autocorrelation in the errors, that is, y_t = βx_t + u_t with u_t = ρu_{t-1} + e_t, can be written as

$$y_t = \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + e_t \qquad (6.11)$$
6.5 Effect of AR(1) Errors on OLS Estimates
4. The model is the same as

$$y_t = \beta_1 y_{t-1} + \beta_2 x_t + \beta_3 x_{t-1} + e_t \qquad (6.11')$$

with the restriction β₁β₂ + β₃ = 0.
We can estimate the model (6.11′) and test this restriction. If it is rejected, clearly it is not valid to estimate (6.11). (The test procedure is described in Section 6.8.)
6.5 Effect of AR(1) Errors on OLS Estimates
• The errors would be serially correlated, but not because the errors follow a first-order autoregressive process; rather, because the terms x_{t-1} and y_{t-1} have been omitted.
• This is what is meant by "misspecified dynamics." Thus significant serial correlation in the estimated residuals does not necessarily imply that we should estimate a serial correlation model.
6.5 Effect of AR(1) Errors on OLS Estimates
• Some further tests are necessary (like the test of the restriction β₁β₂ + β₃ = 0 in the above-mentioned case).
• In fact, it is always best to start with an equation like (6.11′) and test this restriction before applying any test for serial correlation.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• In previous sections we considered explanatory variables that were uncorrelated with the error term.
• This will not be the case if we have lagged dependent variables among the explanatory variables and we have serially correlated errors.
• There are several situations under which we would be considering lagged dependent variables as explanatory variables.
• These could arise through expectations, adjustment lags, and so on.
• Let us consider a simple model.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• Let us consider a simple model

$$y_t = \alpha y_{t-1} + \beta x_t + u_t, \qquad u_t = \rho u_{t-1} + e_t \qquad (6.12)$$

where the e_t are independent with mean 0 and variance σ², and |α| < 1, |ρ| < 1.
• Because u_t depends on u_{t-1} and y_{t-1} depends on u_{t-1}, the two variables y_{t-1} and u_t will be correlated.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
An example
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
Durbin's h-Test
• Since the DW test is not applicable in these models, Durbin suggests an alternative test, called the h-test.
• This test uses

$$h = \hat\rho\,\sqrt{\frac{n}{1 - n\hat{V}(\hat\alpha)}}$$

as a standard normal variable.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• Here ρ̂ is the estimated first-order serial correlation from the OLS residuals, V̂(α̂) is the estimated variance of the OLS estimate of α, and n is the sample size.
• If nV̂(α̂) ≥ 1, the test is not applicable. In this case Durbin suggests the following test, sketched in code below.
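A minimal sketch of the h computation (the inputs are taken as given; ρ̂ can be obtained, for example, as 1 − d/2 from the DW statistic):

```python
import numpy as np

def durbin_h(rho_hat, var_alpha_hat, n):
    """Durbin's h; not applicable when n * V(alpha_hat) >= 1."""
    if n * var_alpha_hat >= 1:
        raise ValueError("h-test not applicable: n * V(alpha_hat) >= 1")
    return rho_hat * np.sqrt(n / (1.0 - n * var_alpha_hat))

# Compare |h| with a standard normal critical value, e.g. 1.96 at the 5% level.
```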
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
Durbin's Alternative Test
• From the OLS estimation of equation (6.12), compute the residuals û_t.
• Then regress û_t on û_{t-1}, y_{t-1}, and x_t.
• The test for ρ = 0 is carried out by testing the significance of the coefficient of û_{t-1} in the latter regression.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• An equation of demand for food estimated from 50 observations gave the following results (figures in parentheses are standard errors):

where q_t = food consumption per capita, p_t = food price (retail price deflated by the consumer price index), and y_t = per capita disposable income deflated by the consumer price index.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• Durbin's h-statistic computed from these results is significant at the 1% level.
• Thus we reject the hypothesis ρ = 0, even though the DW statistic is close to 2 and the estimate ρ̂ from the OLS residuals is correspondingly small.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• Let us keep all the numbers the same and just change the standard error of α̂. The following are the results:
• Thus, other things being equal, the precision with which α̂ is estimated has a significant effect on the outcome of the h-test.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• In the case where the h-test cannot be used, we can use the alternative test suggested by Durbin.
• However, the Monte Carlo study by Maddala and Rao suggests that this test does not have good power in those cases where the h-test cannot be used.
6.7 Tests for Serial Correlation in Models with Lagged Dependent Variables
• On the other hand, in cases where the h-test can be used, Durbin's second test is almost as powerful.
• It is not often used because it involves more computation.
• However, we will show that Durbin's second test can be generalized to higher-order autoregressions, whereas the h-test cannot.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• The h-test we have discussed is, like the Durbin-Watson test, a test for first-order autoregression.
• Breusch and Godfrey discuss some general tests that are easy to apply and are valid for very general hypotheses about the serial correlation in the errors.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• These tests are derived from a general principle called the Lagrange multiplier (LM) principle.
• A discussion of this principle is beyond the scope of this book.
• For the present we will explain what the test is.
• The test is similar to Durbin's second test that we have discussed.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• Consider the regression model

$$y_t = \sum_{i=1}^{k} \beta_i x_{it} + u_t, \qquad u_t = \rho_1 u_{t-1} + \cdots + \rho_p u_{t-p} + e_t \qquad (6.14)$$

• We are interested in testing H₀: ρ₁ = ρ₂ = ⋯ = ρ_p = 0.
• The x's in equation (6.14) may include lagged dependent variables as well.
• The LM test is as follows.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• First, estimate (6.14) by OLS and obtain the least squares residuals û_t.
• Next, estimate the regression equation

$$\hat{u}_t = \sum_{i=1}^{k} \beta_i x_{it} + \rho_1\hat{u}_{t-1} + \cdots + \rho_p\hat{u}_{t-p} + \eta_t \qquad (6.16)$$

and test whether the coefficients of the û_{t-i} are all zero.
• We take the conventional F-statistic and use p·F as χ² with p degrees of freedom.
• We use the χ²-test rather than the F-test because the LM test is a large-sample test.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• The test can be used for different specifications of the error process.
• For instance, for the problem of testing for fourth-order autocorrelation

$$u_t = \rho_4 u_{t-4} + e_t \qquad (6.17)$$

we just estimate

$$\hat{u}_t = \sum_{i=1}^{k} \beta_i x_{it} + \rho_4\hat{u}_{t-4} + \eta_t \qquad (6.18)$$

instead of (6.16) and test ρ₄ = 0.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• The test procedure is the same for autoregressive or moving average errors.
• For instance, if we have a moving average (MA) error

$$u_t = e_t + \theta_4 e_{t-4}$$

instead of (6.17), the test procedure is still to estimate (6.18) and test ρ₄ = 0.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• In all these cases, we just test H₀ by estimating equation (6.16) with p = 2 and testing ρ₁ = ρ₂ = 0.
• What is of importance is the degree of autoregression, not its nature.
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
• Finally, an alternative to the estimation of (6.16) is to estimate equation (6.19).
6.8 A General Test for Higher-Order Serial Correlation: The LM Test
In summary, the LM test for serial correlation is:
1. Estimate equation (6.14) by OLS and get the residuals û_t.
2. Estimate equation (6.16) or (6.19) by OLS and compute the F-statistic for testing the hypothesis H₀: ρ₁ = ρ₂ = ⋯ = ρ_p = 0.
3. Use p·F as χ² with p degrees of freedom.
A numerical sketch of these three steps follows.
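A sketch with numpy and scipy (the function name and the auxiliary-regression layout are our own; the statistic is the p·F form described above):

```python
import numpy as np
from scipy import stats

def lm_serial_correlation(y, X, p):
    """LM test: regress OLS residuals on X and p lagged residuals,
    then refer p*F to the chi-square distribution with p d.f."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                                  # step 1: OLS residuals
    n = len(u)
    rows = slice(p, n)                                # usable observations
    lags = np.column_stack([u[p - j:n - j] for j in range(1, p + 1)])
    Z = np.column_stack([X[rows], lags])              # step 2: aux regression (6.16)
    g, *_ = np.linalg.lstsq(Z, u[rows], rcond=None)
    rss_u = np.sum((u[rows] - Z @ g) ** 2)            # unrestricted RSS
    b, *_ = np.linalg.lstsq(X[rows], u[rows], rcond=None)
    rss_r = np.sum((u[rows] - X[rows] @ b) ** 2)      # restricted: rho's all zero
    F = ((rss_r - rss_u) / p) / (rss_u / (n - p - Z.shape[1]))
    lm = p * F                                        # step 3: p*F ~ chi-square(p)
    return lm, stats.chi2.sf(lm, p)
```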
6.9 Strategies When the DW Test Statistic is Significant
• The DW test is designed as a test of the hypothesis ρ = 0 when the errors follow a first-order autoregressive process.
• However, the test has been found to be robust against other alternatives such as AR(2), MA(1), ARMA(1,1), and so on.
• Further, and more disturbingly, it catches specification errors like omitted variables that are themselves autocorrelated, and misspecified dynamics (a term that we will explain).
• Thus the strategy to adopt if the DW test statistic is significant is not clear.
• We discuss three different strategies:
6.9 Strategies When the DW Test Statistic is Significant
• 1. Assume that the significant DW statistic is an indication of serial correlation, but that the serial correlation may not be due to AR(1) errors.
• 2. Test whether the serial correlation is due to omitted variables.
• 3. Test whether the serial correlation is due to misspecified dynamics.
6.9 Strategies When the DW Test Statistic is Significant
Errors Not AR(1)
• In case 1, since a significant DW statistic does not necessarily mean that the errors are AR(1), we should check for higher-order autoregressions by estimating equations of the form

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + e_t$$

• Once the order has been determined, we can estimate the model with the appropriate assumptions about the error structure by the methods described in Section 6.4.
6.9 Strategies When the DW Test Statistic is Significant
• What about moving average (MA) errors and ARMA errors?
• Estimation with MA errors and ARMA errors is more complicated than with AR errors.
• However, researchers suggest that it is the order of the error process that is more important than its particular form. From the practical point of view, for most economic data it is sufficient to determine the order of the AR process.
• Thus if a significant DW statistic is observed, the appropriate strategy would be to try to see whether the errors are generated by a higher-order AR process than AR(1) and then undertake estimation.
6.9 Strategies When the DW Test Statistic is Significant
Autocorrelation Caused by Omitted Variables
• Suppose that the true regression equation is

$$y_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + u_t$$

and instead we estimate

$$y_t = \beta_0 + \beta_1 x_t + v_t \qquad (6.20)$$
6.9 Strategies When the DW Test Statistic is Significant
• Then, since v_t = β₂x_t² + u_t, if x_t is autocorrelated, this will produce autocorrelation in v_t.
• However, v_t is no longer independent of x_t.
• Thus not only are the OLS estimators of β₀ and β₁ from (6.20) inefficient, they are inconsistent as well.
6.9 Strategies When the DW Test Statistic is Significant
Serial Correlation Due to Misspecified Dynamics
• In a seminal paper published in 1964, Sargan pointed out that a significant DW statistic does not necessarily imply that we have a serial correlation problem.
• This point was also emphasized by Hendry and Mizon.
• The argument goes as follows. Consider

$$y_t = \beta x_t + u_t \quad\text{with}\quad u_t = \rho u_{t-1} + e_t \qquad (6.24)$$

where the e_t are independent with a common variance σ².
• We can write this model as

$$y_t = \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + e_t \qquad (6.25)$$
6.9 Strategies When the DW Test Statistic is Significant
• Consider an alternative stable dynamic model:

$$y_t = \beta_1 y_{t-1} + \beta_2 x_t + \beta_3 x_{t-1} + e_t \qquad (6.26)$$

• Equation (6.25) is the same as equation (6.26) with the restriction

$$\beta_1\beta_2 + \beta_3 = 0 \qquad (6.27)$$
6.9 Strategies When the DW Test Statistic is Significant
• A test for ρ = 0 is a test for β₁ = 0 (and β₃ = 0).
• But before we test this, what Sargan says is that we should first test the restriction (6.27), and test for ρ = 0 only if the hypothesis H₀: β₁β₂ + β₃ = 0 is not rejected.
• If this hypothesis is rejected, we do not have a serial correlation model, and the serial correlation in the errors in (6.24) is due to "misspecified" dynamics, that is, the omission of the variables y_{t-1} and x_{t-1} from the equation.
6.9 Strategies When the DW Test Statistic is Significant
• If the DW test statistic is significant, a proper approach is to test the restriction (6.27) to make sure that what we have is a serial correlation model, before we undertake any autoregressive transformation of the variables.
• In fact, Sargan suggests starting with the general model (6.26) and testing the restriction (6.27) first, before attempting any test for serial correlation.
6.9 Strategies When the DW Test Statistic is Significant
Illustrative Example
• Consider the data in Table 3.11 and the estimation of the production function (4.24).
• In Section 6.4 we presented estimates of the equation assuming that the errors are AR(1).
• This was based on a DW test statistic of 0.86.
• Suppose that we estimate an equation of the form (6.26).
• The results are as follows (all variables in logs; figures in parentheses are standard errors):
6.9 Strategies When the DW Test Statistic is Significant
• Under the assumption that the errors are AR(1), the residual sum of squares, obtained from the Hildreth-Lu procedure we used in Section 6.4, is RSS₁ = 0.02635.
6.9 Strategies When the DW Test Statistic is Significant
• Since we have two slope coefficients, we have two restrictions of the form (6.27).
• Note that for the general dynamic model we are estimating six parameters (α and five β's).
• For the serial correlation model we are estimating four parameters (α, two β's, and ρ).
• We will use the likelihood ratio (LR) test.
6.9 Strategies When the DW Test Statistic is Significant
• The LR test statistic is

$$\lambda = \left(\frac{RSS_0}{RSS_1}\right)^{n/2}$$

where RSS₀ is from the unrestricted general model (6.26) and RSS₁ from the restricted serial correlation model, and −2 logₑ λ has a χ²-distribution with d.f. 2 (the number of restrictions).
• In our example the statistic is significant at the 1% level.
• Thus the hypothesis of first-order autocorrelation is rejected.
• Although the DW statistic is significant, this does not mean that the errors are AR(1). A numerical sketch of the test is given below.
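A sketch of the computation (RSS₀, the unrestricted RSS from (6.26), is not reproduced in this transcript, so the call below uses a hypothetical value; the function name is ours):

```python
import numpy as np
from scipy import stats

def lr_test(rss_restricted, rss_unrestricted, n, n_restrictions):
    """-2 log lambda = n * log(RSS_restricted / RSS_unrestricted),
    referred to chi-square with d.f. = number of restrictions."""
    stat = n * np.log(rss_restricted / rss_unrestricted)
    return stat, stats.chi2.sf(stat, n_restrictions)

# Hypothetical usage with RSS1 = 0.02635 from the text and an assumed rss0:
# print(lr_test(0.02635, rss0, n=39, n_restrictions=2))
```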