Chapter 11 assumption c.7 (EC220)


Christopher Dougherty
EC220 - Introduction to econometrics
(chapter 11)
Slideshow: assumption c.7
Original citation:
Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 11). [Teaching Resource]
© 2012 The Author
This version available at: http://learningresources.lse.ac.uk/137/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows
the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user
credits the author and licenses their new creations under the identical terms.
http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
ASSUMPTION C.7
ASSUMPTIONS FOR MODEL C

C.7 The disturbance term is distributed independently of the regressors:
ut is distributed independently of Xjt' for all t' (including t) and all j.

(1) The disturbance term in any observation is distributed independently of the values of the regressors in the same observation, and

(2) The disturbance term in any observation is distributed independently of the values of the regressors in the other observations.
Assumption C.7, like its counterpart Assumption B.7, is essential for both the unbiasedness
and the consistency of OLS estimators.
It is helpful to divide the assumption into two parts, as shown above. Both parts are required for unbiasedness. However, only the first part is required for consistency (as a necessary, but not sufficient, condition).
For cross-sectional regressions, Part (2) is rarely an issue. Since the observations are
generated randomly there is seldom any reason to suppose that the disturbance term in one
observation is not independent of the values of the regressors in the other observations.
Hence unbiasedness really depends on Part (1). Of course, this might fail, as we saw with measurement errors in the regressors and with simultaneous equations estimation.
With time series regressions, Part (2) becomes a major concern. To see why, we will review the proof of the unbiasedness of the OLS estimator of the slope coefficient in a simple regression model.
b2^OLS = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

       = Σ(Xi − X̄)((β1 + β2Xi + ui) − (β1 + β2X̄ + ū)) / Σ(Xi − X̄)²

       = (β2 Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū)) / Σ(Xi − X̄)²

       = β2 + Σ(Xi − X̄)(ui − ū) / Σ(Xi − X̄)²
The slope coefficient may be written as shown above.
We substitute for Y from the true model.
The 1 terms in the second factor in the numerator cancel each other. Rearranging what is
left, we obtain the third line.
The first term in the numerator, when divided by the denominator, reduces to β2. Hence, as usual, we have decomposed the slope coefficient into the true value and an error term.
The error term:

Σ(Xi − X̄)(ui − ū) / Σ(Xi − X̄)²

= Σ(Xi − X̄)ui / Σ(Xi − X̄)² − Σ(Xi − X̄)ū / Σ(Xi − X̄)²

= Σ(Xi − X̄)ui / Σ(Xi − X̄)² − ū Σ(Xi − X̄) / Σ(Xi − X̄)²

= Σ(Xi − X̄)ui / Σ(Xi − X̄)²   since Σ(Xi − X̄) = ΣXi − nX̄ = 0
The error term can be decomposed as shown.
ū is a common factor in the second component of the error term and so can be brought outside the summation as shown.
It can then be seen that the numerator of the second component of the error term is zero.
b2^OLS = β2 + Σ(Xi − X̄)ui / Σ(Xi − X̄)² = β2 + Σ ai ui,   where   ai = (Xi − X̄) / Σ(Xj − X̄)²
We are thus able to show that the OLS slope coefficient can be decomposed into the true
value and an error term that is a weighted sum of the values of the disturbance term in the
observations, with weights ai defined as shown.
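As an aside, this decomposition can be verified numerically on simulated data (a sketch; the model, coefficient values, and seed below are illustrative, not from the slides):

```python
import numpy as np

# Sketch: verify the decomposition b2 = beta2 + sum(a_i * u_i) numerically.
# The coefficient values and sample here are illustrative assumptions.
rng = np.random.default_rng(1)
beta1, beta2 = 2.0, 0.5
X = rng.uniform(0.0, 10.0, size=30)
u = rng.standard_normal(30)
Y = beta1 + beta2 * X + u

Xc = X - X.mean()
b2 = (Xc * (Y - Y.mean())).sum() / (Xc ** 2).sum()  # OLS slope estimate
a = Xc / (Xc ** 2).sum()                            # weights a_i
print(np.isclose(b2, beta2 + (a * u).sum()))        # True: exact identity
```

The equality holds exactly (up to floating point), because it is an algebraic identity, not an approximation.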
E(b2^OLS) = β2 + E(Σ ai ui) = β2 + Σ E(ai ui)

          = β2 + Σ E(ai) E(ui)

          = β2 + Σ E(ai) × 0 = β2

E(Σ ai ui) = E(a1u1 + ... + anun) = E(a1u1) + ... + E(anun) = Σ E(ai ui)
Now we will take expectations. The expectation of the right side of the equation is the sum
of the expectations of the individual terms.
If the ui are distributed independently of the ai, we can decompose the E(aiui) terms as
shown.
Unbiasedness then follows from the assumption that the expectation of ui is zero.
The crucial step is the previous one, which requires ui to be distributed independently of ai.
ai is a function of all of the X values in the sample, not just Xi. So Part (1) of Assumption
C.7, that ui is distributed independently of Xi, is not enough.
We also need Part (2), that ui is distributed independently of Xj, for all j.
In regressions with cross-sectional data this is usually not a problem.
CROSS-SECTIONAL DATA:

LGEARNi = β1 + β2Si + ui
LGEARNj = β1 + β2Sj + uj

It is reasonable to assume uj and Si are independent (i ≠ j). The main issue is whether ui is independent of Si.
If, for example, we are relating the logarithm of earnings to schooling using a sample of individuals, it is reasonable to suppose that the disturbance term affecting individual i will be unrelated to the schooling of any other individual.
Assuming this, the independence of ui and ai then depends only on the independence of ui
and Si.
TIME SERIES DATA:

Yt = β1 + β2Yt–1 + ut
Yt+1 = β1 + β2Yt + ut+1

The disturbance term ut is automatically correlated with the explanatory variable Yt in the next observation.
However with time series data it is different. Suppose, for example, that you have a model
with a lagged dependent variable as a regressor. Here we have a very simple model where
the only regressor is the lagged dependent variable.
We will suppose that Part (1) of Assumption C.7 is valid and that ut is distributed
independently of Yt–1.
Even if Part (1) is valid, Part (2) must be invalid in this model.
ut is a determinant of Yt and Yt is the regressor in the next observation. Hence even if ut is
uncorrelated with the explanatory variable Yt–1 in the observation for Yt, it will be correlated
with the explanatory variable Yt in the observation for Yt+1.
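A quick simulated check of this point (a sketch; the intercept and slope values, the series length, and the seed here are illustrative choices):

```python
import numpy as np

# In Y_t = 10 + 0.8*Y_{t-1} + u_t, the shock u_t feeds into Y_t, which is
# the regressor in the observation for Y_{t+1}. So u_t is uncorrelated with
# Y_{t-1} but clearly correlated with Y_t.
rng = np.random.default_rng(2)
T = 100_000
u = rng.standard_normal(T)
Y = np.empty(T)
Y[0] = 50.0                               # equilibrium value 10 / (1 - 0.8)
for t in range(1, T):
    Y[t] = 10.0 + 0.8 * Y[t - 1] + u[t]

print(np.corrcoef(u[1:], Y[:-1])[0, 1])   # close to zero: u_t vs Y_{t-1}
print(np.corrcoef(u[1:], Y[1:])[0, 1])    # clearly positive (about 0.6): u_t vs Y_t
```

The second correlation is large because u_t enters Y_t with coefficient 1, so their covariance equals the variance of u.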
As a consequence ui is not independent of ai and so we cannot write E(aiui) = E(ai)E(ui). It
follows that the OLS slope coefficient will in general be biased.
We cannot obtain a closed-form analytical expression for the bias. However we can
investigate it with Monte Carlo simulation.
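A sketch of such a simulation (the model and parameter values follow the slides that come next; the number of replications, the starting value, and the seed are arbitrary choices here):

```python
import numpy as np

# Monte Carlo sketch of the finite-sample bias of OLS when the regressor
# is the lagged dependent variable: Y_t = 10 + 0.8*Y_{t-1} + u_t, u_t ~ N(0,1).
rng = np.random.default_rng(0)

def mean_b2(n, reps=5000, beta1=10.0, beta2=0.8):
    """Average OLS estimate of beta2 over `reps` simulated samples of size n."""
    Y = np.empty((n + 1, reps))
    Y[0] = beta1 / (1.0 - beta2)          # start at the equilibrium value
    for t in range(1, n + 1):
        Y[t] = beta1 + beta2 * Y[t - 1] + rng.standard_normal(reps)
    X, Yv = Y[:-1], Y[1:]                 # regress Y_t on Y_{t-1}
    Xc = X - X.mean(axis=0)
    b2 = (Xc * (Yv - Yv.mean(axis=0))).sum(axis=0) / (Xc ** 2).sum(axis=0)
    return b2.mean()

print(mean_b2(20))    # well below the true value 0.8
print(mean_b2(1000))  # very close to 0.8
```

The averages illustrate the pattern documented in the following slides: a substantial downward bias at n = 20 that almost vanishes at n = 1000.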
Yt = 10 + 0.8Yt–1 + ut

                     n = 20                          n = 1000
Sample    b1    s.e.(b1)    b2    s.e.(b2)    b1    s.e.(b1)    b2    s.e.(b2)
   1     24.3     10.2     0.52     0.20     11.0     1.0      0.78     0.02
   2     12.6      8.1     0.74     0.16     11.8     1.0      0.76     0.02
   3     26.5     11.5     0.49     0.22     10.8     1.0      0.78     0.02
   4     28.8      9.3     0.43     0.18      9.4     0.9      0.81     0.02
   5     10.5      5.4     0.78     0.11     12.2     1.0      0.76     0.02
   6      9.5      7.0     0.81     0.14     10.5     1.0      0.79     0.02
   7      4.9      7.4     0.91     0.15     10.6     1.0      0.79     0.02
   8     26.9     10.5     0.47     0.20     10.3     1.0      0.79     0.02
   9     25.1     10.6     0.49     0.22     10.0     0.9      0.80     0.02
  10     20.9      8.8     0.58     0.18      9.6     0.9      0.81     0.02
We will start with the very simple model shown above. Y is determined only by its lagged value, with intercept 10 and slope coefficient 0.8.
The disturbance term u will be generated using random numbers drawn from a normal
distribution with mean 0 and variance 1. The sample size is 20.
Here are the estimates of the coefficients and their standard errors for 10 samples. We will
start by looking at the distribution of the estimate of the slope coefficient. 8 of the
estimates are below the true value and only 2 above.
This suggests that the estimator is downwards biased. However it is not conclusive proof
because an 8–2 split will occur quite frequently even if the estimator is unbiased. (If you are
good at binomial distributions, you will be able to show that it will occur 11% of the time.)
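That figure can be checked directly (a quick calculation, not part of the slides): under unbiasedness, each estimate is equally likely to fall above or below the true value, so the probability of a split at least as lopsided as 8–2 is

```python
from math import comb

# P(at least 8 of 10 estimates fall on the same side of the true value)
# when each side has probability 1/2.
p = sum(comb(10, k) for k in (0, 1, 2, 8, 9, 10)) / 2 ** 10
print(round(p, 3))  # 0.109, i.e. about 11%
```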
However the suspicion of a bias is reinforced by the fact that many of the estimates below
the true value are much further from it than those above. The mean value of the estimates
is 0.62.
To determine whether the estimator is biased or not, we need a greater number of samples.
Yt = 10 + 0.8Yt–1 + ut

[Figure: distribution of the OLS estimates of the slope coefficient; mean = 0.6233 (n = 20); true value 0.8]
The chart shows the distribution with 1 million samples. This settles the issue. The
estimator is biased downwards.
There is a further puzzle. If the disturbance terms are drawn randomly from a normal
distribution, as was the case in this simulation, and the regression model assumptions are
valid, the regression coefficients should also have normal distributions.
However the distribution is not normal. It is negatively skewed.
Nevertheless the estimator may be consistent, provided that certain conditions are
satisfied.
[Figure: distributions of the slope coefficient estimates; mean = 0.7650 (n = 100) and mean = 0.6233 (n = 20); true value 0.8]
When we increase the sample size from 20 to 100, the bias is much smaller. (X has been
assigned the values 1, …, 100. The distribution here and for all the following diagrams is for
1 million samples.)
[Figure: distributions of the slope coefficient estimates; mean = 0.7966 (n = 1000), 0.7650 (n = 100), and 0.6233 (n = 20); true value 0.8]
If we increase the sample size to 1,000, the bias almost vanishes. (X has been assigned the
values 1, …, 1,000.)
Yt = 10 + 0.5Xt + 0.8Yt–1 + ut

[Figure: distributions of the coefficient estimates, n = 20; mean = 0.4979 for the coefficient of Yt–1 (true value 0.8) and 1.2553 for the coefficient of Xt (true value 0.5)]
Here is a slightly more realistic model with an explanatory variable Xt as well as the lagged
dependent variable.
The estimate of the coefficient of Yt–1 is again biased downwards, more severely than before
(black curve). The coefficient of Xt is biased upwards (red curve).
[Figure: distributions of the coefficient estimates; for n = 100, mean = 0.7441 for the coefficient of Yt–1 and 0.6398 for the coefficient of Xt; the n = 20 distributions are shown for comparison]
If we increase the sample size to 100, the coefficients are much less biased.
[Figure: distributions of the coefficient estimates, n = 1000; mean = 0.7947 for the coefficient of Yt–1 and 0.5132 for the coefficient of Xt]
If we increase the sample size to 1,000, the bias almost disappears, as in the previous
example.
Yt = β1 + β2Yt–1 + ut

b2^OLS = Σ(Yt–1 − Ȳ–1)(Yt − Ȳ) / Σ(Yt–1 − Ȳ–1)²

       = β2 + Σ(Yt–1 − Ȳ–1)(ut − ū) / Σ(Yt–1 − Ȳ–1)²

where Ȳ–1 is the sample mean of the lagged values Yt–1 and Ȳ is the sample mean of Yt.
In both of these examples the OLS estimators were consistent, despite being biased for
finite samples. We will explain this for the first example. The slope coefficient can be
decomposed as shown in the usual way.
plim b2^OLS = β2 + plim [ Σ(Yt–1 − Ȳ–1)(ut − ū) / Σ(Yt–1 − Ȳ–1)² ]

            = β2 + plim [ (1/n) Σ(Yt–1 − Ȳ–1)(ut − ū) ] / plim [ (1/n) Σ(Yt–1 − Ȳ–1)² ]

            = β2 + cov(Yt–1, ut) / var(Yt–1)
We will show that the plim of the error term is 0. As it stands, neither the numerator nor the denominator possesses a limit, so we cannot invoke the plim quotient rule.
We divide the numerator and the denominator by n.
Now we can invoke the plim quotient rule, because it can be shown that the plim of the
numerator is the covariance of Yt–1 and ut and the plim of the denominator is the variance of
Yt–1.
plim b2^OLS = β2 + cov(Yt–1, ut) / var(Yt–1) = β2 + 0 / var(Yt–1) = β2
If Part (1) of Assumption C.7 is valid, the covariance between ut and Yt–1 is zero. In this model it is reasonable to suppose that Part (1) is valid because Yt–1 is determined before ut is generated.
Thus the plim of the OLS slope coefficient is equal to the true value, and the estimator is consistent.
You will often see models with lagged dependent variables in the applied literature. Usually
the problem discussed in this slideshow is ignored. This is acceptable if the sample size is
large enough, but if the sample is small, there is a risk of serious bias.
Yt = β1 + β2Xt–1 + ut
Xt = α1 + α2Yt–1 + vt
It is obvious that Part (2) of Assumption C.7 is invalid in models with lagged dependent
variables. However it is often invalid in more general models. Consider the two-equation
model shown above.
It may be reasonable to suppose that ut is distributed independently of Xt–1 because it is
generated randomly at time t, by which time Xt–1 has already been determined. Then Part (1)
of Assumption C.7 is valid for the first equation. The same goes for the second equation.
Xt+1 ← Yt ← ut

Yt+2 = β1 + β2Xt+1 + ut+2
Xt+1 = α1 + α2Yt + vt+1
However ut is a determinant of Yt, and hence of Xt+1. This means that ut is correlated with
the X regressor in the first equation in the observations for Yt+2, Yt+4, ... etc. Again Part (2) of
Assumption C.7 is violated and the OLS estimators will be biased.
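A small simulation illustrates the chain (a sketch: the coefficient values, seed, and series length are illustrative assumptions, chosen only to keep the system stable):

```python
import numpy as np

# Two-equation system: Y_t = b1 + b2*X_{t-1} + u_t,  X_t = a1 + a2*Y_{t-1} + v_t.
# u_t feeds into Y_t, and Y_t feeds into X_{t+1}, so u_t is correlated with the
# X regressor in later observations even though it is independent of X_{t-1}.
rng = np.random.default_rng(3)
T = 100_000
b1, b2, a1, a2 = 5.0, 0.4, 3.0, 0.5      # illustrative values
u = rng.standard_normal(T)
v = rng.standard_normal(T)
Y = np.zeros(T)
X = np.zeros(T)
for t in range(1, T):
    Y[t] = b1 + b2 * X[t - 1] + u[t]
    X[t] = a1 + a2 * Y[t - 1] + v[t]

print(np.corrcoef(u[1:], X[:-1])[0, 1])   # close to zero: u_t vs X_{t-1}
print(np.corrcoef(u[1:-1], X[2:])[0, 1])  # clearly nonzero: u_t vs X_{t+1}
```

The second correlation is nonzero because X_{t+1} = a1 + a2*Y_t + v_{t+1} contains a2 times the shock u_t.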
Since interactions and lags are common in economic models using time series data, the
problem of biased coefficients should be taken as the working hypothesis, the rule rather
than the exception.
Fortunately Part (2) of Assumption C.7 is not required for consistency. Part (1) is a
necessary condition. If it is violated, the regression coefficients will be inconsistent.
However, Part (1) is not a sufficient condition for consistency because it is possible that the
regression estimators may not tend to finite limits as the sample size becomes large. This
is a relatively technical issue that will be discussed in Chapter 13.
Copyright Christopher Dougherty 2011.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 11.5 of C. Dougherty,
Introduction to Econometrics, fourth edition 2011, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own and who feel that they might
benefit from participation in a formal course should consider the London School
of Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
20 Elements of Econometrics
www.londoninternational.ac.uk/lse.
11.07.25