No Slide Title

Research Method
Lecture 8 (Ch14)
Advanced Panel Data Method
Fixed effects estimation
Fixed effects estimation is another method
to eliminate the time-invariant
unobserved effect.
Consider the following model.
$y_{it} = \beta_0 + \beta_1 x_{it1} + \beta_2 x_{it2} + \dots + \beta_k x_{itk} + a_i + u_{it}$ ……. (1)
Correlation between the fixed effect $a_i$ and
the explanatory variables will cause biases in
the estimated coefficients.
Thus, we need to eliminate $a_i$ from the
estimation. First differencing is one
method. Another method is the following.
First, compute the sample average of each
variable for each individual. (That is, for the
ith individual, you compute the time series
sample average of each variable.) Then,
you have the following.
$\bar{y}_i = \beta_0 + \beta_1 \bar{x}_{i1} + \beta_2 \bar{x}_{i2} + \dots + \beta_k \bar{x}_{ik} + a_i + \bar{u}_i$ .........(2)
Since $a_i$ is constant over time, the $a_i$ term in
equation (2) does not have the over-bar.
Now, subtract (2) from (1). Then, you get the
following equation.
$(y_{it} - \bar{y}_i) = \beta_1 (x_{it1} - \bar{x}_{i1}) + \beta_2 (x_{it2} - \bar{x}_{i2}) + \dots + \beta_k (x_{itk} - \bar{x}_{ik}) + (u_{it} - \bar{u}_i)$
Notice that this transformation eliminates the fixed
effect $a_i$. This transformation is called the within
transformation. Note also that this transformation
eliminates the constant as well.
Now, we simplify the notation by writing the above
equation as:
$\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \beta_2 \ddot{x}_{it2} + \dots + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}$ ……(3)
where $\ddot{y}_{it} = y_{it} - \bar{y}_i$. This is called the time-demeaned
data on y. The same notation is used for the x-variables and u.
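To make the within transformation concrete, here is a small worked illustration. The numbers are made up for one individual i observed over T = 3 periods; they are not taken from any data set used in this lecture.

```latex
\begin{align*}
% hypothetical observations on y for individual i
y_{i1} &= 2, \quad y_{i2} = 4, \quad y_{i3} = 9
  \quad\Longrightarrow\quad \bar{y}_i = \tfrac{1}{3}(2 + 4 + 9) = 5 \\
% time-demeaned values: each observation minus the individual's own mean
\ddot{y}_{i1} &= 2 - 5 = -3, \quad
\ddot{y}_{i2} = 4 - 5 = -1, \quad
\ddot{y}_{i3} = 9 - 5 = 4 \\
% the same operation applied to the fixed effect wipes it out
a_i - \bar{a}_i &= a_i - a_i = 0
\end{align*}
```

The demeaned values sum to zero within the individual, and anything that is constant over time for that individual (such as $a_i$ or the intercept) becomes zero after the transformation.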
Finally, estimate the demeaned equation
(3) using OLS. This is called the fixed
effect estimation.
To repeat, you simply run OLS on the
following equation, and it is called the
fixed effect estimation.
$\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \beta_2 \ddot{x}_{it2} + \dots + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}$
Here the double dot denotes time-demeaned data.
Note that you do not have the intercept in
this model.
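The standard error for the
fixed effect estimator
Now, define the fixed effects residual from the time-demeaned equation (3) as
$\hat{u}_{it} = \ddot{y}_{it} - (\hat{\beta}_1 \ddot{x}_{it1} + \dots + \hat{\beta}_k \ddot{x}_{itk})$
Then, the unbiased estimator of the
error variance $\sigma_u^2$ is given by
$\hat{\sigma}_u^2 = \frac{1}{NT - N - k} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2 = \frac{SSR}{NT - N - k}$
where
NT = total # of observations (T is the # of periods, and N is the # of cross sectional units)
N = # of cross sectional units (# of individuals, firms etc)
k = # of parameters excluding the intercept
NT − N − k = degrees of freedom
SSR = the double sum of squared fixed effects residuals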
After computing the estimated error variance $\hat{\sigma}_u^2$,
you can compute the standard errors for the
parameters by applying the formula given in
Handout 2.
Notice that, if you manually create the time-demeaned variables and apply OLS, the usual
statistical software will compute the degrees of
freedom as NT−k (or NT−k−1 if it also fits an intercept).
This ignores the N individual means that were estimated,
so it will understate the standard errors.
In this case, you have to correct the
standard errors by multiplying each standard error
by $\sqrt{(NT - k)/(NT - k - N)}$ (with NT−k−1 in the numerator if the software fitted an intercept).
Fortunately, Stata has a command that estimates
the fixed effect model automatically with correct
standard errors.
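The correction factor follows directly from comparing the two variance estimates. A short sketch, using the symbols defined above and writing "naive" for the quantities reported by OLS on the manually demeaned data (the label is ours, not the lecture's); SSR is the same in both cases because the residuals are identical:

```latex
\begin{align*}
\hat{\sigma}^2_{\text{naive}} &= \frac{SSR}{NT - k},
\qquad
\hat{\sigma}^2_{u} = \frac{SSR}{NT - N - k} \\[4pt]
% each standard error is proportional to the estimated error s.d.,
% so the ratio of standard errors equals the ratio of the two s.d. estimates
\frac{se(\hat{\beta}_j)}{se_{\text{naive}}(\hat{\beta}_j)}
  &= \sqrt{\frac{\hat{\sigma}^2_{u}}{\hat{\sigma}^2_{\text{naive}}}}
   = \sqrt{\frac{NT - k}{NT - N - k}}
\end{align*}
```

Since N > 0, the factor is always greater than one, which is why the uncorrected standard errors are too small.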
Estimating $a_i$
Sometimes (though not often), $a_i$ itself is
of interest. It can be easily estimated as:
$\hat{a}_i = \bar{y}_i - \hat{\beta}_1 \bar{x}_{i1} - \dots - \hat{\beta}_k \bar{x}_{ik}$
When you estimate a fixed effect model
using Stata, Stata reports an
'intercept'. Remember that the time-demeaned
equation does not have an intercept. What Stata
is reporting is the average value of $\hat{a}_i$ across individuals.
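As a rough illustration, the estimated fixed effects can be recovered in Stata after the built-in fixed effect command. This is a sketch only: it assumes JTRAIN.dta (the data set used in the example that follows) is in memory and already declared as a panel, and the variable names a_dev and a_hat are hypothetical names chosen here.

```stata
* Fit the fixed effect model (same variables as the lecture's Ex4)
xtreg lscrap grant d88 d89, fe

* After xtreg, fe the option u of predict gives the estimated unit effect,
* measured as a deviation from the reported _cons
predict a_dev, u

* Add the reported intercept back to obtain a_i-hat itself
gen a_hat = _b[_cons] + a_dev

* a_dev should average to about zero over the estimation sample,
* so a_hat should average to the reported _cons
summarize a_dev a_hat if e(sample)
```

The last command is a quick check of the statement above that the reported intercept is the average of the estimated fixed effects.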
Example
JTRAIN.dta is a three-year panel data set. In
the first differenced model, we used only
the first two years. Now use all three
years and estimate the following model.
$\log(\text{scrap})_{it} = \beta_0 + \beta_1 \text{grant}_{it} + \beta_2 \log(\text{sales})_{it} + \beta_3 \log(\#\text{employees})_{it} + \beta_4 \text{year88}_{it} + \beta_5 \text{year89}_{it} + a_i + u_{it}$
Ex1. Estimate the model using OLS, ignoring the
presence of the fixed effect.
Ex2. Estimate the model using the fixed effect
model.
Ex1. OLS result
. use "D:\My Documents\IUJ_teaching\Research Methodology\Wooldridge Econometrics resources\data\JTRAIN.DTA", clear
. tsset fcode year
panel variable: fcode (strongly balanced)
time variable: year, 1987 to 1989
delta: 1 unit
. reg lscrap grant lemploy lsale d88 d89
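      Source |       SS       df       MS              Number of obs =     148
-------------+------------------------------           F(  5,   142) =    3.28
       Model |  31.4579815     5  6.29159631           Prob > F      =  0.0079
    Residual |  272.667651   142  1.92019473           R-squared     =  0.1034
-------------+------------------------------           Adj R-squared =  0.0719
       Total |  304.125633   147  2.06888185           Root MSE      =  1.3857

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   .1460224   .3185326     0.46   0.647    -.4836564    .7757012
     lemploy |   .7193017   .2004453     3.59   0.000     .3230593    1.115544
      lsales |  -.5983353   .2072189    -2.89   0.004    -1.007968   -.1887027
         d88 |  -.1792051   .3029787    -0.59   0.555    -.7781368    .4197266
         d89 |  -.3971186   .2897043    -1.37   0.173    -.9698094    .1755721
       _cons |   7.007854   2.569933     2.73   0.007     1.927583    12.08813
------------------------------------------------------------------------------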
Fixed effect model
. xtreg lscrap grant lemploy lsale d88 d89, fe
Fixed-effects (within) regression               Number of obs      =       148
Group variable: fcode                           Number of groups   =        51

R-sq:  within  = 0.1637                         Obs per group: min =         1
       between = 0.0111                                        avg =       2.9
       overall = 0.0059                                        max =         3

                                                F(5,92)            =      3.60
corr(u_i, Xb)  = -0.0613                        Prob > F           =    0.0051

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   -.088648   .1340168    -0.66   0.510    -.3548169    .1775209
     lemploy |  -.0149276   .3581686    -0.04   0.967    -.7262814    .6964261
      lsales |  -.0654104   .2660989    -0.25   0.806    -.5939057     .463085
         d88 |  -.0926408   .1165089    -0.80   0.429    -.3240376     .138756
         d89 |  -.3785155   .1168365    -3.24   0.002    -.6105629   -.1464681
       _cons |   1.570754   3.178357     0.49   0.622    -4.741738    7.883246
-------------+----------------------------------------------------------------
     sigma_u |  1.3991565
     sigma_e |  .50390488
         rho |  .88518501   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50, 92) =    19.64              Prob > F = 0.0000
Ex3. The fixed effect model above did not
show a statistically significant effect of the
grant. This is probably because it takes some
time for the effect of grants to appear. In
order to capture this possibility, include
the one-year lag of grant. That is, estimate the
following model.
$\log(\text{scrap})_{it} = \beta_0 + \beta_1 \text{grant}_{it} + \beta_2 \text{grant}_{i,t-1} + \beta_3 \log(\text{sales})_{it} + \beta_4 \log(\#\text{employees})_{it} + \beta_5 \text{year88}_{it} + \beta_6 \text{year89}_{it} + a_i + u_{it}$
Here $\text{grant}_{i,t-1}$ is the one-year lag of the grant.
This is called the distributed lag model. The lag of the grant
captures the effect of receiving a grant last year on this
year's scrap rate.
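The regression below uses the variable grant_1 that already exists in JTRAIN.dta. For reference, a lag like this could be built by hand roughly as follows. This is a sketch under assumptions: JTRAIN.dta is in memory, grant_lag is a hypothetical name chosen here, and the treatment of the first sample year is an assumption flagged in the comments.

```stata
* Declare the panel structure (firm identifier, then time variable)
tsset fcode year

* L. is Stata's time-series lag operator: L.grant is grant in year t-1
gen grant_lag = L.grant

* The first sample year (1987) has no earlier observation, so L.grant
* is missing there. ASSUMPTION: no firm held a grant before the sample
* starts, so the lag is set to 0 in 1987 instead of losing those rows.
replace grant_lag = 0 if year == 1987

* Report whether the hand-made lag agrees with the data set's own variable
compare grant_lag grant_1
```

Setting the 1987 lag to zero keeps the first year in the estimation sample; leaving it missing would drop all 1987 observations from the regression.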
Fixed effect model with one year lag of the grant
. xtreg lscrap grant grant_1 lemploy lsale d88 d89, fe

Fixed-effects (within) regression               Number of obs      =       148
Group variable: fcode                           Number of groups   =        51

R-sq:  within  = 0.2131                         Obs per group: min =         1
       between = 0.0341                                        avg =       2.9
       overall = 0.0004                                        max =         3

                                                F(6,91)            =      4.11
corr(u_i, Xb)  = -0.2258                        Prob > F           =    0.0011

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |  -.2967542   .1570861    -1.89   0.062    -.6087863     .015278
     grant_1 |  -.5355783    .224206    -2.39   0.019     -.980936   -.0902207
     lemploy |  -.0763679   .3502902    -0.22   0.828    -.7721764    .6194405
      lsales |  -.0868577   .2596985    -0.33   0.739    -.6027167    .4290014
         d88 |  -.0039609   .1195487    -0.03   0.974    -.2414296    .2335079
         d89 |   -.132193   .1536863    -0.86   0.392    -.4374719     .173086
       _cons |   2.115481    3.10843     0.68   0.498    -4.059034    8.289996
-------------+----------------------------------------------------------------
     sigma_u |  1.4415155
     sigma_e |  .49149057
         rho |  .89585692   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50, 91) =    20.75              Prob > F = 0.0000
The lagged grant has a larger (and statistically more significant) effect than the current grant.
This indicates that it takes time for the effect to appear.
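Because the dependent variable is log(scrap), the coefficients can be translated into approximate percentage changes in the scrap rate using $\exp(\hat{\beta}) - 1$. A back-of-the-envelope calculation from the point estimates reported in the output (rounded, and ignoring sampling uncertainty):

```latex
\begin{align*}
% coefficient on grant_1 (grant received last year)
\exp(-0.5356) - 1 &\approx 0.585 - 1 = -0.415 \\
% coefficient on grant (grant received this year)
\exp(-0.2968) - 1 &\approx 0.743 - 1 = -0.257
\end{align*}
```

Read this way, a grant received last year is associated with a scrap rate roughly 41% lower, compared with roughly 26% for a grant received in the current year, holding the other variables fixed.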
Ex4. Finally, estimate the following fixed
effect model by manually creating the
time-demeaned variables. This is a good
exercise for you to understand the exact
procedure of the fixed effect estimation.
$\log(\text{scrap})_{it} = \beta_0 + \beta_1 \text{grant}_{it} + \beta_2 \text{year88}_{it} + \beta_3 \text{year89}_{it} + a_i + u_{it}$
Fixed effect estimated automatically:

. xtreg lscrap grant d88 d89, fe

Fixed-effects (within) regression               Number of obs      =       162
Group variable: fcode                           Number of groups   =        54

R-sq:  within  = 0.1701                         Obs per group: min =         3
       between = 0.0189                                        avg =       3.0
       overall = 0.0130                                        max =         3

                                                F(3,105)           =      7.18
corr(u_i, Xb)  = -0.0109                        Prob > F           =    0.0002

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |  -.0822141   .1262632    -0.65   0.516    -.3325706    .1681424
         d88 |   -.140066    .106835    -1.31   0.193       -.3519    .0717681
         d89 |    -.42704   .0999338    -4.27   0.000    -.6251903   -.2288897
       _cons |   .5974341   .0687024     8.70   0.000     .4612098    .7336583
-------------+----------------------------------------------------------------
     sigma_u |  1.4283441
     sigma_e |  .50485774
         rho |  .88894293   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(53, 105) =    23.90             Prob > F = 0.0000

Fixed effect estimated by manually creating time-demeaned variables.
Note the standard errors are wrong, so you have to correct them:

. reg dmlscrap dmgrant dmd88 dmd89

      Source |       SS       df       MS              Number of obs =     162
-------------+------------------------------           F(  3,   158) =   10.80
       Model |  5.48707982     3  1.82902661           Prob > F      =  0.0000
    Residual |  26.7625405   158  .169383168           R-squared     =  0.1701
-------------+------------------------------           Adj R-squared =  0.1544
       Total |  32.2496203   161  .200308201           Root MSE      =  .41156

------------------------------------------------------------------------------
    dmlscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dmgrant |  -.0822141   .1029302    -0.80   0.426    -.2855107    .1210825
       dmd88 |   -.140066   .0870923    -1.61   0.110    -.3120812    .0319493
       dmd89 |    -.42704   .0814664    -5.24   0.000    -.5879437   -.2661363
       _cons |  -8.26e-09   .0323354    -0.00   1.000    -.0638653    .0638653
------------------------------------------------------------------------------
The do file
*****************************
* Manually estimating the   *
* fixed effect model        *
*****************************
* Time-demean each variable: subtract the firm-specific mean
sort fcode
by fcode: egen meanlscrap=mean(lscrap)
gen dmlscrap=lscrap-meanlscrap
by fcode: egen meangrant=mean(grant)
gen dmgrant=grant-meangrant
by fcode: egen meand88=mean(d88)
gen dmd88=d88-meand88
by fcode: egen meand89=mean(d89)
gen dmd89=d89-meand89
**********************
* Estimate the model *
**********************
* OLS on the time-demeaned data (standard errors need correcting)
reg dmlscrap dmgrant dmd88 dmd89
* Built-in fixed effect estimator, for comparison
xtreg lscrap grant d88 d89, fe
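The standard errors from the manual regression can be corrected with a few extra lines. This is a sketch, assuming the dm* variables from the do file exist; the scalar name fe_adj is hypothetical, and the numbers 54 (firms) and 3 (slope parameters) are taken from the xtreg output for this example. Because reg also fits an intercept here, its residual degrees of freedom are NT−k−1 = 158, so the sketch uses e(df_r) in the numerator instead of NT−k.

```stata
* Re-run OLS on the time-demeaned data
reg dmlscrap dmgrant dmd88 dmd89

* Ratio of the degrees of freedom reg actually used (e(df_r))
* to the fixed effect degrees of freedom NT - N - k = 162 - 54 - 3
scalar fe_adj = sqrt(e(df_r) / (e(N) - 54 - 3))

* Corrected standard errors for the three slope coefficients
display "grant: " _se[dmgrant]*fe_adj
display "d88:   " _se[dmd88]*fe_adj
display "d89:   " _se[dmd89]*fe_adj
```

With the figures in the output above, the adjustment is sqrt(158/105), about 1.227, and the corrected values should agree with the standard errors in the xtreg, fe table (about .126, .107 and .100).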
Note, when you estimate the fixed effect model,
it is a good idea to tell your audience what the
potential fixed effect would be and whether it is
correlated with the explanatory variables.
Of course, one can never tell exactly what the
fixed effect is, since it is the aggregate of
all the unobserved effects. However, if you tell
what is contained in the fixed effect, your
audience can understand the potential direction
of the bias, and why you need to use the fixed
effect model.
The dummy variable regression
Consider again the following model.
$\log(\text{scrap})_{it} = \beta_0 + \beta_1 \text{grant}_{it} + \beta_2 \text{year88}_{it} + \beta_3 \text{year89}_{it} + a_i + u_{it}$
We learned that the fixed effect model can
correct for the biases arising from the
correlation between $a_i$ and the
explanatory variables.
Now, consider instead that you include
a dummy variable for every firm in the
model, and estimate the model using the
usual OLS.
It is known that the slope coefficients and
their standard errors obtained from this
procedure are exactly the same as those
obtained from the fixed effect estimation.
When a dummy is included for every firm and the
overall intercept is omitted, the coefficients on the
dummy variables are the same as the fixed effect
estimates of $a_i$.
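The equivalence can be checked directly in Stata. The following is a sketch, assuming JTRAIN.dta is in memory. The i.fcode factor-variable notation needs Stata 11 or later (in older versions, xi: reg lscrap grant d88 d89 i.fcode plays the same role).

```stata
* (a) Fixed effect (within) estimator
xtreg lscrap grant d88 d89, fe

* (b) OLS with firm dummies. i.fcode creates one dummy per firm and
*     omits a base firm because the regression keeps an intercept,
*     so the dummy coefficients are differences from the base firm
reg lscrap grant d88 d89 i.fcode

* (c) areg fits the same dummy variable regression but absorbs the
*     firm dummies instead of reporting them
areg lscrap grant d88 d89, absorb(fcode)
```

The coefficients and standard errors on grant, d88 and d89 should be the same in all three outputs. Version (c) is convenient when N is large and a long list of dummy coefficients is not wanted.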
However, note that the coefficients for the
dummy variables are not consistent when
the number of periods (T) is fixed and the
number of firms (N) gets large. This is
because, when N gets large, the number of
$a_i$ to be estimated increases as well. So no
information accumulates on each $a_i$.
The Random Effect Estimation
Consider the following unobserved effect model.
$y_{it} = \beta_0 + \beta_1 x_{it1} + \beta_2 x_{it2} + \dots + \beta_k x_{itk} + (a_i + u_{it})$ ……(1)
where $v_{it} = a_i + u_{it}$ is the composite error term.
Previously, we applied the fixed effect estimation
because we suspected that $a_i$ is correlated with some
of the explanatory variables.
But if we can assume that $a_i$ is not correlated
with any of the explanatory variables, we can
estimate the model more efficiently (i.e., get
smaller standard errors).
When $a_i$ is not correlated with any of the
explanatory variables, pooled OLS will be
consistent.
But the problem now is serial
correlation. That is, for a given person i,
the composite error terms $v_{it}$ in different
periods are correlated, because they all contain the same $a_i$.
To be more precise, assume the following.
$\mathrm{Cov}(x_{itj}, a_i) = 0$ for t=1,2,…,T, and j=1,2,…,k
That is: $a_i$ is uncorrelated with all the explanatory
variables in all the periods.
In addition, we assume that $a_i$ and the
idiosyncratic errors in all the periods are
uncorrelated.
Then we can show the following for $t \neq s$.
$\mathrm{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2} \neq 0$
where $\sigma_a^2 = \mathrm{Var}(a_i)$ and $\sigma_u^2 = \mathrm{Var}(u_{it})$. Proof: See the front board.
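For readers without access to the board, the argument can be sketched in a few lines. In addition to the assumptions stated above, this sketch assumes that the idiosyncratic errors have constant variance and are serially uncorrelated, $\mathrm{Cov}(u_{it}, u_{is}) = 0$ for $t \neq s$ (the standard random effects assumption; it is not written out on the slide).

```latex
\begin{align*}
\mathrm{Cov}(v_{it}, v_{is})
  &= \mathrm{Cov}(a_i + u_{it},\; a_i + u_{is}) \\
  &= \mathrm{Var}(a_i) + \mathrm{Cov}(a_i, u_{is})
     + \mathrm{Cov}(u_{it}, a_i) + \mathrm{Cov}(u_{it}, u_{is}) \\
  &= \sigma_a^2 + 0 + 0 + 0 = \sigma_a^2, \qquad t \neq s \\[4pt]
\mathrm{Var}(v_{it})
  &= \mathrm{Var}(a_i) + \mathrm{Var}(u_{it}) + 2\,\mathrm{Cov}(a_i, u_{it})
   = \sigma_a^2 + \sigma_u^2 \\[4pt]
\mathrm{Corr}(v_{it}, v_{is})
  &= \frac{\mathrm{Cov}(v_{it}, v_{is})}
          {\sqrt{\mathrm{Var}(v_{it})\,\mathrm{Var}(v_{is})}}
   = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}
\end{align*}
```

The correlation is the share of the composite error variance that comes from $a_i$, and it is the same for every pair of periods.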
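Here is a way to eliminate the serial
correlation.
Consider the following.
$\lambda = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2}$
Then, the terms $v_{it} - \lambda \bar{v}_i$ are not serially
correlated. Thus, first consider the
following.
$\lambda \bar{y}_i = \lambda \beta_0 + \beta_1 (\lambda \bar{x}_{i1}) + \beta_2 (\lambda \bar{x}_{i2}) + \dots + \beta_k (\lambda \bar{x}_{ik}) + (\lambda \bar{v}_i)$ ...(2)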
Then, subtract (2) from (1) to get,
$(y_{it} - \lambda \bar{y}_i) = \beta_0 (1 - \lambda) + \beta_1 (x_{it1} - \lambda \bar{x}_{i1}) + \beta_2 (x_{it2} - \lambda \bar{x}_{i2}) + \dots + \beta_k (x_{itk} - \lambda \bar{x}_{ik}) + (v_{it} - \lambda \bar{v}_i)$ ……(3)
As can be seen, the composite error term is $v_{it} - \lambda \bar{v}_i$,
and we know that this error term has no serial
correlation. The transformed data are called the
quasi-demeaned data. Therefore, if we apply
OLS to (3), we get the correct standard errors.
One problem is that λ is an unknown parameter.
So this has to be estimated.
The procedure to estimate λ is the following.
1. Estimate (1) using pooled OLS and obtain the residuals $\hat{v}_{it}$. Then estimate $\sigma_a^2$, $\sigma_v^2$ and $\sigma_u^2$ as:
$\hat{\sigma}_a^2 = [NT(T-1)/2 - (k+1)]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} \hat{v}_{it} \hat{v}_{is}$
$\hat{\sigma}_v^2 = [NT - (k+1)]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{v}_{it}^2$
(This is just the estimate of the sigma-squared from the pooled OLS of (1).)
$\hat{\sigma}_u^2 = \hat{\sigma}_v^2 - \hat{\sigma}_a^2$
2. Then estimate λ as:
$\hat{\lambda} = 1 - \left[ \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + T \hat{\sigma}_a^2} \right]^{1/2}$
3. Finally, replace λ in equation (3) with $\hat{\lambda}$ and
estimate the equation using OLS. This is called the
Random Effect Estimation.
Example
Estimate a log wage equation using
WAGEPAN.dta. Include in the model
education, black, hispan, exper, exper
squared, married, union, and a full set of
year dummies.
First, estimate the model using OLS.
Next, estimate the model using the
random effect.
Finally, estimate the model using the fixed
effect model. Why does Stata drop
some of the variables?
OLS
. reg lwage educ black hisp exper expersq married union d81 d82 d83 d84 d85 d86 d87

      Source |       SS       df       MS              Number of obs =    4360
-------------+------------------------------           F( 14,  4345) =   72.46
       Model |  234.048277    14  16.7177341           Prob > F      =  0.0000
    Residual |  1002.48136  4345  .230720682           R-squared     =  0.1893
-------------+------------------------------           Adj R-squared =  0.1867
       Total |  1236.52964  4359  .283672779           Root MSE      =  .48033

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0913498   .0052374    17.44   0.000     .0810819    .1016177
       black |  -.1392342   .0235796    -5.90   0.000    -.1854622   -.0930062
        hisp |   .0160195   .0207971     0.77   0.441    -.0247535    .0567925
       exper |   .0672345   .0136948     4.91   0.000     .0403856    .0940834
     expersq |  -.0024117     .00082    -2.94   0.003    -.0040192   -.0008042
     married |   .1082529   .0156894     6.90   0.000     .0774937    .1390122
       union |   .1824613   .0171568    10.63   0.000     .1488253    .2160973
         d81 |     .05832   .0303536     1.92   0.055    -.0011886    .1178286
         d82 |   .0627744   .0332141     1.89   0.059    -.0023421    .1278909
         d83 |   .0620117   .0366601     1.69   0.091    -.0098608    .1338843
         d84 |   .0904672   .0400907     2.26   0.024      .011869    .1690654
         d85 |   .1092463   .0433525     2.52   0.012     .0242533    .1942393
         d86 |   .1419596    .046423     3.06   0.002     .0509469    .2329723
         d87 |   .1738334    .049433     3.52   0.000     .0769194    .2707474
       _cons |   .0920558   .0782701     1.18   0.240    -.0613935    .2455051
------------------------------------------------------------------------------
Random Effect
. xtreg lwage educ black hisp exper expersq married union d81 d82 d83 d84 d85 d86 d87,re

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1799                         Obs per group: min =         8
       between = 0.1860                                        avg =       8.0
       overall = 0.1830                                        max =         8

Random effects u_i ~ Gaussian                   Wald chi2(14)      =    957.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0918763   .0106597     8.62   0.000     .0709836    .1127689
       black |  -.1393767   .0477228    -2.92   0.003    -.2329117   -.0458417
        hisp |   .0217317   .0426063     0.51   0.610    -.0617751    .1052385
       exper |   .1057545   .0153668     6.88   0.000     .0756361    .1358729
     expersq |  -.0047239   .0006895    -6.85   0.000    -.0060753   -.0033726
     married |    .063986   .0167742     3.81   0.000     .0311091    .0968629
       union |   .1061344   .0178539     5.94   0.000     .0711415    .1411273
         d81 |    .040462   .0246946     1.64   0.101    -.0079385    .0888626
         d82 |   .0309212   .0323416     0.96   0.339    -.0324672    .0943096
         d83 |   .0202806    .041582     0.49   0.626    -.0612186    .1017798
         d84 |   .0431187   .0513163     0.84   0.401    -.0574595    .1436969
         d85 |   .0578155   .0612323     0.94   0.345    -.0621977    .1778286
         d86 |   .0919476   .0712293     1.29   0.197    -.0476592    .2315544
         d87 |   .1349289   .0813135     1.66   0.097    -.0244427    .2943005
       _cons |   .0235864   .1506683     0.16   0.876     -.271718    .3188907
-------------+----------------------------------------------------------------
     sigma_u |  .32460315
     sigma_e |  .35099001
         rho |  .46100216   (fraction of variance due to u_i)
------------------------------------------------------------------------------
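The random effect output also makes it possible to see how much quasi-demeaning was applied. Below is a sketch that computes $\hat{\lambda}$ from the variance components stored by xtreg, re, assuming WAGEPAN.dta is in memory and declared as a panel. Note the naming clash: Stata's sigma_u is the standard deviation of the unit effect ($\sigma_a$ in this lecture), and Stata's sigma_e is that of the idiosyncratic error ($\sigma_u$ in this lecture). Stata estimates these components with its own method, so they need not equal the values from the hand formulas in steps 1 to 3. The scalar names are hypothetical.

```stata
xtreg lwage educ black hisp exper expersq married union d81 d82 d83 d84 d85 d86 d87, re

* Variance components, translated into the lecture's notation
scalar s2_a = e(sigma_u)^2
scalar s2_u = e(sigma_e)^2

* Every person is observed for T = 8 years in this panel
scalar Tper = 8

* lambda-hat = 1 - sqrt( s2_u / (s2_u + T*s2_a) )
scalar lam = 1 - sqrt(s2_u / (s2_u + Tper*s2_a))
display "lambda-hat = " lam
```

With the reported sigma_u = .3246 and sigma_e = .3510, this gives a value of about 0.64. A value of λ near 0 would make the random effect estimates close to pooled OLS, and a value near 1 would make them close to the fixed effect estimates. Adding the option theta to xtreg, re asks Stata to report this quantity itself.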
Fixed effect
. xtreg lwage educ black hisp exper expersq married union d81 d82 d83 d84 d85 d86 d87,fe

Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1806                         Obs per group: min =         8
       between = 0.0005                                        avg =       8.0
       overall = 0.0635                                        max =         8

                                                F(10,3805)         =     83.85
corr(u_i, Xb)  = -0.1212                        Prob > F           =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (dropped)
       black |  (dropped)
        hisp |  (dropped)
       exper |   .1321464   .0098247    13.45   0.000     .1128842    .1514087
     expersq |  -.0051855   .0007044    -7.36   0.000    -.0065666   -.0038044
     married |   .0466804   .0183104     2.55   0.011     .0107811    .0825796
       union |   .0800019   .0193103     4.14   0.000     .0421423    .1178614
         d81 |   .0190448   .0203626     0.94   0.350    -.0208779    .0589674
         d82 |   -.011322   .0202275    -0.56   0.576    -.0509798    .0283359
         d83 |  -.0419955   .0203205    -2.07   0.039    -.0818357   -.0021553
         d84 |  -.0384709   .0203144    -1.89   0.058    -.0782991    .0013573
         d85 |  -.0432498   .0202458    -2.14   0.033    -.0829434   -.0035562
         d86 |  -.0273819   .0203863    -1.34   0.179    -.0673511    .0125872
         d87 |  (dropped)
       _cons |    1.02764   .0299499    34.31   0.000     .9689201    1.086359
-------------+----------------------------------------------------------------
     sigma_u |   .4009279
     sigma_e |  .35099001
         rho |  .56612236   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3805) =     7.96           Prob > F = 0.0000
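One way to investigate the dropped variables is to look at how much each regressor varies within a person. The sketch below assumes WAGEPAN.dta is in memory and that its person and time identifiers are nr and year (nr appears in the output above; the name year is an assumption). The likely explanation it checks is that educ, black and hisp do not change over time in this sample, so time-demeaning turns them into zeros, and that exper rises by exactly one each year for everyone, which would make it perfectly collinear with the year dummies after demeaning and force Stata to drop one of them.

```stata
* Declare the panel structure
tsset nr year

* xtsum splits each variable's variation into between and within parts;
* a within standard deviation of 0 means the variable never changes for
* any person, so it is wiped out by the within transformation
xtsum educ black hisp exper

* D. is the first-difference operator: if D.exper equals 1 for every
* observation, exper moves in lockstep with calendar time
summarize D.exper
```

If the within standard deviations of educ, black and hisp are zero and D.exper has mean 1 with no variation, both parts of the explanation are supported by the data.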
Fixed effect or random effect
Fixed effect estimation allows arbitrary
correlation between $a_i$ and the explanatory
variables. Random effect is valid only if $a_i$
is uncorrelated with all of the
explanatory variables.
When you conduct a policy analysis,
correlation should be considered as the
rule rather than the exception.
Thus fixed effect is almost always more
convincing than the random effect.
But if the policy variable is set
experimentally, then you might apply
random effect. For example, suppose that
you want to know the effect of class
size on students' achievement. If
students are randomly assigned to classes
of different sizes, then random effect can
be applied.
However, again, this kind of situation is
rare. So, the usual recommendation is to
use the fixed effect method.