DANE PANELOWE - Uniwersytet Warszawski


PANEL DATA
Development
Workshop
What are we going to do today?
1. Panels – introduction and data properties
2. How to measure distance
3. What comes first: trade or GDP?
4. What else affects trade?
5. Role of currency?
Why panel data?

What is the point of panel data?
– pooled data in econometrics
– panels in econometrics
– long or wide?
– fixed or random effects?
Gravity model

All that theory is fine, but transport costs matter and
market size matters => push and pull
– Isard (1954), logs by Tinbergen (1962) [what if there were no barriers? "missing trade"], Linnemann (1966) [standard macro approach]
– Anderson (1979) [first theoretical model – expenditure based]
– Helpman-Krugman (1985) [intra-industry trade]
– Bergstrand (1985) [general equilibrium, one country/one factor]
– Bergstrand (1989) [H-O model with Linder hypothesis]
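For reference, the log-linear form this literature typically ends up estimating can be sketched as below. It is a generic textbook form and not taken from the slides: the subscripts i, j, t and the use of one combined GDP and population term per country pair are assumptions chosen to mirror the variable names lntrade, lngdp, lnpop and lndist used in the regressions that follow.

```latex
\ln T_{ijt} = \beta_0 + \beta_1 \ln \mathit{GDP}_{ijt} + \beta_2 \ln \mathit{POP}_{ijt}
            + \beta_3 \ln \mathit{DIST}_{ij} + \varepsilon_{ijt}
```

If distance proxies transport costs, β3 is expected to be negative; market size enters through β1 and β2.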
Simplest model

Variables:
– Explained: bilateral trade
– Explanatory: GDP, populations, distance
reg lntrade lngdp lnpop lndist
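      Source |       SS       df       MS              Number of obs =    1074
-------------+------------------------------           F(  3,  1070) = 1742.78
       Model |  2462.80855     3  820.936185           Prob > F      =  0.0000
    Residual |  504.022113  1070  .471048704           R-squared     =  0.8301
-------------+------------------------------           Adj R-squared =  0.8296
       Total |  2966.83067  1073  2.76498664           Root MSE      =  .68633

------------------------------------------------------------------------------
     lntrade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lngdp |   1.050343   .0695453    15.10   0.000     .9138822    1.186803
      lndist |  -1.364936   .0408487   -33.41   0.000    -1.445088   -1.284783
       lnpop |   .3443573   .0719854     4.78   0.000     .2031089    .4856058
       _cons |    2.82369   .4257506     6.63   0.000     1.988289    3.659091
------------------------------------------------------------------------------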
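Because the model is in logs, the coefficients read as elasticities. A quick check of what the estimated distance coefficient implies, using only the number reported in the output above (the 10% change is an arbitrary illustration):

```stata
* Implied % change in trade when distance is 10% larger,
* holding GDP and population fixed (coefficient copied from the OLS output)
display 100*(1.10^(-1.364936) - 1)
```

This comes out at roughly -12%, a little smaller in absolute value than the -13.6% suggested by the linear approximation (coefficient times 10%).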
Panel data

Data collected for the same group of subjects over several years =>
there may be something consistently specific about some
particular subjects

E(y_it | x_it, c_i) = x_it β + c_i, with c_i unobserved and constant over time
c_i is called the unobserved (individual) effect or component, also known as
unobserved (individual) heterogeneity

Panel data


y_it = x_it β + c_i + u_it and E(u_it) = 0
If c_i is nonzero and correlated with x_it, running y_it = x_it β + u_it has an omitted
variable problem, so the estimate of β is biased
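The omitted variable problem can be made visible with a small simulation. The sketch below is not part of the workshop do-file: all variable names, the sample size and the 0.8 loading of x on c are assumptions chosen only so that the individual effect is correlated with the regressor.

```stata
* Simulated panel: 100 units, 10 periods, true beta = 1 (all names hypothetical)
clear
set seed 12345
set obs 100
generate id = _n
generate double c = rnormal()            // unobserved, time-constant effect
expand 10                                // 10 rows per unit
bysort id: generate t = _n
generate double x = 0.8*c + rnormal()    // regressor correlated with c
generate double y = 1*x + c + rnormal()

regress y x                              // ignores c: slope drifts above 1
scalar b_pooled = _b[x]

xtset id t
xtreg y x, fe                            // within estimator sweeps out c
scalar b_fe = _b[x]

display "pooled OLS: " b_pooled "   fixed effects: " b_fe
```

Under these assumptions the pooled slope should settle well above the true value of 1, while the fixed-effects slope stays close to it.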
Panel results – same as OLS?
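Random-effects GLS regression                   Number of obs      =      1074
Group variable: id                              Number of groups   =        91

R-sq:  within  = 0.5289                         Obs per group: min =         6
       between = 0.8342                                        avg =      11.8
       overall = 0.8158                                        max =        12

                                                Wald chi2(3)       =   1549.73
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     lntrade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lngdp |   1.776719   .0571014    31.12   0.000     1.664803    1.888636
      lndist |  -1.239047   .1286976    -9.63   0.000    -1.491289   -.9868039
       lnpop |  -.3058137   .1083628    -2.82   0.005    -.5182008   -.0934266
       _cons |  -.3401835   1.054482    -0.32   0.747     -2.40693    1.726562
-------------+----------------------------------------------------------------
     sigma_u |  .61522954
     sigma_e |  .29088637
         rho |  .81729472   (fraction of variance due to u_i)
------------------------------------------------------------------------------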
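The rho reported with the random-effects output is the share of the composite error variance attributed to the individual effect. Plugging in the reported sigma_u and sigma_e reproduces it (rounded to four digits here):

```latex
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
     = \frac{0.6152^2}{0.6152^2 + 0.2909^2}
     \approx \frac{0.3785}{0.3785 + 0.0846}
     \approx 0.817
```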
Panel results – same as OLS?
    Variable |     OLS          PANEL
-------------+---------------------------
       lngdp |  1.0503428     1.7767192
      lndist | -1.3649357    -1.2390467
       lnpop |  .34435733    -.30581371
       _cons |  2.8236899    -.34018353
Panel data – how to get results?


Same data, same question, but the sample now consists of groups observed over time
STATA needs to be told what the panel dimensions are
1. A pair of commands:
   iis grouping_var
   tis time_var
2. xtset grouping_var time_var
3. tsset grouping_var time_var
(they are all equivalent; iis and tis are the older syntax)

Once the data are declared as a panel, compare xtsum with sum
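A minimal sketch of declaring a panel and comparing sum with xtsum. The toy data and the time variable name year are assumptions (the slides only show id as the group variable); with the workshop data only the xtset line onwards would be needed.

```stata
* Toy panel with hypothetical names: 5 country pairs observed over 4 years
clear
set seed 1
set obs 5
generate id = _n
generate double lndist = rnormal(8, 1)   // constant within pair
expand 4
bysort id: generate year = 2000 + _n
generate double lntrade = 10 - lndist + rnormal()

xtset id year            // declare panel and time dimensions
xtdescribe               // participation pattern of each unit
summarize lntrade lndist // overall statistics only
xtsum lntrade lndist     // splits variation into between and within parts
```

For lndist the within standard deviation is (numerically) zero, which is why a fixed-effects estimator cannot identify its coefficient.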
Panel data – not as simple as that…
y_it = x_it β + ν_it and ν_it = c_i + u_it

Case 1 - E(x_it c_i) = 0
– Individual effects and exogenous variables uncorrelated
– OLS consistent – no need for panel

Case 2 - E(x_it c_i) ≠ 0 – RANDOM EFFECTS
– Error term not correlated with individual effects: E(u_it; c_i) = 0

Case 3 - E(x_it c_i) ≠ 0 – FIXED EFFECTS
– Error term correlated with individual effects: E(u_it; c_i) ≠ 0

Need to choose somehow…
Panel regression

Do not forget the context menu in STATA

To find out how to do panel regressions in STATA:
Statistics => Longitudinal/panel data
– Many options already covered: xtset, sum, des, tab (check'em out)
– Also: linear models

Simplest code
– xtreg lntrade lnpop lngdp lndist        RANDOM (the default, same as adding the re option)
– xtreg lntrade lnpop lngdp lndist, fe    FIXED
Testing for FIXED vs RANDOM

Need to know if E(u_it; c_i) = 0
– If E(u_it; c_i) ≠ 0, then assuming that E(u_it; c_i) = 0 leads to bias
– If E(u_it; c_i) = 0, then estimating a model that allows E(u_it; c_i) ≠ 0 is inefficient (too many parameters estimated)

In plain English:
– If RE is true, running FE leads to inefficient estimators
– If FE is true, running RE leads to biased estimators
An idea for a test: compare the coefficients and see if they are "the same". If yes – RE is better (efficient)
This is the HAUSMAN TEST. The Breusch-Pagan LM test (xttest0) is a different test: it checks whether Var(u) = 0, i.e. whether random effects are needed at all instead of pooled OLS


hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
       lngdp |    1.043304     1.776719       -.7334154        .0799723
       lnpop |     13.0698    -.3058137        13.37561        1.391313
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       78.60
                Prob>chi2 =      0.0000
                (V_b-V_B is not positive definite)
xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        lntrade[id,t] = Xb + u[id] + e[id,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                 lntrade |   2.764987       1.662825
                       e |   .0846149       .2908864
                       u |   .3785074       .6152295

        Test:   Var(u) = 0
                             chibar2(01) =  3397.25
                          Prob > chibar2 =   0.0000
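The command sequence behind output like the above can be sketched as follows. The data here are simulated and every variable name is hypothetical; with the workshop data, y and x would be replaced by lntrade and its regressors. The stored names fe and re match the arguments of hausman fe re.

```stata
* Simulated panel (hypothetical names); the effect c is correlated with x
clear
set seed 2024
set obs 100
generate id = _n
generate double c = rnormal()
expand 10
bysort id: generate t = _n
generate double x = 0.8*c + rnormal()
generate double y = x + c + rnormal()
xtset id t

xtreg y x, fe
estimates store fe          // consistent under Ho and Ha
xtreg y x, re
estimates store re          // efficient under Ho only
xttest0                     // Breusch-Pagan LM: Ho is Var(u) = 0 (pooled OLS suffices)
hausman fe re               // Ho: FE and RE coefficients do not differ systematically
```

xttest0 has to be run directly after the random-effects fit, which is why it sits between the two estimates store lines and the Hausman test. A small p-value from hausman speaks against RE; a small p-value from xttest0 speaks against pooled OLS.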
Let’s interpret FE panel estimator
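Fixed-effects (within) regression               Number of obs      =      1074
Group variable: id                              Number of groups   =        91

R-sq:  within  = 0.5693                         Obs per group: min =         6
       between = 0.5846                                        avg =      11.8
       overall = 0.5648                                        max =        12

                                                F(2,981)           =    648.39
corr(u_i, Xb)  = -0.9941                        Prob > F           =    0.0000

------------------------------------------------------------------------------
     lntrade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lngdp |   1.043304   .0982657    10.62   0.000     .8504686    1.236139
      lndist |          0  (omitted)
       lnpop |    13.0698   1.395526     9.37   0.000     10.33124    15.80836
       _cons |  -54.17363   4.665788   -11.61   0.000    -63.32971   -45.01756
-------------+----------------------------------------------------------------
     sigma_u |  9.9186897
     sigma_e |  .29088637
         rho |  .99914066   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(90, 981) =   124.35             Prob > F = 0.0000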
How do we know if it makes sense?



Different from the pooled estimator? Or does it only look that way?
What if we add country effects to a pooled estimation? Let's try
areg lntrade lnpop lngdp lndist, absorb(id)
Some we know from the literature and some from experience
– Linear or in logs? Maybe also non-linear terms and interactions, trade or export share, etc.
– Should we do fixed or random effects?
– Are we interested in differences across time or across countries? Between and within R² tell a different story, no?
What do our models say?
areg lntrade lngdp lndist lnpop, a(id)
note: lndist omitted because of collinearity

Linear regression, absorbing indicators         Number of obs      =      1074
                                                F(   2,    981)    =    648.39
                                                Prob > F           =    0.0000
                                                R-squared          =    0.9720
                                                Adj R-squared      =    0.9694
                                                Root MSE           =    0.2909

------------------------------------------------------------------------------
     lntrade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lngdp |   1.043304   .0982657    10.62   0.000     .8504686    1.236139
      lndist |          0  (omitted)
       lnpop |    13.0698   1.395526     9.37   0.000     10.33124    15.80836
       _cons |  -54.17363   4.665788   -11.61   0.000    -63.32971   -45.01756
-------------+----------------------------------------------------------------
          id |   F(90, 981) =    124.348   0.000          (91 categories)
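That xtreg, fe and areg give the same slopes, and that a regressor without within-group variation is dropped by both, can be checked on simulated data. Everything in the sketch below (names, sample size, coefficients) is hypothetical; z plays the role of lndist.

```stata
* Simulated panel (hypothetical names) with a time-invariant regressor z
clear
set seed 77
set obs 50
generate id = _n
generate double c = rnormal()
generate double z = rnormal()            // constant within id, like lndist
expand 8
bysort id: generate t = _n
generate double x = 0.5*c + rnormal()
generate double y = x - z + c + rnormal()
xtset id t

xtreg y x z, fe                          // z is dropped: no within variation
scalar b_xtreg = _b[x]
areg y x z, absorb(id)                   // OLS with one absorbed dummy per id
scalar b_areg = _b[x]
display "xtreg, fe: " b_xtreg "   areg: " b_areg
```

The two slopes on x should agree up to numerical precision; the R-squared differs because areg counts the variation explained by the absorbed dummies.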
OLS, RE, FE, AREG - comparison
    Variable |     OLS           re            fe           AREG
-------------+--------------------------------------------------------
       lngdp |  1.0503428     1.7767192     1.0433037     1.0433037
      lndist | -1.3649357    -1.2390467     (omitted)     (omitted)
       lnpop |  .34435733    -.30581371     13.069798     13.069798
       _cons |  2.8236899    -.34018353    -54.173633    -54.173633
What if there are FEs?

One idea to estimate a model with FE without estimating the FE themselves
is to first-difference the model:
– y_it = x_it β + c_i + u_it => Δy_it = Δx_it β + Δu_it
– We lose all time-invariant effects (like in FE)
– Interpretation of the coefficients is different
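A first-difference regression can be sketched with Stata's D. operator once the panel is declared. The data and names below are simulated and hypothetical; z is a time-invariant regressor included to show that it differences out.

```stata
* Simulated panel (hypothetical names); z is time-invariant
clear
set seed 99
set obs 100
generate id = _n
generate double c = rnormal()
generate double z = rnormal()
expand 10
bysort id: generate t = _n
generate double x = 0.8*c + rnormal()
generate double y = x + z + c + rnormal()
xtset id t

regress D.y D.x D.z        // D. takes first differences within id; D.z is all zeros
scalar b_fd = _b[D.x]
display "first-difference slope on x: " b_fd
```

Differencing costs one observation per unit, which is consistent with the 983 observations (1074 minus 91 groups) in the FD output that follows.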
FD estimation…
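      Source |       SS       df       MS              Number of obs =     983
-------------+------------------------------           F(  2,   980) =    1.90
       Model |   .14862047     2  .074310235           Prob > F      =  0.1498
    Residual |  38.2886732   980  .039070075           R-squared     =  0.0039
-------------+------------------------------           Adj R-squared =  0.0018
       Total |  38.4372936   982  .039141847           Root MSE      =  .19766

------------------------------------------------------------------------------
   D.lntrade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lngdp |
         D1. |  -.0821923   .1301412    -0.63   0.528    -.3375798    .1731952
             |
      lndist |
         D1. |          0  (omitted)
             |
       lnpop |
         D1. |  -2.734188   1.492163    -1.83   0.067    -5.662391    .1940149
             |
       _cons |   .1004433    .009844    10.20   0.000     .0811255    .1197611
------------------------------------------------------------------------------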
And if we wanted to keep TI (time-invariant) effects?
y_it = z_i γ + x_it β + c_i + u_it

– FE and FD eliminate γ
– But if E(z_i' c_i) = 0 we can estimate γ: Hausman-Taylor estimator
– But that is for more advanced users…

xthtaylor lntrade lndist lnpop lngdp, endog(lngdp) constant(lndist)
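Hausman-Taylor estimation                       Number of obs      =      1074
Group variable: id                              Number of groups   =        91

                                                Obs per group: min =         6
                                                               avg =      11.8
                                                               max =        12

Random effects u_i ~ i.i.d.                     Wald chi2(3)       =   1305.69
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     lntrade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
TVexogenous  |
       lnpop |   6.697837   .9950836     6.73   0.000     4.747509    8.648165
TVendogenous |
       lngdp |   1.414057   .0793424    17.82   0.000     1.258548    1.569565
TIexogenous  |
      lndist |   2.249499   2.196748     1.02   0.306    -2.056048    6.555046
             |
       _cons |  -48.74691   16.72393    -2.91   0.004     -81.5252   -15.96861
-------------+----------------------------------------------------------------
     sigma_u |  9.7558279
     sigma_e |   .2905903
         rho |  .99911356   (fraction of variance due to u_i)
------------------------------------------------------------------------------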
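To see how the three blocks of the Hausman-Taylor output (TVexogenous, TVendogenous, TIexogenous) arise from the command options, here is a sketch on simulated data. All names and coefficients are hypothetical: x1 stands in for an exogenous time-varying regressor, x2 for one correlated with the individual effect, and z for an exogenous time-invariant regressor.

```stata
* Simulated panel (hypothetical names) for a Hausman-Taylor fit
clear
set seed 4242
set obs 100
generate id = _n
generate double c = rnormal()
generate double z = rnormal()              // time-invariant, exogenous
expand 10
bysort id: generate t = _n
generate double x1 = rnormal()             // time-varying, exogenous
generate double x2 = 0.8*c + rnormal()     // time-varying, correlated with c
generate double y = x1 + x2 + z + c + rnormal()
xtset id t

* endog() lists regressors allowed to correlate with the individual effect;
* constant() lists the time-invariant regressors
xthtaylor y x1 x2 z, endog(x2) constant(z)
```

Unlike FE or FD, this keeps a coefficient on the time-invariant variable, at the price of having to argue which regressors are uncorrelated with the individual effect.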
Do more complex estimators
always make more sense?
    Variable |     OLS           re            fe           AREG           HT
-------------+----------------------------------------------------------------------
       lngdp |  1.0503428     1.7767192     1.0433037     1.0433037     1.4140566
      lndist | -1.3649357    -1.2390467     (omitted)     (omitted)     2.2494989
       lnpop |  .34435733    -.30581371     13.069798     13.069798     6.6978367
       _cons |  2.8236899    -.34018353    -54.173633    -54.173633    -48.746905
What will you find in the do-file?
1. Declare panel, run simplest models, do graphs, etc.
2. Run diagnostics
3. Learn more
Next - huge problem - endogeneity


What is first:
– do the rich trade more, or are they rich because they trade more?
– how to get around this problem?
What is it that we want?
– Cross-country differences?
– Time evolution within one country?
– Test theory?