Transcript: Panel Data (Dane panelowe), Uniwersytet Warszawski
PANEL DATA
Development Workshop
What are we going to do today?
1. Panels – introduction and data properties
2. How to measure distance
3. What comes first: trade or GDP?
4. What else affects trade?
5. Role of currency?
Why panel data?
What is the point of panel data?
– pooled data in econometrics
– panels in econometrics
– long or wide format?
– fixed or random effects?
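Panel commands expect the long layout: one row per group and period. A minimal pure-Python sketch of the wide-to-long conversion, which is what Stata's `reshape long trade, i(id) j(year)` does; the column names `trade2000`/`trade2001`, the ids and the values are hypothetical.

```python
# Hypothetical wide layout: one row per country pair, one trade column per year.
wide = [
    {"id": 1, "trade2000": 10.0, "trade2001": 12.0},
    {"id": 2, "trade2000": 3.5, "trade2001": 4.0},
]


def wide_to_long(rows, stub, years):
    """Turn one row per group into one row per (group, year) pair."""
    long_rows = []
    for row in rows:
        for year in years:
            # Column name is the stub followed by the year, e.g. "trade2001".
            long_rows.append({"id": row["id"], "year": year, stub: row[f"{stub}{year}"]})
    return long_rows


long_rows = wide_to_long(wide, "trade", [2000, 2001])
for r in long_rows:
    print(r)
```

The long form has as many rows as groups times periods, which is the shape `xtset id year` works on.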
Gravity model
All that theory is cool, but transport costs matter and market size matters, hence push and pull factors:
– Isard (1954)
– Tinbergen (1962): the model in logs [what if there were no barriers? "missing trade"]
– Linnemann (1966) [standard macro approach]
– Anderson (1979) [first theoretical model, expenditure-based]
– Helpman–Krugman (1985) [intra-industry trade]
– Bergstrand (1985) [general equilibrium, one country/one factor]
– Bergstrand (1989) [H-O model with the Linder hypothesis]
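The basic gravity equation behind this literature, written in one common textbook notation (the symbols T for bilateral trade, Y for GDP, D for distance and the exponents β are assumptions of this sketch), together with the log-linear form that makes it estimable by linear regression:

```latex
T_{ij} = G \, \frac{Y_i^{\beta_1} \, Y_j^{\beta_2}}{D_{ij}^{\beta_3}}
\qquad\Longrightarrow\qquad
\ln T_{ij} = \ln G + \beta_1 \ln Y_i + \beta_2 \ln Y_j - \beta_3 \ln D_{ij} + \varepsilon_{ij}
```

Market size pulls trade up (β1, β2 > 0) and distance, as a proxy for transport costs, pushes it down (β3 > 0).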
Simplest model
Variables:
– Explained: bilateral trade
– Explanatory: GDP, populations, distance
reg trade gdp pop dist
      Source |         SS     df          MS      Number of obs =     1074
                                                  F(3, 1070)    =   543.02
       Model | 196764.006      3  65588.0021      Prob > F      =   0.0000
    Residual | 129238.275   1070  120.783434      R-squared     =   0.6036
                                                  Adj R-squared =   0.6025
       Total | 326002.281   1073  303.823188      Root MSE      =    10.99

 tradevolume |      Coef.   Std. Err.       t   P>|t|   [95% Conf. Interval]
      gdpsum |   .0141613    .0011921   11.88   0.000    .0118221    .0165004
population~m |   .0528096    .0228549    2.31   0.021    .0079642     .097655
    distance |  -.0073704    .0005152  -14.31   0.000   -.0083813   -.0063594
       _cons |   5.762674    1.067794    5.40   0.000    3.667467    7.857882
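The fit statistics in the header of this table follow directly from its sums of squares and degrees of freedom. A small sketch that recomputes them from the numbers shown above, as a way to read the table:

```python
import math

# Sums of squares and degrees of freedom copied from the regression table above.
ss_model, df_model = 196764.006, 3
ss_resid, df_resid = 129238.275, 1070
ss_total, df_total = 326002.281, 1073

ms_model = ss_model / df_model   # mean square of the model
ms_resid = ss_resid / df_resid   # mean square of the residuals

r2 = ss_model / ss_total                        # share of total variance explained
adj_r2 = 1 - ms_resid / (ss_total / df_total)   # R-squared penalised for the regressors
f_stat = ms_model / ms_resid                    # joint test that all three slopes are zero
root_mse = math.sqrt(ms_resid)                  # standard deviation of the residuals

print(f"R2={r2:.4f} adjR2={adj_r2:.4f} F={f_stat:.2f} RootMSE={root_mse:.2f}")
```

Note that the total mean square, 303.823188, is simply the variance of tradevolume; it reappears later in the xttest0 output.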
Panel data
Same data, same question, but now the data consist of groups observed over time. Stata learns this from one of three sets of commands (they are all equivalent):
1. iis grouping_var
   tis time_var
2. xtset grouping_var time_var
3. tsset grouping_var time_var
Once the data are declared as a panel, compare:
xtsum
vs
sum
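The difference between the two: sum reports one overall standard deviation, while xtsum also splits it into a between part (across group means) and a within part (deviations from each group's own mean, re-centred on the grand mean). A pure-Python sketch of that decomposition as I understand Stata's definitions; the small panel is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical long-format panel: (group id, value of x).
panel = [(1, 2.0), (1, 4.0), (1, 6.0), (2, 10.0), (2, 12.0), (2, 14.0)]


def xtsum_sd(rows):
    """Overall, between and within standard deviations of x."""
    values = [x for _, x in rows]
    grand_mean = mean(values)
    groups = {}
    for g, x in rows:
        groups.setdefault(g, []).append(x)
    group_means = {g: mean(xs) for g, xs in groups.items()}
    overall = stdev(values)
    # Between: spread of the group means, each group counted once.
    between = stdev(list(group_means.values()))
    # Within: deviations from the own-group mean, shifted back to the grand mean.
    within = stdev([x - group_means[g] + grand_mean for g, x in rows])
    return overall, between, within


print(xtsum_sd(panel))
```

A variable that never changes within a group (like distance between two countries) has a within standard deviation of zero, which matters once group effects are added.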
Panel regression
Do not forget the menus in Stata. To find out how to run panel regressions in Stata, go to:
Statistics => Longitudinal/panel data
– Many options already covered: xtset, sum, des, tab (i.e. xtsum, xtdescribe, xttab; check them out)
– Also: linear models
Simplest code
xtreg trade pop gdp dist
Panel results
Random-effects GLS regression                 Number of obs      =     1074
Group variable: id                            Number of groups   =       91

R-sq:  within  = 0.4879                       Obs per group: min =        6
       between = 0.6091                                      avg =     11.8
       overall = 0.5995                                      max =       12

Random effects u_i ~ Gaussian                 Wald chi2(3)       =  1070.28
corr(u_i, X)       = 0 (assumed)              Prob > chi2        =   0.0000

 tradevolume |      Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
      gdpsum |   .0187795    .0006722   27.94   0.000     .017462    .0200969
population~m |  -.0098166    .0375135   -0.26   0.794   -.0833418    .0637085
    distance |  -.0068902    .0017132   -4.02   0.000    -.010248   -.0035324
       _cons |   4.429218     3.53079    1.25   0.210   -2.491003    11.34944
     sigma_u |  10.536556
     sigma_e |  3.3908988
         rho |  .90615037   (fraction of variance due to u_i)
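rho is sigma_u² / (sigma_u² + sigma_e²), the share of the error variance that sits in the group effect u_i. A related textbook result: random-effects GLS equals OLS on quasi-demeaned data (y_it − θ_i·ȳ_i, same for X) with θ_i = 1 − sqrt(σ_e² / (σ_e² + T_i·σ_u²)), so θ = 0 is pooled OLS and θ → 1 is fixed effects. That gives one way to see how far this estimator is from the pooled one. The sketch below uses T_i = 12 only as an example, since groups here have between 6 and 12 observations.

```python
import math


def rho(sigma_u, sigma_e):
    """Share of the composite error variance due to the group effect u_i."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2)


def theta(sigma_u, sigma_e, t):
    """Quasi-demeaning weight of random-effects GLS for a group observed t times."""
    return 1 - math.sqrt(sigma_e**2 / (sigma_e**2 + t * sigma_u**2))


# sigma_u and sigma_e copied from the xtreg output above.
print(rho(10.536556, 3.3908988))        # compare with rho = .90615037
print(theta(10.536556, 3.3908988, 12))  # weight for a pair observed in all 12 periods
```

With θ this close to 1, the random-effects estimates here are much closer to a fixed-effects estimator than to pooled OLS.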
How do we know if it makes sense?
Different from pooled estimator?
What if we add country effects to the pooled estimation? Let’s try
areg trade pop gdp dist, absorb(grouping_var)
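areg with absorb() fits one dummy per group without printing the dummies. By the Frisch–Waugh–Lovell theorem its slopes equal OLS on data demeaned within each group, the same point estimates as xtreg, fe. A one-regressor sketch with hypothetical data; it also shows why a variable that is constant within a group, such as distance between a fixed pair of countries, cannot be estimated once group effects are absorbed.

```python
from statistics import mean

# Hypothetical panel rows: (group id, y, x, distance). Distance never changes within a group.
rows = [
    (1, 5.0, 1.0, 300.0), (1, 7.0, 2.0, 300.0), (1, 9.0, 3.0, 300.0),
    (2, 20.0, 4.0, 900.0), (2, 22.0, 5.0, 900.0), (2, 25.0, 6.0, 900.0),
]


def demean_by_group(rows, col):
    """Subtract each group's own mean from column `col` (the within transformation)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[0], []).append(r[col])
    means = {g: mean(v) for g, v in groups.items()}
    return [r[col] - means[r[0]] for r in rows]


def within_slope(rows, y_col=1, x_col=2):
    """One-regressor fixed-effects slope: OLS through the origin on demeaned y and x."""
    y = demean_by_group(rows, y_col)
    x = demean_by_group(rows, x_col)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


print(within_slope(rows))
print(demean_by_group(rows, 3))  # all zeros: a time-invariant regressor drops out
```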
Some specification choices we know from the literature and some from experience:
– Linear or in logs?
– Maybe also non-linear terms and interactions, trade or export share, etc.
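On "linear or in logs?": in a log-log regression the slope is an elasticity, independent of the units of measurement, while the slope in levels is not. A sketch on hypothetical data generated as trade = 2 · gdp^0.8, so the true elasticity is 0.8:

```python
import math
from statistics import mean

# Hypothetical data generated from trade = 2 * gdp**0.8, i.e. an elasticity of 0.8.
gdp = [10.0, 20.0, 40.0, 80.0, 160.0]
trade = [2 * g**0.8 for g in gdp]


def ols_slope_intercept(x, y):
    """Simple OLS of y on a constant and x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# In logs the slope recovers the elasticity exactly; in levels it depends on units.
b_log, a_log = ols_slope_intercept([math.log(g) for g in gdp], [math.log(t) for t in trade])
b_lin, _ = ols_slope_intercept(gdp, trade)
print(f"log-log slope (elasticity): {b_log:.3f}, exp(intercept) = {math.exp(a_log):.3f}")
print(f"linear slope (trade units per GDP unit): {b_lin:.3f}")
```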
Should we do fixed or random effects?
– Are we interested in differences over time or across countries? The between and within R² tell different stories, don't they? What do our models say?
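The three R² values that xtreg reports are, as I understand Stata's definitions, squared correlations between observed y and fitted xb: within on group-demeaned values, between on group means, overall on the raw data. The hypothetical fitted values below track changes within each group perfectly but rank the groups poorly, so within R² is 1 while between R² is only 0.25.

```python
from statistics import mean


def corr2(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov * cov / (va * vb)


def panel_r2(groups, y, yhat):
    """Within, between and overall R-squared from observed y and fitted values."""
    by_group = {}
    for g, a, b in zip(groups, y, yhat):
        by_group.setdefault(g, []).append((a, b))
    gmean = {}
    for g, pairs in by_group.items():
        gmean[g] = (mean([a for a, _ in pairs]), mean([b for _, b in pairs]))
    within = corr2([a - gmean[g][0] for g, a in zip(groups, y)],
                   [b - gmean[g][1] for g, b in zip(groups, yhat)])
    means = list(gmean.values())
    between = corr2([m[0] for m in means], [m[1] for m in means])
    overall = corr2(y, yhat)
    return within, between, overall


groups = [1, 1, 2, 2, 3, 3]
y = [1.0, 3.0, 10.0, 12.0, 5.0, 7.0]
yhat = [0.0, 2.0, 5.0, 7.0, 9.0, 11.0]
print(panel_r2(groups, y, yhat))
```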
xttest0
tradevolume[id,t] = Xb + u[id] + e[id,t]

Estimated results:
                   Var     sd = sqrt(Var)
  tradevo~e   303.8232       17.43052
          e   11.49819       3.390899
          u    111.019       10.53656

Test: Var(u) = 0
          chi2(1) = 4793.89
      Prob > chi2 =  0.0000
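xttest0 tests Var(u) = 0, i.e. whether pooled OLS would do; the huge chi2 rejects that. The sketch below is the textbook Breusch–Pagan (1980) LM statistic for a balanced panel, computed from pooled-OLS residuals. It is an illustration only: the workshop panel is unbalanced (6 to 12 observations per group) and Stata's implementation differs in details, so it is not expected to reproduce 4793.89.

```python
def bp_lm(resid_by_group):
    """Breusch-Pagan LM statistic for Var(u) = 0 in a balanced panel.

    resid_by_group: list of per-group lists of pooled-OLS residuals, all of length T.
    Under H0 the statistic is approximately chi2(1).
    """
    n = len(resid_by_group)
    t = len(resid_by_group[0])
    if any(len(e) != t for e in resid_by_group):
        raise ValueError("this textbook formula assumes a balanced panel")
    # Residuals that share a sign within a group make the group sums large.
    sum_sq_group_sums = sum(sum(e) ** 2 for e in resid_by_group)
    sum_sq = sum(x * x for e in resid_by_group for x in e)
    return n * t / (2 * (t - 1)) * (sum_sq_group_sums / sum_sq - 1) ** 2


print(bp_lm([[1.0, 1.0, 1.0], [-2.0, -2.0, -2.0]]))  # residuals fully group-specific
```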
Huge problem: endogeneity
What comes first: do rich countries trade more, or are they rich because they trade more?
– How do we get around this problem?
What is it that we want?
– Cross-country differences?
– Evolution over time within one country?
– To test theory?
What will you find in the do-file?
1. Declare the panel, run the simplest models, draw graphs, etc.
2. Run diagnostics
3. Learn more