Panel Data - Intinno Paathshaala
Panel Data Modelling
Outline
Panel Data
Fixed-effects vs. random-effects
First-differencing or fixed-effects
Strict Exogeneity Assumption
Panel Data (or Longitudinal Data)
A typical panel data set has both a
cross-sectional dimension and a time
series dimension. In particular, the
same cross-sectional units (e.g.
individuals, families, firms, cities,
states) are observed over time.
“Panel data” is different from “pooling
independent cross sections across time”
(or “pooled OLS”). Estimating the latter
is a simple extension of OLS.
Large N or Large T?
“N” is the number of cross-sectional units and “T” is
the number of time periods.
Small N and small T (of little use)
* Large N and small T (Traditional Panel Data)
N is large enough for the Law of Large Numbers to
apply while T is not.
Convenient to use if cross-sectional units are
independent.
Small N and Large T
T is large enough for the Law of Large Numbers to
apply while N is not.
Autocorrelation has to be addressed.
Large N and Large T (Still under exploration)
Fixed Effects Panel-data Model
(individual-specific intercepts)
y_it = β_0 + β_t + β_1 x_it1 + β_2 x_it2 + a_i + u_it
Strict Exogeneity Assumption
Cov(X_it, u_is) = 0 for all t and s
This rules out dynamic models, which have lagged
dependent variables (e.g. yi,t-1) as explanatory
variables. Models with lags of the independent
variables as explanatory variables are still fine.
The effects of time-constant independent
variables cannot be directly estimated because
they are absorbed into ai.
βt (time-specific intercepts) control for
common shocks to all agents at period t.
Names
The individual-specific intercept ai may be called a fixed
effect or unobserved heterogeneity.
The term uit is called the idiosyncratic error.
The sum ai+uit is often called the composite error.
If Cov(Xit,ai) is nonzero but the pooled OLS method is
used, estimates of all parameters might be biased. This
bias is called heterogeneity bias.
Balanced Panel indicates panel data with observations
for the same time periods for all individuals. Otherwise,
the data are unbalanced.
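The heterogeneity bias described above can be illustrated with a small sketch. The numbers are made up for illustration (they are not from the slides): three units, three periods, a true slope of 2, no idiosyncratic error, and ai rising with the level of x so that Cov(Xit,ai) is nonzero. Pooled OLS then overstates the slope, while a regression on deviations from each unit's own time averages, which removes ai, recovers it.

```python
# Hypothetical toy panel: y_it = 2 * x_it + a_i, with a_i correlated with x_it.
TRUE_BETA = 2.0
panel = {
    "A": {"a": 0.0, "x": [1.0, 2.0, 3.0]},
    "B": {"a": 10.0, "x": [4.0, 5.0, 6.0]},
    "C": {"a": 20.0, "x": [7.0, 8.0, 9.0]},
}

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

pooled_x, pooled_y = [], []      # raw observations stacked over units
within_x, within_y = [], []      # deviations from each unit's time average
for unit in panel.values():
    x = unit["x"]
    y = [TRUE_BETA * xi + unit["a"] for xi in x]
    pooled_x += x
    pooled_y += y
    within_x += [xi - mean(x) for xi in x]
    within_y += [yi - mean(y) for yi in y]

pooled_slope = ols_slope(pooled_x, pooled_y)   # ignores a_i
within_slope = ols_slope(within_x, within_y)   # a_i removed by demeaning
print(pooled_slope, within_slope)
```

In this constructed example the pooled slope is far above the true value because differences in ai across units are attributed to x.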
Random Effects Models
y_it = β_0 + β_t + β_1 x_it1 + β_2 x_it2 + a_i + u_it
Key assumption:
ai is uncorrelated with each explanatory variable in all
time periods.
Difference between RE and FE estimators
In FE, we effectively control for ai using dummy
variables.
In RE, ai is not estimated and is part of the disturbance.
RE estimates are more efficient (or more precise) if
the RE assumption is valid.
Random Effects Models
(continued)
Difference between RE and pooled OLS
Since ai is in the error term, observations over time
are correlated for the same individual i.
In the RE approach, this correlation over time is
removed by a GLS (generalized least squares)
transformation of the data.
In pooled OLS, the GLS correction is not used.
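For reference, the GLS correction in the RE approach is usually written as a quasi-demeaning of the data. The sketch below uses the standard textbook form. The symbols \sigma_a^2 and \sigma_u^2 (the variances of ai and uit), \theta, and v_{it} are notation introduced here, not taken from the slides, and the time intercepts are left out for brevity.

```latex
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\,\sigma_a^2}}, \qquad
y_{it} - \theta\,\bar{y}_i
  = \beta_0 (1 - \theta)
  + \beta_1 (x_{it1} - \theta\,\bar{x}_{i1})
  + \beta_2 (x_{it2} - \theta\,\bar{x}_{i2})
  + (v_{it} - \theta\,\bar{v}_i), \qquad v_{it} = a_i + u_{it}
```

Setting \theta = 0 gives pooled OLS and \theta = 1 gives the fixed-effects (within) transformation, so RE sits between the two. In practice \theta has to be estimated from the data.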
Hausman test
Compare the RE and FE estimates. If the
estimates are very different, the RE
assumption is probably invalid and FE
has to be used. Otherwise, RE is more efficient.
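A minimal sketch of this comparison for a single coefficient, assuming the usual form of the test statistic: the squared difference of the two estimates divided by the difference of their estimated variances, compared with a chi-squared distribution with one degree of freedom. The function name and the input numbers are hypothetical placeholders, not results from any data set.

```python
import math

def hausman_one_coef(b_fe, b_re, var_fe, var_re):
    """Test statistic and p-value for one coefficient (1 degree of freedom)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        # Can happen in finite samples; the statistic is then not usable as is.
        raise ValueError("var_fe - var_re must be positive")
    stat = (b_fe - b_re) ** 2 / var_diff
    # Upper-tail probability of a chi-squared(1) variable: P(Z^2 > stat).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Illustrative inputs only (made-up estimates and variances).
stat, p_value = hausman_one_coef(b_fe=0.5, b_re=0.3, var_fe=0.02, var_re=0.01)
print(f"H = {stat:.3f}, p = {p_value:.4f}")
```

A small p-value points towards FE. With several coefficients the same idea applies to the vector of differences and the difference of the two covariance matrices.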
Estimation of the Fixed-effects Panel
Data Model
Fixed-effects (or Within) Estimator
Each variable is demeaned (i.e. its average over time
for the same unit is subtracted from it)
Dummy Variable Regression (i.e. put in a
dummy variable for each cross-sectional unit,
along with other explanatory variables.) This
may cause estimation difficulty when N is large.
First-difference Estimator
Each variable is differenced once over time, so
we are effectively estimating the relationship
between changes of variables.
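Both transformations can be written out to show why ai drops out. This is the standard derivation, applied to the fixed-effects model stated earlier. The bars (unit-specific averages over the T periods of a balanced panel), \bar{\beta} (the average of the time intercepts) and \Delta (the change from period t-1 to t) are notation introduced here.

```latex
% Within (fixed-effects) transformation: subtract unit i's time averages
y_{it} - \bar{y}_i
  = (\beta_t - \bar{\beta})
  + \beta_1 (x_{it1} - \bar{x}_{i1})
  + \beta_2 (x_{it2} - \bar{x}_{i2})
  + (u_{it} - \bar{u}_i)

% First-difference transformation: subtract the previous period
\Delta y_{it}
  = \Delta\beta_t
  + \beta_1 \Delta x_{it1}
  + \beta_2 \Delta x_{it2}
  + \Delta u_{it}, \qquad t = 2, \dots, T
```

In both equations ai and the common intercept have dropped out, which is also why the effect of a time-constant regressor cannot be estimated this way.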
First Differencing or Fixed Effects?
Theoretically, when N is large and T is small but
greater than 2, FE is more efficient when uit are
serially uncorrelated while FD is more efficient when
uit follows a random walk.
When T is large and N is small:
FD has an advantage for processes with large positive
autocorrelation. FE is more sensitive to nonnormality,
heteroskedasticity, and serial correlation in the
idiosyncratic errors.
On the other hand, FE is less sensitive to violation of
the strict exogeneity assumption. So FE is preferred
when the processes are weakly dependent over time.
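The comparison above starts at T greater than 2 because with exactly two periods the two estimators give the same slope. A small sketch with made-up numbers (hypothetical data, one regressor, no time intercepts) shows this numerically.

```python
# Hypothetical two-period panel: unit -> ([x_i1, x_i2], [y_i1, y_i2]).
data = {
    "u1": ([1.0, 3.0], [2.0, 5.5]),
    "u2": ([2.0, 2.5], [4.0, 4.2]),
    "u3": ([0.5, 4.0], [1.0, 7.0]),
    "u4": ([3.0, 1.0], [6.5, 3.0]),
}

def slope_through_origin(x, y):
    """Least-squares slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# First differences: one change per unit.
dx = [x[1] - x[0] for x, _ in data.values()]
dy = [y[1] - y[0] for _, y in data.values()]
fd_slope = slope_through_origin(dx, dy)

# Within transformation: deviations from each unit's two-period average.
wx, wy = [], []
for x, y in data.values():
    x_bar, y_bar = sum(x) / 2, sum(y) / 2
    wx += [xi - x_bar for xi in x]
    wy += [yi - y_bar for yi in y]
fe_slope = slope_through_origin(wx, wy)

print(fd_slope, fe_slope)
```

With T=2 each deviation from the unit average is plus or minus half the first difference, so the two slopes coincide. The efficiency comparison only becomes meaningful for T greater than 2.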
With Classical Measurement Errors
When T>2, the measurement-error
bias of the FE estimator may be
smaller than that of the FD estimator
but larger than that of OLS.
(Griliches and Hausman, 1986)
Natural IV for Measurement Error:
Lagged values of the mismeasured explanatory variable
Violation of the Strict Exogeneity
Assumption
Parameter estimates are inconsistent.
A natural-experiment approach (e.g. IV)
is needed.
With Strict Exogeneity and
Dependent Observations
Parameter estimates are consistent
Standard error estimates could still be
biased because of:
Cross-sectional correlation or serial correlation
(over time) in error terms
Heteroskedasticity
Possible Solutions (Need Large N and
Zero Cross-Sectional Correlation)
Heteroskedasticity
Use White robust standard errors
Autocorrelation
Group the sample time dimension into two
periods and apply the first-difference estimator
(needs large N). (Performs best with the D-in-D
approach; Bertrand et al. 2004)
Clustered robust errors
Newey-West standard errors (which also
account for heteroskedasticity)
Cross-sectional Correlations
Clustered robust errors
Clustered Standard Errors
Key Assumption
Correlations within a cluster (a group of firms, a
region, different years for the same firm, different
years for the same region) are the same
for different observations.
Procedure
Identify clusters using economic theory (clustered by
industry, year, industry and year)
Let the software compute clustered standard errors
Try different ways of defining clusters and see how
estimated standard errors are affected.
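As a rough illustration of the last step, the sketch below computes the slope of a one-regressor OLS fit together with its cluster-robust standard error, so that different cluster definitions can be compared. It is a bare-bones version under stated assumptions: one regressor plus an intercept, no small-sample correction, and made-up data and cluster labels. The function name is hypothetical. With only a handful of clusters, as in this toy data, cluster-robust standard errors are unreliable, so the sketch shows the mechanics only.

```python
import math
from collections import defaultdict

def slope_with_robust_se(x, y, clusters):
    """OLS slope of y on x (with intercept) and its cluster-robust SE.

    Observations sharing a label in `clusters` may be correlated with each
    other; different clusters are treated as independent.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xd = [xi - x_bar for xi in x]                      # demeaned regressor
    sxx = sum(v * v for v in xd)
    slope = sum(v * yi for v, yi in zip(xd, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sum the score (demeaned x times residual) within each cluster.
    score_by_cluster = defaultdict(float)
    for v, e, g in zip(xd, resid, clusters):
        score_by_cluster[g] += v * e
    meat = sum(s * s for s in score_by_cluster.values())
    return slope, math.sqrt(meat) / sxx

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]

# Each observation as its own cluster reproduces White (HC0) standard errors.
slope, white_se = slope_with_robust_se(x, y, clusters=[0, 1, 2, 3])
# Two hypothetical clusters, e.g. two firms observed twice each.
_, firm_se = slope_with_robust_se(x, y, clusters=["f1", "f1", "f2", "f2"])
print(slope, white_se, firm_se)
```

The point estimate is the same under every clustering choice; only the standard error changes, which is what the comparison across cluster definitions is meant to reveal.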
Unbalanced Panels
If a panel data set is unbalanced for
reasons uncorrelated with uit, the consistency
of the FE estimator is not affected.
The “attrition” problem: If an unbalanced
panel is the result of some selection process
related to uit, then an endogeneity problem is
present and needs to be dealt with using
some correction method.
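Whether a data set is balanced can be checked mechanically. The sketch below assumes observations arrive as (unit, period) pairs; the function name and the sample records are hypothetical.

```python
from collections import defaultdict

def is_balanced(observations):
    """True if every unit is observed in exactly the same set of periods."""
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values()) if periods_by_unit else set()
    return all(periods == all_periods for periods in periods_by_unit.values())

balanced = [("firm1", 2001), ("firm1", 2002), ("firm2", 2001), ("firm2", 2002)]
unbalanced = balanced[:-1]          # firm2 drops out after 2001

print(is_balanced(balanced), is_balanced(unbalanced))
```

Such a check says nothing about why observations are missing, which is what matters for the attrition problem.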