Panel Data - Intinno Paathshaala


Panel Data Modelling
Outline
 Panel Data
 Fixed-effects vs. random-effects
 First-differencing or fixed-effects
 Strict Exogeneity Assumption
Panel Data (or Longitudinal Data)
 A typical panel data set has both a
cross-sectional dimension and a time
series dimension. In particular, the
same cross-sectional units (e.g.
individuals, families, firms, cities,
states) are observed over time.
 “Panel data” is different from “pooling
independent cross sections across time”
(or “pooled OLS”). Estimating the latter
is a simple extension of OLS.
Large N or Large T?
 “N” is the number of cross-sectional units and “T” is
the number of time periods.
 Small N and small T (of little use)
 Large N and small T (Traditional Panel Data)
 N is large enough for the Law of Large Numbers to
apply while T is not.
 Convenient to use if cross-sectional units are
independent.
 Small N and Large T
 T is large enough for the Law of Large Numbers to
apply while N is not.
 Autocorrelation has to be addressed.
 Large N and Large T (Still under exploration)
Fixed Effects Panel-data Model
(individual-specific intercepts)
y_{it} = β_0 + β_t + β_1 x_{it1} + β_2 x_{it2} + a_i + u_{it}
 Strict Exogeneity Assumption
 Cov(x_{it}, u_{is}) = 0 for all t and s
 This rules out dynamic models, which have lagged
dependent variables (e.g. y_{i,t-1}) as explanatory
variables. Models with lags of the independent
variables as regressors are still fine.
 The effects of time-constant independent
variables cannot be directly estimated because
they are absorbed into a_i.
 β_t (time-specific intercepts) controls for
shocks common to all agents in period t.
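To see why strict exogeneity rules out a lagged dependent variable, consider a minimal sketch of a dynamic model. The coefficient \rho and the simplified one-regressor form are assumptions introduced here for illustration, and u_{it} is assumed uncorrelated with a_i and with past values of y:

```latex
\begin{align*}
y_{it} &= \rho\, y_{i,t-1} + a_i + u_{it}, \qquad x_{i,t+1} \equiv y_{it} \\
\mathrm{Cov}(x_{i,t+1}, u_{it})
  &= \mathrm{Cov}(\rho\, y_{i,t-1} + a_i + u_{it},\; u_{it})
   = \mathrm{Var}(u_{it}) \neq 0
\end{align*}
```

The regressor in period t+1 contains the period-t error, so Cov(x_{it}, u_{is}) = 0 cannot hold for all t and s.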
Names
 The individual-specific intercept a_i may be called a fixed
effect or unobserved heterogeneity.
 The term u_{it} is called the idiosyncratic error.
 The sum a_i + u_{it} is often called the composite error.
 If Cov(x_{it}, a_i) is nonzero but the pooled OLS method is
used, estimates of all parameters may be biased. This
bias is called heterogeneity bias.
 A balanced panel has observations for the same time
periods for all individuals. Otherwise, the panel is
unbalanced.
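The definition of a balanced panel can be checked mechanically. The sketch below assumes a long-format panel given as (unit, period) pairs; the function name `is_balanced` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def is_balanced(observations):
    """Return True if every unit is observed in exactly the same set of periods.

    `observations` is an iterable of (unit_id, period) pairs, one per row of a
    long-format panel (a hypothetical layout assumed for this sketch).
    """
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    period_sets = list(periods_by_unit.values())
    # Balanced means all units share one common set of periods.
    return all(s == period_sets[0] for s in period_sets)


balanced = [("A", 2001), ("A", 2002), ("B", 2001), ("B", 2002)]
unbalanced = [("A", 2001), ("A", 2002), ("B", 2002)]  # unit B is missing 2001
print(is_balanced(balanced))
print(is_balanced(unbalanced))
```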
Random Effects Models
y_{it} = β_0 + β_t + β_1 x_{it1} + β_2 x_{it2} + a_i + u_{it}
 Key assumption:
 a_i is uncorrelated with each explanatory variable in all
time periods.
 Difference between RE and FE estimators
 In FE, we effectively control for a_i using dummy
variables.
 In RE, a_i is not estimated; it is left in the error term.
 RE estimates are more efficient (or more precise) if
the RE assumption is valid.
Random Effects Models
(continued)
 Difference between RE and pooled OLS
 Since a_i is in the error term, observations over time
are correlated for the same individual i.
 In the RE approach, this correlation over time is
removed by a GLS (generalized least squares)
transformation.
 In pooled OLS, the GLS correction is not used.
 Hausman test
 Compare the RE and FE estimates. If the
estimates are very different, then the RE
assumption is probably invalid and FE
has to be used. Otherwise, RE is more efficient.
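For a single coefficient, the comparison behind the Hausman test reduces to a simple formula: the squared difference of the two estimates divided by the difference of their variances, compared with a chi-square distribution with one degree of freedom. The sketch below assumes the estimates and variances have already been obtained elsewhere; the function name and the numbers are hypothetical.

```python
import math


def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient and its chi-square(1) p-value."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for the statistic to be defined")
    stat = (b_fe - b_re) ** 2 / var_diff
    # With 1 degree of freedom, P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical estimates, for illustration only.
stat, p_value = hausman_scalar(b_fe=1.0, b_re=0.5, var_fe=0.04, var_re=0.01)
print(f"H = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the RE assumption is doubtful. With several coefficients the statistic is a quadratic form in the vector of differences, which this scalar sketch does not cover.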
Estimation of the Fixed-effect Panel
Data Model
 Fixed-effects (or Within) Estimator
 Each variable is demeaned (i.e. its average over
time for the same unit is subtracted)
 Dummy Variable Regression (i.e. put in a
dummy variable for each cross-sectional unit,
along with other explanatory variables.) This
may cause estimation difficulty when N is large.
 First-difference Estimator
 Each variable is differenced once over time, so
we are effectively estimating the relationship
between changes of variables.
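The estimators above can be compared on simulated data. This is a minimal sketch with one regressor and no time effects, assuming a data-generating process in which x_{it} is correlated with a_i; the function names, parameter values, and seed are all hypothetical choices for illustration.

```python
import random


def demean(values):
    """Subtract the mean of the list from each element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope(x, y):
    """OLS slope of y on x without an intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def simulate(n_units=500, n_periods=5, beta=2.0, seed=0):
    """Return (pooled OLS, within, first-difference) slope estimates."""
    rng = random.Random(seed)
    pooled_x, pooled_y = [], []
    within_x, within_y = [], []
    diff_x, diff_y = [], []
    for _ in range(n_units):
        a_i = rng.gauss(0, 1)  # unobserved effect, constant over time
        # x_it is correlated with a_i by construction (assumption of this sketch).
        x = [a_i + rng.gauss(0, 1) for _ in range(n_periods)]
        y = [beta * x_t + a_i + rng.gauss(0, 1) for x_t in x]
        pooled_x += x
        pooled_y += y
        # Within transformation: subtract each unit's own time average.
        within_x += demean(x)
        within_y += demean(y)
        # First differences: change from period t-1 to period t.
        diff_x += [x[t] - x[t - 1] for t in range(1, n_periods)]
        diff_y += [y[t] - y[t - 1] for t in range(1, n_periods)]
    # Demeaning over the whole sample is equivalent to including an intercept.
    pooled = slope(demean(pooled_x), demean(pooled_y))
    return pooled, slope(within_x, within_y), slope(diff_x, diff_y)


pooled, within, first_diff = simulate()
print(f"pooled OLS: {pooled:.2f}, within: {within:.2f}, first difference: {first_diff:.2f}")
```

Under these assumptions the within and first-difference estimates should land near the true slope of 2, while pooled OLS is pulled upward because a_i stays in its error term.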
First Differencing or Fixed-Effect?
 Theoretically, when N is large and T is small but
greater than 2, FE is more efficient when the u_{it} are
serially uncorrelated, while FD is more efficient when
u_{it} follows a random walk. (With T = 2 the two
estimators are identical.)
 When T is large and N is small
 FD has an advantage for processes with large positive
autocorrelation. Inference with FE is more sensitive to
nonnormality, heteroskedasticity, and serial correlation
in the idiosyncratic errors.
 On the other hand, FE is less sensitive to violations of
the strict exogeneity assumption, so FE is preferred
when the processes are weakly dependent over time.
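The efficiency comparison between FE and FD can be read off the differenced errors. The sketch below works through the two polar cases; the symbols \sigma_u^2 and e_{it} are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\text{Case 1: } u_{it} \text{ i.i.d. with variance } \sigma_u^2: \quad
  & \Delta u_{it} = u_{it} - u_{i,t-1}, \quad
    \mathrm{Var}(\Delta u_{it}) = 2\sigma_u^2, \\
  & \mathrm{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) = -\sigma_u^2, \quad
    \mathrm{Corr}(\Delta u_{it}, \Delta u_{i,t-1}) = -\tfrac{1}{2} \\
\text{Case 2: } u_{it} = u_{i,t-1} + e_{it},\ e_{it} \text{ i.i.d.}: \quad
  & \Delta u_{it} = e_{it}, \quad
    \mathrm{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) = 0
\end{align*}
```

In Case 1 differencing introduces serial correlation, which costs FD efficiency relative to FE. In Case 2 differencing removes the serial correlation entirely, which is why FD is the efficient choice under a random walk.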
With Classical Measurement Errors
 When T > 2, the measurement error
bias using the FE estimator may be
smaller than that with the FD approach
but higher than that with OLS
(Griliches and Hausman, 1986).
 Natural IV for Measurement Error:
Lagged dependent variables
Violation of the Strict Exogeneity
Assumption
 Parameter estimates are inconsistent;
a natural experiment approach (e.g. IV)
is needed.
With Strict Exogeneity and
Dependent Observations
 Parameter estimates are consistent
 Standard error estimates could still be
biased because of:
 Cross-sectional correlation or serial correlation
(over time) in error terms
 Heteroskedasticity
Possible Solutions (Need Large N and
Zero Cross-Sectional Correlation)
 Heteroskedasticity
 Use White robust standard errors
 Autocorrelation
 Group the sample time dimension into two
periods and apply the first-difference estimator
(needs large N). This performs best with the
difference-in-differences (D-in-D) approach
(Bertrand et al. 2004).
 Clustered robust errors
 Newey-West standard errors (which also
account for heteroskedasticity)
 Cross-sectional Correlations
 Clustered robust errors
Clustered Standard Errors
 Key Assumption
 Correlations within a cluster (a group of firms, a
region, different years for the same firm, different
years for the same region) are the same
for different observations.
 Procedure
 Identify clusters using economic theory (e.g. clustered
by industry, by year, or by industry and year)
 Let the computer calculate the clustered standard errors
 Try different ways of defining clusters and see how
the estimated standard errors are affected.
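What the computer calculates in the second step can be sketched for the one-regressor case. The code below uses the standard cluster-robust (sandwich) variance, which sums x*e within each cluster before squaring and so allows correlation within clusters while assuming none across them. It omits the small-sample corrections that statistical packages usually apply, and the simulated data and function name are hypothetical.

```python
import math
import random
from collections import defaultdict


def ols_with_se(x, y, clusters):
    """Slope of y on x (no intercept) with conventional and cluster-robust SEs."""
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]

    # Conventional SE: assumes homoskedastic, mutually uncorrelated errors.
    sigma2 = sum(e * e for e in resid) / (len(x) - 1)
    se_conventional = math.sqrt(sigma2 / sxx)

    # Cluster-robust SE: sum the scores x*e within each cluster, then square.
    score_by_cluster = defaultdict(float)
    for xi, ei, g in zip(x, resid, clusters):
        score_by_cluster[g] += xi * ei
    meat = sum(s * s for s in score_by_cluster.values())
    se_cluster = math.sqrt(meat) / sxx
    return beta, se_conventional, se_cluster


# Simulated data (assumption of this sketch): 200 clusters of 10 observations,
# with both the regressor and part of the error shared within each cluster.
rng = random.Random(1)
x, y, clusters = [], [], []
for g in range(200):
    x_g = rng.gauss(0, 1)    # regressor common to the cluster
    shock = rng.gauss(0, 1)  # error component common to the cluster
    for _ in range(10):
        x.append(x_g)
        clusters.append(g)
        y.append(1.0 * x_g + shock + rng.gauss(0, 1))

beta, se_conv, se_clu = ols_with_se(x, y, clusters)
print(f"beta = {beta:.3f}, conventional SE = {se_conv:.3f}, clustered SE = {se_clu:.3f}")
```

In this setup the conventional standard error understates the uncertainty because it treats the 10 observations in each cluster as independent. Re-running with a different `clusters` list is one way to see how the choice of cluster definition moves the estimated standard error.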
Unbalanced Panels
 If a panel data set is unbalanced for
reasons uncorrelated with u_{it}, the consistency
of the FE estimator is not affected.
 The “attrition” problem: If an unbalanced
panel is the result of a selection process
related to u_{it}, then an endogeneity problem is
present and needs to be dealt with using
a correction method.