Longitudinal Data Analysis and Actuarial Science
Chapter 7 - Modeling Issues
• 7.1 Heterogeneity
• 7.2 Comparing fixed and random effects estimators
• 7.3 Omitted variables
– Models of omitted variables
– Augmented regression estimation
• 7.4 Sampling, selectivity bias, attrition
– Incomplete and rotating panels
– Unplanned nonresponse
– Non-ignorable missing data
7.1 Heterogeneity
• Also think of clustering
– different observations from the same subject (observational unit) tend to be related to one another.
• Methods for handling this:
– variables in common
– jointly dependent distribution functions
Variables in Common
• These are latent (unobserved) variables
• May be fixed (mean structure) or random (covariance structure)
• May be organized by the cross-section or by time
– If by cross-section, may be subject oriented (by i) or spatial
• May be nested
Jointly Dependent Distribution Functions
• For the covariance structures, this is a more general way to think about random variables that are common to subjects.
• Also includes additional structures not suggested by the common variable approach
– Example: For the error components model, we have Corr(yi1, yi2) = σα²/(σα² + σ²) = ρ, where ρ > 0. However, we need not require positive correlations for a general uniform correlation model.
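As a quick numerical illustration (a sketch in Python, not from the original slides), the uniform correlation matrix R = (1 − ρ)I + ρJ is a valid correlation matrix whenever −1/(T − 1) < ρ < 1, so mildly negative correlations are admissible, unlike in the error components model:

```python
# Sketch: the T x T uniform (exchangeable) correlation matrix has one
# eigenvalue 1 + (T-1)*rho and T-1 eigenvalues 1 - rho, so it is positive
# definite iff -1/(T-1) < rho < 1.

def uniform_corr_eigenvalues(T, rho):
    """Eigenvalues of the T x T uniform correlation matrix."""
    return [1 + (T - 1) * rho] + [1 - rho] * (T - 1)

def is_valid_uniform_corr(T, rho):
    """True when the uniform correlation matrix is positive definite."""
    return min(uniform_corr_eigenvalues(T, rho)) > 0

print(is_valid_uniform_corr(4, -0.2))  # -0.2 > -1/3, so valid
print(is_valid_uniform_corr(4, -0.5))  # -0.5 < -1/3, not positive definite
```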
Practical Identification with Heterogeneity may be Difficult – Jones (1993) Example
[Figure: simulated response paths for Subjects 1, 2, and 3, plotted against Time.]
Theoretical Identification with Heterogeneity may be Impossible – Neyman and Scott (1948) Example
• Example - Identification of variance components
• Consider the fixed effects panel data model
yit = αi + εit, i = 1, …, n, t = 1, 2,
• where Var εit = σ² and Cov(εi1, εi2) = σ²ρ.
• The ordinary least squares estimator of αi is ȳi = (yi1 + yi2)/2.
• Thus, the residuals are
ei1 = yi1 − ȳi = (yi1 − yi2)/2 and ei2 = yi2 − ȳi = (yi2 − yi1)/2 = −ei1.
• Thus, ρ cannot be estimated, despite having 2n − n = n degrees of freedom available for estimating the variance components.
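The Neyman–Scott degeneracy is easy to see numerically; a minimal Python sketch with simulated data (the parameter values are illustrative):

```python
# Sketch of the Neyman-Scott identification problem: with T = 2 and a fixed
# effect per subject, the within-subject residuals satisfy e_i2 = -e_i1
# exactly, so rho = Corr(eps_i1, eps_i2) leaves no trace in the residuals.
import random

random.seed(0)
n = 5
for i in range(n):
    alpha_i = random.gauss(0, 2)        # subject-specific fixed effect
    y1 = alpha_i + random.gauss(0, 1)   # illustrative sigma^2 = 1
    y2 = alpha_i + random.gauss(0, 1)
    ybar = (y1 + y2) / 2                # OLS estimator of alpha_i
    e1, e2 = y1 - ybar, y2 - ybar       # residuals
    print(round(e1, 3), round(e2, 3))   # e2 is always exactly -e1
```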
Estimation of regression coefficients without
complete identification is possible
• If our main goal is to estimate or test hypotheses about the regression
coefficients, then we do not require knowledge of all aspects of the
model.
• For example, consider the one-way fixed effects model
yi = αi 1i + Xi β + εi.
• Apply the common transformation matrix Q = I − T⁻¹ J to each equation to get
yi* = Q yi = Q Xi β + Q εi = Xi* β + εi*,
• because Q 1 = 0.
• Use OLS on the transformed data. For T = 2,
R* = Q R Q = (1/4) [[1, −1], [−1, 1]] σ² [[1, ρ], [ρ, 1]] [[1, −1], [−1, 1]] = (σ²(1 − ρ)/2) [[1, −1], [−1, 1]].
• Note that we can estimate the quantity σ²(1 − ρ) yet cannot separate the terms σ² and ρ.
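A small Python check of the T = 2 calculation (plain matrix arithmetic; σ² = 2 and ρ = 0.3 are illustrative values):

```python
# Sketch: for T = 2, Q = I - J/2 and R = sigma^2 [[1, rho], [rho, 1]], so
# QRQ = (sigma^2 (1 - rho) / 2) [[1, -1], [-1, 1]]: sigma^2 and rho enter
# the transformed model only through the product sigma^2 (1 - rho).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

sigma2, rho = 2.0, 0.3                  # illustrative values
Q = [[0.5, -0.5], [-0.5, 0.5]]          # Q = I - J/2 for T = 2
R = [[sigma2, sigma2 * rho], [sigma2 * rho, sigma2]]
QRQ = matmul(matmul(Q, R), Q)
c = sigma2 * (1 - rho) / 2              # here c is about 0.7
print(QRQ)                              # entries are [[c, -c], [-c, c]]
```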
7.2 Comparing fixed and random effects
estimators
• Sometimes, the context of the problem does not make clear
the choice of fixed or random effects estimators.
• It is of interest to compare the fixed effects to the random
effects estimators using the data.
• In random effects models, we assume that {αi} are independent of {εit}.
– Think instead of drawing {xit} at random and performing inference conditional on {xit}.
– Interpret {αi} to be the unobserved (time-invariant) characteristics of a subject.
– This assumes that individual effects {αi} are independent of other individual characteristics {xit}, our strict conditional exogeneity assumption SEC6.
A special case
• Consider the error components model with K = 2 so that
yit = αi + β0 + β1 xit,1 + εit.
• We can express (Fuller–Battese) the GLS estimator as
b1,EC = Σ_{i,t} (xit* − x̄*)(yit* − ȳ*) / Σ_{i,t} (xit* − x̄*)²,
• where
xit* = xit − x̄i (1 − [σ² / (T σα² + σ²)]^{1/2}) and yit* = yit − ȳi (1 − [σ² / (T σα² + σ²)]^{1/2}).
• As σα² → 0, we have that xit* → xit, yit* → yit and b1,EC → b1,OLS.
• As σα² → ∞, we have that b1,EC → b1,FE.
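The two limiting cases can be seen directly from the transform weight ψ = 1 − [σ²/(Tσα² + σ²)]^{1/2}; a minimal Python sketch (illustrative values):

```python
# Sketch of the Fuller-Battese transform weight psi, with x*_it = x_it -
# psi * xbar_i.  As sigma_alpha^2 -> 0 the transform vanishes (OLS); as
# sigma_alpha^2 -> infinity, psi -> 1 and the data are fully demeaned
# (fixed effects).
import math

def psi(T, sigma2, sigma2_alpha):
    """Fuller-Battese transform weight."""
    return 1 - math.sqrt(sigma2 / (T * sigma2_alpha + sigma2))

T, sigma2 = 5, 1.0                  # illustrative values
print(psi(T, sigma2, 0.0))          # 0.0: no transform, b_EC -> b_OLS
print(psi(T, sigma2, 1e8))          # ~1.0: full demeaning, b_EC -> b_FE
```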
A special case
• Define the so-called “between groups” estimator,
b1,B = Σ_{i=1}^n (x̄i − x̄)(ȳi − ȳ) / Σ_{i=1}^n (x̄i − x̄)².
• This estimator can be motivated by averaging all observations from a subject and then computing an ordinary least squares estimator using the data {(x̄i, ȳi), i = 1, …, n}.
• The following decomposition is due to Maddala (1971),
b1,EC = (1 − Δ) b1,FE + Δ b1,B,
• where
Δ = Var b1,EC / Var b1,B
• measures the relative precision of the two estimators of β1.
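Maddala's decomposition can be verified numerically. The sketch below (simulated data with known variance components; all names and values are illustrative) computes b1,FE, b1,B, and b1,EC via the Fuller–Battese transform, then checks the precision-weighted identity:

```python
# Numerical check that the error components GLS slope is a precision-
# weighted average of the within (fixed effects) and between estimators:
# b_EC = (1 - Delta) b_FE + Delta b_B, with Delta = Var b_EC / Var b_B.
import math, random

random.seed(1)
n, T, sigma2, sigma2_a = 200, 4, 1.0, 0.5
x, y = [], []
for i in range(n):
    a = random.gauss(0, math.sqrt(sigma2_a))
    xi = [random.gauss(0, 1) for _ in range(T)]
    yi = [0.5 + 2.0 * v + a + random.gauss(0, math.sqrt(sigma2)) for v in xi]
    x.append(xi)
    y.append(yi)

xb = [sum(r) / T for r in x]
yb = [sum(r) / T for r in y]
# within (fixed effects) estimator
swxy = sum((x[i][t] - xb[i]) * (y[i][t] - yb[i]) for i in range(n) for t in range(T))
swxx = sum((x[i][t] - xb[i]) ** 2 for i in range(n) for t in range(T))
b_fe = swxy / swxx
# between estimator, from the subject means
xg, yg = sum(xb) / n, sum(yb) / n
sbxy = sum((xb[i] - xg) * (yb[i] - yg) for i in range(n))
sbxx = sum((xb[i] - xg) ** 2 for i in range(n))
b_b = sbxy / sbxx
# GLS slope via the Fuller-Battese transform x* = x - psi * xbar_i
psi = 1 - math.sqrt(sigma2 / (T * sigma2_a + sigma2))
xs = [x[i][t] - psi * xb[i] for i in range(n) for t in range(T)]
ys = [y[i][t] - psi * yb[i] for i in range(n) for t in range(T)]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b_ec = (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
        / sum((u - mx) ** 2 for u in xs))
# Maddala's decomposition with precision weights
w_fe = swxx / sigma2                     # 1 / Var b1,FE
w_b = sbxx / (sigma2_a + sigma2 / T)     # 1 / Var b1,B
delta = w_b / (w_fe + w_b)               # Delta = Var b1,EC / Var b1,B
print(b_ec, (1 - delta) * b_fe + delta * b_b)   # the two agree
```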
A special case
• To express the relationship between αi and xi, we consider E[αi | xi].
• Specifically, we assume that αi = ηi + γ x̄i,1, where {ηi} is i.i.d.
• Thus, the model of correlated effects is
yit = ηi + β0 + β1 xit,1 + γ x̄i,1 + εit.
– Surprisingly, one can show that the generalized least squares estimator of β1 is b1,FE.
– Intuitively, by considering deviations from the time-series means, yit − ȳi, the fixed effects estimator “sweeps out” all time-constant omitted variables.
• In this sense, the fixed effects estimator is robust to this type of model aberration.
• Under the correlated effects model,
– the estimator b1,FE is unbiased, consistent and asymptotically normal.
– the estimators b1,HOM, b1,B and b1,EC are biased and inconsistent.
A special case
• To test whether to use the random or fixed effects estimator, we need only examine the null hypothesis H0: γ = 0.
• This is customarily done using the Hausman (1978) test statistic
χ²_FE = (b1,EC − b1,FE)² / (Var b1,FE − Var b1,EC).
• This test statistic has an asymptotic (as n → ∞) chi-square distribution with 1 degree of freedom.
– If the test statistic is large, go with the fixed effects estimator.
– If the test statistic is small, go with the random effects estimator.
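In the scalar case the statistic is a one-line computation; the numbers below are hypothetical, not from the case study:

```python
# Sketch of the scalar Hausman (1978) statistic: (b_EC - b_FE)^2 divided
# by (Var b_FE - Var b_EC), compared against a chi-square(1) critical value.

def hausman_chi2(b_ec, b_fe, var_fe, var_ec):
    """Scalar Hausman statistic; valid when var_fe > var_ec."""
    return (b_ec - b_fe) ** 2 / (var_fe - var_ec)

# hypothetical estimates and variances
stat = hausman_chi2(b_ec=1.10, b_fe=1.25, var_fe=0.010, var_ec=0.004)
print(round(stat, 3))   # 3.75: below 3.84 (5% chi-square(1)), favor random effects
```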
General Case
• Assume that E αi = α and re-write the model as
yi = Zi α + Xi β + εi*,
– where εi* = εi + Zi (αi − α) and
– Var εi* = Zi D Zi′ + Ri = Vi.
– This re-writing is necessary to make the betas under the fixed and random effects formulations comparable.
• With the appropriate definitions, the extension of Maddala’s (1971) result is
bGLS = (I − Δ) bFE + Δ bB,
• where
Δ = Var bGLS (Var bB)⁻¹.
• The extension of the Hausman test statistic is
χ²_FE = (bFE − bGLS)′ (Var bFE − Var bGLS)⁻¹ (bFE − bGLS).
Case study: Income tax payments
• Consider the model with variable intercepts but no variable slopes (error components).
– The test statistic is χ²_FE = 6.021.
– With K = 8 degrees of freedom, the p-value associated with this test statistic is Prob(χ² > 6.021) = 0.6448.
• For the model with variable intercepts and two variable slopes:
– The test statistic is χ²_FE = 13.628.
– With K = 8 degrees of freedom, the p-value associated with this test statistic is Prob(χ² > 13.628) = 0.0341.
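A p-value like the first one above can be checked against the closed-form chi-square survival function for even degrees of freedom, Prob(χ²₂ₘ > x) = e^{−x/2} Σ_{k=0}^{m−1} (x/2)^k / k!; a short Python sketch:

```python
# Sketch: closed-form chi-square upper-tail probability for even degrees
# of freedom, used to check the reported p-value for the error components
# model (test statistic 6.021 with 8 degrees of freedom).
import math

def chi2_sf_even(x, df):
    """Prob(chi-square with even df exceeds x)."""
    assert df % 2 == 0
    u, term, total = x / 2, 1.0, 0.0
    for k in range(df // 2):
        total += term           # accumulate (x/2)^k / k!
        term *= u / (k + 1)
    return math.exp(-u) * total

print(round(chi2_sf_even(6.021, 8), 4))   # ~0.6449, matching the 0.6448 above
```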
7.3 Omitted Variables
• I call these models of “correlated effects.”
• Section 7.2 described the Hausman/Mundlak model of
time-constant omitted variables.
• Chamberlain (1982) – an alternative hypothesis
– Omitted variables need not be time-constant
• Hausman and Taylor (1981) – another alternative hypothesis
– Some of the explanatory variables are not correlated with αi.
• To estimate these models, Arellano (1993) used an “augmented” regression model. We will also use this approach.
• For a different approach, Stata has programmed an instrumental variable approach introduced by Hausman and Taylor as well as Amemiya and MaCurdy.
Unobserved Variables Models
• Introduced by Palta and Yao (1991) and coworkers.
• Let oi =(zi´, xi´)´ be observed variables and ui be
“unobserved” variables.
• Assuming multivariate normality, we can express:
E[yi | αi, oi, ui] = Zi αi + Xi β + Ui γ
E[yi | αi, oi] = Zi αi + Xi β + (Ii ⊗ γ′) Σuo Σo⁻¹ (oi − E oi)
Var[yi | αi, oi] = Ri + (Ii ⊗ γ′)(Σu − Σuo Σo⁻¹ Σuo′)(Ii ⊗ γ)
Unobserved Variables Models
• The unobserved variables enter the likelihood through:
– linear conditional expectations (γ)
– correlation between observed and unobserved variables (Σuo)
• The fixed effects estimator may be biased, unlike the
correlated effects model case.
• By examining certain special cases, we again arrive at the
Mundlak, Chamberlain and Hausman/Taylor alternative
hypotheses.
• Other alternatives are also of interest. Specifically, an
“extended” Mundlak alternative is:
E[yit | αi, oi] = β0* + zit′ αi + xit′ β + z̄i′ γ1* + x̄i′ γ2*
Examples of Correlated Effects Models
• Assuming q = 1 and zit = 1, this is Chamberlain’s alternative.
– Chamberlain used the hypothesis H1: αi = ηi + Σ_{t=1}^T xit′ γt.
– Thus,
E[yit | xi] = β0* + xit′ β + Σ_{t=1}^T xit′ γt.
• Assume an error components design for the x’s. That is,
Cov(xis, xit) = Σx,1 + Σx,2 for s = t, and Σx,2 for s ≠ t.
– Assuming q = 1 and zit = 1, this is Mundlak’s alternative. That is,
E[yit | xi] = β0* + xit′ β + x̄i′ γ.
– Further assume that the first K − r x-variables are uncorrelated with αi. This is the Hausman/Taylor alternative.
Augmented Regression Estimation and Testing
• I advocate the “augmented” regression approach that uses the model E[yi | αi, oi] = Zi αi + Xi β + Gi γ.
– Random slopes αi that do not affect the conditional regression function.
– Thus, E[yi | oi] = Xi β + Gi γ.
– Choose Gi = G(Xi, Zi) to be a known function of the observed effects.
– Choice of G depends on the alternative model you consider.
– The test for omitted variables is thus H0: γ = 0.
• Define bAR and γ̂AR to be the corresponding weighted least squares estimators.
Some Results
• The estimator bAR is unbiased (even in the presence of
omitted variables).
• The weights corresponding to GLS (Wi = Vi) and
Gi = Zi (Zi′ Ri⁻¹ Zi)⁻¹ Zi′ Ri⁻¹ Xi
– yield bAR = bFE.
– This is an extension of Mundlak’s alternative.
– The chi-square test for H0: γ = 0 is:
χ²_AR = γ̂AR′ (Var γ̂AR)⁻¹ γ̂AR.
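Under Mundlak's alternative, the augmented regression amounts to adding the subject means x̄i as extra regressors and testing their coefficient γ against zero. A Python sketch on simulated data (all parameter values illustrative): the slope on xit stays near the within value while γ picks up the correlated effect.

```python
# Sketch of the augmented-regression idea: regress y on (1, x_it, xbar_i).
# The coefficient on x_it matches the fixed effects (within) slope, and the
# coefficient on xbar_i estimates the correlated-effects term gamma.
import random

random.seed(2)
n, T = 300, 4
rows = []  # each row: (1, x_it, xbar_i, y_it)
for i in range(n):
    xi = [random.gauss(0, 1) for _ in range(T)]
    xb = sum(xi) / T
    a_i = 0.8 * xb + random.gauss(0, 0.5)      # alpha_i correlated with xbar_i
    for xit in xi:
        rows.append((1.0, xit, xb, a_i + 2.0 * xit + random.gauss(0, 1)))

def ols3(rows):
    """OLS with three regressors via Gauss-Jordan on the normal equations."""
    p = 3
    A = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    b = [sum(r[j] * r[3] for r in rows) for j in range(p)]
    for c in range(p):
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        b[c] /= piv
        for rix in range(p):
            if rix != c:
                f = A[rix][c]
                A[rix] = [v - f * w for v, w in zip(A[rix], A[c])]
                b[rix] -= f * b[c]
    return b

b0, b_x, gamma = ols3(rows)
print(round(b_x, 2), round(gamma, 2))  # b_x near the within slope 2; gamma nonzero
```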
Determinants of Tax Liability
• I examine a 4% sample (n = 258) of taxpayers from the University of Michigan/Ernst & Young Tax Data Base
– The panel consists of tax years 1982-84, 86, 87
• For the tax liability data, we use
– xit′ β – a linear function of demographic and earning characteristics of a taxpayer
– zit′ αi = α1i + α2i LNTPIit + α3i MRit
– yit = logarithmic tax liability for the ith taxpayer in the tth year
Empirical Fits
• I present fits of four different models
– Random effects
• includes variable intercept plus two variable slopes
– Random effects + omitted variable corrections
– Random coefficients
• with an AR(1) parameter
– Random coefficients + omitted variable corrections (the “extended Mundlak alternative”)
Results
• Section 7.2 indicated, with only variable intercepts, that the
fixed effects estimator is preferable to the random effects
estimator.
• For random effects,
– two additional variable slope terms were useful
– because the random coefficients model did not yield a positive definite estimate of Var α, I used a third-order factor analytic model
• New tests indicate that both the fixed effects model with 3 variable components and the extended Mundlak model are preferable to the random effects model with 3 variable components
• Comparing the fixed effects model with 3 variable components and the extended Mundlak model, the AIC favors the former, yet I advocate the latter (parsimony and so on).
7.4 Sampling, selectivity bias and attrition
• 7.4.1 Incomplete and rotating panels
– Early longitudinal and panel data methods assumed
balanced data, that is, Ti = T.
– This suggests techniques from multivariate analysis.
– Data may not be available due to:
• Delayed entry
• Early exit
• Intermittent nonresponse
– If planned, then there is generally no difficulty.
• See the text for the algebraic transformation needed.
• Planned incomplete data is the norm in panel surveys of
people.
7.4.2 Unplanned nonresponse
• Types of panel survey nonresponse (source
Verbeek and Nijman, 1996)
– Initial nonresponse. A subject contacted cannot, or will
not, participate. Because of limited information, this
potential problem is often ignored in the analysis.
– Unit nonresponse. A subject contacted cannot, or will
not, participate even after repeated attempts (in
subsequent waves) to include the subject.
– Wave nonresponse. A subject does not respond for one
or more time periods but does respond in the preceding
and subsequent times (for example, the subject may be
on vacation).
– Attrition. A subject leaves the panel after participating
in at least one survey.
Missing data models
• Let rij be an indicator variable for the ijth
observation, with a one indicating that this
response is observed and a zero indicating that the
response is missing.
• Let ri = (ri1, …, riT) and r = (r1, …, rn).
• The interest is in whether or not the responses
influence the missing data mechanism.
• Let yi = (yi1, …, yiT) be the vector of all potentially observed responses for the ith subject, and Y = (y1, …, yn) be the collection of all potentially observed responses.
Rubin’s (1976) Missing data models
• Missing completely at random (MCAR).
– The case where Y does not affect the distribution of r.
– Specifically, the missing data are MCAR
if f(r | Y) = f(r), where f(.) is a generic probability mass
function.
• Little (1995) - the adjective “covariate dependent”
is added when
– Y does not affect the distribution of r, conditional on
the covariates.
– If the covariates are summarized as {X, Z}, then the
condition corresponds to the relation f(r | Y, X, Z) = f(r|
X, Z).
– Example: x = age, y = income. Missingness could appear to vary by income but is really driven by age (young people don’t respond).
General advice on missing at random
• One option is to treat the available data as if nonresponses
were planned and use unbalanced estimation techniques.
• Another option is to utilize only subjects with a complete
set of observations by discarding observations from
subjects with missing responses.
• A third option is to impute values for missing responses.
• Little and Rubin note that each option is generally easy to
carry out and may be satisfactory with small amounts of
missing data.
– However, the second and third options may not be efficient.
– Further, each option implicitly relies heavily on the MCAR
assumption.
Selection Model
• Partition the Y vector into observed and missing components: Y
= (Yobs, Ymiss).
• Selection model is given by f(r | Y).
• With parameters θ and ψ, assume that the log likelihood of the
observed random variables is
L(θ,ψ) = log f(r, Yobs,θ,ψ) = log f(Yobs,θ) + log f(r | Yobs,ψ).
• MCAR case
– f(r | Yobs,ψ) = f(r | ψ) does not depend on Yobs.
• Data missing at random (MAR)
– if selection mechanism model distribution does not depend on Ymiss but
may depend on Yobs.
– That is, f(r | Y) = f(r | Yobs).
• For both MAR and MCAR, the likelihood may be maximized over the parameters separately for each term.
– For inference about θ, the selection model mechanism may be “ignored.”
– MAR and MCAR are referred to as the ignorable case.
Example – Income tax payments
• Let y = tax liability and x = income.
• The taxpayer is not selected (missing) with a fixed probability.
– The selection mechanism is MCAR.
• The taxpayer is not selected if tax liability < $100.
– The selection mechanism depends on the observed and missing response. The selection mechanism cannot be ignored.
• The taxpayer is not selected if income < $20,000.
– The selection mechanism is MCAR, covariate dependent.
– Assuming that the purpose of the analysis is to understand tax liabilities conditional on knowledge of income, stratifying based on income does not seriously bias the analysis.
• The probability of a taxpayer being selected decreases with tax liability. For example, suppose the probability of being selected is logit(−yi).
– In this case, the selection mechanism depends on the observed and missing response. The selection mechanism cannot be ignored.
• The taxpayer is followed over T = 2 periods. In the second period, a taxpayer is not selected if the first period tax < $100.
– The selection mechanism is MAR. That is, the selection mechanism is based on an observed response.
Example - correction for selection bias
• Historical heights. y = the height of men recruited to serve
in the military.
– The sample is subject to censoring in that minimum height
standards were imposed for admission to the military.
– The selection mechanism is non-ignorable because it depends on
the individual’s height.
• The joint distribution for observables is
f(r, Yobs, θ, ψ) = f(Yobs, θ) f(r | Yobs, ψ)
= [Π_{i=1}^n f(yi | yi ≥ c) Prob(yi ≥ c)] × [Prob(y < c)]^m.
– This is easy to maximize in μ and σ.
• If one ignored the censoring mechanism, then the “log likelihood” is
Σ_{i=1}^n log [ (1/σ) φ((yi − μ)/σ) ].
• MLEs based on this are different, and biased.
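A simulation sketch of this bias (assumed values μ = 170, σ = 8, cutoff c = μ, all illustrative): the naive mean of the observed heights matches the truncated-normal mean μ + σφ(a)/(1 − Φ(a)) rather than μ, so the selection mechanism cannot be ignored.

```python
# Sketch: when y is observed only if y >= c, the naive sample mean
# estimates mu + sigma * phi(a) / (1 - Phi(a)) with a = (c - mu)/sigma,
# not mu itself.
import math, random

def phi(a):  # standard normal density
    return math.exp(-a * a / 2) / math.sqrt(2 * math.pi)

def Phi(a):  # standard normal cdf
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

random.seed(3)
mu, sigma, c = 170.0, 8.0, 170.0     # minimum height standard at c = mu
sample = [random.gauss(mu, sigma) for _ in range(200000)]
observed = [v for v in sample if v >= c]

naive = sum(observed) / len(observed)
a = (c - mu) / sigma
predicted = mu + sigma * phi(a) / (1 - Phi(a))   # mu + 0.7979 sigma here
print(round(naive, 2), round(predicted, 2))      # both well above mu = 170
```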
Non-ignorable missing data
• There are many models of missing data
mechanisms - see Little and Rubin (1987).
• Heckman two-stage procedure
– Heckman (1976) developed for cross-sectional data but
also applicable to fixed effects panel data models.
– Thus, use
yit = αi + xit′ β + εit.
– Further, assume that the sampling response mechanism is governed by the latent (unobserved) variable rit*,
rit* = wit′ γ + ηit.
– We observe
rit = 1 if rit* > 0, and 0 otherwise.
• Assume {yit, rit*} is multivariate normal to get
E(yit | rit* > 0) = αi + xit′ β + βλ λ(wit′ γ),
• where βλ = Cov(εit, ηit) (taking Var ηit = 1) and λ(a) = φ(a)/Φ(a).
(a)
• Heckman’s two-step procedure
– Use the data {(rit, wit)} and a probit regression model to estimate γ. Call this estimator gH.
– Use the estimator gH to create a new explanatory variable, xit,K+1 = λ(wit′ gH).
• Run a one-way fixed effects model using the K explanatory variables xit as well as the additional explanatory variable xit,K+1.
– To test for selection bias, test H0: βλ = 0.
– Estimates of βλ give a “correction for the selection bias.”
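The correction term is built from the inverse Mills ratio λ(a) = φ(a)/Φ(a); a minimal Python sketch of step 2's added regressor (the fitted index value below is hypothetical):

```python
# Sketch of the correction term in Heckman's two-step procedure: the
# inverse Mills ratio lambda(a) = phi(a)/Phi(a), evaluated at the fitted
# probit index w'g_H, becomes the extra regressor x_{it,K+1}.
import math

def inverse_mills(a):
    """phi(a) / Phi(a) for the standard normal distribution."""
    phi = math.exp(-a * a / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(a / math.sqrt(2)))
    return phi / Phi

# Step 1 (not shown): a probit of r_it on w_it gives g_H.
# Step 2: augment the fixed effects regression with lambda(w'g_H).
fitted_index = 0.0                       # hypothetical value of w_it' g_H
x_extra = inverse_mills(fitted_index)
print(round(x_extra, 4))                 # phi(0)/Phi(0) = 0.3989/0.5
```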
Hausman and Wise procedure
• Use an error components model,
yit = αi + xit′ β + εit.
• The sampling response mechanism is governed by the latent variable error components model
rit* = ξi + wit′ γ + ηit.
• The variances are:
Var (εit, αi, ηit, ξi)′ =
[ σ²   0     0     0
  0    σα²   0     σαξ
  0    0     ση²   0
  0    σαξ   0     σξ² ]
• If σαξ = 0, then the selection process is independent of the observation process.
Hausman and Wise procedure
• Again, assume joint normality. With this assumption, one can check that:
E(yit | ri) = xit′ β + [σαξ / (T σξ² + ση²)] Σ_{s=1}^T gis,
• where git = E(ξi + ηit | ri).
– Calculating this quantity is computationally intensive, requiring numerical integration of multivariate normals.
• If σαξ = 0, then E(yit | ri) = xit′ β.