Longitudinal Data Analysis and Actuarial Science


Chapter 7 - Modeling Issues
• 7.1 Heterogeneity
• 7.2 Comparing fixed and random effects
estimators
• 7.3 Omitted variables
– Models of omitted variables
– Augmented regression estimation
• 7.4 Sampling, selectivity bias, attrition
– Incomplete and rotating panels
– Unplanned nonresponse
– Non-ignorable missing data
7.1 Heterogeneity
• Also think of clustering
– different observations from the same subject
(observational unit) tend to be related to one
another.
• Methods for handling this
– variables in common
– jointly dependent distribution functions
Variables in Common
• These are latent (unobserved) variables
• May be fixed (mean structure) or random
(covariance structure)
• May be organized by the cross-section or by
time
– If by cross-section, may be subject oriented (by
i) or spatial
• May be nested
Jointly Dependent Distribution
Functions
• For the covariance structures, this is a more
general way to think about random
variables that are common to subjects.
• Also includes additional structures not
suggested by the common variable approach
– Example: For the error components model, we have
Corr (y_i1, y_i2) = ρ = σ_α² / (σ_α² + σ²), where ρ > 0. However,
we need not require positive correlations for a
general uniform correlation model.
Practical Identification with Heterogeneity
may be difficult – Jones (1993) Example
[Figure: three time series panels (Subject 1, Subject 2, Subject 3),
each plotting a response between −4 and 6 against Time (t = 5 to 25),
illustrating that heterogeneity can be difficult to identify in
practice.]
Theoretical Identification with Heterogeneity may be
Impossible – Neyman–Scott (1948) Example
• Example - Identification of variance components
• Consider the fixed effects panel data model
  y_it = α_i + ε_it , i = 1, …, n, t = 1, 2,
• where Var ε_it = σ² and Cov (ε_i1, ε_i2) = σ²ρ.
• The ordinary least squares estimator of α_i is ȳ_i = (y_i1 + y_i2)/2.
• Thus, the residuals are
  e_i1 = y_i1 − ȳ_i = (y_i1 − y_i2)/2 and e_i2 = y_i2 − ȳ_i = (y_i2 − y_i1)/2 = −e_i1 .
• Thus, ρ cannot be estimated, despite having 2n − n = n
degrees of freedom available for estimating the variance
components.
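To see the Neyman–Scott point numerically, here is a minimal simulation sketch (not from the text; the simulated data and the choice of fixed effects are illustrative assumptions). Whatever the true ρ, the T = 2 residuals satisfy e_i2 = −e_i1, so their sample correlation is always −1 and carries no information about ρ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 5000, 2.0

for rho in (0.0, 0.4, 0.8):
    # draw (eps_i1, eps_i2) with variance sigma2 and correlation rho
    cov = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    alpha = rng.normal(size=n)            # arbitrary fixed effects alpha_i
    y = alpha[:, None] + eps              # y_it = alpha_i + eps_it, T = 2
    ybar = y.mean(axis=1, keepdims=True)  # OLS estimator of alpha_i
    e = y - ybar                          # residuals: e_i2 = -e_i1 exactly
    print(rho, np.corrcoef(e[:, 0], e[:, 1])[0, 1])  # always -1: rho is lost
```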
Estimation of regression coefficients without
complete identification is possible
• If our main goal is to estimate or test hypotheses about the regression
coefficients, then we do not require knowledge of all aspects of the
model.
• For example, consider the one-way fixed effects model
  y_i = α_i 1_i + X_i β + ε_i .
• Apply the common transformation matrix Q = I − T⁻¹ J to each
equation to get
  y_i* = Q y_i = Q X_i β + Q ε_i = X_i* β + ε_i* ,
• because Q 1 = 0.
• Use ols on the transformed data. For T = 2,

  R* = Q R Q = (σ²(1 − ρ)/2) ×
       [  1  −1 ]
       [ −1   1 ] .

• Note that we can estimate the quantity σ²(1 − ρ) yet cannot separate the
terms σ² and ρ.
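A small numeric check of this identification gap (a sketch; the parameter values are chosen only for illustration): two different (σ², ρ) pairs with the same σ²(1 − ρ) produce the same transformed covariance Q R Q.

```python
import numpy as np

T = 2
Q = np.eye(T) - np.ones((T, T)) / T      # Q = I - J/T

def R(sigma2, rho):                      # uniform correlation model
    return sigma2 * ((1 - rho) * np.eye(T) + rho * np.ones((T, T)))

# both pairs satisfy sigma2 * (1 - rho) = 1
print(Q @ R(2.0, 0.5) @ Q)               # same transformed covariance ...
print(Q @ R(4.0, 0.75) @ Q)              # ... so the pair is not identified
```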
7.2 Comparing fixed and random effects
estimators
• Sometimes, the context of the problem does not make clear
the choice of fixed or random effects estimators.
• It is of interest to compare the fixed effects to the random
effects estimators using the data.
• In random effects models, we assume that {α_i} are
independent of {ε_it} .
– Think instead of drawing {x_it} at random and performing inference
conditional on {x_it}.
– Interpret {α_i} to be the unobserved (time-invariant) characteristics of
a subject.
– This assumes that individual effects {α_i} are independent of other
individual characteristics {x_it}, our strict conditional exogeneity
assumption SEC6.
A special case
• Consider the error components model with K = 2 so that
  y_it = α_i + β_0 + β_1 x_it,1 + ε_it .
• We can express (Fuller–Battese) the gls estimator as

  b_1,EC = Σ_{i=1}^n Σ_{t=1}^T (x*_it − x̄*)(y*_it − ȳ*) / Σ_{i=1}^n Σ_{t=1}^T (x*_it − x̄*)²

• where

  x*_it = x_it − x̄_i (1 − (σ² / (T σ_α² + σ²))^{1/2})
  y*_it = y_it − ȳ_i (1 − (σ² / (T σ_α² + σ²))^{1/2}) .

• As σ_α² → 0, we have that x*_it → x_it, y*_it → y_it and b_1,EC → b_1,OLS .
• As σ_α² → ∞, we have that b_1,EC → b_1,FE .
A special case
• Define the so-called “between groups” estimator,

  b_1,B = Σ_{i=1}^n (x̄_i − x̄)(ȳ_i − ȳ) / Σ_{i=1}^n (x̄_i − x̄)²

• This estimator can be motivated by averaging all
observations from a subject and then computing an
ordinary least squares estimator using the data {(x̄_i, ȳ_i), i = 1, …, n}.
• The following decomposition is due to Maddala (1971):
  b_1,EC = (1 − Δ) b_1,FE + Δ b_1,B
• where
  Δ = Var b_1,EC / Var b_1,B
• measures the relative precision of the two estimators of β_1 .
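In code, the between groups estimator is a one-liner on subject means. This sketch (illustrative only) can be combined with b1_ec and the within estimator from the earlier snippet to verify Maddala's decomposition numerically once Δ is estimated:

```python
import numpy as np

def b1_between(x, y):
    """'Between groups' slope: OLS through the subject means (xbar_i, ybar_i).
    x and y are (n, T) arrays, as in the earlier sketches."""
    xb, yb = x.mean(axis=1), y.mean(axis=1)
    xd, yd = xb - xb.mean(), yb - yb.mean()
    return np.sum(xd * yd) / np.sum(xd * xd)
```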
A special case
• To express the relationship between α_i and x_i, we consider E [α_i | x_i].
• Specifically, we assume that α_i = η_i + γ x̄_i,1 , where {η_i} is i.i.d.
• Thus, the model of correlated effects is
  y_it = η_i + β_0 + β_1 x_it,1 + γ x̄_i,1 + ε_it .
– Surprisingly, one can show that the generalized least squares
estimator of β_1 is b_1,FE.
– Intuitively, by considering deviations from the time series means,
y_it − ȳ_i , the fixed effects estimator “sweeps out” all time-constant
omitted variables.
• In this sense, the fixed effects estimator is robust to this type of model
aberration.
• Under the correlated effects model,
– the estimator b_1,FE is unbiased, consistent and asymptotically
normal.
– the estimators b_1,HOM, b_1,B and b_1,EC are biased and inconsistent.
A special case
• To test whether to use the random or the fixed effects estimator,
we need only examine the null hypothesis H_0: γ = 0.
• This is customarily done using the Hausman (1978) test
statistic

  χ²_FE = (b_1,EC − b_1,FE)² / (Var b_1,FE − Var b_1,EC)

• This test statistic has an asymptotic (as n → ∞) chi-square
distribution with 1 degree of freedom.
– If the test statistic is large, go with the fixed effects estimator.
– If the test statistic is small, go with the random effects estimator.
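In code, the scalar version of this test is immediate. A sketch with hypothetical inputs (the numbers below are made up for illustration):

```python
from scipy import stats

def hausman_1d(b_ec, var_ec, b_fe, var_fe):
    """Hausman (1978) statistic for a single slope. Under H0 the random
    effects estimator is efficient, so var_fe - var_ec > 0."""
    chi2 = (b_ec - b_fe) ** 2 / (var_fe - var_ec)
    return chi2, stats.chi2.sf(chi2, df=1)   # statistic and p-value

# hypothetical numbers, for illustration only
print(hausman_1d(b_ec=0.52, var_ec=0.0009, b_fe=0.58, var_fe=0.0016))
```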
General Case
• Assume that E α_i = α and re-write the model as
  y_i = Z_i α + X_i β + ε_i* ,
– where ε_i* = ε_i + Z_i (α_i − α) and
– Var ε_i* = Z_i D Z_i′ + R_i = V_i .
– This re-writing is necessary to make the betas under the fixed and
random effects formulations comparable.
• With the appropriate definitions, the extension of
Maddala’s (1971) result is
  b_GLS = (I − Δ) b_FE + Δ b_B
• where
  Δ = Var b_GLS (Var b_B)⁻¹
• The extension of the Hausman test statistic is
  χ²_FE = (b_FE − b_GLS)′ (Var b_FE − Var b_GLS)⁻¹ (b_FE − b_GLS)
Case study: Income tax payments
• Consider the model with Variable Intercepts but no
Variable Slopes (Error Components)
– The test statistic is χ²_FE = 6.021.
– With K = 8 degrees of freedom, the p-value associated with this
test statistic is Prob(χ² > 6.021) = 0.6448.
• For the model with Variable Intercepts and two Variable
Slopes
– The test statistic is χ²_FE = 13.628.
– With 6 (= K − 2) degrees of freedom, since two of the slopes are
now variable, the p-value associated with this test statistic is
Prob(χ² > 13.628) = 0.0341.
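These p-values are easy to verify; a quick check with scipy (note that the second statistic matches a chi-square with 6, not 8, degrees of freedom):

```python
from scipy import stats

print(stats.chi2.sf(6.021, df=8))    # approx 0.6448, the first test
print(stats.chi2.sf(13.628, df=6))   # approx 0.0341, the second test
```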
7.3 Omitted Variables
• I call these models of “correlated effects.”
• Section 7.2 described the Hausman/Mundlak model of
time-constant omitted variables.
• Chamberlain (1982) – an alternative hypothesis
– Omitted variables need not be time-constant
• Hausman and Taylor (1981) – another alternative
hypothesis
– Some of the explanatory variables are not correlated
with α_i .
• To estimate these models, Arellano (1993) used an
“augmented” regression model. We will also use this
approach.
• For a different approach, Stata has programmed an
instrumental variable approach introduced by Hausman
and Taylor as well as Amemiya and MaCurdy.
Unobserved Variables Models
• Introduced by Palta and Yao (1991) and coworkers.
• Let oi =(zi´, xi´)´ be observed variables and ui be
“unobserved” variables.
• Assuming multivariate normality, we can express:
  E (y_i | α_i, o_i, u_i) = Z_i α_i + X_i β + U_i γ
  E (y_i | α_i, o_i) = Z_i α_i + X_i β + (I_i ⊗ γ)′ Σ_uo Σ_o⁻¹ (o_i − E o_i)
  Var (y_i | α_i, o_i) = R_i + (I_i ⊗ γ)′ (Σ_u − Σ_uo Σ_o⁻¹ Σ_uo′) (I_i ⊗ γ)
Unobserved Variables Models
• The unobserved variables enter the likelihood through:
– linear conditional expectations (γ)
– correlation between observed and unobserved variables
(Σ_uo)
• The fixed effects estimator may be biased, unlike the
correlated effects model case.
• By examining certain special cases, we again arrive at the
Mundlak, Chamberlain and Hausman/Taylor alternative
hypotheses.
• Other alternatives are also of interest. Specifically, an
“extended” Mundlak alternative is:
  E (y_it | α_i, o_i) = β_0* + z_it′ α_i + x_it′ β + z̄_i′ γ_1* + x̄_i′ γ_2*
Examples of Correlated Effects Models
• Assuming q = 1 and z_it = 1, this is Chamberlain’s
alternative.
– Chamberlain used the hypothesis H_1: α_i = η_i + Σ_{t=1}^T x_it′ γ_t
– Thus,
  E (y_it | x_i) = β_0* + x_it′ β + Σ_{t=1}^T x_it′ γ_t
• Assume an error components design for the x’s.
That is,
  Cov (x_is, x_it) = Σ_x,1 + Σ_x,2 for s = t, and Σ_x,2 for s ≠ t.
– Assuming q = 1 and z_it = 1, this is Mundlak’s
alternative. That is,
  E (y_it | x_i) = β_0* + x_it′ β + x̄_i′ γ
– Further assume that the first K − r x-variables are
uncorrelated with α_i . This is the Hausman/Taylor
alternative.
Augmented Regression Estimation and Testing
• I advocate the “augmented” regression approach that uses
the model: E [y_i | α_i, o_i] = Z_i α_i + X_i β + G_i γ .
– Random slopes α_i that do not affect the conditional
regression function.
– Thus, E [y_i | o_i] = X_i β + G_i γ .
– Choose G_i = G(X_i, Z_i) to be a known function of the
observed effects.
– The choice of G depends on the alternative model you
consider.
– The test for omitted variables is thus H_0: γ = 0.
• Define b_AR and γ̂_AR to be the corresponding weighted least
squares estimators.
Some Results
• The estimator b_AR is unbiased (even in the presence of
omitted variables).
• The weights corresponding to gls (W_i = V_i⁻¹) and
  G_i = Z_i (Z_i′ R_i⁻¹ Z_i)⁻¹ Z_i′ R_i⁻¹ X_i
– yield b_AR = b_FE.
– This is an extension of Mundlak’s alternative.
– The chi-square test for H_0: γ = 0 is:
  χ²_AR = γ̂_AR′ (Var γ̂_AR)⁻¹ γ̂_AR
Determinants of Tax Liability
• I examine a 4% sample (258 taxpayers) from
the University of Michigan/Ernst & Young Tax
Data Base
– The panel consists of tax years 1982–84, 1986 and 1987
• For the tax liability data, we use
– x_it′ β = a linear function of demographic and
earnings characteristics of a taxpayer
– z_it′ α_i = α_1i + α_2i LNTPI_it + α_3i MR_it
– y_it = logarithmic tax liability for the ith taxpayer
in the tth year
Empirical Fits
• I present fits of four different models
– Random effects
• includes a variable intercept plus two variable slopes
• plus omitted-variable corrections
– Random coefficients
• with an AR(1) parameter
• plus omitted-variable corrections (the “extended
Mundlak alternative”)
Results
• Section 7.2 indicated, with only variable intercepts, that the
fixed effects estimator is preferable to the random effects
estimator.
• For random effects,
– two additional variable slope terms were useful
– the random coefficients model did not yield a positive
definite estimate of Var α, so I used a third-order factor
analytic model
• New tests indicate that both the fixed effects model with 3
variable components and the extended Mundlak model are
preferable to the random effects model with 3 variable
components
• Comparing the fixed effects model with 3 variable components
and the extended Mundlak model, the AIC favors the
former, yet I advocate the latter (parsimony and so on).
7.4 Sampling, selectivity bias and attrition
• 7.4.1 Incomplete and rotating panels
– Early longitudinal and panel data methods assumed
balanced data, that is, Ti = T.
– This suggests techniques from multivariate analysis.
– Data may not be available due to:
• Delayed entry
• Early exit
• Intermittent nonresponse
– If planned, then there is generally no difficulty.
• See the text for the algebraic transformation needed.
• Planned incomplete data is the norm in panel surveys of
people.
7.4.2 Unplanned nonresponse
• Types of panel survey nonresponse (source:
Verbeek and Nijman, 1996)
– Initial nonresponse. A subject contacted cannot, or will
not, participate. Because of limited information, this
potential problem is often ignored in the analysis.
– Unit nonresponse. A subject contacted cannot, or will
not, participate even after repeated attempts (in
subsequent waves) to include the subject.
– Wave nonresponse. A subject does not respond for one
or more time periods but does respond in the preceding
and subsequent times (for example, the subject may be
on vacation).
– Attrition. A subject leaves the panel after participating
in at least one survey.
Missing data models
• Let rij be an indicator variable for the ijth
observation, with a one indicating that this
response is observed and a zero indicating that the
response is missing.
• Let ri = (ri1, …, riT) and r = (r1, …, rn).
• The interest is in whether or not the responses
influence the missing data mechanism.
• Let y_i = (y_i1, …, y_iT) be the vector of all
potentially observed responses for the ith subject.
• Let Y = (y_1, …, y_n) be the collection of all
potentially observed responses.
Rubin’s (1976) Missing data models
• Missing completely at random (MCAR).
– The case where Y does not affect the distribution of r.
– Specifically, the missing data are MCAR
if f(r | Y) = f(r), where f(.) is a generic probability mass
function.
• Little (1995) - the adjective “covariate dependent”
is added when
– Y does not affect the distribution of r, conditional on
the covariates.
– If the covariates are summarized as {X, Z}, then the
condition corresponds to the relation f(r | Y, X, Z) = f(r|
X, Z).
– Example: x = age, y = income. Missingness may appear to
vary with income but is really driven by age (young people
don’t respond).
General advice on missing at random
• One option is to treat the available data as if nonresponses
were planned and use unbalanced estimation techniques.
• Another option is to utilize only subjects with a complete
set of observations by discarding observations from
subjects with missing responses.
• A third option is to impute values for missing responses.
• Little and Rubin note that each option is generally easy to
carry out and may be satisfactory with small amounts of
missing data.
– However, the second and third options may not be efficient.
– Further, each option implicitly relies heavily on the MCAR
assumption.
Selection Model
• Partition the Y vector into observed and missing components: Y
= (Yobs, Ymiss).
• Selection model is given by f(r | Y).
• With parameters θ and ψ, assume that the log likelihood of the
observed random variables is
L(θ,ψ) = log f(r, Yobs,θ,ψ) = log f(Yobs,θ) + log f(r | Yobs,ψ).
• MCAR case
– f(r | Yobs,ψ) = f(r | ψ) does not depend on Yobs.
• Data missing at random (MAR)
– if selection mechanism model distribution does not depend on Ymiss but
may depend on Yobs.
– That is, f(r | Y) = f(r | Yobs).
• For both MAR and MCAR, the likelihood may be maximized
over the parameters separately, term by term.
– For inference about θ, the selection model mechanism may be “ignored.”
– MAR and MCAR are therefore referred to as the ignorable case.
Example – Income tax payments
• Let y = tax liability and x = income.
• The taxpayer is not selected (missing) with a probability that is the same for
all taxpayers.
– The selection mechanism is MCAR.
• The taxpayer is not selected if tax liability < $100.
– The selection mechanism depends on the observed and missing response. The
selection mechanism cannot be ignored.
• The taxpayer is not selected if income < $20,000.
– The selection mechanism is MCAR, covariate dependent.
– Assuming that the purpose of the analysis is to understand tax liabilities conditional
on knowledge of income, stratifying based on income does not seriously bias the
analysis.
• The probability of a taxpayer being selected decreases with tax liability. For example,
suppose the probability of being selected is logit(−y_i).
– In this case, the selection mechanism depends on the observed and missing
response. The selection mechanism cannot be ignored.
• The taxpayer is followed over T = 2 periods. In the second period, a taxpayer is not
selected if the first period tax < $100.
– The selection mechanism is MAR. That is, the selection mechanism is based on an
observed response.
Example - correction for selection bias
• Historical heights. y = the height of men recruited to serve
in the military.
– The sample is subject to censoring in that minimum height
standards were imposed for admission to the military.
– The selection mechanism is non-ignorable because it depends on
the individual’s height.
• The joint distribution for observables is
  f(r, Y_obs, μ, σ) = f(Y_obs, μ, σ) × f(r | Y_obs)
    = [ Π_{i=1}^n f(y_i | y_i ≥ c) Prob(y_i ≥ c) ] × [Prob(y < c)]^m ,
  where m is the number of men falling below the minimum standard.
– This is easy to maximize in μ and σ.
• If one ignored the censoring mechanism, then the “log
likelihood” is
  Σ_{i=1}^n log [ (1/σ) φ((y_i − μ)/σ) ] .
• MLEs based on this are different, and biased.
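A runnable sketch of the corrected likelihood, under the simplifying assumption that the number of rejected men is unknown, so we maximize the conditional (truncated) likelihood f(y_i | y_i ≥ c) and drop the Prob(y < c)^m factor; the simulated heights are illustrative only:

```python
import numpy as np
from scipy import stats, optimize

def fit_truncated_normal(y_obs, c):
    """MLE of (mu, sigma) when only y >= c is observed (heights example).
    Each observed density is phi((y-mu)/sigma)/sigma / Prob(y >= c)."""
    def negloglik(par):
        mu, log_sigma = par
        sigma = np.exp(log_sigma)                  # keep sigma positive
        ll = stats.norm.logpdf(y_obs, mu, sigma) \
             - stats.norm.logsf(c, mu, sigma)      # condition on y >= c
        return -ll.sum()
    res = optimize.minimize(negloglik, x0=[y_obs.mean(), np.log(y_obs.std())])
    return res.x[0], np.exp(res.x[1])

rng = np.random.default_rng(2)
y = rng.normal(170.0, 8.0, size=20000)             # hypothetical heights (cm)
y_obs = y[y >= 165.0]                              # minimum height standard
print(fit_truncated_normal(y_obs, 165.0))          # near the true (170, 8)
print(y_obs.mean(), y_obs.std())                   # naive estimates: biased
```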
Non-ignorable missing data
• There are many models of missing data
mechanisms - see Little and Rubin (1987).
• Heckman two-stage procedure
– Heckman (1976) developed for cross-sectional data but
also applicable to fixed effects panel data models.
– Thus, use
  y_it = α_i + x_it′ β + ε_it .
– Further, assume that the sampling response mechanism
is governed by the latent (unobserved) variable r_it* :
  r_it* = w_it′ γ + η_it .
– We observe
  r_it = 1 if r_it* > 0, and r_it = 0 otherwise.
• Assume {y_it, r_it*} is multivariate normal to get
  E (y_it | r_it* > 0) = α_i + x_it′ β + θ λ(w_it′ γ),
• where θ = ρ σ_ε (with ρ = Corr(ε_it, η_it)) and λ(a) = φ(a)/Φ(a).
• Heckman’s two-step procedure
– Use the data {(r_it, w_it)} and a probit regression
model to estimate γ. Call this estimator g_H.
– Use the estimator g_H to create a new
explanatory variable, x_it,K+1 = λ(w_it′ g_H).
• Run a one-way fixed effects model using the K
explanatory variables x_it as well as the additional
explanatory variable x_it,K+1.
– To test for selection bias, test H_0: θ = 0.
– Estimates of θ give a “correction for the
selection bias.”
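A sketch of the two-step procedure using statsmodels, in a pooled (cross-sectional) form for brevity; for the panel version, step 2 would be the one-way fixed effects regression described above. Function and variable names are mine, not from the text.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def heckman_two_step(y, x, r, w):
    """Heckman (1976) two-step sketch.
    y: outcomes (only used where r == 1); x, w: regressor matrices;
    r: 0/1 selection indicators.
    Step 1: probit of r on w gives g_H.
    Step 2: add lambda(w'g_H) to the outcome regression; the t-test on
    its coefficient is the test of H0: theta = 0 (no selection bias)."""
    W = sm.add_constant(w)
    probit = sm.Probit(r, W).fit(disp=0)
    index = W @ probit.params
    imr = stats.norm.pdf(index) / stats.norm.cdf(index)  # inverse Mills ratio
    sel = r == 1
    X = sm.add_constant(np.column_stack([x[sel], imr[sel]]))
    return sm.OLS(y[sel], X).fit()   # last coefficient estimates theta
```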
Hausman and Wise procedure
• Use an error components model,
  y_it = α_i + x_it′ β + ε_it .
• The sampling response mechanism is governed by
the latent variable error components model
  r_it* = ξ_i + w_it′ γ + η_it .
• The variances are:

  Var (ε_it, α_i, ξ_i, η_it)′ =
      [ σ²     0      0      σ_εη ]
      [ 0      σ_α²   σ_αξ   0    ]
      [ 0      σ_αξ   σ_ξ²   0    ]
      [ σ_εη   0      0      σ_η² ]

• If σ_αξ = σ_εη = 0, then the selection process is independent
of the observation process.
Hausman and Wise procedure
• Again, assume joint normality. With this
assumption, one can check that:
  E (y_it | r_i) = x_it′ β + (σ_αξ / (T σ_ξ² + σ_η²)) Σ_{s=1}^T g_is
                 + (σ_εη / σ_η²) [ g_it − (σ_ξ² / (T σ_ξ² + σ_η²)) Σ_{s=1}^T g_is ]

• where g_it = E (ξ_i + η_it | r_i).
– Calculating this quantity is computationally intensive, requiring
numerical integration of multivariate normals.
• If σ_αξ = σ_εη = 0, then E (y_it | r_i) = x_it′ β.