Partially missing at random and
ignorable inferences for parameter
subsets with missing data
Roderick Little
Outline
• Survey Bayesics in three slides
• Inference with missing data: Rubin's (1976)
paper on conditions for ignoring the missing-data mechanism
• Rubin’s standard conditions are sufficient but
not necessary: example
• Propose definitions of MAR, ignorability for
likelihood (and Bayes) inference for subsets of
parameters
• Examples
• Joint work with Sahar Zangeneh
Graybill Conference: Partially Missing at Random
Calibrated Bayes
– Frequentists should be Bayesian
• Bayes is optimal under assumed model
– Bayesians should be frequentist
• We never know the model (and all models are wrong)
• Inferences should have good repeated sampling
characteristics
– Calibrated Bayes (e.g. Box 1980, Rubin 1984, Little 2012)
• Inference based on a Bayesian model
• Model chosen to yield inferences that are well-calibrated
in a frequentist sense
• Aim for posterior probability intervals that have
(approximately) nominal frequentist coverage
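The calibration aim in the last bullet can be checked by simulation. The sketch below is illustrative only: the binomial model, the Jeffreys Beta(0.5, 0.5) prior, the true proportion, and the sample size are all assumptions chosen for the example, not settings from the talk. It estimates how often a 95% posterior interval covers the true value over repeated samples.

```python
import numpy as np

def jeffreys_interval_coverage(p_true=0.3, n=50, reps=2000, draws=2000, seed=0):
    """Repeated-sampling coverage of a 95% equal-tailed posterior interval.

    Assumed model: x ~ Binomial(n, p), Jeffreys prior p ~ Beta(0.5, 0.5),
    so the posterior is Beta(x + 0.5, n - x + 0.5).
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(n, p_true, size=reps)            # one data set per replicate
    # Posterior draws for every replicate: shape (reps, draws)
    post = rng.beta(x[:, None] + 0.5, n - x[:, None] + 0.5, size=(reps, draws))
    lo, hi = np.quantile(post, [0.025, 0.975], axis=1)
    return float(np.mean((lo <= p_true) & (p_true <= hi)))

coverage = jeffreys_interval_coverage()
print(f"Estimated frequentist coverage of the 95% posterior interval: {coverage:.3f}")
```

A coverage estimate close to 0.95 is what "well calibrated" means here; a model whose intervals cover far less often would fail the calibrated Bayes criterion even if it is coherent as a Bayesian analysis.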
Calibrated Bayes models for surveys should
incorporate sample design features
– All models are wrong, some models are useful
• Design-assisted: make the estimator more robust
• Calibrated Bayes: make the model more robust –
many models yield design-consistent estimates
– Models that ignore features like survey weights
are vulnerable to misspecification
– But models can be successfully applied in survey
setting, with attention to design features
• Weighting, stratification, clustering
– Capture design weights as covariates in the
prediction model (e.g. Gelman 2007)
Benefits of Bayes
• Unified approach to all problems
– Avoids current approach -- “inferential
schizophrenia”
• Not asymptotic
– Propagates errors in estimating parameters
• Avoids frequentist pitfalls:
– Conditions on ancillaries
– Obeys likelihood principle
There are those who predict … and those who weight
Rubin (1976 Biometrika)
• Landmark paper (3700+ citations, after being
rejected by many journals!)
– RL wrote his first (11 page) referee report, and an
obscure discussion
• Modeled the missing data mechanism by
treating missingness indicators as random
variables, assigning them a distribution
• Sufficient conditions under which missing data
mechanism can be ignored for likelihood and
frequentist inference about parameters
– Focus here on likelihood, Bayes
Ignoring the mechanism
$D$ = data with no missing values, $D_{obs}$ observed, $D_{mis}$ missing
$R$ = response indicator matrix
$f_{D,R}(D, R \mid \theta, \psi) = f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)$
• Full likelihood:
$L(\theta, \psi \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, f_{R|D}(R \mid D, \psi)\, dD_{mis}$
• Likelihood ignoring mechanism:
$L_{ign}(\theta \mid D_{obs}, R) = \text{const.} \times \int f_D(D \mid \theta)\, dD_{mis}$
• Missing data mechanism can be ignored for
likelihood inference when
$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$
Rubin’s sufficient conditions for
ignoring the mechanism
• Missing data mechanism can be ignored for
likelihood inference when
– (a) the missing data are missing at random (MAR):
$f_{R|D}(R \mid D_{obs}, D_{mis}, \psi) = f_{R|D}(R \mid D_{obs}, \psi)$ for all $D_{mis}, \psi$
– (b) distinctness of the parameters of the data model
and the missing-data mechanism:
$(\theta, \psi) \in \Omega_\theta \times \Omega_\psi$; for Bayes, $\theta$ and $\psi$ a-priori independent
• MAR is the key condition: without (b),
inferences are valid but not fully efficient
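The step from MAR to the factorization of the full likelihood can be written out explicitly. This is a sketch in the notation used here, with $\theta$ the parameters of the data model and $\psi$ the parameters of the mechanism: under (a), the mechanism density does not depend on $D_{mis}$, so it can be taken outside the integral.

```latex
\begin{aligned}
L(\theta, \psi \mid D_{obs}, R)
  &= \text{const.} \times \int f_D(D_{obs}, D_{mis} \mid \theta)\,
       f_{R|D}(R \mid D_{obs}, D_{mis}, \psi)\, dD_{mis} \\
  &= \text{const.} \times f_{R|D}(R \mid D_{obs}, \psi)
       \int f_D(D_{obs}, D_{mis} \mid \theta)\, dD_{mis}
       \qquad \text{(by MAR)} \\
  &= L_{rest}(\psi \mid D_{obs}, R) \times L_{ign}(\theta \mid D_{obs}, R),
  \qquad L_{rest}(\psi \mid D_{obs}, R) = f_{R|D}(R \mid D_{obs}, \psi).
\end{aligned}
```

Distinctness (b) then ensures that $L_{rest}$ carries no information about $\theta$, so maximizing or integrating $L_{ign}$ alone loses nothing.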
“Sufficient for ignorable” is not the
same as “ignorable”
• These definitions have come to define ignorability (e.g.
Little and Rubin 2002)
• However, Rubin (1976) described (a) and (b) as the
"weakest simple and general conditions under which it
is always appropriate to ignore the process that causes
missing data".
• These conditions are not necessary for ignoring the
mechanism in all situations.
MAR + distinctness $\Rightarrow$ ignorable
ignorable $\nRightarrow$ MAR + distinctness
Example 1: Nonresponse with auxiliary data
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}), i = 1, \dots, m\}$, $D_{aux} = \{y^*_{j1}, j = 1, \dots, n\}$ (or the whole population, $N$)
$D_{aux}$ includes the respondent values of $Y_1$,
but we do not know which they are.
$(Y_1, Y_2) \sim_{ind} f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[Figure: data pattern. Auxiliary Y1 column (not linked to the survey records) beside survey columns R, Y1, Y2; "?" marks missing values.]
Not MAR -- $y_{i1}$ missing for nonrespondents $i$
But... mechanism is ignorable, does not need to be modeled:
Marginal distribution of $Y_1$ estimated from $D_{aux}$
Conditional of $Y_2$ given $Y_1$ estimated from $D_{resp}$
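A small simulation can illustrate why the mechanism need not be modeled in this example. Everything numerical below is an assumption made for illustration (a binary Y1, a normal Y2, response rates of 0.8 and 0.3, the sample size); none of it comes from the talk. The estimate of the mean of Y2 combines the marginal of Y1 from the unlinked auxiliary data with the conditional mean of Y2 given Y1 from respondents.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Assumed data model: Y1 ~ Bernoulli(0.4), Y2 | Y1 ~ Normal(1 + 2*Y1, 1)
y1 = rng.binomial(1, 0.4, size=n)
y2 = 1.0 + 2.0 * y1 + rng.normal(size=n)

# Assumed mechanism: response depends on Y1 only, Pr(r=1 | y1) = g(y1)
resp = rng.random(n) < np.where(y1 == 1, 0.8, 0.3)

# Observed data: respondents' (Y1, Y2), plus unlinked auxiliary Y1 values
d_resp_y1, d_resp_y2 = y1[resp], y2[resp]
d_aux = rng.permutation(y1)          # shuffling removes the link to units

# Marginal of Y1 from D_aux, conditional mean of Y2 given Y1 from D_resp
p_aux = np.array([np.mean(d_aux == k) for k in (0, 1)])
cond_mean = np.array([d_resp_y2[d_resp_y1 == k].mean() for k in (0, 1)])
combined = float(p_aux @ cond_mean)

# Naive complete-case mean, for comparison
cc_mean = float(d_resp_y2.mean())

true_mean = 1.0 + 2.0 * 0.4
print(f"true {true_mean:.2f}  combined {combined:.2f}  complete-case {cc_mean:.2f}")
```

Under these assumed settings the combined estimate should sit near the true mean of 1.8 while the complete-case mean is pulled toward the over-represented Y1 = 1 group, and no model for g was fitted at any point.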
MAR, ignorability for parameter subsets
• MAR and ignorability are defined in terms of
the complete set of parameters in the data
model for D
• It would be useful to have a definition of MAR
that applies to subsets of parameters, including
parameters of substantive interest.
• A trivial example: It seems plausible that a
nonignorable mechanism would be MAR for the
parameters of distributions of variables that
are not missing.
MAR, ignorability for parameter subsets
$\theta = (\theta_1, \theta_2)$
Mechanism is partially MAR for likelihood inference
about $\theta_1$, denoted P-MAR($\theta_1$), if:
$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = L_{ign}(\theta_1 \mid D_{obs}, R) \times L_{rest}(\theta_2, \psi \mid D_{obs}, R)$
for all $\theta_1, \theta_2, \psi$
Mechanism is IGN($\theta_1$) if it is P-MAR($\theta_1$) and $\theta_1$ and $(\theta_2, \psi)$ are distinct
MAR, ignorability for parameter subsets
Special case where $\theta_1 = \theta$
Mechanism is P-MAR($\theta$) if:
$L(\theta, \psi \mid D_{obs}, R) = L_{ign}(\theta \mid D_{obs}, R) \times L_{rest}(\psi \mid D_{obs}, R)$
for all $\theta, \psi$
This is a consequence of Rubin's MAR condition, but does not imply it
IGN($\theta$) if P-MAR($\theta$) and $\theta$ and $\psi$ are distinct
Partial MAR given a function of
mechanism
Harel and Schafer (2009) define a different kind of Partial MAR:
Mechanism is partially MAR given $g(R)$ if:
$P(R \mid Y_{obs}, Y_{mis}, g(R), \theta, \psi) = P(R \mid Y_{obs}, g(R), \theta, \psi)$
for all $\theta, \psi, R, Y_{obs}$
Here "partial" relates to the mechanism.
In my definition "partial" relates to the parameters.
These ideas seem quite distinct.
Example 1: Auxiliary Survey Data
$D = \{(y_{i1}, y_{i2}), i = 1, \dots, n\}$
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}), i = 1, \dots, m\}$, $D_{aux} = \{y^*_{j1}, j = 1, \dots, n\}$
$D_{aux}$ includes the respondent values of $Y_1$,
but we do not know which they are.
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[Figure: data pattern. Auxiliary Y1 column (not linked to the survey records) beside survey columns R, Y1, Y2; "?" marks missing values.]
Easy to show that mechanism is P-MAR($\theta$),
and IGN($\theta$) if $\theta, \psi$ are distinct
Ex. 2: MNAR Monotone Bivariate Data
$D = \{(y_{i1}, y_{i2}), i = 1, \dots, n\}$
$D_{obs} = \{(y_{i1}, y_{i2}), i = 1, \dots, m\}$ and $\{y_{i1}, i = m+1, \dots, n\}$
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta) = f(y_{i1} \mid \theta_1) \times f(y_{i2} \mid y_{i1}, \theta_2)$
$\Pr(r_{i2} = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, y_{i2}, \psi)$ (MNAR)
[Figure: monotone data pattern with columns M, Y1, Y2; Y1 fully observed, "?" marks missing values of Y2.]
COMMENT: Clearly, inference about parameters $\theta_1$
of the marginal distribution of $Y_1$ can ignore mechanism,
since $Y_1$ has no missing values.
In proposed definition, this mechanism is P-MAR($\theta_1$),
and IGN($\theta_1$) if $\theta_1$ and $(\theta_2, \psi)$ distinct
• Paper presents more interesting case with Y1, Y2 blocks
of variables and missing data in each block
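The factorization behind this comment can be written out. The following is a sketch, writing $\theta_1$ for the parameters of the marginal of $Y_1$, $\theta_2$ for those of the conditional of $Y_2$ given $Y_1$, and $\psi$ for the mechanism, with units $1, \dots, m$ complete and units $m+1, \dots, n$ missing $Y_2$:

```latex
\begin{aligned}
L(\theta_1, \theta_2, \psi \mid D_{obs}, R)
  = {} & \underbrace{\prod_{i=1}^{n} f(y_{i1} \mid \theta_1)}_{L_{ign}(\theta_1 \mid D_{obs})} \\
  & \times \underbrace{\prod_{i=1}^{m} f(y_{i2} \mid y_{i1}, \theta_2)\, g(y_{i1}, y_{i2}, \psi)
      \prod_{i=m+1}^{n} \int f(y_{i2} \mid y_{i1}, \theta_2)\,
      \bigl[1 - g(y_{i1}, y_{i2}, \psi)\bigr]\, dy_{i2}}_{L_{rest}(\theta_2, \psi \mid D_{obs}, R)}
\end{aligned}
```

The mechanism is entangled with $\theta_2$ inside the second factor, which is why it is MNAR, but $\theta_1$ appears only in the first factor, which is the likelihood one would write down for $Y_1$ if the mechanism were ignored.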
More generally…
$(Y_1, R^{(1)}), (Y_2, R^{(2)})$ blocks of incomplete variables, and
$f(y_{1i}, y_{2i}, r_i^{(1)}, r_i^{(2)}) = f_1(y_{1i} \mid \theta_1) \Pr(r_i^{(1)} \mid y_{1i}, \psi_1) \times f_2(y_{2i} \mid y_{1i}, \theta_2) \Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$
Assume: $\Pr(r_i^{(1)} \mid y_{1i}; \psi_1) = g_1(y_{1,obs,i}, \psi_1)$ for all $y_{1,mis,i}$,
$\Pr(r_i^{(2)} \mid r_i^{(1)}, y_{1i}, y_{2i}; \psi_2) = g_2(r_i^{(1)}, y_{1i}, y_{2i}, \psi_2)$
Mechanism is P-MAR($\theta_1$), IGN($\theta_1$) if $\theta_1$ and
$(\theta_2, \psi_1, \psi_2)$ are distinct
Ex. 3: Complete Case Analysis in Regression
$D = \{(y_{i1}, y_{i2}), i = 1, \dots, n\}$
$D_{obs} = \{(y_{i1}, y_{i2}), i = 1, \dots, m\}$
$(Y_1, Y_2) \sim f(y_{i1}, y_{i2} \mid \theta)$
$\Pr(r_i = 1 \mid y_{i1}, y_{i2}, \psi) = g(y_{i1}, \psi)$
[Figure: data pattern with columns R, Y1, Y2; both Y1 and Y2 are missing ("?") for nonrespondents.]
MNAR, but inference about parameters of the
conditional distribution of $Y_2$ given $Y_1$ based on
complete cases is valid, ignoring the mechanism.
Let $f(y_{i1}, y_{i2} \mid \theta) = f_1(y_{i1} \mid \theta_1)\, f_2(y_{i2} \mid y_{i1}, \theta_2)$
$L(\theta_1, \theta_2, \psi \mid D_{obs}, R) = \text{const.} \times L_1(\theta_2 \mid D_{obs}) \times L_2(\theta_1, \psi \mid D_{obs}, R)$, where
$L_1(\theta_2 \mid D_{obs}) = \prod_{i=1}^{m} f_2(y_{i2} \mid y_{i1}, \theta_2)$
MNAR, but P-MAR($\theta_2$), and IGN($\theta_2$) if $\theta_2$ and $(\theta_1, \psi)$ are distinct
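The claim in this example is easy to check numerically. The sketch below uses assumed settings throughout (a standard normal Y1, a linear regression of Y2 on Y1 with intercept 1 and slope 2, and a logistic response probability that depends only on Y1); these are illustrative choices, not values from the talk. It fits the regression on complete cases only and contrasts it with the complete-case mean of Y2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Assumed data model: Y1 ~ N(0, 1), Y2 | Y1 ~ N(1 + 2*Y1, 1)
y1 = rng.normal(size=n)
y2 = 1.0 + 2.0 * y1 + rng.normal(size=n)

# Assumed mechanism: whole unit responds with probability g(y1) = expit(2*y1).
# Y1 is missing for nonrespondents, so the mechanism is not MAR.
resp = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * y1))

# Complete-case least squares for the regression of Y2 on Y1
slope, intercept = np.polyfit(y1[resp], y2[resp], deg=1)

# Complete-case mean of Y2, which is not a conditional-distribution parameter
cc_mean_y2 = y2[resp].mean()

print(f"CC slope {slope:.2f} (true 2), CC intercept {intercept:.2f} (true 1)")
print(f"CC mean of Y2 {cc_mean_y2:.2f} (true 1)")
```

Under these settings the complete-case regression coefficients should land close to their true values, while the complete-case mean of Y2 is clearly biased: selection on Y1 distorts the marginal of Y1 but leaves the conditional of Y2 given Y1 untouched.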
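Ex. 4: A normal pattern-mixture model
$D_{obs} = \{(y_{i1}, y_{i2}), i = 1, \dots, m\}$ and $\{y_{i1}, i = m+1, \dots, n\}$
$f(D, R_2 \mid \phi, \pi) = f_{D|R}(D \mid R_2, \phi)\, f_R(R_2 \mid \pi)$
$(y_{i1}, y_{i2} \mid r_{i2} = j, \phi) \sim_{ind} G(\mu^{(j)}, \Sigma^{(j)}),\ j = 0, 1$; $r_{i2} \sim_{ind} \mathrm{Bern}(\pi)$
Assume $\Pr(r_{i2} = 1 \mid y_{i1}, y_{i2}) = g(y_{i2})$, $g$ unknown (MNAR)
[Figure: monotone data pattern with columns R2, Y1, Y2; Y1 fully observed, "?" marks missing values of Y2.]
COMMENT: The distribution of $Y_1$ given $Y_2$ and $R_2$ does not depend on $R_2$,
so it can be estimated from complete cases, ignoring the mechanism
$\phi_{1 \cdot 2} = (\beta_{10 \cdot 2}, \beta_{12 \cdot 2}, \sigma_{11 \cdot 2})$, $\phi_2 = (\mu_2^{(0)}, \sigma_{22}^{(0)}, \mu_1^{(1)}, \sigma_{11}^{(1)})$
$L(\phi, \pi \mid D_{obs}, R_2) = \text{const.} \times L_1(\phi_{1 \cdot 2} \mid D_{obs}, R_2) \times L_2(\phi_2, \pi \mid D_{obs}, R_2)$, where
$L_1(\phi_{1 \cdot 2} \mid D_{obs}, R_2) = \prod_{i=1}^{m} f_{1 \cdot 2}(y_{i1} \mid y_{i2}, \phi_{1 \cdot 2})$
MNAR, but P-MAR($\phi_{1 \cdot 2}$), not IGN($\phi_{1 \cdot 2}$) since $\phi_{1 \cdot 2}$ and $\phi_2$ are not distinct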
Ex. 5: Subsample ignorable likelihood
Little and Zhang (2011)
Pattern   Z   W   X   Y
P1        √   √   ?   ?
P2        √   ?   ?   ?
Columns could be vectors
√ = fully observed
? = observed or missing
• Interest concerns parameters $\theta_1$ of regression of Y on (Z,X,W)
• Z complete, W and (X,Y) incomplete. W complete in P1.
• Division of covariates into W, X is based on following MNAR
assumptions about the missing data mechanism:
• Pr(W complete) = fn(W,X,Z) (not Y)
• (X,Y) MAR in subsample with W fully observed (that is, P1)
This mechanism is P-MAR($\theta_1$); corresponding analysis is
to apply an ignorable likelihood method, discarding data in P2
Ex. 6: Auxiliary data, survey nonresponse
$D = \{(y_{i1}, y_{i2}, y_{i3}), i = 1, \dots, n\}$
$D_{obs} = (D_{resp}, D_{aux})$
$D_{resp} = \{(y_{i1}, y_{i2}, y_{i3}), i = 1, \dots, r;\ (y_{i1}), i = r+1, \dots, n\}$,
$D_{aux} = \{y^*_{j2}, j = 1, \dots, N\}$, $N$ = population size
$(Y_1, Y_2, Y_3) \sim f(y_{i1}, y_{i2}, y_{i3} \mid \theta)$
$\Pr(m_i = 1 \mid y_{i1}, y_{i2}, y_{i3}, \psi) = g(y_{i1}, y_{i2}, \psi)$
[Figure: data pattern. Auxiliary Y2 column for units 1, ..., N (not linked to the survey records) beside survey columns Y1, Y2, Y3 for units 1, ..., r and r+1, ..., n; "?" marks missing values.]
NOT MAR -- $y_{i2}$ missing for nonrespondents
But mechanism is P-MAR($\theta$) if $g(y_{i1}, y_{i2}, \psi)$ is an additive function of $(y_{i1}, y_{i2})$
Marginal of $Y_2$ from $D_{aux}$, marginal of $Y_1$ from $D_{resp}$
Conditional of $Y_3$ given $Y_1, Y_2$ from complete cases in $D_{resp}$
Simulation Study
$[Y_1, Y_2, Y_3, M] = [Y_1, Y_2]\,[Y_3 \mid Y_1, Y_2]\,[M \mid Y_1, Y_2, Y_3]$
$[Y_1, Y_2]$ multinomial
$[Y_3 \mid Y_1, Y_2]$ generated as
$\text{logit} \Pr(Y_3 = 1 \mid Y_1, Y_2) = 0.5 + \beta_1 Y_1 + \beta_2 Y_2 + \beta_{12} Y_1 Y_2$
$[M \mid Y_1, Y_2]$ generated as
$\text{logit} \Pr(M = 1 \mid Y_1, Y_2) = 0.5 + \alpha_1 Y_1 + \alpha_2 Y_2 + \alpha_{12} Y_1 Y_2$
Each $\beta_j$, $\alpha_j$ set to zero or two (various combinations)
$N = 100{,}000$; $n = 200$, $1000$ and $10{,}000$
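The data-generating steps on this slide can be sketched in code. Several details are assumptions made only so the sketch runs: Y1 and Y2 are taken to be binary, the four (Y1, Y2) cells are given equal probability, the coefficient names `beta` and `alpha` are labels chosen here, and the terms of each logit are added. The function name `simulate_population` is hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_population(beta, alpha, N=100_000,
                        cell_probs=(0.25, 0.25, 0.25, 0.25), seed=0):
    """Generate (Y1, Y2, Y3, M) for a finite population of size N.

    beta  = (b1, b2, b12): assumed coefficients in the logit for Y3
    alpha = (a1, a2, a12): assumed coefficients in the logit for M
    cell_probs: assumed multinomial probabilities for the (Y1, Y2) cells
    """
    rng = np.random.default_rng(seed)
    cell = rng.choice(4, size=N, p=cell_probs)   # [Y1, Y2] multinomial
    y1, y2 = cell // 2, cell % 2

    def linpred(c):
        # 0.5 + c1*Y1 + c2*Y2 + c12*Y1*Y2
        return 0.5 + c[0] * y1 + c[1] * y2 + c[2] * y1 * y2

    y3 = rng.binomial(1, expit(linpred(beta)))   # [Y3 | Y1, Y2]
    m = rng.binomial(1, expit(linpred(alpha)))   # [M | Y1, Y2]
    return y1, y2, y3, m

# One configuration: Y3 depends on Y1 only, M depends on Y2 only
y1, y2, y3, m = simulate_population(beta=(2, 0, 0), alpha=(0, 2, 0))

# Draw a simple random sample of size n = 1000 from the population
sample = np.random.default_rng(1).choice(len(y1), size=1000, replace=False)
print(y3[sample].mean(), m[sample].mean())
```

Looping this over the combinations of zero and two for each coefficient, and over the three sample sizes, would reproduce the structure of the study design, though not necessarily its exact settings.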
Simulation Study: methods
CC: Complete Case estimates based on the responding units
M1: ML based on a logistic regression with interaction for Y3
M2: ML based on an additive logistic regression for Y3
NR: Weighting class estimates where nonresponse weights
are obtained based on Y1
PS: Post-stratification weighted estimates (PS) based on Y2
NRPS: Adjust weights using both Y1 and Y2. For categorical
variables, this method is equivalent to Linear
Calibration regression, or Generalized Raking estimates
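Three of the weighting estimators listed above (CC, NR, PS) can be sketched for a mean of Y3. This is a minimal illustration with hypothetical function names and a six-unit toy data set, assuming Y1 is known for every sampled unit and the population proportions of Y2 are known from auxiliary data; it is not the code used in the study.

```python
import numpy as np

def cc_mean(y3, resp):
    """CC: unweighted mean of Y3 over responding units."""
    return float(y3[resp].mean())

def weighting_class_mean(y3, resp, y1):
    """NR: respondent means within Y1 classes, weighted by the
    sample share of each class (Y1 observed for the whole sample)."""
    est = 0.0
    for k in np.unique(y1):
        in_class = (y1 == k)
        est += in_class.mean() * y3[in_class & resp].mean()
    return float(est)

def poststratified_mean(y3, resp, y2, pop_props):
    """PS: respondent means within Y2 strata, weighted by known
    population proportions pop_props = {stratum: proportion}."""
    return float(sum(p * y3[(y2 == k) & resp].mean()
                     for k, p in pop_props.items()))

# Toy data: six sampled units, four of them respondents
y3 = np.array([1, 1, 1, 0, 0, 0])
resp = np.array([True, True, True, True, False, False])
y1 = np.array([0, 0, 0, 1, 1, 1])
y2 = np.array([0, 1, 0, 1, 0, 1])

cc = cc_mean(y3, resp)                                         # 3/4
nr = weighting_class_mean(y3, resp, y1)                        # 0.5*1 + 0.5*0
ps = poststratified_mean(y3, resp, y2, pop_props={0: 0.6, 1: 0.4})
print(cc, nr, ps)
```

In the toy data the Y1 = 1 class responds at a lower rate, so the weighting-class estimate moves away from the complete-case mean. NRPS would adjust on both margins at once, for example by raking, which is not shown here.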
Simulation: summary findings
• When response depends on Y1 *Y2 interaction,
all methods do poorly
• When data are MCAR, all methods do similarly
well
• Model-based methods remove almost all the
bias and perform better when response doesn’t
depend on Y1 *Y2 interaction
• Qualitative patterns hold for different sample
sizes
Frequentist inference
• Rubin’s (1976) sufficient conditions for
ignorability for frequentist inference
were even stronger (essentially MCAR)
• These can be weakened too – for example
asymptotic frequentist inference based
on ML and observed information matrix
works under conditions given here
• Small sample inference seems more
problematic
Summary
• Proposed definitions of partial MAR,
ignorability for subsets of parameters
• Expands range of situations where
missing data mechanism can be ignored
• In some cases, though, the MAR analysis
entails a loss of information
– How much is lost is an interesting question that
varies by context
References
Harel, O. and Schafer, J.L. (2009). Partial and Latent Ignorability
in missing data problems. Biometrika, 2009, 1-14
Little, R.J.A. (1993). Pattern-Mixture Models for Multivariate
Incomplete Data. JASA, 88, 125-134.
Little, R. J. A., and Rubin, D. B. (2002). Statistical Analysis with
Missing Data (2nd ed.) Wiley.
Little, R.J. and Zangeneh, S.Z. (2013). Missing at random and
ignorability for inferences about subsets of parameters with
missing data. University of Michigan Biostatistics Working
Paper Series.
Little, R. J. and Zhang, N. (2011). Subsample ignorable
likelihood for regression analysis with missing data. JRSSC,
60, 4, 591–605.
Rubin, D. B. (1976). Inference and Missing Data. Biometrika 63,
581-592.