Bayesian modeling for nonsampling error


Bayesian modeling of
nonsampling error
Alan M. Zaslavsky
Harvard Medical School
General setup for nonsampling error
• Focus on measurement error problem
– Item responses with error
– Item or unit nonresponse as a special error
response
– …or nonresponse as part of error for aggregates
• Y = data measured with error
• Y* = latent “true” values (object of inference)
– Might be observed for part of data (calibration)
• X = covariates
– Assumed (for presentation) correct and complete
– Include design information
Objective of inference
• Estimate statistics of “true” values f(Y*)
• Estimate parameters of models
– From likelihood standpoint:
inference from L(θ | Y*,X)
– (Specifically) from Bayesian standpoint, draw
from P(θ | Y*,X)
• Both possible if we have draws of Y*
– Multiple imputation for valid inferences
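A minimal sketch of how multiple imputation inference is usually pooled, assuming Rubin's standard combining rules (the slides do not spell them out). The function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Pool M complete-data estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)            # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from M = 5 imputed data sets, each with variance 0.0004.
est, var = combine_imputations([0.70, 0.72, 0.68, 0.71, 0.69], [0.0004] * 5)
print(round(est, 4), round(var, 6))
```

The between-imputation term is what carries the extra uncertainty due to not observing Y* directly.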
Two ways to factorize distribution
• Predictive factorization:
P(Y,Y* | X,β,β*) = P(Y* | Y,X,β*) P(Y | X,β)
– Direct prediction of Y* for imputation
• “Scientific” factorization:
P(Y,Y* | X,β,β*) = P(Y | Y*,X,β) P(Y* | X,β*)
– First factor is observation (measurement error)
model
– Second factor is model for true relationships
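As a worked identity (a sketch, writing the measurement-error and true-value parameters as β and β*), the two factors of the “scientific” factorization give the imputation distribution for Y* by Bayes's theorem. The sum runs over possible true values y*; it becomes an integral when Y* is continuous.

```latex
P(Y^* \mid Y, X, \beta, \beta^*)
  = \frac{P(Y \mid Y^*, X, \beta)\, P(Y^* \mid X, \beta^*)}
         {\sum_{y^*} P(Y \mid y^*, X, \beta)\, P(y^* \mid X, \beta^*)}
```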
More on “scientific” factorization
• Separates two distinct processes
– Information might be from different sources
– Possibility of more (or different) generalizability
• Models are more interpretable
– Incorporate prior information for specification
and parameters
– Easier to assess “congeniality” of models?
• Compare model for P(Y* | X, β*) with model
involving θ
– Simplifications? e.g. P(Y | Y*,X,β) = P(Y | Y*,β)
Inference with “scientific”
factorization
• Computations via Gibbs sampler
– Imputation of Y* by Bayes’s theorem
– Complete-data inferences for β, β*
• Inferences of scientific interest (θ)
– Multiple imputation inference using Y*
– Direct from model if θ = θ(β*)
Possible sources for measurement
error model parameters (β)
• Calibration study
– Sample of (Y,Y*) pairs to identify the two
parameters
– For robustness, important to build in adequate
flexibility to avoid identifying off unverified model
assumptions about P(Y | X,β,β*)
• Prior studies (also used Bayesianly as prior)
– Previous calibration model estimates, if
measurement process is consistent
– Synthesis of accumulated survey methodology
Example 1: Correction for
underreporting in study of
chemotherapy for colorectal cancer
• Provision of guideline-recommended
adjuvant chemotherapy a critical issue in
quality of care for cancer
• Cancer registries as a source of chemo
data
– Excellent population coverage
– Underreporting of treatment
California study
• Cancer registry data
– Statewide coverage
– About 70,000 cases over 5 years in relevant
stages (appropriate for chemotherapy)
• Calibration survey
– Request medical record data from physicians
– Limited in time (1+ year) and space (3 of 10
regions)
– 1956 cases in sample, 1449 (74%) respond
Reporting of adjuvant therapy
• Follow-up survey response rate higher …
– at HMO-affiliated and high-volume hospitals
– when chemo reported in original record
• 82% of adjuvant therapy was reported to
Registry (among “respondents”)
– Substantial underestimation if Registry alone
used
– More complete in teaching hospitals, HMO
affiliates, high volume hospitals, younger and
rectal cancer patients
Cress et al., Medical Care 2003
Naïve estimation of administration
of adjuvant chemotherapy
• Analysis based only on “gold standard”
survey + Registry data in sample
• Strong variation by patient characteristics
– Age (less if older), marital status
– Race (less if Black, more if Hispanic, Asian)
– Income (upward gradient with higher income)
• Substantial unexplained hospital-level
variation
Ayanian et al., J Clinical Oncology 2003
Limitations of standard analytic
approaches
• Survey respondents alone:
– Small portion of available California data
(1449/70,000)
– Single area of state
– Unrepresentative due to survey nonresponse
– Confounding of survey response, reporting,
treatment variation (e.g. volume effects)
• Registry data alone:
– Underreporting of chemotherapy
– Reporting is nonuniform
Combining Registry
and survey data
• Combine
– power of large Registry data
– correction for underreporting based on
survey
• Simple correction based on:
P(reported chemo) =
P(chemo) × P(report | chemo)
Therefore: P(chemo) =
P(reported chemo) / P(report | chemo)
Registry plus simple correction
• In survey:
P(reported chemo) = 59%
P(report | chemo) = 82%
P(chemo) = 59%/82% ≈ 71%
• Outside survey (mostly rest of state):
P(reported chemo) = 49%
P(report | chemo) = 82%
P(chemo) = 49%/82% ≈ 60%
Depends on assumption that reporting is
similar in the two areas
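The simple correction can be sketched in a few lines, assuming (as the slide does) that the 82% reporting completeness applies in both areas. The function name is hypothetical. With the rounded inputs shown here the first ratio comes out near 0.72; the slide's 71% presumably reflects unrounded rates.

```python
def corrected_rate(p_reported, p_report_given_chemo):
    """P(chemo) = P(reported chemo) / P(report | chemo)."""
    if not 0 < p_report_given_chemo <= 1:
        raise ValueError("reporting completeness must be in (0, 1]")
    return p_reported / p_report_given_chemo


in_survey = corrected_rate(0.59, 0.82)   # survey regions
outside = corrected_rate(0.49, 0.82)     # rest of state, same completeness assumed
print(f"in survey: {in_survey:.3f}, outside survey: {outside:.3f}")
```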
Model-based methodology
(Yucel and Zaslavsky)
• Disaggregated model
– Take into account individual effects on both
chemotherapy and reporting
– Take into account hospital variation in both
chemotherapy and reporting
• Imputation of chemo for individual cases
– Allow fitting of any desired models
– Multiple imputation to obtain proper measures
of uncertainty with imputed data
Models for reporting and therapy
• Logit or Probit regression for therapy (outcome)
– Patient p has characteristics x_hp: age, sex, race/ethnicity,
comorbidity score (Charlson), tumor stage/site, income
category
– Hospital h has characteristics z_h: volume, ACOS-certified
registry, teaching
– Random effect γ_h for hospital h
logit P(chemo_hp) = β x_hp + λ z_h + γ_h
• Similar model (with or without random effect) for
reporting given therapy
– Random effects for reporting & therapy could be
correlated
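A small sketch of the therapy model's linear predictor, assuming patient and hospital covariates are coded as numeric vectors. All names and example values are hypothetical and are not taken from the fitted model.

```python
import math


def chemo_probability(x_hp, beta, z_h, lam, gamma_h):
    """Inverse logit of beta.x_hp + lambda.z_h + gamma_h."""
    eta = (sum(b * x for b, x in zip(beta, x_hp))
           + sum(l * z for l, z in zip(lam, z_h))
           + gamma_h)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical patient (intercept, older-age indicator) in a high-volume hospital.
p_low = chemo_probability([1.0, 1.0], [0.8, -0.5], [1.0], [0.2], gamma_h=-0.3)
p_high = chemo_probability([1.0, 1.0], [0.8, -0.5], [1.0], [0.2], gamma_h=0.3)
print(round(p_low, 3), round(p_high, 3))
```

The two calls differ only in the hospital random effect, which is how unexplained hospital-level variation enters the model.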
Two versions of hierarchical model
[Diagram of two model versions: (a) single random effect; (b) bivariate RE. Elements shown: parameters, outcome and reporting models, latent “true” status, observed status.]
Fitting the model
• Full Bayesian specification
– Diffuse priors for coefficients, (co)variances
• Fit via Gibbs sampling: alternately
– Impute true chemo status for non-survey
cases
– Draw random hospital effects γ
– Draw “fixed” coefficients β, λ and variance
components Σ
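The alternation can be illustrated with a deliberately stripped-down Gibbs sampler. It assumes no covariates and no hospital effects, so there is one true-chemo probability `pi` and one reporting probability `rho`, each with a uniform Beta(1, 1) prior, and no false-positive reports. The counts are synthetic and the function is a sketch, not the model of the study. The `restrict_reporting_to_survey` flag updates `rho` from survey cases only, which is one simple reading of a restricted-likelihood approach.

```python
import random
from statistics import mean


def gibbs_underreporting(survey, nonsurvey, n_iter=400, burn_in=100,
                         restrict_reporting_to_survey=True, seed=0):
    rng = random.Random(seed)
    pi, rho = 0.5, 0.5                      # starting values
    pi_draws, rho_draws = [], []
    for it in range(n_iter):
        # 1. Impute true status for non-survey cases with no chemo reported.
        q = pi * (1 - rho) / (pi * (1 - rho) + (1 - pi))
        hidden = sum(rng.random() < q for _ in range(nonsurvey["unreported"]))
        # 2. Draw pi from its Beta full conditional given completed data.
        chemo = (survey["chemo_reported"] + survey["chemo_unreported"]
                 + nonsurvey["reported"] + hidden)
        no_chemo = survey["no_chemo"] + nonsurvey["unreported"] - hidden
        pi = rng.betavariate(1 + chemo, 1 + no_chemo)
        # 3. Draw rho from reported vs. unreported counts among true chemo cases.
        reported = survey["chemo_reported"]
        unreported = survey["chemo_unreported"]
        if not restrict_reporting_to_survey:
            reported += nonsurvey["reported"]
            unreported += hidden
        rho = rng.betavariate(1 + reported, 1 + unreported)
        if it >= burn_in:
            pi_draws.append(pi)
            rho_draws.append(rho)
    return mean(pi_draws), mean(rho_draws)


# Synthetic counts built with true rate 0.70 and reporting completeness 0.80.
survey = {"chemo_reported": 560, "chemo_unreported": 140, "no_chemo": 300}
nonsurvey = {"reported": 1120, "unreported": 880}
pi_mean, rho_mean = gibbs_underreporting(survey, nonsurvey)
print(round(pi_mean, 3), round(rho_mean, 3))
```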
Imputing chemo status (Bayes’s theorem)
• Example: consider individual (not in survey)
for whom models give
– Prior P(chemo)=70%
– Prior P(reporting | chemo) = 80%
• If chemo reported, then true chemo = 1
• If chemo not reported:
– P(no chemo, no report) = 30%
– P(chemo, no report) = 70% × 20% = 14%
– P(chemo | no report) = 14%/(14% + 30%) ≈ 32%
– Impute chemo=1 with probability 32%
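The arithmetic of this example as a short sketch, assuming (as on the slide) that a reported chemo is always a true chemo. Function names are hypothetical.

```python
import random


def prob_chemo_given_no_report(p_chemo, p_report_given_chemo):
    """Posterior P(chemo | nothing reported) by Bayes's theorem."""
    chemo_unreported = p_chemo * (1 - p_report_given_chemo)
    no_chemo = 1 - p_chemo
    return chemo_unreported / (chemo_unreported + no_chemo)


def impute_chemo(reported, p_chemo, p_report_given_chemo, rng):
    """Return an imputed true chemo status (0 or 1) for one case."""
    if reported:
        return 1                       # reported chemo is taken as true
    p = prob_chemo_given_no_report(p_chemo, p_report_given_chemo)
    return int(rng.random() < p)


p = prob_chemo_given_no_report(0.70, 0.80)
print(round(p, 3))
print(impute_chemo(False, 0.70, 0.80, random.Random(1)))
```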
Computing: probit via latent variables
• Probit model: Φ⁻¹(P(Y_hp=1)) = β x_hp + λ z_h + γ_h
– Equivalently: Y_hp=1 ↔ e_hp < β x_hp + λ z_h + γ_h,
where e_hp ~ N(0,1) is a normal latent variable
(Albert & Chib 1993)
– Equivalently, Y_hp=1 ↔ u_hp = β x_hp + λ z_h + γ_h − e_hp > 0
– Observing Y_hp implies truncated normal posterior for u_hp
given higher-level parameters β, λ, γ_h
• Given a draw of u_hp, higher levels reduce to normal
multilevel model with observation u_hp
and fixed variance=1 at bottom level (well-known problem)
• independent of the discrete data or imputed values
• direct generalization to correlated bivariate response
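A minimal sketch of the latent-variable draw, assuming the linear predictor is already computed for a case. It draws u from N(linear_predictor, 1), truncated to u > 0 when Y = 1 and to u ≤ 0 when Y = 0, by inverting the normal CDF. The function name is hypothetical and only the Python standard library is used.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_latent(y, linear_predictor, rng):
    """Truncated-normal draw of the latent u given the binary response y."""
    p_nonpositive = STD_NORMAL.cdf(-linear_predictor)    # P(u <= 0)
    if y == 1:
        lo, hi = p_nonpositive, 1.0                      # keep the u > 0 region
    else:
        lo, hi = 0.0, p_nonpositive                      # keep the u <= 0 region
    v = rng.uniform(lo, hi)
    # Guard inv_cdf against 0 or 1; in extreme tails this clamp is only approximate.
    v = min(max(v, 1e-12), 1 - 1e-12)
    return linear_predictor + STD_NORMAL.inv_cdf(v)


rng = random.Random(0)
draws_pos = [draw_latent(1, mu, rng) for mu in (-1.5, 0.0, 2.0) for _ in range(200)]
draws_neg = [draw_latent(0, mu, rng) for mu in (-1.5, 0.0, 2.0) for _ in range(200)]
print(min(draws_pos), max(draws_neg))
```

Given such draws, the remaining Gibbs steps treat u as a normal observation with unit variance.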
“Restricted” inference for robustness
• Two kinds of information involved in inference for
“reporting” model
– “Direct” in survey sample (1449 cases):
Y | Y*, parameters, X
– “Indirect” in remaining area (~74,000+ cases):
Y | parameters, X
(combines outcome & reporting models)
– Possibly sensitive to model misspecification?
• Ad hoc solution: Restrict likelihood for reporting
model to direct data from reporting survey cases
– Throw away some information from others
– Greater robustness to slight misspecification?
– Reparametrize Σ as regression γ(R) | γ(O) and marginal γ(O)
Direct interpretation of fitted model
• Effects broadly similar to those in naïve
(sample only) analyses.
– Volume effect on reporting but not on chemo
– Lower chemo rate outside survey region
• Substantial hospital random effects in both
reporting and therapy rates
– Indication of substantial unexplained variation
– a problem (from health services standpoint)!
– Reporting completeness and therapy rates
not (residually) correlated
Using imputations to estimate effect
of chemotherapy on survival
• Re-fit model including 2-year survival as predictor of
chemotherapy
• Using imputed corrected chemotherapy, fit model
with chemotherapy (and other variables) as
predictor of survival
– Correct variances with multiple imputation
– Missing info ≈70% for chemo, 1-4% for other variables
• Finds significant positive effect (OR=1.26) of chemo
on survival
– [Are the severity controls good enough?]
Modeling critical with missing data
• Several kinds of missing data:
– Unreported chemotherapy
– Nonresponse to followback (validation) survey
– Areas excluded from followback survey
• Potential for confounding if unjustifiable
MCAR (or insufficiently conditional MAR)
assumptions are made
– MCAR = Missing Completely at Random:
missingness independent of everything
– MAR = Missing at Random:
missingness independent of unobserved,
conditional on observed
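The two assumptions can be written compactly. The notation is an assumption added here: R is the missingness indicator and Y_obs, Y_mis are the observed and unobserved parts of the data.

```latex
\text{MCAR:}\quad P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, X) = P(R)
\qquad
\text{MAR:}\quad P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, X) = P(R \mid Y_{\mathrm{obs}}, X)
```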
Some counterintuitive results!
[Table: rates by hospital volume (Low, Med, High, All). Rows: survey response rate; reporting completeness in survey; chemotherapy rates by registry (survey respondents, survey nonrespondents, all); chemotherapy rates by survey; chemotherapy rates by hybrid method (survey respondents, survey nonrespondents, all); chemotherapy rates under model. Cell values are not recoverable from the extracted text.]
Limitations and potential design
improvements
• Major limitation: calibration survey is
unrepresentative (in known ways)
– Only covers some areas (trial implementation)
– Differences by region in reporting are plausible
– Can evaluate sensitivity to alternative
assumptions
• Could improve design for ongoing studies
– Sample across entire area
– Quality improvement for both therapy and
reporting
Example 2: Adjustment for
measurement bias of 1990 Post
Enumeration Survey
• Post-Enumeration Survey provides
estimates of proportional error in Decennial
Census estimates
– Includes whole-household and within-household under- and overenumerations
– Tabulated for poststrata of individuals defined
by household-level (region, urbanicity) and
individual-level (age, sex, race/ethnicity)
variables
Notation for undercount estimation
(Zaslavsky 1993, JASA)
• k = domain index
• c_k = population share of domain k
• y*_k = true census underenumeration rate
• ŷ_k = (biased) estimate of y*_k from survey
• y_k = E ŷ_k = expectation, b_k = y_k − y*_k = bias
• b̂_k = unbiased estimate of b_k, E b̂_k = b_k
• Constraints: Σ c_k y*_k = Σ c_k y_k = Σ c_k ŷ_k = Σ c_k b_k = 0
(sum of errors in shares is 0).
• Sampling variance of ŷ: Var(ŷ | y) = V_y
Components and variance of b̂_k
• Sources of bias estimates (total error model)
– Small calibration studies to estimate process
errors (matching, geocoding, fabrications)
– Model-based estimates of correlation bias
– Uncertainty about imputation model
• Var(b̂ − b) = V_b includes
– Sampling variances from calibration studies,
– Uncertainty across correlation bias models,
– (Multiple) imputation variance and model
uncertainty
A naïve approach and its problems
• Simple bias-corrected estimate is ŷ − b̂
– Unbiased estimator of y*
– Variance is V_y + V_b, and V_b is likely to be large
– Problem for non-Bayesian approaches: if we have
very little data to estimate something, must we
assume that it could be “anything”?
• Alternative (Bayesian) approach: introduce
reasonable prior beliefs
– Bias terms b_k are a collection centered around 0
– Characterize variability by variance component
– Similar argument for undercount terms y_k
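The trade-off can be seen in a one-domain sketch under strong simplifying assumptions: a prior b ~ N(0, tau2) with tau2 treated as known, a flat prior on the true value, and independent errors. The hierarchical model of the talk instead estimates the variance component from a collection of domains. Names and numbers are hypothetical.

```python
def naive_correction(y_hat, b_hat, v_y, v_b):
    """Unbiased correction y_hat - b_hat with variance V_y + V_b."""
    return y_hat - b_hat, v_y + v_b


def shrunken_correction(y_hat, b_hat, v_y, v_b, tau2):
    """Correct with the posterior mean of b under a N(0, tau2) prior."""
    shrink = tau2 / (tau2 + v_b)        # weight on the noisy bias estimate
    b_post_mean = shrink * b_hat
    b_post_var = shrink * v_b           # equals tau2 * v_b / (tau2 + v_b)
    return y_hat - b_post_mean, v_y + b_post_var


naive = naive_correction(2.0, 1.5, 0.5, 4.0)
shrunk = shrunken_correction(2.0, 1.5, 0.5, 4.0, tau2=1.0)
print(naive, shrunk)
```

When V_b is large relative to tau2, most of the noisy bias estimate is discounted, and the posterior variance is much smaller than V_y + V_b.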
Hierarchical model for estimation
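and bias correction
• “Sampling” model:
$$\begin{pmatrix} \hat{y} \\ \hat{b} \end{pmatrix} \sim N\left( \begin{pmatrix} y \\ b \end{pmatrix}, \begin{pmatrix} V_y & 0 \\ 0 & V_b \end{pmatrix} \right)$$
– Not exactly “sampling” since some model
uncertainty is included in V_b
• “Structural” (Level 2) model:
$$\begin{pmatrix} y \\ b \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_y^2 U & \rho_{yb}\sigma_y\sigma_b U \\ \rho_{yb}\sigma_y\sigma_b U & \sigma_b^2 U \end{pmatrix} \right)$$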
– Undercount and bias terms each drawn from
common distribution
– Proportional covariance structures for each and
for correlation of the two
– Matrix U based on a prior “similarity” of domains
(number of common characteristics)
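One possible way to build such a matrix, shown only as a sketch: take U_ij as the fraction of characteristics two domains share, then assemble the Level 2 covariance with blocks proportional to U. The exact definition of U used in the paper is not given on the slide, so this similarity rule, the domain labels and the parameter values are all assumptions.

```python
def similarity_matrix(domains):
    """U[i][j] = fraction of characteristics shared by domains i and j."""
    n_traits = len(domains[0])
    return [[sum(a == b for a, b in zip(d1, d2)) / n_traits
             for d2 in domains] for d1 in domains]


def structural_covariance(u, sigma_y, sigma_b, rho_yb):
    """Covariance of (y, b): blocks sigma_y^2 U, rho sigma_y sigma_b U, sigma_b^2 U."""
    n = len(u)
    cov = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cov[i][j] = sigma_y ** 2 * u[i][j]
            cov[i][n + j] = rho_yb * sigma_y * sigma_b * u[i][j]
            cov[n + i][j] = rho_yb * sigma_y * sigma_b * u[i][j]
            cov[n + i][n + j] = sigma_b ** 2 * u[i][j]
    return cov


# Hypothetical domains described by (region, urbanicity, race/ethnicity).
domains = [("Northeast", "urban", "Black"),
           ("Northeast", "urban", "White"),
           ("South", "rural", "White")]
u = similarity_matrix(domains)
cov = structural_covariance(u, sigma_y=1.0, sigma_b=2.0, rho_yb=0.5)
print(u[0], cov[0])
```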
Priors and inference
• Fairly vague priors for variance components,
correlation
– These represent assessments of degree of variation
in bias, undercount and how they relate across
domains
– Key to this inference is existence of collection of
domains
• Inference via Gibbs sampler
• Extensive simulations
– Compare to uniform shrinkage, hypothesis testing
approaches, etc.
– Suggested that full hierarchical Bayes model would
outperform competitors
Analyses with 1992 data
• Data combined 3 sources
– 1990 census
– Post-Enumeration Survey
– Various sources of bias component estimates
• Estimates:
– Substantial differential undercount, σ_y ≈ 1.2%
– Substantial differential bias, σ_b ≈ 3.2%
Refinement: misaligned domains
(Zaslavsky 1992, Proc. SRMS)
• Domains for bias estimates might differ
from those for y
– e.g. if they combine the main domains
– Observation is b̂_0 = X b̂
• Modifies the sampling model:
$$\begin{pmatrix} \hat{y} \\ \hat{b}_0 \end{pmatrix} \sim N\left( \begin{pmatrix} y \\ Xb \end{pmatrix}, \begin{pmatrix} V_y & 0 \\ 0 & X V_b X' \end{pmatrix} \right)$$
• Applied to 1992 data:
– 357 poststrata, 51 poststratum groups, but only
10 evaluation poststrata
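A toy sketch of the aggregation step: given a matrix X that maps main-domain biases to coarser evaluation domains, the observed bias vector is X b̂ with variance X V_b X'. The 4-domain, 2-group layout and the numbers are hypothetical, and plain lists are used to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def aggregate_bias(x, b_hat, v_b):
    """Return (X b_hat, X V_b X') for aggregation matrix X."""
    b0 = [sum(xij * bj for xij, bj in zip(row, b_hat)) for row in x]
    v0 = matmul(matmul(x, v_b), transpose(x))
    return b0, v0


# Two evaluation groups, each the equally weighted average of two main domains.
x = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
b_hat = [1.0, 3.0, 2.0, 4.0]
v_b = [[4.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b0, v0 = aggregate_bias(x, b_hat, v_b)
print(b0, v0)
```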
Other potential applications
• Domain-level estimates
– No gold standard data for individuals
– No individual-level corrections
• Many applications where there are small
evaluation samples for a measure
– Welfare or food stamp payment error
– Quality evaluations in medical care
Example 3: Imputation of households
to correct for enumeration error
• Setting: Census (or survey) of households
with errors of enumeration
– Whole-household errors
– Within-household errors
– [Assumption (here) that all errors are omissions]
• Objective: To (multiply) impute corrected
rosters.
– Add persons to households
– Impute additional households
Bayesian imputation strategy
(Zaslavsky 2004; Zaslavsky & Rubin 1989 Proc. ARC)
• Based on “scientific” factorization
– Prevalence model: distribution of households
by compositional type (roster of members by
poststratum), P(Y*bk=t | bk)
• bk = (latent) parameter of block b
– Observational model: probability of observed
types (with error), P(Ybk=u | Y*bk=t,b)
Model specifics
• Prevalence models
– x(t) summarizes characteristics of type t
– Prevalence proportional to exp(x(t) · bk) · h(t)
• h(t) is (nonparametric) general prevalence of type t
• Observational model
• Loglinear model based on probabilities of omission of
individuals
• Terms for dependence of omissions within household
• Could be based on (hypothetical) dataset …
• … and/or calibrated to match aggregate omission
rate estimates by poststratum
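A sketch of the prevalence model only, assuming household types are coded as tuples of member counts by poststratum and that x(t) is simply that tuple. The type list, the general prevalences h(t) and the block parameter are hypothetical.

```python
import math


def type_prevalence(types, features, block_param, h):
    """Normalize weights exp(x(t) . block_param) * h(t) over the listed types."""
    weights = {
        t: math.exp(sum(x * b for x, b in zip(features(t), block_param))) * h[t]
        for t in types
    }
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Types as (adults, children); h(t) is the general prevalence of each type.
types = [(1, 0), (2, 0), (2, 1)]
h = {(1, 0): 3.0, (2, 0): 5.0, (2, 1): 2.0}
baseline = type_prevalence(types, lambda t: t, (0.0, 0.0), h)
more_children = type_prevalence(types, lambda t: t, (0.0, 1.0), h)
print(baseline, more_children)
```

With a zero block parameter the prevalences reduce to h(t) normalized; a positive coefficient on the children count shifts mass toward types with children.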
Imputations
• Draw Y*bk by Bayes’s theorem
– Possible values are those types that could
“lose” one or more members yielding
observed Ybk
– Draw from all possible values of t
• Special type for unobserved households
– Count imputed using SOUP (unbiased) prior
– True types imputed similar to others
• Gibbs sampler to estimate all parameters
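The Bayes step for one observed household can be sketched as follows. It assumes independent omissions with a fixed omission probability per poststratum, in place of the loglinear model with within-household dependence, and it does not handle wholly unobserved households. Types, prevalences and omission rates are hypothetical.

```python
import math
import random


def observation_prob(observed, true, omit):
    """P(observed roster | true roster) under independent omissions."""
    p = 1.0
    for u, t, o in zip(observed, true, omit):
        if u > t:
            return 0.0                 # members cannot be added, only lost
        p *= math.comb(t, u) * (1 - o) ** u * o ** (t - u)
    return p


def posterior_true_type(observed, prevalence, omit):
    """Posterior over true types that could yield the observed roster."""
    joint = {t: pr * observation_prob(observed, t, omit)
             for t, pr in prevalence.items()}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no listed type can produce the observed roster")
    return {t: j / total for t, j in joint.items() if j > 0}


# Types are (adults, children); omission rates are per poststratum.
prevalence = {(1, 0): 0.3, (2, 0): 0.4, (2, 1): 0.2, (1, 1): 0.1}
omit = (0.1, 0.2)
post = posterior_true_type((1, 0), prevalence, omit)
types, probs = zip(*post.items())
imputed = random.Random(0).choices(types, weights=probs)[0]
print(post, imputed)
```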
General summary of examples
• All are “Bayesian” in drawing corrected
values from posterior distributions
– “Scientific” factorization for interpretability
(Examples 1 and 3)
– “Observations” might have simple (Ex. 1,2) or
complex (Ex. 3) structure
• Bayesian also in
– Incorporating prior information
– Pooling across collections of units (“shrinkage”)
– Hierarchical specification of complex models
– Probability representation of model uncertainty
(Ex. 2)
Program to move forward
• Systematic quantitative meta-analysis of
information on nonresponse errors
• Models for various types of nonresponse
error
• Think more about how to combine
information from data and model uncertainty
• Standard algorithms and software
• Integrate with analyses of nonresponse,
item missing data, etc.