DESIGN AND ANALYSIS ISSUES IN FAMILY

Monte Carlo Maximum Likelihood
Methods for Estimating Uncertainty
Arising from Shared Errors in
Exposures in Epidemiological Studies
Daniel O. Stram
University of Southern California
Complex Dosimetry Systems: a Working
Definition (my definition)
• A complex dosimetry system for the study of an
environmental exposure is one in which no single best
exposure estimate is provided
• Instead a distribution of possible true exposures is
developed, together with a computer program that
generates exposure replications from this distribution
– Generates doses conditional on input data
• Both shared and unshared errors are incorporated into the
dose replications
• The statistician/epidemiologist treats this system as a
"black box", i.e. one that (s)he can manipulate but doesn't
know (or care?) about its inner workings
• Some examples of epidemiological
studies (of radiation) that use a complex
dosimetry system to estimate doses
– Utah Thyroid Disease Cohort Study
– Hanford Thyroid Disease Study
– Colorado Plateau Uranium Miners Study
• In such studies limited or no direct
measurements of individual dose exist.
Instead a complex dose reconstruction
(Utah, Hanford) or interpolation system
(Colorado) is used to construct individual
dose estimates or histories.
• Even when all subjects in the study have
(radiation) badge measurements these
may need adjustments to reflect temporal
or geographical differences in monitoring
technology
– Random errors and systematic biases exist for
virtually any method
– Information about the size of random and
systematic biases for each dosimeter type
comes from only a few experiments
• Therefore there may be considerable uncertainty in the
systematic bias for any single dosimeter
– Systematic biases constitute shared error
Representation of uncertainty in
complex dosimetry systems
• Uncertainty in the dose estimates produced by
these systems is increasingly characterized
using Monte-Carlo methods which yield many
“realizations” of possible dose, rather than a
single best estimate of dose for each subject.
• Part of the uncertainty of these estimates may be
due to lack of knowledge of factors that simultaneously
influence some or all of the subjects' doses
Dose estimation in the Hanford
Thyroid Disease Study
• Reconstruction based on physical modeling and
some measurements of
– Releases of I-131
– Deposition and pasture retention of I-131
– Pasture practices
– Milk transfer coefficients
– Individual consumption of milk
• Note that errors in most of these will affect doses
for all individuals simultaneously
Colorado Plateau Underground Miners
Study
• Dose estimates created using a complex exposure history / job
history matrix
• The PHS exposure history matrix consisted of temporal and
geographical interpolations of limited WLM measurements
– Stram et al. (1999) used a PHS-developed hierarchy of mines within
localities within districts and a multilevel model to mimic
temporal and geographical variation in dose
• The 1999 analysis was based upon the 'regression-substitution'
method, in which E(true dose | all measurements) was computed for
each mine-year after fitting the lognormal multilevel model to the
WLM measurements
• Errors in mine-year measurements are correlated by the
interpolation system used, and many miners worked in the same
mines, leading to correlated errors in the exposure history of each
miner
Pooled Nuclear Workers
• Multi-facility, multi-year study
– Each worker had badge measurements, but the technologies
changed through time and across facilities
• The systematic errors in each badge type are shared by all
subjects working at the time the badge was in use
– For many but not all types of personal monitor, some limited
work (using phantoms, etc.) has been done to assess the
relationship between true exposure and the badge
measurement
• One important issue is whether the low dose-rate
exposures of the nuclear workers produce risks that are in line
with those seen for the A-bomb survivors
– Upper confidence limits that take account of shared
dosimetry error are needed
Monte-Carlo Dosimetry
• Adopts a Bayesian framework
– Is Bayesian about sampling error in the experimental work
(with badges), interpreted as giving posterior distributions
– Prior distributions for uncertain parameters (for the nuclear
workers, the likely size of biases for unmeasured badges) are
built from expert opinion
• For each replication the uncertain parameters are sampled
from their distribution and combined with samples of other
random factors (e.g. local meteorology for the Hanford or
Utah studies) and with all relevant individual data for each
subject (location, milk consumption, age, etc)
• Each set of random quantities is combined with individual
data to form dose estimates for each individual
• Let us assume that the dose replications really
may be regarded as samples from the distribution
of true dose given all the individual data
– For retrospective dose-reconstruction systems this
assumption may be a very large leap of faith
– For other studies using badge calibration (Workers) or
measurement interpolation this may be considerably
more solidly founded.
• Consider the sampling characteristics of
frequentist inference concerning risk estimation.
We want to know the influence of uncertainty on
– The power to detect an effect (of exposure on risk of
disease) of a certain size
– Confidence limits on estimated risk parameters
An idealized dosimetry system
• Assume each replication of dose is a
sample from the joint distribution
f(X1, X2, …, XN | W1, W2, …, WN)
of true dose given the "input data" Wi
recorded for all subjects. Because many
realizations from f(X|W) are available we can
calculate
Zi = E(Xi | W)
as the average over a very large number of
realizations Xr, where Xr ~ f(X|W), r = 1, …
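
To make the averaging step concrete, here is a minimal Python sketch (not from the talk) in which a toy stand-in for the black-box dosimetry system returns one replication Xr per call and Zi = E(Xi | W) is approximated by averaging many replications; the error distributions and parameter values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the "black box": input data W for N subjects plus shared
# and unshared lognormal errors (the distributions are illustrative only).
N = 500
W = rng.uniform(0.5, 2.0, size=N)        # recorded input data per subject

def dosimetry_system():
    shared = rng.lognormal(mean=-0.02, sigma=0.2)             # one draw shared by all
    unshared = rng.lognormal(mean=-0.045, sigma=0.3, size=N)  # one draw per subject
    return shared * unshared * W                              # one replication Xr

# Zi = E(Xi | W), approximated by the average of many replications Xr ~ f(X|W)
R = 2000
Z = np.mean([dosimetry_system() for _ in range(R)], axis=0)
```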
How should an epidemiologist deal
with the uncertainty in the random
replications of dose?
• We are interested in estimating
parameters in the dose-response function
for disease Di given true dose Xi,
specifically the relationship
E(Di | Xi)
• parameterized by b (the dose-response slope)
Simplifications of the disease model
• Assume a linear relation between D and X:
E(Di | Xi) = a + b Xi     (1)
• Linear models are of interest for at least two
reasons
1. They may be important for radio-biological and radio-protection
reasons even for binary disease outcomes
(where logistic regression models are the "standard")
2. For "small" b it may be impossible to distinguish
between linear and smooth nonlinear (e.g. logistic)
dose-response shapes
– A study with good power to detect a dose-response
relationship may have very poor power to fully define the
shape of the response
Berkson error models
• If the errors in the Zi defined above (the deviations
Xi − Zi) are independent of one another, then fitting model
(1) is done by replacing the true Xi with Zi.
• This is a Berkson error model in the sense that
the truth is distributed around the measured
value. Regression-substitution yields unbiased
estimates.
• The classical error model has the measurement
distributed around the truth. This produces risk
estimates that are biased towards the null.
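
The contrast can be seen in a short simulation (my illustration, not the talk's): under Berkson error the truth scatters around the assigned value and the fitted slope is unbiased, while under classical error the measurement scatters around the truth and the slope is attenuated.

```python
import numpy as np

rng = np.random.default_rng(2)
N, a, b, sd_err, sd_noise = 100_000, 1.0, 2.0, 0.5, 1.0

# Berkson: truth scatters around the assigned value Z  (X = Z + e)
Z_berk = rng.uniform(1, 3, size=N)
X_berk = Z_berk + rng.normal(0, sd_err, size=N)
D_berk = a + b * X_berk + rng.normal(0, sd_noise, size=N)

# Classical: the measurement scatters around the truth  (Z = X + e)
X_cls = rng.uniform(1, 3, size=N)
Z_cls = X_cls + rng.normal(0, sd_err, size=N)
D_cls = a + b * X_cls + rng.normal(0, sd_noise, size=N)

slope = lambda z, d: np.cov(z, d)[0, 1] / np.var(z, ddof=1)
print("Berkson slope   ~ b      :", slope(Z_berk, D_berk))  # ~2.0 (unbiased)
print("Classical slope ~ b*R^2  :", slope(Z_cls, D_cls))    # attenuated toward 0
```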
Impact of independent measurement error
• For either Berkson or Classical error
models the most important effect of
random error is loss of power to detect
nonzero risk estimates
– If R² is the squared correlation between true
exposure X and measured exposure Z, then it
will take 1/R² times as many subjects to detect the same risk
using Z as using the true X.
Shared versus unshared dosimetry error
• A key distinction between the effects of shared
versus unshared dosimetry error is their effect
on the validity of the sample variance estimates
used to characterize the variability of the
estimates
– Independent Berkson Errors: The “usual” estimate of
the std error of the slope estimate remains valid despite
the loss of power
– Independent Classical Errors: Again the “usual”
estimate of the standard error of the slope estimates
generally remains valid despite
1. The loss of power
2. The attenuation in the dose response parameter estimate
Dosimetry simplifications
• Adopt a generalization of the Berkson error
model for the joint distribution of true Xi around
its conditional mean Zi which incorporates both
shared and unshared errors
Xi = εSM εM,i Zi + εSA + εA,i
where
– εSM is shared multiplicative error with mean 1
– εM,i is unshared multiplicative error with mean 1
– εSA is shared additive error with mean 0
– εA,i is unshared additive error with mean 0
• Under this shared and unshared
multiplicative and additive (SUMA) error
model we have E(X|W) = Z (the usual
Berkson property) over the distribution of
all four error terms ε
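
A minimal Python sketch of generating dose replications under the SUMA model; the lognormal/normal error distributions and the parameter values are my illustrative assumptions, not part of the dosimetry systems discussed above.

```python
import numpy as np

def suma_replication(Z, sd_SM=0.2, sd_M=0.3, sd_SA=0.1, sd_A=0.1, rng=None):
    """One dose replication Xr under the SUMA model
         Xi = eps_SM * eps_M,i * Zi + eps_SA + eps_A,i
    with multiplicative errors of mean 1 and additive errors of mean 0."""
    rng = np.random.default_rng(rng)
    N = len(Z)
    eps_SM = rng.lognormal(-sd_SM**2 / 2, sd_SM)           # shared multiplicative
    eps_M = rng.lognormal(-sd_M**2 / 2, sd_M, size=N)      # unshared multiplicative
    eps_SA = rng.normal(0, sd_SA)                          # shared additive
    eps_A = rng.normal(0, sd_A, size=N)                    # unshared additive
    return eps_SM * eps_M * Z + eps_SA + eps_A

# Averaging many replications recovers Z, i.e. the Berkson property E(X|W) = Z
Z = np.linspace(0.1, 2.0, 200)
Xbar = np.mean([suma_replication(Z, rng=r) for r in range(5000)], axis=0)
print(np.max(np.abs(Xbar - Z)))   # small when the number of replications is large
```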
• What happens when we fit
E(Di | Zi) = a* + b* Zi ?
If there are no measurement errors (Var(ε) = 0),
we will have (for small values of b*)
Var(b̂*) ≈ Var(D) / (N Var(Z))     (2)
Effects of shared and unshared errors on
estimation
– We are interested in three questions
regarding each error component in the
SUMA model
• What is its effect on study power?
• What is its effect on the validity of expression
(2) for the variance of the estimate of b?
• How valid are the estimates of study power
when they are based on expression (2)?
– Shared Additive error: has little effect on either
the estimation of b or on the variability of the
estimate
– Unshared additive or multiplicative errors:
reduce the correlation, R, between Xi and Zi,
thereby reducing study power; the reduction in
study efficiency due to unshared measurement
error is roughly proportional to R²
– However, expression (2) for the variance of the
estimator remains valid, and the estimate of study
power based on (2) remains appropriate
Effect of multiplicative shared error
• Averaging over the distribution of the random
εSM we retain the "Berkson" property that
E(b̂*) = b
but with
Var(b̂*) = Var{E(b̂* | εSM)} + E{Var(b̂* | εSM, εSA)}
         ≈ b² σ²SM + Var(D) / (N Var(Z))     (3)
• Notice that if b = 0, the "naïve"
estimate of the variance of b̂* that ignores
the shared error is equal to
the true variance of this parameter
• If |b| > 0, the naïve estimate of the variance
is biased downward by b² σ²SM
(a simulation sketch after the conclusions below illustrates this)
• We conclude
1. Ignoring shared error does not affect
the validity of the test of the null
hypothesis that b = 0, because
expression (3) = expression (2) when
b = 0
– More generally: non-differential measurement error
weakens the power of, but does not invalidate,
a test of association
between disease and exposure
2. Ignoring shared error will overstate the
power to detect a b > 0, because (2) < (3)
in this case
– Ignoring shared error will result in
confidence limits that are too
narrow
– However it is the upper
confidence limit that is most
affected.
• If the lower confidence limit ignoring
shared error does not include zero,
correcting for shared error will not
cause it to include zero (because of
conclusion 1)
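
The downward bias of the naïve variance can be checked numerically. The following sketch (my illustration, with assumed parameter values) repeatedly simulates a study with a shared multiplicative dose error, fits the slope against Z, and compares the empirical variance of b̂* with expression (2) and with expression (2) plus b²σ²SM as in (3).

```python
import numpy as np

rng = np.random.default_rng(4)
N, a, b, sd_SM, sd_D = 2000, 0.0, 1.0, 0.3, 1.0
Z = rng.uniform(0.5, 2.0, size=N)

slopes = []
for _ in range(3000):
    eps_SM = rng.lognormal(-sd_SM**2 / 2, sd_SM)      # shared multiplicative error
    X = eps_SM * Z                                    # true doses for this study
    D = a + b * X + rng.normal(0, sd_D, size=N)
    slopes.append(np.cov(Z, D)[0, 1] / np.var(Z, ddof=1))

naive = sd_D**2 / (N * np.var(Z, ddof=1))             # expression (2)
shared = b**2 * (np.exp(sd_SM**2) - 1)                # b^2 * Var(eps_SM)
print("empirical Var(b*)  :", np.var(slopes, ddof=1))
print("naive (2)          :", naive)                  # far too small when b != 0
print("corrected (3)      :", naive + shared)
```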
How to incorporate shared ME directly into
an analysis
• Multiple imputation
• “Full Parametric Bootstrap”
• Likelihood analysis with MCML
Multiple Imputation
• It is tempting to try to quantify the uncertainty in b̂
by regressing Di on each set of Xr and using the
quantiles of the resulting slope estimates b̂r as confidence limits
for b
– This ignores the sampling variability of D
– Moreover the distribution of the slope estimates can be
badly biased towards the null value. Essentially there is
a reintroduction of classical error into the problem
• True multiple imputation requires sampling Xr
from the distribution of X given both the input
data W and the outcomes Di (not just W) to
remove these biases
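
A small simulation (my illustration, not the talk's data) of the first pitfall: each replication Xr scatters around Zi independently of the true dose that generated Di, so regressing Di on individual replications behaves like classical error and attenuates the slope, while regression substitution on Zi does not.

```python
import numpy as np

rng = np.random.default_rng(5)
N, b, sd_M, sd_D = 5000, 1.0, 0.5, 0.5
Z = rng.uniform(0.5, 2.0, size=N)
X_true = Z * rng.lognormal(-sd_M**2 / 2, sd_M, size=N)    # truth scatters around Z
D = b * X_true + rng.normal(0, sd_D, size=N)

slope = lambda x: np.cov(x, D)[0, 1] / np.var(x, ddof=1)

# Regression substitution on Z: approximately unbiased for b
print("slope on Z       :", slope(Z))

# Regressing on individual replications Xr reintroduces classical-type error:
# each Xr scatters around Z independently of X_true, so the slopes attenuate
reps = [Z * rng.lognormal(-sd_M**2 / 2, sd_M, size=N) for _ in range(200)]
print("mean slope on Xr :", np.mean([slope(Xr) for Xr in reps]))  # well below b
```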
“Full Parametric Bootstrap”
• A simulation experiment in which b̂ is used as the
true value of the risk parameter and both doses and
outcomes Di are simulated from a complete model
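
A minimal sketch of the idea for a linear model (the ORNL application below uses partial likelihood for cohort data, so this is a simplification; the function and argument names are mine): b̂ is treated as the true slope, and each bootstrap round draws a fresh dose replication and a fresh set of outcomes before re-fitting.

```python
import numpy as np

def full_parametric_bootstrap(b_hat, Z, dosimetry_system, sd_D, n_boot=1000, seed=None):
    """Full parametric bootstrap for a linear dose-response model:
    treat b_hat as the true slope, re-simulate doses and outcomes from the
    complete model, and re-estimate the slope against Z each round."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        X = dosimetry_system()                              # one replication Xr
        D = b_hat * X + rng.normal(0, sd_D, size=len(Z))    # simulated outcomes
        estimates.append(np.cov(Z, D)[0, 1] / np.var(Z, ddof=1))
    return np.array(estimates)                              # bootstrap distribution of b_hat
```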
Monte-Carlo maximum likelihood
• We can compute likelihood ratio tests as
follows
– For null a0 and b0 generate n samples of Xr
from the distribution of X given W and D
– For any test values a and b compute the log
likelihood ratio as
ln λ(a, b) = log { (1/n) Σr f(D | Xr; a, b) / f(D | Xr; a0, b0) }     (4)
– If we use b0 = 0 then we don’t have to condition
on D (so that we can use the dosimetry system
directly)
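
A sketch of the Monte-Carlo average (4) for a Gaussian linear model f(D | X; a, b) with known residual standard deviation (my simplification; the ORNL application uses partial likelihoods). Here dose_reps holds the replications Xr, which can be drawn from the dosimetry system directly when b0 = 0.

```python
import numpy as np

def mcml_loglik_ratio(D, dose_reps, a, b, a0=0.0, b0=0.0, sd_D=1.0):
    """ln lambda(a, b) = log{ (1/n) sum_r f(D | Xr; a, b) / f(D | Xr; a0, b0) }
    for a Gaussian linear model f(D | X; a, b) = Normal(a + b X, sd_D^2).
    dose_reps is an (n_reps, N) array of replications Xr ~ f(X | W)."""
    def loglik(a_, b_):                                     # log f(D | Xr; a_, b_)
        resid = D[None, :] - a_ - b_ * dose_reps            # shape (n_reps, N)
        return -0.5 * np.sum(resid**2, axis=1) / sd_D**2    # up to a constant that cancels
    ratios = np.exp(loglik(a, b) - loglik(a0, b0))          # the summands of (4)
    return np.log(ratios.mean())
```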
Once we compute the likelihood what do
we do with it?
• We have a funny mishmash; we are being
– Bayesian about the doses
– Frequentist about the dose-response parameter
• Moreover we can’t really expect standard
Frequentist asymptotic likelihood theory to hold
– Suppose the number of subjects → ∞; then the
distribution of b̂ will be dominated by the distribution
of the shared multiplicative errors in the dosimetry
system, whose distribution is arbitrary.
– Is it still reasonable to use chi-square approximations to
the distribution of changes in log likelihood?
Other problems
• If shared multiplicative error is large, then as |b − b0|
gets large the summands in (4),
f(D | Xr; a, b) / f(D | Xr; a0, b0),
become extremely variable
– Convergence of the average is incredibly slow
– Round-off error dominates the performance of the
algorithm
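
One common numerical safeguard (my suggestion, not necessarily what was done in the applications described here) is to form the average on the log scale with a log-sum-exp, so a few dominant replications do not overflow or wipe out the others through round-off; this helps the floating-point behaviour but not the slow Monte-Carlo convergence.

```python
import numpy as np
from scipy.special import logsumexp

def mcml_loglik_ratio_stable(D, dose_reps, a, b, a0=0.0, b0=0.0, sd_D=1.0):
    """Same Monte-Carlo average as mcml_loglik_ratio above, computed as a
    log-mean-exp so extreme likelihood ratios do not overflow or underflow."""
    def loglik(a_, b_):
        resid = D[None, :] - a_ - b_ * dose_reps
        return -0.5 * np.sum(resid**2, axis=1) / sd_D**2
    log_ratio = loglik(a, b) - loglik(a0, b0)               # per-replication log summand
    return logsumexp(log_ratio) - np.log(len(dose_reps))    # log of the average ratio
```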
Application to the ORNL nuclear workers data
(Stayner et al., in review)
• Estimate a single risk parameter using
(time dependent) total dose in a partial
likelihood analysis
• Write a computer program that simulates
the bias factors for the badges used in
those facilities and re-links the risk sets
Three analyses
1. Compute the MCML Likelihood
– For each replication of doses compute the
partial likelihoods over a 1-dimensional grid of
risk parameters
– Average the partial likelihoods over the
replications
– Pretend that the asymptotics still hold and
compute a confidence interval
2. Compute FPB estimates of b̂
– Compare these to the MCML confidence intervals
3. For each set of D computed in 2, compute a
separate MCML confidence interval (more
simulations from the dose distribution)
– Count the number of times that the standard
frequentist confidence interval contains the true
value of the risk parameter
[Figure: MCML log likelihood and uncorrected log likelihood plotted over a grid of risk parameter values (0 to 12).]
[Figure: Histogram of the FPB simulations of b̂ (Betahat_r): 7.8% fall below the MCML lower confidence limit, 6.8% below 0, and 3.3% above the MCML upper confidence limit.]
Some observations
• The MCML widens the confidence interval on the
high side more than the low side
– The 90 percent “asymptotic” lower CI for the MCML
does not include 0.
• This is good because (1): the uncorrected CI did not
include 0; and (2) we claim that correcting for
measurement error shouldn’t affect the significance of a
test of no association.
• Note that the two curves (uncorrected and MCML log
likelihoods) are very close to parallel at b = 0
– This implies that a score test of beta = 0 will be (nearly)
identical using the corrected and uncorrected likelihoods,
using any significance criterion
• This observed result follows from Tosteson and Tsiatis
(1988) on score tests for errors-in-variables (EIV) problems
• The FPB on the other hand puts “significantly”
more than 5 percent of the estimates < 0 (68 of
1,000) and significantly fewer of the estimates (33
of 1,000) above the MCML UCI.
• This may actually be a promising observation for
the validity of the MCML confidence intervals
– They tend to be skewed to the right (not symmetric
around the MLE) so more (than 3.3 percent) of the upper
confidence limits and fewer (than 6.8 percent) of the
lower confidence intervals should fail to contain the true
value
– Simulations are now in progress
Validity of MCML CI
• Consider the limiting case when n → ∞: the slope estimate will be
determined by the distribution of the shared multiplicative errors
• Worst case would be the SUMA model
– Suppose that εSM is distributed as lognormal with arithmetic mean 1
(log-scale mean −σ²/2)
– Then b̂ = b εSM is also distributed as lognormal with mean
parameter log(b) − σ²/2 and log variance σ²
– Consider twice the change in log likelihood from the true b to the MLE
– This will be (1/σ²)[log(b̂) − (log(b) − σ²/2)]², which is exactly χ² with 1 df
• Consider next a normal distribution for the shared multiplicative
error
– Would make sense if εSM was itself a sum of many components
• For this model twice the change in log likelihood
is of the form
−2 log(|εSM|) + (1/σ²)(εSM − 1)² + c
where c = 2 log(½ + ½√(1 + 4σ²)) − [½√(1 + 4σ²) − ½]² / σ²
• What is the distribution of this random variable?
– How close is it to a Chi Square w 1 df?
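
The question can be probed by direct simulation under the limiting approximation b̂ = b εSM. A sketch (with my parameter choices) that draws the shared error, evaluates twice the change in log likelihood for the lognormal and normal cases using the formulas above, and compares the tail behaviour with χ² on 1 df:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(9)
sigma, n_sim = 0.3, 100_000

# Lognormal shared error: twice the change in log likelihood is exactly chi^2(1)
eps = rng.lognormal(-sigma**2 / 2, sigma, size=n_sim)
stat_lognormal = (np.log(eps) + sigma**2 / 2) ** 2 / sigma**2

# Normal shared error: use the closed form quoted above
eps = rng.normal(1.0, sigma, size=n_sim)
u = 0.5 + 0.5 * np.sqrt(1 + 4 * sigma**2)
c = 2 * np.log(u) - (u - 1) ** 2 / sigma**2
stat_normal = -2 * np.log(np.abs(eps)) + (eps - 1) ** 2 / sigma**2 + c

cutoff = chi2.ppf(0.95, df=1)                     # nominal 5% critical value
for name, stat in [("lognormal", stat_lognormal), ("normal", stat_normal)]:
    print(name, "exceedance of the chi-square 5% point:", np.mean(stat > cutoff))
```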
[Figure: Empirical CDFs of twice the change in log likelihood compared with the χ² (1 df) CDF, for shared-error standard deviations 0.1, 0.2, 0.3, and 0.4.]
Conclusions
• The MCML has promise but it is complicated
– But other methods (multiple imputation, etc.) have
complications of their own
– Score tests of b=0 based on the average likelihood agree with
analyses that ignore measurement errors
• Our application of the MCML method for partial likelihoods
ignores the dilution effects described by Prentice
(Biometrika 1982): but these are expected to be very small
in most settings
• In shared error settings the asymptotics are not “correct”
for ordinary frequentist calculations, but it seems to be
hard to come up with situations where they fail drastically
Acknowledgements
• Leslie Stayner, Stephen Gilbert UIC/NIOSH
• Elisabeth Cardis, Martine Vrijheid, Isabelle
Deltour, IARC
• Geoffrey Howe, Columbia
• Terri Kang, USC