Missing Data and Repeated Measurements

Presentation title
Date
Søren Andersen
Menu
• Missing data taxonomy, MCAR, MAR, MNAR
• Handling of missing data in MMRM (SAS)
• Illustration of missing data
• Technical aspects of Multiple Imputation (SAS)
• Inference based on Multiple Imputation (SAS)
• Intent to treat, ITT
• Estimand concept
• Structural missing data
Missing data taxonomy
• MCAR: the missing-data mechanism is independent of all data,
observed and unobserved
• MAR: the missing-data mechanism is independent of the unobserved
data given the observed data (the trajectory and the model)
• MNAR: all other cases; the missing-data mechanism depends on the
missing data given the observed data
• A useful distinction when (only when?) the missing data are not
counterfactual. Is it useful if a subject dies, or cannot tolerate the treatment?
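The three mechanisms can be made concrete with a small simulation. The following is a minimal Python sketch, not part of the course material (which uses SAS). It assumes two standard-normal measurements y1 and y2 with correlation 0.7, where y1 is always observed and y2 may be missing. The drop-out rules, cut-offs and function names are hypothetical. The true mean of y2 is 0, so any clear departure from 0 is bias.

```python
import math
import random

def simulate_pairs(n=20000, rho=0.7, seed=1):
    """Draw (y1, y2) pairs, each standard normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        y2 = rho * y1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((y1, y2))
    return pairs

def is_missing(y1, y2, mechanism, rng):
    """Hypothetical drop-out rules for y2 (y1 is always observed)."""
    if mechanism == "MCAR":
        return rng.random() < 0.3   # unrelated to any data
    if mechanism == "MAR":
        return y1 > 0.5             # depends only on the observed y1
    if mechanism == "MNAR":
        return y2 > 0.5             # depends on the unobserved y2 itself
    raise ValueError(mechanism)

def complete_case_mean(pairs, mechanism, seed=2):
    """Mean of y2 among subjects whose y2 was observed."""
    rng = random.Random(seed)
    kept = [y2 for y1, y2 in pairs if not is_missing(y1, y2, mechanism, rng)]
    return sum(kept) / len(kept)

def regression_adjusted_mean(pairs, mechanism, seed=2):
    """Fit y2 ~ y1 on completers, then average the fitted line over all y1."""
    rng = random.Random(seed)
    obs = [(y1, y2) for y1, y2 in pairs
           if not is_missing(y1, y2, mechanism, rng)]
    m1 = sum(y1 for y1, _ in obs) / len(obs)
    m2 = sum(y2 for _, y2 in obs) / len(obs)
    slope = (sum((y1 - m1) * (y2 - m2) for y1, y2 in obs)
             / sum((y1 - m1) ** 2 for y1, _ in obs))
    all_y1_mean = sum(y1 for y1, _ in pairs) / len(pairs)
    return m2 + slope * (all_y1_mean - m1)

pairs = simulate_pairs()
for mech in ("MCAR", "MAR", "MNAR"):
    print(mech,
          round(complete_case_mean(pairs, mech), 3),
          round(regression_adjusted_mean(pairs, mech), 3))
```

In this setup the complete-case mean is close to 0 only under MCAR. Conditioning on the observed y1 removes the bias under MAR but not under MNAR.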
Missing data taxonomy
• General result: if the missing-data mechanism is MAR, then the ML
principle adjusts for missing data in the estimation of model
parameters
• Are the model parameters linked to the scientific question of
interest?
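The general result rests on a factorisation of the likelihood. The sketch below uses notation that is not in the slides and is introduced here only for illustration: y_obs and y_mis are the observed and missing responses, r the missingness indicators, θ the model parameters and ψ the parameters of the missing-data mechanism. It also assumes that θ and ψ are distinct.

```latex
% Notation (assumed, not from the slides):
%   y_obs, y_mis : observed and missing parts of the response
%   r            : missingness indicators
%   \theta, \psi : model parameters and missing-data-mechanism parameters
\begin{align*}
f(y_{\mathrm{obs}}, r \mid \theta, \psi)
  &= \int f(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid \theta)\,
          f(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, \psi)\, dy_{\mathrm{mis}} \\
  &= f(r \mid y_{\mathrm{obs}}, \psi)
     \int f(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid \theta)\, dy_{\mathrm{mis}}
     \qquad \text{(MAR)} \\
  &= f(r \mid y_{\mathrm{obs}}, \psi)\, f(y_{\mathrm{obs}} \mid \theta).
\end{align*}
```

The first factor does not involve θ, so maximising the likelihood of the observed data alone gives the same estimate of θ as maximising the full likelihood.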
Estimands
• Estimand = what is planned to be estimated
• Need to improve clarity between objective and estimand
• The problem is a lack of clarity/detail in protocols regarding study objectives
• In many situations estimands are not clear with regard to how post-treatment events such as drop-out, switching, rescue
medication or retrieved data should be taken into account
Nonclinical example: CIA model in mice
• 4 treatment groups, vehicle and three doses of new drug
• 18 mice in each group
• Disease model, the animals are given a clinical (integer) score each
day (duration 11 days), higher scores for worse condition
• Animals are withdrawn from study if the score is greater than or
equal to 11
• Compare the groups with respect to the score on day 11, and with
respect to the total score over days (equivalent to the average score)
How are/should drop-outs be treated?
• If not for ethical reasons we would have liked the animals to stay in
the study – we would like to know what would have happened had
the animals continued treatment
• Estimand: Difference in outcome improvement if all subjects adhered
• Data are missing at random since animals are removed due to a high
score, which is observed, and included in the model
• Hence a mixed model (ML) approach is useful: if the “right” model is
selected, then appropriate adjustment for missing values takes place
Statistical model
• Repeated measurement model:
proc mixed;
  class group day id;
  model score = day group(day) / noint solution ddfm = satterth;
  repeated day / subject = id type = un R Rcorr;
run;
• Note, different group means for each day, unstructured covariance
matrix
$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, V)$
Explanation of consequences of MMRM
• Obtain predictions of the missing values:
model score = day group(day) / outp = predScore;
• The option “outp” gives the predicted values,
calculated as the conditional mean given the observed values
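For reference, the conditional mean given the observed values is the standard result for a multivariate normal vector. The partition into observed (o) and missing (m) parts and the subscripted symbols below are notation introduced here, not taken from the slides. In practice β and V are replaced by their estimates.

```latex
% Notation (assumed): for one subject, Y_o and Y_m are the observed and
% missing responses, X_o and X_m the matching rows of the design matrix,
% \beta the fixed-effect vector, and V_{oo}, V_{mo} the matching blocks
% of the covariance matrix V.
\[
E(Y_m \mid Y_o) \;=\; X_m \beta \;+\; V_{mo} V_{oo}^{-1}\,\bigl(Y_o - X_o \beta\bigr)
\]
```

The matrix $V_{mo} V_{oo}^{-1}$ acts as a set of regression coefficients of the missing visits on the observed ones, so a subject observed above the group mean is also predicted above the group mean at later visits.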
Comments to MMRM predictions
• For this case the predictions are extrapolations
• The predictions are based on a regression model, regression
coefficients are derived from the variance-covariance matrix
• The predictions can be explained as “what would have happened to
the animals, had they continued on treatment”
Correlation matrix UN and TOEPH
Comparison of lsmeans from UN and TOEPH
Illustration of drop-outs
• Kaplan-Meier plots
• Descriptive plots
• Simple means
• Spaghetti plots
• Plots to show MMRM model assumptions and model handling of drop-outs
• Predicted trajectories
• Influence plots
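A Kaplan-Meier plot of time to withdrawal needs only each subject's last day in the study and whether the subject was withdrawn on that day. The following is a minimal Python sketch of the estimator behind such a plot. The course exercises use SAS, and the function name and the six example animals are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability of remaining in the study.

    times  : last day each subject was in the study
    events : 1 if the subject was withdrawn on that day, 0 if censored
             (for example, completed the study)
    Returns a list of (day, probability) pairs at each withdrawal day.
    """
    survival = 1.0
    curve = []
    withdrawal_days = sorted({t for t, e in zip(times, events) if e == 1})
    for day in withdrawal_days:
        at_risk = sum(1 for t in times if t >= day)
        withdrawn = sum(1 for t, e in zip(times, events) if t == day and e == 1)
        survival *= 1.0 - withdrawn / at_risk
        curve.append((day, survival))
    return curve

# Hypothetical group of six animals in an 11-day study:
# three completed (censored at day 11), three were withdrawn earlier.
days = [11, 11, 6, 8, 11, 4]
withdrawn = [0, 0, 1, 1, 0, 1]
for day, prob in kaplan_meier(days, withdrawn):
    print(day, round(prob, 3))
```

Computing one curve per treatment group and plotting them as step functions gives the usual drop-out display.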
Influence of individual subjects on treatment difference
Exercise 1
• Run the SAS code and inspect the spaghetti plots and mean plots
• Run the SAS code for proc mixed and inspect the prediction plots and
the influence plot
Multiple imputation, 3 steps
• The missing data values are imputed M times from an imputation
model and M complete data sets are obtained
• Each complete data set is analysed. Since the data are complete, a simple
model can often be used, including only data from the last visit
• The M sets of parameter estimates are combined into one estimate,
and an approximation of the variance of the estimate is obtained
from Rubin’s formula
• Different choices of the imputation model can be used to model
different MNAR assumptions
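The combination step can be written out in a few lines. The following is a minimal Python sketch of Rubin's rules for a scalar parameter (in the course this step is done in SAS). The function name, the dictionary keys and the example numbers are hypothetical.

```python
import math

def rubin_combine(estimates, variances):
    """Combine M point estimates and their squared standard errors.

    estimates : the M estimates, one per imputed data set
    variances : the M squared standard errors of those estimates
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                      # combined estimate
    within = sum(variances) / m                     # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between      # Rubin's total variance
    if between == 0.0:
        df = math.inf                               # no imputation uncertainty
    else:
        df = (m - 1) * (1.0 + within / ((1.0 + 1.0 / m) * between)) ** 2
    return {"estimate": q_bar, "within": within, "between": between,
            "total_variance": total, "df": df}

# Hypothetical treatment differences from M = 3 imputed data sets.
result = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(result["estimate"], math.sqrt(result["total_variance"]), result["df"])
```

The between-imputation term is what carries the uncertainty due to the missing data. The degrees of freedom shown are Rubin's original large-sample version, and small-sample adjustments exist.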
Multiple Imputation methods
• Two models, an imputation model and an analysis model
• The two models may be (and often are) different, e.g. the imputation
model may include post-randomisation information, such as the history
up to drop-out (but it should include all analysis-model terms)
Different multiple imputation (MI) models
• MI from own treatment group
• MI with a penalty parameter (less favourable response in drop-outs)
• pMI, all imputations of missing data are based on Placebo group data
only
• Discard all post-baseline measurements from Active group and
impute from baseline based on Placebo group
Placebo multiple imputation, pMI (Ratitch & O’Kelly)
• First impute missing values to obtain only monotone missing value
patterns. The filling algorithm is based on a multivariate normal
model for the variables baseline, y1, y2,… and the filling is done
separately for each treatment group (not a sensitivity issue)
• Then missing data are imputed sequentially from the first visit. The
imputation is from an imputation regression model with
parameters estimated from the placebo (or control) group
Explanation of concepts in placebo imputation
Imputation in the control group is by a regression model
$E(Y_2 \mid Y_1) = m_{c2} + b_c (Y_1 - m_{c1})$
where the parameters $m_{c1}$, $m_{c2}$ and $b_c$ are estimated from the control
group (the parameters are sampled, and random noise is added to $E(Y_2 \mid Y_1)$).
Imputation in the active group uses the same model,
$E(Y_2 \mid Y_1) = m_{c2} + b_c (Y_1 - m_{c1})$
so a gained treatment benefit $(Y_1 - m_{c1})$ will be moderated (in general
$b_c < 1$).
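The moderation of the treatment benefit can be checked numerically. The following is a simplified Python sketch (the course itself uses SAS). It estimates the control-group means mc1 and mc2 and the slope bc by least squares from hypothetical control data, then imputes the second visit for an active-arm subject. Unlike the procedure described above, the parameters are not sampled, so this is a single simplified draw. All data values and function names are invented for illustration.

```python
import math
import random

def fit_control_regression(control_pairs):
    """Least-squares fit of Y2 on Y1 using control-group completers only."""
    n = len(control_pairs)
    mc1 = sum(y1 for y1, _ in control_pairs) / n
    mc2 = sum(y2 for _, y2 in control_pairs) / n
    sxx = sum((y1 - mc1) ** 2 for y1, _ in control_pairs)
    sxy = sum((y1 - mc1) * (y2 - mc2) for y1, y2 in control_pairs)
    bc = sxy / sxx
    rss = sum((y2 - (mc2 + bc * (y1 - mc1))) ** 2 for y1, y2 in control_pairs)
    sigma = math.sqrt(rss / (n - 2))        # residual standard deviation
    return mc1, mc2, bc, sigma

def conditional_mean(y1, mc1, mc2, bc):
    """E(Y2 | Y1) under the control-group regression."""
    return mc2 + bc * (y1 - mc1)

def impute_y2(y1, params, rng):
    """One imputed Y2: conditional mean plus random noise.

    Simplification: the fitted parameters are used as-is instead of
    being sampled, so parameter uncertainty is ignored.
    """
    mc1, mc2, bc, sigma = params
    return rng.gauss(conditional_mean(y1, mc1, mc2, bc), sigma)

# Hypothetical control-group completers: (Y1, Y2), lower is better.
control = [(8.0, 11.0), (10.0, 12.5), (12.0, 12.5), (14.0, 14.0)]
params = fit_control_regression(control)
mc1, mc2, bc, sigma = params

active_y1 = 5.0   # active-arm subject who dropped out after visit 1
print("slope bc:", bc)
print("benefit at visit 1:", active_y1 - mc1)
print("benefit carried to visit 2:",
      conditional_mean(active_y1, mc1, mc2, bc) - mc2)
print("one imputed Y2:", impute_y2(active_y1, params, random.Random(7)))
```

With these numbers the slope is below 1, so only part of the subject's visit-1 advantage over the control mean is carried into the imputed visit-2 value.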
Analysis of imputed data sets
• Number of imputations? Rubin: 5, NN: 100, Roger: 1000
• A simple analysis of each of the imputed, complete data sets
• Only the measurement from the last visit is used in the analysis
• Synthesis of information from all the imputations by proc MIAnalyze
in SAS (Rubin’s method)
• Issue with Rubin’s method: is the variance estimate conservative?
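One way to relate the suggested numbers of imputations is Rubin's large-sample relative efficiency of M imputations compared with infinitely many, 1 / (1 + γ/M), where γ is the fraction of missing information. The following is a small Python sketch. The function name and the value γ = 0.3 are assumptions chosen only for illustration.

```python
def relative_efficiency(m, fmi):
    """Rubin's relative efficiency of m imputations versus infinitely many.

    m   : number of imputations
    fmi : fraction of missing information (between 0 and 1)
    """
    return 1.0 / (1.0 + fmi / m)

# Hypothetical fraction of missing information of 0.3.
for m in (5, 100, 1000):
    print(m, round(relative_efficiency(m, 0.3), 4))
```

By this measure alone, a handful of imputations already loses little. Larger values of M are usually motivated by the wish to make standard errors and p-values reproducible from one run to the next.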
Exercise 2, pMI applied to TLC data
• Run the SAS code to impute missing data in a stepwise manner from
the placebo group
• Analyse the imputed data sets, only data from last visit (week 6)
• Combine estimates across the imputed data sets
• Compare to results from MMRM
• Explain the results
• Make appropriate changes in the program and perform the
imputation from baseline in active group
• Compare and explain the results
Multiple imputations for binary data
• If binary data were obtained by dichotomization (e.g. responder
analysis) then imputation may be performed on the original
continuous scale
• Use logistic regression option in proc MI
Exercise 3 Amenorrhea data
• Use imputation from own group by a logistic model and analyse
response at last occasion
• Use imputation from dose 0 by a logistic model and analyse response
at last occasion
• Compare the results to a population-average model and a
subject-specific model
Estimands
• Estimand = what is planned to be estimated
• Need to improve clarity between objective and estimand
• The problem is a lack of clarity/detail in protocols regarding study objectives
• In many situations estimands are not clear with regard to how post-treatment events such as drop-out, switching or rescue medication
should be taken into account
FDA comment
• In your study protocol please include a section describing how you plan to
address missing data.
• We recommend missing data be avoided by continuing to collect (efficacy
and safety) data even from subjects who prematurely discontinue study
drug.
• Our preference is that the primary analysis 1) include all data, not just
data while adhering to study drug, and 2) for the limited missing data that
do occur, it be represented by what their response likely would have been
had it been measured.
• Because missing data tend to be associated with treatment adherence, it
would not be appropriate to have an analysis that uses information from
those with data who adhered to treatment to describe what happened to
those without data who did not adhere to treatment.
Definition of estimand
• An estimand: A more detailed objective
• Population of interest
• Endpoint
• Measure of intervention effect
1. Discontinuation of study
2. Discontinuation of treatment
3. Rescue medication
4. Retrieved data
Estimand 1: Difference in outcome improvement at the planned endpoint for all randomized participants
• Data after withdrawal from the initially randomized medication
and/or the addition of a rescue medication are included in the
analysis
• The intention-to-treat framework (i.e. ITT population) is used to
compare the initially randomized groups regardless of what
treatment subjects actually received
• Rescue medications can mask or exaggerate both the efficacy and
safety effects of the initially assigned treatments
• Causal effects of the investigational drugs are typically the focus, not
treatment policies (efficacy, not effectiveness)
Estimand 1 examples
• Regulators may prefer this estimand as it addresses the expected
change in the population
• Obesity trials
• Duration 12 months
• Subjects who discontinue treatment should come back at the planned
final visit for assessments
• Missing data for subjects who do not return should be imputed,
preferably from subjects who discontinue but return
Estimand 2: Difference in outcome improvements in tolerators
• Compares the mean outcomes for treatment versus control in the
subset of the population who initially tolerate the treatment
• A run-in phase may be used to identify patients that meet efficacy
and/or safety and tolerability criteria to continue
• Without run-in: potential bias since non-tolerators are not identified
in placebo
Estimand 2, comments
• Patients and prescribers may prefer this estimand, since it can be
interpreted on an individual level (Leuchs et al. 2015)
• A cross-over trial may be used to identify non-tolerators (only data
from completers is used)
• In parallel group design data from non-tolerators are imputed
(counter-factual) from tolerators?
• Subjects with missing data due to other reasons (e.g. lack of
efficacy), have data imputed from own treatment group
Estimand 3: Difference in outcome improvement if all subjects tolerated or adhered
• A hypothetical (counterfactual) parameter: there will always be
patients who cannot adhere
• Secondary assessments of effectiveness are needed
Estimand 6 (Mallinckrodt et al.)
• Difference in outcome improvement in all randomized patients at the
planned endpoint of the trial attributable to the initially randomized
medication
• Estimand 6 needs to be free from the confounding effects of rescue
medications (so in general there will be missing data)
• Assesses effectiveness
Comments on estimand 6
• Data from subjects who discontinue treatment may be imputed from
placebo starting from first visit with missing data
• Useful quantity to estimate in e.g. a 6 months study of treatment for
a chronic disease?
• The problem is non-tolerators: should they be treated differently in
the two arms, when the comparator is not placebo?
Primary analysis and sensitivity analysis
• The current trend seems to be that the required estimand forces
imputation of missing data (MMRM addresses another estimand)
• Two types of sensitivity analyses:
• Internal validation addresses robustness of the estimation method
for the primary estimand with respect to model assumptions
• External validation addresses alternative estimands, i.e. robustness
with regard to generalizability.
Exercise Social Anxiety Disorder
• Data are from two arms (A = placebo, B = new drug) in a study of
Social Anxiety Disorder. Measurements are recorded as LSAS scores
at baseline (week 0) and at weeks 1, 2, 4, 6, 8, 10 and 12. The primary
endpoint is the change from baseline to week 12
• The data set contains two binary indicators: wloe = 1 for withdrawal
due to lack of efficacy, and wae = 1 for withdrawal due to an adverse
event
• Assume you were to plan a new, similar study. Discuss the estimand,
in particular how missing data due to the two withdrawal causes
should be handled. Run the SAS code to get an idea of the withdrawal rates