
Comparing two strategies for
primary analysis of longitudinal
trials with missing data
Peter Lane
Research Statistics Unit
Acknowledgements
• Missing data working group (2001– )
– Fiona Holland (Stats & Prog, Harlow)
– Byron Jones (RSU Harlow)
– Mike Kenward (LSHTM)
• MNLM vs LOCF working group (2004– )
– Paul McSorley (Psychiatry area leader, RTP)
– Suzanne Edwards & Wen-Jene Ko (S&P, RTP)
– Kath Davy, Claire Blackburn, Andrea Machin (S&P, Harlow)
FDA/Industry Workshop 23 September 2004
Contents
• Outline of the problem
• Methods of analysis
• Six clinical trials in GSK
• Simulation study
– parameters estimated from trials
– range of drop-out mechanisms
– comparison of two methods of analysis
• Conclusions
Outline of the problem
• Missing values in longitudinal trials are a major issue
– First aim should be to reduce the proportion of missing values
– Ethics dictate that drop-out can't be avoided entirely
– Information lost can't be conjured up
– There is no magic method to fix it
• Magnitude of the problem varies across therapeutic areas
– 8-week depression trial: 25%–50% may drop out by the final visit
– 12-week asthma trial: maybe only 5%–10%
– Most serious when efficacy is evaluated at the end of the trial
Methods of analysis
• Ignore drop-out
– CC (complete-case analysis)
• Single imputation of missing values
– LOCF (last observation carried forward)
• Impute several times by sampling from estimated distributions
– MI (multiple imputation)
• Fit a model for the response at all time-points
– GEE (generalized estimating equations)
– MNLM (multivariate normal linear model; also referred to as MMRM, or mixed-model repeated measures)
• Model drop-out as well as response
– SM (selection models)
– PMM (pattern-mixture models)
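A toy illustration of how CC and LOCF summarise the final visit, on a small wide-format array (rows are patients, columns are visits, NaN marks visits after drop-out). The data values and the helper name `locf` are hypothetical; this is a sketch assuming monotone drop-out with the first visit always observed.

```python
import numpy as np

# Hypothetical scores (lower = better); NaN marks visits after drop-out.
Y = np.array([
    [20.0, 18.0, 15.0, 12.0],
    [22.0, 21.0, np.nan, np.nan],  # drops out after visit 2
    [19.0, 17.0, 16.0, np.nan],    # drops out after visit 3
    [24.0, 20.0, 18.0, 14.0],
])

def locf(Y):
    """Fill each missing value with the patient's previous value (monotone drop-out)."""
    out = Y.copy()
    for t in range(1, out.shape[1]):
        miss = np.isnan(out[:, t])
        out[miss, t] = out[miss, t - 1]
    return out

completers = ~np.isnan(Y[:, -1])
cc_mean = Y[completers, -1].mean()   # completers only: (12 + 14) / 2
locf_mean = locf(Y)[:, -1].mean()    # all patients: (12 + 21 + 16 + 14) / 4
print(f"CC final mean: {cc_mean:.2f}, LOCF final mean: {locf_mean:.2f}")
# CC final mean: 13.00, LOCF final mean: 15.75
```

Because scores decline over time in this toy example, carrying earlier values forward pulls the LOCF mean upwards, which is why LOCF relies on the absence of trends even when drop-out is completely at random.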
Properties of methods
• MCAR: drop-out independent of the response, observed or unobserved
– CC is valid, though it ignores the earlier observations of patients who drop out
– LOCF is valid if there are no trends with time
• MAR: drop-out depends only on observed responses
– CC, LOCF, GEE invalid
– MI, MNLM, weighted GEE valid
• MNAR: drop-out depends also on unobserved responses
– CC, LOCF, GEE, MI, MNLM invalid
– SM, PMM valid if (uncheckable) assumptions true
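In the usual notation (an assumption here, since the slides define the mechanisms in words), split each patient's planned responses into observed and missing parts, $Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}})$, let $R$ be the drop-out indicator, and suppress covariates. The three mechanisms are then:

```latex
\begin{aligned}
\text{MCAR:}\quad & \Pr(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = \Pr(R) \\
\text{MAR:}\quad  & \Pr(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = \Pr(R \mid Y_{\mathrm{obs}}) \\
\text{MNAR:}\quad & \Pr(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \text{ depends on } Y_{\mathrm{mis}} \text{ even given } Y_{\mathrm{obs}}
\end{aligned}
```

Under MAR (with the drop-out and response models having distinct parameters) the drop-out model can be ignored in a likelihood analysis, which is why MNLM and MI remain valid there while CC and LOCF do not.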
Usage of methods
• In the past, LOCF has been used widely
– seen as conservative: not necessarily true
– thought to give, together with CC, an envelope containing the true effect: not necessarily true
– gives a conditional inference that is rarely interpretable
• MI was developed to improve on single imputation
– concerns about repeatability and assumptions
• MNLM is being used increasingly
– software is available, but the method is not well understood
• SM, PMM recommended for sensitivity analysis
– address some types of MNAR, but require (uncheckable) assumptions
Compare LOCF and MNLM
• Simulation study, based on experience from trials
– Six trials from a range of psychiatry areas
– Pattern of treatment means over time
– Covariance matrix between repeated observations
– Drop-out rates
• Set up a range of drop-out mechanisms
• Generate many datasets and analyse each both ways
• Look at bias of the treatment difference at the final time-point
• Look at power to detect the difference
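The simulation loop just described can be sketched as follows. This is not the study's code: the mean profiles, covariance, the single-threshold MAR drop-out rule and all function names are hypothetical. As a stand-in for MNLM it uses the maximum-likelihood final-visit mean of a multivariate normal model with unstructured means and covariance, computed by sequential regressions (the factored likelihood for monotone drop-out) separately in each arm; an MMRM fit would usually pool the covariance across arms.

```python
import numpy as np

def locf_final(Y):
    """Final-visit mean after carrying each patient's last observed value forward."""
    last = Y[:, 0].copy()
    for t in range(1, Y.shape[1]):
        seen = ~np.isnan(Y[:, t])
        last[seen] = Y[seen, t]
    return last.mean()

def cc_final(Y):
    """Final-visit mean among completers only."""
    return np.nanmean(Y[:, -1])

def ml_final(Y):
    """ML final-visit mean under a multivariate normal model (unstructured
    means and covariance), valid under MAR for monotone drop-out.
    Each visit is regressed on all earlier visits among patients still in
    the study, and the fitted regressions are chained from the baseline mean."""
    T = Y.shape[1]
    mu = np.empty(T)
    mu[0] = Y[:, 0].mean()                    # baseline assumed complete
    for t in range(1, T):
        seen = ~np.isnan(Y[:, t])
        X = np.column_stack([np.ones(seen.sum()), Y[seen, :t]])
        coef, *_ = np.linalg.lstsq(X, Y[seen, t], rcond=None)
        mu[t] = coef[0] + coef[1:] @ mu[:t]
    return mu[-1]

def mar_dropout(Y, cut, p_low, p_high, rng):
    """Hypothetical MAR rule: after each visit, leave with probability p_high
    if that visit's response exceeds `cut`, otherwise p_low (monotone)."""
    Y = Y.copy()
    gone = np.zeros(len(Y), dtype=bool)
    for t in range(1, Y.shape[1]):
        prev = np.nan_to_num(Y[:, t - 1])     # values of earlier drop-outs are unused
        p = np.where(prev > cut, p_high, p_low)
        gone |= rng.random(len(Y)) < p
        Y[gone, t] = np.nan
    return Y

def simulate(n_sims, n, mean0, mean1, cov, cut, p_low, p_high, seed=0):
    """Percentage bias of the final-visit treatment difference for each method."""
    rng = np.random.default_rng(seed)
    true_diff = mean1[-1] - mean0[-1]
    diffs = {"LOCF": [], "CC": [], "ML": []}
    for _ in range(n_sims):
        arms = [mar_dropout(rng.multivariate_normal(m, cov, size=n), cut, p_low, p_high, rng)
                for m in (mean0, mean1)]
        for name, estimate in (("LOCF", locf_final), ("CC", cc_final), ("ML", ml_final)):
            diffs[name].append(estimate(arms[1]) - estimate(arms[0]))
    return {k: 100 * (np.mean(v) - true_diff) / true_diff for k, v in diffs.items()}

weeks = np.arange(5)
cov = 49 * 0.8 ** np.abs(np.subtract.outer(weeks, weeks))  # SD 7, AR(1)-type correlation
print(simulate(200, 100, 25 - 1.0 * weeks, 25 - 2.0 * weeks, cov,
               cut=22, p_low=0.02, p_high=0.10))
```

With no missing values all three estimators reduce to the observed final-visit mean, so any differences between them come only from how each handles drop-out.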
Trial 2: pick two comparisons
Trials 3, 4, 6: pick one comparison each
Gives seven two-arm scenarios
Covariance matrix from Trial 4
Correlations between weeks (lower triangle) and SD at each week

Week  1    2    3    4    5    6    7    SD
1                                         4.6
2     .68                                 6.3
3     .57  .72                            7.2
4     .52  .64  .83                       7.3
5     .43  .53  .70  .82                  7.2
6     .39  .50  .64  .75  .85             7.4
7     .33  .43  .60  .71  .78  .89        7.6
8     .32  .44  .59  .67  .74  .84  .88   7.7

Used estimates from each trial in simulation
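To simulate data, correlations and SDs are combined into a covariance matrix via Σ_ij = ρ_ij · SD_i · SD_j. A sketch with the Trial 4 values as read from the slide (variable names are illustrative); because the published correlations are rounded to two decimals, it is worth checking that the result is still positive definite before sampling from it.

```python
import numpy as np

# Trial 4 estimates for weeks 1-8, as read from the slide.
sd = np.array([4.6, 6.3, 7.2, 7.3, 7.2, 7.4, 7.6, 7.7])
lower = [                       # entry i: correlations of week i+1 with weeks 1..i
    [],
    [.68],
    [.57, .72],
    [.52, .64, .83],
    [.43, .53, .70, .82],
    [.39, .50, .64, .75, .85],
    [.33, .43, .60, .71, .78, .89],
    [.32, .44, .59, .67, .74, .84, .88],
]

R = np.eye(len(sd))
for i, row in enumerate(lower):
    R[i, :len(row)] = row       # fill row i below the diagonal
    R[:len(row), i] = row       # mirror into column i above the diagonal
Sigma = R * np.outer(sd, sd)    # Sigma[i, j] = rho[i, j] * sd[i] * sd[j]

print("average SD:", round(float(sd.mean()), 2))
print("smallest eigenvalue of R:", round(float(np.linalg.eigvalsh(R).min()), 3))
```

The mean of the eight SDs is about 6.9, which agrees with the "average obs. SD" quoted for Trial 4 in the results.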
% drop-out rates from Trials 2 & 6

Trial 2
Week   Treat 1  Treat 2  Treat 3
1      –        –        –
2      17       10       6
3      11       13       15
4      15       14       8
5      5        10       8
6      11       1        3
Total  58       49       40

Trial 6
Week   Treat 1  Treat 2  Treat 3
1      –        –        –
2      3        7        6
3      9        7        3
4      5        5        2
6      6        7        3
8      7        9        9
Total  30       36       22

• Used average rate over times and treatments from each trial
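One reading of "average rate over times and treatments" is the plain mean of the per-visit percentages. A sketch using the Trial 2 and Trial 6 percentages as read from the slide (week 1 had no entries, so it is left out); the column sums reproduce the reported totals to within rounding. Since the per-visit figures add up to the totals, they appear to be percentages of patients randomized, not conditional probabilities of leaving given that a patient is still in the study.

```python
import numpy as np

# Per-visit % drop-out by treatment (columns: Treat 1, Treat 2, Treat 3).
trial2 = np.array([[17, 10, 6], [11, 13, 15], [15, 14, 8], [5, 10, 8], [11, 1, 3]])  # weeks 2-6
trial6 = np.array([[3, 7, 6], [9, 7, 3], [5, 5, 2], [6, 7, 3], [7, 9, 9]])            # weeks 2, 3, 4, 6, 8

for name, rates in (("Trial 2", trial2), ("Trial 6", trial6)):
    # Column sums approximate the reported totals; the mean gives one rate per trial.
    print(f"{name}: totals {rates.sum(axis=0).tolist()}, "
          f"average per visit and arm {rates.mean():.1f}%")
# Trial 2: totals [59, 48, 40], average per visit and arm 9.8%
# Trial 6: totals [30, 35, 23], average per visit and arm 5.9%
```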
Drop-out mechanisms
• MCAR – generate drop-out at random
• MAR – classify responses at Time k by size, and simulate drop-out at Time k+1 with varying probabilities for each class
• MNAR – as for MAR, but simulate drop-out at Time k, so the actual response that influences drop-out is "not observed"
Divide all responses at any visit into 9 quantile classes, and investigate 3 probability patterns (next slide) for drop-out
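The three mechanisms can be sketched as one function. This is an illustration under assumptions, not the study's code: the function name, the classing of responses by quantiles pooled over all visits, and the linear probability pattern (rising from 0% to 8% across the 9 classes, so averaging 4%) are hypothetical, since the actual patterns were shown only in a figure.

```python
import numpy as np

def simulate_dropout(Y, probs, mechanism, rng):
    """Impose monotone drop-out on complete data Y (patients x visits).

    probs: drop-out probability for each of the 9 response classes, which are
    the 9 quantile groups of all responses pooled over visits.
    MCAR uses the mean probability; MAR uses the class at the previous visit;
    MNAR uses the class at the visit that is lost."""
    probs = np.asarray(probs, dtype=float)
    cuts = np.quantile(Y, np.arange(1, 9) / 9)   # 8 cut points -> 9 classes
    cls = np.digitize(Y, cuts)                    # class index 0..8 per response
    out = Y.astype(float)                         # astype returns a copy
    gone = np.zeros(Y.shape[0], dtype=bool)
    for t in range(1, Y.shape[1]):                # baseline always observed
        if mechanism == "MCAR":
            p = np.full(Y.shape[0], probs.mean())
        elif mechanism == "MAR":
            p = probs[cls[:, t - 1]]
        elif mechanism == "MNAR":
            p = probs[cls[:, t]]
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        gone |= rng.random(Y.shape[0]) < p
        out[gone, t] = np.nan
    return out

rng = np.random.default_rng(42)
Y = rng.multivariate_normal(np.zeros(6), 0.5 + 0.5 * np.eye(6), size=1000)
linear = np.linspace(0.0, 0.08, 9)    # hypothetical pattern: mean 4% per visit
for mech in ("MCAR", "MAR", "MNAR"):
    Z = simulate_dropout(Y, linear, mech, rng)
    print(mech, "completers:", int((~np.isnan(Z[:, -1])).sum()))
```

The only difference between MAR and MNAR here is which visit's class drives the decision: the last observed value (MAR) or the value that is then lost (MNAR).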
Drop-out probabilities
[Figure: drop-out probability for each response class under the three patterns]
Drop-out probability increases as response increases.
These patterns give an average 4% drop-out rate per visit.
Trial 1, simulation results
• Large treatment difference: 19
– average obs. SD: 19
– patients per arm: 93
• Example of simulation results
– MCAR drop-out
– 1000 simulations

Method   % power   % bias
MNLM     99.90       0.32
CC       99.90       0.29
LOCF     99.90     −12.17
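The % bias and % power columns can be computed from per-simulation estimates along these lines. This is a sketch under assumptions: percentage bias is taken relative to the true treatment difference, and power as the share of simulations significant at the two-sided 5% level using a normal approximation; the slides do not give the exact definitions, and the function name is hypothetical.

```python
import numpy as np

def summarise(estimates, std_errors, true_diff, z_crit=1.96):
    """Return (% bias, % power) across simulated trials.

    % bias is relative to the true treatment difference, so a negative value
    means the effect is underestimated; power is the percentage of simulations
    with |estimate / SE| above z_crit (two-sided 5% level, normal approximation)."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    pct_bias = 100 * (est.mean() - true_diff) / true_diff
    pct_power = 100 * np.mean(np.abs(est / se) > z_crit)
    return float(pct_bias), float(pct_power)

# Toy numbers, not study results: two simulated trials with true difference 20.
print(summarise([15.0, 17.0], [4.0, 10.0], 20.0))   # (-20.0, 50.0)
```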
Trial 1, summary
• Bias uniformly greater for LOCF
– average 18% for LOCF vs 4% for MNLM
– all biases negative except one for LOCF (MAR extreme)
– e.g. MNAR linear: 13% bias for LOCF, i.e. treatment difference 15 rather than 19; 2% bias for MNLM
– e.g. MNAR extreme: 24% for LOCF, 18% for MNLM
• Power nearly 100% in all scenarios
Trial 2, first comparison
• Medium treatment difference: 13
– average obs. SD: 19; patients per arm: 75
• Bias greater for LOCF than for MNLM in all scenarios except one (MNAR extreme: 27% for LOCF, 28% for MNLM)
– average 23% for LOCF, 7% for MNLM
– all biases negative except one for LOCF (+39% for MAR extreme)
• Power uniformly higher for LOCF: average 92% vs 67% for MNLM
Trial 3
• Medium treatment difference: 3
– average obs. SD: 8.7; patients per arm: 116
• Results similar to Trial 2 (first comparison), except
– smaller power difference: 76% for LOCF, 60% for MNLM
Trial 4
• Small treatment difference: 2
– average obs. SD: 6.9; patients per arm: 142
• Bias uniformly greater for LOCF (but small in absolute terms, as the treatment difference is small)
– average 44% for LOCF vs 4% for MNLM
– all biases negative except three for MNLM (+2, 0, 0 for MCAR, MAR light and MAR medium)
• Power uniformly lower for LOCF
– average 21% vs 36% for MNLM
Trial 5
• Small treatment difference: 2
– average obs. SD: 8.9; patients per arm: 121
• Results similar to Trial 4, except
– smaller bias difference: 12% for LOCF, 4% for MNLM
– little power difference: 26% for LOCF, 22% for MNLM
Trial 6
• Almost no treatment difference: 1
– average obs. SD: 10.3; patients per arm: 115
• Bias uniformly greater for LOCF
– average 28% for LOCF vs 9% for MNLM
– all biases negative except five for MNLM (+12, +9, +5, +2, +4 for MCAR, MAR and MNAR light)
• Power virtually the same
– average 7% for LOCF vs 9% for MNLM
Trial 2, second comparison
• Almost no treatment difference: 1
– average obs. SD: 19; patients per arm: 75
• Results similar to Trial 6, except
– little bias difference: 23% for both
Conclusions
1. MNLM is nearly always superior in terms of reduced bias
– LOCF is biased even under MCAR with these patterns of means over time
– MNLM has virtually no bias under MCAR and MAR
– MNLM has less bias than LOCF under moderate MNAR
– extreme MNAR gives problems for both
2. Bias is usually negative
– it underestimates the effect of a drug
– is this contributing to the attrition rate of late-phase drugs?
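The point that LOCF is biased even under MCAR can be checked without simulation. Under MCAR the expected LOCF value at the final visit is a mixture of the visit means, weighted by the probability that each visit is a patient's last observed one. A sketch assuming a constant 4% chance of leaving after each visit and hypothetical linear mean profiles (lower scores better); the function name is illustrative.

```python
import numpy as np

def expected_locf_mean(mu, h):
    """Expected final-visit LOCF mean under MCAR drop-out, where each patient
    leaves after any visit with constant probability h (baseline always seen)."""
    stay = (1 - h) ** np.arange(len(mu))   # P(still in study at visit k)
    last = stay * h                        # P(visit k is the last one observed)
    last[-1] = stay[-1]                    # completers are last seen at the final visit
    return float(last @ mu)

weeks = np.arange(9)                       # baseline plus 8 weekly visits
placebo = 25 - 0.5 * weeks                 # hypothetical mean profiles
drug = 25 - 1.0 * weeks
true_diff = drug[-1] - placebo[-1]
locf_diff = expected_locf_mean(drug, 0.04) - expected_locf_mean(placebo, 0.04)
print(f"true {true_diff:.2f}, LOCF {locf_diff:.2f}, "
      f"% bias {100 * (locf_diff - true_diff) / true_diff:.1f}")
# true -4.00, LOCF -3.34, % bias -16.4
```

Because the drug's advantage grows over time, carrying earlier values forward shrinks the estimated difference, giving a negative bias even though drop-out is completely at random.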
Conclusions (continued)
3. LOCF sometimes has more power than MNLM, sometimes less
– the reduced treatment effect can be more than counteracted by the artificially increased sample size
– it is against statistical and ethical principles to augment data with invented values
4. MNLM gives very similar results to CC
– MNLM adjusts CC for non-MCAR effects
– LOCF adjusts CC in unacceptable ways
– other methods must be used to investigate non-MAR effects: neither LOCF nor MNLM can address these problems
Actions within GSK
• Continue to propose MNLM for primary analysis of longitudinal trials
• Prepare clear guides about MNLM for statisticians, reviewers and clinicians
• Continue to investigate methods of sensitivity analysis to handle MNAR drop-out