Revitalizing Clinical Trial Methodology and Translational

Moving beyond the comfort zone in
practicing translational statistics
L.J. Wei
Harvard University
Why are we staying in a
“Comfort Zone”?
 Generally following a fixed pattern for conducting studies
 Are we like lawyers?
 Avoiding delay of review processes?
What is the goal of a clinical study?
 Use efficient and reliable procedures to obtain robust,
clinically interpretable results with respect to risk-benefit
perspectives at the patient’s level.
What are the problems?
 The conventional way to conduct trials gives us fragmentary
information
 Lack of clinically meaningful totality of evidence
 Difficult to use the trial results for future patients’ management
A Few Methodology Issues
1. Estimation vs. testing
 P-value provides little clinical information about treatment
effectiveness
 The size of the effects (efficacy and toxicity) matters
 Design using interval estimates is quite flexible
 Almost everything we want to know via testing, we can get
from estimation
TREAT study for EPO CV safety
 If we follow patients for up to 48 months, the control
arm's average stroke-free time is 46.9 months and the
darbepoetin (Darb) arm's is 46.0 months. The difference
is 0.9 months (95% CI: 0.4, 1.4 months) with p<0.001 (very
significant).
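The TREAT numbers illustrate the point that testing results can be recovered from an interval estimate. The sketch below back-calculates the standard error, z statistic, and two-sided p-value from the reported difference and its 95% interval. It assumes a symmetric normal-theory interval, which the slide does not state, and the function name `p_value_from_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def p_value_from_ci(estimate, lower, upper, level=0.95):
    """Approximate SE, z, and two-sided p-value implied by a confidence interval.

    Assumption: the interval is estimate +/- z_crit * SE (normal theory).
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = (upper - lower) / (2 * z_crit)       # half-width divided by z_crit
    z = estimate / se                         # test of "difference = 0"
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return se, z, p

# Difference in average stroke-free time reported above: 0.9 months (0.4, 1.4)
se, z, p = p_value_from_ci(0.9, 0.4, 1.4)
print(f"SE = {se:.3f} months, z = {z:.2f}, p = {p:.5f}")
```

The implied p-value is below 0.001, consistent with the slide, while the interval itself shows that the difference is under a month and a half over 48 months of follow-up.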
2. How do we define a primary endpoint with
multiple outcomes?
 What is current practice?
 Component/composite analyses
 Efficacy and toxicity (how to connect them together?)
 Disease burden measure?
 Competing risks problem?
 Informative dropout?
Example : Beta-Blocker Evaluation of Survival
(BEST) Trial (NEJM, 2001)
 Study
 Bucindolol vs. placebo
 patients with advanced chronic heart failure
-- n = 2707
 Average follow-up: 2 years
 Primary endpoint: overall survival
 Hazard ratio for death = 0.90 (p-value = 0.1)
BEST Trial
Possible solutions?
 Using the patient’s disease burden or progression information
during the entire follow-up to define the “responder”
 Creating more than one response category: ordinal
categorical response
 Brian Claggett’s thesis paper
BEST Example: 8 Categories
 1: No events
 2: Alive, non-HF hospitalization only
 3: Alive, 1 HF hosp.
 4: Alive, >1 HF hosp.
 5: Late non-CV death (>12 months)
 6: Late CV death (>12 months)
 7: Early non-CV death (<12 months)
 8: Early CV death (<12 months)
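The eight categories above can be written as a simple classification rule. This is a minimal sketch, not the analysis code used for BEST: the function name and argument names are hypothetical, and the handling of a death at exactly 12 months (counted as early here) is an assumption, since the slide only gives "<12" and ">12".

```python
from typing import Optional

def best_category(death_month: Optional[float], cv_death: bool,
                  hf_hosp: int, other_hosp: int) -> int:
    """Map one patient's follow-up history to the ordinal category 1 (best) to 8 (worst).

    death_month: month of death, or None if alive at end of follow-up
    cv_death:    True if the death was cardiovascular (ignored when alive)
    hf_hosp:     number of heart-failure hospitalizations
    other_hosp:  number of non-HF hospitalizations
    """
    if death_month is not None:
        early = death_month <= 12   # assumption: exactly 12 months counts as early
        if early:
            return 8 if cv_death else 7
        return 6 if cv_death else 5
    if hf_hosp > 1:
        return 4
    if hf_hosp == 1:
        return 3
    if other_hosp > 0:
        return 2
    return 1

# Hypothetical patients, for illustration only
print(best_category(None, False, 0, 0))   # alive, no events
print(best_category(None, False, 2, 1))   # alive, more than one HF hospitalization
print(best_category(8.0, True, 1, 0))     # CV death within the first year
```

Once every patient has a category, the two arms can be compared on the whole ordinal distribution rather than on a single component.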
3. How to handle dropouts or competing risks?
 LOCF? BOCF?
 MMRM (model based)
 Pattern mixture model (cannot handle non-random missing)
 Using responder analysis with different ways to define
informative dropouts for sensitivity analysis
4. Analysis of Covariance
 Compare two treatments with baseline adjustments via
regression models
 For nonlinear models, different adjustments may lead to
incoherent results
 The inadequacy of the Cox ANCOVA
Possible solutions?
 Using the augmentation method by Tsiatis et al.; Tian et al.
 No need to pre-specify the exact baseline covariates, only a set
of potential covariates for the adjustment process
5. Data monitoring
 Heavily utilizing p-value or conditional power
 A low conditional power may indicate that the sample size is
too small or there is no real treatment difference
 Using estimation and prediction for monitoring?
6. Stratified medicine (personalized medicine)?
 A negative trial does not mean the treatment is no good for
anyone
 A positive trial does not mean it works for everyone
 The usual subgroup analysis is not adequate to address this
issue
 Need a built-in pre-specified procedure for identifying
patients who benefit from treatment
7. Identify patients who respond to the new therapy
(predictive enrichment)
8. How to monitor safety?
 What is the conventional way?
 Component-wise tabulation or analysis?
 No information about multiple AE events at the patient level
 Graphical method?
9. Quantifying treatment contrast (difference)?
 Should be a model-free parameter
 Using difference of means, medians, etc.
 For censored data, using a constant hazard ratio (heavily
model-based)?
 A model-based measure is difficult to interpret or validate
Issues for the hazard ratio estimate
 Hazard ratio estimate is routinely used for designing,
monitoring and analyzing clinical studies in survival analysis
Model Free Parameter for Treatment
Contrast
* Considering a two treatment comparison study in “survival
analysis”
* How do we quantify the treatment difference?
• Median failure time (may not be estimable);
• t-year survival rate (not an overall measure)?
• A constant hazard ratio over time with the log-rank test
Eastern Cooperative Oncology Group
 E4A03 trial to compare low- and high-dose dexamethasone
for naïve patients with multiple myeloma
 The primary endpoint is the survival time
 n=445
 The trial stopped early at the second interim analysis; the low
dose was superior.
 Patients on the high-dose arm then received the low dose, and
follow-up for overall survival continued.
A Cancer Study Example
[Figure: survival curves for Group 1 and Group 2. x-axis: Month (0 to 40); y-axis: Probability (0.0 to 1.0)]
 The proportional hazards assumption is not valid
 The PH estimator is estimating a quantity which cannot be
interpreted and, worse, depends on the study-specific
censoring distributions
 Any model-based treatment contrast has such issues (need a
model-free parameter)
 The logrank test is not powerful
 Conventional analysis:
 Log-rank test: p=0.47
 Hazard Ratio: HR=0.87 (0.60, 1.27)
What is the alternative way for survival
analysis?
 Using the area under the Kaplan-Meier curve up
to a fixed time point
 Restricted mean survival time
 Model-free and a global measure of efficacy
 Can be estimated even under heavy censoring
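The restricted mean survival time can be sketched as the area under the Kaplan-Meier step function from 0 to a truncation time. The function name `km_rmst` and the toy data below are hypothetical. One simplifying assumption: if the last observation before the truncation time is censored, the last Kaplan-Meier value is carried forward to the truncation time.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau].

    times:  observed follow-up times
    events: 1 if the event was observed at that time, 0 if censored
    tau:    truncation time
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0      # current Kaplan-Meier survival probability
    area = 0.0
    prev_t = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group tied observations at the same time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        area += surv * (t - prev_t)            # rectangle up to this time
        if deaths:
            surv *= 1 - deaths / n_at_risk     # Kaplan-Meier step down
        n_at_risk -= leaving
        prev_t = t
    area += surv * (tau - prev_t)              # last rectangle up to tau
    return area

# Toy data (hypothetical): months of follow-up and event indicators
arm_a = km_rmst([5, 12, 20, 31, 40, 40], [1, 0, 1, 0, 0, 0], tau=40)
arm_b = km_rmst([3, 9, 15, 22, 35, 40], [1, 1, 0, 1, 1, 0], tau=40)
print(f"RMST difference (A - B): {arm_a - arm_b:.1f} months")
```

With no censoring this reduces to the sample mean of min(T, tau), which is why the measure has a direct clinical reading as average event-free time over the window.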
Cancer Study Example
Restricted Mean (up to 40 months):
 35.4 months vs. 33.3 months
 Δ = 2.1 (0.1, 4.2) months; p=0.04
 Ratio of Survival time = 35.4/33.3 = 1.06 (1.00, 1.13)
 Ratio of time lost = 6.7/4.6 = 1.46 (1.02, 2.13)
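The point estimates in this summary follow from the two restricted means and the 40-month window: time lost is the window length minus the restricted mean. The short check below reproduces the arithmetic; the confidence intervals require the patient-level data and are not recomputed. Variable names are hypothetical.

```python
tau = 40.0                       # truncation time in months
rmst_first, rmst_second = 35.4, 33.3

difference = rmst_first - rmst_second
ratio_survival = rmst_first / rmst_second

# Restricted mean time lost = tau minus restricted mean survival time
lost_first = tau - rmst_first
lost_second = tau - rmst_second
ratio_lost = lost_second / lost_first

print(f"difference        = {difference:.1f} months")
print(f"ratio of survival = {ratio_survival:.2f}")
print(f"time lost         = {lost_second:.1f} vs. {lost_first:.1f} months")
print(f"ratio of lost     = {ratio_lost:.2f}")
```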
10. Post-marketing/safety studies?
 It is not appropriate to use an event-driven procedure to
conduct a safety study.
 The event rate is low, so the exposure time matters
 Requires a lot of resources (large or long-term study)
 Meta analysis; observational studies
CV safety study for anti-diabetes drugs
 Event-driven studies, that is, we need to have a pre-specified
number of events so that the resulting confidence interval for
the treatment difference is “narrow”
 For example, the upper bound of the 95% confidence interval is
less than 1.3
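To see why such studies are resource-intensive, the sketch below computes roughly how many events are needed to rule out a margin of 1.3. Everything beyond the 1.3 margin and the 95% interval is an assumption for illustration: the effect measure is taken to be a hazard ratio, allocation is 1:1, the true hazard ratio is 1, power is 90%, and the variance of the log hazard ratio is approximated by 4 divided by the number of events. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def events_to_rule_out(margin, power=0.90, alpha=0.05, true_hr=1.0):
    """Approximate number of events so that the upper CI bound falls below `margin`.

    Assumptions: 1:1 allocation and Var(log HR) ~= 4 / (number of events).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    effect = math.log(margin) - math.log(true_hr)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / effect ** 2)

print(events_to_rule_out(1.3))              # 90% power
print(events_to_rule_out(1.3, power=0.80))  # 80% power
```

With a low event rate, accumulating several hundred events means enrolling thousands of patients or following them for years.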
The EXAMINE trial (alogliptin)
NEJM, October 3, 2013
RMST (24 months):
Placebo 21.9 (21.7, 22.2)
Alogliptin 22.0 (21.8, 22.3)
Difference -0.08 (-0.39, 0.24)
Ratio 1.00 (0.98, 1.01)
RMST (30 months):
Placebo 27.1 (26.7, 27.4)
Alogliptin 27.2 (26.9, 27.5)
Difference -0.12 (-0.56, 0.33)
Ratio 1.00 (0.98, 1.01)
What if a smaller study?
95% confidence intervals for various measures

Measure                                   All data       25%            20%            15%
                                          N=16492        N=4123         N=3298         N=2427
Hazard Ratio                              (0.89, 1.12)   (0.80, 1.26)   (0.78, 1.28)   (0.76, 1.36)
Difference in event rate at Day 900 [%]   (-1.2, 0.9)    (-2.3, 2.0)    (-2.6, 2.2)    (-2.9, 2.6)
Difference in RMST at Day 900 [days]      (-5, 4)        (-9, 9)        (-11, 10)      (-12, 12)
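The hazard ratio intervals reported above for the full data set (N=16492) and the 25% subsample (N=4123) can be checked against the usual rule of thumb that interval width shrinks with the square root of the sample size. This is a rough sketch: it assumes the width on the log scale is proportional to 1/sqrt(N), which holds only approximately.

```python
import math

def log_width(ci):
    """Width of a ratio-scale confidence interval on the log scale."""
    lower, upper = ci
    return math.log(upper / lower)

ci_all, n_all = (0.89, 1.12), 16492
ci_25, n_25 = (0.80, 1.26), 4123

observed = log_width(ci_25) / log_width(ci_all)   # how much wider the smaller study is
expected = math.sqrt(n_all / n_25)                # square-root rule

print(f"observed width ratio = {observed:.2f}")
print(f"expected width ratio = {expected:.2f}")
```

A study one quarter the size gives an interval about twice as wide, so the full-size study buys a halving of the interval width.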
11. Meta analysis for safety issues
 Nissen and Wolski (2007) performed a meta analysis to examine
whether Rosiglitazone (Avandia, GSK), a drug for treating type 2
diabetes mellitus, significantly increases the risk of MI or CVD
related death.
Example
Effect of Rosiglitazone on MI or CVD Deaths
 Avandia was introduced in 1999 and is widely used as
monotherapy or in fixed-dose combinations with either
metformin (Avandamet) or glimepiride (Avandaryl).
 The original approval of Avandia was based on its ability to
reduce blood glucose and glycated hemoglobin levels.
 Initial studies were not adequately powered to determine the
effects of this agent on micro- or macro- vascular complications of
diabetes, including cardiovascular morbidity and mortality.
 However, the effect of any anti-diabetic therapy on
cardiovascular outcomes is particularly important because more
than 65% of deaths in patients with diabetes are from
cardiovascular causes.
 Of 116 screened studies, 48 satisfied the inclusion criteria for the
analysis proposed in Nissen and Wolski (2007).
 42 studies were reported in Nissen and Wolski (2007); the remaining 6
studies had zero MI or CVD deaths
 10 studies with zero MI events
 25 studies with zero CVD related deaths
 Event Rates from 0% to 2.70% for MI
 Event Rates from 0% to 1.75% for CVD Death
[Figure: two panels, MI and CVD Death; x-axis: Log Odds Ratio]
MI: 95% CI: (1.03, 1.98); p-value = 0.03 (in favor of the control)
CVD Death: 95% CI: (0.98, 2.74); p-value = 0.06
Questions
 Rare events?
 How to utilize studies with 0/0 events?
 Validity of asymptotic inference?
 Exact inference?
 Choice of effect measure?
 Between Study Heterogeneity?
 Common treatment effect or study specific treatment effect?
 The number of studies not large?
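The question of how to use studies with 0/0 events depends on the effect measure. The sketch below uses made-up counts (not the rosiglitazone data) to show that a double-zero study contributes nothing to a Peto-type pooled log odds ratio, while it does carry weight in a Mantel-Haenszel pooled risk difference. These are standard asymptotic estimators used for illustration, not the exact procedure discussed in the talk; the function names are hypothetical.

```python
def peto_log_or(studies):
    """Peto one-step pooled log odds ratio.

    studies: list of (events_trt, n_trt, events_ctl, n_ctl)
    """
    o_minus_e = 0.0
    variance = 0.0
    for a, n1, c, n0 in studies:
        n = n1 + n0
        m = a + c                      # total events in the study
        o_minus_e += a - n1 * m / n    # observed minus expected in treated arm
        variance += n1 * n0 * m * (n - m) / (n ** 2 * (n - 1))
    return o_minus_e / variance

def mh_risk_difference(studies):
    """Mantel-Haenszel pooled risk difference (treated minus control)."""
    num = sum((a * n0 - c * n1) / (n1 + n0) for a, n1, c, n0 in studies)
    den = sum(n1 * n0 / (n1 + n0) for a, n1, c, n0 in studies)
    return num / den

# Hypothetical studies for illustration only
with_events = [(2, 200, 1, 200), (1, 150, 0, 150)]
double_zero = (0, 300, 0, 300)          # no events in either arm

print(peto_log_or(with_events), peto_log_or(with_events + [double_zero]))
print(mh_risk_difference(with_events),
      mh_risk_difference(with_events + [double_zero]))
```

The odds-ratio estimate is unchanged by the double-zero study, whereas the pooled risk difference moves toward zero, so discarding such studies is not a neutral choice on the risk-difference scale.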
Asymptotic Inference vs. Exact Inference

Outcome     Method       Estimate   95% CI            P-value
MI          Asymptotic   0.19%      (0.02, 0.42)%     0.03
MI          Exact        0.18%      (-0.08, 0.38)%    0.27
CVD Death   Asymptotic   0.11%      (0.00, 0.31)%     0.05
CVD Death   Exact        0.063%     (-0.13, 0.23)%    0.83
Summary
 Could we modify our statistical training?
 Teaching young generations “how, where and what to learn”
 Learning from doing a project with mentoring?
 Could we have a coherent approach from the beginning to the end
for a research project?
 George Box: Instead of figuring out the optimal solution to a
wrong problem, try to get A solution to a right problem.
 Asking ourselves “What is the question?”