Adaptive Clinical Trial Designs - University of Rochester Medical

Adaptive Clinical Trials
Scott Evans, Ph.D.
Harvard University
Muscle Study Group
September 28, 2012
Special Thank You
• Dr. Griggs
• Dr. McDermott
DEFINITION
Adaptive Designs
• Not universally defined
– Broad definition: any design in which key parameters can
be changed during the trial based on data from the current
study or from external sources
– Narrow definition: specific planned design changes made after
interim analyses of treatment responses
Adaptive Designs
• A design feature
– A planned procedure for statistical error and bias control
– Described in the protocol
• Not a substitute for careful planning
– Not a rescue medication
• Fancy adaptations and statistical methods cannot rescue
poorly designed trials
MOTIVATION
Practical Questions During Trial Conduct
• Stop for efficacy or futility?
• Are there subgroups w/ unacceptable toxicity?
• Has medical knowledge changed the scientific validity, medical
importance, ethical acceptability, or equipoise of the trial?
• Should we adjust our design due to inaccurate design assumptions?
– Re-calculate sample size?
– Modify duration of follow-up?
Motivation
“I’ve designed >1000 clinical trials, each time
having to make assumptions about variation,
control-group response rate, etc. in order to
calculate sample size …
I have not been right yet.”
Example as DSMB Member
• Trial designed to detect difference between response rates of 90%
(control) and 97.5%
– 7.5% absolute difference
– 486 patients required to have 90% power
• Observed rate of control at interim is 80%
– With N=486, 56% power to detect 7.5% difference
– N=1066 required for 90% power to detect a difference between
80% vs. 87.5%
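The arithmetic in this example can be sketched in a few lines. The slide does not say which sample size formula was used; the version below assumes a two-sided 5% test, equal allocation, and the normal approximation with a continuity correction. The function name and its defaults are illustrative choices, not part of the original trial.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_treatment, alpha=0.05, power=0.90):
    """Per-group sample size for comparing two proportions.

    Assumes a two-sided test, equal allocation, and the normal
    approximation with a continuity correction (an assumption; the
    slide does not name its method).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = abs(p_treatment - p_control)
    p_bar = (p_control + p_treatment) / 2
    # Uncorrected size: pooled variance under H0, unpooled under HA.
    n = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control)
                        + p_treatment * (1 - p_treatment))
    ) ** 2 / delta ** 2
    # Continuity correction inflates the uncorrected size.
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n_cc)


designed = 2 * n_per_group(0.90, 0.975)  # design assumption: 90% control rate
revised = 2 * n_per_group(0.80, 0.875)   # interim observation: 80% control rate
print(designed, revised)
```

Under these assumptions the totals agree with the 486 and 1066 quoted above. The same 7.5% absolute difference needs more than twice as many patients because response rates nearer 50% have larger variance.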
Motivation
• Answering these questions has:
– Ethical attractiveness
• Safer trials: fewer participants exposed to
inefficacious/harmful therapies
– Economical advantages
• Smaller expected sample sizes
• Shorter trials
– Public health advantages
• Answers may get to the medical community more quickly
What Can be Adapted?
• Sample size
• Drop/add arms
• Stop for efficacy or futility
• Population enrichment (adapt eligibility criteria)
• Randomization probabilities
• Doses
• Objectives / hypotheses
– E.g., switching between NI and superiority
• Endpoints
• NI margin
When and Where? Trials with:
• High levels of uncertainty/unknowns (e.g. novel interventions)
• Design characteristics (e.g., power) that are sensitive to assumptions
• Long FU: adaptation is feasible and medical practice can change
• Invasive procedures or expensive evaluations
• Serious diseases; high risk treatments
• Vulnerable populations
• Data that serves as the basis for adaptation is available quickly
TRIAL INTEGRITY
Complexity and Acceptability
• Some adaptations are well understood/accepted
• Depends upon
– Type of adaptation
– The data utilized for decision-making
– How adaptation is implemented
– Who is reviewing data and making the decision to adapt
Threat to Trial Integrity
• LOW
– Adaptations prior to any data analyses
– Adaptations based on
• Baseline data
• External data
• Blinded (aggregate) data
• Nuisance parameters (e.g. variation)
• HIGH
– Unplanned adaptations
– Adaptations based on observed treatment effects
Example: Adaptation based on external data
• ATN 082: Evaluation of Pre-exposure prophylaxis (PREP)
• Randomization to PREP vs. placebo to prevent HIV transmission
– 8/2008: 1st participant enrolled
– 11/22/2010: email notifying results from iPREX trial (Gates Foundation)
• PREP reduced HIV acquisition in similar trial (NEJM; 11/23/2010)
– 11/23/2010: DSMB call
• Equipoise? Still ethical to randomize and follow?
• Recommendation
– Notify participants and IRBs of iPREX results
– Unblind participants
– Discontinue control arm; offer rollover onto PREP
– Continue enrollment into PREP
Major Scientific Concerns with
Adaptive Designs
• Statistical
– Error control associated with multiplicity
• Operational bias
– Adaptations are visible and could be used to infer trial results,
affecting patient/investigator action during the trial
• E.g., participation, adherence, objectivity of patient ratings, etc.
– Not a statistical source of bias and thus difficult to adjust for
– May cause heterogeneity of results (before vs. after adaptation)
Addressing Concerns
• Statistical
– Methods exist (e.g., group sequential and modern adaptive design
methods for controlling errors)
• Operational bias
– Careful and responsible application of adaptation
– Well-constructed processes
• Control of dissemination of adaptation
• Interim analyses and DSMB procedures
• The “closed protocol” (protocol team blinded)
– Details regarding the planned adaptation are put into a
separate (limited distribution) document to reduce back-calculation for inferring effects
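As one illustration of the group sequential methods mentioned under "Statistical", the sketch below evaluates the Lan-DeMets O'Brien-Fleming-type alpha spending function, which sets how much of the overall type I error may be used by each interim look. The choice of this particular spending function and the function name are assumptions for illustration; the slides do not say which method any given trial used.

```python
from math import sqrt
from statistics import NormalDist


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided type I error spent by information fraction t
    under the Lan-DeMets O'Brien-Fleming-type spending function:
    alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    if not 0 < info_fraction <= 1:
        raise ValueError("information fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(info_fraction)))


for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t={t:.2f}  cumulative alpha spent={obf_alpha_spent(t):.5f}")
```

Very little alpha is spent at early looks, so early stopping for efficacy requires extreme results and the final analysis keeps nearly the full significance level.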
Example: Industry Trial
• Randomized controlled trial for treating lymphoma
• Conditional power calculated during interim analyses
• Pre-specified sample size adaptation rule (e.g., if low, stop for futility;
if very high, continue as scheduled; if in the middle, there are various
sample size (or # events) adaptations)
• DMC can recommend trial continuation but does not specify sample
size (thus nobody can back-calculate treatment effect at interim)
• DMC is kept apprised of enrollment and # of events (event-driven
trial), and DMC says “STOP” when appropriate
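A rule of this shape can be sketched as follows. This is a minimal illustration, not the trial's actual rule: conditional power is computed under the current trend for a one-sided test, and the 20% and 80% zone thresholds, like the function names, are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, alpha=0.025):
    """Conditional power under the current trend for a one-sided test,
    given the interim z-statistic and information fraction t in (0, 1)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    t = info_fraction
    b = z_interim * sqrt(t)   # B-value at information time t
    drift = b / t             # current-trend estimate of the drift
    return NormalDist().cdf((b + drift * (1 - t) - z_crit) / sqrt(1 - t))


def zone(cp, futility=0.20, favorable=0.80):
    """Map conditional power to an action (thresholds are hypothetical)."""
    if cp < futility:
        return "stop for futility"
    if cp >= favorable:
        return "continue as scheduled"
    return "adapt sample size (or number of events)"


cp = conditional_power(z_interim=1.5, info_fraction=0.5)
print(round(cp, 3), "->", zone(cp))
```

Keeping the thresholds and the resulting target in a limited-distribution document is what prevents observers from back-calculating the interim effect from the action taken.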
Conceptual Issue
• Should we adapt sample size based on
observed treatment effect?
– Trials are designed to detect relevant effects
– Observed effects may not be relevant
– Are we losing sight of clinical relevance?
5/26/2011: Email from FDA Team Leader
• I find many sponsors re-estimate their sample based on the interim
difference. I feel this is incorrect. Sample size should only be re-estimated
based on misspecification of variability or control rate. The
difference we are trying to detect should be based on clinical input. If
we re-estimate based on observed difference, we may end up with a
trial that shows a statistically significant difference but not a clinically
meaningful difference. Furthermore, I think using this information
would affect the Type I error rate even if we adjust for the interim
analysis. There seems to be disagreement in FDA as to whether you
can re-estimate based on observed difference. I may be in the
minority. I would appreciate your thoughts.
CHANGING ENDPOINTS
Changing Endpoints
• NEJM 2009: Evaluation of 12 reports of trials of Gabapentin
– 8 had a primary endpoint in manuscript different from protocol
– 5 trials failed to report protocol-defined primary outcomes
• ENHANCE Trial: Changes driven by science or business?
– Vytorin vs. simvastatin for preventing atherosclerosis (negative trial)
– Completed in 2006; Registered in clinicaltrials.gov in OCT 2007
– Endpoints entered differed from original design
• Chan et al. (JAMA, 2004) compared published articles with protocols
for 102 randomized trials; 62% of the trials had at least one primary
endpoint that had been changed, introduced, or omitted.
• Changes to endpoints can compromise the scientific integrity of a trial
• Not generally recommended (concern for “cherry-picking” / error inflation)
• New information could merit endpoint changes
– Evolving medical knowledge (long-term trials); results from other trials
or identification of better biomarkers
• Incorporation of up-to-date knowledge into design is theoretically okay if the
decision is “independent” of trial data (e.g., external data)
– Demonstration of independence is difficult
– DSMBs may not be appropriate decision-makers if they have seen the data
ANALYSIS, REPORTING, AND
PRACTICAL ISSUES
Handling Adaptations
• Update documentation
– The protocol (amendment)
– The clinical trial registry (clinicaltrials.gov)
– The monitoring plan
– The statistical analysis plan
Practical Issues
• Budget implications of changing sample size
• Complex drug supply issues
• Resources to conduct interim analyses
– Data cleaning
– Statistical analyses
– Arranging DSMB meetings
• Protecting the blind and restricted access to data
• Perception issues
Analysis Issues
• Interpret cautiously
• Evaluate issues with
– Statistical error control
– Operational bias
– Generalizability
• Evaluate consistency of results before and after adaptation
Reporting / Publishing
• Clearly describe
– The adaptation
– Whether the adaptation was planned or unplanned
– The rationale for the adaptation
– When the adaptation was made
– The data upon which adaptation is based and whether the data
were blinded or unblinded
– The planned process for the adaptation including who made the
decision regarding adaptation
– Deviations from the planned process
– Consistency of results before vs. after the adaptation
• Discuss
– Potential biases induced by the adaptation
– Adequacy of firewalls to protect against operational bias
– The effects on error control and multiplicity context
DSMBs
“It’s probably the toughest job in clinical medicine.
Being on a DSMB requires real cojones.”
Jeffrey Drazen
Editor, NEJM
Forbes, 2012
DSMBs and Adaptive Designs
• Many (MDs and statisticians) don’t understand adaptive design issues
well or appreciate implications of DSMB actions
– Poor DSMB processes can jeopardize trial integrity
• Considerations
– Get DSMB members experienced with adaptive designs
– Statistician chair
– Well-constructed charter
Recent example:
Release of Interim Results
• Vertex has ongoing treatment trial for cystic fibrosis
• Positive results of interim analyses released; trial continued
• Stock price soared
• Executive VP sold stock for 8.8 million profit; other officers too
• Oops! We made a mistake. Results not as positive as reported
• Stock price tumbles…
• Questions regarding interim data practices including DSMB operations
• SEC Investigation ongoing
Recent Example: DSMB Actions Questioned
• J&J prostate cancer drug Zytiga
• Interim results leaned heavily towards positive trial… but results not
significant (p=0.08)
• DSMB felt ethical obligation and stopped trial anyway
• Frequentists say trial stopped too early from an evidence perspective
• Some Bayesians argue no problem when you consider the prior
• Perception that DSMB is in bed with the company
Predicted Interval Plots (PIPs)
Evans SR, Li L, Wei LJ, “Data Monitoring in Clinical Trials Using Prediction”, Drug
Information Journal, 41:733-742, 2007.
Li L, Evans SR, Uno H, Wei LJ, “Predicted Interval Plots: A Graphical Tool for Data
Monitoring in Clinical Trials”, Statistics in Biopharmaceutical Research, 1:4:348-355, 2009.
RESPONSE-ADAPTIVE
TREATMENT REGIMES
Motivation
• Patient management
– Not a single decision but tailored sequential treatment decisions
(adjustments of therapy over time) based on individual patient
response (transitions of health states based on efficacy, toxicity,
adherence, QOL, etc.)
– Mixture of short-term and long-term outcomes
• Adaptive treatment regime designs
– Compares treatment strategies (of sequential decisions) that are
consistent with clinical practice
[Figure, built up over several slides: eligible patients are randomized to Therapy #1 or Therapy #2 and classified by short-term response as responders or non-responders. Responders continue their assigned therapy; non-responders are re-randomized (Therapy #3 or #4 after Therapy #1; Therapy #3 or #5 after Therapy #2). All paths continue to follow-up for long-term response.]
Example: HIV-Associated PML
• Design compares 4 treatment STRATEGIES
– cART + steroids if IRIS is observed
– cART without steroids
– Enhanced-cART + steroids if IRIS is observed
– Enhanced-cART without steroids
• Step 1
– Randomized to cART or enhanced-cART (cART + enfuvirtide)
– Observe patient response, particularly for IRIS
• Step 2
– If no IRIS then patient continues with therapy
– If IRIS, then randomize to steroids or placebo
Coinfection: PML
Short-term Outcome: IRIS
Long-term Outcome: Survival
[Figure: PML patients are randomized to cART or cART + ENF. By short-term response, patients with no IRIS continue their assigned therapy, while patients with IRIS are randomized to added steroid or placebo. All paths continue to follow-up for long-term response.]
Adaptive Treatment Regimes
• Distinction between the regime (strategy dictating patient
treatment) vs. realized experiences
– Data from individual patients can contribute to multiple strategies
– Patients on the same regime can have different treatment
experiences
• ITT complexity
– Assigning treatment at later stages for patients LFU in early stages
– Should consent patients to agree to ALL sequential randomizations
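The distinction between a regime and a realized experience can be made concrete with the PML example. The sketch below is an illustrative encoding, not the trial's analysis code: each strategy is written as a pair (first-stage therapy, second-stage assignment if IRIS occurs), and the labels and function name are assumptions.

```python
# Each strategy: (stage 1 therapy, stage 2 assignment if IRIS is observed).
STRATEGIES = {
    "cART + steroids if IRIS": ("cART", "steroids"),
    "cART without steroids": ("cART", "placebo"),
    "Enhanced-cART + steroids if IRIS": ("enhanced-cART", "steroids"),
    "Enhanced-cART without steroids": ("enhanced-cART", "placebo"),
}


def consistent_strategies(stage1_arm, iris, stage2_arm=None):
    """Return the strategies a patient's realized path is consistent with.

    stage2_arm is only defined for patients who developed IRIS.
    """
    matches = []
    for name, (first, if_iris) in STRATEGIES.items():
        if first != stage1_arm:
            continue
        # Patients without IRIS never reach the second randomization, so
        # their path is consistent with both strategies starting on their arm.
        if not iris or stage2_arm == if_iris:
            matches.append(name)
    return matches


print(consistent_strategies("cART", iris=False))
print(consistent_strategies("enhanced-cART", iris=True, stage2_arm="steroids"))
```

A patient on cART who never develops IRIS contributes data to both cART strategies, while two patients on the same strategy can have different experiences depending on whether IRIS occurs.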
Summary
• Adaptation is a design feature
– Requires careful and responsible planning
• When used appropriately, adaptive designs can be efficient and
informative
• When used inappropriately, adaptive designs can threaten trial integrity
• Be aware of information apparent to observers and consider actions to
protect trial integrity
– Minimize access to results to control operational bias
Adaptive Statistician
Many collaborating clinicians ask:
“Can we change statisticians? I’m tired of
listening to Evans explain all of the
mistakes we are making.”
…as you can see dear colleagues,
adaptive design is a very easy concept…
Thank you for listening.
BACK-UP
2 STAGE DESIGNS
2-Stage Designs
• “Internal pilot”: Stage I vs. Stage II: learn vs. confirm
– Hypothesis generation vs. hypothesis testing
• Efficiency advantage
– Single trial addresses objectives traditionally addressed in two trials
– Eliminates down-time between separate trials (but less thinking time)
– IRB advantage (vs. approval of two trials)
• Classify by whether objectives or endpoints change across stages
• Important distinction is whether the final analysis uses data from both stages
or only Stage II
Seamless Designs (e.g., Phase II/III)
2-Stage Design
Same Objectives and Endpoints
• Stage I: Evaluate preliminary evidence of effect/no effect
• ACTG 269 (Evans et al., JCO, 2002)
– Phase II single arm trial of oral etoposide for AIDS KS
– Endpoint: tumor response rate (50% decrease in lesion number/size)
– Stage I
• Enroll small number of participants (N=14)
• If response is unacceptably low (0/14), then quit for futility
noting that if true response rate is 20% then <5% chance of
observing 0/14
• Otherwise continue to Stage II (not testing for efficacy)
• Expected sample size is minimized when response is low given
error constraints
– Trial continued w/ final response rate = 36%
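The Stage I futility rule rests on a simple binomial calculation, sketched here. The function names are illustrative; the 20% response rate and 5% threshold come from the slide.

```python
def prob_zero_responses(n, true_rate):
    """Probability of observing 0 responses among n participants (binomial)."""
    return (1 - true_rate) ** n


def smallest_stage1_n(true_rate, max_prob=0.05):
    """Smallest n for which observing 0/n responses has probability below max_prob."""
    n = 1
    while prob_zero_responses(n, true_rate) >= max_prob:
        n += 1
    return n


print(round(prob_zero_responses(14, 0.20), 4))  # chance of 0/14 if true rate is 20%
print(smallest_stage1_n(0.20))                  # smallest Stage I size meeting the <5% constraint
```

With a true response rate of 20%, 14 is the smallest Stage I size for which 0 responses would occur less than 5% of the time, which matches the N=14 used above.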
Adaptive Randomization
• Randomization schedule cannot be constructed prior to trial initiation
• Treatment allocation depends on:
1. Baseline characteristics, or
2. Responses
• Minimization
– Creates between-treatment-group balance wrt important variables
• “minimizes imbalances”
– Revises the probability of treatment assignment based on baseline
characteristics of the participant and participants already randomized
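A minimal sketch of minimization for two arms follows. It assumes a range-based imbalance measure summed over factors and a biased coin that picks the imbalance-minimizing arm with probability `p_best`; these choices, the arm labels, and the function names are illustrative assumptions rather than a specific trial's algorithm.

```python
import random


def imbalance_if_assigned(arm, participant, enrolled, arms=("A", "B")):
    """Total imbalance across factors if `participant` were assigned to `arm`.

    `enrolled` is a list of (arm, covariates) pairs; covariates is a dict
    mapping factor name to level.
    """
    total = 0
    for factor, level in participant.items():
        counts = {a: 0 for a in arms}
        for prior_arm, covs in enrolled:
            if covs.get(factor) == level:
                counts[prior_arm] += 1
        counts[arm] += 1  # hypothetical assignment of the new participant
        total += max(counts.values()) - min(counts.values())
    return total


def minimization_assign(participant, enrolled, arms=("A", "B"),
                        p_best=0.8, rng=random):
    """Assign to the imbalance-minimizing arm with probability p_best."""
    scores = {a: imbalance_if_assigned(a, participant, enrolled, arms) for a in arms}
    best = min(scores.values())
    best_arms = [a for a in arms if scores[a] == best]
    if len(best_arms) == len(arms):  # tie: plain randomization
        return rng.choice(arms)
    if rng.random() < p_best:
        return rng.choice(best_arms)
    return rng.choice([a for a in arms if a not in best_arms])


enrolled = [("A", {"sex": "F", "site": 1}), ("A", {"sex": "F", "site": 2})]
new_participant = {"sex": "F", "site": 1}
print(minimization_assign(new_participant, enrolled, rng=random.Random(1)))
```

Because each assignment depends on who has already been randomized, the schedule cannot be generated before the trial starts.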
Adaptive Randomization
• Response adaptive randomization
– Bases treatment assignment probabilities on the observed
responses of participants that are already enrolled
– Feasible with short-term outcomes (e.g., emergency medicine
trials, e.g., stroke, status epilepticus, or traumatic brain injury)
– “Play-the-winner” or “urn design”
• Proportionally more patients are randomized to the more
effective intervention
• May be attractive for this reason
• Disadvantages
– Time trends in response create challenges (e.g., learning effects in
surgery trials)
– Suggests equipoise does not hold
– May be less efficient than group-sequential designs
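The urn idea can be sketched as a randomized play-the-winner rule for two arms. This is a simplified illustration that assumes each response is known before the next participant is randomized; the starting urn of one ball per arm, the arm labels, and the success rates in the usage line are hypothetical.

```python
import random


def update_urn(urn, arm, success):
    """Add one ball for the assigned arm after a success,
    or for the other arm after a failure."""
    other = "B" if arm == "A" else "A"
    rewarded = arm if success else other
    new_urn = dict(urn)
    new_urn[rewarded] += 1
    return new_urn


def draw_arm(urn, rng=random):
    """Draw an arm with probability proportional to its ball count."""
    return "A" if rng.random() < urn["A"] / (urn["A"] + urn["B"]) else "B"


def simulate(p_success, n, seed=0):
    """Randomize n participants; p_success maps arm to true response rate."""
    rng = random.Random(seed)
    urn = {"A": 1, "B": 1}
    assigned = {"A": 0, "B": 0}
    for _ in range(n):
        arm = draw_arm(urn, rng)
        assigned[arm] += 1
        urn = update_urn(urn, arm, rng.random() < p_success[arm])
    return assigned


print(simulate({"A": 0.7, "B": 0.3}, n=100))
```

On average the allocation drifts toward the arm with the higher response rate, which is the attraction noted above; the sketch also shows why a time trend in responses would distort the allocation.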
Adaptive Dose Selection or Duration
• Enroll sequence of cohorts where subsequent cohorts open depending
upon outcome from previous cohort
• A5210: AMD11070 (oral CXCR4 entry inhibitor)
– Accrue 6 participants; if <x DLTs then treat next 6 at next higher dose
• 5277: ITX-5061 HCV entry inhibitor for HCV monoinfection
– 3 doses (25/27/150); 3 durations (3/14/28 days)
– Start with highest dose on cohort of 10 (8 active; 2 placebo)
– If anti-viral activity (4/8 show 1 log drop), then reduce dose
– If no activity, then increase duration
Limitations of Many Traditional Methods
• Over-reliance on p-values without careful consideration of effect
sizes (clinical relevance) and precision
• Inflexible decision rules based on ONE endpoint
– Desire to base decisions upon totality of evidence (e.g., safety
data, secondary endpoints, QOL, external data, etc.)
• No formal evaluation of the ramifications of continuing
Predicted Intervals
• Predict CI at future timepoint (e.g., end of trial
or next interim analysis time) conditional upon:
1. Observed data
2. Assumptions regarding future data (e.g., observed
trend continues, HA is true, H0 is true, best/worst case
scenarios, etc.)
• Use with repeated confidence interval theory to control false
positive error
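The idea can be sketched by simulation for a difference in means. This is a simplified illustration rather than the published method: it assumes equal per-group sample sizes, a known common standard deviation, and normally distributed outcomes, and all names and numbers in the usage lines are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def predicted_intervals(n_obs, mean_diff_obs, sd, n_final, assumed_diff,
                        n_sims=1000, level=0.95, seed=0):
    """Simulate end-of-trial CIs for a difference in means, conditional on
    the observed data and an assumed effect in the future data.

    n_obs and n_final are per-group sample sizes; sd is treated as known.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n_future = n_final - n_obs
    se_future = sd * sqrt(2 / n_future)      # SE of the mean difference in future data
    half_width = z * sd * sqrt(2 / n_final)  # half-width of the final CI
    intervals = []
    for _ in range(n_sims):
        future_diff = rng.gauss(assumed_diff, se_future)
        # Final estimate weights observed and simulated future data by size.
        final_diff = (n_obs * mean_diff_obs + n_future * future_diff) / n_final
        intervals.append((final_diff - half_width, final_diff + half_width))
    return intervals


# Hypothetical interim: 30 per group observed of 78 planned, observed difference -0.10.
trend = predicted_intervals(30, -0.10, 0.30, 78, assumed_diff=-0.10)
null = predicted_intervals(30, -0.10, 0.30, 78, assumed_diff=0.0)
for label, ivs in (("current trend", trend), ("H0 true", null)):
    share = sum(hi < 0 for _, hi in ivs) / len(ivs)
    print(f"{label}: share of predicted CIs excluding zero = {share:.2f}")
```

Plotting the simulated intervals side by side gives a predicted interval plot; the share of intervals excluding zero is a simulation analogue of conditional power under the chosen assumption.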
NARC 009
Evans et al., PLoS ONE, 2007.
• Randomized, double-blind, placebo-controlled, multicenter, dose-ranging study of prosaptide (PRO) for the treatment of HIV-associated
neuropathic pain
• Participants were randomized to 2, 4, 8, 16 mg/d PRO or placebo
administered via subcutaneous injection
• Primary endpoint:
– 6 week change from baseline in weekly average of random daily
Gracely pain scale prompts using an electronic diary
• Designed N= 390 equally allocated between groups
– Interim analysis conducted after 167 participants completed the 6-week double-blind treatment period
Interim Analysis Results: NARC 009
Treatment  N   95% CI for       95% CI for      95% PI for      95% PI for      Required
               Mean Change      Diff [1]        Diff [2]        Diff [3]        Diff [4]
Placebo    31  (-0.35, -0.11)
2 mg       34  (-0.21, -0.04)   (-0.04, 0.25)   (-0.01, 0.21)   (-0.16, 0.06)   -0.54
4 mg       34  (-0.38, -0.12)   (-0.19, 0.16)   (-0.14, 0.10)   (-0.23, 0.01)   -0.45
8 mg       32  (-0.18, -0.02)   (-0.01, 0.28)   (0.03, 0.23)    (-0.15, 0.05)   -0.56
16 mg      36  (-0.34, -0.09)   (-0.16, 0.19)   (-0.11, 0.14)   (-0.21, 0.04)   -0.54
[1] 95% CI for the difference in mean changes vs. placebo
[2] 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming current trend
[3] 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming per protocol, μplacebo = 0.17 and μdrug = -0.34
[4] Difference in mean changes needed in the remaining participants for the CI for the difference in mean changes to exclude zero (in favor of active treatment) at the end of the trial
Predicted Intervals and PIPs
• Intuitive
• Advantages
– Flexible decision making
• Considering all data (all endpoints, external data, etc.)
– Effect sizes and associated precision
• Clinical relevance and statistical significance
– Evaluation of trial with continuation
• PI width provides information about gain in precision
• Conditional power
– Can be used for all types of endpoints (e.g., binary, continuous,
event-time) and hypotheses (e.g., superiority or noninferiority)