What is it that the Clinical Statistician thinks about?
Informing the selection of futility stopping thresholds:
case study from a late-phase clinical trial
Hughes S, Cuffe RL, Lieftucht A, Nichols WG
Pharmaceutical Statistics 2009; 8: 25-37
Sara Hughes
GSK Head of Clinical Statistics
For PSI/DIA Journal Club, June 2012
Research Example
Constant drive for more efficient clinical trial designs
– Quicker decisions
– Reduced financial & human investment in ‘futile’ drugs / doses
– Patient safety
Adaptive designs receiving much press and research
– Futility designs became viable in late 1970s
– But, limited examples of application in clinical trial literature (at time
of this paper)
Futility case study
– General futility definitions and case study background
– Useful graphical tools created to demonstrate risks of futility design
– Decision analysis developed to aid selection of futility stopping rules
Futility Dictionary of Terms
Futility interim analysis: the option to stop a study if the
possibility at the interim stage of ultimately getting a
positive result is remote
– i.e. “it’s futile to continue - the data look so bad that no amount of
further data will reverse that - let’s quit now”
Stopping threshold: what result would make us quit?
Various statistical methods exist to quantify probability of
future success (POS) but little guidance available for
selecting optimal values for stopping thresholds
– High threshold: few bad trials continue, but some good trials are
stopped
– Low threshold: most good trials continue, but so do some
failures
HIV Futility Case Study
GSK has an EU license to sell HIV drug Telzir at dose
700mg twice-daily with Ritonavir 100mg twice-daily boosting
– Interested in investigating Telzir 1400mg once-daily with Ritonavir
100mg once-daily boosting
– Once daily dosing would offer increased convenience
– Reduced Ritonavir dose may offer improved safety profile
Study to assess this is large, lengthy and costly
– Futility design reduces the risk of a failed study
– Without high probability of success, can redirect resources to
other research & stop prescribing ineffective dosing regimen
Study Design
1:1 randomisation to investigational dose or standard dose
Stage One (N=200), followed by 24 week interim futility analysis
Stage Two (N=528), followed by 48 week final analysis
Primary endpoint: Non-inferiority on efficacy (proportion with undetectable HIV viral load)
– Stop after Stage One if POS < X%
Key powered secondary endpoint: Superior on safety (difference of ≥13mg/dL in non-HDL cholesterol)
– Stop after Stage One if POS < Y%
“POS” for Case Study
A variety of statistical stopping methods can be used for calculating
POS (probability of future success):
– frequentist conditional power (calculated under H0, H1, or current trend)
– semi-Bayesian predictive power
– formal group sequential methods
Case study POS: conditional power under current trend
– “Based on the results so far - and assuming these results reflect the truth
- what is the probability of successfully achieving the study objectives at
the end of the study?”
Choice of stopping thresholds more important than choice of method.
We had two challenges:
– How to convey features & risks of futility design to non-statistical
colleagues
– How to derive optimal stopping thresholds
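The case-study POS, conditional power under the current trend, can be sketched with a normal approximation. This is a minimal illustration and not the calculation used in the paper: the function names, the one-sided alpha of 0.025, the Wald z-statistic, the 12% non-inferiority margin and the information fraction of 0.3 in the usage lines are all assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

NORM = NormalDist()  # standard normal distribution


def noninferiority_z(p_test, p_control, n_per_arm, margin):
    """Wald z-statistic for H0: p_test - p_control <= -margin (assumed test)."""
    se = sqrt(p_test * (1 - p_test) / n_per_arm
              + p_control * (1 - p_control) / n_per_arm)
    return (p_test - p_control + margin) / se


def conditional_power_current_trend(z_interim, info_fraction, alpha=0.025):
    """Probability of final success if the interim trend reflects the truth.

    The drift is estimated as z_interim / sqrt(info_fraction); success means
    the final z-statistic exceeds the one-sided critical value.
    """
    if not 0.0 < info_fraction < 1.0:
        raise ValueError("info_fraction must lie strictly between 0 and 1")
    z_crit = NORM.inv_cdf(1.0 - alpha)
    drift = z_interim / sqrt(info_fraction)
    return NORM.cdf((drift - z_crit) / sqrt(1.0 - info_fraction))


# Hypothetical interim: 100 subjects per arm, 70% vs 72% responders,
# assumed 12% margin and assumed information fraction of 0.3.
z1 = noninferiority_z(0.70, 0.72, n_per_arm=100, margin=0.12)
pos = conditional_power_current_trend(z1, info_fraction=0.3)
print(f"interim z = {z1:.2f}, POS = {pos:.0%}")
```

The trial would stop for futility when the printed POS falls below the chosen threshold (X% in the study design).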
Interpreting Conditional Power
[Figure: conditional power (%), from 0 to 100, against difference in response rates (%), from -12 to 0, with one curve for each control response: 76%, 72% and 68%]
Impact Of “When” The Interim Occurs
[Figure: probability of falsely stopping for futility at the interim (%), from 0 to 50, against patients recruited, from 0 to 600, with one curve for each POS threshold: 90%, 70%, 50% and 30%]
Impact of Interim on Trial’s Power for Primary
Efficacy Endpoint
[Figure: power (%), from 60 to 90, against conditional power threshold (%), from 10 to 90]
Note: futility stopping does not inflate the type I error
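The loss of power caused by a futility interim can be approximated numerically. The sketch below is an illustration under stated assumptions, not the method behind the slide's figure: it uses a normal approximation, assumes the true effect equals the alternative that gives `design_power` without an interim, and uses a hypothetical information fraction of 0.3. The function name and all parameter names are invented for the example.

```python
from math import sqrt
from statistics import NormalDist

NORM = NormalDist()  # standard normal distribution


def power_with_futility(threshold, info_fraction, design_power=0.9,
                        alpha=0.025, steps=4000):
    """Power when the trial stops at the interim if current-trend CP < threshold."""
    if not (0.0 < threshold < 1.0 and 0.0 < info_fraction < 1.0):
        raise ValueError("threshold and info_fraction must lie in (0, 1)")
    t = info_fraction
    z_crit = NORM.inv_cdf(1.0 - alpha)
    # Drift that gives design_power when there is no interim look (assumed truth).
    theta = z_crit + NORM.inv_cdf(design_power)
    mean_z1 = theta * sqrt(t)
    # Interim z below which conditional power falls under the threshold.
    boundary = sqrt(t) * (z_crit + sqrt(1.0 - t) * NORM.inv_cdf(threshold))
    upper = mean_z1 + 8.0  # the interim z density is negligible beyond this
    if boundary >= upper:
        return 0.0

    def integrand(z1):
        # Density of the interim z times P(final success | interim z, true drift).
        final_ok = NORM.cdf((z1 * sqrt(t) + theta * (1.0 - t) - z_crit)
                            / sqrt(1.0 - t))
        return NORM.pdf(z1 - mean_z1) * final_ok

    # Trapezoidal rule over the region where the trial continues.
    h = (upper - boundary) / steps
    total = 0.5 * (integrand(boundary) + integrand(upper))
    total += sum(integrand(boundary + i * h) for i in range(1, steps))
    return total * h


for thr in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"CP threshold {thr:.0%}: power = {power_with_futility(thr, 0.3):.1%}")
```

Under these assumptions the power falls as the conditional power threshold rises, which is the relationship the slide plots.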
Quantifying Risks of Design
Setting a stopping threshold of 70% POS will lead to a 27% chance of stopping at
the interim if the drug works and a 10% chance of continuing if it doesn’t work
[Figure: probability of false stop (%) against probability of false go (%), both from 0 to 50, with points labelled by POS threshold: 90%, 70%, 50%, 30% and 10%]
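The false stop and false go probabilities for a given threshold have a closed form under a normal approximation. The sketch below is illustrative only and is not claimed to reproduce the slide's 27% and 10%: it assumes "the drug works" means the true effect equals the design alternative (90% power without an interim) and "it doesn't work" means the true effect sits exactly on the null boundary. The function names and the information fraction of 0.3 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

NORM = NormalDist()  # standard normal distribution


def interim_boundary(threshold, info_fraction, alpha=0.025):
    """Interim z below which current-trend conditional power < threshold."""
    t = info_fraction
    z_crit = NORM.inv_cdf(1.0 - alpha)
    return sqrt(t) * (z_crit + sqrt(1.0 - t) * NORM.inv_cdf(threshold))


def false_stop_and_false_go(threshold, info_fraction,
                            design_power=0.9, alpha=0.025):
    """Return (P(stop | drug works), P(continue | drug does not work))."""
    t = info_fraction
    b = interim_boundary(threshold, t, alpha)
    # Assumed drift when the drug works: the design alternative.
    theta_works = NORM.inv_cdf(1.0 - alpha) + NORM.inv_cdf(design_power)
    # The interim z is N(theta * sqrt(t), 1); the trial stops when z < b.
    false_stop = NORM.cdf(b - theta_works * sqrt(t))
    # Assumed drift of 0 when the drug does not work.
    false_go = 1.0 - NORM.cdf(b)
    return false_stop, false_go


for thr in (0.1, 0.3, 0.5, 0.7, 0.9):
    stop, go = false_stop_and_false_go(thr, info_fraction=0.3)
    print(f"POS threshold {thr:.0%}: false stop {stop:.1%}, false go {go:.1%}")
```

Raising the threshold trades a lower false go probability for a higher false stop probability, which is the curve traced in the figure.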
Issues
Clear graphs illustrated risks & benefits of varying stopping
thresholds and timing of interim analysis
But:
Not every ‘successful’ trial is equally good
– eg some results more or less likely to lead to license approval
Wanted decision making to draw quantitatively on
information we already had on this new regimen’s
performance (PK and small pilot studies)
Decision analysis combined all these factors in order to
weigh up benefits and risks of each stopping threshold
Decision Analysis Step 1:
Categorise possible outcomes & elicit prior expectations
[Figure: two bar charts of prior probability, from 0.0 to 0.5, for the outcome categories bad, mediocre, good and excellent. Efficacy panel: proportion of responders at Wk 24. Safety panel: improvement in non-HDL cholesterol (mg/dL)]
Decision Analysis Step 2:
Calculate predicted distribution of trial outcomes for each choice of
stopping threshold (shown for 80% POS)
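[Figure: two pie charts, efficacy and safety, showing the predicted share of trials in each category: stop at interim, continue to bad results, continue to mediocre results, continue to good results, continue to excellent results. Slice labels in extracted order, not matched to categories: 18%, 22%, 39%, 50%, 20%, 30%, 1%, 3%, 9%, 8%]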
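Step 2 can be sketched as a weighted sum over the prior categories from Step 1. The example below is a simplified illustration, not the paper's model: the prior probabilities, the standardised effect (drift) assigned to each category and the information fraction are all hypothetical, and a trial that continues is assumed to finish in its true category, ignoring sampling variability at the final analysis.

```python
from math import sqrt
from statistics import NormalDist

NORM = NormalDist()  # standard normal distribution


def prob_stop(drift, threshold, t, alpha=0.025):
    """P(current-trend conditional power < threshold) for a given true drift."""
    z_crit = NORM.inv_cdf(1.0 - alpha)
    boundary = sqrt(t) * (z_crit + sqrt(1.0 - t) * NORM.inv_cdf(threshold))
    return NORM.cdf(boundary - drift * sqrt(t))


def predicted_outcomes(prior, drifts, threshold, t):
    """Predicted distribution over 'stop at interim' and 'continue to X results'."""
    outcomes = {"stop at interim": 0.0}
    for category, p in prior.items():
        stop = prob_stop(drifts[category], threshold, t)
        outcomes["stop at interim"] += p * stop
        outcomes[f"continue to {category} results"] = p * (1.0 - stop)
    return outcomes


# Hypothetical inputs: neither the priors nor the drifts come from the paper.
PRIOR = {"bad": 0.2, "mediocre": 0.3, "good": 0.3, "excellent": 0.2}
DRIFTS = {"bad": 0.0, "mediocre": 1.5, "good": 3.0, "excellent": 4.5}

RESULT = predicted_outcomes(PRIOR, DRIFTS, threshold=0.8, t=0.3)
for label, p in RESULT.items():
    print(f"{label}: {p:.0%}")
```

Repeating the call for each candidate threshold gives one predicted distribution per threshold, which is what the pie charts summarise for 80% POS.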
Results of Decision Analysis
Using pie-charts for primary efficacy endpoint:
– 50% probability of study continuation given 80% POS stopping
threshold
– Relaxing POS threshold to 70% progresses an additional 5% of
trials, 53% of which go on to good/excellent results
– Relaxing POS threshold to 60% progresses a further 5% of trials,
48% of which go on to good/excellent results
– Relaxing POS threshold to 50% progresses a further 5% of trials,
43% of which go on to good/excellent results
– …
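The incremental comparison in the bullets above (how many extra trials progress when the threshold is relaxed, and what share of those turn out good or excellent) can be sketched as follows. The priors, drifts and information fraction are hypothetical, the function names are invented for the example, and a continuing trial is again assumed to finish in its true category, so the printed figures will not match the slide.

```python
from math import sqrt
from statistics import NormalDist

NORM = NormalDist()  # standard normal distribution


def continue_by_category(prior, drifts, threshold, t, alpha=0.025):
    """Joint probability of (true category, continue past the interim)."""
    z_crit = NORM.inv_cdf(1.0 - alpha)
    boundary = sqrt(t) * (z_crit + sqrt(1.0 - t) * NORM.inv_cdf(threshold))
    return {cat: p * (1.0 - NORM.cdf(boundary - drifts[cat] * sqrt(t)))
            for cat, p in prior.items()}


def marginal_gain(prior, drifts, thresholds, t,
                  favourable=("good", "excellent")):
    """For each relaxation step: (new threshold, extra trials, favourable share).

    thresholds must be strictly decreasing so that each step adds trials.
    """
    rows = []
    prev = continue_by_category(prior, drifts, thresholds[0], t)
    for thr in thresholds[1:]:
        cur = continue_by_category(prior, drifts, thr, t)
        extra = sum(cur.values()) - sum(prev.values())
        extra_fav = sum(cur[c] - prev[c] for c in favourable)
        rows.append((thr, extra, extra_fav / extra))
        prev = cur
    return rows


# Hypothetical inputs: neither the priors nor the drifts come from the paper.
PRIOR = {"bad": 0.2, "mediocre": 0.3, "good": 0.3, "excellent": 0.2}
DRIFTS = {"bad": 0.0, "mediocre": 1.5, "good": 3.0, "excellent": 4.5}

ROWS = marginal_gain(PRIOR, DRIFTS, [0.8, 0.7, 0.6, 0.5], t=0.3)
for thr, extra, share in ROWS:
    print(f"relax to {thr:.0%}: +{extra:.1%} of trials, "
          f"{share:.0%} of them good/excellent")
```

A falling favourable share at each step is the signal that further relaxation mostly lets through trials unlikely to end well.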
Final Stopping Thresholds Selected
Efficacy endpoint: 70% POS
Safety endpoint: 60% POS
Based on our assumptions, we had 38% overall probability
of continuing the study to Stage Two
If study continued, estimated 62% probability of final
good/excellent results for both endpoints
– Compared to 33% probability of good/excellent results with no
futility interim analysis
If stopped correctly for futility, prevented 528/2 = 264 subjects
from receiving a possibly inferior regimen and saved company approx.
£8 million in wasted R&D funds
Case Study Conclusions
Futility designs under-utilised but have great potential:
– “playing the winner”, maximising use of limited resources
– Depending on phase of trial and nature of disease and drug being
studied, stopping threshold level may vary considerably
Selection of optimal stopping threshold challenging
– Lack of practical guidance in statistical literature
– Can motivate discussion via informative graphs, simulations and
decision analysis – making this design far more appealing and
acceptable to non-statistical colleagues
Statistical team led the study design development & choice
of threshold work