Early stopping for phase II studies with time-to-event outcomes
Early stopping for phase II cancer studies:
a likelihood approach
Elizabeth Garrett-Mayer, PhD
Associate Professor of Biostatistics
The Hollings Cancer Center
The Medical University of South Carolina
[email protected]
1
Motivation
Oncology Phase II studies
Historically,
Single arm
Evaluation of efficacy
‘clinical response’ is the outcome of interest
Evaluated within several months (cycles) of enrollment
Early stopping often incorporated for futility
More recently
targeted agents halt growth
may or may not shrink tumor
‘progression-free survival’ is outcome of interest
extensions for survival evaluation, but not today
2
Early Stopping in Phase II studies:
Binary outcome (clinical response)
Attractive solutions exist for this setting
Common design is Simon’s two-stage (Simon, 1989)
Preserves type I and type II error
Procedure: Enroll N1 patients (stage 1).
If x or more respond, enroll N2 more (stage 2)
If fewer than x respond, stop.
Appropriate for binary responses
Bayesian approaches also implemented
binary likelihood, beta prior → beta binomial model
other forms possible
requires prior
Lee and Liu: predictive probability design (Clinical Trials,
2008)
3
Alternative approach for early stopping
Use likelihood-based approach
Not that different from Bayesian
(Royall (1997), Blume (2002))
Parametric model-based
No “penalties” for early looks
But it is different
No prior information included
Early evaluations are relatively simple
“Probability of misleading evidence” controlled
Can make statements about probability of
misleading evidence
4
Today’s talk
Likelihood approach
principles
multiple looks
Focus on binary outcome situation
Can be extended to the time-to-event outcome
setting
parametric survival distributions
issues with length of follow-up
issues with how often to “look”
5
Law of Likelihood
If hypothesis A implies that the probability of
observing some data X is PA(X), and
hypothesis B implies that the probability is
PB(X), then the observation X=x is evidence
supporting A over B if PA(x) > PB(x), and
the likelihood ratio, PA(x)/PB(x), measures the
strength of that evidence.
(Hacking 1965, Royall 1997)
6
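A minimal numeric illustration of the law (hypothetical numbers, not from the talk): with 7 responses in 10 patients, the data support a 70% response rate over a 40% one by a factor of about 6.3.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical data: 7 responses in 10 patients
x, n = 7, 10
pA, pB = 0.7, 0.4  # hypothesis A and hypothesis B

lr = binom_pmf(x, n, pA) / binom_pmf(x, n, pB)
print(round(lr, 1))  # 6.3
```

Note that 6.3 falls below the conventional "strong evidence" benchmark of k = 8 discussed later, so these data would be read as moderate, not strong, support for A over B.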
Likelihood approach
Determine “what the data say” about the parameter of interest
Likelihood function: gives a picture of the data
Likelihood intervals (LI): give a range of reasonable values for the parameter of interest
[Figure: likelihood function of λ (x-axis 0.01–0.05, y-axis 0.0–1.0), with horizontal reference lines at 1/8 and 1/32]
7
Likelihood approach
Likelihood ratios (LR)
Take the ratio of heights of L for different values of λ
Example: L(λ=0.030) = 0.78; L(λ=0.035) = 0.03
LR = 26
[Figure: the same likelihood function of λ (x-axis 0.01–0.05, y-axis 0.0–1.0)]
8
Likelihood-Based Approach
Use likelihood ratio to determine if there is
sufficient evidence in favor of the one or
another hypothesis
Error rates are bounded
Implications: Can look at data frequently
without concern over mounting errors
“Evidence-based”
9
Key difference in likelihood versus
frequentist paradigm
Consideration of the alternative hypothesis
Frequentist hypothesis testing:
H0: null hypothesis
H1: alternative hypothesis
Frequentist p-values:
calculated assuming the null is true
have no regard for the alternative hypothesis
Likelihood ratio:
Compares evidence for two hypotheses
Acceptance or rejection of null depends on the
alternative
10
Example:
Assume H0: λ = 0.12 vs. H1: λ = 0.08
What if true λ = 0.10?
Simulated data, N=300
Frequentist: p = 0.01 → reject the null
Likelihood: LR = 1/4 → weak evidence in favor of null
[Figure: likelihood function of λ (x-axis 0.08–0.12, y-axis 0.0–1.0)]
11
Example: Why?
P-value looks for evidence against the null
LR compares evidence for both hypotheses
When the “truth” is in the middle, which makes more sense?
[Figure: the same likelihood function of λ (x-axis 0.08–0.12), plotted on a log scale (1e-04 to 1)]
12
Likelihood Inference
Weak evidence: at the end of the study, there is
not sufficiently strong evidence in favor of either
hypothesis
This can be controlled by choosing a large enough
sample size
But, if neither hypothesis is correct, can end up with
weak evidence even if N is seemingly large (appropriate)
Strong evidence
Correct evidence: strong evidence in favor of correct
hypothesis
Misleading evidence: strong evidence in favor of
the incorrect hypothesis.
This is our interest today: what is the
probability of misleading evidence?
This is analogous to the alpha (type I) and beta
(type II) errors that frequentists worry about
13
Operating Characteristics
[Figure: Simon two-stage design — probability of Accept H0 and Reject H0 as a function of true p (0.1–0.6)]
14
Operating Characteristics
[Figure: Likelihood approach — probability of Accept H0, Accept HA, and Weak Evidence as a function of true p (0.1–0.6)]
15
Misleading Evidence in Likelihood Paradigm
Universal bound: Under H0,
P(L1/L0 ≥ k) ≤ 1/k
(Birnbaum, 1962; Smith, 1953)
In words, the probability that the likelihood ratio exceeds k in favor of the wrong hypothesis can be no larger than 1/k.
In certain cases, an even smaller bound applies (Royall, 2000):
difference between normal means
large sample size
Common choices for k are 8 (strong), 10, and 32 (very strong).
16
Implications
Important result: For a sequence of independent
observations, the universal bound still holds
(Robbins, 1970)
Implication: We can look at the data as often as
desired and our probability of misleading
evidence is bounded
That is, if k=10, the probability of misleading
strong evidence is ≤ 10%
This is a reasonable bound, considering that most studies allow β = 10-20% and α = 5-10%
17
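A quick simulation can illustrate the bound (my own sketch; p0, p1, k, and the maximum N are taken from the binary example that follows):

```python
import math
import random

def misleading_evidence_rate(p0=0.20, p1=0.40, k=10, n_max=37,
                             n_trials=20000, seed=1):
    """Simulate trials under H0 (true rate p0), looking after EVERY patient,
    and count how often the LR ever favors H1 over H0 by a factor > k."""
    rng = random.Random(seed)
    up = math.log(p1 / p0)                 # log-LR increment for a response
    down = math.log((1 - p1) / (1 - p0))   # log-LR increment for a non-response
    logk = math.log(k)
    misled = 0
    for _ in range(n_trials):
        loglr = 0.0
        for _ in range(n_max):
            loglr += up if rng.random() < p0 else down
            if loglr > logk:               # strong evidence for the WRONG hypothesis
                misled += 1
                break
    return misled / n_trials

rate = misleading_evidence_rate()
print(rate)  # comfortably below the universal bound 1/k = 0.10
```

Even with a look after every single patient, the fraction of trials ever showing misleading strong evidence stays under 1/k, consistent with Robbins (1970).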
Early stopping in phase II study:
binary outcome
Motivating Example
Single arm cancer clinical trial
outcome = clinical response
Early stopping for futility
Standard frequentist approach
Simon two-stage design
Only one look at the data
Determine “optimality” criterion
minimax
minimum E(N) under H0 (Simon’s optimal)
Likelihood approach
Use binomial likelihood
Can look at the data after each observation
18
Motivating Example
New cancer treatment agent
Anticipated response rate is 40%
Null response rate is 20%
the standard of care yields 20%
not worth pursuing new treatment with same response
rate as current treatment
Using frequentist approach:
Simon two-stage with alpha = beta = 10%
Optimum criterion: smallest E(N)
First stage: enroll 17. If 4 or more respond, continue.
Second stage: enroll 20 more. If 11 or more total respond, conclude success.
19
Likelihood Approach
Recall: we can look at the data after each patient
Use the binomial likelihood to compare two
hypotheses.
Difference in the log-likelihoods provides the log
likelihood ratio
This simplifies to:
log L1 − log L0 = (Σi yi) [log p1 − log p0 − (log(1−p1) − log(1−p0))] + N [log(1−p1) − log(1−p0)]
20
Implementation
Look at the data after each patient
Estimate the difference in logL0 and logL1
Rules:
if logL0 – logL1 > log(k): stop for futility
otherwise: continue
21
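As a sketch (my own helper names; p0 = 0.20, p1 = 0.40, and k = 10 follow the running example):

```python
import math

def log_l0_minus_log_l1(responses, p0=0.20, p1=0.40):
    """Running log L0 - log L1 after each patient (binomial likelihood).
    `responses` is a 0/1 sequence in enrollment order."""
    y = n = 0
    path = []
    for r in responses:
        n += 1
        y += r
        path.append(y * math.log(p0 / p1)
                    + (n - y) * math.log((1 - p0) / (1 - p1)))
    return path

# Nine non-responders in a row: evidence for the null accumulates
path = log_l0_minus_log_l1([0] * 9)
stop = [d > math.log(10) for d in path]   # k = 10 futility rule
print(stop)  # True first appears at the 9th patient
```

After 8 straight non-responses the log-LR sits just under log(10); the 9th pushes it over, which matches the first threshold on the next slide.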
Likelihood Approach
But, given discrete nature, only certain looks
provide an opportunity to stop
Current example: stop the study if…
0 responses in 9 patients
1 response in 12 patients
2 responses in 15 patients
3 responses in 19 patients
4 responses in 22 patients
5 responses in 26 patients
6 responses in 29 patients
7 responses in 32 patients
Although total N can be as large as 37, there are
only 8 thresholds for futility early stopping
assessment
22
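These thresholds can be reproduced from the stopping rule (my own sketch; k = 10 appears to be the value in use, since it recovers exactly the thresholds above):

```python
import math

def futility_threshold(y, p0=0.20, p1=0.40, k=10, n_max=37):
    """Smallest n (up to n_max, if any) at which y observed responses give
    log L0 - log L1 > log k under the binomial likelihood."""
    a = math.log(p0 / p1) - math.log((1 - p0) / (1 - p1))  # contribution per response
    b = math.log((1 - p0) / (1 - p1))                      # contribution per patient
    for n in range(max(y, 1), n_max + 1):
        if y * a + n * b > math.log(k):
            return n
    return None

print([futility_threshold(y) for y in range(8)])
# [9, 12, 15, 19, 22, 26, 29, 32]
```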
Design Performance Characteristics
How does the proposed approach compare
to the optimal Simon two-stage design?
What are performance characteristics we
would be interested in?
small E(N) under the null hypothesis
frequent stopping under null (similar to above)
infrequent stopping under alternative
acceptance of H1 under H1
acceptance of H0 under H0
23
Example 1: Simon Designs
H0: p = 0.20 vs. H1: p = 0.40. Power ≥ 90% and alpha ≤ 0.10.
Optimal Design:
Stage 1: N1 = 17, r2=3
Stage 2: N = 37, r=10
Enroll 17 in stage 1. Stop if 3 or fewer responses.
If more than three responses, enroll to a total N of 37.
Reject H0 if more than 10 responses observed in 37 patients
Minimax Design:
Stage 1: N1 = 22, r2=4
Stage 2: N = 36, r=10
Enroll 22 in stage 1. Stop if 4 or fewer responses.
If more than four responses, enroll to a total N of 36.
Reject H0 if more than 10 responses observed in 36 patients
24
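The stated designs can be checked numerically (my own sketch of Simon two-stage operating characteristics; here r1 denotes the stage-1 futility bound):

```python
from math import comb

def simon_oc(n1, r1, n, r, p):
    """Operating characteristics of a Simon two-stage design:
    stop after stage 1 if <= r1 responses among n1 patients;
    declare success if > r responses among n total patients."""
    def pmf(x, m, q):
        return comb(m, x) * q**x * (1 - q)**(m - x)
    pet = sum(pmf(x, n1, p) for x in range(r1 + 1))   # prob. of early termination
    reject = sum(pmf(x1, n1, p) * pmf(x2, n - n1, p)
                 for x1 in range(r1 + 1, n1 + 1)
                 for x2 in range(n - n1 + 1)
                 if x1 + x2 > r)
    en = n1 + (1 - pet) * (n - n1)                    # expected sample size
    return pet, en, reject

# Optimal design above, evaluated under H0 (p = 0.20) and H1 (p = 0.40)
pet0, en0, alpha = simon_oc(17, 3, 37, 10, 0.20)
_, _, power = simon_oc(17, 3, 37, 10, 0.40)
print(round(pet0, 3), round(en0, 1))  # 0.549 26.0
```

Under H0 the optimal design stops early about 55% of the time, giving E(N) ≈ 26, with α ≤ 0.10 and power ≥ 0.90 as required.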
Simon Optimal vs. Likelihood (N=37)
[Figure: probability of Accept HA, Accept H0, and Weak Evidence vs. true p (0.1–0.6), for the likelihood design and the Simon optimal design]
25
Simon Minimax vs. Likelihood (N=36)
[Figure: probability of Accept HA, Accept H0, and Weak Evidence vs. true p (0.1–0.6), for the likelihood design and the Simon minimax design]
26
Probability of Early Stopping
[Figure: probability of stopping early vs. true p (0.1–0.6), for Likelihood (optimal N), Likelihood (minmax N), Simon Optimal, and Simon MinMax]
27
Expected Sample Size
[Figure: expected sample size (10–35) vs. true p (0.2–0.8), for Likelihood (optimal N), Likelihood (minmax N), Simon Optimal, and Simon MinMax]
28
Another scenario
Lower chance of success
H0: p = 0.05 vs. H1: p = 0.20
Now, only 3 criteria for stopping:
0 out of 14
1 out of 23
2 out of 32
29
Simon Designs
H0: p = 0.05 vs. H1: p = 0.20. Power ≥ 90% and alpha ≤ 0.10.
Optimal Design:
Stage 1: N1 = 12, r2=0
Stage 2: N = 37, r=3
Enroll 12 in stage 1. Stop if 0 responses.
If at least one response, enroll to a total N of 37.
Reject H0 if more than 3 responses observed in 37 patients
Minimax Design:
Stage 1: N1 = 18, r2=0
Stage 2: N = 32, r=3
Enroll 18 in stage 1. Stop if 0 responses.
If at least one response, enroll to a total N of 32.
Reject H0 if more than 3 responses observed in 32 patients
30
Simon Optimal vs. Likelihood (N=37)
[Figure: probability of Accept HA, Accept H0, and Weak Evidence vs. true p (0.0–0.4), for the likelihood design and the Simon optimal design]
31
Simon Minimax vs. Likelihood (N=32)
[Figure: probability of Accept HA, Accept H0, and Weak Evidence vs. true p (0.0–0.4), for the likelihood design and the Simon minimax design]
32
Probability of Early Stopping
[Figure: probability of stopping early vs. true p (0.00–0.30), for Likelihood (optimal N), Likelihood (minmax N), Simon Optimal, and Simon MinMax]
33
Expected Sample Size
[Figure: expected sample size (15–35) vs. true p (0.0–0.6), for Likelihood (optimal N), Likelihood (minmax N), Simon Optimal, and Simon MinMax]
34
Last scenario
Higher chance of success
H0: p = 0.40 vs. H1: p = 0.60
Now, 21 criteria for stopping:
0 out of 6
1 out of 8
2 out of 10
3 out of 12
4 out of 14
5 out of 16
6 out of 18
...
20 out of 46
35
Simon Designs
H0: p = 0.40 vs. H1: p = 0.60. Power ≥ 90% and alpha ≤ 0.10.
Optimal Design:
Stage 1: N1 = 18, r2=7
Stage 2: N = 46, r=22
Enroll 18 in stage 1. Stop if 7 or fewer responses.
If more than 7 responses, enroll to a total N of 46.
Reject H0 if more than 22 responses observed in 46 patients
Minimax Design:
Stage 1: N1 = 28, r2=11
Stage 2: N = 41, r=20
Enroll 28 in stage 1. Stop if 11 or fewer responses.
If more than 11 responses, enroll to a total N of 41.
Reject H0 if more than 20 responses observed in 41 patients
36
Simon Optimal vs. Likelihood (N=46)
[Figure: probability of Accept HA, Accept H0, and Weak Evidence vs. true p (0.2–0.8), for the likelihood design and the Simon optimal design]
37
Simon Minimax vs. Likelihood (N=41)
[Figure: probability of Accept HA, Accept H0, and Weak Evidence vs. true p (0.2–0.8), for the likelihood design and the Simon minimax design]
38
Probability of Early Stopping
[Figure: probability of stopping early vs. true p (0.2–0.8), for Likelihood (optimal N), Likelihood (minmax N), Simon Optimal, and Simon MinMax]
39
Expected Sample Size
[Figure: expected sample size (10–40) vs. true p (0.2–0.8), for Likelihood (optimal N), Likelihood (minmax N), Simon Optimal, and Simon MinMax]
40
More on the predictive probability approach
Lee and Liu, Clinical Trials, 2008.
Bayesian but without loss function (no Bayes
risk)
Searches for design parameters to ensure size
and power
Prior is chosen so that its mean equals the null response rate, but the prior is weak.
Predictive probability (PP) = probability that the
end result of the trial is positive given current
data and data to be observed
Based on the probability that the true response rate is
greater than the null response rate.
Again, ignores the alternative
41
More on the predictive probability approach
At pre-specified times, the predictive probability is calculated
Stopping:
if PP < θL: stop trial and reject the alternative
if PP > θU: stop trial and reject the null (but often θU = 1)
Lee and Liu explore different frequencies of looks
Comparisons here are for looking after every patient
θT is defined as the threshold for determining
efficacy at the trial’s end
θT and θU do not have the same stringency
42
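A rough sketch of the PP calculation (not Lee and Liu's design search or calibration; the prior Beta(0.2, 0.8), θT = 0.90, and maximum N = 36 are illustrative assumptions of mine):

```python
import math

def beta_tail(a, b, p0, steps=2000):
    """P(p > p0) for p ~ Beta(a, b), via Simpson's rule (avoids scipy)."""
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    def pdf(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0
        return math.exp(logc + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))
    h = (1.0 - p0) / steps
    total = pdf(p0) + pdf(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(p0 + i * h)
    return total * h / 3.0

def predictive_probability(x, n, n_max=36, p0=0.20, a0=0.2, b0=0.8, theta_t=0.90):
    """P(final result is 'positive' | x responses in n patients so far),
    where 'positive' means P(p > p0 | all n_max observations) > theta_t.
    The Beta(0.2, 0.8) prior has mean p0 but little weight; all parameter
    values here are illustrative, not Lee and Liu's calibrated choices."""
    a, b = a0 + x, b0 + (n - x)
    m = n_max - n                          # patients yet to be observed
    log_beta_ab = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    pp = 0.0
    for y in range(m + 1):                 # beta-binomial predictive for future responses
        log_py = (math.log(math.comb(m, y))
                  + math.lgamma(a + y) + math.lgamma(b + m - y)
                  - math.lgamma(a + b + m) - log_beta_ab)
        if beta_tail(a + y, b + m - y, p0) > theta_t:
            pp += math.exp(log_py)
    return pp

pp_low = predictive_probability(3, 17)     # few responses so far
pp_high = predictive_probability(8, 17)    # many responses so far
print(round(pp_low, 2), round(pp_high, 2))
```

As expected, PP rises with the interim response count; comparing PP against θL at each look gives the futility rule described above.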
Example of Predictive Probability Design
43
Comparison with Predictive Probability:
Minimax Sample Size
44
Comparison with Predictive Probability:
Optimal Sample Size
45
Summary and Conclusions (1)
Likelihood based stopping provides another option for trial design in phase II single arm studies
We only considered one value of k
chosen to be comparable to the frequentist approach
other values will lead to more/less conservative results
extension: different k for early stopping versus the final go/no-go decision
Overall, sample size is smaller
especially marked when you want to stop for futility
when early stopping is not expected, not much difference in sample size
For ‘ambiguous’ cases:
the likelihood approach stops early more often than Simon
in minimax designs, it finds ‘weak’ evidence frequently
46
Summary and Conclusions (2)
‘r’ for final analysis is generally smaller.
Why? Because we compare hypotheses instead of conditioning only on the null.
Comparison to the PP approach is favorable
likelihood stopping is less time consuming and less computationally intensive
likelihood stopping does not require specification of a prior
the “search” for designs is relatively simple
47
Thank you for your attention!
[email protected]
48
Early Stopping in Phase II studies:
time-to-event outcomes
Disease stabilization
More common with novel treatments,
targeted therapies
Example: targeting stem cells
If treatment works, cancer does not
progress
But, “bulk” may still remain
Time-to-progression is relevant outcome
But, takes a long time to evaluate…
49
One suggested/common approach
Apply Simon’s two-stage design
Example:
1 year PFS of 0.30 versus 0.50 (α = β = 0.10)
Enroll 20 patients
If 6 or more are PF at 1 year, enroll an additional 22 for
a total of 42 patients.
Study design
Assume trial will take 2 years to accrue (21 patients per
year)
First 20 patients will be enrolled by end of year 1
20th patient should be evaluable for 1 year PFS at end of
year 2.
50
One suggested/common approach
So, what’s the problem?
Problem 1: By the end of year 2, almost all of the
additional 22 patients will have been enrolled, yet the
stage 1 patients have just become evaluable.
Problem 2: if the trial needs to be suspended after 20
patients (to wait for events), investigators may need to
stop enrollment for 1 year.
51
Current approaches for early stopping
with TTE outcomes
Bayesian approaches (Thall et al., 2005)
Frequentist approaches (Case and Morgan, 2003)
Ad hoc approaches
Use related outcome (e.g., clinical response)
Spend a little alpha early and evaluate:
At a prespecified time
When a prespecified number of patients have
reached a landmark time (e.g. 1 year)
When a prespecified number of patients have been
enrolled
52
Early stopping in phase II study:
time-to-event outcome
Motivating Example
Standard frequentist approach
Single arm
Time-to-event outcome
Early stopping for futility
Non-parametric (i.e., no model)
Kaplan-Meier estimate of 6 mo. PFS
“Robust”, but not powerful!
Likelihood approach
Requires a parametric model (like the
Bayesians!)
53
Model Choice Considerations
Trade-off: one-parameter vs. multi-parameter models
Parsimony versus fit
Bias versus variance
Small amount of data: cannot tolerate many
parameters
Exponential (one-parameter) obvious choice
Some other options:
Weibull
Log-normal
Cure-rate
54
Critical Issue
Decision to stop must be robust to model
misspecification
“Robustifying” likelihood (Royall and Tsou, 2003)
Not appropriate here: exponential with
censoring does not meet criteria
Further study needed to see early
stopping behavior when model is
misspecified
55
Exponential Model and Likelihood
Probability density: f(t | λ) = λ e^(−λ t)
Survival function: S(t) = e^(−λ t)
Log-likelihood function: ℓ(λ; t, d) = (Σi di) log λ − λ Σi ti
Maximum likelihood estimate: λ̂ = (Σi di) / (Σi ti)
(ti = observed time, di = event indicator for patient i, i = 1, …, N)
56
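The formulas above translate directly to code (toy data below are my own, not from the talk):

```python
import math

def exp_loglik(lam, times, events):
    """Exponential log-likelihood with right censoring:
    l(lambda) = (sum of d_i) * log(lambda) - lambda * (sum of t_i)."""
    return sum(events) * math.log(lam) - lam * sum(times)

def exp_mle(times, events):
    """MLE: total events divided by total follow-up time."""
    return sum(events) / sum(times)

# Toy data (months): three events, two administratively censored patients
times = [4.0, 10.0, 13.0, 6.0, 12.0]
events = [1, 1, 1, 0, 0]
lam_hat = exp_mle(times, events)
print(round(lam_hat, 3))  # 3 events / 45 months -> 0.067
```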
Simulations
Need comparability across distributions of
simulated data
Chose underlying distributions with same 6
month survival
Exponential
Weibull: one with larger variance, one with smaller
Log-normal: one with larger variance, one with smaller
Cure-rate
Working model: exponential distribution
Simulations: data generated assuming treatment
lacks efficacy
57
Comparison of underlying distributions
[Figure: six panels of survival curves S(t) vs. time in months (0–35): Exponential (0.08); Weibull (1/10, 1.43); LogNormal (2, 0.69); Cure Rate (0.13, 30%); Weibull (1/17, 0.7); LogNormal (2.4, 2.0). Black: true distribution; red: best exponential approximation]
58
Simulation study 1
Null hypothesis is true: should stop early for futility in large
fraction of trials
Three ways to characterize hypotheses:
H0: 6 mo PFS = 62% vs. H1: 6 mo PFS = 74%
H0: E(t) = 12.5 mo vs. H1: E(t) = 20 mo
H0: λ = 0.08 vs. H1: λ = 0.05
N = 100
Starting with 25th patient, analyze data every 5th enrollment
Censoring is assumed to be administrative
24 months of enrollment (assuming no early stopping)
Total study time 36 months (24 month accrual, 12 month
F.U.)
Use likelihood intervals of 1/10
59
Stopping Rule
Stop if the likelihood ratio (alternative vs. null) < 1/10
That is, if the ratio of the likelihood for the NULL to the ALTERNATIVE is at least 10, then stop.
Note 1: ONLY considering stopping for futility!
Note 2: based on universal bound, we have a
less than 10% chance of strong evidence in favor
of the wrong hypothesis
Note 3: based on Royall (2000), probably have
even less than that….
60
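The rule can be sketched in code (the exponential log-likelihood is restated so the snippet is self-contained; the toy interim data are my own, chosen to imitate a trial observing events faster than the null rate):

```python
import math

def exp_loglik(lam, times, events):
    """Exponential log-likelihood with right censoring."""
    return sum(events) * math.log(lam) - lam * sum(times)

def stop_for_futility(times, events, lam0=0.08, lam1=0.05, k=10):
    """Stop if L(H1)/L(H0) < 1/k, i.e. the null is favored by a factor >= k."""
    loglr = exp_loglik(lam1, times, events) - exp_loglik(lam0, times, events)
    return loglr < -math.log(k)

# Toy interim data consistent with the NULL rate or worse (many early events)
times = [5, 3, 9, 2, 7, 4, 10, 6, 8, 3] * 4   # 40 patients, months of follow-up
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] * 4   # 32 events
print(stop_for_futility(times, events))  # True
```

Here the observed event rate (32 events over 228 patient-months ≈ 0.14) is worse than the null λ = 0.08, so the rule stops for futility; with no events the same call returns False and the trial continues.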
Simulated Data Examples
[Figure: likelihood ratio (log scale, 0.1–10) vs. number of patients enrolled (25–95) for two simulated trials: one stops at N=55, the other never crosses the stopping boundary]
61
Frequentist Properties of Simulation Study
N=100, H0: λ = 0.08 vs. H1: λ = 0.05
Using exponential test and assuming exponential data:
Alpha = 5%
Power = 98%
Using non-parametric test and assuming exponential data:
Alpha = 5%
Power = 78%
No interim analyses included
62
Why not look before 25 patients?
                 End of month:   1    2    3    4    5    6
Total enrolled                   4    8   12   16   21   25
≥ 1 month f.u.                   0    4    8   12   16   21
≥ 2 months f.u.                  0    0    4    8   12   16
≥ 3 months f.u.                  0    0    0    4    8   12
≥ 4 months f.u.                  0    0    0    0    4    8
≥ 5 months f.u.                  0    0    0    0    0    4
≥ 6 months f.u.                  0    0    0    0    0    0
63
Simulations
Blue: 12 month estimate; Red: 60 month estimate
[Figure: six panels of survival curves S(t) vs. time in months (0–35): Exponential (0.08); Weibull (1/10, 1.43); LogNormal (2, 0.69); Cure Rate (0.13, 30%); Weibull (1/17, 0.7); LogNormal (2.4, 2.0). Solid black: true distribution; dashed: hypotheses]
64
Early Stopping
[Figure: histograms of total sample size at stopping (25–100) under each simulated distribution (Exponential, Weibull 1, Log-Normal 1, Cure Rate 1, Weibull 2, Log-Normal 2). Annotated panel summaries, as extracted: Median N = 60, 87% stopped; Median N = 60, 99% stopped; Median N = 35, 97% stopped; Median N = 55, 62% stopped; Median N = 100, 4% stopped; Median N = 85, 64% stopped]
65
Likelihood ratios:
               <1/32   [1/32,1/10)   [1/10,1)   [1,10)   [10,32)   >32
Exponential*    0.20      0.76         0.03      0.01     <0.01    <0.01
Weibull 1       0.47      0.53        <0.01     <0.01     <0.01    <0.01
Log-Normal 1    0.27      0.73        <0.01     <0.01     <0.01    <0.01
Cure Rate      <0.01      0.04        <0.01     <0.01     <0.01     0.96
Weibull 2       0.18      0.80         0.01      0.01     <0.01     0.01
Log-Normal 2    0.06      0.55        <0.01     <0.01      0.01     0.37
66
Frequentist Approach: Exponential Data
Based on observed data (stopped and
completed trials)
0.55% of trials showed significant p-value
(versus 0.45% with LR>10)
Agreement of 99.6% for hypothesis testing
decision
High agreement in inferences
67
Additional simulations
Early stopping is critical when we have a
rate that is even WORSE than the null
Example:
We are testing 62% vs. 74% 6 month PFS
What if true 6 month PFS based on our
regimen is only 55%? Or 49%?
What is the chance of early stopping in these
cases?
Simple scenario: exponential data,
exponential model
68
Early Stopping:
H0: 6 mo PFS = 62% vs. H1: 6 mo PFS = 74%
[Figure: histograms of total sample size at stopping (25–100). 6 mo PFS = 55%: Median N = 40, 99.8% stopped; 6 mo PFS = 49%: Median N = 35, 99.9% stopped]
69
Likelihood Ratios
               <1/32   [1/32,1/10)   [1/10,1)   [1,10)   [10,32)   >32
55% 6 mo PFS    0.19      0.81        <0.01     <0.01     <0.01    <0.01
49% 6 mo PFS    0.26      0.74        <0.01     <0.01     <0.01    <0.01
70
Summary and Conclusions (2)
Properties are consistent with what we expected
When we have exponential data and k=10:
We stop early OFTEN when we should
We RARELY stop early when we shouldn’t
But, we need to be careful…
We need a full understanding of the expected and
observed survival distribution
If we have model misspecification, we could run into
trouble
Not unrealistic: in breast cancer, a cure-rate model might be best-fitting
Quantifying hypotheses by a single point in time (e.g., 6 month PFS) could be dangerous
Should elicit PFS estimates at several time points from the clinical investigator
71
Summary and Conclusions (3)
This is the perfect example of why we need
to work in close collaboration with
oncologists
Need to get a good appreciation for the
anticipated distribution
Early stopping should be carefully considered
based on observed data
Implementation issues
Probably will not be able to do this in an “off-the-shelf” way
High-maintenance for the statistician
Better for patients
Better for Cancer Center (resources)
72
Future work in TTE
Feasibility of 2-parameter models
In practice, can we fix one parameter?
Preliminary data should give us a sense of the shape
Interval censoring
Different censoring mechanisms
Larger deviations from exponential (how
common?)
Looks: when to start and how often?
Study design guidelines (e.g. sample size)
73
References
Case and Morgan (2003) Design of Phase II cancer trials evaluating survival
probabilities. BMC Medical Research Methodology; v. 3.
Birnbaum (1962) On the Foundations of Statistical Inference (with discussion). JASA,
53, 259-326.
Blume (2002) Likelihood Methods for Measuring Statistical Evidence, Statistics in
Medicine, 21, 2563-2599.
Hacking (1965) Logic of Statistical Inference, New York: Cambridge Univ Press.
Royall (1997) Statistical Evidence: A Likelihood Paradigm, London, Chapman & Hall.
Royall (2000) On the Probability of Misleading Statistical Evidence, JASA, 95, 760-768.
Royall and Tsou (2003) Interpreting statistical evidence by using imperfect models:
robust adjusted likelihood functions. JRSS-B; 65(2), 391-404.
Simon (1989) Optimal Two-Stage Designs for Phase II Clinical Trials. Controlled
Clinical Trials; 10,1-10.
Smith (1953) The Detection of Linkage in Human Genetics. JRSS-B, 15, 153-192.
Thall, Wooten and Tannir (2005) Monitoring event times in early phase clinical
trials: some practical issues. Clinical Trials; v. 2, 467-478.
74