Transcript LECTURE 23

Interim Analyses of
Clinical Trials
A Requirement
Outline
• Background: how DSMBs arose and how they
function
• Group sequential methods
• Examples
References
• Ellenberg SS, Fleming TR, DeMets DL. Data Monitoring
Committees in Clinical Trials. Wiley, 2002.
• DeMets DL, Furberg CD, Friedman LM. Data
Monitoring in Clinical Trials: A Case Studies
Approach. Springer, 2006.
• Jennison C, Turnbull BW. Group Sequential
Methods with Applications to Clinical Trials. Chapman
and Hall, 2000.
• Proschan MA, Lan KKG, Wittes J. Statistical Monitoring
of Clinical Trials: A Unified Approach. Springer, 2006.
• http://www.biostat.wisc.edu/landemets
Structure for Cooperative Studies
(Greenberg Report)
[Figure: organizational chart with boxes for the National Advisory Heart Council, Institute staff, Initial review group, Policy Board or Advisory Committee, Executive Committee or Steering Committee, Coordinating Center and Participating Units]
Cont Clin Trials 9:137-48, 1988.
Monitoring Committee Acronyms
• PAB = Policy advisory board
• DSMB = Data and Safety Monitoring Board
• DMC = Data Monitoring Committee
• ESMB = Efficacy and safety monitoring board
• OSMB = Observational study monitoring board
Responsibilities
• Steering/Executive Committee/Protocol Team
– Study design
– Patient recruitment and follow-up
– Data collection
– Quality assurance
– Review of external data
– Study reports
• DMC or DSMB
– Safety of patients
– Protection of the integrity of the study
– Review of blinded data on safety and efficacy of
treatments
– Review of trial conduct, amendments and external
data
DMCs are responsible to patients, investigators,
IRBs, regulatory agencies and sponsor.
Data Monitoring Rationale
• Accumulating data needs to be monitored
for risk/benefit (Safety is best assured by
comparing the rate of adverse events with
a control group)
• Reasons:
– Ethical: do not expose participants to an inferior
intervention longer than needed to test hypothesis
– Scientific: assessment of relevance of question (e.g.,
external data), design assumptions, logistical
problems.
– Economic: do not waste financial or human resources
for a futile trial.
Reasons for Early Termination
of Clinical Trials
• Based on accumulated data from the trial:
– Unequivocal evidence of treatment benefit or harm
– Unexpected, unacceptable side effects
– No emerging trends and no reasonable chance of
demonstrating benefit
• Based on overall progress of the trial:
– Failure to include enough patients at a sufficient rate
– Lack of compliance in a large number of patients
– Poor follow-up
– Poor data quality
Today
• All NIH-sponsored clinical trials are required to have a
data monitoring plan
• NIH-sponsored trials with clinical endpoints have a
DSMB
• Many industry-sponsored studies have a DSMB
• The FDA has prepared a guidance document
(Establishment and Operation of Clinical Trial Data
Monitoring Committees)
http://www.fda.gov/RegulatoryInformation/Guidances/ucm127069.htm
• There is variation in operating procedures for DSMBs
When Is an Independent
DSMB Needed?
• Early phase studies
– Monitoring usually at local level; independent DMC not
usually needed.
• Phase III & IV studies with morbidity/mortality outcomes;
pivotal phase III trials
• Frail populations, e.g., children, elderly
• Trials with substantial uncertainty about safety, e.g., gene
therapy
See FDA Guidance and ICH/E9, section 4.5.
DSMB
Composition: Multidisciplinary
• Clinical experts in the subject matter area
• Biostatisticians with expertise in clinical trials
and preferably in the subject matter area
• Others depending on the nature of the study,
e.g., ethicist, pharmacologist, patient advocate
Senior investigators without significant conflicts
of interest
Independence of DSMB:
• Voting members should not be part of the
investigative team or work for the sponsor
• There should be a clear “need to know” policy
for non-DSMB members, e.g., the statistician
preparing interim summaries needs to know and
may be an employee of the sponsor or member
of the investigative team
• Members should state potential conflicts
This view is not shared by all. See Meinert CL and
discussion, Cont Clin Trials, 1998
Typical DSMB Meeting Format
• Open Session
– Progress report using open data (no outcome data by
treatment group)
– Sponsor, e.g., NIH, Executive Committee, Protocol Chairs,
DSMB and unblinded statisticians
• Closed Session
– Outcome data by treatment group (usually coded)
– DSMB and unblinded statisticians only
• Executive Session (DSMB only)
• Debriefing Session
– DSMB, Sponsor, Executive Committee, Protocol Chairs,
and unblinded statisticians
DSMB Confidentiality
• Interim data reviewed by the DSMB must remain confidential
• Members must not share interim data with anyone outside
the DSMB
• Leaks can affect
– Patient recruitment
– Protocol compliance
– Outcome assessment
– Trial integrity and support
DMC Recommendations
• Continue the study unmodified
• Modify the study protocol
• Terminate the study
– Serious toxicity
– Clear benefit
– Futility
– Design/logistical problems
Outline
• Background and how DSMBs
function
• Group sequential methods
• Examples
DSMB Decision Making Can Be
Complex
• Internal consistency
• Benefit/Risk
• External consistency
• Current versus future patients
• Clinical and public health impact
• Statistical issues – monitoring guidelines
Overall Probability of Achieving a Result
with Given Nominal Significance of 0.05
After N Repeated Tests Under H0
No. of Tests (N)    Probability
1                   .05
2                   .083
3                   .107
4                   .126
5                   .142
10                  .193
25                  .266
Ref: McPherson, NEJM, 1974.
Value of Nominal Significance Level
Necessary to Achieve a True Level of 0.05
After N Repeated Tests
No. of Tests (N)    Significance Level Which Should Be Used
1                   .05
2                   .0296
3                   .0221
4                   .0183
5                   .0159
10                  .0107
Ref: McPherson, NEJM, 1974.
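Both McPherson tables can be checked by simulation. The sketch below is a minimal Monte Carlo illustration, assuming equally sized groups whose standardized contributions are independent N(0, 1) under H0, so that the cumulative statistic at look k is Z_k = S_k/√k. The function name `repeated_test_error`, the number of simulations and the seed are illustrative choices, not part of the lecture.

```python
import math
import random
from statistics import NormalDist


def repeated_test_error(n_looks, nominal_alpha=0.05, n_sims=50_000, seed=2024):
    """Monte Carlo estimate, under H0, of the chance that at least one of
    n_looks equally spaced two-sided tests at nominal_alpha is 'significant'.

    Assumption: each new group adds an independent N(0, 1) increment to S_k,
    so the statistic at look k is Z_k = S_k / sqrt(k).
    """
    crit = NormalDist().inv_cdf(1.0 - nominal_alpha / 2.0)  # 1.96 for 0.05
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        s = 0.0
        for k in range(1, n_looks + 1):
            s += rng.gauss(0.0, 1.0)            # data from the k-th group
            if abs(s) / math.sqrt(k) >= crit:   # nominally significant at look k
                hits += 1
                break                           # stop at the first "significant" look
    return hits / n_sims


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(repeated_test_error(n), 3))
    # Nominal level from the second table for 5 looks
    print("5 looks at .0159:", round(repeated_test_error(5, nominal_alpha=0.0159), 3))
```

With 5 looks at a nominal 0.05 the estimate should land near the .142 of the first table, and the nominal .0159 from the second table should bring it back near 0.05, up to Monte Carlo error.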
Early Work
• Acceptance sampling
• Wald (1947) sequential probability ratio test
Manufacturing problems, continuous
monitoring of the data, no upper bound on
sample size
Group Sequential Methods
• Calculate a summary statistic (e.g., Z for the
logrank test) after each additional group of
participants (or events)
• Compare the test statistic to a critical value
that preserves the overall type I error (e.g., 0.05).
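As a concrete instance of the summary statistic, here is a minimal sketch of the two-sample logrank Z, assuming right-censored data coded as (time, event indicator, group 0/1), with tied event times handled by the usual hypergeometric variance. In a group sequential setting the same function would be rerun on all data accumulated at each look. The function name and the sign convention (positive Z means more events than expected in group 1) are assumptions; software and trial reports differ on the sign.

```python
import math


def logrank_z(times, events, groups):
    """Two-sample logrank Z statistic, group 1 versus group 0.

    times  : follow-up time for each participant
    events : 1 if the event occurred at that time, 0 if censored
    groups : 1 or 0 (e.g., treatment indicator)
    """
    o_minus_e = 0.0   # sum over event times of observed - expected events in group 1
    variance = 0.0    # hypergeometric variance summed over event times
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:                 # still at risk just before time t
                n += 1
                n1 += gi
                if ti == t and ei:      # event observed at time t
                    d += 1
                    d1 += gi
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / math.sqrt(variance)


if __name__ == "__main__":
    # Tiny hypothetical data set: group 1 has events at times 1 and 3, group 0 at 2 and 4
    print(round(logrank_z([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0]), 3))
```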
Critical Values (z) for 2-sided Group
Sequential Design with .05 Overall
Significance and 7 Looks
Interim Analysis   Pocock   O'Brien-Fleming   Haybittle/Peto
1                  2.49     5.46              3.0
2                  2.49     3.85              3.0
3                  2.49     3.15              3.0
4                  2.49     2.73              3.0
5                  2.49     2.44              3.0
6                  2.49     2.23              3.0
7                  2.49     2.06              1.96 (2.00)
Critical Values (Z) and Nominal Two-sided P-values
No. of   Look   Pocock          O'Brien-Fleming    Peto
Looks           Z       P       Z       P          Z       P
2        1      2.178   .029    2.797   .005       3.290   .001
         2      2.178   .029    1.977   .048       1.962   .050
3        1      2.289   .022    3.471   .0005      3.290   .001
         2      2.289   .022    2.454   .014       3.290   .001
         3      2.289   .022    2.004   .045       1.964   .050
4        1      2.361   .018    4.049   .0001      3.290   .001
         2      2.361   .018    2.863   .004       3.290   .001
         3      2.361   .018    2.338   .019       3.290   .001
         4      2.361   .018    2.024   .043       1.967   .049
5        1      2.413   .016    4.562   .00001     3.290   .001
         2      2.413   .016    3.226   .0013      3.290   .001
         3      2.413   .016    2.634   .008       3.290   .001
         4      2.413   .016    2.281   .023       3.290   .001
         5      2.413   .016    2.040   .041       1.967   .049
Choosing Critical Values
• Choose the values $c_1, \ldots, c_K$ so that:
$$\Pr\left(|Z_1| < c_1, \ldots, |Z_K| < c_K;\ \theta = 0\right) = 1 - \alpha$$
or
$$\Pr\left(|Z_1| \ge c_1 \text{ or } |Z_2| \ge c_2 \text{ or } \ldots \text{ or } |Z_K| \ge c_K;\ \theta = 0\right) = \alpha$$
where $S_k$ is the cumulative sum of the $k$ standardized group statistics and $Z_k = S_k/\sqrt{k}$.
• Pocock (1977)
– Use the same boundary value at each look
– Reject $H_0$ the first time that $|Z_k| \ge c_P$, or equivalently $|S_k| \ge c_P\sqrt{k}$
• O'Brien and Fleming (1979)
– Use larger boundary values at earlier looks
– It is hard to reject $H_0$ early in the study
– The final test is similar to a fixed sample test
– Reject $H_0$ the first time that $|Z_k| \ge c_B\sqrt{K/k}$, or equivalently $|S_k| \ge c_B\sqrt{K}$
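Once the constant is known, the two boundary shapes are easy to tabulate. This sketch does not solve for the constants; it assumes they are read off the 7-look table (c_P = 2.49, and c_B approximated by the final O'Brien-Fleming value 2.06), and the function names are hypothetical.

```python
import math


def pocock_bounds(c_p, n_looks):
    """Pocock: the same critical value c_P at every look."""
    return [c_p] * n_looks


def obf_bounds(c_b, n_looks):
    """O'Brien-Fleming: c_B * sqrt(K / k) at looks k = 1..K."""
    return [c_b * math.sqrt(n_looks / k) for k in range(1, n_looks + 1)]


def haybittle_peto_bounds(n_looks, interim=3.0, final=1.96):
    """Haybittle-Peto: a fixed large value at interim looks, near-nominal at the end."""
    return [interim] * (n_looks - 1) + [final]


if __name__ == "__main__":
    K = 7
    rows = zip(pocock_bounds(2.49, K), obf_bounds(2.06, K), haybittle_peto_bounds(K))
    for k, (p, o, h) in enumerate(rows, start=1):
        print(f"{k}  {p:.2f}  {o:.2f}  {h:.2f}")
    # On the S_k scale the O'Brien-Fleming boundary is flat: c_B * sqrt(K) at every look
    print([round(b * math.sqrt(k), 3) for k, b in enumerate(obf_bounds(2.06, K), start=1)])
```

With c_B = 2.06 the O'Brien-Fleming values agree with the tabled ones to within about 0.01, the small differences reflecting rounding of c_B; the last line checks that the boundary is constant on the $S_k$ scale.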
General Approach
• Compute sample size as if a single look (fixed sample
approach)
• Specify number of interim analyses and stopping boundary
(usually OBF).
• Inflate the sample size to preserve the assumed power using
tabulated inflation constants (not always done, as the adjustment is minor).
• Compute the standardized statistic Zk at each analysis and
compare with critical values corresponding to monitoring
boundary chosen.
• At the end or upon early termination determine P-values and
confidence intervals in the usual manner.
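The first and third steps can be sketched for an event-driven trial. The slide does not say which fixed-sample formula is used; this sketch assumes Schoenfeld's approximation for the number of events needed by a two-sided logrank test, and the inflation factor R = 1.02 in the usage is a placeholder, not a value taken from any published table. Function names and the design values (HR 0.7, 90% power) are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_events(hazard_ratio, alpha=0.05, power=0.90, p_treat=0.5):
    """Events needed for a single-look, two-sided logrank test
    (Schoenfeld's approximation; p_treat = fraction randomized to treatment)."""
    z = NormalDist().inv_cdf
    numerator = (z(1.0 - alpha / 2.0) + z(power)) ** 2
    return numerator / (p_treat * (1.0 - p_treat) * math.log(hazard_ratio) ** 2)


def group_sequential_events(hazard_ratio, inflation, **kwargs):
    """Inflate the fixed-sample requirement by a factor R >= 1 and round up."""
    return math.ceil(inflation * fixed_sample_events(hazard_ratio, **kwargs))


if __name__ == "__main__":
    d_fixed = fixed_sample_events(0.7)            # hypothetical design: HR = 0.7, 90% power
    print("single look:", math.ceil(d_fixed))
    print("with R = 1.02 (placeholder):", group_sequential_events(0.7, 1.02))
```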
Problems with Initial Approach
• Difficult to specify number of analyses in
advance
• Logistically difficult to organize reviews after
equal increments of information.
Solutions: Slud and Wei and Lan-DeMets
Flexible Approaches
• Slud and Wei (JASA, 1982) – specify exit probabilities
for each look (stage) such that they sum to α; the
probability of exiting at the kth stage is the joint probability of not
exiting at the first k-1 stages and exiting at the kth one.
• Lan-DeMets (Biometrika, 1983) – specify a use function,
or type I error spending function: at time zero, α
used = 0, and with full information, α used = 0.05 (or the
nominal level)
Spending Function α(t)
[Figure: Alpha (0 to .05) plotted against Information Fraction (0 to 1), with α(t1) and α(t2) marked at t1 and t2]
α spending function plotted over the fraction of total
information to be obtained in the study, evaluated at two
arbitrary points, t1 and t2, in the study
$$t = \frac{\text{number of events observed at monitoring}}{\text{total number of anticipated events}}$$
Cont Clin Trial 2000;21:190-207
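Two commonly used Lan-DeMets spending functions have closed forms. This sketch assumes the one-sided 0.025 versions shown in the spending-function plots, α1(t) = 2 − 2Φ(z_{1−α/2}/√t) (O'Brien-Fleming type) and α2(t) = α ln(1 + (e − 1)t) (Pocock type), and evaluates them at the event-based information fractions from BHAT (629-death target). Function names are illustrative; for a symmetric two-sided 0.05 design each tail would use one such function.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def obf_type(t, alpha=0.025):
    """O'Brien-Fleming-type spending: 2 - 2*Phi(z_{1-alpha/2} / sqrt(t))."""
    if t <= 0.0:
        return 0.0
    return 2.0 - 2.0 * _N.cdf(_N.inv_cdf(1.0 - alpha / 2.0) / math.sqrt(t))


def pocock_type(t, alpha=0.025):
    """Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def information_fraction(events_observed, events_planned):
    """Event-based information fraction t."""
    return events_observed / events_planned


if __name__ == "__main__":
    for deaths in [56, 77, 126, 177, 247, 318]:   # BHAT cumulative deaths, looks 1-6
        t = information_fraction(deaths, 629)      # 629 = BHAT event target
        print(f"t = {t:.2f}  OBF-type: {obf_type(t):.2e}  Pocock-type: {pocock_type(t):.4f}")
```

The O'Brien-Fleming-type function spends almost nothing early (about 0.0015 of the 0.025 by t = 0.5), while the Pocock-type function has already spent more than half of it.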
Plots of Pocock-type and O'Brien-Fleming-type spending
functions for a one-sided 0.025 significance level,
for four analyses at 25%, 50%, 75% and 100% of the
expected information.
Spending Functions
[Figure: Alpha (0 to 0.025) plotted against Information Fraction (0 to 1) for the Pocock-type and OBF-type spending functions]
Approximate O'Brien-Fleming Boundaries Using the Lan-DeMets Spending Function Approach: Overall
Significance = 0.05 and 4 Looks
Interim Analysis   O'Brien-Fleming   OBF Lan-DeMets
1                  4.05              4.33
2                  2.86              2.96
3                  2.34              2.36
4                  2.02              2.01
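The Lan-DeMets column can be reproduced numerically. The sketch below is an illustration under stated assumptions, not any published monitoring program: it spends the two-sided α = 0.05 symmetrically (an O'Brien-Fleming-type function spending 0.025 in each tail), treats the statistic as Brownian motion B(t) = Z(t)√t in information time, carries the density of B(t) over the continuation region forward by midpoint-rule integration, and picks each boundary by bisection so that the probability of first crossing at that look equals the α newly spent. The grid size, the 8.0 cap (mirroring the 8.00 entries in the BHAT tables) and all names are illustrative choices.

```python
import math
from statistics import NormalDist

_N = NormalDist()
_SQRT_2PI = math.sqrt(2.0 * math.pi)


def obf_type_total(t, alpha=0.05):
    """Two-sided alpha spent by fraction t: 2 - 2*Phi(z_{1-alpha/4}/sqrt(t)) per tail."""
    z = _N.inv_cdf(1.0 - alpha / 4.0)
    return 2.0 * (2.0 - 2.0 * _N.cdf(z / math.sqrt(t)))


def _cross_prob(c, t, t_prev, grid, dens, h):
    """P(|Z(t)| >= c and no earlier crossing); grid/dens/h hold the sub-density
    of B(t_prev) on the previous continuation region."""
    if grid is None:                            # first look: Z(t) ~ N(0, 1)
        return 2.0 * _N.cdf(-c)
    b, sd = c * math.sqrt(t), math.sqrt(t - t_prev)
    return sum(f * h * (_N.cdf((-b - x) / sd) + _N.cdf((x - b) / sd))
               for x, f in zip(grid, dens))


def ld_boundaries(fractions, spend=obf_type_total, n_grid=400, z_max=8.0):
    """Symmetric two-sided boundaries from a spending function."""
    bounds, spent, t_prev = [], 0.0, 0.0
    grid = dens = h = None
    for t in fractions:
        target = spend(t) - spent               # alpha to spend at this look
        if _cross_prob(z_max, t, t_prev, grid, dens, h) >= target:
            c = z_max                           # (almost) nothing to spend: cap the boundary
        else:
            lo, hi = 0.0, z_max                 # crossing probability falls as c rises
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if _cross_prob(mid, t, t_prev, grid, dens, h) > target:
                    lo = mid
                else:
                    hi = mid
            c = 0.5 * (lo + hi)
        spent += _cross_prob(c, t, t_prev, grid, dens, h)  # alpha actually used so far
        bounds.append(c)
        # Sub-density of B(t) on the new continuation region (-b, b)
        b = c * math.sqrt(t)
        new_h = 2.0 * b / n_grid
        new_grid = [-b + (i + 0.5) * new_h for i in range(n_grid)]
        if grid is None:
            s = math.sqrt(t)
            new_dens = [math.exp(-0.5 * (y / s) ** 2) / (s * _SQRT_2PI) for y in new_grid]
        else:
            sd = math.sqrt(t - t_prev)
            new_dens = [sum(f * h * math.exp(-0.5 * ((y - x) / sd) ** 2)
                            for x, f in zip(grid, dens)) / (sd * _SQRT_2PI)
                        for y in new_grid]
        grid, dens, h, t_prev = new_grid, new_dens, new_h, t
    return bounds


if __name__ == "__main__":
    print([round(c, 2) for c in ld_boundaries([0.25, 0.5, 0.75, 1.0])])
```

Because only the fractions observed so far enter the calculation, the same function can be called with unequally spaced or additional looks, which is the flexibility discussed for BHAT.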
Usual Choices for Information
• Planned number of events in event-driven trial
with common closing date chosen to achieve
event target.
• Follow-up time, e.g., percent of participants
attending final follow-up visit in trial with fixed
follow-up for each participant.
• Calendar time, e.g., trial with common calendar
closing date (e.g., to ensure some minimum
follow-up for each participant) but not event-driven.
Beta-Blocker Heart Attack Trial (BHAT)
• Placebo-controlled trial of propranolol in patients with
a recent MI
• Recruitment began in June 1978; planned termination
June 1982; average of 3 years of follow-up and
maximum of 4
• Primary endpoint – all-cause mortality
• Event target - 629 deaths
• Stopped early in October 1981
JAMA 1982; 247:1707-1714.
Interim Monitoring of BHAT Study
Look     Monitoring   Months Since   Cumulative   Logrank
Number   Date         Start          Deaths       Statistic
1        May 1979     11 (.23)       56 (.09)     1.68
2        Oct 1979     16 (.33)       77 (.12)     2.24
3        Mar 1980     21 (.44)       126 (.20)    2.37
4        Oct 1980     28 (.58)       177 (.28)    2.30
5        Apr 1981     34 (.71)       247 (.39)    2.34
6        Oct 1981     40 (.83)       318 (.51)    2.82
(Fractions in parentheses: months as a proportion of the planned 48 months; deaths as a proportion of the 629-death target.)
Critical Values (z) for 2-sided Group Sequential
Design with .05 Overall Significance and 7 Looks
(BHAT)
Interim    OBF     Lan-DeMets (OBF)
Analysis           Events   Calendar
1          5.46    8.00     4.53
2          3.85    8.00     3.73
3          3.15    4.86     3.20
4          2.73    4.08     2.75
5          2.44    3.41     2.47
6          2.23    2.95     2.28
7          2.06    1.97     2.05
Logrank Z = 2.82 at analysis 6
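The decision rule at each look is simply "stop at the first look where |Z| reaches the boundary". The sketch below applies it to the BHAT logrank statistics and the three boundary columns in the table above; the numbers are copied from the slides, and the helper name is hypothetical.

```python
def first_crossing(z_values, bounds):
    """Return the first look (1-based) at which |Z| reaches the boundary, or None."""
    for k, (z, c) in enumerate(zip(z_values, bounds), start=1):
        if abs(z) >= c:
            return k
    return None


bhat_z = [1.68, 2.24, 2.37, 2.30, 2.34, 2.82]     # BHAT logrank statistics, looks 1-6
boundaries = {
    "OBF": [5.46, 3.85, 3.15, 2.73, 2.44, 2.23, 2.06],
    "Lan-DeMets (OBF), events": [8.00, 8.00, 4.86, 4.08, 3.41, 2.95, 1.97],
    "Lan-DeMets (OBF), calendar": [4.53, 3.73, 3.20, 2.75, 2.47, 2.28, 2.05],
}

if __name__ == "__main__":
    for name, b in boundaries.items():
        print(f"{name}: first crossing at look {first_crossing(bhat_z, b)}")
```

On these numbers the OBF and calendar-time Lan-DeMets boundaries are crossed at look 6, while the event-based Lan-DeMets boundary at look 6 (2.95) is not.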
Flexible Number of Looks
• Another advantage of the Lan-DeMets spending
function approach is the flexibility with the number
of looks.
• Suppose BHAT had not been stopped and there were 3
more interim looks before the final analysis (10 looks in total).
• Looks 7-10 correspond to event-based information fractions
of 0.65, 0.75, 0.85 and 1.0.
• Boundaries for the added looks are calculated conditional
on the previous tests; the earlier boundaries are unchanged.
Critical Values (z) for 2-sided Group Sequential Design
with .05 Overall Significance: 7 versus 10 Looks
(BHAT)
Interim    Lan-DeMets (OBF)
Analysis   7 Looks   10 Looks
1          8.00      8.00
2          8.00      8.00
3          4.86      4.86
4          4.08      4.08
5          3.41      3.41
6          2.95      2.95
7          1.97      2.58
8                    2.41
9                    2.26
10                   2.06
Suppose We Get To the 6th Analysis
by A Different Route
• Information fractions at looks 1-5 are .05, .20, .30, .40 and .45
• instead of .09, .12, .20, .28 and .39
Critical Values (z) for 2-sided Group Sequential Design
with .05 Overall Significance and 7 Looks
(BHAT)
Interim    Lan-DeMets (OBF), 7 Looks
Analysis   Original Route   Different Route
1          8.00             8.00
2          8.00             4.89
3          4.86             3.93
4          4.08             3.33
5          3.41             3.19
6          2.95             2.98
Variations on the Theme
• Asymmetric boundaries (e.g., to act on a harmful effect of the
new treatment that is not yet statistically significant)
– Use the upper boundary for superiority and a less conservative
boundary for harm (Z = -1.5 or -2.0, or OBF for efficacy and
Pocock for harm)
– Appropriate for an investigational product, but probably not for a
product already approved and used as part of the standard of care
• Multiple outcomes, e.g., efficacy and safety, and
composites
• Multiple trials (CHARM heart failure, Cox-2 chemoprevention)
• Futility and curtailed sampling procedures (conditional
and unconditional power)
• Repeated confidence intervals (e.g., use OBF critical
values to compute interim CIs)
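A repeated confidence interval replaces 1.96 by the group sequential critical value for the current look. This sketch assumes a normally distributed log hazard ratio estimate; the interim estimate (HR 0.74, SE 0.12) is hypothetical, and 2.863 is the O'Brien-Fleming value for look 2 of 4 in the critical-values table.

```python
import math


def repeated_ci_hr(log_hr, se, crit):
    """Interim CI for a hazard ratio: exp(log_hr -/+ crit * se)."""
    return math.exp(log_hr - crit * se), math.exp(log_hr + crit * se)


if __name__ == "__main__":
    log_hr, se = math.log(0.74), 0.12            # hypothetical interim estimate and SE
    rci = repeated_ci_hr(log_hr, se, 2.863)      # OBF critical value, look 2 of 4
    naive = repeated_ci_hr(log_hr, se, 1.96)     # ordinary 95% CI, for comparison
    print(f"repeated CI: {rci[0]:.2f}-{rci[1]:.2f}; naive CI: {naive[0]:.2f}-{naive[1]:.2f}")
```

For these made-up numbers the ordinary 95% CI excludes 1 while the repeated CI does not, which is the extra caution the wider interval is meant to impose at an early look.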
Asymmetric Monitoring Boundary
for Harm
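[Figure: asymmetric monitoring boundaries for Z, with Benefit on one side and Harm on the other; the harm boundary is labelled Pocock, with the values 2.4 and 1.5 shown]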
SMART Study Design
CD4+ cell count > 350 cells/mm3, randomized to:
• Virologic Suppression (VS) Strategy, n = 2752
[Use of ART to maintain viral load as low as possible
throughout follow-up]
• Drug Conservation (DC) Strategy, n = 2720
[Stop or defer ART until CD4+ < 250; then episodic ART
based on CD4+ cell count to increase counts to > 350]
Plan: 910 primary endpoints; 8 years average follow-up.
Intervention interrupted on 11 January 2006.
N Engl J Med 2006.
SMART Guideline
“…it is recommended that the DSMB consider
early termination or protocol modification only
when the O’Brien-Fleming boundary is crossed
for the primary endpoint and the findings for
the primary and the composite cardiovascular,
metabolic endpoint are consistent...”
Interim Monitoring: O’Brien Fleming Boundaries
for the Primary Endpoint, by DSMB Date
Interim Monitoring: O’Brien Fleming Boundaries
for the Primary Endpoint, by Cut Date
SMART Primary and Supportive
Endpoint Results
Endpoint                          DC Group       VS Group       HR (DC/VS)        P-value
                                  N     Rate     N     Rate     [95% CI]
OD or death (primary endpoint)    122   3.4      50    1.4      2.5 [1.8, 3.5]    <0.001
CVD, renal, liver                 65    1.8      39    1.1      1.7 [1.1, 2.5]    0.009
 - CVD                            48    1.3      31    0.8      1.6 [1.0, 2.5]    0.05
 - Renal                          9     0.2      2     0.1      4.5 [1.0, 20.9]   0.05
 - Liver                          10    0.3      7     0.2      1.4 [0.6, 3.8]    0.46
OD = opportunistic disease; rates are per 100 person-years.
Futility
• Usual definition - convincing evidence exists that the
new treatment is not beneficial.
• If this is the case, minimizing exposure to an
ineffective treatment with potential toxicities, and
saving resources, should prompt consideration of
stopping the trial.
• What is convincing?
• More generally, futility can also result from a low
event rate or slow enrollment (e.g., the CVD mortality
outcome in the Physicians' Health Study).
Conditional Power (or Stochastic
Curtailment) to Assess Futility
• What is the probability of rejecting the null
hypothesis (i.e., getting a significant result),
given the data to date and my best guess
about the future? For example, the future data will:
– look like the past (current trend)
– show no difference
– follow the effect assumed in the design
Lan KKG, Wittes J, Biometrics, 1988.
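Under the Brownian-motion (B-value) formulation of Lan and Wittes, conditional power has a closed form: CP = 1 − Φ((z_{1−α/2} − Z(t)√t − θ(1 − t))/√(1 − t)), where t is the information fraction and θ the expected final Z under the assumed effect. This sketch assumes a test in the direction of benefit at the final analysis; the 90% design power and the interim result (Z = 0.5 at t = 0.5) are hypothetical, as are the function names.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def conditional_power(z_t, t, theta, alpha=0.05):
    """P(final Z >= z_{1-alpha/2} | interim Z = z_t at information fraction t),
    with B(t) = Z(t)*sqrt(t) and drift theta = expected final Z."""
    crit = _N.inv_cdf(1.0 - alpha / 2.0)
    b_t = z_t * math.sqrt(t)
    return 1.0 - _N.cdf((crit - b_t - theta * (1.0 - t)) / math.sqrt(1.0 - t))


def scenarios(z_t, t, alpha=0.05, power=0.90):
    """Conditional power under the three 'best guesses' about the future."""
    theta_design = _N.inv_cdf(1.0 - alpha / 2.0) + _N.inv_cdf(power)
    return {
        "current trend": conditional_power(z_t, t, z_t / math.sqrt(t), alpha),
        "no difference": conditional_power(z_t, t, 0.0, alpha),
        "design effect": conditional_power(z_t, t, theta_design, alpha),
    }


if __name__ == "__main__":
    for name, cp in scenarios(0.5, 0.5).items():   # hypothetical: Z = 0.5 at t = 0.5
        print(f"{name}: {cp:.3f}")
```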
Example of Curtailment from Proschan’s
Book
             Event   No Event   Total
Control      75      116        191
Treatment    75      118        193
Total        150     234        384
Planned sample size = 400
Even if all 9 remaining controls had events and all 7 treatment
group patients did not, Z=0.92. Why continue?
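The "Why continue?" arithmetic can be checked directly. The slide does not name the test; this sketch assumes the pooled two-sample Z test for proportions, with the most favorable completion adding 9 events to the control group and none among the 7 remaining treated patients. The function name is hypothetical.

```python
import math


def two_proportion_z(events_c, n_c, events_t, n_t):
    """Pooled two-sample Z statistic for (control rate - treatment rate)."""
    p_c, p_t = events_c / n_c, events_t / n_t
    p_pool = (events_c + events_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_c + 1.0 / n_t))
    return (p_c - p_t) / se


if __name__ == "__main__":
    # Data to date: control 75/191, treatment 75/193; 400 planned, so 9 + 7 remain
    print(f"current Z: {two_proportion_z(75, 191, 75, 193):.2f}")
    # Most favorable completion: all 9 remaining controls have events,
    # none of the 7 remaining treated patients do
    best = two_proportion_z(75 + 9, 191 + 9, 75, 193 + 7)
    print(f"best-case final Z: {best:.2f}; significant at 0.05: {abs(best) >= 1.96}")
```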
Example from Proschan’s Book (cont.)
             Event   No Event   Total
Control      71      100        171
Treatment    71      100        171
Total        142     200        342
Planned sample size = 400
If all 20 remaining controls had events and all 20 treatment
group patients did not, the result would be significant. But how
likely is that? Answer = almost zero.
Conditional Power: Usual
Implementation
• Guidelines in protocol (pre-specified)
• Typically compute conditional power after you have a
fair amount of data (e.g., 50% of information)
• Compute conditional power under a number of
scenarios for assumed intervention effect (observed
effect to date, alternative assumed in design, null
effect, other effect sizes in between).
• Can graph boundaries of conditional power versus
information accrued to facilitate decision making.
Unconditional Power
• What is the probability of rejecting the null
hypothesis (i.e., getting a significant result)
based on the original design assumptions for
the treatment effect, but considering:
– revised estimate of control group event rate
– duration of follow-up accounting for recruitment
period and minimum follow-up originally planned for
each participant
Is a null result still meaningful?
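Unconditional power can be recomputed from the number of events now expected by the planned end of the trial. The sketch assumes Schoenfeld's approximation with 1:1 allocation and takes the revised expected event count as an input; projecting that count from the revised control event rate and follow-up is not shown. The hazard ratio, the event counts and the function name are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def power_from_events(hazard_ratio, expected_events, alpha=0.05, p_treat=0.5):
    """Approximate power of a two-sided logrank test (Schoenfeld's approximation),
    given the number of events expected by the end of the trial."""
    drift = math.sqrt(expected_events * p_treat * (1.0 - p_treat)) * abs(math.log(hazard_ratio))
    return _N.cdf(drift - _N.inv_cdf(1.0 - alpha / 2.0))


if __name__ == "__main__":
    # Hypothetical design: HR = 0.7 with about 331 events planned
    print(f"as designed: {power_from_events(0.7, 331):.3f}")
    # Hypothetical revision: a lower control event rate yields only about 200 events
    print(f"revised: {power_from_events(0.7, 200):.3f}")
```

In this made-up example unconditional power falls from about 90% to about 71%, the kind of drop that would trigger the START rule below to look at conditional power.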
Guideline for HIV Early Treatment
Trial (START)
• 1st consider unconditional power. If < 70%,
consider conditional power.
• If conditional power is < 20%, consider
stopping for futility.
Rationale: Unconditional power could be low in
the presence of a large treatment effect.
Summary (1)
• Many studies require a DSMB
– Trials with morbidity and mortality outcomes
– Trials of treatments that may be associated with serious toxicities
(need a group to look at controlled comparisons)
– Trials of novel, high risk treatments (e.g., gene therapy)
– Trials involving frail populations (elderly, infants)
Summary (2)
• A DSMB can be most effective in its role of protecting the
interests of patients if it is independent of the sponsor and
trial investigators – peer review works!
• Operating procedures should be agreed upon in advance
• An informed statistician who performs interim analyses is
important
• To carry out interim analyses data must be collected in a
timely way
• Reports should focus on comparisons of clinical outcomes
and their validity
Summary (3)
• Monitoring guidelines should be pre-specified
• Guidelines need to be accompanied by common
sense, a careful assessment of risks and benefits,
and opinions from experts from different
backgrounds.
• This is a fruitful area for research.
Recommendation from Paul Canner based
on his experiences in the Coronary Drug
Project
“…no single statistical decision rule or procedure
can take the place of the well-reasoned
consideration of all aspects of the data by a group
of concerned, competent, and experienced
persons with a wide range of scientific
backgrounds and points of view.”
Cont Clin Trials 1981; 1:363-376.