Multiple Endpoint Testing in Clinical Trials – Some Issues & Considerations Mohammad Huque, Ph.D. Division of Biometrics III/Office of Biostatistics/OPaSS/CDER/FDA 2005 Industry/FDA Workshop, Washington.


Multiple Endpoint Testing in
Clinical Trials – Some Issues &
Considerations
Mohammad Huque, Ph.D.
Division of Biometrics III/Office of
Biostatistics/OPaSS/CDER/FDA
2005 Industry/FDA Workshop, Washington, DC
11/6/2015
Disclaimer
• Views expressed here are those of the
presenter and not necessarily those of the
FDA
Sources of Multiplicity in Clinical
Trials
• Multiple endpoints
• Multiple comparisons
• Interim analysis
• Subgroup analysis
• Selection of covariates in an analysis model
• Others
OUTLINE
1. Type I error concept and type I error control when
testing for multiple endpoints. Complexities?
2. Multiple endpoints are often triaged into primary,
secondary and other types of endpoints. Reasons for
doing so and how these endpoints are tested?
3. Sequential testing of endpoints - no alpha adjustment is
needed. Issues and fixes?
4. Some trials require that 2 or more endpoints must show
effects for clinical evidence. Reasons for doing so and
consequences?
5. Composite endpoints. Underlying concepts and
complexities?
Trial has a single endpoint to test –
type I and type II errors
• Conduct a test for
claiming that a
new treatment is
beneficial
• α = Probability of
the Type I error
• β = Probability of
the Type II error
(power = 1- β )
                             Concludes treatment    Concludes treatment
                             not beneficial         beneficial
Truly not beneficial (H0)    Correct decision       Type I error
Truly beneficial (Ha)        Type II error          Correct decision
Trial has multiple endpoints to test
• Consider a two arm superiority trial, a test
treatment versus a control
Endpoints: y1, y2, …, yK
Multiple null hypotheses: F = {H01, H02, …, H0K}
H0j: δj = 0 versus Haj: δj ≠ 0, j = 1, …, K,
where δj is the treatment effect on endpoint yj
Trial has multiple endpoints to test
• Two scenarios:
(A) In the family F all are true null hypotheses
(B) Some may be true null hypotheses, and some
may be false null hypotheses, but their true
states are unknown.
Testing under scenario (A)
• Scenario (A) and the trial has 3 endpoints y1, y2, and
y3
• A test procedure can commit a type I error in multiple
ways: (-, -, +), (-, +, -), (+, -, -), (-, +, +), (+, -, +), (+,
+, -), (+, +, +), where + marks a statistically significant
result for that endpoint. These are chance events arising from the
multiplicity of tests when in fact there is no treatment
benefit for any of the endpoints.
• α0 = Pr{at least one of these chance events | test
procedure, H0}, where H0 = ∩H0j
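As a rough illustration of how α0 grows with the number of endpoints, the sketch below sums the probabilities of the seven patterns listed above. It assumes the three tests are independent and each is run at the same unadjusted level α. Neither assumption is stated on the slide, and the function name is hypothetical.

```python
from itertools import product


def global_alpha(alpha: float, k: int) -> float:
    """Pr{at least one significant result} for k tests under the global null.

    Assumption (not from the slides): the k tests are independent and each
    is carried out at the same unadjusted level alpha.
    """
    total = 0.0
    for pattern in product([False, True], repeat=k):
        if not any(pattern):
            continue  # the pattern (-, -, ..., -) is not a type I error
        prob = 1.0
        for significant in pattern:
            prob *= alpha if significant else (1.0 - alpha)
        total += prob
    return total


# Three endpoints, each tested at 0.05: equals 1 - 0.95**3
print(round(global_alpha(0.05, 3), 6))  # 0.142625
```

Under these assumptions the global alpha for three endpoints is roughly 0.14 rather than 0.05, which is why some adjustment is needed.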
Testing under scenario (A)
• α0 is called the global alpha (or overall alpha). It is also
called the familywise type I error rate (FWER)
under H0.
• H0 = ∩H0j is the global null hypothesis.
• A test procedure for testing H0 is called a global
test procedure.
Global Test procedures
• Useful for non-specific global claims.
Difficulty in interpreting the result. Type I
error rate can remain inflated for specific
claims.
• Examples: Simes test, O’Brien’s OLS/GLS
tests, Hotelling’s T² test (Sankoh et al., DIA
Jr., 1999)
Testing under scenario (B)
• Some of the null hypotheses in F = {H01, H02, …, H0K}
may be true and some may be false, but it is
not known which are which.
• Question: Is there a treatment effect specifically for the
endpoint y1?
• To answer this question, the null hypothesis is not a
single null hypothesis like the global null hypothesis.
Rather, it is a class of null hypothesis configurations in
which there is no treatment effect for y1, combined with all
possible scenarios for treatment effects on the
remaining endpoints y2, …, yK.
Testing under scenario (B)
• Consider 3 endpoints y1, y2, and y3.
• Question: Is there a treatment effect specifically
for the endpoint y1?
• Null hypothesis configurations F1 for testing for
treatment effect specifically for the endpoint y1:
F1 = { (δ1 = 0, δ2 = 0, δ3 = 0),
(δ1 = 0, δ2 = 0, δ3 ≠ 0),
(δ1 = 0, δ2 ≠ 0, δ3 = 0),
(δ1 = 0, δ2 ≠ 0, δ3 ≠ 0)}.
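The family F1 can be enumerated mechanically for any number of endpoints. The following is a minimal sketch; the function name and the boolean encoding (True meaning δ = 0 for that endpoint) are illustrative choices, not part of the presentation.

```python
from itertools import product


def null_configurations(k: int, j: int):
    """List every configuration of k endpoints in which endpoint j has no effect.

    Each configuration is a tuple of k booleans; True means delta = 0
    (no treatment effect) for that endpoint. j is 1-based.
    """
    configs = []
    for rest in product([True, False], repeat=k - 1):
        config = list(rest)
        config.insert(j - 1, True)  # endpoint j is always a true null
        configs.append(tuple(config))
    return configs


# Three endpoints, question about y1: reproduces the four members of F1
for config in null_configurations(3, 1):
    print(", ".join(f"δ{i + 1} {'=' if is_null else '≠'} 0"
                    for i, is_null in enumerate(config)))
```

With K endpoints there are 2^(K-1) such configurations, and a claim specific to y1 needs error control under every one of them.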
Control of FWER (two types)
• Weak control
– Control FWER only under the global null
configuration
• Strong control
– Control FWER under all null configurations
– Specificity property: useful for making specific
claims.
– Examples of methods: Bonferroni, Holm, Hochberg*,
closed statistical tests, and other methods
*with some caveats
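For concreteness, here is a minimal sketch of the textbook forms of three of the methods named above. The function names are illustrative. Bonferroni and Holm make no assumption about the dependence between tests. The Hochberg step-up procedure is only guaranteed to control FWER under independence or certain kinds of positive dependence, which is one commonly cited caveat.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0j when p_j <= alpha / K."""
    k = len(pvals)
    return [p <= alpha / k for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: compare ordered p-values with alpha/K, alpha/(K-1), ..., alpha."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])  # smallest p first
    reject = [False] * k
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break  # stop at the first non-significant p-value
    return reject


def hochberg(pvals, alpha=0.05):
    """Step-up: start from the largest p-value and work downwards."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i], reverse=True)  # largest p first
    reject = [False] * k
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (step + 1):
            for j in order[step:]:  # reject this and all smaller p-values
                reject[j] = True
            break
    return reject


pvals = [0.03, 0.04]
print(bonferroni(pvals), holm(pvals), hochberg(pvals))
```

With p-values 0.03 and 0.04, Bonferroni and Holm reject nothing, while Hochberg rejects both because the larger p-value is below 0.05.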
Triaging of multiple endpoints into meaningful
families by trial objectives
• Two important families
– Primary endpoints
– Secondary endpoints
1) Prospectively defined
2) FWE controlled
• Exploratory endpoints
(usually not prospectively defined)
• Primary endpoints are the primary focus of the trial. Their results determine
the main benefits of the clinical trial’s intervention.
• Secondary endpoints by themselves are generally not sufficient for characterizing
treatment benefit. Generally, they are tested for statistical significance for extended
indication and labeling after the primary objectives of the trial are met.
Statistical methods
• Prospective alpha allocation schemes
(PAAS) – Moyé (2000)
– Spend alpha1 for the primary endpoints and the
remaining alpha for the secondary endpoints
– FWER is controlled
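The slide does not give the exact allocation formula. As an assumption, the sketch below uses a simple additive (Bonferroni-type) split, in which the primary and secondary shares sum to the total alpha, so the union bound keeps the overall FWER at or below the total. The function name and the numbers are hypothetical.

```python
def split_alpha(total_alpha: float, alpha_primary: float):
    """Additive split of the total alpha between primary and secondary families.

    Assumption: an additive split, so that
    alpha_primary + alpha_secondary == total_alpha.
    """
    if not 0.0 < alpha_primary < total_alpha:
        raise ValueError("alpha_primary must lie strictly between 0 and total_alpha")
    return alpha_primary, total_alpha - alpha_primary


primary, secondary = split_alpha(0.05, 0.04)
print(primary, round(secondary, 4))
```

Each family then needs its own within-family multiplicity procedure at its allocated level.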
Statistical methods
• Parallel gatekeeping strategies for clinical
trials –
– Dmitrienko-Offen-Westfall (SM 2003)
– Chen-Luo-Capizzi (SM 2005)
• Allows testing of secondary endpoints when
at least one of the primary endpoints
exhibits a statistically significant result
• These methods control FWER for both the
primary and secondary endpoints in the
strong sense.
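The idea can be illustrated with a simplified Bonferroni-based sketch. This is not the exact procedure of either cited paper: the primary family is tested by Bonferroni, and the share of alpha belonging to the rejected primary hypotheses is carried forward to a Holm test of the secondary family. Function names and p-values are hypothetical.

```python
def holm(pvals, alpha):
    """Holm step-down test of a family at level alpha."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    reject = [False] * k
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break
    return reject


def parallel_gatekeeping(primary_p, secondary_p, alpha=0.05):
    """Two-family parallel gatekeeping sketch with equal weights.

    Primary family: Bonferroni at level alpha.
    Secondary family: Holm at level alpha * (primary rejections / primary size),
    so it is tested only if at least one primary hypothesis is rejected.
    """
    m = len(primary_p)
    primary_reject = [p <= alpha / m for p in primary_p]
    carried = alpha * sum(primary_reject) / m
    if carried > 0:
        secondary_reject = holm(secondary_p, carried)
    else:
        secondary_reject = [False] * len(secondary_p)
    return primary_reject, secondary_reject


print(parallel_gatekeeping([0.01, 0.20], [0.012, 0.30]))
```

In this example one of two primary endpoints is significant, so half of the alpha (0.025) passes through the gate and one secondary endpoint is rejected at that reduced level.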
Sequential testing of multiple
endpoints
• A fixed sequence approach allows testing of each
of the k null hypotheses at the same significance
level of α without any adjustment, as long as the
null hypotheses to be tested are hierarchically
ordered and are tested in a pre-defined sequential
order.
• Hierarchical ordering of null hypotheses can be
achieved, for example, by their clinical relevance.
Sequential testing of multiple
endpoints
For this fixed-sequence approach, however,
there are two caveats:
• Pre-specification of the testing sequence
• No further testing once the sequence breaks
• Problem: when the sequence breaks and the
next p-value is extreme (e.g., p1= 0.50, p2=
0.001)
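A minimal sketch of the fixed-sequence rule makes the problem concrete. The function name and the convention of rejecting when p ≤ α are illustrative assumptions.

```python
def fixed_sequence(pvals, alpha=0.05):
    """Test hypotheses in their pre-specified order, each at the full level alpha.

    Returns the 1-based positions of the rejected hypotheses. Testing stops
    at the first non-significant result; later hypotheses are not tested.
    """
    rejected = []
    for position, p in enumerate(pvals, start=1):
        if p > alpha:
            break  # the sequence breaks here
        rejected.append(position)
    return rejected


print(fixed_sequence([0.50, 0.001]))            # []
print(fixed_sequence([0.001, 0.03, 0.20, 0.01]))  # [1, 2]
```

In the first call, p2 = 0.001 cannot lead to any claim because the sequence already broke at p1 = 0.50.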
A flexible fixed-sequence approach
Test H(01) at level α1
• If H(01) is rejected: test H(02) at level α
• If H(01) is not rejected: test H(02) at level γ
e.g., α1 = 0.04, α = 0.05, γ = 0.0104 for ρ = 0
(γ = 0.0214 for ρ = 0.8)
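The γ value quoted for ρ = 0 can be reproduced under one reading of the scheme: H(01) is tested at α1; if it is rejected, H(02) is tested at the full α, otherwise at the smaller level γ. Taking ρ to be the correlation between the two test statistics (an assumption), independence gives a type I error under the global null of α1 + (1 − α1)γ, and setting this equal to α yields γ. The function names are hypothetical, and the correlated case (γ = 0.0214 for ρ = 0.8) needs a bivariate distribution and is not reproduced here.

```python
def gamma_independent(alpha: float, alpha1: float) -> float:
    """Solve alpha1 + (1 - alpha1) * gamma = alpha for gamma (independent tests)."""
    return (alpha - alpha1) / (1.0 - alpha1)


def flexible_fixed_sequence(p1, p2, alpha=0.05, alpha1=0.04, gamma=None):
    """Return (reject H(01), reject H(02)) under the reading described above."""
    if gamma is None:
        gamma = gamma_independent(alpha, alpha1)
    reject1 = p1 <= alpha1
    level2 = alpha if reject1 else gamma  # second test continues either way
    reject2 = p2 <= level2
    return reject1, reject2


print(round(gamma_independent(0.05, 0.04), 4))  # 0.0104
print(flexible_fixed_sequence(0.50, 0.001))     # (False, True)
```

Unlike the strict fixed sequence, the case p1 = 0.50, p2 = 0.001 still allows a claim on the second endpoint, at the cost of testing the first endpoint at 0.04 instead of 0.05.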
Example: flexible fixed-sequence method
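[Figure: decision tree for the flexible fixed-sequence method applied to three endpoints, M12, M15 and M9. Each node tests one endpoint at a stated level (.02, .038, .03, .012, .05 or .018) and branches on Win or Fail. Terminal outcomes include: assert M12, M15 and M9; assert M12 and M15; assert M12 and M9; assert M15 and M9; assert a single endpoint; or stop with no assertion.]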
Some trials require that 2 or more endpoints
must show effects
Examples:
• Alzheimer trial
– (win on ADAS-Cognitive Sub-scale) and (win on
Clinician’s Interview Based Impression of Change)
• Many other examples (PhRMA draft paper)
Main Reason:
• Clinical expectations of the desired clinical benefit
(concept beyond statistics)
Adjustments in the Type I error rate - Some winning
criteria require adjustments and some don’t
[Figure: Adjustments in the type I error rate for the two win scenarios (1-sided test), case of 2 endpoints. Y-axis: adjusted alpha, 0 to 0.03; x-axis: correlation, 0 to 1. Curves: "Alpha adj / win in at least one" and "No alpha adj / win in both".]
• Adjustment by Sidak’s method, accounting for correlation
Note: Which method to use depends on the clinical decision rule set in
advance
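For the "win in at least one" rule, the zero-correlation end of the adjusted-alpha curve can be sketched with the standard Sidak formula for independent tests. The version that accounts for correlation requires the joint distribution of the test statistics and is not reproduced here. The function name is illustrative.

```python
def sidak_alpha(alpha: float, k: int) -> float:
    """Per-endpoint level so that k independent tests have overall level alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


adjusted = sidak_alpha(0.025, 2)
print(round(adjusted, 5))                    # 0.01258
print(round(1.0 - (1.0 - adjusted) ** 2, 6))  # 0.025, the overall level
```

For the "win in both" rule no adjustment is needed, since both tests must be significant at the full level.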
Power Comparison
Case of K=2 endpoints:
[Figure: Win in both versus win in at least one (1-sided test at 0.025). Y-axis: power (%), 80 to 92; x-axis: correlation, 0 to 1. Curves: "Win in at least one endpoint" and "Win in both endpoints", with the single-endpoint power shown for reference.]
Loss in power when a win is required in all endpoints
K = number of endpoints
[Figure: Power comparison: win in each endpoint at alpha level 0.025 (1-sided test). Y-axis: power (%), 50 to 90; x-axis: correlation, 0 to 1. One curve for each of K = 1, 2, 3, 4, 5.]
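At zero correlation the loss in power has a simple closed form: if each endpoint on its own has power p and the test statistics are independent, the power to win on all K endpoints is p^K. The sketch below assumes independence and a common single-endpoint power of 90%. Both are illustrative assumptions, and the function name is hypothetical.

```python
def power_win_in_all(single_power: float, k: int) -> float:
    """Power to win on all k endpoints, assuming independent endpoints
    that each have the same single-endpoint power."""
    return single_power ** k


for k in range(1, 6):
    print(k, round(power_win_in_all(0.90, k), 3))
```

With positively correlated endpoints the loss is smaller, and it vanishes as the correlation approaches 1.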
Sample Size Increase (1) When Win in All K
Endpoints Compared to Single Endpoint Case

Correlation    K=2      K=3      K=4
0.0            22.8%    35.9%    45.0%
0.3            21.1%    33.1%    41.2%
0.4            20.2%    31.7%    39.7%
0.5            19.1%    29.8%    37.3%
0.6            17.7%    27.5%    34.4%
0.7            15.9%    24.6%    30.7%
0.8            13.5%    20.8%    25.8%
0.9            10.0%    15.3%    18.9%

(1) Alpha = 0.025 (1-sided), Power = 0.90
Calculations using the multivariate normal distribution of the test statistics
comparing active treatment versus placebo for a 2-arm trial, assuming the same
delta/sigma for all K endpoints
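The zero-correlation row can be checked with a short calculation. Assuming independent, normally distributed test statistics with a common delta/sigma, each endpoint must have power 0.90^(1/K) for the joint power to be 0.90, and the required sample size scales with the square of (z_alpha + z_power). This is a sketch of that special case only; the function name is hypothetical, and the correlated rows need the multivariate normal distribution.

```python
from statistics import NormalDist


def sample_size_inflation(k: int, alpha: float = 0.025, power: float = 0.90) -> float:
    """Relative sample size increase for winning on all k independent endpoints,
    compared with a single endpoint at the same one-sided alpha and power."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1.0 - alpha)
    z_single = nd.inv_cdf(power)             # single-endpoint trial
    z_each = nd.inv_cdf(power ** (1.0 / k))  # per-endpoint power needed for k wins
    return ((z_alpha + z_each) / (z_alpha + z_single)) ** 2 - 1.0


for k in (2, 3, 4):
    print(k, f"{100 * sample_size_inflation(k):.1f}%")
```

Under these assumptions the results agree, to rounding, with the 22.8%, 35.9% and 45.0% shown for correlation 0.0.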
Composite Endpoints
Two types:
• Total score or index based on a rating scale,
e.g., HAMD totals in depression trials,
ACR20/ACR70 in rheumatoid arthritis
trials
Issues: validity and reliability
Composite Endpoints
Another Type
• Composite endpoint is defined in terms of
the time to the first “event”, where event is
one of several possible event types
LIFE study: Composite of cardiovascular
death, stroke and myocardial infarction
events.
Composite Endpoint Issues
LIFE study
The composite endpoint was significantly positive.
However, analysis of the first events by individual
components and sub-composite endpoints indicates that the
overall composite result was mainly due to a reduction
in fatal and non-fatal stroke.
Issue:
How to interpret composite endpoint results? How
to characterize benefits in terms of the component
endpoints?
Extent of multiplicity adjustments between
endpoints
               Homogeneity of treatment effects across endpoints
Correlation    Low                  High
High           Small adjustments    Practically no adjustments
Low            Large adjustments    Good case for combining endpoints
Concluding Remarks
• For endpoint specific claims – strong control of the type I
error is needed
• Parallel gate-keeping strategies can be used for the primary
and secondary endpoint claims
• Flexible sequential test procedure can be used to gain power
of the test
• There is a scientific basis when a reasonable clinical decision
rule asks for statistically significant efficacy results in more
than 1 endpoint – issue of loss of power?
• When 4 or more endpoints are included as primary (e.g., arthritis
trials), and homogeneity of treatment effects across
endpoints is expected - a composite or responder endpoint
approach will be effective.