Statistical challenges in the validation of surrogate endpoints Marc Buyse International Drug Development Institute (IDDI), Brussels Limburgs Universitair Centrum, Diepenbeek, Belgium [email protected] FDA Industry Workshop, September.

Statistical challenges in the
validation of surrogate endpoints
Marc Buyse
International Drug Development Institute (IDDI), Brussels
Limburgs Universitair Centrum, Diepenbeek, Belgium
[email protected]
FDA Industry Workshop, September 22-23, 2004
Outline
• Need for surrogates
• Definitions
• Validation criteria
– Single trial
– Several trials (meta-analysis)
• Case studies
– PSA and survival (advanced prostatic cancer)
– 3-year DFS and 5-year OS (early colorectal cancer)
Why do we need surrogates?
• Practicality of studies:
– Shorter duration
– Smaller sample size (?)
• Availability of biomarkers:
– Tissue, cellular, hormonal factors, etc.
– Imaging techniques
– Genomics, proteomics, other-ics
Ref: Schatzkin and Gail, Nature Reviews (Cancer) 2001, 3.
Validity of a surrogate endpoint
Evidence that biomarkers predict clinical effects
– Epidemiological
– Pathophysiological
– Biological
– Statistical
What are the conditions required to show this?
Ref: Biomarkers Definitions Working Group, Clin Pharmacol Ther 2001;69:89.
Definitions
• Clinical endpoint: a characteristic or variable that reflects how a patient feels, functions, or survives
• Biomarker: a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention
• Surrogate endpoint: a biomarker that is intended to substitute for a clinical endpoint. A surrogate endpoint is expected to predict clinical benefit (or harm or lack of benefit or harm)
Ref: Temple, JAMA 1999;282:790.
Single trial
Parameters of interest
– effect of treatment on surrogate endpoint (α)
– effect of treatment on true endpoint (β)
– effect of surrogate on true endpoint (γ)
– adjusted effect of treatment on true endpoint (β_S)
– adjusted effect of surrogate on true endpoint (γ_Z)
Ref: Buyse and Molenberghs, Biometrics 1998;54:1014.
[Diagram: Treatment, Surrogate endpoint, and True endpoint, linked by the effects listed above]
Correlation of endpoints is not enough
Key point: “A correlate does not a surrogate make”
γ ≠ 0 is not a sufficient condition for validity
Ref: Fleming and DeMets, Ann Intern Med 1996, 125: 605.
A first formal definition and criteria
Prentice’s definition
H0S: α = 0 ⇔ H0T: β = 0
Prentice’s criteria
An endpoint can be used as a surrogate if
– it predicts the final endpoint (γ ≠ 0)
– it fully captures the effect of treatment upon the final endpoint (β ≠ 0 and β_S = 0)
Ref: Prentice, Statist in Med 1989;8:431.
A first formal definition and criteria
Problems with Prentice’s approach
– rooted in hypothesis testing
– requires significant treatment effects
– overly stringent
– criteria not equivalent to definition (except for binary endpoints)
– one can never prove the null (β_S = 0)
Ref: Buyse and Molenberghs, Biometrics 1998;54:1014.
The proportion explained
Freedman’s “proportion explained” is defined as
PE = 1 − β_S / β
• if β_S = β, PE = 0 and the surrogate explains nothing
• if β_S = 0, PE = 1 and the surrogate explains the entire effect of treatment on the true endpoint
Ref: Freedman et al, Statist in Med 1992;11:167.
The proportion explained
Problems with the proportion explained
– PE is not a proportion (can be <0 or >1)
– PE confuses two sources of variability, one at the individual level, the other at the trial level:
PE = γ_Z · α / β
– PE can be anywhere on the real line, depending on
precision of S and T…
Ref: Molenberghs et al, Controlled Clin Trials 2002;23:607.
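The decomposition of PE can be sketched with a short derivation. It assumes normally distributed endpoints and three linear models, with Z the treatment indicator, S the surrogate and T the true endpoint for patient j. The intercepts (\mu) and error terms (\varepsilon) are notation introduced here for illustration only; α, β, β_S and γ_Z are the effects defined on the "Single trial" slide.

```latex
\begin{align*}
S_j &= \mu_S + \alpha Z_j + \varepsilon_{Sj}, \\
T_j &= \mu_T + \beta Z_j + \varepsilon_{Tj}, \\
T_j &= \tilde{\mu}_T + \beta_S Z_j + \gamma_Z S_j + \tilde{\varepsilon}_{Tj}.
\end{align*}
% Substituting the first model into the third and comparing the
% coefficient of Z_j with the second model gives
\begin{align*}
\beta &= \beta_S + \gamma_Z \alpha, \\
PE &= 1 - \frac{\beta_S}{\beta} = \gamma_Z \, \frac{\alpha}{\beta}.
\end{align*}
```

Under these assumptions PE is the product of an individual-level quantity (γ_Z) and a ratio of treatment effects (α/β). Nothing constrains that product to lie between 0 and 1.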
Statistical validation of
surrogate endpoints
“The effect of treatment on a surrogate endpoint must be
reasonably likely to predict clinical benefit”
Ref: Biomarkers Definitions Working Group, Clin Pharmacol Ther
2001;69:89.
The relative effect
Interest now focuses on the two components of PE:
– the surrogate must predict the true endpoint (γ_Z ≠ 0)
– the relative effect, defined as
RE = β / α
allows prediction of the effect of treatment on the true endpoint (β) based on the effect of treatment on the surrogate (α)
Ref: Buyse and Molenberghs, Biometrics 1998;54:1014.
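As a minimal sketch of how the single-trial quantities could be estimated, the example below fits ordinary least-squares models to simulated patient-level data. It assumes continuous, roughly normal endpoints, which is a simplification. The helper names (`ols`, `single_trial_measures`) and all simulated values are hypothetical and do not come from the talk.

```python
import numpy as np


def ols(y, *columns):
    """Least-squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # coef[0] is the intercept, then one slope per column


def single_trial_measures(z, s, t):
    """Estimate the single-trial parameters from patient-level data.

    z: treatment indicator (0 = control, 1 = experimental)
    s: surrogate endpoint, t: true endpoint (both continuous here)
    """
    alpha = ols(s, z)[1]               # effect of treatment on surrogate
    beta = ols(t, z)[1]                # effect of treatment on true endpoint
    gamma = ols(t, s)[1]               # effect of surrogate on true endpoint
    _, beta_s, gamma_z = ols(t, z, s)  # the two effects, mutually adjusted
    return {
        "alpha": alpha, "beta": beta, "gamma": gamma,
        "beta_S": beta_s, "gamma_Z": gamma_z,
        "PE": 1.0 - beta_s / beta,     # proportion explained
        "RE": beta / alpha,            # relative effect
    }


# Simulated (hypothetical) trial; none of these numbers come from the talk.
rng = np.random.default_rng(0)
n = 500
z = rng.integers(0, 2, size=n).astype(float)
s = 1.0 * z + rng.normal(size=n)
t = 0.2 * z + 0.8 * s + rng.normal(size=n)

measures = single_trial_measures(z, s, t)
for name, value in measures.items():
    print(f"{name:8s} {value: .3f}")
```

With a single trial, RE is one ratio of two estimates. It says nothing about whether the same ratio would hold in another trial, which is the motivation for the multi-trial approach.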
Prediction of true endpoint from surrogate endpoint
[Figure: true endpoint plotted against surrogate endpoint, as observed on individual patients. Slope = γ; R² indicates quality of regression.]
Prediction of treatment effect: one trial
[Figure: treatment effect on true endpoint (β) plotted against treatment effect on surrogate endpoint (α), showing the treatment effect observed in the trial. Regression through origin with slope = β/α; only one point!]
Several trials
For a marker to be used as a surrogate, we need
“repeated demonstrations of a strong correlation
between the marker and the clinical outcome”
Ref: Holland, 9th EUFEPS Conference on “Optimising Drug
Development: Use of Biomarkers”, Basel, 2001.
Prediction of treatment effect: several trials
[Figure: treatment effect on true endpoint (β) plotted against treatment effect on surrogate endpoint (α), showing the treatment effects observed in all trials. Slope = β/α; R² indicates quality of regression.]
Validation criteria using several trials
Parameters of interest
– effect of treatment on surrogate endpoint (α)
– effect of treatment on true endpoint (β)
– effect of surrogate on true endpoint (γ)
– measure of association between surrogate endpoint and true endpoint (R²individual)
– measure of association between effects of treatment on surrogate endpoint and on true endpoint (R²trial)
Ref: Buyse et al, Biostatistics 2000;1:49;
Gail et al, Biostatistics 2000;1:231.
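A minimal sketch of the trial-level measure follows. Given one pair of estimated treatment effects per trial (or per other unit of analysis), it regresses the effects on the true endpoint on the effects on the surrogate and reports the coefficient of determination. This unweighted regression ignores the estimation error in each effect, so it is a simplification of the models described in the cited papers. The function name `trial_level_r2`, the number of units and the simulated effects are assumptions made for illustration.

```python
import numpy as np


def trial_level_r2(alpha_hat, beta_hat):
    """Regress per-trial effects on the true endpoint (beta_hat) on
    per-trial effects on the surrogate (alpha_hat).

    Returns (slope, intercept, R^2) of the unweighted linear fit.
    """
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    slope, intercept = np.polyfit(alpha_hat, beta_hat, deg=1)
    fitted = intercept + slope * alpha_hat
    ss_res = np.sum((beta_hat - fitted) ** 2)           # residual sum of squares
    ss_tot = np.sum((beta_hat - beta_hat.mean()) ** 2)  # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot


# Simulated (hypothetical) effects for an arbitrary number of units.
rng = np.random.default_rng(1)
n_units = 19
alpha_hat = rng.normal(0.0, 1.0, size=n_units)
beta_hat = 0.5 * alpha_hat + rng.normal(0.0, 0.5, size=n_units)

slope, intercept, r2 = trial_level_r2(alpha_hat, beta_hat)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R2_trial = {r2:.2f}")
```

An R²trial close to 1 would mean that the treatment effect on the surrogate predicts the treatment effect on the true endpoint well across units. A value close to 0 would mean that it predicts almost nothing.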
Technical difficulties: the endpoints
are not normally distributed
In practice, endpoints are often of the following types: response,
survival, or longitudinal. Such endpoints are not normally
distributed, and therefore complex modelling is required to
characterize the association between endpoints (“individual level
association”).
At the trial level, however, simple linear models are still adequate
to characterize the association between treatment effects on the
endpoints (“trial level association”).
Refs:
Molenberghs et al, Stat Med 20:3023, 2001;
Burzykowski et al, J Royal Stat Soc A 50: 405, 2001;
Renard et al, J Applied Statist 30:235, 2002.
A case study in advanced prostatic cancer:
the trials
• Two multicentric trials for patients in relapse after first-line endocrine therapy (596 patients)
• Unit of analysis for treatment effects: country (19 units)
• Patients randomized between two treatments:
– Experimental (retinoic acid metabolism-blocking agent)
– Control (anti-androgen)
Ref: Buyse et al, in: Biomarkers in Clinical Drug Development
(Bloom JC, ed.): Springer-Verlag, 2003.
A case study in advanced prostatic cancer:
the endpoints
Potential surrogate endpoints:
• Longitudinal PSA measurements taken at pre-defined time points
• PSA response (decrease of at least 50%)
• Time to PSA progression (TPP)
True endpoint:
• Overall survival
A case study in advanced prostatic cancer
[Figure, two panels, each comparing Experimental and Control over time (years, 0 to 3). Surrogate endpoint: Log(PSA). True endpoint: estimated hazard rate.]
PSA response as surrogate for survival
Very weak association between treatment effects (R² = 0.05)
[Figure: treatment effect on survival time plotted against treatment effect on PSA response.]
TPP as surrogate for survival
Weak association between treatment effects (R² = 0.22)
[Figure: treatment effect on survival time plotted against treatment effect on time to PSA progression.]
Longitudinal PSA as surrogate for survival
Moderate association between treatment effects (R²trial = 0.45)
[Figure: treatment effect on survival time plotted against treatment effect on longitudinal PSA.]
Individual-level and trial-level measures of association

Surrogate                  Individual-level association            Trial-level association between
                           between PSA and survival [95% C.I.]     treatment effects on PSA and survival [S.E.]
PSA response               Survival odds ratio = 5.5 [2.7 - 8.2]   R²trial = 0.05 [0.13]
Time to PSA progression    Survival odds ratio = 6.3 [4.4 - 8.2]   R²trial = 0.22 [0.18]
Longitudinal PSA           R²(t) > 0.84 at all times t             R²trial = 0.45 [0.18]

R²(t) is the coefficient of determination at time t.
A case study in early colorectal cancer:
the trials
• Fifteen collaborative group trials for patients after resection of colorectal tumor (12,915 patients)
• Unit of analysis for treatment effects: 18 comparisons between 33 treatment arms
• Patients randomized between various 5-FU regimens and/or control
A case study in early colorectal cancer:
the endpoints
Potential surrogate endpoint:
• 3-year disease-free survival
True endpoint:
• 5-year overall survival
Ref: Sargent et al, Proceedings ASCO (Abstract # 3502), 2004.
Acknowledgement: the following slides are based on Dr Daniel
Sargent’s presentations to ODAC on May 5 and at ASCO on June 6
Most recurrences occur before 3 years
[Figure: bar chart of recurrence rate (%) by half-year interval after randomization, from 0 to 8 years. The highest bar label is 7.2%.]
Strong association between endpoints (R² = 0.86)
[Figure: overall survival plotted against disease-free survival, both axes from 0.5 to 0.8.]
Strong association between treatment effects (R² = 0.87)
[Figure: overall survival hazard ratio plotted against disease-free survival hazard ratio, both axes from 0.5 to 1.3.]
Predicted versus actual OS hazard ratios
[Figure: predicted and actual overall survival hazard ratios, by trial. Vertical axis: hazard ratio.]
Overview of validation approaches
• Single trial
– full capture (Prentice)
– proportion explained (Freedman et al)
– relative effect (Buyse & Molenberghs)
– likelihood reduction factor (Alonso et al)
• Several trials (meta-analysis)
– concordance (Begg & Leung)
– correlation of effects (Daniels & Hughes)
– trial-level measures of association (Gail et al)
– individual- and trial-level measures of association (Buyse et al)
– predicted treatment effect (Baker)
– surrogate threshold effect (Burzykowski & Buyse)
Conclusions on surrogate validation
• Ideally, statistical validation requires the following:
– data from randomized trials
– replication at the trial or center level
– at least some observations of T
– large numbers of observations
– range of therapeutic questions (Z1, Z2, …)
• Hence:
– individual patient data meta-analyses are needed
– access to such data is a problem when they are proprietary
Ref: Burzykowski, Molenberghs and Buyse (eds.), “The Evaluation of
Surrogate Endpoints”, Springer-Verlag (in press).