Experimental Design and Statistical Considerations


Experimental Design and Statistical Considerations in Translational Cancer Research
(in 15 minutes)
Elizabeth Garrett-Mayer, PhD
Associate Professor of Biostatistics and Epidemiology
Two Parts


Phase I studies
Taking markers into the clinic
Phase I Trial Design


Historically, DOSE FINDING study
Classic Phase I objective:
“What is the highest dose we can safely administer to patients?”
Translation: Kill the cancer, not the patient
Assumes monotonic relationship between
dose and toxicity
dose and efficacy
Classic Phase I Assumption:
Efficacy and toxicity both increase with dose
DLT = dose-limiting toxicity
[Figure: probability of outcome (0.0 to 1.0) versus dose level (1 to 7), with curves for Response and DLT]
Classic Phase I approach:
Algorithmic Designs


“3+3” or “3 by 3”
Prespecify a set of doses to consider, usually between 3 and 10 doses.
Treat 3 patients at dose K
1. If 0 patients experience DLT, escalate to dose K+1
2. If 2 or more patients experience DLT, de-escalate to level K-1
3. If 1 patient experiences DLT, treat 3 more patients at dose level K
A. If 1 of 6 experiences DLT, escalate to dose level K+1
B. If 2 or more of 6 experience DLT, de-escalate to level K-1

MTD is considered the highest dose at which 1 or 0 out of six patients experiences DLT.
Confidence in MTD is usually poor.
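The 3+3 rules above can be sketched as a small simulation. This is only an illustration under stated assumptions: the slides do not say what happens after de-escalation or at the top dose, so the sketch assumes that a dose needs six patients before it is declared the MTD and that a trial never re-escalates to a dose already found too toxic. The function name and the true DLT probabilities are hypothetical.

```python
import random
from collections import Counter

def simulate_3plus3(true_dlt_probs, rng):
    """Simulate one 3+3 trial.

    Returns the 0-based index of the declared MTD, or None if even the
    lowest dose is too toxic. Handling of de-escalation and of the top
    dose is an assumption of this sketch, not taken from the slides.
    """
    n_doses = len(true_dlt_probs)
    treated = [0] * n_doses
    dlts = [0] * n_doses
    ceiling = n_doses - 1  # highest dose not yet shown to be too toxic
    k = 0
    while True:
        # Treat a cohort of 3 patients at dose k.
        treated[k] += 3
        dlts[k] += sum(rng.random() < true_dlt_probs[k] for _ in range(3))
        if treated[k] == 3 and dlts[k] == 1:
            continue  # rule 3: treat 3 more patients at the same dose
        # Rule 1 (0 of 3) or rule 3A (at most 1 of 6) counts as tolerated.
        tolerated = dlts[k] == 0 if treated[k] == 3 else dlts[k] <= 1
        if tolerated:
            if k < ceiling:
                k += 1          # escalate
            elif treated[k] == 6:
                return k        # 6 patients, at most 1 DLT: declare MTD
            # otherwise only 3 patients so far: loop treats 3 more
        else:
            ceiling = k - 1     # rule 2 or 3B: this dose is too toxic
            if ceiling < 0:
                return None
            k = ceiling
            if treated[k] == 6:
                return k        # already had at most 1 DLT in 6 patients

# Hypothetical true DLT probabilities for five dose levels.
probs = [0.05, 0.10, 0.20, 0.35, 0.50]
rng = random.Random(2024)
n_trials = 2000
counts = Counter(simulate_3plus3(probs, rng) for _ in range(n_trials))
print("no MTD:", counts[None] / n_trials)
for dose in range(len(probs)):
    print("dose level", dose + 1, ":", counts[dose] / n_trials)
```

Repeating the simulated trial many times shows how widely the declared MTD can vary for one fixed set of true toxicity rates, which is one way to see why confidence in the MTD is usually poor.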
“Novel” Phase I approaches

Continual reassessment method (CRM)
(O’Quigley et al., Biometrics 1990)


Many changes and updates in 20 years
Tends to be most preferred by statisticians

Other Bayesian designs (e.g. EWOC) and model-based
designs (Cheng et al., JCO, 2004, v 22)

Other improvements in algorithmic designs


Accelerated titration design (Simon et al. 1999, JNCI)
Up-down design (Storer, 1989, Biometrics)
CRM: Bayesian Adaptive Design




Dose for next patient is determined based on toxicity
responses of patients previously treated in the trial
After each cohort of patients, posterior distribution is
updated to give model prediction of optimal dose for
a given level of toxicity (DLT rate)
Find dose that is most consistent with desired DLT rate
Modifications have been both Bayesian and non-Bayesian.
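The updating step described above can be sketched in a few lines. This is a minimal illustration, not the design from any specific paper: it assumes a one-parameter power model p_i = skeleton_i ** exp(a), a normal prior on a, and a grid approximation to the posterior. The skeleton values, prior standard deviation, target rate and function name are all hypothetical choices.

```python
import math

def crm_next_dose(skeleton, n_treated, n_dlt, target=0.25, prior_sd=1.34):
    """Return (index of recommended dose, posterior mean DLT rates).

    Assumed model: P(DLT at dose i) = skeleton[i] ** exp(a), a ~ N(0, prior_sd^2).
    The posterior over a is approximated on a fixed grid from -4 to 4.
    """
    grid = [-4.0 + 8.0 * i / 400 for i in range(401)]
    weights = []
    for a in grid:
        log_post = -0.5 * (a / prior_sd) ** 2  # log prior, up to a constant
        for p0, n, y in zip(skeleton, n_treated, n_dlt):
            p = p0 ** math.exp(a)
            # Binomial log-likelihood for y DLTs among n patients.
            log_post += y * math.log(p) + (n - y) * math.log(1.0 - p)
        weights.append(math.exp(log_post))
    total = sum(weights)
    post_mean = [
        sum(w * p0 ** math.exp(a) for w, a in zip(weights, grid)) / total
        for p0 in skeleton
    ]
    # Recommend the dose whose estimated DLT rate is closest to the target.
    best = min(range(len(skeleton)), key=lambda i: abs(post_mean[i] - target))
    return best, post_mean

# Hypothetical prior guesses of the DLT rate at each of five doses.
skeleton = [0.05, 0.12, 0.25, 0.40, 0.55]
# Hypothetical data so far: 3 patients at each of the first three doses.
dose, rates = crm_next_dose(skeleton, [3, 3, 3, 0, 0], [0, 0, 1, 0, 0])
print("recommended dose level:", dose + 1)
print("posterior mean DLT rates:", [round(r, 3) for r in rates])
```

In practice the function would be called again after each cohort with the updated patient and DLT counts, so the recommendation tracks the accumulating data.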
New paradigm: Targeted Therapy
How do targeted therapies change the early phase
drug development paradigm?

Not all targeted therapies have toxicity




Toxicity may not occur at all
Toxicity may not increase with dose
Targeted therapies may not reach the target of interest
Implications for study design: Previous assumptions may
not hold




Does efficacy increase with dose?
Endpoint (DLT) may no longer be appropriate
Should we be looking for the MTD?
What good is phase I if the agent does not hit the target?
Possible Dose-Toxicity & Dose-Efficacy Relationships for Targeted Agent
[Figure: probability of outcome (0.0 to 1.0) versus dose (0 to 12), with curves for Efficacy and Toxicity]
What is a Correlative Study?


A study that correlates a “marker” with disease
What is a marker?


An innate characteristic of a tumor or tissue
Examples
Marker              Disease
PSA                 Prostate cancer
Estrogen receptor   Breast cancer
SUV from PET        Many cancers
KIT mutation        GIST
What is it good for?

Prognostic marker: Predicts outcome (independent of therapy)
Predictive marker: Predicts response to therapy
Can be used for
Treatment assignment
Treatment stratification in clinical trials
Surrogate endpoint (?)
Targeted therapy development
Diagnosis
Mitotic Rate: Prognostic Marker
Figure 3. Recurrence-free survival in
127 patients with completely resected
localized gastrointestinal stromal tumor
(GIST) based on mitotic rate
DeMatteo et al, Cancer, 112:608-615
HER-2: Predictive Marker
Disease-free survival.
Gennari A et al. JNCI J Natl Cancer Inst 2007;100:14-20
© The Author 2007. Published by Oxford University Press.
Lifecycle of a marker

Analytical development: Measurement, logistics etc.; sample collection, storage, processing
Clinical development: “Retrospective” connection with outcome
Clinical validation: “Prospective” connection with outcome
Statistical issues during
analytical development

Reproducibility




Repeat the measurement on the same sample multiple
times under otherwise identical conditions
Suppose binary marker, twice measured
Results can be summarized in a fourfold (2x2) table
Statistical Significance?
Not good enough!
p<0.05 shows there is a trend
need strong agreement, not just a trend
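A small worked example makes the distinction concrete. The 2x2 counts below are hypothetical, chosen only for illustration: the chi-square test is significant at the 0.05 level, yet the two readings agree on just 60% of samples and Cohen's kappa (agreement beyond chance) is low. Kappa is used here as one common agreement measure; the slides do not name a specific statistic.

```python
import math

def agreement_summary(a, b, c, d):
    """Summarize a 2x2 table of two binary readings of the same samples.

    a = both positive, b = first positive / second negative,
    c = first negative / second positive, d = both negative.
    Returns (observed agreement, Cohen's kappa, chi-square p-value).
    """
    n = a + b + c + d
    observed = (a + d) / n
    p_first = (a + b) / n    # proportion positive on the first reading
    p_second = (a + c) / n   # proportion positive on the second reading
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    kappa = (observed - expected) / (1 - expected)
    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed, kappa, p_value

# Hypothetical counts for 100 samples, each measured twice.
observed, kappa, p_value = agreement_summary(30, 20, 20, 30)
print("observed agreement:", observed)
print("kappa:", round(kappa, 3))
print("p-value:", round(p_value, 4))
```

With these counts the test rejects independence, which only says the two readings are related; it says nothing about whether they agree well enough for the marker to be usable.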
Continuous Measurements
[Figure: three scatter plots of Measurement 2 versus Measurement 1, annotated as follows]
p = 5.2x10E-11, R-squared = 0.59
p = 3.2x10E-5, R-squared = 0.62
p = 1.2x10E-11, R-squared = 0.92
DO NOT RELY ON P-VALUES!!
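One reason p-values mislead for continuous measurements is that they depend on the number of samples as well as on the strength of the relationship. The sketch below holds the correlation fixed and varies only the sample size. It uses the Fisher z approximation for the p-value, which is an assumption of this sketch rather than the method behind the figures above; the correlation and sample sizes are hypothetical.

```python
import math

def approx_p_value(r, n):
    """Approximate two-sided p-value for testing zero correlation.

    Uses the Fisher z transform: atanh(r) * sqrt(n - 3) is treated as a
    standard normal statistic under the null hypothesis.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

r = 0.77  # hypothetical correlation between repeated measurements
print("R-squared:", round(r ** 2, 2))
for n in (10, 20, 50, 100):
    print("n =", n, " approximate p =", approx_p_value(r, n))
```

The R-squared stays the same in every row while the p-value shrinks by many orders of magnitude, so a tiny p-value on its own is not evidence of good reproducibility.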
Clinical development of a marker


Correlate marker(s) with the outcome on a cohort of
patients
Many issues relate to bias



Case/control selection
Quality/Processing
Over-fitting/Lack of validation
What is bias?




A systematic difference between what we think we
observe and what we actually observe
The more “haphazard” the data collection process, the
more chances of bias creeping in
Buyer beware: Commercial Tissue Microarrays
Why is bias a problem?


Cannot be “quantified” (within a study)
Does not diminish with increasing sample sizes
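The second point can be illustrated with a simulation. The setup is hypothetical: a marker has true population mean 0, but a patient's tissue is more likely to be available when the marker value is high (a logistic selection rule invented for this sketch). The standard error shrinks as the sample grows, while the estimate stays away from the true value.

```python
import math
import random
import statistics

def mean_of_available_samples(n, rng):
    """Mean and standard error of a marker among n patients with tissue.

    Assumed setup: marker ~ N(0, 1) in the full population, and tissue is
    available with probability 1 / (1 + exp(-marker)), so high values are
    over-represented among the samples that get analyzed.
    """
    kept = []
    while len(kept) < n:
        x = rng.gauss(0.0, 1.0)
        if rng.random() < 1.0 / (1.0 + math.exp(-x)):
            kept.append(x)
    mean = statistics.fmean(kept)
    std_error = statistics.stdev(kept) / math.sqrt(n)
    return mean, std_error

rng = random.Random(0)
for n in (50, 500, 5000, 50000):
    mean, se = mean_of_available_samples(n, rng)
    print("n =", n, " estimate =", round(mean, 3), " standard error =", round(se, 4))
```

The true mean is 0 in this setup, so the confidence interval becomes narrower around the wrong value as n increases, and nothing inside the selected data reveals the size of the error.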
Double dipping



Use the same data to develop/fine-tune a marker (or model)
and evaluate its characteristics
Most obvious with multivariable analyses (gene signatures etc)
Might happen in seemingly innocuous circumstances



Choosing a cutpoint
Not reporting negative markers
VALIDATION!!!


“cross-validation”: statistical approaches that use the same data but
account for double-dipping
true validation:


repeat the study in a new but similar population
apply the “model” to a new dataset and test its prediction accuracy
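Double dipping through cutpoint selection, and the effect of checking the result on a new dataset, can be sketched with simulated data. Everything here is hypothetical: the marker and the outcome are generated independently, so the true difference between "high" and "low" marker groups is zero, and the candidate cutpoints are an arbitrary grid.

```python
import random
import statistics

CANDIDATE_CUTS = [i / 10 for i in range(2, 9)]  # hypothetical grid 0.2 ... 0.8

def group_difference(markers, outcomes, cut):
    """Absolute difference in mean outcome between marker-high and marker-low."""
    high = [y for x, y in zip(markers, outcomes) if x > cut]
    low = [y for x, y in zip(markers, outcomes) if x <= cut]
    return abs(statistics.fmean(high) - statistics.fmean(low))

def draw_dataset(rng, n):
    """Marker uniform on (0, 1); outcome is pure noise, unrelated to the marker."""
    markers = [rng.random() for _ in range(n)]
    outcomes = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return markers, outcomes

def one_experiment(rng, n=100):
    # Development data: pick the cutpoint that looks best on these patients.
    x_dev, y_dev = draw_dataset(rng, n)
    best_cut = max(CANDIDATE_CUTS, key=lambda c: group_difference(x_dev, y_dev, c))
    apparent = group_difference(x_dev, y_dev, best_cut)
    # Validation data: apply the same, now fixed, cutpoint to new patients.
    x_new, y_new = draw_dataset(rng, n)
    validated = group_difference(x_new, y_new, best_cut)
    return apparent, validated

rng = random.Random(7)
results = [one_experiment(rng) for _ in range(300)]
mean_apparent = statistics.fmean(a for a, _ in results)
mean_validated = statistics.fmean(v for _, v in results)
print("mean difference on the data used to pick the cutpoint:", round(mean_apparent, 3))
print("mean difference on new data with the same cutpoint:", round(mean_validated, 3))
```

Because absolute differences are reported, both averages are above zero, but the one computed on the data used to choose the cutpoint is systematically larger. That gap is the optimism introduced by double dipping.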
Be critical of your results

All sorts of biases crept in




Patients with tissue are unlikely to be a random sample
No real inclusion/exclusion criteria
Possibly looked at many markers, many subsets and many
thresholds
Build your marker into a clinical trial
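The concern about looking at many markers, subsets and thresholds can be put into numbers. The calculation below assumes that the tests are independent and that none of the markers has any real effect; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 20, 100):
    print(n_tests, "tests:", round(chance_of_false_positive(n_tests), 3))
```

Each marker, subset or threshold tried adds to the count of tests, so even a modest exploratory analysis can make a spurious "significant" finding more likely than not.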
Incorporating markers into clinical trials



Start as secondary endpoints in a Phase I or II trial
If Phase I, might be better to have an MTD-cohort
and limit the correlative studies to that cohort
If Phase II and an expensive/invasive marker, consider
a two-stage design where marker will be measured
only in the second stage