A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer
May 16, 2013
Levi Waldron
Supervisors: Curtis Huttenhower and Giovanni Parmigiani
Harvard School of Public Health
Department of Biostatistics
Dana Farber Cancer Institute
Biostatistics and Computational Biology
Predictive modeling for translational genomics
• Measure X_ij (gene expression, mutations, …)
• Predict Y_j (survival, treatment response, …)
Training + validation
• Cross-validation: split one dataset into training and test sets to estimate prediction accuracy
• Independent validation: train on one dataset, test on others (Dataset 2, Dataset 3)
– Needs a new cohort of patients
– Can use public data
Candidate methods: Lasso, Ridge, Elastic Net, Random Forests, Support Vector Machine, K Nearest Neighbors, Supervised PCA, Linear Discriminant Analysis, Boosting / Bagging, insert favorite method here
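The difference between the two validation strategies can be sketched in base R. This is a minimal illustration under stated assumptions, not the talk's analysis: the data are simulated, `lm()` stands in for any of the listed methods, and mean squared error (MSE) stands in for a survival accuracy measure; `simulate_cohort` and `kfold_cv_mse` are hypothetical helper names.

```r
set.seed(1)
p <- 5
beta <- c(1, -1, 0.5, 0, 0)   # hypothetical true effects of five features

# Simulate a cohort: features g1-g5 and a continuous outcome y
simulate_cohort <- function(n) {
  X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("g", 1:p)))
  data.frame(X, y = drop(X %*% beta) + rnorm(n))
}

# K-fold cross-validation: each sample is predicted by a model that never saw it
kfold_cv_mse <- function(data, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  sq_err <- numeric(nrow(data))
  for (f in seq_len(k)) {
    test <- folds == f
    fit <- lm(y ~ ., data = data[!test, ])
    sq_err[test] <- (data$y[test] - predict(fit, newdata = data[test, ]))^2
  }
  mean(sq_err)
}

dataset1 <- simulate_cohort(100)
dataset2 <- simulate_cohort(100)   # stands in for an independent cohort

cv_mse <- kfold_cv_mse(dataset1, k = 5)
fit <- lm(y ~ ., data = dataset1)   # train once on all of dataset 1
independent_mse <- mean((dataset2$y - predict(fit, newdata = dataset2))^2)
```

Cross-validation reuses a single cohort, so it cannot reveal problems shared by every sample in that cohort; evaluating `fit` on `dataset2` mimics testing in a new cohort.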
Prognostic gene signatures of ovarian cancer
Objectives:
1. Assess the reproducibility of published prognostic gene expression models
2. Evaluate published models using publicly available data
3. Improve on models using all publicly available data
4. Validate promising models in FFPE specimens from the GOG-218 bevacizumab phase III clinical trial
With Michael Birrer, MD (MGH)
23 ovarian cancer microarray studies
• Clinical annotation: each study's clinical table is curated into standardized variables, then verified by a machine syntax check and a human double check. For example:

Original, study 1:
ID       Debulk   Status
D2640    S        0

Original, study 2:
ID       ch1.3    Status
GSM123   opt      DOD

Curated:
sampleid   debulking    vital_status
D2640      suboptimal   living
GSM123     optimal      deceased
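The curation step can be sketched as a lookup from each study's own column names and codes to the curated vocabulary, followed by an automated check that every value is allowed. The two input rows are the slide's examples; the lookup tables, the `allowed` vocabulary, and the helpers `curate` and `syntax_check` are hypothetical, and the actual curatedOvarianData pipeline is not reproduced here.

```r
# Hypothetical curated vocabulary for two clinical variables
allowed <- list(debulking = c("optimal", "suboptimal"),
                vital_status = c("living", "deceased"))

# Map one study's columns and codes to the curated variables.
# cols: the study's column names for sampleid, debulking, vital_status
# maps: per-variable named vectors translating raw codes to curated values
curate <- function(raw, cols, maps) {
  recode <- function(var) unname(maps[[var]][as.character(raw[[cols[[var]]]])])
  data.frame(sampleid = as.character(raw[[cols[["sampleid"]]]]),
             debulking = recode("debulking"),
             vital_status = recode("vital_status"),
             stringsAsFactors = FALSE)
}

# Machine syntax check: TRUE where every curated value is in the vocabulary
# (unmapped raw codes become NA and therefore fail)
syntax_check <- function(curated) {
  curated$debulking %in% allowed$debulking &
    curated$vital_status %in% allowed$vital_status
}

study1 <- data.frame(ID = "D2640", Debulk = "S", Status = 0)
study2 <- data.frame(ID = "GSM123", ch1.3 = "opt", Status = "DOD")

curated <- rbind(
  curate(study1, c(sampleid = "ID", debulking = "Debulk", vital_status = "Status"),
         list(debulking = c(S = "suboptimal"), vital_status = c("0" = "living"))),
  curate(study2, c(sampleid = "ID", debulking = "ch1.3", vital_status = "Status"),
         list(debulking = c(opt = "optimal"), vital_status = c(DOD = "deceased")))
)
passes <- syntax_check(curated)
```

Rows that fail `syntax_check` would be the ones routed back for the human double check.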
• Expression data: download; if raw data are available and the platform is Affymetrix, re-normalize with (f)RMA
• Collapse probesets to genes, for example:

Probeset      Gene    GSM123   GSM124
204531_s_at   BRCA1   4.0      4.1
211851_x_at   BRCA1   5.0      6.0

becomes

Gene    GSM123   GSM124
BRCA1   5.0      6.0

• Automatically build the documented curatedOvarianData R package, available in Bioconductor (v2.12):
> source("http://bioconductor.org/biocLite.R")
> biocLite("curatedOvarianData")

B.F. Ganzfried, M. Riester, B. Haibe-Kains, T. Risch, S. Tyekucheva, I. Jazic, X. V. Wang, M. Ahmadifar, M. Birrer, G. Parmigiani, C. Huttenhower, L. Waldron. curatedOvarianData: Clinically Annotated Data for the Ovarian Cancer Transcriptome. Database (2013).
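Several probesets can map to one gene, as with the two BRCA1 probesets above. The slide does not state the collapsing rule; this sketch assumes a common one, keeping the probeset with the highest mean expression across samples, which reproduces the slide's example (211851_x_at is kept). `collapse_probesets` is a hypothetical helper.

```r
# Collapse probeset-level rows to one row per gene, keeping for each gene
# the probeset with the highest mean expression (assumed rule).
collapse_probesets <- function(expr, genes) {
  stopifnot(nrow(expr) == length(genes))
  row_means <- rowMeans(expr)
  keep <- vapply(split(seq_along(genes), genes), function(idx) {
    idx[which.max(row_means[idx])]   # row index of the best probeset for this gene
  }, integer(1))
  out <- expr[keep, , drop = FALSE]
  rownames(out) <- names(keep)       # gene symbols replace probeset IDs
  out
}

expr <- matrix(c(4.0, 4.1,
                 5.0, 6.0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("204531_s_at", "211851_x_at"), c("GSM123", "GSM124")))
collapsed <- collapse_probesets(expr, genes = c("BRCA1", "BRCA1"))
# collapsed has one row, BRCA1, with values 5.0 and 6.0 (from 211851_x_at)
```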
Meta-analysis overview
Literature review of prognostic models:
• 101 papers from PubMed search; five review papers
• Inclusion criteria: training sample size > 40; focus on late-stage serous tumors; multivariate model; continuous risk score; claims to predict survival; possible to reproduce the model
• 14 prediction models implemented in the survHD Bioconductor package, with 100 pages of documentation
Validation data, curatedOvarianData:
• Standardized clinical annotation and gene IDs; 23 studies, 2,908 samples
• Inclusion criteria: sample size > 40; primary tumors; overall survival available; more than 15 events (deaths); late-stage, high-grade tumors; serous subtype
• 10 datasets, 1,386 samples
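The validation-data inclusion criteria combine sample-level filters (primary, late-stage, high-grade, serous tumors with overall survival available) with dataset-level thresholds (more than 40 samples and more than 15 deaths). A minimal base-R sketch on three invented studies; the column names, counts, and the `make_study` helper are hypothetical.

```r
# Hypothetical sample-level annotation: one row per tumor sample
make_study <- function(study, n, n_serous, n_deaths) {
  data.frame(study = study,
             primary = TRUE,
             late_high_grade = TRUE,
             serous = seq_len(n) <= n_serous,
             os_available = TRUE,
             death = as.integer(seq_len(n) <= n_deaths),
             stringsAsFactors = FALSE)
}
samples <- rbind(make_study("S1", 60, 60, 30),   # meets both thresholds
                 make_study("S2", 50, 30, 25),   # only 30 serous samples remain
                 make_study("S3", 50, 50, 10))   # too few deaths

# Sample-level criteria first ...
eligible <- with(samples, primary & late_high_grade & serous & os_available)
kept <- samples[eligible, ]

# ... then dataset-level thresholds on what remains
n_samples <- table(kept$study)
n_events <- tapply(kept$death, kept$study, sum)
included <- names(n_samples)[n_samples > 40 & n_events > 15]
```

Applying the sample filters before the thresholds matters: study S2 has 50 samples in total but only 30 eligible ones, so it is excluded.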
Assessment of prognostic signatures
Validation statistics: 14 models in 10 datasets
C-index = Pr(g(Z1) > g(Z2) | T2 > T1)
• T1, T2 = times to death of two patients
• g(Z1), g(Z2) = predicted risk scores
• C = 0.5 is the expectation for random prediction
• C = 1 if the exact order of all deaths is predicted
[Figure: forest plot of the C-index by study for the 14 prognostic signatures in 10 microarray datasets; Kaplan-Meier estimates of survival over time]
L. Waldron et al. A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. Submitted.
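The C-index defined above can be estimated by counting patient pairs. This sketch follows Harrell's concordance for right-censored data, an assumption since the slide gives only the definition: a pair is usable when the patient with the shorter time died, concordant when that patient also has the higher risk score, and tied scores count one half. Pairs with tied times are skipped in this simplified version; `c_index` is a hypothetical helper.

```r
# Harrell-style concordance index: higher risk should mean earlier death.
c_index <- function(time, event, risk) {
  concordant <- 0
  usable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (event[i] != 1) next           # patient i must have an observed death
    for (j in seq_len(n)) {
      if (time[j] <= time[i]) next    # j must outlive i's death (tied times skipped)
      usable <- usable + 1
      if (risk[i] > risk[j]) {
        concordant <- concordant + 1
      } else if (risk[i] == risk[j]) {
        concordant <- concordant + 0.5
      }
    }
  }
  concordant / usable
}

time  <- c(2, 5, 7, 10)          # invented follow-up times
event <- c(1, 1, 0, 1)           # 1 = death observed, 0 = censored
risk  <- c(0.9, 0.6, 0.4, 0.1)   # invented risk scores g(Z)
c_index(time, event, risk)       # 1: every usable pair is correctly ordered
c_index(time, event, -risk)      # 0: every usable pair is reversed
```

A constant risk score gives 0.5, matching the expectation for random prediction.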
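A forest plot summarizes one model's C-index across datasets, often together with a pooled estimate. The pooling method is not stated on the slide; this sketch assumes an inverse-variance random-effects model with the DerSimonian-Laird estimate of between-dataset variance, and the input estimates and standard errors are invented.

```r
# DerSimonian-Laird random-effects pooling of per-dataset estimates.
pool_dl <- function(est, se) {
  w <- 1 / se^2                                  # fixed-effect (inverse-variance) weights
  fixed <- sum(w * est) / sum(w)
  q <- sum(w * (est - fixed)^2)                  # Cochran's Q heterogeneity statistic
  c_dl <- sum(w) - sum(w^2) / sum(w)
  tau2 <- max(0, (q - (length(est) - 1)) / c_dl) # between-dataset variance
  w_re <- 1 / (se^2 + tau2)                      # random-effects weights
  pooled <- sum(w_re * est) / sum(w_re)
  se_pooled <- sqrt(1 / sum(w_re))
  list(estimate = pooled, se = se_pooled, tau2 = tau2,
       ci95 = pooled + c(-1, 1) * qnorm(0.975) * se_pooled)
}

# Invented C-index estimates and standard errors for one model in five datasets
c_est <- c(0.62, 0.58, 0.65, 0.55, 0.60)
c_se  <- c(0.03, 0.02, 0.04, 0.03, 0.02)
res <- pool_dl(c_est, c_se)
```

When the datasets disagree more than their standard errors allow, `tau2` grows and the pooled confidence interval widens accordingly.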
Assessment of prognostic models
• Cancer Genome Atlas Research Network. Nature. 2011;474(7353):609-15. Integrated genomic analyses of ovarian carcinoma.
• Bonome et al. Cancer Res. 2008;68(13):5478-86. A gene signature predicting for survival in suboptimally debulked patients with ovarian cancer.
A little gene overlap corresponds to substantial risk-score similarity
[Figure: correlations between the models' risk scores vs. gene overlap between their signatures]
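One mechanism consistent with this observation is co-expression: signatures that share few genes can still track the same underlying biology. The sketch below illustrates that assumption with simulated expression in which genes g1-g10 share a common latent factor, a Jaccard index as the overlap measure, and the mean of each signature's genes as its risk score; none of these choices comes from the talk.

```r
set.seed(2)
n_patients <- 50
genes <- paste0("g", 1:20)
expr <- matrix(rnorm(20 * n_patients), nrow = 20, dimnames = list(genes, NULL))
# Assumed co-expression module: genes g1-g10 share one latent factor
latent <- rnorm(n_patients)
expr[1:10, ] <- expr[1:10, ] + 2 * matrix(latent, 10, n_patients, byrow = TRUE)

jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
risk_score <- function(sig) colMeans(expr[sig, , drop = FALSE])  # mean of signature genes

sig_a <- c("g1", "g2", "g3", "g4")
sig_b <- c("g4", "g5", "g6", "g7", "g8")
sig_c <- c("g11", "g12", "g13", "g14")          # outside the module

overlap <- jaccard(sig_a, sig_b)                # 1 shared gene out of 8 = 0.125
similarity <- cor(risk_score(sig_a), risk_score(sig_b))
similarity_outside <- cor(risk_score(sig_a), risk_score(sig_c))
```

Signatures A and B share one gene, yet their scores are highly correlated because both draw on the same module; signature C, with no overlap and no shared module, is not.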
Assessment of prognostic models
• Dressman et al. J Clin Oncol. 2007;25(5):517-25. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer.
• Baggerly et al. J Clin Oncol. 2008;26(7):1186-7. Run batch effects potentially compromise the usefulness of genomic signatures for ovarian cancer.
• Dressman et al. J Clin Oncol. 2012;30(6):678. Retraction.
Assessment of prognostic models
Conclusions:
• Validation datasets can be biased
• Most models make better predictions than random
• Large consortium studies performed best
• None of these models is ready for the clinic
Assessment of gene signatures (not models)
• Start with a signature defined as a list of genes
• Fit a simple prediction algorithm (β = ±1)
• Compute a “leave-one-in” matrix of C-statistics
• Repeat with random gene sets

Leave-one-in matrix (rows: training sets; columns: test sets; CV = cross-validation within the same dataset):

Training \ Test   1     2     3     4     5
1                 CV    Z12   Z13   Z14   Z15
2                 Z21   CV    Z23   Z24   Z25
3                 Z31   Z32   CV    Z34   Z35
4                 Z41   Z42   Z43   CV    Z45
5                 Z51   Z52   Z53   Z54   CV
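The leave-one-in matrix can be sketched as follows. Coefficients are fixed at +1 or -1 in the spirit of the sign-averaging ("Mas-o-menos") method listed among the publications, but several details are assumptions: the datasets are simulated, each gene's sign comes from its correlation with follow-up time (ignoring censoring, a simplification where a univariate survival model would be more appropriate), and the diagonal, where within-dataset cross-validation would go, is left as NA. All helper names are hypothetical.

```r
set.seed(6)

# Vectorized Harrell-style C-index; usable[i, j]: i died before j's time
c_index <- function(time, event, risk) {
  usable <- outer(time, time, "<") & (event == 1)
  conc <- outer(risk, risk, ">") + 0.5 * outer(risk, risk, "==")
  sum(conc[usable]) / sum(usable)
}

simulate_dataset <- function(n, n_genes = 10) {
  expr <- matrix(rnorm(n_genes * n), n_genes, n,
                 dimnames = list(paste0("g", seq_len(n_genes)), NULL))
  lp <- colSums(expr[1:3, , drop = FALSE])   # hypothetical: g1-g3 raise the hazard
  death_time <- rexp(n, rate = 0.1 * exp(0.5 * lp))
  censor_time <- rexp(n, rate = 0.05)
  list(expr = expr, time = pmin(death_time, censor_time),
       event = as.integer(death_time <= censor_time))
}

# +1 if higher expression goes with shorter follow-up (higher risk), else -1
fit_signs <- function(expr, time) -sign(apply(expr, 1, function(g) cor(g, time)))

masomenos_score <- function(expr, signs) {
  z <- t(scale(t(expr)))              # standardize each gene within the dataset
  colSums(signs * z) / length(signs)  # average of signed, standardized genes
}

leave_one_in_matrix <- function(datasets) {
  k <- length(datasets)
  m <- matrix(NA_real_, k, k,
              dimnames = list(train = names(datasets), test = names(datasets)))
  for (i in seq_len(k)) {
    signs <- fit_signs(datasets[[i]]$expr, datasets[[i]]$time)
    for (j in seq_len(k)) {
      if (i == j) next                # diagonal: within-dataset CV, not shown
      d <- datasets[[j]]
      m[i, j] <- c_index(d$time, d$event, masomenos_score(d$expr, signs))
    }
  }
  m
}

datasets <- setNames(lapply(c(80, 120, 100, 90, 150), simulate_dataset),
                     paste0("D", 1:5))
cmat <- leave_one_in_matrix(datasets)
round(cmat, 2)
```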
Assessment of gene signatures
About half of the gene signatures provide prognostic “value added”: they outperform 97.5% of random gene signatures.
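The comparison with random signatures amounts to an empirical percentile: score the signature, score many random gene sets of the same size, and report the fraction it beats. In this sketch the performance statistic is the absolute correlation between a mean-expression score and a simulated continuous outcome, a stand-in for the C-statistics used in the talk; the data and the helpers `signature_stat` and `random_set_percentile` are hypothetical.

```r
set.seed(3)
n_genes <- 200
n <- 80
expr <- matrix(rnorm(n_genes * n), n_genes, n,
               dimnames = list(paste0("g", seq_len(n_genes)), NULL))
# Simulated outcome driven by genes g1-g5 only
outcome <- colMeans(expr[1:5, ]) + rnorm(n, sd = 0.5)

# Stand-in performance statistic for a gene set
signature_stat <- function(genes) {
  abs(cor(colMeans(expr[genes, , drop = FALSE]), outcome))
}

# Fraction of same-size random gene sets that the signature outperforms
random_set_percentile <- function(sig_genes, all_genes, stat_fun, n_random = 1000) {
  observed <- stat_fun(sig_genes)
  random <- replicate(n_random, stat_fun(sample(all_genes, length(sig_genes))))
  mean(observed > random)
}

pct_signal <- random_set_percentile(paste0("g", 1:5), rownames(expr), signature_stat)
pct_noise  <- random_set_percentile(paste0("g", 101:105), rownames(expr), signature_stat)
adds_value <- pct_signal > 0.975   # the 97.5% threshold from the slide
```

A signature built from pure noise genes lands somewhere inside the random distribution, so it crosses the 97.5% threshold only by chance.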
Prediction of surgical debulkability
• Standard treatment includes surgical debulking, but debulking is suboptimal in ~50% of cases
• What if we could predict suboptimal debulking from the biopsy?
M. Riester, W. Wei, L. Waldron, A. C. Culhane, L. Trippa, F. Michor, C. Huttenhower, G. Parmigiani, M. Birrer. Risk prediction for late-stage ovarian cancer by meta-analysis of 1,622 patient samples: Biologic and Clinical Correlations.
Validation of a meta-analysis discovery: prediction of suboptimal debulking
• Stage 1, public data: 200-gene signature
• qRT-PCR: 8-gene signature in 78 new specimens from the Bonome et al. study
• Immunohistochemistry: 3-protein signature in 179 new specimens from a tissue microarray
[Figure: number of cases by POSTN immunohistochemistry score (-, +, ++, +++)]
For comparison, AUC was ~0.6 in microarray validation.
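The AUC quoted for debulking prediction is the probability that a randomly chosen suboptimally debulked case scores higher than a randomly chosen optimally debulked case (ties count one half), so 0.5 is chance and 1 is perfect separation. A minimal base-R version; the scores, labels (1 = suboptimal), and the `auc` helper are invented for illustration.

```r
# AUC via the Mann-Whitney pairwise definition (ties count 1/2).
auc <- function(score, label) {
  pos <- score[label == 1]   # e.g. suboptimally debulked cases
  neg <- score[label == 0]   # e.g. optimally debulked cases
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

score <- c(0.3, 0.6, 0.7, 0.2, 0.9, 0.4)   # invented predicted scores
label <- c(1, 1, 0, 0, 1, 0)               # invented outcomes, 1 = suboptimal
auc(score, label)                          # 6 of 9 pairs correctly ordered = 0.667
```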
Outlook: Meta-analysis and Validation
• Meta-analysis for prediction modeling works
– Provides sample size
– Identifies and mitigates dataset-specific bias
• qRT-PCR and protein assays can dramatically
improve prediction accuracy
• Model testing in meta-analysis by:
– “leave-one-dataset-in” cross-validation
– “leave-one-dataset-out” cross-validation
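Leave-one-dataset-out cross-validation holds out each dataset in turn, trains on the remaining datasets pooled together, and measures accuracy on the held-out one. A minimal sketch under stated assumptions: four simulated studies with study-specific offsets, `lm()` as the model, and the correlation between predicted and observed outcomes as the accuracy measure; `make_study` and `lodo_cv` are hypothetical.

```r
set.seed(4)
make_study <- function(n, offset) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  data.frame(x1 = x1, x2 = x2, y = x1 - 0.5 * x2 + offset + rnorm(n))
}
studies <- list(A = make_study(60, 0), B = make_study(80, 0.3),
                C = make_study(50, -0.2), D = make_study(70, 0.1))

lodo_cv <- function(studies) {
  vapply(names(studies), function(held_out) {
    train <- do.call(rbind, studies[names(studies) != held_out])  # pool the rest
    fit <- lm(y ~ x1 + x2, data = train)
    test <- studies[[held_out]]
    cor(predict(fit, newdata = test), test$y)   # accuracy in the held-out study
  }, numeric(1))
}

lodo <- lodo_cv(studies)   # one accuracy value per held-out study
```

The "leave-one-dataset-in" variant swaps the roles: train on a single dataset and test on each of the others, which fills the off-diagonal cells of the matrix shown earlier.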
Reproducible analysis
Thank you
Giovanni Parmigiani lab
Markus Riester, Dave Zhao, Cristian Tomasetti, Emmanuele Mazzola, Jie Ding,
Svitlana Tyekucheva, Victoria Wang, Ina Jazic, Ben Ganzfried, Romi Magori-Cohen
Curtis Huttenhower lab
Nicola Segata, Tim Tickle, Xochitl Morgan, Daniela Boernigen, Eric Franzosa, Brian
Palmer, Joseph Moon, Emma Schwager, Jim Kaminski, Craig Bielski, Vagheesh
Narasimhan
MGH – Boston
Michael Birrer
Dana-Farber Cancer Institute
Lorenzo Trippa
University of Montreal
Benjamin Haibe-Kains
HR increases with training sample size for most test sets
RNA-seq vs. microarray validation
TCGA validation dataset
Manuscripts and publications
1. B.F. Ganzfried* and M. Riester*, B. Haibe-Kains, T. Risch, S. Tyekucheva, I. Jazic, X. V. Wang, M. Ahmadifar, M. Birrer, G. Parmigiani, C. Huttenhower, L. Waldron. curatedOvarianData: Clinically Annotated Data for the Ovarian Cancer Transcriptome. Database (2013).
2. L. Waldron, B. Haibe-Kains, A. C. Culhane, M. Riester, J. Ding, V. Wang, S. Tyekucheva, C. Bernau, T. Risch, B. Ganzfried, C. Huttenhower, M. Birrer, G. Parmigiani. A comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer (submitted).
3. M. Riester, W. Wei, L. Waldron, A. C. Culhane, L. Trippa, F. Michor, C. Huttenhower, G. Parmigiani, M. Birrer. Risk prediction for late-stage ovarian cancer by meta-analysis of 1,622 patient samples: Biologic and Clinical Correlations (submitted).
4. D. Zhao, C. Huttenhower, G. Parmigiani, L. Waldron. Mas-o-menos: a simple sign average method for discrimination in genomic data analysis (submitted, preprint at http://biostats.bepress.com/harvardbiostat/paper158/).
5. L. Trippa, L. Waldron, C. Huttenhower, G. Parmigiani. Cross-study validation of prediction methods (submitted).