Technical approaches for biomarker discovery

Canadian Bioinformatics Workshops
www.bioinformatics.ca
Anna Lapuk, PhD
Vancouver Prostate Centre
Module 3: Clinical genomics
and survival analysis
[Figure: disease characterization based on whole-genome/whole-transcriptome data]
“Clinical data”
ID
Race
Family history (yes/no)
Nodal status (yes/no; number of nodes involved)
Radiation
Chemo
Hormone therapy
Protein IHC
Stage
Size
Age at diagnosis
Estrogen receptor level
Progesterone level
SBR grade
Overall outcome (dead/alive)
Overall survival time
Disease specific outcome (dead/alive)
Disease specific survival time
Recurrence status (yes/no)
Time to recurrence
Distant recurrence status (yes/no)
Time to distant recurrence
Survival times: time to a given end point

Survival analysis
Goal: estimate the probability of an individual surviving for a given time period (e.g. one year)
Technique: Kaplan-Meier survival curve, life table

Goal: compare the survival experience of two different groups of individuals (e.g. drug vs placebo)
Technique: log-rank test (comparison of different K-M curves)

Goal: detect clinical/genomic/epidemiologic variables that contribute to the risk (are associated with poor outcome)
Technique: multivariate (or univariate) Cox regression model
• Survival time is the time from a fixed starting point to an end point:
Starting point     End point
Surgery            Death/Recurrence/Relapse
Diagnosis          Death/Recurrence/Relapse
Treatment          Death/Recurrence/Relapse
• The event of interest is almost never observed in all subjects (censoring of data)
• Special analytical techniques are therefore needed
• Censored observations arise whenever the dependent variable of interest represents the time to a terminal event and the duration of the study is limited in time.
• A censored observation is an incomplete observation: the event of interest had not occurred by the time of the analysis.
Event of interest             Censored observation
Death from the disease        Still alive
Survival of marriage          Still married
Drop-out time from school     Still in school
• Type I and Type II censoring (study time fixed / proportion of subjects fixed)
• Right and left censoring
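Right censoring under a fixed study length (Type I) can be pictured as two values per subject: the observed time and a flag saying whether the event was actually seen. The sketch below is illustrative only; the function name `censor_at_study_end` and the example times are hypothetical and do not come from the slides.

```python
from typing import List, Tuple


def censor_at_study_end(event_times: List[float],
                        study_end: float) -> List[Tuple[float, bool]]:
    """Return (observed_time, event_observed) pairs under Type I right censoring.

    A subject whose event happens after the study ends is recorded at the
    study end with event_observed=False (still alive, still married, ...).
    """
    records = []
    for t in event_times:
        if t <= study_end:
            records.append((t, True))           # event seen during the study
        else:
            records.append((study_end, False))  # censored at the end of the study
    return records


# Hypothetical true times to the event, in months; the study lasts 6 months.
true_times = [2.0, 5.5, 9.0, 14.0]
records = censor_at_study_end(true_times, study_end=6.0)
print(records)
```

The last two subjects contribute the information "event-free for at least 6 months", which is why they cannot simply be dropped or treated as events.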
[Figure: follow-up time divided into monthly intervals with interval survival probabilities p1, p2, p3, p4]
• The survival probability for a given length of time can be calculated by considering time in intervals.
• The probability of surviving to the end of month 2 is the probability of surviving month 1 multiplied by the probability of surviving month 2, given that the patient has survived month 1 (a conditional probability).
Survival probability = p1 x p2 x p3 x p4 x ... x pj
where pj is the probability of surviving month j among those still known to be alive after (j-1) months.
• In practice, the time intervals are chosen so that each contains exactly one case.
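The product of conditional probabilities described above can be sketched as a small Kaplan-Meier (product-limit) calculation. This is a minimal illustration, not the workshop's code: the function names and the eight example patients are hypothetical, and a subject censored at time t is assumed to be still at risk at t.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time where an event occurred."""
    failures_at = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(failures_at):
        r = sum(1 for u in times if u >= t)  # r: still at risk just before t
        f = failures_at[t]                   # f: failures at t
        survival *= (r - f) / r              # multiply by conditional probability
        curve.append((t, survival))
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Read the step function: S(t) is the value at the last event time <= t."""
    s = 1.0
    for event_time, value in curve:
        if event_time > t:
            break
        s = value
    return s


# Hypothetical follow-up times (months); False marks a censored observation.
times = [1, 2, 2, 3, 4, 5, 6, 7]
events = [True, True, False, True, False, True, False, True]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S = {s:.3f}")
print("S(2.5) =", round(survival_at(curve, 2.5), 3))
```

Censored subjects never reduce the curve themselves; they only shrink the number at risk for later intervals.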
[Figure: Kaplan-Meier survival curve. y-axis: survival probability (0 to 1); x-axis: time (months, 0 to 7); censored observations are marked on the curve.]
r: number still at risk
f: failures (reached the end point)
[Figure: Kaplan-Meier survival curve with censored observations marked. y-axis: survival probability (0 to 1); x-axis: time (months, 0 to 7).]
What is the probability that a patient survives 2.5 months?
[Figure: Kaplan-Meier survival curves for treated and untreated patients. y-axis: survival probability (0 to 1); x-axis: time (months, 0 to 7).]
Are the survival experiences significantly different?
The log-rank test is a non-parametric method to test the null hypothesis that the compared groups are samples from the same population with regard to survival experience.
(It does not tell how different the groups are.)
[Figure: Kaplan-Meier survival curves for treated and untreated patients. y-axis: survival probability (0 to 1); x-axis: time (months, 0 to 7).]
Compare the proportions in the two groups at every time interval and summarize the differences across intervals (similar to a chi-square test).
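Chi-square: X² = Σ (O − E)² / E, summed over the categories
Log-rank: X² = [Σj (Oj − Ej)]² / Σj Vj, summed over the k time intervals (j = 1...k)
Oj: observed number of events in one group in interval j
Ej: number expected in that group if the null hypothesis is true
Vj: variance of (Oj − Ej)
Then compare X² with the χ2 distribution. A chi-square test over k categories has (k-1) degrees of freedom; a log-rank comparison of two groups has 1 degree of freedom.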
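To make the O, E and V bookkeeping concrete, here is a minimal two-group log-rank sketch in plain Python. It assumes the usual hypergeometric variance for each event time and a χ2 reference distribution with 1 degree of freedom; the function name `logrank_test` and the six example patients are hypothetical.

```python
import math
from typing import List, Tuple


def logrank_test(times_a: List[float], events_a: List[bool],
                 times_b: List[float], events_b: List[bool]) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    sum_o_minus_e = 0.0  # sum over intervals of (O - E) for group A
    sum_v = 0.0          # sum over intervals of the variance of (O - E)
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)  # at risk in A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in B
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d_b = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        sum_o_minus_e += d_a - d * n_a / n       # observed minus expected in A
        if n > 1:
            sum_v += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if sum_v == 0.0:
        raise ValueError("no interval contains subjects at risk in both groups")
    chi2 = sum_o_minus_e ** 2 / sum_v
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical survival times in months; every event is observed here.
treated_times, treated_events = [3, 5, 7], [True, True, True]
untreated_times, untreated_events = [1, 2, 4], [True, True, True]

chi2, p_value = logrank_test(treated_times, treated_events,
                             untreated_times, untreated_events)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

With only three patients per group the statistic is small, which illustrates that the test answers "are the curves different?" and not "by how much?".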
Hazard ratio (tells how different)
Measures relative survival in two groups based on the complete period studied.
R = 0.43: the relative risk (hazard) of poor outcome under the condition of group 1 is 43% of that of group 2.
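One common simple estimate of the hazard ratio reuses the observed (O) and expected (E) event counts from the log-rank calculation: R = (O1/E1) / (O2/E2). Whether the slide used exactly this estimator is an assumption, and the counts below are hypothetical values chosen only to give a ratio close to the quoted 0.43.

```python
def hazard_ratio(observed_1: float, expected_1: float,
                 observed_2: float, expected_2: float) -> float:
    """Estimate the hazard ratio of group 1 relative to group 2 from O and E totals."""
    return (observed_1 / expected_1) / (observed_2 / expected_2)


# Hypothetical totals over the whole study period.
r = hazard_ratio(observed_1=6, expected_1=10.0, observed_2=14, expected_2=10.0)
print(f"R = {r:.2f}")
```

A value below 1 means group 1 experienced fewer events than expected relative to group 2 over the complete period.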
Cox regression investigates the effect of several variables on survival experience.
It is a multivariate proportional hazards regression model.
Cox model: h(t) = h0(t) x exp(b1X1 + b2X2 + ... + bpXp)
• X1...Xp: independent variables of interest
• b1...bp: regression coefficients to be estimated
• h0(t): baseline (underlying) hazard function
• Assumption: the effect of the variables is constant over time and additive in a particular scale (the log hazard)
• As with K-M, the hazard function h(t) is the risk of dying at time t, given survival up to that time
• Cumulative hazard function: H(t) = H0(t) x exp(b1X1 + b2X2 + ... + bpXp)
• H0(t): cumulative baseline (underlying) hazard function
• The probability of surviving to time t is
S(t) = exp[-H(t)]
so this probability can be estimated for every individual with given values of the variables in the model.
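As a worked illustration of S(t) = exp[-H(t)], the sketch below scales a baseline cumulative hazard by exp(b1X1 + ... + bpXp) for one individual. The function name `cox_survival`, the baseline value 0.2, and the coefficients and covariates are all hypothetical numbers chosen for the example.

```python
import math
from typing import Sequence


def cox_survival(baseline_cumulative_hazard: float,
                 coefficients: Sequence[float],
                 covariates: Sequence[float]) -> float:
    """Probability of surviving to time t under a proportional hazards model.

    baseline_cumulative_hazard is H0(t) evaluated at the time of interest.
    """
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    cumulative_hazard = baseline_cumulative_hazard * math.exp(linear_predictor)
    return math.exp(-cumulative_hazard)


# Hypothetical model with two covariates and H0(t) = 0.2 at the chosen time.
b = [0.5, -0.1]
patient = [1.0, 3.0]
reference = [0.0, 0.0]  # all covariates zero: the baseline individual

s_patient = cox_survival(0.2, b, patient)
s_reference = cox_survival(0.2, b, reference)
print(f"patient:   S(t) = {s_patient:.3f}")
print(f"reference: S(t) = {s_reference:.3f}")
```

The patient's linear predictor is positive, so the cumulative hazard is larger than baseline and the survival probability is lower than for the reference individual.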
Cox regression model fitted to data from the PBC trial of azathioprine vs placebo (n=216)
Variable              Regression coef (b)   SE(b)     exp(b)
Serum bilirubin       2.510                 0.316     12.31
Age                   0.00690               0.00162   1.01
Cirrhosis             0.879                 0.216     2.41
Serum albumin         -0.0504               0.0181    0.95
Central cholestasis   0.679                 0.275     1.97
Therapy               0.52                  0.207     1.68
• Coefficient:
  – Sign: positive or negative association with poor survival
  – Magnitude: the increase in log hazard for an increase of 1 in the value of the covariate
Altman D, 1991
Cox regression model fitted to data from the PBC trial of azathioprine vs placebo (n=216)
Variable              Regression coef (b)   SE(b)     exp(b)   Hazard after an increase of 1 in the variable (% of original)
Serum bilirubin       2.510                 0.316     12.31    1231%
Age                   0.00690               0.00162   1.01     101%
Cirrhosis             0.879                 0.216     2.41     241%
Serum albumin         -0.0504               0.0181    0.95     95%
Central cholestasis   0.679                 0.275     1.97     197%
Therapy               0.52                  0.207     1.68     168%
• Coefficient:
  – Sign: positive or negative association with poor survival
  – Magnitude: the increase in log hazard for an increase of 1 in the value of the covariate. If the value changes by 1, the hazard is multiplied by exp(b).
Modified from Altman D, 1991
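The exp(b) column can be recomputed directly from the coefficients in the table. The sketch below does that and also adds an approximate 95% confidence interval, exp(b ± 1.96 x SE(b)); the interval is an addition for illustration (a normal approximation that the slides do not show), and the names `pbc_model` and `hazard_ratio_with_ci` are hypothetical.

```python
import math
from typing import Dict, Tuple

# (regression coefficient b, standard error SE(b)) copied from the table above.
pbc_model: Dict[str, Tuple[float, float]] = {
    "Serum bilirubin": (2.510, 0.316),
    "Age": (0.00690, 0.00162),
    "Cirrhosis": (0.879, 0.216),
    "Serum albumin": (-0.0504, 0.0181),
    "Central cholestasis": (0.679, 0.275),
    "Therapy": (0.52, 0.207),
}


def hazard_ratio_with_ci(b: float, se: float,
                         z: float = 1.96) -> Tuple[float, float, float]:
    """Return (exp(b), lower, upper) using a normal approximation on the log scale."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


results = {name: hazard_ratio_with_ci(b, se) for name, (b, se) in pbc_model.items()}
for name, (hr, low, high) in results.items():
    print(f"{name:20s} exp(b) = {hr:6.2f}  approx. 95% CI {low:.2f} to {high:.2f}")
```

Because the printed coefficients are rounded, a recomputed exp(b) can differ from the table in the last digit. A negative coefficient, as for serum albumin, gives exp(b) below 1, meaning the hazard falls as the variable increases.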
Altman D, 1991
• Clinical data are a highly important component and are intrinsically different from genomic/transcriptomic data.
• Survival data are a special type of data requiring special methodology.
• Main applications of survival analysis:
– Estimating the survival probability of a patient for a given length of time under given circumstances (Kaplan-Meier survival curve).
– Comparing the survival experiences of groups of patients, e.g. is the drug working? (log-rank test)
– Investigating risk factors contributing to the outcome, to make a prognosis for a given patient and choose appropriate therapy (Cox regression).
References:
• Statistics for Medical Research. Altman DG. Chapman & Hall/CRC; 1991.
• Pharmacogenetics and pharmacogenomics: development, science, and translation. Weinshilboum RM, Wang L. Annu Rev Genomics Hum Genet. 2006;7:223-45. PMID: 16948615
• Pharmacogenomics: candidate gene identification, functional validation and mechanisms. Wang L, Weinshilboum RM. Hum Mol Genet. 2008 Oct 15;17(R2):R174-9. PMID: 18852207
• End-sequence profiling: sequence-based analysis of aberrant genomes. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk A, Kuo WL, Magrane G, De Jong P, Gray JW, Collins C. Proc Natl Acad Sci U S A. 2003 Jun 24;100(13):7696-701. PMID: 12788976
• A review of trastuzumab-based therapy in patients with HER2-positive metastatic breast cancer. Church DN, Price CGA. Clinical Medicine: Therapeutics. 2009;1:557-570
Other useful references:
• The hallmarks of cancer. Hanahan D, Weinberg RA. Cell. 2000 Jan 7;100(1):57-70. PMID: 10647931
• Aberrant and alternative splicing in cancer. Venables JP. Cancer Res. 2004 Nov 1;64(21):7647-54. PMID: 15520162