Transcript Slide 1

Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 8, Part II
Clinical Data Integration and
Survival Analysis
Anna Lapuk, PhD
Module 8 Part II overview
– Clinical data and survival analysis theory
– Lab on survival analysis
Module 8: Clinical Data Integration and Survival Analysis
bioinformatics.ca
Analytic techniques for biomarker discovery
Disease characterization
Whole genome/whole transcriptome data
“Clinical data”
Clinical data (variables)
ID
race
family history (yes/no)
Nodal status (yes/no; number of nodes involved)
Radiation
Chemo
Hormone therapy
Protein IHC
Stage
Size
Age at diagnosis
Estrogen receptor level
Progesterone level
SBR grade
Overall outcome (dead/alive)
Overall survival time
Disease specific outcome (dead/alive)
Disease specific survival time
Recurrence status (yes/no)
Time to recurrence
Distant recurrence status (yes/no)
Time to distant recurrence
Survival times (time to a given end point) are analyzed with survival analysis
Survival analysis
Goal: Estimate the probability of an individual surviving for a given time period (e.g. one year)
Technique: Kaplan-Meier survival curve, life table
Goal: Compare the survival experience of two different groups of individuals (e.g. drug/placebo)
Technique: Logrank test (comparison of different KM curves)
Goal: Detect clinical/genomic/epidemiologic variables that contribute to the risk (are associated with poor outcome)
Technique: Multivariate (univariate) Cox regression model
Survival data
• Survival time is the time from a fixed starting point to an end point
Starting point    End point
Surgery           Death/Recurrence/Relapse
Diagnosis         Death/Recurrence/Relapse
Treatment         Death/Recurrence/Relapse
• The event of interest is almost never observed in all subjects (censoring of data)
• Special analytical techniques are therefore needed
Censored observations
• Arise whenever the dependent variable of interest represents
the time to a terminal event, and the duration of the study is
limited in time.
• Incomplete observation: the event of interest had not occurred by the time of the analysis.
Event of interest            Censored observation
Death from the disease       Still alive
Survival of marriage         Still married
Drop-out time from school    Still in school
• Type I and II censoring (time fixed/proportion of subjects
fixed)
• Right and left censoring
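The idea of a right-censored observation under a fixed study duration (Type I censoring) can be sketched in a few lines of Python. This is only an illustration: the function name `to_survival_record` and the example values are hypothetical and do not come from the slides.

```python
def to_survival_record(event_time, study_end):
    """Turn one subject's follow-up into a (time, event) pair.

    event_time: months from the starting point to the event of interest,
                or None if the event was never observed (hypothetical input).
    study_end:  fixed duration of the study in months (Type I censoring).
    Returns (observed_time, event) where event is 1 if the end point was
    reached within the study and 0 if the observation is right-censored.
    """
    if event_time is not None and event_time <= study_end:
        return event_time, 1
    # Event not seen within the study window: we only know the subject
    # was still event-free at study_end.
    return study_end, 0


# Hypothetical follow-up of four patients in a 12-month study.
raw = [5, None, 20, 12]
records = [to_survival_record(t, study_end=12) for t in raw]
print(records)
```

Each record keeps both the observed time and a flag saying whether that time is a real event or only a lower bound, which is the form that survival methods expect.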
Types of censoring
Survival time and probability
[Figure: timeline divided into intervals with survival probabilities p1, p2, p3, p4]
• Survival probability for a given length of time can be calculated by considering time in intervals.
• The probability of surviving to the end of month 2 is the probability of surviving month 1 multiplied by the probability of surviving month 2, provided that the patient has survived month 1 (conditional probability)
Survival probability = p1 x p2 x p3 x p4 x ... x pj
pj is the probability of surviving month j for those still known to be alive after (j-1) months.
• In practice, each time interval contains exactly one case.
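The product of conditional probabilities can be checked with a short Python sketch. The monthly probabilities below are made-up numbers for illustration, and `cumulative_survival` is a hypothetical helper name.

```python
from itertools import accumulate
from operator import mul


def cumulative_survival(interval_probs):
    """Return the running products p1, p1*p2, p1*p2*p3, ...

    interval_probs[j-1] is pj: the probability of surviving interval j
    given survival through interval j-1.
    """
    return list(accumulate(interval_probs, mul))


# Hypothetical conditional survival probabilities for months 1 to 4.
monthly = [0.9, 0.8, 0.75, 0.5]
curve = cumulative_survival(monthly)
for month, s in enumerate(curve, start=1):
    print(f"P(survive {month} months) = {s:.3f}")
```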
Kaplan-Meier Curve
[Figure: Kaplan-Meier curve with censored observations marked; y axis: survival probability (0 to 1); x axis: time (months, 0 to 7)]
r – still at risk
f – failure (reached the end point)
Kaplan-Meier Curve
[Figure: Kaplan-Meier curve with censored observations marked; y axis: survival probability (0 to 1); x axis: time (months, 0 to 7)]
What is the probability that a patient survives 2.5 months?
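A minimal Kaplan-Meier sketch in Python shows how a question such as "what is the probability of surviving 2.5 months?" is read off the step curve. The data are invented for illustration, and the helper names `kaplan_meier` and `survival_at` are hypothetical, not taken from the slides or from any library.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time of each subject.
    events: 1 if the end point was reached (f), 0 if censored.
    Returns a list of (time, survival probability) steps, one per
    distinct time at which at least one failure occurred.
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # failures and censored subjects alike
    at_risk = len(times)          # r: number still at risk
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        f = failures.get(t, 0)
        if f:
            # Conditional probability of surviving past t given at risk at t.
            survival *= (at_risk - f) / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Read the step function: survival after the last failure time <= t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical follow-up in months; 0 marks a censored observation.
times = [1, 2, 2, 3, 4, 5, 6, 7]
events = [1, 1, 0, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
print(f"P(survive 2.5 months) = {survival_at(km, 2.5):.2f}")
```

Censored subjects contribute to the number at risk up to their censoring time but never cause a drop in the curve, which is why they are drawn as tick marks rather than steps.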
Kaplan-Meier Curve
[Figure: Kaplan-Meier curves for treated and untreated patients; y axis: survival probability (0 to 1); x axis: time (months, 0 to 7)]
Are the survival experiences significantly different?
Logrank test
The logrank test is a non-parametric method for testing the null hypothesis
that the compared groups are samples from the same
population with regard to survival experience.
(It does not tell how different the groups are.)
Divide time scale into intervals
[Figure: Kaplan-Meier curves for treated and untreated patients with the time axis divided into intervals; y axis: survival probability (0 to 1); x axis: time (months, 0 to 7)]
Compare proportions at every time interval and summarize across intervals (similar to a Chi-square test)
Logrank test: compare survival experience
of two different groups of individuals
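Chi-square vs. log-rank
k time intervals
O – observed events
E – expected events (under the null hypothesis)
V – variance of (O-E)
O-E and V are summed over the k time intervals. The resulting statistic is
compared with the χ2 distribution with (number of groups - 1) degrees of
freedom, i.e. 1 degree of freedom when two groups are compared.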
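The two-group log-rank calculation can be sketched in plain Python. This is a minimal version under stated assumptions: it uses the common form (sum of O-E)^2 / (sum of V) with the hypergeometric variance at each event time, which may differ in detail from the formula shown on the slide, and the function name and data are hypothetical.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test (sketch).

    times*/events*: lists of follow-up times and event flags
    (1 = end point reached, 0 = censored) for each group.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    all_times = times1 + times2
    all_events = events1 + events2
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)   # at risk in group 1
        n2 = sum(1 for x in times2 if x >= t)   # at risk in group 2
        d1 = sum(1 for x, e in zip(times1, events1) if x == t and e)
        d2 = sum(1 for x, e in zip(times2, events2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        observed1 += d1                          # O for group 1
        expected1 += d * n1 / n                  # E for group 1 under H0
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = (observed1 - expected1) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical follow-up in months for treated and untreated patients.
treated_t, treated_e = [3, 5, 6, 7, 7], [0, 1, 0, 1, 0]
untreated_t, untreated_e = [1, 2, 2, 3, 4], [1, 1, 1, 1, 0]
chi2, p = logrank_test(treated_t, treated_e, untreated_t, untreated_e)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A small p-value only says that the curves differ; the size of the difference is what the hazard ratio describes.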
Hazard ratio
The hazard ratio compares two groups differing in treatments or
prognostic variables etc. It measures relative survival in the two
groups based on the complete period studied.
R = 0.43: the relative risk (hazard) of poor outcome under the
condition of group 1 is 43% of that of group 2.
R = 2.0: the rate of failure in group 1 is twice the rate in
group 2.
Note: R applies to the entire period. Check for consistency across time intervals.
(It tells how different the groups are.)
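One simple way to estimate the hazard ratio from log-rank quantities is R = (O1/E1) / (O2/E2), where O and E are the observed and expected numbers of events in each group. The Python sketch below assumes that estimate; the slide does not say how R was computed, and the counts are made up.

```python
def hazard_ratio_from_counts(observed1, expected1, observed2, expected2):
    """Approximate hazard ratio of group 1 relative to group 2.

    Uses the simple estimate (O1/E1) / (O2/E2) built from log-rank
    observed and expected event counts (an assumption of this sketch).
    """
    return (observed1 / expected1) / (observed2 / expected2)


# Hypothetical counts: group 1 has fewer events than expected,
# group 2 has more events than expected.
r = hazard_ratio_from_counts(6, 10.5, 12, 7.5)
print(f"R = {r:.2f}")  # below 1: lower hazard in group 1
```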
Cox-proportional hazard model
Used to investigate the effect of several variables on
survival experience.
A multivariate proportional hazards regression model described
by D.R. Cox for modeling survival times. It is called a
proportional hazards model because it estimates the ratio of
the risks (hazard ratio or relative hazard). There are multiple
predictor variables (such as prognostic markers whose
individual contribution to the outcome is being assessed in
the presence of the others) and the outcome variable.
Hazard function: h(t) = h0(t) x exp(PI), where h0(t) is the baseline (underlying) hazard
Prognostic index (PI): PI = b1X1 + b2X2 + ... + bpXp
• X1 ... Xp – independent variables of interest
• b1 ... bp – regression coefficients to be estimated
• Assumption: the effect of the variables is constant over time and
additive on a particular scale
• (Similarly to K-M) The hazard function is the risk of dying after a given
time, assuming survival thus far.
• Cumulative hazard function: H(t) = H0(t) x exp(PI)
• H0(t) – cumulative baseline (underlying) hazard function.
• Probability of surviving to time t is
S(t) = exp[-H(t)]
For every individual with given values of the variables in the model,
this probability can be estimated.
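The chain from prognostic index to survival probability can be sketched in Python, assuming the standard Cox form H(t) = H0(t) x exp(PI) with PI = b1X1 + ... + bpXp. The coefficients, covariate values and baseline cumulative hazard below are invented for illustration, and the function names are hypothetical.

```python
import math


def prognostic_index(coefs, values):
    """PI = b1*X1 + ... + bp*Xp."""
    return sum(b * x for b, x in zip(coefs, values))


def survival_probability(baseline_cum_hazard, coefs, values):
    """S(t) = exp[-H(t)] with H(t) = H0(t) * exp(PI).

    baseline_cum_hazard: H0(t) at the time of interest (assumed known,
    e.g. estimated from the fitted model).
    """
    cum_hazard = baseline_cum_hazard * math.exp(prognostic_index(coefs, values))
    return math.exp(-cum_hazard)


# Hypothetical model with two covariates and H0(t) = 0.2 at the chosen time.
b = [0.8, -0.05]
low_risk = [0, 40]    # made-up covariate values
high_risk = [1, 30]
print(f"low risk:  S(t) = {survival_probability(0.2, b, low_risk):.3f}")
print(f"high risk: S(t) = {survival_probability(0.2, b, high_risk):.3f}")
```

A larger prognostic index multiplies the cumulative hazard and therefore lowers the estimated survival probability at every time point.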
Interpretation of the Cox model
Cox regression model fitted to data from PBC trial of azathioprine vs placebo (n=216)
Variable               Regression coef (b)   SE(b)     exp(b)
Serum bilirubin        2.510                 0.316     12.31
Age                    0.00690               0.00162   1.01
Cirrhosis              0.879                 0.216     2.41
Serum albumin          -0.0504               0.0181    0.95
Central cholestasis    0.679                 0.275     1.97
Therapy                0.52                  0.207     1.68
• Coefficient:
• Sign – positive or negative association with poor survival
• Magnitude – refers to the increase in log hazard for an
increase of 1 in the value of the covariate
Altman D, 1991
Interpretation of the Cox model
Cox regression model fitted to data from PBC trial of azathioprine vs placebo (n=216)
Variable               Regression coef (b)   SE(b)     exp(b)   Hazard relative to baseline
Serum bilirubin        2.510                 0.316     12.31    1231%
Age                    0.00690               0.00162   1.01     101%
Cirrhosis              0.879                 0.216     2.41     241%
Serum albumin          -0.0504               0.0181    0.95     95%
Central cholestasis    0.679                 0.275     1.97     197%
Therapy                0.52                  0.207     1.68     168%
Last column: an increase of the value of the variable by 1 will result in this hazard relative to baseline.
• Coefficient:
• Sign – positive or negative association with poor survival
• Magnitude – refers to the increase in log hazard for an
increase of 1 in the value of the covariate. If the value
changes by 1, the hazard changes exp(b) times.
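The exp(b) column can be recomputed from the coefficients with a few lines of Python. The sketch below also adds an approximate 95% confidence interval, exp(b ± 1.96 x SE(b)); that interval is a standard normal-approximation addition and is not shown on the slide.

```python
import math

# (b, SE(b)) pairs copied from the table above.
cox_fit = {
    "Cirrhosis": (0.879, 0.216),
    "Serum albumin": (-0.0504, 0.0181),
    "Central cholestasis": (0.679, 0.275),
    "Therapy": (0.52, 0.207),
}


def relative_hazard_with_ci(b, se, z=1.96):
    """Return exp(b) and an approximate 95% interval exp(b -/+ z*SE).

    The interval relies on a normal approximation for b (an assumption
    of this sketch, not something stated in the source table).
    """
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


for name, (b, se) in cox_fit.items():
    hr, low, high = relative_hazard_with_ci(b, se)
    direction = "higher" if b > 0 else "lower"
    print(f"{name}: exp(b) = {hr:.2f} ({low:.2f} to {high:.2f}), "
          f"{direction} hazard per unit increase")
```

A negative coefficient, as for serum albumin, gives exp(b) below 1, meaning the hazard falls as the covariate increases.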
Modified from Altman D, 1991
Survival curves based on Cox model
Altman D, 1991
• The power of the analysis depends on the number of
terminal events (deaths).
• Higher power requires longer follow-up times.
• Alternative, more frequent end points (e.g. recurrence) can be used.
• Estimating the sample size needed to achieve the required
power is a hard task. Nomograms help.
Altman D, 1991
What Have We Learned?
• Clinical data are a highly important component and are intrinsically
different from genomic/transcriptomic data.
• Survival data are a special type of data requiring special
methodology.
• Main applications of survival analysis:
– Estimating the survival probability of a patient for a given length of time under given circumstances (Kaplan-Meier survival curve).
– Comparing the survival experiences of groups of patients, e.g. is the drug working? (log-rank test)
– Investigating risk factors contributing to the outcome, to make a prognosis for a
given patient and choose appropriate therapy (Cox regression model)
Questions?
References
• Practical Statistics for Medical Research. Altman DG. Chapman & Hall/CRC, 1991.
• Pharmacogenetics and pharmacogenomics: development, science, and translation. Weinshilboum RM, Wang L. Annu Rev Genomics Hum Genet. 2006;7:223-45. PMID: 16948615
• Pharmacogenomics: candidate gene identification, functional validation and mechanisms. Wang L, Weinshilboum RM. Hum Mol Genet. 2008 Oct 15;17(R2):R174-9. PMID: 18852207
• End-sequence profiling: sequence-based analysis of aberrant genomes. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk A, Kuo WL, Magrane G, De Jong P, Gray JW, Collins C. Proc Natl Acad Sci U S A. 2003 Jun 24;100(13):7696-701. PMID: 12788976
• A Review of Trastuzumab-Based Therapy in Patients with HER2-positive Metastatic Breast Cancer. Church DN, Price CGA. Clinical Medicine: Therapeutics. 2009;1:557-570
• Other useful references:
• The hallmarks of cancer. Hanahan D, Weinberg RA. Cell. 2000 Jan 7;100(1):57-70. PMID: 10647931
• Aberrant and alternative splicing in cancer. Venables JP. Cancer Res. 2004 Nov 1;64(21):7647-54. PMID: 15520162
We are on a Coffee Break &
Networking Session