Transcript Slide 1
Canadian Bioinformatics Workshops
www.bioinformatics.ca

Module 8, Part II
Clinical Data Integration and Survival Analysis
Anna Lapuk, PhD

Module 8 Part II overview
– Clinical data and survival analysis theory
– Lab on survival analysis

Module 8: Clinical Data Integration and Survival Analysis bioinformatics.ca

Analytic techniques for biomarker discovery
Disease characterization
Whole genome/whole transcriptome data
“Clinical data”

Clinical data (variables)
• ID
• Race
• Family history (yes/no)
• Nodal status (yes/no; number of nodes involved)
• Radiation
• Chemo
• Hormone therapy
• Protein IHC
• Stage
• Size
• Age at diagnosis
• Estrogen receptor level
• Progesterone level
• SBR grade
• Overall outcome (dead/alive)
• Overall survival time
• Disease specific outcome (dead/alive)
• Disease specific survival time
• Recurrence status (yes/no)
• Time to recurrence
• Distant recurrence status (yes/no)
• Time to distant recurrence
Survival times – time to a given end point; these variables are the subject of survival analysis.

Survival analysis
Goal: Estimate the probability of an individual surviving for a given time period (e.g. one year)
Technique: Kaplan-Meier survival curve, life table
Goal: Compare the survival experience of two different groups of individuals (drug/placebo)
Technique: Logrank test (comparison of different KM curves)
Goal: Detect clinical/genomic/epidemiologic variables which contribute to the risk (associated with poor outcome)
Technique: Multivariate (univariate) Cox regression model

Survival data
• Survival time – the time from a fixed starting point to an end point

Starting point    End point
Surgery           Death/Recurrence/Relapse
Diagnosis         Death/Recurrence/Relapse
Treatment         Death/Recurrence/Relapse

• The event of interest is almost never observed in all subjects (censoring of data)
• Special analytical techniques are therefore needed
Censored observations
• Arise whenever the dependent variable of interest represents the time to a terminal event, and the duration of the study is limited in time.
• Incomplete observation: the event of interest had not occurred at the time of the analysis.

Event of interest            Censored observation
Death from the disease       Still alive
Survival of marriage         Still married
Drop-out time from school    Still in school

• Type I and II censoring (time fixed/proportion of subjects fixed)
• Right and left censoring

Types of censoring
[Figure: types of censoring]

Survival time and probability
[Figure: survival probabilities p1, p2, p3, p4 over successive time intervals]
• Survival probability for a given length of time can be calculated by considering time in intervals.
• The probability of surviving to the end of month 2 is the probability of surviving month 1 multiplied by the probability of surviving month 2, provided that the patient has survived month 1 (conditional probability).
Survival probability = p1 x p2 x p3 x p4 x ... x pj
pj is the probability of surviving month j among those still known to be alive after (j-1) months.
• In reality, each time interval contains exactly one case.

Kaplan-Meier Curve
[Figure: Kaplan-Meier curve. Y axis: survival probability (0 to 1); X axis: time (months, 0 to 7); censored observations are marked. r – still at risk; f – failure (reached the end point)]

Kaplan-Meier Curve
[Figure: the same Kaplan-Meier curve with censored observations marked]
What is the probability that a patient survives 2.5 months?

Kaplan-Meier Curve
[Figure: Kaplan-Meier curves for treated and untreated patients. Y axis: survival probability; X axis: time (months)]
Are the survival experiences significantly different?
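The product of conditional probabilities described on the "Survival time and probability" slide can be sketched in a few lines of Python. This is only an illustrative sketch of the Kaplan-Meier (product-limit) estimate, not code from the workshop lab: the function names `kaplan_meier` and `survival_at` and the follow-up data are hypothetical, and it assumes the usual convention that a subject censored at the same time as a failure is still counted as at risk for that failure.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  -- follow-up time for each subject
    events -- 1 if the end point was reached, 0 if the observation is censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # failures and censored subjects both leave the risk set
    at_risk = len(times)          # r: subjects still at risk
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        f = failures.get(t, 0)    # f: failures at time t
        if f > 0:
            # p_j: probability of surviving this step, given survival so far
            survival *= (at_risk - f) / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Read the step function: estimated probability of surviving beyond time t."""
    probability = 1.0
    for event_time, s in curve:
        if event_time > t:
            break
        probability = s
    return probability


# Hypothetical follow-up times in months; event 0 marks a censored observation.
times = [1, 2, 2, 3, 4, 5, 6, 7]
events = [1, 1, 0, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(survival_at(curve, 2.5))
```

With these made-up data, the estimate at 2.5 months is the product of the first two steps, 7/8 x 6/7, so approximately 0.75. Censored subjects never lower the curve directly; they only shrink the number at risk for later steps.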
Logrank test
A non-parametric method to test the null hypothesis that the compared groups are samples from the same population with regard to survival experience. (It does not tell how different the groups are.)

Divide time scale into intervals
[Figure: Kaplan-Meier curves for treated and untreated patients with the time scale divided into intervals. Y axis: survival probability; X axis: time (months)]
Compare the proportions at every time interval and summarize the result across intervals (similar to a Chi-square test).

Logrank test: compare survival experience of two different groups of individuals
Chi-square: χ2 = Σ (O - E)^2 / E
Log-rank: χ2 = [Σ (O - E)]^2 / Σ V, with the sums taken over the k time intervals
• O – observed
• E – expected
• V – variance of (O - E)
Then compare with the χ2 distribution. For the log-rank test the degrees of freedom are the number of groups minus 1 (1 when two groups are compared).

Hazard ratio
• The hazard ratio compares two groups differing in treatments, prognostic variables, etc.
• It measures relative survival in the two groups based on the complete period studied.
• R = 0.43: the relative risk (hazard) of poor outcome under the condition of group 1 is 43% of that of group 2.
• R = 2.0: the rate of failure in group 1 is twice the rate in group 2.
• Note: this holds for the entire period. Check for consistency across time intervals. (The hazard ratio tells how different the groups are.)

Cox proportional hazards model
Used to investigate the effect of several variables on survival experience. It is a multivariate proportional hazards regression model described by D.R. Cox for modeling survival times. It is also called the proportional hazards model because it estimates the ratio of the risks (hazard ratio or relative hazard).
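Looking back at the log-rank slides, the comparison of observed and expected failures can be sketched as follows. This is a minimal illustration for two groups only, assuming the standard hypergeometric variance at each failure time; the function name `logrank` and the data are hypothetical and are not taken from the workshop lab. The p-value uses the fact that, for 1 degree of freedom, the upper tail of the χ2 distribution equals `erfc(sqrt(x / 2))`.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events are 1 for a failure and 0 for a censored observation.
    Returns (chi_square, p_value) for 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    failure_times = sorted({t for t, e, _ in data if e == 1})

    sum_o_minus_e = 0.0   # sum over failure times of (O - E) for group A
    sum_v = 0.0           # sum over failure times of the variance of (O - E)
    for t in failure_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk in A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)   # at risk in B
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        sum_o_minus_e += d_a - d * n_a / n        # observed minus expected in A
        if n > 1:
            sum_v += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi_square = sum_o_minus_e ** 2 / sum_v
    p_value = math.erfc(math.sqrt(chi_square / 2))   # chi-square tail, 1 df
    return chi_square, p_value


# Hypothetical follow-up times in months; event 0 marks a censored observation.
treated_times, treated_events = [3, 5, 6, 7, 7], [1, 0, 1, 0, 1]
untreated_times, untreated_events = [1, 2, 2, 3, 4], [1, 1, 1, 1, 0]
chi_square, p_value = logrank(treated_times, treated_events,
                              untreated_times, untreated_events)
print(chi_square, p_value)
```

The test only says whether the two survival experiences differ; it gives no estimate of how large the difference is, which is what the hazard ratio is for.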
There are multiple predictor variables (such as prognostic markers whose individual contribution to the outcome is being assessed in the presence of the others) and the outcome variable.

Hazard function
h(t) = h0(t) x exp(b1X1 + b2X2 + ... + bpXp)
Prognostic index (PI)
PI = b1X1 + b2X2 + ... + bpXp
• X1 ... Xp – independent variables of interest
• b1 ... bp – regression coefficients to be estimated
• Assumption: the effect of the variables is constant over time and additive on a particular scale
• (Similarly to K-M) The hazard function is the risk of dying after a given time, assuming survival thus far.
• Cumulative function: H(t) = H0(t) x exp(PI)
• H0(t) – cumulative baseline (underlying) hazard function
• The probability of surviving to time t is S(t) = exp[-H(t)]. For every individual with given values of the variables in the model we can estimate this probability.

Interpretation of the Cox model
Cox regression model fitted to data from PBC trial of azathioprine vs placebo (n=216)

Variable               Regression coef (b)   SE(b)     exp(b)
Serum bilirubin        2.510                 0.316     12.31
Age                    0.00690               0.00162   1.01
Cirrhosis              0.879                 0.216     2.41
Serum albumin          -0.0504               0.0181    0.95
Central cholestasis    0.679                 0.275     1.97
Therapy                0.52                  0.207     1.68

• Coefficient:
  – Sign: positive or negative association with poor survival
  – Magnitude: the increase in log hazard for an increase of 1 in the value of the covariate
Altman D, 1991

Interpretation of the Cox model
Cox regression model fitted to data from PBC trial of azathioprine vs placebo (n=216)

Variable               Regression coef (b)   SE(b)     exp(b)   Hazard after an increase of 1 in the variable (relative to baseline)
Serum bilirubin        2.510                 0.316     12.31    1231%
Age                    0.00690               0.00162   1.01     101%
Cirrhosis              0.879                 0.216     2.41     241%
Serum albumin          -0.0504               0.0181    0.95     95%
Central cholestasis    0.679                 0.275     1.97     197%
Therapy                0.52                  0.207     1.68     168%

• Coefficient:
  – Sign: positive or negative association with poor survival
  – Magnitude: the increase in log hazard for an increase of 1 in the value of the covariate. If the value changes by 1, the hazard changes exp(b) times.
Modified from Altman D, 1991

Survival curves based on Cox model
[Figure: survival curves based on the Cox model. Altman D, 1991]

Survival curves based on Cox model
• The power of the analysis depends on the number of terminal events (deaths).
• Higher power requires longer follow-up times.
• Alternative, more frequent end points: recurrence.
• Estimating the sample size needed to achieve the required power is a hard task. Nomograms help.
Altman D, 1991

What Have We Learned?
• Clinical data is a highly important component and is intrinsically different from genomic/transcriptomic data.
• Survival data is a special type of data requiring special methodology.
• Main applications of survival analysis:
  – Estimates of the survival probability of a patient for a given length of time under given circumstances (Kaplan-Meier survival curve).
  – Comparison of the survival experiences of groups of patients: is the drug working? (log-rank test)
  – Investigation of risk factors contributing to the outcome, to make a prognosis for a given patient and choose appropriate therapy (Cox regression model)

Questions?

References
• Practical Statistics for Medical Research. Douglas G Altman. Chapman & Hall/CRC, 1991.
• Pharmacogenetics and pharmacogenomics: development, science, and translation. Weinshilboum RM, Wang L. Annu Rev Genomics Hum Genet. 2006;7:223-45. PMID: 16948615
• Pharmacogenomics: candidate gene identification, functional validation and mechanisms. Wang L, Weinshilboum RM. Hum Mol Genet. 2008 Oct 15;17(R2):R174-9. PMID: 18852207
• End-sequence profiling: sequence-based analysis of aberrant genomes. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk A, Kuo WL, Magrane G, De Jong P, Gray JW, Collins C. Proc Natl Acad Sci U S A. 2003 Jun 24;100(13):7696-701. PMID: 12788976
• A Review of Trastuzumab-Based Therapy in Patients with HER2-positive Metastatic Breast Cancer. David N. Church and Chris G.A. Price. Clinical Medicine: Therapeutics 2009:1 557-570

Other useful references:
• The hallmarks of cancer. Hanahan D, Weinberg RA. Cell. 2000 Jan 7;100(1):57-70. PMID: 10647931
• Aberrant and alternative splicing in cancer. Venables JP. Cancer Res. 2004 Nov 1;64(21):7647-54. PMID: 15520162

We are on a Coffee Break & Networking Session
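Worked example for the "Interpretation of the Cox model" slides: the sketch below turns the regression coefficients from the table (Altman D, 1991) into hazard ratios with exp(b) and compares two patients through their prognostic index. It is an illustration only. The function names and both patients are hypothetical, the covariate values are made up, and the survival formula assumes the standard Cox form H(t) = H0(t) x exp(PI) together with S(t) = exp[-H(t)] from the hazard function slide.

```python
import math

# Regression coefficients (b) as listed in the Cox model table (Altman D, 1991).
coefficients = {
    "Serum bilirubin": 2.510,
    "Age": 0.00690,
    "Cirrhosis": 0.879,
    "Serum albumin": -0.0504,
    "Central cholestasis": 0.679,
    "Therapy": 0.52,
}


def hazard_ratios(coefs):
    """exp(b): factor by which the hazard changes when a covariate increases by 1."""
    return {name: math.exp(b) for name, b in coefs.items()}


def prognostic_index(coefs, values):
    """PI = b1*X1 + ... + bp*Xp for one individual."""
    return sum(coefs[name] * values[name] for name in coefs)


def relative_hazard(coefs, patient, reference):
    """Hazard of `patient` divided by hazard of `reference` (baseline cancels out)."""
    return math.exp(prognostic_index(coefs, patient) - prognostic_index(coefs, reference))


def survival_probability(baseline_cumulative_hazard, pi):
    """S(t) = exp[-H0(t) * exp(PI)], assuming H0(t) is known at the time of interest."""
    return math.exp(-baseline_cumulative_hazard * math.exp(pi))


# Two hypothetical patients who differ only in cirrhosis status (1 = present).
reference = {"Serum bilirubin": 1.0, "Age": 50, "Cirrhosis": 0,
             "Serum albumin": 35, "Central cholestasis": 0, "Therapy": 0}
patient = dict(reference, Cirrhosis=1)

ratios = hazard_ratios(coefficients)
for name, ratio in ratios.items():
    print(f"{name}: exp(b) = {ratio:.2f}")
print(relative_hazard(coefficients, patient, reference))
```

Because the two hypothetical patients differ in a single covariate by 1, their relative hazard reduces to exp(b) for cirrhosis, which should approximately reproduce the exp(b) column of the table.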