Introduction to Survival Analysis October 19, 2004 Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH Division of General Medical Sciences.
Slide 1
Introduction to Survival Analysis
October 19, 2004
Brian F. Gage, MD, MSc
with thanks to Bing Ho, MD, MPH
Division of General Medical Sciences
Slide 2
Presentation goals
Survival analysis compared w/ other regression
techniques
What is survival analysis
When to use survival analysis
Univariate method: Kaplan-Meier curves
Multivariate methods:
• Cox-proportional hazards model
• Parametric models
Assessment of adequacy of analysis
Examples
Slide 3
Regression vs. Survival Analysis
Technique           | Predictor Variables                | Outcome Variable                              | Censoring permitted?
Linear Regression   | Categorical or continuous          | Normally distributed                          | No
Logistic Regression | Categorical or continuous          | Binary (except in polytomous log. regression) | No
Survival Analyses   | Time and categorical or continuous | Binary                                        | Yes
Slide 4
Regression vs. Survival Analysis
Technique           | Mathematical model                        | Yields
Linear Regression   | Y = B1X + Bo (linear)                     | Linear changes
Logistic Regression | Ln(P/(1-P)) = B1X + Bo (sigmoidal prob.)  | Odds ratios
Survival Analyses   | h(t) = ho(t)exp(B1X + Bo)                 | Hazard rates
Slide 5
What is survival analysis?
Model time to failure or time to event
• Unlike linear regression, survival analysis has a
dichotomous (binary) outcome
• Unlike logistic regression, survival analysis analyzes the
time to an event
– Why is that important?
Able to account for censoring
Can compare survival between 2+ groups
Assess relationship between covariates and
survival time
Slide 6
Importance of censored data
Why is censored data important?
What is the key assumption of censoring?
Slide 7
Types of censoring
Subject does not
experience event of
interest
Incomplete follow-up
• Lost to follow-up
• Withdraws from study
• Dies (if not being studied)
Left or right censored
Slide 8
When to use survival analysis
Examples
• Time to death or clinical endpoint
• Time in remission after treatment of disease
• Recidivism rate after addiction treatment
When one believes that 1+ explanatory variable(s)
explains the differences in time to an event
Especially when follow-up is incomplete or variable
Slide 9
Relationship between survivor function
and hazard function
Survivor function, S(t) defines the probability of
surviving longer than time t
• this is what the Kaplan-Meier curves show.
• Hazard function is the negative derivative of the log
survivor function over time: h(t) = -d[ln S(t)]/dt = -[dS(t)/dt]/S(t)
– instantaneous risk of event at time t (conditional failure rate)
Survivor and hazard functions can be converted
into each other
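A minimal sketch of that conversion, using the standard relations h(t) = -d[ln S(t)]/dt and S(t) = exp(-H(t)), where H(t) is the cumulative hazard. The constant hazard of 0.2 per unit time and the function names are assumptions chosen for illustration, not part of the slides.

```python
import math

def survival_from_constant_hazard(rate, t):
    """S(t) = exp(-H(t)); with a constant hazard, H(t) = rate * t."""
    return math.exp(-rate * t)

def hazard_from_survival(surv, t, dt=1e-6):
    """h(t) = -d/dt ln S(t), approximated with a central difference."""
    return -(math.log(surv(t + dt)) - math.log(surv(t - dt))) / (2 * dt)

rate = 0.2  # hypothetical constant hazard (events per unit time)

def surv(t):
    return survival_from_constant_hazard(rate, t)

# Going hazard -> survival -> hazard should return the starting rate.
recovered = hazard_from_survival(surv, 3.0)
print(round(surv(3.0), 4), round(recovered, 4))
```

With a constant hazard the survivor function is a simple exponential decay, which makes the round trip easy to check by hand.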
Slide 10
Approach to survival analysis
Like other statistics we have studied we can do any
of the following w/ survival analysis:
• Descriptive statistics
• Univariate statistics
• Multivariate statistics
Slide 11
Descriptive statistics
Average survival
• When can this be calculated?
• What test would you use to compare average survival
between 2 cohorts?
Average hazard rate
• Total # of failures divided by observed survival time
(units are therefore 1/t or 1/pt-yrs)
• An incidence rate, with higher values indicating more
events per time
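As a small illustration of the average hazard rate, the sketch below divides the number of failures by the total observed time. The five follow-up times (in years) and event flags are made-up values, and the function name is hypothetical.

```python
def average_hazard_rate(durations, events):
    """Total failures divided by total observed follow-up time."""
    total_time = sum(durations)   # e.g. patient-years, censored subjects included
    failures = sum(events)        # 1 = event observed, 0 = censored
    return failures / total_time

# Hypothetical cohort: follow-up in years and whether the event occurred.
durations = [2.0, 3.5, 1.0, 4.0, 0.5]
events = [1, 0, 1, 0, 1]

rate = average_hazard_rate(durations, events)  # 3 failures / 11.0 patient-years
print(f"{rate:.3f} events per patient-year")
```

Censored subjects still contribute their follow-up time to the denominator, which is why this rate can be computed even when follow-up is incomplete.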
Slide 12
Univariate method: Kaplan-Meier
survival curves
Also known as product-limit formula
Accounts for censoring
Generates the characteristic “stair step” survival
curves
Does not account for confounding or effect
modification by other covariates
• When is that a problem?
• When is that OK?
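The product-limit idea can be sketched in a few lines: at each distinct event time, multiply the running survival estimate by (1 - deaths / number at risk). The data below are invented for illustration and the function name is an assumption; statistical packages add confidence limits and tie handling that this sketch omits.

```python
def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct event time (product-limit)."""
    event_times = sorted({t for t, d in zip(durations, events) if d})
    surv = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in durations if u >= t)
        # Events (not censorings) occurring exactly at time t.
        deaths = sum(1 for u, d in zip(durations, events) if u == t and d)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical data: follow-up time and event flag (1 = event, 0 = censored).
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 3))
```

The estimate only drops at event times, which produces the stair-step shape; censored subjects leave the risk set without causing a drop.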
Slide 13
[Figure: curves for Warf, ASA, and No Rx; Age 76 Years and Older (N = 394); x-axis: Days Since Index Hospitalization (0 to 900); y-axis: 0.0 to 1.0]
Slide 14
Time to Cardiovascular Adverse Event in VIGOR Trial
Slide 15
Slide 16
Comparing Kaplan-Meier curves
Log-rank test can be used to compare survival
curves
• Less-commonly used test: Wilcoxon, which places greater weights
on events near time 0.
Hypothesis test (test of significance)
• H0: the curves are statistically the same
• H1: the curves are statistically different
Compares observed to expected cell counts
Test statistic is compared to a χ² distribution
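The observed-versus-expected comparison can be sketched for two groups as follows. This is a bare-bones version of the log-rank statistic with one degree of freedom; the data are invented, the function names are assumptions, and the p-value uses the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x/2)).

```python
import math

def logrank_statistic(durations, events, groups):
    """Log-rank chi-square (1 df) for two groups coded 0 and 1."""
    event_times = sorted({t for t, e in zip(durations, events) if e})
    obs1 = exp1 = var = 0.0
    for t in event_times:
        n = sum(1 for u in durations if u >= t)                     # at risk, both groups
        n1 = sum(1 for u, g in zip(durations, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(durations, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(durations, events, groups)
                 if u == t and e and g == 1)
        obs1 += d1                                                  # observed in group 1
        exp1 += d * n1 / n                                          # expected under H0
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    return (obs1 - exp1) ** 2 / var

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical data: group 0 fails early, group 1 fails late, no censoring.
durations = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]

chi2 = logrank_statistic(durations, events, groups)
p_value = chi2_1df_pvalue(chi2)
print(round(chi2, 3), round(p_value, 4))
```

Swapping the group labels gives the same statistic, because the observed-minus-expected difference only changes sign.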
Slide 17
Comparing multiple Kaplan-Meier curves
Multiple pair-wise comparisons produce cumulative
Type I error – multiple comparison problem
Instead, compare all curves at once
• analogous to using ANOVA to compare > 2 cohorts
• Then use judicious pair-wise testing
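To see how quickly the Type I error accumulates, the sketch below counts the pair-wise comparisons among k curves and computes the chance of at least one false positive. It assumes the tests are independent, which is a simplification, and the function name is hypothetical.

```python
from itertools import combinations

def familywise_error(n_groups, alpha=0.05):
    """Number of pair-wise tests and P(at least one false positive),
    assuming independent tests each run at level alpha."""
    n_pairs = len(list(combinations(range(n_groups), 2)))
    return n_pairs, 1 - (1 - alpha) ** n_pairs

n_pairs, fwer = familywise_error(4)
bonferroni_alpha = 0.05 / n_pairs  # one common, conservative per-test threshold
print(n_pairs, round(fwer, 3), round(bonferroni_alpha, 4))
```

With four curves there are six pair-wise tests, so the overall error rate is far above the nominal 5% unless the per-test threshold is tightened.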
Slide 18
Limit of Kaplan-Meier curves
What happens when you have several covariates that you
believe contribute to survival?
Example
• Smoking, hyperlipidemia, diabetes, and hypertension all contribute to
time to myocardial infarction
Can use stratified K-M curves – for 2 or maybe 3
covariates
Need another approach – multivariate Cox proportional
hazards model is most common -- for many covariates
• (think multivariate regression or logistic regression rather than a
Student’s t-test or the odds ratio from a 2 x 2 table)
Slide 19
Multivariate method: Cox proportional hazards
Needed to assess effect of multiple covariates on
survival
Cox-proportional hazards is the most commonly
used multivariate survival method
• Easy to implement in SPSS, Stata, or SAS
• Parametric approaches are an alternative, but they
require stronger assumptions about h(t).
Slide 20
Cox proportional hazard model
Works with hazard model
Conveniently separates baseline hazard function from
covariates
• Baseline hazard function over time
– h(t) = ho(t)exp(B1X+Bo)
• Covariates are time independent
• B1 is used to calculate the hazard ratio, which is similar to the
relative risk
Semiparametric (baseline hazard is left unspecified)
Coefficients are estimated from the partial likelihood function
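The likelihood that a Cox model maximizes (usually called the partial likelihood) can be sketched for a single covariate. At each event time, the subject who fails is compared with everyone still at risk, and the baseline hazard cancels out of the ratio. The sketch assumes no tied event times; the data and function name are invented for illustration.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times."""
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects only appear in risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        loglik += beta * x[i] - math.log(denom)
    return loglik

# Hypothetical data: time, event flag (1 = event), binary covariate.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
x = [1, 0, 1, 0]

ll_null = cox_partial_loglik(0.0, times, events, x)
print(round(ll_null, 4))
```

At B1 = 0 every subject in a risk set is equally likely to fail, so the value is just minus the sum of the log risk-set sizes, here -(ln 4 + ln 3 + ln 1).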
Slide 21
Cox proportional hazards model, continued
Can handle both continuous and categorical
predictor variables (think: logistic, linear regression)
Without knowing baseline hazard ho(t), can still
calculate coefficients for each covariate, and
therefore hazard ratio
Assumes multiplicative risk—this is the
proportional hazard assumption
• Can be compensated in part with interaction terms
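A quick way to see what the proportional hazards assumption means: whatever shape the baseline hazard takes, the ratio of hazards for two covariate values stays fixed at exp(B1) for all t. The baseline function and the coefficient below are arbitrary assumptions for illustration.

```python
import math

def hazard(t, x, beta, baseline):
    """Cox-style hazard: baseline hazard scaled by exp(beta * x)."""
    return baseline(t) * math.exp(beta * x)

def baseline(t):
    # Hypothetical baseline hazard that rises with time.
    return 0.01 * t ** 1.5

beta = 0.5  # hypothetical coefficient for a binary covariate

# Hazard ratio (x = 1 vs. x = 0) at several time points.
ratios = [hazard(t, 1, beta, baseline) / hazard(t, 0, beta, baseline)
          for t in (1, 5, 10)]
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))
```

The baseline cancels in the ratio, which is also why the coefficients can be estimated without knowing ho(t).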
Slide 22
Limitations of Cox PH model
Does not accommodate variables that change over
time
• Luckily most variables (e.g. gender, ethnicity, or
congenital condition) are constant
– If necessary, one can program time-dependent variables
– When might you want this?
Baseline hazard function, ho(t), is never specified
• You can estimate ho(t) accurately if you need to estimate
S(t).
Slide 23
Hazard ratio
What is the hazard ratio and how do you calculate it
from your parameters, β?
How do we estimate the relative risk from the
hazard ratio (HR)?
How do you determine significance of the hazard
ratios (HRs)?
• Confidence intervals
• Chi square test
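As a worked sketch, the hazard ratio is exp(β), approximate 95% confidence limits are exp(β ± 1.96 × SE), and the Wald chi-square is (β / SE)² on one degree of freedom. The inputs below are the DISTANT row of the PHREG output shown later in this deck (estimate 0.86213, standard error 0.07300); the function name and the use of 1.96 as the normal quantile are assumptions.

```python
import math

def hazard_ratio_summary(beta, se, z=1.96):
    """Hazard ratio, Wald 95% limits, Wald chi-square (1 df), and p-value."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    wald = (beta / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2))  # upper tail of chi-square, 1 df
    return hr, lower, upper, wald, p_value

hr, lower, upper, wald, p_value = hazard_ratio_summary(0.86213, 0.07300)
print(f"HR {hr:.3f} (95% CI {lower:.3f} to {upper:.3f}), chi-square {wald:.2f}")
```

A confidence interval that excludes 1.0 and a small chi-square p-value tell the same story; the chi-square computed here differs slightly from the printed output because the standard error is rounded.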
Slide 24
Assessing model adequacy
Multiplicative assumption
Proportional assumption: covariates are
independent with respect to time and their hazards
are constant over time
Three general ways to examine model adequacy
• Graphically
• Mathematically
• Computationally: Time-dependent variables (extended
model)
Slide 25
Model adequacy: graphical approaches
Several graphical approaches
• Do the survival curves intersect?
• Log-minus-log plots
• Observed vs. expected plots
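The log-minus-log check rests on a simple identity: under proportional hazards, S1(t) = S0(t)^exp(B1), so ln(-ln S1(t)) and ln(-ln S0(t)) differ by the constant B1 and the two curves plot as parallel lines. The sketch below verifies this with an invented exponential baseline and coefficient.

```python
import math

def log_minus_log(s):
    """ln(-ln S(t)) transform used in log-minus-log plots."""
    return math.log(-math.log(s))

beta = 0.7                     # hypothetical log hazard ratio
times = [1, 2, 5, 10]
s0 = [math.exp(-0.1 * t) for t in times]     # hypothetical baseline survival
s1 = [s ** math.exp(beta) for s in s0]       # survival under proportional hazards

# Vertical gap between the two transformed curves at each time point.
gaps = [log_minus_log(b) - log_minus_log(a) for a, b in zip(s0, s1)]
print([round(g, 4) for g in gaps])
```

With real data the gap is estimated from Kaplan-Meier curves, so one looks for curves that are roughly parallel rather than exactly so; curves that converge or cross suggest the assumption is violated.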
Slide 26
Testing model adequacy mathematically with
a goodness-of-fit test
Uses a test of significance (hypothesis test)
One-degree of freedom chi-square distribution
p value for each coefficient
Does not discriminate how a coefficient might
deviate from the PH assumption
Slide 27
Example: Tumor Extent
3000 patients derived from SEER cancer registry
and Medicare billing information
Exploring the relationship between tumor extent
and survival
Hypothesis is that more extensive tumor
involvement is related to poorer survival
Slide 28
Log-Rank
χ² = 269.0973, p < .0001
Slide 29
Example: Tumor Extent
Tumor extent may not be the only covariate that
affects survival
• Multiple medical comorbidities may be associated
with poorer outcome
• Ethnic and gender differences may contribute
Cox proportional hazards model can quantify
these relationships
Slide 30
Example: Tumor Extent
Test proportional hazards assumption with log-minus-log plot
Perform Cox PH regression
• Examine significant coefficients and corresponding
hazard ratios
Slide 31
Slide 32
Example: Tumor Extent 5
The PHREG Procedure
Analysis of Maximum Likelihood Estimates
Variable  DF  Parameter  Standard  Chi-Square  Pr > ChiSq  Hazard  95% Hazard Ratio   Variable
              Estimate   Error                             Ratio   Confidence Limits  Label
age2      1   0.15690    0.05079     9.5430    0.0020      1.170   1.059   1.292      70
age3      1   0.58385    0.06746    74.9127    <.0001      1.793   1.571   2.046      age>80
race2     1   0.16088    0.07953     4.0921    0.0431      1.175   1.005   1.373      black
race3     1   0.05060    0.09590     0.2784    0.5977      1.052   0.872   1.269      other
comorb1   1   0.27087    0.05678    22.7549    <.0001      1.311   1.173   1.465
comorb2   1   0.32271    0.06341    25.9046    <.0001      1.381   1.219   1.564
comorb3   1   0.61752    0.06768    83.2558    <.0001      1.854   1.624   2.117
DISTANT   1   0.86213    0.07300   139.4874    <.0001      2.368   2.052   2.732
REGIONAL  1   0.51143    0.05016   103.9513    <.0001      1.668   1.512   1.840
LIPORAL   1   0.28228    0.05575    25.6366    <.0001      1.326   1.189   1.479
PHARYNX   1   0.43196    0.05787    55.7206    <.0001      1.540   1.375   1.725
treat3    1   0.07890    0.06423     1.5090    0.2193      1.082   0.954   1.227      both
treat2    1   0.47215    0.06074    60.4215    <.0001      1.603   1.423   1.806      rad
treat0    1   1.52773    0.08031   361.8522    <.0001      4.608   3.937   5.393      none
Slide 33
Summary
Survival analysis quantifies time to a single, dichotomous
event
Handles censored data well
Survival and hazard can be mathematically converted to
each other
Kaplan-Meier survival curves can be compared
statistically and graphically
Cox proportional hazards models help distinguish
individual contributions of covariates on survival, provided
certain assumptions are met.