Introduction to Survival Analysis October 19, 2004 Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH Division of General Medical Sciences.


Slide 1

Introduction to Survival Analysis
October 19, 2004

Brian F. Gage, MD, MSc
with thanks to Bing Ho, MD, MPH
Division of General Medical Sciences


Slide 2

Presentation goals
Survival analysis compared w/ other regression techniques
What is survival analysis
When to use survival analysis
Univariate method: Kaplan-Meier curves
Multivariate methods:
• Cox-proportional hazards model
• Parametric models
Assessment of adequacy of analysis
Examples


Slide 3

Regression vs. Survival Analysis
Technique            Predictor Variables                  Outcome Variable                                Censoring permitted?
Linear Regression    Categorical or continuous            Normally distributed                            No
Logistic Regression  Categorical or continuous            Binary (except in polytomous log. regression)   No
Survival Analyses    Time and categorical or continuous   Binary                                          Yes

Slide 4

Regression vs. Survival Analysis
Technique            Mathematical model                         Yields
Linear Regression    Y = B1X + Bo (linear)                      Linear changes
Logistic Regression  Ln(P/(1-P)) = B1X + Bo (sigmoidal prob.)   Odds ratios
Survival Analyses    h(t) = ho(t)exp(B1X + Bo)                  Hazard rates


Slide 5

What is survival analysis?


Model time to failure or time to event
• Unlike linear regression, survival analysis has a dichotomous (binary) outcome
• Unlike logistic regression, survival analysis analyzes the time to an event
– Why is that important?
Able to account for censoring
Can compare survival between 2+ groups
Assess relationship between covariates and survival time



Slide 6

Importance of censored data



Why is censored data important?
What is the key assumption of censoring?


Slide 7

Types of censoring




Subject does not experience event of interest
Incomplete follow-up
• Lost to follow-up
• Withdraws from study
• Dies (if not being studied)
Left or right censored


Slide 8

When to use survival analysis


Examples
• Time to death or clinical endpoint
• Time in remission after treatment of disease
• Recidivism rate after addiction treatment




When one believes that 1+ explanatory variable(s) explain the differences in time to an event
Especially when follow-up is incomplete or variable


Slide 9

Relationship between survivor function
and hazard function


Survivor function, S(t), defines the probability of surviving longer than time t
• This is what the Kaplan-Meier curves show.
• Hazard function is the negative derivative of the log survivor function over time: h(t) = -d ln S(t)/dt = -[dS(t)/dt]/S(t)
– instantaneous risk of event at time t, given survival to t (conditional failure rate)
Survivor and hazard functions can be converted into each other
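A minimal sketch of that conversion, assuming the standard relations S(t) = exp(-∫h(u)du) and h(t) = -d ln S(t)/dt. The function names and the constant hazard of 0.1 per person-year are hypothetical, chosen only for illustration.

```python
import math

def survivor_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral of h(u) du from 0 to t), via the midpoint rule."""
    dt = t / steps
    cumulative_hazard = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative_hazard)

def hazard_from_survivor(survivor, t, eps=1e-6):
    """h(t) = -d ln S(t) / dt, via a central finite difference."""
    return -(math.log(survivor(t + eps)) - math.log(survivor(t - eps))) / (2 * eps)

# Hypothetical constant hazard of 0.1 events per person-year.
def constant_hazard(u):
    return 0.1

def exponential_survivor(u):
    return math.exp(-0.1 * u)

s5 = survivor_from_hazard(constant_hazard, 5.0)       # close to exp(-0.5)
h5 = hazard_from_survivor(exponential_survivor, 5.0)  # close to 0.1
```

With a constant hazard the survivor function is exponential, so going from hazard to survivor and back should recover the same rate.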


Slide 10

Approach to survival analysis


Like other statistics we have studied we can do any
of the following w/ survival analysis:
• Descriptive statistics
• Univariate statistics
• Multivariate statistics


Slide 11

Descriptive statistics


Average survival
• When can this be calculated?
• What test would you use to compare average survival
between 2 cohorts?



Average hazard rate
• Total # of failures divided by observed survival time
(units are therefore 1/t or 1/pt-yrs)
• An incidence rate, with higher values indicating more events per time
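The average hazard rate can be sketched in a few lines. The cohort below is hypothetical (five patients, follow-up in years), and the function name is an assumption for illustration.

```python
def average_hazard_rate(follow_up_years, had_event):
    """Total failures divided by total observed time (events per person-year)."""
    total_time = sum(follow_up_years)
    total_events = sum(1 for e in had_event if e)
    return total_events / total_time

# Hypothetical cohort: five patients, two failures, 20 person-years observed.
follow_up_years = [2.0, 5.0, 3.0, 6.0, 4.0]
had_event = [True, False, True, False, False]
rate = average_hazard_rate(follow_up_years, had_event)  # 2 failures / 20 person-years
```

Censored patients still contribute their observed time to the denominator, which is why this rate can be computed even when follow-up is incomplete.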


Slide 12

Univariate method: Kaplan-Meier
survival curves





Also known as product-limit formula
Accounts for censoring
Generates the characteristic “stair step” survival curves
Does not account for confounding or effect modification by other covariates
• When is that a problem?
• When is that OK?
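A minimal sketch of the product-limit calculation, assuming the usual convention that a subject censored at time t is still counted as at risk at t. The function name and the six-subject data set are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        survival *= 1.0 - deaths[t] / at_risk      # multiply by conditional survival
        curve.append((t, survival))
    return curve

# Hypothetical data: the subjects at times 2 and 4 with event = 0 are censored.
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Each event time multiplies the running estimate by the fraction of at-risk subjects who survive it, which is what produces the stair steps. Censored subjects only shrink the at-risk count at later times.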


Slide 13

[Figure: survival curves for Warf, ASA, and No Rx; Age 76 Years and Older (N = 394); x-axis: Days Since Index Hospitalization, 0 to 900; y-axis: 0.0 to 1.0]


Slide 14

Time to Cardiovascular Adverse Event in VIGOR Trial


Slide 15


Slide 16

Comparing Kaplan-Meier curves


Log-rank test can be used to compare survival curves
• Less commonly used test: Wilcoxon, which places greater weight on events near time 0.
Hypothesis test (test of significance)
• H0: the curves are statistically the same
• H1: the curves are statistically different
Compares observed to expected cell counts
Test statistic is compared to the χ² distribution
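A minimal sketch of the two-group log-rank calculation, assuming the standard hypergeometric variance at each event time and a chi-square reference distribution with 1 degree of freedom. The function name and both small data sets are hypothetical.

```python
import math

def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = zip(times1 + times2, events1 + events2)
    event_times = sorted({t for t, e in pooled if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u in times1 if u >= t)  # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)  # at risk in group 2
        d1 = sum(1 for u, e in zip(times1, events1) if e and u == t)
        d2 = sum(1 for u, e in zip(times2, events2) if e and u == t)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n                # expected events under H0
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    statistic = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # upper tail, chi-square 1 df
    return statistic, p_value

# Hypothetical groups: group A fails early, group B fails late.
stat_diff, p_diff = logrank_test([1, 2, 3, 4], [1, 1, 1, 1],
                                 [5, 6, 7, 8], [1, 1, 1, 1])
# Identical groups: observed equals expected at every event time.
stat_same, p_same = logrank_test([1, 2, 3, 4], [1, 1, 1, 1],
                                 [1, 2, 3, 4], [1, 1, 1, 1])
```

The statistic grows as the observed event count in one group departs from what the pooled risk sets would predict. With identical groups it is zero.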


Slide 17

Comparing multiple Kaplan-Meier curves



Multiple pair-wise comparisons produce cumulative
Type I error – multiple comparison problem
Instead, compare all curves at once
• analogous to using ANOVA to compare > 2 cohorts
• Then use judicious pair-wise testing
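A rough illustration of how the Type I error accumulates, assuming for simplicity that the pair-wise tests are independent (they generally are not, so this is only an approximation). The Bonferroni threshold shown is one common conservative correction, not something the slides prescribe.

```python
from math import comb

def familywise_error(n_curves, alpha=0.05):
    """Chance of at least one false positive across all pair-wise tests,
    assuming (for illustration) that the tests are independent."""
    n_pairs = comb(n_curves, 2)
    return n_pairs, 1.0 - (1.0 - alpha) ** n_pairs

n_pairs, fwer = familywise_error(4)  # 4 curves give 6 pair-wise comparisons
bonferroni_alpha = 0.05 / n_pairs    # conservative per-test threshold
```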


Slide 18

Limit of Kaplan-Meier curves



What happens when you have several covariates that you believe contribute to survival?
Example
• Smoking, hyperlipidemia, diabetes, and hypertension contribute to time to myocardial infarction
Can use stratified K-M curves for 2 or maybe 3 covariates
Need another approach for many covariates; the multivariate Cox proportional hazards model is most common
• (think multivariate regression or logistic regression rather than a Student’s t-test or the odds ratio from a 2 x 2 table)


Slide 19

Multivariate method: Cox proportional hazards



Needed to assess effect of multiple covariates on
survival
Cox-proportional hazards is the most commonly
used multivariate survival method
• Easy to implement in SPSS, Stata, or SAS
• Parametric approaches are an alternative, but they
require stronger assumptions about h(t).


Slide 20

Cox proportional hazard model


Works with hazard model
Conveniently separates baseline hazard function from covariates
• Baseline hazard function over time
– h(t) = ho(t)exp(B1X+Bo)
• Covariates are time independent
• B1 is used to calculate the hazard ratio, exp(B1), which is similar to the relative risk
Semiparametric: the baseline hazard ho(t) is left unspecified
Fitted with a partial likelihood function


Slide 21

Cox proportional hazards model, continued





Can handle both continuous and categorical predictor variables (think: logistic, linear regression)
Without knowing baseline hazard ho(t), can still calculate coefficients for each covariate, and therefore hazard ratio
Assumes multiplicative risk; this is the proportional hazard assumption
• Can be compensated in part with interaction terms
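A minimal sketch of why ho(t) is not needed: the coefficient can be found by maximizing Cox's partial likelihood, which only compares each failing subject with the others still at risk. This version assumes a single covariate and no tied event times; the function names and the eight-subject data set are hypothetical, and real software (SAS, Stata, SPSS) handles ties and multiple covariates.

```python
import math

def cox_score_and_information(beta, times, events, x):
    """Score (gradient) and information of the Cox partial log-likelihood
    for a single covariate, assuming no tied event times."""
    score = information = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk]
        total = sum(weights)
        mean_x = sum(w * x_j for w, x_j in zip(weights, risk)) / total
        mean_x2 = sum(w * x_j ** 2 for w, x_j in zip(weights, risk)) / total
        score += x_i - mean_x
        information += mean_x2 - mean_x ** 2
    return score, information

def fit_cox_one_covariate(times, events, x, iterations=50, tol=1e-10):
    """Newton-Raphson on the partial likelihood; ho(t) never appears."""
    beta = 0.0
    for _ in range(iterations):
        score, information = cox_score_and_information(beta, times, events, x)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: x = 1 marks an exposed subject; the last subject is censored.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 0]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta_hat = fit_cox_one_covariate(times, events, x)
hazard_ratio = math.exp(beta_hat)  # above 1 here: exposed subjects tend to fail earlier
```

The baseline hazard cancels out of every risk-set comparison, so only the ordering of event times and the covariate values enter the fit.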


Slide 22

Limitations of Cox PH model


Does not accommodate variables that change over
time
• Luckily most variables (e.g. gender, ethnicity, or
congenital condition) are constant
– If necessary, one can program time-dependent variables
– When might you want this?



Baseline hazard function, ho(t), is never specified
• You can estimate ho(t) accurately if you need to estimate
S(t).


Slide 23

Hazard ratio




What is the hazard ratio and how do you calculate it from your parameters, β?
How do we estimate the relative risk from the hazard ratio (HR)?
How do you determine significance of the hazard ratios (HRs)?
• Confidence intervals
• Chi square test
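A minimal sketch of these calculations, assuming the usual Wald approach: HR = exp(β), 95% limits = exp(β ± 1.96·SE), and chi-square = (β/SE)² on 1 degree of freedom. The function name is hypothetical; the estimate and standard error are the age2 row of the PHREG output shown later in this deck.

```python
import math

def hazard_ratio_summary(beta, se, z=1.96):
    """Hazard ratio, approximate 95% confidence limits, Wald chi-square (1 df),
    and its two-sided p-value."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    wald_chi2 = (beta / se) ** 2
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return hr, lower, upper, wald_chi2, p_value

# age2 row of the PHREG output: estimate 0.15690, standard error 0.05079.
hr, lower, upper, chi2, p = hazard_ratio_summary(0.15690, 0.05079)
```

Up to rounding, this reproduces the reported hazard ratio of 1.170 with limits 1.059 and 1.292. A confidence interval that excludes 1 corresponds to a significant chi-square test at the same level.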


Slide 24

Assessing model adequacy





Multiplicative assumption
Proportional assumption: covariate effects do not depend on time, so hazard ratios are constant over time
Three general ways to examine model adequacy
• Graphically
• Mathematically
• Computationally: Time-dependent variables (extended model)


Slide 25

Model adequacy: graphical approaches


Several graphical approaches
• Do the survival curves intersect?
• Log-minus-log plots
• Observed vs. expected plots
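A small sketch of why log-minus-log plots are informative. Under proportional hazards, S1(t) = S0(t)^HR, so ln(-ln S1(t)) and ln(-ln S0(t)) differ by the constant ln(HR) and the curves are parallel. The survival values and the hazard ratio of 2 below are hypothetical.

```python
import math

def log_minus_log(survival):
    """ln(-ln S(t)); defined for 0 < S(t) < 1."""
    return math.log(-math.log(survival))

# Hypothetical baseline survival at four time points, and a second group whose
# hazard is exactly twice the baseline (so S1(t) = S0(t) ** 2 under PH).
baseline = [0.95, 0.85, 0.70, 0.50]
hazard_ratio = 2.0
exposed = [s ** hazard_ratio for s in baseline]

gaps = [log_minus_log(s1) - log_minus_log(s0) for s0, s1 in zip(baseline, exposed)]
# Every gap equals ln(hazard_ratio): the two curves are parallel.
```

In real data, curves that converge, diverge, or cross on this scale suggest the proportional hazards assumption may not hold.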


Slide 26

Testing model adequacy mathematically with
a goodness-of-fit test
Uses a test of significance (hypothesis test)
One-degree-of-freedom chi-square distribution
p value for each coefficient
Does not discriminate how a coefficient might deviate from the PH assumption



Slide 27

Example: Tumor Extent




3000 patients derived from SEER cancer registry
and Medicare billing information
Exploring the relationship between tumor extent
and survival
Hypothesis is that more extensive tumor
involvement is related to poorer survival


Slide 28

Log-Rank

χ² = 269.0973, p < .0001


Slide 29

Example: Tumor Extent


Tumor extent may not be the only covariate that
affects survival
• Multiple medical comorbidities may be associated
with poorer outcome
• Ethnic and gender differences may contribute



Cox proportional hazards model can quantify
these relationships


Slide 30

Example: Tumor Extent



Test proportional hazards assumption with log-minus-log plot
Perform Cox PH regression
• Examine significant coefficients and corresponding hazard ratios


Slide 31


Slide 32

Example: Tumor Extent
The PHREG Procedure
Analysis of Maximum Likelihood Estimates

Variable   DF   Parameter   Standard   Chi-Square   Pr > ChiSq   Hazard   95% Hazard Ratio
                Estimate    Error                                Ratio    Confidence Limits
age2        1   0.15690     0.05079       9.5430    0.0020       1.170    1.059    1.292
age3        1   0.58385     0.06746      74.9127    <.0001       1.793    1.571    2.046
race2       1   0.16088     0.07953       4.0921    0.0431       1.175    1.005    1.373
race3       1   0.05060     0.09590       0.2784    0.5977       1.052    0.872    1.269
comorb1     1   0.27087     0.05678      22.7549    <.0001       1.311    1.173    1.465
comorb2     1   0.32271     0.06341      25.9046    <.0001       1.381    1.219    1.564
comorb3     1   0.61752     0.06768      83.2558    <.0001       1.854    1.624    2.117
DISTANT     1   0.86213     0.07300     139.4874    <.0001       2.368    2.052    2.732
REGIONAL    1   0.51143     0.05016     103.9513    <.0001       1.668    1.512    1.840
LIPORAL     1   0.28228     0.05575      25.6366    <.0001       1.326    1.189    1.479
PHARYNX     1   0.43196     0.05787      55.7206    <.0001       1.540    1.375    1.725
treat3      1   0.07890     0.06423       1.5090    0.2193       1.082    0.954    1.227
treat2      1   0.47215     0.06074      60.4215    <.0001       1.603    1.423    1.806
treat0      1   1.52773     0.08031     361.8522    <.0001       4.608    3.937    5.393

Variable labels: age2/age3 "70age>80"; race2 black; race3 other; treat3 both; treat2 rad; treat0 none


Slide 33

Summary






Survival analysis quantifies time to a single, dichotomous event
Handles censored data well
Survival and hazard can be mathematically converted to each other
Kaplan-Meier survival curves can be compared statistically and graphically
Cox proportional hazards models help distinguish individual contributions of covariates on survival, provided certain assumptions are met.