Essentials of survival analysis


How to practice evidence based oncology

European School of Oncology July 2004 Antwerp, Belgium Dr. Iztok Hozo Professor of Mathematics Indiana University NW www.iun.edu/~mathiho

Time-to-Event

Time-to-event data are generated when the measure of interest is the amount of time until an event of interest occurs.

For example:

– Time from randomization to death in a clinical trial
– Time from randomization to recurrence in a cancer clinical trial
– Time from diagnosis of cancer to death due to the cancer
– Time from diagnosis of cancer to death due to any cause
– Time from remission to relapse of leukemia
– Time from HIV infection to AIDS
– Time from exposure to cancer incidence in an epidemiological cohort study

Censoring

Censoring occurs when we have some information about a subject's time-to-event, but we don’t know its exact value.

For example, patients typically enter a clinical study at the time of randomization (or the time of diagnosis, or treatment) and are followed up until the event of interest is observed. However, censoring may occur for the following reasons:

– the person does not experience the event before the study ends;
– the person dies from a cause not considered to be the event of interest (traffic accident, adverse drug reaction, …);
– the person is lost to follow-up, for example, because they move away.

In each of these cases we say that the survival time is censored. These are examples of right censoring, which is the most common form of censoring in medical studies. For these patients, the complete time-to-event measure is unknown; we only know that the true time-to-event is greater than the observed measurement.
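
A minimal sketch of how a right-censored observation is usually recorded: the observed time is the smaller of the true event time and the time at which follow-up stops, together with an indicator of which one was seen. The function name `observe` and the three subjects are hypothetical and only illustrate the idea.

```python
def observe(true_event_time, follow_up_end):
    """Return (observed_time, event_indicator) under right censoring.

    event_indicator is 1 if the event was observed, 0 if the subject
    was censored (we only know the true time exceeds observed_time).
    """
    if true_event_time <= follow_up_end:
        return true_event_time, 1
    return follow_up_end, 0


# Hypothetical subjects: (true time to event, time at which follow-up stops)
subjects = [(14.0, 36.0), (50.0, 36.0), (20.0, 9.5)]
observed = [observe(t, c) for t, c in subjects]
print(observed)
```

In real data the true event time of a censored subject is of course never available; the sketch only shows what the recorded pair means.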

Example:

X means an event occurred; O means that the subject was censored.

Example 2 (from Kleinbaum: “Survival Analysis”)

Consider data from a retrospective study of 13 women who had surgery for breast cancer. The survival times are:

23, 47, 69, 70+, 71+, 100+, 101+, 148, 181, 198+, 208+, 212+, 224+

(the “+” means that the patient was censored at that time).

Patient       1    2    3    4    5    6    7    8    9   10   11   12   13
Time (t)     23   47   69   70   71  100  101  148  181  198  208  212  224
Censor (d)    1    1    1    0    0    0    0    1    1    0    0    0    0

Here d = 1 marks an observed event and d = 0 a censored time, in agreement with the “+” marks above.

Survival Curve - Calculus

S(t) = cumulative survival function = proportion that survive until time t
f(t) = frequency distribution of age at death
h(t) = hazard function (i.e. death rate at age t) = event rate

Relationships:

$$S(t) = P(T > t) = \int_t^{\infty} f(u)\,du = e^{-\int_0^t h(u)\,du}$$

$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} = -\frac{dS(t)}{dt}$$

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t)$$

Distribution Function, Survival Function and Density Function

• Probability distribution function: $F(t) = \Pr(T \le t)$
• Probability density function: $f(t) = \dfrac{dF(t)}{dt}$
• Survival function: $S(t) = \Pr(T > t) = 1 - F(t)$
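
The relationships between the survival, density and hazard functions can be checked numerically. The sketch below assumes a Weibull survival function with made-up shape and scale parameters (they are not from the slides) and compares the analytic density and hazard with finite-difference derivatives of S(t) and ln S(t).

```python
import math

SHAPE, SCALE = 1.5, 10.0  # hypothetical Weibull parameters, for illustration only


def survival(t):
    """S(t) = P(T > t) for the assumed Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard(t):
    """Analytic Weibull hazard h(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def density(t):
    """f(t) = h(t) * S(t), rearranged from h(t) = f(t) / S(t)."""
    return hazard(t) * survival(t)


def derivative(func, t, step=1e-5):
    """Central finite-difference approximation of d func / dt."""
    return (func(t + step) - func(t - step)) / (2 * step)


t = 7.0
neg_dS_dt = -derivative(survival, t)                             # should match f(t)
neg_dlogS_dt = -derivative(lambda u: math.log(survival(u)), t)   # should match h(t)
distribution = 1 - survival(t)                                   # F(t) = 1 - S(t)

print(f"f(t) = {density(t):.6f},  -dS/dt       = {neg_dS_dt:.6f}")
print(f"h(t) = {hazard(t):.6f},  -d ln S / dt = {neg_dlogS_dt:.6f}")
print(f"F(t) = {distribution:.6f}")
```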

Creating a Kaplan-Meier curve

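For each interval between successive non-censored failure times (time-to-event times), $t_{j-1} \le t < t_j$ with $t_0 = 0$, evaluate:

• $n_j$ = number at risk at the start of the interval
• $d_j$ = number of deaths from $t_{j-1}$ to $t_j$ (they occur at $t_{j-1}$, so $d_1 = 0$)
• Fraction $\dfrac{n_j - d_j}{n_j}$ = estimated probability of surviving past $t_{j-1}$, given that you are at risk at time $t_{j-1}$; for $j \ge 2$ it estimates $P(T > t_{j-1} \mid T > t_{j-2})$.

The Product Limit Formula:

$$S(t_{j-1}) = P(T > t_{j-1} \mid T > t_{j-2}) \cdot P(T > t_{j-2} \mid T > t_{j-3}) \cdots P(T > t_1 \mid T > t_0) = \frac{n_j - d_j}{n_j} \cdot \frac{n_{j-1} - d_{j-1}}{n_{j-1}} \cdots \frac{n_1 - d_1}{n_1}$$

The last factor equals 1, because no deaths occur before the first failure time ($d_1 = 0$).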

Kaplan-Meier Product Limit Estimate

Consider data from a retrospective study of 13 women who had surgery for breast cancer. The survival times are: 23, 47, 69, 70+, 71+, 100+, 101+, 148, 181, 198+, 208+, 212+, 224+

j   Interval          n_j   d_j   (n_j − d_j)/n_j   S(t)
1   0 ≤ t < 23         13     0        1.00         1.00
2   23 ≤ t < 47        13     1        0.92         0.92
3   47 ≤ t < 69        12     1        0.92         0.85
4   69 ≤ t < 148       11     1        0.91         0.77
5   148 ≤ t < 181       6     1        0.83         0.64
6   181 ≤ t             5     1        0.80         0.51

[Figure: Kaplan-Meier survival curve for these data; estimated survival (0% to 100%) against time (0 to 200).]
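
The product-limit calculation for these 13 patients can be sketched in a few lines of Python. The function name `kaplan_meier` is an illustrative choice, not part of any library; the event indicators follow the “+” marks in the list of survival times (1 = event, 0 = censored).

```python
def kaplan_meier(times, events):
    """Return a list of (t, n_at_risk, deaths, S(t)) for each distinct event time.

    times  : observed times (event or censoring)
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = list(zip(times, events))
    survival = 1.0
    rows = []
    for t in sorted({u for u, e in data if e == 1}):
        n_at_risk = sum(1 for u, _ in data if u >= t)          # still under observation at t
        deaths = sum(1 for u, e in data if u == t and e == 1)  # events exactly at t
        survival *= (n_at_risk - deaths) / n_at_risk           # product-limit step
        rows.append((t, n_at_risk, deaths, survival))
    return rows


times = [23, 47, 69, 70, 71, 100, 101, 148, 181, 198, 208, 212, 224]
events = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

km_rows = kaplan_meier(times, events)
for t, n, d, s in km_rows:
    print(f"t = {t:>3}   n = {n:>2}   d = {d}   S(t) = {s:.2f}")
```

Each printed row corresponds to the interval that starts at that failure time, so the S(t) values should agree with rows 2 to 6 of the table.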

Survival Curves – more examples

[Figure: survival curves for patients aged 76 years and older (N = 394), by treatment group (Warf, ASA, No Rx); vertical axis: survival, 0.0 to 1.0; horizontal axis: days since index hospitalization, 0 to 900.]

Log-Rank test for two groups

Suppose we have two groups, each with a different treatment. Usually, we represent this kind of situation in a 2x2 table:

                Event    No Event
Intervention      45       198
Control           52       203

or

                # at Risk     # Events
Intervention    n1 = 243      m1 = 45
Control         n2 = 255      m2 = 52
TOTAL:          N = 498       M = 97

$$\text{Expected} = (\#\text{ at risk in group}) \times \frac{\text{total } \#\text{ events}}{\text{total } \#\text{ at risk}}$$

$$O - E = \text{Observed} - \text{Expected}$$

$$\text{Var} = \frac{n_1 \, n_2 \, M \, (N - M)}{N^2 \, (N - 1)}$$

Expected number of events: Intervention 47.33, Control 49.67
Observed − Expected (Intervention): -2.33
Variance: 19.55
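
As a check on the arithmetic, the expected counts and the variance for this single 2x2 table can be computed directly. The function name `two_by_two_logrank` is only an illustrative choice.

```python
def two_by_two_logrank(events_1, at_risk_1, events_2, at_risk_2):
    """Expected events per group, observed minus expected for group 1, and variance."""
    total_events = events_1 + events_2      # M
    total_at_risk = at_risk_1 + at_risk_2   # N
    expected_1 = at_risk_1 * total_events / total_at_risk
    expected_2 = at_risk_2 * total_events / total_at_risk
    variance = (
        at_risk_1 * at_risk_2 * total_events * (total_at_risk - total_events)
        / (total_at_risk ** 2 * (total_at_risk - 1))
    )
    return expected_1, expected_2, events_1 - expected_1, variance


# Intervention: 45 events among 243 at risk; Control: 52 events among 255 at risk
exp_1, exp_2, obs_minus_exp, var = two_by_two_logrank(45, 243, 52, 255)
print(f"Expected: {exp_1:.2f} and {exp_2:.2f}")
print(f"Observed - Expected: {obs_minus_exp:.2f}")
print(f"Variance: {var:.2f}")
```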

Log-Rank test for two groups

• If the data are given through time, we have a series of 2x2 tables, one for each event time.
• Expected number of events: if the two groups were the same, what would the expected number of events be?
• Observed minus expected: this is a measure of the deviation of one treatment from the average (the expected).
• The log-rank statistic measures whether the data in the two groups are statistically “different”.

Comparing Survival Functions

• Question: Did the treatment make a difference in the survival experience of the two groups?
• Hypothesis: $H_0: S_1(t) = S_2(t)$ for all $t \ge 0$.
• Three often used tests:
  1. Log-rank test (aka Mantel-Haenszel test);
  2. Wilcoxon test;
  3. Likelihood ratio test.

Log-rank example

(from Kleinbaum: “Survival Analysis”)

Expected, Obs-Exp and Var refer to group 1.

Time     n1   m1   n2   m2   Expected   Obs-Exp    Var
1        21    0   21    2     1.00      -1.00    0.488
2        21    0   19    2     1.05      -1.05    0.486
3        21    0   17    1     0.55      -0.55    0.247
4        21    0   16    2     1.14      -1.14    0.477
5        21    0   14    2     1.20      -1.20    0.466
6        21    3   12    0     1.91       1.09    0.651
7        17    1   12    0     0.59       0.41    0.243
8        16    0   12    4     2.29      -2.29    0.871
10       15    1    8    0     0.65       0.35    0.227
11       13    0    8    2     1.24      -1.24    0.448
12       12    0    6    2     1.33      -1.33    0.418
13       12    1    4    0     0.75       0.25    0.188
15       11    0    4    1     0.73      -0.73    0.196
16       11    1    3    0     0.79       0.21    0.168
17       10    0    3    1     0.77      -0.77    0.178
22        7    1    2    1     1.56      -0.56    0.302
23        6    1    1    1     1.71      -0.71    0.204
Total:         9        21               -10.25   6.257

Log-rank statistic: 16.7929
Chi-square p-value: 0.00004

[Figure: Kaplan-Meier curves for the two groups; survival (0% to 100%) against time (0 to 30).]
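
The totals in this example can be reproduced by treating every event time as its own 2x2 table and summing observed minus expected and the variance over time. This is a sketch under stated assumptions: the function name `logrank` is illustrative, the rows are the (time, n1, m1, n2, m2) values of the example, and the p-value uses the fact that a chi-square variable with 1 degree of freedom has upper tail probability erfc(sqrt(x/2)).

```python
import math

# (time, n1, m1, n2, m2): numbers at risk and numbers of events in each group
ROWS = [
    (1, 21, 0, 21, 2), (2, 21, 0, 19, 2), (3, 21, 0, 17, 1), (4, 21, 0, 16, 2),
    (5, 21, 0, 14, 2), (6, 21, 3, 12, 0), (7, 17, 1, 12, 0), (8, 16, 0, 12, 4),
    (10, 15, 1, 8, 0), (11, 13, 0, 8, 2), (12, 12, 0, 6, 2), (13, 12, 1, 4, 0),
    (15, 11, 0, 4, 1), (16, 11, 1, 3, 0), (17, 10, 0, 3, 1), (22, 7, 1, 2, 1),
    (23, 6, 1, 1, 1),
]


def logrank(rows):
    """Return (sum of O-E for group 1, sum of variances, statistic, p-value)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for _, n1, m1, n2, m2 in rows:
        n, m = n1 + n2, m1 + m2
        obs_minus_exp += m1 - n1 * m / n                    # observed - expected, group 1
        variance += n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
    statistic = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2))           # chi-square tail, 1 df
    return obs_minus_exp, variance, statistic, p_value


total_o_minus_e, total_var, chi_square, p = logrank(ROWS)
print(f"O - E = {total_o_minus_e:.2f}, Var = {total_var:.3f}")
print(f"Log-rank statistic = {chi_square:.4f}, p = {p:.5f}")
```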

Survival data vs. two-by-two table = different

Survival data (q = number censored, S = Kaplan-Meier estimate):

Time     n1   m1    S1     n2   m2    S2
0        21    0   100%    21    0   100%
1        21    0   100%    21    2    90%
2        21    0   100%    19    2    81%
3        21    0   100%    17    1    76%
4        21    0   100%    16    2    67%
5        21    0   100%    14    2    57%
6        21    3    86%    12    0    57%
7        17    1    81%    12    0    57%
8        16    0    81%    12    4    38%
10       15    1    75%     8    0    38%
11       13    0    75%     8    2    29%
12       12    0    75%     6    2    19%
13       12    1    69%     4    0    19%
15       11    0    69%     4    1    14%
16       11    1    63%     3    0    14%
17       10    0    63%     3    1    10%
22        7    1    54%     2    1     5%
23        6    1    45%     1    1     0%
Total:         9                21

q1: 1, 1, 2, 3 and 5 subjects are censored in the intervals starting at times 6, 7, 10, 16 and 23; q2 = 0 throughout (no censoring in group 2).

Two-by-two table of the same data:

            Rx1   Rx2   Total
Event         9    21     30
No event     12     0     12
Total        21    21     42

Surv. Rx1 = 12/21 = 57.1%
Surv. Rx2 = 0/21 = 0.0%
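
To make the contrast concrete, the sketch below recomputes the Kaplan-Meier estimate for Rx1 from the numbers at risk and numbers of events at its event times, and compares it with the crude proportion from the two-by-two table. The helper name `product_limit` is illustrative; the input lists are the n1 and m1 values at the times where group 1 has events.

```python
def product_limit(at_risk, events):
    """Running product of (n - m) / n over successive event times."""
    survival = 1.0
    curve = []
    for n, m in zip(at_risk, events):
        survival *= (n - m) / n
        curve.append(survival)
    return curve


# Group 1 (Rx1) at its event times 6, 7, 10, 13, 16, 22, 23
n1 = [21, 17, 15, 12, 11, 7, 6]
m1 = [3, 1, 1, 1, 1, 1, 1]

km_curve = product_limit(n1, m1)
km_final = km_curve[-1]   # Kaplan-Meier estimate of survival past time 23
crude = 12 / 21           # proportion without an event in the 2x2 table

print(f"Kaplan-Meier S1 after time 23: {km_final:.1%}")
print(f"Crude proportion surviving:    {crude:.1%}")
```

The crude proportion counts censored patients as survivors, while the Kaplan-Meier estimate only uses them for as long as they were actually at risk, which is why the two numbers differ.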

Log-Rank test for several groups

• The null hypothesis is that all the survival curves are the same.
• The log-rank statistic is given by the sum:

$$X^2 = \sum_{i=1}^{\#\text{ of groups}} \frac{(O_i - E_i)^2}{\operatorname{Var}(O_i - E_i)} \approx \sum_{i=1}^{\#\text{ of groups}} \frac{(O_i - E_i)^2}{E_i}$$

• This statistic has a Chi-square distribution with (# of groups – 1) degrees of freedom.

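The approximate form of the statistic, the sum of (O − E)² / E over groups, is easy to evaluate by hand or in code. The sketch below applies it to the two-group example from the earlier slide; the expected counts 19.25 and 10.75 are derived from that example (observed 9 and 21, total O − E of −10.25 for group 1), and the function name `approx_logrank` is illustrative.

```python
def approx_logrank(observed, expected):
    """Approximate log-rank statistic and its degrees of freedom."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom


observed = [9, 21]          # events in group 1 and group 2
expected = [19.25, 10.75]   # derived: 9 + 10.25 and 30 - 19.25

approx_stat, df = approx_logrank(observed, expected)
print(f"Approximate statistic = {approx_stat:.2f} on {df} degree(s) of freedom")
```

For these data the approximation gives a somewhat smaller value than the variance-based statistic of 16.79, which is the usual direction of the difference.
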
Cox Proportional Hazards Regression

• Most interesting survival-analysis research examines the relationship between survival (typically in the form of the hazard function) and one or more explanatory variables (or covariates).
• Most common are linear-like models for the log hazard, for example a parametric regression model based on the exponential distribution.
• Such models are needed to assess the effect of multiple covariates on survival.
• Cox proportional hazards regression is the most commonly used multivariate survival method.
  – Easy to implement in SPSS, Stata, or SAS
  – Parametric approaches are an alternative, but they require stronger assumptions about h(t).
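
The formula for the exponential-based model did not survive on this slide. A common way to write such a model, given here as an assumption about what was intended and with notation ($\alpha$, $\beta_k$, $x_{ik}$) that is not from the slides, is a linear model for the log hazard with a constant baseline:

$$\log h_i(t) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$

The Cox model keeps the same linear part but lets the baseline vary with time without specifying its shape:

$$\log h_i(t) = \alpha(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}
\quad\Longleftrightarrow\quad
h_i(t) = h_0(t)\, e^{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}}$$

For two subjects $i$ and $i'$ the ratio $h_i(t) / h_{i'}(t)$ then does not depend on $t$, which is the proportional hazards property.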

Multivariate methods: Cox proportional hazards

• Assumes multiplicative risk; this is the proportional hazards assumption
• Conveniently separates the baseline hazard function from the covariates
  – Baseline hazard function varies over time
  – Covariates are time independent
• Semiparametric: the baseline hazard is left unspecified (nonparametric), while the covariate effects are modeled through coefficients
• Can handle both continuous and categorical predictor variables (think: logistic, linear regression)
• Without knowing the baseline hazard $h_0(t)$, we can still calculate coefficients for each covariate, and therefore hazard ratios

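The proportional hazards assumption can be illustrated numerically: whatever the baseline hazard looks like, the ratio of the hazards of two subjects stays constant over time. In the sketch below the baseline hazard and the coefficient are invented purely for illustration.

```python
import math


def baseline_hazard(t):
    """Hypothetical baseline hazard h0(t); any positive function would do."""
    return 0.02 * math.sqrt(t)


def hazard(t, x, beta):
    """Proportional hazards model: h(t | x) = h0(t) * exp(beta * x)."""
    return baseline_hazard(t) * math.exp(beta * x)


BETA = math.log(0.70)  # hypothetical coefficient, chosen so that HR = 0.70

# Hazard ratio of a treated subject (x = 1) to a control subject (x = 0)
ratios = [hazard(t, 1, BETA) / hazard(t, 0, BETA) for t in (1.0, 5.0, 25.0)]
print(ratios)
```

The baseline hazard cancels in the ratio, which is why the coefficients, and therefore the hazard ratios, can be estimated without specifying it.
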
Limitations of Cox PH model

• Covariates normally do not vary over time
  – True with respect to gender, ethnicity, or congenital condition
  – One can program time-dependent variables
• Baseline hazard function, $h_0(t)$, is never specified, but Cox PH models known hazard functions
• You can estimate $h_0(t)$ accurately if you need to estimate S(t).

Hazard Ratio

$$HR = e^{(O - E)/V}$$

Interesting to interpret. For example, if HR = 0.70, we can deduce the following:

• Relative effect on survival is $1 - HR = 1 - 0.70 = 0.30$, or a 30% reduction of the risk of death.
• Absolute difference in survival is given as $e^{\ln(S) \cdot HR} - S = S^{HR} - S$; so, if S = 60%, AbsDiff $= 0.60^{0.70} - 0.60 = 0.10$, which represents a 10% difference.
• Difference in median survival is given as the difference between median/HR and the median. For example, if the median is 25 months, then the difference is $25 / 0.70 - 25 = 10.71$, or a 10.71-month increase in median survival.
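
These interpretations are simple enough to script. The sketch below uses illustrative function names and reproduces the three worked numbers for HR = 0.70; it also applies the formula HR = exp((O − E)/V) to the log-rank example from the earlier slide (O − E = −10.25, V = 6.257).

```python
import math


def hazard_ratio(obs_minus_exp, variance):
    """HR = exp((O - E) / V)."""
    return math.exp(obs_minus_exp / variance)


def relative_risk_reduction(hr):
    """Relative effect on survival: 1 - HR."""
    return 1 - hr


def absolute_survival_difference(control_survival, hr):
    """S^HR - S, the absolute gain over a control survival of S."""
    return control_survival ** hr - control_survival


def median_survival_gain(control_median, hr):
    """median / HR - median."""
    return control_median / hr - control_median


HR = 0.70
reduction = relative_risk_reduction(HR)
abs_diff = absolute_survival_difference(0.60, HR)
median_gain = median_survival_gain(25, HR)
hr_logrank_example = hazard_ratio(-10.25, 6.257)

print(f"Relative reduction: {reduction:.0%}")
print(f"Absolute difference at S = 60%: {abs_diff:.3f}")
print(f"Gain in median survival: {median_gain:.2f} months")
print(f"HR for the log-rank example: {hr_logrank_example:.2f}")
```

The median rule is exact only when survival is roughly exponential (constant hazards), so it is best read as an approximation.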