Survival Analysis:
An Introductory Course
Scott Harris
October 2009
Learning outcomes
By the end of this session you should:
• know when to apply survival methods;
• understand how to use the survival techniques in SPSS
and the differences between them;
• be able to produce and interpret life tables;
• be able to produce and interpret Kaplan-Meier curves;
2
Contents
• Introduction
– When/why use survival analysis.
– Types of survival/time to event data.
• Life table analysis
– Producing life tables by hand.
– Producing life tables in SPSS.
• Kaplan-Meier
– Producing Kaplan-Meier plots in SPSS.
– Comparison of Kaplan-Meier survival curves (Logrank test) in SPSS.
3
Dataset 1: Typical survival dataset
Not all survival times known (limited follow-up)

Cancer group   Time to death   Death status
Pancreatic     39              Deceased
Breast         ? (>45)         Alive
Breast         ? (>68)         Alive
Pancreatic     94              Deceased
Pancreatic     67              Deceased
Breast         352             Deceased
4
Survival analysis?
• Potentially missing values
– Lost to follow-up
– Withdrew from study
• Limited duration of follow-up
– Some patients still alive – yet to experience the
event of interest (death)
• Comparative analysis
– Survival analysis methods
5
Dataset 2
Beijing, 100m final:

Athlete             Country   Time    Status
Usain Bolt          JAM       9.69    Finished
Richard Thompson    TRI       9.89    Finished
Walter Dix          USA       9.91    Finished
Churandy Martina    AHO       9.93    Finished
Asafa Powell        JAM       9.95    Finished
Michael Frater      JAM       9.97    Finished
Marc Burns          TRI       10.01   Finished
Darvis Patton       USA       10.03   Finished
6
Survival analysis? (Time to event)
• Potentially missing values
– Disqualified
– Injured
• Short duration of follow-up
– Everyone who finishes will have a time
• Comparative analysis (JAM vs. other)
– Survival Methods possible but...
– Independent samples t test (Normal)
– Mann-Whitney test (Non parametric)
7
When to use survival methods?
• Time to event data
– Duration between treatment and death
– Time from admission to successful discharge from hospital
– Time from starting a diet to losing 10 lbs.
– Time from release to watching the new Harry Potter film
• The event may or may not happen:
– Ever (some people will die in hospital before being
discharged)
– In the time period concerned (limited follow-up)
8
Censoring
Censoring occurs when we have missing information.
• Left Censoring:
Unclear on exact start of monitoring
– Missing date of birth
– Unknown date of starting treatment
– Experiences event before inclusion in study
• Right censoring:
Some individuals may not be observed for the full time to event
– Loss to follow-up
– Drop out
– Termination of study / follow-up
9
Right censoring
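[Figure: timeline of subjects followed from study start to end of follow-up, with time (days) on the horizontal axis. The key distinguishes subjects, events and censored times, and the earliest event is marked. A censored subject has had no event but has no more follow-up: the actual event may occur later, but having stopped follow-up earlier it would be missed.]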
10
Right censoring – Staggered start
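[Figure: the same timeline with staggered start times, with time (days) on the horizontal axis. The key distinguishes subjects, events and censored times, and the quickest event is marked. A censored subject has had no event but has no more follow-up: the actual event may occur later, but having stopped follow-up earlier it would be missed.]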
11
The Example Dataset
SPSS – Survival time data
In SPSS (as with other packages) we require the
following two variables when dealing with survival
time data:
– A continuous time variable that measures the time
until either the event or the individual's withdrawal
(censoring).
– A categorical variable that acts as an indicator for
whether the subject experienced the event of
interest or whether they did not and were
censored.
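As a rough illustration of this two-variable layout, the sketch below encodes the six records of Dataset 1 in Python (used here only for illustration; it is not part of the SPSS workflow). The field names `time` and `event` are assumptions, and the slides give no time unit for Dataset 1. A subject who is still alive keeps the last known follow-up time and gets an indicator of 0.

```python
from typing import NamedTuple


class Record(NamedTuple):
    group: str
    time: float  # time until the event or until censoring
    event: int   # 1 = event observed (death), 0 = censored (still alive)


# Dataset 1: "? (>45)" becomes time=45 with event=0 (right-censored).
dataset1 = [
    Record("Pancreatic", 39, 1),
    Record("Breast", 45, 0),
    Record("Breast", 68, 0),
    Record("Pancreatic", 94, 1),
    Record("Pancreatic", 67, 1),
    Record("Breast", 352, 1),
]

n_events = sum(r.event for r in dataset1)
n_censored = len(dataset1) - n_events
print(n_events, "events,", n_censored, "censored")
```

The censored rows are kept rather than dropped: they still tell us the subject survived at least that long.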
13
Example dataset
• Time to event data for two groups (Group A and
Group B): Coded 1 and 2 respectively.
• Time in days until event or until end of follow-up.
• Whether the individual has had the event of interest
(‘No event’ and ‘Event’): Coded 0 and 1 respectively.
• The age of the individual at the start of the study.
14
Example dataset
Group   Time   Status     Age
A       9      Event      65
A       12     No event   61
A       14     Event      57
A       14     Event      55
A       16     No event   50
A       18     Event      52
A       24     Event      51
A       30     No event   50
B       3      Event      70
B       7      Event      64
B       9      No event   64
B       11     Event      61
B       12     Event      53
B       15     Event      51
B       19     Event      50
B       21     Event      48
15
SPSS – Example dataset
16
SPSS – Example dataset: Labelled
17
SPSS – Calculating the Time
Transform → Compute Variable…
* Calculating the Time in days .
COMPUTE Time = DATEDIFF(LastDate,StartDate,"Days") .
EXECUTE .
18
Info: Creating new variables in SPSS
1) From the menus select ‘Transform’ → ‘Compute…’.
2) Enter the name of the new variable that you want to create into
the ‘Target Variable:’ box.
3) Enter the formula for the new variable into the ‘Numeric
Expression’ box.
In this case we just want to create the difference between
two date variables. To do this we need to make use of the
date functions. Select ‘Date Arithmetic’ and then ‘Datediff’
from the boxes on the right. Then we need to replace the
question marks with the relevant information as indicated
by the function help in the middle of the window. In this
case ‘DATEDIFF(LastDate,StartDate,"Days")’ was entered
in the ‘Numeric Expression’ box.
4) Finally click ‘OK’ to produce the new variable or ‘Paste’ to add
the syntax for this into your syntax file.
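For readers who want to check a computed Time value outside SPSS, here is a minimal Python sketch of the same date arithmetic. The two dates are hypothetical, chosen only for illustration, and the function name `days_between` is an assumption rather than anything from SPSS.

```python
from datetime import date


def days_between(start: date, last: date) -> int:
    """Whole days from start to last, i.e. last minus start."""
    return (last - start).days


# Hypothetical dates, for illustration only.
start_date = date(2009, 1, 5)
last_date = date(2009, 1, 14)
time_days = days_between(start_date, last_date)
print(time_days)  # 9
```

As in the DATEDIFF call above, the later date comes first in the subtraction, so the result is positive when follow-up ends after the start date.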
19
SPSS – Example dataset: Complete
20
Practical Questions
Survival Analysis
Question 1
Practical: Download & Setup
From the course webpage download the two SPSS datasets that
will be used for the practicals by clicking the right mouse
button on the file name and selecting Save Target As.
The two datasets are:
– Survival_Ex1.sav (the example dataset used in the slides)
– BC_Survival.sav (a dataset on breast cancer survival; data
are from the Mayo Clinic)
Open up both of the datasets in SPSS.
1) Calculate the Time variable for the Survival_Ex1.sav dataset.
22
Life Table Analysis
Life table analysis
• The simplest form of survival analysis
– Generally the quickest to do by hand
– Split the time variable into X categories
– One set of calculations for each time category
– Most easily done in a table structure, hence the
name
24
Theory: Life table analysis
• For Each time category:
– No. Entering: Subjects entering (NE)
– No. withdrawing : Subjects withdrawing (NW)
– At risk (AR): $AR = NE - \frac{NW}{2}$
– Events: Number of events (Number of failures)
25
Theory: Life table analysis
• For Each time category:
– Proportion failing: $\frac{\text{No. failures}}{AR}$
– Proportion surviving: $1 - \frac{\text{No. failures}}{AR} = P_i$
– Cumulative survival: $P_i \times CP_{i-1} = CP_i$
where $P_i$ is the proportion surviving at time point $i$,
$CP_{i-1}$ is the cumulative proportion at time point $i-1$ (previous)
and $CP_i$ is the cumulative proportion at time point $i$ (current).
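The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not how SPSS computes its output: the function name `life_table` and the dictionary keys are assumptions, and the data are the Group A times and statuses from the example dataset (1 = event, 0 = no event).

```python
def life_table(times, events, width, n_intervals):
    """Life table with one row (dict) per interval [start, start + width)."""
    rows = []
    cum_surv = 1.0
    entering = len(times)
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        in_interval = [e for t, e in zip(times, events) if start <= t < end]
        n_events = sum(in_interval)
        withdrew = len(in_interval) - n_events
        at_risk = entering - withdrew / 2      # AR = NE - NW/2
        p_fail = n_events / at_risk if at_risk > 0 else 0.0
        p_surv = 1 - p_fail                    # P_i
        cum_surv *= p_surv                     # CP_i = P_i x CP_(i-1)
        rows.append({
            "start": start, "entering": entering, "withdrew": withdrew,
            "at_risk": at_risk, "events": n_events,
            "p_fail": p_fail, "p_surv": p_surv, "cum_surv": cum_surv,
        })
        entering -= len(in_interval)           # events and withdrawals both leave
    return rows


# Group A from the example dataset.
times_a = [9, 12, 14, 14, 16, 18, 24, 30]
events_a = [1, 0, 1, 1, 0, 1, 1, 0]
rows_a = life_table(times_a, events_a, width=10, n_intervals=4)
for row in rows_a:
    print(row)
```

Subjects who withdraw during an interval count as half a person at risk, which is why the last interval has 0.5 at risk.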
26
Theory: Life table analysis
Group A life table

Interval    Entering   Withdrew   At risk   Events   Failing   Surviving   Cum. survival
0 to <10    8          0          8         1        0.125     0.875       0.875
10 to <20   7          2          6         3        0.5       0.5         0.438
20 to <30   2          0          2         1        0.5       0.5         0.219
30 to <40   1          1          0.5       0        0         1           0.219

Worked calculations:
– 0 to <10: at risk = 8 – 0/2 = 8; failing = 1/8 = 0.125; surviving = 1 – 0.125 = 0.875.
– 10 to <20: at risk = 7 – 2/2 = 6; failing = 3/6 = 0.5; surviving = 1 – 0.5 = 0.5;
cumulative survival = 0.5 x 0.875 = 0.438.
27
SPSS – Life table analysis
Analyze → Survival → Life Tables…
29
SPSS – Life table analysis
* Calculating the life table .
SURVIVAL
TABLE=Time BY Group(1 2)
/INTERVAL=THRU 40 BY 10
/STATUS=Status(1)
/PRINT=TABLE .
30
Info: Life table analysis in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Life Tables…’.
2) Put the variable containing the time into the ‘Time:’ box. Decide on the
period of time to group together and put this into the ‘by’ box of the
‘Display Time Intervals’ box. The first value to go into the ‘Display
Time Intervals’ box has to be a multiple of the value in the ‘by’ box as
well as being greater than the longest time recorded in your dataset.
3) Put the categorical variable that indicates whether a case had the
event of interest or not into the ‘Status:’ box. Then click the ‘Define
Event…’ button and enter the single value or range of values that all
indicate that the event occurred. Click ‘Continue’.
4) If you want separate results for each level of a categorical variable then
put this variable into the ‘Factor:’ box. Click the ‘Define Range…’ box
and then enter the numeric codes for the minimum and maximum of
the groups that you want to compare. Click ‘Continue’.
5) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the
syntax for this into your syntax file.
31
SPSS – Life table analysis : Output
Life Table (first-order control: Group)

Group     Interval     Number     Number        Number    Terminal   Proportion    Proportion   Cum. prop.   Std. error
          start time   entering   withdrawing   exposed   events     terminating   surviving    surviving    of cum. prop.
Group B   .000         8          1             7.500     2          .27           .73          .73          .16
Group B   10.000       5          0             5.000     4          .80           .20          .15          .14
Group B   20.000       1          0             1.000     1          1.00          .00          .00          .00
Group A   .000         8          0             8.000     1          .13           .88          .88          .12
Group A   10.000       7          2             6.000     3          .50           .50          .44          .19
Group A   20.000       2          0             2.000     1          .50           .50          .22          .18
Group A   30.000       1          1             .500      0          .00           1.00         .22          .18

Group     Interval     Probability   Std. error    Hazard   Std. error
          start time   density       of density    rate     of hazard rate
Group B   .000         .027          .016          .03      .02
Group B   10.000       .059          .018          .13      .05
Group B   20.000       .015          .014          .20      .00
Group A   .000         .013          .012          .01      .01
Group A   10.000       .044          .019          .07      .04
Group A   20.000       .022          .018          .07      .06
Group A   30.000       .000          .000          .00      .00

The cumulative proportions surviving at the end of each interval are the same values as
were calculated by hand.
32
Practical Questions
Survival Analysis
Questions 2 and 3
Practical Questions
2) Calculate the Life table values for Group B from the
example dataset by hand, using the skeleton table
below:
Interval    Entering   Withdrew   At risk   Events   Failing   Surviving   Cum. survival
0 to <10
10 to <20
20 to <30
34
Practical Questions
The file BC_Survival.sav contains data on 1207
women who were diagnosed with breast cancer.
3) Produce a Life table for this data, separating those
women for whom the cancer had infected the lymph
nodes from those for whom it had not (ln_yesno).
Split the survival time into yearly periods.
35
Practical Solutions
2. The life table for Group B should look like this:
Interval    Entering   Withdrew   At risk   Events   Failing   Surviving   Cum. survival
0 to <10    8          1          7.5       2        0.267     0.733       0.733
10 to <20   5          0          5         4        0.8       0.2         0.147
20 to <30   1          0          1         1        1.0       0           0
36
Practical Solutions: Instructions
3. To produce the Life table you will need syntax
similar to the following:
* Producing the Life table .
SURVIVAL
TABLE=time BY ln_yesno(0 1)
/INTERVAL=THRU 144 BY 12
/STATUS=status(1)
/PRINT=TABLE .
37
Practical Solutions: Output
Life Table (first-order control: Lymph Nodes?)

Lymph    Interval     Number     Number        Number    Terminal   Proportion    Proportion   Cum. prop.
nodes?   start time   entering   withdrawing   exposed   events     terminating   surviving    surviving
No       .000         929        97            880.500   1          .00           1.00         1.00
No       12.000       831        133           764.500   7          .01           .99          .99
No       24.000       691        117           632.500   8          .01           .99          .98
No       36.000       566        142           495.000   11         .02           .98          .96
No       48.000       413        124           351.000   5          .01           .99          .94
No       60.000       284        90            239.000   4          .02           .98          .93
No       72.000       190        70            155.000   5          .03           .97          .90
No       84.000       115        50            90.000    0          .00           1.00         .90
No       96.000       65         30            50.000    1          .02           .98          .88
No       108.000      34         18            25.000    0          .00           1.00         .88
No       120.000      16         13            9.500     0          .00           1.00         .88
No       132.000      3          3             1.500     0          .00           1.00         .88
Yes      .000         278        32            262.000   1          .00           1.00         1.00
Yes      12.000       245        50            220.000   8          .04           .96          .96
Yes      24.000       187        30            172.000   6          .03           .97          .93
Yes      36.000       151        24            139.000   9          .06           .94          .87
Yes      48.000       118        29            103.500   3          .03           .97          .84
Yes      60.000       86         31            70.500    1          .01           .99          .83
Yes      72.000       54         21            43.500    2          .05           .95          .79
Yes      84.000       31         9             26.500    0          .00           1.00         .79
Yes      96.000       22         9             17.500    0          .00           1.00         .79
Yes      108.000      13         7             9.500     0          .00           1.00         .79
Yes      120.000      6          6             3.000     0          .00           1.00         .79

(The standard error of the cumulative proportion surviving and any later columns are cut off on the slide.)
38
Practical Solutions: Output
[The life table from the previous slide is repeated on this slide in overlapping sections; the values are identical.]
39
Kaplan-Meier
Kaplan-Meier
• Rather than categorising, we can estimate the survival
function directly from the continuous survival times.
• Imagine creating a life table so that each time interval
contains exactly one case. Multiplying these survival
probabilities across the intervals gives what is known
as the Kaplan-Meier product limit estimator.
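A minimal sketch of this product limit calculation in Python follows, using the Group A times and statuses from the example dataset (1 = event, 0 = no event). The function name `kaplan_meier` is an assumption for illustration, and the sketch treats subjects censored at a given time as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival estimate)] at each distinct event time."""
    surv = 1.0
    steps = []
    for t in sorted(set(times)):
        # Everyone whose follow-up lasted at least until t is still at risk.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(e for u, e in zip(times, events) if u == t)
        if deaths:
            surv *= 1 - deaths / at_risk  # multiply by this step's survival
            steps.append((t, surv))
    return steps


# Group A from the example dataset.
times_a = [9, 12, 14, 14, 16, 18, 24, 30]
events_a = [1, 0, 1, 1, 0, 1, 1, 0]
km_a = kaplan_meier(times_a, events_a)
for t, s in km_a:
    print(t, round(s, 3))
```

The estimate only changes at event times; censored times reduce the number at risk for later steps without producing a step themselves.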
41
SPSS – Kaplan-Meier
Analyze → Survival → Kaplan-Meier…
(Just looking at Group A)
There is a filter in place to limit the results to those from Group A alone.
42
SPSS – Kaplan-Meier
* KM plot for just Group A .
KM
Time /STATUS=Status(1)
/PRINT TABLE MEAN
/PLOT SURVIVAL .
43
Info: Kaplan-Meier in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Kaplan-Meier…’.
2) Put the variable containing the time into the ‘Time:’ box.
3) Put the categorical variable, that indicates whether a case had
the event of interest or not into the ‘Status:’ box. Then click the
‘Define Event…’ button and enter the single value or range of
values that all indicate that the event occurred. Click
‘Continue’.
4) If you want separate curves and results for each level of a
categorical variable then put this variable into the ‘Factor:’ box.
5) Click the ‘Options’ button and tick the ‘Survival’ option in the
‘Plots’ box. Click ‘Continue’.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add
the syntax for this into your syntax file.
44
SPSS – Kaplan-Meier: Output - values
Survival Table

      Time     Status     Cumulative proportion          N of cumulative   N of remaining
                          surviving at the time          events            cases
                          Estimate     Std. error
1     9.000    Event      .875         .117              1                 7
2     12.000   No event   .            .                 1                 6
3     14.000   Event      .            .                 2                 5
4     14.000   Event      .583         .186              3                 4
5     16.000   No event   .            .                 3                 3
6     18.000   Event      .389         .201              4                 2
7     24.000   Event      .194         .170              5                 1
8     30.000   No event   .            .                 5                 0

– Each row gives the information for an individual subject, in order of length of follow-up.
– The estimate is the calculated proportion with no event at each time point.
– N of cumulative events gives the total number of events.

Means and Medians for Survival Time

            Estimate   Std. error   95% CI lower bound   95% CI upper bound
Mean (a)    19.208     2.724        13.870               24.547
Median      18.000     4.140        9.885                26.115

a. Estimation is limited to the largest survival time if it is censored.
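The mean and median in this output can be checked by hand from the Kaplan-Meier steps. The sketch below is an illustration under stated assumptions, not SPSS's own routine: the function names are made up, the step values are the unrounded versions of .875, .583, .389 and .194, and the mean is taken as the area under the survival curve up to the largest observed time (30), as footnote a describes.

```python
# (time, S(t)) steps for Group A; unrounded versions of .875, .583, .389, .194
steps = [(9, 7 / 8), (14, 7 / 12), (18, 7 / 18), (24, 7 / 36)]
last_time = 30  # largest observed time (censored), where estimation stops


def restricted_mean(steps, last_time):
    """Area under the step survival curve from time 0 to last_time."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (last_time - prev_t)


def median_survival(steps):
    """First time the survival estimate is 0.5 or lower; None if never reached."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None


mean_a = restricted_mean(steps, last_time)
median_a = median_survival(steps)
print(round(mean_a, 3), median_a)
```

If the curve never drops to 0.5, `median_survival` returns `None`, which corresponds to SPSS printing no median estimate.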
45
SPSS – Kaplan-Meier: Output - plot
[Figure: Kaplan-Meier survival curve for Group A. x-axis: Time (0 to 30); y-axis: Proportion: No event (0.00 to 1.00).]
46
SPSS – Kaplan-Meier: Output - plot
[Figure: Kaplan-Meier survival curve for Group A. x-axis: Time (0 to 30); y-axis: Proportion: No event (0.00 to 1.00).]
Can also mark where censored observations occur (not advisable for large datasets).
47
SPSS – Kaplan-Meier
The last few plots are not from SPSS
but come from another statistical
package: Stata. The default KM plot
from SPSS (shown here) is ok but
generally needs a bit of tidying up
within the SPSS graph editor.
As you can see the plot does not
automatically start from the top left
corner (100% survival at time 0). It
starts from the time of the first
event, which is not ideal. You may
also notice the time axis (x axis)
does not start from 0 although this
is easily altered.
48
Log-rank test
• Allows for comparison between groups.
• Possible to compute by hand (based on Chi-square).
• ‘Just another option’ when using a Statistics package.
• Other options for comparison include the Breslow and
Tarone-Ware tests.
H0: No difference between the groups.
H1: The groups are different.
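To show what the hand calculation involves, here is a sketch of a two-group log-rank test in Python, applied to the example dataset (1 = event, 0 = no event). The function name and structure are assumptions for illustration. At each event time it compares the observed events in Group A with the number expected if the groups did not differ, then refers the result to a chi-square distribution with 1 degree of freedom.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square (1 df) and its p-value."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)     # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)       # events, both groups
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # upper tail, chi-square with 1 df
    return chi_square, p_value


times_a = [9, 12, 14, 14, 16, 18, 24, 30]
events_a = [1, 0, 1, 1, 0, 1, 1, 0]
times_b = [3, 7, 9, 11, 12, 15, 19, 21]
events_b = [1, 1, 0, 1, 1, 1, 1, 1]
chi_square, p_value = logrank_two_groups(times_a, events_a, times_b, events_b)
print(round(chi_square, 3), round(p_value, 3))
```

A small p-value would be evidence against H0; a large one means the data are compatible with no difference between the groups.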
49
SPSS – Log-rank test
Having removed the filter, but leaving the other options the
same as the previous KM setup you only need to add a Factor
variable and then select another option for the Log rank test.
* Comparative KM plot with log-rank test .
KM
Time BY Group /STATUS=Status(1)
/PRINT TABLE MEAN
/PLOT SURVIVAL
/TEST LOGRANK
/COMPARE OVERALL POOLED .
50
Info: K-M and log rank tests in SPSS
1) Follow the information sheet on producing a Kaplan-Meier
curve, but stop after point 5.
2) The log rank test will compare the levels of the categorical
variable that is put into the ‘Factor:’ box. As such it is
unavailable when no such variable has been specified.
3) Once a variable is in the ‘Factor:’ box, click on the ‘Compare
Factor…’ button. Tick the option for the ‘Log Rank’ test in the
‘Test Statistics’ box. Click ‘Continue’.
4) Finally click ‘OK’ to produce the test results or ‘Paste’ to add
the syntax for this into your syntax file.
51
SPSS – Log-rank test: Output
Overall Comparisons

                        Chi-Square   df   Sig.
Log Rank (Mantel-Cox)   2.429        1    .119

Test of equality of survival distributions for the different levels of Group.

The log rank test here shows no significant difference between
the groups (p=0.119).
The KM plot is now split into each of the levels of
the categorical variable (2 groups in this case).
52
SPSS – Kaplan-Meier: Presentation
[Figure: Kaplan-Meier survival curves for Group A and Group B. x-axis: Time (0 to 30); y-axis: Proportion: No event (0.00 to 1.00). Annotation: Log rank test, p = 0.1191.]
53
SPSS – Kaplan-Meier: Presentation
[Figure: Kaplan-Meier plot of time to event, split by Group (Group A and Group B). x-axis: Time (in days), 0 to 30; y-axis: Proportion: No event (0.00 to 1.00). Annotation: Log rank test, p = 0.1191.]
54
Practical Questions
Survival Analysis
Question 4
Practical Questions
The file BC_Survival.sav contains data on 1207
women who were diagnosed with breast cancer.
4) Produce a Kaplan-Meier curve for this data,
separating those women for whom the cancer had
infected the lymph nodes from those for whom it had
not. Conduct a log-rank test to see if the survival of
the two groups is significantly different. Edit the KM
plot so that it would be able to ‘stand alone’ in a
publication and comment on all of your results.
56
Practical Solutions: Instructions
4. To produce a Kaplan-Meier curve and the log-rank test you will
need syntax similar to the following (You will then need to
customise the plot itself with the graph editor afterwards):
* Producing the KM plot .
KM
time BY ln_yesno /STATUS=status(1)
/PRINT TABLE MEAN
/PLOT SURVIVAL
/TEST LOGRANK
/COMPARE OVERALL POOLED .
There is clearly a significant difference between the two
categories, with survival being better in the group without
lymph node involvement (p<0.001).
57
Practical Solutions: Output
58
Practical Solutions: Output
Means and Medians for Survival Time

Lymph      Mean (a)                                          Median
nodes?     Estimate   Std. error   95% CI lower   95% CI upper   Estimate   Std. error   95% CI lower   95% CI upper
No         124.920    1.400        122.177        127.664        .          .            .              .
Yes        111.331    3.008        105.436        117.226        .          .            .              .
Overall    122.692    1.307        120.131        125.253        .          .            .              .

a. Estimation is limited to the largest survival time if it is censored.
It can be seen that the mean survival times are:
• 124.92 (95% CI: 122.18 to 127.66) months for no
involvement,
• 111.33 (95% CI: 105.44 to 117.23) months for nodal
involvement.
There are no median survival estimates as at no point over the
duration do 50% of the subjects in either group experience an
event.
59
Summary
You should now:
• know when to apply survival methods;
• understand how to use the survival techniques in SPSS
and the differences between them;
• be able to interpret life tables;
• be able to interpret Kaplan-Meier curves;
60
References
• Practical Statistics for medical research, D Altman: Chapter 13.
• Medical Statistics, B Kirkwood, J Stern: Chapter 26.
• An introduction to medical statistics, M Bland: Chapter 15.6.
Survival analysis specific texts
• Kleinbaum D. G., Klein M., Survival Analysis: A Self-Learning
Text, Springer-Verlag Publishers, 2005.
• Parmar M. K. B., Machin D., Survival analysis: a practical
approach, Wiley, 1995.
61