Measures of disease frequency (I)
MEASURES OF DISEASE FREQUENCY
• Absolute measures of disease frequency:
– Incidence – Prevalence – Odds
• Measures of association:
– Ratios (Relative risk-type measures) – Differences (Attributable risk-type measures)
Two types of disease frequency measures
• Incidence and Odds of Incidence – New disease – Deaths in the population (mortality) – Deaths in patients (case-fatality) – Recurrences – etc.
• Prevalence and Odds of Prevalence
Analyses of cohort (prospective) data:
• Calculation of incidence rates • Comparison of incidence rates across (exposure) groups
Unexposed Exposed
What is "incidence"?
Two major ways to define incidence
• Cumulative incidence (cumulative probability, hazards): SURVIVAL ANALYSIS
• Rate: ANALYSIS BASED ON PERSON-TIME
Calculation of incidence Strategy #1
SURVIVAL ANALYSIS
• Variable of interest:
TIME to occurrence of an EVENT (death,disease, relapse)
• Primary objectives:
1) TO ESTIMATE CUMULATIVE INCIDENCE (q) or SURVIVAL FUNCTION (1-q)
• Methods:
– LIFE TABLE (Actuarial) – KAPLAN-MEIER
[Figure: survival curve starting at 1.0, plotted against time]
2) TO COMPARE SURVIVAL IN DIFFERENT GROUPS
• Methods:
– PROPORTIONAL HAZARDS (COX) REGRESSION – LOGRANK TEST
[Figure: survival curves starting at 1.0 for unexposed and exposed groups, plotted against time]
Examples: • Clinical trial (experimental study):
Group of infants with acute diarrhea, randomized to 3 treatment groups (NEJM 1993;328:1653):
• Bismuth (100 mg/kg) • Bismuth (150 mg/kg) • Placebo
Which treatment results in earlier remission of diarrhea?
Examples: • Cohort (observational) study:
Group of Johns Hopkins University medical students, classes 1948-64 (Precursors Study):
• Positive family history of hypertension • Negative family history of hypertension
– Which group has a higher cumulative incidence of hypertension?
– Is there evidence that the hypertension diagnoses occur earlier in one of the groups?
• OBJECTIVE OF SURVIVAL ANALYSIS:
To compare the “cumulative incidence” of an event (or the cumulative probability of surviving event-free) in exposed and unexposed (characteristic present or absent)
• BASIS FOR THE ANALYSIS
– NUMBER of EVENTS
– TIME of occurrence
Need to precisely define:
• “EVENT” (failure): – Death – Disease (diagnosis, start of symptoms, relapse) – Remission of diarrhea – Quit smoking – Menopause
• “TIME”: – Time from recruitment into the study – Time from employment – Time from diagnosis (prognostic studies) – Time from infection – Calendar time – Age
Why is survival analysis “tricky”?
• Different follow-up for the study participants:
– Because of staggered (late) entries
– Because of losses to follow-up
– Example:
• Follow-up of 6 patients (2 yrs)
– 3 deaths
– 2 censored before 2 years
– 1 survived 2 years
Question: What is the Cumulative Incidence (or the Cumulative Survival) up to 2 years?
[Figure: follow-up of persons 1 to 6 on a calendar-time axis (Jan 1999, Jan 2000, Jan 2001). Legend: death; censored observation (lost to follow-up, withdrawal); number of months of follow-up in parentheses: (3), (6), (12), (15), (18), (24).]
Crude Survival: 3/6 = 50%
Change time scale from calendar time to follow-up time:
[Figure: persons 1 to 6 plotted against follow-up time (0 to 2 years), with months of follow-up (3), (6), (12), (15), (18), (24).]
What is the 2-year cumulative survival?
• Assume both censored individuals survived up to 2 years:
$$S(2\,\text{yrs}) = \frac{3}{6} = 0.50$$
[Figure: persons 1 to 6 by follow-up time (years), 0 to 2.]
• Assume both censored individuals died before 2 years:
$$S(2\,\text{yrs}) = \frac{1}{6} = 0.17$$
[Figure: persons 1 to 6 by follow-up time (years), 0 to 2.]
“True” survival is probably somewhere in between these extreme estimates …, but where?
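The two extreme estimates can be reproduced with a few lines of Python. This is only a sketch: the `patients` dictionary and its status labels are illustrative names, and the assignment of follow-up months to person IDs is taken from the sorted Kaplan-Meier slide later in this lecture.

```python
# Follow-up of the six patients: person ID -> (months of follow-up, status).
# status: "death", "censored" (lost before 24 months) or "alive" (completed 24 months).
patients = {
    1: (24, "alive"),
    2: (6, "censored"),
    3: (18, "death"),
    4: (15, "censored"),
    5: (12, "death"),
    6: (3, "death"),
}

n_baseline = len(patients)
n_alive = sum(1 for _, status in patients.values() if status == "alive")
n_censored = sum(1 for _, status in patients.values() if status == "censored")

# Best case: every censored person actually survived to 2 years.
s_upper = (n_alive + n_censored) / n_baseline
# Worst case: every censored person actually died before 2 years.
s_lower = n_alive / n_baseline

print(f"2-year survival lies between {s_lower:.2f} and {s_upper:.2f}")
```

Neither bound uses the information on how long the censored persons were actually followed, which is what the life-table and Kaplan-Meier methods add.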
• Calculating CUMULATIVE INCIDENCE (up to time “t”)
$$q_t = \frac{\text{Number of individuals with the event by } t}{\text{Number at risk at baseline}}$$
• Calculating CUMULATIVE SURVIVAL
$$S(t) = 1 - q_t = \frac{\text{Number of individuals alive beyond } t}{\text{Number at risk at baseline}}$$
Problem: requires accounting for censoring (losses to follow-up)
One solution:
• Actuarial life table
Assume that each observation censored during the period contributes one-half of a person at risk to the denominator.
[Figure: persons 1 to 6 by follow-up time (0 to 2 years), with months of follow-up (3), (6), (12), (15), (18), (24).]
$$q(2\,\text{yrs}) = \frac{3}{6 - \frac{1}{2}(2)} = \frac{3}{5} = 0.60$$
$$S(2\,\text{yrs}) = 1 - q(2\,\text{yrs}) = 0.40$$
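As a quick check of the actuarial correction, the calculation for the six patients can be sketched in Python. The function name `actuarial_q` and its parameter names are illustrative and not part of the lecture.

```python
def actuarial_q(n_at_start, events, censored):
    """Probability of the event over the period, counting each
    censored person as half a person at risk."""
    corrected_n = n_at_start - censored / 2
    return events / corrected_n

# Six patients followed for 2 years: 3 deaths, 2 censored.
q_2yrs = actuarial_q(n_at_start=6, events=3, censored=2)
s_2yrs = 1 - q_2yrs

print(f"q(2 yrs) = {q_2yrs:.2f}, S(2 yrs) = {s_2yrs:.2f}")
```

The result (0.40) falls between the two extreme estimates of 0.17 and 0.50, as expected.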
Further problem: If the follow-up is long, the risks cannot be assumed to be constant, and thus, the follow up time needs to be partitioned.
Methods:
– LIFE TABLE – KAPLAN-MEIER
LIFE TABLE (Actuarial method)
Non-parametric method for grouped data
Example: Precursors Study, incidence of CHD by follow-up time

i   Time (yr)   N_i    d_i   c_i
1   0-9         1170   2     23
2   10-19       1145   21    48
3   20-24       1076   24    72
4   25-29       980    24    269
5   30-34       687    23    268
6   35+         396    23    373

Data:
N_i: Number alive (disease-free) at the beginning of each interval
d_i: Number of cases during each interval
c_i: Number of losses during each interval
LIFE TABLE (Actuarial method), cont’d

i   Time (yr)   N_i    d_i   c_i   N_i*     q_i       p_i       S(t_i)†
1   0-9         1170   2     23    1158.5   0.00173   0.99827   0.99827
2   10-19       1145   21    48    1121.0   0.0187    0.9813    0.9796
3   20-24       1076   24    72    1040.0   0.0231    0.9769    0.9570
4   25-29       980    24    269   845.5    0.0284    0.9716    0.9298
5   30-34       687    23    268   553.0    0.0416    0.9584    0.8911
6   35+         396    23    373   209.5    0.1098    0.8902    0.7933
†Cumulative survival by the end of each interval

Calculations:
N_i*: “Corrected” number at risk in the interval: N_i* = N_i - c_i / 2
q_i: Probability of the event in each interval: q_i = d_i / N_i*
p_i: Probability of survival in each interval: p_i = 1 - q_i
S(t_i): Cumulative probability of survival
Note: p_i and q_i are “conditional” probabilities (of survival and of the event, respectively).
That is, in order to have the event or to survive throughout a given interval, one has to have survived through all the previous ones (the probability is conditioned on that survival).
S(t_i): Cumulative probability of survival at (or “up to”) time t_i:
$$S(t_i) = \prod_{j:\, t_j \le t_i} p_j$$
Example in English: the cumulative survival up to the beginning of year 25 (end of interval 20-24) is the product of the conditional survival probabilities through all previous intervals up to that date:
$$S(25) = p_1 \times p_2 \times p_3 = 0.99827 \times 0.9813 \times 0.9769 = 0.9570$$
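The full life-table calculation can be sketched in Python using the Precursors Study counts given above. The function name `life_table` and the dictionary keys are illustrative choices, not part of the lecture.

```python
# Precursors Study life-table inputs as given in the slides.
intervals = ["0-9", "10-19", "20-24", "25-29", "30-34", "35+"]
n_start = [1170, 1145, 1076, 980, 687, 396]   # N_i
events = [2, 21, 24, 24, 23, 23]              # d_i
losses = [23, 48, 72, 269, 268, 373]          # c_i

def life_table(n_start, events, losses):
    """Actuarial life table: one row per interval."""
    rows = []
    cumulative = 1.0
    for n, d, c in zip(n_start, events, losses):
        n_star = n - c / 2          # "corrected" number at risk
        q = d / n_star              # conditional probability of the event
        p = 1 - q                   # conditional probability of survival
        cumulative *= p             # cumulative survival to end of interval
        rows.append({"n_star": n_star, "q": q, "p": p, "S": cumulative})
    return rows

table = life_table(n_start, events, losses)
for label, row in zip(intervals, table):
    print(f"{label:>6}  N*={row['n_star']:7.1f}  q={row['q']:.4f}  "
          f"p={row['p']:.4f}  S={row['S']:.4f}")
```

The third row gives the cumulative survival at year 25 from the worked example, and the last row the cumulative survival for the final interval.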
Plotting the survival function:
[Figure: cumulative survival (vertical axis, 0.70 to 1.00) by years of follow-up (horizontal axis, 0 to 50).]

Time (yr)   S(t_i)†
0-9         0.9983
10-19       0.9796
20-24       0.9570
25-29       0.9298
30-34       0.8911
35+         0.7933
VARIANCE FOR CUMULATIVE SURVIVAL ESTIMATE
Method described by Greenwood in 1926*
$$\mathrm{Var}[\hat{S}(t_i)] = [\hat{S}(t_i)]^2 \sum_{j:\, t_j \le t_i} \frac{d_j}{N_j^* \,(N_j^* - d_j)}$$
And the 95% confidence interval can then be obtained:
$$\hat{S}(t_i) \pm 1.96 \times \sqrt{\mathrm{Var}[\hat{S}(t_i)]}$$
*Greenwood M. A report on the natural duration of cancer. Rep Pub Health Med Subjects 1926;33:1-26.
Example:
$$\mathrm{Var}[\hat{S}(25)] = [0.957]^2 \left[ \frac{2}{1158.5\,(1158.5 - 2)} + \frac{21}{1121\,(1121 - 21)} + \frac{24}{1040\,(1040 - 24)} \right] = 0.0000378$$
95% CI:
$$0.957 \pm 1.96 \times \sqrt{0.0000378} = 0.957 \pm 1.96 \times 0.00614 = [0.945,\ 0.969]$$
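The Greenwood calculation for S(25) can be checked with a short Python sketch. It uses the first three intervals of the Precursors Study life table. The function name `greenwood_ci` and the default `z=1.96` argument are illustrative.

```python
import math

# First three intervals of the Precursors Study life table (0-9, 10-19, 20-24 years).
n_start = [1170, 1145, 1076]   # N_i
events = [2, 21, 24]           # d_i
losses = [23, 48, 72]          # c_i

def greenwood_ci(n_start, events, losses, z=1.96):
    """Cumulative survival, Greenwood variance and normal-theory CI."""
    survival = 1.0
    greenwood_sum = 0.0
    for n, d, c in zip(n_start, events, losses):
        n_star = n - c / 2                         # corrected number at risk
        survival *= 1 - d / n_star
        greenwood_sum += d / (n_star * (n_star - d))
    variance = survival ** 2 * greenwood_sum
    se = math.sqrt(variance)
    return survival, variance, (survival - z * se, survival + z * se)

s25, var25, (lo25, hi25) = greenwood_ci(n_start, events, losses)
print(f"S(25) = {s25:.3f}, Var = {var25:.7f}, 95% CI = [{lo25:.3f}, {hi25:.3f}]")
```

Small differences in the last decimal place relative to the slide can arise because the slide rounds S(25) to 0.957 before squaring.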
KAPLAN-MEIER METHOD
E.L. Kaplan and P. Meier, 1958*
Calculate the cumulative probability of event (and survival) based on conditional probabilities at each event time
[Figure: persons 1 to 6 by follow-up time (0 to 2 years), with months of follow-up (3), (6), (12), (15), (18), (24).]
*Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:457-81.
Step 1:
Sort the survival times from shortest to longest

Person ID                  6     2     5     4     3     1
Follow-up time (months)    (3)   (6)   (12)  (15)  (18)  (24)
Step 2:
For each time of occurrence of an event, compute the conditional survival
When the first event occurs (3 months after beginning of follow-up), there are 6 persons at risk. One of them dies at that point; 5 of the 6 survive beyond that point. Thus:
• Instantaneous incidence of event at time 3 months: 1/6
• Probability of survival beyond 3 months: 5/6
When the second event occurs (12 months), there are 4 persons at risk (person 2 was censored at 6 months). One of them dies at that point; 3 of the 4 survive beyond that point. Thus:
• Incidence of event at time 12 months: 1/4
• Probability of survival beyond 12 months: 3/4
When the third event occurs (18 months), there are 2 persons at risk (person 4 was censored at 15 months). One of them dies at that point; 1 of the 2 survives beyond that point. Thus:
• Incidence of event at time 18 months: 1/2
• Probability of survival beyond 18 months: 1/2
CONDITIONAL PROBABILITY OF AN EVENT (or of survival)
The probability of an event (or of survival) at time t for the individuals at risk at time t, that is, conditioned on being at risk at time t.
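Step 3:
For each time of occurrence of an event, compute the cumulative survival (survival function), that is, the probability of surviving beyond that time, by multiplying the conditional probabilities of survival.
3 months: S(3) = 5/6 = 0.833
12 months: S(12) = 5/6 × 3/4 = 0.625
18 months: S(18) = 5/6 × 3/4 × 1/2 = 0.3125
In Greek (product over all death times t_j up to t_i, with d_j deaths among the n_j persons at risk at t_j):
$$S(t_i) = \prod_{j:\, t_j \le t_i} \left( 1 - \frac{d_j}{n_j} \right)$$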
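The three steps can be sketched in Python for the six patients. The list `observations` and the function name `kaplan_meier` are illustrative. A person is counted as at risk at time t if their follow-up time is at least t.

```python
# (months of follow-up, died): died=False means censored or still alive
# at the end of the 24 months of follow-up.
observations = [(24, False), (6, False), (18, True), (15, False), (12, True), (3, True)]

def kaplan_meier(observations):
    """Return [(event_time, at_risk, deaths, cumulative_survival), ...]."""
    # Step 1: distinct event times, sorted from shortest to longest.
    event_times = sorted({time for time, died in observations if died})
    survival = 1.0
    steps = []
    for t in event_times:
        # Step 2: conditional survival among those still at risk at t.
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, died in observations if died and time == t)
        # Step 3: multiply the conditional survival probabilities.
        survival *= 1 - deaths / at_risk
        steps.append((t, at_risk, deaths, survival))
    return steps

km_steps = kaplan_meier(observations)
for t, at_risk, deaths, survival in km_steps:
    print(f"t={t:>2} mo  at risk={at_risk}  deaths={deaths}  S(t)={survival:.4f}")
```

Censored persons never trigger a step in the curve; they only reduce the number at risk at later event times.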
Plotting the survival function when using the Kaplan-Meier approach:
[Figure: step plot of cumulative survival (vertical axis, 0.20 to 1.00) by month of follow-up (horizontal axis, 0 to 25).]

Time (mo)   S(t_i)
3           0.833
12          0.625
18          0.3125
The cumulative incidence (up to 24 months): 1 - 0.3125 = 0.6875 (or 69%)
Greenwood’s formula for variance calculation also works for the KM estimate … and thus, confidence limits for the cumulative survival estimates can be calculated and plotted. E.g.:
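As an illustration, Greenwood's variance can be applied to the Kaplan-Meier estimates for the six patients, with the number at risk n_j taking the place of N_j*. This is a sketch under two assumptions not stated in the lecture: the confidence limits are simple normal-theory limits, and they are truncated to the range 0 to 1. The function name `km_with_greenwood` is illustrative.

```python
import math

# (months of follow-up, died): died=False means censored or still alive
# at the end of the 24 months of follow-up.
observations = [(24, False), (6, False), (18, True), (15, False), (12, True), (3, True)]

def km_with_greenwood(observations, z=1.96):
    """Kaplan-Meier survival with Greenwood variance and truncated 95% limits."""
    results = []
    survival = 1.0
    greenwood_sum = 0.0
    for t in sorted({time for time, died in observations if died}):
        n = sum(1 for time, _ in observations if time >= t)              # at risk at t
        d = sum(1 for time, died in observations if died and time == t)  # deaths at t
        survival *= 1 - d / n
        # This term is undefined if everyone at risk dies at t (n == d).
        greenwood_sum += d / (n * (n - d))
        variance = survival ** 2 * greenwood_sum
        se = math.sqrt(variance)
        results.append({
            "time": t,
            "S": survival,
            "var": variance,
            "lower": max(0.0, survival - z * se),   # truncated at 0
            "upper": min(1.0, survival + z * se),   # truncated at 1
        })
    return results

km_ci = km_with_greenwood(observations)
for row in km_ci:
    print(f"t={row['time']:>2} mo  S={row['S']:.4f}  "
          f"95% CI=[{row['lower']:.3f}, {row['upper']:.3f}]")
```

With only six persons the limits are very wide, which is why the untruncated limits fall outside the 0 to 1 range at some event times.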
Life table vs. Kaplan-Meier
• Generally (if N is large and/or if the life-table intervals are small), it won’t make much difference. E.g.: Survival after diagnosis of Ewing’s sarcoma*
*(Solid line, actuarial life table estimate; broken line, KM estimate)
ASSUMPTIONS IN SURVIVAL ESTIMATES
• (For the actuarial life table only) Risk is constant within each interval
• (If individuals are recruited over a long period of time) No secular trends
[Figure: calendar time versus follow-up time.]
ASSUMPTIONS IN SURVIVAL ESTIMATES (Cont’d)
• Censoring is independent of survival (uninformative censoring): Those censored at time t have the same prognosis as those remaining.
Types of censoring: • Lost to follow-up – Migration – Refusal • Death (from another cause) • Administrative withdrawal (study finished)
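• If censored observations tend to have worse prognosis than those remaining in the study:
[Figure: survival (from 1.0) by time; the curve observed in the study lies above the true survival curve.]
• If censored observations tend to have better prognosis than those remaining in the study:
[Figure: survival (from 1.0) by time; the true survival curve lies above the curve observed in the study.]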
Note:
This assumption is generic to any kind of analysis (absolute risk calculation) of prospective data.
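The effect of informative censoring can be illustrated with a small simulation. Everything in this sketch is hypothetical and not from the lecture: the two risk groups, their event rates, the dropout rate and the sample size are made-up values, chosen so that only high-risk persons are lost to follow-up (the "worse prognosis" scenario).

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

HIGH_RATE, LOW_RATE = 0.5, 0.1   # hypothetical event rates per year
DROPOUT_RATE = 0.5               # hypothetical dropout rate, high-risk group only
HORIZON = 2.0                    # years of follow-up

def simulate(n):
    """Return (observed_time, event_observed) pairs with informative censoring."""
    data = []
    for i in range(n):
        high_risk = i % 2 == 0   # half of the cohort is high risk
        event_time = random.expovariate(HIGH_RATE if high_risk else LOW_RATE)
        censor_time = random.expovariate(DROPOUT_RATE) if high_risk else math.inf
        data.append((min(event_time, censor_time), event_time <= censor_time))
    return data

def km_survival_at(data, horizon):
    """Kaplan-Meier survival at `horizon`, assuming no tied times."""
    survival = 1.0
    at_risk = len(data)
    for time, event_observed in sorted(data):
        if time > horizon:
            break
        if event_observed:
            survival *= 1 - 1 / at_risk
        at_risk -= 1
    return survival

observed = km_survival_at(simulate(20000), HORIZON)
true_survival = 0.5 * math.exp(-HIGH_RATE * HORIZON) + 0.5 * math.exp(-LOW_RATE * HORIZON)

print(f"True 2-year survival:     {true_survival:.3f}")
print(f"Observed (KM) in 'study': {observed:.3f}")
```

Because the persons who drop out are the ones more likely to have the event, the Kaplan-Meier estimate in this setup comes out higher than the true survival, matching the first scenario above.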