Transcript Document

Department of Epidemiology and Public Health
Unit of Biostatistics and Computational Sciences
Regression models for binary and survival data
PD Dr. C. Schindler
Swiss Tropical and Public Health Institute
University of Basel
[email protected]
Annual meeting of the Swiss Societies of Clinical Neurophysiology
and of Neurology, Lugano, May 3rd 2012
Binary outcome data
Binary endpoints Y
Examples:
Y = „death within 1 year“
Y = „disease progression
within 2 years“
Y = „remission within
three months“
Y = 1, event occurred
= 0, otherwise
Preliminaries:
1. Meaning of E(Y) for a binary variable Y
E(Y) = Mean of Y at the population level
= P(Y = 0) · 0 + P(Y = 1) · 1
= P(Y = 1)
Thus,
mean of Y = probability of the event represented by Y.
2. Notion of odds
Odds (Y) =
P(Y = 1)
Example:
P(Y = 1) / P(Y = 0)
=
P(Y = 1) / [1 – P(Y = 1)]
=
Odds(Y) / [1 + Odds(Y)]
(*)
Y = „disease progression“
P(Y =1) = 0.3
Odds(Y) = 0.3 / [1 – 0.3] = 0.3 / 0.7 = 0.429
P(Y=1) = 0.429 / [1 + 0.429] = 0.3
3. Notion of odds ratio (OR)
X = „high risk“ (0 -> normal risk, 1 -> increased risk)
P(Y = 1 | X = 0) = P(Y = 1) in subjects with X = 0
= 0.2
P(Y = 1 | X = 1) = P(Y = 1) in subjects with X = 1
= 0.4
Odds(Y | X = 0) = 0.2 / 0.8 = 0.25
Odds(Y | X = 1) = 0.4 / 0.6 = 0.667
OR(Y | X) = OR(Y | X = 1)/OR(Y | X = 0) = 0.667 / 0.25 = 2.67
with outcome
(Y = 1)
w/o outcome
(Y = 0)
with risk factor
(X = 1)
40
60
100
w/o risk factor
(X = 0)
20
80
100
70
130
200
Symmetry of OR:
Prospective (cohort study):
OR of Y between X=1 and X=0 = 40/60 : 20/80 = 2.67
Retrospective (case control study)
OR of X between Y=1 and Y=0 = 40/20 : 60/80 = 2.67
4. Calculus of odds and odds ratios
Example:
risk of disease progression without risk factors A and B = 20%.
OR of disease progression associated with risk factor A = 2.0.
OR of disease progression associated with risk factor B = 3.0.
Odds without risk factors = 0.2 / (1 – 0.2) = 0.25.
a) Odds with risk factor A only = 0.25  2.0 = 0.5
b) Odds with risk factor B only = 0.25  3.0 = 0.75
c) Odds with both risk factors = 0.25  2.0  3.0 = 1.5
(if factors do not interact)
Corresponding risks:
a) 0.5/1.5 = 0.33, b) 0.75/1.75 = 0.43, c) 1.5/2.5 = 0.6
Regresssion models for probabilities / odds
Idea: As in classical regression consider
E(Y) = b0 + b1 · risk score
Equivalent formulation of the model:
P(Y = 1) = b0 + b1 · risk score
Problem:
P(Y = 1)
in (0, 1)
=
b0 + b1· risk score
can take values outside (0,1)
Solution:
P(Y = 1)
= F ( b0 + b1 · risk score )
z = linear predictor
(linear prediction score)
where F(z) is a function whose values are always in (0, 1)
Logistic function (standard choice)
1.0
F(z)
0.8
0.6
F(z) = ez / (1 + ez)
0.4
0.2
0.0
-4
-2
a) ez < 1 + ez => F(z) < 1
0
z
2
b) ez > 0 => F(z) > 0 .
4
Y = Outcome
(1 = present, 0 = absent)
X = risk factor (1 = present, 0 = absent)
Linear predictor (logit):
P (Y  1 | X )  e
z
=
z
1 e

z
b0 + b1 · x
e
b 0  b1 x
1 e
b 0  b1  x
Recalling that P(Y = 1|X) = Odds(Y|X) / [1 + Odds(Y|X)]
shows that
Odds (Y | X )

e
z

e
b 0  b1  x
b 0  b 1 1
x = 1:
Odds (Y )

e
x = 0:
Odds (Y )

e
b 0  b 1 0


e
b 0  b1
e
b0
Odds Ratio of Y between x=1 and x=0:
e
= > b0 = ln (Odds(Y|X = 0))
b 0  b1
/e
b0
e
b1 = ln (OR(Y|X))
b1
Logistic regression
P (Y  1)  e
z
1 e
z

e
b 0  b1  x
1 e
b 0  b1  x
Probit regression
P (Y  1)

 z 
  b 0  b 1  x 
F = cumulative density function of standard normal distribution
(another sigmoid shaped function ranging from 0 to 1)
Logistic regression
Number of obs
LR chi2(1)
Prob > chi2
Pseudo R2
Log likelihood = -117.34141
=
=
=
=
200
9.66
0.0019
0.0395
-----------------------------------------------------------------------------outcome |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------risk_factor |
.9808289
.3227486
3.04
0.002
.3482533
1.613404
_cons | -1.386294
.25
-5.55
0.000
-1.876285
-.896303
------------------------------------------------------------------------------
ln (Odds Ratio) = 0.9808
Odds Ratio
ln (Odds(Y|X = 0) = -1.3863 Odds(Y | X = 0)
=
e 0.9808 = 2.67
= e-1.3863 = 0.25
P(Y = 1 | X = 0)
=
0.25 / (1 + 0.25)
=
0.2
P(Y = 1 | X = 1)
=
0.25  2.67 / (1 + 0.25  2.67) =
0.4
Summary
exp(coefficient of risk factor) = OR of outcome between those
with and those without
risk factor (cohort study)
= OR of risk factor between those
with and those without
outcome (case-control study)
exp(intercept term) = odds of outcome among unexposed
subjects
Direct output of Odds ratio and odds of unexposed:
Logistic regression
Number of obs
LR chi2(1)
Prob > chi2
Pseudo R2
Log likelihood = -117.34141
=
=
=
=
200
9.66
0.0019
0.0395
-----------------------------------------------------------------------------outcome | Odds Ratio
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------risk_factor |
2.666666
.8606626
3.04
0.002
1.416591
5.019872
_cons |
.2500001
.0625
-5.55
0.000
.153158
.4080755
------------------------------------------------------------------------------
95%-confidence interval of odds ratio:
0
1
2
3
( 1.42 , 5.02)
4
5
Note: 1. Confidence intervals of odds ratios are asymmetrical!
2. If they do not include 1, then the respective association is statistically
significant at the 5% level.
Example from the literature
Ritchie K. et al., The neuroprotective effects of caffeine –
a prospective population study (The Three City Study),
Neurology 2007; 69: 536-45
outcome =
cognitive decline,
measured as either
a) decline by at least 6 units in Isaacs set test
or
b) decline by at least 2 units in Benton visual
retention test
over four years
1)
3)
2)
1. On average, odds of CD increased by 6% per additional year of age at baseline.
2. On average, odds of CD increased by 8% per additional unit in baseline cognitive test.
3. Compared to subjects with 5 years of education, the odds of CD among subjects with
≥12 years of education was reduced by more than 40%.
Main result of paper:
Significantly reduced risk of DIsaacs  -6 among women who drank more than
3 units of caffeine per day at baseline compared to women who only drank 0 – 1 units
CAVE:
NOTE:
Odds ratio (OR)
Odds
Odds
>


relative risk (RR)
risk (relative frequency)
risk (relative frequency)
possible situations for OR and RR:
a) OR < RR < 1
b) OR = RR = 1
c) 1 < RR < OR
Interpretation of OR as relative risk and of odds as risk (relative
frequency) is only appropriate if risks are small (i.e., < 10%)
Model comparison
Likelihood ratio test
Akaike information criterion (AIC)
Bayesian information criterion (BIC)
Pseudo-R2
Likelihood L of a model
Probability of observing exactly the same outcome data again if
exactly the same predictor data are given, provided that the
model describes reality in the best possible way with the given
variables.
log-Likelihood ln(L)
natural logarithm of the likelihood
ln(L) is always  0 , since probabilities are  1.
The perfect model would have L = 1 and ln(L) = 0.
The better the model, the closer ln(L) is to 0.
Comparison of nested models
A model M1 is said to be nested in another model M2 ,
if M1 is a special case of M2, e.g.,
if all the terms of M1 are also in M2 but not vice versa.
Under the hypothesis, that the additional terms of M2 are of no
predictive value, the difference
likelihood ratio
D = 2 [ln(L2) - ln(L1)]
=
2  ln(L2/L1)
has an approximate Chi2-distribution with df2 - df1 degrees of
freedom (where dfi = number of parameters of model i).
CAVE: Both models must be based on exactly the same data.
In particular, their n‘s must be identical.
Akaike information criterion
(„smaller is better“)
AIC =
- 2 ln(L) + 2  p
penalty for complexity of
the model
(p = number of parameters of the model in
addition to the intercept parameter)
The two models compared must be based on exactly
the same data (same n!), but they need not be nested
and can contain different variables.
Bayesian information criterion (Schwarz criterion)
(„smaller is better“)
BIC =
penalty for complexity of
the model
- 2 ln(L) + p  ln(n)
(p = number of parameters of the model in
addition to the intercept parameter,
n = sample size)
The two models compared need not be based on
the same data nor do they have to be nested.
Pseudo-R2
There exists an analog of R2 with logistic and other
generalized linear regression models.
„total variance“
ln(Lnull model)
ln(Lmodel)
0
„variance explained“
Pseudo-R2 = [ ln(Lmodel) - ln(Lnull model) ] / [ 0 - ln(Lnull model ) ]
Null model = model with an intercept term only.
Goodness of fit of a logistic regression model
Can be assessed using the Hosmer-Lemeshow-Test.
Mechanics of the test:
1. For each subject, the logistic regression model predicts its
individual probability of having Y = 1.
2. Subjects are then categorized into a certain number of classes
based on the size of their predicted probabilities.
3. In each of the classes, the proportion of subjects with Y = 1
is determined and compared with the mean value of the
predicted probabilities (ideally the two values should coincide
in each class).
Analysis of survival data
Censored and uncensored survival times
t0
Observation period (= individual time scale!)
x
t1
true event
-> uncensored survival time
o loss to follow-up
-> censored survival time
Event-free survival until t1
-> censored survival time
In survival analyses, two outcome variables are needed:
1. Event variable
1 = event was observed
-> uncensored observation
0 = event was not observed
-> censored observation
2. Variable for
event-free time observed
= time until event (uncensored observation)
observation time (censored observation)
Simple group comparisons of survival data
1. Construction of survival curves using the method
of Kaplan-Meier
2. Comparison of survival curves using the log rank
or the Wilcoxon test.
0.00
0.25
0.50
0.75
1.00
Kaplan-Meier survival estimates
0
20
group = 1
40
60
analysis time (weeks)
group = 2
80
100
group = 3
S(t) = Proportion of patients without event until time t
S(t+Dt)  S(t) – S(t)  h(t)  Dt
S(t+Dt) – S(t)  -S(t)  h(t)  Dt
S(t+Dt) – S(t)
 -h(t)
Dt  S(t)
S(t+Dt) – S(t)
S(t)  -h(t)
Dt
•
S(t) S(t)
= -h(t)
d/dt [ln(S(t))] = -h(t)
h(t)
=
- d/dt [ln(S(t))]
h(t) = instantaneous event risk
(hazard at time t)
S(0)
S(t)
S(t+Dt)
t t+Dt
Assumption of proportional hazards (PH)
h1(t) = hazard function in group 1
h2(t) = hazard function in group 2
h2(t) / h1(t)
=
HR
(hazard ratio)
h2(t)
=
HR  h1(t)
PH: ratio of hazards is independent of time t
1.00
Kaplan-Meier survival curves
0
0.00
0.25
.005
.01
0.50
.015
0.75
.02
Smoothed hazard estimates
0
20
40
60
80
analysis time (weeks)
group = 1
group = 2
group = 3
100
0
20
40
60
80
analysis time (weeks)
group = 1
group = 2
group = 3
Hazard functions of group 1 and 2 are proportional.
Hazard function of group 3 violates PH assumption
100
Logarithmized hazard estimates
0
-6.5
-6
.005
-5.5
.01
-5
.015
-4.5
-4
.02
Smoothed hazard estimates
0
20
40
60
80
analysis time (weeks)
group = 1
group = 2
group = 3
100
0
20
40
60
80
analysis time (weeks)
100
group = 1
group = 2
group = 3
Logarithmized hazards of group 1 and 2
run parrallel (PH-assumption).
Modelling of the hazard ratio
Sir David Roxbee Cox
* 1924
1950-1956 Cambridge University
1956-1966 Birkbeck College, London
1966-1988 Imperial College, London
1988-1994 Oxford University
HR
=
e
bx
x=1
with risk factor
x=0
without risk factor
x dichotomous
HR
= e
b1
= e
b
=
hazard ratio associated with
risk factor
HR
=
e
x continuous,
HR
= e
b1
bx
e.g. x = age at baseline
= e
b
=
hazard ratio associated with
a unit increase in x
(cross-sectional comparison)
Multiple proportional hazard regression model
(„Cox-regression model“)
HR
=
e
b1x1
+ b2x2 + ...... + bkxk
Reference category: subjects with x1 = x1 = ... = xk = 0
In our example:
No. of subjects =
No. of failures =
Time at risk
=
Log likelihood
=
1500
671
97331
-4697.2077
Number of obs
=
1500
LR chi2(2)
Prob > chi2
=
=
39.93
0.0000
-----------------------------------------------------------------------------_t | Haz. Ratio
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------group_2 |
1.393155
.1395814
3.31
0.001
1.144766
1.695439
group_3 |
1.832595
.1779825
6.24
0.000
1.514947
2.216846
------------------------------------------------------------------------------
On average, the hazards in group 3 and 2 were higher by
83% (95%-CI: 51 to 121%) and 39% (95%-CI: 15 to 70%),
respectively,
than in group 1 (= reference group).
Analysis restricted to first year:
No. of subjects =
No. of failures =
Time at risk
=
1500
579
55997
Number of obs
=
1500
LR chi2(2)
=
54.84
Log likelihood =
-4073.1439
Prob > chi2
=
0.0000
-----------------------------------------------------------------------------_t | Haz. Ratio
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------group_2 |
1.387788
.1550301
2.93
0.003
1.114898
1.727472
group_3 |
2.122101
.222757
7.17
0.000
1.727489
2.606854
------------------------------------------------------------------------------
Analysis restricted to survivors of first year:
No. of subjects =
No. of failures =
Time at risk
=
872
92
41334
Number of obs
=
872
LR chi2(2)
=
11.92
Log likelihood =
-610.6509
Prob > chi2
=
0.0026
-----------------------------------------------------------------------------_t | Haz. Ratio
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------group_2 |
1.442213
.3266646
1.62
0.106
.9251881
2.248167
group_3 |
.5261936
.1709134
-1.98
0.048
.2783979
.9945465
------------------------------------------------------------------------------
Conclusion:
1. Hazard ratio between group 2 and group 1 was very similar
in both years, i.e., around 1.4.
(confirming the proportionality of the two hazard functions).
2. Hazard ratio between group 3 and group 1 was higher than 2
in the first year but smaller than 1 in the second year
(confirming the sharp decrease of the hazard function of group 3,
falling below the one of group 1 after 1 year).
Example from the literature
Koch M. et al., The natural history of secondary progressive
multiple sclerosis, J. Neurol Neurosurg Psychiatry 2001;
81:1039-43
Some study characteristics:
- 5207 patients from British Columbia with a remitting disease
at baseline.
- Onset of immunomodulatory treatment was considered as
censoring event.
Kaplan-Meier curves:
PH-assumption was probably not satisfied by the factor gender!
1. On average, the covariate-adjusted hazard of secondary progression was higher
in men than in women by 43% and the median of time to secondary progression
was lower by 25% (i.e., 17.1 vs. 22.7 years).
2. On average, the covariate-adjusted hazard of secondary progression increased
by 5% with each additional year of age at baseline.
CAVE:
1. In general, the ratio of median or mean survival times between
two groups is not inversely proportional to the hazard ratio.
But in the special case of constant hazards, this relation holds
for the mean survival times.
2. The hazard ratio may be interpreted as relative risk only
if the event rates are small in the respective groups during the
time period of interest.
Thank you for your attention
again!