Logistic Regression II
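Simple 2x2 Table
(courtesy Hosmer and Lemeshow)
$$
\begin{array}{l|cc}
 & \text{Exposure}=1 & \text{Exposure}=0 \\ \hline
\text{Disease}=1 & P(D/E)=\dfrac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} & P(D/{\sim}E)=\dfrac{e^{\alpha}}{1+e^{\alpha}} \\[2ex]
\text{Disease}=0 & P({\sim}D/E)=\dfrac{1}{1+e^{\alpha+\beta_1}} & P({\sim}D/{\sim}E)=\dfrac{1}{1+e^{\alpha}}
\end{array}
$$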
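Odds Ratio for simple 2x2 Table
$$
OR = \frac{P(D/E)\,/\,P({\sim}D/E)}{P(D/{\sim}E)\,/\,P({\sim}D/{\sim}E)}
= \frac{\dfrac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} \Big/ \dfrac{1}{1+e^{\alpha+\beta_1}}}{\dfrac{e^{\alpha}}{1+e^{\alpha}} \Big/ \dfrac{1}{1+e^{\alpha}}}
= \frac{e^{\alpha+\beta_1}}{e^{\alpha}} = e^{\beta_1}
$$
(courtesy Hosmer and Lemeshow)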
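As a numerical check of the algebra above, the sketch below builds the four cell probabilities from the logit model and confirms that the cross-product ratio collapses to e^β₁ whatever the value of α. The helper names (`cell_probs`, `odds_ratio`) and the α, β₁ values are illustrative choices, not estimates from any data.

```python
import math

def cell_probs(alpha, beta1):
    """The four conditional probabilities of the 2x2 table under the logit model."""
    e1 = math.exp(alpha + beta1)   # exposed: e^(alpha + beta1)
    e0 = math.exp(alpha)           # unexposed: e^alpha
    return {
        "D|E": e1 / (1 + e1), "~D|E": 1 / (1 + e1),
        "D|~E": e0 / (1 + e0), "~D|~E": 1 / (1 + e0),
    }

def odds_ratio(p):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    return (p["D|E"] / p["~D|E"]) / (p["D|~E"] / p["~D|~E"])

# Arbitrary illustration values (hypothetical, not estimates):
# the OR should equal e^beta1 for every alpha.
beta1 = 0.7
for alpha in (-2.0, 0.0, 1.5):
    print(alpha, odds_ratio(cell_probs(alpha, beta1)), math.exp(beta1))
```

Varying α while holding β₁ fixed leaves the printed OR unchanged, which is the point of the derivation: the intercept cancels.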
Example 1: CHD and Age (2x2)
(from Hosmer and Lemeshow)

                 >=55 yrs    <55 years
CHD Present         21          22
CHD Absent           6          51
The Logit Model
$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha + \beta_1 X_1$$
$$X_1 = \begin{cases} 1 & \text{if exposed (older)} \\ 0 & \text{if unexposed (younger)} \end{cases}$$
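The Likelihood
$$
L(\alpha,\beta_1) = \left(\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right)^{21} \times \left(\frac{1}{1+e^{\alpha+\beta_1}}\right)^{6} \times \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{22} \times \left(\frac{1}{1+e^{\alpha}}\right)^{51}
$$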
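The Log Likelihood
recall:
$$\log\left(\frac{e^{\alpha}}{1+e^{\alpha}}\right) = \log e^{\alpha} - \log(1+e^{\alpha}) = \alpha - \log(1+e^{\alpha})$$
$$\log\left(\frac{1}{1+e^{\alpha}}\right) = \log 1 - \log(1+e^{\alpha}) = 0 - \log(1+e^{\alpha})$$
Applying these to each factor of L(α, β₁):
$$
\log L(\alpha,\beta_1) = 21(\alpha+\beta_1) - 21\log(1+e^{\alpha+\beta_1}) + 0 - 6\log(1+e^{\alpha+\beta_1}) + 22\alpha - 22\log(1+e^{\alpha}) + 0 - 51\log(1+e^{\alpha})
$$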
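Derivative(s) of the log likelihood
$$
\frac{\partial \log L(\alpha,\beta_1)}{\partial \beta_1} = 21 - \frac{21e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} - \frac{6e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}
$$
$$
\frac{\partial \log L(\alpha,\beta_1)}{\partial \alpha} = 21 + 22 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}}
$$
At the maximum both derivatives equal 0. Subtracting the first from the second cancels the β₁ terms and leaves an equation in α alone:
$$
22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}} = 0
$$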
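A quick way to double-check hand-derived partial derivatives is to compare them with central finite differences of log L(α, β₁). The sketch below hard-codes the Example 1 counts; the function names and the evaluation point (α = -0.5, β₁ = 1.0) are arbitrary illustrative choices.

```python
import math

# Example 1 counts: exposed (>=55) cases, exposed non-cases, unexposed (<55) cases, unexposed non-cases
A, B, C, D = 21, 6, 22, 51

def loglik(alpha, beta1):
    """log L(alpha, beta1) for the CHD-by-age 2x2 table."""
    return (A * (alpha + beta1) - (A + B) * math.log1p(math.exp(alpha + beta1))
            + C * alpha - (C + D) * math.log1p(math.exp(alpha)))

def score(alpha, beta1):
    """Analytical partial derivatives (d/d alpha, d/d beta1)."""
    p1 = math.exp(alpha + beta1) / (1 + math.exp(alpha + beta1))  # P(D / E)
    p0 = math.exp(alpha) / (1 + math.exp(alpha))                  # P(D / ~E)
    d_alpha = (A + C) - (A + B) * p1 - (C + D) * p0   # 43 - 27 p1 - 73 p0
    d_beta1 = A - (A + B) * p1                        # 21 - 27 p1
    return d_alpha, d_beta1

def numeric_score(alpha, beta1, h=1e-6):
    """Central finite-difference approximation of the same two derivatives."""
    d_alpha = (loglik(alpha + h, beta1) - loglik(alpha - h, beta1)) / (2 * h)
    d_beta1 = (loglik(alpha, beta1 + h) - loglik(alpha, beta1 - h)) / (2 * h)
    return d_alpha, d_beta1

print(score(-0.5, 1.0))
print(numeric_score(-0.5, 1.0))
```

If the two printed pairs agree to several decimal places, the analytical expressions are very likely right.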
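Maximize α:
$$22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}} = 22 - \frac{73e^{\alpha}}{1+e^{\alpha}} = 0$$
$$22(1+e^{\alpha}) = 73e^{\alpha} \;\Rightarrow\; 22 = 51e^{\alpha} \;\Rightarrow\; e^{\alpha} = \frac{22}{51}$$
= Odds of disease in the unexposed (<55)

Maximize β₁:
$$21 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} = 0 \;\Rightarrow\; 27e^{\alpha+\beta_1} = 21(1+e^{\alpha+\beta_1}) \;\Rightarrow\; 6e^{\alpha+\beta_1} = 21 \;\Rightarrow\; e^{\alpha+\beta_1} = \frac{21}{6}$$
$$e^{\beta_1} = \frac{e^{\alpha+\beta_1}}{e^{\alpha}} = \frac{21/6}{22/51} = \frac{21 \times 51}{6 \times 22} = OR$$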
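The closed-form answer can also be confirmed numerically. The sketch below uses a plain Newton-Raphson iteration, one standard way to maximize a logistic log likelihood (not necessarily identical to what PROC LOGISTIC does internally), starting from α = β₁ = 0. The function name `newton_mle` and the tolerance settings are illustrative assumptions.

```python
import math

A, B, C, D = 21, 6, 22, 51  # exposed cases/non-cases, unexposed cases/non-cases

def newton_mle(tol=1e-12, max_iter=50):
    """Maximize log L(alpha, beta1) for the 2x2 table by Newton-Raphson."""
    alpha, beta1 = 0.0, 0.0
    for _ in range(max_iter):
        p1 = 1 / (1 + math.exp(-(alpha + beta1)))  # P(D / E)
        p0 = 1 / (1 + math.exp(-alpha))            # P(D / ~E)
        # score vector (first derivatives)
        g_a = (A + C) - (A + B) * p1 - (C + D) * p0
        g_b = A - (A + B) * p1
        # information matrix = minus the matrix of second derivatives
        w1 = (A + B) * p1 * (1 - p1)
        w0 = (C + D) * p0 * (1 - p0)
        i_aa, i_ab, i_bb = w1 + w0, w1, w1
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve (information) * step = score for the 2x2 system
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (-i_ab * g_a + i_aa * g_b) / det
        alpha, beta1 = alpha + step_a, beta1 + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return alpha, beta1

alpha_hat, beta1_hat = newton_mle()
print(math.exp(alpha_hat), 22 / 51)               # odds of CHD in the unexposed
print(math.exp(beta1_hat), (21 * 51) / (6 * 22))  # odds ratio
```

Each printed pair should match, which shows the iterative MLE and the closed-form solution coincide for this table.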
Hypothesis Testing
H0: β = 0 (null value of beta is 0: no association)
1. The Wald test:
$$Z = \frac{\hat{\beta} - 0}{\text{asymptotic standard error}(\hat{\beta})}$$
2. The Likelihood Ratio test:
Reduced = reduced model with k parameters; Full = full model with k+p parameters
$$-2\ln\left(\frac{L(\text{reduced})}{L(\text{full})}\right) = -2\ln(L(\text{reduced})) - [-2\ln(L(\text{full}))] \sim \chi^2_p$$
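H0: β₁ = 0
1. What is the Wald test here? For a 2x2 table, β̂₁ = ln(OR), and its asymptotic standard error is the square root of the sum of the reciprocal cell counts:
$$Z = \frac{\ln\left(\dfrac{51 \times 21}{6 \times 22}\right)}{\sqrt{\dfrac{1}{51} + \dfrac{1}{6} + \dfrac{1}{21} + \dfrac{1}{22}}} = 3.96$$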
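The Wald statistic is simple arithmetic on the four cell counts. The sketch below wraps it in a small function (the name `wald_2x2` and its argument order are illustrative) and adds a two-sided normal p-value via `math.erfc`.

```python
import math

def wald_2x2(a, b, c, d):
    """Wald test of H0: beta1 = 0 for a 2x2 table.

    a, b = exposed cases/non-cases; c, d = unexposed cases/non-cases.
    Returns (beta1_hat, SE, Z), with SE = sqrt of the summed reciprocal counts.
    """
    beta1_hat = math.log((a * d) / (b * c))  # ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return beta1_hat, se, beta1_hat / se

beta1_hat, se, z = wald_2x2(21, 6, 22, 51)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # P(|N(0,1)| > |z|)
print(round(z, 2), p_two_sided)
```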
2. What is the Likelihood Ratio test here?
– Full model = includes age variable
– Reduced model = includes only intercept
Maximum likelihood for reduced model ought to be $(.43)^{43} \times (.57)^{57}$ (43 cases/57 controls)… does MLE yield this?…
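The Reduced Model
$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha$$
Likelihood value for reduced model
$$L(\alpha) = \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{43} \times \left(\frac{1}{1+e^{\alpha}}\right)^{57}$$
$$\log L(\alpha) = 43\log e^{\alpha} - 43\log(1+e^{\alpha}) - 57\log(1+e^{\alpha}) = 43\alpha - 100\log(1+e^{\alpha})$$
$$\frac{d\log L(\alpha)}{d\alpha} = 43 - \frac{100e^{\alpha}}{1+e^{\alpha}} = 0 \;\Rightarrow\; 43 + 43e^{\alpha} = 100e^{\alpha} \;\Rightarrow\; 43 = 57e^{\alpha} \;\Rightarrow\; e^{\alpha} = \frac{43}{57} = .75$$
.75 = marginal odds of CHD!
$$\alpha = \ln(.75) \approx -.28$$
$$L(\alpha = -.28) = \left(\frac{.75}{1.75}\right)^{43} \times \left(\frac{1}{1.75}\right)^{57} = (.43)^{43} \times (.57)^{57} = 2.1 \times 10^{-30}$$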
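Likelihood value of full model
$$
L(\alpha,\beta_1) = \left(\frac{21/6}{1+21/6}\right)^{21} \times \left(\frac{1}{1+21/6}\right)^{6} \times \left(\frac{22/51}{1+22/51}\right)^{22} \times \left(\frac{1}{1+22/51}\right)^{51}
$$
$$
= \left(\frac{3.5}{4.5}\right)^{21} \times \left(\frac{1}{4.5}\right)^{6} \times \left(\frac{.43}{1.43}\right)^{22} \times \left(\frac{1}{1.43}\right)^{51} = 2.43 \times 10^{-26}
$$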
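Finally the LR…
$$-2\ln\left(\frac{L(\text{reduced})}{L(\text{full})}\right) = -2\ln(2.1\times10^{-30}) - [-2\ln(2.43\times10^{-26})] = 136.7 - 117.96 = 18.7$$
Compare the Wald test: $(3.96)^2 \approx 15.7$. Both are 1-df chi-square statistics for H0: β₁ = 0; they are asymptotically equivalent but need not agree exactly in a finite sample.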
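The two maximized log likelihoods can be reproduced directly. Both models here are saturated for the data they describe (one P(D) for everyone, or one P(D) per age group), so each maximized likelihood is a product of binomial terms evaluated at the observed proportions. That shortcut is an assumption that holds only for saturated models like these; the helper name `loglik_binomial` is illustrative.

```python
import math

def loglik_binomial(cases, noncases):
    """Maximized log likelihood of one group whose fitted P(D) is cases / n."""
    n = cases + noncases
    return cases * math.log(cases / n) + noncases * math.log(noncases / n)

# Reduced model: one common P(D) for all 100 subjects (43 cases, 57 non-cases)
ll_reduced = loglik_binomial(43, 57)
# Full model: separate P(D) for >=55 (21/27) and <55 (22/73)
ll_full = loglik_binomial(21, 6) + loglik_binomial(22, 51)

lr = -2 * ll_reduced - (-2 * ll_full)
p_value = math.erfc(math.sqrt(lr / 2))  # upper tail of a chi-square with 1 df
print(round(-2 * ll_reduced, 2), round(-2 * ll_full, 2), round(lr, 1), p_value)
```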
Example 2: >2 exposure levels (dummy coding)

CHD status    White    Black    Hispanic    Other
Present         5       20         15        10
Absent         20       10         10        10
(From Hosmer and Lemeshow)
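SAS CODE
data race;
/* chd: 1 = CHD present, 0 = absent */
/* race_2 = black, race_3 = hispanic, race_4 = other, white has all three dummies = 0 */
/* number = count of subjects in each cell */
input chd race_2 race_3 race_4 number;
datalines;
0 0 0 0 20
1 0 0 0 5
0 1 0 0 10
1 1 0 0 20
0 0 1 0 10
1 0 1 0 15
0 0 0 1 10
1 0 0 1 10
;
run;
proc logistic data=race descending;   /* descending: model P(chd=1) */
weight number;                        /* each row stands for "number" subjects */
model chd = race_2 race_3 race_4;
run;
Note the use of "dummy variables." "Baseline" category is white here.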
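What's the likelihood here?
$$
\begin{aligned}
L(\boldsymbol{\beta}) ={}& \left(\frac{e^{\alpha_{white}}}{1+e^{\alpha_{white}}}\right)^{5} \times \left(\frac{1}{1+e^{\alpha_{white}}}\right)^{20} \times \left(\frac{e^{\alpha_{white}+\beta_{black}}}{1+e^{\alpha_{white}+\beta_{black}}}\right)^{20} \times \left(\frac{1}{1+e^{\alpha_{white}+\beta_{black}}}\right)^{10} \\
&\times \left(\frac{e^{\alpha_{white}+\beta_{hisp}}}{1+e^{\alpha_{white}+\beta_{hisp}}}\right)^{15} \times \left(\frac{1}{1+e^{\alpha_{white}+\beta_{hisp}}}\right)^{10} \times \left(\frac{e^{\alpha_{white}+\beta_{other}}}{1+e^{\alpha_{white}+\beta_{other}}}\right)^{10} \times \left(\frac{1}{1+e^{\alpha_{white}+\beta_{other}}}\right)^{10}
\end{aligned}
$$
In this case there is more than one unknown beta (regression coefficient), so the symbol β represents a vector of beta coefficients.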
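Because the dummy-coded model has four parameters for four race groups, it is saturated: at the maximum, each group's fitted P(CHD) equals its observed proportion, and the intercept-only model's fitted P(CHD) equals the pooled proportion 50/100. Under that assumption the -2 Log L values and the likelihood ratio statistic in the SAS output can be reproduced by hand. The chi-square p-value uses the closed-form upper tail for 3 degrees of freedom; the names `counts` and `group_loglik` are illustrative.

```python
import math

# Example 2 counts by race: (CHD present, CHD absent)
counts = {"white": (5, 20), "black": (20, 10), "hispanic": (15, 10), "other": (10, 10)}

def group_loglik(cases, noncases, p):
    """Log likelihood contribution of one race group when P(CHD) = p."""
    return cases * math.log(p) + noncases * math.log(1 - p)

# Intercept-only model: one pooled P(CHD)
pooled = sum(c for c, _ in counts.values()) / sum(c + n for c, n in counts.values())
ll_null = sum(group_loglik(c, n, pooled) for c, n in counts.values())

# Intercept + race_2..race_4: each group's fitted P(CHD) is its observed proportion
ll_full = sum(group_loglik(c, n, c / (c + n)) for c, n in counts.values())

lr = -2 * ll_null + 2 * ll_full
# Upper tail of a chi-square with 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = math.erfc(math.sqrt(lr / 2)) + math.sqrt(2 * lr / math.pi) * math.exp(-lr / 2)
print(round(-2 * ll_null, 3), round(-2 * ll_full, 3), round(lr, 3), p_value)
```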
SAS OUTPUT – model fit
                 Intercept    Intercept and
Criterion        Only         Covariates
AIC              140.629      132.587
SC               140.709      132.905
-2 Log L         138.629      124.587
Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    14.0420       3     0.0028
Score               13.3333       3     0.0040
Wald                11.7715       3     0.0082
SAS OUTPUT – regression coefficients
Analysis of Maximum Likelihood Estimates
                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept    1     -1.3863     0.5000       7.6871       0.0056
race_2       1      2.0794     0.6325      10.8100       0.0010
race_3       1      1.7917     0.6455       7.7048       0.0055
race_4       1      1.3863     0.6708       4.2706       0.0388
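Because this model is saturated (one parameter per race group), its estimates also have a closed form: the intercept is the log odds of CHD among whites, each dummy coefficient is a log odds ratio versus white, and each standard error is the square root of the summed reciprocal counts involved. The sketch below is a cross-check under that assumption, not a description of how PROC LOGISTIC computes them (SAS maximizes iteratively, which explains tiny last-digit differences such as 1.7917 vs. ln 6 = 1.7918).

```python
import math

# Example 2 counts: (CHD present, CHD absent) for each race; white is the baseline
counts = {"white": (5, 20), "black": (20, 10), "hispanic": (15, 10), "other": (10, 10)}
dummies = {"race_2": "black", "race_3": "hispanic", "race_4": "other"}

w_cases, w_non = counts["white"]
intercept = math.log(w_cases / w_non)  # log odds of CHD in the baseline group
results = {"Intercept": (intercept, math.sqrt(1 / w_cases + 1 / w_non))}

for dummy, race in dummies.items():
    cases, non = counts[race]
    beta = math.log(cases / non) - intercept  # log odds ratio vs. white
    se = math.sqrt(1 / cases + 1 / non + 1 / w_cases + 1 / w_non)
    results[dummy] = (beta, se)

for name, (est, se) in results.items():
    # Wald chi-square = (estimate / SE)^2
    print(f"{name:10s} {est:8.4f} {se:7.4f} {(est / se) ** 2:9.4f}")
```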
SAS output – OR estimates
The LOGISTIC Procedure
Odds Ratio Estimates
            Point       95% Wald
Effect      Estimate    Confidence Limits
race_2      8.000       2.316    27.633
race_3      6.000       1.693    21.261
race_4      4.000       1.074    14.895
Interpretation:
8 times the odds of CHD for black vs. white (OR = 8.0)
6 times the odds of CHD for hispanic vs. white (OR = 6.0)
4 times the odds of CHD for other vs. white (OR = 4.0)
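The odds ratios and 95% Wald limits follow from the coefficient table: OR = e^β̂ and limits e^(β̂ ± 1.96·SE). The sketch below uses the rounded estimates and standard errors printed by SAS, so the last digit can differ slightly from the SAS output; the names `estimates` and `wald_or_ci` are illustrative.

```python
import math

# Estimates and standard errors copied from the SAS coefficient table
estimates = {"race_2": (2.0794, 0.6325), "race_3": (1.7917, 0.6455), "race_4": (1.3863, 0.6708)}
Z_975 = 1.959964  # 97.5th percentile of the standard normal

def wald_or_ci(beta, se):
    """Odds ratio and 95% Wald limits: exp(beta), exp(beta - 1.96*SE), exp(beta + 1.96*SE)."""
    return math.exp(beta), math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)

ci = {name: wald_or_ci(beta, se) for name, (beta, se) in estimates.items()}
for name, (or_hat, lower, upper) in ci.items():
    print(name, round(or_hat, 3), round(lower, 3), round(upper, 3))
```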
Example 3: Prostate Cancer Study
(same data as from lab 3)
Question: Does PSA level predict tumor penetration into the prostatic capsule (yes/no)? (This is a bad outcome, meaning the tumor has spread.)
Is this association confounded by race?
Does race modify this association (interaction)?
1. What's the relationship between PSA (continuous variable) and capsule penetration (binary)?
[Figure: Capsule (yes/no) vs. PSA (mg/ml). Scatter plot of capsule (0/1) against psa (0 to 140).]
[Figure: Mean PSA per quintile vs. proportion with capsule=yes, PSA (mg/ml) on the x-axis. S-shaped?]
[Figure: logit plot of psa predicting capsule, by quintile. Linear in the logit?]
[Figure: logit plot of psa predicting capsule, by QUARTILE. Linear in the logit?]
[Figure: logit plot of psa predicting capsule, by decile. Linear in the logit?]
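Logit plots like these are built by sorting subjects on psa, splitting them into equal-sized groups, and plotting each group's mean psa against the log odds of capsule=1 in that group. The sketch below is one way to compute those points. Two assumptions are not from the slides: it adds 0.5 to each count (a common continuity correction) so that a group with all 0s or all 1s still gets a finite logit, and it splits tied psa values arbitrarily at group boundaries. `n_groups` must not exceed the number of subjects.

```python
import math

def empirical_logits(x, y, n_groups):
    """Return (mean x, empirical log odds of y=1) for n_groups equal-sized groups sorted by x.

    Assumption: 0.5 is added to the event and non-event counts in each group.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        events = sum(yy for _, yy in chunk)
        nonevents = len(chunk) - events
        mean_x = sum(xx for xx, _ in chunk) / len(chunk)
        points.append((mean_x, math.log((events + 0.5) / (nonevents + 0.5))))
    return points
```

Calling it with `n_groups` = 5, 4, or 10 gives the quintile, quartile, or decile version; a roughly straight pattern of points supports entering psa linearly in the logit.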
Model: capsule = psa
Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    49.1277       1     <.0001
Score               41.7430       1     <.0001
Wald                29.4230       1     <.0001

Analysis of Maximum Likelihood Estimates
                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept    1     -1.1137     0.1616      47.5168       <.0001
psa          1      0.0502     0.00925     29.4230       <.0001
Model: capsule = psa race
Analysis of Maximum Likelihood Estimates
                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept    1     -0.4992     0.4581       1.1878       0.2758
psa          1      0.0512     0.00949     29.0371       <.0001
race         1     -0.5788     0.4187       1.9111       0.1668
No indication of confounding by race: adding race barely changes the psa regression coefficient (0.0502 without race vs. 0.0512 with race).
Model: capsule = psa race psa*race
                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept    1     -1.2858     0.6247       4.2360       0.0396
psa          1      0.0608     0.0280      11.6952       0.0006
race         1      0.0954     0.5421       0.0310       0.8603
psa*race     1     -0.0349     0.0193       3.2822       0.0700
Some evidence of effect modification by race (p = .07 for the psa*race term).
STRATIFIED BY RACE:
---------------------------- race=0 ----------------------------
                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept    1     -1.1904     0.1793      44.0820       <.0001
psa          1      0.0608     0.0117      26.9250       <.0001

---------------------------- race=1 ----------------------------
Analysis of Maximum Likelihood Estimates
                               Standard    Wald
Parameter    DF    Estimate    Error       Chi-Square    Pr > ChiSq
Intercept    1     -1.0950     0.5116       4.5812       0.0323
psa          1      0.0259     0.0153       2.8570       0.0910
How to calculate ORs from model with interaction term
Coefficients from the model capsule = psa race psa*race: Intercept -1.2858, psa 0.0608, race 0.0954, psa*race -0.0349.
Increased odds for every 5 mg/ml increase in PSA:
If white (race=0):
$$e^{5 \times .0608} = 1.36$$
If black (race=1):
$$e^{5 \times (.0608 - .0349)} = 1.14$$
ORs for increasing psa at different levels of race.
OR for a 5 mg/ml increase in psa level among white men:
$$\frac{e^{.0608(5) + .0954(0) - .0349(5 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0608 \times 5} = 1.36$$
OR for a 10 mg/ml increase in psa level among white men:
$$\frac{e^{.0608(10) + .0954(0) - .0349(10 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0608 \times 10} = 1.84$$
OR for a 5 mg/ml increase in psa level among black men:
$$\frac{e^{.0608(5) + .0954(1) - .0349(5 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{.0608 \times 5 - .0349 \times 5} = 1.14$$
OR for a 10 mg/ml increase in psa level among black men:
$$\frac{e^{.0608(10) + .0954(1) - .0349(10 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{10(.0608 - .0349)} = 1.30$$
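The pattern in these ratios is that the intercept and the race main effect cancel, leaving exp(Δpsa × (β_psa + β_psa*race × race)). The sketch below encodes that rule; the constant and function names are illustrative, and the coefficients are the rounded values from the interaction model.

```python
import math

# Coefficients from the model capsule = psa race psa*race
B_PSA, B_PSA_RACE = 0.0608, -0.0349

def or_psa_increase(delta_psa, race):
    """OR for a delta_psa mg/ml increase in psa at fixed race (0 = white, 1 = black)."""
    return math.exp(delta_psa * (B_PSA + B_PSA_RACE * race))

for race, label in ((0, "white"), (1, "black")):
    for delta in (5, 10):
        print(label, delta, round(or_psa_increase(delta, race), 2))
```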
OR for being black (vs. white), at different levels of psa.
OR for being black (vs. white) among men with psa 100 mg/ml:
$$\frac{e^{.0608(100) + .0954(1) - .0349(1 \times 100)}}{e^{.0608(100) + .0954(0) - .0349(0 \times 100)}} = e^{.0954(1) - .0349(1 \times 100)} = 0.034$$
OR for being black (vs. white) among men with psa 50 mg/ml:
$$\frac{e^{.0608(50) + .0954(1) - .0349(1 \times 50)}}{e^{.0608(50) + .0954(0) - .0349(0 \times 50)}} = e^{.0954(1) - .0349(1 \times 50)} = 0.19$$
OR for being black (vs. white) among men with psa 1 mg/ml:
$$\frac{e^{.0608(1) + .0954(1) - .0349(1 \times 1)}}{e^{.0608(1) + .0954(0) - .0349(0 \times 1)}} = e^{.0954(1) - .0349(1)} = 1.06$$
OR for being black (vs. white) among men with psa 0 mg/ml:
$$\frac{e^{.0608(0) + .0954(1) - .0349(1 \times 0)}}{e^{.0608(0) + .0954(0) - .0349(0 \times 0)}} = e^{.0954(1)} = 1.10$$
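Here the psa main effect cancels instead, so the black vs. white OR at a given psa is exp(β_race + β_psa*race × psa). The sketch below (names illustrative, coefficients rounded as in the table) reproduces the four values.

```python
import math

# Coefficients from the model capsule = psa race psa*race
B_RACE, B_PSA_RACE = 0.0954, -0.0349

def or_black_vs_white(psa):
    """OR for black vs. white men at a fixed psa level (mg/ml)."""
    return math.exp(B_RACE + B_PSA_RACE * psa)

for psa in (0, 1, 50, 100):
    print(psa, round(or_black_vs_white(psa), 3))
```

With these coefficients the OR crosses 1 at psa = 0.0954 / 0.0349 ≈ 2.7 mg/ml and keeps shrinking as psa rises.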
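Predictions
The model:
$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$
What's the predicted probability for a white man with psa level of 10 mg/ml?
$$\ln\left(\frac{P(\text{capsule}=1)}{1 - P(\text{capsule}=1)}\right) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$
$$\frac{P(\text{capsule}=1)}{1 - P(\text{capsule}=1)} = e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}$$
$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}{1 + e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}$$
$$P(\text{capsule}=1 / \text{white, psa}=10) = \frac{e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}}{1 + e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}} = \frac{e^{-1.2858 + .0608(10)}}{1 + e^{-1.2858 + .0608(10)}} = \frac{.51}{1.51} = .34$$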
What's the predicted probability for a black man with psa level of 10 mg/ml?
$$P(\text{capsule}=1 / \text{black, psa}=10) = \frac{e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}}{1 + e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}} = \frac{.39}{1.39} = .28$$
What's the predicted probability for a white man with psa level of 0 mg/ml (reference group)?
$$P(\text{capsule}=1 / \text{white, psa}=0) = \frac{e^{-1.2858}}{1 + e^{-1.2858}} = \frac{.28}{1.28} = .22$$
What's the predicted probability for a black man with psa level of 0 mg/ml?
$$P(\text{capsule}=1 / \text{black, psa}=0) = \frac{e^{-1.2858 + .0954(1)}}{1 + e^{-1.2858 + .0954(1)}} = \frac{.30}{1.30} = .23$$
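All four predictions apply the inverse logit, e^η / (1 + e^η), to the linear predictor η of the fitted interaction model. The sketch below packages that; `COEF` and `predicted_probability` are illustrative names, and the coefficients are the rounded values from the SAS table.

```python
import math

# Fitted model: logit(capsule=1) = -1.2858 + .0608 psa + .0954 race - .0349 psa*race
COEF = {"intercept": -1.2858, "psa": 0.0608, "race": 0.0954, "psa_race": -0.0349}

def predicted_probability(psa, race):
    """Inverse logit of the linear predictor for given psa (mg/ml) and race (0/1)."""
    eta = (COEF["intercept"] + COEF["psa"] * psa + COEF["race"] * race
           + COEF["psa_race"] * psa * race)
    return math.exp(eta) / (1 + math.exp(eta))

for race, label in ((0, "white"), (1, "black")):
    for psa in (0, 10):
        print(label, psa, round(predicted_probability(psa, race), 2))
```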
Diagnostics: Residuals
What's a residual in the context of logistic regression?
Residual = observed - predicted
For logistic regression the observed outcome is 0 or 1, so:
residual = 1 - predicted probability (if the event occurred)
OR residual = 0 - predicted probability (if it did not)
What's the residual for a white man with psa level of 0 mg/ml who has capsule penetration?
Residual = 1 - .22 = .78
What's the residual for a white man with psa level of 0 mg/ml who does not have capsule penetration?
Residual = 0 - .22 = -.22
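The raw residual above is observed minus predicted. The SAS code that follows requests deviance residuals instead (RESDEV=); for a single 0/1 outcome the usual definition is sign(y - p̂) · sqrt(-2[y ln p̂ + (1 - y) ln(1 - p̂)]). The sketch below computes both for the example man with p̂ = .22. We assume, but have not verified here, that this matches what RESDEV= reports for ungrouped binary data; the function names are illustrative.

```python
import math

def raw_residual(observed, p_hat):
    """Observed outcome (0 or 1) minus predicted probability."""
    return observed - p_hat

def deviance_residual(observed, p_hat):
    """sign(y - p) * sqrt(-2 * [y log p + (1 - y) log(1 - p)]) for one 0/1 outcome."""
    ll = observed * math.log(p_hat) + (1 - observed) * math.log(1 - p_hat)
    return math.copysign(math.sqrt(-2 * ll), observed - p_hat)

p_hat = 0.22  # predicted probability for a white man with psa = 0
print(raw_residual(1, p_hat), raw_residual(0, p_hat))
print(deviance_residual(1, p_hat), deviance_residual(0, p_hat))
```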
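In SAS… recall model with psa and gleason…
proc logistic data = hrp261.psa;
model capsule (event="1") = psa gleason;
/* save predicted probabilities (p=), their confidence limits (l=, u=), and deviance residuals (resdev=) */
output out=MyOutdata l=MyLowerCI p=Mypredicted u=MyUpperCI resdev=Myresiduals;
run;
proc gplot data = MyOutdata;
/* one plot per request: deviance residuals vs. psa, predicted probability vs. gleason */
plot Myresiduals*psa Mypredicted*gleason;
run;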
[Figure: Residual*psa. Deviance residual (-3 to 3) vs. psa (0 to 140).]
[Figure: Estimated prob*gleason. Estimated probability (0 to 1) vs. gleason (0 to 9).]