Logistic Regression II
Simple 2x2 Table
(courtesy Hosmer and Lemeshow)
              Exposure = 1                                                      Exposure = 0
Disease = 1   $P(D|E) = \frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$        $P(D|\sim E) = \frac{e^{\alpha}}{1+e^{\alpha}}$
Disease = 0   $P(\sim D|E) = \frac{1}{1+e^{\alpha+\beta_1}}$                    $P(\sim D|\sim E) = \frac{1}{1+e^{\alpha}}$
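Odds Ratio for simple 2x2 Table
$$OR = \frac{\left.\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right/\frac{1}{1+e^{\alpha+\beta_1}}}{\left.\frac{e^{\alpha}}{1+e^{\alpha}}\right/\frac{1}{1+e^{\alpha}}} = \frac{e^{\alpha+\beta_1}}{e^{\alpha}} = e^{\beta_1}$$
(courtesy Hosmer and Lemeshow)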
Example 1: CHD and Age
(2x2)
(from Hosmer and Lemeshow)
              >=55 yrs   <55 years
CHD Present      21         22
CHD Absent        6         51
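The odds in each age group and the odds ratio can be read straight off this table. The short Python sketch below does that arithmetic; the variable names are illustrative and not part of the original slides.

```python
import math

# Counts from the CHD-by-age table (Hosmer and Lemeshow).
chd_old, chd_young = 21, 22          # CHD present: >=55 yrs, <55 yrs
no_chd_old, no_chd_young = 6, 51     # CHD absent:  >=55 yrs, <55 yrs

odds_old = chd_old / no_chd_old          # odds of CHD if exposed (older)
odds_young = chd_young / no_chd_young    # odds of CHD if unexposed (younger)
odds_ratio = odds_old / odds_young       # equals (21*51)/(6*22)

print(f"odds (>=55 yrs) = {odds_old:.2f}")
print(f"odds (<55 yrs)  = {odds_young:.2f}")
print(f"odds ratio      = {odds_ratio:.2f}")
print(f"ln(odds ratio)  = {math.log(odds_ratio):.3f}")
```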
The Logit Model
$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha + \beta_1 X_1$$
$X_1 = 1$ if exposed (older)
$X_1 = 0$ if unexposed (younger)
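The Likelihood
$$L(\alpha,\beta_1) = \left(\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right)^{21} \times \left(\frac{1}{1+e^{\alpha+\beta_1}}\right)^{6} \times \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{22} \times \left(\frac{1}{1+e^{\alpha}}\right)^{51}$$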
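The Log Likelihood
recall:
$$\log e^{\alpha+\beta_1} = \log\left(e^{\alpha} e^{\beta_1}\right) = \log e^{\alpha} + \log e^{\beta_1} = \alpha + \beta_1$$
$$L(\alpha,\beta_1) = \left(\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right)^{21} \times \left(\frac{1}{1+e^{\alpha+\beta_1}}\right)^{6} \times \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{22} \times \left(\frac{1}{1+e^{\alpha}}\right)^{51}$$
$$\log L(\alpha,\beta_1) = 21(\alpha+\beta_1) - 21\log(1+e^{\alpha+\beta_1}) + 0 - 6\log(1+e^{\alpha+\beta_1}) + 22\alpha - 22\log(1+e^{\alpha}) + 0 - 51\log(1+e^{\alpha})$$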
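As a rough numerical check, the log likelihood can be coded directly and evaluated at the sample log odds of the younger group and the sample log odds ratio. This is a sketch only: the function name `log_lik` and the perturbation size 0.1 are illustrative choices, not from the slides.

```python
import math

def log_lik(alpha, beta1):
    """Log likelihood for the CHD-by-age 2x2 table (counts 21, 6, 22, 51)."""
    eta_old = alpha + beta1    # log odds of CHD for the exposed (>=55)
    eta_young = alpha          # log odds of CHD for the unexposed (<55)
    return (21 * eta_old - (21 + 6) * math.log(1 + math.exp(eta_old))
            + 22 * eta_young - (22 + 51) * math.log(1 + math.exp(eta_young)))

# Sample log odds in the unexposed and sample log odds ratio.
alpha_hat = math.log(22 / 51)
beta1_hat = math.log((21 * 51) / (6 * 22))

ll_hat = log_lik(alpha_hat, beta1_hat)

# Moving either parameter away from these values should lower the log likelihood.
neighbours = [log_lik(alpha_hat + da, beta1_hat + db)
              for da in (-0.1, 0.1) for db in (-0.1, 0.1)]

print(f"log L at sample values = {ll_hat:.2f}")
print("all nearby points lower:", all(v < ll_hat for v in neighbours))
```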
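Derivative(s) of the log likelihood
$$\log L(\alpha,\beta_1) = 21(\alpha+\beta_1) - 21\log(1+e^{\alpha+\beta_1}) + 0 - 6\log(1+e^{\alpha+\beta_1}) + 22\alpha - 22\log(1+e^{\alpha}) + 0 - 51\log(1+e^{\alpha})$$
$$\frac{d[\log L(\beta_1)]}{d\beta_1} = 21 - \frac{21e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} - \frac{6e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$$
$$\frac{d[\log L(\alpha)]}{d\alpha} = 22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}}$$
(The terms in $\alpha+\beta_1$ also depend on $\alpha$, but their derivative with respect to $\alpha$ is identical to $d[\log L]/d\beta_1$, which is 0 at the maximum, so they are omitted from the second expression.)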
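Maximize $\alpha$
$$22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}} = 0$$
$$22 = \frac{73e^{\alpha}}{1+e^{\alpha}}$$
$$22(1+e^{\alpha}) = 73e^{\alpha}$$
$$22 = 51e^{\alpha}$$
$$e^{\alpha} = \frac{22}{51}$$
= Odds of disease in the unexposed (<55)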
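Maximize $\beta_1$
$$21 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} = 0$$
$$27e^{\alpha+\beta_1} = 21(1+e^{\alpha+\beta_1})$$
$$6e^{\alpha+\beta_1} = 21$$
$$e^{\alpha+\beta_1} = \frac{21}{6}$$
$$e^{\beta_1} = \frac{21/6}{e^{\alpha}} = \frac{21/6}{22/51} = \frac{21 \times 51}{6 \times 22} = OR$$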
Hypothesis Testing
H0: $\beta = 0$ (null value of beta is 0: no association)
1. The Wald test:
$$Z = \frac{\hat{\beta} - 0}{\text{asymptotic standard error}(\hat{\beta})}$$
2. The Likelihood Ratio test:
Reduced = reduced model with k parameters; Full = full model with k+p parameters
$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln(L(\text{reduced})) - [-2\ln(L(\text{full}))] \sim \chi^2_p$$
Hypothesis Testing
H0: $\beta = 0$
1. What is the Wald Test here?
$$Z = \frac{\ln\left(\frac{51 \times 21}{6 \times 22}\right)}{\sqrt{\frac{1}{51}+\frac{1}{6}+\frac{1}{21}+\frac{1}{22}}} = 3.96$$
2. What is the Likelihood Ratio test here?
– Full model = includes age variable
– Reduced model = includes only intercept

Maximum likelihood for reduced model ought to be $(.43)^{43} \times (.57)^{57}$
(43 cases/57 controls)…does MLE yield this?…
The Reduced Model
$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha$$
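Likelihood value for reduced model
$$L(\alpha) = \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{43} \times \left(\frac{1}{1+e^{\alpha}}\right)^{57}$$
$$\log L(\alpha) = 43\log e^{\alpha} - 43\log(1+e^{\alpha}) - 57\log(1+e^{\alpha})$$
$$\frac{d\log L(\alpha)}{d\alpha} = 43 - \frac{100e^{\alpha}}{1+e^{\alpha}} = 0$$
$$43 + 43e^{\alpha} = 100e^{\alpha}$$
$$43 = 57e^{\alpha}$$
$$e^{\alpha} = \frac{43}{57} = .75$$
= marginal odds of CHD!
$$\alpha = \ln(.75) = -.28$$
$$L(\alpha = -.28) = \left(\frac{.75}{1.75}\right)^{43} \times \left(\frac{1}{1.75}\right)^{57} = (.43)^{43} \times (.57)^{57} = 2.1 \times 10^{-30}$$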
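Likelihood value of full model
$$L(\alpha,\beta_1) = \left(\frac{\frac{21}{6}}{1+\frac{21}{6}}\right)^{21} \times \left(\frac{1}{1+\frac{21}{6}}\right)^{6} \times \left(\frac{\frac{22}{51}}{1+\frac{22}{51}}\right)^{22} \times \left(\frac{1}{1+\frac{22}{51}}\right)^{51}$$
$$= \left(\frac{3.5}{4.5}\right)^{21} \times \left(\frac{1}{4.5}\right)^{6} \times \left(\frac{.43}{1.43}\right)^{22} \times \left(\frac{1}{1.43}\right)^{51} = 2.43 \times 10^{-26}$$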
Finally the LR…
$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln(2.1 \times 10^{-30}) - [-2\ln(2.43 \times 10^{-26})] = 136.7 - 117.96 = 18.7$$
Compare: LR chi-square = 18.7; Wald $Z^2 = (3.96)^2 = 15.7$
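Both test statistics for Example 1 can be recomputed from the four cell counts. The sketch below assumes, as the derivation does, that the fitted probability in each group equals the observed proportion; the helper name `binom_loglik` is illustrative.

```python
import math

a, b, c, d = 21, 22, 6, 51   # CHD+/older, CHD+/younger, CHD-/older, CHD-/younger

# Wald test for beta1 = ln(OR)
beta1_hat = math.log((a * d) / (b * c))
se_beta1 = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z_wald = beta1_hat / se_beta1

def binom_loglik(events, nonevents):
    """Maximized log likelihood of one group, with p_hat = events / total."""
    n = events + nonevents
    return events * math.log(events / n) + nonevents * math.log(nonevents / n)

ll_reduced = binom_loglik(a + b, c + d)               # intercept only
ll_full = binom_loglik(a, c) + binom_loglik(b, d)     # intercept + age
lr_stat = -2 * ll_reduced - (-2 * ll_full)

print(f"Wald Z = {z_wald:.2f}, Z^2 = {z_wald ** 2:.1f}")
print(f"-2 ln L: reduced = {-2 * ll_reduced:.1f}, full = {-2 * ll_full:.2f}")
print(f"LR chi-square (1 df) = {lr_stat:.1f}")
```

The two statistics test the same null hypothesis and agree asymptotically, but they need not match in a sample of this size.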
Example 2:
>2 exposure levels
*(dummy coding)

CHD status   White   Black   Hispanic   Other
Present         5      20       15        10
Absent         20      10       10        10

(From Hosmer and Lemeshow)
SAS CODE
Note the use of “dummy variables.” “Baseline” category is white here.

data race;
  input chd race_2 race_3 race_4 number;
  datalines;
0 0 0 0 20
1 0 0 0 5
0 1 0 0 10
1 1 0 0 20
0 0 1 0 10
1 0 1 0 15
0 0 0 1 10
1 0 0 1 10
;
run;

/* descending: model the probability that chd = 1 */
proc logistic data=race descending;
  weight number;  /* number = count of subjects in each chd-by-race row */
  model chd = race_2 race_3 race_4;
run;
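What’s the likelihood here?
$$L(\boldsymbol{\beta}) = \left(\frac{e^{\beta_{white}}}{1+e^{\beta_{white}}}\right)^{5} \times \left(\frac{1}{1+e^{\beta_{white}}}\right)^{20} \times \left(\frac{e^{\beta_{white}+\beta_{black}}}{1+e^{\beta_{white}+\beta_{black}}}\right)^{20} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{black}}}\right)^{10}$$
$$\times \left(\frac{e^{\beta_{white}+\beta_{hisp}}}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{15} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{10} \times \left(\frac{e^{\beta_{white}+\beta_{other}}}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10}$$
In this case there is more than one unknown beta (regression coefficient), so the symbol $\boldsymbol{\beta}$ represents a vector of beta coefficients.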
SAS OUTPUT – model fit

Criterion    Intercept Only    Intercept and Covariates
AIC             140.629              132.587
SC              140.709              132.905
-2 Log L        138.629              124.587

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      14.0420      3      0.0028
Score                 13.3333      3      0.0040
Wald                  11.7715      3      0.0082
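The -2 Log L values and the likelihood ratio chi-square in this output can be reproduced by hand. Because race is the only predictor and enters as dummy variables, the fitted probability in each race group equals the observed proportion, which is what the sketch below relies on; the names are illustrative.

```python
import math

# (CHD present, CHD absent) per race group, from the Example 2 table.
counts = {"white": (5, 20), "black": (20, 10),
          "hispanic": (15, 10), "other": (10, 10)}

def binom_loglik(events, nonevents):
    """Maximized log likelihood of one group, with p_hat = events / total."""
    n = events + nonevents
    return events * math.log(events / n) + nonevents * math.log(nonevents / n)

# Intercept and covariates: one fitted proportion per race group.
ll_full = sum(binom_loglik(e, ne) for e, ne in counts.values())

# Intercept only: a single overall proportion.
total_events = sum(e for e, _ in counts.values())
total_nonevents = sum(ne for _, ne in counts.values())
ll_null = binom_loglik(total_events, total_nonevents)

neg2ll_null = -2 * ll_null
neg2ll_full = -2 * ll_full
lr_chisq = neg2ll_null - neg2ll_full   # 3 degrees of freedom

print(f"-2 Log L, intercept only:           {neg2ll_null:.3f}")
print(f"-2 Log L, intercept and covariates: {neg2ll_full:.3f}")
print(f"Likelihood ratio chi-square:        {lr_chisq:.4f}")
```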
SAS OUTPUT – regression coefficients

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.3863        0.5000              7.6871          0.0056
race_2        1     2.0794        0.6325             10.8100          0.0010
race_3        1     1.7917        0.6455              7.7048          0.0055
race_4        1     1.3863        0.6708              4.2706          0.0388
SAS output – OR estimates

The LOGISTIC Procedure
Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
race_2        8.000             2.316     27.633
race_3        6.000             1.693     21.261
race_4        4.000             1.074     14.895
Interpretation:
8-fold higher odds of CHD for Black vs. White (OR = 8.0)
6-fold higher odds of CHD for Hispanic vs. White (OR = 6.0)
4-fold higher odds of CHD for Other vs. White (OR = 4.0)
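These odds ratios and their Wald limits follow from the cell counts alone. The sketch below uses the cross-product ratio against the White baseline and the usual standard error of a log odds ratio; 1.96 is used as the normal quantile, so the last digit may differ slightly from SAS.

```python
import math

# (CHD present, CHD absent) per race group, from the Example 2 table.
counts = {"white": (5, 20), "black": (20, 10),
          "hispanic": (15, 10), "other": (10, 10)}
ref_events, ref_nonevents = counts["white"]   # baseline category

results = {}
for group, (events, nonevents) in counts.items():
    if group == "white":
        continue
    # Log odds ratio versus the baseline and its standard error.
    log_or = math.log((events * ref_nonevents) / (nonevents * ref_events))
    se = math.sqrt(1 / events + 1 / nonevents
                   + 1 / ref_events + 1 / ref_nonevents)
    lower = math.exp(log_or - 1.96 * se)
    upper = math.exp(log_or + 1.96 * se)
    results[group] = (math.exp(log_or), se, lower, upper)
    print(f"{group:9s} OR = {math.exp(log_or):.3f}  "
          f"SE(ln OR) = {se:.4f}  95% CI = ({lower:.3f}, {upper:.3f})")
```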
Example 3: Prostate Cancer Study
(same data as from lab 3)

Question: Does PSA level predict tumor
penetration into the prostatic capsule (yes/no)?
(this is a bad outcome, meaning tumor has spread).

Is this association confounded by race?

Does race modify this association (interaction)?
1. What’s the relationship
between PSA (continuous
variable) and capsule
penetration (binary)?
Capsule (yes/no) vs. PSA (mg/ml)
[Figure: psa vs. capsule. Scatter plot of capsule (y-axis, 0.0 to 1.0) against psa (x-axis, 0 to 140).]
Mean PSA per quintile vs. proportion capsule=yes → S-shaped?
[Figure: proportion with capsule=yes (y-axis, 0.18 to 0.70) against mean PSA per quintile (x-axis, PSA in mg/ml, 0 to 50).]
[Figure: logit plot of psa predicting capsule, by quintiles → linear in the logit?]
[Figure: logit plot of psa predicting capsule, by QUARTILE → linear in the logit?]
[Figure: logit plot of psa predicting capsule, by decile → linear in the logit?]
model: capsule = psa

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      49.1277      1      <.0001
Score                 41.7430      1      <.0001
Wald                  29.4230      1      <.0001

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.1137        0.1616             47.5168          <.0001
psa           1     0.0502        0.00925            29.4230          <.0001
Model: capsule = psa race

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -0.4992        0.4581              1.1878          0.2758
psa           1     0.0512        0.00949            29.0371          <.0001
race          1    -0.5788        0.4187              1.9111          0.1668

No indication of confounding by race, since the regression coefficient for psa is essentially unchanged in magnitude (0.0502 without race in the model vs. 0.0512 with race).
Model: capsule = psa race psa*race

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.2858        0.6247              4.2360          0.0396
psa           1     0.0608        0.0280             11.6952          0.0006
race          1     0.0954        0.5421              0.0310          0.8603
psa*race      1    -0.0349        0.0193              3.2822          0.0700

Evidence of effect modification by race (p=.07).
STRATIFIED BY RACE:

---------------------------- race=0 ----------------------------
Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.1904        0.1793             44.0820          <.0001
psa           1     0.0608        0.0117             26.9250          <.0001

---------------------------- race=1 ----------------------------
Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.0950        0.5116              4.5812          0.0323
psa           1     0.0259        0.0153              2.8570          0.0910
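The stratified fits carry the same information as the interaction model: the difference between the two psa slopes is the psa*race coefficient, and a z-test on that difference lands close to the interaction term's Wald chi-square (3.2822). The sketch below is a back-of-the-envelope check that treats the two strata as independent samples; the variable names are illustrative.

```python
import math

# psa slopes and standard errors from the race-stratified fits.
slope_race0, se_race0 = 0.0608, 0.0117
slope_race1, se_race1 = 0.0259, 0.0153

diff = slope_race1 - slope_race0                    # difference in psa slopes
se_diff = math.sqrt(se_race0 ** 2 + se_race1 ** 2)  # strata share no subjects
z = diff / se_diff

print(f"difference in slopes = {diff:.4f}")
print(f"SE of difference     = {se_diff:.4f}")
print(f"z = {z:.2f}, z^2 = {z ** 2:.2f}")
```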
How to calculate ORs from model with interaction term

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.2858        0.6247              4.2360          0.0396
psa           1     0.0608        0.0280             11.6952          0.0006
race          1     0.0954        0.5421              0.0310          0.8603
psa*race      1    -0.0349        0.0193              3.2822          0.0700

Increased odds for every 5 mg/ml increase in PSA:
If white (race=0): $e^{(5 \times .0608)} = 1.36$
If black (race=1): $e^{(5 \times (.0608 - .0349))} = 1.14$
ORs for increasing psa at different levels of race.

OR for a 5 mg/ml increase in psa level among white men:
$$\frac{e^{.0608(5) + .0954(0) - .0349(0)}}{e^{.0608(0) + .0954(0) - .0349(0)}} = e^{.0608 \times 5} = 1.36$$
OR for a 10 mg/ml increase in psa level among white men:
$$\frac{e^{.0608(10) + .0954(0) - .0349(0)}}{e^{.0608(0) + .0954(0) - .0349(0)}} = e^{.0608 \times 10} = 1.84$$
OR for a 5 mg/ml increase in psa level among black men:
$$\frac{e^{.0608(5) + .0954(1) - .0349(5 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{.0608 \times 5 - .0349 \times 5} = 1.14$$
OR for a 10 mg/ml increase in psa level among black men:
$$\frac{e^{.0608(10) + .0954(1) - .0349(10 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{10(.0608 - .0349)} = 1.30$$
OR for being black (vs. white), at different levels of psa.

OR for being black (vs. white) among men with psa = 100 mg/ml:
$$\frac{e^{.0608(100) + .0954(1) - .0349(1 \times 100)}}{e^{.0608(100) + .0954(0) - .0349(0 \times 100)}} = e^{.0954(1) - .0349(1 \times 100)} = 0.034$$
OR for being black (vs. white) among men with psa = 50 mg/ml:
$$\frac{e^{.0608(50) + .0954(1) - .0349(1 \times 50)}}{e^{.0608(50) + .0954(0) - .0349(0)}} = e^{.0954(1) - .0349(1 \times 50)} = 0.19$$
OR for being black (vs. white) among men with psa = 1 mg/ml:
$$\frac{e^{.0608(1) + .0954(1) - .0349(1)}}{e^{.0608(1) + .0954(0) - .0349(0)}} = e^{.0954(1) - .0349(1)} = 1.06$$
OR for being black (vs. white) among men with psa = 0 mg/ml:
$$\frac{e^{.0608(0) + .0954(1) - .0349(0)}}{e^{.0608(0) + .0954(0) - .0349(0)}} = e^{.0954(1)} = 1.10$$
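With an interaction term, each odds ratio depends on the level of the other variable. The two small helper functions below (illustrative names, not from the slides) apply the reported coefficients to recompute the odds ratios for a psa increase within each race and for race at a fixed psa level.

```python
import math

# Coefficients from the model capsule = psa race psa*race.
B_PSA, B_RACE, B_INT = 0.0608, 0.0954, -0.0349

def or_psa_increase(delta, race):
    """OR for a `delta` mg/ml increase in psa, within race (0=white, 1=black)."""
    return math.exp(delta * (B_PSA + B_INT * race))

def or_black_vs_white(psa):
    """OR for race=1 vs. race=0 among men at a fixed psa level."""
    return math.exp(B_RACE + B_INT * psa)

for race in (0, 1):
    for delta in (5, 10):
        print(f"race={race}, +{delta} mg/ml psa: "
              f"OR = {or_psa_increase(delta, race):.2f}")

for psa in (0, 1, 50, 100):
    print(f"black vs. white at psa={psa}: OR = {or_black_vs_white(psa):.3f}")
```

The intercept never appears because it cancels in every ratio.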
Predictions

The model:
$$\text{logit}(\text{capsule} = 1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$
What’s the predicted probability for a white man with psa level of 10 mg/ml?
$$\ln\left(\frac{P(\text{capsule} = 1)}{1 - P(\text{capsule} = 1)}\right) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$
$$\frac{P(\text{capsule} = 1)}{1 - P(\text{capsule} = 1)} = e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}$$
$$P(\text{capsule} = 1) = \frac{e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}{1 + e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}$$
$$P(\text{capsule} = 1 \mid \text{white, psa} = 10) = \frac{e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}}{1 + e^{-1.2858 + .0608(10) + .0954(0) - .0349(0)}} = \frac{e^{-1.2858 + .0608(10)}}{1 + e^{-1.2858 + .0608(10)}} = \frac{.51}{1.51} = .34$$
Predictions

The model:
$$\text{logit}(\text{capsule} = 1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$
What’s the predicted probability for a black man with psa level of 10 mg/ml?
$$P(\text{capsule} = 1) = \frac{e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}{1 + e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}$$
$$P(\text{capsule} = 1 \mid \text{black, psa} = 10) = \frac{e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}}{1 + e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}} = \frac{.39}{1.39} = .28$$
Predictions

The model:
$$\text{logit}(\text{capsule} = 1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$
What’s the predicted probability for a white man with psa level of 0 mg/ml (reference group)?
$$P(\text{capsule} = 1) = \frac{e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}{1 + e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}$$
$$P(\text{capsule} = 1 \mid \text{white, psa} = 0) = \frac{e^{-1.2858}}{1 + e^{-1.2858}} = \frac{.28}{1.28} = .22$$
Predictions

The model:
$$\text{logit}(\text{capsule} = 1) = -1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})$$
What’s the predicted probability for a black man with psa level of 0 mg/ml?
$$P(\text{capsule} = 1) = \frac{e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}{1 + e^{-1.2858 + .0608(\text{psa}) + .0954(\text{race}) - .0349(\text{psa} \times \text{race})}}$$
$$P(\text{capsule} = 1 \mid \text{black, psa} = 0) = \frac{e^{-1.2858 + .0954(1)}}{1 + e^{-1.2858 + .0954(1)}} = \frac{.30}{1.30} = .23$$
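The four predictions all come from one formula, so they can be wrapped in a single function. A minimal sketch, using the reported coefficients; the function name `predicted_prob` is illustrative.

```python
import math

# Coefficients from the model capsule = psa race psa*race.
INTERCEPT, B_PSA, B_RACE, B_INT = -1.2858, 0.0608, 0.0954, -0.0349

def predicted_prob(psa, race):
    """Predicted P(capsule = 1) for a given psa level and race (0=white, 1=black)."""
    logit = INTERCEPT + B_PSA * psa + B_RACE * race + B_INT * psa * race
    odds = math.exp(logit)
    return odds / (1 + odds)

for label, psa, race in [("white, psa=10", 10, 0), ("black, psa=10", 10, 1),
                         ("white, psa=0", 0, 0), ("black, psa=0", 0, 1)]:
    print(f"P(capsule=1 | {label}) = {predicted_prob(psa, race):.2f}")
```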
Diagnostics: Residuals
What’s a residual in the context of logistic
regression?
Residual=observed-predicted

For logistic regression:
residual = 1 – predicted probability (if the event occurred)
OR residual = 0 – predicted probability (if the event did not occur)
Diagnostics: Residuals
What’s the residual for a white man with psa level of 0 mg/ml who has capsule penetration?
$$\text{Residual} = 1 - .22 = .78$$
What’s the residual for a white man with psa level of 0 mg/ml who does not have capsule penetration?
$$\text{Residual} = 0 - .22 = -.22$$
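The raw residual is simply observed minus predicted. The sketch below also computes the deviance residual, which is the quantity SAS's `resdev=` output option returns; the formula used here is the standard one for a binary outcome and the function names are illustrative, neither being taken from the slides.

```python
import math

def raw_residual(y, p):
    """Observed outcome (0 or 1) minus predicted probability."""
    return y - p

def deviance_residual(y, p):
    """Signed square root of this observation's contribution to -2 log L."""
    loglik = y * math.log(p) + (1 - y) * math.log(1 - p)
    return math.copysign(math.sqrt(-2 * loglik), y - p)

p = 0.22   # predicted probability for a white man with psa = 0
for y in (1, 0):
    print(f"capsule={y}: raw = {raw_residual(y, p):+.2f}, "
          f"deviance = {deviance_residual(y, p):+.3f}")
```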
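In SAS…recall model with psa and gleason…

proc logistic data = hrp261.psa;
  model capsule (event="1") = psa gleason;
  /* save predicted probabilities, confidence limits, and deviance residuals */
  output out=MyOutdata l=MyLowerCI
         p=Mypredicted u=MyUpperCI resdev=Myresiduals;
run;

/* plot residuals against a predictor: replace "predictor" with psa or gleason */
proc gplot data = MyOutdata;
  plot Myresiduals*predictor;
run;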
[Figure: Residual*psa. Deviance residual (y-axis, -3 to 3) against psa (x-axis, 0 to 140).]
[Figure: Estimated prob*gleason. Estimated probability (y-axis, 0.0 to 1.0) against gleason (x-axis, 0 to 9).]
