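Logistic Regression II

Simple 2x2 Table (courtesy Hosmer and Lemeshow)

Exposure = 1, Disease = 1:  $P(D \mid E) = \dfrac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$
Exposure = 0, Disease = 1:  $P(D \mid \sim E) = \dfrac{e^{\alpha}}{1+e^{\alpha}}$
Exposure = 1, Disease = 0:  $P(\sim D \mid E) = \dfrac{1}{1+e^{\alpha+\beta_1}}$
Exposure = 0, Disease = 0:  $P(\sim D \mid \sim E) = \dfrac{1}{1+e^{\alpha}}$

Odds Ratio for simple 2x2 Table (courtesy Hosmer and Lemeshow)

$$OR = \frac{\dfrac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} \Big/ \dfrac{1}{1+e^{\alpha+\beta_1}}}{\dfrac{e^{\alpha}}{1+e^{\alpha}} \Big/ \dfrac{1}{1+e^{\alpha}}} = \frac{e^{\alpha+\beta_1}}{e^{\alpha}} = e^{\beta_1}$$

Example 1: CHD and Age (2x2) (from Hosmer and Lemeshow)

                >=55 yrs    <55 yrs
CHD Present        21          22
CHD Absent          6          51

The Logit Model

$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha + \beta_1 X_1, \qquad X_1 = \begin{cases} 1 & \text{if exposed (older)} \\ 0 & \text{if unexposed (younger)} \end{cases}$$

The Likelihood

$$L(\alpha,\beta_1) = \left(\frac{e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right)^{21} \times \left(\frac{1}{1+e^{\alpha+\beta_1}}\right)^{6} \times \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{22} \times \left(\frac{1}{1+e^{\alpha}}\right)^{51}$$

The Log Likelihood

recall: $\log e^{\alpha+\beta_1} = \log(e^{\alpha} e^{\beta_1}) = \log e^{\alpha} + \log e^{\beta_1} = \alpha + \beta_1$

$$\log L(\alpha,\beta_1) = 21(\alpha+\beta_1) - 21\log(1+e^{\alpha+\beta_1}) + 0 - 6\log(1+e^{\alpha+\beta_1}) + 22\alpha - 22\log(1+e^{\alpha}) + 0 - 51\log(1+e^{\alpha})$$

Derivative(s) of the log likelihood

$$\frac{d[\log L]}{d\beta_1} = 21 - \frac{21e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} - \frac{6e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}$$

$$\frac{d[\log L]}{d\alpha} = \left[21 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}}\right] + 22 - \frac{22e^{\alpha}}{1+e^{\alpha}} - \frac{51e^{\alpha}}{1+e^{\alpha}}$$

The bracketed term is the same as the derivative with respect to $\beta_1$, so it is zero at the maximum and only the unexposed terms remain.

Maximize (with respect to $\alpha$)

$$22 - \frac{73e^{\alpha}}{1+e^{\alpha}} = 0 \;\Rightarrow\; 22(1+e^{\alpha}) = 73e^{\alpha} \;\Rightarrow\; 22 = 51e^{\alpha} \;\Rightarrow\; e^{\alpha} = \frac{22}{51}$$

= Odds of disease in the unexposed (<55)

Maximize (with respect to $\beta_1$)

$$21 - \frac{27e^{\alpha+\beta_1}}{1+e^{\alpha+\beta_1}} = 0 \;\Rightarrow\; 21(1+e^{\alpha+\beta_1}) = 27e^{\alpha+\beta_1} \;\Rightarrow\; 21 = 6e^{\alpha+\beta_1} \;\Rightarrow\; e^{\alpha+\beta_1} = \frac{21}{6}$$

$$e^{\beta_1} = \frac{21/6}{22/51} = \frac{21 \times 51}{6 \times 22} = OR \approx 8.11$$

Hypothesis Testing H0: $\beta = 0$

Null value of beta is 0 (no association).

1. The Wald test:

$$Z = \frac{\hat\beta - 0}{\text{asymptotic standard error}(\hat\beta)}$$

2. The Likelihood Ratio test: Reduced = reduced model with k parameters; Full = full model with k+p parameters

$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln L(\text{reduced}) - [-2\ln L(\text{full})] \sim \chi^2_p$$

Hypothesis Testing H0: $\beta_1 = 0$

1. What is the Wald Test here?

$$Z = \frac{\ln\left(\dfrac{51 \times 21}{6 \times 22}\right)}{\sqrt{\dfrac{1}{51} + \dfrac{1}{6} + \dfrac{1}{21} + \dfrac{1}{22}}} = 3.96$$

2. What is the Likelihood Ratio test here?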
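– Full model = includes age variable
– Reduced model = includes only intercept

Maximum likelihood for the reduced model ought to be $(.43)^{43} \times (.57)^{57}$ (43 cases/57 controls). Does MLE yield this?

The Reduced Model

$$\log\left(\frac{P(D)}{1-P(D)}\right) = \alpha$$

Likelihood value for reduced model

$$L(\alpha) = \left(\frac{e^{\alpha}}{1+e^{\alpha}}\right)^{43} \times \left(\frac{1}{1+e^{\alpha}}\right)^{57}$$

$$\log L(\alpha) = 43\alpha - 43\log(1+e^{\alpha}) - 57\log(1+e^{\alpha})$$

$$\frac{d \log L(\alpha)}{d\alpha} = 43 - \frac{100e^{\alpha}}{1+e^{\alpha}} = 0 \;\Rightarrow\; 43 + 43e^{\alpha} = 100e^{\alpha} \;\Rightarrow\; 43 = 57e^{\alpha} \;\Rightarrow\; e^{\alpha} = \frac{43}{57} = .75$$

= marginal odds of CHD!

$$\alpha = \ln(.75) = -.28$$

$$L(\alpha = -.28) = \left(\frac{.75}{1.75}\right)^{43} \times \left(\frac{1}{1.75}\right)^{57} = (.43)^{43} \times (.57)^{57} = 2.1 \times 10^{-30}$$

Likelihood value of full model

$$L(\alpha,\beta_1) = \left(\frac{21/6}{1+21/6}\right)^{21} \times \left(\frac{1}{1+21/6}\right)^{6} \times \left(\frac{22/51}{1+22/51}\right)^{22} \times \left(\frac{1}{1+22/51}\right)^{51}$$

$$= \left(\frac{3.5}{4.5}\right)^{21} \times \left(\frac{1}{4.5}\right)^{6} \times \left(\frac{.43}{1.43}\right)^{22} \times \left(\frac{1}{1.43}\right)^{51} = 2.43 \times 10^{-26}$$

Finally the LR…

$$-2\ln\frac{L(\text{reduced})}{L(\text{full})} = -2\ln(2.1 \times 10^{-30}) - [-2\ln(2.43 \times 10^{-26})] = 136.7 - 117.96 = 18.7$$

Compare 18.7 with the Wald chi-square, $(3.96)^2 = 15.7$: the two statistics are similar in size and both reject H0 on 1 degree of freedom.

Example 2: >2 exposure levels (dummy coding)

CHD status    White    Black    Hispanic    Other
Present          5       20        15         10
Absent          20       10        10         10

(From Hosmer and Lemeshow)

SAS CODE

    data race;
      input chd race_2 race_3 race_4 number;
      /* race_2 = black, race_3 = hispanic, race_4 = other; all three 0 = white */
      datalines;
    0 0 0 0 20
    1 0 0 0 5
    0 1 0 0 10
    1 1 0 0 20
    0 0 1 0 10
    1 0 1 0 15
    0 0 0 1 10
    1 0 0 1 10
    ;
    run;

    /* descending: model the probability that chd = 1 */
    proc logistic data=race descending;
      weight number;   /* number = cell count for each row */
      model chd = race_2 race_3 race_4;
    run;

Note the use of "dummy variables." The "baseline" category is white here.

What's the likelihood here?

$$L(\boldsymbol\beta) = \left(\frac{e^{\beta_{white}}}{1+e^{\beta_{white}}}\right)^{5} \times \left(\frac{1}{1+e^{\beta_{white}}}\right)^{20} \times \left(\frac{e^{\beta_{white}+\beta_{black}}}{1+e^{\beta_{white}+\beta_{black}}}\right)^{20} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{black}}}\right)^{10}$$
$$\times \left(\frac{e^{\beta_{white}+\beta_{hisp}}}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{15} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{hisp}}}\right)^{10} \times \left(\frac{e^{\beta_{white}+\beta_{other}}}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10} \times \left(\frac{1}{1+e^{\beta_{white}+\beta_{other}}}\right)^{10}$$

In this case there is more than one unknown beta (regression coefficient), so the symbol $\boldsymbol\beta$ represents a vector of beta coefficients.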
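Both tables above are fitted with one parameter per exposure group, so the maximum-likelihood fitted probabilities are simply the observed proportions and everything can be checked in closed form. The following is a minimal Python sketch of that check, not part of the original slides; the function name `fit_saturated` and the dictionary keys are hypothetical, and it assumes every cell count is positive.

```python
import math

def fit_saturated(cases, controls):
    """Closed-form ML fit of a logistic model with one dummy per non-baseline group.

    cases[i] / controls[i] are the counts with and without disease in group i;
    group 0 is the baseline. Assumes every count is positive.
    """
    odds = [d / h for d, h in zip(cases, controls)]     # fitted odds = observed odds
    alpha = math.log(odds[0])                           # intercept = baseline log odds
    betas = [math.log(o / odds[0]) for o in odds[1:]]   # log odds ratios vs. baseline
    # Asymptotic SE of each log odds ratio: sqrt of the summed reciprocal cell counts.
    ses = [math.sqrt(1 / cases[0] + 1 / controls[0] + 1 / d + 1 / h)
           for d, h in zip(cases[1:], controls[1:])]

    def loglik(d, h):
        # Binomial log-likelihood evaluated at the observed proportion d / (d + h).
        return d * math.log(d / (d + h)) + h * math.log(h / (d + h))

    ll_full = sum(loglik(d, h) for d, h in zip(cases, controls))
    ll_reduced = loglik(sum(cases), sum(controls))      # intercept-only model
    return {
        "alpha": alpha,
        "odds_ratios": [math.exp(b) for b in betas],
        "wald_z": [b / se for b, se in zip(betas, ses)],
        "minus2_ll_reduced": -2 * ll_reduced,
        "minus2_ll_full": -2 * ll_full,
        "lr_chi2": 2 * (ll_full - ll_reduced),
    }

# Example 1: group 0 = <55 years (baseline), group 1 = >=55 years.
ex1 = fit_saturated(cases=[22, 21], controls=[51, 6])
# Example 2: group 0 = white (baseline), then black, hispanic, other.
ex2 = fit_saturated(cases=[5, 20, 15, 10], controls=[20, 10, 10, 10])

for name, fit in (("Example 1", ex1), ("Example 2", ex2)):
    print(name,
          "OR:", [round(x, 2) for x in fit["odds_ratios"]],
          "Wald Z:", [round(z, 2) for z in fit["wald_z"]],
          "LR chi-square:", round(fit["lr_chi2"], 2))
```

For Example 1 this reproduces the hand calculation (odds ratio of about 8.11, Wald Z of about 3.96, LR chi-square of about 18.7). For Example 2 the odds ratios are the cross-product ratios of each group against white: 8, 6 and 4.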
SAS OUTPUT – model fit

Criterion    Intercept Only    Intercept and Covariates
AIC             140.629              132.587
SC              140.709              132.905
-2 Log L        138.629              124.587

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      14.0420      3      0.0028
Score                 13.3333      3      0.0040
Wald                  11.7715      3      0.0082

SAS OUTPUT – regression coefficients

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.3863        0.5000             7.6871           0.0056
race_2        1     2.0794        0.6325            10.8100           0.0010
race_3        1     1.7917        0.6455             7.7048           0.0055
race_4        1     1.3863        0.6708             4.2706           0.0388

SAS output – OR estimates

The LOGISTIC Procedure: Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
race_2        8.000            2.316     27.633
race_3        6.000            1.693     21.261
race_4        4.000            1.074     14.895

Interpretation:
8-fold higher odds of CHD for black vs. white
6-fold higher odds of CHD for hispanic vs. white
4-fold higher odds of CHD for other vs. white

Example 3: Prostate Cancer Study (same data as from lab 3)

Question: Does PSA level predict tumor penetration into the prostatic capsule (yes/no)? (This is a bad outcome, meaning the tumor has spread.)
Is this association confounded by race?
Does race modify this association (interaction)?

1. What's the relationship between PSA (continuous variable) and capsule penetration (binary)?

[Figure: capsule (yes/no, plotted as 0 to 1) vs. PSA (mg/ml, 0 to 140)]

[Figure: mean PSA per quintile (mg/ml) vs. proportion with capsule=yes. S-shaped?]

[Figure: logit plot of psa predicting capsule, by quintile. Linear in the logit?]

[Figure: logit plot of psa predicting capsule, by quartile. Linear in the logit?]

[Figure: logit plot of psa predicting capsule, by decile. Linear in the logit?]
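The logit plots above bin the continuous predictor into quantile groups and plot the log odds of the outcome in each group against the group's mean predictor value. The sketch below shows one way such points could be computed; it is an illustration, not the code behind the slides. The function name `empirical_logits`, the 0.5 added to each count (a common correction that keeps the logit finite when a bin has no events or no non-events) and the toy data are all assumptions.

```python
import math

def empirical_logits(x, y, n_bins):
    """Return (mean x, empirical logit) for each of n_bins equal-sized groups.

    x: continuous predictor values; y: 0/1 outcomes of the same length.
    Groups are formed by sorting on x and cutting into n_bins consecutive chunks.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue                      # more bins than observations
        size = len(chunk)
        events = sum(yi for _, yi in chunk)
        mean_x = sum(xi for xi, _ in chunk) / size
        # Adding 0.5 to both counts is an assumption, used to avoid log(0).
        logit = math.log((events + 0.5) / (size - events + 0.5))
        points.append((mean_x, logit))
    return points

# Hypothetical toy data (NOT the prostate data): outcome becomes more likely as x grows.
toy_x = list(range(20))
toy_y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
toy_points = empirical_logits(toy_x, toy_y, n_bins=4)
for mean_x, logit in toy_points:
    print(f"mean x = {mean_x:5.1f}   empirical logit = {logit:6.3f}")
```

If the points fall roughly on a straight line, entering the predictor as a single linear term in the logit is reasonable. Changing `n_bins` (4, 5, 10) corresponds to the quartile, quintile and decile versions of the plot.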
Model: capsule = psa

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      49.1277      1      <.0001
Score                 41.7430      1      <.0001
Wald                  29.4230      1      <.0001

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.1137        0.1616            47.5168           <.0001
psa           1     0.0502        0.00925           29.4230           <.0001

Model: capsule = psa race

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -0.4992        0.4581             1.1878           0.2758
psa           1     0.0512        0.00949           29.0371           <.0001
race          1    -0.5788        0.4187             1.9111           0.1668

No indication of confounding by race, since the psa regression coefficient barely changes in magnitude (0.0502 without race, 0.0512 with race).

Model: capsule = psa race psa*race

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.2858        0.6247             4.2360           0.0396
psa           1     0.0608        0.0280            11.6952           0.0006
race          1     0.0954        0.5421             0.0310           0.8603
psa*race      1    -0.0349        0.0193             3.2822           0.0700

Suggestive evidence of effect modification by race (p=.07).

STRATIFIED BY RACE:

---------------------------- race=0 ----------------------------

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.1904        0.1793            44.0820           <.0001
psa           1     0.0608        0.0117            26.9250           <.0001

---------------------------- race=1 ----------------------------

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1    -1.0950        0.5116             4.5812           0.0323
psa           1     0.0259        0.0153             2.8570           0.0910

How to calculate ORs from model with interaction term

Using the estimates from the interaction model above (psa = 0.0608, race = 0.0954, psa*race = -0.0349), the increased odds for every 5 mg/ml increase in PSA are:

If white (race=0): $e^{5 \times .0608} = 1.36$
If black (race=1): $e^{5 \times (.0608 - .0349)} = 1.14$

ORs for increasing psa at different levels of race.

OR for a 5 mg/ml increase in psa level among white men:

$$\frac{e^{.0608(5) + .0954(0) - .0349(0)}}{e^{.0608(0) + .0954(0) - .0349(0)}} = e^{.0608 \times 5} = 1.36$$

OR for a 10 mg/ml increase in psa level among white men:

$$\frac{e^{.0608(10) + .0954(0) - .0349(0)}}{e^{.0608(0) + .0954(0) - .0349(0)}} = e^{.0608 \times 10} = 1.84$$

OR for a 5 mg/ml increase in psa level among black men:

$$\frac{e^{.0608(5) + .0954(1) - .0349(5 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{.0608 \times 5 - .0349 \times 5} = 1.14$$

OR for a 10 mg/ml increase in psa level among black men:

$$\frac{e^{.0608(10) + .0954(1) - .0349(10 \times 1)}}{e^{.0608(0) + .0954(1) - .0349(0 \times 1)}} = e^{10(.0608 - .0349)} = 1.30$$

OR for being black (vs. white), at different levels of psa.

OR for being black (vs. white) among men with psa 100 mg/ml:

$$\frac{e^{.0608(100) + .0954(1) - .0349(1 \times 100)}}{e^{.0608(100) + .0954(0) - .0349(0 \times 100)}} = e^{.0954(1) - .0349(1 \times 100)} = 0.034$$

OR for being black (vs. white) among men with psa 50 mg/ml:

$$\frac{e^{.0608(50) + .0954(1) - .0349(1 \times 50)}}{e^{.0608(50) + .0954(0) - .0349(0 \times 50)}} = e^{.0954(1) - .0349(1 \times 50)} = 0.19$$

OR for being black (vs. white) among men with psa 1 mg/ml:

$$\frac{e^{.0608(1) + .0954(1) - .0349(1 \times 1)}}{e^{.0608(1) + .0954(0) - .0349(0 \times 1)}} = e^{.0954(1) - .0349(1)} = 1.06$$

OR for being black (vs. white) among men with psa 0 mg/ml:

$$\frac{e^{.0608(0) + .0954(1) - .0349(0)}}{e^{.0608(0) + .0954(0) - .0349(0)}} = e^{.0954(1)} = 1.10$$

Predictions

The model:

$$\text{logit}(\text{capsule}=1) = -1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)$$

What's the predicted probability for a white man with psa level of 10 mg/ml?

$$\ln\left(\frac{P(\text{capsule}=1)}{1 - P(\text{capsule}=1)}\right) = -1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)$$

$$\frac{P(\text{capsule}=1)}{1 - P(\text{capsule}=1)} = e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}$$

$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}{1 + e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}$$

$$P(\text{capsule}=1 \mid \text{white}, psa=10) = \frac{e^{-1.2858 + .0608(10)}}{1 + e^{-1.2858 + .0608(10)}} = \frac{.51}{1.51} = .34$$

What's the predicted probability for a black man with psa level of 10 mg/ml?

$$P(\text{capsule}=1 \mid \text{black}, psa=10) = \frac{e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}}{1 + e^{-1.2858 + .0608(10) + .0954(1) - .0349(10)}} = \frac{.39}{1.39} = .28$$

What's the predicted probability for a white man with psa level of 0 mg/ml (reference group)?

$$P(\text{capsule}=1 \mid \text{white}, psa=0) = \frac{e^{-1.2858}}{1 + e^{-1.2858}} = \frac{.28}{1.28} = .22$$

What's the predicted probability for a black man with psa level of 0 mg/ml?
$$P(\text{capsule}=1) = \frac{e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}{1 + e^{-1.2858 + .0608(psa) + .0954(race) - .0349(psa \times race)}}$$

$$P(\text{capsule}=1 \mid \text{black}, psa=0) = \frac{e^{-1.2858 + .0954(1)}}{1 + e^{-1.2858 + .0954(1)}} = \frac{.30}{1.30} = .23$$

Diagnostics: Residuals

What's a residual in the context of logistic regression?

Residual = observed - predicted

For logistic regression: residual = 1 - predicted probability OR residual = 0 - predicted probability

What's the residual for a white man with psa level of 0 mg/ml who has capsule penetration?

Residual = 1 - .22 = .78

What's the residual for a white man with psa level of 0 mg/ml who does not have capsule penetration?

Residual = 0 - .22 = -.22

In SAS…recall model with psa and gleason…

    proc logistic data = hrp261.psa;
      model capsule (event="1") = psa gleason;
      /* save 95% limits, predicted probabilities and deviance residuals */
      output out=MyOutdata l=MyLowerCI p=Mypredicted u=MyUpperCI resdev=Myresiduals;
    run;

    proc gplot data = MyOutdata;
      plot Myresiduals*predictor;   /* replace "predictor" with a model variable, e.g. psa */
    run;

[Figure: Residual*psa. Deviance residual (-3 to 3) vs. psa (0 to 140)]

[Figure: Estimated prob*gleason. Estimated probability (0 to 1) vs. gleason (0 to 9)]
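The odds ratios, predicted probabilities and raw residuals worked out by hand in the slides above all follow from the four coefficients of the interaction model. A small Python sketch, assuming those reported estimates (-1.2858, .0608, .0954, -.0349) and using hypothetical helper names, makes the arithmetic easy to re-check:

```python
import math

# Coefficients as reported for the model capsule = psa race psa*race.
B0, B_PSA, B_RACE, B_INT = -1.2858, 0.0608, 0.0954, -0.0349

def logit(psa, race):
    """Linear predictor: log odds of capsule penetration."""
    return B0 + B_PSA * psa + B_RACE * race + B_INT * psa * race

def prob(psa, race):
    """Predicted probability, e^logit / (1 + e^logit)."""
    return 1 / (1 + math.exp(-logit(psa, race)))

def or_psa(delta, race):
    """Odds ratio for a delta mg/ml increase in psa within one race group."""
    return math.exp(logit(delta, race) - logit(0, race))

def or_race(psa):
    """Odds ratio for race=1 vs. race=0 at a fixed psa level."""
    return math.exp(logit(psa, 1) - logit(psa, 0))

def residual(observed, psa, race):
    """Raw residual: observed outcome (0 or 1) minus predicted probability."""
    return observed - prob(psa, race)

for race, label in ((0, "white"), (1, "black")):
    for delta in (5, 10):
        print(f"OR per {delta} mg/ml, {label}: {or_psa(delta, race):.2f}")
for psa in (100, 50, 1, 0):
    print(f"OR black vs. white at psa={psa}: {or_race(psa):.3f}")
for psa in (10, 0):
    for race, label in ((0, "white"), (1, "black")):
        print(f"P(capsule=1 | {label}, psa={psa}): {prob(psa, race):.2f}")
print(f"residual, white, psa=0, penetration:    {residual(1, 0, 0):.2f}")
print(f"residual, white, psa=0, no penetration: {residual(0, 0, 0):.2f}")
```

Because the intercept cancels in every odds ratio, `or_psa` and `or_race` depend only on the psa, race and interaction coefficients, which is why the hand calculations could leave the intercept out.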