Lecture 8: Multiple Linear Regression
Interpretation with different types of predictors
BMTRY 701 Biostatistical Methods II

Interaction

• AKA effect modification.
• Allows there to be a different association between two variables for differing levels of a third variable.
• Example: In the model with length of stay as an outcome, is there an interaction between medschool and nurse?
• Note that ‘adjustment’ is a rather weak form of accounting for a variable.
• Allowing an interaction allows much greater flexibility in the model.

Interactions

Interactions can be formed between
• two continuous variables
• a binary and a continuous variable
• two binary variables
• a binary variable and a categorical variable with >2 categories
• Etc.

Three-way interaction: interaction between 3 variables. Four-way, etc.

Example: log(LOS) ~ INFRISK*MS

\[ \log(\mathrm{LOS}_i) = \beta_0 + \beta_1 \mathrm{INFRISK}_i + \beta_2 \mathrm{MS}_i + \beta_3 \mathrm{MS}_i \cdot \mathrm{INFRISK}_i + e_i \]

\[ E[\log(\mathrm{LOS}_i) \mid \mathrm{MS}_i = 0] = \beta_0 + \beta_1 \mathrm{INFRISK}_i \]

\[ E[\log(\mathrm{LOS}_i) \mid \mathrm{MS}_i = 1] = \beta_0 + \beta_1 \mathrm{INFRISK}_i + \beta_2 + \beta_3 \mathrm{INFRISK}_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \mathrm{INFRISK}_i \]

How does this differ from the model without the interaction? Without the adjustment?

Model 1: \[ \log(\mathrm{LOS}_i) = \beta_0 + \beta_1 \mathrm{INFRISK}_i + e_i \]
Model 2: \[ \log(\mathrm{LOS}_i) = \beta_0 + \beta_1 \mathrm{INFRISK}_i + \beta_2 \mathrm{MS}_i + e_i \]
Model 3: \[ \log(\mathrm{LOS}_i) = \beta_0 + \beta_1 \mathrm{INFRISK}_i + \beta_2 \mathrm{MS}_i + \beta_3 \mathrm{MS}_i \cdot \mathrm{INFRISK}_i + e_i \]

Model 1

> plot(data$INFRISK, data$logLOS, xlab="Infection Risk, %", ylab="Length of Stay, days", pch=16, cex=1.5)
>
> # Model 1:
> reg1 <- lm(logLOS ~ INFRISK, data=data)
> abline(reg1, lwd=2)
> summary(reg1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.93250    0.04794  40.310  < 2e-16 ***
INFRISK      0.07293    0.01053   6.929 2.92e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1494 on 111 degrees of freedom
Multiple R-Squared: 0.302,     Adjusted R-squared: 0.2957
F-statistic: 48.02 on 1 and 111 DF,  p-value: 2.918e-10

[Figure: Model 1. x-axis: Infection Risk, %; y-axis: Length of Stay, days (the plotted values are logLOS).]

Model 2

> reg2 <- lm(logLOS ~ INFRISK + ms, data=data)
> infriski <- seq(1,8,0.1)
> beta <- reg2$coefficients
> # Fitted lines for ms = 0 and ms = 1: same slope, different intercepts
> yhat0 <- beta[1] + beta[2]*infriski
> yhat1 <- beta[1] + beta[2]*infriski + beta[3]
> lines(infriski, yhat0, lwd=2, col=2)
> lines(infriski, yhat1, lwd=2, col=2)
> summary(reg2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.94449    0.04709  41.295  < 2e-16 ***
INFRISK      0.06677    0.01058   6.313 5.91e-09 ***
ms           0.09882    0.03949   2.503   0.0138 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1459 on 110 degrees of freedom
Multiple R-Squared: 0.3396,     Adjusted R-squared: 0.3276
F-statistic: 28.28 on 2 and 110 DF,  p-value: 1.232e-10

[Figure: Model 2. x-axis: Infection Risk, %; y-axis: Length of Stay, days.]

Model 3

> # Model 3:
> reg3 <- lm(logLOS ~ INFRISK + ms + ms:INFRISK, data=data)
> infriski <- seq(1,8,0.1)
> beta <- reg3$coefficients
> # Fitted lines for ms = 0 and ms = 1: different intercepts and different slopes
> yhat0 <- beta[1] + beta[2]*infriski
> yhat1 <- beta[1] + beta[3] + (beta[2]+beta[4])*infriski
> lines(infriski, yhat0, lwd=2, col=4)
> lines(infriski, yhat1, lwd=2, col=4)
> summary(reg3)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.947942   0.049698  39.195  < 2e-16 ***
INFRISK     0.065950   0.011220   5.878  4.6e-08 ***
ms          0.059514   0.178622   0.333    0.740
INFRISK:ms  0.007856   0.034807   0.226    0.822
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1466 on 109 degrees of freedom
Multiple R-Squared: 0.3399,     Adjusted R-squared: 0.3217
F-statistic: 18.71 on 3 and 109 DF,  p-value: 7.35e-10

[Figure: Model 3. x-axis: Infection Risk, %; y-axis: Length of Stay, days.]

Conclusions

• There does not appear to be an interaction between MEDSCHOOL and INFRISK (Model 3 interaction estimate 0.008, p = 0.82).
• Both MEDSCHOOL and INFRISK are associated with log(LOS), in the presence of each other.
• The association between INFRISK and log(LOS) is positive: for a 1% increase in infection risk, logLOS is expected to increase by 0.07 (Model 2 estimate 0.067), adjusting for Med School affiliation.
• Hospitals with Med School affiliation tend to have longer average length of stay, adjusting for infection risk (Model 2 estimate 0.099 on the logLOS scale, p = 0.014).

Interactions with continuous variables

How to interpret with continuous variables?

\[ \log(\mathrm{LOS}_i) = \beta_0 + \beta_1 \mathrm{INFRISK}_i + \beta_2 \mathrm{NURSE}_i + \beta_3 \mathrm{INFRISK}_i \cdot \mathrm{NURSE}_i + e_i \]

Example: Difference in expected logLOS between two hospitals with the same NURSE and a 1% difference in INFRISK

\[ E[\log(\mathrm{LOS}) \mid \mathrm{INFRISK}_i, \mathrm{NURSE}_i] = \beta_0 + \beta_1 \mathrm{INFRISK}_i + \beta_2 \mathrm{NURSE}_i + \beta_3 \mathrm{INFRISK}_i \cdot \mathrm{NURSE}_i \]

\[ E[\log(\mathrm{LOS}) \mid \mathrm{INFRISK}_i + 1, \mathrm{NURSE}_i] = \beta_0 + \beta_1 (\mathrm{INFRISK}_i + 1) + \beta_2 \mathrm{NURSE}_i + \beta_3 (\mathrm{INFRISK}_i + 1) \cdot \mathrm{NURSE}_i \]

\[ \text{Difference} = \beta_1 + \beta_3 \mathrm{NURSE}_i \]

Interaction with continuous variables

> reg4 <- lm(logLOS ~ INFRISK*NURSE, data=data)
> summary(reg4)

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.067e+00  6.642e-02  31.120  < 2e-16 ***
INFRISK        3.164e-02  1.586e-02   1.995  0.04853 *
NURSE         -1.025e-03  4.657e-04  -2.201  0.02986 *
INFRISK:NURSE  2.696e-04  9.727e-05   2.771  0.00657 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1427 on 109 degrees of freedom
Multiple R-Squared: 0.3739,     Adjusted R-squared: 0.3567
F-statistic: 21.7 on 3 and 109 DF,  p-value: 4.284e-11

[Figure: data$logLOS (y-axis) against data$INFRISK (x-axis), with fitted lines labeled NURSE=100 and NURSE=300.]

Interaction interpretation

[Figure: Change in logLOS for 1% change in INFRISK (y-axis) against NURSE (x-axis, 0 to 600).]

Interactions between categorical variables

• Simple with two binary variables.
• More complicated to keep track of when there are more than two categories in one or more variables.
• Example: REGION and MEDSCHOOL.
• Question: Is there an interaction between REGION and MEDSCHOOL with regard to logLOS? That is: does the association between MEDSCHOOL and logLOS differ by REGION?

Interpreting coefficients

\[ \log(\mathrm{LOS}_i) = \beta_0 + \beta_1 \mathrm{MS}_i + \beta_2 I(R_i = 2) + \beta_3 I(R_i = 3) + \beta_4 I(R_i = 4) + \beta_5 \mathrm{MS}_i I(R_i = 2) + \beta_6 \mathrm{MS}_i I(R_i = 3) + \beta_7 \mathrm{MS}_i I(R_i = 4) + e_i \]

\[
\begin{aligned}
E[\log(\mathrm{LOS}_i) \mid R_i = 1, \mathrm{MS}_i = 0] &= \beta_0 \\
E[\log(\mathrm{LOS}_i) \mid R_i = 1, \mathrm{MS}_i = 1] &= \beta_0 + \beta_1 \\
E[\log(\mathrm{LOS}_i) \mid R_i = 2, \mathrm{MS}_i = 0] &= \beta_0 + \beta_2 \\
E[\log(\mathrm{LOS}_i) \mid R_i = 2, \mathrm{MS}_i = 1] &= \beta_0 + \beta_1 + \beta_2 + \beta_5 \\
E[\log(\mathrm{LOS}_i) \mid R_i = 3, \mathrm{MS}_i = 0] &= \beta_0 + \beta_3 \\
E[\log(\mathrm{LOS}_i) \mid R_i = 3, \mathrm{MS}_i = 1] &= \beta_0 + \beta_1 + \beta_3 + \beta_6 \\
E[\log(\mathrm{LOS}_i) \mid R_i = 4, \mathrm{MS}_i = 0] &= \beta_0 + \beta_4 \\
E[\log(\mathrm{LOS}_i) \mid R_i = 4, \mathrm{MS}_i = 1] &= \beta_0 + \beta_1 + \beta_4 + \beta_7
\end{aligned}
\]

Regression Results

> reg5 <- lm(logLOS ~ factor(REGION)*ms, data=data)
>
> summary(reg5)

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         2.359927   0.030715  76.833  < 2e-16 ***
factor(REGION)2    -0.122926   0.042560  -2.888   0.0047 **
factor(REGION)3    -0.163065   0.039769  -4.100 8.16e-05 ***
factor(REGION)4    -0.299316   0.049933  -5.994 2.92e-08 ***
ms                  0.125486   0.072685   1.726   0.0872 .
factor(REGION)2:ms -0.007176   0.096181  -0.075   0.9407
factor(REGION)3:ms  0.033145   0.114691   0.289   0.7732
factor(REGION)4:ms  0.082734   0.132974   0.622   0.5352
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1473 on 105 degrees of freedom
Multiple R-Squared: 0.3578,     Adjusted R-squared: 0.3149
F-statistic: 8.356 on 7 and 105 DF,  p-value: 4.356e-08

Association between MS and REGION

> table(data$REGION, data$ms)

     0  1
  1 23  5   = 18%
  2 25  7   = 22%
  3 34  3   =  8%
  4 14  2   = 13%

Rows are REGION (1 to 4) and columns are ms (0 or 1). The percentages give the share of hospitals in each region with ms = 1.
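The percentages next to the REGION by ms table can be checked directly from the counts. The following is a minimal sketch that assumes only the eight counts shown in the table; the object names counts, row_props, and pct_ms are illustrative and do not appear in the lecture.

```r
# Counts copied from the table(data$REGION, data$ms) output.
# Rows are REGION 1-4; columns are ms = 0 and ms = 1.
# matrix() fills column by column, so the ms = 0 counts come first.
counts <- matrix(c(23, 25, 34, 14,
                    5,  7,  3,  2),
                 nrow = 4,
                 dimnames = list(REGION = as.character(1:4),
                                 ms = c("0", "1")))

# Row proportions: within each region, the share of hospitals at each ms level.
row_props <- prop.table(counts, margin = 1)

# Percentage of hospitals with ms = 1 in each region, to one decimal place.
pct_ms <- round(100 * row_props[, "1"], 1)
print(pct_ms)
```

This gives 17.9, 21.9, 8.1, and 12.5 percent for regions 1 through 4, in line with the rounded 18%, 22%, 8%, and 13% on the slide. With the original data frame available, the same row proportions should come from prop.table(table(data$REGION, data$ms), margin = 1).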