Transcript AMMBR II
AMMBR II
Gerrit Rooks

Checking assumptions in logistic regression
• Hosmer & Lemeshow
• Residuals
• Multi-collinearity
• Cook's distance

Hosmer & Lemeshow
The test divides the sample into subgroups and checks whether the observed and predicted frequencies are about equal in each group.
The test should not be significant (indicating no difference).

Hosmer & Lemeshow
Test statistic in its standard form, for g groups:

    \[ \hat{C} = \sum_{j=1}^{g} \frac{(O_j - n_j \bar{\pi}_j)^2}{n_j \bar{\pi}_j (1 - \bar{\pi}_j)} \]

where O_j is the observed number of cases with outcome 1 in the j-th group, n_j is the number of observations in that group, and \bar{\pi}_j is the average probability in the j-th group.

First logistic regression

    . logit hiqual yr_rnd meals cred_ml

    Iteration 0:   log likelihood = -349.01971
    Iteration 1:   log likelihood = -199.10312
    Iteration 2:   log likelihood = -160.11854
    Iteration 3:   log likelihood = -156.27132
    Iteration 4:   log likelihood = -156.25612
    Iteration 5:   log likelihood = -156.25611

    Logistic regression                               Number of obs   =        707
                                                      LR chi2(3)      =     385.53
                                                      Prob > chi2     =     0.0000
    Log likelihood = -156.25611                       Pseudo R2       =     0.5523

          hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
           meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
         cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
           _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645

Then postestimation command

    . estat gof, table group(10)

    Logistic model for hiqual, goodness-of-fit test
    (Table collapsed on quantiles of estimated probabilities)

      Group |   Prob   Obs_1   Exp_1   Obs_0   Exp_0   Total
    --------+-----------------------------------------------
          1 | 0.0008       1     0.0      71    72.0      72
          2 | 0.0019       1     0.1      71    71.9      72
          3 | 0.0037       0     0.2      71    70.8      71
          4 | 0.0078       0     0.4      68    67.6      68
          5 | 0.0208       1     0.9      71    71.1      72
          6 | 0.0560       2     2.4      68    67.6      70
          7 | 0.1554       4     7.4      68    64.6      72
          8 | 0.4960      23    22.0      47    48.0      70
          9 | 0.7531      44    43.5      26    26.5      70
         10 | 0.9595      62    61.1       8     8.9      70

          number of observations =       707
                number of groups =        10
          Hosmer-Lemeshow chi2(8) =     40.45
                      Prob > chi2 =    0.0000

Including interaction term helps

    . gen ym=yr_rnd*meals
    . logit hiqual yr_rnd meals cred_ml ym , nolog

    Logistic regression                               Number of obs   =        707
                                                      LR chi2(4)      =     390.46
                                                      Prob > chi2     =     0.0000
    Log likelihood = -153.78831                       Pseudo R2       =     0.5594

          hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          yr_rnd |  -2.834458   .8630901    -3.28   0.001    -4.526083   -1.142832
           meals |  -.1019211   .0098691   -10.33   0.000    -.1212641   -.0825781
         cred_ml |   .7789823   .3206881     2.43   0.015     .1504452    1.407519
              ym |   .0463257   .0188326     2.46   0.014     .0094145    .0832368
           _cons |   2.686005   .4307661     6.24   0.000     1.841719    3.530291

Multicollinearity

    . reg hiqual avg_ed yr_rnd meals

          Source |       SS       df       MS              Number of obs =    1158
    -------------+------------------------------           F(  3,  1154) =  518.61
           Model |  145.983509     3  48.6611696           Prob > F      =  0.0000
        Residual |  108.279876  1154  .093830049           R-squared     =  0.5741
    -------------+------------------------------           Adj R-squared =  0.5730
           Total |  254.263385  1157  .219760921           Root MSE      =  .30632

          hiqual |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          avg_ed |   .1729601    .021089     8.20   0.000     .1315831    .2143371
          yr_rnd |  -.0008586   .0248112    -0.03   0.972    -.0495386    .0478215
           meals |  -.0076084    .000527   -14.44   0.000    -.0086423   -.0065744
           _cons |   .2445202   .0824989     2.96   0.003     .0826554    .4063849

    . vif

        Variable |       VIF       1/VIF
    -------------+----------------------
           meals |      3.31    0.301982
          avg_ed |      3.25    0.307731
          yr_rnd |      1.11    0.903460
    -------------+----------------------
        Mean VIF |      2.56

Residuals
• Residual = (observed value - predicted value) / square root of the variance

    . predict p
    (option pr assumed; Pr(hiqual))
    (42 missing values generated)
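The residual definition on this slide can be sketched by hand in Stata. This is a minimal sketch, assuming a logit model for hiqual is the most recent estimation; the variable names pres, lev and stdres_check are hypothetical and do not appear in the slides. It builds the standardized Pearson residual from the Pearson residual and the leverage, which is how Stata's rstandard option is defined.

```stata
* Sketch only: assumes a logit model for hiqual has just been fitted.
* pres, lev and stdres_check are hypothetical variable names.

* Pearson residual: (observed - predicted) / sqrt(binomial variance)
predict pres, residuals

* Leverage (diagonal element of the hat matrix)
predict lev, hat

* Standardized Pearson residual: Pearson residual / sqrt(1 - leverage)
generate stdres_check = pres / sqrt(1 - lev)

* Inspect the distribution to spot unusually large values
summarize stdres_check, detail
```

The result should match what the rstandard option of predict returns, so the hand computation is only useful for seeing what goes into the number.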
    . predict stdres, rstand
    (42 missing values generated)

    . scatter stdres p, mlabel(snum)

[Figure: scatter plot of the standardized residuals (y-axis "Residuals", 0 to 50) against Pr(hiqual) (x-axis, 0 to 1), points labelled with snum. Observations 1403 and 1402 appear at the top of the plot.]

Inspect observations with large residuals (larger than about 2.5 to 3)

    . list if snum==1403

    Observation 458:
        snum      1403      dnum      315       cred_hl   low
        awards    No        schqual   high      pared     medium
        hiqual    high      pared_ml  medium    ell       27
        yr_rnd    nd        pared_hl  .         avg_ed    2.19
        meals     100       api00     808       enroll    497
        api99     824       hicred    0         cred      low
        full      59        cred_ml   low       some_col  28
        ym        100

    . logit hiqual yr_rnd meals avg_ed if snum != 1403

    Iteration 0:   log likelihood = -729.56398
    Iteration 1:   log likelihood = -332.43297
    Iteration 2:   log likelihood = -270.06297
    Iteration 3:   log likelihood = -265.70542
    Iteration 4:   log likelihood = -265.68934
    Iteration 5:   log likelihood = -265.68934

    Logistic regression                               Number of obs   =       1157
                                                      LR chi2(3)      =     927.75
                                                      Prob > chi2     =     0.0000
    Log likelihood = -265.68934                       Pseudo R2       =     0.6358

          hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          yr_rnd |    -1.1328   .3842377    -2.95   0.003    -1.885892   -.3797077
           meals |  -.0790397   .0076984   -10.27   0.000    -.0941283   -.0639511
          avg_ed |   2.010791   .2947269     6.82   0.000     1.433137    2.588445
           _cons |  -3.528875   1.037345    -3.40   0.001    -5.562035   -1.495716

    . logit hiqual yr_rnd meals avg_ed, nolog

    Logistic regression                               Number of obs   =       1158
                                                      LR chi2(3)      =     914.05
                                                      Prob > chi2     =     0.0000
    Log likelihood = -273.66402                       Pseudo R2       =     0.6255

          hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          yr_rnd |  -.9913148   .3743452    -2.65   0.008    -1.725018   -.2576117
           meals |  -.0758864   .0074453   -10.19   0.000     -.090479   -.0612938
          avg_ed |    1.98805   .2884154     6.89   0.000     1.422766    2.553334
           _cons |  -3.566451    1.01715    -3.51   0.000    -5.560028   -1.572874

Cook's distance (< 1)

    \[ D_i = \frac{\sum_j (\hat{y}_j - \hat{y}_{j(i)})^2}{p \cdot MSE} \]

where
\hat{y}_j = prediction for j from all observations
\hat{y}_{j(i)} = prediction for j from the observations excluding observation i
p = number of parameters
MSE = mean square error

    . logit hiqual meals yr_rnd cred_ml, nolog

    Logistic regression                               Number of obs   =        707
                                                      LR chi2(3)      =     385.53
                                                      Prob > chi2     =     0.0000
    Log likelihood = -156.25611                       Pseudo R2       =     0.5523

          hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
          yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
         cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
           _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645

    . predict cook, dbeta
    (493 missing values generated)

    . summ cook

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            cook |       707    .0257177    .0899176   2.11e-07   .6101257

    . graph twoway scatter cook p, mlabel(snum)

[Figure: scatter plot of cook (y-axis, 0 to .6) against Pr(hiqual) (x-axis, 0 to 1), points labelled with snum.]

To Stata
• Use apilog.dta
• Awards = dependent variable
• For Awards, inspect the frequency counts
• Recode Awards into a binary variable
• Estimate a LR model using yr_rnd, meals and enroll as predictors

To Stata
• Inspect the classification table
• Perform the Hosmer & Lemeshow test
• Inspect the standardized residuals
• Inspect Cook's distance
• See if interaction effects improve the fit

• Is the Wald test an accurate test of the significance of coefficients in logistic regression analysis?
  a) Yes, just like in regression analysis.
  b) Yes, it is accurate, although a likelihood ratio test is more efficient.
  c) No, unlike in regression analysis, the Wald test is biased, especially for relatively small coefficients.
  d) No, unlike in regression analysis, the Wald test is biased, especially for relatively large coefficients.
• Use lrtest to check the significance of the effect of the variable yr_rnd
• Use auto.dta (if it is not on your PC, then use http://www.stata-press.com/data/r11/auto)
• Predict which car will be foreign, using weight and mpg as predictors
• Is the interaction between weight and mpg significant?
• Tip: always center the variables before making an interaction variable.
• use http://www.stata-press.com/data/r11/choice
• Do income, gender or type of car (European, Japanese or American) predict whether a car will be bought (choice)?
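The auto.dta exercise above combines two of the tips: centering before building an interaction, and testing a term with a likelihood-ratio test. One possible way to set it up is sketched below. The names c_weight, c_mpg, wm, main and inter are hypothetical and chosen only for illustration; this is a sketch of the approach, not a model answer from the course.

```stata
* Sketch only: c_weight, c_mpg, wm, main and inter are hypothetical names.
use http://www.stata-press.com/data/r11/auto, clear

* Center both predictors at their sample means
summarize weight, meanonly
generate c_weight = weight - r(mean)
summarize mpg, meanonly
generate c_mpg = mpg - r(mean)

* Interaction of the centered variables
generate wm = c_weight * c_mpg

* Model without the interaction term
logit foreign c_weight c_mpg, nolog
estimates store main

* Model with the interaction term
logit foreign c_weight c_mpg wm, nolog
estimates store inter

* Likelihood-ratio test of the interaction term
lrtest main inter
```

Both models must be estimated on the same observations for lrtest to be valid; that holds here as long as weight and mpg have no missing values. The same estimates store / lrtest pattern could be used for the yr_rnd exercise by fitting the model with and without yr_rnd.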