
AMMBR II
Gerrit Rooks
Checking assumptions in logistic regression
• Hosmer & Lemeshow
• Residuals
• Multicollinearity
• Cook's distance
Hosmer & Lemeshow
The test divides the sample into subgroups and checks whether the observed and predicted numbers of cases are about equal within each group.
The test should not be significant: a non-significant result indicates no difference between observed and predicted.
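The test statistic compares observed and expected counts over the g groups:

$$C = \sum_{j=1}^{g} \frac{\left(o_j - n_j \bar{\pi}_j\right)^2}{n_j \bar{\pi}_j \left(1 - \bar{\pi}_j\right)}$$

where $o_j$ is the observed number of cases with outcome 1 in group j, $n_j$ is the number of observations in group j and $\bar{\pi}_j$ is the average predicted probability in the j-th group. Under a well-fitting model C is approximately chi-square distributed with g - 2 degrees of freedom.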
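A minimal Python sketch of the statistic above, assuming the 0/1 outcome y and the fitted probabilities p are plain lists. The function names are illustrative, and the grouping is a simplification: observations are sorted on p and cut into equal-sized groups, whereas Stata's estat gof keeps tied probabilities together, so results can differ slightly. The chi-square tail uses the closed forms that exist for integer degrees of freedom, so only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson sum.
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: normal tail plus a finite correction series.
    return math.erfc(math.sqrt(h)) + math.exp(-h) * sum(
        h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df + 1) // 2))

def hosmer_lemeshow(y, p, groups=10):
    """Return (C, df, p-value) for 0/1 outcomes y and fitted probabilities p.

    Simplification: observations are sorted on p and cut into `groups`
    (nearly) equal-sized groups; tied probabilities are not kept together.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    c = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_j = len(idx)
        o_j = sum(y[i] for i in idx)           # observed number of 1s in group j
        pbar = sum(p[i] for i in idx) / n_j    # average predicted probability in group j
        c += (o_j - n_j * pbar) ** 2 / (n_j * pbar * (1 - pbar))
    df = groups - 2
    return c, df, chi2_sf(c, df)

# Tail probability for the value reported by estat gof: chi2(8) = 40.45
print(chi2_sf(40.45, 8) < 0.0001)  # True, consistent with Prob > chi2 = 0.0000
```

Groups with an average probability of exactly 0 or 1 would divide by zero; a fuller implementation would have to handle them.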
First logistic regression

. logit hiqual yr_rnd meals cred_ml

Iteration 0:   log likelihood = -349.01971
Iteration 1:   log likelihood = -199.10312
Iteration 2:   log likelihood = -160.11854
Iteration 3:   log likelihood = -156.27132
Iteration 4:   log likelihood = -156.25612
Iteration 5:   log likelihood = -156.25611

Logistic regression                               Number of obs   =        707
                                                  LR chi2(3)      =     385.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -156.25611                       Pseudo R2       =     0.5523

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
       meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
     cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
       _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645
------------------------------------------------------------------------------
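The header statistics of this output can be recomputed from the iteration log: iteration 0 is the constant-only model, so the LR chi2 is twice the gain in log likelihood, and Stata's Pseudo R2 is McFadden's 1 - LL_model / LL_0. A small Python check using the numbers printed above (the function names are illustrative):

```python
def lr_chi2(ll_null, ll_model):
    """Likelihood-ratio chi-square: 2 * (LL_model - LL_null)."""
    return 2.0 * (ll_model - ll_null)

def mcfadden_r2(ll_null, ll_model):
    """McFadden pseudo R-squared: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null

# Iteration 0 = constant-only model, final iteration = fitted model.
ll0, ll1 = -349.01971, -156.25611
print(round(lr_chi2(ll0, ll1), 2))      # 385.53
print(round(mcfadden_r2(ll0, ll1), 4))  # 0.5523
```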
Then the postestimation command:

. estat gof, table group(10)

Logistic model for hiqual, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)

  Group     Prob   Obs_1   Exp_1   Obs_0   Exp_0   Total
      1   0.0008       1     0.0      71    72.0      72
      2   0.0019       1     0.1      71    71.9      72
      3   0.0037       0     0.2      71    70.8      71
      4   0.0078       0     0.4      68    67.6      68
      5   0.0208       1     0.9      71    71.1      72
      6   0.0560       2     2.4      68    67.6      70
      7   0.1554       4     7.4      68    64.6      72
      8   0.4960      23    22.0      47    48.0      70
      9   0.7531      44    43.5      26    26.5      70
     10   0.9595      62    61.1       8     8.9      70

       number of observations =       707
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =     40.45
                  Prob > chi2 =    0.0000
Including an interaction term helps

. gen ym=yr_rnd*meals

. logit hiqual yr_rnd meals cred_ml ym, nolog

Logistic regression                               Number of obs   =        707
                                                  LR chi2(4)      =     390.46
                                                  Prob > chi2     =     0.0000
Log likelihood = -153.78831                       Pseudo R2       =     0.5594

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -2.834458   .8630901    -3.28   0.001    -4.526083   -1.142832
       meals |  -.1019211   .0098691   -10.33   0.000    -.1212641   -.0825781
     cred_ml |   .7789823   .3206881     2.43   0.015     .1504452    1.407519
          ym |   .0463257   .0188326     2.46   0.014     .0094145    .0832368
       _cons |   2.686005   .4307661     6.24   0.000     1.841719    3.530291
------------------------------------------------------------------------------
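Whether the interaction term helps can also be judged with a likelihood-ratio test of the two nested models, which are estimated on the same 707 observations (in Stata, lrtest does this from stored estimates). The Python sketch below only does the arithmetic, using the log likelihoods printed in the two outputs; the function name is illustrative. For one degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math

def lr_test_1df(ll_restricted, ll_full):
    """LR test for one extra parameter; returns (chi2, p-value)."""
    stat = 2.0 * (ll_full - ll_restricted)
    # Chi-square(1) upper tail = two-sided normal tail = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Model without ym (LL = -156.25611) versus model with ym (LL = -153.78831)
stat, p = lr_test_1df(-156.25611, -153.78831)
print(f"LR chi2(1) = {stat:.2f}, p = {p:.3f}")  # LR chi2(1) = 4.94, p = 0.026
```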
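Multicollinearity
Because vif is a postestimation command for regress (not for logit), the predictors are entered in a linear regression; collinearity depends only on the predictors, not on the type of model.

. reg hiqual avg_ed yr_rnd meals

      Source |       SS       df       MS              Number of obs =    1158
-------------+------------------------------           F(  3,  1154) =  518.61
       Model |  145.983509     3  48.6611696           Prob > F      =  0.0000
    Residual |  108.279876  1154  .093830049           R-squared     =  0.5741
-------------+------------------------------           Adj R-squared =  0.5730
       Total |  254.263385  1157  .219760921           Root MSE      =  .30632

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avg_ed |   .1729601    .021089     8.20   0.000     .1315831    .2143371
      yr_rnd |  -.0008586   .0248112    -0.03   0.972    -.0495386    .0478215
       meals |  -.0076084    .000527   -14.44   0.000    -.0086423   -.0065744
       _cons |   .2445202   .0824989     2.96   0.003     .0826554    .4063849
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
       meals |      3.31    0.301982
      avg_ed |      3.25    0.307731
      yr_rnd |      1.11    0.903460
-------------+----------------------
    Mean VIF |      2.56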
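The VIF of predictor j equals 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. A standard-library Python sketch of that second route, with Gauss-Jordan elimination standing in for a linear-algebra library; the function names and the toy data are hypothetical.

```python
import math
from statistics import mean

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def invert(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[piv] = aug[piv], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def vifs(columns):
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    r = [[corr(a, b) for b in columns] for a in columns]
    inv = invert(r)
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical predictors with correlation 0.8, so VIF = 1 / (1 - 0.64)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print([round(v, 2) for v in vifs([x1, x2])])  # [2.78, 2.78]

# The 1/VIF column above is the tolerance; inverting it gives the VIF column.
print([round(1 / t, 2) for t in (0.301982, 0.307731, 0.903460)])  # [3.31, 3.25, 1.11]
```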
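Residuals
• Standardized residual = (observed value - predicted value) / square root of the variance. For a 0/1 outcome with predicted probability $\hat{p}_i$ this is the Pearson residual $r_i = (y_i - \hat{p}_i)/\sqrt{\hat{p}_i(1 - \hat{p}_i)}$; the rstandard option of predict additionally divides by $\sqrt{1 - h_i}$, where $h_i$ is the leverage of observation i.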
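A minimal Python version of the residual formula above, assuming 0/1 outcomes and fitted probabilities given as lists; the function names are illustrative. It computes the unadjusted Pearson residual, so its values are somewhat smaller in magnitude than Stata's leverage-adjusted rstandard. The default cutoff of 2.5 follows the rule of thumb used in this section.

```python
import math

def pearson_residual(y, p):
    """(observed - predicted) / sqrt(p * (1 - p)) for a 0/1 outcome."""
    return (y - p) / math.sqrt(p * (1 - p))

def flag_large(ys, ps, cutoff=2.5):
    """Indices of observations whose |Pearson residual| exceeds the cutoff."""
    return [i for i, (y, p) in enumerate(zip(ys, ps))
            if abs(pearson_residual(y, p)) > cutoff]

# A "1" predicted with probability 0.01 gives a residual of about 9.95.
print(flag_large([1, 0, 1], [0.2, 0.5, 0.01]))  # [2]
```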
. predict p
(option pr assumed; Pr(hiqual))
(42 missing values generated)
. predict stdres, rstand
(42 missing values generated)
. scatter stdres p, mlabel(snum)

[Figure: standardized residuals (stdres, y-axis, 0 to 50) plotted against Pr(hiqual) (x-axis, 0 to 1), each point labelled with its school number (snum); school 1403 stands out with a very large residual.]
Inspect observations with large residuals (absolute value above 2.5 or 3)
. list if snum==1403

458.
  snum       1403     dnum       315      cred_hl    low      awards     No
  schqual    high     pared      medium   hiqual     high     pared_ml   medium
  ell        27       yr_rnd     nd       pared_hl   .        avg_ed     2.19
  meals      100      api00      808      enroll     497      api99      824
  hicred     0        cred       low      full       59       cred_ml    low
  some_col   28       ym         100
. logit hiqual yr_rnd meals avg_ed if snum != 1403

Iteration 0:   log likelihood = -729.56398
Iteration 1:   log likelihood = -332.43297
Iteration 2:   log likelihood = -270.06297
Iteration 3:   log likelihood = -265.70542
Iteration 4:   log likelihood = -265.68934
Iteration 5:   log likelihood = -265.68934

Logistic regression                               Number of obs   =       1157
                                                  LR chi2(3)      =     927.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -265.68934                       Pseudo R2       =     0.6358

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |    -1.1328   .3842377    -2.95   0.003    -1.885892   -.3797077
       meals |  -.0790397   .0076984   -10.27   0.000    -.0941283   -.0639511
      avg_ed |   2.010791   .2947269     6.82   0.000     1.433137    2.588445
       _cons |  -3.528875   1.037345    -3.40   0.001    -5.562035   -1.495716
------------------------------------------------------------------------------

. logit hiqual yr_rnd meals avg_ed, nolog

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(3)      =     914.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -273.66402                       Pseudo R2       =     0.6255

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -.9913148   .3743452    -2.65   0.008    -1.725018   -.2576117
       meals |  -.0758864   .0074453   -10.19   0.000     -.090479   -.0612938
      avg_ed |    1.98805   .2884154     6.89   0.000     1.422766    2.553334
       _cons |  -3.566451    1.01715    -3.51   0.000    -5.560028   -1.572874
------------------------------------------------------------------------------
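Cook's distance (should be < 1)

$$D_i = \frac{\sum_{j=1}^{n} \left(\hat{y}_j - \hat{y}_{j(i)}\right)^2}{p \cdot \mathrm{MSE}}$$

where $\hat{y}_j$ is the prediction for observation j from all observations, $\hat{y}_{j(i)}$ is the prediction for j from the model estimated without observation i, p is the number of parameters and MSE is the mean squared error. After logit, predict with the dbeta option gives Pregibon's influence statistic, the logistic-regression counterpart of Cook's distance.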
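The formula above can be sketched directly by refitting the model once without each observation. The Python below assumes an ordinary least-squares line with one predictor (so p = 2 parameters), matching the linear-regression form of the formula; it is not what Stata's dbeta computes after logit, which is Pregibon's closed-form approximation. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """OLS intercept and slope for one predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def cooks_distance(xs, ys):
    """Cook's D_i by refitting without observation i (p = 2 parameters)."""
    n, p = len(xs), 2
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]                 # predictions from all observations
    mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - p)
    d = []
    for i in range(n):
        ai, bi = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Squared change in every prediction j when observation i is left out
        d.append(sum((f - (ai + bi * x)) ** 2 for f, x in zip(fitted, xs)) / (p * mse))
    return d

xs = [1, 2, 3, 4, 10]          # the last point has high leverage
ys = [1.1, 1.9, 3.2, 3.9, 6.0]
d = cooks_distance(xs, ys)
print(d.index(max(d)))  # 4
```

The refitting loop is the literal definition; for OLS the same values follow from the shortcut $D_i = e_i^2 h_i / (p \cdot \mathrm{MSE} \cdot (1 - h_i)^2)$ with residual $e_i$ and leverage $h_i$, which is why software rarely refits n times.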
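. logit hiqual meals yr_rnd cred_ml, nolog

Logistic regression                               Number of obs   =        707
                                                  LR chi2(3)      =     385.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -156.25611                       Pseudo R2       =     0.5523

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
      yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
     cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
       _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645
------------------------------------------------------------------------------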
. predict cook, dbeta
(493 missing values generated)

. summ cook

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        cook |       707    .0257177    .0899176   2.11e-07   .6101257
. graph twoway scatter cook p, mlabel(snum)

[Figure: Pregibon dbeta (cook, y-axis, 0 to .6) plotted against Pr(hiqual) (x-axis, 0 to 1), each point labelled with its school number (snum); no observation reaches the threshold of 1.]
To Stata
• Use apilog.dta
• awards = dependent variable
• Inspect the frequency counts of awards
• Recode awards into a binary variable
• Estimate a logistic regression model using yr_rnd, meals and enroll as predictors
To Stata
• Inspect the classification table
• Perform the Hosmer & Lemeshow test
• Inspect the standardized residuals
• Inspect Cook's distance
• See if interaction effects improve the fit
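In Stata the classification table is produced by estat classification after logit, which by default classifies an observation as positive when its predicted probability is at least .5. A minimal Python sketch of the same cross-tabulation, assuming 0/1 outcomes and fitted probabilities as lists; the function name and the data are hypothetical.

```python
def classification_table(y, p, cutoff=0.5):
    """Cross-tabulate predicted class (p >= cutoff) against the observed 0/1 outcome."""
    tp = sum(1 for yi, pi in zip(y, p) if pi >= cutoff and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= cutoff and yi == 0)
    fn = sum(1 for yi, pi in zip(y, p) if pi < cutoff and yi == 1)
    tn = sum(1 for yi, pi in zip(y, p) if pi < cutoff and yi == 0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "correct": (tp + tn) / len(y),                                   # share correctly classified
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),     # Pr(+ | y = 1)
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),     # Pr(- | y = 0)
    }

print(classification_table([1, 1, 0, 0, 1], [0.9, 0.4, 0.2, 0.6, 0.5])["correct"])  # 0.6
```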
• Is the Wald test an accurate test of the significance of coefficients in logistic regression analysis?
a) Yes, just like in regression analysis.
b) Yes, it is accurate, although a likelihood ratio test is more efficient.
c) No, unlike in regression analysis, the Wald test is biased, especially for relatively small coefficients.
d) No, unlike in regression analysis, the Wald test is biased, especially for relatively large coefficients.
• Use lrtest to check the significance of the effect of the variable yr_rnd
• Use auto.dta (if it is not on your PC, then:)
– use http://www.stata-press.com/data/r11/auto
• Predict which cars are foreign, using weight and mpg as predictors
• Is the interaction between weight and mpg significant?
• Tip: always center variables before creating an interaction variable.
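Centering mainly removes the correlation between a main effect and its product term, which makes the main-effect coefficients easier to interpret and their estimates more stable. A small Python illustration on a hypothetical balanced 2 x 2 design; the data and helper names are made up for the example.

```python
import math
from statistics import mean

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def center(v):
    """Subtract the mean from every value."""
    m = mean(v)
    return [x - m for x in v]

x = [1, 1, 3, 3]
z = [1, 3, 1, 3]
raw = [a * b for a, b in zip(x, z)]              # interaction from raw variables
xc, zc = center(x), center(z)
centered = [a * b for a, b in zip(xc, zc)]       # interaction from centered variables
print(round(corr(x, raw), 3))        # 0.667
print(round(corr(xc, centered), 3))  # 0.0
```

Centering does not change the coefficient or the test of the interaction term itself; it changes what the main-effect coefficients mean (effects at the mean of the other variable).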
• use http://www.stata-press.com/data/r11/choice
• Do income, gender or type of car (European, Japanese or American) predict whether a car will be bought (choice)?