Transcript Slide 1
Gerrit Rooks 22-02-10 AMMBR III

DESCRIPTION OF DATA

Variable  Description                                 Codes/Values                      Name
1         Identification Code                         ID Number                         ID
2         Birth Number                                1-4                               BIRTH
3         Smoking Status During Pregnancy             0 = No, 1 = Yes                   SMOKE
4         Race                                        1 = White, 2 = Black, 3 = Other   RACE
5         Age of Mother                               Years                             AGE
6         Weight of Mother at Last Menstrual Period   Pounds                            LWT
7         Birth Weight                                Grams                             BWT
8         Low Birth Weight                            1 = BWT <=2500g, 0 = BWT >2500g   LOW

SUMMARY OF THE DATA

. summ

Variable        Obs       Mean    Std. Dev.    Min     Max
id              488   93.56148    53.91331       1     188
birth           488   1.872951    .8283019       1       4
smoking         488   .3995902    .4903167       0       1
race            488   1.852459    .9123576       1       3
agemother       488   26.44057    5.825363      14      48
weightmother    488     142.75    32.43726      80     272
birthweight     488   2841.971    688.3148     798    5025
lowweight       488   .3094262    .4627315       0       1

LOGISTICS OF LOGISTIC REGRESSION
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
- Check regression assumptions

EMPTY MODEL

. logit lowweight

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -301.89672

Logistic regression                 Number of obs   =        488
                                    LR chi2(0)      =      -0.00
                                    Prob > chi2     =          .
Log likelihood = -301.89672         Pseudo R2       =    -0.0000

lowweight           Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
_cons           -.8028031   .0979279    -8.20   0.000  -.9947383  -.6108679

$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-.80)}} = .31$

CLASSIFICATION TABLE EMPTY MODEL

. estat class

Logistic model for lowweight

                     True
Classified         D        ~D     Total
    +              0         0         0
    -            151       337       488
  Total          151       337       488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0

Sensitivity                      Pr( +| D)     0.00%
Specificity                      Pr( -|~D)   100.00%
Positive predictive value        Pr( D| +)        .%
Negative predictive value        Pr(~D| -)    69.06%

False + rate for true ~D         Pr( +|~D)     0.00%
False - rate for true D          Pr( -| D)   100.00%
False + rate for classified +    Pr(~D| +)        .%
False - rate for classified -    Pr( D| -)    30.94%

Correctly classified                          69.06%

FULL MODEL
. logit lowweight smoking agemother weightmother

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -288.88873
Iteration 2:   log likelihood = -288.76222
Iteration 3:   log likelihood = -288.76218
Iteration 4:   log likelihood = -288.76218

Logistic regression                 Number of obs   =        488
                                    LR chi2(3)      =      26.27
                                    Prob > chi2     =     0.0000
Log likelihood = -288.76218         Pseudo R2       =     0.0435

lowweight           Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
smoking          .8097503   .2022273     4.00   0.000    .413392   1.206109
agemother        .0452296   .0184831     2.45   0.014   .0090033   .0814558
weightmother    -.0086232   .0035144    -2.45   0.014  -.0155113  -.0017351
_cons           -1.139015   .5844386    -1.95   0.051  -2.284493   .0064639

MODEL FIT: LIKELIHOOD RATIO TEST

. logit lowweight smoking agemother weightmother

[Same output as on the FULL MODEL slide: LR chi2(3) = 26.27, Prob > chi2 = 0.0000.]

CLASSIFICATION TABLE FULL MODEL
. estat class

Logistic model for lowweight

                     True
Classified         D        ~D     Total
    +             15        13        28
    -            136       324       460
  Total          151       337       488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0

Sensitivity                      Pr( +| D)     9.93%
Specificity                      Pr( -|~D)    96.14%
Positive predictive value        Pr( D| +)    53.57%
Negative predictive value        Pr(~D| -)    70.43%

False + rate for true ~D         Pr( +|~D)     3.86%
False - rate for true D          Pr( -| D)    90.07%
False + rate for classified +    Pr(~D| +)    46.43%
False - rate for classified -    Pr( D| -)    29.57%

Correctly classified                          69.47%

HOSMER & LEMESHOW TEST

. estat gof, group(10) table

Logistic model for lowweight, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)

Group     Prob   Obs_1   Exp_1   Obs_0   Exp_0   Total
    1   0.1929       8     7.8      41    41.2      49
    2   0.2190      11    10.1      38    38.9      49
    3   0.2391      16    11.2      33    37.8      49
    4   0.2597       6    12.2      43    36.8      49
    5   0.2826      13    13.0      35    35.0      48
    6   0.3161      12    14.5      37    34.5      49
    7   0.3659      17    16.6      32    32.4      49
    8   0.4160      22    19.1      27    29.9      49
    9   0.4745      25    22.0      24    27.0      49
   10   0.5951      21    24.6      27    23.4      48

number of observations   =       488
number of groups         =        10
Hosmer-Lemeshow chi2(8)  =     10.13
Prob > chi2              =    0.2559

SIGNIFICANCE AND DIRECTION

. logit lowweight smoking agemother weightmother

[Iteration log and header as on the FULL MODEL slide.]

lowweight           Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
smoking          .8097503   .2022273     4.00   0.000    .413392   1.206109
agemother        .0452296   .0184831     2.45   0.014   .0090033   .0814558
weightmother    -.0086232   .0035144    -2.45   0.014  -.0155113  -.0017351
_cons           -1.139015   .5844386    -1.95   0.051  -2.284493   .0064639

MAGNITUDE
. logistic lowweight smoking agemother weightmother

Logistic regression                 Number of obs   =        488
                                    LR chi2(3)      =      26.27
                                    Prob > chi2     =     0.0000
Log likelihood = -288.76218         Pseudo R2       =     0.0435

lowweight       Odds Ratio   Std. Err.       z   P>|z|   [95% Conf. Interval]
smoking           2.247347   .4544749     4.00   0.000   1.511938    3.34046
agemother         1.046268   .0193383     2.45   0.014   1.009044   1.084865
weightmother      .9914139   .0034842    -2.45   0.014   .9846084   .9982664

(Exponentiated coefficient_i - 1.0) * 100 = 125 -> a smoker has 125% higher odds of having a low-birth-weight baby.

EXAMINING RESIDUALS IN LR
1. Isolate points for which the model fits poorly
2. Isolate influential data points

RESIDUAL STATISTICS

SAMANTHAS TIPS
In Stata, after estimating the model, the predict command can be used to calculate residuals and related statistics. Type help logit postestimation for details.

PREDICTED PROBABILITIES
[Figure: distribution of predicted probabilities; x-axis: Pr(lowweight), y-axis: Density]

HISTOGRAM OF STANDARDIZED RESIDUALS
[Figure: histogram; x-axis: standardized Pearson residual, y-axis: Density]

STANDARDIZED RESIDUAL

. tab ZRE if ZRE > 3

standardized Pearson residual    Freq.   Percent     Cum.
3.181052                             1    100.00   100.00
Total                                1    100.00

. list if ZRE > 3

        id   birth   smoking   race   agemot~r   weight~r   birthw~t   lowwei~t        ZRE
355.   135       1         0      2         18        229       1858          1   3.181052

INDEX PLOT ST. RESIDUALS
[Figure: standardized residuals plotted against id]

COOK'S DISTANCE

. predict cook, dbeta

INDEX PLOT COOK'S DISTANCE
[Figure: Cook's distance plotted against id]

MULTI-COLLINEARITY
- Field recommends obtaining VIFs by using an OLS regression to estimate the same model.
- Checking the correlation matrix of the independent variables is often enough. If you find high correlations (say > .6), then check the VIFs.

FINALLY 2 CAUSES FOR TROUBLE
- Incomplete information
- Complete separation

INCOMPLETE INFORMATION

COMPLETE SEPARATION

COMPLETE SEPARATION

PRACTICAL
- Open ammbr.dta
- Analyse entrepreneurship
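
The steps walked through in these slides can be collected into one do-file. What follows is only a minimal sketch, assuming the birth-weight data are already in memory under the variable names shown in the output above (lowweight, smoking, agemother, weightmother, id). The slides do not name that data file, so no use line is included. The variable name phat is chosen here for illustration; ZRE and cook follow the slides. The display lines recompute figures that Stata already prints, so that the arithmetic behind them is visible.

```stata
* Sketch only: assumes the birth-weight data are in memory and that
* phat, ZRE and cook do not exist yet (predict will not overwrite them).

* 1. Estimate the coefficients
logit lowweight                         // empty model
display "Pr(Y=1), empty model = " invlogit(_b[_cons])   // slides report .31
estat class

logit lowweight smoking agemother weightmother          // full model

* 2. Assess model fit
* LR chi2 is twice the gain in log likelihood over the empty model
display "LR chi2   = " 2*(e(ll) - e(ll_0))              // slides report 26.27
display "Pseudo R2 = " 1 - e(ll)/e(ll_0)                // slides report 0.0435
estat class
estat gof, group(10) table

* 3. Interpret coefficients
* An odds ratio is the exponentiated coefficient
display "OR smoking        = " exp(_b[smoking])
display "% change in odds  = " 100*(exp(_b[smoking]) - 1)
logistic lowweight smoking agemother weightmother       // reports odds ratios

* 4. Check regression assumptions
predict phat, pr                        // predicted probabilities (phat: name assumed)
predict ZRE, rstandard                  // standardized Pearson residuals
predict cook, dbeta                     // influence statistic used on the slides
histogram phat
histogram ZRE
scatter ZRE id                          // index plot of residuals
scatter cook id                         // index plot of influence

* Stata treats missing values as larger than any number, so exclude them
* explicitly (an addition to the slides' "if ZRE > 3")
tabulate ZRE if ZRE > 3 & !missing(ZRE)
list if ZRE > 3 & !missing(ZRE)

* Multicollinearity: correlations first, then VIFs from an OLS fit
correlate smoking agemother weightmother
regress lowweight smoking agemother weightmother
estat vif
```

The predict lines come before regress on purpose: predict always uses the most recent estimation results, so calling it after the OLS fit would no longer give logistic residuals. For the practical, the same sequence would apply to ammbr.dta once the outcome and predictor names in that file are substituted.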