Transcript AMMBR II
AMMBR II
Gerrit Rooks

Today
• Introduction to Stata
– Files / directories
– Stata syntax
– Useful commands / functions
• Logistic regression analysis with Stata
– Estimation
– Goodness of fit (GOF)
– Coefficients
– Checking assumptions

Stata file types
• .ado – programs that add commands to Stata
• .do – batch files that execute a set of Stata commands
• .dta – data file in Stata's format
• .log – output saved as plain text by the log using command

The working directory
• The working directory is the default directory for any file operation, such as using and saving data or logging output
• cd "d:\my work\"

Saving output to log files
• Syntax for the log command
– log using filename [, append replace [smcl|text]]
• To close a log file
– log close

Using and saving datasets
• Load a Stata dataset
– use d:\myproject\data.dta, clear
• Save
– save d:\myproject\data, replace
• Or change the working directory first
– cd d:\myproject
– use data, clear
– save data, replace

Entering data
• Data in other formats
– You can use SPSS to convert data
– You can use the infile and insheet commands to import data in ASCII format
• Entering data by hand
– Type edit or just click on the data-editor button

Do-files
• You can create a text file that contains a series of commands
• Use the do-file editor to work with do-files
• Example I

Adding comments
• // or * denote comments that Stata should ignore
• Stata ignores whatever follows /// and treats the next line as a continuation
• Example II

A recommended structure
//if a log file is open, close it
capture log close
//don't pause when output scrolls off the page
set more off
//change directory to your working directory
cd d:\myproject
//log results to file myfile.log
log using myfile, replace text
* myfile.do - written 7 feb 2010 to illustrate do-files
//your commands here
//close the log file
log close

Serious data analysis
• Ensure replicability: use do-files plus log files
• Document your do-files
– What is obvious today is baffling in six months
• Keep a research log
– A diary that includes a description of every program you run
• Develop a system for naming files

Serious data analysis
• New variables should be given new names
• Use labels and notes
• Double-check every new variable
• Archive

The Stata syntax
• regress y x1 x2 if x3 < 20, cluster(x4)
1. regress = command
– What action do you want performed?
2. y x1 x2 = names of variables, files, or other objects
– On what things is the command performed?
3. if x3 < 20 = qualifier on observations
– On which observations should the command be performed?
4. , cluster(x4) = options
– What special things should be done in executing the command?

Examples
• tabulate smoking race if agemother > 30, row
• Example of the if qualifier
– sum agemother if smoking == 1 & weightmother < 100

Elements used for logical statements
Operator   Definition                 Example
==         equal to                   if male == 1
!=         not equal to               if male != 1
>          greater than               if age > 20
>=         greater than or equal to   if age >= 21
<          less than                  if age < 66
<=         less than or equal to      if age <= 65
&          and                        if age == 21 & male == 1
|          or                         if age <= 21 | age >= 65

Missing values
• Missing values are automatically excluded when Stata fits models; they are stored as the largest positive values
• Beware
– The expression 'age > 65' can therefore also include missing values
– To be sure, type: 'age > 65 & age != .'

Selecting observations
• drop variable-list
• keep variable-list
• drop if age < 65
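Putting the if qualifier, the logical operators, and the missing-value rule together, a minimal sketch (the variable names follow the examples above):

//keep mothers over 30, excluding missing ages explicitly
keep if agemother > 30 & agemother != .
//drop a variable that is no longer needed
drop weightmother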
Creating new variables
• generate command
– generate age2 = age * age
– for the available functions, see help function
• Sometimes the egen command is a useful alternative, for instance:
– egen meanage = mean(age)

Useful functions
Function   Definition         Example
+          addition           gen y = a + b
-          subtraction        gen y = a - b
/          division           gen density = population/area
*          multiplication     gen y = a*b
^          raise to a power   gen y = a^3
ln         natural log        gen lnwage = ln(wage)
exp        exponential        gen y = exp(b)
sqrt       square root        gen agesqrt = sqrt(age)

The replace command
• replace has the same syntax as generate but is used to change values of a variable that already exists
• gen age_dum = .
• replace age_dum = 0 if age < 5
• replace age_dum = 1 if age >= 5

Recode
• Change values of existing variables
– Change 1 to 2 and 3 to 4: recode origvar (1=2) (3=4), gen(myvar1)
– Change missings to 1: recode origvar (.=1), gen(myvar2)

Logistic regression
• Let's use a set of data collected by the state of California from 1,200 high schools measuring academic achievement.
• Our dependent variable is called hiqual.
• Our predictor will be avg_ed, a continuous measure (ranging from 1 to 5) of the average education of the parents of the students in the participating high schools.

OLS in Stata
. use "D:\Onderwijs\AMMBR\apilog.dta", clear
. regress hiqual avg_ed

Number of obs = 1158;  F(1, 1156) = 1135.65;  Prob > F = 0.0000;  R-squared = 0.4956;  Adj R-squared = 0.4951;  Root MSE = .33309

Source     SS           df     MS
Model      126.002822   1      126.002822
Residual   128.260563   1156   .110952044
Total      254.263385   1157   .219760921

hiqual     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
avg_ed     .4287064    .0127215    33.70    0.000   .4037467    .4536662
_cons      -.855187    .0363792    -23.51   0.000   -.9265637   -.7838102

. predict yhat
(option xb assumed; fitted values)
(42 missing values generated)
. twoway scatter yhat hiqual avg_ed, connect(l) ylabel(0 1)
[Figure: OLS fitted values and hiqual (Hi Quality School, Hi vs Not) plotted against avg parent ed (1 to 5)]

Logistic regression in Stata
. logit hiqual avg_ed

Iteration log: -730.68708, -386.86717, -355.09635, -353.94368, -353.94352, -353.94352

Logistic regression;  Number of obs = 1158;  LR chi2(1) = 753.49;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.5156
Log likelihood = -353.94352

hiqual     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
avg_ed     3.910475    .2383352    16.41    0.000   3.443347    4.377603
_cons      -12.30333   .731532     -16.82   0.000   -13.73711   -10.86956

. predict yhat1
(option pr assumed; Pr(hiqual))
(42 missing values generated)
. twoway scatter yhat1 hiqual avg_ed, connect(l i) msymbol(i O) sort ylabel(0 1)
[Figure: predicted Pr(hiqual) and hiqual (Hi Quality School, Hi vs Not) plotted against avg parent ed (1 to 5)]

The fitted curve is E(Y|X) = 1 / (1 + e^-(-12.3 + 3.9*X))

Multiple predictors
. logit hiqual yr_rnd avg_ed

Iteration log: -730.68708, -384.29232, -349.81276, -348.24638, -348.2462, -348.2462

Logistic regression;  Number of obs = 1158;  LR chi2(2) = 764.88;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.5234
Log likelihood = -348.2462

hiqual     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
yr_rnd     -1.091038   .3425665    -3.18    0.001   -1.762456   -.4196197
avg_ed     3.86531     .2411152    16.03    0.000   3.392733    4.337887
_cons      -12.05417   .739755     -16.29   0.000   -13.50407   -10.60428
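A quick way to inspect what this two-predictor model predicts is to save and plot the fitted probabilities; a minimal sketch (yhat2 is a hypothetical new variable name):

//predicted probabilities from the model just fitted
predict yhat2, pr
summarize yhat2
//plot them against parental education, separately for year-round and other schools
twoway scatter yhat2 avg_ed, by(yr_rnd) msymbol(oh)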
Model fit: the likelihood ratio test
chi2 = 2 * [LL(new model) - LL(baseline model)]

Model fit: LR test
. logit hiqual yr_rnd avg_ed
(output as above)
. di 2*(-348.2462 + 730.68708)
764.88176

Pseudo R2: proportional change in LL
. logit hiqual yr_rnd avg_ed
(output as above)
. di (730.68708 - 348.2462)/730.68708
.52339899

Classification Table
. estat class
Logistic model for hiqual

                True
Classified      D      ~D     Total
+               0      0      0
-               391    809    1200
Total           391    809    1200

Classified + if predicted Pr(D) >= .5
True D defined as hiqual != 0

Sensitivity                      Pr( +| D)   0.00%
Specificity                      Pr( -|~D)   100.00%
Positive predictive value        Pr( D| +)   .%
Negative predictive value        Pr(~D| -)   67.42%
False + rate for true ~D         Pr( +|~D)   0.00%
False - rate for true D          Pr( -| D)   100.00%
False + rate for classified +    Pr(~D| +)   .%
False - rate for classified -    Pr( D| -)   32.58%
Correctly classified                         67.42%

Classification Table
. estat class
Logistic model for hiqual

                True
Classified      D      ~D     Total
+               288    58     346
-               89     723    812
Total           377    781    1158

Classified + if predicted Pr(D) >= .5
True D defined as hiqual != 0

Sensitivity                      Pr( +| D)   76.39%
Specificity                      Pr( -|~D)   92.57%
Positive predictive value        Pr( D| +)   83.24%
Negative predictive value        Pr(~D| -)   89.04%
False + rate for true ~D         Pr( +|~D)   7.43%
False - rate for true D          Pr( -| D)   23.61%
False + rate for classified +    Pr(~D| +)   16.76%
False - rate for classified -    Pr( D| -)   10.96%
Correctly classified                         87.31%
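estat class uses a cutoff of 0.5 by default; the threshold can be changed with the cutoff() option, and lroc summarizes classification performance over all possible cutoffs. A minimal sketch, run after the logit command above:

estat classification, cutoff(0.3)
//area under the ROC curve
lroc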
Interpreting coefficients: significance
. logit hiqual yr_rnd avg_ed, nolog
(output as above)
The z statistic reported for each coefficient is a Wald test: z = b / SE(b).

Comparing models
. logit hiqual yr_rnd avg_ed
(output as above: log likelihood = -348.2462)

After fitting and storing the full model, estimate the nested model on the same sample:
. est store full_model
. logit hiqual avg_ed if e(sample)

Iteration log: -730.68708, -386.86717, -355.09635, -353.94368, -353.94352, -353.94352

Logistic regression;  Number of obs = 1158;  LR chi2(1) = 753.49;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.5156
Log likelihood = -353.94352

hiqual     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
avg_ed     3.910475    .2383352    16.41    0.000   3.443347    4.377603
_cons      -12.30333   .731532     -16.82   0.000   -13.73711   -10.86956

Likelihood ratio test
. lrtest full_model
Likelihood-ratio test                      LR chi2(1)  =  11.39
(Assumption: . nested in full_model)       Prob > chi2 =  0.0007

Interpretation of coefficients: direction
. listcoef
logit (N=1158): Factor Change in Odds
Odds of: high vs not_high

hiqual     b          z         P>|z|   e^b       e^bStdX   SDofX
yr_rnd     -1.09104   -3.185    0.001   0.3359    0.6593    0.3819
avg_ed     3.86531    16.031    0.000   47.7180   19.5978   0.7698

The model can be written on the logit scale:
logit = ln[ p(y) / (1 - p(y)) ] = b0 + b1*x1 + b2*x2 + ... + bn*xn
or, equivalently, on the odds scale:
Odds = p(y) / (1 - p(y)) = e^b0 * e^(b1*x1) * e^(b2*x2) * ... * e^(bn*xn)

Interpretation of coefficients: magnitude
. logit hiqual yr_rnd avg_ed, nolog
(output as above)

E(Y|X) = 1 / (1 + e^-(-12 + 3.9*avg_ed - 1.1*yr_rnd))

. summ avg_ed yr_rnd

Variable   Obs    Mean       Std. Dev.   Min   Max
avg_ed     1158   2.754212   .7697744    1     5
yr_rnd     1200   .18        .3843476    0     1

Predicted probability for a non-year-round school at the mean of avg_ed:
. di 1/(1+exp(12-3.9*2.75))
.21840254
Predicted probability for a year-round school at the mean of avg_ed:
. di 1/(1+exp(12-3.9*2.75+1.1))
.08509905
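The same predicted probabilities can be obtained without typing the formula by hand. In Stata 11 or newer, the margins postestimation command does this after the logit fit; a minimal sketch:

//predicted probability at avg_ed = 2.75 for non-year-round and year-round schools
margins, at(avg_ed=2.75 yr_rnd=(0 1))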
The assumptions of logistic regression
• The true conditional probabilities are a logistic function of the independent variables.
• No important variables are omitted.
• No extraneous variables are included.
• The independent variables are measured without error.
• The observations are independent.
• The independent variables are not linear combinations of each other.

Hosmer & Lemeshow
• The test divides the sample into subgroups and checks whether the differences between observed and predicted outcomes are about equal across these groups
• The test should not be significant (indicating no difference between observed and predicted)
• For each group j, the observed number of successes is compared with the expected number based on the average predicted probability in that group

First logistic regression
. logit hiqual yr_rnd meals cred_ml

Iteration log: -349.01971, -199.10312, -160.11854, -156.27132, -156.25612, -156.25611

Logistic regression;  Number of obs = 707;  LR chi2(3) = 385.53;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.5523
Log likelihood = -156.25611

hiqual     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
yr_rnd     -1.189537   .5022235    -2.37    0.018   -2.173877   -.2051967
meals      -.0936      .0084587    -11.07   0.000   -.1101786   -.0770213
cred_ml    .7406536    .3152647    2.35     0.019   .1227463    1.358561
_cons      2.425635    .3995025    6.07     0.000   1.642624    3.208645

Then the postestimation command
. estat gof, table group(10)

Logistic model for hiqual, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)

Group   Prob     Obs_1   Exp_1   Obs_0   Exp_0   Total
1       0.0008   1       0.0     71      72.0    72
2       0.0019   1       0.1     71      71.9    72
3       0.0037   0       0.2     71      70.8    71
4       0.0078   0       0.4     68      67.6    68
5       0.0208   1       0.9     71      71.1    72
6       0.0560   2       2.4     68      67.6    70
7       0.1554   4       7.4     68      64.6    72
8       0.4960   23      22.0    47      48.0    70
9       0.7531   44      43.5    26      26.5    70
10      0.9595   62      61.1    8       8.9     70

number of observations = 707
number of groups = 10
Hosmer-Lemeshow chi2(8) = 40.45
Prob > chi2 = 0.0000

Specification error
. logit hiqual yr_rnd meals cred_ml, nolog
(output as above)

. linktest, nolog

Logistic regression;  Number of obs = 707;  LR chi2(2) = 392.32;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.5620
Log likelihood = -152.86003

hiqual     Coef.       Std. Err.   z       P>|z|   [95% Conf. Interval]
_hat       1.215465    .1283978    9.47    0.000   .9638102    1.46712
_hatsq     .0748928    .0263911    2.84    0.005   .0231673    .1266184
_cons      -.1408008   .1637332    -0.86   0.390   -.4617121   .1801105

Including an interaction term helps
. gen ym = yr_rnd*meals
. logit hiqual yr_rnd meals cred_ml ym, nolog

Logistic regression;  Number of obs = 707;  LR chi2(4) = 390.46;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.5594
Log likelihood = -153.78831

hiqual     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
yr_rnd     -2.834458   .8630901    -3.28    0.001   -4.526083   -1.142832
meals      -.1019211   .0098691    -10.33   0.000   -.1212641   -.0825781
cred_ml    .7789823    .3206881    2.43     0.015   .1504452    1.407519
ym         .0463257    .0188326    2.46     0.014   .0094145    .0832368
_cons      2.686005    .4307661    6.24     0.000   1.841719    3.530291

. estat gof, table group(10)

Logistic model for hiqual, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)

Group   Prob     Obs_1   Exp_1   Obs_0   Exp_0   Total
1       0.0015   0       0.1     71      70.9    71
2       0.0033   1       0.2     73      73.8    74
3       0.0054   0       0.3     74      73.7    74
4       0.0095   1       0.5     63      63.5    64
5       0.0204   1       1.0     70      70.0    71
6       0.0620   4       2.5     69      70.5    73
7       0.1420   2       6.5     66      61.5    68
8       0.4745   24      22.0    50      52.0    74
9       0.7725   44      43.4    25      25.6    69
10      0.9697   61      61.5    8       7.5     69

number of observations = 707
number of groups = 10
Hosmer-Lemeshow chi2(8) = 9.25
Prob > chi2 = 0.3215
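The interaction was built by hand with gen ym = yr_rnd*meals. In Stata 11 or newer the same model can be specified with factor-variable notation, avoiding the extra variable; a sketch:

//## adds both main effects and the interaction between yr_rnd and meals
logit hiqual i.yr_rnd##c.meals cred_ml, nolog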
Ok now
. linktest

Iteration log: -349.01971, -174.14403, -156.07793, -153.49407, -153.36857, -153.36794, -153.36794

Logistic regression;  Number of obs = 707;  LR chi2(2) = 391.30;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.5606
Log likelihood = -153.36794

hiqual     Coef.       Std. Err.   z       P>|z|   [95% Conf. Interval]
_hat       1.067861    .1160715    9.20    0.000   .8403653    1.295357
_hatsq     .0297354    .0317399    0.94    0.349   -.0324737   .0919445
_cons      -.0644637   .1684527    -0.38   0.702   -.3946249   .2656976

_hatsq is no longer significant, so the linktest no longer signals a specification problem.

Multicollinearity
. reg hiqual avg_ed yr_rnd meals

Number of obs = 1158;  F(3, 1154) = 518.61;  Prob > F = 0.0000;  R-squared = 0.5741;  Adj R-squared = 0.5730;  Root MSE = .30632

Source     SS           df     MS
Model      145.983509   3      48.6611696
Residual   108.279876   1154   .093830049
Total      254.263385   1157   .219760921

hiqual     Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
avg_ed     .1729601    .021089     8.20     0.000   .1315831    .2143371
yr_rnd     -.0008586   .0248112    -0.03    0.972   -.0495386   .0478215
meals      -.0076084   .000527     -14.44   0.000   -.0086423   -.0065744
_cons      .2445202    .0824989    2.96     0.003   .0826554    .4063849

. vif

Variable   VIF    1/VIF
meals      3.31   0.301982
avg_ed     3.25   0.307731
yr_rnd     1.11   0.903460
Mean VIF   2.56

Influential observations
. predict p
(option pr assumed; Pr(hiqual))
(42 missing values generated)
. predict stdres, rstand
(42 missing values generated)
. scatter stdres p, mlabel(snum)
[Figure: standardized residuals plotted against Pr(hiqual), points labelled with school number; school 1403 stands out with a very large residual]
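Rather than reading the outlier off the graph, the largest standardized residuals can also be listed directly; a minimal sketch:

//sort from largest to smallest residual (missing residuals go last by default)
gsort -stdres
list snum stdres p in 1/5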
. list if snum == 1403

Observation 458:
snum = 1403, dnum = 315, schqual = high, hiqual = high, awards = No, yr_rnd = nd, meals = 100, ell = 27, avg_ed = 2.19, enroll = 497, api00 = 808, api99 = 824, full = 59, some_col = 28, cred = low, cred_ml = low, cred_hl = low, hicred = 0, pared = medium, pared_ml = medium, pared_hl = ., ym = 100

School 1403 is classified as a high-quality school even though 100 percent of its students receive free meals; the next two models compare the fit with and without this observation.

. logit hiqual yr_rnd meals avg_ed if snum != 1403

Iteration log: -729.56398, -332.43297, -270.06297, -265.70542, -265.68934, -265.68934

Logistic regression;  Number of obs = 1157;  LR chi2(3) = 927.75;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.6358
Log likelihood = -265.68934

hiqual     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
yr_rnd     -1.1328     .3842377    -2.95    0.003   -1.885892   -.3797077
meals      -.0790397   .0076984    -10.27   0.000   -.0941283   -.0639511
avg_ed     2.010791    .2947269    6.82     0.000   1.433137    2.588445
_cons      -3.528875   1.037345    -3.40    0.001   -5.562035   -1.495716

. logit hiqual yr_rnd meals avg_ed, nolog

Logistic regression;  Number of obs = 1158;  LR chi2(3) = 914.05;  Prob > chi2 = 0.0000;  Pseudo R2 = 0.6255
Log likelihood = -273.66402

hiqual     Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
yr_rnd     -.9913148   .3743452    -2.65    0.008   -1.725018   -.2576117
meals      -.0758864   .0074453    -10.19   0.000   -.090479    -.0612938
avg_ed     1.98805     .2884154    6.89     0.000   1.422766    2.553334
_cons      -3.566451   1.01715     -3.51    0.000   -5.560028   -1.572874

If we have enough time left
• Perform a logistic regression analysis
• Use apilog.dta
• Awards = dependent variable
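A possible starting point for the exercise, assuming apilog.dta is in the working directory (the recoding step is only needed if awards turns out to be stored as a string; award01 and the chosen predictors are only illustrative):

use apilog.dta, clear
//check how the dependent variable is coded
codebook awards
//if awards is a string, create a 0/1 indicator first (hypothetical name)
generate award01 = (awards == "Yes") if awards != ""
//fit a first model and check its fit
logit award01 avg_ed yr_rnd meals, nolog
estat gof, group(10)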