Transcript AMMBR II

AMMBR II
Gerrit Rooks
Today
• Introduction to Stata
– Files / directories
– Stata syntax
– Useful commands / functions
• Logistic regression analysis with Stata
– Estimation
– GOF
– Coefficients
– Checking assumptions
Stata file types
• .ado
– programs that add commands to Stata
• .do
– Batch files that execute a set of Stata commands
• .dta
– Data file in Stata’s format
• .log
– Output saved as plain text by the log using
command
The working directory
• The working directory is the default directory
for any file operations such as using & saving
data, or logging output
• cd “d:\my work\”
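For example, a minimal sketch (the path is hypothetical); pwd displays the current working directory:

pwd
cd "d:\my work"
pwd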
Saving output to log files
• Syntax for the log command
– log using filename [, append replace [smcl|text]]
• To close a log file
– log close
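A minimal logging session might look like this (the file name myanalysis is hypothetical):

log using myanalysis, replace text
summarize
log close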
Using and saving datasets
• Load a Stata dataset
– use d:\myproject\data.dta, clear
• Save
– save d:\myproject\data, replace
• Or change directory first
– cd d:\myproject
– use data, clear
– save data, replace
Entering data
• Data in other formats
– You can use SPSS to convert data
– You can use the infile and insheet commands to
import data in ASCII format
• Entering data by hand
– Type edit or just click on the data-editor button
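A hedged sketch of importing ASCII data (file names and variable names are hypothetical):

insheet using grades.csv, clear
* for free-format (space-separated) data:
infile id age score using grades.raw, clear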
Do-files
• You can create a text file that contains a series
of commands
• Use the do-editor to work with do-files
• Example I
Adding comments
• // or * denote comments Stata should ignore
• Stata ignores whatever follows after /// and
treats the next line as a continuation
• Example II
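A minimal sketch of the three comment styles (variable names are hypothetical):

* this entire line is a comment
summarize age        // trailing comment after a command
regress y x1 x2 ///
    x3 x4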
A recommended structure
//if a log file is open, close it
capture log close
// don't pause when output scrolls off the page
set more off
//change directory to your working directory
cd d:\myproject
//log results to file myfile.log
log using myfile, replace text
// myfile.do - written 7 feb 2010 to illustrate do-files
//
your commands here
//close the log file
log close
Serious data analysis
• Ensure replicability use do+log files
• Document your do-files
– What is obvious today is baffling in six months
• Keep a research log
– Diary that includes a description of every program
you run
• Develop a system for naming files
Serious data analysis
• New variables should be given new names
• Use labels and notes
• Double check every new variable
• ARCHIVE
The Stata syntax
• regress y x1 x2 if x3 < 20, cluster(x4)
1. regress = Command
– What action do you want to perform?
2. y x1 x2 = Names of variables, files or other objects
– On what things is the command performed
3. if x3 <20 = Qualifier on observations
– On which observations should the command be performed
4. , cluster(x4) = Options
– What special things should be done in executing the
command
Examples
• tabulate smoking race if agemother > 30, row
• Example of the if qualifier
– sum agemother if smoking == 1 & weightmother < 100
Elements used for logical statements

Operator   Definition                  Example
==         Equal to                    if male == 1
!=         Not equal to                if male != 1
>          Greater than                if age > 20
>=         Greater than or equal to    if age >= 21
<          Less than                   if age < 66
<=         Less than or equal to       if age <= 65
&          And                         if age == 21 & male == 1
|          Or                          if age <= 21 | age >= 65
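For instance, combining operators in one qualifier (variable names are hypothetical):

summarize income if age >= 21 & age <= 65
tabulate smoker if male == 1 | age < 18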
Missing values
• Missing values are stored as the largest positive
values and are automatically excluded when Stata
fits models
• Beware
– The expression ‘age > 65’ can thus also include
missing values
– To be sure type: ‘age > 65 & age != .’
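A quick way to see the difference (assuming an age variable with missings):

count if age > 65
count if age > 65 & age != .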
Selecting observations
• drop variable list
• keep variable list
• drop if age < 65
Creating new variables
• generate command
– generate age2 = age * age
– see help functions for the full list of functions
• Sometimes the command egen is a useful alternative (see the sketch below), for instance
– egen meanage = mean(age)
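A small sketch contrasting the two (variable names are hypothetical): generate builds a variable observation by observation, while egen provides summary functions computed across observations.

generate age2 = age^2
egen meanage = mean(age)
generate age_dev = age - meanage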
Useful functions
Function
Definition
Example
+
addition
gen y = a+b
-
subtraction
gen y = a-b
/
Division
gen
density=population/area
*
Multiplication
gen y = a*b
^
Take to a power
gen y = a^3
ln
Natural log
gen lnwage = ln(wage)
exp
exponential
gen y = exp(b)
sqrt
Square root
Gen agesqrt = sqrt(age)
Replace command
• replace has the same syntax as generate but is
used to change values of a variable that
already exists
• gen age_dum = .
• replace age_dum = 0 if age < 5
• replace age_dum = 1 if age >= 5 & age != .
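After creating such a dummy it is worth double checking it, for example:

tabulate age_dum, missing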
Recode
• Change values of existing variables
– Change 1 to 2 and 3 to 4:
recode origvar (1=2)(3=4), gen(myvar1)
– Change missings to 1:
recode origvar (.=1), gen(myvar2)
Logistic regression
• Let's use a set of data collected by the state of
California from 1200 high schools measuring
academic achievement.
• Our dependent variable is called hiqual.
• Our predictor is avg_ed, a continuous measure
(ranging from 1 to 5) of the average education of
the parents of the students in the participating
high schools.
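Before estimating anything, it can help to inspect the two variables (a minimal sketch using the same dataset):

use apilog.dta, clear
tabulate hiqual
summarize avg_ed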
OLS in Stata
. use "D:\Onderwijs\AMMBR\apilog.dta", clear
. regress hiqual avg_ed
      Source |       SS       df       MS              Number of obs =    1158
-------------+------------------------------           F(  1,  1156) = 1135.65
       Model |  126.002822     1  126.002822           Prob > F      =  0.0000
    Residual |  128.260563  1156  .110952044           R-squared     =  0.4956
-------------+------------------------------           Adj R-squared =  0.4951
       Total |  254.263385  1157  .219760921           Root MSE      =  .33309

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avg_ed |   .4287064   .0127215    33.70   0.000     .4037467    .4536662
       _cons |   -.855187   .0363792   -23.51   0.000    -.9265637   -.7838102
------------------------------------------------------------------------------
. predict yhat
(option xb assumed; fitted values)
(42 missing values generated)
. twoway scatter yhat hiqual avg_ed, connect(l) ylabel(0 1)

[Figure: fitted values and hiqual (0/1) plotted against avg parent ed (1 to 5); legend: Fitted values, Hi Quality School, Hi vs Not]
Logistic regression in Stata
. logit hiqual avg_ed
Iteration 0:   log likelihood = -730.68708
Iteration 1:   log likelihood = -386.86717
Iteration 2:   log likelihood = -355.09635
Iteration 3:   log likelihood = -353.94368
Iteration 4:   log likelihood = -353.94352
Iteration 5:   log likelihood = -353.94352

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(1)      =     753.49
                                                  Prob > chi2     =     0.0000
Log likelihood = -353.94352                       Pseudo R2       =     0.5156

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avg_ed |   3.910475   .2383352    16.41   0.000     3.443347    4.377603
       _cons |  -12.30333    .731532   -16.82   0.000    -13.73711   -10.86956
------------------------------------------------------------------------------
. predict yhat1
(option pr assumed; Pr(hiqual))
(42 missing values generated)
. twoway scatter yhat1 hiqual avg_ed, connect(l i) msymbol(i O) sort ylabel(0 1)

E(Y|X) = 1 / (1 + e^-(-12 + 3.9*X1))

[Figure: Pr(hiqual) and hiqual (0/1) plotted against avg parent ed (1 to 5); legend: Pr(hiqual), Hi Quality School, Hi vs Not]
Multiple predictors
. logit hiqual yr_rnd avg_ed
Iteration 0:   log likelihood = -730.68708
Iteration 1:   log likelihood = -384.29232
Iteration 2:   log likelihood = -349.81276
Iteration 3:   log likelihood = -348.24638
Iteration 4:   log likelihood =  -348.2462
Iteration 5:   log likelihood =  -348.2462

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(2)      =     764.88
                                                  Prob > chi2     =     0.0000
Log likelihood =  -348.2462                       Pseudo R2       =     0.5234

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.091038   .3425665    -3.18   0.001    -1.762456   -.4196197
      avg_ed |    3.86531   .2411152    16.03   0.000     3.392733    4.337887
       _cons |  -12.05417    .739755   -16.29   0.000    -13.50407   -10.60428
------------------------------------------------------------------------------
Model fit: the likelihood ratio test

chi2 = 2 * [LL(new) - LL(baseline)]
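In Stata the same quantity can be computed from the stored results after logit, where e(ll) is the model log likelihood and e(ll_0) the log likelihood of the intercept-only model (a minimal sketch):

logit hiqual yr_rnd avg_ed
di 2*(e(ll) - e(ll_0))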
Model fit: LR test
. logit hiqual yr_rnd avg_ed
Iteration 0:   log likelihood = -730.68708
Iteration 1:   log likelihood = -384.29232
Iteration 2:   log likelihood = -349.81276
Iteration 3:   log likelihood = -348.24638
Iteration 4:   log likelihood =  -348.2462
Iteration 5:   log likelihood =  -348.2462

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(2)      =     764.88
                                                  Prob > chi2     =     0.0000
Log likelihood =  -348.2462                       Pseudo R2       =     0.5234

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.091038   .3425665    -3.18   0.001    -1.762456   -.4196197
      avg_ed |    3.86531   .2411152    16.03   0.000     3.392733    4.337887
       _cons |  -12.05417    .739755   -16.29   0.000    -13.50407   -10.60428
------------------------------------------------------------------------------

. di 2*(-348.2462+730.68708)
764.88176
Pseudo R2: proportional change in LL
. logit hiqual yr_rnd avg_ed
Iteration 0:   log likelihood = -730.68708
Iteration 1:   log likelihood = -384.29232
Iteration 2:   log likelihood = -349.81276
Iteration 3:   log likelihood = -348.24638
Iteration 4:   log likelihood =  -348.2462
Iteration 5:   log likelihood =  -348.2462

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(2)      =     764.88
                                                  Prob > chi2     =     0.0000
Log likelihood =  -348.2462                       Pseudo R2       =     0.5234

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.091038   .3425665    -3.18   0.001    -1.762456   -.4196197
      avg_ed |    3.86531   .2411152    16.03   0.000     3.392733    4.337887
       _cons |  -12.05417    .739755   -16.29   0.000    -13.50407   -10.60428
------------------------------------------------------------------------------

. di (730.68708-348.2462)/730.68708
.52339899
Classification Table
. estat class
Logistic model for hiqual

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         0             0  |          0
     -     |       391           809  |       1200
-----------+--------------------------+-----------
   Total   |       391           809  |       1200

Classified + if predicted Pr(D) >= .5
True D defined as hiqual != 0

Sensitivity                     Pr( +| D)      0.00%
Specificity                     Pr( -|~D)    100.00%
Positive predictive value       Pr( D| +)         .%
Negative predictive value       Pr(~D| -)     67.42%

False + rate for true ~D        Pr( +|~D)      0.00%
False - rate for true D         Pr( -| D)    100.00%
False + rate for classified +   Pr(~D| +)         .%
False - rate for classified -   Pr( D| -)     32.58%

Correctly classified                          67.42%
Classification Table
. estat class
Logistic model for hiqual

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       288            58  |        346
     -     |        89           723  |        812
-----------+--------------------------+-----------
   Total   |       377           781  |       1158

Classified + if predicted Pr(D) >= .5
True D defined as hiqual != 0

Sensitivity                     Pr( +| D)     76.39%
Specificity                     Pr( -|~D)     92.57%
Positive predictive value       Pr( D| +)     83.24%
Negative predictive value       Pr(~D| -)     89.04%

False + rate for true ~D        Pr( +|~D)      7.43%
False - rate for true D         Pr( -| D)     23.61%
False + rate for classified +   Pr(~D| +)     16.76%
False - rate for classified -   Pr( D| -)     10.96%

Correctly classified                          87.31%
Interpreting coefficients:
significance
. logit hiqual yr_rnd avg_ed, nolog

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(2)      =     764.88
                                                  Prob > chi2     =     0.0000
Log likelihood =  -348.2462                       Pseudo R2       =     0.5234

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.091038   .3425665    -3.18   0.001    -1.762456   -.4196197
      avg_ed |    3.86531   .2411152    16.03   0.000     3.392733    4.337887
       _cons |  -12.05417    .739755   -16.29   0.000    -13.50407   -10.60428
------------------------------------------------------------------------------

Wald z = b / SE_b
Comparing models
. logit hiqual yr_rnd avg_ed
Iteration 0:   log likelihood = -730.68708
Iteration 1:   log likelihood = -384.29232
Iteration 2:   log likelihood = -349.81276
Iteration 3:   log likelihood = -348.24638
Iteration 4:   log likelihood =  -348.2462
Iteration 5:   log likelihood =  -348.2462

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(2)      =     764.88
                                                  Prob > chi2     =     0.0000
Log likelihood =  -348.2462                       Pseudo R2       =     0.5234

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.091038   .3425665    -3.18   0.001    -1.762456   -.4196197
      avg_ed |    3.86531   .2411152    16.03   0.000     3.392733    4.337887
       _cons |  -12.05417    .739755   -16.29   0.000    -13.50407   -10.60428
------------------------------------------------------------------------------
After estimating and storing the full model,
estimate the nested model
. est store full_model
.
. logit hiqual avg_ed if e(sample)
Iteration 0:   log likelihood = -730.68708
Iteration 1:   log likelihood = -386.86717
Iteration 2:   log likelihood = -355.09635
Iteration 3:   log likelihood = -353.94368
Iteration 4:   log likelihood = -353.94352
Iteration 5:   log likelihood = -353.94352

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(1)      =     753.49
                                                  Prob > chi2     =     0.0000
Log likelihood = -353.94352                       Pseudo R2       =     0.5156

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avg_ed |   3.910475   .2383352    16.41   0.000     3.443347    4.377603
       _cons |  -12.30333    .731532   -16.82   0.000    -13.73711   -10.86956
------------------------------------------------------------------------------
Likelihood ratio test
. lrtest full_model
Likelihood-ratio test                                  LR chi2(1)  =     11.39
(Assumption: . nested in full_model)                   Prob > chi2 =    0.0007
Interpretation of coefficients:
direction
. listcoef
logit (N=1158): Factor Change in Odds

Odds of: high vs not_high

---------------------------------------------------------------------
 hiqual |         b         z     P>|z|      e^b    e^bStdX    SDofX
--------+------------------------------------------------------------
 yr_rnd |  -1.09104    -3.185     0.001   0.3359     0.6593   0.3819
 avg_ed |   3.86531    16.031     0.000  47.7180    19.5978   0.7698
---------------------------------------------------------------------

logit = ln[ p(y) / (1 - p(y)) ] = b0 + b1*x1 + b2*x2 + ... + bn*xn
Interpretation of coefficients:
direction
. listcoef
logit (N=1158): Factor Change in Odds

Odds of: high vs not_high

---------------------------------------------------------------------
 hiqual |         b         z     P>|z|      e^b    e^bStdX    SDofX
--------+------------------------------------------------------------
 yr_rnd |  -1.09104    -3.185     0.001   0.3359     0.6593   0.3819
 avg_ed |   3.86531    16.031     0.000  47.7180    19.5978   0.7698
---------------------------------------------------------------------

Odds = p(y) / (1 - p(y)) = e^b0 * e^(b1*x1) * e^(b2*x2) * ... * e^(bn*xn)
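Stata can also report these factor changes in the odds (odds ratios) directly, for example:

logit hiqual yr_rnd avg_ed, or
* or, equivalently
logistic hiqual yr_rnd avg_ed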
Interpretation of coefficients:
Magnitude
. logit hiqual yr_rnd avg_ed, nolog

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(2)      =     764.88
                                                  Prob > chi2     =     0.0000
Log likelihood =  -348.2462                       Pseudo R2       =     0.5234

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.091038   .3425665    -3.18   0.001    -1.762456   -.4196197
      avg_ed |    3.86531   .2411152    16.03   0.000     3.392733    4.337887
       _cons |  -12.05417    .739755   -16.29   0.000    -13.50407   -10.60428
------------------------------------------------------------------------------

E(Y|X) = 1 / (1 + e^-(-12 + 3.9*avg_ed - 1.1*yr_rnd))
Interpretation of coefficients:
Magnitude
E(Y|X) = 1 / (1 + e^-(-12 + 3.9*avg_ed - 1.1*yr_rnd))

. summ avg_ed yr_rnd

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      avg_ed |      1158    2.754212    .7697744          1          5
      yr_rnd |      1200         .18    .3843476          0          1

. di 1/(1+exp(12-3.9*2.75))
.21840254

. di 1/(1+exp(12-3.9*2.75+1.1))
.08509905
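If your Stata version has the margins command (Stata 11 or later), the same kind of predicted probabilities can be obtained without typing the formula by hand (a sketch):

logit hiqual yr_rnd avg_ed
margins, at(yr_rnd=(0 1)) atmeans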
The assumptions of logistic
regression
• The true conditional probabilities are a logistic
function of the independent variables.
• No important variables are omitted.
• No extraneous variables are included.
• The independent variables are measured without
error.
• The observations are independent.
• The independent variables are not linear
combinations of each other.
Hosmer & Lemeshow
The test divides the sample into subgroups and checks whether the differences
between observed and predicted counts are about equal across these groups.
The test should not be significant (indicating no difference).
Hosmer & Lemeshow

HL chi2 = sum over the j groups of (O_j - n_j*pbar_j)^2 / [n_j*pbar_j*(1 - pbar_j)],
where pbar_j is the average predicted probability in the j-th group
First logistic regression
. logit hiqual yr_rnd meals cred_ml
Iteration 0:   log likelihood = -349.01971
Iteration 1:   log likelihood = -199.10312
Iteration 2:   log likelihood = -160.11854
Iteration 3:   log likelihood = -156.27132
Iteration 4:   log likelihood = -156.25612
Iteration 5:   log likelihood = -156.25611

Logistic regression                               Number of obs   =        707
                                                  LR chi2(3)      =     385.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -156.25611                       Pseudo R2       =     0.5523

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
       meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
     cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
       _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645
------------------------------------------------------------------------------
Then postestimation command
. estat gof, table group(10)
Logistic model for hiqual, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
  +-------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.0008 |     1 |   0.0 |    71 |  72.0 |    72 |
  |     2 | 0.0019 |     1 |   0.1 |    71 |  71.9 |    72 |
  |     3 | 0.0037 |     0 |   0.2 |    71 |  70.8 |    71 |
  |     4 | 0.0078 |     0 |   0.4 |    68 |  67.6 |    68 |
  |     5 | 0.0208 |     1 |   0.9 |    71 |  71.1 |    72 |
  |-------+--------+-------+-------+-------+-------+-------|
  |     6 | 0.0560 |     2 |   2.4 |    68 |  67.6 |    70 |
  |     7 | 0.1554 |     4 |   7.4 |    68 |  64.6 |    72 |
  |     8 | 0.4960 |    23 |  22.0 |    47 |  48.0 |    70 |
  |     9 | 0.7531 |    44 |  43.5 |    26 |  26.5 |    70 |
  |    10 | 0.9595 |    62 |  61.1 |     8 |   8.9 |    70 |
  +-------------------------------------------------------+

       number of observations =       707
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =     40.45
                  Prob > chi2 =    0.0000
Specification error
. logit hiqual yr_rnd meals cred_ml, nolog
Logistic regression                               Number of obs   =        707
                                                  LR chi2(3)      =     385.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -156.25611                       Pseudo R2       =     0.5523

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -1.189537   .5022235    -2.37   0.018    -2.173877   -.2051967
       meals |     -.0936   .0084587   -11.07   0.000    -.1101786   -.0770213
     cred_ml |   .7406536   .3152647     2.35   0.019     .1227463    1.358561
       _cons |   2.425635   .3995025     6.07   0.000     1.642624    3.208645
------------------------------------------------------------------------------
. linktest, nolog
Logistic regression                               Number of obs   =        707
                                                  LR chi2(2)      =     392.32
                                                  Prob > chi2     =     0.0000
Log likelihood = -152.86003                       Pseudo R2       =     0.5620

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   1.215465   .1283978     9.47   0.000     .9638102     1.46712
      _hatsq |   .0748928   .0263911     2.84   0.005     .0231673    .1266184
       _cons |  -.1408008   .1637332    -0.86   0.390    -.4617121    .1801105
------------------------------------------------------------------------------
Including interaction term helps
. gen ym=yr_rnd*meals
. logit hiqual yr_rnd meals cred_ml ym , nolog
Logistic regression                               Number of obs   =        707
                                                  LR chi2(4)      =     390.46
                                                  Prob > chi2     =     0.0000
Log likelihood = -153.78831                       Pseudo R2       =     0.5594

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -2.834458   .8630901    -3.28   0.001    -4.526083   -1.142832
       meals |  -.1019211   .0098691   -10.33   0.000    -.1212641   -.0825781
     cred_ml |   .7789823   .3206881     2.43   0.015     .1504452    1.407519
          ym |   .0463257   .0188326     2.46   0.014     .0094145    .0832368
       _cons |   2.686005   .4307661     6.24   0.000     1.841719    3.530291
------------------------------------------------------------------------------
. estat gof, table group(10)
Logistic model for hiqual, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
  +-------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.0015 |     0 |   0.1 |    71 |  70.9 |    71 |
  |     2 | 0.0033 |     1 |   0.2 |    73 |  73.8 |    74 |
  |     3 | 0.0054 |     0 |   0.3 |    74 |  73.7 |    74 |
  |     4 | 0.0095 |     1 |   0.5 |    63 |  63.5 |    64 |
  |     5 | 0.0204 |     1 |   1.0 |    70 |  70.0 |    71 |
  |-------+--------+-------+-------+-------+-------+-------|
  |     6 | 0.0620 |     4 |   2.5 |    69 |  70.5 |    73 |
  |     7 | 0.1420 |     2 |   6.5 |    66 |  61.5 |    68 |
  |     8 | 0.4745 |    24 |  22.0 |    50 |  52.0 |    74 |
  |     9 | 0.7725 |    44 |  43.4 |    25 |  25.6 |    69 |
  |    10 | 0.9697 |    61 |  61.5 |     8 |   7.5 |    69 |
  +-------------------------------------------------------+

       number of observations =       707
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =      9.25
                  Prob > chi2 =    0.3215
Ok now
. linktest
Iteration 0:   log likelihood = -349.01971
Iteration 1:   log likelihood = -174.14403
Iteration 2:   log likelihood = -156.07793
Iteration 3:   log likelihood = -153.49407
Iteration 4:   log likelihood = -153.36857
Iteration 5:   log likelihood = -153.36794
Iteration 6:   log likelihood = -153.36794

Logistic regression                               Number of obs   =        707
                                                  LR chi2(2)      =     391.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -153.36794                       Pseudo R2       =     0.5606

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   1.067861   .1160715     9.20   0.000     .8403653    1.295357
      _hatsq |   .0297354   .0317399     0.94   0.349    -.0324737    .0919445
       _cons |  -.0644637   .1684527    -0.38   0.702    -.3946249    .2656976
------------------------------------------------------------------------------
Multicollinearity
. reg hiqual avg_ed yr_rnd meals

      Source |       SS       df       MS              Number of obs =    1158
-------------+------------------------------           F(  3,  1154) =  518.61
       Model |  145.983509     3  48.6611696           Prob > F      =  0.0000
    Residual |  108.279876  1154  .093830049           R-squared     =  0.5741
-------------+------------------------------           Adj R-squared =  0.5730
       Total |  254.263385  1157  .219760921           Root MSE      =  .30632

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avg_ed |   .1729601    .021089     8.20   0.000     .1315831    .2143371
      yr_rnd |  -.0008586   .0248112    -0.03   0.972    -.0495386    .0478215
       meals |  -.0076084    .000527   -14.44   0.000    -.0086423   -.0065744
       _cons |   .2445202   .0824989     2.96   0.003     .0826554    .4063849
------------------------------------------------------------------------------

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
       meals |      3.31    0.301982
      avg_ed |      3.25    0.307731
      yr_rnd |      1.11    0.903460
-------------+----------------------
    Mean VIF |      2.56
Influential observations
. predict p
(option pr assumed; Pr(hiqual))
(42 missing values generated)

. predict stdres, rstand
(42 missing values generated)

. scatter stdres p, mlabel(snum)

[Figure: standardized residuals (y-axis, roughly 0 to 50) plotted against Pr(hiqual) (x-axis, 0 to 1), with each point labelled by its school number (snum); schools 1403 and 1402 stand out with very large residuals]
. list if snum==1403
458.  snum      1403     dnum      315      cred_hl   low      awards    No
      schqual   high     pared     medium   hiqual    high     pared_ml  medium
      ell       27       yr_rnd    nd       pared_hl  .        avg_ed    2.19
      meals     100      api00     808      enroll    497      api99     824
      hicred    0        cred      low      full      59       cred_ml   low
      some_col  28       ym        100
. logit hiqual yr_rnd meals avg_ed if snum != 1403

Iteration 0:   log likelihood = -729.56398
Iteration 1:   log likelihood = -332.43297
Iteration 2:   log likelihood = -270.06297
Iteration 3:   log likelihood = -265.70542
Iteration 4:   log likelihood = -265.68934
Iteration 5:   log likelihood = -265.68934

Logistic regression                               Number of obs   =       1157
                                                  LR chi2(3)      =     927.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -265.68934                       Pseudo R2       =     0.6358

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |    -1.1328   .3842377    -2.95   0.003    -1.885892   -.3797077
       meals |  -.0790397   .0076984   -10.27   0.000    -.0941283   -.0639511
      avg_ed |   2.010791   .2947269     6.82   0.000     1.433137    2.588445
       _cons |  -3.528875   1.037345    -3.40   0.001    -5.562035   -1.495716
------------------------------------------------------------------------------

. logit hiqual yr_rnd meals avg_ed, nolog

Logistic regression                               Number of obs   =       1158
                                                  LR chi2(3)      =     914.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -273.66402                       Pseudo R2       =     0.6255

------------------------------------------------------------------------------
      hiqual |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |  -.9913148   .3743452    -2.65   0.008    -1.725018   -.2576117
       meals |  -.0758864   .0074453   -10.19   0.000     -.090479   -.0612938
      avg_ed |    1.98805   .2884154     6.89   0.000     1.422766    2.553334
       _cons |  -3.566451    1.01715    -3.51   0.000    -5.560028   -1.572874
------------------------------------------------------------------------------
If we have enough time left
• Perform a logistic regression analysis
• Use apilog.dta
• Awards = dependent variable
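A possible starting point (a sketch only; check how awards is coded in apilog.dta and convert it to a 0/1 variable first if it is stored as a string):

use apilog.dta, clear
describe awards
tabulate awards
* create or encode a 0/1 version of awards if needed, then:
logit awards avg_ed yr_rnd
estat gof, group(10)
linktest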