
Gerrit Rooks
22-02-10
AMMBR III
DESCRIPTION OF DATA
Variable  Description                                Codes/Values                         Name
1         Identification Code                        ID Number                            ID
2         Birth Number                               1-4                                  BIRTH
3         Smoking Status During Pregnancy            0 = No, 1 = Yes                      SMOKE
4         Race                                       1 = White, 2 = Black, 3 = Other      RACE
5         Age of Mother                              Years                                AGE
6         Weight of Mother at Last Menstrual Period  Pounds                               LWT
7         Birth Weight                               Grams                                BWT
8         Low Birth Weight                           1 = BWT <= 2500 g, 0 = BWT > 2500 g  LOW
SUMMARY OF THE DATA
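. summ

    Variable |        Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
          id |        488    93.56148     53.91331          1        188
       birth |        488    1.872951     .8283019          1          4
     smoking |        488    .3995902     .4903167          0          1
        race |        488    1.852459     .9123576          1          3
   agemother |        488    26.44057     5.825363         14         48
-------------+----------------------------------------------------------
weightmother |        488      142.75     32.43726         80        272
 birthweight |        488    2841.971     688.3148        798       5025
   lowweight |        488    .3094262     .4627315          0          1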
LOGISTICS OF LOGISTIC REGRESSION
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
- Check regression assumptions
EMPTY MODEL
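. logit lowweight

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -301.89672

Logistic regression                               Number of obs   =        488
                                                  LR chi2(0)      =      -0.00
                                                  Prob > chi2     =          .
Log likelihood = -301.89672                       Pseudo R2       =    -0.0000

------------------------------------------------------------------------------
   lowweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -.8028031   .0979279    -8.20   0.000    -.9947383   -.6108679
------------------------------------------------------------------------------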
$$\Pr(Y \mid X) = \frac{1}{1 + e^{-(-.80)}} \approx .31$$
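The calculation above can be checked numerically. The short sketch below is written in Python for illustration; the lecture itself works in Stata. It plugs the empty-model constant into the logistic function and compares the result with the observed share of low-weight births, 151 of 488. Those counts are taken from the classification table. In an empty model the constant is simply the log of the observed odds (151/337), so the two numbers agree. The helper name `inv_logit` is our own.

```python
import math

# Constant (_cons) of the empty model, copied from the Stata output above
b0 = -0.8028031

def inv_logit(xb: float) -> float:
    """Logistic function: Pr(Y = 1 | X) = 1 / (1 + exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))

p_hat = inv_logit(b0)

# Counts from the classification table: 151 low-weight births, 337 others
observed_share = 151 / 488
observed_log_odds = math.log(151 / 337)

print(f"predicted Pr(lowweight = 1) = {p_hat:.4f}")
print(f"observed share              = {observed_share:.4f}")
print(f"log of observed odds        = {observed_log_odds:.7f}")
```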
CLASSIFICATION TABLE EMPTY MODEL
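. estat class

Logistic model for lowweight

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |         0             0  |          0
     -     |       151           337  |        488
-----------+--------------------------+-----------
   Total   |       151           337  |        488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)    0.00%
Specificity                     Pr( -|~D)  100.00%
Positive predictive value       Pr( D| +)       .%
Negative predictive value       Pr(~D| -)   69.06%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    0.00%
False - rate for true D         Pr( -| D)  100.00%
False + rate for classified +   Pr(~D| +)       .%
False - rate for classified -   Pr( D| -)   30.94%
--------------------------------------------------
Correctly classified                        69.06%
--------------------------------------------------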
FULL MODEL
. logit lowweight smoking agemother weightmother

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -288.88873
Iteration 2:   log likelihood = -288.76222
Iteration 3:   log likelihood = -288.76218
Iteration 4:   log likelihood = -288.76218

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

------------------------------------------------------------------------------
   lowweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   .8097503   .2022273     4.00   0.000      .413392    1.206109
   agemother |   .0452296   .0184831     2.45   0.014     .0090033    .0814558
weightmother |  -.0086232   .0035144    -2.45   0.014    -.0155113   -.0017351
       _cons |  -1.139015   .5844386    -1.95   0.051    -2.284493    .0064639
------------------------------------------------------------------------------
MODEL FIT: LIKELIHOOD RATIO TEST
. logit lowweight smoking agemother weightmother

Iteration 0:   log likelihood = -301.89672
Iteration 1:   log likelihood = -288.88873
Iteration 2:   log likelihood = -288.76222
Iteration 3:   log likelihood = -288.76218
Iteration 4:   log likelihood = -288.76218

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

(coefficient table identical to the FULL MODEL output above)
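The likelihood ratio test compares two log likelihoods:

- the empty model, at Iteration 0: -301.89672
- the full model: -288.76218

The statistic is LR chi2 = 2 * (LL_full - LL_empty). Its degrees of freedom equal the number of added predictors, here 3. The Pseudo R2 that Stata prints after logit is McFadden's, 1 - LL_full / LL_empty.

Below is a minimal Python sketch that reproduces both numbers. Assumptions to note:

- The chi-square tail probability uses the closed form for 3 degrees of freedom rather than a statistics library.
- `chi2_sf_df3` is a hypothetical helper name.

```python
import math

ll_null = -301.89672   # Iteration 0: log likelihood of the empty model
ll_full = -288.76218   # final log likelihood of the full model
df = 3                 # number of predictors added to the empty model

lr_chi2 = 2 * (ll_full - ll_null)
pseudo_r2 = 1 - ll_full / ll_null   # McFadden's pseudo R2

def chi2_sf_df3(x: float) -> float:
    """Upper-tail chi-square probability for exactly 3 degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(lr_chi2)
print(f"LR chi2({df}) = {lr_chi2:.2f}")
print(f"Prob > chi2  = {p_value:.4f}")
print(f"Pseudo R2    = {pseudo_r2:.4f}")
```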
CLASSIFICATION TABLE FULL MODEL
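. estat class

Logistic model for lowweight

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        15            13  |         28
     -     |       136           324  |        460
-----------+--------------------------+-----------
   Total   |       151           337  |        488

Classified + if predicted Pr(D) >= .5
True D defined as lowweight != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)    9.93%
Specificity                     Pr( -|~D)   96.14%
Positive predictive value       Pr( D| +)   53.57%
Negative predictive value       Pr(~D| -)   70.43%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    3.86%
False - rate for true D         Pr( -| D)   90.07%
False + rate for classified +   Pr(~D| +)   46.43%
False - rate for classified -   Pr( D| -)   29.57%
--------------------------------------------------
Correctly classified                        69.47%
--------------------------------------------------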
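Every percentage in the classification tables follows from the four cell counts. The Python sketch below recomputes them for both the empty and the full model at Stata's default cutoff of .5. The helper name `classification_stats` and its dictionary labels are our own.

The comparison it makes concrete:

- The empty model classifies every birth as not low weight and is still right 69.06% of the time.
- The full model raises this only to 69.47%, with a sensitivity of just 9.93%.

```python
def classification_stats(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Percentages reported by estat class, computed from the four cell counts.

    tp = classified +, true D      fp = classified +, true ~D
    fn = classified -, true D      tn = classified -, true ~D
    A ratio with a zero denominator is returned as None (Stata prints ".%").
    """
    def pct(num: int, den: int):
        return None if den == 0 else 100.0 * num / den

    return {
        "Sensitivity Pr(+|D)": pct(tp, tp + fn),
        "Specificity Pr(-|~D)": pct(tn, tn + fp),
        "PPV Pr(D|+)": pct(tp, tp + fp),
        "NPV Pr(~D|-)": pct(tn, tn + fn),
        "False + rate Pr(+|~D)": pct(fp, fp + tn),
        "False - rate Pr(-|D)": pct(fn, tp + fn),
        "False + rate Pr(~D|+)": pct(fp, tp + fp),
        "False - rate Pr(D|-)": pct(fn, fn + tn),
        "Correctly classified": pct(tp + tn, tp + fp + fn + tn),
    }

# Cell counts from the two classification tables (cutoff Pr(D) >= .5)
empty = classification_stats(tp=0, fp=0, fn=151, tn=337)
full = classification_stats(tp=15, fp=13, fn=136, tn=324)

for label in full:
    e = "." if empty[label] is None else f"{empty[label]:.2f}"
    print(f"{label:24s} empty: {e:>6s}%   full: {full[label]:6.2f}%")
```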
HOSMER & LEMESHOW TEST
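. estat gof, group(10) table

Logistic model for lowweight, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)
  +--------------------------------------------------------+
  | Group |   Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total |
  |-------+--------+-------+-------+-------+-------+-------|
  |     1 | 0.1929 |     8 |   7.8 |    41 |  41.2 |    49 |
  |     2 | 0.2190 |    11 |  10.1 |    38 |  38.9 |    49 |
  |     3 | 0.2391 |    16 |  11.2 |    33 |  37.8 |    49 |
  |     4 | 0.2597 |     6 |  12.2 |    43 |  36.8 |    49 |
  |     5 | 0.2826 |    13 |  13.0 |    35 |  35.0 |    48 |
  |-------+--------+-------+-------+-------+-------+-------|
  |     6 | 0.3161 |    12 |  14.5 |    37 |  34.5 |    49 |
  |     7 | 0.3659 |    17 |  16.6 |    32 |  32.4 |    49 |
  |     8 | 0.4160 |    22 |  19.1 |    27 |  29.9 |    49 |
  |     9 | 0.4745 |    25 |  22.0 |    24 |  27.0 |    49 |
  |    10 | 0.5951 |    21 |  24.6 |    27 |  23.4 |    48 |
  +--------------------------------------------------------+

       number of observations =       488
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =        10.13
                  Prob > chi2 =         0.2559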
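The Hosmer-Lemeshow statistic is a Pearson chi-square computed over the observed and expected counts in both outcome columns of every group. It is compared against a chi-square distribution with (number of groups - 2) = 8 degrees of freedom. A non-significant result, as here (Prob > chi2 = 0.2559), means the observed counts do not deviate significantly from what the model predicts.

The Python sketch below recomputes the statistic from the table. Because the printed expected counts are rounded to one decimal, it gives about 10.14 (p of about 0.255) rather than Stata's 10.13 (0.2559). The tail probability uses the closed form that holds only for even degrees of freedom. `chi2_sf_even` is a name we chose.

```python
import math

# (Obs_1, Exp_1, Obs_0, Exp_0) per group, copied from the estat gof table
groups = [
    (8, 7.8, 41, 41.2), (11, 10.1, 38, 38.9), (16, 11.2, 33, 37.8),
    (6, 12.2, 43, 36.8), (13, 13.0, 35, 35.0), (12, 14.5, 37, 34.5),
    (17, 16.6, 32, 32.4), (22, 19.1, 27, 29.9), (25, 22.0, 24, 27.0),
    (21, 24.6, 27, 23.4),
]

# Pearson-type statistic summed over both outcome columns of every group
hl = sum((o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0 for o1, e1, o0, e0 in groups)
df = len(groups) - 2

def chi2_sf_even(x: float, k: int) -> float:
    """Upper-tail chi-square probability; this closed form is valid only for even k."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k // 2))

print(f"Hosmer-Lemeshow chi2({df}) = {hl:.2f}, Prob > chi2 = {chi2_sf_even(hl, df):.4f}")
```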
SIGNIFICANCE AND DIRECTION
. logit lowweight smoking agemother weightmother

(iteration log omitted; identical to the FULL MODEL output above)

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

------------------------------------------------------------------------------
   lowweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   .8097503   .2022273     4.00   0.000      .413392    1.206109
   agemother |   .0452296   .0184831     2.45   0.014     .0090033    .0814558
weightmother |  -.0086232   .0035144    -2.45   0.014    -.0155113   -.0017351
       _cons |  -1.139015   .5844386    -1.95   0.051    -2.284493    .0064639
------------------------------------------------------------------------------
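The columns of the coefficient table follow from each coefficient and its standard error:

- z is the coefficient divided by its standard error.
- P>|z| is the two-sided standard-normal tail probability.
- The 95% interval is the coefficient plus or minus 1.96 standard errors.
- The sign of the coefficient gives the direction of the effect on the log odds.

A Python sketch that reproduces these columns follows. `wald_summary` is our own helper, and 1.959964 is the 97.5th percentile of the standard normal.

```python
import math

# Coefficient and standard error per term, copied from the logit output above
estimates = {
    "smoking": (0.8097503, 0.2022273),
    "agemother": (0.0452296, 0.0184831),
    "weightmother": (-0.0086232, 0.0035144),
    "_cons": (-1.139015, 0.5844386),
}
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_summary(coef: float, se: float) -> tuple[float, float, float, float]:
    """Return z, the two-sided p-value, and the 95% confidence bounds."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return z, p, coef - Z_975 * se, coef + Z_975 * se

results = {name: wald_summary(b, se) for name, (b, se) in estimates.items()}
for name, (z, p, lo, hi) in results.items():
    sign = "positive" if z > 0 else "negative"
    print(f"{name:12s} z = {z:5.2f}  P>|z| = {p:.3f}  95% CI [{lo:.4f}, {hi:.4f}]  {sign}")
```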
MAGNITUDE
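. logistic lowweight smoking agemother weightmother

Logistic regression                               Number of obs   =        488
                                                  LR chi2(3)      =      26.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -288.76218                       Pseudo R2       =     0.0435

------------------------------------------------------------------------------
   lowweight | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     smoking |   2.247347   .4544749     4.00   0.000     1.511938     3.34046
   agemother |   1.046268   .0193383     2.45   0.014     1.009044    1.084865
weightmother |   .9914139   .0034842    -2.45   0.014     .9846084    .9982664
------------------------------------------------------------------------------

(Exponentiated coefficient - 1.0) * 100 = (2.247 - 1.0) * 100 ≈ 125 -> a smoker
has about 125% higher odds of having a low-weight baby than a non-smoker of the
same age and weight.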
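Odds ratios are the exponentiated logit coefficients. Their confidence limits are the exponentiated limits of the coefficient intervals. The minimal Python sketch below uses only the coefficients printed in the logit output.

It also shows how to scale an effect to a larger step. Because effects multiply on the odds scale, the odds ratio for 10 extra pounds of maternal weight is exp(10 * b), about 0.917, or roughly 8% lower odds.

```python
import math

# Logit coefficients from the full model (logit output)
coefs = {"smoking": 0.8097503, "agemother": 0.0452296, "weightmother": -0.0086232}

odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, odds_ratio in odds_ratios.items():
    pct_change = (odds_ratio - 1.0) * 100  # % change in the odds per one-unit increase
    print(f"{name:12s} OR = {odds_ratio:.6f}  change in odds = {pct_change:+.1f}%")

# The CI of an odds ratio is the exponentiated CI of the coefficient
ci_smoking = (0.413392, 1.206109)
print("smoking OR 95% CI:", [round(math.exp(c), 4) for c in ci_smoking])

# Effects of larger steps multiply on the odds scale: 10 extra pounds of maternal weight
or_10lb = math.exp(10 * coefs["weightmother"])
print(f"OR for +10 lb = {or_10lb:.3f}  ({(or_10lb - 1.0) * 100:+.1f}%)")
```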
EXAMINING RESIDUALS IN LOGISTIC REGRESSION
1. Isolate points for which the model fits poorly
2. Isolate influential data points
RESIDUAL STATISTICS
SAMANTHA'S TIPS
In Stata, after estimating the model, the predict command can be used to
calculate residuals and other post-estimation statistics.
Type "help logit postestimation" for details.
PREDICTED PROBABILITIES
[Figure: distribution of the predicted probabilities; x-axis: Pr(lowweight), y-axis: Density]
HISTOGRAM OF STANDARDIZED RESIDUALS
[Figure: histogram; x-axis: standardized Pearson residual, y-axis: Density]
STANDARDIZED RESIDUAL
. tab ZRE if ZRE > 3

 standardized |
      Pearson |
     residual |      Freq.     Percent        Cum.
--------------+-----------------------------------
     3.181052 |          1      100.00      100.00
--------------+-----------------------------------
        Total |          1      100.00

. list if ZRE > 3

      id   birth   smoking   race   agemot~r   weight~r   birthw~t   lowwei~t        ZRE
355. 135       1         0      2         18        229       1858          1   3.181052
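A quick check on why observation 355 stands out. Plugging its covariates into the full-model coefficients gives a predicted probability of only about .09, yet this mother did have a low-weight baby.

The Python sketch below, which is not part of the Stata workflow, computes the raw Pearson residual. It also computes the leverage h implied by ZRE = 3.181052, under two assumptions:

- Stata's standardized residual is the Pearson residual divided by sqrt(1 - h).
- This covariate pattern occurs only once in the data.

That leverage is an inferred value, not a Stata output.

```python
import math

# Full-model coefficients (logit output) and the covariates of observation 355
# (list output): non-smoker, age 18, weight 229 lb
b = {"_cons": -1.139015, "smoking": 0.8097503, "agemother": 0.0452296, "weightmother": -0.0086232}
x = {"smoking": 0, "agemother": 18, "weightmother": 229}
y = 1              # lowweight = 1: the baby did have a low birth weight
zre = 3.181052     # standardized Pearson residual reported by Stata

xb = b["_cons"] + sum(b[k] * v for k, v in x.items())
p = 1 / (1 + math.exp(-xb))                   # predicted Pr(lowweight = 1)
pearson = (y - p) / math.sqrt(p * (1 - p))    # raw Pearson residual

# ASSUMPTION: standardized residual = pearson / sqrt(1 - h), unique covariate pattern
h_implied = 1 - (pearson / zre) ** 2

print(f"predicted probability = {p:.3f}")
print(f"Pearson residual      = {pearson:.3f}")
print(f"implied leverage h    = {h_implied:.4f}")
```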
INDEX PLOT OF STANDARDIZED RESIDUALS
[Figure: standardized Pearson residual (y-axis) plotted against id (x-axis)]
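COOK'S DISTANCE
. predict cook, dbeta
(After logit, the dbeta option of predict gives Pregibon's delta-beta influence
statistic, the logistic-regression counterpart of Cook's distance; the new
variable is named cook here.)
INDEX PLOT OF COOK'S DISTANCE
[Figure: cook (y-axis) plotted against id (x-axis)]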
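Pregibon's delta-beta combines a point's Pearson residual r with its leverage h. The standard formula is dbeta = r^2 * h / (1 - h)^2, which, as we understand it, is what predict, dbeta reports.

The Python sketch below shows why a large residual alone does not make a point influential: with the same residual, influence grows quickly as leverage rises. The function name `pregibon_dbeta` is ours, and the inputs are illustrative, not taken from the birth-weight data.

```python
def pregibon_dbeta(pearson: float, leverage: float) -> float:
    """Pregibon's delta-beta for one covariate pattern: r^2 * h / (1 - h)^2."""
    return pearson ** 2 * leverage / (1.0 - leverage) ** 2

# Illustrative values only: the same Pearson residual of 2 with increasing leverage
for h in (0.01, 0.05, 0.20):
    print(f"r = 2, h = {h:.2f}: dbeta = {pregibon_dbeta(2.0, h):.4f}")

# A big residual at very low leverage can matter less than a modest one at high leverage
print(pregibon_dbeta(3.0, 0.001), pregibon_dbeta(1.0, 0.2))
```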
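MULTI-COLLINEARITY
- Field recommends obtaining VIFs by estimating the same model with an OLS
  regression; VIFs depend only on the predictors, so the linear model gives
  the values that apply to the logistic model as well.
- Checking the correlation matrix of the independent variables is often enough.
- If you find high correlations (say > .6), then check the VIFs.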
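The correlation matrix is a reasonable first screen because of how the VIF is defined. The VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing j on the other predictors. It equals the j-th diagonal element of the inverse of the predictors' correlation matrix. With only two predictors, a correlation of .6 gives VIF = 1 / (1 - .36), or about 1.56.

The Python sketch below computes VIFs this way using a small Gauss-Jordan inverse. The helper names are ours, and the three-predictor matrix is hypothetical, not computed from the birth-weight data.

```python
def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse without pivoting.

    Adequate for a correlation matrix of predictors that are not perfectly
    collinear: such a matrix is positive definite, so every pivot stays > 0.
    """
    n = len(m)
    # Augment m with the identity matrix
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def vifs(corr: list[list[float]]) -> list[float]:
    """VIF_j = j-th diagonal element of the inverse correlation matrix = 1 / (1 - R_j^2)."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]

# Two predictors correlated at the .6 screening threshold: both VIFs equal 1 / (1 - .36)
print(vifs([[1.0, 0.6], [0.6, 1.0]]))

# Hypothetical correlation matrix for three predictors
hypothetical = [
    [1.0, 0.3, 0.2],
    [0.3, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
print([round(v, 3) for v in vifs(hypothetical)])
```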
FINALLY: TWO CAUSES OF TROUBLE
- Incomplete information
- Complete separation
INCOMPLETE INFORMATION
COMPLETE SEPARATION
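A small illustration of complete separation, using hypothetical toy data (not the birth-weight data). Every case with x below 3 has y = 0, and every case above 3 has y = 1.

Consider the direction where the intercept is -3 times the slope. Along it, the log likelihood keeps rising towards 0 as the slope grows. The likelihood therefore has no finite maximum, and the estimates and their standard errors run off towards infinity. This is a Python sketch; the data and the helper `log_likelihood` are ours.

```python
import math

# Hypothetical toy data: x < 3 always has y = 0 and x > 3 always has y = 1,
# so x separates the outcome perfectly.
xs = [1.0, 2.0, 2.5, 3.5, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

def log_likelihood(slope: float, cut: float = 3.0) -> float:
    """Logit log likelihood with Pr(y = 1) = 1 / (1 + exp(-slope * (x - cut))),
    i.e. intercept = -slope * cut."""
    ll = 0.0
    for x, y in zip(xs, ys):
        eta = slope * (x - cut)
        # log Pr(y=1) = -log(1 + exp(-eta)); log Pr(y=0) = -log(1 + exp(eta))
        ll -= math.log1p(math.exp(-eta)) if y == 1 else math.log1p(math.exp(eta))
    return ll

slopes = [1, 2, 5, 10, 20]
values = [log_likelihood(s) for s in slopes]
for s, v in zip(slopes, values):
    print(f"slope {s:>2}: log likelihood = {v:.6f}")
```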
PRACTICAL
- Open ammbr.dta
- Analyse entrepreneurship
