
Slide 1

VI. Logistic Regression


Slide 2

 An event occurs or doesn’t.
 A category applies to an observation or doesn’t.
 A student passes or fails.
 A patient survives or dies.
 A candidate wins or loses.
 A person is poor or not poor.
 A person is a citizen or not.


Slide 3

 These are examples of categorical data.
 They are also examples of binary discrete phenomena.
 Binary discrete phenomena usually take the form of a dichotomous indicator, or dummy, variable.
 It’s best to code binary discrete phenomena 0/1 so that the mean of the dummy variable equals the proportion of cases with a value of 1, & can be interpreted as a probability: e.g., mean of female=.545 (=the sample’s probability of being female).


Slide 4

 Here’s what we’re going to figure out how to interpret:


Slide 5

. logit hsci read math female, or nolog

Logit estimates                             Number of obs =        200
                                            LR chi2(3)    =      60.07
                                            Prob > chi2   =     0.0000
Log likelihood = -79.013272                 Pseudo R2     =     0.2754

------------------------------------------------------------------------------
        hsci | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.073376   .0274368     2.77   0.006     1.020926    1.128521
        math |    1.10315   .0316097     3.43   0.001     1.042904    1.166877
      female |   .3380283   .1389325    -2.64   0.008     .1510434    .7564918
------------------------------------------------------------------------------


Slide 6



OLS regression encounters serious problems in dealing with a binary dependent variable:

(1) OLS’s explanatory variable coefficients can extend to positive or negative infinity, but binary probabilities & proportions can’t exceed 1 or fall below 0.

(2) OLS is premised on linearity, but with a binary dependent variable the effects of an explanatory variable are non-linear at the binary variable’s lower & upper levels.


Slide 7

(3) OLS is also premised on additivity, but with a binary outcome variable an explanatory variable’s effect depends on the relative effects of the other variables: if, say, one explanatory variable pushes the probability of the binary outcome variable near 0 or near 1, then the other explanatory variables can’t have much influence.

(4) And because a binary outcome variable has just two values, it violates the OLS assumptions of normality &, more important, constant variance of residuals.


Slide 8

 What to do? A logit transformation is an advantageous way of representing the S-shaped curve of a binary outcome variable’s y/x distribution.
 A probit transformation, which has somewhat thinner tails, is also commonly used.
 A complementary log-log transformation or a scobit transformation is often used if the binary outcome variable is highly skewed.
 There are other such transformations as well (see, e.g., Long & Freese; & the Stata manuals).


Slide 9

 A logit transformation changes probabilities into logged odds, which eliminate the binary proportion (or probability) ceiling of 1.
 Odds express the likelihood of occurrence relative to the likelihood of non-occurrence (i.e. odds are the ratio of the proportions of the two possible outcomes [see Moore & McCabe, chap. 15, pages 40-42]):

odds = event’s probability / (1 - event’s probability)
probability = event’s odds / (1 + event’s odds)
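
 A quick sketch of these two formulas in Stata (not on the original slide; .545 is the mean of female from Slide 3):

. display .545/(1 - .545)       // probability to odds: about 1.198
. display 1.198/(1 + 1.198)     // odds back to probability: about .545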


Slide 10

 To repeat, a logit transformation changes
probabilities into logged odds, which eliminate
the binary proportion (or probability) ceiling of
1.
 Logged odds are also known as logits (i.e.
the natural log of the odds).
 Why not stick with odds rather than logged
odds? Because logged odds (i.e. logits) also
eliminate the binary proportion (or probability)
floor of 0.


Slide 11

 So, on balance, the logit transformation eliminates the outcome variable’s proportion (or probability) ceiling of 1 & floor of 0.
 Thus explanatory variable coefficients can extend to positive or negative infinity.


Slide 12

 Note: a larger sample size is even more important for logistic regression than for OLS regression.


Slide 13

 So that we get a feel for what’s going
on, & can do the exercises in Moore &
McCabe, chap. 15, let’s compute some
odds & logged odds (i.e. logits).


Slide 14

 Display the odds of being in honors math:

. tab hmath

hmath (>=60) |      Freq.     Percent        Cum.
-------------+-----------------------------------
           0 |        151       75.50       75.50
           1 |         49       24.50      100.00
-------------+-----------------------------------
       Total |        200      100.00

. display .245/(1 - .245)
.3245

 Interpretation? The event occurs .3245 times per each time it does not occur.
 That is, there are 32.45 occurrences per 100 non-occurrences.


Slide 15

 Display the logged odds of being in honors math:

. display ln(.3245)
-1.1255

 Display the odds of not being in honors math:

. di .755/(1 - .755)
3.083

 Display the logged odds of not being in honors math:

. di ln(3.083)
1.126


Slide 16

 Although we’ll never have to do the
following—thankfully, software will do it
for us automatically—let’s make a variable
that combines the odds of being in honors
math versus not being in honors math.
. gen ohmath = .3245 if hmath==1
. replace ohmath = 3.083 if hmath==0
. tab ohmath


Slide 17

 Let’s transform ohmath into another
variable that represents logged odds:
. gen lohmath = ln(ohmath)
. su ohmath lohmath
 And how could we display lohmath not
as logged odds but rather as odds (i.e.
as ohmath)?
. display exp(lohmath)


Slide 18

 From the standpoint of regression
analysis, why should we indeed have
transformed the variable into logged
odds?
 That is, what are the advantages of
doing so?


Slide 19

 Overall, a logit transformation of a binary outcome variable linearizes the non-linear relationship of X with the probability of Y.
 It does so as the logit transformation:
 eliminates the binary variable’s upper & lower probability bounds (the ceiling of 1 & the floor of 0); &
 is symmetric around the mid-range probability of 0.5, so that probabilities below this value have negative logits (i.e. logged odds) while those above this value have positive logits (i.e. logged odds).
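
 In symbols (a standard summary, not from the slide):

\[
\operatorname{logit}(p) \;=\; \ln\frac{p}{1-p}, \qquad
p \;=\; \frac{e^{\operatorname{logit}(p)}}{1+e^{\operatorname{logit}(p)}}, \qquad
\operatorname{logit}(1-p) \;=\; -\operatorname{logit}(p)
\]

so p = .5 maps to logit 0, probabilities below .5 map to negative logits, & probabilities above .5 map to positive logits.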


Slide 20



Let’s summarize:

 the effect of X on the probability of binary Y is non-linear; but
 the effect of X on logit-transformed binary Y is linear.

 We call the latter either logit or logistic regression: they’re the same thing.
 Let’s fit a model using hsb2.dta:


Slide 21

. logit hsci read math female, nolog

Logit estimates                             Number of obs =        200
                                            LR chi2(3)    =      60.07
                                            Prob > chi2   =     0.0000
Log likelihood = -79.013272                 Pseudo R2     =     0.2754

------------------------------------------------------------------------------
        hsci |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0708092   .0255612     2.77   0.006     .0207102    .1209082
        math |     .09817    .028654     3.43   0.001     .0420091    .1543308
      female |  -1.084626   .4110086    -2.64   0.008    -1.890188   -.2790636
       _cons |  -9.990621    1.60694    -6.22   0.000    -13.14017   -6.841076
------------------------------------------------------------------------------


Slide 22

 How, though, do we interpret logit coefficients in meaningful terms?
 That is, how do we interpret ‘logged odds’?
 The gain in parsimony via the logit transformation is offset by a loss in interpretability: the metric of logged odds (i.e. logits) is not intrinsically meaningful to us.


Slide 23

 An alternative, more comprehensible approach is to express regression coefficients not as logged odds but rather as odds:

odds = event’s probability / (1 - event’s probability)


Slide 24

 The odds are obtained by taking the exponent, or anti-log, of the logged odds (i.e. the logit coefficient):

odds of being in honors math:
. di exp(-1.1255)
.325

odds of not being in honors math:
. di exp(1.126)
3.083

 Review: interpretation?


Slide 25

 What are the odds of being in honors math?

odds: a ratio of probabilities
odds = event’s probability / (1 - event’s probability)


Slide 26

 What are the odds of being in honors math versus the odds of not being in honors math? This is called an odds ratio.

odds ratio: a ratio of odds
odds ratio = .325/3.083 = .105

 Interpretation? The odds of being in honors math are .105 times those of not being in honors math.
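
 A quick arithmetic check (the two odds come from Slides 14-15 & are reciprocals of each other, since 3.083 ≈ 1/.3245):

. display .325/3.083     // the odds ratio: about .105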


Slide 27

 Via logit or logistic regression, Stata gives us
slope coefficients as odds, instead of logged
odds, in any of the following ways:
(1) logit hsci read math female, or nolog
(2) logistic hsci read math female, nolog
(3) quietly logit hsci read math female

listcoef, factor help


Slide 28

 But expressing slope coefficients as odds, instead of logged odds, causes a complication: the equation determining the odds is not additive but rather multiplicative.
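
 In symbols, this multiplicative form (standard logistic-regression algebra, not from the slide) is:

\[
\frac{\Pr(Y=1)}{1-\Pr(Y=1)} \;=\; e^{\beta_0+\beta_1X_1+\cdots+\beta_kX_k}
\;=\; e^{\beta_0}\,\big(e^{\beta_1}\big)^{X_1}\cdots\big(e^{\beta_k}\big)^{X_k}
\]

so a 1-unit increase in X_j multiplies the odds by e^(beta_j), the odds ratio, rather than adding a fixed amount.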


Slide 29

. logit hsci read math female, or nolog

Logit estimates                             Number of obs =        200
                                            LR chi2(3)    =      60.07
                                            Prob > chi2   =     0.0000
Log likelihood = -79.013272                 Pseudo R2     =     0.2754

------------------------------------------------------------------------------
        hsci | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.073376   .0274368     2.77   0.006     1.020926    1.128521
        math |    1.10315   .0316097     3.43   0.001     1.042904    1.166877
      female |   .3380283   .1389325    -2.64   0.008     .1510434    .7564918
------------------------------------------------------------------------------


Slide 30

 For every 1-unit increase in reading score, the odds of being in honors science increase by a multiple of 1.07 on average, holding the other variables constant.
 For every 1-unit increase in math score, the odds of being in honors science increase by a factor of 1.10 on average, holding the other variables constant.
 The odds of being in honors science are lower for females than for males by a factor of .338 on average, holding the other variables constant.
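
 These odds ratios are simply the exponentiated logit coefficients from Slide 21; a quick Stata check (a sketch):

. display exp(.0708092)     // read: about 1.073376
. display exp(.09817)       // math: about 1.10315
. display exp(-1.084626)    // female: about .3380283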


Slide 31

. logit hsci read math female, or nolog

Logit estimates                             Number of obs =        200
                                            LR chi2(3)    =      60.07
                                            Prob > chi2   =     0.0000
Log likelihood = -79.013272                 Pseudo R2     =     0.2754

------------------------------------------------------------------------------
        hsci | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.073376   .0274368     2.77   0.006     1.020926    1.128521
        math |    1.10315   .0316097     3.43   0.001     1.042904    1.166877
      female |   .3380283   .1389325    -2.64   0.008     .1510434    .7564918
------------------------------------------------------------------------------


Slide 32

The Metrics
 logit = 0 is the equivalent of odds=1 &
the equivalent of probability = .5
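
 A quick numeric confirmation of these equivalences (a sketch):

. display ln(1)                 // odds=1 gives logit=0
. display exp(0)/(1 + exp(0))   // logit=0 gives probability=.5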


Slide 33

 Regarding odds ratios: an odds ratio of
.5, which indicates a negative effect, is of
the same magnitude as a positive-effect
odds ratio of 2.0.

 Here’s a helpful way to disentangle this
complication after estimating a logit (i.e.
logistic) model:
. listcoef, reverse help
 This reverses the outcome variable.


Slide 34

. listcoef, reverse help   [So that the coefficients refer to the odds of not being in honors science.]

logit (N=200): Factor Change in Odds

Odds of: 0 vs 1
----------------------------------------------------------------------
        hsci |        b         z     P>|z|      e^b   e^bStdX   SDofX
-------------+--------------------------------------------------------
        read |  0.07081     2.770    0.006   0.9316    0.4838  10.2529
        math |  0.09817     3.426    0.001   0.9065    0.3986   9.3684
      female | -1.08463    -2.639    0.008   2.9583    1.7185   0.4992
----------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
SDofX = standard deviation of X


Slide 35

 listcoef, reverse related the
explanatory variables to the odds of not
being in hsci.


Slide 36

 An easier interpretation than odds ratios is percentage change in odds:

. quietly logit hsci read math female
. listcoef, percent help

----------------------------------------------------------------------
        hsci |        b         z     P>|z|       %     %StdX   SDofX
-------------+--------------------------------------------------------
        read |  0.07081     2.770    0.006      7.3     106.7  10.2529
        math |  0.09817     3.426    0.001     10.3     150.9   9.3684
      female | -1.08463    -2.639    0.008    -66.2     -41.8   0.4992
----------------------------------------------------------------------
P>|z| = p-value for z-test
% = percent change in odds for unit increase in X
%StdX = percent change in odds for SD increase in X
SDofX = standard deviation of X


Slide 37

 For every 1-unit increase in reading score, the
odds of being in honors science increase by 7.3%
on average, holding the other variables constant.
 For every 1-unit increase in math score, the
odds of being in honors science increase by
10.3% on average, holding the other variables
constant.
 The odds of being in honors science are lower
by 66.2% on average for females than males,
holding the other variables constant.
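
 The % column is just 100*(odds ratio - 1), i.e. 100*(exp(b) - 1); a quick check against Slide 36’s coefficients (a sketch):

. display 100*(exp(.0708092) - 1)     // read: about 7.3
. display 100*(exp(.09817) - 1)       // math: about 10.3
. display 100*(exp(-1.084626) - 1)    // female: about -66.2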


Slide 38

 The percentage interpretation, then,
eliminates the multiplicative aspect of
the model.


Slide 39

 Alternatives to know about are

‘relative risk’ & ‘relative risk ratio’.
 See, e.g., Utts, chap. 12.
 And see the downloadable command
‘relrisk’.
. logit hsci read math female, nolog

. relrisk


Slide 40

 Pseudo-R2: this is not equivalent to OLS R2.
 Many specialists (e.g., Pampel) recommend not reporting pseudo-R2.
 Its metric is different from OLS R2 (typically it’s much lower), but readers (including many academic & policy specialists) are not aware of this difference.


Slide 41

 In the logistic equation we could have
specified the options robust &/or
cluster (as in OLS regression).
 Specifying ‘cluster’ automatically
invokes robust standard errors.


Slide 42

 Before we proceed, keep in mind the

following:

odds=1 is the equivalent of:
logit=0
probability=0.5


Slide 43

 Besides logits (i.e. logged odds) & odds, we can also interpret the relationships from the perspective of probabilities.
 Recall, though, that the relationship between X & the probability of binary Y is non-linear.
 Thus the effect of X on binary Y has to be identified at particular X-values or at particular sets of values for the X’s; & we compare the effects across particular X-levels.


Slide 44

. logit hsci read math female, or nolog
. prvalue, r(mean) delta brief

logit: Predictions for hsci

Pr(y=1|x):   0.1525   95% ci: (0.0998, 0.2259)
Pr(y=0|x):   0.8475   95% ci: (0.7741, 0.9002)

 prvalue, like the rest of the pr-commands, can only be used after estimating a regression model.
 It summarizes the sample’s 1/0 probabilities for the specified binary y-variable, holding all of the data set’s (not just the model’s) other variables at their means (though medians can be specified alternatively).


Slide 45

. prvalue, x(female=0) r(mean) delta brief

logit: Predictions for hsci

Pr(y=1|x):   0.2453   95% ci: (0.1568, 0.3622)
Pr(y=0|x):   0.7547   95% ci: (0.6378, 0.8432)

. prvalue, x(female=1) r(mean) delta brief

logit: Predictions for hsci

Pr(y=1|x):   0.0990   95% ci: (0.0526, 0.1784)
Pr(y=0|x):   0.9010   95% ci: (0.8216, 0.9474)


Slide 46

 Comparing males across particular scores:
. prvalue, x(read=40 math=40 female=0) delta b save
. prvalue, x(read=60 math=60 female=0) delta b dif

 How else could we compare the estimated probabilities of being in honors science? By comparing female=0 versus female=1 at particular reading & math scores.

Note: ‘b’ = ‘brief’ (i.e. display only the model’s most relevant values)


Slide 47

 Comparing males & females at particular scores:
. prvalue, x(read=40 math=40 female=0) delta b save
. prvalue, x(read=40 math=40 female=1) delta b dif

 Or try prtab to see, e.g., how female versus male estimated probabilities vary across the range of math scores, holding reading scores constant at 40 (though it does not provide confidence intervals):
. prtab math female, x(read=40) brief


Slide 48

 Or try prchange to see, e.g., how female estimated probabilities vary as female math scores increase from 40 to 60 (which, however, does not provide a confidence interval):
. prchange math, x(female=1 math=40) fromto delta(20) uncentered brief


Slide 49

 A problem with prtab & prchange is that they don’t give confidence intervals, which prvalue delta does provide.
 Here’s a different way of making predictions (for logged odds, odds, or probabilities) that gives confidence intervals:
. adjust math=40, by(female) ci
. adjust math=40, by(female) exp ci
. adjust math=40, by(female) pr ci

 Note: the first variant can be used to obtain predictions with OLS regression as well.


Slide 50



Remember: we can examine the relationship of X with binary Y via:

 logits (i.e. logged odds);
 odds; or
 probabilities.

 What are the differences in functional forms & interpretations?


Slide 51

 While we’re at it, why not give probit a look?
. probit hsci read math, nolog

. logit hsci read math, nolog
 Although the logit coefficients exceed the probit
coefficients by a factor of about 1.8, the functional
y/x relationship is virtually the same: it is very
slightly different at the low & high ends because
the probit transformation has thinner tails.

 But there are no odds or probabilities
interpretations for probit models, which thus are
harder to express in meaningful terms.


Slide 52

 For all forms of categorical-dependent
variable regression, the same options
exist for explanatory variables—
interactions, power transformations, &
categorizing—as for OLS regression.


Slide 53

Estimation & Model Fit
 Because a dichotomous dependent variable violates the OLS assumptions of normality & constant variance of residuals, models are estimated instead by maximum likelihood estimation.
 Maximum likelihood estimation finds the estimates of model parameters that are most likely to give rise to the pattern of observations in the sample data.
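
 For a binary outcome, the log likelihood being maximized is (standard logit algebra, not from the slide):

\[
\ln L(\beta) \;=\; \sum_{i=1}^{n}\Big[y_i\ln p_i + (1-y_i)\ln(1-p_i)\Big],
\qquad p_i \;=\; \frac{1}{1+e^{-\mathbf{x}_i\beta}}
\]

where p_i is observation i’s predicted probability that y_i = 1.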


Slide 54

 Maximum likelihood estimation yields not an F-

statistic & F-test for model fit but rather a log
likelihood value & likelihood ratio test.

 The likelihood ratio test compares a model
with the constant only (the baseline model) to the
fitted (i.e. full) model: what’s the likelihood of
giving rise to the sample estimates via the baseline
model vs. the fitted (i.e. full) model?

 It then assesses the reduction in the log likelihood, given the specified degrees of freedom, & computes a chi-squared test of the reduction.
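
 In symbols, with df equal to the number of constrained (i.e. dropped) parameters:

\[
\mathrm{LR}\ \chi^2_{df} \;=\; -2\big(\ln L_{\text{baseline}} - \ln L_{\text{full}}\big)
\]

 For the Slide 21 model, ln L = -79.013 & LR chi2(3) = 60.07 together imply a constant-only log likelihood of about -109.05.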


Slide 55

 This is equivalent to the OLS F-statistic & F-test
for model fit.
 We use the likelihood ratio test to test nested
models (via the command lrtest).

 And as with OLS we examine the p-values of
individual coefficients, although in this case these
are computed via the Wald-test.


Slide 56

 Here we begin by testing the ‘full’ model (& storing its estimates) & then test ‘reduced’ models versus the full model:

. logit hsci read write math female, or nolog
. estimates store full
. logit hsci read write, or nolog
. lrtest full

likelihood-ratio test                       LR chi2(2)  =      17.69
(Assumption: . nested in full)              Prob > chi2 =     0.0001


Slide 57

 But if the models use robust standard
errors, pweights, or cluster adjustment, the
likelihood ratio test can’t be used.
 And of course, if the number of observations
varies from one model to another, the likelihood
ratio test can’t be used either.
 In such cases (aside from varying number of
observations) use the Wald-test, which is what is
applied in logistic regression via the test
command:


Slide 58

. logit hsci read write math female, or nolog
. test math female

 ( 1)  math = 0
 ( 2)  female = 0

          chi2(  2) =   15.35
        Prob > chi2 =    0.0005

 The likelihood ratio test is considered superior to the Wald-test; the results may differ somewhat.


Slide 59

 There is a way to test non-nested
models or nested models with unequal
observations: BIC or AIC tests (see
STATA Manual [‘estimates,’ table
options]; Pampel; & Long/Freese).
 See Long/Freese’s downloadable
‘fitstat’ command & their discussions of
BIC & AIC.


Slide 60

. logistic hsci read write math female, nolog
. fitstat, saving(m1)
. est store m1


Slide 61

. logistic hsci read write, nolog
. fitstat, using(m1) bic
. est store m2
. est table m1 m2, eform star(.1 .05 .01)
stats(N chi2 df_m bic)


Slide 62

 BIC test criteria: smaller BIC indicates better-fitting model.
 difference 0-2: weak support for complete model
 difference 2-6: positive support for complete model
 difference 6-10: strong support for complete model
 difference >10: very strong support for complete model


Slide 63

 AIC criterion: smaller AIC*n indicates
better-fitting model.

 AIC, BIC, & fitstat can be used to test
OLS regression models, too.


Slide 64

Beware: Pitfalls in Comparing Logistic Coefficients across Nested Models
 Unlike Y in OLS, the error variance for Y in logistic regression changes with the addition/deletion of variables.
 Hence, unlike OLS, comparing the values of logistic coefficients across models is tricky.
 Check the Stata listserv on this topic (e.g., commentaries by Richard Williams, Herb Smith, Roger Newson).


Slide 65

 If you need to compare the values of logistic
coefficients across nested models, one
recommended approach is to use Stata’s ‘suest’
(‘seemingly unrelated estimation’) command.
 Another approach is to standardize Y across
the models (e.g., using ‘listcoef, std’).


Slide 66

State-of-the-Art on Using Logistic or
Probit Regression:
Glenn Hoetker, “The Use of Logit and Probit Models in Strategic Management Research: Critical Issues”
http://www.business.uiuc.edu/ghoetker/documents/Hoetker_logit_in_strategy.pdf

 Hoetker’s Stata command ‘complogit’:
. net from http://www.business.uiuc.edu/ghoetker/


Slide 67

 Richard Williams’ Stata ‘oglm’ command:
http://econpapers.repec.org/software/bocbocode/s453402.htm
. findit oglm


Slide 68

Data Preparation

 Data preparation is crucial for logistic regression.
 Particular attention must be paid to any contingency table of the outcome variable with a categorical explanatory variable that has a zero cell or a cell with <5 observations, which can cause serious problems of model fitting.
 Options for a zero cell or a cell with <5 observations: collapse the categories in some sensible way to eliminate the problem; eliminate the variable (if advisable on other grounds); or, if the variable is ordinal, model it as if it were continuous.


Slide 69

 A common problem arising from zero cells or
low-count cells is ‘perfect prediction’ (see also
‘separation problem’ or ‘quasi-separation
problem’).

 Stata provides a warning for ‘perfect
prediction,’ & in general the standard errors &
confidence intervals are inflated.
 There’s considerable literature on strategies
for dealing with these problems, or consult
Stata listserv.


Slide 70

In sum, for a contingency table of
the outcome variable with a
categorical explanatory variable:

 No zero cells.
 No more than 20% of cells with
less than 5 observations.


Slide 71

 The following approach to data
preparation for logistic regression is
based on Hosmer & Lemeshow, Applied
Logistic Regression (2ed).


Slide 72

Variable Selection
 Begin with a careful analysis of each variable.
 For nominal, ordinal, & continuous variables with
few integer values, do a contingency table of
the outcome variable (y=0, 1) versus the
levels of the explanatory variable, checking
for zero cells, cells with <5 obs., &, at this stage,
for p<=.25.

. tab hsci female, col chi2
. tab hsci race, col chi2
. tab hsci quintiles_math, col chi2


Slide 73

 Take appropriate action to eliminate zero
cells and minimize cells with <5 observations.
 For continuous variables, a form of
preparatory analysis is the two-sample ttest:
. ttest read, by(hsci) unequal
. ttest math, by(hsci) unequal


Slide 74

 Likewise pertinent to continuous explanatory
variables is a graph of x versus y=0, 1, to
explore for possible curvilinearity:
. sparl hsci write, quad | logx
. twoway fpfit hsci write
. gr7 hsci write, c(s) ba(8)

. twoway mband hsci write, mband(8)
. twoway lowess hsci write, bwidth(.6)


Slide 75

 Choosing from those variables having p<=.25, conceptualize & list a set of potential explanatory variables.
 Discuss their anticipated relationship to outcome variable Y, including anticipated linearity or curvilinearity & relationships with the other explanatory variables.
 Order the potential explanatory variables in terms of their conceptual importance in relation to Y.


Slide 76

 Fit a preliminary main-effects model,
dropping—for now—those variables that test
insignificant.
 Explore & add pertinent higher-order terms,
dropping—for now—those higher-order terms
that test insignificant.
 Explore & add pertinent interaction terms,
dropping—for now—those interaction terms that
test insignificant.


Slide 77

 Add those variables that previously tested

insignificant, dropping—in the context of
substantive judgement, including retention of
appropriate controls—those variables that now
test insignificant but whose absence doesn’t
practically effect the other explanatory variables.
 One by one, drop & re-add each explanatory
variable, checking for changes in coefficients
(direction, size) & p-values.
 The set of substantively meaningful &
statistically stable variables that’s left is the
preliminary final model.


Slide 78

 Re-add all variables that are conceptually/theoretically relevant: this is the final, complete model.
 Conduct the diagnostic tests.
 Test nested models (keeping the previously discussed caveats in mind), conducting the diagnostic tests for each model.


Slide 79

Let’s work through an example of logistic regression, using toxic.dta.

 Toxic wastes contaminated the grounds of two public schools in a small Vermont town (Hamilton, Regression with Graphics, chap. 7).
 Some residents thought the schools should be closed until proven safe (which would involve high financial costs), while others thought the schools should stay open.
 Is the following model helpful in explaining the residents’ opinions?


Slide 80

 Let’s assume that we’ve done the preparatory analysis, as outlined above.

. logistic close female kids nodad educ lived contam hsc, nolog

 quantitative: educ (years); lived (years in town)
 categorical: female; kids (kids <19 years old living in town); contam (whether believes his/her own property was contaminated); hsc (whether respondent attended a town Health & Safety meeting)
 interaction: nodad (1=interaction of male with no children in town, else=0)


Slide 81

 Multicollinearity:
Our model includes two quantitative variables.
We can begin by fitting an OLS regression with
the binary outcome variable & then assess the
possibility of multicollinearity by issuing the
command ‘vif’.
. reg close nodad educ lived contam hsc
. vif

 There’s no collinearity problem.
 Try Stata’s ‘collin’ (download).


Slide 82

 Now we’ll test nested models:


Slide 83

. logistic close female kids nodad educ lived contam hsc, nolog

Logit estimates                             Number of obs =        153
                                            LR chi2(7)    =      68.16
                                            Prob > chi2   =     0.0000
Log likelihood = -70.524689                 Pseudo R2     =     0.3258

------------------------------------------------------------------------------
       close | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |    .949745   .5291234    -0.09   0.926     .3187016    2.830282
        kids |     .51139   .2892497    -1.19   0.236     .1687719    1.549545
       nodad |   .1079607   .1078655    -2.23   0.026      .015234    .7650969
        educ |   .8138141    .075845    -2.21   0.027      .677947    .9769102
       lived |   .9544288   .0162015    -2.75   0.006     .9231968    .9867174
      contam |   3.604136   1.734916     2.66   0.008     1.403007     9.25854
         hsc |   11.22341   5.720166     4.74   0.000     4.133313    30.47554
------------------------------------------------------------------------------


Slide 84

. estimates store full
. fitstat, saving(m1) brief


Slide 85

. logistic close nodad educ lived contam hsc, nolog

Logit estimates                             Number of obs =        153
                                            LR chi2(5)    =      66.56
                                            Prob > chi2   =     0.0000
Log likelihood = -71.326227                 Pseudo R2     =     0.3181

------------------------------------------------------------------------------
       close | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       nodad |   .1771164    .128458    -2.39   0.017     .0427468    .7338607
        educ |   .8214643   .0760781    -2.12   0.034     .6851041    .9849652
       lived |   .9611269   .0148794    -2.56   0.010     .9324019    .9907369
      contam |   3.663985   1.746363     2.72   0.006     1.439614    9.325267
         hsc |    9.76251   4.787244     4.65   0.000     3.733856    25.52498
------------------------------------------------------------------------------


Slide 86

. lrtest full

likelihood-ratio test                       LR chi2(2)  =       1.60
(Assumption: . nested in full)              Prob > chi2 =     0.4486

 We fail to reject the null hypothesis & thus we select the reduced model.
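
 The LR statistic can be reproduced by hand from the log likelihoods reported on Slides 83 & 85 (a sketch):

. display -2*(-71.326227 - (-70.524689))    // about 1.60, with 2 df for the 2 dropped variables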


Slide 87

. fitstat, using(m1) bic

                    Current       Saved    Difference
Model:             logistic    logistic
N:                      153         153             0
AIC:                  1.011       1.026        -0.016
AIC*n:              154.652     157.049        -2.397
BIC:               -596.822    -588.364        -8.458
BIC':               -41.407     -32.949        -8.458

Difference of 8.458 in BIC provides strong support for current model.
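
 The reported difference is just the gap between the two models’ BIC values, read against the Slide 62 criteria (a sketch):

. display -596.822 - (-588.364)    // -8.458: between 6 & 10 in absolute value, i.e. strong support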


Slide 88

Caution in Comparing the Logistic
Coefficients across Nested Models
 Keep in mind the pitfalls of comparing logistic
coefficients across nested models.
 Try experimenting with ‘suest’ and with
standardizing Y (‘listcoef, std’).


Slide 89

 Continuing, let’s display the coefficients as
percentage change in odds:

. quietly logistic close nodad educ lived contam hsc
. listcoef, percent help


Slide 90

. listcoef, percent help

logit (N=153): Percentage Change in Odds

Odds of: close vs open
----------------------------------------------------------------------
       close |        b         z     P>|z|       %     %StdX   SDofX
-------------+--------------------------------------------------------
       nodad |  -1.73095    -2.387    0.017    -82.3     -47.9   0.3768
        educ |  -0.19667    -2.124    0.034    -17.9     -38.0   2.4315
       lived |  -0.03965    -2.561    0.010     -3.9     -48.9  16.9547
      contam |   1.29855     2.724    0.006    266.4      79.6   0.4510
         hsc |   2.27855     4.647    0.000    876.3     187.1   0.4628
----------------------------------------------------------------------


Slide 91

 Next we’ll look at model
specification:
. linktest, nolog


Slide 92

. linktest, nolog

Logit estimates                             Number of obs =        153
                                            LR chi2(2)    =      66.85
                                            Prob > chi2   =     0.0000
Log likelihood = -71.182064                 Pseudo R2     =     0.3195

------------------------------------------------------------------------------
       close |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   1.007788     .16741     6.02   0.000     .6796707    1.335906
      _hatsq |  -.0488738   .0922346    -0.53   0.596    -.2296503    .1319027
       _cons |    .082427   .2637352     0.31   0.755    -.4344844    .5993384
------------------------------------------------------------------------------

 There’s no problem at all.


Slide 93

 If linktest did test significant, we’d reconsider the model, such as whether or not the outcome variable is notably skewed & the possible need to add, transform, categorize, or drop explanatory variables.
 If the outcome variable were notably skewed, we’d consider alternatives such as complementary log-log (cloglog) or scobit regression (see Long & Freese; Pampel; the UCLA-ATS Stata web site; the Stata manuals).


Slide 94

 Let’s now consider summary measures of
model fit: ‘ldev’ & ‘estat gof, group(#)’. These
reflect cell patterns of observed versus fitted values
(i.e. residuals), as in contingency tables.
 We want the tests to test insignificant so that we
can fail to reject the null hypothesis that the model
correctly predicts the cell-pattern of the data.
 Covariate pattern: the set of values that
corresponds to each combination of levels of the
model’s covariates (i.e. explanatory variables); i.e.
a configuration of explanatory variables & their
values.


Slide 95

. ldev

Logistic model deviance goodness-of-fit test

      number of observations =       153
number of covariate patterns =       132
    deviance goodness-of-fit =    129.47
          degrees of freedom =       126
                 Prob > chi2 =    0.3980


Slide 96

. estat gof, group(10) table

Logistic model for close, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)

  number of observations =       153
        number of groups =        10
 Hosmer-Lemeshow chi2(8) =      7.98
             Prob > chi2 =    0.4359

Adjust the groups (to no fewer than 6) so that the number of observations is relatively large (5+) in each cell.



Slide 97

 If the summary measures of model fit did test significant, then we’d reconsider the model: re-check contingency tables of the outcome variable and categorical explanatory variables for zero cells & cells with <5 obs., & take remedial action; re-check whether complementary log-log (cloglog) or scobit, or some other binary-outcome model, should be used; & re-consider the explanatory variables.


Slide 98

 While summary indicators of model fit are
helpful, they overlook information about a
model’s various components.
 E.g., even when the overall model fits the
data, there may be some interesting deviation
from fit for a portion of the observations.
 So, before concluding that a model fits, we
need to inspect the residual diagnostics for
the covariate patterns.


Slide 99

 Regarding the diagnostics for individual observations & model fit: dd=deviance (i.e. residuals for y); hat=leverage (i.e. x-outliers); dx2=chi2 measure of observation fit; db=influence of each x observation on the beta coefficients in general; & n=identities of the covariate patterns.


Slide 100

 We begin by predicting the probability of
outcome variable Y , followed by predicting the
residual diagnostic indicators:
. predict p if e(sample)
. predict dd if e(sample), dd
. predict h if e(sample), hat
. predict db if e(sample), db
. predict dx2 if e(sample), dx2
. predict n if e(sample), n


Slide 101

. su p-n

    Variable |     Obs        Mean    Std. Dev.       Min        Max
-------------+-------------------------------------------------------
           p |     153    .4313725    .3060614   .0111418   .9711162
          dd |     153    1.066321    1.282536    .022654   6.885444
         hat |     153    .0469299    .0337735    .010832   .1845852
          db |     153    .0541295    .0922495   .0001247   .6595063
         dx2 |     153    1.113449    2.926733   .0113907   29.19545
           n |     153    64.32026     37.9672          1        132

 Is dx2>4? Is dd or db>1? These levels are rough indicators only (see Hosmer & Lemeshow).
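
 One way to pull out the covariate patterns that those rough cutoffs flag (a sketch, using the variables created on Slide 100):

. * list the worst-fitting covariate patterns by the dx2 > 4 rough cutoff
. list n p dx2 dd db if dx2 > 4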


Slide 102

. scatter dx2 p, yline(4) ml(n)

[Scatter plot: H-L dX^2 (y-axis, 0 to 30) against Pr(close) (x-axis, 0 to 1), markers labeled with covariate-pattern number n & a reference line at 4; patterns 11, 60, & 53 sit above the line.]


Slide 103

. scatter dx2 p [w=db], yline(4)

[The same scatter of H-L dX^2 against Pr(close), with marker size weighted by db (influence).]


Slide 104

. scatter db p, yline(1) ml(n)

[Scatter plot: db (y-axis, 0 to .8) against Pr(close) (x-axis, 0 to 1), markers labeled with covariate-pattern number n & a reference line at 1; no pattern exceeds 1, though pattern 60 (db ≈ .66) stands out.]


Slide 105

. scatter db p [w=dx2], yline(1)

[The same scatter of db against Pr(close), with marker size weighted by dx2.]


Slide 106

. l n dd h dx2 db if n==60 | n==11 | n==53 | n==109

              n         dd          h        dx2         db
  11.        11   6.885444   .0141875   29.19545   .4201701
  66.        53   4.144958   .0520987   6.468378   .3555159
  73.        60   6.068201   .0347088   18.34161   .6595063
 129.       109   1.931174   .1845852   1.468668   .3324619


Slide 107

Explanatory Variables

 quantitative: educ (years); lived (years in town)
 categorical: female; kids (kids <19 years old living in town); contam (whether believes his/her own property was contaminated); hsc (whether respondent attended a town Health & Safety meeting)
 interaction: nodad (1=interaction of male with no children in town, else=0)


Slide 108

. l n close nodad educ lived contam hsc if n==60 | n==11 | n==53 | n==109

 What about these covariate patterns makes them outliers? That is, how do they deviate from the model’s predictions?

      +----------------------------------------------------+
      |   n   close   nodad   educ   lived   contam    hsc |
      |----------------------------------------------------|
 11.  |  11    open      no     12       1      yes    yes |
 66.  |  53    open      no     12      40      yes    yes |
 73.  |  60   close      no     12      68       no     no |
129.  | 109   close     yes      8       9      yes     no |
      +----------------------------------------------------+


Slide 109

 What is it about each of these covariate
patterns that the model can’t explain?
What insights do these deviations yield
concerning the model?
 Should we take remedial action of some
sort, such as the following?


Slide 110

Possible Remedial Action
 Check for & correct data errors.
 Possibly incorporate:
 omitted variables

 interactions
 log or other transformations
 Consider categorizing quantitative
explanatory variables.
 Consider other actions to temper outliers.


Slide 111

 For our immediate purposes, we’re
satisfied, on balance, that the model fits
well.


Slide 112

What about testing for unequal slopes via a more or less ‘full interaction’ model?

 Doing so is routine in OLS regression, but is hugely risky with a categorical dependent variable.
 This is essentially due to the non-linearity of logistic & other such models.


Slide 113

 On the pitfalls of exploring for unequal slopes (i.e. comparing coefficients across groups) with a categorical dependent variable, see Allison, “Comparing Logit and Probit Coefficients Across Groups” (1999); Norton et al., “Computing Interaction Effects and Standard Errors in Logit and Probit Models” (2004); & Hoetker, “Confounded Coefficients: Accurately Comparing Logit and Probit Coefficients Across Groups.”
 Here’s a lucid review of the literature by Richard Williams: http://www.nd.edu/~rwilliam/oglm/index.html


Slide 114

 STATA commands that address the problem: oglm (by Richard Williams, Notre Dame); inteff (by Norton et al., UNC Chapel Hill); complogit (by Glenn Hoetker, U. Illinois); vibl (by Michael Mitchell, ATS/UCLA)
 In STATA: findit…


Slide 115

A low-tech, common-sense approach to the problem:

 Estimate a separate model for each category (e.g., male vs. female).
 Compare the patterns of significance.
 When an explanatory variable is significant in both models, consider the magnitudes of their coefficients to be significantly different from each other if their confidence intervals do not overlap.


Slide 116

 Recalling the cautionary remarks,
here’s a nifty way to check out possible
interactions graphically via a
downloaded command (xi3):

. findit xi3 [then download]
. findit postgr3 [then download]

. help postgr3
. xi3: logistic close nodad educ lived contam
hsc, nolog


Slide 117

. postgr3 lived, by(contam)

[Line plot: predicted probability (y-axis, 0 to .8) against years in town (x-axis, 0 to 80), separate yhat_ lines for contam==no & contam==yes.]

 No interaction.


Slide 118

. postgr3 educ, by(contam)

[Line plot: predicted probability (y-axis, 0 to .8) against years education (x-axis, 5 to 20), separate yhat_ lines for contam==no & contam==yes.]

 No interaction.


Slide 119

 postgr3 can be used in OLS & other
kinds of regression analysis, too.


Slide 120

 Let’s next explore estimated
probabilities for particular levels of the
explanatory variables.
 We’ll begin by obtaining the baseline
estimated probabilities of Y=0, 1.


Slide 121

. prvalue, delta r(mean)

logit: Predictions for close

Pr(y=close|x):   0.4113   95% ci: (0.3141, 0.5159)
Pr(y=open|x):    0.5887   95% ci: (0.4841, 0.6859)

          nodad        educ       lived      contam         hsc
x=    .16993464   12.954248   19.267974   .28104575   .30718954

 prvalue: the estimated probability for an average respondent (i.e. holding each explanatory variable constant at its mean).


Slide 122

Strategy

 Determine the range of values for independent variables having relatively wide ranges of estimated probability (or .5-.95 in cases of extreme X-outliers).
 Find the extent to which change in one X-variable affects the estimated probability by allowing one X-variable to vary from its minimum to maximum while holding the others constant.
 Emphasize y/x relations with relatively wide ranges of estimated probability.


Slide 123

 Let’s next inspect the range of each X’s effects on estimated probabilities that respondents favor closing the town’s schools, holding each of the other explanatory variables constant at its mean:

. prchange, fromto r(mean) brief help

             from:      to:       dif:     from:      to:      dif:
             x=min    x=max   min->max       x=0      x=1      0->1
nodad       0.4839   0.1424    -0.3415    0.4839   0.1424   -0.3415
educ        0.7328   0.1488    -0.5841    0.8993   0.8800   -0.0193
lived       0.5904   0.0570    -0.5334    0.6000   0.5904   -0.0096
contam      0.3266   0.6399     0.3133    0.3266   0.6399    0.3133
hsc         0.2576   0.7721     0.5145    0.2576   0.7721    0.5145


Slide 124

 What are the estimated probabilities of ‘close’ associated with a cross-classification of 2-4 explanatory variables: e.g., respondent’s education level & opinion that her/his own property is or isn’t contaminated:

. prtab educ contam, r(mean) brief
logit: Predicted for close

 Note: better to categorize a continuous variable such as ‘educ’.


Slide 125

 highest yr |      contam
      compl |      no       yes
------------+-------------------
          6 |  0.6557    0.8746
          7 |  0.6100    0.8514
          8 |  0.5624    0.8248
          9 |  0.5135    0.7946
         10 |  0.4644    0.7606
         12 |  0.3691    0.6819
         13 |  0.3246    0.6378
         14 |  0.2831    0.5913
         15 |  0.2449    0.5431
         16 |  0.2104    0.4940
         17 |  0.1796    0.4451
         18 |  0.1524    0.3972
         20 |  0.1082    0.3077


Slide 126

 The estimated probabilities of ‘close’ according to whether respondent attended 2+ town meetings on the school problem & whether respondent’s property is reportedly contaminated:

. prtab hsc contam, r(mean) brief
logit: Predicted for close

           | believe own
 Attend 2+ | property/water
       HSC |     contam
 meetings? |      no       yes
-----------+-------------------
        no |  0.1941    0.4688
       yes |  0.7016    0.8960


Slide 127

 Next let’s compare particular
configurations of respondent traits:
. prvalue, x(educ=12 contam=0 hsc=0) delta r(mean)
brief save

. prvalue, x(educ=16 contam=0 hsc=0) delta r(mean)
brief dif


Slide 128

 Finally, let’s graph the relationship of
years living in town by years education
to estimated probability of opinion that
the contaminated schools should be
closed.
 We’ll use the ‘prgen’ command to
create pseudo-variable data to be
graphed.


Slide 129

. prgen lived, from(1) to(81) x(educ=10) gen(p10) n(11)
. prgen lived, from(1) to(81) x(educ=12) gen(p12) n(11)
. prgen lived, from(1) to(81) x(educ=14) gen(p14) n(11)
. prgen lived, from(1) to(81) x(educ=16) gen(p16) n(11)
. prgen lived, from(1) to(81) x(educ=18) gen(p18) n(11)
. prgen lived, from(1) to(81) x(educ=20) gen(p20) n(11)
. la var p10p1 "10 years educ"
. la var p12p1 "12 years educ"
. la var p14p1 "14 years educ"
. la var p16p1 "16 years educ"
. la var p18p1 "18 years educ"
. la var p20p1 "20 years educ"


Slide 130

. scatter p10p1 p12p1 p14p1 p16p1 p18p1 p20p1 p20x, c(l l l l l l) title("Opinion by Years Residence & Years Education", box bexpand) l2title("Pr(close)") ytitle("") xtitle("Years residence in town")


Slide 131

[Line graph: "Opinion by Years Residence & Years Education". Estimated Pr(close) (y-axis, 0 to .8) against years residence in town (x-axis, 0 to 80), one line each for 10, 12, 14, 16, 18, & 20 years of education.]


Slide 132

 See Long/Freese, Pampel, the UCLA-ATS
Stata website, & the Stata manuals for
alternatives to logistic regression.

 Recall that probit generally gives the same
results as logistic but gives results in probit
coefficients only.
 complementary log-log (cloglog in Stata)
is for highly skewed outcome variables, but
doesn’t give odds ratios. scobit, which is used
for the same purpose, does give odds ratios
but can be tricky to use.


Slide 133

 We have not discussed predicting
probabilities with curvilinear explanatory
variables.
 On this topic, see Long/Freese, chap.
8.


Slide 134

 Finally, models with binary outcome variables
are useful in exploring patterns of missing
values: is the pattern random or does it reflect
bias of some kind or another?

. u hsb2_miss, clear
. egen mvals=rmiss(_all)
. tab mvals
. gen mval=(mvals>=1 & mvals<.)
. tab mval


Slide 135

. tab mval female, col chi2

. tab mval ses, col chi2
. tab mval race, col chi2
. tab mval schtyp, col chi2
. ttest read, by(mval) unequal
. ttest math, by(mval) unequal

. xi: logistic mval female i.ses i.race i.schtyp,
nolog


Slide 136

 In sum, use logistic regression or an alternative to analyze the pattern of missing values.
 The topic of missing values has received increased attention, as its consequences for data analysis are commonly ignored but can be serious in terms of bias.
 See Allison, Missing Data (Sage Publications).


Slide 137

 And see Gary King et al., “Analyzing Incomplete Political Science Data…,” APSR (March 2001); & King et al., “Amelia: A Program for Imputing Missing Data.”
http://gking.harvard.edu/stats.shtml
 Note: King et al.’s downloadable-to-Stata program “Clarify” is an alternative way to obtain predicted probabilities.


Slide 138

 One last question: are there any
conditions under which you might start
out doing OLS regression & decide to
use logistic regression instead?


Slide 139

Remember: State-of-the-Art on Using Logistic or Probit Regression:

Glenn Hoetker, “The Use of Logit and Probit Models in Strategic Management Research: Critical Issues”
http://www.business.uiuc.edu/ghoetker/documents/Hoetker_logit_in_strategy.pdf