The Sources of Associational Life: A Cross


Count Models 2
Sociology 8811 Lecture 13
Copyright © 2007 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Paper #1 deadline coming up: March 8
• You should have a dataset by now
• You should have some simple models by now
• If not, you need to do something right away!!!
• Class schedule
• Today:
– Talk a bit about papers
– Wrap up count models
• Thursday: New topic – Event History Analysis
Review: Count Models
• Many dependent variables are counts: Nonnegative integers
• OLS is inappropriate: linearity and normality assumptions are violated
– Solution: Poisson & Negative Binomial models
• Coefficient interpretation is similar to logit
– Exponentiated coefficients show multiplicative effect on rate
– Poisson assumes there is no overdispersion
• Skewed variables may lead to overdispersion
• If overdispersion is identified, use neg binomial model
– Neg binomial model offers a chi-square (likelihood-ratio) test to identify overdispersion!
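Poisson regression assumes the variance of the count equals its mean. As a rough first look, the sketch below compares a sample's variance with its mean. The data are hypothetical, and this ratio is only an informal, unconditional screen, not the formal likelihood-ratio test on alpha that the negative binomial model provides.

```python
from statistics import mean, pvariance

def dispersion_ratio(counts):
    """Return variance / mean for a list of nonnegative integer counts.

    Under a Poisson distribution the variance equals the mean, so a ratio
    far above 1 is an informal hint of overdispersion. Covariates can
    absorb some of it, so this is only a rough screen.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; ratio is undefined")
    return pvariance(counts, mu=m) / m

# Hypothetical, skewed count data: many zeros and a few large values
hours = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 15, 40]
print(f"variance/mean = {dispersion_ratio(hours):.2f}")
```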
Negative Binomial Example: Web Use
• Note: Info on overdispersion (alpha) is provided at the bottom of the output

Negative binomial regression                      Number of obs   =       1552
                                                  LR chi2(5)      =      57.80
                                                  Prob > chi2     =     0.0000
Log likelihood = -4368.6846                       Pseudo R2       =     0.0066

------------------------------------------------------------------------------
       wwwhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3617049   .0634391     5.70   0.000     .2373666    .4860433
         age |  -.0109788   .0024167    -4.54   0.000    -.0157155    -.006242
        educ |   .0171875   .0120853     1.42   0.155    -.0064992    .0408742
   lowincome |  -.0916297   .0724074    -1.27   0.206    -.2335457    .0502862
      babies |  -.1238295   .0624742    -1.98   0.047    -.2462767   -.0013824
       _cons |   1.881168   .1966654     9.57   0.000     1.495711    2.266625
-------------+----------------------------------------------------------------
    /lnalpha |   .2979718   .0408267                       .217953    .3779907
-------------+----------------------------------------------------------------
       alpha |   1.347124   .0549986                      1.243529    1.459349
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 8459.61 Prob>=chibar2 = 0.000

Alpha is clearly > 0! Overdispersion is evident; LR test p < .05
You should not use Poisson regression in this case
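To read the nbreg coefficients above on the rate scale, exponentiate them: exp(b) is the multiplicative effect of a one-unit change on the expected count, and 100 * (exp(b) - 1) is the same effect as a percent change. A small sketch using the coefficients from this output (Stata's nbreg can also report exp(b) directly with the irr option):

```python
import math

# Coefficients copied from the nbreg output above (outcome: wwwhr)
coefs = {
    "male": 0.3617049,
    "age": -0.0109788,
    "educ": 0.0171875,
    "lowincome": -0.0916297,
    "babies": -0.1238295,
}

def rate_ratio(b):
    """Multiplicative effect of a one-unit increase in x on the expected count."""
    return math.exp(b)

def percent_change(b):
    """The same effect expressed as a percent change in the expected count."""
    return 100.0 * (math.exp(b) - 1.0)

for name, b in coefs.items():
    print(f"{name:>10}: rate x {rate_ratio(b):.3f} ({percent_change(b):+.1f}%)")
```

For example, men's expected weekly web hours come out roughly 44% higher than women's, net of the other variables, and each additional year of age lowers the expected count by about 1%.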
General Remarks
• It is often useful to try both Poisson and Negative Binomial models
• The latter allows you to test for overdispersion
• Use an LR test on alpha (α) to guide model choice
– If you don’t suspect overdispersion and alpha appears to be zero, use Poisson regression
• It makes fewer assumptions
– e.g., it does not assume a gamma-distributed error term, as the negative binomial model does
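The LR test compares the log likelihood of the Poisson model (alpha fixed at 0) with that of the negative binomial model. The sketch below assumes you already have both log likelihoods (the ones used here are hypothetical) and reproduces the arithmetic behind the chibar2(01) p-value that Stata prints; it illustrates the calculation and is not Stata's own code.

```python
import math

def lr_statistic(ll_restricted, ll_full):
    """LR statistic: 2 * (LL of full model - LL of restricted model).

    For the alpha test, the restricted model is Poisson (alpha = 0) and the
    full model is the negative binomial.
    """
    return 2.0 * (ll_full - ll_restricted)

def chibar2_01_pvalue(stat):
    """p-value for a 50:50 mixture of chi2(0) and chi2(1).

    alpha cannot be negative, so alpha = 0 lies on the boundary of the
    parameter space and the reference distribution is this mixture. The
    p-value is half the chi2(1) upper-tail probability, and for 1 df
    P(X > x) = erfc(sqrt(x / 2)).
    """
    if stat <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log likelihoods, just to show the arithmetic
stat = lr_statistic(ll_restricted=-105.0, ll_full=-100.0)  # 10.0
print(stat, chibar2_01_pvalue(stat))

# Statistic reported in the nbreg output for web use
print(chibar2_01_pvalue(8459.61))  # effectively zero
```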
Example: Labor Militancy (Isaac & Christiansen 2002)
• Note: Results are presented as % change
Zero-Inflated Poisson & NB Reg
• If the outcome variable has many zero values, it tends to be highly skewed
• Under those circumstances, nbreg works better than ordinary Poisson because of overdispersion
– But sometimes you have LOTS of zeros; even nbreg isn’t sufficient
• The model under-predicts zeros and doesn’t fit well
– Examples:
• # violent crimes committed by a person in a year
• # of wars a country fights per year
• # of foreign subsidiaries of firms.
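One way to see what "under-predicts zeros" means is to compute the probability of a zero that each model implies for a case with a given expected count. The sketch below assumes the negative binomial form with Var(Y) = mu + alpha * mu^2 (the "mean dispersion" form); mu and alpha are hypothetical values. If the observed share of zeros is much larger than even the negative binomial implies, a zero-inflated model may be worth trying.

```python
import math

def poisson_zero_prob(mu):
    """P(Y = 0) for a Poisson count with mean mu."""
    return math.exp(-mu)

def negbin_zero_prob(mu, alpha):
    """P(Y = 0) for a negative binomial count with mean mu and
    Var(Y) = mu + alpha * mu**2; equals (1 + alpha*mu) ** (-1/alpha)."""
    return math.exp(-math.log1p(alpha * mu) / alpha)

mu = 3.0      # hypothetical expected count for one case
alpha = 1.35  # hypothetical dispersion, similar in size to the nbreg estimate above
print(f"Poisson P(0) = {poisson_zero_prob(mu):.3f}")
print(f"NB      P(0) = {negbin_zero_prob(mu, alpha):.3f}")
```

As alpha shrinks toward 0 the negative binomial zero probability approaches the Poisson one, which is why extra dispersion lets nbreg absorb some, but not unlimited, excess zeros.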
Zero-Inflated Poisson & NB Reg
• Logic of zero-inflated models: Assume two types of cases (groups) in your sample
• Type A: Always zero – no probability of a non-zero value
• Type ~A: Non-zero chance of a positive count value
– Probability is variable, but not zero
– 1. Use logit to model group membership
– 2. Use Poisson or nbreg to model counts for those in group ~A
– 3. Compute probabilities based on those results
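The three steps can be combined into a single probability formula. A minimal sketch for the zero-inflated Poisson, assuming the logit step has produced pi (the probability of being in group A) and the count step has produced mu (the expected count for group ~A); the numbers in the usage lines are hypothetical.

```python
import math

def logistic(z):
    """Inverse logit: turns the logit linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def zip_prob(k, pi, mu):
    """P(Y = k) under a zero-inflated Poisson model.

    pi: probability of being in the always-zero group A (step 1, logit)
    mu: Poisson mean for group ~A (step 2, count model)
    """
    poisson_k = math.exp(-mu) * mu**k / math.factorial(k)
    if k == 0:
        # Zeros come from both groups: all of A plus the Poisson zeros of ~A
        return pi + (1.0 - pi) * poisson_k
    # Positive counts can only come from group ~A
    return (1.0 - pi) * poisson_k

pi = logistic(0.0)  # hypothetical logit linear predictor of 0, so pi = 0.5
mu = 2.0            # hypothetical expected count for group ~A
print([round(zip_prob(k, pi, mu), 3) for k in range(5)])
```

Replacing the Poisson term with a negative binomial probability gives the analogous zero-inflated negative binomial model.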
Zero-Inflated Poisson & NB Reg
• Example: Web usage at work
• More skewed than overall web usage. Why?
– Many people don’t have computers at work!
– So, web usage is zero for many
[Figure: histogram of hours per week using work computer for www (x-axis 0 to 60 hours; y-axis 0 to .3)]
Zero-Inflated Poisson & NB Reg
• Zero-inflated models in Stata
• zip = Poisson, zinb = negative binomial
• Commands accept two separate variable lists
– Variables that affect counts
• For those with non-zero counts
• Modeled with Poisson or NB regression
– Variables that predict membership in the “zero” group
• Modeled with logit
– Ex: zinb webatwork male age educ lowincome babies, inflate(male age educ lowincome babies)
ZINB Example: Web Hrs at Work
• “Inflate” output = logit for group membership

Zero-inflated negative binomial regression        Number of obs   =       1135
                                                  Nonzero obs     =        562
                                                  Zero obs        =        573

Inflation model = logit                           LR chi2(5)      =      13.25
Log likelihood  = -2239.23                        Prob > chi2     =     0.0212

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
webatwork    |
        male |   .2348353   .1298324     1.81   0.070    -.0196315    .4893021
         age |  -.0152071   .0053766    -2.83   0.005    -.0257451   -.0046692
        educ |   .0126503   .0265321     0.48   0.634    -.0393517    .0646523
   lowincome |  -.4183108   .2164324    -1.93   0.053    -.8425105    .0058889
      babies |   .0588977   .1385245     0.43   0.671    -.2126053    .3304008
       _cons |   1.703158   .4538886     3.75   0.000     .8135524    2.592763
-------------+----------------------------------------------------------------
inflate      |
        male |   .2630493    .340892     0.77   0.440    -.4050866    .9311853
         age |  -.0197401   .0195075    -1.01   0.312     -.057974    .0184939
        educ |  -.3601863    .071167    -5.06   0.000    -.4996711   -.2207015
   lowincome |    .844378   .4013074     2.10   0.035     .0578299    1.630926
      babies |   .4504404   .2502363     1.80   0.072    -.0400138    .9408947
       _cons |   4.137417   1.172503     3.53   0.000     1.839354     6.43548
------------------------------------------------------------------------------

• Education reduces the odds of a zero value (inflate model, predicting the zero group)
• But it doesn’t have an effect on the count for those that are non-zero (webatwork count model)
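To turn the two equations into predictions, plug covariate values into each: the inflate (logit) equation gives the probability of being in the always-zero group, and the count equation gives the expected count mu for everyone else, so the overall expected count is (1 - P(always zero)) * mu. The sketch below uses the coefficients from this output for a hypothetical respondent; the covariate values are illustrative assumptions, not from the lecture. Alpha is not needed for the expected count, although it would be for the probability of each specific count.

```python
import math

# Coefficients copied from the zinb output above
COUNT_EQ = {"_cons": 1.703158, "male": 0.2348353, "age": -0.0152071,
            "educ": 0.0126503, "lowincome": -0.4183108, "babies": 0.0588977}
INFLATE_EQ = {"_cons": 4.137417, "male": 0.2630493, "age": -0.0197401,
              "educ": -0.3601863, "lowincome": 0.844378, "babies": 0.4504404}

def linear_predictor(coefs, x):
    """Constant plus the sum of coefficient * covariate value."""
    return coefs["_cons"] + sum(coefs[name] * value for name, value in x.items())

def zinb_predictions(x):
    """Return (P(always zero), count mean if not always zero, overall expected count)."""
    p_zero = 1.0 / (1.0 + math.exp(-linear_predictor(INFLATE_EQ, x)))  # logit part
    mu = math.exp(linear_predictor(COUNT_EQ, x))  # NB count part (log link)
    return p_zero, mu, (1.0 - p_zero) * mu

# Hypothetical respondent: woman, age 40, 12 years of school, not low income, no babies
person = {"male": 0, "age": 40, "educ": 12, "lowincome": 0, "babies": 0}
p_zero_group, mu, expected = zinb_predictions(person)
print(f"P(always zero) = {p_zero_group:.3f}, mu = {mu:.2f}, E[Y] = {expected:.2f}")

# Same person with 16 years of school: education lowers the odds of the zero group
p_zero_16, _, _ = zinb_predictions({**person, "educ": 16})
print(f"P(always zero) with educ = 16: {p_zero_16:.3f}")
```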
Zero-Inflated Poisson & NB Reg
• Remarks
– ZINB produces an estimate of alpha
• Helps choose between zip & zinb
– Long and Freese (2006) have a helpful tool to compare the fit of count models: countfit
• See textbook
– Zero-inflated models seem very useful
• Count variables often have many zeros
• It is often reasonable to assume an “always zero” group
– But they are fairly new
• Not many examples in the literature
• Haven’t been widely scrutinized
Zero-truncated Poisson & NB reg
• Truncation – the absence of information about cases in some range of a variable
• Example: Suppose we study income based on data from tax returns…
– Cases with income below a certain value are not required to submit a tax return… so data is missing
• Example: Data on # crimes committed, taken from legal records
– Individuals with zero crimes are not evident in the data
• Example: An on-line survey of web use
– Individuals with zero web use are not in the data
• Poisson & NB have been adapted to address truncated data:
– Zero-truncated Poisson & Zero-truncated NB reg.
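A minimal sketch of the idea behind the zero-truncated Poisson: because zeros can never be observed, the ordinary Poisson probabilities are conditioned on Y > 0. The mean mu here is hypothetical; in the regression version it would be exp(xb) for each case.

```python
import math

def poisson_pmf(k, mu):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def zt_poisson_pmf(k, mu):
    """Zero-truncated Poisson: P(Y = k | Y > 0).

    The Poisson probabilities of the positive counts are rescaled by
    1 - P(Y = 0) so that they sum to 1.
    """
    if k < 1:
        return 0.0
    return poisson_pmf(k, mu) / (1.0 - math.exp(-mu))

def zt_poisson_mean(mu):
    """E[Y | Y > 0] = mu / (1 - exp(-mu)); always larger than mu."""
    return mu / (1.0 - math.exp(-mu))

mu = 1.5  # hypothetical Poisson mean
print(round(zt_poisson_mean(mu), 3))
```

The zero-truncated negative binomial follows the same logic, dividing by 1 minus the negative binomial probability of a zero instead.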
Example: Zero-truncated NB Reg
• Web use (zeros removed)

Zero-truncated negative binomial regression       Number of obs   =       1304
Dispersion     = mean                             LR chi2(5)      =      34.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -3653.162                        Pseudo R2       =     0.0047

------------------------------------------------------------------------------
       wwwhr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .3744582   .0874595     4.28   0.000     .2030407    .5458758
         age |  -.0114399   .0033817    -3.38   0.001    -.0180679   -.0048119
        educ |   .0081191    .016731     0.49   0.627     -.024673    .0409112
   lowincome |   .1899431   .1111248     1.71   0.087    -.0278574    .4077437
      babies |  -.1375942   .0860954    -1.60   0.110     -.306338    .0311496
       _cons |   1.533013   .2907837     5.27   0.000     .9630872    2.102938
-------------+----------------------------------------------------------------
    /lnalpha |   1.099164   .1385789                      .8275543    1.370774
-------------+----------------------------------------------------------------
       alpha |   3.001656   .4159661                      2.287717    3.938396
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 6857.67 Prob>=chibar2 = 0.000

Coefficient interpretation works just like ordinary Poisson or NB regression.
Empirical Example 2
• Example: Haynie, Dana L. 2001. “Delinquent Peers Revisited: Does Network Structure Matter?” American Journal of Sociology 106(4): 1013-1057.