Notes on Panel Data Models with Discrete Outcomes

Transcript Notes on Panel Data Models with Discrete Outcomes

Longitudinal and Multilevel Methods for Models with Discrete Outcomes with Parametric and Non-Parametric Corrections for Unobserved Heterogeneity
David K. Guilkey
Focus of this talk:
Binary dependent variables
Unordered categorical dependent variables
Models will be logit-based; probit, Poisson, and negative binomial models are not discussed, although Stata has estimators for these as well
Empirical example uses data from the Indonesian Family Life Survey (IFLS):
Two outcomes:
Binary indicator for whether the respondent uses contraception
Unordered categorical variable for method choice
Data Set Overview
Four waves of data: 1993, 1997, 2000, and 2007
Individual-level information on fertility, education, and migration
Community- and facility-level data on health and family planning providers
Data from 321 enumeration areas, which we treat as communities
IFLS Longitudinal Sample Size (columns: cohort of initial participation)
Survey Year | Wave 1 Cohort | Wave 2 Cohort | Wave 3 Cohort | Wave 4 Cohort | Total observations
1993 | 3520 | | | |
1997 | 2873 | 2207 | | |
2000 | 2684 | 1742 | 1466 | |
2007 | 1498 | 1152 | 933 | 2287 |
Total | 10575 | 5101 | 2399 | 2287 | 20362
IFLS Summary Statistics
Variable | Mean | S.D.
Dependent variables | |
Contraceptive use | .588 | .492
Method choice: no method | .412 |
Method choice: temporary modern | .397 |
Method choice: long-lasting modern | .168 |
Method choice: traditional | .023 |
Independent variables | |
highest ed grade school | 0.669 | 0.470
highest ed high school | 0.169 | 0.375
highest ed college | 0.049 | 0.217
age | 34.099 | 8.842
muslim | 0.891 | 0.312
number of posyandus | 7.507 | 6.251
Observations: 20,000
Basic Model for Longitudinal Logit:
$$\ln\left[\frac{P(Y_{ti}=1\mid\mu_i)}{P(Y_{ti}=0\mid\mu_i)}\right] = X_{ti}\beta + \delta P_{ti} + Z_i\gamma + \rho\mu_i$$
Where:
Yti: observed binary outcome for respondent i in time period t
Xti: time varying explanatory variables (age and education level)
Pti: time varying program variable (posyandus)
Zi: time invariant regressors (Muslim)
i=1,2,…N (individuals)
t=1,2,…Ti (observations per individual -- unbalanced panel)
Assumptions:
$\mu_i \sim N(0,1)$ for the parametric logit in Stata (xtlogit, melogit, and one variant of GLLAMM)
and:
$$E(X_{ti}\mu_i) = E(P_{ti}\mu_i) = E(Z_i\mu_i) = 0$$
Note that observations for the same individual will be correlated because of the time-invariant error, which is sometimes referred to as unobserved heterogeneity.
Given the assumptions, estimation options are:
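1. Simple logit yields incorrect SE’s, and its point estimates are consistent only for population-averaged (marginal) coefficients, which are attenuated toward zero relative to the individual-specific coefficients of the model above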
2. Simple logit with cluster option corrects SE’s
3. Parametric or semi-parametric maximum likelihood
The likelihood function for this model is derived as follows:
$$P(Y_{ti}=1\mid\mu_i) = \frac{e^{X_{ti}\beta + \delta P_{ti} + Z_i\gamma + \rho\mu_i}}{1 + e^{X_{ti}\beta + \delta P_{ti} + Z_i\gamma + \rho\mu_i}}$$
This is the probability that individual i at time t is using
contraception conditional on time invariant heterogeneity.
For individual i we observe T_i binary responses; for example, Y_i = (1, 0, 0, 1) for a woman who is observed for 4 time periods and used contraception at times 1 and 4.
Let Y_i be the set of observed outcomes for individual i; then:
$$P(Y_i) = \int \prod_{t=1}^{T_i} P(Y_{ti}=1\mid\mu_i)^{Y_{ti}}\bigl(1-P(Y_{ti}=1\mid\mu_i)\bigr)^{1-Y_{ti}} f(\mu_i)\,d\mu_i$$
The joint probability has no closed form and must be approximated numerically (approximating the area under a curve). Under the normality assumption, the approximation method is Gauss-Hermite quadrature (Hermite integration).
Points:
1. Accuracy improves with more Hermite points, but execution time is longer.
2. More points are needed as T_i gets larger.
Hermite integration replaces the integral with a sum:
$$P(Y_i) \approx \sum_{m=1}^{M} w_m \prod_{t=1}^{T_i} P(Y_{ti}=1\mid\mu_m)^{Y_{ti}}\bigl(1-P(Y_{ti}=1\mid\mu_m)\bigr)^{1-Y_{ti}}$$
where the weights (w_m’s) and the mass points (μ_m’s) are known because of the assumption of normality.
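A minimal Python sketch of the Hermite sum for one individual, assuming the index enters as `xb + rho * mu` with mu standard normal, as in the model above. The function names and the example indices are hypothetical. NumPy's `numpy.polynomial.hermite.hermgauss` returns nodes and weights for the weight function exp(-x^2), so the nodes are rescaled by sqrt(2) and the weights by 1/sqrt(pi) to integrate against a N(0,1) density.

```python
import math

import numpy as np


def logistic(v):
    """Logit link: P(Y = 1) for a linear index v."""
    return 1.0 / (1.0 + math.exp(-v))


def gh_individual_likelihood(y, xb, rho, n_points=20):
    """Approximate P(Y_i) for one individual by Gauss-Hermite quadrature.

    y        : 0/1 outcomes for the individual's T_i observations
    xb       : fixed part of the index for each observation
               (X_ti*beta + delta*P_ti + Z_i*gamma), hypothetical here
    rho      : loading on the standard-normal heterogeneity mu_i
    n_points : number of quadrature points M
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    total = 0.0
    for x, w in zip(nodes, weights):
        mu = math.sqrt(2.0) * x            # mass point mu_m for N(0, 1)
        w_m = w / math.sqrt(math.pi)       # weight w_m; these sum to one
        prob = 1.0
        for y_t, v in zip(y, xb):          # product over the T_i periods
            p = logistic(v + rho * mu)
            prob *= p if y_t == 1 else 1.0 - p
        total += w_m * prob
    return total


# Hypothetical woman observed 4 times, using contraception at times 1 and 4
y_i = [1, 0, 0, 1]
xb_i = [0.5, 0.2, 0.1, 0.4]
for m in (3, 5, 20):
    print(m, gh_individual_likelihood(y_i, xb_i, rho=1.6, n_points=m))
```

Comparing a few values of M shows the accuracy versus run-time trade-off listed above; in the gllamm commands below, the number of points is what `nip()` sets.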
Alternative:
The discrete factor approximation searches over weights and mass
points along with the other parameters of the model.
A normalization must be imposed:
1. The weights sum to one.
2. Either set one mass point to zero (Fortran program) or set the mean of the distribution to zero (GLLAMM).
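A minimal sketch of the discrete factor likelihood for one individual, where the mass points and weights are parameters rather than fixed quadrature values. Mapping unconstrained parameters to weights through a softmax is an assumption made here for illustration (it is one common way to keep weights positive and summing to one, not necessarily what the Fortran program or GLLAMM does); all names and numbers are hypothetical. The test shows why a location normalization is needed: shifting every mass point by c and the intercept by -c leaves the likelihood unchanged.

```python
import math


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def weights_from_params(thetas):
    """Map unconstrained parameters to positive weights that sum to one."""
    top = max(thetas)
    exps = [math.exp(t - top) for t in thetas]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_factor_likelihood(y, xb, mass_points, weights):
    """P(Y_i) = sum_m w_m * prod_t P(y_ti | mu_m), with mu_m and w_m estimated."""
    total = 0.0
    for mu, w in zip(mass_points, weights):
        prob = 1.0
        for y_t, v in zip(y, xb):
            p = logistic(v + mu)
            prob *= p if y_t == 1 else 1.0 - p
        total += w * prob
    return total


# Hypothetical values: 3 points, first mass point normalized at zero
y_i = [1, 0, 0, 1]
xb_i = [0.5, 0.2, 0.1, 0.4]
points = [0.0, 2.0, -1.5]
w = weights_from_params([0.3, -0.5, 0.8])
print(discrete_factor_likelihood(y_i, xb_i, points, w))
```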
Simple Logit
logit cont_use posyandus age grade_school high_school college muslim
Logistic regression                             Number of obs   =      20000
                                                LR chi2(6)      =     494.70
                                                Prob > chi2     =     0.0000
Log likelihood = -13304.212                     Pseudo R2       =     0.0183

------------------------------------------------------------------------------
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0246126     .00295     8.34   0.000     .0188306    .0303946
         age |  -.0267579   .0016964   -15.77   0.000    -.0300828   -.0234329
grade_school |    .488657   .0469818    10.40   0.000     .3965744    .5807396
 high_school |   .3713791   .0569712     6.52   0.000     .2597175    .4830406
     college |   .3879987   .0789006     4.92   0.000     .2333564     .542641
      muslim |  -.0432766   .0468967    -0.92   0.356    -.1351924    .0486392
       _cons |   .7282074   .0909596     8.01   0.000     .5499298     .906485
------------------------------------------------------------------------------
Simple Logit with Corrected SE’s
logit cont_use posyandus age grade_school high_school college muslim, cluster(ind_id)

Logistic regression                             Number of obs   =      20000
                                                Wald chi2(6)    =     346.25
                                                Prob > chi2     =     0.0000
Log pseudolikelihood = -13304.212               Pseudo R2       =     0.0183

                              (Std. Err. adjusted for 9351 clusters in ind_id)
------------------------------------------------------------------------------
             |               Robust
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0246126   .0035385     6.96   0.000     .0176772     .031548
         age |  -.0267579   .0019687   -13.59   0.000    -.0306165   -.0228993
grade_school |    .488657   .0573944     8.51   0.000      .376166     .601148
 high_school |   .3713791   .0679893     5.46   0.000     .2381225    .5046357
     college |   .3879987   .0930589     4.17   0.000     .2056065    .5703909
      muslim |  -.0432766    .057396    -0.75   0.451    -.1557708    .0692176
       _cons |   .7282074   .1084511     6.71   0.000     .5156472    .9407677
------------------------------------------------------------------------------
Parametric Maximum Likelihood
gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id) family(binomial) link(logit) nip(20) ip(g) trace dot

log likelihood = -12661.672

------------------------------------------------------------------------------
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0316745   .0049243     6.43   0.000      .022023    .0413259
         age |  -.0378263   .0026797   -14.12   0.000    -.0430785   -.0325741
grade_school |    .651587   .0775775     8.40   0.000     .4995379    .8036361
 high_school |    .453883   .0939311     4.83   0.000     .2697814    .6379847
     college |   .4845458     .12741     3.80   0.000     .2348268    .7342648
      muslim |   .0000928   .0809837     0.00   0.999    -.1586322    .1588179
       _cons |   .9398335   .1461969     6.43   0.000     .6532929    1.226374
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------

***level 2 (ind_id)

    var(1): 2.6610493 (.1476163)
------------------------------------------------------------------------------
Semi-Parametric Maximum Likelihood
gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id) family(binomial) link(logit) nip(3) ip(f) trace dot

log likelihood = -12660.352

------------------------------------------------------------------------------
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0311694   .0048755     6.39   0.000     .0216135    .0407253
         age |  -.0379544    .002707   -14.02   0.000      -.04326   -.0326488
grade_school |   .6591399   .0788521     8.36   0.000     .5045927    .8136871
 high_school |   .4674408   .0945268     4.95   0.000     .2821716      .65271
     college |   .4973757   .1278639     3.89   0.000     .2467671    .7479843
      muslim |   .0008321   .0812183     0.01   0.992    -.1583529     .160017
       _cons |   1.020194   .1998693     5.10   0.000     .6284575    1.411931
------------------------------------------------------------------------------

Probabilities and locations of random effects
------------------------------------------------------------------------------

***level 2 (ind_id)

    loc1: -2.0306, 2.9674, .16649
  var(1): 2.780105
    prob: 0.2982, 0.1744, 0.5274
------------------------------------------------------------------------------
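The reported var(1) in this output is the variance of the estimated three-point distribution. A small sketch (function name hypothetical) recomputes it from the displayed, rounded locations and probabilities, and confirms that the mean is essentially zero, which is the GLLAMM normalization described earlier.

```python
def discrete_moments(locations, probs):
    """Mean and variance of a discrete distribution with the given mass points."""
    mean = sum(p * x for x, p in zip(locations, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(locations, probs))
    return mean, var


# loc1 and prob from the gllamm ip(f) output above (level 2, ind_id)
locs = [-2.0306, 2.9674, 0.16649]
probs = [0.2982, 0.1744, 0.5274]
mean, var = discrete_moments(locs, probs)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```

The result matches the reported 2.780105 up to the rounding of the displayed values, and is close to the normal-quadrature estimate of 2.661.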
Multilevel Panel Models
Basic Form of the model:
$$\ln\left[\frac{P(Y_{tij}=1\mid\mu_{ij},\lambda_j)}{P(Y_{tij}=0\mid\mu_{ij},\lambda_j)}\right] = X_{tij}\beta + \delta P_{tij} + Z_j\gamma + \rho_1\mu_{ij} + \rho_2\lambda_j$$
where
j=1,2,…,J (communities)
i=1,2,…,Nj (individuals from community j)
t=1,2,…,Tij (observations for person i for community j)
Xtij: individual level variables (some could be fixed through time)
Ptij: time varying program variable
Zj: time invariant community level variables
μij: time invariant individual level unobserved heterogeneity
λj: time invariant community level unobserved heterogeneity
This model allows observations on the same individual to be
correlated and observations from the same community to be
correlated.
Assumptions:
$$E(X_{tij}\lambda_j) = E(P_{tij}\lambda_j) = E(Z_j\lambda_j) = 0$$
$$E(X_{tij}\mu_{ij}) = E(P_{tij}\mu_{ij}) = E(Z_j\mu_{ij}) = 0$$
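1. Simple logit yields incorrect SE’s, and its point estimates are consistent only for population-averaged (marginal) coefficients, which are attenuated toward zero relative to the coefficients of the multilevel model
2. Simple logit with the cluster option (clustering at the community level) corrects the SE’s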
3. Parametric or semi-parametric maximum likelihood
The maximum likelihood estimator is a straightforward extension of the longitudinal data model:
$$P(Y_{tij}=1\mid\mu_{ij},\lambda_j) = \frac{e^{X_{tij}\beta + \delta P_{tij} + Z_j\gamma + \rho_1\mu_{ij} + \rho_2\lambda_j}}{1 + e^{X_{tij}\beta + \delta P_{tij} + Z_j\gamma + \rho_1\mu_{ij} + \rho_2\lambda_j}}$$
You need the unconditional joint probability of the observed set of outcomes for the set of individuals in each community.
Conditional on the unobservables at the community level, the probability of the set of observed outcomes for person i from community j is:
$$P(Y_{ij}\mid\lambda_j) = \int \prod_{t=1}^{T_{ij}} P(Y_{tij}=1\mid\mu_{ij},\lambda_j)^{Y_{tij}}\bigl(1-P(Y_{tij}=1\mid\mu_{ij},\lambda_j)\bigr)^{1-Y_{tij}} f(\mu_{ij})\,d\mu_{ij}$$
The unconditional joint probability of the set of observed outcomes for all individuals in community j is then:
$$P(Y_j) = \int \prod_{i=1}^{N_j} P(Y_{ij}\mid\lambda_j)\, f(\lambda_j)\,d\lambda_j$$
We then either use Hermite integration or the discrete factor method to
approximate the integral.
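A minimal sketch of the nested Hermite approximation for one community, assuming independent standard-normal mu_ij and lambda_j with loadings `rho1` and `rho2` as in the model above. The function names and the two-woman community are hypothetical. The outer sum runs over lambda_j; inside it, each woman's probability is itself a Hermite sum over mu_ij, so the work per community grows with the square of the number of points.

```python
import math

import numpy as np


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def std_normal_rule(n_points):
    """Gauss-Hermite nodes and weights rescaled for a N(0, 1) density."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    return math.sqrt(2.0) * x, w / math.sqrt(math.pi)


def community_likelihood(community, rho1, rho2, n_points=10):
    """Approximate P(Y_j) for one community.

    community : list of (y, xb) pairs, one per individual i = 1..N_j
    rho1      : loading on individual heterogeneity mu_ij ~ N(0, 1)
    rho2      : loading on community heterogeneity lambda_j ~ N(0, 1)
    """
    nodes, weights = std_normal_rule(n_points)
    total = 0.0
    for lam, w_lam in zip(nodes, weights):          # outer integral: lambda_j
        prod_over_people = 1.0
        for y, xb in community:                     # product over i = 1..N_j
            p_person = 0.0
            for mu, w_mu in zip(nodes, weights):    # inner integral: mu_ij
                prob = 1.0
                for y_t, v in zip(y, xb):           # product over t = 1..T_ij
                    p = logistic(v + rho1 * mu + rho2 * lam)
                    prob *= p if y_t == 1 else 1.0 - p
                p_person += w_mu * prob
            prod_over_people *= p_person            # P(Y_ij | lambda_j)
        total += w_lam * prod_over_people
    return total


# Hypothetical community with two women (unbalanced: 3 and 2 observations)
comm = [([1, 0, 1], [0.3, 0.1, 0.2]), ([0, 0], [-0.4, -0.2])]
print(community_likelihood(comm, rho1=1.5, rho2=0.6))
```

With `rho2 = 0` the community term drops out and the result factors into a product of single-level individual likelihoods, which is what the check below confirms.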
Simple logit
logit cont_use posyandus age grade_school high_school college muslim
Logistic regression                             Number of obs   =      20000
                                                LR chi2(6)      =     494.70
                                                Prob > chi2     =     0.0000
Log likelihood = -13304.212                     Pseudo R2       =     0.0183

------------------------------------------------------------------------------
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0246126     .00295     8.34   0.000     .0188306    .0303946
         age |  -.0267579   .0016964   -15.77   0.000    -.0300828   -.0234329
grade_school |    .488657   .0469818    10.40   0.000     .3965744    .5807396
 high_school |   .3713791   .0569712     6.52   0.000     .2597175    .4830406
     college |   .3879987   .0789006     4.92   0.000     .2333564     .542641
      muslim |  -.0432766   .0468967    -0.92   0.356    -.1351924    .0486392
       _cons |   .7282074   .0909596     8.01   0.000     .5499298     .906485
------------------------------------------------------------------------------
Simple Logit with Corrected SE’s
logit cont_use posyandus age grade_school high_school college muslim, cluster(com_id)
Logistic regression                             Number of obs   =      20000
                                                Wald chi2(6)    =     263.28
                                                Prob > chi2     =     0.0000
Log pseudolikelihood = -13304.212               Pseudo R2       =     0.0183

                               (Std. Err. adjusted for 313 clusters in com_id)
------------------------------------------------------------------------------
             |               Robust
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0246126   .0052652     4.67   0.000      .014293    .0349322
         age |  -.0267579   .0022948   -11.66   0.000    -.0312555   -.0222603
grade_school |    .488657   .0796778     6.13   0.000     .3324914    .6448226
 high_school |   .3713791   .0929568     4.00   0.000     .1891871     .553571
     college |   .3879987   .1057477     3.67   0.000      .180737    .5952603
      muslim |  -.0432766   .1257938    -0.34   0.731    -.2898279    .2032747
       _cons |   .7282074   .1919567     3.79   0.000     .3519792    1.104436
------------------------------------------------------------------------------
Parametric Maximum Likelihood
gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id com_id) family(binomial) link(logit) nip(20) ip(g) trace dot

number of level 1 units = 20000
number of level 2 units = 9394
number of level 3 units = 313

gllamm model

log likelihood = -12548.522

------------------------------------------------------------------------------
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0228368   .0069701     3.28   0.001     .0091757     .036498
         age |   -.037996   .0026758   -14.20   0.000    -.0432405   -.0327516
grade_school |   .6367873   .0786581     8.10   0.000     .4826202    .7909543
 high_school |   .4122478   .0975244     4.23   0.000     .2211036    .6033921
     college |   .4165882   .1299495     3.21   0.001     .1618919    .6712844
      muslim |   .0376821   .1052797     0.36   0.720    -.1686623    .2440266
       _cons |    1.00658   .1701569     5.92   0.000     .6730791    1.340082
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------

***level 2 (ind_id)

    var(1): 2.2860509 (.13570515)

***level 3 (com_id)

    var(1): .34625941 (.04611334)
------------------------------------------------------------------------------
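One way to read the two variance estimates is through latent-scale intraclass correlations. The sketch below assumes the usual latent-variable convention for logit models (the level-1 logistic error has variance pi^2/3) and independent individual and community effects, as in the model above; the function name is hypothetical.

```python
import math


def latent_iccs(var_individual, var_community):
    """Latent-scale intraclass correlations for a 3-level random-intercept logit.

    Returns (correlation for two observations on different women in the same
    community, correlation for two observations on the same woman).
    """
    level1 = math.pi ** 2 / 3.0   # variance of the standard logistic error
    total = var_individual + var_community + level1
    same_community = var_community / total
    same_woman = (var_individual + var_community) / total
    return same_community, same_woman


# var(1) estimates from the gllamm output above: level 2 (ind_id), level 3 (com_id)
icc_comm, icc_woman = latent_iccs(2.2860509, 0.34625941)
print(f"same community: {icc_comm:.3f}, same woman: {icc_woman:.3f}")
```

Under these assumptions, roughly 6% of the latent-scale variance lies between communities, while repeated observations on the same woman share about 44% of it.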
Non-parametric Maximum Likelihood
gllamm cont_use posyandus age grade_school high_school college muslim, i(ind_id com_id) family(binomial) link(logit) nip(3) ip(f) trace dot

number of level 1 units = 20000
number of level 2 units = 9394
number of level 3 units = 313

gllamm model

log likelihood = -12546.725

------------------------------------------------------------------------------
    cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   posyandus |   .0208781   .0067149     3.11   0.002     .0077171    .0340391
         age |   -.037949   .0027013   -14.05   0.000    -.0432435   -.0326546
grade_school |    .637163   .0795007     8.01   0.000     .4813445    .7929815
 high_school |   .4185102    .097197     4.31   0.000     .2280075    .6090129
     college |   .4177381   .1293205     3.23   0.001     .1642745    .6712016
      muslim |  -.0883427   .1015703    -0.87   0.384    -.2874169    .1107315
       _cons |   1.164577   .1836346     6.34   0.000     .8046593    1.524494
------------------------------------------------------------------------------

Probabilities and locations of random effects
------------------------------------------------------------------------------

***level 2 (ind_id)

    loc1: -1.9001, 2.6348, .23361
  var(1): 2.2386873
    prob: 0.295, 0.1648, 0.5402

***level 3 (com_id)

    loc1: -1.4082, .65457, -.1872
  var(1): .33048135
    prob: 0.0826, 0.3421, 0.5753
------------------------------------------------------------------------------
Testing for Program Targeting
Programs may target high-need areas or areas where they expect residents to be receptive to family planning.
For example, family planning programs may concentrate on high-fertility areas.
As a result, simple methods may understate or overstate program impact.
Statistical implication of program targeting:
$$E(P_{tij}\lambda_j) \neq 0$$
Solutions:
Explicitly model program placement and estimate placement simultaneously with the program impact equations (Angeles, Guilkey, and Mroz, 1998)
Treat λ_j as fixed effects and include dummies for communities, or use some other fixed effects method (Gertler and Molyneaux, 1994)
Angeles, Guilkey, and Mroz show that the joint modeling approach yields smaller standard errors in Tanzania, but the two methods gave similar results.
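A linear analogue (not the logit model of these notes) can make the targeting problem concrete. In this hypothetical simulation, programs are placed where the community unobservable lambda_j is low (for example, high-fertility areas), so E(P lambda) is not zero. The program variable also varies within communities, standing in for the time variation in posyandus that lets a fixed effects estimator identify delta. All names and parameter values are illustrative assumptions.

```python
import random


def simulate(n_comm=200, n_obs=30, delta=0.5, seed=1):
    """Targeted placement: program intensity is higher where lambda_j is low."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_comm):
        lam = rng.gauss(0.0, 1.0)                  # community unobservable lambda_j
        for _ in range(n_obs):
            p = -lam + rng.gauss(0.0, 1.0)         # placement correlated with lambda_j
            y = delta * p + lam + rng.gauss(0.0, 1.0)
            rows.append((j, p, y))
    return rows


def slope(pairs):
    """OLS slope of y on p (with intercept)."""
    n = len(pairs)
    mp = sum(p for p, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((p - mp) * (y - my) for p, y in pairs)
    sxx = sum((p - mp) ** 2 for p, _ in pairs)
    return sxy / sxx


def within_slope(rows):
    """Community fixed effects: demean p and y within each community first."""
    groups = {}
    for j, p, y in rows:
        groups.setdefault(j, []).append((p, y))
    demeaned = []
    for obs in groups.values():
        mp = sum(p for p, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        demeaned.extend((p - mp, y - my) for p, y in obs)
    return slope(demeaned)


rows = simulate()
pooled = slope([(p, y) for _, p, y in rows])
fixed = within_slope(rows)
print(f"true delta = 0.5, pooled = {pooled:.3f}, fixed effects = {fixed:.3f}")
```

With this design the pooled estimate is pulled far below the true effect, while demeaning within communities removes lambda_j and recovers it.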
Example (province fixed effects) plus Hausman test for endogenous placement:
Efficient estimator under the null of no endogeneity (random effects):
melogit cont_use posyandus age grade_school high_school college muslim || prov_id: || ind_id:, intp(20)

Mixed-effects logistic regression               Number of obs      =     20000

--------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+---------------------------------------------
        prov_id |         16         11     1250.0       3116
         ind_id |       9507          1        2.1          4
--------------------------------------------------------------

Integration method: mvaghermite                 Integration points =        20

                                                Wald chi2(6)       =    468.70
Log likelihood =  5142.1765                     Prob > chi2        =    0.0000
-------------------------------------------------------------------------------
      cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+---------------------------------------------------------------
     posyandus |   .0260336   .0036554     7.12   0.000      .018869    .0331981
           age |  -.0279207   .0019312   -14.46   0.000    -.0317057   -.0241357
  grade_school |    .603515   .0539052    11.20   0.000     .4978628    .7091672
   high_school |   .4773575   .0663403     7.20   0.000     .3473329     .607382
       college |   .5571055   .0914372     6.09   0.000     .3778918    .7363192
        muslim |   .2747446   .0685264     4.01   0.000     .1404353    .4090539
         _cons |   .3397094   .1159908     2.93   0.003     .1123716    .5670472
---------------+---------------------------------------------------------------
prov_id        |
     var(_cons)|    .264253   .1427983                      .0916312    .7620726
---------------+---------------------------------------------------------------
prov_id>ind_id |
     var(_cons)|   2.878393   .2659968                      2.401537    3.449935
-------------------------------------------------------------------------------
LR test vs. logistic regression:     chi2(2) =  36892.78   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

estimates store efficient
Consistent estimator under the alternative (fixed effects):
xi: melogit cont_use posyandus age grade_school high_school college muslim i.prov_id || ind_id:, intp(20)

Integration method: mvaghermite                 Integration points =        20

                                                Wald chi2(21)      =    485.84
Log likelihood = -12574.068                     Prob > chi2        =    0.0000
-------------------------------------------------------------------------------
      cont_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+---------------------------------------------------------------
     posyandus |   .0279454   .0059279     4.71   0.000     .0163268    .0395639
           age |  -.0368115   .0026797   -13.74   0.000    -.0420635   -.0315595
  grade_school |   .6695194   .0781108     8.57   0.000      .516425    .8226137
   high_school |   .5039237   .0954551     5.28   0.000     .3168351    .6910123
       college |   .5033064   .1282156     3.93   0.000     .2520084    .7546044
        muslim |   .0992815   .1055572     0.94   0.347    -.1076069    .3061698
  _Iprov_id_13 |   .6485017   .1559989     4.16   0.000     .3427495    .9542539
             . |
             . |
  _Iprov_id_76 |  -.5329505   .8731102    -0.61   0.542    -2.244215    1.178314
         _cons |   .2371631   .1764231     1.34   0.179    -.1086197     .582946
---------------+---------------------------------------------------------------
ind_id         |
     var(_cons)|   2.564441   .1439339                      2.297299    2.862648
-------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) = 1223.43  Prob >= chibar2 = 0.0000

estimates store consistent
Hausman test results:
hausman consistent efficient

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |   consistent    efficient     Difference          S.E.
-------------+----------------------------------------------------------------
   posyandus |    .0279454     .0260336        .0019118        .0046667
         age |   -.0368115    -.0279207       -.0088908        .0018577
grade_school |    .6695194      .603515        .0660044         .056529
 high_school |    .5039237     .4773575        .0265662        .0686342
     college |    .5033064     .5571055       -.0537991        .0898804
      muslim |    .0992815     .2747446       -.1754631        .0802898
------------------------------------------------------------------------------
                 b = consistent under Ho and Ha; obtained from meglm
  B = inconsistent under Ha, efficient under Ho; obtained from meglm

    Test:  Ho:  difference in coefficients not systematic

                  chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       31.45
                Prob>chi2 =      0.0000
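The Hausman statistic is a quadratic form in the coefficient differences. The sketch below (function name hypothetical) computes it with NumPy using a generalized inverse. Because the output shows only sqrt(diag(V_b-V_B)), plugging in those values ignores the off-diagonal covariances; the result, about 29.7, therefore differs from the reported 31.45, which uses the full covariance matrices stored with each estimate.

```python
import numpy as np


def hausman_stat(diff, v_diff):
    """H = (b - B)' [V_b - V_B]^(-1) (b - B), using a generalized inverse."""
    d = np.asarray(diff, dtype=float)
    return float(d @ np.linalg.pinv(np.asarray(v_diff, dtype=float)) @ d)


# (b-B) and sqrt(diag(V_b-V_B)) from the hausman output above
diff = [0.0019118, -0.0088908, 0.0660044, 0.0265662, -0.0537991, -0.1754631]
se = [0.0046667, 0.0018577, 0.056529, 0.0686342, 0.0898804, 0.0802898]

# Only the diagonal of V_b - V_B is reported, so this ignores covariances
h_diag = hausman_stat(diff, np.diag(np.square(se)))
print(f"diagonal-only approximation: {h_diag:.2f}")
```

Even this rough version shows that the age and muslim coefficients drive most of the rejection.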
State Dependence and Unobserved Heterogeneity
Consider the simple model:
$$Y_{ti} = \mu_i + \varepsilon_{ti}$$
Note:
$$Y_{t-1,i} = \mu_i + \varepsilon_{t-1,i}$$
Implies:
$$\operatorname{corr}(Y_{ti}, Y_{t-1,i}) > 0$$
unless $\mu_i = 0$ (no time-invariant unobserved heterogeneity).
Now consider:
$$Y_{ti} = \theta Y_{t-1,i} + \varepsilon_{ti}$$
Now:
$$\operatorname{corr}(Y_{ti}, Y_{t-1,i}) > 0 \quad (\text{for } \theta > 0)$$
It is very difficult to distinguish between the two models.
The same problem would exist if the unobserved heterogeneity were at the community level.
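A small simulation of linear, continuous versions of the two models above (not the logit) shows why they are hard to tell apart. The variances and theta = 0.5 are assumptions chosen so that both processes imply a lag-one correlation of about 0.5; all function names are hypothetical.

```python
import random


def lag1_corr(panels):
    """Correlation between Y_ti and Y_(t-1)i, pooling all adjacent pairs."""
    pairs = [(s[t - 1], s[t]) for s in panels for t in range(1, len(s))]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n
    vx = sum((a - mx) ** 2 for a, _ in pairs) / n
    vy = sum((b - my) ** 2 for _, b in pairs) / n
    return cov / (vx * vy) ** 0.5


def heterogeneity_only(n=5000, periods=4, seed=2):
    """Y_ti = mu_i + eps_ti: no dependence on the lag at all."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)
        panels.append([mu + rng.gauss(0.0, 1.0) for _ in range(periods)])
    return panels


def state_dependence_only(n=5000, periods=4, theta=0.5, seed=3):
    """Y_ti = theta * Y_(t-1)i + eps_ti: no unobserved heterogeneity."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n):
        y = [rng.gauss(0.0, 1.0) / (1.0 - theta ** 2) ** 0.5]  # stationary start
        for _ in range(periods - 1):
            y.append(theta * y[-1] + rng.gauss(0.0, 1.0))
        panels.append(y)
    return panels


r_het = lag1_corr(heterogeneity_only())
r_sd = lag1_corr(state_dependence_only())
print(f"heterogeneity only: {r_het:.3f}, state dependence only: {r_sd:.3f}")
```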
The solution is to estimate a comprehensive model:
$$\ln\left[\frac{P(Y_{tij}=1\mid\mu_{ij},\lambda_j)}{P(Y_{tij}=0\mid\mu_{ij},\lambda_j)}\right] = \theta Y_{t-1,ij} + X_{tij}\beta + \delta P_{tij} + Z_j\gamma + \rho_1\mu_{ij} + \rho_2\lambda_j$$
Initial conditions problem:
We must either be able to set $Y_{1ij} = 0$ or jointly estimate the equation of interest with an equation of the form:
$$\ln\left[\frac{P(Y_{1ij}=1)}{P(Y_{1ij}=0)}\right] = X_{1ij}\beta_0 + \delta_0 P_{1ij} + Z_j\gamma_0 + \rho_{10}\mu_{ij} + \rho_{20}\lambda_j$$
Often it is reasonable to set the initial value, for example when observations start at the beginning of the woman’s childbearing years.
In this example it is not, since women enter the year-one data set at different ages.
Joint estimation is basically a simultaneous equations problem subject to the standard identification issues.
However, time-varying exogenous variables (age and education in this case) provide identification.
Example follows:
Estimation with no controls for unobserved heterogeneity and initial conditions:
DEPENDENT VARIABLE (LOGIT TYPE EQUATION): cont_use
UNCONDITIONAL RESULTS
LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1
RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD         SPD
one                 1.68868      0.1512    11.168   0.193E+00  -0.205E+04
posyandus           0.01820      0.0044     4.162   0.141E+01  -0.154E+06
grade_school        0.30074      0.0664     4.532   0.136E+00  -0.140E+04
high_school         0.38269      0.0873     4.385   0.254E-01  -0.293E+03
college             0.65160      0.1258     5.178   0.753E-02  -0.834E+02
age                -0.06683      0.0030   -22.524   0.747E+01  -0.302E+07
muslim             -0.11749      0.0710    -1.655   0.172E+00  -0.183E+04
cont_use_lag        1.55126      0.0481    32.257   0.193E+00  -0.142E+04
Estimation with Controls:
RESULTS FOR LOGIT-TYPE EQUATION -- NUMBER: 1
DEPENDENT VARIABLE (LOGIT TYPE EQUATION): cont_use
UNCONDITIONAL RESULTS
LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1
RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD         SPD
one                 0.58041      0.4776     1.215   0.506E-02  -0.263E+03
posyandus           0.03158      0.0134     2.350   0.297E-01  -0.197E+05
grade_school        0.38268      0.1948     1.965   0.370E-02  -0.200E+03
high_school         0.29902      0.2368     1.263   0.730E-03  -0.385E+02
college             0.48432      0.3531     1.372   0.200E-03  -0.101E+02
age                -0.03249      0.0089    -3.654   0.145E+00  -0.283E+06
muslim             -0.31757      0.2206    -1.439   0.476E-02  -0.230E+03
OMEGAcl             0.0       -- NORMALIZED AT ZERO
OMEGAcl             1.75070      0.3575     4.897  -0.249E-03  -0.244E+02
OMEGAcl             0.12941      0.3337     0.388   0.336E-02  -0.107E+03
OMEGAi              0.0       -- NORMALIZED AT ZERO
OMEGAi              8.78497      2.3058     3.810   0.536E-03  -0.941E-04
Estimation with Controls (continued)
RESULTS FOR LOGIT-TYPE EQUATION -- NUMBER: 2
DEPENDENT VARIABLE (LOGIT TYPE EQUATION): cont_use
UNCONDITIONAL RESULTS
LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1
RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD         SPD
one                 1.14288      0.2293     4.984   0.114E-02  -0.130E+04
posyandus           0.02006      0.0065     3.070   0.138E-01  -0.900E+05
grade_school        0.32652      0.0831     3.929   0.146E-02  -0.107E+04
high_school         0.38546      0.1017     3.790   0.161E-04  -0.259E+03
college             0.66607      0.1411     4.721   0.397E-03  -0.814E+02
age                -0.07099      0.0033   -21.261   0.376E-01  -0.195E+07
muslim             -0.02557      0.1082    -0.236   0.130E-02  -0.114E+04
cont_use_lag        1.37790      0.0633    21.774  -0.861E-03  -0.109E+04
OMEGAcl             0.0       -- NORMALIZED AT ZERO
OMEGAcl             1.00568      0.2002     5.024  -0.619E-04  -0.250E+03
OMEGAcl             0.58970      0.0829     7.114   0.140E-02  -0.572E+03
OMEGAi              0.0       -- NORMALIZED AT ZERO
OMEGAi              0.51559      0.1148     4.492  -0.123E-02  -0.281E+03

HETEROGENEITY INFORMATION
COMMUNITY SPECIFIC DISTRIBUTION
POINT #   PROBABILITY WEIGHT
1         0.31240645
2         0.17598629
3         0.51160726
INDIVIDUAL SPECIFIC DISTRIBUTION
POINT #   PROBABILITY WEIGHT
1         0.57174255
2         0.42825745
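Basic Model: Longitudinal Multinomial Logit with 3 Choices:
$$U_{1ti} = X_{ti}\beta_1 + \delta_1 P_{ti} + Z_i\gamma_1 + \rho_1\mu_i + \varepsilon_{1ti}$$
$$U_{2ti} = X_{ti}\beta_2 + \delta_2 P_{ti} + Z_i\gamma_2 + \rho_2\mu_i + \varepsilon_{2ti}$$
$$U_{3ti} = X_{ti}\beta_3 + \delta_3 P_{ti} + Z_i\gamma_3 + \rho_3\mu_i + \varepsilon_{3ti}$$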
The probability that individual i at time t makes choice 3 (for example) is:
$$P(U_{3ti} > U_{1ti} \text{ and } U_{3ti} > U_{2ti})$$
If we assume that the ε’s follow independent extreme value distributions and impose the normalization
$$\beta_1 = \delta_1 = \gamma_1 = \rho_1 = 0$$
(choice 1 serves as the base category, since only utility differences are identified), then:
$$\ln\left[\frac{P(Y_{ti}=k\mid\mu_i)}{P(Y_{ti}=1\mid\mu_i)}\right] = X_{ti}\beta_k + \delta_k P_{ti} + Z_i\gamma_k + \rho_k\mu_i$$
for k = 2, 3.
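Given the log odds of each choice relative to choice 1, the choice probabilities follow by fixing choice 1's index at zero and normalizing. A minimal sketch (function name and log-odds values hypothetical):

```python
import math


def mnl_probs(log_odds):
    """Choice probabilities from the log odds of choices 2..K versus choice 1.

    Choice 1's index is fixed at zero, matching the normalization
    beta_1 = delta_1 = gamma_1 = rho_1 = 0.
    """
    indices = [0.0] + list(log_odds)
    top = max(indices)                       # subtract max for numerical stability
    exps = [math.exp(v - top) for v in indices]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical log odds for choices 2 and 3 at given X_ti, P_ti, Z_i, mu_i
probs = mnl_probs([0.4, -0.3])
print([round(p, 3) for p in probs])
```

Taking log(probs[k] / probs[0]) returns the log odds that were passed in, which is the equation above read in reverse.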
The discrete factor model allows a more general pattern of correlation:
$$\ln\left[\frac{P(Y_{ti}=k\mid\mu_{km})}{P(Y_{ti}=1\mid\mu_{km})}\right] = X_{ti}\beta_k + \delta_k P_{ti} + Z_i\gamma_k + \mu_{km}$$
for m = 1, 2, …, M, with a common set of weights w_m. This allows for correlation in the μ’s across choices.
Unfortunately, GLLAMM estimates a needlessly restrictive version of the model:
Parametric: $\rho_2 = \rho_3$
If there are more than 3 choices, all ρ’s are restricted to be equal.
Non-parametric: $\mu_{2m} = \mu_{3m}$ for all m.
Extension to Multilevel Panel Model:
Parametric:
$$\ln\left[\frac{P(Y_{tij}=k\mid\mu_{ij},\lambda_j)}{P(Y_{tij}=1\mid\mu_{ij},\lambda_j)}\right] = X_{tij}\beta_k + \delta_k P_{tij} + Z_j\gamma_k + \rho_{1k}\mu_{ij} + \rho_{2k}\lambda_j$$
Semi-parametric:
$$\ln\left[\frac{P(Y_{tij}=k\mid\mu_{km},\lambda_{kn})}{P(Y_{tij}=1\mid\mu_{km},\lambda_{kn})}\right] = X_{tij}\beta_k + \delta_k P_{tij} + Z_j\gamma_k + \mu_{km} + \lambda_{kn}$$
The empirical example estimates a model with four choices:
1= Non use
2=Temporary Methods (pill, condom, injection)
3=Long Lasting Methods (IUD, sterilization)
4=Traditional Methods
We show the complete results for the most general model and then
report partial results for other models:
DEPENDENT VARIABLE (LOGIT TYPE EQUATION): new_meth
UNCONDITIONAL RESULTS

LOG ODDS OF CATEGORY 2 RELATIVE TO CATEGORY 1
RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD         SPD
Posyandus           0.01467      0.0102     1.437  -0.233E+00  -0.120E+06
age                -0.06430      0.0034   -18.883  -0.109E+01  -0.246E+07
grade_school        0.62999      0.0992     6.348  -0.219E-01  -0.163E+04
high_school         0.37284      0.1190     3.132  -0.678E-02  -0.351E+03
college             0.21653      0.1497     1.446  -0.171E-02  -0.908E+02
muslim              0.75840      0.1621     4.678  -0.300E-01  -0.201E+04
constant            0.18753      0.3810     0.492  -0.336E-01  -0.227E+04
Community Heterogeneity
OMEGAcl             0.0       -- NORMALIZED AT ZERO
OMEGAcl             0.27663      0.2840     0.974  -0.102E-01  -0.889E+03
OMEGAcl             1.07322      0.1974     5.437  -0.181E-01  -0.116E+04
Individual Heterogeneity
OMEGAi              0.0       -- NORMALIZED AT ZERO
OMEGAi              1.41424      0.1658     8.530  -0.154E-01  -0.627E+03
OMEGAi             -1.28669      0.1762    -7.302  -0.182E-01  -0.816E+03

LOG ODDS OF CATEGORY 3 RELATIVE TO CATEGORY 1
RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD         SPD
Posyandus           0.03780      0.0188     2.006  -0.972E-01  -0.699E+05
age                 0.01783      0.0066     2.702  -0.451E+00  -0.100E+07
grade_school        0.35402      0.1371     2.582  -0.838E-02  -0.555E+03
high_school         0.21220      0.1968     1.078  -0.270E-02  -0.130E+03
college             0.48971      0.2856     1.715  -0.892E-03  -0.433E+02
muslim             -0.92293      0.2182    -4.229  -0.108E-01  -0.648E+03
constant            1.59472      0.5112     3.120  -0.134E-01  -0.760E+03
Community Heterogeneity
OMEGAcl             0.0       -- NORMALIZED AT ZERO
OMEGAcl            -2.75334      0.4127    -6.671   0.524E-04  -0.139E+03
OMEGAcl            -0.57495      0.4030    -1.427  -0.700E-02  -0.430E+03
Individual Heterogeneity
OMEGAi              0.0       -- NORMALIZED AT ZERO
OMEGAi             -3.49818      0.3190   -10.965  -0.898E-03  -0.114E+03
OMEGAi             -4.50909      0.1863   -24.207  -0.399E-02  -0.153E+03

LOG ODDS OF CATEGORY 4 RELATIVE TO CATEGORY 1
RHS. VAR.       COEFFICIENT   STD. ERR.   T-SCORE         FPD         SPD
Posyandus           0.05196      0.0139     3.734   0.431E-01  -0.845E+05
age                 0.02778      0.0051     5.405   0.174E+00  -0.949E+06
grade_school        1.03143      0.2152     4.793   0.271E-02  -0.427E+03
high_school         1.55313      0.2487     6.246   0.140E-02  -0.117E+03
college             1.70120      0.2826     6.021   0.535E-03  -0.419E+02
muslim             -0.72360      0.1498    -4.831   0.375E-02  -0.584E+03
constant           -4.48901      0.4868    -9.221   0.490E-02  -0.711E+03
Community Heterogeneity
OMEGAcl             0.0       -- NORMALIZED AT ZERO
OMEGAcl            -0.36298      0.3285    -1.105   0.136E-02  -0.167E+03
OMEGAcl            -0.25288      0.3296    -0.767   0.231E-02  -0.235E+03
Individual Heterogeneity
OMEGAi              0.0       -- NORMALIZED AT ZERO
OMEGAi             -1.15542      0.5734    -2.015   0.681E-03  -0.104E+02
OMEGAi              0.09344      0.2834     0.330   0.355E-02  -0.517E+03

HETEROGENEITY INFORMATION
COMMUNITY SPECIFIC DISTRIBUTION
POINT #   PROBABILITY WEIGHT
1         0.25422817
2         0.25159735
3         0.49417448
INDIVIDUAL SPECIFIC DISTRIBUTION
POINT #   PROBABILITY WEIGHT
1         0.23493000
2         0.32619060
3         0.43887940
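Because each category has its own mass points while all categories share one set of weights, the fitted distribution implies a correlation between the individual factors of different categories. A short sketch (function name hypothetical) computes that implied correlation for categories 2 and 3 from the individual-level OMEGAi estimates and weights above. GLLAMM's non-parametric restriction μ_2m = μ_3m would force this correlation to one; the first-point-at-zero normalization does not affect it.

```python
def weighted_corr(a, b, w):
    """Correlation of two discrete factors that share mass-point weights w."""
    ma = sum(wi * ai for ai, wi in zip(a, w))
    mb = sum(wi * bi for bi, wi in zip(b, w))
    cov = sum(wi * (ai - ma) * (bi - mb) for ai, bi, wi in zip(a, b, w))
    va = sum(wi * (ai - ma) ** 2 for ai, wi in zip(a, w))
    vb = sum(wi * (bi - mb) ** 2 for bi, wi in zip(b, w))
    return cov / (va * vb) ** 0.5


# Individual-level OMEGAi estimates and weights from the output above
w_ind = [0.23493000, 0.32619060, 0.43887940]
omega_cat2 = [0.0, 1.41424, -1.28669]   # category 2 vs 1
omega_cat3 = [0.0, -3.49818, -4.50909]  # category 3 vs 1
print(f"corr(cat 2, cat 3) = {weighted_corr(omega_cat2, omega_cat3, w_ind):.2f}")
```

The estimated correlation is well below one, which is the pattern the GLLAMM restriction rules out.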
Comparison of Posyandu effects across estimation methods:

Multinomial Logit
Contrast | Coefficient | Std Error | Z statistic
2 versus 1 | .01127 | .0027824 | 4.05
3 versus 1 | .0348281 | .0031497 | 11.06
4 versus 1 | .03814 | .0062596 | 6.09

Multinomial Logit with community corrected SE’s
Contrast | Coefficient | Std Error | Z statistic
2 versus 1 | .01127 | .0042326 | 2.66
3 versus 1 | .0348281 | .0058663 | 5.94
4 versus 1 | .03814 | .0081891 | 4.66

Parametric Random effects Multilevel Multinomial Logit (GLLAMM restrictions)
Contrast | Coefficient | Std Error | Z statistic
2 versus 1 | .0114064 | .005755 | 1.98
3 versus 1 | .0348275 | .0059394 | 5.86
4 versus 1 | .0384119 | .0080835 | 4.75
Heterogeneity | Coefficient | Std Error | Z statistic
Community | .35132246 | .04663305 | 7.53
Individual | 2.3408954 | .13771143 | 17.00

Non-Parametric Random effects Multilevel Multinomial Logit (GLLAMM restrictions)
Contrast | Coefficient | Std Error | Z statistic
2 versus 1 | .0116273 | .0059026 | 1.97
3 versus 1 | .035 | .0060997 | 5.74
4 versus 1 | .0385815 | .0082056 | 4.70
Heterogeneity | Point 1 | Point 2 | Point 3
Community mass points | -1.3963 | .6822 | -.17666
Community weights | 0.083 | 0.3236 | 0.5934
Individual mass points | -1.9467 | 2.6613 | .24876
Individual weights | 0.2916 | 0.1622 | 0.5462

Non-Parametric Random effects Multilevel Multinomial Logit (Fortran)
Contrast | Coefficient | Std Error | Z statistic
2 versus 1 | 0.01467 | 0.0102 | 1.44
3 versus 1 | 0.03780 | 0.0188 | 2.01
4 versus 1 | 0.05196 | 0.0139 | 3.74
