Slide16-26 Principles of Econometrics, 3rd Edition


ECON 4551
Econometrics II
Memorial University of Newfoundland
Qualitative and Limited Dependent Variable
Models
Adapted from Vera Tabakova’s notes

16.1 Models with Binary Dependent Variables

16.2 The Logit Model for Binary Choice

16.3 Multinomial Logit

16.4 Conditional Logit

16.5 Ordered Choice Models

16.6 Models for Count Data

16.7 Limited Dependent Variables
Principles of Econometrics, 3rd Edition
Slide 16-2
When the dependent variable in a regression model is a count of the number
of occurrences of an event, the outcome variable is y = 0, 1, 2, 3, … These
numbers are actual counts, and thus different from the ordinal numbers of
the previous section. Examples include:

The number of trips to a physician a person makes during a year.

The number of fishing trips taken by a person during the previous year.

The number of children in a household.

The number of automobile accidents at a particular intersection during a
month.

The number of televisions in a household.

The number of alcoholic drinks a college student takes in a week.
Slide16-3
If Y is a Poisson random variable, then its probability function is
\[ f(y) = P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots \qquad (16.27) \]
where $y! = y \times (y-1) \times (y-2) \times \cdots \times 1$.
The “rate” $\lambda$ is the mean of Y, and it is also equal to the variance:
\[ E(Y) = \lambda = \exp(\beta_1 + \beta_2 x) \qquad (16.28) \]
This choice defines the Poisson regression model for count data.
Slide16-4
If we observe 3 individuals, one who faces no event and two who face two events each:
\[ L(\beta_1, \beta_2) = P(Y=0) \times P(Y=2) \times P(Y=2) \]
\[ \ln L(\beta_1, \beta_2) = \ln P(Y=0) + \ln P(Y=2) + \ln P(Y=2) \]
In general,
\[ \ln P(Y=y) = \ln\left(\frac{e^{-\lambda}\lambda^{y}}{y!}\right) = -\lambda + y\ln(\lambda) - \ln(y!) = -\exp(\beta_1+\beta_2 x) + y(\beta_1+\beta_2 x) - \ln(y!) \]
so that for N observations
\[ \ln L(\beta_1, \beta_2) = \sum_{i=1}^{N} \left\{ -\exp(\beta_1+\beta_2 x_i) + y_i(\beta_1+\beta_2 x_i) - \ln(y_i!) \right\} \]
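The log-likelihood above can be evaluated directly. The following is a minimal Python sketch, not the routine any package actually uses; the function name `poisson_loglik` and the three data points are hypothetical and only mirror the "one zero, two twos" illustration.

```python
import math

def poisson_loglik(beta1, beta2, x, y):
    """Sum over i of -exp(b1 + b2*x_i) + y_i*(b1 + b2*x_i) - ln(y_i!)."""
    total = 0.0
    for xi, yi in zip(x, y):
        index = beta1 + beta2 * xi            # this is ln(lambda_i)
        total += -math.exp(index) + yi * index - math.lgamma(yi + 1)
    return total

# Hypothetical sample: one individual with no event, two with two events each.
x = [0.0, 1.0, 1.0]
y = [0, 2, 2]
print(poisson_loglik(0.0, 0.5, x, y))
```

Maximum likelihood estimation searches for the pair (beta1, beta2) that makes this sum as large as possible; `math.lgamma(y + 1)` is used for ln(y!) to avoid computing large factorials.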
Slide16-5

\[ \widehat{E(y_0)} = \hat\lambda_0 = \exp(\hat\beta_1 + \hat\beta_2 x_0) \]
which is the expected number of occurrences, and
\[ \widehat{\Pr}(Y = y) = \frac{\exp(-\hat\lambda_0)\,\hat\lambda_0^{\,y}}{y!}, \quad y = 0, 1, 2, \ldots \]
which is the predicted probability of a certain number y of events for someone with characteristics x_0.
Slide16-6
\[ \frac{\partial E(y_i)}{\partial x_i} = \lambda_i \beta_2 \qquad (16.29) \]
You may prefer to express this marginal effect as a %:
\[ \frac{\%\Delta E(y_i)}{\partial x_i} = 100 \times \frac{\partial E(y_i)/\partial x_i}{E(y_i)} = 100\,\beta_2\,\% \]
Slide16-7
If there is a dummy involved, be careful. Remember:
\[ E(y_i) = \lambda_i = \exp(\beta_1 + \beta_2 x_i + \delta D_i) \]
\[ E(y_i \mid D_i = 0) = \exp(\beta_1 + \beta_2 x_i) \]
\[ E(y_i \mid D_i = 1) = \exp(\beta_1 + \beta_2 x_i + \delta) \]
so the percentage change in the conditional mean is
\[ 100\left[\frac{\exp(\beta_1+\beta_2 x_i+\delta) - \exp(\beta_1+\beta_2 x_i)}{\exp(\beta_1+\beta_2 x_i)}\right]\% = 100\left(e^{\delta}-1\right)\% \]
Which would be identical to the effect of a dummy in the log-linear model we saw under OLS
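To see why the dummy needs care, the exact percentage effect 100(e^δ − 1) can be compared with the naive reading 100·δ. This is a small Python sketch; the function names and the δ values are hypothetical and chosen only for illustration.

```python
import math

def pct_effect_dummy(delta):
    """Exact % change in E(y) when the dummy switches from 0 to 1."""
    return 100.0 * (math.exp(delta) - 1.0)

def pct_effect_naive(delta):
    """Naive reading of the coefficient as a % change (fine only for small delta)."""
    return 100.0 * delta

for delta in (0.05, 0.5, 1.0):
    print(delta, round(pct_effect_naive(delta), 1), round(pct_effect_dummy(delta), 1))
```

The two readings are close for small δ and drift apart quickly as δ grows, which is the reason for the warning.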
Slide16-8
Example on Olympic Medals
# Poisson Regression
open "c:\Program Files\gretl\data\poe\olympics.gdt"
smpl year = 88 --restrict
genr lpop = log(pop)
genr lgdp = log(gdp)
poisson medaltot const lpop lgdp
genr mft = exp($coeff(const)+$coeff(lpop)*median(lpop) \
+$coeff(lgdp)*median(lgdp))*$coeff(lgdp)
Which would give you the marginal effect of (log) GDP for the median country
genr predicted_medals = exp($coeff(const)+$coeff(lpop)*median(lpop) \
+$coeff(lgdp)*median(lgdp))
Which predicts 0.863 medals for a country with median GDP and population
Slide16-9
Extensions: overdispersion
Under a plain Poisson the mean of the count is assumed to be equal to
the variance (equidispersion)
This will often not hold
Real life data are often overdispersed
For example:
• a few women will have many affairs and many women will have few
• a few travelers will make many trips to a park and many will make few
• etc.
Slide16-10
Extensions: overdispersion
open C:\Users\rmartinezesp\aaa\bbbECONOMETRICS\Rober\4551\GROSMORNE.dta

. poisson visits Travelcost educat income

Iteration 0:   log likelihood = -1321.4696
Iteration 1:   log likelihood = -1321.4665
Iteration 2:   log likelihood = -1321.4665

Poisson regression                      Number of obs =     919
                                        LR chi2(3)    =   56.61
                                        Prob > chi2   =  0.0000
Log likelihood = -1321.4665             Pseudo R2     =  0.0210

      visits |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -.3299655   .0529402    -6.23   0.000   -.4337264  -.2262045
      educat | -.0307667    .026493    -1.16   0.246   -.0826921   .0211587
      income | -.0019933   .0007191    -2.77   0.006   -.0034027  -.0005839
       _cons |  .8765791   .1125493     7.79   0.000    .6559865   1.097172

. poisson persontrip Travelcost educat income, nolog

Poisson regression                      Number of obs =     919
                                        LR chi2(3)    =  671.71
                                        Prob > chi2   =  0.0000
Log likelihood = -2541.5165             Pseudo R2     =  0.1167

  persontrip |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost |         …          …        …       …           …  -.8716285
      educat | -.0206209   .0163568    -1.26   0.207   -.0526797   .0114379
      income | -.0014578   .0004404    -3.31   0.001    -.002321  -.0005946
       _cons |  2.144476   .0688666    31.14   0.000      2.0095   2.279452
Slide16-11
Extensions: overdispersion
open C:\Users\rmartinezesp\aaa\bbbECONOMETRICS\Rober\4551\GROSMORNE.dta

. poisson visits Travelcost educat income

Iteration 0:   log likelihood = -1321.4696
Iteration 1:   log likelihood = -1321.4665
Iteration 2:   log likelihood = -1321.4665

Poisson regression                      Number of obs =     919
                                        LR chi2(3)    =   56.61
                                        Prob > chi2   =  0.0000
Log likelihood = -1321.4665             Pseudo R2     =  0.0210

      visits |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -.3299655   .0529402    -6.23   0.000   -.4337264  -.2262045
      educat | -.0307667    .026493    -1.16   0.246   -.0826921   .0211587
      income | -.0019933   .0007191    -2.77   0.006   -.0034027  -.0005839
       _cons |  .8765791   .1125493     7.79   0.000    .6559865   1.097172

Summary statistics:

    Variable |   Obs       Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
  persontrip |   966   3.824017   6.264637          1         91
  Travelcost |   947   .7748112   .6820585   .0036767     7.8652
      income |   966   88.83793   41.94486         20        160
      educat |   938   4.144989   1.120433          1          6
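A quick informal check for overdispersion is to compare the sample variance of the count with its mean. The sketch below is Python and takes the summary statistics for persontrip to be a mean of 3.824017 and a standard deviation of 6.264637; the helper name `dispersion_ratio` is hypothetical. It looks only at unconditional moments, whereas the Poisson regression assumption concerns the mean and variance conditional on the regressors, so this is a rough diagnostic rather than a formal test.

```python
def dispersion_ratio(mean, std_dev):
    """Return variance / mean; equidispersion would give a value near 1."""
    return std_dev ** 2 / mean

# Sample moments of persontrip as read from the summary statistics.
ratio = dispersion_ratio(3.824017, 6.264637)
print(f"variance/mean = {ratio:.2f}")
```

A ratio far above 1 suggests that the plain Poisson is likely to understate the variability of the counts.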
Slide16-12
Extensions: overdispersion
open C:\Users\rmartinezesp\aaa\bbbECONOMETRICS\Rober\4551\GROSMORNE.dta
Slide16-13
Extensions: negative binomial
Under a plain Poisson the mean of the count is assumed to be equal to
the variance (equidispersion)
With overdispersed data the Poisson will inflate your t-ratios, making you think that your
model works better than it actually does
You can use a Negative Binomial model instead (nbreg) or even a Generalised
Negative Binomial (gnbreg), which will allow you to model the
overdispersion parameter as a function of covariates of your choice
You can also test for overdispersion, to check whether the problem is significant
Slide16-14
Extensions: negative binomial
. nbreg persontrip Travelcost educat income, nolog

Negative binomial regression            Number of obs =     919
                                        LR chi2(3)    =  236.04
Dispersion     = mean                   Prob > chi2   =  0.0000
Log likelihood = -2038.1155             Pseudo R2     =  0.0547

  persontrip |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -.7135986   .0489137   -14.59   0.000   -.8094676  -.6177295
      educat | -.0218888   .0248201    -0.88   0.378   -.0705353   .0267578
      income | -.0014357   .0006578    -2.18   0.029   -.0027249  -.0001465
       _cons |  1.994577      .1037    19.23   0.000    1.791329   2.197826
-------------+-------------------------------------------------------------
    /lnalpha | -1.190022   .0724583                    -1.332038  -1.048006
-------------+-------------------------------------------------------------
       alpha |  .3042145   .0220429                     .2639388   .3506361

Likelihood-ratio test of alpha=0:  chibar2(01) = 1006.80  Prob>=chibar2 = 0.000
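The likelihood-ratio statistic for alpha = 0 can be recovered from the two reported log-likelihoods (Poisson: -2541.5165, negative binomial: -2038.1155). The Python sketch below is only an illustration; the function name `lr_alpha_test` is hypothetical, and the p-value treats the statistic as a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom, which is how the chibar2(01) label is usually read.

```python
import math

def lr_alpha_test(ll_poisson, ll_negbin):
    """LR statistic for H0: alpha = 0 and its chibar2(01) p-value."""
    stat = 2.0 * (ll_negbin - ll_poisson)
    if stat <= 0.0:
        return stat, 1.0
    # P(chi2_1 > stat) = erfc(sqrt(stat / 2)); halve it for the boundary case.
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = lr_alpha_test(-2541.5165, -2038.1155)
print(round(stat, 2), p_value)
```

The statistic reproduces the reported 1006.80, so equidispersion is rejected decisively for the trip data.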
Slide16-15
Extensions: negative binomial
Slide16-16
Extensions: excess zeros
Often the number of zeros in the sample cannot be accommodated
properly by a Poisson or Negative Binomial model
Both would underpredict the zeros
There is said to be an “excess zeros” problem
You can then use hurdle models or zero-inflated (zero-augmented)
models to accommodate the extra zeros
Slide16-17
Extensions: excess zeros
Often the numbers of zeros in the sample cannot be accommodated
properly by a Poisson or Negative Binomial model
They would underpredict them too
nbvargr is a very useful command
[Figure: observed proportion, Poisson probability and negative binomial probability plotted against k (0 to 10); vertical axis: Proportion; mean = 3.296; overdispersion = 5.439]
Slide16-18
Extensions: excess zeros
You can then use hurdle models or zero inflated or zero augmented
models to accommodate the extra zeros
They will also allow you to have a different process driving the value of the
strictly positive count and whether the value is zero or strictly positive
EXAMPLES:
•Number of extramarital affairs versus gender
•Number of children before marriage versus religiosity
In the continuous case, we have similar models (e.g. Cragg’s Model) and an
example is that of size of Insurance Claims from fires versus the age of the
building
Slide16-19
Extensions: excess zeros
You can then use hurdle models or zero inflated or zero augmented
models to accommodate the extra zeros
Hurdle Models
A hurdle model is a modified count model in which there are two processes, one
generating the zeros and one generating the positive values. The two models are
not constrained to be the same. In the hurdle model a binomial probability model
governs the binary outcome of whether a count variable has a zero or a positive
value. If the value is positive, the "hurdle is crossed," and the conditional
distribution of the positive values is governed by a zero-truncated count model.
Example: smokers versus non-smokers, if you are a smoker you will smoke!
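The two-part structure of a hurdle model can be written down as a probability function. This is a minimal Python sketch assuming a Poisson for the positive counts; the names `hurdle_poisson_pmf` and `p_zero` and the parameter values are hypothetical, and in an estimated model both `p_zero` and `lam` would depend on covariates.

```python
import math

def poisson_pmf(y, lam):
    """Plain Poisson probability of observing y events."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def hurdle_poisson_pmf(y, p_zero, lam):
    """Binary part decides zero vs positive; positives follow a zero-truncated Poisson."""
    if y == 0:
        return p_zero
    truncated = poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated

# Hypothetical values: 60% never cross the hurdle, positive counts have rate 2.
print([round(hurdle_poisson_pmf(y, 0.6, 2.0), 4) for y in range(5)])
```

Because the zero probability is set by the binary part alone, the model can match any share of zeros regardless of the rate that drives the positive counts.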
Slide16-20
Extensions: excess zeros
Hurdle Models
In Stata Joseph Hilbe’s downloadable ado HPLOGIT will work, although it does
not allow for two different sets of variables, just two different sets of coefficients
Example: smokers versus non-smokers, if you are a smoker you will smoke!
Slide16-21
Extensions: excess zeros
You can then use hurdle models or zero inflated or zero augmented
models to accommodate the extra zeros
Zero-inflated models (initially suggested by D. Lambert) attempt to account for
excess zeros in a subtly different way.
In this model there are two kinds of zeros, "true zeros" and excess zeros.
Zero-inflated models also estimate two equations, one for the count model and
one for the excess zeros.
The key difference is that the count model now allows zeros. It is not a truncated
count model, but allows for “corner solutions”
Example: meat eaters (who sometimes just did not eat meat that week) versus
vegetarians who never ever do
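The mixture behind a zero-inflated Poisson can be sketched in a few lines of Python. The names `zip_pmf` and `pi_excess` and the parameter values are hypothetical; in an estimated model the excess-zero probability would come from the inflation equation (for example a logit) and the rate from the count equation.

```python
import math

def poisson_pmf(y, lam):
    """Plain Poisson probability of observing y events."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def zip_pmf(y, pi_excess, lam):
    """With probability pi_excess the outcome is a structural zero; otherwise Poisson."""
    from_count_process = (1.0 - pi_excess) * poisson_pmf(y, lam)
    if y == 0:
        return pi_excess + from_count_process   # two kinds of zeros add up
    return from_count_process

# Hypothetical values: 30% "vegetarians", meat eaters with rate 2 per week.
print(round(zip_pmf(0, 0.3, 2.0), 4), round(poisson_pmf(0, 2.0), 4))
```

Unlike the truncated count part of a hurdle model, the Poisson component here still contributes some of the observed zeros.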
Slide16-22
Extensions: excess zeros
webuse fish
We want to model how many fish are being caught by fishermen at a state park.
Visitors are asked how long they stayed, how many people were in the group,
whether there were children in the group and how many fish were caught.
Some visitors do not fish at all, but there is no data on whether a person fished or
not.
Some visitors who did fish did not catch any fish (and admitted it), so there are
excess zeros in the data because of the people that did not fish.
Slide16-23
Extensions: excess zeros
. histogram count, discrete freq
[Figure: histogram of count; vertical axis: Frequency. Lots of zeros!]
Slide16-24
Extensions: excess zeros (sample restricted to count<29)
. histogram count, discrete freq
[Figure: density histogram of count (0 to 25) with fitted gamma(0.21421, 8.5873); test statistic for gamma: z = -1.384, p-value = 0.16642. Lots of zeros!]
Slide16-25
Extensions: excess zeros number of affairs (Fair 1978)
. histogram count, discrete freq
[Figure: histogram of the number of affairs. Lots of zeros!]
We will showcase zero-inflated models using STATA now…
LIMDEP has an extra option to run this from the Poisson or
Negative Binomial dialogs
You would need to program it in GRETL using
its maximum likelihood routines (there is a ZIP
example in the PDF user’s guide)
Slide16-26
Extensions: excess zeros (greene22_2.gdt)
genr ANYAFFAIRS = ( Y>0)
. zip naffairs age male relig, inflate(age male relig) vuong nolog

Zero-inflated Poisson regression        Number of obs =     601
                                        Nonzero obs   =     150
                                        Zero obs      =     451
Inflation model = logit                 LR chi2(3)    =   29.67
Log likelihood  = -810.055              Prob > chi2   =  0.0000

    naffairs |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
naffairs     |
         age |   .015609   .0038029     4.10   0.000    .0081555   .0230625
        male | -.1598035   .0686006    -2.33   0.020   -.2942583  -.0253487
       relig | -.0971114   .0292688    -3.32   0.001   -.1544772  -.0397456
       _cons |  1.581638   .1577305    10.03   0.000    1.272492   1.890784
-------------+-------------------------------------------------------------
inflate      |
         age |  -.019041   .0104841    -1.82   0.069   -.0395895   .0015075
        male | -.1791471   .1948003    -0.92   0.358   -.5609488   .2026546
       relig |  .2884574   .0841492     3.43   0.001    .1235281   .4533867
       _cons |  .9322364   .3901503     2.39   0.017    .1675558   1.696917

Vuong test of zip vs. standard Poisson:  z = 11.66  Pr>z = 0.0000
Slide16-27
Extensions: excess zeros
. zinb naffairs age male relig, inflate(age male relig) vuong nolog

Zero-inflated negative binomial regression   Number of obs =     601
                                             Nonzero obs   =     150
                                             Zero obs      =     451
Inflation model = logit                      LR chi2(3)    =    8.92
Log likelihood  = -726.405                   Prob > chi2   =  0.0304

    naffairs |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
naffairs     |
         age |  .0258188   .0107692     2.40   0.017    .0047115    .046926
        male | -.2214886   .1660362    -1.33   0.182   -.5469135   .1039364
       relig | -.1472717   .0749567    -1.96   0.049   -.2941842  -.0003593
       _cons |  1.273196   .3874106     3.29   0.001    .5138849   2.032506
-------------+-------------------------------------------------------------
inflate      |
         age |  -.014892   .0113465    -1.31   0.189   -.0371308   .0073468
        male | -.2309299   .2091759    -1.10   0.270   -.6409071   .1790474
       relig |   .274744   .0904315     3.04   0.002    .0975014   .4519865
       _cons |  .6673066    .433002     1.54   0.123   -.1813618   1.515975
-------------+-------------------------------------------------------------
    /lnalpha | -.2743069   .2532933    -1.08   0.279   -.7707527   .2221388
-------------+-------------------------------------------------------------
       alpha |  .7600988   .1925279                     .4626647   1.248745

Vuong test of zinb vs. standard negative binomial:  z = 2.82  Pr>z = 0.0024
Slide16-28
Extensions: truncation
• Count data can be truncated too (usually at zero)
• So ztp and ztnb can accommodate that
• Example: you interview visitors at the recreational site, so they all
made at least that one trip
• In the continuous case we would have to use the truncreg
command
Slide16-29
Extensions: truncation
This model works much better and showcases the bias in the previous estimates:
. ztp persontrip Travelcost educat income, nolog

Zero-truncated Poisson regression       Number of obs =     919
                                        LR chi2(3)    =  885.68
                                        Prob > chi2   =  0.0000
Log likelihood = -2412.6552             Pseudo R2     =  0.1551

  persontrip |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -1.380461   .0571736   -24.15   0.000   -1.492519  -1.268403
      educat | -.0170332   .0175026    -0.97   0.330   -.0513376   .0172712
      income | -.0013521    .000473    -2.86   0.004   -.0022791  -.0004251
       _cons |  2.278878   .0728394    31.29   0.000    2.136116   2.421641

The estimated Consumer Surplus is now smaller
Slide16-30
Extensions: truncation
This model works much better and showcases the bias in the previous estimates:
• Now accounting for overdispersion
. ztnb persontrip Travelcost educat income, nolog

Zero-truncated negative binomial regression   Number of obs =     919
                                              LR chi2(3)    =  263.89
Dispersion     = mean                         Prob > chi2   =  0.0000
Log likelihood = -1866.326                    Pseudo R2     =  0.0660

  persontrip |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -1.079011    .068793   -15.68   0.000   -1.213843  -.9441795
      educat | -.0216377   .0322941    -0.67   0.503    -.084933   .0416576
      income | -.0016369   .0008563    -1.91   0.056   -.0033152   .0000413
       _cons |  2.015503   .1344308    14.99   0.000    1.752024   2.278983
-------------+-------------------------------------------------------------
    /lnalpha | -.6368613    .101849                    -.8364818  -.4372409
-------------+-------------------------------------------------------------
       alpha |    .52895    .053873                      .433232   .6458158

Likelihood-ratio test of alpha=0:  chibar2(01) = 1092.66  Prob>=chibar2 = 0.000
Slide16-31
Extensions: truncation and endogenous stratification
• Example: you interview visitors at the recreational site, so they all
made at least that one trip
• You interview patients at the doctors’ office about how often they
visit the doctor
• You ask people in George St. how often they go to George St…
• Then you are oversampling “frequent visitors” and biasing your
estimates, perhaps substantially
Slide16-32
Extensions: truncation and endogenous stratification
• Then you are oversampling “frequent visitors” and biasing your
estimates, perhaps substantially
• It turns out to be supereasy to deal with a Truncated and
Endogenously Stratified Poisson Model (as shown by Shaw, 1988):
simply run a plain Poisson on “Count-1” and that will work (in
STATA: poisson on the corrected count)
• It is more complex if there is overdispersion though
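The "Count-1" shortcut can be checked numerically. Under the assumption that an on-site sample picks each visitor with probability proportional to the number of trips y, the on-site density is y·f(y)/λ, and for a Poisson this equals the ordinary Poisson probability of y − 1. The Python sketch below verifies that identity; the function names and the rate 2.5 are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """Plain Poisson probability of observing y events."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def onsite_pmf(y, lam):
    """Density of y in an on-site sample: size-biased, so zeros never appear."""
    return y * poisson_pmf(y, lam) / lam

lam = 2.5  # hypothetical trip rate
for y in range(1, 6):
    print(y, round(onsite_pmf(y, lam), 6), round(poisson_pmf(y - 1, lam), 6))
```

The two columns coincide, which is why a plain Poisson fitted to the count minus one handles both truncation and endogenous stratification in the equidispersed case.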
Slide16-33
Extensions: truncation and endogenous stratification
•Supereasy to deal with a Truncated and Endogenously Stratified
Poisson Model
. poisson persontripminusone Travelcost educat income, nolog

Poisson regression                      Number of obs =     919
                                        LR chi2(3)    = 1071.95
                                        Prob > chi2   =  0.0000
Log likelihood = -2474.3262             Pseudo R2     =  0.1780

persontrip~e |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -1.657986   .0620722   -26.71   0.000   -1.779646  -1.536327
      educat | -.0202144   .0191574    -1.06   0.291   -.0577622   .0173333
      income | -.0016285   .0005184    -3.14   0.002   -.0026446  -.0006124
       _cons |  2.191885   .0792934    27.64   0.000    2.036473   2.347298

The estimated Consumer Surplus is now much smaller
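The slides do not state how Consumer Surplus is computed. A common result for count-data travel cost models with an exponential mean is that surplus per trip equals −1 divided by the travel cost coefficient; the Python sketch below assumes that formula, and the helper name `cs_per_trip` is hypothetical. The coefficients are the Travelcost estimates reported for the nbreg, ztnb, ztp and shifted Poisson fits, and the results are in whatever units Travelcost is measured in.

```python
def cs_per_trip(beta_travelcost):
    """Per-trip consumer surplus, assuming CS = -1 / beta for a negative price slope."""
    if beta_travelcost >= 0:
        raise ValueError("the travel cost coefficient must be negative")
    return -1.0 / beta_travelcost

# Travelcost coefficients reported in the outputs for each estimator.
estimates = {
    "nbreg": -0.7135986,
    "ztnb": -1.079011,
    "ztp": -1.380461,
    "poisson on count-1": -1.657986,
}
for model, beta in estimates.items():
    print(f"{model:>20s}: {cs_per_trip(beta):.3f}")
```

Under this assumption, a steeper (more negative) travel cost coefficient translates directly into a smaller surplus per trip, which matches the ordering noted on the slides.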
Slide16-34
Extensions: truncation and endogenous stratification
•Endogenously Stratified Negative Binomial Model (as shown by
Shaw, 1988; Englin and Shonkwiler, 1995):
. nbstrat persontrip Travelcost educat income, nolog

Negative Binomial with Endogenous Stratification   Number of obs =     919
                                                   Wald chi2(3)  =  283.49
Log likelihood = -1837.3183                        Prob > chi2   =  0.0000

  persontrip |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -1.152915   .0695958   -16.57   0.000   -1.289321   -1.01651
      educat | -.0229483   .0318753    -0.72   0.472   -.0854228   .0395261
      income | -.0017368   .0008447    -2.06   0.040   -.0033923  -.0000813
       _cons |  1.189429   .1561017     7.62   0.000    .8834757   1.495383
-------------+-------------------------------------------------------------
    /lnalpha |   .092944   .1482435     0.63   0.531    -.197608   .3834959
-------------+-------------------------------------------------------------
       alpha |    1.0974   .1626825                     .8206915   1.467406

AIC Statistic =     4.007               BIC Statistic = -6243.307
Deviance      =     0.000               Dispersion    =     0.000

Even after accounting for overdispersion, the CS estimate is relatively low
Slide16-35
Extensions: truncation and endogenous stratification
•How do we calculate the pseudo-R2 for this model???
. nbstrat persontrip Travelcost educat income, nolog

Negative Binomial with Endogenous Stratification   Number of obs =     919
                                                   Wald chi2(3)  =  283.49
Log likelihood = -1837.3183                        Prob > chi2   =  0.0000

  persontrip |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------------
  Travelcost | -1.152915   .0695958   -16.57   0.000   -1.289321   -1.01651
      educat | -.0229483   .0318753    -0.72   0.472   -.0854228   .0395261
      income | -.0017368   .0008447    -2.06   0.040   -.0033923  -.0000813
       _cons |  1.189429   .1561017     7.62   0.000    .8834757   1.495383
-------------+-------------------------------------------------------------
    /lnalpha |   .092944   .1482435     0.63   0.531    -.197608   .3834959
-------------+-------------------------------------------------------------
       alpha |    1.0974   .1626825                     .8206915   1.467406

AIC Statistic =     4.007               BIC Statistic = -6243.307
Deviance      =     0.000               Dispersion    =     0.000
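One way to approach the pseudo-R2 question is McFadden's measure, 1 − lnL(model)/lnL(null). The Python sketch below assumes that this is the pseudo R2 Stata reports, and checks the assumption against the nbreg output (log likelihood -2038.1155, LR chi2(3) = 236.04, Pseudo R2 = 0.0547) by backing out the null log-likelihood from the LR statistic. The function name is hypothetical.

```python
def mcfadden_r2_from_lr(ll_model, lr_chi2):
    """McFadden pseudo-R2, recovering the null log-likelihood from LR = 2*(ll_model - ll_null)."""
    ll_null = ll_model - lr_chi2 / 2.0
    return 1.0 - ll_model / ll_null

# Values taken from the nbreg output; the result should be close to 0.0547.
print(round(mcfadden_r2_from_lr(-2038.1155, 236.04), 4))
```

The nbstrat output reports a Wald chi2 rather than an LR chi2, so the same calculation would presumably require fitting a constant-only version of the model first to obtain its log-likelihood.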
Slide16-36
Extensions: truncation and endogenous stratification
•GNBSTRAT will also allow you to model the overdispersion
parameter in this case, just as gnbreg did for the plain case
Slide16-37
NOTE: what is the exposure
• Count models often need to deal with the fact that the counts may be measured
over different observation periods, which might be of different length (in terms of
time or some other relevant dimension)
For example, the number of accidents is recorded for 50 different intersections.
However, the number of vehicles that pass through the intersections can vary
greatly. Five accidents for 30,000 vehicles is very different from five accidents for
1,500 vehicles.
Count models account for these differences by including the log of the exposure
variable in the model, with its coefficient constrained to be one.
The use of exposure is often superior to analyzing rates as response variables,
because it makes use of the correct probability distributions
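Constraining the coefficient on log exposure to one simply means that the expected count is the exposure times a rate. A minimal Python sketch follows, using the intersection numbers from the example; the function name `expected_count` and the idea of holding the linear index fixed are illustrative assumptions.

```python
import math

def expected_count(linear_index, exposure):
    """E(y) = exp(x'b + 1 * ln(exposure)) = exposure * exp(x'b)."""
    return math.exp(linear_index + math.log(exposure))

# A rate of five accidents per 30,000 vehicles, expressed as a linear index.
index = math.log(5 / 30000)
print(expected_count(index, 30000))  # busy intersection
print(expected_count(index, 1500))   # quiet intersection, same underlying rate
```

With the same underlying rate, the quiet intersection is expected to see far fewer accidents, so observing five there points to a much higher rate.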
Slide16-38

16.7.1 Censored Data
Figure 16.3 Histogram of Wife’s Hours of Work in 1975
Slide16-39
Having censored data means that a substantial fraction of the
observations on the dependent variable take a limit value. The
regression function is no longer given by (16.30).
\[ E(y \mid x) = \beta_1 + \beta_2 x \qquad (16.30) \]
The least squares estimators of the regression parameters obtained by
running a regression of y on x are biased and inconsistent—least
squares estimation fails.
Slide16-40
Slide16-41


With truncation, we only observe the value of
the regressors when the dependent variable
takes a certain value (usually a positive one
instead of zero)
With censoring we observe in principle the
value of the regressors for everyone, but not
the value of the dependent variable for those
whose dependent variable takes a value
beyond the limit
We give the parameters the specific values $\beta_1 = -9$ and $\beta_2 = 1$.
\[ y_i^* = \beta_1 + \beta_2 x_i + e_i = -9 + x_i + e_i \qquad (16.31) \]
Assume $e_i \sim N(0, \sigma^2 = 16)$.
\[ y_i = 0 \text{ if } y_i^* \le 0; \qquad y_i = y_i^* \text{ if } y_i^* > 0. \]
Slide16-43

Create N = 200 random values of xi that are spread evenly (or
uniformly) over the interval [0, 20]. These we will keep fixed in
further simulations.

Obtain N = 200 random values ei from a normal distribution with
mean 0 and variance 16.


Create N = 200 values of the latent variable $y_i^*$.

Obtain N = 200 values of the observed $y_i$ using
\[ y_i = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ y_i^* & \text{if } y_i^* > 0 \end{cases} \]
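The four steps above can be sketched in Python using only the standard library. This is an illustration, not the code behind the slides: the seed, the function names and the hand-rolled OLS helper are assumptions, so the numbers will differ from those reported for the textbook sample.

```python
import random

def ols(x, y):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def simulate(n=200, beta1=-9.0, beta2=1.0, sigma=4.0, seed=12345):
    rng = random.Random(seed)                       # arbitrary seed
    x = [rng.uniform(0.0, 20.0) for _ in range(n)]  # step 1: x on [0, 20]
    e = [rng.gauss(0.0, sigma) for _ in range(n)]   # step 2: e ~ N(0, 16)
    y_star = [beta1 + beta2 * xi + ei for xi, ei in zip(x, e)]  # step 3: latent
    y = [max(0.0, ys) for ys in y_star]             # step 4: censor at zero
    return x, y_star, y

x, y_star, y = simulate()
print("OLS on latent y*:  ", ols(x, y_star))
print("OLS on censored y: ", ols(x, y))
```

Regressing the censored y on x gives a slope well below the true value of 1, while the regression on the (normally unobservable) latent variable stays close to it.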
Slide16-44
Figure 16.4 Uncensored Sample Data and Regression Function
Slide16-45
Figure 16.5 Censored Sample Data, and Latent Regression Function and Least Squares Fitted Line
Slide16-46
OLS for all the 200 observations predicts:
\[ \hat{y}_i = -2.1477 + .5161\,x_i \qquad (16.32a) \]
(se) (.3706) (.0326)
OLS for only the 100 positive observations (y > 0) predicts:
\[ \hat{y}_i = -3.1399 + .6388\,x_i \qquad (16.32b) \]
(se) (1.2055) (.0827)
Our Monte Carlo experiment resamples 200 times; the average of the estimates is:
\[ E_{MC}(b_k) = \frac{1}{NSAM} \sum_{m=1}^{NSAM} b_{k(m)} \qquad (16.33) \]
Slide16-47
Slide16-48
The maximum likelihood procedure is called Tobit in honor of James
Tobin, winner of the 1981 Nobel Prize in Economics, who first
studied this model.
The probit probability that yi = 0 is:
\[ P(y_i = 0) = P[y_i^* \le 0] = 1 - \Phi\left[\frac{\beta_1 + \beta_2 x_i}{\sigma}\right] \]
The likelihood function is
\[ L(\beta_1, \beta_2, \sigma) = \prod_{y_i = 0} \left\{ 1 - \Phi\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) \right\} \times \prod_{y_i > 0} \left\{ (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2} (y_i - \beta_1 - \beta_2 x_i)^2 \right) \right\} \]
Slide16-49
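The Tobit likelihood is easier to work with in logs. The Python sketch below evaluates the log of the likelihood function for a given parameter vector; it is an illustration only, the function names are hypothetical, and a real estimator would pass this function to a numerical optimiser.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta1, beta2, sigma, x, y):
    """Log-likelihood for data censored from below at zero."""
    total = 0.0
    for xi, yi in zip(x, y):
        mu = beta1 + beta2 * xi
        if yi <= 0.0:
            # Limit observation: ln[1 - Phi(mu/sigma)], written as ln Phi(-mu/sigma).
            total += math.log(norm_cdf(-mu / sigma))
        else:
            # Non-limit observation: log of the normal density of y_i.
            total += -0.5 * math.log(2.0 * math.pi * sigma ** 2) \
                     - (yi - mu) ** 2 / (2.0 * sigma ** 2)
    return total

# Hypothetical mini-sample evaluated at the simulation's true parameters.
print(tobit_loglik(-9.0, 1.0, 4.0, [2.0, 10.0, 18.0], [0.0, 3.5, 8.0]))
```

The limit observations contribute a probit-style term and the positive observations contribute an ordinary normal regression term, which is exactly the split in the product above.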
The maximum likelihood estimator is consistent and asymptotically
normal, with a known covariance matrix.
Using the artificial data the fitted values are:
\[ \tilde{y}_i = -10.2773 + 1.0487\,x_i \qquad (16.34) \]
(se) (1.0970) (.0790)
Slide16-50
Slide16-51
You can run this experiment yourselves in GRETL
open "c:\Program Files\gretl\data\poe\tobit.gdt"
smpl 1 200
genr xs = 20*uniform()
loop 1000 --progressive
genr y = -9 + 1*xs + 4*normal()
genr yi = y > 0
#which is a handy command to generate dummies!
genr yc = y*yi
ols yc const xs --quiet
genr b1s = $coeff(const)
genr b2s = $coeff(xs)
store coeffs.gdt b1s b2s
endloop
Slide16-52
open "c:\Program Files\gretl\data\poe\tobit.gdt"
genr xs = 20*uniform()
genr idx = 1
matrix A = zeros(1000,3)
loop 1000 --quiet
smpl --full
genr y = -9 + 1*xs + 4*normal()
smpl y > 0 --restrict
ols y const xs --quiet
genr b1s = $coeff(const)
genr b2s = $coeff(xs)
matrix A[idx,1]=idx
matrix A[idx,2]=b1s
matrix A[idx,3]=b2s
genr idx = idx + 1
endloop
# The matrix A contains all 1000 sets of coefficients
# bb finds the column mean of A
matrix bb = meanc(A)
bb
Slide16-53
# The matrix A contains all 1000 sets of coefficients
# bb finds the column mean of A
matrix bb = meanc(A)
bb
Note that the first cell refers to the average of the “case” number
(500.5, since there are 1000 cases numbered 1 to 1000)
Slide16-54
\[ \frac{\partial E(y \mid x)}{\partial x} = \beta_2\,\Phi\left(\frac{\beta_1 + \beta_2 x}{\sigma}\right) \qquad (16.35) \]
Because the cdf values are positive, the sign of the coefficient does
tell the direction of the marginal effect, just not its magnitude. If
β2 > 0, as x increases the cdf function approaches 1, and the slope of
the regression function approaches that of the latent variable model.
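The scaling role of the cdf can be illustrated with the parameter values used in the simulation (β1 = −9, β2 = 1, σ = 4). This is a small Python sketch; the function names and the chosen x values are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_marginal_effect(beta1, beta2, sigma, x):
    """Slope of E(y|x): the latent slope scaled by Phi((beta1 + beta2*x)/sigma)."""
    return beta2 * norm_cdf((beta1 + beta2 * x) / sigma)

for x in (0.0, 9.0, 20.0):
    print(x, round(tobit_marginal_effect(-9.0, 1.0, 4.0, x), 4))
```

At x = 9 the index is zero and the slope is exactly half the latent slope; by x = 20 it is close to the latent slope of 1.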
Slide16-55
Figure 16.6 Censored Sample Data, and Regression Functions
for Observed and Positive y values
Slide16-56
\[ HOURS = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 AGE + \beta_5 KIDSL6 + e \qquad (16.36) \]
\[ \frac{\partial E(HOURS)}{\partial EDUC} = \tilde\beta_2 \tilde\Phi = 73.29 \times .3638 = 26.34 \]
Slide16-57
Slide16-58
#Tobit
open "c:\Program Files\gretl\data\poe\mroz.gdt"
tobit hours const educ exper age kidsl6
genr H_hat = $coeff(const)+$coeff(educ)*mean(educ) +$coeff(exper)*mean(exper) \
+$coeff(age)*mean(age)+$coeff(kidsl6)*1
genr z = cnorm(H_hat/$sigma)
genr pred = z*$coeff(educ)
smpl hours > 0 --restrict
ols hours const educ exper age kidsl6
smpl --full
ols hours const educ exper age kidsl6
Slide16-59

Problem: our sample is not a random sample. The data we observe are
“selected” by a systematic process for which we do not account.

Solution: a technique called Heckit, named after its developer (not
original author), Nobel Prize winning econometrician James
Heckman.
Slide16-60

The econometric model describing the situation is composed of two
equations. The first is the selection equation that determines
whether the variable of interest is observed.
\[ z_i^* = \gamma_1 + \gamma_2 w_i + u_i, \quad i = 1, \ldots, N \qquad (16.37) \]
\[ z_i = \begin{cases} 1 & z_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (16.38) \]
Slide16-61

The second equation is the linear model of interest. It is
\[ y_i = \beta_1 + \beta_2 x_i + e_i, \quad i = 1, \ldots, n, \quad N > n \qquad (16.39) \]
\[ E(y_i \mid z_i^* > 0) = \beta_1 + \beta_2 x_i + \beta_\lambda \lambda_i, \quad i = 1, \ldots, n \qquad (16.40) \]
\[ \lambda_i = \frac{\phi(\gamma_1 + \gamma_2 w_i)}{\Phi(\gamma_1 + \gamma_2 w_i)} \qquad (16.41) \]
Slide16-62

The estimated “Inverse Mills Ratio” is
\[ \tilde\lambda_i = \frac{\phi(\tilde\gamma_1 + \tilde\gamma_2 w_i)}{\Phi(\tilde\gamma_1 + \tilde\gamma_2 w_i)} \]
The estimating equation is
\[ y_i = \beta_1 + \beta_2 x_i + \beta_\lambda \tilde\lambda_i + v_i, \quad i = 1, \ldots, n \qquad (16.42) \]
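The inverse Mills ratio is just the standard normal pdf divided by the cdf, both evaluated at the probit index. A minimal Python sketch, with hypothetical function names and index values:

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(index):
    """lambda = phi(index) / Phi(index), where index = gamma1 + gamma2 * w."""
    return norm_pdf(index) / norm_cdf(index)

for index in (-1.0, 0.0, 1.0, 2.0):
    print(index, round(inverse_mills_ratio(index), 4))
```

The ratio shrinks as the index grows: observations that were very likely to be selected need little correction, while those that were unlikely to be selected receive a large one.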
Slide16-63
OLS yields
\[ \ln(WAGE) = -.4002 + .1095\,EDUC + .0157\,EXPER, \quad R^2 = .1484 \qquad (16.43) \]
(t-stat) (-2.10) (7.73) (3.90)
Probit on dummy indicating “being in the labour force” yields:
\[ P(LFP = 1) = \Phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR) \]
(t-stat) (-2.93) (3.61) (-2.54) (-2.26)
From here predict the inverse Mills ratio:
\[ \tilde\lambda = IMR = \frac{\phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR)}{\Phi(1.1923 - .0206\,AGE + .0838\,EDUC - .3139\,KIDS - 1.3939\,MTR)} \]
Slide16-64
\[ \ln(WAGE) = .8105 + .0585\,EDUC + .0163\,EXPER - .8664\,IMR \qquad (16.44) \]
(t-stat) (1.64) (2.45) (4.08) (-2.65)
(t-stat-adj) (1.33) (1.97) (3.88) (-2.17)
The maximum likelihood estimated wage equation is
\[ \ln(WAGE) = .6686 + .0658\,EDUC + .0118\,EXPER \]
(t-stat) (2.84) (3.96) (2.87)
The standard errors based on the full information maximum likelihood
procedure are smaller than those yielded by the two-step estimation method.
Slide16-65
#Heckit
open "c:\Program Files\gretl\data\poe\mroz.gdt"
genr kids = (kidsl6+kids618>0)
logs wage
list X = const educ exper
list W = const mtr age kids educ
probit lfp W
genr ind = $coeff(const) + $coeff(age)*age + $coeff(educ)*educ + $coeff(kids)*kids + $coeff(mtr)*mtr
#Predict the inverse Mill’s ratio:
genr lambda = dnorm(ind)/cnorm(ind)
ols l_wage X lambda
heckit l_wage X ; lfp W --two-step
Slide16-66















binary choice models
censored data
conditional logit
count data models
feasible generalized least squares
Heckit
identification problem
independence of irrelevant alternatives (IIA)
index models
individual and alternative specific variables
individual specific variables
latent variables
likelihood function
limited dependent variables
linear probability model
logistic random variable
logit
log-likelihood function
marginal effect
maximum likelihood estimation
multinomial choice models
multinomial logit
odds ratio
ordered choice models
ordered probit
ordinal variables
Poisson random variable
Poisson regression model
probit
selection bias
tobit model
truncated data
Slide 16-67


Survival analysis (time-to-event data analysis)
Multivariate probit (biprobit, triprobit, mvprobit)

Hoffmann, 2004 for all topics
Long, S. and J. Freese for all topics
Cameron and Trivedi’s book for count data
Agresti, A. (2001) Categorical Data Analysis (2nd ed). New York: Wiley.