Implementation of a double-hurdle model
Bruno Garcia
The Stata Journal (2013), 13, Number 4, pp. 776-794
Presented by Gulzat
The paper is about
• A double-hurdle model (DHM) (Cragg, 1971, Econometrica 39: 829-844)
• What is new: the Stata command dblhurdle (and predict after dblhurdle)
Censored dependent variable models
• E.g., an individual either is a consumer or is not; if a consumer, the value
of the expenditure is known
• Tobit: assumes that the factors explaining the decision to become a consumer
and the decision of how much to spend have the same effect on both
decisions
• DHM: allows these effects to differ
Tobit Model
• $Y_i = Y_i^*$ if $Y_i^* > 0$
• $Y_i = 0$ if $Y_i^* \le 0$
• $Y_i^* = X_i\beta + \varepsilon_i$ and $\varepsilon_i \sim N(0, \sigma^2)$
Two variables and one model to explain these
two variables
Double Hurdle Model
1. Potential consumer or not, D is not observed
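• $D_i = 1$ if $Z_i\delta + u_i > 0$
• $D_i = 0$ if $Z_i\delta + u_i \le 0$
2. $Y_i^* = X_i\beta + \varepsilon_i$
• $Y_i = Y_i^*$ if $D_i = 1$ and $Y_i^* > 0$
• $Y_i = 0$ otherwise (that is, $D_i = 0$, or $D_i = 1$ and $Y_i^* \le 0$)
• $u_i \sim N(0, 1)$
• $\varepsilon_i \sim N(0, \sigma^2)$
• $\mathrm{corr}(u_i, \varepsilon_i) = \rho$: unobserved elements affecting whether an
individual is a consumer may also affect the amount of expenditure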
• Individuals make decisions in two steps
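The error structure above can be sketched in a few lines. This is only an illustration, not part of the paper or of dblhurdle: the function names draw_errors and sample_corr are hypothetical, and the values rho = 0.5 and sigma = 2 are arbitrary. The sketch builds the quantity error from the participation error plus an independent normal draw, which gives Var(u) = 1, Var(eps) = sigma^2, and corr(u, eps) = rho.

```python
import math
import random

def draw_errors(rho, sigma, rng):
    """Draw one (u, eps) pair with Var(u)=1, Var(eps)=sigma^2, corr(u, eps)=rho."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)  # independent of u
    eps = sigma * (rho * u + math.sqrt(1.0 - rho * rho) * v)
    return u, eps

def sample_corr(pairs):
    """Plain sample correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / math.sqrt(va * vb)

rng = random.Random(12345)  # fixed seed so the run is reproducible
pairs = [draw_errors(rho=0.5, sigma=2.0, rng=rng) for _ in range(20000)]
print(round(sample_corr(pairs), 2))  # should land near the chosen rho
```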
Double Hurdle Model (following the paper)
• Decision 1: participation
• Decision 2: quantity (maybe zero)
• $y_i$ = the observed consumption of an individual; the dependent
variable is continuous over positive values, but
• $P(y = 0) > 0$ and $P(y < 0) = 0$
• $y_i = x_i\beta + \epsilon_i$ if $\min(x_i\beta + \epsilon_i,\; z_i\gamma + u_i) > 0$, and $y_i = 0$ otherwise
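• $(\epsilon_i, u_i)' \sim N(0, \Sigma)$, $\Sigma = \begin{pmatrix} \sigma^2 & \sigma_{12} \\ \sigma_{12} & 1 \end{pmatrix}$, with $\sigma_{12} = \rho\sigma$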
• $\Psi(x, y, \rho)$ = CDF of a bivariate normal with correlation $\rho$
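The rule that turns the two latent equations into the observed outcome can be written as a small function. This is a minimal sketch for illustration; the name observed_y and the numeric inputs are hypothetical, with xb standing for x_i*beta and zg for z_i*gamma.

```python
def observed_y(xb, zg, eps, u):
    """Return y_i: the latent quantity if both hurdles are cleared, else 0."""
    quantity = xb + eps        # x_i*beta + eps_i (quantity equation)
    participation = zg + u     # z_i*gamma + u_i (participation equation)
    return quantity if min(quantity, participation) > 0 else 0.0

# Both indices positive: the quantity is observed.
print(observed_y(xb=1.0, zg=0.5, eps=0.5, u=0.25))
# Participation hurdle not cleared: a zero is observed.
print(observed_y(xb=1.0, zg=0.5, eps=0.5, u=-1.0))
# Quantity hurdle not cleared: a zero is observed as well.
print(observed_y(xb=1.0, zg=0.5, eps=-2.0, u=0.25))
```

The two kinds of zero are indistinguishable in the data, which is why the likelihood treats all observations with y = 0 through a single joint probability.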
Double Hurdle Model
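• The log-likelihood function for the DHM ($\Phi$: standard normal CDF, $\phi$: standard normal density):
• $\log L = \sum_{y_i = 0} \log\left\{1 - \Psi\left(\frac{x_i\beta}{\sigma},\, z_i\gamma,\, \rho\right)\right\} + \sum_{y_i > 0} \left[\log \Phi\left(\frac{z_i\gamma + \frac{\rho}{\sigma}(y_i - x_i\beta)}{\sqrt{1 - \rho^2}}\right) - \log \sigma + \log \phi\left(\frac{y_i - x_i\beta}{\sigma}\right)\right]$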
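To make the structure of the log likelihood concrete, here is a rough Python sketch, not the code used by dblhurdle. It assumes that observations with y = 0 contribute log(1 - Psi(x*beta/sigma, z*gamma, rho)) and that observations with y > 0 contribute the conditional participation probability times the normal density of y. The helper names bvn_cdf and loglik are hypothetical, the bivariate normal CDF is approximated by Simpson's rule rather than a library routine, and |rho| < 1 is required.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # .cdf is Phi, .pdf is phi
LOWER = -8.5               # mass below this point is negligible

def bvn_cdf(a, b, rho, steps=2000):
    """Approximate Psi(a, b, rho) = P(U <= a, V <= b), standard bivariate normal.

    Integrates phi(t) * Phi((b - rho*t) / sqrt(1 - rho^2)) over t in [LOWER, a]
    with Simpson's rule (steps must be even).
    """
    if a <= LOWER:
        return 0.0
    a = min(a, 8.5)  # mass above 8.5 is negligible
    s = math.sqrt(1.0 - rho * rho)
    h = (a - LOWER) / steps

    def f(t):
        return STD_NORMAL.pdf(t) * STD_NORMAL.cdf((b - rho * t) / s)

    total = f(LOWER) + f(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(LOWER + k * h)
    return total * h / 3.0

def loglik(y, xb, zg, sigma, rho):
    """Double-hurdle log likelihood for lists of y_i, x_i*beta, z_i*gamma."""
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for yi, xbi, zgi in zip(y, xb, zg):
        if yi > 0:
            resid = (yi - xbi) / sigma
            total += (math.log(STD_NORMAL.cdf((zgi + rho * resid) / s))
                      - math.log(sigma)
                      + math.log(STD_NORMAL.pdf(resid)))
        else:
            total += math.log(1.0 - bvn_cdf(xbi / sigma, zgi, rho))
    return total

# Two made-up observations: one at the corner, one positive.
print(loglik(y=[0.0, 1.0], xb=[0.0, 1.0], zg=[0.0, 0.0], sigma=1.0, rho=0.0))
```

With rho = 0 the bivariate CDF factors into a product of two univariate CDFs, which gives a quick sanity check on the numerical integration.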
Double Hurdle Model
• $x_i\beta + \epsilon_i$ models the quantity equation
• $z_i\gamma + u_i$ models the participation equation
• The command estimates $\beta$, $\gamma$, $\rho$, and $\sigma$, where
$\sigma^2 = \mathrm{Var}(\epsilon)$
• Restriction: $\mathrm{Var}(u) = 1$, so that the model is identified
Double Hurdle Model: Stata
Double Hurdle Model
Example: The use of the dblhurdle command using smoke.dta from
Wooldridge (2010).
Marginal effects
• Effect of the number of years of schooling (educ) on:
1. The probability of smoking
2. The expected number of cigarettes smoked
given that you smoke
3. The expected number of cigarettes smoked
Prediction
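• ppar - the probability of being away from the
corner conditional on the covariates: $P(y_i > 0 \mid x_i, z_i)$
• ycond - the expected value of y conditional on x, z, and y > 0: $E(y_i \mid y_i > 0, x_i, z_i)$
• yexpected - the expected value of y conditional on
x and z: $E(y_i \mid x_i, z_i)$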
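The three predictions are linked. Reading ycond as the expectation of y given that y is positive (an interpretation taken from the marginal-effects list, not spelled out on this slide), and using the fact that y equals zero at the corner, the law of total expectation gives:

```latex
E(y_i \mid x_i, z_i)
  = P(y_i > 0 \mid x_i, z_i)\, E(y_i \mid y_i > 0, x_i, z_i)
  + P(y_i = 0 \mid x_i, z_i) \cdot 0
```

So, under that reading, yexpected = ppar × ycond.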
Marginal effects
Monte Carlo simulation: Finite sample
properties of the estimator
• Three measures of performance:
• The mean of the estimated parameters should
be close to their true values.
• The mean standard error of the estimated
parameters over the repetitions should be
close to the standard deviation of the point
estimates.
• The rejection rate of hypothesis tests should
be close to the nominal size of the test.
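The three measures can be computed from the saved results of the repetitions. The sketch below is an illustration only: the function name performance and the five input values are made up, and the hypothesis test is assumed to be a two-sided Wald z test of the parameter against its true value, which the slide does not specify.

```python
from statistics import NormalDist, mean, stdev

def performance(estimates, std_errors, true_value, alpha=0.05):
    """Summarize Monte Carlo results for one parameter.

    estimates[r] and std_errors[r] come from repetition r.
    """
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    rejections = [abs((b - true_value) / se) > crit
                  for b, se in zip(estimates, std_errors)]
    return {
        "mean_estimate": mean(estimates),       # compare with true_value
        "mean_std_error": mean(std_errors),     # compare with sd_of_estimates
        "sd_of_estimates": stdev(estimates),
        "rejection_rate": sum(rejections) / len(rejections),  # compare with alpha
    }

# Made-up results from five repetitions, for illustration only.
out = performance(estimates=[0.9, 1.1, 1.0, 1.3, 0.7],
                  std_errors=[0.1, 0.1, 0.1, 0.1, 0.1],
                  true_value=1.0)
for name, value in out.items():
    print(name, round(value, 3))
```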
Monte Carlo simulation
The data-generating process can be summarized
as follows:
Monte Carlo simulation
• A dataset of 2,000 observations was created.
• The x’s were drawn from a standard normal distribution,
and the d’s were drawn from a Bernoulli with p = 1/2.
• Refer to this dataset as “base”.
• Iteration of the simulation:
1. Use “base”.
2. For each observation, draw (gen) 𝜖 from a standard normal.
3. For each observation, draw (gen) u from a standard normal.
4. For each observation, compute y according to the data-generating process presented above.
5. Fit the model, and save the values of interest with post.
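The iteration steps can be sketched as a loop. This is not the author's simulation code: the coefficients below and the assignment of x to the quantity equation and d to the participation equation are hypothetical, because the data-generating process is not reproduced in this transcript. Step 5 is replaced by a stand-in that records the share of zeros, since fitting the model needs a maximum likelihood routine.

```python
import random

N_OBS = 2000
# Hypothetical coefficients, chosen only for illustration.
BETA0, BETA1 = 1.0, 1.0      # quantity index: BETA0 + BETA1 * x
GAMMA0, GAMMA1 = 0.5, 1.0    # participation index: GAMMA0 + GAMMA1 * d

def make_base(rng):
    """Create the fixed 'base' dataset: x ~ N(0, 1), d ~ Bernoulli(1/2)."""
    return [(rng.gauss(0.0, 1.0), 1 if rng.random() < 0.5 else 0)
            for _ in range(N_OBS)]

def one_iteration(base, rng):
    """Steps 1-4: reuse base, draw eps and u, and compute y."""
    ys = []
    for x, d in base:
        eps = rng.gauss(0.0, 1.0)   # step 2
        u = rng.gauss(0.0, 1.0)     # step 3
        quantity = BETA0 + BETA1 * x + eps
        participation = GAMMA0 + GAMMA1 * d + u
        ys.append(quantity if min(quantity, participation) > 0 else 0.0)  # step 4
    return ys

rng = random.Random(2013)  # fixed seed so the run is reproducible
base = make_base(rng)
results = []
for rep in range(100):
    ys = one_iteration(base, rng)
    # Step 5 stand-in: a real run would fit the model and store the estimates.
    results.append(sum(1 for y in ys if y == 0.0) / N_OBS)
print(len(results), round(sum(results) / len(results), 3))
```

Keeping the base dataset fixed across repetitions means that only the error draws vary, so the spread of the saved results reflects sampling variation in the errors alone.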
Monte Carlo simulation
• A less intuitive issue: the set of regressors in
the participation equation is the same as the set of
regressors in the quantity equation.
• In that case, the model is weakly identified.
• The data-generating process:
Conclusion
• Researchers may consider dblhurdle in settings where they would
otherwise use a tobit model
• Its flexibility allows the researcher to break down
the modeled quantity along two useful
dimensions, the “quantity” dimension and the
“participation” dimension
• The command presented in this article only
allows for a single corner in the data
• One desirable feature to add is the capability to
handle dependent variables with two corners