1st. STATA Group Meeting Mexico
Discussion of user-written Stata programs
Predicting counterfactual densities with the DFL Ado-file:
A pertinent constructive critique.
Luis Huesca Reynoso
Centro de Investigación en Alimentación y Desarrollo, A.C.
Department of Economics. Email: [email protected]
April 23, 2009, Universidad Iberoamericana Campus Mexico.
Dealing with distributions (and so with densities!) is not an easy task.
Problems to face:
A. Scale: log or numeric.
B. Comparison: unit of measurement (in economics and the social sciences: constant prices, among others).
C. Selection of the right window width (eyeballed or optimal); see, for instance, bandw by Salgado-Ugarte, Shimizu and Taniuchi.
D. Joint: compute them together (see, for instance, nbins or the number of grid points in akdensity).
Stata makes it easier!
Goal. The estimation of well-dimensioned kernel density functions and counterfactuals
with a semiparametric technique:
Estimate densities that recover the true shape not only of the total
distribution but also of each of the subgroups that make it up.
Probability density function (PDF)
Any function f(y) can serve as a density function as long as:
$$f(y) \ge 0, \qquad -\infty < y < \infty$$
and
$$\int_{-\infty}^{\infty} f(y)\,dy = 1$$
A general kernel function K(u) used to weight the density must then satisfy
$$\int_{-\infty}^{\infty} K(u)\,du = 1$$
since then
$$\int_{-\infty}^{\infty} \hat{f}(y)\,dy = 1$$
By definition, a PDF must integrate to one, as it does for the Gaussian or any
other well-behaved kernel function (Duclos, 2001; Silverman, 1986), for instance the
Epanechnikov, biweight, triangular and cosine kernels.
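The unit-area condition on K(u) can be checked numerically. The sketch below is only an illustration in Python (not one of the Stata commands discussed here); it uses the textbook forms of the kernels on [-1, 1], and statistical packages may scale some of them differently.

```python
import math

# Textbook kernel definitions (an assumption: software may rescale them).
def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def biweight(u):
    return (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def cosine(u):
    return (math.pi / 4.0) * math.cos(math.pi * u / 2.0) if abs(u) <= 1.0 else 0.0

def integrate(func, lo, hi, steps=20000):
    """Trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width

KERNELS = {
    "gaussian": gaussian,
    "epanechnikov": epanechnikov,
    "biweight": biweight,
    "triangular": triangular,
    "cosine": cosine,
}

# Each kernel should integrate to (approximately) one.
areas = {name: integrate(kernel, -10.0, 10.0) for name, kernel in KERNELS.items()}
for name, area in areas.items():
    print(f"{name:13s} {area:.6f}")
```

Because every kernel has unit area, any density estimate built as an average of rescaled kernels also has unit area.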
Kernel density estimation: letting the data speak for themselves, as follows:
$$\hat{f}(y) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{y - Y_i}{h}\right)$$
With $Y = (Y_1, \dots, Y_n)$ as the vector of earnings, h the optimal window width and K a Gaussian
kernel function.
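A minimal sketch of this estimator in Python, written only to make the formula concrete. The earnings values and the window width h = 0.5 are hypothetical and hand-picked, not taken from the ENEU data or from an optimal-bandwidth rule.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(y, sample, h):
    """Fixed-bandwidth estimate: (1/n) * sum_i (1/h) * K((y - Y_i) / h)."""
    n = len(sample)
    return sum(gaussian_kernel((y - yi) / h) for yi in sample) / (n * h)

# Hypothetical log monthly earnings (illustrative values only).
log_earnings = [6.1, 6.8, 7.0, 7.2, 7.3, 7.9, 8.4, 9.0]
h = 0.5  # hand-picked window width, an assumption

step = 0.01
grid = [2.0 + step * j for j in range(1101)]  # evaluation points from 2.0 to 13.0
density = [kde(y, log_earnings, h) for y in grid]

# Riemann-sum check that the estimate integrates to about one.
area = sum(density) * step
print(round(area, 4))
```

The grid plays the role of the evaluation points (nbins or grid points) mentioned earlier: the estimate is computed only at those points, so the grid must cover the whole support for the area check to hold.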
Following Jenkins and Van Kerm (2005) for decompositions:
$$f(y) = \sum_{k=1}^{K} \theta^k f^k(y)$$
as a weighted sum of the PDFs for each subgroup k, where $\theta^k$ stands for the
population share of group k, and $f^k$ is the PDF of group k.
- In the empirical example an adaptive kernel estimator is used (Van Kerm, 2003).
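The decomposition can be illustrated with a short Python sketch. It assumes a fixed, common window width for all groups (with an adaptive estimator the identity holds only approximately), and the two groups and their values are hypothetical.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(y, sample, h):
    return sum(gaussian((y - yi) / h) for yi in sample) / (len(sample) * h)

# Hypothetical log earnings for two subgroups.
groups = {
    "self_employed": [6.2, 6.9, 7.4, 8.8],
    "wage_earners": [6.8, 7.0, 7.1, 7.3, 7.5, 7.9],
}
pooled = [yi for sample in groups.values() for yi in sample]
shares = {k: len(v) / len(pooled) for k, v in groups.items()}  # population shares

h = 0.4      # common window width (assumption)
step = 0.01
grid = [3.0 + step * j for j in range(901)]  # 3.0 to 12.0

total = [kde(y, pooled, h) for y in grid]
# Each subgroup density is multiplied by its population share.
scaled = {k: [shares[k] * kde(y, v, h) for y in grid] for k, v in groups.items()}
rebuilt = [sum(scaled[k][j] for k in groups) for j in range(len(grid))]

max_gap = max(abs(t - r) for t, r in zip(total, rebuilt))
areas = {k: sum(d) * step for k, d in scaled.items()}
print(max_gap, areas)
```

Each share-weighted subgroup curve integrates to its population share rather than to one, which is what keeps the subgroup curves below the total when they are drawn on the same axes.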
DiNardo, Fortin and Lemieux (1996)
Counterfactual estimation compares the distribution of the objective variable (depvar) with
the depvar distribution that would have prevailed if group A had been paid like the
comparison group B (the counterpart).
Actual wage distributions for A and B:
$$f^A(y) = \int f^A(y \mid x)\, h(x \mid s = A)\, dx$$
$$f^B(y) = \int f^B(y \mid x)\, h(x \mid s = B)\, dx$$
Counterfactual:
$$f_A^B(y) = \int f^B(y \mid x)\, h(x \mid s = A)\, dx$$
DFL (1996) rewrite and reweight the density for B as follows:
$$f_A^B(y) = \int f^B(y \mid x)\, h(x \mid s = A)\, dx = \int w\, f^B(y \mid x)\, h(x \mid s = B)\, dx, \qquad w = \frac{h(x \mid s = A)}{h(x \mid s = B)}$$
The weight w can be computed using Bayes' theorem:
$$w = \frac{P(A \mid x)}{1 - P(A \mid x)} \cdot \frac{1 - P(A)}{P(A)}$$
In Stata:
w = 1-Prob(Depvar=1)/Prob(Depvar=0)
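The Bayes step can be verified on a toy sample. This Python sketch assumes a single discrete covariate x and uses empirical frequencies in place of an estimated logit or probit; the data are hypothetical, and the function name dfl_weight is introduced here for illustration only, it is not part of the dfl ado-file.

```python
from collections import Counter

def dfl_weight(p_a_given_x, p_a):
    """w = [P(A|x) / (1 - P(A|x))] * [(1 - P(A)) / P(A)]."""
    return (p_a_given_x / (1.0 - p_a_given_x)) * ((1.0 - p_a) / p_a)

# Hypothetical sample of (group, x) pairs; x is a discrete covariate.
sample = [("A", 0), ("A", 0), ("A", 1), ("A", 2),
          ("B", 0), ("B", 1), ("B", 1), ("B", 2), ("B", 2), ("B", 2)]

n = len(sample)
n_a = sum(1 for s, _ in sample if s == "A")
n_b = n - n_a
p_a = n_a / n

count_x = Counter(x for _, x in sample)
count_ax = Counter(x for s, x in sample if s == "A")
count_bx = Counter(x for s, x in sample if s == "B")

weights = {}       # weight obtained through Bayes' theorem
direct_ratio = {}  # h(x | s = A) / h(x | s = B) computed directly
for x in sorted(count_x):
    weights[x] = dfl_weight(count_ax[x] / count_x[x], p_a)
    direct_ratio[x] = (count_ax[x] / n_a) / (count_bx[x] / n_b)

# Sum of the weights over the members of group B.
weight_sum_b = sum(weights[x] for s, x in sample if s == "B")
print(weights, direct_ratio, weight_sum_b)
```

In this example both routes give the same weights, and the weights average one over group B, so reweighting changes the shape of the density for B but not its total mass.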
The conditional treatment probability (the propensity score) is estimated by the program under a specification
using a logistic regression (the dfl command can switch to probit as well). For comparison I use the pscore
ado-file written by Becker & Ichino (2002), which follows the nearest-neighbour technique.
Empirical case: (A semi-parametric approach)
Estimation of the Mexican earnings distribution and decompositions by subpopulation of workers in the formal and informal sectors (compliance with social
security coverage).
(Let's assume that self-selection bias does not affect individual decisions about
workers' location.) Models are estimated separately for each category.
Logit has a practical advantage over probit: the sum of predicted values
equals the sum of empirically observed values (Butcher and DiNardo, 1998).
ENEU: Encuesta Nacional de Empleo Urbano (National Survey of Urban
Employment).
Males aged 16 to 65
$$P(S = f) = \frac{\exp(x_s \beta_s)}{1 + \exp(x_s \beta_s)}$$
Occupations = (1, ..., 4)
1: Formal self-employed
2: Informal self-employed
3: Formal wage-earners
4: Informal wage-earners
Model 1 pooled
Model 2 pooled
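The logit property quoted above (predicted probabilities sum to the number of observed ones when the model includes a constant) can be checked with a small Python sketch. It fits a logit with one covariate by Newton-Raphson; the data are hypothetical and the single covariate is a simplification of the specifications used in the talk.

```python
import math

def fit_logit(x, s, iterations=25):
    """Newton-Raphson for P(s=1|x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(si - pi for si, pi in zip(s, p))
        g1 = sum((si - pi) * xi for si, pi, xi in zip(s, p, x))
        # Information matrix (negative Hessian), 2 x 2.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = years of schooling, s = 1 if formal, 0 if informal.
schooling = [2, 4, 5, 6, 6, 8, 9, 10, 12, 14]
formal = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit_logit(schooling, formal)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in schooling]
print(sum(fitted), sum(formal))
```

The equality follows from the first-order condition for the constant term, which sets the sum of the residuals s - p to zero; a probit fit does not impose it.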
Syntax
1. Compute the earnings distribution using the dfl command.
dfl depvar indepvars [if exp] [in range] , outcome(varname)
[nbins(integer) w(bandwidth) adaptive gauss quietly probit [logit
default] graph(cfactual) graph_combine axis_selection_options
axis_scale_options title_options]
dfl informal esc eda eda2 jefe dmiembros dwmenor drama1 drama3 ///
drama4 dregion1 dregion2 dregion3 dregion4 dregion6 ///
if sex==1 & logitp>=1 & logitp<=2, outcome(logwm) nbins(50) ///
adaptive gauss graph(cfactual)
2. Compute the earnings distribution using a do-file.
* Estimate the propensity score with pscore (Becker & Ichino, 2002)
pscore informalb esc eda eda2 jefe dmiembros dwmenor drama1 drama3 drama4 ///
dregion1 dregion2 dregion3 dregion4 dregion6 if sex==1 & logitp>=1 & logitp<=2, ///
pscore(mypscore) logit level(0.001)
* Kernel density of log earnings with akdensity, weighted by the propensity score
akdensity logwm if sex==1 & logitp==4 [aw = mypscore], gau s(i) ///
gen(hai92c dhai92c)
lab var dhai92c "Informal wage-earner"
* Rescale the generated density by the factor .24
replace dhai92c = dhai92c*.24
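The logic of the do-file (a weighted density followed by a rescaling) can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not a translation of akdensity: it uses a fixed window width, the earnings and reweighting factors are hypothetical, and the factor 0.24 is treated here as a population share, which is an interpretation of the do-file rather than something it states.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def weighted_kde(y, sample, weights, h):
    """Kernel density with observation weights normalised to sum to one."""
    total_weight = sum(weights)
    kernel_sum = sum(w * gaussian((y - yi) / h) for yi, w in zip(sample, weights))
    return kernel_sum / (total_weight * h)

def area(values, step):
    return sum(values) * step

# Hypothetical log earnings and reweighting factors.
log_wage = [6.3, 6.7, 7.0, 7.2, 7.6, 8.1, 8.9]
reweight = [1.8, 1.5, 1.2, 1.0, 0.7, 0.5, 0.3]
equal = [1.0] * len(log_wage)

h = 0.4
step = 0.01
grid = [2.0 + step * j for j in range(1101)]  # 2.0 to 13.0

factual = [weighted_kde(y, log_wage, equal, h) for y in grid]
counterfactual = [weighted_kde(y, log_wage, reweight, h) for y in grid]
difference = [c - f for c, f in zip(counterfactual, factual)]

share = 0.24  # assumed to be the subgroup's population share
rescaled = [share * c for c in counterfactual]

print(area(factual, step), area(counterfactual, step),
      area(difference, step), area(rescaled, step))
```

Both the factual and counterfactual curves integrate to one, so their difference integrates to zero; after rescaling, the subgroup curve integrates to the share and can be drawn under the total density without being over-dimensioned.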
Example with my do-file
Figure 1. Decomposition of density functions for self-employed and wage earners, Mexico 1992.
[Figure: density against log of monthly earnings (pesos 2000=100). Series: Total, Wage-earners, Self-employed.]
Figure 2. Wage-earners in Mexico working in a formal world, 1992.
[Figure: males. Panels: DFL command; do-file rescaled. Top: factual and counterfactual densities. Bottom: differences in densities. x-axis: log of monthly earnings (pesos 2000=100).]
Figure 2a. Wage-earners in Mexico working in a formal world, 1992.
[Figure: males. Panel: do-file rescaled, adjusting ranges. Top: factual and counterfactual densities. Bottom: difference in density. x-axis: log of monthly earnings (pesos 2000=100).]
Figure 3. Self-employed in Mexico working in a formal world, 1992.
[Figure: males. Panels: DFL command; do-file rescaled. Top: factual and counterfactual densities. Bottom: differences in densities. x-axis: log of monthly earnings (pesos 2000=100).]
Figure 3a. Self-employed in Mexico working in a formal world, 1992.
[Figure: males. Panel: do-file rescaled, adjusting ranges. Top: factual and counterfactual densities. Bottom: difference in density. x-axis: log of monthly earnings (pesos 2000=100).]
Figure 4. Informal self-employed males in a formal world, 2002.
[Figure: panels: DFL command; do-file rescaled. Top: factual and counterfactual densities. Bottom: difference in density. x-axis: log of monthly earnings (pesos 2000=100).]
Figure 5. Informal wage-earner males in a formal world, 2002.
[Figure: panels: DFL command; do-file rescaled. Top: factual and counterfactual densities. Bottom: difference in density. x-axis: log of monthly earnings (pesos 2000=100).]
Conclusions:
The user-written DFL command is useful; just watch out when using subgroups or log scales.
DFL (1996) use the subgroup decomposability property of the aggregate
PDF.
A suggestion when computing densities: consider population shares (if
necessary) to weight them.
The problem of over-dimensioned densities is most severe
when dealing with logarithmic scales for the data.
For kernel densities, estimation with the adaptive technique is more
time-consuming but seems to be more accurate as well (it avoids
smoothing more than needed).
Adaptive kernel estimation depicts bimodal or multimodal
distributions better.
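To make the adaptive idea concrete, the following Python sketch uses local bandwidth factors in the spirit of Silverman (1986): a fixed-bandwidth pilot estimate sets a factor for each observation, so the window widens where data are sparse and narrows where they are dense. It is an illustration under assumptions (hypothetical bimodal data, hand-picked h, sensitivity 0.5), not the akdensity code.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(y, sample, h):
    return sum(gaussian((y - yi) / h) for yi in sample) / (len(sample) * h)

def local_factors(sample, h, sensitivity=0.5):
    """Factor_i = (pilot(Y_i) / g) ** (-sensitivity), g = geometric mean of the pilot."""
    pilot = [fixed_kde(yi, sample, h) for yi in sample]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [(p / g) ** (-sensitivity) for p in pilot]

def adaptive_kde(y, sample, h, factors):
    """Each observation gets its own window width h * factor_i."""
    n = len(sample)
    return sum(gaussian((y - yi) / (h * lam)) / (h * lam)
               for yi, lam in zip(sample, factors)) / n

# Hypothetical bimodal sample: a dense cluster near 7 and two sparse points near 10.
sample = [6.6, 6.8, 6.9, 7.0, 7.0, 7.1, 7.2, 7.4, 9.5, 10.5]
h = 0.4  # hand-picked pilot window width (assumption)

factors = local_factors(sample, h)
step = 0.01
grid = [step * j for j in range(2001)]  # 0.0 to 20.0
density = [adaptive_kde(y, sample, h, factors) for y in grid]
area = sum(density) * step
print(round(area, 4), [round(f, 2) for f in factors])
```

Observations in the sparse upper tail receive factors above one (wider windows), those in the dense cluster receive factors below one, and the estimate still integrates to one.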
References
Azevedo, Joao Pedro (2005), "DiNardo, Fortin and Lemieux Counterfactual Kernel Density", DFL
user-written command.
Becker, Sascha O. and Andrea Ichino (2002), "Estimation of average treatment effects based on
propensity scores", The Stata Journal, 2(4), 358-377.
Butcher, K. F. and John DiNardo (1998), "The immigrant and native-born wage distributions: Evidence
from United States census", NBER Working Paper No. 6630.
DiNardo, John, Nicole Fortin and Thomas Lemieux (1996), "Labor Market Institutions and the Distribution
of Wages, 1973-1992: A semi-parametric approach", Econometrica, 64(5), 1001-1044.
Duclos, Jean-Yves (2001), "Non-parametric estimation for distributive analysis", Poverty and Equity:
theory and estimation, Departament d'Economia Aplicada, Universitat Autònoma de Barcelona,
mimeo, March, 37-44.
Heckman, James, H. Ichimura and P. E. Todd (1998), "Matching as an Econometric Evaluation Estimator",
Review of Economic Studies, 65, 261-294.
Huesca, Luis and Mario Camberos (2009), "El mercado laboral mexicano 1992 y 2002: Un análisis
contrafactual de los cambios en la informalidad", Economía Mexicana, Vol. XVIII, Núm. 1, primer
semestre, pp. 5-43.
INEGI (2006), Encuesta Nacional de Empleo Urbano, 1992 and 2002, ENEU, INEGI, Ags., México, Bases de
datos.
Jenkins, Stephen and Philippe Van Kerm (2005), "Accounting for income distribution trends: A density
function decomposition approach", Journal of Economic Inequality, 3, pp. 43-62.
Silverman, B. W. (1986), Density estimation for statistics and data analysis, Chapman and Hall, London.
Van Kerm, Philippe (2003), "Adaptive kernel density estimation" (akdensity), The Stata Journal, 3(2), 148-156.