TRUNCATED AND CENSORED VARIABLES


Estimation taking account of
sample selection with Stata
Cheti Nicoletti
ISER, University of Essex
2009
• Estimation commands:
truncreg, tobit,
heckman, heckprobit,
treatreg, ivreg
• Other useful commands:
ivprobit, ivtobit
• Useful option in the estimation commands:
pweights
truncreg
• The truncreg command is useful to estimate regression
models with a truncated sample
• Ex: Health insurance claims observed only when amount
claimed is higher than a fixed threshold.
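$$y^* = x\beta + \varepsilon, \qquad \varepsilon \sim \text{iid } N(0, \sigma^2)$$
We observe $y^*$ only if $y^* > c$, so that
$$E(Y \mid Y > c) = x\beta + \sigma\,\lambda(\alpha), \qquad \text{where } \alpha = \frac{c - x\beta}{\sigma} \text{ and } \lambda(\alpha) = \frac{\phi(\alpha)}{1 - \Phi(\alpha)}$$
We observe $y = y^* \mid y^* > c \sim$ Truncated Normal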
truncreg y x1 x2 … xk, ll(c)
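The truncation model above can be tried on simulated data. The following is only a sketch: the variable names, the seed, the sample size and the parameter values (intercept 1, slope 2, truncation point c = 0) are assumptions made for illustration and do not come from the slides.

```stata
* Simulated illustration; all names and parameter values are hypothetical
clear
set seed 12345
set obs 2000
generate x1 = rnormal()
generate ystar = 1 + 2*x1 + rnormal()
* the sample is truncated: units with ystar <= 0 are never observed
keep if ystar > 0
generate y = ystar
* OLS on the truncated sample ignores the truncation
regress y x1
* truncated regression with lower truncation limit c = 0
truncreg y x1, ll(0)
```

Comparing the two sets of estimates shows how ignoring the truncation distorts the slope, while truncreg should return values close to those used in the simulation.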
tobit
• The tobit command is useful to estimate
regression models with a censored
dependent variable (deterministic censoring)
• Three different types of models:
Tobit with a fixed censoring value (tobit)
Censored regression with a varying censoring
value (cnreg)
Regression with interval data (intreg)
tobit
• Tobit first type (consumption of a good)
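$$y^* = x\beta + \varepsilon, \qquad \varepsilon \sim \text{iid } N(0, \sigma^2)$$
$$y = \begin{cases} y^* & \text{if } y^* > 0 \\ 0 & \text{if } y^* \le 0 \end{cases}$$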
tobit y x1 x2 … xk , ll(0)
$$y_2 = \begin{cases} y^* & \text{if } y^* < c \\ c & \text{if } y^* \ge c \end{cases}$$
tobit y x1 x2 … xk , ul(c)
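Both censoring directions can be illustrated on simulated data. This is a sketch under assumed values: the variable names, the seed, the coefficients and the upper limit c = 3 are hypothetical and are not taken from the slides.

```stata
* Simulated illustration; all names and parameter values are hypothetical
clear
set seed 12345
set obs 2000
generate x1 = rnormal()
generate ystar = 1 + 2*x1 + rnormal()
* censoring from below at 0: values below 0 are recorded as 0
generate y = max(ystar, 0)
tobit y x1, ll(0)
* censoring from above at c = 3: values above 3 are recorded as 3
generate y2 = min(ystar, 3)
tobit y2 x1, ul(3)
```

Unlike the truncated case, the censored observations stay in the sample, so the estimation sample has the full 2000 observations in both runs.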
cnreg
• Tobit first type
Ex. minimum wage with different levels in different years
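$$y^* = x\beta + \varepsilon, \qquad \varepsilon \sim \text{iid } N(0, \sigma^2)$$
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > c_i \\ c_i & \text{if } y_i^* \le c_i \end{cases} \qquad d_i = \begin{cases} 0 & \text{if } y_i^* > c_i \\ -1 & \text{otherwise} \end{cases}$$
where i is the individual index. The variable passed to censored() must be 0 for uncensored observations, -1 for left-censored observations and +1 for right-censored observations; censoring at a minimum wage is censoring from below.
• cnreg y x1 x2 … xk, censored(d)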
intreg
• Interval data regression (Ex:Bracket information on
income for people refusing to give the exact value)
• When yi* is not declared, we observe the range to which yi* belongs:
(0, 5000], (5000, 15000], (15000, 30000], (30000, +∞), say (ai, bi]
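$$y^* = x\beta + \varepsilon, \qquad \varepsilon \sim \text{iid } N(0, \sigma^2)$$
$$d = \begin{cases} 1 & \text{if the exact value of } y^* \text{ is declared} \\ 0 & \text{otherwise} \end{cases}$$
$$L = \prod_{i=1}^{n} \left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i^* - x_i\beta)^2}{2\sigma^2} \right) \right]^{d_i} \left[ \Phi\left( \frac{b_i - x_i\beta}{\sigma} \right) - \Phi\left( \frac{a_i - x_i\beta}{\sigma} \right) \right]^{1 - d_i}$$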
Estimating the regression with
interval data in Stata
The command intreg needs two variables to define the
dependent variable, say y1 and y2

Individuals giving               Case                      y1      y2
An exact value of their income   y*                        y*      y*
                                 Example: 5980             5980    5980
A range for their income         y* in (a, b]              a       b
                                 Example: (5000, 15000]    5000    15000
                                 Example: (30000, +∞)      30000   .
intreg y1 y2 x1 x2 … xk
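The construction of y1 and y2 can be sketched on simulated data. Everything here is an assumption made for illustration: the variable names, the 60% share of respondents declaring an exact value, and the income equation. Because the simulated income is normal and can fall below zero, the lowest bracket is treated as open on the left (y1 missing) rather than starting at 0.

```stata
* Simulated illustration; names, brackets and parameter values are hypothetical
clear
set seed 12345
set obs 2000
generate x1 = rnormal()
generate income = 20000 + 5000*x1 + 8000*rnormal()
* about 60% of respondents declare the exact value
generate byte declared = runiform() < 0.6
* exact answers: y1 = y2 = income; bracketed answers start as missing
generate y1 = income if declared
generate y2 = income if declared
* bracketed answers: y1 = lower bound, y2 = upper bound
* a missing y1 means no lower bound, a missing y2 means no upper bound
replace y2 = 5000  if !declared & income <= 5000
replace y1 = 5000  if !declared & income > 5000  & income <= 15000
replace y2 = 15000 if !declared & income > 5000  & income <= 15000
replace y1 = 15000 if !declared & income > 15000 & income <= 30000
replace y2 = 30000 if !declared & income > 15000 & income <= 30000
replace y1 = 30000 if !declared & income > 30000
intreg y1 y2 x1
```

Every observation has at least one of y1 and y2 non-missing, so intreg uses the whole sample.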
heckman
• The heckman command is used to estimate the Generalized Tobit (or
Tobit of the 2nd type) model, by ML estimation (the default) or by two-step estimation (option twostep)
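$$y^* = x\beta + \varepsilon, \qquad \varepsilon \sim \text{iid } N(0, \sigma^2)$$
$$d^* = z\gamma + v, \qquad \text{where } v \sim \text{iid } N(0, 1)$$
$$d = \begin{cases} 1 & \text{if } d^* > 0 \text{ (respondents)} \\ 0 & \text{if } d^* \le 0 \text{ (otherwise)} \end{cases} \qquad y = \begin{cases} y^* & \text{if } d^* > 0 \\ . & \text{otherwise} \end{cases}$$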
heckman y x1 x2 … xk, select(z1 z2 … zs)
heckman y x1 x2 … xk, select(d = z1 z2 … zs)
heckman y x1 x2 … xk, select(z1 z2 … zs) twostep
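A simulated example may help to see the selection model at work. It is only a sketch: the variable names, the seed, the coefficients and the error correlation are assumptions made for illustration. The variable z1 enters the selection equation only, which serves as the exclusion restriction.

```stata
* Simulated illustration; all names and parameter values are hypothetical
clear
set seed 12345
set obs 5000
generate x1 = rnormal()
generate z1 = rnormal()
* v is the selection error; e is correlated with v by construction
generate v = rnormal()
generate e = 0.5*v + rnormal()
* response indicator
generate byte d = (0.5 + z1 + 0.5*x1 + v > 0)
* the outcome is observed for respondents only (missing otherwise)
generate y = 1 + 2*x1 + e if d
* maximum likelihood (default)
heckman y x1, select(d = z1 x1)
* two-step estimator
heckman y x1, select(d = z1 x1) twostep
```

Since y is missing exactly when d is 0, the form select(z1 x1) without the explicit indicator would define the same selection rule.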
heckprobit
• The heckprobit command is used to estimate a probit
model with sample selection (there is no twostep option
because the two-step estimator would be inconsistent)
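$$y^* = x\beta + \varepsilon, \qquad \varepsilon \sim \text{iid } N(0, 1), \qquad p^* = \begin{cases} 1 & \text{if } y^* > 0 \\ 0 & \text{otherwise} \end{cases}$$
$$d^* = z\gamma + v, \qquad \text{where } v \sim \text{iid } N(0, 1)$$
$$p = \begin{cases} p^* & \text{if } d^* > 0 \\ . & \text{otherwise} \end{cases} \qquad d = \begin{cases} 1 & \text{if } d^* > 0 \text{ (the } i\text{-th individual responds)} \\ 0 & \text{if } d^* \le 0 \text{ (otherwise)} \end{cases}$$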
heckprobit p x1 x2 … xk, select(z1 z2 … zs)
$$L = \prod_{i} \left[ \Phi_2(X_i\beta, Z_i\gamma, \rho) \right]^{p_i d_i} \left[ \Phi_2(-X_i\beta, Z_i\gamma, -\rho) \right]^{(1 - p_i) d_i} \left[ \Phi(-Z_i\gamma) \right]^{1 - d_i}$$
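The probit model with selection can be illustrated in the same way. This is a sketch on simulated data: the variable names, the seed, the coefficients and the error correlation of 0.6 are hypothetical choices, not values from the slides.

```stata
* Simulated illustration; all names and parameter values are hypothetical
clear
set seed 12345
set obs 5000
generate x1 = rnormal()
generate z1 = rnormal()
generate v = rnormal()
* e has unit variance (0.36 + 0.64) and correlation 0.6 with v
generate e = 0.6*v + 0.8*rnormal()
* response indicator
generate byte d = (0.5 + z1 + v > 0)
* binary outcome, observed for respondents only
generate byte p = (0.5 + x1 + e > 0) if d
heckprobit p x1, select(d = z1 x1)
```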
Impact of an endogenous dummy
Homogenous treatment effect
y1= earnings for trained people
y0= earnings for non-trained people
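d = dummy indicating participation in the training program
y = y1 d + y0 (1 − d)
$$y = x\beta + \delta d + \varepsilon$$
$$d^* = z\gamma + u, \qquad \text{where } d = 1(d^* > 0)$$
$$\begin{pmatrix} \varepsilon \\ u \end{pmatrix} \sim \text{iid } N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & \sigma_{\varepsilon u} \\ \sigma_{\varepsilon u} & 1 \end{pmatrix} \right)$$
We have a selection problem because of the correlation
between u and ε. This implies that d is not independent of ε.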
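The size of the selection problem can be made explicit with a short derivation. The notation is partly assumed: $\delta$ denotes the coefficient on d in the earnings equation, $\gamma$ the coefficients of the participation equation, and $\sigma_{\varepsilon u}$ the covariance between the earnings error $\varepsilon$ and u, with u standard normal.

```latex
\begin{aligned}
E(y \mid x, z, d = 1) &= x\beta + \delta + \sigma_{\varepsilon u}\,\frac{\phi(z\gamma)}{\Phi(z\gamma)} \\
E(y \mid x, z, d = 0) &= x\beta - \sigma_{\varepsilon u}\,\frac{\phi(z\gamma)}{1 - \Phi(z\gamma)} \\
E(y \mid x, z, d = 1) - E(y \mid x, z, d = 0) &= \delta + \sigma_{\varepsilon u}\,\frac{\phi(z\gamma)}{\Phi(z\gamma)\,[1 - \Phi(z\gamma)]}
\end{aligned}
```

A simple comparison of trained and non-trained people therefore recovers $\delta$ only when $\sigma_{\varepsilon u} = 0$.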
treatreg
• The treatreg command is used to evaluate the effect of an
endogenous binary variable (treatment, program, …) on
a continuous variable of interest (see previous slide).
treatreg y x1 x2 … xk , treat(d=z1 z2 … zs)
• Ex: Sample of graduates with and without a
master's degree
• y = log earnings, d = 1 if master's degree, 0 otherwise
• x = age, age squared, d, sex, type of first degree
• z = mother’s level of education, father’s level of
education, sex, type of first degree
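The effect of an endogenous dummy can be illustrated on simulated data. This is a sketch only: the variable names, the seed and the parameter values (including a true treatment effect of 1.5) are assumptions. Note also that from Stata 13 onwards the command is documented under the name etregress; that detail is not part of the slides.

```stata
* Simulated illustration; all names and parameter values are hypothetical
clear
set seed 12345
set obs 5000
generate x1 = rnormal()
generate z1 = rnormal()
* u drives participation; e is correlated with u by construction
generate u = rnormal()
generate e = 0.5*u + rnormal()
generate byte d = (z1 + u > 0)
generate y = 1 + 2*x1 + 1.5*d + e
* OLS treats d as exogenous and overstates its effect here
regress y x1 d
* treatment-effects model with endogenous d
treatreg y x1, treat(d = z1)
```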
How to use weights in Stata
• Most Stata commands can deal with weighted data.
Stata allows four kinds of weights:
1. fweights, or frequency weights, are weights that
indicate the number of duplicated observations.
2. pweights, or sampling weights, are weights that
denote the inverse of the probability that the
observation is included due to the sampling
design and/or nonresponse.
3. aweights, or analytic weights, are weights that are
inversely proportional to the variance of an
observation; i.e., the variance of the j-th observation
is assumed to be sigma^2/w_j, where w_j are the
weights.
4. iweights, or importance weights, are weights that
indicate the "importance" of the observation in some
vague sense.
Option pweights
• Sample surveys usually provide weights to take account of the sampling
design and nonresponse.
• Let p be the individual weight
• Then we can run a regression with weighted observations
regress y x1 x2 … xk [pweight=p]
• Suppose the sample has a sample selection problem
(due to observables); then we can use propensity score weighting
• A possible “simplified” way to estimate your own weights is
the following:
probit d z1 z2 … zs
predict prop
gen invprop=1/prop
reg y x1 x2 … xk [pweight=invprop]
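The reason inverse propensity weights can correct for selection on observables is summarised in the identity below. It is a sketch under an assumption stated here rather than in the slides: conditional on z, the selection indicator d is independent of y, and $p(z) = P(d = 1 \mid z) > 0$ is the propensity score estimated by the probit.

```latex
E\left[ \frac{d\,y}{p(z)} \right]
  = E\left[ \frac{E(d \mid z)\,E(y \mid z)}{p(z)} \right]
  = E\left[ E(y \mid z) \right]
  = E(y)
```

Weighting each selected observation by $1/p(z)$ therefore reproduces, on average, the moments of the full population.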
For complex survey designs it is
better to use
• svyset [pweight=p]
• svy: regress y x1 x2 … xk
• svyset has options for cluster sampling
designs and other complex designs
• Ex: declare a stratified survey design for the dataset
• svyset [pweight=p], strata(stratid)
ivreg
• The ivreg command is used to estimate a regression
model by using instrumental variables for potentially
endogenous explanatory variables.
• Ex: Evaluation of the impact of years of schooling on
earnings
$$y = x\beta + \delta d^* + \varepsilon$$
Problem: d* and ε are correlated
Solution 1: IV estimation (IV = z: parental interest in the
child's education, bad financial shock to the family
when the child is aged 11-16, presence of older
siblings, Blundell et al 2003)
ivreg y x1 x2 … xk (d*=z1 z2 … zs)
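The schooling example can be mimicked with simulated data. This is a sketch: the variable names, the seed and the parameter values (a true return to schooling of 0.1) are assumptions made for illustration. In Stata 10 and later the same estimator is also available as ivregress 2sls.

```stata
* Simulated illustration; all names and parameter values are hypothetical
clear
set seed 12345
set obs 5000
generate x1 = rnormal()
generate z1 = rnormal()
* u affects schooling; e is correlated with u, so schooling is endogenous
generate u = rnormal()
generate e = 0.5*u + rnormal()
generate s = 12 + z1 + u
generate y = 1 + 0.5*x1 + 0.1*s + e
* OLS is inconsistent because s is correlated with e
regress y x1 s
* instrumental variables, using z1 as the instrument for s
ivreg y x1 (s = z1)
```

With this design the OLS coefficient on s is pushed upwards, while the IV estimate should be close to 0.1.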
STATA program for evaluation
Abadie A., Drukker D., Herr J.L., Imbens G.W. (2001),
Implementing Matching Estimators for Average Treatment
Effects in Stata, The Stata Journal, 1, 1-18
http://ksghome.harvard.edu/~.aabadie.academic.ksg/software.html
Becker S.O., Ichino A. (2002), Estimation of average treatment
effects based on propensity scores, The Stata Journal, 2, 358-377,
http://www.lrz-muenchen.de/~sobecker/pscore.html
Sianesi B. (2001), Implementing Propensity Score Matching
Estimators with STATA, UK Stata Users Group, VII Meeting
London, http://ideas.repec.org/c/boc/bocode/s432001.html
Text Book References:
• Amemiya T. (1985), Advanced Econometrics, Basil Blackwell,
Oxford.
• Gourieroux C. (2000), Econometrics of Qualitative Dependent
Variables, Cambridge University Press, Cambridge.
• Greene W.H. (2000), Econometric Analysis, Third edition, Prentice-Hall, London.
• Maddala G. S. (1983), Limited-Dependent and Qualitative Variables
in Econometrics, Cambridge University Press, Cambridge.
• Wooldridge J.M. (2002), Econometric Analysis of Cross-Section and
Panel Data, MIT press
• Lee M. (2005) Micro-Econometrics for policy, program and treatment
effects. Advanced Text in Econometrics. Oxford University Press,
Oxford
Survey Articles:
• Angrist J. (2001), Estimation of Limited-Dependent Variable Models with
Binary Endogenous Regressors: Simple Strategies for Empirical Practice,
Journal of Business and Economic Statistics, 19, 2-28.
• Angrist J.D., Krueger A.B. (1999), Empirical strategies in labor economics,
published as working paper Princeton University, 401, and in O. Ashenfelter
and D. Card, eds., Handbook of Labor Economics, Volume 3A, Amsterdam,
1277-1366.
• Blundell R., Costa-Dias M. (2002), Alternative approaches to evaluation in
empirical microeconomics, published as IFS, Cemmap working paper, 10,
and in Portuguese Economic Journal, Vol. 1, 91-115, 2002.
• Blundell R., Powell J.L. (2001), Endogeneity in nonparametric and
semiparametric regression models, IFS, Cemmap working paper, CWP09/01,
Chapter 8 in Advances in Economics and Econometrics, M. Dewatripont,
L. Hansen and S. J. Turnovsky (eds.), Cambridge University Press, ESM
36, pp. 312-357, 2003.
• Heckman J.J., Ichimura H., Smith J.A., Todd P. (1998), Characterizing
Selection Bias Using Experimental Data, Econometrica, 66, 1017-1098.
• Heckman J.J., LaLonde R.J., Smith J.A. (2000), The economics and
econometrics of active labor market programs, in O. Ashenfelter and D. Card
(eds.), Handbook of Labor Economics, vol. 3, North Holland, Amsterdam.
• Moffitt R. (2004), An introduction to the symposium of matching
econometrics, Review of Economics and Statistics, vol. 1, a collection of
articles on matching by various authors.
• Vella F. (1998), Estimating models with sample selection bias: a survey, The
Journal of Human Resources, vol. 33, 127-169.