ECON 6002
Econometrics
Memorial University of Newfoundland
Limited Dependent Variable Models
Adapted from Vera Tabakova’s notes
Censoring, Truncation, sample selection and related models
We now consider two closely related models:
• regression when the dependent variable of interest is
incompletely observed (due to censoring or
truncation)
• regression when the dependent variable is completely
observed but is observed in a selected sample that is
not representative of the population
Principles of Econometrics, 3rd Edition
Slide16-2
Censoring, Truncation, sample selection and related models
OLS regression yields inconsistent
estimates because the sample is not representative
of the population
The first-generation estimation methods require
strong distributional assumptions and even
seemingly minor departures from those
assumptions, such as heteroskedasticity, can lead to
inconsistency
16.7.1 Censored Data
Figure 16.3 Histogram of Wife’s Hours of Work in 1975
Having censored data means that a substantial fraction of the
observations on the dependent variable take a limit value. The
regression function is no longer given by (16.30).
$$ E(y \mid x) = \beta_1 + \beta_2 x \qquad (16.30) $$
The least squares estimators of the regression parameters obtained by
running a regression of y on x are biased and inconsistent—least
squares estimation fails.
Censoring occurs when some of the
observations of the dependent variable have
been recorded as having reached a limit value
regardless of what their actual value might be
For instance, anyone earning $1 million or
more per year might be recorded in your
dataset at the upper limit of $1 million
With truncation, we only observe the value of
the regressors when the dependent variable
takes a certain value (usually a positive one
instead of zero)
With censoring we observe in principle the
value of the regressors for everyone, but not
the value of the dependent variable for those
whose dependent variable takes a value
beyond the limit
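To make the distinction concrete, here is a minimal Python sketch. The income figures and the helper names `top_code` and `truncate_at` are illustrative assumptions, not part of the slides. Censoring keeps every unit but overwrites values at the limit, while truncation drops those units altogether.

```python
# Made-up annual incomes in dollars (illustrative values only).
incomes = [40_000, 250_000, 1_000_000, 3_500_000, 90_000]
LIMIT = 1_000_000

def top_code(values, limit):
    """Censoring from above: every unit stays in the sample, but any value
    at or beyond the limit is recorded as the limit itself."""
    return [min(v, limit) for v in values]

def truncate_at(values, limit):
    """Truncation from above: units at or beyond the limit are dropped, so
    neither their outcome nor their regressors are observed."""
    return [v for v in values if v < limit]

censored = top_code(incomes, LIMIT)
truncated = truncate_at(incomes, LIMIT)

print("censored sample :", censored)
print("truncated sample:", truncated)
```

The censored sample has as many rows as the original one, whereas the truncated sample is shorter.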
We give the parameters the specific values $\beta_1 = -9$ and $\beta_2 = 1$.
$$ y_i^* = \beta_1 + \beta_2 x_i + e_i = -9 + x_i + e_i \qquad (16.31) $$
Assume $e_i \sim N(0, \sigma^2 = 16)$. The observed value is
$$ y_i = 0 \ \text{if } y_i^* \le 0; \qquad y_i = y_i^* \ \text{if } y_i^* > 0. $$
Create N = 200 random values of xi that are spread evenly (or
uniformly) over the interval [0, 20]. These we will keep fixed in
further simulations.
Obtain N = 200 random values ei from a normal distribution with
mean 0 and variance 16.
Create N = 200 values of the latent variable.
Obtain N = 200 values of the observed $y_i$ using
$$ y_i = \begin{cases} 0 & \text{if } y_i^* \le 0 \\ y_i^* & \text{if } y_i^* > 0 \end{cases} $$
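The four steps above can be sketched in Python as follows. This is a minimal sketch, assuming the parameter values of (16.31), an intercept of -9, a slope of 1 and an error variance of 16. The function name `simulate_censored_sample` and the seed are assumptions made for illustration, so the draws will not reproduce the textbook's artificial data.

```python
import random

def simulate_censored_sample(n=200, beta1=-9.0, beta2=1.0, sigma=4.0, seed=12345):
    """Generate x, the latent y*, and the observed (censored) y."""
    rng = random.Random(seed)
    # Step 1: x spread uniformly over [0, 20].
    x = [rng.uniform(0.0, 20.0) for _ in range(n)]
    # Step 2: normal errors with mean 0 and variance sigma**2 (16 by default).
    e = [rng.gauss(0.0, sigma) for _ in range(n)]
    # Step 3: latent variable y* = beta1 + beta2 * x + e.
    y_star = [beta1 + beta2 * xi + ei for xi, ei in zip(x, e)]
    # Step 4: observed y is 0 when y* <= 0 and equals y* otherwise.
    y = [ys if ys > 0 else 0.0 for ys in y_star]
    return x, y_star, y

x, y_star, y = simulate_censored_sample()
share_censored = sum(1 for yi in y if yi == 0.0) / len(y)
print(f"share of observations censored at zero: {share_censored:.2f}")
```

To keep the x values fixed in further simulations, as the slide suggests, one would draw them once and redraw only the errors.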
Figure 16.4 Uncensored Sample Data and Regression Function
Figure 16.5 Censored Sample Data, and Latent Regression Function and
Least Squares Fitted Line
$$ \hat{y}_i = -2.1477 + .5161\,x_i \qquad (16.32a) $$
(se) (.3706) (.0326)
$$ \hat{y}_i = -3.1399 + .6388\,x_i \qquad (16.32b) $$
(se) (1.2055) (.0827)
$$ E_{MC}(b_k) = \frac{1}{NSAM} \sum_{m=1}^{NSAM} b_{k(m)} \qquad (16.33) $$
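The Monte Carlo average in (16.33) can be illustrated with a short simulation. This is a sketch under stated assumptions: the number of samples (`nsam=200`), the seed and the function names are choices made here, and the design follows (16.31) with x held fixed across samples. It averages the least squares estimates obtained from all observations and from the positive observations only.

```python
import random

def ols(x, y):
    """Simple least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2

def monte_carlo_means(nsam=200, n=200, beta1=-9.0, beta2=1.0, sigma=4.0, seed=2024):
    """Average the OLS estimates over nsam censored samples, as in (16.33)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 20.0) for _ in range(n)]  # held fixed across samples
    totals = {"all": [0.0, 0.0], "positive": [0.0, 0.0]}
    for _ in range(nsam):
        # Censored outcome: zero whenever the latent value is not positive.
        y = [max(0.0, beta1 + beta2 * xi + rng.gauss(0.0, sigma)) for xi in x]
        kept = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
        estimates = {
            "all": ols(x, y),
            "positive": ols([xi for xi, _ in kept], [yi for _, yi in kept]),
        }
        for key, (b1, b2) in estimates.items():
            totals[key][0] += b1
            totals[key][1] += b2
    return {key: (t[0] / nsam, t[1] / nsam) for key, t in totals.items()}

mc = monte_carlo_means()
for key, (b1_mean, b2_mean) in mc.items():
    print(f"{key:>8}: E_MC(b1) = {b1_mean:.3f}, E_MC(b2) = {b2_mean:.3f}")
```

If least squares were unbiased here, both averages would settle near the true values of -9 and 1. In this design the average slopes stay below 1 in both cases.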
The maximum likelihood procedure is called Tobit in honor of James
Tobin, winner of the 1981 Nobel Prize in Economics, who first
studied this model.
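The probit probability that $y_i = 0$ is:
$$ P(y_i = 0) = P[y_i^* \le 0] = 1 - \Phi\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) $$
The likelihood function is:
$$ L(\beta_1, \beta_2, \sigma) = \prod_{y_i = 0} \left[ 1 - \Phi\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) \right] \times \prod_{y_i > 0} \left\{ (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{1}{2\sigma^2} (y_i - \beta_1 - \beta_2 x_i)^2 \right) \right\} $$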
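The logarithm of this likelihood is what a Tobit routine maximizes. As a hedged illustration, and not Stata's actual implementation, the function below evaluates it for given parameter values using only the Python standard library. The name `tobit_loglik` is an assumption made here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def tobit_loglik(beta1, beta2, sigma, x, y):
    """Log-likelihood of the Tobit model with censoring from below at zero."""
    total = 0.0
    for xi, yi in zip(x, y):
        index = beta1 + beta2 * xi
        if yi > 0:
            # Uncensored observation: log of the normal density of y_i.
            resid = yi - index
            total += -0.5 * math.log(2.0 * math.pi * sigma ** 2)
            total += -(resid ** 2) / (2.0 * sigma ** 2)
        else:
            # Limit observation: log of P(y* <= 0) = 1 - Phi(index / sigma),
            # written as Phi(-index / sigma), which is the same quantity.
            total += math.log(_STD_NORMAL.cdf(-index / sigma))
    return total

# One limit observation and one positive observation, at the values in (16.31).
print(tobit_loglik(-9.0, 1.0, 4.0, x=[9.0, 11.0], y=[0.0, 2.0]))
```

A full estimator would pass the negative of this function to a numerical optimizer. That step is omitted here to keep the sketch short.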
The maximum likelihood estimator is consistent and asymptotically
normal, with a known covariance matrix.
Using the artificial data the fitted values are:
$$ \tilde{y}_i = -10.2773 + 1.0487\,x_i \qquad (16.34) $$
(se) (1.0970) (.0790)
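$$ \frac{\partial E(y \mid x)}{\partial x} = \beta_2\, \Phi\left( \frac{\beta_1 + \beta_2 x}{\sigma} \right) \qquad (16.35) $$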
Because the cdf values are positive, the sign of the coefficient does
tell the direction of the marginal effect, just not its magnitude. If
β2 > 0, as x increases the cdf function approaches 1, and the slope of
the regression function approaches that of the latent variable model.
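For readers who want to see where (16.35) comes from, the following sketch differentiates the standard Tobit expression for the mean of the observed variable. That expression is a textbook result assumed here rather than stated on the slides. The symbol $z$ and the use of $\phi'(z) = -z\,\phi(z)$ are the only ingredients.

```latex
\begin{aligned}
z &= \frac{\beta_1 + \beta_2 x}{\sigma}, \\
E(y \mid x) &= \Phi(z)\,(\beta_1 + \beta_2 x) + \sigma\,\phi(z), \\
\frac{\partial E(y \mid x)}{\partial x}
  &= \frac{\beta_2}{\sigma}\,\phi(z)\,(\beta_1 + \beta_2 x)
     + \beta_2\,\Phi(z)
     + \sigma\,\phi'(z)\,\frac{\beta_2}{\sigma} \\
  &= \beta_2\, z\,\phi(z) + \beta_2\,\Phi(z) - \beta_2\, z\,\phi(z) \\
  &= \beta_2\,\Phi(z).
\end{aligned}
```

The two terms involving $\phi(z)$ cancel, which is why the marginal effect is the latent slope scaled by a probability between 0 and 1.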
Figure 16.6 Censored Sample Data, and Regression Functions for Observed and Positive y values (curves shown: uncensored mean, truncated mean, censored mean)
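$$ HOURS = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \beta_4 AGE + \beta_5 KIDSL6 + e \qquad (16.36) $$
$$ \frac{\partial E(HOURS)}{\partial EDUC} = \hat{\beta}_2 \hat{\Phi} = 73.29 \times .3638 = 26.34 $$
(The slide also shows 26.66, which is the product of the rounded figures 73.29 and .3638.)
This is the marginal effect on the observed hours, while 73.29 is the effect on the
underlying “unconditional” hours*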
*NB: in all cases the expectation is conditional on the values of the regressors, so do not get confused by the terminology here
• Estimating the model by OLS with the zero observations in the model
would reduce all of the slope coefficients substantially
• Eliminating the zero observations as in the OLS regression just shown
even reverses the sign of the effect of years of schooling (though it is a
non-significant effect)
• For only women in the labor force, more schooling has no effect on
hours worked
• If you consider the entire population of women, however, more
schooling does increase hours, but we can now see that it is likely by
encouraging more women into the labor force, not by encouraging
those already in the market to work more hours
http://www.stata.com/support/faqs/statistics/mfx-after-ologit/#intreg
There are several marginal effects of potential interest after -tobit-:
- the marginal effect on the expected value of the latent dependent
variable (on E(y*), simply given by the Tobit estimate)
- the marginal effect on the expected value of the dependent variable
conditional on its being greater than the lower limit (on E(y|x,
y>0)=E(y*|x, y>0))
- the marginal effect on the expected value of the observed (that is zeros
included) dependent variable (on E(y|x), given by Expression 16.35)
- the marginal effect on the probability of the dependent variable
exceeding the lower limit
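The four effects listed above have closed forms in the simple Tobit model with a lower limit of zero. The sketch below evaluates them at one value of x. It uses the standard formulas for a normal latent variable, which are assumed here rather than taken from the slides. The function names and the evaluation point x = 10 with the parameters of (16.31) are illustrative choices.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # cdf is Phi, pdf is phi

def expected_observed(beta1, beta2, sigma, x):
    """E(y|x), zeros included: Phi(z) * (b1 + b2*x) + sigma * phi(z)."""
    index = beta1 + beta2 * x
    z = index / sigma
    return _STD_NORMAL.cdf(z) * index + sigma * _STD_NORMAL.pdf(z)

def expected_positive(beta1, beta2, sigma, x):
    """E(y|x, y>0): (b1 + b2*x) + sigma * lambda(z), with lambda = phi / Phi."""
    index = beta1 + beta2 * x
    z = index / sigma
    return index + sigma * _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)

def tobit_marginal_effects(beta1, beta2, sigma, x):
    """Marginal effects of x on the four quantities of interest."""
    z = (beta1 + beta2 * x) / sigma
    cdf = _STD_NORMAL.cdf(z)
    pdf = _STD_NORMAL.pdf(z)
    lam = pdf / cdf  # inverse Mills ratio
    return {
        "latent": beta2,                                # on E(y*|x)
        "conditional": beta2 * (1.0 - lam * (z + lam)), # on E(y|x, y>0)
        "observed": beta2 * cdf,                        # on E(y|x)
        "probability": beta2 * pdf / sigma,             # on P(y>0|x)
    }

effects = tobit_marginal_effects(-9.0, 1.0, 4.0, x=10.0)
for name, value in effects.items():
    print(f"{name:>12}: {value:.4f}")
```

The two helper functions are included so that each closed-form effect can be checked against a numerical derivative of the corresponding expectation.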
By default Stata reports the marginal effects on the latent variable,
which are exactly the same as the coefficients
estimated by -tobit-. You will have to specify the -predict()- option in -mfx- to get the other marginal effects. See
-help mfx- and -help tobit postestimation-
- the marginal effect on the expected value of the latent
dependent variable (on E(y*), simply given by the Tobit
estimate)
- the marginal effect on the expected value of the dependent
variable conditional* on its being uncensored, that is,
greater than the lower limit (on E(y|x, y>0)=E(y*|x, y>0))
mfx compute, predict(e(0,.))
mfx compute, predict(e(a,b))
*NB: in all cases the expectation is conditional on the values of the regressors, so do not get confused by the terminology here
- the marginal effect on the expected value of the observed
(that is, zeros included) dependent variable (on E(y|x),
given by Expression 16.35)
mfx compute, predict(ys(0,.))
mfx compute, predict(ys(a,b))
- the marginal effect on the probability of the dependent
variable exceeding the lower limit
mfx compute, predict(p(0,.))
mfx compute, predict(p(a,b))
Interval data are data recorded in intervals rather than as a continuous variable.
Survey data are often collected in this way to make it easier for the respondent and
to provide greater anonymity in responses to more personal questions, such as
income and age.
Income is often reported in intervals of $10,000 and then top-coded at a figure like
$100,000 or $130,000.
In contingent valuation studies, the questions used to elicit willingness to pay
sometimes ask respondents to choose an interval.
Such data are then censored at multiple points, with the observed data y being only the
particular interval in which the unobserved y∗ lies.
In these cases you have a multi-censored
dependent variable
Stata’s intreg will help with this model
In contingent valuation studies, a double-bounded dichotomous-choice question is sometimes used to elicit willingness
to pay
In these cases you have a doubly-censored dependent
variable with two variable limits
You are probably guessing that another (less flexible) way
to model these cases is by using an ordered regression
model
The ordered probit in particular would be quite close to the
interval regression model
STATA’s intreg will help with this model
Example: http://www.ats.ucla.edu/stat/stata/dae/intreg.htm
intreg depvar1 depvar2 [indepvars] [if] [in] [weight] [, options]
By choosing the depvar1 depvar2 smartly you can also fit other models:
Type of data                        depvar1   depvar2
-----------------------------------------------------
point data            a = [a,a]     a         a
interval data         [a,b]         a         b
left-censored data    (-inf,b]      .         b
right-censored data   [a,inf)       a         .
-----------------------------------------------------
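Each row of this table corresponds to a different likelihood contribution when the latent variable is assumed normal. The Python sketch below shows those contributions for a single observation. It is an illustration of the idea, not Stata's code. The function name `interval_loglik_obs` is an assumption, and `None` stands in for Stata's missing value `.`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def interval_loglik_obs(lower, upper, mu, sigma):
    """Log-likelihood contribution of one observation with latent mean mu.

    lower/upper play the role of depvar1/depvar2; None means missing.
    """
    if lower is not None and upper is not None and lower == upper:
        # Point data: log of the normal density at the observed value.
        z = (lower - mu) / sigma
        return math.log(_STD_NORMAL.pdf(z) / sigma)
    # Otherwise: log of the probability that y* falls inside the interval.
    # A missing bound means the interval is open on that side.
    p_upper = 1.0 if upper is None else _STD_NORMAL.cdf((upper - mu) / sigma)
    p_lower = 0.0 if lower is None else _STD_NORMAL.cdf((lower - mu) / sigma)
    return math.log(p_upper - p_lower)

# One example per row of the table, with latent mean 0 and sigma 4.
examples = {
    "point data": (2.0, 2.0),
    "interval data": (-4.0, 4.0),
    "left-censored data": (None, 0.0),
    "right-censored data": (0.0, None),
}
for label, (lo, hi) in examples.items():
    print(f"{label:>20}: {interval_loglik_obs(lo, hi, mu=0.0, sigma=4.0):.4f}")
```

With the left-censored row and a limit of zero, the contribution is the same as that of a limit observation in the Tobit likelihood, which is why the Tobit model can be seen as a special case of this setup.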
binary choice models
censored data
latent variables
likelihood function
limited dependent variables
log-likelihood function
marginal effect
maximum likelihood estimation
multinomial choice models
ordered choice models
ordered probit
ordinal variables
probit
tobit model
truncated data
Survival analysis (time-to-event data
analysis)
Hoffmann, 2004 for all topics
Long, S. and J. Freese for all topics
Agresti, A. (2001) Categorical Data Analysis
(2nd ed). New York: Wiley.