Research Method
Lecture 15-2
Censored regression
Sometimes, the dependent variable (y variable) is right-censored (censored from above) or left-censored (censored from below).
Example 1: In some household surveys, when the wealth of a family exceeds $500,000, the data record it as $500,000, even if the actual wealth may be much higher than that amount. This is called top coding. Top coding is done to protect the identity of the survey participants. In this case, wealth is right-censored.
[Figure: Top-coding example. Wealth (vertical axis) plotted against the education of the head of the family (horizontal axis). When wealth exceeds the threshold, say $500,000, the data record it as $500,000.]
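Top coding replaces each value with the smaller of the value and the threshold. The Python sketch below applies it to made-up wealth figures (illustrative only, not survey data). The helper `top_code` is a hypothetical name introduced here. It returns the recorded values and a flag marking which records were censored.

```python
def top_code(values, threshold):
    """Return (recorded values, censoring flags) after top coding at `threshold`.

    A value at or above the threshold is recorded as the threshold itself and
    flagged 1 (right-censored); otherwise it is kept as-is and flagged 0.
    """
    recorded = [min(v, threshold) for v in values]
    flags = [1 if v >= threshold else 0 for v in values]
    return recorded, flags


# Hypothetical true wealth values (illustrative only).
true_wealth = [120_000, 480_000, 750_000, 2_300_000]
recorded, flags = top_code(true_wealth, 500_000)
print(recorded)  # [120000, 480000, 500000, 500000]
print(flags)     # [0, 0, 1, 1]
```

Once the data are recorded, the families worth 750,000 and 2,300,000 can no longer be told apart. Only the flag shows that their true wealth is at least $500,000.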
Example 2: Duration data. Suppose that a survey is conducted to measure how long it takes unemployed workers to find a job. If the survey lasts 12 months, some participants may not have found a job by the end of the survey. For those workers, the researcher only knows that the duration is longer than 12 months. Thus, the duration is right-censored.
When a variable is censored from above,
you only know that the variable is at least
equal to the threshold value.
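In duration data like Example 2, the true duration of a censored spell is never observed. Each record says either "found a job in month m" or "still unemployed when the survey ended". The sketch below turns such records into a recorded duration and a right-censoring dummy. It assumes a hypothetical 12-month window and made-up records, and the helper name `observed_duration` is introduced only for illustration.

```python
FOLLOW_UP_MONTHS = 12  # length of the hypothetical survey window


def observed_duration(months_to_job):
    """Map a follow-up record to (recorded duration, right-censoring dummy).

    `months_to_job` is the month in which the worker found a job, or None if
    the worker was still unemployed when the survey ended. In that case we
    only know the duration is at least FOLLOW_UP_MONTHS, so it is censored.
    """
    if months_to_job is None:
        return FOLLOW_UP_MONTHS, 1
    return months_to_job, 0


records = [3, None, 11, None]  # hypothetical survey records
print([observed_duration(r) for r in records])
# [(3, 0), (12, 1), (11, 0), (12, 1)]
```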
The censored regression model
Here, I will explain the censored regression model for the case where the dependent variable is right-censored.
Let $y_i$ be the actual value of the dependent variable for the ith person.
However, when $y_i$ reaches or exceeds a certain threshold, $c_i$, the data record it as $c_i$. In such a case, the observation is said to be right-censored.
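Let $w_i$ be the observed value of the dependent variable, which may be censored. Then the model is written as:

$$y_i = \beta_0 + \beta_1 x_i + u_i, \qquad u_i \sim N(0, \sigma^2)$$

$$w_i = \begin{cases} y_i & \text{if } y_i < c_i \\ c_i & \text{if } y_i \ge c_i \end{cases}$$

This can also be written as $w_i = \min(y_i, c_i)$.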
In the top-coding example, the threshold value is the same for everyone: $500,000. However, the threshold value can differ across individuals. That is why the threshold carries an i subscript.
It should be emphasized that, in the censored regression model, the thresholds c_i are known values. For example, in the top-coding example, the threshold is $500,000 for all observations.
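The simulation below makes the model concrete. It is a sketch that assumes one regressor, made-up parameter values, and person-specific thresholds c_i drawn at random. The function name `simulate_censored` and the uniform range for c_i are hypothetical choices. An analyst would observe only (x_i, c_i, w_i, D_i). The latent y_i is kept here solely to check that w_i = min(y_i, c_i).

```python
import random


def simulate_censored(n, beta0, beta1, sigma, seed=0):
    """Simulate the right-censored regression model.

    y = beta0 + beta1*x + u with u ~ N(0, sigma^2); w = min(y, c); D = 1 if y >= c.
    The thresholds c vary by person (uniform on [0.5, 2.0], an arbitrary
    illustrative choice) and are stored with the data, i.e. they are known.
    Returns a list of (x, c, y, w, D); y is kept only for checking, since an
    analyst would not observe it for censored rows.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        c = rng.uniform(0.5, 2.0)
        y = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        w = min(y, c)              # observed (possibly censored) outcome
        d = 1 if y >= c else 0     # right-censoring dummy
        rows.append((x, c, y, w, d))
    return rows


rows = simulate_censored(n=1000, beta0=1.0, beta1=1.0, sigma=1.0)
share_censored = sum(row[4] for row in rows) / len(rows)
print(f"share right-censored: {share_censored:.2f}")
```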
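When the person is not right-censored, we have $w_i = y_i$. Thus, $u_i = w_i - (\beta_0 + \beta_1 x_i)$, and the likelihood contribution is the height of the density function at that point, which is given by:

$$L_i = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(w_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right) = \frac{1}{\sigma}\,\frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{w_i - \beta_0 - \beta_1 x_i}{\sigma}\right)^2\right) = \frac{1}{\sigma}\,\phi\!\left(\frac{w_i - \beta_0 - \beta_1 x_i}{\sigma}\right)$$

where $\phi$ is the standard normal density function.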
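The code below is a numerical check of the equation above, assuming one regressor and arbitrary parameter values. It computes the uncensored contribution both as the N(β0 + β1x_i, σ²) density of w_i and as (1/σ)φ(·), and confirms that the two agree. The function names are hypothetical.

```python
import math


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def lik_uncensored(w, x, beta0, beta1, sigma):
    """Likelihood contribution (1/sigma) * phi((w - beta0 - beta1*x) / sigma)."""
    return std_normal_pdf((w - beta0 - beta1 * x) / sigma) / sigma


def lik_uncensored_direct(w, x, beta0, beta1, sigma):
    """The same quantity written directly as the N(beta0 + beta1*x, sigma^2) density of w."""
    resid = w - beta0 - beta1 * x
    return math.exp(-resid**2 / (2.0 * sigma**2)) / math.sqrt(2.0 * math.pi * sigma**2)


a = lik_uncensored(2.3, 1.0, 0.5, 1.2, 0.8)
b = lik_uncensored_direct(2.3, 1.0, 0.5, 1.2, 0.8)
print(math.isclose(a, b))  # True
```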
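If the person is right-censored, we only know that $y_i \ge c_i$. Thus, the likelihood contribution of this person is given by:

$$L_i = P(y_i \ge c_i) = P(\beta_0 + \beta_1 x_i + u_i \ge c_i) = P(u_i \ge c_i - \beta_0 - \beta_1 x_i) = P\!\left(\frac{u_i}{\sigma} \ge \frac{c_i - \beta_0 - \beta_1 x_i}{\sigma}\right) = 1 - \Phi\!\left(\frac{c_i - \beta_0 - \beta_1 x_i}{\sigma}\right)$$

where $\Phi$ is the standard normal cumulative distribution function.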
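The censored contribution 1 − Φ(z) can be computed with the standard library through the identity 1 − Φ(z) = erfc(z/√2)/2, which stays accurate in the far upper tail. The sketch below assumes one regressor, and its function names are hypothetical.

```python
import math


def std_normal_sf(z):
    """Upper-tail probability 1 - Phi(z), computed as erfc(z / sqrt(2)) / 2."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lik_right_censored(c, x, beta0, beta1, sigma):
    """Likelihood contribution P(y >= c) = 1 - Phi((c - beta0 - beta1*x) / sigma)."""
    return std_normal_sf((c - beta0 - beta1 * x) / sigma)


# When the threshold equals the conditional mean of y, the probability is one half.
print(lik_right_censored(c=2.0, x=1.0, beta0=0.5, beta1=1.5, sigma=1.0))  # 0.5
```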
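To summarize:

$$L_i = \frac{1}{\sigma}\,\phi\!\left(\frac{w_i - \beta_0 - \beta_1 x_i}{\sigma}\right) \quad \text{if the observation is not censored}$$

and

$$L_i = 1 - \Phi\!\left(\frac{c_i - \beta_0 - \beta_1 x_i}{\sigma}\right) \quad \text{if the observation is right-censored.}$$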
Now, let $D_i$ be a dummy variable that takes the value 1 if the ith person is right-censored and 0 otherwise. Then the likelihood contribution for the ith person can be conveniently written as:

$$L_i = \left[\frac{1}{\sigma}\,\phi\!\left(\frac{w_i - \beta_0 - \beta_1 x_i}{\sigma}\right)\right]^{1 - D_i} \left[1 - \Phi\!\left(\frac{c_i - \beta_0 - \beta_1 x_i}{\sigma}\right)\right]^{D_i}$$
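Taking logs of L_i and summing over observations gives the sample log-likelihood. Each observation contributes (1 − D_i)·log[(1/σ)φ(·)] + D_i·log[1 − Φ(·)]. The sketch below assumes one regressor and a tiny made-up data set, and the name `censored_loglik` is hypothetical. With every residual set to zero, the value can be checked by hand.

```python
import math


def censored_loglik(params, data):
    """Log-likelihood of the right-censored normal regression model.

    params = (beta0, beta1, sigma); data = list of (w, x, c, D), where D = 1
    marks a right-censored observation (then w equals c). Each observation adds
        (1 - D) * log[(1/sigma) * phi((w - beta0 - beta1*x) / sigma)]
        + D * log[1 - Phi((c - beta0 - beta1*x) / sigma)].
    """
    beta0, beta1, sigma = params
    total = 0.0
    for w, x, c, d in data:
        if d == 1:
            z = (c - beta0 - beta1 * x) / sigma
            total += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))  # log(1 - Phi(z))
        else:
            z = (w - beta0 - beta1 * x) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return total


# Tiny hypothetical data set: two uncensored points and one censored at c = 3.
data = [(1.0, 0.0, 3.0, 0), (2.0, 1.0, 3.0, 0), (3.0, 2.0, 3.0, 1)]
print(censored_loglik((1.0, 1.0, 1.0), data))  # approximately -2.531
```

The censored-regression estimates are the values of (β0, β1, σ), with σ > 0, that maximize this function. In practice, one would pass the negated function to a numerical optimizer rather than search by hand.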
Note that, computationally, the Tobit model is the same as a censored regression model in which the actual dependent variable is left-censored, with $c_i = 0$ for all observations.
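The sketch below shows why the computation is the same. A left-censored contribution is Φ((c_i − β0 − β1x_i)/σ), which for c_i = 0 is exactly the Tobit likelihood. Flipping the signs of w, c and the β's turns that left-censored likelihood into a right-censored one term by term. The code assumes one regressor and made-up Tobit-style data, and all helper names are hypothetical.

```python
import math


def log_norm_pdf(z):
    """log phi(z) for the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def log_norm_cdf(z):
    """log Phi(z), using Phi(z) = erfc(-z / sqrt(2)) / 2."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(params, data, side):
    """Censored-normal log-likelihood with one regressor.

    data = list of (w, x, c, D). For D = 1, side="right" adds log P(y >= c)
    = log Phi(-z) and side="left" adds log P(y <= c) = log Phi(z),
    where z = (c - beta0 - beta1*x) / sigma.
    """
    beta0, beta1, sigma = params
    total = 0.0
    for w, x, c, d in data:
        if d == 1:
            z = (c - beta0 - beta1 * x) / sigma
            total += log_norm_cdf(-z) if side == "right" else log_norm_cdf(z)
        else:
            z = (w - beta0 - beta1 * x) / sigma
            total += log_norm_pdf(z) - math.log(sigma)
    return total


# Hypothetical Tobit-style data: w = max(y, 0), D = 1 when w sits at the corner 0.
tobit_data = [(0.0, 0.2, 0.0, 1), (1.5, 1.0, 0.0, 0), (0.0, -0.5, 0.0, 1), (2.7, 2.0, 0.0, 0)]
tobit_ll = censored_loglik((0.3, 0.9, 1.2), tobit_data, side="left")

# Flip the signs of w, c and the betas: the left-censored problem becomes a
# right-censored one with exactly the same likelihood.
mirrored = [(-w, x, -c, d) for w, x, c, d in tobit_data]
mirrored_ll = censored_loglik((-0.3, -0.9, 1.2), mirrored, side="right")
print(math.isclose(tobit_ll, mirrored_ll))  # True
```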
The partial effect
In the censored regression model, we are interested in the effect of the x variable on y, not on w. Since $\beta_1$ is the partial effect of x on y, you can use $\beta_1$ directly as the partial effect. No difficult computation is necessary: you can interpret the coefficients as if they were OLS coefficients.
This is very different from the Tobit model. In the Tobit model, we are interested in the effect of x on w, not on y. Thus, we had a very complicated partial-effect formula in the case of Tobit.
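The distinction between y and w matters in practice. The simulation below assumes a hypothetical data-generating process with β1 = 1 and a common threshold c = 0.5. An OLS regression of the latent y on x recovers β1, but an OLS regression of the censored w on x gives a noticeably smaller slope. That is why we estimate the censored regression model by maximum likelihood and then read β1 directly. The sketch compares only OLS on y with OLS on w. It is not the censored-regression estimator itself, and `ols_slope` is a hypothetical helper.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
beta0, beta1, sigma, c = 0.0, 1.0, 1.0, 0.5  # hypothetical true values
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
ws = [min(y, c) for y in ys]  # right-censored at the common threshold c

slope_y = ols_slope(xs, ys)  # close to beta1
slope_w = ols_slope(xs, ws)  # attenuated: it tracks how E(w|x) moves, not y
print(round(slope_y, 2), round(slope_w, 2))
```

Under these settings, the second number comes out well below 1, even though the partial effect of x on y is exactly 1.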
Exercise:
Duration analysis of recidivism
Recidivism is the act of a person repeating an undesirable behavior.
The data set RECID.dta contains the duration (in months) until an inmate who was released from prison is imprisoned again.
A total of 1,445 released inmates were followed for a certain period of time.
Of these, 893 had not been arrested again by the end of the follow-up period. Thus, the durations for those inmates are right-censored: we only know that the time until they would return to prison is at least as long as the recorded duration.
Now, we want to estimate the determinants of the duration until recidivism.
Although modern duration analysis is mostly conducted using "hazard function analysis" or "survival function analysis", censored regression is also a valid model for duration analysis.
Using RECID.dta, estimate the censored regression model of the duration of recidivism. The explanatory variables are workprg, priors, tserved, felon, alcohol, drugs, black, married, educ, and age. Use the log of duration (ldurat) as the dependent variable.
    . cnreg ldurat workprg priors tserved felon alcohol drugs black married educ age, censored(cens)

    Censored-normal regression                        Number of obs   =       1445
                                                      LR chi2(10)     =     166.74
                                                      Prob > chi2     =     0.0000
    Log likelihood = -1597.059                        Pseudo R2       =     0.0496

    ------------------------------------------------------------------------------
          ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         workprg |  -.0625715   .1200369    -0.52   0.602    -.2980382    .1728951
          priors |  -.1372529   .0214587    -6.40   0.000    -.1793466   -.0951592
         tserved |  -.0193305   .0029779    -6.49   0.000    -.0251721    -.013489
           felon |   .4439947   .1450865     3.06   0.002     .1593903    .7285991
         alcohol |  -.6349092   .1442166    -4.40   0.000    -.9178072   -.3520113
           drugs |  -.2981602   .1327355    -2.25   0.025    -.5585367   -.0377837
           black |  -.5427179   .1174428    -4.62   0.000    -.7730958     -.31234
         married |   .3406837   .1398431     2.44   0.015      .066365    .6150024
            educ |   .0229196   .0253974     0.90   0.367    -.0269004    .0727395
             age |   .0039103   .0006062     6.45   0.000     .0027211    .0050994
           _cons |   4.099386    .347535    11.80   0.000     3.417655    4.781117
    -------------+----------------------------------------------------------------
          /sigma |    1.81047   .0623022                      1.688257    1.932683
    ------------------------------------------------------------------------------
      Observation summary:         0  left-censored observations
                                 552     uncensored observations
                                 893 right-censored observations
Note on censored(cens): put the censoring indicator in this option. The indicator must be 1 if the observation is right-censored, -1 if it is left-censored, and 0 if it is uncensored.
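Because the dependent variable is log duration, each coefficient is an effect on (latent) log duration. The sketch below reads the output under the usual log-model interpretation, holding the error term fixed. A one-unit increase in a regressor multiplies the duration by exp(β), so 100·(exp(β) − 1) is the percentage change. Dividing each coefficient by its standard error reproduces the t column. The coefficients and standard errors are copied from the output above, and only four regressors are shown.

```python
import math

# Coefficients and standard errors copied from the cnreg output above.
estimates = {
    "priors": (-0.1372529, 0.0214587),
    "tserved": (-0.0193305, 0.0029779),
    "alcohol": (-0.6349092, 0.1442166),
    "married": (0.3406837, 0.1398431),
}

for name, (coef, se) in estimates.items():
    t_stat = coef / se                      # reproduces the t column
    pct = 100.0 * (math.exp(coef) - 1.0)    # % change in duration per one-unit change
    print(f"{name:8s} t = {t_stat:6.2f}  effect on duration = {pct:6.1f}%")
```

The t values match the output (-6.40, -6.49, -4.40, 2.44). For instance, each additional prior conviction is associated with roughly a 12.8% shorter time until the inmate is imprisoned again, other factors held fixed.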