
Chapter 8
Dummy Variables and Truncated Variables
What is in this Chapter?
• This chapter relaxes the assumption made
in Chapter 4 that the variables in the
regression are observed as continuous
variables.
– Differences in intercepts and/or slope
coefficients
– The linear probability model and the logit and
probit models.
– Truncated variables, Tobit models
8.2 Dummy Variables for Changes in the Intercept Term
• Note that the slopes of the regression lines for
both groups are roughly the same but the
intercepts are different.
• Hence the regression equations we fit will be
  $y = \alpha_1 + \beta x + u$ for the first group
  $y = \alpha_2 + \beta x + u$ for the second group
• These equations can be combined into a single
equation
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D + \beta x + u$
where
  $D = 0$ for all observations in the first group
  $D = 1$ for all observations in the second group
• The variable D is the dummy variable.
• The coefficient of the dummy variable measures
the difference in the two intercept terms.
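• As a minimal Python sketch (simulated data; the coefficient values and statsmodels usage are illustrative, not from the text), the combined equation can be fit by placing the 0/1 dummy next to x:

```python
# Sketch: y = alpha1 + (alpha2 - alpha1)*D + beta*x + u with an intercept dummy.
# Data are simulated; the true values 2.0, 1.5, 0.8 are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
D = np.repeat([0.0, 1.0], n // 2)                  # 0 = group 1, 1 = group 2
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * D + 0.8 * x + rng.normal(0, 1, n)  # true intercepts 2.0 and 3.5

X = sm.add_constant(np.column_stack([D, x]))
res = sm.OLS(y, X).fit()
print(res.params)  # [alpha1_hat, (alpha2 - alpha1)_hat, beta_hat]
```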
• If there are more groups, we have to introduce more
dummies.
• For three groups we have
  $y = \alpha_1 + \beta x + u$,  $y = \alpha_2 + \beta x + u$,  $y = \alpha_3 + \beta x + u$
• These can be written as
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D_1 + (\alpha_3 - \alpha_1)D_2 + \beta x + u$
where
  $D_1 = 1$ for all observations in the second group, 0 otherwise
  $D_2 = 1$ for all observations in the third group, 0 otherwise
As yet another example, suppose that we have data
on consumption C and income Y for a number of
households.
In addition, we have data on
S: the sex of the head of the household.
A: the age of the head of the household, which is given
in three categories: <25 years, 25 to 50 years, and >50
years.
E: the education of the head of the household, also in
three categories: <high school, ≥high school but <
college degree, ≥college degree.
• We include these qualitative variables in the form
of dummy variables.
• For each category the number of dummy
variables is one less than the number of
classifications.
• Then we run the regression equation
  $C = \alpha + \beta Y + \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 D_4 + \gamma_5 D_5 + u$
• The assumption made in the dummy variable
method is that it is the intercept that changes for
each group but not the slope coefficients (i.e.
coefficients of Y).
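• A sketch of this regression in Python (hypothetical category labels and simulated data; pandas' get_dummies with drop_first=True gives one dummy less than the number of classifications per category):

```python
# Sketch: C = alpha + beta*Y + gamma'D + u with 1 + 2 + 2 = 5 dummies
# built from sex (2 classes), age (3 classes), education (3 classes).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "Y": rng.uniform(20, 100, n),
    "S": rng.choice(["male", "female"], n),
    "A": rng.choice(["<25", "25-50", ">50"], n),
    "E": rng.choice(["<HS", "HS", "college"], n),
})
dummies = pd.get_dummies(df[["S", "A", "E"]], drop_first=True, dtype=float)
C = 5 + 0.7 * df["Y"] + rng.normal(0, 2, n)        # simulated consumption
X = sm.add_constant(pd.concat([df["Y"], dummies], axis=1))
print(sm.OLS(C, X).fit().params)                   # common slope, shifted intercepts
```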
• The dummy variable method is also used
if one has to take care of seasonal factors.
• For example, if we have quarterly data on
C and Y, we fit the regression equation
  $C = \alpha + \beta Y + \lambda_1 D_1 + \lambda_2 D_2 + \lambda_3 D_3 + u$
where $D_1, D_2, D_3$ are seasonal dummies for three of the four quarters.
• If we have monthly data, we use 11 seasonal
dummies.
• If we feel that, say, December (because of
Christmas shopping) is the only month with a strong
seasonal effect, we use only one dummy
variable.
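• A sketch of the single December dummy with monthly data (simulated series; the names and magnitudes are illustrative assumptions):

```python
# Sketch: one seasonal dummy for December; with quarterly data one would
# use three quarter dummies instead.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")   # 8 years, monthly
Y = rng.uniform(50, 150, len(idx))
dec = (idx.month == 12).astype(float)                      # 1 in December, else 0
C = 10 + 0.6 * Y + 8 * dec + rng.normal(0, 3, len(idx))    # December shift of 8

X = sm.add_constant(np.column_stack([Y, dec]))
print(sm.OLS(C, X).fit().params)                           # [alpha, beta, Dec shift]
```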
• Two More Illustrative Examples
• We will discuss two more examples using
dummy variables.
• They are meant to illustrate two points worth
noting, which are as follows:
– 1. In some studies with a large number of dummy
variables it becomes somewhat difficult to interpret
the signs of the coefficients because they seem to
have the wrong signs. (The first example)
– 2. Sometimes the introduction of dummy variables
produces a drastic change in the slope coefficient.
(The second example)
• The first example is a study of the determinants of
automobile prices.
• Griliches regressed the logarithm of new passenger car
prices on various specifications. The results are shown
in Table 8.1
• Since the dependent variable is the logarithm of price,
the regression coefficients can be interpreted as the
estimated percentage change in the price for a unit
change in a particular quality, holding other qualities
constant.
• For example, the coefficient of H indicates that an
increase of 10 units of horsepower results in a 1.2%
increase in price.
• As another example, consider the estimates of
liquid-asset demand by manufacturing
corporations.
• Vogel and Maddala computed regressions of the
form log C = α + β log S, where C is cash and
S is sales, on the basis of data from the
Internal Revenue Service, "Statistics of Income,"
for the year 1960-1961.
• The data consisted of 16 industry subgroups
and 14 size classes, size being measured by
total assets.
• The equations were estimated separately for
each industry; the estimates of β ranged from
0.929 to 1.077.
• The R2’s were uniformly high, ranging from
0.985 to 0.998.
• Thus one might conclude that the sales elasticity
of demand for cash is close to 1.
• Also, when the data were pooled and a single
equation estimated for the entire set of 224
observations, the estimate of β was 0.992 and
R2=0.897.
• When industry dummies were added, the
estimate of β was 0.995 and R2=0.992.
• From the high R2’s and relatively constant
estimate of β one might be reassured that the
sales elasticity is very close to 1.
• However, when asset-size dummies were
introduced, the estimate of β fell to 0.334 with R2
of 0.996.
• Also, all asset-size dummies were highly
significant.
• The situation is described in Figure 8.2.
• That the sales elasticity is significantly less than
1 is also confirmed by other evidence.
• This example illustrates how one can be very
easily misled by high R2’s and apparent
constancy of the coefficients.
8.3 Dummy Variables for Changes in Slope Coefficients
• Suppose the two groups have the equations
  $y = \alpha_1 + \beta_1 x + u$  and  $y = \alpha_2 + \beta_2 x + u$
• We can write these equations together as
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D_1 + \beta_1 x + (\beta_2 - \beta_1)D_2 + u$
where
  $D_1 = 0$ for all observations in the first group
  $D_1 = 1$ for all observations in the second group
  $D_2 = 0$ for all observations in the first group
  $D_2 = x$ for all observations in the second group (i.e., the respective
  value of x for the second group)
• The coefficient of D1 measures the difference in the intercept
terms and the coefficient of D2 measures the difference in the
slope.
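• A sketch in Python (simulated data; coefficient values are made up): the slope dummy is just D1 multiplied by x, so the differences in intercept and slope are read off directly:

```python
# Sketch: y = alpha1 + (alpha2 - alpha1)*D1 + beta1*x + (beta2 - beta1)*D2 + u
# with D2 = D1 * x (x for the second group, 0 for the first).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 120
D1 = np.repeat([0.0, 1.0], n // 2)
x = rng.uniform(0, 10, n)
D2 = D1 * x
y = 1.0 + 2.0 * D1 + 0.5 * x + 0.4 * D2 + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([D1, x, D2]))
print(sm.OLS(y, X).fit().params)   # intercept and slope differences appear directly
```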
• Suitable dummy variables can be defined when
there are changes in slopes and intercepts at
different times.
• Suppose that we have data for three periods
and in the second period only the intercept
changed (there was a parallel shift).
• In the third period the intercept and the slope
changed.
• Then we write
  $y = \alpha_1 + \beta_1 x + u$  (period 1)
  $y = \alpha_2 + \beta_1 x + u$  (period 2)          (8.5)
  $y = \alpha_3 + \beta_2 x + u$  (period 3)
• Then we can combine these equations and write
the model as
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D_1 + (\alpha_3 - \alpha_1)D_2 + \beta_1 x + (\beta_2 - \beta_1)D_3 + u$   (8.6)
where $D_1 = 1$ for period 2 and 0 otherwise, $D_2 = 1$ for period 3 and 0
otherwise, and $D_3 = x$ for period 3 and 0 otherwise.
• An alternative way of writing the equations (8.5),
which is very general, is to stack the y variables
and the error terms in columns.
• Then write all the parameters $\alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2$
down with their multiplicative factors stacked in
columns as follows:
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \alpha_1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \alpha_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \alpha_3\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \beta_1\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} + \beta_2\begin{pmatrix} 0 \\ 0 \\ x_3 \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \qquad (8.7)$$
where $y_1, y_2, y_3$ (and similarly the x's and u's) denote the stacked
observations from the three periods.
• What this says is, for instance, that for observations in the first
period $y_1 = \alpha_1(1) + \alpha_2(0) + \alpha_3(0) + \beta_1(x_1) + \beta_2(0) + u_1$,
where ( ) is used for multiplication, e.g.,
$\alpha_3(0) = \alpha_3 \times 0$.
• This gives the single equation
  $y = \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + \beta_1 D_4 + \beta_2 D_5 + u \qquad (8.8)$
where the definitions of $D_1, D_2, D_3, D_4, D_5$ are
clear from equation (8.7).
• For instance, $D_1 = 1$ for observations in the first period and 0
otherwise, and $D_4 = x$ for observations in the first two periods
and 0 for those in the third.
• Note that equation (8.8) has to be
estimated without a constant term.
• In this method we define as many dummy
variables as there are parameters to
estimate and we estimate the regression
equation with no constant term.
• Note that equations (8.6) and (8.8) are
equivalent.
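• A sketch of the equation-(8.8) approach in Python (simulated three-period data; parameter values are illustrative): define one dummy per parameter and regress with no constant term:

```python
# Sketch: y = a1*D1 + a2*D2 + a3*D3 + b1*D4 + b2*D5 + u, no constant term.
# Period 2 shifts the intercept only; period 3 shifts intercept and slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 90
period = np.repeat([1, 2, 3], n // 3)
x = rng.uniform(0, 10, n)
alpha = np.where(period == 1, 2.0, np.where(period == 2, 3.0, 4.0))
beta = np.where(period == 3, 1.2, 0.7)
y = alpha + beta * x + rng.normal(0, 1, n)

D1, D2, D3 = [(period == p).astype(float) for p in (1, 2, 3)]
D4 = np.where(period != 3, x, 0.0)         # x in periods 1 and 2, else 0
D5 = np.where(period == 3, x, 0.0)         # x in period 3, else 0
X = np.column_stack([D1, D2, D3, D4, D5])  # note: no constant column
print(sm.OLS(y, X).fit().params)           # [a1, a2, a3, b1, b2]
```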
8.7 Dummy Dependent Variables
• Until now we have been considering models where the
explanatory variables are dummy variables.
• We now discuss models where the explained variable is
a dummy variable.
• This dummy variable can take on two or more values, but
we consider here the case where it takes on only two
values, 0 or 1.
• We consider the linear probability model and the logit
and probit models.
8.8 The Linear Probability Model and the Linear Discriminant Function
The Linear Probability Model
• For example, in an analysis of bankruptcy of firms, we
define
  $y_i = 1$ if the firm is bankrupt, $y_i = 0$ otherwise.
• We write the model in the usual regression
framework as
  $y_i = \beta x_i + u_i \qquad (8.11)$
with $E(u_i) = 0$.
• The conditional expectation $E(y_i \mid x_i)$ is equal to $\beta x_i$.
• This has to be interpreted in this case as the
probability that the event will occur given $x_i$.
• The calculated value of y from the regression
equation (i.e., $\hat{y}_i = \hat{\beta} x_i$) will then give the
estimated probability that the event will occur given $x_i$.
• Since $y_i$ takes the value 1 or 0, the errors in
equation (8.11) can take only two values, $(1 - \beta x_i)$ and
$(-\beta x_i)$.
• Also, with the interpretation we have given to equation
(8.11), and the requirement that $E(u_i) = 0$, the
respective probabilities of these events are $\beta x_i$ and
$(1 - \beta x_i)$.
• Thus we have
  $u_i = 1 - \beta x_i$ with probability $\beta x_i$
  $u_i = -\beta x_i$ with probability $1 - \beta x_i$
• Hence
  $\operatorname{var}(u_i) = (1 - \beta x_i)^2 \beta x_i + (\beta x_i)^2 (1 - \beta x_i) = \beta x_i (1 - \beta x_i) = E(y_i)[1 - E(y_i)]$
• The error variance thus depends on $x_i$: the errors are heteroskedastic.
• Because of this heteroskedasticity problem the
OLS estimates of β from equation (8.11) will not
be efficient.
• We use the following two-step procedure:
• First estimate (8.11) by least squares.
• Next compute $\hat{y}_i(1 - \hat{y}_i)$ and use weighted least
squares, that is, defining
  $w_i = \hat{y}_i (1 - \hat{y}_i)$
we regress $y_i/\sqrt{w_i}$ on $x_i/\sqrt{w_i}$.
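• A sketch of this two-step procedure in Python (simulated 0/1 data; the guard against nonpositive weights is an added practical detail, anticipating the problems noted next):

```python
# Sketch: step 1 is OLS on the linear probability model; step 2 is WLS with
# w_i = yhat_i * (1 - yhat_i). Observations with w_i <= 0 are dropped here.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 1, n)
p = 0.2 + 0.6 * x                                # true P(y = 1 | x), inside (0, 1)
y = (rng.uniform(size=n) < p).astype(float)

X = sm.add_constant(x)
yhat = sm.OLS(y, X).fit().predict(X)             # step 1
w = yhat * (1 - yhat)
keep = w > 0                                     # yhat outside (0, 1) gives w <= 0
res = sm.WLS(y[keep], X[keep], weights=1 / w[keep]).fit()  # step 2
print(res.params)
```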
The problems with this procedure are:
1. $\hat{y}_i(1 - \hat{y}_i)$ in practice may be negative, although in
large samples this will be so with a very small
probability since $\hat{y}_i(1 - \hat{y}_i)$ is a consistent estimator
for $E(y_i)[1 - E(y_i)]$.
2. The most important criticism is with the
formulation itself: that the conditional
expectation $E(y_i \mid x_i)$ be interpreted as the
probability that the event will occur. In many
cases $E(y_i \mid x_i)$ can lie outside the limits (0, 1).
The Linear Discriminant Function
• Suppose that we have n individuals for whom we
have observations on k explanatory variables, and
we observe that $n_1$ of them belong to a first group $\pi_1$
and $n_2$ of them belong to a second group $\pi_2$,
where $n_1 + n_2 = n$.
• We want to construct a linear function of the k
variables that we can use to predict that a new
observation belongs to one of the two groups.
• This linear function is called the linear
discriminant function.
• As an example suppose that we have data on a
number of loan applicants and we observe that
n1 of them were granted loans and n2 of them
were denied loans.
• We also have data on the socioeconomic characteristics
of the applicants.
• Let us define a linear function
  $Z = \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k$
• Then it is intuitively clear that to get the best
discrimination between the two groups, we
would want to choose the $\lambda_i$ so that the ratio of the
between-group variation of Z to the within-group
variation of Z is maximized.
• Fisher suggested an analogy between this
problem and multiple regression analysis.
• He suggested that we define a dummy variable
  $y_i = n_2/n$ if the observation belongs to the first group $\pi_1$
  $y_i = -n_1/n$ if the observation belongs to the second group $\pi_2$
• Now estimate the multiple regression equation of $y_i$ on
$x_1, x_2, \ldots, x_k$ (with a constant term).
• Get the residual sum of squares RSS.
• Then the discriminant-function coefficients $\hat{\lambda}_j$ are
proportional to the estimated regression coefficients $\hat{\beta}_j$,
with a proportionality factor that depends on RSS.
• Thus, once we have the regression coefficients and
residual sum of squares from the dummy dependent
variable regression, we can very easily obtain the
discriminant function coefficients.
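• A sketch of Fisher's device in Python (simulated two-group data; the $n_2/n$, $-n_1/n$ coding follows the classical definition assumed above): the OLS slopes are compared against the discriminant coefficients computed directly from the pooled covariance matrix:

```python
# Sketch: regress the two-valued dummy on the x's; the slope vector is
# proportional to S^{-1}(xbar1 - xbar2), the discriminant coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n1, n2 = 60, 40
X = np.vstack([rng.normal([0, 0], 1.0, (n1, 2)),     # group 1
               rng.normal([2, 1], 1.0, (n2, 2))])    # group 2
n = n1 + n2
y = np.r_[np.full(n1, n2 / n), np.full(n2, -n1 / n)]  # Fisher's dummy coding

beta = sm.OLS(y, sm.add_constant(X)).fit().params[1:]

# Direct calculation from the pooled within-group covariance matrix:
S = (np.cov(X[:n1].T) * (n1 - 1) + np.cov(X[n1:].T) * (n2 - 1)) / (n - 2)
lam = np.linalg.solve(S, X[:n1].mean(axis=0) - X[n1:].mean(axis=0))
print(beta / beta[0], lam / lam[0])  # normalized vectors agree (proportionality)
```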
Discriminant Analysis
• Discriminant analysis attempts to classify
customers into two groups:
– those that will default
– those that will not
• It does this by assigning a score to each
customer
• The score is the weighted sum of the customer
data:
  $Z_c = \sum_i w_i X_{i,c}$
• Here, $w_i$ is the weight on data type i, and $X_{i,c}$ is
one piece of customer data.
• The values for the weights are chosen to
maximize the difference between the average
score of the customers that later defaulted and
the average score of the customers who did not
default
• The actual optimization process to find
the weights is quite complex
• The most famous discriminant scorecard
is Altman's Z Score.
• For publicly owned manufacturing firms,
the Z Score was found to be as follows:
  $Z = 1.2X_1 + 1.4X_2 + 3.3X_3 + 0.6X_4 + 1.0X_5$
where $X_1$ = working capital/total assets, $X_2$ = retained earnings/total
assets, $X_3$ = earnings before interest and taxes/total assets, $X_4$ = market
value of equity/book value of total liabilities, and $X_5$ = sales/total assets.
• A company scoring less than 1.81 was
"very likely" to go bankrupt later
• A company scoring more than 2.99 was
"unlikely" to go bankrupt.
• The scores in between were considered
inconclusive
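• A sketch of the score computation in Python (the coefficients follow the familiar published form of the Z Score given above; the input ratios are made-up numbers):

```python
# Sketch: Altman's Z for a public manufacturing firm, with the usual bands.
def altman_z(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta):
    """Z = 1.2*X1 + 1.4*X2 + 3.3*X3 + 0.6*X4 + 1.0*X5."""
    return 1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta + 0.6 * mve_tl + 1.0 * sales_ta

z = altman_z(0.15, 0.20, 0.10, 0.90, 1.50)   # illustrative ratios
band = "very likely" if z < 1.81 else ("unlikely" if z > 2.99 else "inconclusive")
print(f"Z = {z:.2f}: bankruptcy {band}")
```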
8.9 The Probit and Logit Models
• An alternative approach is to assume that we
have a regression model
  $y_i^* = \beta x_i + u_i \qquad (8.12)$
where $y_i^*$ is not observed.
• It is commonly called a "latent" variable.
• What we observe is a dummy variable $y_i$ defined
by
  $y_i = 1$ if $y_i^* > 0$, and $y_i = 0$ otherwise.
• For instance, if the observed dummy variable is
whether or not the person is employed, $y_i^*$ would be
defined as "propensity or ability to find employment."
• Similarly, if the observed dummy variable is whether
or not the person has bought a car, then $y_i^*$ would
be defined as "desire or ability to buy a car."
• Note that in both the examples we have given, there
is "desire" and "ability" involved.
• Thus the explanatory variables in (8.12) would
contain variables that explain both these elements.
8.9 The Probit and Logit Models
• The probit and logit models differ in the
specification of the distribution of the error term
u in equation (8.12): the probit model assumes u is
normally distributed, while the logit model assumes
it follows the logistic distribution.
• There are now several computer programs
available for probit and logit analysis, and these
programs are very inexpensive to run.
• The difference between the specification (8.12)
and the linear probability model is
– the existence of an underlying latent variable for
which we observe a dichotomous realization.
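• A sketch of all three models on the same simulated data (variable names and coefficients are made up; statsmodels' Logit and Probit provide the maximum likelihood fits):

```python
# Sketch: generate y from a latent-variable model (8.12) with logistic errors,
# then fit the linear probability, logit, and probit models.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 750
income = rng.uniform(10, 100, n)
X = sm.add_constant(income)
ystar = -2 + 0.05 * income + rng.logistic(size=n)   # latent y*
y = (ystar > 0).astype(float)                       # observed dummy

print(sm.OLS(y, X).fit().params)            # linear probability model
print(sm.Logit(y, X).fit(disp=0).params)    # logit
print(sm.Probit(y, X).fit(disp=0).params)   # probit
```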
Illustrative Example
• As an illustration, we consider data on a sample
of 750 mortgage applications in the Columbia,
SC, metropolitan area.
• There were 500 loan applications accepted and
250 loan applications rejected.
• We define
  $y = 1$ if the loan application was accepted, $y = 0$ if it was rejected.
• Three models were estimated: the linear probability
model, the logit model, and the probit model.
• The explanatory variables were:
AI = applicant's and coapplicant's income (10³ dollars)
XMD = debt minus mortgage payment (10³ dollars)
DF = dummy variable, 1 for female, 0 for male
DR = dummy variable, 1 for nonwhite, 0 for white
DS = dummy variable, 1 for single, 0 otherwise
DA = age of house (10² years)
NNWP = percent nonwhite in the neighborhood (×10³)
NMFI = neighborhood mean family income (10⁵ dollars)
NA = neighborhood average age of houses (10² years)
The results are presented in Table 8.3.
Measures of Goodness of Fit
• There is a problem with the use of
conventional R2-type measures when the
explained variable y takes on only two
values.
• The predicted values $\hat{y}$ are probabilities,
and the actual values y are either 0 or 1.
• We can also think of $R^2$ in terms of the
proportion of correct predictions.
• Since the dependent variable is a zero-one
variable, after we compute the $\hat{y}_i$ we classify
the i-th observation as belonging to group 1 if $\hat{y}_i < 0.5$
and group 2 if $\hat{y}_i > 0.5$.
• We can then count the number of correct
predictions.
• We can define a predicted value $\hat{y}_i^*$, which is
also a zero-one variable, such that
  $\hat{y}_i^* = 1$ if $\hat{y}_i > 0.5$ and $\hat{y}_i^* = 0$ if $\hat{y}_i \le 0.5$
• (Provided that we calculate $\hat{y}_i$ to enough
decimals, ties will be very unlikely.)
• Now define
  count $R^2$ = (number of correct predictions) / (total number of observations)
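• A sketch of the count $R^2$ in Python (toy numbers, chosen only to show the 0.5 cutoff at work):

```python
# Sketch: proportion of correct predictions after the 0.5 classification rule.
import numpy as np

def count_r2(y, yhat, cutoff=0.5):
    ystar = (yhat > cutoff).astype(int)   # predicted zero-one variable
    return np.mean(ystar == y)            # share of correct predictions

y = np.array([1, 0, 1, 1, 0])
yhat = np.array([0.8, 0.3, 0.6, 0.4, 0.2])
print(count_r2(y, yhat))                  # 0.8: four of five correct
```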
Type I error vs. type II error
• Limitations of the above count $R^2$
• Default prediction:
– The cost of a type I error: classifying a
subsequently failing firm as non-failed
– The cost of a type II error: classifying a subsequently
non-failed firm as failed
• In particular, in the first case, the lender can lose
up to 100% of the loan amount while, in the
latter case, the loss is just the opportunity cost of
not lending to that firm
• Accordingly, in assessing the practical utility of
failure prediction models, banks pay more
attention to the misclassification costs involved
in type I rather than type II errors.
8.11 Truncated Variables: The Tobit Model
• In our discussion of the logit and probit models
we talked about a latent variable $y_i^*$ which was
not observed, for which we could specify the
regression model
  $y_i^* = \beta x_i + u_i \qquad (8.18)$
where $u_i \sim N(0, \sigma^2)$.
• In the logit and probit models, what we observe
is a dummy variable
  $y_i = 1$ if $y_i^* > 0$, and $y_i = 0$ otherwise.
• Suppose, however, that $y_i^*$ is observed if $y_i^* > 0$
and is not observed if $y_i^* \le 0$.
• Then the observed $y_i$ will be defined as
  $y_i = y_i^* = \beta x_i + u_i$ if $y_i^* > 0$
  $y_i = 0$ otherwise.
• This is known as the tobit model (Tobin's probit)
and was first analyzed in the econometrics
literature by Tobin.
• It is also known as a censored normal
regression model because some observations
on y* (those for which y* ≤ 0) are censored (we
are not allowed to see them).
• Our objective is to estimate the parameters β
and σ.
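• A sketch of estimating β and σ by maximizing the censored-regression likelihood numerically (simulated data with an added intercept for illustration; the two-part likelihood uses the normal density for uncensored observations and $\Phi(-\mu_i/\sigma)$ for censored ones):

```python
# Sketch: tobit log-likelihood, maximized with scipy. Censoring at zero.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(8)
n = 400
x = rng.uniform(0, 5, n)
ystar = -1 + 0.8 * x + rng.normal(0, 1, n)   # latent variable
y = np.maximum(ystar, 0)                     # observed: censored at 0

def neg_loglik(theta):
    a, b, log_s = theta
    s = np.exp(log_s)                        # keep sigma positive
    mu = a + b * x
    obs = y > 0
    ll = norm.logpdf(y[obs], mu[obs], s).sum()   # uncensored part
    ll += norm.logcdf(-mu[~obs] / s).sum()       # P(y* <= 0) for censored part
    return -ll

res = minimize(neg_loglik, x0=[0.0, 0.0, 0.0])
print(res.x[0], res.x[1], np.exp(res.x[2]))  # intercept, beta, sigma estimates
```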