Transcript

DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES
[Figure: COST plotted against N for regular schools (intercept b1) and occupational schools (intercept b1 + d); the two cost lines are separated vertically by d.]
Combined equation
COST = b1 + d OCC + b2N + u
OCC = 0 Regular school
COST = b1 + b2N + u
OCC = 1 Occupational school
COST = b1 + d + b2N + u
Example: the cost of running a school depends on the number of pupils, but it also
depends on whether the school is an occupational school.
A dummy variable takes only two values, 0 and 1. If OCC is equal to 0, the cost
function becomes that for regular schools; if OCC is equal to 1, it becomes that
for occupational schools.
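A minimal sketch of the combined equation, using made-up coefficient values (b1 = 10000, d = 40000, b2 = 300 are hypothetical, not estimates from the data). It shows that switching OCC from 0 to 1 shifts the cost function up by d at every N, while the slope b2 stays the same.

```python
def cost(n, occ, b1=10_000, d=40_000, b2=300):
    """Expected COST = b1 + d*OCC + b2*N (disturbance u omitted).

    The default coefficient values are hypothetical, chosen only for illustration.
    """
    if occ not in (0, 1):
        raise ValueError("OCC is a dummy variable and must be 0 or 1")
    return b1 + d * occ + b2 * n


for n in (200, 600, 1000):
    regular = cost(n, occ=0)
    occupational = cost(n, occ=1)
    # The gap is d at every N, because OCC only shifts the intercept.
    print(n, regular, occupational, occupational - regular)
```

Because OCC enters only as an additive term, the marginal cost of an extra pupil is b2 for both types of school.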
DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES
COST = b1 + dT TECH + dW WORKER + dV VOC + b2N + u
General School
COST = b1 + b2N + u
(TECH = WORKER = VOC = 0)
Technical School
COST = (b1 + dT) + b2N + u
(TECH = 1; WORKER = VOC = 0)
Skilled Workers’ School
COST = (b1 + dW) + b2N + u
(WORKER = 1; TECH = VOC = 0)
Vocational School
COST = (b1 + dV) + b2N + u
(VOC = 1; TECH = WORKER = 0)
Now the qualitative variable has four categories. The standard procedure is to
choose one category as the reference category and to define dummy variables for
each of the others.
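Note: you must not define a dummy variable for the reference category as well. If
you did, the four dummies would sum to 1 for every school, exactly replicating the
intercept, and the model would be perfectly collinear (the dummy variable trap)!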
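The collinearity problem can be checked numerically. The sketch below builds the regressor matrix for a small, entirely hypothetical sample of schools (`SCHOOLS`, `design_row` and `matrix_rank` are illustrative names, not from the slides) and computes its rank by Gaussian elimination. With GENERAL omitted the matrix has full column rank; with a dummy for every category one column is redundant.

```python
# Categories from the slide; GENERAL is the reference category.
CATEGORIES = ["GENERAL", "TECH", "WORKER", "VOC"]

# Hypothetical sample of (category, number of pupils N), for illustration only.
SCHOOLS = [("GENERAL", 300), ("GENERAL", 900), ("TECH", 500),
           ("TECH", 250), ("WORKER", 400), ("VOC", 700)]


def design_row(category, n, reference="GENERAL"):
    """Regressors for one school: intercept, one dummy per non-reference category, N.

    Pass reference=None to (wrongly) include a dummy for every category.
    """
    dummies = [1.0 if category == c else 0.0 for c in CATEGORIES if c != reference]
    return [1.0] + dummies + [float(n)]


def matrix_rank(rows, tol=1e-9):
    """Column rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


X_ok = [design_row(c, n) for c, n in SCHOOLS]
X_trap = [design_row(c, n, reference=None) for c, n in SCHOOLS]
print("reference omitted:", len(X_ok[0]), "columns, rank", matrix_rank(X_ok))
print("all four dummies: ", len(X_trap[0]), "columns, rank", matrix_rank(X_trap))
```

In the second matrix the GENERAL, TECH, WORKER and VOC columns add up to the intercept column, so the rank (5) is one less than the number of columns (6) and OLS has no unique solution.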
[Figure: COST plotted against N. Four cost lines: General (intercept b1), Technical (b1 + dT), Skilled workers' (b1 + dW) and Vocational (b1 + dV); dT, dW and dV are the vertical distances from the General line.]
The diagram illustrates the model graphically. The d coefficients are the extra
overhead costs of running technical, skilled workers’, and vocational schools,
relative to the overhead cost of general schools.
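To see that the d coefficients really are intercept differences relative to general schools, the sketch below fits the four-category model by OLS to noise-free synthetic data. The "true" coefficients and the school sample are hypothetical; since there is no disturbance term, OLS should recover them exactly. Exact `Fraction` arithmetic is used so that no rounding error creeps into the normal equations.

```python
from fractions import Fraction

CATEGORIES = ["GENERAL", "TECH", "WORKER", "VOC"]
REFERENCE = "GENERAL"

# Hypothetical "true" coefficients, chosen only for illustration.
TRUE = {"b1": 20_000, "TECH": 30_000, "WORKER": 45_000, "VOC": 60_000, "b2": 400}


def design_row(category, n):
    """Intercept, dummies TECH, WORKER, VOC (GENERAL omitted), then N."""
    dummies = [1 if category == c else 0 for c in CATEGORIES if c != REFERENCE]
    return [1] + dummies + [n]


def solve(a, b):
    """Exact Gauss-Jordan elimination on Fractions (assumes a is nonsingular)."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col and m[i][col] != 0:
                factor = m[i][col] / m[col][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


schools = [("GENERAL", 300), ("GENERAL", 900), ("TECH", 500),
           ("TECH", 250), ("WORKER", 400), ("VOC", 700)]
X = [design_row(c, n) for c, n in schools]
# Noise-free costs: no disturbance term u, so OLS should recover TRUE exactly.
y = [TRUE["b1"] + TRUE.get(c, 0) + TRUE["b2"] * n for c, n in schools]

b1, dT, dW, dV, b2 = ols(X, y)
print("b1 =", b1, " dT =", dT, " dW =", dW, " dV =", dV, " b2 =", b2)
```

With real data the estimates would of course contain sampling error; the point here is only the interpretation of each coefficient.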
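We chose general academic schools as the reference (omitted) category and
defined dummy variables for the other categories. Each d coefficient therefore
measures a difference relative to general schools; the regression does not directly
compare the other categories with each other, for example technical with
vocational schools.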
However, suppose that we were interested in testing whether the overhead costs
of skilled workers’ schools were different from those of the other types of school.
How could we do this? It is simplest to re-run the regression making skilled
workers’ schools the reference category.
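Re-running with a different reference category does not change the fitted model, only its parameterization. The sketch below (with hypothetical coefficient values and illustrative function names) shows how the point estimates map from the GENERAL reference to the WORKER reference: every category keeps the same intercept.

```python
def category_intercepts(b1, deltas, reference):
    """Intercept of each category's cost function: b1 for the reference, b1 + d otherwise."""
    levels = {reference: b1}
    levels.update({c: b1 + d for c, d in deltas.items()})
    return levels


def rebase(b1, deltas, reference, new_reference):
    """Coefficients implied when new_reference becomes the omitted category."""
    levels = category_intercepts(b1, deltas, reference)
    new_b1 = levels[new_reference]
    new_deltas = {c: v - new_b1 for c, v in levels.items() if c != new_reference}
    return new_b1, new_deltas


# Hypothetical estimates with GENERAL as the reference category.
b1, deltas = 20_000, {"TECH": 30_000, "WORKER": 45_000, "VOC": 60_000}
new_b1, new_deltas = rebase(b1, deltas, "GENERAL", "WORKER")
print(new_b1, new_deltas)  # 65000 {'GENERAL': -45000, 'TECH': -15000, 'VOC': 15000}
```

The new point estimates are simple differences such as dT - dW, but their standard errors depend on covariances between the original estimates, which ordinary regression output does not show. That is why re-running the regression with skilled workers' schools as the reference is the simplest way to obtain the t tests.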
TWO SETS OF DUMMY VARIABLES
COST = b1 + d OCC + e RES + b2N + u
Regular, nonresidential
COST = b1 + b2N + u
(OCC = RES = 0)
Regular, residential
COST = (b1 + e) + b2N + u
(OCC = 0; RES = 1)
Occupational, nonresidential
COST = (b1 + d) + b2N + u
(OCC = 1; RES = 0)
Occupational, residential
COST = (b1 + d + e) + b2N + u
(OCC = RES = 1)
The explanatory variables in a regression model may include multiple sets of
dummy variables. Now you need to think about every combination, and the
reference category is the one in which all dummy variables are zero.
In the case of a non-residential occupational school, RES is 0 and OCC is 1, so
the overhead cost is higher by d than in the reference category (regular,
non-residential). If the school is both occupational and residential, it is higher
by (d + e).
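A small sketch that enumerates the four combinations of the two dummy sets and prints each intercept and its shift relative to the reference category (OCC = RES = 0). The values of b1, d and e are hypothetical.

```python
from itertools import product

# Hypothetical coefficient values, for illustration only (not estimates).
B1, D, E = 20_000, 40_000, 15_000


def intercept(occ, res, b1=B1, d=D, e=E):
    """Intercept of COST = b1 + d*OCC + e*RES + b2*N + u for one (OCC, RES) combination."""
    return b1 + d * occ + e * res


for occ, res in product((0, 1), repeat=2):
    shift = intercept(occ, res) - intercept(0, 0)
    print(f"OCC={occ}, RES={res}: intercept {intercept(occ, res)}, shift vs reference {shift}")
```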
[Figure: COST plotted against N. Four cost lines: Regular, nonresidential (intercept b1); Regular, residential (b1 + e); Occupational, nonresidential (b1 + d); Occupational, residential (b1 + d + e). The residential shift e is the same for both school types.]
The diagram illustrates the model graphically. Note that the effects of the different
components of the model are assumed to be separate and additive in this
specification. In particular, we are assuming that the extra overhead cost of a
residential school is the same for regular and occupational schools: there is no
interaction effect.
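The additivity restriction can be made concrete by computing the residential premium (the extra overhead of a residential school, holding OCC fixed). In the sketch below all values are made up, and the interaction coefficient `h` is a hypothetical extension that is not part of the model on this slide; with h = 0, the specification above, the premium is e for both school types.

```python
def intercept(occ, res, b1=20_000, d=40_000, e=15_000, h=0):
    """Intercept with an optional interaction term h*OCC*RES.

    h = 0 gives the additive specification on the slide; a nonzero h is a
    hypothetical extension, not part of the model above. All values are made up.
    """
    return b1 + d * occ + e * res + h * occ * res


def residential_premium(occ, **coefs):
    """Extra overhead of a residential school, holding OCC fixed."""
    return intercept(occ, 1, **coefs) - intercept(occ, 0, **coefs)


print(residential_premium(0), residential_premium(1))  # 15000 15000
print(residential_premium(0, h=10_000), residential_premium(1, h=10_000))  # 15000 25000
```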
SLOPE DUMMY VARIABLES
[Figure: scatter diagram of COST (vertical axis, -100,000 to 700,000) against N (horizontal axis, 0 to 1,400), with separate markers for occupational schools and regular schools.]
The specification of the model incorporates the assumption that the marginal cost
per student is the same for occupational and regular schools. Hence the cost
functions have the same slope: the same coefficient on N. This is a restriction we
have placed on the model.
This is not a realistic assumption. Occupational schools incur expenditure on
training materials that is related to the number of students. Also, the staff-student
ratio has to be higher in occupational schools.
Looking at the scatter diagram, you can see that the cost function for the
occupational schools should be steeper, and that for the regular schools should be
flatter. The two lines should have different slopes.
COST = b1 + d OCC + b2N + l NOCC + u
Regular school
COST = b1 + b2N + u
(OCC = NOCC = 0)
Occupational school
COST = (b1 + d ) + (b2 + l)N + u
(OCC = 1; NOCC = N)
We will relax the assumption of the same marginal cost by introducing what is
known as a slope dummy variable. This is NOCC, defined as the product of N and
OCC.
For example, in the case of an occupational school, OCC is equal to 1 and NOCC
is equal to N. The equation simplifies as shown.
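A minimal sketch of the slope dummy, with hypothetical coefficient values. NOCC is built as the product N * OCC, so it equals N for occupational schools and 0 for regular schools; the marginal cost of one extra pupil is then b2 for regular schools and b2 + l for occupational schools.

```python
def cost(n, occ, b1=10_000, d=30_000, b2=250, l=200):
    """Expected COST = b1 + d*OCC + b2*N + l*NOCC, where NOCC = N * OCC.

    Coefficient values are hypothetical, chosen only for illustration.
    """
    nocc = n * occ  # slope dummy: equals N for occupational schools, 0 otherwise
    return b1 + d * occ + b2 * n + l * nocc


def marginal_cost(occ, n=500):
    """Cost of one extra pupil: b2 for regular schools, b2 + l for occupational ones."""
    return cost(n + 1, occ) - cost(n, occ)


print(marginal_cost(0), marginal_cost(1))  # 250 450
```

The intercept dummy OCC is kept alongside NOCC, so the two cost functions can differ in both intercept (by d) and slope (by l).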
[Figure: COST plotted against N. Regular schools: intercept b1; occupational schools: intercept b1 + d, with a cost line steeper by l.]
The diagram illustrates the model graphically.
INTERACTING DUMMY VARIABLES
LGEARN = b1 + b2S + d F + W + lFW + u
Black male
LGEARN = b1 + b2S + u
(F = W = 0)
White male
LGEARN = b1 + b2S + W + u
(F = 0; W = 1)
Black female
LGEARN = b1 + b2S + d F + u
(F = 1; W = 0)
White female
LGEARN = b1 + b2S + d F + W + lFW + u
(F = W = 1)
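If we interact dummy variables, we get a new dummy variable: FW equals 1 only for
white females and 0 for everyone else. Its coefficient l must be interpreted
carefully: it is the extra difference in LGEARN for white females over and above the
separate female and white effects. The reference category is obtained by setting all
dummies equal to zero. Then write down the earnings function for each subgroup
separately to make the effects of the various coefficients clear.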
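The subgroup-by-subgroup approach can be sketched in code. Here e is our assumed label for the coefficient on W, and all coefficient values are hypothetical, not estimates. The loop prints each group's intercept shift relative to the reference category (black males, F = W = 0); for white females the shift is d + e + l.

```python
from itertools import product

# Hypothetical coefficients, for illustration only (not estimates).
# "e" is an assumed label for the coefficient on W.
COEF = {"b1": 1.0, "b2": 0.1, "d": -0.2, "e": 0.15, "l": -0.05}


def lgearn(s, f, w, c=COEF):
    """LGEARN = b1 + b2*S + d*F + e*W + l*FW (disturbance u omitted)."""
    fw = f * w  # interaction dummy: 1 only for white females
    return c["b1"] + c["b2"] * s + c["d"] * f + c["e"] * w + c["l"] * fw


labels = {(0, 0): "Black male", (0, 1): "White male",
          (1, 0): "Black female", (1, 1): "White female"}
for f, w in product((0, 1), repeat=2):
    shift = lgearn(0, f, w) - lgearn(0, 0, 0)
    print(f"{labels[(f, w)]:>12}: intercept shift vs Black male = {shift:+.2f}")
```

Because the interaction enters additively as well, the slope on S (b2) is still the same for all four groups in this specification.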
Copyright Christopher Dougherty 2000–2006. This slideshow may be freely copied for
personal use.
24.06.06