NBER Digest, and Car Discounts


General Qualitative Data, and “Dummy Variables”
• How might we have represented “make-of-car” in the motorpool case,
had there been more than just two makes?
– Assume that Make takes four categorical values (Ford, Honda, BMW, and
Sterling).
• Choose one value as the “foundation” case.
• Create three 0/1 (“yes”/“no”, so-called “dummy”) variables for the other three cases.
• These three variables jointly represent the four-valued qualitative Make variable.
• Here are the details.
• We’ll use this representational trick in order to include “day of game”
(either Friday, Saturday, or Sunday) in a model which predicts attendance
at a professional indoor soccer team’s home games. Here is the example.
– Using this trick requires that we extend the “significance level” (with respect
to whether a variable “belongs” in the model) to groups of variables. This is
done via “analysis of variance” (ANOVA).
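The encoding described above can be sketched in a few lines of Python. This is only an illustration: the choice of Ford as the foundation case, the function name `encode_dummies`, the `D_` column prefix, and the sample rows are all assumptions, not part of the original slides.

```python
def encode_dummies(values, categories, foundation):
    """Return one dict of 0/1 dummy variables per observation.

    The foundation case gets no column of its own: it is the
    observation for which every dummy equals 0.
    """
    others = [c for c in categories if c != foundation]
    rows = []
    for v in values:
        if v not in categories:
            raise ValueError(f"unknown category: {v!r}")
        rows.append({f"D_{c}": int(v == c) for c in others})
    return rows


# Hypothetical sample; Ford is (arbitrarily) taken as the foundation case.
makes = ["Ford", "Honda", "BMW", "Sterling", "Honda"]
categories = ["Ford", "Honda", "BMW", "Sterling"]
dummies = encode_dummies(makes, categories, foundation="Ford")

for make, row in zip(makes, dummies):
    print(make, row)
```

Four categories need only three columns: a Ford shows up as the row in which all three dummies are 0, so a fourth column would carry no new information.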
Discounts on Car Purchases: Does Salesperson Identity Matter?
• Assume there are five salesfolk:
• Andy, Bob, Chuck, Dave and Ed
• Take one (e.g., Andy) as the foundation case, and add
four new “dummy” variables
• DB = 1 only if Bob, 0 otherwise
• DC = 1 only if Chuck, 0 otherwise
• DD = 1 only if Dave, 0 otherwise
• DE = 1 only if Ed, 0 otherwise
• The coefficient of each dummy variable (in the most-complete model)
estimates the difference between the average discount that
salesperson gives a customer and the average discount
Andy would give the same customer
Does Salesperson Identity Matter?
Imagine that, after adding the new variables (four new columns
of data) to your model, the regression yields:
Discount_pred = 980 + 9.5 × Age – 0.035 × Income + 446 × Sex
+ 240 × DB + (–300) × DC + (–50) × DD + 370 × DE
• With similar customers, you’d expect Bob to give a discount
$240 higher than would Andy
• With similar customers, you’d expect Chuck to give a discount
$300 lower than would Andy, $540 lower than would Bob,
and also lower than would Dave (by $250) and Ed (by $670)
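The comparisons above are just differences between dummy coefficients: for the same customer, the Age, Income and Sex terms cancel. A minimal sketch of that arithmetic, using the coefficients from the imagined regression (the dictionary and the function name `discount_gap` are illustrative assumptions):

```python
# Dummy-variable coefficients from the imagined regression above.
# Andy is the foundation case, so his coefficient is 0 by construction.
coef = {"Andy": 0, "Bob": 240, "Chuck": -300, "Dave": -50, "Ed": 370}


def discount_gap(a, b):
    """Expected discount of salesperson a minus that of b, same customer."""
    return coef[a] - coef[b]


for other in ["Andy", "Bob", "Dave", "Ed"]:
    print(f"Chuck vs {other}: {discount_gap('Chuck', other):+d}")
```

A negative value means the first salesperson is expected to give the smaller discount.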
Does “Salesperson” Interact with “Sex”?
• Are some of the salesfolk better at selling to a particular Sex of customer?
– Add DB, DC, DD, DE, and DB×Sex, DC×Sex, DD×Sex, DE×Sex to the model
– Imagine that your regression yields:
Discount_pred = 980 + 9.5 × Age – 0.035 × Income + 446 × Sex
+ 240 × DB – 350 × DC + 75 × DD + 10 × DE
– 375 × (DB×Sex) – 150 × (DC×Sex) – 50 × (DD×Sex) + 450 × (DE×Sex)
– Interpret this back in the “conceptual” model:
Discount_pred = 980 + 9.5 × Age – 0.035 × Income + 446 × Sex
+ (240 – 375 × Sex) × DB + (–350 – 150 × Sex) × DC
+ (75 – 50 × Sex) × DD + (10 + 450 × Sex) × DE
– Given a male (Sex=0) customer, you’d expect Bob (DB=1) to give
a greater discount than Andy (by $240 – $375 × 0 = $240)
– Given a female (Sex=1) customer, you’d expect Bob to give a
smaller discount than Andy (by $135, since $240 – $375 × 1 = –$135)
– Chuck has been giving smaller discounts to both men and
women than has Andy, and Dave and Ed have been giving larger
discounts than Andy to both sexes
– And we could take the same approach to investigate whether
“Salesperson” interacts with Age, including also DB×Age,
DC×Age, DD×Age, DE×Age in our model
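The interpretation of the interaction model can be checked with a short calculation: each salesperson's expected gap relative to Andy is the main dummy coefficient plus the interaction coefficient times Sex. The sketch below uses the coefficients from the imagined regression and the slides' coding (Sex=0 male, Sex=1 female); the dictionaries and the function name `gap_vs_andy` are assumptions for illustration.

```python
# Main-effect and Salesperson-by-Sex interaction coefficients
# from the imagined regression above (Andy is the foundation case).
main = {"Bob": 240, "Chuck": -350, "Dave": 75, "Ed": 10}
interaction = {"Bob": -375, "Chuck": -150, "Dave": -50, "Ed": 450}


def gap_vs_andy(person, sex):
    """Expected discount minus Andy's, for a customer with Sex = 0 or 1."""
    return main[person] + interaction[person] * sex


for person in main:
    male, female = gap_vs_andy(person, 0), gap_vs_andy(person, 1)
    print(f"{person}: male {male:+d}, female {female:+d}")
```

Only Bob's gap changes sign between the two groups, which is why he is the one singled out above.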
Outliers
An outlier is a sample observation which fails to
“fit” with the rest of the sample data. Such
observations may distort the results of an entire
study.
– Types of outliers (three)
– Identification of outliers (via “model analysis”)
– Dealing with outliers (perhaps yielding a better
model)
• These issues are dealt with here.