Linear Regression 1 - Middle East Technical University

Multiple Regression 3
Sociology 5811 Lecture 24
Copyright © 2005 by Evan Schofer
Do not copy or distribute without
permission
Announcements
• Schedule:
– Today: More multiple regression
• Interaction terms, nested models
– Next Class: Multiple regression assumptions and
diagnostics
• Reminder: Paper deadline coming up soon!
• Questions about the paper?
Review: Dummy Variables
• Question: How can we incorporate nominal
variables (e.g., race, gender) into regression?
• Option 1: Analyze each sub-group separately
– Generates different slope, constant for each group
• Option 2: Dummy variables
– “Dummy” = a dichotomous variable coded to
indicate the presence or absence of something
– Absence coded as zero, presence coded as 1.
Review: Dummy Variables
• Strategy: Create a separate dummy variable for
all nominal categories
• Example: Race
– Coded 1=white, 2=black, 3=other (like GSS)
• Make 3 dummy variables:
– “DWHITE” is 1 for whites, 0 for everyone else
– “DBLACK” is 1 for Af. Am., 0 for everyone else
– “DOTHER” is 1 for “others”, 0 for everyone else
• Then, include two of the three variables in the
multiple regression model.
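The coding strategy above can be sketched in a few lines of Python. This is only an illustration: the sample `race` values and the helper name `make_dummy` are made up here and are not part of the lecture.

```python
# Race codes as in the slide: 1 = white, 2 = black, 3 = other.
# The sample values below are invented for illustration.
race = [1, 2, 3, 1, 2, 1]

def make_dummy(values, category):
    """Return 1 where the value equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]

dwhite = make_dummy(race, 1)
dblack = make_dummy(race, 2)
dother = make_dummy(race, 3)

# Every case falls in exactly one category, so the three dummies always
# sum to 1. The third dummy is therefore redundant, which is why only two
# go into the model; the omitted category (here, white) is the reference.
assert all(w + b + o == 1 for w, b, o in zip(dwhite, dblack, dother))

predictors = list(zip(dblack, dother))  # DWHITE is left out
print(predictors)
```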
Dummy Variables: Interpretation
• Visually: Women = blue, Men = red
[Figure: scatterplot of HAPPY (vertical axis, 0 to 10) against INCOME (horizontal axis, 0 to 100000), showing the overall slope for all data points and separate lines for men and women]
Note: The lines for men and women have the same slope, but one is higher and the other is lower. The constant differs!
If women=1, men=0: The constant (a) reflects men only. The dummy coefficient (b) reflects the increase for women (relative to men).
Dummy Variables: Interpretation
• Ex: Job Prestige
Coefficients (a)
                 Unstandardized            Standardized
                 Coefficients              Coefficients
Model 1          B            Std. Error   Beta           t         Sig.
(Constant)       9.666        1.672                       5.780     .000
EDUC             2.476        .111         .517           22.271    .000
INCOM16          6.282E-02    .397         .004           .158      .874
DBLACK           -2.666       1.117        -.055          -2.388    .017
DOTHER           1.114        1.777        .014           .627      .531
a. Dependent Variable: PRESTIGE
• DBLACK coefficient of -2.666 indicates that
African Americans have, on average, about 2.7 points
less job prestige than the reference group
(in this case, white respondents), holding the other
variables in the model constant.
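As a rough check on this reading, the B coefficients from the PRESTIGE table can be plugged into the regression equation. The respondent values below (12 years of education, INCOM16 = 3) are hypothetical, and `predict_prestige` is just an illustrative helper name.

```python
# Unstandardized coefficients (B) from the PRESTIGE table.
B = {"const": 9.666, "EDUC": 2.476, "INCOM16": 0.06282,
     "DBLACK": -2.666, "DOTHER": 1.114}

def predict_prestige(educ, incom16, dblack=0, dother=0):
    """Predicted job prestige from the fitted equation."""
    return (B["const"] + B["EDUC"] * educ + B["INCOM16"] * incom16
            + B["DBLACK"] * dblack + B["DOTHER"] * dother)

# Two hypothetical respondents who differ only on DBLACK.
white = predict_prestige(educ=12, incom16=3)
black = predict_prestige(educ=12, incom16=3, dblack=1)

# The gap equals the DBLACK coefficient regardless of the other values:
# the dummy shifts the level (constant), not the slope.
print(round(black - white, 3))
```

Changing `educ` or `incom16` moves both predictions by the same amount, so the printed difference stays the same.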
Dummy Variables
• Comments:
• 1. Dummy coefficients shouldn’t be called slopes
– Referring to the “slope” of gender doesn’t make sense
– Rather, it is the difference in the constant (or “level”)
• 2. The contrast is always with the nominal
category that was left out of the equation
– If DFEMALE is included, the contrast is with males
– If DBLACK, DOTHER are included, coefficients
reflect difference in constant compared to whites.
Interaction Terms
• Question: What if you suspect that a variable has
a totally different slope for two different subgroups in your data?
• Example: Income and Happiness
– Perhaps men are more materialistic -- an extra dollar
increases their happiness a lot
– If women are less materialistic, each dollar has a
smaller effect on happiness (compared to men)
• Issue isn’t men = “more” or “less” than women
– Rather, the slope of a variable (income) differs across
groups
Interaction Terms
• Issue isn’t men = “more” or “less” than women
– Rather, the slope (coefficient) of a variable (income)
differs across groups
• Again, we want to specify a different regression
line for each group
– We want lines with different slopes, not parallel lines
that are higher or lower.
Interaction Terms
• Visually: Women = blue, Men = red
[Figure: scatterplot of HAPPY (vertical axis, 0 to 10) against INCOME (horizontal axis, 0 to 100000), showing the overall slope for all data points and separate lines for men and women]
Note: Here, the slope for men and women differs.
The effect of income on happiness (X1 on Y) varies with gender (X2). This is called an “interaction effect”.
Interaction Terms
• Examples of interaction:
– Effect of education on income may interact with type
of school attended (public vs. private)
• Private schooling has bigger effect on income
– Effect of aspirations on educational attainment
interacts with poverty
• Aspirations matter less if you don’t have money to pay for
college
• Question: Can you think of examples of two
variables that might interact?
• Either from your final project? Or anything else?
Interaction Terms
• Interaction effects: Differences in the
relationship (slope) between two variables for
each category of a third variable
• Option #1: Analyze each group separately
• Look for different sized slope in each group
• Option #2: Multiply the two variables of interest:
(DFEMALE, INCOME) to create a new variable
– Called: DFEMALE*INCOME
– Add that variable to the multiple regression model.
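Building the interaction variable in Option #2 is just an element-by-element product. A minimal sketch, using made-up cases:

```python
# Invented cases: DFEMALE is 1 for women, 0 for men; INCOME is in dollars.
dfemale = [0, 1, 1, 0]
income = [30000, 45000, 20000, 60000]

# DFEMALE*INCOME equals INCOME for women and 0 for men.
fem_inc = [d * x for d, x in zip(dfemale, income)]
print(fem_inc)  # [0, 45000, 20000, 0]
```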
Interaction Terms
• Consider the following regression equation:
$Y_i = a + b_1\,\text{INCOME}_i + b_2\,(\text{DFEM} \times \text{INC})_i + e_i$
• Question: What if the case is male?
• Answer: DFEMALE is 0, so b2(DFEM*INC)
drops out of the equation
– Result: Males are modeled using the ordinary
regression equation: a + b1X + e.
Interaction Terms
• Consider the following regression equation:
$Y_i = a + b_1\,\text{INCOME}_i + b_2\,(\text{DFEM} \times \text{INC})_i + e_i$
• Question: What if the case is female?
• Answer: DFEMALE is 1, so b2(DFEM*INC)
becomes b2*INCOME, which is added to b1
– Result: Females are modeled using a different
regression line: a + (b1+b2) X + e
– Thus, the coefficient b2 reflects the difference in
the slope of INCOME for women (relative to men).
Interpreting Interaction Terms
• Interpreting interaction terms:
• A positive b for DFEMALE*INCOME indicates
the slope for income is higher for women vs. men
– A negative effect indicates the slope is lower
– Size of coefficient indicates actual difference in slope
• Example: DFEMALE*INCOME. Observed b’s:
– Income: b = .5
– DFEMALE * INCOME: b = -.2
• Interpretation: Slope is .5 for men, .3 for women.
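The arithmetic behind this interpretation can be written out directly. The function name `income_slope` is only an illustrative choice; the two coefficients are the ones from the example.

```python
b_income = 0.5        # slope of INCOME (applies to men, DFEMALE = 0)
b_fem_income = -0.2   # coefficient for DFEMALE*INCOME

def income_slope(dfemale):
    """Slope of INCOME for a group: b1 for men, b1 + b2 for women."""
    return b_income + b_fem_income * dfemale

print("men:  ", income_slope(0))
print("women:", round(income_slope(1), 2))
```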
Interpreting Interaction Terms
• Example: Interaction of Race and Education
affecting Job Prestige:
Coefficients (a)
                 Unstandardized            Standardized
                 Coefficients              Coefficients
Model 1          B            Std. Error   Beta           t         Sig.
(Constant)       8.855        1.744                       5.076     .000
EDUC             2.541        .118         .531           21.563    .000
INCOM16          6.636E-02    .396         .004           .167      .867
DBLACK           4.293        4.193        .088           1.024     .306
BL_EDUC          -.576        .332         -.149          -1.735    .083
a. Dependent Variable: PRESTIGE
DBLACK*EDUC (BL_EDUC) has a negative effect that falls just short of
significance at the .05 level (p = .083). The coefficient of -.576 indicates
that the slope of education on job prestige is .576 points lower for Blacks
than for non-Blacks.
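Using the B values from this table, the two education slopes and the implied Black vs. non-Black gap at a given level of education can be computed as follows. The helper names and the education values (12 and 16 years) are chosen here for illustration only.

```python
# B coefficients from the PRESTIGE table with the BL_EDUC interaction.
b_educ = 2.541      # slope of EDUC for non-Blacks (DBLACK = 0)
b_dblack = 4.293    # DBLACK main effect: the gap when EDUC = 0
b_bl_educ = -0.576  # DBLACK*EDUC interaction

def educ_slope(dblack):
    """Slope of education on prestige for each group."""
    return b_educ + b_bl_educ * dblack

def black_gap(educ):
    """Predicted prestige difference (Black minus non-Black) at a given
    education level, holding INCOM16 constant."""
    return b_dblack + b_bl_educ * educ

print(round(educ_slope(0), 3), round(educ_slope(1), 3))  # 2.541 1.965
print(round(black_gap(12), 3), round(black_gap(16), 3))  # -2.619 -4.923
```

Note that the positive DBLACK coefficient describes the gap only at EDUC = 0; at realistic education levels the predicted gap is negative and grows with education, although neither DBLACK nor BL_EDUC is significant at the .05 level in this table.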
Continuous Interaction Terms
• Two continuous variables can also interact
• Example: Effect of education and income on
happiness
– Perhaps highly educated people are less materialistic
– As education increases, the slope between
income and happiness would decrease
• Simply multiply Education and Income to create
the interaction term “EDUCATION*INCOME”
• And add it to the model.
Interpreting Interaction Terms
• How do you interpret continuous variable
interactions?
• Example: EDUCATION*INCOME: Coefficient = 2.0
• Answer: For each unit change in education, the
slope of income vs. happiness increases by 2
– Note: coefficient is symmetrical: For each unit
change in income, education slope increases by 2
• Dummy interactions effectively estimate 2 slopes:
one for each group
• Continuous interactions result in many slopes: Each value
of education yields a different slope for income (and vice versa).
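The symmetry can be seen by regrouping the terms of the equation. The labels b1, b2, b3 are assigned here for illustration (b3 is the coefficient on EDUCATION*INCOME):

```latex
\begin{align*}
Y &= a + b_1\,\text{EDUC} + b_2\,\text{INCOME}
       + b_3\,(\text{EDUC} \times \text{INCOME}) + e \\
  &= a + b_1\,\text{EDUC} + (b_2 + b_3\,\text{EDUC})\,\text{INCOME} + e \\
  &= a + (b_1 + b_3\,\text{INCOME})\,\text{EDUC} + b_2\,\text{INCOME} + e
\end{align*}
```

With b3 = 2.0, the second line shows that the income slope rises by 2 for each additional unit of education, and the third line shows that the education slope rises by 2 for each additional unit of income.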
Interpreting Interaction Terms
• Interaction terms alter the interpretation of
“main effect” coefficients
• Including “EDUC*INCOME” changes the interpretation of
EDUC and of INCOME
• See Allison p. 166-9
– Specifically, coefficient for EDUC represents slope of
EDUC when INCOME = 0
• Likewise, INCOME shows slope when EDUC = 0
– Thus, main effects are like “baseline” slopes
• And, the interaction effect coefficient shows how the slope
grows (or shrinks) for a given unit change.
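A small numeric sketch of the “baseline slope” idea follows. The coefficients are hypothetical, picked so that the income slope shrinks as education rises; they do not come from any table in the lecture.

```python
# Hypothetical coefficients for a model with EDUC, INCOME and EDUC*INCOME.
b_income = 4.0         # "main effect": slope of INCOME when EDUC = 0
b_educ_income = -0.25  # interaction: change in INCOME slope per unit EDUC

def income_slope(educ):
    """Slope of INCOME on happiness at a given level of education."""
    return b_income + b_educ_income * educ

for educ in (0, 12, 16):
    print(educ, income_slope(educ))
# 0 4.0
# 12 1.0
# 16 0.0
```

The main-effect coefficient (4.0) is the slope only for someone with zero years of education, which is why it should not be read as the “overall” effect of income once the interaction is in the model.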
Dummy Interactions
• It is also possible to construct interaction terms
based on two dummy variables
– Instead of a “slope” interaction, dummy interactions
show difference in constants
• Constant (not slope) differs across values of a third variable
– Example: Effect of race on school success varies
by gender
• African Americans do less well in school; but the difference
is much larger for black males.
Dummy Interactions
• Strategy for dummy interaction is the same:
Multiply both variables
– Example: Multiply DBLACK, DMALE to create
DBLACK*DMALE
– Then, include all 3 variables in the model
– Effect of DBLACK*DMALE reflects difference in
constant (level) for black males, compared to white
males and black females
• You would observe a negative coefficient, indicating that
black males fare worse in schools than black females or
white males.
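To see what the dummy-by-dummy coefficient captures, consider the four predicted levels under hypothetical coefficients (all numbers below are invented; the outcome could be any school-success score):

```python
# Hypothetical coefficients; the reference group is white females.
const = 80.0
b_dblack = -3.0
b_dmale = -2.0
b_black_male = -5.0   # coefficient for DBLACK*DMALE

def level(dblack, dmale):
    """Predicted level (constant) for each race-by-gender group."""
    return (const + b_dblack * dblack + b_dmale * dmale
            + b_black_male * dblack * dmale)

for dblack, dmale in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(dblack, dmale, level(dblack, dmale))

# The interaction coefficient is a difference of differences:
# the race gap among males minus the race gap among females.
gap_males = level(1, 1) - level(0, 1)
gap_females = level(1, 0) - level(0, 0)
print(gap_males - gap_females)  # -5.0
```

In this sketch black males are predicted at 70, which is 5 points below what the two main effects alone (80 - 3 - 2 = 75) would imply.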
Interaction Terms
• Comments:
• 1. If you make an interaction you should also
include the component variables in the model:
– A model with “DFEMALE * INCOME” should also
include DFEMALE and INCOME
• There are rare exceptions. But when in doubt, include them
• 2. Sometimes interaction terms are highly
correlated with their components
• That can cause problems (multicollinearity – which we’ll
discuss next week).
Interaction Terms
• 3. Make sure you have enough cases in each
group for your interaction terms
– Interaction terms involve estimating slopes based on
sub-groups in your data (e.g., black females).
• If there are hardly any black females in the dataset, you
can have problems.
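One simple way to check this before adding an interaction is to count the cases in each cell. The data and the cutoff `MIN_CASES` below are invented for illustration; the lecture does not give a specific threshold.

```python
from collections import Counter

# Invented dataset: one (race, gender) pair per case.
cases = ([("white", "male")] * 40 + [("white", "female")] * 45
         + [("black", "male")] * 12 + [("black", "female")] * 3)

counts = Counter(cases)
MIN_CASES = 30  # arbitrary cutoff for this sketch, not a rule

# Cells with few cases give unstable estimates for the interaction.
sparse = sorted(cell for cell, n in counts.items() if n < MIN_CASES)

for cell, n in sorted(counts.items()):
    note = "  <- few cases" if n < MIN_CASES else ""
    print(cell, n, note)
```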
Interaction Terms
• 4. Interaction terms are confusing at first… but
they are VERY important
– Example: Race, class, gender.
– Most sociologists argue that they operate interactively
• The experience of black lower-class females is different
from black upper-class females or white lower-class
females
• Interaction terms are a powerful way of identifying such
intersections in quantitative data
– In short: Make the effort to consider how variables
interact… it is a very useful way of thinking.
Nested Models
• It is common to conduct a series of multiple
regressions, not just one
– Each model adds variables or sets of variables
• This is useful for several reasons:
– 1. To show how coefficients change when new
variables are added
• Which suggests which variables mediate others
– 2. To examine whether additional variables increase
model fit (r-square).
Nested Models
• Question: Do the new variables substantially
improve the model?
• Idea #1: Look for increase in Adjusted R-Square
• That gives you a sense of whether new variables “improve”
the model
• Idea #2: Conduct a formal F-test
– Recall that F-tests allow comparisons of variance
(e.g., SSbetween to SSwithin)
– This allows you to test the hypothesis that the added
variables improve the model overall.
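For Idea #1, the usual adjusted R-square formula is 1 - (1 - R²)(N - 1)/(N - K - 1), where K is the number of independent variables. The sketch below uses hypothetical R-square values to show that adding variables can raise R-square while lowering adjusted R-square.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for a model with k independent variables
    fitted on n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical nested models fitted on the same 100 cases.
adj1 = adjusted_r2(r2=0.30, n=100, k=2)  # model 1: 2 variables
adj2 = adjusted_r2(r2=0.31, n=100, k=5)  # model 2: adds 3 variables

print(round(adj1, 4), round(adj2, 4))
# R-square rose from .30 to .31, but adjusted R-square fell,
# suggesting the three added variables did not "improve" the model.
```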
Nested Models
• F-tests require “nested models”
– Models are the same, except for addition of new
variables
– You can’t compare totally different models this way
$$F_{(K_2 - K_1),\,(N - K_2 - 1)} = \frac{(R_2^2 - R_1^2)\,/\,(K_2 - K_1)}{(1 - R_2^2)\,/\,(N - K_2 - 1)}$$
• This tests the following hypotheses:
• H0: Two models have the same R-square
• H1: Two models have different R-square
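The F statistic for nested models can be computed by hand from the two R-square values. In the sketch below, K1 and K2 are the numbers of independent variables in the smaller and larger model, N is the number of cases, and all input values are hypothetical.

```python
def nested_f(r2_1, r2_2, k1, k2, n):
    """F statistic and degrees of freedom for comparing nested models.

    r2_1, k1: R-square and number of variables in the smaller model
    r2_2, k2: R-square and number of variables in the larger model
    n:        number of cases (same for both models)
    """
    df1 = k2 - k1
    df2 = n - k2 - 1
    f = ((r2_2 - r2_1) / df1) / ((1 - r2_2) / df2)
    return f, (df1, df2)

# Hypothetical example: adding 2 variables raises R-square from .25 to .30.
f, df = nested_f(r2_1=0.25, r2_2=0.30, k1=3, k2=5, n=200)
print(round(f, 3), df)
```

The resulting value would then be compared with the critical value of F for the reported degrees of freedom, just as with other F-tests.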
Nested Models
• SPSS can conduct an F-test between two
regression models
• A significant F-test indicates:
– The second model (with additional variables) is a
significant improvement (in R-square) compared to
the first.