11. Controlling for a 3rd Variable


15. Multiple Regression
• How do we actually request the
regressions in SPSS?
• How do we use regression to explicate a
bivariate relationship with a third variable?
• What do we look for once we have run the
relevant regressions?
Example of Simple and Multiple Regression
To use a single independent variable, family size, to predict the number of credit cards in a family:
First, choose 'Regression | Linear...' from the Analyze menu.
Second, in the 'Linear Regression' dialog box, move the variable 'Number of Credit Cards (ncards)' to the 'Dependent:' variable list. This is the DV (effect).
Third, move the variable 'Family Size (famsize)' to the 'Independent(s)' list box. This is the IV (cause).
Fourth, accept all of the other defaults specified by SPSS for this analysis and click on the OK button to produce the output.
SPSS Output:
Part 1: First Part Shown
Model Summary
Model   R (Multiple R)   R Square   Adjusted R Square   Std. Error of the Estimate
1       .492 (a)         .242       .222                14.73484
a. Predictors: (Constant), BOOKS
R Squared = percent variance explained (0.49 × 0.49).
Adjusted R Square corrects for small n.
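The two annotations above can be checked by hand. This is a minimal sketch, assuming n = 40 cases (inferred from the total df of 39 in the ANOVA output) and one predictor. The function name is illustrative, and the adjustment is the usual textbook formula, which appears to reproduce the SPSS figure here.

```python
def adjusted_r_squared(r_squared, n, k):
    """Usual small-sample adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

r = 0.492              # Multiple R from the Model Summary
r_squared = r * r      # share of the variance in GRADE explained by BOOKS

# n = 40 is an assumption taken from the ANOVA total df (39); k = 1 predictor.
adj = adjusted_r_squared(0.242, n=40, k=1)

print(round(r_squared, 3))
print(round(adj, 3))
```

With a small n the adjusted value sits noticeably below R Square; with a large n the two are nearly identical.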
SPSS Output:
Part 2: ANOVA
ANOVA (b)
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   2633.513         1    2633.513      12.130   .001 (a)
Residual     8250.387         38   217.115
Total        10883.900        39
a. Predictors: (Constant), BOOKS
b. Dependent Variable: GRADE
We’ll ignore this part.
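Although the slides set this table aside, its entries follow from one another by simple arithmetic. A short sketch, using only the sums of squares and df printed in the ANOVA output (the variable names are illustrative):

```python
ss_regression, df_regression = 2633.513, 1
ss_residual, df_residual = 8250.387, 38

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_ratio = ms_regression / ms_residual

# R Square is the regression share of the total sum of squares.
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(ms_residual, 3), round(f_ratio, 2), round(r_squared, 3))
```

The last line ties the ANOVA table back to the R Square reported in the Model Summary.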
SPSS Output:
Part 3: The Coefficients
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B         Std. Error          Beta                        t        Sig.
1  (Constant)    52.075    4.035                                           12.905   .000
   BOOKS         5.737     1.647               .492                        3.483    .001
a. Dependent Variable: GRADE
Almost all of this is important. Here we show one independent variable.
SPSS Output:
Part 3(i): The Coefficients - B
Coefficients (a)
Model            Unstandardized Coefficients: B
1  (Constant)    52.075
   BOOKS         5.737
a. Dependent Variable: GRADE
• B is shown for each independent variable and the constant.
• B for books is the increase in grade when you read one more book.
• Constant is the estimated grade when you read no (0) books.
Prediction Equation
• Estimating the DV
DV=B×IV+C
Y = BX + C
• OR:
Marks = 5.7 × Books + 52
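The prediction equation can be applied directly to get an estimated grade for any number of books. A minimal sketch, using the unrounded B and constant from the coefficients output (the function name is illustrative):

```python
B_BOOKS = 5.737     # B for BOOKS from the coefficients output
CONSTANT = 52.075   # estimated grade when no books are read

def predicted_grade(books):
    """Y = B * X + C for the BOOKS -> GRADE regression."""
    return B_BOOKS * books + CONSTANT

# Predicted grades for 0 to 4 books; these points lie on the best-fit line.
for books in range(5):
    print(books, round(predicted_grade(books), 1))
```

Each additional book moves the prediction up by exactly B, which is what the slope means.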
Add a Line
[Figure: scatterplot with the fitted regression line; vertical axis from 0 to 80, horizontal axis from 0 to 4.]
Here we can draw the line for the equation. These are the predicted values, or the best-fit line.
SPSS Output:
Part 3: The Coefficients
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B         Std. Error          Beta                        t        Sig.
1  (Constant)    52.075    4.035                                           12.905   .000
   BOOKS         5.737     1.647               .492                        3.483    .001
a. Dependent Variable: GRADE
Sig. tests the null hypothesis that B is equal to 0. This is a two-tailed test; for a directional hypothesis, divide Sig. by 2 to get the significance level. Two-tailed, the B for BOOKS is significant at the .001 level: if there were no relationship between BOOKS and grades, we would observe a B this large (positive or negative) only about one time in 1,000.
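The t column is simply B divided by its standard error, and the one-tailed significance is half of the two-tailed value. A small sketch using the figures from the coefficients table (the function names are illustrative). Halving Sig. is only appropriate when the sign of B matches the direction you predicted in advance.

```python
def t_statistic(b, std_error):
    """t = B / Std. Error, the statistic behind the Sig. column."""
    return b / std_error

def one_tailed_sig(two_tailed_sig):
    """Directional test: halve the two-tailed Sig. reported by SPSS."""
    return two_tailed_sig / 2

t_books = t_statistic(5.737, 1.647)
t_constant = t_statistic(52.075, 4.035)
sig_books_one_tailed = one_tailed_sig(0.001)

print(round(t_books, 2), round(t_constant, 2), sig_books_one_tailed)
```

Small differences from the printed t values in the third decimal come from the rounding of B and the standard error in the table.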
• Most of the previous 8 slides were adapted from Jeremy Miles's online notes.
• Now let’s look at explicating a bivariate relationship with a third variable.
Explicating a bivariate relationship
with a third variable
A misspecified relationship is when the
magnitude or direction of the relationship
you observe between a and b is not due to
a causing b, but to c partly or wholly
causing both a and b. When you control
for c the relationship between a and b
changes in magnitude or direction.
• Suppose we hypothesize that respondents’ affect for Clinton (thermometer score) causes their affect for Gore (thermometer score).
• But we wish to consider the alternative explanation that partisanship is a cause of both. By ignoring the effect of partisanship on both, we can overestimate the effect of feelings towards Clinton on feelings towards Gore.
Here we might find:
Without controlling for P:   C --(++)--> G
Controlling for P:           P --(+)--> C,   P --(+)--> G,   C --(+)--> G
Here we would have overestimated the impact of C on G. C
does cause G, but controlling for P we realize the effect is
less than we initially thought.
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .732 (a)   .536       .536                19.105
a. Predictors: (Constant), Post:Thermometer Bill Clinton
Bivariate model: C --(++)--> G
Coefficients (a)
                                    Unstandardized Coefficients   Standardized Coefficients
Model                               B         Std. Error          Beta                        t        Sig.
1  (Constant)                       17.489    1.006                                           17.388   .000
   Post:Thermometer Bill Clinton    .689      .016                .732                        42.054   .000
a. Dependent Variable: Post:Thermometer Al Gore
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .758 (a)   .574       .573                18.372
a. Predictors: (Constant), Party ID: 3 categories, Post:Thermometer Bill Clinton
Coefficients (a)
                                    Unstandardized Coefficients   Standardized Coefficients
Model                               B         Std. Error          Beta                        t         Sig.
1  (Constant)                       40.952    2.249                                           18.208    .000
   Post:Thermometer Bill Clinton    .560      .019                .597                        29.003    .000
   Party ID: 3 categories           -8.575    .746                -.236                       -11.491   .000
a. Dependent Variable: Post:Thermometer Al Gore
• So yes we did overestimate the effect of Clinton
on Gore’s thermometer score, but the effect of
Clinton on Gore is still quite substantial, and
statistically sig. at the .01 level.
• The coefficient on Clinton is reduced from .689
to .560.
• The first equation: G=.689 C + 17.489 becomes:
G= .560 C – 8.575 P + 40.952.
• Note: what assumption was I making about
party id to have included it in this equation when
I used party3? (R=3, I=2, D=1).
• What would you predict G to be for a Dem (P=1) who rated Clinton at 60?
• G = .560 C - 8.575 P + 40.952.
• G = .560 × 60 - 8.575 × 1 + 40.952.
• G = 66
• For an Independent, G = 57
• For a Republican, G = 49
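The two equations and the three predictions above can be reproduced in a few lines. A minimal sketch using the coefficients from the two regressions, with the party coding given on the slide (D=1, I=2, R=3); the function and variable names are illustrative.

```python
def gore_bivariate(clinton):
    """First equation: G = .689 C + 17.489."""
    return 0.689 * clinton + 17.489

def gore_controlled(clinton, party):
    """Second equation: G = .560 C - 8.575 P + 40.952."""
    return 0.560 * clinton - 8.575 * party + 40.952

PARTY_CODES = {"Democrat": 1, "Independent": 2, "Republican": 3}

# Predicted Gore thermometer for someone who rated Clinton at 60.
for label, code in PARTY_CODES.items():
    print(label, round(gore_controlled(60, code)))

# Proportional drop in the Clinton coefficient once party is controlled.
drop = (0.689 - 0.560) / 0.689
print(round(drop * 100), "percent smaller")
```

Each step from one party category to the next shifts the prediction by the same -8.575, which is what entering party3 as a single numeric predictor implies.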
Now we might also have started by examining the effect of
partisanship on Gore’s thermometer score and then asking
whether Clinton’s score was an intervening variable.
P --> C --> G
P causes G. All or some of the way P causes G is through C.
First model:    Pty --> Gore
Second model:   Pty --> Clinton --> Gore
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .579 (a)   .336       .335                22.932
a. Predictors: (Constant), Party ID: 3 categories
Coefficients (a)
                             Unstandardized Coefficients   Standardized Coefficients
Model                        B          Std. Error         Beta                        t         Sig.
1  (Constant)                94.947     1.574                                          60.305    .000
   Party ID: 3 categories    -21.016    .761               -.579                       -27.612   .000
a. Dependent Variable: Post:Thermometer Al Gore
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .758 (a)   .574       .573                18.372
a. Predictors: (Constant), Party ID: 3 categories, Post:Thermometer Bill Clinton
Coefficients (a)
                                    Unstandardized Coefficients   Standardized Coefficients
Model                               B         Std. Error          Beta                        t         Sig.
1  (Constant)                       40.952    2.249                                           18.208    .000
   Post:Thermometer Bill Clinton    .560      .019                .597                        29.003    .000
   Party ID: 3 categories           -8.575    .746                -.236                       -11.491   .000
a. Dependent Variable: Post:Thermometer Al Gore
• Most, but not all, of the impact of party on
Gore’s thermometer score is due to
Clinton’s score. Perception of Clinton
mostly explains the way in which party
affects perception of Gore
• Remember party is still the cause, we are
looking at the mechanism.
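"Most, but not all" can be put into numbers by comparing the party coefficient before and after Clinton's score enters the equation. This sketch uses the difference-in-coefficients approach, which is one common way to summarize an intervening variable and is an assumption here rather than a method named on the slides; the variable names are illustrative.

```python
total_effect = -21.016    # B for party when party alone predicts Gore
direct_effect = -8.575    # B for party once Clinton's score is controlled

# What is left over is the part of party's effect that runs through Clinton.
indirect_effect = total_effect - direct_effect
share_via_clinton = indirect_effect / total_effect

print(round(indirect_effect, 3))
print(round(share_via_clinton * 100), "percent of the party effect")
```

On these figures roughly three fifths of the party effect passes through perception of Clinton, with the remainder operating directly.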
Now there is a danger that there is a reciprocal
relationship. Perhaps Gore also causes perception of
Clinton. We are assuming that perception of Clinton is
more important and dominant in this relationship. A
simple correlation doesn’t give us the answer—we are
making an assumption.
Thus we don’t think this:
C <--> G
But rather this:
C --> G
[Figure: 3D Relationship]
[Figure: 3D Linear Relationship]
Multiple Causes (Enhancement): Two variables may be causes of
a third variable, while the two are unrelated to each other.
Turning to the legislative data set: Suppose we think that states
with higher levels of average education are more likely to elect
women to the state legislature either because more women are
likely to run or because electorates are more likely to vote for the
ones that do.
Suppose you also hypothesize that women are more likely to be
elected to lower rather than upper chambers.
E=% college ed in state; C=chamber (2=upper)(1=lower);
W=% women in chamber
With chamber:   C and E unrelated (0);   E --(+)--> W;   C --> W
Bivariate:      E --(+)--> W
Now let’s look at the correlations among these three variables
Correlations
                                  chamber    colleg_1   pctwch_1
chamber    Pearson Correlation    1          -.003      -.250**
           Sig. (1-tailed)                   .489       .006
           N                      99         99         99
colleg_1   Pearson Correlation    -.003      1          .451**
           Sig. (1-tailed)        .489                  .000
           N                      99         99         99
pctwch_1   Pearson Correlation    -.250**    .451**     1
           Sig. (1-tailed)        .006       .000
           N                      99         99         99
**. Correlation is significant at the 0.01 level (1-tailed).
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .451 (a)   .203       .195                .08032
a. Predictors: (Constant), colleg_1
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t       Sig.
1  (Constant)    -.036    .047                                             -.750   .455
   colleg_1      .009     .002                 .451                        4.970   .000
a. Dependent Variable: pctwch_1
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .514 (a)   .265       .249                .07755
a. Predictors: (Constant), chamber, colleg_1
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t        Sig.
1  (Constant)    .031     .051                                             .609     .544
   colleg_1      .009     .002                 .450                        5.140    .000
   chamber       -.044    .016                 -.248                       -2.839   .006
a. Dependent Variable: pctwch_1
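Because chamber and college education are essentially uncorrelated (-.003), adding chamber leaves the college coefficient unchanged and the two contributions to R Square simply add up. A short sketch of the standard two-predictor formula for R Square in terms of the three correlations from the correlation table; the function and variable names are illustrative.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for two predictors, from their correlations with Y and each other."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

r_college_women = 0.451      # colleg_1 with pctwch_1
r_chamber_women = -0.250     # chamber with pctwch_1
r_college_chamber = -0.003   # colleg_1 with chamber

combined = r_squared_two_predictors(
    r_college_women, r_chamber_women, r_college_chamber
)
sum_of_squares = r_college_women**2 + r_chamber_women**2

print(round(combined, 3), round(sum_of_squares, 3))
```

When the predictors are unrelated, the combined R Square is almost exactly the sum of the two squared bivariate correlations, which is the signature of multiple independent causes.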
Now let’s look at a misspecified relationship:
Bivariate:           P --(0)--> W
Controlling for S:   S --(-)--> P,   S --(-)--> W,   P --(-)--> W
Here we would have thought that professionalization (P) had no effect on the percent of women in the chamber (W). But when we control for South (S), we see that there may be an effect of professionalization that was concealed because of the relationship between Southern region and both P and W.
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .017 (a)   .000       -.010               .08995
a. Predictors: (Constant), prof1_1
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t        Sig.
1  (Constant)    .198     .013                                             14.795   .000
   prof1_1       -.006    .036                 -.017                       -.172    .864
a. Dependent Variable: pctwch_1
First I computed a variable for southern state:
compute south=0.
if (state eq 'AL' or state eq 'AR' or state eq 'FL' or state eq 'GA'
  or state eq 'KY' or state eq 'LA' or state eq 'MS' or state eq 'NC'
  or state eq 'OK' or state eq 'SC' or state eq 'TN' or state eq 'TX'
  or state eq 'VA') south=1.
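For readers working outside SPSS, the same dummy variable can be built with a set-membership test. This is an illustrative Python equivalent of the compute/if logic, not part of the original analysis; the names SOUTHERN_STATES and south_dummy are hypothetical.

```python
# The thirteen state abbreviations coded as South in the SPSS syntax.
SOUTHERN_STATES = {
    "AL", "AR", "FL", "GA", "KY", "LA", "MS",
    "NC", "OK", "SC", "TN", "TX", "VA",
}

def south_dummy(state):
    """Return 1 for a southern state and 0 otherwise (mirrors compute south=0 / if ... south=1)."""
    return 1 if state in SOUTHERN_STATES else 0

print(south_dummy("GA"), south_dummy("NY"))
```

Keeping the list of states in one place makes it easier to check the coding than scanning a long chain of "or" conditions.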
Correlations
                                  pctwch_1   south      prof1_1
pctwch_1   Pearson Correlation    1          -.545**    -.017
           Sig. (1-tailed)                   .000       .432
           N                      99         99         99
south      Pearson Correlation    -.545**    1          -.192*
           Sig. (1-tailed)        .000                  .029
           N                      99         99         99
prof1_1    Pearson Correlation    -.017      -.192*     1
           Sig. (1-tailed)        .432       .029
           N                      99         99         99
**. Correlation is significant at the 0.01 level (1-tailed).
*. Correlation is significant at the 0.05 level (1-tailed).
S --(-)--> P,   S --(-)--> W,   P --(-)--> W
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .559 (a)   .312       .298                .07500
a. Predictors: (Constant), south, prof1_1
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t        Sig.
1  (Constant)    .239     .013                                             18.719   .000
   prof1_1       -.045    .031                 -.126                       -1.466   .146
   south         -.115    .017                 -.569                       -6.598   .000
a. Dependent Variable: pctwch_1
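The change in the professionalization coefficient can be traced back to the three correlations reported earlier. A minimal sketch of the standard formula for standardized coefficients with two predictors; the function and variable names are illustrative.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Betas for two predictors from their correlations with Y and with each other."""
    denom = 1 - r_12**2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

r_prof_women = -0.017    # prof1_1 with pctwch_1
r_south_women = -0.545   # south with pctwch_1
r_prof_south = -0.192    # prof1_1 with south

beta_prof, beta_south = standardized_betas(r_prof_women, r_south_women, r_prof_south)
print(round(beta_prof, 3), round(beta_south, 3))
```

The bivariate correlation of -.017 hid the effect because southern states are both less professionalized and have fewer women legislators; removing that overlap moves the Beta for professionalization to about -.126.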
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .650 (a)   .423       .398                .06942
a. Predictors: (Constant), colleg_1, chamber, prof1_1, south
Coefficients (a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t        Sig.
1  (Constant)    .173     .054                                             3.184    .002
   prof1_1       -.054    .029                 -.151                       -1.886   .062
   south         -.091    .019                 -.448                       -4.899   .000
   chamber       -.045    .014                 -.252                       -3.220   .002
   colleg_1      .005     .002                 .251                        2.751    .007
a. Dependent Variable: pctwch_1
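The full model can be read as a prediction equation in the same way as the earlier ones. The sketch below uses the B column from the final coefficients table. The input values for the example chamber are hypothetical, and the scales of prof1_1 and colleg_1 are not stated in the slides, so treat the predicted levels as illustrative only.

```python
CONSTANT = 0.173
COEFFICIENTS = {
    "prof1_1": -0.054,
    "south": -0.091,
    "chamber": -0.045,
    "colleg_1": 0.005,
}

def predicted_pctwch(**values):
    """Constant plus the sum of B times each predictor value."""
    return CONSTANT + sum(b * values[name] for name, b in COEFFICIENTS.items())

# Hypothetical lower chamber (chamber=1); prof1_1 and colleg_1 values are made up.
profile = dict(prof1_1=0.2, chamber=1, colleg_1=25)
non_south = predicted_pctwch(south=0, **profile)
southern = predicted_pctwch(south=1, **profile)

print(round(non_south, 3), round(southern, 3), round(southern - non_south, 3))
```

Whatever values are chosen for the other predictors, two otherwise identical chambers differ by exactly the B for south, which is what "controlling for" the other variables means.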