Introduction to Statistics: Political Science (Week 1)

Introduction to Statistics:
Political Science (Class 4)
Revisiting the Idea of Confounds
Why MV Regression?
Redundancy v. Suppression
• A few words about covering multivariate
regression over a few weeks
• My hope – you will:
– Understand the mechanics of interpreting MV models
– Have a basic grasp of what MV analysis does and
does not “get us”
• Today we will:
– Revisit the issue of what happens when we “control
for a variable” and why we do it
– Talk a bit more about interpretation of dichotomous
and nominal IVs
Why do multivariate regression?
• Why did most people vote for
Republicans in the midterm?
– John Boehner: “The American people [were]
concerned about the government takeover of
healthcare.”
– What else are the pundits/officials saying?
What do you think? What went into
individuals’ vote choices this election?
• How do we know who’s right?
Why do multivariate regression?
• Problem: potential explanations are often
related to one another (confounded)
• Identify independent relationships
between predictors and outcomes
– I.e., relationships after accounting for
confounds
What happens when we add an IV?
• It depends on:
– the relationship between the new IV and the other IVs
in the model
– the relationship between the new IV and the outcome
variable (DV)
• Typically: Added variable has to be related to
other IV(s) and the DV to affect coefficients on
other IVs in a meaningful way
– There are some (unusual) exceptions we won’t discuss
– Note: adding a new variable will always change the
estimates somewhat
In most cases…
• Adding a confounding variable – i.e., a
variable associated with another IV and
the DV – to a model will attenuate the
coefficient on the original IV
– Sometimes referred to as “redundancy” – IVs
are redundant explanations for the outcome
• Why does this happen?
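One way to see why this happens is a small simulation. The sketch below is illustrative only: the data are invented, the variable names (x for the original IV, z for the confound, y for the DV) and the true coefficients (0.3 and 1.0) are assumptions chosen for the example, and the small ols helper is a hand-rolled least-squares solver rather than anything used in the course.

```python
import random

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(0)
n = 2000
# Hypothetical data: z is a confound that drives both x and y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + random.gauss(0, 1) for zi in z]
y = [0.3 * xi + 1.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

bivariate = ols([x], y)        # y on x only
multivariate = ols([x, z], y)  # y on x and z

print("slope on x, bivariate:   ", round(bivariate[1], 3))
print("slope on x, multivariate:", round(multivariate[1], 3))
```

In the bivariate model, x gets credit for part of what z does, because the two move together. Once z is in the model, the slope on x falls back toward the 0.3 that was built into the simulated data.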
[Diagram: Party Affiliation, Bush Feeling Thermometer, Obama Feeling Thermometer]
[Figure: scatterplot of Bush FT and Obama FT (each 0 to 100), with Democrats and Republicans plotted separately]
Negative assessments of the economy → like Obama?
• 2008 survey
– Outcome: Evaluation of Obama (1=very
unfavorable; 4=very favorable)
– IVs:
• Evaluation of performance of economy over past 12
months (1=much better; 5=much worse)
• Party affiliation (-3=strong Rep; 3=strong Dem)
[Diagram: Assessment of Economy, Obama Favorability, Party Affiliation]
One possibility?
Consequences of using bivariate regression if this is the case?
Assessment of economy      Democrats   Republicans
gotten much better              0.4%          0.5%
gotten better                   0.9%          0.9%
stayed about the same           0.9%         11.3%
gotten worse                   21.9%         50.0%
gotten much worse              75.9%         37.4%
DV: Obama favorability (1-4)

                                                    Coef.   Std. Err.        t       p
Economic Assessments (1=much better; 5=much worse)  0.750       0.081     9.32   0.000
Constant                                           -0.749       0.365    -2.05   0.041

                                                    Coef.   Std. Err.        t       p
Economic Assessments (1=much better; 5=much worse)  0.332       0.068      4.9   0.000
Party Identification                                0.350       0.020     17.5   0.000
Constant                                            1.097       0.306      3.6   0.000
[Diagram: Assessment of Economy, Obama Favorability, Party Affiliation]
The regression suggests this ↑
So the estimated relationship between economic assessments and Obama favorability
appears to be biased in the bivariate analysis.
Why? Because we haven’t accounted for an alternative explanation: party identification (PID).
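The size of the bias can be backed out from the two sets of estimates above. The sketch below assumes both models were fit by OLS to the same respondents, in which case the bivariate slope equals the multivariate slope plus the PID coefficient times the slope from regressing PID on economic assessments. That last slope (called implied_delta here) is not reported on the slides; it is only what the reported coefficients imply under that assumption.

```python
# Coefficients copied from the two regression tables for Obama favorability.
b_econ_bivariate = 0.750   # economic assessments, bivariate model
b_econ_multi = 0.332       # economic assessments, controlling for PID
b_pid_multi = 0.350        # party identification

# OLS identity (same sample assumed):
#   b_econ_bivariate = b_econ_multi + b_pid_multi * delta
# where delta is the slope from regressing PID on economic assessments.
implied_delta = (b_econ_bivariate - b_econ_multi) / b_pid_multi
bias = b_pid_multi * implied_delta

print("implied slope of PID on economic assessments:", round(implied_delta, 3))
print("part of the bivariate coefficient due to PID:", round(bias, 3))
```

Read this way, more than half of the bivariate coefficient (about 0.418 of 0.750) reflects party identification rather than economic assessments themselves.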
[Figure: Obama Favorability (1-4) by assessment of the economy (gotten much better to gotten much worse), plotted for All, Democrats, and Republicans]
What’s going on here?
DV: Obama favorability (1-4)

                                                    Coef.   Std. Err.        t       p
Economic Assessments (1=much better; 5=much worse)  0.332       0.068      4.9   0.000
Party Identification                                0.350       0.020     17.5   0.000
Constant                                            1.097       0.306      3.6   0.000
• Should we be confident in our estimate of the
independent relationship between:
– Economic Assessments and Obama favorability?
– Party Identification and Favorability?
• Other variables missing from this model?
– Consequences?
Dichotomous and Nominal
DV: Obama favorability (1-4)

                                        Coef.   Std. Err.        t       p
Gender (1=female)                       0.297       0.120    2.490   0.013
Constant                                2.456       0.087   28.320   0.000
Why did women like Obama more?
DV: Obama favorability (1-4)

                                        Coef.   Std. Err.        t       p
Gender (1=female)                       0.297       0.120    2.490   0.013
Constant                                2.456       0.087   28.320   0.000

                                        Coef.   Std. Err.        t       p
Gender (1=female)                       0.141       0.093    1.520   0.129
Ideology (-2=very cons, 2=v. liberal)   0.732       0.039   18.960   0.000
Constant                                2.702       0.068   39.870   0.000
“Controlling for the effects of ideology, gender is…”
Expected value: very conservative male? Middle-of-the-road male? Very liberal male?
Females?
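The expected values can be worked out by plugging values into the fitted equation. The sketch below uses the coefficients from the gender and ideology model; the function name predicted_favorability is just a label chosen for this example.

```python
# Coefficients from the model with gender and ideology.
CONSTANT = 2.702
B_FEMALE = 0.141      # gender, 1 = female, 0 = male
B_IDEOLOGY = 0.732    # ideology, -2 = very conservative ... 2 = very liberal

def predicted_favorability(female, ideology):
    """Expected Obama favorability (1-4 scale) for given gender and ideology."""
    return CONSTANT + B_FEMALE * female + B_IDEOLOGY * ideology

labels = {-2: "very conservative", 0: "middle-of-the-road", 2: "very liberal"}
for ideology, label in labels.items():
    male = predicted_favorability(0, ideology)
    female = predicted_favorability(1, ideology)
    print(f"{label:>18}: male {male:.3f}, female {female:.3f}, gap {female - male:.3f}")
```

This gives 1.238, 2.702 and 4.166 for very conservative, middle-of-the-road and very liberal males, and each value is 0.141 higher for females. The very liberal predictions fall above 4, the top of the scale, because a linear model does not know the outcome is bounded.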
[Figure: Obama Favorability (1-4) by ideology (very conservative, conservative, middle-of-the-road, liberal, very liberal), plotted for Males and Females]
Note: given our model specification, the effect of gender doesn’t depend on
the value of ideology
DV: Obama favorability (1-4)

                                        Coef.   Std. Err.        t       p
Gender (1=female)                       0.141       0.093    1.520   0.129
Ideology (-2=very cons, 2=v. liberal)   0.732       0.039   18.960   0.000
Constant                                2.702       0.068   39.870   0.000
What else might predict Obama favorability?
Consequences of not including those measures for our estimates of:
– The effects of gender?
– The effects of ideology?
DV: Obama favorability (1-4)

                                        Coef.   Std. Err.        t       p
Gender (1=female)                       0.141       0.093    1.520   0.129
Ideology (-2=very cons, 2=v. liberal)   0.732       0.039   18.960   0.000
Constant                                2.702       0.068   39.870   0.000

                                        Coef.   Std. Err.        t       p
Gender (1=female)                       0.163       0.094    1.740   0.082
Ideology (-2=very cons, 2=v. liberal)   0.716       0.041   17.260   0.000
Protestant                             -0.200       0.139   -1.440   0.151
Roman Catholic                         -0.145       0.146   -1.000   0.320
Other Religion                         -0.364       0.144   -2.530   0.012
Constant                                2.871       0.111   25.810   0.000
Religion?
Excluded category: agnostic/atheist
Why didn’t the coefficient on gender change substantially?
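Each religion coefficient is a difference from the excluded category, holding gender and ideology fixed. The sketch below makes that concrete using the coefficients from the model with religion; the excluded category is entered with a coefficient of 0, and the predict function and dictionary layout are choices made for this example.

```python
# Coefficients from the model with gender, ideology and religion.
CONSTANT = 2.871
B_FEMALE = 0.163
B_IDEOLOGY = 0.716
B_RELIGION = {
    "agnostic/atheist": 0.0,   # excluded (reference) category
    "Protestant": -0.200,
    "Roman Catholic": -0.145,
    "Other Religion": -0.364,
}

def predict(female, ideology, religion):
    """Expected Obama favorability for one combination of predictor values."""
    return (CONSTANT + B_FEMALE * female + B_IDEOLOGY * ideology
            + B_RELIGION[religion])

# Middle-of-the-road males in each religious category.
for religion in B_RELIGION:
    print(f"{religion:>16}: {predict(0, 0, religion):.3f}")

# Two included categories are compared by differencing their coefficients.
catholic_vs_protestant = B_RELIGION["Roman Catholic"] - B_RELIGION["Protestant"]
print("Roman Catholic minus Protestant:", round(catholic_vs_protestant, 3))
```

The constant (2.871) is the expected value for a middle-of-the-road male who is agnostic/atheist. The t and p values in the table test each category against that excluded group only; whether Catholics differ from Protestants cannot be read off the table directly.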
“Suppression”
• Omitting a variable from the model CAN
suppress the estimate of an independent
relationship
– I.e., adding a variable can make the
coefficient on an original predictor larger or
even change its sign
Do firemen help reduce amount of
damage caused by a fire?
[Diagram: Number of Firemen at Fire, Fire Damage]
[Figure: Amount of Fire Damage ($0 to $250,000) plotted against # of Firemen (1 to 20)]
Do firemen help reduce amount of
damage caused by a fire?
[Diagram: Number of Firemen at Fire, Fire Damage, Severity of Fire]
[Figure: Amount of Fire Damage ($0 to $250,000) plotted against # of Firemen (1 to 20), with Small Fires and Big Fires plotted separately]
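The firemen question can be mimicked with simulated data. Everything in the sketch below is invented for illustration and is not the data behind the plots: fire severity runs from 1 to 10, more severe fires draw more firemen, each unit of severity adds $30,000 of damage, and each additional fireman prevents $5,000 of damage. The ols helper is a small hand-rolled least-squares solver.

```python
import random

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1)
n = 1000
# Hypothetical fires: severity drives both the response and the damage.
severity = [random.uniform(1, 10) for _ in range(n)]
firemen = [2 * s + random.gauss(0, 2) for s in severity]  # not rounded; a sketch
damage = [30000 * s - 5000 * f + random.gauss(0, 10000)
          for s, f in zip(severity, firemen)]

bivariate = ols([firemen], damage)
multivariate = ols([firemen, severity], damage)

print("damage per fireman, ignoring severity:       ", round(bivariate[1]))
print("damage per fireman, controlling for severity:", round(multivariate[1]))
```

Ignoring severity, more firemen appear to go with more damage. Once severity is in the model, the coefficient on firemen changes sign and lands near the -5,000 that was built into the simulation.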
Regression and Causality
• Can we answer these questions?
– Did feelings about Bush and Party
Identification cause feelings about Obama?
– Did assessments of the economy, party
identification and ideology cause Obama’s
favorability?
Regression and Causality
• Regression usually cannot decisively
determine causality
– Potential for reverse causality
– Unmeasured confounds
• Instead we:
– Rely on theory
– Use multivariate regression to try to rule out
(account for) the most compelling alternative
explanations / confounds
Notes and Next Time
• Homework
– TAs have homework 1 to return to you
• Model answers are posted online
– We are one class behind
• Homework 2 will be handed out Thursday and due
on Tuesday (it will cover dichotomous and nominal
IVs and non-linear relationships)
• Next time:
– Functional form in multivariate regression