Example: Spurious relationship


Controlling for a third variable:
Examples (with one exception) from the NES data
• Note: The example of a spurious relationship I
gave in class was incorrect. It was incorrect
because education was not the cause of race.
(The expression “duh” comes to mind here.)
• The substantive point of work on this subject
was that lower participation on the part of African
Americans could be explained by their lower
levels of education—i.e., the explanation was
not race per se.
• That point is correct. But Professor Powell’s
point—that education was an intervening
variable—was also correct.
• The example of a spurious relationship that
follows is corrected. (Same as before, but with
different variables.)
Example: Spurious relationship
• Spurious relationships, where the original
relationship more or less completely
disappears when you control for a second
variable, are quite rare.
• So, for this one example, I’m going to
show you hypothetical numbers.
Spurious relationship (cont.)
• Hypothesis: People with low levels of
political efficacy participate less than those
with high efficacy.
• But, this effect could be explained by lower
levels of education, so controlling for
education would make the relationship
between efficacy and participation
disappear.
• That is, the relationship between efficacy
and participation is spurious.
The simple (uncontrolled) relationship might look like this. Tau-c = -.33

              High efficacy   Low efficacy
Low part.          65%             75%
Med. part.         25%             20%
High part.         10%              5%

The difference is not large, but it clearly shows that those
with low efficacy participate less.
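When we control for education, we might get something like this.

Low education, tau-c = -.05
              High efficacy   Low efficacy
Low part.          73%             78%
Med. part.         22%             18%
High part.          5%              4%

Medium education, tau-c = .01
              High efficacy   Low efficacy
Low part.          68%             67%
Med. part.         23%             25%
High part.          9%              8%

High education, tau-c = .04
              High efficacy   Low efficacy
Low part.          47%             45%
Med. part.         40%             40%
High part.         13%             15%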
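To see how a third variable can produce this pattern, here is a minimal sketch with made-up counts (they are not the hypothetical percentages shown in the tables, and not NES data). It assumes that, within each education level, high- and low-efficacy people are equally likely to be low participators, but that high efficacy is more common among the better educated.

```python
# Hypothetical counts, for illustration only.
# strata[education][efficacy] = (number of low participators, group size)
strata = {
    "low education":  {"high efficacy": (80, 100),  "low efficacy": (320, 400)},
    "high education": {"high efficacy": (200, 400), "low efficacy": (50, 100)},
}

def pct_low(low, total):
    """Percentage of a group that participates at a low level."""
    return 100.0 * low / total

def pooled(efficacy):
    """Percent low participation for one efficacy group, ignoring education."""
    low = sum(groups[efficacy][0] for groups in strata.values())
    total = sum(groups[efficacy][1] for groups in strata.values())
    return pct_low(low, total)

# Uncontrolled: compare efficacy groups with education ignored.
uncontrolled_gap = pooled("low efficacy") - pooled("high efficacy")

# Controlled: compare efficacy groups separately within each education level.
controlled_gaps = {
    edu: pct_low(*groups["low efficacy"]) - pct_low(*groups["high efficacy"])
    for edu, groups in strata.items()
}

print(f"uncontrolled gap: {uncontrolled_gap:.1f} points")
for edu, gap in controlled_gaps.items():
    print(f"{edu}: {gap:.1f} points")
```

With these counts the uncontrolled gap is 18 percentage points, and it drops to zero inside each education level, which is the signature of a spurious relationship.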
Spurious relationship (cont.)
• How do we interpret the results?
• As strong evidence that the original
relationship was spurious—i.e., that the
difference in participation rates between
more and less efficacious people was due
to their differing education levels.
• Question: Does this mean that people with
low efficacy actually participate as much
as those with high efficacy?
Spurious relationship (cont.)
• No. The original relationship is not wrong.
But it is misleading if left unexplained.
• A sensible interpretation is that those with
low efficacy lag in education and that is
why they participate less. They evidently
do not participate less because of various
psychological or political motivations that
are associated with low efficacy.
Example: Conditional
(specification) relationship
• Hypothesis: People’s overall liberal-conservative
views (judged by their self-placement) influence
(cause) their feelings of attachment to the
political parties (measured by their three-point
party identification).
• However, this relationship is likely not to be as
strong for African Americans (who, overall, very
often consider themselves Democrats).
Conditional relationship (cont.)
• So, let’s first look at the relationship
between lib-cons self placement and
partisanship (uncontrolled).
Party ID: 3 categories * Self plcmnt lib-con 3 cats Crosstabulation
% within Self plcmnt lib-con 3 cats

                    Self plcmnt lib-con 3 cats
Party ID            1. Liberal   3. Moderate   5. Conservative   Total
1. Democrat            52.8%        37.4%           22.1%        34.3%
2. independent         41.7%        55.1%           35.0%        38.7%
3. Republican           5.5%         7.5%           42.9%        27.0%
Total                 100.0%       100.0%          100.0%       100.0%

Symmetric Measures (ordinal by ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     .383           .019                19.410           .000
Kendall's tau-c     .343           .018                19.410           .000
N of Valid Cases    1608

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
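For readers who want to see where tau-b and tau-c come from, here is a minimal sketch that computes both from a table of cell counts using the standard concordant/discordant pair definitions. The function name `kendall_taus` and the example counts are assumptions for illustration; the slides report only percentages, so the NES counts themselves are not reproduced here.

```python
from math import sqrt

def kendall_taus(table):
    """Return (tau_b, tau_c) for a contingency table of counts.

    table[i][j] is the count in row i, column j. Rows and columns must each
    be listed in increasing order of their ordinal categories.
    """
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)

    # Count concordant and discordant pairs of cases.
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]

    pairs = n * (n - 1) / 2
    # Pairs tied on the row variable and on the column variable.
    row_ties = sum(t * (t - 1) / 2 for t in (sum(row) for row in table))
    col_ties = sum(t * (t - 1) / 2 for t in (sum(col) for col in zip(*table)))

    tau_b = (concordant - discordant) / sqrt((pairs - row_ties) * (pairs - col_ties))
    m = min(n_rows, n_cols)
    tau_c = 2 * m * (concordant - discordant) / (n * n * (m - 1))
    return tau_b, tau_c

# Hypothetical 3x3 counts (not the NES data).
example = [[30, 20, 5],
           [20, 30, 20],
           [5, 20, 30]]
tau_b, tau_c = kendall_taus(example)
print(f"tau-b = {tau_b:.3f}, tau-c = {tau_c:.3f}")
```

Note that the sign depends only on the order in which the categories are listed: reversing the order of one variable flips the sign without changing the strength of the relationship.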
Conditional relationship (cont.)
• Now, let’s control for race (blacks vs.
whites only).
Party ID: 3 categories * Self plcmnt lib-con 3 cats * Race: White/Black Crosstabulation
% within Self plcmnt lib-con 3 cats

Race: 1. White
                    Self plcmnt lib-con 3 cats
Party ID            1. Liberal   3. Moderate   5. Conservative   Total
1. Democrat            49.8%        31.3%           17.6%        29.6%
2. independent         44.0%        59.0%           34.1%        39.2%
3. Republican           6.2%         9.6%           48.3%        31.2%
Total                 100.0%       100.0%          100.0%       100.0%

Race: 2. Black
                    Self plcmnt lib-con 3 cats
Party ID            1. Liberal   3. Moderate   5. Conservative   Total
1. Democrat            80.6%        69.2%           59.2%        69.6%
2. independent         18.1%        30.8%           35.5%        27.3%
3. Republican           1.4%                         5.3%         3.1%
Total                 100.0%       100.0%          100.0%       100.0%

Symmetric Measures (ordinal by ordinal)

Race       Measure             Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
1. White   Kendall's tau-b     .418           .021                19.170           .000
           Kendall's tau-c     .371           .019                19.170           .000
           N of Valid Cases    1263
2. Black   Kendall's tau-b     .217           .072                 2.970           .003
           Kendall's tau-c     .163           .055                 2.970           .003
           N of Valid Cases    161

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Conditional relationship (cont.)
• Summary of relationships: self-placement x party (tau-b)

Overall   .38
Whites    .42
Blacks    .22
Conditional relationship (cont.)
• How do we interpret the results?
• As strong support for the hypothesis—both that
people’s liberal-conservative views influence
their partisanship and that this relationship is
conditioned by race (i.e., is stronger for whites
than blacks).
• How do we know that? Because when we
control, the tau-b (.38 overall) is strengthened
for whites (.42) and considerably reduced for
blacks (.22).
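One rough way to check whether the two subgroup values really differ is an approximate z statistic for the difference between two independent estimates. This is not part of the SPSS output; it is a sketch that assumes the white and black subsamples are independent and that each tau-b is roughly normally distributed, using the values and asymptotic standard errors reported above.

```python
from math import sqrt

# Tau-b and asymptotic standard error for each group, from the SPSS output.
tau_white, se_white = 0.418, 0.021
tau_black, se_black = 0.217, 0.072

# Assumption: independent subsamples, approximately normal estimates.
diff = tau_white - tau_black
se_diff = sqrt(se_white**2 + se_black**2)
z = diff / se_diff

print(f"difference = {diff:.3f}, SE = {se_diff:.3f}, z = {z:.2f}")
```

Under these assumptions the difference of about .20 is roughly 2.7 standard errors from zero, which is consistent with reading the relationship as genuinely weaker for blacks rather than as sampling noise from the smaller subsample.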
Conditional relationship (cont.)
• Note that this conclusion does not say that
African Americans are more often
Democratic—though that is true.
• Rather, it says that the relationship
between liberal-conservative views and
partisanship is weaker for blacks.
Presumably, blacks’ ideological views play
less of a role (than for whites) in
determining their partisanship.
Example: Intervening variable
• Hypothesis: Partisanship has a very strong
effect on who one votes for.
• But, it has this effect because partisanship
causes people to have very different views
of issues and candidates, which in turn,
influence the vote.
• That is, issue and candidate views
intervene between partisanship and voting
choices.
Intervening variable (cont.)
• So, let’s first look at the relationship
between partisanship and vote choice
(uncontrolled).
Vote: Gore or Bush * Party ID: 3 categories Crosstabulation
% within Party ID: 3 categories

                    Party ID: 3 categories
Vote                1. Democrat   2. independent   3. Republican   Total
Gore                   94.2%          47.1%             7.2%       52.7%
Bush                    5.8%          52.9%            92.8%       47.3%
Total                 100.0%         100.0%           100.0%      100.0%

Symmetric Measures (ordinal by ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     .675           .016                43.153           .000
Kendall's tau-c     .777           .018                43.153           .000
N of Valid Cases    1114

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Intervening variable (cont.)
• Now, let’s control for Clinton’s handling of
the economy. (Yes, I’m aware that we’re
looking at 2000 and Gore was the Dem.
candidate. But how Clinton handled the
economy might still have been important.)
Vote: Gore or Bush * Party ID: 3 categories * Clinton's handling of economy Crosstabulation
% within Party ID: 3 categories

Clinton's handling                   Party ID: 3 categories
of economy                  Vote     1. Democrat   2. independent   3. Republican   Total
1. Approve strongly         Gore        96.0%          68.9%            19.7%       79.3%
                            Bush         4.0%          31.1%            80.3%       20.7%
                            Total      100.0%         100.0%           100.0%      100.0%
2. Approve not strongly     Gore        94.0%          48.0%             7.5%       46.7%
                            Bush         6.0%          52.0%            92.5%       53.3%
                            Total      100.0%         100.0%           100.0%      100.0%
4. Disapprove not strongly  Gore        70.0%          19.2%             6.0%       17.4%
                            Bush        30.0%          80.8%            94.0%       82.6%
                            Total      100.0%         100.0%           100.0%      100.0%
5. Disapprove strongly      Gore        63.6%          10.9%             1.2%        9.3%
                            Bush        36.4%          89.1%            98.8%       90.7%
                            Total      100.0%         100.0%           100.0%      100.0%

Tau-c for the four categories of handling of the economy:

Approve strongly          .47
Approve not strongly      .72
Disapprove not strongly   .34
Disapprove strongly       .24
Intervening variable (cont.)
• How do we interpret the results?
• First, as (pretty) strong support for the
hypothesis (there is that messy .72). It appears
as if views of Clinton’s handling of the economy
intervene between partisanship and voting
choices.
• How do we know that? Because when we
control, the tau-c’s are (with one exception)
considerably reduced from the original value of
.78 and because of our theory.
Intervening variable (cont.)
• Second, partisanship makes some
difference above and beyond that of
Clinton’s handling of the economy.
• How do we know that? The tau-c values
are still quite large, even after controlling.
(Also, look at the percentages in each of
the sub-tables.)
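Looking at the percentages can be made concrete with a simple percentage-point comparison: the Democrat minus Republican gap in the Gore vote, overall and within each approval category. This is only a sketch of one way to read the sub-tables; the figures are copied from the crosstabs above, and the gap is a cruder summary than tau-c because it ignores independents.

```python
# Percent voting for Gore among Democrats and Republicans, from the tables.
gore_share = {
    "Uncontrolled":            {"Democrat": 94.2, "Republican": 7.2},
    "Approve strongly":        {"Democrat": 96.0, "Republican": 19.7},
    "Approve not strongly":    {"Democrat": 94.0, "Republican": 7.5},
    "Disapprove not strongly": {"Democrat": 70.0, "Republican": 6.0},
    "Disapprove strongly":     {"Democrat": 63.6, "Republican": 1.2},
}

# Party gap in percentage points for each table.
gaps = {
    label: round(shares["Democrat"] - shares["Republican"], 1)
    for label, shares in gore_share.items()
}

for label, gap in gaps.items():
    print(f"{label:<24} {gap:5.1f} points")
```

The gaps stay above 60 points in every approval category, so party still matters a great deal after the control, and the "approve not strongly" group again stands out as nearly identical to the uncontrolled table.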
Intervening variable (cont.)
• Third, it’s another example of a conditional
relationship. Note that (again, with the
exception of the one category), as Clinton
approval goes down, the relationship
between partisanship and the vote is
weaker.
• NOTE: this does not simply mean that
fewer people voted for Gore, though this is
true—but that the relationship between
partisanship and the vote was weaker.
Intervening variable (cont.)
• What do we do with that pesky .72?
• Don’t totally ignore it: the results are not
perfect. Reality isn’t always simple.
• If possible, try to explain why the “oddity”
occurs. (In this case, I think it would be
very difficult.) Try to find support for any
explanation you come up with.
• Don’t over-interpret: i.e., don’t come up with
some unlikely, unsupported explanation.
Intervening variable (cont.)
• One thing you should do is to look at the n
(number of cases) underlying the odd
result. (Here I’ve suppressed the n’s
simply to make things big enough to read.)
• In this case, the n is not small. Good try,
but it doesn’t work.
• We’re left with (as is often the case) a
good, but not perfect (or perfectly
explicable) analysis and interpretation.
Example: Antecedent variable
• Hypothesis: Interest in the campaign
causes people to be more informed about
politics.
• But, education is an antecedent variable—
i.e., education causes people to be
interested in politics and thus, indirectly, is
a cause of knowledge.
• If this sounds rather like the intervening
variable case, it should (as you will see).
Antecedent variable (cont.)
• What I’m going to do is simply follow the
steps outlined by Professor Powell (next
slide).
Using a third variable to find an antecedent cause:
[Diagram: causal arrows linking c, a, and b, with positive (+) links]
A causes b, but we can learn more by finding that a is caused by c.
Here we start with the relationship between a and b.
We ascertain the relationships between c and b and between c and a.
Then we identify a as intervening by predicting b with c and controlling for a. To the
extent the relationship is attenuated by the control, c is antecedent and works through a.
Example: Antecedent variable
• So, let’s first look at the relationship
between campaign interest and political
knowledge (uncontrolled). (a & b)
• For convenience of presentation, I’ve
collapsed the six-item knowledge scale
into three categories. Generally, I would
not do this. I would normally prefer to
have more, rather than fewer, categories
(especially in my dependent variable).
knowl2 * Campaign interest Crosstabulation
% within Campaign interest

                    Campaign interest
knowl2              High      Low       Total
1.00                37.4%     69.1%     43.2%
2.00                48.3%     29.0%     44.8%
3.00                14.2%      1.9%     12.0%
Total              100.0%    100.0%    100.0%

Symmetric Measures (ordinal by ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     -.246          .022               -10.270           .000
Kendall's tau-c     -.208          .020               -10.270           .000
N of Valid Cases    1424

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Antecedent variable (cont.)
• Now we need to see if education is related
to interest in the campaign. (c & a)
• So, we crosstab education and interest.
Campaign interest * Education: 3 categories Crosstabulation
% within Education: 3 categories

                    Education: 3 categories
Campaign interest   1. Less than HS   2. HS     3. More than HS   Total
High                     65.0%        70.5%          84.0%        78.2%
Low                      35.0%        29.5%          16.0%        21.8%
Total                   100.0%       100.0%         100.0%       100.0%

Symmetric Measures (ordinal by ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     -.174          .023                -7.240           .000
Kendall's tau-c     -.148          .020                -7.240           .000
N of Valid Cases    1800

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Antecedent variable (cont.)
• Note that we have a tau-c here of .15 (the
output says -.15, but effectively, it’s a
positive relationship).
• Two steps left.
• We next check to see if education is
related to knowledge. (c & b)
know2 * Education: 3 categories Crosstabulation
% within Education: 3 categories

                    Education: 3 categories
know2               1. Less than HS   2. HS     3. More than HS   Total
1.00                     78.3%        59.7%          32.0%        43.1%
2.00                     19.2%        35.2%          52.0%        44.8%
3.00                      2.5%         5.1%          16.0%        12.0%
Total                   100.0%       100.0%         100.0%       100.0%

Symmetric Measures (ordinal by ordinal)

                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Kendall's tau-b     .305           .022                13.344           .000
Kendall's tau-c     .250           .019                13.344           .000
N of Valid Cases    1421

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Antecedent variable (cont.)
• Education and knowledge are related (tau-c = .25).
Note: if you are really quick-eyed, you
will note that I used a tau-c here on a
3x3 table (a square table, where tau-b is the
usual choice). I did so for consistency.
• Finally, we again look at the relationship
between education and knowledge, but
now controlling for interest. (c & b,
controlling for a)
know2 * Education: 3 categories * Campaign interest Crosstabulation
% within Education: 3 categories

Campaign                     Education: 3 categories
interest    know2    1. Less than HS   2. HS     3. More than HS   Total
High        1.00          73.2%        55.4%          27.5%        37.4%
            2.00          23.2%        38.5%          54.3%        48.3%
            3.00           3.7%         6.1%          18.2%        14.3%
            Total        100.0%       100.0%         100.0%       100.0%
Low         1.00          89.5%        72.3%          60.3%        69.0%
            2.00          10.5%        25.5%          37.3%        29.1%
            3.00                        2.1%           2.4%         1.9%
            Total        100.0%       100.0%         100.0%       100.0%

Symmetric Measures (ordinal by ordinal)

Campaign
interest   Measure             Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
High       Kendall's tau-b     .296           .025                11.389           .000
           Kendall's tau-c     .235           .021                11.389           .000
           N of Valid Cases    1163
Low        Kendall's tau-b     .200           .053                 3.666           .000
           Kendall's tau-c     .155           .042                 3.666           .000
           N of Valid Cases    258

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Antecedent variable (cont.)
• How do we interpret the results?
• As some support for the hypothesis. It
appears as if education is an antecedent
cause of the relationship between the
campaign interest and political knowledge.
• How do we know that? Because, when
we controlled, the tau-c’s were reduced
from the original value of .25 (to .24 for
high interest and .16 for low interest) and
because of our theory.
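To put a number on "reduced," one simple option is the percentage drop in tau-c from the uncontrolled table to each controlled sub-table. This is a descriptive sketch only, using the tau-c values from the output above; it is not a formal test, and the variable names are just for illustration.

```python
# Tau-c for education x knowledge, before and after controlling for interest.
original_tau_c = 0.250
controlled_tau_c = {"High interest": 0.235, "Low interest": 0.155}

# Percentage reduction from the uncontrolled value in each sub-table.
reduction_pct = {
    label: 100 * (original_tau_c - tau) / original_tau_c
    for label, tau in controlled_tau_c.items()
}

for label, pct in reduction_pct.items():
    print(f"{label}: tau-c reduced by {pct:.0f}%")
```

The drop is about 6% among the highly interested and about 38% among the less interested, which is why the results count as only some support: the attenuation is modest for most respondents (n = 1163 in the high-interest group).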
Some cautionary notes
• Be careful about using a control variable
with too many categories (or recode so
there aren’t too many categories).
Vote: Gore or Bush * Clinton's handling of economy Crosstabulation
% within Clinton's handling of economy

                    Clinton's handling of economy
Vote                1. Approve   2. Approve     4. Disapprove   5. Disapprove   Total
                    strongly     not strongly   not strongly    strongly
Gore                  79.2%         46.7%           17.4%            9.2%       54.3%
Bush                  20.8%         53.3%           82.6%           90.8%       45.7%
Total                100.0%        100.0%          100.0%          100.0%      100.0%

I took this relationship and then, unthinkingly, controlled for
education, with 7 categories. (Education is not a very
sensible control here theoretically, but set that aside.)
I took this relationship and then, unthinkingly, controlled for
education, with 7 categories. (Education is not a very
sensible control here theoretically, but set that aside.)
Cautionary notes (cont.)
• There’s a lot of variation in the tau-c values, but no
sensible pattern.
• Some of the variability may be caused by small numbers of
cases.

Tau-c by education category:

8 grades or less      .45
9-11 grades           .64
Grad. high school     .58
Some college          .48
Community college     .84
BA degree             .57
Advanced degree       .53
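Recoding a control variable into fewer categories can be as simple as a lookup table. The sketch below collapses the seven education categories into three; the cut points in `RECODE` are an assumption for illustration and are not necessarily the recode behind the three-category education variable used earlier.

```python
from collections import Counter

# Assumed grouping of seven education categories into three.
RECODE = {
    "8 grades or less": "Less than HS",
    "9-11 grades": "Less than HS",
    "Grad. high school": "HS",
    "Some college": "More than HS",
    "Community college": "More than HS",
    "BA degree": "More than HS",
    "Advanced degree": "More than HS",
}

def recode_education(category):
    """Map a seven-category education label to a three-category label."""
    return RECODE[category]

# Hypothetical respondents, just to show the recode in use.
sample = ["9-11 grades", "Grad. high school", "BA degree", "Some college"]
counts = Counter(recode_education(c) for c in sample)
print(counts)
```

Fewer categories mean more cases per sub-table, which makes each controlled tau-c less sensitive to a handful of respondents.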
• Similarly, in an earlier example of a
specified relationship, I looked only at
liberal-conservative views x partisanship
for blacks and whites, not for all races.
That’s because my theory told me what to
expect for these two groups, not for
others.
• Which leads to the next point.
Cautionary notes (cont.)
• Theory/reasoning is important.
• What it makes sense to control for, and
what the interpretation is once you’ve
controlled, depends heavily on your
reasoning about what causes what.
• In particular, whether you have an
intervening variable or an antecedent
variable, isn’t determined simply by the
tables you run (or the measures you
calculate).
An explanatory point
• As shown in the text (pp. 87-92), the same
sort of reasoning (about kinds of
relationships) is applicable when you have
interval-level variables and use means
instead of crosstabs.
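As a rough illustration of the same logic with means, the sketch below compares mean knowledge scores by campaign interest, first overall and then within each education level. The respondents are invented for the example (they are not NES cases or figures from the text), and the function name `mean_score` is hypothetical.

```python
from statistics import mean

# Hypothetical respondents: (education, campaign interest, knowledge score).
respondents = [
    ("low", "high", 3),
    ("low", "low", 2), ("low", "low", 2), ("low", "low", 2),
    ("high", "high", 5), ("high", "high", 5), ("high", "high", 5),
    ("high", "low", 4),
]

def mean_score(rows, interest, education=None):
    """Mean knowledge for one interest group, optionally within one education level."""
    scores = [
        score for edu, intr, score in rows
        if intr == interest and (education is None or edu == education)
    ]
    return mean(scores)

# Uncontrolled difference in means between high and low interest.
uncontrolled = mean_score(respondents, "high") - mean_score(respondents, "low")

# The same difference within each education level (controlling for education).
controlled = {
    edu: mean_score(respondents, "high", edu) - mean_score(respondents, "low", edu)
    for edu in ("low", "high")
}

print(f"uncontrolled difference: {uncontrolled:.1f}")
for edu, gap in controlled.items():
    print(f"{edu} education: {gap:.1f}")
```

In this made-up data the difference in means shrinks from 2.0 to 1.0 once education is held constant, the same kind of attenuation seen in the crosstab examples.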
A look forward
• It might occur to you to ask about
controlling for more than one variable.
• Good thought. We will. But we generally
do not do it by using crosstabs—for
obvious reasons about complexity and
interpretability.
• We will get into this soon by looking at
correlation and regression.
Data Analysis #2
Due one week from today (by individuals, not pairs)
• Directions are on the syllabus
• Reminders (unnecessary for most of you)
Do not simply give us SPSS tables.
Do create tables with meaningful labels,
only the entries that are necessary,
and so on.
Explain your results. (More than “Yes,
my hypothesis is supported.”)
• Usually c3 pp. (double-spaced) + tables
Tables should go on a separate page.
• Writing is important.
Use clear, straightforward prose.
Proper grammar; correct spelling,
punctuation, and capitalization; typo-free.