
Sociology 601 Class 13: October 13, 2009
• Measures of association for tables (8.4)
– Difference of proportions
– Ratios of proportions
– the odds ratio
• Measures of association for ordinal data (8.5 – 8.6).
– gamma
– Kendall’s tau-b
– Statistical inference for ordinal associations
8.4: Measures of Association:
Difference of Proportions
The difference of proportions is the proportion scoring
“yes” in one category of variable X, minus the
proportion scoring “yes” in another category of
variable X.
Formal definition:
For two variables X and Y, with 1,2 as possible
values for X and 1,2 as possible values for Y:
d. p. = P ((Y = 1) | (X = 1)) - P ((Y = 1) | (X = 2))
alternatively, d. p. = π(y=1 | x=1) - π(y=1 | x=2)
Example for difference of proportions
Support for legalized abortion:
Age      Yes          No           Total
< 40     490 (49%)    510 (51%)    1000 (100%)
40 +     210 (35%)    390 (65%)    600 (100%)
total    700          900          1600
difference of proportions = yes|40+ - yes|<40
= .35 .49
= -.14
The sample percent of older people (40+) who support
abortion is 14 percentage points lower than the percent
of younger people (< 40) who support abortion.
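The calculation in this example can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the function name proportion_yes are made up for illustration, and the counts are the ones in the table.

```python
# Counts from the abortion-support table (rows: age group; columns: yes, no).
counts = {
    "<40": {"yes": 490, "no": 510},
    "40+": {"yes": 210, "no": 390},
}


def proportion_yes(row):
    """Conditional proportion answering "yes" within one age group."""
    return row["yes"] / (row["yes"] + row["no"])


p_young = proportion_yes(counts["<40"])  # 490 / 1000
p_old = proportion_yes(counts["40+"])    # 210 / 600
diff = p_old - p_young                   # older group minus younger group

print(f"yes | <40 = {p_young:.2f}")
print(f"yes | 40+ = {p_old:.2f}")
print(f"difference of proportions = {diff:+.2f}")
```

The sign depends on which group is subtracted from which; here the older group comes first, as on the slide.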
Difference of proportions: the problem
A difference of proportions calculated at about .5 does not
seem as important as the same difference calculated near
0.0 or 1.0.
Fictitious example: women as a proportion of all veterinary
school students.
1960, p = .02; 1965, p = .06; difference = .04
1990, p = .51; 1995, p = .55; difference = .04
2020, p = .94; 2025, p = .98; difference = .04
which 5-year span reflects the largest underlying social
change?
Measures of association: Odds and odds ratios
Odds = (proportion of one response) / (proportion of other response)
Examples: what are the odds that a veterinary student would
be a woman in 1990? 1995? 1960? 1965? 2020? 2025?
• 2025, p = .98, odds = 49.0 (or 49 : 1)
• 2020, p = .94, odds = 15.7 (or 15.7 : 1)
• 1995, p = .55, odds = 1.22 (or 1.22 : 1)
• 1990, p = .51, odds = 1.04 (or 1.04 : 1)
• 1965, p = .06, odds = 0.0638 (or 1 : 15.7)
• 1960, p = .02, odds = 0.0204 (or 1 : 49)
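The odds for the fictitious veterinary-school proportions can be reproduced with a short Python sketch. The function name odds is an illustrative choice, not something from the course materials.

```python
def odds(p):
    """Odds of a response whose proportion is p (requires 0 < p < 1)."""
    return p / (1 - p)


# Fictitious proportions of women among veterinary students, by year.
proportions = {1960: 0.02, 1965: 0.06, 1990: 0.51, 1995: 0.55, 2020: 0.94, 2025: 0.98}

for year, p in proportions.items():
    print(f"{year}: p = {p:.2f}, odds = {odds(p):.4g}")
```

Note how the same .04 step in p moves the odds very little near p = .5 and a great deal near p = 0 or p = 1.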
Problems with Odds and odds ratios
• Not as intuitively obvious as difference in proportions
• The odds tend to take extreme values when the proportion
under consideration is near 1 or zero.
• Odds are not symmetric around 50-50 (odds = 1.0): odds of 4:1 are 4.0, but odds of 1:4 are 0.25.
So, we often take log odds:
• log (4:1) = - log(1:4)
But this is even less intuitive.
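A small Python check of the symmetry claim, assuming natural logs (the slide does not say which base is meant, and the symmetry holds for any base). The function name log_odds is illustrative.

```python
import math


def log_odds(p):
    """Natural log of the odds for a proportion p (requires 0 < p < 1)."""
    return math.log(p / (1 - p))


four_to_one = log_odds(0.8)  # p = .8 corresponds to odds of 4:1
one_to_four = log_odds(0.2)  # p = .2 corresponds to odds of 1:4

print(f"log odds at p = .8: {four_to_one:+.4f}")
print(f"log odds at p = .2: {one_to_four:+.4f}")
print(f"log odds at p = .5: {log_odds(0.5):+.4f}")
```

On the log scale, 50-50 sits at zero and complementary proportions are mirror images of each other.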
Odds ratios
• 2025, odds= 49.0; 2020, odds= 15.7; ratio= 3.12
• 1995, odds= 1.22; 1990, odds= 1.04; ratio= 1.17
• 1965, odds= 0.0638; 1960, odds= 0.0204; ratio= 3.12
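The three odds ratios can be recomputed directly from the proportions. This sketch uses hypothetical helper names (odds, odds_ratio). Because it works from unrounded odds, the first and last ratios come out near 3.13 rather than the 3.12 obtained from the rounded odds shown above.

```python
def odds(p):
    """Odds of a response whose proportion is p (requires 0 < p < 1)."""
    return p / (1 - p)


def odds_ratio(p_later, p_earlier):
    """Ratio of the later period's odds to the earlier period's odds."""
    return odds(p_later) / odds(p_earlier)


or_early = odds_ratio(0.06, 0.02)  # 1965 vs. 1960
or_mid = odds_ratio(0.55, 0.51)    # 1995 vs. 1990
or_late = odds_ratio(0.98, 0.94)   # 2025 vs. 2020

print(f"1960 -> 1965: odds ratio = {or_early:.3f}")
print(f"1990 -> 1995: odds ratio = {or_mid:.3f}")
print(f"2020 -> 2025: odds ratio = {or_late:.3f}")
```

Unlike the difference of proportions, the odds ratio treats the change from .02 to .06 and the change from .94 to .98 as equally large, and both as larger than the change from .51 to .55.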
8.5. Stepping up to ordinal and interval data
• The chi-squared test is an extremely simple test of
relationships between categories.
– In chi-squared tests, we ask “Does the distribution of one
variable depend on the categories for the other variable?”
– This sort of question requires only nominal-scaled data
• We are usually interested in more informative tests
of relationships between categories.
– In such tests, we ask “As we increase the level of one
variable, how do we change the level of another?”
– “The more of X, the more of Y”
A weakness of a chi-squared test.
• The problem: Chi-Squared tests are for nominal
associations. If we use a chi-squared test when there
is an ordinal association, we waste some information.
Chi-Squared tests cannot distinguish the following
patterns:
         like job?
wages    no    maybe  yes
low      ++    -      -
med      -     ++     -
high     -     -      ++

         like job?
wages    no    maybe  yes
low      ++    -      -
med      -     -      ++
high     -     ++     -
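To see why chi-squared cannot tell these two patterns apart, here is a minimal Python sketch. The cell counts are invented for illustration (30 stands in for "++" and 5 for "-"); only the layout of the two patterns comes from the slide.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are low/med/high wages, columns are no/maybe/yes.
ordered = [[30, 5, 5], [5, 30, 5], [5, 5, 30]]    # first pattern (ordinal)
scrambled = [[30, 5, 5], [5, 5, 30], [5, 30, 5]]  # second pattern (rows swapped)

print(f"chi-squared, first pattern:  {chi_squared(ordered):.2f}")
print(f"chi-squared, second pattern: {chi_squared(scrambled):.2f}")
```

Swapping the med and high rows only reorders the terms in the sum, so the statistic is identical even though just one of the tables shows a "more of X, more of Y" pattern.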
Alternative for ordinal data
A solution: find concordant and discordant patterns.
• Identify every possible pair of observations. The number of
possible pairs far exceeds the number of observations.
• A pair of observations is concordant if the subject who is
higher on one variable is also higher on the other variable.
• A pair of observations is discordant if the subject who is
higher on one variable is lower on the other variable.
• Many pairs of observations are neither concordant nor
discordant (i.e., ties). We ignore those pairs.
Finding concordant and discordant patterns.
• For all but the smallest samples, the number of concordant
and discordant patterns can be very difficult to count, so we
usually leave that exercise to a computer program.
• It is, however, important to understand what the computer is
doing. For that reason, we will try an example.
Concordant pairs:
Discordant pairs:
         like job?
wages    no    maybe  yes
low      10    1      1
med      3     4      5
high     3     7      2
Counting concordant pairs
         like job?
wages    no    maybe  yes
low      10    1      1
med      3     4      5
high     3     7      2
(no like, low wages) x (maybe like, med wages) = 10 x 4 = 40
(no, low) x (maybe, high) = 10 x 7 = 70
(no, low) x (yes, med) = 10 x 5 = 50
(no, low) x (yes, high) = 10 x 2 = 20
(maybe, low) x (yes, med) = 1 x 5 = 5
(maybe, low) x (yes, high) = 1 x 2 = 2
(no, med) x (maybe, high) = 3 x 7 = 21
(no, med) x (yes, high) = 3 x 2 = 6
(maybe, med) x (yes, high) = 4 x 2 = 8
Total concordant pairs = 222
Counting discordant pairs
         like job?
wages    no    maybe  yes
low      10    1      1
med      3     4      5
high     3     7      2
(no like, med wages) x (maybe like, low wages) = 3 x 1 = 3
(no, med) x (yes, low) = 3 x 1 = 3
(no, high) x (maybe, med) = 3 x 4 = 12
(no, high) x (yes, med) = 3 x 5 = 15
(no, high) x (maybe, low) = 3 x 1 = 3
(no, high) x (yes, low) = 3 x 1 = 3
(maybe, high) x (yes, low) = 7 x 1 = 7
(maybe, high) x (yes, med) = 7 x 5 = 35
(maybe, med) x (yes, low) = 4 x 1 = 4
Total discordant pairs = 85
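The hand counts above are what a computer program would do in a loop. Below is a minimal Python sketch of that loop, assuming the table is stored as a list of rows with both rows and columns ordered from lowest to highest. The function name count_pairs is made up for illustration and is not how any particular statistics package implements it.

```python
def count_pairs(table):
    """Return (concordant, discordant) pair counts for an ordered table.

    table[i][j] is the count in row i, column j, with rows and columns
    both listed from the lowest to the highest category.
    """
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a strictly higher row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
                    # m == j is a tie on the column variable and is ignored
    return concordant, discordant


wages_by_liking = [
    [10, 1, 1],  # low wages:  no, maybe, yes
    [3, 4, 5],   # med wages
    [3, 7, 2],   # high wages
]

C, D = count_pairs(wages_by_liking)
print(f"concordant pairs = {C}, discordant pairs = {D}")
```

Pairs tied on either variable (same row or same column) never enter either count, matching the rule that ties are ignored.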
Measuring ordinal associations with gamma
Gamma (γ): A measure for concordant and discordant
patterns.
gamma = (C –D) / (C+D), where
C = number of concordant pairs.
D = number of discordant pairs.
For the previous example:
γ = (222 – 85) / (222 + 85)
= 137 / 307
= +.45
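As a quick arithmetic check, the gamma formula can be written as a one-line Python function. The name gamma_from_counts is an illustrative choice; the inputs are the pair counts from the wages example (222 - 85 = 137 in the numerator).

```python
def gamma_from_counts(concordant, discordant):
    """Gamma = (C - D) / (C + D); undefined when C + D == 0."""
    return (concordant - discordant) / (concordant + discordant)


g = gamma_from_counts(222, 85)
print(f"gamma = {g:+.2f}")

# The extremes: all pairs concordant gives +1, all pairs discordant gives -1.
print(gamma_from_counts(10, 0), gamma_from_counts(0, 10))
```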
Measuring ordinal associations with gamma
Interpreting gamma:
If gamma is between 0 and +1, the ordinal variables are
positively associated.
If gamma is between 0 and –1, the ordinal variables are
negatively associated.
The magnitude of gamma indicates the strength of the
association.
If gamma = 0, the variables may still be statistically dependent
(chi-squared could still be large). In that case, however, the
dependence does not follow an ordinal (increasing or decreasing) pattern.
The trouble with gamma
• Because gamma varies from -1 to +1 and is a measure of
association between two variables, naïve statisticians tend
to interpret gamma as a correlation coefficient.
• (more on correlation coefficients in the next chapter)
• The problem is that gamma gives more extreme values than
a correlation coefficient, especially if the number of
categories is small.
• Unscrupulous researchers can increase gamma by
collapsing categories together!
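The effect of collapsing categories can be illustrated with the wages and job-liking table used earlier. The sketch below is one hypothetical way to collapse it (med and high wages merged, maybe and yes merged); the helper names are made up, and other tables or other groupings may behave differently.

```python
def gamma(table):
    """Gamma for an ordered table (list of rows, low to high on both variables)."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for lower_row in table[i + 1:]:  # rows strictly higher than row i
                for m, other in enumerate(lower_row):
                    if m > j:
                        concordant += count * other
                    elif m < j:
                        discordant += count * other
    return (concordant - discordant) / (concordant + discordant)


def collapse(table, row_groups, col_groups):
    """Merge categories by summing the cells in each group of rows and columns."""
    return [
        [sum(table[i][j] for i in rows for j in cols) for cols in col_groups]
        for rows in row_groups
    ]


full = [
    [10, 1, 1],  # low wages:  no, maybe, yes
    [3, 4, 5],   # med wages
    [3, 7, 2],   # high wages
]
# Collapse to low vs. med/high wages and no vs. maybe/yes.
collapsed = collapse(full, [[0], [1, 2]], [[0], [1, 2]])

gamma_full = gamma(full)
gamma_collapsed = gamma(collapsed)
print(f"3 x 3 table: gamma = {gamma_full:+.3f}")
print(f"2 x 2 table: gamma = {gamma_collapsed:+.3f}")
```

The data are the same in both tables; only the number of categories has changed, which is why gamma should not be read as a correlation coefficient.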
Kendall’s Tau-b
• Kendall’s Tau-b is an alternative measure to
Gamma.
– Like Gamma, Kendall’s tau-b can take values from -1 to
+1, and the farther from 0, the stronger the association.
• STATA calculates a ‘sort-of’ standard error
(Asymptotic Standard Error, or ASE) for tau-b,
which you can use for statistical significance tests.
• z = tau-b / (ASE of tau-b)
Using gamma and tau-b
Use the same STATA command as for Chi-squared tests (tabulate), with
options that give you measures of association for ordinal level data.
If the gamma or tau-b test is statistically significant and the
chi-squared is not, you have added power to the test by making
the assumption of an ordinal relationship.
If the chi-squared test is statistically significant and the gamma
and tau-b tests are not, you should see a clear departure
from an ordinal relationship in the data.
(To test this relationship, calculate the conditional
distributions of one variable for categories of the other.)
STATA example of gamma and tau-b
Use the TABULATE command with options:
. tabulate gender party [freq=number], gamma taub

           |              party
    gender |  democrat  independe  republica |     Total
-----------+---------------------------------+----------
    female |       279         73        225 |       577
      male |       165         47        191 |       403
-----------+---------------------------------+----------
     Total |       444        120        416 |       980

                    gamma =   0.1470  ASE = 0.056
          Kendall's tau-b =   0.0796  ASE = 0.031
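To show what STATA is computing, the Python sketch below recomputes gamma and tau-b from the cell counts. Two assumptions are worth flagging: the tau-b formula used here is the standard textbook definition, (C - D) / sqrt((n0 - n1)(n0 - n2)), which the slides do not spell out, and the row and column order of the printed table is treated as the ordinal order. The ASE values are not reproduced. All function names are illustrative.

```python
import math


def pair_counts(table):
    """Concordant and discordant pair counts for an ordered table."""
    concordant = discordant = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            for higher_row in table[i + 1:]:
                for m, other in enumerate(higher_row):
                    if m > j:
                        concordant += count * other
                    elif m < j:
                        discordant += count * other
    return concordant, discordant


def pairs(n):
    """Number of distinct pairs that can be formed from n observations."""
    return n * (n - 1) // 2


def gamma_and_tau_b(table):
    c, d = pair_counts(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n0 = pairs(sum(row_totals))                # all pairs
    n1 = sum(pairs(t) for t in row_totals)     # pairs tied on the row variable
    n2 = sum(pairs(t) for t in col_totals)     # pairs tied on the column variable
    gamma = (c - d) / (c + d)
    tau_b = (c - d) / math.sqrt((n0 - n1) * (n0 - n2))
    return gamma, tau_b


# Rows: female, male. Columns: democrat, independent, republican.
gender_by_party = [[279, 73, 225], [165, 47, 191]]

g, tb = gamma_and_tau_b(gender_by_party)
print(f"gamma = {g:.4f}, Kendall's tau-b = {tb:.4f}")
```

Tau-b is smaller than gamma here because its denominator also counts pairs that are tied on only one of the two variables, while gamma drops every tied pair.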
Statistical inference with gamma and tau-b
A test for ordinal comparisons is similar to an independent
samples test for population proportions.
Assumptions:
random sample,
ordinal (or interval) categories,
the sampling distribution of differences between groups is
normal because the sample size is large: n ≥ 5 for every cell.
Null hypothesis:
there is no ordered relationship between the ordered
distributions of categories.
Statistical inference with gamma and tau-b
Test statistic:
z = gamma / ASE of gamma.
gamma = 0.1470 ASE = 0.056
z = .1470/.056 = 2.625
(note: ASE stands for Asymptotic Standard Error)
P-value: look up in Table A
p = .0044 for a one-tailed test,
so p = .0088 for a two tailed test.
Conclusion: p < .01, so reject the null hypothesis.
Instead, conclude that there is an ordered relationship
between sex and political identification.
(If you checked, you would find that p for a gamma test is
smaller than p for a Chi-squared test in this case.)
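The test above, and the comparison with chi-squared, can be checked numerically. This Python sketch takes gamma and its ASE as given from the STATA output, uses math.erfc for the normal tail instead of Table A, and uses the fact that with 2 degrees of freedom the chi-squared upper-tail probability is exactly exp(-x/2). Function names are illustrative, and computed p-values may differ from table lookups in the last digit.

```python
import math


def upper_tail_normal(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def chi_squared(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


# Gamma test, using the values reported by STATA.
gamma_hat, ase = 0.1470, 0.056
z = gamma_hat / ase
p_one_tailed = upper_tail_normal(z)
p_two_tailed = 2 * p_one_tailed

# Chi-squared test on the same gender-by-party table.
gender_by_party = [[279, 73, 225], [165, 47, 191]]
chi2 = chi_squared(gender_by_party)
# df = (2 - 1) * (3 - 1) = 2, so the upper-tail probability is exp(-chi2 / 2).
p_chi2 = math.exp(-chi2 / 2)

print(f"z = {z:.3f}")
print(f"gamma test: one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
print(f"chi-squared = {chi2:.2f} (df = 2), p = {p_chi2:.4f}")
```

The exp(-x/2) shortcut only works for 2 degrees of freedom; tables of other sizes need a general chi-squared distribution function.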