Linear Regression 1 - University of California, Irvine


Sociology 5811:
Lecture 16: Crosstabs 2
Measures of Association
Plus Differences in Proportions
Copyright © 2005 by Evan Schofer
Do not copy or distribute without
permission
Announcements
• Final project proposals due Nov 15
• Get started now!!!
• Find a dataset
• figure out what hypotheses you might test
• Today: Wrap up Crosstabs
• If time remains, we’ll discuss project ideas…
Review: Chi-square Test
• Chi-Square test is a test of independence
• Null hypothesis: the two categorical variables are
statistically independent
• There is no relationship between them
• H0: Gender and political party are independent
• Alternate hypothesis: the variables are related,
not independent of each other
• H1: Gender and political party are not independent
• Test is based on comparing the observed cell
values with the values you’d expect if there were
no relationship between variables.
Review: Expected Cell Values
• If two variables are independent, cell values will
depend only on row & column marginals
– Marginals reflect frequencies… And, if frequency is
high, all cells in that row (or column) should be high
• The formula for the expected value in a cell is:
$$\hat{f}_{ij} = \frac{(f_i)(f_j)}{N}$$
• fi and fj are the row and column marginals
• N is the total sample size
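A minimal Python sketch of the expected-value formula, assuming the table is given as a list of rows of counts. The function name `expected_frequencies` is illustrative, and the counts are the gender and party example used later in the lecture.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Democrat, Republican; columns: Women, Men
observed = [[27, 10], [16, 15]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Each expected value depends only on the marginals and N, never on the individual observed cell.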
Review: Chi-square Test
• The Chi-square formula:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$$
• Where:
• R = total number of rows in the table
• C = total number of columns in the table
• Eij = the expected frequency in row i, column j
• Oij = the observed frequency in row i, column j
– Assumption for test: Large N (>100)
– Degrees of freedom for the critical value: (R-1)(C-1).
Chi-square Test of Independence
• Example: Gender and Political Views
– Let’s pretend that N of 68 is sufficient
              Women                   Men
Democrat      O11: 27, E11: 23.4      O12: 10, E12: 13.6
Republican    O21: 16, E21: 19.6      O22: 15, E22: 11.4
Chi-square Test of Independence
• Compute (E – O)²/E for each cell
              Women                        Men
Democrat      (23.4 – 27)²/23.4 = .55      (13.6 – 10)²/13.6 = .95
Republican    (19.6 – 16)²/19.6 = .66      (11.4 – 15)²/11.4 = 1.14
Chi-Square Test of Independence
• Finally, sum up to compute the Chi-square
• χ² = .55 + .95 + .66 + 1.14 = 3.30
• What is the critical value for α = .05?
• Degrees of freedom: (R-1)(C-1) = (2-1)(2-1) = 1
• According to Knoke, p. 509: Critical value is 3.84
• Question: Can we reject H0?
• No. χ² of 3.30 is less than the critical value
• We cannot conclude that there is a relationship between
gender and political party affiliation.
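The whole test can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a standard routine: the function name `chi_square` is hypothetical, and the critical value 3.84 is simply copied from the lecture instead of being looked up in a distribution table. Because the code uses unrounded expected frequencies, its result can differ slightly from a hand calculation with rounded values.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a crosstab of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (expected - observed) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: Democrat, Republican; columns: Women, Men
observed = [[27, 10], [16, 15]]
stat, df = chi_square(observed)

CRITICAL_05_DF1 = 3.84  # alpha = .05, 1 degree of freedom (value quoted in the lecture)
reject_h0 = stat > CRITICAL_05_DF1
print(round(stat, 2), df, reject_h0)
```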
Chi-square Test of Independence
• Weaknesses of chi-square tests:
• 1. If the sample is very large, we almost always
reject H0.
• Even tiny covariations are statistically significant
• But, they may not be socially meaningful differences
• 2. It doesn’t tell us how strong the relationship is
• It doesn’t tell us if it is a large, meaningful difference or a
very small one
• It is only a test of “independence” vs. “dependence”
• Measures of Association address this shortcoming.
Measures of Association
• Separate from the issue of independence,
statisticians have created measures of association
– They are measures that tell us how strong the
relationship is between two variables
• Weak Association
        Women    Men
Dem.      51      49
Rep.      49      51
• Strong Association
        Women    Men
Dem.     100       0
Rep.       0     100
Crosstab Association: Yule’s Q
• #1: Yule’s Q
– Appropriate only for 2x2 tables (2 rows, 2 columns)
• Label cell frequencies a through d:
    a    b
    c    d
• Formula:
$$Q = \frac{bc - ad}{bc + ad}$$
• Recall that extreme values along the “diagonal”
(cells a & d) or the “off-diagonal” (b & c)
indicate a strong relationship.
• Yule’s Q captures that in a measure
• 0 = no association. -1, +1 = strong association
Crosstab Association: Yule’s Q
• Rule of Thumb for interpreting Yule’s Q:
• Bohrnstedt & Knoke, p. 150
Absolute value of Q    Strength of Association
0 to .24               “virtually no relationship”
.25 to .49             “weak relationship”
.50 to .74             “moderate relationship”
.75 to 1.0             “strong relationship”
Crosstab Association: Yule’s Q
• Example: Gender and Political Party Affiliation
        Women     Men
Dem     a = 27    b = 10
Rep     c = 16    d = 15
• Calculate “bc”: bc = (10)(16) = 160
• Calculate “ad”: ad = (27)(15) = 405
$$Q = \frac{bc - ad}{bc + ad} = \frac{160 - 405}{160 + 405} = \frac{-245}{565} = -.43$$
• -.43 = “weak association”
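A small Python sketch of Yule’s Q using the lecture’s sign convention, Q = (bc − ad)/(bc + ad). The names `yules_q` and `strength_label` are illustrative, and the cut-offs in `strength_label` are an assumption about how to apply the Bohrnstedt & Knoke rule of thumb to values that fall between the listed ranges.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with cells a, b (top row) and c, d (bottom row)."""
    return (b * c - a * d) / (b * c + a * d)


def strength_label(q):
    """Map |Q| to the rule-of-thumb labels (boundary handling is an assumption)."""
    size = abs(q)
    if size < 0.25:
        return "virtually no relationship"
    if size < 0.50:
        return "weak relationship"
    if size < 0.75:
        return "moderate relationship"
    return "strong relationship"


# a = Dem women, b = Dem men, c = Rep women, d = Rep men
q = yules_q(27, 10, 16, 15)
label = strength_label(q)
print(round(q, 2), label)
```

The function raises `ZeroDivisionError` when bc + ad is zero, which can only happen if at least two cells are empty.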
Association: Other Measures
• Phi (φ)
• Very similar to Yule’s Q
• Only for 2x2 tables, ranges from –1 to 1, 0 = no assoc.
• Gamma (G)
• Based on a very different method of calculation
• Not limited to 2x2 tables
• Requires ordered variables
• Tau-c (τc) and Somers’ d (dyx)
• Same basic principle as Gamma
• Several Others discussed in Knoke, Norusis.
Crosstab Association: Gamma
• Gamma, like Q, is based on comparing
“diagonal” to “off-diagonal” cases.
– But, it does so differently
• Jargon:
• Concordant pairs: Pairs of cases where one case
is higher on both variables than another case
• Discordant pairs: Pairs of cases for which the
first case (when compared to a second) is higher
on one variable but lower on another
Crosstab Association: Gamma
• Example: Approval of candidates
– Cases in “Love Trees/Love Guns” cell make
concordant pairs with cases lower on both
              Hate Trees    Trees = OK    Love Trees
Love Guns        1205           603            71
Guns = OK         659          1498           452
Hate Guns         431           467          1120
• All 71 individuals can be a pair with everyone in the lower cells. Just Multiply!
(71)(659 + 1498 + 431 + 467) = 216,905 conc. pairs
Crosstab Association: Gamma
• More possible concordant pairs
– The “Love Guns/Trees = OK” cell and the “Guns = OK/Love Trees” cell also can have concordant pairs
              Hate Trees    Trees = OK    Love Trees
Love Guns        1205           603            71
Guns = OK         659          1498           452
Hate Guns         431           467          1120
• These 603 can pair with all those that score lower on approval for Guns & Trees
(603)(659 + 431) = 657,270 conc. pairs
• These 452 can pair lower too!
(452)(431 + 467) = 405,896 conc. pairs
Crosstab Association: Gamma
• Discordant pairs: Pairs where a first person ranks
higher on one dimension (e.g. approval of Trees)
but lower on the other (e.g., app. of Guns)
              Hate Trees    Trees = OK    Love Trees
Love Guns        1205           603            71
Guns = OK         659          1498           452
Hate Guns         431           467          1120
• The top-left cell is higher on Guns but lower on Trees than those in the lower right. They make pairs:
(1205)(1498 + 452 + 467 + 1120) = 4,262,085 discordant pairs
Crosstab Association: Gamma
• If all pairs are concordant or all pairs are
discordant, the variables are strongly related
• If there are an equal number of discordant and
concordant pairs, the variables are weakly
associated.
• Formula for Gamma:
$$G = \frac{n_s - n_d}{n_s + n_d}$$
• ns = number of concordant pairs
• nd = number of discordant pairs
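The pair counting behind Gamma can be sketched in Python as follows. This is a brute-force illustration, not how statistical packages necessarily compute it; the names `count_pairs` and `gamma` are hypothetical. It assumes rows and columns are both ordered from lowest to highest category, so the Guns/Trees counts from the lecture are entered with the “Hate Guns” row first.

```python
def count_pairs(table):
    """Count (concordant, discordant) pairs in an ordered crosstab.

    Rows and columns must both run from the lowest to the highest category.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a higher row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def gamma(table):
    ns, nd = count_pairs(table)
    return (ns - nd) / (ns + nd)


# Rows: Hate Guns, Guns = OK, Love Guns; columns: Hate Trees, Trees = OK, Love Trees
table = [
    [431, 467, 1120],
    [659, 1498, 452],
    [1205, 603, 71],
]
ns, nd = count_pairs(table)
print(ns, nd, round(gamma(table), 2))
```

Pairs that are tied on either variable (same row or same column) are ignored, which matches the formula: only ns and nd enter Gamma.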
Crosstab Association: Gamma
• Calculation of Gamma is typically done by
computer
• Zero indicates no association
• +1 = strong positive association
• -1 = strong negative association
• It is possible to do hypothesis tests on Gamma
• To determine if population gamma differs from zero
• Requirements: random sample, N > 50
• See Knoke, p. 155-6.
Crosstab Association
• Final remarks:
• You have a variety of possible measures to assess
association among variables. Which one should
you use?
• Yule’s Q and Phi require a 2x2 table
• Larger ordered tables: use Gamma, Tau-c, Somers’ d
• Ideally, report more than one to show that your findings are
robust.
Odds Ratios
• Odds ratios are a powerful way of analyzing
relationships in crosstabs
• Many advanced categorical data analysis techniques are
based on odds ratios
• Review: What is a probability?
• p(A) = # of outcomes that are “A” divided by total number
of outcomes
• To convert a frequency distribution to a probability
distribution, simply divide frequency by N
• The same can be done with crosstabs: Cell frequency over
N is probability.
Odds Ratios
• If total N = 68, probability of drawing cases is:
        Women      Men
Dem     27 / 68    10 / 68
Rep     16 / 68    15 / 68

        Women      Men
Dem     .397       .147
Rep     .235       .221
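The conversion from cell frequencies to probabilities is a single division by N. A minimal Python sketch, with the illustrative name `joint_probabilities`:

```python
def joint_probabilities(table):
    """Divide every cell count by the total N to get cell probabilities."""
    n = sum(sum(row) for row in table)
    return [[cell / n for cell in row] for row in table]


# Rows: Dem, Rep; columns: Women, Men
counts = [[27, 10], [16, 15]]
probs = joint_probabilities(counts)
for row in probs:
    print([round(p, 3) for p in row])
```

The four probabilities sum to 1, since every case falls in exactly one cell.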
Odds Ratios
• Odds are similar to probability… but not quite
• Odds of A = Number of outcomes that are A,
divided by number of outcomes that are not A
– Note: Denominator is different than for probability
• Ex: Probability of rolling 1 on a 6-sided die = 1/6
• Odds of rolling a 1 on a six-sided die = 1/5
• Odds can also be calculated from probabilities:
$$\text{odds}_i = \frac{p_i}{1 - p_i}$$
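As a quick check of this conversion, here is a Python sketch using exact fractions for the die example. The function name `odds_from_probability` is illustrative.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1 - p)


p_roll_one = Fraction(1, 6)          # probability of rolling a 1
odds_roll_one = odds_from_probability(p_roll_one)
print(odds_roll_one)                 # prints 1/5
```

The function is undefined for p = 1, where the denominator is zero.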
Odds Ratios
• Conditional odds = odds of being in one category
of a variable within a specific category of
another variable
– Example: For women, what are the odds of being
democrat?
– Instead of overall odds of being democrat, conditional
odds are about a particular subgroup in a table
        Women    Men
Dem       27      10
Rep       16      15
• Conditional odds of being democrat for women are: 27 / 16 = 1.69
• Note: Odds for women are different than for men (10 / 15 = .67)
Odds Ratios
• If variables in a crosstab are independent, their
conditional odds are equal
• Odds of falling into one category or another are same for all
values of other variable
• If variables in a crosstab are associated,
conditional odds differ
• Odds can be compared by making a ratio
• Ratio is equal to 1 if odds are the same for two groups
• Ratios much greater or less than 1 indicate very different
odds.
Odds Ratios
• Formula for Odds Ratio in 2x2 table:
$$OR^{XY} = \frac{b/d}{a/c} = \frac{bc}{ad}$$
        Women     Men
Dem     a = 27    b = 10
Rep     c = 16    d = 15
• Ex: OR = (10)(16)/(27)(15) = 160 / 405 = .395
• Interpretation: men have .395 times the odds of
being a democrat compared to women
• Inverted value (1/.395 = 2.5) indicates the odds of
women being democrat are 2.5 times men’s odds
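The same calculation can be sketched in Python by building the odds ratio from the two conditional odds. The variable and function names are illustrative, not taken from the lecture.

```python
def conditional_odds(in_category, not_in_category):
    """Odds of being in a category within one subgroup of the table."""
    return in_category / not_in_category


women_dem, men_dem = 27, 10   # cells a and b
women_rep, men_rep = 16, 15   # cells c and d

odds_women = conditional_odds(women_dem, women_rep)  # a / c
odds_men = conditional_odds(men_dem, men_rep)        # b / d
odds_ratio = odds_men / odds_women                   # (b/d) / (a/c) = bc / ad

print(round(odds_women, 2), round(odds_men, 2))
print(round(odds_ratio, 3), round(1 / odds_ratio, 2))
```

Which group goes in the numerator is a matter of convention; swapping the groups simply inverts the ratio.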
Odds Ratios: Final Remarks
• 1. Cells with zeros cause problems for odds ratios
• Ratios with zero in denominator are undefined.
• Thus, you need to have full cells
• 2. Odds ratios can be used to measure association
• Indeed, Yule’s Q is based on them
• 3. Odds ratios form the basis for most advanced
categorical data analysis techniques
• For now it may be easier to use Yule’s Q, etc. But, if you
need to do advanced techniques, you will use odds ratios.
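Point 2 can be made concrete with a short derivation, offered as a sketch that uses only the lecture’s own definitions Q = (bc − ad)/(bc + ad) and OR = bc/ad, and assumes ad is not zero. Dividing the numerator and denominator of Q by ad gives:

$$Q = \frac{bc - ad}{bc + ad} = \frac{\frac{bc}{ad} - 1}{\frac{bc}{ad} + 1} = \frac{OR - 1}{OR + 1}$$

So OR = 1 (equal odds) corresponds to Q = 0, and Q approaches +1 or -1 as the odds ratio grows very large or shrinks toward zero.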
Tests for Difference in Proportions
• Another approach to small (2x2) tables:
• Instead of making a crosstab, you can just think
about the proportion of people in a given category
• More similar to T-test than a Chi-square test
• Ex: Do you approve of Pres. Bush? (Yes/No)
• Sample: N = 86 women, 80 men
• Proportion of women that approve: PW = .70
• Proportion of men that approve: PM = .78
• Issue: Do the populations of men/women differ?
• Or are the differences just due to sampling variability?
Tests for Difference in Proportions
• Hypotheses:
• Again, the typical null hypothesis is that there are
no differences between groups
• Which is equivalent to statistical independence
• H0: Proportion women = proportion men
• H1: Proportion women not = proportion men
• Note: One-tailed directional hypotheses can also be used.
Tests for Difference in Proportions
• Strategy: Figure out the sampling distribution for
differences in proportions
• Statisticians have determined relevant info:
• 1. If samples are “large”, the sampling
distribution of difference in proportions is normal
– The Z-distribution can be used for hypothesis tests
• 2. A Z-value can be calculated using the formula:
$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}}$$
Tests for Difference in Proportions
• Standard error can be estimated as:
$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})}\,\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
• Where:
$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$
Difference in Proportions: Example
• Q: Do you approve of Pres. Bush? (Yes/No)
• Sample: N = 86 women, 80 men
• Women: N = 86, PW = .70
• Men: N = 80, PM = .78
• Total N is “Large”: 166 people
– So, we can use a Z-test
• Use α = .05, two-tailed critical Z = 1.96
Difference in Proportions: Example
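• Use formula to calculate Z-value
$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{-.08}{\hat{\sigma}_{(P_1 - P_2)}}$$
• And, estimate the Standard Error as:
$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})}\,\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$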
Difference in Proportions: Example
• First: Calculate Pboth:
$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2} = \frac{86(.70) + 80(.78)}{86 + 80} = \frac{60.2 + 62.4}{166} = .739$$
Difference in Proportions: Example
• Plug in Pboth = .739:
$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{.739(1 - .739)}\,\sqrt{\frac{N_1 + N_2}{N_1 N_2}} = \sqrt{.193}\,\sqrt{\frac{86 + 80}{(86)(80)}}$$
$$\hat{\sigma}_{(P_1 - P_2)} = (.439)\sqrt{\frac{166}{6880}} = (.439)(.1553) = .0682$$
Difference in Proportions: Example
• Finally, plug in S.E. and calculate Z:
$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{.0682} = \frac{-.08}{.0682} = -1.17$$
Difference in Proportions: Example
• Results:
• Critical Z = 1.96
• Observed Z = -1.17 (absolute value 1.17 is less than 1.96)
Conclusion: We can’t reject null hypothesis
– Women and Men do not clearly differ in approval of
Bush
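The full test can be sketched in Python as below. This is a minimal illustration of the pooled standard error formula from the slides; the function name `two_proportion_z` is hypothetical, and the critical value 1.96 is taken from the lecture rather than computed.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic for a difference in proportions, using the pooled proportion."""
    p_both = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p_both * (1 - p_both)) * math.sqrt((n1 + n2) / (n1 * n2))
    z = (p1 - p2) / se
    return z, se, p_both


# Women: N = 86, PW = .70; Men: N = 80, PM = .78
z, se, p_both = two_proportion_z(0.70, 86, 0.78, 80)

CRITICAL_Z = 1.96  # alpha = .05, two-tailed (value quoted in the lecture)
reject_h0 = abs(z) > CRITICAL_Z
print(round(p_both, 3), round(se, 4), round(z, 2), reject_h0)
```

Comparing the absolute value of Z with the critical value implements the two-tailed test; a one-tailed test would compare the signed Z with a one-sided critical value instead.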