
Lecture 9
Chapter 22. Tests for two-way tables
Objectives
The chi-square test for two-way tables
(Award: NHST Test for Independence)

Two-way tables

Hypotheses for the chi-square test for two-way tables

Expected counts in a two-way table

Conditions for the chi-square test

Chi-square test for two-way tables

Simpson’s paradox
Two-way tables
An experiment has a two-way factorial design if two categorical
factors are studied with several levels of each factor.
Two-way tables organize data about two categorical variables with any
number of levels/treatments obtained from a factorial design or
two-way observational study.
High school students were asked whether they smoke, and whether their
parents smoke:
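First factor (rows): Parent smoking status
Second factor (columns): Student smoking status

                         Student smokes   Student does not smoke
Both parents smoke             400                 1380
One parent smokes              416                 1823
Neither parent smokes          188                 1168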
Marginal distribution
The marginal distributions (in the “margins” of the table) summarize
each factor independently.
Marginal distribution for parental smoking (percent of all students):
P(both parents) = 1780/5375 = 33.1%
P(one parent) = 2239/5375 = 41.7%
P(neither parent) = 1356/5375 = 25.2%
[Bar graph: percent of all students (axis from 0 to 40) for both parents, one parent, neither parent]
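The marginal percentages above can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary layout and variable names are illustrative choices, and the counts are the ones from the smoking table.

```python
# Two-way table of counts: rows = parent smoking status,
# columns = (student smokes, student does not smoke).
counts = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

# Table total: every student counted once.
table_total = sum(sum(row) for row in counts.values())

# Marginal distribution of the row factor: row total / table total.
parent_marginal = {
    status: sum(row) / table_total for status, row in counts.items()
}

for status, share in parent_marginal.items():
    print(f"P({status}) = {share:.1%}")
```

The student-smoking marginal distribution would be obtained the same way from the column totals.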
Conditional distribution
The cells of the two-way table represent the intersection of a given level
of one factor with a given level of the other factor. Dividing each cell
count by its row (or column) total gives the conditional distributions.
Conditional distribution of student smoking for
different parental smoking statuses:
P(student smokes | both parents) = 400/1780 = 22.5%
P(student smokes | one parent) = 416/2239 = 18.6%
P(student smokes | neither parent) = 188/1356 = 13.9%
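A short sketch of the same calculation in Python, conditioning on the row (parent) factor. The names are illustrative, and the counts are those of the smoking table.

```python
# rows = parent smoking status, values = (student smokes, student does not smoke)
counts = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

# Conditional distribution: divide each cell by its own row total,
# not by the table total.
smokes_given_parent = {
    status: smokes / (smokes + does_not)
    for status, (smokes, does_not) in counts.items()
}

for status, share in smokes_given_parent.items():
    print(f"P(student smokes | {status}) = {share:.1%}")
```

If the three conditional percentages were equal, the table would show no association between the two factors.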
Hypotheses
A two-way table has r rows and c columns. H0 states that there is no
association between the row and column variables in the table.
Statistical Hypotheses
H0 : There is no association between the row and column variables
Ha : There is an association/relationship between the 2 variables
We will compare actual counts from the sample data with the counts
we would expect if the null hypothesis of no relationship were true.
Expected counts in a two-way table
A two-way table has r rows and c columns. H0 states that there is no
association between the row and column variables (factors) in the table.
The expected count in any cell of a two-way table when H0 is true is:
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
The expected count is the average count you would get for that cell if
the null hypothesis were true.
Cocaine addiction
Cocaine produces short-term feelings of physical
and mental well-being. To maintain the effect, the
drug may have to be taken more frequently and at
higher doses. After stopping use, users will feel tired,
sleepy, and depressed.
A study compares the rates of successful rehabilitation for cocaine addicts
following 1 of 3 treatment options:
1: antidepressant treatment (desipramine)
2: standard treatment (lithium)
3: placebo (“sugar pill”)
Cocaine addiction
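Calculate the expected cell counts if relapse is
independent of the treatment.
Overall, 26 of the 74 subjects (35%) did not relapse, so under H0 the
expected percent with no relapse is 35% for each of the three treatments
(and 65% with relapse).
[Graph: observed % vs. expected % for each treatment; expected is 35% in every group]
Expected relapse counts:

               No relapse                     Relapse
Desipramine    25*26/74 ≈ 8.78 (25*0.35)      16.22 (25*0.65)
Lithium        9.14 (26*0.35)                 16.86 (26*0.65)
Placebo        8.08 (23*0.35)                 14.92 (23*0.65)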
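The expected counts follow directly from the margins of the table. A minimal sketch, assuming the group sizes 25, 26 and 23 and the no-relapse total of 26 out of 74 used in the calculation above (so the relapse total is 74 - 26 = 48); the function name is an illustrative choice.

```python
def expected_from_margins(row_totals, col_totals):
    """Expected count for every cell: row total * column total / table total."""
    table_total = sum(row_totals)
    if table_total != sum(col_totals):
        raise ValueError("row and column totals must add up to the same table total")
    return [[r * c / table_total for c in col_totals] for r in row_totals]

treatments = ["Desipramine", "Lithium", "Placebo"]
row_totals = [25, 26, 23]   # subjects per treatment
col_totals = [26, 48]       # no relapse, relapse

expected = expected_from_margins(row_totals, col_totals)
for name, (no_relapse, relapse) in zip(treatments, expected):
    print(f"{name:12s} no relapse {no_relapse:6.2f}   relapse {relapse:6.2f}")
```

Each row of expected counts adds up to its group size, and each column to its column total, which is a quick check on the arithmetic.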
Situations appropriate for the chi-square test
The chi-square test for two-way tables looks for evidence of
association between two categorical variables (factors) in
sample data. The samples can be drawn either:

By randomly selecting SRSs from different populations (or from a
population subjected to different treatments)

    girls vaccinated for HPV or not, among 8th graders and 12th graders

    remission or no remission for different treatments

Or by taking 1 SRS and classifying the individuals according to 2
categorical variables (factors)

    11th graders’ smoking status and parents’ status
When looking for an association between two categorical/nominal variables,
we can safely use the chi-square test when:

no more than 20% of expected counts are less than 5 (< 5)

all individual expected counts are 1 or more (≥1)
What goes wrong? With small expected cell counts, the sampling
distribution of the test statistic is not well approximated by a
chi-square distribution.
Statistician’s note: If one factor has many levels and too many expected counts
are too low, you might be able to “collapse” some of the levels (regroup them)
and thus have large-enough expected counts.
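The two conditions are easy to check mechanically once the expected counts are known. A minimal sketch: the function name is an illustrative choice, the first table holds the expected counts from the cocaine example, and the second table is made up purely to show a failing case.

```python
def chi_square_conditions_ok(expected):
    """Check the two rules of thumb on a table of expected counts."""
    cells = [value for row in expected for value in row]
    share_below_5 = sum(value < 5 for value in cells) / len(cells)
    # Rule 1: no more than 20% of expected counts below 5.
    # Rule 2: every expected count at least 1.
    return share_below_5 <= 0.20 and min(cells) >= 1

cocaine_expected = [[8.78, 16.22], [9.14, 16.86], [8.08, 14.92]]
made_up_sparse = [[0.8, 12.0], [6.0, 9.0]]   # hypothetical: one expected count below 1

print(chi_square_conditions_ok(cocaine_expected))  # conditions met
print(chi_square_conditions_ok(made_up_sparse))    # conditions not met
```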
The chi-square test for two-way tables
H0 : there is no association between the row and column variables
Ha : H0 is not true
The χ² statistic sums over all r x c cells in the table:
$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
When H0 is true, the χ² statistic approximately follows a χ² distribution
with (r-1)(c-1) degrees of freedom.
P-value: P(χ² variable ≥ calculated χ² | H0 is true)
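The definition above translates almost line for line into code. This is a sketch under stated assumptions, not the lecture's software: the function names are illustrative, the 2 x 2 table is made up for the demonstration, and the upper-tail probability is computed with a standard series for the incomplete gamma function, which is adequate for illustration but loses accuracy for extremely small P-values.

```python
import math

def chi_square_statistic(observed):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    statistic = 0.0
    for r_total, row in zip(row_totals, observed):
        for c_total, obs in zip(col_totals, row):
            expected = r_total * c_total / table_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

def chi_square_upper_tail(x, df):
    """P(chi-square variable with df degrees of freedom >= x)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

made_up_table = [[30, 10], [20, 40]]   # hypothetical counts, for illustration only
statistic, df = chi_square_statistic(made_up_table)
p_value = chi_square_upper_tail(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.5f}")
```

Statistical software reports the same three numbers: the statistic, its degrees of freedom, and the upper-tail P-value.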
Table A: chi-square critical values. The entry in row df and column p is
the value with probability p lying to its right.
Ex: df = 6. If χ² = 15.9, the P-value is between 0.01 and 0.02.

                                        Upper-tail probability p
 df    0.25    0.20    0.15    0.10    0.05   0.025    0.02    0.01   0.005  0.0025   0.001  0.0005
  1    1.32    1.64    2.07    2.71    3.84    5.02    5.41    6.63    7.88    9.14   10.83   12.12
  2    2.77    3.22    3.79    4.61    5.99    7.38    7.82    9.21   10.60   11.98   13.82   15.20
  3    4.11    4.64    5.32    6.25    7.81    9.35    9.84   11.34   12.84   14.32   16.27   17.73
  4    5.39    5.99    6.74    7.78    9.49   11.14   11.67   13.28   14.86   16.42   18.47   20.00
  5    6.63    7.29    8.12    9.24   11.07   12.83   13.39   15.09   16.75   18.39   20.51   22.11
  6    7.84    8.56    9.45   10.64   12.59   14.45   15.03   16.81   18.55   20.25   22.46   24.10
  7    9.04    9.80   10.75   12.02   14.07   16.01   16.62   18.48   20.28   22.04   24.32   26.02
  8   10.22   11.03   12.03   13.36   15.51   17.53   18.17   20.09   21.95   23.77   26.12   27.87
  9   11.39   12.24   13.29   14.68   16.92   19.02   19.68   21.67   23.59   25.46   27.88   29.67
 10   12.55   13.44   14.53   15.99   18.31   20.48   21.16   23.21   25.19   27.11   29.59   31.42
 11   13.70   14.63   15.77   17.28   19.68   21.92   22.62   24.72   26.76   28.73   31.26   33.14
 12   14.85   15.81   16.99   18.55   21.03   23.34   24.05   26.22   28.30   30.32   32.91   34.82
 13   15.98   16.98   18.20   19.81   22.36   24.74   25.47   27.69   29.82   31.88   34.53   36.48
 14   17.12   18.15   19.41   21.06   23.68   26.12   26.87   29.14   31.32   33.43   36.12   38.11
 15   18.25   19.31   20.60   22.31   25.00   27.49   28.26   30.58   32.80   34.95   37.70   39.72
 16   19.37   20.47   21.79   23.54   26.30   28.85   29.63   32.00   34.27   36.46   39.25   41.31
 17   20.49   21.61   22.98   24.77   27.59   30.19   31.00   33.41   35.72   37.95   40.79   42.88
 18   21.60   22.76   24.16   25.99   28.87   31.53   32.35   34.81   37.16   39.42   42.31   44.43
 19   22.72   23.90   25.33   27.20   30.14   32.85   33.69   36.19   38.58   40.88   43.82   45.97
 20   23.83   25.04   26.50   28.41   31.41   34.17   35.02   37.57   40.00   42.34   45.31   47.50
 21   24.93   26.17   27.66   29.62   32.67   35.48   36.34   38.93   41.40   43.78   46.80   49.01
 22   26.04   27.30   28.82   30.81   33.92   36.78   37.66   40.29   42.80   45.20   48.27   50.51
 23   27.14   28.43   29.98   32.01   35.17   38.08   38.97   41.64   44.18   46.62   49.73   52.00
 24   28.24   29.55   31.13   33.20   36.42   39.36   40.27   42.98   45.56   48.03   51.18   53.48
 25   29.34   30.68   32.28   34.38   37.65   40.65   41.57   44.31   46.93   49.44   52.62   54.95
 26   30.43   31.79   33.43   35.56   38.89   41.92   42.86   45.64   48.29   50.83   54.05   56.41
 27   31.53   32.91   34.57   36.74   40.11   43.19   44.14   46.96   49.64   52.22   55.48   57.86
 28   32.62   34.03   35.71   37.92   41.34   44.46   45.42   48.28   50.99   53.59   56.89   59.30
 29   33.71   35.14   36.85   39.09   42.56   45.72   46.69   49.59   52.34   54.97   58.30   60.73
 30   34.80   36.25   37.99   40.26   43.77   46.98   47.96   50.89   53.67   56.33   59.70   62.16
 40   45.62   47.27   49.24   51.81   55.76   59.34   60.44   63.69   66.77   69.70   73.40   76.09
 50   56.33   58.16   60.35   63.17   67.50   71.42   72.61   76.15   79.49   82.66   86.66   89.56
 60   66.98   68.97   71.34   74.40   79.08   83.30   84.58   88.38   91.95   95.34   99.61  102.70
 80   88.13   90.41   93.11   96.58  101.90  106.60  108.10  112.30  116.30  120.10  124.80  128.30
100  109.10  111.70  114.70  118.50  124.30  129.60  131.10  135.80  140.20  144.30  149.40  153.20
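Reading the table amounts to finding the two neighboring critical values that bracket the statistic. A minimal sketch of that lookup, with only the df = 2 and df = 6 rows copied from the table; the function and variable names are illustrative.

```python
import bisect

# Column headings of the table (upper-tail probabilities).
P_LEVELS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02, 0.01,
            0.005, 0.0025, 0.001, 0.0005]

# Two rows of critical values copied from the table.
CRITICAL = {
    2: [2.77, 3.22, 3.79, 4.61, 5.99, 7.38, 7.82, 9.21,
        10.60, 11.98, 13.82, 15.20],
    6: [7.84, 8.56, 9.45, 10.64, 12.59, 14.45, 15.03, 16.81,
        18.55, 20.25, 22.46, 24.10],
}

def bracket_p_value(statistic, df):
    """Return (low, high) with low <= P-value <= high, as read from the table."""
    row = CRITICAL[df]
    # Number of critical values that the statistic reaches or exceeds.
    k = bisect.bisect_right(row, statistic)
    high = P_LEVELS[k - 1] if k > 0 else 1.0      # left of the table: P > 0.25
    low = P_LEVELS[k] if k < len(row) else 0.0    # right of the table: P < 0.0005
    return low, high

print(bracket_p_value(15.9, 6))   # the df = 6 example
```

The table can only bracket the P-value; software gives the exact tail probability.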
Table of counts: “actual/expected,” with three rows and two columns:

               No relapse     Relapse
Desipramine    15 / 8.78      10 / 16.22
Lithium         7 / 9.14      19 / 16.86
Placebo         4 / 8.08      19 / 14.92

df = (3 − 1)(2 − 1) = 2
We compute the X² statistic:
$$X^2 = \frac{(15 - 8.78)^2}{8.78} + \frac{(10 - 16.22)^2}{16.22} + \frac{(7 - 9.14)^2}{9.14} + \frac{(19 - 16.86)^2}{16.86} + \frac{(4 - 8.08)^2}{8.08} + \frac{(19 - 14.92)^2}{14.92} = 10.74$$
Using Table D: 10.60 < X² < 11.98, so 0.005 > P > 0.0025.
The P-value is very small (JMP gives P = 0.0047) and we reject H0.
There is a significant relationship between treatment type (desipramine, lithium,
placebo) and outcome (relapse or not).
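The hand calculation can be checked in a few lines. A minimal sketch using the observed counts from the table of counts; the variable names are illustrative. Two points to keep in mind: the code keeps the expected counts unrounded, so it gives about 10.73 where the hand calculation with rounded expected counts gives 10.74, and the closed form exp(-x/2) for the upper tail holds only for df = 2.

```python
import math

# Observed counts: (no relapse, relapse) for each treatment.
observed = {
    "Desipramine": (15, 10),
    "Lithium": (7, 19),
    "Placebo": (4, 19),
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
table_total = sum(row_totals)

x2 = 0.0
for r_total, row in zip(row_totals, rows):
    for c_total, obs in zip(col_totals, row):
        expected = r_total * c_total / table_total
        x2 += (obs - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Chi-square with 2 degrees of freedom: P(X >= x) = exp(-x / 2).
p_value = math.exp(-x2 / 2)

print(f"X2 = {x2:.2f}, df = {df}, P-value = {p_value:.4f}")
```

The P-value agrees with the JMP value quoted above to four decimal places.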
Interpreting the X² output
When the X² test is statistically significant:
The largest components indicate which condition(s) are most different
from H0. You can also compare the observed and expected counts, or
compare the computed proportions in a graph.

χ² components:

               No relapse    Relapse
Desipramine       4.41         2.39
Lithium           0.50         0.27
Placebo           2.06         1.12

The largest X² component, 4.41, is for
desipramine/no relapse. Desipramine has
the highest success rate (see graph).
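The components are the individual terms of the X² sum, one per cell. A minimal sketch that computes them from the observed counts and picks out the largest; names are illustrative, and because the expected counts are not rounded here, the values differ from the slide's in the second decimal (for example about 4.40 instead of 4.41).

```python
# Observed counts per treatment, in the column order given by `outcomes`.
outcomes = ("No relapse", "Relapse")
observed = {
    "Desipramine": (15, 10),
    "Lithium": (7, 19),
    "Placebo": (4, 19),
}

row_totals = {name: sum(row) for name, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
table_total = sum(col_totals)

# One component per cell: (observed - expected)^2 / expected.
components = {}
for name, row in observed.items():
    for outcome, c_total, obs in zip(outcomes, col_totals, row):
        expected = row_totals[name] * c_total / table_total
        components[(name, outcome)] = (obs - expected) ** 2 / expected

largest_cell = max(components, key=components.get)
for cell, value in components.items():
    print(f"{cell[0]:12s} {cell[1]:10s} {value:5.2f}")
print("largest component:", largest_cell)
```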
Influence of parental smoking
Here is a computer output for a chi-square test performed on the data from
a random sample of high school students (rows are parental smoking
habits, columns are the students’ smoking habits). What does it tell you?
Is the sample size sufficient?
What are the hypotheses?
Are the data ok for a χ² test?
What else should you ask?
What is your interpretation?
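The counts in the parental smoking table are enough to redo this test from scratch. A minimal sketch, assuming those six counts; variable names are illustrative, and the closed form exp(-x/2) for the upper tail is valid only because df = 2 here.

```python
import math

# rows = parent smoking status, values = (student smokes, student does not smoke)
counts = {
    "both parents": (400, 1380),
    "one parent": (416, 1823),
    "neither parent": (188, 1168),
}

rows = list(counts.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
table_total = sum(row_totals)

expected = [[r * c / table_total for c in col_totals] for r in row_totals]
smallest_expected = min(min(row) for row in expected)

x2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(rows, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = math.exp(-x2 / 2)   # exact upper tail for df = 2 only

print(f"smallest expected count = {smallest_expected:.1f}")
print(f"X2 = {x2:.2f}, df = {df}, P-value = {p_value:.2e}")
```

All expected counts are far above 5, so the conditions for the test are met, and the very small P-value points to an association between parental and student smoking. Whether that association is causal is a separate question that the test cannot answer.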
Caution with categorical data
An association that holds for all of several groups can reverse direction
when the data are combined to form a single group. This reversal is
called Simpson's paradox.
Kidney stones
A study compared the success rates of
two different procedures for removing
kidney stones: open surgery and
percutaneous nephrolithotomy (PCNL),
a minimally invasive technique.

                Success   Failure   % failure
Open surgery      273        77        22%
PCNL              289        61        17%

It turns out that for any given patient, PCNL is more likely
to result in failure. Can you think of a reason why?
The procedures are not chosen randomly by surgeons! In fact, the minimally
invasive procedure is most likely used for smaller stones (with a good chance of
success) whereas open surgery is likely used for more problematic conditions.
Overall
                Success   Failure   % failure
Open surgery      273        77        22%
PCNL              289        61        17%

Small stones
                Success   Failure   % failure
Open surgery       81         6         7%
PCNL              234        36        13%

Large stones
                Success   Failure   % failure
Open surgery      192        71        27%
PCNL               55        25        31%
For both small stones and large stones, open surgery has a lower failure rate.
This is Simpson’s paradox. The more challenging cases with large stones tend
to be treated more often with open surgery, making it appear as if
the procedure were less reliable overall.
Beware of lurking variables!
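The reversal can be verified directly from the counts. A minimal sketch using the small-stone and large-stone tables; the data layout and names are illustrative choices.

```python
# (successes, failures) for each procedure within each stone-size group.
data = {
    "small stones": {"open surgery": (81, 6), "PCNL": (234, 36)},
    "large stones": {"open surgery": (192, 71), "PCNL": (55, 25)},
}

def failure_rate(successes, failures):
    return failures / (successes + failures)

# Failure rate within each group (conditioning on the lurking variable).
by_group = {
    group: {proc: failure_rate(*cell) for proc, cell in procs.items()}
    for group, procs in data.items()
}

# Failure rate after pooling the groups (ignoring stone size).
overall = {}
for proc in ("open surgery", "PCNL"):
    successes = sum(data[group][proc][0] for group in data)
    failures = sum(data[group][proc][1] for group in data)
    overall[proc] = failure_rate(successes, failures)

for group, rates in by_group.items():
    print(group, {proc: f"{rate:.0%}" for proc, rate in rates.items()})
print("overall", {proc: f"{rate:.0%}" for proc, rate in overall.items()})
```

Open surgery has the lower failure rate inside each group but the higher one in the pooled table, because most open-surgery patients fall in the harder large-stone group.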