M&Ms Two-way Tables

Ellen Gundlach
STAT 301 Course Coordinator
Purdue University
M&Ms Color Distribution % (according to their website)
                         Brown  Yellow     Red    Blue  Orange   Green
Plain                       13      14      13      24      20      16
Peanut                      12      15      12      23      23      15
Peanut Butter/Almond        10      20      10      20      20      20
Skittles Color Distribution % (according to their hotline)
               Red  Orange  Yellow   Green  Purple
Skittles        20      20      20      20      20
My M&Ms data in counts
           Brown  Yellow     Red    Blue  Orange   Green   Total
Plain         14      10      10       8       4       8      54
Peanut         2       3       5       0       8       4      22
Total         16      13      15       8      12      12      76
My M&Ms data: joint %
(divide counts by total = 76)
           Brown  Yellow     Red    Blue  Orange   Green
Plain       18.4    13.2    13.2    10.5     5.3    10.5
Peanut       2.6     3.9     6.6       0    10.5     5.3
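A minimal Python sketch of the "divide counts by total" step, assuming the counts are stored as one list per flavor in the slide's color order (the names `COLORS`, `COUNTS`, and `joint_percentages` are illustrative, not from the original):

```python
# Illustrative layout: one list of counts per flavor, columns in this color order.
COLORS = ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]
COUNTS = {
    "Plain": [14, 10, 10, 8, 4, 8],
    "Peanut": [2, 3, 5, 0, 8, 4],
}


def joint_percentages(counts):
    """Divide every cell by the grand total (76 here) and express it as a percent."""
    grand_total = sum(sum(row) for row in counts.values())
    return {
        flavor: [100 * c / grand_total for c in row]
        for flavor, row in counts.items()
    }


joint = joint_percentages(COUNTS)
for flavor, row in joint.items():
    print(flavor, [round(p, 1) for p in row])
```

Rounded to one decimal, this reproduces the joint % table (the blue/peanut cell prints as 0.0), and all twelve joint percentages add to 100.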
My M&Ms data: marginal %s for color
(add down the columns)
                    Brown  Yellow     Red    Blue  Orange   Green   Total
Plain                18.4    13.2    13.2    10.5     5.3    10.5
Peanut                2.6     3.9     6.6       0    10.5     5.3
Marg. for color      21.0    17.1    19.8    10.5    15.8    15.8     100
My M&Ms data: marginal %s for flavor
(add across the rows)
           Brown  Yellow     Red    Blue  Orange   Green  Marg. for flavor
Plain       18.4    13.2    13.2    10.5     5.3    10.5              71.1
Peanut       2.6     3.9     6.6       0    10.5     5.3              28.9
Total                                                                  100
My M&Ms data: joint and marginal %s
                    Brown  Yellow     Red    Blue  Orange   Green  Marg. for flavor
Plain                18.4    13.2    13.2    10.5     5.3    10.5              71.1
Peanut                2.6     3.9     6.6       0    10.5     5.3              28.9
Marg. for color      21.0    17.1    19.8    10.5    15.8    15.8               100
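The same marginals can be computed directly from the counts. In this hedged sketch (variable names are illustrative), working from the unrounded counts gives brown 16/76 ≈ 21.1% and red 15/76 ≈ 19.7%, whereas the table's 21.0 and 19.8 come from adding joint percentages that were already rounded; every other marginal agrees with the table.

```python
COLORS = ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]
COUNTS = {
    "Plain": [14, 10, 10, 8, 4, 8],
    "Peanut": [2, 3, 5, 0, 8, 4],
}
grand_total = sum(sum(row) for row in COUNTS.values())  # 76

# Marginal % for color: add down each column, then divide by the grand total.
color_marginal = {
    color: 100 * sum(row[j] for row in COUNTS.values()) / grand_total
    for j, color in enumerate(COLORS)
}

# Marginal % for flavor: add across each row, then divide by the grand total.
flavor_marginal = {
    flavor: 100 * sum(row) / grand_total for flavor, row in COUNTS.items()
}

print({color: round(p, 1) for color, p in color_marginal.items()})
print({flavor: round(p, 1) for flavor, p in flavor_marginal.items()})
```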
Conditional distribution of flavor for color
• We know the color of our M&M already, but now how is flavor distributed for this color?
$$\text{conditional \% of flavor for a color} = \frac{\text{joint \% of color and flavor}}{\text{marginal \% of color}}$$
Conditional distribution example
• We know we have a red M&M, so what is the probability it is a plain M&M?
$$\frac{\text{joint \% of red and plain}}{\text{marginal \% of red}} = \frac{13.2}{19.8} \approx 66.7\%$$
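Because the grand total divides out of both the joint and the marginal %, the same answer comes straight from the counts: 10 plain out of 15 red. A small check (names are illustrative):

```python
# Red column of the "My M&Ms data in counts" table.
red_plain, red_peanut = 10, 5
grand_total = 76

joint_red_plain = 100 * red_plain / grand_total              # about 13.2
marginal_red = 100 * (red_plain + red_peanut) / grand_total  # about 19.7
p_plain_given_red = joint_red_plain / marginal_red           # equals 10 / 15

print(f"{100 * p_plain_given_red:.1f}%")  # 66.7%
```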
Conditional distribution of color for flavor
• We know the flavor of our M&M already, but now how is color distributed for this flavor?
$$\text{conditional \% of color for a flavor} = \frac{\text{joint \% of color and flavor}}{\text{marginal \% of flavor}}$$
Conditional distribution example
• We know we have a peanut M&M, so what is the probability it is green?
$$\frac{\text{joint \% of peanut and green}}{\text{marginal \% of peanut}} = \frac{5.3}{28.9} \approx 18.3\%$$
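Working from counts instead of rounded percentages changes the last digit here: 4 green out of 22 peanut M&Ms is about 18.2%, while 5.3/28.9 gives 18.3% because both inputs were already rounded. A sketch comparing the two (names are illustrative):

```python
# Peanut row of the "My M&Ms data in counts" table.
peanut_counts = {"Brown": 2, "Yellow": 3, "Red": 5, "Blue": 0, "Orange": 8, "Green": 4}
peanut_total = sum(peanut_counts.values())  # 22

# Conditional % of green given peanut, straight from the counts.
exact = 100 * peanut_counts["Green"] / peanut_total

# The same ratio using the rounded joint and marginal %s from the slide.
from_rounded = 100 * 5.3 / 28.9

print(f"from counts: {exact:.1f}%, from rounded %s: {from_rounded:.1f}%")
# from counts: 18.2%, from rounded %s: 18.3%
```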
Conditional distributions in general
Conditional distribution of X for Y (we already know Y for sure, but we want the probability, or %, that X is also true):
$$\frac{\text{joint \% of } X \text{ and } Y}{\text{marginal \% of } Y \text{ (what we know for sure)}}$$
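Applying the general formula to every cell of the counts table gives the full conditional distribution of color for each flavor, which is what the bar graphs that follow plot. In this sketch, which assumes the same row layout as the counts table, the grand total cancels, so each count is simply divided by its row total; `conditional_by_row` is an illustrative name.

```python
from typing import Dict, List

COLORS = ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]
COUNTS = {
    "Plain": [14, 10, 10, 8, 4, 8],
    "Peanut": [2, 3, 5, 0, 8, 4],
}


def conditional_by_row(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Conditional % of each column (X) given the row (Y, the thing we know).

    (cell / N) / (row total / N) == cell / row total, so N is not needed.
    """
    return {
        row_name: [100 * cell / sum(row) for cell in row]
        for row_name, row in counts.items()
    }


for flavor, pcts in conditional_by_row(COUNTS).items():
    print(flavor, dict(zip(COLORS, (round(p, 1) for p in pcts))))
```

Each flavor's conditional percentages add to 100%, which is why the two bar graphs are directly comparable even though there are more plain than peanut M&Ms.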
Bar graphs for conditional distribution of color for both flavors
[Figure: two bar graphs, "Conditional distribution of color for Milk Chocolate M&Ms" (plain) and "Conditional distribution of color for Peanut M&Ms"; x-axis: color; y-axis: percent.]
Chi-squared hypothesis test
H0: There is no association between color
distribution and flavor for M&Ms.
Ha: There is an association between color distribution and flavor for M&Ms.
Use α = 0.01 for this story.
Full-class M&Ms data in counts
(large sample size necessary for test)
           Brown  Yellow     Red    Blue  Orange   Green
Plain        147     302     264     407     330     373
Peanut        69     110      70     162     148     123
Chi-squared test SPSS results
Chi-Square Tests
                      Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square    14.396 (a)  5    .013
Likelihood Ratio      14.623      5    .012
N of Valid Cases      2505
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 58.81.
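As a cross-check on the SPSS output, here is a hedged pure-Python sketch of the Pearson chi-squared test of independence for the full-class counts. It assumes the cells are arranged as in the comments below, and the helper names are illustrative. The P-value uses the standard power series for the regularized lower incomplete gamma function rather than a statistics library.

```python
import math

# Full-class counts; columns: Brown, Yellow, Red, Blue, Orange, Green.
OBSERVED = [
    [147, 302, 264, 407, 330, 373],  # Plain
    [69, 110, 70, 162, 148, 123],    # Peanut
]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom, smallest expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = math.inf
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, min_expected


def chi_square_sf(x, df):
    """P(X > x) for a chi-squared variable with df degrees of freedom (x > 0).

    Computes 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function, via its power series.
    """
    a, z = df / 2, x / 2
    term = 1.0 / a
    series = term
    k = 1
    while term > 1e-16 * series:
        term *= z / (a + k)
        series += term
        k += 1
    lower = series * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


stat, df, min_expected = pearson_chi_square(OBSERVED)
p_value = chi_square_sf(stat, df)
print(f"chi2 = {stat:.3f}, df = {df}, P = {p_value:.3f}, min expected = {min_expected:.2f}")
```

Run as-is, it should agree with the SPSS Pearson row: 14.396 on 5 degrees of freedom, P ≈ .013, and a minimum expected count of 58.81.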
Chi-squared test conclusions
• Test statistic = 14.396 and P-value = 0.013
• Since the P-value is greater than our α of 0.01, we do not reject H0.
• We do not have enough evidence to say there is an association between color distribution and flavor for M&Ms.
Skittles vs. M&Ms
• Now we will compare the proportion of
yellow candies for Skittles and for M&Ms.
• The previous two-way table with plain and
peanut M&Ms was of size 2 x 6.
• This table will be of size 2x2 because we
only care about whether a candy is yellow
or non-yellow.
Full-class M&Ms and Skittles data in counts
(large sample size necessary for test)
              Yellow  Non-yellow   Total
Plain M&Ms       302        1521    1823
Skittles         361        1351    1712
Total            663        2872    3535
Chi-squared hypothesis test
H0: There is no association between color
distribution and flavor for these candies.
Ha: There is an association between color distribution and flavor for these candies.
Use α = 0.01 for this story.
Chi-squared test SPSS results
Chi-Square Tests
                           Value       df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square         11.839 (b)  1    .001
Continuity Correction (a)  11.544      1    .001
Likelihood Ratio           11.840      1    .001
Fisher's Exact Test                                                    .001                   .000
N of Valid Cases           3535
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 321.09.
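The Pearson and continuity-corrected statistics in this output can be reproduced by hand. A minimal sketch, assuming the "Continuity Correction" row is the usual Yates correction (subtract 0.5 from each |observed - expected| before squaring); names are illustrative. With 1 degree of freedom, the upper-tail chi-squared probability is erfc(√(x/2)), so no statistics library is needed.

```python
import math

# Yellow vs. non-yellow counts: rows are Plain M&Ms and Skittles.
TABLE = [
    [302, 1521],  # Plain M&Ms: yellow, non-yellow
    [361, 1351],  # Skittles: yellow, non-yellow
]


def chi_square_2x2(table, yates=False):
    """Pearson chi-squared statistic for a 2x2 table, optionally Yates-corrected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff ** 2 / expected
    return stat


pearson = chi_square_2x2(TABLE)
corrected = chi_square_2x2(TABLE, yates=True)
# For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(pearson / 2))
print(f"Pearson = {pearson:.3f}, corrected = {corrected:.3f}, P = {p_value:.4f}")
```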
Chi-squared test conclusions
• Test statistic = 11.839 and P-value = 0.001
• Since the P-value is less than our α of 0.01, we reject H0.
• We have evidence that there is an association between color distribution and flavor for these candies.
Another way to do this test
Since this is a 2x2 table, if we are interested only in a two-sided (≠) hypothesis test, we can also use the 2-sample proportion test here.
2-sample proportion test hypotheses
H0: $p_{\text{M\&Ms}} = p_{\text{Skittles}}$
Ha: $p_{\text{M\&Ms}} \neq p_{\text{Skittles}}$
Defining the proportions
$$p_{\text{M\&Ms}} = \frac{\#\text{ yellow M\&Ms}}{\text{total \# M\&Ms}}$$
$$p_{\text{Skittles}} = \frac{\#\text{ yellow Skittles}}{\text{total \# Skittles}}$$
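Test statistic
$$Z = \frac{\hat{p}_{\text{M\&Ms}} - \hat{p}_{\text{Skittles}}}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_{\text{M\&Ms}}} + \dfrac{1}{n_{\text{Skittles}}}\right)}}$$
where $\hat{p}$ is the pooled proportion of yellow candies in both samples combined (here 663/3535 ≈ 0.188).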
Results from the proportion test
• Sample proportions: $\hat{p}_{\text{M\&Ms}} = 302/1823 \approx 0.166$ and $\hat{p}_{\text{Skittles}} = 361/1712 \approx 0.211$
• Test statistic Z = -3.44
• P-value = 2(0.0003) = 0.0006
• Since the P-value is less than our α of 0.01, we reject H0.
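The arithmetic on this slide can be sketched in Python as follows, using the pooled proportion (302 + 361)/(1823 + 1712) in the standard error (names are illustrative). The two-sided P-value uses 2·P(Z > |z|) = erfc(|z|/√2) from the standard library, so it comes out near 0.00058 before rounding.

```python
import math

yellow_mm, n_mm = 302, 1823  # plain M&Ms
yellow_sk, n_sk = 361, 1712  # Skittles

p_mm = yellow_mm / n_mm
p_sk = yellow_sk / n_sk
p_pool = (yellow_mm + yellow_sk) / (n_mm + n_sk)  # pooled proportion under H0

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_mm + 1 / n_sk))
z = (p_mm - p_sk) / se

# Two-sided P-value: 2 * P(Z > |z|) for a standard normal Z.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"p_mm = {p_mm:.3f}, p_sk = {p_sk:.3f}, z = {z:.2f}, P = {p_two_sided:.4f}")
```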
Conclusion to the proportion test
• We have evidence the proportion of yellow
M&Ms is not the same as the proportion of
yellow Skittles.
• In other words, the type of candy makes a
difference to the color distribution.
How do our results from the 2 tests compare?
• The χ² test statistic = 11.839, which is the square of the Z test statistic: (-3.44)² ≈ 11.83, and squaring the unrounded Z gives 11.839.
• If you take the rounding into account, the P-values for both tests are ≈ 0.001.
• We rejected H0 in both tests.
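For a 2x2 table, the Pearson χ² statistic (without continuity correction) equals the square of the pooled two-proportion Z statistic exactly; the small gap between 11.839 and (-3.44)² ≈ 11.83 is only rounding in Z. A short sketch that checks this for the candy data (function names are illustrative):

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Two-proportion Z statistic with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pool = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))


def pearson_2x2(x1, n1, x2, n2):
    """Pearson chi-squared statistic for the table [[x1, n1-x1], [x2, n2-x2]]."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    rows = [n1, n2]
    cols = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


z = pooled_z(302, 1823, 361, 1712)
chi2 = pearson_2x2(302, 1823, 361, 1712)
print(f"z = {z:.4f}, z^2 = {z ** 2:.3f}, chi2 = {chi2:.3f}, (-3.44)^2 = {(-3.44) ** 2:.3f}")
```

Because the identity is exact, the two-sided Z test and the chi-squared test always give the same P-value on a 2x2 table; only the Z test can also be run one-sided.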
When do you use which test?
• Chi-squared tests are best for:
   - two-sided hypothesis tests only
   - 2x2 or bigger tables
• Proportion (Z) tests are best for:
   - one- or two-sided hypothesis tests
   - only 2x2 tables