Contingency Table Analysis


07
Class 06
Case: The Roulette Wheel
Goodness of Fit Tests
EMBS 11.2
What we learned last class
• Probability Distributions have characteristics
• Descriptive Statistics are used to estimate those
characteristics.
– Location (mean, median, mode)
– Variability (variance and standard deviation)
– Shape (skewness)
• The MEAN is important.
– Sample mean “value” times n is total value.
• Measures of variability are under-appreciated
Descriptive Statistics Matter
Baseball Statistics
• A batter came to the plate
five times
– Got a hit
– Struck Out
– Walked
– Flied Out
– Grounded Out
Batting Average = 1/4 = 0.250
(WALK is the fault of the pitcher: 1 success in 4 trials)
On Base Percentage = 2/5 = 0.400
(WALK is the fault of the batter: 2 successes in 5 trials)
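The two ratios differ only in how the walk is counted. A minimal Python sketch of that arithmetic, using the simplified definitions on the slide (the variable names are chosen here for illustration):

```python
# The five plate appearances listed on the slide.
outcomes = ["hit", "strikeout", "walk", "fly out", "ground out"]

hits = outcomes.count("hit")
walks = outcomes.count("walk")
plate_appearances = len(outcomes)

# Batting average drops the walk: it is not counted as a trial at all.
at_bats = plate_appearances - walks
batting_average = hits / at_bats

# On-base percentage counts the walk as a trial and as a success.
on_base_percentage = (hits + walks) / plate_appearances

print(f"Batting average:    {batting_average:.3f}")
print(f"On-base percentage: {on_base_percentage:.3f}")
```

The same five outcomes give two different summaries, depending on a choice about what counts as a trial.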
The Roulette wheel.
• Surveillance video of 18 hours of play of a
roulette wheel in a Reno, Nevada casino
– 904 spins of the wheel
– 22,527 bets placed
Outcome  Frequency  Number of Bets
00       22         354
0        25         442
1        23         362
2        30         450
3        28         357
4        15         375
5        28         636
6        20         363
7        15         682
8        26         633
9        23         503
10       24         484
11       26         783
12       21         360
13       21         525
14       27         649
15       27         340
16       25         643
17       23         1,079
18       23         518
19       30         595
20       24         983
21       26         447
22       32         576
23       24         746
24       18         461
25       19         521
26       15         703
27       22         490
28       25         827
29       23         878
30       33         695
31       22         664
32       29         925
33       17         613
34       29         597
35       22         627
36       22         641
Total    904        22,527
First We Examine the Wheel
• H0: The wheel works properly
• H0: All 38 outcomes have equal probability of occurring
• H0: P0=P28=P9=…=P2=1/38
• HA: They are not all equal
Like before, the hypothesis is about a parameter of a probability distribution.
Outcome  Observed  Expected  Distance
00       22        23.79     0.13
0        25        23.79     0.06
1        23        23.79     0.03
2        30        23.79     1.62
3        28        23.79     0.75
4        15        23.79     3.25
5        28        23.79     0.75
6        20        23.79     0.60
7        15        23.79     3.25
8        26        23.79     0.21
9        23        23.79     0.03
10       24        23.79     0.00
11       26        23.79     0.21
12       21        23.79     0.33
13       21        23.79     0.33
14       27        23.79     0.43
15       27        23.79     0.43
16       25        23.79     0.06
17       23        23.79     0.03
18       23        23.79     0.03
19       30        23.79     1.62
20       24        23.79     0.00
21       26        23.79     0.21
22       32        23.79     2.83
23       24        23.79     0.00
24       18        23.79     1.41
25       19        23.79     0.96
26       15        23.79     3.25
27       22        23.79     0.13
28       25        23.79     0.06
29       23        23.79     0.03
30       33        23.79     3.57
31       22        23.79     0.13
32       29        23.79     1.14
33       17        23.79     1.94
34       29        23.79     1.14
35       22        23.79     0.13
36       22        23.79     0.13
Total    904       904       31.20
Goodness-of-fit Test
1. Calculate the expected counts under H0.
2. Calculate the distances as (O-E)^2/E.
3. The sum of the distances is the test statistic. We call it the calculated chi-squared.
4. We reject H0 (and say the results are statistically significant) if the calculated chi-squared statistic is too big.
(O  E )
 
E
cells
2
2
The calculated
chi-squared
statistic
The sum of the
distances.
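The calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the course's own worksheet: the observed counts are copied from the roulette table, and the function name `chi_squared_statistic` is an assumption made here.

```python
# Observed frequencies for outcomes 00, 0, 1, 2, ..., 36 (from the slides).
observed = [
    22, 25, 23, 30, 28, 15, 28, 20, 15, 26, 23, 24, 26, 21, 21, 27, 27, 25, 23,
    23, 30, 24, 26, 32, 24, 18, 19, 15, 22, 25, 23, 33, 22, 29, 17, 29, 22, 22,
]


def chi_squared_statistic(observed, expected):
    """Sum of the per-cell distances (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n_spins = sum(observed)   # 904 spins
n_cells = len(observed)   # 38 pockets on the wheel

# Under H0 every pocket has probability 1/38, so every expected count is n/38.
expected = [n_spins / n_cells] * n_cells

statistic = chi_squared_statistic(observed, expected)
print(f"expected count per cell: {expected[0]:.2f}")
print(f"calculated chi-squared:  {statistic:.2f}")
```

Up to rounding, this should reproduce the expected count of 23.79 per cell and the total distance of 31.20 shown in the table.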
We need a p-value!
• For the lady tasting tea (number correct depends on n and P)
P(X≥8 | H0) = 1-binomdist(7,10,.5,true)
P-value = 0.055
• For a GOODNESS OF FIT TEST (χ2 depends on number of cells - 1)
P(χ2≥ calculated χ2 | H0) = chidist(calculated χ2, dof)
dof stands for “degrees of freedom”
dof is the parameter of the chi-squared distribution
dof here is 37, the number of cells - 1.
P(χ2≥ 31.2 | H0) = chidist(31.2,37)
Can also use =chisq.dist.rt(31.2,37)
P-value = 0.74
NOT statistically significant.
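For readers working outside Excel, both p-values can be checked with the Python standard library alone. The sketch below is an assumption-laden stand-in for `binomdist` and `chidist`, not their implementation: the function names are invented here, and the chi-squared tail uses the textbook finite-series formulas that hold for whole-number degrees of freedom.

```python
import math


def binomial_upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


def chi2_upper_tail(x, dof):
    """P(chi-squared >= x) for a positive integer dof (finite-series form)."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{r=0}^{dof/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, dof // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: two-sided normal tail plus a finite correction series.
    chi = math.sqrt(x)
    normal_tail = math.erfc(chi / math.sqrt(2))          # 2 * P(Z >= chi)
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)   # chi^(2r-1) / (1*3*...*(2r-1))
        total += term
    return normal_tail + 2 * normal_pdf * total


print(f"lady tasting tea: {binomial_upper_tail(8, 10, 0.5):.3f}")
print(f"roulette wheel:   {chi2_upper_tail(31.2, 37):.2f}")
```

If SciPy is available, `scipy.stats.chi2.sf(31.2, 37)` is the more usual route to the same upper-tail probability.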
WARNING
• The chi-squared test
does not work well
when some cells have
low expected counts.
• If some cells have
expected counts < 5,
combine them with
neighboring cells.
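The slide does not say how to do the combining, so the following Python sketch shows just one simple possibility: sweep left to right and keep pooling adjacent cells until the pooled expected count reaches 5. The function name `merge_small_cells` and the example counts are hypothetical.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Pool adjacent cells until every pooled expected count is >= min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0, 0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0
    if acc_exp > 0:
        # Leftover cells at the end join the last pooled cell (if there is one).
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


# Hypothetical counts, made up only to exercise the function.
obs, exp = merge_small_cells([1, 3, 10, 12, 4, 2], [2, 3, 9, 11, 4, 3])
print(obs, exp)
```

After combining, the degrees of freedom are based on the new, smaller number of cells.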
Roulette Wheel Demonstration
H0: All 38 are equally likely to get bet on.
Ha: The p’s are not equal. (Some segments
are more popular than others)
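The same steps apply to the bets. A short Python sketch, using the Number of Bets column from the roulette table; the slides do not report the resulting statistic, so none is quoted here, and the variable names are illustrative.

```python
# Number of bets on outcomes 00, 0, 1, 2, ..., 36 (from the slides).
bets = [
    354, 442, 362, 450, 357, 375, 636, 363, 682, 633, 503, 484, 783, 360, 525,
    649, 340, 643, 1079,
    518, 595, 983, 447, 576, 746, 461, 521, 703, 490, 827, 878, 695, 664, 925,
    613, 597, 627, 641,
]

total_bets = sum(bets)                 # 22,527 bets
expected = total_bets / len(bets)      # equal popularity under H0

# Per-cell distances (O - E)^2 / E and their sum.
distances = [(b - expected) ** 2 / expected for b in bets]
statistic = sum(distances)
dof = len(bets) - 1

print(f"expected bets per number: {expected:.1f}")
print(f"calculated chi-squared:   {statistic:.1f} with {dof} degrees of freedom")
```

The p-value then follows as before from chidist(calculated χ2, 37).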
Lorex Pharma
H0: The Fill Amounts are Normally
Distributed with μ=10.2 and σ=0.16
Ha: They are not…
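For a continuous distribution, the expected counts come from the probability that H0 assigns to each bin. The sketch below uses μ=10.2 and σ=0.16 from the slide; the bin edges and the sample size of 100 are hypothetical placeholders, since the Lorex data are not reproduced here.

```python
from statistics import NormalDist


def expected_counts_normal(edges, n, mu=10.2, sigma=0.16):
    """Expected counts for bins (-inf, e0], (e0, e1], ..., (ek, +inf)."""
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(e) for e in edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical bin edges at mu - sigma, mu, mu + sigma; n = 100 is also made up.
edges = [10.04, 10.20, 10.36]
counts = expected_counts_normal(edges, n=100)
for c in counts:
    print(f"{c:.2f}")
```

These expected counts would then be compared with the observed bin counts using the same (O-E)^2/E distances. Because H0 states μ and σ outright, the degrees of freedom remain the number of cells - 1.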
Assignment 08
• Due Monday, Feb 13
• Youth Soccer (football) teams from several countries
compete annually in an important international
tournament.
• The birth months (Jan=1, Feb=2, .. Dec=12) of the 288 boys
competing in the 2005 under 16 division showed higher
counts for the early months and lower counts for the later
months.
• Formulate and test a relevant hypothesis
• If you find statistical significance, offer an opinion about how
it came to be that early months are more prevalent.
Helsen, W.F., Van Winckel, J., and Williams, M., The relative age
effect in youth soccer across Europe, Journal of Sports Sciences, June
2005; 23(6): 629-636.
The Data look like…..
ID    Birth Month
1     5
2     5
3     3
.     .
.     .
.     .
285   3
286   2
287   1
288   1
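Data in this one-row-per-player layout must be tallied into one count per month before any goodness-of-fit calculation. A small Python sketch of that tally, using only the seven rows visible on the slide (the full file has 288 rows); the variable names are illustrative.

```python
from collections import Counter

# Only the rows shown on the slide: IDs 1-3 and 285-288.
birth_months = [5, 5, 3, 3, 2, 1, 1]

counts = Counter(birth_months)

# One observed count per month, Jan=1 ... Dec=12; months not seen get 0.
observed = [counts[month] for month in range(1, 13)]
print(observed)
```

With all 288 rows loaded, `observed` would hold the twelve cell counts for the test.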
http://faculty.darden.virginia.edu/Pfeiferp/Statisticsinbusiness/assignments.htm