Chapter 11 Chi-Square Distribution

Review
• So far, we have done hypothesis tests and built
confidence intervals using the normal distribution
and Student's t distribution.
• In this section, we will use the chi-square distribution.
What is Chi-Square?
• $\chi^2$ = chi-square
• Values of $\chi^2$ start at 0 and are never negative. The graph
of the $\chi^2$ distribution is not symmetrical and, like Student's t
distribution, its shape depends on the number of degrees of freedom.
• A chi-square test can determine whether two random variables are
dependent or independent.
• It can also determine whether different populations share the same
proportions of specified characteristics.
Example:
Mode (high point)
• The mode (high point) of a chi-square
distribution with n degrees of freedom occurs
over n-2 (for 𝑛 ≥ 3)
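As a quick numerical check of the mode rule, the sketch below evaluates the standard chi-square density on a grid and reports where it peaks. The function names `chi2_pdf` and `approx_mode` and the grid settings are illustrative choices, not part of the slides.

```python
import math

def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom, for x > 0."""
    half_n = n / 2
    return x ** (half_n - 1) * math.exp(-x / 2) / (2 ** half_n * math.gamma(half_n))

def approx_mode(n, upper=40.0, step=0.001):
    """Grid search for the x that maximizes the density (approximate, not exact)."""
    count = int(upper / step)
    grid = [step * i for i in range(1, count + 1)]
    return max(grid, key=lambda x: chi2_pdf(x, n))

for n in (3, 5, 10):
    # Each reported peak should sit close to n - 2.
    print(n, round(approx_mode(n), 2))
```

The grid search is only a sanity check; the rule n - 2 itself comes from setting the derivative of the density to zero.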
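Formula for $\chi^2$
• $\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all cells
• O = observed frequency
• E = expected frequency
• $E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$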
Degrees of Freedom
• Degrees of freedom = (number of rows - 1)(number of columns - 1)
• $d.f. = (R - 1)(C - 1)$
• R = number of cell rows
• C = number of cell columns
Example: (The situation)
• Innovative Machines Incorporated has developed
two new letter arrangements for computer
keyboards. The company wishes to see if there is
any relationship between the arrangement of
letters on the keyboard and the number of hours
it takes a new typing student to learn to type at
20 words per minute. Or, from another point of
view, is the time it takes a student to learn to type
independent of the arrangement of the letters on
a keyboard? Use a 5% level of significance.
Example: (step 1)
• $H_0$: Keyboard arrangement and learning times
are independent
• $H_A$: Keyboard arrangement and learning times
are not independent
Example: (chart)
Step 2: Determine E
Answer for E (will show in class)
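| Keyboard | 21-40 h | 41-60 h | 61-80 h | Row Total |
| --- | --- | --- | --- | --- |
| A | O: 25, E: 24 | O: 30, E: 40 | O: 25, E: 16 | 80 |
| B | O: 30, E: 36 | O: 71, E: 60 | O: 19, E: 24 | 120 |
| Standard | O: 35, E: 30 | O: 49, E: 50 | O: 16, E: 20 | 100 |
| Column Total | 90 | 150 | 60 | 300 (sample size) |

Remember: $E = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$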
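The expected counts for the keyboard example can be reproduced with a few lines of Python. This is a minimal sketch using only the observed counts from the chart; the helper name `expected_counts` is an illustrative choice.

```python
# Observed counts from the keyboard example
# (rows: A, B, Standard; columns: 21-40 h, 41-60 h, 61-80 h).
observed = [
    [25, 30, 25],
    [30, 71, 19],
    [35, 49, 16],
]

def expected_counts(table):
    """E = (row total)(column total) / (sample size) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Each printed row should match the E values in the chart for keyboards A, B, and Standard.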
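Chart to find $(O - E)^2 / E$

| Cell | $O$ | $E$ | $O - E$ | $(O - E)^2$ | $(O - E)^2 / E$ |
| --- | --- | --- | --- | --- | --- |
| 1 | 25 | 24 | 1 | 1 | 0.04 |
| 2 | 30 | 40 | -10 | 100 | 2.50 |
| 3 | 25 | 16 | 9 | 81 | 5.06 |
| 4 | 30 | 36 | -6 | 36 | 1.00 |
| 5 | 71 | 60 | 11 | 121 | 2.02 |
| 6 | 19 | 24 | -5 | 25 | 1.04 |
| 7 | 35 | 30 | 5 | 25 | 0.83 |
| 8 | 49 | 50 | -1 | 1 | 0.02 |
| 9 | 16 | 20 | -4 | 16 | 0.80 |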
What is $\chi^2$ then?
• $\chi^2 = \sum \frac{(O - E)^2}{E}$
• Add up all the numbers in the $(O - E)^2 / E$ column:
0.04 + 2.50 + 5.06 + 1.00 + 2.02 + 1.04 + 0.83 + 0.02 + 0.80
• $\chi^2 = 13.31$
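The column sum can be checked with a short sketch that uses the nine observed and expected counts from the keyboard example. The variable names are illustrative. Summing the terms after rounding each to two decimals gives the slide's 13.31, while summing the unrounded terms gives about 13.316, so small differences from software output are expected.

```python
observed = [25, 30, 25, 30, 71, 19, 35, 49, 16]   # cells 1-9
expected = [24, 40, 16, 36, 60, 24, 30, 50, 20]

def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

# Per-cell contributions rounded to two decimals, as in the chart.
terms = [round((o - e) ** 2 / e, 2) for o, e in zip(observed, expected)]
chi2_rounded_terms = round(sum(terms), 2)
chi2_unrounded = chi_square(observed, expected)

print(terms)
print(chi2_rounded_terms, round(chi2_unrounded, 3))
```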
Example: (Degrees of freedom for test
of independence)
• $d.f. = (R - 1)(C - 1)$
• $d.f. = (3 - 1)(3 - 1) = 2 \cdot 2 = 4$
• d.f. = 4
Conclusion
• Look up the chi-square table in the book.
• We have $\chi^2 = 13.31$ with d.f. = 4.
• The corresponding P-value falls between 0.005 and
0.010.
• Since 0.005 < P-value < 0.010, the P-value is less than 0.05, so we
reject the null and accept the alternate. At the 5% level of
significance, we conclude that keyboard arrangement and learning
time are not independent, accepting up to a 5% chance of rejecting
a null hypothesis that is actually true.
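Instead of bracketing the P-value from a table, it can be computed directly. The sketch below assumes the standard closed form for the right-tail area of a chi-square distribution, which holds only when the degrees of freedom are even; the function name `chi2_sf_even_df` is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Right-tail area P(chi-square > x); this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

chi2_stat = 13.31   # sample statistic from the keyboard example
df = 4
alpha = 0.05

p_value = chi2_sf_even_df(chi2_stat, df)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 4), decision)
```

The computed P-value should land inside the table's interval of 0.005 to 0.010, which leads to the same decision as the slide.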
Group Work (the situation)
• A vending machine company is to install soda machines in
elementary schools and high schools. The
market analyst wishes to know whether flavor
preference and school level are independent.
A random sample of 200 students was taken.
Their school levels and soda preferences are
given. Is independence indicated at the 1%
level of significance?
Group Work (table)
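| Soda | High School | Elementary | Row Total |
| --- | --- | --- | --- |
| Coke | O: 33, E: | O: 57, E: | 90 |
| Pepsi | O: 30, E: | O: 20, E: | 50 |
| Mountain Dew | O: 5, E: | O: 35, E: | 40 |
| Fanta | O: 12, E: | O: 8, E: | 20 |
| Column Total | 80 | 120 | 200 (sample size) |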
How to Test for independence of two
statistical variables
• Look at Pg 582. Copy it and follow it!
Test of homogeneity
• A test of homogeneity tests the claim that different
populations share the same proportions of specified
characteristics.
Test of Homogeneity
• The procedure is very much the same as the test of independence, except that the
hypotheses are different.
• For a test of independence:
$H_0$: The variables are independent.
$H_A$: The variables are not independent.
• For a test of homogeneity:
$H_0$: Each population shares respective characteristics in the same proportion.
$H_A$: Some populations have different proportions of respective characteristics.
Example:
• If you could own one pet, what kind would
you choose? The responses are shown below.
Do males and females prefer each type of
pet in the same proportions? Use a 1% level
of significance.
| Gender | Dog | Cat | Other pet | No Pet |
| --- | --- | --- | --- | --- |
| Female | 120 | 132 | 18 | 30 |
| Male | 135 | 70 | 20 | 25 |
Fill this out
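| Gender | Dog | Cat | Other pet | No Pet | Row Total |
| --- | --- | --- | --- | --- | --- |
| Female | O: 120, E: | O: 132, E: | O: 18, E: | O: 30, E: | |
| Male | O: 135, E: | O: 70, E: | O: 20, E: | O: 25, E: | |
| Column Total | | | | | |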
Answer
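| Gender | Dog | Cat | Other pet | No Pet | Row Total |
| --- | --- | --- | --- | --- | --- |
| Female | O: 120, E: 139.09 | O: 132, E: 110.18 | O: 18, E: 20.73 | O: 30, E: 30 | 300 |
| Male | O: 135, E: 115.91 | O: 70, E: 91.82 | O: 20, E: 17.27 | O: 25, E: 25 | 250 |
| Column Total | 255 | 202 | 38 | 55 | 550 (sample size) |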
Fill this out
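| Cell | $O$ | $E$ | $O - E$ | $(O - E)^2$ | $(O - E)^2 / E$ |
| --- | --- | --- | --- | --- | --- |
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
| 7 | | | | | |
| 8 | | | | | |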
Answer
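| Cell | $O$ | $E$ | $(O - E)^2 / E$ |
| --- | --- | --- | --- |
| 1 | 120 | 139.09 | 2.62 |
| 2 | 132 | 110.18 | 4.320 |
| 3 | 18 | 20.73 | 0.359 |
| 4 | 30 | 30 | 0 |
| 5 | 135 | 115.91 | 3.144 |
| 6 | 70 | 91.82 | 5.185 |
| 7 | 20 | 17.27 | 0.431 |
| 8 | 25 | 25 | 0 |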
Final Answer
• Chi-square = 16.059
• d.f. = 3
• P-value ≈ 0.001
• Since the P-value is less than 0.01, we reject the null
(the preferences are the same) and accept the
alternate (the preferences are different). At the 1%
level of significance, we conclude that male and female
students have different preferences when it comes to
selecting a pet, accepting up to a 1% chance of
rejecting a null hypothesis that is actually true.
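The whole homogeneity test for the pet data can be traced end to end in a short sketch. The tail-area helper `chi2_sf` is a hypothetical name; it assumes the standard recurrence that starts from the known right-tail area for 1 or 2 degrees of freedom and steps up by 2. Only the Python standard library is used.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for integer df >= 1.

    Starts from the df = 1 or df = 2 tail and steps df up by 2
    using the incomplete-gamma recurrence.
    """
    half = x / 2
    if df % 2 == 0:
        tail, k = math.exp(-half), 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail

observed = [
    [120, 132, 18, 30],   # Female: dog, cat, other pet, no pet
    [135, 70, 20, 25],    # Male
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all cells, with E = row total * column total / n.
chi2_stat = sum(
    (o - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = chi2_sf(chi2_stat, df)

print(round(chi2_stat, 3), df, round(p_value, 4))
```

The statistic and degrees of freedom should agree with the values on the slide, and the P-value should come out close to 0.001, below the 1% level.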
Homework Practice
• Pg 588 #1-15 even
CHI-SQUARE: GOODNESS OF FIT
Reason Behind Goodness of Fit
• Set up a test to investigate how well a sample
distribution fits a given distribution
• Use observed and expected frequencies to
compute the sample chi-square statistic
• Find or estimate the P-value and complete the
test
Hypothesis Testing
• $H_0$: The population fits the given distribution.
• $H_A$: The population has a different distribution.
Sample statistic
• $\chi^2 = \sum \frac{(O - E)^2}{E}$
• With degrees of freedom = k - 1
• E = expected frequency
• O = observed frequency
• k = number of categories in the distribution
Question
• Is the present distribution of favorable
responses the same as last year's, or different?
To test this, a random sample of 500 employees
was taken. The chart is on the next slide.
Use a 1% level of significance.
Example
| Category | Percentage of Favorable Responses |
| --- | --- |
| Vacation time | 4% |
| Salary | 65% |
| Safety regulations | 13% |
| Health and retirement benefits | 12% |
| Overtime policy and pay | 6% |

| Category | Observed |
| --- | --- |
| Vacation time | 30 |
| Salary | 290 |
| Safety regulations | 70 |
| Health and retirement benefits | 70 |
| Overtime | 40 |
Answer
• $H_0$: The distribution of responses is the same as last year.
• $H_A$: The distribution is different.

| Category | O | E | $(O - E)^2$ | $(O - E)^2 / E$ |
| --- | --- | --- | --- | --- |
| Vacation time | 30 | 20 | 100 | 5.00 |
| Salary | 290 | 325 | 1225 | 3.77 |
| Safety regulations | 70 | 65 | 25 | 0.38 |
| Health and retirement benefits | 70 | 60 | 100 | 1.67 |
| Overtime | 40 | 30 | 100 | 3.33 |
| Total | 500 | 500 | | 14.15 |
Answer
• $\chi^2 = 14.15$
• d.f. = k - 1 = 5 - 1 = 4
• 0.005 < P-value < 0.010, so the P-value is less than 0.01
• Reject the null; accept the alternate
• At the 1% level of significance, the evidence supports
the conclusion that this year's responses to the issues
are distributed differently from last year's: we reject
the null, which says they are the same, and accept the
alternate, which says they are different.
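The goodness-of-fit arithmetic can be sketched as follows: the expected counts come from applying last year's percentages to the sample of 500. The variable names are illustrative, and the P-value line assumes the closed-form right-tail area that is valid for 4 degrees of freedom only.

```python
import math

# Last year's shares: vacation, salary, safety, health/retirement, overtime.
last_year_share = [0.04, 0.65, 0.13, 0.12, 0.06]
observed = [30, 290, 70, 70, 40]

n = sum(observed)                                  # 500 employees
expected = [n * p for p in last_year_share]        # E = n * (category share)
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                             # k - 1

# Right-tail area for df = 4 only: exp(-x/2) * (1 + x/2).
half = chi2_stat / 2
p_value = math.exp(-half) * (1 + half)

print([round(e, 1) for e in expected])
print(round(chi2_stat, 2), df, round(p_value, 4))
```

The statistic should round to the slide's 14.15, and the P-value should fall between 0.005 and 0.010.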
Group Work
• The table gives the age distribution of the Canadian population and
the age distribution of a random sample of 455 residents in the
Indian community of Red Lake Village.

| Age | % of population | Observed in Red Lake Village |
| --- | --- | --- |
| Under 5 | 7.2% | 47 |
| 5-14 | 13.6% | 75 |
| 15-64 | 67.1% | 288 |
| 65 + | 12.1% | 45 |

• Use a 5% level of significance to test the claim that the age
distribution of the general Canadian population fits the age
distribution of Red Lake Village.
Answer
• $H_0$: The distributions are the same.
• $H_A$: The distributions are different.
• $\chi^2 = 11.788$; d.f. = 3
• 0.005 < P-value < 0.01
• Reject the null; accept the alternate
• ***insert conclusion***
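The answer slide gives only the final numbers, so the intermediate expected counts are worked out in the sketch below. The variable names are illustrative, and the P-value line assumes the closed-form right-tail area that is valid for 3 degrees of freedom only. Unrounded arithmetic gives a statistic of about 11.789, which agrees with the slide's 11.788 up to rounding.

```python
import math

population_share = [0.072, 0.136, 0.671, 0.121]   # Under 5, 5-14, 15-64, 65+
observed = [47, 75, 288, 45]                      # Red Lake Village sample

n = sum(observed)                                 # 455 residents
expected = [n * p for p in population_share]      # E = n * (population share)
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                            # k - 1

# Right-tail area for df = 3 only: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
p_value = (math.erfc(math.sqrt(chi2_stat / 2))
           + math.sqrt(2 * chi2_stat / math.pi) * math.exp(-chi2_stat / 2))

print([round(e, 2) for e in expected])
print(round(chi2_stat, 3), df, round(p_value, 4))
```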
Homework Practice
• Pg 597 #1-18 even