Chapter 11 Slides

Download Report

Transcript Chapter 11 Slides

INTRODUCTORY STATISTICS
Chapter 11 THE CHI-SQUARE DISTRIBUTION
PowerPoint Image Slideshow
FIGURE 11.1
The chi-square distribution can be used to find relationships between two things, like
grocery prices at different stores. (credit: Pete/flickr)
SEC. 11.2: THE CHI-SQUARE DISTRIBUTION
The notation for the chi-square distribution is:
๐œ’~๐œ’๐‘‘๐‘“
2
where df = degrees of freedom which depends on how chi-square is
being used. (If you want to practice calculating chi-square
probabilities then use df = n - 1. The degrees of freedom for the three
major uses are each calculated differently.)
For the ฯ‡2 distribution, the population mean is ฮผ = df and the
population standard deviation is ฯƒ= 2(๐‘‘๐‘“)
FACTS ABOUT CHI-SQUARE
1. The curve is nonsymmetrical and skewed to the right.
2.
There is a different chi-square curve for each df.
3. The test statistic for any test is always greater than or equal to zero.
4. When df > 90, the chi-square curve approximates the normal distribution. For the
distribution with 1,000 degrees of freedom, ฮผ = df = 1,000 and the standard
deviation, ฯƒ = 2(1,000)= 44.7. Therefore, X ~ N(1,000, 44.7), approximately.
5. The mean, ฮผ, is located just to the right of the peak.
SEC. 11.3: GOODNESS-OF-FIT TESTS
In this type of hypothesis test, you determine whether the data "fit" a particular
distribution or not.
Note: The expected value for each cell needs to be at least five in order for you to use
this test.
The table below shows the results of a survey asking 30 students how long they
studied for an exam. Can we use chi-square to test if the frequency matches the
expected frequency?
Hours spent
studying for
exam
Frequency
Expected
Frequency
0-1
4
3
1-2
9
5
2-3
10
12
4-5
4
6
5-6
3
4
THE TEST STATISTIC
2
The test statistic for a goodness-of-fit test is: ๐œ’
=
(๐‘‚โˆ’๐ธ)2
๐‘˜
๐ธ
where:
O = observed values (data)
E = expected values (from theory)
k = the number of different data cells or categories
The observed values are the data values and the expected values are the values
you would expect to get if the null hypothesis were true.
The number of degrees of freedom is df = (number of categories โ€“ 1)
THE P-VALUE
When we calculate our test statistic, our p-value is going to be the probability of getting
that value or higher in the chi-square distribution.
In the picture above, the test-statistic, ๐œ’ 2 is 29.65, so the p-value is p(๐œ’ 2 > 29.65).
Note: We would need to know the degrees of freedom to know which chi-square
distribution we are using.
USING YOUR CALCULATOR
You can calculate the test statistic by hand, making a table and adding the values of
(๐‘‚โˆ’๐ธ)2
๐ธ
together, but there is a function on your calculator that calculates the test-statistic
and p-value for you.
Enter your observed data into ๐ฟ1 , expected data into ๐ฟ2 and use the ๐œ’ 2 GOF-Test found
in the STAT, TESTS menu. Make sure that the observed and expected lists fit and enter
the correct degrees of freedom for df.
EXAMPLE
European crossbills (Loxia curvirostra) have the tip of the upper bill either right or left of
the lower bill, which helps them extract seeds from pine cones. Some have
hypothesized that frequency-dependent selection would keep the number of right and
left-billed birds at a 1:1 ratio. Groth (1992) observed 1752 right-billed and 1895 leftbilled crossbills.
Bill Type
Right
Left
Observed
Expected
HYPOTHESIS TEST
๐ป0 : The observed data fits the 1:1 ratio of bill types
๐ป๐‘Ž: The observed data does not fit the 1:1 ratio of bill types
Degrees of freedom: 1
Distribution: ๐œ’1
2
Test statistic: ๐œ’ 2 = 5.61
P-value: p=0.0179
The p-value represents the probability of getting the observed data if the ratio is
actually 1:1 for bill types.
๐›ผ=0.05
Decision: Reject ๐ป0 since ๐›ผ > ๐‘
Conclusion: At the 5% significance level, there is sufficient evidence to say that the bill
types do not occur at the same frequency. There are significantly more left-billed birds.
ANOTHER BIOLOGY EXAMPLEโ€ฆ
Shivrain et al. (2006) crossed clearfield rice, which are resistant to the
herbicide imazethapyr, with red rice, which are susceptible to
imazethapyr. They then crossed the hybrid offspring and examined
the F2 generation, where they found 772 resistant plants, 1611
moderately resistant plants, and 737 susceptible plants. If resistance
is controlled by a single gene with two co-dominant alleles, you would
expect a 1:2:1 ratio. Do the results fit the expected ratio?
CREATE A TABLE, PERFORM HYPOTHESIS TEST
Result
Observed
Expected
Resistant
772
780
Moderately
resistant
1611
1560
Susceptible
737
780
Null hypothesis: observed data fits expected results
Alternative hypothesis: observed data is significantly different than expected
Df=2
Distribution: ๐œ’2
2
Test statistic: ๐œ’ 2 = 4.12
P-value: p=0.127
๐›ผ=0.05
Decision: Do not reject ๐ป0 since p > ๐›ผ
Conclusion: At the 5% significance level, there is sufficient evidence to say that
the observed results fit the expected 1:2:1 ratio.
PRACTICE
After Halloween, a child is curious about the distribution of colors of
M&M candies. She claims that the six colors are distributed equally,
while her parents claim they are not. She counts all of her M&Ms (on
her way to eating them) and out of 264 candies she got the following
results. Is she correct or are her parents?
Color
Observed Expected
Red
55
Brown
41
Yellow
60
Blue
33
Green
45
Orange
30
SEC. 11.4: TESTS OF INDEPENDENCE
Tests of independence involve using a contingency table of observed
(data) values.
The test statistic for a test of independence is similar to that of a
goodness-of-fit test:
๐‘–๐‘—
(๐‘‚ โˆ’ ๐ธ)2
๐ธ
where:
O = observed values
E = expected values
i = the number of rows in the table
j = the number of columns in the table
There are iโ‹…j terms of the form
(๐‘‚โˆ’๐ธ)2.
๐ธ
INDEPENDENT, OR NOT?
A test of independence determines whether two factors are
independent or not.
Two events A and B are independent if the knowledge that one
occurred does not affect the chance the other occurs.
For example, the outcomes of two roles of a fair die are independent
events. The outcome of the first roll does not change the probability
for the outcome of the second roll.
KEY FEATURES OF THE TEST
The test of independence is always right-tailed because of the
calculation of the test statistic. If the expected and observed values
are not close together, then the test statistic is very large and way out
in the right tail of the chi-square curve, as it is in a goodness-of-fit.
The number of degrees of freedom for the test of independence is:
df = (number of columns - 1)(number of rows - 1)
The following formula calculates the expected number (E):
E=(row total)(column total)/(total number surveyed)
USING YOUR CALCULATOR
Enter the table into your calculator using the MATRX menu.
โ€ขScroll over to EDIT
โ€ขChoose [A] and adjust the matrix size so it is (number of rows) x
(number of columns)
โ€ขGo to the STAT, TESTS menu and choose the ฯ‡2-Test
โ€ขMake sure it says Observed:[A], Expected:[B]
โ€ขChoose DRAW to get the test statistic, p-value and graph
EXAMPLE
A public opinion poll surveyed a simple random sample of 1000
voters. Respondents were classified by gender (male or female) and
by voting preference (Republican, Democrat, or Independent).
Results are shown in the contingency table below.
Republican
Democrat
Independent Total
Male
200
150
50
400
Female
250
300
50
600
Total
450
450
100
1000
Is there a gender gap? Do the men's voting preferences differ
significantly from the women's preferences? Use a 0.05 level of
significance.
HYPOTHESIS TEST
H0: Gender and voting preference are independent
HA: Gender and voting preference are dependent
Degrees of freedom (df) = (2-1)(3-1)=1โ€ข2=2
Distribution = ฯ‡22
Test statistic: ฯ‡2=16.2
P-value: p=0
The p-value represents the probability of getting the sample results if
gender and voting preference are independent.
ฮฑ=0.05, reject null since ฮฑ>p
Conclusion: At the 5% significance level, there is sufficient evidence
to conclude that men and womenโ€™s voting preferences are dependent
(i.e. they differ depending on gender).
PRACTICE: TIRE QUALITY
The operations manager of a company that manufactures tires wants
to determine whether there are any differences in the quality of
workmanship among the three daily shifts. She randomly selects 496
tires and carefully inspects them. Each tire is either classified as
perfect, satisfactory, or defective. Do these data provide sufficient
evidence to infer that there are differences in quality among the three
shifts?
Perfect
Satisfactory Defective
Total
Shift 1
106
124
1
231
Shift 2
67
85
1
153
Shift 3
37
72
3
112
Total
210
281
5
496
SEC. 11.5: TEST OF HOMOGENEITY
The goodnessโ€“ofโ€“fit test can be used to decide whether a population
fits a given distribution, but it will not suffice to decide whether two
populations follow the same unknown distribution.
A different test, called the test for homogeneity, can be used to draw a
conclusion about whether two populations have the same distribution.
To calculate the test statistic for a test for homogeneity, follow the
same procedure as with the test of independence.
SET-UP
Hypotheses
H0: The distributions of the two populations are the same.
Ha: The distributions of the two populations are not the same.
Test Statistic: Use a ฯ‡2 test statistic. It is computed in the same way
as the test for independence.
Degrees of Freedom df = number of columns - 1
Requirements: All values in the table must be greater than or equal
to five.
Common Uses: Comparing two populations. For example: men vs.
women, before vs. after, east vs. west. The variable is categorical with
more than two possible response values.
EXAMPLE
Do families and singles have the same distribution of cars? Use a
level of significance of 0.05. Suppose that 100 randomly selected
families and 200 randomly selected singles were asked what type of
car they drove: sport, sedan, hatchback, truck, van/SUV.
PRACTICE
A university admissions officer was concerned that males and females
were accepted at different rates into the four different schools
(business, engineering, liberal arts, and science) at her university.
She collected the following data on the acceptance of 1200 males and
800 females who applied to the university:
Are males and females distributed equally among the various
schools?
This PowerPoint file is copyright 2011-2015, Rice University. All
Rights Reserved.