Business Statistics: A First Course Chapter 13 Inference for Counts: Chi-Square Tests © 2011 Pearson Education, Inc.
Business Statistics:
A First Course
Chapter 13
Inference for Counts: Chi-Square
Tests
© 2011 Pearson Education, Inc.
13.1 Chi-Square Tests
Given the following…
1) Counts of items in each of several categories
2) A model that predicts the distribution of the relative
frequencies
…the basic idea is to ask:
“Does the actual distribution differ from the model
because of random error or do the differences mean that
the model does not fit the data?”
In other words, “How good is the fit between what we
observe and what we expect to observe?”
13.1 Chi-Square Tests
Example: Stock Market “Up” Days
Sample of 1000 “up” days
“Up” days appear to be more common than expected on
Fridays (we expect them to be equally likely across
trading days).
Null Hypothesis: The distribution of “up” days is no
different from what we expect (equally likely across days).
Test the hypothesis with a chi-square goodness-of-fit test.
13.1 Chi-Square Tests
The Chi-Square Distribution
$$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$
Note that $\chi^2$ “accumulates” the relative squared deviation
of each cell from its expected value.
So, $\chi^2$ gets “big” when the model is a poor fit.
13.1 Chi-Square Tests
Assumptions and Conditions
• Counted Data Condition – The data must be counts for
the categories of a categorical variable.
• Independence Assumption – The counts should be
independent of each other.
• Randomization Condition – The counted individuals
should be a random sample of the population.
• Expected Cell Frequency Condition – Expect at least 5
individuals per cell.
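The Expected Cell Frequency Condition is easy to check mechanically before running a test. The sketch below is only an illustration: the function name and both sets of expected counts are hypothetical, not taken from the course material.

```python
def meets_expected_count_condition(expected_counts, minimum=5):
    """Return True when every expected cell count is at least `minimum`."""
    return all(count >= minimum for count in expected_counts)

# Hypothetical expected counts under two different models.
well_filled = [40.0, 35.5, 24.5]
sparse = [12.0, 4.2, 30.8]   # one cell falls below 5

print(meets_expected_count_condition(well_filled))
print(meets_expected_count_condition(sparse))
```

Note that the check is applied to the expected counts from the model, not to the observed counts.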
13.1 Chi-Square Tests
The Chi-Square Calculation
13.1 Chi-Square Tests
The Chi-Square Calculation: Stock Market “Up” Days
$$\chi^2 = \frac{(192 - 193.4)^2}{193.4} + \cdots + \frac{(218 - 199.7)^2}{199.7} = 2.62$$
Using a chi-square table at a significance level of 0.05 and
with 4 degrees of freedom:
$$\chi^2_4 = 9.488 > 2.62$$
Do not reject the null hypothesis. (The fit is “good”.)
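The arithmetic behind this decision can be sketched in a few lines of Python. Each cell contributes (Obs - Exp)^2 / Exp to the statistic. Only the two cells quoted on the slide are evaluated here, because the remaining three are elided. The function names are illustrative assumptions, and the P-value helper relies on a closed form that holds only for 4 degrees of freedom (five trading days minus one).

```python
import math

def chi_square_component(observed, expected):
    """Contribution of one cell: (Obs - Exp)^2 / Exp."""
    return (observed - expected) ** 2 / expected

def chi_square_statistic(observed_counts, expected_counts):
    """Sum the per-cell contributions over all cells."""
    return sum(chi_square_component(o, e)
               for o, e in zip(observed_counts, expected_counts))

def p_value_df4(chi_square):
    """Upper-tail probability of a chi-square variable with exactly 4 df."""
    half = chi_square / 2
    return math.exp(-half) * (1 + half)

# The two cells shown on the slide; the other three are not given there.
first = chi_square_component(192, 193.4)
friday = chi_square_component(218, 199.7)
print(round(first, 4), round(friday, 4))

statistic = 2.62        # total reported on the slide
critical_value = 9.488  # table value for alpha = 0.05 and 4 df
print(statistic > critical_value)  # prints False: do not reject
print(round(p_value_df4(statistic), 3))
```

The Friday cell alone accounts for most of the reported total, and the P-value is far above 0.05, which agrees with the decision not to reject.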
13.2 Interpreting Chi-Square Values
The Chi-Square Distribution
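The distribution is right-skewed and becomes broader
with increasing degrees of freedom.
[Figure: $\chi^2$ density curves for different degrees of freedom]
The test is a one-sided test: only large values of $\chi^2$
count as evidence against the null hypothesis.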
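The shape claims can be checked with a small simulation. The sketch below relies on the standard fact that a chi-square variable with df degrees of freedom is the sum of df squared standard normal draws, so its mean is df and its standard deviation is sqrt(2 * df). The function name, number of draws, and seed are arbitrary illustrative choices.

```python
import random
import statistics

def simulate_chi_square(df, draws=20000, seed=0):
    """Simulate chi-square values as sums of `df` squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df))
            for _ in range(draws)]

for df in (2, 4, 8):
    sample = simulate_chi_square(df)
    # Columns: df, sample mean, sample median, sample standard deviation.
    print(df,
          round(statistics.mean(sample), 2),
          round(statistics.median(sample), 2),
          round(statistics.stdev(sample), 2))
```

In each row the mean should exceed the median (right skew), and the standard deviation should grow as df increases (a broader distribution).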
13.3 Examining the Residuals
When we reject a null hypothesis, we can examine the
residuals in each cell to discover which values are
extraordinary.
Because we might compare residuals for cells with very
different counts, we should examine standardized
residuals:
$$\frac{Obs - Exp}{\sqrt{Exp}}$$
Note that standardized residuals from goodness-of-fit
tests are actually z-scores (which we already know how to
interpret and analyze).
13.3 Examining the Residuals
Standardized residuals for the trading days data:
• None of these values is remarkable.
• The largest, Friday, at 1.292, is not impressive when
viewed as a z-score.
• The deviations are in the direction of a “weekend effect”,
but they aren’t quite large enough for us to conclude they
are real.
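The Friday value can be roughly reproduced from the observed and expected counts quoted in the chi-square calculation (218 and 199.7), taking the standardized residual to be (Obs - Exp) / sqrt(Exp). This is a sketch with an illustrative function name. Because 199.7 is a rounded expected count, the result differs slightly from the 1.292 reported above.

```python
import math

def standardized_residual(observed, expected):
    """Standardized residual of one cell: (Obs - Exp) / sqrt(Exp)."""
    return (observed - expected) / math.sqrt(expected)

friday = standardized_residual(218, 199.7)
print(round(friday, 3))

# Squaring a standardized residual gives that cell's chi-square component.
print(round(friday ** 2, 3))
```

Read as a z-score, a value near 1.3 is well inside the range expected from ordinary random variation.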
13.6 Chi-Square Test of Independence
The table below shows the importance of personal
appearance for several age groups.
Are Age and Appearance independent, or is there a
relationship?
13.6 Chi-Square Test of Independence
A stacked bar chart suggests a relationship:
Test for independence using a chi-square test of
independence.
13.6 Chi-Square Test of Independence
The test requires finding expected counts under the
assumption that the null hypothesis is true (that the two
variables are independent). Find the expected count for
each cell by multiplying the appropriate row and column
totals and dividing by the table total:
$$Exp_{ij} = \frac{\text{Total Row}_i \times \text{Total Column}_j}{\text{Table Total}}$$
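The expected-count rule can be sketched as follows. The 2x3 table is hypothetical (it is not the Age and Appearance data), and the function name is an illustrative assumption.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in column_totals] for r in row_totals]

# Hypothetical observed counts: 2 rows by 3 columns.
observed = [[20, 30, 50],
            [30, 20, 50]]

for row in expected_counts(observed):
    print(row)  # each row prints [25.0, 25.0, 50.0]

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(degrees_of_freedom)  # prints 2
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on the calculation.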
13.6 Chi-Square Test of Independence
For the Appearance and Age example, we reject the null
hypothesis that the variables are independent.
So, it may be of interest to know how differently two age
groups (teens and 30-something adults) select the “very
important” category (Appearance response 6 or 7).
You can construct a confidence interval for the true
difference in these proportions…
13.6 Chi-Square Test of Independence
From the table, the relevant percentages of responses (6 or
7) on Appearance for teens and 30-something adults are:
Teens: 45.17%
30-39: 39.91%
The 95% confidence interval is found below:
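A minimal sketch of how an interval of this kind could be computed, using the usual two-proportion formula (p1 - p2) ± z* sqrt(p1 q1 / n1 + p2 q2 / n2). The two group sizes are hypothetical placeholders, since they are not stated here, so the printed interval is illustrative only and should not be read as the result for this survey.

```python
import math

def two_proportion_interval(p1, n1, p2, n2, z_star=1.96):
    """Confidence interval for p1 - p2; z_star = 1.96 gives about 95%."""
    difference = p1 - p2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * standard_error
    return difference - margin, difference + margin

# ASSUMPTION: both sample sizes are invented for illustration.
HYPOTHETICAL_N_TEENS = 1000
HYPOTHETICAL_N_THIRTIES = 1000

low, high = two_proportion_interval(0.4517, HYPOTHETICAL_N_TEENS,
                                    0.3991, HYPOTHETICAL_N_THIRTIES)
print(f"{low:.3f} to {high:.3f}")
```

The interval is centered on the observed difference of 0.0526 and narrows as the group sizes grow.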
What Can Go Wrong?
Don’t use chi-square methods unless you have counts.
Beware of large samples! With a sufficiently large sample
size, a chi-square test will almost always reject the null
hypothesis, even when the departure from the model is too
small to matter.
Don’t say that one variable “depends” on the other just
because they’re not independent.
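The large-sample warning can be illustrated with hypothetical counts. In the sketch below the observed split is always 51% to 49% against an equal-split model, and only the sample size changes. The counts and function name are illustrative assumptions. The cutoff 3.841 is the table value for 1 degree of freedom at a significance level of 0.05.

```python
def chi_square_statistic(observed_counts, expected_counts):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e
               for o, e in zip(observed_counts, expected_counts))

CRITICAL_VALUE_DF1 = 3.841  # alpha = 0.05, 1 degree of freedom

# Hypothetical data: the same proportions at growing sample sizes.
base_observed = [51, 49]
base_expected = [50, 50]

for scale in (1, 10, 100, 1000):
    observed = [count * scale for count in base_observed]
    expected = [count * scale for count in base_expected]
    statistic = chi_square_statistic(observed, expected)
    print(scale * 100, round(statistic, 2), statistic > CRITICAL_VALUE_DF1)
```

Multiplying every count by the same factor multiplies the statistic by that factor, so a deviation that is unremarkable at n = 100 is eventually rejected once n is large enough.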
What Have We Learned?
Goodness-of-fit tests compare the observed distribution of
a single categorical variable to an expected distribution
based on a theory or model.
Tests of independence examine counts from a single
group for evidence of an association between two
categorical variables.