Chi-square statistic
Statistical Significance for a two-way table
Inference for a two-way table
•We often gather data and arrange them in a two-way table to see if two categorical variables are
related to each other.
•Look for an association between the row and column variables.
• Is the association in the sample evidence of an association between these variables in the entire
population?
•Or could the sample association easily arise just from the error in random sampling?
Example 1: Aspirin and Heart Attacks
Group      Heart Attack   No Heart Attack   Total    Heart Attacks (%)   Rate per 1000
Aspirin         104            10,933       11,037         0.94               9.4
Placebo         189            10,845       11,034         1.71              17.1
Total           293            21,778       22,071
Aspirin Group: Percentage who had heart attacks = 0.94%
Placebo Group: Percentage who had heart attacks = 1.71%
Difference: only 1.71% – 0.94% = 0.77%
•Are we convinced by the data that there is a real relationship in the population between
taking aspirin and risk of heart attack?
•Need to assess if the relationship is statistically significant.
•The experiment included over 22,000 men, so even a small difference could be statistically significant.
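The percentages and per-1000 rates in the aspirin example can be reproduced from the raw counts. The following is a minimal sketch; the names `groups` and `heart_attack_rate` are illustrative and do not come from the slides.

```python
# Counts from the aspirin example: (heart attacks, group size).
groups = {
    "Aspirin": (104, 11_037),
    "Placebo": (189, 11_034),
}

def heart_attack_rate(attacks, total):
    """Return the proportion of a group that had a heart attack."""
    return attacks / total

rates = {name: heart_attack_rate(a, n) for name, (a, n) in groups.items()}

for name, rate in rates.items():
    # Report the same proportion as a percentage and as a rate per 1000.
    print(f"{name}: {rate * 100:.2f}% ({rate * 1000:.1f} per 1000)")

difference = rates["Placebo"] - rates["Aspirin"]
print(f"Difference: {difference * 100:.2f} percentage points")
```

The difference is small in absolute terms, which is why the question of statistical significance has to be settled by a test rather than by inspection.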
Example 2: Ease of Pregnancy for Smokers and Nonsmokers
Difference: 41% (nonsmokers) – 29% (smokers) = 12%
Larger difference, but based on only 586 subjects. Convincing?
Step 1: Stating The Hypotheses
Example 1: Aspirin and Heart Attacks
Null Hypothesis:
There is no relationship between taking aspirin
and risk of heart attack in the population.
Alternative Hypothesis:
There is a relationship between taking aspirin
and risk of heart attack in the population.
Example 2: Ease of Pregnancy and Smoking
Null Hypothesis: Smokers and nonsmokers are equally likely to get pregnant in 1st cycle
in population of women trying to get pregnant.
Alternative Hypothesis: Smokers and nonsmokers are not equally likely to get
pregnant in 1st cycle in population of women trying to get pregnant.
The chi-square test
To see if the data give evidence against the null hypothesis of "no relationship," compare the
counts in the two way table with the counts we would expect if there really were no relationship.
If the observed counts are far from the expected counts, that's the evidence we were seeking. The
test uses a statistic that measures how far apart the observed and expected counts are.
Expected count = (row total)(column total) / (table total)
•The chi-square statistic is a sum of terms, one for each cell in the table.
•Because chi-square measures how far the observed counts are from what would be expected if the
null hypothesis were true, large values are evidence against the null hypothesis.
•The sampling distribution of the chi-square statistic is not a Normal distribution. It is a
right-skewed distribution that allows only nonnegative values because chi-square can never be negative.
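The expected-count formula and the "sum of terms, one for each cell" can be sketched in a few lines. Each term is taken to be (observed count – expected count)^2 / (expected count). The function names and both small tables below are hypothetical illustrations, not data from the slides.

```python
def expected_counts(table):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Hypothetical table: both rows split 1:2 across the columns, so no association.
no_association = [[10, 20], [30, 60]]
# Hypothetical table: same totals per row, but the first row's split is reversed.
some_association = [[20, 10], [30, 60]]

print(chi_square(no_association))    # observed equals expected in every cell
print(chi_square(some_association))  # larger, since observed departs from expected
```

Every term is a square divided by a positive count, so the statistic cannot be negative, and it is zero only when every observed count matches its expected count.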
Step 2: Collect data and summarize with a ‘test statistic’.
Chi-square statistic: compares the data in the sample to what would be expected if there were no
relationship between the variables in the population.
Step 3: Determine how unlikely the test statistic would be if the null hypothesis were true.
p-value: the probability of observing a test statistic as extreme as the one observed or more so, if
the null hypothesis is really true. (For chi-square: more extreme = larger value of the chi-square
statistic.)
Step 4: Make a decision.
For a 2×2 table, if the chi-square statistic is at least 3.84, the p-value is 0.05 or less, so conclude
the relationship in the population is real. That is, we reject the null hypothesis and conclude the
relationship is statistically significant.
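Steps 3 and 4 can be sketched for the 2×2 case, where the chi-square statistic has 1 degree of freedom. In that case the upper-tail probability equals erfc(sqrt(x / 2)), so the standard library is enough. The function names `p_value_df1` and `is_significant` are illustrative assumptions, and the sketch does not apply to larger tables.

```python
import math

def p_value_df1(chi_square_stat):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    Valid for a 2x2 table only: P(X >= x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square_stat / 2))

def is_significant(chi_square_stat, alpha=0.05):
    """Reject the null hypothesis when the p-value is at most alpha."""
    return p_value_df1(chi_square_stat) <= alpha

# The cutoff 3.84 corresponds to a p-value of about 0.05.
print(round(p_value_df1(3.84), 3))

# Larger statistics give smaller p-values.
for stat in (3.0, 4.0):
    print(stat, round(p_value_df1(stat), 4), is_significant(stat))
```

The value 3.84 is a rounded cutoff, so a statistic exactly equal to 3.84 gives a p-value a hair above 0.05. In practice, compare the computed p-value against 0.05 directly.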
Ease of Pregnancy and Smoking
Pregnancy Occurred After:
Group       First Cycle   Two or More Cycles   Total   Percentage in First
Smoker           29               71            100          29%
Nonsmoker       198              288            486          41%
Total           227              359            586          38.7%
1. Compute the expected numbers.
Expected number of smokers pregnant after 1st cycle: (100)(227)/586 = 38.74
The remaining expected numbers can be found by subtraction.
Pregnancy Occurred After:
Group       First Cycle               Two or More Cycles        Total
Smoker      38.74                     100 – 38.74 = 61.26       100
Nonsmoker   227 – 38.74 = 188.26      486 – 188.26 = 297.74     486
Total       227                       359                       586
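The subtraction shortcut works because the row and column totals are fixed: once one expected count is known, the other three are determined. A minimal sketch, with variable names chosen here for illustration and values rounded to two decimals as on the slide:

```python
# Row and column totals from the smoking table.
smoker_total, nonsmoker_total = 100, 486
first_cycle_total, later_cycle_total = 227, 359
table_total = 586

# One expected count from the formula (row total)(column total) / (table total).
smoker_first = round(smoker_total * first_cycle_total / table_total, 2)

# The other three follow by subtraction from the fixed totals.
smoker_later = round(smoker_total - smoker_first, 2)
nonsmoker_first = round(first_cycle_total - smoker_first, 2)
nonsmoker_later = round(nonsmoker_total - nonsmoker_first, 2)

print(smoker_first, smoker_later)
print(nonsmoker_first, nonsmoker_later)

# Consistency check: the second column must add up to its total.
print(abs(smoker_later + nonsmoker_later - later_cycle_total) < 1e-9)
```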
Example 2: Ease of Pregnancy and Smoking
Observed counts, with expected counts in parentheses.
Pregnancy Occurred After:
Group       First Cycle     Two or More Cycles   Total
Smoker      29 (38.74)      71 (61.26)           100
Nonsmoker   198 (188.26)    288 (297.74)         486
Total       227             359                  586
2. Compare Observed and Expected counts.
For each cell, compute (observed count – expected count)^2 / (expected count).
First cell: (29 – 38.74)^2 / (38.74) = 2.45
Remaining cells shown in table below.
Pregnancy Occurred After:
Group       First Cycle   Two or More Cycles
Smoker         2.45              1.55
Nonsmoker      0.50              0.32
3. Compute the chi-square statistic.
chi-square statistic = 2.45 + 1.55 + 0.50 + 0.32 = 4.82
What is your conclusion?
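The per-cell terms and their sum can be checked with a short script. This is a sketch that uses the rounded expected counts from the slides; the variable names are illustrative.

```python
# Rows: smoker, nonsmoker. Columns: first cycle, two or more cycles.
observed = [[29, 71], [198, 288]]
expected = [[38.74, 61.26], [188.26, 297.74]]  # rounded expected counts from the slides

# One term per cell: (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square_stat = sum(sum(row) for row in contributions)

for label, row in zip(["Smoker", "Nonsmoker"], contributions):
    print(label, [round(c, 2) for c in row])
print("chi-square statistic:", round(chi_square_stat, 2))

# Decision for a 2x2 table at the 0.05 level: compare against the 3.84 cutoff.
print("at least 3.84:", chi_square_stat >= 3.84)
```

Since 4.82 is at least 3.84, the null hypothesis of no relationship is rejected at the 0.05 level.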
Minitab Results for Example 2: Ease of Pregnancy and Smoking
P-Value = 0.028
Example 1: Aspirin and Heart Attacks
Group      Heart Attack   No Heart Attack   Total    Heart Attacks (%)   Rate per 1000
Aspirin         104            10,933       11,037         0.94               9.4
Placebo         189            10,845       11,034         1.71              17.1
Total           293            21,778       22,071
Chi-square statistic = 25.01: highly statistically significant, with p-value < 0.00001
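The aspirin result can be reproduced end to end from the observed counts. The sketch below assumes the p-value comes from a chi-square distribution with 1 degree of freedom, which applies to a 2×2 table; the variable names are illustrative.

```python
import math

# Rows: aspirin, placebo. Columns: heart attack, no heart attack.
observed = [[104, 10_933], [189, 10_845]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

chi_square_stat = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count = (row total)(column total) / (table total).
        expected = row_totals[i] * col_totals[j] / table_total
        chi_square_stat += (count - expected) ** 2 / expected

# Upper-tail probability for 1 degree of freedom (2x2 table only).
p_value = math.erfc(math.sqrt(chi_square_stat / 2))

print(round(chi_square_stat, 2))
print(p_value < 0.00001)
```

Although the difference in heart attack rates is under one percentage point, the very large sample makes the observed counts sit far from the expected counts relative to their size, which produces the large statistic.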