Relationships Between Quantitative Variables

Chapter 6
Relationships Between Categorical Variables
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc.
Is there a relationship
between the two variables,
so that the category into which
individuals fall for one variable
seems to depend on the category
they are in for the other variable?
Example 6.1 Smoking and Divorce Risk
Data on smoking habits and divorce history for the
1669 respondents who had ever been married.
Among smokers, 49% have been divorced, 51% have not.
Among nonsmokers, only 32% have been divorced, 68% have not.
The difference between row percents indicates a relationship.
Risk = (Number in category) / (Total number in group)
Relative Risk = (Risk in category 1) / (Risk in category 2)
Percent increase in risk = (relative risk – 1) × 100%
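As a minimal sketch, the formulas above can be applied to the divorce risks from Example 6.1. The inputs are the rounded percents shown on that slide (49% and 32%), not the underlying counts, and the function names are illustrative.

```python
def relative_risk(risk_1: float, risk_2: float) -> float:
    """Risk in category 1 divided by risk in category 2."""
    return risk_1 / risk_2


def percent_increase_in_risk(rr: float) -> float:
    """(relative risk - 1) x 100%."""
    return (rr - 1.0) * 100.0


# Risk of divorce, using the rounded percents from Example 6.1.
risk_smokers = 0.49
risk_nonsmokers = 0.32

rr = relative_risk(risk_smokers, risk_nonsmokers)
print(f"Relative risk: {rr:.2f}")                                  # 1.53
print(f"Percent increase in risk: {percent_increase_in_risk(rr):.0f}%")  # 53%
```

Read this as: smokers' risk of divorce is about 1.53 times the nonsmokers' risk, an increase of about 53%.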
Odds = (Number in category 1) to (Number in category 2)
     = (Number in category 1 / Number in category 2) to 1
Odds Ratio = (Odds for group 1) / (Odds for group 2)
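A short sketch of the odds and odds ratio, again using the rounded percents from Example 6.1 (49% of smokers and 32% of nonsmokers divorced). The helper name is illustrative, and because the inputs are rounded the results are approximate.

```python
def odds(p_category_1: float) -> float:
    """Odds of category 1 versus category 2, expressed as 'x to 1'."""
    return p_category_1 / (1.0 - p_category_1)


# Proportion divorced in each group (rounded values from Example 6.1).
odds_smokers = odds(0.49)      # 49 to 51
odds_nonsmokers = odds(0.32)   # 32 to 68

odds_ratio = odds_smokers / odds_nonsmokers

print(f"Odds of divorce, smokers:    {odds_smokers:.2f} to 1")     # 0.96 to 1
print(f"Odds of divorce, nonsmokers: {odds_nonsmokers:.2f} to 1")  # 0.47 to 1
print(f"Odds ratio: {odds_ratio:.2f}")                             # 2.04
```

The odds ratio (about 2.04) is larger than the relative risk for the same data (about 1.53); the two measures coincide only approximately, and only when the risks are small.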
6.3 Misleading Statistics
About Risk
Questions to Ask:
• What are the actual risks? What is the
baseline risk?
• What is the population for which the
reported risk or relative risk applies?
• What is the time period for this risk?
Example 6.4 Disaster in the Skies?
Case Study 1.2 Revisited
“Errors by air traffic controllers climbed
from 746 in fiscal 1997 to 878 in fiscal 1998,
an 18% increase.” USA Today
Look at risk of controller error per flight:
In 1998: 5.5 errors per million flights
In 1997: 4.8 errors per million flights
Risk of error increased but the actual risk is very small.
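The arithmetic behind this example can be sketched as follows. The counts and rates are the ones quoted on the slide; the variable names are illustrative.

```python
# Controller errors reported by USA Today.
errors_1997, errors_1998 = 746, 878

# Errors per million flights, as given on the slide.
rate_1997, rate_1998 = 4.8, 5.5

# Percent increase in the raw number of errors.
pct_increase_counts = (errors_1998 - errors_1997) / errors_1997 * 100

# Actual risk of an error on any one flight in 1998.
risk_per_flight_1998 = rate_1998 / 1_000_000

print(f"Increase in number of errors: {pct_increase_counts:.1f}%")  # 17.7%
print(f"Risk of error per flight, 1998: {risk_per_flight_1998:.7f}")
```

The 18% headline figure is a change in counts; the risk per flight stays at a few in a million in both years.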
Example 6.5
Dietary Fat and Breast Cancer
“Italian scientists report that a diet rich in animal
protein and fat – cheeseburgers, french fries, and
ice cream, for example – increases a woman’s risk
of breast cancer threefold.” Prevention Magazine’s
Giant Book of Health Facts (1991, p. 122).
Two reasons this information is useless:
1. We don’t know how the data were collected or
what population the women represent.
2. We don’t know the ages of the women studied,
so we don’t know the baseline risk.
Example 6.5
Dietary Fat
and Breast Cancer (cont)
Age is a critical factor.
Accumulated lifetime risk of a woman developing
breast cancer by certain ages:
By age 50: 1 in 50
By age 60: 1 in 23
By age 85: 1 in 9
Annual risk is 1 in 3700 for women in their early 30s.
If the Italian study was on very young women, the
threefold increase in risk represents a small increase in actual risk.
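To see why a threefold increase can still be small, here is a sketch that applies the relative risk of 3 to the annual baseline of 1 in 3700. It assumes, as the slide does hypothetically, that the study was on women in their early 30s; the variable names are illustrative.

```python
baseline_annual_risk = 1 / 3700   # women in their early 30s (from the slide)
relative_risk = 3                 # "threefold" increase reported

# Assumption: the reported relative risk applies to this baseline.
elevated_risk = baseline_annual_risk * relative_risk
absolute_increase = elevated_risk - baseline_annual_risk

per_10k = 10_000
print(f"Baseline: {baseline_annual_risk * per_10k:.1f} per 10,000 women per year")
print(f"Tripled:  {elevated_risk * per_10k:.1f} per 10,000 women per year")
print(f"Increase: {absolute_increase * per_10k:.1f} per 10,000 women per year")
```

Under that assumption the risk rises from roughly 3 to roughly 8 cases per 10,000 women per year, which is why the baseline risk has to be known before a relative risk can be interpreted.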
6.4 The Effect of a Third Variable
and Simpson’s Paradox
Example 6.7
Educational Status and
Driving after Substance Use
1996 nationwide survey of 11,847 individuals aged 16 or over.
Response was Driving Status with 3 categories:
• Unimpaired = never drove while impaired
• Alcohol = drove within 2 hours of alcohol use,
but never after drug use
• Drug = drove within 2 hours of drug use
and possibly after alcohol use.
Example 6.7
Educational Status and
Driving after Substance Use
As amount of education increases, the proportion who
drove within two hours of alcohol use also increases.
One difference between the education groups is age.
Not enough information on age is available,
so age is a lurking variable.
Example 6.8
Blood Pressure and
Oral Contraceptive Use
Hypothetical data on 2400 women. Recorded oral contraceptive
use and whether each woman had high blood pressure.
Percent with high blood pressure is about the same among
oral contraceptive users and nonusers.
Example 6.8
Blood Pressure and
Oral Contraceptive Use (cont)
Many factors affect blood pressure. If users and nonusers differ
with respect to such a factor, the factor confounds the results.
Blood pressure increases with age and users tend to be younger.
In each age group, the percentage with high blood pressure is
higher for users than for nonusers => Simpson’s Paradox.
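The pattern described above can be reproduced with a small sketch. The counts below are invented purely for illustration and are NOT the textbook's table for Example 6.8; they are chosen so that users are mostly younger and high blood pressure is more common in the older group.

```python
# Invented counts for illustration only (not the textbook's data):
# (age group, contraceptive use) -> (number with high BP, group size)
counts = {
    ("younger", "user"): (60, 600),
    ("younger", "nonuser"): (10, 200),
    ("older", "user"): (60, 200),
    ("older", "nonuser"): (110, 600),
}


def pct_high_bp(groups):
    """Percent with high blood pressure, pooled over the given groups."""
    high = sum(counts[g][0] for g in groups)
    total = sum(counts[g][1] for g in groups)
    return 100.0 * high / total


# Within each age group, users have the higher percentage.
for age in ("younger", "older"):
    users = pct_high_bp([(age, "user")])
    nonusers = pct_high_bp([(age, "nonuser")])
    print(f"{age}: users {users:.1f}%, nonusers {nonusers:.1f}%")

# Combined over age, the difference disappears.
overall_users = pct_high_bp([("younger", "user"), ("older", "user")])
overall_nonusers = pct_high_bp([("younger", "nonuser"), ("older", "nonuser")])
print(f"overall: users {overall_users:.1f}%, nonusers {overall_nonusers:.1f}%")
```

With these made-up numbers both overall percentages are 15%, even though users are higher in each age group, because most users fall in the low-risk younger group.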
6.5 Assessing the Statistical
Significance of a 2x2 Table
Question: Can a relationship observed in the
sample data be inferred to hold in the population
represented by the data?
A statistically significant relationship or
difference is one that is large enough to be unlikely
to have occurred in the observed sample if there is
no relationship or difference in the population.
Five Steps to Determining
Statistical Significance:
1. Determine the null and alternative hypotheses.
2. Verify necessary data conditions, and if met,
summarize the data into an appropriate test statistic.
3. Assuming the null hypothesis is true,
find the p-value.
4. Decide whether or not the result is statistically
significant based on the p-value.
5. Report the conclusion in the context of the situation.
Step 1: Null and
Alternative Hypotheses
null hypothesis:
The two variables are not related.
alternative hypothesis:
The two variables are related.
Step 2: The Chi-square Statistic
Chi-square statistic measures the difference between
the observed counts and the counts that would be
expected if there were no relationship.
Large difference => evidence of a relationship.
• Compute the expected count for each cell:
  Expected count = (Row total)(Column total) / (Total n for table)
• Compute for each cell:
  (Obs count – Exp count)² / Exp count
• Compute the test statistic by totaling over all cells:
  χ² = Σ (Obs count – Exp count)² / Exp count
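The three bullet steps can be sketched in a few lines of code. The function name is illustrative. The counts come from Example 6.10 later in this chapter (61 of 92 and 45 of 98 picked S), assuming every student who did not pick S picked Q.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count = (row total)(column total) / (total n)
            expected = row_totals[i] * col_totals[j] / n
            # Contribution of this cell: (obs - exp)^2 / exp
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Rows: wording on the form ("S or Q", "Q or S"); columns: picked S, picked Q.
# Assumption: students who did not pick S picked Q.
s_or_q = [[61, 92 - 61], [45, 98 - 45]]

stat = chi_square_statistic(s_or_q)
print(f"Chi-square statistic: {stat:.2f}")  # about 8.0
```

A table whose observed counts match the expected counts exactly gives a statistic of 0; the further the counts depart from "no relationship", the larger the statistic.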
 
Step 3: The p-value of the
Chi-square Test
Large test statistic => evidence of a relationship.
So how large is enough to declare significance?
Q: If there is actually no relationship
in the population, what is the
likelihood that the chi-square statistic
could be as large as it is or larger?
A: The p-value
Note: The p-value is generally reported in computer output.
Steps 4 and 5: Making and
Reporting a Decision
Large test statistic => small p-value
=> evidence a real relationship exists in the population.
Common rule:
• p-value ≤ 0.05 => say relationship is statistically
significant and we reject the null hypothesis
• p-value > 0.05 => cannot say relationship is
statistically significant and we cannot reject the
null hypothesis
Note: For 2×2 tables, a test statistic of 3.84 or larger is significant.
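For a 2×2 table the chi-square statistic has 1 degree of freedom, so the p-value can be computed with the standard library alone. The sketch below uses the identity P(χ² > x) = erfc(√(x/2)), which holds for 1 degree of freedom only; the function names are illustrative, and in practice the p-value is usually read from software output.

```python
import math


def p_value_2x2(chi_square: float) -> float:
    """Upper-tail probability for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))


def is_significant(p_value: float, level: float = 0.05) -> bool:
    """Common rule: significant when the p-value is at most the level."""
    return p_value <= level


# A statistic of 3.84 corresponds to a p-value of about 0.05.
print(f"chi-square = 3.84 -> p-value = {p_value_2x2(3.84):.3f}")

# Larger statistics give smaller p-values, and vice versa.
for stat in (1.0, 8.0):
    p = p_value_2x2(stat)
    print(f"chi-square = {stat:.2f} -> p-value = {p:.4f}, "
          f"significant: {is_significant(p)}")
```

The value 3.84 is a rounded cutoff, so a statistic sitting exactly on it gives a p-value that rounds to 0.05 rather than falling clearly on one side.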
Example 6.10
Randomly Pick S or Q
Of 92 college students asked: “Randomly choose one of
the letters S or Q”, 66% (61/92) picked S. Of another 98
students asked: “Randomly choose one of the letters Q or S”,
46% (45/98) picked S.
Can we conclude that the order of letters on the form
and the response are related?
The p-value = 0.005, which is less than 0.05,
so the relationship is statistically significant.
Factors that Affect
Statistical Significance
• The strength of the observed relationship
Example 6.10
Of those with “S or Q”, 66% picked S.
Of those with “Q or S”, 46% picked S.
Difference in percentages (66% - 46%) reflects
the strength of the observed relationship.
Factors that Affect
Statistical Significance: (cont)
• How many were studied (sample size)
Example:
I. Treatment A had 8 of 10 patients improve.
Treatment B had 5 of 10 patients improve.
Strength = 80% - 50% = 30% seems large
but study is too small. The p-value is 0.16.
II. Treatment A had 80 of 100 patients improve.
Treatment B had 50 of 100 patients improve.
Strength = 80% - 50% = 30% is again large.
The p-value is 0.0000087, which is highly significant.
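The effect of sample size can be checked directly. This sketch runs the chi-square test on both versions of the treatment example; the function names are illustrative, and the p-value formula assumes a 2×2 table (1 degree of freedom).

```python
import math


def chi_square_statistic(observed):
    """Sum of (obs - exp)^2 / exp over all cells of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, obs in enumerate(row)
    )


def p_value_2x2(chi_square):
    """Upper-tail probability for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Rows: Treatment A, Treatment B; columns: improved, did not improve.
small_study = [[8, 2], [5, 5]]        # 10 patients per treatment
large_study = [[80, 20], [50, 50]]    # 100 patients per treatment

stat_small = chi_square_statistic(small_study)
stat_large = chi_square_statistic(large_study)
p_small = p_value_2x2(stat_small)
p_large = p_value_2x2(stat_large)

print(f"Small study: chi-square = {stat_small:.2f}, p-value = {p_small:.2f}")
print(f"Large study: chi-square = {stat_large:.2f}, p-value = {p_large:.2g}")
```

The percentages are identical in the two studies, but multiplying every count by 10 multiplies the chi-square statistic by 10, which moves the p-value from about 0.16 to far below 0.05.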
Practical versus Statistical Significance
Statistical Significance does not mean the
relationship is of practical importance.
Example 6.12 Aspirin and Heart Attacks
p-value is reported as 0.000 (less than 0.0005) => relationship
is statistically significant.
Placebo:
189/11034 = 1.71% had attack
Aspirin:
104/11037 = 0.94% had attack
Difference is only 1.71% – 0.94% = 0.77%,
less than 1 percentage point.
With a large sample, this important
difference was detected.
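The percentages on this slide, and the corresponding relative risk, can be recomputed from the counts given. This is a sketch with illustrative variable names.

```python
# Heart attacks / group size, from Example 6.12.
placebo_attacks, placebo_n = 189, 11034
aspirin_attacks, aspirin_n = 104, 11037

risk_placebo = placebo_attacks / placebo_n
risk_aspirin = aspirin_attacks / aspirin_n

# Difference in risk, in percentage points.
difference_pct_points = (risk_placebo - risk_aspirin) * 100

# Relative risk of a heart attack, placebo compared with aspirin.
relative_risk = risk_placebo / risk_aspirin

print(f"Placebo risk: {risk_placebo:.2%}")                      # 1.71%
print(f"Aspirin risk: {risk_aspirin:.2%}")                      # 0.94%
print(f"Difference:   {difference_pct_points:.2f} percentage points")  # 0.77
print(f"Relative risk (placebo vs aspirin): {relative_risk:.2f}")
```

The difference in risk is under one percentage point, yet the placebo group's risk is roughly 1.8 times the aspirin group's, which shows how the same data can look small or large depending on the measure reported.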
Interpreting a
Nonsignificant Result
• The sample results are not strong enough
to safely conclude that there is a relationship
in the population.
• The observed relationship could have resulted
by chance, even if there is no relationship in
the population. This is not the same as saying
there is no relationship.
Case Study 6.1 Drinking, Driving,
and the Supreme Court
“Random Roadside Survey” of drivers under 20 years of age.
p-value is 0.201 => the observed
association could easily have
occurred even if there is no
relationship in the population.
This result was used by the Supreme Court to overturn a law
that allowed the sale of beer to females but not males.