Transcript Slide 1
One-way ANOVA:
Inference for one-way ANOVA
IPS chapter 12.1
© 2006 W.H. Freeman and Company
Objectives (IPS chapter 12.1)
Inference for one-way ANOVA
The concept of ANOVA
The ANOVA F-test
The ANOVA table
Using Table E
Computation details
The one-way layout
Suppose we have two or more experimental conditions (treatments) we would like to compare.
Usually that comparison takes the form of testing the hypothesis of equal means $H_0: \mu_1 = \mu_2 = \dots = \mu_k$.
In theory we can select a random sample from each of the k populations associated with the treatments.
More practically we can identify N “experimental units” and randomly assign the treatments to those units.
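A minimal sketch of random assignment, assuming equal group sizes and using 16 numbered pots and four nematode levels as stand-ins for the experimental units and treatments. The function name `assign_treatments` and the fixed seed are illustrative assumptions, not part of the original slides.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Randomly split `units` into equal-sized groups, one group per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("this sketch assumes units divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # the random order decides which unit gets which treatment
    size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[k * size:(k + 1) * size])
        for k, treatment in enumerate(treatments)
    }


# Hypothetical example: 16 pots, four nematode levels, 4 pots per level.
pots = list(range(1, 17))
levels = [0, 1000, 5000, 10000]
assignment = assign_treatments(pots, levels, seed=1)
```

Every pot ends up in exactly one group, and chance alone decides which one.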
The concept of ANOVA
Reminders:
A categorical factor is a variable that can take on any of several levels used to differentiate one group from another.
An experiment has a one-way, or completely randomized, design if several levels of one factor are being studied and the individuals are randomly assigned to those levels. (There is only one way to group the data.)
Example: Four levels of nematode quantity in a seedling growth experiment.
Example: Student performance is evaluated with and without “computer aided” instruction (2 levels).
Analysis of variance (ANOVA) is the technique used to test the equality of k > 2 means.
One-way ANOVA is used for completely randomized, one-way designs.
How do we compare several means?
We want to know if the observed differences in sample means are likely to have occurred by chance.
Our decision depends partly on the amount of overlap between the groups, which in turn depends on the differences between the means and the amount of variability within the groups.
We first examine the samples to test for overall significance as evidence of any difference among the means: the ANOVA F-test.
If that overall test indicates statistical significance, then a follow-up comparison of combinations of means is in order.
If we planned our experiment with specific alternative hypotheses in mind (before gathering the data), we can test them using contrasts.
If we do not have specific alternatives, we can examine all pair-wise parameter comparisons to define which parameters differ from which, using multiple comparisons procedures.
Nematodes and plant growth
Do nematodes affect plant growth? A botanist prepares 16 identical planting pots and adds different numbers of nematodes into the pots. Seedling growth (in mm) is recorded two weeks later.
Hypotheses: the $\mu_i$ are all equal ($H_0$) versus not all $\mu_i$ are the same ($H_a$).

Nematodes   Seedling growth (mm)          Mean $\bar{x}_i$
0           10.8    9.1   13.5    9.2     10.65
1,000       11.1   11.1    8.2   11.3     10.43
5,000        5.4    4.6    7.4    5.0      5.6
10,000       5.8    5.3    3.2    7.5      5.45

Overall mean: 8.03
The ANOVA model
Random sampling always produces chance variation. Any “factor effect” would thus show up in our data as the factor-driven differences plus chance variations (“error”):
Data = fit (“factor/groups”) + residual (“error”)
The one-way ANOVA model analyses situations where chance variations are normally distributed $N(0, \sigma)$, so that:
$$x_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma)$$
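As a minimal sketch of the decomposition Data = fit + residual, the Python below splits each nematode observation from this deck into its group mean (the fit) and what is left over (the residual). The function name `decompose` and the dictionary layout are illustrative assumptions.

```python
from statistics import mean

# Seedling growth (mm) by number of nematodes per pot, from the nematode example.
growth = {
    0: [10.8, 9.1, 13.5, 9.2],
    1000: [11.1, 11.1, 8.2, 11.3],
    5000: [5.4, 4.6, 7.4, 5.0],
    10000: [5.8, 5.3, 3.2, 7.5],
}


def decompose(groups):
    """Return (level, observation, fit, residual) for every observation."""
    rows = []
    for level, values in groups.items():
        fit = mean(values)  # the sample mean estimates the group mean
        for x in values:
            rows.append((level, x, fit, x - fit))  # residual = data - fit
    return rows


rows = decompose(growth)
for level, x, fit, residual in rows:
    print(f"{level:>6}  data={x:5.1f}  fit={fit:6.3f}  residual={residual:+.3f}")
```

Within each group the residuals sum to zero, which is why only the spread of the residuals (not their average) carries information about σ.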
The ANOVA F-test
We have I independent SRSs, from I populations or treatments.
The ith population has a normal distribution with unknown mean $\mu_i$.
All I populations have the same standard deviation $\sigma$, unknown.
The ANOVA F statistic tests:
$H_0: \mu_1 = \mu_2 = \dots = \mu_I$
$H_a$: not all the $\mu_i$ are equal.
$$F = \frac{SSG/(I-1)}{SSE/(N-I)}$$
When $H_0$ is true, F has the F distribution with I − 1 (numerator) and N − I (denominator) degrees of freedom.
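The F statistic can be computed directly from its definition. The sketch below does so for the nematode data from this deck using only the standard library; the function name `anova_f` is an illustrative assumption.

```python
from statistics import mean


def anova_f(groups):
    """Return (F, numerator df, denominator df) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    n_groups, n_total = len(groups), len(all_values)
    # SSG: variation of the group means around the overall mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: variation of the observations around their own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_num, df_den = n_groups - 1, n_total - n_groups
    return (ssg / df_num) / (sse / df_den), df_num, df_den


groups = [
    [10.8, 9.1, 13.5, 9.2],   # 0 nematodes
    [11.1, 11.1, 8.2, 11.3],  # 1,000 nematodes
    [5.4, 4.6, 7.4, 5.0],     # 5,000 nematodes
    [5.8, 5.3, 3.2, 7.5],     # 10,000 nematodes
]
f_stat, df_num, df_den = anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
```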
The ANOVA F-test
Alternatively and more practically we can randomly assign the treatments to a collection of N experimental units so that n 1 units get treatment 1, n 2 units get treatment 2, and so on. We then proceed as before.
$$F = \frac{SSG/(I-1)}{SSE/(N-I)}$$
$H_0: \mu_1 = \mu_2 = \dots = \mu_I$
$H_a$: not all the $\mu_i$ are equal.
When $H_0$ is true, F has the F distribution with I − 1 (numerator) and N − I (denominator) degrees of freedom.
The ANOVA F-statistic compares variation due to treatments (levels of the factor) with variation among individuals who should be similar (individuals in the same sample).
$$F = \frac{\text{variation among sample means}}{\text{variation among individuals in same sample}}$$
Difference in means small relative to overall variability: F tends to be small.
Difference in means large relative to overall variability: F tends to be large.
Larger F-values lead to more significant results. How large F needs to be in order to be significant depends on the degrees of freedom (I − 1 and N − I).
Checking our assumptions
“Theory” suggests each of the populations must be normally distributed. But the test is robust to deviations from normality for reasonably sized samples, thanks to the central limit theorem.
The ANOVA F-test theory also requires that all populations have the same standard deviation.
Practically:
The results of the ANOVA F-test are approximately correct when the largest sample standard deviation is no more than twice as large as the smallest sample standard deviation.
(Equal sample sizes also make ANOVA more robust to deviations from the equal $\sigma$ rule.)
Do nematodes affect plant growth?
Seedling growth     0 nematodes   1000 nematodes   5000 nematodes   10000 nematodes
                    10.8          11.1             5.4              5.8
                    9.1           11.1             4.6              5.3
                    13.5          8.2              7.4              3.2
                    9.2           11.3             5.0              7.5
$\bar{x}_i$         10.65         10.425           5.6              5.45
$s_i$               2.053         1.486            1.244            1.771
Conditions required:
• Equal variances: check that the largest $s_i$ is no more than twice the smallest $s_i$. Largest $s_i$ = 2.053; smallest $s_i$ = 1.244. The ratio is about 1.65, which is below 2, so the condition is met.
• Independent SRSs: four groups, assumed independent.
• Distributions “roughly” normal: it is hard to assess normality with only four points per condition. But the pots in each group are identical, and there are no outliers.
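The equal-variance rule of thumb is easy to check numerically. The sketch below recomputes the sample standard deviations for the nematode data and compares the largest with the smallest; the variable names are illustrative assumptions.

```python
from statistics import stdev

growth = {
    0: [10.8, 9.1, 13.5, 9.2],
    1000: [11.1, 11.1, 8.2, 11.3],
    5000: [5.4, 4.6, 7.4, 5.0],
    10000: [5.8, 5.3, 3.2, 7.5],
}

# Sample standard deviation (n - 1 in the denominator) for each group.
sds = {level: stdev(values) for level, values in growth.items()}
ratio = max(sds.values()) / min(sds.values())
rule_ok = ratio <= 2  # rule of thumb: largest s_i at most twice the smallest

for level, s in sds.items():
    print(f"{level:>6} nematodes: s = {s:.3f}")
print(f"largest / smallest = {ratio:.2f}, rule satisfied: {rule_ok}")
```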
Smoking influence on sleep
A study of the effect of smoking classifies subjects as nonsmokers, moderate smokers, and heavy smokers. The investigators interview a random sample of 200 people in each group and ask “How many hours do you sleep on a typical night?”
1. Study design?
This is an observational study. Explanatory variable: smoking, with 3 levels: nonsmokers, moderate smokers, heavy smokers. Response variable: number of hours of sleep per night.
2. Hypotheses?
$H_0$: all 3 $\mu_i$ are equal (versus $H_a$: not all equal).
3. ANOVA assumptions?
Three independent SRSs, one per smoking group. A sample size of 200 per group should accommodate any departure from normality. It would still be good to check that the ratio of the largest to the smallest sample standard deviation is no more than 2.
4. Degrees of freedom?
I = 3, n1 = n2 = n3 = 200, and N = 600, so there are I − 1 = 2 (numerator) and N − I = 597 (denominator) degrees of freedom.
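The degrees-of-freedom arithmetic for the smoking study can be written as a small helper. The function name `anova_degrees_of_freedom` is a hypothetical choice for this sketch.

```python
def anova_degrees_of_freedom(sample_sizes):
    """Return (numerator df, denominator df) = (I - 1, N - I)."""
    n_groups = len(sample_sizes)  # I
    n_total = sum(sample_sizes)   # N
    return n_groups - 1, n_total - n_groups


# Three smoking groups with 200 people each.
smoking_df = anova_degrees_of_freedom([200, 200, 200])
print(smoking_df)
```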
The ANOVA table
Source of variation          Sum of squares SS   DF      Mean square MS   F         P value             F crit
Among or between “groups”    SSG                 I − 1   MSG = SSG/DFG    MSG/MSE   Tail area above F   Value of F for α
Within groups or “error”     SSE                 N − I   MSE = SSE/DFE
Total                        SST = SSG + SSE     N − 1

$$SSG = \sum_i n_i (\bar{x}_i - \bar{x})^2, \qquad SSE = \sum_i (n_i - 1) s_i^2, \qquad SST = \sum_{i,j} (x_{ij} - \bar{x})^2$$
$R^2 = SSG/SST$: coefficient of determination
$\sqrt{MSE} = s_p$: pooled standard deviation
The sum of squares represents variation in the data: SST = SSG + SSE. The degrees of freedom likewise reflect the ANOVA model: DFT = DFG + DFE.
Data (“Total”) = fit (“Groups”) + residual (“Error”)
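A sketch of how the entries of the ANOVA table fit together, using the nematode data from this deck. It builds SSG and SSE from the group summaries (sizes, means, standard deviations), computes SST from the raw data, and checks that SST = SSG + SSE. The function name `anova_table` and the dictionary keys are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def anova_table(groups):
    """Return the main ANOVA table quantities for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    n_groups, n_total = len(groups), len(all_values)
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((len(g) - 1) * stdev(g) ** 2 for g in groups)
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    dfg, dfe = n_groups - 1, n_total - n_groups
    msg, mse = ssg / dfg, sse / dfe
    return {
        "SSG": ssg, "SSE": sse, "SST": sst,
        "DFG": dfg, "DFE": dfe, "DFT": n_total - 1,
        "MSG": msg, "MSE": mse, "F": msg / mse,
        "R2": ssg / sst,              # coefficient of determination
        "s_p": math.sqrt(mse),        # pooled standard deviation
    }


groups = [
    [10.8, 9.1, 13.5, 9.2],
    [11.1, 11.1, 8.2, 11.3],
    [5.4, 4.6, 7.4, 5.0],
    [5.8, 5.3, 3.2, 7.5],
]
table = anova_table(groups)
for name, value in table.items():
    print(f"{name:>4} = {value:.3f}")
```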
Using Table E
The F distribution is asymmetrical and has two distinct degrees of freedom. This was discovered by Fisher, hence the label “F.” Once again, what we do is calculate the value of F for our sample data and then look up the corresponding area under the curve in Table E.
Table E
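[Figure: excerpt of Table E, shown for df = 5, 4. The table gives critical values of F for each tail probability p, indexed by df num = I − 1 and df den = N − I.]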
ANOVA
Source of Variation   SS      df   MS      F       P-value   F crit
Between Groups        101     3    33.5    12.08   0.00062   3.4903
Within Groups         33.3    12   2.78
Total                 134     15

F = 12.08 > 10.80, thus p < 0.001.
F critical for α = 5% is 3.49.
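The tail area that Table E tabulates can also be approximated numerically. The sketch below is not the table lookup itself: it swaps in a standard identity that writes P(F > f) as a regularized incomplete beta integral and evaluates it with Simpson's rule. The function name `f_tail_area`, the step count, and the restriction to a denominator df of at least 2 are assumptions of this sketch; in practice a statistics library would normally supply this value.

```python
import math


def f_tail_area(f_value, df_num, df_den, steps=2000):
    """Approximate P(F > f_value) for an F(df_num, df_den) distribution."""
    if f_value <= 0 or df_num < 1 or df_den < 2:
        raise ValueError("sketch assumes f_value > 0, df_num >= 1, df_den >= 2")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    a, b = df_den / 2, df_num / 2
    x = df_den / (df_den + df_num * f_value)  # upper limit, 0 < x < 1

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Simpson's rule for the integral of the beta kernel from 0 to x.
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return (total * h / 3) / beta


p_value = f_tail_area(12.08, 3, 12)     # nematode example
p_at_crit = f_tail_area(3.4903, 3, 12)  # tail area at the 5% critical value
print(f"p = {p_value:.5f}, tail area at F crit = {p_at_crit:.3f}")
```

For the nematode example this agrees with the table-based conclusion that p < 0.001.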
Yogurt preparation and taste
Yogurt can be made using three distinct commercial preparation methods: traditional, ultra filtration, and reverse osmosis.
To study the effect of these methods on taste, an experiment was designed where three batches of yogurt were prepared for each of the three methods. A trained expert tasted each of the nine samples, presented in random order, and judged them on a scale of 1 to 10.
Variables, hypotheses, assumptions, calculations?
ANOVA table
Source of variation   SS       df          MS      F        P-value   F crit
Between groups        17.3     I − 1 = 2   8.65    11.283
Within groups         4.6      N − I = 6   0.767
Total                 17.769
[Figure: excerpt of Table E, with df num = I − 1 and df den = N − I, used to look up the value of F.]
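When only the sums of squares are given, as in the yogurt table, the mean squares and F follow from dividing by the degrees of freedom. A minimal sketch, with `complete_anova_row` as a hypothetical helper name:

```python
def complete_anova_row(ssg, sse, n_groups, n_total):
    """Fill in DF, mean squares, and F from the two sums of squares."""
    dfg, dfe = n_groups - 1, n_total - n_groups
    msg, mse = ssg / dfg, sse / dfe
    return {"DFG": dfg, "DFE": dfe, "MSG": msg, "MSE": mse, "F": msg / mse}


# Yogurt example: 3 preparation methods, 3 batches each (N = 9).
yogurt = complete_anova_row(ssg=17.3, sse=4.6, n_groups=3, n_total=9)
print({name: round(value, 3) for name, value in yogurt.items()})
```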
Computation details
$$F = \frac{MSG}{MSE} = \frac{SSG/(I-1)}{SSE/(N-I)}$$
MSG, the mean square for groups, measures how different the individual means are from the overall mean (~ weighted average of squared distances of sample averages to the overall mean). SSG is the sum of squares for groups.
MSE, the mean square for error, is the pooled sample variance $s_p^2$ and estimates the common variance $\sigma^2$ of the I populations (~ weighted average of the variances from each of the I samples). SSE is the sum of squares for error.
Note: Two sample t-test and ANOVA
A two sample t-test assuming equal variance and an ANOVA comparing only two groups will give you the exact same p-value (for a two-sided hypothesis).
One-way ANOVA: $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$, F-statistic
t-test assuming equal variance: $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \neq \mu_2$, t-statistic
$F = t^2$ and both p-values are the same.
But the t-test is more flexible: You may choose a one-sided alternative instead, or you may want to run a t-test assuming unequal variance if you are not sure that your two populations have the same standard deviation.
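The relationship F = t² can be checked numerically. The sketch below compares two of the nematode groups from this deck (0 and 5,000 nematodes), chosen purely as an illustration; this two-group comparison is not an analysis from the slides, and the function names are assumptions.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming equal population variances."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def anova_f(groups):
    """One-way ANOVA F statistic, MSG / MSE."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ssg / (len(groups) - 1)) / (sse / (len(all_values) - len(groups)))


no_nematodes = [10.8, 9.1, 13.5, 9.2]
five_thousand = [5.4, 4.6, 7.4, 5.0]
t_stat = pooled_t(no_nematodes, five_thousand)
f_stat = anova_f([no_nematodes, five_thousand])
print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}, F = {f_stat:.3f}")
```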