Analysis of Variance (ANOVA)

W&W, Chapter 10
The Results
Many other factors, such as GPA, may determine the
salary level. The dean decides to collect new data,
selecting one student at random from each major at
each of the following average grade levels.
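New data

| Average | Accounting | Marketing | Finance | Block mean M(b) |
|---|---|---|---|---|
| A+ | 41 | 45 | 51 | M(b1) = 45.67 |
| A | 36 | 38 | 45 | M(b2) = 39.67 |
| B+ | 27 | 33 | 31 | M(b3) = 30.33 |
| B | 32 | 29 | 35 | M(b4) = 32 |
| C+ | 26 | 31 | 32 | M(b5) = 29.67 |
| C | 23 | 25 | 27 | M(b6) = 25 |
| Sample mean M(t) | M(t)1 = 30.83 | M(t)2 = 33.5 | M(t)3 = 36.83 | Grand mean $\bar{\bar{X}}$ = 33.72 |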
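The means in the table can be recomputed from the raw salaries. The following is a minimal sketch in plain Python; the variable names (`salaries`, `treatment_means`, `block_means`, `grand_mean`) are illustrative and do not come from the slides.

```python
# Salary data from the table, ordered by GPA block: A+, A, B+, B, C+, C.
salaries = {
    "Accounting": [41, 36, 27, 32, 26, 23],
    "Marketing":  [45, 38, 33, 29, 31, 25],
    "Finance":    [51, 45, 31, 35, 32, 27],
}
gpa_levels = ["A+", "A", "B+", "B", "C+", "C"]

# Sample (major) means M(t): average over the six GPA levels of each major.
treatment_means = {major: sum(v) / len(v) for major, v in salaries.items()}

# Block (GPA) means M(b): average over the three majors within each GPA level.
block_means = {
    level: sum(salaries[major][i] for major in salaries) / len(salaries)
    for i, level in enumerate(gpa_levels)
}

# Grand mean over all 18 observations.
all_values = [x for v in salaries.values() for x in v]
grand_mean = sum(all_values) / len(all_values)

print({m: round(x, 2) for m, x in treatment_means.items()})
print({g: round(x, 2) for g, x in block_means.items()})
print(round(grand_mean, 2))
```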
Randomized Block Design
Now the data in the 3 samples are not
independent; they are matched by GPA
level. As before, matched samples
are superior to unmatched samples because
they provide more information. In this case,
we have added a factor (GPA) that may account for
some of the variation previously counted in SSE.
Two way ANOVA
Now SS(total) = SST + SSB + SSE,
where SSB = the variability among blocks.
A block is a matched group of
observations, one from each of the populations.
We can calculate a two-way ANOVA to test
our null hypothesis.
The Hypotheses
Ho: 1 = 2 = 3
H A : 1  2   3
We are testing the same hypothesis as in the
completely randomized design.
Calculating SST
$SST = \sum_{j} b\,\left(M(t)_j - \bar{\bar{X}}\right)^2$
where b = the number of blocks
M(t)j = the mean for each sample
$\bar{\bar{X}}$ = grand mean
Calculating SST
$SST = 6(30.83-33.72)^2 + 6(33.5-33.72)^2 + 6(36.83-33.72)^2 = 108.4$
This captures the variation across our samples
(majors).
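The SST calculation can be checked with a short script. This is a sketch in plain Python that works from the unrounded means, so the result may differ from the hand calculation in the last decimal; the names `samples`, `grand_mean` and `sst` are illustrative.

```python
# Each inner list is one sample (major), ordered by GPA block A+ ... C.
samples = [
    [41, 36, 27, 32, 26, 23],  # Accounting
    [45, 38, 33, 29, 31, 25],  # Marketing
    [51, 45, 31, 35, 32, 27],  # Finance
]

b = len(samples[0])                      # number of blocks (GPA levels)
n = sum(len(s) for s in samples)         # total number of observations
grand_mean = sum(sum(s) for s in samples) / n

# SST: b times the squared deviation of each sample mean from the grand mean.
sst = sum(b * (sum(s) / b - grand_mean) ** 2 for s in samples)
print(round(sst, 1))
```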
Calculating SSB
$SSB = \sum_{i} k\,\left(M(b)_i - \bar{\bar{X}}\right)^2$
where k = the number of samples
M(b)i = the mean for each block
$\bar{\bar{X}}$ = grand mean
Calculating SSB
$SSB = 3(45.67-33.72)^2 + 3(39.67-33.72)^2 + \dots + 3(25-33.72)^2 = 854.9$
This captures the variation across our blocks
(GPA levels).
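The same check works for SSB by regrouping the data into blocks. A minimal sketch in plain Python, using `zip` to transpose the samples into GPA blocks; all names are illustrative.

```python
samples = [
    [41, 36, 27, 32, 26, 23],  # Accounting
    [45, 38, 33, 29, 31, 25],  # Marketing
    [51, 45, 31, 35, 32, 27],  # Finance
]

k = len(samples)                          # number of samples (majors)
n = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n

# Transpose: each block holds the three salaries at one GPA level.
blocks = list(zip(*samples))

# SSB: k times the squared deviation of each block mean from the grand mean.
ssb = sum(k * (sum(blk) / k - grand_mean) ** 2 for blk in blocks)
print(round(ssb, 1))
```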
Calculating SS (total)
$SS = \sum_{i}\sum_{j} \left(X_{ij} - \bar{\bar{X}}\right)^2$
$SS = (41-33.72)^2 + (36-33.72)^2 + \dots + (27-33.72)^2 = 1015.61$
We know that
SS = SST + SSB + SSE
So SSE = SS – SST – SSB
Calculating SSE
SSE = 1015.61 – 108.44 – 854.94 = 52.2 (carrying SST and SSB to two decimals)
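Putting the pieces together, SSE can be obtained by subtraction once the total sum of squares is known. The sketch below recomputes all four quantities from the raw data in plain Python; the variable names are illustrative.

```python
samples = [
    [41, 36, 27, 32, 26, 23],  # Accounting
    [45, 38, 33, 29, 31, 25],  # Marketing
    [51, 45, 31, 35, 32, 27],  # Finance
]
k, b = len(samples), len(samples[0])
n = k * b
grand_mean = sum(sum(s) for s in samples) / n

# Total sum of squares: every observation against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)

# Treatment and block sums of squares, as defined earlier.
sst = sum(b * (sum(s) / b - grand_mean) ** 2 for s in samples)
ssb = sum(k * (sum(blk) / k - grand_mean) ** 2 for blk in zip(*samples))

# Error is whatever the treatments and blocks do not explain.
sse = ss_total - sst - ssb
print(round(ss_total, 2), round(sse, 1))
```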
Now we can compare our results across the
two designs we have discussed:
1) Completely randomized design
2) Randomized block design
Comparison of the Designs
| Sum of squares | Completely Randomized Design (One way ANOVA) | Randomized Block Design (Two way) |
|---|---|---|
| SST | 193 | 108.4 |
| SSB | --- | 854.9 |
| SSE | 819.5 | 52.2 |
| SS | 1012.5 | 1015.61 |
We can see that we have dramatically decreased the error
(SSE) by accounting for GPA. In other words, the variability
caused by differences among the blocks has been moved out of
the error term and into SSB.
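The one-way figures appear to come from the earlier data set, so the two designs are not compared on identical observations. As a rough like-for-like check, the sketch below runs both designs on the new data only: ignoring the GPA blocks leaves all of SSB inside the error term. This is an illustration under that assumption, not a calculation from the slides, and the names are hypothetical.

```python
samples = [
    [41, 36, 27, 32, 26, 23],  # Accounting
    [45, 38, 33, 29, 31, 25],  # Marketing
    [51, 45, 31, 35, 32, 27],  # Finance
]
k, b = len(samples), len(samples[0])
grand_mean = sum(sum(s) for s in samples) / (k * b)

ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)
sst = sum(b * (sum(s) / b - grand_mean) ** 2 for s in samples)
ssb = sum(k * (sum(blk) / k - grand_mean) ** 2 for blk in zip(*samples))

# One-way design on the same data: blocks are ignored, so error = SS - SST.
sse_one_way = ss_total - sst
# Randomized block design: block variation is removed from the error.
sse_blocked = ss_total - sst - ssb

# Fraction of the one-way error that the GPA blocks account for.
share_explained_by_blocks = ssb / sse_one_way
print(round(sse_one_way, 1), round(sse_blocked, 1))
```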
Summary Table
| Source of Variation | df | Sum of squares | Mean squares |
|---|---|---|---|
| Treatment | k-1 | SST | MST = SST/(k-1) |
| Block | b-1 | SSB | MSB = SSB/(b-1) |
| Error | n-k-b+1 | SSE | MSE = SSE/(n-k-b+1) |
| Total | n-1 | SS = SST+SSB+SSE | |
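The summary table can be filled in mechanically from the three sums of squares. The helper below is a hypothetical sketch (the function name `anova_table` and its layout are not from the slides); it assumes one observation per treatment-block cell, so n = k × b.

```python
def anova_table(sst, ssb, sse, k, b):
    """Return the randomized-block summary table as a dict of rows."""
    n = k * b  # assumes exactly one observation per treatment-block cell
    df = {"Treatment": k - 1, "Block": b - 1, "Error": n - k - b + 1}
    ss = {"Treatment": sst, "Block": ssb, "Error": sse}
    table = {
        src: {"df": df[src], "ss": ss[src], "ms": ss[src] / df[src]}
        for src in df
    }
    # The total row has no mean square.
    table["Total"] = {"df": n - 1, "ss": sst + ssb + sse, "ms": None}
    return table


# Sums of squares as reported on the slides.
table = anova_table(sst=108.4, ssb=854.9, sse=52.2, k=3, b=6)
for source, row in table.items():
    ms = "" if row["ms"] is None else f"{row['ms']:.2f}"
    print(f"{source:<10}{row['df']:>4}{row['ss']:>10.1f}{ms:>10}")
```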
We can calculate an F-statistic to test for
differences among samples or among blocks.
Calculating F (differences among
samples) for two way ANOVA
$F = \frac{MST}{MSE} = \frac{SST/(k-1)}{SSE/(n-k-b+1)}$
$F = \frac{108.4/(3-1)}{52.2/(18-3-6+1)} = \frac{54.2}{5.22} = 10.4$
Critical $F_{\alpha;\,k-1,\,n-k-b+1} = F_{.05;\,2,\,10} = 4.1$
Decision
Because our calculated F (10.4) exceeds our
critical F (4.1), we reject the null hypothesis
that the means across the samples are equal.
We conclude that there is a difference in the
mean salary levels across the 3 business
majors.
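The decision rule for the treatment test can be written out in a few lines. This sketch uses the sums of squares from the slides and takes the critical value 4.1 as given from an F table rather than computing it; the variable names are illustrative.

```python
# Values reported on the slides.
sst, sse = 108.4, 52.2
k, b = 3, 6          # 3 majors (treatments), 6 GPA levels (blocks)
n = k * b

mst = sst / (k - 1)              # mean square for treatments
mse = sse / (n - k - b + 1)      # mean square error
f_treatments = mst / mse

# Critical F at alpha = .05 with (2, 10) df, taken from an F table.
f_critical = 4.1
reject_h0 = f_treatments > f_critical

print(round(f_treatments, 1), reject_h0)
```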
Testing Block Differences
We could also test whether the blocks are different
from each other, that is, whether students with higher
GPAs earn more money.
$F = \frac{MSB}{MSE} = \frac{SSB/(b-1)}{SSE/(n-k-b+1)}$
$F = \frac{854.9/(6-1)}{52.2/(18-3-6+1)} = \frac{170.982}{5.22} = 32.76$
Testing Block Differences
Critical F, b-1, n-k-b+1 = F.05, 5, 10 = 3.33
We can also reject the null hypothesis of no
difference among blocks because our
calculated F (32.76) exceeds our critical F
(3.33).
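Because the block test has the same shape as the treatment test, both can share one small helper. The function `f_test` below is a hypothetical sketch, not something defined in the slides; the critical value 3.33 is taken as given from an F table, and rounding of the inputs may shift the last decimal of F.

```python
def f_test(ss_effect, df_effect, sse, df_error, f_critical):
    """Return (F statistic, reject H0?) for one effect against the error term."""
    f_stat = (ss_effect / df_effect) / (sse / df_error)
    return f_stat, f_stat > f_critical


k, b = 3, 6
df_error = k * b - k - b + 1     # n - k - b + 1 with n = k * b

# Block test: SSB and SSE as reported on the slides.
f_blocks, reject_blocks = f_test(854.9, b - 1, 52.2, df_error, f_critical=3.33)
print(round(f_blocks, 2), reject_blocks)
```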
Mean Square Error (MSE)
It is interesting to note that MSE is similar to the
pooled variance $s_p^2$, which we calculated earlier for
a matched samples confidence interval.
$MSE = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \dots + (n_k-1)s_k^2}{n-k}$
Thus MSE is an unbiased estimate of $\sigma^2$.
W&W show that you can substitute MSE for $s_p^2$ in
the calculation of a confidence interval for $(\mu_1 - \mu_2)$.
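The pooled-variance form of MSE can be verified numerically. Note that the formula divides by n - k, which is the error df of the one-way (completely randomized) design, so the sketch below compares it with the one-way error on the new data, not with the blocked MSE. It uses `statistics.variance` from the Python standard library; the other names are illustrative.

```python
from statistics import variance

samples = [
    [41, 36, 27, 32, 26, 23],  # Accounting
    [45, 38, 33, 29, 31, 25],  # Marketing
    [51, 45, 31, 35, 32, 27],  # Finance
]
k = len(samples)
n = sum(len(s) for s in samples)

# Pooled variance: each sample variance weighted by its degrees of freedom.
pooled_variance = sum((len(s) - 1) * variance(s) for s in samples) / (n - k)

# One-way MSE: squared deviations from each sample's own mean, over n - k.
within_ss = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
mse_one_way = within_ss / (n - k)

print(round(pooled_variance, 2), round(mse_one_way, 2))
```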
Some Assumptions for ANOVA
The population random variables must be
normally distributed (there are many alternative
nonparametric tests if this is violated).
Population variances must be equal.
We assume an additive model, where the effects of
the two factors are added together (a multiplicative
model may be needed if students with a particular GPA
in a particular major have an unusually high salary).
We have no missing data.
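One way to see what the additive assumption means is to fit it directly: each salary is modeled as grand mean + major effect + GPA effect, and whatever is left over is the residual. The sketch below is an illustration in plain Python with hypothetical names; it shows that the squared residuals from this additive fit add up to the SSE found earlier by subtraction. A few unusually large residuals in particular cells would be a hint that the two factors do not simply add.

```python
samples = [
    [41, 36, 27, 32, 26, 23],  # Accounting
    [45, 38, 33, 29, 31, 25],  # Marketing
    [51, 45, 31, 35, 32, 27],  # Finance
]
k, b = len(samples), len(samples[0])
grand_mean = sum(sum(s) for s in samples) / (k * b)
t_means = [sum(s) / b for s in samples]                         # major means
b_means = [sum(s[i] for s in samples) / k for i in range(b)]    # GPA means

# Additive fit: grand + (major mean - grand) + (GPA mean - grand),
# so the residual is x - major mean - GPA mean + grand.
residuals = [
    [samples[j][i] - t_means[j] - b_means[i] + grand_mean for i in range(b)]
    for j in range(k)
]

sse_from_residuals = sum(e ** 2 for row in residuals for e in row)
print(round(sse_from_residuals, 1))
```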