Analysis of Variance - University of South Carolina


One-Way Analysis of Variance
Stat 700 Lecture 12
11/29/2001
Illustrative Example


Four chemical plants, producing the same product
and owned by the same company, discharge
effluents into streams in the vicinity of their locations.
To check the extent of the pollution created by the
effluents and to determine whether this varies from
plant to plant, the company collected random
samples of liquid waste, five specimens for each of
the four plants.
The data are presented in the table on the next slide.
7/18/2015
The Data
Plant    Polluting Effluents (lb/gal of waste)    Mean
A        1.65   1.72   1.50   1.37   1.60         1.568
B        1.70   1.85   1.46   2.05   1.80         1.772
C        1.40   1.75   1.38   1.65   1.55         1.546
D        2.10   1.95   1.65   1.88   2.00         1.916
Overall Mean                                      1.7005
Question: Do the data provide sufficient evidence to
indicate a difference in the mean amounts of effluents
discharged by the four plants?
Another Example

College students were assigned to various study
methods in an experiment to determine the effect of
study technique on learning. The data presented in
the next table was generated to be consistent with
summary quantities found in the paper "The Effect of
Study Techniques, Study Preferences and Familiarity
on Later Recall." The study methods compared were
reading only, reading and underlining, and reading
and taking notes. One week after studying the paper
"Love in Infant Monkeys," students were given an
exam on the article.
The Data on Learning
Technique              Test Score                      Means
Read Only              15   14   16   13   11   14     13.833
Read and Underline     15   14   25   10   12   14     15.000
Read and Take Notes    18   18   18   16   18   20     18.000
Overall                                                15.611
Question: Is there a difference between the true mean scores
for the three study methods?
Testing Equality of Several
Population Means
The setting is that there are p normal
populations, each with variance $\sigma^2$. This
assumption of equal variances is called the
homoscedasticity assumption. The ith normal
population has mean $\mu_i$, i = 1, 2, …, p. The goal
is to test the null hypothesis that the p
population means are all identical, versus the
alternative hypothesis that there are at least
two means which are different.
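In symbols, the setting just described can be restated as below. This is only a restatement; $n_i$ denotes the size of the ith sample, and independence of the observations is the usual assumption for this model rather than something stated explicitly on the slide.

```latex
Y_{ij} \sim N(\mu_i, \sigma^2) \ \text{independently}, \qquad j = 1, \ldots, n_i, \quad i = 1, \ldots, p

H_0 : \mu_1 = \mu_2 = \cdots = \mu_p

H_1 : \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
```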
Sample Data
For the ith population, we observe a random
sample of size $n_i$, i = 1, 2, …, p. The data for this
sample therefore consist of $Y_{i1}, Y_{i2}, \ldots, Y_{in_i}$.
The sample data for the p samples could then
be summarized in tabular form as follows:
Data in Tabular Form
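Group     Observations                          Sample Size   Group Total   Group Mean
1         $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$    $n_1$         $Y_{1.}$      $\bar{Y}_{1.}$
2         $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$    $n_2$         $Y_{2.}$      $\bar{Y}_{2.}$
…         …                                     …             …             …
p         $Y_{p1}, Y_{p2}, \ldots, Y_{pn_p}$    $n_p$         $Y_{p.}$      $\bar{Y}_{p.}$
Overall                                         $n$           $Y_{..}$      $\bar{Y}_{..}$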
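As a small illustration of this layout, the sketch below fills in the sample size, group total, and group mean columns for the effluent data from the earlier slide. The function name `summarize` and the dictionary layout are choices made for this example only, not part of the lecture.

```python
# Effluent data (lb/gal of waste), one list of specimens per plant.
samples = {
    "A": [1.65, 1.72, 1.50, 1.37, 1.60],
    "B": [1.70, 1.85, 1.46, 2.05, 1.80],
    "C": [1.40, 1.75, 1.38, 1.65, 1.55],
    "D": [2.10, 1.95, 1.65, 1.88, 2.00],
}


def summarize(groups):
    """Return per-group (n_i, total, mean) and the overall (n, total, mean)."""
    rows = {}
    for name, ys in groups.items():
        rows[name] = (len(ys), sum(ys), sum(ys) / len(ys))
    n = sum(r[0] for r in rows.values())
    grand_total = sum(r[1] for r in rows.values())
    return rows, (n, grand_total, grand_total / n)


rows, overall = summarize(samples)
for name, (n_i, total, mean) in rows.items():
    print(f"{name}: n = {n_i}  total = {total:.2f}  mean = {mean:.3f}")
print(f"Overall: n = {overall[0]}  total = {overall[1]:.2f}  mean = {overall[2]:.4f}")
```

The last row corresponds to the "Overall" line of the table: total sample size, grand total, and grand mean.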
Test Procedure
The test procedure for testing the equality of the p
population means is based on the F-distribution, and is
usually called the one-way analysis of variance. The test
statistic is given by:
$$F_c = \frac{n-p}{p-1} \cdot \frac{\sum_{i=1}^{p} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2}{\sum_{i=1}^{p} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2} = \frac{MSTr}{MSE}$$
If the value of this test statistic is larger than $F_{\alpha;\, p-1,\, n-p}$,
obtained from the F-distribution table, then the null
hypothesis of equal population means is rejected.
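The test procedure can be sketched in a few lines of Python, here applied to the study-technique scores from the earlier slide. Two things are assumptions of this sketch rather than of the lecture: the significance level (0.05) and the corresponding table value $F_{0.05;\,2,\,15} = 3.68$. The function name `one_way_f` is also just illustrative.

```python
def one_way_f(groups):
    """Return (F_c, MSTr, MSE) computed directly from the definition."""
    p = len(groups)                        # number of populations
    n = sum(len(g) for g in groups)        # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) and within-group (error) sums of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    mstr = sstr / (p - 1)
    mse = sse / (n - p)
    return mstr / mse, mstr, mse


# Test scores for the three study techniques.
scores = [
    [15, 14, 16, 13, 11, 14],   # read only
    [15, 14, 25, 10, 12, 14],   # read and underline
    [18, 18, 18, 16, 18, 20],   # read and take notes
]

# Assumption: alpha = 0.05, so the table value is F_{0.05; 2, 15} = 3.68.
F_CRITICAL = 3.68

f_c, mstr, mse = one_way_f(scores)
reject_h0 = f_c > F_CRITICAL
print(f"MSTr = {mstr:.3f}, MSE = {mse:.3f}, Fc = {f_c:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0", "at the assumed 5% level")
```

For these scores the statistic comes out near 2.62, which is below 3.68, so under the assumed 5% level this sketch would not reject equality of the three means.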
ANOVA Representation
Source               Degrees of Freedom   Sum of Squares   Mean Squares   F Statistic
Treatment or Group   p-1                  SSTr             MSTr           Fc = MSTr/MSE
Error or Residual    n-p                  SSE              MSE
Total                n-1                  SST
Formulas
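$$SST = \sum_{i=1}^{p} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{p} \sum_{j=1}^{n_i} Y_{ij}^2 - CF$$
$$CF = \frac{(Y_{..})^2}{n}$$
$$SSTr = \sum_{i=1}^{p} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 = \sum_{i=1}^{p} n_i (\bar{Y}_{i.})^2 - CF$$
$$SSE = \sum_{i=1}^{p} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2 = SST - SSTr$$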
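The shortcut forms on the right-hand sides are convenient for hand or spreadsheet work. A minimal sketch, using the effluent data and an illustrative function name `anova_sums`, is given below. It uses the fact that $n_i (\bar{Y}_{i.})^2 = (Y_{i.})^2 / n_i$, so the treatment sum of squares can be computed from the group totals.

```python
def anova_sums(groups):
    """Return (CF, SST, SSTr, SSE) using the correction-factor shortcut."""
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)          # Y..
    cf = grand_total ** 2 / n                          # correction factor
    sst = sum(y * y for g in groups for y in g) - cf   # sum of Y_ij^2 minus CF
    # n_i * mean_i^2 equals (group total)^2 / n_i.
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = sst - sstr
    return cf, sst, sstr, sse


effluents = [
    [1.65, 1.72, 1.50, 1.37, 1.60],   # plant A
    [1.70, 1.85, 1.46, 2.05, 1.80],   # plant B
    [1.40, 1.75, 1.38, 1.65, 1.55],   # plant C
    [2.10, 1.95, 1.65, 1.88, 2.00],   # plant D
]

cf, sst, sstr, sse = anova_sums(effluents)
print(f"CF = {cf:.5f}  SST = {sst:.6f}  SSTr = {sstr:.6f}  SSE = {sse:.4f}")
```

Because CF is large relative to the sums of squares, the shortcut subtracts two nearly equal numbers; that is harmless at this scale but is worth keeping in mind for data with a large mean and small spread.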
Example Using Effluents Data
Plant    Polluting Effluents (lb/gal of waste)    Mean
A        1.65   1.72   1.50   1.37   1.60         1.568
B        1.70   1.85   1.46   2.05   1.80         1.772
C        1.40   1.75   1.38   1.65   1.55         1.546
D        2.10   1.95   1.65   1.88   2.00         1.916
Overall Mean                                      1.7005
We present and illustrate the analysis using Minitab. The
next slide shows the output from the Minitab analysis. We will
illustrate how this is done in class.
Output from Minitab
One-way Analysis of Variance

Analysis of Variance
Source     DF        SS        MS        F        P
Factor      3    0.4649    0.1550     5.20    0.011
Error      16    0.4768    0.0298
Total      19    0.9417

Level       N      Mean     StDev
A           5    1.5680    0.1366
B           5    1.7720    0.2160
C           5    1.5460    0.1592
D           5    1.9160    0.1689

Pooled StDev = 0.1726

[Character plot: individual 95% CIs for each level mean, based on pooled StDev; axis ticks at 1.40, 1.60, 1.80, 2.00]
Conclusion: The p-value of 0.011 is quite small, so there is
evidence that at least two population means are different.
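For readers curious where the P column comes from, the sketch below recomputes the upper-tail probability of the F distribution for F = 5.20 with 3 and 16 degrees of freedom. It is not how Minitab computes the value; it is a plain numerical integration (Simpson's rule) of the equivalent beta density, using only the standard library. The function name `f_upper_tail` and the number of integration steps are choices made for this example.

```python
import math


def f_upper_tail(f, d1, d2, steps=2000):
    """Approximate P(F > f) for an F(d1, d2) variable.

    Uses P(F > f) = I_w(d2/2, d1/2) with w = d2 / (d2 + d1 * f), where I is
    the regularized incomplete beta function, integrated by Simpson's rule.
    Assumes f > 0, d2 >= 2, and an even number of steps.
    """
    a, b = d2 / 2.0, d1 / 2.0
    w = d2 / (d2 + d1 * f)
    h = w / steps

    def density_kernel(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = density_kernel(0.0) + density_kernel(w)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density_kernel(k * h)
    integral = total * h / 3
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


# F statistic and degrees of freedom taken from the Minitab output.
p_value = f_upper_tail(5.20, 3, 16)
print(f"P(F > 5.20) with (3, 16) df is approximately {p_value:.4f}")
```

The result agrees with the reported P of 0.011 to the three decimals shown in the output.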
Calculations from Excel
Group   Observations                        Total   Mean    SumSQ
A       1.65   1.72   1.5    1.37   1.6     7.84    1.568   12.3678
B       1.7    1.85   1.46   2.05   1.8     8.86    1.772   15.8866
C       1.4    1.75   1.38   1.65   1.55    7.73    1.546   12.0519
D       2.1    1.95   1.65   1.88   2       9.58    1.916   18.4694

Overall SumSq   58.7757
OverallTotal    34.01
CF              57.83401
SST             0.941695
SSTr            0.464895
SSE             0.4768
Example for RCB
(Two-Way Analysis of Variance)

A study was conducted to compare the effects of
three levels of digitalis on the levels of calcium in the
heart muscles of dogs. A description of the actual
experimental procedure is omitted, but it is sufficient
to note that the general level of calcium uptake varies
from one animal to another so that comparison of
digitalis levels (treatments) had to be blocked on
heart muscles. That is, the tissue for a heart muscle
was regarded as a block, and comparisons of the
three treatments were made within a given muscle.
The calcium uptakes for the three levels of digitalis,
A, B, and C, were compared based on the heart
muscles of the four dogs.
The Raw Data from Study
Dogs    Digitalis level and calcium uptake
1       A 1342    B 1608    C 1881
2       C 1698    B 1387    A 1140
3       B 1296    A 1029    C 1549
4       A 1150    C 1579    B 1319
Two-way Analysis of Variance
Analysis of Variance for CalLevel
Source       DF        SS        MS         F        P
Digitali      2    524177    262089    258.24    0.000
Dog           3    173415     57805     56.96    0.000
Error         6      6089      1015
Total        11    703682

Digitali     Mean
A            1165
B            1402
C            1677
[Character plot: individual 95% CI for each digitalis level mean; axis ticks at 1200, 1350, 1500, 1650]

Dog          Mean
1            1610
2            1408
3            1291
4            1349
[Character plot: individual 95% CI for each dog mean; axis ticks at 1300, 1400, 1500, 1600]
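The slides show only the software output for the randomized complete block analysis. As a rough check, the sketch below recomputes the sums of squares and F statistics for the dog data with the standard block-design decomposition (total = treatments + blocks + error), using the same correction-factor idea as in the one-way case. The function name `rcb_anova` and the nested-dictionary layout are choices made for this example, not part of the lecture.

```python
# Calcium uptake, keyed by dog (block) and then digitalis level (treatment).
uptake = {
    1: {"A": 1342, "B": 1608, "C": 1881},
    2: {"A": 1140, "B": 1387, "C": 1698},
    3: {"A": 1029, "B": 1296, "C": 1549},
    4: {"A": 1150, "B": 1319, "C": 1579},
}


def rcb_anova(data):
    """Return sums of squares and F statistics for a complete block design."""
    blocks = list(data)
    treatments = list(next(iter(data.values())))
    b, t = len(blocks), len(treatments)
    values = [data[d][k] for d in blocks for k in treatments]
    cf = sum(values) ** 2 / (b * t)                      # correction factor
    ss_total = sum(v * v for v in values) - cf
    # Treatment totals are sums over blocks; block totals are sums over treatments.
    ss_treat = sum(sum(data[d][k] for d in blocks) ** 2 for k in treatments) / b - cf
    ss_block = sum(sum(data[d][k] for k in treatments) ** 2 for d in blocks) / t - cf
    ss_error = ss_total - ss_treat - ss_block
    ms_error = ss_error / ((b - 1) * (t - 1))
    return {
        "ss_treat": ss_treat,
        "ss_block": ss_block,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "f_treat": (ss_treat / (t - 1)) / ms_error,
        "f_block": (ss_block / (b - 1)) / ms_error,
    }


result = rcb_anova(uptake)
for key, value in result.items():
    print(f"{key:9s} = {value:.2f}")
```

The values agree with the output above up to the rounding used in the printed table.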