Analysis of Variance - Nicholls State University

Analysis of Variance
• Compares group means to determine whether the populations they were sampled from differ
• Uses means and confidence intervals, much like a t-test
• The test statistic is called an F statistic (F-test)
Normal Distribution
• Many measured characteristics approximately follow a normal distribution
– For example: height, length, speed, etc.
• One of the assumptions of the ANOVA test is that the sample data are ‘normally distributed.’
Sample Distribution Approaches Normal Distribution With Sample Size
[Figure, shown on three consecutive slides: frequency histograms (y-axis 0 to 10) comparing the Population and Sample distributions]
Proc Univariate
• Tests for normality
• Gives you a ‘visual’ of your sample
distribution
• SAS code:
proc sort; by location;
run;
proc univariate plot normal; by location;
var length;
run;
Important Univariate Output
Tests for Normality
Test                   --Statistic---     -----p Value------
Shapiro-Wilk           W       0.951978   Pr < W      0.6919
Kolmogorov-Smirnov     D       0.156907   Pr > D     >0.1500
Cramer-von Mises       W-Sq    0.03281    Pr > W-Sq  >0.2500
Anderson-Darling       A-Sq    0.223707   Pr > A-Sq  >0.2500
All four tests above test for normality. The Shapiro-Wilk and Kolmogorov-Smirnov tests are the two most common. Because all p values are >0.05, none of the tests indicates that our sample is significantly different from a normal distribution.
Mean = $\bar{x} = \frac{\sum x_i}{N}$
Variance = $\frac{\sum (x_i - \bar{x})^2}{N - 1}$
Standard Deviation = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}$
Standard Error = $\frac{SD}{\sqrt{N}}$

Individual   Weight   Mean     (Weight - Mean)²
1            26       28.17     4.7089
2            32       28.17    14.6689
3            25       28.17    10.0489
4            26       28.17     4.7089
5            30       28.17     3.3489
6            30       28.17     3.3489
N = 6        Sum = 169         SOS (sum of squares) = 40.8334
N - 1 = 5

Mean = 169 / 6 = 28.17
Range = 25 – 32
SOS = 40.83
Variance = 40.83 / 5 = 8.17
Std. Dev. = √(40.83 / 5) = 2.86
Std. Err. = 2.86 / √6 = 1.17
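The hand calculations above can be cross-checked in SAS. This is a minimal sketch, assuming a data set named weights (a hypothetical name, not from the slides) that holds the six weights from the table; proc means then reports N, mean, variance, standard deviation, standard error, minimum, and maximum.

```sas
/* Hypothetical data set name 'weights'; values are the six weights in the table above */
data weights;
  input weight @@;      /* @@ lets SAS read several values from one data line */
  datalines;
26 32 25 26 30 30
;
run;

/* Request the same statistics that were worked out by hand */
proc means data=weights n mean var std stderr min max maxdec=2;
  var weight;
run;
```

The output should agree with the hand-calculated values up to rounding.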
ANOVA – Analysis of Variance
Calculate a SOS based on an overall mean (total SOS)
[Figure: lengths for Pond and Lake plotted on a 0 to 120 scale]

Trtmnt   Replicate   Length   Overall Mean   SOSTotal
Pond     1           34       57.7            561.69
Pond     2           78       57.7            412.09
Pond     3           48       57.7             94.09
Pond     4           24       57.7           1135.69
Pond     5           64       57.7             39.69
Pond     6           58       57.7              0.09
Pond     7           34       57.7            561.69
Pond     8           66       57.7             68.89
Pond     9           22       57.7           1274.49
Pond     10          44       57.7            187.69
Lake     1           38       57.7            388.09
Lake     2           82       57.7            590.49
Lake     3           58       57.7              0.09
Lake     4           76       57.7            334.89
Lake     5           60       57.7              5.29
Lake     6           70       57.7            151.29
Lake     7           99       57.7           1705.69
Lake     8           40       57.7            313.29
Lake     9           68       57.7            106.09
Lake     10          91       57.7           1108.89
                              Total          9040.2

This provides a measure of the overall variance (Total SOS).
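Written as a formula, the Total SOS sums the squared deviation of every observation from the overall mean. The symbols below ($y_{ij}$ for replicate $j$ in treatment $i$, $\bar{y}_{..}$ for the overall mean) are notation assumed here for illustration; the slides themselves do not use symbols.

```latex
\[
SOS_{Total} = \sum_{i=1}^{2}\sum_{j=1}^{10} \left(y_{ij} - \bar{y}_{..}\right)^2,
\qquad \bar{y}_{..} = \frac{1154}{20} = 57.7
\]
\[
SOS_{Total} = (34 - 57.7)^2 + (78 - 57.7)^2 + \dots + (91 - 57.7)^2 = 9040.2
\]
```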
Calculate a SOS for each treatment based on its own mean (Treatment or Error SOS).
[Figure: lengths for Pond and Lake plotted on a 0 to 120 scale]

Trtmnt   Replicate   Length   Trtmnt Mean   SOSError
Pond     1           34       47.2          174.24
Pond     2           78       47.2          948.64
Pond     3           48       47.2            0.64
Pond     4           24       47.2          538.24
Pond     5           64       47.2          282.24
Pond     6           58       47.2          116.64
Pond     7           34       47.2          174.24
Pond     8           66       47.2          353.44
Pond     9           22       47.2          635.04
Pond     10          44       47.2           10.24
Lake     1           38       68.2          912.04
Lake     2           82       68.2          190.44
Lake     3           58       68.2          104.04
Lake     4           76       68.2           60.84
Lake     5           60       68.2           67.24
Lake     6           70       68.2            3.24
Lake     7           99       68.2          948.64
Lake     8           40       68.2          795.24
Lake     9           68       68.2            0.04
Lake     10          91       68.2          519.84
                              Total        6835.2

This provides a measure of the variance that remains after measuring each treatment separately (Treatment or Error SOS).
What happens to Error SOS when the variability within each treatment decreases?
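As a formula, the Error SOS sums the squared deviation of each observation from its own treatment mean. The notation ($\bar{y}_{i.}$ for the mean of treatment $i$) is assumed here for illustration, and the Pond and Lake subtotals are simply the sums of the ten SOSError values listed for each treatment.

```latex
\[
SOS_{Error} = \sum_{i}\sum_{j} \left(y_{ij} - \bar{y}_{i.}\right)^2,
\qquad \bar{y}_{Pond} = 47.2, \quad \bar{y}_{Lake} = 68.2
\]
\[
SOS_{Error} = \underbrace{3233.6}_{\text{Pond}} + \underbrace{3601.6}_{\text{Lake}} = 6835.2
\]
```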
Calculate a SOS for each predicted value vs. the overall mean (Model SOS)
[Figure: Predicted_Pond, Predicted_Lake, and Overall_Avg plotted on a 0 to 120 scale]

Trtmnt   Replicate   Length   Trtmnt Mean   Overall Mean   SOSModel
Pond     1           34       47.2          57.7           110.25
Pond     2           78       47.2          57.7           110.25
Pond     3           48       47.2          57.7           110.25
Pond     4           24       47.2          57.7           110.25
Pond     5           64       47.2          57.7           110.25
Pond     6           58       47.2          57.7           110.25
Pond     7           34       47.2          57.7           110.25
Pond     8           66       47.2          57.7           110.25
Pond     9           22       47.2          57.7           110.25
Pond     10          44       47.2          57.7           110.25
Lake     1           38       68.2          57.7           110.25
Lake     2           82       68.2          57.7           110.25
Lake     3           58       68.2          57.7           110.25
Lake     4           76       68.2          57.7           110.25
Lake     5           60       68.2          57.7           110.25
Lake     6           70       68.2          57.7           110.25
Lake     7           99       68.2          57.7           110.25
Lake     8           40       68.2          57.7           110.25
Lake     9           68       68.2          57.7           110.25
Lake     10          91       68.2          57.7           110.25
                                            Total          2205

This provides a measure of the distance between the mean values (Model SOS).
What happens to Model SOS when the two means are close together?
What if the means are equal?
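The Model SOS can be written compactly because every observation in a treatment contributes the same squared distance between its treatment mean and the overall mean. The notation ($n_i$ for the number of replicates in treatment $i$) is assumed here for illustration. The second line shows how the three sums of squares from these slides fit together.

```latex
\[
SOS_{Model} = \sum_{i} n_i \left(\bar{y}_{i.} - \bar{y}_{..}\right)^2
= 10\,(47.2 - 57.7)^2 + 10\,(68.2 - 57.7)^2 = 1102.5 + 1102.5 = 2205
\]
\[
SOS_{Total} = SOS_{Model} + SOS_{Error}: \qquad 9040.2 = 2205 + 6835.2
\]
```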
Detecting a Difference Between Treatments
• Model SOS gives us an index of how far apart the two means are from each other.
– Bigger Model SOS = farther apart
• Error SOS gives us an index of how scattered the data are within each treatment.
– More variability = larger Error SOS = more possible overlap between treatments
Magic of the F-test
• The ratio of Model SOS to Error SOS, with each SOS first divided by its degrees of freedom to give a mean square, gives us an overall index (the F statistic) used to indicate the relative ‘distance’ and ‘overlap’ between two means.
– A large Model SOS and small Error SOS = a large F statistic. Why does this indicate a significant difference?
– A small Model SOS and a large Error SOS = a small F statistic. Why does this indicate no significant difference?
• Based on the degrees of freedom (set by the number of treatments and the sample size), each F statistic has an associated P-value, which is compared with the alpha level (0.05 here).
– P < 0.05 (Large F statistic): there is a significant difference between the means
– P ≥ 0.05 (Small F statistic): there is NO significant difference
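For the Pond and Lake example, the F statistic can be worked out from the two sums of squares and their degrees of freedom. The degrees-of-freedom symbols are notation assumed here: with $k = 2$ treatments and $N = 20$ observations, $DF_{Model} = k - 1 = 1$ and $DF_{Error} = N - k = 18$.

```latex
\[
F = \frac{SOS_{Model} / DF_{Model}}{SOS_{Error} / DF_{Error}}
= \frac{2205 / 1}{6835.2 / 18}
= \frac{2205}{379.73} \approx 5.81
\]
```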
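SAS Program with ANOVA added
cards;
[Data Set not shown]
;
proc print;
run;
proc sort; by location;
run;
/* SAS ignores everything between the comment markers
proc means mean n var stddev cv stderr clm;
by location;
var length;
run;
*/
proc anova;                 /* Tells SAS to do the analysis of variance procedure */
class location;             /* Tells SAS that location is a class variable */
model length = location;    /* Tells SAS to compare length between locations */
run;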
The SAS System                    10:10 Monday, June 19, 2006   1

Obs   location   replicate   length
1     Pond       1           34
2     Pond       2           78
3     Pond       3           48
4     Pond       4           24
5     Pond       5           64
6     Pond       6           58
7     Pond       7           34
8     Pond       8           66
9     Pond       9           22
10    Pond       10          44
11    Lake       1           38
12    Lake       2           82
13    Lake       3           58
14    Lake       4           76
15    Lake       5           60
16    Lake       6           70
17    Lake       7           99
18    Lake       8           40
19    Lake       9           68
20    Lake       10          91

Data Set: 20 total observations; two locations with 10 replicates each; individual lengths
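Because the slides do not show the data step, here is a minimal sketch of a complete program that reads the 20 observations listed above and runs the one-way ANOVA. The data set name fish is a hypothetical choice, and the class and model statements are inferred from the class variable (location) and dependent variable (length) named in the SAS output.

```sas
/* Hypothetical data set name 'fish'; the 20 observations are those listed above */
data fish;
  input location $ replicate length;   /* $ marks location as a character variable */
  datalines;
Pond 1 34
Pond 2 78
Pond 3 48
Pond 4 24
Pond 5 64
Pond 6 58
Pond 7 34
Pond 8 66
Pond 9 22
Pond 10 44
Lake 1 38
Lake 2 82
Lake 3 58
Lake 4 76
Lake 5 60
Lake 6 70
Lake 7 99
Lake 8 40
Lake 9 68
Lake 10 91
;
run;

proc anova data=fish;
  class location;             /* location is the grouping (class) variable */
  model length = location;    /* compare mean length between locations */
run;
quit;                         /* proc anova stays active until quit */
```

proc anova is intended for balanced designs like this one (10 replicates per location); with unequal group sizes, proc glm is the usual choice.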
SAS ANOVA Output 1st Page
The ANOVA Procedure
Class Level Information
Class       Levels   Values
location    2        Lake Pond

Number of Observations Read   20
Number of Observations Used   20

This tells us that SAS understands that there are two classes: Lake and Pond. We are also told that SAS can use all 20 values in this ANOVA procedure.
SAS ANOVA Output 2nd Page
The ANOVA Procedure
Dependent Variable: length

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1       2205.000000    2205.000000       5.81    0.0269
Error              18       6835.200000     379.733333
Corrected Total    19       9040.200000

[Slide callouts: Mean Square = Sum of Squares ÷ DF; Pr > F is the P-value]

R-Square    Coeff Var    Root MSE    length Mean
0.243911    33.77253     19.48675    57.70000

Source      DF    Anova SS       Mean Square    F Value    Pr > F
location     1    2205.000000    2205.000000       5.81    0.0269

What are some ways to make the F Value larger?
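The summary statistics printed under the ANOVA table follow directly from its entries. The relations below are the standard definitions, shown as a worked check against the numbers in this output.

```latex
\[
R^2 = \frac{SS_{Model}}{SS_{Total}} = \frac{2205}{9040.2} \approx 0.2439
\]
\[
\text{Root MSE} = \sqrt{MS_{Error}} = \sqrt{379.7333} \approx 19.487
\]
\[
\text{Coeff Var} = 100 \times \frac{\text{Root MSE}}{\text{length Mean}}
= 100 \times \frac{19.487}{57.7} \approx 33.77
\]
```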
What about analysis of variance with three treatments:

Treatment   Mean
1           26.4
2           17.2
3           29.0

The SAS System                    13:17 Monday, October 4, 2004   1

Obs   treat   size
1     1       25
2     1       22
3     1       30
4     1       26
5     1       29
6     2       15
7     2       12
8     2       22
9     2       19
10    2       18
11    3       32
12    3       31
13    3       27
14    3       26
15    3       29

Data Set: 3 treatments with 5 replicates per treatment
The ANOVA Procedure
Class Level Information
Class    Levels   Values
treat    3        1 2 3

Number of observations   15

[Slide callouts: Variable name; Variable labels]
The ANOVA Procedure
Dependent Variable: size

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2       384.4000000    192.2000000      18.30    0.0002
Error              12       126.0000000     10.5000000
Corrected Total    14       510.4000000

[Slide callouts: F-value marks F Value; P-value marks Pr > F]

R-Square    Coeff Var    Root MSE    size Mean
0.753135    13.38996     3.240370    24.20000

Source    DF    Anova SS       Mean Square    F Value    Pr > F
treat      2    384.4000000    192.2000000      18.30    0.0002

Treatment   Mean
1           26.4
2           17.2
3           29.0

Which means are different/similar?
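The Model sum of squares and F value in this table can be reproduced from the three treatment means, the overall mean (24.2), and the 5 replicates per treatment. This is a worked check using the standard one-way ANOVA formulas, not additional SAS output.

```latex
\[
SS_{Model} = 5\left[(26.4 - 24.2)^2 + (17.2 - 24.2)^2 + (29.0 - 24.2)^2\right]
= 5\,(4.84 + 49.00 + 23.04) = 384.4
\]
\[
F = \frac{384.4 / 2}{126.0 / 12} = \frac{192.2}{10.5} \approx 18.30
\]
```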
Delineating the Means With SAS
proc anova;                   /* Tells SAS to do the analysis of variance procedure */
class treat;                  /* Tells SAS that treat is a class variable */
model size = treat;           /* Tells SAS to compare size among treatments */
means treat / tukey lines;    /* Tells SAS to delineate the means with a Tukey test
                                 and use the lines method to show differences */
run;
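A complete version of this analysis might look like the sketch below. It assumes a data set named growth (a hypothetical name) and uses the variable names treat and size that appear in the SAS output for the three-treatment data set.

```sas
/* Hypothetical data set name 'growth'; the 15 observations are the three-treatment data */
data growth;
  input treat size;
  datalines;
1 25
1 22
1 30
1 26
1 29
2 15
2 12
2 22
2 19
2 18
3 32
3 31
3 27
3 26
3 29
;
run;

proc anova data=growth;
  class treat;                  /* treat is the grouping (class) variable */
  model size = treat;           /* compare mean size among treatments */
  means treat / tukey lines;    /* Tukey HSD test, shown with letter groupings */
run;
quit;
```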
The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for size
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                    0.05
Error Degrees of Freedom                   12
Error Mean Square                        10.5
Critical Value of Studentized Range   3.77278
Minimum Significant Difference         5.4673

Means with the same letter are not significantly different.

Tukey Grouping   Mean      N   treat
A                29.000    5   3
A
A                26.400    5   1
B                17.200    5   2

Treatments 1 and 3 are not different from each other; both are different from treatment 2.
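The Minimum Significant Difference reported by SAS can be reproduced from the critical value, the error mean square, and the number of replicates per treatment, and then compared with each pairwise difference in means. The symbols $q$ (critical value of the studentized range) and $n$ (replicates per treatment) are notation assumed here.

```latex
\[
MSD = q \sqrt{\frac{MS_{Error}}{n}} = 3.77278 \times \sqrt{\frac{10.5}{5}} \approx 5.467
\]
\[
|29.0 - 26.4| = 2.6 < 5.467, \qquad
|26.4 - 17.2| = 9.2 > 5.467, \qquad
|29.0 - 17.2| = 11.8 > 5.467
\]
```

Only the difference between treatments 1 and 3 falls below the threshold, which matches the A, A, B grouping.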
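Showing Results
[Figure: bar chart of mean size (y-axis 0 to 35) for treatments 1, 2, and 3, with Tukey grouping letters A, A, and B marked above the bars]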