Analysis of variance (2) - University of Hong Kong

Analysis of variance (2)
Lecture 10
Flowchart: choosing a statistical test
1. Measurements (data) -> descriptive statistics.
2. Normality check: frequency histogram (skewness & kurtosis), probability plot, K-S test.
   - YES (normal): report mean, SD, SEM and 95% confidence interval, then check the homogeneity of variance (F max test).
   - NO (not normal): report median, range, Q1 and Q3; try data transformation.
3. Homogeneity of variance:
   - YES: parametric tests. Student's t tests for 2 samples; ANOVA for ≥ 2 samples (one-way ANOVA, two-way ANOVA); post hoc tests for multiple comparison of means (Tukey's test).
   - NO: try data transformation; otherwise use non-parametric test(s).
4. Non-parametric test(s):
   - For 2 samples: Mann-Whitney
   - For 2 paired samples: Wilcoxon
   - For > 2 samples: Kruskal-Wallis (K-W test, followed by Dunn's test); Scheirer-Ray-Hare
Kruskal-Wallis test with tied ranks
• Example 10.11 (Zar, 1999): comparison of pH among 4 ponds

         Pond 1          Pond 2          Pond 3          Pond 4
         pH     Rank     pH     Rank     pH     Rank     pH     Rank
         7.68   1        7.71   6        7.74   13.5     7.71   6
         7.69   2        7.73   10       7.75   16       7.71   6
         7.70   3.5      7.74   13.5     7.77   18       7.74   13.5
         7.70   3.5      7.74   13.5     7.78   20       7.79   22
         7.72   8        7.78   20       7.80   23.5     7.81   26
         7.73   10       7.78   20       7.81   26       7.85   29
         7.73   10       7.80   23.5     7.84   28       7.87   30
         7.76   17       7.81   26                       7.91   31

                 Pond 1    Pond 2    Pond 3    Pond 4    Total
n                8         8         7         8         31
Sum R            55        132.5     145       163.5     496
R^2/n            378.1     2194.5    3003.6    3341.5    8917.8
$N = 8 + 8 + 7 + 8 = 31$
$H = \frac{12}{N(N + 1)} \sum \frac{R_i^2}{n_i} - 3(N + 1)$
$H = \frac{12}{31(31 + 1)} (8917.8) - 3(31 + 1) = 11.876$
Number of groups of tied ranks = m = 7
$\sum T = \sum (t_i^3 - t_i) = (2^3 - 2) + (3^3 - 3) + (3^3 - 3) + (4^3 - 4) + (3^3 - 3) + (2^3 - 2) + (3^3 - 3) = 168$
$C = 1 - \frac{\sum T}{N^3 - N} = 1 - \frac{168}{31^3 - 31} = 0.9944$
$H_c = H / C = 11.876 / 0.9944 = 11.943$
$\nu = k - 1 = 4 - 1 = 3$
$\chi^2_{0.05,\,3} = 7.815 < 11.943$, $0.005 < p < 0.01$, hence reject $H_0$ (Table B1)
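Nonparametric multiple comparisons: Dunn's test (Example 11.10, Zar 1999)
Dunn's test is a non-parametric multiple comparison test. It is used to identify which samples differ significantly (in their medians or means), based on their mean ranks.
Using Example 10.11: $\sum T = 168$
For $n_A = 8$ and $n_B = 8$,
$SE = \sqrt{\left[\frac{N(N + 1)}{12} - \frac{\sum T}{12(N - 1)}\right]\left[\frac{1}{n_A} + \frac{1}{n_B}\right]}$
$SE = \sqrt{\left[\frac{31(32)}{12} - \frac{168}{12(31 - 1)}\right]\left[\frac{1}{8} + \frac{1}{8}\right]} = 4.53$
For $n_A = 7$ and $n_B = 8$,
$SE = \sqrt{\left[\frac{31(32)}{12} - \frac{168}{12(31 - 1)}\right]\left[\frac{1}{7} + \frac{1}{8}\right]} = 4.69$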
Samples ranked by mean rank:   1        2        4        3
Rank sum:                      55       132.5    163.5    145
Sample size:                   8        8        8        7
Mean rank:                     6.88     16.56    20.44    20.71

Comparison   Difference (RB - RA)     SE      Q = Diff/SE   Q 0.05, 4
3 vs 1       20.71 - 6.88 = 13.83     4.69    2.95          2.639
3 vs 2       4.15                     4.69    0.88          2.639
3 vs 4       0.27                     4.69    0.06          2.639
4 vs 1       13.56                    4.53    2.99          2.639
4 vs 2       3.88                     4.53    0.86          2.639
2 vs 1       9.68                     4.53    2.14          2.639

The procedure is similar to Tukey's test.
In conclusion, water pH is the same in ponds 4 & 3 but is different in pond 1, and the relationship of pond 2 to the others is unclear (see Table B15 for critical Q values).
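The Dunn comparisons for this example can be reproduced as follows. This is a rough sketch under stated assumptions: the rank sums, sample sizes, tie sum and critical value Q 0.05, 4 = 2.639 are taken from the example, while the helper `dunn_q` and the `results` dictionary are hypothetical names. For simplicity it evaluates all six pairs rather than stepping through them in rank order.

```python
from itertools import combinations
from math import sqrt

N = 31            # total number of observations
TIE_SUM = 168     # sum of (t^3 - t) over the groups of tied ranks
Q_CRIT = 2.639    # Q 0.05, 4 as quoted in the example (Table B15)

rank_sums = {1: 55.0, 2: 132.5, 3: 145.0, 4: 163.5}
sizes = {1: 8, 2: 8, 3: 7, 4: 8}


def dunn_q(a, b):
    """Return (difference in mean ranks, SE, Q) for ponds a and b."""
    diff = abs(rank_sums[a] / sizes[a] - rank_sums[b] / sizes[b])
    se = sqrt((N * (N + 1) / 12 - TIE_SUM / (12 * (N - 1)))
              * (1 / sizes[a] + 1 / sizes[b]))
    return diff, se, diff / se


results = {}
for a, b in combinations(sorted(rank_sums), 2):
    diff, se, q = dunn_q(a, b)
    results[(a, b)] = (diff, se, q, q > Q_CRIT)
    verdict = "different" if q > Q_CRIT else "not shown to differ"
    print(f"pond {a} vs pond {b}: diff = {diff:.2f}, SE = {se:.2f}, "
          f"Q = {q:.2f} -> {verdict}")
```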
[Flowchart repeated from the start of the lecture, with "Other ANOVAs" added under the parametric tests and "Friedman" added under the non-parametric tests; part of the non-parametric branch is marked "Next lecture".]
Two-factor ANOVA
• 2-way ANOVA
• Can simultaneously assess the effects of two factors on a variable.
• Can also test for interaction between the factors, provided that each cell of the contingency table contains more than one observation (n > 1).
• Assumption: normal data with equal variances, but ANOVA is robust to departures from these assumptions (see p. 185-188, Zar 1999)
2-way ANOVA with equal replication
Example 12.1: The effects of sex and hormone treatment on plasma calcium concentrations (in mg/100 ml) of birds.

Control group           Hormone treatment
Female    Male          Female    Male
16.5      14.5          39.1      32.0
18.4      11.0          26.2      23.8
12.7      10.8          21.3      28.8
14.0      14.3          35.8      25.0
12.8      10.0          40.2      29.3
Questions:
• Is there a significant difference between the mean calcium
concentration of males and females?
• Is there a significant difference between the mean calcium
concentration in each treatment (control vs. hormone treatment)?
[Figure: mean and 95% C.I. of calcium concentration (mg/100 ml) for females and males in the control group and in the hormone treatment.]
F-Test Two-Sample for Variances
                       Variable 1    Variable 2
Mean                   32.52         27.78
Variance               69.717        11.182
Observations           5             5
df                     4             4
F                      6.234752
P(F<=f) one-tail       0.052034
F Critical one-tail    6.388233

Passed the Fmax test, indicating equal variances among the four groups.

[Figure: mean and 95% C.I. of calcium concentration (mg/100 ml) for females and males, control vs. hormone treated.]
Two factors:
- Sex
- Hormone
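The variance-ratio figures in the Excel output can be checked by hand. This is a minimal sketch; it assumes that "Variable 1" and "Variable 2" are the hormone-treated females and males, an inference from the matching means (32.52 and 27.78), not something the output states.

```python
from statistics import mean, variance

# Hormone-treated birds from Example 12.1 (assumed to be Variables 1 and 2)
female_hormone = [39.1, 26.2, 21.3, 35.8, 40.2]
male_hormone = [32.0, 23.8, 28.8, 25.0, 29.3]

var_female = variance(female_hormone)   # sample variance, n - 1 in the denominator
var_male = variance(male_hormone)

# Variance ratio: larger sample variance over the smaller one
f_ratio = max(var_female, var_male) / min(var_female, var_male)

print(f"means: {mean(female_hormone):.2f}, {mean(male_hormone):.2f}")
print(f"variances: {var_female:.3f}, {var_male:.3f}")
print(f"F = {f_ratio:.4f}")
```

The ratio is then compared with the one-tailed critical value for (4, 4) degrees of freedom quoted in the output.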
SS total = SS within cells + SS between A + SS between B + SS interaction
SS within cells = SS total – SS cells
SS interaction = SS cells – SS between A – SS between B
DF total = N – 1
DF cells (explained) = (nA)(nB) – 1
DF within cells (residual or error) = (nA)(nB)(n’ – 1)
where n’= no. of replicates within each cell
DF between A = nA – 1
DF between B = nB – 1
DF A × B interaction = (DF between A)(DF between B)
Sum of Xi
                       Female     Male      Treatment total
Control group          16.5       14.5
                       18.4       11.0
                       12.7       10.8
                       14.0       14.3
                       12.8       10.0
  Sum Xi               74.4       60.6      135.0
Hormone treatment      39.1       32.0
                       26.2       23.8
                       21.3       28.8
                       35.8       25.0
                       40.2       29.3
  Sum Xi               162.6      138.9     301.5
Sex total              237.0      199.5     436.5

Sum of Xi^2
                       Female     Male      Treatment total
Control group          272.3      210.3
                       338.6      121.0
                       161.3      116.6
                       196.0      204.5
                       163.8      100.0
  Sum Xi^2             1131.9     752.4     1884.3
Hormone treatment      1528.8     1024.0
                       686.4      566.4
                       453.7      829.4
                       1281.6     625.0
                       1616.0     858.5
  Sum Xi^2             5566.6     3903.4    9470.0
Sex total              6698.6     4655.8    11354.3
• SS total $= 11354.3 - (436.5)^2/20 = 1827.7$
  DF total $= 20 - 1 = 19$
• SS cells $= [(74.4)^2 + (60.6)^2 + (162.6)^2 + (138.9)^2]/5 - (436.5)^2/20 = 1461.3$
  DF cells $= (2)(2) - 1 = 3$
• SS within cells = SS total - SS cells $= 1827.7 - 1461.3 = 366.4$
  DF within cells $= (2)(2)(5 - 1) = 16$
• SS between treatments $= [(135.0)^2 + (301.5)^2]/[(2)(5)] - (436.5)^2/20 = 1386.1$
  DF between treatments $= 2 - 1 = 1$
• SS between sexes $= [(237.0)^2 + (199.5)^2]/[(2)(5)] - (436.5)^2/20 = 70.31$
  DF between sexes $= 2 - 1 = 1$
• SS interaction = SS cells - SS between A - SS between B $= 1461.3 - 1386.1 - 70.31 = 4.900$
  DF interaction $= (1)(1) = 1$
Equations: See p. 242 (Zar, 1999)
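The sums of squares above can be recomputed directly from the raw data. This is a minimal sketch, assuming a balanced design and a Model I (fixed-effects) analysis in which every F ratio uses the within-cells mean square as its denominator; the function `two_way_anova` and the dictionary layout are illustrative choices, not Zar's notation.

```python
# Example 12.1 data: cells[(treatment, sex)] -> replicate values
cells = {
    ("control", "female"): [16.5, 18.4, 12.7, 14.0, 12.8],
    ("control", "male"):   [14.5, 11.0, 10.8, 14.3, 10.0],
    ("hormone", "female"): [39.1, 26.2, 21.3, 35.8, 40.2],
    ("hormone", "male"):   [32.0, 23.8, 28.8, 25.0, 29.3],
}


def two_way_anova(cells):
    """Balanced two-way ANOVA with replication (fixed-effects F ratios)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n_rep = len(next(iter(cells.values())))      # replicates per cell
    values = [x for v in cells.values() for x in v]
    cf = sum(values) ** 2 / len(values)          # correction term (sum X)^2 / N

    ss_total = sum(x * x for x in values) - cf
    ss_cells = sum(sum(v) ** 2 for v in cells.values()) / n_rep - cf
    ss_a = sum(sum(sum(cells[(a, b)]) for b in b_levels) ** 2
               for a in a_levels) / (len(b_levels) * n_rep) - cf
    ss_b = sum(sum(sum(cells[(a, b)]) for a in a_levels) ** 2
               for b in b_levels) / (len(a_levels) * n_rep) - cf
    ss_ab = ss_cells - ss_a - ss_b
    ss_within = ss_total - ss_cells

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_within = len(a_levels) * len(b_levels) * (n_rep - 1)
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total, "ss_cells": ss_cells, "ss_a": ss_a,
        "ss_b": ss_b, "ss_ab": ss_ab, "ss_within": ss_within,
        "f_a": (ss_a / df_a) / ms_within,
        "f_b": (ss_b / df_b) / ms_within,
        "f_ab": (ss_ab / (df_a * df_b)) / ms_within,
    }


table = two_way_anova(cells)
for name, value in table.items():
    print(f"{name:10s} {value:10.3f}")
```

Here factor A is the hormone treatment and factor B is sex, so `f_a`, `f_b` and `f_ab` correspond to the hormone, sex and interaction F ratios.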
Analysis of Variance Summary Table

Source of variance      SS        DF    MS = SS/DF    F        F critical, 0.05(1), 1, 16    P
Total                   1827.7    19
Cells                   1461.3    3
  Hormone               1386.1    1     1386.10       60.53    4.49                          < 0.001
  Sex                   70.31     1     70.31         3.07     4.49                          > 0.05
  Hormone x Sex         4.900     1     4.90          0.21     4.49                          > 0.05
Within cells (error)    366.4     16    22.90
• There was a significant effect of hormone treatment on plasma calcium concentrations in the birds (P < 0.001).
• There was no significant interaction between sex and hormone treatment, and the sex effect was not significant (likely due to inadequate power).
[Figure: calcium concentration (mg/100 ml) of females and males, control vs. hormone treated.]
The Tukey test is carried out in the same way as for one-way ANOVA.
Output from Excel
ANOVA
Source of Variation    SS          df    MS          F         P-value    F crit
Sample                 1386.113    1     1386.113    60.534    <0.001     4.494
Columns                70.313      1     70.313      3.071     0.099      4.494
Interaction            4.901       1     4.901       0.214     0.650      4.494
Within                 366.372     16    22.898
Total                  1827.698    19
[Figure: schematic plots of [Ca] for females and males, control vs. hormone treated, illustrating different combinations of sex, hormone and interaction effects.]
Interactive effects between variables: (a) no interaction; (b) interaction.
[Figure: growth rate (g/day) of groups A, B and C in spring, summer, autumn and winter; panel (a) shows no interaction, panel (b) shows an interaction.]
An example: The effects of light and sex on food intake in starlings.
Total food intake (g) for 7 days

Day-length     Female    Male
Long (16h)     78.1      69.5
               75.5      72.1
               76.3      73.2
               81.2      71.1
Short (8h)     82.4      72.3
               80.9      73.3
               83.0      70.0
               88.2      72.9

[Figure: total food intake (g) of females and males under long and short day-length.]
ANOVA for the starling example
Source of Variation               SS        df    MS        F         P-value    F crit
Day-length                        42.25     1     42.25     8.001     0.015      4.747
Sex                               316.84    1     316.84    59.998    < 0.001    4.747
Interaction (Day-length x Sex)    27.04     1     27.04     5.120     0.043      4.747
Within                            63.37     12    5.28
Total                             449.5     15

• The two sexes have different food intake levels (p < 0.001).
• A significant interaction (p < 0.05) indicates that the two sexes respond significantly differently to day-length in the amount of food they eat.

[Figure: total food intake (g) of females and males under long and short day-length.]
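One way to see what the interaction term measures in this 2 x 2 example is to compare how each sex responds to day-length. The sketch below is illustrative only: the variable names are assumptions, and the identity SS interaction = n (difference of differences)^2 / 4 holds for a balanced 2 x 2 design with n replicates per cell.

```python
from statistics import mean

# Total food intake (g) over 7 days: intake[(day_length, sex)]
intake = {
    ("long", "female"):  [78.1, 75.5, 76.3, 81.2],
    ("short", "female"): [82.4, 80.9, 83.0, 88.2],
    ("long", "male"):    [69.5, 72.1, 73.2, 71.1],
    ("short", "male"):   [72.3, 73.3, 70.0, 72.9],
}
n_rep = 4  # replicates per cell

cell_mean = {key: mean(values) for key, values in intake.items()}

# Response to day-length (short minus long) within each sex
female_response = cell_mean[("short", "female")] - cell_mean[("long", "female")]
male_response = cell_mean[("short", "male")] - cell_mean[("long", "male")]

# No interaction would mean equal responses, i.e. parallel lines in the plot
difference_of_differences = female_response - male_response

# Balanced 2 x 2 design: interaction SS from the single contrast
ss_interaction = n_rep * difference_of_differences ** 2 / 4

print(f"female response: {female_response:.2f} g")
print(f"male response:   {male_response:.2f} g")
print(f"SS interaction:  {ss_interaction:.2f}")
```

Females increase their intake far more than males under short day-length, which is the non-parallel pattern the interaction F test detects.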
Computation of the F statistics for tests of significance in 2-way ANOVA with replicates

Hypothesized effect    Model I                  Model II                 Model III (mixed model)
                       (both A & B              (both A & B              (factor A fixed,
                       fixed factors)           random factors)          factor B random)
Factor A               factor A MS / error MS   factor A MS / A x B MS   factor A MS / A x B MS
Factor B               factor B MS / error MS   factor B MS / A x B MS   factor B MS / error MS
Interaction A x B      A x B MS / error MS      A x B MS / error MS      A x B MS / error MS
Replication patterns in 2-way designs (each X is one observation)

(a) Equal replication per cell
                       Factor A
                       Level 1    Level 2    Level 3    Total
Factor B   Level 1     XXX        XXX        XXX        9
           Level 2     XXX        XXX        XXX        9
           Level 3     XXX        XXX        XXX        9
           Level 4     XXX        XXX        XXX        9
           Total       12         12         12         N = 36

(b) Equal replication within rows; proportional replication within columns
                       Factor A
                       Level 1    Level 2    Level 3    Level 4    Total
Factor B   Level 1     XXX        XXX        XXX        XXX        12
           Level 2     XXXX       XXXX       XXXX       XXXX       16
           Level 3     XX         XX         XX         XX         8
           Total       9          9          9          9          N = 36

(c) Proportional replication within rows and within columns
           XXX         XXXXXX      XXXXXXXXX       XXXXXX      24
           XXXX        XXXXXXXX    XXXXXXXXXXXX    XXXXXXXX    32
           XX          XXXX        XXXXXX          XXXX        16
           9           18          27              18          N = 72

(d) Disproportional replication
           Cell sizes vary between 2 and 4 observations with no constant proportion among rows or columns (N = 36).

(e) No replication
           X           X           X           3
           X           X           X           3
           X           X           X           3
           X           X           X           3
           4           4           4           N = 12
           Cannot test for the interaction effect!
2-way ANOVA for data without replication

               Farm
Fertilizer     1       2       3       4
A              1130    1115    1145    1200
B              1125    1120    1170    1230
C              1350    1375    1235    1140
D              1375    1200    1175    1325
E              1225    1250    1225    1275
F              1235    1200    1155    1215

ANOVA
Source of Variation    SS           df    MS          F           P-value     F crit
Farm                   11070.83     3     3690.278    0.796702    0.51467     3.287383
Fertilizer blend       59762.5      5     11952.5     2.58045     0.070826    2.901295
Error                  69479.17     15    4631.944
Total                  140312.5     23

Interaction cannot be measured where each cell of a contingency table consists of a single observation. Variability due to interaction is combined with the within-cell variability, and the interaction is assumed to be negligible.
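The ANOVA table for the fertilizer example can be reproduced with a few lines of arithmetic. This is a minimal sketch; the dictionary `yields` and the other variable names are illustrative. With one observation per cell, the remainder after removing the two main effects serves as the error term, which is why no interaction can be separated out.

```python
# Values for fertilizer blends A-F (rows) on farms 1-4 (columns)
yields = {
    "A": [1130, 1115, 1145, 1200],
    "B": [1125, 1120, 1170, 1230],
    "C": [1350, 1375, 1235, 1140],
    "D": [1375, 1200, 1175, 1325],
    "E": [1225, 1250, 1225, 1275],
    "F": [1235, 1200, 1155, 1215],
}

rows = list(yields.values())
n_fert, n_farm = len(rows), len(rows[0])
cf = sum(map(sum, rows)) ** 2 / (n_fert * n_farm)   # (sum X)^2 / N

ss_total = sum(x * x for row in rows for x in row) - cf
ss_fert = sum(sum(row) ** 2 for row in rows) / n_farm - cf
ss_farm = sum(sum(col) ** 2 for col in zip(*rows)) / n_fert - cf

# No replication: what is left over is used as the error sum of squares
ss_error = ss_total - ss_fert - ss_farm
df_error = (n_fert - 1) * (n_farm - 1)
ms_error = ss_error / df_error

f_fert = (ss_fert / (n_fert - 1)) / ms_error
f_farm = (ss_farm / (n_farm - 1)) / ms_error

print(f"SS farm = {ss_farm:.2f}, SS fertilizer = {ss_fert:.2f}, "
      f"SS error = {ss_error:.2f}, SS total = {ss_total:.2f}")
print(f"F farm = {f_farm:.4f}, F fertilizer = {f_fert:.4f}")
```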
Two other experimental designs suitable for 2-way ANOVA

The randomized block
See Example 12.4 (Zar, 1999)
Each block contains 4 animals, one assigned to each diet:

Block 1    Block 2    Block 3    Block 4    Block 5
Diet 3     Diet 1     Diet 3     Diet 4     Diet 1
Diet 4     Diet 3     Diet 2     Diet 2     Diet 4
Diet 1     Diet 2     Diet 4     Diet 1     Diet 3
Diet 2     Diet 4     Diet 1     Diet 3     Diet 2

Other example: different colours of buckets (water traps) to sample insects

Repeated-measures
See Example 12.5 (Zar, 1999)
e.g. effects of diet type on food wastage in fish farm

           Cage
           1      2      3      4      5
Diet 1     X11    X12    X13    X14    X15
Diet 2     X21    X22    X23    X24    X25
Diet 3     X31    X32    X33    X34    X35
Diet 4     X41    X42    X43    X44    X45

Equivalent non-parametric method: Friedman's analysis of variance by ranks (see p. 263-266, Zar 1999)
Use SPSS to conduct a 2-way ANOVA
Enter the data in three columns, one row per observation:

Column 1                Column 2    Column 3
(Dependent variable)    (Factor A)  (Factor B)
obs. 1                  1           1
obs. 2                  1           2
obs. 3                  1           3
……..                    ……..        ……..
obs. i - 2              2           1
obs. i - 1              2           2
obs. i                  2           3
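The same one-row-per-observation layout can be produced from a cell-by-cell table before importing it into a statistics package. The sketch below is an illustration only: it uses the Example 12.1 data, and the column names `calcium`, `treatment` and `sex` and the numeric codes are hypothetical choices, not anything SPSS requires.

```python
import csv
import io

# Example 12.1 data arranged by cell
cells = {
    ("control", "female"): [16.5, 18.4, 12.7, 14.0, 12.8],
    ("control", "male"):   [14.5, 11.0, 10.8, 14.3, 10.0],
    ("hormone", "female"): [39.1, 26.2, 21.3, 35.8, 40.2],
    ("hormone", "male"):   [32.0, 23.8, 28.8, 25.0, 29.3],
}

# Numeric codes for the factor levels (arbitrary labels)
treatment_code = {"control": 1, "hormone": 2}
sex_code = {"female": 1, "male": 2}

# One row per observation: dependent variable, factor A code, factor B code
rows = [
    (value, treatment_code[treatment], sex_code[sex])
    for (treatment, sex), values in cells.items()
    for value in values
]

# Write the long-format table as CSV text
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["calcium", "treatment", "sex"])
writer.writerows(rows)
print(buffer.getvalue())
```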
Key notes
• After performing a Kruskal-Wallis test, a Dunn's test can be used to identify any significantly different medians (or means) based on ranking.
• Two-way ANOVA can be used to analyze samples which have been subjected to two factors (two types of treatment) at the same time.
• In two-way ANOVA, there are several different designs: (Model 1) both factors A and B are fixed factors; (Model 2) both factors are random factors; (Model 3) mixed factors. Furthermore, two-way ANOVA can also be applied to data with randomized block or repeated-measures designs, as well as to data without replication.
• In two-way ANOVA, interaction cannot be tested where each cell of a contingency table consists of a single observation.