Chapter 4 analysis of variance (ANOVA)

Section 1 The basic idea and conditions of application
Objective: to infer and compare the population means of several (two or more) groups.
Method: analysis of variance (ANOVA), i.e. the F test for comparing several sample means.
Basic idea: according to the type of design, the total sum of squares of deviations from the mean (SS) and the total degrees of freedom (df) are partitioned into two or more components. Apart from random error, the variation of each component can be explained by one or more factors.
Conditions of application:
Population: normal distribution and homogeneity of variance.
Sample: independent and random.
Types of design:
The ANOVA of completely random design;
The ANOVA of randomized block design;
The ANOVA of Latin square design;
The ANOVA of cross-over design.
The basic idea of ANOVA of completely random design
Table 4-1 The results of g treatment groups

Group      Measured values                        Statistics
Level 1    X11   X12   …   X1j   …   X1n1         n1   \bar{X}_1   S1
Level 2    X21   X22   …   X2j   …   X2n2         n2   \bar{X}_2   S2
…          …     …     …   …     …   …            …    …           …
Level g    Xg1   Xg2   …   Xgj   …   Xgng         ng   \bar{X}_g   Sg
Total                                             N    \bar{X}     S
Partition of variation
Sum of squares of deviations from the mean, SS:
$$SS = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar{X})^2 = \sum_{i,j}^{N}(X-\bar{X})^2 = (N-1)S^2$$
1. Total variation: the degree of variation of all observed values. The formula is as follows:
$$SS_{total} = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar{X})^2 = \sum_{i=1}^{g}\sum_{j=1}^{n_i}X_{ij}^2 - C = \sum_{i,j}^{N}X_{ij}^2 - C, \quad \nu_{total} = N-1$$
Correction factor:
$$C = \frac{\left(\sum_{i=1}^{g}\sum_{j=1}^{n_i}X_{ij}\right)^2}{N} = \frac{\left(\sum_{i,j}^{N}X_{ij}\right)^2}{N}$$
2. Between-group variation: the sum of squares of deviations of the group means from the grand mean, which reflects the effects of treatment and random error. The formula:
$$SS_{bg} = \sum_{i=1}^{g}n_i(\bar{X}_i-\bar{X})^2 = \sum_{i=1}^{g}\frac{\left(\sum_{j=1}^{n_i}X_{ij}\right)^2}{n_i} - C, \quad \nu_{bg} = g-1$$
3. Within-group variation: the differences among the values within each group. The formula is as follows:
$$SS_{wg} = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)^2, \quad \nu_{wg} = N-g$$
The relation among the three variations:
$$SS_{total} = SS_{bg} + SS_{wg}$$
$$\nu_{total} = \nu_{bg} + \nu_{wg}$$
Mean square, MS:
$$MS_{bg} = \frac{SS_{bg}}{\nu_{bg}}, \quad MS_{wg} = \frac{SS_{wg}}{\nu_{wg}}$$
Test statistic:
$$F = \frac{MS_{bg}}{MS_{wg}}, \quad \nu_1 = \nu_{bg}, \ \nu_2 = \nu_{wg}$$
• If $\mu_1 = \mu_2 = \cdots = \mu_g$, then $MS_{bg}$ and $MS_{wg}$ are both estimates of the random error variance $\sigma^2$, and the F value should be close to 1.
• If $\mu_1, \mu_2, \ldots, \mu_g$ are not all equal, the F value will be larger than 1.
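The partition described in this section can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `one_way_anova` and the toy data are hypothetical, and the sums of squares are computed directly from their definitions so that the identity SS_total = SS_bg + SS_wg can be checked rather than assumed.

```python
def one_way_anova(groups):
    """Partition the variation of a completely random design.

    groups: list of lists, one inner list of observations per treatment level.
    (Hypothetical helper written for illustration only.)
    """
    values = [x for grp in groups for x in grp]
    n_total, g = len(values), len(groups)
    grand_mean = sum(values) / n_total

    # Sums of squares computed from their definitions
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_bg = sum(len(grp) * (sum(grp) / len(grp) - grand_mean) ** 2 for grp in groups)
    ss_wg = sum((x - sum(grp) / len(grp)) ** 2 for grp in groups for x in grp)

    # Degrees of freedom and mean squares
    df_bg, df_wg = g - 1, n_total - g
    ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg
    return {
        "SS_total": ss_total, "SS_bg": ss_bg, "SS_wg": ss_wg,
        "df_bg": df_bg, "df_wg": df_wg,
        "MS_bg": ms_bg, "MS_wg": ms_wg, "F": ms_bg / ms_wg,
    }


# Hypothetical toy data: three levels with three observations each
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(toy)
print(result)
```

For the toy data the group means differ clearly, so F is well above 1, in line with the reasoning about the test statistic.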
Section 2 The ANOVA of completely random design
Completely random design
All subjects are randomly allocated to g groups (levels), and each group receives a different treatment. The effects of the treatments are inferred by comparing the group means after the experiment.
Example 4-1 A doctor wants to explore the clinical effect of a new medicine for reducing blood fat and selects 120 patients according to the same criteria. The patients are to be divided into 4 groups by completely random design. How should he divide them into groups?
The method of dividing groups in completely random design
1. Serial number: the 120 patients are numbered from 1 to 120 (Table 4-2, row 1);
2. Choosing random numbers: start from any row and any column of the random number table in Appendix 15 (for example, from the fifth row and seventh column), read three digits at a time as one random number, and write it down under each serial number (Table 4-2, row 2);
3. Ranking: rank the random numbers from small to large (equal numbers are ranked in order of appearance) (Table 4-2, row 3);
4. Rule defined in advance: ranks 1-30 are assigned to group A, 31-60 to group B, 61-90 to group C, and 91-120 to group D (Table 4-2, row 4).
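The four steps above can be sketched as a short Python routine. This is only an illustration under stated assumptions: the slides read the random numbers from a printed table, whereas this sketch draws them from Python's pseudo-random generator with an arbitrary seed, and the function name `allocate` is hypothetical.

```python
import random


def allocate(n_subjects, groups, rng):
    """Completely random allocation by ranking random numbers.

    n_subjects: number of subjects, numbered 1..n_subjects in order
    groups:     sequence of group labels, e.g. "ABCD"
    rng:        a random.Random instance (stands in for the printed table)
    """
    if n_subjects % len(groups) != 0:
        raise ValueError("n_subjects must be divisible by the number of groups")
    # Step 2: one three-digit random number per subject
    numbers = [rng.randint(0, 999) for _ in range(n_subjects)]
    # Step 3: rank from small to large; ties go to the earlier serial number
    order = sorted(range(n_subjects), key=lambda i: (numbers[i], i))
    rank = [0] * n_subjects
    for r, i in enumerate(order, start=1):
        rank[i] = r
    # Step 4: ranks 1..size -> first group, size+1..2*size -> second group, ...
    size = n_subjects // len(groups)
    return [groups[(rank[i] - 1) // size] for i in range(n_subjects)]


rng = random.Random(2024)  # arbitrary seed, chosen only for reproducibility
assignment = allocate(120, "ABCD", rng)
print(assignment[:10])
```

Because the groups are defined by rank intervals of equal width, every group receives exactly 30 patients regardless of which random numbers are drawn.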
Table 4-2 The grouping result of completely random design

Serial number     1    2    3    4    5    6    7    8    9    10   …   119  120
Random number     260  873  373  204  056  930  160  905  886  958  …   220  634
Rank              24   106  39   15   3    114  13   109  108  117  …   16   75
Grouping result   A    D    B    A    A    D    A    D    D    D    …   A    C
(2) The choice of statistical methods
1. If the data follow a normal distribution with homogeneous variances, one-way ANOVA is used (or the independent t test when g = 2);
2. If the data are not normally distributed or the variances are heterogeneous, a data transformation or the Wilcoxon rank sum test can be used.
Partition of variation
Table 4-4 The ANOVA of completely random design

Source of variation   df     SS                                      MS               F
Total                 N-1    \sum_i \sum_j X_{ij}^2 - C
Between-group         g-1    \sum_i (\sum_j X_{ij})^2 / n_i - C      SS_bg / ν_bg     MS_bg / MS_wg
Within-group          N-g    SS_total - SS_bg                        SS_wg / ν_wg
Example 4-2 A doctor wanted to explore the clinical effect of a new medicine for reducing blood fat and selected 120 patients according to the same criteria. He divided the patients into 4 groups by completely random design. Low-density lipoprotein was measured after 6 weeks of a double-blind trial (Table 4-3). Is there a difference among the population means of low-density lipoprotein of the 4 groups?
Table 4-3 The low-density lipoprotein values of 4 treatment groups (mmol/L)

Placebo group
  3.53 4.59 4.34 2.66 3.59 3.13 2.64 2.56 3.50 3.25
  3.30 4.04 3.53 3.56 3.85 4.07 3.52 3.93 4.19 2.96
  1.37 3.93 2.33 2.98 4.00 3.55 2.96 4.30 4.16 2.59
  n = 30, \bar{X}_i = 3.43, \sum X = 102.91, \sum X^2 = 367.85

New medicine, 2.4 g
  2.42 3.36 4.32 2.34 2.68 2.95 1.56 3.11 1.81 1.77
  1.98 2.63 2.86 2.93 2.17 2.72 2.65 2.22 2.90 2.97
  2.36 2.56 2.52 2.27 2.98 3.72 2.80 3.57 4.02 2.31
  n = 30, \bar{X}_i = 2.72, \sum X = 81.46, \sum X^2 = 233.00

New medicine, 4.8 g
  2.86 2.28 2.39 2.28 2.48 2.28 3.21 2.23 2.32 2.68
  2.66 2.32 2.61 3.64 2.58 3.65 2.66 3.68 2.65 3.02
  3.48 2.42 2.41 2.66 3.29 2.70 3.04 2.81 1.97 1.68
  n = 30, \bar{X}_i = 2.70, \sum X = 80.94, \sum X^2 = 225.54

New medicine, 7.2 g
  0.89 1.06 1.08 1.27 1.63 1.89 1.19 2.17 2.28 1.72
  1.98 1.74 2.16 3.37 2.97 1.69 0.94 2.11 2.81 2.52
  1.31 2.51 1.88 1.41 3.19 1.92 2.47 1.02 2.10 3.71
  n = 30, \bar{X}_i = 1.97, \sum X = 58.99, \sum X^2 = 132.13
(3) Steps of analysis
1. State the hypotheses and the test criterion
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$, i.e. the population means of the 4 groups are all equal.
H1: not all of the 4 population means are equal.
$\alpha = 0.05$
The sums of squares SS, degrees of freedom $\nu$, mean squares MS and the F value are calculated with the formulas in Table 4-4.
2. Calculate the test statistic
$$\sum\sum X_{ij} = 102.91 + 81.46 + 80.94 + 58.99 = 324.30$$
$$\sum\sum X_{ij}^2 = 367.85 + 233.00 + 225.54 + 132.13 = 958.52$$
$$C = (324.30)^2 / 120 = 876.42$$
$$SS_{total} = 958.52 - 876.42 = 82.10, \quad \nu_{total} = 120 - 1 = 119$$
$$SS_{bg} = \frac{(102.91)^2}{30} + \frac{(81.46)^2}{30} + \frac{(80.94)^2}{30} + \frac{(58.99)^2}{30} - 876.42 = 32.16, \quad \nu_{bg} = 4 - 1 = 3$$
$$SS_{wg} = 82.10 - 32.16 = 49.94, \quad \nu_{wg} = 120 - 4 = 116$$
$$MS_{bg} = \frac{32.16}{3} = 10.72, \quad MS_{wg} = \frac{49.94}{116} = 0.43$$
$$F = \frac{10.72}{0.43} = 24.93$$
List the ANOVA table:
Table 4-5 The ANOVA table of completely random design

Source of variation   df     SS      MS      F       P
Total                 119    82.10
Between-group         3      32.16   10.72   24.93   <0.01
Within-group          116    49.94   0.43
3. Determine the P value and draw a conclusion
At the $\alpha = 0.05$ level, reject H0 and accept H1: not all of the 4 population means are equal, i.e. different doses of the medicine have different effects on LDL-C.
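The calculation of Example 4-2 can be retraced from the group summaries in Table 4-3 alone. The sketch below is an illustration in Python (the variable names are my own, not from the slides). Note that it carries full precision through every step and therefore gives F of about 24.9, whereas the slide's 24.93 comes from dividing the rounded mean squares 10.72 and 0.43.

```python
# Group summaries from Table 4-3: placebo, 2.4 g, 4.8 g, 7.2 g
sums = [102.91, 81.46, 80.94, 58.99]         # sum of X per group
sums_sq = [367.85, 233.00, 225.54, 132.13]   # sum of X^2 per group
sizes = [30, 30, 30, 30]                     # n_i per group

n_total = sum(sizes)
g = len(sizes)

c = sum(sums) ** 2 / n_total                 # correction factor C
ss_total = sum(sums_sq) - c
ss_bg = sum(s * s / n for s, n in zip(sums, sizes)) - c
ss_wg = ss_total - ss_bg

df_bg, df_wg = g - 1, n_total - g
ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg
f_value = ms_bg / ms_wg

print(f"C = {c:.2f}")
print(f"SS_total = {ss_total:.2f}, SS_bg = {ss_bg:.2f}, SS_wg = {ss_wg:.2f}")
print(f"MS_bg = {ms_bg:.2f}, MS_wg = {ms_wg:.2f}, F = {f_value:.2f}")
```

Either way, F is far above the critical value for (3, 116) degrees of freedom, so the conclusion is unchanged.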
Attention:
Rejecting H0 and accepting H1 in ANOVA does not mean that every pair of population means differs. To find out which groups differ significantly, multiple comparisons among the population means are needed (Section 6). When g = 2, the ANOVA of completely random design is equivalent to the independent t test, i.e. $t = \sqrt{F}$.
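The equivalence for g = 2 can be checked numerically. The sketch below uses hypothetical toy data (not from the slides) and computes the pooled-variance t statistic and the one-way ANOVA F statistic side by side; the square of t should equal F.

```python
from math import sqrt

# Hypothetical toy data for two groups
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]

mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
ss_a = sum((x - mean_a) ** 2 for x in a)
ss_b = sum((x - mean_b) ** 2 for x in b)

# Independent t test with pooled variance
pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
t = (mean_a - mean_b) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

# One-way ANOVA for the same two groups (between-group df = 1)
grand_mean = (sum(a) + sum(b)) / (len(a) + len(b))
ms_bg = len(a) * (mean_a - grand_mean) ** 2 + len(b) * (mean_b - grand_mean) ** 2
ms_wg = pooled_var  # the within-group mean square is the pooled variance
f = ms_bg / ms_wg

print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

The sign of t depends on the order of the groups, which is why the relation is stated through the square (or the absolute value of t).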
Section 3 The ANOVA of randomized block design
randomized block design
(1) Grouping method of randomized block design:
First, match the subjects into blocks according to the non-treatment factors that affect the experimental result (such as sex, weight, age, occupation, state of illness, course of disease, etc.). Second, the subjects within each block are randomly allocated to the treatment groups or the control group.
(2) Characteristics of randomized block design
• Random allocation is repeated within each block, so the number of subjects is the same in every treatment group.
• The SS of block variation is separated from the within-group SS of the completely random design; the error SS is thereby reduced and the power of the test is increased.
Example 4-3 How should 15 white mice in 5 blocks be allocated to three treatment groups?
Grouping method: first, number the mice by weight and match every 3 mice of similar weight as a block (Table 4-6). Second, read two digits at a time as one random number from any row and any column of the random number table, for example starting from the 8th row and third column (Table 4-6), and rank the random numbers from small to large within every block. The mice ranked 1, 2 and 3 in each block receive treatments A, B and C, respectively (Table 4-6).
Table 4-6 The allocation result of white mice in 5 blocks

Block           1            2            3            4            5
White mouse     1   2   3    4   5   6    7   8   9    10  11  12   13  14  15
Random number   68  35  26   00  99  53   93  61  28   52  70  05   48  34  56
Rank            3   2   1    1   3   2    3   2   1    2   3   1    2   1   3
Treatment       C   B   A    A   C   B    C   B   A    B   C   A    B   A   C
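The allocation in Table 4-6 can be reproduced from its random numbers with a short Python sketch. The code is an illustration only; it takes the 15 two-digit numbers of the table as given, ranks them within each block of three, and maps ranks 1, 2, 3 to treatments A, B, C. Breaking ties by position is an assumption of this sketch, since the table contains no ties within a block.

```python
# Random numbers from Table 4-6, in mouse order 1..15 ("00" and "05" as integers)
random_numbers = [68, 35, 26, 0, 99, 53, 93, 61, 28, 52, 70, 5, 48, 34, 56]
block_size = 3
treatments = "ABC"

assignment = []
for start in range(0, len(random_numbers), block_size):
    block = random_numbers[start:start + block_size]
    # Positions within the block sorted by random number (ties by position)
    order = sorted(range(block_size), key=lambda k: (block[k], k))
    ranks = [0] * block_size
    for r, k in enumerate(order):
        ranks[k] = r  # zero-based rank of the mouse at position k
    assignment.extend(treatments[r] for r in ranks)

print(" ".join(assignment))
```

Each block ends up with exactly one mouse per treatment, which is the defining property of the randomized block design.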
Table 4-7 The results of randomized block design

Block     Treatment (g levels)
number    1      2      3      …     g
1         X11    X21    X31    …     Xg1
2         X12    X22    X32    …     Xg2
…         …      …      …      …     …
j         X1j    X2j    X3j    …     Xgj
…         …      …      …      …     …
n         X1n    X2n    X3n    …     Xgn
Partition of variation
(1) Total variation: $SS_{total}$.
(2) Treatment variation: $SS_{treatment}$.
(3) Block variation: $SS_{block}$.
(4) Error variation: $SS_{error}$.
$$SS_{total} = SS_{treatment} + SS_{block} + SS_{error}$$
$$\nu_{total} = \nu_{treatment} + \nu_{block} + \nu_{error}$$
Table 4-8 The ANOVA of randomized block design

Source of variation   df            SS                                        MS                             F
Total                 N-1           \sum_i \sum_j X_{ij}^2 - C
Treatment             g-1           (1/n) \sum_i (\sum_j X_{ij})^2 - C        SS_treatment / ν_treatment     MS_treatment / MS_error
Block                 n-1           (1/g) \sum_j (\sum_i X_{ij})^2 - C        SS_block / ν_block             MS_block / MS_error
Error                 (n-1)(g-1)    SS_total - SS_treatment - SS_block        SS_error / ν_error
Steps of analysis
Example 4-4 Fifteen mice were divided into 5 blocks by weight, with 3 mice in every block. The results are shown in Table 4-9. Is there a difference among the 3 treatment groups?
Table 4-9 The observed values of the different treatment groups (g)

Block                 A        B        C        \sum_i X_{ij} (block total)
1                     0.82     0.65     0.51     1.98
2                     0.73     0.54     0.23     1.50
3                     0.43     0.34     0.28     1.05
4                     0.41     0.21     0.31     0.93
5                     0.68     0.43     0.24     1.35
\sum_j X_{ij}         3.07     2.17     1.57     6.81     (\sum X)
\bar{X}_i             0.614    0.434    0.314    0.454    (\bar{X})
\sum_j X_{ij}^2       2.0207   1.0587   0.5451   3.6245   (\sum X^2)
H0: $\mu_1 = \mu_2 = \mu_3$
H1: not all of the population means are equal
$\alpha = 0.05$
$$C = \left(\sum_{i=1}^{g}\sum_{j=1}^{n}X_{ij}\right)^2 / N = (6.81)^2 / 15 = 3.0917$$
$$SS_{total} = \sum_{i=1}^{g}\sum_{j=1}^{n}X_{ij}^2 - C = 3.6245 - 3.0917 = 0.5328$$
$$SS_{treatment} = \frac{1}{n}\sum_{i=1}^{g}\left(\sum_{j=1}^{n}X_{ij}\right)^2 - C = \frac{1}{5}(3.07^2 + 2.17^2 + 1.57^2) - 3.0917 = 0.2280$$
$$SS_{block} = \frac{1}{g}\sum_{j=1}^{n}\left(\sum_{i=1}^{g}X_{ij}\right)^2 - C = \frac{1}{3}(1.98^2 + 1.50^2 + 1.05^2 + 0.93^2 + 1.35^2) - 3.0917 = 0.2284$$
Table 4-10 The ANOVA table of Example 4-4

Source of variation   df    SS       MS       F       P
Total                 14    0.5328
Treatment             2     0.2280   0.1140   11.88   <0.01
Block                 4     0.2284   0.0571   5.95    <0.05
Error                 8     0.0764   0.0096
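The ANOVA table of Example 4-4 can be recomputed from the raw values of Table 4-9. The following Python sketch is an illustration (variable names are my own). It keeps full precision throughout, so the F values come out slightly higher (about 11.9 and 6.0) than the slide's 11.88 and 5.95, which are obtained with the rounded error mean square 0.0096.

```python
# Rows are blocks 1..5; columns are treatments A, B, C (Table 4-9)
data = [
    [0.82, 0.65, 0.51],
    [0.73, 0.54, 0.23],
    [0.43, 0.34, 0.28],
    [0.41, 0.21, 0.31],
    [0.68, 0.43, 0.24],
]
n = len(data)        # number of blocks
g = len(data[0])     # number of treatments
n_total = n * g

c = sum(sum(row) for row in data) ** 2 / n_total           # correction factor C
ss_total = sum(x * x for row in data for x in row) - c

treatment_totals = [sum(row[i] for row in data) for i in range(g)]
block_totals = [sum(row) for row in data]
ss_treatment = sum(t * t for t in treatment_totals) / n - c
ss_block = sum(b * b for b in block_totals) / g - c
ss_error = ss_total - ss_treatment - ss_block

df_treatment, df_block, df_error = g - 1, n - 1, (n - 1) * (g - 1)
ms_error = ss_error / df_error
f_treatment = (ss_treatment / df_treatment) / ms_error
f_block = (ss_block / df_block) / ms_error

print(f"SS: total={ss_total:.4f} treatment={ss_treatment:.4f} "
      f"block={ss_block:.4f} error={ss_error:.4f}")
print(f"F_treatment = {f_treatment:.2f}, F_block = {f_block:.2f}")
```

The difference from the slide is a rounding effect only and does not change the comparison with the tabulated critical values.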
With $\nu_1 = 2$ and $\nu_2 = 8$, the F table gives $F_{0.05(2,8)} = 4.46$ and $F_{0.01(2,8)} = 8.65$. Since $F = 11.88 > F_{0.01(2,8)}$, $P < 0.01$.
At the $\alpha = 0.05$ level, reject H0 and accept H1: not all of the population means are equal.
Section 6 Multiple comparison
Can the above example be analyzed by repeated t tests?
Number of t tests: $C_4^2 = 6$
With $\alpha = 0.05$, the probability of not making a type I error in one comparison is 1 - 0.05 = 0.95;
the probability of not making a type I error in any of the 6 comparisons (treated as independent) is $0.95^6 = 0.74$;
the probability of at least one type I error in the 6 comparisons is 1 - 0.74 = 0.26.
The probability of a type I error is therefore inflated.
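The inflation of the type I error can be computed for any number of groups. A minimal Python sketch, assuming (as the calculation above does) that the pairwise comparisons are independent:

```python
from math import comb

alpha = 0.05
g = 4                                   # number of groups, as in Example 4-2
m = comb(g, 2)                          # number of pairwise t tests
no_error_all = (1 - alpha) ** m         # no type I error in any comparison
familywise_error = 1 - no_error_all     # at least one type I error

print(f"m = {m}, P(no type I error) = {no_error_all:.3f}, "
      f"P(at least one) = {familywise_error:.3f}")
```

With 4 groups this gives roughly 0.26, about five times the nominal 0.05, which is the motivation for dedicated multiple comparison procedures.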
Condition of application:
When the ANOVA rejects H0 and accepts H1 (not all of the population means are equal) and we want to know which pairs of group means differ, multiple comparisons should be performed.
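LSD-t test (least significant difference)
The formula:
$$LSD\text{-}t = \frac{\bar{X}_i - \bar{X}_j}{S_{\bar{X}_i - \bar{X}_j}}, \quad \nu = \nu_{error}$$
$$S_{\bar{X}_i - \bar{X}_j} = \sqrt{MS_{error}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
Here $MS_{error} = MS_{wg}$ (the within-group mean square).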
Example 4-7 For the data of Example 4-2, are there differences between the population means of the 2.4 g, 4.8 g and 7.2 g groups and that of the placebo group?
Comparing the 2.4 g group with the placebo group:
H0: $\mu_{2.4g} = \mu_0$
H1: $\mu_{2.4g} \neq \mu_0$
$\alpha = 0.05$
From Example 4-2, $\bar{X}_{2.4g} = 2.72$, $\bar{X}_0 = 3.43$, $n_{2.4g} = n_0 = 30$, $MS_{error} = 0.43$, $\nu_{error} = 116$.
$$S_{\bar{X}_i - \bar{X}_j} = \sqrt{0.43\left(\frac{1}{30} + \frac{1}{30}\right)} = 0.17$$
$$LSD\text{-}t = \frac{2.72 - 3.43}{0.17} = -4.18$$
With $\nu = 116$ and $|t| = 4.18$, the t table gives P < 0.001. At the $\alpha = 0.05$ level, reject H0 and accept H1: the difference is statistically significant.
4.8 g vs placebo: LSD-t = -4.29
7.2 g vs placebo: LSD-t = -8.59
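The three LSD-t statistics can be recomputed in a few lines of Python. This is an illustrative sketch with my own variable names; it uses the unrounded standard error (about 0.1693), so the results (about -4.19, -4.31 and -8.62) differ slightly from the slide's values, which divide by the rounded 0.17.

```python
from math import sqrt

ms_error = 0.43          # within-group mean square from Example 4-2
n = 30                   # every group has 30 patients
placebo_mean = 3.43
dose_means = {"2.4g": 2.72, "4.8g": 2.70, "7.2g": 1.97}

# Standard error of the difference between two group means
se_diff = sqrt(ms_error * (1 / n + 1 / n))

lsd_t = {dose: (mean - placebo_mean) / se_diff for dose, mean in dose_means.items()}

print(f"SE = {se_diff:.4f}")
for dose, t in lsd_t.items():
    print(f"{dose} vs placebo: LSD-t = {t:.2f}")
```

All three absolute values remain far above the tabulated critical value, so the conclusions are the same as with the rounded figures.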
Dunnett-t test
Formula:
$$Dunnett\text{-}t = \frac{\bar{X}_i - \bar{X}_0}{S_{\bar{X}_i - \bar{X}_0}}, \quad \nu = \nu_{error}$$
$$S_{\bar{X}_i - \bar{X}_0} = \sqrt{MS_{error}\left(\frac{1}{n_i} + \frac{1}{n_0}\right)}$$
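Example 4-8 For Example 4-2, compare the population mean of each of the 3 treatment groups with that of the placebo group.
H0: $\mu_i = \mu_0$
H1: $\mu_i \neq \mu_0$
$\alpha = 0.05$
From Example 4-2, $\bar{X}_{2.4g} = 2.72$, $\bar{X}_{4.8g} = 2.70$, $\bar{X}_{7.2g} = 1.97$, $\bar{X}_0 = 3.43$, $n_i = n_0 = 30$, $MS_{error} = 0.43$, $\nu_{error} = 116$.
$$Dunnett\text{-}t_{2.4g} = \frac{2.72 - 3.43}{\sqrt{0.43\left(\frac{1}{30} + \frac{1}{30}\right)}} = -4.18$$
$$Dunnett\text{-}t_{4.8g} = \frac{2.70 - 3.43}{\sqrt{0.43\left(\frac{1}{30} + \frac{1}{30}\right)}} = -4.29$$
$$Dunnett\text{-}t_{7.2g} = \frac{1.97 - 3.43}{\sqrt{0.43\left(\frac{1}{30} + \frac{1}{30}\right)}} = -8.59$$
With $\nu = 116$ and $T = g - 1 = 4 - 1 = 3$, the two-tailed Dunnett-t table gives $t_{0.01/2,116} \approx t_{0.01/2,120} = 2.98$.
Since $|t_{2.4g}| > t_{0.01/2,116}$, $|t_{4.8g}| > t_{0.01/2,116}$ and $|t_{7.2g}| > t_{0.01/2,116}$, P < 0.01 in each case.
At the $\alpha = 0.05$ level, reject H0 and accept H1: each treatment group differs significantly from the placebo group.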
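SNK-q test (Student-Newman-Keuls)
$$q = \frac{\bar{X}_i - \bar{X}_j}{S_{\bar{X}_i - \bar{X}_j}}, \quad \nu = \nu_{error}$$
$$S_{\bar{X}_i - \bar{X}_j} = \sqrt{\frac{MS_{error}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
$\bar{X}_i, n_i$ and $\bar{X}_j, n_j$ are the means and sample sizes of the two groups being compared.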
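Example 4-9 For Example 4-4, compare the 3 group means pairwise by the SNK-q test.
H0: $\mu_A = \mu_B$ (the two population means being compared are equal)
H1: $\mu_A \neq \mu_B$
$\alpha = 0.05$
Rank the 3 group means from small to large and number them:

Mean      0.314   0.434   0.614
Group     C       B       A
Number    1       2       3

From Example 4-4, $MS_{error} = 0.0096$ and $\nu_{error} = 8$. The sample size of each group is 5, so
$$S_{\bar{X}_i - \bar{X}_j} = \sqrt{\frac{0.0096}{2}\left(\frac{1}{5} + \frac{1}{5}\right)} = 0.0438$$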
Table 4-15 Pairwise comparisons of the group means

Groups compared   \bar{X}_i - \bar{X}_j   a     q      q0.05   q0.01   P
(1)               (2)                     (3)   (4)    (5)     (6)     (7)
1,2               0.12                    2     2.74   3.26    4.75    >0.05
1,3               0.30                    3     6.85   4.04    5.64    <0.01
2,3               0.18                    2     4.11   3.26    4.75    <0.05
Conclusion: there are significant differences between A and B and between A and C; the difference between B and C is not significant.
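The q statistics of Table 4-15 can be recomputed as follows. This Python sketch is an illustration with my own variable names. The critical values are copied from the q0.05 column of Table 4-15, and the code simply evaluates every pair; a full SNK procedure tests the widest span first and stops testing inside a span that is not significant, which makes no difference for this example.

```python
from math import sqrt

ms_error = 0.0096    # error mean square from Example 4-4
n = 5                # observations per treatment group
# Group means ranked from small to large (number 1, 2, 3)
ranked = [("C", 0.314), ("B", 0.434), ("A", 0.614)]
# q critical values at alpha = 0.05, nu = 8, keyed by span a (from Table 4-15)
q_crit_05 = {2: 3.26, 3: 4.04}

se = sqrt(ms_error / 2 * (1 / n + 1 / n))

results = {}
for i in range(len(ranked)):
    for j in range(i + 1, len(ranked)):
        span = j - i + 1                       # number of means spanned, a
        q = (ranked[j][1] - ranked[i][1]) / se
        results[(ranked[i][0], ranked[j][0])] = (q, span, q > q_crit_05[span])

print(f"SE = {se:.4f}")
for pair, (q, span, significant) in results.items():
    print(f"{pair[0]} vs {pair[1]}: a = {span}, q = {q:.2f}, "
          f"significant at 0.05: {significant}")
```

The output agrees with the conclusion above: A differs from both B and C, while B and C are not distinguishable at the 0.05 level.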