Multiple Comparison Procedures
Once we reject $H_0: \mu_1 = \mu_2 = \cdots = \mu_c$ in favor of $H_1$: NOT all $\mu$’s are equal, we don’t yet know the way in which they’re not all equal, but simply that they’re not all the same. If there are 4 columns, are all 4 $\mu$’s different? Are 3 the same and one different? If so, which one? etc.
These “more detailed” inquiries into the process are called MULTIPLE COMPARISON PROCEDURES.
Errors (Type I): We set up “$\alpha$” as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha = .05$; each test has type I error (reject $H_0$ when it’s true) of .05. However, P(at least one type I error in the 3 tests) = 1 - P(accept all 3, given all true) = $1 - (.95)^3 \approx .14$.
In other words, the probability is .14 that at least one type I error is made. For 5 tests, prob = .23.
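The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name `familywise_error_rate` is an illustrative choice, and the formula assumes the tests are independent.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """P(at least one Type I error) over k independent tests, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


for k in (3, 5):
    print(f"{k} tests at alpha=.05: {familywise_error_rate(0.05, k):.3f}")
```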
Question - Should we choose $\alpha = .05$, and suffer (for 5 tests) a .23 OVERALL error rate (or “a” or $\alpha_{\text{experimentwise}}$)?
OR should we choose/control the overall error rate, “a”, to be .05, and find the individual test $\alpha$ by solving $1 - (1-\alpha)^5 = .05$ (which gives us $\alpha \approx .010$)?
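Going the other way, from a chosen overall rate to the individual-test level, means solving $1 - (1-\alpha)^k = a$ for $\alpha$. A minimal sketch, assuming independent tests (the name `per_test_alpha` is hypothetical); for 5 tests and an overall rate of .05 it comes out at roughly .0102:

```python
def per_test_alpha(overall: float, k: int) -> float:
    """Individual-test alpha giving the requested overall rate across k independent tests."""
    return 1.0 - (1.0 - overall) ** (1.0 / k)


alpha_individual = per_test_alpha(0.05, 5)
print(f"individual alpha: {alpha_individual:.4f}")
# Round trip: plugging the result back in recovers the overall rate.
print(f"overall check:    {1.0 - (1.0 - alpha_individual) ** 5:.4f}")
```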
The formula $1 - (1-\alpha)^5 = .05$ would be valid only if the tests are independent; often they’re not [e.g., the tests $\mu_1 = \mu_2$, $\mu_2 = \mu_3$, $\mu_1 = \mu_3$].
When the tests are not independent, it’s usually very difficult to arrive at the correct $\alpha$ for an individual test so that a specified value results for the overall error rate.
Categories of multiple comparison tests
- “Planned”/ “a priori” comparisons (stated in advance, usually a linear combination of the column means equal to zero.)
- “Pairwise” comparisons (every column mean compared with each other column mean)
- “Post hoc”/ “a posteriori” comparisons (decided after a look at the data - which comparisons “look interesting”)
(Pairwise comparisons are traditionally considered as “post hoc” and not “a priori”, if one needs to categorize all comparisons into one of the two groups.)
There are many multiple comparison procedures. We’ll cover only a few.
Method 1: Do a series of pairwise t-tests, each with specified $\alpha$ value (for individual test).
This is called “Fisher’s LEAST SIGNIFICANT DIFFERENCE” (LSD).
Example: Broker Study
A financial firm would like to determine if brokers they use to execute trades differ with respect to their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index, Y, is used.
$Y = 1000(A-P)/A$
where P = per share price paid for the stock; A = average of high price and low price per share, for the day.
“The higher Y is, the better the trade is.”
Col: broker    Trades (R=6)                     Mean
1              12    3    5   -1   12    5        6
2               7   17   13   11    7   17       12
3               8    1    7    4    3    7        5
4              21   10   15   12   20    6       14
5              24   13   14   18   14   19       17
Five brokers were in the study and six trades were randomly assigned to each broker.
Source    SSQ      df    MSQ               F
Col       640.8     4    160.2             7.56
Error     530      25     21.2 (“MSW”)
$\alpha = .05$, $F_{TV} = 2.76$ (reject equal column MEANS)
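The ANOVA quantities for the broker data can be reproduced from the raw trades. The sketch below uses only the standard library; the function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the original slides.

```python
# Broker study data: six trades per broker (from the table of raw scores).
BROKERS = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F for a one-way layout."""
    values = [y for g in groups for y in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # this is MSW, the pooled variance
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(list(BROKERS.values()))
for name, value in result.items():
    print(f"{name:>11}: {value:.4f}")
```

The computed F is then compared with the tabled value 2.76 quoted above; looking up that table value is not done in this sketch.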
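For any comparison of 2 columns, consider the distribution of $\bar{Y}_i - \bar{Y}_j$ under $H_0$: it is centered at 0, with area $\alpha/2$ in each tail beyond the lower and upper critical values $C_L$ and $C_u$.
AR (acceptance region): $0 \pm t_{1-\alpha/2} \times \sqrt{MSW \times \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
(t with 25 df; $n_i = n_j = 6$, here). MSW is the pooled variance, “$s_p^2$”, perhaps, in an earlier class in basic statistics.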
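In our example, with $\alpha = .05$:
$0 \pm 2.060 \times \sqrt{21.2 \times \left(\frac{1}{6} + \frac{1}{6}\right)} = 0 \pm 5.48$
This value, 5.48, is called the Least Significant Difference (LSD).
When there is the same number of data points, R, in each column, LSD = $t_{1-\alpha/2} \times \sqrt{\frac{2 \times MSW}{R}}$.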
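As a quick check of the LSD arithmetic, here is a small sketch. The critical value 2.060 (t with 25 df) is taken from the slides rather than computed, and the function name `lsd` is an illustrative choice.

```python
import math


def lsd(t_crit: float, msw: float, n_i: int, n_j: int) -> float:
    """Least significant difference for comparing two column means."""
    return t_crit * math.sqrt(msw * (1.0 / n_i + 1.0 / n_j))


# Broker study: MSW = 21.2, six trades per broker, t = 2.060 (table value from the slides).
lsd_value = lsd(2.060, 21.2, 6, 6)
# With equal column sizes R the formula reduces to t * sqrt(2 * MSW / R).
lsd_equal_r = 2.060 * math.sqrt(2 * 21.2 / 6)
print(f"LSD = {lsd_value:.2f}")
```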
Underline Diagram
Now, rank order and compare:
Col:    3   1   2   4   5
Mean:   5   6  12  14  17
Step 1: identify differences > 5.48, and mark accordingly:
Col:    3   1      2   4   5
Mean:   5   6     12  14  17
        -----     ----------
Step 2: compare the pair of means within each subset:
Comparison   |difference|   vs. LSD
3 vs. 1      *              <
2 vs. 4      *              <
2 vs. 5      5              <
4 vs. 5      *              <
* Contiguous; no need to detail
Conclusion:
3, 1      2, 4, 5
----      -------
Can get “inconsistency”: Suppose col 5 were 18:
Col:    3   1   2   4   5
Mean:   5   6  12  14  18
Now:
Comparison   |difference|   vs. LSD
3 vs. 1      *              <
2 vs. 4      *              <
2 vs. 5      6              >
4 vs. 5      *              <
Conclusion:
3  1      2  4  5   ???
----      ----
             ----
Conclusion:
3  1      2  4  5
----      ----
             ----
• Brokers 1 and 3 are not significantly different from each other, but they are significantly different from the other 3 brokers.
• Brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5.
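The “inconsistency” can be seen by listing which pairs exceed the LSD in each scenario. This is an illustrative sketch: the LSD of 5.48 is the value from the slides, and `significant_pairs` is a hypothetical helper name.

```python
from itertools import combinations

LSD = 5.48  # least significant difference from the slides


def significant_pairs(means, threshold):
    """Set of (i, j) column pairs whose |mean difference| exceeds the threshold."""
    return {
        (i, j)
        for i, j in combinations(sorted(means), 2)
        if abs(means[i] - means[j]) > threshold
    }


original = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}
hypothetical = dict(original)
hypothetical[5] = 18  # "Suppose col 5 were 18"

sig_original = significant_pairs(original, LSD)
sig_hypothetical = significant_pairs(hypothetical, LSD)

# In the hypothetical case 2 vs. 5 becomes significant, while 2 vs. 4 and
# 4 vs. 5 still are not: 2 "equals" 4 and 4 "equals" 5, yet 2 differs from 5.
print(sorted(sig_hypothetical - sig_original))
```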
MULTIPLE COMPARISON TESTING
AFS BROKER STUDY

                     TRADE
BROKER     1    2    3    4    5    6    COLUMN MEAN
1         12    3    5   -1   12    5         6
2          7   17   13   11    7   17        12
3          8    1    7    4    3    7         5
4         21   10   15   12   20    6        14
5         24   13   14   18   14   19        17

ANOVA TABLE
SOURCE     SSQ      DF    MS       Fcalc
BROKER     640.8     4    160.2    7.56
ERROR      530      25     21.2
Using SPSS
Variable Score By Variable Broker
Analysis of Variance
                          Sum of      Mean        F        F
Source            D.F.    Squares     Squares     Ratio    Prob.
Between Groups      4     640.8000    160.2000    7.5566   .0004
Within Groups      25     530.0000     21.2000
Total              29    1170.8000
Fisher’s LSD
USING SPSS 5.0 - MAC
Variable Score By Variable Broker
Multiple Range Tests: LSD test with significance level .05
The difference between two means is significant if
MEAN(J)-MEAN(I) >= 3.2558 * RANGE * SQRT(1/N(I) + 1/N(J))
with the following value(s) for RANGE: 2.91
(*) Indicates significant differences which are shown in the lower triangle

                     G G G G G
                     r r r r r
                     p p p p p
Mean      Broker     3 1 2 4 5
 5.0000   Grp 3
 6.0000   Grp 1
12.0000   Grp 2      * *
14.0000   Grp 4      * *
17.0000   Grp 5      * *

Subset 1
Group    Grp 3     Grp 1
Mean     5.0000    6.0000
- - - - - - - - - - - - - - - -
Subset 2
Group    Grp 2     Grp 4     Grp 5
Mean     12.0000   14.0000   17.0000
- - - - - - - - - - - - - - - - - - - - - - -
USING WINDOWS 8.0
1=column of interest, 2=compared column, 3=difference, 4=std. error, 5=p-value, 6, 7 = 95% confidence limits

LSD
(1)   (2)   (3)        (4)     (5)     (6)       (7)
1     2      -6.00*    2.658   .033    -11.47     -.53
      3       1.00     2.658   .710     -4.47     6.47
      4      -8.00*    2.658   .006    -13.47    -2.53
      5     -11.00*    2.658   .000    -16.47    -5.53
2     1       6.00*    2.658   .033       .53    11.47
      3       7.00*    2.658   .014      1.53    12.47
      4      -2.00     2.658   .459     -7.47     3.47
      5      -5.00     2.658   .072    -10.47      .47
3     1      -1.00     2.658   .710     -6.47     4.47
      2      -7.00*    2.658   .014    -12.47    -1.53
      4      -9.00*    2.658   .002    -14.47    -3.53
      5     -12.00*    2.658   .000    -17.47    -6.53
4     1       8.00*    2.658   .006      2.53    13.47
      2       2.00     2.658   .459     -3.47     7.47
      3       9.00*    2.658   .002      3.53    14.47
      5      -3.00     2.658   .270     -8.47     2.47
5     1      11.00*    2.658   .000      5.53    16.47
      2       5.00     2.658   .072      -.47    10.47
      3      12.00*    2.658   .000      6.53    17.47
      4       3.00     2.658   .270     -2.47     8.47
Fisher's pairwise comparisons
(Minitab)
Family error rate = 0.268
Individual error rate = 0.0500
Critical value = 2.060 ($t_{1-\alpha/2}$)
Intervals for (column level mean) - (row level mean)
            1          2          3          4
2     -11.476
       -0.524
3      -4.476      1.524
        6.476     12.476
4     -13.476     -7.476    -14.476
       -2.524      3.476     -3.524
5     -16.476    -10.476    -17.476     -8.476
       -5.524      0.476     -6.524      2.476
Minitab: Stat>>ANOVA>>one way anova then click “comparisons”.
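The Fisher intervals in the Minitab output are each a difference of column means plus or minus the LSD. A minimal sketch that reproduces them, assuming the slide values t = 2.060 and MSW = 21.2 (the helper name `lsd_interval` is hypothetical):

```python
import math
from itertools import combinations

MEANS = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}  # broker column means
MSW, R, T_CRIT = 21.2, 6, 2.060            # values quoted on the slides

half_width = T_CRIT * math.sqrt(2 * MSW / R)  # this is the LSD


def lsd_interval(col, row):
    """Interval for (column level mean) - (row level mean), as Minitab labels it."""
    diff = MEANS[col] - MEANS[row]
    return diff - half_width, diff + half_width


for col, row in combinations(sorted(MEANS), 2):
    low, high = lsd_interval(col, row)
    flag = "" if low <= 0.0 <= high else "*"  # interval excluding 0 means significant
    print(f"{col} - {row}: ({low:8.3f}, {high:8.3f}) {flag}")
```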
In the previous procedure, each individual comparison has error rate $\alpha = .05$. The overall error rate is, were the comparisons independent, $1 - (.95)^{10} = .401$.
However, they’re not independent (Minitab reports a family error rate of 0.268 for Fisher’s procedure here).
Method 2: A procedure which takes this into account and pre-sets the overall error rate is “TUKEY’S HONESTLY SIGNIFICANT DIFFERENCE TEST”.
Tukey’s method works in a similar way to Fisher’s LSD, except that the “LSD” counterpart (“HSD”) is not
$t_{1-\alpha/2} \times \sqrt{MSW \times \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$ (or, for equal number of data points/col, $t_{1-\alpha/2} \times \sqrt{\frac{2 \times MSW}{R}}$),
but
$tuk_{1-\alpha/2} \times \sqrt{\frac{2 \times MSW}{R}}$,
where tuk has been computed to take into account all the inter-dependencies of the different comparisons.
HSD = $tuk_{1-\alpha/2} \times \sqrt{\frac{2\,MSW}{R}}$
________________________________________
A more general approach is to write HSD = $q_{1-\alpha/2} \times \sqrt{\frac{MSW}{R}}$, where $q_{1-\alpha/2} = tuk_{1-\alpha/2} \times \sqrt{2}$
--- $q = (\bar{Y}_{\text{largest}} - \bar{Y}_{\text{smallest}}) / \sqrt{MSW/R}$
--- probability distribution of q is called the “Studentized Range Distribution”.
--- q = q(c, df), where c = number of columns, and df = df of MSW
q table
With c = 5 and df = 25, from table: q = 4.16 (between 4.10 and 4.17)
tuk = 4.16/1.414 = 2.94
Then, HSD = $4.16 \times \sqrt{21.2/6} = 7.82$; also $2.94 \times \sqrt{2 \times 21.2/6} = 7.82$
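The two equivalent forms of the HSD can be checked numerically. In this sketch q = 4.16 is the table value quoted on the slide (not computed), and the function name `hsd` is an illustrative choice.

```python
import math


def hsd(q_crit: float, msw: float, r: int) -> float:
    """Tukey's honestly significant difference for equal column sizes r."""
    return q_crit * math.sqrt(msw / r)


Q_CRIT, MSW, R = 4.16, 21.2, 6
hsd_from_q = hsd(Q_CRIT, MSW, R)

# Same quantity via t_uk = q / sqrt(2), used in the LSD-style formula.
t_uk = Q_CRIT / math.sqrt(2)
hsd_from_tuk = t_uk * math.sqrt(2 * MSW / R)

print(f"t_uk = {t_uk:.2f}, HSD = {hsd_from_q:.2f}")
```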
In our earlier example, rank order:
Col:    3   1   2   4   5
Mean:   5   6  12  14  17
(No differences [contiguous] > 7.82)
Comparison   |difference|   >or< 7.82
3 vs. 1      *              <
3 vs. 2      7              <
3 vs. 4      9              >
3 vs. 5      12             >
1 vs. 2      *              <
1 vs. 4      8              >
1 vs. 5      11             >
2 vs. 4      *              <
2 vs. 5      5              <
4 vs. 5      *              <
(* contiguous)
Conclusion:
3, 1, 2, 4, 5
-------
      -------
2 is “same as 1 and 3, but also same as 4 and 5.”
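One way to arrive at the underline groups mechanically is to sort the means and collect the maximal runs whose range does not exceed the threshold. This is an illustrative sketch for equal column sizes, not the algorithm of any particular package; `homogeneous_subsets` is a hypothetical name.

```python
def homogeneous_subsets(means, threshold):
    """Maximal runs of rank-ordered columns whose largest-minus-smallest mean <= threshold."""
    ordered = sorted(means, key=means.get)  # column labels, smallest mean first
    subsets = []
    last_end = -1
    for start in range(len(ordered)):
        end = start
        # Extend the run while the new column is still within threshold of the run's smallest.
        while (end + 1 < len(ordered)
               and means[ordered[end + 1]] - means[ordered[start]] <= threshold):
            end += 1
        if end > last_end:  # skip runs already contained in an earlier one
            subsets.append(ordered[start:end + 1])
            last_end = end
    return subsets


MEANS = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}
tukey_subsets = homogeneous_subsets(MEANS, 7.82)  # HSD from the slides
fisher_subsets = homogeneous_subsets(MEANS, 5.48)  # LSD from the slides
print("Tukey :", tukey_subsets)
print("Fisher:", fisher_subsets)
```

With the larger HSD threshold, column 2 falls into both groups, which is the overlap described in the conclusion.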
Tukey’s HSD (“LSD ”)
Variable Score By Variable Broker (Mac)
Multiple Range Tests: Tukey-HSD test with significance level .05
The difference between two means is significant if
MEAN(J)-MEAN(I) >= 3.2558 * RANGE * SQRT(1/N(I) + 1/N(J))
with the following value(s) for RANGE: 4.15
(*) Indicates significant differences which are shown in the lower triangle

                     G G G G G
                     r r r r r
                     p p p p p
Mean      Broker     3 1 2 4 5
 5.0000   Grp 3
 6.0000   Grp 1
12.0000   Grp 2
14.0000   Grp 4      * *
17.0000   Grp 5      * *

Subset 1
Group    Grp 3     Grp 1     Grp 2
Mean     5.0000    6.0000    12.0000
- - - - - - - - - - - - - - - - - - - - - - -
Subset 2
Group    Grp 2     Grp 4     Grp 5
Mean     12.0000   14.0000   17.0000
Windows 8.0 format, with the same column meanings:

Tukey HSD
(1)   (2)   (3)        (4)     (5)     (6)       (7)
1     2      -6.00     2.658   .192    -13.81     1.81
      3       1.00     2.658   .995     -6.81     8.81
      4      -8.00*    2.658   .043    -15.81     -.19
      5     -11.00*    2.658   .003    -18.81    -3.19
2     1       6.00     2.658   .192     -1.81    13.81
      3       7.00     2.658   .094      -.81    14.81
      4      -2.00     2.658   .942     -9.81     5.81
      5      -5.00     2.658   .353    -12.81     2.81
3     1      -1.00     2.658   .995     -8.81     6.81
      2      -7.00     2.658   .094    -14.81      .81
      4      -9.00*    2.658   .018    -16.81    -1.19
      5     -12.00*    2.658   .001    -19.81    -4.19
4     1       8.00*    2.658   .043       .19    15.81
      2       2.00     2.658   .942     -5.81     9.81
      3       9.00*    2.658   .018      1.19    16.81
      5      -3.00     2.658   .790    -10.81     4.81
5     1      11.00*    2.658   .003      3.19    18.81
      2       5.00     2.658   .353     -2.81    12.81
      3      12.00*    2.658   .001      4.19    19.81
      4       3.00     2.658   .790     -4.81    10.81

For Tukey’s HSD, the Windows SPSS output also provides another format, called “homogeneous Subsets” (it doesn’t provide it for Fisher’s LSD):

Tukey HSD
Broker    N    Subset 1    Subset 2    Subset 3
3         6      5.00
1         6      6.00
2         6     12.00       12.00
4         6                 14.00
5         6                 17.00
Sig.             .094        .353
Tukey's pairwise comparisons
(Minitab)
Family error rate = 0.0500
Individual error rate = 0.00706
Critical value = 4.15 ($q_{1-\alpha/2}$)
Intervals for (column level mean) - (row level mean)
            1          2          3          4
2     -13.801
        1.801
3      -6.801     -0.801
        8.801     14.801
4     -15.801     -9.801    -16.801
       -0.199      5.801     -1.199
5     -18.801    -12.801    -19.801    -10.801
       -3.199      2.801     -4.199      4.801
Exercise: Drug Study
A drug company is developing two new drug formulations for treating flu, denoted drug A and drug B. Two groups of 10 volunteers took drug A and drug B, respectively, and after three days their responses (Y) were recorded. A placebo group was added to check the effectiveness of the drugs. The larger the Y value, the more effective the drug. Here are the data (MSE = 1):
           Index i    Column mean    Sample size
Drug A     1           -5.3          10
Drug B     2           -6.1          10
Placebo    3          -12.3          10
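LSD = $t_{97.5\%;\,27\,df} \times \sqrt{MSE \times 2/10} = 2.052 \times \sqrt{2/10} = 0.9177$
Placebo      Drug B   Drug A
             ---------------
HSD = $q_{97.5\%;\,27\,df} \times \sqrt{MSE \times 1/10} = 3.51 \times \sqrt{1/10} = 1.110$
Placebo      Drug B   Drug A
             ---------------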
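The exercise values can be verified, and the pairs that differ listed, with a short sketch. The critical values 2.052 and 3.51 are the table values quoted on the slide; `differing_pairs` is a hypothetical helper name.

```python
import math
from itertools import combinations

MEANS = {"Drug A": -5.3, "Drug B": -6.1, "Placebo": -12.3}
MSE, N = 1.0, 10
T_CRIT, Q_CRIT = 2.052, 3.51  # table values quoted on the slide (27 df)

lsd = T_CRIT * math.sqrt(2 * MSE / N)
hsd = Q_CRIT * math.sqrt(MSE / N)


def differing_pairs(threshold):
    """Pairs of groups whose |mean difference| exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(MEANS, 2)
        if abs(MEANS[a] - MEANS[b]) > threshold
    ]


print(f"LSD = {lsd:.4f}: {differing_pairs(lsd)}")
print(f"HSD = {hsd:.3f}: {differing_pairs(hsd)}")
```

Both thresholds lead to the same grouping here: the two drugs differ from the placebo but not from each other.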
Method 3: Dunnett’s test
Designed specifically for (and incorporating the interdependencies of) comparing several “treatments” to a “control.”
Example:
        CONTROL
Col:    1    2    3    4    5
Mean:   6   12    5   14   17      (R=6)
Analog of LSD (= $t_{1-\alpha/2} \times \sqrt{2\,MSW/R}$) = $Dut_{1-\alpha/2} \times \sqrt{2\,MSW/R}$
D table p. 107
$Dut_{1-\alpha/2} \times \sqrt{2\,MSW/R} = 2.61 \times \sqrt{2(21.2)/6} = 6.94$
In our example:
        CONTROL
Col:    1    2    3    4    5
Mean:   6   12    5   14   17
Comparison   |difference|   >or< 6.94
1 vs. 2      6              <
1 vs. 3      1              <
1 vs. 4      8              >
1 vs. 5      11             >
- Cols 4 and 5 differ from the control [1].
- Cols 2 and 3 are not significantly different from control.
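The Dunnett comparisons follow the same pattern as the LSD, with only the critical value changed and only treatment-versus-control pairs examined. A minimal sketch, taking the critical value 2.61 from the slides rather than computing it:

```python
import math

MEANS = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}  # broker column means
CONTROL = 1
MSW, R = 21.2, 6
D_CRIT = 2.61  # Dunnett table value quoted on the slides

threshold = D_CRIT * math.sqrt(2 * MSW / R)

# Only comparisons against the control column are made.
differs_from_control = [
    col for col in sorted(MEANS)
    if col != CONTROL and abs(MEANS[col] - MEANS[CONTROL]) > threshold
]
print(f"threshold = {threshold:.2f}, differ from control: {differs_from_control}")
```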
DUNNETT
Dependent Variable: SCORE
Dunnett t (2-sided)
                             Mean Difference                        95% Confidence Interval
(I) BROKER    (J) BROKER     (I-J)              Std. Error   Sig.   Lower Bound   Upper Bound
2             1                6.00             2.658        .103     -.93          12.93
3             1               -1.00             2.658        .987    -7.93           5.93
4             1                8.00*            2.658        .020     1.07          14.93
5             1               11.00*            2.658        .001     4.07          17.93
* The mean difference is significant at the .05 level.
Dunnett's comparisons with a control
(Minitab)
Family error rate = 0.0500
Individual error rate = 0.0152
Critical value = 2.61 ($Dut_{1-\alpha/2}$)
Control = level (1) of broker
Intervals for treatment mean minus control mean
Level    Lower     Center    Upper     --+---------+---------+---------+----
2        -0.930     6.000    12.930    (---------*--------)
3        -7.930    -1.000     5.930    (---------*--------)
4         1.070     8.000    14.930    (--------*---------)
5         4.070    11.000    17.930    (---------*---------)
                                       --+---------+---------+---------+----
                                       -7.0       0.0       7.0      14.0
Method 4: MCB Procedure (Compare to the best)
This procedure provides a subset of treatments that cannot be distinguished from the best. The probability that the “best” treatment is included in this subset is controlled at $1 - \alpha$.
*Assume that larger is better.
STEP 1: Calculate the following for all index i:
$\bar{y}_{i\cdot}$ and $\max_{j \neq i}(\bar{y}_{j\cdot})$
$M_{il} = D_{\alpha(c-1,\,\nu)} \sqrt{MSE \left(\frac{1}{n_i} + \frac{1}{n_l}\right)}$
where l (not i) is the group whose mean reaches $\max_{j \neq i}(\bar{y}_{j\cdot})$.
STEP 2: Conduct tests
The treatment i is included in the best subset if
$D_i = [\bar{y}_{i\cdot} - \max_{j \neq i}(\bar{y}_{j\cdot})] > -M_{il}$.
           Index i    Column mean    $\max_{j \neq i} \bar{y}_{j\cdot}$    $D_i$
Drug A     1           -5.3           -6.1                                   0.8
Drug B     2           -6.1           -5.3                                  -0.8
Placebo    3          -12.3           -5.3                                  -7
$M_{il} = D_{5\%(2,\,27)} \sqrt{MSE \left(\frac{1}{10} + \frac{1}{10}\right)} = 2 \times \sqrt{1 \times \left(\frac{1}{10} + \frac{1}{10}\right)} = 0.894$ (Given MSE = 1.)
What drugs are in the best subset?
Identify the subset of the best brokers
Hsu's MCB (Multiple Comparisons with the Best)
Family error rate = 0.0500
Critical value = 2.27
Intervals for level mean minus largest of other level means
Level    Lower      Center     Upper     ---+---------+---------+---------+---
1        -17.046    -11.000    0.000     (------*-------------)
2        -11.046     -5.000    1.046     (-------*------)
3        -18.046    -12.000    0.000     (-------*--------------)
4         -9.046     -3.000    3.046     (------*-------)
5         -3.046      3.000    9.046     (-------*------)
                                         ---+---------+---------+---------+---
                                         -16.0      -8.0       0.0       8.0
Best subset: Brokers 2, 4, 5 (the levels whose upper limit is above 0).
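The interval form of the MCB output can be approximated with the sketch below. It assumes the constrained form in which the lower limit is capped at 0 from above and the upper limit at 0 from below, which is consistent with the 0.000 upper limits shown for brokers 1 and 3. Because the critical value is used as the rounded 2.27, the limits agree with the printed ones only to about one decimal place. The name `mcb_intervals` is hypothetical.

```python
import math

MEANS = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}  # broker column means
MSW, R = 21.2, 6
CRIT = 2.27  # rounded critical value from the output


def mcb_intervals(means, mse, n, crit):
    """(lower, center, upper) for each level mean minus the largest of the other means."""
    margin = crit * math.sqrt(2 * mse / n)
    intervals = {}
    for i in means:
        d_i = means[i] - max(m for j, m in means.items() if j != i)
        intervals[i] = (min(0.0, d_i - margin), d_i, max(0.0, d_i + margin))
    return intervals


intervals = mcb_intervals(MEANS, MSW, R, CRIT)
best_brokers = [i for i, (_, _, upper) in intervals.items() if upper > 0.0]
for level, (low, center, high) in intervals.items():
    print(f"{level}: {low:8.3f} {center:8.3f} {high:8.3f}")
print("best subset:", best_brokers)
```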
----Post Hoc comparisons
*F test for contrast (in “Orthogonality”)
*Scheffe test (p.108; skipped): to test all linear combinations at once. Very conservative; not to be used for pairwise comparisons.
----A Priori comparisons
*covered later in chapter on “Orthogonality”