Transcript Document

When we think only of sincerely helping all others, not ourselves,
we will find that we receive all that we wish for.
multiple comparisons
Chapter 9: Multiple Comparisons
Error rates
Pairwise comparisons
Comparisons to a control
Linear contrasts

Multiple Comparison Procedures
Once we reject H0: μ1 = μ2 = ... = μt in favor of
H1: NOT all μ's are equal, we don't yet know the way in
which they're not all equal, but simply that they're not
all the same. If there are 4 columns (levels), are all
4 μ's different? Are 3 the same and one different? If so,
which one? etc.
These "more detailed" inquiries into the process are called
MULTIPLE COMPARISON PROCEDURES.
Errors (Type I):
We set up "α" as the significance level for a hypothesis
test. Suppose we test 3 independent hypotheses, each at
α = .05; each test has Type I error (reject H0 when it's
true) of .05. However,
P(at least one Type I error in the 3 tests)
= 1 - P(accept all 3, given all true) = 1 - (.95)^3 ≈ .14
In other words, the probability is .14 that at least one
Type I error is made. For 5 tests, prob = .23.
Question - Should we choose α = .05, and suffer (for 5
tests) a .23 experimentwise error rate ("α_E")?
OR
Should we choose/control the overall error rate, α_E, to
be .05, and find the individual-test α by
1 - (1 - α)^5 = .05 (which gives us α ≈ .0102)?
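The arithmetic above can be sketched in a few lines of Python (a minimal sketch; the function names are our own):

```python
# Experimentwise (family) Type I error rate for k INDEPENDENT tests,
# each run at individual significance level alpha:
#   alpha_E = 1 - (1 - alpha)^k
# and, inverting, the individual alpha that holds alpha_E at a target.

def family_error_rate(alpha, k):
    """P(at least one Type I error in k independent tests at level alpha)."""
    return 1 - (1 - alpha) ** k

def individual_alpha(alpha_e, k):
    """Individual-test alpha giving family error rate alpha_e."""
    return 1 - (1 - alpha_e) ** (1 / k)

print(round(family_error_rate(0.05, 3), 3))   # 0.143, the ".14" above
print(round(family_error_rate(0.05, 5), 3))   # 0.226, the ".23" above
print(round(individual_alpha(0.05, 5), 4))    # 0.0102
```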
The formula 1 - (1 - α)^5 = .05 would be valid only if
the tests are independent; often they're not.
[ e.g., H1: μ1 = μ2, H2: μ2 = μ3, H3: μ1 = μ3.
If H1 is accepted and H2 rejected, isn't it more likely
that H3 is rejected? ]
Error Rates
When the tests are not independent, it's usually very
difficult to arrive at the correct α for an individual
test so that a specified value results for the
experimentwise error rate (also called the family error
rate).
There are many multiple comparison procedures. We'll
cover only a few.
Pairwise Comparisons
Method 1: (Fisher Test) Do a series of pairwise t-tests,
each with a specified α value (for the individual test).
This is called Fisher's LEAST SIGNIFICANT DIFFERENCE
(LSD).
Example: Broker Study
A financial firm would like to determine if the brokers
they use to execute trades differ with respect to their
ability to provide a stock purchase for the firm at a low
buying price per share. To measure cost, an index, Y, is
used:
Y = 1000(A - P)/A
where
P = per-share price paid for the stock;
A = average of the high price and low price per share,
for the day.
"The higher Y is, the better the trade is."
CoL: broker
Broker:    1    2    3    4    5
          12    7    8   21   24
           3   17    1   10   13
           5   13    7   15   14
          -1   11    4   12   18
          12    7    3   20   14
           5   17    7    6   19
Mean:      6   12    5   14   17    } n = 6
Five brokers were in the study and six trades
were randomly assigned to each broker.
Source   SSQ    df   MSQ    F
Col      640.8   4   160.2  7.56
Error    530    25    21.2  ("MSW")
α = .05, FTV = 2.76
(reject equal column MEANS)
For any comparison of 2 columns, Yi - Yj:
[sketch: acceptance region centered at 0, with α/2 in each
tail, between CL and CU]
AR: 0 ± t(α/2, dfW) × sqrt( MSW × (1/ni + 1/nj) )
(ni = nj = 6, here)
MSW: the pooled variance, the estimate for the common
variance.
In our example, with α = .05:
0 ± 2.060 × sqrt( 21.2 × (1/6 + 1/6) )
0 ± 5.48
This value, 5.48, is called the Least Significant
Difference (LSD).
When there is the same number of data points, n, in each
column, LSD = t(α/2) × sqrt( 2 × MSW / n ).
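As a check, the LSD can be computed directly (a sketch using scipy; MSW, df, and n are taken from the ANOVA table above):

```python
from math import sqrt
from scipy import stats

MSW, df_error, n = 21.2, 25, 6      # from the broker-study ANOVA table
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df_error)   # ~2.060
# ~5.47; the slide rounds t to 2.060 first, giving 5.48
LSD = t_crit * sqrt(2 * MSW / n)
print(round(t_crit, 3), round(LSD, 2))
```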
Underline Diagram
Summarize the comparison results. (p. 443)
1. Now, rank order and compare:
Col:    3    1    2    4    5
Mean:   5    6   12   14   17
Step 2: identify differences > 5.48, and mark accordingly:
Col:    3    1  |  2    4    5
Mean:   5    6  | 12   14   17
Then compare the pair of means within each subset:
Comparison    |difference|   vs. LSD
3 vs. 1 *          1            <
2 vs. 4 *          2            <
2 vs. 5            5            <
4 vs. 5 *          3            <
* Contiguous; no need to detail
Conclusion: 3, 1 | 2, 4, 5
Can get "inconsistency": Suppose col 5's mean were 18:
Col:    3    1    2    4    5
Mean:   5    6   12   14   18
Now: Comparison   |difference|   vs. LSD
3 vs. 1 *              1            <
2 vs. 4 *              2            <
2 vs. 5                6            >
4 vs. 5 *              4            <
Conclusion: 3, 1 | 2  4  5 ???
(2 and 4 are "the same", 4 and 5 are "the same", yet 2
and 5 differ.)
Conclusion: 3, 1 | 2, 4 | 4, 5
• Brokers 1 and 3 are not significantly different, but
they are significantly different from the other 3
brokers.
• Brokers 2 and 4 are not significantly different, and
brokers 4 and 5 are not significantly different, but
broker 2 is significantly different from (smaller than)
broker 5.
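The pairwise bookkeeping above can be automated; this sketch reproduces the underline-diagram conclusion for the original data (col 5 mean = 17):

```python
from itertools import combinations

means = {3: 5, 1: 6, 2: 12, 4: 14, 5: 17}   # broker means, in rank order
LSD = 5.48

for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    verdict = "differ" if diff > LSD else "not significantly different"
    print(f"broker {a} vs {b}: |diff| = {diff:2d} -> {verdict}")
# Brokers {3, 1} separate from {2, 4, 5}, as in the conclusion above.
```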
MULTIPLE COMPARISON TESTING
AFS BROKER STUDY
BROKER ---->    1    2    3    4    5
TRADE 1        12    7    8   21   24
      2         3   17    1   10   13
      3         5   13    7   15   14
      4        -1   11    4   12   18
      5        12    7    3   20   14
      6         5   17    7    6   19
COLUMN MEAN     6   12    5   14   17

ANOVA TABLE
SOURCE   SSQ    DF   MS     Fcalc
BROKER   640.8   4   160.2  7.56
ERROR    530    25    21.2
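The ANOVA table can be reproduced from the raw trade data (a sketch using scipy's one-way ANOVA):

```python
from scipy import stats

# Six trades per broker, read from the table above.
broker1 = [12, 3, 5, -1, 12, 5]
broker2 = [7, 17, 13, 11, 7, 17]
broker3 = [8, 1, 7, 4, 3, 7]
broker4 = [21, 10, 15, 12, 20, 6]
broker5 = [24, 13, 14, 18, 14, 19]

F, p = stats.f_oneway(broker1, broker2, broker3, broker4, broker5)
print(round(F, 2))    # 7.56, matching Fcalc in the table
print(p < 0.05)       # True: reject equal column means
```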
Minitab: Stat >> ANOVA >> One-Way ANOVA, then click "Comparisons".
Fisher's pairwise comparisons (Minitab)
Family error rate = 0.268
Individual error rate = 0.0500
Critical value = 2.060   <- t(α/2)
Intervals for (column level mean) - (row level mean):
            1          2          3          4
2     -11.476
       -0.524
3      -4.476      1.524
        6.476     12.476
4     -13.476     -7.476    -14.476
       -2.524      3.476     -3.524
5     -16.476    -10.476    -17.476     -8.476
       -5.524      0.476     -6.524      2.476
The (1, 2) interval excludes 0: Col 1 < Col 2.
The (2, 4) interval covers 0: cannot reject Col 2 = Col 4.
Pairwise comparisons
Method 2: (Tukey Test) A procedure which controls the
experimentwise error rate is TUKEY'S HONESTLY
SIGNIFICANT DIFFERENCE TEST.
Tukey's method works in a similar way to Fisher's LSD,
except that the "LSD" counterpart ("HSD") is not
t(α/2) × sqrt( MSW × (1/ni + 1/nj) )
(or, for an equal number of data points per column,
t(α/2) × sqrt( 2 × MSW / n )),
but
tuk(α/2) × sqrt( 2 × MSW / n ),
where tuk has been computed to take into account all the
inter-dependencies of the different comparisons.
HSD = tuk(α/2) × sqrt( 2 × MSW / n )
_______________________________________
A more general approach is to write
HSD = q × sqrt( MSW / n ), where q = tuk(α/2) × sqrt(2).
--- q = (Ylargest - Ysmallest) / sqrt( MSW / n )
--- The probability distribution of q is called the
"Studentized Range Distribution".
--- q = q(t, df), where t = number of columns, and
df = df of MSW.
With t = 5 and df = v = 25, from Table 10:
q = 4.15 for α = 5%
tuk = 4.15 / 1.414 = 2.93
Then,
HSD = 4.15 × sqrt(21.2/6) = 7.80;
also, 2.93 × sqrt(2(21.2)/6) = 7.80.
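The tabled q value can also be pulled from scipy's studentized range distribution (available in SciPy ≥ 1.7; a sketch):

```python
from math import sqrt
from scipy import stats

MSW, df_error, n, t_cols = 21.2, 25, 6, 5
q = stats.studentized_range.ppf(0.95, t_cols, df_error)   # ~4.15 (Table 10)
HSD = q * sqrt(MSW / n)                                   # ~7.80
print(round(q, 2), round(HSD, 1))
```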
In our earlier example:
Col:    3    1    2    4    5
Mean:   5    6   12   14   17
Rank order: (No contiguous differences > 7.80)
Comparison   |difference|   > or < 7.80
3 vs. 1 *         1             <
3 vs. 2           7             <
3 vs. 4           9             >
3 vs. 5          12             >
1 vs. 2 *         6             <
1 vs. 4           8             >
1 vs. 5          11             >
2 vs. 4 *         2             <
2 vs. 5           5             <
4 vs. 5 *         3             <
* Contiguous
Conclusion: 3, 1, 2 and 2, 4, 5 (overlapping)
2 is "same as 1 and 3, but also same as 4 and 5."
Minitab: Stat >> ANOVA >> One-Way ANOVA, then click "Comparisons".
Tukey's pairwise comparisons (Minitab)
Family error rate = 0.0500
Individual error rate = 0.00706
Critical value = 4.15   <- q(α)
Intervals for (column level mean) - (row level mean):
            1          2          3          4
2     -13.801
        1.801
3      -6.801     -0.801
        8.801     14.801
4     -15.801     -9.801    -16.801
       -0.199      5.801     -1.199
5     -18.801    -12.801    -19.801    -10.801
       -3.199      2.801     -4.199      4.801
Special Multiple Comp.
Method 3: Dunnett's test
Designed specifically for (and incorporating the
interdependencies of) comparing several "treatments" to
a "control."
Example: Col 1 is the CONTROL.
Col:    1    2    3    4    5
Mean:   6   12    5   14   17    } n = 6
The analog of the LSD (= t(α/2) × sqrt( 2 × MSW / n )) is
D = Dut(α/2) × sqrt( 2 × MSW / n ),
with Dut from a table or Minitab.
D = Dut(α/2) × sqrt( 2 × MSW / n )
  = 2.61 × sqrt( 2(21.2)/6 )
  = 6.94
In our example (col 1 = CONTROL):
Col:    1    2    3    4    5
Mean:   6   12    5   14   17
Comparison   |difference|   > or < 6.94
1 vs. 2           6             <
1 vs. 3           1             <
1 vs. 4           8             >
1 vs. 5          11             >
- Cols 4 and 5 differ from the control [ 1 ].
- Cols 2 and 3 are not significantly different from the
control.
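A sketch of the computation (the Dunnett critical point 2.61 is taken from the table, as above; SciPy ≥ 1.11 could instead compute the whole test from the raw data via scipy.stats.dunnett):

```python
from math import sqrt

MSW, n, Dut = 21.2, 6, 2.61   # Dut(alpha/2) for 4 treatments vs. control, df = 25
D = Dut * sqrt(2 * MSW / n)
print(round(D, 2))            # 6.94

means = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}   # column means; col 1 is the control
for col in (2, 3, 4, 5):
    diff = abs(means[col] - means[1])
    print(col, diff, "differs from control" if diff > D else "not significant")
```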
Minitab: Stat >> ANOVA >> General Linear Model, then click "Comparisons".
Dunnett's comparisons with a control (Minitab)
Family error rate = 0.0500   <- controlled!!
Individual error rate = 0.0152
Critical value = 2.61   <- Dut(α/2)
Control = level (1) of broker
Intervals for treatment mean minus control mean:
Level    Lower     Center    Upper
2       -0.930     6.000    12.930
3       -7.930    -1.000     5.930
4        1.070     8.000    14.930
5        4.070    11.000    17.930
(In the Minitab plot, the intervals for levels 4 and 5
lie entirely above 0.)
What Method Should We Use?
• The Fisher procedure can be used only after the F-test
in the ANOVA is significant at 5%.
• Otherwise, use the Tukey procedure. Note that, to avoid
being too conservative, the significance level of the
Tukey test can be set bigger (10%), especially when the
number of levels is large. Or use the S-N-K procedure.
Contrast
Consider the following data, which, let's say, are the
column means of a one-factor ANOVA, with the one factor
being "DRUG":
Consider 4 column means:
          1     2     3     4
        Y.1   Y.2   Y.3   Y.4
          6     4     1    -3
Grand Mean = Y.. = 2
# of rows (replicates) = R = 8
Contrast
Example 1
    1         2          3          4
 Placebo   Sulfa      Sulfa     Antibiotic
           Type S1    Type S2    Type A
Suppose the questions of interest are
(1) Placebo vs. Non-placebo
(2) S1 vs. S2
(3) (Average) S vs. A
• For (1), we would like to test if the mean of Placebo
is equal to the mean of the other levels, i.e. if the
mean value of {Y.1 - (Y.2 + Y.3 + Y.4)/3} is equal to 0.
• For (2), we would like to test if the mean of S1 is
equal to the mean of S2, i.e. if the mean value of
(Y.2 - Y.3) is equal to 0.
• For (3), we would like to test if the mean of Types S1
and S2 is equal to the mean of Type A, i.e. if the mean
value of {(Y.2 + Y.3)/2 - Y.4} is equal to 0.
In general, a question of interest can be expressed by a
linear combination of column means such as
Z = Σj aj Y.j
with the restriction that Σj aj = 0.
Such linear combinations are called (linear) contrasts.
Test if a contrast has mean 0
The sum of squares for contrast Z is
SSZ = n Z² / Σj aj²
where n is the number of rows (replications).
The test statistic Fcalc = SSZ / MSW is distributed as F
with 1 and (df of error) degrees of freedom.
Reject E[Z] = 0 if the observed Fcalc is too large
(say, > F0.05(1, df of error) at the 5% significance level).
Example 1 (cont.): aj's for the 3 contrasts
                     P    S1    S2    A
                     1     2     3    4
P vs. non-P:  Z1    -3     1     1    1
S1 vs. S2:    Z2     0    -1     1    0
S vs. A:      Z3     0    -1    -1    2
Calculating Σj aj²:
top row:    9 + 1 + 1 + 1 = 12
middle row: 0 + 1 + 1 + 0 = 2
bottom row: 0 + 1 + 1 + 4 = 6
              Y.1   Y.2   Y.3   Y.4
               P    S1    S2     A      Z² / Σj aj²
               5     6     7    10
Placebo
vs. drugs     -3     1     1     1        5.33
S1 vs. S2      0    -1     1     0        0.50
Average S
vs. A          0    -1    -1     2        8.17
                                         -----
                                         14.00
SSZ = 8 × Z² / Σj aj²:
Z²/Σaj²     SSZ = 8 × Z²/Σaj²
 5.33           42.64
 0.50            4.00
 8.17           65.36
-----          ------
14.00          112.00   <- SSBc!
• Σj (Y.j - Y..)² = 14
• SSBc = 14 × R; R = # rows = 8.
Orthogonal Contrasts
A set of k contrasts { Zi = Σj aij Y.j, i = 1, 2, …, k }
is called orthogonal if
Σj a(i1)j × a(i2)j = 0 for all i1 ≠ i2.
If k = c - 1 (the df of the "column" term, where c = # of
columns), then
Σ(i=1 to k) SSZi = SSBc.
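A quick check of this definition on the Example 1 contrasts (a sketch; the means and R come from the earlier slides):

```python
Z1 = [-3, 1, 1, 1]    # placebo vs. non-placebo
Z2 = [0, -1, 1, 0]    # S1 vs. S2
Z3 = [0, -1, -1, 2]   # average S vs. A

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Every pair of contrasts is orthogonal:
for u, v in [(Z1, Z2), (Z1, Z3), (Z2, Z3)]:
    assert dot(u, v) == 0

# With k = c - 1 = 3 orthogonal contrasts, the SSZ's add up to SSBc:
means, R = [5, 6, 7, 10], 8
total = sum(R * dot(a, means) ** 2 / dot(a, a) for a in (Z1, Z2, Z3))
print(round(total, 2))    # 112.0 = SSBc
```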
Orthogonal Contrasts
If a set of contrasts is orthogonal, their corresponding
questions are called independent because the
probabilities of Type I and Type II errors in the ensuing
hypothesis tests are independent and "stand alone."
That is, the variability in Y due to one contrast
(question) will not be affected by that due to the other
contrasts (questions).
Orthogonal Breakdown
Since SSBcol has (C-1) df (which corresponds with having
C levels, or C columns), the SSBcol can be broken up into
(C-1) individual SSQ values, each with a single degree of
freedom, each addressing a different inquiry into the
data's message (one question).
A set of C-1 orthogonal contrasts (questions) provides an
orthogonal breakdown.
Recall Data in Example 1 (R = 8):
          Placebo   S1    S2     A
            ...     ...   ...   ...
Mean:        5       6     7    10
Y.. = 7
ANOVA
Source   SSQ   df   MSQ    F
Drugs    112    3   37.33  7.47
Error    140   28    5
F(1-.05)(3,28) = 2.95
An Orthogonal Breakdown
Source         SSQ     df   MSQ      F
Drugs   Z1     42.64    1   42.64    8.53
        Z2      4.00    1    4.00     .80
        Z3     65.36    1   65.36   13.07
       (112)   (3)
Error         140      28    5
F(1-.05)(1,28) = 4.20
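The breakdown above can be verified numerically (a sketch; the means, R, MSW, and error df are taken from the Example 1 slides):

```python
from scipy import stats

means, R, MSW, df_err = [5, 6, 7, 10], 8, 5, 28   # Example 1 summary
contrasts = {
    "Z1: placebo vs. drugs": [-3, 1, 1, 1],
    "Z2: S1 vs. S2":         [0, -1, 1, 0],
    "Z3: avg S vs. A":       [0, -1, -1, 2],
}

F_crit = stats.f.ppf(0.95, 1, df_err)             # F(1-.05)(1,28) ~ 4.20
for name, a in contrasts.items():
    Z = sum(aj * m for aj, m in zip(a, means))
    SSZ = R * Z ** 2 / sum(aj ** 2 for aj in a)
    F = SSZ / MSW
    print(name, round(SSZ, 2), round(F, 2), F > F_crit)
```

(The slide's 42.64 and 65.36 come from rounding Z²/Σaj² to two decimals before multiplying by 8; the exact values are 42.67 and 65.33.)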
Example 1 (cont.): Conclusions
• The mean response for Placebo is significantly
different from that for Non-placebo.
• There is no significant difference between using Types
S1 and S2.
• Using Type A is significantly different from using
Type S on average.
What if contrasts of interest are not orthogonal?
Let k be the number of contrasts of interest and c the
number of levels.
1. If k <= c-1: Bonferroni method
2. If k > c-1: Bonferroni or Scheffe method
* Bonferroni Method: the same F test, but use α = a/k,
where a is the desired family error rate (usually 5%).
* Scheffe Method: tests all linear combinations at once.
Very conservative. (Section 9.8)
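A sketch of the Bonferroni adjustment for k = 3 non-orthogonal contrasts in the Example 1 setting (error df = 28):

```python
from scipy import stats

a_family, k, df_err = 0.05, 3, 28
alpha_each = a_family / k                        # run each contrast's F test at a/k
F_crit = stats.f.ppf(1 - alpha_each, 1, df_err)
print(round(alpha_each, 4))                      # 0.0167
print(round(F_crit, 2))                          # stricter than F(.05)(1,28) = 4.20
```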
Special Pairwise Comp.
Method 4: MCB Procedure (Compare to the best)
This procedure provides a subset of treatments that
cannot be distinguished from the best. The probability
that the "best" treatment is included in this subset is
controlled at 1 - α.
* Assume that the larger, the better. If not, change the
response to -y.
Identify the subset of the best brokers
Minitab: Stat >> ANOVA >> One-Way ANOVA, then click
"Comparisons", Hsu's MCB.
Hsu's MCB (Multiple Comparisons with the Best)
Family error rate = 0.0500
Critical value = 2.27
Intervals for level mean minus largest of other level means:
Level    Lower     Center    Upper
1       -17.046   -11.000    0.000
2       -11.046    -5.000    1.046
3       -18.046   -12.000    0.000
4        -9.046    -3.000    3.046
5        -3.046     3.000    9.046
A level is selected only if its interval (excluding the
ends) covers 0: brokers 2, 4, and 5 form the subset of
the best; levels 1 and 3 are not included.