11 Multiple Comparisons


PSYC 6130
Multiple Comparisons
Lecture 17 Summary
• Why do multiple comparisons
• The problem with multiple comparisons
• Familywise and per-comparison alpha
• Exploratory data analysis
  – Fisher’s protected t tests
  – Tukey’s HSD test
  – Dunnett’s Test
  – REGWQ Test
  – Games-Howell Test
• Planned Comparisons
  – Bonferroni t or Dunn’s Test
  – Complex Comparisons (Linear Contrasts)
  – Scheffé’s Test (an exploratory analysis technique that works for complex comparisons)
• Recommendations

PSYC 6130, PROF. J. ELDER
Why do multiple comparisons?

[Figures: bar plots of a Dependent Variable as a function of an Independent Variable with three groups A, B, C, contrasting the pattern of group means under H0 and under H1, followed by a series of “Number of Comparisons” slides highlighting each possible pairwise comparison among the three groups.]
Number of Possible Comparisons
• In general, for an independent variable with k groups, the number of possible pairwise comparisons is given by:

  j = k(k − 1) / 2

• In our example, k = 3, so the number of possible comparisons is 3(3 − 1)/2 = 3.
The Problem with Multiple Comparisons
• Each pairwise comparison we do has a 5% chance of resulting in a Type I error (assuming α = 0.05).
The Problem with Multiple Comparisons

Probability tree for two independent comparisons when both null hypotheses are true:
• Comparison 1: Accept H0 (P = 0.95) → Comparison 2: Accept H0 (P = 0.95): P(Accept, Accept) = 0.95 × 0.95 = 0.9025
• Comparison 1: Accept H0 (P = 0.95) → Comparison 2: Reject H0 (P = 0.05): P(Accept, Reject) = 0.95 × 0.05 = 0.0475
• Comparison 1: Reject H0 (P = 0.05) → Comparison 2: Accept H0 (P = 0.95): P(Reject, Accept) = 0.05 × 0.95 = 0.0475
• Comparison 1: Reject H0 (P = 0.05) → Comparison 2: Reject H0 (P = 0.05): P(Reject, Reject) = 0.05 × 0.05 = 0.0025
The Problem with Multiple Comparisons
• If we do 20 comparisons where all of the null hypotheses are actually true, we have a 0.95^20 ≈ 0.3585 chance of correctly accepting all true null hypotheses and a 1 − 0.3585 = 0.6415 chance of making at least one Type I error.
• In general, the probability of making at least one Type I error in j comparisons is:

  α_EW = 1 − (1 − α_PC)^j

  – This is called the experimentwise, or familywise, Type I error rate.
Example
Suppose we wish to make three comparisons at α_PC = 0.05. The probability of making at least one Type I error is:

  α_EW = 1 − (1 − α_PC)^j
       = 1 − (1 − 0.05)^3
       = 1 − 0.95^3
       = 1 − 0.8574
       = 0.1426
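The familywise rate is easy to compute directly. A minimal sketch (the function name is mine); it reproduces both the 3-comparison example and the 20-comparison figure above:

```python
def alpha_ew(alpha_pc, j):
    """Familywise (experimentwise) Type I error rate for j independent comparisons."""
    return 1 - (1 - alpha_pc) ** j

print(round(alpha_ew(0.05, 3), 4))   # 0.1426
print(round(alpha_ew(0.05, 20), 4))  # 0.6415
```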
How to Fix the Problem
• One way to fix this problem is to reduce the per-comparison alpha rate.
• This is the main idea behind the approaches we will discuss.
The Trade-Off

                        Reality
Your guess              H0 true          H0 false
H0 true                 Hit              Type II error
H0 false                Type I error     Correct rejection (C.R.)

[Figure: sampling distributions under H0 and H1 plotted on the same axis, with the criterion t_crit marking the boundary of the rejection region.]
The Trade-Off

[Figures: the H0 and H1 distributions with regions shaded to show α_PC (the Type I error rate, the area beyond t_crit under H0), β_PC (the Type II error rate, the area below t_crit under H1), and 1 − β_PC (the power, the area beyond t_crit under H1). Shifting the criterion to reduce α_PC also reduces power.]
Exploratory Data Analysis
• Analyzing data for possible effects without any prior expectations about what effects might be found is called exploratory data analysis.
• In this case we want to detect effects when present, but we want to limit our familywise Type I error rate so that it never exceeds a strict threshold (e.g., 0.05).
• Such after-the-fact t-tests are called post-hoc comparisons.
Fisher’s Protected t Tests
• Idea: only perform t-tests if an ANOVA analysis indicates
a significant effect.
• If there is absolutely no effect of the independent
variable, this will weed out 95% of the possible Type I
errors, thus ensuring the Type I error rate for any
subsequent post-hoc t-tests will be less than .05.
Fisher’s Protected t Tests
• Used when performing exploratory data analysis at a
fixed Type I error rate.
• Assumptions:
– All your data are independent and normally distributed.
– Equal variances in each treatment group (homogeneity of
variance).
– You have performed an ANOVA on your data and found a significant F-ratio at your preferred Type I error rate (e.g., at α = 0.05).
Fisher’s protected t tests

The formula for a standard (pooled-variances) t test is:

  t = (X̄1 − X̄2) / √[ s_p² (1/n1 + 1/n2) ]

For Fisher’s protected t tests, we replace the s_p² term with the MS_W term:

  t = (X̄1 − X̄2) / √[ MS_W (1/n_i + 1/n_j) ]  =  (X̄1 − X̄2) / √(2 MS_W / n)  if n = n1 = n2
Fisher’s Protected t Tests
• Conditions of protection: the null hypothesis is completely true (i.e., μ1 = μ2 = … = μk), or only one null hypothesis is true (e.g., μ1 = μ2 ≠ μ3).
• Conditions of no protection: the null hypothesis is partially true (e.g., μ1 = μ2 ≠ μ3 = μ4, where more than one of the pairwise null hypotheses is true).
  – In this case, if you are testing more than one true null hypothesis, then your experimentwise Type I error rate accumulates as before:

  α_EW = 1 − (1 − α)^j
1. Fisher’s LSD: Degrees of Freedom
• Since the estimate of variance is based on all groups in the experiment, the error (denominator) degrees of freedom is:

  df = N_T − k = Σ (n_i − 1)
1. Fisher’s Least Significant Difference (LSD)

Treatment Group    n    Mean
A                 10    45.5
B                 11    46.8
C                  9    53.2
D                 10    54.5

Source     SS    df    MS    F
Between    60     3    20    10
Within     72    36     2
Total     132    39

  t = (X̄1 − X̄2) / √[ MS_W (1/n_i + 1/n_j) ]
    = (46.8 − 45.5) / √[ 2 (1/11 + 1/10) ]
    = 2.1

  df = N_T − k = 36
  α = .05 → t_crit = 2.03

2.1 > 2.03; therefore, reject H0 and conclude that the mean for group A is significantly different from the mean for group B.
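The protected t above can be sketched in a few lines of Python (the function name is mine; the numbers are the A vs. B comparison from the table):

```python
from math import sqrt

def protected_t(mean1, mean2, n1, n2, ms_within):
    """Fisher's protected t: the pooled s_p^2 is replaced by MS_W from the ANOVA."""
    return (mean1 - mean2) / sqrt(ms_within * (1 / n1 + 1 / n2))

# Groups B (n=11, mean 46.8) vs. A (n=10, mean 45.5), MS_W = 2 from the ANOVA table.
t = protected_t(46.8, 45.5, 11, 10, 2)
print(round(t, 2))  # 2.1
```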
1. Fisher’s LSD

Advantages:
• Very powerful
• Controls familywise Type I error rate when comparing only three treatment means
• Controls familywise Type I error rate when at most one null hypothesis is true
• Controls familywise Type I error rate when the complete null hypothesis is true
• Available in SPSS

Disadvantages:
• Very poor Type I error rate in general
1. Fisher’s LSD
• Why is it called “Least Significant Difference”?
  – Suppose sample sizes are equal. Then the difference is significant if and only if

  |t| = |X̄i − X̄j| / √(2 MS_W / n) ≥ t_crit  ⟺  |X̄i − X̄j| ≥ t_crit √(2 MS_W / n) = LSD
End of Lecture
March 18, 2009
2. Tukey’s Honestly Significant Difference
• Fisher’s LSD breaks down for > 3 groups.
• Tukey’s HSD works for any number of groups.
• Key Idea:
  – Given k groups, consider the smallest and largest means.
  – Ensure protection against Type I error when comparing these two means.
  – This is guaranteed to protect against Type I error for the next comparison.

Treatment Group    n    Mean
A                 10    45.5
B                 11    46.8
C                  9    53.2
D                 10    54.5
2. Tukey’s HSD
• Tukey’s HSD makes use of the studentized range distribution q, which describes the expected, normalized difference between the max and min observed means amongst k treatments, under the null hypothesis:

  q = (X̄i − X̄j) / √(MS_W / n),  where X̄i = largest mean and X̄j = smallest mean
2. Tukey’s HSD
• As for Fisher’s LSD, this formula can be reversed to efficiently determine which means are significantly different:

  q = (X̄i − X̄j) / √(MS_W / n) ≥ q_crit  ⟺  X̄i − X̄j ≥ q_crit √(MS_W / n) = HSD

q_crit is a function of both the number of treatments k and the df:

  q_crit = q(k, df)
2. Tukey’s HSD
• In Tukey’s HSD, every pairwise difference is compared against this HSD.
• Any difference that exceeds the HSD is considered statistically significant. This guarantees that α_EW ≤ α, regardless of how many pairwise comparisons are made!
• This guarantee derives from a telescoping form of protection.
2. Tukey’s HSD
• Suppose that you order the k means in ascending order:

  X̄1 ≤ X̄2 ≤ … ≤ X̄k
Intuition behind Tukey’s HSD

Comparison 1: is X̄k − X̄1 ≥ HSD?
• Accept H0 (P = 0.95): Stop! No further comparisons are made.
• Reject H0 (P = 0.05): proceed to Comparison 2 (e.g., is X̄k − X̄2 ≥ HSD?):
  – P(Reject, Accept) = 0.05 × 0.95 = 0.0475
  – P(Reject, Reject) = 0.05 × 0.05 = 0.0025

“Telescoping protection”: a later comparison is tested only if every earlier one rejected.
2. Tukey’s HSD Test
• Maintains α_EW at the chosen value regardless of the number of groups or whether the null hypothesis is completely or partially true.
• Assumptions:
  – Normality
  – Homogeneity of variance
  – Independent, random samples
  – Roughly equal sample sizes
• Most appropriate when tests are post-hoc and/or all possible pairwise comparisons are being performed.
2. Tukey’s HSD
• If the sample sizes are slightly different you can replace n with the harmonic mean of the sample sizes:

  n̄ = k / Σ_{i=1..k} (1/n_i)

• k = the number of treatment groups.
• n_i = the number of elements in treatment group i.
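The harmonic mean is a one-liner; this sketch (function name mine) reproduces the value used in the worked example that follows:

```python
def harmonic_mean_n(ns):
    """Harmonic mean of the sample sizes: k / sum(1/n_i)."""
    return len(ns) / sum(1 / n for n in ns)

print(round(harmonic_mean_n([10, 11, 9, 10]), 1))  # 9.9
```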
2. Tukey’s HSD

Treatment Group    n    Mean
A                 10    45.5
B                 11    46.8
C                  9    53.2
D                 10    54.5

Source     SS    df    MS    F
Between    60     3    20    10
Within     72    36     2
Total     132    39

  n̄ = k / Σ (1/n_i) = 4 / (1/10 + 1/11 + 1/9 + 1/10) ≈ 9.9
2. Tukey’s HSD

From the Studentized Range Statistic Table: q_crit(k = 4, df = 36) ≈ 3.85

  HSD = q_crit √(MS_W / n̄) = 3.85 √(2 / 9.9) ≈ 1.7
2. Tukey’s HSD

HSD ≈ 1.7

Comparison    Difference            Significant?
A vs. B       46.8 − 45.5 = 1.3     1.3 < 1.7
A vs. C       53.2 − 45.5 = 7.7     7.7 > 1.7 *
A vs. D       54.5 − 45.5 = 9.0     9.0 > 1.7 *
B vs. C       53.2 − 46.8 = 6.4     6.4 > 1.7 *
B vs. D       54.5 − 46.8 = 7.7     7.7 > 1.7 *
C vs. D       54.5 − 53.2 = 1.3     1.3 < 1.7
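The whole HSD procedure is mechanical once q_crit is in hand. A minimal sketch (variable names mine; q_crit = 3.85 is the tabled value for k = 4, df = 36 used above):

```python
from math import sqrt
from itertools import combinations

means = {"A": 45.5, "B": 46.8, "C": 53.2, "D": 54.5}
q_crit = 3.85          # studentized range table, k = 4, df = 36
ms_w, n_harm = 2, 9.9  # MS_W from the ANOVA; harmonic mean of the sample sizes

hsd = q_crit * sqrt(ms_w / n_harm)
print(round(hsd, 2))   # 1.73

# Flag every pairwise difference that exceeds the HSD.
for g1, g2 in combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    flag = "*" if diff > hsd else " "
    print(f"{g1} vs. {g2}: {diff:.1f} {flag}")
```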
2. Tukey’s HSD

[Figure: bar plot of the mean for each treatment group (A: 45.5, B: 46.8, C: 53.2, D: 54.5).]
2. Tukey’s HSD

Advantages:
• Type I error is properly controlled for an arbitrary number of groups
• Does not require an ANOVA
• Available in SPSS

Disadvantages:
• Overly conservative (low power) for k = 3: better to use Fisher’s LSD
• Not appropriate if sample sizes or variances are very different
3. Dunnett’s Test
• Dunnett’s test was devised for the situation when:
  1. one condition (e.g., the control condition) is to be compared against all other conditions (e.g., the treatment conditions), and
  2. no other pairwise comparisons are required.
• Under these conditions, Dunnett’s test is the most powerful test that accurately prevents inflation of Type I error.
• Dunnett’s test is available in SPSS.
3. Dunnett’s Test

Advantages:
• Useful for comparing each treatment group mean with a control group mean.
• In this situation, it’s the most powerful test available that does not allow α_EW to rise above its preset value.

Disadvantages:
• Limited applicability.
• Requires homogeneity of variance.
4. REGWQ Test
• REGW = Ryan, Einot, Gabriel and Welsch. Q = the studentized range statistic.
• More powerful than Tukey’s HSD, but still maintains α_EW at the preset value.
• Adjusts the critical value separately for each pair of means, depending on how many steps separate each pair when the means are put in order.
• Available in SPSS.
• The test of choice when:
  – k > 3
  – Dunnett’s test does not apply
  – Homogeneity of variance applies
Recall Tukey’s HSD

Comparison 1: is X̄k − X̄1 ≥ HSD?
• Accept H0 (P = 0.95): Stop!
• Reject H0 (P = 0.05): proceed to Comparison 2 (e.g., is X̄k − X̄2 ≥ HSD?):
  – P(Reject, Accept) = 0.05 × 0.95 = 0.0475
  – P(Reject, Reject) = 0.05 × 0.05 = 0.0025

“Telescoping protection”: this turns out to be stricter than necessary.
REGWQ
• The k means are sorted in ascending order:

  X̄1 ≤ X̄2 ≤ … ≤ X̄k

• Now when we do pairwise comparisons, instead of basing the critical q value on k, we base it on the number of steps between the two means being compared:

  q = (X̄j − X̄i) / √(MS_W / n) ≥ q_crit(r, df),  where r = j − i + 1

Since q_crit(r, df) is smaller for smaller r, this increases power.
REGWQ
To ensure α_EW ≤ α with this procedure, the α-value used for each comparison must be adjusted based upon the separation r:

  α_r = 1 − (1 − α)^(r/k)
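The adjusted per-comparison alpha can be tabulated directly from the formula above. A minimal sketch (function name mine), shown for k = 4 groups at α = .05; note that at the full separation r = k it reduces to α itself:

```python
def regwq_alpha(alpha, r, k):
    """Per-comparison alpha for means separated by r steps among k ordered means."""
    return 1 - (1 - alpha) ** (r / k)

# With k = 4 groups and alpha = .05, the adjusted alpha grows with the separation r.
for r in (2, 3, 4):
    print(r, round(regwq_alpha(0.05, r, 4), 4))
```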
4. REGWQ Test: Example 1
Hours spent sleeping each night
Ryan-Einot-Gabriel-Welsch Range

Would you describe your life as...    N     Subset 1    Subset 2
VERY STRESSFUL                        591   6.89
SOMEWHAT STRESSFUL                    590   7.07
NOT AT ALL STRESSFUL                  593               7.26
NOT VERY STRESSFUL                    597               7.29
Sig.                                        .056        .922

Means for groups in homogeneous subsets are displayed. Alpha = .05.
4. REGWQ Test: Example 2
• Marathon Split Times
  – Note: due to very different sample sizes, we normally would not choose REGWQ for this dataset.

MEASURE_1
Ryan-Einot-Gabriel-Welsch Range

Category    N     Subset 1      Subset 2      Subset 3
Men 30      152   1:55:44.16
Men 40      219   1:56:21.63
Men 24 &     45   1:57:03.26    1:57:03.26
Men 45      177   1:57:12.99    1:57:12.99
Men 25       89   1:57:27.04    1:57:27.04    1:57:27.04
Men 35      172   1:58:53.89    1:58:53.89    1:58:53.89
Men 50       97                 2:04:21.63    2:04:21.63
Men 60       14                 2:06:32.61    2:06:32.61
Men 55       50                               2:07:30.10
Men 65        6                               2:23:39.50
Sig.              .763          .064          .059

Means for groups in homogeneous subsets are displayed. Alpha = .05.
What to do when variance is not homogeneous
• Recall that for two-sample independent t-tests:
  – Homogeneity of variance is not a problem when sample sizes are large (can use the separate-variances z-test) or sample sizes are equal (the pooled-variance approximation is reasonably good).
• For pairwise comparisons following an ANOVA:
  – Even if sample sizes are large, if variances or sample sizes are very different, we should use tests that do not pool variance over all k conditions:
    • If only 3 groups:
      – LSD with separate variances.
    • Otherwise:
      – Bonferroni or sequential Bonferroni (with separate variances)
      – One of the SPSS post-hoc tests tolerant to heterogeneity of variance (e.g., Games-Howell)
Fisher’s LSD when Variance Heterogeneous

To apply Fisher’s LSD when the 3 variances appear homogeneous:

  t = (X̄i − X̄j) / √[ MS_W (1/n_i + 1/n_j) ]

If the 3 variances do not appear homogeneous, then variance must be calculated separately for each pairwise comparison:
• If the 2 sample variances are similar, or the 2 sample sizes are large or similar, do a pooled-variance test:

  t = (X̄i − X̄j) / √[ s_p² (1/n_i + 1/n_j) ]

• Otherwise do a separate-variances test:

  t = (X̄i − X̄j) / √( s1²/n_i + s2²/n_j )
Bonferroni Method
• Unfortunately, the post-hoc Bonferroni method available in SPSS does not accommodate heterogeneous variances. It uses the standard method to calculate the error term, pooling the variance over all conditions:

  MS_W = Σ (n_i − 1) s_i² / df_W

• It then uses the separate-variances formula to calculate the standard error, substituting the pooled MS_W:

  s²_(X̄1 − X̄2) = MS_W / n1 + MS_W / n2

• The validity of this method rests on the assumption of homogeneity of variance.
Games-Howell Method
• As an alternative, the Games-Howell method is valid
when homogeneity of variance does not apply, and is
provided by SPSS. The Games-Howell method is a
separate-variances version of Tukey’s test.
Example: Games-Howell

Test of Homogeneity of Variances
During 12 months – Number of contacts: Psychologist

Levene Statistic    df1    df2      Sig.
115.537             3      11807    .000
Not Recommended
• Newman-Keuls Test (SNK)
• Duncan’s Test
Final Recommendations for Post-Hoc Comparisons
• If comparisons are strictly simple:
  – For k = 3 (3 treatment groups):
    • Use Fisher’s LSD
      – If sample sizes are very different or variances appear unequal, use separate-variances tests
  – For k > 3:
    • If sample sizes are equal or nearly equal and variances appear to be homogeneous:
      – Use Dunnett’s test when you are comparing multiple treatment group means with a control group mean.
      – Use Tukey’s HSD when you have to do your calculations by hand.
      – Otherwise use REGWQ.
    • If sample sizes are very different or variances appear unequal:
      – Bonferroni (preferably sequential) with separate variances, and
      – One of the methods offered by SPSS (e.g., Games-Howell)
Lecture 17 Summary
• Why do multiple comparisons
• The problem with multiple comparisons
• Familywise and per-comparison alpha
• Exploratory data analysis
  – Fisher’s protected t tests
  – Tukey’s HSD test
  – Dunnett’s Test
  – REGWQ Test
  – Games-Howell Test
• Planned Comparisons
  – Bonferroni t or Dunn’s Test
  – Complex Comparisons (Linear Contrasts)
  – Scheffé’s Test (an exploratory analysis technique that works for complex comparisons)
• Recommendations
End of Lecture
March 25, 2009
Planned Comparisons
• Comparisons you know you are going to do before you
actually do the experiment.
• When a small number of comparisons are planned, Type
I error inflation can be prevented with a lower critical
value than required for post-hoc tests, resulting in higher
power.
• You don’t have to have a significant ANOVA F to do a
planned comparison (in fact you don’t have to do an
ANOVA at all).
ANOVA vs Planned Comparison
• It is possible to obtain an insignificant ANOVA result, while a planned comparison (t-test) yields significance.
• Under the normal assumptions, it is unusual to obtain an insignificant ANOVA result and a significant post-hoc comparison (done correctly).

[Figure: means of a 10-level factor (Independent Variable) with error bars on the Dependent Variable; a planned comparison between two of the levels gives t(198) = 2.1, p = .03, even though the overall ANOVA is insignificant.]

Source     SS        df    MS      F     Prob>F
Columns      9.778     9   1.0864  1.22  0.2791
Error      882.227   990   0.8911
Total      892.005   999
Bonferroni-Dunn Test
• For a given number of comparisons j, the experimentwise alpha will never be more than j times the per-comparison alpha:

  α_EW = 1 − (1 − α_PC)^j ≤ j α_PC

• Thus setting α_PC = α_EW / j guarantees that the experimentwise rate does not exceed α_EW.
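The bound above can be checked numerically. A minimal sketch (function name mine): dividing α by j and then compounding over j comparisons always lands at or below the target α:

```python
def alpha_pc_bonferroni(alpha_ew, j):
    """Per-comparison alpha under the Bonferroni correction."""
    return alpha_ew / j

# With alpha_EW = .05 and j = 3 comparisons, the exact familywise rate
# 1 - (1 - alpha/j)^j stays just below .05.
a, j = 0.05, 3
a_pc = alpha_pc_bonferroni(a, j)
exact = 1 - (1 - a_pc) ** j
print(round(a_pc, 4), round(exact, 4))  # 0.0167 0.0492
```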
Bonferroni-Dunn Test

Treatment Group    n    Mean
A                 10    45.5
B                 11    46.8
C                  9    53.2
D                 10    54.5

Source     SS    df    MS    F
Between    60     3    20    10
Within     72    36     2
Total     132    39

  α = .05 → α_PC = .025
Bonferroni-Dunn Test

  t = (X̄1 − X̄2) / √[ MS_W (1/n_i + 1/n_j) ]
    = (53.2 − 45.5) / √[ 2 (1/9 + 1/10) ]
    = 11.9

  df = N_T − k = 36
  t_.0125 = 2.44

11.9 > 2.44; therefore, reject H0 and conclude that the mean for group A is significantly different from the mean for group C.
Bonferroni t, or Dunn’s Test

  t = (X̄1 − X̄2) / √(2 MS_W / n)
    = (54.5 − 45.5) / √(2 × 2 / 10)
    = 14.2

  df = N_T − k = 36
  t_.0125 = 2.44

14.2 > 2.44; therefore, reject H0 and conclude that the mean for group A is significantly different from the mean for group D.
Bonferroni-Dunn Test

Advantages:
• Type I error is properly controlled, e.g., α_EW ≤ .05.
• α_EW / j is a very accurate approximation of the exact per-comparison value 1 − (1 − α_EW)^(1/j).
• Can handle very different sample sizes (using separate-variances t-tests).

Disadvantages:
• Ignores possible statistical dependencies between comparisons.
• For a large number of comparisons, this makes Bonferroni overly conservative.
Modified Bonferroni:
Sequentially Rejective Multiple Test
• One of the reasons the Bonferroni test is conservative is
that it is highly constrained:
– It requires that you use the same critical value for each test
– It requires that each test be made independently
• Relaxing these constraints leads to a more powerful test
that still accurately controls Type I error.
Modified Bonferroni: Procedure
1. Arrange the j planned comparisons you are doing according to their p values.
2. Test the smallest p value against α_EW / j.
3. If and only if this comparison is significant, proceed to test the next smallest p value against α_EW / (j − 1).
4. Repeat, testing the kth smallest p value against α_EW / (j − k + 1), but only if the previous p value was found to be significant.
5. Stop as soon as any p value fails to reach significance.
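The five steps above can be sketched directly in Python (a Holm-style step-down; the function name and return convention are mine):

```python
def sequential_bonferroni(p_values, alpha_ew=0.05):
    """Sequentially rejective Bonferroni: test sorted p values against
    alpha/j, alpha/(j-1), ..., stopping at the first failure.
    Returns the indices (into the input list) of the significant comparisons."""
    j = len(p_values)
    order = sorted(range(j), key=lambda i: p_values[i])
    significant = []
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha_ew / (j - step):
            significant.append(idx)
        else:
            break  # stop as soon as any p value fails to reach significance
    return significant

# Six pairwise p values (the example worked in this lecture); all six survive.
p = [1.55e-11, 6.83e-11, 6.10e-10, 4.02e-9, 0.024, 0.031]
print(sequential_bonferroni(p))  # [0, 1, 2, 3, 4, 5]
```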
Modified Bonferroni: Why Does this Work?
• The application of the Bonferroni correction still
conservatively protects each t-test performed.
• The conditional nature of the procedure leads to a
multiplicative, rather than additive, compounding of error
probability.
Modified Bonferroni: Sequentially Rejective Multiple Tests

Comparison    t       df    p               Comparison α
A vs. D       14.23   18    1.55 × 10^−11   0.05/6 = 0.0083
B vs. D       12.46   19    6.83 × 10^−11   0.05/(6−1) = 0.0100
A vs. C       11.85   17    6.10 × 10^−10   0.05/(6−2) = 0.0125
B vs. C       10.07   18    4.02 × 10^−9    0.05/(6−3) = 0.0167
A vs. B        2.10   19    0.024           0.05/(6−4) = 0.0250
C vs. D        2.00   17    0.031           0.05/(6−5) = 0.0500
Modified Bonferroni: Sequentially Rejective Multiple Tests

Advantages:
• More powerful than regular Bonferroni.
• Still maintains α_EW at the desired level.

Disadvantages:
• Not yet computed directly by SPSS.
  – Can use SPSS to calculate p values, and then apply the tests by hand.
Complex Comparisons
• Pairwise comparisons (comparing the means of 2
conditions) are sometimes called simple comparisons.
• It is also common to aggregate multiple conditions into 2 larger disjoint groups and then compare these aggregate groups.
• For example, when comparing the efficacy of multiple
treatments for depression, one might want to compare
the aggregate of all pharmaceutical treatments against
the aggregate of all psychotherapy treatments.
Complex Comparisons
• Another example: Suppose you wish to compare the average IQs of people born on a weekend with those born on week days.
• This can be done by forming a linear contrast:

  ψ = (μ_Sat + μ_Sun)/2 − (μ_Mon + μ_Tues + μ_Wed + μ_Thurs + μ_Fri)/5

    = (1/2)μ_Sat + (1/2)μ_Sun − (1/5)μ_Mon − (1/5)μ_Tues − (1/5)μ_Wed − (1/5)μ_Thurs − (1/5)μ_Fri

    = c1 μ1 + c2 μ2 + … + c7 μ7 = Σ_{i=1..k} c_i μ_i
Complex Comparisons
• This is called a linear contrast if

  Σ_{i=1..k} c_i = 0.

• When referring to our sample estimates we use the notation:

  L = Σ_{i=1..k} c_i X̄_i

• Although we could use a t-test to evaluate significance, it is common to use ANOVA.
Complex Comparisons

[Figure: means of four treatment conditions A–D on the Dependent Variable, illustrating the contrast L = 1·X̄A + 1·X̄B − 1·X̄C − 1·X̄D.]
Complex Comparisons

  F = MS_contrast / MS_W

Since a complex comparison evaluates only a single difference score, df_contrast = 1. Thus

  F = MS_contrast / MS_W = SS_contrast / MS_W,  where

  SS_contrast = L² / Σ_{i=1..k} (c_i² / n_i)

The F statistic will be evaluated against an F_crit based on (1, df_W) degrees of freedom.
Complex Comparisons

Treatment Group    n    Mean
A                 10    45.5
B                 11    46.8
C                  9    53.2
D                 10    54.5

Source     SS    df    MS    F
Between    60     3    20    10
Within     72    36     2
Total     132    39

  L = 3X̄A − 1X̄B − 1X̄C − 1X̄D = 3(45.5) − 46.8 − 53.2 − 54.5 = −18

  SS_contrast = L² / Σ (c_i² / n_i) = (−18)² / (3²/10 + 1²/11 + 1²/9 + 1²/10) = 269.55

  F = SS_contrast / MS_W = 269.55 / 2 = 134.78
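The contrast arithmetic above is easy to verify in Python. A minimal sketch (function name mine); note it keeps full precision, so F comes out as 134.77 rather than the 134.78 obtained by rounding SS_contrast to 269.55 first:

```python
def contrast_F(means, ns, c, ms_w):
    """F ratio for a linear contrast: SS_contrast = L^2 / sum(c_i^2 / n_i),
    df_contrast = 1, so F = SS_contrast / MS_W."""
    L = sum(ci * m for ci, m in zip(c, means))
    ss_contrast = L ** 2 / sum(ci ** 2 / n for ci, n in zip(c, ns))
    return ss_contrast / ms_w

# A vs. the average of B, C, D (coefficients [3, -1, -1, -1]), MS_W = 2.
F = contrast_F([45.5, 46.8, 53.2, 54.5], [10, 11, 9, 10], [3, -1, -1, -1], 2)
print(round(F, 2))  # 134.77
```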
Complex Comparisons

Source     SS    df    MS    F
Between    60     3    20    10
Within     72    36     2
Total     132    39

  F_crit(1, 36) at α = 0.05 is 4.17.

Since 134.78 > 4.17, we reject H0 and conclude that the mean for treatment group A is significantly different from the average of the means for treatment groups B, C, and D.
Orthogonal Contrasts
• Orthogonal means perpendicular or “at right angles.” The word perpendicular, however, is usually reserved for the two-dimensional case. Orthogonal is the generalization of perpendicular to higher dimensions.

[Figures: a pair of perpendicular vectors in 2D, and mutually orthogonal vectors in 3D.]
Orthogonal Contrasts
• Two vectors are orthogonal if their dot product is zero.
• E.g., [1, 0]·[0, 1] = 1×0 + 0×1 = 0 - Orthogonal!

[Figure: the vectors [1, 0] and [0, 1] plotted at right angles.]
Orthogonal Contrasts
• [1, 0]·[0.5, 0.5] = 1×0.5 + 0×0.5 = 0.5 - Not orthogonal.

[Figure: the vectors [1, 0] and [0.5, 0.5], which do not meet at a right angle.]
Orthogonal Contrasts
• When doing more than one complex comparison (e.g., A vs. (B+C+D)/3 and C vs. D), if the vectors of coefficients (in this case [1, −1/3, −1/3, −1/3] and [0, 0, ½, −½]) are orthogonal, the results from the two comparisons will be independent of each other.
• If you are doing complex comparisons on a study with 4 treatment groups, there are 4 − 1 = 3 possible orthogonal complex comparisons.
• In general, for a study with k treatment groups, there are k − 1 possible orthogonal contrasts.
Orthogonal Contrasts

Treatment Group    n    Mean
A                 10    45.5
B                 11    46.8
C                  9    53.2
D                 10    54.5

c = [3, −1, −1, −1]:
  L = 3X̄A − 1X̄B − 1X̄C − 1X̄D = 3(45.5) − 46.8 − 53.2 − 54.5 = −18

c = [0, 0, 1, −1]:
  L = 0X̄A + 0X̄B + 1X̄C − 1X̄D = 53.2 − 54.5 = −1.3

[3, −1, −1, −1]·[0, 0, 1, −1] = (3)(0) + (−1)(0) + (−1)(1) + (−1)(−1) = 0 - Orthogonal
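The orthogonality check is a plain dot product; a minimal sketch (function name mine) using the two contrast vectors from this example:

```python
def orthogonal(c1, c2):
    """Two contrast vectors are orthogonal if their dot product is zero."""
    return sum(a * b for a, b in zip(c1, c2)) == 0

print(orthogonal([3, -1, -1, -1], [0, 0, 1, -1]))  # True
print(orthogonal([1, 0], [0.5, 0.5]))              # False
```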
Orthogonal Contrasts
• Performing Multiple Complex Comparisons:
  – Some textbooks suggest that if planned, multiple orthogonal complex comparisons can be performed at the usual alpha-level without worrying about inflation of Type I error.
  – I do not agree with this policy: inflation of Type I error will occur.
  – Instead, even for planned orthogonal comparisons, you must correct for inflated Type I error (e.g., by using Bonferroni correction).
  – For these reasons, I also do not recommend the Keppel modification.
Scheffé’s Test
• Scheffé’s test can be used to control Type I error when doing a set of post-hoc tests that includes complex comparisons.
• Scheffé’s test proceeds in the same way as the usual complex comparisons, except that the critical F value is determined by taking the critical F from the overall ANOVA and multiplying it by df_between = k − 1.
• The usual ANOVA assumptions apply (i.e., normality, independence, homogeneity of variance).
Scheffé’s Test

  F_S = (k − 1) F_crit(k − 1, N_T − k)

• F_S = Scheffé’s critical F.
• k = the number of treatment groups.
• N_T = the total number of subjects from all groups combined.
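Scheffé’s critical value is a one-line adjustment of the overall ANOVA’s critical F. A minimal sketch (function name mine; 2.92 is the tabled F_crit(3, 36) used in the example that follows):

```python
def scheffe_F(k, f_crit):
    """Scheffe's critical F: (k - 1) times the overall ANOVA's critical F."""
    return (k - 1) * f_crit

# k = 4 groups; F_crit(3, 36) = 2.92 from the F table used in the example.
print(round(scheffe_F(4, 2.92), 2))  # 8.76
```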
Scheffé’s Test
• From our previous example, F_crit(3, 36) at α = 0.05 is 2.92.

Source     SS    df    MS    F
Between    60     3    20    10
Within     72    36     2
Total     132    39

  F_S = (k − 1) F_crit = (4 − 1)(2.92) = 8.76

• Taking F = 134.78 from our complex comparison, we would reject H0 (since 134.78 > 8.76) and conclude that the mean for treatment group A is significantly different from the average of the means for treatment groups B, C, and D.
Scheffé’s Test

Advantages:
• Maintains α_EW at the pre-set value.
• It is robust with respect to the usual ANOVA assumptions (e.g., homogeneity of variance).
• Doesn’t require equal sample sizes.

Disadvantages:
• Can be very conservative: use only for post-hoc analyses that include complex comparisons.
Final Recommendations for Post-Hoc Comparisons
• If comparisons are strictly simple:
  – For k = 3 (3 treatment groups):
    • Use Fisher’s LSD
      – If sample sizes are very different or variances appear unequal, use separate-variances tests
  – For k > 3:
    • If sample sizes are equal or nearly equal and variances appear to be homogeneous:
      – Use Dunnett’s test when you are comparing multiple treatment group means with a control group mean.
      – Use Tukey’s HSD when you have to do your calculations by hand.
      – Otherwise use REGWQ.
    • If sample sizes are very different or variances appear unequal:
      – Bonferroni (preferably sequential) with separate variances, and
      – One of the methods offered by SPSS (e.g., Games-Howell)
• If comparisons include complex comparisons:
  – Use Scheffé’s test

(Recall: j = k(k − 1)/2 possible simple comparisons.)
Recommendations for Planned Comparisons
• If k = 3 and comparisons are strictly simple:
  – Use Fisher’s LSD
    • If sample sizes are very different or variances appear unequal, use separate-variances tests
• Otherwise:
  – Use Bonferroni (preferably sequential)
End of Lecture
April 1, 2009