Psych 5500/6500
ANOVA:
Single-Factor Independent
Means
Fall, 2008
ANOVA
ANOVA is short for 'Analysis of Variance'; it is
also known as the F test. It is applicable in a
variety of experimental designs that involve
the comparison of group means to determine
whether or not the independent variable had
an effect.
We will begin with the ‘single-factor
independent means’ ANOVA.
‘Single-Factor’
This is similar to the ‘t test for independent
means’. As in the t test there is one
dependent variable and one independent
variable (a ‘factor’ is an independent
variable, thus ‘single-factor’ means one IV).
But unlike in the t test, with the F test there
can be 2 or more levels of the IV (i.e. two or
more groups in the experiment).
Independent Means
The ANOVA (F test) we will begin with
assumes that the scores are independent
across groups. In other words, this could
be used in a true experimental design, a
quasi-experimental design, or a static
group design (just like the t test for
independent means).
Example 1:
True Experimental Design
Does Vitamin C affect the length of people’s
colds? Randomly divide subjects who are
in their first day of a cold into 4 groups,
then each group gets a different level of
Vitamin C. Measure how long it takes each
person to get over their cold (DV).
IV= Group 1: 0 mg, Group 2: 100 mg, Group 3:
500 mg, Group 4: 5000 mg
Example 2:
Quasi-Experimental Design
Do three specific therapies differ in their
ability to treat depression? Let subjects
select the type of therapy they want (three
different kinds are available), then measure
their level of depression (DV) after 2
months of therapy (note the IV is manipulated by the
experimenter, but subjects are not randomly assigned to the groups).
IV= Group 1: Behavior Modification, Group 2:
Gestalt, Group 3: Client-Centered
Example 3: Static Group
Design
Does the size of a city affect the cancer rate in that
city? Randomly select several small cities, several medium-sized cities, and several large
cities, then measure their cancer rate per 10,000 citizens (DV). Note the IV is not manipulated;
instead it is the criterion for assigning cities to groups:
IV= Group 1: Small Cities, Group 2: Medium Cities,
Group 3: Large Cities
Relationship of t and F
If you have two groups (i.e. two levels of
your IV) you can use either a ‘t’ or ‘F’ test to
analyze the results, and if you are testing
a two-tailed hypothesis there is no
difference between doing a t test or an F
test.
tobt² = Fobt and tc² = Fc
The F test cannot test a directional (one-tailed) hypothesis. Thus, if
you want to do a one-tailed test use t. The t test, however, cannot
be used if you have more than two groups; in that case you must use F.
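To see the t² = F relationship numerically, here is a minimal Python sketch (the two groups of scores are made up purely for illustration; numpy and scipy are assumed to be available and are not part of the lecture):

```python
import numpy as np
from scipy import stats

# Made-up scores for two independent groups (illustration only)
group1 = np.array([4.0, 6.0, 5.0, 7.0, 6.0])
group2 = np.array([8.0, 9.0, 7.0, 10.0, 9.0])

# Two-tailed t test for independent means
t, p_t = stats.ttest_ind(group1, group2)

# One-way ANOVA (F test) on the same two groups
F, p_F = stats.f_oneway(group1, group2)

print(t**2, F)   # t squared equals F (within rounding)
print(p_t, p_F)  # and the two-tailed p values are identical
```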
Hypotheses
If you have three levels of your IV then:
H0: μ1= μ2= μ3 (one μ for each group)
You are saying that all the populations in the
experiment have the same mean, and that
any differences in the group sample means
are just due to chance.
So what is HA?
HA: μ1 ≠ μ2 ≠ μ3?
H0 and HA
No, HA: μ1 ≠ μ2 ≠ μ3 doesn't work, as H0 and
HA together must cover every possibility
(e.g. what about μ1 = μ2 ≠ μ3?). So, the
correct answer is:
H0: μ1= μ2= μ3
HA: at least one μ is different than the rest
Test Statistic
We need a statistic whose value we know if
H0 is true. With the t test for independent
groups the way we tested whether μ1= μ2
was by using: Ȳ1 − Ȳ2
If H0 is true then we expect Ȳ1 − Ȳ2 ≈ 0
But what if we have three or more groups? If
H0 is μ1= μ2= μ3 what would we expect if H0
is true?
Ȳ1 − Ȳ2 − Ȳ3 ≈ ?
We need a statistic that will measure whether
several group means are about the same (H0 true
and the means differ only due to chance) or if they
differ more than you would expect if only chance
were involved (i.e. if the independent variable
made the populations—and thus the groups
means—more different than you would expect if
only random error were involved).
What statistic do we know measures how much a
bunch of numbers (in this case group means but
that doesn’t matter) differ from each other?
Analysis of Variance
The essence of the F test for the one factor
independent group ANOVA is that it
examines the variance of the group means
to determine whether the group means
differ more than you would expect if H0
were true. The logic of how we will do that
is based upon 'partitioning the Sums of Squares'.
Setup
We will begin with a simple experiment
with three groups, and three scores in
each group.
Group 1: Y1, Y2, Y3
Group 2: Y4, Y5, Y6
Group 3: Y7, Y8, Y9
Symbols
Yi = an individual score ('i' indicates which score)
Ȳj = the mean of group 'j'
ȲT = the total mean (the mean of all of the scores)
Nj = the number of scores in group 'j'
NT = the total N (the total number of scores)
a = the number of groups
Group 1: Y1, Y2, Y3   Ȳ1 = ?   N1 = 3
Group 2: Y4, Y5, Y6   Ȳ2 = ?   N2 = 3
Group 3: Y7, Y8, Y9   Ȳ3 = ?   N3 = 3
a = 3,  ȲT = ?,  NT = 9
Partitioning the Deviation
We begin by looking at how far each score is from the
mean of all of the scores:
(Yi − ȲT)
Then we break (partition) that distance into two pieces,
how far the score is from the mean of its group, and how
far the mean of the group is from the mean total.
(Yi − ȲT) = (Yi − Ȳj) + (Ȳj − ȲT)
Partitioning the SS
Now we use those deviations to create three sums of squares.
SSTotal = SSWithinGroups + SSBetweenGroups
Σ(Yi − ȲT)² = Σ(Yi − Ȳj)² + Σ(Ȳj − ȲT)²
SSTotal measures the squared deviations of the scores from the
mean of all of the scores.
SSWithin measures the squared deviations of the scores from the
mean of the group they are in.
SSBetween measures the squared deviations of the group means from
the mean of all of the scores.
SS’s
Sums of squares are a way of measuring
variability. Consequently:
SSTotal reflects how much all of the scores differ from each other
(if all the scores were the same they would all equal the total
mean and the squared distances would all be zero).
SSWithin reflects how much the scores differ from other scores in
the same group (if all the scores in a group are the same they
would all equal the mean of their group and the squared
distances would all be zero).
SSBetween reflects how much the group means differ from each
other (if all of the group means were the same they would all
equal the total mean and the squared distances would all be
zero).
Example…
Refer to the handout on partitioning SS.
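The handout's numbers are not reproduced here, but a minimal numpy sketch with made-up data (three groups of three scores, as in the setup above) shows the partition holding numerically:

```python
import numpy as np

# Made-up data: three groups, three scores each (illustration only)
groups = [np.array([3.0, 5.0, 4.0]),
          np.array([6.0, 8.0, 7.0]),
          np.array([9.0, 11.0, 10.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()          # the total mean (Y-bar T)

# SS Total: squared deviations of every score from the total mean
ss_total = ((all_scores - grand_mean) ** 2).sum()

# SS Within: squared deviations of each score from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# SS Between: squared deviations of the group means from the total mean,
# counted once for every score in the group
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_total, ss_within + ss_between)  # the two values are equal
```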
Partitioning the df
The total df for the experiment would be the total number
of scores – 1.
dfTotal = NT − 1
We are also going to partition that.
dfTotal = dfWithin + dfBetween
(NT − 1) = (NT − a) + (a − 1)
What are d.f.s?
(Discuss in class)…
Mean Squares
A mean square is a Sum of Squares
divided by its degrees of freedom.
MSWithin = SSWithin / dfWithin
MSBetween = SSBetween / dfBetween
What is a Mean Square?
It is not normally computed as we won’t be needing it,
but to make a conceptual point, let’s look at MSTotal.
MSTotal = SSTotal / dfTotal = Σ(Y − Ȳ)² / (N − 1)
Does that look familiar? (It is the usual formula for the estimated variance of all of the scores.)
2
Error Variance
The term error variance refers to the variance of the
population from which the scores were originally
sampled. The use of the term 'error' will be clearer
next semester; it refers to the error of using the
mean to predict each score. For now just think of
error variance as the variance of the population
from which we sampled. The ANOVA assumes that
each population in the study has the same
variance.
Mean Square Within Groups
MSWithin is an estimate of error variance
based upon how much the scores differ
inside of each group. Essentially, it uses
each group to estimate error variance, then
pools those different estimates into one
good estimate. If the N’s of each group are
the same then MSWithin is literally the mean
of the variance estimates from each group.
Mean Square Between Groups
MSBetween is an estimate of error variance based
upon how much the group means differ from
each other. Remember that the variance of the
population affects the variance of the sample
means (the standard error); it also works the
other way: the variance of the sample means tells
us something about the variance of the
population from which those means were drawn.
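A short numpy sketch of these two interpretations, using made-up data with equal N's (not the lecture's data): MSWithin comes out equal to the mean of the group variance estimates, and MSBetween comes out equal to n times the variance of the group means, which is why it too estimates error variance when H0 is true:

```python
import numpy as np

# Made-up data: a = 3 groups of n = 4 scores each (illustration only)
groups = [np.array([5.0, 7.0, 6.0, 8.0]),
          np.array([9.0, 8.0, 10.0, 9.0]),
          np.array([4.0, 6.0, 5.0, 5.0])]
n, a = 4, 3
grand_mean = np.concatenate(groups).mean()

# MS Within = SS Within / df Within
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n * a - a)
# ...which, with equal N's, is just the mean of the group variance estimates
print(ms_within, np.mean([g.var(ddof=1) for g in groups]))

# MS Between = SS Between / df Between
group_means = np.array([g.mean() for g in groups])
ms_between = n * ((group_means - grand_mean) ** 2).sum() / (a - 1)
# ...which, with equal n's, is n times the variance of the group means
print(ms_between, n * group_means.var(ddof=1))
```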
F
Fobt = MSBetween / MSWithin
MSBetween and MSWithin are two, independent estimates of the
same thing...error variance.
If H0 is true, μF = dfWithin / (dfWithin − 2) ≈ 1 (for largish dfWithin)
df within  2
Logic of the ANOVA
When H0 is true: MSBetween and MSWithin are two, independent
estimates of error variance.
Fobt = MSBetween / MSWithin = (est. σ²Y) / (est. σ²Y) ≈ 1.00
When H0 is false: the independent variable makes the group means
differ more than they would if only chance were involved, which
affects MSBetween making it larger. The independent variable—
however—does not affect the variance inside of each group, thus
MSWithin is not affected.
Fobt = MSBetween / MSWithin = (est. σ²Y + effect of IV) / (est. σ²Y) > 1.00
Example
IV = type of therapy (control group vs. behavior modification vs. psychoanalysis vs. client-centered vs. gestalt)
DV=level of depression after 2 months
H0: μC= μBM= μPA= μCC= μG
HA: at least one μ is different than the rest.
Data

Group             Scores     Mean
Control           9, 8, 7    Ȳ1 = 8
Beh. Mod.         6, 6, 4    Ȳ2 = 5.33
Psychoanalysis    6, 7, 6    Ȳ3 = 6.33
Client-Centered   6, 5, 7    Ȳ4 = 6
Gestalt           3, 1, 5    Ȳ5 = 3
Bar Graph
[Bar graph of the mean depression score (Ȳ) for each group: Control, Behavior Mod, Psychoanalysis, Client-Centered, Gestalt.]
Computations
SSTotal = 54.93, SSWithin = 15.33, SSBetween = 39.60
dfTotal = 14, dfWithin = 10, dfBetween = 4
MSWithin = 15.33/10 = 1.53, MSBetween = 39.60/4 = 9.90
Fobt = 9.90/1.53 = 6.46
Fc (α = .05, df1 = 4, df2 = 10) = 3.48
We reject H0
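As a cross-check (scipy is assumed to be available; it is not part of the course software), feeding the 15 depression scores above into a one-way ANOVA routine reproduces the same Fobt and the p value used below:

```python
from scipy import stats

# Depression scores after 2 months, by type of therapy (from the slides)
control         = [9, 8, 7]
behavior_mod    = [6, 6, 4]
psychoanalysis  = [6, 7, 6]
client_centered = [6, 5, 7]
gestalt         = [3, 1, 5]

F, p = stats.f_oneway(control, behavior_mod, psychoanalysis,
                      client_centered, gestalt)
print(F, p)   # approximately F = 6.46, p = .0078
```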
p Values & Expressing
Results
The exact p value can be obtained either by performing the analysis using SPSS or by using my F tool and inputting the df's and the value of Fobt. In this example p = .0078.
The way the results are commonly expressed is as F(df1, df2) = Fobt, p = ... In our example it would be: F(4,10) = 6.46, p = .0078
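If all you have are Fobt and the two df's (no raw data), the exact p value and the critical value can also be pulled from the F distribution itself; a scipy sketch:

```python
from scipy import stats

F_obt = 6.46
df_between, df_within = 4, 10

# Exact p value: the area beyond Fobt in the F(4, 10) distribution
print(stats.f.sf(F_obt, df_between, df_within))   # approximately .0078

# Critical value for alpha = .05
print(stats.f.ppf(0.95, df_between, df_within))   # approximately 3.48
```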
Summary Table
Another common way of expressing the results of the analysis
is in a ‘Summary Table’.
Source     SS      df    MS     F      p
Between    39.60   4     9.90   6.46   .0078
Within     15.33   10    1.53
Total      54.93   14
Decision
H0: μC= μBM= μPA= μCC= μG
HA: at least one μ is different than the rest.
We have rejected H0, which means that we can
conclude that at least one of the population means
is different than the rest. It is tempting to say, for
example, that the control group (which had a mean
level of depression of 8) was more depressed than
the Gestalt group (which had a mean level of
depression of 3), but we cannot be that specific; we
can only say that at least one group was different
than the rest. We will learn in a future lecture how
to make more specific tests among the group
means.
Effect Size
Cohen’s d is not capable of determining an
overall effect due to the independent
variable when there are more than two
groups as we can’t expect d to equal zero
when H0 is true (i.e. when the independent
variable has no effect):
Cohen's d = (μ1 − μ2 − μ3 − μ4) / σY = ?
Effect Size (cont.)
For the overall effect of the independent
variable we will have to turn to measures of
association, which examine how much
knowing what group the score is in helps us
in predicting their score on the dependent
variable. We will be covering that next
semester. The measure we will be looking
at then is called R², and to get a general idea
of how it works...
R² = 0 (knowing which group the score is in doesn't help at all).
Group 1: 3, 5, 9
Group 2: 3, 5, 9
Group 3: 3, 5, 9

R² = 1.00 (knowing which group the score is in allows us to know exactly what the score will be).
Group 1: 3, 3, 3
Group 2: 5, 5, 5
Group 3: 9, 9, 9
You can see that R² will always be between 0 and 1
Computing R²
R² = SSBetween / SSTotal
In our example:
R² = SSBetween / SSTotal = 39.6 / 54.93 = 0.72
Cohen’s f
GPower uses Cohen’s f to express effect size.
While R² will be between 0 and 1, f expands
that out to be between 0 and infinity.
f = √( R² / (1 − R²) )
The conventions for relating f to effect size are:
.10=small effect .25=medium effect .40=large effect
Our Example
f = √(.72 / .28) = √2.57 = 1.60
A whopping big effect size (because we are in Oakley land
rather than using real data).
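Both effect-size measures follow directly from the sums of squares in the summary table; a quick Python sketch:

```python
import math

ss_between = 39.60
ss_total = 54.93

# R squared: proportion of the total variability accounted for by group membership
r_squared = ss_between / ss_total
print(r_squared)                              # approximately 0.72

# Cohen's f, the effect size GPower uses
f = math.sqrt(r_squared / (1 - r_squared))
print(f)                                      # approximately 1.60
```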
GPower and Cohen’s f
GPower will compute f for you if you give it the N’s of
each group, the means of each group, and the
standard deviation (the one you assume each
group has in common), but the equation on the
previous slide is much simpler. With this
information you can then compute the power a
priori and post hoc as you did with the t test.
In our example power=0.98 (ridiculously large for 3
scores per group, due to the big effect of the IV and
the small amount of within-group variance, a
byproduct of my making up the data).
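GPower is what the course uses for this; as a rough cross-check, the statsmodels library also offers an ANOVA power routine that takes Cohen's f directly (the class name and arguments below are statsmodels' API as I understand it, not anything from the lecture):

```python
from statsmodels.stats.power import FTestAnovaPower

# Post hoc power for our example: f = 1.60, 5 groups, 15 scores in total, alpha = .05
power = FTestAnovaPower().power(effect_size=1.60, nobs=15,
                                alpha=0.05, k_groups=5)
print(power)   # should come out very high, near the 0.98 reported on the slide
```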
Assumptions of This Use of
the F test
1. Independence of scores (important).
2. All the populations are normally distributed (the F test is 'robust' to this assumption, particularly if N's are large and roughly equal across groups).
3. Homogeneity of Variance (this can be violated if you have roughly equal N's across the groups). Levene's test will evaluate this.
Homogeneity of Variance
If N’s are not equal, then the effects of violating this
assumption are:
1.
If larger sample size is associated with larger
variances then alpha decreases (biased towards
not making a type 1 error but at the expense of
power).
2.
If larger sample size is associated with smaller
variances then alpha increases (biased towards
making a type 1 error). If this is the case then
either select a smaller significance level (e.g. .01
rather than .05) or ‘transform’ your data.
Levene’s Test
Levene’s test for the inequality of variances can be
used to test whether a difference exists somewhere
among the population variances. In our example
with five types of therapy the hypotheses for
Levene’s test would be:
H0: σ²1= σ²2= σ²3= σ²4= σ²5
Ha: at least one σ² is different than the rest.
Now that we know how ANOVA works it is easy
to describe how Levene’s test works. Let’s
begin with a simple study with just two
groups.
Group 1: 9, 10, 10, 11    Mean = 10   S² = 0.5
Group 2: 26, 28, 32, 34   Mean = 30   S² = 10
The two groups have very different variances (0.5 vs. 10); we want to test whether it is reasonable to conclude that the populations these groups came from have different variances (our assumption is about populations).
H0: σ²1 = σ²2   Ha: σ²1 ≠ σ²2
Original Scores                  Absolute deviations from the group mean
Group 1     Group 2              Group 1     Group 2
9           26                   1           4
10          28                   0           2
10          32                   0           2
11          34                   1           4
Mean=10     Mean=30

The first thing we do is to transform the original scores to deviation scores that reflect how far each score was from the mean of its group. Then we take the absolute values of those deviations.
We have changed the data into a measure of how much
each score differed from its group mean. In other words,
each score is now a measure of variability. We can see in
the absolute deviations that the original scores in group 1
did not vary much from their group mean (and thus didn’t
vary much from each other).
Absolute deviations from the group mean:
Group 1     Group 2
1           4
0           2
0           2
1           4
Mean=.5     Mean=3
Levene's test simply performs an ANOVA (with only 2 groups you could use a t test) to see if the means of the absolute deviations differ significantly across the two groups, which tells you whether the variances of the original scores differ significantly.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
Result of the ANOVA on the absolute deviation
scores. F(1,6)=15.00, p=.008, so we can conclude
that the difference in variances among the groups
(original scores) was statistically significant.
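A Python sketch of the same two steps: take the absolute deviations from each group's mean, run a one-way ANOVA on them, and compare with scipy's built-in Levene test (which, with center='mean', performs exactly this computation):

```python
import numpy as np
from scipy import stats

group1 = np.array([9, 10, 10, 11], dtype=float)
group2 = np.array([26, 28, 32, 34], dtype=float)

# Step 1: absolute deviations of each score from its own group mean
dev1 = np.abs(group1 - group1.mean())
dev2 = np.abs(group2 - group2.mean())

# Step 2: ANOVA (or, with two groups, a t test) on the absolute deviations
F, p = stats.f_oneway(dev1, dev2)
print(F, p)   # approximately F(1,6) = 15.00, p = .008

# scipy's Levene test with center='mean' gives the same result
print(stats.levene(group1, group2, center='mean'))
```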
Our Example

Original data:
Control:   9, 8, 7
B. M.:     6, 6, 4
P.A.:      6, 7, 6
C.C.:      6, 5, 7
Gestalt:   3, 1, 5

Absolute deviation scores:
Control:   1, 0, 1            M = .67
B. M.:     .67, .67, 1.33     M = .89
P.A.:      .33, .67, .33      M = .44
C.C.:      0, 1, 1            M = .67
Gestalt:   0, 2, 2            M = 1.33

For the |deviations|, F(4,10)=0.782, p=.562
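The same check for the five-group example, done directly with scipy's Levene test (center='mean' matches the |deviation from the group mean| procedure above):

```python
from scipy import stats

control         = [9, 8, 7]
behavior_mod    = [6, 6, 4]
psychoanalysis  = [6, 7, 6]
client_centered = [6, 5, 7]
gestalt         = [3, 1, 5]

W, p = stats.levene(control, behavior_mod, psychoanalysis,
                    client_centered, gestalt, center='mean')
print(W, p)   # approximately 0.78 and .56, so we retain H0 of equal variances
```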
Levene’s: Considerations
The previous example would actually have a problem in real life: Levene's test is not accurate when there are very small N's in each group. This problem becomes negligible when you have 10 or more scores per group.
Also, you should know that since Levene's procedure involves simply applying ANOVA to the absolute mean deviation scores, Levene's test itself carries the assumptions that the absolute deviation scores are normally distributed and that the groups have equal variances.
Levene’s: Assumptions
Levene's test is fairly robust to violations of the ANOVA assumptions. A study by Brown and Forsythe (1974), however, suggests that if the populations (of the original scores) are fat tailed you should use the '10 percent trimmed mean' instead of the mean when finding the absolute deviations, and that if the populations are skewed you should find the absolute deviation from the median rather than from the mean. To find the 10 percent trimmed mean, first chop off the 10% highest scores and the 10% lowest scores, then find the mean of the remaining scores.
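scipy's levene function exposes both of these alternatives through its center argument; a hedged sketch (the therapy scores are reused only to show the calls, and with just 3 scores per group a 10% trim removes nothing):

```python
from scipy import stats

groups = [[9, 8, 7], [6, 6, 4], [6, 7, 6], [6, 5, 7], [3, 1, 5]]

# Skewed populations: use deviations from the group MEDIAN
print(stats.levene(*groups, center='median'))

# Fat-tailed populations: use deviations from the 10% TRIMMED mean
print(stats.levene(*groups, center='trimmed', proportiontocut=0.10))
```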
Levene’s: Assumptions
The problem with using either the 10% trimmed mean or the median is that SPSS will only do Levene's for you using the mean. If you want to use the 10 percent trimmed mean or the median you will have to do it yourself, in a similar fashion to what I did in demonstrating how Levene's works: find the correct deviations and then do an ANOVA on them.
Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69, 364-367.