Transcript Slide 1

Group 5
AMS 572
Professor: Wei Zhu
1/87
Foram Sanghvi: Brief review of ANOVA
Shihui Xiang: Introduction to repeated measures design
Qianzhu Wu: One-way repeated measures ANOVA
Yue Tang: Using the REPEATED statement of PROC ANOVA
Yan Xu: Two-factor ANOVA with repeated measures on one factor
Weina Gao: Two-factor experiments with repeated measures on both factors
Yi Hu: Three-factor experiments with a repeated measure on the last factor
Xiaoke Fei: Three-factor experiments with repeated measures on two factors
Yuzhou Song: Mixed model
2/87
Foram Sanghvi
3/87
One-way ANOVA tests the equality of several population means.
It is an extension of the pooled-variance t-test.
That is:
H0 (null hypothesis): µ1 = µ2 = … = µa
Ha (alternative hypothesis): at least one of the means differs from the others.
Assumptions:
Equal population variances
Normal population
Independent samples
4/87
$F_0 = \mathrm{MSA}/\mathrm{MSE} \sim F_{a-1,\,N-a}$ under H0
Conclusion: Reject H0 at level α if $F_0 > F_{a-1,\,N-a;\,\alpha}$
5/87
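MSA = between-group mean square (variability of the group means)
MSE = within-group mean square (pooled within-group variance)
Total sample size: $N = \sum_{i=1}^{a} n_i$
Sample mean of group i: $\bar{Y}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$
Grand mean: $\bar{Y}_{\cdot\cdot} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}$
$Y_{ij}$ = observed response from experimental unit j receiving treatment i, with $Y_{ij} \sim N(\mu_i, \sigma^2)$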
6/87
The main disadvantage of the analysis of variance (ANOVA) method is that it requires two assumptions to be made:
All populations must be (roughly) normally distributed.
All variances from each data group must be (roughly) equal.
In addition, the normal subject-to-subject variation may strongly inflate the error sum of squares.
7/87
8/87
A repeated measures design is one in which
at least one of the factors consists of
repeated measurements on the same
subjects or experimental units, under
different conditions.
9/87
A repeated measures design involves
measuring subjects at different points in
time (typically after different treatments)
It can be viewed as an extension of the
paired-samples t-test (which involved only
two related measures)
Thus, the measures—unlike in “regular”
ANOVA—are correlated, i.e., the
observations are not independent
10/87
• Data collected in a sequence of evenly
spaced points in time
• Treatments are assigned to experimental
units
11/87
By collecting data from the same
participants under repeated conditions the
individual differences can be eliminated or
reduced as a source of between group
differences.
Also, the sample size is not divided
between conditions or groups and thus
inferential testing becomes more powerful.
This design also proves to be economical
when sample members are difficult to
recruit because each member is measured
under all conditions.
12/87
13/87
As with any ANOVA, repeated measures ANOVA tests
the equality of means. However, repeated measures
ANOVA is used when all members of a random sample
are measured under a number of different conditions.
• As the sample is exposed to each condition, the
measurement of the dependent variable is repeated.
• Using a standard ANOVA in this case is not
appropriate because it fails to model the correlation
between the repeated measures: the data violate the
ANOVA assumption of independence.
14/87
• The simplest example of a repeated measures
design is a paired t-test.
Each subject is measured twice (time 1 and time 2) on the same variable, or the two members of each matched pair are assigned to the two treatment levels.
• If we observe participants at more than two time points, then we need to conduct a repeated measures ANOVA.
15/87
What we would like to do is to decompose
the variability into:
(1) A random effect
(2) A fixed effect
The effect of participants is always a
random effect. We will only consider
situations where the factor is a fixed effect
16/87
Yij = μj +Si+εij
μj = The fixed effect.
Si= The random effect of subject i.
εij = The random error independent of Si
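The correlation between repeated measures follows directly from this model. As a sketch, assume (this is not stated on the slide) that $S_i \sim N(0, \sigma_S^2)$ and $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$, with all terms mutually independent; the symbols $\sigma_S^2$ and $\sigma_\varepsilon^2$ are introduced here only for illustration. Two measurements on the same subject then share the term $S_i$:

```latex
\begin{aligned}
\operatorname{Var}(Y_{ij}) &= \sigma_S^2 + \sigma_\varepsilon^2, \\
\operatorname{Cov}(Y_{ij}, Y_{ij'}) &= \sigma_S^2 \quad (j \neq j'), \\
\operatorname{Corr}(Y_{ij}, Y_{ij'}) &= \frac{\sigma_S^2}{\sigma_S^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

Under these assumptions every condition has the same variance and every pair of conditions has the same covariance, which is the pattern known as compound symmetry.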
17/87
18/87
Assumptions of a repeated
measures design
For a repeated measures design, we start
with the same assumptions as a paired t-test:
Participants are independent and randomly
selected from the population
Normality (actually symmetry).
Due to having more than two
measurements on each participant, we
have an additional assumption on the
variances.
19/87
20/87
21/87
22/87
The assumptions we have to check
for a repeated measures design are:
1. Participants are independent and randomly selected from the population
2. Normality (actually symmetry)
3. Compound symmetry
23/87
Consider the following experiment:
We have four drugs (1,2,3 and 4) that
relieve pain. Each subject is given each of
the four drugs. The subject’s pain tolerance
is then measured. Enough time is allowed
to pass between successive drug
administrations so that we can be sure
there’s no residual effect from the previous
drug.
The null hypothesis is:
Mean(1)=Mean(2)=Mean(3)=Mean(4)
24/87
In the one-way analysis of variance
without a repeated measure, we would
have each subject receive only one of
the four drugs. In this design, each
subject is measured under each of
the drug conditions. This has several
important advantages.
25/87
Each subject acts as his or her own control.
That is, drug effects are calculated from the
deviations between each drug score and that
subject's average score across the drugs.
The normal subject-to-subject
variation can thus be removed from the
error sum of squares.
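To make the effect of removing subject variation concrete, here is a minimal Python sketch (not part of the original slides) that computes the sums of squares by hand for the four-subject, four-drug pain scores used in the SAS examples of this deck. The function name `one_way_rm_anova` and the dictionary keys are hypothetical, and the fourth data row is read as subject 4 with scores 3 8 5 8.

```python
# Pain scores from the SAS example: one row per subject, one column per drug.
scores = [
    [5, 9, 6, 11],     # subject 1
    [7, 12, 8, 9],     # subject 2
    [11, 12, 10, 14],  # subject 3
    [3, 8, 5, 8],      # subject 4 (assumed reading of the garbled data line)
]


def one_way_rm_anova(data):
    """Return sums of squares and F ratios for a subjects-by-treatments table."""
    n = len(data)        # number of subjects
    k = len(data[0])     # number of treatments (drugs)
    grand = sum(sum(row) for row in data) / (n * k)

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    drug_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_drug = n * sum((m - grand) ** 2 for m in drug_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)

    # Repeated measures: subject variation is removed from the error term.
    ss_error_rm = ss_total - ss_drug - ss_subj
    f_rm = (ss_drug / (k - 1)) / (ss_error_rm / ((k - 1) * (n - 1)))

    # For contrast only: leave subject variation inside the error term.
    ss_error_pooled = ss_total - ss_drug
    f_pooled = (ss_drug / (k - 1)) / (ss_error_pooled / (n * k - k))

    return {
        "ss_drug": ss_drug,
        "ss_subj": ss_subj,
        "ss_error_rm": ss_error_rm,
        "f_rm": f_rm,
        "f_pooled": f_pooled,
    }


result = one_way_rm_anova(scores)
print(result)
```

With these data the subject sum of squares is 70.25 out of a total of 133.75, so moving it out of the error term raises the drug F ratio from roughly 2.41 to roughly 11.38.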
26/87
27/87
28/87
SAS code without using repeated statement
DATA PAIN;
   INPUT SUBJ DRUG PAIN;
DATALINES;
1 1 5
1 2 9
1 3 6
1 4 11
2 1 7
2 2 12
……
;
PROC ANOVA DATA=PAIN;
   TITLE 'without repeated statement';
   CLASS SUBJ DRUG;
   MODEL PAIN=SUBJ DRUG;
   MEANS DRUG/DUNCAN;
RUN;
The DATA step can be reconstructed to read one line of data per subject:
DATA PAIN;
   INPUT SUBJ @;
   DO DRUG = 1 to 4;
      INPUT PAIN @;
      OUTPUT;
   END;
DATALINES;
1 5 9 6 11
2 7 12 8 9
3 11 12 10 14
4 3 8 5 8
;
29/87
SAS code without using repeated statement
DATA PAIN;
   INPUT SUBJ @;        /* trailing @: keep reading from the same line of data */
   DO DRUG = 1 to 4;    /* iterative loop */
      INPUT PAIN @;
      OUTPUT;
   END;
DATALINES;
1 5 9 6 11
2 7 12 8 9
3 11 12 10 14
4 3 8 5 8
;
A lot easier!
30/87
SAS code without using repeated statement
Remark 1: about the DO statement
The general form:
DO variable = start TO end BY increment;
   (SAS statements)
END;
start: initial value; end: ending value; BY increment is optional (default: 1)
31/87
SAS code without using repeated statement
Remark 1: about the DO statement
In our example (initial value: 1, ending value: 4):
DO DRUG = 1 to 4;
   INPUT PAIN @;    /* trailing @: keep reading from the same line of data */
   OUTPUT;
END;                /* return to "DO" */
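The DO loop above turns each wide data line (one subject, four pain scores) into four long-format observations. As an illustration only, the same reshaping can be sketched in Python; the names `raw_lines` and `wide_to_long` are hypothetical, and the fourth data line is read as "4 3 8 5 8".

```python
# Each line: subject id followed by the pain score under drugs 1 to 4.
raw_lines = [
    "1 5 9 6 11",
    "2 7 12 8 9",
    "3 11 12 10 14",
    "4 3 8 5 8",  # assumed reading of the garbled line in the transcript
]


def wide_to_long(lines):
    """Mimic INPUT SUBJ @; DO DRUG = 1 TO 4; INPUT PAIN @; OUTPUT; END;"""
    rows = []
    for line in lines:
        # First token is the subject; the rest are that subject's scores.
        subj, *pains = (int(token) for token in line.split())
        for drug, pain in enumerate(pains, start=1):
            rows.append((subj, drug, pain))  # one OUTPUT per loop pass
    return rows


long_rows = wide_to_long(raw_lines)
for row in long_rows[:4]:
    print(row)
```

Each tuple corresponds to one observation (SUBJ, DRUG, PAIN) of the long-format PAIN data set.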
32/87
SAS code without using repeated statement
Remark 2: about the ANOVA procedure
PROC ANOVA DATA=PAIN;
   TITLE 'without repeated statement';
   CLASS SUBJ DRUG;
   MODEL PAIN=SUBJ DRUG;
   MEANS DRUG/DUNCAN;
RUN;
No "|" in the MODEL statement: SUBJ and DRUG are each main effects, with no interaction term between them.
33/87
SAS code using the REPEATED Statement
DATA REPEAT;
INPUT PAIN1-PAIN4;
DATALINES;
5 9 6 11
7 12 8 9
11 12 10 14
3 8 5 8
;
PROC ANOVA DATA=REPEAT;
TITLE 'using repeated statement';
MODEL PAIN1-PAIN4 = / NOUNI;
REPEATED DRUG 4 (1 2 3 4);
RUN;
34/87
SAS code using the REPEATED Statement
Remark 1 : about the data set
We need the data set in the form:
SUBJ PAIN1 PAIN2 PAIN3 PAIN4
NOTICE that it does not
have a DRUG variable
35/87
SAS code using the REPEATED Statement
Remark 2 : about the REPEATED Statement
The general form (to compute pairwise comparisons):
REPEATED factor_name CONTRAST(n);
•n is a number from 1 to k, with k being the number of levels of the repeated factor;
•To get all pairwise contrasts, we need k-1 REPEATED statements
36/87
SAS code using the REPEATED Statement
Remark 2 : about the REPEATED Statement
In our example (the SUMMARY option requests ANOVA tables for each contrast):
PROC ANOVA DATA=REPEAT;
TITLE 'using repeated statement';
MODEL PAIN1-PAIN4 = / NOUNI;
REPEATED DRUG 4 CONTRAST(1) / SUMMARY;
REPEATED DRUG 4 CONTRAST(2) / SUMMARY;
REPEATED DRUG 4 CONTRAST(3) / SUMMARY;
RUN;
37/87
SAS code using the REPEATED Statement
Remark 3 : more explanation of the ANOVA procedure
PROC ANOVA DATA=REPEAT;
TITLE 'using repeated statement';
MODEL PAIN1-PAIN4 = / NOUNI;
REPEATED DRUG 4 (1 2 3 4);
RUN;
•No CLASS: our data set does not have an independent variable
•NOUNI: do not conduct a separate univariate analysis for each of the four PAIN variables
•4: the repeated factor “DRUG” has four levels; optional
•(1 2 3 4): the labels we want printed for each level of DRUG
38/87
SAS code using the REPEATED Statement
Remark 4 : comparison of the DATA steps
DATA PAIN;
   INPUT SUBJ DRUG PAIN;
DATALINES;
1 1 5
1 2 9
1 3 6
1 4 11
2 1 7
2 2 12
……
;
DATA PAIN;
   INPUT SUBJ @;
   DO DRUG = 1 to 4;
      INPUT PAIN @;
      OUTPUT;
   END;
DATALINES;
1 5 9 6 11
2 7 12 8 9
3 11 12 10 14
4 3 8 5 8
;
DATA REPEAT;
   INPUT PAIN1-PAIN4;
DATALINES;
5 9 6 11
7 12 8 9
11 12 10 14
3 8 5 8
;
39/87
40/87
Factor A: GROUP
Factor B: TIME (repeated)
Group       Subject   PRE   POST
Control     1         80    83
Control     2         85    86
Control     3         83    88
Treatment   4         82    94
Treatment   5         87    93
Treatment   6         84    98
41/87
42/87
43/87
Total variance: df = N-1
   Between subjects
      Treatment: df = a-1
      Error due to subjects within treatment: df = a(n-1)
   Within subjects
      Time: df = b-1
      Treatment × time: df = (a-1)×(b-1)
      Error or residual: df = a×(n-1)×(b-1)
a: # of treatment groups
b: # of time points
n: # of subjects per treatment
N = a×b×n: total # of measurements
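As a quick check of this partition, take the pre/post example in this deck, with a = 2 groups, b = 2 time points and n = 3 subjects per group, so that N = 12. The component degrees of freedom should add up to N - 1:

```latex
\underbrace{(a-1)}_{1} + \underbrace{a(n-1)}_{4} + \underbrace{(b-1)}_{1}
  + \underbrace{(a-1)(b-1)}_{1} + \underbrace{a(n-1)(b-1)}_{4} = 11 = N - 1
```

The first two terms are the between-subjects part (5 = an - 1 degrees of freedom among the six subjects) and the last three are the within-subjects part.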
44/87
Source                d.f.          SS     MS
Factor A              a-1           SSA    MSA = SSA/(a-1)
Factor B              b-1           SSB    MSB = SSB/(b-1)
AB interaction        (a-1)(b-1)    SSAB   MSAB = SSAB/[(a-1)(b-1)]
Subjects (within A)   a(n-1)        SSWA   MSWA = SSWA/[a(n-1)]
Error                 a(n-1)(b-1)   SSE    MSE = SSE/[a(n-1)(b-1)]
Total                 nab-1         SST
45/87
46/87
Data prepost;
Input subj group $ pretest postest;
datalines;
1 c 80 83
2 c 85 86
3 c 83 88
4 t 82 94
5 t 87 93
6 t 84 98
;
run;
proc anova data=prepost;
title 'Two-way ANOVA with a Repeated Measure on One Factor';
class group;
model pretest postest = group/nouni;
repeated time 2 (0 1);
means group;
run;
47/87
MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time Effect
Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.13216314   26.27     1        4        0.0069
Pillai's Trace           0.86783686   26.27     1        4        0.0069
Hotelling-Lawley Trace   6.56640625   26.27     1        4        0.0069
Roy's Greatest Root      6.56640625   26.27     1        4        0.0069

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no time*group Effect
Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.32611465   8.27      1        4        0.0452
Pillai's Trace           0.67388535   8.27      1        4        0.0452
Hotelling-Lawley Trace   2.06640625   8.27      1        4        0.0452
Roy's Greatest Root      2.06640625   8.27      1        4        0.0452
48/87
Tests of Hypotheses for Between Subjects Effects
Source   DF   Anova SS      Mean Square   F Value   Pr > F
group    1    90.75000000   90.75000000   11.84     0.0263
Error    4    30.66666667   7.66666667

Univariate Tests of Hypotheses for Within Subject Effects
Source        DF   Anova SS      Mean Square   F Value   Pr > F
time          1    140.0833333   140.0833333   26.27     0.0069
time*group    1    44.0833333    44.0833333    8.27      0.0452
Error(time)   4    21.3333333    5.3333333

Level of        ---------pretest---------   ---------postest---------
group     N     Mean         Std Dev        Mean         Std Dev
c         3     82.6666667   2.51661148     85.6666667   2.51661148
t         3     84.3333333   2.51661148     95.0000000   2.64575131
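The sums of squares in this output can be reproduced by hand from the six subjects of the pre/post data set. The following is a minimal Python sketch (not part of the original slides) for a balanced design with one between-subjects factor and one repeated factor; the function name `mixed_anova` and its dictionary keys are hypothetical.

```python
# Pre/post scores from the prepost data set: group -> list of (pretest, postest).
prepost = {
    "c": [(80, 83), (85, 86), (83, 88)],
    "t": [(82, 94), (87, 93), (84, 98)],
}


def mixed_anova(groups):
    """Balanced two-factor ANOVA with repeated measures on the second factor."""
    subjects = [s for subs in groups.values() for s in subs]
    a = len(groups)                      # number of groups
    n = len(subjects) // a               # subjects per group (balanced design)
    b = len(subjects[0])                 # number of time points
    values = [y for s in subjects for y in s]
    grand = sum(values) / len(values)

    ss_total = sum((y - grand) ** 2 for y in values)

    # Between-subjects part: group effect and subjects within group.
    ss_subjects = b * sum((sum(s) / b - grand) ** 2 for s in subjects)
    ss_group = n * b * sum(
        (sum(sum(s) for s in subs) / (n * b) - grand) ** 2
        for subs in groups.values()
    )
    ss_subj_within = ss_subjects - ss_group

    # Within-subjects part: time, group-by-time interaction, residual.
    ss_time = a * n * sum(
        (sum(s[t] for s in subjects) / (a * n) - grand) ** 2 for t in range(b)
    )
    ss_cells = n * sum(
        (sum(s[t] for s in subs) / n - grand) ** 2
        for subs in groups.values()
        for t in range(b)
    )
    ss_inter = ss_cells - ss_group - ss_time
    ss_error = ss_total - ss_subjects - ss_time - ss_inter

    ms_subj = ss_subj_within / (a * (n - 1))
    ms_error = ss_error / (a * (n - 1) * (b - 1))
    return {
        "ss_group": ss_group,
        "ss_subj_within": ss_subj_within,
        "ss_time": ss_time,
        "ss_inter": ss_inter,
        "ss_error": ss_error,
        "f_group": (ss_group / (a - 1)) / ms_subj,
        "f_time": (ss_time / (b - 1)) / ms_error,
        "f_inter": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }


anova = mixed_anova(prepost)
for name, value in anova.items():
    print(f"{name}: {value:.4f}")
```

The group effect is tested against subjects within group, while time and time*group are tested against the within-subjects residual; the resulting F ratios agree with the 11.84, 26.27 and 8.27 reported by PROC ANOVA.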
49/87
50/87
Two-factor ANOVA
Each subject is measured under the levels of both factors.
Subject   Level of A   B1     B2     …   BB
1         A1           Y111   Y112   …   Y11B
…         …            …      …      …   …
I         A1           YI11   YI12   …   YI1B
…         …            …      …      …   …
1         Aa           Y1a1   Y1a2   …   Y1aB
…         …            …      …      …   …
I         Aa           YIa1   YIa2   …   YIaB
…         …            …      …      …   …
1         AA           Y1A1   Y1A2   …   Y1AB
…         …            …      …      …   …
I         AA           YIA1   YIA2   …   YIAB
A and B denote the two factors, and Yiab denotes the measurement taken from the ith subject when factor A is at level a and factor B is at level b.
51/87
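Two Factors Model
• All the groups have equal variances
$$Y_{iab} = \underbrace{\mu + \alpha_a + \beta_b + (\alpha\beta)_{ab}}_{\text{fixed effects of factors}} + \underbrace{S_i + (\alpha S)_{ia} + (\beta S)_{ib}}_{\text{random effects due to subjects}} + e_{iab}$$
$$S_i \sim N(0, \sigma_S^2), \quad (\alpha S)_{ia} \sim N(0, \sigma_{\alpha S}^2), \quad (\beta S)_{ib} \sim N(0, \sigma_{\beta S}^2), \quad e_{iab} \sim N(0, \sigma^2)$$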
52/87
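The fixed effects are estimated as follows:
$\hat{\mu} = \bar{Y}_{\cdot\cdot\cdot}$,
$\hat{\alpha}_a = \bar{Y}_{\cdot a\cdot} - \bar{Y}_{\cdot\cdot\cdot}$,
$\hat{\beta}_b = \bar{Y}_{\cdot\cdot b} - \bar{Y}_{\cdot\cdot\cdot}$,
$\widehat{(\alpha\beta)}_{ab} = \bar{Y}_{\cdot ab} - \bar{Y}_{\cdot a\cdot} - \bar{Y}_{\cdot\cdot b} + \bar{Y}_{\cdot\cdot\cdot}$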
53/87
RM ANOVA Table:
Source       DF                        SS     MS                         F-Value
Factor A     A-1                       SSa    MSa = SSa/(A-1)            MSa/MSai
Factor B     B-1                       SSb    MSb = SSb/(B-1)            MSb/MSbi
Subjects     I-1                       SSi    MSi = SSi/(I-1)            MSi/MSe
A*Subjects   (A-1)(I-1)                SSai   MSai = SSai/[(A-1)(I-1)]   MSai/MSe
B*Subjects   (B-1)(I-1)                SSbi   MSbi = SSbi/[(B-1)(I-1)]   MSbi/MSe
A*B          (A-1)(B-1)                SSab   MSab = SSab/[(A-1)(B-1)]   MSab/MSe
Error        dABI = (A-1)(B-1)(I-1)    SSe    MSe = SSe/dABI
Total        ABI-1                     SSt
54/87
Example:
A group of subjects is treated in the morning
and afternoon of two different days. On one of
the days, the subjects receive a strong
sleeping aid the night before the experiment is
to be conducted; on the other, a placebo.
Reaction times by time of day and treatment:
Time   Subject   Control   Drug
A.M.   1         65        70
A.M.   2         72        78
A.M.   3         90        97
P.M.   1         55        60
P.M.   2         64        68
P.M.   3         80        85
55/87
SAS Code
data repeat;
   input react1-react4;
datalines;
65 70 55 60
72 78 64 68
90 97 80 85
;
run;
proc anova data=repeat;
   model react1-react4= /nouni;
   repeated time 2, treat 2 /nom;
run;
56/87
A portion of output from
SAS
57/87
Interpretation
• According to the observed p-values, the main effects of time and treat are significant; their interaction is not.
• The drug increases reaction time
• Reaction time is longer in the morning than in the afternoon
• The interaction of treat and time is not significant
58/87
59/87
Consider a marketing experiment:
•Male and female subjects are offered
one of three different brands of coffee.
•Each brand is tasted twice; once after
breakfast, the other time after dinner.
•The preference of each brand is
measured on a scale from 1 to
10 (1=lowest, 10=highest).
60/87
The experimental design is shown
below:
Three-Factor Experiment
with a Repeated Measure on the last
factor
Meal: Repeated Measure Factor
61/87
SAS Program:
62/87
OUTPUT(Part 1/4):
63/87
OUTPUT(Part 2/4):
64/87
OUTPUT(Part 3/4):
65/87
OUTPUT(Part 4/4):
66/87
67/87
68/87
A group of high- and low-SES children is selected
for the experiment. Their reading comprehension
is tested each spring and fall for three consecutive
years. A Diagram of the design is shown here:
69/87
Notice that each subject is measured each spring and fall of
each year so that the variables SEASON and YEAR are
both repeated measures factors.
To analyze this experiment, we will use the SAS program: the
REPEATED statement of PROC ANOVA:
DATA READ;
INPUT SUBJ SES $ READ1-READ6;
LABEL READ1 = 'SPRING YR 1'
READ2 = 'FALL YR 1'
READ3 = 'SPRING YR 2'
READ4 = 'FALL YR 2'
READ5 = 'SPRING YR 3'
READ6 = 'FALL YR 3';
70/87
DATALINES;
1 HIGH 61 50 60 55 59 62
2 HIGH 64 55 62 57 63 63
3 HIGH 59 49 58 52 60 58
4 HIGH 63 59 65 64 67 70
5 HIGH 62 51 61 56 60 63
6 LOW 57 42 56 46 54 50
7 LOW 61 47 58 48 59 55
8 LOW 55 40 55 46 57 52
9 LOW 59 44 61 50 63 60
10 LOW 58 44 56 49 55 49
;
PROC ANOVA DATA=READ;
TITLE "READING COMPREHENSION ANALYSIS";
CLASS SES;
MODEL READ1-READ6 = SES / NOUNI;
REPEATED YEAR 3, SEASON 2;
MEANS SES;
RUN;
71/87
Since the REPEATED statement is confusing
when we have more than one repeated factor, it is
important for you to know how to determine the
order of the factor names. Look at the REPEATED
statement in this example:
REPEATED YEAR 3, SEASON 2;
This statement instructs the ANOVA procedure to
choose the first level of YEAR(1), then loop
through two levels of SEASON(SPRING FALL),
then return to the next level of YEAR(2), followed
by two levels of SEASON, etc.
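The ordering rule described above (the last factor named varies fastest) can be sketched in Python with `itertools.product`, whose rightmost argument also cycles fastest. This is only an illustration of the mapping from READ1-READ6 to factor levels, not of what PROC ANOVA does internally; the variable names are hypothetical.

```python
from itertools import product

# Factors in the order they appear in: REPEATED YEAR 3, SEASON 2;
years = [1, 2, 3]
seasons = ["SPRING", "FALL"]

# Dependent variables in the order they appear in the MODEL statement.
columns = [f"READ{i}" for i in range(1, 7)]

# product() holds YEAR fixed while it loops through SEASON, then advances YEAR.
mapping = dict(zip(columns, product(years, seasons)))

for column, (year, season) in mapping.items():
    print(f"{column}: {season} YR {year}")
```

The printed pairs match the LABEL statement of the DATA step (READ1 = SPRING YR 1, READ2 = FALL YR 1, and so on), which is a convenient way to confirm that the factor order in the REPEATED statement matches the column order.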
72/87
READING COMPREHENSION ANALYSIS
The ANOVA Procedure
Class Level Information
Class   Levels   Values
SES     2        HIGH LOW
Number of Observations Read   10
Number of Observations Used   10

The ANOVA Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects
Source   DF   Anova SS      Mean Square   F Value   Pr > F
SES      1    680.0666667   680.0666667   13.54     0.0062
Error    8    401.6666667   50.2083333

The ANOVA Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects
                                                              Adj Pr > F
Source   DF   Anova SS      Mean Square   F Value   Pr > F    G-G      H-F-L
YEAR     2    252.0333333   126.0166667   26.91     <.0001    0.0002   <.0001
73/87
                                                                   Adj Pr > F
Source        DF   Anova SS     Mean Square   F Value   Pr > F    G-G      H-F-L
YEAR*SES      2    1.0333333    0.5166667     0.11      0.8962    0.8186   0.8450
Error(YEAR)   16   74.9333333   4.6833333
Greenhouse-Geisser Epsilon     0.6757
Huynh-Feldt-Lecoutre Epsilon   0.7642

Source          DF   Anova SS      Mean Square   F Value   Pr > F
SEASON          1    680.0666667   680.0666667   224.82    <.0001
SEASON*SES      1    112.0666667   112.0666667   37.05     0.0003
Error(SEASON)   8    24.2000000    3.0250000
74/87
                                                                           Adj Pr > F
Source               DF   Anova SS      Mean Square   F Value   Pr > F    G-G      H-F-L
YEAR*SEASON          2    265.4333333   132.7166667   112.95    <.0001    <.0001   <.0001
YEAR*SEASON*SES      2    0.4333333     0.2166667     0.18      0.8333    0.7592   0.7905
Error(YEAR*SEASON)   16   18.8000000    1.1750000
Greenhouse-Geisser Epsilon     0.7073
Huynh-Feldt-Lecoutre Epsilon   0.8147
75/87
High-SES students have higher reading comprehension
scores than low-SES students (F=13.54, p=0.0062).
Reading comprehension increases with each year
(F=26.91, p<0.0001).
Students had higher reading comprehension scores in the
spring compared to the following fall (F=224.82, p<0.0001).
The "slippage" was greater for the low-SES students (there
was a significant SES*SEASON interaction [F=37.05,
p=0.0003]).
"Slippage" decreases as the students get older
(YEAR*SEASON is significant [F=112.95, p<0.0001]).
76/87
77/87
What is Mixed Model?
Mixed Model: A design that includes both
random and fixed variables is often called a
mixed model.
78/87
Do not have to assume
sphericity in the model.
Do not have to assume
compound symmetry in the
model.
79/87
We can use PROC MIXED
to fit the mixed
model.
80/87
Example:
81/87
Treat this case as a standard repeated
measure anova. We can get the following result:
82/87
Treat it as Mixed model
SAS program:
83/87
The result of using mixed model:
84/87
Comparing the results of the two methods, the
mixed model has the following advantages in this example:
The error degrees of freedom are larger.
The interaction is significant.
85/87
86/87
87/87