Two-Factor Fixed Effects Model

• The model usually used for this design includes effects for both
factors A and B.
• In addition, it includes an interaction term.
• The model is given by: $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$, where

Term                  Description
$\mu$                 Overall mean
$\alpha_i$            Effect of level i of factor A
$\beta_j$             Effect of level j of factor B
$\gamma_{ij}$         Interaction effect of factor A level i and factor B level j
$\varepsilon_{ijk}$   Experimental error
STA305 week 5
Some Notation
• First, suppose that the design is balanced in the sense that the number
of experimental units randomly allocated to the combination of factor
A level i and factor B level j is the same for all i and j.
• That is, suppose that rij = r.
• Further, suppose the factor A has a levels and the factor B has b levels.
Let
$n = \sum_{i=1}^{a} \sum_{j=1}^{b} r_{ij} = rab.$
• The total number of experimental units exposed to factor A level i is
$r_{i.} = \sum_{j=1}^{b} r_{ij} = br.$
• The total number of experimental units exposed to factor B level j is
$r_{.j} = \sum_{i=1}^{a} r_{ij} = ar.$
Sample Means
• In the 2-factor model, several means are useful for understanding
the data and deriving sums of squares. They are:
Overall mean (mean of all observations): $\bar{Y}_{...} = \frac{1}{n} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} Y_{ijk}$
Factor A level i (across all B levels): $\bar{Y}_{i..} = \frac{1}{rb} \sum_{j=1}^{b} \sum_{k=1}^{r} Y_{ijk}$
Factor B level j (across all A levels): $\bar{Y}_{.j.} = \frac{1}{ra} \sum_{i=1}^{a} \sum_{k=1}^{r} Y_{ijk}$
Factor A level i and Factor B level j: $\bar{Y}_{ij.} = \frac{1}{r} \sum_{k=1}^{r} Y_{ijk}$
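The four kinds of means can be computed directly from their definitions. The following is a minimal sketch in Python, using a small made-up balanced data set (a = 2, b = 3, r = 2); the data and the variable names are hypothetical and are not taken from the course examples.

```python
from statistics import mean

# Hypothetical balanced data: y[i][j] holds the r replicates observed at
# factor A level i and factor B level j (a = 2, b = 3, r = 2).
y = [
    [[10.0, 12.0], [14.0, 16.0], [18.0, 20.0]],
    [[11.0, 13.0], [19.0, 21.0], [27.0, 29.0]],
]
a, b, r = len(y), len(y[0]), len(y[0][0])

# Cell means: average over the replicates k within each (i, j) cell.
cell_means = [[mean(y[i][j]) for j in range(b)] for i in range(a)]

# Factor A level means: average over all B levels and replicates.
a_means = [mean(obs for j in range(b) for obs in y[i][j]) for i in range(a)]

# Factor B level means: average over all A levels and replicates.
b_means = [mean(obs for i in range(a) for obs in y[i][j]) for j in range(b)]

# Overall mean: average of all n = rab observations.
grand_mean = mean(obs for i in range(a) for j in range(b) for obs in y[i][j])

print(cell_means, a_means, b_means, grand_mean)
```

Because the design is balanced, each factor level mean is also the plain average of the cell means in that row or column.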
Model Assumptions
• As in the 1-factor model, we assume that the $\varepsilon_{ijk}$ are i.i.d. $N(0, \sigma^2)$.
• What assumptions must be made about $\alpha_i$, $\beta_j$ and $\gamma_{ij}$?
• In order to obtain unbiased estimators, we require that:
$\sum_{i=1}^{a} \alpha_i = 0$, $\sum_{j=1}^{b} \beta_j = 0$, $\sum_{i=1}^{a} \gamma_{ij} = \sum_{j=1}^{b} \gamma_{ij} = 0$
(the first interaction sum for every j, the second for every i).
• To see that this results in unbiased estimates, consider the overall
sample mean…
• Similarly, consider the sample mean for the experimental units that are
exposed to factor A level i…
• Exercise: show that $\frac{1}{ra} \sum_{i=1}^{a} \sum_{k=1}^{r} Y_{ijk}$ is an unbiased estimator of $\mu + \beta_j$.
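The two arguments left open above can be sketched as follows. This is one standard way to fill in the steps, assuming the model and the sum-to-zero constraints stated on this slide; it is not necessarily the derivation given in lecture.

```latex
\begin{align*}
E(\bar{Y}_{...})
  &= \frac{1}{rab} \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}
     (\mu + \alpha_i + \beta_j + \gamma_{ij}) \\
  &= \mu + \frac{1}{a}\sum_{i=1}^{a}\alpha_i
        + \frac{1}{b}\sum_{j=1}^{b}\beta_j
        + \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\gamma_{ij}
   = \mu, \\
E(\bar{Y}_{i..})
  &= \frac{1}{rb} \sum_{j=1}^{b}\sum_{k=1}^{r}
     (\mu + \alpha_i + \beta_j + \gamma_{ij}) \\
  &= \mu + \alpha_i + \frac{1}{b}\sum_{j=1}^{b}\beta_j
        + \frac{1}{b}\sum_{j=1}^{b}\gamma_{ij}
   = \mu + \alpha_i .
\end{align*}
```

Each sum that disappears does so because of one of the constraints; without them, the level means would estimate $\mu + \alpha_i$ plus an average of the other effects.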
Total Variability
• In any sample of data, the sample variance is used as a measure of the
total variability in the data.
• In the 2-factor model the sample variance can be written as:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{...})^2 = \frac{SST}{n-1}$
• So the total variation in the data is measured by total sum of squares.
• As in one-factor model, differences between mean response for each
factor level contribute to the total variability seen in data.
• In the two-factor model, both factors A and B contribute to total
variability, as does the A × B interaction.
Partitioning SST
• Each observation $Y_{ijk}$ makes a contribution of $Y_{ijk} - \bar{Y}_{...}$ to the total variability.
• The difference between each observation and the overall mean can be explained by 4
components:
1. Difference between mean for factor A level i and overall mean: $\bar{Y}_{i..} - \bar{Y}_{...}$
Exercise: show that the expected value of this difference is αi.
2. Difference between mean for factor B level j and overall mean: $\bar{Y}_{.j.} - \bar{Y}_{...}$
Exercise: show that the expected value of this difference is βj.
3. Interaction between factor A level i and factor B level j: $\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}$
Exercise: show that the expected value of this is γij.
4. Experimental error: $Y_{ijk} - \bar{Y}_{ij.}$
Exercise: show that expected value of this is 0.
• The total sum of squares can be rewritten and expanded as follows…
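A sketch of the expansion, assuming the balanced design and the four components listed above. The labels $SS_A$, $SS_B$, $SS_{AB}$ and $SS_E$ are the usual names for the four pieces; the cross-product terms vanish when the identity is squared and summed, which is what balance provides.

```latex
\begin{align*}
Y_{ijk} - \bar{Y}_{...}
  &= (\bar{Y}_{i..} - \bar{Y}_{...}) + (\bar{Y}_{.j.} - \bar{Y}_{...})
   + (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})
   + (Y_{ijk} - \bar{Y}_{ij.}), \\
SST
  &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{...})^2 \\
  &= \underbrace{br \sum_{i=1}^{a} (\bar{Y}_{i..} - \bar{Y}_{...})^2}_{SS_A}
   + \underbrace{ar \sum_{j=1}^{b} (\bar{Y}_{.j.} - \bar{Y}_{...})^2}_{SS_B} \\
  &\quad + \underbrace{r \sum_{i=1}^{a}\sum_{j=1}^{b}
       (\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2}_{SS_{AB}}
   + \underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}
       (Y_{ijk} - \bar{Y}_{ij.})^2}_{SS_E} .
\end{align*}
```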
Degrees of Freedom
• Since one of the model requirements is that $\sum_{i=1}^{a} \alpha_i = 0$, there are a − 1
degrees of freedom for estimating mean response for levels of factor A.
• Similarly, there are b − 1 degrees of freedom for factor B.
• The interaction degrees of freedom is the number of degrees of
freedom for the treatments/cells (which is # of treatments - 1 = ab − 1),
minus the degrees of freedom for factors A and B. That is,
ab − 1 − (a − 1) − (b − 1) = (a − 1)(b − 1).
• Since the total degrees of freedom are n-1, the degrees of freedom
available for estimating experimental error variance is found by
subtraction. It is given by,
n − 1 − (a − 1) − (b − 1) − (a − 1)(b − 1) = ab(r − 1).
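As a worked check of these formulas, take the dimensions of the battery example later in these notes: a = 3 materials, b = 3 temperatures and r = 4 batteries per cell.

```latex
\begin{align*}
n &= rab = 4 \cdot 3 \cdot 3 = 36, & \text{total df} &= n - 1 = 35, \\
\text{df}_A &= a - 1 = 2, & \text{df}_B &= b - 1 = 2, \\
\text{df}_{AB} &= (a-1)(b-1) = 4, & \text{df}_E &= ab(r-1) = 9 \cdot 3 = 27, \\
& & 2 + 2 + 4 + 27 &= 35 .
\end{align*}
```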
Expected Mean Squares
• The expected mean squares can be found using same approach as for
one-factor design.
• Exercise: verify that the following are true:
$E(MS_A) = E\left(\frac{SS_A}{a-1}\right) = \sigma^2 + \frac{br \sum_{i=1}^{a} \alpha_i^2}{a-1}$
$E(MS_B) = E\left(\frac{SS_B}{b-1}\right) = \sigma^2 + \frac{ar \sum_{j=1}^{b} \beta_j^2}{b-1}$
$E(MS_{AB}) = E\left(\frac{SS_{AB}}{(a-1)(b-1)}\right) = \sigma^2 + \frac{r \sum_{i=1}^{a} \sum_{j=1}^{b} \gamma_{ij}^2}{(a-1)(b-1)}$
$E(MS_E) = E\left(\frac{SS_E}{ab(r-1)}\right) = \sigma^2$
Hypothesis Testing
• The expected mean squares provide motivation for test statistics.
• The first test should always be for interaction effects.
• If the interaction effects are found to be 0, then go ahead and test for
main effect of A and B.
• If the interaction effects are not 0, it might be best not to test for
main effect of A and B since the interpretation of the main effects is
difficult in presence of interactions.
• The tests for factor A effects and for factor B effects are designed to
ask about whether the effects of factor A are 0 across all levels of
factor B, and vice versa.
• However, if there is an interaction, we know that effects of factor A
vary depending on level of factor B, and vice versa.
Test for Interactions
• Note that if the interaction effects are all 0, then $E(MS_{AB}) = \sigma^2 = E(MS_E)$.
• So if there are no interaction effects we would expect the ratio of the above
mean squares to be close to 1 and larger otherwise.
• The hypothesis of interest is: H0 : γij = 0, for all i, j
Ha : at least one γij ≠ 0.
• We can use Cochran’s theorem again to show that the test statistic has an F-distribution and is given by:
$F_{obs} = \frac{MS_{AB}}{MS_E} \sim F((a-1)(b-1),\ ab(r-1))$
• We can then calculate the P-value and make a decision.
• If P-value is small and H0 is rejected, then do not go on to test for effects of A
or B.
• If P-value is large then there is no evidence of interaction between factors A
and B. In this case, proceed to test whether factor A or factor B has an effect.
Main Effects
• The effects of factor A and factor B are known as the main effects.
• Recall from the 1-factor model that if treatment A has no effect then
$E(MS_A) = \sigma^2 = E(MS_E)$.
• Again, this suggests using the ratio MSA/MSE as the test statistic.
• If factor A has no effect then this ratio should be close to 1; otherwise we
expect it to be large.
• The hypothesis test of interest is: H0 : αi = 0, for i = 1, 2, . . . , a
Ha : at least one αi ≠ 0.
• The test statistic is Fobs = MSA/MSE ~ F(a − 1, ab(r − 1)).
• We can then calculate the P-value.
• The test for main effect of factor B is constructed in a similar manner.
ANOVA Table
• The ANOVA table for the 2-factor fixed effect model is:
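The entries of the table (sums of squares, degrees of freedom, mean squares and F ratios) follow from the formulas on the earlier slides. Below is a minimal Python sketch that computes them for a balanced layout. The function name `two_way_anova` and the data are hypothetical, and P-values are left out because the F distribution is not available in the Python standard library.

```python
from statistics import mean

def two_way_anova(y):
    """ANOVA quantities for a balanced two-factor layout.

    y[i][j] is the list of r replicates for factor A level i and
    factor B level j. Returns (ss, df, ms, f, sst).
    """
    a, b, r = len(y), len(y[0]), len(y[0][0])
    grand = mean(v for row in y for reps in row for v in reps)
    a_means = [mean(v for reps in y[i] for v in reps) for i in range(a)]
    b_means = [mean(v for i in range(a) for v in y[i][j]) for j in range(b)]
    cell_means = [[mean(y[i][j]) for j in range(b)] for i in range(a)]

    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in a_means),
        "B": a * r * sum((m - grand) ** 2 for m in b_means),
        "AB": r * sum(
            (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
        "Error": sum(
            (v - cell_means[i][j]) ** 2
            for i in range(a) for j in range(b) for v in y[i][j]
        ),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1),
          "Error": a * b * (r - 1)}
    ms = {source: ss[source] / df[source] for source in ss}
    # Each effect is tested against the error mean square.
    f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}
    sst = sum((v - grand) ** 2 for row in y for reps in row for v in reps)
    return ss, df, ms, f, sst

# Hypothetical data with a = 2, b = 3, r = 2.
y = [
    [[10.0, 12.0], [14.0, 16.0], [18.0, 20.0]],
    [[11.0, 13.0], [19.0, 21.0], [27.0, 29.0]],
]
ss, df, ms, f, sst = two_way_anova(y)
for source in ("A", "B", "AB", "Error"):
    print(source, df[source], ss[source], ms[source], f.get(source, ""))
print("Total", sum(df.values()), sst)
```

For these made-up numbers the four sums of squares add up to the total sum of squares, as the partition of SST requires in a balanced design.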
What to Do When Interactions Are Present
• When the test for interaction is significant, it is difficult to interpret
tests for main effects.
• Instead, we could analyze the data as a 1-factor model where each
cell is a treatment.
• That is, the new ’factor’ would have ab levels.
• The textbook calls this the cell means model. It is given by:
$Y_{ijk} = \mu + \tau_{ij} + \varepsilon_{ijk}$
where $\tau_{ij} = \alpha_i + \beta_j + \gamma_{ij}$.
• This would allow comparison of specific cells or combinations of A
and B levels.
Estimation of Main Effects
• Suppose the researchers are interested in estimating the average
response for level i of factor A: μ + αi .
• We have seen before that $\bar{Y}_{i..}$ is an unbiased estimator of μ + αi.
• To find a confidence interval for μ + αi, we need the variance of $\bar{Y}_{i..}$ …
• Further, $\bar{Y}_{i..}$ has a distribution that is $N(\mu + \alpha_i, \sigma^2/(br))$.
• We can use the MSE as the estimate of σ² since it is unbiased.
• The 100(1 − α)% confidence interval for μ + αi, the average response for level i
of factor A, is:
$\bar{Y}_{i..} \pm t_{\alpha/2}(ab(r-1)) \sqrt{\frac{MSE}{br}}$
where $t_{\alpha/2}(ab(r-1))$ is the upper α/2 percentile of the t-distribution with ab(r − 1) d.f.
• Confidence intervals for the mean response at level j of factor B can be found
in a similar manner.
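The variance needed for the interval above can be obtained in one line. This is a sketch that assumes the model stated earlier, in which the $Y_{ijk}$ are independent with common variance $\sigma^2$.

```latex
\begin{align*}
\operatorname{Var}(\bar{Y}_{i..})
  &= \operatorname{Var}\!\left( \frac{1}{br} \sum_{j=1}^{b}\sum_{k=1}^{r} Y_{ijk} \right)
   = \frac{1}{(br)^2} \cdot br \cdot \sigma^2
   = \frac{\sigma^2}{br}, \\
\frac{\bar{Y}_{i..} - (\mu + \alpha_i)}{\sqrt{MSE/(br)}}
  &\sim t\bigl(ab(r-1)\bigr) .
\end{align*}
```

Replacing $\sigma^2$ by MSE is what turns the normal pivot into a t pivot with the error degrees of freedom. For factor B the same argument gives $\operatorname{Var}(\bar{Y}_{.j.}) = \sigma^2/(ar)$.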
Contrasts in 2-Factor Design
• Recall that a treatment is any combination of a Factor A level with a
Factor B level.
• To compare specific treatments, use the cell means model defined earlier
(slide 13, "What to Do When Interactions Are Present"), and define contrasts of interest.
• For example, suppose a researcher plans to test the hypothesis that the
mean for cell 23 is the same as the mean for cell 34.
• The contrast of interest is μ23 − μ34, which equals 0 under this hypothesis and can be estimated by
$\bar{Y}_{23.} - \bar{Y}_{34.}$.
• Contrasts for the cell means model are done in the same way as those
for 1-factor model.
• The total number of orthogonal contrasts possible is ab − 1, which is
the number of treatments – 1.
• Generally, we write contrast and test it as follows…
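The slide leaves the general form to the lecture. One common way to do it in the balanced cell means model, given here as an assumption and not as the course's own derivation, is to estimate the contrast Σ c_ij μ_ij by Σ c_ij Ȳ_ij., form SS_C = (estimate)² / (Σ c_ij² / r), and compare F = SS_C / MSE with F(1, ab(r − 1)). The function name `contrast_f` and the data below are hypothetical.

```python
from statistics import mean

def contrast_f(cells, coefs):
    """Estimate and F ratio for one contrast among treatment means.

    cells: one list of r replicates per treatment (balanced design).
    coefs: one contrast coefficient per treatment, summing to zero.
    Returns (estimate, ss_contrast, f_ratio, df_error).
    """
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    r = len(cells[0])
    means = [mean(reps) for reps in cells]
    estimate = sum(c * m for c, m in zip(coefs, means))
    # Sum of squares for the contrast (1 degree of freedom).
    ss_contrast = estimate ** 2 / (sum(c ** 2 for c in coefs) / r)
    # Pooled within-cell variability gives the error mean square.
    sse = sum((v - m) ** 2 for reps, m in zip(cells, means) for v in reps)
    df_error = len(cells) * (r - 1)
    mse = sse / df_error
    return estimate, ss_contrast, ss_contrast / mse, df_error

# Hypothetical data: 4 treatments with r = 2 replicates each.
cells = [[10.0, 12.0], [14.0, 16.0], [18.0, 20.0], [11.0, 13.0]]
# Compare treatment 1 with treatment 2.
estimate, ss_c, f_ratio, df_error = contrast_f(cells, [1, -1, 0, 0])
print(estimate, ss_c, f_ratio, df_error)
```

The F ratio would then be referred to the F distribution with 1 and `df_error` degrees of freedom to obtain a P-value.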
Using Contrasts to Test Interactions
• We can use contrasts in the cell means model to test whether lines on
the interaction plots are parallel.
• For example, τ12 − τ22 is the difference in mean response between levels 1 and 2
of Factor A, when the level of Factor B is 2.
• If there was no interaction, then this change should be the same at all
levels of B.
• So we might be interested, for example, in the contrast
(τ12 − τ22) − (τ15 − τ25).
• More generally, we might be interested in the interaction contrast of
the form: (τij − τ(i+1)j) − (τik − τ(i+1)k).
• Using the fact that τij = αi + βj + γij the interaction contrast can be
shown to be equal to (γij − γ(i+1)j) − (γik − γ(i+1)k).
• In order to be an interaction contrast, the contrast must be of the
form $\sum_{i=1}^{a} \sum_{j=1}^{b} c_{ij} \tau_{ij}$ where:
$\sum_{i=1}^{a} c_{ij} = 0$ for all j, and $\sum_{j=1}^{b} c_{ij} = 0$ for all i.
• Note that this requirement is more specific than the requirement in
the general case of a contrast, which is only that $\sum_{i=1}^{a} \sum_{j=1}^{b} c_{ij} = 0$.
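The reduction of the interaction contrast to the γ terms can be written out as follows. This sketch uses only $\tau_{ij} = \alpha_i + \beta_j + \gamma_{ij}$ and, as on the slide, lets k denote a second level of factor B.

```latex
\begin{align*}
\tau_{ij} - \tau_{(i+1)j}
  &= (\alpha_i - \alpha_{i+1}) + (\gamma_{ij} - \gamma_{(i+1)j})
  && \text{($\beta_j$ cancels)}, \\
\tau_{ik} - \tau_{(i+1)k}
  &= (\alpha_i - \alpha_{i+1}) + (\gamma_{ik} - \gamma_{(i+1)k})
  && \text{($\beta_k$ cancels)}, \\
(\tau_{ij} - \tau_{(i+1)j}) - (\tau_{ik} - \tau_{(i+1)k})
  &= (\gamma_{ij} - \gamma_{(i+1)j}) - (\gamma_{ik} - \gamma_{(i+1)k})
  && \text{($\alpha_i - \alpha_{i+1}$ cancels)} .
\end{align*}
```

The coefficients of this contrast are +1, −1, −1, +1 on cells (i, j), (i+1, j), (i, k), (i+1, k), so they sum to zero within every row and every column, as the requirement above demands.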
Main Effects
• Although ANOVA can be used to test whether all levels of Factor A
have the same mean, it doesn’t indicate which of the a means are the
same and which ones differ.
• If no interaction was found, we could do pairwise comparisons as in the
1-factor case.
• The hypotheses concerning specific levels of Factor A may be of
interest to the researcher.
• Contrasts of the form $\sum_{i=1}^{a} c_i \mu_i$ can be used to conduct these tests, where
μi = μ + αi.
• We could also use the cell means model to construct contrasts for main
effects by using….
• Tests concerning levels of Factor B can be constructed in an
analogous manner by interchanging the roles of a and b in the
above, and setting μj = μ + βj …
Examples of Two-Factor Design
• Two examples below illustrate some aspects of analysis of two-factor
design.
• In both cases, equal number of experimental units was randomly
allocated to each combination of factor levels.
• The designs are similar, but there are differences in the hypotheses of
interest, and the steps taken in analysis.
• In the first example (the reaction time experiment, starting on slide 22) interaction is found not to be significant,
so tests concerning main effects can be made using ANOVA.
• In the second example (the battery lifetime study) interactions are significant, and the cell means
model is used to compare treatments of interest.
Example - Reaction Time Experiment
• Background: The experiment was described in week 3 lecture notes
(slides 15-17), where it was analyzed as single-factor experiment. The
data in fact arose from 2-factor experiment, & here 2-factor analysis is
carried out.
• Goal: Subjects must press computer key after being given stimulus.
Subjects were warned that stimulus is coming by either auditory or
visual cue. Time between cue and stimulus, also of interest, was 5, 10,
or 15 seconds. Response measure was time from stimulus to pressing
computer key. Goal of experiment was to determine whether type of
cue, or time between cue and stimulus had effect on response time.
• Other Aspects of the Design: 3 subjects were randomly allocated to
each of 6 possible combinations of cue type and time between cue and
stimulus.
• The Data: Response times were measured in seconds and are
presented in the following table.
• Analysis: The goal of the study is to determine whether either of the
2 factors has an effect on the response time. However, the first step
needs to be analysis of interaction effect. If interactions are present,
the test for main effects is not straightforward to interpret.
• Plot the Means: Visual inspection is a useful first step in
determining whether there is an interaction between type of cue and
time between cue and stimulus. The plot is given on the next slide.
Although lines aren’t quite parallel, departure from parallel doesn’t
appear to be too great.
• ANOVA Table: The next step is to test whether interaction effects
are significant. For this we first construct the ANOVA table. It is
given on slide 26.
Example - Battery Lifetime Study
• The source of this example is: Montgomery, Section 6.3.1.
• Background: Engineer designing battery for use in device that will be
subjected to some extreme temperatures. Three possible plate materials
for battery will be studied at 15˚F, 70˚F, and 125˚F. Outcome of interest
is lifetime of battery (in hours).
• Goal: Engineer wants to answer the following questions:
1. What effects do material type and temperature have on lifetime of battery?
2. Is there a choice of material that would give uniformly long life regardless
of temperature?
3. Past experience leads engineer to believe that all materials will have same
mean lifetime at 15˚F, & that this mean will be the same as that for
material 3 at 70˚F. Do the data support this?
• Sample Size/Randomization: 4 randomly selected batteries of each material
will be studied at each of the 3 temperatures of interest.
• Data: The data are given in the following table:
• Plot the Means: The plot of means can help understand effects of
material type and temperature on battery lifetime. It is given below:
• From the plot there appears to be a large interaction between material and
temperature.
• Generally, lifetimes are longest at lowest temperature for all
materials.
• Changing from low to intermediate temperature, battery life with
material 3 increases, while it decreases for materials 1 and 2.
• From intermediate to high temperature, mean lifetime decreases for
materials 2 and 3 but is unchanged for material 1.
• Material 3 seems to give the best results in terms of consistent
lifetimes across temperatures.
• ANOVA: The ANOVA table is given below.
• As we can see, the ANOVA confirms that interaction between
material and temperature is significant.
Cell Means Model
• In order to answer the last of engineer’s questions, need to fit a cell means
model and use contrasts.
• To fit cell means model, recode the treatments as follows:
- 11, 12, 13 correspond to material 1 at temperatures 15˚F, 70˚F, and 125˚F
- 21, 22, 23 correspond to material 2 at temperatures 15˚F, 70˚F, and 125˚F
- 31, 32, 33 correspond to material 3 at temperatures 15˚F, 70˚F, and 125˚F
• The model is now a 1-factor model with 9 treatments: Yijk = μ + τij + εijk .
• To test hypotheses for question 3, we can use the set of contrasts that are
given in the following table
• Are these contrasts orthogonal?
• To answer the engineer's question, we create additional rows in the ANOVA table. It
is given below.
• Note that this isn’t a complete set of orthogonal contrasts, so their sums of
squares won’t sum to SSTreatment.
• Since none of these contrasts is significant, the data don’t provide any
evidence against the engineer’s belief that all materials will have
same mean lifetime at 15˚F, & that this mean will be same as that for
material 3 at 70˚F.
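The orthogonality question raised above can be checked mechanically. The Python sketch below uses the three contrasts from the SAS code for this example at the end of these notes, with treatments ordered 11, 12, 13, 21, 22, 23, 31, 32, 33. One assumption is made: the coefficient on treatment 32 in the third contrast is taken to be −3, so that the coefficients sum to zero as a contrast requires. The helper names are hypothetical.

```python
from itertools import combinations

# Coefficients over treatments 11, 12, 13, 21, 22, 23, 31, 32, 33.
contrasts = {
    "15F M1 vs M2": [1, 0, 0, -1, 0, 0, 0, 0, 0],
    "15F M1 & M2 vs M3": [1, 0, 0, 1, 0, 0, -2, 0, 0],
    # Assumption: -3 on treatment 32 so that the coefficients sum to zero.
    "15F M1,M2,M3 vs 70F M3": [1, 0, 0, 1, 0, 0, 1, -3, 0],
}

def is_contrast(coefs):
    """A contrast needs coefficients that sum to zero."""
    return sum(coefs) == 0

def orthogonal(c, d):
    """With equal replication, two contrasts are orthogonal when the
    dot product of their coefficient vectors is zero."""
    return sum(x * y for x, y in zip(c, d)) == 0

all_valid = all(is_contrast(c) for c in contrasts.values())
pairwise = {
    (p, q): orthogonal(contrasts[p], contrasts[q])
    for p, q in combinations(contrasts, 2)
}
print(all_valid)
for pair, result in pairwise.items():
    print(pair, result)
```

Under that assumption every pair is orthogonal, but three contrasts are fewer than the ab − 1 = 8 needed for a complete orthogonal set.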
Unbalanced Design
• So far only balanced design has been considered.
• Case where not all rij are equal can also be handled.
• The expressions for sums of squares must be adjusted.
• The degrees of freedom for A, B, and A × B stay the same as for
balanced design.
• The degrees of freedom for the error and the total must be adjusted as
follows:
total degrees of freedom = n − 1
error degrees of freedom = (n − 1) − (a − 1) − (b − 1) − (a − 1)(b − 1) = n − ab
Special Case: Model with No Interaction Terms
• Usually the two-factor model includes interaction terms.
• In some cases researchers might know from past experience that factors
being studied have no interaction effects when used together.
• In such a case, it is OK to use model with no interaction terms:
Yijk = μ + αi + βj + εijk.
• Since only main effects are included in the model, it is known as the main-effects model.
• In balanced design, the degrees of freedom for A, B, and total are as for
model with interaction.
• However, degrees of freedom that would have been used to estimate
interaction can now be used to estimate experimental error.
• Therefore, the degrees of freedom for the error can be found by
subtraction. That is,
error degrees of freedom = (n − 1) − (a − 1) − (b − 1) = n − a − b + 1.
• The expressions for sums of squares for A, B, and total are the same as
for the model with interaction.
• The SSE is found by subtraction.
• The ANOVA Table for Main-Effects Model is given below.
Special Case: One Observation per Cell
• In some cases it is not feasible to study more than one experimental
unit under each set of conditions.
• In this case, the result is a 2-factor experiment with a single replicate.
• The statistical model in this case is: Yij = μ + αi + βj + γij + εij.
• By examining expected mean squares (as was done earlier) we can see
that σ² is not estimable.
• The interaction effect γij and the experimental error can’t be separated.
• As a result, there is no way to construct tests about main effects unless
the interaction effect is 0.
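Setting r = 1 in the degrees of freedom and expected mean squares from the earlier slides shows why. This is a sketch using only those earlier formulas.

```latex
\begin{align*}
\text{df}_E &= ab(r-1) = 0 \quad \text{when } r = 1, \\
E(MS_{AB}) &= \sigma^2 + \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} \gamma_{ij}^2}{(a-1)(b-1)}
  \quad \text{when } r = 1 .
\end{align*}
```

With no error degrees of freedom there is no MSE, and $MS_{AB}$ has expectation $\sigma^2$ only when every $\gamma_{ij} = 0$.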
• If reasonable to assume no interaction, then could use main-effects
model: Yij = μ + αi + βj + εij.
• For this situation, σ² can be estimated.
• The main effects can be tested by comparing MSA (or MSB) to MSE.
• The ANOVA table for this case is given below
Two-Factor Design in SAS
• Fitting full 2-factor design model using PROC GLM in SAS is done as follows:
proc glm data = mydata ;
class factorA factorB ;
model response = factorA factorB factorA*factorB ;
run ;
• Interaction term is denoted by factorA*factorB.
• To fit a model without interaction, leave this term out.
• To use contrasts to test a hypothesis concerning Factor A, say that the 1st level has
the same mean as the 2nd level, specify the contrast with a contrast
statement (assuming that Factor A has 5 levels):
proc glm data = mydata ;
class factorA factorB ;
model response = factorA factorB factorA*factorB ;
/* one coefficient per level of factorA; coefficients sum to zero */
contrast 'Level 1 vs Level 2' factorA 1 -1 0 0 0 ;
run ;
SAS Code Used in Reaction Time Example
• The following code creates the dataset.
data reaction ;
input cue $ cstime reaction ;
cards ;
Auditory 5 0.204
Auditory 5 0.170
Auditory 5 0.181
Auditory 10 0.167
.....
Visual 15 0.281
Visual 15 0.258
;
run ;
• The following code is used in order to get cell means to use in plot.
proc summary data = reaction nway ;
class cue cstime ;
var reaction ;
output out = reaction2 (drop = _type_ _freq_)
mean = reaction ;
run ;
• The following code is used to produce the plot of cell means.
proc gplot data = reaction2 ;
plot reaction * cstime = cue ;
label cue = 'Type of Cue' ;
run ;
• The following code is used to fit the 2-factor model.
proc glm data = reaction ;
class cue cstime ;
model reaction = cue cstime cue*cstime ;
run ;
SAS Code Used in Battery Example
• The following code creates the dataset.
data battery ;
input material temperature lifetime ;
cards ;
1 15 130
1 15 155
1 15 74
1 15 180
1 70 34
....
3 125 82
3 125 60
;
run ;
• The following code is used to get cell means for plotting
proc summary data = battery nway ;
class material temperature ;
var lifetime ;
output out = battery2 (drop = _type_ _freq_)
mean = lifetime ;
run ;
• The following code is used to produce the plot of cell means
proc gplot data = battery2 ;
plot lifetime * temperature = material ;
label material = 'Material Type' ;
run ;
• The following code is used to fit a model with interaction
proc glm data = battery ;
class material temperature ;
model lifetime = material | temperature ;
run ;
• The following code is used to recode data for cell means model.
data recode ;
set battery ;
treatment = 10 * material + (temperature-15)/55 + 1 ;
run ;
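To see what the recode formula produces, the following Python sketch mirrors the SAS expression for the nine material and temperature combinations. It assumes the temperatures are exactly 15, 70 and 125, so integer division stands in for the SAS division; the function name `recode` is hypothetical.

```python
def recode(material, temperature):
    """Mirror of the SAS recode 10 * material + (temperature - 15) / 55 + 1.

    Assumes temperature is one of 15, 70, 125, so the division is exact.
    """
    return 10 * material + (temperature - 15) // 55 + 1

labels = [recode(m, t) for m in (1, 2, 3) for t in (15, 70, 125)]
print(labels)  # [11, 12, 13, 21, 22, 23, 31, 32, 33]
```

The first digit of each label is the material and the second is the temperature level, which is the ordering the contrast coefficients in the next step rely on.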
• The following code is used to fit cell means model & get contrasts.
proc glm data = recode ;
class treatment ;
model lifetime = treatment ;
/* coefficients follow treatment order 11 12 13 21 22 23 31 32 33 */
contrast '15F M1 vs M2' treatment 1 0 0 -1 0 0 0 0 0 ;
contrast '15F M1 & M2 vs M3' treatment 1 0 0 1 0 0 -2 0 0 ;
contrast '15F M1,M2,M3 vs 70F M3' treatment 1 0 0 1 0 0 1 -3 0 ;
run ;