The Mixed Effects Model

The Mixed Effects Model - Introduction
• In many situations, one of the factors of interest will have its levels
chosen because they are of specific interest to the researcher.
• On the other hand, there may be a second factor of interest for which
it is important to generalize to all possible levels; in this case the
levels of the second factor might be chosen at random.
• This type of experiment is referred to as a mixed study design
because one of the factors has fixed levels while the other has
random levels.
STA305 Week 9
The Mixed Effects Model
• Suppose that in a given study, a levels of factor A have been chosen
because they are of particular interest. Further, suppose that b levels of
factor B have been chosen at random from all possible levels of this
factor.
• A total of abr experimental units will be used to conduct the study;
r units will be randomly allocated to each of the ab experimental
conditions.
• The form of the statistical model that we will study is identical to that
for 2 factor studies with either fixed or random effects - the difference
is in the assumptions about the factor levels and the interactions.
• The model equation is Yijk = μ + αi + βj + γij + εijk
• As with the fixed and random effects models, we parameterize the
model in such a way that μ is the overall mean of all of the responses:
i.e. EY    .
Assumptions of the Mixed Effects Model
• As in the fixed effects model, the factor A has fixed effects and we
therefore require that $\sum_{i=1}^{a} \alpha_i = 0$.
• Since the levels of factor B have been chosen at random, we require
instead that $\beta_j \sim N(0, \sigma_B^2)$.
• Since one of the factors is random, the interactions must be random
as well; however, since one of the factors is fixed, the sum over that
component will be 0. Together these yield 2 constraints on the γij:
$\gamma_{ij} \sim N\left(0, \frac{a-1}{a}\sigma_{AB}^2\right)$ and $\sum_{i=1}^{a} \gamma_{ij} = 0$ for each j.
• The factor (a-1)/a is for convenience in expressing EMS only.
Sums of Squares
• The observed variation in the data is measured in the same manner as for
the fixed effects and the random effects cases.
• In other words, the total variation in the data is measured by
$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2$
• The sums of squares and the degrees of freedom for the other sources
of variation are also the same as in the 2-factor study with fixed or
random effects model.
• The only difference is in the expected mean squares.
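As an illustration of how these sums of squares are computed, here is a minimal sketch in plain Python for a balanced two-factor layout. The data values and the variable names are hypothetical and chosen only for illustration; the formulas are the usual balanced-design ones, which are the same whether the factors are fixed, random, or mixed.

```python
# Hypothetical data: y[i][j] holds the r replicate responses observed at
# level i of factor A and level j of factor B (here a = 2, b = 3, r = 2).
y = [
    [[10.0, 12.0], [14.0, 15.0], [9.0, 11.0]],
    [[13.0, 15.0], [18.0, 17.0], [10.0, 14.0]],
]
a, b, r = len(y), len(y[0]), len(y[0][0])


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Grand mean, marginal means for A and B, and cell means.
grand = mean(v for row in y for cell in row for v in cell)
mean_a = [mean(v for cell in y[i] for v in cell) for i in range(a)]
mean_b = [mean(v for i in range(a) for v in y[i][j]) for j in range(b)]
mean_ab = [[mean(y[i][j]) for j in range(b)] for i in range(a)]

# Sums of squares for each source of variation.
ss_a = b * r * sum((m - grand) ** 2 for m in mean_a)
ss_b = a * r * sum((m - grand) ** 2 for m in mean_b)
ss_ab = r * sum(
    (mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
ss_e = sum(
    (v - mean_ab[i][j]) ** 2
    for i in range(a) for j in range(b) for v in y[i][j]
)
ss_t = sum((v - grand) ** 2 for row in y for cell in row for v in cell)

# Degrees of freedom and mean squares.
df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (r - 1)}
ms = {
    "A": ss_a / df["A"],
    "B": ss_b / df["B"],
    "AB": ss_ab / df["AB"],
    "E": ss_e / df["E"],
}

# In a balanced design the component sums of squares add up to SST.
print(ss_t, ss_a + ss_b + ss_ab + ss_e)
```

The mean squares in `ms` are the quantities whose expectations change between the fixed, random, and mixed models; the arithmetic above does not.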
Expected Mean Squares
• The expected mean squares are as follows:
$E(MS_A) = \sigma^2 + r\sigma_{AB}^2 + \frac{br}{a-1} \sum_{i=1}^{a} \alpha_i^2$
$E(MS_B) = \sigma^2 + ar\sigma_B^2$
$E(MS_{A \times B}) = \sigma^2 + r\sigma_{AB}^2$
$E(MS_E) = \sigma^2$
Hypothesis Testing
• As in all of the other experimental designs that we have looked at,
the motivation for the test statistics is derived from the EMS.
• As in the case of both the fixed and random effects models, the test
for interactions will be made by comparing MSA×B to MSE.
• The test for the fixed factor, factor A, will be made by comparing
MSA to MSA×B.
• The test for the random effect, factor B, will be made by comparing
MSB to MSE.
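The three comparisons above can be sketched as a small Python helper. The function names and the mean square values are hypothetical; the choice of denominators follows the tests just described (A against the interaction, B and the interaction against error).

```python
def mixed_model_f_ratios(ms_a, ms_b, ms_ab, ms_e):
    """F statistics for the two-factor mixed model (A fixed, B random)."""
    return {
        "A": ms_a / ms_ab,   # fixed factor is tested against the interaction
        "B": ms_b / ms_e,    # random factor is tested against error
        "AB": ms_ab / ms_e,  # interaction is tested against error
    }


def f_degrees_of_freedom(a, b, r):
    """(numerator, denominator) degrees of freedom for each F test."""
    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_e = a * b * (r - 1)
    return {"A": (df_a, df_ab), "B": (df_b, df_e), "AB": (df_ab, df_e)}


# Hypothetical mean squares from a study with a = 3, b = 4, r = 2.
f = mixed_model_f_ratios(ms_a=60.0, ms_b=30.0, ms_ab=6.0, ms_e=2.0)
dof = f_degrees_of_freedom(a=3, b=4, r=2)
for source in ("A", "B", "AB"):
    print(source, f[source], dof[source])
```

Each ratio would then be compared with the F distribution on the listed degrees of freedom.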
The ANOVA Table
• It is useful to add the expected mean squares to the table in order to
remember which ratios to form for the F-tests.
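• The ANOVA table is given below:
Source   df           MS      EMS                               F
A        a-1          MSA     σ² + rσ²AB + brΣαi²/(a-1)         MSA / MSA×B
B        b-1          MSB     σ² + arσ²B                        MSB / MSE
A×B      (a-1)(b-1)   MSA×B   σ² + rσ²AB                        MSA×B / MSE
Error    ab(r-1)      MSE     σ²
Total    abr-1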
Estimating the Model Parameters
• The effect for the levels of the fixed factor can be estimated as in the fixed
effects model. That is, $\hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}$.
• In the mixed model, however, confidence intervals for the effects of the
levels of the fixed factor are constructed using MSA×B as the variance
estimate. That is, a CI for the effect of the ith level of factor A is:
$\hat{\alpha}_i \pm t_{\alpha/2;\,(a-1)(b-1)} \sqrt{\frac{MS_{A \times B}}{br}}$
• Orthogonal contrasts can also be used to make inferences about the levels
of factor A.
• The mixed effects model also contains components of variation and these
can be estimated as follows:
$\hat{\sigma}_B^2 = \frac{MS_B - MS_E}{ar}$, $\hat{\sigma}_{AB}^2 = \frac{MS_{A \times B} - MS_E}{r}$
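The variance component estimates can be sketched in Python as follows. The function names and the numerical inputs are hypothetical; the formulas are the ones given above, and the standard error is the square-root term used in the confidence interval for a fixed effect.

```python
import math


def variance_components(ms_b, ms_ab, ms_e, a, r):
    """Estimates of the error, B, and A x B variance components."""
    return {
        "sigma2": ms_e,
        "sigma2_B": (ms_b - ms_e) / (a * r),
        "sigma2_AB": (ms_ab - ms_e) / r,
    }


def fixed_effect_standard_error(ms_ab, b, r):
    """Square-root term in the CI for a fixed effect: sqrt(MS_AxB / (b r))."""
    return math.sqrt(ms_ab / (b * r))


# Hypothetical mean squares from a study with a = 3, b = 4, r = 2.
estimates = variance_components(ms_b=30.0, ms_ab=6.0, ms_e=2.0, a=3, r=2)
se = fixed_effect_standard_error(ms_ab=6.0, b=4, r=2)
print(estimates, se)
```

Because these estimates are differences of mean squares, they can come out negative in a given sample; a common convention is to report such an estimate as zero.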
Random & Mixed Effects Using SAS – Example
• Background: the goal of this study is to investigate the capacity of
a measurement system.
• Design: 10 parts are randomly selected; 2 operators are randomly
selected to measure each part 3 times.
• The statements required to conduct analysis in SAS are as follows:
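proc glm data = measurement ;
    class part operator ;
    model measure = part | operator ;
    /* both factors are random, so their interaction is random as well */
    random part operator part*operator ;
    /* test each main effect against the interaction mean square */
    test h=part e=part*operator ;
    test h=operator e=part*operator ;
run ;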
• Suppose that parts were fixed and operators were random.
• The SAS code would be as follows:
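proc glm data = measurement ;
    class part operator ;
    model measure = part | operator ;
    /* part is fixed here; only operator and the interaction are random */
    random operator part*operator ;
    /* test the fixed factor against the interaction mean square */
    test h=part e=part*operator ;
run ;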
• The ANOVA would look the same as above.
• The fixed factor “part” would be tested against the interaction.
• The (random) factor “operator” and the interaction “part×operator” would be
tested against the error term, which can be read from the ANOVA table.
Three-Factor Fixed Effects Design
• Suppose that in a particular experiment, there are 3 factors that are of
interest to the researcher.
• Assume that there are a levels of Factor A, b levels of Factor B, and c
levels of Factor C.
• In this case, the researcher must also be concerned with interactions
between all 3 factors: A×B, A×C, B×C, and A×B×C.
• The model that we will use in this case is
Yijkl = μ + αi + βj + γk + (αβ)ij + (αγ)ik + (βγ)jk + (αβγ)ijk + εijkl.
• In this notation, the interaction terms are denoted by, for instance, (αβ)ij.
• This notation is used to avoid introducing more Greek letters, and does
not mean that the interaction between αi and βj is αiβj.
Model Assumptions
• The assumptions about the parameters are similar to those for the 2-factor fixed effects model.
• We assume the following:
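By analogy with the 2-factor fixed effects model, the usual sum-to-zero parameterization would look like the following. This is a sketch of the standard constraints, stated here as an assumption rather than as a reproduction of the slide; each interaction is assumed to sum to zero over every one of its indices.

```latex
\begin{aligned}
&\sum_{i=1}^{a} \alpha_i = 0, \qquad \sum_{j=1}^{b} \beta_j = 0, \qquad \sum_{k=1}^{c} \gamma_k = 0, \\
&\sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0, \qquad
 \sum_{i=1}^{a} (\alpha\gamma)_{ik} = \sum_{k=1}^{c} (\alpha\gamma)_{ik} = 0, \qquad
 \sum_{j=1}^{b} (\beta\gamma)_{jk} = \sum_{k=1}^{c} (\beta\gamma)_{jk} = 0, \\
&\sum_{i=1}^{a} (\alpha\beta\gamma)_{ijk} = \sum_{j=1}^{b} (\alpha\beta\gamma)_{ijk} = \sum_{k=1}^{c} (\alpha\beta\gamma)_{ijk} = 0, \\
&\varepsilon_{ijkl} \sim N(0, \sigma^2) \ \text{independently.}
\end{aligned}
```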
Sums of Squares and ANOVA Table
Blocking - Introduction
• In general, the goal of experimental design is to minimize
haphazard variability and to be able to see differences between
treatments.
• In some situations, a variable might have an impact on the response
even though it is not the focus of the study, and we would generally
wish to exclude it from the design.
• Such variables are called nuisance factors.
• The purpose of randomization is to average out the impact of these
nuisance factors.
• In some cases, the nuisance factors may be both unknown and
uncontrollable, in which case randomization is especially useful.
• In other cases, factors which influence response might be known,
but possibly uncontrollable.
• Although such factors cannot be included in the design, we can at
least observe their value.
• The analysis can then be adjusted to compensate for the effect of
these variables using an analysis of covariance (to be discussed
later in the course).
• In other situations, a nuisance factor may be both known and
controllable, in which case we can reduce the risk of haphazard error
by including this factor in the design of the experiment.
• Designs of this type, called blocked designs, can be used to reduce
the variability of experimental error in such cases.
Example
• A fleet manager wishes to consider 4 brands of tires to determine
which has the least tread wear after 20,000 miles.
• Since there are 4 brands of tires to test, the study should ideally
include at least 4 cars.
• Denote tire brands by T1, T2, T3, T4 and the cars by C1, C2, C3, C4.
• One possible way to design the study is to randomly decide which
car gets which brand of tire.
• Each car would then have 4 tires of the same brand.
• However, if there is a difference between cars with respect to the
wear they cause on the tires, then this design will not allow us to
detect a difference between brands.
• Although differences between cars are not of primary interest, they
need to be taken into account.
• One possible way around this is to randomly assign the 16 tires (4 of
each type) to the 4 cars.
• The following allocation of tires to cars might result from such a
randomization:
• However, the goal of the design was to eliminate the confounding of tire
effects with car differences, and this goal has not been met here.
• For example, brand T1 is not used on car C3, brand T2 is not used on car C1,
and brand T4 is not used on car C2.
• So we need to ensure that there is no confounding and that random error
does not include differences between cars.
• This could be accomplished by restricting randomization so that each car
must have one tire of each brand. That is, randomize the location of tires
within each car.
• An example of such a randomization scheme is as follows:
• This design is known as a randomized complete block design.
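The restricted randomization described above can be sketched in Python as follows. The wheel position names, the function name, and the seed are hypothetical; the point is only that the brands are shuffled separately within each car, so every car carries one tire of each brand.

```python
import random

BRANDS = ["T1", "T2", "T3", "T4"]
CARS = ["C1", "C2", "C3", "C4"]
# Wheel position names are hypothetical labels for the 4 units within a block.
POSITIONS = ["front-left", "front-right", "rear-left", "rear-right"]


def randomize_rcbd(brands, cars, positions, seed=None):
    """Randomly order the brands within each car (block)."""
    rng = random.Random(seed)
    layout = {}
    for car in cars:
        order = brands[:]       # copy so the caller's list is not modified
        rng.shuffle(order)      # a separate randomization for every block
        layout[car] = dict(zip(positions, order))
    return layout


layout = randomize_rcbd(BRANDS, CARS, POSITIONS, seed=305)
for car, assignment in layout.items():
    print(car, assignment)
```

Passing a seed makes the allocation reproducible; omitting it gives a fresh randomization on each run.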
Randomized Complete Block Design
• A randomized complete block design is a restricted randomization.
• Experimental units are first organized into homogeneous groups called
blocks.
• Treatments are then randomly allocated within each block.
• In the example above, cars were contributing to variation but were not
of primary interest.
• The fact that each car requires 4 tires means that the 4 tires on one car
form a natural blocking unit.
• The purpose of blocking is to ensure that experimental units within a
block are as homogeneous as possible with respect to the response
variable.
• Units in different blocks are more heterogeneous.
Advantages & Disadvantages
• Using blocks allows us to control a factor not of primary interest.
• However, it requires that there be enough experimental units to
ensure that each treatment can be used within each block.
• Further, it requires the researcher to assume that there is no
interaction between blocks and treatments.
• Since block effects must be estimated in addition to treatment
effects, the degrees of freedom available for estimating error are
reduced.
Special Case: Paired t-test
• The simplest example of a randomized complete block design is a
paired t-test.
• In this case there are 2 treatments to be studied, and each treatment is
applied once within each pair (block).
• For example, the two twins in a pair might be randomly allocated to the 2
treatments.
• Or 2 treatments might be randomly allocated to left and right eyes,
lungs, kidneys, hands, etc.
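A minimal sketch of the paired t statistic in Python, working on the within-pair differences. The function name and the measurements are hypothetical and serve only to illustrate the calculation.

```python
import math


def paired_t_statistic(treatment_1, treatment_2):
    """Return (t, degrees of freedom) computed from within-pair differences."""
    diffs = [x - y for x, y in zip(treatment_1, treatment_2)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (divisor n - 1).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical responses for 5 pairs (e.g. left and right eyes).
left = [5.1, 4.8, 6.0, 5.5, 5.9]
right = [4.7, 4.9, 5.4, 5.0, 5.6]
t_stat, dof = paired_t_statistic(left, right)
print(t_stat, dof)
```

With 2 treatments, the square of this statistic equals the F statistic for treatments in the corresponding randomized complete block analysis, with each pair acting as a block.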
General Case: Two or More Treatments
• Consider the case where there is one factor which will be studied for
its effect on the response variable. Suppose that the number of levels
of that factor is a.
• Further, suppose that it is known that there is a nuisance factor
which can be controlled, and that this factor will be used to form
blocks.
• Let b denote the number of blocks to be used in the experiment.
• The order in which the treatments will be allocated within blocks is
randomized.
• The total number of experimental units required to conduct this
experiment is N = ab.
The Model
• We will use the following statistical model to express the response
in terms of the treatment and block effects: Yij = μ + τi + βj + εij,
• where:
μ is the overall mean,
τi is the effect of the i-th treatment,
βj is the effect of the j-th block, and
εij is the residual or random error term.
• It is possible that either or both of the treatments and blocks could
be randomly chosen. But, for now we assume that both are fixed.
Assumptions
• As before, we will assume that εij ~ N(0, σ²) and that the εij are
independent of each other.
• Treatment and block effects are defined as deviations from the
overall mean.
• Therefore, we require that $\sum_{i=1}^{a} \tau_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$.
Sources of Variation
• When considered as a whole, the data from all treatment groups and
all blocks will contain a certain amount of variability.
• Some of the variability might be due to the fact that the treatments
have different effects on the response.
• Similarly, some of the variability might be due to the fact that blocks
are quite heterogeneous with regard to the response.
• Finally, even if there were no treatment or block differences, there
would still be chance variation.
• The total sum of squares is a measure of the overall variability in the
sample, and it can be decomposed to allow us to determine how
much variability is due to each source…
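As a sketch of this decomposition, the following Python computes the total, treatment, block, and error sums of squares for a randomized complete block layout. The data and variable names are hypothetical, and the split assumes the additive model Yij = μ + τi + βj + εij with one observation per treatment in each block.

```python
# Hypothetical responses: rows are treatments (a = 4), columns are blocks
# (b = 4), with one observation per treatment in each block.
y = [
    [17.0, 14.0, 12.0, 13.0],
    [14.0, 14.0, 12.0, 11.0],
    [12.0, 12.0, 10.0, 9.0],
    [13.0, 11.0, 9.0, 8.0],
]
a, b = len(y), len(y[0])

# Grand mean, treatment means, and block means.
grand = sum(sum(row) for row in y) / (a * b)
trt_mean = [sum(row) / b for row in y]
blk_mean = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

# Total variability and its three components.
ss_total = sum((y[i][j] - grand) ** 2 for i in range(a) for j in range(b))
ss_trt = b * sum((m - grand) ** 2 for m in trt_mean)
ss_blk = a * sum((m - grand) ** 2 for m in blk_mean)
ss_err = sum(
    (y[i][j] - trt_mean[i] - blk_mean[j] + grand) ** 2
    for i in range(a) for j in range(b)
)

# Degrees of freedom for treatments, blocks, and error.
df_trt, df_blk, df_err = a - 1, b - 1, (a - 1) * (b - 1)

print(ss_total, ss_trt + ss_blk + ss_err)
```

The two printed values agree up to rounding, which is the sense in which the total variability is split among treatments, blocks, and chance variation.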