IE341: Introduction to Design of Experiments
Last term we talked about testing the difference
between two independent means. For means from a
normal population, the test statistic is
$$t = \frac{\bar{X}_A - \bar{X}_B}{s_{\text{diff}}} = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$
where the denominator is the estimated standard
deviation of the difference between two independent
means. This denominator represents the random
variation to be expected with two different samples.
Only if the difference between the sample means is
much greater than the expected random variation do
we declare the means different.
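As a quick illustration, here is a minimal sketch of this computation in Python; the sample values and variable names are made up for illustration, and scipy's Welch test is shown only as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical samples from groups A and B (illustrative values only)
x_a = np.array([12.1, 11.4, 13.0, 12.6, 11.9])
x_b = np.array([10.8, 11.1, 10.2, 11.5, 10.9])

# Estimated standard deviation of the difference between the two means
s_diff = np.sqrt(x_a.var(ddof=1) / len(x_a) + x_b.var(ddof=1) / len(x_b))

t = (x_a.mean() - x_b.mean()) / s_diff
print(f"t = {t:.3f}")

# Cross-check: scipy's unequal-variance (Welch) t-test uses the same denominator
print(stats.ttest_ind(x_a, x_b, equal_var=False))
```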
We also covered the case where the
two means are not independent, and
what we must do to account for the fact
that they are dependent.
And finally, we talked about the difference
between two variances, where we used the F
ratio. The F distribution is the ratio of two independent chi-square variables, each divided by its degrees of freedom. So if s₁² and s₂² possess independent chi-square distributions with v₁ and v₂ df, respectively, then
$$F = \frac{s_1^2 / v_1}{s_2^2 / v_2}$$
has the F distribution with v₁ and v₂ df.
All of this is valuable if we are testing only two means. But
what if we want to test to see if there is a difference among
three means, or four, or ten?
What if we want to know whether fertilizer A or fertilizer B or
fertilizer C is best? In this case, fertilizer is called a factor,
which is the condition under test.
A, B, C, the three types of fertilizer under test, are called levels
of the factor fertilizer.
Or what if we want to know if treatment A or treatment B or
treatment C or treatment D is best? In this case, treatment is
called a factor.
A,B,C,D, the four types of treatment under test, are called levels
of the factor treatment.
It should be noted that the factor may be quantitative or
qualitative.
Enter the analysis of variance!
ANOVA, as it is usually called, is a way to test the
differences between means in such situations.
Previously, we tested single-factor experiments with
only two treatment levels. These experiments are
called single-factor because there is only one factor
under test. Single-factor experiments are more
commonly called one-way experiments.
Now we move to single-factor experiments with more
than two treatment levels.
Let’s start with some notation.
Yij = the ith observation in the jth level

N = total number of experimental observations

Ȳ = the grand mean of all N experimental observations:
$$\bar{Y} = \frac{\sum_{j=1}^{L}\sum_{i=1}^{n_j} Y_{ij}}{N}$$

Ȳj = the mean of the observations in the jth level:
$$\bar{Y}_j = \frac{\sum_{i=1}^{n_j} Y_{ij}}{n_j}$$

nj = number of observations in the jth level; the nj are called replicates.
Replication of the design refers to using more than one experimental unit for each
level.
If there are the same number n replicates for each treatment, the design is said to be
balanced.
Designs are more powerful if they are
balanced, but balance is not always
possible.
Suppose you are doing an experiment
and the equipment breaks down on one
of the tests. Now, not by design but by
circumstance, you have unequal
numbers of replicates for the levels.
In all the formulas, we use nj as the
number of replicates in treatment j, not a
common n, so unequal replication poses no problem.
Notation continued
τj = the effect of the jth level:
$$\tau_j = \bar{Y}_j - \bar{Y}$$

L = number of treatment levels

eij = the "error" associated with the ith observation in the jth level, assumed to be independent, normally distributed random variables with mean 0 and variance σ², constant for all levels of the factor.
For all experiments, randomization is
critical. So to draw any conclusions
from the experiment, we must require
that the treatments be applied in
random order.
We must also assign the experimental
units to the treatments randomly.
If all this randomization occurs, the
design is called a completely
randomized design.
ANOVA begins with a linear statistical model
$$Y_{ij} = \bar{Y} + \tau_j + e_{ij}$$
This model is for a one-way or single-factor ANOVA. The goal of the model is to test hypotheses about the treatment effects and to estimate them.
If the treatments have been selected by
the experimenter, the model is called a
fixed-effects model. In this case, the
conclusions will apply only to the
treatments under consideration.
Another type of model is the random
effects model or components of
variance model.
In this situation, the treatments used
are a random sample from a large
population of treatments. Here the τj
are random variables and we are
interested in their variability, not in the
differences among the means being
tested.
First, we will talk about fixed effects,
completely randomized, balanced models.
In the model we showed earlier, the τj are defined as deviations from the grand mean, so
$$\sum_{j=1}^{L} \tau_j = 0$$
It follows that the mean of the jth treatment is
$$\bar{Y}_j = \bar{Y} + \tau_j$$
Now the hypothesis under test is:
Ho: μ1 = μ2 = μ3 = … = μL
Ha: μj ≠ μk for at least one j, k pair
The test procedure is ANOVA, which is a decomposition of the total sum of squares into its component parts according to the model.
The total SS is
$$SS_{\text{Total}} = \sum_{j=1}^{L}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2$$
and ANOVA is about dividing it into its component parts.
SS = variability of the differences
among the L levels
L
SS   n j (Y j  Y ) 2
j 1
SSε = pooled variability of the random error
within levels
L
nj
SS error   (Yij  Y j ) 2
j 1 i 1
This is easy to see because
$$\sum_{j=1}^{L}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y})^2 = \sum_{j=1}^{L}\sum_{i=1}^{n_j} \left[(\bar{Y}_j - \bar{Y}) + (Y_{ij} - \bar{Y}_j)\right]^2$$
$$= \sum_{j=1}^{L} n_j (\bar{Y}_j - \bar{Y})^2 + \sum_{j=1}^{L}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 + 2\sum_{j=1}^{L}\sum_{i=1}^{n_j} (\bar{Y}_j - \bar{Y})(Y_{ij} - \bar{Y}_j)$$
But the cross-product term vanishes because
$$\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j) = 0$$
So
$$SS_{\text{total}} = SS_{\text{treatments}} + SS_{\text{error}}$$
Most of the time, this is written as
$$SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$$
Each of these terms becomes an MS (mean
square) term when divided by the appropriate
df.
$$MS_{\text{treatments}} = \frac{SS_{\text{treatments}}}{df_{\text{treatments}}} = \frac{SS_{\text{treatments}}}{L-1}$$
$$MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}} = \frac{SS_{\text{error}}}{N-L}$$
The df for SSerror is N − L because
$$\sum_{j=1}^{L}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 = \sum_{j=1}^{L} (n_j - 1) s_j^2$$
and
$$\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \cdots + (n_L-1)s_L^2}{(n_1-1) + (n_2-1) + \cdots + (n_L-1)} = \frac{SS_{\text{error}}}{N-L}$$
and the df for SSbetween is L − 1 because there are L levels.
Now the expected values of each of these terms are
$$E(MS_{\text{error}}) = \sigma^2$$
$$E(MS_{\text{treatments}}) = \sigma^2 + \frac{\sum_{j=1}^{L} n_j \tau_j^2}{L-1}$$
Now if there are no differences among the treatment means, then τj = 0 for all j. So we can test for differences with our old friend F:
$$F = \frac{MS_{\text{treatments}}}{MS_{\text{error}}}$$
with L − 1 and N − L df.
Under Ho, both numerator and denominator are estimates of σ², so the result will not be significant.
Under Ha, the result should be significant because the numerator is estimating the treatment effects as well as σ².
The results of an ANOVA are presented
in an ANOVA table. For this one-way,
fixed-effects, balanced model:
Source    SS           df     MS           p
Model     SSbetween    L−1    MSbetween    p
Error     SSwithin     N−L    MSwithin
Total     SStotal      N−1
Let’s look at a simple example.
A product engineer is investigating the
tensile strength of a synthetic fiber to
make men’s shirts. He knows from prior
experience that the strength is affected
by the weight percent of cotton in the
material. He also knows that the
percent should range between 10% and
40% so that the shirts can receive
permanent press treatment.
The engineer decides to test 5 levels:
15%, 20%, 25%, 30%, 35%
and to have 5 replicates in this design.
His data are

Weight %    Observations              Ȳj
15           7   7  15  11   9        9.8
20          12  17  12  18  18       15.4
25          14  18  18  19  19       17.6
30          19  25  22  19  23       21.6
35           7  10  11  15  11       10.8

Grand mean Ȳ = 15.04
In this tensile strength example, the
ANOVA table is
Source    SS        df    MS        p
Model     475.76     4    118.94    <0.01
Error     161.20    20      8.06
Total     636.96    24
In this case, we would reject Ho and
declare that there is an effect of the
cotton weight percent.
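The table above can be reproduced from the raw data. Here is a minimal sketch in Python (numpy/scipy); the variable names are mine, and the p-value comes from scipy's F survival function.

```python
import numpy as np
from scipy import stats

# Tensile strength data: 5 replicates at each cotton weight percent
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

y_all = np.concatenate([np.array(v, float) for v in data.values()])
grand_mean = y_all.mean()                      # 15.04
N, L = y_all.size, len(data)                   # 25 observations, 5 levels

# SS_between: variability of the level means around the grand mean
ss_between = sum(len(v) * (np.mean(v) - grand_mean) ** 2 for v in data.values())
# SS_within: pooled variability of observations around their level means
ss_within = sum(((np.array(v, float) - np.mean(v)) ** 2).sum() for v in data.values())

ms_between = ss_between / (L - 1)              # 475.76 / 4  = 118.94
ms_within = ss_within / (N - L)                # 161.20 / 20 =   8.06
F = ms_between / ms_within
p = stats.f.sf(F, L - 1, N - L)
print(f"F = {F:.2f}, p = {p:.5f}")
```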
We can estimate the treatment
parameters by subtracting the grand
mean from the treatment means. In this
example,
τ1 = 9.80 − 15.04 = −5.24
τ2 = 15.40 − 15.04 = +0.36
τ3 = 17.60 − 15.04 = +2.56
τ4 = 21.60 − 15.04 = +6.56
τ5 = 10.80 − 15.04 = −4.24
Clearly, treatment 4 is the best because
it provides the greatest tensile strength.
Now you could have computed these values
from the raw data yourself instead of doing
the ANOVA. You would get the same results,
but you wouldn’t know if treatment 4 was
significantly better.
But if you did a scatter diagram of the
original data, you would see that treatment 4
was best, with no analysis whatsoever.
In fact, you should always look at the original
data to see if the results do make sense. A
scatter diagram of the raw data usually tells
as much as any analysis can.
[Figure: scatter plot of tensile strength data (tensile strength vs. weight percent cotton)]
How do you test the adequacy of the
model?
The ANOVA rests on certain
assumptions that must hold for it
to be useful. Most importantly,
that the errors are distributed normally
and independently.
The error for each observation, sometimes called the residual, is
$$e_{ij} = Y_{ij} - \bar{Y}_j$$
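A short sketch (Python; the dictionary of observations and the names are mine) of computing these residuals and fitted values for the tensile data:

```python
import numpy as np

# Tensile data again: cotton weight percent -> observations
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

for level, obs in data.items():
    obs = np.array(obs, float)
    fitted = obs.mean()              # fitted value = level mean
    residuals = obs - fitted         # e_ij = Y_ij - Ybar_j
    print(level, fitted, residuals)
```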
A residual check is very important for detecting nonconstant variance. The residuals should be structureless, that is, they should have no pattern whatsoever; in this case, they show none.
[Figure: scatter plot of residuals vs. fitted values]
These residuals show no extreme
differences in variation because they all
have about the same spread.
They also do not show the presence of
any outlier. An outlier is a residual value
that is very much larger than any of the
others. The presence of an outlier can
seriously jeopardize the ANOVA, so if
one is found, its cause should be
carefully investigated.
A histogram of residuals shows the
distribution is slightly skewed. Small
departures from symmetry are of less
concern than heavy tails.
[Figure: histogram of residuals]
Another check is for normality. If we do a
normal probability plot of the residuals, we
can see whether normality holds.
[Figure: normal probability plot of the residuals]
A normal probability plot is made with
ascending ordered residuals on the
x-axis and their cumulative probability
points, 100(k-.5)/n, on the y-axis. k is
the order of the residual and n =
number of residuals. There is no
evidence of an outlier here.
The previous slide is not exactly a
normal probability plot because the
y-axis is not scaled properly. But it
does give a pretty good suggestion of
linearity.
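A small sketch of these plotting positions (Python; the function name is mine):

```python
import numpy as np

def normal_plot_points(residuals):
    """Return (ordered residuals, cumulative probability points 100(k - 0.5)/n)."""
    r = np.sort(np.asarray(residuals, float))     # ascending ordered residuals
    n = r.size
    k = np.arange(1, n + 1)                       # order of each residual
    prob = 100 * (k - 0.5) / n                    # cumulative probability points
    return r, prob

# Example with a few hypothetical residuals
x, y = normal_plot_points([-3.8, 2.2, -0.6, 1.4, 0.8, -1.2, 3.0])
print(np.column_stack([x, y]))
```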
A plot of residuals vs run order is useful to
detect correlation between the residuals, a
violation of the independence assumption.
Runs of positive or of negative residuals
indicate correlation. None is observed here.
[Figure: plot of residuals vs. run order]
One of the goals of the analysis is to
estimate the level means. If the results
of the ANOVA show that the factor is
significant, we know that at least one of
the means stands out from the rest.
But which one or ones?
The procedures for making these mean
comparisons are called multiple
comparison methods. These methods
use linear combinations called contrasts.
A contrast is a particular linear combination of level means, such as Ȳ4 − Ȳ5 to test the difference between level 4 and level 5.
Or if one wished to test the average of levels 1 and 3 vs. the average of levels 4 and 5, one would use (Ȳ1 + Ȳ3) − (Ȳ4 + Ȳ5).
In general,
$$C = \sum_{j=1}^{L} c_j \bar{Y}_j \qquad \text{where} \qquad \sum_{j=1}^{L} c_j = 0$$
An important case of contrasts is called orthogonal contrasts. Two contrasts in a design with coefficients cj and dj are orthogonal if
$$\sum_{j=1}^{L} n_j c_j d_j = 0$$
There are many ways to choose the
orthogonal contrast coefficients for a
set of levels. For example, if level 1 is
a control and levels 2 and 3 are two real
treatments, a logical choice is to
compare the average of the two
treatments with the control:
$$\bar{Y}_2 + \bar{Y}_3 - 2\bar{Y}_1$$
and then the two treatments against one another:
$$\bar{Y}_2 - \bar{Y}_3 + 0\bar{Y}_1$$
These two contrasts are orthogonal because
$$\sum_{j=1}^{3} c_j d_j = (1)(1) + (1)(-1) + (-2)(0) = 0$$
Only L-1 orthogonal contrasts may be
chosen because the L levels have only
L-1 df. So for only three levels, the
contrasts chosen exhaust those
available for this experiment.
Contrasts must be chosen before seeing
the data so that experimenters aren’t
tempted to contrast the levels with the
greatest differences.
For the tensile strength experiment with 5
levels and thus 4 df, the 4 contrasts are:
C1= 0(5)(9.8)+0(5)(15.4)+0(5)(17.6)-1(5)(21.6)+1(5)(10.8) =-54
C2= +1(5)(9.8)+0(5)(15.4)+1(5)(17.6)-1(5)(21.6)-1(5)(10.8) =-25
C3= +1(5)(9.8)+0(5)(15.4)-1(5)(17.6)+0(5)(21.6)+0(5)(10.8) =-39
C4= -1(5)(9.8)+4(5)(15.4)-1(5)(17.6)-1(5)(21.6)-1(5)(10.8) = 9
These 4 contrasts completely partition the SStreatments. Then the
SS for each contrast is formed:
$$SS_C = \frac{\left( \sum_{j=1}^{L} n_j c_j \bar{Y}_j \right)^2}{\sum_{j=1}^{L} n_j c_j^2}$$
So for the 4 contrasts we have:
$$SS_{C_1} = \frac{(-54)^2}{5[0^2 + 0^2 + 0^2 + (-1)^2 + 1^2]} = 291.6$$
$$SS_{C_2} = \frac{(-25)^2}{5[1^2 + 0^2 + 1^2 + (-1)^2 + (-1)^2]} = 31.25$$
$$SS_{C_3} = \frac{(-39)^2}{5[1^2 + 0^2 + (-1)^2 + 0^2 + 0^2]} = 152.1$$
$$SS_{C_4} = \frac{9^2}{5[(-1)^2 + 4^2 + (-1)^2 + (-1)^2 + (-1)^2]} = 0.81$$
Now the revised ANOVA table is

Source       SS        df    MS        p
Weight %     475.76     4    118.94    <0.001
  C1         291.60     1    291.60    <0.001
  C2          31.25     1     31.25     0.06
  C3         152.10     1    152.10    <0.001
  C4           0.81     1      0.81     0.76
Error        161.20    20      8.06
Total        636.96    24
So contrast 1 (level 5 – level 4) and
contrast 3 (level 1 – level 3) are
significant.
Although the orthogonal contrast
approach is widely used, the
experimenter may not know in advance
which levels to test or they may be
interested in more than L-1
comparisons. A number of other
methods are available for such testing.
These methods include:
Scheffé’s Method
Least Significant Difference Method
Duncan’s Multiple Range Test
Newman-Keuls test
There is some disagreement about
which is the best method, but it is best
if all are applied only after there is
significance in the overall F test.
Now let’s look at the random effects
model.
Suppose there is a factor of interest with
an extremely large number of levels. If
the experimenter selects L of these
levels at random, we have a random
effects model or a components of
variance model.
The linear statistical model is
$$Y_{ij} = \bar{Y} + \tau_j + e_{ij}$$
as before, except that both τj and eij are random variables instead of only eij.
Because τj and eij are independent, the variance of any observation is
$$\text{Var}(Y_{ij}) = \text{Var}(\tau_j) + \text{Var}(e_{ij}) = \sigma_\tau^2 + \sigma^2$$
These two variances are called variance components, hence the name of the model.
The requirements of this model are that the eij are NID(0, σ²), as before, that the τj are NID(0, στ²), and that eij and τj are independent. The normality assumption is not required in the random effects model.
As before,
$$SS_{\text{Total}} = SS_{\text{treatments}} + SS_{\text{error}}$$
and E(MSerror) = σ². But now
$$E(MS_{\text{treatments}}) = \sigma^2 + n\sigma_\tau^2$$
so the estimate of στ² is
$$\hat{\sigma}_\tau^2 = \frac{MS_{\text{treatments}} - MS_{\text{error}}}{n}$$
The computations and the ANOVA table
are the same as before, but the
conclusions are quite different.
Let’s look at an example.
A textile company uses a large number
of looms. The process engineer
suspects that the looms are of different
strength, and selects 4 looms at
random to investigate this.
The results of the experiment are shown in the
table below.
Loom    Observations        Ȳj
1       98  97  99  96      97.5
2       91  90  93  92      91.5
3       96  95  97  95      95.75
4       95  96  99  98      97.0

Grand mean Ȳ = 95.44
The ANOVA table is
Source    SS        df    MS       p
Looms      89.19     3    29.73    <0.001
Error      22.75    12     1.90
Total     111.94    15
In this case, the estimates of the variances are:
$$\hat{\sigma}_e^2 = 1.90$$
$$\hat{\sigma}_\tau^2 = \frac{29.73 - 1.90}{4} = 6.96$$
$$\widehat{\text{Var}}(Y_{ij}) = \hat{\sigma}_e^2 + \hat{\sigma}_\tau^2 = 1.90 + 6.96 = 8.86$$
Thus most of the variability in the
observations is due to variability in loom
strength. If you can isolate the causes of this
variability and eliminate them, you can reduce
the variability of the output and increase its
quality.
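A minimal sketch of these variance-component estimates computed from the loom data (Python; names are mine):

```python
import numpy as np

# Loom strength data: 4 randomly chosen looms, 4 observations each
looms = np.array([
    [98, 97, 99, 96],
    [91, 90, 93, 92],
    [96, 95, 97, 95],
    [95, 96, 99, 98],
], dtype=float)

L, n = looms.shape                       # 4 looms, 4 replicates
grand = looms.mean()

ss_looms = n * ((looms.mean(axis=1) - grand) ** 2).sum()                # ~89.19
ss_error = ((looms - looms.mean(axis=1, keepdims=True)) ** 2).sum()     # ~22.75

ms_looms = ss_looms / (L - 1)            # ~29.73
ms_error = ss_error / (L * (n - 1))      # ~1.90

sigma2_e = ms_error
sigma2_tau = (ms_looms - ms_error) / n   # ~6.96
print(sigma2_e, sigma2_tau, sigma2_e + sigma2_tau)   # total ~8.86
```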
When we studied the differences
between two treatment means, we
considered repeated measures on the
same individual experimental unit.
With three or more treatments, we can
still do this. The result is a repeated
measures design.
Consider a repeated measures ANOVA
partitioning the SSTotal.
$$\sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{n}\sum_{j=1}^{L}(\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij} - \bar{Y}_i)^2$$
where i indexes the n subjects and j indexes the L treatments.
This is the same as
$$SS_{\text{total}} = SS_{\text{between subjects}} + SS_{\text{within subjects}}$$
The within-subjects SS may be further partitioned into SStreatments + SSerror.
In this case, the first term on the RHS is
the differences between treatment
effects and the second term on the
RHS is the random error.
$$\sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{n}\sum_{j=1}^{L}(\bar{Y}_j - \bar{Y})^2 + \sum_{i=1}^{n}\sum_{j=1}^{L}(Y_{ij} - \bar{Y}_i - \bar{Y}_j + \bar{Y})^2$$
Now the ANOVA table looks like this.
Source              SS                             df
Between subjects    Σ Σ (Ȳi − Ȳ)²                  n−1
Within subjects     Σ Σ (Yij − Ȳi)²                n(L−1)
  Treatments        Σ Σ (Ȳj − Ȳ)²                  L−1
  Error             Σ Σ (Yij − Ȳi − Ȳj + Ȳ)²       (L−1)(n−1)
Total               Σ Σ (Yij − Ȳ)²                 Ln−1

where each double sum runs over i = 1,…,n and j = 1,…,L. The MS column is each SS divided by its df.
The test for treatment effects is the usual
$$F = \frac{MS_{\text{treatments}}}{MS_{\text{error}}}$$
but now it is done entirely within subjects.
This design is really a randomized
complete block design with subjects
considered to be the blocks.
Now what is a randomized complete
blocks design?
Blocking is a way to eliminate the effect
of a nuisance factor on the
comparisons of interest. Blocking can
be used only if the nuisance factor is
known and controllable.
Let’s use an illustration. Suppose we
want to test the effect of four different
tips on the readings from a hardness
testing machine.
The tip is pressed into a metal test
coupon, and from the depth of the
depression, the hardness of the coupon
can be measured.
The only factor is tip type and it has four
levels. If 4 replications are desired for each
tip, a completely randomized design would
seem to be appropriate.
This would require assigning each of the 4x4
= 16 runs randomly to 16 different coupons.
The only problem is that the coupons need to
be all of the same hardness, and if they are
not, then the differences in coupon hardness
will contribute to the variability observed.
Blocking is the way to deal with this problem.
In the block design, only 4 coupons are
used and each tip is tested on each of
the 4 coupons. So the blocking factor
is the coupon, with 4 levels.
In this setup, the block forms a
homogeneous unit on which to test the
tips.
This strategy improves the accuracy of
the tip comparison by eliminating
variability due to coupons.
Because all 4 tips are tested on each coupon,
the design is a complete block design. The
data from this design are shown below.
                 Test coupon
Tip type     1      2      3      4
1           9.3    9.4    9.6   10.0
2           9.4    9.3    9.8    9.9
3           9.2    9.4    9.5    9.7
4           9.7    9.6   10.0   10.2
Now we analyze these data the same
way we did for the repeated measures
design. The model is
$$Y_{jk} = \bar{Y} + \tau_j + \beta_k + e_{jk}$$
where βk is the effect of the kth block and the rest of the terms are those we already know.
Since the block effects are deviations from the grand mean,
$$\sum_{k=1}^{B} \beta_k = 0 \qquad \text{just as} \qquad \sum_{j=1}^{L} \tau_j = 0$$
where B is the number of blocks.
We can express the total SS as
$$\sum_{j=1}^{L}\sum_{k=1}^{B}(Y_{jk} - \bar{Y})^2 = \sum_{j=1}^{L}\sum_{k=1}^{B}\left[(\bar{Y}_j - \bar{Y}) + (\bar{Y}_k - \bar{Y}) + (Y_{jk} - \bar{Y}_j - \bar{Y}_k + \bar{Y})\right]^2$$
$$= \sum_{j=1}^{L}\sum_{k=1}^{B}(\bar{Y}_j - \bar{Y})^2 + \sum_{j=1}^{L}\sum_{k=1}^{B}(\bar{Y}_k - \bar{Y})^2 + \sum_{j=1}^{L}\sum_{k=1}^{B}(Y_{jk} - \bar{Y}_j - \bar{Y}_k + \bar{Y})^2$$
which is equivalent to
$$SS_{\text{total}} = SS_{\text{treatments}} + SS_{\text{blocks}} + SS_{\text{error}}$$
with df
$$N - 1 = (L-1) + (B-1) + (L-1)(B-1)$$
The test for equality of treatment means is
$$F = \frac{MS_{\text{treatments}}}{MS_{\text{error}}}$$
and the ANOVA table is

Source        SS              df            MS              p
Treatments    SStreatments    L−1           MStreatments    p
Blocks        SSblocks        B−1           MSblocks
Error         SSerror         (L−1)(B−1)    MSerror
Total         SStotal         N−1
For the hardness experiment, the
ANOVA table is
Source      SS        df    MS       p
Tip type     38.50     3    12.83    0.0009
Coupons      82.50     3    27.50
Error         8.00     9     0.89
Total       129.00    15
As is obvious, this is the same analysis
as the repeated measures design.
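A sketch of the randomized complete block analysis of the hardness data (Python; names are mine). The sums of squares below come out at 1/100 of the values in the table above, which suggests the table was computed on a coded scale; any linear coding of the response leaves the F ratio and p-value unchanged.

```python
import numpy as np
from scipy import stats

# Hardness data: rows = tip type (treatment), columns = test coupon (block)
y = np.array([
    [9.3, 9.4, 9.6, 10.0],
    [9.4, 9.3, 9.8,  9.9],
    [9.2, 9.4, 9.5,  9.7],
    [9.7, 9.6, 10.0, 10.2],
])
L, B = y.shape
grand = y.mean()

ss_treat = B * ((y.mean(axis=1) - grand) ** 2).sum()
ss_block = L * ((y.mean(axis=0) - grand) ** 2).sum()
ss_total = ((y - grand) ** 2).sum()
ss_error = ss_total - ss_treat - ss_block

ms_treat = ss_treat / (L - 1)
ms_error = ss_error / ((L - 1) * (B - 1))
F = ms_treat / ms_error
p = stats.f.sf(F, L - 1, (L - 1) * (B - 1))
print(f"F = {F:.2f}, p = {p:.4f}")   # F ~ 14.4, p ~ 0.0009
```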
Now let’s consider the Latin Square
design. We’ll introduce it with an
example.
The object of study is the effect of 5
different formulations of a rocket propellant on
the burning rate of aircraft escape
systems. Each formulation comes from
a batch of raw material large enough for
only 5 formulations. Moreover, the
formulations are prepared by 5 different
operators, who differ in skill and
experience.
The way to test in this situation is with a 5x5
Latin Square, which allows for double blocking
and therefore the removal of two nuisance
factors. The Latin Square for this example is
                      Operators
Batches of        1    2    3    4    5
raw material
     1            A    B    C    D    E
     2            B    C    D    E    A
     3            C    D    E    A    B
     4            D    E    A    B    C
     5            E    A    B    C    D
Note that each row and each column
has all 5 letters, and each letter occurs
exactly once in each row and column.
The statistical model for a Latin Square is
$$Y_{jkl} = \bar{Y} + \tau_j + \beta_k + \gamma_l + e_{jkl}$$
where Yjkl is the observation on the jth treatment in the kth row and the lth column, τj is the treatment effect, βk is the row effect, and γl is the column effect.
Again we have
$$SS_{\text{total}} = SS_{\text{rows}} + SS_{\text{columns}} + SS_{\text{treatments}} + SS_{\text{error}}$$
with df
$$N - 1 = (R-1) + (C-1) + (L-1) + (R-2)(C-1)$$
(For a Latin Square, R = C = L.)
The ANOVA table for propellant data is

Source              SS        df    MS       p
Formulations        330.00     4    82.50    0.0025
Material batches     68.00     4    17.00
Operators           150.00     4    37.50    0.04
Error               128.00    12    10.67
Total               676.00    24
So both the formulations and the
operators were significantly different.
The batches of raw material were not,
but it still is a good idea to block on
them because they often are different.
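Since the individual propellant responses are not reproduced here, the following is only a sketch of how a Latin Square decomposition can be computed, using the layout shown earlier but a hypothetical, randomly generated response matrix (Python; all numbers and names are made up for illustration):

```python
import numpy as np

# Treatment layout (rows = batches, cols = operators), as in the square above
layout = np.array([list("ABCDE"), list("BCDEA"), list("CDEAB"),
                   list("DEABC"), list("EABCD")])

# Hypothetical burning-rate responses (made-up numbers, for illustration only)
rng = np.random.default_rng(0)
y = 25 + rng.normal(0, 2, size=(5, 5))

p = 5
grand = y.mean()
ss_total = ((y - grand) ** 2).sum()
ss_rows = p * ((y.mean(axis=1) - grand) ** 2).sum()        # batches
ss_cols = p * ((y.mean(axis=0) - grand) ** 2).sum()        # operators
treat_means = np.array([y[layout == t].mean() for t in "ABCDE"])
ss_treat = p * ((treat_means - grand) ** 2).sum()          # formulations
ss_error = ss_total - ss_rows - ss_cols - ss_treat         # df = (p-1)(p-2) = 12
print(ss_rows, ss_cols, ss_treat, ss_error)
```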
This design was not replicated, and
Latin Squares often are not, but it is
possible to put n replicates in each cell.
Now if you superimposed one Latin
Square on another Latin Square of the
same size, you would get a Graeco-Latin Square. In one Latin Square, the
treatments are designated by Roman
letters. In the other Latin Square, the
treatments are designated by Greek
letters.
Hence the name Graeco-Latin Square.
A 5x5 Graeco-Latin Square is
                      Operators
Batches of        1     2     3     4     5
raw material
     1            Aα    Bγ    Cε    Dβ    Eδ
     2            Bβ    Cδ    Dα    Eγ    Aε
     3            Cγ    Dε    Eβ    Aδ    Bα
     4            Dδ    Eα    Aγ    Bε    Cβ
     5            Eε    Aβ    Bδ    Cα    Dγ
Note that the five Greek treatments appear
exactly once in each row and column, just as
the Latin treatments did.
If Test Assemblies had been added as
an additional factor to the original
propellant experiment, the ANOVA table
for propellant data would be
Source              SS        df    MS       p
Formulations        330.00     4    82.50    0.0033
Material batches     68.00     4    17.00
Operators           150.00     4    37.50    0.0329
Test assemblies      62.00     4    15.50
Error                66.00     8     8.25
Total               676.00    24
The test assemblies turned out to be
nonsignificant.
Note that the ANOVA tables for the Latin
Square and the Graeco-Latin Square
designs are identical, except for the
error term.
The SS(error) for the Latin Square
design was decomposed into a Test
Assemblies component and error in the
Graeco-Latin Square. This is a good
example of how the error term is really a
residual. Whatever isn’t controlled falls
into error.
Before we leave one-way designs, we should look at the regression approach to ANOVA. The model is
$$Y_{ij} = \mu + \tau_j + e_{ij}$$
Using the method of least squares, we rewrite this as
$$E = \sum_{j=1}^{L}\sum_{i=1}^{n_j} e_{ij}^2 = \sum_{j=1}^{L}\sum_{i=1}^{n_j} (Y_{ij} - \mu - \tau_j)^2$$
Now to find the LS estimates of μ and τj, we set
$$\frac{\partial E}{\partial \mu} = 0 \qquad \text{and} \qquad \frac{\partial E}{\partial \tau_j} = 0$$
When we do this differentiation with respect to μ and τj and equate to 0, we obtain
$$-2\sum_{j=1}^{L}\sum_{i=1}^{n_j} (Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0$$
$$-2\sum_{i=1}^{n_j} (Y_{ij} - \hat{\mu} - \hat{\tau}_j) = 0 \qquad \text{for all } j$$
After simplification, these reduce to
$$\begin{aligned}
N\hat{\mu} + n_1\hat{\tau}_1 + n_2\hat{\tau}_2 + \cdots + n_L\hat{\tau}_L &= Y_{..}\\
n_1\hat{\mu} + n_1\hat{\tau}_1 &= Y_{.1}\\
n_2\hat{\mu} + n_2\hat{\tau}_2 &= Y_{.2}\\
&\;\;\vdots\\
n_L\hat{\mu} + n_L\hat{\tau}_L &= Y_{.L}
\end{aligned}$$
In these equations, Y.. = NȲ and Y.j = njȲj.
These L + 1 equations are called the least squares normal equations.
If we add the constraint
$$\sum_{j=1}^{L} \hat{\tau}_j = 0$$
we get a unique solution to these normal equations:
$$\hat{\mu} = \bar{Y}, \qquad \hat{\tau}_j = \bar{Y}_j - \bar{Y}$$
It is important to see that ANOVA designs are simply regression models. If we have a one-way design with 3 levels, the regression model is
$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + e_{ij}$$
where Xi1 = 1 if the observation is from level 1 and 0 otherwise, and Xi2 = 1 if the observation is from level 2 and 0 otherwise.
Although the treatment levels may be qualitative, they can be handled through these "dummy" variables.
For observations from level 1, Xi1 = 1 and Xi2 = 0, so
$$Y_{i1} = \beta_0 + \beta_1(1) + \beta_2(0) + e_{i1} = \beta_0 + \beta_1 + e_{i1}$$
and therefore
$$\beta_0 + \beta_1 = \bar{Y}_1$$
Similarly, if the observations are from level 2,
$$Y_{i2} = \beta_0 + \beta_1(0) + \beta_2(1) + e_{i2} = \beta_0 + \beta_2 + e_{i2}$$
so
$$\beta_0 + \beta_2 = \bar{Y}_2$$
Finally, consider observations from level 3, for which Xi1 = Xi2 = 0. Then the regression model becomes
$$Y_{i3} = \beta_0 + \beta_1(0) + \beta_2(0) + e_{i3} = \beta_0 + e_{i3}$$
so
$$\beta_0 = \bar{Y}_3$$
Thus in the regression model formulation of the one-way ANOVA, the regression coefficients describe comparisons of the first two level means with the third:
$$\beta_0 = \bar{Y}_3, \qquad \beta_1 = \bar{Y}_1 - \bar{Y}_3, \qquad \beta_2 = \bar{Y}_2 - \bar{Y}_3$$
Thus, testing β1 = β2 = 0 provides a test of the equality of the three means.
In general, for L levels, the regression model will have L − 1 dummy variables:
$$Y_{ij} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{L-1} X_{i,L-1} + e_{ij}$$
and
$$\beta_0 = \bar{Y}_L, \qquad \beta_j = \bar{Y}_j - \bar{Y}_L$$
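A sketch of this dummy-variable regression for the tensile data (Python/numpy; the coding and names are mine, with level 5, the 35% level, playing the role of the reference level L):

```python
import numpy as np

# Tensile data: 5 levels, 5 replicates each (level 5 = 35% is the reference)
levels = np.repeat(np.arange(1, 6), 5)
y = np.array([7, 7, 15, 11, 9,  12, 17, 12, 18, 18,  14, 18, 18, 19, 19,
              19, 25, 22, 19, 23,  7, 10, 11, 15, 11], dtype=float)

# Design matrix: intercept plus L-1 = 4 dummy variables
X = np.column_stack([np.ones_like(y)] +
                    [(levels == j).astype(float) for j in range(1, 5)])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
# beta[0] = mean of level 5               = 10.80
# beta[j] = mean of level j minus level 5 : -1.00, 4.60, 6.80, 10.80
```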
Now what if you have two factors under
test? Or three?
Here the answer is the factorial design.
A factorial design crosses all factors.
Let’s take a two-way design. If there
are L levels of factor A and M levels of
factor B, then all LM treatment
combinations appear in the experiment.
Most commonly, L = M = 2.
In a two-way design, with two levels of each factor, we have

Factor A            Factor B            Response
−1 (low level)      −1 (low level)      20
+1 (high level)     −1 (low level)      50
−1 (low level)      +1 (high level)     40
+1 (high level)     +1 (high level)     12
We can have as many replicates as we want in
this design. With n replicates, there are n
observations in each cell of the design.
SStotal = SSA + SSB + SSAB + SSerror
This decomposition should be familiar
by now except for SSAB. What is this
term? Its official name is interaction.
This is the magic of factorial designs.
We find out about not only the effect of
factor A and the effect of factor B, but
the effect of the two factors in
combination.
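A sketch of how the main effects and the interaction effect are computed for this 2x2 example (Python; with a single replicate there is no error term to test against, and the names are mine):

```python
import numpy as np

# Response arranged as y[a, b] with index 0 = low level, 1 = high level
y = np.array([[20.0, 40.0],    # A low : B low, B high
              [50.0, 12.0]])   # A high: B low, B high

# Main effect = mean response at the high level minus mean at the low level
effect_A = y[1, :].mean() - y[0, :].mean()          # 31 - 30 = +1
effect_B = y[:, 1].mean() - y[:, 0].mean()          # 26 - 35 = -9

# Interaction: half the difference between diagonal and off-diagonal cells
effect_AB = (y[0, 0] + y[1, 1]) / 2 - (y[0, 1] + y[1, 0]) / 2   # 16 - 45 = -29
print(effect_A, effect_B, effect_AB)
```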
Now let’s look at the main effects of the
factors graphically.
[Figure: main effect plots of Factor A and Factor B (mean response vs. factor level)]
Now let’s look at the interaction effect. This is
the effect of factors A and B in combination,
and is often the most important effect.
[Figure: interaction plot of factors A and B (response vs. Factor A, with separate lines for Factor B low and Factor B high)]
Interaction of factors is the key to the
East, as we say in the West.
Suppose you wanted the factor levels
that give the lowest possible response.
If you picked by main effects, you
would pick A low and B high.
But look at the interaction plot and it will
tell you to pick A high and B high.
This is why, if the interaction term is
significant, you never interpret main
effects. They are meaningless in the
presence of interaction.
And it is because factorial designs
provide interactions that they are so
popular and so successful.
Now what if the interaction term is not
significant? What if the results instead were
[Figure: main effect plots of Factor A and Factor B for the revised results]
and the interaction is
[Figure: interaction plot of factors A and B, with the Factor B low and Factor B high lines parallel]
The clearest indication of no interaction is the
parallel lines.
So this time, if you wanted the lowest
response, you would pick A low and B
low and that would be correct.