Lecture 14 - University of Pennsylvania

Lecture 14
• Analysis of Variance Experimental Designs
(Chapter 15.3)
• Randomized Block (Two-Way) Analysis of
Variance
• Announcement: Extra office hours, today
after class and Monday, 9:00-10:20
15.3 Analysis of Variance Experimental Designs
• Several elements may distinguish between
one experimental design and another:
– The number of factors (1-way, 2-way, 3-way,…
ANOVA).
– The number of factor levels.
– Independent samples vs. randomized blocks
– Fixed vs. random effects
These concepts will be explained in this lecture.
Number of factors, levels
• Example: 15.1, modified
– Methods of marketing:
price, convenience, quality
=> first factor with 3 levels
– Medium: advertise on TV vs. in newspapers
=> second factor with 2 levels
• This is a factorial experiment with two
“crossed factors” if all 6 possibilities are
sampled or experimented with.
• It will be analyzed with a “2-way ANOVA”.
(The book got this term wrong.)
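As a small illustration of what "crossed factors" means, the sketch below enumerates the 6 combinations from the modified Example 15.1. The list names are illustrative choices, not taken from the textbook.

```python
from itertools import product

# Factor levels from the modified Example 15.1 above.
marketing_methods = ["price", "convenience", "quality"]  # first factor, 3 levels
media = ["TV", "newspaper"]                              # second factor, 2 levels

# Crossing the factors pairs every marketing method with every medium.
cells = list(product(marketing_methods, media))

for method, medium in cells:
    print(f"{method:12s} x {medium}")
print(len(cells), "factor-level combinations")
```

A design that samples only some of these combinations would not be fully crossed.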
[Figure: one-way ANOVA (single factor; response plotted for Treatments 1-3, one per factor level) versus two-way ANOVA (two factors; response plotted over Levels 1-3 of Factor A and Levels 1-2 of Factor B).]
Randomized blocks
• This is something between 1-way and 2-way ANOVA:
a generalization of matched pairs when there are more
than 2 levels.
• Groups of matched observations are collected in
blocks, in order to remove the effects of unwanted
variability. => We improve the chances of detecting
the variability of interest.
• Blocks are like a second factor
=> 2-way ANOVA is used for analysis
• Ideally, assignment to levels within blocks is
randomized, to permit causal inference.
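Randomizing within blocks can be sketched as follows. This is a minimal illustration, assuming each block holds exactly as many units as there are treatments; the function name, the plot labels, and the variety labels are hypothetical.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign each unit in a block to a distinct treatment, in random order.

    blocks: dict mapping a block label to a list of unit labels; every
            block must contain exactly len(treatments) units.
    Returns a dict of the form {block: {unit: treatment}}.
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # a fresh random order for each block
        assignment[block] = dict(zip(units, shuffled))
    return assignment

# Hypothetical example: two blocks of three adjoining plots, three varieties.
blocks = {
    "block 1": ["plot 1", "plot 2", "plot 3"],
    "block 2": ["plot 4", "plot 5", "plot 6"],
}
plan = randomize_within_blocks(blocks, ["variety A", "variety B", "variety C"], seed=0)
for block, mapping in plan.items():
    print(block, mapping)
```

Every treatment appears exactly once in each block, and only the order within a block is left to chance.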
Randomized blocks (cont.)
• Example: expand 13.03
– Starting salaries of marketing and finance MBAs:
add accounting MBAs to the investigation.
– If 3 independent samples are collected, one per
specialty (samples possibly of different sizes), we
have a 1-way ANOVA situation with 3 levels.
– If GPA brackets are formed, and if one samples
3 MBAs per bracket, one from each specialty, then
one has a blocked design. (Note: the 3 samples will
be of equal size due to blocking.)
– Randomization is not possible here: one can’t assign
each student to a specialty
=> No causal inference.
Models of fixed and random effects
• Fixed effects
– If all possible levels of a factor are included in our
analysis or the levels are chosen in a nonrandom way,
we have a fixed effect ANOVA.
– The conclusion of a fixed effect ANOVA applies only
to the levels studied.
• Random effects
– If the levels included in our analysis represent a random
sample of all the possible levels, we have a random-effect ANOVA.
– The conclusion of the random-effect ANOVA applies to
all the levels (not only those studied).
Models of fixed and random effects (cont.)
Fixed and random effects - examples
– Fixed effects - The advertisement Example (15.1): All
the levels of the marketing strategies considered were
included. Inferences don’t apply to other possible
strategies such as emphasizing nutritional value.
– Random effects - To determine if there is a difference in
the production rate of 50 machines in a large factory,
four machines are randomly selected and the number of
units each produces per day for 10 days is recorded.
15.4 Randomized Blocks Analysis of Variance
• The purpose of designing a randomized
block experiment is to reduce the within-treatments
variation, thus increasing the relative amount of
between-treatments variation.
• This helps in detecting differences between
the treatment means more easily.
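To make the effect of blocking concrete, the sketch below compares the error variation with and without removing block effects. The data are made up for illustration (rows are blocks, columns are treatments) and are not from the textbook.

```python
# Hypothetical data: data[i][j] = observation in block i, treatment j.
data = [
    [10, 12, 14],
    [20, 22, 27],
    [30, 32, 31],
]
b, k = len(data), len(data[0])

grand = sum(sum(row) for row in data) / (b * k)
block_means = [sum(row) / k for row in data]
treat_means = [sum(data[i][j] for i in range(b)) / b for j in range(k)]

# Ignoring blocks: each observation is compared with its treatment mean only.
sse_unblocked = sum((data[i][j] - treat_means[j]) ** 2
                    for i in range(b) for j in range(k))

# Using blocks: the block effect is removed as well.
sse_blocked = sum((data[i][j] - block_means[i] - treat_means[j] + grand) ** 2
                  for i in range(b) for j in range(k))

print("error SS ignoring blocks:", sse_unblocked)
print("error SS with blocks:    ", sse_blocked)
```

In this toy data set the blocks differ strongly from one another, so almost all of the within-treatments variation is block-to-block variation, and removing it leaves a much smaller error sum of squares.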
Examples of Randomized Block Designs

Factor                 Response              Units           Block
Varieties of corn      Yield                 Plots of land   Adjoining plots
Blood pressure drugs   Blood pressure        Patient         Same age, sex, overall condition
Number of breaks       Worker productivity   Worker          Shifts
Randomized Blocks
Block all the observations with some
commonality across treatments
[Figure: observations for Treatments 1-4 grouped within Blocks 1-3.]
Randomized Blocks (cont.)

Block            Treatment 1   Treatment 2   ...   Treatment k   Block mean
1                X11           X12           ...   X1k           x̄[B]1
2                X21           X22           ...   X2k           x̄[B]2
...
b                Xb1           Xb2           ...   Xbk           x̄[B]b
Treatment mean   x̄[T]1         x̄[T]2         ...   x̄[T]k
Partitioning the total variability
• The total sum of squares is partitioned into
three sources of variation
– Treatments
– Blocks
– Within samples (Error)
SS(Total) = SST + SSB + SSE
where SST is the sum of squares for treatments, SSB is the
sum of squares for blocks, and SSE is the sum of squares for error.
• Recall. For the independent samples design we have:
SS(Total) = SST + SSE
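Sums of Squares Decomposition
• $X_{ij}$ = observation in ith block, jth treatment
• $\bar{X}_{i\cdot}$ = mean of ith block
• $\bar{X}_{\cdot j}$ = mean of jth treatment
• $\bar{X}$ = overall mean of all observations

$$SSTot = \sum_{j=1}^{k} \sum_{i=1}^{b} (X_{ij} - \bar{X})^2$$

$$SST = \sum_{j=1}^{k} b\,(\bar{X}_{\cdot j} - \bar{X})^2$$

$$SSB = \sum_{i=1}^{b} k\,(\bar{X}_{i\cdot} - \bar{X})^2$$

$$SSE = SSTot - SST - SSB = \sum_{j=1}^{k} \sum_{i=1}^{b} (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X})^2$$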
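The decomposition can be checked numerically. The following is a minimal pure-Python sketch, assuming one observation per block-treatment cell; the function name and the data are hypothetical.

```python
def block_sums_of_squares(data):
    """Sums of squares for a randomized block design.

    data[i][j] is the observation in block i (row) and treatment j (column).
    """
    b, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (b * k)
    block_means = [sum(row) / k for row in data]
    treat_means = [sum(row[j] for row in data) / b for j in range(k)]

    ss_tot = sum((x - grand) ** 2 for row in data for x in row)
    sst = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    # Direct formula for the error term (not obtained by subtraction).
    sse = sum((data[i][j] - block_means[i] - treat_means[j] + grand) ** 2
              for i in range(b) for j in range(k))
    return {"SSTot": ss_tot, "SST": sst, "SSB": ssb, "SSE": sse}

# Hypothetical data: 3 blocks (rows) by 3 treatments (columns).
ss = block_sums_of_squares([[10, 12, 14], [20, 22, 27], [30, 32, 31]])
print(ss)
# The direct SSE should agree with SSTot - SST - SSB.
print(abs(ss["SSTot"] - ss["SST"] - ss["SSB"] - ss["SSE"]) < 1e-9)
```

Computing SSE from its own formula rather than by subtraction gives an independent check that the three components add up to the total.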
Calculating the sums of squares
• Formulas for the calculation of the sums of squares

$$SS(Total) = (x_{11} - \bar{X})^2 + (x_{21} - \bar{X})^2 + \dots + (x_{12} - \bar{X})^2 + (x_{22} - \bar{X})^2 + \dots + (x_{1k} - \bar{X})^2 + (x_{2k} - \bar{X})^2 + \dots$$

$$SSB = k(\bar{x}[B]_1 - \bar{X})^2 + k(\bar{x}[B]_2 - \bar{X})^2 + \dots + k(\bar{x}[B]_b - \bar{X})^2$$

$$SST = b(\bar{x}[T]_1 - \bar{X})^2 + b(\bar{x}[T]_2 - \bar{X})^2 + \dots + b(\bar{x}[T]_k - \bar{X})^2$$
Calculating the sums of squares (cont.)
• Formula for the calculation of the sum of squares for error

$$
\begin{aligned}
SSE = {} & (x_{11} - \bar{x}[T]_1 - \bar{x}[B]_1 + \bar{X})^2 + (x_{21} - \bar{x}[T]_1 - \bar{x}[B]_2 + \bar{X})^2 + \dots \\
& + (x_{12} - \bar{x}[T]_2 - \bar{x}[B]_1 + \bar{X})^2 + (x_{22} - \bar{x}[T]_2 - \bar{x}[B]_2 + \bar{X})^2 + \dots \\
& + (x_{1k} - \bar{x}[T]_k - \bar{x}[B]_1 + \bar{X})^2 + (x_{2k} - \bar{x}[T]_k - \bar{x}[B]_2 + \bar{X})^2 + \dots
\end{aligned}
$$
Mean Squares
To perform hypothesis tests for treatments and
blocks we need
• Mean square for treatments: $MST = \dfrac{SST}{k-1}$
• Mean square for blocks: $MSB = \dfrac{SSB}{b-1}$
• Mean square for error: $MSE = \dfrac{SSE}{n-k-b+1}$
Test statistics for the randomized block design ANOVA
• Test statistic for treatments: $F = \dfrac{MST}{MSE}$
• Test statistic for blocks: $F = \dfrac{MSB}{MSE}$
• Degrees of freedom:
df-T: k-1
df-B: b-1
df-E: n-k-b+1
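The mean squares and F statistics follow mechanically from the sums of squares. Below is a minimal sketch, assuming one observation per treatment in each block so that n = k·b; the function name and the input values are hypothetical.

```python
def randomized_block_f(sst, ssb, sse, k, b):
    """Mean squares and F statistics for a randomized block design.

    Assumes one observation per treatment in each block, so n = k * b.
    """
    n = k * b
    df_t, df_b, df_e = k - 1, b - 1, n - k - b + 1
    mst, msb, mse = sst / df_t, ssb / df_b, sse / df_e
    return {
        "df": (df_t, df_b, df_e),
        "MST": mst, "MSB": msb, "MSE": mse,
        "F_treatments": mst / mse,
        "F_blocks": msb / mse,
    }

# Hypothetical sums of squares for k = 3 treatments and b = 3 blocks.
result = randomized_block_f(sst=24, ssb=546, sse=12, k=3, b=3)
print(result)
```

Both F statistics share the same denominator, MSE, and therefore the same denominator degrees of freedom, n-k-b+1.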
The F test rejection regions
• Testing the mean responses for treatments:
$F > F_{\alpha,\,k-1,\,n-k-b+1}$
• Testing the mean responses for blocks:
$F > F_{\alpha,\,b-1,\,n-k-b+1}$
Randomized Blocks ANOVA - Example
• Example 15.2
– Are there differences in the effectiveness of
cholesterol reduction drugs?
– To answer this question the following experiment
was organized:
• 25 groups of men with high cholesterol were matched by
age and weight. Each group consisted of 4 men.
• Each person in a group received a different drug.
• The cholesterol level reduction in two months was
recorded.
– Can we infer from the data in Xm15-02 that there
are differences in mean cholesterol reduction
among the four drugs?
Randomized Blocks ANOVA - Example
• Solution
– Each drug can be considered a treatment.
– The 4 records in each group form a block,
because they are matched by age and weight.
– This procedure eliminates the variability in
cholesterol reduction related to different
combinations of age and weight.
– This helps detect differences in the mean
cholesterol reduction attributed to the
different drugs.
Randomized Blocks ANOVA - Example
ANOVA

Source of Variation     SS       df   MS       F       P-value   F crit
Rows (Blocks)           3848.7   24   160.36   10.11   0.0000    1.67
Columns (Treatments)    196.0     3    65.32    4.12   0.0094    2.73
Error                   1142.6   72    15.87
Total                   5187.2   99

The degrees of freedom are b-1 for blocks and k-1 for treatments; F is MSB / MSE for blocks and MST / MSE for treatments.
Conclusion: At the 5% significance level there is sufficient evidence
to infer that the mean cholesterol reduction differs for at least
two of the drugs.
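As a check on the treatment test, the arithmetic can be redone from the reported sums of squares, with k = 4 drugs, b = 25 blocks, and n = 100 observations, so that n-k-b+1 = 72. The values are rounded, so the last digit may differ slightly from the software output.

$$
MST = \frac{196.0}{3} \approx 65.3, \qquad
MSE = \frac{1142.6}{72} \approx 15.87, \qquad
F = \frac{MST}{MSE} \approx 4.12 > F_{0.05,\,3,\,72} \approx 2.73
$$

Since F falls in the rejection region, the hypothesis of equal treatment means is rejected, in agreement with the P-value of 0.0094.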
Required Conditions for Test
• The sample from each block in each population is
a simple random sample from the block in that
population
• There are conditions that are similar to the
populations being normal and having equal
variance but they are more complicated (the
book’s description is wrong). We shall discuss
this more when we cover regression. For now,
you should just look for outliers.
Criteria for Blocking
• Goal is to find criteria for blocking that
significantly affect the response variable
• Effect of teaching methods on student test scores.
– Good blocking variable: GPA
– Bad blocking variable: Hair color
• The ideal design is often to make each
subject a block and apply the entire set of
treatments to each subject (e.g., give different
drugs to each subject), but this is not always
physically possible.
Test of whether blocking is effective
• We can test for whether blocking is
effective by testing whether the means of
different blocks are the same.
• We now consider the blocks to be
“treatments” and look at $F = \dfrac{MSB}{MSE}$.
• Under the null hypothesis that the mean in
each block is the same, F has an F
distribution with (b-1, n-k-b+1) degrees of freedom.
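Applied to the cholesterol example (Example 15.2), using the rounded mean squares from the ANOVA output with b-1 = 24 and n-k-b+1 = 72 degrees of freedom:

$$
F = \frac{MSB}{MSE} = \frac{160.36}{15.87} \approx 10.1 > F_{0.05,\,24,\,72} \approx 1.67
$$

The block means differ significantly, which suggests that matching the men by age and weight was effective in that experiment.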
Practice Problems
• 15.38, 15.40