Slide 1 - Stony Brook University


Chapter 13
Analysis of Multi-factor Experiments
Dec 6th 2007
Our Group Members
Part I: Background Introduction
1. Ruirui Pan: Why do we work on this topic?
2. Xuanti Ying: Introduction to related technology
Part II: Theoretical Derivation
3. Parameter Estimation: Ji-Young Yun
4. Theory of two-factor experiments: Mingyi Hong
5. Theory of 2^k experiments: Zheng Zhao
Part III: Data Analysis
6. Data analysis of the 2^2 experiment: Wei Hu
7. Data analysis of the 2^3 experiment: Hao Zhang
8. Data analysis of the 2^k experiment: Ti Zhou
Part IV: Model Analysis and Conclusion
9. Model diagnostics and SAS programming: Jun Huang
10. Regression approach and conclusion: Wenbin Zhang
Why do we work on this topic?
by Ruirui Pan
What is a multifactor experiment?
• In statistics, a multifactor experiment
(also called a factorial experiment) is an
experiment whose design consists of two
or more factors, each with discrete
possible values or "levels", and whose
experimental units take on all possible
combinations of these levels across all
such factors.
Basic Concepts
• The primary purpose of an experiment is to
evaluate how a set of predictor variables (called
factors in experimental design jargon) affect a
response variable.
• The different possible values of a factor are
called its levels.
• Each treatment is a particular combination of the
levels of different treatment factors.
Example
Factors
• A factor is a linked set of experimental conditions we may wish to compare, e.g.:
  Levels of temperature
  Different methods of solving a problem
  Pressure of a bicycle tire
• Two types:
  Treatment factors
  Nuisance factors
Example
• Suppose an engineer wishes to study the total power used by each of two different motors, A and B, running at each of two different speeds, 2000 or 3000 RPM.
The factorial experiment would consist of 8 experimental units: the four treatment combinations (motor A at 2000 RPM, motor B at 2000 RPM, motor A at 3000 RPM, motor B at 3000 RPM), with each combination of a single level selected from every factor present twice.
The importance of multifactor experiments
• Single-factor experiment --- one-way ANOVA
• Two-factor experiments with fixed crossed factors --- when Factor A with a levels and Factor B with b levels are crossed, there are a*b treatment combinations
• 2^3 factorial experiments --- 3 factors with 2 levels each, so there are 8 treatment combinations
• 2^k factorial experiments --- k factors with 2 levels each, so there are 2^k treatment combinations
Introduction to related
technology
By Xuanti Ying
Related Technology
• ANOVA
Introduction to ANOVA
• Analysis of variance (ANOVA) is used to test
hypotheses about differences between two or
more means. The t-test based on the standard
error of the difference between two means can
only be used to test differences between two
means. When there are more than two means, it
is possible to compare each mean with each
other mean using t-tests. However, conducting
multiple t-tests can lead to severe inflation of the
Type I error rate. Analysis of variance can be
used to test differences among several means
for significance without increasing the Type I
error rate.
Who Developed this Technology
• The initial techniques of the analysis of
variance were developed by the
statistician and geneticist R. A. Fisher in the
1920s and 1930s.
The Significance of ANOVA
One important reason for using ANOVA methods
rather than multiple two-group studies analyzed
via t-tests is that the former method is more
efficient, and with fewer observations we can
gain more information.
• Controlling for factors
• Detects interaction effects (the term interaction was first used by Fisher, 1926).
Logic of ANOVA
• Partitioning of the sum of squares
The fundamental technique is a partitioning of
the total sum of squares into components related
to the effects used in the model.
• The F-test
The F-test is used to compare the components of the total variation: the F statistic for each main effect and for the interaction is the corresponding mean square divided by the within-group (error) mean square.
Several Types of ANOVA
• One way ANOVA is used to test for
differences among two or more
independent groups.
• Factorial ANOVA (two-way ANOVA) is used when we want to study the effects of two or more treatment variables (our case).
• Mixed-design ANOVA
Tests Supplementing ANOVA
• All pairwise t-tests
• Fisher’s LSD (Least Significant Difference
Method)
• Tukey’s HSD (Honestly Significantly Different
Test proposed by the statistician John Tukey)
• Newman-Keuls method
• Duncan’s Procedure (similar to the Newman-Keuls method)
Parameter Estimation
by Ji-Young Yun
• EXAMPLE
Consider a grade treatment experiment to evaluate the effects of sleeping hours and the percentage of attendance of the class.
• Factor A: sleeping hours
Levels:
enough (sleeping hours ≥ 8)
normal (6 ≤ sleeping hours < 8)
lack (sleeping hours < 6)
• Factor B: the percentage of attendance
High (the percentage ≥ 50%)
Low (the percentage < 50%)
A grade treatment experiment to evaluate the effects of A and B

Factor A Levels   Factor B Level 1 (high)   Factor B Level 2 (low)
1 (enough)        80, 85, 60                70, 72, 62
2 (normal)        90, 92, 94                80, 70, 90
3 (lack)          100, 80, 90               90, 80, 70

• The number of observations for each treatment combination is 3
• a = 3
• b = 2
• n = 3
• N = (3)(2)(3) = 18
A grade treatment experiment to evaluate the effects of A and B (cell means)

Factor A Levels   Factor B Level 1 (high)   Factor B Level 2 (low)   Row mean
1 (enough)        75                        68                       71.5
2 (normal)        92                        80                       86
3 (lack)          90                        80                       85
Column mean       85.6                      76                       80.8
Parameters & Estimates

$Y_{ijk} = \mu_{ij} + \epsilon_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$

$y_{ijk}$: the $k$th observation on the $(i, j)$th treatment combination
$\mu_{ij}$: the mean of cell $(i, j)$
$\epsilon_{ijk}$: i.i.d. random errors with a normal distribution
$\alpha_i$: $i$th row main effect
$\beta_j$: $j$th column main effect
$(\alpha\beta)_{ij}$: $(i, j)$th row-column interaction

$\mu = \bar{\mu}_{..}, \quad \alpha_i = \bar{\mu}_{i.} - \bar{\mu}_{..}, \quad \beta_j = \bar{\mu}_{.j} - \bar{\mu}_{..}$
$(\alpha\beta)_{ij} = \mu_{ij} - \mu - \alpha_i - \beta_j = \mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..}$

$\bar{y}_{ij.}$: sample mean of the $(i, j)$th cell; the least squares estimate of $\mu_{ij}$
$\hat{\mu} = \bar{y}_{...}, \quad \hat{\alpha}_i = \bar{y}_{i..} - \bar{y}_{...}, \quad \hat{\beta}_j = \bar{y}_{.j.} - \bar{y}_{...}, \quad \widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}$
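As a quick illustration (this sketch is not part of the original slides), the least squares estimates for the grade example can be computed directly from the raw scores; numpy is assumed to be available.

# Illustrative sketch (not from the slides): least squares estimates for the
# two-way layout of the grade example.
import numpy as np

# scores[i, j, k]: i = sleeping hours (enough, normal, lack),
#                  j = attendance (high, low), k = replicate
scores = np.array([
    [[80, 85, 60], [70, 72, 62]],    # enough
    [[90, 92, 94], [80, 70, 90]],    # normal
    [[100, 80, 90], [90, 80, 70]],   # lack
], dtype=float)

cell_means = scores.mean(axis=2)      # \bar{y}_{ij.}
row_means = cell_means.mean(axis=1)   # \bar{y}_{i..} -> [71.5, 86, 85]
col_means = cell_means.mean(axis=0)   # \bar{y}_{.j.} -> about [85.67, 76]
grand_mean = cell_means.mean()        # \bar{y}_{...} -> about 80.83

alpha_hat = row_means - grand_mean    # estimated row main effects
beta_hat = col_means - grand_mean     # estimated column main effects
ab_hat = cell_means - row_means[:, None] - col_means[None, :] + grand_mean

print(cell_means)                     # [[75 68] [92 80] [90 80]]
print(alpha_hat, beta_hat, ab_hat, sep="\n")

(The slides round the column mean 85.67 to 85.6 and the grand mean 80.83 to 80.8.)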
Analysis of Variance
ANOVA Table for Crossed Two-Way Layout

• SST = 2223.92
• SSA = 788.64
• SSB = 414.72
• SSAB = 18.9
• SSE = 1001.66
• SST = SSA + SSB + SSAB + SSE

• MSA = 394.43, FA = MSA/MSE = 4.33
• MSB = 414.72, FB = MSB/MSE = 4.55
• MSAB = 9.45, FAB = MSAB/MSE = 0.103
• MSE = 91.06

• The main effects of sleeping hours and of the percentage of attendance are both significant at the .1 level, but the interaction between sleeping hours and the percentage of attendance is NOT significant at the .1 level.
Theory Derivation for Two Factor
Experiment
Mingyi Hong
Overview
• Sometimes a researcher might want to
simultaneously examine the effects of two
treatments.
• Examples
The effect of sex and race on wage
The effect of the level of pollution and the
level of city services on housing prices
Data from a Balanced Two-way Layout

(Table: the generic layout of the observations for a balanced two-way design, with rows indexed by the levels 1, 2, 3, ... of Factor A and columns indexed by the levels 1, 2, 3, 4, 5, ... of Factor B.)
The Model

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$, where the $\epsilon_{ijk}$ are i.i.d. $N(0, \sigma^2)$, $i = 1, \ldots, a$, $j = 1, \ldots, b$, $k = 1, \ldots, n$.

The Sum of Squares

$SSA = bn\sum_i(\bar{y}_{i..} - \bar{y}_{...})^2, \quad SSB = an\sum_j(\bar{y}_{.j.} - \bar{y}_{...})^2,$
$SSAB = n\sum_i\sum_j(\bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...})^2,$
$SSE = \sum_i\sum_j\sum_k(y_{ijk} - \bar{y}_{ij.})^2, \quad SST = \sum_i\sum_j\sum_k(y_{ijk} - \bar{y}_{...})^2$

The Chi-Square Distributions

• Each of the sums of squares above, divided by $\sigma^2$, has a chi-square distribution (for SSA, SSB, and SSAB this holds under the corresponding null hypothesis of no effect). For example, $SSE/\sigma^2 \sim \chi^2_{ab(n-1)}$.

The ANOVA Identity

SST = SSA + SSB + SSAB + SSE, and
Total DF = Row DF + Column DF + Interaction DF + Error DF:
$(abn - 1) = (a - 1) + (b - 1) + (a - 1)(b - 1) + ab(n - 1)$

The Tests

$F_A = MSA/MSE$, $F_B = MSB/MSE$, $F_{AB} = MSAB/MSE$; each null hypothesis of no effect is rejected when the corresponding statistic exceeds the appropriate $F$ critical value.
Multiple Comparisons Between
Rows and/or Columns
• Pairwise comparisons between the row main effects
and/or between the column main effects are generally of
interest only when the interactions are nonsignificant.
• The Tukey method can be used to determine 100(1-α)% simultaneous confidence intervals for these pairwise differences.
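As a hedged sketch (not the presenters' material), Tukey 100(1-α)% simultaneous intervals for pairwise differences of row means in a balanced two-way layout can be computed as below; scipy.stats.studentized_range (SciPy 1.7 or later) is assumed to be available for the studentized range quantile.

# Hedged sketch: Tukey simultaneous confidence intervals for all pairwise
# differences of row main effects in a balanced a x b layout with n
# replicates per cell, using MSE as the error estimate.
import itertools
import math
from scipy.stats import studentized_range

def tukey_row_intervals(row_means, b, n, mse, alpha=0.05):
    a = len(row_means)
    error_df = a * b * (n - 1)
    q_crit = studentized_range.ppf(1 - alpha, a, error_df)
    half_width = q_crit * math.sqrt(mse / (b * n))
    return {(i, j): (row_means[i] - row_means[j] - half_width,
                     row_means[i] - row_means[j] + half_width)
            for i, j in itertools.combinations(range(a), 2)}

# Example with the grade data from the previous section:
print(tukey_row_intervals([71.5, 86.0, 85.0], b=2, n=3, mse=91.06))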
Theory Derivation for the 2^k Experiment
Zheng Zhao
One Factor Experiment with Two Levels

A     Data                               Mean
A+    y11, y12, y13, y14, ..., y1n1      ȳ1
A-    y21, y22, y23, y24, ..., y2n2      ȳ2
2² Experiment—The Introduction of
Factor B with Two levels
• Factor A = High (+) or Low (-)
• Factor B = High (+) or Low (-)
Four Treatment Combinations
• ab = (A High, B High)
• a = (A High, B Low)
• b = (A Low, B High)
• (1) = (A Low, B Low)

$Y_{ij} \sim N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, 2^k$; $j = 1, 2, \ldots, n$
Factor A    Factor B    Treatment Combination   Data
Low (-)     Low (-)     (1)                     y(1),1, ..., y(1),n
High (+)    Low (-)     a                       ya,1, ..., ya,n
Low (-)     High (+)    b                       yb,1, ..., yb,n
High (+)    High (+)    ab                      yab,1, ..., yab,n
Main Effect and Interaction Effect of the 2² Experiment

$A = \dfrac{(\mu_{ab} - \mu_b) + (\mu_a - \mu_{(1)})}{2} = \dfrac{\mu_{ab} + \mu_a}{2} - \dfrac{\mu_b + \mu_{(1)}}{2}$

$B = \dfrac{(\mu_{ab} - \mu_a) + (\mu_b - \mu_{(1)})}{2} = \dfrac{\mu_{ab} + \mu_b}{2} - \dfrac{\mu_a + \mu_{(1)}}{2}$

$AB = \dfrac{(\mu_{ab} - \mu_b) - (\mu_a - \mu_{(1)})}{2} = \dfrac{(\mu_{ab} - \mu_a) - (\mu_b - \mu_{(1)})}{2}$
Estimated Effects

• Est. Main Effect A: $\hat{A} = \dfrac{(\bar{y}_{ab} - \bar{y}_b) + (\bar{y}_a - \bar{y}_{(1)})}{2}$

• Est. Main Effect B: $\hat{B} = \dfrac{(\bar{y}_{ab} - \bar{y}_a) + (\bar{y}_b - \bar{y}_{(1)})}{2}$

• Est. Interaction AB: $\widehat{AB} = \dfrac{(\bar{y}_{ab} - \bar{y}_b) - (\bar{y}_a - \bar{y}_{(1)})}{2}$
2³ Experiment -- One More Factor Considered

Factors A, B, and C, each at two levels: 2³ = 8 treatment combinations
• (1): Low, Low, Low
• a: High, Low, Low
• b: Low, High, Low
• ab: High, High, Low
• c: Low, Low, High
• ac: High, Low, High
• bc: Low, High, High
• abc: High, High, High
The estimated effects in terms of the treatment means are:

$A = \dfrac{(\bar{y}_{abc} - \bar{y}_{bc}) + (\bar{y}_{ac} - \bar{y}_c) + (\bar{y}_{ab} - \bar{y}_b) + (\bar{y}_a - \bar{y}_{(1)})}{4}$

$B = \dfrac{(\bar{y}_{abc} - \bar{y}_{ac}) + (\bar{y}_{bc} - \bar{y}_c) + (\bar{y}_{ab} - \bar{y}_a) + (\bar{y}_b - \bar{y}_{(1)})}{4}$

$C = \dfrac{(\bar{y}_{abc} - \bar{y}_{ab}) + (\bar{y}_{bc} - \bar{y}_b) + (\bar{y}_{ac} - \bar{y}_a) + (\bar{y}_c - \bar{y}_{(1)})}{4}$

$AB = \dfrac{\{(\bar{y}_{abc} - \bar{y}_{bc}) - (\bar{y}_{ac} - \bar{y}_c)\} + \{(\bar{y}_{ab} - \bar{y}_b) - (\bar{y}_a - \bar{y}_{(1)})\}}{4}$

$BC = \dfrac{\{(\bar{y}_{abc} - \bar{y}_{ac}) + (\bar{y}_{bc} - \bar{y}_c)\} - \{(\bar{y}_{ab} - \bar{y}_a) + (\bar{y}_b - \bar{y}_{(1)})\}}{4}$

$AC = \dfrac{\{(\bar{y}_{abc} - \bar{y}_{ab}) - (\bar{y}_{bc} - \bar{y}_b)\} + \{(\bar{y}_{ac} - \bar{y}_a) - (\bar{y}_c - \bar{y}_{(1)})\}}{4}$

$ABC = \dfrac{\{(\bar{y}_{abc} - \bar{y}_{bc}) - (\bar{y}_{ac} - \bar{y}_c)\} - \{(\bar{y}_{ab} - \bar{y}_b) - (\bar{y}_a - \bar{y}_{(1)})\}}{4}$
2^2 experiment
Wei Hu
Calculate the estimated main effects A and B, and the interaction AB.

               B Low        B High       Row mean
A Low          y11 = 10     y12 = 15     ȳ1. = 12.5
A High         y21 = 20     y22 = 35     ȳ2. = 27.5
Column mean    ȳ.1 = 15     ȳ.2 = 25     ȳ.. = 20
The estimated main effects are:
A = {(y22 - y12) + (y21 - y11)}/2 = {(35 - 15) + (20 - 10)}/2 = 15
B = {(y22 - y21) + (y12 - y11)}/2 = {(35 - 20) + (15 - 10)}/2 = 10
The estimated interaction effect is:
AB = {(y22 - y12) - (y21 - y11)}/2 = {(35 - 15) - (20 - 10)}/2 = 5
$SST = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{..})^2 = (10-20)^2 + (15-20)^2 + (20-20)^2 + (35-20)^2 = 350$

$SSA = J\sum_{i=1}^{I}(\bar{y}_{i.} - \bar{y}_{..})^2 = 2\{(12.5-20)^2 + (27.5-20)^2\} = 225$

$SSB = I\sum_{j=1}^{J}(\bar{y}_{.j} - \bar{y}_{..})^2 = 2\{(15-20)^2 + (25-20)^2\} = 100$

$SSAB = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2 = (10-12.5-15+20)^2 + (15-12.5-25+20)^2 + (20-27.5-15+20)^2 + (35-27.5-25+20)^2 = 25$

$SSE = SST - SSA - SSB - SSAB = 350 - 225 - 100 - 25 = 0$
ANOVA Table (Two-Way Layout with Fixed Factors)

Source   d.f.             SS                                                                                           MS                             F
A        I - 1            $SSA = J\sum_{i=1}^{I}(\bar{y}_{i.} - \bar{y}_{..})^2$                                       MSA = SSA/(I - 1)              MSA/MSAB
B        J - 1            $SSB = I\sum_{j=1}^{J}(\bar{y}_{.j} - \bar{y}_{..})^2$                                       MSB = SSB/(J - 1)              MSB/MSAB
AB       (I - 1)(J - 1)   $SSAB = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$  MSAB = SSAB/((I - 1)(J - 1))
Total    N - 1            $SST = \sum_{i=1}^{I}\sum_{j=1}^{J}(y_{ij} - \bar{y}_{..})^2$

(With a single observation per cell, as in this example, MSAB serves as the error term for the F tests.)
Analysis of Variance (Two-Way Layout with Fixed Factors)

Source   d.f.   SS     MS     F
A        1      225    225    9
B        1      100    100    4
AB       1      25     25
Total    3      350
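As an illustrative check (not part of the original slides), the estimated effects and sums of squares for this 2^2 example can be reproduced with a few lines of Python.

# Illustrative sketch: effects and sums of squares for the 2^2 example above.
import numpy as np

# y[i, j]: i indexes the level of A (0 = low, 1 = high), j the level of B.
y = np.array([[10.0, 15.0],
              [20.0, 35.0]])

grand = y.mean()                       # 20
row = y.mean(axis=1)                   # [12.5, 27.5]  (A low, A high)
col = y.mean(axis=0)                   # [15, 25]      (B low, B high)

A_eff = row[1] - row[0]                # 15
B_eff = col[1] - col[0]                # 10
AB_eff = ((y[1, 1] - y[0, 1]) - (y[1, 0] - y[0, 0])) / 2       # 5

SST = ((y - grand) ** 2).sum()                                 # 350
SSA = 2 * ((row - grand) ** 2).sum()                           # 225
SSB = 2 * ((col - grand) ** 2).sum()                           # 100
SSAB = ((y - row[:, None] - col[None, :] + grand) ** 2).sum()  # 25

print(A_eff, B_eff, AB_eff, SST, SSA, SSB, SSAB)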
2³ Experiment
Hao Zhang
DESIGN AND CALCULATION MATRIX

Run   Comb      I    X1   X2   X1X2   X3   X1X3   X2X3   X1X2X3
1     (1)       +    -    -    +      -    +      +      -
2     x1        +    +    -    -      -    -      +      +
3     x2        +    -    +    -      -    +      -      +
4     x1x2      +    +    +    +      -    -      -      -
5     x3        +    -    -    +      +    -      -      +
6     x1x3      +    +    -    -      +    +      -      -
7     x2x3      +    -    +    -      +    -      +      -
8     x1x2x3    +    +    +    +      +    +      +      +
Example:
• To study the effects of bicycle seat height, generator use, and tire pressure on the time taken to make a half-block uphill run.
• The levels of the factors were as follows:
  Seat height:    26" (-)     30" (+)
  Generator:      Off (-)     On (+)
  Tire pressure:  40 psi (-)  55 psi (+)
BICYCLE DATA: Travel Times from Bicycle Experiment

Factor            Time (Secs.)
A    B    C       Run 1    Run 2    Mean
-    -    -       51       54       52.5
+    -    -       41       43       42
-    +    -       54       60       57
+    +    -       44       43       43.5
-    -    +       50       48       49
+    -    +       39       39       39
-    +    +       53       51       52
+    +    +       41       44       42.5
CALCULATION OF THE EFFECTS

X1 = {(42.5 - 52) + (39 - 49) + (43.5 - 57) + (42 - 52.5)}/4 = -10.875
X2 = {(42.5 - 39) + (52 - 49) + (43.5 - 42) + (57 - 52.5)}/4 = +3.125
X3 = {(42.5 - 43.5) + (52 - 57) + (39 - 42) + (49 - 52.5)}/4 = -3.125
X1X2 = {(42.5 - 52) - (39 - 49)}/4 + {(43.5 - 57) - (42 - 52.5)}/4 = -0.625
X1X3 = {(42.5 - 52) - (43.5 - 57)}/4 + {(39 - 49) - (42 - 52.5)}/4 = +1.125
X2X3 = {(42.5 - 39) - (43.5 - 42)}/4 + {(52 - 49) - (57 - 52.5)}/4 = +0.125
X1X2X3 = {(42.5 - 52) - (39 - 49)}/4 - {(43.5 - 57) - (42 - 52.5)}/4 = +0.875
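A small sketch (not from the original slides) that reproduces these estimates from the eight treatment means, using the sign columns of the design/calculation matrix:

# Illustrative sketch: estimated effects of the 2^3 bicycle experiment,
# computed as (contrast . means) / 4, with interaction signs built as
# products of the main-effect sign columns (standard order (1), a, b, ab,
# c, ac, bc, abc).
import numpy as np

means = np.array([52.5, 42.0, 57.0, 43.5, 49.0, 39.0, 52.0, 42.5])

x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])   # seat height
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1])   # generator
x3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1])   # tire pressure

columns = {"X1": x1, "X2": x2, "X3": x3,
           "X1X2": x1 * x2, "X1X3": x1 * x3, "X2X3": x2 * x3,
           "X1X2X3": x1 * x2 * x3}
for name, c in columns.items():
    print(name, (c @ means) / 4)
# X1 -10.875, X2 3.125, X3 -3.125, X1X2 -0.625,
# X1X3 1.125, X2X3 0.125, X1X2X3 0.875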
CONCLUSIONS:
♦ Only the main effects are large; all interactions are small in comparison.
♦ The X1 and X3 main effects are negative, implying that to reduce the travel time the high levels of these factors should be used.
♦ The X2 main effect is positive, implying that to reduce the travel time the low level of X2 should be used.
Data Analysis of 2^k Experiment
By Ti Zhou
Based on the previous discussion, we now generalize to the 2^k experiment for general k and work through an example.

• Assumption: n i.i.d. observations $y_{ij}$ ($j = 1, 2, \ldots, n$) at each $i$th treatment combination
• Denote their sample means by $\bar{y}_i$ $(i = 1, 2, \ldots, 2^k)$
• Estimated effect $= \dfrac{\text{Contrast}}{2^{k-1}} = \dfrac{\sum_{i=1}^{2^k} c_i \bar{y}_i}{2^{k-1}}$
• $c_i$ is the contrast coefficient for the main effect:
  $c_i = +1$ if the main effect is at the high level, $c_i = -1$ if it is at the low level
For example, when k = 3 the contrast coefficients for the main effects are as follows:

Run   A   B   C   Treatment
1     -   -   -   (1)
2     +   -   -   a
3     -   +   -   b
4     +   +   -   ab
5     -   -   +   c
6     +   -   +   ac
7     -   +   +   bc
8     +   +   +   abc
The contrast coefficients for interactions are obtained
by taking term-by-term products of the contrast vectors
of corresponding main effects.
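A brief sketch (not from the original slides) of how the full set of contrast columns can be generated for any k, with each interaction column formed as the term-by-term product of the corresponding main-effect columns:

# Illustrative sketch: contrast columns of a 2^k design in standard order.
from itertools import combinations, product
from math import prod

def contrast_matrix(k):
    # Treatment combinations in standard (Yates) order: factor A varies fastest.
    runs = [levels[::-1] for levels in product([-1, 1], repeat=k)]
    factors = [chr(ord("A") + f) for f in range(k)]   # 'A', 'B', 'C', ...
    columns = {}
    for size in range(1, k + 1):
        for combo in combinations(range(k), size):
            name = "".join(factors[f] for f in combo)
            # interaction column = term-by-term product of main-effect signs
            columns[name] = [prod(run[f] for f in combo) for run in runs]
    return columns

for name, col in contrast_matrix(3).items():
    print(name, col)   # A, B, C, AB, AC, BC, ABC sign columns

Calling contrast_matrix(4) gives the corresponding columns for a 2^4 experiment.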
The contrast coefficients for a 2^4 experiment are obtained in the same way (the table itself is not reproduced in this transcript).
2^k Design Example

Problem Statement: Generally there are three important factors in designing a computer:
1. Memory Size (A)
2. Cache Size (B)
3. Number of Processors (C)
A manufacturer wants to study the effects of these three factors on the performance of computers, together with their interactions. The levels of each factor are as follows:

Factor                     Level -1    Level 1
A  Memory Size             4 MB        16 MB
B  Cache Size              1 KB        2 KB
C  Number of Processors    1           2
Computer Design Experiment

Run   Treatment Combination   Coded Factors      Benchmark Scores
                              A     B     C      Replicate 1    Replicate 2
1     (1)                     -1    -1    -1     16.7           14.8
2     a                       1     -1    -1     24.5           23.3
3     b                       -1    1     -1     13             11.6
4     ab                      1     1     -1     34.2           33.6
5     c                       -1    -1    1      45.1           46.3
6     ac                      1     -1    1      59.2           57.3
7     bc                      -1    1     1      51.4           49.3
8     abc                     1     1     1      81.9           84.6
Go Back to Yates' Algorithm
• The estimated effect for each factor and interaction can be calculated as follows:

$\text{Estimated effect} = \dfrac{\text{Contrast}}{2^{k-1}} = \dfrac{\sum_{i=1}^{2^k} c_i \bar{y}_i}{2^{k-1}}$

• For example, the main effect of A is
$(-\bar{y}_{(1)} + \bar{y}_a - \bar{y}_b + \bar{y}_{ab} - \bar{y}_c + \bar{y}_{ac} - \bar{y}_{bc} + \bar{y}_{abc})/4$
$= (-15.75 + 23.9 - 12.3 + 33.9 - 45.7 + 58.25 - 50.35 + 83.25)/4 = 18.8$
To summarize, all the effects are calculated:

Effect        A      B      C        AB     AC      BC      ABC
Est. Effect   18.8   9.05   37.925   8.45   3.925   5.775   1.725
• Statistical inference for the experiment:

$SSE = \sum_{i=1}^{2^k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_i)^2, \qquad MSE = \dfrac{SSE}{2^k(n-1)} = s^2$

$SS_{\text{effect}} = n\,2^{k-2}(\text{Est. effect})^2 = MS_{\text{effect}}$ (each effect has 1 d.f.)

$F_{\text{effect}} = \dfrac{MS_{\text{effect}}}{MSE} = \dfrac{n\,2^{k-2}(\text{Est. effect})^2}{s^2} \sim F_{1,\,2^k(n-1)}$ under the null hypothesis of no effect.

Since n = 2 and k = 3, we get $SS_{\text{effect}} = 4(\text{Est. effect})^2$.
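As a small illustration (not part of the original slides), the quantities in the ANOVA table that follows can be reproduced from the replicated observations:

# Illustrative sketch: SS_effect, MSE and F ratios for the 2^3 computer
# design experiment with n = 2 replicates per treatment combination.
import numpy as np

# Replicates in standard order (1), a, b, ab, c, ac, bc, abc.
reps = np.array([
    [16.7, 14.8], [24.5, 23.3], [13.0, 11.6], [34.2, 33.6],
    [45.1, 46.3], [59.2, 57.3], [51.4, 49.3], [81.9, 84.6],
])
n, k = reps.shape[1], 3
means = reps.mean(axis=1)

sse = ((reps - means[:, None]) ** 2).sum()     # 12.06
mse = sse / (2**k * (n - 1))                   # 1.5075

# Estimated effects via contrasts (signs as in the calculation matrix).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
x3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
effects = {"A": x1, "B": x2, "C": x3, "AB": x1 * x2,
           "AC": x1 * x3, "BC": x2 * x3, "ABC": x1 * x2 * x3}
for name, c in effects.items():
    est = (c @ means) / 2**(k - 1)
    ss = n * 2**(k - 2) * est**2               # = 4 * (est. effect)^2 here
    print(name, round(est, 3), round(ss, 4), round(ss / mse, 2))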
ANOVA Table for the Computer Design Experiment

Factor                      Est. effect   SS effect (= MS effect)   DF   F ratio     P-value
Memory Size (A)             18.8          1413.76                   1    937.8176    <.0001
Cache Size (B)              9.05          327.61                    1    217.3201    <.0001
Number of Processors (C)    37.925        5753.2225                 1    3816.4      <.0001
AB                          8.45          285.61                    1    189.4594    <.0001
AC                          3.925         61.6225                   1    40.87728    0.0002
BC                          5.775         133.4025                  1    88.49254    <.0001
ABC                         1.725         11.9025                   1    7.895522    0.0228
Total error                               12.06                     8    MSE = 1.5075
Total                                     7999.19                   15
Conclusion
• Intuitively, increasing any of the three factors (Memory, Cache, and Processors) should have a positive effect on the performance of a computer.
• The p-values for both the main effects and the interaction effects are very small, indicating that all the effects are highly significant (at significance level 0.05), which is consistent with intuition.
• The F ratio for the effect of factor C (number of processors) is the largest among all the results, indicating that it is the most important factor for the performance of a computer.
Yates' Algorithm
Frank Yates (1902-1994) found a systematic algorithm to perform the above calculations. Recall the former example, the computer design experiment; we will use it to explain the algorithm.
 List all the treatment combinations and their sample means in the standard order.
 Form the sums of the 2^(k-1) successive pairs of means, followed by their differences (second member of each pair minus the first); save the results in the column labeled I.
 Repeat this calculation k times in total, each time operating on the previous column, so that the last column obtained is column k (here, column III).
 Divide the first entry of column k by 2^k and the remaining entries by 2^(k-1). This gives the grand mean and all the estimated effects. (A small code sketch of the algorithm follows the table below.)
Treatment   Mean    I        II       III     Estimated Effect      SS for Treatment Combination
(1)         15.75   39.65    85.85    323.4   40.425 (grand mean)
a           23.9    46.2     237.55   75.2    18.8                  1413.76
b           12.3    103.95   29.75    36.2    9.05                  327.61
ab          33.9    133.6    45.45    33.8    8.45                  285.61
c           45.7    8.15     6.55     151.7   37.925                5753.223
ac          58.25   21.6     29.65    15.7    3.925                 61.6225
bc          50.35   12.55    13.45    23.1    5.775                 133.4025
abc         83.25   32.9     20.35    6.9     1.725                 11.9025
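The sketch below (not from the original slides) implements Yates' algorithm as just described; applied to the eight treatment means above, it reproduces the grand mean 40.425 and the estimated effects.

# Illustrative sketch of Yates' algorithm for a 2^k experiment.
def yates(means):
    # means: treatment means in standard order (1), a, b, ab, c, ac, bc, abc, ...
    # Returns (grand_mean, effects); effects are ordered A, B, AB, C, AC, BC, ABC, ...
    col = list(means)
    m = len(col)                  # m = 2^k
    k = m.bit_length() - 1
    for _ in range(k):            # build columns I, II, ..., k
        sums = [col[2 * i] + col[2 * i + 1] for i in range(m // 2)]
        diffs = [col[2 * i + 1] - col[2 * i] for i in range(m // 2)]
        col = sums + diffs
    grand_mean = col[0] / m                    # first entry / 2^k
    effects = [c / (m // 2) for c in col[1:]]  # remaining entries / 2^(k-1)
    return grand_mean, effects

gm, eff = yates([15.75, 23.9, 12.3, 33.9, 45.7, 58.25, 50.35, 83.25])
print(gm)    # 40.425
print(eff)   # [18.8, 9.05, 8.45, 37.925, 3.925, 5.775, 1.725]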
RELATED SAS PROGRAM & MODEL DIAGNOSTICS
BY TONY

SAS --- a more efficient way, and very interesting.
One suggestion?
Example: The Pilot Plant Investigation

This experiment employed a 2^3 factorial experimental design with two quantitative factors----temperature T and concentration C----and a single qualitative factor----type of catalyst K. Each data value recorded is the response yield (y1, y2) of two duplicate runs. The data are as follows:

Run Number   Temperature T(C)   Concentration C(%)   Catalyst K(A or B)   Yield1 y1(%)   Yield2 y2(%)
1            160 (-1)           20 (-1)              A (-1)               59             61
2            180 (+1)           20 (-1)              A (-1)               74             70
3            160 (-1)           40 (+1)              A (-1)               50             58
4            180 (+1)           40 (+1)              A (-1)               69             67
5            160 (-1)           20 (-1)              B (+1)               50             54
6            180 (+1)           20 (-1)              B (+1)               81             85
7            160 (-1)           40 (+1)              B (+1)               46             44
8            180 (+1)           40 (+1)              B (+1)               79             81
data plant;
input t c k result @@;
tc=t*c;
tk=t*k;
ck=c*k;
tck=t*c*k;
datalines;
-1 -1 -1 59  -1 -1 -1 61
 1 -1 -1 74   1 -1 -1 70
-1  1 -1 50  -1  1 -1 58
 1  1 -1 69   1  1 -1 67
-1 -1  1 50  -1 -1  1 54
 1 -1  1 81   1 -1  1 85
-1  1  1 46  -1  1  1 44
 1  1  1 79   1  1  1 81
;
run;

Q: Can you input the data in a more concise way?
data plant;
do k=-1 to 1 by 2;
do c=-1 to 1 by 2;
do t=-1 to 1 by 2;
do r=1 to 2;
input result @@;
ck=k*c; tk=k*t; tc=c*t; tck=k*c*t;
output;
end;
end;
end;
end;
datalines;
59 61 74 70 50 58 69 67
50 54 81 85 46 44 79 81
;
run;

(A partial listing of the resulting data set, showing the variables t, c, k, r, result and the derived interaction columns tc, tk, ck, tck, was displayed on this slide.)
proc glm data=plant;
class t c k tc tk ck tck;
model result = t c k tc tk ck tck;
run;
OUTPUT:

The GLM Procedure
Class Level Information

Class   Levels   Values
t       2        -1 1
c       2        -1 1
k       2        -1 1
tc      2        -1 1
tk      2        -1 1
ck      2        -1 1
tck     2        -1 1

Number of Observations Read   16
Number of Observations Used   16

Dependent Variable: result

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             7    2635.000000      376.428571    47.05     <.0001
Error             8    64.000000        8.000000
Corrected Total   15   2699.000000

R-Square   Coeff Var   Root MSE   result Mean
0.976288   4.402221    2.828427   64.25000
Source   DF   Type I SS     Mean Square   F Value   Pr > F
t        1    2116.000000   2116.000000   264.50    <.0001
c        1    100.000000    100.000000    12.50     0.0077
k        1    9.000000      9.000000      1.13      0.3198
tc       1    9.000000      9.000000      1.13      0.3198
tk       1    400.000000    400.000000    50.00     0.0001
ck       1    0.000000      0.000000      0.00      1.0000
tck      1    1.000000      1.000000      0.13      0.7328

Source   DF   Type III SS   Mean Square   F Value   Pr > F
t        1    2116.000000   2116.000000   264.50    <.0001
c        1    100.000000    100.000000    12.50     0.0077
k        1    9.000000      9.000000      1.13      0.3198
tc       1    9.000000      9.000000      1.13      0.3198
tk       1    400.000000    400.000000    50.00     0.0001
ck       1    0.000000      0.000000      0.00      1.0000
tck      1    1.000000      1.000000      0.13      0.7328

Conclusion: At the significance level 0.05, the main effects T and C and the TK interaction are all significant.
REVISION:
proc glm data=plant;
class t c k tk;
model result = t c k tk;
output out=plant2 r=resid p=predic;
run;
proc plot data=plant2;
plot resid*predic;
run;
OUTPUT:

Source   DF   Type III SS   Mean Square   F Value   Pr > F
t        1    2116.000000   2116.000000   314.54    <.0001
c        1    100.000000    100.000000    14.86     0.0027
k        1    9.000000      9.000000      1.34      0.2719
tk       1    400.000000    400.000000    59.46     <.0001
Constant Variance?

(Scatter plot of resid*predic from PROC PLOT: residuals on the vertical axis, ranging from about -4.5 to 3.5, against predicted values from 45 to 85. Legend: A = 1 obs, B = 2 obs, etc.)
CONCLUSION
From the plot of residuals against fitted values, we find that most points are fairly evenly dispersed. Although the spread of some points appears uneven, it is probably not serious enough to reject the constant variance assumption.
Is the population normal?
proc univariate data=plant2 plot;
var resid;
run;
Normal Probability Plot

(Normal probability plot of the residuals from PROC UNIVARIATE: the points fall close to the reference line.)

CONCLUSION
The data come from a population which is normal or approximately normal.
Regression Approach & Summary
Wenbin Zhang
• Purpose of the Regression Approach
• Regression Approach to Two-Factor Experiments
• Regression Approach to 2^k Experiments
• An Example of Regression on 2^3 Experiments
• Summary of Chapter 13
Purpose of the Regression Approach
• To provide a unified approach to the analysis of balanced or unbalanced designs
• To predict the responses at specified combinations of the levels of the experimental factors
Regression Approach to Two-Factor Experiments
• Define indicator variables:
For i = 1, ..., a-1:
$u_i = \begin{cases} +1 & \text{if the observation is from the } i\text{th row} \\ -1 & \text{if the observation is from the } a\text{th row} \\ 0 & \text{otherwise} \end{cases}$

For j = 1, ..., b-1:
$v_j = \begin{cases} +1 & \text{if the observation is from the } j\text{th column} \\ -1 & \text{if the observation is from the } b\text{th column} \\ 0 & \text{otherwise} \end{cases}$
Regression Approach to Two-Factor Experiments
• Regression Model:
$Y = \mu + \sum_{i=1}^{a-1}\alpha_i u_i + \sum_{j=1}^{b-1}\beta_j v_j + \sum_{i=1}^{a-1}\sum_{j=1}^{b-1}(\alpha\beta)_{ij} u_i v_j + \epsilon$
– The indicator variables $u_i, v_j$ are the predictor variables
– $\alpha_i, \beta_j, (\alpha\beta)_{ij}$ are unknown parameters
• This regression model is equivalent to the model we have examined:
$Y_{ijk} = \mu_{ij} + \epsilon_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$
$\mu = \bar{\mu}_{..}, \quad \alpha_i = \bar{\mu}_{i.} - \bar{\mu}_{..}, \quad \beta_j = \bar{\mu}_{.j} - \bar{\mu}_{..}$
$(\alpha\beta)_{ij} = \mu_{ij} - \mu - \alpha_i - \beta_j = \mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..}$
Regression Approach to 2^k Experiments
• For 2^2 experiments, define indicator variables x1 and x2 to represent the levels of A and B:
$x_1 = -1$ if A is low, $x_1 = +1$ if A is high; $x_2 = -1$ if B is low, $x_2 = +1$ if B is high
• Regression Model:
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$
• The coefficients are related to the grand mean, the main effects, and the interaction effect by
$\beta_0 = \mu, \quad \beta_1 = \dfrac{A}{2}, \quad \beta_2 = \dfrac{B}{2}, \quad \beta_{12} = \dfrac{AB}{2}$
and are estimated by
$\hat{\beta}_0 = \bar{y}, \quad \hat{\beta}_1 = \dfrac{\hat{A}}{2}, \quad \hat{\beta}_2 = \dfrac{\hat{B}}{2}, \quad \hat{\beta}_{12} = \dfrac{\widehat{AB}}{2}$
Regression Approach to 2^k Experiments
• Similarly, the full regression model for k = 3 is:
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \beta_{123} x_1 x_2 x_3$
• Substituting the estimated main effects and interaction effects gives the fitted model:
$\hat{y} = \bar{y} + \dfrac{A}{2}x_1 + \dfrac{B}{2}x_2 + \dfrac{C}{2}x_3 + \dfrac{AB}{2}x_1 x_2 + \dfrac{AC}{2}x_1 x_3 + \dfrac{BC}{2}x_2 x_3 + \dfrac{ABC}{2}x_1 x_2 x_3$
• The reduced model after dropping all interactions from the full model is:
$\hat{y} = \bar{y} + \dfrac{A}{2}x_1 + \dfrac{B}{2}x_2 + \dfrac{C}{2}x_3$
– The parameters will be unchanged because of the orthogonal nature of the design
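A quick numerical sketch (not from the original slides) of this orthogonality claim: fitting the full and the reduced model to the eight bicycle treatment means by least squares gives the same main-effect coefficients, each equal to half the corresponding estimated effect.

# Illustrative sketch: least squares fits of the full and reduced models
# for the 2^3 bicycle experiment (treatment means in standard order).
import numpy as np

y = np.array([52.5, 42.0, 57.0, 43.5, 49.0, 39.0, 52.0, 42.5])
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
x3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
ones = np.ones(8)

X_full = np.column_stack([ones, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3])
X_red = np.column_stack([ones, x1, x2, x3])

beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
beta_red, *_ = np.linalg.lstsq(X_red, y, rcond=None)

print(beta_full[:4])   # [47.1875 -5.4375  1.5625 -1.5625]
print(beta_red)        # same first four coefficients -- orthogonal design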
Regression Approach to 2^k Experiments
• To predict the response at specified combinations of the levels of the experimental factors:
– For numerical factors, use the interpolation formula:
$x_i = \dfrac{\text{Specified level} - \text{Average level}}{\text{Range}/2} = \dfrac{\text{Specified level} - (\text{High} + \text{Low})/2}{(\text{High} - \text{Low})/2}$
– For nominal factors (e.g., either on or off), no interpolation is needed.
An Example of Regression on 2^3 Experiments
• The effects of bicycle seat height, generator use, and tire pressure on the time taken to make a half-block uphill run.
• The levels of the factors:
  Seat height (Factor A): 26" (-), 30" (+)
  Generator (Factor B): Off (-), On (+)
  Tire pressure (Factor C): 40 psi (-), 55 psi (+)
• The data are shown on the next slide.
• Questions:
– 1) Predict the minimum travel time using the regression model.
– 2) Predict the travel time when seat height = 27", the generator is off, and tire pressure = 50 psi.
An Example of Regression on 2^3 Experiments

Travel Times from Bicycle Experiment
Factor            Time (Secs.)
A    B    C       Run 1    Run 2    Mean
-    -    -       51       54       52.5
+    -    -       41       43       42.0
-    +    -       54       60       57.0
+    +    -       44       43       43.5
-    -    +       50       48       49.0
+    -    +       39       39       39.0
-    +    +       53       51       52.0
+    +    +       41       44       42.5
An Example of Regression on 2^3 Experiments
• Because all interactions were found nonsignificant, we only consider the main effects:

$A = \dfrac{(42.5 - 52.0) + (39.0 - 49.0) + (43.5 - 57.0) + (42.0 - 52.5)}{4} = -10.875$

$B = \dfrac{(42.5 - 39.0) + (52.0 - 49.0) + (43.5 - 42.0) + (57.0 - 52.5)}{4} = 3.125$

$C = \dfrac{(42.5 - 43.5) + (52.0 - 57.0) + (39.0 - 42.0) + (49.0 - 52.5)}{4} = -3.125$

$\bar{y} = 47.1875$

$\hat{y} = 47.1875 + \dfrac{-10.875}{2}x_1 + \dfrac{3.125}{2}x_2 + \dfrac{-3.125}{2}x_3 = 47.1875 - 5.4375 x_1 + 1.5625 x_2 - 1.5625 x_3$
An Example of Regression on 2^3 Experiments
• 1) The minimum predicted travel time is obtained at $x_1 = +1$, $x_2 = -1$, $x_3 = +1$:
$\hat{y} = 47.1875 - 5.4375(+1) + 1.5625(-1) - 1.5625(+1) = 38.625 \text{ sec}$
• 2) When seat height = 27", the generator is off, and tire pressure = 50 psi, we have
$x_1 = \dfrac{27 - (30 + 26)/2}{(30 - 26)/2} = -0.5, \qquad x_2 = -1, \qquad x_3 = \dfrac{50 - (55 + 40)/2}{(55 - 40)/2} = 0.333$
$\hat{y} = 47.1875 - 5.4375(-0.5) + 1.5625(-1) - 1.5625(0.333) = 47.823 \text{ sec}$
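The same predictions can be checked with a few lines of Python (an illustrative sketch, not part of the original slides); code_level simply applies the interpolation formula from the earlier slide.

# Illustrative sketch: predictions from the fitted main-effects model
# y_hat = 47.1875 - 5.4375*x1 + 1.5625*x2 - 1.5625*x3.
def code_level(specified, low, high):
    # Interpolation formula: map a factor level onto the coded scale [-1, +1].
    return (specified - (high + low) / 2) / ((high - low) / 2)

def predict(x1, x2, x3):
    return 47.1875 - 5.4375 * x1 + 1.5625 * x2 - 1.5625 * x3

# 1) Minimum travel time: seat high (+1), generator off (-1), pressure high (+1).
print(predict(+1, -1, +1))                  # 38.625

# 2) Seat height 27", generator off, tire pressure 50 psi.
x1 = code_level(27, 26, 30)                 # -0.5
x3 = code_level(50, 40, 55)                 # 0.333...
print(round(predict(x1, -1, x3), 3))        # 47.823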
Summary
This chapter gives us methods for designing multi-factor experiments and for analyzing the data we collect.
1. The designs considered are complete factorial designs, as opposed to fractional factorial designs.
2. A 2^k experiment is better than a one-factor-at-a-time approach because it can detect interactions.
3. We discussed two-factor experiments with an arbitrary number of levels per factor and k-factor experiments with two levels per factor.
• Two-factor experiments are considered for balanced designs and unbalanced designs.
• 2^k experiments are considered only for balanced designs.
Summary (cont.)
4. Once we have the data from the experiment, we fit the parameters of our linear model and partition the total variation:
SST = SS(all single-factor effects) + SS(all interaction effects) + SSE (error)
5. The ANOVA table gives us a great way to analyze the data and assess the significance of each factor.
6. We can use the regression approach to obtain the parameters of the model above.
Thank you!
Any Questions?