Diapositive 1

Download Report

Transcript Diapositive 1

1
RESPONSE SURFACE
METHODOLOGY
(R S M)
Par
Mariam MAHFOUZ
Planning


Part I
A - Introduction to the RSM method
B - Techniques of the RSM method
C - Terminology
D - A review of the method of least squares
Part II
A - Procedure to determine optimum
conditions – Steps of the RSM method
B – Illustration of the method against an example
3
4
A – Introduction to the RSM method
The experimenter frequently faces the task of exploring
the relationship between some response y and a number of
predictor variables x = (x1, x2, … , xk)’. Various degrees of
knowledge or ignorance may exist about the nature of such
relationships.
Most exploratory-type investigations are set up with a
twofold purpose:
1. To determine and quantify the relationship between the
values of one or more measurable response variable(s)
and the sittings of a group of experimental factors
presumed to affect the response(s) and
2. To find the sittings of experimental factors that produce the
best value or best set of values of the response (s).
5
Example
The following is an example of seeking an optimal
value of the response in drug manufacturing.
Combinations of two drugs, each known to reduce
blood pressure in humans, are to be studied.
A series of clinical trials involving 100 high blood
pressure patients is set up, and each patient is given some
predetermined combination of the two drugs.
The purpose of administering the different
combinations of the drugs to the individuals is to find the
specific combination that results the greatest reduction in
the patient’s blood pressure reading within some specified
interval of time.
6
B - Techniques of the Response surface
methodology (RSM)
1.
Setting up a series of experiments (designing a set of
experiments) that will yield adequate and reliable
measurements of the response of interest,
2.
Determining a mathematical model that best fits the
data collected from the design chosen in (1), by
conducting appropriate tests of hypotheses concerning
the model’s parameters, and
3.
Determining the optimal settings of the experimental
factors that produce the optimum (maximum, minimum
or close to a specific value) value of the response.
7
C – Terminology








Factors
Response
The response function
The polynomial representation of a response surface
The predicted response function
The response surface
Contour representation of a response surface
The operability region and the experimental region
8
Factors
Factors are processing conditions or input variables
whose values or settings can be controlled by
experimenter. Factors in a regression analysis can be
qualitative or quantitative.
The specific factors whose levels are to be studied in
detail in this course are those that are quantitative in
nature, and their levels are assumed to be fixed or
controlled by the experimenter.
Factors and their levels will be denoted by X1, X2,…,Xk
respectively.
9
Response
The response variable is the measured quantity
whose value is assumed to be affected by
changing the levels of the factors. The true value
of the response is denoted by .
However, because experimental error is present
in all experiments involving measurements, the
response value that is actually observed measured
for any particular combination of the factor levels
differs from .
This difference from the true value is written as
Y =  + , where Y represents the observed value
of the response and  denotes experimental error.
10
11
Response function
When we say that the value of the true response 
depends upon the levels X1, X2,…,Xk of k quantitative
factors, we are saying that there exists some function of
theses levels,  = (X1, X2,…,Xk).
The function  is called the true response function
(unknown), and is assumed to be a continuous , smooth
function of the Xi.
12
The polynomial representation of a
response surface
Let us consider the response function  = (X1) for a
1313single factor. If  is a continuous, smooth function,
then it is possible to represent it locally to any required
degree of approximation with a Taylor series expansion
about some arbitrary point X1,0:
  ( X 1,0 )  ( X 1  X 1,0 )( X 1,0 ) 
( X 1  X 1,0 ) 2
2!
( X 1,0 ) 
( X 1  X 1,0 )3
3!
( X 1,0 )  ...
where , , ,... are respectively, the first,
second, … derivatives of (X1) with respect to X1 .
13
The expansion (1) reduces to a polynomial of the form:
  ( X 1 )   0  1 X 1  11 X 12  111 X 13  ...
where the coefficients 0, 1, 11 … are parameters which depend on
X1,0 and the derivatives of (X1) at X1,0.
First order model with one factor:
  ( X1 )  0  1 X1
Second order model with one factor:
  ( X1 )  0  1 X1  11 X12
The second order model with two factors is in the form (equation 1):
  ( X1 )  0  1 X1  2 X 2  11 X12  22 X 22  12 X1 X 2
And so on …
14
Predicted response function

The structural form of  is usually unknown and
therefore an approximating form is sought using a
polynomial or some other type of empirical model
equation.
The steps, taken in obtaining the approximating model,
are as follows:
First, an assumed form of model equation in the k input
variables is proposed. Then, associated with the proposed
model, some number of combinations of the levels X1, X2,
…, Xk of the k factors are selected for use as the design. At
each factor level combination chosen, one or more
observations are collected.
15



The observations are used to obtain estimates of the parameters in the
proposed model as well as to obtain an estimate of the experimental
error variance.
Tests are then performed on the magnitudes of the coefficient
estimates as well as on the model form itself, and, if the model is
considered to be satisfactory, it can be used as a prediction equation.
Let us assume the true response function is represented by equation 1.
Estimates of
X the parameters 0, 1, … are obtained using the method of
least squares.
If these estimates,
denoted by b0, b1, … respectively, are used instead
.
of the unknown parameters 0, 1, …, we obtain the prediction

equation:
2
2
1

Y  b0  b1 X1  b2 X 2  b11 X1  b22 X 2  b12 X1 X 2

Y , called “Y hat”, denotes the predicted response value for
where
given values of X1 and X2.
16
The response surface
With k factors, the
response surface is a
subset of (k+1)dimensional Euclidean
space, and have as
equation:

Y  b0  b1 x1  b2 x2  ...
where xi, i=1, …, k, are
called coded variables.
17
Contour representation of a response
surface
A technique used to help visualize the shape of a
three-dimensional response surface (case of two
factors), is to plot the contours of the response surface.
In a contour plot, lines or curves of constant
response values are drawn on a graph or plane whose
coordinate axes represent the coded levels x1 and x2 of
the factors.
The lines (or curves) are known as contours of the
surface. Each contour represents a specific value
for the

height of the surface (i.e., a specific value of Y ) above
the plane defined for combinations of the levels of the
factors.
18
Geometrically,
each contour is
a projection
onto the x1x2
plane of a
cross-section of
the response
surface made
by a plane,
parallel to the
x1x2 plane,
cutting through
the surface.
19
20
21
The plotting of different surface height
values enables one to focus attention on the
levels of the factors at which the changes
occur in the surface shape.
Contour plotting is not limited to three
dimensional surfaces.
The geometrical representation for two and
three factors enables the general situation for k
> 3 factors to be more readily understood,
although they cannot be visualized
geometrically.
22
Operability and experimental regions
Let us call the region in the factor space in which the
experiments can actually be performed the operability
region O.
For some applications the experimenter may wish to
explore the whole region O, but this is usually rare. Instead,
a particular group of experiments is set up to explore only a
limited region of interest, R, which is entirely contained
within the operability region O. The region is called
experimental region.
23
In most experimental
programs, the design
points are positioned
inside
or
on
the
boundary of the region
R.
Typically R is defined
as, a cubical region, or
as a spherical region.
24
D – A review of the method of least squares
Let us assume provisionally that N observations of the
response are expressible by means of the first-order model
in k variables:
Yu  0  1 X u1  2 X u 2  ...  k X uk   u
, u  1,...,N
Yu denotes the observed response for the uth trial, Xui
represents the level of factor i at the uth trial, 0 and i are
unknown parameters, u represents the random error in Yu
and N is the number of observations (experiences).
25
Assumptions made about the errors are:
1.
2.
3.
Random errors u have zero mean and common
variance 2.
Random errors u have zero mean and common
variance 2.
For tests of significance (T- and F_statistics), and
confidence interval estimation procedures, an
additional assumption must be satisfied:
Random errors u are normally distributed.
26
Parameter estimates and properties
The method of least squares selects as estimates for
the unknown parameters in Eq. (1), those values, b0, b1,…,
bk respectively, which minimize the quantity:
N
R(  0 , 1 ,..., k )   (Yu  0  1 X u1   2 X u 2  ...   k X uk ) 2
u 1
Over N observations, the first-order model in Eq. (1) can be
expressed, in matrix notation, as: Y=X + , where:
 Y1 
.
 
Y  . 
 
.
YN 
1 X 11 . . . X 1k 
. .

.


X  . .
. .


. 
. .
1 X N1 . . . X Nk 
 0 
 
 1
 . 
  
 . 
 . 
 
  k 
 1 
 . 
 
  . 
 
 . 
 N 
27
The parameter estimates b0, b1,…, bk which
minimize R(0, 1, …, k) are the solutions to the (k+1)
normal equations, which can be expressed, in matrix
notation, as:
X’ X b = X’ Y,
where X’ is the transpose of the matrix X,
and b=(b0, b1,…, bk)’.
The matrix X is assumed to be of full column rank.
Then:
b=(X’X)-1 X’ Y, where (X’X)-1 is the inverse of X’ X.
If the used model is correct, b is an unbiased estimator
of .
The variance-covariance matrix of the vector of
estimates, b, is: Var(b) = 2(X’ X)-1.
28
It is easy to show that the least squares
estimator, b, produces minimum variance estimates
of the elements of  in the class of all linear
unbiased estimators of .
As stated by the Gauss-Markov theorem, b is the
best linear unbiased estimator (BLUE) of .
29
Predicted response values
Let x 'p denote a 1x p vector whose elements
correspond to the elements of a row of the matrix X (p>k).
The expression for the predicted value of the response,
at point x'p
in the experimental region is:

Y ( x p )  x'pb
Hereafter we shall use the notation Yˆ ( x) to denote
the predicted value of Y at the point x  ( x , x ,...,x ) .
1
2
k
A measure of the precision of the prediction, defined
as the variance of Yˆ ( x) , is expressed as:






Var Yˆ ( x)  Var  x p b   x p Var (b) x p  x p X  X



1
x p 2
30
Estimation of 2
Let
ru
ru  Yu  Yˆ ( xu )
, u=1, …,N=number of experiments.
is called the uth residual.
For the general case where the fitted model contains p
parameters, the total number of observations is N (N>p)
and the matrix X is supposed of full column rank, the
estimate, s2 of 2, is computed from:
N

1
1
1
2
s 
r

Y

X
b
Y

X
b

SSE

u
N  p u 1
Np
Np
2



SSE is the sum of squared residuals. The divisor N-p is the
degrees of freedom of the estimator s2.
When the true model is given by Y=X+, then s2 is an
unbiased estimator of 2.
31
The Analysis of variance table
The entries in the ANOVA table represent measures
of information concerning the separate sources of
variation in the data.
The total variation in a set of data is called the total
sum of square (SST):
N
2
SST   Yu  Y  where
Y is the mean of Y.
u 1
The total sum of squares can be partitioned into two
parts: The sum of squares dues to regression, SSR (or
sum of squares explained by the fitted model) and the
sum of squares unaccounted for by the fitted model, SSE
(or the sum of squares of the residuals).
N
N
2
2
ˆ
ˆ
and
SSR   Yu  Y 
SSE   Yu  Yu 
u 1
u 1
32
If the fitted model contains p parameters, then the
number of degrees of freedom associated with SSR is p-1,
and this associated with SSE, is N-p.
Short-cut formulas for SST, SSR, and SSE are possible
using matrix notation. Letting 1’ be a 1xN vector of ones,
we have:
SST  Y Y
2

1Y 

N
2

1Y 
SSR  bX Y 
N
SST  Y Y  bX Y
Note that: SST = SSR + SSE
33
The usual test of the significance of the fitted
regression equation is a test of the null hypothesis:
H0: “all values of i (excluding 0) are zero”.
The alternative hypothesis is
Ha: “at least one value of i (excluding 0) is not zero”.
Assuming normality of the errors, the test of H0 involves
first calculating the value of the F-statistic F  MSR
MSE
where
SSR
MSR 
is called the mean square regression, and
p 1
is called the mean square residual.
SSE
MSE 
N p
34
If the null hypothesis is true, the F-statistic
follows an F-distribution with (p-1) and (N-p) degrees
of freedom in the numerator and in the denominator,
respectively.
The second step of the test of H0 is to compare
the value of F to the table value, F,p-1,N-p, which is the
upper 100 percent point of the F-distribution with (p1) and (N-p) degrees of freedom, respectively.
If the observed value of F exceeds F,p-1,N-p, then
the null hypothesis is rejected at the  level of
significance
35
An accompanying statistic to the F-statistic is the
coefficient of determination: R 2  SSR
SST
The value of R2 is a measure of the proportion of total
variation of the values of Yu about the mean Y explained
by the fitted model.
A related statistic, called the adjusted R2 statistic, is:
SSE ( N  1)
R  1
SST ( N  p)
2
A
or
 N 1 

RA2  1  (1  R 2 )
 N  p
36
ANOVA table
Source of
variation
Due to
regression
(fitted
model)
Degrees of
freedom (df)
p-1
Sum of
square (SS)
SSR
Residual
(error)
N–p
SSE
Total
(variations)
N–1
SST
Mean
square (MS)
F-statistic
MSR =
F=
SSR / (p-1)
MSR / MSE
MSE =
SSE / (N-p)
37
Tests of hypotheses concerning the individual
parameters in the model
In general, tests of hypotheses concerning parameters
in the proposed model are performed by comparing the
parameter estimates in the fitted model to their respective
estimated standard errors.
Let us denote the least squares estimate of i by bi and
the estimated standard error of bi by est.s.e.(bi).
Then a test of the null hypothesis H0: i=0, is performed
by calculating the value of the test statistic:
bi
t 
est.s.e.(bi )
and comparing the value of t
against a table value, t, from the student-table.
38
The choice of the table value, t, depends on the
alternative hypothesis, Ha, the level of significance, , and
the degrees of freedom for t.
If the alternative hypothesis is Ha: i ≠ 0, the test is
called a two-sidedor test, and the value of t is taken from the
column corresponding to t/2 in the table.
If, on the other hand, the alternative hypothesis is Ha:
i>0 or Ha: i<0, the test is a one sided test, and the value of
t is taken from the column t in the table.
The degrees of freedom for t are the degrees of
freedom of s2 used in est.s.e.(bi).
39
Testing lack of fit of the fitted model using
replicated observations
In general, to say the fitted model is inadequate or is
lacking in fit is to imply the proposed model does not
contain a sufficient number of terms.
1.
2.
This inadequacy of the model is due to either or both of
the following causes:
Factors (other than those in the proposed model) that are
omitted from the proposed model but which affect the
response, and or,
The omission of higher-order terms involving the factors
in the proposed model which are needed to adequately
explain the behavior of the response.
40
Since in most modeling situations it is far
easier from a design and analysis standpoint
to upgrade (add terms to) the model in the
factors already considered than to introduce
new factors to the program, we shall assume
also, upon detecting inadequacy of the fitted
model, that the inadequacy is due to the
omission of higher-order terms in the fitted
model, case (2).
41
The test of adequacy (or zero lack of fit) of the fitted
model requires two conditions be met regarding the
collection (design) of the data values:
1.
The number of distinct design points, n, must exceed the
number of terms in the fitted model. If the fitted model
contains p terms, then n > p.
2.
An estimate of the experimental error variance that does
not depend on the form of the fitted model is required.
This can be achieved by collecting at least two replicate
observations at one or more of the design points and
calculating the variation among the replicates at each
point.
42
In addition, we shall assume the random errors are
normal and independently distributed with a common
variance 2.
When conditions (1) and (2) are met, the residual sum
of squares, SSE, can be partitioned into two sources of
variation:

the variation among the replicates at those design points
where replicates are collected,

and the variation arising from the lack of fit of the fitted
model.
43
The sum of squares due to the replicate
observations is called the sum of squares for
pure error (abbreviated, SSPE) and once
calculated, it is then subtracted from the
residual sum of squares to produce the sum
of squares due to lack of fit (SSLOF).
44
To illustrate the partitioning of the residual sum of
squares, let us first give a formula for calculating the pure
error sum of squares.
Denote the uth observation at the lth design point by
Yul, where u=1,2,…,rl 1, l=1,2,…,n.
Define
to be the average of the rl observations at
l
the lth design Ypoint.
Then the sum of squares for pure error
is calculated using:
SSPE   Yul  Yl 
n
rl
2
l 1 u 1
The degrees of freedom associated with SSPE is
n
 (r
l 1
l
 1)  N  n
where N is the total number of observations.
45
The sum of squares due to lack of fit is found
by subtraction: SSLOF = SSE - SSPE .
The degrees of freedom associated with
obtained by subtraction is (N-p)-(N-n) = n-p.
An expanded analysis-of-variance table that
displays the partitioning of the residual sum of
squares is the following one:
46
Source
Df
SS
MS
Due to
regression
p-1
SSR
SSR / (p-1)
Residual
N-p
SSE
SSE / (N-p)
Lack of fit
n-p
SSLOF
MSLOF =
SSLOF / (n-p)
Pure error
N-n
SSPE
MSPE =
SSPE / (N-n)
Total
(variations)
N-1
SST
F-statistic
MSLOF / MSPE
47
The test of the null hypothesis of adequacy of fit (or
lack of fit is zero) involves calculating the value of the Fratio:
F 
SSLOF /(n  p)
SSPE /( N  n)
and comparing the value of F with a table value of the
Fisher distribution. Lack of fit can be detected, at the 
level of significance, if the value of F exceeds the table
value,
 , n  p , N n
F
where the latter quantity is the upper 100 percentage
point of the central F-distribution.
48
The use of the coded variables in the fitted
model
The use of coded variables in place of the input variables
facilitates the construction of experimental design.
Coding removes the units of measurement of the
input variables and as such distances measured along
the axes of the coded variables in k-dimensional space
are standardized (or defined in the same metric).
Another advantages to using coded variables rather
than the original input variables, when fitting polynomial
models, are: computational ease and increased accuracy
in estimating the model coefficients, and enhanced
interpretability of the coefficient estimates in the model.
49
50
A - Procedure to determine optimum conditions
steps of the method
This method permits find the settings of the input
variables which produce the most desirable response
values.
These response values may be the maximum yield or
the highest level of quality coming off the production line.
Similarly, we may seek the variables settings that
minimize the cost of marking the product.
In any case, the set of values of the input variables
which result in the most desirable response values is
called the set of optimum conditions.
51
Steps of the method
The strategy in developing an empirical model through
a sequential program of experimentation is as follows:
1.
The simplest polynomial model is fitted (a first-order
model) to a set of data collected at the points of a firstorder design. If extra points are included from which data
are collected and an estimate of the error variance is
available, the model is tested for adequacy of fit.
2.
If the fitted first-order model is adequate, the information
provided by the fitted model is used to locate areas in the
experimental region, or outside the experimental region,
but within the boundaries of the operability region, where
more desirable values of the response are suspected to
be.
52
3. In the new region, the cycle is repeated in that
the first-order model is fitted and testing for
adequacy of fit. If nonlinearity in the surface
shape is detected through the test for lack of fit
of the first-order model, the model is upgraded
by adding cross-product terms and / or pure
quadratic terms to it. The first-order design is
likewise augmented with points to support the
fitting of the upgraded model.
53
4. If curvature of the surface is detected and a
fitted second-order model is found to be
appropriate, the second-order model is used
to map or describe the shape of the surface,
through a contour plot, in the experimental
region. If the optimal or most desirable
response values are found to be within the
boundaries of the experimental region, then
locating the best values as well as the
settings of the input variables that produce
the best response values in the next order of
business.
54
5. Finally, in the region where the most desirable
response values are suspected to be found,
additional experiments are performed to
verify that this is so. Once the location of the
most
desirable
response
values
is
determined, the shape of the response
surface in the immediate neighborhood of the
optimum is described.
55
B- Illustration of the method against an
example
For simplicity of presentation we shall
assume there is only one response variable to
be studied although in practice there can be
several response variables that are under
investigation simultaneously.
56
Experience
In a particular chemical reaction setting, the
temperature, X1 , and the length of time, X2 , of
the reaction are known to affect the reaction rate
and thus the percent yield.
An experimenter, interested in determining if
an increase in the percent yield is possible,
decides to perform a set of experiments by
varying the reaction temperature and reaction
time while holding all other factors fixed.
57
The initial set of experiments consists of
looking at two levels of temperature (70° and 90°)
and two levels of time (30 sec and 90 sec).
The response of interest is the percent yield,
which is recorded in terms of the amount of
residual material burned off during the reaction
resulting in a measure of the purity of the end
product.
The process currently operates in a range of
percent purity between 55 % and 75 %, but il is felt
that a higher percent yield is possible.
58
Design 1 – Fitting first order model
For the initial set of experiments, the two-variable
model to be fitted is:
%Yield   0  1 X 1   2 X 2
Each of the four temperature-time settings, 70°-30 sec,
70°-90 sec, 90°-30 sec, 90°-90 sec, is replicated twice and
the percent yield recorded for each of the eight trials.
The measured yield values associated with each
temperature-time combination are listed in the following
table:
59
Original variables
Temperature
X1 (C°)
Time
X2 (sec.)
Coded variables
x1
Percent yield
x2
Y
70
30
-1
-1
49.8
48.1
90
30
1
-1
57.3
52.3
1
65.7
69.4
1
73.1
77.8
70
90
90
90
-1
1
x1 and x2 are the coded variables which are defined as:
x1 
X1  80
10
x2 
X 2  60
30
60
Representation of the first design
61
First-order model
Expressed in terms of the coded
variables, the observed percent yield values
are modeled as:
Y   0  1 x1   2 x2  
The remaining term, , represents random
error in the yield values.
The eight observed percent yield values,
when expressed as function of the levels of
the coded variables, in matrix notation, are:
Y=X+
62
Matrix form
 49.8
 48.1


 57.3


52
.
3


65.7 


69.4 
 73.1


 77.8
=
1
1

1

1
1

1
1

1
1
1
1
1
1
1
1
1
 1
 1
 1

 1
 1

 1
 1

 1
 0 
 
 1 +
  2 
 1 
 
 2
 3 
 
 4 
 5 
 
 6 
 
 7
 8 
63
Estimations
The estimates of the coefficients in the firstorder model are found by solving the normal
equations:
X Xb  X Y
The estimates are:
b 
 X X 1
6 1.6 8 7 5

X Y  
3
.
4
3
7
5



 9.8 1 2 5 

The fitted first-order model in the coded variables is:
Yˆ ( x)  61.6875 3.4375x1  9.8125x2
64
ANOVA table – design 1
Source
Degrees of
freedom
d.f.
Sum of
squares
SS
Mean
square
F
Model
2
864.8125
432.4063
63.71
Residual
5
33.9363
6.7873
Lack of fit
1
2.1013
2.1013
Pure error
4
31.8350
7.9588
0.264
65
Test of adequacy
To perform a test on the adequacy of the fitted model,
the errors in the observed percent yield values are
assumed to be distributed normally with mean zero and
variance and variance 2.
The value of the lack of fit test statistic is F = 0.264.
Since this value not exceed the table value F1;4;0.05 = 7.71,
we do not have sufficient evidence to doubt the adequacy
of the fitted model.
In the next steep the fitted model is tested to see if it
explains a significant amount of the variation in the
observed percent yield values.
66
Global test of parameters
This test is equivalent to testing the null
hypothesis,
H :  0
0
1
2
or that both temperature and time have zero or no
effect on percent yield.
The test is highly significant since the corresponding
value of the test statistic is F = 63.71 > F2;5;0.01 =13.27.
Hence, one or both of the parameters, 1 and 2 ,
are non zero.
67
Individual tests of parameters
At this point in the model development, tests are
performed on the magnitudes of the separate effects of
temperature and time on percent yield to see if both terms
b1x1 and b2x2, are needed in the fitted model.
To do that the Student-test is used.
b1  0
For the test of: H :   0 we have t  var(b )  3.73
0
And for
H0 : 2  0
1
1
we have
t
b2  0
 10.65
var(b2 )
Each of the null hypotheses is rejected at the  = 0.05 level
of significance owing to the calculated values, 3.73 and
10.65, being greater in absolute value than the tabled value,
T5;0.025 =2.571.
68
We infer, therefore, that both temperature and time
have an effect on percent yield.
Furthermore, since both b1 and b2 are positive, the
effects are positive.
Thus, by raising either the temperature or time of
reaction, this produced a significant increase in percent
yield.
69
Second stage of the sequential program
At this point in the analysis and in view of the
objective of the experiment, which is to find the
temperature and time settings that maximize the
percent yield, the experimenter quite naturally might
ask, “If additional experiments can be performed, at
what settings of temperature and time should the
additional experiments be run?”
To answer this question, we enter the second
stage of our sequential program of experimentation.
70
Contour plots
The fitted model: Yˆ ( x)  61.6875 3.4375x1  9.8125x2
can now be used to map values of the estimated response
surface over the experimental region.
This response surface is a hyper-plane; their contour
plots are lines in the experimental region.
The contour lines are drawn by connecting two points
(coordinate settings of x1 and x2) in the experimental
region that produce the same value of Yˆ
71
In the figure above are shown the contour lines
of the estimated planar surface for percent yield
corresponding to values of Yˆ ( x) = 55, 60, 65 and 70 %.
72
The direction of tilt of the estimated percent yield
planar surface is indicated by the direction of the arrow
which is drawn perpendicular to the surface contour lines.
The arrow points upward and to the right indicating
that higher values of the response are expected by
increasing the values of x1 and x2 each above +1.
This action corresponds to increasing the temperature
of the reaction above 90°C and increasing the time of
reaction above 90 sec.
These recommendations comprise the beginning steps
in a series of single experiments to be performed along the
path of steepest ascent up the surface.
73
Performing experiments along the path of
steepest ascent
The steepest ascent procedure consists of performing a sequence
of experiments along the path of maximum increase in response.
(Reminder: the direction is dependent on the scale of the coded
variables).
The procedure begins by approximating the response surface
using an equation of a hyper-plane. The information provided by the
estimated hyper-plane is used to determine a direction toward which
one may expect to observe increasing values of the response.
As one moves up the surface of increasing response values and
approaches a region where curvature in the surface is present, the
increase in the response values will eventually level off at the highest
point of the surface in the particular direction.
If one continues in this direction and the surface height
decreases, a new set of experiments is performed and again the firstorder model is fitted. A new direction toward increasing values of the
response is determined from which another sequence of experiments
along the path toward increasing values is performed. This sequence
of trials continues until it becomes evident that little or no additional
increase in response can be achieved from the method.
74
Description of the method of steepest ascent
To describe the method of steepest ascent
mathematically, we begin by assuming the true
response surface can be approximated locally with an
equation of a hyper-plane
k
   0    i xi
i 1
Data are collected from the points of a first-order
design and the data are used to calculate the coefficient
estimates to obtain the fitted first-order model
k
Yˆ ( x)  b0   bi xi
i 1
75
The next step is to move away from the center of the
design, a distance of r units, say, in the direction of the
maximum increase in the response.
By choosing the center of the design in the coded
variable to be denoted by O(0, 0, …, 0), then movement
from the center r units away is equivalent to find the
k
values of x1 , x2 ,...,xk which maximize Yˆ ( x)  b0   bi xi
k
subject to the constraint
x
i 1
2
i
 r
2
i 1
Maximization of the response function is performed
by using Lagrange multipliers. Let
 k 2

Q( x1 , x2 ,..., xk )  b0   bi xi     xi  r 2 
i 1
 i 1

k
where  is the Lagrange multiplier.
76
To maximize Yˆ ( x) subject to the above-mentioned
constraint, first we set equal to zero the partial derivatives
Q( x1 , x 2 ,..., x k )
xi
i=1,…,k and
Q( x1 , x 2 ,..., x k )

Setting the partial derivatives equal to zero produces:
Q( x1 , x2 ,..., xk )
 bi  2xi  0
xi
i =1,…,k, and
k
Q( x1 , x 2 ,..., x k )
  xi2  r 2  0

i 1
The solutions are the values of xi satisfying
or
xi
 2
bi
xi 
bi
2
i = 1,…,k, where the value of  is yet to be
determined. Thus the proposed next value of xi is directly
proportional to the value of bi.
77
Let us the change in Xi be noted by i , and the
change in xi be noted by i. The coded variables is
obtained by these formulas x  X  X
where
i
i
i
si
Xi
(respectively si) is the mean (respectively the
standard deviation) of the two levels of Xi .
Thus
i 
i
si
xi   i 
( X i  i )  X i
si
or
, then
 i  si  i
78
Let us illustrate the procedure with the first-order
model:
Yˆ ( x)  61.6875 3.4375x1  9.8125x2
that was fitted early to the percent yield values in our
example.
To the change in X2, 2=45 sec. corresponds the
change in x2, 2=45/30=1.5 units.
In the relation
1
b1

x1
x2

b1
b2
 2 , thus
b2
1
3.4375

, we can substitute i to xi:
1.5
9.8125
and 1 = 0.526, so
1=0.526*10=5.3°C .
79
The first point on the path of steepest ascent,
therefore, is located at the coordinates (x1, x2)=(0.53, 1.5),
which corresponds to the settings in the original variables
of (X1, X2) = (85.3, 105).
Additional experiments are now performed along the
path of steepest ascent at points corresponding to the
increments of distances 1.5 i, 2 i, 3 i, and 4 i (i=1,2).
The table below lists the coordinates of these points
and the corresponding observed percent yield values.
80
Points along the path of steepest ascent and
observed percent yield values at the points
Temperature
X1 (°C)
Time
X2 (sec.)
Base
80.0
60
i
5.3
45
Base + i
85.3
105
74.3
Base + 1.5 i
87.95
127.5
78.6
Base + 2 I
90.6
150
83.2
Base + 3 i
95.9
195
84.7
Base + 4 i
101.2
240
80.1
Observed
percent yield
81
The observed percent yield values increase to a
value of 84.7 % at the setting in X1 and X2 of 95.9 °C
and 195 sec, respectively, and then the value drop to
80.1 % at X1 = 101.2 °C and X2 = 240 sec.
Our thinking at this moment is that either the
temperature of 101.2 °C is too high or the length of
time of 240 sec is too long and therefore additional
experimentation along the path at higher values of X1
and X2 would not be useful.
The decision is made to conduct a second group
of experiments and again fit a first-order model.
The table below list the points of the design two
with two replicate yield values were collected at each
of the four factorial combinations along with a
second replicated observation at the center point.
82
Sequence of experimental trials performed in
moving to a region of high percent yield values
Design two: For this design the coded variables are
Time ( X 2 )  195
Tem perature( X 1 )  95.9
defined as:
x

2
x1 
30
10
x1
x2
X1
X2
% yield
-1
-1
85.9
165
82.9; 81.4
+1
-1
105.9
165
87.4; 89.5
-1
+1
85.9
225
74.6; 77.0
+1
+1
105.9
225
84.5; 83.1
0
0
95.9
195
84.7; 81.9
83
84
The fitted model corresponding to the group of experiments
of design two is: Yˆ ( x)  82.70  3.575x1  2.750x2
The corresponding analysis of variance is:
Source
d.f.
SS
MS
F
Model
2
162.745
81.372
42.34
Residual
7
13.455
1.922
Lack of fit
2
2.345
1.173
Pure error
5
11.110
2.222
Total
(variations)
9
176.2
0.53
85
The test for lack of fit of this model produced an F
value of F = 0.53, which is not significant.
The test of significance of the fitted model produced a
highly significant F=42.34 value.
Thus, the information obtained from this fitted model is
used to obtain a new direction in which to perform
additional experiments in seeking higher percent yield
values.
The table below lists the sequence of experimental
trials that were performed in the direction two:
86
sequence of experimental trials that were
performed in the direction two:
Steps
x1
x2
X1
X2
% yield
1
Base + I
+1
- 0.77
105.9
171.9
89.0
2
Base + 2 I
+2
- 1.54
115.9
148.8
90.2
3
Base + 3 I
+3
- 2.31
125.9
125.7
87.4
4
Base + 4 I
+4
- 3.08
135.9
102.6
82.6
87
Retreat to center + 2 i and proceed in
direction three
Steps
x1
x2
X1
X2
% yield
5
Replicated 2
+2
- 1.54
115.9
148.8
91.0
6
+3
- 0.77
125.9
171.9
93.6
7
+4
0
135.9
195
96.2
8
+5
0.77
145.9
218.1
92.9
88
Set up design three using points of steps 6, 7,
and 8 along with the following two points:
Steps
x1
x2
X1
X2
% yield
9
+3
0.77
125.9
218.1
91.7
10
+5
- 0.77
145.9
171.9
92.5
11
Replicated 7
+4
0
135.9
195
97.0
Center of design is (X1; X2)=(135.9; 195)
Fitted model using percent yield values in steps 6 – 11:
Yˆ ( x)  93.983 0.025x1  0.375x2
This model is considered not adequate.
89
90
Moore explanations
The figure above shows the sequence of experiments performed
by numbering the points which are listed as steps 1 – 11 in the least
table, where steps 1 – 4 represent the experimental trials taken along
the second direction of steepest ascent.
Step 5 denotes a return to the point in step 2 and replicating the
experiment to validate the previously high percent yield value.
In fact, at step 6 the coded values of the temperature and time
combinations are +1, +1, respectively, in a ¾ replicate of a factorial
design consisting of the points at steps 1, 3, and 6, with the point at
step 5 as center.
Upon observing a higher percent yield at step 6 than at step 5,
steps 7 and 8 represent additional experiments performed along a third
direction defined by the line joining the points of steps 5 and 6.
The choice of this third direction represents a deviation from the
conventional steepest ascent (or descent) approach and was
undertaken in an attempt to reduce the amount of work required in
setting up a complete factorial experiment with center at point 5 and
the subsequent fitting of another first-order model.
91
Design three was set up using the point at step 7 as its
center. It includes steps 6 – 11. If we redefine the coded
variables:
Tem perature( X 1 )  135.9
x1 
10
and
x1 
Tim e( X 2 )  195
23.1
then the fitted first-order model is:
Yˆ ( x)  93.983 0.025x1  0.375x2
The corresponding analysis of variance table is:
92
ANOVA table
source
d.f.
SS
MS
F
Model
2
0.5650
0.2825
0.04
Residual
3
22.1833
7.3944
total
5
22.7483
It is obvious from the ANOVA table that the least
model does not explain a significant amount of the
overall variation in the percent yield values, and it is
necessary to fit a curved surface.
93
Fitting a second-order model
A second-order model in k variables is of the form:
k
k
i 1
i 1
Y   0   i xi   ii xi2    ij xi x j  
i j
The number of terms in the model above is
p=(k+1)(k+2)/2; for example, when k=2 then p=6.
Let us return to the chemical reaction example of the
previous section. To fit a second-order model (k=2), we
must perform some additional experiments.
94
Central composite rotatable design
Suppose that four additional experiments are
performed, one at each of the axial settings (x1,x2):
( 2 ,0);( 2 ,0);(0, 2 ); (0, 2 )
These four design settings along with the four factorial
settings: (-1,-1); (-1,1); (1,-1); (1,1) and center point
comprise a central composite rotatable design.
The percent yield values and the corresponding nine
design settings are listed in the table below:
95
Central composite rotatable design
96
Percent yield values at the nine points of a
central composite rotatable design
97
The fitted second-order model, in the coded
variables, is:
Yˆ ( x)  96.6  0.03x1  0.31x2 1.98x12 1.83x22  0.58x1x2
The analysis is detailed in this table, using the
RSREG procedure in the SAS software:
98
SAS output 1
99
Moore explanations
The test for adequacy of fit of the fitted model
produced an F value (= lack of fit mean square/pure error
mean square) less than 1, which is clearly not significant.
The pure quadratic coefficient estimates are each
highly significant (p<0.001), which indicates that surface
curvature is present in the observed percent yield values.
With the fitted second-order model, we can predict
percent yield values for values of x1 and x2 inside the
region of experimentation.
The table below show some predicted percent yield
values and their variances:
100
SAS output 2
101
Response surface and the contour plot
102
Moore explanations




The contours of the response surface, showing
above, represent predicted yield values of 95.0 to
96.5 percent in steps of 0.5 percent.
The contours are elliptical and centered at the point
(x1; x2)=(- 0.0048; - 0.0857)
or (X1; X2)=(135.85°C; 193.02 sec).
The coordinates of the centroid point are called the
coordinates of the stationary point.
From the contour plot we see that as one moves
away from the stationary point, by increasing or
decreasing the values of either temperature or time,
the predicted percent yield (response) value
decreases.
103
Determining the coordinates of the stationary point

A near stationary region is defined as a region where the
surface slopes (or gradients along the variables axes) are
small compared to the estimate of experimental error.

The stationary point of a near stationary region is the point
at which the slope of the response surface is zero when
taken in all direction.

The coordinates of the stationary point x0  ( x10 , x20 ,...,xk 0 )
are calculated by differentiating the estimated response
equation with respect to each xi, equating these derivatives
to zero, and solving the resulting k equations
simultaneously.
104
Remember that the fitted second-order model in k
variables is:
k
k
i 1
i 1
Yˆ ( x)  b0   bi xi   bii xi2   bij xi x j
i j
To obtain the coordinates of the stationary point,
let us write the above model using matrix notation,
as:
Yˆ ( x)  b0  xb  xBx
105
where
 x1 
x 
 2
 . 
x 
 . 
 . 
 
 xk 
 b1 
b 
 2
.
b 
.
.
 
bk 
and

b11



B





b12
2
.
.
.
b22
.
.
.
.
.
sym etric
.
b1k 
2 
b2 k 

2 
. 
. 

. 
bkk 
106
Some details
The partial derivatives of Yˆ ( x) with respect to x1, x2, …, xk
are :
k

Yˆ ( x )
 b1  2b11 x1   b1 j x j 
x1
j 2

k

Yˆ ( x )
 b2  2b22   b2 j x j 
x2
j 2

.
  b  2 Bx

.

.

k 1

Yˆ ( x )
 bk  2bkk   bkj x j 
xk
j 1

107
Moore details
Setting each of the k derivatives equal to zero and
solving for the values of the xi, we find that the
coordinate of the stationary point are the values of the
elements of the kx1 vector x0 given by:
x0
B 1b

2
At the stationary point, the predicted response
value, denoted by Yˆ0 , is obtained by substituting x0
for x:
'
x
'
'
0b
ˆ
Y0  b0  x0b  x0 Bx0  b0 
2
108
Return to our example
The fitted second-order model was:
Yˆ ( x)  96.6  0.03x1  0.31x2 1.98x12 1.83x22  0.58x1 x2
 1.98127 0.28750 
B

 0.28750  1.83127
B
1
 0.51650  0.08109


 0.08109  0.5588 
so the stationary point is:
 0.00486
1 1
x0   B b  

2
 0.08568
In the original variables, temperature and time of the
chemical reaction example, the setting at the stationary
point are: temperature=135.85°C and time=193.02 sec.
And the predicted percent yield at the stationary point
is:
Yˆ  96.60  0.013 96.613
0
109
Moore details
Note that the elements of the vector x0 do not tell us
anything about the nature of the surface at the stationary
point.
This nature can be a minimum, a maximum or a
mini_max point.
For each of these cases, we are assuming that the
stationary point is located inside the experimental region.
When, on the other hand, the coordinates of the
stationary point are outside the experimental region, then
we might have encountered a rising ridge system or a
falling ridge system, or possibly a stationary ridge.
110
Nest Step
The next step is to turn our attention to
expressing the response system in
canonical form so as to be able to describe
in greater detail the nature of the response
system in the neighborhood of the
stationary point.
111
The canonical Equation of a Second-Order
Response System
The first step in developing the canonical equation
for a k-variable system is to translate the origine of the
system from the center of the design to the stationary
point, that is, to move from (x1,x2,…,xk)=(0,0,…,0) to x0.
This is done by defining the intermediate variables
(z1,z2,…,zk)=(x1-x10,x2-x20,…,xk-xk0) or z=x-x0.
Then the second-order response equation is
expressed in terms of the values of zi as:
Yˆ ( z )  b0  ( z  x0 )b  ( z  x0 ) B ( z  x0 )
 Yˆ  z Bz
0
112
Now, to obtain the canonical form of the predicted
response equation, let us define a set of variables
w1,w2,…,wk such that W’=(w1,w2,…,wk) is given by
W  M z
where M is a kxk orthogonal matrix whose columns
are eigenvectors of the matrix B.
The matrix M has the effect of diagonalyzing B, that
is, where 1,2,…,k are the corresponding eigenvalues
of B.
M BM  diag(1 , 2 ,...,k )
The axes associated with the variables w1,w2,…,wk
are called the principal axes of the response system.
This transformation is a rotation of the zi axes to
form the wi axes.
113
So we obtain the canonical equation:
Yˆ (W )  Yˆ0 W  M  B MW
 Yˆ   w 2  ...   w 2
0
1
1
k
k
The eigenvalues i are real-valued (since the matrix B
is a real-valued, symmetric matrix) and represent the
coefficients of the terms in the canonical equation.
It is easy to see that if 1,2,…,k are:
1) All negative, then at x0 the surface is a maximum.
2) All positive, then at x0 the surface is a minimum.
3) Of mixed signs, that is, some are positive and the
others are negative, then x0 is a saddle point of
the fitted surface.
The canonical equation for the percent yield surface is:
Yˆ  96.6131.6091w12  2.2034w22
114
Moore details
The magnitude of the individual values of the i tell
how quickly the surface height changes along the Wi
axes as one moves away from x0.
Today there are computer software packages available
that perform the steps of locating the coordinates of the
stationary point, predict the response at the stationary
point, and compute the eigenvalues and the
eigenvectors.
115
For example, the solution for optimum response
generated from PROC RSREG of the Statistical Analysis
System (SAS) for the chemical reaction data, is in
following table:
116
Recapitulate
Process to
optimize
Contours and optimal direction
Input and output
variables
Experiments in the
Optimal direction
Experimental and
Operational regions
Locate a new
Experimental region
Series of experiments
New series of experiments
Yes
Fitting
First-order
model
Model
Adequate
?
No
Fitting
First-order
model
Yes
Fitting a
Second-order
model
Model
Adequate
?
No
117
Bibliography

André KHURI and John CORNELL: “Response
Surfaces – Designs and Analyses” , Dekker, Inc., ASQC
Quality Press, New York.

Irwin GUTTMAN: “Linear Models: An Introduction”,
John Wiley & Sons, New York.

George BOX, William HUNTER & J. Stuart HUNTER:
“Statistics for experimenters: An Introduction to
Design, Data Analysis, and Model Building” , John
Wiley & Sons, New York.

George BOX & Norman DRAPPER: “Empirical ModelBuilding and Response Surfaces” , John Wiley & Sons,
New York.
118
119