Test title - Applied Physics Laboratory


Slides for Introduction to Stochastic Search
and Optimization (ISSO) by J. C. Spall
CHAPTER 17
OPTIMAL DESIGN FOR EXPERIMENTAL INPUTS
• Organization of chapter in ISSO*
– Background
• Motivation
• Finite sample and asymptotic (continuous) designs
• Precision matrix and D-optimality
– Linear models
• Connections to D-optimality
• Key equivalence theorem
– Response surface methods
– Nonlinear models
*Note: The appendix to these slides is a brief discussion of
factorial design (not in ISSO)
Optimal Design in Simulation
• Two roles for experimental design in simulation
– Building approximation to existing large-scale simulation
via “metamodel”
– Building simulation model itself
• Metamodels are “curve fits” that approximate simulation
input/output
– Usual form is low-order polynomial in the inputs; linear in
parameters 
– Linear design theory useful
• Building simulation model
– Typically need nonlinear design theory
• Some terminology distinctions:
– “Factors” (statistics term) ↔ “Inputs” (modeling and
simulation terms)
– “Levels” ↔ “Values”
– “Treatments” ↔ “Runs”
17-2
Unique Advantages of Design in Simulation
• Simulation experiments may be considered a special case
of general experiments
• Some unique benefits occur due to simulation structure
• Can control factors not generally controllable (e.g., arrival
rates into network)
• Direct repeatability due to deterministic nature of random
number generators
– Variance reduction (common random numbers [CRNs], etc.) may be helpful
• Not necessary to randomize runs to avoid systematic
variation due to inherent conditions
– E.g., randomization in run order and input levels in
biological experiment to reduce effects of change in
ambient humidity in laboratory
– In simulation, systematic effects can be eliminated since
analyst controls nature
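A minimal sketch of the repeatability and CRN points, assuming NumPy's seeded `Generator` and a hypothetical `simulate` function (the mean of exponential service times, not a model from ISSO): reseeding reproduces a run exactly, and reusing one seed per replication for both input values induces positive correlation that shrinks the variance of the estimated output difference.

```python
import numpy as np

def simulate(rate, rng, n_customers=200):
    # Hypothetical simulation: average of exponential service times with the
    # given rate. All randomness comes from the generator passed in.
    return rng.exponential(scale=1.0 / rate, size=n_customers).mean()

def difference_variance(rate_a, rate_b, n_reps=500, common=True):
    # Sample variance of z(rate_a) - z(rate_b) across replications.
    # common=True reuses one seed per replication for both inputs (CRN);
    # common=False gives each input its own independent stream.
    diffs = []
    for rep in range(n_reps):
        seed_b = rep if common else rep + 10_000
        za = simulate(rate_a, np.random.default_rng(rep))
        zb = simulate(rate_b, np.random.default_rng(seed_b))
        diffs.append(za - zb)
    return np.var(diffs, ddof=1)

# Same seed and same input reproduce the run exactly
assert simulate(2.0, np.random.default_rng(7)) == simulate(2.0, np.random.default_rng(7))
print(difference_variance(2.0, 2.5, common=True) < difference_variance(2.0, 2.5, common=False))
```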
17-3
Design of Computer Experiments in Statistics
• There is significant activity among statisticians in
experimental design for computer experiments
– T. J. Santner et al. (2003), The Design and Analysis of
Computer Experiments, Springer-Verlag
– J. Sacks et al. (1989), “Design and Analysis of Computer
Experiments (with discussion),” Statistical Science, pp. 409–435
– Etc.
• Above statistical work differs from experimental design with
Monte Carlo simulations
– Above work assumes deterministic function evaluations
via computer (e.g., solution to complicated ODE)
• One implication of deterministic function evaluations: no
need to replicate experiments for given set of inputs
• Contrasts with Monte Carlo, where replication provides
variance reduction
17-4
General Optimal Design Formulation
(Simulation or Non-Simulation)
• Assume model
$$z = h(\theta, x) + v,$$
where x is an input we are trying to pick optimally
• Experimental design $\xi$ consists of N specific input
values $x = \chi_i$ and proportions (weights) $w_i$ assigned to these input
values:
$$\xi = \begin{Bmatrix} \chi_1 & \chi_2 & \cdots & \chi_N \\ w_1 & w_2 & \cdots & w_N \end{Bmatrix}$$
• Finite-sample design allocates $n \ge N$ available
measurements exactly; asymptotic (continuous)
design allocates based on $n \to \infty$
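One way to hold a design in code, as a sketch: the support points χi and weights wi of ξ as two arrays, plus a rounding step that turns the weights into integer run counts for a finite sample of n ≥ N measurements. The largest-remainder rounding rule and the function names are illustrative assumptions, not ISSO's procedure.

```python
import numpy as np

def make_design(points, weights):
    # Continuous (asymptotic) design xi: support points chi_i with weights w_i summing to 1
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if points.shape[0] != weights.shape[0]:
        raise ValueError("need one weight per support point")
    if np.any(weights <= 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    return points, weights

def allocate(weights, n):
    # Finite-sample version: split n >= N measurements into integer counts
    # close to n * w_i (largest-remainder rounding, one simple choice)
    weights = np.asarray(weights, dtype=float)
    if n < len(weights):
        raise ValueError("need n >= N measurements")
    raw = n * weights
    counts = np.floor(raw).astype(int)
    shortfall = n - counts.sum()
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:shortfall]] += 1
    return counts

points, w = make_design([-1.0, 0.0, 1.0], [1/3, 1/3, 1/3])
print(allocate(w, 10))  # [4 3 3]
```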
17-5
D-Optimal Criterion
• Picking optimal design $\xi$ requires criterion for
optimization
• Most popular criterion is D-optimal measure
• Let $M(\theta, \xi)$ denote the “precision matrix” for an
estimate of $\theta$ based on a design $\xi$
– $M(\theta, \xi)$ is inverse of covariance matrix for estimate
and/or
– $M(\theta, \xi)$ is Fisher information matrix for estimate
• D-optimal solution is
$$\xi^* = \arg\max_{\xi} \det\left[M(\theta, \xi)\right]$$

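To make the criterion concrete, a sketch assuming a hypothetical quadratic model z = θ0 + θ1x + θ2x² + v on x ∈ [−1, 1] with unit noise variance, where the precision matrix of a continuous design is M(ξ) = Σ wi h(χi) h(χi)^T with h(x) = [1, x, x²]^T (θ drops out because the model is linear in θ). The search is restricted to symmetric three-point designs {−a, 0, a} with equal weights, so it only shows that a = 1 is best within that family (there det M = 4a⁶/27).

```python
import numpy as np

def h(x):
    # Regression vector for a hypothetical quadratic model: z = theta0 + theta1 x + theta2 x^2 + v
    return np.array([1.0, x, x * x])

def precision_matrix(points, weights):
    # M(xi) = sum_i w_i h(chi_i) h(chi_i)^T with unit noise variance;
    # no theta argument because the model is linear in theta
    return sum(w * np.outer(h(x), h(x)) for x, w in zip(points, weights))

def d_criterion(points, weights):
    return np.linalg.det(precision_matrix(points, weights))

# Search only over symmetric designs {-a, 0, a} with equal weights on [-1, 1]
grid = np.linspace(0.1, 1.0, 91)
dets = [d_criterion([-a, 0.0, a], [1/3, 1/3, 1/3]) for a in grid]
a_best = grid[int(np.argmax(dets))]
print(a_best)  # 1.0: det M = 4a^6/27 grows with a, so the endpoints win
```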
17-6
Equivalence Theorem
• Consider linear model
$$z_k = h_k^T \theta + v_k, \quad k = 1, 2, \ldots, n$$
• Prediction based on parameter estimate $\hat\theta_n$ and
“future” measurement vector $h^T$ is
$$\hat z = h^T \hat\theta_n$$
• Kiefer-Wolfowitz equivalence theorem states:
D-optimal solution for determining $\xi$ to be used in
forming $\hat\theta_n$ is the same $\xi$ that minimizes the
maximum variance of predictor $\hat z$
• Useful in practical determination of optimal $\xi$
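A numerical check of the theorem for a hypothetical quadratic model on [−1, 1] (an assumption; not necessarily the model of ISSO's Example 17.6): the standardized prediction variance d(x) = h(x)^T M(ξ)^{-1} h(x), which is var(ẑ) up to the factor σ²/n, has maximum equal to the number of parameters p = 3 for the D-optimal design {−1, 0, 1} with equal weights, while a design that is not D-optimal has a larger maximum.

```python
import numpy as np

def h(x):
    # Regression vector for a hypothetical quadratic model: z = theta0 + theta1 x + theta2 x^2 + v
    return np.array([1.0, x, x * x])

def precision_matrix(points, weights):
    # M(xi) = sum_i w_i h(chi_i) h(chi_i)^T  (unit noise variance)
    return sum(w * np.outer(h(x), h(x)) for x, w in zip(points, weights))

def max_std_variance(points, weights, xs=np.linspace(-1.0, 1.0, 2001)):
    # d(x) = h(x)^T M^{-1} h(x), i.e., var(z_hat) up to the factor sigma^2 / n
    m_inv = np.linalg.inv(precision_matrix(points, weights))
    return max(h(x) @ m_inv @ h(x) for x in xs)

p = 3  # number of parameters in theta
d_opt = max_std_variance([-1.0, 0.0, 1.0], [1/3, 1/3, 1/3])    # D-optimal design
d_other = max_std_variance([-0.5, 0.0, 0.5], [1/3, 1/3, 1/3])  # not D-optimal
print(d_opt, d_other)  # d_opt is (numerically) p = 3; d_other is larger
```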
17-7
Variance Function as it Depends on
Input: Optimal Asymptotic Design for
Example 17.6 in ISSO
17-8
Orthogonal Designs
• With linear models, usually more than one solution is
D-optimal
• Orthogonality is a means of reducing number of solutions
• Orthogonality also introduces desirable secondary
properties
– Separates effects of input factors (avoids “aliasing”)
– Makes estimates for elements of $\theta$ uncorrelated
• Orthogonal designs are not generally D-optimal;
D-optimal designs are not generally orthogonal
– However, some designs are both
• Classical factorial (“cubic”) designs are orthogonal (and
often D-optimal)
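A sketch of the orthogonality property for r = 2, assuming the hypothetical model z = θ0 + θ1x1 + θ2x2 + θ3x1x2 + v: for the 2² factorial the columns of the model matrix X are mutually orthogonal, so X^T X is diagonal and the least-squares covariance σ²(X^T X)^{-1} has zero off-diagonal terms; pulling one corner inward breaks this.

```python
import numpy as np
from itertools import product

def model_matrix(points):
    # Columns: intercept, x1, x2, x1*x2 (hypothetical r = 2 model with interaction)
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones(len(points)), x1, x2, x1 * x2])

def is_orthogonal(X):
    # Orthogonal design: X^T X is diagonal, so Cov(theta_hat) = sigma^2 (X^T X)^{-1} is diagonal too
    XtX = X.T @ X
    return np.allclose(XtX, np.diag(np.diag(XtX)))

factorial = np.array(list(product([-1.0, 1.0], repeat=2)))  # 2^2 corner points
perturbed = factorial.copy()
perturbed[-1] = [0.5, 0.5]                                    # move one corner inward

print(is_orthogonal(model_matrix(factorial)))  # True
print(is_orthogonal(model_matrix(perturbed)))  # False
```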
17-9
Example Orthogonal Designs, r = 2 Factors
[Figure: cube ($2^r$ design) and star ($2r$ design) points in the (xk1, xk2) plane]
17-10
Example Orthogonal Designs, r = 3 Factors
[Figure: cube ($2^r$ design) and star ($2r$ design) points in (xk1, xk2, xk3) space]
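The cube and star points can be generated directly; the sketch below (function names hypothetical) does so for r = 3 and reports, for a first-order model h(x) = [1, x1, …, xr]^T with unit noise variance, whether each design is orthogonal and the determinant of its per-run precision matrix X^T X / n. This is only a main-effects comparison and says nothing about designs for higher-order models.

```python
import numpy as np
from itertools import product

def cube_design(r):
    # 2^r corner points at coded levels -1/+1
    return np.array(list(product([-1.0, 1.0], repeat=r)))

def star_design(r):
    # 2r axial points: +/-1 on one input, 0 on the others
    eye = np.eye(r)
    return np.vstack([eye, -eye])

def per_run_precision(points):
    # First-order model h(x) = [1, x_1, ..., x_r]; M = X^T X / n with unit noise variance
    X = np.column_stack([np.ones(len(points)), points])
    return X.T @ X / len(points)

r = 3
for name, pts in [("cube", cube_design(r)), ("star", star_design(r))]:
    M = per_run_precision(pts)
    diagonal = np.allclose(M, np.diag(np.diag(M)))
    print(f"{name}: {len(pts)} runs, orthogonal={diagonal}, det M={np.linalg.det(M):.4f}")
```

Both designs come out orthogonal here; the cube's per-run determinant is 1 and the star's is 1/27, since each star run moves only one input away from zero.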
17-11
Response Surface Methodology (RSM)
• Suppose we want to determine inputs x that minimize the
mean response E(z) of some process
– There are also other (nonoptimization) uses for RSM
• RSM can be used to build local models with the aim of
finding the optimal x
– Based on building a sequence of local models as one
moves through factor (x) space
• Each response surface is typically a simple regression
polynomial
• Experimental design can be used to determine input
values for building response surfaces
17-12
Steps of RSM for Optimizing x
Step 0 (Initialization) Initial guess at optimal value of x.
Step 1 (Collect data) Collect responses z from several x
values in neighborhood of current estimate of best x
value (can use experimental design).
Step 2 (Fit model) From the x, z pairs in step 1, fit
regression model in region around current best estimate
of optimal x.
Step 3 (Identify steepest descent path) Based on
response surface in step 2, estimate path of steepest
descent in factor space.
Step 4 (Follow steepest descent path) Perform series
of experiments at x values along path of steepest descent
until no additional improvement in z response is obtained.
This x value represents new estimate of best vector of
factor levels.
Step 5 (Stop or return) Go to step 1 and repeat process
until final best factor level is obtained.
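A compact sketch of Steps 0 to 5, assuming a hypothetical noisy process with E(z) = (x1 − 1)² + 2(x2 + 2)², a 2² factorial of half-width 0.2 as the local design, a first-order least-squares fit, and a fixed step length along the steepest-descent path. The stopping rule for Step 5 (a fixed number of cycles) is an assumption; the slides leave it open.

```python
import numpy as np

def mean_response(x):
    # Hypothetical process with E(z) minimized at x = (1, -2)
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

def measure(x, rng, noise_sd=0.01):
    return mean_response(x) + rng.normal(0.0, noise_sd)

def local_gradient(center, half_width, rng):
    # Steps 1-2: 2^2 factorial around the current x, then least-squares fit of
    # a first-order model z ~ b0 + b^T (x - center); b estimates the gradient
    coded = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
    xs = center + half_width * coded
    zs = np.array([measure(x, rng) for x in xs])
    X = np.column_stack([np.ones(len(xs)), xs - center])
    beta, *_ = np.linalg.lstsq(X, zs, rcond=None)
    return beta[1:]

def rsm_minimize(x0, half_width=0.2, step=0.1, n_cycles=30, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)          # Step 0: initial guess
    for _ in range(n_cycles):                # Step 5: fixed cycle budget (assumption)
        b = local_gradient(x, half_width, rng)
        norm = np.linalg.norm(b)
        if norm == 0.0:
            break
        direction = -b / norm                # Step 3: steepest-descent direction
        z_best = measure(x, rng)
        while True:                          # Step 4: march until no improvement
            x_next = x + step * direction
            z_next = measure(x_next, rng)
            if z_next >= z_best:
                break
            x, z_best = x_next, z_next
    return x

print(rsm_minimize([3.0, 1.0]))  # should end up close to (1, -2)
```

The fixed step length limits the final accuracy; shrinking `half_width` and `step` near the solution is a natural refinement.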
17-13
Conceptual Illustration of RSM for Two
Variables in x; Shows More Refined
Experimental Design Near Solution
Adapted from: Montgomery (2005), Design and Analysis of Experiments, Fig. 11-3
17-14
Nonlinear Design
• Assume model
$$z = h(\theta, x) + v,$$
where $\theta$ enters nonlinearly and x is r-dimensional
input vector
• D-optimality remains dominant measure
– Maximization of determinant of Fisher information
matrix (from Chapter 13 of ISSO: $F_n(\theta, X)$ is Fisher
information matrix based on n inputs in $n \times r$ matrix X)
• Fundamental distinction from linear case is that D-optimal criterion depends on $\theta$
• Leads to conundrum:
Choosing X to best estimate $\theta$, yet need to know $\theta$ to
determine X
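A small illustration of the conundrum, assuming the hypothetical model h(θ, x) = θ1 e^{−θ2 x} with additive N(0, σ²) noise, so that F_n(θ, X) = σ^{−2} Σk gk gk^T with gk = ∂h(θ, xk)/∂θ. For two inputs {0, b} with σ = 1, det F = θ1² b² e^{−2θ2 b}, which peaks at b = 1/θ2: the D-optimal second input moves with the unknown θ2.

```python
import numpy as np

def grad_h(theta, x):
    # dh/dtheta for the hypothetical model h(theta, x) = theta1 * exp(-theta2 * x)
    t1, t2 = theta
    e = np.exp(-t2 * x)
    return np.array([e, -t1 * x * e])

def fisher_info(theta, xs, sigma=1.0):
    # F_n(theta, X) = sigma^-2 * sum_k g_k g_k^T for additive N(0, sigma^2) noise
    return sum(np.outer(grad_h(theta, x), grad_h(theta, x)) for x in xs) / sigma**2

def best_second_input(theta, grid=np.linspace(0.001, 5.0, 5000)):
    # Hold the first input at x = 0 and search the second input to maximize det F
    dets = [np.linalg.det(fisher_info(theta, [0.0, b])) for b in grid]
    return grid[int(np.argmax(dets))]

for theta2 in (0.5, 2.0):
    print(theta2, best_second_input((1.0, theta2)))  # close to 1 / theta2
```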
17-15
Strategies for Coping with
Dependence on $\theta$
• Assume nominal value of $\theta$ and develop an optimal
design based on this fixed value
• Sequential design strategy based on an iterated design
and model fitting process
• Bayesian strategy where a prior distribution is assigned
to $\theta$, reflecting uncertainty in the knowledge of the true
value of $\theta$
17-16
Sequential Approach for Parameter
Estimation and Optimal Design
• Step 0 (Initialization) Make initial guess at $\theta$, $\hat\theta_0$. Allocate
$n_0$ measurements to initial design. Set k = 0 and n = 0.
• Step 1 (D-optimal maximization) Given $X_n$, choose the $n_k$
inputs in $X = X_{n_k}$ to maximize
$$\det\left[F_n(\hat\theta_n, X_n) + F_{n_k}(\hat\theta_n, X)\right].$$
• Step 2 (Update $\theta$ estimate) Collect $n_k$ measurements
based on inputs from step 1. Use measurements to update
from $\hat\theta_n$ to $\hat\theta_{n+n_k}$.
• Step 3 (Stop or return) Stop if the value of $\hat\theta$ in step 2 is
satisfactory. Else return to step 1 with the new k set to the
former k + 1 and the new n set to the former n + $n_k$
(updated $X_n$ now includes inputs from step 1).
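A sketch of Step 1 in full sequential form (n_k = 1), for the hypothetical exponential model h(θ, x) = θ1 e^{−θ2 x} with unit-variance Gaussian noise: the next input is the candidate that maximizes det[F_n(θ̂_n, X_n) + F_1(θ̂_n, x)]. Choosing a batch of n_k > 1 inputs jointly would need a search over sets; the one-point greedy choice here is a simplification, and the function names are hypothetical.

```python
import numpy as np

def grad_h(theta, x):
    # dh/dtheta for the hypothetical model h(theta, x) = theta1 * exp(-theta2 * x)
    t1, t2 = theta
    e = np.exp(-t2 * x)
    return np.array([e, -t1 * x * e])

def fisher(theta, xs):
    # F(theta, X) = sum_k g_k g_k^T (unit-variance Gaussian noise assumed)
    F = np.zeros((2, 2))
    for x in xs:
        g = grad_h(theta, x)
        F += np.outer(g, g)
    return F

def next_input(theta_hat, X_n, candidates):
    # Step 1 with n_k = 1: pick x maximizing det[F_n(theta_hat, X_n) + F_1(theta_hat, x)]
    F_n = fisher(theta_hat, X_n)
    scores = [np.linalg.det(F_n + fisher(theta_hat, [x])) for x in candidates]
    return candidates[int(np.argmax(scores))]

candidates = np.linspace(0.0, 5.0, 501)
x_next = next_input(theta_hat=(1.0, 0.5), X_n=[0.0, 2.0], candidates=candidates)
print(x_next)
```

Starting from X_n = {0, 2} with θ̂_n = (1, 0.5), the chosen input coincides with one of the existing inputs, consistent with {0, 1/θ̂2} being locally D-optimal for this model, so further runs replicate rather than spread out.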
17-17
Comments on Sequential Design
• Note two optimization problems being solved: one for
the inputs X, one for $\theta$
• Determine next $n_k$ input values (step 1) conditioned
on current value of $\hat\theta$
– Each step analogous to nonlinear design with fixed
(nominal) value of $\theta$
• “Full sequential” mode ($n_k = 1$) updates $\hat\theta$ based on
each new input/output pair $(x_k, z_k)$
• Can use stochastic approximation to update $\hat\theta$:
$$\hat\theta_{n+1} = \hat\theta_n - a_n Y_n\left(\hat\theta_n \mid z_{n+1}, x_{n+1}\right),$$
where $Y_n(\theta \mid z_{n+1}, x_{n+1}) = \frac{\partial}{\partial\theta}\left\{\tfrac{1}{2}\left[z_{n+1} - h(\theta, x_{n+1})\right]^2\right\}$
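A sketch of the stochastic approximation update for the hypothetical exponential model h(θ, x) = θ1 e^{−θ2 x}, where Y_n = −[z_{n+1} − h(θ̂_n, x_{n+1})] ∂h(θ̂_n, x_{n+1})/∂θ. The gain a_n = a/(n + 1)^{0.602}, the alternating inputs {0, 2}, the noise level, and the true θ used to simulate data are all assumptions chosen for illustration.

```python
import numpy as np

def h(theta, x):
    # Hypothetical nonlinear model h(theta, x) = theta1 * exp(-theta2 * x)
    return theta[0] * np.exp(-theta[1] * x)

def grad_h(theta, x):
    e = np.exp(-theta[1] * x)
    return np.array([e, -theta[0] * x * e])

def sa_full_sequential(theta0, theta_true, inputs, a=0.5, alpha=0.602, noise_sd=0.05, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for n, x in enumerate(inputs):
        # New measurement z_{n+1} at input x_{n+1}, simulated from the assumed true theta
        z = h(theta_true, x) + rng.normal(0.0, noise_sd)
        # Y_n = d/dtheta {0.5 [z - h(theta, x)]^2} = -[z - h(theta, x)] * dh/dtheta
        Y = -(z - h(theta, x)) * grad_h(theta, x)
        theta = theta - (a / (n + 1) ** alpha) * Y  # a_n = a / (n + 1)^alpha
    return theta

inputs = np.tile([0.0, 2.0], 2000)  # n_k = 1: one new input/output pair per update
theta_hat = sa_full_sequential([1.5, 0.8], np.array([2.0, 0.5]), inputs)
print(theta_hat)  # should be near the assumed true value (2.0, 0.5)
```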
17-18
Bayesian Design Strategy
• Assume prior distribution (density) for $\theta$, $p(\theta)$, reflecting
uncertainty in the knowledge of the true value of $\theta$
• There exist multiple versions of D-optimal criterion
• One possible D-optimal criterion:
$$E\left[\log\det F_n(\theta, X)\right] = \int \log\det F_n(\theta, X)\, p(\theta)\, d\theta$$
• Above criterion related to Shannon information
• While log transform makes no difference with fixed $\theta$ (log is
monotonic, so the maximizing X is unchanged), it does affect integral-based solution
• To simplify integral, may be useful to choose discrete
prior $p(\theta)$
17-19
Appendix to Slides for Chapter 17: Factorial
Design (not in ISSO; see ref. [1] below)
• Classical experimental design deals with linear models
• Factorial design is most popular classical method
– All r inputs (“factors”) changed at one time (note: ref.
[1] uses notation m instead of r)
• Factorial design provides two key advantages over one-at-a-time changes:
1. Greater efficiency in extracting information from given
number of experiments
2. Ability to determine if there are interaction effects
• Standard method is $2^r$ factorial; “2” comes about by looking
at each input at two levels: low (−) and high (+)
– E.g., if r = 3, then have $2^3 = 8$ input combinations:
(− − −), (+ − −), (− + −), (− − +), (+ + −), (+ − +), (− + +), (+ + +)
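The $2^r$ runs can be enumerated mechanically; this sketch (hypothetical function name) lists them in the same order as the r = 3 example, grouped by how many inputs are at the high level, using ASCII '-' for low.

```python
from itertools import combinations

def factorial_2r(r):
    # All 2^r low/high settings of r inputs ('-' = low, '+' = high),
    # grouped by how many inputs are at the high level
    runs = []
    for k in range(r + 1):
        for high in combinations(range(r), k):
            runs.append("".join("+" if i in high else "-" for i in range(r)))
    return runs

print(factorial_2r(3))
# ['---', '+--', '-+-', '--+', '++-', '+-+', '-++', '+++']
```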
[1] Spall, J. C. (2010), “Factorial Design for Choosing Input Values in Experimentation:
Generating Informative Data for System Identification,” IEEE Control Systems
Magazine, vol. 30(5), pp. 38−53.
17-20
Appendix to Slides (cont’d):
Factorial Design with 3 Inputs
• Consider r = 3 linear model
$$z_k = \theta_0 + \theta_1 x_{k1} + \theta_2 x_{k2} + \theta_3 x_{k3} + \theta_4 x_{k1}x_{k2} + \theta_5 x_{k1}x_{k3} + \theta_6 x_{k2}x_{k3} + \theta_7 x_{k1}x_{k2}x_{k3} + \text{noise},$$
where $\theta = [\theta_0, \theta_1, \ldots, \theta_7]^T$ represents vector of (unknown)
parameters and $x_{ki}$ represents $i$th term in input vector $x_k$
• $2^3$ factorial design allows for efficient estimation of all
parameters in $\theta$
• In contrast, one-at-a-time provides no information for
estimating $\theta_4$ to $\theta_7$
• However, $2^3$ factorial design must be augmented in some
way if we wish to add quadratic (e.g., $x_{k1}^2$) or other higher-order
polynomial terms to model
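A sketch of the three claims for the r = 3 model, using coded levels ±1 and hypothetical noise-free responses generated from θ = [1, 2, …, 8]^T: the $2^3$ model matrix X satisfies X^T X = 8I, so θ̂ = X^T z / 8 recovers every parameter; the four-run one-at-a-time design (baseline plus each input raised alone) gives a model matrix of rank 4, too few for eight parameters; and a column for x_k1² adds nothing, because at levels ±1 it equals the intercept column.

```python
import numpy as np
from itertools import product

def full_model_matrix(points):
    # Columns match the r = 3 model: 1, x1, x2, x3, x1x2, x1x3, x2x3, x1x2x3
    x1, x2, x3 = points.T
    return np.column_stack([np.ones(len(points)), x1, x2, x3,
                            x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])

design = np.array(list(product([-1.0, 1.0], repeat=3)))  # 2^3 = 8 runs
X = full_model_matrix(design)
theta_true = np.arange(1.0, 9.0)   # hypothetical parameter values
z = X @ theta_true                 # noise-free responses for illustration
theta_hat = X.T @ z / 8            # valid because X^T X = 8 I

# One-at-a-time: baseline (---) then each input raised alone
one_at_a_time = np.array([[-1, -1, -1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
rank_oat = np.linalg.matrix_rank(full_model_matrix(one_at_a_time))

# Adding x1^2: at coded levels +/-1 it equals the intercept column
X_quad = np.column_stack([X, design[:, 0] ** 2])
rank_quad = np.linalg.matrix_rank(X_quad)
print(theta_hat, rank_oat, rank_quad)  # rank_oat = 4, rank_quad = 8
```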
17-21
Appendix to Slides (cont’d):
Illustration of Interaction with 2 Inputs
• Example responses for r = 2: no interaction and
interaction between input variables
• Left plot (no interaction) shows that change in zk with
change in xk2 does not depend on xk1; right plot
(interaction) shows change in zk does depend on xk1
[Figure: two panels, “No interaction” and “Interaction,” each plotting zk against xk2 with separate lines for xk1 = low and xk1 = high; points labeled (− −), (+ −), (− +), (+ +)]
17-22
Appendix to Slides (cont’d):
Efficiency of Factorial Design for Main Effects
• Factorial design estimates “main effects” (non-interaction)
with greater efficiency than one-at-a-time changes
• Plot below based on same accuracy in estimation for the
two methods
[Figure: ratio of number of runs needed (one-at-a-time / factorial) versus input dimension r]
17-23