Predictions From Models and Data
Michael Frenklach, Andy Packard and Pete Seiler
Mechanical Engineering
University of California
Berkeley, CA
Support from NSF-CTS0113985. Additional thanks to Laurent
El Ghaoui, Bernd Sturmfels, Pablo Parrilo and Eric Wemhoff.
American Control Conference, May 2002
Copyright 2002, Frenklach, Packard and Seiler. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a
copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way,
Stanford, California 94305, USA.
Model-Based Uncertainty Prediction
Goal: Predict the outcome of a modeled physical process using experimental
results and models of related, but different processes.
Motivation: Model-based and experimental research in Chemical Reaction
Networks. This field is characterized by several interrelated, relevant facts:
– Process to be predicted is complex, though physics-based governing equations
are widely accepted
– Uncertainty in process behavior exists, but much is known regarding “where”
the uncertainty lies in the governing equations (uncertain parameters)
– Numerical simulations of process, with uncertain parameters “fixed” to certain
values, may be performed “reliably”
– Partially-isolated aspects of process are studied experimentally in labs
We propose a collaborative data processing approach that draws on tools
that are now common in Robust Control Theory (RCT)
A case study on a well-known database of methane combustion
experiments/models demonstrates the viability of the method
Methane Combustion: CH4 + 2 O2 → CO2 + 2 H2O
Methane reaction mechanisms have
grown in complexity over time:
(~1970): Less than 15 elementary
reactions with 12 species
(~1982): 75 elementary reactions
(forward and reverse) with 25
species
(Today): GRI-Mech has 300+
elementary reactions, 53 Species,
and 102 “active” parameters.
Pathway diagram for methane combustion [Turns]
Modeling of Chemical Reaction Networks (CRN)
D0e0?
Chemistry(r)
Transport 0
S0(r)
Chemistry(r)
Transport 1
S1(r)
Chemistry(r)
Transport N
SN(r)
300+ Reactions,
53 Species, and
102 “Active”
Parameters
Process P0
CH4+202

2H20+CO2
Process P1
D1e1
Process PN
DNeN
Experiments /Outcomes
Physics-based models of Processes
Essentially, predictions are “currently” obtained as follows:
– Model/Perform laboratory experiments Pi
– Estimate some parameters from model/experimental data (Si and Di)
– Use, collectively, all estimated parameters to make predictions about P0
Reporting of Experimental Results
The canonical structure of a technical report (a paper) is:
• Description of experiment: apparatus, conditions, measured observable
– flow-tube reactors, laminar premixed flames, ignition delay, flame speed
• Care in eliminating unknown biases, and assessing uncertainty in
outcome measurement
• Transport and chemistry models that involve uncertain parameters
– momentum, diffusion, heat transfer
– 10s to 100s of reactions, with uncertainty in the rate constant parameters
k(T) = A T^n exp(−E/RT)
• Sensitivities of modeled outcome to parameters in chemistry model
– evaluate sensitivities at nominal parameter values
• Focus on parameter(s) resulting in high sensitivities on the outcome
• Assumptions on parameters not being studied
– freeze low-sensitivity parameters at nominal values
• Predict one or two parameter values/ranges
Terminology: Experiments
To be clear: an experiment consists of:
– Measured observable, D
– Experimental tolerance in measuring observable, e
– Mathematical model, M(r), showing dependence on the active variables r ∈ ℝ^n
(Taking a deterministic, worst-case view.) The experiment actually asserts an
inequality constraint among the active variables: |M(r) − D| < e.
The mathematical model, M(r), of the process is usually physics-based
– Parametrized chemical reaction ODEs
– Energy, momentum, and mass balance laws
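The experiment-as-inequality view can be sketched in code. The container class and the toy model M below are illustrative inventions, not anything from the GRI-Mech Data Set:

```python
# Sketch: an "experiment" as a (model, observable, tolerance) triple whose only
# assertion is the inequality |M(r) - D| < e on the active variables r.
# The names and the toy model M are hypothetical.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Experiment:
    M: Callable[[Sequence[float]], float]  # mathematical model of the observable
    D: float                               # measured observable
    e: float                               # experimental tolerance

    def is_consistent(self, r: Sequence[float]) -> bool:
        """True iff parameter vector r satisfies |M(r) - D| < e."""
        return abs(self.M(r) - self.D) < self.e

# Toy quadratic model in two active parameters
exp = Experiment(M=lambda r: 1.0 + 0.5 * r[0] - 0.2 * r[1] ** 2, D=1.1, e=0.3)
print(exp.is_consistent([0.4, 0.0]))   # |1.2 - 1.1| = 0.1 < 0.3
print(exp.is_consistent([-1.0, 0.0]))  # |0.5 - 1.1| = 0.6 >= 0.3
```

The point of the deterministic, worst-case view is that the experiment carves a region out of parameter space rather than pinning down a single value.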
Now, recall the last three steps of the paper:
• Focus on parameter(s) resulting in high sensitivities on the outcome
• Assumptions on parameters not being studied
– freeze low-sensitivity parameters at nominal values
• Predict one or two parameter values/ranges
CRN Data Processing
Given:
– A priori knowledge: −1 ≤ r_k ≤ 1, k = 1, …, n.
– An experiment: (M(r), D, e) with r ∈ ℝ^n.
From this, all that can be concluded is |M(r) − D| < e, i.e. r lies in the
feasible set (sketched in the r_1–r_2 plane for n = 2):
{ r : −1 ≤ r_1, r_2 ≤ 1, |M(r) − D| ≤ e }
But typically the procedure is:
– Freeze all parameters except one at the nominal: r_k = 0 for k ≠ k_0
– Find the range of the unfrozen parameter:
max/min r_{k_0}
subject to: r_k = 0 for k ≠ k_0
−1 ≤ r_{k_0} ≤ 1
|M(r) − D| < e
The reported range is a subset of what can actually be inferred from
(M(r), D, e), but the implied higher-dimensional cube (the new, in-literature
feasible set) neither contains, nor is contained in, the feasible parameter set.
Terminology: Surrogate Models
The mathematical model, M(r), of the process is usually complicated and its
dependence on the parameter vector, r, is unclear.
A surrogate model, S(r), is an algebraic representation of observable as a
function of the parameters. In our context, it is a polynomial “fit” of M(r).
Constructing a Surrogate Model
– Large-scale computer “experimentation” on M(r).
• Random sampling and sensitivity calculations to determine active
parameters
• Factorial design-of-experiments on active parameter cube
– Linear, Quadratic or Polynomial (stay in Sum-of-Squares hierarchy) fit
– Assess the residuals and increase order if necessary
The surrogate model, S(r), replaces the mathematical model, M(r), when
needed. Typically, it is both empirical and physics-based.
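A minimal sketch of the surrogate-fitting step. A toy function M stands in for the expensive simulation, and random sampling on the cube stands in for the factorial design, purely to keep the example short:

```python
# Sketch: fit a quadratic surrogate S(r) to samples of a model M(r) by least
# squares. The "model" M here is a toy stand-in; GRI-Mech uses factorial
# designs of computer experiments rather than random sampling.
import numpy as np

rng = np.random.default_rng(0)
n = 3                                     # number of active parameters

def M(r):                                 # toy "expensive" model
    return np.sin(r[0]) + r[1] * r[2] + 0.1 * r[0] ** 2

def quad_features(r):
    """Feature vector [1, r_i, r_i*r_j (i <= j)] for a quadratic fit."""
    r = np.asarray(r)
    cross = [r[i] * r[j] for i in range(n) for j in range(i, n)]
    return np.concatenate(([1.0], r, cross))

R = rng.uniform(-1, 1, size=(200, n))     # samples on the parameter cube
y = np.array([M(r) for r in R])
Phi = np.array([quad_features(r) for r in R])
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)

S = lambda r: quad_features(r) @ coef     # the surrogate
resid = np.max(np.abs(Phi @ coef - y))    # assess residuals; raise order if large
print(f"max residual on samples: {resid:.3f}")
```

If the residuals are unacceptable, the fit is repeated at higher polynomial order, staying within the sum-of-squares hierarchy.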
Consistent CRN Data Processing
Instead, find a bound on the range in a consistent manner. For each k_0:
max/min r_{k_0}
subject to: −1 ≤ r_k ≤ 1 for all k
|S(r) − D| < e
Test this on the models and measured outcomes in the GRI-Mech Dataset:
– Models with n=102 uncertain parameters
– 77 different experiments (model, outcome, tolerance)
Perform this optimization at e=0.05|D|.
Clearly, few experiments, on their own, are able to reduce the uncertainty of any given parameter.
Message: You can’t reduce the uncertainty in any parameter from any one
experiment. Some collaborative data processing is necessary.
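The per-parameter range optimization above can be sketched with an off-the-shelf constrained solver. Everything concrete here (the linear surrogate, outcome, and tolerance) is invented for illustration, and scipy's SLSQP stands in for whichever solver one prefers:

```python
# Sketch: consistent range of one parameter r_k0, i.e.
#   max/min r_k0  s.t.  -1 <= r <= 1 and |S(r) - D| <= e,
# on a toy linear surrogate (not a GRI-Mech model).
import numpy as np
from scipy.optimize import minimize

n = 3
S = lambda r: 1.0 + 0.8 * r[0] + 0.3 * r[1] - 0.2 * r[2]  # toy surrogate
D = 1.0
e = 0.05 * abs(D)                                          # tolerance e = 0.05|D|

# Two smooth inequalities encode |S(r) - D| <= e.
cons = [{"type": "ineq", "fun": lambda r: e - (S(r) - D)},
        {"type": "ineq", "fun": lambda r: e + (S(r) - D)}]
bounds = [(-1.0, 1.0)] * n

def param_range(k0):
    """Consistent [min, max] of r_k0 over the constrained cube."""
    lo = minimize(lambda r: r[k0], np.zeros(n), bounds=bounds,
                  constraints=cons, method="SLSQP").fun
    hi = -minimize(lambda r: -r[k0], np.zeros(n), bounds=bounds,
                   constraints=cons, method="SLSQP").fun
    return lo, hi

lo0, hi0 = param_range(0)   # analytic answer here: +/- (e + 0.3 + 0.2)/0.8 = +/- 0.6875
print(f"consistent range of r_0: [{lo0:.3f}, {hi0:.3f}]")
```

Even for the most sensitive parameter the range only shrinks from [−1, 1] to about [−0.69, 0.69], echoing the slide's message that a single experiment barely constrains any one parameter.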
GRI-Mech and GRI Data Set
GRI-Mech (1992-present) addresses the collaborative data processing for methane
• Methane reaction model: 53 Species/300+ Reactions/102 Uncertain constants
– Chemistry(r)
• 77 peer-reviewed, published Experiments/Measured Outcomes
– Processes Pi, measured outcomes Di, measurement uncertainties ei
• Models of (Experiments/Measured Outcomes)
– Math models involving CFD/chemistry/other phenomena
• Surrogate models of (Experiments/Measured Outcomes)
– Factorial design of computer experiments, leading to quadratic Si(r)
• Optimization to get “best” fit: single parameter vector (identification step)
• Validation
Features
• Only "raw" data is used - none of the potentially erroneous conclusions. Part way
towards “give me your information, not your conclusions…”
• Treats the experiments as information, and combines them all.
• Addresses the "lack-of-collaboration" in the post experimental data processing.
• www.me.berkeley.edu/gri_mech
Criticisms
There are criticisms of GRI-Mech. Typically, the objections read like
– “It's too early to undertake such a project, because some fundamental
knowledge is still lacking”
– “I am unwilling to rely on a flame measurement to extract the value of some
fundamental reaction's rate properties -- I prefer to do that in isolation”
– “Not all relevant data was included”
– “The result (one particular number) is different from mine”
The root cause of the objections is mostly psychological:
– distributed effort dilutes any one specific contribution (promotion & tenure?)
– protection of individual’s territory
– ownership
– information technology
– complex geometry of feasible set is unappreciated
And...
Limitation in GRI-Mech
GRI-Mech returns a single parameter vector, without uncertainty information.
– The nonlinear dependence makes it difficult to translate the uncertainty in
experimental outcome space (77 dimensional) to an uncertainty in the
parameter-space (102 dimensional).
GRI-Mech could return the smallest coordinate-aligned cube in parameter space
that is consistent with the GRI-Mech Data Set. Computing this yields:
only 25 of the 102 dimensions have been shrunk – some by only a few percent.
Predictive capability associated with Reduced Cube
Now try to use this method for prediction.
– First find the smallest cube in the parameter space that is consistent with 76 of
the experiments.
– On this cube, compute the range of the 77th model.
– Also compute the range of the 77th model on the original unit-cube.
– Compare the two predictions (normalized range reduction)
– Repeat this for all cases
Roughly, an outcome prediction of
±50% has been improved to ±40%,
knowing the 76 experiments to ±5%.
That is a modest improvement, at best.
Approaches to Prediction
Over-bounding the feasible parameter set is decoupled from the prediction.
But the benefit is small.
[Decoupled pipeline: Models/Experiments → Over-bound of Feasible Parameter Set; then New Model → Predicted Range]
A coupled approach that bypasses the coarse description of the parameter set
may prove more effective. In this approach we solve one big problem rather
than two decoupled problems. The parameters are treated as internal variables.
[Coupled pipeline: Models/Experiments + New Model → Predicted Range]
An Approach
The message at this point: data processing should
– Use all the experimental data and models
– Avoid the intermediate coarse description of the feasible set
D0e0?
Facts
– Any r consistent with all experiments yields a
possible outcome of Po, So(r).
– All such possible outcomes constitute the predicted
outcome set of Po.
Need to understand the extremes of this set,
min n S 0 ( r )
r [ 1,1]
S k ( r )  Dk  ek
1 k  N
and
max n S 0 ( r )
Process P0
CH4+202

2H20+CO2
Process P1
D1e1
Process PN
DNeN
Experiments /Outcomes
r [ 1,1]
S k ( r )  Dk  ek
1 k  N
Problem involves polynomial objective and
constraints, and is non-convex. Take the
viewpoint from RCT, and go for two types of bounds.
Chemistry(r)
Transport 0
S0(r)
Chemistry(r)
Transport 1
S1(r)
Chemistry(r)
Transport N
SN(r)
Physics-based models of Processes
Inner Bounds/Outer Bounds
Compute outer and inner bounds satisfying
L ≤ min_{r ∈ [−1,1]^n, |S_i(r) − D_i| ≤ e_i} S0(r)
and
max_{r ∈ [−1,1]^n, |S_i(r) − D_i| ≤ e_i} S0(r) ≤ H
Inner bounds
– Find feasible points, and compute the objective
– Local search from randomly selected initial conditions
– Off-the-shelf nonlinear, constrained minimization tools
– NPSOL (www.sbsi-sol-optimize.com/NPSOL.htm)
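A minimal sketch of the inner-bound computation: multi-start local search with an off-the-shelf constrained solver. scipy's SLSQP stands in for NPSOL, and the surrogates, outcomes, and tolerances are toy inventions (n = 2 parameters, N = 2 experiments):

```python
# Sketch: inner bounds on the predicted outcome set via multi-start local
# search. Any feasible point gives a valid inner bound on the extremes.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
S0 = lambda r: r[0] ** 2 + r[1]                        # model to be predicted
S = [lambda r: r[0] + 0.5 * r[1],                      # toy surrogates
     lambda r: r[1] - 0.3 * r[0]]
D, e = [0.2, 0.1], [0.1, 0.1]                          # outcomes / tolerances

cons = []
for k in range(2):   # |S_k(r) - D_k| <= e_k as two smooth inequalities
    cons.append({"type": "ineq", "fun": lambda r, k=k: e[k] - (S[k](r) - D[k])})
    cons.append({"type": "ineq", "fun": lambda r, k=k: e[k] + (S[k](r) - D[k])})
bounds = [(-1.0, 1.0)] * 2

def inner_bound(sign):
    """Best feasible objective over random restarts (sign=+1: min, -1: max)."""
    best = np.inf
    for _ in range(20):
        r0 = rng.uniform(-1, 1, 2)
        res = minimize(lambda r: sign * S0(r), r0, bounds=bounds,
                       constraints=cons, method="SLSQP")
        if res.success and res.fun < best:
            best = res.fun
    return sign * best

lo, hi = inner_bound(+1), inner_bound(-1)
print(f"predicted outcome interval (inner bound): [{lo:.3f}, {hi:.3f}]")
```

Because the search is local, [lo, hi] is only guaranteed to lie inside the true predicted outcome set; the outer bounds below bracket it from the other side.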
Outer bounds
– For quadratic surrogates, use the S-procedure (going back to Yakubovich)
• Relying on S-procedure proofs, it is always worse to take the intermediate step,
over-bounding the feasible set with quadratic constraints
– As surrogates become general polynomials, move to the sum-of-squares
hierarchy (Parrilo, Zelentsovsky, Barkin)
• Relying on SOS proofs, with fixed order multipliers, it is always worse to take the
intermediate step, over-bounding the feasible set with polynomial constraints
– SeDuMi (J. Sturm, www.unimass.nl/~sturm/software/sedumi.html)
S-procedure, SOS
Solving for one endpoint of the predicted outcome set reduces to an
indefinite quadratic program:
H* = max_r [1; r]ᵀ M_0 [1; r]  subject to  [1; r]ᵀ M_k [1; r] ≥ 0, 1 ≤ k ≤ 2N + n
Use the S-procedure to find an outer bound on this extreme point:
If there exist λ_k ≥ 0, 1 ≤ k ≤ 2N + n, such that
A(γ, λ) := M_0 − γ e_1 e_1ᵀ + Σ_{k=1}^{2N+n} λ_k M_k ⪯ 0,
then H* ≤ γ.
To find the best upper bound, solve a semi-definite programming problem:
Ĥ = min γ  subject to:  λ_k ≥ 0 and A(γ, λ) ⪯ 0
Remarks:
– In fact, the S-procedure only uses one side of each absolute value constraint.
– The sum-of-squares hierarchy can also be used to improve the upper bound.
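For intuition, a hypothetical one-dimensional instance (one parameter, cube constraint only) can be bounded by hand: for fixed λ the smallest feasible γ has a closed form via a Schur complement, so the SDP collapses to a scan over a λ grid. A real implementation would instead hand A(γ, λ) to an SDP solver such as SeDuMi:

```python
# Sketch: S-procedure upper bound for a 1-D instance of the QP above:
#   max (a*r^2 + b*r + c)  s.t.  1 - r^2 >= 0,
# i.e. M0 = [[c, b/2], [b/2, a]], M1 = [[1, 0], [0, -1]] in [1; r] coordinates.
import numpy as np

def s_procedure_bound(a, b, c, lams=np.linspace(0.0, 10.0, 2001)):
    best = np.inf
    for lam in lams:
        # B = M0 + lam*M1; we need B - gamma*e1*e1' <= 0.
        b11, b12, b22 = c + lam, b / 2.0, a - lam
        if b22 >= 0:          # lower-right block must be negative definite
            continue
        # Schur complement: (b11 - gamma) - b12^2 / b22 <= 0
        gamma = b11 - b12 ** 2 / b22
        best = min(best, gamma)
    return best

# Concave example: max of -r^2 + r + 1 on [-1, 1] is 1.25 at r = 1/2;
# with a single quadratic constraint the S-lemma is lossless, so the
# bound is tight here.
print(s_procedure_bound(-1.0, 1.0, 1.0))
```

With many constraints the S-procedure is generally conservative, which is exactly why the sum-of-squares hierarchy is brought in to tighten the bound.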
New Predictive capability
Use this method for prediction on the
GRI-Mech Data Set
– Compute the range of the i’th model,
consistent with the other 76 experiments
& models.
– Also compute the range of the i’th model
on the original unit-cube.
– Compare the two predictions.
– Repeat this for all i
Recall old prediction, using intermediate
cube.
New prediction, using experimental
constraints directly
Now, an outcome prediction of ±50% has
been improved to ±15%, knowing the
76 experiments to ±5%.
Connections to RCT robustness analysis
Some similarities with robust control theory (specifically, worst-case analysis)
In particular,
– high-order, uncertain differential equation models;
– S-procedure to yield “outer bounds”
– heuristic searches to get “inner bounds”
– use of experimental data to refine bounds;
– evidence of what appears to be good performance on a “real” problem
Nevertheless, there are differences:
– data used to correlate uncertainty, not merely shrink a-priori norm bound
– uncertain parameters the same across all experiments
Other differences are:
– systems considered are (at least as modeled) nonlinear;
– model transformations that are unconventional in RCT, where linear models
with linear fractional uncertainty and H∞ criteria are handled more rigorously
(surrogate model vs. frequency response of an LTI system with LFT uncertainty)
Pre-assessment of Prediction Improvement
• Size of original prediction interval (written as a coupled optimization for
comparison with below):
max S_0(r^H) − S_0(r^L)  over  r^H, r^L ∈ [−1,1]^n
subject to  |S_i(r^H) − D_i| ≤ e_i,  |S_i(r^L) − D_i| ≤ e_i
• Size of prediction interval knowing that experiment I can be rerun at a new
tolerance of E_I, and knowing its outcome will be consistent with past
experiments:
max S_0(r^H) − S_0(r^L)  over  r^H, r^L ∈ [−1,1]^n and N_I
subject to  |S_i(r^H) − D_i| ≤ e_i,  |S_i(r^L) − D_i| ≤ e_i,
|S_I(r^H) − N_I| ≤ E_I,  |S_I(r^L) − N_I| ≤ E_I,
|N_I − D_I| ≤ E_I + e_I
• This is analogous to conditioning a Gaussian random vector X:
Var(X_2 | X_1 = x_1) = Σ_22 − Σ_21 Σ_11⁻¹ Σ_12
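The conditional-covariance analogy can be checked numerically; the 3-dimensional covariance matrix below is an arbitrary illustration:

```python
# Sketch of the Gaussian analogy: conditioning shrinks the covariance by the
# Schur complement  Var(X2 | X1 = x1) = S22 - S21 S11^{-1} S12.
import numpy as np

Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.5]])
# Condition the last two components (X2) on the first (X1).
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]
cond_cov = S22 - S21 @ np.linalg.inv(S11) @ S12
print(cond_cov)   # [[1.5, 0.05], [0.05, 1.375]]
```

As in the pre-assessment above, the reduction depends only on the covariance structure, not on the observed value x1 itself.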