Collaborative Science: Give us your information,
not your conclusions
Andy Packard
Mechanical Engineering
jointly with Michael Frenklach, Ryan Feeley and Trent Russi
University of California
Berkeley, CA
August 2006 Summary Slides
Support from NSF grants: CTS-0113985 and CHE-0535542
Support from CITRIS
Copyright 2006, Packard, Frenklach, Feeley, and Russi. This work is licensed under the Creative Commons Attribution-ShareAlike License. To
view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way,
Stanford, California 94305, USA.
Collaborative Science
Limit “science” to mean mechanism understanding through modeling and
experimentation for the purpose of prediction
starting point
Applicability:
focus on
–chemical kinetics modeling
–atmospheric chemistry modeling
–…
–systems biology
Process models are complex, though physics based
governing equations are widely accepted
Uncertainty in process behavior exists, but much is
known regarding “where” the uncertainty lies in the
governing equations (uncertain parameters)
Numerical simulations of process, with uncertain
parameters “fixed” to certain values, may be
performed “reliably”
Processes are studied experimentally in labs
Goal of “collaboration”: quantify the joint information implicit in the
community’s research portfolio.
–Portfolio: diverse, and individually generated
–parametrized models which explain/govern behavior
–observed facts about behavior
Result:
–better decisions
–more reliance on “the facts”
–quicker path to answer relevant questions
Collaborative Science: Who does what?
[Flow diagram; labels recovered from the slide:]
– (many) Science researchers: (many) Laboratory Experiments, (many) Science Models, Prior Information
– Numerical analyst: Sensitivity Analysis; Experiment Design; Function approximation on Cubes (polynomial and/or rational); error assessment; High performance scientific computing
– Repeated requests to run model at different conditions, parameter values, etc., on each model
– Questions: Consistency, Prediction, Relevance
– Invalidation certificates (polynomial inequalities)
– Nonlinear programming (polynomial constraints and objective), global opt, Branch & Bound
– Embarrassingly Parallelizable structure; scalability; accuracy (reformulate as invalidation)
– Feedback: Revisit/think fundamentals
The result of this process
When we say Invalidation Proof, or Certificate, what do we really mean?
Binary tree (from Branch & Bound)
– The tree partitions prior information into cubes
– At each leaf, you find a cube, and
• Array of Rational functions (one for each Model)
• Array of Approximation Error bounds (one for each approximation)
• Sum of Squares certificate proving emptiness of the array of constraints
Fairly appealing to the user. Good tools to drill in and interactively confirm are
necessary (initially…)
The conclusion might be wrong.
1. Error bound on function approximation wrong
– SOS emptiness proof is not necessarily valid
2. Experiment result wrong
3. Science model wrong
4. A-priori info on parameters wrong
Proof is wrong!
Methane Combustion: CH4 + 2 O2 → CO2 + 2 H2O
Gaseous methane reaction models
have grown in complexity over time
with the aim of improving predictions.
(~1970): 15 elementary reactions
with 12 species
(~1980): 75 elementary reactions
with 25 species
(Now): 300+ elementary reactions,
50+ species. Used to predict heat
release and concentrations over a
wide scale, from work production to
pollutant formation.
–ODE model called “GRI-Mech 3.0”
–several releases, now static…
–used as a common benchmark
–Result of 0.5 alpha collab. science
Pathway diagram for methane combustion [Turns]
How did the GRI-Mech come about?
Each of the GRI-Mech ODE releases embodies the work of many people who were
not explicitly working together. How did the successful collaboration occur?
Informal mode?
– assimilate conclusions of each paper sequentially
– “read my paper”
– “data is available on my website”
authors stake professional
reputation on these…
No. Didn’t/Doesn’t work – community tried it, but predictive capability of
model did not reliably improve as more high-quality experiments were done.
– Papers tend to lump modeling and theory, experiments, analysis and
convenience assumptions, leading to a concise text-based conclusion
– Conclusions are conditioned on these additional assumptions necessary to
make the conclusion concise.
– Impossible to anonymously “collaborate” since the convenience assumptions
are unique to each paper.
– Goals of one paper are often the convenience assumptions of another.
– Difficult/impossible to trace the quality of a conclusion reached sequentially
across papers
– Posted data is often the text-based conclusion in e-form, little additional
information
but if really, really pressed, perhaps not these
(perhaps doubt those of their colleagues as well)
Traditional Reporting of Experimental Results
The canonical structure of a technical report (a paper) is:
the science aspect
• Description of experiment: apparatus, conditions, measured observable
– flow-tube reactors, laminar premixed flames, ignition delay, flame speed
• Care in eliminating unknown biases, and assessing uncertainty in
outcome measurement
• Informal description of transport and chemistry models that involve
uncertain parameters
– momentum, diffusion, heat transfer
– 10-100’s reactions, uncertainty in the rate constant parameters
k(T) = A T^n exp(-E/RT)
• Focus on parameter(s) resulting in high sensitivities on the outcome
– evaluate (numerical sims) sensitivities at nominal parameter values
• Convenience assumptions on parameters not being studied
– freeze low-sensitivity parameters at “nominal” values (obtained elsewhere)
• Predict one or two parameter values/ranges
• Post values on website (rarely models, rarely “raw” data)
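As a small illustration of the rate-constant form k(T) = A T^n exp(-E/RT) quoted above, the following sketch evaluates it at two temperatures. The function name and the numerical values of A, n and E are hypothetical, chosen only to make the example run; they are not taken from GRI-Mech.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)


def rate_constant(T, A, n, E):
    """Modified Arrhenius form k(T) = A * T**n * exp(-E / (R * T))."""
    if T <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return A * T**n * math.exp(-E / (R * T))


# Hypothetical parameter values, for illustration only.
k_1000 = rate_constant(1000.0, A=1.0e10, n=0.5, E=1.0e5)
k_2000 = rate_constant(2000.0, A=1.0e10, n=0.5, E=1.0e5)
print(k_1000, k_2000)
```

The uncertain parameters of a reaction model are, in this picture, quantities such as A, n and E for each of the reactions.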
Consequence: Mistakes and Artificial Controversies
The most influential (linearized, at nominal) parameter for models 66 and 67
happens to be ρ44 (2nd most influential is ρ45). Look at slices of the feasible set
for experiments 66 and 67 (all other parameters set to nominal). Following the
simplistic paradigm…
E66 reports 0.3 ≤ ρ44 ≤ 1.0.
E67 reports -1.0 ≤ ρ44 ≤ 0.15, which is a direct conflict…
or, perhaps E67 considers both ρ44 and ρ45 and then reads report E66. After doing so E67 reports 0.2 ≤ ρ45 ≤ 1.0…
or, perhaps noting that for any α, ρ44 = ρ45 = α is consistent with the data, so E67 reports nothing!
In any case, all such reports are wrong!
A higher dimensional slice (but still a slice!), now including parameter ρ34, illustrates the inaccuracy.
Key issue: Geometry of feasible set (not a coordinate-aligned cube) is unappreciated.
[Figures: feasible-set slices for E66 and E67 in the (ρ44, ρ45) plane, both axes from -1 to 1; a three-dimensional slice with axes ρ44, ρ45 and ρ34.]
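The geometric point can be reproduced on a toy problem. The sketch below uses an assumed two-parameter model M(x, y) = x - y with made-up data d = 0.5, u = 0.2 (none of this comes from experiments 66 or 67). It compares the range of x reported from a slice at nominal y = 0 with the range of x over the whole feasible set.

```python
def feasible(x, y, d=0.5, u=0.2, tol=1e-9):
    """Toy assertion |M(x, y) - d| <= u with the assumed model M = x - y."""
    return abs((x - y) - d) <= u + tol


grid = [i / 20 for i in range(-20, 21)]  # prior cube: [-1, 1] per coordinate

# Slice: freeze y at its nominal value 0 and report the feasible range of x.
slice_x = [x for x in grid if feasible(x, 0.0)]

# Projection: range of x over every feasible (x, y) pair in the prior cube.
proj_x = [x for x in grid for y in grid if feasible(x, y)]

slice_range = (min(slice_x), max(slice_x))
proj_range = (min(proj_x), max(proj_x))
print("slice at nominal y:", slice_range)
print("whole feasible set:", proj_range)
```

The slice suggests a much narrower range for x than the feasible set actually allows, because the feasible set is a tilted band rather than a coordinate-aligned box.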
Lessons Learned
Chemical kinetics modeling is a form of
– high dimensional (mechanisms are complex),
– distributed (efforts of many, working separately)
system identification.
Here, we take deterministic, worst-case view
– Understand the impact (on all posed questions:
prediction, consistency, relevance) of the
currently unfalsified parameter set.
The effort of researchers yields complex, intertwined, factual assertions
about the unfalsified values of the model parameters
– Handbook style of {parameter, nominal, range, reference} will not work
– Unrepeatable/undocumented data analysis can be as confounding as
unrepeatable experiments (destructive too…)
– Each individual assertion is usually not illuminating in the problem’s
natural coordinates. Concise individual conclusions are actually rare.
– Information-rich, “anonymous” collaboration is necessary
– Computers must do the heavy lifting
• Managing lists of assertions, reasoning and inference
– Useful role of journal paper: document methodology leading to assertion
Alternate model: Separate asserted facts from analysis
Two types of assertions: models and observed behavior
“Data”
– Assertion of models of physical processes (e.g., “if we knew the
parameter values, this parametrized mathematics would accurately
model the process”)
– Assertion of measured outcomes of physical processes (e.g., “I
performed experiment, and the process behaved as follows…”)
Together, these form constraints in "world"-parameter space of
physical constants.
Data Collaboration
Analysis (global optimization) on the constraints
– Check consistency of a collection of assertions
– Explore the information implied by the assertions
–…
– (old standby) Generate consistent parameter samples.
Collaborative Science
Collaborative Science is the open availability of these 3 components
Is this really necessary?
We think so, at least in chemical kinetics modeling.
Other fields have related views…
“…how well data are turned into knowledge depends on how they are
gathered, organized, managed, and exhibited—and those tasks are
increasingly arduous as the data increase. … databases can be far
more than repositories—they can serve as tools for creating new
knowledge”
2000 NRC Workshop on Bioinformatics: Converting Data to Knowledge
GRI-Mech: Successful Data Collaboration
Result:
High quality, predictive Methane reaction model: 50+ Species/300+ Reactions
Based on:
77 peer-reviewed, published Experiments/Measured Outcomes of ~25 groups
Infrastructure to use these did not exist
– Grassroots effort of 4 groups
– Decide on a common, “encompassing” list of species/reactions
– Extract the information in each paper, not simply assimilate conclusions
– Reverse-engineer assertions in light of the common reaction model
eliminating the incompatible convenience assumptions
The rest was relatively “easy”
– Optimization to get “best” fit single parameter vector
– Validation (on ~120 other published results)
Features (www.me.berkeley.edu/gri_mech)
– Only use "raw" scientific assertions - not the potentially erroneous conclusions
– “give me your information, not your conclusions…”
– Treats the models/experiments as information, and combines them all.
Moving forward
– With the assertions now in place, much more can be inferred…
GRI DataSet (assertion set)
The GRI-Mech (www.me.berkeley.edu/gri_mech) DataSet is a collection of 77 experimental reports, consisting
of models and “raw” measurement data, compiled/arranged towards obtaining a complete mechanism for
CH4 + 2O2 → 2H2O + CO2 capable of accurately predicting pollutant formation. The DataSet consists of:
• Reaction model: 53 chemical species, 325 reactions, depending on…
• Unknown parameters (ρ): 102 active parameters, essentially the various rate constants.
• Prior Information: -1 ≤ ρk ≤ 1; each normalized parameter is presumed known to lie between -1 and 1.
• Processes (Pj): 77 widely trusted, high-quality laboratory experiments, all involving methane combustion, but under different
physical manifestations, and different conditions.
• Process Models (Mj): 77 0-d, 1-d and 2-d numerical PDE models, coupled with the common reaction model.
• Measured Data (dj,uj) data and measurement uncertainty from 77 peer-reviewed papers reporting above experiments.
d1  u1
Chemistry()
Transport 1
M1()
Process P1
300+ Reactions,
50+ Species
CH4 + 2O2
↓
2H2O + CO2
Process
P2
d2  u2
Chemistry()
Transport 2
Chemistry()
Transport 77
M2()
100+ unknown parameters
Process
P77
d77  u77
The prior information, models and
measured data constitute assertions
about possible parameter values.
M77()
•kth assertion associated with prior info:
•Assertions associated with jth dataset
unit:
Research portfolio expressed as deterministic constraints
Suitable for analysis (generally optimization over these)
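A minimal sketch of how such a portfolio might be held as deterministic constraints, assuming each dataset unit asserts |Mj(ρ) - dj| ≤ uj and the prior information is the cube -1 ≤ ρk ≤ 1. The class and function names, and the two toy models, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DatasetUnit:
    model: Callable[[Sequence[float]], float]  # M_j(rho)
    d: float  # measured value d_j
    u: float  # measurement uncertainty u_j


def in_prior_cube(rho):
    """Prior information: every normalized parameter lies in [-1, 1]."""
    return all(-1.0 <= r <= 1.0 for r in rho)


def is_feasible(rho, units):
    """True if rho satisfies the prior info and every dataset-unit assertion."""
    return in_prior_cube(rho) and all(
        abs(unit.model(rho) - unit.d) <= unit.u for unit in units
    )


# Two toy dataset units (assumed models and data, for illustration only).
units = [
    DatasetUnit(lambda r: r[0] + r[1], d=0.5, u=0.2),
    DatasetUnit(lambda r: r[0] * r[1], d=0.0, u=0.1),
]
print(is_feasible([0.5, 0.0], units), is_feasible([1.0, 1.0], units))
```

The feasible set is then simply the set of ρ for which `is_feasible` holds; consistency and prediction questions become optimization problems over it.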
So, why not manual management of uncertainty?
Informal, manual (journal paper/email) mode would require an efficient
uncertainty description (linear in number of model parameters, say).
– But it is easy to do this wrong…
– How about consistent, but simple?
For this, use “CRC-Handbook” type description:
– parameter values
– plus/minus uncertainty
Equivalent to requiring a coordinate-aligned cube to contain the feasible set.
Very ineffective in extracting the predictive capability of GRI data: i.e.,
using assertions to predict the outcome (a range) of another model
Questions to ask the “dataset”
Given
– Prior info,
– Models, measurements & uncertainty
the feasible set is implied.
Consistency: Quantify a measure of consistency of the dataset, and its sensitivity to u
Prediction: Bound the range of another model, M0, on the feasible set
Explore feasible set: Inner/Outer Bounds
Example: Prediction: Bound the range of another model, M0, on the feasible set
– Outer bound: upper bound on the maximum by showing infeasibility (emptiness) of the set of feasible parameters at which M0 exceeds a candidate bound
– Inner bound: lower bound on the maximum by evaluating M0 at a feasible point
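The inner/outer idea can be sketched on a toy problem with linear models, where exact ranges over boxes are easy to compute. Everything here is assumed for illustration: the models M1 and M0, the data d1 and u1, and the use of interval bounds on sub-cubes in place of the emptiness certificates used in the actual work.

```python
import itertools
import random

M1 = (1.0, 1.0)     # assumed toy model M1(rho) = rho1 + rho2
D1, U1 = 0.0, 0.5   # assumed measurement d1 and uncertainty u1
M0 = (1.0, 2.0)     # model to predict: M0(rho) = rho1 + 2*rho2


def value(coeffs, rho):
    return sum(c * r for c, r in zip(coeffs, rho))


def interval(coeffs, box):
    """Exact range of a linear function over a box [(lo, hi), ...]."""
    lo = sum(c * (a if c >= 0 else b) for c, (a, b) in zip(coeffs, box))
    hi = sum(c * (b if c >= 0 else a) for c, (a, b) in zip(coeffs, box))
    return lo, hi


def outer_bound_on_max(n_split):
    """Upper bound on max M0: drop sub-cubes where |M1 - d1| <= u1 cannot hold."""
    edges = [-1.0 + 2.0 * k / n_split for k in range(n_split + 1)]
    cells = list(zip(edges[:-1], edges[1:]))
    best = float("-inf")
    for box in itertools.product(cells, repeat=2):
        lo, hi = interval(M1, box)
        if lo > D1 + U1 or hi < D1 - U1:
            continue  # the assertion is violated everywhere in this sub-cube
        best = max(best, interval(M0, box)[1])
    return best


def inner_bound_on_max(n_samples, seed=0):
    """Lower bound on max M0: best value found at sampled feasible points."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_samples):
        rho = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if abs(value(M1, rho) - D1) <= U1:
            best = max(best, value(M0, rho))
    return best


inner = inner_bound_on_max(2000)
outer = outer_bound_on_max(16)
print(inner, "<= max of M0 on the feasible set <=", outer)
```

For this toy problem the true maximum is 1.5 (at ρ = (-0.5, 1)), so the two numbers bracket it; refining the subdivision tightens the outer bound and more samples tighten the inner one.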
Prediction Sensitivity
Given the GRI dataset and an additional model, M0, consider the prediction problem: the minimum and maximum of M0 over the feasible set.
Treat the outer bounds as functions of the experimental uncertainty level, u.
Look at differential sensitivity of the prediction width to this level.
Compute these for many random
“How important is my experimental contribution when considered as part of a larger collection?”
Some values of j (i.e., a model, measured data, measurement uncertainty) are particularly uninformative in this manner.
Others are relevant in a modest number of cases.
A few seem to contribute almost always.
In isolation, none of these individual 4 constraints stands out as special.
[Figure: histogram for Experiment # 40; vertical axis: number of occurrences (0 to 10000); horizontal axis from 0 to 0.12; annotation: 69% removed.]
High price of low cost uncertainty description
Computational exercise: assess capability of
assertions in predicting the outcome of an additional
model, M0.
Method H: Use only the prior information (ρ ∈ H) on parameters; gives the prediction interval for M0 over H.
Method F: The prediction directly uses the raw model/data pairs from all assertions, as well as the prior information.
Method Q: Community “pools” prior information and assertions, yielding the consistent coordinate-aligned cube. The prediction interval for M0 uses this cube.
[Diagram: processes P1, …, P77 with models M1, …, M77 and data d1 ± u1, …, d77 ± u77 for CH4 + 2O2 → 2H2O + CO2; additional process P0 with model M0 and unknown outcome d0 ± u0.]
Loss using consistent, coordinate-aligned cube
How much information is lost when resorting to method Q instead of F?
Define the “loss in using method Q”, LQ.
No loss (LQ=0) if prediction by Q is as tight as that achieved by F.
Complete loss (LQ=1) occurs if prediction by Q is no better than method H (only using prior info). In such case, the experimental results are effectively wasted.
In 70% of cases, the loss exceeds 0.7.
Method Q pays a significant price for its crude representation of the constraints.
[Figure: Frequency of Loss; vertical axis: fraction of cases with loss ≥ x (0 to 1); horizontal axis: loss LQ (0 to 1).]
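The slide's formula for the loss did not survive extraction. One definition consistent with the two stated endpoints (LQ = 0 when Q is as tight as F, LQ = 1 when Q is no better than H) is the normalized excess width below. It is an assumption for illustration, not necessarily the authors' definition, and the function names are hypothetical.

```python
def width(interval):
    lo, hi = interval
    return hi - lo


def loss_q(pred_h, pred_q, pred_f):
    """Assumed loss: (width_Q - width_F) / (width_H - width_F)."""
    w_h, w_q, w_f = width(pred_h), width(pred_q), width(pred_f)
    if w_h <= w_f:
        raise ValueError("method F must give a strictly tighter interval than H")
    return (w_q - w_f) / (w_h - w_f)


# Made-up prediction intervals for M0 under methods H, Q and F.
print(loss_q((-1.0, 1.0), (-0.75, 0.75), (-0.5, 0.5)))
```

With this definition, a loss of 0.7 means that the coordinate-aligned cube recovers only 30% of the interval tightening that the full set of assertions would provide.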
Consistency results for GRI-DataSet assertions
GRI-DataSet is consistent,
Nevertheless, the consistency measure is
very sensitive (using multipliers from the
dual form) to 2 particular experimental
assertions, but not to the prior info.
[Figure: sensitivity of the consistency measure, two panels: by Parameter # (0 to 100) and by Experiment # (0 to 80); vertical axes from 0 to 0.5.]
The scientists involved rechecked calculations, and
concluded that reporting errors had been made.
Both reports were updated -- one
measurement value increased, one decreased
-- exactly what the consistency analysis had
suggested (without us informing them of that).
Sensitivity of the consistency measure to
individual assertions is greatly reduced, and
spread more evenly across data set.
[Figure: sensitivity of the consistency measure after the updates, two panels: by Parameter # (0 to 100) and by Experiment # (0 to 80); vertical axes from 0 to 0.5.]
PrIMe: Process Informatics Modeling (www.primekinetics.org)
Combustion impacts everything
– Economies
– Politics
– Environment
Predictive capability leads to informed decisions and policymaking
PrIMe: A community activity aimed at the development of predictive
reaction models for combustion
Challenge
– to meet immediate needs for predictive reaction models in combustion
engineering, the petrochemical industry, and pharmaceuticals
– build reaction models in a consistent and systematic way incorporating all
data and including all members of the scientific community
Details
– partial support from NSF CyberInfrastructure program
– UC Berkeley CITRIS-hosted project (www.citris.berkeley.edu)
– Kickoff April, 2006 in Berkeley
Sometime in 2008…
Chemist to PrIMe: I have an idea of how to measure the elusive reaction between C14H7 and C3H3 forming C16H8 and CH2. What impact would such a measurement have on the three competing hypotheses concerning the nucleation of interstellar dust?
PrIMe to Chemist: If the rate coefficient is established to within 3% accuracy, I will be able to discriminate between hypotheses A and B.
Chemist to PrIMe: I do not think my experiment can attain better than 10% accuracy. What is the next best thing I can do experimentally to advance knowledge of this subject?
PrIMe to Chemist: Measure the reaction between C10H7 and C3H2; I can then discriminate between hypotheses B and C.
Sometime in 2010…
Engineer to PrIMe: What fueling rate produces peak output power while holding NOx yields within the EPA prescribed limits in a HCCI engine running GTL prescribed fuel #22 with design and operating parameters: xx, yy, ...?
PrIMe to Engineer: …
Sometime in 2020…
Policymaker to PrIMe: How much longer will there be an Antarctic ozone hole?
PrIMe to Policymaker: …
PrIMe
[Diagram: Contributors (Theoretician, Experimentalist, Physical Modeler, Scientific Computing, Numerical Analyst) and the PrIMe components: elements, species, reactions, experiments, numerical models, tools.]
PrIMe
[Diagram: PrIMe components (elements, species, reactions, experiments, numerical models, tools); labels: “Relevant to conditions”, “Associated with experiments”, “Parameter ID” tools, “Update referable analysis archive”.]
User: “Need an ODE chemistry model suitable for natural gas at … conditions”
PrIMe
[Diagram: PrIMe components (elements, species, reactions, experiments, numerical models, tools); labels: “Specified by user”, “Associated with experiments”, “Parameter ID” tools, “Consistency Analysis”, “CA alg RFTA.03B”, “Update referable analysis archive”.]
Some specification of the consistency analysis (type, and exact codes) for repeatability
User: “Check joint consistency of these experiments/models”
PrIMe
[Diagram: PrIMe components (elements, species, reactions, experiments, numerical models, tools); labels: “Specified by user”, “Associated with experiments”, “NOAA model”, “prediction” tools, “Update referable analysis archive”.]
User: “Predict range of ozone concentration at 40 km using NOAA model, using experiments …”
Alliance for Cellular Signaling (AfCS)
Similar origin to GRI Mech – a few people, frustrated by the uncoordinated tunnel
vision (deliberately leaving out interactions for simplicity's sake) of the signaling
community
– spearheaded by Gilman (UT Southwestern Medical Center)
– saw the need for a large-scale examination/treatment of the problem
10 laboratories investigating basic questions in cell signaling
– How complex is signal processing in cells?
– What is the structure and dynamics of the network?
– Can functional modules be defined?
Distributed system ID
Focus is on high quality, community data
Key Advantage of AfCS:
Models, analysis tools not emphasized
– High quality data from single cell type
– All findings/data available (www.signaling-gateway.org)
from Henry Bourne, UCSF: “The collaboration itself is the biggest experiment of all. After all, the scientific culture of biology is traditionally very individualistic and it will be interesting to see if scientists can work as a large and complex exploratory expedition.”
(http://www.nature.com/nature/journal/v420/n6916/full/420600a.html)
Vision paper in Nature talks about socialistic aspects of science
(http://www.nature.com/nature/journal/v420/n6916/full/nature01304.html)
Calcium Signaling Application
Together with AfCS scientists, we extracted key, relevant features of calcium
response to create 18 experimental assertions
– Rise time, peak value, fall time
– 6 different stimuli levels
Published models constitute various model assertions
– Goldbeter, Proc. Natl. Acad Sci. 1990
– Wiesner, American J. Physiology. 1996
– Lemon, J. Theor. Biology, 2003
Models are ODEs, each derived from a hypothesized network
Calcium Signaling Application
Results (18 assertions, plus prior info)
– Goldbeter, 6 states, 20 parameters
• Invalidated in 30 minutes
– Wiesner, 8 states, 27 parameters
• 10 node “machine”
• Invalidated in 2 days
– Lemon, 8 states, 34 parameters
• Feasible points found in ~8 hours
• New data led to invalidation
Smaller problems, but relatively much harder than the chemistry analysis
Conclusion: likely that more proteins and accompanying interactions are
necessary to mathematically describe the signaling pathway.
These tools (e.g., model invalidation, model-directed experimentation) were
not part of the original AfCS mission, but the alliance is acquiring an
“appreciation” of modeling and verification.
How are we computing? Invalidation Certificates
Consider invalidating the constraints (prior info, and N dataset units)
The invalidation certificate is a binary tree, with L leaves. At the i’th leaf
– coordinate-aligned cube
– Polynomial/rational functions (“surrogate models”) & error bounds,
which satisfy
– sum of squares certificate proving the emptiness of
Moreover
Caveat: with each Mj relatively complex,
these error bounds are generally heuristic,
implicitly assuming regularity in Mj
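The formulas on this slide were lost in extraction. The block below is one reading of the leaf data, written out in symbols. The notation is assumed rather than taken from the slide: H is the prior cube, C_i the cube at leaf i, S_ij the surrogate for model M_j on that cube, e_ij its error bound, and each dataset unit is taken to assert |M_j(ρ) - d_j| ≤ u_j.

```latex
\begin{align*}
&\text{Constraints to invalidate:}
  && \rho \in H, \quad |M_j(\rho) - d_j| \le u_j, \quad j = 1,\dots,N \\
&\text{Surrogates and error bounds at leaf } i\text{:}
  && |M_j(\rho) - S_{ij}(\rho)| \le e_{ij} \quad \text{for all } \rho \in C_i \\
&\text{Sum of squares certificate proves:}
  && \{\rho \in C_i : |S_{ij}(\rho) - d_j| \le u_j + e_{ij},\ j = 1,\dots,N\} = \emptyset \\
&\text{Moreover:}
  && \bigcup_{i=1}^{L} C_i \supseteq H
\end{align*}
```

If the error bounds hold, any ρ in C_i that satisfied the original constraints would also satisfy the relaxed surrogate constraints, so emptiness at every leaf, together with the covering property, rules out every ρ in H.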
How are we computing? Invalidation Certificates
Why do emptiness proofs on the algebraic models?
Easier. The original problem was
In its simplest form, think of Mj(ρ) as the response at a
fixed time, of an ODE model (with parameters ρ) from a
fixed initial condition
Vinter, Prajna, Papachristodoulou,
Doyle, Allgöwer,…
Could derive invalidation certificates directly for the ODEs, in principle
– ODE reachability analysis using barrier (Lyapunov) functions
• eg., ODE solution cannot get within
of
for any value of
– Use sum of squares certificates to bound reachability
– (Necessary and) sufficient conditions using semidefinite programming
– For the methane/transport models, the SDPs would be almost unimaginably
large
– Perhaps a fresh look could reveal a new approach…
Error bounds: pragmatic issues
Recall, at the i’th leaf
– coordinate-aligned cube
– collection of surrogate models and error bounds,
which satisfy
Error bounds are estimated statistically. They are more likely “reliable” if M
is well-behaved. So, through:
– experience, and
– domain-specific knowledge
the scientist is responsible to design/select experiments/features that
– are measurable in the lab
– have a feature model that is well-behaved over the parameter space, and
– show sensitivity to some coordinates of the parameter space
Random experimental investigations could lead nowhere, and even break
the analysis… therefore…
Prudent experiment selection is critical to success
Summary: How are we computing?
Transforming real models to polynomial/rational models
– Large-scale computer “experimentation” on
• Random sampling and sensitivity calculations to determine active
parameters
• Latin Hypercube experiment design on active parameter cube
– Polynomial or rational fit
– Assess residuals, account for fit error
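The sampling, fitting and residual steps listed above can be sketched as follows. This is a minimal stand-in: `expensive_model` is a made-up function in place of a real simulation, the surrogate is a plain quadratic in two active parameters, and the fit error is estimated as the largest residual on a second Latin hypercube sample, which is a statistical estimate rather than a guaranteed bound.

```python
import math
import random


def expensive_model(rho):
    """Stand-in for a costly simulation M(rho); purely hypothetical."""
    x, y = rho
    return math.exp(0.3 * x) * (1.0 + 0.2 * y)


def latin_hypercube(n, dim, rng):
    """n points in [-1, 1]^dim, one point per stratum in every coordinate."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([-1.0 + 2.0 * (s + rng.random()) / n for s in strata])
    return list(zip(*columns))


def basis(rho):
    x, y = rho
    return [1.0, x, y, x * x, x * y, y * y]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(points, values):
    """Least-squares coefficients via the normal equations (B^T B) c = B^T v."""
    rows = [basis(p) for p in points]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atv = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve(ata, atv)


def surrogate(coeffs, rho):
    return sum(c * b for c, b in zip(coeffs, basis(rho)))


rng = random.Random(0)
train = latin_hypercube(60, 2, rng)
coeffs = fit_quadratic(train, [expensive_model(p) for p in train])

check = latin_hypercube(200, 2, rng)
fit_error = max(abs(expensive_model(p) - surrogate(coeffs, p)) for p in check)
print("estimated fit error:", fit_error)
```

The estimated fit error is what gets added to the measurement uncertainty when the assertion is rewritten in terms of the surrogate.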
Assertions become polynomial/rational inequality constraints
Most analysis is optimization subject to these constraints
– S-procedure, sum-of-squares (emptiness proofs, outer bounds)
• Outer bounds are also interpreted as solutions to the original problem
when cost is an expected value, constraints are only satisfied on average,
and the decision variable is a random variable.
– Constrained nonlinear optimization for inner bounds
• Stochastic interpretation of outer bounds can aid search
– Branch & Bound to eliminate ambiguity due to fit errors
– Overall, straightforward and brute force, parallelizes rather easily
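The Branch & Bound structure of an invalidation certificate can be sketched on a toy problem. Several things are assumed here: the two surrogate models (a sum and a product of two parameters), the data and error bounds, and above all the use of exact interval ranges on each cube in place of the sum-of-squares emptiness proofs used in the actual work.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[Tuple[float, float], Tuple[float, float]]


def range_sum(box):
    (a, b), (c, d) = box
    return a + c, b + d


def range_prod(box):
    (a, b), (c, d) = box
    corners = [a * c, a * d, b * c, b * d]  # a bilinear form peaks at corners
    return min(corners), max(corners)


# (range of surrogate over a box, d_j, u_j, fit error e_j); all toy values.
CONSTRAINTS = [(range_sum, 1.5, 0.2, 0.02), (range_prod, 0.0, 0.1, 0.02)]


@dataclass
class Node:
    box: Box
    violated: Optional[int] = None  # at a leaf: index of the violated assertion
    children: Tuple["Node", ...] = ()


def violated_constraint(box):
    """Index of an assertion that cannot hold anywhere in the box, else None."""
    for idx, (surrogate_range, d, u, e) in enumerate(CONSTRAINTS):
        lo, hi = surrogate_range(box)
        if lo > d + u + e or hi < d - u - e:
            return idx
    return None


def invalidate(box, depth=0, max_depth=12):
    """Return a certificate tree, or None if some cube could not be excluded."""
    idx = violated_constraint(box)
    if idx is not None:
        return Node(box, violated=idx)
    if depth == max_depth:
        return None
    axis = max(range(2), key=lambda i: box[i][1] - box[i][0])
    lo, hi = box[axis]
    mid = 0.5 * (lo + hi)
    kids = []
    for part in ((lo, mid), (mid, hi)):
        child = list(box)
        child[axis] = part
        node = invalidate(tuple(child), depth + 1, max_depth)
        if node is None:
            return None
        kids.append(node)
    return Node(box, children=tuple(kids))


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


tree = invalidate(((-1.0, 1.0), (-1.0, 1.0)))
print("invalidated:", tree is not None)
if tree is not None:
    print("leaves in certificate:", count_leaves(tree))
```

Each leaf records a cube and the assertion shown to fail on it, so a reader can check the certificate leaf by leaf, and the leaves can be processed independently, which is where the easy parallelism comes from.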
Papers
Dissemination
Ryan Feeley, Michael Frenklach, Matt Onsum, Trent Russi, Adam Arkin and Andy Packard,
“Model Discrimination using Data Collaboration,” to appear, J. Physical Chemistry A, 2006.
Greg Smith, Michael Frenklach, Ryan Feeley, Andy Packard and Pete Seiler, “A System
Analysis Approach to Atmospheric Observations and Models: the Mesospheric HOx
Dilemma,” to appear J. Geophys. Res. (Atmospheres), 2006.
Pete Seiler, Michael Frenklach, Andy Packard and Ryan Feeley, “Numerical approaches for
collaborative data processing,” to appear Optimization and Engineering, Kluwer, 2006.
Ryan Feeley, Pete Seiler, Andy Packard and Michael Frenklach, “Consistency of a reaction
data set,” J. Physical Chemistry A, vol. 108, pp. 9573-9583, 2004.
Michael Frenklach, Andy Packard, Pete Seiler and Ryan Feeley, “Collaborative data
processing in developing predictive models of complex reaction systems,” International
Journal of Chemical Kinetics, vol. 36, issue 1, pp. 57-66, 2004.
Michael Frenklach, Andy Packard and Pete Seiler, “Prediction uncertainty from models and
data,” 2002 American Control Conference, pp. 4135-4140, Anchorage, May 8-10, 2002.
Project website
Slides, drafts, notes, proposal and related links, etc can be found at
http://jagger.me.berkeley.edu/~pack/nsfuncertainty
Collaborators
Pete Seiler (UCB, Honeywell)
Adam Arkin and Matt Onsum (UCB)
Greg Smith (SRI)
GRI-Mech Team: Michael Frenklach, Hai Wang, Michael Goldenberg, Nigel Moriarty, Boris
Eiteneer, Bill Gardiner, Huixing Yang, Zhiwei Qin, Tom Bowman, Ron Hanson, David Davidson,
David Golden, Greg Smith, Dave Crossley
PrIMe Team:
UCB: Michael Frenklach, Andy Packard, Zoran Djurisic, Ryan Feeley, Trent Russi
Stanford: David Golden, Tom Bowman, …
Utah: Phil Smith, Julio Facelli, Ron Price,…
MIT: Bill Green, Greg McRae, …
EU: Mike Pilling, …
NIST: Tom Allison, Greg Rosasco, …
ANL: Branko Ruscic, …
CMCS
…
Support from CITRIS as well as NSF grants:
CTS-0113985 (ITR, 2001-2006)
CHE-0535542 (CyberInfrastructure, 2005-2010)
Collaborative Science: Conclusions
GRI Mech, AfCS and PrIMe are domain specific examples
Requires
–Data sharing/contributions
–Model sharing/contributions
–Math tools sharing/contributions
Benefits
Is a rich, large-scale, practical
problem that improves the
consistency in which scientific
results are used to make decisions
and set policy.
–Scalable distributed version of the scientific method
–Roadmap to reliable prediction
–Information transfer between disciplines and scales
Present challenges
–Community involvement and participation
–Privacy versus Open/Community
Analyzing proprietary data
–Convenient infrastructure
–Math analysis methods
