PNC, “Collaboration: Tools and Infrastructure” December 7, 2012 PrIMe: Integrated Infrastructures for Data and Analysis Michael Frenklach Supported by AFOSR, Fung.

PNC, “Collaboration: Tools and Infrastructure”
December 7, 2012
PrIMe:
Integrated Infrastructures for
Data and Analysis
Michael Frenklach
Supported by AFOSR, Fung
• IMPACT ON SOCIETY
– Energy (power plants, car and jet engines, rockets, …)
– Defense (engines, rockets, …)
– Environment (pollutants, global modeling, …)
– Space exploration
– Astrophysics
– Material synthesis
• ESTABLISHED PRACTICE OF COLLABORATION
– Across different disciplines
– Across different countries
• THERE IS AN ACCUMULATING EXPERIMENTAL PORTFOLIO
• THEORY/MODELING LINKS FUNDAMENTAL TO APPLIED LEVEL
experiments
theory
individual reactions
model reduction
model
analysis
sensitivity
reaction path
…
mechanism of:
numerical simulations
ignition
laminar flames
NOx
soot
...
Methane Combustion:
CH4 + 2 O2 → CO2 + 2 H2O
1970’s: 15 reactions, 12 species
1980’s: 75 reactions, 25 species
1990’s: 300+ reactions, 50+ species
Larger molecular-size fuels:
2000’s: 1,000+ reactions, 100+ species
2010’s: 10,000+ reactions, 1000+ species
Methane Combustion:
CH4 + 2 O2 → CO2 + 2 H2O
and yet
• The networks are complex, but the governing equations (rate laws) are known
• Uncertainty exists, but much is known about where the uncertainty lies (rate parameters)
• Numerical simulations with parameters fixed to certain values may be performed “reliably”
• There is an accumulating experimental portfolio on the system
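To illustrate the point that the rate law is known while the rate parameters carry the uncertainty, here is a minimal sketch using the modified Arrhenius form k = A T^n exp(-E/RT). The parameter values and the uncertainty factor are placeholders chosen for illustration; they are not data from the talk or from PrIMe.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

def arrhenius(A, n, E, T):
    """Modified Arrhenius rate coefficient k = A * T**n * exp(-E / (R*T))."""
    return A * T**n * math.exp(-E / (R * T))

def rate_bounds(A, n, E, T, factor):
    """Bounds on k when A is known only to within a multiplicative factor."""
    k = arrhenius(A, n, E, T)
    return k / factor, k * factor

# Hypothetical parameters, for illustration only
A, n, E = 1.0e13, 0.0, 1.0e5
k_nominal = arrhenius(A, n, E, T=1500.0)
k_lo, k_hi = rate_bounds(A, n, E, T=1500.0, factor=2.0)
print(f"k = {k_nominal:.3e}, bounds [{k_lo:.3e}, {k_hi:.3e}]")
```

A simulation run with A fixed at its nominal value is repeatable, but its prediction inherits the full width of the interval on k.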
Methane Combustion:
but still
Lack of predictability
Lack of consensus
CH4 + 2 O2 → CO2 + 2 H2O
• current inability of truly predictive modeling
– conflicting data in/among sources
– poor documentation of data/models
– no uncertainty reporting or analysis
– not much focus on integration of data
• resistance to data sharing
– no personal incentives
– no easy-to-use technology
• no recognition of the problem
• models are not additive
• data are not additive
• need a system for synthesis of data
PrIMe
http://primekinetics.org
Process Informatics Model
• Data sharing
• App sharing
• Automation
• registered members: ~400
• countries: ~15
• data records: ~100,000
• apps: ~20
• active “players”
– UCB (lead), NCSU, Stanford, MIT, Cambridge, KAUST, Tsinghua
PrIMe
Warehouse
Portal
Workflow
Access to distributed resources
“Browser-based” software
User authorization
User building projects
Social networking
Data/app linking
User forums
Binary XML interfaces
Data evaluation panels
Remote-server support
Help, tutorials, examples
Project sharing
Customized Drupal (PHP)
C#, Windows, IE
WebDAV
apps: C#, Matlab
XML
platform independent
Data collections
Models and Experiments
Controlled by schemas
Submission forms
Multiple-mode access
DATA ORGANIZATION:
• conceptual abstraction
• practical realization
Chemical Kinetics Model
– composed of Chemical Reactions
Chemical Reactions
– have rate law data (parameter values, uncertainties, reference)
– composed of Chemical Species
Chemical Species
– have thermo data, transport data
– composed of Chemical Elements
Chemical Elements
– have atomic masses
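The model, reaction, species, and element hierarchy above can be sketched as plain data classes. This is only an illustration of the conceptual abstraction: the class and field names are hypothetical and do not reproduce the PrIMe XML schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Element:
    symbol: str
    atomic_mass: float  # g/mol

@dataclass
class Species:
    name: str
    composition: Dict[str, int]  # element symbol -> atom count

    def molar_mass(self, elements: Dict[str, Element]) -> float:
        return sum(elements[s].atomic_mass * n for s, n in self.composition.items())

@dataclass
class RateLaw:
    parameters: Dict[str, float]
    uncertainties: Dict[str, float]
    reference: str

@dataclass
class Reaction:
    reactants: Dict[str, int]  # species name -> stoichiometric coefficient
    products: Dict[str, int]
    rate_law: RateLaw

@dataclass
class KineticsModel:
    reactions: List[Reaction] = field(default_factory=list)

def element_count(side: Dict[str, int], species: Dict[str, Species]) -> Dict[str, int]:
    """Total atoms of each element on one side of a reaction."""
    total: Dict[str, int] = {}
    for name, coeff in side.items():
        for sym, n in species[name].composition.items():
            total[sym] = total.get(sym, 0) + coeff * n
    return total

elements = {s: Element(s, m) for s, m in [("C", 12.011), ("H", 1.008), ("O", 15.999)]}
species = {
    "CH4": Species("CH4", {"C": 1, "H": 4}),
    "O2": Species("O2", {"O": 2}),
    "CO2": Species("CO2", {"C": 1, "O": 2}),
    "H2O": Species("H2O", {"H": 2, "O": 1}),
}
# Placeholder rate law: the values and reference are illustrative only
law = RateLaw({"A": 1.0, "n": 0.0, "E": 0.0}, {"A": 2.0}, "placeholder reference")
overall = Reaction({"CH4": 1, "O2": 2}, {"CO2": 1, "H2O": 2}, law)
model = KineticsModel([overall])
balanced = element_count(overall.reactants, species) == element_count(overall.products, species)
```

Because each level only refers to the level below it, a check such as element balance can be computed from the stored records rather than asserted by hand.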
combustion modeling
quantum chemistry
-reactions
thermo
molecular
structure
thermosciences
spectra
absorption
coefficient
diagnostics
Experimental Record
• reference
• apparatus
• conditions
• observations
– inner: XML
– remote: HDF5, …
• uncertainties
• additional items
– links, docs, …
– video files, …
archival record
Data Attribute (QOI, ‘target’)
a specific feature extracted
for modeling:
– peak value
– peak location
– induction time
– ratio of peaks
(from multiple experiments) …
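As a rough illustration of how such attributes might be extracted from an archived observation, the sketch below pulls a peak value, a peak location, an induction time, and a ratio of peaks from synthetic time traces. The traces are made up, and the induction time is taken as the time of steepest rise, which is one common convention among several and an assumption here.

```python
import math

def peak(times, values):
    """Return (location, value) of the maximum of a sampled trace."""
    i = max(range(len(values)), key=values.__getitem__)
    return times[i], values[i]

def induction_time(times, values):
    """Midpoint of the sampling interval with the steepest rise (assumed convention)."""
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]
    i = max(range(len(slopes)), key=slopes.__getitem__)
    return 0.5 * (times[i] + times[i + 1])

# Synthetic traces: Gaussian pulses centred at t = 1.0 (arbitrary units)
times = [i * 0.01 for i in range(301)]
trace_a = [math.exp(-((t - 1.0) / 0.2) ** 2) for t in times]
trace_b = [0.5 * v for v in trace_a]  # a second "experiment"

t_peak, v_peak = peak(times, trace_a)
tau = induction_time(times, trace_a)
peak_ratio = v_peak / peak(times, trace_b)[1]
print(t_peak, v_peak, tau, peak_ratio)
```

Each extracted number is a small, model-comparable quantity, while the full trace stays in the archival record.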
VVUQ data
Initial Model:
“Upload your data to PrIMe Warehouse” (“give me your data”)
New, Distributed Model:
“You may, if you choose, connect your data to the communal system”
• with a switch in the OFF position: “you can use the
communal data and tools but your own data is private to
you only”
• “but please flip the switch to the ON position when you are
ready to share your own data”
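The ON/OFF switch described above amounts to a per-dataset visibility flag. A minimal sketch, with hypothetical class and function names that are not part of PrIMe:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ConnectedDataset:
    owner: str
    name: str
    shared: bool = False  # the "switch"; OFF until the owner flips it

def visible_to(user: str, datasets: List[ConnectedDataset]) -> List[str]:
    """A user sees communal (shared) data plus their own private data."""
    return [d.name for d in datasets if d.shared or d.owner == user]

datasets = [
    ConnectedDataset("community", "communal flame data", shared=True),
    ConnectedDataset("alice", "alice shock-tube runs"),
]

before = visible_to("bob", datasets)   # alice's switch is OFF
own = visible_to("alice", datasets)    # alice still sees her own data
datasets[1].shared = True              # alice flips the switch to ON
after = visible_to("bob", datasets)
```

With the switch OFF the owner can already run communal tools against private data; flipping it changes only who else can see the record.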
“Connect your code to the communal system”
- you control your own code:
• release version
• user access, licenses
• collect fees, if desired
Remote server app—PrIMe Web Services (PWS)
• no restrictions on platform
• no restrictions on data formats
• no restrictions on local programming language(s)
PrIMe Workflow Interface (PWI) is the only “standard”
• developed, maintained, and controlled by the community
PrIMe Dispatcher
PrIMe Data Flow Network
client machine
PrIMe
web services
client data
excessively large data sets
• do not move the data
• but use “smart agents” (e.g., HTML5 walkers)
web services with
user-reloaded tasks:
fetch data features for
user-requested analysis
user
specifies
conditions
of interest
workflow project
workflow component performs:
• retrieves the pertinent kinetics
model (via link in the dataset)
• performs simulations on the fly for
the conditions specified and builds
a new surrogate model
• performs UQ analysis combining
the new surrogate model with the
archived ones and the rest of the
pertinent data
• reports results
workflow component retrieves archived data: a set of relevant targets
target values and their uncertainty ranges
surrogate models developed for relevant targets
active variables and their uncertainty ranges
data warehouse
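The workflow steps above (simulate on the fly, build a surrogate, combine it with archived surrogates and target bounds) can be sketched in a few lines. This is a simplified, one-variable illustration under stated assumptions: the `simulate` function is a stand-in for a real kinetics simulation, the surrogate is a quadratic through three runs, and the "UQ analysis" is reduced to scanning for values of the active variable that keep every surrogate within its target's uncertainty range. It is not presented as the method PrIMe implements.

```python
def fit_quadratic(simulate):
    """Quadratic surrogate in a normalized variable x in [-1, 1], from three runs."""
    ym, y0, yp = simulate(-1.0), simulate(0.0), simulate(1.0)
    return (y0, (yp - ym) / 2.0, (yp + ym) / 2.0 - y0)

def evaluate(coeffs, x):
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x * x

def feasible_range(targets, n=2001):
    """Range of x for which every surrogate stays within its target bounds."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    ok = [x for x in xs if all(lo <= evaluate(c, x) <= hi for c, lo, hi in targets)]
    return (min(ok), max(ok)) if ok else None

def simulate(x):
    # Stand-in for a simulation at the user's conditions (hypothetical response)
    return 1.0 + 0.5 * x + 0.1 * x * x

# (surrogate coefficients, lower bound, upper bound); all values illustrative
new_target = (fit_quadratic(simulate), 0.9, 1.3)
archived_target = ((2.0, -1.0, 0.0), 1.5, 2.2)

alone = feasible_range([new_target])
combined = feasible_range([new_target, archived_target])
print("new target alone:", alone)
print("combined with archive:", combined)
```

In this toy case the archived target narrows the range allowed by the new target alone, which is the sense in which combining data sources adds information.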
user
specifies
a new set
of data
workflow project
workflow component performs:
• retrieves the pertinent kinetics
model (via link in the dataset)
• performs simulations on the fly for
the new data and builds a new
surrogate model
• performs UQ analysis combining
the new surrogate model with the
archived ones and the rest of the
pertinent data
• reports results
• adds the new data to the dataset
and archives in Warehouse
workflow component retrieves archived data: a set of relevant targets
target values and their uncertainty ranges
surrogate models developed for relevant targets
active variables and their uncertainty ranges
data warehouse
enrichment
• What causes/skews model predictiveness?
• Are there new experiments to be performed, old ones repeated, theoretical studies to be carried out?
• What impact could a planned experiment have?
• What is the information content of the data?
• What would it take to bring a given model to a desired level of accuracy?
from algorithm-centric view
to data-centric view
input
data
code
output
data