PNC, “Collaboration: Tools and Infrastructure” December 7, 2012 PrIMe: Integrated Infrastructures for Data and Analysis Michael Frenklach Supported by AFOSR, Fung.
• IMPACT ON SOCIETY
– Energy (power plants, car and jet engines, rockets, …)
– Defense (engines, rockets, …)
– Environment (pollutants, global modeling, …)
– Space exploration
– Astrophysics
– Material synthesis
• ESTABLISHED PRACTICE OF COLLABORATION
– Across different disciplines
– Across different countries
• THERE IS AN ACCUMULATING EXPERIMENTAL PORTFOLIO
• THEORY/MODELING LINKS FUNDAMENTAL TO APPLIED LEVEL

[Diagram labels: experiments; theory; individual reactions; model reduction; model analysis (sensitivity, reaction path, …); mechanism of: …; numerical simulations (ignition, laminar flames, NOx, soot, ...)]

Methane Combustion: CH4 + 2 O2 → CO2 + 2 H2O
1970’s: 15 reactions, 12 species
1980’s: 75 reactions, 25 species
1990’s: 300+ reactions, 50+ species
Larger molecular-size fuels:
2000’s: 1,000+ reactions, 100+ species
2010’s: 10,000+ reactions, 1,000+ species

Methane Combustion: CH4 + 2 O2 → CO2 + 2 H2O
and yet
• The networks are complex, but the governing equations (rate laws) are known
• Uncertainty exists, but much is known about where the uncertainty lies (rate parameters)
• Numerical simulations with parameters fixed to certain values may be performed “reliably”
• There is an accumulating experimental portfolio on the system

Methane Combustion: CH4 + 2 O2 → CO2 + 2 H2O
but still
• Lack of predictability
• Lack of consensus

• current inability of truly predictive modeling
– conflicting data in/among sources
– poor documentation of data/models
– no uncertainty reporting or analysis
– not much focus on integration of data
• resistance to data sharing
– no personal incentives
– no easy-to-use technology
• no recognition of the problem

• models are not additive
• data are not additive
• need a system for synthesis of data

PrIMe http://primekinetics.org
Process Informatics Model
Data sharing
App sharing
Automation
• registered members: ~400
• countries: ~15
• data records: ~100,000
• apps: ~20
• active “players”: UCB (lead), NCSU, Stanford, MIT, Cambridge, KAUST, Tsinghua

PrIMe: Warehouse, Portal, Workflow
[Table residue; column assignment not recoverable. Entries in source order:]
Access to distributed resources
“Browser-based” software
User authorization
User building projects
Social networking
Data/app linking
User forums
Binary XML interfaces
Data evaluation panels
Remote-server support
Help, tutorials, examples
Project sharing
Customized Drupal (PHP)
C#, Windows, IE
WebDAV
apps: C#, Matlab
XML platform independent
Data collections
Models and Experiments
Controlled by schemas
Submission forms
Multiple-mode access

DATA ORGANIZATION:
• conceptual abstraction
• practical realization

Chemical Kinetics Model
– composed of Chemical Reactions
– Chemical Reactions have rate law data (parameter values, uncertainties, reference)
– Chemical Reactions are composed of Chemical Species
– Chemical Species: thermo data, transport data
– Chemical Species are composed of Chemical Elements
– Chemical Elements have atomic masses

[Diagram labels: combustion modeling; quantum chemistry; reactions; thermo; molecular structure; thermosciences; spectra; absorption coefficient; diagnostics]

Experimental Record (archival record)
• reference
• apparatus
• conditions
• observations
– inner: XML
– remote: HDF5, …
• uncertainties
• additional items
– links, docs, …
– video files, …

Data Attribute (QOI, ‘target’) (VVUQ data)
a specific feature extracted for modeling:
– peak value
– peak location
– induction time
– ratio of peaks (from multiple experiments)
– …

Initial Model:
“Upload your data to PrIMe Warehouse” (“give me your data”)

New, Distributed Model:
“You may, if you choose, connect your data to the communal system”
• with a switch in the OFF position: “you can use the communal data and tools but your own data is private to you only”
• “but please flip the switch to the ON position when you are ready to share your own data”

“Connect your code to the communal system”
you control your own code:
• release version
• user access, licenses
• collect fees, if desired

Remote server app: PrIMe Web Services (PWS)
• no restrictions on platform
• no restrictions on data formats
• no restrictions on local programming language(s)
PrIMe Workflow Interface (PWI) is the only “standard”
• developed, maintained, and controlled by the community

[Diagram labels: PrIMe Dispatcher; PrIMe Data Flow Network; client machine; PrIMe web services; client data]

excessively large data sets
• do not move the data
• but use “smart agents” (e.g., HTML5 walkers)
web services with user-reloaded tasks: fetch data features for user-requested analysis

user specifies conditions of interest
workflow project
workflow component:
• retrieves the pertinent kinetics model (via link in the dataset)
• performs simulations on the fly for the conditions specified and builds a new surrogate model
• performs UQ analysis combining the new surrogate model with the archived ones and the rest of the pertinent data
• reports results
workflow component retrieves archived data:
– a set of relevant targets
– target values and their uncertainty ranges
– surrogate models developed for relevant targets
– active variables and their uncertainty ranges
data warehouse

user specifies a new set of data
workflow project
workflow component:
• retrieves the pertinent kinetics model (via link in the dataset)
• performs simulations on the fly for the new data and builds a new surrogate model
• performs UQ analysis combining the new surrogate model with the archived ones and the rest of the pertinent data
• reports results
• adds the new data to the dataset and archives it in the Warehouse
workflow component retrieves archived data:
– a set of relevant targets
– target values and their uncertainty ranges
– surrogate models developed for relevant targets
– active variables and their uncertainty ranges
data warehouse enrichment

• What causes/skews model predictiveness?
• Are there new experiments to be performed, old ones repeated, theoretical studies to be carried out?
• What impact could a planned experiment have?
• What is the information content of the data?
• What would it take to bring a given model to a desired level of accuracy?

from algorithm-centric view to data-centric view
[Diagram labels: input data; code; output data]
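The workflow slides describe a component that pulls archived targets, their uncertainty ranges, surrogate models, and active-variable ranges from the warehouse, then runs a UQ analysis that combines them with a new surrogate. The sketch below is one minimal way to picture that data-centric loop, not PrIMe's actual implementation. The names (`Target`, `feasible_points`, `predicted_range`), the linear surrogates, and all numbers are hypothetical and chosen only for illustration. A brute-force grid search stands in for whatever UQ method the real workflow uses.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, List, Sequence, Tuple

Bounds = Tuple[float, float]
Point = Tuple[float, ...]


@dataclass
class Target:
    """Hypothetical archived record: a QOI, its uncertainty range, its surrogate."""
    name: str
    bounds: Bounds                                  # target value range
    surrogate: Callable[[Sequence[float]], float]   # cheap stand-in for a simulation


def grid(variable_bounds: List[Bounds], steps: int) -> Iterator[Point]:
    """Enumerate a regular grid over the active-variable uncertainty box."""
    axes = [
        [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
        for lo, hi in variable_bounds
    ]
    return product(*axes)


def feasible_points(targets: List[Target], variable_bounds: List[Bounds],
                    steps: int = 21) -> List[Point]:
    """Keep grid points whose surrogate predictions fall inside every target range."""
    return [
        x for x in grid(variable_bounds, steps)
        if all(t.bounds[0] <= t.surrogate(x) <= t.bounds[1] for t in targets)
    ]


def predicted_range(surrogate: Callable[[Sequence[float]], float],
                    points: List[Point]) -> Bounds:
    """Range of a surrogate's prediction over a set of parameter points."""
    if not points:
        raise ValueError("no feasible points: the dataset is inconsistent on this grid")
    values = [surrogate(x) for x in points]
    return min(values), max(values)


# Illustrative data only: two active variables, each normalized to [-1, 1].
variable_bounds = [(-1.0, 1.0), (-1.0, 1.0)]
archived = [
    Target("target_A", (0.9, 1.2), lambda x: 1.0 + 0.5 * x[0] + 0.2 * x[1]),
    Target("target_B", (1.8, 2.1), lambda x: 2.0 - 0.3 * x[0] + 0.4 * x[1]),
]


def new_surrogate(x: Sequence[float]) -> float:
    """Surrogate built "on the fly" for the user's conditions (made-up form)."""
    return x[0] + x[1]


# Prediction range using only the prior variable ranges.
prior = predicted_range(new_surrogate, list(grid(variable_bounds, 21)))
# Prediction range after constraining the variables with the archived data.
points = feasible_points(archived, variable_bounds)
posterior = predicted_range(new_surrogate, points)

print("prior range:    ", prior)
print("posterior range:", posterior)
```

In this toy setup the archived targets shrink the set of admissible variable values, so the predicted range for the new quantity is narrower than the range obtained from the prior variable bounds alone. An empty feasible set would signal that the archived data and surrogates disagree.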