GRADD: Scientific Workflows
Scientific Workflow
E. Science laboris
• Workflows are the new rock and roll of eScience
• Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources
• Era of service-oriented apps (SOA)
• Repetitive and mundane boring tasks made easier (data cleaning...)
• Facilitates sharing of science

Trident Scientific Workflow Workbench
• Visually program workflows, through a web browser
• Libraries of activities, workflows and services
  – Social annotations and search
• Abstract parallelism, for HPC & many core (CCR)
• Adaptive workflows, to detect and respond to events
• Automatic provenance capture, open provenance model
• Costing model, resources include time, power, data xfer
• Integrated data storage and access
• Integrated visualization tools
• Fault tolerance, facilitate smart reruns, what-if analysis
• Factory scheduling of workflows

Trident Implementation
Built on top of industrial workflow engine Windows Workflow Foundation
  – Workflow in a general purpose framework
  – Part of Microsoft’s .NET Framework 3.5

Trident Logical Architecture
[Architecture diagram. Labels: Visualization; Design; Workflow Packages; domain-specific custom activities; Visual Workflow Designer; Workbench; Administration Console; Scientific Workflows Portal; Community (sharing and commenting on workflows, services, and data sources); Workflow Monitor; Workflow Launcher; Service Registry; Windows Workflow Foundation; Trident Runtime Services; Data Access; Data Object Model (database-agnostic abstraction) over SQL Server, Cloud DB, and others; Registry; Runtime; Admin Tools; Community Site.]
Runtime Services
• Provenance
• Fault Tolerance
• HPC Scheduling Service
• Monitoring Service
• Archiving

Activities: An Extensible Approach
[Diagram. Labels: Base Activity Library (out-of-box (OOB) activities, workflow types, basic workflow constructs; general-purpose); Custom Activity Libraries (create/extend/compose activities; domain-specific activities such as Read from Sensor, data pipelines, etc.; first-class citizens); Domain-Specific Workflow Packages (RosettaNet, CRM, Biology, Oceanography).]

Trident Workflow Designer
Visually compose, search and archive (share)

Workflow Execution

Provenance
Scientists routinely record the provenance of bench experiments in lab notebooks; this is essential for computational experiments as well. For a workflow management system, provenance identifies what activities were executed, parameters supplied at runtime, data passed between activities, intermediate results generated, etc.
• Explain how a workflow result was created, sufficient to establish trust;
• Provides a replication recipe;
• Guide development of future experiments.

Provenance in Trident
Enactment engine documents all steps linking original inputs with final result so execution can be verified, reproduced or rerun; provenance is a first class data product in Trident…
• Provenance capture is automatic and transparent.
• Will persist provenance data for a fixed period of time.
• Supports multiple levels of representation.
• Storage provided by underlying system.
• Interface to query and reason over provenance data.
• Efficient storage representation and query performance.

Trident Registry
Applications and Scientists need a Curated Registry of Services
Just having a workflow system isn’t enough and it’s not just about workflows...
Note: Registry, not repository
  – Services are hosted elsewhere
• A Curated Registry of Services (and…)
• Registry of Data Products (and…)
• Registry of Provenance
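
Illustration: automatic provenance capture
The Provenance slides describe an enactment engine that records which activities were executed, the parameters supplied at runtime, and the data passed between activities. The Python sketch below is one minimal way to picture that idea. It is not Trident's API (Trident is built on Windows Workflow Foundation in .NET), and every name in it (`Workflow`, `ProvenanceRecord`, `drop_missing`, `scale`, `mean`) is hypothetical. It assumes a simple linear chain of activities in which each activity consumes the previous activity's output.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ProvenanceRecord:
    """One executed activity: what ran, with which parameters and data."""
    activity: str
    parameters: Dict[str, Any]
    input_data: Any
    output_data: Any


@dataclass
class Workflow:
    """Hypothetical linear workflow; each step consumes the previous output."""
    steps: List[Tuple[str, Callable[..., Any], Dict[str, Any]]] = field(default_factory=list)

    def add(self, name: str, func: Callable[..., Any], **parameters: Any) -> "Workflow":
        self.steps.append((name, func, parameters))
        return self

    def run(self, data: Any) -> Tuple[Any, List[ProvenanceRecord]]:
        """Execute every step; the engine, not the activity, records provenance."""
        log: List[ProvenanceRecord] = []
        for name, func, parameters in self.steps:
            result = func(data, **parameters)
            # Deep copies keep the record stable if later steps mutate the data.
            log.append(ProvenanceRecord(name, dict(parameters),
                                        copy.deepcopy(data), copy.deepcopy(result)))
            data = result
        return data, log


# Three toy activities standing in for domain-specific ones.
def drop_missing(values, sentinel):
    return [v for v in values if v != sentinel]


def scale(values, factor):
    return [v * factor for v in values]


def mean(values):
    return sum(values) / len(values)


workflow = (Workflow()
            .add("drop_missing", drop_missing, sentinel=-999)
            .add("scale", scale, factor=0.5)
            .add("mean", mean))

result, provenance = workflow.run([2, -999, 4, 6])

for record in provenance:
    print(record.activity, record.parameters, record.input_data, "->", record.output_data)
```

Because the log links the original input to the final result step by step, it can serve as the "replication recipe" mentioned on the slide: replaying the recorded activities with the recorded parameters on the recorded input reproduces the result. Persistence, querying, and multiple levels of representation are left out of this sketch.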