GRADD: Scientific Workflows
Scientific Workflow
E. Science laboris
• Workflows are the new rock and roll of eScience
• Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources
• Era of service-oriented apps (SOA)
• Repetitive, mundane tasks made easier (data cleaning...)
• Facilitates sharing of science
Trident
Scientific Workflow Workbench
• Visually program workflows, through a web browser
• Libraries of activities, workflows and services
– Social annotations and search
• Abstract parallelism, for HPC and many-core (CCR)
• Adaptive workflows, to detect and respond to events
• Automatic provenance capture, Open Provenance Model
• Costing model, resources include time, power, data transfer
• Integrated data storage and access
• Integrated visualization tools
• Fault tolerance, facilitate smart reruns, what-if analysis
• Factory scheduling of workflows
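The "smart reruns" bullet above can be illustrated with a small sketch. It assumes a rerun may reuse the stored output of any step that already completed, so only the failed step and the steps after it execute again. This is not Trident's implementation; `run_with_checkpoints` and the step functions are hypothetical names, and the checkpoints are keyed by step name only, which assumes the workflow and its inputs are unchanged between attempts.

```python
def run_with_checkpoints(steps, initial, checkpoints):
    """Run (name, function) steps in order, reusing outputs stored by an earlier attempt."""
    value = initial
    executed = []
    for name, func in steps:
        if name in checkpoints:
            # Step finished in a previous attempt: reuse its stored output.
            value = checkpoints[name]
            continue
        value = func(value)          # may raise; earlier checkpoints survive
        checkpoints[name] = value
        executed.append(name)
    return value, executed


attempts = {"count": 0}


def load(_):
    return [3, 1, 2]


def flaky_sort(values):
    # Hypothetical step that fails on its first call only.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated transient failure")
    return sorted(values)


def total(values):
    return sum(values)


steps = [("load", load), ("sort", flaky_sort), ("total", total)]
checkpoints = {}

try:
    run_with_checkpoints(steps, None, checkpoints)   # first attempt fails at "sort"
except RuntimeError:
    pass

result, executed = run_with_checkpoints(steps, None, checkpoints)  # rerun skips "load"
```

A production system would also need to invalidate checkpoints when a step's inputs or parameters change, which this sketch leaves out.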
Trident Implementation
Built on top of an industrial workflow engine:
Windows Workflow Foundation
– Workflow in a general purpose framework
– Part of Microsoft’s .NET Framework 3.5
Trident Logical Architecture
[Architecture diagram; labels recovered from the figure]
• Visualization and Design: Visual Workflow Designer, Workflow Packages, domain-specific custom activities
• Runtime: Scientific Workflows, Windows Workflow Foundation, Trident Runtime Services
• Runtime Services: Provenance, Fault Tolerance, HPC Scheduling Service, Monitoring Service
• Portal: Administration Console, Workbench
• Community: sharing and commenting on workflows, services, and data sources
• Data Access: Data Object Model (database-agnostic abstraction) over SQL Server, Cloud DB, and others
• Also shown: Service Registry, Workflow Monitor, Provenance, Workflow Launcher, Archiving, Registry, Runtime Admin Tools, Community Site
Activities: An Extensible Approach
[Layered diagram; labels recovered from the figure]
• Base Activity Library: out-of-box (OOB) activities, workflow types, basic workflow constructs; general-purpose
• Custom Activity Libraries: create, extend, or compose activities; domain-specific activities (read from sensors, data pipelines, etc.); first-class citizens
• Domain-Specific Workflow Packages: e.g. oceanography (other examples shown: RosettaNet, biology, CRM)
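The create/extend/compose layering above can be sketched as follows. This is only an illustration of the pattern, written in Python; Trident activities are Windows Workflow Foundation activities in .NET, and every class name here (`Activity`, `ReadFromSensor`, `DropMissing`, `DropOutOfRange`, `Sequence`) is hypothetical.

```python
from abc import ABC, abstractmethod


class Activity(ABC):
    """Hypothetical base activity: one unit of work with one input and one output."""

    @abstractmethod
    def run(self, data):
        ...


class ReadFromSensor(Activity):
    """General-purpose activity; returns canned readings instead of polling a device."""

    def run(self, data=None):
        return [12.1, None, 12.4, 99.9, 12.3]


class DropMissing(Activity):
    """General-purpose data-cleaning activity."""

    def run(self, data):
        return [x for x in data if x is not None]


class DropOutOfRange(DropMissing):
    """Domain-specific activity created by extending a general-purpose one."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def run(self, data):
        cleaned = super().run(data)
        return [x for x in cleaned if self.low <= x <= self.high]


class Sequence(Activity):
    """Composite activity: runs child activities in order, piping output to input."""

    def __init__(self, *steps):
        self.steps = steps

    def run(self, data=None):
        for step in self.steps:
            data = step.run(data)
        return data


# Compose a small workflow from a base activity and an extended one.
workflow = Sequence(ReadFromSensor(), DropOutOfRange(low=0.0, high=40.0))
result = workflow.run()
```

Because `Sequence` is itself an `Activity`, a composed workflow can be nested inside another one, which is one way to read "first-class citizens" here.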
Trident Workflow Designer
Visually compose, search and archive (share)
Workflow Execution Provenance
Scientists routinely record the provenance of bench experiments in lab notebooks; this is essential for computational experiments as well.
For a workflow management system, provenance identifies what activities were executed, parameters supplied at runtime, data passed between activities, intermediate results generated, etc.
• Explains how a workflow result was created, in enough detail to establish trust
• Provides a replication recipe
• Guides development of future experiments
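As a rough illustration of the items listed above (activities executed, runtime parameters, data passed between activities, intermediate results), a provenance log could look like the sketch below. The record layout and the names `StepRecord`, `ProvenanceLog`, and `run_workflow` are assumptions made for this example, not Trident's data model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StepRecord:
    activity: str               # which activity was executed
    parameters: Dict[str, Any]  # parameters supplied at runtime
    input_value: Any            # data passed in from the previous activity
    output_value: Any           # intermediate (or final) result


@dataclass
class ProvenanceLog:
    steps: List[StepRecord] = field(default_factory=list)

    def recipe(self):
        """Ordered (activity, parameters) pairs: enough to repeat the run."""
        return [(s.activity, s.parameters) for s in self.steps]


def run_workflow(steps, initial, log):
    """Run (name, function, parameters) steps in order, recording each one."""
    value = initial
    for name, func, params in steps:
        result = func(value, **params)
        log.steps.append(StepRecord(name, dict(params), value, result))
        value = result
    return value


def scale(values, factor):
    return [v * factor for v in values]


def threshold(values, minimum):
    return [v for v in values if v >= minimum]


log = ProvenanceLog()
final = run_workflow(
    [("scale", scale, {"factor": 2}), ("threshold", threshold, {"minimum": 5})],
    [1, 2, 3, 4],
    log,
)
```

Here `log.recipe()` plays the role of the replication recipe, and the chain of `input_value`/`output_value` pairs shows how the final result was derived from the original input.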
Provenance in Trident
The enactment engine documents all steps linking original inputs with the final result, so execution can be verified, reproduced, or rerun. Provenance is a first-class data product in Trident.
• Provenance capture is automatic and transparent
• Will persist provenance data for a fixed period of time
• Supports multiple levels of representation
• Storage provided by underlying system
• Interface to query and reason over provenance data
• Efficient storage representation and query performance
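One way to picture "automatic and transparent" capture, retention for a fixed period, and a query interface is the sketch below. It assumes capture happens in a wrapper around each activity so the activity code itself is unchanged. The in-memory list, the one-hour `RETENTION_SECONDS` value, and the names `traced`, `purge_expired`, and `query` are all hypothetical; Trident's own store and query interface are not shown in these slides.

```python
import functools
import time

PROVENANCE = []            # in-memory stand-in for a provenance store
RETENTION_SECONDS = 3600   # assumed retention window, chosen for illustration


def traced(func):
    """Wrap an activity so every call is recorded without editing its body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE.append({
            "activity": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "timestamp": time.time(),
        })
        return result
    return wrapper


def purge_expired(now=None):
    """Drop records older than the retention window."""
    now = time.time() if now is None else now
    PROVENANCE[:] = [r for r in PROVENANCE if now - r["timestamp"] <= RETENTION_SECONDS]


def query(activity):
    """Return all stored records for one activity name."""
    return [r for r in PROVENANCE if r["activity"] == activity]


@traced
def mean(values):
    return sum(values) / len(values)


@traced
def anomaly(value, baseline):
    return value - baseline


average = mean([10.0, 12.0, 14.0])
delta = anomaly(average, baseline=11.0)
mean_records = query("mean")
```

The activity functions contain no logging code; the decorator does the recording, which is the sense of "transparent" assumed here.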
Trident Registry
Applications and scientists need a curated registry of services
Just having a workflow system isn’t enough, and it’s not just about workflows...
Note: registry, not repository; services are hosted elsewhere
A Curated Registry of Services
(and…) Registry of Data Products
(and…) Registry of Provenance
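The registry-versus-repository distinction can be sketched as a catalogue that stores only metadata and a pointer to where each service is hosted, with a curation flag controlling what searches return by default. All names here (`ServiceEntry`, `Registry`, `approve`, `search`) and the example.org endpoints are hypothetical and do not describe the Trident Registry's actual schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    endpoint: str        # pointer to the externally hosted service
    description: str
    curated: bool = False


class Registry:
    """Holds descriptions of services, never the services themselves."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def approve(self, name):
        """Mark an entry as reviewed by a curator."""
        self._entries[name] = replace(self._entries[name], curated=True)

    def search(self, keyword, curated_only=True):
        keyword = keyword.lower()
        return [
            e for e in self._entries.values()
            if keyword in e.description.lower() and (e.curated or not curated_only)
        ]


registry = Registry()
registry.register(ServiceEntry("tide-model", "https://example.org/tide", "Tidal prediction service"))
registry.register(ServiceEntry("salinity", "https://example.org/salinity", "Salinity profile service"))
registry.approve("tide-model")

curated_hits = registry.search("service")
all_hits = registry.search("service", curated_only=False)
```

The same shape would extend to the other registries named above by adding entry types for data products and provenance records.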