Pegasus: Mapping Scientific Workflows onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.


Pegasus: Mapping Scientific Workflows onto the Grid
Ewa Deelman
Center for Grid Technologies
USC Information Sciences Institute
Pegasus Acknowledgements


Ewa Deelman, Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, MeiHui Su, Karan Vahi (Center for Grid Computing, ISI)
James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)

Collaboration with Miron Livny (UW Madison)

http://pegasus.isi.edu

Research funded as part of the NSF GriPhyN, NVO and SCEC projects and EU-funded GridLab
Outline

Workflow Management in Grids

Pegasus, Planning for Execution in Grids

Applications Using Pegasus

In-time planning

Future Research Directions
Grid Applications

Increasing in level of complexity
  Use of individual application components
  Reuse of individual intermediate data products (files)
  Description of data products using metadata attributes

Execution environment is complex and very dynamic
  Resources come and go
  Data is replicated
  Components can be found at various locations or staged in on demand

Separation between
  the application description
  the actual execution description
Application Development and Execution Process
[Figure: The application development and execution process. In the application domain, application component selection drives abstract workflow generation and yields an abstract workflow (example: FFT filea). Concrete workflow generation applies resource selection, data replica selection, and transformation instance selection to yield a concrete workflow (example: a data transfer of filea from host1://home/filea to host2://home/file1, followed by /usr/local/bin/fft /home/file1 on host2). The concrete workflow runs in the execution environment (host1, host2). The failure recovery method can retry, pick different resources, or specify a different workflow.]
Why Automate Workflow Generation?

Usability: limit the user’s necessary Grid knowledge
  Monitoring and Directory Service
  Replica Location Service

Complexity:
  User needs to make choices
    Alternative application components
    Alternative files
    Alternative locations
  The user may reach a dead end
  Many different interdependencies may occur among components

Solution cost:
  Evaluate the alternative solution costs
    Performance
    Reliability
    Resource usage

Global cost:
  Minimizing cost within a community or a virtual organization requires reasoning about individual users’ choices in light of other users’ choices
GriPhyN’s Executable Workflow Construction

Build an abstract workflow based on VDL descriptions (Chimera)
Build an executable workflow based on the abstract workflows (Pegasus)
Execute the workflow (Condor’s DAGMan)
[Figure: VDL -> Chimera -> Abstract Workflow -> Pegasus -> Concrete Workflow -> DAGMan -> Jobs. The RLS, TC, and MDS appear as information sources alongside Pegasus.]
VDL and Abstract Workflow
VDL descriptions: a -> d1 -> b; b -> d2 -> c
User requests data file “c”
Abstract workflow: a -> d1 -> b -> d2 -> c
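The step from VDL descriptions to an abstract workflow can be illustrated with a small backward-chaining sketch. This is not Chimera's implementation or the VDL syntax: the `DERIVATIONS` dictionary and the function `build_abstract_workflow` are hypothetical stand-ins that reuse the file and derivation names from the slide, and the sketch assumes the derivations contain no cycles.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for VDL descriptions: each derivation maps a list of
# input files to one output file (names taken from the slide).
DERIVATIONS: Dict[str, Tuple[List[str], str]] = {
    "d1": (["a"], "b"),
    "d2": (["b"], "c"),
}


def build_abstract_workflow(
    requested: str,
    derivations: Dict[str, Tuple[List[str], str]] = DERIVATIONS,
) -> List[str]:
    """Return the derivations needed to produce `requested`, parents first."""
    # Index derivations by the file they produce.
    producer = {out: (name, ins) for name, (ins, out) in derivations.items()}
    ordered: List[str] = []

    def visit(lfn: str) -> None:
        if lfn not in producer:      # raw input: no derivation produces it
            return
        name, inputs = producer[lfn]
        if name in ordered:          # already scheduled
            return
        for parent in inputs:        # schedule producers of the inputs first
            visit(parent)
        ordered.append(name)

    visit(requested)
    return ordered


print(build_abstract_workflow("c"))
```

Requesting "c" walks back through d2 to d1, which matches the chain a -> d1 -> b -> d2 -> c on the slide.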
Condor’s DAGMan

Developed at UW Madison (Livny)

Executes a concrete workflow

Makes sure the dependencies are followed

Executes the jobs specified in the workflow
  Execution
  Data movement
  Catalog updates

Provides a “rescue DAG” in case of failure
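The idea of dependency-ordered execution with a rescue description can be sketched in a few lines. This is only an illustration of the concept, not DAGMan's interface or file format: `run_dag`, the `parents` mapping, and the job names are assumptions, and the sketch runs jobs one at a time using Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple


def run_dag(
    parents: Dict[str, Set[str]],
    run_job: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Run jobs in dependency order.

    `parents` maps each job to the set of jobs it depends on.
    Returns (done, rescue): the jobs that succeeded, and the jobs that
    still have to run (failed jobs and everything downstream of them).
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    done: List[str] = []
    while True:
        ready = sorter.get_ready()
        if not ready:                 # nothing left that can run
            break
        for job in ready:
            if run_job(job):
                done.append(job)
                sorter.done(job)      # unlocks the job's children
            # a failed job is never marked done, so its children never run
    rescue = [
        job
        for job in TopologicalSorter(parents).static_order()
        if job not in done
    ]
    return done, rescue


# Hypothetical four-step chain in which the second transfer fails.
chain = {
    "move_b": set(),
    "d2": {"move_b"},
    "move_c": {"d2"},
    "register_c": {"move_c"},
}
done, rescue = run_dag(chain, lambda job: job != "move_c")
print(done, rescue)
```

Resubmitting only the jobs in `rescue` avoids repeating work that already completed, which is the purpose of a rescue DAG.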
Pegasus: Planning for Execution in Grids

Maps from abstract to concrete workflow
  Algorithmic and AI-based techniques

Automatically locates physical locations for both components (transformations) and data

Finds appropriate resources to execute

Reuses existing data products where applicable

Publishes newly derived data products
  Chimera virtual data catalog

Provides provenance information
Information Components Used by Pegasus

Globus Monitoring and Discovery Service (MDS)
  Locates available resources
  Finds resource properties
    Dynamic: load, queue length
    Static: location of GridFTP server, RLS, etc.

Globus Replica Location Service
  Locates data that may be replicated
  Registers new data products

Transformation Catalog
  Locates installed executables
Example Workflow Reduction

Original abstract workflow: a -> d1 -> b -> d2 -> c

If “b” already exists (as determined by a query to the RLS), the workflow can be reduced to: b -> d2 -> c
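The reduction on this slide can be sketched as a backward walk from the requested file that stops wherever a file is already registered. This is a minimal illustration under stated assumptions, not Pegasus code: the `replica_catalog` dictionary is a hypothetical stand-in for an RLS query, and `reduce_workflow` and the physical URL shown are made up for the example.

```python
from typing import Dict, List, Tuple

# Derivations from the slide: name -> (input files, output file).
DERIVATIONS: Dict[str, Tuple[List[str], str]] = {
    "d1": (["a"], "b"),
    "d2": (["b"], "c"),
}


def reduce_workflow(
    requested: str,
    derivations: Dict[str, Tuple[List[str], str]],
    replica_catalog: Dict[str, List[str]],
) -> List[str]:
    """Return only the derivations that still have to run, parents first."""
    producer = {out: (name, ins) for name, (ins, out) in derivations.items()}
    needed: List[str] = []

    def visit(lfn: str) -> None:
        # Stop if a replica already exists or nothing can produce the file.
        if lfn in replica_catalog or lfn not in producer:
            return
        name, inputs = producer[lfn]
        if name in needed:
            return
        for parent in inputs:
            visit(parent)
        needed.append(name)

    visit(requested)
    return needed


# "b" is already registered (hypothetical location), so d1 is pruned.
catalog = {"b": ["gsiftp://hostA/data/b"]}
print(reduce_workflow("c", DERIVATIONS, catalog))
```

With an empty catalog the same call returns both derivations, and if "c" itself is registered nothing needs to run.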
Mapping from abstract to concrete
Reduced abstract workflow: b -> d2 -> c

Query RLS, MDS, and TC; schedule computation and data movement

Concrete workflow: Move b from A to B -> Execute d2 at B -> Move c from B to U -> Register c in the RLS
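The mapping on this slide can be sketched as follows. The sketch is a simplification and not the Pegasus planner: `AbstractJob`, `concretize`, and the dictionary-based catalogs are hypothetical, site selection naively takes the first site listed in the transformation catalog (a real planner would also weigh MDS information such as load and queue length), and every input file is assumed to have at least one registered replica.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AbstractJob:
    name: str            # logical transformation name, e.g. "d2"
    inputs: List[str]    # logical file names consumed
    outputs: List[str]   # logical file names produced


def concretize(
    jobs: List[AbstractJob],
    requested: str,
    replicas: Dict[str, List[str]],         # stand-in for RLS: file -> sites
    transformations: Dict[str, List[str]],  # stand-in for TC: job -> sites
    output_site: str,
) -> List[str]:
    """Turn an ordered abstract workflow into a list of concrete steps."""
    steps: List[str] = []
    # Track where each file currently lives (first replica wins).
    location = {lfn: sites[0] for lfn, sites in replicas.items() if sites}
    for job in jobs:
        site = transformations[job.name][0]   # naive site selection
        for lfn in job.inputs:
            if location[lfn] != site:         # stage the input in
                steps.append(f"move {lfn} from {location[lfn]} to {site}")
                location[lfn] = site
        steps.append(f"execute {job.name} at {site}")
        for lfn in job.outputs:
            location[lfn] = site
    if location[requested] != output_site:    # stage the result out
        steps.append(
            f"move {requested} from {location[requested]} to {output_site}"
        )
    steps.append(f"register {requested} in RLS")
    return steps


plan = concretize(
    jobs=[AbstractJob("d2", ["b"], ["c"])],
    requested="c",
    replicas={"b": ["A"]},
    transformations={"d2": ["B"]},
    output_site="U",
)
for step in plan:
    print(step)
```

For the slide's example this yields the same four steps: move b from A to B, execute d2 at B, move c from B to U, and register c.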

Montage

Montage (NASA and NVO)
  Deliver science-grade custom mosaics on demand
  Produce mosaics from a wide range of data sources (possibly in different spectra)
  User-specified parameters of projection, coordinates, size, rotation and spatial sampling

[Figure: Mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.]
Small Montage Workflow
~1200 nodes
Montage Acknowledgments




Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC
Joseph C. Jacob, Daniel S. Katz, JPL
http://montage.ipac.caltech.edu/
Testbed for Montage: Condor pools at USC/ISI, UW Madison, and TeraGrid resources at NCSA, PSC, and SDSC.
Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.
Applications Using Chimera, Pegasus and DAGMan

GriPhyN applications:
  High-energy physics: Atlas, CMS (many)
  Astronomy: SDSS (Fermi Lab, ANL)
  Gravitational-wave physics: LIGO (Caltech, AEI)

Astronomy:
  Galaxy Morphology (NCSA, JHU, Fermi, many others, NVO-funded)

Biology
  BLAST (ANL, PDQ-funded)

Neuroscience
  Tomography for Telescience (SDSC, NIH-funded)
Current System
[Figure: Original Abstract Workflow -> Pegasus(Abstract Workflow) -> Concrete Workflow -> DAGMan(CW) -> Workflow Execution. The Pegasus step is labeled “Current Pegasus”.]
Workflow Refinement and execution
[Figure: Workflow refinement and execution plotted against time and levels of abstraction. A user’s request is refined from application-level knowledge, through relevant components and logical tasks, down to tasks bound to resources and sent for execution. The full abstract workflow is divided into executed, partial execution, and not yet executed portions. Other labels: workflow refinement, workflow repair, policy info, task matchmaker.]
Incremental Refinement

Partition the abstract workflow into partial workflows

[Figure: A new abstract workflow and a particular partitioning of it into partial workflows PW A, PW B, and PW C.]
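The slide does not say how the partitioning is chosen. As one possible illustration, the sketch below groups jobs by their depth in the dependency graph, so that each partial workflow depends only on earlier ones. The function `partition_by_level` and the job names are assumptions made for this example, not the partitioning shown in the figure.

```python
from typing import Dict, List, Set


def partition_by_level(parents: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into partial workflows by depth in the dependency graph.

    `parents` maps each job to the set of jobs it depends on and is
    assumed to be acyclic.
    """
    level: Dict[str, int] = {}

    def depth(job: str) -> int:
        if job not in level:
            # A job sits one level below its deepest parent.
            level[job] = 1 + max(
                (depth(p) for p in parents.get(job, ())), default=0
            )
        return level[job]

    for job in parents:
        depth(job)

    groups: Dict[int, List[str]] = {}
    for job, d in level.items():
        groups.setdefault(d, []).append(job)
    return [sorted(groups[d]) for d in sorted(groups)]


# Hypothetical diamond-shaped workflow.
workflow = {"d1": set(), "d2": {"d1"}, "d3": {"d1"}, "d4": {"d2", "d3"}}
print(partition_by_level(workflow))
```

Other partitionings are equally valid; the only requirement used here is that a partial workflow never depends on a later one.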
Meta-DAGMan
[Figure: Meta-DAGMan graph with nodes Pegasus(A), DAGMan(Su(A)), Pegasus(B), DAGMan(Su(B)), Pegasus(C), and DAGMan(Su(C)); each Pegasus(X) node passes Su(X) to the corresponding DAGMan node.]

Pegasus(X): Pegasus generates the concrete workflow and the submit files for X = Su(X)
DAGMan(Su(X)): DAGMan executes the concrete workflow for X
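The interleaving of planning and execution can be sketched as a loop in which each partial workflow is planned only after the earlier ones have run, so the planner sees the current state of the catalog. Everything here is a toy stand-in: `pegasus`, `dagman`, and `meta_dagman` are hypothetical functions rather than the real tools, partitions are assumed to run strictly in the order A, B, C, and "running" a job just records its output file.

```python
from typing import Dict, List, Set, Tuple

Partition = List[Tuple[str, str]]   # (job name, output file)


def pegasus(partition: Partition, catalog: Set[str]) -> List[str]:
    """Stand-in for Pegasus(X): keep jobs whose output is not yet available."""
    return [job for job, output in partition if output not in catalog]


def dagman(submit: List[str], partition: Partition, catalog: Set[str]) -> None:
    """Stand-in for DAGMan(Su(X)): 'run' each job and record its output."""
    outputs = dict(partition)
    for job in submit:
        catalog.add(outputs[job])


def meta_dagman(partitions: Dict[str, Partition], catalog: Set[str]) -> List[str]:
    """Plan and execute partial workflows one after another."""
    trace: List[str] = []
    for name, partition in partitions.items():
        submit = pegasus(partition, catalog)   # planned against current state
        trace.append(f"Pegasus({name}) -> {submit}")
        dagman(submit, partition, catalog)
        trace.append(f"DAGMan(Su({name}))")
    return trace


catalog = {"b"}   # file "b" already exists before anything runs
partitions = {
    "A": [("d1", "b")],
    "B": [("d2", "c")],
    "C": [("d3", "e")],
}
trace = meta_dagman(partitions, catalog)
for line in trace:
    print(line)
```

Because planning for B and C is deferred, decisions for those partitions can reflect what happened while A ran, rather than the state of the Grid at submission time.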
Conclusions



Pegasus maps complex workflows onto the Grid
Uses Grid information services to find resources, data and executables
Reduces the workflow based on existing intermediate products

Used in many applications

Part of GriPhyN’s Virtual Data Toolkit
Future Directions



Investigate various scheduling techniques
Investigate fault tolerance issues
Enable flexible interactions between workflow refiners (GriPhyN-wide scope: Pegasus, DAGMan)
http://pegasus.isi.edu


GGF10 workshop on workflow management
GGF Workflow management research group
[email protected]
Summary:
The Grid Now

Syntax-based matchmaking of resources to job requirements
  Condor matchmaker
  Attribute-based discovery and selection
Scheduling of jobs based on Grid-able users that specify job execution sequences and computing requirements
  Scripting languages
  Workflow languages, task graphs
Explicit mappings from task to jobs, simple job brokers
Explicit service

The Future Grid

More agility and coordination
Wide range of users can specify high-level requirements in a mixed-initiative mode
Semantic matchmaking
Aggregate resource reasoning
Task-level reasoning to plan and schedule jobs and resources
Knowledge-based reasoning about resources enables mapping of high-level requirements to details required for execution
End-to-end resource negotiation and adaptive