Pegasus: A Framework for Workflow Planning on the Grid
Ewa Deelman
USC Information Sciences Institute
Pegasus Acknowledgments: Carl Kesselman, Gaurang Mehta, Mei-Hui Su, Gurmeet Singh, Karan Vahi
Pegasus

Flexible framework that maps abstract workflows onto the Grid
Possesses well-defined APIs and clients (a sketch of such interfaces follows below) for:
– Information gathering
> Resource information
> Replica query mechanism
> Transformation catalog query mechanism
– Resource selection
> Compute site selection
> Replica selection
– Data transfer mechanism
Can support a variety of workflow executors
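To make the plug-in structure concrete, here is a minimal Python sketch of what such well-defined interfaces could look like. It is illustrative only: the class and method names (SiteSelector, ReplicaCatalog, TransformationCatalog, select_site, lookup) are hypothetical and are not the actual Pegasus APIs.

# Hypothetical sketch of Pegasus-style plug-in interfaces; these names are
# illustrative and are not the real Pegasus APIs.
from abc import ABC, abstractmethod
from itertools import cycle

class SiteSelector(ABC):
    """Compute-site selection interface: pick an execution site for a job."""
    @abstractmethod
    def select_site(self, job, candidate_sites):
        ...

class RoundRobinSelector(SiteSelector):
    """Cycles through the candidate sites in order."""
    def __init__(self, sites):
        self._sites = cycle(sites)
    def select_site(self, job, candidate_sites=None):
        return next(self._sites)

class ReplicaCatalog(ABC):
    """Replica query mechanism: map a logical file name to physical replicas."""
    @abstractmethod
    def lookup(self, lfn):
        ...

class TransformationCatalog(ABC):
    """Transformation catalog query: where is an executable installed?"""
    @abstractmethod
    def lookup(self, transformation, site):
        ...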
Pegasus

May reduce the workflow based on available data products (see the sketch below)
Augments the workflow with data stage-in and data stage-out nodes
Augments the workflow with data registration nodes

[Figure: an abstract workflow of jobs a through i, shown before and after mapping; the mapped workflow contains the original compute jobs plus added transfer and registration nodes. KEY: the original node, pull transfer node, push transfer node, inter-pool transfer node, registration node.]
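The reduction and augmentation steps above can be illustrated with a short Python sketch. This is not Pegasus's actual code: it assumes a workflow represented as a dict of jobs with declared input and output files, and a replica catalog queried as a simple mapping; all names here are hypothetical.

# Hypothetical sketch of workflow reduction and augmentation; not actual Pegasus code.
def reduce_workflow(jobs, replica_catalog):
    """Drop jobs whose outputs already exist as registered data products."""
    return {name: job for name, job in jobs.items()
            if not all(f in replica_catalog for f in job["outputs"])}

def augment_workflow(jobs, replica_catalog):
    """Add stage-in, stage-out, and registration nodes around compute jobs."""
    concrete = {}
    for name, job in jobs.items():
        for f in job["inputs"]:
            if f in replica_catalog:          # existing data product: pull it in
                concrete["stage_in_" + f] = {"type": "transfer", "file": f}
        concrete[name] = job
        for f in job["outputs"]:
            concrete["stage_out_" + f] = {"type": "transfer", "file": f}
            concrete["register_" + f] = {"type": "registration", "file": f}
    return concrete

# Example: job "b" is dropped because its output f2 is already registered.
jobs = {"a": {"inputs": [], "outputs": ["f1"]},
        "b": {"inputs": ["f1"], "outputs": ["f2"]}}
catalog = {"f2": "gsiftp://host/path/f2"}
print(list(reduce_workflow(jobs, catalog)))   # ['a']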
Pegasus Components

[Figure: component diagram centered on the Pegasus Engine, the CPlanner (gencdag), surrounded by its interfaces and their implementations. The diagram's key distinguishes existing interfaces from interfaces in development, and production implementations from research implementations.]

Pegasus command line clients: gencdag (CPlanner), rls-client, tc-client, genpoolconfig client
Information gathering interfaces:
– Replica Query and Registration Mechanism: RLS, File
– Transformation Catalog Mechanism (TC): Database, File
– Resource Information Catalog: MDS
Resource selection interfaces:
– Replica Selection
– Site Selector: Round Robin, Random, Min-Min, Max-Min, Grasp, Prophesy (a Min-Min sketch follows below)
Data Transfer Mechanism: Transfer2, Multiple Transfer, globus-url-copy, Stork, GridLab transfer
Submit Writer: Condor, GridLab GRMS, Stork Writer
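Min-Min and Max-Min, listed among the research site selectors above, are classic list-scheduling heuristics. The following generic Python sketch shows the Min-Min idea under assumed per-site runtime estimates; it is illustrative only and is not the Pegasus research implementation.

# Generic Min-Min list-scheduling sketch (illustrative; not the Pegasus implementation).
def min_min(jobs, sites, runtime):
    """jobs: job names; sites: site names;
    runtime[(job, site)]: estimated execution time of job on site."""
    ready_time = {s: 0.0 for s in sites}      # when each site becomes free
    schedule = {}
    unscheduled = set(jobs)
    while unscheduled:
        # For every unscheduled job, find its minimum completion time over all sites.
        best = {j: min((ready_time[s] + runtime[(j, s)], s) for s in sites)
                for j in unscheduled}
        # Schedule the job whose minimum completion time is smallest.
        job = min(unscheduled, key=lambda j: best[j][0])
        finish, site = best[job]
        schedule[job] = site
        ready_time[site] = finish
        unscheduled.remove(job)
    return schedule

runtime = {("j1", "A"): 3, ("j1", "B"): 5,
           ("j2", "A"): 2, ("j2", "B"): 4,
           ("j3", "A"): 6, ("j3", "B"): 1}
print(min_min(["j1", "j2", "j3"], ["A", "B"], runtime))
# {'j3': 'B', 'j2': 'A', 'j1': 'A'}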
Original Pegasus Configuration

[Figure: the original abstract workflow is given to Pegasus; Pegasus(Abstract Workflow) produces the concrete workflow (CW), and DAGMan(CW) carries out the workflow execution.]

Simple scheduling: random or round robin, using well-defined scheduling interfaces.
Deferred Planning through Partitioning

[Figure: a particular partitioning of the abstract workflow into partial workflows PW A, PW B, and PW C.]

A variety of planning algorithms can be implemented (a level-based partitioning sketch follows below).
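One simple way to realize such a partitioning is by workflow level (depth), so that each level becomes a partial workflow. The sketch below is illustrative only: the partitioner is a pluggable component and may use other algorithms, and all names here are hypothetical.

# Illustrative level-based partitioner: group workflow nodes by their depth so
# that each level becomes one partial workflow (PW). Not actual Pegasus code.
from collections import defaultdict

def partition_by_level(parents):
    """parents: dict mapping each node to the list of its parent nodes."""
    depth = {}
    def level(node):
        if node not in depth:
            preds = parents.get(node, [])
            depth[node] = 0 if not preds else 1 + max(level(p) for p in preds)
        return depth[node]
    partitions = defaultdict(list)
    for node in parents:
        partitions[level(node)].append(node)
    return [partitions[d] for d in sorted(partitions)]

# Example: a -> b, a -> c, b -> d, c -> d  yields three partial workflows.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(partition_by_level(dag))   # [['a'], ['b', 'c'], ['d']]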
Mega DAG is created by Pegasus and then submitted to DAGMan

[Figure: for the new abstract workflow, the Mega DAG chains Pegasus(A) = Su(A) -> DAGMan(Su(A)) -> Pegasus(B) = Su(B) -> DAGMan(Su(B)) -> Pegasus(C) = Su(C) -> DAGMan(Su(C)).]

Pegasus(X): Pegasus generates the concrete workflow and the submit files for Partition X (Su(X))
DAGMan(Su(X)): DAGMan executes the concrete workflow for Partition X
Re-planning capabilities

[Figure: the same Mega DAG chain, with each DAGMan(Su(X)) node retried Y times: Pegasus(A) = Su(A) -> DAGMan(Su(A)), retry Y times; Pegasus(B) = Su(B) -> DAGMan(Su(B)), retry Y times; Pegasus(C) = Su(C) -> DAGMan(Su(C)), retry Y times.]

Pegasus(X): Pegasus generates the concrete workflow and the submit files for Partition X (Su(X))
DAGMan(Su(X)): DAGMan executes the concrete workflow for Partition X
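The Mega DAG with retries can be pictured as a small DAGMan input file generated by Pegasus: one planning node and one execution node per partition, with the execution node retried Y times. The Python sketch below emits such a file using DAGMan's JOB, PARENT/CHILD, and RETRY directives; the submit-file names and the overall layout are placeholders, not the exact files Pegasus writes.

# Illustrative generator for a DAGMan "mega DAG"; submit-file names are placeholders.
def write_mega_dag(partitions, dependencies, retries, path="mega.dag"):
    """partitions: partition names, e.g. ["A", "B", "C"];
    dependencies: (parent_partition, child_partition) pairs."""
    lines = []
    for p in partitions:
        lines.append(f"JOB Pegasus_{p} pegasus_{p}.sub")    # runs Pegasus on partition p, producing Su(p)
        lines.append(f"JOB DAGMan_{p} dagman_su_{p}.sub")   # runs DAGMan on Su(p)
        lines.append(f"PARENT Pegasus_{p} CHILD DAGMan_{p}")
        lines.append(f"RETRY DAGMan_{p} {retries}")
    for parent, child in dependencies:
        lines.append(f"PARENT DAGMan_{parent} CHILD Pegasus_{child}")
    with open(path, "w") as out:
        out.write("\n".join(lines) + "\n")

# Example: the linear chain A -> B -> C from the figures above, retried 3 times.
write_mega_dag(["A", "B", "C"], [("A", "B"), ("B", "C")], retries=3)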
Complex Replanning for Free (almost)

Pegasus’ log files record the sites considered.

[Figure: one Pegasus(A) = Su(A) / DAGMan(Su(A)) node of the Mega DAG, retried Y times. The original abstract workflow partition consists of jobs A, B, C, and D connected by files f1 through f4 (A produces f1, B and C produce f2 and f3, and D consumes f2 and f3 to produce f4). In the Pegasus mapping, f2 and f3 were found in a replica catalog, so the concrete workflow only moves f2 and f3 to R1, executes D at R1, and moves f4 to the output location; this workflow is submitted to DAGMan. When a failure occurs (other nodes shown include moving f1 to R2 and executing C at R2), DAGMan retries the partition: Pegasus is called again with the original partition and produces a new mapping, here assuming R1 was picked again.]
Optimizations

If the workflow being refined by Pegasus consists of only one node:
– Create a Condor submit node rather than a DAGMan node
– This optimization can leverage Euryale's super-node writing component
Planning & Scheduling Granularity

Partitioning
– Allows setting the granularity of ahead-of-time planning
Node aggregation (a clustering sketch follows below)
– Allows combining nodes in the workflow and scheduling them as one unit (minimizes the scheduling overheads)
– May reduce the overheads of making scheduling and planning decisions
Related but separate concepts
– Small jobs
> High level of node aggregation
> Large partitions
– Very dynamic system
> Small partitions
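As a concrete illustration of node aggregation, the following Python sketch groups same-level jobs into fixed-size clusters so that each cluster is scheduled as one unit. The function and parameter names are illustrative and are not Pegasus's clustering interface.

# Illustrative horizontal clustering: aggregate same-level jobs into groups of
# at most cluster_size, so each group is scheduled as a single unit.
def aggregate_nodes(levels, cluster_size):
    """levels: list of lists of job names, one inner list per workflow level."""
    clustered = []
    for level_jobs in levels:
        for i in range(0, len(level_jobs), cluster_size):
            clustered.append(level_jobs[i:i + cluster_size])
    return clustered

# Example: five small jobs at one level, aggregated into units of at most three.
print(aggregate_nodes([["j1", "j2", "j3", "j4", "j5"]], cluster_size=3))
# [['j1', 'j2', 'j3'], ['j4', 'j5']]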
Montage
Bruce Berriman, John Good, Anastasia Laity (Caltech/IPAC); Joseph C. Jacob, Daniel S. Katz (JPL)

Montage (NASA and NVO)
– Deliver science-grade custom mosaics on demand
– Produce mosaics from a wide range of data sources (possibly in different spectra)
– User-specified parameters of projection, coordinates, size, rotation and spatial sampling

Running large workflows: 6 and 10 degree DAGs (for the M16 cluster). The 6 degree runs had about 13,000 compute jobs and the 10 degree run had about 40,000 compute jobs.

[Figure: mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.]
Montage Workflow

[Figure: the Montage workflow. Data stage-in nodes feed the Montage compute nodes mProject1, mProject2, and mProject3; mDiff1 and mDiff2 compute the difference images D12 and D23; mFitplaneD12 and mFitplaneD23 fit planes (a1 x + b1 y + c1 = 0, a2 x + b2 y + c2 = 0, a3 x + b3 y + c3 = 0) to the differences; mBgModel solves for consistent background-correction planes (ax + by + c = 0 and dx + ey + f = 0), which mBackground1, mBackground2, and mBackground3 apply; mAdd assembles the final mosaic, handled by data stage-out nodes. Inter-pool transfer nodes link jobs mapped to different sites.]
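The plane equations in the figure come from the background-rectification steps: mFitplane fits a plane to each difference image, and mBgModel then solves for consistent correction planes. The numpy sketch below shows the kind of least-squares plane fit involved; it is illustrative only and is not Montage's actual implementation.

# Illustrative least-squares fit of a plane a*x + b*y + c to a difference image
# (the kind of fit the mFitplane jobs perform); not Montage's actual code.
import numpy as np

def fit_plane(diff_image):
    """Fit z = a*x + b*y + c to the pixels of diff_image by least squares."""
    ny, nx = diff_image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    mask = ~np.isnan(diff_image)              # ignore blank pixels outside the overlap
    A = np.column_stack([x[mask], y[mask], np.ones(mask.sum())])
    coeffs, *_ = np.linalg.lstsq(A, diff_image[mask], rcond=None)
    return coeffs                             # (a, b, c)

# Example: a synthetic difference image with a known background slope.
ny, nx = 64, 64
y, x = np.mgrid[0:ny, 0:nx]
diff = 0.01 * x - 0.02 * y + 0.5
print(fit_plane(diff))                        # approximately [0.01, -0.02, 0.5]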
Future work

Staging in executables on demand
Expanding the scheduling plug-ins
Investigating various partitioning approaches
Investigating reliability across partitions
Non-GriPhyN applications using Pegasus

Galaxy Morphology (National Virtual Observatory)
– Investigates the dynamical state of galaxy clusters
– Explores galaxy evolution inside the context of large-scale structure
– Uses galaxy morphologies as a probe of the star formation and stellar distribution history of the galaxies inside the clusters
– Data-intensive computations involving hundreds of galaxies in a cluster

[Figure caption: The X-ray emission is shown in blue, and the optical emission is in red. The colored dots are located at the positions of the galaxies within the cluster; the dot color represents the value of the asymmetry index. Blue dots represent the most asymmetric galaxies and are scattered throughout the image, while orange dots, the most symmetric and indicative of elliptical galaxies, are concentrated more toward the center.]
BLAST: a set of sequence comparison algorithms used to search sequence databases for optimal local alignments to a query

Two major runs were performed using Chimera and Pegasus:
1) 60 genomes (4,000 sequences each), selected from DOE-sponsored sequencing projects, processed in 24 hours
– 67 CPU-days of processing time delivered
– ~10,000 Grid jobs
– >200,000 BLAST executions
– 50 GB of data generated
2) 450 genomes processed

Speedups of 5-20 times were achieved because the compute nodes were used efficiently, by keeping the submission of jobs to the compute cluster constant.

Led by Veronika Nefedova (ANL) as part of the PACI Data Quest Expedition program
Biology Applications (cont’d)

Tomography (NIH-funded project)
Derivation of 3D structure from a series of 2D electron microscopic projection images
Reconstruction and detailed structural analysis of
– complex structures like synapses
– large structures like dendritic spines
Acquisition and generation of huge amounts of data
Large amount of state-of-the-art image processing required to segment structures from extraneous background

[Figure: dendrite structure to be rendered by tomography]

Work performed by Mei-Hui Su with Mark Ellisman, Steve Peltier, Abel Lin, Thomas Molina (SDSC)
Southern California Earthquake Center

The SCEC/IT project, funded by the NSF, is developing a new framework for physics-based simulations for seismic hazard analysis, building on several information technology areas, including knowledge representation and reasoning, knowledge acquisition, grid computing, and digital libraries.

[Figure: SCEC pathway. The Compositional Analysis Tool (CAT), backed by the CAT Knowledge Base and the SCEC Datatype DB, and the Pathway Composition Tool with its Grid-Based Data Selector (which queries the Replica Location Service and the Metadata Catalog Service) drive a DAX Generator. The resulting DAX is passed to Pegasus, which produces a DAG for Condor DAGMan; jobs (RSL) and data move to Grid hosts (host1, host2), producing a hazard map.]

People involved: Vipin Gupta, Phil Maechling (USC)