Pegasus: Running Large-Scale Scientific Workflows on the TeraGrid
Ewa Deelman
USC Information Sciences Institute
http://pegasus.isi.edu
www.isi.edu/~deelman
Acknowledgements
Carl Kesselman, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Technologies, ISI)
James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
http://pegasus.isi.edu
Research funded as part of the NSF GriPhyN, NVO and SCEC projects, the NIH-funded CRCNS project and the EU-funded GridLab
Thanks for the use of the TeraGrid
Ewa Deelman, [email protected]
www.isi.edu/~deelman
pegasus.isi.edu
Outline
Applications as workflows
Pegasus (Planning for Execution in Grids)
Montage application (Astronomy, NSF & NASA)
CyberShake (Southern California Earthquake Center)
Results from running on the TeraGrid
Conclusions
Today’s Scientific Applications
Applications
- Increasing in the level of complexity
- Use of individual application components
- Components are supplied by various individuals
- Reuse of individual intermediate data products (files)
Execution environment is complex and very dynamic
- Resources come and go
- Data is replicated
- Components can be found at various locations or staged in on demand
Separation between
- the application description
- the actual execution description
Applications being described in terms of workflows
Scientific Analysis: Workflow Evolution
Construct the Analysis -> Workflow Template
Select the Input Data -> Abstract Workflow
Map the Workflow onto Available Resources -> Executable Workflow
Execute the Workflow -> Tasks to be executed -> Grid Resources
Scientific Analysis: Workflow Evolution and the Execution Environment
User guided
- Construct the Analysis (Library of Application Components: component characteristics) -> Workflow Template
- Select the Input Data (Data Catalogs: data properties) -> Abstract Workflow
Automated
- Map the Workflow onto Available Resources (Information Services: resource availability and characteristics) -> Executable Workflow
- Execute the Workflow -> Tasks to be executed -> Grid Resources
Executable Workflow Generation and Mapping
Application-dependent (sources of the Abstract Workflow)
- Intelligent Workflow Composition tools (WINGS and CAT) (Nat. Lang. Proc)
- Virtual Data Language (VDL) (GTOMO, HEP, Biology, others)
- Application-specific Abstract Workflow Service (LIGO, SCEC, Montage)
Application-independent
- Abstract Workflow -> Pegasus -> Executable Workflow -> Condor DAGMan -> jobs -> Grid Resources -> Results
WINGS and CAT, developed at ISI by Y. Gil
VDL, developed at ANL & UofC by I. Foster, J. Voeckler & M. Wilde
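As a rough illustration of the last step in this picture, where the executable workflow is handed to Condor DAGMan, the sketch below writes a DAGMan input description from a set of jobs and dependencies. JOB, RETRY and PARENT ... CHILD are standard DAGMan keywords. The function name dagman_text, the job names and the submit file names are hypothetical, and this is not a description of how Pegasus generates its own output.

```python
def dagman_text(jobs, edges, retries=0):
    """Build the text of a DAGMan input file.

    jobs:    dict mapping job name -> Condor submit file name (hypothetical names)
    edges:   list of (parent, child) job-name pairs
    retries: if non-zero, add a RETRY line for every job
    """
    lines = [f"JOB {name} {submit_file}" for name, submit_file in jobs.items()]
    if retries:
        lines += [f"RETRY {name} {retries}" for name in jobs]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


# Hypothetical two-node executable workflow: stage data in, then compute.
text = dagman_text(
    {"stage_in_b": "stage_in_b.sub", "d2": "d2.sub"},
    [("stage_in_b", "d2")],
    retries=2,
)
print(text)
```

The resulting text could be saved to a file and submitted with condor_submit_dag, assuming the referenced submit files exist.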
Pegasus: Planning for Execution in Grids
Maps from abstract to executable workflow
Automatically locates physical locations for both workflow components and data
Finds appropriate resources to execute the components
Reuses existing data products where applicable
Publishes newly derived data products
Provides provenance information
Information Components used by Pegasus
Globus Monitoring and Discovery Service (MDS) (or static file)
- Locates available resources
- Finds resource properties
  - Dynamic: load, queue length
  - Static: location of GridFTP server, RLS, etc.
Globus Replica Location Service
- Locates data that may be replicated
- Registers new data products
Transformation Catalog
- Locates installed executables
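To make the three roles concrete, here is a minimal in-memory sketch of the kinds of lookups listed above. It assumes plain Python dictionaries in place of the real MDS, RLS and Transformation Catalog services. All class names, method names, site names, paths and the URL are hypothetical and do not reflect the actual Globus or Pegasus interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Stand-in for the RLS: logical file name -> set of physical URLs."""
    replicas: dict = field(default_factory=dict)

    def lookup(self, lfn):
        return sorted(self.replicas.get(lfn, set()))

    def register(self, lfn, url):
        self.replicas.setdefault(lfn, set()).add(url)


@dataclass
class TransformationCatalog:
    """Stand-in for the TC: (transformation, site) -> installed executable path."""
    entries: dict = field(default_factory=dict)

    def sites_for(self, transformation):
        return sorted(site for (name, site) in self.entries if name == transformation)


@dataclass
class SiteCatalog:
    """Stand-in for MDS (or a static file): site -> resource properties."""
    sites: dict = field(default_factory=dict)

    def least_loaded(self, candidates):
        # Uses a dynamic property (queue length) to choose among candidates.
        return min(candidates, key=lambda s: self.sites[s]["queue_length"])


rc = ReplicaCatalog()
rc.register("b", "gsiftp://siteA.example.org/data/b")  # hypothetical URL
tc = TransformationCatalog({("d2", "A"): "/opt/bin/d2", ("d2", "B"): "/usr/local/bin/d2"})
sc = SiteCatalog({"A": {"queue_length": 12}, "B": {"queue_length": 3}})

chosen = sc.least_loaded(tc.sites_for("d2"))
print(chosen, rc.lookup("b"))
```

Here the transformation d2 is installed at two sites, and the site with the shorter queue is selected.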
Example Workflow Reduction
Original abstract workflow: a -> d1 -> b -> d2 -> c
If “b” already exists (as determined by query to the RLS), the workflow can be reduced: b -> d2 -> c
Also useful in case of failures
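The reduction can be sketched as a backward walk from the requested output: a job is kept only if one of its outputs is needed and not already available. The code below is a minimal sketch under that assumption, using the file names a, b, c and job names d1, d2 from the example. The function name reduce_workflow and the data layout are hypothetical, and a set stands in for the RLS query.

```python
def reduce_workflow(jobs, requested, available):
    """Return the set of job names that still have to run.

    jobs:      dict job name -> (list of input files, list of output files)
    requested: files the user wants
    available: files already present (stand-in for an RLS lookup)
    """
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    needed_jobs = set()
    pending = [f for f in requested if f not in available]
    while pending:
        f = pending.pop()
        job = producer.get(f)
        if job is None or job in needed_jobs:
            continue
        needed_jobs.add(job)
        inputs, _ = jobs[job]
        # Only inputs that do not exist yet pull in further jobs.
        pending.extend(i for i in inputs if i not in available)
    return needed_jobs


jobs = {"d1": (["a"], ["b"]), "d2": (["b"], ["c"])}
full = reduce_workflow(jobs, requested=["c"], available={"a"})
reduced = reduce_workflow(jobs, requested=["c"], available={"a", "b"})
print(sorted(full), sorted(reduced))
```

The same walk covers the failure case: after a partial run, whatever intermediate files were produced count as available, so only the remaining jobs are kept.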
Mapping from abstract to executable
Abstract workflow: b -> d2 -> c
Query RLS, MDS, and TC, schedule computation and data movement
Executable workflow: Move b from A to B -> Execute d2 at B -> Move c from B to U -> Register c in the RLS
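The mapping step for a single job can be sketched as follows: stage in any input that is not already at the execution site, run the job, stage the output to the user's site, and register it. This is a minimal sketch that assumes the execution site has already been chosen. The function name map_job and its parameters are hypothetical, while the sites A, B and U follow the example.

```python
def map_job(job, inputs, outputs, exec_site, replica_sites, output_site):
    """Expand one abstract job into an ordered list of executable steps.

    replica_sites: dict file -> site where a copy exists (stand-in for the RLS)
    """
    steps = []
    for lfn in inputs:
        src = replica_sites[lfn]
        if src != exec_site:  # no transfer needed if the data is already local
            steps.append(f"move {lfn} from {src} to {exec_site}")
    steps.append(f"execute {job} at {exec_site}")
    for lfn in outputs:
        if output_site != exec_site:
            steps.append(f"move {lfn} from {exec_site} to {output_site}")
        steps.append(f"register {lfn} in the RLS")
    return steps


plan = map_job("d2", ["b"], ["c"], exec_site="B",
               replica_sites={"b": "A"}, output_site="U")
for step in plan:
    print(step)
```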
Mosaic of M42 created on the TeraGrid resources using Pegasus
Pegasus improved the runtime of this application by 90% over the baseline case
Workflow with 4,500 nodes
Bruce Berriman, John Good (Caltech)
Joe Jacob, Dan Katz (JPL)
Gurmeet Singh, Mei Su (ISI)
Small Montage Workflow
~1200 nodes
Montage
User Portal (JPL): Region Name, Degrees
Abstract Workflow Service (JPL): mDAGFiles -> Abstract Workflow
2MASS Image List Service (IPAC): m2MASSList -> Image List
Grid Scheduling and Execution Service (ISI): mGridExec, Pegasus -> Concrete Workflow -> Condor DAGMan
Computational Grid: TeraGrid Clusters (SDSC, NCSA), ISI Condor Pool
User Notification Service (IPAC): mNotify
Collaboration with JPL & IPAC
Initial prototype implemented and tested on the TeraGrid
Montage performance evaluations
Production Montage portal open to the astronomy community this year
SCEC
Derive Probabilistic Hazard Curves & maps for the Los Angeles Area: 6 sites in 2005, 625 in 2006, and 10,000 in 2007
Probability of a certain ground motion during a certain period of time
Hazard Map
SCEC workflows on the TG
Provision the resources
Resource Descriptions -> Map the Workflow onto the Grid resources -> Executable Workflow
Executable Workflow -> Run the Workflow on the Grid Resources -> Tasks -> Grid Resources
Task Info -> Record Information about the Workflow
SCEC Workflows on the TG
Gaurang Mehta at ISI ran the experiments
(nice TeraGrid folks)
Condor Glide-in
Resource Descriptions
VDS Kickstart & Provenance Tracking Catalog (PTC) (Jens Voeckler, Mike Wilde (UofC, ANL))
Executable Workflow -> Condor’s DAGMan (University of Wisconsin Madison) -> Tasks
Task Info
Local machine
SCEC computations so far
Pasadena, USC
26 workflows, 33 workflows
Each workflow: [11, 1000] jobs
23 days total runtime
NCSA & SDSC TG
Failed job recovery
- Retries
- Rescue DAG
[Chart: Number of Jobs by job type, y-axis 0 to 100,000; total number of jobs 261,823]
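The two recovery mechanisms named on this slide, retries and the rescue DAG, can be illustrated with a small sketch. It assumes jobs are run one at a time in dependency order, and a Python set of finished jobs plays the role of the rescue DAG (in DAGMan the rescue DAG is a file recording which nodes completed). The function name run_with_recovery and the job names are hypothetical.

```python
def run_with_recovery(order, run_job, max_retries=2, done=None):
    """Run jobs in dependency order with per-job retries.

    order:   job names, parents before children
    run_job: callable returning True on success, False on failure
    done:    jobs finished in an earlier attempt (the "rescue" state)
    Returns (set of finished jobs, name of the job that failed or None).
    """
    done = set(done or ())
    for job in order:
        if job in done:
            continue  # already finished earlier, skipped on resubmission
        for _attempt in range(max_retries + 1):
            if run_job(job):
                done.add(job)
                break
        else:
            return done, job  # retries exhausted: stop and keep the state
    return done, None


# First submission: job "b" always fails, so the run stops after "a".
done_first, failed = run_with_recovery(["a", "b", "c"], lambda j: j != "b", max_retries=1)

# Resubmission from the saved state: "a" is not run again.
executed = []
def now_healthy(job):
    executed.append(job)
    return True

done_second, failed_second = run_with_recovery(["a", "b", "c"], now_healthy, done=done_first)
print(sorted(done_first), failed, executed)
```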
So far 2 SCEC sites done (Pasadena and USC)
Number of jobs per day (23 days), 261,823 jobs total; number of CPU hours per day, 15,706 hours total (1.8 years)
[Chart: JOBS and HRS per day, log scale from 1 to 100,000, for 10/19 through 11/10]
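The totals quoted on this slide can be cross-checked with a few lines of arithmetic. The three input numbers come from the slide. The derived averages (jobs per day, CPUs busy on average, CPU-minutes per job) are not stated in the talk and are only a back-of-the-envelope reading of those totals.

```python
TOTAL_JOBS = 261_823       # from the slide
TOTAL_CPU_HOURS = 15_706   # from the slide
RUN_DAYS = 23              # from the slide

cpu_years = TOTAL_CPU_HOURS / (24 * 365)            # CPU hours expressed in years
jobs_per_day = TOTAL_JOBS / RUN_DAYS                # average daily job count
avg_busy_cpus = TOTAL_CPU_HOURS / (RUN_DAYS * 24)   # CPUs kept busy on average
minutes_per_job = TOTAL_CPU_HOURS * 60 / TOTAL_JOBS # mean CPU time per job

print(f"{cpu_years:.1f} CPU-years")
print(f"{jobs_per_day:.0f} jobs per day on average")
print(f"{avg_busy_cpus:.1f} CPUs busy on average")
print(f"{minutes_per_job:.1f} CPU-minutes per job on average")
```

The first figure reproduces the 1.8 years quoted on the slide. The others work out to roughly 11,400 jobs per day, about 28 CPUs busy on average, and about 3.6 CPU-minutes per job.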
Distribution of seismogram jobs
[Histogram: Num of Jobs (log scale, 1 to 100,000) versus Time (mins), from 10 to 4200 minutes; annotation: 70 hours]
Execution Sites
[Chart: NUM JOBS and DAYS per execution site (local, ncsa, sdsc), log scale from 1 to 1,000,000]
Observations from working with the Scientists
Two-way street: they give us feedback on our technologies, we show them how things run (break) at scale
We have seen great performance improvements in the codes
Some other Pegasus Application Domains
Laser Interferometer Gravitational-Wave Observatory (LIGO)
Galaxy morphology (NVO)
Tomography for neural structure reconstruction (NIH)
High-energy physics
Gene alignment
Natural Language processing
Courtesy of David Meyers, Caltech
LIGO has used Pegasus to run on the Open Science Grid at SC’05
Benefits of the workflow & Pegasus approach
Pegasus can run the workflow on a variety of resources
Pegasus can run a single workflow across multiple resources
Pegasus can opportunistically take advantage of available resources (through dynamic workflow mapping)
Pegasus can take advantage of pre-existing intermediate data products
Pegasus can improve the performance of the application
Benefits of the workflow & Pegasus approach
Pegasus shields the user from the Grid details
The workflow exposes
- the structure of the application
- maximum parallelism of the application
Pegasus can take advantage of the structure to
- Set a planning horizon (how far into the workflow to plan)
- Cluster a set of workflow nodes to be executed as one (for performance)
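Both uses of workflow structure can be sketched from one quantity, the depth of each node in the DAG. The code below is a minimal sketch. It assumes that clustering groups nodes of the same depth into fixed-size batches and that the planning horizon is expressed as a number of levels. Neither is confirmed as the actual Pegasus strategy, and all function and node names are hypothetical.

```python
from collections import defaultdict


def levels(parents):
    """parents: dict node -> list of parent nodes. Returns node -> depth (roots are 0)."""
    depth = {}

    def visit(node):
        if node not in depth:
            depth[node] = 1 + max((visit(p) for p in parents[node]), default=-1)
        return depth[node]

    for node in parents:
        visit(node)
    return depth


def cluster_by_level(parents, size):
    """Group nodes of equal depth into batches of at most `size`, each run as one job."""
    by_level = defaultdict(list)
    for node, d in sorted(levels(parents).items()):
        by_level[d].append(node)
    return [nodes[i:i + size]
            for _, nodes in sorted(by_level.items())
            for i in range(0, len(nodes), size)]


def within_horizon(parents, horizon):
    """Nodes in the first `horizon` levels: the part of the workflow to plan now."""
    return sorted(n for n, d in levels(parents).items() if d < horizon)


# Hypothetical workflow: four independent tasks feeding one combining task.
parents = {"p1": [], "p2": [], "p3": [], "p4": [], "add": ["p1", "p2", "p3", "p4"]}
clusters = cluster_by_level(parents, size=2)
first_stage = within_horizon(parents, horizon=1)
print(clusters, first_stage)
```

With a batch size of 2, the four independent tasks become two clustered jobs, which reduces per-job scheduling overhead at the cost of some parallelism.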
Pegasus Research
resource discovery and assessment
resource selection
resource provisioning
workflow restructuring
- tasks merged together or reordered to improve overall performance
adaptive computing
- workflow refinement adapts to a changing execution environment
workflow debugging
Software releases
Pegasus http://pegasus.isi.edu
- released as part of the GriPhyN Virtual Data System (VDS)
Collaborators in VDS: Ian Foster (ANL), Mike Wilde (ANL) and Jens Voeckler (UofC)
http://vds.isi.edu