Transcript Slide 1

Pegasus: Planning for Execution in Grids
http://pegasus.isi.edu
Virtual Data Concepts
-- Capture and manage information about relationships among
> Data (of widely varying representations)
> Programs (and their execution needs)
> Computations (and their execution environments)
-- Apply this information to, e.g.
> Discovery: data and program discovery
> Workflow: a structured paradigm for organizing, locating, specifying, and requesting data
> Explanation: provenance
-- Research part of the NSF-funded GriPhyN project
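[Figure: Pegasus architecture diagram. Only the labels survive; the grouping below is inferred from label order and may not match the original figure.]
-- Pegasus command line clients: Rls-client, Genpoolconfig client, Tc-client, Rls-query-client
-- Replica Query and Registration Mechanism: RLS
-- Transformation Catalog Mechanism (TC)
-- Resource Information Catalog: MDS
-- Catalog back ends whose placement is not recoverable: File (twice), Database
-- PEGASUS ENGINE: CPlanner (gencdag)
-- Replica Selection: RLS
-- Site Selector: Round Robin, Random, Min-Min, Max-Min, Group, Grasp, NonJava Callout
-- Data Transfer Mechanism: Stork, GridLab transfer, Transfer2, globus-url-copy, Multiple Transfer
-- Submit Writer: Condor, GridLab GRMS, Stork Writer
-- Legend: Existing Interfaces, Production Implementations, Interfaces in development, Research Implementations
-- Other label: Logic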
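The site selector labels above include Round Robin and Min-Min. As a rough illustration of what strategies with these names usually do, here is a minimal Python sketch of the textbook versions of both heuristics. It is not taken from Pegasus: the function names, the job and site names, and the runtime estimates are all hypothetical.

```python
from itertools import cycle


def round_robin(jobs, sites):
    """Assign jobs to sites in turn, ignoring any runtime estimates."""
    site_cycle = cycle(sites)
    return {job: next(site_cycle) for job in jobs}


def min_min(runtimes):
    """Textbook Min-Min heuristic (assumed form, not Pegasus code).

    runtimes[job][site] is a hypothetical estimated runtime.
    Repeatedly pick the (job, site) pair with the smallest completion
    time, assign it, and advance that site's ready time.
    """
    ready = {site: 0.0 for row in runtimes.values() for site in row}
    unscheduled = set(runtimes)
    schedule = {}
    while unscheduled:
        # Smallest completion time over all remaining jobs and their sites.
        finish, job, site = min(
            (ready[s] + t, j, s)
            for j in sorted(unscheduled)
            for s, t in runtimes[j].items()
        )
        schedule[job] = site
        ready[site] = finish
        unscheduled.remove(job)
    return schedule
```

Round robin needs no performance information at all, whereas Min-Min depends on runtime estimates being available for every job and site pair, which is an assumption of this sketch.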
Pegasus: Planning for Execution in Grids
-- Maps from abstract to concrete workflow
> Algorithmic and AI-based techniques
-- Automatically locates physical locations for both components (transformations) and data
> Uses Globus RLS and the Transformation Catalog
-- Finds appropriate resources to execute
> via Globus MDS
-- Reuses existing data products where applicable
-- Publishes newly derived data products
> Chimera virtual data catalog & MCS
-- Uses Globus CoG Kit for authentication
[Figure: workflow mapping pipeline. Only the labels survive; the order of the arrows is not recoverable from the extracted text.]
-- Inputs: Abstract Workflow; MDS (available resources); Pool config; RLS (available data); TC
-- Steps: Check Resource Access; Reduce the Workflow; Perform Site Selection; Cluster Individual Jobs; Add Transfer Nodes; Write Submit Files
-- Components: Site Selector; Replica Selector
-- Outputs: Fully Instantiated Workflow; DAGMan/Condor-G
-- Other label: file
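The text above says Pegasus reuses existing data products where applicable, and the diagram labels include "Reduce the Workflow" and "Add Transfer Nodes". One way to picture this is the following minimal Python sketch. It assumes a hypothetical dictionary layout for jobs and a plain set standing in for the replica catalog; it is an illustration of the idea, not the Pegasus algorithm or its API.

```python
def reduce_workflow(jobs, replicas, requested):
    """Keep only the jobs still needed to produce the requested files.

    jobs: {name: {"inputs": [...], "outputs": [...]}} (hypothetical layout)
    replicas: set of logical file names that already exist somewhere
    requested: logical file names the user asked for
    """
    producer = {f: name for name, job in jobs.items() for f in job["outputs"]}
    needed = set()
    pending = [f for f in requested if f not in replicas]
    while pending:
        lfn = pending.pop()
        job = producer.get(lfn)
        if job is None or job in needed:
            continue
        needed.add(job)
        # Inputs that already have a replica do not need to be recomputed.
        pending.extend(f for f in jobs[job]["inputs"] if f not in replicas)
    return {name: job for name, job in jobs.items() if name in needed}


def stage_in_files(reduced, replicas):
    """Existing files the remaining jobs read; candidates for transfer nodes."""
    return sorted(
        {f for job in reduced.values() for f in job["inputs"] if f in replicas}
    )
```

For a three-job chain in which the intermediate file already exists, only the last job survives the reduction, and the intermediate file becomes a stage-in candidate.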
Planning and Scheduling Granularity
-- Partitioning
> Allows setting the granularity for planning ahead.
-- Node Aggregation
> Allows combining nodes in the workflow and scheduling them as one unit.
> Minimizes scheduling overhead and planning overhead.
-- Related But Separate Concepts
-- Small Jobs
> High level of Node Aggregation
> Large Partitions
-- Very Dynamic System
> Small Partitions
Deferred Planning
[Figure: deferred planning. Labels: PW A, PW B, PW C; New Abstract Workflow; A Particular Partitioning.]
Re-planning
-- Leverage Condor’s job retry mechanism to trigger a retry on a partition in case of failure.
-- Parse Condor log files to determine the sites at which jobs failed.
-- Subsequent invocations of Pegasus on the same partition are aware of the bad sites.
[Figure: re-planning. Labels: "Pegasus’ log files record sites considered"; the label group "Pegasus(A) = Submit(A)", "DAGMan(Submit(A))", "Retry Y times" appears three times in the extracted text.]
Legend:
-- Pegasus(X): Pegasus generates the concrete workflow and the submit files for X; the result is written Submit(X).
-- DAGMan(Submit(X)): DAGMan executes the concrete workflow for X.
Grid Setup
[Figure: grid setup. Labels, ungrouped because the layout is not recoverable: Pegasus Portal; Submit Host; Chimera; Pegasus; DAGMan; Information; Transformation Catalog; Pool Configuration Info; Properties; MDS Index; RLI; LRC; MDS local; jobs; jobmanager; Compute Resource; GridFTP Server; Storage System; Pool of Resources (six occurrences).]
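The re-planning notes earlier in this transcript describe retrying a failed partition while making later planning passes aware of the sites where jobs failed. A minimal Python sketch of that loop follows. The `plan` and `execute` callables are hypothetical stand-ins for the planner and for workflow execution, and the boolean result replaces the Condor log parsing mentioned in the notes; none of these names come from Pegasus or Condor.

```python
def run_partition(partition, sites, plan, execute, retries=3):
    """Plan and run one partition, re-planning after each failure.

    plan(partition, allowed_sites) -> chosen site (hypothetical planner)
    execute(partition, site) -> True on success (hypothetical executor)
    retries plays the role of "Retry Y times" on the slide.
    """
    bad_sites = set()
    attempts = []
    for _ in range(retries + 1):  # first attempt plus the retries
        allowed = [s for s in sites if s not in bad_sites]
        if not allowed:
            break  # every known site has already failed
        site = plan(partition, allowed)
        attempts.append(site)
        if execute(partition, site):
            return site, attempts
        bad_sites.add(site)  # the next planning pass avoids this site
    return None, attempts
```

Usage, with a planner that always takes the first allowed site and an executor that only succeeds on one site:

```python
site, tried = run_partition(
    "PW A", ["s1", "s2", "s3"],
    plan=lambda p, allowed: allowed[0],
    execute=lambda p, s: s == "s3",
)
```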
People Involved:
USC/ISI: Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet Singh,
Mei-Hui Su, Karan Vahi, James Blythe, Yolanda Gil