Transcript Slide 1

Planning the SCEC Pathways: Pegasus at work on the Grid

The Grid is a complex, distributed, and heterogeneous execution environment. Running applications on it requires knowledge of many Grid services: users need to discover the available resources and schedule jobs onto them, essentially composing detailed application workflow descriptions by hand. This leaves users struggling with the complexity of the Grid, weighing which resources to use, where to run the computations, and where to access the data. There is therefore a need to automate workflow generation and execution as much as possible.

Pegasus: Planning for Execution in Grids

http://pegasus.isi.edu

• Maps from abstract to concrete workflow.

• Isolates the user from many Grid details.

• Automatically finds the physical locations of both components (transformations) and data, via Globus RLS and the Transformation Catalog.

• Finds appropriate resources to execute the components (via Globus MDS).

• Interfaces with external site selectors.

• Publishes newly derived data products.

• Reuses existing data products where applicable.

• Supports on-demand staging of binary executables.
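The mapping described in the list above can be illustrated with a minimal Python sketch. It is not Pegasus code and does not use any Pegasus, RLS, or MDS API: the catalogs are plain dictionaries standing in for the replica and transformation catalogs, and every name in it (`plan`, `replica_catalog`, `transformation_catalog`, the site names, paths, and URLs) is hypothetical. Site choice is a simple rotation over the sites that offer the executable, chosen only for brevity, and data reuse is reduced to dropping jobs whose outputs are already registered.

```python
# Hypothetical stand-ins for the catalogs (not the real RLS / TC interfaces).
# Replica catalog: logical file name -> list of physical URLs.
replica_catalog = {
    "input.dat": ["gsiftp://host1.example.org/data/input.dat"],
    "mesh.out": ["gsiftp://host2.example.org/data/mesh.out"],
}
# Transformation catalog: logical transformation -> {site: executable path}.
transformation_catalog = {
    "prepare": {"siteA": "/opt/app/prepare", "siteB": "/usr/local/bin/prepare"},
    "simulate": {"siteA": "/opt/app/simulate"},
}
# Abstract workflow: jobs name only logical transformations and logical files.
abstract_workflow = [
    {"id": "j1", "transformation": "prepare",
     "inputs": ["input.dat"], "outputs": ["mesh.out"]},
    {"id": "j2", "transformation": "simulate",
     "inputs": ["mesh.out"], "outputs": ["waves.bin"]},
]


def plan(workflow, replicas, transformations):
    """Map an abstract workflow to a concrete one (illustrative only)."""
    # Data reuse: skip jobs whose outputs are all already registered.
    needed = [job for job in workflow
              if not all(out in replicas for out in job["outputs"])]
    # Files that will be created by the jobs that remain.
    produced = {out for job in needed for out in job["outputs"]}
    concrete = []
    for index, job in enumerate(needed):
        candidates = transformations.get(job["transformation"], {})
        if not candidates:
            raise LookupError(f"no site provides {job['transformation']!r}")
        # Simple rotation over the sites that have the executable.
        sites = sorted(candidates)
        site = sites[index % len(sites)]
        inputs = {}
        for lfn in job["inputs"]:
            if lfn in replicas:
                inputs[lfn] = replicas[lfn][0]  # use the first known replica
            elif lfn in produced:
                inputs[lfn] = None  # created earlier in this workflow
            else:
                raise LookupError(f"no replica or producer for {lfn!r}")
        concrete.append({
            "id": job["id"],
            "site": site,
            "executable": candidates[site],
            "inputs": inputs,
            "outputs": list(job["outputs"]),
        })
    return concrete
```

With the sample catalogs, `plan(abstract_workflow, replica_catalog, transformation_catalog)` keeps only `j2`, because the output of `j1` (`mesh.out`) is already registered in the replica catalog. A real planner would also add data transfer and registration jobs, which this sketch leaves out.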

PEGASUS ENGINE

[Diagram: the Pegasus engine, built around the CPlanner (gencdag). Figure labels, grouped by interface:]

• Pegasus command line clients: rls-client, tc-client, genpoolconfig client
• Replica Query and Registration Mechanism: RLS
• Transformation Catalog Mechanism (TC): File, Database
• Resource Information Catalog: MDS, File
• Replica Selection: RLS
• Site Selector: Random, Round Robin, Grasp, Min-Min, Max-Min, Prophesy
• Data Transfer Mechanism: Globus url-copy, Transfer2, Multiple Transfer, Stork, Gridlab transfer
• Submit Writer: Condor, Stork Writer, GridLab GRMS
• Legend: Existing Interfaces, Interfaces in development; Production Implementations, Research Implementations

SCEC COMPOSITION PROCESS

[Diagram labels: CAT Knowledge Base, SCEC Datatype DB, Compositional Analysis Tool (CAT), Pathway Composition Tool, Grid-Based Data Selector, Metadata Catalog Service, DAX Generator, DAX, Pegasus, Grid (host1, host2, data), hazard map]

A View of SCEC Composition Process

Testbed

[Diagram labels: Replica Location Service, DAG, RSL, Condor DAGMan]

• CAT (Compositional Analysis Tool), an ontology-based workflow composition tool, or PCT (Pathway Composition Tool) generates the application workflow template (using ontologies and data types).

• The Grid-Based Input Data Selection component allows the user to select the input data necessary to populate the workflow template. The result is an abstract workflow that refers only to the logical application components and logical input data required for a pathway.

• The DAX generator translates the abstract workflow to a corresponding XML description (DAX).

• Pegasus takes in the DAX and generates the concrete workflow.

• The concrete workflow identifies the resources used to run the jobs on the Grid and refers to the physical locations of the input data.

• Condor DAGMan submits the workflow to the Grid and tracks its execution.

• Successful execution generates the final hazard map for the region.
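To make the DAX generation step more concrete, the sketch below serializes a small abstract workflow to XML using Python's standard `xml.etree.ElementTree` module. The element and attribute names are a simplified, hypothetical layout chosen for illustration and should not be read as the actual DAX schema; the function name `to_dax_like_xml` and the job and file names are likewise assumptions.

```python
import xml.etree.ElementTree as ET


def to_dax_like_xml(jobs, dependencies):
    """Serialize an abstract workflow to a simplified, DAX-like XML string.

    jobs:         {job_id: {"transformation": str, "inputs": [...], "outputs": [...]}}
    dependencies: {child_job_id: [parent_job_id, ...]}
    """
    root = ET.Element("adag", name="pathway-example")
    for job_id, info in jobs.items():
        job = ET.SubElement(root, "job", id=job_id, name=info["transformation"])
        # Only logical file names appear here; no physical locations.
        for lfn in info["inputs"]:
            ET.SubElement(job, "uses", file=lfn, link="input")
        for lfn in info["outputs"]:
            ET.SubElement(job, "uses", file=lfn, link="output")
    # Dependencies are recorded as child elements listing their parents.
    for child_id, parent_ids in dependencies.items():
        child = ET.SubElement(root, "child", ref=child_id)
        for parent_id in parent_ids:
            ET.SubElement(child, "parent", ref=parent_id)
    return ET.tostring(root, encoding="unicode")


example_jobs = {
    "ID1": {"transformation": "prepare",
            "inputs": ["input.dat"], "outputs": ["mesh.out"]},
    "ID2": {"transformation": "simulate",
            "inputs": ["mesh.out"], "outputs": ["waves.bin"]},
}
example_xml = to_dax_like_xml(example_jobs, {"ID2": ["ID1"]})
```

The resulting document names only logical transformations and logical files, which is what lets a planner decide later where each job runs and which physical replicas it reads.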

[Diagram labels: SCEC Resources, TeraGrid Resources, Other Resources; NCSA-TeraGrid, ANL-TeraGrid, PSC-TeraGrid, Caltech TeraGrid, SDSC-TeraGrid, SCEC, USC, Palm Springs]

SCEC Workflow

• The preparation job prepares the input for the Pathway 2 simulation.

• Pathway 2 is a Fortran-based wave propagation MPI code.

• Pathway2PGV reads the binary output file generated by Pathway 2 and converts it into a hazard map that can be visualized.
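The three jobs above form a simple chain, and the order in which a workflow manager must release them follows from their dependencies. The sketch below derives that order with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The job labels are illustrative names for this example only, and the code merely shows the ordering constraint; it is not how Condor DAGMan is invoked.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (labels are illustrative).
scec_dependencies = {
    "preparation": set(),
    "pathway2": {"preparation"},    # needs the prepared input
    "pathway2pgv": {"pathway2"},    # needs the binary simulation output
}


def execution_order(dependencies):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


scec_order = execution_order(scec_dependencies)
```

For this chain the only valid order is preparation, then Pathway 2, then Pathway2PGV; in workflows with independent branches, several orders would be valid and the branches could run concurrently.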

Other Success Stories

• Laser Interferometer Gravitational Wave Observatory (LIGO)

http://www.ligo.caltech.edu

• Montage

http://montage.ipac.caltech.edu

• BLAST Genome Analysis and Database Update

http://www-fp.mcs.anl.gov/pdq/pdq.htm

• ATLAS Monte Carlo data production

• Sloan Digital Sky Survey galaxy cluster finding

http://www.sdss.org

People Involved:

ISI:

Ewa Deelman, Sridhar Gullapalli, Carl Kesselman, John McGee, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi

SCEC:

Vipin Gupta, Phil Maechling

USC:

Maureen Dougherty, Brian Mendenhall, Garrick Staples