Pegasus: A Framework for Workflow Planning on the Grid
Ewa Deelman, USC Information Sciences Institute
Acknowledgments: Carl Kesselman, Gaurang Mehta, Mei-Hui Su, Gurmeet Singh, Karan Vahi
pegasus.isi.edu
Pegasus
Flexible framework that maps abstract workflows onto the Grid.
Possesses well-defined APIs and clients for:
– Information gathering
  > Resource information
  > Replica query mechanism
  > Transformation catalog query mechanism
– Resource selection
  > Compute site selection
  > Replica selection
– Data transfer mechanism
Can support a variety of workflow executors.

Pegasus
– May reduce the workflow based on available data products (sketched at the end of this section).
– Augments the workflow with data stage-in and data stage-out nodes.
– Augments the workflow with data registration nodes.
[Figure: an abstract workflow of jobs a–i and the refined workflow produced from it. Key: original node, pull transfer node, push transfer node, inter-pool transfer node, registration node.]

Pegasus Components
[Architecture diagram: the Pegasus engine (CPlanner/gencdag), driven by command-line clients such as rls-client, tc-client, and genpoolconfig, uses well-defined interfaces to a Replica Query and Registration Mechanism (RLS, file-based), a Transformation Catalog Mechanism (file, database), a Resource Information Catalog (MDS), Replica Selection (RLS, file-based), Site Selectors (Round Robin, Random, Min-Min, Max-Min, Grasp, Prophesy), a Data Transfer Mechanism (Transfer2, globus-url-copy, Multiple Transfer, Stork, GridLab transfer), and Submit Writers (Condor, Stork Writer, GridLab GRMS). The diagram distinguishes existing interfaces and production implementations from interfaces in development and research implementations.]

Original Pegasus Configuration
– The original abstract workflow is mapped by Pegasus(Abstract Workflow) into a concrete workflow, which DAGMan(CW) then executes on the Grid.
– Simple scheduling: random or round robin, using well-defined scheduling interfaces.
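The workflow-reduction step mentioned above (dropping jobs whose outputs are already available as registered data products, and then jobs that only fed them) can be illustrated with a small sketch. This is a minimal Python illustration under assumed data structures, not the actual Pegasus implementation; job names, the edge representation, and the registered_outputs set are hypothetical.

    # Minimal sketch of workflow reduction: drop jobs whose outputs are already
    # registered in the replica catalog, then drop ancestors that only fed them.
    # Data structures are illustrative, not Pegasus' actual ones.

    def reduce_workflow(jobs, edges, registered_outputs):
        """jobs: dict job_id -> set of output filenames
           edges: set of (parent_id, child_id) dependencies
           registered_outputs: filenames already known to the replica catalog"""
        # A job can be pruned if every file it produces is already registered.
        pruned = {j for j, outs in jobs.items() if outs and outs <= registered_outputs}

        # Repeatedly prune jobs whose children are all pruned: no remaining
        # job consumes their outputs, so they need not run either.
        changed = True
        while changed:
            changed = False
            for j in set(jobs) - pruned:
                children = {c for (p, c) in edges if p == j}
                if children and children <= pruned:
                    pruned.add(j)
                    changed = True

        remaining_jobs = {j: outs for j, outs in jobs.items() if j not in pruned}
        remaining_edges = {(p, c) for (p, c) in edges
                           if p not in pruned and c not in pruned}
        return remaining_jobs, remaining_edges

    # Example: "b" produced file "f2", which is already registered, so "b" and
    # its now-unneeded parent "a" are dropped; "c" will later get a stage-in
    # node for f2 during workflow augmentation.
    jobs  = {"a": {"f1"}, "b": {"f2"}, "c": {"f3"}}
    edges = {("a", "b"), ("b", "c")}
    print(reduce_workflow(jobs, edges, registered_outputs={"f2"}))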
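The "simple scheduling: random or round robin" of the original configuration can be expressed through a small site-selector interface. A minimal sketch, assuming a plug-in style interface; the class and method names are illustrative and do not reproduce Pegasus' actual API.

    import random
    from itertools import cycle

    # Minimal sketch of pluggable site selection (round robin / random).
    # The SiteSelector interface and names are illustrative only.

    class SiteSelector:
        def select(self, job, candidate_sites):
            raise NotImplementedError

    class RoundRobinSelector(SiteSelector):
        def __init__(self, sites):
            self._sites = list(sites)
            self._rotation = cycle(self._sites)

        def select(self, job, candidate_sites):
            # Walk the rotation until it lands on a site eligible for this job;
            # one full pass over the configured sites is enough to find one.
            for _ in range(len(self._sites)):
                site = next(self._rotation)
                if site in candidate_sites:
                    return site
            raise RuntimeError(f"no eligible site for job {job}")

    class RandomSelector(SiteSelector):
        def select(self, job, candidate_sites):
            return random.choice(sorted(candidate_sites))

    # Example: spread three jobs across three sites in turn.
    sites = ["site_A", "site_B", "site_C"]
    selector = RoundRobinSelector(sites)
    print({job: selector.select(job, set(sites)) for job in ["j1", "j2", "j3"]})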
Deferred Planning through Partitioning
– The original abstract workflow is divided into partial workflows (partitions), e.g. PW A, PW B, PW C in one particular partitioning, yielding a new abstract workflow whose nodes are the partitions.
– A variety of planning algorithms can be implemented.

Mega DAG is created by Pegasus and then submitted to DAGMan
– Pegasus(X): Pegasus generates the concrete workflow and the submit files for partition X, i.e. Su(X).
– DAGMan(Su(X)): DAGMan executes the concrete workflow for partition X.
– The mega DAG chains these pairs: Pegasus(A) = Su(A), DAGMan(Su(A)); Pegasus(B) = Su(B), DAGMan(Su(B)); Pegasus(C) = Su(C), DAGMan(Su(C)). A sketch of this construction appears at the end of this section.

Re-planning Capabilities
– The same mega DAG structure is used, but each partition's Pegasus(X)/DAGMan(Su(X)) pair is retried up to Y times on failure.
– Pegasus' log files record the sites considered.

Complex Replanning for Free (almost)
– When a job in a planned partition fails, the retry re-invokes Pegasus on the original partition, so re-planning comes (almost) for free.
[Figure: the original abstract workflow partition (jobs A–D, files f1–f4); the Pegasus mapping, in which f2 and f3 were found in a replica catalog; the concrete workflow submitted to DAGMan (move f1 to R2, execute C at R2, move f2 and f3 to R1, execute D at R1, move f4 to the output location); a failure of "Execute C at R2"; and the new mapping produced when Pegasus is called again with the original partition, here assuming R1 was picked again.]

Optimizations
– If the workflow being refined by Pegasus consists of only one node:
  > Create a Condor submit node rather than a DAGMan node.
  > This optimization can leverage Euryale's super-node writing component.

Planning & Scheduling Granularity
– Partitioning
  > Allows the granularity of planning to be set ahead of time.
– Node aggregation
  > Allows nodes in the workflow to be combined and scheduled as one unit (minimizes the scheduling overheads); see the sketch at the end of this section.
  > May reduce the overheads of making scheduling and planning decisions.
– Related but separate concepts
  > Small jobs: high level of node aggregation, large partitions.
  > Very dynamic system: small partitions.

Montage (NASA and NVO)
– Delivers science-grade custom mosaics on demand.
– Produces mosaics from a wide range of data sources (possibly in different spectra).
– User-specified parameters of projection, coordinates, size, rotation, and spatial sampling.
– Montage team: Bruce Berriman, John Good, Anastasia Laity (Caltech/IPAC); Joseph C. Jacob, Daniel S. Katz (JPL).
– Large runs: 6- and 10-degree DAGs (for the M16 cluster); the 6-degree runs had about 13,000 compute jobs and the 10-degree run had about 40,000 compute jobs.
[Figure: mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.]
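The mega-DAG structure described above can be made concrete with a small generator: each partition gets a planning job followed by the execution of its planned sub-workflow, with a retry count on both. A minimal sketch; the file names are hypothetical, the keywords follow general Condor DAGMan conventions rather than Pegasus' actual generated files, and per-node RETRY is only an approximation of the real behavior, where a failed partition triggers re-planning of that partition.

    # Minimal sketch: build a "mega DAG" in which each partition X gets a
    # planning job (Pegasus(X) -> Su(X)) followed by execution of Su(X),
    # with retries.  Names, files, and structure are illustrative only.

    def mega_dag(partitions, deps, retries=3):
        """partitions: ordered partition names, e.g. ["A", "B", "C"]
           deps: (upstream, downstream) pairs between partitions
           retries: how many times a failed node is retried (Y in the slides)"""
        lines = []
        for p in partitions:
            lines.append(f"JOB plan_{p} plan_{p}.sub")           # Pegasus(p): plan partition p, producing Su(p)
            lines.append(f"SUBDAG EXTERNAL run_{p} Su_{p}.dag")  # DAGMan(Su(p)): execute the planned partition
            lines.append(f"PARENT plan_{p} CHILD run_{p}")
            lines.append(f"RETRY plan_{p} {retries}")
            lines.append(f"RETRY run_{p} {retries}")
        for up, down in deps:
            # A partition is planned only after its upstream partition has run,
            # so planning sees the freshest resource and replica information.
            lines.append(f"PARENT run_{up} CHILD plan_{down}")
        return "\n".join(lines)

    print(mega_dag(["A", "B", "C"], [("A", "B"), ("B", "C")]))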
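Node aggregation, as discussed under planning and scheduling granularity, can be illustrated by grouping small jobs mapped to the same site into clusters that are submitted as one unit. A minimal sketch; the grouping policy and data structures are hypothetical and far simpler than Pegasus' actual clustering.

    from collections import defaultdict

    # Minimal sketch of node aggregation: jobs mapped to the same site are
    # grouped into fixed-size clusters submitted as one unit, reducing
    # per-job scheduling overhead.  Structures are illustrative only.

    def aggregate(job_to_site, cluster_size=10):
        by_site = defaultdict(list)
        for job, site in job_to_site.items():
            by_site[site].append(job)

        clusters = []
        for site, jobs in by_site.items():
            for i in range(0, len(jobs), cluster_size):
                clusters.append({"site": site, "jobs": jobs[i:i + cluster_size]})
        return clusters

    # Example: 25 small jobs on two sites collapse into a handful of clusters.
    mapping = {f"job{i}": ("site_A" if i % 2 else "site_B") for i in range(25)}
    for c in aggregate(mapping, cluster_size=10):
        print(c["site"], len(c["jobs"]))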
Montage Workflow
[Workflow diagram: data stage-in nodes feed the Montage compute nodes; mProject1–3 reproject the input images; mDiff1 and mDiff2 compute the overlap differences D12 and D23; mFitplaneD12 and mFitplaneD23 fit planes of the form ai·x + bi·y + ci = 0 to the difference images; mBgModel combines the fits into background-correction planes (a·x + b·y + c = 0, d·x + e·y + f = 0); mBackground1–3 apply the corrections; mAdd assembles the final mosaic; data stage-out and inter-pool transfer nodes complete the workflow. A sketch of the plane-fitting step appears at the end of this section.]

Future Work
– Staging in executables on demand.
– Expanding the scheduling plug-ins.
– Investigating various partitioning approaches.
– Investigating reliability across partitions.

Non-GriPhyN Applications Using Pegasus
Galaxy Morphology (National Virtual Observatory)
– Investigates the dynamical state of galaxy clusters.
– Explores galaxy evolution within the context of large-scale structure.
– Uses galaxy morphologies as a probe of the star-formation and stellar-distribution history of the galaxies within the clusters.
– Data-intensive computations involving hundreds of galaxies in a cluster.
[Figure caption: The X-ray emission is shown in blue, and the optical emission in red. The colored dots are located at the positions of the galaxies within the cluster; the dot color represents the value of the asymmetry index. Blue dots, the most asymmetric galaxies, are scattered throughout the image, while orange dots, the most symmetric and indicative of elliptical galaxies, are concentrated more toward the center.]

BLAST
– BLAST: a set of sequence-comparison algorithms used to search sequence databases for optimal local alignments to a query.
– Two major runs were performed using Chimera and Pegasus:
  1) 60 genomes (4,000 sequences each) processed in 24 hours
     > Genomes selected from DOE-sponsored sequencing projects
     > 67 CPU-days of processing time delivered
     > ~10,000 Grid jobs
     > >200,000 BLAST executions
     > 50 GB of data generated
  2) 450 genomes processed
– Speedups of 5–20x were achieved because the compute nodes were used efficiently, by keeping the submission of jobs to the compute cluster constant.
– Led by Veronika Nefedova (ANL) as part of the PACI Data Quest Expedition program.

Biology Applications (cont'd)
Tomography (NIH-funded project)
– Derivation of 3D structure from a series of 2D electron-microscopic projection images.
– Reconstruction and detailed structural analysis of:
  > complex structures like synapses
  > large structures like dendritic spines
– Acquisition and generation of huge amounts of data.
– Large amount of state-of-the-art image processing required to segment structures from the extraneous background.
[Figure: dendrite structure to be rendered by tomography.]
– Work performed by Mei-Hui Su with Mark Ellisman, Steve Peltier, Abel Lin, Thomas Molina (SDSC).

Southern California Earthquake Center
– The SCEC/IT project, funded by the NSF, is developing a new framework for physics-based simulations for seismic hazard analysis, building on several information technology areas, including knowledge representation and reasoning, knowledge acquisition, grid computing, and digital libraries.
[Architecture diagram: the Compositional Analysis Tool (CAT), with its knowledge base and the SCEC datatype DB, and the Pathway Composition Tool drive a Grid-Based Data Selector backed by the Replica Location Service and the Metadata Catalog Service; a DAX Generator produces a DAX that Pegasus turns into a DAG, which Condor DAGMan executes on Grid hosts (host1, host2) to produce a hazard map.]
– People involved: Vipin Gupta, Phil Maechling (USC).
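The plane-fitting step in the Montage workflow above (mFitplane fits a plane to each overlap-difference image; mBgModel then derives consistent background-correction planes from those fits) can be sketched with a least-squares fit of z ≈ a·x + b·y + c to the difference pixels. This is a minimal numpy illustration of the idea only, not Montage's mFitplane code; the synthetic image in the example is made up.

    import numpy as np

    # Minimal sketch of the plane-fitting idea behind mFitplane: fit
    # z ~ a*x + b*y + c to the pixels of an overlap-difference image by
    # least squares.  Illustrative only; not Montage code.

    def fit_plane(diff_image):
        h, w = diff_image.shape
        ys, xs = np.mgrid[0:h, 0:w]                 # pixel coordinates
        A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        coeffs, *_ = np.linalg.lstsq(A, diff_image.ravel(), rcond=None)
        return coeffs                               # (a, b, c) of the fitted plane

    # Example: recover a known plane (a=0.5, b=-0.2, c=3.0) from a synthetic
    # difference image with a little noise added.
    ys, xs = np.mgrid[0:64, 0:64]
    synthetic = 0.5 * xs - 0.2 * ys + 3.0 + 0.01 * np.random.randn(64, 64)
    print(fit_plane(synthetic))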