Pegasus: Running Large-Scale Scientific Workflows on the TeraGrid

Ewa Deelman, USC Information Sciences Institute
[email protected], www.isi.edu/~deelman, http://pegasus.isi.edu

Acknowledgements
Carl Kesselman, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Technologies, ISI)
James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
Research funded as part of the NSF GriPhyN, NVO, and SCEC projects, the NIH-funded CRCNS project, and the EU-funded GridLab. Thanks for the use of the TeraGrid.

Outline
- Applications as workflows
- Pegasus (Planning for Execution in Grids)
- Montage application (astronomy; NSF & NASA)
- CyberShake (Southern California Earthquake Center)
- Results from running on the TeraGrid
- Conclusions

Today's Scientific Applications
- Applications are increasing in complexity:
  - use of individual application components
  - components supplied by various individuals
  - reuse of individual intermediate data products (files)
- The execution environment is complex and very dynamic:
  - resources come and go
  - data is replicated
  - components can be found at various locations or staged in on demand
- Separation between the application description and the actual execution description
- Applications are being described in terms of workflows

Scientific Analysis as Workflow Evolution
- Construct the analysis -> Workflow Template
- Select the input data -> Abstract Workflow
- Map the workflow onto available resources -> Executable Workflow
- Execute the workflow -> tasks executed on Grid resources
The first two steps are user-guided, drawing on a library of application components (component characteristics) and data catalogs (data properties); the last two are automated, drawing on information services (resource availability and characteristics).

Executable Workflow Generation and Mapping
Abstract workflows are produced by application-dependent front ends:
- intelligent workflow composition tools (WINGS and CAT, used for natural language processing), developed at ISI by Y. Gil
- the Virtual Data Language (VDL, used by GTOMO, HEP, biology, and others), developed at ANL and U. of Chicago by I. Foster, J. Voeckler, and M. Wilde
- application-specific Abstract Workflow Services (LIGO, SCEC, Montage)
The abstract workflow then goes to the application-independent Pegasus, which generates an executable workflow run as Condor DAGMan jobs on Grid resources; results are returned to the user.

Pegasus: Planning for Execution in Grids
- Maps from abstract to executable workflow
- Automatically locates physical locations for both workflow components and data
- Finds appropriate resources to execute the components
- Reuses existing data products where applicable
- Publishes newly derived data products
- Provides provenance information

Information Components Used by Pegasus
- Globus Monitoring and Discovery Service (MDS), or a static file:
  - locates available resources
  - finds resource properties (dynamic: load, queue length; static: location of the GridFTP server, RLS, etc.)
- Globus Replica Location Service (RLS):
  - locates data that may be replicated
  - registers new data products
- Transformation Catalog (TC):
  - locates installed executables

Example Workflow Reduction
- Original abstract workflow: input a -> job d1 -> file b -> job d2 -> output c
- If "b" already exists (as determined by a query to the RLS), the workflow can be reduced so that only d2 needs to run, starting from b
- Also useful in case of failures

Mapping from Abstract to Executable
For the reduced workflow (b -> d2 -> c): query the RLS, MDS, and TC,
then schedule computation and data movement:
- Move b from site A to site B
- Execute d2 at B
- Move c from B to the user's site U
- Register c in the RLS

Montage: Mosaic of M42
- Mosaic of M42 created on TeraGrid resources using Pegasus
- Pegasus improved the runtime of this application by 90% over the baseline case
- Workflow with 4,500 nodes
- Bruce Berriman, John Good (Caltech); Joe Jacob, Dan Katz (JPL); Gurmeet Singh, Mei-Hui Su (ISI)

Small Montage Workflow
- ~1,200 nodes

Montage Portal (collaboration with JPL and IPAC)
The user portal (JPL) accepts a region name and a size in degrees; the 2MASS Image List Service (IPAC, m2MASSList) returns the image list; the Abstract Workflow Service (JPL, mDAGFiles) builds the abstract workflow; Pegasus (ISI) maps it to a concrete workflow; and the Grid Scheduling and Execution Service (ISI, mGridExec) runs it via Condor DAGMan on the computational Grid (ISI Condor pool; SDSC, NCSA, and other TeraGrid clusters). A User Notification Service (IPAC, mNotify) informs the user.
- Initial prototype implemented and tested on the TeraGrid
- Montage performance evaluations
- Production Montage portal open to the astronomy community this year

SCEC CyberShake
- Derive probabilistic hazard curves and maps for the Los Angeles area: 6 sites in 2005, 625 in 2006, and 10,000 in 2007
- A hazard curve gives the probability of a certain ground motion during a certain period of time; hazard maps are derived from these curves

SCEC Workflows on the TeraGrid
- Provision the resources (using resource descriptions)
- Map the workflow onto the Grid resources, producing an executable workflow
- Run the workflow on the Grid resources (dispatching tasks and collecting task info)
- Record information about the workflow

SCEC Workflows on the TG: Implementation
- Gaurang Mehta at ISI ran the experiments, with help from the nice TeraGrid folks and from Jens Voeckler and Mike Wilde (U. of Chicago, ANL)
- Condor glide-in provisions the resources (resource descriptions)
- VDS Kickstart and the Provenance Tracking Catalog (PTC) record information about the workflow
- Condor's DAGMan (University of Wisconsin-Madison) runs the executable workflow from the local machine

SCEC Computations So Far
- Two SCEC sites completed so far: Pasadena and USC, with 26 and 33 workflows; each workflow contained between 11 and 1,000 jobs, for 261,823 jobs in total
- 23 days total runtime on the NCSA and SDSC TeraGrid resources
- Failed-job recovery through retries and rescue DAGs
(Chart: total number of jobs, 0 to 100,000, broken down by job type, including data transfer, seismogram calculation, peak value calculation, and registration, plus failed jobs.)

Jobs and CPU Hours per Day
- Number of jobs per day over the 23 days: 261,823 jobs total
- Number of CPU hours per day: 15,706 hours total (1.8 years)
(Chart: jobs and CPU hours per day, October 19 through November 10, on a log scale.)

Distribution of Seismogram Jobs
(Histogram: number of seismogram jobs by runtime in minutes, from roughly 10 to 4,200 minutes on a log scale; the longest jobs ran about 70 hours.)

Execution Sites
(Chart: number of jobs and number of days used per execution site, local, NCSA, and SDSC, on a log scale.)

Observations from Working with the Scientists
- It is a two-way street: they give us feedback on our technologies; we show them how things run (and break) at scale
- We have seen great performance improvements in the codes

Some Other Pegasus Application Domains
- Laser Interferometer Gravitational-Wave Observatory (LIGO)
- Galaxy morphology (NVO)
- Tomography for neural structure reconstruction (NIH)
- High-energy physics
- Gene alignment
- Natural language processing

LIGO on the Open Science Grid
LIGO has used Pegasus to run on the Open Science Grid at SC'05 (image courtesy of David Meyers, Caltech).

Benefits of the Workflow & Pegasus Approach
- Pegasus can run the workflow on a variety of resources
- Pegasus can run a single workflow across multiple resources
- Pegasus can opportunistically take advantage of available resources (through dynamic workflow mapping)
- Pegasus can take advantage of pre-existing intermediate data products
- Pegasus can improve the performance of the application

Benefits of the Workflow & Pegasus Approach (continued)
- Pegasus shields the user from the Grid details
- The workflow exposes the structure of the application and its maximum parallelism
- Pegasus can take advantage of that structure to:
  - set a planning horizon (how far into the workflow to plan)
  - cluster a set of workflow nodes to be executed as one (for performance)

Pegasus Research
- Resource discovery and assessment
- Resource selection
- Resource provisioning
- Workflow restructuring: tasks merged together or reordered to improve overall performance
- Adaptive computing: workflow refinement adapts to the changing execution environment
- Workflow debugging

Software Releases
- Pegasus (http://pegasus.isi.edu) is released as part of the GriPhyN Virtual Data System (VDS): http://vds.isi.edu
- Collaborators in VDS: Ian Foster (ANL), Mike Wilde (ANL), and Jens Voeckler (U. of Chicago)