Using Grid Technologies to Support
Large-Scale Astronomy Applications
Ewa Deelman ([email protected])
Center for Grid Technologies
USC Information Sciences Institute
Outline
• Large-scale applications
• Mapping large-scale applications onto Grid
environments
– Pegasus (developed by ISI under the GriPhyN project)
• Supporting Montage (an image mosaicking
application) on the Grid
• Recent results from runs on the TeraGrid
• Other applications and conclusions
Acknowledgements
Pegasus
• Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet Singh,
Mei-Hui Su, Karan Vahi (Center for Grid Technologies, ISI)
• James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
• http://pegasus.isi.edu
• Research funded as part of the NSF GriPhyN, NVO and SCEC
projects and EU-funded GridLab
Montage
• Bruce Berriman, John Good, Anastasia Laity (IPAC)
• Joseph C. Jacob, Daniel S. Katz (JPL)
• http://montage.ipac.caltech.edu/
• Montage is funded by NASA's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.
Grid Applications
• Increasing in the level of complexity
• Use of individual application components
• Reuse of individual intermediate data products
• Description of data products using metadata attributes
• Execution environment is complex and very dynamic
 – Resources are heterogeneous and distributed across the WAN
 – Resources come and go because of failures or policy changes
 – Data is replicated
 – Components can be found at various locations or staged in on demand
• Separation between
– the application description
– the actual execution description
Scientific Analysis: Workflow Evolution
• Construct the analysis → Workflow Template
• Select the input data → Abstract Workflow
• Map the workflow onto available resources → Concrete Workflow
• Execute the workflow → Tasks to be executed on Grid resources
Scientific Analysis: Workflow Evolution (with supporting services)
• Construct the analysis (user guided), drawing on a Library of Application Components and component characteristics → Workflow Template
• Select the input data (user guided), drawing on Data Catalogs and data properties → Abstract Workflow
• Map the workflow onto available resources (automated), drawing on Information Services for resource availability and characteristics → Concrete Workflow
• Execute the workflow (automated) in the Execution Environment → Tasks to be executed on Grid resources
Why Automate Workflow Generation?
Usability:
– Limit the Grid knowledge the user needs
 • Monitoring and Discovery Service
 • Replica Location Service
Complexity:
– User needs to make choices
• Alternative application components
• Alternative files
• Alternative locations
– The user may reach a dead end
– Many different interdependencies may occur among components
Solution cost:
– Evaluate the alternative solution costs
• Performance
• Reliability
• Resource Usage
Global cost:
– minimizing cost within a community or a virtual organization
– requires reasoning about individual users' choices in light of
other users' choices
Concrete Workflow Generation and Mapping
• Application-dependent stage: the Compositional Analysis Tool (CAT), the Montage Abstract Workflow Service, or Chimera supplies the workflow template, and the Input Data Selector binds the input data, yielding an abstract workflow.
• Application-independent stage: Pegasus maps the abstract workflow to a concrete workflow, which Condor DAGMan executes as jobs on Grid resources; results flow back to the user.
Specifying abstract workflows
• Using GriPhyN Tools (Chimera)
– Using the Chimera Virtual Data Language
TR galMorph( in redshift, in pixScale, in zeroPoint, in Ho, in om, in flat,
             in image, out galMorph ) {
  …
}
• Writing the abstract workflow directly
 – Using scripts that emit the workflow XML (see the sketch below)
• Using high-level workflow composition tools
 – Compositional Analysis Tool (CAT), which uses ontologies to describe workflow components
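As a rough illustration of the scripting route, the Python sketch below emits a tiny abstract workflow as XML. The element and attribute names (workflow, job, uses) are invented for illustration and are not the actual Chimera/Pegasus schema.

import xml.etree.ElementTree as ET

def build_abstract_workflow():
    # One logical job with one input and one output file; names are hypothetical.
    wf = ET.Element("workflow", name="galMorphDemo")
    job = ET.SubElement(wf, "job", id="j1", transformation="galMorph")
    ET.SubElement(job, "uses", file="image.fits", link="input")
    ET.SubElement(job, "uses", file="galMorph.out", link="output")
    return ET.tostring(wf, encoding="unicode")

print(build_abstract_workflow())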
Generating a Concrete Workflow
• Information used:
 – Location of files and component instances
 – State of the Grid resources
• Select specific:
 – Resources
 – Files
• Add the jobs required to form a concrete workflow that can be executed in the Grid environment:
 – Data movement
 – Data registration
• Each component in the abstract workflow is turned into an executable job
Example: the abstract task "FFT filea" becomes the concrete sequence: move filea from host1://home/filea to host2://home/file1 (data transfer), run /usr/local/bin/fft /home/file1, and register the new data product (data registration).
Pegasus:
Planning for Execution in Grids
• Maps from abstract to concrete workflow
– Algorithmic and AI-based techniques
• Automatically locates physical locations for both
components (transformations) and data
• Finds appropriate resources to execute
• Reuses existing data products where applicable
• Publishes newly derived data products
– Chimera virtual data catalog
– Provides provenance information
Information Components used by Pegasus
• Globus Monitoring and Discovery Service (MDS)
– Locates available resources
– Finds resource properties
• Dynamic: load, queue length
• Static: location of the GridFTP server, the RLS, etc.
• Globus Replica Location Service
– Locates data that may be replicated
– Registers new data products
• Transformation Catalog
– Locates installed executables
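For illustration only, the Transformation Catalog can be thought of as a mapping from (logical transformation, site) to the path of an installed executable; the entries and site names below are invented.

# Toy stand-in for a Transformation Catalog (invented entries and site names).
tc = {
    ("mProject", "teragrid-ncsa"): "/usr/local/montage/bin/mProject",
    ("mProject", "isi-condor"): "/opt/montage/bin/mProject",
}

def lookup(transformation, site):
    # Path of the installed executable, or None if it is not installed there.
    return tc.get((transformation, site))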
Example Workflow Reduction
• Original abstract workflow: a → d1 → b → d2 → c
• If "b" already exists (as determined by a query to the RLS), the workflow can be reduced to: b → d2 → c
Mapping from abstract to concrete
• Start from the reduced workflow: b → d2 → c
• Query the RLS, MDS, and TC; schedule computation and data movement:
 Move b from A to B → Execute d2 at B → Move c from B to U → Register c in the RLS
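Resource selection can draw on the dynamic information that MDS publishes (load, queue length); the toy heuristic below, with invented fields, prefers the shortest queue and breaks ties by load.

def select_site(sites):
    """sites: list of {"name", "load", "queue_length"} (invented fields)."""
    return min(sites, key=lambda s: (s["queue_length"], s["load"]))["name"]

print(select_site([
    {"name": "A", "load": 0.9, "queue_length": 12},
    {"name": "B", "load": 0.4, "queue_length": 3},
]))  # prints B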
Condor’s DAGMan
• Developed at UW Madison (Livny)
• Executes a concrete workflow
• Makes sure the dependencies are
followed
• Executes the jobs specified in the
workflow
– Execution
– Data movement
– Catalog updates
• Provides a “rescue DAG” in case of failure
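For a flavor of what DAGMan consumes, the sketch below writes a two-job DAG file from Python. The job and submit-file names are hypothetical; JOB and PARENT/CHILD are the standard DAGMan directives.

# Emit a minimal DAGMan input file: a stage-in job followed by a compute job.
dag = """\
JOB stage_in stage_in.sub
JOB fft fft.sub
PARENT stage_in CHILD fft
"""
with open("example.dag", "w") as f:
    f.write(dag)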
What is Montage?
• Delivers custom, science-grade image mosaics
 – User specifies projection, coordinates, spatial sampling, mosaic size, image rotation
 – Preserves astrometry and photometric accuracy
 – Modular "toolbox" design
  • Loosely coupled engines for image reprojection, background rectification, and co-addition
 – Controls testing and maintenance costs
 – Flexibility; e.g., a custom background algorithm, or use as a reprojection and co-registration engine
• Public service will be deployed on the TeraGrid
 – Order mosaics through the web portal
Montage Portal
• User Portal (JPL): the user requests a mosaic by region name and size in degrees
• Abstract Workflow Service (JPL, mDAGFiles): builds the Montage abstract workflow
• 2MASS Image List Service (IPAC, m2MASSList): returns the list of 2MASS input images
• Grid Scheduling and Execution Service (ISI, mGridExec): Pegasus maps the abstract workflow to a concrete workflow, which Condor DAGMan executes on the computational Grid (TeraGrid clusters at SDSC and NCSA, and the ISI Condor pool)
• User Notification Service (IPAC, mNotify): notifies the user when the mosaic is ready
Small Montage Workflow
[Figure: graph of a small Montage abstract workflow, ~1200 nodes]
[Figure: mosaic of M42 created on TeraGrid resources using Pegasus]
Node Clustering for Performance
(Gurmeet Singh, ISI)
• Overheads are incurred when scheduling individual nodes of the workflow
• One approach: view the workflow level by level and cluster the jobs within a level that are destined for the same host
• For example, construct as many clusters as there are available processors
[Figure: levels of the Montage workflow: mProject, mDiff, mFitplane, mConcatFit, mBgModel, mBackground, mAdd]
[Figure: total time (in minutes) to execute the concrete workflow for a mosaic covering a 6 × 6 degrees² region centered at M16]
[Figure: total time (in minutes) to execute the concrete workflow as the desired mosaic grows from 1 × 1 degrees² to 10 × 10 degrees², centered at M16; 64 processors used; x-axis: number of nodes in the abstract workflow]
Benefits of the workflow & Pegasus approach
• The workflow exposes
– the structure of the application
– maximum parallelism of the application
• Pegasus can take advantage of the structure to
– Set a planning horizon (how far into the workflow to plan)
– Cluster a set of workflow nodes to be executed as one
• Pegasus shields the user from the details of the Grid
• Pegasus can run the workflow on a variety of resources
• Pegasus can run a single workflow across multiple
resources
• Pegasus can opportunistically take advantage of available
resources (through dynamic workflow mapping)
• Pegasus can take advantage of pre-existing intermediate
data products
• Pegasus can improve the performance of the application.
Applications Using Pegasus and DAGMan
• GriPhyN applications:
– High-energy physics: ATLAS, CMS (many)
– Astronomy: SDSS (Fermilab, ANL)
– Gravitational-wave physics: LIGO (Caltech, AEI)
• Astronomy:
– Galaxy Morphology (NCSA, JHU, Fermi, many others, NVO-funded)
– Montage (IPAC, JPL, NASA-funded)
• Biology
– BLAST (ANL, PDQ-funded)
• Neuroscience
– Tomography for Telescience (SDSC, NIH-funded)
• Earthquake Science
– Simulation of earthquake propagation in soil (in the Southern
California area – SCEC)
Future directions
• Improving scheduling strategies
• Supporting the Pegasus framework through
pluggable interfaces for resource and data
selection
• Support for staging in executables on demand
• Supporting better space and resource
management (space and compute node
reservation)
• Reliability
For more information
• NVO project www.us-vo.org
• GriPhyN project www.griphyn.org
– Virtual Data Toolkit www.cs.wisc.edu/vdt
• Montage montage.ipac.caltech.edu (IRSA Booth)
• Pegasus pegasus.isi.edu
• My website www.isi.edu/~deelman