Pegasus-a framework for planning for execution in grids Karan Vahi [email protected] USC Information Sciences Institute May 5th , 2004

People Involved
USC/ISI
Advanced Systems: Ewa Deelman, Carl Kesselman,
Gaurang Mehta, Mei-Hui Su, Gurmeet Singh, Karan Vahi.
Karan Vahi, ISI
[email protected]
May 5th, 2004
Outline

Introduction To Planning

DAX

Pegasus

Portal

Demonstration
Planning in Grids


One has various alternatives out on the grid in
terms of data and compute resources.
Planning
– Select the best available resources and data sets,
and schedule them on to the grid to get the best
possible execution time.
– Plan for the data movements between the sites
Recipe For Planning

Understand the request
– Figure out what data product the request refers to, and
how to generate it from scratch.

Locations of data products
– Final data product
– Intermediate data products which can be used to generate
the final data product.

Location of Job executables

State of the Grid
– Available processors, physical memory available, job
queue lengths etc.
Constituents of Planning
Inputs to the Planner:
– Domain Knowledge
– Resource Information
– Location Information
Output of the Planner:
– Plan submitted to the grid
Terms (1)

Abstract Workflow (DAX)
– Expressed in terms of logical entities
– Specifies all logical files required to generate the
desired data product from scratch
– Dependencies between the jobs
– Analogous to build style dag

Concrete Workflow
– Expressed in terms of physical entities
– Specifies the location of the data and executables
– Analogous to a make style dag
Outline

Introduction to Planning

DAX

Pegasus

Portal

Demonstration
DAX


The format for specifying the abstract workflow. It identifies the recipe for
creating the final data product at a logical level.
In the case of Montage, the IPAC web service creates the DAX for the user
request.
Developed at the University of Chicago.
DAX Example

<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2003-09-25T11:51:19-05:00 -->
<!-- generated by: vahi [??] -->
<adag xmlns="http://www.griphyn.org/chimera/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.6.xsd" count="1"
      index="0" name="black-diamond">
  <!-- part 1: list of all files used (may be empty) -->
  <filename file="f.a" link="input"/>
  <filename file="f.b" link="inout"/>
  <filename file="f.c" link="output"/>
  <!-- part 2: definition of all jobs (at least one) -->
  <job id="ID000001" namespace="montage" name="preprocess" version="1.0" level="2">
    <argument>-a top -T60 -i <filename file="f.a"/> -o <filename file="f.b"/> </argument>
    <uses file="f.a" link="input" dontRegister="false" dontTransfer="false"/>
    <uses file="f.b" link="output" dontRegister="true" dontTransfer="true" temporaryHint="true"/>
  </job>
  <job id="ID000002" namespace="montage" name="analyze" version="1.0" level="1">
    <argument>-a bottom -T60 -i <filename file="f.b"/> -o <filename file="f.c"/></argument>
    <uses file="f.b" link="input" dontRegister="false" dontTransfer="false"/>
    <uses file="f.c" link="output" dontRegister="false" dontTransfer="false"/>
  </job>
  <!-- part 3: list of control-flow dependencies (empty for single jobs) -->
  <child ref="ID000002">
    <parent ref="ID000001"/>
  </child>
</adag>
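To make the three parts of the DAX concrete, here is a minimal Python sketch that reads a trimmed copy of the example above and extracts the jobs, their input and output files, and the parent/child dependencies. It is only an illustration using the standard library XML parser; the function name `read_dax` and the returned dictionary layout are assumptions, not part of Pegasus or Chimera.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the example DAX.
NS = {"dax": "http://www.griphyn.org/chimera/DAX"}

# Trimmed copy of the example DAX (file list and arguments omitted).
DAX_TEXT = """<adag xmlns="http://www.griphyn.org/chimera/DAX" name="black-diamond">
  <job id="ID000001" namespace="montage" name="preprocess" version="1.0">
    <uses file="f.a" link="input"/>
    <uses file="f.b" link="output"/>
  </job>
  <job id="ID000002" namespace="montage" name="analyze" version="1.0">
    <uses file="f.b" link="input"/>
    <uses file="f.c" link="output"/>
  </job>
  <child ref="ID000002">
    <parent ref="ID000001"/>
  </child>
</adag>
"""


def read_dax(text):
    """Return (jobs, parents) for a DAX document (hypothetical helper).

    jobs maps job id -> {"name", "inputs", "outputs"};
    parents maps job id -> list of parent job ids.
    """
    root = ET.fromstring(text)
    jobs = {}
    for job in root.findall("dax:job", NS):
        uses = job.findall("dax:uses", NS)
        jobs[job.get("id")] = {
            "name": job.get("name"),
            "inputs": [u.get("file") for u in uses if u.get("link") == "input"],
            "outputs": [u.get("file") for u in uses if u.get("link") == "output"],
        }
    parents = {job_id: [] for job_id in jobs}
    for child in root.findall("dax:child", NS):
        parents[child.get("ref")] = [
            p.get("ref") for p in child.findall("dax:parent", NS)
        ]
    return jobs, parents


jobs, parents = read_dax(DAX_TEXT)
for job_id, info in jobs.items():
    print(job_id, info["name"], info["inputs"], info["outputs"], parents[job_id])
```

Everything in the result is logical: file names such as f.a carry no location, which is exactly what the planner has to fill in later.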
Outline

Introduction to Planning

DAX

Pegasus

Portal

Demonstration
Pegasus

A configurable system to map and execute
complex workflows on the grid.
– DAX Driven Configuration
– Metadata Driven Configuration

Supports either full ahead planning or deferred
planning to map the workflows.
Full Ahead Planning



At the time the workflow is submitted, you decide where each job in the
workflow will be scheduled.
This allows certain optimizations: you can look ahead for bottleneck jobs
and schedule around them.
However, for large workflows a decision made at submission time may no
longer be valid or optimal by the time the job is actually run.
Deferred Planning



Delay the decision of mapping a job to a site for as long as possible.
Involves partitioning the original DAX into smaller DAXes, each of which
describes one partition on which Pegasus is run.
Construct a Mega DAG that runs Pegasus automatically on each partition DAX
as that partition becomes ready to run.
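As a rough illustration of deferred planning, the sketch below partitions a workflow by depth (all jobs at the same distance from the roots form one partition) and maps each partition only when its turn comes. The slides do not say which partitioning scheme Pegasus uses, so partitioning by level is an assumption, and `partition_by_level`, `plan_deferred` and `choose_site` are hypothetical names.

```python
from collections import defaultdict


def partition_by_level(parents):
    """Group jobs by depth in the DAG.

    parents maps each job id to the list of jobs it depends on.
    Returns a list of partitions, roots first.
    """
    depth = {}

    def level(job):
        # Depth is one more than the deepest parent; roots have depth 1.
        if job not in depth:
            depth[job] = 1 + max((level(p) for p in parents[job]), default=0)
        return depth[job]

    partitions = defaultdict(list)
    for job in parents:
        partitions[level(job)].append(job)
    return [sorted(partitions[k]) for k in sorted(partitions)]


def plan_deferred(partitions, choose_site):
    """Map each partition in turn.

    In a real run the previous partition would execute before the next
    one is planned, so choose_site would see the current grid state.
    """
    plan = {}
    for partition in partitions:
        for job in partition:
            plan[job] = choose_site(job)
    return plan


# Black diamond workflow: a -> (b, c) -> d
diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
partitions = partition_by_level(diamond)
plan = plan_deferred(partitions, lambda job: "isi")
print(partitions)
print(plan)
```

Full ahead planning corresponds to calling `choose_site` for every job before anything runs; the deferred variant differs only in when that call happens.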
High Level Block Diagram
[Figure: high-level block diagram. Recoverable labels: IPAC/JPL WebService; Abstract Workflow; Request Manager; Workflow Planning; Data Management; Workflow Reduction; Replica and Resource Selector; Replica Location; Available Resources; Dynamic information; Information and Models; Application Models; Monitoring; Concrete Workflow; tasks; Submission and Monitoring System; workflow executor (DAGMan); Execution; Data Publication; Globus Monitoring and Discovery Service; Globus Replica Location Service; Grid; Raw data detector]
Replica Discovery



Pegasus needs to know where the input files for the workflow reside.
In the Montage case, it needs to know where the FITS files required by the
mProject jobs reside.
Hence Pegasus needs to discover the files that are required for executing a
particular abstract workflow.
RLS
1) Pegasus queries RLI with the LFN
2) RLI returns the list of LRC’s that contain the desired mappings.
3) Pegasus queries each LRC in the list to get the PFN’s.
Each LRC sends periodic updates to the RLI
Each LRC (LRC A, LRC B, LRC C) is responsible for one pool
Figure (1) RLS Configuration for Pegasus
Interfacing to RLS done by Karan Vahi, Shishir
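The three-step lookup can be sketched in a few lines of Python. The dictionaries below are hypothetical in-memory stand-ins for the RLI and the per-pool LRCs, and the host names are made up; a real deployment would use the Globus RLS client instead.

```python
# Hypothetical stand-in for the RLI: LFN -> names of LRCs holding a mapping.
RLI = {"f.a": ["lrc_a", "lrc_c"]}

# Hypothetical stand-ins for the LRCs: one per pool, each maps LFN -> PFNs.
LRCS = {
    "lrc_a": {"f.a": ["gsiftp://host-a.example.org/data/f.a"]},
    "lrc_b": {},
    "lrc_c": {"f.a": ["gsiftp://host-c.example.org/data/f.a"]},
}


def lookup_pfns(lfn, rli=RLI, lrcs=LRCS):
    """Resolve a logical file name to all known physical file names."""
    pfns = []
    # Steps 1 and 2: ask the index which LRCs know about this LFN.
    for lrc_name in rli.get(lfn, []):
        # Step 3: ask each of those LRCs for its PFNs. The index is only
        # updated periodically, so an LRC may no longer hold the mapping.
        pfns.extend(lrcs[lrc_name].get(lfn, []))
    return pfns


print(lookup_pfns("f.a"))
print(lookup_pfns("f.z"))
```

Because the index only points at catalogs, Pegasus never has to query LRC B in this example.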
Alternate Replica Mechanisms

Replica Catalog
– Pegasus supports the LDAP based Replica
Catalog

User defined mechanisms
– Pegasus gives users the flexibility to specify their
own replica mechanism instead of RLS or Replica Catalog
– The user just has to implement the relevant
interface
Design and Implementation done by Karan Vahi
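The idea of a user-defined replica mechanism can be illustrated with a small interface and one toy implementation. Pegasus itself is written in Java and the slides do not show the actual interface, so the class and method names here (`ReplicaMechanism`, `lookup`, `ListReplicaMechanism`) are purely hypothetical.

```python
from abc import ABC, abstractmethod


class ReplicaMechanism(ABC):
    """Hypothetical interface the planner would call to locate files."""

    @abstractmethod
    def lookup(self, lfn):
        """Return the list of physical file names for a logical file name."""


class ListReplicaMechanism(ReplicaMechanism):
    """Toy implementation backed by 'lfn pfn' text lines."""

    def __init__(self, lines):
        self._mappings = {}
        for line in lines:
            lfn, pfn = line.split()
            self._mappings.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        return list(self._mappings.get(lfn, []))


def locate_inputs(lfns, mechanism):
    """Planner-side code depends only on the interface, not the backend."""
    return {lfn: mechanism.lookup(lfn) for lfn in lfns}


mechanism = ListReplicaMechanism(["f.a gsiftp://host-a.example.org/data/f.a"])
located = locate_inputs(["f.a", "f.b"], mechanism)
print(located)
```

Swapping RLS for the LDAP based Replica Catalog, or for something user-written, then only means supplying a different implementation of the same interface.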
Transformation Catalog




Pegasus needs to access a catalog to determine the
pools where it can run a particular piece of code.
If a site does not have the executable, one should be
able to ship the executable to the remote site.
Generic TC API for users to implement their own
transformation catalog.
Current Implementations
– File Based
– Database Based
File based Transformation Catalog


Consists of a simple text file.
– Contains Mappings of Logical Transformations to Physical
Transformations.
Format of the tc.data file
#poolname  logical tr  physical tr              env
isi        preprocess  /usr/vds/bin/preprocess  VDS_HOME=/usr/vds/;
All the physical transformations are absolute path names.
Environment string contains all the environment variables
required in order for the transformation to run on the
execution pool.
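A minimal sketch of reading such a file is shown below. It assumes the four columns are separated by whitespace and that the environment string is a list of KEY=VALUE pairs terminated by semicolons, as the single example row suggests; the function names are hypothetical.

```python
TC_DATA = """\
#poolname  logical tr  physical tr              env
isi        preprocess  /usr/vds/bin/preprocess  VDS_HOME=/usr/vds/;
"""


def parse_tc(text):
    """Parse tc.data style text into a list of entry dictionaries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split(None, 3)
        pool, logical, physical = fields[:3]
        env_string = fields[3] if len(fields) > 3 else ""
        # Assumed format: KEY=VALUE pairs separated by semicolons.
        env = dict(
            item.strip().split("=", 1)
            for item in env_string.split(";")
            if item.strip()
        )
        entries.append(
            {"pool": pool, "logical": logical, "physical": physical, "env": env}
        )
    return entries


def pools_for(entries, logical):
    """Return the pools on which a logical transformation is installed."""
    return sorted({e["pool"] for e in entries if e["logical"] == logical})


entries = parse_tc(TC_DATA)
print(entries)
print(pools_for(entries, "preprocess"))
```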
DB based Transformation Catalog




Presently ported to MySQL. Postgres is yet to be tested.
Adds support for transformations compiled for different combinations of
architecture, OS, OS version and glibc. This would enable us to transfer a
transformation to a remote site if the executable does not reside there.
Supports multiple profile namespaces. At present only the env namespace is
used.
Supports multiple physical transformations for the same
(logical transformation, pool, type) tuple.
Pool Configuration (1)


Pool Config is an XML file which contains information
about various pools on which DAGs may execute.
Among other things, the Pool Config file:
– Specifies the various job-managers which are available on
the pool for the different types of condor universes.
– Specifies the GridFtp storage servers associated with each
pool.
– Specifies the Local Replica Catalogs where data residing in
the pool has to be cataloged.
– Contains profiles like environment hints which are common
site wide.
– Contains the working and storage directories to be used on
the pool.
Pool Configuration (2)

Two Ways to construct the Pool Config File.
– Monitoring and Discovery Service
– Local Pool Config File (Text Based)

Client tool to generate Pool Config File
– The tool genpoolconfig is used to query the
MDS and/or the local pool config file/s to
generate the XML Pool Config file.
Pool Configuration (3)


This file is read by the information provider and published into
MDS.
Format
gvds.pool.id : <POOL ID>
gvds.pool.lrc : <LRC URL>
gvds.pool.gridftp : <GSIFTP URL>@<GLOBUS VERSION>
gvds.pool.gridftp : gsiftp://sukhna.isi.edu/nfs/asd2/[email protected]
gvds.pool.universe : <UNIVERSE>@<JOBMANAGER URL>@<GLOBUS VERSION>
gvds.pool.universe : [email protected]/[email protected]
gvds.pool.gridlaunch : <Path to Kickstart executable>
gvds.pool.workdir : <Path to Working Dir>
gvds.pool.profile : <namespace>@<key>@<value>
gvds.pool.profile : env@GLOBUS_LOCATION@/smarty/gt2.2.4
gvds.pool.profile : vds@VDS_HOME@/nfs/asd2/gmehta/vds
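The text format is simple enough to parse by hand. The sketch below splits each line at the first colon and breaks profile values on the @ separator. It is only an illustration of the format, not the genpoolconfig implementation; the workdir value in the sample is made up, and keeping every non-profile attribute as a list is a choice made here because keys such as gvds.pool.gridftp can repeat.

```python
SAMPLE = """\
gvds.pool.id : isi
gvds.pool.workdir : /tmp/work
gvds.pool.profile : env@GLOBUS_LOCATION@/smarty/gt2.2.4
gvds.pool.profile : vds@VDS_HOME@/nfs/asd2/gmehta/vds
"""


def parse_pool_config(text):
    """Parse 'gvds.pool.<attr> : <value>' lines into a dictionary."""
    pool = {"profiles": []}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split at the first colon only, since values may contain URLs.
        key, value = (part.strip() for part in line.split(":", 1))
        attr = key.rsplit(".", 1)[-1]
        if attr == "profile":
            # <namespace>@<key>@<value>
            namespace, name, setting = value.split("@", 2)
            pool["profiles"].append((namespace, name, setting))
        else:
            pool.setdefault(attr, []).append(value)
    return pool


pool = parse_pool_config(SAMPLE)
print(pool)
```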
DAX Driven Configuration(1)


Pegasus uses IPAC/JPL webservice as an
abstract workflow generator
Pegasus takes in this abstract workflow
and creates a concrete workflow by
consulting the various grid services
described before
DAX Driven Configuration(2)
[Figure: data flow in the DAX driven configuration]
Components: IPAC/JPL Service; Request Manager; Current State Generator; MCS; RLS; MDS; Abstract Dag Reduction; Abstract and Concrete Planner; Concrete Planner; VDL Generator; Submit File Generator; DAGMan Submission & Monitoring; Transformation Catalog; Condor-G/DAGMan
Numbered flows:
(1) Abstract Workflow (DAG)
(2) Abstract Dag
(3) Logical File Names (LFN’s)
(4) Physical File Names (PFN’s)
(5) Full Abstract Dag
(6) Reduced Abstract DAG
(7) Logical Transformations
(8) Physical Transformations and Execution Environment Information
(9) Concrete Dag
(10) Concrete Dag
(11) DAGMan files
(12) DAGMan files
(13) DAG
(14) Log files
(15) Monitoring
(16) Results
DAG Reduction

Abstract Dag Reduction
– Pegasus queries the RLS with the LFNs
referred to in the Abstract Workflow
– If data products are found to be already
materialized, Pegasus reuses them and thus
reduces the complexity of the concrete workflow (CW)
Abstract Dag Reduction
[Figure: abstract DAG with jobs a through i, showing the nodes removed by reduction]
Pegasus queries the RLS and finds the data products of jobs d, e, f already
materialized. Hence it deletes those jobs.
On applying the reduction algorithm, additional jobs a, b, c are deleted.
KEY: the original node; pull transfer node; registration node; push transfer node
Implemented by Karan Vahi
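The two stages of the reduction can be sketched as follows: first delete every job whose outputs are already in the replica catalog, then repeatedly delete any job whose children have all been deleted, since nothing left in the workflow needs its output. This is one way to express the idea, not necessarily the algorithm Pegasus implements, and the edges in the example graph are assumptions because the figure's edges are not recoverable from the transcript.

```python
def reduce_dag(children, outputs, catalog):
    """Return the jobs that still have to run.

    children: job -> list of child jobs
    outputs:  job -> list of LFNs the job produces
    catalog:  set of LFNs already materialized (found via the RLS)
    """
    # Stage 1: jobs whose data products all exist already.
    deleted = {
        job
        for job, files in outputs.items()
        if files and all(f in catalog for f in files)
    }
    # Stage 2: cascade upwards. A job is unnecessary when every job
    # that would consume its output has itself been deleted.
    changed = True
    while changed:
        changed = False
        for job, kids in children.items():
            if job not in deleted and kids and all(k in deleted for k in kids):
                deleted.add(job)
                changed = True
    return sorted(job for job in children if job not in deleted)


# Hypothetical edges for jobs a..i; f is a leaf job.
children = {
    "a": ["d"], "b": ["e"], "c": ["f"],
    "d": ["g"], "e": ["g"], "f": [],
    "g": ["h"], "h": ["i"], "i": [],
}
outputs = {job: [job + ".out"] for job in children}
catalog = {"d.out", "e.out", "f.out"}

remaining = reduce_dag(children, outputs, catalog)
print(remaining)
```

With these inputs, d, e and f are removed in the first stage and a, b and c in the second, leaving g, h and i to be planned.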
Concrete Planner (1)
[Figure: concrete DAG for jobs a through i after reduction, with transfer and registration nodes added]
Pegasus adds transfer nodes for transferring the input files for the root
nodes of the decomposed dag (job g).
Pegasus schedules jobs g and h on pool X and job i on pool Y, hence adding an
inter-pool transfer node.
These three nodes are for transferring the output files of the leaf job (f)
to the output pool, since job f has been deleted by the Reduction Algorithm.
Pegasus adds replica nodes for each job that materializes data (g, h, i).
KEY: the original node; pull transfer node; registration node; push transfer node; node deleted by Reduction algo; inter-pool transfer node
Implemented by Karan Vahi
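A simplified sketch of how the auxiliary nodes could be derived is given below. It covers three of the node kinds from the key: a pull transfer for jobs with no remaining parents, an inter-pool transfer wherever a parent and child run on different pools, and a registration node after every job. Push transfers to the output pool are left out for brevity. The function name, the tuple layout and the parent lists are hypothetical.

```python
def add_auxiliary_nodes(jobs, parents, site_of):
    """Return the concrete node list for the reduced workflow.

    jobs:    remaining jobs in execution order
    parents: job -> list of parents in the original abstract workflow
    site_of: job -> pool chosen for each remaining job
    """
    nodes = []
    for job in jobs:
        # Parents removed by the reduction have no site assignment.
        live_parents = [p for p in parents.get(job, []) if p in site_of]
        if not live_parents:
            # Root of the reduced DAG: its inputs must be staged in.
            nodes.append(("pull_transfer", job, site_of[job]))
        for parent in live_parents:
            if site_of[parent] != site_of[job]:
                route = site_of[parent] + "->" + site_of[job]
                nodes.append(("inter_pool_transfer", job, route))
        nodes.append(("compute", job, site_of[job]))
        # Every job that materializes data gets a registration node.
        nodes.append(("registration", job, site_of[job]))
    return nodes


parents = {"g": ["d", "e"], "h": ["g"], "i": ["h"]}
site_of = {"g": "X", "h": "X", "i": "Y"}
nodes = add_auxiliary_nodes(["g", "h", "i"], parents, site_of)
for node in nodes:
    print(node)
```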
Transient Files

Selective Transfer of output files
– Data sets generated by intermediate nodes in the DAG
are huge
– However, the user may be interested only in the outputs of
selected jobs
– Transfer of all the files could severely overload the
jobmanagers on the compute sites

Need For Selective Transfer of Files
– For each file at the virtual data level, the user can specify
whether it is transient or not.
– Pegasus bases its decision on whether to transfer
the file or not on this.
Implemented by Karan Vahi
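Using the `uses` entries from the DAX example shown earlier, the selection could look like the sketch below. Treating `dontTransfer="true"` as the transient marker is an assumption based on that example, and `files_to_transfer` is a hypothetical helper.

```python
# The output-related <uses> entries of the two jobs in the DAX example,
# reduced to the attributes that matter here.
USES = [
    {"file": "f.a", "link": "input", "dontTransfer": False},
    {"file": "f.b", "link": "output", "dontTransfer": True},   # transient
    {"file": "f.c", "link": "output", "dontTransfer": False},
]


def files_to_transfer(uses):
    """Return the output files that should be shipped to the output pool."""
    return [
        u["file"]
        for u in uses
        if u["link"] == "output" and not u["dontTransfer"]
    ]


selected = files_to_transfer(USES)
print(selected)
```

Only f.c is staged out; the intermediate file f.b stays on the execution pool.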
Outline

Introduction to Planning

DAX

Pegasus

Portal

Demonstration
Portal Architecture
Portal Demonstration
Outline

Introduction to Planning

DAX

Pegasus

Portal

Demonstration
Demonstration


Run a small black diamond DAG using both full ahead planning and deferred
planning on the ISI Condor pool.
Show the various configuration files (tc.data and pool.config) and how to
generate them (pool.config).

Generate the Condor submit files.

Submit the Condor DAG to Condor DAGMan.
Software Required!!

Submit Host
– Condor DAGMan (to submit the workflows on the grid).
– Java 1.4 (to run Pegasus)
– Globus 2.4 or higher
– Globus RLS (the registration jobs run on the local host).
– Xerces, ant, cog etc. that come with the VDS distribution

Compute Sites (Machines in the pool)
– Globus 2.4 or higher (gridftp server, g-u-c, MDS)
– On one machine per pool, an LRC should be running.
– Condor daemon running.
– Various jobmanagers correctly configured.
TC File


Walk through the editing of TC file.
A command line client is also in the works
that allows you to update, add and modify
the entries in your transformation catalog
regardless of the underlying
implementation.
GenPoolConfig (Demo)




genpoolconfig is the client that generates the pool config file required by
Pegasus.
It queries the MDS and/or a local pool config file (text based) and generates
an XML file.
I am going to generate the pool config file from the text based configuration.
Usage:
– genpoolconfig -Dvds.giis.host <MDS GIIS hostname> -Dvds.giis.dn <MDS GIIS DN> --poolconfig <comma separated local pool config files> --output <pool config output>
gencdag




The Concrete planner takes the DAX produced by Chimera and converts it into a
set of Condor DAG and submit files.
Usage: gencdag --dax|--pdax <file> --p <list of execution pools> [--dir <dir for o/p files>] [--o <outputpool>] [-force]
You can specify more than one execution pool. Execution will take place on
the pools on which the executable exists. If the executable exists on more
than one pool, the pool on which it will run is selected randomly.
Output pool is the pool where you want all the output products to be
transferred to. If not specified, the materialized data stays on the
execution pool.
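The site selection rule described above (restrict to the requested pools where the executable exists, then pick one at random) can be sketched as follows. The function name, the set-of-pairs representation of the transformation catalog and the pool name "uc" are hypothetical.

```python
import random


def select_pool(logical_name, execution_pools, tc_entries, rng=random):
    """Pick an execution pool for one job.

    execution_pools: pools given on the command line
    tc_entries:      set of (pool, logical transformation) pairs taken
                     from the transformation catalog
    rng:             source of randomness (injectable for testing)
    """
    candidates = [
        pool for pool in execution_pools if (pool, logical_name) in tc_entries
    ]
    if not candidates:
        raise LookupError("no requested pool has " + logical_name)
    return rng.choice(candidates)


tc_entries = {("isi", "preprocess"), ("uc", "preprocess"), ("isi", "analyze")}
chosen = select_pool("preprocess", ["isi", "uc"], tc_entries, random.Random(0))
print(chosen)
print(select_pool("analyze", ["isi", "uc"], tc_entries))
```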
Mei’s Exploits




Mei has been running the Montage code for the past year, including some huge
6 and 10 degree DAGs (for the M16 cluster).
The 6 degree runs had about 13,000 compute jobs and the 10 degree run had
about 40,000 compute jobs!!!
The final mosaic files can be downloaded from
http://www.isi.edu/~griphyn/out_M16_10.fits
http://www.isi.edu/~griphyn/out_M16_6.fits
Questions?