The Nimrod Computational Workbench: A Case Study in

Computational Grids and Computational Economy: Nimrod/G Approach
David Abramson
Rajkumar Buyya
Jonathan Giddy
Parametric Execution of Applications
Coarse-grained SPMD model
Execute one application repeatedly for many combinations of input parameters
Legacy applications: add iteration and distribution without modifying code
New applications: remove iteration and distribution from design
Parametrised modeling experiments:
– Require very high levels of performance
– Generate large amounts of work & concurrency
– Uncoupled computations
– Tolerate moderately high latencies
Figure: a description of parameters expands into independent jobs (Job 1 to Job 18).
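The parametric model above can be illustrated with a minimal sketch, assuming the parameter description is a mapping from parameter names to lists of values: every combination of values becomes one independent job, and the values are substituted into the command line of an otherwise unmodified application. The plan format, the parameter names, the `model` command, and the `make_jobs` helper are hypothetical, not Nimrod's own plan syntax.

```python
from itertools import product
from typing import Dict, List, Sequence


def make_jobs(parameters: Dict[str, Sequence[object]], command: str) -> List[dict]:
    """Return one job per combination of parameter values (their cross product)."""
    names = list(parameters)
    jobs = []
    combos = product(*(parameters[name] for name in names))
    for job_id, values in enumerate(combos, start=1):
        binding = dict(zip(names, values))
        jobs.append({
            "id": job_id,
            "params": binding,
            # The application itself is unchanged; only its arguments vary per job.
            "command": command.format(**binding),
        })
    return jobs


if __name__ == "__main__":
    # Hypothetical experiment: 3 x 2 x 3 = 18 independent jobs.
    plan = {
        "pressure": [1.0, 2.0, 3.0],
        "angle": [0, 45],
        "seed": [1, 2, 3],
    }
    jobs = make_jobs(plan, "model --pressure {pressure} --angle {angle} --seed {seed}")
    print(len(jobs))           # 18
    print(jobs[0]["command"])  # model --pressure 1.0 --angle 0 --seed 1
```

Because the jobs share no data, each one can be shipped to any machine in any order, which is what lets this model tolerate the high latencies mentioned above.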
Working with Small Clusters
Nimrod (1994 - )
DSTC-funded project
– Designed for department-level clusters
– Proof of concept
Clustor (Activetools) (1997 - )
– Commercial version of Nimrod
– Re-engineered
Features
– Workstation orientation
– Access to idle workstations
– Random allocation policy
– Password security
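Clustor's idle-workstation access with a random allocation policy can be sketched as follows, assuming the machine list is a static table with an idle flag per host. The `allocate_randomly` helper and the host names are hypothetical; the slides do not show Clustor's implementation.

```python
import random
from typing import Dict, List, Optional


def allocate_randomly(jobs: List[str], idle: Dict[str, bool],
                      rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Map jobs to randomly chosen idle workstations, at most one job per machine.

    Jobs beyond the number of idle machines are left unassigned and would
    wait for the next allocation round.
    """
    rng = rng or random.Random()
    # The host table is static (e.g. read from a file); only idle hosts are used.
    free = [host for host, is_idle in idle.items() if is_idle]
    rng.shuffle(free)  # random policy: no account of speed, load, or cost
    # zip stops at the shorter list, so surplus jobs are simply not placed.
    return dict(zip(jobs, free))


if __name__ == "__main__":
    workstations = {"ws01": True, "ws02": False, "ws03": True, "ws04": True}
    placement = allocate_randomly(["job1", "job2", "job3", "job4"], workstations,
                                  random.Random(42))
    # job1..job3 land on ws01/ws03/ws04 in some random order; job4 waits.
    print(placement)
```

The simplicity is the point: a random choice needs no information about the machines, which is also why such a policy cannot honour deadlines or prices.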
Clustor limitations
Manual resource location
– static file of machine names
No resource scheduling
– first come first served
No cost model
– all machines cost alike
Single access mechanism
Towards Grid Computing…
Source: www.globus.org, updated
Nimrod/G - Nimrod over Globus/Grid
Wide-Area Network Support
– redesigned architecture
– use of high-performance networks
Scalable Scheduling
– “guaranteed” deadline
– use of existing schedulers
Computational Economy
– “I am willing to pay $$, can you complete the job by a given deadline?”
– trading, bidding, resource reservation...
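The “I am willing to pay $$” request can be framed as a quote, sketched here under a simple assumed cost model: each resource has a throughput in jobs per hour and a price per job in cost units. Filling the cheapest resources first gives the lowest-cost allocation that finishes by the deadline (cost is linear per job and each resource's capacity is independent), and the request is accepted only if that cost fits the budget. `Resource`, `quote`, and all the numbers are illustrative, not part of Nimrod/G.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Resource:
    name: str
    jobs_per_hour: float  # throughput (assumed known)
    cost_per_job: float   # price in cost units (assumed)


def quote(resources: List[Resource], n_jobs: int, deadline_h: float,
          budget: float) -> Tuple[bool, Optional[Dict[str, int]], float]:
    """Return (accepted, allocation, cost) for n_jobs that must finish within deadline_h hours."""
    remaining = n_jobs
    allocation: Dict[str, int] = {}
    cost = 0.0
    # Cheapest resources first: the lowest total cost for a linear per-job price.
    for r in sorted(resources, key=lambda res: res.cost_per_job):
        if remaining == 0:
            break
        capacity = int(r.jobs_per_hour * deadline_h)  # jobs it can finish in time
        take = min(capacity, remaining)
        if take > 0:
            allocation[r.name] = take
            cost += take * r.cost_per_job
            remaining -= take
    if remaining > 0:
        # Even all resources together cannot meet the deadline.
        return False, None, float("inf")
    return cost <= budget, allocation, cost


if __name__ == "__main__":
    pool = [Resource("cheap", 2, 5), Resource("mid", 4, 10), Resource("fast", 10, 50)]
    print(quote(pool, 100, 10, 3000))  # (True, {'cheap': 20, 'mid': 40, 'fast': 40}, 2500.0)
    print(quote(pool, 100, 20, 3000))  # (True, {'cheap': 40, 'mid': 60}, 800.0)
```

With these made-up numbers, a 20-hour deadline is met by the two cheaper machines alone, while a 10-hour deadline forces use of the expensive one; this deadline/cost trade-off is what the computational economy is meant to expose.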
Layered Architecture (Grid Components)
Applications
High-level Services and Tools
– GlobusView, Testbed Status, DUROC, MPI, MPI-IO, CC++, Nimrod/G, globusrun
Core Services
– Metacomputing Directory Service, Globus Security Interface, GRAM, GASS, Heartbeat Monitor, Nexus, Gloperf
Local Services
– Condor, MPI, LSF, Easy, NQE, TCP, UDP, AIX, Irix, Solaris
Source: www.globus.org
Nimrod/G Architecture
Figure: several Nimrod/G Clients connect to the Parametric Engine, which keeps persistent information and works with the Schedule Advisor (including Resource Discovery) and the Dispatcher; these use Grid Directory Services and Grid Middleware Services on the GUSTO Test Bed.
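The Persistent Info. component can be illustrated with a minimal sketch, assuming the engine keeps its job table in a JSON file and rewrites it on every state change, so an interrupted experiment can be resumed. The file format, state names, and class are hypothetical; re-queuing jobs that were running when the engine stopped is a design assumption, not something the slides specify.

```python
import json
import os
import tempfile
from typing import Dict, List

STATES = ("pending", "running", "done", "failed")


class ParametricEngine:
    """Keeps the job table of one experiment on disk so it survives restarts."""

    def __init__(self, state_file: str):
        self.state_file = state_file
        self.jobs: Dict[str, str] = {}
        if os.path.exists(state_file):
            with open(state_file) as f:
                saved = json.load(f)
            # Assumption: jobs that were running when the engine stopped are re-queued.
            self.jobs = {j: ("pending" if s == "running" else s) for j, s in saved.items()}

    def add_jobs(self, job_ids: List[str]) -> None:
        for job_id in job_ids:
            self.jobs.setdefault(job_id, "pending")
        self._save()

    def set_state(self, job_id: str, state: str) -> None:
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        self.jobs[job_id] = state
        self._save()

    def pending(self) -> List[str]:
        return [j for j, s in self.jobs.items() if s == "pending"]

    def _save(self) -> None:
        # Write a temporary file, then rename it over the old one, so a crash
        # mid-write never leaves a truncated state file behind.
        tmp = self.state_file + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.jobs, f)
        os.replace(tmp, self.state_file)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "experiment.json")
    engine = ParametricEngine(path)
    engine.add_jobs(["job1", "job2", "job3"])
    engine.set_state("job1", "done")
    engine.set_state("job2", "running")
    # Simulate a restart: a fresh engine reloads the saved table.
    print(ParametricEngine(path).pending())  # ['job2', 'job3']
```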
Nimrod/G Interactions
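Figure: on the root node, the Parametric Engine, Scheduler, and Dispatcher run alongside a GASS server. The Scheduler performs resource location through the MDS server. The Dispatcher submits jobs to the GRAM server on a gatekeeper node, which performs local resource allocation through the queuing system. On a computational node, the Job Wrapper starts the user process, whose file access goes to the GASS server.
Additional services used implicitly:
• GSI (authentication & authorization)
• Nexus (communication)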
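The job wrapper's role in these interactions (start the user process on the computational node and move its files to and from the root node) can be sketched as follows. This is a hypothetical illustration: plain directory copies stand in for GASS file access, GRAM and GSI are not involved, and `run_wrapped` is not Nimrod/G's actual wrapper interface.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from typing import List


def run_wrapped(command: List[str], root_dir: str,
                inputs: List[str], outputs: List[str]) -> int:
    """Run `command` in a scratch directory and return its exit status."""
    with tempfile.TemporaryDirectory() as scratch:
        # Stage in: fetch input files from the root node (a local copy stands in for GASS).
        for name in inputs:
            shutil.copy(os.path.join(root_dir, name), os.path.join(scratch, name))
        # Run the unmodified user application as a child process.
        result = subprocess.run(command, cwd=scratch)
        # Stage out: send back whatever outputs were produced, even on failure.
        for name in outputs:
            produced = os.path.join(scratch, name)
            if os.path.exists(produced):
                shutil.copy(produced, os.path.join(root_dir, name))
        return result.returncode


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "in.txt"), "w") as f:
        f.write("21")
    # Stand-in "application": doubles the number in in.txt and writes out.txt.
    app = [sys.executable, "-c",
           "open('out.txt', 'w').write(str(2 * int(open('in.txt').read())))"]
    status = run_wrapped(app, root, ["in.txt"], ["out.txt"])
    with open(os.path.join(root, "out.txt")) as f:
        print(status, f.read())  # 0 42
```

Returning the exit status lets the root node decide whether to mark the job done or hand it back to the scheduler.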
Scheduling Algorithm
Find a set of machines (MDS search)
Distribute jobs from root to machines
Establish job consumption rate for each machine
For each machine
– Can we meet deadline?
– If not, then return some jobs to root
– If yes, distribute more jobs to resource
If cannot meet deadline with current resources
– Find additional resources
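The scheduling steps can be sketched as a single rescheduling pass, assuming each machine's job consumption rate (jobs per hour) has already been measured and that `discover` is a hypothetical stand-in for an MDS search returning extra machines with estimated rates. Surplus jobs are first returned to the root from every machine that cannot meet the deadline, then redistributed to machines with spare capacity; only the leftover triggers a search for more resources. The names and the two-pass structure are illustrative choices, not Nimrod/G code.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Machine:
    rate: float      # measured job consumption rate, jobs per hour
    queued: int = 0  # jobs currently assigned to this machine


def _capacity(m: Machine, hours_left: float) -> int:
    """Jobs the machine can still finish before the deadline."""
    return math.floor(m.rate * hours_left)


def _top_up(m: Machine, root: int, hours_left: float) -> int:
    """Give the machine more jobs up to its capacity; return jobs left at the root."""
    extra = max(0, min(root, _capacity(m, hours_left) - m.queued))
    m.queued += extra
    return root - extra


def schedule_step(root: int, machines: Dict[str, Machine], hours_left: float,
                  discover: Callable[[], Dict[str, Machine]]) -> int:
    """One rescheduling pass; returns the number of jobs still held at the root."""
    # Machines that cannot meet the deadline return their surplus to the root.
    for m in machines.values():
        cap = _capacity(m, hours_left)
        if m.queued > cap:
            root += m.queued - cap
            m.queued = cap
    # Machines that can meet the deadline receive more jobs.
    for m in machines.values():
        root = _top_up(m, root, hours_left)
    # Current resources are not enough: find additional ones and fill them.
    if root > 0:
        for name, m in discover().items():
            machines[name] = m
            root = _top_up(m, root, hours_left)
    return root


if __name__ == "__main__":
    pool = {"slow": Machine(rate=2, queued=30), "fast": Machine(rate=10, queued=10)}
    left = schedule_step(50, pool, hours_left=4,
                         discover=lambda: {"new": Machine(rate=5)})
    print({n: m.queued for n, m in pool.items()}, left)  # {'slow': 8, 'fast': 40, 'new': 20} 22
```

In this example the slow machine keeps 8 of its 30 jobs, the fast one is topped up to 40, and the newly found machine takes 20; the 22 jobs still at the root mean the deadline cannot yet be met, so a real scheduler would repeat the pass, search again, or report the shortfall.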
A Nimrod/G Client
Figure: Nimrod/G client screen (labels: Deadline, Available Machines, Cost).
Sample Applications of Nimrod
Bioinformatics: Protein Modeling
Sensitivity experiments on smog formation
Parametric study of Laser detuning
Combinatorial Optimization: Simulated Annealing
Ecological Modeling: Control Strategies for Cattle Tick
Electronic CAD: Field Programmable Gate Arrays
Computer Graphics: Ray Tracing
High Energy Physics: Searching for Rare Events
Physics: Laser-Atom Collisions
VLSI Design: SPICE Simulations
Radiation Protection and Nuclear Safety
Electronic CAD
Some early results
Graph 2 - GUSTO Usage for Ionization Chamber Study: average number of processors in use over time, for 10, 15, and 20 hour deadlines.
Graph 3 - GUSTO Usage for 20 Hour Deadline
Graph 4 - GUSTO Usage for 15 Hour Deadline
Graph 5 - GUSTO Usage for 10 Hour Deadline
(Graphs 3 to 5: average number of processors in use over time, with series labelled 5, 10, 15, 20, and 50 cost units.)
Related Works
AppLeS (UC San Diego)
– application-level scheduling, case by case
NetSolve (UTK/ORNL)
– API for creating farms
DISCWorld (U. Adelaide)
– remote information access
Millennium (UC Berkeley)
– remote execution environment on clusters
– supports computational economy
Conclusions
The Nimrod/G architecture offers a scalable model for resource management and scheduling on computational grids
Supports computational economy
The current model, which supports parametric computing, can be extended to support parallel jobs or any other computational model
Plan to use advance resource reservation so that the user can say “I am willing to pay $…, can you complete my job by this time…”
Further Information:
www.csse.monash.edu.au/~davida/nimrod.html