Resource Management
Reading:
“A Resource Management Architecture for
Metacomputing Systems”
What is Resource Management?
Mechanisms for locating and allocating
computational resources
Authentication
Process creation
Remote job submission
Scheduling
Other resources that can be managed:
Memory
Disk
Networks
Resource Management Issues
for Grid Computing
Site autonomy
Resources owned by different organizations,
in different administrative domains
Local policies for use, scheduling, security
Heterogeneous substrate
Different local resource management
systems
Policy extensibility
Local sites need ability to customize their
resource management policies
More Issues for Grid Computing
Co-allocation
May need resources at several sites
Mechanism for allocating multiple
resources, initiating computation,
monitoring and managing
On-line control
Adapt application requirements to resource
availability
Specifying Resource and Job
Requirements
Resource requirements:
Machine type
Number of nodes
Memory
Network
Job or scheduler parameters:
Directory
Executable
Arguments
Environment
Maximum time required
Resource and Job Specification
Globus: Resource Specification Language
(RSL)
&(executable=myprog)
(|(&(count=5)(memory>=64))
(&(count=10)(memory>=32)))
Condor: Classified advertisements (ClassAds)
Resource owners advertise abilities and
constraints
Applications advertise resource requests
Matchmaking: match offers & requests
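The matchmaking idea can be sketched in a few lines of Python. This is only an illustration, not Condor's actual ClassAd language: the dictionary fields, the `matches` and `matchmake` functions, and the sample offers and requests are all hypothetical names invented for this example.

```python
# Hypothetical sketch of matchmaking; not Condor's actual ClassAd syntax.

def matches(offer, request):
    """True if the offer meets the request's requirements and the
    request is acceptable under the resource owner's constraint."""
    return request["requirements"](offer) and offer["constraint"](request)


def matchmake(offers, requests):
    """Greedily pair each request with the first unused offer that matches."""
    pairs = []
    free = list(offers)
    for request in requests:
        for offer in free:
            if matches(offer, request):
                pairs.append((request["name"], offer["name"]))
                free.remove(offer)  # an offer serves at most one request
                break
    return pairs


# Resource owners advertise abilities and constraints (made-up data).
offers = [
    {"name": "nodeA", "memory": 32, "constraint": lambda req: True},
    {"name": "nodeB", "memory": 128,
     "constraint": lambda req: req["owner"] != "guest"},
]
# Applications advertise resource requests (made-up data).
requests = [
    {"name": "job1", "owner": "alice",
     "requirements": lambda offer: offer["memory"] >= 64},
    {"name": "job2", "owner": "guest",
     "requirements": lambda offer: offer["memory"] >= 16},
]

print(matchmake(offers, requests))
```

Both sides express conditions on the other, so a match requires agreement in both directions. A real matchmaker would also rank candidate matches rather than take the first one.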
Components of Globus Resource
Management Architecture
Resource specification using RSL
Resource brokers: translate resource
requirements into specifications
Co-allocators: break down requests for
multiple sites
Local resource managers: apply local,
site-specific resource management policies
Information about available compute
resources and their characteristics
Resource Specification Language
Common notation for exchange of
information between components
API provided for manipulating RSL
RSL Syntax
Elementary form: parenthesized clauses
(attribute op value [ value … ] )
Operators Supported:
<, <=, =, >=, >, !=
Some supported attributes:
executable, arguments, environment, stdin,
stdout, stderr, resourceManagerContact,
resourceManagerName
Unknown attributes are passed through
May be handled by subsequent tools
Constraints: “&”
For example:
& (count>=5) (count<=10)
(max_time=240) (memory>=64)
(executable=myprog)
“Create 5-10 instances of myprog, each
on a machine with at least 64 MB
memory that is available to me for 4
hours”
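To make the constraint semantics concrete, here is a minimal Python sketch that parses a flat "&" request and checks it against a machine description. It is not the Globus RSL API: the `parse_conjunction` and `satisfies` functions and the machine dictionaries are assumptions made for illustration, and the parser handles only single-value clauses with no nesting or quoting.

```python
import operator
import re

# The six relational operators listed for RSL.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

# One clause of the form (attribute op value); two-character operators
# are listed first so that "<=" is not read as "<".
RELATION = re.compile(r"\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*([^()\s]+)\s*\)")


def parse_conjunction(rsl):
    """Return [(attribute, op, value)] for a flat '&' request."""
    relations = []
    for attr, op, value in RELATION.findall(rsl):
        relations.append((attr, op, int(value) if value.isdigit() else value))
    return relations


def satisfies(machine, relations):
    """Check every relation whose attribute the machine describes.
    Attributes the machine does not know are skipped, loosely mirroring
    the idea that unknown attributes are passed through."""
    return all(OPS[op](machine[attr], value)
               for attr, op, value in relations if attr in machine)


request = ("& (count>=5) (count<=10) (max_time=240) "
           "(memory>=64) (executable=myprog)")
relations = parse_conjunction(request)
print(relations)
print(satisfies({"memory": 128, "max_time": 240}, relations))
print(satisfies({"memory": 32, "max_time": 240}, relations))
```

The first machine passes because it meets both the memory and time clauses; the second fails on memory alone, since all clauses under "&" must hold.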
Multirequest: “+”
A multirequest allows us to specify multiple
resource needs, for example
+ (& (count=5)(memory>=64)
(executable=p1))
(&(network=atm) (executable=p2))
Execute 5 instances of p1 on a machine
with at least 64 MB of memory
Execute p2 on a machine with an ATM
connection
Multirequests are central to co-allocation
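A co-allocator has to separate a multirequest into its components before it can hand them to individual resource managers. The Python sketch below shows one way to do that by tracking parenthesis depth. The `split_multirequest` function is a hypothetical helper, not part of the Globus RSL API, and it assumes no quoted strings containing parentheses.

```python
def split_multirequest(rsl):
    """Return the top-level parenthesized components of a '+' request."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multirequest")
    components, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i          # a new top-level component begins
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:         # the top-level component just closed
                components.append(text[start + 1:i].strip())
    return components


multi = ("+ (& (count=5)(memory>=64) (executable=p1)) "
         "(&(network=atm) (executable=p2))")
for part in split_multirequest(multi):
    print(part)
```

Each component that comes out is itself an ordinary "&" request, which is what allows it to be passed on unchanged to a single resource manager.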
Resource Broker
Takes high-level RSL specification
Transforms into concrete specifications
through “specialization” process
Locate resources that meet requirements
Multiple brokers may service single request
Application-specific brokers translate
application requirements
Output: complete specification of locations
of resources; given to co-allocator
Examples of Resource Brokers
Nimrod-G
Automates creation and management of
large parametric experiments
Run application under wide range of input
conditions and aggregate results
Queries MDS to find resources
Generates number of independent jobs
GRAM allocates jobs to computational nodes
Higher-level broker: allows user to specify
time and cost constraints
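The "generate a number of independent jobs" step of a parametric experiment can be illustrated with a short Python sketch. This is not Nimrod-G's interface: the `parameter_sweep` function, the `--name=value` argument style, and the parameter names are hypothetical.

```python
from itertools import product


def parameter_sweep(executable, parameters):
    """Build one independent job description per parameter combination."""
    names = sorted(parameters)
    jobs = []
    for values in product(*(parameters[name] for name in names)):
        arguments = [f"--{name}={value}" for name, value in zip(names, values)]
        jobs.append({"executable": executable, "arguments": arguments})
    return jobs


# Hypothetical experiment: 2 pressures x 3 temperatures = 6 jobs.
jobs = parameter_sweep("myprog", {"pressure": [1, 2], "temp": [10, 20, 30]})
print(len(jobs))
print(jobs[0])
```

Because the jobs do not depend on one another, each description can be submitted to whichever resource the broker selects, and the results aggregated afterwards.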
Examples of Resource Brokers (cont.)
AppLeS
Application Level Scheduler
Map large number of independent tasks to
dynamically varying pool of available
computers
Use GRAM to locate resources and initiate
and manage computation
Resource co-allocators
May request resources at multiple sites
Two or more computers and networks
Break multi-request into components
Pass each component to resource manager
Provide means for monitoring job status or
terminating job
Complex:
Two or more resource managers
Global state like availability of resources
difficult to determine
Different co-allocation services
1. Require all resources to be available
before job proceeds; fail globally if failure
occurs at any resource
2. Allocate at least N out of M resources and
return
3. Return immediately, but gradually return
more resources as they become available
Each useful for some class of applications
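The three co-allocation services differ only in how they react to the outcome of the individual allocation attempts. The Python sketch below contrasts them. All function names and the `results` data are hypothetical, and the per-site outcomes are simply given as booleans rather than obtained from real resource managers.

```python
def all_or_nothing(results):
    """Service 1: succeed only if every resource was allocated."""
    granted = [name for name, ok in results.items() if ok]
    return granted if len(granted) == len(results) else []


def at_least_n(results, n):
    """Service 2: succeed if at least n of the m resources were allocated."""
    granted = [name for name, ok in results.items() if ok]
    return granted if len(granted) >= n else []


def incremental(attempts):
    """Service 3: hand over resources one at a time as they become available."""
    for name, ok in attempts:
        if ok:
            yield name


# Made-up outcome of three allocation attempts.
results = {"siteA": True, "siteB": False, "siteC": True}

print(all_or_nothing(results))
print(at_least_n(results, 2))
print(list(incremental(results.items())))
```

With one failed site, the first service returns nothing, while the other two let the application proceed with the two sites that succeeded.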
Concurrent Allocation
If advance reservations are available:
Obtain list of available time slots from each
participating resource manager and choose a time slot
Without reservations:
Optimistically allocate resources
Hope desired set will be available at future time
Use information service (MDS) to determine current
availability of resources
Construct RSL request that is likely to succeed
If allocation fails, all started jobs must be terminated
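The optimistic scheme without reservations, including the rule that all started jobs must be terminated on failure, can be sketched as follows in Python. The `co_allocate` function and the `fake_submit` / `fake_cancel` stand-ins are hypothetical; in a real system the submit and cancel operations would go through the local resource managers.

```python
def co_allocate(requests, submit, cancel):
    """Submit each request in turn; if any submission fails,
    cancel every job already started and report failure."""
    started = []
    for site, rsl in requests:
        handle = submit(site, rsl)
        if handle is None:                 # allocation failed at this site
            for earlier in reversed(started):
                cancel(earlier)            # roll back what was started
            return None
        started.append(handle)
    return started


# Stand-ins for real submission and cancellation (assumptions).
cancelled = []


def fake_submit(site, rsl):
    return None if site == "siteC" else f"{site}/job"


def fake_cancel(handle):
    cancelled.append(handle)


requests = [("siteA", "&(count=5)"), ("siteB", "&(count=5)"),
            ("siteC", "&(count=5)")]
print(co_allocate(requests, fake_submit, fake_cancel))
print(cancelled)
```

The jobs at siteA and siteB were started and then thrown away when siteC failed, which illustrates why this scheme can waste computational resources.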
Disadvantages of
Concurrent Allocation Scheme
Computational resources wasted while
waiting for all requested resources to
become available
Application must be altered to perform
barrier to synchronize startup across
components
Detecting failure of a resource is difficult,
e.g. in queue-based local resource
managers
Local Resource Managers
Implemented with Globus Resource
Allocation Manager (GRAM)
1. Process RSL specifications representing
resource requests; for each request, either:
Deny request
Create one or more processes (jobs) that satisfy
request
2. Enable remote monitoring and management
of jobs
3. Periodically update MDS information service
with current availability and capabilities of
resources
GRAM (cont.)
Interface between grid environment and
entity that can create processes
E.g., Parallel scheduler or Condor pool
GRAM may schedule resource itself
More commonly, maps resource
specification into a request to a local
resource allocation mechanism
E.g., Condor, LoadLeveler, LSF
Co-exists with local mechanisms
GRAM (cont.)
GRAM API has functions for:
Submitting a job request: produces
globally unique job handle
Canceling a job request
Asking when job request is expected to run
Upon submission, can request that progress
be signaled asynchronously to callback URL
GRAM Scheduling Model
Jobs are either:
Pending: resources have not yet been
allocated to the job
Active: resources allocated, job running
Done: when all processes have terminated
and resources have been deallocated
Failed: job terminates due to:
explicit termination
error in request format
failure in resource management system
denial of access to resource
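The scheduling model reads naturally as a small state machine. The Python sketch below encodes it. The four states come from the list above, but the exact set of allowed transitions (for example, that a pending job can fail without ever becoming active) is an assumption of this sketch, and `JobState`, `ALLOWED`, and `advance` are hypothetical names, not GRAM identifiers.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"   # resources not yet allocated
    ACTIVE = "active"     # resources allocated, job running
    DONE = "done"         # processes terminated, resources deallocated
    FAILED = "failed"     # terminated early or request rejected


# Assumed transitions: DONE and FAILED are terminal states.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def advance(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


state = JobState.PENDING
state = advance(state, JobState.ACTIVE)
state = advance(state, JobState.DONE)
print(state.name)
```

A callback-based client would receive one notification per transition, so it would see at most two state changes for any job under these assumptions.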
GRAM Components
Gatekeeper
Responds to a request:
1. Performs mutual authentication of user
and resource
2. Determines local user name for remote
user
3. Starts a job manager that executes as
local user and handles request
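The gatekeeper's three steps can be sketched in Python as below. Everything here is a stand-in: `authenticate` does no real cryptographic verification, the `LOCAL_USERS` table and its subject name are invented, and `handle_request` only returns a description of the job manager it would start rather than launching a process.

```python
# Hypothetical mapping from remote (global) identity to local user name.
LOCAL_USERS = {"/O=Grid/CN=Alice Example": "alice"}


def authenticate(credential):
    """Step 1 stand-in: a real gatekeeper performs mutual authentication;
    here a credential is just a dict with 'subject' and 'valid' fields."""
    return credential.get("subject") if credential.get("valid") else None


def handle_request(credential, rsl):
    subject = authenticate(credential)
    if subject is None:
        raise PermissionError("authentication failed")
    # Step 2: determine the local user name for the remote user.
    local_user = LOCAL_USERS.get(subject)
    if local_user is None:
        raise PermissionError("no local account for " + subject)
    # Step 3: describe the job manager that would run as that local user.
    return {"job_manager_user": local_user, "request": rsl}


credential = {"subject": "/O=Grid/CN=Alice Example", "valid": True}
print(handle_request(credential, "&(executable=myprog)"))
```

Keeping the identity mapping local to the site is what lets each site retain control over which remote users may run jobs and under which account.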
GRAM Components (cont.)
Job manager
Creates processes requested by user
Submits resource allocation requests to
underlying resource management system
(or does fork)
Monitors state of created processes
Notifies callback contact of state transitions
Implements control operations like
termination
GRAM Components (cont.)
GRAM reporter
Responsible for storing into MDS
(information service) info about:
Scheduler structure
Whether reservations are supported
Number of queues
Scheduler state
Currently active jobs
Expected wait time in queue
Total number of nodes and available nodes
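Resource Management Architecture
[Figure: An application passes RSL to a broker, which performs RSL specialization and exchanges queries and info with the information service. Ground RSL goes to the co-allocator, which passes simple ground RSL to the local resource managers: three GRAM instances in front of LSF, EASY-LL, and NQE.]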
Job Submission Interfaces
Globus Toolkit includes several command
line programs for job submission
globus-job-run: Interactive jobs
globus-job-submit: Batch/offline jobs
globusrun: Flexible scripting infrastructure