Resource Management
“A Resource Management Architecture for
Metacomputing Systems”
What is Resource Management?
Mechanisms for locating and allocating
computational resources
Process creation
Remote job submission
Other resources that can be managed:
Resource Management Issues
for Grid Computing
Site autonomy
Resources owned by different organizations,
in different administrative domains
Local policies for use, scheduling, security
Heterogeneous substrate
Different local resource management systems
Policy extensibility
Local sites need ability to customize their
resource management policies
More Issues for Grid Computing
Co-allocation
May need resources at several sites
Mechanism for allocating multiple
resources, initiating computation,
monitoring and managing
On-line control
Adapt application requirements to resource availability
Specifying Resource and Job Requirements
Resource requirements:
Machine type
Number of nodes
Job or scheduler parameters:
Maximum time required
Resource and Job Specification
Globus: Resource Specification Language
Condor: ClassAds (classified advertisements)
Resource owners advertise abilities and constraints
Applications advertise resource requests
Matchmaking: match offers & requests
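As a rough illustration of matchmaking, the sketch below pairs each request with the first unused offer that satisfies all of its requirements. It is not Condor's ClassAd language, and it checks only request-side constraints. The dictionary layout and the names `matches` and `matchmake` are assumptions made for this example.

```python
# Hypothetical, simplified matchmaking; NOT the real Condor ClassAd language.
from typing import Callable, Dict, List, Optional, Tuple

Offer = Dict[str, object]
# A request maps an attribute name to a predicate over the offered value.
Request = Dict[str, Callable[[object], bool]]


def matches(offer: Offer, request: Request) -> bool:
    """True if the offer defines every requested attribute and satisfies it."""
    return all(attr in offer and pred(offer[attr]) for attr, pred in request.items())


def matchmake(offers: List[Offer], requests: List[Request]) -> List[Tuple[int, Optional[int]]]:
    """Greedily pair each request index with the first unused matching offer index."""
    used = set()
    pairs: List[Tuple[int, Optional[int]]] = []
    for r_idx, request in enumerate(requests):
        chosen = None
        for o_idx, offer in enumerate(offers):
            if o_idx not in used and matches(offer, request):
                chosen = o_idx
                used.add(o_idx)
                break
        pairs.append((r_idx, chosen))  # None means no offer matched
    return pairs


offers = [
    {"arch": "x86", "memory_mb": 32},
    {"arch": "sparc", "memory_mb": 128},
]
requests = [
    {"memory_mb": lambda m: m >= 64},
    {"arch": lambda a: a == "x86"},
    {"memory_mb": lambda m: m >= 256},
]
pairs = matchmake(offers, requests)
```

The third request stays unmatched because no offer has enough memory; a real matchmaker would also let resource owners state constraints on the requests they accept.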
Components of Globus Resource
Management Architecture
Resource specification using RSL
Resource brokers: translate resource
requirements into specifications
Co-allocators: break down requests for
multiple sites
Local resource managers: apply local,
site-specific resource management policies
Information service (MDS): information about available
compute resources and their characteristics
Resource Specification Language
Common notation for exchange of
information between components
API provided for manipulating RSL
RSL Syntax
Elementary form: parenthesis clauses
(attribute op value [ value … ] )
Operators Supported:
<, <=, =, >=, > , !=
Some supported attributes:
executable, arguments, environment, stdin,
stdout, stderr, resourceManagerContact,
Unknown attributes are passed through
May be handled by subsequent tools
Constraints: “&”
For example:
& (count>=5) (count<=10)
(max_time=240) (memory>=64)
(executable=myprog)
“Create 5-10 instances of myprog, each
on a machine with at least 64 MB of
memory that is available to me for 4 hours”
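To make the relation syntax concrete, here is a minimal sketch that parses an "&" conjunction of single-valued relations and checks a candidate set of values against it. It is not the Globus RSL library. It handles only numeric values, and the names `parse_conjunction` and `satisfies` are assumptions made for this example.

```python
import operator
import re
from typing import Dict, List, Tuple

# Longer operators come first so "<=" is not read as "<" followed by "=".
RELATION = re.compile(r"\(\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*([^()\s]+)\s*\)")
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt, "!=": operator.ne,
}


def parse_conjunction(rsl: str) -> List[Tuple[str, str, str]]:
    """Parse '& (a op v) (b op v) ...' into (attribute, op, value) triples."""
    text = rsl.strip()
    if not text.startswith("&"):
        raise ValueError("this sketch only handles '&' conjunctions")
    return RELATION.findall(text)


def satisfies(candidate: Dict[str, int], rsl: str) -> bool:
    """True if every relation holds for the candidate (numeric values only)."""
    return all(
        attr in candidate and OPS[op](candidate[attr], int(value))
        for attr, op, value in parse_conjunction(rsl)
    )


request = "& (count>=5) (count<=10) (max_time=240) (memory>=64)"
relations = parse_conjunction(request)
ok = satisfies({"count": 8, "max_time": 240, "memory": 128}, request)
too_many = satisfies({"count": 12, "max_time": 240, "memory": 128}, request)
```

Here `ok` is true and `too_many` is false, because a count of 12 violates `count<=10`.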
Multirequest: “+”
A multirequest allows us to specify multiple
resource needs, for example
+ (& (count=5) (memory>=64) (executable=p1))
(& (network=atm) (executable=p2))
Execute 5 instances of p1 on a machine
with at least 64 MB of memory
Execute p2 on a machine with an ATM connection
Multirequests are central to co-allocation
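The sketch below shows one way a co-allocator might break a "+" multirequest into its top-level components by tracking parenthesis depth. It uses a request like the one described above: 5 instances of p1 with at least 64 MB, plus p2 on an ATM-connected machine. The function name `split_multirequest` is an assumption made for this example, not part of any Globus API.

```python
from typing import List, Optional


def split_multirequest(rsl: str) -> List[str]:
    """Return the top-level parenthesized components of a '+' multirequest."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multirequest")
    components: List[str] = []
    depth = 0
    start: Optional[int] = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # opening parenthesis of a new top-level component
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                components.append(text[start + 1:i].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return components


request = "+ (&(count=5)(memory>=64)(executable=p1)) (&(network=atm)(executable=p2))"
parts = split_multirequest(request)
```

Each element of `parts` is an ordinary "&" request that could then be handed to a separate resource manager.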
Resource Broker
Takes high-level RSL specification
Transforms into concrete specifications
through “specialization” process
Locate resources that meet requirements
Multiple brokers may service single request
Application-specific brokers translate
application requirements
Output: complete specification of locations
of resources; given to co-allocator
Examples of Resource Brokers
Nimrod-G
Automates creation and management of
large parametric experiments
Run application under wide range of input
conditions and aggregate results
Queries MDS to find resources
Generates number of independent jobs
GRAM allocates jobs to computational nodes
Higher-level broker: allows user to specify
time and cost constraints
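The parametric-experiment pattern described above can be sketched as a parameter sweep: expand the input ranges into independent jobs, run each one, and aggregate the results. This is only an illustration of the idea, not the broker's actual interface. The function `sweep`, the parameter names, and the stand-in "application" are all assumptions made for this example.

```python
from itertools import product
from typing import Dict, List, Sequence


def sweep(parameters: Dict[str, Sequence[int]]) -> List[Dict[str, int]]:
    """Expand parameter ranges into one independent job description per combination."""
    names = list(parameters)
    return [
        dict(zip(names, combo))
        for combo in product(*(parameters[name] for name in names))
    ]


# Hypothetical experiment: 3 pressures x 2 temperatures = 6 independent jobs.
jobs = sweep({"pressure": [1, 2, 3], "temperature": [280, 300]})

# Stand-in for running the application once per job on some resource.
results = [job["pressure"] * job["temperature"] for job in jobs]

# Aggregate the results of all runs.
summary = {"runs": len(results), "max": max(results)}
```

Because the jobs do not depend on each other, each one can be submitted to whichever resource is available.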
Examples of Resource Brokers
AppLeS (Application Level Scheduler)
Map large number of independent tasks to
dynamically varying pool of available resources
Use GRAM to locate resources and initiate
and manage computation
Resource co-allocators
May request resources at multiple sites
Two or more computers and networks
Break multi-request into components
Pass each component to resource manager
Provide means for monitoring job status or
terminating job
Two or more resource managers
Global state like availability of resources
difficult to determine
Different co-allocation services
Require all resources to be available
before job proceeds; fail globally if failure
occurs at any resource
Allocate at least N out of M resources and then return
Return immediately, but gradually return
more resources as they become available
Each useful for some class of applications
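Two of these policies can be contrasted in a short sketch: an all-or-nothing policy and an "at least N out of M" policy. Each site is modeled as a callable that reports whether it granted its part of the request. The names `atomic` and `at_least` and the site model are assumptions made for this example.

```python
from typing import Callable, Dict, List

Allocator = Callable[[], bool]  # returns True if the site granted its request


def try_all(sites: Dict[str, Allocator]) -> List[str]:
    """Ask every site and return the names of those that granted the request."""
    return [name for name, allocate in sites.items() if allocate()]


def atomic(sites: Dict[str, Allocator]) -> List[str]:
    """All-or-nothing: fail globally if any single site fails."""
    granted = try_all(sites)
    # A real co-allocator would also release whatever was granted on failure.
    return granted if len(granted) == len(sites) else []


def at_least(sites: Dict[str, Allocator], n: int) -> List[str]:
    """Succeed if at least n of the requested sites granted the request."""
    granted = try_all(sites)
    return granted if len(granted) >= n else []


sites = {"siteA": lambda: True, "siteB": lambda: False, "siteC": lambda: True}
atomic_result = atomic(sites)
partial_result = at_least(sites, 2)
```

With one failing site, the atomic policy returns nothing while the 2-of-3 policy proceeds with `siteA` and `siteC`.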
Concurrent Allocation
If advance reservations are available:
- Obtain list of available time slots from each
participating resource manager and choose timeslot
Without reservations:
- Optimistically allocate resources
- Hope desired set will be available at future time
- Use information service (MDS) to determine current
availability of resources
- Construct RSL request that is likely to succeed
- If allocation fails, all started jobs must be terminated
Disadvantages of
Concurrent Allocation Scheme
Computational resources wasted while
waiting for all requested resources to
become available
Application must be altered to perform
barrier to synchronize startup across sites
Detecting failure of a resource is difficult,
e.g. in queue-based local resource managers
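The two variants of the concurrent allocation scheme can be sketched as follows. With reservations, pick a time slot that every resource manager offers. Without them, start jobs optimistically and terminate everything already started if any allocation fails. Time slots are modeled as integers and resource managers as callables; these models and the function names are assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional, Set


def earliest_common_slot(slots: Dict[str, Set[int]]) -> Optional[int]:
    """With reservations: choose the earliest slot offered by every manager."""
    if not slots:
        return None
    common = set.intersection(*slots.values())
    return min(common) if common else None


def optimistic_start(
    starters: Dict[str, Callable[[], bool]],
    terminate: Callable[[str], None],
) -> List[str]:
    """Without reservations: start each job; on any failure, terminate the rest."""
    started: List[str] = []
    for name, start in starters.items():
        if start():
            started.append(name)
        else:
            for running in started:
                terminate(running)  # roll back jobs that had already started
            return []
    return started


slot = earliest_common_slot({"rm1": {9, 10, 14}, "rm2": {10, 14, 16}, "rm3": {14, 10}})

terminated: List[str] = []
result = optimistic_start(
    {"rm1": lambda: True, "rm2": lambda: True, "rm3": lambda: False},
    terminated.append,
)
```

In the second call the failure at `rm3` forces the jobs at `rm1` and `rm2` to be terminated, which illustrates the wasted work noted among the disadvantages.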
Local Resource Managers
Implemented with Globus Resource
Allocation Manager (GRAM)
1. Process RSL specifications representing
resource requests
- Deny request
- Create one or more processes (jobs) that satisfy
the request
2. Enable remote monitoring and management
of jobs
3. Periodically update MDS information service
with current availability and capabilities of
the resources it manages
GRAM (cont.)
Interface between grid environment and
entity that can create processes
E.g., Parallel scheduler or Condor pool
GRAM may schedule resource itself
More commonly, maps resource
specification into a request to a local
resource allocation mechanism
E.g., Condor, LoadLeveler, LSF
Co-exists with local mechanisms
GRAM (cont.)
GRAM API has functions for:
Submitting a job request: produces
globally unique job handle
Canceling a job request
Asking when job request is expected to run
Upon submission, can request that progress
be signaled asynchronously to callback URL
GRAM Scheduling Model
Jobs are either:
Pending: resources have not yet been
allocated to the job
Active: resources allocated, job running
Done: when all processes have terminated
and resources have been deallocated
Failed: job terminates due to:
- explicit termination
- error in request format
- failure in resource management system
- denial of access to resource
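The four job states can be modeled as a small state machine that notifies a callback on every transition. The set of allowed transitions below is an assumption made for this example (pending to active or failed, active to done or failed), as are the class and method names. It is not the GRAM API.

```python
from enum import Enum
from typing import Callable, List


class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Assumed transition table; DONE and FAILED are treated as terminal states.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """Hypothetical job record that reports each state change to a callback."""

    def __init__(self, on_change: Callable[[JobState], None]) -> None:
        self.state = JobState.PENDING
        self._on_change = on_change

    def move_to(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self._on_change(new_state)


history: List[JobState] = []
job = Job(history.append)
job.move_to(JobState.ACTIVE)
job.move_to(JobState.DONE)
```

After these calls `history` holds the ACTIVE and DONE notifications, and any further transition raises `ValueError` because DONE is terminal.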
GRAM Components
Gatekeeper
Responds to a request:
1. Performs mutual authentication of user
and resource
2. Determines local user name for remote user
3. Starts a job manager that executes as
local user and handles request
GRAM Components (cont.)
Job manager
Creates processes requested by user
Submits resource allocation requests to
underlying resource management system
(or does fork)
Monitors state of created processes
Notifies callback contact of state transitions
Implements control operations such as process termination
GRAM Components (cont.)
GRAM reporter
Responsible for storing into MDS
(information service) info about:
Scheduler structure
- Support reservations?
- Number of queues
Scheduler state
- Currently active jobs
- Expected wait time in queue
- Total number of nodes and available nodes
Resource Management Architecture
[Figure: architecture diagram; recoverable labels: Ground RSL, Simple ground RSL]
Job Submission Interfaces
Globus Toolkit includes several command
line programs for job submission
globus-job-run: Interactive jobs
globus-job-submit: Batch/offline jobs
globusrun: Flexible scripting infrastructure