Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting

Scalable Systems Software Center
Resource Management and Accounting
Working Group
Face-to-Face Meeting
June 13-14, 2002
Resource Management and
Accounting Working Group
• Working Group Scope and Components
• Progress over last quarter
• Current issues being worked
• Next steps
• Discussions involving larger group
Working Group Scope
The Resource Management Working Group is involved in the areas of
resource management, scheduling and accounting.
This working group will focus on the following software components:
• Queue Manager
• Scheduler
• Allocation Manager (and accounting)
• Meta Scheduler
Other critical resource management components are being developed in
the Process Management and Monitoring Working Group:
• Process Manager
• Node Monitor
Proposed Component
Architecture
[Diagram. Components shown: Meta Scheduler, Allocation Manager, Local
Scheduler, Queue Manager, Process Manager, Node Monitor, Node Manager,
Infrastructure Services, Information Service, Discovery Service,
Security System.]
Color key (working group):
• Resource Management and Accounting
• Execution Management and Monitoring
• Node Configuration and Infrastructure
Resource Management Prototype
Demonstration
[Diagram. Components shown: Job Submission Client, Queue Manager, Local
Scheduler, Allocation Manager, Process Manager, Node Monitor, Discovery
Service.]
Labeled steps:
• 1 Submit-Job
• 4 Create-Reservation
• 6 Exec-Process
• 9 Withdraw-Allocation
This demo runs a simple end-to-end test with a job being submitted and
running past its wallclock limit.
Color key (working group):
• Resource Management and Accounting
• Execution Management and Monitoring
• Node Configuration and Infrastructure
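The labeled steps of the demonstration can be sketched as a small message trace. This is only an illustration: the sender and receiver of each step are inferred from the progress notes in this report, the steps whose labels are not shown on the slide are left out, and the names Step, DEMO_STEPS, trace and exceeded_wallclock are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    message: str
    sender: str      # inferred from the progress notes, not from the slide
    receiver: str    # inferred from the progress notes, not from the slide


# Only the four steps whose labels appear on the slide are listed.
# Steps 2, 3, 5, 7 and 8 are not recoverable and are deliberately omitted.
DEMO_STEPS = [
    Step(1, "Submit-Job", "Job Submission Client", "Queue Manager"),
    Step(4, "Create-Reservation", "Local Scheduler", "Allocation Manager"),
    Step(6, "Exec-Process", "Queue Manager", "Process Manager"),
    Step(9, "Withdraw-Allocation", "Local Scheduler", "Allocation Manager"),
]


def exceeded_wallclock(elapsed_s, limit_s):
    """Return True when a job has run past its wallclock limit."""
    return elapsed_s > limit_s


def trace(steps):
    """Format the steps in numeric order, one line per message."""
    ordered = sorted(steps, key=lambda step: step.number)
    return [f"{s.number}. {s.sender} -> {s.receiver}: {s.message}" for s in ordered]


if __name__ == "__main__":
    for line in trace(DEMO_STEPS):
        print(line)
```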
General Progress
• Prototype components (Queue Manager and Allocation
Manager) advanced to the stage of responding to basic
requests over the XML protocol
• Existing components (Maui, PBS) partially modified to
communicate with SSS components over XML
• We can run a job now completely in SSS protocol!
• Initial Requirements documents for Allocation Manager &
Queue Manager drafted
• Began initial draft of Scalable Systems Software Resource
Management and Accounting Protocol (SSSRMAP)
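As a rough illustration of components exchanging requests as XML, the sketch below builds and parses a request with Python's standard library. The element and attribute names (Request, action, User, WallclockLimit) are assumptions made for this example and are not taken from the SSSRMAP draft.

```python
import xml.etree.ElementTree as ET


def build_request(action, **fields):
    """Serialize a request as XML text (hypothetical message layout)."""
    root = ET.Element("Request", {"action": action})
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_request(text):
    """Return the action and a dict of child element texts."""
    root = ET.fromstring(text)
    return root.get("action"), {child.tag: child.text for child in root}


if __name__ == "__main__":
    message = build_request("Submit-Job", User="alice", WallclockLimit=60)
    print(message)
    print(parse_request(message))
```

All values come back as strings after parsing, so a real component would still need to validate and convert them.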
Scheduler Progress
• Developed own XML parser/builder
• Converted to internal use of XML (job checkpointing etc.)
• Logically separated Node Monitor & Queue Manager interfaces
• Implemented and tested XML interface to Allocation Manager to create
reservations and make allocation withdrawals
• Implemented and tested XML interface to Queue Manager to query,
start and cancel jobs
• Implemented and tested XML interface to Node Monitor to query
nodes
• Modified scheduler clients to allow SSS-0.1 socket protocol interface
• Modified checkjob output to display machine-readable AVP data
• Progress on log-based job (resource x duration and node-mapping) GUI
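Machine-readable attribute-value pair (AVP) output is easy for other tools to consume. The sketch below assumes a whitespace-separated key=value layout, which is a guess for illustration only; the actual checkjob AVP format is not given in this report, and parse_avp is a hypothetical helper.

```python
def parse_avp(line):
    """Parse 'key=value key=value ...' into a dict (assumed layout)."""
    pairs = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"not an attribute-value pair: {token!r}")
        pairs[key] = value
    return pairs


if __name__ == "__main__":
    # Made-up sample line, not real checkjob output.
    print(parse_avp("job=42 state=Running nodes=4"))
```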
Meta Scheduler Progress
• Call Dave in AM or get from Brett
Queue Manager Progress
• Initial Queue-Manager server and clients supporting:
job submission, job query, job deletion and job startup
• Queue manager and clients use XML over basic protocol
• Queue Manager supports challenge protocol for communications with
the Process Manager
• Submission client submits job to queue manager and queue manager
reports status to user client
• Tested interaction with the scheduler to return job information, start a
job and cancel a job
• Job startup is supported via create-process commands with the process
manager
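The report does not describe how the challenge protocol works. As a point of reference only, the sketch below shows a generic HMAC-based challenge-response exchange between two parties that share a key. It is not the SSS challenge protocol, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Server side: generate a fresh random challenge."""
    return secrets.token_bytes(16)


def respond(shared_key, challenge):
    """Client side: prove knowledge of the key without sending it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key, challenge, response):
    """Server side: check the response using a constant-time comparison."""
    expected = respond(shared_key, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    key = b"example shared key"  # placeholder key for the demo
    challenge = make_challenge()
    print(verify(key, challenge, respond(key, challenge)))
```

A fresh challenge per connection keeps a captured response from being replayed later.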
Allocation Manager Progress
• Completed first draft of initial requirements
• Reviewed requirements/design of other existing project management
software
• Implemented audit log
• Preservation of historical state (distinct from audit log – allows
statement creation and time travel)
• Support for operators and conjunctions in queries
• Reworked class structure and schema to support dynamic extensibility
of objects and attributes
• Implemented cached metadata dictionary (for dynamic web-GUIs and
generic proxy handling of objects)
• Lots of work on the protocol
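Preserving historical state, as opposed to keeping only an audit log, means a value can be read back as of any past time. A minimal in-memory sketch of that idea follows, assuming records arrive in time order; the History class is hypothetical and does not reflect the Allocation Manager's actual schema.

```python
import bisect


class History:
    """Append-only record of one value over time (illustrative only)."""

    def __init__(self):
        self._times = []
        self._values = []

    def record(self, time, value):
        """Store the value that takes effect at the given time."""
        if self._times and time < self._times[-1]:
            raise ValueError("records must be added in time order")
        self._times.append(time)
        self._values.append(value)

    def as_of(self, time):
        """Return the value in effect at the given time, or None if none yet."""
        index = bisect.bisect_right(self._times, time)
        if index == 0:
            return None
        return self._values[index - 1]


if __name__ == "__main__":
    balance = History()
    balance.record(10, 100)   # allocation balance becomes 100 at t=10
    balance.record(20, 75)    # a withdrawal leaves 75 at t=20
    print(balance.as_of(15))
```

A statement for a period can then be produced by reading the value at the start and end of that period.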
Current Issues
• How best to provide XML interface for PBS
• Working with Software Engineering Working Group to
decide on test framework
• Seeking to clarify interaction with node manager
• Determining which component best suited to handle
arbitrary batch-specific node features
Next Work
• Release initial resource management interface
specification
• Incorporate security in RMA components
• All components under CVS
• Testing framework installed and first tests created
for each component
Next Work
Local Scheduler
• Test interaction with checkpoint/restart mechanisms when
interfaces ready
• Lots of testing and write-up of new capabilities
• Certification of milestones (20% of bullet items ready to be
checked off)
• Security integration
• Progress on graphical interfaces
Next Work
Queue manager
• Documentation and packaging for easy site configuration
(nearly done)
• Implementation of a backside database connection to
provide job queue persistence across restarts of the Queue
manager
• Full challenge protocol support in clients and server
• QM support for more advanced jobs, job prologue/epilogue,
and stdout/stderr handling
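Job queue persistence across restarts can be sketched with a small database-backed store. The example uses SQLite from the Python standard library purely for illustration; the report does not say which database the Queue Manager will use, and the JobStore class and its table layout are hypothetical.

```python
import sqlite3


class JobStore:
    """Keeps queued jobs in a database file so they survive a restart."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "command TEXT NOT NULL, "
            "state TEXT NOT NULL)"
        )
        self._db.commit()

    def submit(self, command):
        """Insert a job in the 'queued' state and return its id."""
        cursor = self._db.execute(
            "INSERT INTO jobs (command, state) VALUES (?, 'queued')",
            (command,),
        )
        self._db.commit()
        return cursor.lastrowid

    def queued(self):
        """Return (id, command) for every queued job, oldest first."""
        rows = self._db.execute(
            "SELECT id, command FROM jobs WHERE state = 'queued' ORDER BY id"
        )
        return rows.fetchall()

    def close(self):
        self._db.close()
```

Opening a second JobStore on the same file after the first is closed returns the jobs submitted earlier, which is the restart behavior the bullet describes.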
Next Work
Allocation manager
• Focus on getting QBank ready for bundling with SSS
(security, use key, improved installation procedure)
• Focus effort on open source of new Allocation Manager
(gold)
• Implement simple pricing engine
• Develop XML schema for external pricing
• Implementation of functional allocation, reservation
mechanisms
• Security integration (gold)
Issues requiring inter-group
discussion
• Framing mechanism
• Security protocol
• Need to solidify SSS-wide standards for
packaging, testing, revision control,
documentation, problem tracking, etc.