Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting
June 13-14, 2002

Resource Management and Accounting Working Group
• Working Group Scope and Components
• Progress over last quarter
• Current issues being worked
• Next steps
• Discussions involving larger group

Working Group Scope
The Resource Management Working Group is involved in the areas of resource management, scheduling and accounting. This working group will focus on the following software components:
• Queue Manager
• Scheduler
• Allocation Manager (and accounting)
• Meta Scheduler
Other critical resource management components are being developed in the Process Management and Monitoring Working Group:
• Process Manager
• Node Monitor

Proposed Component Architecture
[Diagram: Meta Scheduler, Allocation Manager, Local Scheduler, Queue Manager, Process Manager, Node Monitor, Node Manager, and Infrastructure Services (Information Service, Discovery Service, Security System). Color key by working group: Resource Management and Accounting; Execution Management and Monitoring; Node Configuration and Infrastructure.]

Resource Management Prototype Demonstration
[Diagram: Job Submission Client, Queue Manager, Local Scheduler, Allocation Manager, Discovery Service, Process Manager and Node Monitor; labeled steps include 1 Submit-Job, 4 Create-Reservation, 6 Exec-Process and 9 Withdraw-Allocation. Same color key as above.]
This demo runs a simple end-to-end test in which a job is submitted and runs past its wallclock limit.

General Progress
• Prototype components (Queue Manager and Allocation Manager) advanced to the stage of responding to basic requests over the XML protocol
• Existing components (Maui, PBS) partially modified to communicate with SSS components over XML
• We can now run a job completely in the SSS protocol!
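The prototype demonstration above can be sketched as a small in-process simulation of its labeled steps. This is a minimal sketch under several assumptions: the XML element and attribute names (`Request`, `action`, `nodes`, `wallclock`, `used`), the node-second accounting, and the rule that a job is stopped and charged at its wallclock limit are all hypothetical, since the slide shows only steps 1, 4, 6 and 9 and not the SSSRMAP message syntax. The unlabeled intermediate steps are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical message format: tag and attribute names are illustrative only,
# not the actual SSSRMAP syntax.
def make_request(action, **attrs):
    elem = ET.Element("Request", {"action": action})
    for key, value in attrs.items():
        elem.set(key, str(value))
    return ET.tostring(elem, encoding="unicode")

def read_request(message):
    elem = ET.fromstring(message)
    attrs = dict(elem.attrib)
    return attrs.pop("action"), attrs

class QueueManager:
    def __init__(self):
        self.jobs = {}  # job id -> job description

    def handle(self, message):
        action, a = read_request(message)
        if action == "Submit-Job":
            job_id = f"job.{len(self.jobs) + 1}"
            self.jobs[job_id] = {"nodes": int(a["nodes"]),
                                 "wallclock": int(a["wallclock"]),
                                 "state": "Idle"}
            return job_id
        return "unknown"

class AllocationManager:
    def __init__(self, balance):
        self.balance = balance  # node-seconds available (assumed unit)
        self.reserved = {}      # job id -> node-seconds held by a reservation

    def handle(self, message):
        action, a = read_request(message)
        if action == "Create-Reservation":
            amount = int(a["nodes"]) * int(a["wallclock"])
            if amount > self.balance - sum(self.reserved.values()):
                return "denied"
            self.reserved[a["job"]] = amount
            return "ok"
        if action == "Withdraw-Allocation":
            self.reserved.pop(a["job"], None)  # release the hold
            self.balance -= int(a["nodes"]) * int(a["used"])
            return "ok"
        return "unknown"

class ProcessManager:
    def handle(self, message, actual_runtime):
        action, a = read_request(message)
        if action != "Exec-Process":
            raise ValueError(action)
        # Assumption: the job is stopped once it reaches its wallclock limit.
        return min(actual_runtime, int(a["wallclock"]))

def run_demo(actual_runtime):
    qm, am, pm = QueueManager(), AllocationManager(balance=1000), ProcessManager()
    # Step 1: the submission client sends Submit-Job to the Queue Manager.
    job = qm.handle(make_request("Submit-Job", nodes=2, wallclock=60))
    spec = qm.jobs[job]
    # Step 4: the scheduler reserves nodes x wallclock with the Allocation Manager.
    reply = am.handle(make_request("Create-Reservation", job=job,
                                   nodes=spec["nodes"], wallclock=spec["wallclock"]))
    if reply != "ok":
        return job, "Rejected", am.balance
    # Step 6: the job is launched through the Process Manager.
    used = pm.handle(make_request("Exec-Process", job=job,
                                  wallclock=spec["wallclock"]), actual_runtime)
    spec["state"] = "Removed" if actual_runtime > spec["wallclock"] else "Completed"
    # Step 9: charge only the time actually used, releasing the reservation.
    am.handle(make_request("Withdraw-Allocation", job=job,
                           nodes=spec["nodes"], used=used))
    return job, spec["state"], am.balance
```

With a 90-second job against a 60-second limit, `run_demo(90)` returns `("job.1", "Removed", 880)`: the job is cut off at 60 seconds on 2 nodes and 120 node-seconds are withdrawn.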
• Initial requirements documents for Allocation Manager & Queue Manager drafted
• Began initial draft of the Scalable Systems Software Resource Management and Accounting Protocol (SSSRMAP)

Scheduler Progress
• Developed own XML parser/builder
• Converted to internal use of XML (job checkpointing, etc.)
• Logically separated Node Monitor & Queue Manager interfaces
• Implemented and tested XML interface to Allocation Manager to create reservations and make allocation withdrawals
• Implemented and tested XML interface to Queue Manager to query, start and cancel jobs
• Implemented and tested XML interface to Node Monitor to query nodes
• Modified scheduler clients to allow SSS-0.1 socket protocol interface
• Modified checkjob output to display machine-readable AVP data
• Progress on log-based job (resource × duration and node-mapping) GUI

Meta Scheduler Progress
• Call Dave in AM or get from Brett

Queue Manager Progress
• Initial Queue Manager server and clients supporting job submission, job query, job deletion and job startup
• Queue Manager and clients use XML over basic protocol
• Queue Manager supports challenge protocol for communications with the Process Manager
• Submission client submits job to Queue Manager, and Queue Manager reports status to user client
• Tested interaction with scheduler to return job information, start a job and cancel a job
• Job startup is supported via create-process commands to the Process Manager

Allocation Manager Progress
• Completed first draft of initial requirements
• Reviewed requirements/design of other existing project management software
• Implemented audit log
• Preservation of historical state (distinct from audit log; allows statement creation and time travel)
• Support for operators and conjunctions in queries
• Reworked class structure and schema to support dynamic extensibility of objects and attributes
• Implemented cached metadata dictionary (for dynamic web GUIs and generic proxy handling of objects)
• Lots of work on the protocol
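The Allocation Manager's preservation of historical state, which the slides distinguish from the audit log, can be illustrated with a versioned table: every update appends a timestamped copy of the object, so a query can reconstruct any past state ("time travel") and a statement can compare balances at the start and end of a period. This is a hypothetical sketch; `HistoricalTable`, `as_of` and `statement` are illustrative names, not the QBank or Gold API, and the real schema is not described in the slides.

```python
import bisect

class HistoricalTable:
    """Hypothetical sketch: keep every version of each object so a query can
    ask what the object looked like at an earlier time."""

    def __init__(self):
        self._times = {}   # object key -> sorted list of update timestamps
        self._values = {}  # object key -> attribute dicts, parallel to _times

    def update(self, key, timestamp, **attrs):
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        if times and timestamp < times[-1]:
            raise ValueError("updates must arrive in time order")
        # Each version is a full copy of the previous one plus the changes.
        current = dict(values[-1]) if values else {}
        current.update(attrs)
        times.append(timestamp)
        values.append(current)

    def as_of(self, key, timestamp):
        """Return the attributes of `key` as they stood at `timestamp`, or None."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, timestamp)
        return dict(self._values[key][i - 1]) if i else None

def statement(table, account, start, end):
    """Opening and closing balance for a period, read from historical state."""
    opening = table.as_of(account, start) or {"balance": 0}
    closing = table.as_of(account, end) or {"balance": 0}
    return opening["balance"], closing["balance"]
```

For example, after updates at times 0, 10 and 20 setting an account's balance to 1000, 880 and 820, `as_of(account, 15)` reports 880 and `statement(table, account, 5, 25)` returns `(1000, 820)`.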
Current Issues
• How best to provide an XML interface for PBS
• Working with Software Engineering Working Group to decide on test framework
• Seeking to clarify interaction with node manager
• Determining which component is best suited to handle arbitrary batch-specific node features

Next Work
• Release initial resource management interface specification
• Incorporate security in RMA components
• All components under CVS
• Testing framework installed and first tests created for each component

Next Work: Local Scheduler
• Test interaction with checkpoint/restart mechanisms when interfaces are ready
• Lots of testing and write-up of new capabilities
• Certification of milestones (20% of bullet items ready to be checked off)
• Security integration
• Progress on graphical interfaces

Next Work: Queue Manager
• Documentation and packaging for easy site configuration (nearly done)
• Implementation of a back-end database connection to provide job queue persistence across restarts of the Queue Manager
• Full challenge protocol support in clients and server
• QM support for more advanced jobs, job prologue/epilogue, and stdout/stderr handling

Next Work: Allocation Manager
• Focus on getting QBank ready for bundling with SSS (security, use key, improved installation procedure)
• Focus effort on open-sourcing the new Allocation Manager (Gold)
• Implement simple pricing engine
• Develop XML schema for external pricing
• Implementation of functional allocation and reservation mechanisms
• Security integration (Gold)

Issues Requiring Inter-Group Discussion
• Framing mechanism
• Security protocol
• Need to solidify SSS-wide standards for packaging, testing, revision control, documentation, problem tracking, etc.
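The framing mechanism is listed as an open inter-group issue. One common candidate is length-prefixed framing, sketched here purely as an illustration; it is not the mechanism the group settled on, and the 4-byte big-endian length header and the `frame`/`Deframer` names are assumptions. Each XML message is preceded by its byte length, so a receiver reading from a socket can split the stream into whole messages without scanning the XML for a terminator.

```python
import struct

# Assumed header: 4-byte unsigned length in network (big-endian) byte order.
HEADER = struct.Struct("!I")

def frame(payload: str) -> bytes:
    """Prefix a UTF-8 encoded XML message with its length in bytes."""
    data = payload.encode("utf-8")
    return HEADER.pack(len(data)) + data

class Deframer:
    """Reassemble whole messages from arbitrarily split socket reads."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer, 0)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this message
            messages.append(self._buffer[HEADER.size:end].decode("utf-8"))
            del self._buffer[:end]
        return messages
```

Because `feed` buffers bytes until a complete message has arrived, it copes with TCP delivering one message in several reads or several messages in one read; a production version would also cap the accepted length.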