Platform LSF HPC Business Presentation

Integrated Workload Management for Beowulf Clusters
Bill DeSalvo – August 18, 2004
What We’ll Cover
Platform LSF Family of Products
What is Platform LSF HPC
Key Features & Benefits
How it Works
Q&A
© Platform Computing Inc. 2003
Platform’s Grid Solution Architecture
Technical Computing Product Family
Platform LSF Family of Products
Platform LSF
Intelligent, policy-driven batch application workload processing
Manage & accelerate batch workloads for compute- and data-intensive applications
Platform LSF HPC
Intelligent, policy-driven high performance computing (HPC) workload processing
Manage & accelerate High Performance Computing (HPC) mission-critical workload
Complementary Products
Platform LSF MultiCluster
Intelligent, policy-driven batch application workload processing across multiple Platform LSF clusters
Share between autonomously managed departments or organizations spanning geographical locations
Platform LSF License Scheduler
Intelligent, policy-driven application license optimization for Platform LSF clusters
Optimize the usage of all application licenses based on an organization’s established distribution policy
Platform LSF Analytics
Intelligent delivery of precise information for better project decisions
Better coordinate projects, estimate project completion times and provision resources more accurately
Platform LSF Reports
Intelligent cluster operation reporting for Platform LSF clusters
Visibility into cluster utilization
What Problems Are We Solving?
Solve large, grand challenge, complex problems by optimizing the
placement of workload in High Performance Computing environments
Platform LSF HPC
Intelligent, policy-driven high performance computing (HPC) workload
processing
Parallel & sequential batch workload management for High
Performance Computing (HPC)
Includes patent-pending topology-based scheduling
Intelligently schedules parallel batch jobs
Virtualizes resources
Prioritizes service levels based on policies
Based on Platform LSF:
Standards-based, OGSI-compliant, grid-enabled solution
Commercial production quality product
Platform Customers
Platform LSF HPC
Platform LSF HPC AlphaServer SC
Platform LSF HPC for IBM
Platform LSF HPC for Linux
Platform LSF HPC for SGI
Platform LSF HPC for Cray
Extensive Hardware Support
HP
HP AlphaServer SC
HP XC
HP Superdome
HP-UX 11i
SGI
SGI IRIX
SGI TRIX
SGI Altix, SGI Propack
IBM
IBM RS/6000 AIX
IBM SP2/SP3
Linux
IA-64 systems with RedHat
Intel, AMD 32-bit systems with Linux kernel
Sun
Sun Solaris
High Performance Interconnects
Myrinet with GM
Quadrics QsNet
SGI NUMAflex, SGI NUMAlink
IBM SP Switch
Platform LSF HPC – Linux Support
HP
HP XC Systems running Unlimited Linux
HP Itanium 2 systems running LINUX 2.4.x kernel, glibc 2.2 with RMS on
Quadrics QsNet/Elan3
HP Alpha/AXP systems running LINUX 2.4.x kernel, glibc 2.2.x with RMS on
Quadrics QsNet/Elan3
Linux
IA-64 systems, Kernel 2.4.x, compiled with glibc 2.2.x, tested on RedHat 7.3
x86 systems:
Kernel 2.2.x, compiled with glibc 2.1.x, tested on Debian 2.2, OpenLinux 2.4,
RedHat 6.2 and 7.0, SuSE 6.4 and 7.0, TurboLinux 6.1
Kernel 2.4.x, compiled with glibc 2.1.x, tested on RedHat 7.x and 8.0, and
SuSE 7.0, and RedHat Linux Advanced Server 2.1
Clustermatic Linux 3.0 Kernel 2.4.x, compiled with glibc 2.2.x, tested on
RedHat 8.0
Scyld Linux, Kernel 2.4.x, compiled with glibc 2.2.x.
SGI
SGI Altix systems running Linux Kernel 2.4.x compiled with glibc 2.2.x and
SGI Propack 2.2 and higher
Key Features and Benefits
Platform LSF HPC
Key Features
Optimized Application, System and Hardware Performance
Enhanced Accounting, Auditing & Control
Commercial Grade System Scalability & Reliability
Extensive Hardware Support
Comprehensive, Extensible and Standards-based Security
Key Features – Platform LSF HPC
Optimized Application, System and Hardware Performance
Enhanced Accounting, Auditing & Control
Commercial Grade System Scalability & Reliability
Comprehensive, Extensible and Standards-based Security
Adaptive Interconnect Performance Optimization
Scheduling that takes advantage of unique interconnect
properties
IBM SP Switch at the POE software level
RMS on AlphaServer SC (Quadrics)
SGI topology hardware graph
Out-of-the-box functionality without any customization
required
Generic Parallel Job Launcher
Generic support for all different types of Parallel Job
Launchers
LAMMPI, MPICH-GM, MPICH-P4, POE, SCALI,
CHAMPION PRO, etc
Customizable for any vendor or publicly available parallel
solution
Control - ensuring no jobs can escape the workload
management system
HPC Workload Scheduling
Dynamic load balancing supporting heterogeneous workloads
IBM SP switch aware scheduling
Scheduling of parallel jobs
Number of CPUs, min/max, node span
Backfill on processor & memory
Processor & memory reservation
Topology aware scheduling
Exclusive scheduling
Advance Reservation
Fairshare, Preemption
Accounting
Intelligent Scheduling Policies
Fairshare (User & Project-based)
Ensures job resources are used for the right work
Guarantees that resource allocations among users and projects are met
Coordinates access to the right number of resources for different users and
projects according to pre-defined shares
Differentiation: hierarchical & guaranteed fairshare
Policy-based Preemption
Maximizes throughput of high priority critical work based on priority and
load conditions
Prevents starvation of lower priority work
Differentiation: Platform LSF supports multiple preemption policies
Goal-oriented SLA-driven policies
Based on customer SLA-driven goals: Deadline, Velocity, Throughput
Guarantees projects are completed on time
Reduces project and administration costs
Provides visibility into the progress of projects
Allows the admin to focus on “what work and when” needs to be done, not
“how” the resources are to be allocated
[Diagram: Intelligent Scheduler with plug-in scheduler modules: Preemption, Resource Reservation, Advance Reservation, License Scheduling, SLA (Service Level Agreement) Scheduling, MultiCluster, other scheduling modules]
Advanced Self-Management
Flexible, Comprehensive Resource Definitions
Resources defined on a node basis across an entire cluster or subset of the nodes in a
cluster
Auto-detectable or user defined resources
Adaptive membership – nodes join and leave Platform LSF clusters dynamically and
automatically without administration effort
Dynamic or static resources
Job Level Exception Management
Exception-based error detection to take automatic, configurable, corrective actions
Increased job reliability & predictability
Improved visibility on job and system errors & reduced administration overhead and
costs
Automatic Job Migration and Requeue
Automatically migrate and requeue jobs based on policies in the event of host or
network failures
Reduce user and administrator overhead in managing failures & reduce risk of running
critical workloads
Master Scheduler Failover
Automatically fail over to another host if the master host is unavailable
Continuous scheduling service and job execution; eliminates manual intervention
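One way to picture the failover behaviour is as choosing the first reachable host from an ordered list of master candidates. The sketch below illustrates only that idea; the host names and the `is_alive` probe are hypothetical, and it does not reproduce the actual election protocol between LIMs.

```python
from typing import Callable, Optional, Sequence


def elect_master(candidates: Sequence[str],
                 is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate host that responds, or None if none do.

    Assumption: candidates are listed in order of preference, so the
    preferred master wins whenever it is reachable.
    """
    for host in candidates:
        if is_alive(host):
            return host
    return None


if __name__ == "__main__":
    # Hypothetical cluster state: the preferred master (host1) is down.
    up = {"host1": False, "host2": True, "hostN": True}
    print(elect_master(["host1", "host2", "hostN"], lambda h: up[h]))
```

Re-running the same function whenever the current master stops responding gives the "no manual intervention" behaviour the slide describes.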
Backfill
Policy configured at the queue level and applies to all jobs in a queue
Smaller sequential jobs are ‘backfilled’ behind larger parallel jobs
Improves hardware utilization
Users provided with an accurate time when their job will start
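The backfill decision can be sketched as a simple test: a small job may use idle slots only if it fits and will finish before the large parallel job's reserved start time. The `Job` fields and the `can_backfill` name below are hypothetical and chosen for illustration; they are not LSF internals.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int        # processors requested
    run_limit: int    # declared run limit, in minutes


def can_backfill(job: Job, free_slots: int, now: int, reserved_start: int) -> bool:
    """True if the job fits in the idle slots and ends before the reservation.

    `now` and `reserved_start` are minutes on the same clock; `reserved_start`
    is when the slots must be handed to the waiting parallel job.
    """
    fits = job.slots <= free_slots
    finishes_in_time = now + job.run_limit <= reserved_start
    return fits and finishes_in_time


if __name__ == "__main__":
    short = Job("short", slots=1, run_limit=20)
    long_ = Job("long", slots=1, run_limit=45)
    print(can_backfill(short, free_slots=4, now=0, reserved_start=30))
    print(can_backfill(long_, free_slots=4, now=0, reserved_start=30))
```

Because the test relies on declared run limits, the scheduler also knows when reserved slots become free, which is what makes a predictable start time possible.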
Key New Features & Benefits
Platform LSF V6.0
Feature Overview
OGSI Compliance
Goal-Oriented SLA-Driven Scheduling
License-Aware Scheduling
Job-Level Exception Management (Self Management Enhancement)
Job Group Support
Other Scheduling Enhancements
Queue-Based Fairshare
User Fairshare by Queue Priority
Job Starvation Prevention plug-in
Feature Overview (Cont.)
HPC Enhancements
Dynamic ptile Enforcement
Resource Requirement Specification for Advance Reservation
Thread Limit Enforcement
General Parallel Support
Parallel Job Size Scheduling
Job Limit Enhancements
Non-normalized Job Run Limit
Resource Allocation Limit Display
Administration and Diagnostics
Scheduler Dynamic Debug
Administrator Action Messages
Goal-Oriented SLA-Driven Scheduling
What is it?
A new scheduling paradigm.
Unlike current scheduling policies based on configured shares or limits,
SLA-driven scheduling is based on customer provided goals:
Deadline based goal: Specify the deadline for a group of jobs.
Velocity based goal: Specify the number of jobs running at any one time.
Throughput based goal: Specify the number of finished jobs per hour.
This scheduling policy works on top of queues and host partitions.
Benefits
Guarantees projects are completed on time according to explicit SLA
definitions.
Provides visibility into the progress of projects to see how well projects are
tracking to SLAs
Allows the admin to focus on “what work and when” needs to be done, not
“how” the resources are to be allocated.
Guarantees service level deliveries to the user community, reduces the risks
of projects and administration cost.
Use case
Problem: we need to finish all simulation jobs before 15:00.
Solution: Configure a deadline service class in the lsb.serviceclasses file.
Begin ServiceClass
NAME=simulation
PRIORITY=100
GOALS = [deadline timeWindow (13:00-15:00)]
DESCRIPTION = A simple deadline demo
End ServiceClass
Submitting and monitoring jobs
$ bsub -sla simulation -W 10 -J "A[1-50]" mySimulation
$ date; bsla
Wed Aug 20 14:00:16 EDT 2003
SERVICE_CLASS_NAME: simulation
GOAL: DEADLINE ACTIVE_WINDOW: (13:00-15:00)
STATUS: Active:Ontime
DEAD_LINE: (Wed Aug 20 15:00)
ESTIMATED_FINISH_TIME: (Wed Aug 20 14:30)
Optimum Number of Running Jobs: 5
NJOBS   PEND    RUN     SSUSP   USUSP   FINISH
50      25      5                       20
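A back-of-the-envelope check of the "Optimum Number of Running Jobs" figure: 30 jobs are still pending or running, each limited to 10 minutes by -W 10, and roughly one hour remains before the 15:00 deadline. Spreading 300 job-minutes over 60 minutes needs 5 jobs at a time. The function below is a sketch of that arithmetic only; whether Platform LSF computes the value this way is an assumption, and the function name is hypothetical.

```python
import math


def jobs_needed(remaining_jobs: int, run_limit_min: float, minutes_left: float) -> int:
    """Smallest number of concurrent jobs that finishes the work by the deadline.

    Assumes every job runs for its full run limit and that jobs can be
    packed back to back with no scheduling gaps.
    """
    if minutes_left <= 0:
        raise ValueError("deadline has already passed")
    return math.ceil(remaining_jobs * run_limit_min / minutes_left)


if __name__ == "__main__":
    # 25 pending + 5 running jobs, 10-minute run limit, about 60 minutes left.
    print(jobs_needed(remaining_jobs=30, run_limit_min=10, minutes_left=60))
```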
Job-Level Exception Management (Self Management
Enhancement)
What is it?
Platform LSF can monitor the exception behavior and take action accordingly.
Benefits
Increased reliability of job execution
Improved visibility on job and system errors
Reduced administration overhead and costs
How it works
Platform LSF V6 handles the following exceptions:
“Job eating” machine (or “black-hole” machine): for some reason, jobs keep exiting
abnormally on a machine (e.g. no processes, mount daemon dies, etc.)
Job underrun (job run time is less than the configured minimum time)
Job overrun (job run time is more than the configured maximum time)
Job idle (job runs without its CPU usage increasing)
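The three per-job exceptions can be expressed as simple threshold checks. The following is a minimal sketch under stated assumptions: the parameter names, the "no CPU increase" test and the order of the checks are all hypothetical, and the "job eating" host case (which needs per-host exit counts) is left out.

```python
from typing import Optional


def classify_job(run_time: float, cpu_delta: float, finished: bool,
                 min_run: float, max_run: float) -> Optional[str]:
    """Return the exception a job triggers, or None if it looks healthy.

    run_time  -- minutes the job has run so far (or ran in total if finished)
    cpu_delta -- CPU seconds consumed since the previous check
    min_run / max_run -- configured minimum and maximum run times, in minutes
    """
    if finished and run_time < min_run:
        return "underrun"
    if run_time > max_run:
        return "overrun"
    if not finished and cpu_delta == 0:
        return "idle"
    return None


if __name__ == "__main__":
    print(classify_job(run_time=1, cpu_delta=0.5, finished=True, min_run=5, max_run=60))
    print(classify_job(run_time=90, cpu_delta=2.0, finished=False, min_run=5, max_run=60))
    print(classify_job(run_time=20, cpu_delta=0.0, finished=False, min_run=5, max_run=60))
```

A real deployment would attach a configurable corrective action to each label, which is the "take action accordingly" part of the feature.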
Job Starvation Prevention Plug-in
What is it?
External scheduler plug-in allows users to define their own equation for job
priority
Benefits
Low priority work is guaranteed to run after ‘waiting’ for a specified time
ensuring that the job does not wait forever (i.e. starvation).
How it works
By default, the scheduler provides the following calculation:
Job priority = A * q_priority * MIN(1, int(wait_time / T0))
    * (B * requested_processors + MAX(C * wait_time * (1 + 1/run_time), D)
    + E * requested_memory)
where A, B, C, D and E are coefficients and T0 is the grace period. The default
run_time is INFINIT.
Admin can define different coefficients for each queue with the following
format:
MANDATORY_EXTSCHED=JOBWEIGHT[A=val1; B=val2; …]
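The default calculation can be transcribed directly into code to see how the grace period and the waiting time interact. This is a sketch for experimentation only: the coefficient defaults below are placeholders chosen for the example, not values shipped with the plug-in, and an infinite run_time is modelled with `math.inf` so that 1/run_time becomes 0.

```python
import math


def job_priority(q_priority: float, wait_time: float,
                 requested_processors: int, requested_memory: float,
                 run_time: float = math.inf,
                 A: float = 1.0, B: float = 1.0, C: float = 1.0,
                 D: float = 0.0, E: float = 0.0, T0: float = 1.0) -> float:
    """Evaluate the job priority formula from the slide.

    The MIN(1, int(wait_time / T0)) factor is 0 until the job has waited
    for the grace period T0, so priority stays at zero until then.
    """
    grace = min(1, int(wait_time / T0))
    age_term = max(C * wait_time * (1 + 1 / run_time), D)
    return A * q_priority * grace * (
        B * requested_processors + age_term + E * requested_memory
    )


if __name__ == "__main__":
    # Still inside the 60-minute grace period: priority is zero.
    print(job_priority(10, wait_time=30, requested_processors=4,
                       requested_memory=0, T0=60))
    # Past the grace period: waiting time now raises the priority.
    print(job_priority(10, wait_time=120, requested_processors=4,
                       requested_memory=0, T0=60))
```

Because the age term grows with wait_time, a low-priority job's score keeps rising the longer it waits, which is how the formula prevents starvation.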
Resource Requirement Specification For Advance
Reservation
What is it?
Enable users to select the hosts for advance reservation based on
the resource requirement.
Benefit
More flexibility in reserving host slots for mission-critical jobs.
How it works
The brsvadd command supports a select string:
brsvadd -R "select[type==LINUX]" -n 4 -u xwei -b 10:00 -e 12:00
Job Termination Reasons
Accounting log with detailed audit & error information for
every job in the system
Indicates why a job was terminated
Distinguishes between an abnormal termination and one caused by
Platform LSF HPC
Enterprise Proven
Running on several of the top 10 supercomputers in the
world on the “TOP500” (#3,5,9,11)
More than 250,000 licenses in use spanning 1,500 customer
sites
Scales to over 100 clusters, 200,000 CPUs and 500,000
active jobs per cluster
11+ years experience in distributed & grid computing
Risk free investment – proven solution
Commercial production quality
Comprehensive, Extensible, Standards-based Security
Scalable scheduler architecture
Multiple scheduler plug-in API support
External executable support
Web GUI
Open source components
Risk free investment – proven solution
Commercial grade
Scalability and flexibility as a business grows
How It Works
Platform LSF HPC
Master Host Election Process
[Diagram: Host 1 (master), Host 2 ... Host N. Every host runs a LIM (reading /dev/kmem) and an SBD. Each LIM asks “Am I the master?”; the elected master sends the master announcement, and MBD and MBSCHD run on the master host.]
Platform LSF Daemons
[Diagram: Host 1 (master), Host 2 ... Host N. Every host runs LIM (reading /dev/kmem) with MELIM and SELIM, plus SBD, PIM and RES; the master host additionally runs MBD and MBSCHD.]
Grid-enabled, Scalable Architecture
Open, modular plug-in schedulers scale
with the growth of your business
Multiple Scheduling Modules
[Diagram: an Internal Module and Add-on Modules 1 to N, each with four phases: PreProcessing, Matching / Limits, Order / Allocation, PostProcessing]
• Vendor specific matching policies (without changing the existing scheduler)
• Support for external scheduler
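The four-phase module layout can be pictured as a small plug-in pipeline in which every module gets a turn at each phase. The sketch below is an illustration of that structure only; the class and method names are hypothetical and do not correspond to the Platform LSF scheduler plug-in API, and allocation is simplified to "first matching host" with no slot accounting.

```python
from typing import Dict, List, Set


class SchedulerModule:
    """Base module: each phase passes its input through unchanged."""

    def pre_process(self, jobs: List[str], hosts: List[str]) -> List[str]:
        return jobs

    def match(self, job: str, hosts: List[str]) -> List[str]:
        return hosts

    def order(self, jobs: List[str]) -> List[str]:
        return jobs

    def post_process(self, decisions: Dict[str, str]) -> Dict[str, str]:
        return decisions


class ExcludeHosts(SchedulerModule):
    """Hypothetical add-on: a matching policy that filters out some hosts."""

    def __init__(self, banned: Set[str]) -> None:
        self.banned = banned

    def match(self, job: str, hosts: List[str]) -> List[str]:
        return [h for h in hosts if h not in self.banned]


def run_cycle(modules: List[SchedulerModule], jobs: List[str],
              hosts: List[str]) -> Dict[str, str]:
    """Run one scheduling cycle, phase by phase, across all modules."""
    for m in modules:
        jobs = m.pre_process(jobs, hosts)
    for m in modules:
        jobs = m.order(jobs)
    decisions: Dict[str, str] = {}
    for job in jobs:
        candidates = hosts
        for m in modules:
            candidates = m.match(job, candidates)
        if candidates:
            decisions[job] = candidates[0]  # simplified: no slot tracking
    for m in modules:
        decisions = m.post_process(decisions)
    return decisions


if __name__ == "__main__":
    modules = [SchedulerModule(), ExcludeHosts({"h1"})]
    print(run_cycle(modules, ["job1", "job2"], ["h1", "h2"]))
```

The add-on changes matching without touching the base module, which mirrors the "without changing the existing scheduler" point above.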
Maui Integration
[Diagram: MBD and SCH_FM sync with the MAUI plugin, whose event handler waits until a GO event. Labels on the plugin side: job, host and resource info; pre-processing; order jobs; decisions and ack; post-processing. Labels on the MAUI Scheduler side: RMGetInfo, QueueScheduleSJobs, QueueScheduleRJobs, QueueScheduleIJobs, QueueBackFill, UIProcessClients.]
Linux-specific Solutions
Controlling an MPI job
On a distributed system (Linux cluster) there are many problems to
address:
1. Job launch across multiple nodes
2. Gather resource usage while job executes
3. Propagate signals
4. Job “clean-up” to eliminate “dangling” MPI processes
5. Comprehensive job accounting
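Two of these problems, signal propagation and clean-up, can be illustrated on a single POSIX host by running each task in its own process group, so that one signal reaches the task and any children it spawned. The sketch below shows that technique only; it is not how Platform LSF HPC implements job control across nodes, and the function names are hypothetical.

```python
import os
import signal
import subprocess
import sys
from typing import List


def launch_in_group(argv: List[str]) -> subprocess.Popen:
    """Start a task as leader of a new session and process group (POSIX only)."""
    return subprocess.Popen(argv, start_new_session=True)


def signal_job(proc: subprocess.Popen, sig: int = signal.SIGTERM) -> None:
    """Deliver a signal to every process in the task's group."""
    # With start_new_session=True the group id equals the leader's pid.
    os.killpg(proc.pid, sig)


def clean_up(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate the whole group, escalating to SIGKILL if it lingers."""
    try:
        signal_job(proc, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group has already exited
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        signal_job(proc, signal.SIGKILL)
        return proc.wait()


if __name__ == "__main__":
    task = launch_in_group([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", clean_up(task))
```

On a cluster the same idea has to be repeated on every node that hosts part of the job, which is why a per-node agent is needed rather than a single launcher.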
“traditional” MPI sequence
[Diagram: submit → resource manager → job script → mpirun (job launcher) → a.out processes]
Platform LSF HPC for Linux - MPICH-GM
[Diagram: bsub → mbatchd → sbatchd → res → job script → pam → gmmpirun_wrapper → mpirun; on each node a res starts a TaskStarter (TS), which runs a.out, alongside a PIM]
Platform LSF HPC for Linux/Myrinet - MPICH_GM
[Diagram: Submission host: bsub and esub (set LSF_PJL_TYPE to mpich_gm), submitting to the queues (high, med, hpc_queue). Master host: master LIM, PIM, elim, MBD, MBSCHD, SBD. Execution host H1: LIM, elim, PIM, SBD, SBD child, mpirun.lsf, pam, gmmpirun_wrapper, mpirun.ch_gm, root res, TaskStarter, a.out process 1. Host H2 (reached via rsh): root res, TaskStarter, a.out process 2. Annotations: report resource availability; hostname & pid; signals and rusage collection; lsblib.]
Scyld Beowulf Integration
• Scyld Beowulf handles the systems management challenge
effectively
• No OS to distribute / synchronize
• Central point of control from master
• Single process space makes it appear as a large SMP
• Platform integrates with Scyld, treating the cluster as an SMP and
allocating resources
• Integrate with mpirun, mpprun or bpsh to start tasks
• Collect resource usage from BPROC
• Collect load information via BPROC APIs
• Single user interface across Scyld & non-Scyld environments
Platform LSF HPC for Linux/BProc
[Diagram: Submission host: bsub, esub (modify submission options), job file, submitting to the queues (high, med, low). Master host: master LIM, PIM, MBD, MBSCHD, SBD, lsblib. BProc front-end node H3: LIM, PIM, SBD, SBD child exec(), res, bpsh/mpirun. Computing nodes: allocated nodes running the user job processes. Steps are labelled 1A, 1B, 1C, 2, 3, 4, 5, 6B and 6C.]
More info at:
• www.platform.com/customers
• www.platform.com/barriers
Q&A