NPACI Rocks Tutorial


Rocks Clusters
SUN HPC Consortium
November 2004
Federico D. Sacerdoti
Advanced CyberInfrastructure Group
San Diego Supercomputer Center
Outline
• Rocks Identity
• Rocks Mission
• Why Rocks
• Rocks Design
• Rocks Technologies, Services, Capabilities
• Rockstar
Copyright © 2004 F. Sacerdoti, M. Katz, G. Bruno, P. Papadopoulos, UC Regents
Rocks Identity
• System to build and manage Linux Clusters
  ◦ General Linux maintenance system for N nodes
  ◦ Desktops too
  ◦ Happens to be good for clusters
• Free
• Mature
• High Performance
  ◦ Designed for scientific workloads
Rocks Mission
• Make Clusters Easy (Papadopoulos, 00)
• Most cluster projects assume a sysadmin will help build the cluster.
• Build a cluster without assuming CS knowledge
  ◦ Simple idea, complex ramifications
  ◦ Automatic configuration of all components and services
  ◦ ~30 services on frontend, ~10 services on compute nodes
Clusters for Scientists
• Results in a very robust system that is insulated from human mistakes
Why Rocks
• Easiest way to build a Rockstar-class machine with SGE ready out of the box
• More supported architectures
  ◦ Pentium, Athlon, Opteron, Nocona, Itanium
• More happy users
  ◦ 280 registered clusters, 700-member support list
  ◦ HPCwire Readers Choice Awards 2004
• More configured HPC software: 15 optional extensions (rolls) and counting.
• Unmatched Release Quality.
Why Rocks
• Big projects use Rocks
  ◦ BIRN (20 clusters)
  ◦ GEON (20 clusters)
  ◦ NBCR (6 clusters)
• Supports different clustering toolkits
  ◦ Rocks Standard (RedHat HPC)
  ◦ SCE
  ◦ SCore (Single Process Space)
  ◦ OpenMosix (Single Process Space: on the way)
Rocks Design
• Uses RedHat’s intelligent installer
  ◦ Leverages RedHat’s ability to discover & configure hardware
  ◦ Everyone tries System Imaging at first
  ◦ Who has homogeneous hardware?
  ◦ If so, whose cluster stays that way?
• Description Based install: Kickstart
  ◦ Like Jumpstart
• Contains a viable Operating System
  ◦ No need to “pre-configure” an OS
Rocks Design
• No special “Rocksified” package structure. Can install any RPM.
• Where Linux core packages come from:
  ◦ RedHat Advanced Workstation (from SRPMS)
  ◦ Enterprise Linux 3
Rocks Leap of Faith
• Install is the primitive operation for Upgrade and Patch
  ◦ Seems wrong at first
  ◦ Why must you reinstall the whole thing?
  ◦ Actually right: debugging a Linux system is fruitless at this scale. Reinstall enforces stability.
  ◦ Primary user has no sysadmin to help troubleshoot
• Rocks install is scalable and fast: 15 min for the entire cluster
  ◦ Post script work done in parallel by compute nodes.
• Power Admins may use up2date or yum for patches.
  ◦ To compute nodes by reinstall
Rocks Technology
Cluster Integration with Rocks
1. Build a frontend node
   1. Insert CDs: Base, HPC, Kernel, optional Rolls
   2. Answer install screens: network, timezone, password
2. Build compute nodes
   1. Run insert-ethers on frontend (dhcpd listener; sketched below)
   2. PXE boot compute nodes in name order
3. Start Computing
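As a rough illustration of the insert-ethers step (not the actual Rocks implementation), the sketch below watches a dhcpd log for DISCOVER messages and hands out Rocks-style compute-0-N names as new nodes PXE boot. The log path and message format are assumptions to adapt to a given syslog setup.

```python
# Illustrative sketch of an insert-ethers-style listener (not the Rocks code).
# Assumes dhcpd logs lines like "DHCPDISCOVER from 00:11:22:33:44:55 via eth0"
# to /var/log/messages; adjust LOG and the regex for your syslog setup.
import re
import time

LOG = "/var/log/messages"          # assumed dhcpd log location
PATTERN = re.compile(r"DHCPDISCOVER from ([0-9a-f:]{17})", re.I)

def follow(path):
    """Yield new lines appended to a file, like `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)               # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

def main():
    known = {}                     # MAC address -> assigned node name
    rank = 0
    for line in follow(LOG):
        m = PATTERN.search(line)
        if not m:
            continue
        mac = m.group(1).lower()
        if mac in known:
            continue
        name = "compute-0-%d" % rank   # Rocks-style naming: compute-<rack>-<rank>
        known[mac] = name
        rank += 1
        print("discovered %s -> %s" % (mac, name))
        # A real frontend would now record the node in its database,
        # add a DHCP/DNS entry, and serve that node a kickstart file.

if __name__ == "__main__":
    main()
```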
Rocks Tech: Dynamic Kickstart File
On node install
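Rocks renders each node's kickstart file on demand when the node installs, from the cluster database and the kickstart graph, rather than keeping static per-node files. A minimal sketch of that idea follows; the node table, template, and package names are invented for illustration and are not the Rocks generator itself.

```python
# Minimal sketch of a dynamic kickstart generator (illustration only).
# The node attributes and template below are invented; Rocks derives them
# from its cluster database and kickstart graph instead.

NODES = {
    "00:11:22:33:44:55": {"name": "compute-0-0", "ip": "10.1.255.254"},
    "00:11:22:33:44:56": {"name": "compute-0-1", "ip": "10.1.255.253"},
}

TEMPLATE = """\
install
lang en_US
rootpw --iscrypted {rootpw_hash}
network --bootproto static --ip {ip} --hostname {name}
%packages
@base
sge-execd
%post
echo "configured by the frontend for {name}" > /etc/motd
"""

def kickstart_for(mac, rootpw_hash="$1$example$hash"):
    """Render a kickstart file for the node with this MAC address."""
    node = NODES[mac]
    return TEMPLATE.format(rootpw_hash=rootpw_hash, **node)

if __name__ == "__main__":
    print(kickstart_for("00:11:22:33:44:55"))
```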
Rocks Roll Architecture
• Rolls are Rocks Modules
  ◦ Think Apache
• Software for cluster
  ◦ Packaged
  ◦ 3rd party tarballs
  ◦ Tested
  ◦ Automatically configured services
• RPMS plus Kickstart graph in ISO form.
Rocks Tech: Dynamic Kickstart File
With Roll (HPC)
[Diagram: the HPC roll's kickstart graph grafted onto the base graph]
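Conceptually, a roll contributes additional graph nodes and edges that are grafted onto the base kickstart graph, and the frontend walks the merged graph from an appliance type to collect everything that node should install. A hedged sketch of that merge-and-traverse step, using a made-up two-graph example (all node names, packages, and helper functions here are invented):

```python
# Sketch of merging a roll's graph into the base graph and walking it.
# Graphs, node names, and package lists here are invented for illustration.
from collections import deque

# Base graph: graph node -> child nodes; each node carries some packages.
BASE_EDGES = {"compute": ["base"], "base": ["ssh", "syslog"], "ssh": [], "syslog": []}
BASE_PKGS = {"base": ["glibc", "kernel"], "ssh": ["openssh"], "syslog": ["sysklogd"]}

# The "hpc" roll grafts new nodes under existing ones.
HPC_EDGES = {"compute": ["hpc"], "hpc": ["mpi"], "mpi": []}
HPC_PKGS = {"hpc": ["sge"], "mpi": ["mpich"]}

def merge(base_edges, roll_edges):
    """Overlay the roll's edges onto a copy of the base graph."""
    merged = {node: list(children) for node, children in base_edges.items()}
    for node, children in roll_edges.items():
        merged.setdefault(node, [])
        for child in children:
            if child not in merged[node]:
                merged[node].append(child)
    return merged

def packages_for(appliance, edges, *pkg_maps):
    """Breadth-first walk from the appliance node, collecting packages."""
    pkgs, seen, queue = [], set(), deque([appliance])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        for pmap in pkg_maps:
            pkgs.extend(pmap.get(node, []))
        queue.extend(edges.get(node, []))
    return pkgs

if __name__ == "__main__":
    edges = merge(BASE_EDGES, HPC_EDGES)
    print(packages_for("compute", edges, BASE_PKGS, HPC_PKGS))
```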
Rocks Tech: Wide Area Net Install
Install a frontend without CDs
Benefits
• Can install from minimal boot image
• Rolls downloaded dynamically (sketched below)
• Community can build specific extensions
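A hedged sketch of the "rolls downloaded dynamically" step: fetch a roll ISO over HTTP and verify a checksum before using it. The URL and digest are placeholders, not real Rocks endpoints.

```python
# Sketch of a WAN install step: pull a roll ISO from a central server over
# HTTP and verify its checksum before use. URL and digest are placeholders.
import hashlib
import urllib.request

ROLL_URL = "http://central.example.org/rolls/hpc-roll.iso"   # placeholder URL
EXPECTED_MD5 = "0123456789abcdef0123456789abcdef"            # placeholder digest

def download(url, dest):
    """Stream the ISO to disk, returning its MD5 hex digest."""
    digest = hashlib.md5()
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while True:
            chunk = resp.read(1 << 20)
            if not chunk:
                break
            digest.update(chunk)
            out.write(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    md5 = download(ROLL_URL, "/tmp/hpc-roll.iso")
    print("ok" if md5 == EXPECTED_MD5 else "checksum mismatch, discarding")
```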
Rocks Tech: Security & Encryption
To protect the kickstart file (it carries sensitive node configuration, such as the root password hash)
Rocks Tech: 411 Information Service
• 411 does the job of NIS
  ◦ Distributes passwords
• File based, simple
  ◦ HTTP transport
  ◦ Multicast
• Scalable
• Secure (see the sketch below)
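As a simplified illustration of a 411-style pull (411 itself uses public-key signatures and its own file layout), a client might fetch a published login file from the frontend over HTTP and verify it before installing it. The host name, paths, and shared-key HMAC below are assumptions for the sketch.

```python
# Simplified sketch of a 411-style pull: fetch a login file from the frontend
# over HTTP and check its integrity before installing it. Real 411 uses
# public-key signatures and its own URLs; the shared-key HMAC, host name,
# and paths below are assumptions for illustration.
import hashlib
import hmac
import urllib.request

FRONTEND = "http://frontend-0/411.d"      # assumed master URL
SHARED_KEY = b"cluster-shared-secret"     # stand-in for 411's real key material

def fetch(name):
    """Download a published file and its (assumed) hex digest."""
    body = urllib.request.urlopen(f"{FRONTEND}/{name}").read()
    sig = urllib.request.urlopen(f"{FRONTEND}/{name}.sig").read().decode().strip()
    return body, sig

def verify_and_install(name, dest):
    body, sig = fetch(name)
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        raise ValueError(f"signature mismatch for {name}; refusing to install")
    with open(dest, "wb") as out:
        out.write(body)

if __name__ == "__main__":
    verify_and_install("etc.passwd", "/tmp/passwd.new")
```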
Rocks Services
Rocks Cluster Homepage
Rocks Services: Ganglia Monitoring
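The Ganglia pages are fed by gmond, which publishes cluster state as XML over a TCP socket (port 8649 by default on typical installs). A small sketch of reading that feed directly:

```python
# Quick sketch of reading Ganglia data the way the cluster pages do: gmond
# serves cluster state as XML on a TCP port (8649 by default on most
# installs; adjust host/port for your cluster).
import socket
import xml.etree.ElementTree as ET

def read_gmond_xml(host="localhost", port=8649):
    """Connect to gmond and return everything it writes (an XML document)."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(8192)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def print_load(xml_bytes):
    """Print each host's one-minute load from the gmond XML."""
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                print(host.get("NAME"), metric.get("VAL"))

if __name__ == "__main__":
    print_load(read_gmond_xml())
```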
Rocks Services: Job Monitoring
SGE Batch System
Rocks Services: Job Monitoring
How a job affects resources on this node
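The job view is driven by SGE's command-line tools. A small sketch of pulling job state the same way, assuming the standard SGE client (qstat) is installed and on the PATH; the plain-text parsing here is a simplification.

```python
# Sketch of pulling job state from SGE the way a monitoring page might,
# assuming the standard SGE client tools (qstat) are on PATH.
import subprocess

def running_jobs():
    """Return (job_id, name, user, state) tuples parsed from plain `qstat`."""
    out = subprocess.run(["qstat"], capture_output=True, text=True, check=True).stdout
    jobs = []
    for line in out.splitlines()[2:]:          # skip the two header lines
        fields = line.split()
        if len(fields) >= 5:
            job_id, _prio, name, user, state = fields[:5]
            jobs.append((job_id, name, user, state))
    return jobs

if __name__ == "__main__":
    for job in running_jobs():
        print(*job)
```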
Rocks Services: Configured, Ready
• Grid (Globus, from NMI)
• Condor (NMI)
  ◦ Globus GRAM
• SGE
  ◦ Globus GRAM
• MPD parallel job launcher (Argonne)
  ◦ MPICH 1, 2
• Intel Compiler set
• PVFS
Rocks Capabilities
High Performance Interconnect Support
• Myrinet
  ◦ All major versions, GM2
  ◦ Automatic configuration and support in Rocks since first release
• Infiniband
  ◦ Via Collaboration with AMD & Infinicon
  ◦ IB
  ◦ IPoIB
Rocks Visualization “Viz” Wall
• Enables LCD Clusters
  ◦ One PC / tile
  ◦ Gigabit Ethernet
  ◦ Tile Frame
• Applications
  ◦ Large remote sensing
  ◦ Volume Rendering
  ◦ Seismic Interpretation
  ◦ Electronic Visualization Lab
• Bio-Informatics
  ◦ Bio-Imaging (NCMIR BioWall)
Rockstar
Rockstar Cluster
• Collaboration between SDSC and SUN
• 129 Nodes: Sun V60x (Dual P4 Xeon)
  ◦ Gigabit Ethernet Networking (copper)
  ◦ Top500 list positions: 201, 433
• Built on showroom floor of Supercomputing Conference 2003
  ◦ Racked, Wired, Installed: 2 hrs total
  ◦ Running apps through SGE
Building of Rockstar
[Embedded MPEG-4 video of the Rockstar build]
Rockstar Topology
• 24-port switches
• Not a symmetric network
  ◦ Best case: 4:1 bisection bandwidth
  ◦ Worst case: 8:1
  ◦ Average: 5.3:1
• Linpack achieved 49% of peak
• Very close to the percentage of peak of the 1st-generation DataStar at SDSC
Rocks Future Work
• High Availability: N Frontend nodes.
  ◦ Not that far off (supplemental install server design)
  ◦ Limited by Batch System
  ◦ Frontends are long-lived in practice, e.g. Keck 2 Cluster (UCSD) uptime: 249 days, 2:56
• Extreme install scaling
• More Rolls!
• Refinements
www.rocksclusters.org
• Rocks mailing List
  ◦ https://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion
• Rocks Cluster Register
  ◦ http://www.rocksclusters.org/rocks-register
• Core: {fds,bruno,mjk,phil}@sdsc.edu