Building a High-performance Computing Cluster Using FreeBSD


BSDCon '03, September 10, 2003
Brooks Davis, Michael AuYeung, Craig Lee, Gary Green
The Aerospace Corporation, El Segundo, CA
{brooks,lee,mauyeung}@aero.org, [email protected]
HPC Clustering Basics
● HPC Cluster features:
  – Commodity computers
  – Networked to enable distributed, parallel computations
  – Vastly lower cost compared to traditional supercomputers
● Many, but not all, HPC applications work well on clusters
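
To make the distributed, parallel model concrete, below is a minimal sketch of the kind of job such a cluster runs: each process computes a partial result and one process combines them. It is illustrative only, not code from the talk, and it assumes an MPI runtime plus the mpi4py Python bindings are available on the nodes.

    # Minimal parallel-sum sketch using MPI (assumes mpi4py is installed).
    # Launch across nodes with something like: mpirun -np 8 python psum.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's index within the job
    size = comm.Get_size()   # total processes across all nodes

    N = 10000000
    # Each rank sums an interleaved slice of the range.
    local = sum(range(rank, N, size))

    # Combine the partial sums on rank 0.
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum(0..N-1) =", total)
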
Cluster Overview
● Fellowship is the Aerospace Corporate Cluster
  – Name is short for "The Fellowship of the Ring"
● Running FreeBSD 4.8-STABLE
● Over 183 GFlops of floating point performance using the LINPACK benchmark
Cluster Overview: Nodes and Servers
● 160 Nodes (320 CPUs)
  – dual CPU 1U systems with Gigabit Ethernet
  – 86 Pentium III (7 1GHz, 40 1.26GHz, 39 1.4GHz)
  – 74 Xeon 2.4GHz
● 4 Core Systems
  – frodo – management server
  – fellowship – shell server
  – gamgee – backup, database, monitoring server
  – legolas – scratch server (2.8TB)
Cluster Overview: Network and Remote Access
● Gigabit Ethernet network
  – Cisco Catalyst 6513 switch
  – Populated with 11 16-port 10/100/1000T blades
● Serial console access
  – Cyclades TS2000 and TS3000 Terminal Servers
● Power control
  – Baytech RPC4 and RPC14 serial power controllers
Cluster Overview: Physical Layout
Design Issues
● Operating System
● Hardware Architecture
● Network Interconnects
● Addressing and Naming
● Node Configuration Management
● Job Scheduling
● System Monitoring
Operating System
● Almost anything can work
● Considerations:
  – Local experience
  – Needed applications
  – Maintenance model
  – Need to modify OS
● FreeBSD
  – Diskless support
  – Cluster architect is a committer
  – Ease of upgrades
  – Linux Emulation
Hardware Architecture
● Many choices:
  – i386, SPARC, Alpha
● Considerations:
  – Price
  – Performance
  – Power/heat
  – Software support (OS, apps, dev tools)
● Intel PIII/Xeon
  – Price
  – OS Support
  – Power
Network Interconnects
● Many choices
  – 10/100 Ethernet
  – Gigabit Ethernet
  – Myrinet
● Issues
  – price
  – OS support
  – application mix
● Gigabit Ethernet
  – application mix: middle ground between tightly and loosely coupled applications
  – price
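
Whether an application is tightly or loosely coupled mostly comes down to how sensitive it is to interconnect latency, so a quick round-trip measurement is a useful sanity check on any fabric. The sketch below is a generic TCP ping-pong written for this transcript, not a tool from the talk; the port number and host names are placeholders.

    # Crude TCP round-trip latency probe between two nodes (placeholder port).
    # On one node: python pingpong.py server
    # On another:  python pingpong.py client <server-host>
    import socket, sys, time

    PORT = 9999
    ROUNDS = 1000

    def server():
        with socket.create_server(("", PORT)) as srv:
            conn, _ = srv.accept()
            with conn:
                while data := conn.recv(64):
                    conn.sendall(data)          # echo each byte back

    def client(host):
        with socket.create_connection((host, PORT)) as s:
            s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            start = time.time()
            for _ in range(ROUNDS):
                s.sendall(b"x")
                s.recv(64)
            usec = (time.time() - start) / ROUNDS * 1e6
            print("average round trip: %.1f microseconds" % usec)

    if __name__ == "__main__":
        server() if sys.argv[1] == "server" else client(sys.argv[2])
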
Addressing and Naming Schemes
● To subnet or not?
● Public or private IPs?
● Naming conventions
  – The usual rules apply to core servers
  – Large clusters probably want more mechanical names for nodes
● 10.5/16 private subnet
● Core servers named after Lord of the Rings characters
● Nodes named and numbered by location
  – rack 1, node 1:
    ● r01n01
    ● 10.5.1.1
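
Because the node names and addresses are purely mechanical, both can be generated from the rack and node numbers. A small sketch of the mapping described above, written for this transcript and matching the r01n01 / 10.5.1.1 example:

    # Map (rack, node) to the mechanical hostname and 10.5/16 address
    # used for the nodes: rack 1, node 1 -> r01n01 / 10.5.1.1.
    def node_name(rack, node):
        return "r%02dn%02d" % (rack, node)

    def node_ip(rack, node):
        return "10.5.%d.%d" % (rack, node)

    if __name__ == "__main__":
        for rack in (1, 2):
            for node in (1, 2, 3):
                print(node_name(rack, node), node_ip(rack, node))
        # r01n01 10.5.1.1
        # r01n02 10.5.1.2
        # ...
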
Node Configuration Management
● Major methods:
  – individual installs
  – automated installs
  – network booting
● Automation is critical
● Network booted nodes
  – PXE
● Automatic node disk configuration
  – version in MBR
  – diskprep script
● Upgrade of root using copy
Job Scheduling
●
Options
– manual scheduling
–
–
batch
queuing
systems
(SGE,
OpenPBS,
etc.)
custom schedulers
●
Sun
Grid
Engine
– Ported to FreeBSD
starting patches
with Ron
Chen's
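
For context, users hand work to Sun Grid Engine as batch scripts via qsub, and a thin wrapper is often enough for routine jobs. The sketch below drives qsub from Python using standard SGE flags; the job script name, the "mpi" parallel environment name, and the slot count are site-specific assumptions, not values from the talk.

    # Submit a batch job to Sun Grid Engine by invoking qsub.
    # The job script and the "mpi" parallel environment name are assumptions;
    # the qsub flags themselves (-N, -cwd, -o, -e, -pe) are standard SGE options.
    import subprocess

    def submit(script, name, slots=4):
        cmd = [
            "qsub",
            "-N", name,               # job name shown by qstat
            "-cwd",                   # run in the submission directory
            "-o", name + ".out",      # stdout file
            "-e", name + ".err",      # stderr file
            "-pe", "mpi", str(slots), # request slots in a parallel environment
            script,
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        submit("run_job.sh", "linpack-test", slots=8)
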
System Monitoring
● Standard monitoring tools:
  – Nagios (aka Net Saint)
  – Big Sister
● Cluster specific tools:
  – Ganglia
  – Most schedulers
● Ganglia
  – port: sysutils/ganglia-monitor-core
● Sun Grid Engine
System Monitoring: Ganglia
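
Ganglia's gmond daemon answers any TCP connection on its default port (8649) with an XML dump of cluster state, which makes ad-hoc monitoring scripts easy. A minimal sketch, assuming a reachable gmond; the host queried ("gamgee", the monitoring server above) and the metric chosen are placeholders.

    # Fetch the gmond XML dump and print the one-minute load for each host.
    # gmond serves cluster state as XML on TCP port 8649 by default; the
    # host queried here and the metric name are illustrative choices.
    import socket
    import xml.etree.ElementTree as ET

    def gmond_xml(host, port=8649):
        chunks = []
        with socket.create_connection((host, port)) as s:
            while data := s.recv(8192):
                chunks.append(data)
        return b"".join(chunks)

    root = ET.fromstring(gmond_xml("gamgee"))
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                print(host.get("NAME"), metric.get("VAL"))
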
Lessons Learned
● Hardware attrition can be significant
● Neatness counts in cabling
● System automation is very important
  – If you do it to a node, automate it
● Much of the HPC community thinks the world is a Linux box
FY 2004 Plans
● Switch upgrades: Sup 720 and 48-port blades
● New racks: another row of racks adding 6 more node racks (192 nodes)
● More nodes: either more Xeons or Opterons
● Upgrade to FreeBSD 5.x
Future Directions
● Determining a node replacement policy
● Clustering on demand
● Scheduler improvements
● Grid integration (Globus Toolkit)
● Trusted clusters
Wish List
● Userland:
  – Database driven PXE/DHCP server
● Kernel:
  – Distributed file system support (i.e. GFS)
  – Checkpoint and restart capability
  – BProc style distributed process management
Acknowledgements
● Aerospace
  – Michael AuYeung
  – Brooks Davis
  – Alan Foonberg
  – Gary Green
  – Craig Lee
● Vendors
  – iXsystems
  – Off My Server
  – Iron Systems
  – Raj Chahal
    ● iXsystems, Iron Systems, ASA Computers
Resources
● Paper and presentation:
  – http://people.freebsd.org/~brooks/papers/bsdcon2003/