Getting Started on Topsail
Charles Davis
ITS Research Computing
February 10, 2010
Outline
History of Topsail
Structure of Topsail
File Systems on Topsail
Compiling on Topsail
Topsail and LSF
2
Initial Topsail Cluster
Initially: 1040 CPU Dell Linux Cluster
• 520 dual socket, single core nodes
Infiniband interconnect
Intended for capability research
Housed in ITS Franklin machine room
Fast and efficient for large computational jobs
3
Topsail Upgrade 1
Topsail upgraded to 4,160 CPU
• Replaced blades with dual socket, quad core Intel Xeon 5345 (Clovertown) processors
• Quad-core with 8 CPU/node
Increased the number of processors, but decreased individual processor speed (was 3.6 GHz, now 2.33 GHz)
Decreased energy usage and the resources needed for cooling
Summary: slower clock speed, better memory bandwidth, less heat
• Benchmarks tend to run at the same speed per core
• Topsail shows a net ~4X improvement
• Of course, this number is VERY application dependent
4
Topsail – Upgraded blades
52 Chassis: Basis of node names
• Each holds 10 blades -> 520 blades total
• Nodes = cmp-chassis#-blade#
Old Compute Blades: Dell PowerEdge 1855
• 2 single core Intel Xeon EM64T 3.6 GHz procs
• 800 MHz FSB
• 2MB L2 Cache per socket
• Intel NetBurst MicroArchitecture
New Compute Blades: Dell PowerEdge 1955
• 2 Quad core Intel 2.33 GHz procs
• 1333 MHz FSB
• 4MB L2 Cache per socket
• Intel Core 2 MicroArchitecture
5
Topsail Upgrade 2
Most recent Topsail upgrade (Feb/Mar ‘09)
Refreshed much of the infrastructure
Improved IBRIX filesystem
Replaced and improved Infiniband cabling
Moved cluster to ITS-Manning building
• Better cooling and UPS
6
Current Topsail
Architecture
Login node: 8 CPU @ 2.3 GHz Intel EM64T, 12 GB memory
Compute nodes: 4,160 CPU @ 2.3 GHz Intel EM64T, 12 GB memory
Shared disk: 39TB IBRIX Parallel File System
Interconnect: Infiniband 4x SDR
64-bit Linux Operating System
7
Multi-Core Computing
Processor Structure on Topsail
• 500+ nodes
• 2 sockets/node
• 1 processor/socket
• 4 cores/processor (Quad-core)
• 8 cores/node
http://www.tomshardware.com/2006/12/06/quad-core-xeon-clovertown-rolls-into-dp-servers/page3.html
8
Multi-Core Computing
The trend in High Performance Computing is towards multi-core or many-core computing.
More cores at slower clock speeds for less heat
Now, dual and quad core processors are becoming common.
Soon 64+ core processors will be common
• And these may be heterogeneous!
9
The Heat Problem
Taken From: Jack Dongarra, UT
10
More Parallelism
Taken From: Jack Dongarra, UT
11
Infiniband Connections
Connections come in single (SDR), double (DDR), and quad (QDR) data rates.
• Topsail is SDR.
Single data rate is 2.5 Gbit/s in each direction per link.
Links can be aggregated - 1x, 4x, 12x.
• Topsail is 4x.
Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data rate is four-fifths the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s respectively.
Data rate for Topsail is 8 Gbit/s (4x SDR).
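A back-of-envelope check of these figures (a sketch using shell arithmetic; the per-lane rate and lane count are the values quoted above):
# Check the 4x SDR numbers quoted above
raw_per_lane=2500                 # SDR signaling rate per lane, in Mbit/s
lanes=4                           # Topsail links are 4x
raw=$((raw_per_lane * lanes))     # 10000 Mbit/s raw
data=$((raw * 8 / 10))            # 8000 Mbit/s usable after 8B/10B encoding
echo "raw: ${raw} Mbit/s, usable: ${data} Mbit/s"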
12
Topsail Network Topology
13
Infiniband Benchmarks
Point-to-point (PTP) intranode communication on Topsail for various MPI send types
Peak bandwidth:
• 1288 MB/s
Minimum latency (1-way):
• 3.6 µs
14
Infiniband Benchmarks
Scaled aggregate bandwidth for MPI Broadcast on Topsail
Note good scaling throughout the tested range (from 24 to 1536 cores)
15
Login to Topsail
Use ssh to connect:
• ssh topsail.unc.edu
SSH Secure Shell with Windows
For interactive programs with X Windows display:
• ssh -X topsail.unc.edu
• ssh -Y topsail.unc.edu
Off-campus users (i.e. domains outside of unc.edu) must use a VPN connection
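For convenience, the X forwarding option can also be set once in ~/.ssh/config; a sketch (the topsail host alias is just an example):
# ~/.ssh/config (sketch; alias and user name are placeholders)
Host topsail
    HostName topsail.unc.edu
    User my_onyen
    ForwardX11 yes          # same effect as ssh -X
Then simply: ssh topsail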
16
Topsail File Systems
39TB IBRIX Parallel File System
Split into Home and Scratch Space
Home: /ifs1/home/my_onyen
Scratch: /ifs1/scr/my_onyen
Mass Storage
• Only Home is backed up
• /ifs1/home/my_onyen/ms
17
File System Limits
500GB Total Limit per User
Home – 15GB limit for Backups
Scratch:
• No limit except 500GB total
• Not backed up
• Periodically cleaned
Few installed packages/programs
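A simple way to see how much of the 500GB total you are using (a sketch using the paths above):
# Check space used in home and scratch (both count toward the 500GB total)
du -sh /ifs1/home/my_onyen
du -sh /ifs1/scr/my_onyen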
18
Compiling on Topsail
Modules
Serial Programming
• Intel Compiler Suite for Fortran77, Fortran90, C and C++ – recommended by Research Computing
• GNU
Parallel Programming
• MPI
• OpenMP
Must use the Intel Compiler Suite
Compile with the -openmp flag
Must set OMP_NUM_THREADS in the submission script (see the sketch below)
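A minimal OpenMP build-and-submit sketch, assuming a code file named omp_code.c and an 8-core node (the file name and counts are placeholders):
# Compile with the Intel compiler and the OpenMP flag
icc -openmp -o omp_code omp_code.c

# Submission script fragment: one 8-core node, 8 threads
#BSUB -n 8
#BSUB -R span[ptile=8]
#BSUB -o out.%J
#BSUB -e err.%J
export OMP_NUM_THREADS=8
./omp_code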
19
Compiling Modules
Module commands
• module – list module commands
• module avail – list available modules
• module add – load a module for the current session
• module list – list currently loaded modules
• module clear – unload all loaded modules
Modules can also be loaded automatically via your startup files (see the sketch below)
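For example, a minimal sketch for bash users (assuming the MVAPICH/Intel module used later in this document):
# Load the MVAPICH/Intel module at every login by appending it to ~/.bashrc
echo "module load hpc/mvapich-intel-11" >> ~/.bashrc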
20
Available Compilers
Intel – ifort, icc, icpc
GNU – gcc, g++, gfortran
Libraries - BLAS/LAPACK
MPI:
• mpicc/mpiCC
• mpif77/mpif90
The mpi* commands are just wrappers around the Intel or GNU compilers
• Adds location of MPI libraries and include files
• Provided as a convenience
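To see exactly what the wrapper adds, most MPI wrappers accept a -show option that prints the underlying compile command (a sketch; the exact option depends on the MPI installation):
# Print the underlying compiler command and flags instead of compiling
mpicc -show
mpif90 -show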
21
Test MPI Compile
Copy cpi.c to scratch directory:
• cp /ifs1/scr/cdavis/Topsail/cpi.c /ifs1/scr/my_onyen/.
Add Intel module:
• module load hpc/mvapich-intel-11
Confirm Intel module:
• which mpicc
Compile code:
• mpicc -o cpi cpi.c
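The same steps as one runnable sequence (a sketch; replace my_onyen with your own ONYEN):
cd /ifs1/scr/my_onyen                  # work in your scratch space
cp /ifs1/scr/cdavis/Topsail/cpi.c .    # copy the example source
module load hpc/mvapich-intel-11       # Intel compiler + MVAPICH wrappers
which mpicc                            # confirm the wrapper is on the path
mpicc -o cpi cpi.c                     # build the executable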
22
MPI/OpenMP Training
Courses are taught throughout the year by Research Computing
http://learnit.unc.edu/workshops
Next course:
• MPI – Summer
• OpenMP – March 3rd
23
Running Programs on Topsail
Upon ssh to Topsail, you are on the Login node.
Programs SHOULD NOT be run on the Login node.
Submit programs to the compute nodes (4,160 CPUs).
Submit jobs using the Load Sharing Facility (LSF).
24
Job Scheduling Systems
Allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc.
Many types of schedulers
• Load Sharing Facility (LSF) – Used by Topsail
• IBM LoadLeveler
• Portable Batch System (PBS)
• Sun Grid Engine (SGE)
25
Load Sharing Facility (LSF)
[Diagram: the flow of a bsub job submission from the submission host, through the master host's load information and queue, to an execution host that runs the user job]
• LIM – Load Information Manager
• MLIM – Master LIM
• MBD – Master Batch Daemon
• SBD – Slave Batch Daemon
• RES – Remote Execution Server
26
Submitting a Job to LSF
For a compiled MPI job:
• bsub -n "< number CPUs >" -o out.%J -e err.%J -a mvapich mpirun ./mycode
bsub – LSF command that submits the job to the compute nodes
bsub -o and bsub -e
• Job output and error messages are saved to files in the submission directory
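For example, a filled-in version requesting 8 CPUs (a sketch; the CPU count is an arbitrary example):
bsub -n 8 -o out.%J -e err.%J -a mvapich mpirun ./mycode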
27
Queue System on Topsail
Topsail uses queues to distribute jobs.
Specify the queue with -q in bsub:
• bsub -q week …
No -q specified = default queue (week)
Queues vary depending on the size and required time of jobs
See listing of queues:
• bqueues
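bqueues can also report the details of a single queue; a sketch (the -l long-format option is standard LSF):
bqueues            # one-line summary of every queue
bqueues -l week    # detailed limits and policies for the week queue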
28
Topsail Queues
Queue     Time Limit   Jobs/User   CPU/Job
int       2 hrs        128         ---
debug     2 hrs        128         ---
day       24 hrs       1024        4 – 128
week      1 week       1024        4 – 128
512cpu    4 days       1024        32 – 1024
128cpu    4 days       1024        32 – 128
32cpu     2 days       1024        4 – 32
chunk     4 days       1024
Batch Jobs
• Most jobs do not scale very well over 128 cpu.
29
Submission Scripts
It is easier to write a submission script that can be edited for each job submission.
Example script file – run.hpl:
#BSUB -n "< number CPUs >"
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode
Submit with: bsub < run.hpl
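A filled-in version of the same script, as a sketch assuming a 32-CPU job in the week queue (the values are examples only):
#BSUB -q week
#BSUB -n 32
#BSUB -e err.%J
#BSUB -o out.%J
#BSUB -a mvapich
mpirun ./mycode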
30
More bsub options
bsub -x – NO LONGER USE
• Gives exclusive use of a node
• Was previously used extensively when first testing code
bsub -n 4 -R span[ptile=4]
• Forces all 4 processors to be on the same node
• Similar to -x
bsub -J job_name
See the man pages for a complete description:
• man bsub
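These options can be combined in a single submission; a sketch (the job name my_test is an arbitrary example):
# 4 CPUs on a single node, with a recognizable job name
bsub -q week -n 4 -R "span[ptile=4]" -J my_test -o out.%J -e err.%J -a mvapich mpirun ./mycode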
31
Performance Test
Gromacs MD simulation of bulk water
Simulation setups:
• Case 1: -n 8 -R span[ptile=1]
• Case 2: -n 8 -R span[ptile=8]
Simulation times (1ns MD):
• Case 1: 1445 sec
• Case 2: 1255 sec
Using a single node (Case 2) reduced the run time by only ~13%
32
Following a Job After Submission
bjobs
• bjobs -l JobID
• Shows the current status of the job
bhist
• bhist -l JobID
• More detailed information regarding the job history
bkill
• bkill -r JobID
• Ends the job prematurely
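For example, for a hypothetical job ID of 12345:
bjobs -l 12345    # current status of the job
bhist -l 12345    # detailed history of the job
bkill 12345       # end the job prematurely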
33
Submit Test MPI Job
Submit the test MPI program on Topsail
• bsub -q week -n 4 -o out.%J -e err.%J -a mvapich mpirun ./cpi
Follow submission: bjobs
Output is stored in the out.<JobID> file (%J expands to the job number)
34
Pre-Compiled Programs on Topsail
Some applications are precompiled for all users:
• /ifs1/apps
• Amber, Gaussian, Gromacs, NetCDF, NWChem, R
Add a module to your path using module commands:
• module avail – shows available applications
• module add – adds a specific application
Once the module command is used, the executable is added to your PATH
35
Test Gaussian Job on Topsail
Add Gaussian Application to path:
• module add apps/gaussian-03e01
• module list
Copy input com file:
• cp /ifs1/scr/cdavis/Topsail/water.com .
Check that executable has been added to path:
• echo $PATH
Submit job:
• bsub -q week -n 4 -e err.%J -o out.%J g03 water.com
36
Common Error 1
If job immediately dies, check err.%J file
err.%J file has error:
• Can't read MPIRUN_HOST
Problem: MPI environment settings were not correctly applied on the compute node
Solution: Include mpirun in the bsub command
37
Common Error 2
Job immediately dies after submission
err.%J file is blank
Problem: ssh passwords and keys were not correctly set up at the initial login to Topsail
Solution:
• cd ~/.ssh/
• mv id_rsa id_rsa-orig
• mv id_rsa.pub id_rsa.pub-orig
• Logout of Topsail
• Login to Topsail and accept all defaults
38
Interactive Jobs
To run long shell scripts on Topsail, use the int queue
bsub -q int -Ip /bin/bash
• This bsub command provides a prompt on a compute node
• Programs or shell scripts can then be run interactively from the compute node
The TotalView debugger can also be run interactively on Topsail
39
Further Help with Topsail
More details about using Topsail can be found in the Getting Started on Topsail help document
• http://help.unc.edu/?id=6214
• http://keel.isis.unc.edu/wordpress/ - ON CAMPUS
For assistance with Topsail, please contact the ITS Research Computing group
• Email: [email protected]
For immediate assistance, see the manual pages on Topsail:
• man <command>
40