Getting Started on Topsail - Information Technology Services

Getting Started on Topsail
Mark Reed
Charles Davis
ITS Research Computing
Outline
 Compute Clusters
 Logging In
 File Spaces
 User Environment and Applications, Compiling Code
 Job Management
Logistics
 Course Format
 Lab Exercises
 Breaks
 UNC Research Computing
• http://its.unc.edu/research
 Getting Started on Topsail page
• http://help.unc.edu/6214
What is a compute cluster?
What is Topsail?
What is a compute cluster?
Some Typical Components
 Compute Nodes
 Interconnect
 Shared File System
 Software
 Operating System (OS)
 Job Scheduler/Manager
 Mass Storage
Compute Cluster Advantages
 fast interconnect, tightly coupled
 aggregated compute resources
 large (scratch) file spaces
 installed software base
 scheduling and job management
 high availability
 data backup
Initial Topsail Cluster
 Initially: 1040 CPU Dell Linux Cluster
• 520 dual socket, single core nodes
 Infiniband interconnect
 Intended for capability research
 Housed in ITS Franklin machine room
 Fast and efficient for large computational jobs
Topsail Upgrade 1
 Topsail upgraded to 4,160 CPUs
• replaced blades with dual socket, quad core Intel Xeon 5345 (Clovertown) processors
• quad-core, with 8 CPUs/node
 Increased number of processors, but decreased individual processor speed (was 3.6 GHz, now 2.33 GHz)
 Decreased energy usage and the resources necessary for cooling
 Summary: slower clock speed, better memory bandwidth, less heat, quadrupled the core count
• Benchmarks tend to run at the same speed per core
• Topsail shows a net ~4X improvement
• Of course, this number is VERY application dependent
Topsail – Upgraded blades
 52 Chassis: Basis of node names
• Each holds 10 blades -> 520 blades total
• Nodes = cmp-chassis#-blade#
 Old Compute Blades: Dell PowerEdge 1855
• 2 single core Intel Xeon EM64T 3.6 GHz procs
• 800 MHz FSB
• 2MB L2 Cache per socket
• Intel NetBurst MicroArchitecture
 New Compute Blades: Dell PowerEdge 1955
• 2 Quad core Intel 2.33 GHz procs
• 1333 MHz FSB
• 4MB L2 Cache per socket
• Intel Core 2 MicroArchitecture
Topsail Upgrade 2
 Most recent Topsail upgrade (Feb/Mar ‘09)
 Refreshed much of the infrastructure
 Improved IBRIX filesystem
 Replaced and improved Infiniband cabling
 Moved cluster to ITS-Manning building
• Better cooling and UPS
Top 500 History
 The Top 500 list comes out twice a year
• ISC conference in June
• SC conference in Nov
 Topsail debuted at 74 in June 2006
 Peaked at 25 in June 2007
 Still in the Top 500
Current Topsail Architecture
 Login node: 8 CPUs @ 2.3 GHz Intel EM64T, 12 GB memory
 Compute nodes: 4,160 CPUs @ 2.3 GHz Intel EM64T, 12 GB memory per node
 Shared disk: 39 TB IBRIX parallel file system
 Interconnect: Infiniband 4x SDR
 64-bit Linux operating system
Multi-Core Computing
 Processor structure on Topsail:
• 500+ nodes
• 2 sockets/node
• 1 processor/socket
• 4 cores/processor (quad-core)
• 8 cores/node
Source: http://www.tomshardware.com/2006/12/06/quad-core-xeon-clovertown-rolls-into-dp-servers/page3.html
Multi-Core Computing
 The trend in High Performance Computing is towards multi-core or many-core computing.
 More cores at slower clock speeds for less heat
 Now, dual and quad core processors are becoming common.
 Soon 64+ core processors will be common
• and these may be heterogeneous!
The Heat Problem
(figure taken from Jack Dongarra, UT)
More Parallelism
(figure taken from Jack Dongarra, UT)
Infiniband Connections
 Connections come in single (SDR), double (DDR), and quad (QDR) data rates.
• Topsail is SDR.
 Single data rate is 2.5 Gbit/s in each direction per link.
 Links can be aggregated: 1x, 4x, 12x.
• Topsail is 4x.
 Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data transmission rate is four-fifths the raw rate. Thus single, double, and quad data rate links carry 2, 4, or 8 Gbit/s respectively.
 The data rate for Topsail is therefore 8 Gbit/s (4x SDR): 4 links × 2.5 Gbit/s raw = 10 Gbit/s, of which 4/5, i.e. 8 Gbit/s, is data.
Topsail Network Topology
Infiniband Benchmarks
 Point-to-point (PTP) intranode communication on Topsail for various MPI send types
 Peak bandwidth: 1288 MB/s
 Minimum latency (1-way): 3.6 µs
Infiniband Benchmarks
 Scaled aggregate bandwidth for MPI Broadcast on Topsail
 Note the good scaling throughout the tested range (24-1536 cores)
Login to Topsail
 Use ssh to connect:
• ssh topsail.unc.edu
 On Windows, use SSH Secure Shell.
 For using interactive programs with X-Windows display:
• ssh -X topsail.unc.edu
• ssh -Y topsail.unc.edu
 Off-campus users (i.e. domains outside of unc.edu) must use a VPN connection
File Spaces
File Space
 Home directories
• /ifs1/home/<onyen>
• home directories over 15 GB are not backed up
 Scratch Space
• /ifs1/scr/<onyen>
• over 39 TB of scratch space
• run jobs with large output in this space
 Mass Storage
• ~/ms
Mass Storage
 long term archival storage
 access via ~/ms
 looks like an ordinary disk file system, but data is actually stored on tape
 “limitless” capacity
 data is backed up
 for storage only, not a work directory (i.e. don’t run jobs from here)
 if you have many small files, use tar or zip to create a single file for better performance (see the example after this list)
“To infinity … and beyond”
- Buzz Lightyear
 Sign up for this service on onyen.unc.edu
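For example, to bundle a directory of small files into a single archive before copying it to mass storage (the names myproject and myproject.tar are just illustrations):
tar -cf myproject.tar myproject/
cp myproject.tar ~/ms/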
User Environment and Applications, Compiling Code
Modules
Modules
 The user environment is managed by modules
 Modules modify the user environment by modifying and adding environment variables such as PATH or LD_LIBRARY_PATH
 Typically you set these once and leave them
 Note there are two module settings: one for your current environment and one to take effect on your next login (e.g. for batch jobs running on compute nodes)
Common Module Commands
 module avail
• module avail apps
 module help
 module list
 module add
 module rm
 Login versions (take effect at next login):
• module initlist
• module initadd
• module initrm
For more on modules, see http://help.unc.edu/CCM3_006660
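For example, to load the Intel compiler module for the current session and for future logins (the module name intel is an assumption; run module avail to see the exact names installed):
module add intel
module initadd intel
module list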
Parallel Jobs with MPI
 There are three implementations of the MPI standard installed:
• mvapich
• mvapich2
• openmpi
 Performance is similar for all three, and all three run on the IB fabric. mvapich is the default; openmpi and mvapich2 have more of the MPI-2 features implemented.
Compiling MPI programs
 Use the MPI wrappers to compile your program
• mpicc, mpiCC, mpif90, mpif77
• the wrappers will find the appropriate include files and libraries and then invoke the actual compiler
• for example, mpicc will invoke either gcc or icc depending upon which module you have loaded
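As a quick check of the toolchain, a minimal MPI hello-world can be compiled with the wrapper and submitted through LSF (the file name hello_mpi.c and the job parameters below are just illustrations):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                       /* shut down the MPI runtime */
    return 0;
}
Compile and submit:
mpicc -o hello_mpi hello_mpi.c
bsub -q week -o out.%J -n 16 -a mvapich mpirun ./hello_mpi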
Compiling on Topsail
 Serial Programming
• Intel Compiler Suite for Fortran77, Fortran90, C and C++ - recommended by Research Computing
 icc, icpc, ifort
• GNU
 gcc, g++, gfortran
 Parallel Programming
• MPI (see previous page)
• OpenMP
 compiler flag: -openmp for Intel, -fopenmp for GNU
 must set OMP_NUM_THREADS in the submission script (see the sketch after this list)
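A minimal OpenMP build and submission sketch, assuming a bash shell and a hypothetical source file my_omp.c (under csh, use setenv OMP_NUM_THREADS 8 instead of export):
icc -openmp -o my_omp my_omp.c
Submission script my_omp.bsub, submitted with bsub < my_omp.bsub:
#BSUB -q week
#BSUB -o out.%J
#BSUB -n 8
#BSUB -R "span[ptile=8]"
export OMP_NUM_THREADS=8
./my_omp
The span[ptile=8] request places all 8 slots on a single node, which a shared-memory OpenMP program requires.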
Job Scheduling and Management
What does a Job Scheduler and batch system do?
Manage Resources:
 allocate user tasks to resources
 monitor tasks
 process control
 manage input and output
 report status, availability, etc.
 enforce usage policies
Job Scheduling Systems
 Allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc.
 Many types of schedulers:
• Load Sharing Facility (LSF) - used by Topsail
• IBM LoadLeveler
• Portable Batch System (PBS)
• Sun Grid Engine (SGE)
LSF
 All Research Computing clusters use LSF to do job scheduling and management
 LSF (Load Sharing Facility) is a (licensed) product from Platform Computing
• fairly distribute compute nodes among users
• enforce usage policies for established queues
 most common queues: int, now, week, month
• RC uses Fair Share scheduling, not first come, first served (FCFS)
 LSF commands typically start with the letter b (as in batch), e.g. bsub, bqueues, bjobs, bhosts, …
• see man pages for much more info!
Simplified view of LSF
(diagram) A user logged in to the login node submits a job, e.g. bsub -n 64 -a mvapich -q week mpirun myjob. The job is routed to a queue, joining queued jobs such as job_J, job_F, myjob, and job_7, and is then dispatched to run on an available host that satisfies the job's requirements.
Running Programs on Topsail
 Upon ssh to Topsail, you are on the login node.
 Programs SHOULD NOT be run on the login node.
 Submit programs to one of the 4,160 compute nodes.
 Submit jobs using Load Sharing Facility (LSF) via the bsub command.
Common batch commands
 bsub - submit jobs
 bqueues - view info on defined queues
• bqueues -l week
 bkill - stop/cancel a submitted job
 bjobs - view submitted jobs
• bjobs -u all
 bhist - job history
• bhist -l <jobID>
Common batch commands
 bhosts - status and resources of hosts (nodes)
 bpeek - display output of a running job
 Use man pages to get much more info!
• man bjobs
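A typical submit-and-monitor sequence might look like this (the job ID 12345 is just an example; bjobs reports the real one):
bsub < myexe.bsub   # submit the job
bjobs               # check its status and note the job ID
bpeek 12345         # look at the job's output while it runs
bkill 12345         # cancel the job if something went wrong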
Submitting Jobs: bsub Command
 Submit jobs with bsub
• run large jobs out of scratch space; smaller jobs can run out of your home space
 bsub [-bsub_opts] executable [-exec_opts]
 Common bsub options:
• -o <filename>
 -o out.%J
• -q <queue name>
 -q week
• -R "resource specification"
 -R "span[ptile=8]"
• -n <number of processes>
 used for parallel, MPI jobs
• -a <application specific esub>
 -a mvapich (used for MPI jobs)
Two methods to submit jobs:
 bsub example: submit the executable job, myexe, to the week queue and redirect output to the file out.<jobID> (the default is to mail the output)
 Method 1: Command Line
• bsub -q week -o out.%J myexe
 Method 2: Create a file (details to follow) called, for example, myexe.bsub, and then submit that file. Note the redirect symbol, <
• bsub < myexe.bsub
Method 2 cont.
 The file you submit contains all the bsub options you want, so for this example myexe.bsub will look like this:
#BSUB -q week
#BSUB -o out.%J
myexe
 This is actually a shell script, so the top line could be the normal #!/bin/csh, etc., and you can run any commands you would like.
• if this doesn’t mean anything to you then never mind :)
Parallel Job example
Batch Command Line Method
 bsub -q week -o out.%J -n 64 -a mvapich mpirun myParallelExe
Batch File Method
 bsub < myexe.bsub
 where myexe.bsub will look like this:
#BSUB -q week
#BSUB -o out.%J
#BSUB -a mvapich
#BSUB -n 64
mpirun myParallelExe
Topsail Queues

Queue    Time Limit   Jobs/User   CPU/Job
int      2 hrs        128         ---
debug    2 hrs        64          ---
day      24 hrs       512         4 - 128
week     1 week       512         4 - 128
512cpu   4 days       512         32 - 512
128cpu   4 days       512         32 - 128
32cpu    2 days       512         4 - 32
chunk    4 days       512

All queues except int are batch queues.
• For access to the 512cpu queue, scalability should be demonstrated.
Common Error 1
 If a job immediately dies, check the err.%J file
 err.%J file has the error:
• Can't read MPIRUN_HOST
 Problem: MPI environment settings were not correctly applied on the compute node
 Solution: include mpirun in the bsub command
Common Error 2
 Job immediately dies after submission
 err.%J file is blank
 Problem: ssh passwords and keys were not correctly set up at the initial login to Topsail
 Solution:
• cd ~/.ssh/
• mv id_rsa id_rsa-orig
• mv id_rsa.pub id_rsa.pub-orig
• log out of Topsail
• log in to Topsail and accept all defaults
Interactive Jobs
 To run long shell scripts on Topsail, use the int queue
 bsub -q int -Ip /bin/bash
• this bsub command provides a prompt on a compute node
• you can run a program or shell script interactively from the compute node (see the example below)
 The Totalview debugger can also be run interactively from Topsail
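A hypothetical interactive session (my_long_script.sh is a placeholder for your own script):
bsub -q int -Ip /bin/bash   # wait for LSF to open a shell on a compute node
./my_long_script.sh         # run interactively on the compute node
exit                        # release the node when finished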
MPI/OpenMP Training
 Courses are taught throughout the year by Research Computing
• http://learnit.unc.edu/workshops
• http://help.unc.edu/CCM3_008194
 See the schedule for the next course
• MPI
• OpenMP
Further Help with Topsail
 More details about using Topsail can be found in the Getting Started on Topsail help document
• http://help.unc.edu/?id=6214
• http://keel.isis.unc.edu/wordpress/ - ON CAMPUS
 For assistance with Topsail, please contact the ITS Research Computing group
• Email: [email protected]
• Phone: 919-962-HELP
• Submit a help ticket at http://help.unc.edu
 For immediate assistance, see the manual pages on Topsail:
• man <command>