Getting Started on Topsail - Information Technology Services
Using Kure and Topsail
Mark Reed, Grant Murphy, Charles Davis
ITS Research Computing
Outline
Compute Clusters
  • Topsail
  • Kure
Logging In
File Spaces
User Environment and Applications, Compiling
Job Management
Logistics
Course Format
Lab Exercises
Breaks
UNC Research Computing
  • http://its.unc.edu/research
Getting started Topsail page
  • http://help.unc.edu/6214
Getting started Kure page
  • http://help.unc.edu/ccm3_015682
What is a compute cluster?
What exactly is Topsail? Kure?
What is a compute cluster?
Some Typical Components
Compute Nodes
Interconnect
Shared File System
Software
Operating System (OS)
Job Scheduler/Manager
Mass Storage
Compute Cluster Advantages
fast interconnect, tightly coupled
aggregated compute resources
large (scratch) file spaces
installed software base
scheduling and job management
high availability
data backup
Initial Topsail Cluster
Initially: 1040 CPU Dell Linux Cluster
  • 520 dual socket, single core nodes
Infiniband interconnect
Intended for capability research
Housed in ITS Franklin machine room
Fast and efficient for large computational jobs
Topsail Upgrade 1
Topsail upgraded to 4,160 CPU
  • replaced blades with dual socket, quad core
Intel Xeon 5345 (Clovertown) Processors
  • Quad-Core with 8 CPU/node
Increased number of processors, but decreased individual processor speed (was 3.6 GHz, now 2.33 GHz)
Decreased energy usage and necessary resources for cooling system
Summary: slower clock speed, better memory bandwidth, less heat, quadrupled the core count
  • Benchmarks tend to run at the same speed per core
  • Topsail shows a net ~4X improvement
  • Of course, this number is VERY application dependent
Topsail – Upgraded blades
52 Chassis: Basis of node names
  • Each holds 10 blades -> 520 blades total
  • Nodes = cmp-chassis#-blade#
Old Compute Blades: Dell PowerEdge 1855
  • 2 single core Intel Xeon EM64T 3.6 GHz procs
  • 800 MHz FSB
  • 2MB L2 Cache per socket
  • Intel NetBurst MicroArchitecture
New Compute Blades: Dell PowerEdge 1955
  • 2 quad core Intel 2.33 GHz procs
  • 1333 MHz FSB
  • 4MB L2 Cache per socket
  • Intel Core 2 MicroArchitecture
Topsail Upgrade 2
Most recent Topsail upgrade (Feb/Mar '09)
Refreshed much of the infrastructure
Improved IBRIX filesystem
Replaced and improved Infiniband cabling
Moved cluster to ITS-Manning building
  • Better cooling and UPS
Top 500 History
Top 500 list comes out twice a year
  • ISC conference in June
  • SC conference in Nov
Topsail debuted at 74 in June 2006
Peaked at 25 in June 2007
Still in the Top 500
Current Topsail Architecture
Login node: 8 CPU @ 2.3 GHz Intel EM64T, 12 GB memory
Compute nodes: 4,160 CPU @ 2.3 GHz Intel EM64T, 12 GB memory
Shared disk: 39TB IBRIX Parallel File System
Interconnect: Infiniband 4x SDR
64bit Linux Operating System
Multi-Core Computing
Processor Structure on Topsail
  • 500+ nodes
  • 2 sockets/node
  • 1 processor/socket
  • 4 cores/processor (Quad-core)
  • 8 cores/node
http://www.tomshardware.com/2006/12/06/quad-core-xeon-clovertown-rolls-into-dp-servers/page3.html
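The core counts quoted on these slides follow from the chassis and socket figures. A minimal sketch of that arithmetic, using only numbers given in the slides (the variable names are illustrative):

```shell
#!/bin/bash
# Core count arithmetic for Topsail, using the figures quoted in the slides.
chassis=52             # blade chassis
blades_per_chassis=10  # one blade = one compute node
sockets_per_node=2
cores_per_socket=4     # quad-core Clovertown

nodes=$(( chassis * blades_per_chassis ))
cores_per_node=$(( sockets_per_node * cores_per_socket ))
total_cores=$(( nodes * cores_per_node ))

echo "nodes: $nodes"
echo "cores per node: $cores_per_node"
echo "total cores: $total_cores"
```

This reproduces the 520 blades, 8 cores/node and 4,160 CPU figures stated earlier.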
Multi-Core Computing
The trend in High Performance Computing is towards multi-core or many-core computing.
More cores at slower clock speeds for less heat
Now, dual and quad core processors are becoming common.
Soon 64+ core processors will be common
  • And these may be heterogeneous!
The Heat Problem
Taken From: Jack Dongarra, UT
More Parallelism
Taken From: Jack Dongarra, UT
Infiniband Connections
Connection comes in single (SDR), double (DDR), and quad data rates (QDR).
  • Topsail is SDR.
Single data rate is 2.5 Gbit/s in each direction per link.
Links can be aggregated: 1x, 4x, 12x.
  • Topsail is 4x.
Links use 8B/10B encoding (10 bits carry 8 bits of data), so the useful data transmission rate is four-fifths the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s respectively per 1x link.
Data rate for Topsail is 8 Gbit/s (4x SDR).
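The useful rate works out as raw lane rate x data-rate multiplier x link width x 8/10. A small sketch of that calculation, in Mbit/s to stay within integer arithmetic (the function and variable names are illustrative, not part of any InfiniBand tool):

```shell
#!/bin/bash
# Useful InfiniBand data rate = raw lane rate x rate multiplier x width x 8/10,
# where the 8/10 factor comes from 8B/10B encoding.
sdr_lane_mbit=2500   # SDR raw signalling rate per 1x link, in Mbit/s

useful_rate_mbit() {
    # $1 = data-rate multiplier (1 = SDR, 2 = DDR, 4 = QDR)
    # $2 = link width (1, 4 or 12)
    echo $(( sdr_lane_mbit * $1 * $2 * 8 / 10 ))
}

topsail_mbit=$(useful_rate_mbit 1 4)   # Topsail: 4x SDR
kure_mbit=$(useful_rate_mbit 4 4)      # Kure: 4x QDR

echo "Topsail 4x SDR: ${topsail_mbit} Mbit/s useful"
echo "Kure 4x QDR: ${kure_mbit} Mbit/s useful"
```

For 4x SDR this gives 8000 Mbit/s, i.e. 8 Gbit/s (about 1 GB/s), and 32 Gbit/s for 4x QDR.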
Topsail Network Topology
Infiniband Benchmarks
Point-to-point (PTP) intranode communication on Topsail for various MPI send types
Peak bandwidth:
  • 1288 MB/s
Minimum Latency (1-way):
  • 3.6 µs
Infiniband Benchmarks
Scaled aggregate bandwidth for MPI Broadcast on Topsail
Note good scaling throughout the tested range (from 24-1536 cores)
Kure
The newest, “latest and greatest” compute cluster in RC
Named after the beach in North Carolina
It’s pronounced like the Nobel prize winning physicist and chemist, Madame Curie
Kure Compute Cluster
Heterogeneous Research Cluster
Hewlett Packard Blades
79 Compute Nodes, mostly
  • Xeon 5560 2.8 GHz
  • Nehalem Microarchitecture
  • Dual socket, quad core
  • 48 GB memory
  • over 600 cores
  • some higher memory nodes
Infiniband 4x QDR
priority usage for patrons
  • Buy in is cheap
Storage
  • Scratch space same as emerald
  • No AFS home
Kure Cont.
The current configuration of Kure is mostly homogeneous but it will become increasingly heterogeneous as patrons and others add to it.
Most login nodes are 48 GB but there are currently four high memory nodes
  • 2 nodes each with 128 GB of memory
  • 2 nodes each with 96 GB of memory
Topsail/Kure Comparison
Topsail
  • homogeneous
  • 4000+ cores
  • 2.33 GHz cores, Intel Core microarch.
  • 12 GB memory/node
  • IB 4x SDR interconnect
Kure
  • heterogeneous
  • 600+ cores
  • 2.8 GHz cores, Intel Nehalem microarch.
  • 48 GB memory/node
  • IB 4x QDR interconnect
Login to Topsail/Kure
Use ssh to connect:
  • ssh topsail.unc.edu
  • ssh kure.unc.edu
SSH Secure Shell with Windows
  • see http://shareware.unc.edu/software.html
For use with X-Windows Display:
  • ssh -X topsail.unc.edu or ssh -Y topsail.unc.edu
  • ssh -X kure.unc.edu or ssh -Y kure.unc.edu
Off-campus users (i.e. domains outside of unc.edu) must use VPN connection
File Spaces
Topsail File Space
Home directories
  • /ifs1/home/
Kure File Space
Home directories
  • /nas02/home///
Mass Storage
long term archival storage
access via ~/ms
looks like ordinary disk file system, but data is actually stored on tape
“limitless” capacity
data is backed up
For storage only, not a work directory (i.e. don’t run jobs from here)
if you have many small files, use tar or zip to create a single file for better performance
Sign up for this service on onyen.unc.edu
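As an illustration of the tar advice, the sketch below bundles a directory of small files into one compressed archive and lists it before it would be copied to mass storage. The directory and file names are hypothetical, and a temporary directory stands in for a real work area.

```shell
#!/bin/bash
# Bundle many small files into a single archive before moving them to
# mass storage. All paths here are hypothetical stand-ins.
workdir=$(mktemp -d)                 # stand-in for a scratch work directory
mkdir -p "$workdir/results"
for i in 1 2 3; do
    echo "sample output $i" > "$workdir/results/run_$i.txt"
done

# -c create, -z gzip, -f archive name; -C changes directory first so the
# archive stores relative paths (results/...).
tar -czf "$workdir/results.tar.gz" -C "$workdir" results

# List the archive contents to verify it before copying it anywhere.
archive_listing=$(tar -tzf "$workdir/results.tar.gz")
echo "$archive_listing"

# On the cluster the final step would be something like:
#   cp "$workdir/results.tar.gz" ~/ms/
```

Restoring later is the reverse: copy the archive back to a work directory and unpack it there with tar -xzf, rather than working inside ~/ms.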
“To infinity … and beyond” - Buzz Lightyear
User Environment and Applications, Compiling Code
Modules
Modules
The user environment is managed by modules
Modules modify the user environment by modifying and adding environment variables such as PATH or LD_LIBRARY_PATH
Typically you set these once and leave them
Note there are two module settings, one for your current environment and one to take effect on your next login (e.g. batch jobs running on compute nodes)
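To make the effect of a module concrete, here is a rough sketch of the kind of change a module add makes to PATH and LD_LIBRARY_PATH. It does not show how the module command is implemented; the install prefix and the add_to_env helper are hypothetical.

```shell
#!/bin/bash
# Rough illustration of what loading a module does to the environment.
# This is NOT the module command itself; the prefix below is hypothetical.
app_prefix="/opt/example/mpi"

add_to_env() {
    # Prepend the application's bin and lib directories, which is what a
    # module typically does for PATH and LD_LIBRARY_PATH.
    export PATH="$1/bin:$PATH"
    export LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

add_to_env "$app_prefix"

first_path_entry=${PATH%%:*}
echo "PATH now starts with: $first_path_entry"
echo "LD_LIBRARY_PATH: $LD_LIBRARY_PATH"
```

Because the new directory is placed first, commands from the loaded package are found ahead of any system copies with the same name.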
Common Module Commands
module avail
  • module avail apps
module help
Current environment    Login version
module list            module initlist
module add             module initadd
module rm              module initrm
More on modules see http://help.unc.edu/CCM3_006660
Parallel Jobs with MPI
There are three implementations of the MPI standard installed:
  • mvapich
  • mvapich2 (currently only on topsail)
  • openmpi
Performance is similar for all three; all three run on the IB fabric.
Mvapich is the default.
Openmpi and mvapich2 have more of the MPI-2 features implemented.
Compiling MPI programs
Use the MPI wrappers to compile your program
  • mpicc, mpiCC, mpif90, mpif77
  • the wrappers will find the appropriate include files and libraries and then invoke the actual compiler
  • for example, mpicc will invoke either gcc or icc depending upon which module you have loaded
Compiling on Topsail/Kure
Serial Programming
  • Intel Compiler Suite for Fortran77, Fortran90, C and C++ (recommended by Research Computing)
      icc, icpc, ifort
  • GNU
      gcc, g++, gfortran
Parallel Programming
  • MPI (see previous page)
  • OpenMP
      Compiler tag: -openmp for Intel, -fopenmp for GNU
      Must set OMP_NUM_THREADS in submission script
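A possible shape for an OpenMP submission script is sketched below. The script only writes the batch file so it can be inspected; the names myOmpExe and omp.bsub, the thread count, and the span[hosts=1] resource request (which asks LSF to place all slots on one host) are assumptions, not taken from the slides.

```shell
#!/bin/bash
# Sketch: write an LSF batch file for an OpenMP program.
# myOmpExe, omp.bsub and the thread count are hypothetical.
# The program would be built beforehand with one of the flags from the slide:
#   icc -openmp  -o myOmpExe myprog.c     # Intel
#   gcc -fopenmp -o myOmpExe myprog.c     # GNU
threads=8    # one thread per core on an 8-core node

cat > omp.bsub <<EOF
#!/bin/bash
#BSUB -q week
#BSUB -o out.%J
#BSUB -n $threads
#BSUB -R "span[hosts=1]"
export OMP_NUM_THREADS=$threads
./myOmpExe
EOF

cat omp.bsub
# Submit on the cluster with: bsub < omp.bsub
```

Keeping the slot count and OMP_NUM_THREADS tied to one variable avoids requesting fewer cores than the program will try to use.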
Debugging - Totalview
If you are debugging code there is a powerful commercial debugger, totalview
  • See http://help.unc.edu/CCM3_021717
parallel and serial code
Fortran/C/C++
GUI for source level control
too many features to list!
Job Scheduling and Management
What do a job scheduler and batch system do?
Manage Resources
  • allocate user tasks to resource
  • monitor tasks
  • process control
  • manage input and output
  • report status, availability, etc
  • enforce usage policies
Job Scheduling Systems
Allocates compute nodes to job submissions based on user priority, requested resources, execution time, etc.
Many types of schedulers
  • Load Sharing Facility (LSF) – Used by Topsail/Kure
  • IBM LoadLeveler
  • Portable Batch System (PBS)
  • Sun Grid Engine (SGE)
LSF
All Research Computing clusters use LSF to do job scheduling and management
LSF (Load Sharing Facility) is a (licensed) product from Platform Computing
  • Fairly distribute compute nodes among users
  • enforce usage policies for established queues
      most common queues: int, now, week, month
  • RC uses Fair Share scheduling, not first come, first served (FCFS)
LSF commands typically start with the letter b (as in batch), e.g. bsub, bqueues, bjobs, bhosts, …
  • see man pages for much more info!
Simplified view of LSF
Login Node: user logged in to login node submits job
  • bsub -n 64 -a mvapich -q week mpirun myjob
job routed to queue
  • Jobs Queued: job_J, job_F, myjob, job_7
job dispatched to run on available host which satisfies job requirements
Running Programs on Topsail
Upon ssh to Topsail/Kure, you are on the Login node.
Programs SHOULD NOT be run on Login node.
Submit programs to one of the many, many compute nodes.
Submit jobs using Load Sharing Facility (LSF) via the bsub command.
Common batch commands
bsub - submit jobs
bqueues - view info on defined queues
  • bqueues -l week
bkill - stop/cancel submitted job
bjobs - view submitted jobs
  • bjobs -u all
bhist - job history
  • bhist -l
bhosts - status and resources of hosts (nodes)
bpeek - display output of running job
Use man pages to get much more info!
  • man bjobs
Submitting Jobs: bsub Command
Submit Jobs - bsub
  • Run large jobs out of scratch space, smaller jobs can run out of your home space
bsub [-bsub_opts] executable [-exec_opts]
Common bsub options:
  • -o
Two methods to submit jobs:
bsub example: submit the executable job, myexe, to the week queue and redirect output to the file out.
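The two methods can be sketched as follows, reconstructed from the batch file and parallel examples elsewhere in this transcript rather than from this slide itself. The script only writes the batch file and shows the submission commands as comments, so it is safe to run away from the cluster.

```shell
#!/bin/bash
# Method 1 (command line): every option is given to bsub directly.
#   bsub -q week -o out.%J myexe
#
# Method 2 (batch file): the same options live in a file as #BSUB lines.
cat > myexe.bsub <<'EOF'
#BSUB -q week
#BSUB -o out.%J
myexe
EOF
# The file is read from standard input, so note the "<":
#   bsub < myexe.bsub

# Count the option lines bsub would pick up from the file.
bsub_lines=$(grep -c '^#BSUB' myexe.bsub)
echo "myexe.bsub carries $bsub_lines #BSUB option lines"
```

In both cases %J is replaced by LSF with the job ID, so each run writes its own output file.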
Method 2 cont.
The file you submit will contain all the bsub options you want in it, so for this example myexe.bsub will look like this:
  #BSUB -q week
  #BSUB -o out.%J
  myexe
This is actually a shell script, so the top line could be the normal #!/bin/csh, etc. and you can run any commands you would like.
  • if this doesn’t mean anything to you then never mind :)
Parallel Job example
Batch Command Line Method
  bsub -q week -o out.%J -n 64 -a mvapich mpirun myParallelExe
Batch File Method
  bsub < myexe.bsub
where myexe.bsub will look like this:
  #BSUB -q week
  #BSUB -o out.%J
  #BSUB -a mvapich
  #BSUB -n 64
  mpirun myParallelExe
Some Topsail Queues
Queue     Time Limit   Jobs/User   CPU/Job
int       2 hrs        128         --
debug     2 hrs        64          --
day       24 hrs       512         4 - 128
week      1 week       512         4 - 128
512cpu    4 days       512         32 - 512
128cpu    4 days       512         32 - 128
32cpu     2 days       512         4 - 32
chunk     4 days       512         Batch Jobs
  • For access to the 512cpu queue the scalability must be demonstrated
Some Kure Queues
Queue     Time Limit
int       10 hrs
debug     5 minutes
bigmem    1 week
week      1 week
patrons   none
Jobs/User: 2 32 8
Most users have a 32 job slots limit unless they have been granted extra slots.
Queues are always subject to change and probably will change as Kure production ramps up.
Use the bqueues command to find the current status
Common Error 1
If job immediately dies, check err.%J file
err.%J file has error:
  • Can't read MPIRUN_HOST
Problem: MPI environment settings were not correctly applied on compute node
Solution: Include mpirun in bsub command
Common Error 2
Job immediately dies after submission
err.%J file is blank
Problem: ssh passwords and keys were not correctly set up at initial login to Topsail
Solution:
  • cd ~/.ssh/
  • mv id_rsa id_rsa-orig
  • mv id_rsa.pub id_rsa.pub-orig
  • Logout of Topsail
  • Login to Topsail and accept all defaults
Interactive Jobs
To run long shell scripts on Topsail or Kure, use int queue
  bsub -q int -Ip /bin/bash
  • This bsub command provides a prompt on compute node
  • Can run program or shell script interactively from compute node
Specialty Scripts
There are specialty scripts provided on Kure for user convenience.
Batch scripts
  • bmatlab, bsas, bstata
X-window scripts
  • xmatlab, xsas, xstata
Interactive scripts
  • imatlab, istata
MPI/OpenMP Training
Courses are taught throughout the year by Research Computing
  • http://learnit.unc.edu/workshops
  • http://help.unc.edu/CCM3_008194
See schedule for next course
  • MPI
  • OpenMP
Further Help with Topsail/Kure
More details can be found on the Getting Started help documents:
  • http://help.unc.edu/?id=6214 - Topsail
  • http://help.unc.edu/ccm3_015682 - Kure
  • http://keel.isis.unc.edu/wordpress/ - ON CAMPUS
For assistance with Topsail/Kure, please contact the ITS Research Computing group
  • Email: [email protected]
  • Phone: 919-962-HELP
  • Submit help ticket at http://help.unc.edu
For immediate assistance, see manual pages
  • man