Introduction to Scientific Computing


Introduction to Scientific Computing
Shubin Liu, Ph.D.
Research Computing Center
University of North Carolina at Chapel Hill
Course Goals
• An introduction to high-performance computing and the UNC Research Computing Center
• Available Research Computing hardware facilities
• Available software packages
• Serial/parallel programming tools and libraries
• How to efficiently make use of Research Computing facilities on campus
Agenda
• Introduction to High-Performance Computing
• Hardware Available
  • Servers, storage, file systems, etc.
• How to Access
• Programming Tools Available
  • Compilers & Debugger tools
  • Utility Libraries
  • Parallel Computing
• Scientific Packages Available
• Job Management
• Hands-on Exercises (2nd hour)
The PPT format of this presentation is available here:
http://its2.unc.edu/divisions/rc/training/scientific/
Pre-requisites
• An account on the Emerald cluster
• UNIX basics:
  • Getting started: http://help.unc.edu/?id=5288
  • Intermediate: http://help.unc.edu/?id=5333
  • vi Editor: http://help.unc.edu/?id=152
  • Customizing: http://help.unc.edu/?id=208
  • Shells: http://help.unc.edu/?id=5290
  • ne Editor: http://help.unc.edu/?id=187
  • Security: http://help.unc.edu/?id=217
  • Data Management: http://help.unc.edu/?id=189
  • Scripting: http://help.unc.edu/?id=213
  • HPC Application: http://help.unc.edu/?id=4176
About Us
• ITS – Information Technology Services
  • http://its.unc.edu
  • http://help.unc.edu
  • Physical locations:
    • 401 West Franklin St.
    • 211 Manning Drive
  • 10 Divisions/Departments:
    • Information Security
    • Research Computing Center
    • User Support and Engagement
    • Communication Technologies
    • Enterprise Applications
    • IT Infrastructure and Operations
    • Teaching and Learning
    • Office of the CIO
    • Communications
    • Finance and Administration
Research Computing Center
• Where and who are we, and what do we do?
  • ITS Manning: 211 Manning Drive
  • Website: http://its.unc.edu/research-computing.html
  • Groups:
    • Infrastructure -- Hardware
    • User Support -- Software
    • Engagement -- Collaboration
About Myself
• Ph.D. in Chemistry, UNC-CH
• Currently Senior Computational Scientist @ Research Computing Center, UNC-CH
• Responsibilities:
  • Support computational chemistry/physics/materials science software
  • Support programming (FORTRAN/C/C++) tools, code porting, parallel computing, etc.
  • Conduct research and engagement projects in computational chemistry
    • Development of DFT theory and concept tools
    • Applications in biological and materials science systems
What is Scientific Computing?
• Short version:
  • To use high-performance computing (HPC) facilities to solve real scientific problems.
• Long version, from Wikipedia:
  • Scientific computing (or computational science) is the field of study concerned with constructing mathematical models and numerical solution techniques and using computers to analyze and solve scientific and engineering problems. In practical use, it is typically the application of computer simulation and other forms of computation to problems in various scientific disciplines.
What is Scientific Computing?
[Diagram: from the scientific-discipline viewpoint, scientific computing sits at the intersection of the engineering sciences, applied mathematics, computer science (hardware/software), and the natural sciences, spanning the theory/model, algorithm, and application layers. From the operational/computing perspective, scientific computing overlaps with parallel computing and high-performance computing.]
What is HPC?
• Computing resources which provide more than an order of magnitude more computing power than current top-end workstations or desktops – a generic, widely accepted definition.
• HPC ingredients:
  • large capability computers (fast CPUs)
  • massive memory
  • enormous (fast & large) data storage
  • highest-capacity communication networks (Myrinet, 10 GigE, InfiniBand, etc.)
  • specifically parallelized codes (MPI, OpenMP)
  • visualization
Why HPC?
• What are the three-dimensional structures of all of the proteins encoded by an organism's genome, and how does structure influence function, both spatially and temporally?
• What patterns of emergent behavior occur in models of very large societies?
• How do massive stars explode and produce the heaviest elements in the periodic table?
• What sort of abrupt transitions can occur in Earth's climate and ecosystem structure? How do these occur and under what circumstances?
• If we could design catalysts atom-by-atom, could we transform industrial synthesis?
• What strategies might be developed to optimize management of complex infrastructure systems?
• What kind of language processing can occur in large assemblages of neurons?
• Can we enable integrated planning and response to natural and man-made disasters that prevent or minimize the loss of life and property?
http://www.nsf.gov/pubs/2005/nsf05625/nsf05625.htm
Measure of Performance
1 CPU, units in MFLOPS (x10^6)

Machine/CPU                          LINPACK Performance   Peak Performance
Intel Pentium 4 (2.53 GHz)                 2355                  5060
NEC SX-6/1 (1 proc., 2.0 ns)               7575                  8000
HP rx5670 Itanium2 (1 GHz)                 3528                  4000
IBM eServer pSeries 690 (1300 MHz)         2894                  5200
Cray SV1ex-1-32 (500 MHz)                  1554                  2000
Compaq ES45 (1000 MHz)                     1542                  2000
AMD Athlon MP1800+ (1530 MHz)              1705                  3060
Intel Pentium III (933 MHz)                 507                   933
SGI Origin 2000 (300 MHz)                   533                   600
Intel Pentium II Xeon (450 MHz)             295                   450
Sun UltraSPARC (167 MHz)                    237                   333

FLOPS prefixes: Mega FLOPS (x10^6), Giga FLOPS (x10^9), Tera FLOPS (x10^12), Peta FLOPS (x10^15), Exa FLOPS (x10^18), Zetta FLOPS (x10^21), Yotta FLOPS (x10^24)

http://en.wikipedia.org/wiki/FLOPS
Reference: http://performance.netlib.org/performance/html/linpack.data.col0.html
How to Quantify Performance? TOP500
• A list of the 500 most powerful computer systems in the world
• Established in June 1993
• Compiled twice a year (June & November)
• Uses the LINPACK benchmark code (solving the linear algebra equation Ax = b)
• Organized by world-wide HPC experts, computational scientists, manufacturers, and the Internet community
• Homepage: http://www.top500.org
TOP500: November 2007
Top 5 plus UNC's entry; units in GFLOPS (1 GFLOPS = 1,000 MFLOPS)

Rank 1: DOE/NNSA/LLNL, United States (2007). BlueGene/L, eServer Blue Gene Solution, IBM, 212,992 processors. Rmax 478,200, Rpeak 596,378.
Rank 2: Forschungszentrum Juelich (FZJ), Germany (2007). JUGENE, Blue Gene/P Solution, IBM, 65,536 processors. Rmax 167,300, Rpeak 222,822.
Rank 3: SGI/New Mexico Computing Applications Center (NMCAC), United States (2007). SGI Altix ICE 8200, Xeon quad-core 3.0 GHz, SGI, 14,336 processors. Rmax 126,900, Rpeak 172,032.
Rank 4: Computational Research Laboratories, TATA SONS, India (2007). EKA, Cluster Platform 3000 BL460c, Xeon 53xx 3 GHz, Infiniband, Hewlett-Packard, 14,240 processors. Rmax 117,900, Rpeak 170,800.
Rank 5: Government Agency, Sweden (2007). Cluster Platform 3000 BL460c, Xeon 53xx 2.66 GHz, Infiniband, Hewlett-Packard, 13,728 processors. Rmax 102,800, Rpeak 146,430.
Rank 36: University of North Carolina, United States (2007). Topsail, PowerEdge 1955, 2.33 GHz, Cisco/Topspin Infiniband, Dell, 4,160 processors. Rmax 28,770, Rpeak 38,821.
TOP500: June 2008
Top 5 plus UNC's entry. Rmax and Rpeak values are in TFLOPS; power data in kW for the entire system.

Rank 1: DOE/NNSA/LANL, United States. Roadrunner, BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband (2008), IBM. 122,400 cores. Rmax 1026.00, Rpeak 1375.78, Power 2345.50.
Rank 2: DOE/NNSA/LLNL, United States. BlueGene/L, eServer Blue Gene Solution (2007), IBM. 212,992 cores. Rmax 478.20, Rpeak 596.38, Power 2329.60.
Rank 3: Argonne National Laboratory, United States. Blue Gene/P Solution (2007), IBM. 163,840 cores. Rmax 450.30, Rpeak 557.06, Power 1260.00.
Rank 4: Texas Advanced Computing Center, Univ. of Texas, United States. Ranger, SunBlade x6420, Opteron Quad 2 GHz, Infiniband (2008), Sun Microsystems. 62,976 cores. Rmax 326.00, Rpeak 503.81, Power 2000.00.
Rank 5: DOE/Oak Ridge National Laboratory, United States. Jaguar, Cray XT4 QuadCore 2.1 GHz (2008), Cray Inc. 30,976 cores. Rmax 205.00, Rpeak 260.20, Power 1580.71.
Rank 67: University of North Carolina, United States. Topsail, PowerEdge 1955, 2.33 GHz, Cisco/Topspin Infiniband (2007), Dell. 4,160 cores. Rmax 28.77, Rpeak 38.82.
TOP500 History of UNC-CH Entry

List      Systems   Highest Ranking   Sum Rmax (GFlops)   Sum Rpeak (GFlops)   Site Efficiency (%)
06/2008         1                67            28770.00             38821.10                 74.11
11/2007         1                36            28770.00             38821.10                 74.11
06/2007         1                25            28770.00             38821.10                 74.11
11/2006         1               104             6252.00              7488.00                 83.49
06/2006         1                74             6252.00              7488.00                 83.49
11/2003         1               393              439.30              1209.60                 36.32
06/1999         1               499               24.77                28.80                 86.01
Shared/Distributed-Memory Architecture

[Diagram: a shared-memory system with several CPUs attached via a bus to one memory pool, and a distributed-memory system with CPUs, each with its own memory, connected by a network.]

Shared memory: a single address space; all processors have access to a pool of shared memory. (Examples: chastity/zephyr, happy/yatta, cedar/cypress, sunny.) Methods of memory access: bus and crossbar.

Distributed memory: each processor has its own local memory; message passing must be used to exchange data between processors. (Examples: Baobab, the new Dell cluster.)
What is a Beowulf Cluster?
• A Beowulf system is a collection of personal computers constructed from commodity off-the-shelf hardware components, interconnected with a system-area network, and configured to operate as a single unit parallel computing platform (e.g., via MPI), using an open-source network operating system such as LINUX.
• Main components:
  • PCs running the LINUX OS
  • Inter-node connection with Ethernet, Gigabit Ethernet, Myrinet, InfiniBand, etc.
  • MPI (Message Passing Interface)
LINUX Beowulf Clusters
What is Parallel Computing?
• Concurrent use of multiple processors to process data
  • Running the same program on many processors.
  • Running many programs on each processor.
Advantages of Parallelization
• Cheaper, in terms of price/performance ratio
• Faster than equivalently expensive uniprocessor machines
• Handles bigger problems
• More scalable: the performance of a particular program may be improved by execution on a large machine
• More reliable: in theory, if processors fail we can simply use others
Catch: Amdahl's Law
Speedup = 1/(s + p/n), where s is the serial (non-parallelizable) fraction of the program, p = 1 - s is the parallelizable fraction, and n is the number of processors.
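As a worked example (numbers chosen purely for illustration): if 90% of a program parallelizes (p = 0.9, s = 0.1) and it runs on n = 16 processors, then

Speedup = 1/(0.1 + 0.9/16) = 1/0.15625 = 6.4

and even with an unlimited number of processors the speedup can never exceed 1/s = 10. The serial fraction, not the processor count, eventually limits performance.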
Parallel Programming Tools
• Shared-memory architecture
  • OpenMP
• Distributed-memory architecture
  • MPI, PVM, etc.
OpenMP
• An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism
• What does OpenMP stand for?
  • Open specifications for Multi Processing, via collaborative work between interested parties from the hardware and software industry, government, and academia
• Comprised of three primary API components:
  • Compiler directives
  • Runtime library routines
  • Environment variables
• Portable:
  • The API is specified for C/C++ and Fortran
  • Implemented on multiple platforms, including most Unix platforms and Windows NT
• Standardized:
  • Jointly defined and endorsed by a group of major computer hardware and software vendors
  • Expected to become an ANSI standard later???
OpenMP Example (FORTRAN)

      PROGRAM HELLO
      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +        OMP_GET_THREAD_NUM
C     Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(TID)
C     Obtain and print thread id
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread = ', TID
C     Only master thread does this
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
      END IF
C     All threads join master thread and disband
!$OMP END PARALLEL
      END
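The same example in C, as a minimal sketch that is not from the original slides (the file name hello_omp.c is hypothetical; compile with an OpenMP-capable compiler, e.g. "icc -openmp hello_omp.c" with the Intel compilers described later):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int tid, nthreads;
/* Fork a team of threads, each with its own private copy of tid */
#pragma omp parallel private(tid)
    {
        /* Obtain and print the thread id */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);
        /* Only the master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join the master thread and disband */
    return 0;
}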
The Message Passing Model
• Parallelization scheme for distributed memory.
• Parallel programs consist of cooperating processes, each with its own memory.
• Processes send data to one another as messages.
• Messages can be passed around among compute processes.
• Messages may have tags that can be used to sort them.
• Messages may be received in any order.
MPI: Message Passing Interface
• Message-passing model
• Standard (specification)
  • Many implementations (almost every vendor has one)
  • MPICH and LAM/MPI, from the public domain, are the most widely used
  • GLOBUS MPI for grid computing
• Two phases:
  • MPI-1: traditional message passing
  • MPI-2: remote memory, parallel I/O, and dynamic processes
• Online resources:
  • http://www-unix.mcs.anl.gov/mpi/index.htm
  • http://www-unix.mcs.anl.gov/mpi/mpich/
  • http://www.lam-mpi.org/
  • http://www.mpi-forum.org
  • http://www-unix.mcs.anl.gov/mpi/tutorial/learning.html
A Simple MPI Code
#include "mpi.h"
#include <stdio.h>
include ‘mpif.h’
integer myid, ierr, numprocs
int main( argc, argv )
int argc;
char **argv;
call MPI_INIT( ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,
myid, ierr)
call MPI_COMM_SIZE (MPI_COMM_WORLD,
numprocs,ierr)
{ MPI_Init( &argc, &argv );
printf( "Hello world\n" );
MPI_Finalize();
return 0;
}
C Version
its.unc.edu
write(*,*) ‘Hello from ‘, myid
write(*,*) ‘Numprocs is’, numprocs
call MPI_FINALIZE(ierr)
end
FORTRAN Version
28
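As a usage sketch (the file names are hypothetical; the MPI compiler wrappers and job submission are covered in the compilation and LSF slides later in this document), the two versions might be built and run along these lines:

%mpicc -O -o hello_c.x hello.c
%mpif77 -O -o hello_f.x hello.f
%mpirun -np 4 ./hello_c.x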
Other Parallelization Models
• VIA: Virtual Interface Architecture, standards-based cluster communications
• PVM: a portable message-passing programming system, designed to link separate host machines to form a "virtual machine" that is a single, manageable computing resource. It is largely an academic effort, and there has not been much development since the 1990s.
• BSP: Bulk Synchronous Parallel model, a generalization of the widely researched PRAM (Parallel Random Access Machine) model
• Linda: a concurrent programming model from Yale, with the primary concept of a "tuple space"
• HPF: High Performance Fortran, the first standard parallel programming language for shared and distributed-memory systems (implemented by PGI as pghpf)
RC Servers @ UNC-CH
• SGI Altix 3700 – SMP, 128 CPUs, cedar/cypress
• Emerald LINUX Cluster – distributed memory, ~400 CPUs, emerald
  • yatta/p575 IBM AIX nodes
• Dell LINUX Cluster – distributed memory, 4,160 CPUs, topsail
IBM P690/P575 SMP
• IBM pSeries 690/P575 Model 6C4, Power4+ Turbo, 32 x 1.7 GHz processors
• Access to 4 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /netscr
• OS: IBM AIX 5.3 Maintenance Level 04
• Login node: emerald.isis.unc.edu
• Compute nodes:
  • yatta.isis.unc.edu      32 CPUs
  • P575-n00.isis.unc.edu   16 CPUs
  • P575-n01.isis.unc.edu   16 CPUs
  • P575-n02.isis.unc.edu   16 CPUs
  • P575-n03.isis.unc.edu   16 CPUs
SGI Altix 3700 SMP
• Servers for scientific applications such as Gaussian, Amber, and custom code
• Login node: cedar.isis.unc.edu
• Compute node: cypress.isis.unc.edu
• Cypress: SGI Altix 3700bx2, 128 Intel Itanium2 processors (1600 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 6 MB L3 cache, and 4 GB of shared memory (512 GB total memory)
• Two 70 GB SCSI system disks as /scr
SGI Altix 3700 SMP
• Cedar: SGI Altix 350, 8 Intel Itanium2 processors (1500 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 4 MB L3 cache, and 1 GB of shared memory (8 GB total memory); two 70 GB SATA system disks
• RHEL 3 with Propack 3, Service Pack 3
• No AFS (HOME & pkg space) access
• Scratch disks: /netscr, /nas, /scr
Emerald Cluster
• General-purpose Linux cluster for scientific and statistical applications
• Machine name: emerald.isis.unc.edu
• 18 compute nodes: dual AMD Athlon 1600+ 1.4 GHz MP processors, Tyan Thunder MP motherboard, 2 GB DDR RAM on each node
• 6 compute nodes: dual AMD Athlon 1800+ 1.6 GHz MP processors, Tyan Thunder MP motherboard, 2 GB DDR RAM on each node
• 25 compute nodes: IBM BladeCenter, dual Intel Xeon 2.4 GHz, 2.5 GB RAM on each node
• 96 compute nodes: IBM BladeCenter, dual Intel Xeon 2.8 GHz, 2.5 GB RAM on each node
• 15 compute nodes: IBM BladeCenter, dual Intel Xeon 3.2 GHz, 4.0 GB RAM on each node
• 2 login nodes: IBM BladeCenter, one Xeon 2.4 GHz with 2.5 GB RAM and one Xeon 2.8 GHz with 2.5 GB RAM
• Login: emerald.isis.unc.edu
• Access to 10 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr (listed elsewhere in this presentation as 7 TB)
• OS: RedHat Enterprise Linux 3.0
• TOP500: 395th place in the June 2003 release
Dell LINUX Cluster, Topsail
• 520 dual nodes (4,160 CPUs), Xeon (EM64T) 3.6 GHz, 2 MB cache, 2 GB memory per CPU
• InfiniBand inter-node connection
• Not AFS mounted, not open to the general public
• Access based on peer-reviewed proposal
• HPL: 6.252 teraflops, 74th in the June 2006 TOP500 list, 104th in the November 2006 list, and 25th in the June 2007 list (28.77 teraflops after upgrade)
Topsail
• Login node: topsail.unc.edu, 8 CPUs @ 2.3 GHz Intel EM64T with 2x4 MB L2 cache (Model E5345/Clovertown), 12 GB memory
• Compute nodes: 4,160 CPUs @ 2.3 GHz Intel EM64T with 2x4 MB L2 cache (Model E5345/Clovertown), 12 GB memory
• Shared disk: (/ifs1) 39 TB IBRIX parallel file system
• Interconnect: Infiniband 4x SDR
• Resource management is handled by LSF v6.2, through which all computational jobs are submitted for processing
File Systems
• AFS (Andrew File System): a distributed network file system that enables files from any AFS machine across the campus to be accessed as easily as files stored locally.
  • Serves as the ISIS HOME for all users with an ONYEN – the Only Name You'll Ever Need
  • Limited quota: 250 MB for most users [type "fs lq" to view]
  • Current production version: openafs-1.3.8.6
  • Files backed up daily [ ~/OldFiles ]
  • Directory/file tree: /afs/isis/home/o/n/onyen
    • For example: /afs/isis/home/m/a/mason, where "mason" is the ONYEN of the user
  • Accessible from emerald and happy/yatta, but not from cedar/cypress or topsail
  • Recommended to compile and run I/O-intensive jobs on /scr or /netscr
  • More info: http://help.unc.edu/?id=215#d0e24
Basic AFS Commands
• To add or remove packages:
  • ipm add pkg_name, ipm remove pkg_name
• To find out space quota/usage:
  • fs lq
• To see and renew AFS tokens (read/write-able), which expire in 25 hours:
  • tokens, klog
• Over 300 packages installed in the AFS pkg space:
  • /afs/isis/pkg/
• More info available at:
  • http://its.unc.edu/dci/dci_components/afs/
Data Storage
• Local scratch: /scr – local to a machine
  • Cedar/cypress: 2 x 500 GB SCSI system disks
  • Topsail: /ifs1, 39 TB IBRIX parallel file system
  • Happy/yatta: 2 x 500 GB disk drives
  • For running jobs and temporary data storage; not backed up
• Network Attached Storage (NAS) – for temporary storage
  • /nas/uncch, /netscr
  • 7 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
  • For running jobs and temporary data storage; not backed up
  • Shared by all login and compute nodes (cedar/cypress, happy/yatta, emerald)
• Mass Storage (MS) – for permanent storage
  • Never run jobs using files in ~/ms (compute nodes do not have ~/ms access)
  • Mounted for long-term data storage on all scientific computing servers' login nodes as ~/ms ($HOME/ms)
Subscription of Services
• Have an ONYEN ID
  • The Only Name You'll Ever Need
• Eligibility: faculty, staff, postdocs, and graduate students
• Go to http://onyen.unc.edu
Access to Servers
• To Emerald:
  ssh emerald.isis.unc.edu
• To Cedar:
  ssh cedar.isis.unc.edu
• To Topsail:
  ssh topsail.unc.edu
Programming Tools
• Compilers
  • FORTRAN 77/90/95
  • C/C++
• Utility Libraries
  • BLAS, LAPACK, FFTW, SCALAPACK
  • IMSL, NAG
  • NetCDF, GSL, PETSc
• Parallel Computing
  • OpenMP
  • PVM
  • MPI (MPICH, LAM/MPI, MPICH-GM)
Compilers: SMP Machines
• Cedar/Cypress – SGI Altix 3700, 128 CPUs
  • 64-bit Intel compilers, versions 9.1 and 10.1, in /opt/intel
    • FORTRAN 77/90/95:   ifort/ifc/efc
    • C/C++:              icc/ecc
  • 64-bit GNU compilers
    • FORTRAN 77:         f77/g77
    • C and C++:          gcc/cc and g++/c++
• Yatta/P575 – IBM P690/P575, 32/64 CPUs
  • XL FORTRAN 77/90 8.1.0.3:   xlf, xlf90
  • C and C++ for AIX 6.0.0.4:  xlc, xlC
Compilers: LINUX Cluster
• Absoft ProFortran Compilers
  • Package name: profortran
  • Current version: 7.0
  • FORTRAN 77 (f77): Absoft FORTRAN 77 compiler version 5.0
  • FORTRAN 90/95 (f90/f95): Absoft FORTRAN 90/95 compiler version 3.0
• GNU Compilers
  • Package name: gcc
  • Current version: 4.1.2
  • FORTRAN 77 (g77/f77): 3.4.3, 4.1.2
  • C (gcc): 3.4.3, 4.1.2
  • C++ (g++/c++): 3.4.3, 4.1.2
• Intel Compilers
  • Package names: intel_fortran, intel_CC
  • Current version: 10.1
  • FORTRAN 77/90 (ifc): Intel LINUX compiler versions 8.1, 9.0, 10.1
  • C/C++ (icc): Intel LINUX compiler versions 8.1, 9.0, 10.1
• Portland Group Compilers
  • Package name: pgi
  • Current version: 7.1.6
  • FORTRAN 77 (pgf77): The Portland Group, Inc. pgf77 v6.0, 7.0.4, 7.1.3
  • FORTRAN 90 (pgf90): The Portland Group, Inc. pgf90 v6.0, 7.0.4, 7.1.3
  • High Performance FORTRAN (pghpf): The Portland Group, Inc. pghpf v6.0, 7.0.4, 7.1.3
  • C (pgcc): The Portland Group, Inc. pgcc v6.0, 7.0.4, 7.1.3
  • C++ (pgCC): The Portland Group, Inc. pgCC v6.0, 7.0.4, 7.1.3
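As a quick usage sketch (the source file name is hypothetical; subscribe to one compiler package first with ipm, as described in the setup slides further below):

% ipm add pgi
% pgf90 -O -o mycode.x mycode.f90

The Intel (ifc/icc), Absoft (f77/f95), and GNU (g77/gcc/g++) compilers are invoked the same way once their packages have been added.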
LINUX Compiler Benchmark
                               Absoft          Intel          Portland Group   GNU
                               ProFortran 90   FORTRAN 90     FORTRAN 90       FORTRAN 77
Molecular Dynamics (CPU time)  4.19 (4)        2.83 (2)       2.80 (1)         2.89 (3)
Kepler (CPU time)              0.49 (1)        0.93 (2)       1.10 (3)         1.24 (4)
Linpack (CPU time)             98.6 (4)        95.6 (1)       96.7 (2)         97.6 (3)
Linpack (MFLOPS)               182.6 (4)       183.8 (1)      183.2 (3)        183.3 (2)
LFK (CPU time)                 89.5 (4)        70.0 (3)       68.7 (2)         68.0 (1)
LFK (MFLOPS)                   309.7 (3)       403.0 (2)      468.9 (1)        250.9 (4)
Total rank                     20              11             12               17

For reference only. Notice that performance is code and compilation-flag dependent. For each benchmark, three identical runs were performed and the best CPU timing among the three is listed in the table. Optimization flags: Absoft -O, Portland Group -O4 -fast, Intel -O3, GNU -O.
Profilers & Debuggers
• SMP machines
  • Happy: dbx, prof, gprof
  • Cedar: gprof
• LINUX cluster
  • PGI: pgdebug, pgprof, gprof
  • Absoft: fx, xfx, gprof
  • Intel: idb, gprof
  • GNU: gdb, gprof
Utility Libraries
• Mathematical libraries
  • IMSL, NAG, etc.
• Scientific computing
  • Linear algebra
    • BLAS, ATLAS
    • EISPACK
    • LAPACK
    • SCALAPACK
  • Fast Fourier transform: FFTW
  • BLAS/LAPACK, ScaLAPACK
  • The GNU Scientific Library, GSL
  • Utility libraries: netCDF, PETSc, etc.
Utility Libraries
• SMP machines
  • Yatta/P575: ESSL (Engineering and Scientific Subroutine Library), -lessl
    • BLAS
    • LAPACK
    • EISPACK
    • Fourier transforms, convolutions and correlations, and related computations
    • Sorting and searching
    • Interpolation
    • Numerical quadrature
    • Random number generation
    • Utilities
Utility Libraries
• SMP machines
  • Cedar/Cypress: MKL (Intel Math Kernel Library) 8.0
    -L/opt/intel/mkl721/lib/64 -lmkl -lmkl_lapack -lsolver -lvml -lguide
    • BLAS
    • LAPACK
    • Sparse solvers
    • FFT
    • VML (Vector Math Library)
    • Random-number generators
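For example, a Fortran code that calls LAPACK routines could be linked against MKL on cedar roughly as follows (the file name is hypothetical; the library path and flags are the ones listed above):

% ifort -O -o solve.x solve.f90 -L/opt/intel/mkl721/lib/64 -lmkl -lmkl_lapack -lsolver -lvml -lguide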
Utility Libraries for Emerald Cluster
• Mathematical Libraries
  • IMSL
    • The IMSL Libraries are a comprehensive set of mathematical and statistical functions
    • From Visual Numerics, http://www.vni.com
    • Functions include: optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, and many more
    • Available in FORTRAN and C
    • Package name: imsl
    • Required compiler: Portland Group compiler, pgi
    • Installed in AFS ISIS package space, /afs/isis/pkg/imsl
    • Current default version 4.0, latest version 5.0
    • To subscribe to IMSL, type "ipm add pgi imsl"
    • To compile a C code, code.c, using IMSL:
      pgcc -O $CFLAGS code.c -o code.x $LINK_CNL_STATIC
Utility Libraries for Emerald Cluster
• Mathematical Libraries
  • NAG
    • NAG produces and distributes numerical, symbolic, statistical, visualisation and simulation software for the solution of problems in a wide range of applications in such areas as science, engineering, financial analysis and research.
    • From the Numerical Algorithms Group, http://www.nag.co.uk
    • Functions include: optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, multivariate factor analysis, linear algebra, random number generators, and more
    • Available in FORTRAN and C
    • Package name: nag
    • Available platforms: SGI IRIX, SUN Solaris, IBM AIX, LINUX
    • Installed in AFS ISIS package space, /afs/isis/pkg/nag
    • Current default version 6.0
    • To subscribe to NAG, type "ipm add nag"
Utility Libraries for Emerald Cluster
• Scientific Libraries
  • Linear algebra
    • BLAS, LAPACK, LAPACK90, LAPACK++, ATLAS, SPARSE-BLAS, SCALAPACK, EISPACK, FFTPACK, LANCZOS, HOMPACK, etc.
    • Source code downloadable from the website: http://www.netlib.org/liblist.html
    • Compiler dependent
    • BLAS and LAPACK available for all four compilers in AFS ISIS package space: gcc, profortran, intel, and pgi
    • SCALAPACK available for the pgi and intel compilers
    • Assistance available if other versions are needed
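A hedged sketch of what compiling and linking against these libraries typically looks like (the file name is hypothetical, and the exact library directory depends on the compiler-specific build under /afs/isis/pkg/, so check the package documentation first):

% ipm add pgi
% pgf77 -O -o eigen.x eigen.f -L<LAPACK/BLAS library directory under /afs/isis/pkg/> -llapack -lblas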
Utility Libraries for Emerald Cluster
• Scientific Libraries
  • Other libraries: not fully implemented yet, so please be cautious and patient when using them
    • FFTW     http://www.fftw.org/
    • GSL      http://www.gnu.org/software/gsl/
    • NetCDF   http://www.unidata.ucar.edu/software/netcdf/
    • NCO      http://nco.sourceforge.net/
    • HDF      http://hdf.ncsa.uiuc.edu/hdf4.html
    • OCTAVE   http://www.octave.org/
    • PETSc    http://www-unix.mcs.anl.gov/petsc/petsc-as/
    • ……
  • If you think more libraries are of broad interest, please recommend them to us
Parallel Computing
• SMP machines:
  • OpenMP
    • Compilation:
      • Use the "-qsmp=omp" flag on happy
      • Use the "-openmp" flag on cedar
    • Environment variable setup:
      setenv OMP_NUM_THREADS n
  • MPI
    • Compilation:
      • Use the "-lmpi" flag on cedar
      • Use MPI-capable compilers, e.g., mpxlf, mpxlf90, mpcc, mpCC
  • Hybrid (OpenMP and MPI): do both!
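For instance, the OpenMP FORTRAN example shown earlier could be compiled and run on cedar roughly as follows (a sketch; the file name hello_omp.f is hypothetical):

% ifort -openmp -o hello_omp.x hello_omp.f
% setenv OMP_NUM_THREADS 4
% ./hello_omp.x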
Parallel Computing With Emerald Cluster
• Setup
  • MPI implementations and the corresponding package to be "ipm add"-ed:
    • MPICH:    mpich
    • MPI-LAM:  mpi-lam
  • Languages supported, per compiler vendor (for both MPICH and MPI-LAM):
    • GNU compilers:               F77, C, C++
    • Absoft ProFortran compilers: F77, F90, C, C++
    • Portland Group compilers:    F77, F90, C, C++
    • Intel compilers:             F77, F90, C, C++
Parallel Computing With Emerald Cluster
• Setup

  Vendor / Language    Package Name              FORTRAN 77   FORTRAN 90   C      C++
  GNU                  gcc                       g77          --           gcc    g++
  Absoft ProFortran    profortran                f77          f95          --     --
  Portland Group       pgi                       pgf77        pgf90        pgcc   pgCC
  Intel                intel_fortran, intel_CC   ifc          ifc          icc    icc

  Commands for parallel MPI compilation (with mpich or mpi-lam): mpif77, mpif90, mpicc, mpiCC
Parallel Computing With Emerald Cluster
• Setup
  • AFS packages to be "ipm add"-ed
  • Notice the order: the compiler is always added first
  • Add ONLY ONE compiler into your environment

  Compiler            MPICH                                  MPI-LAM
  GNU                 ipm add gcc mpich                      ipm add gcc mpi-lam
  Absoft ProFortran   ipm add profortran mpich               ipm add profortran mpi-lam
  Portland Group      ipm add pgi mpich                      ipm add pgi mpi-lam
  Intel               ipm add intel_fortran intel_CC mpich   ipm add intel_fortran intel_CC mpi-lam
Parallel Computing With Emerald Cluster
• Compilation
  • To compile an MPI Fortran 77 code, code.f, and form an executable, exec:
    %mpif77 -O -o exec code.f
  • For a Fortran 90/95 code, code.f90:
    %mpif90 -O -o exec code.f90
  • For a C code, code.c:
    %mpicc -O -o exec code.c
  • For a C++ code, code.cc:
    %mpiCC -O -o exec code.cc
Scientific Packages
• Available in AFS package space
• To subscribe to a package, type "ipm add pkg_name", where "pkg_name" is the name of the package. For example, "ipm add gaussian"
• To remove it, type "ipm remove pkg_name"
• All packages are installed under the /afs/isis/pkg/ directory. For example, /afs/isis/pkg/gaussian.
• Categories of scientific packages include:
  • Quantum Chemistry
  • Molecular Dynamics
  • Material Science
  • Visualization
  • NMR Spectroscopy
  • X-Ray Crystallography
  • Bioinformatics
  • Others
Scientific Package: Quantum Chemistry

Software        Package Name     Platforms     Current Version   Parallel
ABINIT          abinit           IRIX/LINUX    4.3.3             Yes (MPI)
ADF             adf              LINUX         2002.02           Yes (PVM)
Cerius2         cerius2          IRIX/LINUX    4.10              Yes (MPI)
GAMESS          gamess           IRIX/LINUX    2003.9.6          Yes (MPI)
Gaussian        gaussian         IRIX/LINUX    03E01             Yes (OpenMP)
MacroModel      macromodel       IRIX          7.1               No
MOLFDIR         molfdir          IRIX          2001              No
Molpro          molpro           IRIX/LINUX    2006.6            Yes (MPI)
NWChem          nwchem           IRIX/LINUX    5.1               Yes (MPI)
MaterialStudio  materialstudio   LINUX         4.2               Yes (MPI)
CPMD            cpmd             IRIX/LINUX    3.9               Yes (MPI)
ACES2           aces2            IRIX          4.1.2             No
Scientific Package: Molecular Dynamics

Software     Package Name   Platforms     Current Version   Parallel
Amber        amber          IRIX/LINUX    9.1               MPI
NAMD/VMD     namd, vmd      IRIX/LINUX    2.5               MPI
Gromacs      gromacs        IRIX/LINUX    3.2.1             MPI
InsightII    insightII      IRIX          2000.3            --
MacroModel   macromodel     IRIX          7.1               --
PMEMD        pmemd          IRIX/LINUX    3.0.0             MPI
Quanta       quanta         IRIX          2005              MPI
Sybyl        sybyl          IRIX/LINUX    7.1               --
CHARMM       charmm         IRIX          3.0B1             MPI
TINKER       tinker         LINUX         4.2               --
O            o              IRIX          9.0.7             --
Molecular & Scientific
Visualization
its.unc.edu
Software
Package Name
Platforms
Current Version
AVS
avs
IRIX
5.6
AVS Express
Avs-express
IRIX
6.2
Cerius2
cerius2
IRIX/LINUX
4.9
DINO
dino
IRIX
0.8.4
ECCE
ecce
IRIX
2.1
GaussView
gaussian
IRIX/LINUX/AIX
4.0
GRASP
grasp
IRIX
1.3.6
InsightII
insightII
IRIX/LINUX
2000.3
MOIL-VIEW
Moil-view
IRIX
9.1
MOLDEN
molden
IRIX/LINUX
4.0
MOLKEL
molkel
IRIX
4.3
MOLMOL
molmol
IRIX
2K.1
MOLSCRIPT
molscript
IRIX
2.1.2
MOLSTAR
molstar
IRIX/LINUX
1.0
63
Molecular & Scientific
Visualization
Software
its.unc.edu
Platforms
Current Version
MOVIEMOL
Package Name
moviemol
IRIX
1.3.1
NBOView
nbo
IRIX/LINUX
5.0
QUANTA
quanta
IRIX/LINUX
2005
RASMOL
rasmol
IRIX/LINUX/AIX
2.7.3
RASTER3D
raster3d
IRIX/LINUX
2.7c
SPARTAN
spartan
IRIX
5.1.3
SPOCK
spock
IRIX
1.7.0p1
SYBYL
sybyl
IRIX/LINUX
7.1
VMD
vmd
IRIX/LINUX
1.8.2
XtalView
xtalview
IRIX
4.0
XMGR
xmgr
IRIX
4.1.2
GRACE
grace
IRIX/LINUX
5.1.2
IMAGEMAGICK
Imagemagick
IRIX/LINUX/AIX
6.2.1.3
GIMP
gimp
IRIX/LINUX/AIX
1.0.2
XV
xv
IRIX/LINUX/AIX
3.1.0a
64
NMR & X-Ray Crystallography
its.unc.edu
Software
Package Name
Platforms
Current Version
CNSsolve
cnssolve
IRIX/LINUX
1.1
AQUA
aqua
IRIX/LINUX
3.2
BLENDER
blender
IRIX
2.28a
BNP
bnp
IRIX/LINUX
0.99
CAMBRIDGE
cambridge
IRIX
5.26
CCP4
ccp4
IRIX/LINUX
4.2.2
CNX
cns
IRIX/LINUX
2002
FELIX
felix
IRIX/LINUX
2004
GAMMA
gamma
IRIX
4.1.0
MOGUL
mogul
IRIX/LINUX
1.0
Phoelix
phoelix
IRIX
1.2
TURBO
turbo
IRIX
5.5
XPLOR-NIH
Xplor_nih
IRIX/LINUX
2.11.2
XtalView
xtalview
IRIX
4.0
65
Scientific Package: Bioinformatics

Software           Package Name   Platforms     Current Version
BIOPERL            bioperl        IRIX          1.4.0
BLAST              blast          IRIX/LINUX    2.2.6
CLUSTALX           clustalx       IRIX          8.1
EMBOSS             emboss         IRIX          2.8.0
GCG                gcg            LINUX         11.0
Insightful Miner   iminer         IRIX          3.0
Modeller           modeller       IRIX/LINUX    7.0
PISE               pise           LINUX         5.0a
SEAVIEW            seaview        IRIX/LINUX    1.0
AUTODOCK           autodock       IRIX          3.05
DOCK               dock           IRIX/LINUX    5.1.1
FTDOCK             ftdock         IRIX          1.0
HEX                hex            IRIX          2.4
Why do We Need Job Management Systems?
• "Whose job you run, in addition to when and where it is run, may be as important as how many jobs you run!"
• Effectively optimizes the utilization of resources
• Effectively optimizes the sharing of resources
• Often referred to as resource management software, queuing systems, or job management systems
Job Management Tools
• PBS - Portable Batch System
  • Open-source product developed at NASA Ames Research Center
• DQS - Distributed Queuing System
  • Open-source product developed by SCRI at Florida State University
• LSF - Load Sharing Facility
  • Commercial product from Platform Computing, already deployed on UNC-CH ITS computing servers
• Codine/Sun Grid Engine
  • Commercial version of DQS from Gridware, Inc., now owned by SUN
• Condor
  • A restricted-source "cycle stealing" product from the University of Wisconsin
• Others too numerous to mention
Operations of LSF
[Diagram: a job submitted with "bsub app" on a submission host goes through the Batch API to the Master Batch Daemon (MBD) on the master host, which places it in a queue and uses load information gathered by the LIMs to dispatch it to an execution host, where the Slave Batch Daemon (SBD) starts a child SBD and the Remote Execution Server (RES) runs the user job.]

Legend:
LIM  – Load Information Manager
MLIM – Master LIM
MBD  – Master Batch Daemon
SBD  – Slave Batch Daemon
RES  – Remote Execution Server
Common LSF Commands
• lsid: a good choice of LSF command to start with
• lshosts/bhosts: shows all of the nodes that the LSF system is aware of
• bsub: submits a job interactively or in batch using LSF batch scheduling and the queue layer of the LSF suite
• bjobs: displays information about a recently run job; you can use the -l option to view a more detailed accounting
• bqueues: displays information about the batch queues; again, the -l option will display a more thorough description
• bkill <job ID#>: kills the job with job ID number #
• bhist -l <job ID#>: displays historical information about jobs; a "-a" flag displays information about both finished and unfinished jobs
• bpeek -f <job ID#>: displays the stdout and stderr output of an unfinished job with a job ID of #
• bhpart: displays information about host partitions
• bstop: suspends an unfinished job
• bswitch: switches unfinished jobs from one queue to another
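A few usage examples of these commands (the job ID 12345 is hypothetical):

% bjobs -l 12345
% bhist -l 12345
% bpeek -f 12345
% bkill 12345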
More about LSF
• Type "jle" -- checks job efficiency
• Type "bqueues" for all queues on one cluster/machine (-m); type "bqueues -l queue_name" for more info about the queue named "queue_name"
• Type "busers" for user job slot limits
• Specific to Baobab:
  • cpufree -- to check how many free/idle CPUs are available
  • pending -- to check how many jobs are still pending
  • bfree -- to check how many free slots are available (see "bfree -h")
LSF Queues on Emerald Cluster

Queue     Description
int       Interactive jobs
now       Preemptive debugging queue, 10 min wall-clock limit, 2 CPUs
week      Default queue, one-week wall-clock limit, up to 32 CPUs/user
month     Long-running serial-job queue, one-month wall-clock limit, up to 4 jobs per user
staff     ITS Research Computing staff queue
manager   For use by LSF administrators
How to Submit Jobs via LSF on Emerald Cluster
• Jobs to the interactive queue:
  bsub -q int -m cedar -Ip my_interactive_job
• Serial jobs:
  bsub -q week -m cypress my_batch_job
• Parallel OpenMP jobs:
  setenv OMP_NUM_THREADS 4
  bsub -q week -n 4 -m cypress my_parallel_job
• Parallel MPI jobs:
  bsub -q week -n 4 -m cypress mpirun -np 4 my_parallel_job
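Instead of typing everything on the command line, the same options can be collected in a small batch script and submitted with "bsub < script". A minimal sketch (the script name, job name, and executable are hypothetical; the queue and CPU count follow the examples above):

#!/bin/csh
# myjob.csh -- submit with:  bsub < myjob.csh
#BSUB -q week            # queue
#BSUB -n 4               # number of CPUs
#BSUB -J mytest          # job name
#BSUB -o mytest.%J.out   # stdout file (%J = job ID)
#BSUB -e mytest.%J.err   # stderr file
mpirun -np 4 ./my_parallel_job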
Peculiars of Emerald Cluster

Resources (-R):
  CPU type           -R resource string
  Xeon 2.4 GHz       xeon24, blade, …
  Xeon 2.8 GHz       xeon28, blade, …
  Xeon 3.2 GHz       xeon32, blade, …
  16-way IBM P575    p5aix, …

Parallel job submission (esub, -a):
  -a flag    Wrapper
  lammpi     lammpirun_wrapper
  mpichp4    mpichp4_wrapper

Notice that the -R and -a flags are mutually exclusive in one command line.
Run Jobs on Emerald LINUX Cluster
• Interactive jobs:
  bsub -q int -R xeon28 -Ip my_interactive_job
• Syntax for submitting a serial job:
  bsub -q queuename -R resources executable
  • For example:
    bsub -q week -R blade my_executable
• To run an MPICH parallel job on AMD Athlon machines with, say, 4 CPUs:
  bsub -q idle -n 4 -a mpichp4 mpirun.lsf my_par_job
• To run a LAM/MPI parallel job on IBM BladeCenter machines with, say, 4 CPUs:
  bsub -q week -n 4 -a lammpi mpirun.lsf my_par_job
Final Friendly Reminders
• Never run jobs on login nodes
  • Login nodes are for file management, coding, compilation, etc., only
• Never run jobs outside LSF
  • Fair sharing
• Never run jobs out of your AFS ISIS home or ~/ms; instead, use /scr, /netscr, or /nas
  • Slow I/O response, limited disk space
• Move your data to mass storage after jobs are finished and remove all temporary files on scratch disks
  • Scratch disks are not backed up; make efficient use of limited resources
  • Old files will automatically be deleted without notification
Online Resources
• Get started with Research Computing:
  http://www.unc.edu/atn/hpc/getting_started/index.shtml?id=4196
• Programming tools:
  http://www.unc.edu/atn/hpc/programming_tools/index.shtml
• Scientific packages:
  http://www.unc.edu/atn/hpc/applications/index.shtml?id=4237
• Job management:
  http://www.unc.edu/atn/hpc/job_management/index.shtml?id=4484
• Benchmarks:
  http://www.unc.edu/atn/hpc/performance/index.shtml?id=4228
• High-performance computing:
  http://www.beowulf.org
  http://www.top500.org
  http://www.linuxhpc.org
  http://www.supercluster.org/
Short Courses
• Introduction to Scientific Computing
• Introduction to Emerald
• Introduction to Topsail
• LINUX: Introduction
• LINUX: Intermediate
• MPI for Parallel Computing
• OpenMP for Parallel Computing
• MATLAB: Introduction
• STATA: Introduction
• Gaussian and GaussView
• Introduction to Computational Chemistry
• Shell Scripting
• Introduction to Perl

http://learnit.unc.edu, click "Current Schedule of ITS Workshops"
Hands-on Exercises
• If you haven't done so yet:
  • Subscribe to the Research Computing services
  • Access emerald, topsail, etc. via SecureCRT or X-Win32
  • Create a working directory for yourself on /netscr or /scr
  • Get to know basic AFS and UNIX commands
  • Get to know the Baobab Beowulf cluster
• Compile one serial and one parallel (MPI) code on Emerald
• Get familiar with basic LSF commands
• Get to know the packages available in AFS space
• Submit jobs via LSF using serial or parallel queues
Please direct comments/questions about research computing to
E-mail: [email protected]

Please direct comments/questions pertaining to this presentation to
E-mail: [email protected]