Overview of UCSD’s Triton Resource
A cost-effective, high-performance shared resource for research computing
SAN DIEGO SUPERCOMPUTER CENTER
What is the Triton Resource?
• A medium-scale high performance computing (HPC) and data storage system
• Designed to serve the needs of UC researchers:
• Turn-key, cost-competitive access to a robust computing resource
• Supports computing research, scientific & engineering computing, and large-scale data analysis
• No lengthy proposals or long waits for access
• Supports short- or long-term projects
• Accommodates flexible usage models
• Free of the equipment headaches and staffing costs associated with maintaining a dedicated cluster
Triton Resource Components
• Data Oasis: 2,000–4,000 terabytes of disk storage for research data, served by a high performance file system.
• Petascale Data Analysis Facility (PDAF): unique SMP system for analyzing very large datasets. 28 nodes, 256/512 GB of memory per node, 8 quad-core AMD Shanghai processors per node (32 cores/node).
• Triton Compute Cluster (TCC): medium-scale cluster system for general purpose HPC. 256 nodes, 24 GB of memory per node, 2 quad-core Nehalem processors per node (8 cores/node).
• All components are connected by a high performance network with links to high bandwidth research networks and the Internet.
Flexible Usage Models
• Shared-queue access
• Compute nodes are shared with other users
• Jobs are submitted to a queue and wait to run
• Batch and interactive jobs are supported
• User accounts are debited by the actual service units consumed by each job
• Dedicated compute nodes
• Users can reserve a fixed number of compute nodes for exclusive access
• Users are charged for 24x7 use of the nodes at 70% utilization
• Any utilization over 70% is a “bonus”
• Nodes may be reserved on a monthly basis
• Hybrid
• Dedicated nodes for core computing tasks, plus shared-queue access for overflow, jobs that are not time-critical, or jobs requiring higher core counts
Triton Resource Benefits
• Short lead time for project start-up
• Low waits in queue
• No lengthy proposal process
• Flexible usage models
• Access to HPC experts for setup, software optimization, and troubleshooting
• Avoid using research staff for sysadmin tasks
• Avoid the headaches of maintenance, aging equipment, and project wind-down
• Access to a parallel high performance, high capacity storage system
• Access to high bandwidth research networks
Triton Affiliates & Partners Program
• TAPP is SDSC’s program for accessing the Triton Resource
• Two components:
• Central Campus Purchase
• Individual / Department Purchase
• Central Campus Purchase
• Block purchase made by central campus, then allocated out to individual faculty / researchers
• Individual Purchase
• Faculty / researchers / departments purchase cycles from grants or other funding
• Startup Accounts
• 1,000 SU accounts for evaluation are granted upon request
Contact for access/allocations:
Ron Hawkins
TAPP Manager
[email protected]
(858) 534-5045
Numerical Libraries on Triton
Mahidhar Tatineni
04/22/2010
AMD Core Math Library (ACML)
• Installed on Triton as part of the PGI compiler installation directory.
• Covers BLAS, LAPACK, and FFT routines.
• ACML user guide: /opt/pgi/linux86-64/8.0-6/doc/acml.pdf
• Example BLAS, LAPACK, and FFT codes: /home/diag/examples/ACML (a minimal calling sketch follows below)
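To show what calling and linking against ACML looks like from C, here is a minimal sketch that calls the Fortran-style BLAS entry point dgemm_ exported by ACML. The file name, matrices, and the compile line in the comment are illustrative assumptions; this is not one of the installed examples.

/* acml_dgemm_sketch.c -- hypothetical sketch, not a file from /home/diag/examples/ACML.
 * Compile and link against ACML, e.g.:
 *   pgcc -L/opt/pgi/linux86-64/8.0-6/lib acml_dgemm_sketch.c -lacml -lm -lpgftnrtl -lrt */
#include <stdio.h>

/* Fortran-style BLAS entry point provided by ACML (column-major, arguments by reference). */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(void)
{
    /* C = alpha*A*B + beta*C for 2x2 matrices stored column-major. */
    int n = 2;
    double alpha = 1.0, beta = 0.0;
    double a[] = {1.0, 3.0, 2.0, 4.0};   /* A = [1 2; 3 4] */
    double b[] = {5.0, 7.0, 6.0, 8.0};   /* B = [5 6; 7 8] */
    double c[4];

    dgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    /* Expected result: C = [19 22; 43 50] */
    printf("C = [ %g %g ; %g %g ]\n", c[0], c[2], c[1], c[3]);
    return 0;
}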
BLAS Example Using ACML
• Compile and link as follows:
pgcc -L/opt/pgi/linux86-64/8.0-6/lib blas_cdotu.c -lacml -lm -lpgftnrtl -lrt
• Output:
-bash-3.2$ ./a.out
ACML example: dot product of two complex vectors using cdotu
------------------------------------------------------------
Vector x: ( 1.0000, 2.0000)
          ( 2.0000, 1.0000)
          ( 1.0000, 3.0000)
Vector y: ( 3.0000, 1.0000)
          ( 1.0000, 4.0000)
          ( 1.0000, 2.0000)
r = x.y = ( -6.000, 21.000)
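As a quick sanity check on the result above, the same unconjugated dot product can be reproduced with plain C99 complex arithmetic. This standalone snippet only illustrates what cdotu computes; it makes no ACML calls and is not part of the example set.

/* cdotu_check.c -- verify r = x.y = (-6, 21) for the vectors in the ACML cdotu example,
 * using only C99 complex arithmetic (no ACML). */
#include <stdio.h>
#include <complex.h>

int main(void)
{
    float complex x[3] = {1.0f + 2.0f*I, 2.0f + 1.0f*I, 1.0f + 3.0f*I};
    float complex y[3] = {3.0f + 1.0f*I, 1.0f + 4.0f*I, 1.0f + 2.0f*I};
    float complex r = 0.0f;

    /* cdotu is the unconjugated dot product: r = sum_i x[i]*y[i] */
    for (int i = 0; i < 3; i++)
        r += x[i] * y[i];

    printf("r = ( %.3f, %.3f)\n", crealf(r), cimagf(r));   /* prints ( -6.000, 21.000) */
    return 0;
}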
LAPACK Example Using ACML
• Compile and link as follows:
pgcc -L/opt/pgi/linux86-64/8.0-6/lib lapack_dgesdd.c -lacml -lm -lpgftnrtl -lrt
• Output:
-bash-3.2$ ./a.out
ACML example: SVD of a matrix A using dgesdd
--------------------------------------------
Matrix A:
  -0.5700  -1.2800  -0.3900   0.2500
  -1.9300   1.0800  -0.3100  -2.1400
   2.3000   0.2400   0.4000  -0.3500
  -1.9300   0.6400  -0.6600   0.0800
Singular values of matrix A:
  3.9147   2.2959   1.1184   0.3237
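For context, a dgesdd call from C looks roughly like the sketch below, using the standard Fortran-style LAPACK interface that ACML exports (a workspace query followed by the actual call). The file name is hypothetical, and the packaged lapack_dgesdd.c may be structured differently.

/* dgesdd_sketch.c -- hypothetical sketch: singular values via the Fortran-style
 * LAPACK entry point dgesdd_ (provided by ACML). Link as in the slide above:
 *   pgcc -L/opt/pgi/linux86-64/8.0-6/lib dgesdd_sketch.c -lacml -lm -lpgftnrtl -lrt */
#include <stdio.h>
#include <stdlib.h>

extern void dgesdd_(const char *jobz, const int *m, const int *n,
                    double *a, const int *lda, double *s,
                    double *u, const int *ldu, double *vt, const int *ldvt,
                    double *work, const int *lwork, int *iwork, int *info);

int main(void)
{
    int m = 4, n = 4, lda = 4, ldu = 1, ldvt = 1, info;
    /* The matrix from the example, stored column by column (Fortran order). */
    double a[16] = {
        -0.57, -1.93,  2.30, -1.93,   /* column 1 */
        -1.28,  1.08,  0.24,  0.64,   /* column 2 */
        -0.39, -0.31,  0.40, -0.66,   /* column 3 */
         0.25, -2.14, -0.35,  0.08 }; /* column 4 */
    double s[4], u[1], vt[1], wkopt;
    int iwork[32];                    /* 8*min(m,n) */
    int lwork = -1;

    /* Workspace query: lwork = -1 returns the optimal workspace size in wkopt. */
    dgesdd_("N", &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, &wkopt, &lwork, iwork, &info);
    lwork = (int)wkopt;
    double *work = malloc(lwork * sizeof(double));

    /* Compute singular values only (jobz = "N"); expect 3.9147 2.2959 1.1184 0.3237. */
    dgesdd_("N", &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, work, &lwork, iwork, &info);
    printf("Singular values: %.4f %.4f %.4f %.4f (info = %d)\n", s[0], s[1], s[2], s[3], info);
    free(work);
    return 0;
}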
FFT Example Using ACML
• Compile and link as follows:
pgf90 dzfft_example.f -L/opt/pgi/linux86-64/8.0-6/lib -lacml
• Output:
-bash-3.2$ ./a.out
ACML example: FFT of a real sequence using ZFFT1D
-------------------------------------------------
Components of discrete Fourier transform:
  1   2.4836
  2  -0.2660
  3  -0.2577
  4  -0.2564
  5   0.0581
  6   0.2030
  7   0.5309
Original sequence as restored by inverse transform:
      Original  Restored
  1    0.3491    0.3491
  2    0.5489    0.5489
  3    0.7478    0.7478
  4    0.9446    0.9446
  5    1.1385    1.1385
  6    1.3285    1.3285
  7    1.5137    1.5137
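The point of the example above is simply that the forward transform followed by the inverse restores the original real sequence. The self-contained sketch below demonstrates the same round trip with a naive O(n^2) DFT in plain C99; it does not call ACML, and only the input values are borrowed from the output above.

/* dft_roundtrip_sketch.c -- naive DFT / inverse-DFT round trip, illustrating the
 * property shown in the ACML FFT example (no ACML calls, no FFT algorithm). */
#include <stdio.h>
#include <math.h>
#include <complex.h>

#define N 7

int main(void)
{
    const double PI = acos(-1.0);
    double x[N] = {0.3491, 0.5489, 0.7478, 0.9446, 1.1385, 1.3285, 1.5137};
    double complex X[N], xr[N];

    /* Forward DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/N) */
    for (int k = 0; k < N; k++) {
        X[k] = 0.0;
        for (int j = 0; j < N; j++)
            X[k] += x[j] * cexp(-2.0 * PI * I * j * k / N);
    }

    /* Inverse DFT: xr[j] = (1/N) * sum_k X[k] * exp(+2*pi*i*j*k/N) */
    for (int j = 0; j < N; j++) {
        xr[j] = 0.0;
        for (int k = 0; k < N; k++)
            xr[j] += X[k] * cexp(2.0 * PI * I * j * k / N);
        xr[j] /= N;
    }

    printf("  Original  Restored\n");
    for (int j = 0; j < N; j++)
        printf("  %8.4f  %8.4f\n", x[j], creal(xr[j]));
    return 0;
}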
Intel Math Kernel Library (MKL)
• Installed on Triton as part of the Intel compiler directory.
• Covers BLAS, LAPACK, FFT, BLACS, and ScaLAPACK libraries.
• Most useful link: the Intel MKL link line advisor!
http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
• Examples in the following directory: /home/diag/examples/MKL
CBLAS example using MKL
• Compile as follows:
> export MKLPATH=/opt/intel/Compiler/11.1/046/mkl
> icc cblas_cdotu_subx.c common_func.c -I$MKLPATH/include
$MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a -Wl,--start-group
$MKLPATH/lib/em64t/libmkl_intel_lp64.a
$MKLPATH/lib/em64t/libmkl_sequential.a $MKLPATH/lib/em64t/libmkl_core.a
-Wl,--end-group -lpthread
• Run as follows:
[mtatineni@login-4-0 MKL]$ ./a.out cblas_cdotu_subx.d
C B L A S _ C D O T U _ S U B EXAMPLE PROGRAM
INPUT DATA
N=4
VECTOR X INCX=1
( 1.00, 1.00) ( 2.00, -1.00) ( 3.00, 1.00) ( 4.00, -1.00)
VECTOR Y INCY=1
( 3.50, 0.00) ( 7.10, 0.00) ( 1.20, 0.00) ( 4.70, 0.00)
OUTPUT DATA
CDOTU_SUB = ( 40.100, -7.100)
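For comparison, here is a minimal self-contained sketch of the same computation using MKL's cblas_cdotu_sub. The vector values come from the output above, but the file name and program layout are assumptions and differ from the packaged MKL example (which reads its data from a file).

/* cblas_cdotu_sketch.c -- hypothetical minimal use of cblas_cdotu_sub with MKL.
 * Compile and link with the MKL line shown above (or one from the MKL link line advisor). */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Single-precision complex vectors matching the example's input data. */
    MKL_Complex8 x[4] = {{1.0f, 1.0f}, {2.0f, -1.0f}, {3.0f, 1.0f}, {4.0f, -1.0f}};
    MKL_Complex8 y[4] = {{3.5f, 0.0f}, {7.1f, 0.0f}, {1.2f, 0.0f}, {4.7f, 0.0f}};
    MKL_Complex8 r;

    /* Unconjugated complex dot product: r = sum_i x[i]*y[i] */
    cblas_cdotu_sub(4, x, 1, y, 1, &r);

    /* Expect ( 40.100, -7.100) as in the example output. */
    printf("CDOTU_SUB = ( %.3f, %.3f)\n", r.real, r.imag);
    return 0;
}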
LAPACK example using MKL
• Compile as follows:
ifort dgebrdx.f -I$MKLPATH/include
$MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a -Wl,--start-group
$MKLPATH/lib/em64t/libmkl_intel_lp64.a
$MKLPATH/lib/em64t/libmkl_sequential.a $MKLPATH/lib/em64t/libmkl_core.a
-Wl,--end-group libaux_em64t_intel.a -lpthread
• Output:
[mtatineni@login-4-0 MKL]$ ./a.out < dgebrdx.d
DGEBRD Example Program Results
Diagonal
3.6177 2.4161 -1.9213 -1.4265
Super-diagonal
1.2587 1.5262 -1.1895
ScaLAPACK example using MKL
• Sample test case (from MKL examples) is in:
• /home/diag/examples/scalapack
• The makefile is set up to compile all the tests. Procedure:
module purge
module load intel
module load openmpi_mx
make libem64t compiler=intel mpi=openmpi LIBdir=/opt/intel/Compiler/11.1/046/mkl/lib/em64t
• Sample link line (to illustrate how to link for ScaLAPACK):
mpif77 -o ../xsdtlu_libem64t_openmpi_intel_noopt_lp64 psdtdriver.o psdtinfo.o psdtlaschk.o psdbmv1.o
psbmatgen.o psmatgen.o pmatgeninc.o -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t
/opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_scalapack_lp64.a
/opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_blacs_openmpi_lp64.a -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t
/opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_intel_lp64.a -Wl,--start-group
/opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_sequential.a
/opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_core.a -Wl,--end-group -lpthread
Profiling Tools on Triton
• FPMPI
• MPI profiling library: /home/beta/fpmpi/fpmpi-2 (PGI + MPICH MX)
• TAU
• Profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. Available on Triton, compiled with the PGI compilers.
/home/beta/tau/2.19-pgi
/home/beta/pdt/3.15-pgi
Using FPMPI on Triton
• The library is located in:
/home/beta/fpmpi/fpmpi-2/lib
• Needs PGI and MPICH MX:
> module purge
> module load pgi
> module load mpich_mx
• Just relink with the library. For example:
/opt/pgi/mpichmx_pgi/bin/mpicc -o cpi cpi.o -L/home/beta/fpmpi/fpmpi-2/lib -lfpmpi
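The cpi program relinked above is the classic MPICH pi-estimation example; a minimal sketch along the same lines is shown below to give the relink step some context. This is not the exact source shipped with MPICH.

/* cpi_sketch.c -- minimal pi-estimation MPI program in the spirit of the classic
 * MPICH cpi example (a sketch, not the shipped source). Build with the MPICH-MX
 * wrapper and relink against FPMPI as shown above. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, n = 100000;
    double h, local = 0.0, pi = 0.0, t0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    t0 = MPI_Wtime();
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);     /* counted by FPMPI */

    /* Midpoint-rule integration of 4/(1+x^2) on [0,1], interleaved across ranks. */
    h = 1.0 / n;
    for (int i = rank; i < n; i += size) {
        double x = h * (i + 0.5);
        local += 4.0 / (1.0 + x * x);
    }
    local *= h;

    MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f, wall clock time = %f\n", pi, MPI_Wtime() - t0);

    MPI_Finalize();
    return 0;
}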
Using FPMPI on Triton
• Run code normally:
>mpirun -machinefile $PBS_NODEFILE -np 2 ./cpi
Process 1 on tcc-2-25.local
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.036982
Process 0 on tcc-2-25.local
• Creates output file (fpmpi_profile.txt) with profile
data.
• Check /home/diag/FPMPI directory for more
examples.
Sample FPMPI Output
Command: /mirage/mtatineni/TESTS/FPMPI/./cpi
Date: Wed Apr 21 16:44:04 2010
Processes: 2
Execute time: 0
Timing Stats: [seconds]       [min/max]              [min rank/max rank]
  wall-clock: 0 sec           0.000000 / 0.000000    0/0
Memory Usage Stats (RSS) [min/max KB]: 825/926
Average of sums over all processes
Routine      Calls  Time     Msg Length  %Time by message length
                                         0.........1........1........
                                                   K        M
MPI_Bcast  : 2      0.00179  4           0*00000000000000000000000000
MPI_Reduce : 1      0.0252   8           00*0000000000000000000000000
Sample FPMPI Output
Details for each MPI routine
Average of sums over all processes / (max over processes [rank])
% by message length:  0.........1........1........
                                K        M
MPI_Bcast:
  Calls     : 2         2 [ 0]        0*00000000000000000000000000
  Time      : 0.00179   0.00356 [ 1]  0*00000000000000000000000000
  Data Sent : 4         8 [ 0]
  By bin    : 1-4 [2,2] [ 5.96e-06, 0.00356]
MPI_Reduce:
  Calls     : 1         1 [ 0]        00*0000000000000000000000000
  Time      : 0.0252    0.027 [ 0]    00*0000000000000000000000000
  Data Sent : 8         8 [ 0]
  By bin    : 5-8 [1,1] [ 0.0235, 0.027]
Summary of target processes for point-to-point communication:
1-norm distance of point-to-point with an assumed 2-d topology
(Maximum distance for point-to-point communication from each process)
0 0
Detailed partner data: source: dest1 dest2 ...
Size of COMM_WORLD 2
0:
1:
About Tau
TAU is a suite of Tuning and Analysis Utilities
www.cs.uoregon.edu/research/tau
• 11+ year project involving:
• University of Oregon Performance Research Lab
• LANL Advanced Computing Laboratory
• Research Centre Julich at ZAM, Germany
• Integrated toolkit for:
• Performance instrumentation
• Measurement
• Analysis
• Visualization
Using Tau
• Load the papi and tau modules
• Gather information for the profile run:
• Type of run (profiling/tracing, hardware counters, etc…)
• Programming Paradigm (MPI/OMP)
• Compiler (Intel/PGI/GCC…)
• Select the appropriate TAU_MAKEFILE based on your choices
($TAU/Makefile.*)
• Set up the selected PAPI counters in your submission script
• Run as usual & analyze using paraprof
• You can transfer the database to your own PC to do the analysis
TAU Performance System Architecture
Tau: Example
Set up the TAU environment (this will be handled by modules in the next Triton software stack):
export PATH=/home/beta/tau/2.19-pgi/x86_64/bin:$PATH
export LD_LIBRARY_PATH=/home/beta/tau/2.19-pgi/x86_64/lib:$LD_LIBRARY_PATH
Choose the TAU_MAKEFILE to use for your code. For example:
/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi
So we set it up:
% export TAU_MAKEFILE=/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi
And we compile using the wrapper provided by TAU:
% tau_cc.sh matmult.c
Run the job through the queue normally and analyze the output using paraprof. (More detail in the Ranger part of the presentation.)
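For completeness, a matmult.c along the lines of the TAU sample could look like the sketch below. It is a hypothetical stand-in (the matmult.c shipped with TAU differs), but any MPI code of this shape can be compiled with tau_cc.sh and profiled the same way.

/* matmult_sketch.c -- hypothetical stand-in for the TAU matmult example.
 * Compile with the TAU wrapper:  tau_cc.sh matmult_sketch.c
 * Each rank multiplies its own block of rows; blocks are gathered on rank 0. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 512

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                      /* assumes size divides N evenly */
    double *a = malloc(rows * N * sizeof(double));
    double *b = malloc(N * N * sizeof(double));
    double *c = calloc(rows * N, sizeof(double));

    /* Fill the local block of A and all of B with simple values. */
    for (int i = 0; i < rows * N; i++) a[i] = 1.0;
    for (int i = 0; i < N * N; i++)    b[i] = 2.0;

    /* Local row-block matrix multiply: C = A * B. */
    for (int i = 0; i < rows; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i * N + j] += a[i * N + k] * b[k * N + j];

    /* Gather the row blocks of C onto rank 0 (this call shows up in the MPI profile). */
    double *full = (rank == 0) ? malloc(N * N * sizeof(double)) : NULL;
    MPI_Gather(c, rows * N, MPI_DOUBLE, full, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("c[0][0] = %g (expected %g)\n", full[0], 2.0 * N);
        free(full);
    }
    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}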
Coming Soon on Triton
• Data Oasis version 0! We have the hardware on site and are working to set up the Lustre filesystem (~350 TB).
• Upgrade of the entire software stack. Many of the packages in /home/beta will become a permanent part of the stack (we have Rocks rolls for them). This will happen within a month.
• mpiP will be installed soon on Triton.
• PAPI/IPM needs the perfctr kernel patch. We need to integrate this into our stack (not in the current upgrade).