PAT application testing suite
PAT application suite
Adam Leko
Hans Sherburne
HCS Research Laboratory
University of Florida
Application suite requirements overview
Purpose: test the ability of a performance tool to help the user pick out a performance problem in their application
Method: select 3-5 applications to be used during performance tool testing
Each has a particular “bottleneck” and a decent amount of parallelism
Also include a “regular” program to make sure the tool doesn’t give bogus recommendations
Applications should be well-known and mainly well-implemented
Problem size: about 200 s of sequential running time
Since most tools support MPI, the suite should contain mainly MPI programs
Categories of bottlenecks
Too much synchronization
Too many small messages
Irregular work distribution (load imbalance)
Bandwidth-dependent code
Etc.
Applications should be reasonably sized, but not too large
Small, limited applications (e.g., microbenchmarks that test communication performance) would be too simplistic
Large applications (e.g., MPIBlast) would take too long to try out with each performance tool
Applications must be easy to compile, since instrumenting will most likely complicate the compilation process
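As a concrete illustration of the “too many small messages” category, a simple alpha-beta (latency plus bandwidth) cost model shows why chatty communication hurts. This is a Python sketch; the latency and bandwidth constants are assumed, illustrative numbers, not measurements from any machine used with the suite.

```python
# Alpha-beta cost model: each message pays a fixed latency (alpha)
# plus its payload divided by bandwidth (beta). Values are assumed
# for illustration only.
ALPHA = 20e-6          # assumed per-message latency: 20 microseconds
BETA = 100e6           # assumed bandwidth: 100 MB/s

def transfer_time(num_messages, bytes_per_message):
    """Total time to send num_messages messages of the given size."""
    return num_messages * (ALPHA + bytes_per_message / BETA)

# Same total payload (1 MB), sent two ways:
one_big = transfer_time(1, 1_000_000)      # a single 1 MB message
many_small = transfer_time(10_000, 100)    # 10,000 x 100-byte messages

print(f"one 1 MB message:          {one_big * 1e3:.2f} ms")
print(f"10,000 100-byte messages:  {many_small * 1e3:.2f} ms")
```

Under these assumed numbers, the small-message version is roughly 20x slower despite moving the same data, because every message pays the fixed latency.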
Initial tests: PPerfMark
Authors
Maseeh College of Engineering and Computer Science
http://www.cs.pdx.edu/~karavan/pperfdb.html
Brief description
MPI port of “Grindstone: A PVM Parallel Performance Analysis Tool Validation Suite” by J.K. Hollingsworth
Set of 9 “synthetic” application codes, each with a particular (obvious) bottleneck
Smallish size (100-200 lines of C MPI each)
Performance bottlenecks
Very large messages (big-message.c)
Overloaded server/load imbalance (intensive-server.c)
Round-trip latency (ping-pong.c)
Too much synchronization (random-barrier.c)
Too many small messages (small-messages.c)
Sending messages in a different order than they are received (wrong-way.c)
One procedure taking up a lot of time (hot-procedure.c, diffuse-procedure.c)
Too much time in kernel calls (system-time.c)
Use this suite to get a good idea of how each performance tool helps pick out bottlenecks
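The synthetic PPerfMark codes themselves are C MPI, but the shape of a bottleneck like the one random-barrier.c exercises is easy to sketch: ranks do unequal amounts of work and then hit a barrier, so every rank waits for the slowest one. The Python simulation below is an illustrative model only; the rank count and work-time range are made up, and the real code's structure may differ.

```python
import random

def barrier_wait_times(work_times):
    """At a barrier, every rank blocks until the slowest rank arrives,
    so rank i wastes max(work_times) - work_times[i] seconds waiting."""
    slowest = max(work_times)
    return [slowest - t for t in work_times]

random.seed(1)  # deterministic for illustration
work = [random.uniform(0.1, 1.0) for _ in range(8)]  # assumed 8 ranks
waits = barrier_wait_times(work)

print(f"useful work across ranks: {sum(work):.2f} s")
print(f"time wasted in barrier:   {sum(waits):.2f} s")
```

A performance tool pointed at the real code should attribute the wasted time to the barrier call, which is exactly the behavior the suite is designed to check.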
Verification: NPB benchmark, LU
Authors
NAS group @ NASA
http://www.nas.nasa.gov/Software/NPB/
Brief description
Part of the NAS NPB benchmark suite V2.4/V3.1
Performs LU decomposition of a matrix
MPI-Fortran77 code, ~5 kLOC
Performance bottlenecks
Large numbers of small (40-byte) messages [tech report NAS-95-020]
Not much inherent parallelism
Can verify results obtained with PPerfMark on a more realistic (although not entirely real) code
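The 40-byte-message bottleneck can be made concrete with an alpha-beta cost model: at that message size, per-message latency dominates and effective bandwidth collapses. The latency and bandwidth constants below are assumed for illustration; they are not taken from report NAS-95-020.

```python
ALPHA = 20e-6      # assumed per-message latency: 20 microseconds
BETA = 100e6       # assumed peak bandwidth: 100 MB/s

def effective_bandwidth(msg_bytes):
    """Achieved bytes/s for one message under the alpha-beta model."""
    return msg_bytes / (ALPHA + msg_bytes / BETA)

bw40 = effective_bandwidth(40)
print(f"effective bandwidth at 40 B: {bw40 / 1e6:.2f} MB/s "
      f"({100 * bw40 / BETA:.1f}% of assumed peak)")
```

With these numbers a 40-byte message achieves only about 2% of the assumed peak bandwidth, which is why a tool should flag this traffic pattern.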
Verification: CAMEL
Authors
Matt Murphy, Chris Conger @ HCS Lab
MPI version: Adam Leko
Brief description
Differential cryptanalysis application
MPI-C code, ~1 kLOC
Performance bottlenecks
None
Used as the “good” application in the suite
Other applications considered
Other applications considered and eliminated
POP – Parallel Ocean Program
http://climate.lanl.gov/Models/POP/index.htm
Very large application (tens of kLOC)
Would take too much time to recompile & instrument for each PAT
Parallel Spectral Transform Shallow Water Model
http://www.csm.ornl.gov/chammp/pstswm/index.html
Large Fortran MPI application; difficult to compile
Would take too much time to recompile & instrument for each PAT
MPIBlast
http://mpiblast.lanl.gov/
Pattern-matching application (bioinformatics, gene sequencing, etc.)
Very large application, C & C++ MPI
Has bandwidth bottleneck and load imbalance during check phase
These bottlenecks are covered by PPerfMark
Would take too much time to recompile & instrument for each PAT
Bench9 MPI
Barrier syncs in outer computational loop
Random delays for load imbalance
Bottlenecks listed are covered by PPerfMark
Modified versions of CAMEL
Could easily add artificial bottlenecks to the program
If time allows, could use one of these applications (spectral transform or POP?) on a narrowed-down list of the best performance tools