PAT application testing suite

Adam Leko
Hans Sherburne
HCS Research Laboratory
University of Florida
Application suite requirements overview

- Purpose: test the ability of a performance tool to assist the user in picking out a performance problem in their application
- Method: select 3-5 applications to be used during performance tool testing
  - Each has a particular “bottleneck” and a decent amount of parallelism
  - Also include a “regular” program to make sure the tool doesn’t give bogus recommendations
  - Applications should be well-known and mostly well-implemented
  - Problem size: roughly 200 s sequential running time
  - Since most tools support MPI, the suite should contain mainly MPI programs
- Categories of bottlenecks
  - Too much synchronization
  - Too many small messages
  - Irregular work distribution (load imbalance)
  - Bandwidth-dependent code
  - Etc.
- Applications should be reasonably sized, but not too large
  - Small, limited applications (e.g., microbenchmarks that test communication performance) are too simplistic
  - Large applications (e.g., MPIBlast) take too long to try out with each performance tool
  - Applications must be easy to compile, since instrumentation will most likely complicate the compilation process
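The “too many small messages” category can be made concrete with the standard latency-bandwidth (alpha-beta) cost model. The sketch below uses assumed, illustrative constants (50 µs latency, 100 MB/s bandwidth) rather than measured values; the 40-byte message size echoes the NPB LU case discussed later.

```python
# Illustrative alpha-beta cost model: sending a message of s bytes costs
# alpha + s / beta (alpha = per-message latency, beta = bandwidth).
# ALPHA and BETA are assumed ballpark values, not measurements.
ALPHA = 50e-6   # 50 microseconds per-message latency (assumed)
BETA = 100e6    # 100 MB/s sustained bandwidth (assumed)

def send_time(total_bytes, num_messages):
    """Model the time to move total_bytes split across num_messages sends."""
    per_msg = total_bytes / num_messages
    return num_messages * (ALPHA + per_msg / BETA)

total = 1_000_000                       # 1 MB of payload
many_small = send_time(total, 25_000)   # 25,000 x 40-byte messages
one_large = send_time(total, 1)         # one aggregated 1 MB message

print(f"25,000 x 40 B: {many_small:.3f} s")
print(f"1 x 1 MB:      {one_large:.5f} s")
# With these assumed constants the small-message version is dominated
# almost entirely by per-message latency, not by bandwidth.
```

The exact ratio depends on the assumed constants, but the shape of the result is why per-message overhead is its own bottleneck category, separate from bandwidth-dependent code.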
Initial tests: PPerfMark

- Authors
  - Maseeh College of Engineering and Computer Science
  - http://www.cs.pdx.edu/~karavan/pperfdb.html
- MPI port of “Grindstone: A PVM Parallel Performance Analysis Tool Validation Suite” by J.K. Hollingsworth
- Set of 9 “synthetic” application codes
  - Each with a particular (obvious) bottleneck
  - Smallish size (100-200 lines of C MPI)
- Performance bottlenecks
  - Very large messages (big-message.c)
  - Overloaded server/load imbalance (intensive-server.c)
  - Round-trip latency (ping-pong.c)
  - Too much synchronization (random-barrier.c)
  - Too many small messages (small-messages.c)
  - Sending messages in a different order than they are received (wrong-way.c)
  - One procedure taking up a lot of time (hot-procedure.c, diffuse-procedure.c)
  - Too much time in kernel calls (system-time.c)
- Use this suite to get a good idea of how each performance tool helps pick out bottlenecks
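The kind of bottleneck random-barrier.c exercises is that, at every barrier, all processes wait for a randomly chosen straggler. A minimal sketch of that pattern, with Python threads standing in for MPI ranks (the worker count, iteration count, and delays are invented for illustration):

```python
import random
import threading
import time

# Sketch of the "random barrier" bottleneck: each iteration, one
# randomly chosen worker is slow, and the barrier makes every other
# worker pay for it. Threads stand in for MPI ranks here; in the real
# random-barrier.c this would be MPI_Barrier across processes.
NUM_WORKERS = 4
ITERATIONS = 5
DELAY = 0.05  # seconds of artificial work for the straggler (assumed)

barrier = threading.Barrier(NUM_WORKERS)

def worker(rank, slow_ranks):
    for it in range(ITERATIONS):
        if rank == slow_ranks[it]:
            time.sleep(DELAY)  # the straggler for this iteration
        barrier.wait()         # everyone else stalls here

random.seed(0)
slow_ranks = [random.randrange(NUM_WORKERS) for _ in range(ITERATIONS)]
threads = [threading.Thread(target=worker, args=(r, slow_ranks))
           for r in range(NUM_WORKERS)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.3f} s "
      f"(lower bound {ITERATIONS * DELAY:.3f} s from stragglers alone)")
```

Total runtime is bounded below by the sum of the per-iteration straggler delays, even though each individual worker is idle most of the time; that wait time, not computation, is what a performance tool should surface.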
Verification: NPB benchmark, LU

- Authors
  - NAS group @ NASA
  - http://www.nas.nasa.gov/Software/NPB/
- Brief description
  - Part of the NAS NPB benchmark suite V2.4/V3.1
  - Performs LU decomposition of a matrix
  - MPI-Fortran77 code, ~5 kLOC
- Performance bottlenecks
  - Large numbers of small (40-byte) messages [tech report NAS-95-020]
  - Not much inherent parallelism
- Can verify results obtained with PPerfMark on a more realistic (although not entirely real) code
Verification: CAMEL

- Authors
  - Matt Murphy, Chris Conger @ HCS Lab
  - MPI version: Adam Leko
- Brief description
  - Differential cryptanalysis application
  - MPI-C code, ~1 kLOC
- Performance bottlenecks
  - None
- Used as the “good” application in the suite
Other applications considered

- Other applications considered and eliminated
  - POP – parallel ocean program
    - http://climate.lanl.gov/Models/POP/index.htm
    - Very large application (tens of kLOC)
    - Would take too much time to recompile & instrument for each PAT
  - Modified versions of CAMEL
    - Could easily add artificial bottlenecks to the program
  - MPIBlast
    - http://mpiblast.lanl.gov/
    - Pattern matching application (bioinformatics, gene sequencing, etc.)
    - Very large application, C & C++ MPI
    - Would take too much time to recompile & instrument for each PAT
    - Has a bandwidth bottleneck and load imbalance during the check phase; these bottlenecks are covered by PPerfMark
  - Parallel Spectral Transform Shallow Water Model
    - http://www.csm.ornl.gov/chammp/pstswm/index.html
    - Large Fortran MPI application; difficult to compile
    - Would take too much time to recompile & instrument for each PAT
  - Bench9 MPI
    - Barrier syncs in outer computational loop
    - Random delays for load imbalance
    - Bottlenecks listed are covered by PPerfMark
- If time allows, could use one of these applications (spectral transform or POP?) on the narrowed-down list of best performance tools