Open Trace Format (OTF)
Tutorial
Wolfgang E. Nagel, Holger Brunst, T.U. Dresden, Germany
Sameer Shende, Allen D. Malony, ParaTools, Inc.
http://www.vampir.eu
© 2006 Wolfgang E. Nagel, TU Dresden, ZIH
Outline
• An overview of OTF, TAU and Vampir/VNG
• OTF
– Tools
– API
– Building trace conversion tools
• TAU
– Instrumentation
– Measurement
– Analysis
• Scalable Tracing
– Vampir
– VNG
– OTF
2
Tutorial Goals
• This tutorial is intended as an introduction to OTF tools.
• Today you should leave here with a better understanding of…
– OTF API and tools
– Steps involved in building a trace conversion tool to target OTF
– How to instrument your programs with TAU to generate OTF
– Automatic instrumentation at the routine level and outer loop level
– Manual instrumentation at the loop/statement level
– Measurement options provided by TAU
– Environment variables used for choosing metrics, generating
performance data
– How to use the Vampir and VNG tools
– Nature and types of visualization that VNG provides for visualizing OTF
traces
3
Vampir: Technical Components
[Figure: component diagram with Trace 1 … Trace N, Tools, and a Server made up of a Master and Workers 1 … m; the numbered components are listed below]
1. Trace generator
2. Classical Vampir viewer and analyzer
3. Vampir client viewer
4. Parallel server engine
5. Conversion and analysis tools
4
Many Trace Formats to choose from …
5
OTF Features
• Fast and efficient sequential and parallel access
• Platform independent
• Selective access to
– Processes
– Time intervals
• API / Interfaces
– High level interface for analysis tools
– Read/write complete traces with multiple files
– Supports filtering and parallel I/O
– Low level interface for trace libraries
6
Relative File Size
[Figure: bar chart of relative size (axis 0 to 3) for the STF, VTF, OTFZ, and OTF formats on three traces: SMG 98 (18 MB), IRS (1.8 GB), and SMG2000 (2.3 GB)]
7
Read Performance
[Figure: bar chart of read performance in Mevents/s (axis 0 to 3.5) for the STF, VTF, OTFZ, and OTF formats on three traces: SMG 98 (18 MB), IRS (1.8 GB), and SMG2000 (2.3 GB)]
8
Performance Scalability
[Figure: performance in Mevents/s (logarithmic axis, 1 to 1000) for VTF, STF, OTF, and OTFZ; horizontal axis ticks at 1, 4, 16, 64, and 256]
9
Vampir Server Workflow
[Figure: workflow diagram. A parallel program is observed by a monitor system (TAU/Kojak), which writes event streams Trace 1 … Trace N to the file system. The analysis server (a Master and Workers 1 … m) reads them with parallel I/O, uses message passing internally, and serves the visualization client over the Internet. The client shows a timeline with 16 visible traces, a segment indicator, and a thumbnail of 768 processes. Classic analysis, by contrast, is monolithic and sequential and works on merged traces.]
10
Organization of Parallel Analysis
[Figure: block diagram of the Master and M Workers (Worker 1, Worker 2, … Worker m), connected by message passing. Other labels: Session Thread (N session threads), Analysis Merger, Analysis Module, Event Databases, Endian Conversion, Trace Format Driver, Traces, Socket Communication, Visualization Client]
11
Scalability – sPPM Analyzed on Origin 2000
• sPPM ASCI Benchmark
– 3D Gas Dynamic
– Data to be analyzed
• 16 Processes
– 200 MByte Volume
[Figure: speedup (axis 0 to 18) versus number of workers (axis 0 to 40) for Com. Matrix, Timeline, Summary Profile, Process Profile, Stack Tree, and Load Time]
Number of Workers   Load Time   Timeline   Summary Profile   Process Profile   Com. Matrix   Stack Tree
1                   47,33       0,10       1,59              1,32              0,06          2,57
2                   22,48       0,09       0,87              0,70              0,07          1,39
4                   10,80       0,06       0,47              0,38              0,08          0,70
8                   5,43        0,08       0,30              0,26              0,09          0,44
16                  3,01        0,09       0,28              0,17              0,09          0,25
32                  3,16        0,09       0,25              0,17              0,09          0,25
12
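The load-time figures above can be turned into speedup and parallel efficiency numbers. The following is only an illustrative sketch: the helper names `speedup` and `efficiency` are made up here and are not part of Vampir, VNG, or OTF, and the decimal commas of the table are written as decimal points.

```c
#include <assert.h>

/* Hypothetical helpers for reading the table; not part of any tool. */

/* Speedup of a run taking tn relative to the 1-worker run taking t1. */
double speedup(double t1, double tn)
{
    assert(tn > 0.0);
    return t1 / tn;
}

/* Parallel efficiency: speedup divided by the number of workers. */
double efficiency(double t1, double tn, unsigned workers)
{
    assert(workers > 0);
    return speedup(t1, tn) / (double)workers;
}
```

With the load times from the table, `speedup(47.33, 3.01)` is roughly 15.7 for 16 workers, while `speedup(47.33, 3.16)` is roughly 15.0 for 32 workers, so for this trace the load time no longer improves beyond 16 workers.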
A Fairly Large Test Case
• IRS ASCI Benchmark
– Implicit Radiation Solver
• Data to be analyzed:
– 64 Processes in 8 Streams
– Approx. 800.000.000 Events
– 40 GByte Data Volume
• Analysis Platform:
– Jump.fz-juelich.de
– 41 IBM p690 nodes (32 processors per node)
– 128 GByte per node
• Visualization Platform:
– Remote Laptop
[Figure: bar chart "Processing Times in Seconds" (axis 0 to 10) comparing 16 Worker and 32 Worker runs for Timeline, Summary Prof., Process Prof., Com. Matrix, and Stack Tree; bar labels in extraction order: 9,11, 5,59, 4,65, 4,67, 3,84, 3,62, 0,02, 0,16, 0,09, 0,02]
13
Outline
• An overview of OTF, TAU and Vampir/VNG
• OTF
– Tools
– API
– Building trace conversion tools
• TAU
– Instrumentation
– Measurement
– Analysis
• Scalable Tracing
– Vampir
– VNG
– OTF
14
OTF Trace Generation and Analysis Tools
15
OTF Contents
• Definition records
– Map event ids to interval (begin/end) event names
– Symbols for atomic events
– Process groups
• Performance events
– Timestamped events for entering or leaving a state
– Timestamped counter events (monotonically increasing or not)
• Global master file
– Mapping processes to streams
• Statistical Summaries
– Overview over a whole interval of time
• Snapshots
– Callstack, list of pending messages, etc. at a point in time
16
OTF File Hierarchy
17
OTF Streams
18
otfmerge
• Changes the number of streams of an existing OTF trace
• Can add snapshots or statistics to the merged trace file
• otfmerge - converter program of OTF library.
otfmerge [Options] <input file name>
Options:
  -h, --help   show this help message
  -n <n>       set number of streams for output
  -f <n>       set max number of filehandles available
  -o <name>    namestub of the output file (default 'out')
  -rb <size>   set buffersize of the reader
  -wb <size>   set buffersize of the writer
  -stats       cover statistics too
  -snaps       cover snapshots too
  -V           show OTF version
19
OTF Tools: otfaux
• otfaux
– Adds auxiliary snapshot and/or statistics information to the trace file
– Snapshots include callstack, pending messages, current counter values
– Statistics include number of calls, exclusive/inclusive time
– Statistics are monotonically increasing, unlike profiles
– Original event trace is unmodified
– Auxiliary data is generated at breakpoints, periodically or at ticks
20
otfaux
• otfaux - append snapshots and statistics to existing OTF traces at given 'break' time stamps
otfaux [Options] <file name>
Options:
  -h, --help   show this help message
  -b <size>    buffer size for read and write operations
  -n <n>       number of breaks (distributed regularly); if -p and -t are not set, the default for -n is 200 breaks
  -p <p>       create break every 'p' ticks (if both -n and -p are specified, the one producing more breaks wins)
  -t <t>       define (additional) break at given time stamp
  -F           force overwrite old snapshots and statistics
  -R           delete existing snapshots and statistics only
  -f <n>       max number of filehandles output ...
21
otfaux (contd.)
  -g             create functiongroup summaries instead of function summaries
  -v             verbose mode, print break time stamps
  -V             show OTF version
  -a             show advancing progress during operation
  --snapshots    write ONLY snapshots but NO statistics
  --statistics   write ONLY statistics but NO snapshots
  -s a[,b]*      regard given streams only when computing statistics. Expects a single token or comma separated list. This implies the '--statistics' option!
  -l             list existing stream tokens
22
tau2otf
• Converts TAU traces to OTF
• tau2otf <TAU trace> <edf file> <out file> [-n streams] [-nomessage] [-z] [-v]
  -n <streams>   Specifies the number of output streams (default 1)
  -nomessage     Suppress printing of message information in the trace
  -z             Enable compression of trace files (uncompressed by default)
  -v             Verbose
Trace format of <out file> is OTF
% tau2otf merged.trc tau.edf app.otf
23
vtf2otf
• Convert VTF traces to OTF format
• vtf2otf [Options] <input file name>
Options:
-o <file> output file
-f <n> max count of filehandles
-n <n> output stream count
-b <n> size of the writer buffer
-V show OTF version
24
otf2vtf
• Convert OTF trace files to VTF format
• otf2vtf [Options] <input file name>
Options: -o <file> output file
-b <n> size of the reader buffer
-A write VTF3 ASCII sub-format (default)
-B write VTF3 binary sub-format
-V show OTF version
25
Building Trace Analysis Tools
• Writing OTF traces in trace conversion tools
– High level API writes multiple streams
– Low level API writes a single stream
– Each OTF file has a prefix (e.g., app.otf)
• Parallel reading and searching in OTF analysis tools
– Each process in the tool reads local and global event definitions
– Each process reads a subset of events
– Read summary information to select interesting spots in the trace
– Tool might read a selected time interval for analysis
– OTF supports efficient binary search
• Tool may support compressed or uncompressed OTF traces
• Tool may support single or multi-stream OTF traces
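Reading a selected time interval relies on events being ordered by timestamp, which is what makes binary search possible. The sketch below illustrates the idea on a plain in-memory array of timestamps; it does not use or reproduce OTF's internal search, and the function names `first_at_or_after` and `select_interval` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first timestamp >= t in a sorted array (n if there is none). */
size_t first_at_or_after(const uint64_t *ts, size_t n, uint64_t t)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ts[mid] < t)
            lo = mid + 1;   /* everything up to mid is too early */
        else
            hi = mid;       /* mid may be the answer; keep it in range */
    }
    return lo;
}

/* Select the events in the half-open interval [begin, end).
   Stores the index of the first selected event in *first and
   returns how many events fall into the interval. */
size_t select_interval(const uint64_t *ts, size_t n,
                       uint64_t begin, uint64_t end, size_t *first)
{
    assert(begin <= end);
    size_t a = first_at_or_after(ts, n, begin);
    size_t b = first_at_or_after(ts, n, end);
    *first = a;
    return b - a;
}
```

An analysis tool could use the returned index range to process only the events inside the window a user has zoomed into, instead of scanning the whole stream.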
26
OTF Trace Writer API - OTF_FileManager_open
• Generates a new file manager with a maximum number of files that
are allowed to be open simultaneously
• OTF_FileManager* OTF_FileManager_open( uint32_t number );
#include <otf.h>
OTF_FileManager *manager;
manager = OTF_FileManager_open(256);
27
OTF_FileManager_close
• Closes the file manager
• void OTF_FileManager_close( OTF_FileManager* m );
#include <otf.h>
OTF_FileManager_close(manager);
28
OTF_Writer_open
• Define file control block for output trace file
• OTF_Writer* OTF_Writer_open(
char* fileNamePrefix,
uint32_t numberOfStreams,
OTF_FileManager* fileManager );
#include <otf.h>
void *fcb = (void *) OTF_Writer_open(out_file,
num_streams, manager);
29
OTF_Writer_setCompression
• Enable compression if specified by the user
• int OTF_Writer_setCompression( OTF_Writer* writer,
OTF_FileCompression);
#include <otf.h>
OTF_Writer_setCompression((OTF_Writer *)fcb,
OTF_FILECOMPRESSION_COMPRESSED);
30
OTF_Writer_writeDefCreator
• Specify a comment about the creator (trace conversion tool)
• int OTF_Writer_writeDefCreator( OTF_Writer* writer,
uint32_t stream, /* stream = 0 means global definition */
const char* creator );
#include <otf.h>
OTF_Writer_writeDefCreator(fcb, 0,
"MyTool2otf ver 2.42");
31
OTF_Writer_writeDefProcess
• Write a process definition record
• int OTF_Writer_writeDefProcess( OTF_Writer* writer,
uint32_t stream,
uint32_t process,
const char* name,
uint32_t parent );
#include <otf.h>
OTF_Writer_writeDefProcess(
(OTF_Writer *)fcb, 0, cpuid, name, 0);
32
OTF_Writer_writeDefTimerResolution
• Provides the timer resolution. All timestamps are interpreted based
on this resolution. The default is 1 microsecond.
• int OTF_Writer_writeDefTimerResolution(
OTF_Writer* writer,
uint32_t stream,
uint64_t ticksPerSecond );
#include <otf.h>
OTF_Writer_writeDefTimerResolution((OTF_Writer*)
userData, 0, getTicksPerSecond());
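Because every timestamp is interpreted against the declared resolution, a conversion tool has to express its own times in ticks before writing them. The sketch below shows that arithmetic under the assumption that the source times are available in seconds; `seconds_to_ticks` and `ticks_to_seconds` are hypothetical helper names, not OTF functions. A resolution of 1 microsecond corresponds to 1000000 ticks per second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a time in seconds to ticks at the
   resolution declared in the timer resolution record. */
uint64_t seconds_to_ticks(double seconds, uint64_t ticks_per_second)
{
    assert(seconds >= 0.0);
    /* Adding 0.5 rounds to the nearest tick instead of truncating. */
    return (uint64_t)(seconds * (double)ticks_per_second + 0.5);
}

/* Hypothetical helper: the reverse mapping, as an analysis tool would
   apply it after reading the timer resolution definition. */
double ticks_to_seconds(uint64_t ticks, uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    return (double)ticks / (double)ticks_per_second;
}
```

For example, 1.5 seconds at 1000000 ticks per second is written as timestamp 1500000.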
33
OTF_Writer_writeDefFunction
• Provide a function definition, i.e., an event id to name
mapping
• int OTF_Writer_writeDefFunction( OTF_Writer* writer,
uint32_t stream,
uint32_t func,
const char* name,
uint32_t funcGroup,
uint32_t source ); /* specify source code location */
#include <otf.h>
OTF_Writer_writeDefFunction((OTF_Writer*)userData,
0, eventID, (const char *) name, groupID, 0);
34
OTF_Writer_writeDefFunctionGroup
• Provides a function group definition
• int OTF_Writer_writeDefFunctionGroup( OTF_Writer* writer,
uint32_t stream,
uint32_t funcGroup,
const char* name );
#include <otf.h>
OTF_Writer_writeDefFunctionGroup((OTF_Writer*)userData,
0, groupId, GroupName);
35
OTF_Writer_writeEnter
• Write a function entry record
• int OTF_Writer_writeEnter( OTF_Writer* writer,
uint64_t time,
uint32_t function,
uint32_t process,
uint32_t source );
#include <otf.h>
OTF_Writer_writeEnter((OTF_Writer*)userData,
GetClockTicksInGHz(time), stateid, cpuid, 0);
36
OTF_Writer_writeSendMsg
• Write a send message record
• int OTF_Writer_writeSendMsg( OTF_Writer* writer,
uint64_t time,
uint32_t sender,
uint32_t receiver,
uint32_t procGroup,
uint32_t tag,
uint32_t length,
uint32_t source );
37
OTF_Writer_writeRecvMsg
• Write a receive message record
• int OTF_Writer_writeRecvMsg( OTF_Writer* writer,
uint64_t time,
uint32_t receiver,
uint32_t sender,
uint32_t procGroup,
uint32_t tag,
uint32_t length,
uint32_t source );
38
OTF Trace Reader API
• Similar to trace writer API
• Instead of Write, create a Handler for callbacks, e.g.,
• int OTF_Handler_DefFunction( void* userData,
uint32_t stream,
uint32_t func,
const char* name,
uint32_t funcGroup,
uint32_t source );
39
OTF Trace Reader API
• Similar to trace writer API
• Instead of Write, create a Handler for callbacks
• Specify the parameters to the handler routine
• After setting up handlers, read definitions, snapshots, and events. The
library invokes the appropriate handler for each record
• Close the file manager and exit cleanly
40
Global Read Operations
• Open handler array
• Open OTF reader
• Control the buffer size
• Set handler and arguments
• Read definitions
• Read snapshots
• Read events
• Close reader
41
OTF_HandlerArray_open/close
• To open a new array of handlers and then fill in the callback routines
• OTF_HandlerArray *OTF_HandlerArray_open( void );
#include <otf.h>
OTF_HandlerArray *handlers;
handlers = OTF_HandlerArray_open();
• To close the array, use OTF_HandlerArray_close
OTF_HandlerArray_close(handlers);
42
A Sample Handler
• int OTF_handleDefinitionComment( void* fcb,
uint32_t streamid,
const char* comment )
{ /* written by user; called by OTF */ }
• The first argument is a file control block. We need to pass
this argument and the callback function’s address to the
OTF reader.
43
OTF_HandlerArray_setHandler
• int OTF_HandlerArray_setHandler(
OTF_HandlerArray* handlers,
OTF_FunctionPointer* pointer,
uint32_t recordtype );
• int OTF_HandlerArray_setFirstHandlerArg(
OTF_HandlerArray* handlers,
void* firsthandlerarg,
uint32_t recordtype );
• OTF_HandlerArray_setFirstHandlerArg specifies a user-defined pointer that is
passed as the first argument to the handler. Useful for keeping track of location.
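The two calls above amount to a table that maps each record type to a callback and to the pointer handed to that callback as its first argument. The following is a stand-alone mock of that pattern, written so it compiles without the OTF library: every name in it (`handler_table`, `table_set_handler`, `REC_ENTER`, and so on) is hypothetical, the handler signature is simplified, and skipping records without a handler is a choice of this mock rather than a statement about OTF.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a handler array; all names here are hypothetical. */
enum { REC_COMMENT = 0, REC_ENTER = 1, REC_COUNT = 2 };

typedef int (*handler_fn)(void *first_arg, unsigned long long time,
                          unsigned value);

typedef struct {
    handler_fn fn[REC_COUNT];         /* callback per record type     */
    void      *first_arg[REC_COUNT];  /* user pointer per record type */
} handler_table;

void table_init(handler_table *t)
{
    for (int i = 0; i < REC_COUNT; i++) {
        t->fn[i] = NULL;
        t->first_arg[i] = NULL;
    }
}

void table_set_handler(handler_table *t, handler_fn fn, int type)
{
    assert(type >= 0 && type < REC_COUNT);
    t->fn[type] = fn;
}

void table_set_first_arg(handler_table *t, void *arg, int type)
{
    assert(type >= 0 && type < REC_COUNT);
    t->first_arg[type] = arg;
}

/* What the mock reader does per record: look up the callback and pass
   the registered first argument. Returns 1 if a handler ran, else 0. */
int table_dispatch(const handler_table *t, int type,
                   unsigned long long time, unsigned value)
{
    assert(type >= 0 && type < REC_COUNT);
    if (t->fn[type] == NULL)
        return 0;   /* in this mock, records without a handler are skipped */
    t->fn[type](t->first_arg[type], time, value);
    return 1;
}

/* Example user handler: keeps its state in the struct it receives
   through the first argument. */
typedef struct { unsigned calls; unsigned long long last_time; } enter_stats;

int count_enter(void *first_arg, unsigned long long time, unsigned value)
{
    enter_stats *s = (enter_stats *)first_arg;
    (void)value;
    s->calls++;
    s->last_time = time;
    return 0;
}
```

Passing state through the first argument keeps handlers free of global variables, which matters when a conversion tool reads one trace while writing another.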
44
OTF_HandlerArray_setHandler
#include <otf.h>
OTF_HandlerArray *handlers = OTF_HandlerArray_open();
...
/* put the callback routine’s address in the array */
OTF_HandlerArray_setHandler( handlers,
(OTF_FunctionPointer*) OTF_handleDefinitionComment,
OTF_DEFINITIONCOMMENT_RECORD );
/* specify the file position/any address (of say the
trace writer) as the first argument*/
OTF_HandlerArray_setFirstHandlerArg( handlers,
&fcb, OTF_DEFINITIONCOMMENT_RECORD );
/* invokes OTF_handleDefinitionComment routine*/
45
OTF_Reader_open
• Opens a master control file and returns an OTF_Reader
• OTF_Reader * OTF_Reader_open(char *name,
OTF_FileManager *manager);
#include <otf.h>
OTF_FileManager *manager =
OTF_FileManager_open(256);
OTF_Reader *reader =
OTF_Reader_open("inputfile", manager);
46
User defined handlers for definitions
• int handleDeftimerresolution( void* firsthandlerarg, uint32_t streamid,
uint64_t ticksPerSecond ) { ...}
• int handleDefprocess( void* firsthandlerarg, uint32_t streamid,
uint32_t deftoken, const char* name, uint32_t parent) { ... }
• int handleDefprocessgroup( void* firsthandlerarg, uint32_t streamid,
uint32_t deftoken, const char* name, uint32_t n, uint32_t* array ) { }
• int handleDeffunction( void* firsthandlerarg, uint32_t streamid,
uint32_t deftoken, const char* name, uint32_t group, uint32_t
scltoken ) { }
• int handleDefcounter( void* firsthandlerarg, uint32_t streamid,
uint32_t deftoken, const char* name, uint32_t properties, uint32_t
countergroup, const char* unit ) { } ...
47
User defined handlers for timestamped events
• int handleCounter( void* firsthandlerarg, uint64_t time, uint32_t
process, uint32_t token, uint64_t value ) {... }
• int handleRecvmsg( void* firsthandlerarg, uint64_t time, uint32_t
receiver, uint32_t sender, uint32_t communicator, uint32_t msgtype,
uint32_t msglength, uint32_t scltoken )
• int handleSendmsg( void* firsthandlerarg, uint64_t time, uint32_t
sender, uint32_t receiver, uint32_t communicator, uint32_t msgtype,
uint32_t msglength, uint32_t scltoken )
• int handleEnter( void* firsthandlerarg, uint64_t time, uint32_t
statetoken, uint32_t cpuid, uint32_t scltoken ) {...}
• int handleLeave( void* firsthandlerarg, uint64_t time, uint32_t
statetoken, uint32_t cpuid, uint32_t scltoken ) {...}
48
OTF_Reader_readDefinitions
• int OTF_Reader_readDefinitions(OTF_Reader *r,
OTF_HandlerArray *handlers);
#include <otf.h>
OTF_HandlerArray *handlers = OTF_HandlerArray_open();
OTF_FileManager *manager = OTF_FileManager_open(100);
OTF_Reader *reader = OTF_Reader_open(inputFile,
manager);
/* set up handlers */
OTF_Reader_readDefinitions(reader, handlers);
/* OTF invokes handlers for process, functions,
groups and counters here */
49
OTF_Reader_readEvents
• int OTF_Reader_readEvents(OTF_Reader *reader,
OTF_HandlerArray *handlers);
#include <otf.h>
OTF_Reader_readEvents (reader, handlers);
/* invokes handlers for timestamped message
communication,
routine entry/exit, counter events */
50
Building OTF Analysis Tools
• Header files are in <otf-version>/include directory
• Libraries are in <otf-version>/<arch>/lib directory
– Support for Zlib (v1.2.3) is included in libotf.a
% g++ -c tool.cpp -I<otf-version>/include
% g++ tool.o -o tool -L<otf-version>/<arch>/lib -lotf
51
Outline
• An overview of OTF, TAU and Vampir/VNG
• OTF
– Tools
– API
– Building trace conversion tools
• TAU
– Instrumentation
– Measurement
– Analysis
• Scalable Tracing
– Vampir
– VNG
– OTF
52
TAU Parallel Performance System
• http://www.cs.uoregon.edu/research/tau/
• Multi-level performance instrumentation
– Multi-language automatic source instrumentation
• Flexible and configurable performance measurement
• Widely-ported parallel performance profiling system
– Computer system architectures and operating systems
– Different programming languages and compilers
• Support for multiple parallel programming paradigms
– Multi-threading, message passing, mixed-mode, hybrid
• Integration in complex software, systems, applications
53
Using TAU: A brief Introduction
• To instrument source code, choose measurement module:
% setenv TAU_MAKEFILE /usr/tau-2.16/x86_64/lib/Makefile.tau-mpi-pdt-trace-pgi
And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C
compilers:
% mpif90 foo.f90
changes to
% tau_f90.sh foo.f90
• Execute application and then run:
% tau_treemerge.pl
% tau2otf tau.trc tau.edf app.otf
% vampir app.otf
54
TAU Performance System Architecture
[Figure: TAU architecture diagram; surviving label: event selection]
55
TAU Performance System Architecture
56
Program Database Toolkit (PDT)
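[Figure: PDT data flow. Application / library sources pass through the C / C++ parser or the Fortran parser (F77/90/95) to IL, then through the C / C++ IL analyzer or the Fortran IL analyzer into Program Database Files, which tools read through DUCTAPE]
Tools built on DUCTAPE:
– PDBhtml: program documentation
– SILOON: application component glue
– CHASM: C++ / F90/95 interoperability
– TAU_instr: automatic source instrumentation
57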
TAU Instrumentation Approach
• Support for standard program events
– Routines
– Classes and templates
– Statement-level blocks
• Support for user-defined events
– Begin/End events (“user-defined timers”)
– Atomic events (e.g., size of memory allocated/freed)
– Selection of event statistics
• Support definition of “semantic” entities for mapping
• Support for event groups
• Instrumentation optimization (eliminate instrumentation in
lightweight routines)
58
TAU Instrumentation
• Flexible instrumentation mechanisms at multiple levels
– Source code
– manual (TAU API, TAU Component API)
– automatic
– C, C++, F77/90/95 (Program Database Toolkit (PDT))
– OpenMP (directive rewriting (Opari), POMP spec)
– Object code
– pre-instrumented libraries (e.g., MPI using PMPI)
– statically-linked and dynamically-linked
– Executable code
– dynamic instrumentation (pre-execution) (DynInstAPI)
– virtual machine instrumentation (e.g., Java using JVMPI)
– Python interpreter based instrumentation at runtime
– Proxy Components
59
TAU Measurement Approach
• Portable and scalable parallel profiling solution
– Multiple profiling types and options
– Event selection and control (enabling/disabling, throttling)
– Online profile access and sampling
– Online performance profile overhead compensation
• Portable and scalable parallel tracing solution
– Trace translation to Open Trace Format (OTF)
– Trace streams and hierarchical trace merging
• Robust timing and hardware performance support
• Multiple counters (hardware, user-defined, system)
• Performance measurement for CCA component software
60
Using TAU
• Configuration
• Instrumentation
– Manual
– MPI – Wrapper interposition library
– PDT – Source rewriting for C, C++, F77/90/95
– OpenMP – Directive rewriting
– Component based instrumentation – Proxy components
– Binary Instrumentation
– DyninstAPI – Runtime Instrumentation/Rewriting binary
– Java – Runtime instrumentation
– Python – Runtime instrumentation
• Measurement
• Performance Analysis
61
TAU Measurement System Configuration
• configure [OPTIONS]
  {-c++=<CC>, -cc=<cc>}    Specify C++ and C compilers
  {-pthread, -sproc}       Use pthread or SGI sproc threads
  -openmp                  Use OpenMP threads
  -jdk=<dir>               Specify Java instrumentation (JDK)
  -opari=<dir>             Specify location of Opari OpenMP tool
  -papi=<dir>              Specify location of PAPI
  -pdt=<dir>               Specify location of PDT
  -dyninst=<dir>           Specify location of DynInst Package
  -mpi[inc/lib]=<dir>      Specify MPI library instrumentation
  -shmem[inc/lib]=<dir>    Specify PSHMEM library instrumentation
  -python[inc/lib]=<dir>   Specify Python instrumentation
  -tag=<name>              Specify a unique configuration name
  -epilog=<dir>            Specify location of EPILOG
  -slog2                   Build SLOG2/Jumpshot tracing package
  -otf=<dir>               Specify location of OTF trace package
  -arch=<architecture>     Specify architecture explicitly (bgl, xt3, ibm64, ibm64linux…)
62
TAU Measurement System Configuration
• configure [OPTIONS]
  -TRACE               Generate binary TAU traces
  -PROFILE (default)   Generate profiles (summary)
  -PROFILECALLPATH     Generate call path profiles
  -PROFILEPHASE        Generate phase based profiles
  -PROFILEMEMORY       Track heap memory for each routine
  -PROFILEHEADROOM     Track memory headroom to grow
  -MULTIPLECOUNTERS    Use hardware counters + time
  -COMPENSATE          Compensate timer overhead
  -CPUTIME             Use usertime+system time
  -PAPIWALLCLOCK       Use PAPI’s wallclock time
  -PAPIVIRTUAL         Use PAPI’s process virtual time
  -SGITIMERS           Use fast IRIX timers
  -LINUXTIMERS         Use fast x86 Linux timers
63
TAU Measurement Configuration – Examples
• ./configure -pdt=/opt/ALTIX/pkgs/pdtoolkit-3.9 -mpi
– Configure using PDT and MPI with GNU compilers
• ./configure -papi=/usr/local/packages/papi -pdt=/usr/local/pdtoolkit-3.9
-mpiinc=/usr/local/include -mpilib=/usr/local/lib
-MULTIPLECOUNTERS -c++=icpc -cc=icc -fortran=intel
-tag=intel91039; make clean install
– Use PAPI counters (one or more) with C/C++/F90 automatic
instrumentation. Also instrument the MPI library. Use Intel compilers.
• Typically configure multiple measurement libraries
• Each configuration creates a unique <arch>/lib/Makefile.tau<options>
stub makefile. It corresponds to the configuration options used, e.g.,
– /opt/tau-2.15.5/x86_64/lib/Makefile.tau-icpc-mpi-pdt
– /opt/tau-2.15.5/x86_64/lib/Makefile.tau-icpc-mpi-pdt-trace
64
TAU Measurement Configuration – Examples
% cd /usr/tau-2.16/x86_64/lib; ls Makefile.*pgi
Makefile.tau-pdt-pgi
Makefile.tau-mpi-pdt-pgi
Makefile.tau-callpath-mpi-pdt-pgi
Makefile.tau-mpi-pdt-trace-pgi
Makefile.tau-mpi-compensate-pdt-pgi
Makefile.tau-pthread-pdt-pgi
Makefile.tau-papiwallclock-multiplecounters-papivirtual-mpi-papi-pdt-pgi
Makefile.tau-multiplecounters-mpi-papi-pdt-trace-pgi
Makefile.tau-mpi-pdt-epilog-trace-pgi
Makefile.tau-papiwallclock-multiplecounters-papivirtual-papi-pdt-openmp-opari-pgi
…
• For an MPI+F90 application, you may want to start with:
Makefile.tau-mpi-pdt-trace-pgi
– Supports MPI instrumentation & PDT for automatic source instrumentation for PGI with tracing
65
Configuration Parameters in Stub Makefiles
• Each TAU stub Makefile resides in <tau>/<arch>/lib directory
• Variables:
– TAU_CXX              Specify the C++ compiler used by TAU
– TAU_CC, TAU_F90      Specify the C, F90 compilers
– TAU_DEFS             Defines used by TAU. Add to CFLAGS
– TAU_LDFLAGS          Linker options. Add to LDFLAGS
– TAU_INCLUDE          Header files include path. Add to CFLAGS
– TAU_LIBS             Statically linked TAU library. Add to LIBS
– TAU_SHLIBS           Dynamically linked TAU library
– TAU_MPI_LIBS         TAU’s MPI wrapper library for C/C++
– TAU_MPI_FLIBS        TAU’s MPI wrapper library for F90
– TAU_FORTRANLIBS      Must be linked in with C++ linker for F90
– TAU_CXXLIBS          Must be linked in with F90 linker
– TAU_INCLUDE_MEMORY   Use TAU’s malloc/free wrapper lib
– TAU_DISABLE          TAU’s dummy F90 stub library
– TAU_COMPILER         Instrument using tau_compiler.sh script
• Each stub makefile encapsulates the parameters that TAU was configured with
• It represents a specific instance of the TAU libraries. TAU scripts use stub
makefiles to identify what performance measurements are to be performed.
66
Using TAU
• Install TAU
% configure [options]; make clean install
• Instrument application manually/automatically
– TAU Profiling API
• Typically modify application makefile
– Select TAU’s stub makefile, change name of compiler in Makefile
• Set environment variables
– TAU_MAKEFILE stub makefile
– directory where profiles/traces are to be stored
• Execute application
% mpirun -np <procs> a.out;
• Analyze performance data
– paraprof, vampir, pprof, paraver …
67
TAU’s MPI Wrapper Interposition Library
• Uses standard MPI Profiling Interface
– Provides name shifted interface
– MPI_Send = PMPI_Send
– Weak bindings
• Interpose TAU’s MPI wrapper library between MPI and TAU
– -lmpi replaced by -lTauMpi -lpmpi -lmpi
• No change to the source code!
– Just re-link the application to generate performance data
– setenv TAU_MAKEFILE <dir>/<arch>/lib/Makefile.tau-mpi-[options]
– Use tau_cxx.sh, tau_f90.sh and tau_cc.sh as compilers
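The name-shifted interface works because the wrapper library defines the public entry point itself, records what it needs, and forwards to the shifted name. The sketch below mocks this idea without any MPI dependency: `demo_send` stands in for `MPI_Send`, `pdemo_send` for `PMPI_Send`, and the counters are hypothetical. It is not TAU's wrapper code.

```c
/* Mock of the name-shifted interface; no real MPI is involved.
   pdemo_send plays the role of PMPI_Send, demo_send of MPI_Send. */
static int bytes_on_wire = 0;

int pdemo_send(const void *buf, int count)   /* the "real" library call */
{
    (void)buf;
    bytes_on_wire += count;
    return 0;
}

/* Measurement state a wrapper library might keep (hypothetical). */
static int send_calls = 0;
static int send_bytes = 0;

int demo_send(const void *buf, int count)    /* the interposed wrapper */
{
    send_calls++;                 /* record the event before forwarding */
    send_bytes += count;
    return pdemo_send(buf, count);
}

/* Accessors so the recorded values can be inspected. */
int wrapper_send_calls(void) { return send_calls; }
int wrapper_send_bytes(void) { return send_bytes; }
int wire_bytes(void)         { return bytes_on_wire; }
```

The application keeps calling the public name, so only the link line changes; this is why the slide can promise that no source modification is needed.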
68
Instrumenting MPI Applications
• Under Linux you may use tau_load.sh to launch un-instrumented programs
under TAU
– Without TAU:
% mpirun -np 4 ./a.out
– With TAU:
% ls /usr/tau/x86_64/lib/libTAU*pgi*
% mpirun -np 4 tau_load.sh ./a.out
% mpirun -np 4 tau_load.sh -XrunTAUsh-mpi-pdt-trace-pgi.so a.out
loads <taudir>/<arch>/lib/libTAUsh-mpi-pdt-trace-pgi.so shared object
• Under AIX, use tau_poe instead of poe
– Without TAU:
% poe a.out -procs 8
– With TAU:
% tau_poe a.out -procs 8
% tau_poe -XrunTAUsh-mpi-pdt-trace.so a.out -procs 8
chooses <taudir>/<arch>/lib/libTAUsh-mpi-pdt-trace.so
• No change to source code or executables! No need to re-link!
• Only instruments MPI routines. To instrument user routines, you may need
to parse the application source code!
69
Integration with Application Build Environment
• Try to minimize impact on user’s application build procedures
• Handle process of parsing, instrumentation, compilation, linking
• Dealing with Makefiles
– Minimal change to application Makefile
– Avoid changing compilation rules in application Makefile
– No explicit inclusion of rules for process stages
• Some applications do not use Makefiles
– Facilitate integration in whatever procedures used
• Two techniques:
– TAU shell scripts (tau_<compiler>.sh)
– Invoke the PDT parser, TAU instrumentor, and compiler
– TAU_COMPILER
70
Using Program Database Toolkit (PDT)
1. Parse the program to create foo.pdb:
% cxxparse foo.cpp -I/usr/local/mydir -DMYFLAGS …
or
% cparse foo.c -I/usr/local/mydir -DMYFLAGS …
or
% f95parse foo.f90 -I/usr/local/mydir …
% f95parse *.f -omerged.pdb -I/usr/local/mydir -R free
2. Instrument the program:
% tau_instrumentor foo.pdb foo.f90 -o foo.inst.f90 -f select.tau
3. Compile the instrumented program:
% ifort foo.inst.f90 -c -I/usr/local/mpi/include -o foo.o
71
tau_[cxx,cc,f90].sh – Improves Integration in Makefiles
# set TAU_MAKEFILE and TAU_OPTIONS env vars
CC = tau_cc.sh
F90 = tau_f90.sh
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o
app: $(OBJS)
	$(F90) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
.c.o:
	$(CC) $(CFLAGS) -c $<
.f90.o:
	$(F90) $(FFLAGS) -c $<
72
AutoInstrumentation using TAU_COMPILER
• $(TAU_COMPILER) stub Makefile variable
• Invokes PDT parser, TAU instrumentor, compiler through
tau_compiler.sh shell script
• Requires minimal changes to application Makefile
– Compilation rules are not changed
– User adds $(TAU_COMPILER) before compiler name
– F90=mpxlf90
Changes to
F90= $(TAU_COMPILER) mpxlf90
• Passes options from TAU stub Makefile to the four compilation stages
• Use tau_cxx.sh, tau_cc.sh, tau_f90.sh scripts OR $(TAU_COMPILER)
• Uses original compilation command if an error occurs
73
Automatic Instrumentation
• We now provide compiler wrapper scripts
– Simply replace mpxlf90 with tau_f90.sh
– Automatically instruments Fortran source code, links with
TAU MPI Wrapper libraries.
• Use tau_cc.sh and tau_cxx.sh for C/C++
Before:
CXX = mpCC
F90 = mpxlf90_r
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o
app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
.cpp.o:
	$(CXX) $(CFLAGS) -c $<
After:
CXX = tau_cxx.sh
F90 = tau_f90.sh
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o
app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
.cpp.o:
	$(CXX) $(CFLAGS) -c $<
74
TAU_COMPILER – Improving Integration in Makefiles
include /usr/tau-2.15.5/x86_64/lib/Makefile.tau-icpc-mpi-pdt
CXX = $(TAU_COMPILER) mpicxx
F90 = $(TAU_COMPILER) mpif90
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o
app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
.cpp.o:
	$(CXX) $(CFLAGS) -c $<
75
TAU_COMPILER Commandline Options
• See <taudir>/<arch>/bin/tau_compiler.sh -help
• Compilation:
% mpxlf90 -c foo.f90
Changes to
% f95parse foo.f90 $(OPT1)
% tau_instrumentor foo.pdb foo.f90 -o foo.inst.f90 $(OPT2)
% mpxlf90 -c foo.inst.f90 -o foo.o $(OPT3)
• Linking:
% mpxlf90 foo.o bar.o -o app
Changes to
% mpxlf90 foo.o bar.o -o app $(OPT4)
• The default values of options OPT[1-4] may be overridden by the user:
F90 = $(TAU_COMPILER) $(MYOPTIONS) mpxlf90
76
TAU_COMPILER Options
• Optional parameters for $(TAU_COMPILER): [tau_compiler.sh -help]
  -optVerbose               Turn on verbose debugging messages
  -optDetectMemoryLeaks     Turn on debugging memory allocations/de-allocations to track leaks
  -optPdtGnuFortranParser   Use gfparse (GNU) instead of f95parse (Cleanscape) for parsing Fortran source code
  -optKeepFiles             Does not remove intermediate .pdb and .inst.* files
  -optPreProcess            Preprocess Fortran sources before instrumentation
  -optTauSelectFile=""      Specify selective instrumentation file for tau_instrumentor
  -optLinking=""            Options passed to the linker. Typically $(TAU_MPI_FLIBS) $(TAU_LIBS) $(TAU_CXXLIBS)
  -optCompile=""            Options passed to the compiler. Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
  -optPdtF95Opts=""         Add options for Fortran parser in PDT (f95parse/gfparse)
  -optPdtF95Reset=""        Reset options for Fortran parser in PDT (f95parse/gfparse)
  -optPdtCOpts=""           Options for C parser in PDT (cparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
  -optPdtCxxOpts=""         Options for C++ parser in PDT (cxxparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
...
77
Overriding Default Options:TAU_COMPILER
include /usr/tau/x86_64/lib/Makefile.tau-icpc-mpi-pdt-trace
# Fortran .f files in free format need the -R free option for parsing
# Are there any preprocessor directives in the Fortran source?
MYOPTIONS= -optVerbose -optPreProcess -optPdtF95Opts="-R free"
F90 = $(TAU_COMPILER) $(MYOPTIONS) ifort
OBJS = f1.o f2.o f3.o …
LIBS = -Lappdir -lapplib1 -lapplib2 …
app: $(OBJS)
	$(F90) $(OBJS) -o app $(LIBS)
.f.o:
	$(F90) -c $<
78
Overriding Default Options:TAU_COMPILER
% cat Makefile
F90 = tau_f90.sh
OBJS = f1.o f2.o f3.o …
LIBS = -Lappdir -lapplib1 -lapplib2 …
app: $(OBJS)
	$(F90) $(OBJS) -o app $(LIBS)
.f90.o:
	$(F90) -c $<
% setenv TAU_OPTIONS '-optVerbose -optTauSelectFile=select.tau -optKeepFiles'
% setenv TAU_MAKEFILE <taudir>/x86_64/lib/Makefile.tau-icpc-mpi-pdt
79
Optimization of Program Instrumentation
• Need to eliminate instrumentation in frequently executing lightweight
routines
• Throttling of events at runtime:
% setenv TAU_THROTTLE 1
Turns off instrumentation in routines that execute over 10000 times
(TAU_THROTTLE_NUMCALLS) and take less than 10 microseconds of
inclusive time per call (TAU_THROTTLE_PERCALL)
• Selective instrumentation file to filter events
% tau_instrumentor [options] -f <file> OR
% setenv TAU_OPTIONS '-optTauSelectFile=tau.txt'
• Compensation of local instrumentation overhead
% configure -COMPENSATE
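The throttling rule quoted above combines two thresholds: a routine must be called often and be cheap per call before its instrumentation is turned off. The sketch below spells out that decision with the defaults from the slide; `should_throttle` is a hypothetical helper and not TAU source code, and it assumes the inclusive time is available as a running total in microseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Defaults quoted on the slide. */
#define THROTTLE_NUMCALLS     10000u  /* TAU_THROTTLE_NUMCALLS */
#define THROTTLE_PERCALL_USEC 10.0    /* TAU_THROTTLE_PERCALL  */

/* Hypothetical helper: decide whether a routine qualifies for throttling
   given its call count and total inclusive time in microseconds. */
bool should_throttle(uint64_t calls, double inclusive_usec_total)
{
    if (calls <= THROTTLE_NUMCALLS)
        return false;                         /* not called often enough */
    double per_call = inclusive_usec_total / (double)calls;
    return per_call < THROTTLE_PERCALL_USEC;  /* frequent and lightweight */
}
```

A routine called 10001 times with 50000 microseconds of total inclusive time averages about 5 microseconds per call and would be throttled; one called 20000 times with 400000 microseconds total averages 20 microseconds per call and would not.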
80
Selective Instrumentation File
• Specify a list of routines to exclude or include (case sensitive)
• # is a wildcard in a routine name. It cannot appear in the first column.
BEGIN_EXCLUDE_LIST
Foo
Bar
D#EMM
END_EXCLUDE_LIST
• Specify a list of routines to include for instrumentation
BEGIN_INCLUDE_LIST
int main(int, char **)
F1
F3
END_INCLUDE_LIST
• Specify either an include list or an exclude list!
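To make the `#` wildcard concrete, here is a small matcher that treats `#` as "any run of characters" and compares everything else case-sensitively. It is a sketch under assumptions: `routine_matches` is a hypothetical name, this is not the tau_instrumentor implementation, and whether `#` may match an empty run in TAU is assumed here rather than taken from the slides.

```c
#include <stdbool.h>

/* Hypothetical matcher, not the tau_instrumentor implementation.
   '#' in the pattern matches any run of characters (assumed to include
   the empty run); every other character must match exactly. */
bool routine_matches(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';
    if (*pattern == '#') {
        /* Try every possible length for the wildcard. */
        for (const char *p = name; ; p++) {
            if (routine_matches(pattern + 1, p))
                return true;
            if (*p == '\0')
                return false;
        }
    }
    return *name == *pattern && routine_matches(pattern + 1, name + 1);
}
```

Under these assumptions the exclude entry `D#EMM` from the list above matches a routine named `DGEMM` but not `dgemm`, since matching is case sensitive.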
81
Selective Instrumentation File
• Optionally specify a list of files to exclude or include (case sensitive)
• * and ? may be used as wildcard characters in a file name
BEGIN_FILE_EXCLUDE_LIST
f*.f90
Foo?.cpp
END_FILE_EXCLUDE_LIST
• Specify a list of files to include for instrumentation
BEGIN_FILE_INCLUDE_LIST
main.cpp
foo.f90
END_FILE_INCLUDE_LIST
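The file wildcards behave like shell globs, so the effect of a file list can be approximated with Python's fnmatch module. This is a sketch under that assumption; file_selected is a hypothetical helper, and fnmatchcase also interprets [..] character classes, which the slide does not mention.

```python
from fnmatch import fnmatchcase


def file_selected(filename, include=(), exclude=()):
    """Case-sensitive glob match of a file name against file include/exclude lists."""
    if include:  # a non-empty include list restricts instrumentation to its matches
        return any(fnmatchcase(filename, pattern) for pattern in include)
    return not any(fnmatchcase(filename, pattern) for pattern in exclude)


file_exclude = ["f*.f90", "Foo?.cpp"]
for name in ["f1.f90", "Foo1.cpp", "Foo12.cpp", "F1.f90", "main.cpp"]:
    print(name, file_selected(name, exclude=file_exclude))
```

Here ? stands for exactly one character, so Foo1.cpp is excluded but Foo12.cpp is not, and F1.f90 is kept because matching is case sensitive.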
82
Selective Instrumentation File
• User instrumentation commands are placed in INSTRUMENT section
• ? and * used as wildcard characters for file name, # for routine name
• \ as escape character for quotes
• Routine entry/exit, arbitrary code insertion
• Outer-loop level instrumentation
BEGIN_INSTRUMENT_SECTION
loops file="foo.f90" routine="matrix#"
file="foo.f90" line = 123 code = " print *, \" Inside foo\""
exit routine = "int foo()" code = "cout <<\"exiting foo\"<<endl;"
END_INSTRUMENT_SECTION
83
Instrumentation Specification
% tau_instrumentor
Usage : tau_instrumentor <pdbfile> <sourcefile> [-o <outputfile>] [-noinline] [-g groupname]
[-i headerfile] [-c|-c++|-fortran] [-f <instr_req_file> ]
For selective instrumentation, use -f option
% tau_instrumentor foo.pdb foo.cpp -o foo.inst.cpp -f selective.dat
% cat selective.dat
# Selective instrumentation: Specify an exclude/include list of routines/files.
BEGIN_EXCLUDE_LIST
void quicksort(int *, int, int)
void sort_5elements(int *)
void interchange(int *, int *)
END_EXCLUDE_LIST
BEGIN_FILE_INCLUDE_LIST
Main.cpp
Foo?.c
*.C
END_FILE_INCLUDE_LIST
# Instruments routines in Main.cpp, Foo?.c and *.C files only
# Use BEGIN_[FILE]_INCLUDE_LIST with END_[FILE]_INCLUDE_LIST
84
Automatic Outer Loop Level Instrumentation
BEGIN_INSTRUMENT_SECTION
loops file="loop_test.cpp" routine="multiply"
# it also understands # as the wildcard in routine name
# and * and ? wildcards in file name.
# You can also specify the full
# name of the routine as is found in profile files.
#loops file="loop_test.cpp" routine="double multiply#"
END_INSTRUMENT_SECTION
% pprof
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0         0.12       25,162           1           1   25162827 int main(int, char **)
100.0        0.175       25,162           1           4   25162707 double multiply()
 90.5       22,778       22,778           1           0   22778959 Loop: double multiply() [ file = <loop_test.cpp> line,col = <23,3> to <30,3> ]
  9.3        2,345        2,345           1           0    2345823 Loop: double multiply() [ file = <loop_test.cpp> line,col = <38,3> to <46,7> ]
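As a quick check on the listing, the %Time column is consistent with each entry's inclusive time divided by the inclusive time of main. The Python sketch below reproduces that arithmetic from the usec/call values shown (each entry is called once, so usec/call equals total inclusive usec). The dictionary keys are shortened labels chosen here for illustration.

```python
# Inclusive times in microseconds, copied from the pprof listing above.
inclusive_usec = {
    "int main(int, char **)": 25162827,
    "double multiply()": 25162707,
    "Loop: double multiply() <23,3> to <30,3>": 22778959,
    "Loop: double multiply() <38,3> to <46,7>": 2345823,
}

# Assumption: percentages are relative to the top-level routine, main.
total_usec = inclusive_usec["int main(int, char **)"]


def percent_of_total(usec, total=total_usec):
    """Inclusive time as a percentage of the total, rounded to one decimal."""
    return round(100.0 * usec / total, 1)


for name, usec in inclusive_usec.items():
    print(f"{percent_of_total(usec):5.1f}  {name}")
```

The two loops account for about 90.5% and 9.3% of the run, which matches the listing and shows that nearly all of multiply() is spent in its first outer loop.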
85
TAU_REDUCE
• Reads profile files and rules
• Creates selective instrumentation file
– Specifies which routines should be excluded from instrumentation
[Diagram: profile and rules are the inputs to tau_reduce, which outputs the selective instrumentation file]
86
Optimizing Instrumentation Overhead: Rules
• #Exclude all events that are members of TAU_USER
#and use less than 1000 microseconds
TAU_USER:usec < 1000
• #Exclude all events that have less than 1000
#microseconds and are called only once
usec < 1000 & numcalls = 1
• #Exclude all events that have less than 1000 usecs per
#call OR have a (total inclusive) percent less than 5
usecs/call < 1000
percent < 5
• Scientific notation can be used
– usec>1000 & numcalls>400000 & usecs/call<30 & percent>25
• Usage:
% pprof -d > pprof.dat
% tau_reduce -f pprof.dat -r rules.txt -o select.tau
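The rule semantics described above (conditions joined by & on one line must all hold, separate lines act as alternatives, and an optional GROUP: prefix restricts a rule to members of that group) can be sketched in Python. This is a hypothetical evaluator for illustration only, not the tau_reduce implementation; the event dictionaries and function names are assumptions.

```python
import operator
import re

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}
# One condition: field name, comparison operator, number (scientific notation allowed).
CONDITION = re.compile(r"\s*([a-z/]+)\s*([<>=])\s*([0-9.eE+-]+)\s*$")


def rule_matches(rule, event):
    """True if every '&'-joined condition on one rule line holds for the event."""
    if ":" in rule:
        group, rule = rule.split(":", 1)
        if group.strip() not in event["groups"]:
            return False
    for condition in rule.split("&"):
        field, op, value = CONDITION.match(condition).groups()
        if not OPS[op](event[field], float(value)):
            return False
    return True


def is_excluded(event, rules):
    """Separate rule lines are alternatives: any matching line excludes the event."""
    return any(rule_matches(rule, event) for rule in rules)


rules = ["TAU_USER:usec < 1000", "usec < 1000 & numcalls = 1", "percent < 5"]
heavy = {"groups": {"TAU_DEFAULT"}, "usec": 5e6, "numcalls": 10,
         "usecs/call": 5e5, "percent": 40.0}
tiny_user = {"groups": {"TAU_USER"}, "usec": 500, "numcalls": 3,
             "usecs/call": 166.7, "percent": 20.0}

print(is_excluded(heavy, rules))
print(is_excluded(tiny_user, rules))
```

With these rules the heavy routine stays instrumented, while the small TAU_USER event is excluded by the first rule line.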
87
TAU Tracing Enhancements
• Configure TAU with -TRACE -otf=<dir> option
% configure -TRACE -otf=<dir> …
Generates tau_merge, tau2vtf, tau2otf tools in <tau>/<arch>/bin directory
• Instrument and execute application
% tau_f90.sh app.f90 -o app
% mpirun -np 4 app
• Merge and convert trace files to OTF format
% tau2otf tau.trc tau.edf app.otf [-z] [-n <nstreams>]
% vampir app.otf
OR use VNG to analyze OTF/VTF trace files
88
Environment Variables
• Configure TAU with -TRACE -otf=<dir> option
% configure -TRACE -otf=<dir> -MULTIPLECOUNTERS -papi=<dir> -mpi -pdt=dir …
• Set environment variables
% setenv TRACEDIR /p/gm1/<login>/traces
% setenv COUNTER1 GET_TIME_OF_DAY (reqd)
% setenv COUNTER2 PAPI_FP_INS
% setenv COUNTER3 PAPI_TOT_CYC …
• Execute application
% mpirun -np 32 ./a.out [args]
% tau_treemerge.pl; tau2otf/tau2vtf ...
89
Outline
• An overview of OTF, TAU and Vampir/VNG
• OTF
– Tools
– API
– Building trace conversion tools
• TAU
– Instrumentation
– Measurement
– Analysis
• Scalable Tracing
– Vampir
– VNG
– OTF
90
Using Vampir Next Generation (VNG)
91
VNG Timeline Display
92
VNG Calltree Display
93
VNG Timeline Zoomed In
94
VNG Grouping of Interprocess Communications
95
VNG Process Timeline with PAPI Counters
96
OTF/VNG Support for Counters
97
VNG Communication Matrix Display
98
VNG Message Profile
99
VNG Process Activity Chart
100
VNG Preferences
101
Support Acknowledgements
• Lawrence Livermore National Laboratory (LLNL)
• Department of Energy (DOE)
– Office of Science contracts
– LLNL ParaTools/GWT contract
• University of Oregon
• T.U. Dresden, GWT
• Research Centre Juelich
102