Recent Advances in the
TAU Performance System
Allen D. Malony, Sameer Shende
{malony,shende}@cs.uoregon.edu
Department of Computer and Information Science
Computational Science Institute
University of Oregon
Outline
Complexity and performance technology
What is the TAU performance system?
Problems currently being investigated
Instrumentation control and selection
Performance mapping and callpath profiling
Online performance analysis and visualization
Performance analysis for component software
Performance database framework
Concluding remarks
Recent Advances in the TAU Performance System
LLNL, September 2002
Complexity in Parallel and Distributed Systems
Complexity in computing system architecture
Diverse parallel and distributed system architectures
shared / distributed memory, cluster, hybrid, NOW, Grid, …
Sophisticated processor / memory / network architectures
Complexity in parallel software environment
Diverse parallel programming paradigms
Optimizing compilers and sophisticated runtime systems
Advanced numerical libraries and application frameworks
Hierarchical, multi-level software architectures
Multi-component, coupled simulation models
Complexity Determines Performance Requirements
Performance observability requirements
Multiple levels of software and hardware
Different types and detail of performance data
Alternative performance problem solving methods
Multiple targets of software and system application
Performance technology requirements
Broad scope of performance observation
Flexible and configurable mechanisms
Technology integration and extension
Cross-platform portability
Open, layered, and modular framework architecture
Complexity Challenges for Performance Tools
Computing system environment complexity
Observation integration and optimization
Access, accuracy, and granularity constraints
Diverse/specialized observation capabilities/technology
Restricted modes limit performance problem solving
Sophisticated software development environments
Programming paradigms and performance models
Performance data mapping to software abstractions
Uniformity of performance abstraction across platforms
Rich observation capabilities and flexible configuration
Common performance problem solving methods
General Problems (Performance Technology)
How do we create robust and ubiquitous
performance technology for the analysis and tuning
of parallel and distributed software and systems in
the presence of (evolving) complexity challenges?
How do we apply performance technology effectively
for the variety and diversity of performance
problems that arise in the context of complex
parallel and distributed computer systems?
TAU Performance System Framework
Tuning and Analysis Utilities (aka Tools Are Us)
Performance system framework for scalable parallel and
distributed high-performance computing
Targets a general complex system computation model
nodes / contexts / threads
Multi-level: system / software / parallelism
Measurement and analysis abstraction
Integrated toolkit for performance instrumentation,
measurement, analysis, and visualization
Portable performance profiling/tracing facility
Open software approach
TAU Performance System Architecture
Paraver
EPILOG
Instrumentation Control and Selection
Selection of which performance events to observe
How is selection supported in instrumentation system?
Could depend on scope, type, level of interest
Could depend on instrumentation overhead
No choice
Include / exclude lists (TAU)
Environment variables
Static vs. dynamic
Problem: Controlling instrumentation of small routines
High relative measurement overhead
Significant intrusion and possible perturbation
Rule-Based Overhead Analysis (N. Trebon, UO)
Analyze the performance data to determine events with
high (relative) overhead performance measurements
Create a select list for excluding those events
Rule grammar (used in TAUreduce tool)
[GroupName:] Field Operator Number
GroupName indicates rule applies to events in group
Field is an event metric attribute (from profile statistics)
numcalls, numsubs, percent, usec, cumusec, totalcount, stdev, usecs/call, counts/call
Operator is one of >, <, or =
Number is any number
Compound rules possible using & between simple rules
TAUReduce Example
tau_reduce implements overhead reduction in TAU
Consider klargest example
Find the kth largest element among N elements
Compare two methods: quicksort, select_kth_largest
Testcase: i = 2324, N = 1000000 (uninstrumented)
quicksort: (wall clock) = 0.188511 secs
select_kth_largest: (wall clock) = 0.149594 secs
Total: (P3/1.2GHz time) = 0.340u 0.020s 0:00.37
Execute with all routines instrumented
Execute with rule-based selective instrumentation
usec>1000 & numcalls>400000 & usecs/call<30 & percent>25
Simple sorting example on one processor
Before selective instrumentation reduction
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0           13        4,982           1           4    4982030  int main
 93.5        3,223        4,659 4.20241E+06 1.40268E+07          1  void quicksort
 62.9      0.00481        3,134           5           5     626839  int kth_largest_qs
 36.4          137        1,813          28      450057      64769  int select_kth_largest
 33.6          150        1,675      449978      449978          4  void sort_5elements
 28.8        1,435        1,435 1.02744E+07           0          0  void interchange
  0.4           20           20           1           0      20668  void setup
  0.0       0.0118       0.0118          49           0          0  int ceil
After selective instrumentation reduction
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0           14          383           1           4     383333  int main
 50.9          195          195           5           0      39017  int kth_largest_qs
 40.0          153          153          28          79       5478  int select_kth_largest
  5.4           20           20           1           0      20611  void setup
  0.0         0.02         0.02          49           0          0  int ceil
Performance Mapping
Associate performance with “significant” entities (events)
Source code points are important
Functions, regions, control flow events, user events
Execution process and thread entities are important
Some entities are more abstract, harder to measure
Consider callgraph (callpath) profiling
Measure time (metric) along an edge (path) of callgraph
incident edge gives parent / child view
edge sequence (path) gives parent / descendant view
Problem: Callpath profiling when callgraph is unknown
Determine callgraph dynamically at runtime
Map performance measurement to dynamic call path state
Callgraph (Callpath) Profiling
[Callgraph figure with nodes A–I]
0-level callpath: a callgraph node (e.g., A)
1-level callpath: an immediate-descendant edge (e.g., A→B, E→I, D→H; C→H ?)
k-level callpath (k>1): a k-call descendant path (2-level: A→D, C→I; 2-level: A→I ?; 3-level: A→H)
1-Level Callpath Profiling in TAU (S. Shende, UO)
TAU maintains a performance event (routine) callstack
Profiled routine (child) looks in callstack for parent
Previous profiled performance event is the parent
A callpath profile structure is created the first time the parent calls
TAU records parent in a callgraph map for child
String representing 1-level callpath used as its key
“a( )=>b( )” : name for time spent in “b” when called by “a”
Map returns pointer to callpath profile structure
1-level callpath is profiled using this profiling data
Build upon TAU’s performance mapping technology
Measurement is independent of instrumentation
Callpath Profiling Example (NAS LU v2.3)
% configure -PROFILECALLPATH -SGITIMERS -arch=sgi64
-mpiinc=/usr/include -mpilib=/usr/lib64 -useropt=-O2
Callpath Parallel Profile Display
0-level and 1-level callpath grouping
0-Level Callpath
1-Level Callpath
Performance Monitoring and Steering
Desirable to monitor performance during execution
Large-scale parallel applications complicate solutions
Long-running applications
Steering computations for improved performance
More parallel threads of execution producing data
Relatively large amount of performance data to access
Analysis and visualization more difficult
Problem: Online performance data access and analysis
Incremental profile sampling (based on files)
Integration in computational steering system
Dynamic performance measurement and access
Online Performance Analysis (K. Li, UO)
SCIRun (Univ. of Utah)
[Architecture diagram: the application, instrumented with the TAU Performance System, writes // performance data output as accumulated samples to the file system; a Performance Data Reader, Performance Data Integrator, and Performance Analyzer (handling sample sequencing and reader synchronization) deliver // performance data streams to the Performance Visualizer for performance steering]
2D Field Performance Visualization in SCIRun
SCIRun program
Uintah Computational Framework (UCF)
University of Utah
UCF analysis
Scheduling
MPI library
Components
500 processes
Use for online and offline visualization
Apply SCIRun steering
Performance Analysis of Component Software
Complexity in scientific problem solving addressed by
advances in software development environments
rich layered software middleware and libraries
Increases complexity in performance problem solving
Integration barriers for performance technology
Incompatible with advanced software technology
Inconsistent with software engineering process
Problem: Performance engineering for component systems
Respect software development methodology
Leverage software implementation technology
Look for opportunities for synergy and optimization
Focus on Component Technology and CCA
Emerging component technology for HPC and Grid
Component: software object embedding functionality
Component architecture (CA): how components connect
Component framework: implement a CA
Common Component Architecture (CCA)
Standard foundation for scientific component architecture
Component descriptions
Scientific Interface Description Language (SIDL)
CCA ports for component interactions (provides and uses)
CCA services: directory, registry, connection, event
High-performance components and interactions
Extend Component Design for Performance
[Diagram: generic component]
Compliant with component architecture
Component composition performance engineering
Utilize technology and services of component framework
Performance Knowledge
Describe and store a component’s “known” performance
Benchmark characterizations in performance database
Models of performance
empirical-based
simulation-based
analytical-based
Saved information about component performance
Use for performance-guided selection and deployment
Use for runtime adaptation
Representation must be in common forms with standard
means for accessing the performance information
Performance Knowledge Repository & Component
Component performance repository
Implement in component architecture framework
Similar to CCA component repository
Access by component infrastructure
View performance knowledge as component (PKC)
PKC ports give access to performance knowledge
to other components
back to original component
Static/dynamic component control and composition
Component composition performance knowledge
Performance Observation
Ability to observe execution performance is important
Empirically-derived performance knowledge requires it
does not require measurement integration in component
Monitor during execution to make dynamic decisions
measurement integration is key
Performance observation integration
Component integration: core and variant
Runtime measurement and data collection
On-line and off-line performance analysis
Performance observation technology must be as portable
and robust as component software
Performance Observation Component (POC)
Performance observation in a
performance-engineered
component model
Functional extension of original component design
Include new component methods and ports for other components to access measured performance data
Allow original component to access performance data
encapsulated as tightly-coupled and co-resident performance observation object
POC “provides” port allows use of optimized interfaces to access “internal” performance observations
Architecture of a Performance Component
Each component advertises its services
Performance component:
Ports
Timer (start/stop)
Event (trigger)
Query (timers…)
Knowledge (component performance model)
Prototype implementation of timer
CCAFFEINE reference framework
http://www.cca-forum.org/ccafe.html
SIDL
Instantiate with TAU functionality
TimerPort Interface Declaration in CCAFFEINE
Create Timer port abstraction
namespace performance{
namespace ccaports{
/**
* This abstract class declares the Timer interface.
* Inherit from this class to provide functionality.
*/
class Timer : /* implementation of port */
    public virtual gov::cca::Port { /* inherits from port spec */
public:
  virtual ~Timer() { }
  /**
   * Start the Timer. Implement this function in
   * a derived class to provide required functionality.
   */
  virtual void start(void) = 0; /* pure virtual methods -- */
  virtual void stop(void) = 0;  /* implemented by derived classes */
  ...
};
} // namespace ccaports
} // namespace performance
Using Performance Component Timer
Component uses framework services to get TimerPort
Use of this TimerPort interface is independent of TAU
// Get Timer port from CCA framework services (CCAFFEINE)
port = frameworkServices->getPort("TimerPort");
if (port)
    timer_m = dynamic_cast<performance::ccaports::Timer *>(port);
if (timer_m == 0) {
    cerr << "Connected to something, not a Timer port" << endl;
    return -1;
}
string s = "IntegrateTimer";
timer_m->setName(s);              // assign name to timer
timer_m->start();                 // start timer (independent of tool)
for (int i = 0; i < count; i++) {
    double x = random_m->getRandomNumber();
    sum = sum + function_m->evaluate(x);
}
timer_m->stop();                  // stop timer
Using SIDL for Language Interoperability
Can create Timer interface in SIDL for creating stubs
//
// File: performance.sidl
//
version performance 1.0;
package performance {
class Timer {
void start();
void stop();
void setName(in string name);
string getName();
void setType(in string name);
string getType();
void setGroupName(in string name);
string getGroupName();
void setGroupId(in long group);
long getGroupId();
}
}
Using SIDL Interface for Timers
C++ program that uses the SIDL Timer interface
Again, independent of timer implementations (e.g., TAU)
// SIDL:
#include "performance_Timer.hh"
int main(int argc, char* argv[])
{
performance::Timer t = performance::Timer::_create();
...
t.setName("Integrate timer");
t.start();
// Computation
for (int i = 0; i < count; i++) {
double x = random_m->getRandomNumber ();
sum = sum + function_m->evaluate (x);
}
...
t.stop();
return 0;
}
Using TAU Component in CCAFFEINE
/* get application components from repository */
repository get Driver
repository get MidpointIntegrator
repository get MonteCarloIntegrator
repository get RandomGenerator
repository get LinearFunction
repository get NonlinearFunction
repository get PiFunction
/* get TAU component from repository */
repository get TauTimer
/* create component instances */
create LinearFunction lin_func
create NonlinearFunction nonlin_func
create PiFunction pi_func
create MonteCarloIntegrator mc_integrator
create RandomGenerator rand
/* create TAU component instance */
create TauTimer tau
/* connecting components and running */
connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
connect mc_integrator FunctionPort nonlin_func FunctionPort
connect mc_integrator TimerPort tau TimerPort
create Driver driver
connect driver IntegratorPort mc_integrator IntegratorPort
go driver Go
quit
Component Composition Performance Engineering
Performance of component-based scientific applications
depends on the interplay of
Component functions
Computational resources available
Management of component compositions throughout
execution is critical to successful deployment and use
Identify key technological capabilities needed to support
the performance engineering of component compositions
Two model concepts
Performance awareness
Performance attention
Performance Awareness of Component Ensembles
Composition performance knowledge and observation
Composition performance knowledge
Can come from empirical and analytical evaluation
Can utilize information provided at the component level
Can be stored in repositories for future review
Extends the notion of component observation to
ensemble-level performance monitoring
Associate monitoring components with hierarchical component groupings
Build upon component-level observation support
Monitoring components act as performance integrators
and routers
Use component framework mechanisms
Performance Databases
Focus on empirical performance optimization process
Necessary for multi-results performance analysis
Integral component of performance analysis framework
Multiple experiments (codes, versions, platforms, …)
Historical performance comparison
Improved performance analysis architecture design
More flexible and open tool interfaces
Supports extensibility and foreign tool interaction
Performance analysis collaboration
Performance tool sharing
Performance data sharing and knowledge base
Empirical-Based Performance Optimization
[Process cycle diagram]
Performance Tuning → hypotheses
Performance Diagnosis → properties
Performance Experimentation → characterization (Experiment Schemas, Experiment Trials)
Performance Observation → observability requirements
TAU Performance Database Framework
[Diagram: raw performance data → PerfDML translators (performance data description) → PerfDB (ORDB: PostgreSQL, …) → performance analysis and query toolkit → performance analysis programs]
profile data only
XML representation (PerfDML)
project / experiment / trial
PerfDBF Components
Performance Data Meta Language (PerfDML)
Common performance data representation
Performance meta-data description
Translators to common PerfDML data representation
Performance DataBase (PerfDB)
Standard database technology (SQL)
Free, robust database software (PostgreSQL)
Commonly available APIs
Performance DataBase Toolkit (PerfDBT)
Commonly used modules for query and analysis
Facilitate analysis tool development
Common and Extensible Profile Data Format
Goals
Capture data from profile tools in common representation
Implement representation in a standard format
Allow for extension of format for new profile data objects
Based on XML (obvious choice)
Leverage XML tools and APIs
XML parsers, Sun’s Java SDK, …
XML verification systems (DTDs and schemas)
Target for profile data translation tools
eXtensible Stylesheet Language Transformations (XSLT)
Which performance profile data are of interest?
Focus on TAU and consider other profiling tools
Performance Profiling
Performance data about program entities and behaviors
Statistics data
Execution time, number of calls, number of FLOPS ...
Characterization data
Parallel profiles
Code regions: functions, loops, basic blocks
Actions or states
Captured per process and/or per thread
Program-level summaries
Profiling tools
prof/gprof, ssrun, uprofile/dpci, cprof/vprof, …
TAU Parallel Performance Profiles
PerfDBF Example
NAS Parallel Benchmark LU
% configure -mpiinc=/usr/include -mpilib=/usr/lib64
-arch=sgi64 -fortran=sgi -SGITIMERS -useropt=-O2
[Diagram: NPB profiled with TAU → standard TAU output data format → TAU-to-XML converter → TAU XML format → database loader → SQL database → analysis tool]
Scalability Analysis Process
Scalability study on LU
Vary number of processes: 1, 2, 4, and 8
% mpirun -np 1 lu.W1
% mpirun -np 2 lu.W2
% mpirun -np 4 lu.W4
% mpirun -np 8 lu.W8
Populate the performance database
run Java translator to translate profiles into XML
run Java XML reader to write XML profiles to database
Read times for routines and program from experiments
Calculate scalability metrics
Raw TAU Profile Data
Raw data output
One processor:
"applu" 1 15 2939.096923828125 248744666.5830078 0 GROUP="applu"
Four processors:
"applu" 1 15 2227.343994140625 51691412.17797852 0 GROUP="applu"
"applu" 1 15 2227.343994140625 51691412.17797852 0 GROUP="applu"
"applu" 1 14 596.568115234375 51691519.34106445 0 GROUP="applu"
"applu" 1 14 616.833251953125 51691377.21313477 0 GROUP="applu"
XML Profile Representation
One processor
<instrumentedobj>
<funcname> 'applu '</funcname>
<funcID>8</funcID>
<inclperc>100.0</inclperc>
<inclutime>2.487446665830078E8</inclutime>
<exclperc>0.0</exclperc>
<exclutime>2939.096923828125 </exclutime>
<call>1</call>
<subrs>15</subrs>
<inclutimePcall>2.487446665830078E8</inclutimePcall>
</instrumentedobj>
XML Representation
Four processor mean
<meanfunction>
<funcname>'applu '</funcname>
<funcID>12</funcID>
<inclperc>100.0</inclperc>
<inclutime>5.169148940026855E7</inclutime>
<exclperc>0.0</exclperc>
<exclutime>1044.487548828125</exclutime>
<call>1</call>
<subrs>14.25</subrs>
<inclutimePcall>5.1691489E7</inclutimePcall>
</meanfunction>
Contents of Performance Database
Scalability Analysis Results
Scalability of LU performance experiments
Four trial runs

Funcname | processors | meanspeedup
…        |            |
applu    | 2          | 2.0896117809566
applu    | 4          | 4.812100975788783
applu    | 8          | 8.168409581149514
…        |            |
exact    | 2          | 1.95853126762839071803
exact    | 4          | 4.03622321124616535446
exact    | 8          | 7.193812137750623668346
Current PerfDBF Status and Future
PerfDBF prototype
TAU profile to XML translator
XML to PerfDB populator
PostgreSQL database
Java-based PostgreSQL query module
Use as a layer to support performance analysis tools
Make accessing the performance database quicker
Continue development
XML parallel profile representation
Basic specification
Opportunity for APART to define a common format
Performance Tracking and Reporting
Integrated performance measurement allows performance
analysis throughout development lifetime
Applied performance engineering in software design and
development (software engineering) process
Create “performance portfolio” from regular performance
experimentation (couple with software testing)
Use performance knowledge in making key software
design decisions, prior to major development stages
Use performance benchmarking and regression testing to
identify irregularities
Support automatic reporting of “performance bugs”
Enable cross-platform (cross-generation) evaluation
XPARE - eXPeriment Alerting and REporting
Experiment launcher automates measurement / analysis
Configuration and compilation of performance tools
Instrumentation control for Uintah experiment type
Execution of multiple performance experiments
Performance data collection, analysis, and storage
Integrated in Uintah software testing harness
Reporting system conducts performance regression tests
Apply performance difference thresholds (alert ruleset)
Alerts users via email if thresholds have been exceeded
Web alerting setup and full performance data reporting
Historical performance data analysis
XPARE System Architecture
[Architecture diagram: Experiment Launch, Alerting Setup, Performance Database, Comparison Tool, Regression Analyzer, Performance Reporter, mail server, web server]
Concluding Remarks
Complex software and parallel computing systems pose
challenging performance analysis problems that require
robust methodologies and tools
To build more sophisticated performance tools, existing
proven performance technology must be utilized
Performance tools must be integrated with software and
systems models and technology
Performance-engineered software must function consistently
and coherently in software and system environments
TAU performance system offers robust performance
technology that can be broadly integrated