Integrating Performance Analysis in the Uintah Software Development Cycle
ISHPC 2002, May 16, 2002

Allen D. Malony, Sameer Shende
{malony,sameer}@cs.uoregon.edu
Department of Computer and Information Science
Computational Science Institute
University of Oregon

J. Davison de St. Germain, Allan Morris, Steven G. Parker
{dav,amorris,sparker}@cs.utah.edu
Department of Computer Science, School of Computing
University of Utah

Outline

- Scientific software engineering
- C-SAFE and the Uintah Computational Framework (UCF)
- TAU performance system
- Role of performance mapping
- Performance analysis integration in the UCF
  - Goals and design
  - Challenges for performance technology integration
  - TAU performance mapping
  - XPARE
- Concluding remarks

Scientific Software (Performance) Engineering

- Modern scientific simulation software is complex
  - Large development teams of diverse expertise
  - Simultaneous development on different system parts
  - Iterative, multi-stage, long-term software development
- Need support for managing a complex software process
  - Software engineering tools for revision control, automated testing,
    and bug tracking are commonplace
  - In contrast, tools for performance engineering are not
    - evaluation (measurement, analysis, benchmarking)
    - optimization (diagnosis, tracking, prediction, tuning)
- Incorporate a performance engineering methodology and support it with
  flexible and robust performance tools

Utah ASCI/ASAP Level 1 Center (C-SAFE)

- C-SAFE was established to build a problem-solving environment (PSE) for
  the numerical simulation of accidental fires and explosions
  - Combine fundamental chemistry and engineering physics
  - Integrate non-linear solvers, optimization, computational steering,
    visualization, and experimental data verification
  - Support very large-scale coupled simulations
- Computer science problems:
  - Coupling multiple scientific simulation codes with different numerical
    and software properties
  - Software engineering across diverse expert teams
  - Achieving high performance on large-scale systems

Example C-SAFE Simulation Problems

[Figures: heptane fire simulation and material stress simulation. A typical
C-SAFE simulation has a billion degrees of freedom and non-linear time
dynamics.]

Uintah Problem Solving Environment (PSE)

- Enhanced SCIRun PSE
  - Pure dataflow → component-based
  - Shared memory → scalable multi-/mixed-mode parallelism
  - Interactive only → interactive plus standalone
- Design and implement the Uintah component architecture
  - Application programmers provide
    - a description of the computation (tasks and variables)
    - code to perform a task on a single “patch” (sub-region of space)
  - Components for scheduling, partitioning, load balance, …
  - Follow the Common Component Architecture (CCA) model
- Design and implement the Uintah Computational Framework (UCF) on top of
  the component architecture

Uintah High-Level Component View

[Figure: high-level component diagram of Uintah.]

Uintah Parallel Component Architecture

[Diagram of the Uintah parallel component architecture: a C-SAFE problem
specification drives components including the Simulation Controller,
Scheduler, MPM, Fluid Model, Mixing Model, Subgrid Model, High Energy
Simulations, Numerical Solvers, Material Properties Database, Chemistry
Databases, Chemistry Database Controller, Data Manager, Checkpointing,
Visualization, Parallel Services, Resource Management, Performance Analysis,
and Post Processing and Analysis, layered over the UCF. Non-PSE components
are implicitly connected to all components.]

Uintah Computational Framework (UCF)

- Execution model based on software (macro) dataflow
  - Exposes parallelism and hides data transport latency
  - Computations expressed as directed acyclic graphs of tasks
    - A task consumes input and produces output (input to a future task)
    - Inputs/outputs are specified for each patch in a structured grid
- Abstraction of a global single-assignment memory (see the sketch below)
  - DataWarehouse
  - Directory mapping names to values (array structured)
  - Write a value once, then communicate it to awaiting tasks
- The task graph gets mapped to processing resources
- The communication schedule approximates the global optimum

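To make the single-assignment abstraction concrete, the following is a minimal
sketch of a write-once directory keyed by variable name and patch, assuming a
hypothetical ToyDataWarehouse class (this is not the actual Uintah
DataWarehouse interface):

  // Write-once (single-assignment) storage; class and method names are
  // hypothetical, not the Uintah DataWarehouse API.
  #include <map>
  #include <stdexcept>
  #include <string>
  #include <utility>
  #include <vector>

  class ToyDataWarehouse {
  public:
    // Each (variable, patch) pair may be assigned exactly once.
    void put(const std::string& var, int patch, std::vector<double> value) {
      auto key = std::make_pair(var, patch);
      if (store_.count(key))
        throw std::runtime_error("single-assignment violation: " + var);
      store_[key] = std::move(value);
    }

    // A value can be read only after its producer task has written it; in the
    // real framework this is where data is communicated to awaiting tasks.
    const std::vector<double>& get(const std::string& var, int patch) const {
      auto it = store_.find(std::make_pair(var, patch));
      if (it == store_.end())
        throw std::runtime_error("value not yet produced: " + var);
      return it->second;
    }

  private:
    std::map<std::pair<std::string, int>, std::vector<double>> store_;
  };
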
Uintah Task Graph (Material Point Method)

- Diagram of named tasks (ovals) and data (edges)
- Imminent computation
  - Dataflow-constrained
- MPM
  - Newtonian material point motion time step
  - Solid: values defined at material point (particle)
  - Dashed: values defined at vertex (grid)
  - Prime (’): values updated during time step

Example Taskgraphs (MPM and Coupled)

[Figures: example task graphs for MPM and a coupled simulation.]

Uintah PSE

- UCF automatically sets up:
  - Domain decomposition
  - Inter-processor communication with aggregation/reduction
  - Parallel I/O
  - Checkpoint and restart
  - Performance measurement and analysis (stay tuned)
- Software engineering
  - Coding standards
  - CVS (commits: Y3 - 26.6 files/day, Y4 - 29.9 files/day)
  - Correctness regression testing with bugzilla bug tracking
  - Nightly build (parallel compiles)
  - 170,000 lines of code (Fortran and C++ tasks supported)

Performance Technology Integration

- Uintah presents challenges to performance integration
  - Software diversity and structure
    - UCF middleware, simulation code modules
    - Component-based hierarchy
  - Portability objectives
    - Cross-language and cross-platform
    - Multi-parallelism: thread, message passing, mixed
  - Scalability objectives
  - High-level programming and execution abstractions
- Requires flexible and robust performance technology
- Requires support for performance mapping

TAU Performance System Framework

- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed
  high-performance computing
- Targets a general complex system computation model
  - Nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement,
  analysis, and visualization
  - Portable performance profiling/tracing facility
  - Open software approach

TAU Performance System Architecture

[Figure: TAU framework architecture; trace output paths to Paraver and
EPILOG are shown.]

Performance Analysis Objectives for Uintah

- Micro tuning
  - Optimization of simulation code (task) kernels for maximum serial
    performance
- Scalability tuning
  - Identification of parallel execution bottlenecks
    - Overheads: scheduler, data warehouse, communication
    - Load imbalance
  - Adjustment of task graph decomposition and scheduling
- Performance tracking
  - Understand performance impacts of code modifications
  - Throughout the course of software development
    - C-SAFE application and UCF software

Task Execution in Uintah Parallel Scheduler

- Profile methods and functions in the scheduler and in the MPI library
  (a sketch of MPI-library interposition follows below)
  - Task execution time dominates (what task?)
  - Task execution time distribution per process
  - MPI communication overheads (where?)
- Need to map performance data!

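To show how the MPI-library side of this measurement can be captured without
modifying Uintah or the MPI implementation, here is a minimal sketch of
interposition through the standard PMPI profiling interface (the general
mechanism TAU-style tools use; this is not TAU's actual source, the timing
below is a placeholder, and the signature assumes an MPI-3 mpi.h):

  // Wrapper linked in place of the vendor MPI_Send: time the call, then
  // forward to the real implementation through the PMPI entry point.
  #include <mpi.h>

  int MPI_Send(const void* buf, int count, MPI_Datatype datatype,
               int dest, int tag, MPI_Comm comm) {
    double t0 = MPI_Wtime();                      // start "MPI_Send" timer
    int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
    double elapsed = MPI_Wtime() - t0;            // charge to the MPI_Send profile
    (void)elapsed;                                // placeholder for profile update
    return rc;
  }
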
Semantics-Based Performance Mapping

- Associate performance measurements with high-level semantic abstractions
- Need mapping support in the performance measurement system to assign the
  data correctly

Hypothetical Mapping Example

- Particles distributed on the surfaces of a cube

  Particle* P[MAX];  /* array of particles */

  int GenerateParticles() {
    /* distribute particles over all faces of the cube */
    for (int face = 0, last = 0; face < 6; face++) {
      /* number of particles on this face */
      int particles_on_this_face = num(face);
      for (int i = last; i < last + particles_on_this_face; i++) {
        /* particle properties are a function of face */
        P[i] = ... f(face);
        ...
      }
      last += particles_on_this_face;
    }
  }

Hypothetical Mapping Example (continued)

  int ProcessParticle(Particle* p) {
    /* perform some computation on p */
  }

  int main() {
    GenerateParticles();           /* create a list of particles */
    for (int i = 0; i < N; i++)    /* iterate over the list */
      ProcessParticle(P[i]);
  }

- How much time (flops) is spent processing the particles of face i?
- What is the distribution of performance among the faces?
- How is this determined if execution is parallel?
  (One way to attribute cost per face is sketched below.)

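Building directly on the example code above, the following is a minimal sketch
of the mapping idea, assuming the face index of each particle is available at
the call site (the accumulator array and wrapper function are hypothetical; in
TAU, the per-face accumulator would be a mapped timer object rather than a
plain double):

  // Attribute the cost of ProcessParticle() to the face the particle came from.
  // face_seconds stands in for six per-face timer/counter objects.
  #include <array>
  #include <chrono>

  static std::array<double, 6> face_seconds = {};

  void ProcessParticleMapped(Particle* p, int face) {
    auto t0 = std::chrono::steady_clock::now();
    ProcessParticle(p);                              // unmodified computation
    auto t1 = std::chrono::steady_clock::now();
    face_seconds[face] += std::chrono::duration<double>(t1 - t0).count();
  }

In a parallel run, each process or thread keeps its own per-face accumulators,
which are merged afterwards to answer the distribution question.
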
No Performance Mapping versus Mapping

- Typical performance tools report performance with respect to routines
  - They do not provide support for mapping
- TAU's performance mapping can observe performance with respect to the
  scientist's programming and problem abstractions

[Figure: side-by-side TAU profiles, without mapping and with mapping.]

Uintah Task Performance Mapping

- Uintah partitions individual particles across processing elements
  (processes or threads)
- Simulation tasks in the task graph work on particles
  - Tasks have a domain-specific character in the computation
    (“interpolate particles to grid” in the Material Point Method)
  - Task instances generated for each partitioned particle set
  - Execution scheduled with respect to task dependencies
- How to attribute execution time among different tasks? (see the sketch below)
  - Assign semantic name (task type) to a task instance
    (SerialMPM::interpolateParticleToGrid)
  - Map TAU timer object to (abstract) task (semantic entity)
  - Look up timer object using task type (semantic attribute)
  - Further partition along different domain-specific axes

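The lookup step can be pictured with a small sketch, assuming a hypothetical
map from task-type names to timer objects (this illustrates the idea of keying
the measurement on the semantic task type; it is not the Uintah scheduler code
or the TAU mapping API):

  // Find-or-create a timer keyed by the semantic task-type name, then charge
  // the task's execution to that timer rather than to generic scheduler code.
  #include <chrono>
  #include <functional>
  #include <map>
  #include <string>

  struct TaskTimer {
    double seconds = 0.0;
    long   calls   = 0;
  };

  static std::map<std::string, TaskTimer> timers_by_task_type;

  void executeMapped(const std::string& task_type,   // e.g. "SerialMPM::interpolateParticleToGrid"
                     const std::function<void()>& run_task) {
    TaskTimer& t = timers_by_task_type[task_type];   // look up by semantic attribute
    auto start = std::chrono::steady_clock::now();
    run_task();                                      // execute the task on its patch
    auto stop = std::chrono::steady_clock::now();
    t.seconds += std::chrono::duration<double>(stop - start).count();
    t.calls += 1;
  }
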
Task Performance Mapping (Profile)

[Profile screenshots: mapped task performance across processes, and
performance mapping for different tasks.]

Task Performance Mapping (Trace)

[Trace view: work packet computation events colored by task type. Distinct
phases of computation can be identified based on the task.]

Task Performance Mapping (Trace - Zoom)

[Trace zoom: startup communication imbalance.]

Task Performance Mapping (Trace - Parallelism)

[Parallelism view: communication / load imbalance.]

Comparing Uintah Traces for Scalability Analysis

[Trace timelines compared for 8-process and 32-process runs.]

Performance Tracking and Reporting

- Integrated performance measurement allows performance analysis throughout
  the development lifetime
- Apply performance engineering in the software design and development
  (software engineering) process
  - Create a “performance portfolio” from regular performance
    experimentation (coupled with software testing)
  - Use performance knowledge in making key software design decisions,
    prior to major development stages
  - Use performance benchmarking and regression testing to identify
    irregularities
  - Support automatic reporting of “performance bugs”
  - Enable cross-platform (cross-generation) evaluation

XPARE - eXPeriment Alerting and REporting

- Experiment launcher automates measurement / analysis
  - Configuration and compilation of performance tools
  - Instrumentation control for Uintah experiment type
  - Execution of multiple performance experiments
  - Performance data collection, analysis, and storage
  - Integrated in the Uintah software testing harness
- Reporting system conducts performance regression tests
  - Apply performance difference thresholds (alert ruleset); a sketch of a
    threshold check follows below
  - Alert users via email if thresholds have been exceeded
  - Web alerting setup and full performance data reporting
  - Historical performance data analysis

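To make the alert ruleset concrete, here is a minimal sketch of a threshold
check, with the rule structure, names, and the 10% figure as assumptions
rather than the actual XPARE rules: a new measurement is compared against a
stored baseline, and the experiment is flagged when the relative slowdown
exceeds the configured threshold.

  // Flag a regression when the current value exceeds the baseline by more
  // than the rule's relative threshold. Names and values are illustrative.
  #include <cstdio>

  struct AlertRule {
    const char* metric;     // e.g. "task execution time"
    double      threshold;  // allowed relative increase, e.g. 0.10 for 10%
  };

  bool exceedsThreshold(double baseline, double current, const AlertRule& rule) {
    if (baseline <= 0.0) return false;   // no baseline recorded yet
    return (current - baseline) / baseline > rule.threshold;
  }

  int main() {
    AlertRule rule{"task execution time", 0.10};
    if (exceedsThreshold(42.0 /* baseline s */, 47.5 /* current s */, rule))
      std::printf("ALERT: %s exceeded %.0f%% threshold\n",
                  rule.metric, rule.threshold * 100);
    return 0;
  }
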
XPARE System Architecture

[Diagram of the XPARE system: experiment launch, performance database,
regression analyzer, comparison tool, alerting setup, performance reporter,
and mail server.]

Scaling Performance Optimizations (Past)

[Scaling plot on ASCI Nirvana (SGI Origin 2000, Los Alamos National
Laboratory): last year's initial "correct" scheduler, then communication
reduced by 10x and task graph overhead reduced by 20x.]

Scalability to 2000 Processors (Current)

[Scaling plot on ASCI Nirvana (SGI Origin 2000, Los Alamos National
Laboratory).]

Concluding Remarks

- Modern scientific simulation environments involve a complex (scientific)
  software engineering process
  - Iterative, diverse expertise, multiple teams, concurrent
- Complex parallel software and systems pose challenging performance
  analysis problems that require flexible and robust performance technology
  and methods
  - Cross-platform, cross-language, large-scale
  - Fully-integrated performance analysis system
  - Performance mapping
- Need to support a performance engineering methodology within scientific
  software design and development
  - Performance comparison and tracking

Acknowledgements

- Department of Energy (DOE), ASCI Academic Strategic Alliances Program (ASAP)
  - Center for the Simulation of Accidental Fires and Explosions (C-SAFE),
    ASCI/ASAP Level 1 center, University of Utah
    http://www.csafe.utah.edu
  - Computational Science Institute, ASCI/ASAP Level 3 projects with
    LLNL / LANL, University of Oregon
    http://www.csi.uoregon.edu

ftp://ftp.cs.uoregon.edu/pub/malony/Talks/ishpc2002.ppt