DARPA Day 2005 Template - Parallel Programming Laboratory


Benchmarking Working Group
Session Agenda
1:00-1:15 David Koester
What Makes HPC Applications Challenging?
1:15-1:30 Piotr Luszczek
HPCchallenge Challenges
1:30-1:45 Fred Tracy
Algorithm Comparisons of Application Benchmarks
1:45-2:00 Henry Newman
I/O Challenges
2:00-2:15 Phil Colella
The Seven Dwarfs
2:15-2:30 Glenn Luecke
Run-Time Error Detection Benchmark
2:30-3:00 Break
3:00-3:15 Bill Mann
SSCA #1 Draft Specification
3:15-3:30 Theresa Meuse
SSCA #6 Draft Specification
3:30-?? Discussions — User Needs
– HPCS Vendor Needs for the MS4 Review
– HPCS Vendor Needs for the MS5 Review
– HPCS Productivity Team Working Groups
Slide-1
What Makes HPC Applications Challenging?
David Koester, Ph.D.
11-13 January 2005
HPCS Productivity Team Meeting
Marina Del Rey, CA
This work is sponsored by the Department of Defense under Army Contract W15P7T-05-C-D001.
Opinions, interpretations, conclusions, and recommendations are those of the author
and are not necessarily endorsed by the United States Government.
Slide-2
Outline
• HPCS Benchmark Spectrum
• What Makes HPC Applications Challenging?
– Memory access patterns/locality
– Processor characteristics
– Concurrency
– I/O characteristics
– What new challenges will arise from Petascale/s+ applications?
• Bottleneckology
– Amdahl’s Law
– Example: Random Stride Memory Access
• Summary
Slide-3
HPCS Benchmark Spectrum

[Figure: the HPCS Benchmark Spectrum, arranged from system bounds to execution and development performance indicators, in four tiers:
– Micro & Kernel Benchmarks: the HPCchallenge Benchmarks, with Local tests (DGEMM, STREAM, RandomAccess, 1D FFT) and Global tests (Linpack, PTRANS, RandomAccess, 1D FFT).
– HPCS Spanning Set of Kernels: discrete math, graph analysis, linear solvers, signal processing, simulation, I/O, ...
– Scalable Synthetic Compact Applications, each pairing a data generator with several kernels: 1. Optimal Pattern Matching; 2. Graph Analysis; 3. Simulation (NWCHEM); 4. Simulation (NAS PB AU); 5. Simulation (Multi-Physics); 6. Signal Processing / Knowledge Formation.
– Mission Partner Application Benchmarks: current applications (UM2000, GAMESS, OVERFLOW, LBMHD, RFCTH, HYCOM) and near-future applications (NWChem, ALEGRA, CCSM), extending toward emerging and future applications.]
Slide-4
HPCS Benchmark Spectrum

[Figure: the same HPCS Benchmark Spectrum diagram as the previous slide, annotated with the points below.]

• Full applications may be challenging due to
– Killer Kernels
– Global data layouts
– Input/Output
• Killer Kernels are challenging because of many things that link directly to architecture
• Identify bottlenecks by mapping applications to architectures
Slide-5
What Makes HPC Applications Challenging?

• Memory access patterns/locality → Killer Kernels, Global Data Layouts
– Spatial and temporal
  · Indirect addressing
  · Data dependencies
• Processor characteristics → Killer Kernels
– Processor throughput (instructions per cycle)
  · Low arithmetic density
  · Floating point versus integer
– Special features
  · GF(2) math
  · Popcount
  · Integer division
• Concurrency → Killer Kernels, Global Data Layouts
– Ubiquitous for Petascale/s
– Load balance
• I/O characteristics → Input/Output
– Bandwidth
– Latency
– File access patterns
– File generation rates

(Arrows indicate the application-challenge classes from the previous slide.)
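The access-pattern distinction above is easy to see in code. Below is a minimal C sketch (illustrative only, not taken from any benchmark) contrasting a unit-stride update with an indirect, RandomAccess-style update; the indirect gather/scatter loses locality because consecutive iterations touch unrelated addresses.

    #include <stddef.h>

    #define N (1 << 24)

    /* Unit-stride sweep: caches and hardware prefetchers work well. */
    void stride1(double *a, const double *b) {
        for (size_t i = 0; i < N; i++)
            a[i] += b[i];
    }

    /* Indirect update: idx[] is effectively random, so each access lands
       at an unpredictable address and spatial/temporal locality is lost. */
    void indirect(double *a, const double *b, const size_t *idx) {
        for (size_t i = 0; i < N; i++)
            a[idx[i]] += b[i];
    }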
Slide-6
Cray "Parallel Performance Killer" Kernels

Kernel                                 Performance Characteristic
RandomAccess                           High demand on remote memory; no locality
3D FFT                                 Non-unit strides; high bandwidth demand
Sparse matrix-vector multiply          Irregular, unpredictable locality
Adaptive mesh refinement               Dynamic data distribution; dynamic parallelism
Multi-frontal method                   Multiple levels of parallelism
Sparse incomplete factorization        Amdahl's Law bottlenecks
Preconditioned domain decomposition    Frequent large messages
Triangular solver                      Frequent small messages; poor ratio of computation to communication
Branch-and-bound algorithm             Frequent broadcast synchronization
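To make the sparse matrix-vector entry concrete, here is a minimal compressed sparse row (CSR) multiply in C; this is a sketch rather than any vendor's kernel. The gather from x[col[k]] is the source of the irregular, unpredictable locality noted in the table.

    /* y = A*x for a sparse matrix A in CSR form.
       row_ptr[i]..row_ptr[i+1] index the nonzeros of row i. */
    void spmv_csr(int n, const int *row_ptr, const int *col,
                  const double *val, const double *x, double *y) {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col[k]];   /* irregular gather from x */
            y[i] = sum;
        }
    }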
Slide-7
Killer Kernels
Phil Colella — The Seven Dwarfs
Computational Research Division
Algorithms that consume the bulk of the cycles of
current high-end systems in DOE
• Structured Grids (including locally structured grids, e.g.
AMR)
• Unstructured Grids
• Fast Fourier Transform
• Dense Linear Algebra
• Sparse Linear Algebra
• Particles
• Monte Carlo
(Should also include optimization / solution of nonlinear
systems, which at the high end is something one uses mainly
in conjunction with the other seven)
Slide-8
Mission Partner Applications
Memory Access Patterns/Locality

[Figure: two scatter plots of temporal locality (vertical axis) versus spatial locality (horizontal axis), marking the HPCS challenge points with the HPCchallenge benchmarks: HPL and FFT sit toward high temporal locality, STREAM toward high spatial locality, RandomAccess at low spatial and temporal locality, with PTRANS in between; mission partner applications such as AVUS and NAS CG C fall between these extremes.]

• How do mission partner applications relate to the HPCS spatial/temporal view of memory?
– Kernels?
– Full applications?
Slide-9
Processor Characteristics
Special Features

• Algorithmic speedup of 120x
– Comparison of similar-speed MIPS processors with and without GF(2) math and popcount
• Similar or better performance reported using Alpha processors (Jack Collins, NCIFCRF)
• Codes
– Cray-supplied library
– The Portable Cray Bioinformatics Library by ARSC
• References
– http://www.cray.com/downloads/biolib.pdf
– http://cbl.sourceforge.net/
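As background on why GF(2) math and popcount hardware pay off: a GF(2) multiply is a bitwise AND and a GF(2) add is XOR, so an inner product over bit vectors reduces to an AND followed by a population-count parity. A minimal C sketch follows; __builtin_popcountll is the GCC/Clang intrinsic, and a hardware popcount instruction is the kind of special feature the comparison above exercises.

    #include <stdint.h>

    /* Inner product of two bit vectors over GF(2), packed 64 bits per word.
       Multiplication in GF(2) is AND; addition is XOR, so the result is
       the parity of the popcount of the AND-ed words. */
    int gf2_dot(const uint64_t *a, const uint64_t *b, int nwords) {
        int parity = 0;
        for (int i = 0; i < nwords; i++)
            parity ^= __builtin_popcountll(a[i] & b[i]) & 1;
        return parity;
    }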
Slide-10
Concurrency
Insert Cluttered VAMPIR Plot here
Slide-11
Latency Differences
I/O Relative Data Latency‡

[Figure: bar chart of relative data latency on a log scale from 1.0E+00 to 1.0E+11, across the hierarchy: CPU registers, L1 cache, L2 cache, memory, disk, tape, NAS.]

Note: 11 orders of magnitude relative differences!

‡Henry Newman (Instrumental)

Slide-12
I/O Relative Data Bandwidth per CPU‡

[Figure: bar chart of relative data bandwidth per CPU (times difference) on a log scale from 1.0E-02 to 1.0E+03, across the hierarchy: CPU registers, L1 cache, L2 cache, memory, disk, tape, NAS.]

Note: 5 orders of magnitude relative differences!

‡Henry Newman (Instrumental)

Slide-13
Strawman
HPCS I/O Goals/Challenges

• 1 trillion files in a single file system
– 32K file creates per second
• 10K metadata operations per second
– Needed for checkpoint/restart files
• Streaming I/O at 30 GB/sec full duplex
– Needed for data capture
• Support for 30K nodes
– Future file systems need low-latency communication

An envelope on HPCS Mission Partner requirements
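One back-of-the-envelope consequence of these numbers (arithmetic only): at 32K creates per second, populating 10^12 files takes 10^12 / 3.2e4 ≈ 3.1e7 seconds, on the order of a year. The trillion-file goal is therefore primarily about namespace capacity and metadata scalability rather than rapid population.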
Slide-14
HPCS Benchmark Spectrum
Future and Emerging Applications

[Figure: the HPCS Benchmark Spectrum diagram, extended from existing applications toward emerging and future applications.]

• Identifying HPCS Mission Partner efforts
– 10-20K processor — 10-100 Teraflop/s scale applications
– 20-120K processor — 100-300 Teraflop/s scale applications
– Petascale/s applications
– Applications beyond Petascale/s
• LACSI Workshop — The Path to Extreme Supercomputing
– 12 October 2004
– http://www.zettaflops.org
• What new challenges will arise from Petascale/s+ applications?

Slide-15
Outline
• HPCS Benchmark Spectrum
• What Makes HPC Applications Challenging?
– Memory access patterns/locality
– Processor characteristics
– Parallelism
– I/O characteristics
– What new challenges will arise from Petascale/s+ applications?
• Bottleneckology
– Amdahl’s Law
– Example: Random Stride Memory Access
• Summary
Slide-16
Bottleneckology

• Bottleneckology
– Where is performance lost when an application is run on an architecture?
– When does it make sense to invest in architecture to improve application performance?
– System analysis driven by an extended Amdahl's Law
  · Amdahl's Law is not just about the parallel and sequential parts of applications!
• References:
– Jack Worlton, "Project Bottleneck: A Proposed Toolkit for Evaluating Newly-Announced High Performance Computers", Worlton and Associates, Los Alamos, NM, Technical Report No. 13, January 1988
– Montek Singh, "Lecture Notes — Computer Architecture and Implementation: COMP 206", Dept. of Computer Science, Univ. of North Carolina at Chapel Hill, Aug 30, 2004, www.cs.unc.edu/~montek/teaching/fall-04/lectures/lecture-2.ppt
Slide-17
Lecture Notes — Computer Architecture and Implementation (5)‡

[Image from the cited lecture notes; not captured in this transcript.]

‡Montek Singh (UNC)

Slide-18
Lecture Notes — Computer Architecture and Implementation (6)‡

[Image from the cited lecture notes; not captured in this transcript.]

‡Montek Singh (UNC)

Slide-19
Lecture Notes — Computer Architecture and Implementation (7)‡
Also works for Rate = Bandwidth!

[Image from the cited lecture notes; not captured in this transcript.]

‡Montek Singh (UNC)
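The annotation is the key point for bottleneck analysis: Amdahl's Law applies to any rate, not just parallel speedup. If a fraction f of the work proceeds at rate R_1 and the remainder at rate R_2, the effective rate is the weighted harmonic mean (a sketch of the standard form, in notation of my choosing):

    R_eff = 1 / ( f / R_1 + (1 - f) / R_2 )

This is the form applied to memory bandwidth in the bottleneck examples that follow.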
Slide-20
Lecture Notes — Computer Architecture and Implementation (8)‡

[Image from the cited lecture notes; not captured in this transcript.]

‡Montek Singh (UNC)

Slide-21
Bottleneck Example (1)

• Combine stride-1 and random stride memory access
– 25% random stride access
– 33% random stride access
• Memory bandwidth performance is dominated by the random stride memory access

[Figure: SDSC MAPS memory bandwidth curves on an IBM SP-3.]

Slide-22
Bottleneck Example (2)

• Combine stride-1 and random stride memory access
– 25% random stride access
– 33% random stride access
• Memory bandwidth performance is dominated by the random stride memory access

[Figure: SDSC MAPS memory bandwidth curves on a Compaq AlphaServer.]

Amdahl's Law: [7000 / (7*0.25 + 0.75)] = 2800 MB/s
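A minimal C translation of that calculation, assuming the slide's example numbers (7000 MB/s stride-1 bandwidth, random stride roughly 7x slower):

    #include <stdio.h>

    /* Effective bandwidth when a fraction f_rand of accesses run at the
       random-stride rate and the rest at the stride-1 rate: the weighted
       harmonic mean, i.e., Amdahl's Law applied to rates. */
    double effective_bw(double bw_stride1, double bw_random, double f_rand) {
        return 1.0 / (f_rand / bw_random + (1.0 - f_rand) / bw_stride1);
    }

    int main(void) {
        double bw1 = 7000.0;     /* stride-1 bandwidth, MB/s (slide's example) */
        double bwr = bw1 / 7.0;  /* random stride assumed ~7x slower */
        printf("25%% random: %.0f MB/s\n", effective_bw(bw1, bwr, 0.25));
        printf("33%% random: %.0f MB/s\n", effective_bw(bw1, bwr, 1.0 / 3.0));
        return 0;
    }

Run as written, this prints 2800 MB/s for the 25% case, matching the slide; one-third random access drops the effective bandwidth to roughly 2333 MB/s.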
Slide-23
Bottleneck Example (2)

• Combine stride-1 and random stride memory access
– 25% random stride access
– 33% random stride access
• Memory bandwidth performance is dominated by the random stride memory access
• Some HPCS Mission Partner applications
– Extensive random stride memory access
– Some random stride memory access
• However, even a small amount of random memory access can cause significant bottlenecks!

[Figure: SDSC MAPS memory bandwidth curves on a Compaq AlphaServer.]

Amdahl's Law: [7000 / (7*0.25 + 0.75)] = 2800 MB/s

Slide-24
Outline
• HPCS Benchmark Spectrum
• What Makes HPC Applications Challenging?
– Memory access patterns/locality
– Processor characteristics
– Parallelism
– I/O characteristics
– What new challenges will arise from Petascale/s+ applications?
• Bottleneckology
– Amdahl’s Law
– Example: Random Stride Memory Access
• Summary
Slide-25
Summary (1)
What Makes Applications Challenging!

• Memory access patterns/locality
– Spatial and temporal
  · Indirect addressing
  · Data dependencies
• Processor characteristics
– Processor throughput (instructions per cycle)
  · Low arithmetic density
  · Floating point versus integer
– Special features
  · GF(2) math
  · Popcount
  · Integer division
• Parallelism
– Ubiquitous for Petascale/s
– Load balance
• I/O characteristics
– Bandwidth
– Latency
– File access patterns
– File generation rates
• Understand bottlenecks
– Characterize applications
– Characterize architectures
• Expand this list as required
• Work toward consensus with
– HPCS Mission Partners
– HPCS Vendors

Slide-26
HPCS Benchmark Spectrum

[Figure: the HPCS Benchmark Spectrum diagram once more, annotated with the points below.]

• Full applications may be challenging due to
– Killer Kernels
– Global data layouts
– Input/Output
• Killer Kernels are challenging because of many things that link directly to architecture
• Identify bottlenecks by mapping applications to architectures
• Impress upon the HPCS community the need to identify what makes the application challenging when using an existing Mission Partner application for a systems analysis in the MS4 review

Slide-27