SciDAC Software Infrastructure for Lattice Gauge Theory

SciDAC Software Infrastructure
for Lattice Gauge Theory
Richard C. Brower
Annual Progress Review
JLab, May 14, 2007
Code distribution: see http://www.usqcd.org/software.html
Software Committee
• Rich Brower (chair), [email protected]
• Carleton DeTar, [email protected]
• Robert Edwards, [email protected]
• Don Holmgren, [email protected]
• Bob Mawhinney, [email protected]
• Chip Watson, [email protected]
• Ying Zhang, [email protected]
SciDAC-2 Minutes/Documents & Progress report:
http://super.bu.edu/~brower/scc.html
Major Participants in SciDAC Project
Arizona: Doug Toussaint
MIT: Dru Renner, Andrew Pochinsky, Joy Khoriaty
BU: Rich Brower *, James Osborn, Mike Clark
North Carolina: Rob Fowler, Ying Zhang *
BNL: Chulwoo Jung, Enno Scholz, Efstratios Efstathiadis
JLab: Chip Watson *, Robert Edwards *, Jie Chen, Balint Joo
Columbia: Bob Mawhinney *
IIT: Xian-He Sun
DePaul: Massimo DiPierro
Indiana: Steve Gottlieb, Subhasish Basak
FNAL: Don Holmgren *, Jim Simone, Jim Kowalkowski, Amitoj Singh
Utah: Carleton DeTar *, Ludmila Levkova
Vanderbilt: Ted Bapty
* Software Committee members; participants funded in part by the SciDAC grant.
QCD Software Infrastructure
Goal: Create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse high-performance hardware.
Requirements:
1. Build on the 20-year investment in MILC/CPS/Chroma.
2. Optimize critical kernels for peak performance.
3. Minimize the software effort needed to port to new platforms and to create new applications.
Solution for Lattice QCD
• (Perfect) load balancing: uniform periodic lattices and identical sublattices per processor.
• (Complete) latency hiding: overlap of computation and communication.
• Data parallel: operations on small 3x3 complex matrices on each link (see the sketch below).
• Critical kernels (the lattice Dirac solver, HMC forces, etc.) account for 70%-90% of the execution time.
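To make the data-parallel pattern concrete, the fragment below is a minimal plain C++ sketch (not SciDAC API code; the type and function names are illustrative) of the per-site work: applying a 3x3 complex link matrix to a color vector on every site of a local sublattice. This is the granularity over which the QLA/QDP layers batch operations.

#include <array>
#include <complex>
#include <vector>

using Complex     = std::complex<double>;
using ColorVector = std::array<Complex, 3>;                   // one color vector per site
using ColorMatrix = std::array<std::array<Complex, 3>, 3>;    // 3x3 link matrix per site

// chi(x) = U(x) * psi(x) on every site x of the local sublattice:
// the same small dense operation repeated uniformly over all sites.
void mult_su3_vec(std::vector<ColorVector>& chi,
                  const std::vector<ColorMatrix>& U,
                  const std::vector<ColorVector>& psi)
{
    for (std::size_t x = 0; x < psi.size(); ++x) {
        for (int a = 0; a < 3; ++a) {
            Complex acc(0.0, 0.0);
            for (int b = 0; b < 3; ++b)
                acc += U[x][a][b] * psi[x][b];
            chi[x][a] = acc;
        }
    }
}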
SciDAC-1 QCD API (optimized for P4 and QCDOC)
Level 3: Optimized Dirac operators and inverters.
Level 2: QDP (QCD Data Parallel): lattice-wide operations and data shifts. QIO: binary/XML metadata files (ILDG collaboration).
Level 1: QLA (QCD Linear Algebra) and QMP (QCD Message Passing).
The API exists in C/C++; QMP is implemented over MPI, native QCDOC, and M-VIA over a GigE mesh.
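As a concrete example of the Level 1 layer, the fragment below sketches how an application brings up QMP before handing control to the higher levels. This is a minimal sketch based on the QMP-2 style entry points; the exact signatures and constants are assumptions and should be checked against the distributed qmp.h.

#include <qmp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Initialize the message-passing layer (MPI, native QCDOC, or GigE backend). */
    QMP_thread_level_t provided;
    if (QMP_init_msg_passing(&argc, &argv, QMP_THREAD_SINGLE, &provided)
            != QMP_SUCCESS) {
        fprintf(stderr, "QMP initialization failed\n");
        return 1;
    }

    /* Each node learns its place in the machine. */
    printf("node %d of %d\n", QMP_get_node_number(),
           QMP_get_number_of_nodes());

    /* ... declare a logical topology, create QDP fields, run the solver ... */

    QMP_finalize_msg_passing();
    return 0;
}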
Data Parallel QDP/C, C++ API
• Hides architecture and layout
• Operates on lattice fields across sites
• Linear algebra tailored for QCD
• Shifts and permutation maps across sites
• Reductions
• Subsets
• Entry/exit: attaches to existing codes
Example of QDP++ Expression
• Typical for the Dirac operator, written as QDP/C++ code (a representative expression is sketched below).
• Uses the Portable Expression Template Engine (PETE).
• Temporaries are eliminated and expressions are optimized.
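The slide shows the expression as a code image that did not survive transcription; the fragment below is a representative QDP++ expression of the kind used inside a Wilson-Dirac operator (the field names, direction index, and use of the checkerboard subset rb[0] are illustrative assumptions). The whole right-hand side is fused by PETE into a single loop over the selected sites, with no intermediate temporaries.

#include "qdp.h"
using namespace QDP;

// One hopping-term contribution, written as a single lattice-wide
// data-parallel statement on the even checkerboard rb[0].
void hop_term(LatticeFermion& chi,
              const multi1d<LatticeColorMatrix>& u,
              const LatticeFermion& psi,
              int mu)
{
    chi[rb[0]] = u[mu] * shift(psi, FORWARD, mu)
               + shift(adj(u[mu]) * psi, BACKWARD, mu);
}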
SciDAC-2 QCD API (SciDAC-1 components in gold, SciDAC-2 additions in blue in the original diagram; in collaboration with PERI and TOPS)
Level 4: Application codes (MILC / CPS / Chroma / roll-your-own); QCD Physics Toolbox: shared algorithms, building blocks, visualization, and performance tools; workflow and data-analysis tools.
Level 3: QOP (optimized in asm): Dirac operators, inverters, forces, etc.
Level 2: QDP (QCD Data Parallel): lattice-wide operations, data shifts. QIO: binary/XML files & ILDG.
Level 1: QLA (QCD Linear Algebra), QMP (QCD Message Passing), QMC (QCD Multi-core interface).
Uniform user environment: runtime, accounting, grid.
Level 3 Domain Wall CG Inverter
[Figure: Mflops/node vs. local lattice volume for the JLab 3G and 4G clusters with Level II vs. Level III code, and for the FNAL Myrinet cluster (32 nodes) with Level III code. Ls = 16; the 4G nodes are 2.8 GHz Pentium 4 with an 800 MHz FSB.]
Asqtad Inverter on the Kaon cluster at FNAL
[Figure: Mflop/s per core vs. L, comparing MILC C code with SciDAC/QDP code on L^4 sub-volumes, for 16- and 64-core partitions of Kaon.]
Level 3 on QCDOC
[Figure: Mflop/s vs. L for Level 3 kernels on QCDOC. Domain-wall RHMC kernels on a 32^3 x 64 x 16 lattice with 4^3 x 8 x 16 subvolumes; Asqtad RHMC kernels and Asqtad CG on L^4 subvolumes of a 24^3 x 32 lattice.]
Building on SciDAC-1
• Fuller use of the API in application code:
  1. Integrate QDP into MILC and QMP into CPS.
  2. Universal use of QIO, file formats, QLA, etc.
  3. Level 3 interface standards.
• Common runtime environment:
  1. File transfer, batch scripts, compile targets.
  2. A practical three-laboratory "metafacility".
• Porting the API to INCITE platforms:
  1. BG/L & BG/P: QMP and QLA using XLC & Perl script.
  2. Cray XT4 & Opteron clusters.
New SciDAC-2 Goals
• Exploitation of multi-core:
  1. Multi-core, not Hertz (clock speed), is the new paradigm.
  2. Plans for a QMC API (JLab, FNAL & PERC).
  See the SciDAC-2 kickoff workshop, Oct. 27-28, 2006: http://super.bu.edu/~brower/workshop
• Toolbox of shared algorithms and building blocks:
  1. RHMC, eigenvector solvers, etc.
  2. Visualization and performance analysis (DePaul & PERC).
  3. Multi-scale algorithms (QCD/TOPS collaboration): http://www.yale.edu/QCDNA/
• Workflow and cluster reliability:
  1. Automated campaigns to merge lattices and propagators and extract physics (FNAL & Illinois Institute of Technology).
  2. Cluster reliability (FNAL & Vanderbilt).
  http://lqcd.fnal.gov/workflow/WorkflowProject.html
QMC – QCD Multi-Threading
• General evaluation:
  – OpenMP vs. an explicit thread library (Chen).
  – An explicit thread library can do better than OpenMP, but OpenMP performance is compiler dependent.
• Simple threading API (QMC), see the sketch below:
  – Based on the older smp_lib (Pochinsky).
  – Uses pthreads and investigates barrier synchronization algorithms.
• Evaluate threads for SSE-Dslash.
• Consider a threaded version of QMP (Fowler and Porterfield at RENCI).
[Diagram: serial code forks into parallel regions working on sites 0..7 and sites 8..15 (the second thread otherwise idle), followed by a finalize/thread-join step back to serial code.]
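The fragment below is a minimal fork/join sketch of the threading pattern in the diagram, written with plain pthreads rather than the actual QMC or smp_lib interfaces (the site counts and function names are illustrative): each thread is handed a contiguous block of lattice sites, and pthread_join serves as the barrier before serial code resumes.

#include <pthread.h>
#include <stdio.h>

#define NSITES   16
#define NTHREADS 2

static double field[NSITES];

struct SiteRange { int begin, end; };   /* half-open range of local sites */

/* Worker: each thread updates its own block of sites,
   e.g. sites 0..7 and 8..15 as in the diagram. */
static void *update_sites(void *arg)
{
    struct SiteRange *r = (struct SiteRange *)arg;
    for (int x = r->begin; x < r->end; ++x)
        field[x] = 2.0 * field[x];      /* stand-in for the real site kernel */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct SiteRange range[NTHREADS];

    for (int x = 0; x < NSITES; ++x) field[x] = x;   /* serial code */

    /* Fork: split the sites evenly across the threads. */
    for (int t = 0; t < NTHREADS; ++t) {
        range[t].begin = t * (NSITES / NTHREADS);
        range[t].end   = (t + 1) * (NSITES / NTHREADS);
        pthread_create(&tid[t], NULL, update_sites, &range[t]);
    }

    /* Join: acts as the barrier before serial code resumes. */
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(tid[t], NULL);

    printf("field[15] = %g\n", field[15]);           /* serial code again */
    return 0;
}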
Conclusions
Progress has been made using a common QCD API and shared libraries for communication, linear algebra, I/O, optimized inverters, etc.
But full implementation, optimization, documentation, and maintenance of shared code is a continuing challenge.
And there is much work to do to keep up with changing hardware and algorithms.
Still, NEW users (young and old) with no prior lattice experience have initiated new lattice QCD research using SciDAC software!
The bottom line is that PHYSICS is being well served.