Transcript Document

Exascale Computing Initiative
(ECI)
Steve Binkley
DOE/ASCR
Bob Meisner
NNSA/ASC
April 1, 2015
Exascale Applications Respond to DOE/NNSA Missions
in Discovery, Design, and National Security
Scientific Discovery
– Mesoscale materials and chemical sciences
– Improved climate models with reduced uncertainty
Engineering Design
– Nuclear power reactors
– Advanced energy technologies
– Resilient power grid
– Advanced manufacturing
National Security
– Stockpile stewardship
– Real-time cybersecurity and incident response
Blue bold text indicates planned or existing exascale application projects.
1
Stockpile Stewardship Challenges
Nuclear Stockpile
• Safety
• Surety
• Reliability
• Robustness
Weapons Science
Non-Proliferation and Nuclear Counter Terrorism
[Figure: burning-plasma physics for thermonuclear burn of p, D, T, He-3, and He-4, with Δτ_Burn ~ 10^-12 sec and Δτ_ee ~ 10^-15 sec, coupling atomic physics, Coulomb collisions, radiation (photons), Debye screening, quantum interference and diffraction, spontaneous and stimulated emission, and hydrodynamics.]
2
Mission: Extreme Scale Science
Next Generation of Scientific Innovation
• DOE's mission is to push the frontiers of science
and technology to:
– Enable scientific discovery
– Provide state-of-the-art scientific tools
– Plan, implement, and operate user facilities
• The next generation of advancements will require
Extreme Scale Computing
– 1,000X capabilities of today's Petaflop computers
with a similar size and power footprint
• Extreme Scale Computing, however, cannot be
achieved by a “business-as-usual” evolutionary
approach
• Extreme Scale Computing will require major novel advances in
computing technology – Exascale Computing
Exascale Computing Will Underpin Future Scientific Innovations
3
Exascale Computing Initiative
• Top-Line Messages:
– This effort is driven by the need for significant improvements in
computer performance to enable future scientific discoveries.
– The Department is developing a plan that will result in the deployment
of exascale-capable systems by early in the next decade.
– The budget request preserves options consistent with that timeline and
keeps the U.S. globally competitive in high performance computing.
– It is important to emphasize that this is a major research and
development effort to address and influence significant changes in
computing hardware and software, and our ability to use computers for
scientific discovery and engineering. It is not a race to deploy the first
exaflop machine.
4
Exascale Challenges and Issues
• Four primary challenges must be overcome
– Parallelism / concurrency
– Reliability / resiliency
– Energy efficiency
– Memory / storage
• Productivity issues
– Managing system complexity
– Portability / Generality
• System design issues
– Scalability
– Time to solution
– Efficiency
• Extensive Exascale Studies
– US (DOE, DARPA, …), Europe, Japan, …
5
Impact of No ECI: What’s at Stake?
• Power restrictions will limit the performance of future computing
systems
– Without ECI, industry will build an energy- and footprint-inefficient point
solution
• Declining US leadership in science,
engineering, and national security
– HPC is the foundation of the nation’s
nuclear security and economic leadership
– International R&D investment already
surpassing US
– Asia and Europe: China’s Tianhe-2 is #1
(HPL); EU’s Mont Blanc with ARM
• Increasing dependence on foreign technology
– Other countries could impose export controls against the U.S.
– There will be unacceptable cybersecurity and computer supply
chain risks
6
DOE Exascale Computing Initiative (ECI)
R&D Goals
• Develop a new era of computers: exascale computers
– Sustained 10^18 operations/second and the required storage for a broader range of
mission-critical applications
– Create extreme-scale computing: approximately 1,000X performance of today's
computers within a similar size, cost, and power footprint
– Foster new generation of scientific, engineering, and large-data applications
• Create dramatically more productive systems
– Usable by a wide variety of scientists and engineers for more problem areas
– Simplify achieving efficiency and scalability, for shorter time to solution and science results
• Develop marketable technologies
– Set industry on new trajectory of progress
– Exploit economies of scale and trickle-bounce effect
• Prepare for “Beyond Exascale”
7
What is Exascale Computing?
• What Exascale computing is not
– Exaflops Linpack Benchmark Computer
– Just a billion floating-point arithmetic units packaged
together
• What is Exascale computing?
– 1,000X performance over a “petaflop” system (exaflops
sustained performance on complex, real-world
applications)
– Similar power and space requirements as a petaflops
computer
– High programmability, generality, and performance
portability
8
Key Performance Goals
for an exascale computer (ECI)
Parameter                Goal
Performance              Sustained 1 – 10 ExaOPS
Power                    20 MW
Cabinets                 200 – 300
System Memory            128 PB – 256 PB
Reliability              Consistent with current platforms
Productivity             Better than or consistent with current platforms
Scalable benchmarks      Target speedup over “current” systems …
Throughput benchmarks    Target speedup over “current” systems …
ExaOPS = 10^18 operations / sec
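The power and performance rows above fix an energy budget per operation and a memory balance; a minimal back-of-the-envelope sketch in Python (illustrative only, using the nominal 1 ExaOPS, 20 MW, and 256 PB figures from the table):

    # Budgets implied by the ECI key performance goals above.
    sustained_ops = 1e18        # 1 ExaOPS, sustained operations per second
    power_watts = 20e6          # 20 MW system power goal
    memory_bytes = 256e15       # 256 PB system memory (upper goal)

    joules_per_op = power_watts / sustained_ops
    print(f"Energy per operation: {joules_per_op / 1e-12:.0f} pJ")  # -> 20 pJ

    bytes_per_sustained_op = memory_bytes / sustained_ops
    print(f"Memory per sustained op/s: {bytes_per_sustained_op:.2f} bytes")  # -> 0.26

The 20 pJ result is exactly the per-operation target listed on the next slide.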
9
Exascale Target System Characteristics
• 20 pJ per average operation
• Billion-way concurrency (current systems have Million-way)
• Ecosystem to support new application development and
collaborative work, enable transparent portability,
accommodate legacy applications
• High reliability and resilience through self-diagnostics and
self-healing
• Programming environments (high-level languages, tools, …) to
increase scientific productivity
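As a minimal sketch of where billion-way concurrency comes from, assume each hardware thread retires on the order of 10^9 operations per second (an illustrative ~1 GHz figure, not taken from the slide):

    # Concurrency needed to sustain an exaop at ~1 GHz per thread.
    sustained_ops = 1e18     # exascale target, operations per second
    ops_per_thread = 1e9     # assumed ~1 GHz effective rate per thread
    concurrency = sustained_ops / ops_per_thread
    print(f"Operations in flight: {concurrency:.0e}")  # -> 1e+09, billion-way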
10
Exascale Computing
We Need to Reinvent Computing
Traditional path of 2x performance improvement every 18 months has ended
• For decades, Moore's Law plus Dennard scaling provided more, faster transistors in each
new process technology
• This is no longer true – we have hit a power wall!
• The result is unacceptable power requirements for increased performance
We cannot procure an exascale system based on today's or projected
future commodity technology
• Existing HPC solutions cannot be usefully scaled up to exascale
• Energy consumption would be prohibitive (~300 MW)
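As a rough illustration of where a number like ~300 MW comes from, scale a sustained exaflop by petascale-era energy efficiency (the ~3.3 GF/W value below is an assumed figure for projected commodity technology, not taken from the slide):

    # Power required for a sustained exaflop at a given efficiency.
    target_flops = 1e18                  # 1 exaflop/s sustained
    efficiency_flops_per_watt = 3.3e9    # assumed ~3.3 GF/W
    power_mw = target_flops / efficiency_flops_per_watt / 1e6
    print(f"Power required: ~{power_mw:.0f} MW")  # -> ~303 MW vs. the 20 MW goal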
Exascale will require partnering with U.S.
computing industry to chart the future
• Industry at a crossroads and is open to new paths
• Time is right to push energy efficiency into the
marketplace
Exascale vs. Predecessor Computers
Parameter                  Sequoia     Titan       Summit & Sierra   Exascale
                           (CPU)       (CPU-GPU)   (CPU-GPU)
Accepted                   2013        2013        2018              TBD
Power (MW)                 8           9           10                ~20
Peak Performance (PF)      20.13       27.11       150               > 1,000
Cabinets                   96          200         192               > 200
Nodes                      98,304      18,688      3,500             TBD
System Memory (TB)         1,573       710         2,100             > 128,000
Linpack performance (PF)   17.17       17.59       TBD               TBD
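The peak-performance and power rows above fix each machine's energy efficiency; a short sketch deriving GFLOPS/W to show the gap exascale must close:

    # Peak performance (PF) and power (MW) from the table above.
    systems = {
        "Sequoia":         (20.13,   8),
        "Titan":           (27.11,   9),
        "Summit & Sierra": (150.0,  10),
        "Exascale":        (1000.0, 20),
    }
    for name, (peak_pf, power_mw) in systems.items():
        gf_per_watt = (peak_pf * 1e6) / (power_mw * 1e6)  # GF per watt
        print(f"{name}: {gf_per_watt:.1f} GF/W")
    # Sequoia ~2.5, Titan ~3.0, Summit/Sierra ~15, Exascale ~50 GF/W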
12
ECI Strategy
• Integrate applications, acquisitions, and research
and development
• Exploit co-design process, driven by the full
application workflow
• Develop exascale software stacks
• Partner with and fund vendors to transition
research to product space
• Collaborate with other government agencies and
other countries, as advantageous
13
Partnership with Industry is Vital
• We need industry involvement
– Don't want one-off, stove-piped solutions that are obsolete before they're
deployed
– Need continued “product” availability and upgrade potential beyond the lifetime
of this initiative
• Industry needs us
– Business model obligates industry to optimize for profit, beat competitors
– Internal investments heavily weighted towards near-term, evolutionary
improvements with small margin over competitors
– Funding for far-term technology is limited ($) and constrained in scope
• How do we impact industry?
– Work with those that have strong advocate(s) within the company
– Fund development and demonstration of far-term technologies that clearly show
potential as future mass-market products (or mass-market components of families
of products)*
*Corollary: do not fund product development
– Industry has demonstrated that it will incorporate promising
technologies into future product lines
* Industrial contractor, private communication.
14
DOE Progress Towards Exascale
FY2011:
MOU between SC and NNSA for the Coordination of Exascale Activities
Exascale Co-Design Centers Funded
Request for Information: Critical and Platform Technologies
FY2012:
Programming Environments (X-Stack)
FastForward 1: Vendor Partnerships on Critical Component technologies
FY2013:
Exascale Strategy Plan to Congress
Operating System / Runtime (OS/R)
DesignForward 1: Vendor Partnerships on Critical System-level technologies
Meeting with Secretary Moniz, “go get a solid plan with defendable cost”
FY2014:
Meetings with HPC vendors to validate ECI timeline, update on exascale plans and costs
Established Nexus / Plexus lab structure – determine software plans and costs
FastForward 2: Exascale Node designs
External Review of “Exascale Preliminary Project Design Document (EPPDD)”
FY2015:
DesignForward 2: Conceptual Designs of Exascale Systems
Release to ASCAC “Preliminary Conceptual Design for an Exascale Computing Initiative”
Generate requirements for exascale systems to be developed and deployed in FY-2023
Develop and release FOAs and RFPs, for funding in FY-2016
FY2016:
Initiate the Exascale Computing Initiative (ECI)
15
Schedule Baseline
16
Exploit Co-Design Process
Exascale Co-Design Center for
Materials in Extreme
Environments (ExMatEx)
– Director: Timothy Germann (LANL)
– http://www.exmatex.org
Center for Exascale Simulation of
Advanced Reactors (CESAR)
– Director: Andrew Siegel (ANL)
– https://cesar.mcs.anl.gov
Center for Exascale Simulation of
Combustion in Turbulence (ExaCT)
– Director: Jacqueline Chen (SNL)
– http://exactcodesign.org
17
Current partnerships with vendors
(jointly funded by SC & NNSA)
Fast Forward Program – node technologies
• Phase 1: Two-year contracts, started July 1, 2012 ($64M)
• Phase 2: Two-year contracts, started Fall 2014 ($100M)
• Performers: AMD, Cray, IBM, Intel, NVIDIA
Project Goals & Objectives
• Initiate partnerships with multiple companies to accelerate the R&D of critical node technologies
and designs needed for extreme-scale computing.
• Fund technologies targeted for productization in the 5–10 year timeframe.
Design Forward Program – system technologies
• Phase 1: Two-year contracts, started Fall 2013 ($23M)
• Phase 2: Two-year contracts, started Winter 2015 ($10M)
• Performers: AMD, Cray, IBM, Intel, NVIDIA
• Project Goals & Objectives
• Initiate partnerships with multiple companies to accelerate the R&D of interconnect architectures
and conceptual designs for future extreme-scale computers.
• Fund technologies targeted for productization in the 5–10 year timeframe.
18
FY-2016 ECI Cross-cut
(in $K)
                                                           FY 2015    FY 2016    FY 2016 vs
                                                           Enacted    Request       FY 2015
NNSA
ASC: Advanced Technology Development and Mitigation         50,000     64,000       +14,000
SC
ASCR: Mathematical, Computational, and Computer
  Sciences Research                                         41,000     43,511        +2,511
ASCR: High Performance Computing and Network Facilities     50,000    134,383       +84,383
BER                                                             --     18,730       +18,730
BES                                                          8,000     12,000        +4,000
SC Total                                                    99,000    208,624      +109,624
Exascale Total                                             149,000    272,624      +123,624
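The cross-cut totals can be checked directly from the rows above; a minimal sketch (amounts in $K, with BER's FY 2015 "--" taken as zero):

    # FY 2015 enacted and FY 2016 request, in $K, from the table above.
    rows = {
        "NNSA ASC (ATDM)": (50_000,  64_000),
        "ASCR Research":   (41_000,  43_511),
        "ASCR Facilities": (50_000, 134_383),
        "BER":             (0,       18_730),
        "BES":             (8_000,   12_000),
    }
    fy15 = sum(v[0] for v in rows.values())   # 149,000: Exascale Total
    fy16 = sum(v[1] for v in rows.values())   # 272,624: Exascale Total
    sc15 = fy15 - rows["NNSA ASC (ATDM)"][0]  # 99,000: SC Total
    sc16 = fy16 - rows["NNSA ASC (ATDM)"][1]  # 208,624: SC Total
    print(fy15, fy16, fy16 - fy15)            # 149000 272624 123624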
19
ECI Major Risks
• Maintaining strong leadership and commitment from the US
government.
• Achieving the extremely challenging power and productivity
goals.
• Decreasing reliability as power efficiency and system
complexity/concurrency increase.
• Vendor commitment and stability; deployment of the
developed technology.
21
Summary
• Leadership in high-performance computing (HPC) and large-scale data
analysis will advance national competitiveness in a wide array of strategic
sectors, including basic science, national security, energy technology, and
economic prosperity.
• The U.S. semiconductor and HPC industries have the ability to develop
the necessary technologies for an exascale computing capability early in
the next decade.
• An integrated approach to the development of hardware, software, and
applications is required for the development of exascale computers.
• ECI’s goal is to deploy two capable exascale computing systems.
22
END
23