Future of High Performance Computing
Thom Dunning
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Petascale Summer School • 6-9 July 2010 • Urbana, Illinois
Outline of Presentation
• Directions in Computing Technology
  • From uni-core to multi-core chips
  • On to many-core chips
• From Terascale to Petascale Computing
  • Science @ Petascale
  • Blue Waters Petascale Computing System
• Path to Exascale Computing
  • Issues for beyond-petascale computing
• Take Home Lessons
Directions in Computing Technology
A major shift is underway in computing technology: from uni-core to multi-core and many-core chips
Directions in Computing Technology
Increasing Performance of Microprocessors
“In the past, performance scaling in conventional single-core processors has been accomplished largely through increases in clock frequency (accounting for roughly 80 percent of the performance gains to date).”

Platform 2015, S. Y. Borkar et al., Intel Corporation, 2006

[Figure: clock frequency of Intel Pentium processors over time]
Directions in Computing Technology
Problem with Uni-core Microprocessors
[Figure: power density (W/cm²) vs. decreasing feature size (1.5 µm down to 0.07 µm) for Intel processors from the i386 and i486 through the Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium 4 (Willamette, Prescott). As feature size shrinks and chip frequency rises, power density passes that of a hot plate (~10 W/cm²) and heads toward that of a nuclear reactor (~100 W/cm²) and a rocket nozzle (~1000 W/cm²).]
Directions in Computing Technology
From Uni-core to Multi-core Processors
Intel’s Nehalem
• Modular design
• Up to 8 cores
• 3 levels of cache
• Integrated memory controller
• Multiple QuickPath interconnects
Directions in Computing Technology
Switch to Multicore Chips
“For the next several years the only way to obtain significant increases in microprocessor performance will be through increasing use of parallelism: 8× in 2009-10, 16× in 2011-12, and so on.”
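The practical consequence for application developers is that these 8× and 16× factors come from additional cores, not faster clocks, so codes must expose the parallelism explicitly. A minimal OpenMP sketch in C (the loop and arrays are illustrative, not from the talk):

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    /* Initialize input arrays. */
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* Distribute the loop iterations across all available cores;
       on a multi-core chip this is the simplest route to the
       parallel speedups the quote refers to. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f (ran on up to %d threads)\n",
           c[N - 1], omp_get_max_threads());
    return 0;
}
```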
Directions in Computing Technology
On to Many-core Chips
• AMD Llano (4 x86 cores + 480 stream processors)
• NVIDIA Fermi (512 cores)
• Intel Teraflops Chip (80 cores)
• Intel Many Integrated Cores (>80 x86+ cores)
Directions in Computing Technology
Recent Evolution of NVIDIA GPUs
GPU:                    G80              GT200            Fermi
Transistors             681 million      1,400 million    3,000 million
CUDA Cores              128              240              512
DP Floating Point       None             30 FMA/cycle     256 FMA/cycle
SP Floating Point       128 MAD/cycle    240 MAD/cycle    512 FMA/cycle
Shared Memory           16 KB/SM         16 KB/SM         16 or 48 KB/SM
L1 Cache                None             None             16 or 48 KB/SM
L2 Cache                None             None             768 KB
ECC                     No               No               Yes
Memory Address Width    32-bit           32-bit           64-bit
Peak DP performance = 256 FMA/cycle × 2 flops/FMA × 1.5 GHz = 768 GF
Peak SP performance = 512 FMA/cycle × 2 flops/FMA × 1.5 GHz = 1,536 GF
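The same arithmetic can be checked with a tiny C program (a sketch; the 1.5 GHz clock is the value assumed on this slide):

```c
#include <stdio.h>

int main(void) {
    const double clock_ghz = 1.5;          /* Fermi clock (from slide)  */
    const double flops_per_fma = 2.0;      /* multiply + add, fused     */
    const double dp_fma_per_cycle = 256;   /* Fermi, double precision   */
    const double sp_fma_per_cycle = 512;   /* Fermi, single precision   */

    /* Peak = FMA issue rate x 2 flops per FMA x clock (GHz -> Gflop/s) */
    printf("Peak DP: %.0f GF\n", dp_fma_per_cycle * flops_per_fma * clock_ghz);
    printf("Peak SP: %.0f GF\n", sp_fma_per_cycle * flops_per_fma * clock_ghz);
    return 0;
}
```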
Directions in Computing Technology
Fermi Streaming Multiprocessor
Architecture
• 16 SMs per chip
• Each SM has:
  • 32 CUDA cores, each with floating point and integer units
    • Fused multiply-add instruction
  • 16 load-store units
  • 4 special function units for transcendental functions (sin, cos, reciprocal, square root)
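A note on the fused multiply-add: it computes a·b + c with a single rounding, which improves accuracy as well as throughput. C99’s fma() demonstrates the effect on any host CPU (an illustrative sketch, not Fermi-specific code):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double eps = ldexp(1.0, -30);  /* 2^-30, exactly representable */
    double a = 1.0 + eps, b = 1.0 - eps, c = -1.0;

    /* Separate multiply and add: a*b = 1 - 2^-60 rounds to 1.0,
       so the tiny term vanishes. (Compile with -ffp-contract=off
       to keep the compiler from fusing this expression itself.) */
    double separate = a * b + c;

    /* Fused multiply-add: one rounding at the very end keeps it. */
    double fused = fma(a, b, c);

    printf("a*b + c      = %g\n", separate); /* 0            */
    printf("fma(a, b, c) = %g\n", fused);    /* -8.67362e-19 */
    return 0;
}
```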
Directions in Computing Technology
AMD’s Fusion “Application Processing Unit”
Heterogeneous Architecture
• x86 cores
• Streaming processors
• High performance interconnect
• High performance memory controller
Blue Waters: From Terascale to Petascale Computing
A computing system for solving the most challenging compute-, memory- and data-intensive problems
Blue Waters
NSF Track 1 Solicitation
“The petascale HPC environment will enable investigations of computationally challenging problems that require computing systems capable of delivering sustained performance approaching 10^15 floating point operations per second (petaflops) on real applications, that consume large amounts of memory, and/or that work with very large data sets.”
Leadership-Class System Acquisition - Creating a Petascale Computing Environment for Science and Engineering
NSF 06-573
Blue Waters
Computational Science and Engineering
Petascale computing will enable advances in a broad range of science and engineering disciplines:
Molecular Science • Weather & Climate Forecasting • Astronomy • Earth Science • Health
Blue Waters
Desired Attributes of Petascale System
• Maximum Core Performance
  …to minimize the number of cores needed for a given performance level and lessen the impact of sections of code with limited scalability (see the Amdahl’s law sketch after this list)
• Low Latency, High Bandwidth Interconnect
  …to enable science and engineering applications to scale to tens to hundreds of thousands of cores
• Large, Fast Memories
  …to solve the most memory-intensive problems
• Large, Fast I/O System and Data Archive
  …to solve the most data-intensive problems
• Reliable Operation
  …to enable the solution of Grand Challenge problems
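Why maximum core performance belongs on this list: Amdahl’s law caps the speedup of a code once any serial fraction remains, no matter how many cores are added. A small illustrative C calculation (the serial fractions are hypothetical):

```c
#include <stdio.h>

/* Amdahl's law: speedup on p cores when a fraction s of the
   work is serial: S(p) = 1 / (s + (1 - s) / p). */
static double amdahl(double s, double p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    double cores[]  = {1e3, 1e4, 1e5};
    double serial[] = {0.01, 0.001, 0.0001};

    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            printf("p = %7.0f, serial = %6.4f%% -> speedup = %8.0f\n",
                   cores[i], serial[j] * 100.0, amdahl(serial[j], cores[i]));
    return 0;
}
```

Even a 0.01% serial fraction limits 100,000 cores to roughly a 9,100× speedup, which is why fast cores and scalable algorithms both matter.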
Blue Waters
Building Blue Waters
Blue Waters will be the most powerful computer in the world for scientific research when it comes online in the summer of 2011.
Power7 Chip
• 8 cores, 32 threads
• L1, L2, and L3 cache (32 MB)
• Up to 256 GF (peak)
• 128 GB/s memory bandwidth
• 45 nm technology

Quad-chip Module (QCM)
• 4 Power7 chips
• 128 GB memory
• 512 GB/s memory bandwidth
• 1 TF (peak)

Hub Chip
• 1,128 GB/s bandwidth

IH Server Node
• 8 QCMs (256 cores)
• 8 TF (peak)
• 1 TB memory
• 4 TB/s memory bandwidth
• 8 hub chips
• Power supplies, PCIe slots
• Fully water cooled

Blue Waters Building Block
• 32 IH server nodes
• 256 TF (peak)
• 32 TB memory
• 128 TB/s memory bandwidth
• 4 storage systems (>500 TB)
• 10 tape drive connections

Blue Waters
• ~10 PF peak, ~1 PF sustained
• >300,000 cores
• ~1.2 PB of memory
• >18 PB of disk storage
• 500 PB of archival storage
• ≥100 Gbps connectivity

Blue Waters is built from components that can be used to build systems with a wide range of capabilities, from servers to systems beyond Blue Waters.
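The hierarchy above composes multiplicatively. A quick C sketch of the arithmetic (the building-block count is inferred from the peak figures, so treat it as approximate):

```c
#include <stdio.h>

int main(void) {
    const double chip_gf  = 256;                /* Power7 chip, peak   */
    const double qcm_tf   = 4 * chip_gf / 1000; /* 4 chips  -> 1 TF    */
    const double node_tf  = 8 * qcm_tf;         /* 8 QCMs   -> 8 TF    */
    const double block_tf = 32 * node_tf;       /* 32 nodes -> 256 TF  */

    printf("QCM: %.0f TF, node: %.0f TF, building block: %.0f TF\n",
           qcm_tf, node_tf, block_tf);

    /* ~10 PF peak implies on the order of 40 building blocks. */
    printf("Blocks for ~10 PF: ~%.0f\n", 10000.0 / block_tf);
    return 0;
}
```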
Blue Waters
Comparison: Jaguar and Blue Waters
System Attribute                       ORNL Jaguar (#1)   NCSA Blue Waters   Ratio
Vendor (Model)                         Cray (XT5)         IBM (PERCS)
Processor                              AMD Opteron        IBM Power7
Peak Performance (PF)                  2.3                ~10                ~4
Sustained Performance (PF)             ?                  ≳1
Number of Cores/Chip                   6                  8                  1⅓
Number of Processor Cores              224,256            >300,000           <1½
Amount of Memory (TB)                  299                ~1,200             4
Amount of On-line Disk Storage (PB)    5                  >18                >3
Sustained Disk Transfer (TB/sec)       0.24               >1.5               >6
Archival Storage (PB)                                     up to 500          25
Blue Waters Project
Critical Features of Blue Waters. I
High Performance Compute Module
• SMP system: four Power7 chips + hub chip
• Performance: 1 TF
• Memory: 128 GB

High Performance Interconnect
• High bandwidth, low latency
  • Hub chip/QCM: >1 TB/sec per QCM
  • Latency: ~1 µs
• Fully connected, two-tier network
• Copper + optical links
Blue Waters Project
Critical Features of Blue Waters. II
High Performance I/O and Data Archive Systems
• Large storage subsystems
  • On-line disks: >18 PB (usable)
  • Archival tapes: up to 500 PB
• High sustained disk transfer rate: >1.5 TB/sec
• Fully integrated storage system: GPFS + HPSS

General
• Hardware support for global shared memory
Blue Waters
National Petascale Computing Facility
Partners: EYP MCF/Gensler, IBM, Yahoo!

Modern Data Center
• 90,000+ ft² total
• 30,000 ft² raised floor
• 20,000 ft² machine gallery

Energy Efficiency
• LEED certified Gold (goal: Platinum)
• PUE = 1.1–1.2
Path to Exascale Computing
Although an exascale computer is at least 10 years away, the issues being confronted will impact all systems beyond Blue Waters
Blue Waters
A Glimpse into the Future: Sequoia
System Attribute                       NCSA Blue Waters   LLNL Sequoia
Vendor (Model)                         IBM (PERCS)        IBM (BG/Q)
Processor                              IBM Power7         IBM PowerPC
Peak Performance (PF)                  ~10                ~20
Sustained Performance (PF)             ≳1                 ?
Number of Cores/Chip                   8                  16
Number of Processor Cores              >300,000           ~1,600,000
Amount of Memory (TB)                  ~1,200             ~1,600
Amount of On-line Disk Storage (PB)    >18                ~50
Sustained Disk Transfer (TB/sec)       >1.5               0.5–1.0
Path from Petascale to Exascale
• Levels of Concurrency
  • Cores: 100s of thousands ➙ 100s of millions
  • Threads: millions ➙ billions
• Clock Rate of Core
  • No significant increase
• Memory per Core
  • 1–4 GB ➙ 10s–100s of MB
• Aggressive Fault Management in HW and SW
• Power Consumption
  • 10 MW ➙ 40–150 MW
Take Home Lessons
• Examine New Computing Technologies
  • Computers of the future will be based on many-core chips
  • Details TBD, but they may be heterogeneous
• Focus on Scalable Algorithms
  • The only significant speed gains in the future will come through increased parallelization
• Explore New Programming Models
  • Computing systems will be (are!) collections of SMPs
  • Need to assess and improve MPI/OpenMP, UPC, CAF (see the hybrid sketch after this list)
• Enhance Reliability
  • Systems level (e.g., virtualization)
  • Applications level
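To make the “collections of SMPs” point concrete, here is a minimal hybrid MPI + OpenMP sketch in C, with MPI connecting nodes and OpenMP using the cores within each node (the array size and reduction are illustrative):

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000  /* elements per MPI rank (illustrative) */

int main(int argc, char **argv) {
    int provided, rank, nranks;

    /* Ask MPI for thread support compatible with OpenMP regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    static double a[N];
    double local = 0.0, global = 0.0;

    /* OpenMP handles the cores within the SMP node... */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        a[i] = (double)(rank + i);
        local += a[i];
    }

    /* ...and MPI handles communication between nodes. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks = %d, threads/rank = %d, sum = %e\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}
```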
Questions?