
Future of High Performance Computing

Thom Dunning

National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Outline of Presentation

Directions in Computing Technology

• From uni-core to multi-core chips
• On to many-core chips

From Terascale to Petascale Computing

• Science @ Petascale
• Blue Waters Petascale Computing System

Path to Exascale Computing

• Issues for beyond petascale computing

Take Home Lessons

Petascale Summer School • 6-9 July 2010 • Urbana, Illinois

Directions in Computing Technology

A major shift is underway in computing technology with multicore and many-core chips

Directions in Computing Technology

Increasing Performance of Microprocessors

“In the past, performance scaling in conventional single-core processors has been accomplished largely through increases in clock frequency (accounting for roughly 80 percent of the performance gains to date).”

Platform 2015, S. Y. Borkar et al., Intel Corporation, 2006

[Chart: clock frequency of Intel Pentium-family processors over time]


Directions in Computing Technology

Problem with Uni-core Microprocessors

[Chart: chip power density (W/cm²) rising with decreasing feature size (1.5 µm down to 0.07 µm) and increasing clock frequency, for Intel processors from the i386 through the Pentium 4 (Willamette and Prescott); reference lines mark the power densities of a hot plate, a nuclear reactor, and a rocket nozzle.]


Directions in Computing Technology

From Uni-core to Multi-core Processors

Intel’s Nehalem

• Modular
• Up to 8 cores
• 3 levels of cache
• Integrated memory controller
• Multiple QuickPath Interconnects


Directions in Computing Technology

Switch to Multicore Chips

“For the next several years the only way to obtain significant increases in microprocessor performance will be through increasing use of parallelism: 8× in 2009-10, 16× in 2011-12, and so on.”
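The catch behind this shift is that the serial fraction of a code caps the achievable speedup no matter how many cores a chip provides. A minimal sketch of Amdahl's law (not stated on the slide, but implicit in the scalability concerns that follow):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup attainable when a fraction of the
    work is perfectly parallel and the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# A code that is 95% parallel gains little beyond a few hundred cores;
# the speedup approaches 1/0.05 = 20 no matter how many cores are added.
for n in (8, 16, 256, 100_000):
    print(n, round(amdahl_speedup(0.95, n), 1))
```

This is why the slides that follow stress maximum single-core performance and scalable algorithms rather than core count alone.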


Directions in Computing Technology

On to Many-core Chips

• AMD Llano (4 x86 cores + 480 stream processors)
• NVIDIA Fermi (512 cores)
• Intel Teraflops Chip (80 cores)
• Intel Many Integrated Cores (>80 x86+ cores)


Directions in Computing Technology

Recent Evolution of NVIDIA GPUs

GPU:                     G80              GT200            Fermi
Transistors              681 million      1,400 million    3,000 million
CUDA Cores               128              240              512
DP Floating Point        None             30 FMA/cycle     256 FMA/cycle
SP Floating Point        128 MAD/cycle    240 MAD/cycle    512 FMA/cycle
Shared Memory            16 KB/SM         16 KB/SM         16 or 48 KB/SM
L1 Cache                 None             None             16 or 48 KB/SM
L2 Cache                 None             None             768 KB
ECC                      No               No               Yes
Memory Address Width     32-bit           32-bit           64-bit

Peak DP performance = 256 FMA/cycle × 2 flops/FMA × 1.5 GHz = 768 GF
Peak SP performance = 512 FMA/cycle × 2 flops/FMA × 1.5 GHz = 1,536 GF
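The peak figures follow directly from FMA throughput; a quick check of the arithmetic, using the 1.5 GHz clock given above:

```python
def peak_gflops(fma_per_cycle, clock_ghz, flops_per_fma=2):
    """Peak rate in GF: FMA units x 2 flops per fused multiply-add x clock (GHz)."""
    return fma_per_cycle * flops_per_fma * clock_ghz

# Fermi, per the table above:
dp = peak_gflops(256, 1.5)   # 768 GF double precision
sp = peak_gflops(512, 1.5)   # 1,536 GF single precision
print(dp, sp)
```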


Directions in Computing Technology

Fermi Streaming Multiprocessor Architecture

• 16 SMs per chip
• Each SM has:
  • 32 CUDA cores
  • Floating-point and integer units for each core
    • Fused multiply-add instruction
  • 16 load-store units
  • 4 special function units
    • Transcendental functions (sin, cos, reciprocal, square root)



Directions in Computing Technology

AMD’s Fusion “Application Processing Unit”

Heterogeneous Architecture

• x86 cores
• Streaming processors
• High-performance interconnect
• High-performance memory controller


Blue Waters: From Terascale to Petascale Computing

A computing system for solving the most challenging compute-, memory- and data-intensive problems

Blue Waters

NSF Track 1 Solicitation

“The petascale HPC environment will enable investigations of computationally challenging problems that require computing systems capable of delivering sustained performance approaching 10¹⁵ floating point operations per second (petaflops) on real applications, that consume large amounts of memory, and/or that work with very large data sets.”

Leadership-Class System Acquisition - Creating a Petascale Computing Environment for Science and Engineering

NSF 06-573


Blue Waters

Computational Science and Engineering

Petascale computing will enable advances in a broad range of science and engineering disciplines:

Molecular Science • Weather & Climate Forecasting • Astronomy • Earth Science • Health

Blue Waters

Desired Attributes of Petascale System


Maximum Core Performance

…to minimize the number of cores needed for a given performance level and lessen the impact of sections of code with limited scalability

Low Latency, High Bandwidth Interconnect

…to enable science and engineering applications to scale to tens to hundreds of thousands of cores

Large, Fast Memories

…to solve the most memory-intensive problems

Large, Fast I/O System and Data Archive

…to solve the most data-intensive problems

Reliable Operation

…to enable the solution of Grand Challenge problems


Blue Waters

Building Blue Waters

Blue Waters will be the most powerful computer in the world for scientific research when it comes online in the summer of 2011.

Power7 Chip (45 nm technology)
• 8 cores, 32 threads
• L1, L2, L3 cache (32 MB)
• Up to 256 GF (peak)
• 128 GB/s memory bandwidth

Quad-chip Module
• 4 Power7 chips
• 128 GB memory
• 512 GB/s memory bandwidth
• 1 TF (peak)

Hub Chip
• 1,128 GB/s bandwidth

IH Server Node (fully water cooled)
• 8 QCMs (256 cores)
• 8 TF (peak)
• 1 TB memory
• 4 TB/s memory bandwidth
• 8 Hub chips
• Power supplies
• PCIe slots

Blue Waters Building Block
• 32 IH server nodes
• 256 TF (peak)
• 32 TB memory
• 128 TB/s memory bandwidth
• 4 storage systems (>500 TB)
• 10 tape drive connections

Blue Waters
• ~10 PF peak
• ~1 PF sustained
• >300,000 cores
• ~1.2 PB of memory
• >18 PB of disk storage
• 500 PB of archival storage
• ≥100 Gbps connectivity

Blue Waters is built from components that can be used to build systems with a wide range of capabilities, from servers to beyond Blue Waters.
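The component hierarchy multiplies up neatly; a sketch checking the peak figures above (the building-block count needed for ~10 PF is my inference, not stated on the slide):

```python
# Peak performance at each level of the Blue Waters hierarchy, in TF
chip_tf = 0.256          # Power7 chip: up to 256 GF
qcm_tf = 4 * chip_tf     # quad-chip module: ~1 TF
node_tf = 8 * qcm_tf     # IH server node: 8 QCMs -> 8 TF
block_tf = 32 * node_tf  # building block: 32 nodes -> 256 TF

# Roughly 40 building blocks reach the ~10 PF system peak quoted above
blocks_for_10pf = 10_000 / block_tf
print(node_tf, block_tf, blocks_for_10pf)
```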


Blue Waters

Comparison: Jaguar and Blue Waters

System Attribute                       ORNL Jaguar (#1)   NCSA Blue Waters   Ratio
Vendor (Model)                         Cray (XT5)         IBM (PERCS)
Processor                              AMD Opteron        IBM Power7
Peak Performance (PF)                  2.3                ~10                ~4
Sustained Performance (PF)             ?                  ≳1                 ?
Number of Cores/Chip                   6                  8                  1⅓
Number of Processor Cores              224,256            >300,000           <1½
Amount of Memory (TB)                  299                ~1,200             4
Amount of On-line Disk Storage (PB)    5                  >18                >3
Sustained Disk Transfer (TB/sec)       0.24               >1.5               >6
Archival Storage (PB)                                     up to 500          25
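The ratio column follows from the raw attributes; a quick arithmetic check on the rows with definite values (my own sketch, using the table's numbers):

```python
# Jaguar vs. Blue Waters attributes from the comparison table
jaguar = {"peak_pf": 2.3, "cores": 224_256, "memory_tb": 299,
          "disk_pb": 5, "disk_tbs": 0.24}
blue_waters = {"peak_pf": 10, "cores": 300_000, "memory_tb": 1_200,
               "disk_pb": 18, "disk_tbs": 1.5}

# Ratios: ~4x peak, ~1.3x cores, ~4x memory, >3x disk, >6x disk transfer
for key in jaguar:
    print(key, round(blue_waters[key] / jaguar[key], 2))
```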

Blue Waters Project

Critical Features of Blue Waters. I


High Performance Compute Module
• SMP system
  • Four Power7 chips
  • Hub chip
• Performance: 1 TF
• Memory: 128 GB

High Performance Interconnect
• High bandwidth, low latency
  • Hub chip/QCM: >1 TB/sec/QCM
  • Latency: ~1 µsec
• Fully connected, two-tier network
• Copper + optical links


Blue Waters Project

Critical Features of Blue Waters. II

High Performance I/O and Data Archive Systems
• Large storage subsystems
  • On-line disks: >18 PB (usable)
  • Archival tapes: up to 500 PB
• High sustained disk transfer rate: >1.5 TB/sec
• Fully integrated storage system: GPFS + HPSS

General
• Hardware support for global shared memory


Blue Waters

National Petascale Computing Facility

Partners

EYP MCF / Gensler, IBM, Yahoo!

Modern Data Center

• 90,000+ ft² total
• 30,000 ft² raised floor
• 20,000 ft² machine gallery

Energy Efficiency

• LEED certified Gold (goal: Platinum)
• PUE = 1.1–1.2


Path to Exascale Computing

Although an exascale computer is at least 10 years away, the issues being confronted will impact all systems beyond Blue Waters


Blue Waters

A Glimpse into the Future: Sequoia

System Attribute                       NCSA Blue Waters   LLNL Sequoia
Vendor (Model)                         IBM (PERCS)        IBM (BG/Q)
Processor                              IBM Power7         IBM PowerPC
Peak Performance (PF)                  ~10                ~20
Sustained Performance (PF)             ≳1                 ?
Number of Cores/Chip                   8                  16
Number of Processor Cores              >300,000           ~1,600,000
Amount of Memory (TB)                  ~1,200             ~1,600
Amount of On-line Disk Storage (PB)    >18                ~50
Sustained Disk Transfer (TB/sec)       >1.5               0.5–1.0


Path from Petascale to Exascale

Levels of Concurrency
• Cores: 100s of thousands ➙ 100s of millions
• Threads: millions ➙ billions

Clock Rate of Core
• No significant increase

Memory per Core
• 1-4 GB ➙ 10s–100s of MB

Aggressive Fault Management in HW and SW

Power Consumption
• 10 MW ➙ 40 MW – 150 MW
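The memory-per-core squeeze falls out of the core counts above; a rough sketch (the exascale figures are illustrative values inside the ranges on this slide, not projections of my own):

```python
def mb_per_core(total_memory_tb, cores):
    """Memory per core in MB, given total system memory in TB."""
    return total_memory_tb * 1_000_000 / cores

# Blue Waters today: ~1,200 TB over ~300,000 cores -> ~4 GB/core
bw = mb_per_core(1_200, 300_000)

# An exascale machine with modestly larger total memory but 100s of
# millions of cores drops to 10s of MB/core (illustrative numbers):
ex = mb_per_core(10_000, 500_000_000)
print(round(bw), round(ex))  # MB per core at each scale
```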


Take Home Lessons


Examine New Computing Technologies

• Computers of the future will be based on many-core chips
• Details TBD, but they may be heterogeneous

Focus on Scalable Algorithms

• Only significant speed gains in the future will come through increased parallelization

Explore New Programming Models

• Computing systems will be (are!) collections of SMPs
• Need to assess and improve MPI/OpenMP, UPC, CAF

Enhance Reliability

• Systems level (e.g., virtualization)
• Applications level
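The "collections of SMPs" observation above is what motivates hybrid programming models: message passing between nodes plus shared-memory work within each node. A toy stand-in for the on-node half, using Python threads (an illustration of the two-level pattern, not MPI and not a claim about the slide's recommended stack):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # on-node worker: each "core" reduces its own slice of the data
    return sum(chunk)

data = list(range(1_000_000))
chunks = [data[i::8] for i in range(8)]  # one slice per worker

# Local reductions in parallel, then a final combine -- the same
# two-level pattern hybrid MPI+OpenMP codes use across and within nodes.
with ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(partial_sum, chunks))
print(total)
```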


Questions?
