Transcript Slide 1

High Performance Computing
at IBM Research
September 2006
Dr. José G. Castaños
[email protected]
IBM T.J. Watson Research Center
© 2006 IBM Corporation
The “Blue Gene” Project
• In December 1999, IBM Research announced the Blue Gene project, with two goals:
– Produce new advances in biomolecular simulation
– Investigate new hardware and software technologies for building high-performance computers
• Blue Gene follows a modular approach, in which the basic building block (or cell) can be replicated ad infinitum
• Low-power processors allow higher aggregate performance
– PowerPC 440
• System-on-a-chip offers cost/performance advantages
– Lower complexity
– High density (2,048 processors per rack, air cooled)
– Integrated networks for large scale
• Familiar software environment, simplified for HPC
• A great deal of attention to RAS (“reliability, availability, and serviceability”) throughout the system
The Blue Gene/L Chip (ASIC)
[Block diagram: the ASIC integrates two PowerPC 440 cores (the second usable as an I/O processor), each with 32k/32k L1 caches and a “Double FPU”, L2 prefetch buffers, a multiported shared SRAM buffer, a shared L3 directory, and a 4 MB embedded-DRAM L3 cache with ECC, all connected through the PLB (4:1)]
• IBM CU-11, 0.13 µm
• 11 x 11 mm die size
• 25 x 32 mm CBGA
• 474 pins, 328 signal
• 1.5/2.5 Volt
• Gbit Ethernet
• JTAG access
• Torus network: 6 out and 6 in, each link at 1.4 Gbit/s (see the sketch below)
• Tree network: 3 out and 3 in, each link at 2.8 Gbit/s
• Global interrupt: 4 global barriers or interrupts
• DDR controller with ECC, 144-bit-wide interface, 256/512 MB external DDR
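As a rough illustration of what the six torus links per node mean, the following sketch (plain illustrative C, not IBM code; the 8 x 8 x 8 partition and the sample coordinates are assumptions) computes a node's six wraparound neighbors in a 3D torus and the aggregate per-node link bandwidth implied by the figures above.

```c
#include <stdio.h>

/* Toy illustration of the BG/L 3D torus neighborhood: each node has
 * exactly six links (+x, -x, +y, -y, +z, -z) with wraparound at the
 * edges.  The dimensions below are hypothetical, not a real partition. */
#define NX 8
#define NY 8
#define NZ 8

static int wrap(int c, int dim) { return (c + dim) % dim; }

int main(void) {
    int x = 0, y = 3, z = 7;              /* arbitrary example node     */
    int dx[6] = { +1, -1,  0,  0,  0,  0 };
    int dy[6] = {  0,  0, +1, -1,  0,  0 };
    int dz[6] = {  0,  0,  0,  0, +1, -1 };

    for (int i = 0; i < 6; i++)
        printf("neighbor %d: (%d, %d, %d)\n", i,
               wrap(x + dx[i], NX), wrap(y + dy[i], NY), wrap(z + dz[i], NZ));

    /* Aggregate torus bandwidth per node, from the figures above:
     * 6 outgoing + 6 incoming links at 1.4 Gbit/s each.                 */
    double gbits = (6 + 6) * 1.4;
    printf("aggregate torus bandwidth: %.1f Gbit/s per node\n", gbits);
    return 0;
}
```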
Blue Gene/L Architecture
Blue Gene/L at Lawrence Livermore National Laboratory
• Number of racks: 64
• Number of nodes: 65,536
• Processor frequency: 700 MHz
• Peak performance: 5.7 TF per rack, 360 TF for the machine; 280 TF Linpack, 101 TF application (see the arithmetic check below)
• Memory: 256 GB per rack, 16 TB for the machine
• Power: ~2 MW
• Size: 250 sq. m.
• Bisection bandwidth: 700 GB/s
• Cables: 5,000, totaling 25 km
• Storage: 200 TB of disk
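The peak-performance entries can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope check, assuming the standard BG/L node organization (two PowerPC 440 cores per node, 1,024 nodes per rack, and 4 floating-point operations per core per cycle from the double FPU's two fused multiply-add pipes); it yields about 5.7 TF per rack and roughly 367 TF for 64 racks, in line with the 5.7 TF and 360 TF quoted above.

```c
#include <stdio.h>

/* Back-of-the-envelope check of the peak-performance figures above.
 * Assumes the standard BG/L node: two PowerPC 440 cores at 700 MHz,
 * each double FPU sustaining 2 fused multiply-adds = 4 flops/cycle.   */
int main(void) {
    double clock_ghz      = 0.7;          /* 700 MHz                   */
    double flops_per_core = 4.0;          /* 2 FMAs per cycle          */
    double cores_per_node = 2.0;
    int    nodes_per_rack = 1024;
    int    racks          = 64;

    double gf_per_node = clock_ghz * flops_per_core * cores_per_node; /* 5.6 GF  */
    double tf_per_rack = gf_per_node * nodes_per_rack / 1000.0;       /* ~5.7 TF */
    double tf_machine  = tf_per_rack * racks;                         /* ~367 TF */

    printf("peak: %.1f GF/node, %.1f TF/rack, %.0f TF/machine\n",
           gf_per_node, tf_per_rack, tf_machine);
    return 0;
}
```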
Blue Gene in the Top500
 #   Vendor      Rmax (TFlops)   Installation
 1   IBM         280.6           DOE/NNSA/LLNL (64 racks BlueGene/L)
 2   IBM          91.2           BlueGene at Watson (20 racks BlueGene/L)
 3   IBM          75.8           ASC Purple, LLNL (1280 nodes p5 575)
 4   SGI          51.9           NASA/Columbia (Itanium2)
 5   Bull         42.90          CEA/DAM Tera10 (Itanium2)
 6   Dell         38.27          Sandia – Thunderbird (EM64T/Infiniband)
 7   Sun          38.18          Tsubame Galaxy, TiTech (Opteron/Infiniband)
 8   IBM          37.33          FZJ – Juelich (8 racks BlueGene/L)
 9   Cray         36.19          Sandia – Red Storm (XT3 Opteron)
10   NEC          35.86          Japan Earth Simulator (NEC)
11   IBM          27.91          MareNostrum, Barcelona Supercomputer (JS20)
12   IBM          27.45          ASTRON, Netherlands (6 racks BlueGene/L)
13   Cray         20.52          ORNL – Jaguar (XT3 Opteron)
14   Calif Dig    19.94          LLNL (Itanium2)
15   IBM          18.20          AIST – Japan (4 racks BlueGene/L)
16   IBM          18.20          EPFL – Switzerland (4 racks BlueGene/L)
17   IBM          18.20          KEK – Japan (4 racks BlueGene/L)
18   IBM          18.20          KEK – Japan (4 racks BlueGene/L)
19   IBM          18.20          IBM – On Demand Ctr (4 racks BlueGene/L)
20   Cray         16.97          ERDC MSRC (Cray XT3 Opteron)
Source: www.top500.org
System Software Motivation
• Compute nodes are dedicated to running a single application, and almost nothing else
– Compute node kernel (CNK)
– Simplicity!
• I/O nodes run Linux and provide the OS services: files, sockets, program launch, signals, debugging, and job termination (see the sketch below)
– Standard solution: Linux
• Service nodes run all of the management services (e.g., heartbeats, error checks), transparently to user programs
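To make the division of labor between the CNK and the I/O nodes concrete, here is a deliberately simplified sketch of the function-shipping idea: a compute node packages a system call's arguments into a message and lets the ciod daemon on its I/O node execute the real Linux call. The message layout, the function names, and the single-process stand-in for the tree-network transport are all invented for illustration; this is not the actual CNK/ciod protocol.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative only: a made-up message format for function-shipping a
 * write() from a compute node (CNK) to the ciod daemon on its I/O node.
 * The real CNK/ciod protocol and transport (the tree network) differ;
 * here both sides are simulated in one process for clarity.            */
enum { SYS_WRITE = 1 };

struct ship_msg {
    int  syscall_no;                /* which system call is requested   */
    int  fd;                        /* file descriptor argument         */
    char buf[128];                  /* payload (bounded for the demo)   */
    int  len;                       /* number of bytes to write         */
};

/* "I/O node" side: ciod would execute the real Linux system call. */
static long ciod_handle(const struct ship_msg *m) {
    if (m->syscall_no == SYS_WRITE)
        return write(m->fd, m->buf, (size_t)m->len);
    return -1;                      /* unsupported call in this sketch  */
}

/* "Compute node" side: CNK packs the arguments and ships them.
 * In the real system the message would travel over the tree network.   */
static long cnk_write(int fd, const char *buf, int len) {
    struct ship_msg m = { .syscall_no = SYS_WRITE, .fd = fd, .len = len };
    memcpy(m.buf, buf, (size_t)len);
    return ciod_handle(&m);         /* stand-in for send + wait reply   */
}

int main(void) {
    const char *line = "hello from a compute node\n";
    long n = cnk_write(1, line, (int)strlen(line));
    printf("ciod reported %ld bytes written\n", n);
    return 0;
}
```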
Blue Gene/L: System Software Architecture
[Diagram: system software architecture. Each pset (Pset 0 through Pset 1023) pairs one I/O node, running Linux with the ciod daemon and a file-system client, with 64 compute nodes (C-Node 0 through C-Node 63) that run the application on top of the CNK. The compute nodes are interconnected by the torus and reach their I/O node over the tree network; the I/O nodes reach the file servers and front-end nodes over the functional Gigabit Ethernet. The service node (DB2, CMCS, LoadLeveler, system console) manages the machine over the control Gigabit Ethernet, reaching the hardware through the IDo chip via I2C and JTAG.]
Classical MD – ddcMD
2005 Gordon Bell Prize winner!
• Scalable, general-purpose code for performing classical molecular dynamics (MD) simulations using highly accurate MGPT potentials (the basic MD time-stepping loop is sketched below)
• MGPT semi-empirical potentials, based on a rigorous expansion of many-body terms in the total energy, are needed to quantitatively investigate the dynamic behavior of d-shell and f-shell metals
• 524-million-atom simulations on 64K nodes achieved 101.5 TF/s sustained, with superb strong and weak scaling on the full machine (“very impressive machine”, says the PI)
[Image caption: 2,048,000 tantalum atoms]
• Visualization of important scientific findings already achieved on BG/L: molten Ta at 5000 K demonstrates solidification during isothermal compression to 250 GPa
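For readers new to classical MD, the toy below shows the velocity-Verlet time-stepping pattern that production codes such as ddcMD parallelize across nodes. The single particle and harmonic force are placeholders for illustration only, not the MGPT potential, and none of this code comes from ddcMD.

```c
#include <stdio.h>

/* Minimal sketch of the velocity-Verlet time stepping at the heart of
 * classical MD codes.  One particle in 1D with a harmonic force stands
 * in for millions of atoms under the MGPT potential; only the
 * integration pattern is the point here.                               */
int main(void) {
    double x = 1.0, v = 0.0;        /* position, velocity               */
    double m = 1.0, k = 1.0;        /* mass, spring constant            */
    double dt = 0.01;               /* time step                        */
    double f = -k * x;              /* force at the initial position    */

    for (int step = 0; step < 1000; step++) {
        v += 0.5 * dt * f / m;      /* half-kick                        */
        x += dt * v;                /* drift                            */
        f  = -k * x;                /* recompute force (the expensive
                                       part of a real MD code)          */
        v += 0.5 * dt * f / m;      /* second half-kick                 */
    }
    printf("x = %f, v = %f after 1000 steps\n", x, v);
    return 0;
}
```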
Rapid Resolidification of Tantalum (ddcMD)
• Nucleation of the solid is initiated at multiple independent sites throughout each sample cell
• Growth of solid grains starts independently, but soon leads to grain boundaries that span the simulation cell: the size of the cell is now influencing continued growth
• A recently performed 2,048,000-atom simulation indicates the formation of many more grains