Transcript Slide 1
High-Performance Computing at IBM Research
September 2006
Dr. José G. Castaños, [email protected]
IBM T.J. Watson Research Center
© 2006 IBM Corporation

The "Blue Gene" Project
- In December 1999, IBM Research announced Blue Gene, with two goals:
  - Produce new advances in biomolecular simulations
  - Investigate new hardware and software technologies for building high-performance computers
- Blue Gene follows a modular approach, in which the basic building block (or "cell") can be replicated ad infinitum
- Low-power processors (PowerPC 440) allow higher aggregate performance
- System-on-a-chip design offers cost/performance advantages:
  - Lower complexity
  - High density (2,048 processors per air-cooled rack)
  - Integrated networks for large scale
- Familiar software environment, simplified for HPC
- Strong attention to RAS ("reliability, availability, and serviceability") throughout the system

The Blue Gene/L Chip (ASIC)
[Block diagram: two PowerPC 440 cores, each with a "double FPU" and 32 KB/32 KB L1 caches; L2 caches with ECC; a shared 4 MB EDRAM L3 with shared directory; a multiported shared SRAM buffer; and an integrated DDR controller. One 440 core runs the application, the other serves as I/O processor.]
- IBM CU-11, 0.13 µm process
- 11 x 11 mm die; 25 x 32 mm CBGA package; 474 pins (328 signal); 1.5/2.5 V
- Gigabit Ethernet and JTAG access
- Torus network: 6 links out and 6 in, each at 1.4 Gbit/s
- Tree network: 3 links out and 3 in, each at 2.8 Gbit/s
- Global interrupt network: 4 global barriers or interrupts
- DDR controller with ECC: 144-bit-wide DDR, 256/512 MB per node

Blue Gene/L Architecture
[Diagram: the Blue Gene/L system packaging hierarchy.]

Blue Gene/L at Lawrence Livermore National Laboratory
  Number of racks          64
  Number of nodes          65,536
  Processor frequency      700 MHz
  Peak performance         5.7 TF per rack; 360 TF machine
  Linpack                  280 TF
  Sustained (application)  101 TF
  Memory                   256 GB per rack; 16 TB machine
  Power                    ~2 MW
  Floor space              250 sq. m.
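The headline figures in this table can be reproduced from the chip specifications above. A quick sanity-check sketch — note that the 64 x 32 x 32 torus dimensions used for the bisection estimate are an assumption (commonly reported for the LLNL machine, but not stated on these slides):

```python
# Peak performance: 2 cores per node, each double FPU issuing 2 fused
# multiply-adds (4 flops) per cycle at 700 MHz; 1,024 nodes/rack, 64 racks.
node_gflops = 2 * 4 * 700e6 / 1e9             # 5.6 GF per node
rack_tflops = node_gflops * 1024 / 1e3        # ~5.7 TF per rack
machine_tflops = rack_tflops * 64             # ~367 TF peak (slide rounds to 360)
linpack_efficiency = 280.6 / machine_tflops   # Linpack reaches ~76% of peak

# Bisection bandwidth: halving the assumed 64-long torus axis cuts a
# 32 x 32 plane of links, doubled by the torus wrap-around.
links_cut = 2 * 32 * 32
gbytes_per_link = 1.4 / 8                     # 1.4 Gbit/s per direction
bisection_gbs = links_cut * 2 * gbytes_per_link   # both directions: ~717 GB/s
```

The result (~717 GB/s) matches the ~700 GB/s quoted for the machine.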
  Bisection bandwidth      700 GB/s
  Cables                   5,000 (25 km total length)
  Disk storage             200 TB

Blue Gene in the Top500
   #  Vendor     Rmax (TF)  Installation
   1  IBM        280.6      DOE/NNSA/LLNL (64 racks BlueGene/L)
   2  IBM         91.2      BlueGene at Watson (20 racks BlueGene/L)
   3  IBM         75.8      ASC Purple, LLNL (1,280 nodes p5 575)
   4  SGI         51.9      NASA/Columbia (Itanium2)
   5  Bull        42.90     CEA/DAM Tera10 (Itanium2)
   6  Dell        38.27     Sandia Thunderbird (EM64T/InfiniBand)
   7  Sun         38.18     Tsubame Galaxy, TiTech (Opteron/InfiniBand)
   8  IBM         37.33     FZ Juelich (8 racks BlueGene/L)
   9  Cray        36.19     Sandia Red Storm (XT3 Opteron)
  10  NEC         35.86     Japan Earth Simulator (NEC)
  11  IBM         27.91     MareNostrum, Barcelona Supercomputer (JS20)
  12  IBM         27.45     ASTRON, Netherlands (6 racks BlueGene/L)
  13  Cray        20.52     ORNL Jaguar (XT3 Opteron)
  14  Calif Dig   19.94     LLNL (Itanium2)
  15  IBM         18.20     AIST, Japan (4 racks BlueGene/L)
  16  IBM         18.20     EPFL, Switzerland (4 racks BlueGene/L)
  17  IBM         18.20     KEK, Japan (4 racks BlueGene/L)
  18  IBM         18.20     KEK, Japan (4 racks BlueGene/L)
  19  IBM         18.20     IBM On Demand Center (4 racks BlueGene/L)
  20  Cray        16.97     ERDC MSRC (Cray XT3 Opteron)
Source: www.top500.org

System Software Motivation
- Compute nodes are dedicated to running a single application, and almost nothing else
  - Compute node kernel (CNK): simplicity!
- I/O nodes run Linux and provide OS services: files, sockets, program launch, signals, debugging, and task termination
  - Standard solution: Linux
- Service nodes run all management services (e.g., heartbeats, error checking), transparently to user programs

Blue Gene/L: System Software Architecture
[Diagram: compute nodes (C-Node 0..63, running applications on CNK) form psets (Pset 0..Pset 1023); each pset connects over the tree network to an I/O node running Linux, ciod, and a file-system client. I/O nodes reach the file servers over the functional Gigabit Ethernet; the service node (System Console, DB2, CMCS, LoadLeveler) and front-end nodes manage the machine over the control Gigabit Ethernet and, through IDo chips, over I2C and JTAG. The torus network interconnects the compute nodes.]

Classical MD – ddcMD
- 2005 Gordon Bell Prize winner!
- Scalable, general-purpose code for performing classical molecular dynamics (MD) simulations using highly accurate MGPT potentials
- MGPT semi-empirical potentials, based on a rigorous expansion of many-body terms in the total energy, are needed to quantitatively investigate the dynamic behavior of d-shell and f-shell metals
- 524-million-atom simulations on 64K nodes achieved 101.5 TF/s sustained
- Superb strong and weak scaling on the full machine ("very impressive machine," says the PI)
- Visualization of important scientific findings already achieved on BG/L: molten Ta (2,048,000 atoms) at 5000 K demonstrates solidification during isothermal compression to 250 GPa

Rapid Resolidification of Tantalum (ddcMD)
- Nucleation of the solid initiates at multiple independent sites throughout each sample cell
- Growth of solid grains begins independently, but soon leads to grain boundaries that span the simulation cell: the cell size now influences continued growth
- A recently performed 2,048,000-atom simulation indicates the formation of many more grains
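The slides do not show ddcMD's internals, and its MGPT potentials are far beyond slide scope, but the time-stepping loop at the heart of essentially any classical MD code is the velocity-Verlet scheme. A purely illustrative sketch on a single toy "atom" in a harmonic well (not ddcMD's actual integrator or potential):

```python
import math

def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance one particle with velocity Verlet: half-kick, drift, half-kick.
    In a real MD code x, v, and force(x) are arrays over millions of atoms."""
    f = force(x)
    for _ in range(steps):
        v += 0.5 * dt * f / mass   # half-kick with the old force
        x += dt * v                # drift to the new position
        f = force(x)               # recompute the force there
        v += 0.5 * dt * f / mass   # half-kick with the new force
    return x, v

# Toy system: harmonic oscillator (k = m = 1), integrated over one
# full period of 2*pi, split into 10,000 steps.
x, v = velocity_verlet(1.0, 0.0, lambda q: -q, 1.0, 2 * math.pi / 10000, 10000)
energy = 0.5 * v * v + 0.5 * x * x   # stays ~0.5: the scheme is symplectic
```

After one period the particle returns to its starting point and the total energy is conserved to O(dt^2), which is why this integrator family dominates long MD runs.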