Implementation and experience with a 20.4 TFLOPS IBM BladeCenter cluster
Craig A. Stewart
Matthew Link, D. Scott McCaulay, Greg Rodgers,
George Turner, David Hancock, Richard Repasky,
Faisal Saied, Marlon Pierce, Ross Aiken,
Matthias Mueller, Matthias Jurenz, Matthias Lieber
26 June 2007
Outline
• Background about Indiana University
• Brief history of implementation
• System architecture
• Performance analysis
• User experience and science results
• Lessons learned to date
Introduction - IU in a nutshell
• ~$2B annual budget
• One university with 8 campuses; 90,000 students, 3,900 faculty
• 878 degree programs, including the nation's 2nd largest school of medicine
• President-elect: Michael A. McRobbie
• IT organization: >$100M/year IT budget, 7 divisions
• Research Technologies Division - responsible for HPC, grid, storage, advanced visualization
• Pervasive Technology Labs (Gannon, Fox, Lumsdaine)
• Strategic priorities: life sciences and IT
Big Red - Basics and history
• Spring 2006: assembled in 17 days at an IBM facility, disassembled, shipped to IU, reassembled in 10 days
• 20.4 TFLOPS theoretical peak, 15.04 TFLOPS achieved on Linpack; 23rd on the June 2006 Top500 list
• In production for local users on 22 August 2006, for TeraGrid users on 1 October 2006
• Best Top500 ranking in IU history
• Upgraded to 30.72 TFLOPS in Spring 2007, ??? on the June 2007 Top500 list
• Named after the nickname for IU sports teams
Motivations and goals
• Initial goals for the 20.4 TFLOPS system:
  - Local demand for cycles exceeded supply
  - TeraGrid Resource Partner commitments to meet
  - Support life science research (Indiana Metabolomics and Cytomics Initiative, METACyt)
  - Support applications at 100s to 1000s of processors
• 2nd phase upgrade to 30.7 TFLOPS:
  - Support economic development in the State of Indiana
TeraGrid
Motivation for being part of TeraGrid:
• Support national research agendas
• Improve ability of IU researchers to use national cyberinfrastructure
• Testbed for IU computer science research
• Compete for funding for larger center grants
Why a PowerPC-based cluster?
• Processing power per node
• Density, good power efficiency relative to available processors (see the conversion sketch after this list):

  Processor                     TFLOPS/MWatt   MWatts/PetaFLOPS
  Intel Xeon                    200            5.0
  AMD                           208            4.8
  PowerPC 970 MP (dual core)    167            6.0

• Possibility of performance gains through use of the AltiVec unit & VMX instructions
• Blade architecture provides flexibility for the future
• Results of RFP
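
The two efficiency columns above are reciprocals of each other (1 PetaFLOPS = 1000 TFLOPS), so either can be derived from the other. A minimal sketch of the conversion, using the figures from the table:

    # Convert TFLOPS per megawatt into megawatts per PetaFLOPS and check the table values.
    # 1 PFLOPS = 1000 TFLOPS, so MW/PFLOPS = 1000 / (TFLOPS/MW).
    efficiency_tflops_per_mw = {
        "Intel Xeon": 200,
        "AMD": 208,
        "PowerPC 970 MP (dual core)": 167,
    }
    for processor, tflops_per_mw in efficiency_tflops_per_mw.items():
        mw_per_pflops = 1000.0 / tflops_per_mw
        print(f"{processor}: {mw_per_pflops:.1f} MW per PetaFLOPS")
    # Expected output: 5.0, 4.8, and 6.0 MW/PFLOPS, matching the table.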
System configuration

Computational hardware
  JS21 components: Two 2.5 GHz PowerPC 970MP processors, 8 GB RAM, 73 GB SAS drive, 40 GFLOPS
  No. of JS21 blades: 768
  No. of processors; cores: 1,536 processors; 3,072 processor cores
  Total system memory: 6 TB
  Theoretical performance: 30.72 TFLOPS

Disk storage
  Local hard disk per blade: 2.25 TB total
  GPFS scratch space: 266 TB
  Lustre: 535 TB
  Home directory space: 25 TB

Networks
  Total outbound network bandwidth: 40 Gbit/sec
  Bisection bandwidth: 96 GB/sec (Myrinet 2000); 5 GB/sec (Gigabit Ethernet)
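
The per-blade and whole-system peak figures above follow directly from the clock rate and core counts. A minimal sketch of the arithmetic, assuming 4 double-precision floating-point operations per core per cycle (the rate implied by the stated 40 GFLOPS per blade):

    # Theoretical peak for one JS21 blade and for the full 768-blade system.
    clock_ghz = 2.5            # PowerPC 970MP clock rate
    flops_per_cycle = 4        # assumed double-precision flops per core per cycle
    cores_per_processor = 2    # dual-core 970MP
    processors_per_blade = 2
    blades = 768

    gflops_per_blade = clock_ghz * flops_per_cycle * cores_per_processor * processors_per_blade
    system_tflops = gflops_per_blade * blades / 1000.0
    print(gflops_per_blade)    # 40.0 GFLOPS per blade
    print(system_tflops)       # 30.72 TFLOPS for the upgraded 768-blade system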
HPCC and Linpack Results (510 nodes)

  Benchmark               Unit      Total     Per processor   Per MPI process
  G-HPL                   TFlop/s   13.53     0.013264        0.0066323
  G-PTRANS                GB/s      40.76     0.0399          0.0199
  G-RandomAccess          Gup/s     0.2497    0.000244        0.000122
  G-FFTE                  GFlop/s   67.33     0.066           0.033
  EP-STREAM Sys           GB/s      2468      2.41            1.20
  EP-STREAM Triad         GB/s      1.21 (per MPI process)
  EP-DGEMM                GFlop/s   8.27 (per MPI process)
  RandomRing Bandwidth    GB/s      0.0212 (per MPI process)
  RandomRing Latency      usec      17.73
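
The per-processor and per-MPI-process rows are the system totals divided by the processor and MPI process counts for the 510-node run. A quick sketch, assuming two processors per JS21 blade and one MPI process per core:

    # Normalize HPCC totals for the 510-node run.
    nodes = 510
    processors = nodes * 2          # two PowerPC 970MP processors per JS21 blade
    mpi_processes = processors * 2  # assuming one MPI process per core (dual-core processors)

    g_hpl_total_tflops = 13.53
    print(g_hpl_total_tflops / processors)      # ~0.01326 TFlop/s, matching the table's per-processor value
    print(g_hpl_total_tflops / mpi_processes)   # ~0.00663 TFlop/s, matching the per-MPI-process value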
IBM e1350 vs Cray XT3 - per processor [comparison chart]
IBM e1350 vs Cray XT3 - per process (core) [comparison chart]
IBM e1350 vs HP XC4000 [comparison chart]
  Benchmark set   Nodes       Peak (TFLOPS)   Achieved (TFLOPS)   % of peak
  HPCC            510 nodes   20.40           13.53               66.3
  Top500          512 nodes   20.48           15.04               73.4
  Top500          768 nodes   30.72           21.79               70.9

Difference between the 510-node and 512-node runs: 4 KB vs. 16 MB page size
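
The percentage column is simply achieved divided by peak; the gap between the 510-node HPCC run and the 512-node Top500 run is what the page-size note refers to. A quick check of the table:

    # Linpack efficiency = achieved TFLOPS / theoretical peak TFLOPS.
    runs = [("HPCC, 510 nodes", 20.40, 13.53),
            ("Top500, 512 nodes", 20.48, 15.04),
            ("Top500, 768 nodes", 30.72, 21.79)]
    for label, peak, achieved in runs:
        print(f"{label}: {100.0 * achieved / peak:.1f}% of peak")
    # 66.3%, 73.4%, 70.9% -- the first two differ mainly due to the 4 KB vs 16 MB page size.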
Comparative performance - NAMD (seconds per timestep)

  Processors   Cobalt           Big Red          Mercury           DataStar
               (NCSA, 512 P,    (IU, 512 P,      (NCSA, 1262 P,    (SDSC, 768 P,
               6.55 TFLOPS)     20.4 TFLOPS)     10.23 TFLOPS)     14.3 TFLOPS)
  64           0.032            0.034            0.041             0.075
  128          0.019            0.019            0.024             0.039
  256          0.005            0.012            0.022             0.024
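
One way to read the table is as parallel scaling efficiency: going from 64 to 256 processors ideally cuts the time per timestep by 4x. A small sketch computing that ratio for Big Red from the figures above:

    # Relative scaling from 64 to 256 processors on Big Red (seconds per NAMD timestep).
    t_64, t_256 = 0.034, 0.012
    speedup = t_64 / t_256            # measured speedup
    ideal = 256 / 64                  # 4x ideal speedup
    print(f"speedup {speedup:.2f}x, scaling efficiency {100.0 * speedup / ideal:.0f}%")
    # ~2.83x speedup, roughly 71% scaling efficiency over this range.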
• Simulation of the TonB-dependent transporter (TBDT)
• Used systems at NCSA, IU, PSC
• Modeled mechanisms allowing transport of molecules through the cell membrane
• Work by Emad Tajkhorshid and James Gumbart, University of Illinois at Urbana-Champaign
• Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport. Biophysical Journal 93:496-504 (2007)
ChemBioGrid
• Analyzed 555,007 abstracts in PubMed in ~8,000 CPU hours
• Used OSCAR3 to find SMILES strings -> SDF format -> 3D structure (GAMESS) -> into the Varuna database and then other applications (see the pipeline sketch below)
• "Calculate and look up" model for ChemBioGrid
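
A minimal, hypothetical sketch of the abstract-to-structure flow outlined above. The helper functions are placeholders standing in for the external tools named on the slide (OSCAR3 text mining, SMILES-to-SDF conversion, GAMESS, and the Varuna database); they are not the real APIs of those tools.

    # Hypothetical "calculate and look up" pipeline: abstract -> SMILES -> SDF -> 3D structure -> database.

    def extract_smiles(abstract_text: str) -> list[str]:
        """Placeholder for OSCAR3 text mining; returns SMILES strings found in an abstract."""
        return ["CCO"]  # dummy result (ethanol) purely for illustration

    def smiles_to_sdf(smiles: str) -> str:
        """Placeholder for converting a SMILES string to an SDF record."""
        return f"SDF record for {smiles}"

    def compute_3d_structure(sdf_record: str) -> str:
        """Placeholder for submitting a GAMESS geometry-optimization job."""
        return f"3D structure derived from: {sdf_record}"

    def store_result(smiles: str, structure: str) -> None:
        """Placeholder for inserting results into the Varuna database."""
        print(f"stored {smiles}: {structure}")

    def process_abstract(abstract_text: str) -> None:
        # abstract -> SMILES -> SDF -> 3D structure -> database, as outlined on the slide
        for smiles in extract_smiles(abstract_text):
            store_result(smiles, compute_3d_structure(smiles_to_sdf(smiles)))

    if __name__ == "__main__":
        process_abstract("Example PubMed abstract mentioning ethanol (CCO).")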
WxChallenge
• Over 1,000 undergraduate students, 64 teams, 56 institutions
• Usage on Big Red:
  - ~16,000 CPU hours on Big Red (most of any TeraGrid resource)
  - 63% of processing done on Big Red
  - Most of the students who used Big Red couldn't tell you what it is
Overall usage to date
[Chart: CPU hours by application - NAMD, WRF, ???]
Overall user reactions
• NAMD, WRF users very pleased
• Some community codes essentially excluded
• Porting from the Intel instruction set is a significant perceived challenge in a cycle-rich environment
• MILC optimization with VMX not successful so far
Overall evaluation & conclusions
• The manageability of the system is excellent
• For a select group of applications, Big Red provides excellent performance and reasonable scalability
• We are likely to expand the 10GigE from Big Red to the rest of the IU cyberinfrastructure
• We are installing a 7 TFLOPS Intel cluster; the model going forward is Intel-compatible processors as the "default entry point," with more specialized systems for highly scalable codes
Pace of change
• The most powerful system attached to the TeraGrid has changed 3 times since June 2006
• Absolute rate of change feels very fast

  Processor                         TFLOPS/MWatt   MWatts/PetaFLOPS
  PowerPC 970 MP (dual core)        167            6.0
  Woodcrest (dual core, 3 GHz)      200            5.0
  Clovertown (quad core, 2.3 GHz)   322            3.1
Conclusions
• A 20.4 TFLOPS system with "not the usual" processors was successfully implemented, serving local Indiana University researchers and the national research audience via the TeraGrid (IU is 5th in providing cycles to TeraGrid at present)
• We had excellent success in some regards with the system; excellent response in some niches
• In the future, Science Gateways may be more and more important in improving usability:
  - It's impossible to expect most scientists to chase after the fastest available system when the fastest system is changing 3 times a year
  - Programmability of increasingly unusual architectures is not likely to become easier
  - For applications with broad potential user bases, or extreme scalability on specialized systems, Science Gateways can successfully hide complexity from researchers
Acknowledgements - funding agencies
• IU's involvement as a TeraGrid Resource Partner is supported in part by the National Science Foundation under Grants No. ACI-0338618, OCI-0451237, OCI-0535258, and OCI-0504075.
• The IU Data Capacitor is supported in part by the National Science Foundation under Grant No. CNS-0521433.
• This research was supported in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative of Indiana University is supported in part by Lilly Endowment, Inc.
• This work was supported in part by Shared University Research grants from IBM, Inc. to Indiana University.
• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by grants ###___
• The ChemBioGrid Portal is developed under the leadership of IU Professor Dr. Geoffrey C. Fox and Dr. Marlon Pierce and funded via the Pervasive Technology Labs (supported by the Lilly Endowment, Inc.) and the National Institutes of Health ###)))
• Many of the ideas presented in this talk were developed under a Fulbright Senior Scholar's award to Stewart, funded by the
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.
Acknowledgements - People
• Malinda Lingwall deserves thanks for most of the .ppt layout work
• Maria Morris contributed to the graphics used in this talk
• Marcus Christie and Suresh Marru of the Extreme! Computing Lab contributed the LEAD graphics
• John Morris (www.editide.us) and Cairril Mills (cairril.com Design & Marketing) contributed graphics
• This work would not have been possible without the dedicated and expert efforts of the staff of the Research Technologies Division of University Information Technology Services, the faculty and staff of the Pervasive Technology Labs, and the staff of UITS generally.
• Thanks to the faculty and staff with whom we collaborate locally at IU and globally (via the TeraGrid, and especially at Technische Universitaet Dresden)
Author affiliations
Craig A. Stewart; [email protected]; Office of the Vice President and CIO, Indiana University, 601 E. Kirkwood, Bloomington, IN
Matthew Link; [email protected]; University Information Technology Services, Indiana University, 2711 E. 10th St., Bloomington, IN 47408
D. Scott McCaulay; [email protected]; University Information Technology Services, Indiana University, 2711 E. 10th St., Bloomington, IN 47408
Greg Rodgers; [email protected]; IBM Corporation, 2455 South Road, Poughkeepsie, New York 12601
George Turner; [email protected]; University Information Technology Services, Indiana University, 2711 E. 10th St., Bloomington, IN 47408
David Hancock; dyhancoc@iupui.edu; University Information Technology Services, Indiana University-Purdue University Indianapolis, 535 W. Michigan Street, Indianapolis, IN 46202
Richard Repasky; [email protected]; University Information Technology Services, Indiana University, 2711 E. 10th St., Bloomington, IN 47408
Peng Wang; [email protected]; University Information Technology Services, Indiana University-Purdue University Indianapolis, 535 W. Michigan Street, Indianapolis, IN 46202
Faisal Saied; [email protected]; Rosen Center for Advanced Computing, Purdue University, 302 W. Wood Street, West Lafayette, Indiana 47907
Marlon Pierce; Community Grids Lab, Pervasive Technology Labs at Indiana University, 501 N. Morton Street, Bloomington, IN 47404
Ross Aiken; [email protected]; IBM Corporation, 9229 Delegates Row, Precedent Office Park Bldg 81, Indianapolis, IN 46240
Matthias Mueller; [email protected]; Center for Information Services and High Performance Computing (ZIH), Dresden University of Technology, D-01062 Dresden, Germany
Matthias Jurenz; [email protected]; Center for Information Services and High Performance Computing (ZIH), Dresden University of Technology, D-01062 Dresden, Germany
Matthias Lieber; [email protected]; Center for Information Services and High Performance Computing (ZIH), Dresden University of Technology, D-01062 Dresden, Germany
Thank you
• Any questions?