The Pittsburgh Supercomputing Center: Transcript

Capability Computing
Challenges and Payoffs
Ralph Roskies
Scientific Director, Pittsburgh Supercomputing Center
Professor of Physics, University of Pittsburgh
December 10, 2003













Simulation is becoming an increasingly
indispensable tool in all areas of science.





Driven by the relentless implications of Moore's Law, which has the
price of equivalent computing dropping by (at least) a factor of 2
every 18 months (illustrated in the sketch below).
Simulation leads to new insights.
As computing gets stronger and the models more realistic, more and
more phenomena can be effectively simulated. It sometimes becomes
cheaper, faster, and more accurate to simulate than to do experiments.
Progress in modeling will be greatly speeded by the new ability to
couple experiments to simulation.
Both capacity computing and capability computing are essential.
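To make the compounding concrete, here is a small back-of-the-envelope
sketch (an illustration only, not part of the talk) of what halving the
price of computing every 18 months implies over a decade:

    #include <math.h>
    #include <stdio.h>

    /* Illustration: price of a fixed amount of computing if it halves
       every 18 months (a Moore's-Law-style assumption). */
    int main(void)
    {
        const double doubling_period_months = 18.0;
        for (int years = 0; years <= 10; years += 2) {
            double halvings = years * 12.0 / doubling_period_months;
            double relative_price = pow(0.5, halvings);
            printf("after %2d years: %.3f of today's price (about %.0fx cheaper)\n",
                   years, relative_price, 1.0 / relative_price);
        }
        return 0;
    }

Over ten years this works out to roughly a factor of 100 in price for
the same amount of computing.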













Why capability computing?


Many important problems require tightly-coupled
leading-edge computing capability
Real-time constraints may require the highest end
capability
• Weather forecasting, storm modeling
• Interactive requirements













Top 500: processor type













Top 500 architecture













PSC Terascale Computing System


Designed a machine and its operation to facilitate the
highest-capability computations.
At its introduction (Oct 2001) it was the number 3 most powerful
machine in the world (now 12).













Challenges of Capability Computing




Technical
• Machine bottlenecks
• Reliability
• Power and Space Needs
Operational
• Scheduling
• Maintenance
• User support
Cultural
• Users
• Vendor
Political
• Concentrating resources justifiable only if results otherwise
unobtainable













Technical: Machine bottlenecks



Processor performance (usually limited by memory bandwidth; see the
sketch below)
• Peak flops or Linpack not the measure
• Commodity processors required by fiscal considerations
Memory size
• Global shared memory not very important
• At least one GB/processor
Interprocessor communication
• Essential for scaling large problems
• Want low latency, high bandwidth and redundancy
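To see why memory bandwidth, rather than peak flops or Linpack, is
usually the binding constraint, consider a bandwidth-bound kernel such
as a vector triad: it does 2 flops per iteration but moves 24 bytes.
The sketch below is illustrative only; the peak and bandwidth figures
are placeholders, not measurements of any machine mentioned here.

    /* Illustrative estimate of the fraction of peak a bandwidth-bound
       triad kernel (a[i] = b[i] + s*c[i]) can reach.
       Placeholder numbers, not measurements of a specific machine. */
    #include <stdio.h>

    int main(void)
    {
        double peak_gflops   = 2.0;   /* assumed processor peak, Gflop/s */
        double mem_bw_gbytes = 2.0;   /* assumed memory bandwidth, GB/s  */

        double flops_per_iter = 2.0;        /* one add, one multiply          */
        double bytes_per_iter = 3 * 8.0;    /* load b, c; store a (doubles)   */

        double flop_limited = peak_gflops;
        double bw_limited   = mem_bw_gbytes / bytes_per_iter * flops_per_iter;
        double achievable   = bw_limited < flop_limited ? bw_limited : flop_limited;

        printf("bandwidth-limited rate: %.2f Gflop/s (%.0f%% of peak)\n",
               achievable, 100.0 * achievable / peak_gflops);
        return 0;
    }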













HPC also means massive data handling,
data repositories, and visualization




Input/Output
• Take advantage of parallel IO from each processor
• Major demands come from snapshots and checkpointing (a generic
sketch follows below)
• Wrote optimized routines using underlying Quadrics capabilities
to speed IO and file transfer
Coupled visualization sector (Linux Intel PCs with Nvidia
cards) into Terascale system over Quadrics switch
Designed cost-effective global disk system linked to HSM
High-speed networking
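The optimized Quadrics-level routines mentioned above are PSC-specific.
Purely as a generic illustration of the parallel-I/O idea (every
processor writing its own slice of a checkpoint into one shared file),
here is a minimal MPI-IO sketch; the file name and array size are made
up for the example.

    /* Minimal parallel checkpoint sketch: each rank writes its slice of a
       distributed array to one shared file with collective MPI-IO.
       Generic illustration only, not PSC's Quadrics-optimized I/O path. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n_local = 1 << 20;                 /* doubles per rank (example) */
        double *u = malloc(n_local * sizeof(double));
        for (int i = 0; i < n_local; i++) u[i] = rank + 1e-6 * i;  /* fake state */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Offset offset = (MPI_Offset)rank * n_local * sizeof(double);
        MPI_File_write_at_all(fh, offset, u, n_local, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);    /* collective parallel write */

        MPI_File_close(&fh);
        free(u);
        MPI_Finalize();
        return 0;
    }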













Terascale Computing System
[System diagram: compute nodes, interactive nodes, file servers
(/tmp, /home), visualization nodes, and mass store with archive
buffer, linked by the Quadrics interconnect, a control LAN, and
WAN/LAN connections.]

Summary
• 750 compute nodes
• 3000 Alpha processors
• 6 Tf peak
• 3 TB memory
• 40 TB local disk
• Multi-rail fat-tree network
• Redundant monitor/control
• WAN/LAN accessible
• Parallel visualization
• File servers: 30 TB, 32 GB/s
• Mass store, ~1 TB/hr












Technical: Reliability


750 servers. If each has one failure a year, this system fails about
twice a day (see the arithmetic sketch below). Most calculations take
longer than that.
• Solution is redundancy where feasible
• Spares
• Checkpoint/restart capability
Vendor has no way to test and validate software updates on
a system this size.
• Solution is cooperative effort to validate code right on
our machine.
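The failure arithmetic behind "fails twice a day", together with a
standard rule of thumb (Young's approximation) for how often to
checkpoint, looks like this; the ten-minute checkpoint cost is an
assumption for illustration only.

    /* System MTBF from the per-node failure rate, plus Young's rule of
       thumb for the checkpoint interval: tau ~ sqrt(2 * cost * MTBF).
       The 10-minute checkpoint cost is an assumed number. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double nodes           = 750.0;
        double node_mtbf_hours = 365.0 * 24.0;        /* one failure/year/node */
        double sys_mtbf_hours  = node_mtbf_hours / nodes;

        double ckpt_cost_hours = 10.0 / 60.0;          /* assumed: 10 minutes  */
        double ckpt_interval   = sqrt(2.0 * ckpt_cost_hours * sys_mtbf_hours);

        printf("system MTBF: %.1f hours (~%.1f failures/day)\n",
               sys_mtbf_hours, 24.0 / sys_mtbf_hours);
        printf("suggested checkpoint interval: ~%.1f hours\n", ckpt_interval);
        return 0;
    }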













Operational: Scheduling




Had been doing a drain at 8pm every night
Costs about 5% in throughput
Experimenting with continuous drain
Reservations for real-time work, large scale debugging













TCS Usage Pattern emphasizes large
processor count jobs
[Chart: Percent Usage by Size. Percent of hours delivered by job-size
bin (<64, 64-127, 128-255, 256-511, 512-1023, 1024-2047, and 2048-4095
processors), by month from March 2002 through August 2003.]













Operational: Maintenance




Rapid vendor response to failure not the thing to focus on
Highly instrumented dark machine room
Spares
“Bring out your dead”













Operational: User support


Legacy codes can't just be scaled up
• ("we're not computer scientists; we just want to get our work
done")
For performance, codes designed for tens of processors have to be
rethought and rewritten
• Ratio of computation to communication changes (see the sketch
below)
• Scaling may require new algorithms
• Load balancing must be done dynamically
• May have to change libraries
Solution is to make PSC consultants de facto members of the research
group and work very closely with users.
"Large calculations have the flavor of big experiments. You need
someone monitoring, scheduling, facilitating."
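The change in the computation-to-communication ratio is easy to
quantify for a simple halo-exchange code: per-process work scales with
the subdomain volume while communication scales with its surface, so
the ratio shrinks as the processor count grows. A minimal sketch with
illustrative numbers:

    /* Why the computation-to-communication ratio changes with scale:
       for a cubic domain split across P processes, interior work scales
       with the subdomain volume, halo exchange with its surface.
       Illustrative numbers only. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double N = 1024.0;                /* global grid is N^3 points */
        int procs[] = { 8, 64, 512, 4096 };

        for (int i = 0; i < 4; i++) {
            double P    = procs[i];
            double n    = N / cbrt(P);          /* local subdomain edge      */
            double comp = n * n * n;            /* interior work  ~ volume   */
            double comm = 6.0 * n * n;          /* halo exchange  ~ surface  */
            printf("P=%5.0f  local edge=%6.1f  comp/comm=%6.1f\n",
                   P, n, comp / comm);
        }
        return 0;
    }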













Consultant contributions




Optimize code
Optimize IO (e.g. aggregating messages)
Internal advocates
• With systems group to facilitate scheduling, special
requests e.g. larger temporary disk assignment
• To vendor (PSC is the customer)
Workshops on optimization, scaling, load balancing













Lessons from scaling workshop


Control granularity; virtualize
• Define the problem in terms of a large number of small objects,
many more than the number of processors
• Let the system map objects to processors. Time-consuming objects
can be broken down into shorter ones, which allows better load
balancing.
Incorporate latency tolerance (a minimal sketch follows this list)
• Overlap communication with computation
• If multiple objects on one processor are sending messages to
another, aggregate them
• If messages trigger computation, pipeline them to initiate
computation earlier
• Don't wait: speculate, pre-fetch
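As one concrete (and generic) illustration of overlapping communication
with computation, the skeleton below posts nonblocking halo exchanges,
updates interior points while the messages are in flight, and only then
waits and fixes up the boundary. The stencil itself is a placeholder.

    /* Latency tolerance via overlap: start nonblocking halo exchanges,
       compute on interior points that need no halo data, then wait and
       update the boundary. Skeleton only; placeholder stencil. */
    #include <mpi.h>

    void exchange_and_compute(double *u, double *halo_left, double *halo_right,
                              int n_local, int left, int right, MPI_Comm comm)
    {
        MPI_Request req[4];

        /* 1. start communication early */
        MPI_Irecv(halo_left,  1, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Irecv(halo_right, 1, MPI_DOUBLE, right, 0, comm, &req[1]);
        MPI_Isend(&u[0],           1, MPI_DOUBLE, left,  0, comm, &req[2]);
        MPI_Isend(&u[n_local - 1], 1, MPI_DOUBLE, right, 0, comm, &req[3]);

        /* 2. overlap: update interior points while messages are in flight */
        for (int i = 1; i < n_local - 1; i++)
            u[i] = 0.5 * (u[i - 1] + u[i + 1]);      /* placeholder stencil */

        /* 3. only now wait, then update the two boundary points */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        u[0]           = 0.5 * (*halo_left + u[1]);
        u[n_local - 1] = 0.5 * (u[n_local - 2] + *halo_right);
    }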













Lessons from scaling workshop




Reduce dependency on synchronization
• Regular communications often rely on synchronization
• Heterogeneity exacerbates the problem
Maintain per-process load balance
• Requires distributed monitoring capabilities
• Let the system map objects to processes
Use optimized libraries (e.g. ATLAS)
Develop performance models (machine profiles; application
signatures) to anticipate bottlenecks (a toy example follows below)
The only new aspect is the degree to which these things matter
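The "machine profile plus application signature" idea can be as simple
as the toy model below, which estimates where time goes in one step;
every number in it is a placeholder, not a measurement of any system
discussed here.

    /* Toy performance model: time per step estimated from an application
       signature (flops, bytes moved, messages) and a machine profile
       (sustained flop rate, memory bandwidth, network latency/bandwidth).
       All numbers are placeholders. */
    #include <stdio.h>

    int main(void)
    {
        /* application signature, per process per step (assumed) */
        double flops     = 5.0e8;
        double mem_bytes = 4.0e9;
        double msgs      = 26.0;
        double msg_bytes = 26.0 * 8.0e4;

        /* machine profile (assumed) */
        double flop_rate   = 1.0e9;     /* sustained flop/s    */
        double mem_bw      = 1.0e9;     /* bytes/s             */
        double net_latency = 5.0e-6;    /* seconds per message */
        double net_bw      = 2.5e8;     /* bytes/s             */

        double t_cpu = flops / flop_rate;
        double t_mem = mem_bytes / mem_bw;
        double t_net = msgs * net_latency + msg_bytes / net_bw;

        printf("compute %.3fs  memory %.3fs  network %.3fs  bottleneck: %s\n",
               t_cpu, t_mem, t_net,
               t_mem > t_cpu ? (t_mem > t_net ? "memory" : "network")
                             : (t_cpu > t_net ? "compute" : "network"));
        return 0;
    }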













Case Study:
NAMD Scalable Molecular Dynamics



Three-dimensional object-oriented code
Message-driven execution capability
Asynchronous communications
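NAMD gets these properties from the Charm++ runtime it is built on.
Purely as a generic sketch of what "message-driven execution" means
(work is triggered by whichever message arrives next rather than by a
fixed communication schedule), here is an MPI-based illustration; it is
not how NAMD itself is implemented.

    /* Generic message-driven loop (concept illustration only; NAMD uses
       the Charm++ runtime, not this MPI sketch). Each incoming message
       triggers the computation that depends on it, in arrival order. */
    #include <mpi.h>
    #include <stdio.h>

    /* Placeholder for the work a message triggers, e.g. processing a
       block of forces in an MD code. */
    static void process_force_block(const double *data, int count, int from)
    {
        double s = 0.0;
        for (int i = 0; i < count; i++) s += data[i];
        printf("handled %d values from rank %d (sum %.3f)\n", count, from, s);
    }

    void message_driven_loop(int expected_msgs, MPI_Comm comm)
    {
        double buf[1024];
        int handled = 0;
        while (handled < expected_msgs) {
            int flag;
            MPI_Status st;
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &flag, &st);
            if (flag) {
                int count;
                MPI_Get_count(&st, MPI_DOUBLE, &count);
                MPI_Recv(buf, count, MPI_DOUBLE, st.MPI_SOURCE, st.MPI_TAG,
                         comm, MPI_STATUS_IGNORE);
                process_force_block(buf, count, st.MPI_SOURCE);
                handled++;
            }
            /* else: no message yet; a real code would do local work here */
        }
    }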













Results are excellent scaling
[Plot: NAMD timing for F1-ATPase (PME), 327,506 atoms, on Lemieux with
4 processors per node; seconds per step versus processor count from 4
to 2048, shown against a "perfect" scaling line.]
Some scaling successes at PSC




NAMD now scales to 3000 processors, > 1Tf sustained
Earthquake simulation code, 2048 processors, 87%
parallel efficiency.
‘Real-time’ tele-immersion code scales to 1536 processors
Increased scaling of the Car-Parrinello Ab-Initio
Molecular Dynamics (CPAIMD) code from its previous
limit of 128 processors (for 128 states) to 1536 processors.













Payoffs: Insight into important real-life problems



Insights
• Structure to function of biomolecules
Increased realism to confront experimental data
• Earthquakes and design of buildings
• QCD
Novel uses of HPC
• Teleimmersion
• Internet simulation













How Aquaporins Work
(Schulten group, University of Illinois)


Aquaporins are proteins that conduct large volumes of water through
cell membranes while filtering out charged particles like hydrogen
ions.
Start with the known crystal structure and simulate 12 nanoseconds of
molecular dynamics of over 100,000 atoms, using NAMD.













Aquaporin mechanism
Water moves through aquaporin channels in single file. Oxygen leads
the way in. At the most constricted point of the channel, the water
molecule flips. Protons can't do this.
The animation was pointed to by the 2003 Nobel chemistry prize
announcement.













High Resolution Forward and Inverse
Earthquake Modeling on Terascale Computers
Volkan Akcelik, Jacobo Bielak, Ioannis Epanomeritakis
Antonio Fernandez, Omar Ghattas, Eui Joong Kim
Julio Lopez, David O'Hallaron, Tiankai Tu
Carnegie Mellon University
George Biros
University of Pennsylvania
John Urbanic
Pittsburgh Supercomputing Center













Complexity of earthquake ground
motion simulation






Multiple spatial scales
• wavelengths vary from O(10 m) to O(1000 m)
• basin/source dimensions are O(100 km)
Multiple temporal scales
• O(0.01 s) to resolve the highest frequencies of the source
• O(10 s) to resolve the shaking within the basin
So unstructured grids are needed, even though good parallel
performance is harder to achieve
Highly irregular basin geometry
Highly heterogeneous soil material properties
Geology and source parameters observable only indirectly













Performance of forward earthquake
modeling code on PSC Terascale system
Largest simulation
• 28 Oct 2001 Compton aftershock in Greater LA Basin
• maximum resolved frequency: 1.85 Hz
• 100 m/s minimum shear wave velocity
• physical size: 100 x 100 x 37.5 km^3
• # of elements: 899,591,066
• # of grid points: 1,023,371,641
• # of slaves: 125,726,862
• 25 sec wallclock per time step on 1024 PEs
• 65 Gb input
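A back-of-the-envelope check, assuming roughly 10 grid points per
shortest wavelength (that ratio is an assumption of this sketch, not a
number from the talk), shows why the adaptively refined unstructured
mesh matters: a uniform mesh at the finest required spacing would need
thousands of times more elements than the ~9 x 10^8 actually used.

    /* Back-of-envelope mesh-size estimate for the LA Basin run above.
       Assumption: ~10 grid points per shortest wavelength. */
    #include <stdio.h>

    int main(void)
    {
        double vs_min = 100.0;    /* m/s, minimum shear wave velocity */
        double f_max  = 1.85;     /* Hz, maximum resolved frequency   */
        double pts_per_wavelength = 10.0;            /* assumption     */

        double lambda_min = vs_min / f_max;          /* ~54 m          */
        double h_min      = lambda_min / pts_per_wavelength;

        double volume = 100e3 * 100e3 * 37.5e3;      /* m^3, 100 x 100 x 37.5 km */
        double uniform_cells = volume / (h_min * h_min * h_min);

        printf("shortest wavelength ~%.0f m, finest spacing ~%.1f m\n",
               lambda_min, h_min);
        printf("uniform mesh at that spacing: ~%.1e cells "
               "(vs ~9.0e8 in the adaptive mesh)\n", uniform_cells);
        return 0;
    }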













[lemieux at PSC]
Role of PSC
Assistance in




Optimization
Efficient IO of terabyte size datasets
Expediting scheduling
Visualization













Inverse problem: Use records of past
seismic events to improve velocity model
[Maps: significant Southern California earthquakes since 1812;
seismometer locations and intensity map for the Northridge earthquake.]
Inverse problem













Major recognition
This entire effort won the Gordon Bell prize for special achievement
in 2003, the premier prize for outstanding computations in HPC. It is
given to the entry that utilizes innovative techniques to demonstrate
the most dramatic gain in sustained performance for an important
class of real-world applications.













QCD

Increased realism to confront experimental data
QCD: compelling evidence for the need to include quark virtual
degrees of freedom
Improvements due to continued algorithmic development, access to
major platforms, and sustained effort over decades













Tele-immersion (real time)
Henry Fuchs, U. of North Carolina
Can process 6 frames/sec (640 x 480) from 10 camera triplets using
1800 processors.













Simulating Network traffic
(almost real time)
George Riley et al (Georgia Tech)



Simulating networks with > 5M elements.
• modeled 106M packet transmissions in one second of wall
clock time, using 1500 processors
Near real time web traffic simulation
• Empirical HTTP Traffic model [Mah, Infocom ‘97]
• 1.1M nodes, 1.0M web browsers, 20.5M TCP Connections
• 541 seconds of wall clock time on 512 processors to
simulate 300 seconds of network operation
Fastest detailed computer simulations of computer networks
ever constructed













Where are grids in all this?



Grids aimed primarily at:
• Availability: computing on demand
• Reducing the effect of geographic distance
• Making services more transparent
Motivated by remote data, on-line instruments, and sensors, as well
as computers
They also contribute to the highest end by aggregating resources.
“ The emerging vision is to use cyberinfrastructure to build more
ubiquitous, comprehensive digital environments that become
interactive and functionally complete for research communities in
terms of people, data, information, tools, and instruments and that
operate at unprecedented levels of computational, storage, and data
transfer capacity.”
NSF Blue Ribbon Panel on Cyberinfrastructure













DTF (2001)
IA-64 clusters at 4 sites: Caltech (LA), ANL (Chicago), SDSC, NCSA
10 Gb/s point-to-point links
Can deliver 30 Gb/s between 2 sites
[Diagram: physical topology, full mesh]













Extensible Terascale Facility (2002)




Make network scalable, so
introduce hubs
Allow heterogeneous
architecture, and retain
interoperability
First step is integration of
PSC’s TCS machine
Many more computer science
interoperability issues
3 new sites approved in 2003
(Texas, Oak Ridge, Indiana)













Examples of Science Drivers

GriPhyN: particle physics, the Large Hadron Collider at CERN
• Overwhelming amount of data for analysis (>1 PB/year)
• Find rare events resulting from the decays of massive
new particles in a dominating background
• Need new services to support world-wide data access and
remote collaboration for coordinated management of
distributed computation and data without centralized control













Examples of Science Drivers
NVO- National Virtual Observatory
• Breakthroughs in telescope, detector, and computer
technology allow astronomical surveys to produce
terabytes of images and catalogues, in different wavebands,
from gamma- and X-rays, optical, infrared, through radio.
• Soon it will be easier to "dial up" a part of the sky than to
wait many months to access a telescope.
• Need multi-terabyte on-line databases interoperating seamlessly,
interlinked catalogues, and sophisticated query engines.
Research results from on-line data will be just as rich as those
from "real" telescopes.














UK – Teragrid
HPC-Grid Experiment
TeraGyroid: Lattice-Boltzmann
simulations of defect dynamics in
amphiphilic liquid crystals




Peter Coveney (University College London), Richard Blake (Daresbury
Lab), Stephen Pickles (Manchester), Bruce Boghosian (Tufts)













Project Partners
RealityGrid partners:
• University College London (Application, Visualisation, Networking)
• University of Manchester (Application, Visualisation, Networking)
• Edinburgh Parallel Computing Centre (Application)
• Tufts University (Application)
UK High-End Computing Services:
• HPCx, run by the University of Edinburgh and CCLRC Daresbury
Laboratory (Compute, Networking, Coordination)
• CSAR, run by the University of Manchester and CSC (Compute and
Visualisation)
TeraGrid sites:
• Argonne National Laboratory (Visualization, Networking)
• National Center for Supercomputing Applications (Compute)
• Pittsburgh Supercomputing Center (Compute, Visualisation)
• San Diego Supercomputer Center (Compute)













Project explanation




Amphiphiles are chemicals with hydrophobic (water-avoiding) tails and
hydrophilic (water-attracting) heads. When dispersed in solvents or
oil/water mixtures, they self-assemble into complex shapes; some
(gyroids) are of particular interest in biology.
Shapes depend on parameters like
• abundance and initial distribution of each component
• the strength of the surfactant-surfactant coupling
Desired structures can sometimes only be seen in very large systems,
e.g. smaller regions form gyroids in different orientations, and how
these then interact is of major significance.
Project goal is to study defect pathways and dynamics in gyroid
self-assembly.













Networking
[Map: UK research network with BT-provisioned links among Glasgow,
Belfast, Edinburgh, Newcastle, Daresbury (DL), Manchester, Cambridge,
Oxford, RAL, Cardiff, London, and Southampton, connected to the
TeraGrid via NetherLight in Amsterdam.]













Distribution of function




Computations run at HPCx, CSAR, SDSC, PSC and
NCSA. (7 TB memory - 5K processors in integrated
resource) One Gigabit of LB3D data is generated per
simulation time-step.
Visualisation run at Manchester/ UCL/ Argonne
Scientists steering calculations from UCL and Boston over
Access Grid. Steering requires reliable near-real time
data transport across the Grid to visualization engines.
Visualisation output and collaborations multicast to SC03
Phoenix and visualised on the show floor in the University
of Manchester booth













Exploring parameter space
through computational steering
[Figure captions: Initial condition: random water/surfactant mixture;
self-assembly starts. Cubic micellar phase, low surfactant density
gradient. Cubic micellar phase, high surfactant density gradient.
Rewind and restart from checkpoint. Lamellar phase: surfactant
bilayers between water layers.]













Results

Linking these resources allowed computation of the largest
set of lattice-Boltzmann (LB) simulations ever performed,
involving lattices of over one billion sites.













How do upcoming developments deal with the major technical issues?

Memory bandwidth
• Old Crays: 2 loads and a store per clock, about 12 B/flop
• TCS: ~1 B/flop, better than most commodity processors
• Earth Simulator: 4 B/flop
Power
• TCS: ~700 kW
• Earth Simulator: ~4 MW
Space
• TCS: ~2500 sq feet
• ASCI Q: new machine room of ~40,000 sq feet
• Earth Simulator: 3250 sq meters














Reliability
Short term responses
Livermore: BlueGene/L
Sandia: Red Storm














BlueGene/L (Livermore)

System on a chip
• IBM PowerPC with reduced clock (700 MHz) for lower power
consumption
• 2 processors/node, each 2.8 GF peak
• 256 MB/node (small, but allows up to 2 GB/node)
• Memory on chip, to increase memory bandwidth to 2 bytes/flop
• Communications processor on chip speeds interprocessor
communication (175 MB/s/link)
• Total 360 Tf, 65,536 nodes in a 3D torus
• Total power 1 MW
• Floor space 2500 sq ft
• Very fault tolerant (expect 1 failure/week)













BlueGene/L Science


Protein folding (molecular dynamics needs small memory and large
floating-point capability)
Materials science (again molecular dynamics)













Red Storm (Sandia)







Inspired by the T3E: a true MPP
Opteron chip from AMD
• 2 GHz clock, 4 Gflops, 1.3 B/flop memory bandwidth
High-bandwidth proprietary interconnect (from Cray)
• bandwidth of 6.4 GB/s, as good as local memory
10,000 CPUs, 3D torus
40 Tf peak, 10 TB memory
< 2 MW, < 3000 sq ft
Much emphasis on RAS (Reliability, Availability, Serviceability)













Scalability considerations

System               Node speed   Interconnect   Ratio
                     (Mflop/s)    (MB/s)         (B/flop)
X1                   51200        100000         1.95
Red Storm            4000         6400           1.6
ASCI Red             666          800            1.2
Cray T3E             1200         1200           1.0
BlueGene/L           5600         1050           0.19
Earth Simulator      64000        12300          0.19
ASCI Blue Mountain   64000        1200           .02
ASCI White           10000        2000           .083
LANL Pink            9600         250            .026
PSC Alpha Cluster    8000         700            .0875
ASCI Purple          218000       32000          .147
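The ratio column is simply the interconnect bandwidth divided by the
node speed, with consistent units; a minimal sketch of that calculation
for three rows of the table above:

    /* Bytes-per-flop ratio as used in the table: interconnect bandwidth
       per node (MB/s) divided by node peak speed (Mflop/s). */
    #include <stdio.h>

    int main(void)
    {
        struct { const char *name; double mflops, mbytes_s; } sys[] = {
            { "Red Storm",  4000.0,  6400.0 },
            { "Cray T3E",   1200.0,  1200.0 },
            { "BlueGene/L", 5600.0,  1050.0 },
        };
        for (int i = 0; i < 3; i++)
            printf("%-11s %.2f B/flop\n", sys[i].name,
                   sys[i].mbytes_s / sys[i].mflops);
        return 0;
    }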













(Color coding in the original slide: red entries are MPPs, green are SMP-based systems.)
Longer term responses

DARPA High Productivity Computing Systems
• Now involves Cray, Sun, and IBM (formerly also HP and SGI)
• Ultimate time frame is 2010, but prototype design now
due in 2007
• New architectural innovations
• Also stresses programmer productivity as much as raw
machine speed













Longer Term Response
High Productivity Computing Systems
Goal:
• Provide a new generation of economically viable high productivity
computing systems for the national security and industrial user
community (2007-2010)
Impact:
• Performance (time-to-solution): speed up critical national
security applications by a factor of 10X to 40X
• Programmability (time-for-idea-to-first-solution): reduce cost and
time of developing application solutions
• Portability (transparency): insulate research and operational
application software from the system
• Robustness (reliability): apply all known techniques to protect
against outside attacks, hardware faults, and programming errors
HPCS Program Focus Areas
Applications:
• Intelligence/surveillance, reconnaissance, cryptanalysis, weapons
analysis, airborne contaminant modeling, and biotechnology













Fill the Critical Technology and Capability Gap:
from today (late-80s HPC technology) to the future (quantum/bio computing)
Many Science Drivers for
more powerful machines













Materials Science Requirements
Electronic structures:
• Current: ~300 atoms: 0.5 Tflop/s, 100 Gbyte memory.
• Future: ~3000 atoms: 50 Tflop/s, 2 Tbyte memory.
Magnetic materials:
• Current: ~2000 atoms: 2.64 Tflop/s, 512 Gbytes.
• Future: hard drive simulation: 30 Tflop/s, 2 Tbyte.
Molecular dynamics:
• Current: 10^9 atoms, ns time scale: 1 Tflop/s, 50 Gbytes.
• Future: alloys, us time scale: 20 Tflop/s, 4 Tbytes.













Climate Modeling Requirements
Current state-of-the-art:
• Atmosphere: 1 x 1.25 deg spacing, with 29 vertical
layers.
• Ocean: 0.25 x 0.25 degree spacing, 60 vertical layers.
• Currently requires 52 seconds CPU time per simulated
day.
Future requirements (to resolve ocean mesoscale eddies):
• Atmosphere: 0.5 x 0.5 deg spacing.
• Ocean: 0.125 x 0.125 deg spacing.
• Computational requirement: 17 Tflop/s.
Future goal: resolve tropical cumulus clouds:
• 2 to 3 orders of magnitude more than above.
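The jump in cost comes from the usual resolution-scaling argument:
halving the horizontal spacing quadruples the number of grid columns
and, assuming a CFL-type constraint ties the time step to the spacing
(an assumption of this sketch, not stated in the talk), also halves
the time step, so each factor-of-2 refinement costs roughly 8x.

    /* Rough cost-scaling sketch for refining a climate model grid.
       Assumption: time step shrinks in proportion to grid spacing. */
    #include <stdio.h>

    int main(void)
    {
        double atm_refine = 1.0 / 0.5;      /* 1 deg    -> 0.5 deg   */
        double ocn_refine = 0.25 / 0.125;   /* 0.25 deg -> 0.125 deg */

        double atm_cost = atm_refine * atm_refine * atm_refine;  /* ~8x */
        double ocn_cost = ocn_refine * ocn_refine * ocn_refine;  /* ~8x */

        printf("atmosphere cost grows ~%.0fx, ocean ~%.0fx\n",
               atm_cost, ocn_cost);
        return 0;
    }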













Fusion Requirements
Tokamak simulation: ion temperature gradient turbulence in an
ignition experiment:
• Grid size: 3000 x 1000 x 64, or about 2 x 10^8 grid points.
• Each grid cell contains 8 particles, for a total of 1.6 x 10^9.
• 50,000 time steps required.
• Total cost: 3.2 x 10^17 flops, 1.6 Tbyte.
All-Orders Spectral Algorithm (AORSA): to address effects of RF
electromagnetic waves in plasmas.
• 120,000 x 120,000 complex linear system.
• 230 Gbyte memory.
• 1.3 hours on 1 Tflop/s.
• 300,000 x 300,000 linear system requires 8 hours.
• Future: 6,000,000 x 6,000,000 system (576 Tbyte memory), 160 hours
on 1 Pflop/s system.
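The memory figures follow directly from storing the dense complex
matrix at 16 bytes per double-precision complex entry; a quick check:

    /* Memory needed to store the dense complex linear systems quoted
       above (16 bytes per double-precision complex entry). */
    #include <stdio.h>

    int main(void)
    {
        double sizes[] = { 120000.0, 300000.0, 6000000.0 };
        for (int i = 0; i < 3; i++) {
            double bytes = sizes[i] * sizes[i] * 16.0;
            printf("%9.0f x %-9.0f : %10.2f Tbyte\n",
                   sizes[i], sizes[i], bytes / 1e12);
        }
        return 0;
    }

The first line reproduces the 230 Gbyte figure (0.23 Tbyte) and the
last the 576 Tbyte figure quoted above.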













Accelerator Modeling Requirements
Current computations:
• 128^3 to 512^3 cells, or 40 million to 2 billion particles.
• Currently requires 10 hours on 256 CPUs.
Future computations:
• Modeling intense beams in rings will be 100 to 1000 times more
challenging.













Astrophysics Requirements
Supernova simulation:
• 3D understanding of Type Ia supernovae, "standard candles" in
calculating distances to remote galaxies, requires 2,000,000
CPU-hours and exceeds 256 Gbyte.
Analysis of cosmic microwave background data:
• MAXIMA data: 5.3 x 10^16 flops, 100 Gbyte memory
• BOOMERANG data: 10^19 flops, 3.2 Tbyte memory
• Future MAP data: 10^20 flops, 16 Tbyte memory
• Future PLANCK data: 10^23 flops, 1.6 Pbyte memory













Take home lessons



Fielding capability computing takes considerable thought
and expertise
Computing power continues to grow, and it will be
accessible to you
Think of challenging problems to stress existing systems
and justify more powerful ones












