NSF’s Evolving Cyberinfrastructure Program
Guy Almes <[email protected]>
Office of Cyberinfrastructure
Oklahoma Supercomputing Symposium 2005
Norman
5 October 2005
National Science Foundation

Overview
• Cyberinfrastructure in Context
• Existing Elements
• Organizational Changes
• Vision and High-Performance Computing planning
• Closing thoughts

Cyberinfrastructure in Context
• Due to the research university’s mission:
  - each university wants a few people from each key research specialty
  - therefore, research colleagues are scattered across the nation / world
• Enabling their collaborative work is key to NSF

• Traditionally, there were two approaches to doing science:
  - theoretical / analytical
  - experimental / observational
• Now the use of aggressive computational resources has led to a third approach (a small sketch follows below):
  - in silico simulation / modeling

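To make the "in silico" idea concrete, here is a minimal, hypothetical sketch (not from the talk): a numerical experiment on a toy growth model, standing in for the large-scale simulation and modeling codes the slides refer to.

```python
# Minimal illustration of "in silico" science (hypothetical example, not from
# the talk): instead of solving the model analytically or observing it in the
# field, we integrate it numerically and inspect the result.

def simulate_logistic(r=0.5, K=1000.0, x0=10.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = r * x * (1 - x / K)."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * r * x * (1.0 - x / K)
        trajectory.append(x)
    return trajectory

if __name__ == "__main__":
    traj = simulate_logistic()
    # The simulated population approaches the carrying capacity K,
    # a conclusion reached entirely in the computer.
    print(f"final population: {traj[-1]:.1f}")
```

Real cyberinfrastructure workloads replace this toy model with coupled PDE solvers, molecular dynamics, weather models, and similar codes run at scale.
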
Cyberinfrastructure Vision
A new age has dawned in scientific and engineering
research, pushed by continuing progress in computing,
information, and communication technology, and pulled by
the expanding complexity, scope, and scale of today’s
challenges. The capacity of this technology has crossed
thresholds that now make possible a comprehensive
“cyberinfrastructure” on which to build new types of scientific
and engineering knowledge environments and organizations
and to pursue research in new ways and with increased
efficacy.
[NSF Blue Ribbon Panel report, 2003]
Historical Elements
• Supercomputer Center program from the 1980s
  - NCSA, SDSC, and PSC leading centers ever since
• NSFnet program of 1985-95
  - connect users to (and through) those centers
  - 56 kb/s to 1.5 Mb/s to 45 Mb/s within ten years
• Sensors: telescopes, radars, environmental, but treated in an ad hoc fashion
• Middleware: of growing importance, but historically underestimated

[Timeline, ’85 to FY ’08: NSF advanced computing programs and the reports that shaped them]
• Supercomputer Centers (’85): PSC, NCSA, SDSC, JvNC, CTC
• Partnerships for Advanced Computational Infrastructure (’97): Alliance (NCSA-led), NPACI (SDSC-led)
• Terascale Computing Systems and ITR Projects (’00)
• Discipline-specific CI Projects
• Core Support (NCSA, SDSC) and ETF Management & Operations (FY’05-’08)
• Reports: Branscomb (’93), Hayes (’95), PITAC (’99), Atkins (’03)

Explicit Elements
• Advanced Computing
  - variety of strengths, e.g., data-, compute-
• Advanced Instruments
  - sensor networks, weather radars, telescopes, etc.
• Advanced Networks
  - connecting researchers, instruments, and computers together in real time
• Advanced Middleware
  - enable the potential sharing and collaboration
• Note the synergies!

CRAFT: A normative example – Sensors + network + HEC
[Diagram of participants: Univ Oklahoma, NCSA and PSC, Internet2, UCAR Unidata Project, National Weather Service]

Current Projects within OCI
• Office of Cyberinfrastructure
  - HEC + X
  - Extensible Terascale Facility (ETF)
  - International Research Network Connections
  - NSF Middleware Initiative
  - Integrative Activities: Education, Outreach & Training
  - Social and Economic Frontiers in Cyberinfrastructure

TeraGrid: One Component
• A distributed system of unprecedented scale
  - 30+ TF, 1+ PB, 40 Gb/s net
• Unified user environment across resources
  - user software environment, user support resources
  - created User Portal in collaboration with NMI
• Created an initial community of over 500 users, 80 PIs
• Integrated new partners to introduce new capabilities
  - additional computing and visualization capabilities
  - new types of resources: data collections, instruments
• Built a strong, extensible team
(courtesy Charlie Catlett)

Key TeraGrid Resources
• Computational
  - very tightly coupled clusters: LeMieux and Red Storm systems at PSC
  - tightly coupled clusters: Itanium2 and Xeon clusters at several sites
  - memory-intensive systems: Maverick at TACC and Cobalt at NCSA
  - data-intensive systems: DataStar at SDSC
  - experimental: MD-Grape system at Indiana and BlueGene/L at SDSC
• Online and Archival Storage
  - e.g., more than a PB online at SDSC
• Data Collections
  - numerous
• Instruments
  - Spallation Neutron Source at Oak Ridge
  - Purdue Terrestrial Observatory

TeraGrid DEEP Examples
• Aquaporin Mechanism (Klaus Schulten, UIUC): animation pointed to by the 2003 Nobel chemistry prize announcement
• Atmospheric Modeling (Kelvin Droegemeier, OU)
• Reservoir Modeling (Joel Saltz, OSU)
• Groundwater/Flood Modeling (David Maidment, Gordon Wells, UT)
• Lattice-Boltzmann Simulations (Peter Coveney, UCL; Bruce Boghosian, Tufts)
• Advanced Support for TeraGrid Applications: TeraGrid staff are “embedded” with applications to create
  - functionally distributed workflows
  - remote data access, storage, and visualization
  - distributed data mining
  - ensemble and parameter sweep run and data management
(courtesy Charlie Catlett)

Cyberresources
Key NCSA Systems
• Distributed Memory Clusters
  - Dell (3.2 GHz Xeon): 16 Tflops
  - Dell (3.6 GHz EM64T): 7 Tflops
  - IBM (1.3/1.5 GHz Itanium2): 10 Tflops
• Shared Memory Clusters
  - IBM p690 (1.3 GHz Power4): 2 Tflops
  - SGI Altix (1.5 GHz Itanium2): 6 Tflops
• Archival Storage System
  - SGI/Unitree (3 petabytes)
• Visualization System
  - SGI Prism (1.6 GHz Itanium2 + GPUs)
(courtesy NCSA)

Cyberresources
Recent Scientific Studies at NCSA
[Image montage: Weather Forecasting, Computational Biology, Molecular Science, Earth Science]
(courtesy NCSA)

Computing: One Size Doesn’t Fit All
[Table: science areas vs. algorithm requirements. Areas: Nanoscience, Combustion, Fusion, Climate, Astrophysics. Requirements: multi-physics & multi-scale, dense linear algebra, FFTs, particle methods, AMR, data parallelism, irregular control flow. Each area stresses a different subset of these requirements.]
Trade-offs among system attributes (a brief illustration follows below):
• interconnect fabric
• processing power
• memory
• I/O
(courtesy SDSC)

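As a loose, hypothetical illustration of why one size doesn’t fit all, the sketch below (my addition, not from the slides) times two very different kernels on the same machine: a dense matrix multiply, which is dominated by floating-point throughput, and a 2-D FFT, which at scale is dominated by memory and interconnect traffic.

```python
# Rough illustration (not from the slides): different algorithm classes
# exercise a machine very differently.
import time
import numpy as np

n = 1024
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
np.dot(a, b)          # dense linear algebra: roughly 2*n^3 flops
t1 = time.perf_counter()
np.fft.fft2(a)        # 2-D FFT: roughly 5*n^2*log2(n^2) flops
t2 = time.perf_counter()

print(f"matmul: {t1 - t0:.3f} s, fft2: {t2 - t1:.3f} s")
# The FFT finishes much faster because it does asymptotically less
# arithmetic, yet when parallelized it stresses bisection bandwidth
# (all-to-all transposes) rather than raw processing power.
```

On a single workstation the FFT wins easily because it does far less arithmetic; on a large parallel machine the same FFT instead stresses the interconnect, which is why each science area in the table implies a different balance of the attributes listed above.
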
Computing: One Size Doesn’t Fit All
[Chart: applications plotted by data capability (increasing I/O and storage) vs. compute capability (increasing FLOPS), ranging from the SDSC Data Science Env (data storage/preservation, extreme I/O, SCEC visualization, EOL, NVO, out-of-core visualization) through distributed-I/O-capable applications (climate, SCEC simulation, ENZO simulation, ENZO 3D + time simulation, CIPRes, CFD, protein folding) to the traditional HEC env (CPMD, QCD), with campus, departmental, and desktop computing at the low end; applications whose I/O exceeds the WAN can’t be done on the Grid.]
(courtesy SDSC)

SDSC Resources
COMPUTE SYSTEMS
• DataStar
  - 2,396 Power4+ pes (IBM p655 and p690)
  - 4 TB total memory
  - up to 2 GB/s I/O to disk
• TeraGrid Cluster
  - 512 Itanium2 pes
  - 1 TB total memory
• Intimidata
  - early IBM BlueGene/L
  - 2,048 PowerPC pes
  - 128 I/O nodes
DATA ENVIRONMENT
• 1 PByte SAN
• 6 PB StorageTek tape library
• DB2, Oracle, MySQL
• Storage Resource Broker
• HPSS
• 72-CPU Sun Fire 15K
• 96-CPU IBM p690s
• Support for community data collections and databases
• Data management, mining, analysis, and preservation
SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES
• User Services
• Application/Community Collaborations
• Education and Training
• SDSC Synthesis Center
• Community SW, toolkits, portals, codes
(courtesy SDSC)

Pittsburgh Supercomputing Center “Big Ben” System
• Cray XT3, based on Sandia’s Red Storm system
  - working with Cray, SNL, ORNL
  - NSF award in Sept. 2004; in Oct. 2004 Cray announced the XT3, the commercial version of Red Storm
• Approximately 2,000 compute nodes, 1 GB memory/node, 2 TB total memory
• 3D toroidal mesh interconnect
• 10 Teraflops
• MPI latency: < 2 µs (neighbor), < 3.5 µs (full system); a measurement sketch follows below
• Bisection BW: 2.0/2.9/2.7 TB/s (x,y,z); peak link BW: 3.84 GB/s
• 400 sq. ft. floor space; < 400 kW power
• Now operational
(courtesy PSC)

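Latency figures like these are conventionally obtained with a ping-pong microbenchmark between two ranks. The sketch below is a generic, hypothetical example using mpi4py (not PSC’s actual benchmark): half the averaged round-trip time of a tiny message is reported as the one-way MPI latency.

```python
# Generic MPI ping-pong latency sketch (illustrative, not PSC's benchmark).
# Run with two ranks, e.g.: mpiexec -n 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
reps = 10000
buf = np.zeros(1, dtype=np.uint8)   # 1-byte message

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
t1 = MPI.Wtime()

if rank == 0:
    # One-way latency is half the average round-trip time.
    print(f"one-way latency: {(t1 - t0) / reps / 2 * 1e6:.2f} us")
```

Neighbor vs. full-system numbers come from pinning the two ranks to adjacent nodes or to the most distant nodes of the torus.
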
I-Light, I-Light2, and the TeraGrid Network Resource
(courtesy IU and PU)

Purdue, Indiana Contributions to the TeraGrid
• The Purdue Terrestrial Observatory portal to the TeraGrid will deliver GIS data from IU and real-time remote sensing data from the PTO to the national research community
• Complementary large facilities, including large Linux clusters
• Complementary special facilities, e.g., Purdue NanoHub and Indiana University MD-GRAPE systems
• Indiana and Purdue computer scientists are developing new portal technology that makes use of the TeraGrid (GIG effort)
(courtesy IU and PU)

New Purdue RP resources
• 11-teraflops Community Cluster (being deployed)
• 1.3 PB tape robot
• Non-dedicated (opportunistic) resources, defining a model for sharing university resources with the nation
(courtesy IU and PU)

PTO, Distributed Datasets for Environmental Monitoring
(courtesy IU and PU)

TeraGrid as Integrative Technology
• A likely key to ‘all’ foreseeable NSF HPC capability resources
• Working with OSG and others, work even more broadly to encompass both capability and capacity resources
• Anticipate requests for new RPs
• Slogans:
  - Learn once, execute anywhere
  - Whole is more than sum of parts

TeraGrid as a Set of Resources
• TeraGrid gives each RP an opportunity to shine
• Balance: value of innovative/peculiar resources vs. value of slogans
• Opportunistic resources, SNS, and the GRAPEs as interesting examples
• Note the stress on the allocation process

2005 IRNC Awards
• Awards
  - TransPAC2 (U.S. – Japan and beyond)
  - GLORIAD (U.S. – China – Russia – Korea)
  - TransLight/PacificWave (U.S. – Australia)
  - TransLight/StarLight (U.S. – Europe)
  - WHREN (U.S. – Latin America)
• Example use: Open Science Grid involving partners in the U.S. and Europe, mainly supporting high energy physics research based on the LHC

NSF Middleware Initiative (NMI)
• Program began in 2001
• Purpose: to design, develop, deploy, and support a set of reusable and expandable middleware functions that benefit many science and engineering applications in a networked environment
• Program encourages open source development
• Program funds mainly development, integration, deployment, and support activities

Example NMI-funded Activities
• GridShib – integrating Shibboleth campus attribute services with Grid security infrastructure mechanisms
• UWisc Build and Test facility – community resource and framework for multi-platform build and test of grid software
• Condor – mature distributed computing system installed on 1000’s of CPU “pools” and 10’s of 1000’s of CPUs (a job-submission sketch follows below)

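For readers unfamiliar with Condor, the sketch below is a hypothetical, minimal job submission (not from the talk); the script name and file paths are made up. It writes a classic submit description file and hands it to the condor_submit command, which queues the job for Condor to match to an idle machine in the pool.

```python
# Hypothetical Condor job submission sketch (illustrative only).
import subprocess
from pathlib import Path

submit_description = """\
universe   = vanilla
executable = analyze.sh
arguments  = input.dat
output     = job.out
error      = job.err
log        = job.log
queue
"""

# Write the submit description, then queue the job with condor_submit.
Path("job.sub").write_text(submit_description)
subprocess.run(["condor_submit", "job.sub"], check=True)
```
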
Organizational Changes
• Office of Cyberinfrastructure
  - formed on 22 July 2005
  - had been a division within CISE
• Cyberinfrastructure Council
  - chair is NSF Director; members are ADs
• Vision Document started
  - HPC Strategy chapter drafted
• Advisory Committee for Cyberinfrastructure

Cyberinfrastructure Components
[Diagram: High Performance Computing Tools & Services; Data Tools & Services; Collaboration & Communication Tools & Services; Education & Training]

Vision Document Outline
• Call to Action
• Strategic Plans for …
  - High Performance Computing
  - Data
  - Collaboration and Communication
  - Education and Workforce Development
• Complete document by 31 March 2006

Strategic Plan for High Performance Computing
• Covers the 2006-2010 period
• Enable petascale science and engineering by creating a world-class HPC environment
  - Science-driven HPC Systems Architectures
  - Portable Scalable Applications Software
  - Supporting Software
• Inter-agency synergies will be sought

Coming HPC Solicitation
• There will be a solicitation issued this month
• One or more HPC systems
• One or more RPs
• Rôle of TeraGrid
• Process driven by science user needs
• Confusion about capacity/capability
• Workshops
  - Arlington, 9 September
  - Lisle, 20-21 September

HPC Platforms (2000-2005)
[Diagram: platforms grouped around the ETF integrating framework]
• Tightly coupled platforms: TCS LeMieux 6 TF, Marvel 0.3 TF, Red Storm 10 TF
• Commodity platforms: Purdue Cluster 1.7 TF, Cray-Dell Xeon Cluster 6.4 TF, IBM Cluster 0.2 TF, Dell Xeon Cluster 16.4 TF, Condor Pool 0.6 TF
• I/O-intensive platforms: IBM DataStar 10.4 TF, SGI SMP system 6.6 TF, IBM Itanium Cluster 8 TF, IBM Itanium Cluster 3.1 TF

Cyberinfrastructure Vision
NSF will lead the development and support of a comprehensive
cyberinfrastructure essential to 21st century advances
in science and engineering.
[Diagram: LHC Data Distribution Model]
• Partners: Caltech, University of Florida, Fermilab, CERN, Open Science Grid and Grid3, DOE PPDG, NSF GriPhyN and iVDGL, EU LCG and EGEE, Brazil (UERJ, …), Pakistan (NUST, …), Korea (KAIST, …)