NSF’s Evolving Cyberinfrastructure Program
Guy Almes <[email protected]>
Office of Cyberinfrastructure
Oklahoma Supercomputing Symposium 2005, Norman, 5 October 2005

Slide 2: Overview
- Cyberinfrastructure in context
- Existing elements
- Organizational changes
- Vision and high-performance computing planning
- Closing thoughts

Slide 3: Cyberinfrastructure in Context
- Because of the research university's mission, each university wants a few people from each key research specialty; research colleagues are therefore scattered across the nation and the world.
- Enabling their collaborative work is key to NSF.

Slide 4
- Traditionally, there were two approaches to doing science: theoretical/analytical and experimental/observational.
- The use of aggressive computational resources has now led to a third approach: in silico simulation and modeling.

Slide 5: Cyberinfrastructure Vision
"A new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology, and pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive “cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy." [NSF Blue Ribbon Panel report, 2003]

Slide 6: Historical Elements
- Supercomputer Center program from the 1980s: NCSA, SDSC, and PSC have been leading centers ever since.
- NSFnet program of 1985-95: connected users to (and through) those centers; 56 kb/s to 1.5 Mb/s to 45 Mb/s within ten years.
- Sensors: telescopes, radars, environmental instruments, but treated in an ad hoc fashion.
- Middleware: of growing importance, but long underestimated.

Slide 7: [Timeline, 1985 through FY 2008: the Supercomputer Centers program (PSC, NCSA, SDSC, JvNC, CTC); the Branscomb, Hayes, PITAC, and Atkins reports; the Partnerships for Advanced Computational Infrastructure (the NCSA-led Alliance and the SDSC-led NPACI); ITR projects; discipline-specific CI projects; Terascale Computing Systems; ETF management and operations; and core support for NCSA and SDSC.]

Slide 8: Explicit Elements
- Advanced computing: a variety of strengths, e.g., data-intensive and compute-intensive systems.
- Advanced instruments: sensor networks, weather radars, telescopes, etc.
- Advanced networks: connecting researchers, instruments, and computers together in real time.
- Advanced middleware: enabling the potential sharing and collaboration.
- Note the synergies!
Slide 9: CRAFT: A Normative Example – Sensors + Network + HEC
[Diagram: sensor data flowing among the University of Oklahoma, NCSA and PSC, Internet2, the UCAR Unidata Project, and the National Weather Service.]

Slide 10: Current Projects within the Office of Cyberinfrastructure (OCI)
- HEC + X
- Extensible Terascale Facility (ETF)
- International Research Network Connections
- NSF Middleware Initiative
- Integrative activities: education, outreach, and training
- Social and economic frontiers in cyberinfrastructure

Slide 11: TeraGrid: One Component (courtesy Charlie Catlett)
- A distributed system of unprecedented scale: 30+ TF, 1+ PB, 40 Gb/s network.
- A unified user environment across resources: created an initial community of over 500 users and 80 PIs; created a User Portal in collaboration with NMI; user software environment; user support resources.
- Integrated new partners to introduce new capabilities: additional computing and visualization capabilities; new types of resources, such as data collections and instruments.
- Built a strong, extensible team.

Slide 12: Key TeraGrid Resources – Computational
- Resource classes: very tightly coupled clusters, tightly coupled clusters, memory-intensive systems, data-intensive systems, and experimental systems.
- Systems include DataStar at SDSC; Itanium2 and Xeon clusters at several sites; the LeMieux and Red Storm systems at PSC; Maverick at TACC and Cobalt at NCSA; and, experimentally, the MD-Grape system at Indiana and BlueGene/L at SDSC.

Slide 13: Online and Archival Storage, Data Collections, and Instruments
- Online and archival storage: e.g., more than a PB online at SDSC.
- Data collections: numerous.
- Instruments: the Spallation Neutron Source at Oak Ridge and the Purdue Terrestrial Observatory.

Slide 14: TeraGrid DEEP Examples (courtesy Charlie Catlett)
- Aquaporin mechanism: animation pointed to by the 2003 Nobel chemistry prize announcement (Klaus Schulten, UIUC).
- Atmospheric modeling (Kelvin Droegemeier, OU).
- Reservoir modeling (Joel Saltz, OSU).
- Groundwater/flood modeling (David Maidment, Gordon Wells, UT).
- Lattice-Boltzmann simulations (Peter Coveney, UCL; Bruce Boghosian, Tufts).
- Advanced support for TeraGrid applications: TeraGrid staff are "embedded" with applications to create functionally distributed workflows; remote data access, storage, and visualization; distributed data mining; and ensemble and parameter-sweep run and data management (a sketch follows below).

Slide 15: Cyberresources – Key NCSA Systems (courtesy NCSA)
- Distributed-memory clusters: Dell (3.2 GHz Xeon), 16 Tflops; Dell (3.6 GHz EM64T), 7 Tflops; IBM (1.3/1.5 GHz Itanium2), 10 Tflops.
- Shared-memory systems: IBM p690 (1.3 GHz Power4), 2 Tflops; SGI Altix (1.5 GHz Itanium2), 6 Tflops.
- Archival storage system: SGI/Unitree, 3 petabytes.
- Visualization system: SGI Prism (1.6 GHz Itanium2 plus GPUs).

Slide 16: Cyberresources – Recent Scientific Studies at NCSA (courtesy NCSA)
[Image montage: weather forecasting, computational biology, molecular science, and earth science.]
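The ensemble and parameter-sweep run management mentioned on the TeraGrid DEEP slide can be pictured as a small driver that fans a grid of parameter values out to independent batch jobs. The sketch below is only a minimal illustration under assumed conventions, not TeraGrid's actual tooling: the job script name run_member.sh, the PBS-style qsub invocation, and the parameter values are all hypothetical.

    #!/usr/bin/env python
    # Minimal parameter-sweep driver (illustrative sketch only).
    # Submits one PBS-style batch job per parameter combination; the
    # job script run_member.sh and the parameter grid are hypothetical.
    import itertools
    import subprocess

    resolutions = [1.0, 2.0, 4.0]   # hypothetical grid spacings (km)
    members = range(5)              # hypothetical ensemble member indices

    for dx, member in itertools.product(resolutions, members):
        # Hand the parameters to the job script through its environment.
        cmd = ["qsub", "-v", "DX=%s,MEMBER=%d" % (dx, member), "run_member.sh"]
        print("submitting: " + " ".join(cmd))
        subprocess.call(cmd)

Each submitted job would then read DX and MEMBER from its environment, run the model for that point in parameter space, and stage its output back for collective analysis.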
Slide 17: Computing: One Size Doesn't Fit All (courtesy SDSC)
[Table: science areas (nanoscience, combustion, fusion, climate, astrophysics) mapped against algorithm requirements (multi-physics and multi-scale, dense linear algebra, FFTs, particle methods, AMR, data parallelism, irregular control flow). The mix of requirements drives trade-offs among interconnect fabric, processing power, memory, and I/O.]

Slide 18: Computing: One Size Doesn't Fit All (courtesy SDSC)
[Chart: applications plotted by data capability (increasing I/O and storage) against compute capability (increasing FLOPS). Regions range from campus, departmental, and desktop computing through the traditional HEC environment, distributed-I/O-capable work, the SDSC data science environment, and data storage/preservation and extreme I/O, including work that cannot be done on the Grid because I/O exceeds WAN capacity. Applications shown include SCEC simulation and visualization, ENZO simulation and 3D+time simulation, EOL, NVO, climate, out-of-core visualization, CIPRes, CFD, protein folding, CPMD, and QCD.]

Slide 19: SDSC Resources (courtesy SDSC)
- Compute systems: DataStar (2,396 Power4+ processors in IBM p655 and p690 nodes, 4 TB total memory, up to 2 GB/s I/O to disk); the TeraGrid cluster (512 Itanium2 processors, 1 TB total memory); and Intimidata, an early IBM BlueGene/L (2,048 PowerPC processors, 128 I/O nodes).
- Data environment: 1 PB SAN; 6 PB StorageTek tape library; DB2, Oracle, and MySQL; Storage Resource Broker and HPSS; a 72-CPU Sun Fire 15K; 96-CPU IBM p690s.
- Science and technology staff, software, and services: support for community data collections and databases; data management, mining, analysis, and preservation; user services; application/community collaborations; education and training; the SDSC Synthesis Center; and community software, toolkits, portals, and codes.

Slide 20: Pittsburgh Supercomputing Center "Big Ben" System (courtesy PSC)
- Cray XT3, based on Sandia's Red Storm system; developed working with Cray, SNL, and ORNL.
- Approximately 2,000 compute nodes, 1 GB of memory per node, 2 TB total memory.
- 3D toroidal-mesh interconnect; 10 Teraflops.
- MPI latency: under 2 µs (nearest neighbor), under 3.5 µs (full system).
- Bisection bandwidth: 2.0/2.9/2.7 TB/s (x, y, z); peak link bandwidth: 3.84 GB/s.
- 400 sq. ft. of floor space; under 400 kW of power.
- NSF award in September 2004; in October 2004 Cray announced the XT3, the commercial version of Red Storm.
- Now operational.
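The nearest-neighbor MPI latency quoted for Big Ben is the kind of figure obtained from a ping-pong microbenchmark: two ranks bounce a tiny message back and forth and report half the average round-trip time. The sketch below is a minimal illustration assuming the mpi4py and NumPy packages; the slide does not name any particular measurement tool.

    # Minimal MPI ping-pong latency sketch (assumes mpi4py and NumPy).
    # Run on exactly two ranks, e.g.:  mpiexec -n 2 python pingpong.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    iters = 10000
    buf = np.zeros(1, dtype=np.uint8)   # one-byte message

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(iters):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        else:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    t1 = MPI.Wtime()

    if rank == 0:
        # One-way latency is half the average round-trip time.
        print("latency: %.2f microseconds" % ((t1 - t0) / iters / 2.0 * 1e6))

On a tightly coupled interconnect like Big Ben's such a measurement yields a few microseconds; across a wide-area network it is orders of magnitude larger, which is one reason capability systems and grid resources complement rather than replace each other.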
Slide 21: I-Light, I-Light2, and the TeraGrid Network Resource (courtesy IU and PU)
[Network map.]

Slide 22: Purdue and Indiana Contributions to the TeraGrid (courtesy IU and PU)
- The Purdue Terrestrial Observatory portal to the TeraGrid will deliver GIS data from IU and real-time remote-sensing data from the PTO to the national research community.
- Complementary large facilities, including large Linux clusters.
- Complementary special facilities, e.g., the Purdue NanoHub and the Indiana University MD-GRAPE systems.
- Indiana and Purdue computer scientists are developing new portal technology that makes use of the TeraGrid (GIG effort).

Slide 23: New Purdue RP Resources (courtesy IU and PU)
- An 11-teraflops Community Cluster (being deployed).
- A 1.3 PB tape robot.
- Non-dedicated (opportunistic) resources, defining a model for sharing university resources with the nation.

Slide 24: PTO: Distributed Datasets for Environmental Monitoring (courtesy IU and PU)
[Diagram.]

Slide 25: TeraGrid as Integrative Technology
- A likely key to 'all' foreseeable NSF HPC capability resources.
- Working with OSG and others, it will work even more broadly to encompass both capability and capacity resources.
- Anticipate requests for new RPs.
- Slogans: "learn once, execute anywhere"; "the whole is more than the sum of the parts."

Slide 26: TeraGrid as a Set of Resources
- TeraGrid gives each RP an opportunity to shine.
- Balance: the value of innovative or peculiar resources versus the value of the slogans.
- Opportunistic resources, the SNS, and the Grapes are interesting examples.
- Note the stress this places on the allocation process.

Slide 27: 2005 IRNC Awards
- TransPAC2 (U.S. – Japan and beyond)
- GLORIAD (U.S. – China – Russia – Korea)
- TransLight/PacificWave (U.S. – Australia)
- TransLight/StarLight (U.S. – Europe)
- WHREN (U.S. – Latin America)
- Example use: the Open Science Grid, involving partners in the U.S. and Europe, mainly supporting high-energy physics research based on the LHC.

Slide 28: NSF Middleware Initiative (NMI)
- The program began in 2001.
- Purpose: to design, develop, deploy, and support a set of reusable and expandable middleware functions that benefit many science and engineering applications in a networked environment.
- The program encourages open-source development.
- The program funds mainly development, integration, deployment, and support activities.

Slide 29: Example NMI-funded Activities
- GridShib: integrating Shibboleth campus attribute services with Grid security infrastructure mechanisms.
- UWisc Build and Test facility: a community resource and framework for multi-platform build and test of grid software.
- Condor: a mature distributed computing system installed on thousands of CPU "pools" and tens of thousands of CPUs (a submit sketch follows below).
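To make the Condor item concrete, the fragment below sketches a minimal submit description file of the kind that condor_submit accepts; the executable, arguments, and file names are hypothetical, and a real submission would typically also state its resource requirements.

    # sweep.sub -- hypothetical minimal Condor submit description.
    # The vanilla universe runs an ordinary serial executable in the pool;
    # the executable, arguments, and file names below are illustrative only.
    universe   = vanilla
    executable = analyze
    arguments  = input.dat
    output     = analyze.out
    error      = analyze.err
    log        = analyze.log
    queue

Running condor_submit sweep.sub places the job in the local queue, and condor_q shows it being matched to an idle machine in the pool; changing the final line to "queue 100" would submit one hundred instances, which is how Condor pools absorb capacity-style workloads.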
Slide 30: Organizational Changes
- Office of Cyberinfrastructure: formed on 22 July 2005; had been a division within CISE.
- Cyberinfrastructure Council: the chair is the NSF Director; the members are the ADs.
- Vision Document started; the HPC Strategy chapter has been drafted.
- Advisory Committee for Cyberinfrastructure.

Slide 31: Cyberinfrastructure Components
- High-performance computing tools and services
- Data tools and services
- Collaboration and communication tools and services
- Education and training

Slide 32: Vision Document Outline
- Call to action.
- Strategic plans for high-performance computing; data; collaboration and communication; and education and workforce development.
- Complete document by 31 March 2006.

Slide 33: Strategic Plan for High Performance Computing
- Covers the 2006-2010 period.
- Goal: enable petascale science and engineering by creating a world-class HPC environment, including science-driven HPC systems architectures, portable and scalable applications software, and supporting software.
- Inter-agency synergies will be sought.

Slide 34: Coming HPC Solicitation
- A solicitation will be issued this month: one or more HPC systems, one or more RPs.
- Role of TeraGrid.
- The process is driven by the needs of science users.
- There is confusion about capacity versus capability.
- Workshops: Arlington, 9 September; Lisle, 20-21 September.

Slide 35: HPC Platforms (2000-2005)
[Diagram: platforms grouped as tightly coupled, commodity, and I/O-intensive, tied together by the ETF integrating framework. Systems shown include TCS LeMieux (6 TF), Marvel (0.3 TF), Red Storm (10 TF), Purdue Cluster (1.7 TF), Cray-Dell Xeon Cluster (6.4 TF), IBM Cluster (0.2 TF), Dell Xeon Cluster (16.4 TF), Condor Pool (0.6 TF), IBM DataStar (10.4 TF), SGI SMP system (6.6 TF), and IBM Itanium clusters (8 TF and 3.1 TF).]

Slide 36: Cyberinfrastructure Vision
- NSF will lead the development and support of a comprehensive cyberinfrastructure essential to 21st-century advances in science and engineering.
[Diagram: LHC data distribution model, with partners including Caltech, the University of Florida, the Open Science Grid and Grid3, Fermilab, DOE PPDG, CERN, NSF GriPhyN and iVDGL, EU LCG and EGEE, Brazil (UERJ, …), Pakistan (NUST, …), and Korea (KAIST, …).]