Review of NCAR
Al Kellie, SCD Director
November 01, 2001

Outline of Presentation
• Introduction to
  – UCAR
  – NCAR
  – SCD
• Overview of divisional activities
  – Research data sets (Worley)
  – Mass Storage System (Harano)
  – Extracting model performance (Hammond)
  – Visualization & Earth System Grid (Middleton)
• Computing RFP (ARCS)

Outline of Presentation
• Introduction
• Overview of three divisional aspects
• Computing RFP (ARCS)

University Corporation for Atmospheric Research
• Member Institutions and Board of Trustees
• President: Richard Anthes
• Finance & Administration: Katy Schmoll, VP
• Corporate Affairs: Jack Fellows, VP
• NCAR: Tim Killeen, Director
  – Information Infrastructure Technology & Applications (IITA) – Richard Chinman
  – Atmospheric Chemistry Division (ACD) – Daniel McKenna
  – Atmospheric Technology Division (ATD) – David Carlson
  – Environmental & Societal Impacts Group (ESIG) – Robert Harriss
  – Advanced Study Program (ASP) – Al Cooper
  – High Altitude Observatory (HAO) – Michael Knölker
  – Climate & Global Dynamics Division (CGD) – Maurice Blackmon
  – Mesoscale & Microscale Meteorological Division (MMM) – Robert Gall
  – Research Applications Program (RAP) – Brant Foote
  – Scientific Computing Division (SCD) – Al Kellie
• UCAR Programs: Jack Fellows, Director
  – Cooperative Program for Operational Meteorology, Education and Training (COMET) – Timothy Spangler
  – Constellation Observing System for Meteorology Ionosphere Climate (COSMIC) – Bill Kuo
  – GPS Science and Technology Program (GST) – Randolph Ware
  – Digital Library for Earth System Science (DLESE) – Mary Marlino
  – Unidata – David Fulker
  – Visiting Scientists Programs (VSP) – Meg Austin
  – Joint Office for Science Support (JOSS) – Karyn Sawyer

NCAR Organization
• NCAR Director: Tim Killeen; Associate Director: Steve Dickson
• Reports to UCAR (Rick Anthes) and the UCAR Board of Trustees
• Divisions and programs: ACD (Dan McKenna), CGD (Maurice Blackmon), HAO (Michael Knölker), MMM (Bob Gall), ESIG (Bob Harriss), ASP (Al Cooper), SCD (Al Kellie), ATD (Dave Carlson), RAP (Brant Foote)
• Support offices: ISS (K. Kelly), B&P (R. Brasher)

NCAR at a Glance
• 41 years; 850 staff, including 135 scientists
• $128M budget for FY2001
• 9 divisions and programs
• Research tools, facilities, and visitor programs for the NSF and university communities

FY 2000 Funding Distribution (NCAR FY 2000 expenditures/commitments; total FY2001 funding: $128M)
• NSF Regular 62%
• NSF Special 5%
• NASA 8%
• DOE 8%
• FAA 7%
• DOD 5%
• NOAA 3%
• Other 2%
• EPA 0.4%

[Chart: NCAR peer-reviewed publications, 1997–2000, NCAR authors vs. joint with outside authors]

[Chart: NCAR visitors by length of stay (1–7, 8–30, 31–90, 91–180, 180+ days), 1998–2000]

Where did SCD come from?
• 1959 "Blue Book": "There are four compelling reasons for establishing a National Institute for Atmospheric Research"
  – Reason 2: "The requirement for facilities and technological assistance beyond those that can properly be made available at individual universities"

SCD Mission
• Enable the best atmospheric & related research, no matter where the investigator is located, through the provision of high performance computing technologies and related services

Scientific Computing Division
• Director's Office – Al Kellie, Director (12)
• Computational Science – Steve Hammond (8): algorithmic software development, model performance research, science collaboration frameworks, standards & benchmarking
• Networking Engineering & Telecommunications – Marla Meehl (25): LAN, MAN, WAN, dial-up access, network infrastructure
• Data Support – Roy Jenne (9): data archives, data catalogs, user assistance
• Operations and Infrastructure Support – Aaron Andersen (18): operations room, facility management & reporting, database applications, site licenses
• High Performance Systems – Gene Harano (13): supercomputer systems, mass storage systems
• User Support Section – Ginger Caldwell (21): training/outreach/consulting, digital information, distributed servers & workstations, allocations & account management
• Visualization & Enabling Technologies – Don Middleton (12): data access, data analysis, visualization

SCD Funding Sources
• Base $24,874
• UCAR $4,027
• Outside $2,020
• Overhead $1,063

SCD Base Budget Distribution
• Equipment, maintenance, software: $11,000,000
• Salaries: $4,800,000
• G&A overhead: $4,000,000
• Benefits: $2,200,000
• Other operating costs: $2,100,000

Computing Services for Research
• SCD operates two distinct computational facilities:
  – Climate simulations
  – University community
• Governance of these SCD resources is in the hands of the users, through two external allocation committees.
• Computing leverages a common infrastructure for access, networking, data storage & analysis, research data sets, and support services including software development and consulting.

Climate Simulation Laboratory (CSL)
• The CSL is a national, multi-agency, special-use computing facility for climate system modeling in support of the U.S. Global Change Research Program (USGCRP).
  – Priority goes to projects that require very large amounts of computer time.
• CSL resources are available to individual U.S. researchers, with a preference for research teams, regardless of sponsorship.
• An inter-agency panel selects the projects that use the CSL.

Community Facility
• The Community Facility is used primarily by university-based NSF grantees and NCAR scientists.
  – Community resources are allocated evenly between NCAR and the university community.
• NCAR resources are allocated by the NCAR Director to the various NCAR divisions.
• University resources are allocated by the SCD Advisory Panel and are open to all areas of atmospheric and related sciences.
Distribution of Compute Resources
All use of SCD computing by area of scientific interest through September 2000, for fiscal year 2000:
• Climate 25.36%
• Astrophysics 21.10%
• Weather Prediction 17.81%
• Miscellaneous 11.19%
• Upper Atmosphere 10.58%
• Oceanography 9.24%
• Other 4.72% (includes Basic Fluid Dynamics 1.50% and Cloud Physics 3.22%)

[Charts: percentage of SCD computing by area of scientific interest, FY90–FY00, for Climate, Weather Prediction, and Oceanography]

History of Supercomputing at NCAR
[Timeline chart, 1960–2001, of production and non-production machines: CDC 3600, CDC 6600, CDC 7600, Cray 1-A S/N 3, Cray 1-A S/N 14, Cray X-MP/4, Cray Y-MP/8, TMC CM2/8192, IBM RS/6000 Cluster, Cray Y-MP/2, Cray Y-MP/8I, IBM SP1/8, TMC CM5/32, CCC Cray 3/4, Cray T3D/64, Cray T3D/128, Cray J90/16, Cray J90/20, Cray C90/16, two Cray J90se/24, HP SPP-2000/64, SGI Origin2000/128, Beowulf/16, Compaq ES40/36 Cluster, IBM SP/32, IBM SP/64, IBM SP/296, IBM SP/604, IBM SP/1308; STK 9940 tape drives added in 2001]

NCAR Wide Area Connectivity
• OC3 (155 Mbps) to the Front Range GigaPop – OC12 (622 Mbps) on 1/1/2002
  – OC3 to AT&T commodity Internet
  – OC3 to C&W commodity Internet
  – OC3 to Abilene (OC12 on 1/1/2002)
• OC3 to the vBNS+
• OC12 (622 Mbps) to the University of Colorado at Boulder
  – Intra-site research and back-up link to the FRGP
• OC12 to NOAA/NIST in Boulder
  – Intra-site research and UUNET commodity Internet
• Dark fiber metropolitan area network at GigE (1000 Mbps) to other NCAR campus sites (a rough transfer-time sketch follows the TeraGrid map below)

TeraGrid Wide Area Network
[Map: StarLight international optical peering point (see www.startap.net); Abilene; hubs at Denver, Chicago, Indianapolis, Urbana, Los Angeles, and San Diego; OC-48 (2.5 Gb/s, Abilene); multiple 10 GbE (Qwest); multiple 10 GbE (I-WIRE dark fiber); I-WIRE sites include UIC, ANL, Starlight/Northwestern Univ, multiple carrier hubs, Illinois Inst of Tech, Univ of Chicago, NCSA/UIUC, and Indianapolis (Abilene NOC)]
• Solid lines in place and/or available by October 2001
• Dashed I-WIRE lines planned for summer 2002
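As a rough illustration of what these link rates mean in practice, the sketch below estimates ideal transfer times for a hypothetical 100 GB dataset over OC3, OC12, and GigE links. This is a minimal sketch: the dataset size is an assumption chosen only for illustration, and protocol overhead and real-world throughput limits are ignored.

```python
# Illustrative only: ideal transfer times for a hypothetical dataset
# over the link rates quoted above (protocol overhead ignored).

LINK_RATES_MBPS = {   # nominal line rates in megabits per second
    "OC3": 155,
    "OC12": 622,
    "GigE": 1000,
}

def transfer_hours(dataset_gb: float, rate_mbps: float) -> float:
    """Hours to move dataset_gb gigabytes at rate_mbps megabits/second."""
    bits = dataset_gb * 8 * 1000**3        # decimal GB -> bits
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600

if __name__ == "__main__":
    dataset_gb = 100.0                     # hypothetical model-output archive
    for name, rate in LINK_RATES_MBPS.items():
        print(f"{name:>5}: {transfer_hours(dataset_gb, rate):5.2f} h for {dataset_gb:.0f} GB")
```

At these assumed sizes, the OC3 link needs roughly an hour and a half, while the OC12 and GigE paths bring that down to well under half an hour.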
ARCS Synopsis
Credit: Tom Engel

ARCS RFP Overview
• Best value procurement, considering:
  – Technical evaluation
  – Delivery schedule
  – Production disruption
  – Allocation ready state
  – Infrastructure
  – Maintenance
  – Cost impact (i.e., existing equipment)
  – Past performance of bidders
  – Business proposal review
  – Other considerations – invitation to partner

ARCS Procurement
• Production-level
  – Availability, robust batch capacity, operational sustainability and support
  – Integrated software engineering and development environment
• High performance execution of existing applications
• Additionally – an environment conducive to development of next-generation models

Workload Profile Context
• Jobs using > 32 nodes
  – 0.4% of workload
  – Average 44 nodes, or 176 PEs
• Jobs using < 32 nodes
  – 99.6% of workload
  – Average 6 nodes, or 24 PEs

ARCS – The Goal
• A production-level, high-performance computing system providing for both capability and capacity computing
• A stable and upwardly compatible system architecture, user environment, and software engineering & development environments
• Initial equipment: at least double current capacity at NCAR
• Long term: achieve 1 TFLOPs sustained by 2005
[Chart: sustained TFLOPs target, 2001–2005]

ARCS – The Process
• SCD began the technical requirements draft Feb 2000
• RFP process (including scientific reps from NCAR divisions, UCAR Contracts, & an external review panel) formally began Mar 2000; RFP released Nov 2000
• Offeror proposal reviews, BAFOs, & supplemental proposals Jan–May 2001
• Technical evaluations, performance projections, risk assessment, etc. Feb–Jun 2001
• SCD recommendation for negotiations 21 Jun; NCAR/UCAR acceptance of the recommendation 25 Jun
• Negotiations 24–26 Jul; technical Ts&Cs completed 14 Aug
• Contract submitted to the NSF 01 Oct
• NSF approval 5 Oct … joint press release the week of SC01

ARCS RFP Technical Attributes
• Hardware (processors, nodes, memory, disk, interconnect, network, HIPPI)
• Software (OS, user environment, filesystems, batch subsystem)
• System administration, resource management, user limits, accounting, network/HIPPI, security
• Documentation & training
• System maintenance & support services
• Facilities (power, cooling, space)

Major Requirements
• Critical resource ratios (see the sketch after this list):
  – Disk: 6 Bytes/peak-FLOP; 64+ MB/sec single-stream & 2+ GB/sec bandwidth, sustainable
  – Memory: 0.4 Bytes/peak-FLOP
• "Full-featured" product set (cluster-aware compilers, debuggers, performance tools, administrative tools, monitoring)
• Hardware & software stability
• Hardware & software vendor support & responsiveness (on-site, call center, development organization, escalation procedures)
• Resource allocation (processor(s), node(s), memory, disk; user limits & disk quotas)
• Batch subsystem and NCAR job scheduler (BPS)
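To make the critical resource ratios concrete, the sketch below applies the 6 Bytes/peak-FLOP disk and 0.4 Bytes/peak-FLOP memory targets to a hypothetical peak rate. The 2 TFLOPs example value is an assumption used purely for illustration (it is roughly the blackforest-class peak quoted later), not a figure from this slide.

```python
# Illustrative sketch: minimum disk and memory implied by the ARCS
# resource ratios at a hypothetical system peak.  The ratios are the
# RFP targets quoted above; the example peak is an assumption.

DISK_BYTES_PER_PEAK_FLOP = 6.0
MEM_BYTES_PER_PEAK_FLOP = 0.4

def minimum_capacity_tb(peak_tflops: float, bytes_per_flop: float) -> float:
    """Capacity in TB implied by a bytes/peak-FLOP ratio at peak_tflops."""
    peak_flops = peak_tflops * 1e12
    return peak_flops * bytes_per_flop / 1e12   # bytes -> TB (decimal)

if __name__ == "__main__":
    peak = 2.0  # hypothetical peak TFLOPs, roughly blackforest-class
    print(f"peak {peak} TFLOPs -> disk >= "
          f"{minimum_capacity_tb(peak, DISK_BYTES_PER_PEAK_FLOP):.1f} TB, "
          f"memory >= {minimum_capacity_tb(peak, MEM_BYTES_PER_PEAK_FLOP):.2f} TB")
```

For a 2 TFLOPs peak this works out to roughly 12 TB of disk and 0.8 TB of memory, the same order as the blackforest figures quoted later in the deck.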
ARCS – Benchmarks (1)
• Kernels (Hammond, Harkness, Loft)
  – Single processor: COPY, IA, XPOSE, SHAL, RADABS, ELEFUNT, STREAMC
  – Multi-processor shared memory: PSTREAM
  – Message-passing performance: XPAIR, BISECT, XGLOB, COMMS[1,2,3], STRIDED[1,2], SYNCH, ALLGATHER
• Parallel shared memory applications
  – CCM3.10.16 (T42 30-day & T170 1-day) – CGD, Rosinski
  – WRF Prototype (b_wave 5-day) – MMM, Michalakes

ARCS – Benchmarks (2)
• Parallel (MPI & hybrid) models
  – CCM3.10.16 (T42 30-day & T170 1-day) – CGD, Rosinski
  – MM5 3.3 (t3a 6-hr & "large" 1-hr) – MMM, Michalakes
  – POP 1.0 (medium & large) – CGD, Craig
  – MHD3D (medium & large) – HAO, Fox
  – MOZART2 (medium & large) – ACD, Walters
  – PCM 1.2 (T42) – CGD, Craig
  – WRF Prototype (b_wave 5-day) – MMM, Michalakes
• System tests
  – HIPPI – SCD, Merrill
  – I/O-tester – SCD, Anderson
  – Network – SCD, Mitchell
  – Batch workload – SCD, Engel; includes 2 I/O-tester, 4 hybrid MM5 3.3 large, 2 hybrid MM5 3.3 t3a, 2 POP 1.0 medium & large, CCM3.10.16 T170, MOZART2 medium, PCM 1.2 T42, 2 MHD3D medium & large, and WRF Prototype

Risks
• Vendor ability to meet commitments
  – Hardware (processor architecture, clock speed boosts, memory architecture)
  – Software (OS, filesystems, processor-aware compilers/libraries, 3rd-party tools)
• Service, support, responsiveness
• Vendor stability (product set, financial)
• Vendor promises vs. reality

Past Performance
• Hardware & software
  – SCD/NCAR experience
  – Other customers' experience
• "Missed promises"
  – Vendor X: ~2 yr slip, product line changes
  – Vendor Y: on target
  – Vendor Z: ~1.5 yr slip, product line changes

Other Considerations
• "Blue Light" project – an invitation to develop models for an exploratory supercomputer
  – Invitation to partnership development
  – Offer of an industrial partnership
• 256 TFLOPs peak, 8 TB memory, 200 TB disk on 64k nodes; true MPP with torus interconnect
• Node: 64 GFLOPs, 128 MB memory, 32 kB L1 cache, 4 MB L2 cache
• Other partners: Columbia, LLNL, SDSC, Oak Ridge

ARCS Award
• IBM was chosen to supply the NCAR Advanced Research Computing System (ARCS), which will exceed the articulated purpose and goals
• A world-class system to provide reliable production supercomputing to the NCAR Community and the Climate Simulation Laboratory
• A phased introduction of new, state-of-the-art computational, storage, and communications technologies through the life of the contract (3–5 years)
• First equipment delivered Friday, 5 October

ARCS Timetable
• 3-year contract
  – Oct 2001: blackforest upgrade – Winterhawk-2 & Nighthawk-2 nodes, 375 MHz POWER3-II
  – Sep 2002: bluesky with Colony Switch – Regatta nodes, ~1.35 GHz POWER4
  – Sep–Dec 2003: Federation Switch upgrade (blackforest removed after Federation acceptance)
• 2-year extension option
  – Sep–Dec 2004: bluesky upgrade – Armada nodes, ~2.0 GHz POWER4-GP

ARCS Capacities (minimums)
• 3-year contract
  – Oct 2001: blackforest upgrade – 10.5 TB total disk, 0.75 TB total memory, 1.1 peak TFLOPs new (2.0 total)
  – Sep 2002: bluesky with Colony Switch – 33 TB total disk, 2.8 TB total memory, 5.81+ peak TFLOPs new (6.81+ total)
  – Sep–Dec 2003: Federation Switch upgrade
• 2-year extension option
  – Sep–Dec 2004: bluesky upgrade – 65 TB total disk, 3.8 TB total memory, 8.75+ peak TFLOPs new (8.75+ total)
• (+) Negotiated capability commitments may require installation of additional capacity.

ARCS Commitments
• Minimum model capability commitments:
  – blackforest upgrade: 1.0x (defines 'x')
  – bluesky: 3.1x
  – bluesky upgrade: 4.6x
  Failure to meet these commitments will result in IBM installing additional computational capacity
• Improved user environment functionality, support, and problem-resolution response
• Early access to new hardware & software technologies
• NCAR's participation in IBM's "Blue Light" exploratory supercomputer project (PFLOPs)

ARCS, SCD Roadmap Goal, and Moore's Law
[Chart: estimated sustained TFLOPs, Jul 2001 – Jul 2005, showing the ARCS contract steps (blackforest upgrade, bluesky install, Federation install, blackforest deinstall, bluesky upgrade), the likely path to the TFLOP goal, the 1.0 TFLOPs sustained goal, and a Moore's Law curve]
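For comparison with the Moore's Law curve in the chart above, the sketch below projects a simple doubling trajectory for sustained TFLOPs. The 18-month doubling period and the 0.2 sustained-TFLOPs starting point at July 2001 are assumptions chosen only for illustration; neither value is stated on the slide.

```python
# Back-of-the-envelope Moore's Law projection, for comparison with the
# roadmap chart above.  The doubling period (18 months) and starting
# point (0.2 sustained TFLOPs at Jul 2001) are assumptions, not slide data.

DOUBLING_MONTHS = 18.0
START_TFLOPS = 0.2          # assumed sustained TFLOPs at Jul 2001

def projected_tflops(months_elapsed: float) -> float:
    """Sustained TFLOPs after months_elapsed, doubling every DOUBLING_MONTHS."""
    return START_TFLOPS * 2 ** (months_elapsed / DOUBLING_MONTHS)

if __name__ == "__main__":
    for year in range(2001, 2006):
        months = (year - 2001) * 12
        print(f"Jul-{year}: {projected_tflops(months):.2f} sustained TFLOPs")
```

Under these assumptions the curve crosses 1.0 sustained TFLOPs between 2004 and 2005, which is consistent with the chart's framing of the 2005 TFLOP goal.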
Proposed Equipment – IBM
• ARO+60 (blackforest upgrade)
  – Nodes: 164 WH2/4 & 5 NH2/16
  – Processor: 375 MHz POWER3
  – Interconnect: TBMX, 180 MB/s; 22 usec
  – Peak TF: 1.1
  – Memory (TB): 0.45
  – Disk (TB): 23.5
• Sep 2002 (bluesky)
  – Nodes: +120 POWER4 MI SMP/8
  – Processor: 1.35 GHz POWER4
  – Interconnect: Colony/NH2 Adapter†, 345 MB/s; 17 usec
  – Peak TF: 6.6
  – Memory (TB): 2.49
  – Disk (TB): 44.5
• System software: PSSP/AIX, JFS/GPFS, LoadLeveler
• † Federation switch (2400 MB/s, 4 usec) option 2H03

ARCS Roadmap
• Oct '01 – blackforest upgrade
  – blackforest: 2.0 TFLOPs peak, 0.73 TB memory, 10.5 TB GPFS disk, TBMX Switch, POWER3-II/375 MHz, 315 WH2/4pe plus 3 NH2/16pe, 512 MB memory/pe
• Oct '02 – bluesky installation
  – bluesky: 4.8+ TFLOPs peak, 2.8 TB memory, 21 TB GPFS disk, Colony Switch, POWER4/~1.35 GHz, 3 NH2/16pe (POWER3), node/pe counts TBD, ~2.0 GB memory/pe
  – blackforest: 2.0 TFLOPs peak, 0.73 TB memory, 10.5 TB GPFS disk, TBMX Switch, POWER3-II/375 MHz, 315 WH2/4pe (NH2 nodes to bluesky), 512 MB memory/pe
• Oct '03 – Federation upgrade
  – bluesky: 4.8+ TFLOPs peak, 2.8 TB memory, 21 TB GPFS disk, Federation Switch (NH2 removed), POWER4/~1.35 GHz, node/pe counts TBD, ~2.0 GB memory/pe
  – blackforest: 2.0 TFLOPs peak, 0.73 TB memory, 10.5 TB GPFS disk, TBMX Switch, POWER3-II/375 MHz, 315 WH2/4pe, 512 MB memory/pe
• Oct '04 – bluesky upgrade
  – bluesky: 8.75+ TFLOPs peak, 3.8 TB memory, 65 TB GPFS disk, Federation Switch, POWER4-GP/~2.0 GHz, node/pe counts TBD, ~3.0 GB memory/pe

"TFLOP Option"
• SCD will likely augment bluesky with additional POWER4 nodes when blackforest is decommissioned.

Thank you all for attending CAS 2001. See you all in 2003!