A tale of two grids – Open Science Grid & TeraGrid
Craig Stewart, Executive Director, Pervasive Technology Institute; Associate Dean, Research Technologies
20 July 2011
[email protected]
Presented at FLEET** Working Group Meeting, 20 July 2011, Vienna, Austria
Available from: http://hdl.handle.net/2022/13405
**http://cordis.europa.eu/fetch?CALLER=PROJ_ICT&ACTION=D&CAT=PROJ&RCN=99182

Open Science Grid Today
From http://www.opensciencegrid.org/
http://myosg.grid.iu.edu/map?all_sites=on&active=on&active_value=1&disable_value=1&gridtype=on&gridtype_1=on

11 Resource Providers, One Facility (2010)
Sites shown on the map: UW, Grid Infrastructure Group (UChicago), UC/ANL, PSC, NCAR, PU, NCSA, Caltech, IU, ORNL, USC/ISI, UNC/RENCI, NICS, SDSC, TACC, LONI. Legend: Resource Provider (RP), Software Integration Partner, Network Hub.
This slide from the talk "Overview of the TeraGrid" presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh, PA. Used with permission.

TeraGrid Today – XSEDE Tomorrow
https://www.teragrid.org/
https://www.xsede.org/
© XSEDE.ORG. Used with permission.

Three experimental projects => Trillium => OSG
• Three separate experimental projects:
  – Particle Physics Data Grid (DOE, 1999) – www.ppdg.net/
  – GriPhyN (NSF, 2000) – www.griphyn.org/
  – International Virtual Data Grid Laboratory (NSF, 2001) – www.ivdgl.org/
• Trillium:
  – Formal steering committee
  – Very clear definitions about what the project did, and what it did not do
• Open Science Grid – 2005:
  – Two awards, but parallel leadership
  – Clear command and control, but clear community input
  – Continued focus on what OSG does and what it does not do – strategic choices are real choices

TeraGrid and related NSF history
• 1980s NSF supercomputer centers program: 5, then 4 supercomputer centers
• And then there were two: NPACI and NCSA
• In 2000, the $36 million Terascale Computing System award to PSC (LeMieux – capable of 6 trillion operations per second; when LeMieux went online in 2001, it was the most powerful U.S. system committed to general academic research)
• LeMieux was intended to be a production supercomputer system

We're growing – what direction?
• The TeraGrid began in 2001, when NSF awarded $45 million to NCSA, SDSC, Argonne National Laboratory, and the Center for Advanced Computing Research (CACR) at the California Institute of Technology to establish a Distributed Terascale Facility (DTF). The initial TeraGrid specification called for computers capable of performing 11.6 teraflops, disk-storage systems with capacities of more than 450 terabytes, visualization systems, and data collections, integrated via grid middleware and linked through a 40-gigabit-per-second optical network.
• In 2002, NSF made a $35 million Extensible Terascale Facility (ETF) award to expand the initial TeraGrid to include PSC and integrate PSC's LeMieux system. The ETF resources provided the national research community with more than 20 teraflops of computing power distributed among the five sites and nearly one petabyte (one quadrillion bytes) of disk storage capacity.
• NSF made three Terascale Extensions awards totaling $10 million in 2003. The new awards funded high-speed networking connections to link the TeraGrid with resources at Indiana and Purdue Universities, Oak Ridge National Laboratory, and the Texas Advanced Computing Center at The University of Texas at Austin.

Production
• In production 1 October 2004 … but what did that really mean?
• The same centers appearing in multiple allocation schemes caused great confusion
• Is it an instrument? A social movement?
• Is it really delivering services uniquely enabled by the combination of hardware, networks, and people involved?
• In August 2005, the NSF Office of Cyberinfrastructure extended support for the TeraGrid with a $150M set of awards for operation, user support, and enhancement of the TeraGrid facility over the next five years

TeraGrid objectives – circa 2010
• DEEP Science: Enabling Terascale and Petascale Science
  – make science more productive through an integrated set of very-high-capability resources
  – address key challenges prioritized by users
• WIDE Impact: Empowering Communities
  – bring TeraGrid capabilities to the broad science community
  – partner with science community leaders – "Science Gateways"
• OPEN Infrastructure, OPEN Partnership
  – provide a coordinated, general-purpose, reliable set of services and resources
  – partner with campuses and facilities
This slide from the talk "Overview of the TeraGrid" presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh, PA. Used with permission.

What is the TeraGrid? – circa 2010
• An instrument that delivers high-end IT resources/services: computation, storage, visualization, and data/services
  – a computational facility – over two petaflops of parallel computing capability
  – a collection of Science Gateways – a new idiom for access to HPC resources via discipline-specific web-portal front ends (see the sketch after the next slide)
  – a data storage and management facility – over 20 petabytes of storage (disk and tape), over 100 scientific data collections
  – a high-bandwidth national data network
• A service: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources
• Available freely to research and education projects with a US lead
  – research accounts allocated via peer review
  – Startup and Education accounts automatic
This slide from the talk "Overview of the TeraGrid" presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh, PA. Used with permission.

How is TeraGrid organized? – 2010
• TG is set up like a large cooperative research group
  – evolved from many years of collaborative arrangements between the centers
  – still evolving!
• Federation of 12 awards
  – Resource Providers (RPs): provide the computing, storage, and visualization resources
  – Grid Infrastructure Group (GIG): central planning, reporting, coordination, facilitation, and management group
• Leadership provided by the TeraGrid Forum
  – made up of the PIs from each RP and the GIG
  – led by the TG Forum Chair (an elected position), who is responsible for coordinating the group – John Towns is TG Forum Chair
  – responsible for the strategic decision making that affects the collaboration
• Day-to-day functioning via Working Groups (WGs)
  – each WG is under a GIG Area Director (AD), includes RP representatives and/or users, and focuses on a targeted area of TeraGrid
This slide from the talk "Overview of the TeraGrid" presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh, PA. Used with permission.
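The Science Gateway idiom described above – a discipline-specific web front end that submits work to shared HPC resources on the user's behalf – can be made concrete with a small sketch. The code below is a minimal, hypothetical illustration, not taken from any actual TeraGrid gateway: the /submit endpoint, the run_analysis executable, and the PBS-style batch directives are all assumptions made for the example.

```python
# Hypothetical sketch of the Science Gateway idiom: a small web front end
# that hides an HPC batch system behind a discipline-specific interface.
# Endpoint names, the batch scheduler, and the community-account pattern
# are assumptions for illustration only.
import subprocess
import textwrap
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def submit_job(input_file: str, cpus: int = 16) -> str:
    """Submit a batch job (assumed PBS-style scheduler) and return its job ID."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #PBS -l nodes=1:ppn={cpus}
        #PBS -l walltime=01:00:00
        ./run_analysis {input_file}
        """)
    # 'qsub' reads the job script from stdin and prints the new job ID.
    result = subprocess.run(["qsub"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()


class GatewayHandler(BaseHTTPRequestHandler):
    """Handles requests such as /submit?input=sequence.fasta&cpus=16."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        job_id = submit_job(query.get("input", ["default.dat"])[0],
                            int(query.get("cpus", ["16"])[0]))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(f"Submitted batch job {job_id}\n".encode())


if __name__ == "__main__":
    # The gateway host, not the end user, holds the HPC credentials.
    HTTPServer(("localhost", 8080), GatewayHandler).serve_forever()
```

Real gateways layered authentication, community allocations, and grid middleware on top of this basic pattern; the point of the idiom is that domain scientists interact with a web form rather than with the batch system directly.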
Track I and II RFPs
• Track I – NCSA Blue Waters
• Track IIa – TACC Ranger
• Track IIb – NICS Kraken
• Track IIc – the wheels come off…
• Track IId – Data intensive; Testbed; Experimental GPU system

TeraGrid eXtreme Digital RFP and Task Forces
• The TeraGrid eXtreme Digital RFP was fundamentally different in its organizational structure
• One leader, many subcontracts
• Still a great deal of diversity, but one fundamental point of control organizationally, and a path toward one point of control financially

NSF Advisory Committee for Cyberinfrastructure Task Forces
• Cyberinfrastructure consists of computational systems, data and information management, advanced instruments, visualization environments, and people, all linked together by software and advanced networks to improve scholarly productivity and enable knowledge breakthroughs and discoveries not otherwise possible.
• In early 2009, the National Science Foundation's (NSF) Advisory Committee for Cyberinfrastructure (ACCI) charged six task forces with making strategic recommendations to NSF in strategic areas of cyberinfrastructure:
  – Data
  – Grand Challenges and Virtual Organizations
  – High Performance Computing
  – Software and Tools
  – Workforce Development
  – Campus Bridging
• Why Bridging? We need bridges because it feels like you are falling off a cliff when you go from your campus CI to the TeraGrid or Open Science Grid, so you need to have a bridge…

Some information about input to the NSF regarding cyberinfrastructure
• Key point: During the competition in response to the TeraGrid XD solicitation, the two teams competing to manage the new facility did in fact also collaborate, by providing NSF with data and reasoning that supported making an award – rather than not making an award at all (which was a real possibility)

Not a Branscomb Pyramid
[Chart: US cyberinfrastructure resources by tier – NSF Track 1; Track 2 and other major facilities; Campus HPC/Tier 3 systems; Workstations at Carnegie…; Volunteer computing; Commercial cloud (IaaS and PaaS)]
So that anyone may quibble, the data are published: Welch, V., R. Sheppard, M.J. Lingwall and C.A. Stewart. Current structure and past history of US cyberinfrastructure (data set and figures). 2011. Available from: http://hdl.handle.net/2022/13136

Adequacy of Research CI
• Never: 10.6%
• Some of the time: 20.2%
• Most of the time: 40.2%
• All of the time: 29%
Stewart, C.A., D.S. Katz, D.L. Hart, D. Lantrip, D.S. McCaulay and R.L. Moore. Technical Report: Survey of cyberinfrastructure needs and interests of NSF-funded principal investigators. 2011. Available from: http://hdl.handle.net/2022/9917

• Each is most probably correct; with regard to some aspect of innovative capability, each computer scientist's software usually is the best.
• At the end of the day, choices have to be made about which tools are most widely adopted as part of the national (international?) infrastructure, to achieve some economy of scale

Criteria for evaluating software for sustainability and reuse (a structured rendering follows below):
• Audience: current number of users; annual growth rate; potential user communities
• Creators: size of the community expected to contribute to and maintain the software
• License terms
• Reusability: current Reuse Readiness Level
• Best practices in software engineering: Is there a formal software development plan? Are there independent reviews and audits of software development?
• Software functionality: Describe the software's efficiency, including parallel scaling if appropriate
• Scientific outcomes: What publications and major awards have been enabled by this software?
Adapted from: Cyberinfrastructure Software Sustainability and Reusability Workshop Final Report. C.A. Stewart, G.T. Almes, D.S. McCaulay and B.C. Wheeler, eds., 2010. Available from: http://hdl.handle.net/2022/6701 or https://www.createspace.com/3506064
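To show how the checklist above could be applied in practice, here is a minimal sketch that renders it as a structured assessment template in Python. The grouping, field names, and example values are my own reconstruction for illustration; they are not the workshop report's exact schema.

```python
# Hypothetical rendering of the software-evaluation checklist as a structured
# template; grouping and field names reconstruct the slide, not the report.
from dataclasses import dataclass, field


@dataclass
class SoftwareAssessment:
    name: str
    # Audience
    current_users: int = 0
    annual_growth_rate: float = 0.0
    potential_user_communities: list[str] = field(default_factory=list)
    # Creators and licensing
    contributor_community_size: int = 0
    license_terms: str = ""
    # Reusability
    reuse_readiness_level: int = 0
    # Best practices in software engineering
    has_formal_development_plan: bool = False
    has_independent_reviews_and_audits: bool = False
    # Functionality and outcomes
    efficiency_notes: str = ""          # including parallel scaling, if relevant
    enabled_publications_and_awards: list[str] = field(default_factory=list)


if __name__ == "__main__":
    # Example with entirely fictional values, showing how the template is filled in.
    candidate = SoftwareAssessment(
        name="example-solver",
        current_users=250,
        annual_growth_rate=0.15,
        license_terms="BSD-3-Clause",
        reuse_readiness_level=6,
        has_formal_development_plan=True,
    )
    print(candidate)
```

Filling in one such record per software package would make the criteria directly comparable across candidates, which is the purpose of the checklist.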
We are all human (subjects)
• Resource elicitation meetings conducted with proper IRB approval were essential in creating a publishable needs analysis
• http://hdl.handle.net/2022/9917

NSF ACCI Task Forces also provide significant guidance to NSF
http://www.nsf.gov/od/oci/taskforces/

http://pti.iu.edu/campusbridging/

CIF21, TeraGrid XD, and OSG of the future
• The fundamental challenge with the TeraGrid was always that its formal organizational and financial structure was fundamentally different from the structure NSF said, verbally, that it wanted
• Cyberinfrastructure Framework for 21st Century Discovery (CIF21)
• TeraGrid eXtreme Digital – a subset of CIF21, but an umbrella for many existing programs
• Thanks to the data deluge, we have a mission for TeraGrid XD that is both general and meaningful
• OSG has its challenges as well

Thank you!
• Questions and discussion?
Please cite as: Stewart, C.A. 2011. "A tale of two grids – Open Science Grid & TeraGrid." Presentation. Presented at FLEET Working Group Meeting, 20 July 2011, Vienna, Austria. Available from: http://hdl.handle.net/2022/13405
Except where otherwise noted, the contents of this presentation are copyright 2011 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/).