A tale of two grids – Open Science Grid & TeraGrid
Craig Stewart
Executive Director, Pervasive Technology Institute; Associate Dean, Research Technologies
20 July 2011
[email protected]
Presented at FLEET** Working Group Meeting, 20 July, Vienna, Austria
Available from: http://hdl.handle.net/2022/13405
**http://cordis.europa.eu/fetch?CALLER=PROJ_ICT&ACTION=D&CAT=PROJ&RCN=99182
Open Science Grid Today
From http://www.opensciencegrid.org/
http://myosg.grid.iu.edu/map?all_sites=on&active=on&active_value=1&disable_value=1&gridtype=on&gridtype_1=on
11 Resource Providers, One Facility 2010
[Map of TeraGrid sites: UW, Grid Infrastructure Group (UChicago), UC/ANL, PSC, NCAR, PU, NCSA, Caltech, IU, ORNL, USC/ISI, UNC/RENCI, NICS, SDSC, TACC, LONI. Legend: Resource Provider (RP), Software Integration Partner, Network Hub.]
This slide from the talk “Overview of the TeraGrid” presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh PA. Used with permission
TeraGrid Today – XSEDE Tomorrow
https://www.teragrid.org/
https://www.xsede.org/ © XSEDE.ORG. Used with permission
Three experimental projects => Trillium => OSG
• Three separate experimental projects
  – Particle Physics Data Grid (DOE, 1999) www.ppdg.net/
  – GriPhyN (NSF, 2000) www.griphyn.org/
  – International Virtual Data Grid Laboratory (NSF, 2001) www.ivdgl.org/
• Trillium
  – Formal steering committee
  – Very clear definitions about what the project did, and what it did not do
• Open Science Grid – 2005
  – Two awards, but parallel leadership
  – Clear command and control
  – But clear community input
  – Continued focus on what OSG does and what it does not do – strategic choices are real choices
TeraGrid and related NSF History
• 1980s NSF supercomputer centers program: 5, then 4 supercomputer centers
• And then there were two: NPACI and NCSA
• In 2000, the $36 million Terascale Computing System award to PSC (LeMieux, capable of 6 trillion operations per second; when LeMieux went online in 2001, it was the most powerful U.S. system committed to general academic research)
• LeMieux was intended to be a production supercomputer system
We’re growing – what direction?
• The TeraGrid began in 2001 when NSF awarded $45 million to NCSA, SDSC, Argonne National Laboratory, and the Center for Advanced Computing Research (CACR) at the California Institute of Technology to establish a Distributed Terascale Facility (DTF). The initial TeraGrid specifications included computers capable of performing 11.6 teraflops, disk-storage systems with capacities of more than 450 terabytes, visualization systems, and data collections, all integrated via grid middleware and linked through a 40-gigabits-per-second optical network.
• In 2002, NSF made a $35 million Extensible Terascale Facility (ETF) award to expand the initial TeraGrid to include PSC and integrate PSC's LeMieux system. Resources in the ETF provide the national research community with more than 20 teraflops of computing power distributed among the five sites and nearly one petabyte (one quadrillion bytes) of disk storage capacity.
• NSF made three Terascale Extensions awards totaling $10 million in 2003. The new awards funded high-speed networking connections to link the TeraGrid with resources at Indiana and Purdue Universities, Oak Ridge National Laboratory, and the Texas Advanced Computing Center at The University of Texas at Austin.
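A rough aside (mine, not from the slides): the DTF numbers above imply that even at the full 40 Gb/s backbone rate, moving the entire 450 TB disk store between sites would take about a day. A minimal sketch of the arithmetic in Python, assuming an ideal, uncontended link:

# Back-of-the-envelope: time to move the DTF's ~450 TB disk store
# across its 40 Gb/s optical backbone, assuming a perfect, fully
# dedicated link (real grid transfers ran well below line rate).
disk_bytes = 450e12           # 450 terabytes
link_bps = 40e9               # 40 gigabits per second

seconds = disk_bytes * 8 / link_bps
print(f"{seconds:,.0f} s = {seconds / 3600:.1f} hours")   # 90,000 s = 25.0 hours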
Production
• In production 1 October 2004 … but what did that really mean?
• The same centers in multiple allocation schemes caused great confusion
• Is it an instrument? A social movement? Is it really delivering services uniquely enabled by the combination of hardware, networks, and people involved?
• In August 2005, the NSF Office of Cyberinfrastructure extended support for the TeraGrid with a $150M set of awards for operation, user support, and enhancement of the TeraGrid facility over the next five years
TeraGrid objectives – circa 2010
• DEEP Science: Enabling Terascale and Petascale Science
  – make science more productive through an integrated set of very high-capability resources
  – address key challenges prioritized by users
• WIDE Impact: Empowering Communities
  – bring TeraGrid capabilities to the broad science community
  – partner with science community leaders – “Science Gateways”
• OPEN Infrastructure, OPEN Partnership
  – provide a coordinated, general-purpose, reliable set of services and resources
  – partner with campuses and facilities
This slide from the talk “Overview of the TeraGrid” presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh PA. Used with permission
What is the TeraGrid? – circa 2010
• An instrument that delivers high-end IT resources/services: computation, storage, visualization, and data/services
  – a computational facility – over two petaflops of parallel computing capability
  – a collection of Science Gateways – provides a new idiom for access to HPC resources via discipline-specific web-portal front ends
  – a data storage and management facility – over 20 petabytes of storage (disk and tape), over 100 scientific data collections
  – a high-bandwidth national data network
• A service: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources
• Available freely to research and education projects with a US lead
  – research accounts allocated via peer review
  – Startup and Education accounts automatic
This slide from the talk “Overview of the TeraGrid” presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh PA. Used with permission
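To make the Science Gateway idiom concrete – a discipline-specific web portal translating a researcher's form input into a batch job on HPC resources – here is a minimal, hypothetical sketch in Python. The function name, scheduler directives, and application names are illustrative assumptions, not any actual gateway's API:

import subprocess
import tempfile

def submit_gateway_job(app_binary: str, input_file: str, cores: int = 64) -> str:
    """Hypothetical gateway back end: turn a web-form request into a
    PBS-style batch job. Directive syntax is illustrative only."""
    script = f"""#!/bin/bash
#PBS -l nodes={cores // 8}:ppn=8
#PBS -l walltime=04:00:00
{app_binary} {input_file}
"""
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script)
        script_path = f.name
    # qsub prints the new job's identifier on success
    result = subprocess.run(["qsub", script_path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# The gateway user never sees any of this; they fill in a web form and
# click "Run". For example (hypothetical application and input file):
# job_id = submit_gateway_job("namd2", "apoa1.namd", cores=128)

The point of the idiom is that authentication, allocation, and queuing stay behind the portal, so domain scientists need no HPC expertise.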
How is TeraGrid organized? – 2010
• TG is set up like a large cooperative research group
  – evolved from many years of collaborative arrangements between the centers
  – still evolving!
• Federation of 12 awards
  – Resource Providers (RPs)
    • provide the computing, storage, and visualization resources
  – Grid Infrastructure Group (GIG)
    • central planning, reporting, coordination, facilitation, and management group
• Leadership provided by the TeraGrid Forum
  – made up of the PIs from each RP and the GIG
  – led by the TG Forum Chair, who is responsible for coordinating the group (elected position)
    • John Towns – TG Forum Chair
  – responsible for the strategic decision making that affects the collaboration
• Day-to-Day Functioning via Working Groups (WGs):
  – each WG under a GIG Area Director (AD), includes RP representatives and/or users, and focuses on a targeted area of TeraGrid
This slide from the talk “Overview of the TeraGrid” presented by John Towns at the TeraGrid10 Conference, 2-5 August 2010, Pittsburgh PA. Used with permission
Track I and II RFPs
• Track I – NCSA Blue Waters
• Track IIa – TACC Ranger
• Track IIb – NICS Kraken
• Track IIc – the wheels come off…
• Track IId
  – Data intensive
  – Testbed
  – Experimental GPU system
TeraGrid eXtreme Digital RFP and Taskforces
• TeraGrid eXtreme Digital RFP fundamentally different in its organizational structure
• One leader, many subcontracts
• Still a great deal of diversity, but one fundamental point of control organizationally and a path toward one point of control financially
NSF Advisory Committee for Cyberinfrastructure Task Forces
• Cyberinfrastructure consists of computational systems, data and information management, advanced instruments, visualization environments, and people, all linked together by software and advanced networks to improve scholarly productivity and enable knowledge breakthroughs and discoveries not otherwise possible.
• In early 2009, the National Science Foundation’s (NSF) Advisory Committee for Cyberinfrastructure (ACCI) charged six task forces with making strategic recommendations to NSF in key areas of cyberinfrastructure:
  – Data
  – Grand Challenges and Virtual Organizations
  – High Performance Computing
  – Software and Tools
  – Workforce Development
  – Campus Bridging
• Why Bridging? We need bridges because it feels like you are falling off a cliff when you go from your campus CI to the TeraGrid or Open Science Grid, so you need to have a bridge….
Some information about input to the NSF regarding Cyberinfrastructure
Key point: During the competition in response to the TeraGrid XD solicitation, the two teams competing to manage the new facility did in fact also collaborate, providing NSF with data and reasoning that supported the making of an award – rather than not making an award at all (which was a real possibility).
Not a Branscomb Pyramid
[Chart comparing the scale of US cyberinfrastructure tiers: NSF Track 1; Track 2 and other major facilities; Campus HPC/Tier 3 systems; Workstations at Carnegie…; Volunteer computing; Commercial cloud (IaaS and PaaS).]
So that anyone may quibble, the data are published: Welch, V., R. Sheppard, M.J. Lingwall and C.A. Stewart. Current structure and past history of US cyberinfrastructure (data set and figures). 2011. Available from: http://hdl.handle.net/2022/13136
Adequacy of Research CI
[Survey responses on adequacy of research cyberinfrastructure: Never (10.6%); Some of the time (20.2%); Most of the time (40.2%); All of the time (29%).]
Stewart, C.A., D.S. Katz, D.L. Hart, D. Lantrip, D.S. McCaulay and R.L. Moore. Technical Report: Survey of cyberinfrastructure needs and interests of NSF-funded principal investigators. 2011. Available from: http://hdl.handle.net/2022/9917
• Each is most probably correct; with regard to some aspect of innovative capability, each computer scientist’s software usually is the best.
• At the end of the day, choices have to be made about which tools are most widely adopted as part of the national (international?) infrastructure to achieve some economy of scale.
• Audience
  – Current
  – Annual growth rate
  – Number of users
  – Potential user communities
• Creators
  – Size of community expected to contribute and maintain software
  – License terms
• Reusability
  – Current Reuse Readiness Level
• Best practices in software engineering
  – Is there a formal software development plan?
  – Are there independent reviews and audits of software development?
• Software functionality
  – Describe the software’s efficiency, including parallel scaling if appropriate
• Scientific outcomes
  – What publications and major awards have been enabled by this software?
Adapted from: Cyberinfrastructure Software Sustainability and Reusability Workshop Final Report. C.A. Stewart, G.T. Almes, D.S. McCaulay and B.C. Wheeler, eds., 2010. Available from: http://hdl.handle.net/2022/6701 or https://www.createspace.com/3506064
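These criteria map naturally onto a structured rubric a review panel could fill in per software package; here is a minimal sketch in Python (my own illustration, not from the workshop report; all field names and example values are hypothetical):

from dataclasses import dataclass, field

@dataclass
class SoftwareEvaluation:
    name: str
    # Audience
    current_users: int = 0
    annual_growth_rate: float = 0.0              # fraction per year
    potential_communities: list = field(default_factory=list)
    # Creators
    contributor_community_size: int = 0
    license: str = ""
    # Reusability
    reuse_readiness_level: int = 0               # e.g., on a 1-9 RRL scale
    # Best practices in software engineering
    has_development_plan: bool = False
    has_independent_audits: bool = False
    # Software functionality and scientific outcomes
    parallel_scaling_notes: str = ""
    enabled_publications: list = field(default_factory=list)

example = SoftwareEvaluation(
    name="example-solver",                       # hypothetical package
    current_users=250,
    license="BSD-3-Clause",
    has_development_plan=True,
)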
We are all human (subjects)
• Resource elicitation meetings done with proper IRB approval were essential in creating a publishable needs analysis
• http://hdl.handle.net/2022/9917
NSF ACCI Task Forces also provide significant guidance to NSF
http://www.nsf.gov/od/oci/taskforces/
http://pti.iu.edu/campusbridging/
CIF21, TeraGrid XD, and OSG of the future
• The fundamental challenge with the TeraGrid was always that the formal organizational and financial structure was fundamentally different from the structure NSF verbally stated it wanted to have
• Cyberinfrastructure Framework for 21st Century Discovery (CIF21)
• TeraGrid eXtreme Digital
  – Subset of CIF21, but umbrella for many existing programs
• Thanks to the data deluge, we have a mission for TeraGrid XD that is both general and meaningful
• OSG has its challenges as well
Thank you!
• Questions and discussion?
Please cite as: Stewart, C.A. 2011. “A tale of two grids – Open Science Grid &
TeraGrid.” Presentation. Presented at FLEET Working Group Meeting, 20 July 2011,
Vienna, Austria. Available from: http://hdl.handle.net/2022/13405
Except where otherwise noted, the contents of this presentation are copyright 2011 by
the Trustees of Indiana University. This content is released under the Creative Commons
Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/)