
Grid Computing
from a solid past to a bright future?
David Groep
NIKHEF
2002-08-28
The Grid: a vision?
Imagine that you could plug your computer
into the wall and have direct access to huge
computing resources immediately,
just as you plug in a lamp to get instant light.
…
Far from being science-fiction, this is the idea
the XXXXXX project is about to make into reality.
…
from a project brochure in 2001
The Need for Grids: LHC
Physics @ CERN
• LHC particle accelerator
• operational in 2007
• 5-10 Petabytes per year
• 150 countries
• > 10,000 users
• lifetime ~ 20 years
http://www.cern.ch/
CPU & Data Requirements
[Chart: estimated CPU capacity required at CERN, in K SI95, for the
years 1998-2010, split into LHC experiments and other experiments;
in Jan 2000 the installed capacity was 3.5K SI95. A Moore's-law curve
gives some measure of the capacity that technology advances provide
for a constant number of processors or constant investment.]
• < 50% of the main analysis capacity will be at CERN
http://www.cern.ch/
More Reasons Why
ENVISAT
• 3500 MEuro programme cost
• 10 instruments on board
• 200 Mbps data rate to ground
• 400 Tbytes data archived/year
• ~100 `standard’ products
• 10+ dedicated facilities in Europe
• ~700 approved science user projects
http://www.esa.int/
And More …
Bio-informatics
• For access to data
– Large network bandwidth to access computing centers
– Support for data-bank replicas (easier and faster mirroring)
– Distributed data banks
• For interpretation of data
– Grid-enabled algorithms: BLAST on distributed data banks,
distributed data mining
And even more …
• financial services, life sciences, strategy evaluation, …
• instant immersive teleconferencing
• remote experimentation
• pre-surgical planning and simulation
Why is the Grid successful?
• Applications need
large amounts of data or computation
• Ever larger, distributed user community
• Network grows faster than compute power/storage
Inter-domain communication
• The Internet community spawned 3360 RFCs
(as of August 2nd, 2002)
• Myriad of different protocols and APIs
• Be strict in what you send,
be liberal in what you accept
[Chart: average IETF meeting attendance, 1986-2000, rising from a
few hundred to roughly 2,500 participants.]
• Inter-domain by nature
• Increasing focus on security
Intra-domain tools
• RPC proved hugely successful within domains
– YP
– Network File System
– Typical client-server stuff…
• CORBA
– Extension of RPC to OO design model
– Diversification
• Latest trend: web services
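To make the RPC-to-web-services trend concrete, here is a minimal sketch of RPC over HTTP using Python's standard XML-RPC modules; the function, port, and host are made up for illustration:

```python
# Minimal RPC-over-HTTP sketch using Python's standard XML-RPC
# modules; server and client run in one process for demonstration.
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

def square(x):
    # A trivial remote procedure.
    return x * x

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(square, "square")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.square(7))  # prints 49; the call travels as XML over HTTP
```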
The beginnings of the Grid
• Grown out of distributed computing
• Gigabit network test beds & meta-computing
• Supercomputer sharing (I-WAY)
• Condor ‘flocking’
[Map: GUSTO meta-computing test bed in 1999]
• Focus shifts to inter-domain operations
The Grid
Ian Foster and Carl Kesselman, editors,
“The Grid: Blueprint for a New Computing Infrastructure,” Morgan Kaufmann, 1999
The One-Liner
• Resource sharing and
coordinated problem solving
in dynamic multi-institutional virtual organisations
Standards Requirements
• Standards are key to inter-domain operations
• GGF established in 2001
• Approx. 40 working & research groups
[Chart: (G)GF meeting attendance for the ten meetings from 1999 to
2002 (scale 0-1,200 participants).]
http://www.gridforum.org/
Protocol Layers & Bodies
[Diagram: the Grid protocol layers drawn alongside the Internet
protocol architecture and the OSI reference stack, with the
standards body responsible for each level.]
• Grid layers: Application, Collective, Resource, Connectivity, Fabric
• Internet protocol architecture: Application, Transport, Internet, Link
• OSI stack: Application, Presentation, Session, Transport, Network,
Data Link, Physical
• Standards bodies: GGF and W3C at the application level, the IETF
for the transport and internet layers, the IEEE at the link and
physical layers
Grid Architecture (v1)
• Collective – “Coordinating multiple resources”: ubiquitous
infrastructure services, application-specific distributed services
• Resource – “Sharing single resources”: negotiating access,
controlling use
• Connectivity – “Talking to things”: communication
(Internet protocols) & security
• Fabric – “Controlling things locally”: access to,
& control of, resources
[Diagram: these Grid layers drawn next to the Internet protocol
architecture: Application, Transport, Internet, Link.]
What should the Grid provide?
• Dependable, consistent and pervasive access
• Interoperation among organisations
• Challenges:
– Complete transparency for the user
– Uniform access methods for
computing, data and information
– Secure, trustworthy environment for providers
– Accounting (and billing)
– Management-free ‘Virtual Organizations’
Grid Middleware
• Globus Project started 1997
• Current de-facto standard
• Reference implementation of Global Grid Forum
standards
• Toolkit `bag-of-services' approach
http://www.globus.org/
• Several middleware projects:
– EU DataGrid
– CrossGrid, DataTAG, PPDG, GriPhyN
– In NL: ICES/KIS Virtual Lab, VL-E
Condor
• Scavenging cycles off idle workstations
• Leading themes:
– Make a job feel `at home’
– Don’t ever bother the resource owner!
• Bypass – redirect data to process
• ClassAds – matchmaking concept
• DAGman – dependent jobs
• Kangaroo – file staging & hopping
• NeST – allocated `storage lots’
• PFS – Pluggable File System
• Condor-G – reliable job control for the Grid
http://www.cs.wisc.edu/condor/
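The ClassAds matchmaking concept can be illustrated in a few lines: jobs and machines each advertise attributes plus a Requirements expression, and a match needs both expressions to hold. A toy sketch in Python, not the real ClassAd language, with made-up attribute names:

```python
# Toy version of ClassAd-style matchmaking (not the real ClassAd
# language): each ad is a set of attributes plus a Requirements
# expression, and the matchmaker pairs a job with a machine only
# when both sides' requirements evaluate to True.
job_ad = {
    "ImageSize": 128,  # memory the job needs, in MB (made up)
    "Requirements": lambda machine, job:
        machine["Memory"] >= job["ImageSize"] and machine["Arch"] == "INTEL",
}
machine_ad = {
    "Memory": 256,
    "Arch": "INTEL",
    "Requirements": lambda machine, job: job["ImageSize"] <= machine["Memory"],
}

def match(job, machine):
    # Symmetric matchmaking: both requirement expressions must hold.
    return (job["Requirements"](machine, job)
            and machine["Requirements"](machine, job))

print(match(job_ad, machine_ad))  # True: memory and architecture fit
```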
Application Toolkits
Collect and abstract services in an orderly fashion
• Cactus: plug-n-play numeric simulations
• Numeric propulsion system simulation NPSS
• Commodity Grid Toolkits (CoGs):
JAVA, CORBA, …
• NIMROD-G: parameter sweeping simulations
• Condor: high-throughput computing
• GENIUS, VLAM-G, … (web) portals to the Grid
Grids Today
Grid Protocols Today
• Based on the popular protocols on the ’Net
• Use common Grid Security Infrastructure:
– Extensions to TLS for delegation (single sign-on)
– Uses GSS-API standard where possible
• GRAM (resource allocation):
attrib/value pairs over HTTP
• GridFTP (bulk file transfer):
FTP with GSI and high-throughput extras (striping)
• MDS (monitoring and discovery service):
LDAP + schemas
• ……
Getting People Together
Virtual Organisations
• The user community `out there’ is huge & highly dynamic
• Applying at each individual resource does not scale
• Users get together to form Virtual Organisations:
– Temporary alliance of stakeholders
(users and/or resources)
– Various groups and roles
– Managed out-of-band
by (legal) contracts
• Authentication, Authorization, Accounting (AAA)
Grid Security Infrastructure
• Requirements:
– Strong authentication and accountability
– Traceability
– “Secure”!
– Single sign-on
– Dynamic VOs: “proxying”, “delegation”
– Work everywhere
(“easyEverything”, airport kiosk, handheld)
– Multiple roles for each user
– Easy!
Authentication & PKI
[Diagram: the certificate request flow between Alice and the CA.]
1. Alice generates a key pair: a public key (e,n) and a private
key (d,n).
2. Alice sends the public key to the CA in a certificate request
(CommonName=‘Alice’, Organization=‘KNMI’).
3. The CA checks the identifier in the request against the identity
of the requestor.
4. The CA operator signs the request with the CA private key
(the CA certificate itself is self-signed).
5. The CA ships the new certificate to Alice.
• EU DataGrid PKI: 1 PMA, 13 Certification Authorities
• Automatic policy evaluation tools
• Largest Grid-PKI in the world (and growing)
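The request side of the flow above can be sketched with the modern Python cryptography package (an anachronism for 2002, not the tooling actually used): generate a key pair, then a certificate request carrying the public key and claimed identity for the CA to check and sign:

```python
# Sketch of the request side of the enrolment flow: Alice generates
# an RSA key pair and a signed certificate request that carries her
# public key and claimed identity.
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        x509.NameAttribute(NameOID.COMMON_NAME, "Alice"),
        x509.NameAttribute(NameOID.ORGANIZATION_NAME, "KNMI"),
    ]))
    # The request is signed with Alice's private key, proving she
    # holds the key pair; the CA signs the certificate separately.
    .sign(key, hashes.SHA256())
)
print(csr.public_bytes(serialization.Encoding.PEM).decode())
```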
GSI in Action
“Create Processes at A and B
that Communicate & Access Files at C”
[Diagram: single sign-on and delegation across three sites.]
1. The user signs on once via their “grid-id” and generates a proxy
credential (or retrieves one from an online repository); the user
proxy now acts on the user’s behalf.
2. The user proxy sends remote process creation requests, with
mutual authentication, to GSI-enabled GRAM servers at Site A
(Kerberos) and Site B (Unix).
3. Each GRAM server authorizes the request, maps the grid identity
to a local id, creates the process, and generates credentials for
it: a restricted proxy, plus a Kerberos ticket at Site A.
4. The two processes communicate with each other and issue remote
file access requests, again with mutual authentication, to the
GSI-enabled FTP server in front of the storage system at Site C
(Kerberos), which authorizes, maps to a local id, and accesses
the file.
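The proxy credential at the heart of this flow is, in essence, a short-lived key pair certified by the user's own long-term key. A simplified sketch with the Python cryptography package; real GSI proxies carry dedicated X.509 extensions (later standardized in RFC 3820), and this only shows the signing relationship:

```python
# Simplified picture of a proxy credential: a fresh, short-lived key
# pair whose certificate is signed by the user's long-term key, so
# remote services can verify a chain back to the CA-issued identity.
import datetime
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
user_name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Alice")])

proxy_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
now = datetime.datetime.now(datetime.timezone.utc)
proxy_cert = (
    x509.CertificateBuilder()
    .subject_name(x509.Name([                 # subject = user + "proxy"
        x509.NameAttribute(NameOID.COMMON_NAME, "Alice"),
        x509.NameAttribute(NameOID.COMMON_NAME, "proxy"),
    ]))
    .issuer_name(user_name)                   # issued by the user herself
    .public_key(proxy_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(hours=12))  # short-lived
    .sign(user_key, hashes.SHA256())          # signed with the user key
)
print(proxy_cert.subject.rfc4514_string())
```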
Authorization
• Authorization poses main scaling problem
• Conflict between accountability and
ease-of-use / ease-of-management
• Getting rid of the “local user” concept
eases support for large, dynamic VOs:
– Temporary account leasing: pool accounts à la DHCP
– Grid ID-based file operations: slashgrid
– Sandbox-ing applications
Direction of EU DataGrid and PPDG
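A toy sketch of the pool-account idea, with made-up account names and lease policy: a grid identity (certificate subject) leases a free local account for a limited time, DHCP-style, and gets the same account back while the lease lasts:

```python
# Toy sketch of pool-account leasing a la DHCP; account names and
# lease policy are made up for illustration.
import time

POOL = [f"pool{i:03d}" for i in range(1, 51)]  # local Unix accounts
LEASE_SECONDS = 24 * 3600
leases = {}  # subject DN -> (account, lease expiry)

def lease_account(subject_dn, now=None):
    now = time.time() if now is None else now
    # Renew an existing, unexpired lease: same subject, same account.
    if subject_dn in leases and leases[subject_dn][1] > now:
        account = leases[subject_dn][0]
        leases[subject_dn] = (account, now + LEASE_SECONDS)
        return account
    # Otherwise hand out an account nobody currently holds.
    in_use = {acct for acct, expiry in leases.values() if expiry > now}
    for account in POOL:
        if account not in in_use:
            leases[subject_dn] = (account, now + LEASE_SECONDS)
            return account
    raise RuntimeError("account pool exhausted")

print(lease_account("/O=dutchgrid/O=users/CN=Alice"))  # e.g. pool001
```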
Locating a Replica
• Grid Data Mirror Package
• Moves data across sites
• Replicates both files and
individual objects
• Catalogue used by Broker
• Replica Location Service
(Giggle)
• Read-only copies “owned” by
the Replica Manager
http://cmsdoc.cern.ch/cms/grid
Mass Data Transport
• Need for efficient, high-speed protocol: GridFTP
• All storage elements share common interface
disk caches, tape robots, …
• Also supports GSI & single sign-on
• Optimize for high-speed networks (>1 Gbit/s)
• Data source striping through parallel streams
• Ongoing work on “better TCP”
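The striping idea can be sketched as a block-assignment plan: cut the file into blocks and deal them round-robin over N parallel streams, each of which would be its own TCP connection. Illustration of the idea only, not the GridFTP wire protocol:

```python
# Illustration of striping a bulk transfer over parallel streams:
# the file is cut into blocks that are dealt round-robin over N
# streams, each of which would be a separate TCP connection.
def stripe(file_size, block_size, n_streams):
    """Per stream, the list of (offset, length) blocks it carries."""
    plan = [[] for _ in range(n_streams)]
    offset = 0
    block = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        plan[block % n_streams].append((offset, length))
        offset += length
        block += 1
    return plan

# A 10 MB file in 1 MB blocks over 4 streams: each stream moves
# 2-3 non-adjacent blocks, keeping all connections busy.
for i, blocks in enumerate(stripe(10 * 2**20, 2**20, 4)):
    print(f"stream {i}: {blocks}")
```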
Grid Data Bases ?!
• Database Access and Integration (DAI)-WG
– OGSA-DAI integration project
– Data Virtualisation Services
– Standard Data Source Services
Early Emerging Standards:
– Grid Data Service specification (GDS)
– Grid Data Service Factory (GDSF)
Largely a spin-off from the UK e-Science effort & DataGrid
Grid Access to Databases
• SpitFire (standard data source services)
uniform access to persistent storage on the Grid
• Multiple roles support
• Compatible with GSI (single sign-on) through CoG
• Uses standard stuff: JDBC, SOAP, XML
• Supports various back-end databases
http://hep-proj-spitfire.web.cern.ch/hep-proj-spitfire/
Spitfire security model
Standard access to DBs:
• GSI SOAP protocol
• Strong authentication
• Supports single sign-on
• Local role repository
• Connection pool to multiple backend DBs
Version 1.0 out,
WebServices version in alpha
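The access pattern this standardises might look as follows from the client side: post a query wrapped in an XML envelope over HTTP(S) and read rows back as XML, whatever the back-end database. The endpoint URL and message format below are invented for this sketch; they are not the actual Spitfire interface:

```python
# Hypothetical client-side view of the uniform access pattern: POST
# a query wrapped in XML over HTTP(S), get the result set back as
# XML. Endpoint and message format are made up for illustration.
import urllib.request

ENDPOINT = "https://dbserver.example.org:8443/spitfire"  # placeholder
QUERY = """<?xml version="1.0"?>
<request>
  <query>SELECT name, size FROM files WHERE site = 'NIKHEF'</query>
</request>"""

req = urllib.request.Request(ENDPOINT, data=QUERY.encode(),
                             headers={"Content-Type": "text/xml"})
with urllib.request.urlopen(req) as response:
    print(response.read().decode())  # rows come back as XML
```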
A Bright Future?
OGSA: new directions
Open Grid Services Architecture …
… cleaning up the protocol mess
• Concept from the `web services’ world
• Based on common standards:
– SOAP, WSDL, UDDI
– Running over “upgraded” Grid Security Infra (GSI)
• Adds Transient Services:
– State of distributed activities
– Workflow, multi-media, distributed data analysis
OGSA Roadmap
• Introduced at GGF4 (Toronto, March 2002)
• New services already web-services based
(Spitfire 2, etc.)
• Alpha-version of Globus Toolkit v3:
expected December 2002.
• Huge industrial commitment
EU DataGrid
• Middleware research project (2001-2003)
• Driving applications:
• High-Energy Physics
• Earth Observation
• Biomedicine
• Operational testbed
• 21 sites
• 6 VOs
• ~200 users, growing by ~100/month!
http://www.eu-datagrid.org/
EU DataGrid Test Bed 1
• DataGrid TB1:
– 14 countries
– 21 major sites
– CrossGrid: 40 more sites
– Growing rapidly…
• Submitting Jobs:
– Login only once,
run everywhere
– Cross administrative
boundaries in a
secure and trusted way
– Mutual authorization
http://marianne.in2p3.fr/
DutchGrid Platform
www.dutchgrid.nl
• DutchGrid:
– Test bed coordination
– PKI security
– Support
• Participation by:
– NIKHEF, KNMI, SARA
– DAS-2 (ASCI): TUDelft, Leiden, VU, UvA, Utrecht
– Telematics Institute
– FOM, NWO/NCF
– Min. EZ, ICES/KIS
– IBM, KPN, …
[Map: participating sites across the Netherlands, including
Amsterdam, Leiden, Delft, Utrecht, Nijmegen, Enschede, KNMI
and ASTRON.]
A Bright Future!
You could plug your computer into the wall
and have direct access to huge computing
resources almost immediately
(with a little help from toolkits and portals)
…
It may still be science – although not fiction –
but we are about to make this into reality!