The Globus Toolkit V4.0: Current Status, Future Directions
Carl Kesselman
The Application-Infrastructure Gap
[Figure: dynamic and/or distributed applications sit above a shared distributed infrastructure, with a gap between the two]
Bridging the Gap: Service-Oriented Infrastructure
Service-oriented applications
Wrap applications as services
Compose applications into workflows
Service-oriented infrastructure
Provision physical resources to support application workloads
[Figure: users; composition (workflows); invocation (application services); provisioning]
Globus is Service-Oriented Infrastructure Technology
Software for service-oriented infrastructure
Service-enable new & existing resources
E.g., GRAM on computer, GridFTP on storage system, custom application service
Uniform abstractions & mechanisms
Tools to build applications that exploit service-oriented infrastructure
Registries, security, data management, …
Open source & open standards
Each empowers the other
Enabler of a rich tool & service ecosystem
Globus as Service-Oriented Infrastructure
[Figure: user applications and tools reach GRAM (computers), Reliable File Transfer, MDS-Index, MyProxy, GridFTP (storage), DAIS (database), user services in host environments, and specialized resources through uniform interfaces, security mechanisms, Web service transport, and monitoring]
A Typical eScience Use of Globus:
Network for Earthquake Eng. Simulation
Links instruments, data,
computers, people
An eBusiness Use of Globus:
SAP Demonstration @ GlobusWorld
3 Globus-enabled applns:
CRM: Internet Pricing Configurator (IPC)
CRM: Workforce Management (WFM)
SCM: Advanced Planner & Optimizer (APO)
Applications modified to:
Adjust to varying demand & resources
Use Globus to discover & provision resources
[Figure: SAP AG R/3 Internet Pricing & Configurator (IPC). (1) Web browsers / batch processes send price-query requests (typically several thousand requests) to the IPC dispatcher; (2) the dispatcher delegates each request to an IPC server; (3) the response is a price list depending on time, discount, number of items, …]
Globus Toolkit V4.0
Major release planned April 29th 2005
Fifteen months of design, development and
testing
1.8M lines of code
Major contributions from five institutions
Hundreds of millions of service calls executed
over weeks of continuous operation
Significant improvements over GT3 code base
in all dimensions
Our Goals for GT4
Usability, reliability, scalability, …
Documentation at acceptable quality level
Consistency with latest standards (WS-*,
WSRF, WS-N, etc.) and Apache platform
Web service components have quality equal
or superior to pre-WS components
WS-I Basic (Security) Profile compliant
New components, platforms, languages
And links to larger Globus ecosystem
Globus Open Source Grid Software
[Figure: GT4 component map; GT2, GT3, and GT4 labels mark the toolkit generation of each component]
Component categories: Security, Data Management, Execution Management, Information Services, Common Runtime
Web Services Components: WS Authentication Authorization, Community Authorization Service, Delegation Service, Reliable File Transfer, OGSA-DAI [Tech Preview], Grid Resource Allocation Mgmt (WS GRAM), Community Scheduler Framework [contribution], Monitoring & Discovery System (MDS4), Java WS Core, C WS Core, Python WS Core [contribution]
Non-WS Components: Pre-WS Authentication Authorization, Credential Management, GridFTP, Replica Location Service, Grid Resource Allocation Mgmt (Pre-WS GRAM), Monitoring & Discovery System (MDS2), C Common Libraries, XIO
GT4 Components
[Figure: clients and servers exchange interoperable, WS-I-compliant SOAP messaging; X.509 credentials = common authentication]
CLIENT: your C, Java, and Python clients
SERVER:
Java services in Apache Axis, plus GT libraries and handlers: your Java services, GRAM, RFT, CAS, OGSA-DAI, GTCP, Delegation, Index, Trigger, Archiver
C services using GT libraries and handlers: C WS Core, your C service
Python hosting, GT libraries: pyGlobus WS Core, your Python service
Other components: Pre-WS MDS, Pre-WS GRAM, GridFTP, RLS, MyProxy, SimpleCA
GT4 Web Services Core
Supports both Globus services (GRAM, RFT,
Delegation, etc.) & user-developed services
Redesign to enhance scalability, modularity,
performance, usability
Leverages existing WS standards
WS-I Basic Profile: WSDL, SOAP, etc.
WS-Security, WS-Addressing
Adds support for emerging WS standards
WS-Resource Framework, WS-Notification
Java, Python, & C hosting environments
GT4 Web Services Core
[Figure: the GT4 container hosts custom Web services, custom WSRF Web services, and GT4 WSRF Web services for user applications, layered on WS-Addressing, WSRF, and WS-Notification, then on WSDL, SOAP, and WS-Security, alongside registry and administration functions]
Open Source/Open Standards
WSRF developed in collaboration with IBM
Currently in OASIS process
Contributions to Apache for
WS-Security
WS-Addressing
Axis
Apollo (WSRF)
Hermes (WS-Notification)
GT4 Security Highlights
Standards-based support for message-level and transport-level security
Standards-based authorization (SAML) via CAS or callout
Stand-alone delegation service
More authentication options
MyProxy, SimpleCA, …
GT4’s Use of Security Standards
GT4 Security
[Figure: users obtain credentials (MyProxy, KCA) and VO rights from CAS or VOMS, which issue SAML assertions or X.509 ACs; services running on the user's behalf access the compute center over SSL/WS-Security with proxy certificates; an authz callout applies local policy on VO identity or attribute authority to grant access rights]
GT4 Data Management
Stage large data to/from nodes
Replicate data for performance & reliability
Locate data of interest
Provide access to diverse data sources
File systems, parallel file systems,
hierarchical storage (GridFTP)
Databases (OGSA-DAI)
GT4 Data Functions
Find your data: Replica Location Service
Managing ~40M files in production settings
Move/access your data: GridFTP, RFT
High-performance striped data movement
27 Gbit/s memory-to-memory on a 30 Gbit/s link (90% utilization) with 32 IBM TeraGrid nodes
17.5 Gbit/s disk-to-disk limited by the storage system
Reliable movement of 120,000 files (so far)
Couple data & execution management
GRAM uses GridFTP and RFT for staging
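As a rough check on the figures above, the short Python sketch below reproduces the 90% link utilization and estimates how long a bulk transfer would take at the quoted disk-to-disk rate. The 1 TB dataset size is an illustrative assumption, not a number from the talk.

```python
def utilization(achieved_gbit_s: float, link_gbit_s: float) -> float:
    """Fraction of the link capacity actually used."""
    return achieved_gbit_s / link_gbit_s


def transfer_seconds(size_bytes: float, rate_gbit_s: float) -> float:
    """Time to move size_bytes at a sustained rate given in Gbit/s."""
    return size_bytes * 8 / (rate_gbit_s * 1e9)


# Figures quoted on the slide: 27 Gbit/s achieved on a 30 Gbit/s link.
mem_to_mem = utilization(27.0, 30.0)

# Hypothetical 1 TB (10**12 bytes) dataset at the 17.5 Gbit/s disk-to-disk rate.
one_tb_seconds = transfer_seconds(1e12, 17.5)

print(f"memory-to-memory utilization: {mem_to_mem:.0%}")
print(f"1 TB disk-to-disk: about {one_tb_seconds / 60:.1f} minutes")
```

At the quoted disk-to-disk rate the storage system, not the network, sets the transfer time, which is the point the slide makes.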
GridFTP in GT4
100% Globus code
No licensing issues
Stable, extensible
IPv6 Support
XIO for different transports
Striping multi-Gb/sec wide area transport
Pluggable
Front-end: e.g., future WS control channel
Back-end: e.g., HPSS, cluster file systems
Transfer: e.g., UDP, NetBLT transport
[Figure: Bandwidth vs. striping, disk-to-disk on TeraGrid. x-axis: degree of striping (0 to 70); y-axis: bandwidth (0 to 20,000 Mbps); one curve each for 1, 2, 4, 8, 16, and 32 streams]
Reliable File Transfer: Third Party Transfer
Fire-and-forget transfer
Web services interface
Many files & directories
Integrated failure recovery
[Figure: an RFT client sends SOAP messages to the RFT service and (optionally) receives notifications; the RFT service drives a third-party transfer between two GridFTP servers, each with a protocol interpreter, master DSI, IPC link, IPC receiver, and slave DSIs with data channels]
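The fire-and-forget and failure-recovery ideas can be sketched in a few lines of Python. This is not the RFT interface: `run_transfers`, the `state` dictionary (standing in for the persistent store a real service would keep), and the `copy_file` callable are all hypothetical names used for illustration.

```python
from typing import Callable, Dict, List, Tuple

Transfer = Tuple[str, str]  # (source URL, destination URL)


def run_transfers(
    transfers: List[Transfer],
    copy_file: Callable[[str, str], None],
    state: Dict[Transfer, str],
    max_attempts: int = 3,
) -> Dict[Transfer, str]:
    """Attempt every transfer not already marked done and record the outcome.

    Because outcomes live in `state`, a restarted run skips completed
    files and retries only the ones that are still outstanding.
    """
    for transfer in transfers:
        if state.get(transfer) == "done":
            continue
        state[transfer] = "failed"
        for _ in range(max_attempts):
            try:
                copy_file(*transfer)
            except OSError:
                continue  # treat as transient and try again
            state[transfer] = "done"
            break
    return state


# Usage: a copy function that fails once for the second file.
calls: List[str] = []


def flaky_copy(src: str, dst: str) -> None:
    calls.append(src)
    if src == "gsiftp://a/f2" and calls.count(src) < 2:
        raise OSError("connection reset")


jobs = [("gsiftp://a/f1", "gsiftp://b/f1"), ("gsiftp://a/f2", "gsiftp://b/f2")]
result = run_transfers(jobs, flaky_copy, state={})
```

The client hands over the whole list once and only inspects `result` (or waits for a notification) afterwards; the retry policy stays inside the service.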
Replica Location Service
Identify location of files
via logical to physical
name map
Distributed indexing of
names, fault tolerant
update protocols
GT4 version scalable &
stable
Managing ~40 million
files across ~10 sites
[Figure: replica location index nodes]
Local DB    Update send (secs)    Bloom filter (secs)    Bloom filter (bits)
10K         <1                    2                      1M
1M          2                     24                     10M
5M          7                     175                    50M
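The Bloom filters mentioned on this slide let a local catalog summarize its logical names in a fixed number of bits instead of sending every name to the index. The Python sketch below shows the general technique only; the hash construction (SHA-256 with double hashing), the filter size, and the `lfn://` names are illustrative assumptions, not details of the RLS implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; membership tests may give false positives
    but never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name: str):
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, name: str) -> None:
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, name: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(name)
        )


# A local catalog summarizes its logical names and ships only the bits.
local_names = [f"lfn://experiment/run{i}.dat" for i in range(1000)]
summary = BloomFilter(num_bits=10_000, num_hashes=3)
for lfn in local_names:
    summary.add(lfn)
```

An index holding `summary` can answer "might this site have the file?" with `name in summary`; a positive answer still has to be confirmed against the local catalog because false positives are possible.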
Data Replication Service (tech preview)
Pull “missing” files to local site
[Figure: Site A and Site B each run a Data Replication Service, Local Replica Catalog, Replica Location Index, Reliable File Transfer Service, and GridFTP; a list of required files is given to the Data Replication Service, which pulls the missing ones between sites]
OGSA-DAI
Flexible & Composable Middleware
Data access
Data integration
Relational & XML Databases, semi-structured files
Multiple data delivery mechanisms, data translation
Extensible & Efficient framework
Request documents contain multiple tasks
A task = execution of an activity
Group work to enable efficient operation
Extensible set of activities
> 30 predefined, framework for writing your own
Moves computation to data
Pipelined and streaming evaluation
Concurrent task evaluation
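One way to picture a request that chains several tasks with pipelined, streaming evaluation is a chain of Python generators. This is only an analogy: the function names below are hypothetical stand-ins for query, transformation, and delivery-side activities, and none of this is the OGSA-DAI API.

```python
import gzip
from typing import Iterable, Iterator


def query_rows() -> Iterator[dict]:
    """Stand-in for a query activity: yields rows one at a time."""
    for i in range(5):
        yield {"id": i, "magnitude": 4.0 + i * 0.5}


def to_csv(rows: Iterable[dict]) -> Iterator[bytes]:
    """Transform activity: turns each row into a CSV line as it arrives."""
    for row in rows:
        yield f"{row['id']},{row['magnitude']}\n".encode("ascii")


def compress(chunks: Iterable[bytes]) -> bytes:
    """Delivery-side activity: gzip the streamed chunks."""
    return gzip.compress(b"".join(chunks))


# One "request" chaining three tasks. Rows flow from the query into the
# transform one at a time, so the full result set is never held as rows.
payload = compress(to_csv(query_rows()))
```

Because all three steps run next to the data source, only the compressed `payload` would need to travel back to the client, which is the sense of "moves computation to data".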
Predefined Activities
Developers encouraged to roll their own – many do
fileAccess
fileManipulation
directoryAccess
fileWriting
relationalResourceManager
sqlBulkLoadRowset
sqlUpdateStatement
sqlStoredProcedure
sqlQueryStatement
DeliverFromFile
DeliverFromGDT
xmlCollectionManagement
xmlResourceManagement
xQueryStatement
xUpdateStatement
xPathStatement
DeliverToStream
DeliverFromGFTP
DeliverToGFTP
DeliverToURL
DeliverFromURL
DeliverToFile
DeliverToGDT
outputStream
inputStream
xslTransform
zipArchive
gzipCompression
OGSA-DAI
Current Release = Release 5
Added Installation wizards & indexed files
>1100 registered users we know about
Running on 3 message passing infrastructures
Release 6 – May 2005
Improved client side API
Explicit control of sequential & parallel tasks
Dynamic reconfigurability
Release 7 – September 2005
Sessions and local transactions
More integration components, distributed relational
query
WS-DAI reference implementation
Talk by Neil Chue Hong 15:45 Today
Execution Management (GRAM)
Common WS interface to schedulers
Unix, Condor, LSF, PBS, SGE, …
More generally: interface for process
execution management
Lay down execution environment
Stage data
Monitor & manage lifecycle
Kill it, clean up
A basis for application-driven provisioning
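The lifecycle steps listed above can be modeled as a small state machine. The sketch below is a simplification in Python: the state names and allowed transitions are assumptions chosen to mirror the bullets, not the job states defined by GRAM.

```python
from enum import Enum


class JobState(Enum):
    SETUP = "setup"        # lay down execution environment
    STAGE_IN = "stage_in"  # stage data
    ACTIVE = "active"      # running under the local scheduler
    CLEANUP = "cleanup"    # kill it, clean up
    DONE = "done"
    FAILED = "failed"


ALLOWED = {
    JobState.SETUP: {JobState.STAGE_IN, JobState.FAILED},
    JobState.STAGE_IN: {JobState.ACTIVE, JobState.CLEANUP, JobState.FAILED},
    JobState.ACTIVE: {JobState.CLEANUP, JobState.FAILED},
    JobState.CLEANUP: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    """Tracks one job and rejects transitions the lifecycle does not allow."""

    def __init__(self) -> None:
        self.state = JobState.SETUP
        self.history = [self.state]

    def advance(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.history.append(new_state)


job = Job()
for step in (JobState.STAGE_IN, JobState.ACTIVE, JobState.CLEANUP, JobState.DONE):
    job.advance(step)
```

A monitoring client would watch `job.state` change (for example through notifications) rather than polling the scheduler directly.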
GT4 GRAM
2nd-generation WS implementation optimized for performance, stability, scalability
Streamlined critical path
Use only what you need
Flexible credential management
Credential cache & delegation service
GridFTP & RFT used for data operations
Data staging & streaming output
Eliminates redundant GASS code
Single and multi-job support
GT4 GRAM Structure: WSRF/WSN Poster Child
[Figure: service host(s) and compute element(s). A client delegates credentials to the Delegation service and talks to the GRAM services in the GT4 Java container; GRAM sends transfer requests to RFT, which drives GridFTP (FTP control and FTP data) between the compute element and remote storage element(s); GRAM uses sudo and a GRAM adapter for local job control of the user job through the local scheduler]
Initial Investigations into VM Deployment
[Figure: a client asks the VM Factory to create a new VM image or use an existing VM image from the VM Repository and receives a VM EPR; through the VM Manager it can inspect and manage, deploy & suspend, and start a program in the VM on a resource]
Monitoring and Discovery
“Every service should be monitorable and
discoverable using common mechanisms”
WSRF/WSN provides those mechanisms
A common aggregator framework for
collecting information from services, thus:
MDS-Index: XPath queries, with caching
MDS-Trigger: perform action on condition
Deep integration with Globus containers &
services: every GT4 service is discoverable
GRAM, RFT, GridFTP, CAS, …
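The index and trigger ideas above can be illustrated with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The XML layout, element names, and host names below are invented for the example and are not the MDS4 schema; `run_trigger` is a hypothetical helper, not part of any Globus API.

```python
import xml.etree.ElementTree as ET

INDEX_XML = """
<Index>
  <Service type="GRAM" host="cluster-a.example.org"><FreeCPUs>12</FreeCPUs></Service>
  <Service type="GRAM" host="cluster-b.example.org"><FreeCPUs>0</FreeCPUs></Service>
  <Service type="RFT" host="data.example.org"><ActiveTransfers>3</ActiveTransfers></Service>
</Index>
"""

index = ET.fromstring(INDEX_XML)

# Index-style query: select every GRAM entry with an attribute predicate.
gram_hosts = [s.get("host") for s in index.findall(".//Service[@type='GRAM']")]


def run_trigger(root, xpath, condition, action):
    """Trigger-style rule: call action(element) for each element matched
    by xpath that satisfies condition; return the hosts that fired."""
    fired = []
    for element in root.findall(xpath):
        if condition(element):
            action(element)
            fired.append(element.get("host"))
    return fired


alerts = []
fired = run_trigger(
    index,
    ".//Service[@type='GRAM']",
    condition=lambda s: int(s.findtext("FreeCPUs")) == 0,
    action=lambda s: alerts.append(f"no free CPUs on {s.get('host')}"),
)
```

Here the query and the trigger share the same aggregated document, which mirrors the common aggregator framework described on the slide.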
GT4 Monitoring & Discovery
[Figure: GT4 containers hosting GRAM, RFT, and user services register automatically with the MDS-Index in their container; MDS-Index instances aggregate through WS-ServiceGroup registration and WSRF/WSN access; an adapter uses custom protocols for non-WSRF entities such as GridFTP; clients (e.g., WebMDS) query the indexes]
MDS4 Extensibility
Aggregator framework provides
Registration management
Collection of information from Grid resources
Plug-in interface for data access, collection, query, …
WebMDS framework provides for customized display
XSLT transformations
GT4 Documentation is Much Improved!
The Globus Ecosystem
Globus components address core issues relating to resource access, monitoring, discovery, security, data movement, etc.
GT4 being the latest version
A larger Globus ecosystem of open source and proprietary components provides complementary functionality
A growing list of components
These components can be combined to produce solutions to Grid problems
We’re building a list of such solutions
Many Tools Build on, or Can Contribute to, GT4-Based Grids
Condor-G, DAGMan
VOMS
MPICH-G2
PERMIS
GRMS
GT4IDE
Nimrod-G
Sun Grid Engine
Ninf-G
PBS scheduler
Open Grid Computing Env.
LSF scheduler
Commodity Grid Toolkit
GridBus
GriPhyN Virtual Data System
TeraGrid CTSS
Virtual Data Toolkit
NEES
GridXpert Synergy
IBM Grid Toolbox
Platform Globus Toolkit
…
2005 and Beyond
We have a solid Web services base
We now want to build, on that base, an open source service-oriented infrastructure
Virtualization
New services for provisioning, data
management, security, VO management
End-user tools for application development
Etc., etc.
How Globus Works
Globus is a distributed open source
community with many contributors & users
CVS, documentation, bugzilla, email lists
Modular structure allows many to contribute
Globus Alliance Board provides
governance when needed
Meritocracy: individuals who demonstrate
ongoing contributions & commitment
Primarily: what to include, when to release
Globus Alliance is an informal partnership
of organizations led by Board members
Evolution of the Globus Alliance
Argonne/U.Chicago (Childers, Foster): 1995
USC/ISI (Kesselman): 1995
Edinburgh (Atkinson, Parsons): 2003
Swedish PDC (Johnsson, Mulmo): 2003
NCSA (Welch): 2004
Univa (Czajkowski, Tuecke): 2004
Other contributors will surely be added
From eScience to eBusiness
Since ~2001, growing interest in Globus
for commercial use
Enterprises, IT vendors, ISVs asking Globus
leaders to address commercial needs
But hard to do in a research laboratory
In response, we have created two new
organizations
Globus Consortium
Univa
Globus Consortium
(www.globusconsortium.com)
Nonprofit organization funded by companies
to advance Globus Toolkit for enterprise use
Initial sponsor members: HP, IBM, Intel, Sun
Initial contributors: Nortel, Univa
First two projects already identified
Member-driven software quality improvements
Contributions to job submission standards
Other projects to be defined, e.g.
Develop new features key to enterprise use
Education & outreach
Univa
Provider of commercial support, services, &
products around open source Globus
Commercial distribution of GT4 & beyond
Integration with enterprise systems
Committed to open source & open standards
Founded by Tuecke, Foster, Kesselman
Tuecke left Argonne to be CEO
Foster, Kesselman remain at Argonne, ISI
Experienced management team
Rich Miller, Vas Vasiliadis, Paul Davé,
Bob Mandel
Globus and its User Community
How can “we” best support “you”?
We try to provide the best software we can
We use bugzilla & other community tools
We work to grow the set of contributors
How can “you” best support “us”?
Become a contributor: of software, bug
fixes, answers to questions, documentation
Provide us with success stories that can
justify continued Globus development
Promote Globus within your communities
Working with GT4
Download and use the software, and provide
feedback
Review, critique, add to documentation
Join [email protected] mail list
Globus Doc Project: http://gdp.globus.org
Tell us about your GT4-related tool, service,
or application
So…
GT4 is a significant step forward in the quality, functionality and standards compliance of GT.
Beta release available for immediate use, final April 29th
Downloads and docs at:
www.globustoolkit.org
2nd Edition
www.mkp.com/grid2