Grid Services Karan Bhatia Presented by


Grid Services
Presented by
Karan Bhatia
Hype Curve
2
Overview
• Grid Computing Background
– Definition
– Opportunities
– Markets
• Technical Challenges
– Security Infrastructure
– Resource Management
– Service Interoperability
• Summary
3
Grid Computing is …
• “Co-ordinated resource sharing and problem solving in dynamic
multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]
– Co-ordinated - multiple resources working in concert, e.g. disk & CPU,
or instruments & database, etc.
– Resources - compute cycles, databases, files, application services,
instruments.
– Problem solving - focus on solving scientific problems
– Dynamic - environments that are changing in unpredictable ways
– Virtual Organization - resources spanning multiple organizations and
administrative domains, security domains, and technical domains
4
Grid Computing is … (Industry)
• “about finding distributed, underutilized compute resources (systems,
desktops, storage) and provisioning those resources to users or
applications requiring them.” [The Grid Report, Clabby Analytics]
– Distributed - all the resources lying around in departments or server
rooms.
– Underutilized - typical utilization of “big iron” is 5 to 10%. Organizations
save money by increasing utilization versus purchasing new resources.
– Resources - servers and server cycles, applications, data resources
– Provisioning - predict and schedule resource use depending on load.
5
Types of Grids…
• Compute Grids
– Seti@home, Entropia,
United Devices, Condor
• Data Grids
– Storage Resource Broker
(SRB), Avaki, BIRN,
GEON
• Collaboration Grids
– Instrumentation
(telescience), applications
• Enterprise Grids
– Majority of commercial
interest
• Partner Grids
– B2B, Academic/Govt Grids
• Service Grids
– “Utility” Computing, “On
Demand”, pervasive,
autonomic, etc…
6
A Grid is …
• “the next generation Internet,”
• “all about free cycles à la SETI@home,”
• “a distributed object system,”
• “a new programming model,”
• “a replacement for high performance computing,”
7
Example… TeleScience Grid
[Diagram labels: Data Acquisition; Advanced Visualization, Analysis; Computational Resources; Imaging Instruments; Large-Scale Databases]
8
Grid Resources - Networks
9
Grid Resources - Compute
10
Top 500.org
11
12
Another Grid Example …
Google
• Queries
– 150 M queries/day (2000/s)
– 100 countries
– 3.3 B documents
• Hardware
– 15,000 Linux systems in 6 data centers
– 15 Tflop/s and 1000 TB total capacity
– 40-80 1U/2U servers/cabinet
– 100 Mb/s Ethernet switches/cabinet with gigabit uplinks
– Growth from 4000 systems (18 M queries/day)
13
Grid Resources - Data
• SDSC Resources
– HPSS:
• SDSC's central long-term data storage system,
• one of the world's largest IBM High Performance Storage System
(HPSS) units,
• currently holds more than a petabyte (a million gigabytes) of data in
approximately 21 million files,
• It has the capacity to store six petabytes of data; files are added at an
average rate of 10,000 gigabytes per month.
– Storage-Area Network (SAN):
• A 72-processor Sun Microsystems SunFire 15K high-end server and 11
Brocade switches (1,400 ports)
• 225,000 gigabytes of networked disk storage for data-oriented
applications.
• 1 TB of data = $2500
14
Protein Data Bank (PDB)
15
Putting it all together… TeraGrid
16
Grid Market
17
Grid Companies
• IBM
– “on demand” solutions
• Sun Microsystems
– N1 initiative
• Oracle
– 10g
• Dell
• HP
– “utility” computing
• Platform Computing
– LSF, metaclustering
• United Devices
– Desktop grids
• DataSynapse
• Akamai
• Google?
• Sony online entertainment?
• Where’s Microsoft?
18
Grid Organizations
• Global Grid Forum (GGF)
• Organization for the
Advancement of
Structured Information
Standards (OASIS)
• Distributed Management
Task Force (DMTF)
• World Wide Web
Consortium (W3C)
• Globus Alliance
• NSF Middleware Initiative
(NMI)
• NASA IPG
• DOE Science Grid
• EU DataGrid
• NSF TeraGrid
19
Technical Challenges for Grid
Computing
20
Challenges: Security
• Grids traverse organizational boundaries
– Different administration domains have different authentication
mechanisms
– Resources have different use agreements and sharing priorities
• Single sign-on
– Multiple passwords difficult to manage
• Rights delegation
• Trust
– Authentication of users
– Authorization of users
– Resource access
21
Security
• Public Key Infrastructure
– Public key A.public
– Private key A.private
• Supports Encryption
– Message from A to B:
• m’ = F(m,B.public), send m’ to B
• B receives m’, m = F’(m’,B.private)
• Digital Signatures
– Signed message from A to B:
• m’ = (m,F(m,A.private))
– Receiver uses A.public to verify that m’ is from A and has not been tampered with
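The key operations on this slide can be sketched with textbook RSA on tiny primes. This is an illustration only: the primes, the helper names (`make_keypair`, `F`, `verify`), and the message value are assumptions, and unpadded RSA with keys this small is not secure. Encryption to B uses B's public key, and a signature by A uses A's private key. The modular inverse via `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPair:
    n: int        # modulus shared by both halves of the key
    public: int   # public exponent
    private: int  # private exponent


def make_keypair(p: int, q: int, e: int = 17) -> KeyPair:
    """Build a toy RSA key pair from two small primes (illustrative only)."""
    phi = (p - 1) * (q - 1)
    return KeyPair(n=p * q, public=e, private=pow(e, -1, phi))


def F(m: int, exponent: int, n: int) -> int:
    """The slide's F: modular exponentiation with one half of a key pair."""
    return pow(m, exponent, n)


A = make_keypair(61, 53)
B = make_keypair(89, 97)

# Encryption: A sends m to B under B's public key; only B's private key undoes it.
m = 42
m_encrypted = F(m, B.public, B.n)
m_decrypted = F(m_encrypted, B.private, B.n)

# Signature: A signs with A.private; anyone holding A.public can check it.
signed = (m, F(m, A.private, A.n))


def verify(message: int, signature: int, signer: KeyPair) -> bool:
    """Return True if the signature matches the message under the signer's public key."""
    return F(signature, signer.public, signer.n) == message
```

Calling `verify(*signed, A)` succeeds, while changing the message part of `signed` makes the check fail, which is the tamper detection the slide refers to.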
22
Grid Security Infrastructure
(GSI)
• A central concept in GSI
authentication is the certificate.
• Every user and service on the Grid is
identified via a certificate, a text file
containing the following information:
– a subject name identifying the person
or object that the certificate represents,
– the public key belonging to the
subject,
– the identity of a Certificate Authority
(CA) that has signed the certificate to
certify that the public key and the
identity both belong to the subject,
– the digital signature of the named CA.
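As a rough sketch, the four fields listed above map onto a small record, and checking a certificate means recomputing a digest of the signed fields and comparing it with the CA's signature. Everything here is hypothetical: the `Certificate` class, the toy RSA numbers for the CA, and the digest-reduced-mod-n shortcut are assumptions for illustration and do not reproduce the X.509 format that GSI certificates use.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str      # who or what the certificate represents
    public_key: int   # the subject's public key (toy value)
    issuer: str       # identity of the signing CA
    signature: int    # CA's signature over subject and public key


# Toy RSA parameters for a hypothetical CA (p=61, q=53); not secure.
CA_N, CA_PUBLIC, CA_PRIVATE = 3233, 17, 2753


def digest(subject: str, public_key: int) -> int:
    """Hash the signed fields and reduce the result into the toy modulus."""
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def issue(subject: str, public_key: int, issuer: str = "Example CA") -> Certificate:
    """CA side: sign the digest with the CA's private exponent."""
    signature = pow(digest(subject, public_key), CA_PRIVATE, CA_N)
    return Certificate(subject, public_key, issuer, signature)


def is_valid(cert: Certificate) -> bool:
    """Relying-party side: check the signature with the CA's public exponent."""
    return pow(cert.signature, CA_PUBLIC, CA_N) == digest(cert.subject, cert.public_key)
```

A relying party only needs the CA's public values to run `is_valid`, which is why one trusted CA can vouch for many users and services.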
23
Proxy Certificate
• A proxy consists of a new certificate with a new public and private key.
• The new certificate contains the owner's identity modified slightly to indicate that it is a proxy.
• The new certificate is signed by the owner rather than a CA.
– This is called a self-signed certificate.
• The certificate also includes a time notation after which the proxy should no longer be accepted by others.
• Proxies have limited lifetimes in order to minimize the security vulnerability.
• Because the proxy isn't valid for very long, it doesn't have to be kept quite as secure as the owner's private key.
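A minimal sketch of how a relying party might treat such a proxy: walk from the proxy up to its owner and the CA, and reject the chain if any certificate has passed its expiry time. The `Cert` record, the `/proxy` subject suffix, and the lifetimes used are assumptions for illustration, and signature verification is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: datetime                  # time after which the cert is rejected
    signed_by: Optional["Cert"] = None   # None marks a trusted root


def make_proxy(owner: Cert, now: datetime, lifetime: timedelta) -> Cert:
    """Create a short-lived proxy whose subject is the owner's, slightly modified."""
    return Cert(
        subject=owner.subject + "/proxy",
        issuer=owner.subject,
        not_after=min(now + lifetime, owner.not_after),
        signed_by=owner,
    )


def chain_is_acceptable(cert: Cert, now: datetime) -> bool:
    """Accept only if every certificate up to the root is unexpired and correctly linked."""
    current: Optional[Cert] = cert
    while current is not None:
        if now > current.not_after:
            return False
        parent = current.signed_by
        if parent is not None and current.issuer != parent.subject:
            return False
        current = parent
    return True
```

Because the proxy expires on its own, a stolen proxy key is only useful until `not_after`, while the owner's long-lived key never has to leave the owner's machine.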
24
Mutual Authentication
25
Additional Challenges
• Certificate Management
– MyProxy
• Role-based Access Control
– CAS, VOM
• Authorization services
• Integration with applications & Portals
26
Challenges: Resource
Management
• Resources loosely-coupled
– Higher network latencies
– Planned and unplanned disruptions
• How to provide QoS guarantees?
• Case Study: Entropia Desktop Grids
– Additional trust/security issues
27
Entropia 1: Gimps
• Over 1.5 Billion CPU hours served
• 300,000+ machines, over 4 years operational
• Every PC and hardware config imaginable (proc, memory, disk, etc.)
• Every networking hookup imaginable
• Found 35th, 36th, 37th, 38th, and 39th Mersenne Primes
29
Entropia 2: FightAids@home
• Sept 2000 launch
• Internet-Based
• 54,657 total machines
• 10,770,506 total hours of computation
• 27,881 peak billions of calculations/sec
30
Entropia 3: DCGrid
• Enterprise focus
– Tremendous resources available in enterprise
– Complements other HPC resources
• Computing Platform
– Arbitrary application (open scheduling model)
– Security, unobtrusiveness, manageability guaranteed
• Focus on
– Pharmaceuticals, Chemicals, and Materials
– Financial Services
31
DCGrid Architecture
32
Server vs. Desktop Grids
• Server environment
– Fixed IP, always connected
– Always-on operation
– Moderate number of systems (10’s – 100’s)
– Dedicated use, trusted systems
• Desktop environment
– Dynamic, temporary IP, intermittent connection
– Off evenings, off weekends, off lunch
– Large numbers of systems (100’s – 1000’s - ?)
– Shared resources, potentially untrusted users
• These differences give rise to desktop Grid challenges
35
Typical PC-Grid Environment
[Figure: Elbert Chart - Week 5. Machine number (0 to 700) plotted against time (hours 552 to 720).]
36
PC-Grid Challenges
• Provide a stable compute environment for apps
– Isolate app from variable desktop environment
• Operate in environment of dynamic use
– Unobtrusiveness and Fault Tolerance are key!
• Provide simple application integration
– Support ANY Application without modification
• Provide centralized management console
– Zero additional management costs
37
Workflow
[Diagram: numbered steps 1-8 pass from the end-user computation through three layers: Job Management (Job Manager), Resource Scheduling (Subjob Scheduler), and Physical Node Management (Node Manager, Entropia clients). Other labels: resource description, resource.]
38
Stable Compute Environment
• Entropia Proprietary Sandbox
– Binary-level protection
– System virtualization (registry, file system, network)
• Open Scheduling Infrastructure
– Intelligent scheduling (match resources to subjob requirements)
– Manage subjob redundancy/fault tolerance
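The two scheduling bullets above can be illustrated with a small matcher that places each subjob on clients that meet its requirements and runs redundant copies for fault tolerance. This is a sketch under assumed names (`Client`, `Subjob`, `schedule`) and a simple greedy, least-loaded policy; it is not Entropia's actual scheduler.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Client:
    name: str
    memory_mb: int
    disk_mb: int


@dataclass(frozen=True)
class Subjob:
    name: str
    min_memory_mb: int
    min_disk_mb: int


def schedule(subjobs: List[Subjob], clients: List[Client], copies: int = 2) -> Dict[str, List[str]]:
    """Greedily assign each subjob to up to `copies` distinct, least-loaded eligible clients."""
    load = {c.name: 0 for c in clients}
    plan: Dict[str, List[str]] = {}
    for job in subjobs:
        # Keep only clients that satisfy the subjob's resource requirements.
        eligible = [
            c for c in clients
            if c.memory_mb >= job.min_memory_mb and c.disk_mb >= job.min_disk_mb
        ]
        # Prefer lightly loaded clients; break ties by name for determinism.
        eligible.sort(key=lambda c: (load[c.name], c.name))
        chosen = eligible[:copies]
        for c in chosen:
            load[c.name] += 1
        plan[job.name] = [c.name for c in chosen]
    return plan
```

Running a subjob on more than one client costs extra cycles, but on desktops that may be switched off at any moment it lets the job finish as soon as any one copy completes.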
39
Manage Dynamic Use
• PC primary use must be respected!
• Entropia Proprietary Sandbox
– Guaranteed to run at idle priority
– Limit application capability
– Monitor page faults, network access
• Management
– Provide time-of-use windows
– Different levels of unobtrusiveness
• Gathers 95+ % of cycles
40
Application Integration
• Support any Win32 binary
– Language Neutral (C, C++, Fortran, Java, C#, etc.)
– Compiler/library Neutral
[Diagram labels: App A, App B, App C; Application Preparation Tools; Run Applications (qsub, qstat, …); Open Grid Platform; Client1, Client2, …]
41
Manageability
42
Application Performance
[Figure: four plots of throughput against number of clients, for HMMER, AUTODOCK, GOLD, and DOCK. Vertical axes are labeled Sequences per Hour, Compounds per Hour, or Throughput (Packets per Hour). The legend compares Entropia, 1CPU SGI, 1CPU SUN, and a linear fit to the Entropia data.]
43
Scheduling Performance
[Figure: Job 14 Nodes (94 clients). Client ID (0 to 100) plotted against time (0 to 21600 secs).]
44
Challenges: Service
Interoperability
• Trying to force
homogeneity on users
is futile. Everyone has
their own preferences,
sometimes even
dogma.
• The Internet provides
the model…
45
Typical Application
[Diagram: a typical application drawn as four layers]
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces
[Components shown: Web Browser, Simulation Tool, Data Viewer Tool, Chat Tool, Web Portal, Registration Service, Credential Repository, Certificate Authority, Telepresence Monitor, Data Catalog, Compute Servers, Cameras, Database services]
46
Typical Application
• Implementations are provided by a mix of
– Application-specific code
– “Off the shelf” tools and services
– Tools and services from the Globus Toolkit
– Tools and services from the Grid community (compatible with GT)
• Glued together by…
– Application development
– System integration
47
How it Really Happens
(without the Grid)
[Diagram: the four-layer application, with each component marked by its source]
• Legend: Application Developer, Off the Shelf, Globus Toolkit, Grid Community (counts on the slide: 9, 13, 0, 0)
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces
[Components shown: Web Browser, Simulation Tool, Data Viewer Tool, Chat Tool, Web Portal, Registration Service, Credential Repository, Certificate Authority, Telepresence Monitor, Data Catalog, Compute Servers, Cameras, Database services. Callouts A to E appear on the diagram.]
48
How it Really Happens
(with the Grid)
[Diagram: the four-layer application, with each component marked by its source]
• Legend: Application Developer, Off the Shelf, Globus Toolkit, Grid Community (counts on the slide: 2, 9, 4, 4)
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces
[Components shown: Web Browser, Simulation Tool, Data Viewer Tool, Globus Index Service, Portal portlet, MyProxy, Certificate Authority, Telepresence Monitor, Cameras, Globus MCS/RLS, Database services with Globus DAI, Compute Servers with Globus GRAM]
49
Theory -> Practice
50
What You Get in the Globus Toolkit
• OGSI(3.x)/WSRF(4.x) Core Implementation
– Used to develop and run OGSA-compliant Grid Services (Java,
C/C++)
• Basic Grid Services
– Popular among current Grid users, common interfaces to the most
typical services; includes both OGSA and non-OGSA
implementations
• Developer APIs
– C/C++ libraries and Java classes for building Grid-aware
applications and tools
• Tools and Examples
– Useful tools and examples based on the developer APIs
51
Components in Globus Toolkit 3.0
• Component groups on the slide: Security, Data Management, Resource Management, Information Services, WS Core
• Components shown: GSI; WU GridFTP; Pre-WS GRAM; WS-Security; RFT (OGSI); WS GRAM (OGSI); MDS2; JAVA WS Core (OGSI); WS-Index (OGSI); OGSI C Bindings; RLS
52
Components in Globus Toolkit 3.2
• Component groups on the slide: Security, Data Management, Resource Management, Information Services, WS Core
• Components shown: GSI; WU GridFTP; Pre-WS GRAM; WS-Security; RFT (OGSI); WS GRAM (OGSI); CAS (OGSI); RLS; SimpleCA; OGSI-DAI; MDS2; JAVA WS Core (OGSI); WS-Index (OGSI); OGSI C Bindings; OGSI Python Bindings (contributed); pyGlobus (contributed); XIO
53
Planned Components in GT 4.0
• Component groups on the slide: Security, Data Management, Resource Management, Information Services, WS Core
• Components shown: GSI; New GridFTP; Pre-WS GRAM; WS-Security; RFT (WSRF); WS-GRAM (WSRF); CAS (WSRF); RLS; CSF (contribution); SimpleCA; OGSI-DAI; MDS2; JAVA WS Core (WSRF); WS-Index (WSRF); C WS Core (WSRF); pyGlobus (contributed); Authz Framework; XIO
54
Grid and Web Services Convergence
The definition of WSRF means that the Grid and Web
services communities can move forward on a common base.
55
Grid Services Example
• (from the Sotomayor tutorial)
Note 1: How is this different from
- Web Services?
- CORBA?
- COM/DCOM?
Note 2: This is too simple! What about
- co-ordination/workflows
- personalization
- presentation
- security
• MathService API:
– add(int x)
– subtract(int x)
– getValue()
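In plain Python, the MathService interface amounts to a stateful accumulator. This sketch only mirrors the three operations listed above; the class layout and the starting value of 0 are assumptions, and it leaves out everything that makes the tutorial's example a grid service (WSDL description, hosting, lifetime management).

```python
class MathService:
    """Stateful accumulator mirroring the add/subtract/getValue operations."""

    def __init__(self) -> None:
        self._value = 0  # assumed initial state

    def add(self, x: int) -> None:
        """Add x to the stored value."""
        self._value += x

    def subtract(self, x: int) -> None:
        """Subtract x from the stored value."""
        self._value -= x

    def get_value(self) -> int:
        """Return the current stored value."""
        return self._value
```

The point of the example is that the service holds state between calls, so two clients talking to two instances see independent values.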
OGSI
(or what is a grid service?)
• Using web service infrastructure
– MathService is defined by WSDL (like idl)
<?xml version="1.0" encoding="UTF-8"?>
...
<types>
  <xsd:schema
      targetNamespace="http://www.gt3tutorial.org/namespaces/0.2/core/gwsdl/Math"
      attributeFormDefault="qualified"
      elementFormDefault="qualified"
      xmlns="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="add">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="addResponse">
      <xsd:complexType/>
    </xsd:element>
    ...
</types>
<gwsdl:portType name="MathPortType" extends="ogsi:GridService">
  <operation name="add">
    <input message="tns:AddInputMessage"/>
    <output message="tns:AddOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="subtract">
    <input message="tns:SubtractInputMessage"/>
    <output message="tns:SubtractOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="getValue">
    <input message="tns:GetValueInputMessage"/>
    <output message="tns:GetValueOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
</gwsdl:portType>
<message name="AddInputMessage">
  <part name="parameters" element="tns:add"/>
</message>
<message name="AddOutputMessage">
  <part name="parameters" element="tns:addResponse"/>
</message>
...
</definitions>
Basic Concepts
The GridService PortType
• a “grid service” is a web service that
implements the GridService PortType
<portType name="GridService">
  <operation name="setServiceData"> [snip] </operation>
  <operation name="destroy"> [snip] </operation>
  <operation name="requestTerminationAfter"> [snip] </operation>
  <operation name="requestTerminationBefore"> [snip] </operation>
  <operation name="findServiceData"> [snip] </operation>
</portType>
<gwsdl:portType name="GridService">
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="constant" name="interface" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false" mutability="mutable" name="serviceDataName" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false" mutability="mutable" name="factoryLocator" nillable="true" type="ogsi:LocatorType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false" mutability="extendable" name="gridServiceHandle" nillable="false" type="ogsi:HandleType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="mutable" name="gridServiceReference" nillable="false" type="ogsi:ReferenceType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="static" name="findServiceDataExtensibility" nillable="false" type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="static" name="setServiceDataExtensibility" nillable="false" type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false" mutability="mutable" name="terminationTime" nillable="false" type="ogsi:TerminationTimeType"/>
  <sd:staticServiceDataValues>
    <ogsi:findServiceDataExtensibility inputElement="ogsi:queryByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:setByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:deleteByServiceDataNames"/>
  </sd:staticServiceDataValues>
</gwsdl:portType>
GridService PortType
• FindServiceData()
• QueryByServiceDataNames()
• GetServiceData()
• SetByServiceDataNames()
• DeleteByServiceDataNames()
• RequestTerminationAfter()
• RequestTerminationBefore()
• Destroy()
Capabilities of a Grid Service
• 2-level naming (GSH vs. GSR)
• Factories
• Lifetime management
• Service Data Elements
• Event Notification
• ServiceGroups
GSH versus GSR
• A GSH (Grid Service Handle) is a unique
name for a Grid Service Instance
• A GSR (Grid Service Reference) is a
perhaps temporary mechanism to access the
Grid Service Instance
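Two-level naming can be pictured as a lookup from a stable handle to whatever reference is currently valid. The resolver class, the handle and endpoint strings, and the expiry field below are hypothetical; OGSI defines its own handle-resolution interface, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Reference:
    """Stands in for a GSR: how to reach the instance right now."""
    endpoint: str
    good_until: float  # seconds; after this the reference is considered stale


class HandleResolver:
    """Maps permanent handles (GSH stand-ins) to their current references."""

    def __init__(self) -> None:
        self._table: Dict[str, Reference] = {}

    def bind(self, handle: str, ref: Reference) -> None:
        """Record or replace the current reference for a handle."""
        self._table[handle] = ref

    def resolve(self, handle: str, now: float) -> Reference:
        """Return the current reference, or fail if it has gone stale."""
        ref = self._table[handle]
        if now > ref.good_until:
            raise LookupError(f"reference for {handle} expired; rebind needed")
        return ref
```

A client keeps only the handle. If the instance moves to another host, the handle is rebound to a new reference and the client simply resolves again.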
Factories
• Create new instances of services
dynamically
• Individualized Instances
• lifetime management techniques
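A sketch of the factory pattern with soft-state lifetime management: each created instance carries a termination time, clients can push it out as a keep-alive, and a periodic sweep destroys whatever has expired. The class and method names are assumptions loosely modeled on the operation names in the GridService port type, not the Globus implementation.

```python
import itertools
from typing import Dict


class Instance:
    """A transient service instance with a scheduled termination time."""

    def __init__(self, handle: str, termination_time: float) -> None:
        self.handle = handle
        self.termination_time = termination_time

    def request_termination_after(self, t: float) -> None:
        """Client keep-alive: ask that termination happen no earlier than t."""
        self.termination_time = max(self.termination_time, t)


class Factory:
    """Creates individualized instances and reclaims expired ones."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.instances: Dict[str, Instance] = {}

    def create(self, now: float, lifetime: float) -> Instance:
        """Create a new instance that expires `lifetime` seconds from now."""
        inst = Instance(f"instance-{next(self._ids)}", now + lifetime)
        self.instances[inst.handle] = inst
        return inst

    def sweep(self, now: float) -> None:
        """Destroy every instance whose termination time has passed."""
        expired = [h for h, i in self.instances.items() if i.termination_time <= now]
        for handle in expired:
            del self.instances[handle]
```

With this scheme a client that crashes needs no cleanup protocol: its instance is reclaimed once the keep-alives stop.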
Service Data Elements
• Generalized State
– useful for describing capability
– Get/Set model similar to javaBeans Properties
• Can specify initial values in WSDL
• Integrated with Notification mechanism
Service Data Elements:
GridService
• Interface
• ServiceDataName
• FactoryLocator
• GridServiceHandle
• GridServiceReference
• TerminationTime
Notifications
• Source
– implements NotificationSourcePortType
– sends a notification message (XML Element) to Sinks
• Sink
– implements NotificationSinkPortType
– sends a notification subscription request to source
– causes a GridService Instance of portType NotificationSubscription to be created
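The source and sink roles can be sketched as a publish/subscribe pair tied to service data: a sink subscribes to a named element, and the source pushes a message to it whenever that element changes. The names below are hypothetical and the messages are plain Python values rather than XML elements. The sketch also collapses the subscription into a list entry instead of creating a separate subscription service instance.

```python
from typing import Callable, Dict, List

# A sink is modeled as a callback receiving (element name, new value).
Sink = Callable[[str, object], None]


class NotificationSource:
    """Holds service data elements and notifies subscribed sinks on change."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}
        self._subscriptions: Dict[str, List[Sink]] = {}

    def subscribe(self, name: str, sink: Sink) -> None:
        """Register a sink for changes to one named service data element."""
        self._subscriptions.setdefault(name, []).append(sink)

    def set_service_data(self, name: str, value: object) -> None:
        """Update an element, then deliver the new value to its subscribers."""
        self._data[name] = value
        for sink in self._subscriptions.get(name, []):
            sink(name, value)

    def find_service_data(self, name: str) -> object:
        """Pull-style query for the current value of an element."""
        return self._data[name]
```

Clients that prefer polling can still call `find_service_data`, while subscribed sinks hear about changes without asking.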
ServiceGroups
• A grid service that maintains information about
other grid services
• Can be used to implement a classic registry model
• Can be used for dataset replication
• A grid service can belong to more than one
Service Group
• Membership in a ServiceGroup can be
homogeneous or heterogeneous
• Service group portTypes are optional
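Used as a registry, a service group is essentially a table from member handles to what each member offers. The following sketch assumes a hypothetical `ServiceGroup` class keyed by handle strings and port type names; it illustrates the registry use and heterogeneous membership only.

```python
from typing import Dict, List, Set


class ServiceGroup:
    """Registry-style group mapping member handles to the port types they offer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._members: Dict[str, Set[str]] = {}

    def add(self, handle: str, port_types: Set[str]) -> None:
        """Add a member; members need not share the same port types."""
        self._members[handle] = set(port_types)

    def remove(self, handle: str) -> None:
        """Drop a member if present."""
        self._members.pop(handle, None)

    def find(self, port_type: str) -> List[str]:
        """Return handles of members that implement the given port type."""
        return sorted(h for h, pts in self._members.items() if port_type in pts)
```

Since groups hold handles rather than the services themselves, the same service can be listed in several groups at once.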
Grid Services: Summary
• Extends Web Services to support Transient
Services
– WSDL 1.2 expected to include extensions
• Requires support for factories, lifetime
management, soft-state management, and
notifications
• Java implementation pretty solid
– Security implementation still shaky
Other Challenges
• Developing user interfaces
• Data Management
• Scheduling/co-scheduling of resources
• Failure management
• Application development
• Performance
• Many others…
69
What I hope you got from this
talk
• Grid Computing is about
– Co-ordinated use of different resources
– Provisioning resources for increased utilization
– Scaling to large numbers of resources, services
and users
• Many systems being built
• Many Applications being developed
70