Services and the Semantic
Grid
SKG2005 Beijing China November 28 2005
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
[email protected]
http://www.infomall.org
1
Data Deluged Science




In the past, we worried about data in the form of parallel I/O or
MPI-IO, but we didn’t consider it as an enabler of new science
and new ways of computing
Data assimilation was not central to HPCC
DoE ASCI was set up precisely because test data was not wanted!
Now particle physics will get 100 petabytes from CERN
• Nuclear physics (Jefferson Lab) is in the same situation
• Use around 30,000 CPUs simultaneously 24x7




Weather, climate, solid earth (EarthScope)
Bioinformatics curated databases (Biocomplexity only 1000’s of
data points at present)
Virtual Observatory and SkyServer in Astronomy
Environmental Sensor nets
2
Information/Knowledge Grids


Distributed data sources (10’s to 1000’s): instruments,
file systems, curated databases …
Data Deluge: 1 (now) to 100’s of petabytes/year (2012)
• Moore’s law for Sensors




Possible filters assigned dynamically (on-demand)
• Run image processing algorithm on telescope image
• Run Gene sequencing algorithm on compiled data
Needs decision support front end with “what-if”
simulations
Metadata (provenance) critical to annotate data
Integrate across experiments as in multi-wavelength astronomy
Data Deluge comes from pixels/year available
3
Semantically Rich Services with a Semantically
Rich Distributed Operating Environment
[Figure: a mesh of interconnected services. Legend labels: Filter Service (FS), Sensor Service (SS), Other Service (OS), MetaData (MD), Database]
4
Semantic Grid and Services








Implications of SOA (Service Oriented Architectures) for SG
(Semantic Grid)
• Build services to implement SG
Implications of SG for SOA
• Build metadata rich systems of services using SG
Services receive data in SOAP messages, manipulate it and
produce transformed data as further messages
Meta-data is carried in SOAP messages
Meta-data controls processing and transport of SOAP Messages
Knowledge is created from data by services
The Grid enhances Web services with semantically rich system
and application specific management
One must exploit and work around the different approaches to
meta-data and their manipulation in Web Services
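The message-in, message-out view of a service described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` class is a hypothetical stand-in for a SOAP message, and `averaging_filter`, the `provenance` key and the sample sensor name are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical stand-in for a SOAP message: metadata plus data."""
    metadata: dict = field(default_factory=dict)
    data: list = field(default_factory=list)


def averaging_filter(msg: Message) -> Message:
    """Consume one message and emit a new one holding derived data.

    The input metadata is copied forward and extended with a provenance
    entry, so downstream services can see how the data was produced.
    """
    mean = sum(msg.data) / len(msg.data)
    provenance = msg.metadata.get("provenance", []) + ["averaging_filter"]
    new_metadata = {**msg.metadata, "provenance": provenance, "kind": "information"}
    return Message(metadata=new_metadata, data=[mean])


# Assumed sample input: three readings from an invented sensor.
raw = Message(metadata={"source": "sensor-17", "kind": "raw data"}, data=[2.0, 4.0, 6.0])
result = averaging_filter(raw)
print(result.metadata["provenance"], result.data)
```

The service never mutates its input; it produces a further message, which is what lets such services be chained.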
5
Structure of SOAP Messages
[Figure: a SOAP message with header blocks H1 to H4 and a Body arriving at a Service; F1 to F4 sit in the container. Labels: Container Workflow, Container Handlers]





SOAP Messages have System information in the header
including WS-Policy based meta-data defining processing
options
• Processed by Handlers
Application data and meta-data are in the body (controversies here!)
• Processed by the Service itself
Some meta-data like WS-RF is logically “only in messages”
Other meta-data, like that in WS-Context or the SRB, is stored in the
logical equivalent of XML databases
We only need to preserve semantic structure (XML/SOAP
Infoset), so we can transport in fast XML and store in efficient
relational databases
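A minimal sketch of the header/body split, assuming Python's standard `xml.etree.ElementTree`. The SOAP 1.1 envelope namespace is real; the application namespace `urn:example:app`, the header block names `Reliability` and `Security`, and the handler functions are hypothetical and stand in for WS-* header processing.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:app"  # hypothetical application namespace


def build_envelope(header_items: dict, payload: str) -> ET.Element:
    """Build an envelope with one header block per item and one body element."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    for name, value in header_items.items():
        ET.SubElement(header, f"{{{APP_NS}}}{name}").text = value
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{APP_NS}}}Payload").text = payload
    return env


def run_handlers(env: ET.Element, handlers: dict) -> list:
    """Container side: pass each header block to the handler registered for it."""
    log = []
    header = env.find(f"{{{SOAP_NS}}}Header")
    for block in header:
        local_name = block.tag.split("}", 1)[1]
        if local_name in handlers:
            log.append(handlers[local_name](block.text))
    return log


def service(env: ET.Element) -> str:
    """Service side: the application code only looks at the body."""
    body = env.find(f"{{{SOAP_NS}}}Body")
    payload = body.find(f"{{{APP_NS}}}Payload")
    return payload.text.upper()


envelope = build_envelope({"Reliability": "exactly-once", "Security": "signed"}, "hello grid")
handlers = {
    "Reliability": lambda value: f"reliability={value}",
    "Security": lambda value: f"security={value}",
}
handler_log = run_handlers(envelope, handlers)
reply = service(envelope)
print(handler_log, reply)
```

Only the element structure matters here, which is the point of the slide: the same Infoset could travel in a faster encoding or be stored in relational tables.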
6
What Type of Services are there?





There are a horde of support services supplying security,
collaboration, database access, user interfaces
The support services are either associated with system or
application
• We will study the WS-* and GS-* which implicitly or
explicitly define many support services
There are generalized filter services which are applications that
accept messages and produce new messages with some data
derived from that in input
• Simulations (including PDE’s and reactive systems)
• Data-mining
• Transformations
• Agents
• Reasoning
are all termed filters here
There are services like “author ontology”, “parse RDF” or
“attach provenance” that directly support Semantic Grid
But all services and their interactions are bathed in a sea of
metadata and so implicitly need and support the Semantic Grid
7
It’s a Composite Hierarchical World





Filters can be a workflow which means they are “just collections
of other simpler services”
• One needs meta-data to control the workflow
Services are programs that accept messages and produce
messages
Grids are a distributed collection of services supporting
managed shared resources
• Management requires meta-data
Grids are distributed systems that accept distributed messages
and produce distributed result messages
• Can always talk about Grids and view a service or a
workflow as a special case of a Grid
It just requires meta-data to send a message to a Grid and have it
routed to the “correct computer” holding the “requested service”
• Meta-data allows mapping of virtual to real addresses
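The virtual-to-real mapping can be pictured as a lookup in a metadata catalog. The sketch below is an assumption-laden illustration: `ServiceRegistry`, `route`, the virtual name `grid://bio/blast` and the endpoint URL are all hypothetical and do not correspond to any particular WS-* or UDDI interface.

```python
class ServiceRegistry:
    """Hypothetical metadata catalog mapping virtual service names to real endpoints."""

    def __init__(self):
        self._endpoints = {}

    def register(self, virtual_name: str, endpoint: str) -> None:
        self._endpoints[virtual_name] = endpoint

    def resolve(self, virtual_name: str) -> str:
        try:
            return self._endpoints[virtual_name]
        except KeyError:
            raise LookupError(f"no endpoint registered for {virtual_name!r}") from None


def route(registry: ServiceRegistry, message: dict) -> dict:
    """Return a copy of the message with the real endpoint for its virtual address."""
    endpoint = registry.resolve(message["to"])
    return {**message, "endpoint": endpoint}


registry = ServiceRegistry()
registry.register("grid://bio/blast", "http://host-a.example.org:8080/blast")
routed = route(registry, {"to": "grid://bio/blast", "body": "ACGT"})
print(routed["endpoint"])
```

Because the sender only names the virtual address, the service can move to another computer by updating the registry entry alone.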
Semantically Rich Services with a Semantically
Rich Distributed Operating Environment
[Figure: Grids of Grids Architecture. Sensor Services (SS) and a Database supply Raw Data as SOAP Message Streams; Filter Services (FS) annotated with MetaData (MD) carry it through Data, Information, Knowledge, Wisdom and Decisions; Other Services (OS), Another Service and Another Grid connect at the edges. Callout: Sensor Service is same as outward facing application service]
9
The Grid and Web Service Institutional Hierarchy
4: Application or Community of Interest Specific Services
such as “Run BLAST” or “Look at Houses for sale”
3: Generally Useful Services and Features
such as “Access a Database” or “Submit a Job” or “Semantic
Grid” or “Support a Portal” or “Collaborative Visualization”
2: System Services and Features
Handlers like WS-RM, Security, Programming Models like BPEL
or Registries like UDDI
1: Container and Run Time (Hosting) Environment
[Side labels: OGSA and other GGF/W3C/………; WS-* from OASIS/W3C/Industry; Apache Axis, .NET etc.]
10
The WS-* Infrastructure

Core Grid Services build on and/or extend the 60 or so
WS-* Infrastructure specifications which define
• 1. Container Model, XML, WSDL …
• 2. Service Internet ((Reliable) Messaging, Addressing)
including extensions for high performance transport and
representation. This is a natural basis for streaming
applications
• 3. Notification
• 4. Workflow and Transactions
• 5. Security
• 6. Service Discovery
• 7. Metadata and State including lifetime
• 8. Management (service interactions)
[Callout: These categories are directly connected to metadata]
• 9. Policy, Agreements
• 10. Portals and User Interfaces
11
A List of Web Services 6
• 6) Service Discovery
• UDDI (Broadly Supported OASIS Standard) V3 August
2003
• WS-Discovery Web services Dynamic Discovery
(Microsoft, BEA, Intel …) February 2004
• WS-IL Web Services Inspection Language, (IBM,
Microsoft) November 2001
• Note WS-Context as a metadata catalog and WS-Management
Catalog are examples of related services
• There are many UDDI extensions such as Grimoires from UK
OMII which often are essentially providing semantic enrichment
[Callout: Discovery is just accessing part of meta-data defining a Grid]
12
A List of Web Services 7
• 7) Metadata and State
• RDF Resource Description Framework (W3C) Set of
recommendations expanded from original February 1999 standard
• DAML+OIL combining DAML (Darpa Agent Markup Language)
and OIL (Ontology Inference Layer) (W3C) Note December 2001
• OWL Web Ontology Language (W3C) Recommendation February
2004
• WS-MetadataExchange Web Services Metadata Exchange (BEA,
IBM, Microsoft, SAP, Sun …) September 2004
• ASAP Asynchronous Service Access Protocol (OASIS) with V1.0
working draft 2B December 11 2004
• WS-GAF Web Service Grid Application Framework (Arjuna,
Newcastle University) August 2003
• WBEM Web-Based Enterprise Management including CIM
(Common Information Model) from DMTF (Distributed
Management Task Force) 2004-2005
13
A List of Web Services 7
• 7) Metadata and State: Resource Framework
• WS-RF Web Services Resource Framework (OASIS)
including
• WS-Resource Framework Web Services Resource 1.2
(OASIS) Public Review Draft 01, 10 June 2005
• WS-ResourceProperties Web Services Resource
Properties V1.2 Public Review Draft 01, 10 June 2005
• WS-ResourceLifetime Web Services Resource Lifetime
V1.2 Public Review Draft 01, 13 June 2005
• WS-ServiceGroup Web Services Service Group V1.2
Public Review Draft 01, 10 June 2005
• WS-BaseFaults Web Services Base Faults V1.2 Public
Review Draft 01, June 13, 2005
[Callout: These WS-* define syntax of Meta-data (RDF, OWL, CIM) and how to use it in system (WS-MetadataExchange), especially headers (WS-RF)]
14
Metadata and Service Context





Consider a collection of services working together
• Workflow tells you how to specify service
interaction but more basically there is shared
information or context specifying/controlling
collection
WS-RF and WS-GAF have different approaches to
contextualization – supplying a common “context”
which at its simplest is a token to represent state
More generally core shared information includes
dynamic service metadata and the equivalent of
configuration information.
Two services linked by a stream are perhaps the simplest
example of a collection of services needing context
Note that there is a tension between storing
metadata in messages and in services.
• This is the shared versus distributed memory debate in
parallel computing
15
Stateful Interactions


There are (at least) four approaches to specifying state
• OGSI uses factories to generate separate services for
each session in standard distributed object fashion
• Globus GT-4 and WSRF use metadata of a resource
to identify state associated with a particular session
• WS-GAF uses WS-Context to provide abstract
context defining state. Its strength and weakness is
that it reveals less about the nature of the session
• WS-I+ “Pure Web Service” leaves state specification
to the application, e.g. put a context in the SOAP body
I think we should smile and write a great metadata
(semantic) service hiding all these different models for
state and metadata
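One way to picture the context-based approach is a token that names session state held outside the service. The sketch below is a hedged illustration only: `ContextStore` and `counting_service` are hypothetical names, and the in-memory dictionary is not the WS-Context, WS-RF or OGSI API.

```python
import uuid


class ContextStore:
    """Hypothetical in-memory context service; not the WS-Context API."""

    def __init__(self):
        self._contexts = {}

    def begin(self) -> str:
        """Start a session and return the token that represents its state."""
        token = uuid.uuid4().hex
        self._contexts[token] = {}
        return token

    def get(self, token: str) -> dict:
        return self._contexts[token]


def counting_service(store: ContextStore, message: dict) -> dict:
    """A stateless service: all session state is reached through the token."""
    state = store.get(message["context"])
    state["calls"] = state.get("calls", 0) + 1
    return {"context": message["context"], "calls": state["calls"]}


store = ContextStore()
session_a = store.begin()
session_b = store.begin()
counting_service(store, {"context": session_a})
second = counting_service(store, {"context": session_a})
other = counting_service(store, {"context": session_b})
print(second["calls"], other["calls"])
```

The token reveals nothing about what the session is, which matches the strength and weakness noted for the abstract-context approach.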
16
Role of WS-Context





There are many WS-* specifications addressing meta-data
and both many approaches and many trade-offs
We hear about Distributed Hash Tables (Chord) to achieve
scalability in large scale networks
Managed dynamic workflows as in sensor integration and
collaboration require
• Fault-tolerance and ability to support dynamic changes
with a delay of a few milliseconds
• But only a modest number of involved services (up to
1000’s in a session)
• Need Session NOT Service/Resource meta-data so don’t use
WS-RF
We are building a WS-Context compliant metadata catalog
supporting distributed or central paradigms – see later talk by
Mehmet Aktas
Use for OGC Web catalog service with UDDI for slowly
varying meta-data
17
A List of Web Services 8
• 8) Management
• WS-DistributedManagement Web Services
Distributed Management Framework with MUWS
and MOWS below (OASIS)
• WSDM-MUWS Web Services Distributed
Management: Management Using Web Services
(OASIS) OASIS Standard March 9 2005
• WSDM-MOWS Web Services Distributed
Management: Management of Web Services
(OASIS) OASIS Standard March 9 2005
18
A List of Web Services 8- Contd
• 8) Management: Microsoft Stack
• WS-Management Web Services for Management
(Microsoft, Intel, Sun …) August 2005
• WS-Management Catalog The WS-Management
Catalog (Microsoft, Intel, Sun …) August 2005
• WS-Transfer Web Service Transfer (Microsoft,
BEA, Sonic Software etc.) September 2004
• WS-Enumeration Web Service Enumeration
(Microsoft, BEA, Sonic Software etc.) September 2004
[Callout: These WS-* define exchange of data and meta-data between services]
19
A List of Web Services 9
• 9) General Service Characteristics
• WS-PolicyFramework Web Services Policy
Framework (BEA, IBM, Microsoft, SAP …) September
2004
• WS-PolicyAttachment Web Services Policy
Attachment (BEA, IBM, Microsoft, SAP …)
September 2004
• WS-PolicyAssertions Web Services Policy Assertions
Language (BEA, IBM, Microsoft, SAP) 18 December
2002 (Superseded by WS-PolicyFramework)
• WS-Agreement Web Services Agreement Specification
(GGF under development) 9 August 2004
[Callout: These WS-* define syntax of Meta-data defining structure of distributed System]
[Callout: Grids are managed (meta-data enhanced) distributed collections of Internet Scale services]
20
Activities in Global Grid Forum Working Groups
GGF Area: Standards Activities
1: Architecture
High Level Resource/Service Naming (level 2 of fig. 1),
Integrated Grid Architecture
2: Applications
Software Interfaces to Grid, Grid Remote Procedure Call,
Checkpointing and Recovery, Interoperability to Job Submittal services,
Information Retrieval
3: Compute
Job Submission, Basic Execution Services, Service Level Agreements
for Resource use and reservation, Distributed Scheduling
4: Data
Database and File Grid access, Grid FTP, Storage Management, Data
replication, Binary data specification and interface, High-level
publish/subscribe, Transaction management
5: Infrastructure
Network measurements, Role of IPv6 and high performance
networking, Data transport
6: Management
Resource/Service configuration, deployment and lifetime, Usage
records and access, Grid economy model
7: Security
Authorization, P2P and Firewall Issues, Trusted Computing
[Callout: Use the sea of meta-data supported by Semantic Grid]
21
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a
two-level Programming Model
• We make a Service (same as a “distributed object” or
“computer program” running on a remote computer) using
conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and
databases
[Figure: a Service box exchanging Data]
• The Grid is built by coordinating such services assuming
we have solved the problem of programming the service
22
Two-level Programming II




Grid programming concerns the composition of distributed
services with the runtime interfaces to the Grid in
analogy to UNIX pipes/data streams
[Figure: Service1, Service2, Service3 and Service4 linked by data streams]
Familiar from use of UNIX Shell, PERL or Python
scripts to produce real applications from core programs
Such interpretative environments are the single
processor analog of Grid Programming
Some projects like GrADS from Rice University are
looking at integration between service and composition
levels but dominant effort looks at each level separately
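The pipe analogy can be made concrete with Python generators standing in for services. This is a single-process sketch of the composition level only; the stage names `source`, `keep_above` and `scale`, the `pipeline` helper and the sample readings are hypothetical.

```python
def source():
    """Stand-in for a service that emits a stream of readings."""
    yield from [3, 8, 1, 9, 4]


def keep_above(stream, limit):
    """Stand-in for a filtering service: pass on values above the limit."""
    for value in stream:
        if value > limit:
            yield value


def scale(stream, factor):
    """Stand-in for a transforming service: rescale every value."""
    for value in stream:
        yield value * factor


def pipeline(stream, *stages):
    """Chain stages left to right, like `a | b | c` in a UNIX shell."""
    for stage in stages:
        stream = stage(stream)
    return stream


result = list(pipeline(source(),
                       lambda s: keep_above(s, 3),
                       lambda s: scale(s, 10)))
print(result)  # [80, 90, 40]
```

Each stage is written without knowledge of the others (the first level); `pipeline` is the second level, where only the wiring is expressed.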
23
3 Layer Programming Model
[Figure: Web Service 1, WS 2, …, WS N-1 and Web Service N above the WS-* Infrastructure]
Level 1 Programming inside services
Application expressed in Java, Fortran, C++, MPI etc.
Level 2 Programming choosing services by virtualization
Application Semantics (Metadata, Ontology), Semantic Grid
Level 3 Grid Programming composing multiple services
Service Workflow, Transactions, Mediation
[Callout: Substantial work in UK e-Science program, international semantic web community]
24
Information Architecture and Semantic Grid




WS-* provides key low level capability but deliberately
does not define an information (data) architecture and
leaves this to domain specific specification activities such
as CellML/SBML for biology, WFS/GML for GIS and
XGSP for Collaboration
WS-* does define primitive service discovery (UDDI)
and meta-data capabilities including WS-Context, WS-RF,
RDF and WS-MetadataExchange already discussed.
GGF defines Grid data capabilities including Info-D
(publish/subscribe) and OGSA-DAI for data repositories
Semantic Grid uses WS-* and GS-* extending meta-data
and service discovery with data-mining and reasoning
25
3 XML Databases of Importance






WS-Context controlling a workflow
(Extended) UDDI supporting semantic service discovery
WFS or ASFS (see later) providing an application specific
data/meta-data repository
These have different performance, scalability and data unit size
requirements
In our implementation, each is currently “just an
Oracle/MySQL” database front ended by filters that convert
between XML (GML for WFS) and object-relational Schema
• Example of Semantics (XML) versus representation (SQL)
difference
OGSA-DAI offers Grid interface to databases – we could use but
don’t as we only need to expose WFS and not MySQL to Grid
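The semantics-versus-representation split can be sketched with a pair of filters around a relational table. This is an illustration under assumptions: `sqlite3` stands in for the Oracle/MySQL back end, the `<feature>` element is an invented toy format rather than GML, and `xml_to_row` and `row_to_xml` are hypothetical helper names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_to_row(xml_text: str) -> tuple:
    """Input filter: flatten one XML feature into a relational row."""
    feature = ET.fromstring(xml_text)
    return (
        feature.get("id"),
        feature.findtext("name"),
        float(feature.findtext("lat")),
        float(feature.findtext("lon")),
    )


def row_to_xml(row: tuple) -> str:
    """Output filter: rebuild the XML view from a stored row."""
    feature = ET.Element("feature", id=row[0])
    ET.SubElement(feature, "name").text = row[1]
    ET.SubElement(feature, "lat").text = repr(row[2])
    ET.SubElement(feature, "lon").text = repr(row[3])
    return ET.tostring(feature, encoding="unicode")


# In-memory database used purely as a stand-in for the real back end.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE features (id TEXT PRIMARY KEY, name TEXT, lat REAL, lon REAL)")

incoming = '<feature id="f1"><name>Station A</name><lat>39.9</lat><lon>116.4</lon></feature>'
db.execute("INSERT INTO features VALUES (?, ?, ?, ?)", xml_to_row(incoming))

row = db.execute(
    "SELECT id, name, lat, lon FROM features WHERE id = ?", ("f1",)
).fetchone()
outgoing = row_to_xml(row)
print(outgoing)
```

Clients see only the XML interface; the storage layer is free to use whatever schema and indexing give the best performance.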
26
Information Management/Processing




SOAP messages transport information expressed in a
semantically rich fashion between sources and services that
enhance and transform information so that complete system
provides Data -> Information -> Knowledge transformation
• Semantic Web technologies like RDF and OWL help us have
rich expressivity
We build application specific information
management/transformation systems ASIS for each application
domain
One special domain is the system itself where the metadata
associated with services, sessions, Grids, messages, streams and
workflow is itself managed and supported by an SIIS
27
Generalizing a GIS

Geographical Information Systems GIS have been
hugely successful in all fields that study the earth and
related worlds
• They define Geography Syntax (GML) and ways to store,
access, query, manipulate and display geographical features
• In SOA, GIS corresponds to a domain specific XML language
and a suite of services for different functions above

However such a universal information model has not
been developed in other areas even though there are
many fields in which it appears possible
• BIS Biological Information System
• MIS Military Information System
• IRIS Information Retrieval Information System
• PAIS Physics Analysis Information System
• SIIS Service Infrastructure Information System
28
ASIS Application Specific Information System I


a) Discovery capabilities that are best done using WS-*
standards
b) Domain specific metadata and data including a
search/store/access interface (cf WFS). Let’s call the generalization
ASFS (Application Specific Feature Service)
• Language to express domain specific features (cf GML). Let’s call
this ASL (Application Specific Language)
• Tools to manipulate information expressed in the language and key
data of the application (cf coordinate transformations). Let’s call this
ASTT (Application Specific Tools and Transformations)
• ASL must support data sources such as sensors (cf OGC metadata
and data sensor standards) and repositories. Sensors need
(common across applications) support of streams of data
• Queries need to support archived (find all relevant data in past)
and streaming (find all data in future with given properties)
• Note all AS Services behave like Sensors and all sensors are
wrapped as services
• Any domain will have “raw data” (binary) and that which has been
filtered to ASL. Let’s call the former ASBD (Application Specific Binary Data)
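The two query styles, archived and streaming, can be sketched against one small service. The class below is a hypothetical illustration, not an OGC or WFS interface: `FeatureService`, its method names and the earthquake-like sample features are all assumptions.

```python
class FeatureService:
    """Hypothetical ASFS-like service offering archived and streaming queries."""

    def __init__(self):
        self._archive = []
        self._subscriptions = []

    def query_archive(self, predicate):
        """Archived query: find all matching data already received."""
        return [item for item in self._archive if predicate(item)]

    def subscribe(self, predicate, callback):
        """Streaming query: be called for all future data that matches."""
        self._subscriptions.append((predicate, callback))

    def publish(self, item):
        """A sensor (or any service acting as one) delivers a new feature."""
        self._archive.append(item)
        for predicate, callback in self._subscriptions:
            if predicate(item):
                callback(item)


service = FeatureService()
alerts = []
service.subscribe(lambda f: f["magnitude"] >= 5.0, alerts.append)

service.publish({"id": "q1", "magnitude": 4.2})
service.publish({"id": "q2", "magnitude": 6.1})

past = service.query_archive(lambda f: f["magnitude"] < 5.0)
print(alerts, past)
```

The same predicate form serves both directions in time, so a client can combine a look back over the archive with a standing subscription.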
29
ASIS Application Specific Information System II






Let’s call this ASVS (Application Specific Visualization Services)
generalizing WMS for GIS
The ASVS should both visualize information and provide a way of
navigating (cf GetFeatureInfo) the database (the ASFS)
The ASVS can itself be federated and presents an ASFS output interface
d) There should be an application service interface for ASIS from which all
ASIS services inherit
e) There will be other user services interfacing to ASIS
All user and system services will input and output data in ASL using
filters to cope with ASBD
[Figure: an AS Repository and an AS “Sensor” exchange Messages using ASL with generic AS Tools, a user defined AS Service and an ASVS Display; the processing boxes are labelled Filter, Transformation, Reasoning, Data-mining, Analysis]
30
[Figure: Military Information Management System. Labels: Directly GS-* WS-*; Filters/ASTT; ASVS. Callout: Everything is a Service or a message/Information Nugget]
31
[Figure labels: MIO or Military Information Object, the Unit of Managed Information expressed in ASL; ASFS; OGSA-DAI and Sensor Standards; Info-D, WS-Notification, WS-Eventing]
32
Information Resource: IS = Information Service (Sensor, Service or Repository)
Filter Resource: BFS = Basic Filter Service
[Figure: IS and BFS boxes with ports labelled Receive Request/Select, Issue Request/Select, Get Status, Request Status, ASL Data Get and ASL Data Put]
Filters either transform or aggregate Information
33
A Filter Service is a general workflow
(the microscopic workflow) of Basic
Filter Services
[Figure: FS = several BFS linked together]
The output of a Filter Service is
indistinguishable from that of an IS
A transport link supports asynchronous publish/subscribe semantics
and Web Service Reliable messaging fault tolerance
Transport links can be multicast to support collaboration (typically
for last link before or after Presentation Service) or replication for
fault tolerance.
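The statement that a Filter Service is a workflow of Basic Filter Services, yet looks like an Information Service from outside, is a composite structure. A minimal sketch follows, assuming one shared `get` interface; all class names and the sample readings are hypothetical, and transport concerns (publish/subscribe, reliable messaging) are left out.

```python
class InformationService:
    """Common interface: anything that can be asked for its information."""

    def get(self):
        raise NotImplementedError


class Sensor(InformationService):
    """A leaf Information Service holding raw readings."""

    def __init__(self, readings):
        self._readings = list(readings)

    def get(self):
        return list(self._readings)


class BasicFilter(InformationService):
    """A Basic Filter Service: transforms or aggregates its inputs."""

    def __init__(self, func, *inputs):
        self._func = func
        self._inputs = inputs

    def get(self):
        return self._func(*[source.get() for source in self._inputs])


class FilterService(InformationService):
    """A workflow of basic filters, exposed through the same interface."""

    def __init__(self, output_stage):
        self._output_stage = output_stage

    def get(self):
        return self._output_stage.get()


north = Sensor([1.0, 2.0, 3.0])
south = Sensor([10.0, 20.0])
merge = BasicFilter(lambda a, b: a + b, north, south)        # aggregate
double = BasicFilter(lambda xs: [2 * x for x in xs], merge)  # transform
fs = FilterService(double)

# A further filter consumes the Filter Service exactly as it would a Sensor.
top = BasicFilter(lambda xs: [max(xs)], fs)
print(fs.get(), top.get())
```

Because every level exposes the same interface, the nesting can continue upward, which is what the Grid of Grids composition relies on.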
34
IS Gridlet
[Figure: IS Gridlet = several IS feeding several FS; the top IS could be produced by a Filter Service]
The basic unit (Gridlet) transforms and aggregates
application specific information
Gridlets are composed using Grid of Grids concept
35
[Figure: many IS Gridlets combined. Labels: Federation, Macroscopic Workflow, Session Management, Search, Planning, Construction, Management, Portal, Presentation, ASVS. General System Services: Messaging/Data transport, Notification, Security, Fault Tolerance, Metadata, Directory, Collaboration, Replica Management]
Data -> Information -> Knowledge as messages flow from original sources to top of Filter Grid
36
Semantically Rich Services with a Semantically
Rich Distributed Operating Environment
[Figure: Grids of Grids Architecture. Sensor Services (SS) and a Database supply Raw Data as SOAP Message Streams; Filter Services (FS) annotated with MetaData (MD) carry it through Data, Information, Knowledge, Wisdom and Decisions; Other Services (OS), Another Service and Another Grid connect at the edges. Callout: Sensor Service is same as outward facing application service]
37
Summary




Virtualization everywhere
Focus on semantics not representation to get
performance combined with expressivity for transport
and data access
All this enabled by powerful meta-data services
Grids add management to rich but potentially chaotic
set of Web Services;
• management and coherence enabled by meta-data




Can define general information architectures (ASIS,
GIS, SIIS) for both applications and system
Knowledge from filters that span simulations, data-mining,
reasoning and agents
A service is just a special case of a Grid
Build systems from SubGrids (Gridlets)
38