NIHmodelstandardsApril10-07.ppt


Lessons on Process and
Standards in other science
communities
IMAG Model Sharing Strategies Workshop
NIH April 10 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
[email protected]
http://www.infomall.org
1
What is a Model Electronically?



This should have a label – a URI
It should have a collection of data or metadata defining it
It might have some way of building composite models by joining
multiple smaller models together
• Need to be able to define connections


Maybe there are also “mechanisms” to manipulate model or
evolve it in time
A computer program defines the data as values and the
mechanisms as subroutines/methods
• Programs can be Fortran, Python, C#, Prolog
• Declarative or Imperative; Scripted or Compiled

However in spite of software engineering, computer programs
are very hard to share and re-use
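One way to read the list above is as a small data structure. The following is a minimal sketch only, assuming hypothetical class names (`Model`, `CompositeModel`), hypothetical `urn:example:` labels, and a simple notion of named "ports" as the connection points; none of these come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A model with a URI label, defining metadata, and named connection points."""
    uri: str                                       # the label
    metadata: dict = field(default_factory=dict)   # data or metadata defining the model
    ports: tuple = ()                              # names other models may connect to (assumed)


@dataclass
class CompositeModel(Model):
    """A model built by joining smaller models through explicit connections."""
    parts: list = field(default_factory=list)
    connections: list = field(default_factory=list)  # (src_uri, src_port, dst_uri, dst_port)

    def connect(self, src, src_port, dst, dst_port):
        # Refuse connections to ports that the parts do not declare.
        if src_port not in src.ports or dst_port not in dst.ports:
            raise ValueError("unknown port")
        self.connections.append((src.uri, src_port, dst.uri, dst_port))


# Hypothetical example models, for illustration only.
heart = Model("urn:example:heart", {"units": "SI"}, ports=("flow_out",))
lung = Model("urn:example:lung", {"units": "SI"}, ports=("flow_in",))
body = CompositeModel("urn:example:body", parts=[heart, lung])
body.connect(heart, "flow_out", lung, "flow_in")
print(body.connections)
```

The "mechanisms" of the slide would be methods on such classes; the hard part the slide points to is agreeing on the ports and metadata, not writing the class.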
2
What are Questions?





What are the models we are trying to define?
What is Process to decide on needed standards and
their Syntax
Are we mainly concerned about data defining the
model and/or the programs that build the model
Where are overlaps between IMAG requirements and
other computer science or science fields
Is the barrier to sharing models “science” (i.e. it is not
clear what the common interfaces are) or
“systematization” (we agree on interface points but
don’t have a common syntax)
3
Some Examples


There are many examples of relevant efforts to encourage
sharing of models
DMSO (Defense Modeling and Simulation Office) produced
HLA (High Level Architecture) as a (pre-CORBA/Web Service)
way of defining military models as discrete event simulations
• Good but out of date

The Open Geospatial Consortium OGC
http://www.opengeospatial.org/ is a consortium of 339
organizations setting excellent standards for Geographical
Information Systems
• We could develop a BIS Biological Information System?

International Virtual Observatory Alliance IVOA
http://www.ivoa.net/ is 16 organizations (each of which is a
collection like EVO the European Virtual Observatory)
defining sharing standards for astronomy data
4
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure: sky images from Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, and Galaxy Density Map]
5
OGC Standards I
Standard: Geography Markup Language (GML)
Definition: GML is an XML grammar written in XML Schema for the modeling, transport, and storage of geographic information. GML provides a variety of kinds of objects for describing geography including features, coordinate reference systems, geometry, topology, time, units of measure and generalized values.
Specification: ISO/TC 211/WG 19136; OGC 03-105r1; Version: 3.1.0; Date: 2004-02-07; Pages: 601

Standard: Observations and Measurements (O&M)
Definition: The general models and XML encodings for observations and measurements, including but not restricted to those using sensors. Based on GML.
Specification: OGC 05-087r3; Version: 0.13.0; Date: 2006-02-24; Pages: 136

Standard: Sensor Model Language (SensorML)
Definition: The general models and XML encodings for sensors.
Specification: OGC 05-086; Version: 1.0; Date: 2005-10-05; Pages: 110

Standard: Web Feature Service (WFS)
Definition: WFS allows a client to retrieve and update geospatial data encoded in GML from multiple Web Feature Services. The specification defines interfaces for data access and manipulation operations on geographic features, using HTTP as the distributed computing platform. Via these interfaces, a Web user or service can combine, use and manage geodata -- the feature information behind a map image -- from different sources.
Specification: OGC 04-094; Version: 1.1.0; Date: 2005-05-03; Pages: 131
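Since WFS uses HTTP as its platform, a retrieval can be as simple as a URL. The sketch below builds a key-value GetFeature request; the endpoint `http://example.org/wfs`, the feature type name `fault`, and the helper name `wfs_getfeature_url` are assumptions for illustration, and a real server may require further parameters.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, type_name, bbox=None, version="1.1.0"):
    """Build a key-value GetFeature request URL for a WFS endpoint."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeName": type_name,
    }
    if bbox is not None:
        # Bounding box as comma-separated corner coordinates.
        params["bbox"] = ",".join(str(v) for v in bbox)
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and feature type.
url = wfs_getfeature_url("http://example.org/wfs", "fault",
                         bbox=(-119.0, 34.0, -118.0, 35.0))
print(url)
```

The response to such a request would be a GML feature collection.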
6
OGC Standards II
Standard: Web Map Service (WMS)
Definition: A Web Map Service (WMS) produces maps of spatially referenced data dynamically from geographic information. This International Standard defines a “map” to be a portrayal of geographic information as a digital image file suitable for display on a computer screen.
Specification: OGC 06-042; Version: 1.3.0; Date: 2006-03-15; Pages: 85

Standard: Web Coverage Service (WCS)
Definition: WCS extends the WMS interface to allow access to geospatial “coverages” (raster data sets) that represent values or properties of geographic locations, rather than WMS generated maps (pictures).
Specification: OGC 03-065r6; Version: 1.0.0; Date: 2003-08-27; Pages: 67

Standard: Catalogue Services
Definition: Catalogue Service Implementation Specification defines a common interface that enables diverse but conformant applications to perform discovery, browse and query operations against distributed heterogeneous catalog servers.
Specification: OGC 02-087r3; Version: 1.1.1; Date: 2002-12-13; Pages: 239

Standard: Filter Encoding
Definition: Filter Encoding defines an XML encoding for filter expressions. A filter expression constrains property values to create a subset of a group of objects. The goal, typically, is to operate on just those objects by, for example, rendering them in a different color or saving them to another format.
Specification: OGC 04-095; Version: 1.1.0; Date: 2005-05-03; Pages: 40
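The Filter Encoding idea (an XML expression that selects a subset of objects by property value) can be sketched as follows. This is an illustration only: it handles a single `PropertyIsEqualTo` comparison, the helper names `equals_filter` and `apply_filter` are hypothetical, and the second fault record is made-up sample data.

```python
import xml.etree.ElementTree as ET

OGC = "http://www.opengis.net/ogc"
ET.register_namespace("ogc", OGC)


def equals_filter(prop, value):
    """Build <ogc:Filter><ogc:PropertyIsEqualTo>...</ogc:PropertyIsEqualTo></ogc:Filter>."""
    flt = ET.Element(f"{{{OGC}}}Filter")
    comparison = ET.SubElement(flt, f"{{{OGC}}}PropertyIsEqualTo")
    ET.SubElement(comparison, f"{{{OGC}}}PropertyName").text = prop
    ET.SubElement(comparison, f"{{{OGC}}}Literal").text = value
    return flt


def apply_filter(flt, objects):
    """Return the subset of dict objects whose property equals the literal."""
    comparison = flt.find(f"{{{OGC}}}PropertyIsEqualTo")
    prop = comparison.find(f"{{{OGC}}}PropertyName").text
    value = comparison.find(f"{{{OGC}}}Literal").text
    return [o for o in objects if str(o.get(prop)) == value]


# Sample objects; the second record is invented for the example.
faults = [
    {"name": "Northridge2", "author": "Wald D. J."},
    {"name": "Example1", "author": "Someone"},
]
flt = equals_filter("name", "Northridge2")
print(ET.tostring(flt, encoding="unicode"))
print(apply_filter(flt, faults))
```

In a real deployment the filter would be sent to the server inside a WFS request rather than applied client-side.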
7
WMS uses WFS that uses data sources
<gml:featureMember>
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          -118.72,34.243 118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
[Figure: a Client talks to the WMS, which exchanges GetFeature requests and FeatureCollection responses with the WFS Server; the WFS Server issues SQL queries to data sources. Labels: Railroads [a-b], River [a-d], Bridge [1-5], Highway [12-18]; Interstate Highways, Rivers, Bridges.]
Defines Earthquake Fault
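A client receiving such a feature needs only a generic XML parser to use it. The sketch below reads the fault name and the line-string coordinates; it assumes the `gml` prefix is bound to `http://www.opengis.net/gml` (the slide fragment omits the namespace declaration) and copies the coordinate text exactly as shown on the slide.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"  # assumed binding for the gml: prefix

doc = f"""<gml:featureMember xmlns:gml="{GML}">
  <fault>
    <name> Northridge2 </name>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>-118.72,34.243 118.591,34.176</gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>"""

root = ET.fromstring(doc)
fault = root.find("fault")
name = fault.findtext("name").strip()

# gml:coordinates holds whitespace-separated tuples of comma-separated numbers.
coords_text = fault.find(f".//{{{GML}}}coordinates").text
points = [tuple(float(v) for v in pair.split(",")) for pair in coords_text.split()]
print(name, points)
```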
8
OGC Standards






Typify a common competition – there is a similar effort by
Technical Committee tasked by the International Standards
Organization (ISO/TC211).
Are very complex – GML specification itself is over 600 pages
Underlie the success of GIS, enabled first through
ESRI (ArcInfo) and Minnesota Map Server and now through
Google Maps
Are built in XML (as they should be) but for efficiency one
• Transmits through binary XML
• Stores in SQL databases not in XML databases
Define some things (catalog) which are unnecessary as they are
provided by a broader community
Observations and Measurements work for any time series and
so are also broader but no competition!
9
OGC Standards Structure




Have a language GML that defines the field – this
would be CellML and SBML in the case of Biology and
CML for ChemInformatics
Have a user interface (the Map) captured as a Web
Map Service
Have a “pixel data” service WCS the Web Coverage
Service
Have a “vector” (feature, property) data service WFS
the Web Feature Service
• Note any Earth Science simulation or data analysis can be
thought of as accepting WFS compatible data and producing
WFS or WCS compatible output
10
Grid Workflow Datamining in Earth Science

NASA GPS

Work with Scripps Institute
Grid services controlled by workflow process real time
data from ~70 GPS Sensors in Southern California
Earthquake
Streaming Data
Support
Archival
Transformations
Data Checking
Hidden Markov
Datamining (JPL)
Real Time
Display (GIS)
11
Data Federation

The IVOA activity is aimed largely at supporting interoperable
data repositories that can feed into the image processing filtering
needed to extract signals
• There is not so much simulation


ChemInformatics has most data in NIH’s PubChem but will
need to federate additional repositories such as those produced
by individual Chemistry groups and the raw data from NIH
screening centers
Every county (total 92) in Indiana has its own GIS and
something equivalent to a WFS holding information not yet
known to Google! (e.g. our house pinpoint address and
assessment)
• Need to federate all these to support state agencies

So federation of distributed resources is a major issue and WFS
uses “capabilities” to support this
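Federation through capabilities can be sketched in a few lines: each server advertises what feature types it holds, and a federating service indexes those advertisements and fans queries out. The classes `FeatureServer` and `Federation`, and the sample records, are hypothetical stand-ins and not the WFS interface itself.

```python
class FeatureServer:
    """Stand-in for one county's feature service."""

    def __init__(self, name, features):
        self.name = name
        self._features = features  # feature type -> list of records

    def get_capabilities(self):
        # Advertise which feature types this server can deliver.
        return sorted(self._features)

    def get_feature(self, type_name):
        return list(self._features.get(type_name, []))


class Federation:
    """Presents many servers as one by indexing their capabilities."""

    def __init__(self, servers):
        self.index = {}
        for server in servers:
            for type_name in server.get_capabilities():
                self.index.setdefault(type_name, []).append(server)

    def get_capabilities(self):
        return sorted(self.index)

    def get_feature(self, type_name):
        results = []
        for server in self.index.get(type_name, []):
            results.extend((server.name, rec) for rec in server.get_feature(type_name))
        return results


# Made-up sample data for two counties.
marion = FeatureServer("Marion", {"parcels": ["m1", "m2"], "roads": ["I-65"]})
cass = FeatureServer("Cass", {"parcels": ["c1"]})
state = Federation([marion, cass])
print(state.get_capabilities())
print(state.get_feature("parcels"))
```

Because the federation exposes the same two operations as a single server, federations can themselves be federated.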
12
Indiana County Map Grid
GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature)
Servers from different vendors. Grids federate different data repositories (cf Astronomy
VO federating different observatory collections)
13
Google Maps Server
Marion County
Map Server
(ESRI ArcIMS)
Must provide adapters
for each Map Server
type.
Tile Server requests
map tiles at all zoom
levels with all layers.
These are converted
to uniform projection,
indexed, and stored.
Overlapping images
are combined.
Hamilton County
Map Server
(AutoDesk)
Adapter
Adapter
Adapter
Tile Server
Cache Server
Browser +
Google Map API
Cass County Map
Server
(OGC Web Map
Server)
Browser client fetches
image tiles for the
bounding box using
Google Map API.
The cache server
fulfills Google map
calls with cached tiles
at the requested
bounding box that fill
the bounding box.
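The adapter-plus-cache arrangement can be sketched as below. Note one deliberate simplification: this cache fills lazily on first request, whereas the slide describes a tile server that pre-fetches tiles at all zoom levels. The class `TileCache`, the string "tiles", and the two adapter functions are hypothetical placeholders for real image handling.

```python
class TileCache:
    """Caches combined tiles fetched through per-server adapters."""

    def __init__(self, adapters):
        self.adapters = adapters  # callables (zoom, x, y) -> tile or None
        self.store = {}
        self.fetches = 0          # counts cache misses

    def get(self, zoom, x, y):
        key = (zoom, x, y)
        if key not in self.store:
            self.fetches += 1
            layers = [t for t in (a(zoom, x, y) for a in self.adapters) if t is not None]
            # Stand-in for reprojecting and combining overlapping images.
            self.store[key] = "+".join(layers)
        return self.store[key]


# Hypothetical adapters, one per map server type.
def marion_adapter(zoom, x, y):
    return f"marion({zoom},{x},{y})"


def cass_adapter(zoom, x, y):
    # Pretend this server has no imagery above zoom level 10.
    return None if zoom > 10 else f"cass({zoom},{x},{y})"


cache = TileCache([marion_adapter, cass_adapter])
first = cache.get(12, 5, 7)
second = cache.get(12, 5, 7)   # served from the cache
print(first, cache.fetches)
```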
14
Searched on Transit/Transportation
15
Service or Web service Approach


One uses GML, CML etc. to define the data in a system and one
uses services to capture “methods” or “programs”
In eScience, important services fall in three classes
• Simulations
• Data access, storage, federation, discovery
• Filters for data mining and manipulation





Services use something like WSDL (Web Service Definition
Language) to define interoperable interfaces (see OPAL talk!)
WSDL establishes a “contract” independent of implementation
between two services or a service and a client
Services should be loosely coupled which normally means they
are coarse grain
Services will be composed (linked together) by mashups
(typically scripts) or workflow (often XML – BPEL)
Software Engineering and Interoperability/Standards are closely
related
16
Philosophy of Web Service Grids





Much of Distributed Computing was built by natural
extensions of computing models developed for sequential
machines
This leads to the distributed object (DO) model represented
by Java and CORBA
• RPC (Remote Procedure Call) or RMI (Remote Method
Invocation) for Java
Key people think this is not a good idea as it scales badly
and ties distributed entities together too tightly
• Distributed Objects Replaced by Services
Note CORBA was considered too complicated in both
organization and proposed infrastructure
• and Java was considered as “tightly coupled to Sun”
• So there were other reasons to discard
Thus replace distributed objects by services connected by
“one-way” messages and not by request-response messages
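The contrast with request-response can be illustrated with a toy message bus: a sender posts a message to an address and gets nothing back, and any result travels onward as another one-way message. The `Bus` class and the three service addresses are hypothetical; this is an in-process sketch, not a Web Service stack.

```python
from collections import deque


class Bus:
    """Delivers one-way messages to services registered by address."""

    def __init__(self):
        self.handlers = {}
        self.pending = deque()

    def register(self, address, handler):
        self.handlers[address] = handler

    def send(self, address, body):
        # One-way: the sender receives no reply and does not wait.
        self.pending.append((address, body))

    def run(self):
        while self.pending:
            address, body = self.pending.popleft()
            self.handlers[address](body, self.send)


results = []
bus = Bus()
# Each service forwards its output as a new message instead of returning it.
bus.register("simulate", lambda body, send: send("filter", body * 2))
bus.register("filter", lambda body, send: send("display", body + 1))
bus.register("display", lambda body, send: results.append(body))

bus.send("simulate", 20)
bus.run()
print(results)  # [41]
```

No service here holds a reference to another service's object, only an address, which is the loose coupling the slide argues for.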
17
Web services

resources
Humans
service logic
BPEL, Java, .NET
Databases
Programs
Computational resources
message processing

Web Services build
loosely-coupled,
distributed
applications, (wrapping
existing codes and
databases) based on the
SOA (service oriented
architecture) principles.
Web Services interact
by exchanging messages
in SOAP format
The contracts for the
message exchanges that
implement those
interactions are
described via WSDL
interfaces.
SOAP and WSDL

Devices
<env:Envelope>
<env:Header>
...
</env:Header>
<env:Body>
...
</env:Body>
</env:Envelope>
SOAP messages
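The envelope structure shown above can be produced with a standard XML library. This sketch assumes the SOAP 1.2 envelope namespace and a made-up body element `getFault`; a real service would define its body elements in its WSDL.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
ET.register_namespace("env", ENV)


def soap_envelope(body_element, header_elements=()):
    """Wrap a body element (and optional header elements) in a SOAP envelope."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{ENV}}}Header")
    header.extend(header_elements)
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    body.append(body_element)
    return envelope


# Hypothetical request payload.
request = ET.Element("getFault")
ET.SubElement(request, "name").text = "Northridge2"

message = ET.tostring(soap_envelope(request), encoding="unicode")
print(message)
```

Headers are where WS-* features such as security or reliable messaging would attach their own elements.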
18
A typical Web Service


In principle, services can be in any language (Fortran .. Java ..
Perl .. Python) and the interfaces can be method calls, Java RMI
Messages, CGI Web invocations, totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and
programs written in net friendly languages like Java and Python
Web Services
WSDL interfaces
Portal
Service
Security
WSDL interfaces
Web Services
Payment
Credit Card
Catalog
Warehouse
Shipping
control
19
CICC Web Service Infrastructure
Cheminformatics Services
Statistics Services
Database Services
Core functionality
Fingerprints
Similarity
Descriptors
2D diagrams
File format conversion
Computation functionality
Regression
Classification
Clustering
Sampling distributions
3D structures by
CID
SMARTS
3D Similarity
Docking scores/poses by
CID
SMARTS
Protein
Docking scores
Applications
Applications
Docking
Predictive models
Filtering
Feature selection
Druglikeness
2D plots
Toxicity predictions
Arbitrary R code (PkCell)
Mutagenicity predictions
PubChem related data by
Anti-cancer activity predictions
Pharmacokinetic parameters
CID, SMARTS
OSCAR Document Analysis
InChI Generation/Search
Computational Chemistry (Gamess, Jaguar etc.)
Grid Services
Varuna.net
Quantum Chemistry
Portal Services
Service Registry
Job Submission and Management
Local Clusters
IU Big Red
TeraGrid, Open Science Grid
RSS Feeds
User Profiles
Collaboration as in Sakai
Where Does The Functionality Come From?
University of Michigan
• PkCell
gNova Consulting
DigitalChemistry
• BCI fingerprints
• DivKMeans
Cambridge University
• InChI generation / search
• OSCAR
NIH
• PubChem
• PubMed
CDK
• Cheminformatics
European Chemicals Bureau
• ToxTree toxicity predictions
OpenEye
• Docking
Indiana University
• VOTables
• NCI DTP predictions
• Database services
R Foundation
• R package
Service Modeling Language (SML)




Submitted to W3C by industry giants 21 March 2007
A model in SML is realized as a set of interrelated XML
documents. The XML documents contain information about the
parts of an IT service, as well as the constraints that each part
must satisfy for the IT service to function properly. Constraints
are captured in two ways:
Schemas – these are constraints on the structure and content of
the documents in a model. SML uses a profile of XML Schema
1.0 as the schema language. SML also defines a set of extensions
to XML Schema to support inter-document references.
Rules – are Boolean expressions that constrain the structure and
content of documents in a model. SML uses a profile of
Schematron (goes between documents) and XPath 1.0 for rules.
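The two ideas here (a model as a set of interrelated XML documents, and rules as Boolean expressions over them) can be sketched without SML tooling. The example below is not SML, Schematron, or full XPath 1.0: it uses plain Python functions as rules and the limited path support in `xml.etree.ElementTree`, and the element names `service`, `dependsOn`, and `port` are invented.

```python
import xml.etree.ElementTree as ET

# Two interrelated documents; "web" refers to "db" by name.
web = ET.fromstring('<service name="web"><dependsOn ref="db"/><port>80</port></service>')
db = ET.fromstring('<service name="db"><port>5432</port></service>')
model = {doc.get("name"): doc for doc in (web, db)}


def refs_resolve(doc, model):
    """Inter-document rule: every reference must point at a document in the model."""
    return all(dep.get("ref") in model for dep in doc.findall("dependsOn"))


def has_valid_port(doc, model):
    """Content rule: each service declares a numeric port in range."""
    port = doc.findtext("port")
    return port is not None and port.isdigit() and 0 < int(port) < 65536


RULES = [refs_resolve, has_valid_port]


def validate(model):
    """Return, per document, the names of the rules it violates."""
    return {
        name: [rule.__name__ for rule in RULES if not rule(doc, model)]
        for name, doc in model.items()
    }


print(validate(model))
```

An empty list for every document means the model satisfies all constraints.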
22
Models in SML





Models focus on capturing all invariant aspects of a service/system that
must be maintained for the service/system to be functional.
Models are units of communication and collaboration between designers,
implementers, operators, and users; and can easily be shared, tracked, and
revision controlled. This is important because complex services are often
built and maintained by a variety of people playing different roles.
Models drive modularity, re-use, and standardization. Most real-world
complex services and systems are composed of sufficiently complex
parts. Re-use and standardization of services/systems and their parts is a
key factor in reducing overall production and operation cost and in
increasing reliability.
Models represent a powerful mechanism for validating changes before
applying the changes to a service/system. Also, when changes happen in a
running service/system, they can be validated against the intended state
described in the model. The actual service/system and its model together
enable a self-healing service/system – the ultimate objective. Models of a
service/system must necessarily stay decoupled from the live service/system to
create the control loop
Models enable increased automation of management tasks. Automation
facilities exposed by the majority of IT services/systems today could be
driven by software – not people – for reliable initial realization of a
service/system as well as for ongoing lifecycle management.
23
Structured v Unstructured Metadata




The schemas that are defined by GML etc. are
structured definitions
The traditional semantic web approach is largely based
on structured metadata (OWL) that one can analyze
precisely
UML was for example used by OGC in developing
standards
In the “real world”, unstructured annotation has been
very successful as seen in Connotea, del.icio.us and
CiteULike
24
How to set standards

If one is Google, you can just define the standard and not bother
to discuss it!
• Google maps does not support OGC standards






The growth in distributed computing has spurred a great deal of
standards work as we need the different parts of a system built by
different people to interoperate
Often meet every few weeks to build a standard in 12 months
OASIS defines a process and doesn’t define an architecture
W3C is most prestigious
OGF Open Grid Forum has an eScience section that is currently
led by me
Or do it outside any standards body as in fact most domain
specific standards are done
• Note IVOA has meetings from time to time at OGF to coordinate their
astronomy standards with general Grid standards
25
The Grid and Web Service Institutional Hierarchy
4: Application or Community of Interest (CoI)
Specific Services such as “Map Services”, “Run
BLAST” or “Simulate a Missile”
XBML
XTCE VOTABLE
CML
CellML
3: Generally Useful Services and Features
(OGSA and other GGF, W3C) Such as “Collaborate”,
“Access a Database” or “Submit a Job”
OGSA GS-*
and some WS-*
GGF/W3C/….
XGSP (Collab)
2: System Services and Features
(WS-* from OASIS/W3C/Industry)
Handlers like WS-RM, Security, UDDI Registry
1: Container and Run Time (Hosting)
Environment (Apache Axis, .NET etc.)
Must set standards to get interoperability
WS-* from
OASIS/W3C/
Industry
Apache Axis
.NET etc.
26
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area
Examples
1: Core Service Model
XML, WSDL, SOAP
2: Service Internet
WS-Addressing, WS-MessageDelivery; Reliable
Messaging WSRM; Efficient Messaging MTOM
3: Notification
WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions
BPEL, WS-Choreography, WS-Coordination
5: Security
WS-Security, WS-Trust, WS-Federation, SAML,
WS-SecureConversation
6: Service Discovery
UDDI, WS-Discovery
7: System Metadata and State
WSRF, WS-MetadataExchange, WS-Context
8: Management
WSDM, WS-Management, WS-Transfer
9: Policy and Agreements
WS-Policy, WS-Agreement
10: Portals and User Interfaces
WSRP (Remote Portlets)
27
Activities in Global Grid Forum Working Groups
GGF Area
GS-* and OGSA Standards Activities
1: Architecture
High Level Resource/Service Naming (level 2 of slide 6),
Integrated Grid Architecture
2: Applications
Software Interfaces to Grid, Grid Remote Procedure Call,
Checkpointing and Recovery, Interoperability to Job Submittal services,
Information Retrieval,
3: Compute
Job Submission, Basic Execution Services, Service Level Agreements
for Resource use and reservation, Distributed Scheduling
4: Data
Database and File Grid access, Grid FTP, Storage Management, Data
replication, Binary data specification and interface, High-level
publish/subscribe, Transaction management
5: Infrastructure
Network measurements, Role of IPv6 and high performance
networking, Data transport
6: Management
Resource/Service configuration, deployment and lifetime, Usage
records and access, Grid economy model
7: Security
Authorization, P2P and Firewall Issues, Trusted Computing
28
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a
two-level Programming Model
• We make a Service (same as a “distributed object” or
“computer program” running on a remote computer) using
conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users' files and
databases
Service
Data
• The Grid is built by coordinating such services assuming
we have solved the problem of programming the service
29
Two-level Programming II




The Grid is discussing the composition of distributed
services with the runtime interfaces to Grid as
opposed to UNIX pipes/data streams
[Figure: Service1, Service2, Service3, Service4 linked together]
Familiar from use of UNIX Shell, PERL or Python
scripts to produce real applications from core programs
Such interpretative environments are the single
processor analog of Grid Programming
Some projects like GrADS from Rice University are
looking at integration between service and composition
levels but dominant effort looks at each level separately
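The single-processor analog mentioned above can be made concrete with Python generators, which compose like UNIX pipes. Each stage stands in for a separately programmed service (level one), and the last line is the composition (level two). The stage names and sample readings are invented for the example.

```python
def sensor(readings):
    """Source stage: emits raw readings one at a time."""
    for r in readings:
        yield r


def check(stream, lo, hi):
    """Data checking stage: drops readings outside [lo, hi]."""
    for r in stream:
        if lo <= r <= hi:
            yield r


def smooth(stream, window=2):
    """Transformation stage: running mean over the last `window` readings."""
    buf = []
    for r in stream:
        buf.append(r)
        buf = buf[-window:]
        yield sum(buf) / len(buf)


# Composition level: wire the stages together like a shell pipeline.
pipeline = smooth(check(sensor([1.0, 2.0, 99.0, 3.0]), 0.0, 10.0))
output = list(pipeline)
print(output)  # [1.0, 1.5, 2.5]
```

In the Grid setting the same wiring would be expressed in a workflow language or script, with messages between remote services replacing the in-process generator calls.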
30
Grid Workflow Data Assimilation in Earth Science

Grid services triggered by abnormal events and controlled by workflow process real
time data from radar and high resolution simulations for tornado forecasts
Typical
graphical
interface to
service
composition
31
3 Layer Programming Model
Application
(level 1 Programming)
Application Semantics (Metadata, Ontology)
Level 2 “Programming”
MPI Fortran C++ etc.
Semantic Web
Basic Web Service Infrastructure
Web Service 1
WS 2
WS 3
WS 4
Workflow (level 3) Programming BPEL
Workflow can be built on top of NaradaBrokering as messaging layer
32
[Figure: a grid of services linked by messages, transforming Raw Data → Data → Information → Knowledge → Wisdom → Decisions. Labels: SS (Sensor Service), FS (Filter Service), OS (Other Service), MD (MetaData), Database, Another Grid, Another Service.]
33
Information Management/Processing




SOAP messages transport information expressed in a
semantically rich fashion between sources and services that
enhance and transform information so that the complete system
provides a Data → Information → Knowledge transformation
• Semantic Web technologies like RDF and OWL help us have
rich expressivity
We build application specific information
management/transformation systems ASIS for each application
domain
One special domain is the system itself where the metadata
associated with services, sessions, Grids, messages, streams and
workflow is itself managed and supported by an SIIS
34
Generalizing a GIS

Geographical Information Systems GIS have been
hugely successful in all fields that study the earth and
related worlds
• They define Geography Syntax (GML) and ways to store,
access, query, manipulate and display geographical features
• In SOA, GIS corresponds to a domain specific XML language
and a suite of services for different functions above

However such a universal information model has not
been developed in other areas even though there are
many fields in which it appears possible
• BIS Biological Information System
• MIS Military Information System
• IRIS Information Retrieval Information System
• PAIS Physics Analysis Information System
• SIIS Service Infrastructure Information System
35
ASIS Application Specific Information System I


a) Discovery capabilities that are best done using WS-*
standards
b) Domain specific metadata and data including
search/store/access interface (cf WFS). Let’s call the generalization
ASFS (Application Specific Feature Service)
• Language to express domain specific features (cf GML). Let’s call
this ASL (Application Specific Language)
• Tools to manipulate information expressed in language and key
data of application (cf coordinate transformations). Let’s call this
ASTT (Application Specific Tools and Transformations)
• ASL must support Data sources such as sensors (cf OGC metadata
and data sensor standards) and repositories. Sensors need
(common across applications) support of streams of data
• Queries need to support archived (find all relevant data in past)
and streaming (find all data in future with given properties)
• Note all AS Services behave like Sensors and all sensors are
wrapped as services
• Any domain will have “raw data” (binary) and that which has been
filtered to ASL. Let’s call the former ASBD (Application Specific Binary Data)
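The two query styles in the list above (archived and streaming) can share one interface, which also shows why a feature service "behaves like a sensor". This is a minimal in-memory sketch; the class `FeatureService`, its method names, and the sample station records are all hypothetical.

```python
class FeatureService:
    """Toy ASFS supporting archived and streaming queries."""

    def __init__(self):
        self.archive = []
        self.subscriptions = []

    def query_archive(self, predicate):
        # Archived query: find all relevant data in the past.
        return [f for f in self.archive if predicate(f)]

    def subscribe(self, predicate, callback):
        # Streaming query: be notified of future data with given properties.
        self.subscriptions.append((predicate, callback))

    def publish(self, feature):
        # New data is archived and pushed to matching subscribers,
        # so the service itself acts like a sensor for downstream services.
        self.archive.append(feature)
        for predicate, callback in self.subscriptions:
            if predicate(feature):
                callback(feature)


svc = FeatureService()
svc.publish({"station": "A", "shift_mm": 2})

alerts = []
svc.subscribe(lambda f: f["shift_mm"] > 5, alerts.append)
svc.publish({"station": "B", "shift_mm": 9})

past = svc.query_archive(lambda f: f["station"] == "A")
print(past, alerts)
```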
36
ASIS Application Specific Information System II






Let’s call this ASVS (Application Specific Visualization Services)
generalizing WMS for GIS
The ASVS should both visualize information and provide a way of
navigating (cf GetFeatureInfo) database (the ASFS)
The ASVS can itself be federated and presents an ASFS output interface
d) There should be an application service interface for ASIS from which all
ASIS services inherit
e) There will be other user services interfacing to ASIS
All user and system services will input and output data in ASL using
filters to cope with ASBD
AS
Repository
Filter, Transformation, Reasoning,
Data-mining, Analysis
AS Tool
(generic)
AS Service
(user defined)
AS Tool
(generic)
ASVS
Display
AS
“Sensor”
Messages using ASL
37
Mashups v Workflow?





Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
Workflow Tools are reviewed by Gannon and Fox
http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
Both include
scripting in PHP,
Python, sh etc. as
both implement
distributed
programming at level
of services
Mashups use all
types of service
interfaces and do not
have the potential
robustness (security)
of Grid service
approach
Typically “pure”
HTTP (REST)
38
Web 2.0 APIs


http://www.programmableweb.com/apis currently
(March 3 2007) lists 388 Web 2.0 APIs with GoogleMaps the
most used in Mashups
This site acts as a “UDDI” or “OGC Catalog” for Web
2.0
The List of
Web 2.0 API’s




Each site has API
and its features
Divided into
broad categories
Only a few used a
lot (34 API’s used
in more than 10
mashups)
RSS feed of new
APIs
3 more Mashups
each day, for a total of 1609
(March 3 2007)
Growing number of commercial Mashup Tools
Note ClearForest
runs Semantic Web
Services Mashup
competitions (not
workflow
competitions)
Some Mashup
types: aggregators,
search aggregators,
visualizers, mobile,
maps, games
APIs/Mashups per Protocol Distribution
[Charts: Number of APIs and Number of Mashups by protocol: REST; SOAP; XML-RPC; REST, XML-RPC; REST, XML-RPC, SOAP; REST, SOAP; JS; Other. APIs labeled: google maps, del.icio.us, virtual earth, 411sync, yahoo! search, yahoo! geocoding, technorati, netvibes, yahoo! images, trynt, yahoo! local, amazon ECS, google search, flickr, ebay, youtube, amazon S3, live.com]