SensorGrid High Performance Web Service Architecture for Geographic Information Systems Thesis Proposal Galip Aydin [email protected].


SensorGrid
High Performance Web Service Architecture for
Geographic Information Systems
Thesis Proposal
Galip Aydin
[email protected]
Outline
Introduction
Motivations
SensorGrid Architecture
Research Issues and Goals
Contributions

Geographic Information Systems

A geographic information system (GIS) is a system for creating
and managing spatial data and associated attributes.

A computer system capable of integrating, storing, editing, analyzing, and displaying geographically-referenced information.

A "smart map" tool that allows users to create interactive queries (user created searches), analyze the spatial information, and edit data.
Maps are created by overlaying various geospatial
features.
Traditional GIS approach

Mostly desktop applications that require expertise and a large amount of resources.
Centralized client-server models for web-based GIS environments.
Cross-vendor or cross-product interoperability is not possible without costly format conversions.
Most applications consume archived data, but with advances in sensors, new applications that consume real-time data are appearing in abundance.
Traditional GIS approach (contd.)

Limitations
Distributed nature of geospatial data.
Proprietary data formats and service methodologies.
Lack of interoperable services.

Problems
Assembling data from distributed sources.
Format conversions.
Amount of resources for geoprocessing.

Open GIS Standards

Several standards bodies started developing data standards and implementation specifications for geospatial and location-based services.
The goal is to make geographic information and services neutral and available across any network, application, or platform.
Two major organizations are the Open Geospatial Consortium (OGC) and ISO/TC211.
OGC

Supports interoperable solutions that "geo-enable" the Web. Several specifications:
Geospatial Data: Geography Markup Language (GML)
Sensors: Metadata – SensorML; Measurements – Observations & Measurements (GML extension)
Services: Web Feature Service, Web Map Service, Web Coverage Service etc.
Issues with Open Standards



HTTP GET/POST based services; limited data transport capabilities (HTTP, FTP, e-mail, files etc.).
Not Web Services; tightly coupled, point-to-point communication results in centralized, synchronous applications.
High-end scientific and complex GIS applications require:
Asynchronous communication models to cope with the high number of participants and long-running codes.
Transfer of large data between services.
Coupling data sources and high performance tools.
Orchestrating multiple services for solving complex problems.
Motivation 1

Complex problems require GIS applications and services to collaborate.
Lack of service orchestration capabilities
The lack of service-oriented practice leads to distributed practices that are hard to manage, especially when a large number of participants are involved.
Coupling data sources to GIS applications
There are various types of distributed geospatial data sources used by GIS applications, and we need a flexible computing environment for seamless integration.
Motivation 2

Data transport requirements
GIS applications require large amounts of data to be transported between sources and consumers. Current approaches do not provide a scalable and flexible solution.
High performance
It is a must, not an option, for most scientific GIS applications. For instance, evaluating pre-seismic real-time messages may lead to early warnings.
Proliferation of Sensors
Sensors introduce new challenges to current GIS applications in terms of data collection, management and processing.
Motivating Examples

Pattern Informatics
Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators.
Uses seismic archives.

Regularized Dynamic Annealing Hidden Markov Method (RDAHMM)
Time series analysis code by Dr. Robert Granat (JPL).
Can be applied to GPS and seismic archives.
Can be applied to real-time data.

Interdependent Energy Infrastructure Simulation System (IEISS)
Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.
SOA for GIS


Utilize Web Services to realize Service Oriented Architecture, and Open GIS standards for “data format and service interfaces” for interoperability.
We have built WS versions of:
WFS – access to geospatial data on various databases
WMS (A. Sayar) – visualization of feature data
Extended UDDI and WS-Context (M. Aktas) – supporting dynamic service metadata and services registry.
Problems with simple WS version



Basic WFS: request-response, not asynchronous.
Performance: GI Services are not designed to handle non-trivial data transfers.
XML: the size of geospatial data increases with XML encoding.
GIS Data Grids


Data is at the heart of every GIS.
Easy and fast access to distributed geospatial data is crucial, especially in times of crisis or disaster.
Points to consider:
High performance transport.
Real-time observations from distributed sensors.
Unified access to geospatial data stored in relational DBs, XML DBs and ESRI Shape files.
Leverage OGC Web Feature Service to provide standard access and query interfaces.
Develop a Web Service version of WFS and modify/extend it for high performance.
Fast population of GML Feature Collections from data in the various DBs.
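
The last point, populating GML Feature Collections from database rows, can be illustrated with a minimal sketch. It assumes the rows have already been fetched as Python dictionaries; the `CityGate` feature and its fields are borrowed from the sample GML document in the appendix, and the function name `build_feature_collection` is hypothetical. It is not the WFS implementation described in this proposal.

```python
import xml.etree.ElementTree as ET

# Standard OGC namespace URIs for GML and WFS.
GML_NS = "http://www.opengis.net/gml"
WFS_NS = "http://www.opengis.net/wfs"
ET.register_namespace("gml", GML_NS)
ET.register_namespace("wfs", WFS_NS)


def build_feature_collection(rows):
    """Build a wfs:FeatureCollection with one gml:featureMember per row."""
    collection = ET.Element(f"{{{WFS_NS}}}FeatureCollection")
    for row in rows:
        member = ET.SubElement(collection, f"{{{GML_NS}}}featureMember")
        # Hypothetical feature type, modeled on the appendix sample.
        feature = ET.SubElement(member, "CityGate")
        ET.SubElement(feature, "name").text = row["name"]
        ET.SubElement(feature, "id").text = row["id"]
        location = ET.SubElement(feature, "location")
        point = ET.SubElement(location, f"{{{GML_NS}}}Point")
        coord = ET.SubElement(point, f"{{{GML_NS}}}coord")
        for axis in ("X", "Y", "Z"):
            ET.SubElement(coord, f"{{{GML_NS}}}{axis}").text = str(row[axis.lower()])
    return collection


# One row standing in for a database query result.
rows = [{"name": "City Gate #10", "id": "CG10", "x": -85.465, "y": 30.132, "z": 2.0}]
gml_text = ET.tostring(build_feature_collection(rows), encoding="unicode")
print(gml_text)
```

Building the tree element by element keeps the sketch short; for large result sets a streaming writer would likely be a better fit, since the whole collection is held in memory here.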
GIS Data Services

WFS Specification: transporting high-volume geospatial data encoded in GML is not trivial with HTTP methods or pure Web Services.
Researching the use of a publish/subscribe based messaging system for large data transport and fast response.
Issues:
Support for multiple clients, creating topics on the fly.
Dynamic session metadata: keeping session state and metadata for each client and request. Use of WS-Context.
Prioritizing client requests.
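
Two of the issues above, creating topics on the fly and prioritizing client requests, can be sketched with a small in-memory broker. This is an illustration only, not NaradaBrokering or its API: the class `TopicBroker`, the topic name and the convention that a lower number means higher priority are all assumptions made for the example.

```python
import heapq
import itertools
from collections import defaultdict


class TopicBroker:
    """Toy topic-based broker: topics appear on first use, delivery is by priority."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> callbacks
        self._pending = []                     # heap of queued messages
        self._order = itertools.count()        # tie-breaker keeps FIFO order

    def subscribe(self, topic, callback):
        # No registration step: the topic is created when first referenced.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message, priority=10):
        # Lower number = more urgent (an assumption of this sketch).
        heapq.heappush(self._pending, (priority, next(self._order), topic, message))

    def deliver_all(self):
        while self._pending:
            _, _, topic, message = heapq.heappop(self._pending)
            for callback in self._subscribers.get(topic, []):
                callback(message)


broker = TopicBroker()
received = []
broker.subscribe("wfs/results/client-1", received.append)
broker.publish("wfs/results/client-1", "bulk feature page", priority=5)
broker.publish("wfs/results/client-1", "urgent feature page", priority=1)
broker.deliver_all()
print(received)
```

A per-client topic such as the hypothetical `wfs/results/client-1` is one way to support multiple clients; session state for each client would live elsewhere (the slides suggest WS-Context).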
Real-Time Sensors




Sensors are everywhere; they are being deployed as sensor networks for more accurate measurements.
With the proliferation of sensors, data collection and processing paradigms are changing.
Most scientific geo-applications are designed to work with archived data.
Critical Infrastructure Systems and Crisis management environments require fast and accurate access to real-time sources and a flexible/pluggable architecture for geoprocessing of the data.
Use Case - GPS Sensors



GPS station networks are a good example of scientific sensors.
GPS measurements are used for determining seismic events, understanding long-term crustal movement etc.
We have access to SOPAC GPS networks:
Currently only socket based RYO format access is available, but not utilized!
We provide multiple format (RYO, ASCII, GML) real-time streaming access by using NaradaBrokering topics.
OHIO and chain of filters.
We are investigating the use of topic based messaging systems for managing real-time data streams.
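
The idea of a chain of filters that republishes the same stream in RYO, ASCII and GML on separate topics can be sketched as follows. Everything here is an assumption for illustration: the topic names, the helper `attach_filter`, and the record layout. In particular, the dictionary passed in stands in for an already decoded RYO message, since the binary RYO layout is not described in these slides.

```python
from collections import defaultdict

subscribers = defaultdict(list)  # topic name -> list of handler callables


def subscribe(topic, handler):
    subscribers[topic].append(handler)


def publish(topic, message):
    # Deliver synchronously; a real broker would do this over the network.
    for handler in list(subscribers[topic]):
        handler(message)


def attach_filter(in_topic, out_topic, transform):
    """Subscribe to in_topic and republish each transformed message on out_topic."""
    subscribe(in_topic, lambda message: publish(out_topic, transform(message)))


def ryo_to_ascii(record):
    # Hypothetical: 'record' stands in for an already decoded RYO message.
    return "{station} {x:.3f} {y:.3f} {z:.3f}".format(**record)


def ascii_to_gml(line):
    # Hypothetical GML-like wrapper around the three coordinates.
    station, x, y, z = line.split()
    return (f'<position station="{station}"><gml:coord>'
            f"<gml:X>{x}</gml:X><gml:Y>{y}</gml:Y><gml:Z>{z}</gml:Z>"
            "</gml:coord></position>")


# Chain: gps/ryo -> gps/ascii -> gps/gml
attach_filter("gps/ryo", "gps/ascii", ryo_to_ascii)
attach_filter("gps/ascii", "gps/gml", ascii_to_gml)

ascii_out, gml_out = [], []
subscribe("gps/ascii", ascii_out.append)
subscribe("gps/gml", gml_out.append)

publish("gps/ryo", {"station": "STN1", "x": 1.5, "y": 2.25, "z": -0.125})
print(ascii_out[0])
print(gml_out[0])
```

Because each filter only knows its input and output topics, a client can subscribe at whichever stage matches the format it needs, and new filters can be attached without touching the existing ones.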
SensorGrid Architecture






Support both archived and real-time geospatial
data access.
Support alternate transport and representation
schemes. Use topic based messaging
infrastructure for large volume data transport.
WS-Context for managing dynamic service
metadata.
UDDI based FTHPIS as services registry.
Streaming WFS for serving archived data.
Streaming SCS for serving sensor metadata
and sensor measurements.
Framework for HP WS

Research improving Web Service performance by using a better transport protocol and XML representation scheme.
Virtualize representation and protocol by binding SOAP to message-oriented middleware.
Handlers will negotiate the protocol and convert messages between different representations.
WS-Context for keeping session metadata related to methodology and specific parameters.
Negotiation Protocol

Design a negotiation protocol for web services to negotiate:
Transport protocol: HTTP over TCP, Parallel TCP, UDP …
Efficient representation of XML: BXSA, bnux, BXML, MTOM, Fast Infoset, Millau, XOP, DFDL, Fast Web Services, …
Other (Security etc.)
Try to develop strategies for determining:
Best available protocol
Best representation for a given communication.
We will investigate using or extending WS-Policy to build a negotiation protocol.
We will not develop a binary representation method but build a framework that supports multiple binary formats.
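
One possible shape for such a negotiation is sketched below, under stated assumptions: each endpoint advertises its supported transports and XML representations in preference order, and the first option of the initiator that the responder also supports wins. The fallback to HTTP and plain XML mirrors the defaults mentioned elsewhere in these slides. The function names and the dictionary layout are hypothetical, and WS-Policy is not used here.

```python
# Fallbacks used when the two endpoints share no preferred option.
DEFAULT_TRANSPORT = "HTTP over TCP"
DEFAULT_REPRESENTATION = "XML"


def pick_first_common(preferred, supported, default):
    """Return the initiator's most preferred option that the responder supports."""
    for option in preferred:
        if option in supported:
            return option
    return default


def negotiate(initiator, responder):
    """Agree on a transport and an XML representation for one conversation."""
    return {
        "transport": pick_first_common(
            initiator["transports"], responder["transports"], DEFAULT_TRANSPORT
        ),
        "representation": pick_first_common(
            initiator["representations"],
            responder["representations"],
            DEFAULT_REPRESENTATION,
        ),
    }


client = {
    "transports": ["UDP", "Parallel TCP", "HTTP over TCP"],
    "representations": ["Fast Infoset", "MTOM", "XML"],
}
service = {
    "transports": ["Parallel TCP", "HTTP over TCP"],
    "representations": ["BXSA", "XML"],
}
agreement = negotiate(client, service)
print(agreement)
```

"First common option" is only one strategy; determining the best protocol or representation for a given communication, as the slide proposes, would need measurements or policies beyond a static preference list.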
Research Issues 1

Applying Web Service principles to GIS data services
We have built a WS version of WFS.
Not suitable for large data sets and where quick response is required.

High Performance
Should support HP data transport for GIS services.

Interoperability
The system should bridge GIS and Web Service communities by adapting standards from both.
Other GIS applications should be able to consume data without having to do costly format conversions.

Security
Research Issues 2



Scalability
The system should be able to handle high volume and high rate data transport and processing.
Plugging in new sensors, data sources or geoprocessing applications should not degrade the system's overall performance.

Flexibility and extensibility
Setting architectural principles for real-time filters to process sensor data on the fly.
Ability to add new filters without system failures.

Quality of Service
Is the latency introduced by filter chains in processing real-time sensor data acceptable?
Is the system fault tolerant?
Scaling Measurements
1 SOPAC Network (SDCRTN - 9 Stations):
Time      RYO        ASCII      GML
1 sec     1.5KB      4.03KB     48.7KB
1 hr      5.31MB     14.18MB    171.31MB
1 day     127.44MB   340.38MB   4.01GB
1 month   3.8GB      9.97GB     123.3GB
1 yr      45.8GB     119.67GB   1.41TB

Larger networks, 1 yr:
Network                                          RYO      ASCII      GML
Entire SOPAC Network, 5 Networks (47 stations)   229GB    598.35GB   7.05TB
Entire SCIGN Network (250 stations)              1.23TB   16.18TB    160TB
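
The arithmetic behind the scaling figures can be checked with a short sketch. It assumes binary units (1 MB = 1024 KB) and a constant message rate, and it starts from the rounded per-second sizes, so the extrapolated values come out close to, but not identical to, the reported ones (for example about 5.27MB per hour for RYO against the reported 5.31MB), presumably because the original calculation used unrounded sizes. The helper names are hypothetical.

```python
def to_human(kilobytes):
    """Format a size given in KB using binary (1024-based) units."""
    size = float(kilobytes)
    for unit in ("KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.2f}{unit}"
        size /= 1024
    return f"{size:.2f}TB"


def extrapolate_kb(kb_per_second, seconds):
    """Total volume in KB for a constant per-second message size."""
    return kb_per_second * seconds


# Per-second sizes for one 9-station network, taken from the 1 sec figures.
PER_SECOND_KB = {"RYO": 1.5, "ASCII": 4.03, "GML": 48.7}
HOUR, DAY = 3600, 24 * 3600

for fmt, rate in PER_SECOND_KB.items():
    print(fmt, to_human(extrapolate_kb(rate, HOUR)), to_human(extrapolate_kb(rate, DAY)))

# The "Entire SOPAC Network" figures equal five times the single-network
# yearly figures (5 networks): 45.8GB * 5 for RYO.
sopac_ryo_gb = round(45.8 * 5, 2)
print(sopac_ryo_gb)
```

The roughly thirtyfold gap between the RYO and GML columns is the motivation for negotiating a more compact representation when both endpoints support one.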
Research Goals

Design a High Performance Web Service architecture for distributed GIS services to support archived and real-time geospatial data.
Build GIS Data Services for coupling scientific applications with various types of distributed geospatial databases.
Implement Web Service versions of:
Web Feature Service for archived data.
Sensor Collection Service for real-time geospatial data and sensor metadata.
Utilize publish-subscribe based messaging infrastructure to deploy distributed filters for processing real-time sensor data.
Develop a negotiation protocol for Web Services for supporting high performance data transport.
Contribution of This Thesis




Merges two important software worlds: GIS and
Web Service Architectures.
Allows unified access to data by developing Web
Services and Open GIS standards based services
to access and manage archived and real-time
geospatial data.
Develops a novel way of deploying filter chains on
a topic based messaging system for processing
real-time streaming sensor data.
Identifies a novel approach for negotiating various
characteristics of communication between Web
Services for High Performance messaging.
Appendix
Sample GML Document
<wfs:FeatureCollection >
<gml:boundedBy>
<gml:Box>
<gml:coordinates decimal="." cs="," ts=" ">-83,25 -80,31</gml:coordinates>
</gml:Box>
</gml:boundedBy>
<gml:featureMember>
<Entity>
<CityGate>
<name>City Gate #10</name>
<id>CG10</id>
<consumptionRate>8.5579E7</consumptionRate>
<location>
<gml:Point srsName="null">
<gml:coord>
<gml:X>-85.465</gml:X>
<gml:Y>30.132</gml:Y>
<gml:Z>2.0</gml:Z>
</gml:coord>
</gml:Point>
</location>
<connections>
<id>J27</id>
</connections>
</CityGate>
</Entity>
</gml:featureMember>
<gml:featureMember>
.
.
Sample GML visualization
RYO Message Format
High Performance XML I (G. Fox)




There are many approaches to efficient “binary” representations of XML Infosets:
MTOM, XOP, Attachments, Fast Web Services.
DFDL is one approach to specifying a binary format.
Assume URI-S labels Scheme and URI-R labels realization of Scheme for a particular message, i.e. URI-R defines the specific layout of information in each message.
DFDL from GGF is quite interesting for this.
Assume we are interested in conversations where a stream of messages is exchanged between two services or between a client and a service, i.e. two end-points.
Assume that we need to communicate fast between end-points that understand scheme URI-S but must support conventional representation if one end-point does not understand URI-S.
High Performance XML II (G. Fox)



First Handler Ft=F1 handles Transport protocol; it negotiates with the other end-point to establish a transport conversation which uses either HTTP (default) or a different transport such as UDP with WSRM implementing reliability.
URI-T specifies transport choice.
Second Handler Fr=F2 handles representation and it negotiates a representation conversation with scheme URI-S and realization URI-R.
Negotiation identifies parts of the SOAP header that are present in all messages in a stream and are ONLY transmitted ONCE.
Fr needs to negotiate with Service and other handlers illustrated by F3 and F4 below to decide what representation they will process.
[Figure: container handlers F1, F2, F3 and F4]
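
The point about header parts that are present in all messages of a stream and transmitted only once can be illustrated with a small sketch. Messages are modeled as dictionaries with a `header` and a `body`; the functions `split_stream` and `restore` and the header field names are hypothetical, and this is not the handler implementation described on the slide.

```python
def split_stream(messages):
    """Separate header fields shared by every message from per-message fields.

    Returns (shared_header, slim_messages); the shared part would be sent once.
    """
    shared = dict(messages[0]["header"])
    for message in messages[1:]:
        header = message["header"]
        shared = {k: v for k, v in shared.items() if k in header and header[k] == v}
    slim = [
        {
            "header": {k: v for k, v in m["header"].items() if k not in shared},
            "body": m["body"],
        }
        for m in messages
    ]
    return shared, slim


def restore(shared, slim_message):
    """Rebuild the full message on the receiving side."""
    return {
        "header": {**shared, **slim_message["header"]},
        "body": slim_message["body"],
    }


stream = [
    {"header": {"to": "svc", "scheme": "URI-S", "seq": 1}, "body": "obs-1"},
    {"header": {"to": "svc", "scheme": "URI-S", "seq": 2}, "body": "obs-2"},
]
shared, slim = split_stream(stream)
print(shared)
print(slim)
```

In a live conversation the shared part would be agreed during negotiation rather than computed from a complete list of messages; the sketch only shows what is saved per message once that agreement exists.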
High Performance XML III (G. Fox)



Filters controlled by Conversation Context convert messages between representations using a permanent context (metadata) catalog to hold conversation context.
Different message views for each end point or even for individual handlers and service within one end point.
Conversation Context is a fast dynamic metadata service to enable conversions.
NaradaBrokering will implement Fr and Ft using its support of multiple transports, fast filters and message queuing.
[Figure: handlers H1 to H4 sharing a Conversation Context (URI-S, URI-R, URI-T); a transported message made of a replicated message header and a body; separate message views for the container handlers (Ft, Fr, F3, F4) and for the service]
RDAHMM: GPS Time Series Segmentation (M. Pierce)
Slide Courtesy of Robert Granat, JPL
GPS displacement (3D), length two years.
Divided automatically by HMM into 7 classes.
Features:
• Dip due to aquifer drainage (days 120-250)
• Hector Mine earthquake (day 626)
• Noisy period at end of time series

Complex data with subtle signals is difficult for humans to analyze, leading to gaps in analysis.
HMM segmentation provides an automatic way to focus attention on the most interesting parts of the time series.
Traditional NaradaBrokering Features (G. Fox)
Multiple protocol transport support (in publish-subscribe paradigm with different protocols on each link): Transport protocols supported include TCP, Parallel TCP streams, UDP, Multicast, SSL, HTTP and HTTPS. Communications through authenticating proxies/firewalls & NATs. Network QoS based routing. Allows highest performance transport.
Subscription Formats: Subscriptions can be Strings, Integers, XPath queries, Regular Expressions, SQL and tag=value pairs.
Reliable delivery: Robust and exactly-once delivery in presence of failures.
Ordered delivery: Producer Order and Total Order over a message type. Time Ordered delivery using Grid-wide NTP based absolute time.
Recovery and Replay: Recovery from failures and disconnects. Replay of events/messages at any time. Buffering services.
Security: Message-level WS-Security compatible security.
Message Payload options: Compression and Decompression of payloads. Fragmentation and Coalescing of payloads.
Messaging Related Compliance: Java Message Service (JMS) 1.0.2b compliant. Support for routing P2P JXTA interactions.
Grid Feature Support: NaradaBrokering enhanced Grid-FTP. Bridge to Globus GT3.
Web Services supported: Implementations of WS-ReliableMessaging, WS-Reliability.