High Performance Web Service Architecture for Sensors and Geographic Information Systems
Galip Aydin
Geographic Information Systems



- A Geographic Information System (GIS) is a system for creating, storing, sharing, analyzing, manipulating and displaying spatial data and associated attributes.
- GIS has evolved from mainframe GIS to desktop GIS to distributed GIS.
- Modern GIS requires:
  - Distributed data access for spatial databases
  - Use of remote analysis, simulation or visualization tools
Traditional Distributed GIS Approach

- Problems with traditional approaches:
  - Distributed nature of the geo-data: various client-server models, databases, HTTP, FTP, RDBs, XML DBs etc.
  - Data format problems and conversion overheads
  - Data processing issues: hardware and software requirements, COM+/ActiveX and CORBA/IIOP frameworks
- These introduce three challenges:
  - Assembling data from distributed repositories
  - Adopting universal standards for format interoperability
  - Providing interoperable services for better utilization of computational resources

Open Geographic Standards



- Open GIS standards bodies aim to make geographic information and services neutral and available across any network, application, or platform.
- Two major standards bodies: OGC and ISO/TC211, the former being the more popular.
- OGC specifications are widely accepted:
  - Data format specs: GML, SensorML, O&M
  - Service specs: WFS, WMS, WCS
- OGC services are HTTP GET/POST based, with limited data transport capabilities (HTTP, FTP, files etc.).
- They are not Web Services: tightly coupled, point-to-point communication results in centralized, synchronous applications.
Motivations

- Lack of service orchestration capabilities
  - Coupling data sources to scientific applications
  - Data transport requirements
- Proliferation of sensors
  - Ability to analyze data on the fly, continuous streaming support, scalable systems for addition of new sensors
- High performance and high rate messaging
  - Complex problems require GIS applications to collaborate.
  - Real-time data access, rapid response systems, crisis management etc.
- From the Grids perspective
  - Apply general Grid/distributed computing principles to GIS
  - Investigate how to integrate with geophysical and other scientific applications
Motivating Use Cases

- Pattern Informatics
  - Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators; uses seismic archives.
- Regularized Dynamic Annealing Hidden Markov Method (RDAHMM)
  - Time series analysis code; can be applied to GPS and seismic archives and to real-time data.
- Interdependent Energy Infrastructure Simulation System (IEISS)
  - Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.
- SOPAC GPS networks provide real-time messages.
Research Issues 1

- Applying Web Service principles to GIS data services
  - Orchestration of services and workflows: simple services are not suitable for large data sets or where a quick response is required.
  - High performance support in GIS services.
- Interoperability
  - The system should bridge the GIS and Web Service communities by adapting standards from both.
  - Other GIS applications should be able to consume data without costly format conversions.
Research Issues 2



- Scalability
  - The system should be able to handle high volume and high rate data transport and processing.
  - Plugging in new sensors, data sources or geoprocessing applications should not degrade the system's overall performance.
- Flexibility and extensibility
  - How to develop real-time services that process sensor data on the fly.
  - Ability to add new filters without system failures.
- Quality of Service issues
  - Is the latency introduced by services in processing real-time sensor data acceptable?
SOA for GIS – Geophysical Data Grid


We use Web Services to realize a Service Oriented Architecture, and OGC data formats and application interfaces for interoperability at both the data and application levels.
GIS Data Grid Properties





- Based on its source, geospatial data can be classified as archival or real-time. The architecture provides standard control and access interfaces for both types.
- Supports alternate transport and representation schemes; uses a topic-based messaging infrastructure for large-volume data transport.
- UDDI-based FTHPIS as the services registry.
- Streaming and non-streaming services to access archived data.
- Real-time and near-real-time services for accessing sensor metadata and sensor measurements.
Geophysical Data Grid Architecture
Archival Data Grid
Real-Time Data Grid
GIS Grid 1 - Archival Data Services





- Web Feature Service (WFS) is the default OGC specification for vector data.
- We have built a Web Service version of WFS for accessing geospatial data in distributed databases.
- This first Web Service version of WFS has been used successfully in several scientific workflows with other services (WMS, HPSearch, FTHPIS).
- WFS can access multiple distributed databases and can query other WFSs for remote features.
- Problems with the Web Service version of the WFS:
  - Request-response only, not asynchronous.
  - Performance: GI services are not designed to handle non-trivial data transfers; large data requests incur SOAP overhead.
  - XML encoding: GML encoding increases the size of the geospatial data, which increases transfer times and may cause exceptions.
WFS Performance Improvements
Streaming WFS

To improve the performance of the WFS, we:
- Used a publish/subscribe messaging system for high performance data transfer. The service is similar to WFS but separates the data and control channels, which allows one-to-many data distribution.
- Used a streaming database connection (MySQL) for faster retrieval of query results and lower GML creation overhead.
- Integrated Binary XML frameworks to reduce XML payload size, which improves transfer times.
- Bound data transfer to the Grid messaging middleware, which reduces SOAP creation overhead.
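The streaming idea above can be illustrated with a minimal sketch. It assumes query rows arrive one at a time from a database cursor and are converted and published in small batches on a data channel, while the request/response (control channel) would carry only the topic name. The names `row_to_feature` and `stream_query`, the row layout, and the simplified element names are hypothetical; the markup is a stand-in and not valid GML, and `publish` is any callable, not the NaradaBrokering API.

```python
from typing import Callable, Iterable, List, Tuple
from xml.sax.saxutils import escape

# Hypothetical row layout: (feature name, longitude, latitude).
Row = Tuple[str, float, float]


def row_to_feature(row: Row) -> str:
    """Convert one row to a simplified, GML-like fragment (illustrative only)."""
    name, lon, lat = row
    return f"<feature><name>{escape(name)}</name><pos>{lon} {lat}</pos></feature>"


def stream_query(rows: Iterable[Row],
                 publish: Callable[[str], None],
                 batch_size: int = 2) -> int:
    """Publish features in batches as rows arrive; return the number of messages sent.

    The full result document is never held in memory, which is the point of
    pairing a streaming database connection with a messaging data channel.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    sent = 0
    for row in rows:
        batch.append(row_to_feature(row))
        if len(batch) == batch_size:
            publish("".join(batch))
            sent += 1
            batch = []
    if batch:  # flush the final, possibly short, batch
        publish("".join(batch))
        sent += 1
    return sent


if __name__ == "__main__":
    demo_rows = [("A", 0.0, 1.0), ("B", 2.0, 3.0), ("C & D", 4.0, 5.0)]
    print(stream_query(demo_rows, print, batch_size=2))
```

Because any number of subscribers can listen on the data topic, the same batches reach all of them without the service sending the result more than once.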
WFS Interaction with services and data sources
GIS Grid Example – IEISS Integration
WMS – Ahmet Sayar
UDDI, Context Service – Mehmet Aktas
Streaming WFS Performance
- We tested the system with up to 10,000 features.
- The tests show the performance of the streaming service with and without Binary XML integration.
- We use the BNUX and Fast Infoset (FI) Binary XML frameworks for compressing the GML FeatureCollection documents.
- The BNUX and FI timings include encoding and decoding costs.
[Figures: NB Transfer Time Comparison (TCP) for NB servers at Indianapolis, Bloomington, and La Jolla, CA. Each plot shows time (ms) against number of features (500 to 10,000) for XML, BNUX and FI.]
GIS Grid 2 - Real-Time Data Services





- Sensors and sensor networks are being deployed to measure various geophysical quantities.
- Sensors and GIS are closely related: sensor measurements are used by GIS for statistical or analytical purposes.
- With the proliferation of sensors, data collection and processing paradigms are changing.
- Most scientific geo-applications are designed to work with archived data.
- Critical Infrastructure Systems and Crisis Management environments require fast and accurate access to real-time sources and a flexible, pluggable architecture for geoprocessing of the data.
SensorGrid Architecture

- Major components:
  - Real-time filters
  - Grid messaging substrate
  - Information service
- Filters can be run as Web Services to create workflows.
- Filter chains can be deployed for complex processing.
- Streaming messaging provides high-performance transfer options.
Real-Time Filters
[Figure: Input Signal -> Filter -> Output Signal]
- Real-time data processing is supported by deploying filters around a publish/subscribe messaging system.
- The filters extend a generic class from which they inherit publish and subscribe capabilities.
- They can be connected in parallel or in series as chains to solve complex problems.
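A minimal sketch of this design, assuming a toy in-memory broker in place of the real messaging system. The `Broker` and `Filter` classes, the concrete filters, and the topic names are all hypothetical and only illustrate how a generic base class can give every filter its subscribe and publish behavior, and how serial and parallel arrangements fall out of topic wiring.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Broker:
    """Toy in-memory stand-in for a topic-based publish/subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: bytes) -> None:
        # Deliver synchronously to every subscriber of this exact topic.
        for callback in list(self._subscribers[topic]):
            callback(message)


class Filter:
    """Generic filter: subscribes to an input topic, publishes to an output topic."""

    def __init__(self, broker: Broker, in_topic: str, out_topic: str) -> None:
        self.broker = broker
        self.out_topic = out_topic
        broker.subscribe(in_topic, self._on_message)

    def process(self, message: bytes) -> bytes:
        """Subclasses override this with their own transformation."""
        return message

    def _on_message(self, message: bytes) -> None:
        self.broker.publish(self.out_topic, self.process(message))


class UppercaseFilter(Filter):
    def process(self, message: bytes) -> bytes:
        return message.upper()


class TagFilter(Filter):
    def process(self, message: bytes) -> bytes:
        return b"[filtered] " + message


class LengthFilter(Filter):
    def process(self, message: bytes) -> bytes:
        return str(len(message)).encode("ascii")


def run_demo() -> Dict[str, List[bytes]]:
    broker = Broker()
    received: Dict[str, List[bytes]] = defaultdict(list)
    # Serial chain: gps/raw -> UppercaseFilter -> gps/upper -> TagFilter -> gps/tagged
    UppercaseFilter(broker, "gps/raw", "gps/upper")
    TagFilter(broker, "gps/upper", "gps/tagged")
    # Parallel branch: LengthFilter reads the same input topic independently.
    LengthFilter(broker, "gps/raw", "gps/length")
    for topic in ("gps/upper", "gps/tagged", "gps/length"):
        broker.subscribe(topic, received[topic].append)
    broker.publish("gps/raw", b"x=1 y=2 z=3")
    return dict(received)


if __name__ == "__main__":
    for topic, messages in run_demo().items():
        print(topic, messages)
```

Adding a filter here only means subscribing one more object to an existing topic, so the upstream filters do not need to change.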
Filter Metadata and Chains
Parallel Operation
Serial Operation
Use Case - GPS Sensors


- GPS station networks are a good example of scientific sensors. GPS measurements are used for determining post-seismic deformation, understanding long-term crustal movement etc.
- SOPAC GPS networks:
  - 8 networks covering 80 stations produce 1 Hz high-resolution data.
  - Socket-based real-time access in the binary RYO format is available, but not utilized.
  - We developed filters to provide real-time streaming access in multiple formats (RYO, ASCII, GML).
  - OHIO principle and chain of filters.
- We use the publish/subscribe-based NaradaBrokering for managing real-time streams, with topics for hierarchical organization of the sensors.
SOPAC Real-Time Filters for GPS Streams
Application Integration with Real-Time Filters



- RDAHMM Filter records real-time positions for 10 minutes and invokes the RDAHMM application, which determines state changes in the XYZ signal.
- Station Monitor Filter calculates position changes.
- Graph Plotter Application creates a visual representation of the positions.
- Graph Plotter Application creates a visual representation of the RDAHMM output.
AJAX and Real-Time Positions on Google Maps
Recording and Replaying Sensor Streams
- Filters can be used to record and replay scenarios, such as earthquakes in the GPS case.
- We developed RYO Recorder and RYO Publisher filters.
- The RYO Recorder creates daily archives of the GPS streams.
- The RYO Publisher can be used to play back a full day or selected segments of the records.
- We replayed the 2004 Southern California Earthquake using the Parkfield GPS network archive.
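The record-and-replay behavior can be sketched as follows. This is an illustration under stated assumptions, not the RYO Recorder or RYO Publisher themselves: the `StreamRecorder` class and `replay` function are hypothetical, records are kept in memory rather than in daily archive files, and messages are opaque bytes rather than parsed RYO.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

# One record: (seconds since the first recorded message, raw message bytes).
Record = Tuple[float, bytes]


class StreamRecorder:
    """Stores each incoming message with its offset from the start of recording."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self.records: List[Record] = []

    def on_message(self, message: bytes) -> None:
        now = self._clock()
        if self._start is None:
            self._start = now
        self.records.append((now - self._start, message))


def replay(records: Iterable[Record],
           publish: Callable[[bytes], None],
           start: float = 0.0,
           end: float = float("inf"),
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Republish the records whose offsets fall in [start, end].

    The original spacing between messages is reproduced by sleeping for the
    difference between consecutive offsets. Returns the number of messages sent.
    """
    previous: Optional[float] = None
    count = 0
    for offset, message in records:
        if offset < start or offset > end:
            continue
        if previous is not None:
            sleep(offset - previous)
        publish(message)
        previous = offset
        count += 1
    return count


if __name__ == "__main__":
    recorder = StreamRecorder()
    for i in range(3):
        recorder.on_message(f"position {i}".encode("ascii"))
    print(replay(recorder.records, print, sleep=lambda seconds: None))
```

Passing the clock and the sleep function as parameters keeps the sketch testable without waiting in real time; a 1 Hz stream would otherwise take a full day to replay a daily archive.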
SensorGrid Performance Tests

- Two major goals: system stability and scalability
  - Ensuring stability of the distributed Filter Services for continuous operation.
  - Finding the maximum number of publishers (sensors) and clients that can be supported with a single broker.
- Investigate whether the system scales for a large number of sensors and clients.
Test Methodology
$T_{\mathrm{transfer}} = (T_2 - T_1) + (T_4 - T_3)$
- The test system consists of a NaradaBrokering server and a three-filter chain for publishing, converting and receiving RYO messages.
- We take 4 timings to determine the mean end-to-end delivery time of GPS measurements.
- The tests were run for at least 24 hours.
- GridFarm001-008 servers were used in these tests.
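The timing formula can be worked through in a few lines. The slides do not say where each timestamp is taken, so this sketch assumes T1 and T2 bracket the hop from the publishing filter to the converting filter and T3 and T4 bracket the hop from the converting filter to the receiving filter, which leaves the conversion time itself out of the sum. The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Iterable, Tuple

# One sample: (t1, t2, t3, t4) in milliseconds.
# Assumed meaning (not stated in the slides):
#   t1 = message leaves filter 1, t2 = message arrives at filter 2,
#   t3 = message leaves filter 2, t4 = message arrives at filter 3.
Sample = Tuple[float, float, float, float]


def transfer_time(t1: float, t2: float, t3: float, t4: float) -> float:
    """T_transfer = (T2 - T1) + (T4 - T3): time spent in the two messaging hops."""
    return (t2 - t1) + (t4 - t3)


def summarize(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of the transfer times."""
    times = [transfer_time(*sample) for sample in samples]
    return mean(times), pstdev(times)


if __name__ == "__main__":
    demo = [(0.0, 2.0, 5.0, 6.0), (10.0, 11.0, 13.0, 16.0)]
    print(summarize(demo))
```

Averaging such samples over fixed windows, for example every 30 minutes, gives one transfer time and one standard deviation per window.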
1 – System Stability Test
- The basic system with three filters and one broker.
- The figure shows average results for every 30 minutes.
- The average transfer time shows that continuous operation does not degrade system performance.
[Figure: System Stability Test. Transfer time and standard deviation (ms) against time of day (0:00 to 22:30).]
2 – Multiple Publishers Test
- We add more GPS networks by running more publishers.
- The results show that 1000 publishers can be supported with no performance loss. The ceiling of 1000 is an operating system limit.
[Figure: Multiple Publishers Test. Transfer time and standard deviation (ms) against time of day (0:00 to 22:30).]
3 – Multiple Clients Test
- We add more clients by running multiple Simple Filters that subscribe to the same ASCII topic.
- The system can support as many as 1000 clients with a very small performance decrease.
[Figure: Multiple Subscribers Test, 1000 clients. Transfer time and standard deviation (ms) against time of day, with an "Adding clients" annotation.]
Extending Scalability




- The limit of the basic system appears to be 1000 clients or publishers.
- This is due to an operating system restriction on open file descriptors (1024 for Red Hat Linux).
- To overcome this limit we create NaradaBrokering networks by linking multiple brokers.
- We run 2 brokers to support 1500 clients.
- In principle more brokers can be added, so the system can potentially support any number of publishers and subscribers.
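The arithmetic behind this scaling argument can be sketched as below. The per-broker figure of 1000 usable connections is an assumption taken from the observed limit; the real headroom under a 1024-descriptor cap depends on what else the broker process keeps open, and links between brokers also consume descriptors. The function names are hypothetical.

```python
import math
from typing import List


def brokers_needed(connections: int, per_broker_limit: int = 1000) -> int:
    """Smallest number of brokers that can hold the given number of connections."""
    if connections < 0 or per_broker_limit <= 0:
        raise ValueError("connections must be >= 0 and per_broker_limit > 0")
    return math.ceil(connections / per_broker_limit)


def spread_evenly(connections: int, brokers: int) -> List[int]:
    """Split connections across brokers so that loads differ by at most one."""
    if brokers <= 0:
        raise ValueError("brokers must be positive")
    base, extra = divmod(connections, brokers)
    return [base + (1 if i < extra else 0) for i in range(brokers)]


if __name__ == "__main__":
    n = brokers_needed(1500)
    print(n, spread_evenly(1500, n))
```

For 1500 clients this gives 2 brokers with 750 clients each, matching the configuration used in the multiple brokers test.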
4 – Multiple Brokers Test




- Messages published to the first broker can be received from the second broker.
- We take timings on each broker.
- We connect 750 clients to each broker and run for 24 hours.
- The results show that performance is good and similar to the single broker test.
[Figure: Multiple Broker Test, one panel each for Broker 1 and Broker 2 with 750 clients per broker. Transfer time and standard deviation (ms) against time of day (0:00 to 22:30).]
Real-Time Filters Test Results




- The RYO Publisher filter runs at 1 Hz and publishes a 24-hour archive of the CRTN_01 GPS network, which contains 9 GPS stations.
- The single broker configuration can support 1000 clients or publishers (1000 GPS networks of 9 stations each, i.e. 9000 individual stations).
- The system can be scaled up by creating NaradaBrokering broker networks.
- Message order was preserved in all tests.
Contributions







- An SOA approach to creating a common platform that supports both archival and real-time geospatial data in data-centric Grids.
- Merging Web Services and Open Geographic Standards to support interoperability at both the data and application levels.
- We have shown that GIS services can be implemented as streaming services.
- Integration of Binary XML frameworks with the streaming services shows performance gains over long network distances.
- We have shown that Sensor Grids can be built on top of publish/subscribe middleware.
- Real-time continuous data support is realized in a service architecture.
- A scalable architecture implementation for a large number of sensor networks.