Service Oriented Architecture for Geographic Information Systems Supporting Real Time Data Grids Galip Aydin Department Of Computer Science Indiana University 1/15/2007


Service Oriented Architecture for
Geographic Information Systems
Supporting Real Time Data Grids
Galip Aydin
Department Of Computer Science
Indiana University
1/15/2007
1
Geographic Information Systems




A Geographic Information System is a system for creating,
storing, sharing, analyzing, manipulating and displaying spatial
data and associated attributes.
GIS history saw the evolution from mainframe GIS to Desktop
GIS to Distributed GIS.
Modern GIS require:
 Distributed data access for spatial databases
 Utilizing remote analysis, simulation or visualization tools.
Problems with traditional distributed GIS approaches:
 Distributed nature of the geo-data; various client-server
models, databases, HTTP, FTP, RDBs, XML DBs etc.
 Data format problems, conversion overheads
 Data processing issues, hardware and software
requirements, COM+/ActiveX, CORBA/IIOP frameworks
2
Open Geographic Standards





Open GIS Standards bodies aim to make geographic information
and services neutral and available across any network,
application, or platform.
Two major standards bodies: OGC and ISO/TC211, the former being
the more popular
OGC Specifications are widely accepted:
 Data Format Specs: GML, SensorML, O&M
 Service Specs: WFS, WMS, WCS
OGC Services are HTTP GET/POST based; limited data transport
capabilities.
Request-response type services; centralized, synchronous
applications.
3
PBO and CRTN GPS Stations
Plate Boundary Observatory (PBO) GPS
Stations in North America
California Real-Time GPS Network (CRTN).
4
Requirements for a GIS/Sensor Grid

Requirements of service orchestration capabilities




Coupling data sources to scientific applications
Data transport requirements
Proliferation of Sensors


Ability to analyze data on-the-fly, continuous streaming support,
scalable systems for addition of new sensors.
High performance and high rate messaging


Complex problems require GIS applications to collaborate.
Real-time data access, rapid response systems, crisis
management etc.
From the Grids perspective the Motivations are


To apply general Grid/Distributed computing principles to GIS
Investigate how to integrate geophysical and other scientific
applications with data sources
5
Motivating Use Cases

Very successful and highly acclaimed earthquake science
applications




Pattern Informatics (PI) - UC Davis
• Earthquake forecasting code developed by Prof. John Rundle (UC
Davis) and collaborators, uses seismic archives.
Regularized Dynamic Annealing Hidden Markov Method
(RDAHMM) – NASA/JPL
• Time series analysis code, can be applied to GPS and seismic
archives. It can be applied to real-time and archival data.
SOPAC GPS Networks provide real-time messages – UCSD/SIO
• 8 networks with 80 stations produce 1 Hz high-resolution data. The
signatures of GPS Sensors are used in Earthquake forecasting.
Interdependent Energy Infrastructure Simulation System
(IEISS) - LANL

Models infrastructure networks (e.g. electric power systems and
natural gas pipelines) and simulates their physical behavior,
interdependencies between systems.
6
Research Issues 1

Applying Web Service principles to GIS data services
Orchestration of Services, workflows. We need services
suitable for large data sets and where quick response is
required.
 High Performance support in GIS services
 The performance problem must be addressed in a
complete and general framework supporting different data
requirements


Interoperability


The system should bridge GIS and Web Service
communities by adapting standards from both.
Other GIS applications should be able to consume data
without having to do costly format conversions.
7
Research Issues 2

Scalability
• The system should be able to handle high volume and high
rate data transport and processing.
• Plugging new sensors, data sources or geo-processing
applications should not degrade system’s overall
performance.
Flexibility and extendibility
• How to develop real-time services to process sensor data
on the fly.
• Ability to add new filters without system failures.
Quality of Service Issues
• Is latency introduced by services in processing real-time
sensor data acceptable?
8
SOA for GIS – Geophysical Data Grid

To create a GIS Data Grid (Geophysical Grid) Architecture we
utilize



Web Services to realize Service Oriented Architecture
OGC data formats and application interfaces to achieve
interoperability at both data and service levels.
GIS Data Grid Features




Depending on the source, geospatial data can be archival or real-time. The architecture provides standard control and access
interfaces for both types.
Supports alternate transport and representation schemes, uses
topic based messaging infrastructure for data and message
exchange.
Streaming and non-streaming services to access archived data.
Real-Time and near real-time filter services for accessing sensor
metadata and sensor measurements.
9
GIS Grid Usage Model – Earthquake Science
Supporting geophysical repositories and real-time
sensors is essential
• To analyze a typical earthquake it is important to
have access to precise measurements of the initial
earthquakes and aftershocks
• To support earthquake forecasting and the time
and spatial positions of the forecasts




PI can be used with existing data
RDAHMM can be used with the real-time data
Earth science is moving from a data-poor field to a
data-rich one. We will have thousands of sensors
spread around the world (e.g. GPS sensors, InSAR
satellites).
10
GIS Grid Components


Filter Services for Real-Time data support
OGC Web Feature Service (WFS) for archival data
support





Web Service version
Streaming version, which introduces data and control
channel separation
All control goes through SOAP messages, data is
transferred by a variety of transport mechanisms which
are implied by the control message.
Publish-Subscribe system for message and data
exchange
UDDI based service registry (by Mehmet Aktas)
11
Geophysical Data Grid Architecture
Archival Data Grid
Real-Time Data Grid
12
GIS Grid Part 1 - Real-Time Data Services





Sensors and sensor networks are being deployed for
measuring various geo-physical entities.
Sensors and GIS are closely related. Sensor
measurements are used by GIS for statistical or analytical
purposes.
With the proliferation of the sensors, data collection and
processing paradigms are changing.
Most scientific geo-applications are designed to work
with archived data.
Critical Infrastructure Systems and Crisis Management
environments require


fast and accurate access to real-time sources
a flexible/pluggable architecture for coupling geoprocessing applications with the data.
13
SensorGrid Architecture




Major components:
 Real-Time filters
 Publish-Subscribe System
 Information Service
Filters can be run as Web
Services to create workflows.
Filter Chains can be deployed
for complex processing.
Streaming messaging provides
high-performance transfer
options.
14
Real-Time Filters
[Figure: Input Signal, Filter, Output Signal]
Real-time data processing is supported by
employing filters around a publish/subscribe
messaging system.
• The filters are extended from a generic class to
inherit publish and subscribe capabilities.
• They can be connected in parallel or serial as
chains to solve complex problems.
[Figures: Parallel Operation; Serial Operation]
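The idea of filters that inherit publish and subscribe capabilities from a generic class can be sketched as follows. This is a minimal illustration, not the SensorGrid implementation: the `Broker` class is an in-memory stand-in for a topic-based messaging system, and the `Scale` and `Offset` filters and all topic names are hypothetical.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a topic-based publish/subscribe broker (assumption)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver synchronously to every subscriber of the topic.
        for callback in list(self._subscribers[topic]):
            callback(message)


class Filter:
    """Generic filter: subscribes to one topic, processes, publishes to another."""

    def __init__(self, broker, in_topic, out_topic):
        self.broker = broker
        self.out_topic = out_topic
        broker.subscribe(in_topic, self._on_message)

    def process(self, message):
        raise NotImplementedError

    def _on_message(self, message):
        result = self.process(message)
        if result is not None:
            self.broker.publish(self.out_topic, result)


class Scale(Filter):
    """Hypothetical filter that multiplies each sample by a factor."""

    def __init__(self, broker, in_topic, out_topic, factor):
        super().__init__(broker, in_topic, out_topic)
        self.factor = factor

    def process(self, message):
        return message * self.factor


class Offset(Filter):
    """Hypothetical filter that adds a constant to each sample."""

    def __init__(self, broker, in_topic, out_topic, offset):
        super().__init__(broker, in_topic, out_topic)
        self.offset = offset

    def process(self, message):
        return message + self.offset


def run_demo():
    broker = Broker()
    # Serial chain: raw -> scaled -> shifted.
    Scale(broker, "raw", "scaled", factor=2)
    Offset(broker, "scaled", "shifted", offset=1)
    # Parallel branch: a second filter reads the same input topic.
    Offset(broker, "raw", "raw_plus_ten", offset=10)

    received = defaultdict(list)
    for topic in ("shifted", "raw_plus_ten"):
        broker.subscribe(topic, received[topic].append)

    for sample in (1, 2, 3):
        broker.publish("raw", sample)
    return dict(received)
```

Because every filter only knows its input and output topics, a new filter can be attached to an existing topic without changing the filters already running.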
15
Use Case - GPS Sensors


GPS is used to identify long-term tectonic deformation and
static displacements. SCIGN has 250 Real-Time GPS Stations.
SOPAC GPS networks:
• 8 networks with 80 stations produce 1 Hz high-resolution data.
• Socket based real-time binary-RYO format access is available.
Our Architecture
• We developed filters to provide multiple format (RYO, ASCII, GML)
real-time streaming access.
• OHIO principle (a general principle required by DOD) and chain of
filters.
• Uses publish/subscribe based NaradaBrokering for managing real-time GPS streams
• Utilizes topics for hierarchical organization of the sensors
• Deploys successive data filters ranging from format translators to
data analysis codes
• Could potentially be used to run RDAHMM clones to monitor state
changes in the entire GPS network
We are partner in a pioneering project to use the real-time
GPS data for the first time in this context.
16
Processing Real-Time GPS Streams
[Figure: processing pipeline. Components shown: Scripps RTD Server with raw data on RYO ports 7010, 7011 and 7012; NB Server; GPS Networks; filters ryo2nb, ryo2ascii, ascii2pos, ascii2gml, Single Station, Displacement, Station Health and RDAHMM. Topics shown: /SOPAC/GPS/CRTN01/RYO, /SOPAC/GPS/CRTN01/ASCII, /SOPAC/GPS/CRTN01/POS, /SOPAC/GPS/CRTN01/DSME.]
A Complete Sensor Message Processing Path, including a data analysis application.
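The topic names in the figure follow a path-like hierarchy. A small sketch of how such names could be split and grouped is given below. The level names (organization, sensor type, network, format) are our reading of the four path segments, and the prefix matching is plain string handling, not NaradaBrokering's own subscription semantics.

```python
TOPICS = [
    "/SOPAC/GPS/CRTN01/RYO",
    "/SOPAC/GPS/CRTN01/ASCII",
    "/SOPAC/GPS/CRTN01/POS",
    "/SOPAC/GPS/CRTN01/DSME",
]


def parse_topic(topic):
    """Split a four-level topic path into named parts (level names are assumptions)."""
    parts = topic.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {len(parts)}: {topic!r}")
    organization, sensor_type, network, data_format = parts
    return {
        "organization": organization,
        "sensor_type": sensor_type,
        "network": network,
        "format": data_format,
    }


def topics_under(prefix, topics):
    """Return the topics that sit below a given level of the hierarchy."""
    prefix = prefix.rstrip("/") + "/"
    return [topic for topic in topics if topic.startswith(prefix)]
```

For example, `topics_under("/SOPAC/GPS/CRTN01", TOPICS)` selects every format published for one network.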
17
Application Integration with Real-Time Filters



• RDAHMM Filter records real-time positions for 10 minutes and invokes
RDAHMM application which determines state changes in the XYZ signal.
• Station Monitor Filter calculates position changes.
• Graph Plotter Application creates visual representation of
the positions.
• Graph Plotter Application creates visual representation of
the RDAHMM output.
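The record-then-invoke pattern can be sketched with a filter that buffers a fixed window of positions and hands each full window to an analysis function. This is only an illustration: `max_displacement` is a hypothetical stand-in and is not RDAHMM, and the window of 600 samples assumes 10 minutes of 1 Hz data.

```python
SAMPLES_PER_WINDOW = 10 * 60  # 10 minutes of 1 Hz samples (assumption)


class WindowFilter:
    """Buffers positions and passes each full window to an analysis callable."""

    def __init__(self, window_size, analyze):
        self.window_size = window_size
        self.analyze = analyze
        self.buffer = []
        self.results = []

    def on_sample(self, position):
        self.buffer.append(position)
        if len(self.buffer) == self.window_size:
            # Window is full: run the analysis and start a new window.
            self.results.append(self.analyze(self.buffer))
            self.buffer = []


def max_displacement(window):
    """Hypothetical analysis step: per-axis range of the XYZ positions."""
    xs, ys, zs = zip(*window)
    return tuple(max(axis) - min(axis) for axis in (xs, ys, zs))
```

A real deployment would replace `max_displacement` with a call to the external analysis application and publish the result to an output topic.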
18
Recording and Replaying Sensor Streams






Filters can be used to record and replay scenarios,
such as Earthquakes in GPS case.
We developed RYO Recorder and RYO Publisher
Filters.
The RYO Recorder creates daily archives of the GPS
Streams.
RYO Publisher can be used to replay a full day or selected
segments of the records.
We replayed the 2004 Southern California
Earthquake using the Parkfield GPS network archive.
These filters are used in the performance and
scalability tests.
19
SensorGrid Performance Tests

Two Major Goals: System Stability and
Scalability
Ensuring stability of the distributed Filter Services
for continuous operation.
 Finding the maximum number of publishers
(sensors) and clients that can be supported with a
single broker.


Investigate if system scales for larger number
of sensors and clients.
20
Test Methodology
[Figure: test setup. RYO Publisher, NB Server, RYO To ASCII Converter and Simple Filter, with timing points 1 to 4 marked along the message path.]
Ttransfer = (T2 - T1) + (T4 - T3)




The test system consists of a NaradaBrokering server and
a three-filter chain for publishing, converting and
receiving RYO messages.
We take 4 timings to calculate mean end-to-end delivery
times of GPS measurements.
The tests were run at least for 24 hours.
GridFarm001-008 servers are used in these tests.
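The transfer-time formula and the averaging over many messages can be written out as a short sketch. We assume here that T1 and T2 bracket the first hop through the broker and T3 and T4 bracket the second hop, so the converter's own processing time (T3 - T2) is excluded. The sample timings below are made up for illustration.

```python
from statistics import mean, stdev


def transfer_time(t1, t2, t3, t4):
    """Ttransfer = (T2 - T1) + (T4 - T3), all values in milliseconds."""
    return (t2 - t1) + (t4 - t3)


def summarize(timings):
    """Return (mean, standard deviation) of the transfer time over many messages."""
    values = [transfer_time(*timing) for timing in timings]
    return mean(values), stdev(values)


# Made-up example: two messages with four timestamps each.
EXAMPLE_TIMINGS = [(0, 2, 3, 5), (10, 13, 14, 17)]
```

Averaging such values over 30-minute intervals gives the kind of series plotted in the test results.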
21
1- System Stability Test
[Figure: System Stability Test. Transfer time and standard deviation in ms (axis 0 to 6) against time of day, 0:00 to 23:00.]
• The basic system with three filters and one broker.
• The figure shows average results for every 30 minutes.
• The average transfer time shows the continuous operation
does not degrade the system performance.
22

2 – Multiple Publishers Test
[Figure: test setup. Components shown: RYO Publisher 1, 2 through n; Topics 1A, 1B, 2 through n; NB Server; RYO To ASCII Converter; Simple Filter.]
[Figure: Multiple Publishers Test. Transfer time and standard deviation in ms against time of day.]
We add more GPS networks by running more publishers.
The results show that 1000 publishers can be supported
with no performance loss. This is an operating system limit.
23
3 – Multiple Clients Test
[Figure: test setup. Components shown: RYO Publisher 1; Topic 1A; RYO To ASCII Converter; Topic 1B; NB Server; Simple Filter 1, 2 through n (1000 clients).]
[Figure: Adding Clients. Transfer time in ms against number of clients, 1 to 1,000.]
[Figure: Multiple Clients Test. Transfer time and standard deviation in ms against time of day.]
We add more clients by running multiple Simple Filters which subscribe
to the same ASCII topic.
The system can support as many as 1000 clients with very little
performance degradation.
24
Extending Scalability




The limit of the basic system appears to be 1000
clients or publishers.
This is due to an Operating System restriction of
open file descriptors (1024 for Red Hat Linux) which
can be increased by changing OS parameters.
To overcome this limit we create NaradaBrokering
networks by linking multiple brokers. NB supports
scalable linkage of the brokers for building tree-like
architectures.
We run 2 brokers to support 1500 clients.

Number of brokers can be increased indefinitely, so we can
potentially support any number of publishers and
subscribers.
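The open file descriptor limit mentioned above can be inspected from a process, and the number of brokers needed for a given client count is simple arithmetic. The sketch below uses Python's standard `resource` module, which is available on Unix-like systems only. The figure of 750 clients per broker is the value used in the tests. The function names are ours.

```python
import math
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def brokers_needed(clients, clients_per_broker):
    """Smallest number of brokers so that no broker exceeds its client budget."""
    return math.ceil(clients / clients_per_broker)
```

With `brokers_needed(1500, 750)` the result is 2, which matches the two-broker configuration.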
25
4 – Multiple Brokers Test
[Figure: two-broker test setup. Components shown: RYO Publisher; RYO To ASCII Converter; Topics 1A and 1B; NB Server 1 with Simple Filters 1 to 750; NB Server 2 with Simple Filters 751 to 1500.]
NaradaBrokering allows creation
of Broker networks.
We create a two-broker network.
Messages published to first
broker can be received from the
second broker.
We take timings on each broker.
We connect 750 clients to each
broker and run for 24 hours. We
chose 750 clients to stay well
below the saturation limit.
The results show that the
performance is very good and
similar to single broker test.
26
4 – Multiple Brokers Test
[Figures: Multiple Clients Test, Broker 1 and Broker 2. Each plots transfer time and standard deviation in ms against time of day over 24 hours.]
27
Real-Time Filters Test Results
• The RYO Publisher filter runs at 1 Hz and
publishes a 24-hour archive of the CRTN_01 GPS
network, which contains 9 GPS stations.
• The single broker configuration can support
1000 clients or publishers (1000 GPS networks of 9 stations
each, or 9000 individual stations).
• The system can be scaled up by creating
NaradaBrokering broker networks.
• Message order was preserved in all tests.

28
GIS Grid Part 2 - Archival Data Grid



Web Feature Service is the default OGC specification for vector data.
We have built Web Service version of WFS for accessing geospatial data on
distributed databases.
Requirements









Various Feature data should be stored in the databases
Queries are in OGC Common Query Language (GML) format
Results are GML Feature Collections
Operations to support are GetCapabilities, DescribeFeatureType, GetFeature
To connect to multiple databases we have implemented a DB federation
scheme
Adding features is easy using XML configuration files
We have implemented OGC Filter Encoding for query translation
Dynamic capability generation allows federation of the services
The first Web Service version of WFS has been successfully used in several
scientific workflows with other services (WMS, HPSearch,
UDDI).
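Query translation from OGC Filter Encoding to a database query can be illustrated with a small sketch. It covers only three comparison operators plus And/Or, and it is not the translator used in our WFS. The property names in the example and the `?` parameter style are assumptions.

```python
import xml.etree.ElementTree as ET

OGC_NS = "{http://www.opengis.net/ogc}"

# Comparison elements from OGC Filter Encoding mapped to SQL operators.
COMPARISONS = {
    "PropertyIsEqualTo": "=",
    "PropertyIsLessThan": "<",
    "PropertyIsGreaterThan": ">",
}


def translate(node):
    """Return (sql_fragment, parameters) for one filter element."""
    tag = node.tag.replace(OGC_NS, "")
    if tag == "Filter":
        return translate(list(node)[0])
    if tag in ("And", "Or"):
        parts = [translate(child) for child in node]
        joiner = f" {tag.upper()} "
        sql = "(" + joiner.join(fragment for fragment, _ in parts) + ")"
        params = [value for _, values in parts for value in values]
        return sql, params
    if tag in COMPARISONS:
        name = node.find(OGC_NS + "PropertyName").text.strip()
        literal = node.find(OGC_NS + "Literal").text.strip()
        # Column names cannot be bound as parameters, so validate them.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected property name: {name!r}")
        return f"{name} {COMPARISONS[tag]} ?", [literal]
    raise ValueError(f"unsupported filter element: {tag}")


# Hypothetical filter: property names are made up for illustration.
EXAMPLE_FILTER = """
<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">
  <ogc:And>
    <ogc:PropertyIsEqualTo>
      <ogc:PropertyName>name</ogc:PropertyName>
      <ogc:Literal>San Andreas</ogc:Literal>
    </ogc:PropertyIsEqualTo>
    <ogc:PropertyIsGreaterThan>
      <ogc:PropertyName>length_km</ogc:PropertyName>
      <ogc:Literal>100</ogc:Literal>
    </ogc:PropertyIsGreaterThan>
  </ogc:And>
</ogc:Filter>
"""

WHERE_CLAUSE, PARAMS = translate(ET.fromstring(EXAMPLE_FILTER.strip()))
```

Keeping literals as bound parameters rather than pasting them into the SQL string avoids quoting problems when the filter comes from a remote client.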
29
WFS Performance Improvements
Streaming WFS

Issues with Web Service version of the WFS




Synchronous request-response style
Handling non-trivial data transfers, large data requests, SOAP overhead.
XML Encoding: Size of the geospatial data increases with GML encoding
which increases transfer times, or may cause exceptions
To improve performance of the WFS:





Utilized publish/subscribe messaging system for high performance data
transfer. Similar to WFS but introduces data and control channel separation
which allows one to many data distribution.
Used streaming database connection (MySQL) for faster retrieval of the
query results, and lower GML creation overhead.
Binary XML Frameworks are integrated for reducing XML payload size
which improves transfer times. We used BNUX and Fast Infoset
frameworks in our tests.
Binding data transfer to publish-subscribe messaging system reduces SOAP
overhead.
Database processing, GML creation and data transport are all streamed.
30
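The streaming idea (rows are read, turned into GML, and published one at a time rather than collected into one large response) can be sketched with generators. This is an illustration under several assumptions: `sqlite3` stands in for the MySQL streaming connection, `publish` is any callable standing in for the messaging system, the table and station names are made up, and the GML fragments are simplified and not schema-valid.

```python
import sqlite3
from xml.sax.saxutils import escape


def stream_rows(connection, query):
    """Yield rows one at a time instead of loading the whole result set."""
    cursor = connection.execute(query)
    for row in cursor:
        yield row


def rows_to_gml(rows):
    """Turn each (name, lon, lat) row into a simplified GML feature fragment."""
    for name, lon, lat in rows:
        yield (
            "<gml:featureMember><Station>"
            f"<name>{escape(name)}</name>"
            f"<gml:Point><gml:coordinates>{lon},{lat}</gml:coordinates></gml:Point>"
            "</Station></gml:featureMember>"
        )


def publish_stream(fragments, publish):
    """Publish fragments as they are produced; return how many were sent."""
    count = 0
    for fragment in fragments:
        publish(fragment)
        count += 1
    return count


def demo():
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE stations (name TEXT, lon REAL, lat REAL)")
    connection.executemany(
        "INSERT INTO stations VALUES (?, ?, ?)",
        [("A", -117.25, 32.87), ("B", -116.5, 33.5)],
    )
    sent = []
    query = "SELECT name, lon, lat FROM stations ORDER BY name"
    count = publish_stream(rows_to_gml(stream_rows(connection, query)), sent.append)
    return count, sent
```

Since each stage pulls from the previous one lazily, the first feature can be published before the last row has been read from the database.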
GIS Grid Example –IEISS Integration (LANL)
[Figure: IEISS integration. Components shown: IEISS; NB Interface; Service Interface; Web Feature Service; GML Builder; MySQL Feature Database; NB Server (shown twice); Context Service; UDDI Registry Service; Web Map Service.]
WMS – Ahmet Sayar
UDDI, Context Service – Mehmet Aktas
31
WMS User Interface
Streaming WFS + AJAX
Real-Time positions on Google maps
32
Streaming WFS Performance Tests
[Figure: test setup. Components shown: Client App; WSDL; Request Handler; DB Query Builder; DB Manager; GML Builder; Binary XML Encoder; NB Publisher; NB Server; NB Subscriber; Binary XML Decoder.]
• The goal is to find the performance of the Streaming-WFS with and without
the Binary XML integration.
• We test the system performance against message size with up to 10,000
features by changing the number of features per request.
• We use BNUX and Fast Infoset Binary XML frameworks for compressing
the GML FeatureCollection documents.
• The BNUX and FI timings include encoding and decoding costs.
33
Streaming WFS Performance Tests
[Figures: Streaming-WFS End-to-End Transfer Times with the NB Server at Bloomington, at Indianapolis, and at La Jolla, CA. Each plots time in ms against number of features (500 to 10,000) for XML, BNUX and FI.]
34
Contributions








Proposed and implemented a SOA architecture to provide a
common platform supporting both archival and real-time geospatial
data in data-centric Grids.
Integrated Web Services with Open Geographic Standards for
supporting interoperability at both data and application levels.
Shown that the GIS Services can be implemented as streaming
services.
Integration of Binary XML Frameworks with the Streaming Services
shows performance gains for long network distances.
We have shown that the Sensor Grids can be built on top of the
publish/subscribe middleware.
Continuous real-time data support is achieved in Service
Architecture.
Scalable architecture implementation for large number of sensor
networks.
Detailed investigation of the scalability and performance of the
system.
35
Acknowledgement






Mehmet Aktas: UDDI and WS-Context
Ahmet Sayar: WMS Server and Client
ZhiGang Qi: SensorGrid Performance Tests
We thank Prof. Yehuda Bock and his group at SIO for
their help with real-time GPS data streams.
The work described in this presentation is part of the
QuakeSim project which is supported by the Advanced
Information Systems Technology Program of NASA's
Earth-Sun System Technology Office.
This collaboration is part of the NASA ACCESS ROSES
funded project, Modeling and On-the-fly Solutions in
Solid Earth Science.
36
Additional Slides
37
Future Work
Exploring the use of UDP transport for sensor
streams, which could potentially increase the
NB related performance.
 Investigating real-time sensor workflows with
Grid workflow tools such as Taverna.
 A smart selection tool for choosing best Binary
XML format for particular geographic features.
This could be based on Case Based Reasoning
(CBR) approach.

38
Related Work




Linked Environments for Atmospheric Discovery (LEAD), addressing
fundamental IT and meteorology research challenges to create an
integrated framework for analyzing and predicting the atmosphere.
Open-source Project for a Network Data Access Protocol
(OPeNDAP) is a framework that aims to simplify all aspects of
scientific networking; it allows access to scientific data over
the internet from applications that were not specifically
designed for that purpose.
The Real-time Observatories, Applications, and Data management
Network (ROADNet), focuses on resolving challenges related to
building wireless sensor networks for various types of observations
and the information management system which will deliver this
sensor observation in real-time to the users.
Laboratory for Advanced Information and Technology Standards
(LAITS) at George Mason University, researches GRID (based on
Globus Technology) in Earth and Space Science.
39
Processing Real-Time GPS Streams
[Figure: processing pipeline. Components shown: RTD Server with raw data on RYO ports 7010, 7011 and 7012; NB Server; filters ryo2nb, ryo2ascii, ascii2pos, ascii2gml, Single Station, Displacement, Station Health and RDAHMM. Topics shown: /SOPAC/GPS/CRTN01/RYO, /SOPAC/GPS/CRTN01/ASCII, /SOPAC/GPS/CRTN01/POS, /SOPAC/GPS/CRTN01/DSME.]
A Complete Sensor Message Processing Path, including a data analysis application.
40