([email protected]) Outline • Motivations • Research Issues • Architecture: Federated Service-Oriented Geographic Information System • Performance enhancing designs measurements and analysis • Conclusions.


([email protected])
1
Outline
• Motivations
• Research Issues
• Architecture: Federated Service-Oriented
Geographic Information System
• Performance-enhancing designs, measurements, and analysis
• Conclusions
2
Geographic Information Systems (GIS)
• GIS is a system for: creating,
storing, sharing, analyzing,
manipulating and displaying geodata and associated attributes.
• Inherently requires federation (see
the figure)
– Autonomy for scalability, flexibility,
and extensibility
• Distributed data access for geodata resources (databases, digital
libraries etc.)
• Utilizing remote analysis,
simulation or visualization tools.
• Open Standards
– OGC
– ISO/TC-211
3
Motivations
• Requirements for –
o Interoperable Service-oriented Geographic
Information Systems
– Necessity for sharing and integrating heterogeneous data
and computation resources to produce knowledge.
o Uniform data access/query, display and analysis
from a single access point
o Responsive and interactive information systems
– GIS applications require quick response
• Emergency early warning systems
• Homeland security and natural disasters.
4
Research Issues
• Interoperability
– Defining component based Service-oriented GIS data Grid
framework
– Adoption of Open Geographic Standards -data model and services
– Applying Web Service principles to GIS data services
– Integrating Web Service and Open Geographic Standards
• Federation
– Capability-based federation of GIS Web Service components
– Unified data access/query, display from a single access point
through integrated data-views
• Addressing high-performance support for responsiveness
– Streaming GIS Web Services and Pre-fetching framework
– Client-based caching
– Parallel processing through attribute based query decomposition
5
Service-oriented GIS
• Built over:
– Web Services standards (WS-I+)
– Open Geographic Standards (OGC and ISO/TC-211)
• Consists of two types of online services:
– Web Map Services (WMS)
– Web Feature Services (WFS)
• And two types of data:
– Binary data: map images (provided by WMS)
– Structured data: GML, with content (core data) and presentation (attribute and geometry elements) (provided by WFS)
Web Service components and data-flow
• WMS are data rendering services providing human-comprehensible data (binary map images)
• WFS are data services providing data in a common data model, GML (Geography Markup Language)
– Behaving as mediator and annotation services
• WMS and WFS have their own type of capability metadata defined by Open Geographic specs.
• Inter-service communication is done through the “getCapability” service interface.
• UDDI-based registry services.
• Components are Web Services: all control goes through SOAP messages
• XML-based query language (standard schema)
Relation of the components and data flow:
[Figure: The WMS (interfaces: getCapability, getMap, getFeatureInfo) obtains GML from the WFS mediator (interfaces: getCapability, getFeature, DescribeFeatureType) and renders it into binary map data for the GIS client; both services are described by WSDL.]
6
Capability-based Federation of Standard GIS
Web Service Components
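[Figure: A Web map client with interactive map tools calls the aggregating WMS (federator) through WSDL stubs. Behind the federator, reached over HTTP, SOAP/WSDL, or “REST”, are a WFS with seismic records, a WFS with state bounds, a WMS with OnEarth, Google Maps, and others.]
• The aggregating WMS (federator) is inspired by OGC’s cascading WMS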
• Built over the proposed standard
Web Service components and
common data models
• Federation is done by aggregating
GIS Web Services’ capabilities
metadata
• Unified data access/query/display
from a single access point
• Providing application-based
hierarchical data definitions
– layer based data and service
(WMS and WFS) compositions
• Capability is basically a metadata
about data+service:
– Server’s information content and
acceptable request parameter
values
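The aggregation step above can be sketched in Java as follows. This is a minimal illustration, not the federator's actual code: the `Capability` and `CapabilityFederator` classes, their fields, and the "first provider wins" rule are all assumptions made for the example, and real capability documents carry far more than a list of layer names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for one service's capability metadata. */
class Capability {
    final String serviceUrl;   // where the service lives
    final String serviceType;  // "WMS" or "WFS"
    final List<String> layers; // data layers the service advertises

    Capability(String serviceUrl, String serviceType, List<String> layers) {
        this.serviceUrl = serviceUrl;
        this.serviceType = serviceType;
        this.layers = layers;
    }
}

/** Sketch of a federator that merges capabilities into one layer catalog. */
class CapabilityFederator {
    // Maps each advertised layer name to the service that provides it,
    // in registration order.
    private final Map<String, Capability> catalog = new LinkedHashMap<>();

    void register(Capability cap) {
        for (String layer : cap.layers) {
            catalog.putIfAbsent(layer, cap); // assumption: first provider wins
        }
    }

    /** The federated view: every layer reachable from the single access point. */
    List<String> federatedLayers() {
        return new ArrayList<>(catalog.keySet());
    }

    /** Tells the federator which kind of service must be called for a layer. */
    String providerTypeOf(String layer) {
        Capability cap = catalog.get(layer);
        if (cap == null) {
            throw new IllegalArgumentException("Unknown layer: " + layer);
        }
        return cap.serviceType;
    }
}
```

A client then only sees `federatedLayers()`; whether a layer comes from a WMS or a WFS stays hidden behind the federator.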
7
Why Capability metadata
• Web Services provide key low level capability but
do not define an information or data architecture
• These are left to domain specific capabilities
metadata and data description language (GML).
• Machine and human readable information
– Enables easy integration and federation
• Enables developing application based standard
interactive re-usable tools
– for data query, display, and analysis
– Seamless data access/query
8
Designs, measurements and
analysis
9
Performance Investigation
• Interoperability requirements bring up some compliance
costs:
– Common data model (GML)
– Web Services (SOAP protocol for communication)
• Approaches: Enhancing the GIS systems’ responsiveness
– Data transfer and rendering
• Streaming GIS Web Services (1)
• Structured/annotated GML data rendering (2)
– Federator-oriented approaches
• Pre-fetching (3)
• Client-based caching (4)
• Query decomposition and parallel processing (5)
• Testing with large scale Geo-science applications
– Earthquake forecasting (PI),
– Virtual California (VC)
• Aim: Turning compliance requirements into competitiveness
10
Conventional OGC-GIS systems
Baseline Performance Test
• Naïve approach is characterized as
– Stateless services
– On-demand data access,
– Single-threaded and no-caching
• Systems developed with Open Geographic Standards have:
– High degree of interoperability but poor performance results
Test Setup:
[Figure: Average response times. x-axis: data size (KB), 0 to 1200; y-axis: time (msec, in thousands), 0 to 70; series: Avg Resp Time.]
11
(1) Streaming GIS Web-Services
• The concern is the transfer of large XML-structured data
• XML representations of data tend to be significantly
larger than binary representations
– Larger data consumes more network bandwidth
– We still need XML for interoperability reasons
• In the initial development of the proposed Service-oriented GIS, we used GIS Web Services with SOAP over
HTTP as the transfer protocol.
– BUT this limited performance.
• We investigated “Streaming Data Transfer”
– Topic-based publish-subscribe messaging systems for
exchanging SOAP messages and data payloads.
12
(1) Streaming GIS Web-Services (Cont)
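[Figure: Streaming architecture with numbered arrows. The (A)WMS client (subscriber) and the WFS server (publisher) expose WSDL interfaces and use a UDDI registry. A getFeature request returns a (topic, IP, port) triple; the WFS publishes GML to "Topic-wfs" on the NaradaBrokering server, and the WMS receives it as a subscriber.]
[Figure: Average Response Times (ART) for streaming and non-streaming cases. x-axis: data size (KB), 0 to 1200; y-axis: log(time) in msec; series: ART-Streaming, ART-Non-Streaming.]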
• Lines 1, 2 and 3 show classic publish-find-bind triangle of Web Services
• SOAP is used for negotiation (line-3) – standard getFeature request
– Publisher information in (topic, IP, port) triple is returned.
• Publisher streams, subscriber receives.
• The average performance gain is 40%
13
(2) GML Data Processing
• Processing XML data: parsing and rendering to create map images.
• Two well-known approaches are document models (DOM) and push
models (SAX).
• We use a pull approach for XML processing:
– Parses only what is asked for
– No support for document validation (a major performance gain)
– Doesn’t build a complete object model in memory (unlike DOM)
– Contents are returned directly to the application from calls to the parser (unlike SAX)
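The pull style described above can be illustrated with a short Java sketch. Note the substitution: the slides use Xpp, while this example uses StAX (`javax.xml.stream`), the pull parser that ships with the JDK. The `GmlPullReader` class and the choice to extract only `coordinates` elements are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Sketch: pull only the coordinate strings out of a GML document. */
class GmlPullReader {
    static List<String> readCoordinates(String gml) {
        List<String> coords = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(gml));
            try {
                while (reader.hasNext()) {
                    // The application asks for the next event; nothing is pushed
                    // to it and no document tree is built in memory.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "coordinates".equals(reader.getLocalName())) {
                        coords.add(reader.getElementText().trim());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need no checked exceptions.
            throw new IllegalStateException("Malformed GML", e);
        }
        return coords;
    }
}
```

Everything other than the requested elements is skipped as the cursor moves forward, which is why memory use stays flat as documents grow.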
[Figure: GML rendering using DOM vs. Xpp. x-axis: data size; y-axis: time (msec), 0 to 4,000; series: dom4j, Xpp.]
Total rendering timings in msec (1GB allocated VM)
Data Size (KB)   DOM (dom4j)     pull (Xpp)
1                469.22          15.59
10               494.06          72.81
100              625.54          183.06
1,000            760.20          270.47
5,000            1,422.91        671.74
10,000           3,557.44        1,025.67
100,000          OUT OF MEM      7,059.72
150,000          OUT OF MEM      11,047.89
200,000          OUT OF MEM      14,949.12
15
(3) Pre-fetching
• Getting the GML data before it is needed
• The extension for the pre-fetching module is shown in the grey region
• Overcomes the network bandwidth problem and repeated data conversions.
• This technique is good for infrequently changing archived data
– Otherwise, it might cause consistency problems
• Red curve: map rendering over the pre-fetched data (ready-to-use GML data)
• Black curve: map rendering through on-demand fetching
[Figure: Pre-fetching architecture. Components shown: user portal with interactive tools, federator, WMS, WFS, processor, PR, NB, and temporary storage on the local file system holding GML.]
PR runs a pre-defined task at a pre-defined periodicity
PR: Pre-fetching runner
NB: NaradaBrokering
WMS: Web Map Service
WFS: Web Feature Service
16
(3) Pre-fetching vs. On-demand Fetching
Average response times (msec)
Data Size (MB)   Pre-fetching Avg   StdDev    On-demand Avg   StdDev
0.01             19,261.90          481.57    1,808.13        140.32
0.1              19,112.30          673.69    2,635.46        313.48
0.5              19,222.48          631.35    5,001.29        238.94
1                19,427.48          305.94    8,225.73        200.27
5                20,146.00          516.50    33,419.31       394.48
10               20,165.90          546.53    64,506.78       283.24
50               22,882.52          509.98    316,906.00      623.08
100              23,990.43          548.65    643,344.00      603.59
• For 100 MB, pre-fetching is about 27 times faster than
conventional on-demand fetching (643,344.00 vs. 23,990.43 msec).
• The larger the data
size, the higher the
performance gain.
[Figure: Comparison of the average response times, pre-fetching vs. on-demand systems. x-axis: data size (MB), 0 to 120; y-axis: log(time in msecs); series: log(Pre-fetching), log(On-demand).]
[Figure: Average response times for the pre-fetching system. x-axis: data size (MB), 0 to 120; y-axis: time (msecs), 0 to 30,000.]
17
(4) Client-based Caching
• Each client has a separate caching area allocated.
• Applies the working-window and locality
principles to map image rendering
• Clients are differentiated according to the client-assigned
session-id parameter in the header of
queries.
• Always keeps the most recently used data (the client's last request)
• Adds some overhead to maintain a working window for each client.
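A per-client cache of this kind might look like the sketch below. It is only an illustration under stated assumptions: `ClientCache` and `CachedRequest` are hypothetical names, the cached payload is reduced to a bounding box string and a GML string, and each client keeps exactly one entry (its former request).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified cache entry: only the fields this sketch needs. */
class CachedRequest {
    final String bbox; // bounding box of the client's last request
    final String gml;  // feature data returned for that request

    CachedRequest(String bbox, String gml) {
        this.bbox = bbox;
        this.gml = gml;
    }
}

/** One cache slot per session id, so clients never share a working window. */
class ClientCache {
    private final Map<String, CachedRequest> table = new ConcurrentHashMap<>();

    /** Called after every request: the newest request replaces the former one. */
    void update(String sessionId, String bbox, String gml) {
        table.put(sessionId, new CachedRequest(bbox, gml));
    }

    /** Returns the client's former request, or null for a first-time query. */
    CachedRequest lookup(String sessionId) {
        return table.get(sessionId);
    }
}
```

The per-client overhead mentioned above shows up here as one map entry and one copy of the last response for every active session.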
18
Brief Architecture
Server-side: create an identity card for each client; update it at every request from the client
• FormerRequest class
class FormerRequest {
    String uuid;            /* unique user id */
    String bbox;            /* bounding box of the user's last request */
    Double density;         /* data size falling into each unit square */
    Vector[] feature_data;  /* geometry elements of the last request */
}
Register to client table:
uuid-1 -> FormerRequest-1
uuid-2 -> FormerRequest-2
…
Client-side: set identity in the message header
ClientWSStub binding;
binding = (ClientWSStub) new ServiceLocator().WMSServices(servaddress);
String sessionID = session.getid(); // uuid-1
String channel_name = "getMapChannel";
/* Add SessionID to the SOAP message's header */
binding.setHeader(service_address, channel_name, sessionID);
Map mymap = binding.getMap(request);
19
Why Client-based Caching
• Makes stateless GIS Web Services stateful
• Allows the workload to be shared as equally as possible for the most
efficient parallel processing.
Comparison with Google-like map servers:
• In large-scale applications it is impossible to cache the whole data set
– Limited storage and computation capabilities
• Google-like map servers are fast because
– They replace computation with storage.
– They pre-render all images and cut them into tiles
– They formalize the accepted requests in terms of parameters, and
responses in terms of the tile compositions.
• BUT this is good only for client-server based applications
– It can’t be applied to distributed dynamic data rendering and
extensible applications.
– They don’t deal with feature-enriched maps enabling
attribute-based querying,
– or with structured/annotated scientific data rendering.
20
(5) Parallel Processing over Client-based Caching
Main query → cached-data extraction → rectangulation {Rectangles [Ri]} → partitioning {sub-queries [ri]} →
assigning separate threads → assembling the results
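The last two stages of this pipeline, running sub-queries on separate threads and assembling the results, can be sketched with the standard `java.util.concurrent` API. The `ParallelFetcher` class is hypothetical, and the `fetch` function stands in for a real GetFeature call to a WFS, which is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Sketch: run each sub-query on a worker thread, then merge in query order. */
class ParallelFetcher {
    static List<String> fetchAll(List<String> subQueries,
                                 Function<String, String> fetch,
                                 int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every sub-query; each Future is a pending partial result.
            List<Future<String>> pending = new ArrayList<>();
            for (String query : subQueries) {
                pending.add(pool.submit(() -> fetch.apply(query)));
            }
            // Assemble results in the order the sub-queries were issued.
            List<String> merged = new ArrayList<>();
            for (Future<String> result : pending) {
                merged.add(result.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Sub-query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because `Future.get()` blocks, the total time is bounded by the slowest sub-query, which is why evenly sized partitions matter.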
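[Figure: Parallel processing over client-based caching, with numbered steps. A successive request arrives; the main query goes through cached data extraction and rectangulation, leaving rectangles R1 to R4 around the cached data. The rectangles are partitioned into sub-queries r1, r2, r3, ..., rPn, sent as GetFeature requests to the WFS, the critical data provider in GML. The critical data falling into the partitioned regions returns as GML1, GML2, ..., GMLPn and is combined with the cached GML. The critical data layer is then combined with layers from other WFS and WMS.]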
21
Challenge: Geo-Data Characteristic
[Figure: Partitioning of a bounding box with corners (a,b) and (c,d) into regions R1 to R4, with cut points at ((a+c)/2, b) and (c, (b+d)/2); two panels, (1) and (2).]
• Need for advanced techniques for
workload sharing!
• Point data is described
by a location attribute
– (x, y) coordinates.
• Linestrings, polylines,
polygons, etc. are defined as
sets of points.
• The data set falling into a
queried region is specified
by a bounding box (bbox)
– Coordinates of a rectangle
(a, b, c, d)
• Geo-data is characterized as
unevenly distributed and
variable-sized according to
its location attributes.
– Ex. Human population
22
Attribute-based Query Decomposition
• Cached data extraction
• Rectangulation over the remaining : R1, R2, R3, R4
• Each rectangle goes through partitioning process.
– Blind partitioning
• Such as first time queries
• Uses default partitioning number
– Smart partitioning
• client-based caching
• FormerRequest Object
• All partitions are assigned to separate threads and
results are merged to create final response
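Cached data extraction and rectangulation can be sketched as subtracting the cached bounding box from the query bounding box. The decomposition below (two full-height side strips plus a bottom and a top piece) is one possible way to cut the remainder into at most four rectangles; the slides do not specify how R1 to R4 are laid out, so treat the layout, and the `QueryBox` and `Rectangulation` names, as assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Axis-aligned bounding box (minx, miny, maxx, maxy). */
class QueryBox {
    final double minx, miny, maxx, maxy;

    QueryBox(double minx, double miny, double maxx, double maxy) {
        this.minx = minx;
        this.miny = miny;
        this.maxx = maxx;
        this.maxy = maxy;
    }

    double area() {
        return (maxx - minx) * (maxy - miny);
    }
}

class Rectangulation {
    /** Returns the parts of the query that are NOT covered by the cached box. */
    static List<QueryBox> remaining(QueryBox query, QueryBox cached) {
        // Intersection of query and cache.
        double ix1 = Math.max(query.minx, cached.minx);
        double iy1 = Math.max(query.miny, cached.miny);
        double ix2 = Math.min(query.maxx, cached.maxx);
        double iy2 = Math.min(query.maxy, cached.maxy);

        List<QueryBox> out = new ArrayList<>();
        if (ix1 >= ix2 || iy1 >= iy2) {
            out.add(query); // no overlap: the whole query must be fetched
            return out;
        }
        // Left and right strips span the full height of the query.
        if (ix1 > query.minx) out.add(new QueryBox(query.minx, query.miny, ix1, query.maxy));
        if (ix2 < query.maxx) out.add(new QueryBox(ix2, query.miny, query.maxx, query.maxy));
        // Bottom and top pieces span only the width of the overlap.
        if (iy1 > query.miny) out.add(new QueryBox(ix1, query.miny, ix2, iy1));
        if (iy2 < query.maxy) out.add(new QueryBox(ix1, iy2, ix2, query.maxy));
        return out;
    }
}
```

If the cache fully covers the query the list is empty, which corresponds to the 100% overlap case where no WFS request is needed.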
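[Figure: The query bounding box (minx, miny, maxx, maxy) overlaps the cached data; the uncached remainder is cut into rectangles R1 to R4, and a rectangle is then partitioned, here into 4.]
23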
Smart Partitioning through Client-based Caching
• Based on the locality principle.
– Assumption: Former and current requests have similar data density
• Cached data area:
CD_size_br2 = (maxxc - minxc)*(maxyc - minyc)
• Main-query area:
R_size_br2 = (maxx - minx)*(maxy - miny)
• Thr: Pre-defined threshold value changing from data to data.
• Pn : The number of partitions calculated for a rectangle
• Determining the most efficient number of partitions (Pn)
[Figure: The cache rectangle, with corners (minxc, minyc) and (maxxc, maxyc), overlapping the query rectangle, with corners (minx, miny) and (maxx, maxy).]
• If Pn >= 2, cut the rectangle into Pn
equal-sized regions.
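The transcript does not preserve the formula for Pn, so the sketch below is only one plausible reading built from the quantities defined above: take the density of the cached data (data size per unit of area), assume the new rectangle has the same density, and use one partition per Thr of expected data. The `SmartPartitioner` class, its parameter names, and the rounding rule are all assumptions, not the authors' formula.

```java
/** Sketch of a density-based estimate for the number of partitions (Pn). */
class SmartPartitioner {
    /**
     * @param cachedDataSize size of the cached data (for example in MB)
     * @param cdSize         area of the cached region, CD_size
     * @param rSize          area of the rectangle to be fetched, R_size
     * @param thr            threshold: data size one partition should carry
     * @return estimated Pn, never less than 1
     */
    static int partitionCount(double cachedDataSize, double cdSize,
                              double rSize, double thr) {
        if (cdSize <= 0 || thr <= 0) {
            throw new IllegalArgumentException("cdSize and thr must be positive");
        }
        // Locality assumption: the former and current requests share a density.
        double density = cachedDataSize / cdSize;
        double expectedSize = density * rSize;
        // ASSUMPTION: round up so no partition exceeds the threshold.
        return Math.max(1, (int) Math.ceil(expectedSize / thr));
    }
}
```

For instance, 4 MB cached over an area of 8 gives a density of 0.5; a rectangle of area 20 is then expected to hold 10 MB, and with Thr = 2 MB this sketch yields 5 partitions.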
24
Assigning Partitions to Workers
• Partitions are assigned to the worker nodes in round-robin fashion.
• We keep a pool of worker nodes for each feature layer that parallel
processing is applied.
• According to the algorithm
– PN: number of partitions
– WN: number of worker nodes in the pool
– share: the number of partitions each worker is supposed to get
– rmg: the number of partitions still waiting after every worker has its share
• Assignments:
– The first rmg worker nodes are assigned share+1 partitions
– The other (WN-rmg) worker nodes are assigned share partitions
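The assignment rule above can be written out as a small Java sketch. It assumes share is the integer quotient PN / WN and rmg is the remainder PN % WN, which is the natural reading of the bullets; the `WorkerAssignment` class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: round-robin assignment of PN partitions to WN worker nodes. */
class WorkerAssignment {
    /** Number of partitions each worker receives, by worker index. */
    static int[] counts(int pn, int wn) {
        if (wn <= 0 || pn < 0) {
            throw new IllegalArgumentException("need wn > 0 and pn >= 0");
        }
        int share = pn / wn; // every worker gets at least this many
        int rmg = pn % wn;   // partitions left over after equal sharing
        int[] counts = new int[wn];
        for (int w = 0; w < wn; w++) {
            counts[w] = (w < rmg) ? share + 1 : share;
        }
        return counts;
    }

    /** Partition indexes per worker, dealt out in round-robin order. */
    static List<List<Integer>> assign(int pn, int wn) {
        if (wn <= 0 || pn < 0) {
            throw new IllegalArgumentException("need wn > 0 and pn >= 0");
        }
        List<List<Integer>> byWorker = new ArrayList<>();
        for (int w = 0; w < wn; w++) {
            byWorker.add(new ArrayList<>());
        }
        for (int p = 0; p < pn; p++) {
            byWorker.get(p % wn).add(p);
        }
        return byWorker;
    }
}
```

Dealing partitions out round-robin automatically gives the first rmg workers one extra partition, so `assign` and `counts` always agree.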
25
Vertical partitioning in the case of 5 partitions:
GFeature-1: -110,35,-100,36
GFeature-2: -110,36,-100,37
GFeature-3: -110,37,-100,38
GFeature-4: -110,38,-100,39
GFeature-5: -110,39,-100,40
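The five bounding boxes above come from cutting the box -110,35,-100,40 into equal strips along the y axis. A minimal Java sketch of that cut follows; the `VerticalPartitioner` class and the array layout {minx, miny, maxx, maxy} are choices made for the example.

```java
/** Sketch: cut a bounding box into pn equal strips along the y axis. */
class VerticalPartitioner {
    /** Each returned element is {minx, miny, maxx, maxy}. */
    static double[][] split(double minx, double miny,
                            double maxx, double maxy, int pn) {
        if (pn < 1) {
            throw new IllegalArgumentException("pn must be at least 1");
        }
        double step = (maxy - miny) / pn;
        double[][] parts = new double[pn][];
        for (int i = 0; i < pn; i++) {
            double lo = miny + i * step;
            // Pin the last strip to maxy so rounding never leaves a gap.
            double hi = (i == pn - 1) ? maxy : miny + (i + 1) * step;
            parts[i] = new double[] {minx, lo, maxx, hi};
        }
        return parts;
    }
}
```

Each strip becomes the bbox of one GetFeature sub-query; the x range is left untouched.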
26
Data Access Timings
- No Cached Data -
[Figure: Comparison of data capturing times based on different partitioning levels. x-axis: data size (MB), 0 to 12; y-axis: time (msecs, in thousands), 0 to 70; series: single-thread, 2-thread, 10-thread, 20-thread.]
• T(data access) = T(query conversion, getFeature to SQL) + T(GML conversion) + T(streaming the data from WFS to federator)
+ T(building GML at federator)
[Figure: Data path between the DB, the WFS, and the federator.]
27
Overhead and Response Timings
Example case: 10-threaded parallel processing
[Figure: Comparison of response times with the single-threaded case. x-axis: data size (MB), 0 to 12; y-axis: time (msecs, in thousands), 0 to 70; series: single-threaded, 10-threaded.]
[Figure: Comparison of overheads for the 10-threaded case based on different partitioning levels. x-axis: partition number, 0 to 35; y-axis: time (msecs), 0 to 2,000; series: partitioning, sub-query creation, merging.]
• Performance does not increase in proportion to the number of
threads
– Overheads: query partitioning, sub-query creation, map creation and map transfer.
– There is no performance gain when the data size handled is below a threshold.
[Figure: Request path from the browser's event-based dynamic map tools through the federator to the WFS instances and the DB.]
28
Partial Usage of Cached Data (Ex. case:1/2 cached)
Comparison of the response times (msec)
Data (MB)   Half cache, 10 threads    No cache, 10 threads      No cache, single thread
            Avg. Time     StdDev      Avg. Time     StdDev      Avg. Time     StdDev
0.01        3,095.19      204.22      2,329.50      131.46      1,808.13      140.32
0.1         3,576.73      283.8       2,760.00      104.35      2,635.46      313.48
0.5         3,721.77      210.41      3,460.40      120.24      5,001.29      238.94
1           4,311.73      192.45      4,640.53      106.42      8,225.73      200.27
5           11,294.58     313.59      16,725.4      201.62      33,419.31     394.48
10          18,371.72     296.19      23,118.4      941.83      64,506.78     283.24
• There is no performance
gain for small data sizes,
due to the overheads.
• For 10 MB, the proposed
system is about 3.5 times
faster than the ordinary on-demand
single-threaded system (64,506.78 vs. 18,371.72 msec).
• The performance gain
increases:
– As the data size increases.
– As the overlapped cached
region increases
– 100% overlap looks
like the pre-fetching case
[Figure: Comparison of response times. x-axis: data size (MB), 0 to 12; y-axis: time (msecs, in thousands), 0 to 70; series: half-cached/10-thread, no-cached/10-thread, no-cached/single-thread.]
[Figure: Request path from the client (CT) through the federator to the WFS instances and the DB.]
29
Conclusions
• Streaming data transfer techniques allow data rendering even on
partially returned data.
• Pull parsing gives the best results for rendering XML-encoded GML data
by eliminating the requirement for data validation.
• The federator’s natural characteristics allowed us to develop advanced
caching and parallel processing designs.
• Pre-fetching and parallel-processing techniques are mutually
exclusive.
• The best performance is achieved through pre-fetching, but it
can cause data inconsistency.
– The triggering periodicity must be defined carefully.
• The success of parallel-processing techniques depends on how well we share
the workload among worker nodes.
– Geo-data is unevenly distributed and variable-sized.
• We saw that the following helped us increase system responsiveness considerably:
– Application of working-window and locality principles by means of
client-based caching.
– Parallel processing through attribute-based query decomposition
30
Conclusions – General Framework
• Heterogeneous data sources are queried as a single
resource
– Heterogeneous: Autonomous local resources controlling
definition of data
– Single resource: Remove the burden of individually accessing
each data source with ad-hoc query languages.
– WFS-based mediation :
• Data and query conversions
• Easy extension with new data and service resources
– Open Geographic and Web Service standards
• No physical data integration
– Data always at local source
– Easy maintenance of data and high degree of autonomy
• Seamless interaction with the system through integrated
data views as multi-layered map images
31
Contributions
• A federated Service-oriented Geographic Information
Systems framework
– Integrating Web Services with Open Geographic Standards
to support interoperability at both data and service levels
– Production of knowledge from distributed data sources in
multi-layered map images.
• Hierarchical data definitions through capability metadata
federations
• Enabling unified interactive data access/query and display.
• Investigated performance efficient designs and did
detailed benchmarking
– Streaming GIS Web Services
– Federator-oriented high-performance design techniques
• Pre-fetching
• Client-based caching : Working-window and locality principles
• Parallel processing through attribute-based query decomposition
32
Acknowledgement
• The work described in this presentation is part
of the QuakeSim project which is supported
by the Advanced Information Systems
Technology Program of NASA's Earth-Sun
System Technology Office.
• Galip Aydin: Web Feature Server (WFS)
33
Thanks!....
34
BACK-UP SLIDES
35
Capability-based Federation of the standard Web
Service Components
• Built over the proposed standard Web Service components and common
data models
• Unified data access/query/display from a single access point
• Providing application-based hierarchical data definitions
– Layer-based data and service (WMS and WFS) compositions
• Federation is done by aggregating GIS Web Services’ capabilities metadata
• Capability is basically metadata about data+service:
– Server’s information content and acceptable request parameter values
Application-based hierarchical data:
[Application]- Pattern Informatics
– [Layer-1] State-boundary over Satellite
• [Data-1]
– State-boundary (WFS-1)
• [Data-2]
– Satellite-Image (WMS-2)
– [Layer-2]
• Google map (WMS-1)
– [Layer-3]- Earthquake-Seismic
• [Data-1]
• Earthquake-Seismic (WFS-3)
[Figure: Capability federation and map rendering. The user portal's interactive map tools react to browser events and call the federator, which in turn calls the WMS and WFS services behind it using requests 1 to 3 below; the returned layers a to d are combined into one map.]
1. GetCapability (metadata about data+service)
2. GetMap (get map data in a set of layer(s))
3. GetFeatureInfo (query the attributes of data)
Browser events:
- Move
- Zooming in/out
- Panning (drag-drop)
- Rectangular region
- Distance calc.
- Attribute querying
Sample layers for PI:
a. NASA satellite layer
b. Earthquake-seismic layer
c. Google Map layer
d. State-boundaries layer
36
Hierarchical data
Integrated data-view
1
2
3
1: Google map layer
2: States boundary
lines layer
3: seismic data layer
Event-based Interactive Tools :
Query and data analysis over integrated data views
37
38
• Integrated views
• Event-based querying through integrated views.
• WFS-based mediators
• XML-based query language
• Federation-specific related works (might not be
active)
– MIX: mediation of information using XML
– SRB/MCAT (SDSC)
– TSIMMIS (Stanford Univ)
• XML-based standard queries for the standard services.
– Capability gives the list of data provided, the attributes on which they can
be queried, and the constraints on the queries, in order to create valid
requests such as getMap and getFeature.
• We do syntactical and structural integration.
39
Hierarchical data / Integrated data-view
For IEISS Geo-science Application
Application-based hierarchical data:
[Application]- IEISS
– [Layer-1] Gas-pipeline over Satellite
• [Data-1]
– Gas-pipeline (WFS-1)
• [Data-2]
– Satellite-Image(WMS-2)
– [Layer-2]
• Google map (WMS-1)
– [Layer-3]- Electric-power
• [Data-1]
• Electric-power(WFS-3)
40
GetCapabilities Schema and Sample Request Instance
41
GetMap Schema and Sample Request Instance
42
43
Event-based Interactive Map Tools
<event_controller>
  <event name="init" class="Path.InitListener" next="map.jsp"/>
  <event name="REFRESH" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/>
  <event name="RECENTER" class="Path.InitListener" next="map.jsp"/>
  <event name="RESET" class="Path.InitListener" next="map.jsp"/>
  <event name="PAN" class="Path.InitListener" next="map.jsp"/>
  <event name="INFO" class="Path.InitListener" next="map.jsp"/>
</event_controller>
44
Sample GML document
45
Sample GetFeature Request Instance
46
A Template simple capabilities file for a WMS
47
Generalizing the Problem Domain
• Query heterogeneous data
sources as a single resource
– Heterogeneous: local
resource controls definition
of the data
– Single resource: remove the
burden of individually
accessing each data source
• Easy extension with new
data and service resources
• No real integration of data
– Data always at local source
– Easy maintenance of data
• Seamless interaction with
the system
– Collaborative decision
making
[Figure: A client/user query goes to an integrated view, which sits on top of mediators for DB, files, and WWW sources. Data is held in files, HTML, XML/relational databases, and spatial sources/sensors.]
48
Generalization of the Proposed Architecture
• The GIS-style information model can be redefined in any application area,
such as Chemistry and Astronomy
– Application Specific Information Systems (ASIS).
• We need to define Application Specific:
– Language (ASL) -> GML: expressing domain-specific features and the semantics of
data
– Feature Service (ASFS) -> WFS: serving data in a common language (ASL)
– Visualization Services (ASVS) -> WMS: visualizes information and provides
a way of navigating ASFS-compatible/mediated data resources
– Capabilities metadata for ASVS and ASFS.
• Federator federating the capabilities of distributed ASVS
and ASFS to create an application-based hierarchy of
distributed data and service resources.
• Mediators: query and data format conversions
• Data sources maintain their internal structure
• Large degree of autonomy
• No actual physical data integration
• Unified data query/access/display
[Figure: Generalized architecture with numbered steps. The federator (capability federation) and the ASVS (ASL rendering) communicate through standard service APIs; messages using ASL flow to mediators in front of an AS repository and sensors, and to user-defined AS services such as filter, transformation, reasoning, data-mining, and analysis.]
49
Contributions (Systems Software)
• Developing Web Map Server (WMS) in Open Geographic
Standards
– Extended with Web Service Standards and
– Streaming map creation capabilities
• Developing GIS Federator
– Provides application specific layer-structured hierarchical data
as a composition of distributed standard GIS Web Service
components
– Enable uniform data access and query
• Interactive map tools for data display, query and analysis.
– Browser and event-based.
– Extended with AJAX (Asynchronous JavaScript and XML)
50