1
Outline
• Background: Geographic Information Systems
and Open Geographic Standards
• Motivations and Motivating Use Cases
• Research Issues
• Architecture: Federated Service-Oriented
Geographic Information System
• Performance enhancing designs -measurements
and analysis
• Contribution
• Future Work
2
Geographic Information Systems (GIS)
• GIS is a system for creating,
storing, sharing, analyzing,
manipulating and displaying spatial
data and associated attributes.
• GIS evolved from mainframe
systems to desktop to distributed
systems.
• Modern GIS require:
– Distributed data access for
spatial databases
– Utilizing remote analysis,
simulation or visualization
tools.
• Problems with traditional
distributed GIS approaches:
– Distributed nature of the geodata; various client-server
models, databases, HTTP, FTP
3
Open Geographic Standards
• Aim is to make geographic information and services neutral
and available across any network, application, or platform.
• Two major well-known standards bodies: Open Geospatial
Consortium (OGC) and ISO/TC211.
• OGC specifications define online services and data models:
– Data Format Specs: Geography Markup Language (GML)
– Service Specs: Web Feature Service (WFS), Web Map Service
(WMS)
• OGC services are HTTP-based, which limits their data transport
capabilities.
• HTTP-based services are request-response type services;
centralized and synchronous applications.
4
Motivations
• Requirements for Interoperable Service-oriented
Geographic Information Systems
– Necessity for sharing and integrating heterogeneous data
and computation resources
• Uniform data access/query/display from a single
access point
• Responsive and interactive GIS systems
– GIS applications require quick response
• Emergency early warning systems
• Homeland security and natural disasters.
5
Motivating Use Cases
• Earthquake science applications
– Pattern Informatics (PI)
• Earthquake forecasting code developed by Prof. John Rundle (UC
Davis) and collaborators, uses seismic archives.
– Virtual California (VC)
• Time series analysis code, can be applied to GPS and seismic
archives. It can be applied to real-time and archival data.
• Interdependent Energy Infrastructure Simulation System
(IEISS) – Los Alamos National Laboratory (LANL)
– Models infrastructure networks (e.g. electric power systems and
natural gas pipelines) and simulates their physical behavior,
interdependencies between systems.
6
Research Issues
• Interoperability
– Adoption of Open Geographic Standards
– Applying Web Service principles to GIS data services
• Flexibility and Extensibility
– The system should bridge GIS and Web Service communities by
adapting standards from both
– Other GIS applications should be able to consume data without
having to do costly format conversions
• Federation
– Federation of GIS systems
• Unified data access/query/display from a single access point
– Principles for generalizing the proposed federated GIS system
• In terms of components, framework and requirements.
• Addressing high-performance support for responsiveness
7
Interoperable Service-oriented GIS
• Composed of two types of online services, Web Map Services (WMS) and
Web Feature Services (WFS)
• And two types of data:
– Binary data –map images (provided by WMS),
– Structured-data –GML : content (core data) and presentation (attribute and
geometry elements) (provided by WFS)
• WMS and WFS have their own type of capability metadata defined by Open
Geographic specs. They exchange capabilities through “getCapability”
service interface to make valid requests and get valid responses
• UDDI based registry services
• Components are Web Services and all control goes through SOAP messages
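Relation of the components and data flow:
[Figure: a WMS (interfaces: getCapability, getMap, getFeatureInfo) renders GML into binary map images; a WFS acting as mediator (interfaces: getCapability, getFeature, DescribeFeatureType) serves GML from the GIS data source. Both expose their interfaces through WSDL.]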
8
Federated Interoperable GIS
• Unified data access/query/display from a single access point
• Providing application-based hierarchical data definitions
– Layer-based data and service (WMS and WFS) compositions
• Federation is done by aggregating GIS Web Services’ capabilities metadata
• Capability is basically metadata about data+service:
– Server’s information content and acceptable request parameter values
Example hierarchy:
[Application] IEISS
– [Layer-1] Gas-pipeline over Satellite
• [Data-1] Gas-pipeline (WFS-1)
• [Data-2] Satellite-Image (WMS-2)
– [Layer-2] Google map (WMS-1)
– [Layer-3] Electric-power
• [Data-1] Electric-power (WFS-3)
[Figure: a browser-based user portal with interactive map tools talks to the Federator (capability federation, map rendering), which in turn calls the underlying WMS and WFS instances.]
Service calls shown in the figure:
1. GetCapability (data and operations available)
2. GetMap (get map data in a set of layer(s))
3. GetFeatureInfo (query the attributes of data)
Sample layers for IEISS: a. Gas-pipeline, b. Electric-power, c. NASA satellite, d. State-boundaries
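The idea of federating capabilities into an application-based hierarchy can be illustrated with a small sketch. The `Capability` and `Layer` classes and the `federate` function are hypothetical names introduced here for illustration, not part of the presented system; the service and data names follow the IEISS example on this slide.

```python
from dataclasses import dataclass, field

@dataclass
class Capability:
    """Hypothetical, simplified stand-in for a service's capability metadata."""
    service: str       # e.g. "WFS-1"
    kind: str          # "WMS" or "WFS"
    data_names: list   # layers or feature types the service advertises

@dataclass
class Layer:
    name: str
    sources: list = field(default_factory=list)  # (data name, service) pairs

def federate(application, layer_spec, capabilities):
    """Build an application -> layer -> data hierarchy from service capabilities.

    layer_spec maps a layer name to the data names it is composed of.
    """
    # Index every advertised data name by the service that provides it.
    index = {name: cap.service for cap in capabilities for name in cap.data_names}
    layers = []
    for layer_name, data_names in layer_spec.items():
        missing = [d for d in data_names if d not in index]
        if missing:
            raise LookupError(f"no service advertises {missing}")
        layers.append(Layer(layer_name, [(d, index[d]) for d in data_names]))
    return {"application": application, "layers": layers}

caps = [
    Capability("WFS-1", "WFS", ["gas-pipeline"]),
    Capability("WMS-2", "WMS", ["satellite-image"]),
    Capability("WMS-1", "WMS", ["google-map"]),
    Capability("WFS-3", "WFS", ["electric-power"]),
]
ieiss = federate("IEISS", {
    "Layer-1": ["gas-pipeline", "satellite-image"],
    "Layer-2": ["google-map"],
    "Layer-3": ["electric-power"],
}, caps)
```

A client only needs the federated hierarchy; which WMS or WFS actually serves each data set is resolved from the aggregated capabilities.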
9
Why Capability metadata
• Web Services provide key low level capability but
do not define an information or data architecture
• These are left to domain specific capabilities
metadata and data description language (GML).
• Machine and human readable information
– Enables easy integration and federation
• Enables developing application based standard
interactive re-usable tools
– for data query display and analysis
– Seamless data access/query
10
Generalizing the Proposed Architecture - I
• One can define a GIS-style information model in many
application areas such as Chemistry and Astronomy
– Application Specific Information Systems (ASIS).
• We have investigated the requirements and principles
to generalize the proposed federated GIS approach.
– From GML to ASL (Application Specific - Language)
• Data description language in forms of domain specific features
– From WFS to ASFS (Feature Services)
• Provides data in ASL with standard service interfaces
– From WMS to ASVS (Visualization Services)
• Domain specific display format definitions and standard services
• Visualizes information and provides a way of navigating ASFS
compatible databases (cf. GetFeatureInfo for GIS)
– Need to define application specific capabilities metadata
for ASVS and ASFS.
11
Generalization of the Proposed Architecture - II
• Mediators: Query and data format conversions
• ASFS -> provide ASL(structured data covering content and
presentation tags).
• ASVS -> provide common data representations from ASL, in binary
images
• Federator federates the capabilities of distributed ASVS and ASFS to
create application-based hierarchy of distributed data and service
resources
[Figure: ASIS architecture. A Federator (capability federation) offers unified data query/access/display over ASVS (ASL rendering) and mediators, each exposing a standard service API and exchanging messages in ASL; the mediators sit in front of AS repositories and sensors. User-defined AS Services cover tasks such as filtering, transformation, reasoning, data mining and analysis.]
12
Performance enhancing designs
-measurements and analysis-
13
Performance Investigation
• Interoperability requirements bring up some compliance
costs:
– Common data model (GML)
– Web Services (SOAP protocol for communication)
• Approaches: Enhancing the GIS systems responsiveness
– Streaming GIS Web Services
– Pre-fetching
– Parallel processing with caching
• Testing with large scale science applications using large
scale data, and resource consuming processes
– Earthquake forecasting (PI),
– Virtual California (VC)
• Turning compliance requirements into competitiveness
14
Limits of Conventional OGC-GIS systems
• On-demand data access, single-threaded and no-caching
• Related projects: Deegree and UMN-Minnesota Map Servers
• Baseline performance tests over the systems developed with Open
Geographic Standards:
– Local-area network, from database to user ends
– For small data sets (less than 500 KB) response times are acceptable
– For larger data sizes the performance is not sufficient.
[Figure: Average response times for the conventional system. x-axis: data size (KB), 0 to 1200; y-axis: time (thousands of msec), 0 to 70.]
15
Design & Measurement-1:
Large sized structured data transfer
• XML representations of data tend to be significantly
larger than binary representations
• Larger data sizes consume more network bandwidth.
• In the initial development of the proposed SOA-based GIS
we used GIS Web Services with SOAP over HTTP as the
transfer protocol.
• However, this limited performance.
• We investigated “Streaming Data Transfer”: topic-based publish-subscribe messaging systems for
exchanging SOAP messages and data payloads.
16
Streaming GIS Web-Services
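[Figure: Streaming GIS Web Services. A WFS publisher server and a WMS subscriber client, both with WSDL interfaces, are registered in a UDDI registry (lines 1 and 2). The client sends a getFeature request (line 3) and receives a (topic, IP, port) triple; GML is then streamed over the topic "Topic-wfs" through a NaradaBrokering server.]
[Figure: Average Response Times (ART) for streaming and non-streaming cases for the ordinary system. x-axis: data size (KB), 0 to 1200; y-axis: Log(time) in msec; series: ART-Streaming, ART-Non-Streaming.]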
• Lines 1, 2 and 3 show classic publish-find-bind triangle of Web Services
• SOAP is used for negotiation (line-3) – standard getFeature request
– Publisher information in (topic, IP, port) triple is returned.
• Publisher streams, subscriber receives.
• The performance gain averages 40%
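The publish-subscribe flow can be sketched with a toy in-memory broker. This is only an illustration of topic-based streaming under simplifying assumptions: the `Broker` class, `stream_features`, `receive_all` and the `END` sentinel are hypothetical names, and the sketch does not use or imitate the NaradaBrokering API.

```python
import queue
import threading

class Broker:
    """Tiny in-memory topic broker; a stand-in for a real messaging system."""
    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def subscribe(self, topic):
        q = queue.Queue()
        with self._lock:
            self._topics.setdefault(topic, []).append(q)
        return q

    def publish(self, topic, message):
        with self._lock:
            subscribers = list(self._topics.get(topic, []))
        for q in subscribers:
            q.put(message)

END = None  # sentinel marking the end of a stream

def stream_features(broker, topic, features, chunk_size=2):
    """Publisher side: send the result set in chunks instead of one big reply."""
    for i in range(0, len(features), chunk_size):
        broker.publish(topic, features[i:i + chunk_size])
    broker.publish(topic, END)

def receive_all(q):
    """Subscriber side: consume chunks as they arrive."""
    received = []
    while True:
        chunk = q.get()
        if chunk is END:
            return received
        received.extend(chunk)

broker = Broker()
inbox = broker.subscribe("topic-wfs")  # the subscriber binds to the topic first
features = [f"<gml:featureMember id='{i}'/>" for i in range(5)]
publisher = threading.Thread(target=stream_features,
                             args=(broker, "topic-wfs", features))
publisher.start()
result = receive_all(inbox)
publisher.join()
```

The subscriber can start processing the first chunk while later chunks are still being produced, which is the property that streaming exploits.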
17
Design & Measurement-2:
GML Data Processing
• Processing XML data: Parsing and rendering to create map images.
• Two well-known approaches are document models (DOM) and push
models (SAX).
• We use pull approach for XML processing:
– Parses only what is asked for
– No support for document validation (major gains of performance)
– Doesn’t build complete object model in memory (unlike DOM)
– Contents are returned directly to application from calls to parser (unlike SAX)
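To make the pull model concrete, here is a minimal sketch using Python's standard `xml.dom.pulldom` module in place of Xpp. The simplified, namespace-free document and its element names are assumptions made for illustration; real GML is namespaced and richer.

```python
from xml.dom import pulldom

# Simplified, GML-like sample document (hypothetical element names).
GML = """<FeatureCollection>
  <Feature><name>pipe-1</name><coordinates>-110,35 -100,36</coordinates></Feature>
  <Feature><name>pipe-2</name><coordinates>-105,37 -101,38</coordinates></Feature>
</FeatureCollection>"""

def pull_coordinates(xml_text):
    """Pull events from the parser and expand only the elements we need."""
    events = pulldom.parseString(xml_text)
    result = []
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "coordinates":
            events.expandNode(node)  # build a DOM subtree for this element only
            text = "".join(child.data for child in node.childNodes
                           if child.nodeType == child.TEXT_NODE)
            result.append([tuple(float(v) for v in pair.split(","))
                           for pair in text.split()])
    return result

coords = pull_coordinates(GML)
```

The application drives the parser and asks for the next event, so no complete object model of the document is ever built; only the `coordinates` elements are expanded.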
Measuring timings of GML rendering by using DOM and Xpp
[Figure: rendering time (msec) against data size for dom4j and Xpp.]
Total rendering timings in msec (1GB allocated VM):

Data Size (KB)    DOM (dom4j)      pull (Xpp)
1                 469.22           15.59
10                494.06           72.81
100               625.54           183.06
1,000             760.20           270.47
5,000             1,422.91         671.74
10,000            3,557.44         1,025.67
100,000           OUT OF MEM       7,059.72
150,000           OUT OF MEM       11,047.89
200,000           OUT OF MEM       14,949.12
Federator-Oriented Performance
Enhancement Approaches
19
Geo-Data Characteristic
[Figure: (1) a bounding box with corners (a,b) and (c,d); (2) the same box divided into regions R1 to R4 using the midpoints ((a+c)/2, b) and (c, (b+d)/2).]
• Supporting alternative techniques
based on data characteristics
1. Pre-fetching
2. Parallel processing with caching
through attribute-based query
decomposition
• A data item is described by its
location attribute: (x, y)
coordinates.
• A set of data is described
by a bounding box (bbox)
– (a, b, c, d)
• Geo-data is unevenly
distributed and variable sized
according to its location attributes.
– Ex. Human population
• Workload cannot be shared
evenly
20
Design & Measurement-3:
Pre-fetching (PM)
• Getting the GML data before it is needed
• Overcomes the network bandwidth problem and repeated data conversions.
• Suitable for infrequently changing archived data
– Otherwise it might cause consistency problems
• Red curve: pre-fetching the data (data is brought to the federator, ready to use)
• Black curve: on-demand fetching of the data from remote heterogeneous resources
• PM runs a predefined task with a predefined periodicity, independent of the application
[Figure: the pre-fetching module (PM) pulls GML from the WFS instances through NaradaBrokering (NB) into temporary storage on the federator's local file system; the user portal's interactive tools reach the WMS and WFS through the federator.]
PM: Pre-fetching module
NB: NaradaBrokering
WMS: Web Map Service
WFS: Web Feature Service
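A pre-fetching module of this kind could look roughly like the sketch below. The `PrefetchModule` class, its methods and the fake fetch function are hypothetical, introduced only to illustrate a periodic task that fills local storage independently of incoming requests.

```python
import time

class PrefetchModule:
    """Hypothetical sketch: refresh a local store on a fixed period."""
    def __init__(self, fetch, keys, period_s):
        self._fetch = fetch          # callable that retrieves GML for a key
        self._keys = list(keys)      # data sets to keep locally
        self._period_s = period_s
        self._store = {}             # stands in for temporary local storage
        self._last_run = None

    def run_if_due(self, now=None):
        """Fetch everything again if the period has elapsed; return True if run."""
        now = time.monotonic() if now is None else now
        if self._last_run is not None and now - self._last_run < self._period_s:
            return False
        for key in self._keys:
            self._store[key] = self._fetch(key)
        self._last_run = now
        return True

    def get(self, key):
        """Serve a request from local storage, without touching the remote source."""
        return self._store.get(key)

calls = []
def slow_remote_fetch(key):
    calls.append(key)                # record every remote access
    return f"<gml>{key}</gml>"

pm = PrefetchModule(slow_remote_fetch, ["gas-pipeline"], period_s=3600)
pm.run_if_due(now=0)
first = pm.get("gas-pipeline")
pm.run_if_due(now=10)                # not due yet, so no remote access
second = pm.get("gas-pipeline")
```

Requests between two runs are answered from the local copy, which is why the approach suits archived data that changes rarely.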
21
Pre-fetching vs. On-demand Fetching
Average response times in msec:

Data Size (KB)   Pre-fetching   StdDev    On-demand     StdDev
1                1,006.47       176.84    2,375.24      152.40
10               1,040.33       233.24    2,578.69      252.49
100              1,148.44       233.11    7,973.16      374.12
1,000            1,687.44       421.92    59,335.69     343.76
10,000           2,785.37       282.39    573,324.66    836.46

• For 10 MB, pre-fetching is
about 200 times faster than
conventional on-demand
fetching.
• The larger the data size,
the higher the
performance gain.
[Figure: Response times comparison, pre-fetching vs on-demand. x-axis: data size; y-axis: Log(time in msec); series: Log(Pre-fetching), Log(on-demand).]
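The speed-up claimed for the largest data size can be checked directly from the response times reported on this slide. The short script below only divides the on-demand averages by the pre-fetching averages; the variable names are illustrative.

```python
# Average response times in msec, as reported on the slide.
sizes_kb = [1, 10, 100, 1000, 10000]
prefetch_ms = [1006.47, 1040.33, 1148.44, 1687.44, 2785.37]
on_demand_ms = [2375.24, 2578.69, 7973.16, 59335.69, 573324.66]

# Speed-up of pre-fetching over on-demand fetching for each data size.
speedups = [od / pf for od, pf in zip(on_demand_ms, prefetch_ms)]

for size, s in zip(sizes_kb, speedups):
    print(f"{size:>6} KB: {s:6.1f}x")
```

The ratio grows with every step in data size and reaches roughly 200 for the 10,000 KB case.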
22
Design 4:
Parallel Processing and Caching
Main query -> cached-data extraction -> rectangulation {rectangles [Ri]} -> partitioning {sub-queries [ri]} ->
assigning separate threads -> assembling the results
[Figure: the bbox of a successive request is compared with the cached data for the critical data layer (steps 1 and 2); the uncached area is rectangulated into R1 to R4; each rectangle is partitioned into sub-queries r1 ... rPn, sent as GetFeature requests to the WFS that provides the critical data in GML (step 3); the returned GML1 ... GMLPn, the critical data falling into the partitioned regions, are combined with the cached GML and with layers from other WFS and WMS (step 4).]
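The rectangulation step can be illustrated as subtracting the cached bbox from the requested bbox, which leaves at most four rectangles. The slides do not spell out the exact procedure, so the function below is one straightforward way to do it, under the assumption that bboxes are (minx, miny, maxx, maxy) tuples; `rectangulate` is a hypothetical name.

```python
def rectangulate(request, cached):
    """Return the parts of `request` not covered by `cached` as up to 4 bboxes."""
    rx0, ry0, rx1, ry1 = request
    cx0, cy0, cx1, cy1 = cached
    # Clip the cached bbox to the request to get the overlapping region.
    ix0, iy0 = max(rx0, cx0), max(ry0, cy0)
    ix1, iy1 = min(rx1, cx1), min(ry1, cy1)
    if ix0 >= ix1 or iy0 >= iy1:
        return [request]                          # no overlap: everything is uncached
    rects = []
    if ry0 < iy0:
        rects.append((rx0, ry0, rx1, iy0))        # strip below the overlap
    if iy1 < ry1:
        rects.append((rx0, iy1, rx1, ry1))        # strip above the overlap
    if rx0 < ix0:
        rects.append((rx0, iy0, ix0, iy1))        # strip left of the overlap
    if ix1 < rx1:
        rects.append((ix1, iy0, rx1, iy1))        # strip right of the overlap
    return rects

def area(bbox):
    minx, miny, maxx, maxy = bbox
    return (maxx - minx) * (maxy - miny)

uncached = rectangulate((0, 0, 10, 10), (0, 0, 5, 5))
```

With a quarter of the request cached, the remaining three quarters come back as two non-overlapping rectangles that can be partitioned further.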
23
Attribute-based Query Decomposition
Over un-cached Regions
• Finding the number of partitions that need to be made for
each rectangle
– Calculate the cached data density
– Compare with the pre-defined threshold value
• The threshold defines a region’s max possible size
– Then divide the region into equal sized (in bbox) sub-regions
whose size should be less than or equal to the threshold value
[Figure: five sub-region bboxes and their requests: -110,35,-100,36 (GetFeature-1); -110,36,-100,37 (GetFeature-2); -110,37,-100,38 (GetFeature-3); -110,38,-100,39 (GetFeature-4); -110,39,-100,40 (GetFeature-5).]
• Creating sub-queries and assembling the result sets
– Sub-queries for the partitions inherit all the attributes
from the main query.
– The only difference is bbox values
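The decomposition can be sketched as follows. The rule used for the partition count, the bbox area divided by the threshold and rounded up, is an assumption made for this illustration (the slides also involve the cached data density, which is not modelled here); the function names and the `typeName` attribute are hypothetical.

```python
import math

def partition_count(bbox, threshold):
    """Assumed rule: enough equal parts that each is no larger than threshold."""
    minx, miny, maxx, maxy = bbox
    size = (maxx - minx) * (maxy - miny)
    return max(1, math.ceil(size / threshold))

def split_along_y(bbox, pn):
    """Cut the bbox into pn equal strips along the y coordinate."""
    minx, miny, maxx, maxy = bbox
    sy = (maxy - miny) / pn
    return [(minx, miny + i * sy, maxx, miny + (i + 1) * sy) for i in range(pn)]

def make_sub_queries(main_query, threshold):
    """Sub-queries inherit every attribute of the main query except the bbox."""
    pn = partition_count(main_query["bbox"], threshold)
    return [{**main_query, "bbox": b}
            for b in split_along_y(main_query["bbox"], pn)]

main = {"typeName": "gas-pipeline", "bbox": (-110, 35, -100, 40)}
subs = make_sub_queries(main, threshold=10)
```

For the bbox used on this slide and a threshold of 10 square degrees, the sketch yields five strips, one per degree of latitude from 35 to 40.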
24
Caching
• Caching
– Basically removes repeated jobs
– One-time caching : Recently fetched data is kept for the
successive requests
– For each session (browser), separate short-term cache
data
• Session Tracking for Caching
– How do servers know which request came from whom?
– Mapping Browser-based Sessions to Web Services
• Standard Web Service interfaces and message formats
• Each request initiated from the same browser will have same
sessionID.
• Adding new entry to header of SOAP request - “sessionID”
requestObj.setHeader(service_address, channel_name, sessionID)
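A per-session, one-time cache keyed by the `sessionID` header could be sketched like this. The `SessionCache` class and `handle_request` function are hypothetical, and the headers are modelled as a plain dictionary rather than a real SOAP header.

```python
class SessionCache:
    """Keeps only the most recently fetched data for each session."""
    def __init__(self):
        self._by_session = {}

    def put(self, session_id, bbox, gml):
        self._by_session[session_id] = (bbox, gml)   # overwrite the older entry

    def get(self, session_id):
        return self._by_session.get(session_id)

def handle_request(cache, headers, bbox, fetch):
    """Return (gml, served_from_cache) for a request carrying a sessionID."""
    session_id = headers["sessionID"]
    hit = cache.get(session_id)
    if hit is not None and hit[0] == bbox:
        return hit[1], True
    gml = fetch(bbox)
    cache.put(session_id, bbox, gml)
    return gml, False

def fake_fetch(bbox):
    return f"<gml bbox='{bbox}'/>"

cache = SessionCache()
bbox = (-110, 35, -100, 40)
_, hit1 = handle_request(cache, {"sessionID": "browser-A"}, bbox, fake_fetch)
_, hit2 = handle_request(cache, {"sessionID": "browser-A"}, bbox, fake_fetch)
_, hit3 = handle_request(cache, {"sessionID": "browser-B"}, bbox, fake_fetch)
```

The second request from the same browser is served from its session cache, while a different browser starts with an empty cache of its own.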
25
Measurement-4:
Performance Tests –
Parallel Processing and Caching
• As a result of comparing bbox of cached data and
request, there are 3 different possible scenarios
– Case 1: No usage of cached-data
– Case 2: Complete usage of cached-data
• Best case looks like pre-fetching
– Case 3: Partial usage of cached-data
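The three scenarios follow from a simple comparison of the two bounding boxes. A minimal sketch, assuming (minx, miny, maxx, maxy) tuples and a hypothetical function name:

```python
def cache_usage(request, cached):
    """Classify a request as 'none', 'complete' or 'partial' cache usage."""
    if cached is None:
        return "none"
    rx0, ry0, rx1, ry1 = request
    cx0, cy0, cx1, cy1 = cached
    # Complete: the cached bbox covers the whole request.
    if cx0 <= rx0 and cy0 <= ry0 and cx1 >= rx1 and cy1 >= ry1:
        return "complete"
    # Partial: the two boxes share some area.
    overlaps = (max(rx0, cx0) < min(rx1, cx1)
                and max(ry0, cy0) < min(ry1, cy1))
    return "partial" if overlaps else "none"
```

For example, `cache_usage((0, 0, 10, 10), (0, 0, 5, 10))` falls into the partial case, since only half of the requested area is cached.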
26
Data Access Timings
-No Cached Data-
[Figure: Average GML transfer time from source (WFS) to Federator with threaded approach. x-axis: data size (MB), 0 to 12; y-axis: time (thousands of msec), 0 to 700; series: single-thread, 2-threads, 10-threads.]
• $T_{\text{data access}} = T_{\text{query conversion}} + T_{\text{GML conversion}} + T_{\text{streaming}} + T_{\text{building GML}}$, where the query conversion is from getFeature to SQL, streaming carries the data from the WFS to the federator, and the GML is built at the federator.
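The threaded approach can be sketched with a thread pool that issues one request per sub-region and assembles the results in order. The `fetch_parallel` function and the fake WFS call are hypothetical stand-ins; a real fetch would perform the query conversion, GML conversion and streaming steps whose costs are summed above.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_parallel(sub_bboxes, fetch, max_workers=10):
    """Run one fetch per sub-bbox on separate threads and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(fetch, sub_bboxes))   # map preserves input order
    return [feature for part in parts for feature in part]

def fake_wfs(bbox):
    """Stand-in for a GetFeature call; returns one feature per sub-region."""
    minx, miny, maxx, maxy = bbox
    return [f"feature@{miny}-{maxy}"]

bboxes = [(-110, 35 + i, -100, 36 + i) for i in range(5)]
merged = fetch_parallel(bboxes, fake_wfs)
```

Because the sub-queries are independent, their access times overlap instead of adding up, at the cost of the partitioning and assembly overheads.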
27
Overhead and Response Timings
Ex. case: 10-threaded parallel processing (pp)
[Figure: Overhead timing for 10-thread pp. x-axis: data size (MB); y-axis: time (msec); series: partitioning, sub-query creating, map creating, map transfer.]
[Figure: Response timings for 10-threaded pp and comparison with single-threaded systems. x-axis: data size (MB), 0 to 12; y-axis: time (thousands of msec), 0 to 700; series: 10-threaded pp, single-threaded (ordinary).]
• Performance does not increase in the same ratio as the number of
threads
– Overheads: query partitioning, sub-query creation, map creation and map transfer.
– There is no performance gain for data sizes below a threshold.
28
Partial Usage of Cached Data (1/2 cached)
Comparison of the response times (msec):

GML Data Size (MB)   Half cached, half parallel pp      Ordinary systems
                     time          StdDev               time          StdDev
0.01                 7,063.38      357.46               2,578.69      252.49
0.1                  9,702.49      322.20               7,973.16      374.12
0.5                  12,892.12     361.53               30,868.52     482.83
1                    14,692.18     414.89               59,635.69     343.76
5                    45,401.40     590.89               288,594.12    772.41
10                   70,494.98     475.19               574,825.16    836.46

[Figure: Comparing Overall Response Timings. x-axis: data size (MB), 0 to 12; y-axis: time (thousands of msec), 0 to 700; series: 10-threaded pp, single-threaded (ordinary), hybrid (1/2 cached, 1/2 pp).]
• There is no performance
gain for small data sizes
due to the overheads.
• For 10 MB, the proposed
system is about 8 times
faster than the ordinary on-demand
single-threaded system.
• As the data size increases,
the performance gain increases.
• As the overlapped cached
region increases, the
performance gain increases
– 100% overlapping -> looks
like the pre-fetching case
29
Contributions (Systems Research)
• A framework for federated Service-Oriented GIS
– Integrated Web Services with Open Geographic Standards for
supporting interoperability at both data and application levels
– Capability definitions and federation
• Principles for Application Specific Information Systems
– Conditions and requirements
• Investigating performance efficient designs and detailed
benchmarking
– Streaming GIS Web Services and Pre-fetching
– Attribute-based query partitioning and caching for parallel
processing
• Mapping browser-based session to Web Services
• Forecasting workload from the cached-data
30
Contributions (Systems Software)
• Developing Web Map Server (WMS) in Open
Geographic Standards
• Developing GIS Federator
• Interactive map tools for data display, query and
analysis.
• Sci-Plot (Scientific data plotting) GIS Web Services
– To integrate geo-science application data with Geodata Grid
31
Future Research Directions
• Developing generic framework for application specific
information systems –ASIS
– Considering semantics of data and services
– Distributed capability federation
– Capability files and application specific languages
– Inter-service communications through capability exchange
• Integrating ASIS with science applications
– Science plotting services as a gateway between science data
grid and applications
– Handling processed data
• Storage, overlay and association with raw(input) data
32
Acknowledgement
• Galip Aydin: Web Feature Server (WFS)
• Mehmet Aktas: Universal Description and Discovery
Services (UDDI)
• The work described in this presentation is part of the
QuakeSim project which is supported by the Advanced
Information Systems Technology Program of NASA's
Earth-Sun System Technology Office.
• This collaboration is part of the NASA ACCESS ROSES
funded project, Modeling and On-the-fly Solutions in
Solid Earth Science.
33
Thanks!....
34
BACK-UP SLIDES
35
General Structure of AS-Tools
ASF(V)S-based mediation
• To be concrete let’s analyze
WFS-based mediation
• Query conversion
– From “GetFeature” to local
query (ex. SQL for database)
• Data set conversion and
composition
– Local query result to GML
• Common service API
– GetCapability
– GetFeature
– DescribeFeatureType
[Figure: layered structure, top to bottom: Standard Service API (request/response, WSDL); ASF(V)S Service Layer (getCapability, getFeature, describeFeatureType); Request Handler; Composition; Mapping: query re-creation; Source Connection/Execution; data/information sources (databases, file systems or other remote/local sources, labelled "HeteroSources").]
36
Capabilities Federation
Capability Files for Standard Services
WMS capability file:
<Capabilities>
  <Service>                  (general service metadata)
    <Name>
    <OnlineResource>
    <ContactInfo>
  </Service>
  <Capability>
    <Request>                (operations: Web Service interfaces)
      <GetCapability>
      <GetMap>
      <GetFeatureInfo>
    </Request>
    <LayerList>              (metadata about provided data/information)
      <Layer-1: Satellite img>
      <Layer-2: gas-pipeline>
      <Layer-2: Google-map>
    </LayerList>
  </Capability>
</Capabilities>

WFS capability file:
<Capabilities>
  <Service>                  (general service metadata)
    <Name>
    <OnlineResource>
    <ContactInfo>
  </Service>
  <Capability>
    <Request>                (operations: Web Service interfaces)
      <GetCapability>
      <GetFeature>
      <DescribeFeatureType>
    </Request>
    <DataList>               (metadata about provided data/information)
      <Data-1: gas-pipeline>
      <Data-2: electric-power>
      <Data-2: other-data>
    </DataList>
  </Capability>
</Capabilities>
37
38
Parallel processing with caching through
attribute-based query decomposition - I
1. Determining the number of partitions (Pn)
[Figure: the requested rectangle with corners (minx, miny) and (maxx, maxy), and the cached-data rectangle with corners (minxc, minyc) and (maxxc, maxyc).]
• Attribute is bounding box (bbox) defined as
– (minx, miny, maxx, maxy)
• CD_size_br2 = (maxxc - minxc)*(maxyc - minyc)
• R_size_br2 = (maxx - minx)*(maxy - miny)
• A pre-defined thr (threshold) value determines whether partitioning is
required for a rectangle (bbox)
• Pn: the number of partitions calculated for a rectangle
39
Parallel processing with caching through
attribute-based query decomposition - II
• 2. How to partition a rectangle in bbox
– We know the rectangle’s bbox and Pn.
– Since we do not know in advance the workload that falls in that bbox,
we partition the rectangle into equal sizes
– There are two options here, vertical partitioning and horizontal
partitioning. Let’s pick vertical and explain the algorithm:
[Figure: the rectangle from (minx, miny) to (maxx, maxy), partitioned along the y coordinate into Pn strips (1, 2, ..., Pn), each of height sy.]
Calculating the bboxes of the partitioned regions, with sy = (maxy - miny) / Pn:
for (i = 0; i < Pn; i++)
    print(minx, miny + i*sy, maxx, miny + (i+1)*sy);
40
Parallel processing with caching through
attribute-based query decomposition - III
• 3. How to create sub-queries
– After the partitioned regions’ bbox values are computed in the previous
step, the corresponding sub-queries are created.
– Partitions are differentiated by the bbox values calculated
above. Other attributes are inherited from the main query.
– Ex: main query bbox is “-110, 35, -100, 40” and let’s assume we
found out that Pn=5
[Figure: a rectangle from the rectangulation process (-110, 35, -100, 40) is decomposed according to Pn and sy, and a query is created for each resulting bbox:]
-110, 35, -100, 36    GetFeature-1
-110, 36, -100, 37    GetFeature-2
-110, 37, -100, 38    GetFeature-3
-110, 38, -100, 39    GetFeature-4
-110, 39, -100, 40    GetFeature-5
41
Performance Tests – Based on Case Scenarios
• As a result of comparing bbox of cached
data and request
– (1) No usage of cached-data
– (2)-(3) Complete usage of cached-data
– (4) Partial usage of cached-data
[Figure: the main query undergoes cached-data extraction and rectangulation (R1 to R4); rectangles are partitioned into sub-queries r1 ... rPn, issued as GetFeature requests to the WFS providing the critical data in GML; the returned GML1 ... GMLPn, the critical data falling into the partitioned regions, are merged with the cached GML for the critical data layer and overlaid with layers from other WFS and WMS; the data is cached for successive requests.]
Main query -> rectangulation -> rectangles [Rs] -> partition -> sub-queries [rs]
42