([email protected]) Outline • Motivations • Research Issues • Architecture: Federated Service-Oriented Geographic Information System • Performance-enhancing designs, measurements and analysis • Conclusions.
([email protected])

Outline
• Motivations
• Research Issues
• Architecture: Federated Service-Oriented Geographic Information System
• Performance-enhancing designs, measurements and analysis
• Conclusions

Geographic Information Systems (GIS)
• A GIS is a system for creating, storing, sharing, analyzing, manipulating and displaying geodata and associated attributes.
• Inherently requires federation (see the figure)
  – Autonomy for scalability, flexibility and extensibility
    • Distributed data access for geodata resources (databases, digital libraries, etc.)
    • Use of remote analysis, simulation or visualization tools
• Open Standards
  – OGC
  – ISO/TC-211

Motivations
• Requirements for:
  o Interoperable service-oriented Geographic Information Systems
    – Necessity of sharing and integrating heterogeneous data and computation resources to produce knowledge
  o Uniform data access/query, display and analysis from a single access point
  o Responsive and interactive information systems
    – GIS applications require quick responses
      • Emergency early-warning systems
      • Homeland security and natural disasters

Research Issues
• Interoperability
  – Defining a component-based, service-oriented GIS data Grid framework
  – Adoption of Open Geographic Standards: data model and services
  – Applying Web Service principles to GIS data services
  – Integrating Web Service and Open Geographic Standards
• Federation
  – Capability-based federation of GIS Web Service components
  – Unified data access/query and display from a single access point through integrated data-views
• Addressing high-performance support for responsiveness
  – Streaming GIS Web Services and pre-fetching framework
  – Client-based caching
  – Parallel processing through attribute-based query decomposition

Service-oriented GIS
• Built over:
  – Web Services standards (WS-I+)
  – Open Geographic Standards (OGC and ISO/TC-211)
• Consists of two types of online services:
  – Web Map Services (WMS) and Web Feature Services (WFS)
• And two types of data:
  – Binary data: map images (provided by WMS)
  – Structured data: GML, with content (core data) and presentation (attribute and geometry elements) (provided by WFS)

Web Service components and data-flow
• WMS are data rendering services providing human-comprehensible data (binary map images).
• WFS are Web Services and data services providing data in the common data model GML (Geography Markup Language), behaving as mediator and annotation services.
• WMS and WFS each have their own type of capability metadata, defined by the Open Geographic specs.
• Inter-service communication is done through the “getCapability” service interface.
• UDDI-based registry services.
• Components are Web Services, and all control goes through SOAP messages.
• XML-based query language (standard schema)
[Figure: Relation of the components and data flow. The WMS (WSDL interfaces: getCapability, getMap, getFeatureInfo) renders GML into binary map data; the WFS mediator (WSDL interfaces: getCapability, getFeature, DescribeFeatureType) supplies the GML.]

Capability-based Federation of Standard GIS Web Service Components
[Figure: A Web Map Client with interactive map tools talks over WSDL to the aggregating WMS (Federator), inspired by OGC’s cascading WMS. Through stubs (SOAP, HTTP, “REST”) the federator reaches WFS + seismic records, WFS + state bounds, WMS + OnEarth, Google Maps, …]
• Built over the proposed standard Web Service components and common data models
• Federation is done by aggregating the capabilities metadata of GIS Web Services
• Unified data access/query/display from a single access point
• Provides application-based hierarchical data definitions
  – Layer-based data and service (WMS and WFS) compositions
• A capability is basically metadata about data + service:
  – The server’s information content and acceptable request parameter values

Why Capability metadata
• Web Services provide key low-level capability but do not define an information or data architecture
• These are left to domain-specific capabilities metadata and a data description language (GML).
• Machine- and human-readable information
  – Enables easy integration and federation
• Enables developing application-based, standard, interactive, re-usable tools
  – For data query, display and analysis
  – Seamless data access/query

Designs, measurements and analysis

Performance Investigation
• Interoperability requirements bring some compliance costs:
  – Common data model (GML)
  – Web Services (SOAP protocol for communication)
• Approaches: enhancing the responsiveness of GIS systems
  – Data transfer and rendering
    • Streaming GIS Web Services (1)
    • Structured/annotated GML data rendering (2)
  – Federator-oriented approaches
    • Pre-fetching (3)
    • Client-based caching (4)
    • Query decomposition and parallel processing (5)
• Testing with large-scale geo-science applications
  – Earthquake forecasting (PI)
  – Virtual California (VC)
• Aim: turning compliance requirements into competitiveness

Conventional OGC-GIS systems: Baseline Performance Test
• The naïve approach is characterized by:
  – Stateless services
  – On-demand data access
  – Single-threaded processing and no caching
• Systems developed with Open Geographic Standards have:
  – A high degree of interoperability but poor performance results
[Figure: Test setup.]
[Figure: Average response times. x-axis: data size (KB), 0 to 1200; y-axis: time (msec, in thousands), 0 to 70; series: Avg Resp Time.]

(1) Streaming GIS Web-Services
• The concern is the transfer of large XML-structured data
• XML representations of data tend to be significantly larger than binary representations
  – Larger data sizes consume more network bandwidth
  – We still need XML for interoperability reasons
• In the initial development of the proposed service-oriented GIS we used GIS Web Services with SOAP over HTTP as the transfer protocol.
  – BUT this limited performance.
• We investigated “Streaming Data Transfer”
  – Topic-based publish-subscribe messaging systems for exchanging SOAP messages and data payloads.
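
The topic-based publish-subscribe idea above can be illustrated with a small in-memory sketch. This is only an assumption-laden illustration, not the messaging system used in the project: the class TopicBroker, its methods and the topic name "topic-wfs" are hypothetical, and a real broker would deliver chunks over the network rather than by direct method calls. The point it shows is that a subscriber can start consuming data chunk by chunk instead of waiting for one large SOAP response.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for a topic-based publish-subscribe broker. */
final class TopicBroker {
    // topic name -> handlers registered by subscribers of that topic
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    /** Registers a handler that is called for every chunk published on the topic. */
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    /** Delivers one chunk to every subscriber of the topic; other topics are untouched. */
    void publish(String topic, String chunk) {
        List<Consumer<String>> handlers =
                subscribers.getOrDefault(topic, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> handler : handlers) {
            handler.accept(chunk);
        }
    }

    public static void main(String[] args) {
        TopicBroker broker = new TopicBroker();
        // Subscriber side (e.g. a map service): handles each chunk as it arrives.
        broker.subscribe("topic-wfs", chunk -> System.out.println("received: " + chunk));
        // Publisher side (e.g. a feature service): streams the data in pieces.
        String[] chunks = {"<feature id='1'/>", "<feature id='2'/>", "<feature id='3'/>"};
        for (String chunk : chunks) {
            broker.publish("topic-wfs", chunk);
        }
    }
}
```

In this sketch the negotiation step (agreeing on the topic) is reduced to both sides using the same topic string.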

(1) Streaming GIS Web-Services (Cont)
[Figure: UDDI registry, (A)WMS as subscriber client, WFS as publisher server, and a NaradaBrokering server carrying the topic “Topic-wfs”. The getFeature request returns a (topic, IP, port) triple; GML then flows from the WFS through the broker to the WMS. Lines are numbered 1 to 6.]
[Figure: Average Response Times (ART) for streaming and non-streaming cases. x-axis: data size (KB), 0 to 1200; y-axis: log(time) in msec; series: ART-Streaming, ART-Non-Streaming.]
• Lines 1, 2 and 3 show the classic publish-find-bind triangle of Web Services
• SOAP is used for negotiation (line 3)
  – Standard getFeature request
  – Publisher information is returned as a (topic, IP, port) triple.
• The publisher streams; the subscriber receives.
• The average performance gain is 40%

(2) GML Data Processing
• Processing XML data: parsing and rendering to create map images.
• Two well-known approaches are document models (DOM) and push models (SAX).
• We use the pull approach for XML processing:
  – Parses only what is asked for
  – No support for document validation (a major source of the performance gain)
  – Doesn’t build a complete object model in memory (unlike DOM)
  – Contents are returned directly to the application from calls to the parser (unlike SAX)
[Figure: GML rendering by using DOM vs. Xpp. x-axis: data size (labeled MB on the chart), 0 to 12000; y-axis: time (msec), 0 to 4000; series: dom4j, Xpp.]

Total rendering timings (1GB allocated VM)
Data Size (KB)    DOM (dom4j)    pull (Xpp)
1                 469.22         15.59
10                494.06         72.81
100               625.54         183.06
1,000             760.20         270.47
5,000             1,422.91       671.74
10,000            3,557.44       1,025.67
100,000           OUT OF MEM     7,059.72
150,000           OUT OF MEM     11,047.89
200,000           OUT OF MEM     14,949.12

(3) Pre-fetching
• Getting the GML data before it is needed
• The extension for the Pre-fetching Module is shown in the grey region
• Overcomes the network bandwidth problem and repeated data conversions.
• This technique is good for infrequently changing archived data
  – Otherwise, it might cause consistency problems
• Red curve: map rendering over the pre-fetched data (ready-to-use GML data)
• Black curve: map rendering through on-demand fetching
[Figure: User portal with interactive tools, federator (WMS processor), WMS and WFS instances, and the pre-fetching module. PR runs a pre-defined task at a pre-defined periodicity and writes GML to temporary storage on the local file system; NB sits between the WFS and the PR.]
PR: Pre-fetching runner; NB: NaradaBrokering; WMS: Web Map Service; WFS: Web Feature Service

(3) Pre-fetching vs. On-demand Fetching

Data Size (MB)    Pre-fetching Avg. Response    StdDev    On-demand Avg. Response    StdDev
0.01              19,261.90                     481.57    1,808.13                   140.32
0.1               19,112.30                     673.69    2,635.46                   313.48
0.5               19,222.48                     631.35    5,001.29                   238.94
1                 19,427.48                     305.94    8,225.73                   200.27
5                 20,146.00                     516.50    33,419.31                  394.48
10                20,165.90                     546.53    64,506.78                  283.24
50                22,882.52                     509.98    316,906.00                 623.08
100               23,990.43                     548.65    643,344.00                 603.59

• For 100 MB, pre-fetching is about 30 times faster than conventional on-demand fetching.
• The larger the data size, the higher the performance gain.
[Figure: Comparison of the average response times, pre-fetching vs. on-demand systems. x-axis: data size (MB), 0 to 120; y-axis: log(time in msecs); series: log(Pre-fetching), log(On-demand).]
[Figure: Average response times for the pre-fetching system. x-axis: data size (MB), 0 to 120; y-axis: time (msecs), 0 to 30 000; series: Response Time.]

(4) Client-based Caching
• Each client has a separate caching area allocated.
• Applies the working-window and locality principles to map image rendering
• Clients are differentiated by the client-assigned session-id parameter in the header of queries.
• Always keep the least recently-used data
• Introduces some overhead to maintain the working window for each client.

Brief Architecture
Server-side: Create identity card.
Update at every request from the client.
• FormerRequest class:
    class FormerRequest {
        String uuid;            /* unique user id */
        String bbox;            /* bounding box of the user’s last request */
        Double density;         /* data size falling into a unit square */
        Vector[] feature_data;  /* geometry elements of the last request */
    }
• Register to the client table:
    uuid-1 -> FormerRequest-1
    uuid-2 -> FormerRequest-2
    …
Client-side: Set identity in the message header.
    ClientWSStub binding;
    binding = (ClientWSStub) new ServiceLocator().WMSServices(servaddress);
    String sessionID = session.getid();  // uuid-1
    String channel_name = "getMapChannel";
    /* Add the session id to the SOAP message's header */
    binding.setHeader(service_address, channel_name, sessionID);
    Map mymap = binding.getMap(request);

Why Client-based Caching
• Makes stateless GIS Web Services stateful
• Allows the workload to be shared as equally as possible for the most efficient parallel processing.
Comparison with Google-like map servers:
• In large-scale applications it is impossible to cache all of the data
  – Limited storage and computation capabilities
• Google-like map servers are fast because:
  – They replace computation with storage.
  – They pre-make all images and cut them up into tiles
  – They formalize the accepted requests in terms of parameters, and the responses in terms of tile compositions.
• BUT this is good only for client-server based applications
  – It can’t be applied to distributed dynamic data rendering and extensible applications.
  – They don’t deal with feature-enriched maps enabling attribute-based querying,
  – Or with structured/annotated scientific data rendering.

(5) Parallel Processing over Client-based Caching
• Main query -> cached-data extraction -> rectangulation {Rectangles [Ri]} -> partitioning {sub-queries [ri]} -> assigning separate threads -> assembling the results
[Figure: (1) The main query is reduced by cached-data extraction and rectangulation to rectangles R1 to R4. (2) Each rectangle is partitioned into sub-queries r1 … rPn, sent as GetFeature requests to the WFS (critical data provider in GML), which returns GML1 … GMLPn for the critical data falling into the partitioned regions. (3) A successive request reuses the cached GML. (4) The critical data layer is combined with layers from other WFS and WMS.]

Challenge: Geo-Data Characteristic
[Figure: Two panels, (1) and (2), showing a bounding box from (a, b) to (c, d) cut at the midpoints ((a+c)/2, b) and (c, (b+d)/2) into regions R1 to R4.]
• Advanced techniques are needed for workload sharing!
• Point data is described by a location attribute: (x, y) coordinates.
• Linestrings, polylines, polygons, etc. are defined as sets of points.
• The data set falling into a queried region is formulated as a bounding box (bbox)
  – Coordinates of a rectangle (a, b, c, d)
• Geo-data is unevenly distributed and variable-sized according to its location attributes.
  – Ex.: human population

Attribute-based Query Decomposition
• Cached data extraction
• Rectangulation over the remaining area: R1, R2, R3, R4
• Each rectangle goes through the partitioning process.
  – Blind partitioning
    • Such as first-time queries
    • Uses the default partitioning number
  – Smart partitioning
    • Client-based caching
    • FormerRequest object
• All partitions are assigned to separate threads and the results are merged to create the final response
[Figure: A query rectangle (minx, miny) to (maxx, maxy) overlapping the cached data; the remainder is split into R1 to R4, and a rectangle is partitioned into 4.]

Smart Partitioning through Client-based Caching
• Based on the locality principles.
  – Assumption: former and current requests have similar data density
• Cached data area: CD_size_br2 = (maxxc - minxc)*(maxyc - minyc)
• Main-query area: R_size_br2 = (maxx - minx)*(maxy - miny)
• Thr: pre-defined threshold value that changes from data set to data set.
• Pn: the number of partitions calculated for a rectangle
[Figure: Cache rectangle (minxc, minyc) to (maxxc, maxyc) and query rectangle (minx, miny) to (maxx, maxy).]
• Determining the most efficient number of partitions (Pn)
  – If Pn >= 2, cut the rectangle into Pn equal-sized regions.
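
The last step, cutting a rectangle into Pn equal-sized regions, can be sketched as follows. This is a minimal illustration rather than the presenters' implementation: the class BboxPartitioner, the BBox type and the choice to cut along the y axis are assumptions made for the example, and the bounding-box values in main are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cuts a bounding box into equal-sized regions. */
final class BboxPartitioner {

    /** Axis-aligned bounding box: (minx, miny) lower-left, (maxx, maxy) upper-right. */
    static final class BBox {
        final double minx, miny, maxx, maxy;

        BBox(double minx, double miny, double maxx, double maxy) {
            this.minx = minx;
            this.miny = miny;
            this.maxx = maxx;
            this.maxy = maxy;
        }

        @Override
        public String toString() {
            return minx + "," + miny + "," + maxx + "," + maxy;
        }
    }

    /**
     * Cuts the box into pn equal-height bands along the y axis (an assumed axis choice).
     * For pn < 2 the box is returned unchanged, matching the "If Pn >= 2" rule.
     */
    static List<BBox> partition(BBox box, int pn) {
        List<BBox> parts = new ArrayList<>();
        if (pn < 2) {
            parts.add(box);
            return parts;
        }
        double step = (box.maxy - box.miny) / pn;
        for (int i = 0; i < pn; i++) {
            double lower = box.miny + i * step;
            // Use the exact upper bound for the last band to avoid rounding drift.
            double upper = (i == pn - 1) ? box.maxy : box.miny + (i + 1) * step;
            parts.add(new BBox(box.minx, lower, box.maxx, upper));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Each printed bbox would become the spatial filter of one sub-query.
        for (BBox part : partition(new BBox(-110, 35, -100, 40), 5)) {
            System.out.println(part);
        }
    }
}
```

Equal-sized regions do not imply equal workloads when the data is unevenly distributed, which is why the partition count is derived from the density of the former request.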

Assigning Partitions to Workers
• Partitions are assigned to the worker nodes in round-robin fashion.
• We keep a pool of worker nodes for each feature layer to which parallel processing is applied.
• According to the algorithm:
  – PN: number of partitions
  – WN: number of worker nodes in the pool
  – share: the number of partitions each worker is supposed to get
• Check whether there are still remaining partitions waiting (rmg)
• Assignments:
  – The first rmg worker nodes are assigned share+1 partitions
  – The others (WN-rmg) are assigned share partitions

Vertical partitioning in the case of 5 partitions
    -110,35,-100,36    GFeature-1
    -110,36,-100,37    GFeature-2
    -110,37,-100,38    GFeature-3
    -110,38,-100,39    GFeature-4
    -110,39,-100,40    GFeature-5

Data Access Timings - No Cached Data
[Figure: Comparisons of data capturing times based on different partitioning levels. x-axis: data size (MB), 0 to 12; y-axis: time (msecs, in thousands), 0 to 70; series: single-thread, 2-thread, 10-thread, 20-thread. Components: Federator, WFS, DB.]
• $T_{\text{data access}} = T_{\text{query conversion (getFeature to SQL)}} + T_{\text{GML conversion}} + T_{\text{streaming the data from WFS to federator}} + T_{\text{building GML at federator}}$

Overhead and Response Timings (ex. case: 10-threaded parallel processing)
[Figure: Comparisons of response times with the single-threaded case. x-axis: data size (MB), 0 to 12; y-axis: time (msecs, in thousands), 0 to 70; series: single-threaded, 10-threaded.]
[Figure: Comparisons of overheads for the 10-threaded case based on different partitioning levels. x-axis: partition number, 0 to 35; y-axis: time (msecs), 0 to 2 000; series: partitioning, sub-query creation, merging. Components: browser with event-based dynamic map tools, Federator, WFS, DB.]
• Performance does not increase in the same ratio as the number of threads
  – Overheads: query partitioning, sub-query creation, map creation and map transfer.
  – There is no performance gain below a threshold data size.

Partial Usage of Cached Data (Ex. case: 1/2 cached)

Comparison of the response times
                  Half cache, 10 threads    No cache, 10 threads     No cache, single thread
Data Size (MB)    Avg. Time    StdDev       Avg. Time    StdDev      Avg. Time    StdDev
0.01              3,095.19     204.22       2,329.50     131.46      1,808.13     140.32
0.1               3,576.73     283.8        2,760.00     104.35      2,635.46     313.48
0.5               3,721.77     210.41       3,460.40     120.24      5,001.29     238.94
1                 4,311.73     192.45       4,640.53     106.42      8,225.73     200.27
5                 11,294.58    313.59       16,725.4     201.62      33,419.31    394.48
10                18,371.72    296.19       23,118.4     941.83      64,506.78    283.24

• There is no performance gain for small data sizes, due to the overheads.
• For 10 MB, the proposed system is almost 4 times faster than the ordinary on-demand single-threaded system.
• The performance gain increases:
  – As the data size increases.
  – As the overlapped cached region increases
  – 100% overlapping -> looks like the pre-fetching case
[Figure: Comparisons of response times. x-axis: data size (MB), 0 to 12; y-axis: time (msecs, in thousands), 0 to 70; series: half-cached/10-thread, no-cached/10-thread, no-cached/single-thread. Components: client, Federator, WFS instances, DB.]

Conclusions
• Streaming data transfer techniques allow data rendering even on partially returned data.
• Pull parsing gives the best outcomes for rendering XML-encoded GML data
  – It eliminates the requirement of data validation.
• The federator’s natural characteristics allowed us to develop advanced caching and parallel processing designs.
• Pre-fetching and parallel-processing techniques are mutually exclusive.
• The best performance outcomes are achieved through pre-fetching, but it can cause data inconsistency.
  – The triggering periodicity must be defined carefully.
• The success of parallel-processing techniques depends on how well we share the workload among worker nodes.
  – Geo-data is unevenly distributed and variable-sized.
• We saw that the following helped us increase the system responsiveness to a greater extent:
  – Application of the working-window and locality principles by means of client-based caching.
  – Parallel processing through attribute-based query decomposition.
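
The workload-sharing rule from the "Assigning Partitions to Workers" slide can be sketched as follows. This is a hedged illustration, not the presenters' code: the class PartitionAssigner and its methods are hypothetical, and the sketch assumes that share is the integer quotient PN / WN and that rmg is the remainder PN mod WN, which the slide does not spell out.

```java
import java.util.Arrays;

/** Hypothetical helper for distributing partitions over a pool of worker nodes. */
final class PartitionAssigner {

    /**
     * Returns how many partitions each worker receives.
     * Assumption: share = pn / wn and rmg = pn % wn; the first rmg workers
     * get share + 1 partitions and the remaining (wn - rmg) workers get share.
     */
    static int[] partitionsPerWorker(int pn, int wn) {
        if (wn <= 0) {
            throw new IllegalArgumentException("worker pool must not be empty");
        }
        int share = pn / wn;
        int rmg = pn % wn;
        int[] counts = new int[wn];
        for (int w = 0; w < wn; w++) {
            counts[w] = (w < rmg) ? share + 1 : share;
        }
        return counts;
    }

    /** Round-robin rule: partition i goes to worker i mod wn, which yields the counts above. */
    static int workerFor(int partitionIndex, int wn) {
        return partitionIndex % wn;
    }

    public static void main(String[] args) {
        // 10 partitions over 4 workers: share = 2, rmg = 2.
        System.out.println(Arrays.toString(partitionsPerWorker(10, 4)));
        for (int i = 0; i < 10; i++) {
            System.out.println("partition " + i + " -> worker " + workerFor(i, 4));
        }
    }
}
```

Equal partition counts balance the workers only to the extent that the partitions themselves carry similar amounts of data.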

Conclusions – General Framework
• Heterogeneous data sources are queried as a single resource
  – Heterogeneous: autonomous local resources controlling the definition of data
  – Single resource: removes the burden of individually accessing each data source with ad-hoc query languages.
  – WFS-based mediation:
    • Data and query conversions
• Easy extension with new data and service resources
  – Open Geographic and Web Service standards
• No physical data integration
  – Data always at the local source
  – Easy maintenance of data and a high degree of autonomy
• Seamless interaction with the system through integrated data views as multi-layered map images

Contributions
• A federated service-oriented Geographic Information Systems framework
  – Integrating Web Services with Open Geographic Standards to support interoperability at both data and service levels
  – Production of knowledge from distributed data sources in multi-layered map images.
    • Hierarchical data definitions through capability metadata federation
    • Enabling unified interactive data access/query and display.
• Investigated performance-efficient designs and carried out detailed benchmarking
  – Streaming GIS Web Services
  – Federator-oriented high-performance design techniques
    • Pre-fetching
    • Client-based caching: working-window and locality principles
    • Parallel processing through attribute-based query decomposition

Acknowledgement
• The work described in this presentation is part of the QuakeSim project, which is supported by the Advanced Information Systems Technology Program of NASA's Earth-Sun System Technology Office.
• Galip Aydin: Web Feature Server (WFS)

Thanks!

BACK-UP SLIDES

Capability-based Federation of the standard Web Service Components
• Built over the proposed standard Web Service components and common data models
• Unified data access/query/display from a single access point
• Provides application-based hierarchical data definitions
  – Layer-based data and service (WMS and WFS) compositions
• Federation is done by aggregating the capabilities metadata of GIS Web Services
• A capability is basically metadata about data + service:
  – The server’s information content and acceptable request parameter values
Application-based hierarchical data:
[Application] – Pattern Informatics
  – [Layer-1] State-boundary over Satellite
    • [Data-1] – State-boundary (WFS-1)
    • [Data-2] – Satellite-Image (WMS-2)
  – [Layer-2]
    • Google map (WMS-1)
  – [Layer-3] – Earthquake-Seismic
    • [Data-1]
    • Earthquake-Seismic (WFS-3)
[Figure: Capability federation and map rendering. The user portal (interactive map tools) talks to the federator, which aggregates WMS and WFS instances serving layers a, b, c and d.]
Request steps:
  1. GetCapability (metadata about data + service)
  2. GetMap (get map data in a set of layers)
  3. GetFeatureInfo (query the attributes of data)
Browser events:
  – Move
  – Zooming in/out
  – Panning (drag-drop)
  – Rectangular region
  – Distance calc.
  – Attribute querying
Sample layers for PI:
  a. NASA satellite layer
  b. Earthquake-seismic layer
  c. Google Map layer
  d. State-boundaries layer

Hierarchical data / Integrated data-view
[Figure: Integrated data view made of three layers. 1: Google map layer; 2: state boundary lines layer; 3: seismic data layer.]

Event-based Interactive Tools: query and data analysis over integrated data views

• Integrated views
• Event-based querying through integrated views.
• WFS-based mediators
• XML-based query language
• Federation-specific related works (might not be active)
  – MIX: mediation of information using XML
  – SRB/MCAT (SDSC)
  – TSIMMIS (Stanford Univ)
• XML-based standard queries for the standard services.
  – Capability gives the list of data provided, the attribute lists on which they can be queried, and the constraints on the queries needed to create valid requests such as getMap and getFeature.
• We do syntactical and structural integration.

Hierarchical data / Integrated data-view for the IEISS Geo-science Application
Application-based hierarchical data:
[Application] – IEISS
  – [Layer-1] Gas-pipeline over Satellite
    • [Data-1] – Gas-pipeline (WFS-1)
    • [Data-2] – Satellite-Image (WMS-2)
  – [Layer-2]
    • Google map (WMS-1)
  – [Layer-3] – Electric-power
    • [Data-1]
    • Electric-power (WFS-3)

GetCapabilities Schema and Sample Request Instance

GetMap Schema and Sample Request Instance

Event-based Interactive Map Tools
    <event_controller>
      <event name="init" class="Path.InitListener" next="map.jsp"/>
      <event name="REFRESH" class="Path.InitListener" next="map.jsp"/>
      <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/>
      <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/>
      <event name="RECENTER" class="Path.InitListener" next="map.jsp"/>
      <event name="RESET" class="Path.InitListener" next="map.jsp"/>
      <event name="PAN" class="Path.InitListener" next="map.jsp"/>
      <event name="INFO" class="Path.InitListener" next="map.jsp"/>
    </event_controller>

Sample GML document

Sample GetFeature Request Instance

A simple template capabilities file for a WMS

Generalizing the Problem Domain
• Query heterogeneous data sources as a single resource
  – Heterogeneous: the local resource controls the definition of the data
  – Single resource: removes the burden of individually accessing each data source
• Easy extension with new data and service resources
• No real integration of data
  – Data always at the local source
  – Easy maintenance of data
• Seamless interaction with the system
  – Collaborative decision making
[Figure: A client/user query goes to an integrated view, which sits on top of mediators for DB, files and WWW sources. Data lives in files, HTML, XML/relational databases, and spatial sources/sensors.]

Generalization of the Proposed Architecture
• The GIS-style information model can be redefined in any application area, such as Chemistry and Astronomy
  – Application Specific Information Systems (ASIS).
• We need to define Application Specific:
  – Language (ASL) -> GML: expresses domain-specific features and the semantics of data
  – Feature Service (ASFS) -> WFS: serves data in the common language (ASL)
  – Visualization Services (ASVS) -> WMS: visualize information and provide a way of navigating ASFS-compatible/mediated data resources
  – Capabilities metadata for ASVS and ASFS.
• The federator federates the capabilities of distributed ASVS and ASFS to create an application-based hierarchy of distributed data and service resources.
• Mediators: query and data format conversions
• Data sources maintain their internal structure
• Large degree of autonomy
• No actual physical data integration
[Figure: Unified data query/access/display. (1) Federator with capability federation; (2) ASVS with ASL rendering; (3) AS services (user defined), such as filter, transformation, reasoning, data-mining and analysis; (4) mediators over an AS repository and sensors. Components are connected through standard service APIs and messages using ASL.]

Contributions (Systems Software)
• Developing a Web Map Server (WMS) in Open Geographic Standards
  – Extended with Web Service standards and
  – Streaming map creation capabilities
• Developing the GIS Federator
  – Provides application-specific, layer-structured hierarchical data as a composition of distributed standard GIS Web Service components
  – Enables uniform data access and query
• Interactive map tools for data display, query and analysis.
  – Browser- and event-based.
  – Extended with AJAX (Asynchronous JavaScript and XML)