Routing of XML and XPath Queries in Data Dissemination Networks
Guoli Li, Shuang Hou, Hans-Arno Jacobsen
Middleware Systems Research Group, University of Toronto
ICDCS 2008 @ Beijing, China

Agenda
- Motivation
- Advertisement-based routing
- Covering
- Evaluation
- Conclusions

Motivation: Content-based Data Dissemination
[Figure: XML data and queries enter the dissemination network; matching results are delivered to users]
- Data sources publish XML data.
- Data users register XPath queries.
- The data dissemination network delivers matching results to a large and dynamically changing group of users.

Publish/Subscribe
[Figure: a publisher issues advertisements (DTDs) and publications (XML); subscribers issue subscriptions (XPath)]
- Matching of XML documents and XPaths [ICDE'06]
- Matching of advertisements and XPaths
- Exploring relations among XPaths

Covering-based Routing
[Figure: covering-based routing example]

Language Model: Advertisements
Advertisements are generated from DTDs.
- Non-recursive advertisement, e.g., A = /t1/t2/t3…/tn-1/tn
- Recursive advertisements:
  - Simple: A = A1(A2)+A3
  - Series: A = A1(A2)+A3(A4)+A5
  - Embedded: A = A1(A2(A3)+A4)+A5

Example DTD:
<?xml encoding="UTF-8"?>
<!ELEMENT personnel (person)+>
<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT name ((family,given)|(given,family))>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA 'http://'>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED>

Advertisements generated from this DTD:
/personnel/person
/personnel/person/name
/personnel/person/name/family
/personnel/person/name/given
/personnel/person/email
/personnel/person/url
/personnel/person/link
……

Language Model: Subscriptions (XPath)
- Absolute, e.g., /c/d/*/e
- Relative, e.g., c/d/*/e
- With descendant operators, e.g., c//e/*/c

Advertisement-based Routing
[Figure: a subscription (S) enters the broker network. Advertisements A1: /a/b/*/e, A2: /b/e, A3: /a/b/d, A4: /a/b/e, …… propagate along paths P(A); the subscription propagates along paths P(S), i.e., only toward publishers whose advertisements it overlaps]
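The routing decision on this slide hinges on whether a subscription overlaps an advertisement. Below is a minimal Python sketch of that test for the simplest case only: a non-recursive advertisement and a subscription without the // operator. It applies the element-level rule from the Adv/Sub overlap table (two steps conflict only when both are distinct tag names) and assumes, as in path-based XML filtering, that an absolute XPE must align with the root of the advertised path while a relative XPE may align at any position. It checks every alignment by brute force rather than with the "Next" table used in the overlapping-algorithm slides. The function names and the example subscriptions are hypothetical.

```python
from typing import List, Tuple


def parse_xpe(xpe: str) -> Tuple[bool, List[str]]:
    """Split a simple XPE into (is_absolute, steps); '//' is out of scope here."""
    if "//" in xpe:
        raise ValueError("descendant operator '//' is not handled in this sketch")
    steps = [step for step in xpe.split("/") if step]
    return xpe.startswith("/"), steps


def steps_overlap(adv_step: str, sub_step: str) -> bool:
    """Adv/Sub table: only two different tag names fail to overlap."""
    return adv_step == "*" or sub_step == "*" or adv_step == sub_step


def overlaps(adv: str, sub: str) -> bool:
    """True if some path allowed by the advertisement could match the subscription."""
    _, a = parse_xpe(adv)  # assumption: advertisements are absolute paths
    is_absolute, s = parse_xpe(sub)
    # Assumption: absolute XPEs start at the root; relative XPEs may start anywhere.
    offsets = [0] if is_absolute else range(len(a) - len(s) + 1)
    return any(
        i + len(s) <= len(a)
        and all(steps_overlap(a[i + k], s[k]) for k in range(len(s)))
        for i in offsets
    )


# Advertisements A1 to A4 from the slide; the subscriptions are hypothetical.
ads = {"A1": "/a/b/*/e", "A2": "/b/e", "A3": "/a/b/d", "A4": "/a/b/e"}
print([name for name, adv in ads.items() if overlaps(adv, "/a/b/d/e")])  # ['A1']
print([name for name, adv in ads.items() if overlaps(adv, "b/e")])  # ['A1', 'A2', 'A4']
```

A broker using such a test would forward the subscription only along the paths of the advertisements it overlaps, so the relative subscription b/e above would not be sent toward the publisher of A3.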
Overlapping Algorithms
- Basic case: A = /a/b/c/*/b/c/*/b/e, S = /a/b/c/*/b/e
- Other cases, e.g., S with descendant operators: S = /a/b//c/*/b//e
- S is matched against A using a "Next" table (as in Knuth-Morris-Pratt string matching):

  S:     /a   /b   /c   /*   /b   /e
  Next:  -1    0    0    0    1    2

Adv/Sub element overlap:
  Adv   Sub   Overlap
  *     *     Y
  *     t     Y
  t     *     Y
  t     t     Y
  t1    t2    N

Subscription Tree
- Subscriptions are maintained in a hierarchical tree.
- A child may have more than one parent.
- Siblings may intersect.
- If a publication does not match a node, it does not match any of that node's descendants.
[Figure: example subscription tree under ROOT with nodes /a, /a/c, /a/*/d, /a/c/d, /*/b, /a/b, /a/b/d, /b, /b/e, /b/e/c/f, d/a, /b/d, /b/d/a]

Tree Maintenance
- Insert
- Delete

Covering Algorithms
Similar to the Adv/Sub overlapping algorithms; cases:
- Absolute simple XPEs
- Relative simple XPEs
- XPEs with the // operator, e.g., S1 = /*/a//e/c, S2 = /a/a/*//c/e/c/d

Element-level covering (S1 covers S2):
  S1    S2    Cover
  *     *     Y
  *     t     Y
  t     *     N
  t     t     Y
  t1    t2    N

Merging Rules
- XPEs with one difference (e.g., an element or an operator): S1 = /a/*/c/d and S2 = /a/*/c/e merge into S = /a/*/c/*
- XPEs with different sub-XPEs: S1 = ……XPE1……, S2 = ……XPE2…… merge into S = ……//……
- Merge degree
[Figure: the merged subscription S is routed along P(S) in place of P(S1) and P(S2)]

Evaluation Setup
- Implemented in C++
- Overlay of 127 content-based routers
- Environments: cluster (each node: 1.86 GHz, 4 GB) vs. PlanetLab
- Workloads are generated from two DTDs: NITF and PSD
- Metrics: number of subscriptions per router, network traffic, XPE processing time, notification delay

Routing Table Size
[Figure: routing table size (# of XPath queries) vs. number of XPath queries; series: No Covering (Sets A and B), 50% Covering (Set A), 90% Covering (Set B)]

Routing Table Size
[Figure: routing table size vs. number of subscriptions; series: Covering (Set B), Perfect Merging (Set B), Imperfect Merging (Set B)]

Network Traffic
  Method                  Network Traffic   Delay (ms)
  No-Adv-No-Cov           654,871           97.82
  No-Adv-With-Cov         572,890           20.74
  With-Adv-No-Cov         398,810           98.09
  With-Adv-With-Cov       326,796           20.89
  With-Adv-With-CovPM     254,900           16.78
  With-Adv-With-CovIPM    257,567           12.24
  (Adv: advertisements; Cov: covering; PM/IPM: perfect/imperfect merging)

Process Time
[Figure: XPE processing time]

Notification Delay (PSD)
[Figure: notification delay, PSD workload]

Notification Delay (NITF)
[Figure: notification delay, NITF workload]

Related Work
- Locating data sources in large distributed systems [Galanis et al. 2003]: DHT-based approach; data summary
- Query aggregation for scalable data dissemination [Chan et al. 2002]: equivalence between the original query set and the aggregated set
- ONYX [Diao et al. 2004]: delivers parts of XML documents; shares common prefixes among queries using an NFA
- XTreeNet [Fenner et al. 2005]: unifies the pub/sub model and the query/response model; avoids repeated matching at each hop

Conclusions
- Investigate advertisement-based routing for XML data dissemination networks
- Propose a novel data structure to maintain covering and merging relationships among XPEs
- Perform an experimental evaluation on a 127-broker overlay to demonstrate the approach:
  - Reduce routing table size by up to 90%
  - Improve routing latency by roughly 85%

Future Work
- Extend to tree patterns
- Share common prefixes among XPEs in the overlapping and covering algorithms

Q&A
Contact: Middleware Systems Research Group, University of Toronto, www.msrg.eecg.toronto.edu

Process Time
[Figure: processing time (ms) vs. number of subscriptions]

Notification Delay (PSD)
[Figure: notification delay (ms) vs. number of hops]

False Positives
[Figure: false positives (%) vs. imperfect degree]

Conclusions
- Investigate advertisement-based routing for XML data dissemination networks
- Present algorithms to determine the covering relations among arbitrary XPEs
- Propose a novel data structure to maintain covering and merging relationships among XPEs
- Explore rules to merge similar XPEs in order to further reduce routing table size
- Perform an experimental evaluation on a 127-broker overlay to demonstrate the approach:
  - Reduce routing table size by up to 90%
  - Improve routing latency by roughly 85%
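The covering and merging rules listed in the conclusions can be sketched for simple XPEs (no // operator) in a few lines of Python. This is a minimal illustration, not the authors' algorithm. It applies the element-level Cover table from the Covering Algorithms slide (a wildcard in S1 covers anything; a tag in S1 covers only the same tag) and assumes that an absolute S1 must cover a prefix of an absolute S2, while a relative S1 may cover any contiguous run of steps in S2. When S1 is absolute and S2 is relative, it conservatively answers "not covered", which can miss a covering but never reports a false one. The merge function implements only the one-difference rule from the Merging Rules slide (/a/*/c/d and /a/*/c/e become /a/*/c/*). All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_xpe(xpe: str) -> Tuple[bool, List[str]]:
    """Split a simple XPE into (is_absolute, steps); '//' is out of scope here."""
    if "//" in xpe:
        raise ValueError("descendant operator '//' is not handled in this sketch")
    return xpe.startswith("/"), [step for step in xpe.split("/") if step]


def step_covers(s1_step: str, s2_step: str) -> bool:
    """Cover table: '*' covers anything; a tag covers only the identical tag."""
    return s1_step == "*" or s1_step == s2_step


def covers(s1: str, s2: str) -> bool:
    """True if every path matching s2 also matches s1 (under the stated assumptions)."""
    abs1, p1 = parse_xpe(s1)
    abs2, p2 = parse_xpe(s2)
    if abs1 and not abs2:
        return False  # conservative: a relative S2 may match below the root
    offsets = [0] if abs1 else range(len(p2) - len(p1) + 1)
    return any(
        i + len(p1) <= len(p2)
        and all(step_covers(p1[k], p2[i + k]) for k in range(len(p1)))
        for i in offsets
    )


def merge_one_difference(s1: str, s2: str) -> Optional[str]:
    """Merge two XPEs that differ in exactly one step by replacing that step with '*'."""
    abs1, p1 = parse_xpe(s1)
    abs2, p2 = parse_xpe(s2)
    if abs1 != abs2 or len(p1) != len(p2):
        return None
    diffs = [k for k in range(len(p1)) if p1[k] != p2[k]]
    if len(diffs) != 1:
        return None
    merged_steps = list(p1)
    merged_steps[diffs[0]] = "*"
    return ("/" if abs1 else "") + "/".join(merged_steps)


merged = merge_one_difference("/a/*/c/d", "/a/*/c/e")
assert merged is not None
print(merged)  # /a/*/c/*
print(covers(merged, "/a/*/c/d"), covers(merged, "/a/*/c/e"))  # True True
print(covers("/a/b", "/a/*"))  # False (row "t *" of the Cover table)
```

A broker holding S1 need not forward a newly arriving S2 that S1 covers. Note that the merged XPE also covers XPEs nobody subscribed to (for example, /a/*/c/f), so merging of this kind can let extra publications through.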