
Routing of XML and XPath Queries in Data Dissemination Networks
Guoli Li, Shuang Hou, Hans-Arno Jacobsen
Middleware Systems Research Group
University of Toronto
ICDCS 2008 @ Beijing China
Agenda





Motivation
Advertisement-based routing
Covering
Evaluation
Conclusions
Motivation
[Figure: data sources send XML into a content-based data dissemination network; data users send queries and receive results]

Data sources: publish XML data
Data users: register XPath queries
The data dissemination network: deliver matching results to a large and dynamically changing group of users
Publish/Subscribe
[Figure: a publisher issues advertisements (DTD) and publications (XML); subscribers issue subscriptions (XPath)]

Matching of XMLs and XPaths [ICDE’06]
Matching of Advertisements and XPaths
Exploring relations among XPaths
Covering-based Routing
[Figure: covering-based routing example; labels 1 to 6]
Language Model

Advertisement: generated from DTDs
  Non-recursive advertisement
    e.g., A = /t1/t2/t3…/tn-1/tn
  Recursive advertisement
    Simple:   A = A1(A2)+A3
    Series:   A = A1(A2)+A3(A4)+A5
    Embedded: A = A1(A2(A3)+A4)+A5
DTD
<?xml encoding="UTF-8"?>
<!ELEMENT personnel (person)+>
<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT name ((family,given)|(given,family))>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA 'http://'>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED>

Advertisements
/personnel/person
/personnel/person/name
/personnel/person/name/family
/personnel/person/name/given
/personnel/person/email
/personnel/person/url
/personnel/person/link
……
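The step from the DTD to the advertisement paths can be sketched as a walk over the element hierarchy. This is a minimal sketch, assuming a non-recursive DTD that has already been reduced to a parent-to-children table. The names `DTD_CHILDREN` and `advertisement_paths` are illustrative and not part of the presented system, and whether the root-only path /personnel is advertised is an assumption.

```python
# Hypothetical parent -> children table, read off the personnel DTD.
DTD_CHILDREN = {
    "personnel": ["person"],
    "person": ["name", "email", "url", "link"],
    "name": ["family", "given"],
    "family": [], "given": [], "email": [], "url": [], "link": [],
}


def advertisement_paths(root, children):
    """Return every root-to-element path of a non-recursive DTD.

    Assumption: no element is defined recursively; a recursive DTD
    would make this walk loop forever.
    """
    paths = []

    def walk(element, prefix):
        path = f"{prefix}/{element}"
        paths.append(path)
        for child in children.get(element, []):
            walk(child, path)

    walk(root, "")
    return paths


for path in advertisement_paths("personnel", DTD_CHILDREN):
    print(path)
```

Recursive advertisements of the form A1(A2)+A3 would need a different representation and are outside this sketch.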
Language Model

Subscription: XPaths
  Absolute
    e.g., /c/d/*/e
  Relative
    e.g., c/d/*/e
  Descendant operators
    e.g., c//e/*/c
[Figure: tree representations of the example XPaths over the elements a, b, c, d, e and *]
Advertisement-based Routing
Advertisements stored at the broker:
  A1: /a/b/*/e
  A2: /b/e
  A3: /a/b/d
  A4: /a/b/e
  ……
[Figure: a subscription (S) arrives at the broker; diagrams compare the publication sets P(A) and P(S), including their intersection P(S) ∩ P(A)]
Overlapping Algorithms

Basic case:
  A = /a /b /c /* /b /c /* /b /e
  S = /a /b /c /* /b /e
  Next Table: -1 0 0 0 1 2
Other cases:
  e.g., S = /a /b //c /* /b //e
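The next table values for S are consistent with a Knuth-Morris-Pratt style prefix table in which elements are compared by wildcard-aware overlap rather than strict equality. The sketch below reproduces the table under that reading. It is an interpretation of the slide, not a statement of the authors' exact algorithm, and the function names are illustrative. Because wildcard overlap is not transitive, the search itself is written as a plain scan.

```python
def elements_overlap(x, y):
    """Two path elements overlap if either is '*' or they are equal."""
    return x == "*" or y == "*" or x == y


def next_table(pattern):
    """KMP-style next table using wildcard-aware comparison.

    next[i] is the length of the longest proper prefix of pattern[:i]
    that overlaps, element by element, a suffix of pattern[:i];
    next[0] is -1. Brute force, written for clarity rather than speed.
    """
    table = [-1]
    for i in range(1, len(pattern)):
        best = 0
        for k in range(i - 1, 0, -1):
            if all(elements_overlap(pattern[j], pattern[i - k + j])
                   for j in range(k)):
                best = k
                break
        table.append(best)
    return table


def first_overlap_offset(adv, sub):
    """Naive scan: first offset at which sub overlaps adv, else None."""
    for offset in range(len(adv) - len(sub) + 1):
        if all(elements_overlap(a, s) for a, s in zip(adv[offset:], sub)):
            return offset
    return None


A = "a b c * b c * b e".split()
S = "a b c * b e".split()
print(next_table(S))               # prints [-1, 0, 0, 0, 1, 2]
print(first_overlap_offset(A, S))  # prints 3
```

Under this reading, S fails against A at the sixth element when aligned at the root, and the table entry 2 for that position corresponds to shifting S by three elements, where it overlaps A.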
Adv   Sub   Overlap
*     *     Y
*     t     Y
t     *     Y
t     t     Y
t1    t2    N
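The element-level overlap table can be turned into a per-advertisement check that a broker might run when a subscription arrives. This is a minimal sketch for absolute simple XPEs against non-recursive advertisements only. It assumes the subscription is aligned with the advertisement from the root and is no longer than it. The subscription /a/*/d is hypothetical and does not come from the slides; A1 to A4 are the advertisements from the routing slide.

```python
def elements_overlap(adv_elem, sub_elem):
    """Overlap table: only two different concrete tags fail to overlap."""
    return adv_elem == "*" or sub_elem == "*" or adv_elem == sub_elem


def parse(path):
    """Split an absolute path such as '/a/b/*/e' into its elements."""
    return path.strip("/").split("/")


def absolute_overlap(adv, sub):
    """Assumption: the subscription is aligned with the advertisement
    from the root and must not be longer than the advertisement."""
    a, s = parse(adv), parse(sub)
    return len(s) <= len(a) and all(
        elements_overlap(x, y) for x, y in zip(a, s))


ADVERTISEMENTS = {
    "A1": "/a/b/*/e",
    "A2": "/b/e",
    "A3": "/a/b/d",
    "A4": "/a/b/e",
}
subscription = "/a/*/d"  # hypothetical subscription, not from the slides

for name, adv in ADVERTISEMENTS.items():
    print(name, absolute_overlap(adv, subscription))
```

With this subscription only A1 and A3 overlap, so in advertisement-based routing the subscription would be forwarded only toward the publishers behind those two advertisements.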
Subscription Tree




Subscriptions are maintained in a hierarchical tree
A child may have more than one parent
Siblings may intersect
If a publication does not match a node, it does not match any of the descendants
[Figure: subscription tree rooted at ROOT with nodes /a, /a/c, /a/*/d, /a/c/d, /*/b, /a/b, /a/b/d, /b, /b/e, /b/e/c/f, d/a, /b/d, /b/d/a; one edge is labeled "pointer"]
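The pruning property of the subscription tree (a publication that fails a node cannot match its descendants) can be sketched as follows. This is a minimal sketch for absolute simple XPEs only. The `Node` class and function names are illustrative, the matching rule is an assumed prefix semantics, and the edges are inferred from the covering relation between the slide's nodes rather than copied from the figure.

```python
from dataclasses import dataclass, field


def parse(path):
    return tuple(path.strip("/").split("/"))


def matches(xpe, pub_path):
    """Assumption: an absolute simple XPE matches a publication path when
    its elements match a prefix of that path ('*' matches any element)."""
    return len(xpe) <= len(pub_path) and all(
        x == "*" or x == p for x, p in zip(xpe, pub_path))


@dataclass
class Node:
    xpe: tuple
    children: list = field(default_factory=list)


def matching_subscriptions(top_level, pub_path):
    """Walk the tree, skipping descendants of any node that does not match."""
    found, seen, stack = [], set(), list(top_level)
    while stack:
        node = stack.pop()
        if id(node) in seen:  # a child may be reachable from several parents
            continue
        seen.add(id(node))
        if matches(node.xpe, pub_path):
            found.append("/" + "/".join(node.xpe))
            stack.extend(node.children)
    return sorted(found)


# A fragment of the slide's tree; /a/c/d has two parents.
acd = Node(parse("/a/c/d"))
a = Node(parse("/a"), [Node(parse("/a/*/d"), [acd]), Node(parse("/a/c"), [acd])])
b = Node(parse("/b"), [Node(parse("/b/e"), [Node(parse("/b/e/c/f"))])])

print(matching_subscriptions([a, b], parse("/a/c/d")))
```

For the publication path /a/c/d the whole /b subtree is skipped after a single failed comparison at /b.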
Tree Maintenance


Insert
Delete
Covering Algorithms

Similar to Adv-Sub overlapping algorithms
  Absolute simple XPEs
  Relative simple XPEs
XPEs with // operator
  e.g., S1 = /* /a //e /c
        S2 = /a /a /* //c /e /c /d
  [Figure: S1 and S2 compared segment by segment: /* /a /e /c; /a /a /*//c; /*//c /e /c /d]

S1    S2    Cover
*     *     Y
*     t     Y
t     *     N
t     t     Y
t1    t2    N
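The cover table differs from the overlap table in one row: a concrete tag in S1 does not cover a wildcard in S2. A minimal sketch of the resulting check for absolute simple XPEs follows. It assumes S1 must be no longer than S2 and cover it element by element from the root. Relative XPEs and XPEs with the // operator are not handled, and the function names are illustrative.

```python
def element_covers(e1, e2):
    """Cover table: '*' covers anything; a tag covers only the same tag."""
    return e1 == "*" or e1 == e2


def parse(path):
    return path.strip("/").split("/")


def covers(s1, s2):
    """True if absolute simple XPE s1 covers absolute simple XPE s2.

    Assumption: s1 is no longer than s2 and covers it element by element
    from the root. XPEs containing '//' and relative XPEs are not handled.
    """
    p1, p2 = parse(s1), parse(s2)
    return len(p1) <= len(p2) and all(
        element_covers(x, y) for x, y in zip(p1, p2))


print(covers("/a/*/d", "/a/c/d"))  # prints True
print(covers("/a/c/d", "/a/*/d"))  # prints False
print(covers("/a", "/a/b/d"))      # prints True
```

Unlike overlap, covering is not symmetric, which is what allows covered subscriptions to be placed below their covering subscription in the tree.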
Merging Rules

Rules
  XPEs with one difference (e.g., element, op)
    e.g., S1 = /a/*/c/d   S2 = /a/*/c/e
          S  = /a/*/c/*
  XPEs with different sub-XPEs
    e.g., S1 = …… XPE1 ……
          S2 = …… XPE2 ……
          S  = …… // ……
Merge degree
  [Figure: P(S) compared with P(S1) and P(S2)]
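The first merging rule can be sketched for the element case as follows. This is a minimal sketch, assuming two absolute simple XPEs of equal length that differ in exactly one element. The function name is illustrative, and differences in operators or in whole sub-XPEs are not covered.

```python
def parse(path):
    return path.strip("/").split("/")


def merge_one_difference(s1, s2):
    """Merge two absolute simple XPEs that differ in exactly one element.

    Returns the merged XPE with '*' at the differing position, or None
    when the rule does not apply (different lengths, or not exactly one
    differing element).
    """
    p1, p2 = parse(s1), parse(s2)
    if len(p1) != len(p2):
        return None
    diff = [i for i, (x, y) in enumerate(zip(p1, p2)) if x != y]
    if len(diff) != 1:
        return None
    merged = list(p1)
    merged[diff[0]] = "*"
    return "/" + "/".join(merged)


print(merge_one_difference("/a/*/c/d", "/a/*/c/e"))  # prints /a/*/c/*
```

The merged XPE /a/*/c/* also matches paths such as /a/x/c/f that neither S1 nor S2 matches, so P(S) can be larger than the union of P(S1) and P(S2); such a merge is imperfect and can introduce false positives.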
Evaluation

Setup
  Implemented in C++
  Overlay with 127 content-based routers
  Cluster (each node: 1.86 GHz, 4 GB) vs. PlanetLab
  Workloads are generated from two DTDs: NITF and PSD
Metrics
  Number of subscriptions per router
  Network traffic
  XPE processing time
  Notification delay
Routing Table Size
[Figure: x-axis: Number of XPath Queries (0 to 100000); y-axis: Routing Table Size (# of XPath Queries) (0 to 100000); series: No Covering (Set A and B), 50% Covering (Set A), 90% Covering (Set B)]
Routing Table Size
[Figure: x-axis: Number of Subscriptions (0 to 100000); y-axis: Routing Table Size (0 to 50000); series: Covering (Set B), Perfect Merging (Set B), Imperfect Merging (Set B)]
Network Traffic
Method                  Network Traffic   Delay (ms)
No-Adv-No-Cov           654,871           97.82
No-Adv-With-Cov         572,890           20.74
With-Adv-No-Cov         398,810           98.09
With-Adv-With-Cov       326,796           20.89
With-Adv-With-CovPM     254,900           16.78
With-Adv-With-CovIPM    257,567           12.24
Process Time
Notification Delay (PSD)
Notification Delay (NITF)
Related Work

Locating data sources in large distributed systems [Galanis et al. 2003]
  DHT based approach
  Data summary
Query aggregation for scalable data dissemination [Chan et al. 2002]
  Equivalence between the original query set and the aggregated set
ONYX [Diao et al. 2004]
  Deliver part of the XML documents
  Share common prefixes among queries using NFA
XTreeNet [Fenner et al. 2005]
  Unify the pub/sub model and the query/response model
  Avoid repeatedly matching at each hop
Conclusions



Investigate advertisement-based routing for XML data dissemination networks
Propose a novel data structure to maintain covering & merging relationships among XPEs
Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
  Reduce routing table size by up to 90%
  Improve routing latency by roughly 85%
Future work
  Extend to tree patterns
  Share common prefixes among XPEs in overlapping and covering algorithms
Q&A


Contact
Middleware Systems Research Group, University of Toronto
www.msrg.eecg.toronto.edu
Process Time
[Figure: x-axis: Number of Subscriptions (500 to 5000); y-axis: Time (ms) (0 to 140)]
Notification Delay (NITF)
Notification Delay (PSD)
[Figure: x-axis: Number of Hops (2 to 6); y-axis: Notification Delay (ms) (0 to 16)]
False Positives
[Figure: x-axis: Imperfect Degree (0 to 0.2); y-axis: False Positive (%) (0 to 8)]
Conclusions





Investigate advertisement-based routing for XML data dissemination networks
Present algorithms to determine the covering relations among arbitrary XPEs
Propose a novel data structure to maintain covering & merging relationships among XPEs
Explore rules to merge similar XPEs in order to further reduce the routing table size
Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
  Reduce routing table size by up to 90%
  Improve routing latency by roughly 85%