Transcript Powerpoint

Querying XML with Locator Semantics
Peter Fankhauser
joint work with:
Matthias Friedrich, Gerald Huck, Ingo Macherius, Jonathan Robie
GMD German National Research Center for Information Technology
Institute for Integrated Publication and Information Systems
GMD-IPSI
http://xml.darmstadt.gmd.de/
Slide 1
Overview
Requirements for Querying XML
XQL Overview
Locators
Locator Algebra
IPSI XML-Brokering Framework
Slide 2
General Requirements for Querying XML
(Excerpt from Dave Maier, W3C QL 98)
Require no schema
• flexibly match irregular structure
• preserve (irregular) structure
Query & Preserve Order and Association
• sibling order
• hierarchy
Precise Semantics
• rewrite rules
• compositional semantics
Closedness/Completeness
• XML to XML
• when is a QL for XML complete?
Slide 3
Running Example
Bookstore:
<books_and_customers>
<bookstore>
<fiction>
<sci-fi>
<book>
<isbn>0006482805</isbn>
<title>Do androids dream of electric sheep</title>
<author>Philip K. Dick</author>
</book>
</sci-fi>
<fantasy>
<mystery>
<book>
<isbn>0261102362</isbn>
<title>The two towers</title>
<author>JRR Tolkien</author>
</book>
</mystery>
</fantasy>
</fiction>
</bookstore>
<!-- Bookstore: non-uniform hierarchy (sci-fi: 2 levels, mystery: 3 levels) -->
<!-- Customers: flat table -->
<customers>
<customer>
<name>Jason Woolsey</name>
<boughtbooks>
<isbn>0261102362</isbn>
<isbn>0593488321</isbn>
</boughtbooks>
</customer>
<customer>
<name>P.W. Ellis</name>
<boughtbooks>
<isbn>0006482805</isbn>
<isbn>0261102362</isbn>
</boughtbooks>
</customer>
</customers>
</books_and_customers>
Slide 4
Functional Requirements for Querying XML
(Dave Maier, W3C QL 98)
Selection and Extraction:
• all sci-fi books by P.K. Dick
Reduction:
• drop all authors but 1st author
Combination:
• combine all books with their customers via isbn
Restructuring:
• return flat lists of title/author pairs
• and vice versa
Multidocument Handling:
• get reviews and books from different sites
• follow (dereference) links in books to authors
Slide 5
XQL Overview (State W3C QL 98)
Basic Concept: Selection of Subtrees
• Originated as QL for DOM
• adopted for selectors in XSL-templates
(now merged with XPointer to XPel to XPath to ????)
• Defined along search contexts = an (ordered) set of document nodes
Path Expressions and Filters:
• A query is essentially a navigation in element trees
• Navigation and filters modify the search context
• Query result is the last search context
Selection of nodes by:
• Element and attribute name
• Type (element, attribute, comment, etc.)
• Content or value of nodes
• Relationship between nodes: hierarchy, sequence, index
Combination by: union, intersection
Slide 6
XQL 98 Examples
Selection and Extraction:
• all books by P.K. Dick
//book[author="P.K. Dick"]
Reduction:
• drop all but 1st author
//*?/book?/(isbn | author[0] | title)
• * matches all elements along paths to book
• shallow return operator (?) retains nesting hierarchy
• union preserves document order (title before author)
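The selection and reduction examples can be approximated with the limited XPath subset in Python's standard library. This is only an analogy, not XQL 98: the author string is taken from the running example ("Philip K. Dick"), the helper name reduce_book is hypothetical, and the result is a flat list, so the nesting that the shallow return operator (?) retains is not reproduced here.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<bookstore>
  <fiction>
    <sci-fi>
      <book>
        <isbn>0006482805</isbn>
        <title>Do androids dream of electric sheep</title>
        <author>Philip K. Dick</author>
      </book>
    </sci-fi>
    <fantasy>
      <mystery>
        <book>
          <isbn>0261102362</isbn>
          <title>The two towers</title>
          <author>JRR Tolkien</author>
        </book>
      </mystery>
    </fantasy>
  </fiction>
</bookstore>
""")

# Selection: all books by one author at any depth,
# analogous to //book[author="..."].
dick_books = doc.findall(".//book[author='Philip K. Dick']")
titles = [book.findtext("title") for book in dick_books]

# Reduction: keep isbn, title and only the first author of a book.
# Iterating over the children keeps document order (title before author).
def reduce_book(book):
    authors = book.findall("author")
    kept = []
    for child in book:
        is_first_author = bool(authors) and child is authors[0]
        if child.tag in ("isbn", "title") or is_first_author:
            kept.append((child.tag, child.text))
    return kept

print(titles)
print(reduce_book(dick_books[0]))
```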
Slide 7
XQL 98 lacked:
Selection Functionality
• comparison operators for fulltext (in progress)
• regular path expressions for hierarchy (only // for recursive
descent and * for matching all nodes in a search context)
Restructuring
• Suggestions: return operators (SAG), XSLT (W3C), Application
Level (e.g. WebMethods)
Combination
• joins; Suggestions: see below
Graphs
• no navigation along ID/IDREF
• no multi-documents (dereferencing URIs)
• Suggestions: docref, ref, keyref, idref
Delegation
• external functions
• wrappers
Slide 8
Extended XQL Examples
Combination:
• combine all books with customers via isbn
$root//*?/book?[$i:=isbn]/
(* | $root//customer?[boughtbooks/isbn=$i])
• New concepts
• combination with nodes outside of search context ($root//customer)
• correlation variables for expressing join predicate [$i:=isbn]
• $root used for clarity...
• Irregular structure of bookstore is preserved
Multidocuments/Delegation:
• get multiple bookstores from a bookmark list (HTTP-GET)
docref('http://www.bookstores')/docref(.//@href)//bookstore
• the same with a form (HTTP-POST - simplified!)
docref('http://www.bookstores/search.cfm','country','us')//bookstore
• the same with a wrapper (application program delivering XML)
wrapper("bookstore")//bookstore
Slide 9
Towards a Datamodel for querying XML
XML Serialization: Structured Text
<document>
<person id="jonathanr">
<firstname>Jonathan</firstname>
<lastname>Robie</lastname>
</person>
<person id="joel">
<firstname>Joe</firstname>
<lastname>Lapp</lastname>
</person>
<!-- ... -->
</document>
W3C-DOM: Element Tree
OEM: Graph
[Figure: node-and-edge drawings of the element tree and the OEM graph. Recoverable labels: article, author, person, firstname, lastname, title ("XQL for Dummies"), year (1999), Jonathan Robie, Joe Lapp.]
Relational Tables (generic massive join option)
[Figure: relational schema with the tables DocumentTable (own_id, name, dtdref, root), DocElemTable (own_id, doc, up, succ, pred), FlatElemTable, NonFlatElemTable, attrRecTable (element, name, value), and a table of DTDs (own_id, name, etypes, config). Cell values are not recoverable.]
Locators: Lists of Paths
document
document.person
document.person.@id
document.person.@id."joel"
document.person.firstname
document.person.firstname."Joe"
document.person.lastname
document.person.lastname."Lapp"
document.person
document.person.@id
...
Slide 10
Locators for Bookstore
bookstore#1
bookstore#1.fiction#2
bookstore#1.fiction#2.sci-fi#3
bookstore#1.fiction#2.sci-fi#3.book#4
bookstore#1.fiction#2.sci-fi#3.book#4.isbn#5
bookstore#1.fiction#2.sci-fi#3.book#4.title#6
bookstore#1.fiction#2.sci-fi#3.book#4.author#7
…
bookstore#1.fiction#2.fantasy#8
bookstore#1.fiction#2.fantasy#8.mystery#9
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.isbn#11
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.title#12
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.author#13
...
Slide 11
Locators <-> XML Serialization
Locators are lists of paths
XML-document->Locators
• each element-node gets id in document-order (depth first, left to
right traversal)
• each element-node is located by the entire path from root
• attributes are attached to element-nodes
• content is attached to leaf nodes
Locators->XML-document:
• clean up: discard locators $prefix which are followed by at least
one locator $prefix.$postfix
• generate tree
(1) for all locators generate nested serialization
(2) fill up with content and attributes
Mappings should be total, 1:1
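Both mapping directions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IPSI implementation: a locator is modelled as a tuple of (name, id) components, content and attributes are left out, a single root element is assumed, and the function names to_locators, clean_up, to_xml and show are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

def to_locators(xml_text):
    """XML document -> locators: one locator per element node, in document order."""
    ids = itertools.count(1)   # ids follow a depth-first, left-to-right traversal
    locators = []

    def visit(elem, prefix):
        loc = prefix + ((elem.tag, next(ids)),)   # entire path from the root
        locators.append(loc)
        for child in elem:
            visit(child, loc)

    visit(ET.fromstring(xml_text), ())
    return locators

def clean_up(locators):
    """Discard each locator that is a proper prefix of another locator."""
    return [u for u in locators
            if not any(len(v) > len(u) and v[:len(u)] == u for v in locators)]

def to_xml(locators):
    """Locators -> XML document: generate the nested serialization (no content)."""
    nodes = {}
    root = None
    for loc in locators:
        for depth in range(1, len(loc) + 1):
            key = loc[:depth]
            if key in nodes:
                continue
            name = key[-1][0]
            if depth == 1:
                root = nodes[key] = ET.Element(name)
            else:
                nodes[key] = ET.SubElement(nodes[key[:-1]], name)
    return ET.tostring(root, encoding="unicode")

def show(loc):
    return ".".join(f"{name}#{i}" for name, i in loc)

doc = """<bookstore><fiction><sci-fi><book>
<isbn>0006482805</isbn>
<title>Do androids dream of electric sheep</title>
<author>Philip K. Dick</author>
</book></sci-fi></fiction></bookstore>"""

locs = to_locators(doc)
for loc in locs:
    print(show(loc))
```

Feeding to_xml(clean_up(locs)) back into to_locators yields the same locator list, which is the structural part of the total, 1:1 requirement.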
Slide 12
Locator Sets vs. Relations
Commonalities
• flat sets
• identity defined by identity of components
• concatenation to derive new locators/tuples
Differences
• arity
• locators: variable length
• tuples: fixed
• access to components:
• locators: by navigation
• tuples: by position/attribute
• data:
• locator components: document nodes
• tuple components: values
Slide 13
Locator Algebra (0)
Operator | Relational Algebra | Locator Algebra
∪, ∩, - | On tuple sets | On locator sets
Select | Selects tuples with a predicate | Selects locators with a predicate
Project | By absolute component selection | Not available; implicit projection by dependent join
Cross Product | Concatenate each tuple in one set with each tuple in another set | Dependent join concatenating locators from a context set with locators from a dependent set
Theta-Join | Combination of cross product with select | Combination of dependent join, select, and variable binding
Tree-Operators | Not applicable | DOM-methods
Slide 14
Locator Algebra (1)
Preliminaries
• L: domain of locator sets, with elements x, y
• PL: domain of locators, with elements u, v
• tail(u) … last component of u
• prefix(u) … u - tail(u)
Tree-Operators
• navigation in document tree using DOM methods
• root, parent, children: PL → L
• applied to locator sets from L using d-join (see below)
Set-Operators
• ∪, ∩, -: L × L → L
defined as usual
• order preservation due to total ordering on document nodes
Slide 15
Locator Algebra (2)
Select
• select[p]: L → L, where p: PL → Boolean
select[p](x) = {u | u ∈ x, p(tail(u))}
• Example: select[nodename(.) = "book"](x) =
select["book"](x)
Return
• Corresponds to project:
duplicates tail of locator for preserving it in
subsequent d-join (see below)
• return: PL → PL
return(u) = concat(u, tail(u))
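As a rough sketch of these two operators, locators can be modelled as Python tuples of "name#id" strings. That representation, the helper nodename and the name ret (chosen because return is a Python keyword) are assumptions made for illustration only.

```python
def tail(u):
    """Last component of locator u."""
    return u[-1]

def select(p, x):
    """select[p](x): keep the locators whose tail satisfies predicate p."""
    return [u for u in x if p(tail(u))]

def nodename(name):
    """Predicate testing the element name of a component such as 'book#4'."""
    return lambda component: component.split("#")[0] == name

def ret(u):
    """return(u) = concat(u, tail(u)): duplicate the tail of the locator."""
    return u + (tail(u),)

x = [
    ("bookstore#1", "fiction#2", "sci-fi#3", "book#4"),
    ("bookstore#1", "fiction#2", "sci-fi#3", "book#4", "isbn#5"),
    ("bookstore#1", "fiction#2", "sci-fi#3", "book#4", "title#6"),
]

books = select(nodename("book"), x)
print(books)
print(ret(books[0]))
```

Only the first locator survives the selection, because the predicate is applied to the tail and not to the components along the path.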
Slide 16
Locator Algebra (3)
Dependent-Join:
• d-join[f]: L → L, where f: PL → L
d-join[f](x) = ⋃_{u ∈ x} concat(prefix(u), f(tail(u)))
• Example: return all titles of books in their book context
select["title"](d-join[children(.)]
(select["book"](d-join[return(children(.))](x)))) =
/book?/title
Kleene Star:
• fixpoint-operator for recursive descent queries
• *[f]: L → L, where f: L → L
*[f](x) = f(x) ∪ *[f](f(x))
• Example: select all titles in their original context
select["title"](d-join[children(.)]
(*[d-join[return(children(.))](.)](x))) =
//*?/title
• maybe too general for physical algebra
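The two operators can be traced on a small tree with the following Python sketch. It follows one literal reading of the definitions, which is an assumption: return is applied to each child locator, so intermediate locators carry a duplicated tail that the next d-join consumes. The Node class, the artificial doc#0 root and all function names are hypothetical, and the star output is collected level by level without re-sorting into document order.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    nid: int
    kids: list = field(default_factory=list)

def tail(u):
    return u[-1]

def prefix(u):
    return u[:-1]

def ret(u):
    """return(u) = concat(u, tail(u))."""
    return u + (tail(u),)

def children(node):
    """Tree operator: one single-component locator per child node."""
    return [(kid,) for kid in node.kids]

def d_join(f, x):
    """d-join[f](x): union over u in x of concat(prefix(u), v) for v in f(tail(u))."""
    return [prefix(u) + v for u in x for v in f(tail(u))]

def step(x):
    """d-join[return(children(.))]: descend one level, keeping the child for later."""
    return d_join(lambda node: [ret(v) for v in children(node)], x)

def star(f, x):
    """*[f](x) = f(x) ∪ *[f](f(x)); the recursion ends once f(x) is empty."""
    fx = f(x)
    return fx + star(f, fx) if fx else []

def select(name, x):
    return [u for u in x if tail(u).name == name]

def show(u):
    return ".".join(f"{n.name}#{n.nid}" for n in u)

book = Node("book", 4, [Node("isbn", 5), Node("title", 6), Node("author", 7)])
bookstore = Node("bookstore", 1, [Node("fiction", 2, [Node("sci-fi", 3, [book])])])
doc = Node("doc", 0, [bookstore])

# //*?/title: all titles in their original context
titles = select("title", d_join(children, star(step, [(doc,)])))
print([show(u) for u in titles])
```

The final d-join without return replaces each duplicated tail by the children of that node, so the title keeps its full chain of ancestors below the starting context.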
Slide 17
Locator Algebra (4)
Varbind, Varget
• to realize joins across contexts
• varbind[i,f]: L → L, where i ∈ Name, f: PL → L
varbind[i,f](x):
for all u ∈ x: vars(u) := vars(u) ∪ {<i,v> | v ∈ f(tail(u))}
• varget[i]: PL → L
varget[i](u) = {v | <i,v> ∈ vars(u)}
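A minimal Python sketch of variable binding, applied to the bookstore join via isbn. Several simplifications are assumptions: locators are tuples of abbreviated "name#id" strings, the bound values are ISBN strings rather than isbn nodes, and the lookup tables ISBN and BOUGHT stand in for navigation in the document tree.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Loc:
    path: tuple                        # components, e.g. ("bs#1", "b#4")
    bindings: frozenset = frozenset()  # set of (variable name, value) pairs

def tail(u):
    return u.path[-1]

def varbind(i, f, x):
    """varbind[i,f](x): add a binding <i,v> to u for every v in f(tail(u))."""
    return [Loc(u.path, u.bindings | {(i, v) for v in f(tail(u))}) for u in x]

def varget(i, u):
    """varget[i](u): all values bound to variable i on locator u."""
    return {v for (name, v) in u.bindings if name == i}

# Hypothetical stand-ins for children(.) navigation in the running example.
ISBN = {"b#4": ["0006482805"], "b#10": ["0261102362"]}
BOUGHT = {
    "customer#15": {"0261102362", "0593488321"},
    "customer#20": {"0006482805", "0261102362"},
}

books = [
    Loc(("bc#0", "bs#1", "f#2", "sf#3", "b#4")),
    Loc(("bc#0", "bs#1", "f#2", "fa#8", "mi#9", "b#10")),
]
customers = [Loc(("cs#14", "customer#15")), Loc(("cs#14", "customer#20"))]

# Bind $i to the isbn of each book.
bound = varbind("$i", lambda node: ISBN[node], books)

# Join: concatenate each book locator with every customer
# whose bought isbns contain a value bound to $i.
joined = [
    Loc(u.path + c.path, u.bindings)
    for u in bound
    for c in customers
    if BOUGHT[tail(c)] & varget("$i", u)
]

for u in joined:
    print(".".join(u.path))
```

The bindings travel with the book locator, which is what lets the predicate on the customer side refer to a value from a different context.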
Slide 18
Join Example (1)
bc#0
$A=*[d-join[return(children(.))](.)](x)=
//*?
bc#0.bookstore#1
bc#0.bookstore#1.fiction#2
bc#0.bookstore#1.fiction#2.sci-fi#3
...
$B=select["book"](d-join[return(children(.))]($A))=
//*?/book
bc#0.bs#1.f#2.sf#3.b#4
bc#0.bs#1.f#2.fa#8.mi#9.b#10
...
$C=d-join[return(children(.))]($B)=//*?/book?/*
bc#0.bs#1.f#2.sf#3.b#4.isbn#5
bc#0.bs#1.f#2.sf#3.b#4.title#6
...
$D=varbind[$i,select["isbn"](children(.))]($B)=
//*?/book[$i:=isbn]?
bc#0.bs#1.f#2.sf#3.b#4<$i,isbn#5>
bc#0.bs#1.f#2.fa#8.mi#9.b#10<$i,isbn#11>
...
$E=select["customer"](d-join[children(.)]
(*[d-join[return(children(.))](.)](d-join[root(.)]($D))))
=//*?/customer
customers#14.customer#15
customers#14.customer#20
$F=d-join[select[
select["isbn"](d-join[children(.)]
(select["boughtbooks"](d-join[children(.)](.))))
= varget[$i](.)]($E)]($D)=
//*?/book[$i:=isbn]?/
(//*?/customer[boughtbooks/isbn=$i])
bc#0.bs#1.f#2.sf#3.b#4.cs#14.customer#20
bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#15
bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#20
Slide 19
Join Example (2)
<books_and_customers>
<bookstore>
<fiction>
<sci-fi>
<book>
<isbn>0006482805</isbn>
<title>Do androids dream of electric sheep</title>
<author>Philip K. Dick</author>
<customers>
<customer>
<name>P.W. Ellis</name>
<boughtbooks>
<isbn>0006482805</isbn>
<isbn>0261102362</isbn>
</boughtbooks>
</customer>
</customers>
</book>
</sci-fi>
<fantasy>
<mystery>
<book>
<isbn>0261102362</isbn>
<title>The two towers</title>
<author>JRR Tolkien</author>
<customers>
<customer>
<name>Jason Woolsey</name>
<boughtbooks>
<isbn>0261102362</isbn>
<isbn>0593488321</isbn>
</boughtbooks>
</customer>
<customer>
<name>P.W. Ellis</name>
<boughtbooks>
<isbn>0006482805</isbn> <isbn>0261102362</isbn>
</boughtbooks>
</customer>
</customers>
</book>
</mystery>
</fantasy>
</fiction>
</bookstore>
</books_and_customers>
Slide 20
Some Equivalence Transformations for L’Algebra
Commutativity:
• union(A,B) = union(B,A) (within single document)
• but d-join is not commutative
Associativity:
• union, intersect, d-join
Idempotence:
• union(A,A) = A
Distributivity:
• //book/(title | author) = //book/title | //book/author
Neutral Elements:
• union: {}
• d-join: $root(?)
Slide 21
Open Issues
Combination with relational algebra
Graphs/Multidocuments
• DAGs: Multiple paths from root-context to node (serialization?)
• Role of URIs in locators?
Typing
• Role of XSD (XML Schema Description)
• Inference
Constructors
• attribute to element and vice versa….
• Grouping, Skolems
Details
• Investigate conformance of locator concept to W3C Infoset
• Constraints on locators/mappings to guarantee wellformedness
Political
• XQL-Implementations shipping:
underlying semantics node-based, not locator-based
Slide 22
The IPSI XML Brokering Framework
[Figure: architecture diagram of the framework. Recoverable labels:]
• Visualization: HTML, CSS, XSL Processor
• URL+Queries (XQL), answered in XML
• Server (HTTP, URL)
• Program: DOM
• Queryprocessor: XML Query Language (XQL)
• Datamodel: Document Object Model (W3C-DOM)
• Persistent DOM, Warehouse
• Sources: HTTP/HTML, Robot, Generic Wrappers, Specific Wrappers, JEDI Framework
Slide 23
Wrappers
Jedi Framework for Wrappers
• Pivot Object Model
• Scripting language for control-flow
• Access to dynamic sources (ODBC, CORBA) with iterators
Generic Wrappers
• Generic Mapping of structured formats to XML
• Examples: SGML, XML, HTML, MS-RTF
Jedi Parser
• for irregularly formatted sources
• context free, attributed grammars
• fault-tolerant, efficient parser: unlimited lookahead, interpretation
of ambiguous, incomplete grammars by specificity ordering
HTTP-Access
• Access plans for delegation integrated with XQL Engine
Slide 24
Mediator: XQL Engine + Persistent DOM
XQL 98 Implementation
• efficient recursive descent queries by signature-index
+ Joins
+ Multi Document Handling
• extends XQL with external references (via http-get, http-post)
• Multidocument DOM; for every node namespace and URI
+ User defined functions
• input: context (reference-node-set, reference-node-pointer),
parameters: constants, XQL-expressions (lazy evaluation)
• output: node-functions, collection-functions (set of nodes),
comparison-operators
can attach base-URIs
• variables
Slide 25
Application 1: An XML Broker for Golfers
[Figure: the XML Broker combines three XML sources into one result document; diagram labels: XSL, Query, XML Broker]
Golf course source:
<golfplatz id="platz0001">
<adresse>
[...]
</adresse>
<policy>
...
</policy>
<handicap>
<wochentag>34</wochentag>
<wochenende>34</wochenende>
</handicap>
</golfplatz>
Weather source:
<www.wetter.de>
<wetter>
<plz>87724</plz>
<datum>981001</datum>
<temperatur>16</temperatur>
<regen>90</regen>
<wind>9</wind>
<prognose>13</prognose>
</wetter>
<!-- ... -->
</www.wetter.de>
Route source:
<www.reiseplanung.de>
<route>
<von>53757</von>
<nach>93333</nach>
<entfernung>481.9</entfernung>
<fahrzeit>274</fahrzeit>
<karte>5375793333.gif</karte>
</route>
<!-- ... -->
</www.reiseplanung.de>
Result of the XML Broker:
<golfdemo>
<golfplatz>
<adresse> ... </adresse>
<greenfee> ... </greenfee>
...
</golfplatz>
<wetter>
... </wetter>
<route>
... </route>
</golfdemo>
Slide 26
Application 2: RELIMO Integrating Bioinformatics Data
[Figure: architecture diagram. Recoverable labels:]
• XML Application (e.g. Office 2000)
• XML Browser (e.g. Mozilla 5)
• XSL Formatter (e.g. Lotus-XSL)
• XML Broker
• RELIBASE with XML RPC
• PDB as local PDOM
Slide 27
Application Data
XML Broker for Golfers
• Sources: www.golffuehrer.de (500 KB), www.wetter.de (200 KB),
www.routen-information.de (200 KB)
• Joins (via zip-code) ~ 2 to 3 secs
RELIMO (Germany)
• Sources: Relibase (XML-RPC), PDB (5 GB -> 25 MB XML, 30 MB
PDOM)
• response time (100 MB) 50 to 30000 ms
MIROWEB (ESPRIT)
• JEDI for importing several sources to Oracle 8
Shakespeare
• all plays
• 10 MB (Tests with duplicated data up to 0.5 GB)
Slide 28
Some Links & Acks
XQL FAQ
• http://metalab.unc.edu/xql/
IPSI XML Research & Development
• http://xml.darmstadt.gmd.de
• XQL-Engine 1.0.1 download (non-commercial use)
• JEDI download (non-commercial use)
XML Brokering Framework Licensing Info (Infonyte)
• [email protected]
• www.infonyte.com
Many thanks to
• Karl Aberer, Harald Schöning, Guido Mörkotte
Slide 29