CBU Summer School '07: Querying XML
W3C XML Query
How to access various XML data sources?
XQuery, XML Query Lang, W3C Rec, Jan '07
– joint work by XML Query and XSL WGs
» with XPath 2.0 and XSLT 2.0
– influenced by many research groups and query
languages
» Quilt, XPath, XQL, XML-QL, SQL, OQL, Lorel, ...
– A query language for any XML-represented data:
both documents and databases
CBU 2007
XQuery and XPath
Functional Requirements
(1)
Support operations (selection, projection, aggregation,
sorting, etc.) on all data types:
– Choose data based on content or structure
– Operate on document hierarchy and order
Structural preservation and transformation:
– Preserve relative hierarchy and sequence of input structures
– Transform XML structures, and create new ones
Combining and joining:
– Combine data from different parts of a document, or from multiple
documents
Functional Requirements
(2)
Closure property:
– Results of XML queries are also XML
(well-formed document fragments)
-> queries can be combined without limit
Extensibility:
– should support externally defined functions on all data
types of the data model
XQuery in a Nutshell
Functional expression language
Strongly-typed: (optional) type-checking of expressions, and
validation of results (we'll concentrate on processing)
– predeclared prefix for type names:
xs=http://www.w3.org/2001/XMLSchema
Extends XPath 2.0
– XQuery 1.0 and XPath 2.0 Functions and Operators, Rec. Jan.
2007
XQuery ≈ XPath 2.0 + XSLT' + SQL' (roughly)
Example Query
xquery version "1.0";
<cheapBooks>
<Title>Cheap Books</Title>
{ for $b in fn:doc("bib.xml")//book[@price < 50]
order by $b/title
return $b }
</cheapBooks>
Syntax "concise and easily understood"
XML-based syntax (XQueryX) also specified
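The example query can be mimicked in Python to show what each clause contributes. This is only a sketch of the query's logic, not of how an XQuery processor works; the inline `BIB` data and the function name `cheap_books` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml (illustrative data only).
BIB = """
<bib>
  <book price="65.95"><title>TCP/IP Illustrated</title></book>
  <book price="40.00"><title>Designing Internet applications</title></book>
  <book price="26.50"><title>Computing with Logic</title></book>
</bib>
"""

def cheap_books(bib_xml, limit=50):
    root = ET.fromstring(bib_xml)
    # for $b in fn:doc("bib.xml")//book[@price < 50]
    cheap = [b for b in root.iter("book") if float(b.get("price")) < limit]
    # order by $b/title
    cheap.sort(key=lambda b: b.findtext("title"))
    # element constructor: <cheapBooks><Title>...</Title>{ ... }</cheapBooks>
    result = ET.Element("cheapBooks")
    ET.SubElement(result, "Title").text = "Cheap Books"
    result.extend(cheap)
    return result

titles = [b.findtext("title") for b in cheap_books(BIB).iter("book")]
print(titles)
```

The list comprehension plays the role of the `for` clause with its predicate, `sort` the role of `order by`, and building the `cheapBooks` element the role of the element constructor.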
A possible result
<?xml version="1.0" encoding="UTF-8"?>
<cheapBooks>
<Title>Cheap Books</Title>
<book price="26.50">
<title>Computing with Logic</title>
<author>David Maier</author>
<publisher>Benjamin Cummings</publisher>
<year>1999</year>
</book>
<book price="40.00">
<title>Designing Internet applications</title>
<author>Michael Leventhal</author>
<publisher>Prentice Hall</publisher>
<year>1998</year>
</book>
</cheapBooks>
XQuery and XPath
XQuery (1.0) is an extension of XPath (2.0)
– Common data model, functions and operators
-> study some XPath first
XPath used in several other contexts, too:
– For uniqueness constraints in XML Schema
– In validation rules of Schematron
– For pattern matching and selection in XSLT
– For addressing in XLink and XPointer
XPath in a Nutshell
XPath 1.0 (W3C Rec. 11/99)
– a compact non-XML syntax for addressing parts of
XML documents (as node-sets in XPath 1.0)
– also typical operations on strings, numbers and truth
values
XPath 2.0 (W3C Rec. 1/07) extends and
generalizes:
– data manipulated as sequences of items
» Item = a node or an atomic value of a simple XML
Schema datatype
XPath 1.0 vs 2.0
XPath 2.0 more elegant and complete than 1.0
Also more complex (length of specs in pages):
XPath 1.0        ~ 30
Total            ~ 30

Data Model       ~ 80
XPath 2.0        ~100
Funcs & opers    ~160
Total            ~340
XQuery/XPath/XSLT Data Model
Documents are viewed as trees
made of six types of nodes:
– root (additional parent of document element)
– element nodes
– attribute nodes
– text nodes
– comments and processing instructions
Obs 1: No entity nodes
Obs 2: No namespace nodes
(XPath/XSLT 1.0 contains them)
Document trees
Defined in Sect. 5 of XPath 1.0 spec
– for XSLT/XPath 2.0 & XQuery in their joint Data Model
Element nodes have elements, text nodes,
comments and processing instructions of their
(direct) content as their children
– NB: attribute nodes are not children (but have a parent)
-> they have no siblings either
– the string value of a document or element node is the
concatenation of all its text-node descendants
Document Order
Document order of nodes:
– = the depth-first traversal order
– Root first
– Other nodes in the order of the first character of their
XML markup in the document text
-> an element precedes its attribute nodes, which
precede any content nodes of the element
– Implementation dependent between nodes belonging to
different trees
XPath trees: Example
<article>Written by
<fig
file="pekka.jpg"
caption="The Lecturer" />
the lecturer.
</article>
[Figure: the document tree of the example; each node is shown with its type, name and string value. Nodes in document order:]
1st: root, name "", value "Written by the lecturer."
2nd: element, name "article", value "Written by the lecturer."
3rd: text, name "", value "Written by "
4th: element, name "fig", value ""
5th or 6th: attribute, name "file", value "pekka.jpg"
5th or 6th: attribute, name "caption", value "The Lecturer"
7th: text, name "", value " the lecturer."
XQuery/XPath Sequences
Expressions operate on, and return, sequences of
– atomic values (of XML Schema simple types) and
– nodes
– an item ≡ a singleton sequence
– sequences are flat: no sequences as items
» (1, (2, 3), (), 1) = (1, 2, 3, 1)
– sequences are ordered, and can contain duplicates
Unlimited combination of expressions, often with automatic
type conversions (e.g. for arithmetic)
Sequence Expressions
Constant sequences constructed by listing
values
– comma (,) is a catenation operator
» (1, (2, 3), (), 1) = (1, 2, 3, 1)
Shorthands for numeric sequences:
– 1 to 4
-> (1, 2, 3, 4)
– 4 to 1
-> ()
– fn:reverse(1 to 4) -> (4, 3, 2, 1)
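The flattening rule and the `to` shorthand can be modelled in a few lines of Python. This is a sketch under the assumption that a Python tuple stands for an XQuery sequence; the helper names `seq` and `to` are hypothetical.

```python
def seq(*items):
    """Build a flat sequence: nested sequences (tuples) are spliced in."""
    flat = []
    for item in items:
        if isinstance(item, tuple):
            flat.extend(seq(*item))   # no sequences as items
        else:
            flat.append(item)
    return tuple(flat)

def to(lo, hi):
    """Model of 'lo to hi': the empty sequence when lo > hi."""
    return tuple(range(lo, hi + 1))

print(seq(1, seq(2, 3), seq(), 1))   # the comma operator concatenates
print(to(1, 4), to(4, 1))
print(tuple(reversed(to(1, 4))))     # counterpart of fn:reverse(1 to 4)
```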
Location Paths
XPath can select any parts of a document
tree using …
Location paths
– evaluated with respect to a context item
» in XQuery typically starting from $x or doc(…)
– Result: a sequence of nodes in document
order, without duplicates
Location paths
Consist of location steps separated by '/'
– each step produces a sequence of items
– steps evaluated left-to-right,
each item in turn as the context item
Complete location step:
AxisName:: NodeTest ([PredicateExpr])*
– axis specifies the tree relationship between the context
node and the selected nodes
– node test restricts the type and name of nodes
– filtered further by 0 or more predicates
Location steps: Axes
In total 12 axes (~ directions in tree)
– for staying at the context node: self
– for going downwards:
» child, descendant, descendant-or-self
– for going upwards:
» parent, ancestor, ancestor-or-self
– for moving towards start/end of the document:
» preceding-sibling, following-sibling,
preceding, following
– “Special” axes
» attribute; (+ namespace in XPath 1.0)
– Only child, descendant, attribute, self, descendant-or-self, and parent are mandatory in XQuery
XPath Axes and Their Orientation
Ordinary axes oriented away from context node
(attribute and namespace axes are unordered)
– the position() for the closest node = 1
– for the most remote node, position() = last()
[Figures: tree diagrams numbering the nodes selected by each axis with their position() relative to the context node]
– self:: (the simplest axis): the context node itself, position 1
– parent:: (exists for every node except the root): position 1
– ancestor:: and ancestor-or-self:: numbered upwards, closest node first
– child::, descendant:: and descendant-or-self:: numbered in document order
– preceding-sibling:: and preceding:: numbered backwards, closest node first
– following-sibling:: and following:: numbered in document order
Location paths: Node tests
Node tests (slightly simplified)
– Name: any element node with name Name
(on an attribute axis, any attribute node with name Name)
– *: any element (any attribute node on an attribute axis)
– text(): any text node
» comment(): any comment node
» processing-instruction(): any processing instruction
– node(): any node of any type
Location paths: Abbreviations
Abbreviations in location steps
– 'child::' can be omitted
– 'attribute::' can be shortened to '@'
– 'parent::node()' can be shortened to '..'
– Predicate '[position()=n]' for testing
occurrence position n can be shortened to '[n]'
– '/descendant-or-self::node()/'
shortened to '//'
Notes on Location Paths (1)
XPath 2.0 allows unrestricted expressions as steps
– but steps except the last must produce nodes only
Numeric predicates support array-style access:
$rows[$i]
Predicates are evaluated one step at a time. This often causes
confusion with shorthand notations:
– doc("doc.xml")//title[3]
selects the third title child of each parent (likely none!). Why?
– It is short for doc("doc.xml")/
descendant-or-self::node()/child::title[3]
– To get the third title in the doc use
(doc("doc.xml")//title)[3]
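The difference between the two forms can be tried out with Python's `xml.etree.ElementTree`, whose limited XPath subset also applies a positional predicate per parent. The sample document below is made up for illustration; indexing the result list stands in for the parenthesized form `(...)[3]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three sections, each with a single title.
doc = ET.fromstring(
    "<doc>"
    "<sect><title>A</title></sect>"
    "<sect><title>B</title></sect>"
    "<sect><title>C</title></sect>"
    "</doc>"
)

# //title[3]: third title child of each parent; no parent has three titles.
per_parent = doc.findall(".//title[3]")

# (//title)[3]: collect all titles first, then take the third one.
all_titles = doc.findall(".//title")
third_in_doc = all_titles[2]

print(len(per_parent), third_in_doc.text)
```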
Notes on Location Paths (2)
References to attributes and subelements
easy to use as predicates
– Get divisions that are of class C or have a head:
doc("doc.xml")//div[@class="C" or head]
– Values are coerced to Booleans on demand
» string/sequence true iff non-empty
» number true iff not zero
(but a single number as predicate tests for equality
with position())
Semantics of Location Paths (example)
*/node()/parent::B[child::A]
[Figure: example tree with nodes numbered 1 to 8 and labelled A, B, C or "txt"; node 1 is the context node]
values after each step: {2, 5, 7}, {3, 4, 6, 8}, {2, 5, 7}
final value: {2}
Filter Expressions
Location steps can be filtered by predicates:
./(chap | app)[fn:last()]/title
– (chap | app) is an XPath 2.0 extended step
– selects the title of the last chapter or appendix, whichever is last
Other sequences, too:
(1 to 20)[. mod 5 eq 0]
→ (5, 10, 15, 20)
– ('.' generalized from XPath 1.0 shorthand for
self::node() into the context item)
Further XPath Expressions
Double-precision floating-point arithmetic
+, -, *, div, mod (same as % in Java)
» e.g. 2.3 mod 1.1 ≈ 0.1
Functions for rounding and truncating:
floor(x), ceiling(x), round(x)
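For readers more used to Python than Java: Python's `%` takes the sign of the divisor, whereas `mod` (like Java's `%`) takes the sign of the dividend. A small sketch, assuming double operands and a non-zero divisor; `xpath_mod` is a hypothetical helper name.

```python
import math

def xpath_mod(a, b):
    """Remainder with the sign of the dividend, as for Java's % on doubles."""
    return math.fmod(a, b)

print(xpath_mod(2.3, 1.1))          # approximately 0.1
print(xpath_mod(-5, 2), -5 % 2)     # sign of dividend vs. sign of divisor
```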
Set Operations on Node (!) Sequences
Assume variable bindings:
[Figure: $s1 and $s2 bound to sequences of nodes drawn from a, b, c, d, e]
Then:
$s1 union $s2 = [shown in the figure]
$s1 intersect $s2 = c
$s1 except $s2 = a b
based on node identity
(node1 is node2)
Node Comparisons
To compare single nodes,
– for identity: is
$book//chap[@id="ch1"] is ($book//chap)[1]
true iff the chapter with id="ch1" is indeed the first
– for document order: << and >>
$book//chap[@id="ch2"] >>
$book//title[. eq "Intro"]
true iff the chapter with id="ch2" appears after
<title>Intro</title>
Comparing values of sequences and items
General comparisons between sequences:
– =, !=, <, <=, >, >=
– existential semantics: true iff some pair of values from
operand sequences satisfies the condition
» (1,2) = (2,3); (2,3) = (3,4); (1,2) != (3,4) (all true)
» Same as in XPath 1.0:
//book[author = "Aho"]
→ books where some author is Aho
Value comparisons between single values:
– eq, ne, lt, le, gt, ge
» 1 eq 3 - 2; 10 lt 20; $books[@price le 100]
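The existential semantics of general comparisons amounts to an "any pair" test. The sketch below models it in Python on tuples of plain values (type conversions are ignored); the function name `general` is hypothetical.

```python
import operator

def general(op, left, right):
    """True iff some pair (x, y) from the two sequences satisfies op."""
    return any(op(x, y) for x in left for y in right)

print(general(operator.eq, (1, 2), (2, 3)))   # the pair (2, 2) matches
print(general(operator.ne, (1, 2), (1, 2)))   # the pair (1, 2) differs
print(general(operator.eq, (), (1,)))         # no pairs at all
```

One consequence worth noting: `=` and `!=` on sequences are not each other's negation, since both can be true for the same operands and both are false when either operand is empty.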
Accessing Documents
XQuery operates on nodes accessible by input
functions
– fn:doc("URI")
» document-node of the XML document available at URI
» same as document("URI") in XSLT 1.0
– fn:collection("URI")
» sequence of nodes from URI
– predeclared prefix for the default function namespace:
fn=http://www.w3.org/2005/xpath-functions
XQuery over XPath
A query is an expression
XQuery adds to XPath expressions
– Element constructors (≈ XSLT templates)
– FLWOR expressions
(”flower”; for-let-where-order by-return)
Central XQuery Expressions
Also in XPath 2.0:
– Path expressions
– Sequence expressions
– Comparison operators
– Quantified expressions
(some/every $var in … satisfies …)
Added by XQuery:
– Element constructors (≈ XSLT templates)
– FLWOR expressions
(”flower”; for-let-where-order by-return)
and others, in examples ...
Element Constructors
Similar to XSLT templates:
– start and end tag enclosing the content
– literal fragments written directly,
expressions enclosed in braces { and }
≈ XSLT 1.0 attribute value templates
often used inside another expression that binds
variables used in the element constructor
– (There is no 'current node' in XQuery)
– See next
Example
An emp element with an empid attribute and child
elements name and job, from values in variables
$id, $n, and $j:
<emp empid="{$id}">
<name>{$n}</name>
<job>{$j}</job>
</emp>
Also computed constructors:
element {"emp"} {
attribute {"empid"}{$id},
<name> {$n} </name>,
<job> {$j} </job> }
FLWOR ("flower") Expressions
Constructed from for, let, where, order by and
return clauses (~SQL select-from-where)
Syntax: (ForClause | LetClause)+
WhereClause?
OrderByClause?
"return" Expr
FLWOR binds variables to values, and uses these
bindings to construct a result
(an ordered sequence of items)
XPath 2.0 has a simpler "for-return"
Flow of data in a FLWOR expression
for clauses
for $V1 in Exp1 (,
$V2 in Exp2, …)
– associates each variable Vi with expression Expi
(e.g. a path expression)
Result: list of tuples, each containing a binding for
each of the variables
can be thought of as loops iterating over the items
returned by respective expressions
Example: for clause
for $i in (1,2),
$j in (1 to $i)
return <tuple>
<i>{$i}</i> <j>{$j}</j></tuple>
Result:
<tuple><i>1</i><j>1</j></tuple>
<tuple><i>2</i><j>1</j></tuple>
<tuple><i>2</i><j>2</j></tuple>
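The tuple stream produced by the two `for` variables in this example corresponds to a nested loop, which a Python comprehension expresses directly. This is only an analogy for the binding order, with plain strings standing in for the constructed elements.

```python
# for $i in (1,2), $j in (1 to $i): the inner range depends on $i
tuples = [(i, j) for i in (1, 2) for j in range(1, i + 1)]

# return clause: instantiated once per binding tuple
rendered = ["<tuple><i>%d</i><j>%d</j></tuple>" % t for t in tuples]

for line in rendered:
    print(line)
```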
let clauses
let also binds variables to expressions
– each variable gets the entire sequence as its value
(without iterating over the items of the sequence)
– results in binding a single sequence for each variable
Compare:
– for $b in doc("bib.xml")//book
-> many bindings (to single books)
– let $bl := doc("bib.xml")//book
-> a single binding (to sequence of books)
Example: let clauses
let $s := (<one/>, <two/>, <three/>)
return <out> {$s} </out>
Result:
<out>
<one/>
<two/>
<three/>
</out>
Compare:
for $s in (<one/>,<two/>,<three/>)
return <out> {$s} </out>
Result:
<out><one/></out>
<out><two/></out>
<out><three/></out>
for/let clauses
A FLWOR expr may contain several fors and lets
– each may refer to variables bound in previous clauses
the result of the for/let sequence:
– an ordered list of tuples of bound variables
– number of tuples = product of the cardinalities of the
sequences returned by the for expressions
(when these expressions do not depend on each other)
where clause
binding tuples generated by for and let clauses
are filtered by an optional where clause
– tuples with a true condition are used to instantiate the
return clause
the where clause may contain several predicates
connected by and, or, and fn:not()
– usually refer to the bound variables
– sequences as Booleans (similarly to node-sets in
XPath 1.0): empty ~ false; non-empty ~ true
where clause
for binds variables to single items
-> value comparisons, e.g. $color eq "red"
let binds variables to whole sequences
-> general comparisons, e.g. $colors = "red"
(~ some $c in $colors satisfies $c eq "red")
– a number of aggregation functions available:
avg(), sum(), count(), max(), min()
(sum() and count() also in XPath 1.0)
return clause
The return clause generates the output of the
FLWOR expression
instantiated once for each binding tuple
often contains element constructors, references to
bound variables, and nested sub-expressions
Example: for + return
for $i in (1,2),
$j in (1 to $i)
return <tuple>
<i>{$i}</i> <j>{$j}</j></tuple>
Result:
<tuple><i>1</i><j>1</j></tuple>
<tuple><i>2</i><j>1</j></tuple>
<tuple><i>2</i><j>2</j></tuple>
Positional variables: 'at'
For items, can also get their position in the seq:
for $char at $i in ("a", "b", "c")
return concat($i, ".", $char, ";")
-> 1.a;2.b;3.c;
Could pair items by their position:
let $boys:= doc("kids.xml")//boy,
$girls:= doc("kids.xml")//girl
for $b at $i in $boys
where $i le count($girls)
return <pair>{ $b, $girls[$i] }</pair>
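Positional variables behave much like `enumerate` with a 1-based start in Python. The sketch below mirrors both examples; the names in `boys` and `girls` are made-up stand-ins for the elements of kids.xml.

```python
chars = ("a", "b", "c")
# for $char at $i in ("a", "b", "c") return concat($i, ".", $char, ";")
labels = [f"{i}.{c};" for i, c in enumerate(chars, start=1)]

boys = ["Al", "Bo", "Cy"]      # hypothetical data
girls = ["Di", "Eve"]          # hypothetical data
# for $b at $i in $boys where $i le count($girls) return pair of $b and $girls[$i]
pairs = [(b, girls[i - 1]) for i, b in enumerate(boys, start=1) if i <= len(girls)]

print("".join(labels))
print(pairs)
```

The `where` condition keeps the pairing from running past the shorter list, so the third boy is left unpaired.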
Examples (modified from "XML Query Use Cases")
Assume: a document named ”bib.xml”
containing a list of books:
<book>+
<title>
<author>+
<publisher>
<year>
<price>
List Morgan Kaufmann book titles since 1998
<recent-MK-books> {
for $b in doc("bib.xml")//book
where $b/publisher = "Morgan Kaufmann"
and $b/year >= 1998
return <book year="{$b/year}">
{$b/title}
</book>
} </recent-MK-books>
Result could be...
<recent-MK-books>
<book year="1999">
<title>TCP/IP Illustrated</title>
</book>
<book year="2000">
<title>Advanced Programming in the Unix environment</title>
</book>
</recent-MK-books>
Publishers with avg price of their books:
(fn:distinct-values gives the string values of the sequence,
without duplicates)
for $p in fn:distinct-values(
fn:doc("bib.xml")//publisher )
let $a := avg( doc("bib.xml")//book[
publisher = $p]/price )
return <publisher>
<name> {$p} </name>
<avgprice> {$a} </avgprice>
</publisher>
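The distinct-values/let pattern of the publisher query can be sketched in Python as follows. The inline `BIB` data and the function name are made up for illustration, and keeping publishers in first-occurrence order is an assumption of this sketch (the order of distinct-values results is up to the implementation).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml (illustrative data only).
BIB = """
<bib>
  <book><publisher>Addison-Wesley</publisher><price>60</price></book>
  <book><publisher>Morgan Kaufmann</publisher><price>40</price></book>
  <book><publisher>Addison-Wesley</publisher><price>70</price></book>
</bib>
"""

def avg_price_by_publisher(bib_xml):
    books = list(ET.fromstring(bib_xml).iter("book"))
    # for $p in distinct-values(//publisher)
    publishers = []
    for b in books:
        p = b.findtext("publisher")
        if p not in publishers:
            publishers.append(p)
    result = []
    for p in publishers:
        # let $a := avg(//book[publisher = $p]/price)
        prices = [float(b.findtext("price"))
                  for b in books if b.findtext("publisher") == p]
        result.append((p, sum(prices) / len(prices)))
    return result

print(avg_price_by_publisher(BIB))
```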
Invert the book-list structure
<author_list>{ (: group books by authors :)
for $a in distinct-values(
doc("bib.xml")//author )
return
<author> {
<name> {$a} </name>,
for $b in doc("bib.xml")//book[
author = $a]
return $b/title }
</author>
} </author_list>
List of publishers alphabetically,
and their books in descending order of price
for $p in distinct-values(
doc("bib.xml")//publisher )
order by $p
return
<publisher>
<name>{$p}</name>
{ for $b in doc("bib.xml")//book[
publisher = $p]
(: cast: untyped prices would otherwise be ordered as strings :)
order by xs:decimal($b/price) descending
return <book> {$b/title,
$b/price} </book> }
</publisher>
Queries on Document Order
Operators << and >>:
– x << y = true iff node x precedes node y in document
order; (y >> x similarly)
Consider a surgical report with
– procedure elements containing
» incision sub-elements
Return a "critical sequence" of contents between the first
and the second incisions of the first procedure
Computing a "critical sequence"
<critical_sequence> {
let $p :=
(doc("report.xml")//procedure)[1]
for $n in $p/node()
where $n >> ($p//incision)[1] and
$n << ($p//incision)[2]
return $n }
</critical_sequence>
NB: if incisions are not children of the procedure,
then an ancestor of the second incision gets to the
result; How to avoid this?
User-defined functions: Example
declare function local:precedes($a as node(),
$b as node()) as xs:boolean
{ $a << $b and (: $a is no ancestor of $b: :)
empty($a//node() intersect $b) };
local: is a predeclared prefix for the namespace of local
function names
– Alternatively:
declare namespace my = "http://my.namespace.org";
declare function my:precedes(... (as above)
User-defined functions: Example
Now, ”critical sequence” without ancestors of incision:
<critical_sequence> {
let $p :=
(doc("report.xml")//procedure)[1]
for $n in $p/node()
where $n >> ($p//incision)[1] and
local:precedes($n,
($p//incision)[2])
return $n
} </critical_sequence>
Recursive Transformations
Example: “Table-of-contents” for nested sections
– NB if-then-else (in ordinary XPath 2.0 expressions, too)
declare namespace my = "http://my.own-ns.org";
declare function my:toc( $n as element() )
as element()*
{ if (name($n)="sect")
then <sect> {
for $c in $n/* return my:toc($c) } </sect>
else if (name($n)="title") then $n
else (: check child elements, if any: :)
for $c in $n/* return my:toc($c) };
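The same recursion can be written in Python with `xml.etree.ElementTree`, which may help readers follow the three cases. This is a sketch, not a translation tool; the sample document and the function name `toc` are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

def toc(n):
    """Return a list of elements: the table-of-contents skeleton of n."""
    if n.tag == "sect":
        # then <sect>{ for $c in $n/* return my:toc($c) }</sect>
        out = ET.Element("sect")
        for c in n:
            out.extend(toc(c))
        return [out]
    if n.tag == "title":
        # else if title: copy the title itself
        t = copy.deepcopy(n)
        t.tail = None
        return [t]
    # else: check child elements, if any
    result = []
    for c in n:
        result.extend(toc(c))
    return result

doc = ET.fromstring(
    "<book><title>T</title><sect><title>S1</title><para>text</para>"
    "<sect><title>S1.1</title></sect></sect></book>"
)
toc_xml = "".join(ET.tostring(e, encoding="unicode") for e in toc(doc))
print(toc_xml)
```

Returning a list from every call plays the role of the `element()*` result type: each case contributes zero or more elements, and the caller splices them together.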
Querying relational data
Lots of data is stored in relational databases
We should be able to access them, too
Example: Tables for Parts and Suppliers
– P (pno, descrip) : part numbers and descriptions
– S (sno, sname) : supplier numbers and names
– SP (sno, pno, price):
who supplies which parts and for what price?
Possible XML representation of relations
[Figure: each relation represented as an XML document containing repeated (*) tuple elements]
Selecting in SQL vs. XQuery
SQL:
SELECT pno
FROM p
WHERE descrip LIKE 'Gear%'
ORDER BY pno;
XQuery:
for $p in doc("p.xml")//p_tuple
where starts-with($p/descrip, "Gear")
order by $p/pno
return $p/pno
Grouping
Many queries involve grouping data and applying
aggregation functions like count or avg to each
group
in SQL: GROUP BY and HAVING clauses
Example: Find the part number and average price
for parts with at least 3 suppliers
Grouping: SQL
SELECT pno, avg(price) AS avgprice
FROM sp
GROUP BY pno
HAVING count(*) >= 3
ORDER BY pno;
Grouping: XQuery
for $pn in distinct-values(
doc("sp.xml")//pno)
let $sp:=doc("sp.xml")//sp_tuple[pno=$pn]
where count($sp) >= 3
order by $pn
return
<well_supplied_item> {
<pno>{$pn}</pno>,
<avgprice> {avg($sp/price)} </avgprice>
} </well_supplied_item>
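The GROUP BY / HAVING logic of this query can be sketched in Python on a list of (sno, pno, price) tuples. The data in `SP` and the function name `well_supplied` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical SP relation: (sno, pno, price)
SP = [
    ("s1", "p1", 10.0), ("s2", "p1", 12.0), ("s3", "p1", 14.0),
    ("s1", "p2", 5.0), ("s2", "p2", 7.0),
]

def well_supplied(sp, min_suppliers=3):
    groups = defaultdict(list)
    for sno, pno, price in sp:           # group the tuples by part number
        groups[pno].append(price)
    return [
        (pno, sum(prices) / len(prices))     # avg($sp/price)
        for pno, prices in sorted(groups.items())   # order by $pn
        if len(prices) >= min_suppliers      # where count($sp) >= 3
    ]

print(well_supplied(SP))
```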
Joins
Example: Return a ”flat” list of supplier names and their
part descriptions, in alphabetic order
for $sp in doc("sp.xml")//sp_tuple,
$p in doc("p.xml")//p_tuple[pno = $sp/pno],
$s in doc("s.xml")//s_tuple[sno = $sp/sno]
order by $p/descrip, $s/sname
return <sp_pair>{
$s/sname ,
$p/descrip
}</sp_pair>
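The three-way join is a nested loop with two matching conditions, which can be sketched in Python as below. The table contents are made up for illustration; real processors are free to evaluate such joins more cleverly.

```python
# Hypothetical relations
P = [("p1", "Gear"), ("p2", "Axle")]                        # (pno, descrip)
S = [("s1", "Acme"), ("s2", "Best")]                        # (sno, sname)
SP = [("s1", "p1", 10.0), ("s2", "p1", 12.0), ("s1", "p2", 5.0)]

# for $sp ..., $p in ...[pno = $sp/pno], $s in ...[sno = $sp/sno]
# order by $p/descrip, $s/sname
pairs = sorted(
    (descrip, sname)
    for sno, pno, _price in SP
    for p_pno, descrip in P if p_pno == pno
    for s_sno, sname in S if s_sno == sno
)

print(pairs)
```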
XQuery: Summary
– A recent W3C XML query language, also capable of
general XML processing
– Vendor support??
» http://www.w3.org/XML/Query
mentions ~ 50 prototypes or products (2004: ~ 30, 2005: ~ 40;
free, commercial, ... Oracle, IBM)
– Future?? Interesting confluence of document and
database research, with high potential for XML-based
data integration