CBU Summer School '07 Querying XML


W3C XML Query


How to access various XML data sources?
XQuery, XML Query Lang, W3C Rec, Jan '07
– joint work by XML Query and XSL WGs
» with XPath 2.0 and XSLT 2.0
– influenced by many research groups and query
languages
» Quilt, XPath, XQL, XML-QL, SQL, OQL, Lorel, ...
– A query language for any XML-represented data:
both documents and databases
CBU 2007
XQuery and XPath
1
Functional Requirements (1)
Support operations (selection, projection, aggregation,
sorting, etc.) on all data types:
– Choose data based on content or structure
– Operate on document hierarchy and order

Structural preservation and transformation:
– Preserve relative hierarchy and sequence of input structures
– Transform XML structures, and create new ones

Combining and joining:
– Combine data from different parts of a document, or from multiple
documents
Functional Requirements (2)
Closure property:
– Results of XML queries are also XML
(well-formed document fragments)
-> queries can be combined without limit

Extensibility:
– should support externally defined functions on all data
types of the data model
XQuery in a Nutshell


Functional expression language
Strongly-typed: (optional) type-checking of expressions, and
validation of results (we'll concentrate on processing)
– predeclared prefix for type names:
xs=http://www.w3.org/2001/XMLSchema

Extends XPath 2.0
– XQuery 1.0 and XPath 2.0 Functions and Operators, Rec. Jan.
2007

XQuery ≈ XPath 2.0 + XSLT' + SQL' (roughly)
Example Query
xquery version "1.0";
<cheapBooks>
<Title>Cheap Books</Title>
{ for $b in fn:doc("bib.xml")//book[@price < 50]
order by $b/title
return $b }
</cheapBooks>
 Syntax "concise and easily understood"
 XML-based syntax (XQueryX) also specified
A possible result
<?xml version="1.0" encoding="UTF-8"?>
<cheapBooks>
<Title>Cheap Books</Title>
<book price="26.50">
<title>Computing with Logic</title>
<author>David Maier</author>
<publisher>Benjamin Cummings</publisher>
<year>1999</year>
</book>
<book price="40.00">
<title>Designing Internet applications</title>
<author>Michael Leventhal</author>
<publisher>Prentice Hall</publisher>
<year>1998</year>
</book>
</cheapBooks>
XQuery and XPath

XQuery (1.0) is an extension of XPath (2.0)
– Common data model, functions and operators
-> study some XPath first

XPath used in several other contexts, too:
– For uniqueness constraints in XML Schema
– In validation rules of Schematron
– For pattern matching and selection in XSLT
– For addressing in XLink and XPointer
XPath in a Nutshell

XPath 1.0 (W3C Rec. 11/99)
– a compact non-XML syntax for addressing parts of
XML documents (as node-sets in XPath 1.0)
– also typical operations on strings, numbers and truth
values

XPath 2.0 (2.0 Rec. 1/07) extends and
generalizes:
– data manipulated as sequences of items
» Item = a node or an atomic value of a simple XML
Schema datatype
XPath 1.0 vs 2.0


XPath 2.0 more elegant and complete than 1.0
Also more complex (length of specs in pages):
XPath 1.0:
  XPath 1.0        ~ 30
  Total            ~ 30
XPath 2.0:
  Data Model       ~ 80
  XPath 2.0        ~100
  Funcs & opers    ~160
  Total            ~340
XQuery/XPath/XSLT Data Model

Documents are viewed as trees
made of six types of nodes:
– root (additional parent of document element)
– element nodes
– attribute nodes
– text nodes
– comments and processing instructions

Obs 1: No entity nodes
Obs 2: No namespace nodes
(XPath/XSLT 1.0 contains them)
Document trees

Defined in Sect. 5 of XPath 1.0 spec
– for XSLT/XPath 2.0 & XQuery in their joint Data Model

Element nodes have elements, text nodes,
comments and processing instructions of their
(direct) content as their children
– NB: attribute nodes are not children (but have a parent)
-> they have no siblings either
– the string value of a document/element is the
concatenation of all its text-node descendants
Document Order

Document order of nodes:
– = the depth-first traversal order
– Root first
– Other nodes in the order of the first character of their
XML markup in the document text
-> an element precedes its attribute nodes, which
precede any content nodes of the element
– Implementation-dependent between nodes belonging to
different trees
XPath trees: Example
<article>Written by
<fig
file="pekka.jpg"
caption="The
Lecturer" />
the lecturer.
</article>
Tree of the example (legend: position in document order: type, name, string value):
1st: root, "", "Written by the lecturer."
2nd: element, "article", "Written by the lecturer."
3rd: text, "", "Written by "
4th: element, "fig", ""
5th or 6th: attribute, "file", "pekka.jpg"
5th or 6th: attribute, "caption", "The Lecturer"
7th: text, "", " the lecturer."
XQuery/XPath Sequences

Expressions operate on, and return, sequences of
– atomic values (of XML Schema simple types) and
– nodes
– an item ≡ a singleton sequence
– sequences are flat: no sequences as items
» (1, (2, 3), (), 1) = (1, 2, 3, 1)
– sequences are ordered, and can contain duplicates

Unlimited combination of expressions, often with automatic
type conversions (e.g. for arithmetic)
Sequence Expressions

Constant sequences constructed by listing
values
– comma (,) is a catenation operator
» (1, (2, 3), (), 1) = (1, 2, 3, 1)

Shorthands for numeric sequences:
– 1 to 4 -> (1, 2, 3, 4)
– 4 to 1 -> ()
– fn:reverse(1 to 4) -> (4, 3, 2, 1)
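
The flattening and range rules above can be imitated in a few lines of Python. This is only a sketch of the semantics, not XQuery itself: the helper names seq and to are made up for illustration, and Python lists stand in for XQuery sequences.

```python
def seq(*items):
    """Concatenate items into one flat list, as the comma operator does."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(seq(*item))  # nested sequences are flattened away
        else:
            flat.append(item)
    return flat


def to(start, end):
    """Integer range like 'start to end'; empty when start > end."""
    return list(range(start, end + 1))


print(seq(1, (2, 3), (), 1))     # [1, 2, 3, 1]
print(to(1, 4))                  # [1, 2, 3, 4]
print(to(4, 1))                  # []
print(list(reversed(to(1, 4))))  # [4, 3, 2, 1]
```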
Location Paths

XPath can select any parts of a document
tree using …

Location paths
– evaluated with respect to a context item
» in XQuery typically starting from $x or doc(…)
– Result: a sequence of nodes in document
order, without duplicates
Location paths

Consist of location steps separated by '/'
– each step produces a sequence of items
– steps evaluated left-to-right,
each item in turn as the context item

Complete location step:
AxisName:: NodeTest ([PredicateExpr])*
– axis specifies the tree relationship between the context
node and the selected nodes
– node test restricts the type and name of nodes
– filtered further by 0 or more predicates
Location steps: Axes

In total 12 axes (~ directions in tree)
– for staying at the context node: self
– for going downwards:
» child, descendant, descendant-or-self
– for going upwards:
» parent, ancestor, ancestor-or-self
– for moving towards start/end of the document:
» preceding-sibling, following-sibling,
preceding, following
– “Special” axes
» attribute; (+ namespace in XPath 1.0)
– Only child, descendant, attribute, self, descendant-or-self, and parent mandatory in XQuery
XPath Axes and Their Orientation

Ordinary axes oriented away from context node
(attribute and namespace axes are unordered)
– the position() for the closest node = 1
– for the most remote node, position() = last()

The simplest axis, self::
(Figure: the context node itself has position 1.)
XPath Axes and Their Orientation

parent:: (exists for every node except the root)
(Figure: the parent of the context node has position 1.)
XPath Axes and Their Orientation

ancestor::
(Figure: the parent is 1, the grandparent 2.)

ancestor-or-self::
(Figure: the context node is 1, the parent 2, the grandparent 3.)
XPath Axes and Their Orientation

child::
(Figure: the children of the context node, numbered 1-4 in document order.)
XPath Axes and Their Orientation

descendant::
(Figure: the descendants of the context node, numbered 1-9 in document order.)

descendant-or-self::
(Figure: the context node is 1, its descendants 2-10 in document order.)
XPath Axes and Their Orientation

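preceding-sibling::
(Figure: preceding siblings numbered 1, 2, counting backwards from the context node.)

following-sibling::
(Figure: following siblings numbered 1, 2, counting forwards from the context node.)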
XPath Axes and Their Orientation

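following::
(Figure: nodes after the context node, numbered 1-5 in document order.)

preceding::
(Figure: nodes before the context node, numbered 1-3 in reverse document order.)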
Location paths: Node tests

Node tests (slightly simplified)
– Name: any element node with name Name
(on an attribute axis, any attribute node with name Name)
– *: any element (any attribute node on an attribute axis)
– text(): any text node
» comment(): any comment node
» processing-instruction(): any processing instruction
– node(): any node of any type
Location paths: Abbreviations

Abbreviations in location steps
– 'child::' can be omitted
– 'attribute::' can be shortened to '@'
– 'parent::node()' can be shortened to '..'
– Predicate '[position()=n]' for testing
occurrence position n can be shortened to '[n]'
– '/descendant-or-self::node()/'
shortened to '//'
Notes on Location Paths (1)

XPath 2.0 allows unrestricted expressions as steps
– but steps except the last must produce nodes only


Numeric predicates support array-style access:
$rows[$i]
Predicates are evaluated one step at a time. This often causes
confusion with shorthand notations:
– doc("doc.xml")//title[3]
-> third title child of each parent (likely none!). Why?
– = doc("doc.xml")/
descendant-or-self::node()/child::title[3]
– To get the third title in the doc use
(doc("doc.xml")//title)[3]
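
The difference between the two forms can be tried out with Python's standard xml.etree.ElementTree, whose limited XPath subset also applies a positional predicate per parent. This is a sketch on a made-up document (three sections with one title each); it is not an XQuery processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three sections, each with a single title.
doc = ET.fromstring(
    "<doc>"
    "<sect><title>A</title></sect>"
    "<sect><title>B</title></sect>"
    "<sect><title>C</title></sect>"
    "</doc>"
)

# Like //title[3]: the predicate is tested per parent, so nothing matches.
third_per_parent = doc.findall(".//title[3]")

# Like (//title)[3]: collect all titles first, then index the whole sequence.
all_titles = doc.findall(".//title")
third_in_doc = all_titles[2]

print(len(third_per_parent))  # 0
print(third_in_doc.text)      # C
```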
Notes on Location Paths (2)

References to attributes and subelements
easy to use as predicates
– Get divisions that are of class C or have a head:
doc("doc.xml")//div[@class="C" or head]
– Values are coerced to Booleans on demand
» string/sequence -> true iff non-empty
» number -> true iff not zero
(but a single number as predicate tests for equality
with position())
Semantics of Location Paths (example)
(Figure: an example tree with nodes numbered 1-8; node 1, an A element, is the context node.)
Path: */node()/parent::B[child::A]
Value after each step: {2, 5, 7}, {3, 4, 6, 8}, {2, 5, 7}
Final value: {2}
Filter Expressions

Location steps can be filtered by predicates:
./(chap | app)[fn:last()]/title
– (chap | app) is an XPath 2.0 extended step
– the title of the last chapter or appendix, whichever is last

Other sequences, too:
(1 to 20)[. mod 5 eq 0] → (5, 10, 15, 20)
– ('.' generalized from XPath 1.0 shorthand for
self::node() into the context item)
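
How a predicate filters a sequence can be sketched in Python. The function filter_seq and its callback signature are illustrative assumptions, not any XPath API: a numeric predicate value is compared with the item's position, and anything else is tested for truth.

```python
def filter_seq(items, predicate):
    """Keep the items for which an XPath-like predicate holds.

    predicate(item, position, last) may return a number (position test)
    or any other value (tested for truth).
    """
    last = len(items)
    kept = []
    for position, item in enumerate(items, start=1):
        value = predicate(item, position, last)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            keep = value == position  # numeric predicate: compare with position()
        else:
            keep = bool(value)
        if keep:
            kept.append(item)
    return kept


# (1 to 20)[. mod 5 eq 0]
multiples = filter_seq(list(range(1, 21)), lambda item, pos, last: item % 5 == 0)
# (chap | app)[last()] on a made-up sequence of names
final = filter_seq(["chap1", "chap2", "app1"], lambda item, pos, last: last)

print(multiples)  # [5, 10, 15, 20]
print(final)      # ['app1']
```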
Further XPath Expressions

Double-precision floating-point arithmetic
+, -, *, div, mod (same as % in Java)
» e.g. 2.3 mod 1.1 ≈ 0.1

Functions for rounding and truncating:
floor(x), ceiling(x), round(x)
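
For readers more used to Python than Java: Python's own % and round() behave differently, so the sketch below uses math.fmod (remainder with the sign of the dividend, like % in Java) and a hand-written round that sends ties towards positive infinity, which is the behaviour assumed here for round(x). The helper names are made up.

```python
import math


def xpath_mod(a, b):
    # fmod keeps the sign of the dividend, like % in Java
    return math.fmod(a, b)


def xpath_round(x):
    # ties go towards positive infinity (assumed behaviour of round(x))
    return math.floor(x + 0.5)


print(xpath_mod(2.3, 1.1))                  # close to 0.1
print(xpath_mod(-7, 3))                     # -1.0
print(-7 % 3)                               # 2 (Python's own % differs)
print(xpath_round(2.5), xpath_round(-2.5))  # 3 -2
print(round(2.5))                           # 2 (Python rounds ties to even)
```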
Set Operations on Node (!) Sequences

Assume variable bindings (nodes a, b, c, d, e of one tree):
$s1 = (a, b, c)
$s2 = (c, d, e)
Then:
$s1 union $s2 = (a, b, c, d, e)
$s1 intersect $s2 = (c)
$s1 except $s2 = (a, b)
based on node identity (node1 is node2)
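
A minimal Python sketch of these operators, assuming $s1 = (a, b, c) and $s2 = (c, d, e). The Node class and its order field are stand-ins invented for the example; the point is that membership is decided by identity, not by name or value, and that results come back in document order without duplicates.

```python
class Node:
    """Stand-in for an XML node; 'order' is its document-order position."""

    def __init__(self, name, order):
        self.name = name
        self.order = order


def _by_identity(nodes):
    # Drop duplicates by identity, then sort into document order.
    unique = {id(n): n for n in nodes}
    return sorted(unique.values(), key=lambda n: n.order)


def union(s1, s2):
    return _by_identity(list(s1) + list(s2))


def intersect(s1, s2):
    ids2 = {id(n) for n in s2}
    return _by_identity([n for n in s1 if id(n) in ids2])


def except_(s1, s2):
    ids2 = {id(n) for n in s2}
    return _by_identity([n for n in s1 if id(n) not in ids2])


def names(nodes):
    return [n.name for n in nodes]


a, b, c, d, e = (Node(name, i) for i, name in enumerate("abcde"))
s1, s2 = [a, b, c], [c, d, e]

print(names(union(s1, s2)))      # ['a', 'b', 'c', 'd', 'e']
print(names(intersect(s1, s2)))  # ['c']
print(names(except_(s1, s2)))    # ['a', 'b']

# A different node that merely has the same name is not "the same node".
other_c = Node("c", 5)
print(names(intersect([c], [other_c])))  # []
```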
Node Comparisons

To compare single nodes,
– for identity: is
$book//chap[@id="ch1"] is ($book//chap)[1]
true iff the chapter with id="ch1" is indeed the first
– for document order: << and >>
$book//chap[@id="ch2"] >>
$book//title[. eq "Intro"]
true iff the chapter with id="ch2" appears after
<title>Intro</title>
Comparing values of sequences and items

General comparisons between sequences:
– =, !=, <, <=, >, >=
– existential semantics: true iff some pair of values from
operand sequences satisfy the condition
» (1,2) = (2,3); (2,3) = (3,4); (1,2) != (3,4)
» Same as in XPath 1.0:
//book[author = "Aho"]
→ books where some author is Aho

Value comparisons between single values:
– eq, ne, lt, le, gt, ge
» 1 eq 3 - 2; 10 lt 20; $books[@price le 100]
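
The existential semantics of general comparisons can be sketched in Python as "some pair satisfies the operator". The function general_compare is a made-up helper; it ignores the type conversions a real processor performs.

```python
import operator


def general_compare(op, seq1, seq2):
    """True iff some pair (x, y) from the two sequences satisfies op."""
    return any(op(x, y) for x in seq1 for y in seq2)


print(general_compare(operator.eq, [1, 2], [2, 3]))  # True
print(general_compare(operator.ne, [1, 2], [2, 3]))  # True
print(general_compare(operator.ne, [1, 2], [3, 4]))  # True
print(general_compare(operator.eq, [], [1]))         # False
```

Note that = and != can both hold for the same pair of sequences, and that a comparison with an empty sequence is never true.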
Accessing Documents

XQuery operates on nodes accessible by input
functions
– fn:doc("URI")
» document-node of the XML document available at URI
» same as document("URI") in XSLT 1.0
– fn:collection("URI")
» sequence of nodes from URI
– predeclared prefix for the default function namespace:
fn=http://www.w3.org/2005/xpath-functions
XQuery over XPath
A query is an expression
 XQuery adds to XPath expressions

– Element constructors (≈ XSLT templates)
– FLWOR expressions
(”flower”; for-let-where-order by-return)
Central XQuery Expressions






Also in XPath 2.0:
– Path expressions
– Sequence expressions
– Comparison operators
– Quantified expressions
(some/every $var in … satisfies …)
Added by XQuery:
– Element constructors (≈ XSLT templates)
– FLWOR expressions
(”flower”; for-let-where-order by-return)

and others, in examples ...
Element Constructors

Similar to XSLT templates:
– start and end tag enclosing the content
– literal fragments written directly,
expressions enclosed in braces { and }
≈ XSLT 1.0 attribute value templates

often used inside another expression that binds
variables used in the element constructor
– (There is no 'current node' in XQuery)
– See next
Example

An emp element with an empid attribute and child
elements name and job, from values in variables
$id, $n, and $j:
<emp empid="{$id}">
<name>{$n}</name>
<job>{$j}</job>
</emp>
Also computed constructors:
element {"emp"} {
attribute {"empid"}{$id},
<name> {$n} </name>,
<job> {$j} </job> }
FLWOR ("flower") Expressions


Constructed from for, let, where, order by and
return clauses (~SQL select-from-where)
Syntax: (ForClause | LetClause)+
WhereClause?
OrderByClause?
"return" Expr

FLWOR binds variables to values, and uses these
bindings to construct a result
(an ordered sequence of items)
NB: XPath 2.0 has a simpler "for-return"
Flow of data in a FLWOR expression
for clauses

for $V1 in Exp1 (, $V2 in Exp2, …)
– associates each variable Vi with expression Expi
(e.g. a path expression)

Result: list of tuples, each containing a binding for
each of the variables
– can be thought of as loops iterating over the items
returned by the respective expressions
Example: for clause
for $i in (1,2),
$j in (1 to $i)
return <tuple>
<i>{$i}</i> <j>{$j}</j></tuple>
Result:
<tuple><i>1</i><j>1</j></tuple>
<tuple><i>2</i><j>1</j></tuple>
<tuple><i>2</i><j>2</j></tuple>
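
Read as nested loops, the same binding tuples can be produced with a Python comprehension. This is a sketch of the iteration order only; the strings merely imitate the constructed elements.

```python
# for $i in (1,2), $j in (1 to $i): the inner range depends on the outer variable
tuples = [(i, j) for i in (1, 2) for j in range(1, i + 1)]

# return clause: instantiated once per binding tuple
result = [f"<tuple><i>{i}</i><j>{j}</j></tuple>" for i, j in tuples]

print(tuples)  # [(1, 1), (2, 1), (2, 2)]
for line in result:
    print(line)
```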
let clauses

let also binds variables to expressions
– each variable gets the entire sequence as its value
(without iterating over the items of the sequence)
– results in binding a single sequence for each variable

Compare:
– for $b in doc("bib.xml")//book
-> many bindings (to single books)
– let $bl := doc("bib.xml")//book
-> a single binding (to sequence of books)
Example: let clauses
let $s := (<one/>, <two/>, <three/>)
return <out> {$s} </out>
Result:
<out>
<one/>
<two/>
<three/>
</out>
Compare with for:
for $s in (<one/>,<two/>,<three/>)
return <out> {$s} </out>
Result:
<out><one/></out>
<out><two/></out>
<out><three/></out>
for/let clauses

A FLWOR expr may contain several fors and lets
– each may refer to variables bound in previous clauses

the result of the for/let sequence:
– an ordered list of tuples of bound variables
– number of tuples = product of the cardinalities of the
sequences returned by the for expressions
(when these do not depend on each other's variables)
where clause

binding tuples generated by for and let clauses
are filtered by an optional where clause
– tuples with a true condition are used to instantiate the
return clause

the where clause may contain several predicates
connected by and, or, and fn:not()
– usually refer to the bound variables
– sequences as Booleans (similarly to node-sets in
XPath 1.0): empty ~ false; non-empty ~ true
where clause

for binds variables to single items
-> value comparisons, e.g. $color eq "red"

let binds variables to whole sequences
-> general comparisons, e.g. $colors = "red"
(~ some $c in $colors satisfies $c eq "red")
– a number of aggregation functions available:
avg(), sum(), count(), max(), min()
(also in XPath 2.0; XPath 1.0 has only sum() and count())
return clause



The return clause generates the output of the
FLWOR expression
instantiated once for each binding tuple
often contains element constructors, references to
bound variables, and nested sub-expressions
Example: for + return
for $i in (1,2),
$j in (1 to $i)
return <tuple>
<i>{$i}</i> <j>{$j}</j></tuple>
Result:
<tuple><i>1</i><j>1</j></tuple>
<tuple><i>2</i><j>1</j></tuple>
<tuple><i>2</i><j>2</j></tuple>
Positional variables: 'at'

For items, can also get their position in the seq:
for $char at $i in ("a", "b", "c")
return concat($i, ".", $char, ";")
-> 1.a;2.b;3.c;

Could pair items by their position:
let $boys:= doc("kids.xml")//boy,
$girls:= doc("kids.xml")//girl
for $b at $i in $boys
where $i le count($girls)
return <pair>{ $b, $girls[$i] }</pair>
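
The positional variable corresponds roughly to Python's enumerate with start=1. In this sketch the names in boys and girls are made-up sample data standing in for the elements of kids.xml.

```python
chars = ["a", "b", "c"]
# for $char at $i in ("a", "b", "c") return concat($i, ".", $char, ";")
labels = [f"{i}.{char};" for i, char in enumerate(chars, start=1)]
print("".join(labels))  # 1.a;2.b;3.c;

# Pairing by position; hypothetical sample data.
boys = ["Al", "Bo", "Cy"]
girls = ["Di", "Eve"]
pairs = [
    (boy, girls[i - 1])               # $girls[$i] is 1-based
    for i, boy in enumerate(boys, start=1)
    if i <= len(girls)                # where $i le count($girls)
]
print(pairs)  # [('Al', 'Di'), ('Bo', 'Eve')]
```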
Examples (modified from "XML Query Use Cases")

Assume: a document named "bib.xml"
consisting of a list of books:
<book>+
<title>
<author>+
<publisher>
<year>
<price>
List Morgan Kaufmann book titles since 1998
<recent-MK-books> {
for $b in doc("bib.xml")//book
where $b/publisher = "Morgan Kaufmann"
and $b/year >= 1998
return <book year="{$b/year}">
{$b/title}
</book>
} </recent-MK-books>
Result could be...
<recent-MK-books>
<book year="1999">
<title>TCP/IP Illustrated</title>
</book>
<book year="2000">
<title>Advanced Programming in the Unix environment</title>
</book>
</recent-MK-books>
Publishers with avg price of their books:
(: distinct-values: string values of the sequence, without duplicates :)
for $p in fn:distinct-values(
fn:doc("bib.xml")//publisher )
let $a := avg( doc("bib.xml")//book[
publisher = $p]/price )
return <publisher>
<name> {$p} </name>
<avgprice> {$a} </avgprice>
</publisher>
Invert the book-list structure
<author_list>{ (: group books by authors :)
for $a in distinct-values(
doc("bib.xml")//author )
return
<author> {
<name> {$a} </name>,
for $b in doc("bib.xml")//book[
author = $a]
return $b/title }
</author>
} </author_list>
List of publishers alphabetically,
and their books in descending order of price
for $p in distinct-values(
doc("bib.xml")//publisher )
order by $p
return
<publisher>
<name>{$p}</name>
{ for $b in doc("bib.xml")//book[
publisher = $p]
order by $b/price descending
return <book> {$b/title,
$b/price} </book> }
</publisher>
Queries on Document Order

Operators << and >>:
– x << y = true iff node x precedes node y in document
order; (y >> x similarly)

Consider a surgical report with
– procedure elements containing
» incision sub-elements

Return a "critical sequence" of contents between the first
and the second incisions of the first procedure
Computing a "critical sequence"
<critical_sequence> {
let $p :=
(doc("report.xml")//procedure)[1]
for $n in $p/node()
where $n >> ($p//incision)[1] and
$n << ($p//incision)[2]
return $n }
</critical_sequence>

NB: if incisions are not children of the procedure,
then an ancestor of the second incision ends up in the
result. How to avoid this?
User-defined functions: Example
declare function local:precedes($a as node(),
$b as node()) as xs:boolean
{ $a << $b and (: $a is no ancestor of $b: :)
empty($a//node() intersect $b) };

local: is predeclared prefix for the namespace of local
function names
– Alternatively:
declare namespace my="http://my.namespace.org";
declare function my:precedes(... (as above)
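
The idea behind this function (before in document order, but not an ancestor) can be tried in Python with xml.etree.ElementTree. This is only a sketch: ElementTree has no text nodes, so it covers elements only, and the sample document and the extra root parameter are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def precedes(root, a, b):
    """True iff a comes before b in document order and is no ancestor of b."""
    order = list(root.iter())  # all elements in document order
    pos_a = next(i for i, n in enumerate(order) if n is a)
    pos_b = next(i for i, n in enumerate(order) if n is b)
    is_ancestor = any(n is b for n in a.iter() if n is not a)
    return pos_a < pos_b and not is_ancestor


# Hypothetical report fragment: the second incision is nested inside <step>.
root = ET.fromstring(
    "<procedure><incision/><step><incision/></step><note/></procedure>"
)
first, step, note = root[0], root[1], root[2]
second = step[0]

print(precedes(root, first, second))  # True
print(precedes(root, step, second))   # False: step is an ancestor
print(precedes(root, note, second))   # False: note comes later
```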
User-defined functions: Example

Now, ”critical sequence” without ancestors of incision:
<critical_sequence> {
let $p :=
(doc("report.xml")//procedure)[1]
for $n in $p/node()
where $n >> ($p//incision)[1] and
local:precedes($n,
($p//incision)[2])
return $n
} </critical_sequence>
Recursive Transformations

Example: “Table-of-contents” for nested sections
– NB if-then-else (in ordinary XPath 2.0 expressions, too)
declare namespace my="http://my.own-ns.org";
declare function my:toc( $n as element() )
as element()*
{ if (name($n)="sect")
then <sect> {
for $c in $n/* return my:toc($c) } </sect>
else if (name($n)="title") then $n
else (: check child elements, if any: :)
for $c in $n/* return my:toc($c) };
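
The same recursion can be sketched in Python with xml.etree.ElementTree, to show the shape of the traversal. The sample document is invented, and returning a list of elements stands in for the element()* result type.

```python
import copy
import xml.etree.ElementTree as ET


def toc(n):
    """Return the table-of-contents elements found at or below element n."""
    if n.tag == "sect":
        out = ET.Element("sect")
        for child in n:               # like $n/*
            out.extend(toc(child))
        return [out]
    if n.tag == "title":
        return [copy.deepcopy(n)]     # titles are copied into the result
    found = []
    for child in n:                   # other elements: check child elements
        found.extend(toc(child))
    return found


# Hypothetical input with nested sections.
book = ET.fromstring(
    "<book><title>T</title>"
    "<sect><title>S1</title><p>text</p>"
    "<sect><title>S1.1</title></sect></sect></book>"
)
contents = [ET.tostring(e, encoding="unicode") for e in toc(book)]
for item in contents:
    print(item)
```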
Querying relational data

Lots of data is stored in relational databases

We should be able to access them, too

Example: Tables for Parts and Suppliers
– P (pno, descrip) : part numbers and descriptions
– S (sno, sname) : supplier numbers and names
– SP (sno, pno, price):
who supplies which parts and for what price?
Possible XML representation of relations
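(Figure not reproduced: each relation becomes an XML document whose tuple elements (p_tuple, s_tuple, sp_tuple) repeat, marked * in the figure, with one child element per column.)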
Selecting in SQL vs. XQuery
SQL:
SELECT pno
FROM p
WHERE descrip LIKE 'Gear%'
ORDER BY pno;

XQuery:
for $p in doc("p.xml")//p_tuple
where starts-with($p/descrip, "Gear")
order by $p/pno
return $p/pno
Grouping



Many queries involve grouping data and applying an
aggregation function like count or avg to each
group
in SQL: GROUP BY and HAVING clauses
Example: Find the part number and average price
for parts with at least 3 suppliers
Grouping: SQL
SELECT pno, avg(price) AS avgprice
FROM sp
GROUP BY pno
HAVING count(*) >= 3
ORDER BY pno;
Grouping: XQuery
for $pn in distinct-values(
doc("sp.xml")//pno)
let $sp:=doc("sp.xml")//sp_tuple[pno=$pn]
where count($sp) >= 3
order by $pn
return
<well_supplied_item> {
<pno>{$pn}</pno>,
<avgprice> {avg($sp/price)} </avgprice>
} </well_supplied_item>
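
The group-then-filter pattern of this query (distinct keys, one group per key, a condition on the group, an aggregate per group) looks as follows in Python. The sp rows are made-up sample data, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical SP tuples: (sno, pno, price)
sp = [
    ("s1", "p1", 10.0),
    ("s2", "p1", 12.0),
    ("s3", "p1", 14.0),
    ("s1", "p2", 5.0),
    ("s2", "p2", 7.0),
]

# One group of prices per distinct part number
prices_by_pno = defaultdict(list)
for sno, pno, price in sp:
    prices_by_pno[pno].append(price)

# where count($sp) >= 3, order by $pn, return pno with average price
well_supplied = [
    (pno, sum(prices) / len(prices))
    for pno, prices in sorted(prices_by_pno.items())
    if len(prices) >= 3
]
print(well_supplied)  # [('p1', 12.0)]
```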
Joins

Example: Return a ”flat” list of supplier names and their
part descriptions, in alphabetic order
for $sp in doc("sp.xml")//sp_tuple,
$p in doc("p.xml")//p_tuple[pno = $sp/pno],
$s in doc("s.xml")//s_tuple[sno = $sp/sno]
order by $p/descrip, $s/sname
return <sp_pair>{
$s/sname ,
$p/descrip
}</sp_pair>
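
The three-way join can be mirrored in Python as nested iteration with matching conditions, followed by a sort. The rows below are invented sample data, and a real processor may evaluate the join quite differently.

```python
# Hypothetical tables
p = [("p1", "Gear"), ("p2", "Bolt")]        # (pno, descrip)
s = [("s1", "Acme"), ("s2", "Zeta")]        # (sno, sname)
sp = [("s1", "p1"), ("s2", "p1"), ("s2", "p2")]  # (sno, pno)

# for $sp ..., $p ...[pno = $sp/pno], $s ...[sno = $sp/sno]
pairs = [
    (sname, descrip)
    for (sp_sno, sp_pno) in sp
    for (pno, descrip) in p if pno == sp_pno
    for (sno, sname) in s if sno == sp_sno
]

# order by $p/descrip, $s/sname
pairs.sort(key=lambda pair: (pair[1], pair[0]))
print(pairs)  # [('Zeta', 'Bolt'), ('Acme', 'Gear'), ('Zeta', 'Gear')]
```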
XQuery: Summary
– A recent W3C XML query language, also capable of
general XML processing
– Vendor support??
» http://www.w3.org/XML/Query
mentions ~ 50 prototypes or products (2004: ~ 30, 2005: ~ 40;
free, commercial, ... Oracle, IBM)
– Future?? Interesting confluence of document and
database research, and high potential for XML-based
data integration