Introduction to XPath
Introduction to XML Path Language (XPath 2.0)
Cheng-Chia Chen
Transparency No. 1
What is XPath?
 Latest versions:
  2.0: http://www.w3.org/TR/xpath20
   related specs: XQuery/XPath Data Model (XDM), XQuery/XPath Formal Semantics, XQuery 1.0 and XPath 2.0 Functions and Operators
  1.0: http://www.w3.org/TR/xpath
 A language for addressing parts of an XML document,
 designed to be used by XSLT, XQuery, XML Schema and XPointer.
 References: xfront, W3Schools
TOC
1. Introduction
2. Data Model
3. Location Paths
4. Expressions
5. Core Function Library
1. Introduction
 What is XPath?
 A language used to address parts of an XML [XML] document;
 provides basic facilities for manipulating strings, numbers and booleans;
 operates on the abstract, logical structure of an XML document, rather than on its surface syntax.
XPath 2.0 data model
 provides
  a tree representation of XML documents, as well as
  atomic values such as numbers, strings, and booleans, and
  flat sequences that may contain both references to nodes in an XML document and atomic values.
 The result of evaluating an XPath expression is a sequence of items, each of which is either
  a node from the input document, or
  an atomic value.
Type systems of XPath
 XPath Expression:
  the primary syntactic construct in XPath;
  evaluates to a value, which is a possibly empty sequence of items.
 An item is either
  a node or
  an atomic value.
Expression evaluation (XPath 1.0)
 occurs with respect to a context.
 XSLT, XQuery and XPointer specify how the context is determined.
 A context consists of:
  1. a node (the context node)
  2. a pair of positive integers (the context position and the context size)
  3. a set of variable bindings
  4. a function library
  5. the set of namespace declarations in scope for the expression
 Notes:
  3, 4 and 5 do not change when evaluating subexpressions.
  2 can only be changed by predicates.
  Some expressions may change 1.
Expression evaluation (XPath 2.0)
 Expression Context
  consists of all information that can affect the result of evaluating an expression.
 The context is organized into two categories:
  static context: contains information available prior to execution
  dynamic context:
   contains information used during execution
   = static context + additional information
Static context
A static context consists of:
1. XPath 1.0 compatibility mode: boolean
2. Statically known namespaces (i.e., (prefix, URI) pairs)
3. Default element/type namespace (or none)
 used for unprefixed element and type names, e.g., <e1 .../>, <pre:e2 xsi:type="aType"/>
4. Default function namespace (or none)
 used for unprefixed function names, e.g., max(...) vs. fn:f1(...), ...
5. In-scope schema definitions:
 1. schema type definitions (local + global) +
 2. element declarations (global + local + substitution groups) +
 3. attribute declarations (global + local)
 identified by expanded QName (global), or by implementation-dependent identifiers (local or anonymous).
6. In-scope variables: a set of (EQName, type) pairs.
 the set of variables available for reference within an expression.
 some constructs (for, some, every) may extend the in-scope variables of their subexpressions.
7. Context item static type: the static type of the context item
8. Function signatures (i.e., callable functions and constructors)
 the set of functions that are callable from within an expression.
 Each function is identified by its expanded QName and its arity.
 A function signature also specifies the static types of the function parameters and result.
9. Statically known collations
 a set of (URI, collation) pairs. A collation is a specification of the manner in which character strings are compared and ordered. Collations are identified by a URI string.
10. Default collation: one of the statically known collations.
11. Base URI: the URI used to resolve relative URIs into absolute ones.
12. Statically known documents:
 pairs of (s: absolute document URI, t: type), where t is the type of fn:doc(s); the default value of t is document-node()?.
13. Statically known collections: pairs of (s: URI, t: type), where t is the type of fn:collection(s).
14. Statically known default collection type: the type of fn:collection() (node()* if not given).
Dynamic context
= static context + the additional items listed below:
1. Focus = context {item, position, size}
 accessed via ., position(), last()
2. Variable values: pairs of (EQName, value), where the value also carries dynamic type information.
3. Function implementations
 implementations of the function signatures given in the static context.
4. Current dateTime:
 current-dateTime(), current-date(), current-time()
5. Implicit timezone: implicit-timezone()
6. Available documents: Map<uri, document-node>
7. Available collections: Map<uri, node()*>
8. Default collection: value of collection()
Location path
 The most important kind of expression;
 used to select a set of nodes relative to a context node.
2. Data Model
 details in XQuery/XPath Data Model
 XPath operates on an XML document as a tree of nodes.
 All XPath expressions are evaluated to produce a value.
 In XPath 2.0, a value is always a sequence.
 A sequence is an ordered collection of zero or more items.
 An item is either
  an atomic value or
  a node.
 An atomic value is a value (in the value space) of an atomic type, as defined in [XML Schema]. Examples:
  123 → xs:integer; 123.0 → xs:decimal; 1.23e2 → xs:double;
  xs:date("2011-12-10") → xs:date; xs:QName('xs:date') → xs:QName
XPath 2.0 data model
 A node is an instance of one of the seven node kinds defined in XQuery/XPath Data Model.
 Each node has a unique node identity, a typed value, and a string value.
 Some nodes have a name, which is a value of type xs:QName.
 The typed value of a node is a sequence of zero or more atomic values.
 The string value of a node is a value of type xs:string.
 In certain situations a value is said to be undefined (for example, the value of the context item, or the typed value of an element node).
 This term indicates that the property in question has no value and that any attempt to use its value results in an error.
Kinds of Atoms
 XPath 1.0 atoms:
  number (a double-precision floating-point number)
  boolean (true or false)
  string (a sequence of Unicode characters)
 XPath 2.0 generalizes these to include all atomic datatypes defined by XML Schema;
  numbers are further classified into integer, decimal, float and double.
Atomization
 A sequence of items can be atomized to produce a sequence of atoms by replacing every node item with its typed value, as follows:
  root (document) node, text node → string value as xs:untypedAtomic
  comment, processing-instruction and namespace nodes → string value as xs:string
  attribute node → the value(s) of its type annotation, or the string value as xs:untypedAtomic if the attribute is untyped
   ex: "12.3e2" of type xs:double → 12.3e2;
   "s1 s2 s3" of type xs:IDREFS → the sequence ('s1', 's2', 's3') of type xs:IDREF*
  element of simple content:
   xs:anySimpleType → string value as xs:untypedAtomic
   otherwise → value(s) of its type (e.g., a list type yields several atoms)
  other element nodes:
   xs:untyped, or complex type with mixed content → string value as xs:untypedAtomic
   complex type with empty content (or nilled="true") → ()
   complex type with element-only content → undefined (atomizing it raises a type error)
 The typed value of a sequence s can be queried by invoking fn:data(s).
Types of nodes in an XML tree
 All node types except the namespace node are the same as in XPath 1.0.
 The tree contains nodes.
 Types of nodes and their possible children:
  root nodes: element (exactly one), comment, PI
  element nodes: element, text, PI, comment, [attribute, namespace]
  text nodes: leaves
  attribute nodes: leaves
  namespace nodes: leaves (XPath 2.0 implementations need not support them; XQuery 1.0 does not support them)
  processing instruction nodes: leaves
  comment nodes: leaves
Basic concepts
 See Concepts from XDM:
  Node Identities
  Document Order
  Sequences
  Types
Node Identity
 Every node has a unique identity (like objects in Java):
  identical to itself,
  not identical to any other node.
 I.e., node1 is node2 iff node1 and node2 correspond to the same node occurrence.
 Notes:
  1. node identity ≠ ID attribute.
  2. An element has an identity even if it has no ID attribute.
  3. Non-element nodes also have unique identities.
 Atomic values do not have identity;
  every occurrence of 5 as an integer is identical to every other occurrence of 5 as an integer.
Example
<courses>
<course name="dismath">
<student idref="Wang"/>
<student idref="chen"/> …
</course>
<course name="compiler">
<student idref="Wang"/>
<student idref="Chang"/> …
</course>
</courses>
Ex:
 xpath: ( /courses/course[@name='dismath']/student[1] is (//student)[3] ) returns false.
 xpath: ( (//student)[1]/@idref is (//student)[3]/@idref ) returns false. (why?)
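A rough Python analogy for node identity, using the standard-library xml.etree.ElementTree module on the courses example. Element objects play the role of element nodes, and Python's `is` plays the role of XPath's `is`. This is only a sketch. ElementTree stores attributes as plain strings rather than as attribute nodes, so it can illustrate element identity and atomic-value equality, but it cannot model the @idref identity question.

```python
import xml.etree.ElementTree as ET

# The courses example, with the elided students ("…") left out.
doc = ET.fromstring(
    '<courses>'
    '<course name="dismath"><student idref="Wang"/><student idref="chen"/></course>'
    '<course name="compiler"><student idref="Wang"/><student idref="Chang"/></course>'
    '</courses>'
)

# Element.iter() visits elements in document order, roughly like (//student).
students = list(doc.iter("student"))

# Roughly /courses/course[@name='dismath']/student[1]: find() returns the first match.
dismath_first = doc.find("course[@name='dismath']/student")

# Node identity: two distinct elements are never identical, even if they look alike.
same_node = dismath_first is students[2]       # False
self_identical = dismath_first is students[0]  # True: the very same element object

# Atomic values have no identity: equal strings simply compare equal.
same_value = students[0].get("idref") == students[2].get("idref")  # True

print(same_node, self_identical, same_value)  # False True True
```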
Document order and reverse document order
 Same as in XPath 1.0
Example
<?xml version="1.0"?>
<a xmlns:ns1="uri1" at1="…" at2="…">
<a1> data1 </a1>
<a2> data2 </a2>
<a3><b3/><!-- comment 1 --> </a3>
<?pi pidata ?>
</a>
 Doc order (ns1(x) denotes the namespace node for ns1 on element x):
  root < a < ns1(a) < {at1, at2}
  < a1 < ns1(a1) < data1 < …
  < a3 < ns1(a3) < b3 < ns1(b3) < comment
  < pi
Sequences
 A sequence of items is the unique output type of all XPath expressions.
 A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values.
 There is no distinction between an item and a singleton sequence containing that item:
  ('123') = '123'; node2 = (node2).
 A node does not lose its identity when it is added to a sequence [i.e., only references to the node are added].
 A node may occur in multiple places in one or more sequences.
 Sequences are flat and never contain other sequences.
  Appending (d, e) to (a, b, c) does not produce (a, b, c, (d, e)); the result is automatically flattened to (a, b, c, d, e).
 Notes:
  Sequences replace node-sets from XPath 1.0.
  In XPath 1.0, node-sets do not contain duplicates; sequences may.
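To make the flattening rule concrete, here is a small Python sketch. Tuples stand in for XPath sequences, and any non-tuple value stands in for an item. The helper name `seq` is hypothetical. It models only how the comma operator flattens, not node identity or types.

```python
from typing import Any


def seq(*parts: Any) -> tuple:
    """Build an XPath-style sequence, as the comma operator does.

    A tuple argument is a (sub)sequence and is spliced in; anything else is one item.
    Because sub-sequences are spliced recursively, the result never nests."""
    out: list = []
    for p in parts:
        if isinstance(p, tuple):
            out.extend(seq(*p))  # recurse, so ((5,),) also flattens to 5
        else:
            out.append(p)
    return tuple(out)


print(seq(("a", "b", "c"), ("d", "e")))  # ('a', 'b', 'c', 'd', 'e')
print(seq(1, (2, 3, 4), ((5,),)))        # (1, 2, 3, 4, 5)
print(seq((1, 2, 3), (), (3,)))          # (1, 2, 3, 3)
print(seq("123") == seq(("123",)))       # True: an item equals its singleton sequence
```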
Types in XDM
 accepts all types defined by XML Schema
  supports XSLT and XQuery, whose type systems are based on XML Schema.
  includes 19 built-in primitive types, 5 additional types defined by XDM, and user/implementor-defined types.
 the type system is defined in XQuery/XPath Formal Semantics
 Every item in the data model has both a value and a type. Examples:
  nodes → node type,
  5 → xs:integer;
  '5' → xs:string;
  "Hello World." → xs:string.
[Figure: class diagram. Item (type: EQName, value: Value) is either a Node (node-kind: String, node-name: xs:QName) or an Atom (e.g., 5 : xs:int).]
XDM Type Hierarchy
 from XDM Type Hierarchy.
Representation of Types
 An expanded-QName (EQName) is used to represent a type.
 Definition: An expanded-QName is a set of three values consisting of
  {prefix} a possibly empty prefix,
  {namespace name} a possibly empty namespace URI, and
  {local name} a local name.
 Note: only the URI and the local name are used for identity.
 Lexical representation of an expanded QName:
  [pre1:] localName
  the URI is determined by the context.
 A type [with target namespace = n1 and local name = loc1] is represented by an EQName [whose URI = n1 and local name = loc1].
General constraints on nodes
All nodes must satisfy the following general
constraints:
 1. Every node must have a unique identity, distinct from
all other nodes. [unique identity]
 2. The children property of a node must not contain two
consecutive Text Nodes. [no adjacent texts ]
 3. The children property of a node must not contain any
empty Text Nodes. [no empty text ]
 4. The children and attributes properties of a node must
not contain two nodes with the same identity. [no sharing
of nodes ]
I.e., no sharing of contained nodes (hence a tree, not a DAG).
Predefined Types
 xs:untyped
  denotes the dynamic type of an element node that has not been validated, or has been validated in skip mode.
 xs:untypedAtomic
  denotes untyped atomic data, such as text that has not been assigned a more specific type, or an attribute value that has been validated in skip mode
 xs:anyAtomicType
  derived from xs:anySimpleType
  the root of all atomic types (not including list or union types)
  the base type of all 23 primitive types.
 xs:dayTimeDuration, xs:yearMonthDuration
  derived from xs:duration
  xs:dayTimeDuration form: PnDTnHnMn.nS (e.g., P3DT10H30M12.5S)
  xs:yearMonthDuration form: PnYnM (e.g., P2Y6M)
Atomic (typed) value construction
 signature (format): see XPath constructor functions
  prefix:TYPE($arg as xs:anyAtomicType?) as prefix:TYPE?
  where prefix:TYPE is the QName of the target type, xs:anyAtomicType? is the input type, and prefix:TYPE? is the output type.
 Notes:
  ? means the input and the output are each a sequence of zero or one atomic value.
  if $arg is the empty sequence (), the output is also ().
 possible prefix:TYPE
  xs:integer, xs:int, xs:dateTime, xs:boolean, …
  can also be user-defined atomic types: bk:ISBN, np:IP
List of constructors for built-in types
 xs:string($arg as xs:anyAtomicType?) as xs:string?
  xs:string("abc") → string "abc"; xs:string(123) → "123"
 xs:boolean($arg as xs:anyAtomicType?) as xs:boolean?
  xs:boolean("abc") → error; xs:boolean("") → error; xs:boolean(10) → true;
  xs:boolean() → error (wrong number of arguments); xs:boolean(()) → ()
  Note: xs:boolean != fn:boolean (effective boolean value)
 xs:decimal($arg as xs:anyAtomicType?) as xs:decimal?
  xs:decimal("123.456789") → 123.456789
 xs:float($arg as xs:anyAtomicType?) as xs:float?
 xs:double($arg as xs:anyAtomicType?) as xs:double?
 Note:
  xs:int("1234567891234") → error (outside the xs:int range)
  xs:integer("1234567891234") → 1234567891234
 All others are similar:
  xs:duration, xs:dateTime, xs:time, xs:date, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gDay, xs:gMonth,
  xs:hexBinary, xs:base64Binary,
  xs:anyURI, xs:QName,
  xs:normalizedString, xs:token, xs:language, xs:NMTOKEN, xs:Name, xs:NCName, xs:ID, xs:IDREF, xs:ENTITY,
  xs:integer, xs:long, xs:int, xs:short, xs:byte,
  xs:nonPositiveInteger, xs:negativeInteger, xs:nonNegativeInteger,
  xs:unsignedLong, xs:unsignedInt, xs:unsignedShort, xs:unsignedByte,
  xs:positiveInteger, xs:yearMonthDuration, xs:dayTimeDuration, xs:untypedAtomic
More Examples
 xs:string("abc"), xs:int("123")
 xs:float("123.3e10")
 xs:date("2006-11-12")
 xs:gMonthDay("--11-12")
 xs:gMonth("--11")
 xs:gDay("---12")
 xs:dateTime("2006-11-12T12:00:00")
 fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00")) → xs:dateTime("1999-12-31T12:00:00").
 fn:dateTime(xs:date("1999-12-31"), xs:time("24:00:00")) returns xs:dateTime("1999-12-31T00:00:00") because "24:00:00" is an alternate lexical form for "00:00:00".
  note: 24:00:00 = 00:00:00
String values
 Every atomic value has a string representation.
 The value can be obtained by the casting operation:
  Ex: (xs:int("123") + 45) cast as xs:string
  returns "168"
Properties of nodes
 string value
  Every node has a string-value, which is part of the node or computed from the string-values of its descendant nodes.
  can be obtained by string(.)
 typed value
  can be obtained by data(.)
 expanded-name (XPath 1.0; replaced by EQName in 2.0)
  expanded-name = namespace URI + local part
  The namespace URI is either null or a URI string [RFC2396].
  Two expanded-names are equal if they have the same local part and the same namespace URI.
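The sketch below approximates two of these properties with Python's xml.etree.ElementTree. ElementTree has no separate text nodes, so this is only an approximation. An element's string value is the concatenation of its descendant text in document order, which `Element.itertext()` yields. ElementTree writes an expanded name as `{namespace-URI}local` (Clark notation), so two names with different prefixes but the same URI and local part compare equal. The helper `string_value` and the URI `urn:example` are hypothetical.

```python
import xml.etree.ElementTree as ET


def string_value(elem: ET.Element) -> str:
    """Approximate string(.) for an element: all descendant text, in document order."""
    return "".join(elem.itertext())


p = ET.fromstring("<p>XPath <em>is</em> fun</p>")
print(string_value(p))  # XPath is fun

# Different prefixes, same namespace URI and local part: equal expanded names.
a1 = ET.fromstring('<x:a xmlns:x="urn:example"/>')
a2 = ET.fromstring('<y:a xmlns:y="urn:example"/>')
print(a1.tag, a1.tag == a2.tag)  # {urn:example}a True
```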
Node relationship
 Same as in xpath 1.0
properties/relationships of nodes
m(e) is the URI bound to prefix e; numbers in the children and parent columns refer to node types.
node type | expanded name | string-value | children | parent
1. root (document) | (no value) | descendant texts | 2, 5, 6 | {}
2. element (e:local) | m(e) + local, or null + local | descendant texts | 2, 3, 5, 6 | 1, 2
3. text | (no value) | text content | {} | 2
4. attribute (e:attr="…") | m(e) + attr, or null + attr | attr value (normalized) | {} | 2
5. comment | (no value) | text of content | {} | 1, 2
6. PI | null + PITarget | PIData | {} | 1, 2
7. namespace (xmlns:p="uri") | null + p, or null + "" | uri | {} | 2
3. Location Paths (renamed PathExpr in 2.0)
 Same as in XPath 1.0 (except for some minor changes)
 LocationPath
  a special kind of expression,
  used to locate a sequence of nodes in the document;
  the result is sorted in document order,
  with no duplicates.
Kinds of Expressions
3.1 Primary Expressions: string + numeric literals
3.2 Path Expressions
3.3 Sequence Expressions: , to [ … ], |, intersect, except
3.4 Arithmetic Expressions: +, -, *, div, idiv, mod
3.5 Comparison Expressions: is, <, >, =, le, ge, eq, ne, …
3.6 Logical Expressions: and, or, not
3.7 For Expressions: for
3.8 Conditional Expressions: if
3.9 Quantified Expressions: every, some
3.10 Expressions on SequenceTypes
Primary Expressions
 Literals
  string: "abc", 'abc', "He said ""OK""", 'He said "ok"'.
  numeric: 123 → xs:integer, 123.4 → xs:decimal, 124.4e5 → xs:double
  non-literals:
   xs:int("125") = xs:int(125) = 125 cast as xs:int
   boolean: fn:true(), fn:false()
 Variable References: $pre:name, $var-1
 Parenthesized Expressions: ( ), ( expr )
 Context Item Expression: .
  (1 to 100)[. mod 5 eq 0]
  //book[fn:count(./author) > 1]
 Function Calls: pre:fName(arg1, …, argn)
  fn:concat("abc", "def")
Literal Expressions
42
3.1415
6.022E23
'XPath is a lot of fun'
"XPath is a lot of fun"
'The cat said "Meow!"'
"The cat said ""Meow!"""
"XPath is just
so much fun"
Variable References
$foo
$bar:foo
 $foo-17 refers to the variable "foo-17" (a hyphen may appear inside a name)
 Possible fixes, to subtract 17 from $foo:
($foo)-17, $foo -17, $foo+-17
XPath operators and their precedences (1 = lowest precedence; all operators are left-associative!)
 #   Operator                                                      Kind
 1   , (comma)
 3   for, some, every, if
 4   or                                                            logical
 5   and                                                           logical
 6   eq, ne, lt, le, gt, ge, =, !=, <, <=, >, >=, is, <<, >>       comparison
 7   to
 8   +, -                                                          arithmetic
 9   *, div, idiv, mod                                             arithmetic
10   union, |                                                      combine node seq (node seq only)
11   intersect, except                                             combine node seq (node seq only)
12   instance of
13   treat
14   castable
15   cast
16   -(unary), +(unary)                                            unary arithmetic
17   ?, *(OccurrenceIndicator), +(OccurrenceIndicator)
18   /, //                                                         path step
19   [ ]                                                           predicate
Path Expressions
 Location paths are expressions.
 They may be applied to arbitrary sequences.
 The evaluation rule was discussed before.
Sequence Expressions
 Constructing sequences: , (comma) and to
  (1,2,3), (), (3) → (1,2,3,3)
  2 to 4 → (2,3,4)
  (10, (1 to 3)) → (10,1,2,3)
  (1,(2,3,4),((5))) → (1,2,3,4,5) -- flattened
 Filter Expressions: PrimaryExpr [ … ]*
  (1 to 30)[. mod 3 = 0][. mod 5 = 0] → (15, 30)
  (10 to 20)[5] → (14)
 Combining Node Sequences (for nodes only):
  assume doc order: A < B < C < D < E
  union: (A,B,A) | (B,C) | (A,C) = (A,B) union (B,C) → (A,B,C)
  intersect, except:
   (A,B,C,D) intersect (B,D,A,E) except (B) → (A, D).
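Here is a Python sketch of the three node-sequence operators. It assumes hypothetical nodes A to E, identified by name, with their document order given by a lookup table. Real implementations compare node identity rather than names. The point is that each result is deduplicated and sorted into document order. It also shows that intersect and except, which have the same precedence, group left to right.

```python
# Hypothetical nodes A..E; DOC_ORDER maps each one to its document-order position.
DOC_ORDER = {name: pos for pos, name in enumerate("ABCDE")}


def in_doc_order(nodes) -> tuple:
    """Drop duplicate nodes and sort the rest into document order."""
    return tuple(sorted(set(nodes), key=DOC_ORDER.__getitem__))


def union(s1, s2) -> tuple:
    return in_doc_order([*s1, *s2])


def intersect(s1, s2) -> tuple:
    return in_doc_order(n for n in s1 if n in s2)


def except_(s1, s2) -> tuple:  # trailing underscore: "except" is a Python keyword
    return in_doc_order(n for n in s1 if n not in s2)


# (A,B,A) | (B,C) | (A,C)
print(union(union(("A", "B", "A"), ("B", "C")), ("A", "C")))  # ('A', 'B', 'C')
# ((A,B,C,D) intersect (B,D,A,E)) except (B)
print(except_(intersect(("A", "B", "C", "D"), ("B", "D", "A", "E")), ("B",)))  # ('A', 'D')
```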
Filter Expressions
 Predicates are generalized to arbitrary sequences.
 The expression '.' is the context item.
 The expression:
(10 to 40)[. mod 5 = 0 and position() > 20]
has the result:
30, 35, 40
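A Python sketch of how a predicate filters a sequence. Each item is evaluated with its 1-based context position and the context size. A numeric predicate value keeps only the item whose position equals it. Any other value is reduced to a boolean. The helper names `filter_seq` and `to` are hypothetical, and the boolean step is a simplification of the effective-boolean-value rule.

```python
from typing import Callable, List, Sequence


def to(a: int, b: int) -> List[int]:
    """XPath 'a to b' (both ends inclusive)."""
    return list(range(a, b + 1))


def filter_seq(seq: Sequence, pred: Callable[[object, int, int], object]) -> list:
    """Keep the items for which pred(item, position, size) holds.

    A numeric result n keeps the item only if n equals its position
    (so [5] means [position() = 5]); other results are treated as booleans."""
    size = len(seq)
    kept = []
    for pos, item in enumerate(seq, start=1):
        r = pred(item, pos, size)
        if isinstance(r, bool):  # check bool first: bool is a subclass of int
            keep = r
        elif isinstance(r, (int, float)):
            keep = r == pos
        else:
            keep = bool(r)
        if keep:
            kept.append(item)
    return kept


# (10 to 40)[. mod 5 = 0 and position() > 20]
print(filter_seq(to(10, 40), lambda v, pos, size: v % 5 == 0 and pos > 20))  # [30, 35, 40]
# (1 to 30)[. mod 3 = 0][. mod 5 = 0]
print(filter_seq(filter_seq(to(1, 30), lambda v, p, s: v % 3 == 0),
                 lambda v, p, s: v % 5 == 0))                                # [15, 30]
# (10 to 20)[5]
print(filter_seq(to(10, 20), lambda v, p, s: 5))                             # [14]
```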
Arithmetic Expressions
 +, -, *, div, idiv, mod, +, - (unary)
  -3 div 2 → -1.5 (decimal)
  -3 idiv 2 → -1 (integer; truncates toward zero)
  -3.4 mod 2 (or -2) → -1.4
  rule: x = y * (x idiv y) + (x mod y)
 precedence: {+, -} < {*, mod, div, idiv} < {unary +, -}
 Operators are generalized to sequences:
  if any argument is empty, the result is empty
   () + 3 → ()
  if all arguments are singleton sequences of numbers, the numbers are combined:
   (3) + (4) + 5 → 12
  otherwise, a runtime error occurs
(1,3) + (2,4) → error
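A Python sketch of the idiv/mod rules, using the standard decimal module. Python's `//` on ints floors, so -3 // 2 is -2, whereas XPath's idiv truncates toward zero. Decimal's `//` happens to truncate toward zero, which is why the sketch uses Decimal. The function names are hypothetical. Errors such as division by zero are left to Decimal's own exceptions rather than XPath's error codes.

```python
from decimal import Decimal


def idiv(x: Decimal, y: Decimal) -> int:
    """XPath 'x idiv y': the quotient truncated toward zero."""
    return int(x // y)  # Decimal's // truncates toward zero (int's // floors instead)


def xmod(x: Decimal, y: Decimal) -> Decimal:
    """XPath 'x mod y', defined by the rule x = y * (x idiv y) + (x mod y)."""
    return x - y * idiv(x, y)


print(Decimal(-3) / Decimal(2))            # -1.5   (-3 div 2)
print(idiv(Decimal(-3), Decimal(2)))       # -1     (-3 idiv 2)
print(-3 // 2)                             # -2     (Python int floor division, for contrast)
print(xmod(Decimal("-3.4"), Decimal(2)))   # -1.4   (-3.4 mod 2)
print(xmod(Decimal("-3.4"), Decimal(-2)))  # -1.4   (-3.4 mod -2)
```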
Comparison Expressions → boolean
 Value Comparisons
  comparison operators: eq, ne, lt, le, gt, ge.
  used for comparing single values.
 General Comparisons (**)
  operators: =, !=, <, <=, >, >=.
  are existentially quantified comparisons that may be applied to operand sequences of any length.
  The result is true or false, unless an error is raised.
 Node Comparisons
  operators: is, >>, <<
  A is B → true if A and B are the same node
  A << B (equivalently B >> A) → true if A precedes B in document order.
Value Comparison
 Comparison operators:
  eq(=), ne(≠), lt(<), le(<=), gt(>), ge(>=)
 Used on atomic values
 When applied to arbitrary values (sequences):
  atomize
  if either argument is empty → ()
  if either argument has length > 1 → type error
  if the values are incomparable → runtime error; ex: 8 lt "abc"
  otherwise, compare the two atomic values
 8 eq 4+4
(//rcp:ingredient)[1]/@name eq "beef cube steak"
Node Comparison
 Operators: is, <<, >>
 Used to compare nodes on identity and order
  is for node identity; <<, >> for node ordering
 When applied to arbitrary values:
  if either argument is empty, the result is empty
  if both are singleton nodes, the nodes are compared
  otherwise, a runtime error. Ex: //book[1] is "abc"
 Ex:
  (//student)[2] is //student[@id = "s9527"]
  /rcp:collection << (//rcp:recipe)[4]
  (//rcp:recipe)[4] >> (//rcp:recipe)[3]
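A Python sketch of << and is for elements, using xml.etree.ElementTree on a cut-down version of the earlier document-order example. `Element.iter()` walks elements depth-first in document order, so enumerating it gives each element a rank. ElementTree does not expose attribute, namespace, text or comment nodes as nodes, so only elements are ranked. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><a1>data1</a1><a2>data2</a2><a3><b3/></a3></a>")

# Rank every element by its position in a depth-first (document-order) walk.
# Keys are id(elem): the rank belongs to the node's identity, not its content.
rank = {id(e): pos for pos, e in enumerate(doc.iter())}


def precedes(x: ET.Element, y: ET.Element) -> bool:
    """x << y: x comes before y in document order."""
    return rank[id(x)] < rank[id(y)]


def follows(x: ET.Element, y: ET.Element) -> bool:
    """x >> y, i.e. y << x."""
    return precedes(y, x)


a1, b3 = doc.find("a1"), doc.find("a3/b3")
print(precedes(doc, a1), precedes(a1, b3), follows(b3, a1))  # True True True
print(a1 is doc.find("a1"))                                   # True (same node)
```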
General Comparison (use with care!!)
 Operators: =, !=, <, <=, >, >=
 Used on general sequences:
  atomize
  if there exist two values, one from each argument, whose value comparison holds, the result is true (note: an error may be raised during a value comparison)
  otherwise, the result is false
8 = 4+4
(1,2) = (2,4)
//rcp:ingredient/@name = "salt"
() = () → false!!
(2) != ("2") → runtime error (2.0), false (in 1.0 mode)
(1,2) = (1, "2") → true
(1,2) = ("2", 1) → runtime error (true in 1.0 mode)
I.e., seq1 gop seq2 means
∃x1∈seq1 ∃x2∈seq2 (x1 vop x2).
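A Python sketch of the difference between value and general comparisons. It rests on a few simplifying assumptions: atomic values are Python ints, floats or strings; a sequence is a tuple; comparing a string with a number is treated as a type error (raised as TypeError); and empty-operand handling for value comparisons is omitted. Because `any()` stops at the first matching pair, in this sketch the two mixed examples (1,2) = (1, "2") and (1,2) = ("2", 1) differ only in which pair gets compared first.

```python
import operator

# Value comparison operators, applied to exactly one atomic value on each side.
VALUE_OPS = {
    "eq": operator.eq, "ne": operator.ne, "lt": operator.lt,
    "le": operator.le, "gt": operator.gt, "ge": operator.ge,
}


def value_compare(op: str, x, y) -> bool:
    """x op y for two atomic values; string vs. number is treated as a type error."""
    if isinstance(x, str) != isinstance(y, str):
        raise TypeError(f"cannot compare {x!r} with {y!r}")
    return VALUE_OPS[op](x, y)


def general_compare(op: str, seq1: tuple, seq2: tuple) -> bool:
    """seq1 op seq2 holds iff some x1 in seq1 and x2 in seq2 satisfy x1 op x2."""
    # any() stops at the first true pair, so later (possibly erroneous) pairs are skipped.
    return any(value_compare(op, x1, x2) for x1 in seq1 for x2 in seq2)


print(general_compare("eq", (8,), (4 + 4,)))    # True   8 = 4+4
print(general_compare("eq", (1, 2), (2, 4)))    # True   (1,2) = (2,4)
print(general_compare("eq", (), ()))            # False  () = ()
print(general_compare("eq", (1, 2), (1, "2")))  # True   first pair (1, 1) already matches
try:
    general_compare("eq", (1, 2), ("2", 1))     # first pair (1, "2") raises
except TypeError as err:
    print("error:", err)
```

The same helper exhibits the axiom failures of general comparisons. For example, `general_compare("eq", (1, 2), (2, 3))` and `general_compare("eq", (2, 3), (3, 4))` are both True, while `general_compare("eq", (1, 2), (3, 4))` is False.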
Be Careful About Comparisons
((//rcp:ingredient)[40]/@name, (//rcp:ingredient)[40]/@amount) eq
((//rcp:ingredient)[53]/@name, (//rcp:ingredient)[53]/@amount)
 → type error, since eq can only compare singletons with compatible values
((//rcp:ingredient)[40]/@name, (//rcp:ingredient)[40]/@amount) =
((//rcp:ingredient)[53]/@name, (//rcp:ingredient)[53]/@amount)
 → true, since the two names are found to be equal
((//rcp:ingredient)[40]/@name, (//rcp:ingredient)[40]/@amount) is
((//rcp:ingredient)[53]/@name, (//rcp:ingredient)[53]/@amount)
 → runtime error, since only single-node sequences can be compared
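Algebraic Axioms for Comparisons
•Reflexivity:
$x = x$
•Symmetry:
$x = y \Rightarrow y = x$
•Transitivity:
$x = y \land y = z \Rightarrow x = z$
$x < y \land y < z \Rightarrow x < z$
•Anti-symmetry:
$x \le y \land y \le x \Rightarrow x = y$
•Negation:
$x \ne y \Leftrightarrow \neg(x = y)$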
General comparisons violate most axioms
 Reflexivity?
() = () yields false
 Transitivity?
(1,2) = (2,3), (2,3) = (3,4), but not (1,2) = (3,4)
 Anti-symmetry?
(1,4) <= (2,3), (2,3) <= (1,4), but not (1,4) = (2,3)
 Negation?
(1) != () yields false, (1) = () yields false
Logical Expressions
 Operators: and, or
 Constants use functions:
  true() and false()
 Negation uses a function:
  not(…)
 precedence: or < and < not(.)
 Arguments are coerced; an argument counts as false if its value is:
  the boolean: false()
  the empty sequence: ()
  the empty string: ""
  the number zero: 0
 e.g.: 0 or "0" → true; not("0") → false; 0 or () → false
Functions
 XPath has an extensive function library
 Default namespace for functions:
http://www.w3.org/2005/xpath-functions
http://www.w3.org/2006/xpath-functions
 106 functions are required
 More functions with the namespace:
http://www.w3.org/2001/XMLSchema
 for constructors
Function Invocation
 Calling a function with 4 arguments:
fn:avg(1,2,3,4) -- error: fn:avg takes a single sequence argument
 Calling a function with 1 argument:
fn:avg((1,2,3,4))
Numeric operators and functions
 Arithmetic operators:
+, -, *, div, idiv, mod
ex: 2 + 3, + 3, 5.0 - 4, -+4.0,
30.2 div 4.2, 30 idiv 4, 20 mod 3
 Value comparisons:
 eq(=), ne(!=), le(<=), lt(<), ge(>=), gt(>)
 2.3 > 5;
4 != 3; 4 ge 6
 Functions:
fn:abs(-23.4) = 23.4
fn:ceiling(23.4) = 24
fn:floor(23.4) = 23
fn:round(23.4) = 23; fn:round(-23.5) = -23 (fn:round rounds halves toward positive infinity)
fn:round-half-to-even(-23.5) = -24
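A Python sketch of these rounding rules, using the standard decimal module so that exact halves are not blurred by binary floating point. The function names are hypothetical. fn:round is modeled as floor(x + 0.5), which sends halves toward positive infinity. The optional precision argument of fn:round-half-to-even is left out.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN


def fn_ceiling(x: Decimal) -> Decimal:
    return x.to_integral_value(rounding=ROUND_CEILING)


def fn_floor(x: Decimal) -> Decimal:
    return x.to_integral_value(rounding=ROUND_FLOOR)


def fn_round(x: Decimal) -> Decimal:
    """Nearest integer; an exact half goes toward positive infinity."""
    return fn_floor(x + Decimal("0.5"))


def fn_round_half_to_even(x: Decimal) -> Decimal:
    """Nearest integer; an exact half goes to the even neighbour."""
    return x.to_integral_value(rounding=ROUND_HALF_EVEN)


# Columns: value, abs, ceiling, floor, round, round-half-to-even
for v in ("23.4", "-23.5", "23.5"):
    d = Decimal(v)
    print(v, abs(d), fn_ceiling(d), fn_floor(d), fn_round(d), fn_round_half_to_even(d))
```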
Boolean Functions
 Note: there are no constants for true/false;
  use the functions true() and false() instead.
 Boolean operators: and, or
  a and b or c means (a and b) or c
 functions: not(.), true(), false()
  fn:not(0) = fn:true() = fn:not((0))
  fn:not(fn:true()) = fn:false()
  fn:not("") = fn:true()
  fn:not((1)) = fn:false() = fn:not(2)
Notes:
 0, "" and () have effective boolean value false.
 (1) has effective boolean value true.
Effective boolean values (= fn:boolean(s))
 The following values are interpreted as true:
  boolean true
  non-empty string
  non-zero number
  a sequence whose first item is a node
 The following values are interpreted as false:
  boolean false, empty string, 0, 0.0 or NaN,
  () // empty sequence
 All other cases are type errors.
 Usage
  Used in: and, or, not(.), E1[E2], if, some, every, (>, <, =, …; 1.0)
  Not used in: xs:boolean(.), . cast as xs:boolean, passing a value to an xs:boolean argument.
 Examples:
  (2,3) or (4,5) → runtime error; (/, 2) → true; (2, //e) → error
  2 and "" → false;
(2) and (3) → true (why?)
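A Python sketch of the effective-boolean-value rules listed above. It assumes a sequence is a Python tuple, atomic values are bool, int, float or str, and nodes are instances of a hypothetical `Node` class. A TypeError stands in for XPath's type error.

```python
import math


class Node:
    """Hypothetical stand-in for an XML node; only its type matters here."""


def effective_boolean_value(seq: tuple) -> bool:
    """Sketch of fn:boolean on a sequence; raises TypeError where XPath raises a type error."""
    if len(seq) == 0:
        return False                 # () is false
    if isinstance(seq[0], Node):
        return True                  # first item is a node: true
    if len(seq) > 1:
        raise TypeError("no effective boolean value for 2+ atomic values")
    item = seq[0]
    if isinstance(item, bool):       # check bool before int: bool is a subclass of int
        return item
    if isinstance(item, str):
        return item != ""
    if isinstance(item, (int, float)):
        return not (item == 0 or math.isnan(item))
    raise TypeError(f"no effective boolean value for {item!r}")


print(effective_boolean_value((Node(), 2)))                                # True   (/, 2)
print(effective_boolean_value((2,)) and effective_boolean_value(("",)))   # False  2 and ""
print(effective_boolean_value((2,)) and effective_boolean_value((3,)))    # True   (2) and (3)
```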
String Functions
fn:concat("X","ML") = "XML"
fn:concat("X","ML"," ","book") = "XML book"
fn:string-join(("XML","book")," ") = "XML book"
fn:string-join(("1","2","3"),"+") = "1+2+3"
fn:substring("XML book",5) = "book"
fn:substring("XML book",2,4) = "ML b"
fn:string-length("XML book") = 8
fn:upper-case("XML book") = "XML BOOK"
fn:lower-case("XML book") = "xml book"
fn:translate("bar","abc","ABC") = "BAr"
fn:translate("--aaa--","abc-","ABC") = "AAA"
fn:translate("abcdabc", "abc", "AB") = "ABdAB"
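A Python sketch of fn:translate's character-mapping rule. It is written out explicitly rather than with `str.translate`, so each rule is visible. A character found in the map string is replaced by the character at the same position in the translate string, or removed when the translate string is too short. When a character occurs more than once in the map string, its first occurrence wins. The function name is hypothetical.

```python
def fn_translate(arg: str, map_string: str, trans_string: str) -> str:
    """Sketch of fn:translate($arg, $mapString, $transString)."""
    table = {}
    for i, ch in enumerate(map_string):
        if ch in table:
            continue  # only the first occurrence of a character in map_string counts
        # None marks "delete": there is no character at this position in trans_string.
        table[ch] = trans_string[i] if i < len(trans_string) else None
    out = []
    for ch in arg:
        if ch not in table:
            out.append(ch)         # unmapped characters are kept
        elif table[ch] is not None:
            out.append(table[ch])  # mapped characters are replaced
    return "".join(out)


print(fn_translate("bar", "abc", "ABC"))       # BAr
print(fn_translate("--aaa--", "abc-", "ABC"))  # AAA
print(fn_translate("abcdabc", "abc", "AB"))    # ABdAB
```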
Regexp Functions
fn:contains("XML book","XML") = fn:true()
fn:matches("XML book","XM..[a-z]*") = fn:true()
fn:matches("XML book",".*Z.*") = fn:false()
fn:replace("XML book","XML","Web") = "Web book"
fn:replace("XML book","[a-z]","8") = "XML 8888"
Cardinality Functions on sequence
fn:exists(()) = fn:false()
fn:exists((1,2,3,4)) = fn:true()
fn:empty(()) = fn:true()
fn:empty((1,2,3,4)) = fn:false()
fn:count((1,2,3,4)) = 4
fn:count(//rcp:recipe) = 5
Sequence Functions
fn:distinct-values((1, 2, 3, 4, 3, 2)) = (1, 2, 3, 4)
fn:insert-before((2, 4, 6, 8), 2, (3, 5))
= (2, 3, 5, 4, 6, 8) (: 2 is the position :)
fn:remove((2, 4, 6, 8), 3) = (2, 4, 8)
fn:reverse((2, 4, 6, 8)) = (8, 6, 4, 2)
fn:subsequence((2, 4, 6, 8, 10), 2) = (4, 6, 8, 10)
fn:subsequence((2, 4, 6, 8, 10), 2, 3) = (4, 6, 8)
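A Python sketch of these functions. Tuples serve as sequences, and XPath's 1-based positions are translated to Python's 0-based slicing. Out-of-range handling is intended to follow the Functions and Operators definitions: insert-before clamps the position, and remove ignores an out-of-range position. The helper names are hypothetical, and positions are assumed to be integers (the real fn:subsequence accepts doubles and rounds them).

```python
from typing import Optional


def distinct_values(seq: tuple) -> tuple:
    # dict.fromkeys keeps the first occurrence of each value; XPath leaves the order
    # implementation-dependent, so first-occurrence order is just one valid choice.
    return tuple(dict.fromkeys(seq))


def insert_before(target: tuple, position: int, inserts: tuple) -> tuple:
    # Clamp the 1-based position to 1 .. len(target) + 1, then splice.
    i = min(max(position, 1), len(target) + 1) - 1
    return target[:i] + inserts + target[i:]


def remove(target: tuple, position: int) -> tuple:
    # An out-of-range position leaves the sequence unchanged.
    if 1 <= position <= len(target):
        return target[:position - 1] + target[position:]
    return target


def subsequence(source: tuple, start: int, length: Optional[int] = None) -> tuple:
    # Keep items whose 1-based position p satisfies start <= p (< start + length).
    end = len(source) + 1 if length is None else start + length
    return tuple(x for p, x in enumerate(source, start=1) if start <= p < end)


print(distinct_values((1, 2, 3, 4, 3, 2)))     # (1, 2, 3, 4)
print(insert_before((2, 4, 6, 8), 2, (3, 5)))  # (2, 3, 5, 4, 6, 8)
print(remove((2, 4, 6, 8), 3))                 # (2, 4, 8)
print(tuple(reversed((2, 4, 6, 8))))           # (8, 6, 4, 2)
print(subsequence((2, 4, 6, 8, 10), 2))        # (4, 6, 8, 10)
print(subsequence((2, 4, 6, 8, 10), 2, 3))     # (4, 6, 8)
```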
Aggregate Functions
fn:avg((2, 3, 4, 5, 6, 7)) = 4.5
fn:max((2, 3, 4, 5, 6, 7)) = 7
fn:min((2, 3, 4, 5, 6, 7)) = 2
fn:sum((2, 3, 4, 5, 6, 7)) = 27
fn:count((2, 3, 4, 5, 6, 7)) = 6
Node Functions
fn:doc("http://www.brics.dk/ixwt/examples/recipes.xml")
fn:position()
fn:last()
Coercion Functions
xs:integer("5") = 5, or equivalently "5" cast as xs:integer
xs:integer(7.0) = 7, or equivalently 7.0 cast as xs:integer
xs:decimal(5) = 5.0
xs:decimal("4.3") = 4.3
xs:decimal("4") = 4.0
xs:double(2) = 2.0E0
xs:double(14.3) = 1.43E1
xs:boolean(0) = fn:false()
xs:boolean("true") = fn:true()
xs:string(17) = "17"
xs:string(1.43E1) = "14.3"
xs:string(fn:true()) = "true"
 castable
if (12345678901 castable as xs:int) then 12345678901 cast as xs:int
else 12345678901 cast as xs:long
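A Python sketch of the castable test above, assuming an integer input. xs:int covers the 32-bit range -2^31 to 2^31-1 and xs:long the 64-bit range -2^63 to 2^63-1, so 12345678901 fails the xs:int test and falls through to xs:long. The function names are hypothetical. Real castable checks also handle lexical forms such as strings, which this sketch ignores.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1      # value space of xs:int
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1    # value space of xs:long


def castable_as_int(value: int) -> bool:
    """Sketch of 'value castable as xs:int' for an integer input."""
    return INT_MIN <= value <= INT_MAX


def cast_int_or_long(value: int) -> tuple:
    """Mirrors: if (v castable as xs:int) then v cast as xs:int else v cast as xs:long."""
    if castable_as_int(value):
        return ("xs:int", value)
    if LONG_MIN <= value <= LONG_MAX:
        return ("xs:long", value)
    raise OverflowError(f"{value} does not fit in xs:long")


print(cast_int_or_long(12345678901))  # ('xs:long', 12345678901)
print(castable_as_int(1234567891234))  # False, matching the xs:int("1234567891234") error
```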
For Expressions
 The expression
for $r in //rcp:recipe
return fn:count($r//rcp:ingredient[fn:not(rcp:ingredient)])
returns the value
11, 12, 15, 8, 30
 The expression
for $i in (1 to 5)
for $j in (1 to $i)
return $j
returns the value
1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5
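For readers who know Python, a nested for expression corresponds closely to a nested list comprehension. The sketch below reproduces the second example. XPath's `a to b` includes b, hence the `+ 1` in `range`. The single flat result list mirrors the fact that the returned sequence is flat.

```python
# for $i in (1 to 5) for $j in (1 to $i) return $j
result = [j for i in range(1, 6) for j in range(1, i + 1)]
print(result)  # [1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5]
```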
Conditional Expressions (IfThenElse)
fn:avg(
for $r in //rcp:ingredient return
if ( $r/@unit = "cup" )
then xs:double($r/@amount) * 237
else if ( $r/@unit = "teaspoon" )
then xs:double($r/@amount) * 5
else if ( $r/@unit = "tablespoon" )
then xs:double($r/@amount) * 15
else ()
)
Quantified Expressions
 form: (some | every) $var1 in Expr1, …, $varn in Exprn satisfies Expr
 a boolean expression
Ex: some $r in //rcp:ingredient
satisfies $r/@name eq "sugar"
is equivalent to
fn:exists(
for $r in //rcp:ingredient return
if ($r/@name eq "sugar") then fn:true() else
()
)
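In Python terms, some and every correspond to `any()` and `all()` over a generator. The ingredient names below are made-up illustrative data, not taken from the recipes file. The sketch also checks the fn:exists formulation and the edge case of an empty input, where some is false and every is true.

```python
# Hypothetical @name values for //rcp:ingredient.
names = ["flour", "sugar", "butter"]

# some $r in ... satisfies $r/@name eq "sugar"
some_sugar = any(n == "sugar" for n in names)
# every $r in ... satisfies $r/@name eq "sugar"
every_sugar = all(n == "sugar" for n in names)
# fn:exists(for $r in ... return if (...) then fn:true() else ())
exists_form = len([True for n in names if n == "sugar"]) > 0

print(some_sugar, every_sugar, exists_form)  # True False True
# Over an empty sequence, some is false and every is true.
print(any(n == "sugar" for n in []), all(n == "sugar" for n in []))  # False True
```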
Expressions on sequence types
 Expressions on SequenceTypes
1. Instance Of
2. Cast
3. Castable
4. Constructor Functions
5. Treat
 Sequence type
 is used to refer to the type of an XPath expression whose
value is always a sequence.
 syntax given in SequenceType .
sequence type syntax
 sequence-type ::= empty-sequence() | item-type ( ? | + | * )?
 item-type ::= atomic-type | item() | kind-test
 atomic-type ::= any QName // xs:int, my:type
 kind-test: described next
kind-test
 generic cases:
  AnyKindTest → node() // any node
  DocumentTest → document-node(), … // any doc
  ElementTest → element(), … // any element
  AttributeTest → attribute(), … // any attribute
  PITest → processing-instruction() // any PI
  CommentTest → comment() // any comment
  TextTest → text() // any text node
 ex: //sale can be treated as element()*;
  (//sale, 2) can be treated as item()+
kind-test
 Specialized cases:
  DocumentTest → document-node(RootElementTest)
   document-node(element(book, bookType)) // root element is a book
  ElementTest → element(ElementNameOr* [, typeName [?]])
   element(*, xs:int), element(p:e1), element(bk:book, bk:bookType?)
   element(bk:book, bk:bookType) // the type annotation is bookType or derived from it,
    // and nilled(.) must be false.
  AttributeTest → attribute(AttrNameOr* [, typeName])
   attribute(*, my:type), attribute(my:attr1), attribute(age, xs:int)
  SchemaElementTest → schema-element(QName)
   QName is the qualified name of a declared element.
  SchemaAttributeTest → schema-attribute(QName)
   QName is the qualified name of a declared attribute.
  PITest → processing-instruction([NCName | string])
Type conversion in XPath
 In XPath 2.0 there are two operators for type conversions:
  V cast as AT // change V to a value of atomic type AT
  V treat as ST // assume V is of sequence type ST (at static time) and raise a runtime error if it is not (like a (Type) obj cast in Java).
 Ex:
  xs:int(2) cast as xs:double // may require value conversion
  2 cast as xs:int // ok!
  2 treat as ? // no value conversion
   ok: xs:integer, xs:decimal, xs:integer+, xs:integer*
    (since 2 is of type xs:integer, which is xs:integer itself or derived from xs:decimal, and a single item matches + and *)
   runtime error: xs:int, xs:string
    (since xs:integer is not derived from xs:int or xs:string).
SequenceType expressions
 InstanceofExpr ::= TreatExpr instance of SequenceType
  5 instance of xs:integer, 5 instance of xs:decimal
  (6,5) instance of xs:integer+
  . instance of element()
 CastExpr ::= UnaryExpr [cast as AtomicType [?]]
  (2,3) cast as xs:double+ (x) // not allowed: the target must be an atomic type, optionally followed by ?
  2 cast as xs:float
 CastableExpr ::= CastExpr [castable as AtomicType [?]]
  (2,3) castable as xs:double+ (x) // not allowed
  2 castable as xs:double? → true; "abc" castable as xs:int → false
 TreatExpr ::= CastableExpr [treat as SequenceType]
  ex: @addr treat as attribute(*, USAddress)
   changes the declared (static) type of @addr to USAddress.
   During evaluation, if the actual (dynamic) type is not USAddress (or derived from it) → error
XPath 1.0 Restrictions
 Many implementations only support XPath 1.0:
  smaller function library
  implicit casts of values
  some expressions change semantics:
   "4" < "4.0": false in XPath 1.0 but true in XPath 2.0
   2 = "2": true in 1.0 but a type error in 2.0
XPointer
 A fragment identifier mechanism based on XPath
 Different ways of pointing to the fourth recipe:
...#xpointer(//recipe[4])
...#xpointer(//rcp:recipe[./rcp:title ='Zuppa Inglese'])
...#element(/1/5)
...#r102
Expression Hierarchy (1.0)
 PrimaryExpr → (Expr), funCall, number, literal, varReference
  (Expr), f(a,b,c), 2.3, "abc", $pre
 FilterExpr → PrimaryExpr pred*
  $ns[@name='abc']
 PathExpr → FilterExpr / LP, or FilterExpr // LP, or LP
  $ns[@name='abc']//author[2]
UnionExpr → PathExpr '|' PathExpr
UnaryExpr → - UnionExpr
MultiplicativeExpr → *, div, mod
AdditiveExpr → +, -
RelationalExpr → <, <=, >, >=
EqualityExpr → =, !=
AndExpr → and
OrExpr → or
Expr → OrExpr
Expression Hierarchy (2.0)
 PrimaryExpr →
  (Expr?), funCall, numberOrStringLiteral, varRef, cxtItemExpr
  (Expr), (), f(a,b,c), 2.3, "abc", $xyz, .
 StepExpr ::= (PrimaryExpr | AxisStep) Pred*
  $x[@name eq 'abc'],
  pre:e1[@name][2]
 RelativePathExpr ::= StepExpr (('/' | '//') StepExpr)*
  $ns[@name='abc']//author[2]/@name
PathExpr ::= ('/' RelativePathExpr?) | ('//' RelativePathExpr) | RelativePathExpr
ValueExpr ::= PathExpr
UnaryExpr ::= ('-' | '+')* ValueExpr
CastExpr ::= UnaryExpr ('cast' 'as' AtomicType '?'?)?
 /bk:books[2]/@name cast as xs:string
 () cast as xs:int?
 CastableExpr ::= CastExpr ('castable' 'as' AtomicType '?'?)?
  if ($x castable as my:type) then
  $x cast as my:type else
  $x cast as xs:string
 TreatExpr ::= CastableExpr ('treat' 'as' SequenceType)?
  $addr treat as element(*, USAddress)
  the static type of $addr may be element(*, Address), but it is required to be element(*, USAddress) at runtime; otherwise → dynamic error
 InstanceofExpr ::= TreatExpr ('instance' 'of' SequenceType)?
 IntersectExceptExpr ::= InstanceofExpr (('intersect' | 'except') InstanceofExpr)*
 UnionExpr ::= IntersectExceptExpr (('union' | '|') IntersectExceptExpr)*
 MultiplicativeExpr → *, div, idiv, mod
  5 div 2 * 3
 AdditiveExpr → +, -
  2+3-4
 RangeExpr ::= AdditiveExpr ('to' AdditiveExpr)?
  3 to 100
 ComparisonExpr ::= RangeExpr ((NodeCmp | ValueCmp | GeneralCmp) RangeExpr)?
 AndExpr → and
 OrExpr → or
 ExprSingle ::= OrExpr | IfExpr | ForExpr | QuantifiedExpr
 Expr ::= ExprSingle (',' ExprSingle)*
 XPath ::= Expr