The XML Standard
Overview of our XML Standards
• Motivation: HTML vs XML
• XML 101: syntax, elements, attributes,
DTDs, …
• XML 201: XML Schema, Namespaces
• XSLT: Transforming and Rendering XML
• XQuery: Search, Transform & Integrate
So what is XML (all about)?
Executive Summary:
• XML = HTML – idiosyncrasies (simplified syntax)
+ user-definable ("semantic") tags
• Separation of data and its presentation
=> simple, very flexible data exchange format:
semistructured data model
=> new applications:
• Information exchange (B2B), sharing (diglib), integration
("mediation"), archival, ...
• Web site management (XML+XSL stylesheets), ...
What’s Wrong with HTML?
Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
“Object Fusion in Mediator Systems”. In VLDB 96.
HTML confuses presentation
with content
<DT>
<IMG SRC="greenball.gif" > 
<A NAME="object-fusion"></A>
Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
<A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">
"ObjectFusion in Mediator Systems".</A>
In <I>VLDB 96.</I>
</DT>
...What’s Wrong with HTML...
No Explicit Structure,
Semantics, or Object-Orientation
<DT>
<IMG SRC="greenball.gif">
<A NAME="object-fusion"></A>
Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
<A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">
"ObjectFusion in Mediator Systems".</A>
In <I>VLDB 96.</I>
</DT>
Nothing in the markup says which part is the
• Author (the list of names)
• Title (the link text)
• Conference (the italicized "VLDB 96")
... And Some Repercussions
• Lack of schema/semantics when querying the Web
(HTML):
– "find documents (books, papers, ...)
where author = Michael Jackson"
(... and learn how software engineering meets the moon walker
...)
– "create a list of M. Jackson's books and (if available)
their prices"
=> HTML is inappropriate for
• data exchange
• automation of information management
(retrieval, manipulation, integration)
XML is Based on Markup
Markup indicates structure and semantics,
decoupled from presentation
<bibliography>
<paper ID="object-fusion">
<authors>
<author>Y.Papakonstantinou</author>
<author>S. Abiteboul</author>
<author>H. Garcia-Molina</author>
</authors>
<fullPaper source="fusion"/>
<title>Object Fusion in Mediator Systems</title>
<booktitle>VLDB 96</booktitle>
</paper>
</bibliography>
Elements and their Content
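<bibliography>
<paper ID="object-fusion">
<authors>
<author>Y.Papakonstantinou</author>
<author>S. Abiteboul</author>
<author>H. Garcia-Molina</author>
</authors>
<fullPaper source="fusion"/>
<title>Object Fusion in Mediator Systems</title>
<booktitle>VLDB 96</booktitle>
</paper>
</bibliography>
• Element name: e.g., bibliography, paper, author
• Element: start tag, content, and end tag, e.g., <author>S. Abiteboul</author>
• Element content: the child elements nested inside paper
• Empty element: <fullPaper source="fusion"/>
• Character content: the text inside title or booktitle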
Element Attributes
Attribute name: ID; attribute value: "object-fusion"
<bibliography>
<paper ID="object-fusion">
<authors>
<author>Y.Papakonstantinou</author>
<author>S. Abiteboul</author>
<author>H. Garcia-Molina</author>
</authors>
<fullPaper source="fusion"/>
<title>Object Fusion in Mediator Systems</title>
<booktitle>VLDB 96</booktitle>
</paper>
</bibliography>
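To make the element and attribute terminology concrete, here is a minimal sketch that reads the bibliography document with Python's standard `xml.etree.ElementTree` module. The choice of Python and the variable names are assumptions made for illustration; the slides do not prescribe any API.

```python
import xml.etree.ElementTree as ET

# The bibliography document from the slides.
BIBLIOGRAPHY = """<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>"""

root = ET.fromstring(BIBLIOGRAPHY)
paper = root.find("paper")

# Attribute: name "ID", value "object-fusion".
paper_id = paper.get("ID")

# Character content of each author element.
authors = [author.text for author in paper.iter("author")]

# Character content of the title element.
title = paper.findtext("title")

# fullPaper is an empty element: no text and no children, only an attribute.
full_paper = paper.find("fullPaper")
is_empty = full_paper.text is None and len(full_paper) == 0
```

The markup alone is enough to pick out authors and title; no knowledge of the presentation is needed.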
XML = Labeled Ordered Trees
bibliography
  paper (@id = 23)
    authors
      author: Yannis
      author: Serge
      ...
    fullpaper
    title: Object Fusion
    ...
  paper
    ...
• semistructured data
• labeled trees/graphs
can also represent
• relational and
• object-oriented data
<bibliography>
<paper id=23...>
<authors>
<author>Yannis</author>
<author>Serge</author>
...
</authors>
<title>Object Fusion</title>
...
</paper>
</bibliography>
In Search of the Lost Structure &
Semantics
• How do I share structure and metadata/semantics with my community?
• How do I learn and use the element structure of a document?
• How to make all this automatable?
Adding Structure and Semantics
• XML Document Type Definitions (DTDs):
– define the structure of "allowed" documents
(i.e., valid wrt. a DTD)
– roughly a database schema
=> improve query formulation, execution, ...
• XML Schema
– defines structure and data types
• XML Namespaces
– identify your vocabulary
• Resource Description Framework (RDF)
– simple metadata model
XML DTDs as Extended CFGs
XML DTD
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
Grammar
bibliography => paper*
paper => authors fullPaper? title booktitle
authors => author+
lhs = element (name)
rhs = regular expression over elements + strings (PCDATA)
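Because the right-hand side of each rule is a regular expression over element names, the idea can be sketched with an ordinary regex engine. The following is a rough illustration, not a DTD validator: the encoding of child names as space-terminated tokens and the helper names `children_match` and `conforms` are assumptions made for this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD, rewritten as regular expressions over
# space-terminated child element names (a hypothetical encoding).
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}

def children_match(element):
    """Check one element's children against its content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # no rule recorded for this element in the sketch
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model, child_names) is not None

def conforms(element):
    """Check the element and, recursively, all of its descendants."""
    return children_match(element) and all(conforms(c) for c in element)

good = ET.fromstring(
    "<bibliography><paper><authors><author>A</author></authors>"
    "<title>T</title><booktitle>B</booktitle></paper></bibliography>"
)
# Same children, but title appears before authors: violates the sequence.
bad = ET.fromstring(
    "<bibliography><paper><title>T</title><authors><author>A</author>"
    "</authors><booktitle>B</booktitle></paper></bibliography>"
)
```

Here `conforms(good)` holds while `conforms(bad)` does not, because the paper rule fixes the order of its children.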
Document Type Definitions (DTDs)
Define and Constrain
Element Names & Structure
Element Type Declarations
<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
Attribute List Declarations
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST paper ID ID #IMPLIED>
Element Declarations
<!ELEMENT bibliography (paper*)>
    (sequence of 0 or more paper)
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
    (authors followed by optional fullPaper, followed by title, followed by booktitle)
<!ELEMENT authors (author+)>
    (sequence of 1 or more author)
<!ELEMENT author (#PCDATA)>
    (character content)
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST paper ID ID #IMPLIED>
Element Content Declarations
Declaration        Meaning
element name       Exactly one instance of element
R?                 Zero or one instances of R
R*                 Zero or more instances of R
R+                 One or more instances of R
R1|R2|…|Rn         One instance of R1 or R2 or … Rn
#PCDATA            Character content
EMPTY              Empty element
(#PCDATA | e)*     Mixed content
ANY                Anything goes
Attributes
<person ID="yannis"> Yannis’ info </person>
<bibliography>
<paper ID="object-fusion" ROLE="publication">
<authors>
<author authorRef="yannis">
Y.Papakonstantinou</author>
</authors>
<fullPaper source="fusion"/>
<title>Object Fusion in Mediator Systems</title>
<related papers="semistructured-data mediators"/>
</paper>
</bibliography>
• ID="yannis", ID="object-fusion": object identity attribute
• ROLE="publication": CDATA (character data)
• authorRef="yannis": IDREF, an intradocument reference
• source="fusion": reference to external ENTITY
Attribute Types
Type               Meaning
ID                 Token unique within the document
IDREF              Reference to an ID token
IDREFS             Reference to multiple ID tokens
ENTITY             External entity (image, video, …)
ENTITIES           External entities
CDATA              Character data
NMTOKEN            Name token (as used in enumerations)
NMTOKENS           Name tokens
More to appear?    More types (e.g., DATE) may soon be part of the standard
Uses of XML Entities
• Physical partition
– size, reuse, "modularity", … (both XML docs &
DTDs)
• Non-XML data
– unparsed entities for binary data
• Non-standard characters
– character entities
• Shorthand for phrases & markup
Types of Entities
• Internal (to a doc) vs. External (use URI)
• General (in XML doc) vs. Parameter (in DTD)
• Parsed (XML) vs. Unparsed (non-XML)
Internal Text Entities
Internal Text Entity Declaration
<!ENTITY WWW "World Wide Web">
Entity Reference
<p>We all use the &WWW;.</p>
Logically equivalent to actually appearing
<p>We all use the World Wide Web.</p>
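The substitution can be observed with any XML parser that processes the internal DTD subset. As a small sketch, assuming Python's standard `xml.etree.ElementTree` (which expands internal general entities while parsing), the entity declaration is placed in a DOCTYPE so that the document is self-contained:

```python
import xml.etree.ElementTree as ET

# The declaration from the slide, placed in the document's internal DTD subset.
DOC = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

paragraph = ET.fromstring(DOC)

# The parser has already replaced &WWW; by its replacement text.
expanded_text = paragraph.text
```

After parsing, the application never sees the reference itself, only the expanded text, which is what "logically equivalent" means here.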
Unparsed (& "Binary") Entities
Declare external...
... and unparsed entity
<!ENTITY fusion SYSTEM "fusion.ps" NDATA ps>
Declare attribute type to be entity
<!ATTLIST fullPaper source ENTITY #REQUIRED>
Element with ENTITY attribute
<fullPaper source="fusion"/>
NOTATION declaration (helper app)
<!NOTATION ps SYSTEM "ghostview.exe">
From Docs to Data: XML Schema
• XML DTDs (part of the XML spec.)
– flexible, semistructured data model (nesting,
ANY, ?, *, |, ...)
– but document-oriented (SGML heritage)
• XML Schema (W3C working draft)
– schema definition language in XML
– data-oriented: data types
– extends capabilities of DTD
Sample Data for
Introduction to XML Schema
<?xml version="1.0" encoding="utf-8"?>
<book isbn="0836217462">
<title>Being a Dog Is a Full-Time Job</title>
<author>Charles M. Schulz</author>
<character>
<name>Snoopy</name>
<friend-of>Peppermint Patty</friend-of>
<since>1950-10-04</since>
<qualification>
extroverted beagle </qualification>
</character>
<character>
<name>Peppermint Patty</name>
<since>1966-08-22</since>
<qualification>bold, brash and tomboyish</qualification>
</character>
</book>
The Simple “Russian Doll” Approach to XML Schema
<?xml version="1.0" encoding="utf-8"?>
<!-- the namespace definition on xsd:schema is optional -->
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
<xsd:element name="book">
<!-- complex type content for book -->
<xsd:complexType>
<!-- sequence compositor -->
<xsd:sequence>
<!-- simple type content for title and author -->
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="author" type="xsd:string"/>
<!-- character may appear any number of times -->
<xsd:element name="character"
minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<!-- xsd:string and xsd:date are basic types of XML Schema -->
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="friend-of" type="xsd:string"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="since" type="xsd:date"/>
<xsd:element name="qualification" type="xsd:string"/>
</xsd:sequence> …
The Catalog Approach to XML Schema:
Stand-Alone Declarations & References
Simple type elements:
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="author" type="xsd:string"/>
<xsd:element name="name" type="xsd:string"/>
…
Attributes:
<xsd:attribute name="isbn" type="xsd:string"/>
Complex type element character, built from references:
<xsd:element name="character">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="name"/>
<xsd:element ref="friend-of"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element ref="since"/>
<xsd:element ref="qualification"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
Catalog Approach Cont’d
<xsd:element name="book">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="title"/>
<xsd:element ref="author"/>
<xsd:element ref="character"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
<xsd:attribute ref="isbn"/>
</xsd:complexType>
</xsd:element>
Named Types
• Write stand-alone named complex type or simple type declarations
• Primitive form of inheritance (called derivation) allows
– Restriction
– Extension
nameType is derived from xsd:string by having the xsd:maxLength facet
restrict the string to a maximum of 32 characters:
<xsd:simpleType name="nameType">
<xsd:restriction base="xsd:string">
<xsd:maxLength value="32"/>
</xsd:restriction>
</xsd:simpleType>
nameType is used in the declaration of characterType:
<xsd:complexType name="characterType">
<xsd:sequence>
<xsd:element name="name" type="nameType"/>
<xsd:element name="friend-of" type="nameType"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="since" type="sinceType"/>
<xsd:element name="qualification" type="descType"/>
</xsd:sequence>
</xsd:complexType>
Groups: Named containers of sets of
Elements or Attributes
<xsd:group name="mainBookElements">
<xsd:sequence>
<xsd:element name="title" type="nameType"/>
<xsd:element name="author" type="nameType"/>
</xsd:sequence>
</xsd:group>
<xsd:complexType name="bookType">
<xsd:sequence>
<xsd:group ref="mainBookElements"/>
<xsd:element name="character" type="characterType"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
Compositors: Sequence, Choice, All
So far we have seen sequences.
The group nameTypes consists of one of
• the element “name”
• the sequence containing firstName, middleName, lastName
<xsd:group name="nameTypes">
<xsd:choice>
<xsd:element name="name" type="xsd:string"/>
<xsd:sequence>
<xsd:element name="firstName" type="xsd:string"/>
<xsd:element name="middleName" type="xsd:string"
minOccurs="0"/>
<xsd:element name="lastName" type="xsd:string"/>
</xsd:sequence>
</xsd:choice>
</xsd:group>
Compositors (cont’d)
The characterType consists of name, a list of friend-of,
since, and qualification particles in no particular order.
(Compare with the sequence compositor.)
<xsd:complexType name="characterType">
<xsd:all>
<xsd:element name="name" type="nameType"/>
<xsd:element name="friend-of" type="nameType"
minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="since" type="sinceType"/>
<xsd:element name="qualification" type="descType"/>
</xsd:all>
</xsd:complexType>
Derivation of Simple Types:
Unions and Lists
So far we have seen restrictions and facets.
The simple type isbnType will be either
• a 10-digit string (notice the pattern)
• the token "TBD" or the token "NA"
<xsd:simpleType name="isbnType">
<xsd:union>
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:pattern value="[0-9]{10}"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType>
<xsd:restriction base="xsd:NMTOKEN">
<xsd:enumeration value="TBD"/>
<xsd:enumeration value="NA"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:union>
</xsd:simpleType>
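The meaning of the union can be paraphrased in a few lines of code: a value is accepted if it belongs to at least one member type. This is a sketch of the value space only, written in Python as an assumption of convenience; the function name `is_valid_isbn` is hypothetical and not part of any schema API.

```python
import re

# First member type: xsd:string restricted by the pattern [0-9]{10}.
TEN_DIGITS = re.compile(r"[0-9]{10}")

# Second member type: xsd:NMTOKEN restricted to two enumerated values.
PLACEHOLDERS = {"TBD", "NA"}

def is_valid_isbn(value: str) -> bool:
    """Accept a value if either member type of the union accepts it."""
    matches_pattern = TEN_DIGITS.fullmatch(value) is not None
    return matches_pattern or value in PLACEHOLDERS
```

For example, `is_valid_isbn("0836217462")` and `is_valid_isbn("TBD")` both hold, while a 9-digit string is rejected by both member types.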
Constraints: Uniqueness
By inserting xsd:unique in the book element declaration,
we enforce that the character names in each book are unique
<xsd:element name="book">
…
<xsd:unique name="charNameMustBeUnique">
<xsd:selector xpath="character"/>
<xsd:field xpath="name"/>
</xsd:unique>
…
</xsd:element>
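What the constraint asks of an instance document can be sketched as follows: the selector picks the character children of a book, the field picks each character's name, and no name may occur twice. The Python helper `duplicate_character_names` and the second test document are assumptions made for illustration; a schema processor performs this check itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_character_names(book):
    """Return the names that violate the uniqueness constraint, sorted."""
    # selector xpath="character", field xpath="name"
    names = [c.findtext("name") for c in book.findall("character")]
    counts = Counter(name for name in names if name is not None)
    return sorted(name for name, count in counts.items() if count > 1)

# Trimmed version of the sample book from the slides.
valid_book = ET.fromstring(
    "<book isbn='0836217462'>"
    "<character><name>Snoopy</name></character>"
    "<character><name>Peppermint Patty</name></character>"
    "</book>"
)

# Hypothetical invalid variant: the same character name appears twice.
invalid_book = ET.fromstring(
    "<book isbn='0836217462'>"
    "<character><name>Snoopy</name></character>"
    "<character><name>Snoopy</name></character>"
    "</book>"
)
```

An empty result means the book satisfies the constraint.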
Namespaces
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
xmlns="http://example.org/ns/books/"
targetNamespace="http://example.org/ns/books/"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
Including Unknown Elements
<xsd:complexType name="descType" mixed="true">
<xsd:sequence>
<xsd:any namespace="http://www.w3.org/1999/xhtml"
minOccurs="0" maxOccurs="unbounded"
processContents="skip"/>
</xsd:sequence>
</xsd:complexType>
Presenting XML: XSLT
• Why Stylesheets?
– separation of content (XML) from presentation
(XSL)
• Why not just CSS for XML?
– XSL is far more powerful:
• selecting elements
• transforming the XML tree
• content based display (result may depend on
data)
XSLT Overview
• XSLT stylesheets are denoted in XML syntax
• XSL components:
1. a language for transforming XML
documents
(XSLT: integral part of
the XSL specification)
2. an XML formatting vocabulary
(Formatting Objects: >90% of the
formatting properties inherited from CSS)
XSLT Processing Model
[Figure: a transformation takes the XML source tree and an XSL stylesheet and produces an XML, HTML, … result tree]
XSLT Processing Model
• XSL stylesheet: collection of template rules
• template rule: (pattern => template)
• main steps:
– match pattern against source tree
– instantiate template (replace current node “.” by the
template in the result tree)
– select further nodes for processing
• control can be
– program-driven ("pull": <xsl:for-each> ...)
– data/event-driven ("push": <xsl:apply-templates> ...)
Template Rule: Example
<xsl:template match="product">
<table>
<xsl:apply-templates select="sales/domestic"/>
</table>
<table>
<xsl:apply-templates select="sales/foreign"/>
</table>
</xsl:template>
The value of match ("product") is the pattern; the body of xsl:template is the template.
(i) match pattern: process <product> elements
(ii) instantiate template: replace each product with two HTML tables
(iii) select the <product> grandchildren (“sales/domestic”,
“sales/foreign”) for further processing
Match/Select Patterns
• match patterns are a subset of select patterns, which are
defined in http://w3.org/TR/xpath
• Examples:
– /mybook/chapter[2]/section/*
– chapter|appendix
– chapter//para
– div[@class="appendix" and position()
mod 2 = 1]//para
– ../@lang
Creating the Result Tree...
• Literal result elements: non-XSL elements (e.g.,
HTML) appear “literally” in the result tree
• Constructing elements:
<xsl:element name = "…">
attribute & children definition
</xsl:element>
(similar for
xsl:attribute, xsl:text, xsl:comment,…)
• Generating text:
<xsl:template match="person">
<p>
<xsl:value-of select="@first-name"/>
<xsl:text> </xsl:text>
<xsl:value-of select="@surname"/>
</p>
</xsl:template>
Example of Turning XML into HTML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="FitnessCenter.xsl"?>
<FitnessCenter>
<Member level="platinum">
<Name>Jeff</Name>
<Phone type="home">555-1234</Phone>
<Phone type="work">555-4321</Phone>
<FavoriteColor>lightgrey</FavoriteColor>
</Member>
</FitnessCenter>
HTML Document in an XSL Template
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<HTML>
<HEAD>
<TITLE>Welcome</TITLE>
</HEAD>
<BODY>
Welcome!
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
Extracting the Member Name
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<HTML>
<HEAD>
<TITLE>Welcome</TITLE>
</HEAD>
<BODY>
Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
Extracting a Value from an XML
Document,
Navigating the XML Document
• Extracting values:
– use the <xsl:value-of select="…"/> XSL
element
• Navigating:
– The slash ("/") indicates parent/child
relationship
– A slash at the beginning of the path indicates
that it is an absolute path, starting from the
top of the XML document
/FitnessCenter/Member/Name
"Start from the top of the XML document, go to the FitnessCenter element,
from there go to the Member element, and from there go to the Name element."
Document /
  PI <?xml version="1.0"?>
  Element FitnessCenter
    Element Member
      Element Name
        Text Jeff
      Element Phone
        Text 555-1234
      Element Phone
        Text 555-4321
      Element FavoriteColor
        Text lightgrey
Extract the FavoriteColor and use it
as the bgcolor
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<HTML>
<HEAD>
<TITLE>Welcome</TITLE>
</HEAD>
<BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
(see html-example03)
Note
Attribute values cannot contain "<"
- Consequently, the following is NOT well-formed:
<Body bgcolor="<xsl:value-of select='/FitnessCenter/Member/FavoriteColor'/>">
To extract the value of an XML element and use it as an attribute
value you must use curly braces:
<Body bgcolor="{/FitnessCenter/Member/FavoriteColor}">
Evaluate the expression within
the curly braces. Assign the value
to the attribute.
Extract the Home Phone Number
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html"/>
<xsl:template match="/">
<HTML>
<HEAD>
<TITLE>Welcome</TITLE>
</HEAD>
<BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
<BR/>
Your home phone number is:
<xsl:value-of select="/FitnessCenter/Member/Phone[@type='home']"/>
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
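The same three path expressions can be tried outside an XSLT processor. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset; since its paths are evaluated relative to the root element, the leading `/FitnessCenter` step is dropped. The f-string at the end only imitates what the stylesheet produces and is an assumption of this sketch, not the output of a real XSLT run.

```python
import xml.etree.ElementTree as ET

FITNESS_CENTER = """<FitnessCenter>
  <Member level="platinum">
    <Name>Jeff</Name>
    <Phone type="home">555-1234</Phone>
    <Phone type="work">555-4321</Phone>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
</FitnessCenter>"""

root = ET.fromstring(FITNESS_CENTER)  # root is the FitnessCenter element

# /FitnessCenter/Member/Name
name = root.findtext("Member/Name")

# /FitnessCenter/Member/FavoriteColor
color = root.findtext("Member/FavoriteColor")

# /FitnessCenter/Member/Phone[@type='home']: the predicate filters on an attribute
home_phone = root.findtext("Member/Phone[@type='home']")

# Hand-built imitation of the page the stylesheet would generate.
page = (
    f'<BODY bgcolor="{color}">Welcome {name}!<BR/>'
    f"Your home phone number is: {home_phone}</BODY>"
)
```

The predicate `[@type='home']` is what distinguishes the two Phone siblings; without it, the first Phone in document order would be returned.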
Creating the Result Tree...
• Further XSL elements for ...
– Numbering
• <xsl:number value="position()" format="1 ">
– Conditions
• <xsl:if test="position() mod 2 = 0">
– Repetition...
Creating the Result Tree: Repetition
<xsl:template match="/">
<html>
<head>
<title>customers</title>
</head>
<body>
<table>
<tbody>
<xsl:for-each select="customers/customer">
<tr>
<th>
<xsl:apply-templates select="name"/>
</th>
<xsl:for-each select="order">
<td>
<xsl:apply-templates/>
</td>
...
</html>
</xsl:template>
Creating the Result Tree: Sorting
<xsl:template match="employees">
<ul>
<xsl:apply-templates select="employee">
<xsl:sort select="name/last"/>
<xsl:sort select="name/first"/>
</xsl:apply-templates>
</ul>
</xsl:template>
<xsl:template match="employee">
<li>
<xsl:value-of select="name/first"/>
<xsl:text> </xsl:text>
<xsl:value-of select="name/last"/>
</li>
</xsl:template>
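The two xsl:sort elements define a primary key (last name) and a secondary key (first name). A rough equivalent in Python, assuming a small hypothetical employees document that is not part of the slides, sorts on a tuple of the two keys:

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the slides do not show the employees document.
EMPLOYEES = """<employees>
  <employee><name><first>Bob</first><last>Smith</last></name></employee>
  <employee><name><first>Alice</first><last>Smith</last></name></employee>
  <employee><name><first>Carol</first><last>Jones</last></name></employee>
</employees>"""

root = ET.fromstring(EMPLOYEES)

# First xsl:sort = primary key, second xsl:sort = secondary key.
ordered = sorted(
    root.findall("employee"),
    key=lambda e: (e.findtext("name/last"), e.findtext("name/first")),
)

# The employee template: first name, a space, last name, wrapped in <li>.
items = [
    f"<li>{e.findtext('name/first')} {e.findtext('name/last')}</li>"
    for e in ordered
]
html = "<ul>" + "".join(items) + "</ul>"
```

The two Smiths are ordered by first name only because their last names tie on the primary key.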
More on XSL
• XSL(T):
– Conflict resolution for multiple applicable rules
– Modularization <xsl:include> <xsl:import>
– …
• XSL Formatting Objects
– a la CSS
• XPath (navigation syntax + functions)
= the part shared by XSLT and XPointer
• ...
XQuery: Querying XML Sources
• Functional Query Language
– Operates on the XPath/XQuery data model
– List of ordered trees
– A document is a list of size 1
• XQuery expressions are composed of
– Path expressions
– Element constructors
– FLWR expressions
– … and more …
Path Expressions
In the second chapter of the document zoo.xml find the
figures with caption “Tree Frogs”
doc("zoo.xml")//chapter[2]//figure[caption="Tree Frogs"]
[Figure: tree of zoo.xml, a book with chapter, part, section, paragraph, figure, and appendix nodes; two figures carry the captions "Tree Frogs" and "Just Frogs"]
More Path Expressions
Find the first immediate chapter subelements of immediate part
subelements of the document zoo.xml and retrieve figures that
have …
doc("zoo.xml")/part/chapter[1]//figure[caption="Tree Frogs"]
[Figure: the zoo.xml tree again, a book with chapter, part, section, paragraph, figure, and appendix nodes; two figures carry the captions "Tree Frogs" and "Just Frogs"]
Element Construction
In the second chapter of the document zoo.xml find the
figures with caption “Tree Frogs” and place them into an
element called result
<result>
{doc("zoo.xml")//chapter[2]//figure[caption="Tree Frogs"]}
</result>
Result tree:
result
  figure
    caption
      “Tree Frogs”
Bibliography Example Data Set
<bib>
<book>
<author> Aho </author>
<author> Hopcroft </author>
<author> Ullman </author>
<title> Automata Theory </title>
<publisher> Morgan Kaufmann </publisher>
<year> 1998 </year>
</book>
<book>
<author> Ullman </author>
<title> Database Systems </title>
<publisher> Morgan Kaufmann </publisher>
<year> 1998 </year>
</book>
<book>
<author> Abiteboul </author>
<author> Buneman </author>
<author> Suciu </author>
<title> Automata Theory </title>
<publisher> Prentice Hall </publisher>
<year> 1998 </year>
</book>
</bib>
Reviews Example Data Set
<reviews>
<review>
<title> Automata Theory </title>
<comment> It’s the best in automata theory </comment>
<comment> A definitive textbook </comment>
</review>
…
</reviews>
For-Let-Where-Return (FLWR)
List the titles of books published by “Morgan
Kaufmann”
FOR $b in doc("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
RETURN $b/title
[Figure: tree of bib.xml with three book elements under bib, each with title, publisher, and year; two books are published by Morgan Kaufmann and one by Prentice Hall, all in 1998]
Think (tuples of) variable bindings
FOR $b in doc("bib.xml")//book
WHERE $b/year > 1990
RETURN $b/author
Return the list of authors
who published after 1990
• FOR/LET: ordered lists of tuples of variable bindings
• WHERE: tuples that satisfy the conditions
• RETURN: list of trees
[Figure: the bib.xml tree with $b bound to each book element in turn; each binding forms one tuple]
List publishers who have published
more than 1 book
FOR $p in distinct(doc("bib.xml")//publisher)
LET $b := doc("bib.xml")//book[publisher = $p]
WHERE count($b) > 1
RETURN $p
Tuples ($p, $b) are formed
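The tuple view of FOR/LET/WHERE/RETURN can be mimicked step by step. This is an illustrative Python sketch, not an XQuery engine: FOR iterates over distinct publishers, LET binds the whole list of matching books to one variable, WHERE filters the tuples, and RETURN projects them. The sample data is the bibliography from the slides with whitespace around text values trimmed, which is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>
    <title>Automata Theory</title>
    <publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><author>Ullman</author>
    <title>Database Systems</title>
    <publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    <title>Automata Theory</title>
    <publisher>Prentice Hall</publisher><year>1998</year></book>
</bib>"""

bib = ET.fromstring(BIB)
books = list(bib.iter("book"))

# FOR $p in distinct(...//publisher): distinct values, document order kept.
publishers = list(dict.fromkeys(p.text for p in bib.iter("publisher")))

# LET $b := ...//book[publisher = $p]: one tuple ($p, $b) per publisher,
# where $b is the whole list of that publisher's books.
tuples = [
    (p, [b for b in books if b.findtext("publisher") == p])
    for p in publishers
]

# WHERE count($b) > 1, then RETURN $p.
result = [p for p, matching_books in tuples if len(matching_books) > 1]
```

Note the difference between FOR and LET: FOR produces one tuple per item, whereas LET binds a whole sequence without multiplying the tuples.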
Boolean Expressions in WHERE
List the titles of books published by “Morgan
Kaufmann” in 1998
FOR $b in doc("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title
Joins
FOR $b in doc("bib.xml")//book,
$r in doc("review.xml")//review
WHERE $b/title = $r/title
RETURN
<book_with_review>
{$b/@*}
{$b/*}
{$r/comment}
</book_with_review>
For every book with a
matching review output
a book_with_review
that contains all the attributes
and subelements of book
and the comment
subelements of review
<book_with_review>
<author> Aho </author>
<author> Hopcroft </author>
<author> Ullman </author>
<title> Automata Theory </title>
<publisher> Morgan Kaufmann </publisher>
<year> 1998 </year>
<comment> It’s the best in automata theory </comment>
<comment> A definitive textbook </comment>
</book_with_review>
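The join is a nested iteration over two sources with an equality condition on title. The following Python sketch imitates it with `xml.etree.ElementTree`; the data is a trimmed copy of the two example sets (first two books only, whitespace removed), which is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>
    <title>Automata Theory</title>
    <publisher>Morgan Kaufmann</publisher><year>1998</year></book>
  <book><author>Ullman</author>
    <title>Database Systems</title>
    <publisher>Morgan Kaufmann</publisher><year>1998</year></book>
</bib>"""

REVIEWS = """<reviews>
  <review>
    <title>Automata Theory</title>
    <comment>It's the best in automata theory</comment>
    <comment>A definitive textbook</comment>
  </review>
</reviews>"""

bib = ET.fromstring(BIB)
reviews = ET.fromstring(REVIEWS)

joined = []
for book in bib.iter("book"):                # FOR $b in ...//book,
    for review in reviews.iter("review"):    #     $r in ...//review
        if book.findtext("title") == review.findtext("title"):  # WHERE
            result = ET.Element("book_with_review", dict(book.attrib))  # {$b/@*}
            result.extend(list(book))                  # {$b/*}
            result.extend(review.findall("comment"))   # {$r/comment}
            joined.append(result)

child_tags = [child.tag for child in joined[0]]
```

Only the book with a matching review appears in the result; "Database Systems" has no review and is dropped, as in an inner join.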
Relax Order Conditions
List the titles of books published by “Morgan
Kaufmann” in 1998
FOR $b in unordered(doc("bib.xml")//book)
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title
Very important feature in dealing with
relational sources and other set-oriented sources.
SELECT title
FROM bib
WHERE publisher = 'Morgan Kaufmann' AND year = 1998
Depending on the indices and access methods used,
the SQL query processor may deliver the tuples in
different order
Nested queries
FOR $a IN distinct(doc("bib.xml")//author/text())
RETURN
<author>
<name> {$a} </name>
{
FOR $b IN doc("bib.xml")//book[author=$a]
RETURN $b/title
}
</author>
Invert the structure of the input document so that
there is a list of author elements containing the name
of the author and the list of books they wrote
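The inversion can be sketched as an outer loop over distinct authors with an inner query per author. The Python version below is an illustration under assumptions: it returns a dictionary rather than constructing author elements, and it uses the slide data with whitespace around names trimmed.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>
    <title>Automata Theory</title></book>
  <book><author>Ullman</author>
    <title>Database Systems</title></book>
  <book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    <title>Automata Theory</title></book>
</bib>"""

bib = ET.fromstring(BIB)
books = list(bib.iter("book"))

# Outer FOR: distinct author names, in order of first appearance.
authors = list(dict.fromkeys(a.text for a in bib.iter("author")))

# Inner FOR: for each author, the titles of the books listing that author.
inverted = {
    author: [
        book.findtext("title")
        for book in books
        if author in [a.text for a in book.findall("author")]
    ]
    for author in authors
}
```

The input is grouped by book; the result is grouped by author, so Ullman, who appears in two books, now owns a list of two titles.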
Conditionals
FOR $b IN doc("bib.xml")//book
RETURN
<short>
{$b/title}
{IF count($b/author) < 3
THEN $b/author
ELSE ($b/author[1], <author>and others</author>)}
</short>
Existential and Universal
Quantification
Return books where at least one
of the authors is “Ullman”
FOR $b in doc("bib.xml")//book
WHERE $b/author = "Ullman"
RETURN $b
Return books where all authors
are “Ullman”
FOR $b in doc("bib.xml")//book
WHERE EVERY $author IN $b/author
SATISFIES $author = "Ullman"
RETURN $b
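The contrast between the two queries is the contrast between "some" and "every". In the Python sketch below, offered only as an analogy, the general comparison `$b/author = "Ullman"` corresponds to `any(...)` and the EVERY ... SATISFIES clause to `all(...)`. The data is the slide bibliography with whitespace trimmed.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><author>Aho</author><author>Hopcroft</author><author>Ullman</author>
    <title>Automata Theory</title></book>
  <book><author>Ullman</author>
    <title>Database Systems</title></book>
  <book><author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    <title>Automata Theory</title></book>
</bib>"""

books = list(ET.fromstring(BIB).iter("book"))

def author_names(book):
    return [a.text for a in book.findall("author")]

# Existential: at least one author is Ullman.
some_ullman = [
    b.findtext("title")
    for b in books
    if any(name == "Ullman" for name in author_names(b))
]

# Universal: every author is Ullman.
every_ullman = [
    b.findtext("title")
    for b in books
    if all(name == "Ullman" for name in author_names(b))
]
```

One caveat carried over from logic: a book with no author elements would satisfy the universal condition vacuously.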
Functions
DEFINE FUNCTION depth($e) RETURNS xsd:integer
{
IF (empty($e/*)) THEN 1
ELSE max(depth($e/*)) + 1
}
FOR $b in doc("bib.xml")//book
RETURN depth($b)
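The recursion is easy to check by hand on a small tree. Here is a direct transcription into Python, given as a sketch; the three-level test document is a made-up example, not data from the slides.

```python
import xml.etree.ElementTree as ET

def depth(element):
    """Depth of an element: 1 for a leaf, else 1 + the deepest child."""
    children = list(element)
    if not children:          # empty($e/*)
        return 1
    return max(depth(child) for child in children) + 1

# Hypothetical document with three levels of nesting.
bib = ET.fromstring("<bib><book><title>T</title><year>1998</year></book></bib>")
book = bib.find("book")

# FOR $b in ...//book RETURN depth($b)
book_depths = [depth(b) for b in bib.iter("book")]
```

A book whose children are all leaves has depth 2, and the enclosing bib element has depth 3.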
Applicability of XML Query
Languages (XQuery)
• XQuery standard does NOT elaborate on
the physical aspects of the XML sources
• Custom functions can provide access and
reference to the source(s)
– document("test.xml"), source("view1")
• Question: as we go down the list of uses
of XQuery compare with XSL
XQuery on files, DOM objects, event
streams, messages
[Figure: an XQuery is evaluated by an XQuery Processor over an XML file, a DOM object, or a SAX stream]
• Usage scenarios
– Transformation and processing of messages
• Significant (but not “killer”) advantages over XSL
– Minor performance optimization superiority
– Better streaming, pipelining
– Cleaner extensible language
• Many academic and industrial prototypes of XQuery on files
Typical Scenario: XML Messaging
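[Figure: an Application sends requests, in the native language or through a special wrapper API, to Wrappers around an RDBMS and an SAP ERP system (SOAP service); a Message Transformer sits between the Wrappers and the Application]
Request to the RDBMS wrapper:
SELECT *
FROM Customer, Order
WHERE customer.name='Joe'
AND order.name='Joe'
Request to the SAP wrapper:
Cdom = Sap(conn1, "joe")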
Summary of Steps
1. Developer’s program issues SQL query
2. Wrapper returns SQL result wrapped as XML message
3. Developer’s XQuery transforms XML message to the XML format needed by the app
Typical Scenario: XML Messaging
XQuery run by the Message Transformer:
FOR $cn IN distinct(msg(123)/customer/name)
RETURN
<customer>
{$cn}
<due>
{7.8 * msg(123)/customer[name=$cn]/balance}
</due>
<orders>
{FOR $c IN msg(123)/customer
WHERE $c/name = $cn
RETURN $c/order}
</orders>
</customer>
XML message returned by the RDBMS Wrapper (input):
<query_result>
<customer>
<name> Joe </name>
<balance>100M</balance>
<order> fish… </order>
</customer>
<customer>
<name> Joe </name>
<balance>100M</balance>
<order> meat… </order>
</customer>
</query_result>
XML message delivered to the Application (output):
<customer>
<name> Joe </name>
<due> 780M </due>
<orders>
<order>fish</order>
<order>meat</order>
</orders>
</customer>
Direct XQuery on Databases
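[Figure: an XQuery is posed against an XML view of the relational DB; the XQuery Processor sends SQL (one or more queries) to the RDBMS, receives tuples, and returns an XML result]
XML View of Relational DB:
reldb
  orders
    tuple
    ...
  customers
    tuple
      name: Joe
      balance: 100M
Let’s write a Russian Doll schema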
XQuery on Relational Databases
FOR $c IN db(1)/customers/tuple
WHERE $c/name = "Joe"
RETURN
<customer>
{$c/name}
<due>
{7.8 * $c/balance}
</due>
<orders>
{FOR $o IN db(1)/orders/tuple
WHERE $c/name = $o/name
RETURN $o}
</orders>
</customer>
Result:
<customer>
<name> Joe </name>
<due> 780M </due>
<orders>
<order>fish</order>
<order>meat</order>
</orders>
</customer>
The XQuery Processor evaluates the query on the XML View of Relational DB by sending SQL to the RDBMS:
SELECT * FROM customers
WHERE name = 'Joe'
For each customer #c
SELECT * FROM orders
WHERE orders.name = #c.name
Merge results
Summary of Steps
1. Developer’s program issues XQuery on XML view of SQL DB
2. XQuery Processor automatically sends SQL queries to DB and structures XML result
XQuery on Relational Databases
• Single language for accessing database
and structuring XML result
• Avoids deficiencies of SQL in dealing with
nested structures, optional elements, etc
• …
XQuery on Distributed Sources
[Figure: an XQuery is posed against an XML View of All Sources; the XQuery Processor (Mediator) accesses two RDBMSs and an XML file and returns an XML result]
Example:
Access to Two Relational Databases
FOR $c IN db(1)/customers/tuple
WHERE $c/name = "Joe"
RETURN
<customer>
{$c/name}
<due>
{7.8 * $c/balance}
</due>
<orders>
{FOR $o IN db(2)/orders/tuple
WHERE $c/name = $o/name
RETURN $o}
</orders>
</customer>
[Figure: the XQuery is posed against an XML View of All Relational DBs; the XQuery Processor (Mediator) accesses one RDBMS holding orders and another holding customers and returns an XML result]
XQuery on Integrated Views
Let’s write the “Joe” query again
[Figure: an XQuery is posed against a Virtual Integrated XML View; the XQuery Processor (Mediator) accesses two RDBMSs and an XML file and returns an XML result]
Virtual Integrated XML View:
customers
  customer
    name: Joe
    balance: 100M
    orders
      order
      order
      order
  customer
    ...
... and using XQuery to build the view
[Figure: the same mediator architecture; the Virtual Integrated XML View over two RDBMSs and an XML file is itself defined by an XQuery]
XQuery as View Definition: View = Query
FOR $c IN db(1)/customers/tuple
RETURN
<customer>
{$c/name}
<due>
{7.8 * $c/balance}
</due>
<orders>
{FOR $o IN db(2)/orders/tuple
WHERE $c/name = $o/name
RETURN $o}
</orders>
</customer>