Initial Lecture

Download Report

Transcript Initial Lecture

XML Querying and Views
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
November 1, 2007
Some slide content courtesy of Susan Davidson, Dan Suciu, & Raghu Ramakrishnan
Administrivia
Homework 4 due 11/7
 XQuery
Handout re: project to come shortly – before project plan will
be due
Review the Shanmugasundaram paper by Wednesday 11/7
 Post a 1-page summary to the newsgroup
 What problems did they address? Basic techniques? Strengths &
weaknesses of their methods?
2
XQuery’s Basic Form
 Has an analogous form to SQL’s
SELECT..FROM..WHERE..GROUP BY..ORDER BY
 The model: bind nodes (or node sets) to variables; operate
over each legal combination of bindings; produce a set of
nodes
 “FLWOR” statement [note case sensitivity!]:
for {iterators that bind variables}
let {collections}
where {conditions}
order by {order-conditions} (older version was “SORTBY”)
return {output constructor}
3
“Iterations” in XQuery
A series of (possibly nested) FOR statements assigning the results of XPaths
to variables
for $root in document(“http://my.org/my.xml”)
for $sub in $root/rootElement,
$sub2 in $sub/subElement, …
 Something like a template that pattern-matches, produces a “binding
tuple”
 For each of these, we evaluate the WHERE and possibly output the
RETURN template
 document() or doc() function specifies an input file as a URI
 Old version was “document”; now “doc” but it depends on your XQuery
implementation
4
Example XML Data
Root
?xml
2002…
p-i
element
dblp
article
mdate
key
author title year school
1992
ms/Brown92
key
editor title journal volume year ee ee
2002…
tr/dec/…
PRPL…
Kurt P….
attribute
text
mastersthesis
mdate
root
Digital…
Univ….
1997
The…
Paul R.
db/labs/dec
SRC…
http://www.
5
Two XQuery Examples
<root-tag> {
for $p in document(“dblp.xml”)/dblp/proceedings,
$yr in $p/yr
where $yr = “1999”
return <proc> {$p} </proc>
} </root-tag>
for $i in document(“dblp.xml”)/dblp/inproceedings[author/text() = “John Smith”]
return <smith-paper>
<title>{ $i/title/text() }</title>
<key>{ $i/@key }</key>
{ $i/crossref }
</smith-paper>
6
Another Example
Root
?xml
name
country
…
…
p-i
element
mastersthesis
key
author title year
Univ….
attribute
text
universities
university
root
school
USA
1999
ms/Brown92
PRPL…
Univ….
Kurt P….
7
What If Order Doesn’t Matter?
By default:
 SQL is unordered
 XQuery is ordered everywhere!
 But unordered queries are much faster to answer
XQuery has a way of telling the query engine to avoid
preserving order:
 unordered {
for $x in (mypath) …
}
8
Querying & Defining Metadata –
Can’t Do This in SQL
Can get a node’s name by querying node-name():
for $x in document(“dblp.xml”)/dblp/*
return node-name($x)
Can construct elements and attributes using computed names:
for $x in document(“dblp.xml”)/dblp/*,
$year in $x/year,
$title in $x/title/text(),
element node-name($x) {
attribute {“year-” + $year} { $title }
}
9
XQuery Wrap-up
 XQuery is very SQL-like, but in some ways cleaner
and more orthogonal
 It is based on paths and binding tuples, with
collections and trees as its first-class objects
 See www.w3.org/TR/xquery/ for more details on the
language
10
XQuery Works Well with Schema,
and Validates Against It (Incl. Keys)
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType name=“ThesisType">
<xsd:attribute name=“key" type="xsd:string"/>
…
<xsd:sequence>
<xsd:element name=“author" type=“xsd:string"/>
...
<xsd:element name=“school" type=“xsd:string”/>
…
</xsd:sequence>
</xsd:complexType>
<xsd:element name=“mastersthesis" type=“ThesisType">
<xsd:key name=“mtId">
<xsd:selector xpath=“.”/> <xsd:field xpath="@key"/>
</xsd:key>
<xsd:keyref name=“schoolRef” refer=“schoolId">
<xsd:selector xpath=“./school”/> <xsd:field xpath=“./text()"/>
</xsd:keyref>
</xsd:element>
</xsd:schema>
11
A Problem
 We frequently want to reference data in a way that differs
from the way it’s stored




XML data  HTML, text, etc.
Relational data  XML data
Relational data  Different relational representation
XML data  Different XML representation
 Generally, these can all be thought of as different views over
the data
 A view is a named query
 Let’s start with a special presentation language for XML  HTML
12
XSL(T): XML  “Other Stuff”
 XSL (XML Stylesheet Language) is actually divided into two
parts:
 XSL:FO: formatting for XML
 XSLT: a special transformation language
 We’ll leave XSL:FO for you to read off www.w3.org, if you’re
interested
 XSLT is actually able to convert from XML  HTML, which
is how many people do their formatting today
 Products like Apache Cocoon generally translate XML  HTML on
the server side
 Your browser will do XML  HTML on the client side
13
A Different Style of Language
 XSLT is based on a series of templates that match
different parts of an XML document
 There’s a policy for what rule or template is applied if
more than one matches (it’s not what you’d think!)
 XSLT templates can invoke other templates
 XSLT templates can be nonterminating (beware!)
 XSLT templates are based on XPath “match”es, and
we can also apply other templates (potentially to
“select”ed XPaths)
 Within each template, we describe what should be output
 (Matches to text default to outputting it)
14
An XSLT Stylesheet
<xsl:stylesheet version=“1.1”>
<xsl:template match=“/dblp”>
<html><head>This is DBLP</head>
<body>
<xsl:apply-templates />
</body>
</html>
</xsl:template>
<xsl:template match=“inproceedings”>
<h2><xsl:apply-templates select=“title” /></h2>
<p><xsl:apply-templates select=“author”/></p>
</xsl:template>
…
</xsl:stylesheet>
15
Results of XSLT Stylesheet
<dblp>
<inproceedings>
<title>Paper1</title>
<author>Smith</author>
</inproceedings>
<inproceedings>
<author>Chakrabarti</author>
<author>Gray</author>
<title>Paper2</title>
</inproceedings>
</dblp>
<html><head>This Is DBLP</head>
<body>
<h2>Paper1</h2>
<p>Smith</p>
<h2>Paper2</h2>
<p>Chakrabarti</p>
<p>Gray</p>
</body>
</html>
16
What XSLT Can and Can’t Do
 XSLT is great at converting XML to other formats
 XML  diagrams in SVG; HTML; LaTeX
 …
 XSLT doesn’t do joins (well), it only works on one XML file at a time, and
it’s limited in certain respects
 It’s not a query language, really
 … But it’s a very good formatting language
 Most web browsers (post Netscape 4.7x) support XSLT and XSL
formatting objects
 But most real implementations use XSLT with something like Apache
Cocoon – compatible with more browsers
 You should use XSL/XSLT to format the forms and pages for your
projects – see www.w3.org/TR/xslt or the chapter we handed out for
more details
17
Other Forms of Views
XSLT is a language primarily designed from going from
XML  non-XML
Obviously, we can do XML  XML in XQuery
… Or relations  relations
… What about relations  XML and XML  relations?
Let’s start with XML  XML, relations  relations
18
Views in SQL and XQuery
 A view is a named query
 We use the name of the view to invoke the query
(treating it as if it were the relation it returns)
Using the views:
SQL:
CREATE VIEW V(A,B,C) AS
SELECT *
SELECT A,B,C FROM R
FROM V, R
WHERE R.A = “123”
WHERE V.B = 5
AND V.C = R.C
XQuery:
declare function V() as element(content)* {
for $v in V()/content,
for $r in doc(“R”)/root/tree,
$r in doc(“r”)/root/tree
$a in $r/a, $b in $r/b, $c in $r/c
where $v/b = $r/b
where $a = “123”
return $v
return <content>{$a, $b, $c}</content>
}
19
What’s Useful about Views
Providing security/access control
 We can assign users permissions on different views
 Can select or project so we only reveal what we want!
Can be used as relations in other queries
 Allows the user to query things that make more sense
Describe transformations from one schema (the base relations)
to another (the output of the view)
 The basis of converting from XML to relations or vice versa
 This will be incredibly useful in data integration, discussed soon…
Allow us to define recursive queries
20
Materialized vs. Virtual Views
 A virtual view is a named query that is actually re-computed
every time – it is merged with the referencing query
CREATE VIEW V(A,B,C) AS
SELECT A,B,C FROM R
WHERE R.A = “123”
SELECT *
FROM V, R
WHERE V.B = 5 AND V.C = R.C
 A materialized view is one that is computed once and its
results are stored as a table




Think of this as a cached answer
These are incredibly useful!
Techniques exist for using materialized views to answer other queries
Materialized views are the basis of relating tables in different schemas
21
Views Should Stay Fresh
 Views (sometimes called intensional relations) behave,
from the perspective of a query language, exactly
like base relations (extensional relations)
 But there’s an association that should be maintained:
 If tuples change in the base relation, they should change in
the view (whether it’s materialized or not)
 If tuples change in the view, that should reflect in the base
relation(s)
22
View Maintenance and
the View Update Problem
 There exist algorithms to incrementally recompute a
materialized view when the base relations change
 We can try to propagate view changes to the base relations
 However, there are lots of views that aren’t easily updatable:
R
B C
R⋈S A B C
1 2
2 4
1 2 4
2 2
2 3
1 2 3
A B
S
delete?
2 2 4
 We can ensure views are updatable
by enforcing certain constraints (e.g., no aggregation), 2 2 3
but this limits the kinds of views we can have
23
Views as a Bridge between Data Models
A claim we’ve made several times:
“XML can’t represent anything that can’t be expressed in in
the relational model”
If this is true, then we must be able to represent XML
in relations
Store a relational view of XML
(or create an XML view of relations)
24
A View as a Translation between
XML and Relations
 You’ll be reviewing the most-cited paper in this area
(Shanmugasundaram et al), and there are many more
(Fernandez et al., …)
 Techniques already making it into commercial
systems
 XPERANTO at IBM Research will appear in DB2 v9
 SQL Server has some XML support; Oracle is also doing
some XML
 … Next time you’ll see how it works!
25