Mapping XML Schemas to Database Schemas

Download Report

Transcript Mapping XML Schemas to Database Schemas

Mapping DTDs
to Databases
Ronald Bourret
[email protected]
http://www.rpbourret.com
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Overview
•
•
•
•
•
Table-based mappings
Object-based mappings
Generating relational schemas from DTDs
Generating DTDs from relational schemas
Mapping XML Schemas to databases
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Table-based Mappings
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
There is an obvious mapping from
this XML document ...
<A>
<B>
<C>ccc</C>
<D>ddd</D>
<E>eee</E>
</B>
<B>
<C>fff</C>
<D>ggg</D>
<E>hhh</E>
</B>
</A>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
... to this table
<A>
<B>
<C>ccc</C>
<D>ddd</D>
<E>eee</E>
</B>
<B>
<C>fff</C>
<D>ggg</D>
<E>hhh</E>
</B>
</A>
Table A
C
D
... ...
ccc ddd
fff ggg
... ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
E
...
eee
hhh
...
How does the mapping work?
• Views the XML document as a single table ...
<Table>
<Row>
<Column_1>...</Column_1>
...
<Column_n>...</Column_n>
</Row>
<Row>
<Column_1>...</Column_1>
...
<Column_n>...</Column_n>
</Row>
...
</Table>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
How does the mapping work?
• ... or a set of tables
<Tables>
<Table_1>
<Row>
<Column_1>...</Column_1>
...
<Column_n>...</Column_n>
</Row>
...
</Table_1>
...
<Table_n>
...
</Table_n>
</Tables>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Table-based Mappings
• Pros:
» Easy to understand
» Easy to write code to transfer data
» Efficient way to transfer data between relational databases
• Cons:
» Only works for a small subset of XML documents
• Used by data transfer middleware such as ASP2XML,
DatabaseDOM (IBM), DB2XML, DBIx::XML_RDB,
ODBC Socket Server, and XML-DB Link
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Object-based Mappings
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
There is also an obvious mapping
from this XML document ...
<A>
<B>bbb</B>
<C>ccc</C>
<D>ddd</D>
</A>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
... to this object ...
<A>
<B>bbb</B>
<C>ccc</C>
<D>ddd</D>
</A>
object
B =
C =
D =
}
A {
"bbb"
"ccc"
"ddd"
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
... to a row in this table
<A>
<B>bbb</B>
<C>ccc</C>
<D>ddd</D>
</A>
object
B =
C =
D =
}
A {
"bbb"
"ccc"
"ddd"
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Table A
B
C
D
... ... ...
bbb ccc ddd
... ... ...
And an obvious mapping from
this element type definition ...
<!ELEMENT A (B, C, D)>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
... to this class ...
<!ELEMENT A (B, C, D)>
class A {
String B;
String C;
String D;
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
... to this table schema
<!ELEMENT A (B, C, D)>
class A {
String B;
String C;
String D;
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
CREATE TABLE A (
B VARCHAR(10)
NOT NULL,
C VARCHAR(10)
NOT NULL,
D VARCHAR(10)
NOT NULL
)
A more complex example
• This XML document ...
<SalesOrder>
<Number>1234</Number>
<Customer>Gallagher Industries</Customer>
<Date>29.10.00</Date>
<Line Number="1">
<Part>A-10</Part>
<Quantity>12</Quantity>
<Price>10.95</Price>
</Line>
<Line Number="2">
<Part>B-43</Part>
<Quantity>600</Quantity>
<Price>3.99</Price>
</Line>
</SalesOrder>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
A more complex example
• ... maps to these objects ...
object SalesOrder {
number = 1234;
customer = "Gallagher Industries";
date = 29.10.00;
lines = {ptrs to Line objects};
}
object Line {
number = 1;
part = "A-10";
quantity = 12;
price = 10.95;
}
object Line {
number = 2;
part = "B-43";
quantity = 600;
price = 3.95;
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
A more complex example
• ... which map to these rows
SaleOrders
Number
Customer
1234
Gallagher Industries
...
...
...
...
Lines
SONumber
1234
1234
...
Line
1
2
...
Part
A-10
B-43
...
Date
29.10.00
...
...
Quantity
12
600
...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Price
10.95
3.99
...
Objects are data-specific...
• Different for each DTD (schema)
• Model the content (data) of the document
<Orders>
<SalesOrder SONumber="12345">
<Customer CustNumber="543">
...
</Customer>
<OrderDate>150999</OrderDate>
<Line LineNumber="1">
<Part Name="Cherries">
...
</Part>
<Qty Unit="ton">2</Qty>
</Line>
</SalesOrder>
</Orders>
Orders
SalesOrder
Customer
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Line
Part
... not the DOM
• Same for all XML documents
• Model the structure of the document
<Orders>
<SalesOrder SONumber="12345">
<Customer CustNumber="543">
...
</Customer>
<OrderDate>150999</OrderDate>
<Line LineNumber="1">
<Part Name="Cherries">
...
</Part>
<Qty Unit="ton">2</Qty>
</Line>
</SalesOrder>
</Orders>
Element
(Orders)
Element
Attr
(SalesOrder)
(SONumber)
Element Element Element
(Customer)
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
(OrderDate)
(Line)
How does the mapping work?
1. Map a DTD to an object schema
2. Map the object schema to a database schema
» Direct mapping to object-oriented databases
» Object-relational mapping to relational databases
3. (Optional) Combine steps (1) and (2) for a
DTD-to-database mapping
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
How does data transfer work?
• With intermediate objects:
1. Transfer data from an XML document to a tree of objects
2. Transfer data from objects to the database
• Without intermediate objects:
1. Transfer data directly from an XML document to the database
• Whether intermediate objects are useful depends
on the application
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
The Basic Mapping
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Element types are data types
• “Simple” element types have PCDATA-only content
<!ELEMENT Number (#PCDATA>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Quantity (#PCDATA)>
• “Complex” element types have element or mixed
content and/or attributes
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ATTLIST
Order (Number, Date, Customer, Line*)>
Line (LineNum, Quantity, Part)>
Customer EMPTY>
Customer
CustNum CDATA #REQUIRED
Name
CDATA #REQUIRED
Address CDATA #REQUIRED>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping “complex” element types
• Map complex element types to classes ...
<!ELEMENT A
<!ATTLIST A
F
<!ELEMENT B
<!ELEMENT C
(B, C)>
class A {
...
}
CDATA #REQUIRED>
(#PCDATA)>
(D, E)>
• ... which are mapped to tables (class tables)
class A {
...
}
Table A
...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping content models
• Map references to simple element types to scalar
properties ...
<!ELEMENT A
<!ATTLIST A
F
<!ELEMENT B
<!ELEMENT C
(B, C)>
class A {
String b;
...
}
CDATA #REQUIRED>
(#PCDATA)>
(D, E)>
• ... which are mapped to data columns
class A {
String b;
...
}
Table A
Column b ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping content models (cont.)
• Map references to “complex” element types
to pointer/reference properties ...
<!ELEMENT A
<!ATTLIST A
F
<!ELEMENT B
<!ELEMENT C
(B, C)>
CDATA #REQUIRED>
(#PCDATA)>
(D, E)>
class A {
String b;
C
c;
...
}
class C {
...
}
• ... which are mapped to primary / foreign key columns
class A {
String b;
C
c;
...
}
class C {
...
}
Table A
Column b Column c ...
Table C
Column c ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping attributes
• Map attributes to scalar properties ...
<!ELEMENT A
<!ATTLIST A
F
<!ELEMENT B
<!ELEMENT C
(B, C)>
class A {
String b;
C
c;
String f;
}
CDATA #REQUIRED>
(#PCDATA)>
(D, E)>
• ... which are mapped to data columns
class A {
String b;
C
c;
String f;
}
Table A
Column b Column c Column f
Table C
Column c ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
The basic mapping: summary
•
•
•
•
Map “complex” element types to classes, then to tables
Map content models to properties, then to columns
Map attributes to properties, then to columns
Join class tables with primary key / foreign key pairs
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Some Important Points
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
References are not definitions
• References must be mapped separately for each content
model
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Chapter (Title, Section+)>
Appendix (Title, Section+)>
Title (#PCDATA)>
Section (#PCDATA)>
class Chapter {
String
title;
String[] section;
}
class Appendix {
String
title;
String[] section;
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Data types
• Can map “simple” data types to any scalar data type
<!ELEMENT Part (Number, Price)>
<!ELEMENT Number (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
class Part {
String number;
float price;
}
CREATE TABLE Part (
number CHAR(10) NOT NULL,
price REAL NOT NULL
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Names can be mapped, too
• DTD, object schema, and relational schema can use
different names
<!ELEMENT Part (Number, Price)>
<!ELEMENT Number (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
class PartClass {
String numberProp;
float priceProp;
}
CREATE TABLE PRT (
PRTNUM
CHAR(10) NOT NULL,
PRTPRICE REAL NOT NULL
)
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Primary and foreign keys
• Primary key on “one” side of one-to-many relationship
• May be in table of parent or child
<SalesOrder>
<Number>123</Number>
<Date>10/29/00</Date>
<Line>
<LineNum>1</LineNum>
<Part>
<PartNum>ABC</PartNum>
<Price>12.95</Price>
</Part>
<Quantity>3</Quantity>
</Line>
<Line>
...
</Line>
</SalesOrder>
Table Sales
Number, Date
Table Lines
SONum, Num, Part, Qty
Table Parts
Number, Price
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping Complex Content
Models
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Complex content models
• How do you map the following?
<!ELEMENT A (B?,
(C |
((D | E | F | G)*,
(H | I)+,
J?)))>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Sequences
• Map each reference in a sequence to a property ...
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
B
C
D
(B, C, D)>
(#PCDATA)>
(#PCDATA)>
(E, F)>
class A {
String b;
String c;
D
d;
}
• ...which map to tables and columns as appropriate
class A {
String b;
String c;
D
d;
}
Table A
Column b Column c Column d
Table D
Column d ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Choices
• Map each reference in a choice to a property ...
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
B
C
D
(B | C | D)>
(#PCDATA)>
(#PCDATA)>
(E, F)>
class A {
String b;
String c;
D
d;
}
• ... which map to nullable columns
class A {
String b;
String c;
D
d;
}
CREATE TABLE A (
b VARCHAR(10),
c VARCHAR(10),
d INTEGER
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
No
NOT NULL
clause
Choices (cont.)
• Property values can be null ...
<A>
<B>bbb</B>
</A>
object
b =
c =
d =
}
a {
"bbb"
null
null
• ... so column values can be NULL
object
b =
c =
d =
}
a {
"bbb"
null
null
Table A
b
c
d
... ... ...
bbb NULL NULL
... ... ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Repeated children
• Map to multi-valued properties
• Map repeated references to arrays of known size ...
<!ELEMENT A (B, B, B, C)>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>
class A {
String[3] b;
String
c;
}
• ... which map to multiple columns (shown) or
separate tables
class A {
String[3] b;
String
c;
}
Table A
Column b1 Column b2 Column b3 ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Repeated children (cont.)
• Map references with * or + operator to arrays of
unknown size ...
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
b
c
D
(B*, C, D)>
(#PCDATA)>
(#PCDATA)>
(#PCDATA)>
class A {
String[] b;
String
c;
String
d
}
• ... which map to separate tables (property tables)
class A {
String[] b;
String
c;
String
d
}
Table A
Column a Column c Column d
Table B
Column a Column b
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Optional children
• Map to nullable properties, then to nullable columns
• Applies to children in a choice ...
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
B
C
D
(B | C | D)>
(#PCDATA)>
(#PCDATA)>
(E, F)>
class A {
String b; // May be null
String c; // May be null
D
d; // May be null
}
• ... and to children with ? or * operator
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
B
C
D
(B?, C*, D)>
(#PCDATA)>
(#PCDATA)>
(E, F)>
class A {
String
b; // May be null
String[] c; // May be null
D
d;
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Subgroups
• Map references in subgroup to properties of parent class
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
B
C
D
(B, (C | D))>
(#PCDATA)>
(#PCDATA)>
(E, F)>
class A {
String b;
String c;
D
d;
}
• ... which map to columns in class table
class A {
String b;
String c;
D
d;
}
Table A
Column b, Column c, Column d
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Subgroups (cont.)
• Works because elements in subgroup are still children
of parent
<!ELEMENT A (B, (C | D))>
<A>
<B>bbbbbb</B>
<C>cccccc</C>
</A>
<A>
<B>bbbbbb</B>
<D>
<E>eee</E>
<F>fff</F>
</D>
</A>
object
b =
c =
d =
}
a {
"bbbbbb"
"cccccc"
null
object
b =
c =
d =
}
a {
"bbbbbb"
null
ptr. to object d
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Subgroups (cont.)
• Repeatability and optionality can be indirect
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
B
C
D
E
class A {
String
String[]
D[]
String[]
}
(B, (C | (D, E))+)>
(#PCDATA)>
(#PCDATA)>
(E, F)>
(#PCDATA)>
b;
c; // May be null
d; // May be null
e; // May be null
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping Mixed Content
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Model PCDATA as elements
<A>
This text <c>cc</c> makes
<b>bbbb</b> no sense
<c>cccc</c> except as
<b>bb</b> an example.
</A>
<A>
<pcdata>This text </pcdata>
<c>cc</c>
<pcdata> makes</pcdata>
<b>bbbb</b>
<pcdata> no sense</pcdata>
<c>cccc</c>
<pcdata> except as</pcdata>
<b>bb</b>
<pcdata> an example.</pcdata>
</A>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping mixed content
• Map PCDATA and element references to arrays of
unknown size ...
<!ELEMENT A (#PCDATA | B | C)*>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>
class A {
String[] pcdata;
String[] b;
String[] c;
}
• ... which are mapped to property tables
class A {
String[] pcdata;
String[] b;
String[] c;
}
Table PCDATA
Column a Column pcdata
Table A
Column a
Table B
Column a Column b
Table C
Column a Column c
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mixed content example
<A>
This text <c>cc</c> makes
<b>bbbb</b> no sense
<c>cccc</c> except as
<b>bb</b> an example.
</A>
object a {
pcdata = {"This text ",
" makes ",
" no sense ",
" except as",
" an example."}
b = {"bbbb", "bb"}
c = {"cc", "cccc"}
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mixed content example (cont.)
object a {
pcdata = {"This text ",
" makes ",
" no sense ",
" except as",
" an example."}
b = {"bbbb", "bb"}
c = {"cc", "cccc"}
}
Table PCDATA
a pcdata
1 This text
1 makes
1 no sense
1 except as
1 an example.
Table A
a
1
Table B
a
b
1
bbbb
1
bb
Table C
a
c
1
cc
1
cccc
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping Sibling Order
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Siblings
• Sibling means “brother or sister”
• Sibling elements and text have the same parent
<A>
This text <C>cc</C> makes
<B>bbbb</B> no sense
<C>cccc</C> except as
<B>bb</B> an example
</A>
A
This text
C
cc
makes
B
bbbb
no sense
C
except as
cccc
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
B
bb
an example
Sibling order
• Sibling order is order in which siblings occur
A
This text
1
C
2
cc
makes
3
B no sense C except as
4
5
6
7
bbbb
cccc
B an example
8
9
bb
• Sibling order is different from hierarchical order ...
1
2 This text
3
A
C
cc
makes
B
bbbb
no sense
C
except as
cccc
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
B
bb
an example
Sibling order (cont.)
• ... and from document order
This text C
2
3
cc
4
A
1
makes B no sense C except as B an example
5
6
8
9
11
12
14
bbbb
cccc
bb
7
10
13
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Is sibling order important?
• Usually not important in data-centric applications
<Part>
<Number>123</Number>
<Desc>Turkey wrench</Desc>
<Price>10.95</Price>
</Part>
<Part>
<Price>10.95</Price>
<Desc>Turkey wrench</Desc>
<Number>123</Number>
</Part>
object part {
number = 123
desc = "Turkey wrench"
price = 10.95
}
• Exception: Document validated against DTD
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Is sibling order important? (cont.)
• Very important in document-centric applications
<Review>
<p>Ronald Bourret is an
<b>excellent writer</b>.
Only an <b>idiot</b>
wouldn’t read his work.</p>
</Review>
<Review>
<p>Ronald Bourret is an
<b>idiot</b>. Only an
<b>execellent writer</b>
wouldn’t read his work.</p>
</Review>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping sibling order
• Store order values in:
» Order properties/columns
» Mapping
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Order properties/columns
• One order property per element or PCDATA ...
<!ELEMENT A (#PCDATA | B | C)*>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>
class A {
String[]
int[]
String[]
int[]
String[]
int[]
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
pcdata;
pcdataOrder;
b;
bOrder;
c;
cOrder
Order properties/columns (cont.)
• ... each of which is mapped to a separate column
class A {
String[]
int[]
String[]
int[]
String[]
int[]
}
pcdata;
pcdataOrder;
b;
bOrder;
c;
cOrder
Table PCDATA
a, pcdata, pcdataOrder
Table A
Column a
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Table C
a, c, cOrder
Table B
a, b, bOrder
Mixed content example
• All sibling order properties share the same order space
<A>
This text <c>cc</c> makes
<b>bbbb</b> no sense
<c>cccc</c> except as
<b>bb</b> an example.
</A>
object a {
pcdata = {"This text ",
" makes ",
" no sense ",
" except as",
" an example."}
pcdataOrder = {1, 3, 5, 7, 9}
b = {"bbbb", "bb"}
bOrder = {4, 8}
c = {"cc", "cccc"}
cOrder = {2, 6}
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mixed content example (cont.)
object a {
pcdata={"This text ",
" makes ",
" no sense ",
" except as",
" an example."}
pcdataOrder={1,3,5,7,9}
b={"bbbb","bb"}
bOrder={4,8}
c={"cc","cccc"}
cOrder={2,6}
}
Table PCDATA
a pcdata
order
1 This text
1
1 makes
3
1 no sense
5
1 except as
7
1 an example. 9
Table A
a
1
Table B
a
b
order
1
bbbb
4
1
bb
8
Table C
a
c
order
1
cc
2
1
cccc
6
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Element content and
order properties / columns
• Element content can use order properties / columns
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
A
B
C
D
(B, C, D)>
(#PCDATA)>
(#PCDATA)>
(E, F)>
class A {
String
int
String
int
D
int
}
b;
bOrder;
c;
cOrder;
d;
dOrder;
Table A
a, b, bOrder, c, cOrder
Table D
a, dOrder, ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Storing order in the mapping
• Order values can be stored in the mapping
Element Type
B
C
D
Order
1
2
3
• Children of same type must be groupable
OK:
<!ELEMENT A (B, C, D)>
<!ELEMENT A (B*, C+, D)>
<!ELEMENT A (B, (C | D)+>
Not OK: <!ELEMENT A (B, C, B, C, D)>
<!ELEMENT A (B*, (C, D)+)>
• No ordering within type (data-centric documents only)
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping Attributes
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Single-valued attributes
• CDATA, ID, IDREF, NMTOKEN, ENTITY,
NOTATION, and enumerated
• Map to scalar-valued properties ...
class A {
String b;
String c;
String d;
}
<!ELEMENT A (B, C)>
<!ATTLIST A
D CDATA #IMPLIED>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>
• ... which map to columns
class A {
String b;
String c;
String d;
}
Table A
Column b Column c Column d
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Multi-valued attributes
• IDREFS, NMTOKENS, and ENTITIES
• Map to arrays of unknown size ...
<!ELEMENT A (B, C)>
<!ATTLIST A
D IDREFS #IMPLIED>
<!ELEMENT B (#PCDATA)>
<!ELEMENT C (#PCDATA)>
class A {
String
b;
String
c;
String[] d;
}
• ... which map to property tables
class A {
String
b;
String
c;
String[] d;
}
Table A
Column a Column b Column c
Table D
Column a Column d
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Order of attributes
• Not significant according to XML Information Set
• No order properties needed
<A B="bbb"
C="ccc"
D="ddd"/>
=
<A C="ccc"
D="ddd"
B="bbb"/>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Order in multi-valued attributes
• One order property per multi-valued attribute
<!ELEMENT A EMPTY>
<!ATTLIST A
B IDREFS #IMPLIED
C NMTOKENS #IMPLIED>
class A {
String[]
int[]
String[]
int[]
}
b;
bOrder;
c;
cOrder;
• Separate order space for each multi-valued attribute
<A B="dd ee ff"
C="gg hh"/>
object a {
b = {"dd", "ee", "ff"}
bOrder = {1, 2, 3}
c = {"gg", "hh"}
cOrder = {1, 2}
}
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
ID/IDREF(S) Attributes
• Map to primary key / foreign key relationship
• “Decorate” IDs if not unique across all documents
Table A
a ...
<A>
A
B
C
D
<B ref_d="1">
...
</B>
<C ref_d="1">
...
</C>
<D id="1">
...
</D>
</A>
Table B
a ref_d ...
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Table C
a ref_d ...
Table D
id ...
Alternate Mappings
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Map complex element types to
scalar types
• Map references to complex element types to scalar
properties, then to columns
<!ELEMENT Part (Num, Desc)>
<!ELEMENT Number (#PCDATA)>
<!-- Use Inline entity
from XHTML -->
<!ELEMENT Desc (%Inline;)>
class Part {
String num;
String desc;
}
• Value is XML (e.g. XHTML)
<Part>
<Number>127</Number>
<Desc>
A very <b>big</b>
turkey wrench.
</Desc>
</Part>
Table Part
Num Desc
127 A very <b>big</>
turkey wrench.
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Map scalar properties to tables
• Useful for storing BLOBs separately
class Part {
String num;
String desc;
}
Table Parts
Column num
Table Descriptions
Column num, Column desc
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Additional Comments
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
What isn’t mapped?
• Physical structure and information:
»
»
»
»
Character and entity references
CDATA sections
Character encodings
Standalone declaration
• Document information:
» Document type
» DTD
• Other:
» Comments
» Processing instructions
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Pros and cons
• Pros:
»
»
»
»
Mapping model works for all XML documents
Preserves logical structure of data in XML document
Provides basis for easy-to-use, model-driven software
Round-tripping at the element/attribute/text level
• Cons:
» Discards physical structure information
» Inefficient for mixed content or deeply nested data
• Ideal for data-centric applications
• Poor for document-centric applications
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Widely used
• Relational databases such as Oracle 8i, IBM DB2,
Informix, and Microsoft SQL Server
• Data transfer middleware such as ADO, XML SQL
Utility for Java (Oracle), XML-DBMS, DB/XML
Vision, and InterAccess
• Object servers such as Castor, Object Translator
(Informix), and Total-e-Business (Bluestone)
• Varying levels of implementation
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Generating Database Schema
from DTDs
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example
1) Generate class tables and prim. keys for complex element types
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Order (OrderNum, Date, CustNum, Line*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Line (LineNum, Quantity, Part)>
LineNum (#PCDATA)>
Quantity (#PCDATA)>
Part (PartNum, Price)>
PartNum (#PCDATA)>
Price (#PCDATA)>
Order
OrderPK
Line
LinePK
Part
PartPK
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
2) Generate columns for single references to simple element types
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Order (OrderNum, Date, CustNum, Line*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Line (LineNum, Quantity, Part)>
LineNum (#PCDATA)>
Quantity (#PCDATA)>
Part (PartNum, Price)>
PartNum (#PCDATA)>
Price (#PCDATA)>
Order
OrderPK, OrderNum, Date, CustNum
Line
LinePK, LineNum, Quantity
Part
PartPK, PartNum, Price
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
3) Generate foreign keys for references to complex element types
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Order (OrderNum, Date, CustNum, Line*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Line (LineNum, Quantity, Part)>
LineNum (#PCDATA)>
Quantity (#PCDATA)>
Part (PartNum, Price)>
PartNum (#PCDATA)>
Price (#PCDATA)>
Order
OrderPK, OrderNum, Date, CustNum
Line
LinePK, LineNum, Quantity, OrderFK
Part
PartPK, PartNum, Price, LineFK
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Generating database schema
from DTDs
• Process element types
» Complex element types generate class tables and primary keys
• Process content models
» Single references to simple element types generate columns
» Repeated references to simple element types generate property
tables with foreign keys
» References to complex element types generate foreign keys in
remote class tables
» PCDATA in mixed content generates a property table with a
foreign key
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Generating database schema
from DTDs (cont.)
• Process attributes
» Single-valued attributes generate columns
» Multi-valued attributes generate property tables with foreign
keys
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Generating DTDs from
Database Schema
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example
1) Generate element type for root table
Orders
OrderNum, Date, CustNum
Lines
OrderNum, LineNum, Quantity, PartNum
Parts
PartNum, Price
<!ELEMENT Orders ()>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
2) Generate PCDATA-only elements for each data column
Orders
OrderNum, Date, CustNum
Lines
OrderNum, LineNum, Quantity, PartNum
Parts
PartNum, Price
<!ELEMENT Orders (Date, CustNum)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT CustNum (#PCDATA)>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
3) Generate PCDATA-only element for primary key
Orders
OrderNum, Date, CustNum
Lines
OrderNum, LineNum, Quantity, PartNum
Parts
PartNum, Price
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Orders (Date, CustNum, OrderNum)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
4) Add element for table to which primary key is exported
Orders
OrderNum, Date, CustNum
Lines
OrderNum, LineNum, Quantity, PartNum
Parts
PartNum, Price
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Orders (Date, CustNum, OrderNum, Lines*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Lines()>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
5) Process remote table (data columns and primary key column)
Orders
OrderNum, Date, CustNum
Lines
OrderNum, LineNum, Quantity, PartNum
Parts
PartNum, Price
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Orders (Date, CustNum, OrderNum, Lines*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Lines(Quantity, LineNum)>
LineNum (#PCDATA)>
Quantity (#PCDATA)>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
6) Add element for table to which foreign key corresponds
Orders
OrderNum, Date, CustNum
Lines
OrderNum, LineNum, Quantity, PartNum
Parts
PartNum, Price
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Orders (Date, CustNum, OrderNum, Lines*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Lines(Quantity, LineNum, Parts)>
LineNum (#PCDATA)>
Quantity (#PCDATA)>
Parts()>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Example (cont.)
7) Process remote table (data columns and primary key column)
Orders
OrderNum, Date, CustNum
Lines
OrderNum, LineNum, Quantity, PartNum
Parts
PartNum, Price
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
<!ELEMENT
Orders (OrderNum, Date, CustNum, Lines*)>
OrderNum (#PCDATA)>
Date (#PCDATA)>
CustNum (#PCDATA)>
Lines (LineNum, Quantity, Parts)>
LineNum (#PCDATA)>
Quantity (#PCDATA)>
Parts (PartNum, Price)>
PartNum (#PCDATA)>
Price (#PCDATA)>
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Generating DTDs from
database schema
• Process table
» Generate element type with sequence content model
• Process data columns
» Generate PCDATA-only elements
» Add references to generated elements to sequence
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Generating DTDs from
database schema (cont.)
• Process primary / foreign key columns
» If key is primary key, optionally:
• Add PCDATA-only element types for columns in key
• Add references to element types to sequence
» Add reference to remote element type to sequence
• If key is primary key, reference is optional/multiple (* operator)
• If key is foreign key and is nullable, reference is optional (? operator)
» Process remote table
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Generation problems
• Naming problems (both directions)
» Collisions
» Illegal names
» Names don’t necessarily make sense
• DTD => database schema
»
»
»
»
Can’t predict data types or lengths
Can’t recognize element types/attributes to use as keys
Can’t determine if primary key is in parent or child
DTD often has excess structure
• Database schema => DTD
» Can’t recognize order columns or property tables
• Can’t round-trip
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping XML Schemas to
Databases
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Mapping XML Schemas
to databases
• Map complex element types to classes (sort of)
» Map complex type extension to inheritance
• Map simple data types to scalar data types
» Many facets can’t be mapped
» Map wildcards to Java Object, C++ pointer to void, etc.
• Treat “all” groups as unordered sequences
• Treat substitution groups as choices
• Map most identity constraints to keys
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Resources
• Ronald Bourret’s Papers Page
» http://www.rpbourret.com/xml/index.htm
• XML:DB.org’s Resources Page
» http://www.xmldb.org/resources.html
• XML:DB Mailing List
» http://www.xmldb.org/projects.html
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com
Questions?
Ronald Bourret
[email protected]
http://www.rpbourret.com
Copyright 2000, 2001, Ronald Bourret, http://www.rpbourret.com