e-Science e-Business
e-Government and their
Technologies
XML Schema
Bryan Carpenter, Geoffrey Fox, Marlon Pierce
Pervasive Technology Laboratories
Indiana University Bloomington IN 47404
January 12 2004
[email protected]
[email protected]
[email protected]
http://www.grid2004.org/spring2004
1
Introduction
We saw that DTDs provide an approach to validating
XML documents: ensuring they have the structure
expected for a particular application.
With the increasing use of XML for data-centric
applications—e.g. XML formats for messages
exchanged by Web Services—limitations of DTDs
(which were inherited from SGML) soon became
apparent.
XML Schema is a more recent validation framework for
XML, which attempts to address the shortcomings of
DTDs for data-centric applications, for example by
providing a much richer set of data types.
2
Problems with DTDs
DTDs have some clear limitations:
• Restricted set of data types: attribute data is either general
character data, name tokens, ID or IDREF (or arcane cases);
element content is either general character data or nested
elements or some mixture.
For data-centric applications, we might want a value to be
a well-formed number, date, etc, etc.
• DTDs are not convenient for dealing with XML
Namespaces—essential for modularity on the Web.
• The uniqueness and consistency requirements associated with
ID, IDREF are powerful, but could be much more refined.
• There are various obscure constraints on element content
specifications, needed purely for historical SGML
compatibility.
3
XML Schema
XML Schema addresses all the issues mentioned on the
previous slide.
• It also has the interesting property that an XML Schema is
itself a well-formed XML document; some people consider
this a significant advantage.
This is the good news. The less good news is that the
XML Schema 1.0 specification is longer by almost an
order of magnitude than the basic XML specification—
DTDs and all.
4
General Comparison
DTDs
• A DTD defines all elements, etc, in one type of document.
• For documents with multiple namespaces, somehow patch
together one large DTD.
• Directly define structures of named elements.
• Limited built-in data types for attributes.
• Complex entity substitution mechanism.
XML Schema
• A schema defines all elements, etc, in a single namespace.
• For documents with multiple namespaces, use multiple
schemas.
• Define structures of complex types of element; then declare
named element of that type.
• Extensive built-in simple types for attributes and element
content.
• No entity substitution mechanism.
5
Reading Material
The XML Schema Specification itself comes in parts 0,
1, and 2. Parts 1 and 2 are long and tough to read, but
part 0 is a reasonable (“non-normative”) introduction:
XML Schema Part 0: Primer, May 2001.
http://www.w3.org/TR/xmlschema-0/
There are some good and bad books. A good one is:
Definitive XML Schema,
Priscilla Walmsley, Prentice Hall, 2002.
There is a comprehensive (but again rather long)
tutorial introduction to XML Schema by Roger
Costello at:
http://www.xfront.com/
6
“Report” Format Revisited
When discussing DTDs we described a simple “report” format.
Here is a slightly expanded version of the DTD given there:
<!DOCTYPE report [
<!ELEMENT report
(title, (paragraph | figure)*, bibliography?) >
<!ELEMENT title (#PCDATA)>
<!ELEMENT paragraph (#PCDATA)>
<!ELEMENT figure EMPTY>
<!ATTLIST figure source CDATA #REQUIRED >
<!ELEMENT bibliography (reference)* >
…
]>
We begin our detailed discussion of schema by considering how
to give an equivalent XML Schema for this document.
7
Declaring a paragraph Element
The report schema is surprisingly long: we will build
up to it in several incremental steps. First consider the
paragraph element.
Using DTD, we declared this element by:
<!ELEMENT paragraph (#PCDATA)>
An equivalent declaration in XML schema might be:
<xsd:element name="paragraph" type="xsd:string"/>
• xsd:element is itself an element in the XML Schema
namespace; this example assumes we use xsd as the prefix for
that namespace.
• xsd:string is a predefined type in that namespace.
8
xsd:string Primitive Type
XML Schema has a complex system of types. Different
types may describe:
1. the allowed values of attributes,
2. the allowed content of elements, or
3. the allowed content and the allowed attributes of elements.
There is a subset of types, called the simple types, that
can be used in either of the first two roles.
One of the simplest of all is string. Used as an
attribute type, this is equivalent to the DTD type
CDATA; used as an element type, this is equivalent to
the DTD content specification (#PCDATA).
9
Declaring a report Element
We initially simplify to a schema in which a report
consists only of a series of paragraphs. In DTD a
possible declaration of the root element would be:
<!ELEMENT report (paragraph)*>
An equivalent declaration in XML schema might be:
<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="paragraph"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
10
Elements with Complex Type
This rather verbose declaration says:
• The element named report has complex type.
• The content associated with this complex type is a sequence of
elements.
• This sequence consists of at least 0 and at most an unbounded
number of occurrences of paragraph elements.
Here the xsd:element element has different roles:
• Outermost xsd:element declares the element named report.
• Innermost xsd:element uses the element named paragraph,
declared elsewhere.
The role is determined by the presence or absence of
the ref attribute.
11
Local Declarations
In fact xsd:element can have a third role, which is
considered to be a combined declaration and use, e.g.:
<xsd:element name="report">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="paragraph" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
• Here the report element has its own local declaration of
paragraph; no separate global declaration is necessary.
12
Global vs Local Element Declarations
Declarations that occur as children of the top-level
schema element are global declarations.
• These are the only declarations that can actually be “used”
from elsewhere.
“Local declarations”—like the one illustrated on the
previous slide—are “used” exactly once at their point of
declaration.
• This is different from the concept of local declarations in most
programming languages.
• Local element declarations interact with namespaces in a
non-obvious way: perhaps best avoided until you are sure you
know what you are doing.
13
Global vs Local Type Definitions
The type of the report element was specified by an
xsd:complexType element nested within the element
declaration.
The type of the paragraph element was specified by a
type attribute on the declaration, referencing a named
type.
In fact types, like elements, can always be defined
locally where they are used, or defined globally, then
referenced from a point of use.
The following slide illustrates yet another way to
declare report.
14
Named Type Definitions
In this version we introduce a named complex type
called reportType, then declare the report element with
this type:
<xsd:complexType name="reportType">
<xsd:sequence>
<xsd:element ref="paragraph"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="report" type="reportType"/>
This abstraction facility—introducing new named
types—is a central theme of XML Schema.
15
A Complete XML Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.grid2004.org/ns/report1"
            xmlns="http://www.grid2004.org/ns/report1">
  <xsd:element name="report">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="paragraph"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="paragraph" type="xsd:string"/>
</xsd:schema>
16
Remarks
Recall this schema is essentially equivalent to the DTD:
<!DOCTYPE report [
<!ELEMENT report (paragraph)* >
<!ELEMENT paragraph (#PCDATA)>
]>
Clearly the schema has more baggage (or more added
value, according to your point of view!)
Our schema declares two element names, report and
paragraph, and puts them in a namespace called
http://www.grid2004.org/ns/report1.
17
Namespace Considerations
The root element of any schema is a schema element
from the http://www.w3.org/2001/XMLSchema
namespace.
The targetNamespace attribute on this element
specifies which namespace the elements declared here
“go into”.
We have seen the other namespace attributes before:
• The xmlns:xsd attribute associates the prefix xsd with the
XML Schema namespace.
• The xmlns attribute makes the default namespace
http://www.grid2004.org/ns/report1 for this document.
• Often one uses xsd as the prefix for schema elements, and
makes the target namespace the default namespace of the
schema document, but neither is essential.
18
An XML Instance Document
<?xml version="1.0"?>
<report xmlns="http://www.grid2004.org/ns/report1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.grid2004.org/ns/report1
report1.xsd">
<paragraph>Recently uncovered documents prove...
</paragraph>
<paragraph>The author is grateful to W3C for making this
research possible.</paragraph>
</report>
19
Namespace Considerations
Assuming the document vocabulary belongs to a
namespace, we must declare this namespace.
• In this example http://www.grid2004.org/ns/report1 is
declared as the default namespace.
If the instance document is to be validated against a
schema, we must normally define where the schema for
the namespace is located.
This is done here by putting an attribute
schemaLocation on the root element of the document.
This attribute is itself defined in a standard namespace,
called http://www.w3.org/2001/XMLSchema-instance.
So we must introduce a prefix for this (xsi is
traditional).
20
schemaLocation
The value of the schemaLocation attribute should be a
pair of URIs: a namespace name and the corresponding
schema URI.
• If the document uses more than one namespace, the value can
be several consecutive pairs.
• All tokens are separated by white space.
In this example the schema should be in the file
report1.xsd in the same directory as the instance
document.
21
Schema Validation Using dom.Writer
If I save the instance document in a file called
“xsdreport1.xml”, and the schema in a file called
“report1.xsd”, I can validate the file with the Xerces
parser by using the dom.Writer sample application as
follows:
> java dom.Writer -v -s -f xsdreport1.xml
• If validation is successful, this simply prints a formatted
version of the input file. If schema validation fails, you will
see error messages early in the output.
• The -v -s flags are needed here. Without -s the parser will
try to do just DTD validation. -f means “full” schema
validation, presumably a good thing.
22
Schema Validation from Java
Unfortunately it doesn’t seem to be possible to enable
XML Schema validation in Xerces using the “vendor-neutral” JAXP API.
• The DOM Level 3 API will enable this, but it is not finalized
or fully deployed at the time of this writing.
For now you must directly use the “proprietary”
org.apache.xerces.parsers.DOMParser Xerces
implementation class.
Use is sketched on the next slide.
23
The Xerces DOMParser API
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
…
static final String VALIDATION_FEATURE_ID =
"http://xml.org/sax/features/validation" ;
static final String SCHEMA_VALIDATION_FEATURE_ID =
"http://apache.org/xml/features/validation/schema" ;
static final String SCHEMA_FULL_CHECKING_FEATURE_ID =
"http://apache.org/xml/features/validation/schema-full-checking" ;
…
DOMParser parser = new DOMParser();
// Turn Schema Validation on
parser.setFeature(VALIDATION_FEATURE_ID, true);
parser.setFeature(SCHEMA_VALIDATION_FEATURE_ID, true);
parser.setFeature(SCHEMA_FULL_CHECKING_FEATURE_ID, true);
parser.setErrorHandler(new MyErrorHandler()) ;
parser.parse(uri) ; // uri is XML instance file
Document document = parser.getDocument() ;
…
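For comparison with the Xerces-specific sketch above, later JDKs (Java 5 onward, so after these slides were written) ship the javax.xml.validation package. The following is a minimal sketch under that assumption, not the authors' method: it compiles the report1 schema from an in-memory string and validates instance strings against it. The class name ReportSchemaCheck and its helper methods are invented for illustration; because the schema is handed to the validator directly, no schemaLocation hint is needed in the instance.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class; only the javax.xml.validation calls are JDK API.
final class ReportSchemaCheck {
    static final String NS = "http://www.grid2004.org/ns/report1";

    // The complete schema from the slides, kept in a string instead of report1.xsd.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NS + "' xmlns='" + NS + "'>"
        + "<xsd:element name='report'>"
        + "<xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='paragraph' minOccurs='0' maxOccurs='unbounded'/>"
        + "</xsd:sequence></xsd:complexType>"
        + "</xsd:element>"
        + "<xsd:element name='paragraph' type='xsd:string'/>"
        + "</xsd:schema>";

    // Compiles a schema held in a string; a broken schema is a programming error here.
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true when the validator raises no error for the instance text.
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a validation error, or XML that is not well-formed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<report xmlns='" + NS + "'>"
                    + "<paragraph>Recently uncovered documents prove...</paragraph></report>";
        String bad = "<report xmlns='" + NS + "'><figure/></report>";
        System.out.println(isValid(good)); // true
        System.out.println(isValid(bad));  // false: figure is not declared in this schema
    }
}
```

With no ErrorHandler installed, the validator throws on the first error, which is what the helper relies on to turn validation into a boolean.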
24
More on Complex Types
If an element may have nested elements, or if it may
have attributes, it must be described by a complex type.
• If neither of these conditions holds—the element has only
character data content and no attributes—it is usually more
convenient to use a simple type.
Attributes on complex types are specified by an
attribute element, e.g.:
<xsd:element name="figure">
<xsd:complexType>
<xsd:attribute name="source" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
25
Attribute Declarations
Like element declarations, attributes may be declared
globally, then used inside a complex type declaration,
through an xsd:attribute element with a ref attribute.
• In contrast to the situation with elements, local declaration of
attributes is often a natural choice.
The figure example above has a complex type with no
content. In general attribute specifications go after the
content specification, in the body of the
xsd:complexType element.
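One detail worth checking by experiment: an xsd:attribute is optional unless it carries use="required", so the figure declaration as written is slightly looser than the DTD's #REQUIRED. The sketch below assumes a JDK with javax.xml.validation (Java 5 or later) and uses an invented class name, FigureAttributeCheck; it builds the figure schema with either setting and validates a few instances.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class FigureAttributeCheck {

    // Builds the figure schema; use is "optional" (the schema default) or "required".
    static String schemaWithUse(String use) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
             + "<xsd:element name='figure'><xsd:complexType>"
             + "<xsd:attribute name='source' type='xsd:string' use='" + use + "'/>"
             + "</xsd:complexType></xsd:element>"
             + "</xsd:schema>";
    }

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Validates one instance against the schema built with the given use value.
    static boolean isValid(String use, String xml) {
        try {
            compile(schemaWithUse(use)).newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("optional", "<figure source='plot.png'/>")); // true
        System.out.println(isValid("optional", "<figure/>"));                   // true
        System.out.println(isValid("required", "<figure/>"));                   // false
        // The complex type has no content model, so character content is rejected.
        System.out.println(isValid("optional", "<figure source='p.png'>x</figure>")); // false
    }
}
```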
26
Element Sequences and Choices
To finish this introductory foray into XML Schema, we
restore our report element back to its original
specification. The XML Schema declaration is given on
the next slide.
Recall this is supposed to be equivalent to the DTD
declaration:
<!ELEMENT report (title, (paragraph | figure)*, bibliography?) >
• The use of the xsd:sequence and xsd:choice elements should
be reasonably self explanatory.
• Note how the minOccurs, maxOccurs attributes replace use
of the *, ? operators: both have default values of 1.
27
Original report Element Structure
<xsd:element name="report">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="title"/>
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="paragraph"/>
<xsd:element ref="figure"/>
</xsd:choice>
<xsd:element ref="bibliography" minOccurs="0"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
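To see the sequence and choice rules in action, the sketch below wraps the declaration above in a small no-namespace schema and validates a few instances. It assumes a JDK with javax.xml.validation (Java 5 or later). The class name ReportStructureCheck is invented, and the bibliography element is declared with an empty complex type purely as a placeholder, since the slides elide its real content.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class ReportStructureCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='report'><xsd:complexType><xsd:sequence>"
        + "<xsd:element ref='title'/>"
        + "<xsd:choice minOccurs='0' maxOccurs='unbounded'>"
        + "<xsd:element ref='paragraph'/>"
        + "<xsd:element ref='figure'/>"
        + "</xsd:choice>"
        + "<xsd:element ref='bibliography' minOccurs='0'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "<xsd:element name='title' type='xsd:string'/>"
        + "<xsd:element name='paragraph' type='xsd:string'/>"
        + "<xsd:element name='figure'><xsd:complexType>"
        + "<xsd:attribute name='source' type='xsd:string'/>"
        + "</xsd:complexType></xsd:element>"
        // Placeholder only: the real bibliography content is not shown in the slides.
        + "<xsd:element name='bibliography'><xsd:complexType/></xsd:element>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // title, then any mix of paragraph and figure, then an optional bibliography
        System.out.println(isValid("<report><title>T</title><paragraph>p</paragraph>"
                + "<figure source='f.png'/><paragraph>q</paragraph><bibliography/></report>")); // true
        // title is mandatory because minOccurs defaults to 1
        System.out.println(isValid("<report><paragraph>p</paragraph></report>")); // false
    }
}
```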
28
Simple Types:
Schema Datatypes
29
XML Schema Simple Types
Recall simple types can be used to describe the values of
attributes, or the content of elements that have no
nested elements (“character data” content).
So far we only illustrated one simple type built in to
XML Schema: namely string.
• As an attribute type this is similar to the DTD attribute type
CDATA; as an element type, it is similar to the DTD content
specification (#PCDATA).
Most of the details of simple types are defined in the
W3C Recommendation XML Schema Part 2: Datatypes.
30
Built In and User-Defined Types
XML Schema provides over 40 built in simple types.
It also provides flexible mechanisms for creating your
own simple types,
• which may in fact impose rather complex patterns on text
content.
31
Schema Built In Types
32
Built In Simple Types
Simple Type          Examples (comma separated)
string               Confirm this is electric
normalizedString     Confirm this is electric
token                Confirm this is electric
base64Binary         GpM7
hexBinary            0FB7
byte                 -1, 126
unsignedByte         0, 126
33
Built In Simple Types (continued)
Simple Type          Examples (comma separated)
integer              -126789, -1, 0, 1, 126789
positiveInteger      1, 126789
negativeInteger      -126789, -1
nonNegativeInteger   0, 1, 126789
nonPositiveInteger   -126789, -1, 0
int                  -1, 126789675
unsignedInt          0, 1267896754
long                 -1, 12678967543233
unsignedLong         0, 12678967543233
short                -1, 12678
unsignedShort        0, 12678
34
Built In Simple Types (continued)
Simple Type          Examples (comma separated)
decimal              -1.23, 0, 123.4, 1000.00
float                -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
double               -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
boolean              true, false, 1, 0
35
Built In Simple Types (continued)
Simple Type          Examples (comma separated)
time                 13:20:00.000, 13:20:00.000-05:00
dateTime             1999-05-31T13:20:00.000-05:00
duration             P1Y2M3DT10H30M12.3S
date                 1999-05-31
gMonth               --05--
gYear                1999
gYearMonth           1999-02
gDay                 ---31
gMonthDay            --05-31
36
Built In Simple Types (continued)
Simple Type          Examples (comma separated)
Name                 shipTo
QName                po:USAddress
NCName               USAddress
anyURI               http://www.example.com/,
                     http://www.example.com/doc.html#ID5
language             en-GB, en-US, fr
37
Built In Simple Types (continued)
Simple Type          Examples (comma separated)
ID
IDREF
IDREFS
ENTITY
ENTITIES
NOTATION
NMTOKEN              US, Brésil
NMTOKENS             US UK, Brésil Canada Mexique
38
Creating New Simple Types
There are three basic approaches to building new
simple types (deriving simple types):
• Restricting facets of an existing simple type.
• Creating a list type from an existing simple type.
• Creating a union type from some existing simple types.
The most sophisticated mechanism is the first—
restriction using facets.
39
Facets
The 19 primitive types (the built in types derived directly
from anySimpleType) have a set of constraining facets
restricting allowed values.
The constraining facets of a simple type are a subset of:
• length, minLength, maxLength, pattern, enumeration,
whiteSpace, maxInclusive, maxExclusive, minExclusive,
minInclusive, totalDigits, fractionDigits
• Restricted types have all the facets of their base types—
though values of the facets may be different.
• There is no way for schema writers to introduce new facets—
users cannot directly restrict anySimpleType.
• Technically simple types have additional fundamental facets,
but values of these flags cannot be set directly. They are:
equal, ordered, bounded, cardinality, numeric
40
Restriction
Here is a characteristic example of restriction:
<xsd:simpleType name="singleDigit">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="-9"/>
<xsd:maxInclusive value="9"/>
</xsd:restriction>
</xsd:simpleType>
This starts from the built in xsd:integer, and defines a
derived type singleDigit by setting the facet
minInclusive to -9 and the facet maxInclusive to 9.
Thus the type singleDigit represents a whole number
between -9 and +9.
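A quick way to confirm the range is to validate a few values against the type. The sketch below assumes a JDK with javax.xml.validation (Java 5 or later); the class name SingleDigitCheck and the element name digit are invented for illustration, and only the singleDigit type comes from the slide.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class SingleDigitCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='singleDigit'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:minInclusive value='-9'/>"
        + "<xsd:maxInclusive value='9'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        // 'digit' is an invented element so the type has something to validate.
        + "<xsd:element name='digit' type='singleDigit'/>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Wraps the text in a <digit> element and validates it.
    static boolean isValid(String text) {
        String xml = "<digit>" + text + "</digit>";
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("9"));   // true: the upper bound is inclusive
        System.out.println(isValid("-9"));  // true: the lower bound is inclusive
        System.out.println(isValid("10"));  // false: above maxInclusive
        System.out.println(isValid("1.5")); // false: not an integer at all
    }
}
```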
41
Length
The facets length, minLength, maxLength allow one to constrain the
length of an item like a string (they also constrain the
number of items in a list type, see later).
Values of length, minLength, maxLength should be non-negative
integers. Example:
<xsd:simpleType name="state">
<xsd:restriction base="xsd:string">
<xsd:length value="2"/>
</xsd:restriction>
</xsd:simpleType>
defines a type state representing strings containing exactly two
characters.
• These facets are supported by all primitive types other than boolean, numeric, and date- and time-related types. They are also supported by list types.
42
Pattern
Perhaps the most powerful facet is pattern, which
allows one to specify a regular expression: any allowed
value must satisfy the pattern of this expression.
Example
<xsd:simpleType name="weekday">
<xsd:restriction base="xsd:string">
<xsd:pattern
value="(Mon|Tues|Wednes|Thurs|Fri)day"/>
</xsd:restriction>
</xsd:simpleType>
defines a type weekday representing the names of the
week days.
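The pattern must match the whole value, not just part of it. The sketch below illustrates this with the weekday type; it assumes a JDK with javax.xml.validation (Java 5 or later), and the class name WeekdayPatternCheck and element name day are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class WeekdayPatternCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='weekday'>"
        + "<xsd:restriction base='xsd:string'>"
        + "<xsd:pattern value='(Mon|Tues|Wednes|Thurs|Fri)day'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        // 'day' is an invented element so the type has something to validate.
        + "<xsd:element name='day' type='weekday'/>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Wraps the text in a <day> element and validates it.
    static boolean isValid(String text) {
        String xml = "<day>" + text + "</day>";
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("Monday"));      // true
        System.out.println(isValid("Sunday"));      // false: not one of the alternatives
        System.out.println(isValid("monday"));      // false: patterns are case sensitive
        System.out.println(isValid("Mondays off")); // false: the whole value must match
    }
}
```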
43
Regular Expressions
XML Schema has its own notation for regular
expressions, but it is very much based on the corresponding
Perl notation.
For the most part Schema uses a subset of the Perl 5
grammar for regular expressions.
• Includes most of the purely “declarative” features from Perl
regular expressions, but omits many “procedural” features
related to search, matching algorithm, substitution, etc.
XML Schema adds a few features of its own, e.g.:
• Matching characters legal in XML names.
• Character class subtraction.
• Inherits general XML escape mechanisms for Unicode
characters, replacing analogous Perl mechanisms.
44
Metacharacters
The following characters, called metacharacters, have
special roles in Schema regular expressions:
. \ ? * + | { } ( ) [ ]
• Like Perl, but treats }, ] uniformly as metacharacters, and
omits search-related metacharacters ^ and $.
To match these characters literally in patterns, must
escape them with \, e.g.:
• The pattern “2\+2” matches the string “2+2”.
• The pattern “f\(x\)” matches the string “f(x)”.
45
Escape Sequences
In general one should use XML character references to
include hard-to-type characters. But for convenience
Schema regular expressions allow:
• \n matches a newline character (same as &#xA;)
• \r matches a carriage return character (same as &#xD;)
• \t matches a tab character (same as &#x9;)
All other escape sequences (except \- and \^, used only
in character class expressions) match any single
character out of some set of possible values.
• For example \d matches any decimal digit, so the pattern
“Boeing \d\d\d” matches the strings “Boeing 747”, “Boeing
777”, etc.
46
Multicharacter Escapes
The simplest patterns matching classes of characters
are:
• . matches any character except carriage return or newline.
• \d matches any decimal digit.
• \s matches any white space character.
• \i matches any character that can start an XML name.
• \c matches any character that can appear in an XML name.
• \w matches any “word” character (excludes punctuation, etc.)
The escapes \D, \S, \I, \C and \W are negative forms, e.g.
\D matches any character except a decimal digit.
• Similar to Perl, except: Perl doesn’t have \i, \I; Perl uses \c, \C
for other things; detailed definitions of \w, \W are different.
47
Category Escapes
A large and interesting family of escapes is based on the
Unicode standard. General form in Perl or Schema is
\p{Name}
where Name is a Unicode-defined class name.
• The negative form \P{Name} matches any character not in the
class.
Simple examples include: \p{L} (any letter), \p{Lu}
(upper case letters), \p{Ll} (lower case letters), etc.
More interesting cases are based on the Unicode block
names for alphabets, e.g.:
• \p{IsBasicLatin}, \p{IsLatin-1Supplement}, \p{IsGreek},
\p{IsArabic}, \p{IsDevanagari}, \p{IsHangulJamo},
\p{IsCJKUnifiedIdeographs}, etc, etc, etc.
48
Character Class Expressions
Allow you to define terms that match any character
from a custom set of characters. Basic syntax is
familiar from Perl and UNIX:
[List-of-characters]
or the negative form:
[^List-of-characters]
Here List-of-characters can include individual
characters, and also ranges of the form First-Last
where First and Last are characters.
Examples:
• [RGB] matches one of R, G, or B.
• [0-9A-F] or [\dA-F] match one of 0, 1, …, 9, A, B,…, F.
• [^\r\n] matches anything except CR, NL (same as . ).
49
Class Subtractions
A feature of XML Schema, not present in Perl 5. A
character class expression can take the form:
[List-of-characters-Class-char-expr]
or:
[^List-of-characters-Class-char-expr]
where Class-char-expr is another character class
expression.
Example:
• [a-zA-Z-[aeiouAEIOU]] matches any consonant in the Latin
alphabet.
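Class subtraction is specific to the Schema notation (java.util.regex, for instance, spells the same idea differently), so the simplest way to try it is through a schema validator. The sketch below assumes a JDK with javax.xml.validation (Java 5 or later) whose built-in validator implements Schema regular expressions; the class name ConsonantCheck, the type name consonant, and the element name letter are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class ConsonantCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='consonant'>"
        + "<xsd:restriction base='xsd:string'>"
        // Latin letters minus the vowels, written as a class subtraction.
        + "<xsd:pattern value='[a-zA-Z-[aeiouAEIOU]]'/>"
        + "</xsd:restriction>"
        + "</xsd:simpleType>"
        + "<xsd:element name='letter' type='consonant'/>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Wraps the text in a <letter> element and validates it.
    static boolean isValid(String text) {
        String xml = "<letter>" + text + "</letter>";
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("b"));  // consonant
        System.out.println(isValid("a"));  // vowel, removed by the subtraction
        System.out.println(isValid("bc")); // two characters, but the class matches exactly one
    }
}
```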
50
Sequences and Alternatives
Finally, the universal core of regular expressions. If
Pattern1 and Pattern2 are regular expressions, then:
• Pattern1Pattern2 matches any string made by putting a string
accepted by Pattern1 in front of a string accepted by Pattern2.
• Pattern1|Pattern2 matches any string that would be accepted
by Pattern1, or any string accepted by Pattern2.
Parentheses just group things together:
• (Pattern1) matches any string accepted by Pattern1.
An example given earlier:
• (Mon|Tues|Wednes|Thurs|Fri)day matches any of the strings
Monday, Tuesday, Wednesday, Thursday, or Friday.
• Equivalent to
Monday|Tuesday|Wednesday|Thursday|Friday.
51
Quantifiers
… and if Pattern1 is a regular expression:
• Pattern1? matches the empty string or any string accepted by
Pattern1.
• Pattern1+ matches any string accepted by Pattern1, or by
Pattern1Pattern1, or by Pattern1Pattern1Pattern1, or …
• Pattern1* matches the empty string or any string accepted by
Pattern1+.
If n, m are numbers, Perl and XML Schema also allow
the shorthand forms:
• Pattern1{n} is equivalent to Pattern1 repeated n times.
• Pattern1{m,n} matches any string accepted by Pattern1
repeated m times or m + 1 times or … or n times.
• Pattern1{m,} matches any string accepted by Pattern1 repeated
m or more times.
52
Using Patterns in Restriction
All simple types (including lists and enumerations) support the
pattern facet, e.g.:
<xsd:simpleType name="multiplesOfFive">
  <xsd:restriction base="xsd:integer">
    <xsd:pattern value="[+-]?\d*[05]"/>
  </xsd:restriction>
</xsd:simpleType>
defines a subtype of integer including all numbers ending with
digits 0 or 5.
The pattern facet can appear more than once in a single
restriction: interpretation is as if patterns were combined with |.
• Conversely if the pattern facet is specified in restriction of a base type that
was itself defined using a pattern, allowed values must satisfy both
patterns.
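The two combination rules can be checked side by side. The sketch below assumes a JDK with javax.xml.validation (Java 5 or later). Only multiplesOfFive comes from the slide; the types twoDigitMultiple and endsInZeroOrFive, the element names a, b, c, and the class name PatternCombinationCheck are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class PatternCombinationCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='multiplesOfFive'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:pattern value='[+-]?\\d*[05]'/>"
        + "</xsd:restriction></xsd:simpleType>"
        // A further restriction: values must satisfy this pattern AND the base pattern.
        + "<xsd:simpleType name='twoDigitMultiple'>"
        + "<xsd:restriction base='multiplesOfFive'>"
        + "<xsd:pattern value='\\d\\d'/>"
        + "</xsd:restriction></xsd:simpleType>"
        // Two patterns in one restriction: a value may satisfy either one.
        + "<xsd:simpleType name='endsInZeroOrFive'>"
        + "<xsd:restriction base='xsd:integer'>"
        + "<xsd:pattern value='\\d*0'/>"
        + "<xsd:pattern value='\\d*5'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:element name='a' type='multiplesOfFive'/>"
        + "<xsd:element name='b' type='twoDigitMultiple'/>"
        + "<xsd:element name='c' type='endsInZeroOrFive'/>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Validates text as the content of element a, b, or c.
    static boolean isValid(String element, String text) {
        String xml = "<" + element + ">" + text + "</" + element + ">";
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("a", "-20")); // true
        System.out.println(isValid("a", "7"));   // false
        System.out.println(isValid("b", "15"));  // true: two digits and ends in 5
        System.out.println(isValid("b", "12"));  // false: fails the inherited pattern
        System.out.println(isValid("c", "10"));  // true: matches the first pattern
        System.out.println(isValid("c", "25"));  // true: matches the second pattern
    }
}
```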
53
Enumeration
The enumeration facet allows one to select a finite
subset of allowed values from a base type, e.g.:
<xsd:simpleType name="weekday">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="Monday"/>
    <xsd:enumeration value="Tuesday"/>
    <xsd:enumeration value="Wednesday"/>
    <xsd:enumeration value="Thursday"/>
    <xsd:enumeration value="Friday"/>
  </xsd:restriction>
</xsd:simpleType>
• Behaves like a very restricted version of pattern?
• All primitive types except boolean support the enumeration
facet. List and union types also support this facet.
54
White Space
This facet controls how white space in a value received from the
parser is processed, prior to Schema validation. It can take three
values:
• preserve: no white space processing, beyond what base XML does.
• replace: Convert every white space character (Line Feed, etc) to a space
character (#x20).
• collapse: Like replace. All leading or trailing spaces are then removed.
Also sequences of spaces are replaced by single spaces.
Note analogies to “Normalization of Attribute Values” in base XML.
All simple types except union types have this facet, but
usually you don’t explicitly set it in restriction: just inherit values
from built in types.
All built in types have collapse, except string which has
preserve and normalizedString which has replace.
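The practical effect of the inherited value shows up as soon as an instance contains stray white space. The sketch below assumes a JDK with javax.xml.validation (Java 5 or later) and compares the same enumeration derived from xsd:string (preserve) and from xsd:token (collapse); the element names strict and lenient and the class name WhiteSpaceCheck are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class WhiteSpaceCheck {
    // Builds an element whose anonymous type restricts the given base to "Monday".
    static String element(String name, String base) {
        return "<xsd:element name='" + name + "'><xsd:simpleType>"
             + "<xsd:restriction base='" + base + "'>"
             + "<xsd:enumeration value='Monday'/>"
             + "</xsd:restriction></xsd:simpleType></xsd:element>";
    }

    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + element("strict", "xsd:string")   // whiteSpace is preserve
        + element("lenient", "xsd:token")   // whiteSpace is collapse
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<strict>Monday</strict>"));         // true
        System.out.println(isValid("<strict> Monday </strict>"));       // false: spaces are kept
        System.out.println(isValid("<lenient>\n  Monday\n</lenient>")); // true: collapsed first
    }
}
```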
55
Other Facets
The facets maxInclusive, maxExclusive, minExclusive,
minInclusive are supported only by numeric and date- and time-related types, and define bounds of value
ranges.
The facets totalDigits, fractionDigits are defined for the
primitive type decimal, and thus all numeric types
derived from decimal, and for no other types.
56
List Types
We define a type representing a white-space-separated list of
items using the list element, e.g.:
<xsd:simpleType name="listOfDays">
  <xsd:list itemType="weekday"/>
</xsd:simpleType>
This introduces a type that takes values like “”, “Monday”,
“Monday Monday”, “Tuesday Wednesday Thursday”, etc.
List types can be restricted using the length-related facets, and
the pattern, enumeration and whiteSpace facets.
The <list> element may contain an anonymous <simpleType>
element instead of having an itemType attribute.
A list value is split according to its white space content prior to
validation of the items in the list.
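The sketch below exercises listOfDays and one length-restricted derivative. It assumes a JDK with javax.xml.validation (Java 5 or later). The weekday and listOfDays types follow the slides; the pairOfDays type, the element names days and pair, and the class name ListTypeCheck are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class ListTypeCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='weekday'><xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='Monday'/><xsd:enumeration value='Tuesday'/>"
        + "<xsd:enumeration value='Wednesday'/><xsd:enumeration value='Thursday'/>"
        + "<xsd:enumeration value='Friday'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name='listOfDays'>"
        + "<xsd:list itemType='weekday'/>"
        + "</xsd:simpleType>"
        // For a list type, the length facet counts items, not characters.
        + "<xsd:simpleType name='pairOfDays'><xsd:restriction base='listOfDays'>"
        + "<xsd:length value='2'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:element name='days' type='listOfDays'/>"
        + "<xsd:element name='pair' type='pairOfDays'/>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<days>Tuesday  Wednesday\n Thursday</days>")); // true
        System.out.println(isValid("<days>Monday Sunday</days>"));                 // false
        System.out.println(isValid("<pair>Monday Friday</pair>"));                 // true
        System.out.println(isValid("<pair>Monday</pair>"));                        // false
        System.out.println(isValid("<days/>")); // the empty list from the slide
    }
}
```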
57
Union Types
A union type takes values from any one of a set of base types.
<xsd:simpleType name="maxOccursType">
  <xsd:union memberTypes="xsd:integer">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="unbounded"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
The types in the union are specified in the list-valued attribute
memberTypes, or by nested anonymous <simpleType>
elements, or by a combination of the two, as above.
Union types can be restricted using the pattern or enumeration
facets.
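A value is accepted by a union if any member type accepts it. The sketch below checks this for maxOccursType; it assumes a JDK with javax.xml.validation (Java 5 or later), and the element name limit and class name UnionTypeCheck are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class UnionTypeCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:simpleType name='maxOccursType'>"
        + "<xsd:union memberTypes='xsd:integer'>"
        + "<xsd:simpleType><xsd:restriction base='xsd:string'>"
        + "<xsd:enumeration value='unbounded'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "</xsd:union>"
        + "</xsd:simpleType>"
        // 'limit' is an invented element so the type has something to validate.
        + "<xsd:element name='limit' type='maxOccursType'/>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Wraps the text in a <limit> element and validates it.
    static boolean isValid(String text) {
        String xml = "<limit>" + text + "</limit>";
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("5"));         // true: accepted by xsd:integer
        System.out.println(isValid("unbounded")); // true: accepted by the enumeration
        System.out.println(isValid("many"));      // false: accepted by neither member
    }
}
```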
58
Prohibiting Derivation
You may, for some reason, have a simple type that you
don’t want anybody to derive further types from.
Do this by specifying the final attribute on the
<simpleType> element. Its value is a list containing a
subset of values from list, union, restriction, extension.
• These specify which sorts of derivation are disallowed. Note
extension is a way of deriving a complex type from a simple
type. It will be discussed in the next section.
Give the final attribute the value “#all” for blanket
prohibition of any derivation from this simple type.
• Can also prevent the value of individual facets from being
changed in subsequent restrictions by specifying fixed="true"
on the facet elements.
59
Complex Types
60
Element Content, and Attributes
Simple types allow us to declare elements that have only
parsed character content (no nested elements). E.g. the
declaration:
<xsd:element name="dayItHappened" type="weekday"/>
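might validate instance elements like the following (the first form,
with surrounding white space, is accepted only if weekday collapses
white space, e.g. when derived from xsd:token; as a restriction of
xsd:string the white space is preserved and only the second form matches):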
<dayItHappened>
Monday
</dayItHappened>
<dayItHappened>Tuesday</dayItHappened>
But if we need elements with element content, or
elements with attributes, we must declare those elements
to have complex type.
61
Complex Type Hierarchy
We saw that a set of built in simple types were derived
from xsd:anySimpleType, and that new simple types
could be derived from a base type by restriction, list, or
union.
There are no built in complex types, other than the so-called ur-type, represented as xsd:anyType.
All other complex types are derived by one or more
steps of restriction and extension from xsd:anyType.
• Complex types can also be created by extension of a simple
type, but simple types are also notionally restrictions of
xsd:anyType.
62
Restriction
A restriction of a base type is a new type.
All allowed instances of the new type are also instances
of the base type. But the restricted type doesn’t allow
all possibilities allowed by the base type.
• Think of the example of restricting xsd:string to 4 characters
using the length facet. Strings of length 4 are also allowed by
the xsd:string, but the new type is more restrictive.
• In the complex case, we might have a complex base type that
allows attribute att optionally. A restricted type might not
allow att at all.
Another restriction of the same base might require att.
• Or we might have a base type that allows 0 or more nested
elm elements. The restricted type might require exactly 1
nested elm element.
63
Extension
An extension of a base type is a new type.
An extension allows extra attributes or extra content
that are not allowed in instances of the base type.
• At first brush this sounds like the opposite of restriction, but
this isn’t strictly true.
• If, for example, type E extends a type B by adding a required
attribute att, then instances of B are not allowed instances of
E (because they don’t have the required attribute). So we
have that E is an extension of B, but there is no sense in which
B could be a restriction of E.
Some such inverse relation exists if all extra attributes and
content are optional in the extended type, but this isn’t a
required feature of extension.
64
Complex Content and Simple Content
We have seen that XML Schema complex types define
both some allowed nested elements, and some allowed
attribute specifications. Complex types that allow
nested elements are said to have complex content.
But Schema distinguishes as a special case complex types
having simple content: elements with such types may
have attributes, but they cannot have nested elements.
This is presumably a useful distinction, but it does
introduce one more layer of complexity into the syntax
for complex type derivation.
65
Basic Forms of Complex Type Definition
Restriction, complex content (upper left):
<complexType>
  <complexContent>
    <restriction base="type">
      allowed element content
      allowed attributes
    </restriction>
  </complexContent>
</complexType>
Extension, complex content (upper right):
<complexType>
  <complexContent>
    <extension base="type">
      extra element content
      extra attributes
    </extension>
  </complexContent>
</complexType>
Restriction, simple content (lower left):
<complexType>
  <simpleContent>
    <restriction base="type">
      facet restrictions
      allowed attributes
    </restriction>
  </simpleContent>
</complexType>
Extension, simple content (lower right):
<complexType>
  <simpleContent>
    <extension base="type">
      extra attributes
    </extension>
  </simpleContent>
</complexType>
66
Remarks
When one restricts a type one generally must specify all
allowed element content and attributes.
When one extends a type one generally must specify
just the extra element content and attributes.
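As a concrete instance of the simpleContent/extension form, the sketch below extends the built in xsd:decimal with one extra attribute, giving an element with numeric text content plus an attribute. It assumes a JDK with javax.xml.validation (Java 5 or later); the element name price, the attribute name currency, and the class name SimpleContentCheck are invented for illustration and do not come from the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class SimpleContentCheck {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='price'>"
        + "<xsd:complexType>"
        + "<xsd:simpleContent>"
        // The base is a simple type; the extension lists only the extra attribute.
        + "<xsd:extension base='xsd:decimal'>"
        + "<xsd:attribute name='currency' type='xsd:string' use='required'/>"
        + "</xsd:extension>"
        + "</xsd:simpleContent>"
        + "</xsd:complexType>"
        + "</xsd:element>"
        + "</xsd:schema>";

    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<price currency='USD'>9.99</price>"));        // true
        System.out.println(isValid("<price>9.99</price>"));                       // false: attribute missing
        System.out.println(isValid("<price currency='USD'>cheap</price>"));       // false: not a decimal
        System.out.println(isValid("<price currency='USD'><amount/></price>"));   // false: no nested elements
    }
}
```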
67
Requirements on Base Type
The base type must be a complex type in all cases except
simpleContent/extension (lower right in table), in
which case the base can be a simple type.
If the derived type has complexContent, the base type
must have complex content.
• True for extension or restriction.
• Under some conditions, using a special form described later, a
base type with complex content can be restricted to a type
with simple content.
68
Schematic Inheritance Diagram
[Diagram: type derivation hierarchy rooted at xsd:anyType. Simple
types are derived by restriction, list, and union. Complex types are
divided into simple content and complex content, each derived by
restriction and extension. Restriction arrows marked † use a special
syntax, described later.]
69
Defining a Complex Type with no Base?
In the introductory lecture we seemed to avoid this complexity:
didn’t we just define complex types out of “thin air”?
Actually the XML Schema specification says that:
<complexType>
allowed element content
allowed attributes
</complexType>
is “shorthand” for
<complexType>
  <complexContent>
    <restriction base="xsd:anyType">
      allowed element content
      allowed attributes
    </restriction>
  </complexContent>
</complexType>
So in reality we were directly restricting the ur-type, which allows
any attributes and any content!
70
Defining Element Content
Where we wrote allowed element content or extra
element content in the syntax for complex type
definitions, what should appear is a model group.
A model group is exactly one of:
• an <xsd:sequence/> element, or
• an <xsd:choice/> element, or
• an <xsd:all/> element.
(The element content appearing in the type definition may also
be a globally defined model group, referenced through an
<xsd:group/> element. The global definition—a named
<xsd:group/> element—just contains one of the three elements
above.)
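A globally defined model group and a reference to it might look like the sketch below. The names figureWithCaption and plateType are hypothetical; figure and caption are assumed to be globally declared elements.

```xml
<!-- Global, named model group: it contains exactly one of
     sequence, choice, or all. -->
<xsd:group name="figureWithCaption">
  <xsd:sequence>
    <xsd:element ref="figure"/>
    <xsd:element ref="caption"/>
  </xsd:sequence>
</xsd:group>

<!-- A complex type whose element content is a reference to that group. -->
<xsd:complexType name="plateType">
  <xsd:group ref="figureWithCaption"/>
</xsd:complexType>
```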
71
Sequence
A <xsd:sequence/> model group contains a series of
particles. A particle is an <xsd:element/> element,
another model group, or a wildcard.
As expected, this model just says the element content
represented by those items should appear in sequence.
E.g.
<xsd:sequence>
<xsd:element ref="title"/>
<xsd:element ref="paragraph"
minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
says that exactly one occurrence of a title element is
followed by any number of occurrences of paragraph
elements.
72
Choice
A <xsd:choice/> model group also contains a series of
particles, with the same options as for sequence.
The element information validated by this model
should match exactly one of the particles in the choice.
E.g.
<xsd:choice minOccurs="0" maxOccurs="unbounded">
  <xsd:element ref="paragraph"/>
  <xsd:sequence>
    <xsd:element ref="figure"/>
    <xsd:element ref="caption"/>
  </xsd:sequence>
</xsd:choice>
matches a sequence of paragraph elements interleaved
with consecutive pairs of figure and caption elements.
73
All
The <xsd:all/> model group is peculiar to XML
Schema. All particles it contains must be
<xsd:element/>s.
The element information validated should match a
sequence of the particles in any order.
There are several constraints:
• The maxOccurs attribute of each particle must be 1.
• The minOccurs attribute of each particle must be 0 or 1.
• The <xsd:all/> model group can only occur at the top level of
a complex type’s content model, and must itself have
maxOccurs = 1 (its minOccurs may be 0 or 1).
In view of the fact that minOccurs of a particle can be 0,
subset might be a better name than all?
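For illustration only, a hypothetical addressType using this model group could be written as below. The street, city, and postcode elements may appear in any order, and postcode may be omitted.

```xml
<xsd:complexType name="addressType">
  <xsd:all>
    <!-- Each particle is an element with maxOccurs of 1 (the default). -->
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>
    <!-- minOccurs="0" makes this member of the "subset" optional. -->
    <xsd:element name="postcode" type="xsd:string" minOccurs="0"/>
  </xsd:all>
</xsd:complexType>
```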
74
Element Wildcard
The element wildcard particle <xsd:any/> matches and
validates any element in the instance document.
• Though one can restrict the namespace of the matched
element, as described below.
E.g.
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="header"/>
<xsd:any/>
</xsd:sequence>
matches a sequence of consecutive pairs of elements,
where the first element in each pair is a header, and
the second can be any kind of element.
75
Options on <xsd:any/>
The <xsd:any/> element takes the usual optional
maxOccurs, minOccurs attributes.
Allows a namespace attribute taking one of the values:
• ##any (the default),
• ##other (any namespace except the target namespace),
• List of namespace names, optionally including either
##targetNamespace or ##local.
Controls what elements the wildcard matches,
according to namespace.
It also allows a processContents attribute taking one
of the values strict, skip, lax (default strict), controlling
the extent to which the contents of the matched element
are validated.
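A sketch combining these options, assuming a globally declared header element and a hypothetical type name extensibleRecordType:

```xml
<xsd:complexType name="extensibleRecordType">
  <xsd:sequence>
    <xsd:element ref="header"/>
    <!-- Any number of elements from namespaces other than the target
         namespace. With lax processing, a matched element is validated
         only if a declaration for it is available. -->
    <xsd:any namespace="##other" processContents="lax"
             minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
```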
76
Parsing and Determinism
Recall the rule about determinism of content models in
DTDs. We claimed XML retained this purely for
compatibility with SGML.
Perhaps surprisingly, XML Schema retains exactly the
same rule, calling it the Unique Particle Attribution
constraint.
It has to be imposed slightly more carefully here
because of the possibility of wild card particles and
substitution groups (discussed later).
• Unclear why it was retained. Perhaps to improve the
efficiency of parsing, especially in the presence of substitution
groups? Or to simplify the Particle Derivation OK constraints
for restriction of complex types (see later)?
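The following sketch shows the kind of content model this constraint rules out, together with an equivalent deterministic form. The elements a, b, and c are hypothetical global declarations.

```xml
<!-- Violates Unique Particle Attribution: on reading <a/>, a validator
     cannot tell which of the two a particles it matches without
     looking ahead to the next element. -->
<xsd:choice>
  <xsd:sequence>
    <xsd:element ref="a"/>
    <xsd:element ref="b"/>
  </xsd:sequence>
  <xsd:sequence>
    <xsd:element ref="a"/>
    <xsd:element ref="c"/>
  </xsd:sequence>
</xsd:choice>

<!-- Equivalent deterministic form: a, followed by either b or c. -->
<xsd:sequence>
  <xsd:element ref="a"/>
  <xsd:choice>
    <xsd:element ref="b"/>
    <xsd:element ref="c"/>
  </xsd:choice>
</xsd:sequence>
```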
77
Mixed Content
XML Schema scores a big win over DTDs in the way
mixed content is handled.
One simply specifies the attribute mixed on the
complexContent element, giving it the value true.
• In the abbreviated form for restriction of the ur-type, the
mixed attribute appears on the complexType element.
This specifies that the element content defined by the
model particles can be interleaved with character data
(without limiting how the elements themselves are
arranged).
78
Mixed Content Example
This element declaration
<xsd:element name="body">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="p"/>
<xsd:element ref="a"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
allows the body element to contain <p/> and <a/>
elements, with text interleaved anyhow between them.
79
mixed and Inheritance
So an <xsd:complexContent/> with mixed="true"
indicates a mixed complex type. And an
<xsd:complexContent/> with mixed="false" (the
default) indicates an element-only complex type.
A mixed complex content type may be restricted to an
element-only type (if the element content allows it).
Perhaps surprisingly, an element-only complex content
type may not be extended to a mixed type.
80
Restricting Mixed Content to Simple Content
If the model group of a mixed complex type can match
the empty sequence of elements, then the type may have
content that is text-only.
Then it is logically possible to restrict the type to one
with simple content. There is a special syntax for this:
<complexType>
<simpleContent>
<restriction base="mixed-complex-content-type">
<simpleType>
usual content of simpleType element
</simpleType>
allowed attributes
</restriction>
</simpleContent>
</complexType>
81
Expanded Complex Type Inheritance
[Diagram: expanded inheritance hierarchy rooted at xsd:anyType.
Complex types are divided into simple content and complex content,
and complex content is further divided into mixed and element-only
types. Arrows show derivation by restriction and extension within and
between these groups.]
82
Empty Elements
XML Schema doesn’t have any unique way of
representing elements that must be empty.
The simplest way to do this is to omit the allowed
element content in a complex content restriction.
Can such an element also be mixed (i.e. have pure text
content)?
• Logically it seems this should be possible (I believe it is
allowed by Xerces).
• But it seems to be forbidden by the XML Schema
specification, which singles out this case and says such an
element is strictly empty.
83
Attributes and Local Declarations
84
Defining Allowed Attributes
Where we wrote allowed attributes or extra attributes in
the syntax for complex type definitions, what should
appear is a sequence of attribute declarations in the form
of <xsd:attribute/> elements.
• These may be followed by an optional attribute wildcard.
(The attribute declaration list may also include globally defined
attribute groups, referenced through <xsd:attributeGroup/>
elements. These will be discussed later.)
85
Simple Attribute Declarations
A straightforward example of an attribute declaration
was given in the introductory lecture:
<xsd:element name="figure">
<xsd:complexType>
<xsd:attribute name="source" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
In general the value of the type attribute can be any
simple type.
• Though unusual, it is also allowed to include an anonymous
<xsd:simpleType/> definition in the body of the
<xsd:attribute/>, instead of specifying the type attribute.
86
Default Rules
As with DTDs, one can specify whether the use of an
attribute is optional (the default) or required.
One can also specify a default value (if the attribute is
optional).
Alternatively one can specify a fixed value for the
attribute (whether the attribute is optional or
required).
• default and fixed are mutually exclusive.
87
DTD Attribute Defaults Revisited
Attribute list declaration:
<!ATTLIST a
  val CDATA "nothing"
  fix CDATA #FIXED "constant"
  req CDATA #REQUIRED
  opt CDATA #IMPLIED>
Instances of element a:
<a val="something" fix="constant"
req="reading" opt="extra"/>
<a req="no experience"/>
<!-- OK: val = “nothing”, fix = “constant”, opt absent. -->
<a fix="variable"/>
<!-- Invalid! fix not “constant” and req unspecified. -->
88
Schema Attribute Occurrence
Equivalent Schema declaration:
<xsd:attribute name="val" type="xsd:string"
use="optional" default="nothing"/>
<xsd:attribute name="fix" type="xsd:string"
fixed="constant"/>
<xsd:attribute name="req" type="xsd:string"
use="required"/>
<xsd:attribute name="opt" type="xsd:string"/>
• Note fix and opt implicitly have use="optional" (we could have
omitted this specification for val too).
• Unlike DTDs, it is possible to have an attribute that is both fixed
and required.
89
Complex Content Plus Attributes
Putting things together, here is a declaration of a body
element that allows mixed content plus a style
attribute.
<xsd:element name="body">
<xsd:complexType mixed="true">
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="p"/>
<xsd:element ref="a"/>
</xsd:choice>
<xsd:attribute name="style" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
90
Simple Content plus Attributes
Here is a declaration of an anchor element that allows
simple content plus an href attribute.
<xsd:element name="anchor">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="href" type="xsd:anyURI"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
91
Attribute Wildcards
An attribute wildcard is represented by an
<xsd:anyAttribute/> element.
There can be at most one such element in a complex
type definition, and it must appear after any normal
attribute declarations.
Such a declaration allows any attribute, optionally
limited by namespace.
The namespace and processContents attributes on
<xsd:anyAttribute/> work as for <xsd:any/>.
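As a sketch, the earlier figure declaration could be opened up to foreign attributes as follows. The choice of ##other and skip here is only one possible combination.

```xml
<xsd:element name="figure">
  <xsd:complexType>
    <xsd:attribute name="source" type="xsd:string"/>
    <!-- The wildcard comes after the normal attribute declarations.
         It admits attributes from any namespace other than the target
         namespace, and their values are not validated. -->
    <xsd:anyAttribute namespace="##other" processContents="skip"/>
  </xsd:complexType>
</xsd:element>
```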
92
Attributes and Namespaces
By default, attributes declared as we have illustrated
(inside an <xsd:complexType/>) do not become part of
the target namespace.
• Instead these attributes are local properties of any element
they are attached to. The element itself may or may not
belong to a namespace.
In instance documents, names of these attributes must
not be prefixed with a namespace prefix.
93
Creating Attributes in a Namespace
There are three ways to put attributes into the target
namespace:
• Declare them “globally”, directly inside the top level
<xsd:schema/> element. Reference the attribute declaration
inside the complex type definition (like element references), or
• specify the attribute form="qualified" on a local
<xsd:attribute/> declaration, or
• specify the attribute attributeFormDefault="qualified" on the
<xsd:schema/> element.
After this, these attributes must be prefixed in instance
documents with a namespace prefix.
• Recall default namespace declarations (xmlns="namespace")
don’t work for attributes: you must introduce a non-empty
prefix.
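The first of these options might look like the sketch below. The namespace URI http://example.org/ns/doc and the prefix d are assumptions made for illustration, and the body element is declared empty to keep the example small.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/ns/doc"
            xmlns:d="http://example.org/ns/doc">
  <!-- Global attribute declaration: belongs to the target namespace. -->
  <xsd:attribute name="style" type="xsd:string"/>

  <xsd:element name="body">
    <xsd:complexType>
      <!-- Reference to the global declaration, using a qualified name. -->
      <xsd:attribute ref="d:style"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- Instance fragment: the attribute needs an explicit prefix. -->
<d:body xmlns:d="http://example.org/ns/doc" d:style="plain"/>
```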
94
Locally Declared Elements
XML Schema goes to some lengths to maintain symmetry
between elements and attributes.
Because the most natural way of declaring attributes is locally—
private to a complex type—it must therefore be possible to
declare elements local to the complex type.
• Even if this is less obviously natural for elements—it leads to some clumsy
constraints, e.g.: two local element declaration particles with the same
name in the model group of the same complex type must have the same
type.
The same rules apply: if an element is declared locally (inside an
<xsd:complexType/>), by default it does not belong to a
namespace.
In this case its name must not be prefixed with a namespace
prefix in instance documents.
95
Creating Elements in a Namespace
There are three ways to put elements into the target
namespace:
• Declare them “globally”, directly inside the top level
<xsd:schema/> element. Reference the element declaration
inside the complex type definition, or
• specify the attribute form="qualified" on a local
<xsd:element/> declaration, or
• specify the attribute elementFormDefault="qualified" on the
<xsd:schema/> element.
After this, these elements must be prefixed in instance
documents with a namespace prefix (or there must be a
default namespace declaration in effect).
96
elementFormDefault and attributeFormDefault
Summary:
• These attributes on the <xsd:schema/> element take the
values “qualified” or “unqualified”
• The defaults for both are “unqualified”.
• They control whether or not elements and attributes declared
locally in <xsd:complexType/> definitions belong to the
target namespace.
• This property can also be controlled by form attributes on the
individual declarations.
None of these attributes has any effect on elements or
attributes declared globally (at the top level in the
<xsd:schema/> element)! Effectively such declarations
are all qualified.
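A small sketch of the effect of elementFormDefault, using a hypothetical memo vocabulary and namespace URI:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/ns/memo"
            elementFormDefault="qualified">
  <!-- Global declaration: always in the target namespace. -->
  <xsd:element name="memo">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local declaration: in the target namespace only because
             of elementFormDefault="qualified". -->
        <xsd:element name="subject" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- Matching instance: the default namespace covers both elements. -->
<memo xmlns="http://example.org/ns/memo">
  <subject>Budget</subject>
</memo>
```

With the default setting ("unqualified"), the subject element would belong to no namespace, so the instance would need a prefixed memo element containing an unprefixed subject element.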
97
Inheritance and Substitution
98
Polymorphism?
We have presented the mechanisms by which new types
can be derived from old types (albeit we have omitted
some details for complex types).
Through these mechanisms, inheritance provides useful
ways to recycle existing definitions.
But it doesn’t in itself provide all the benefits of OOP—
in particular we have not presented any analogue of
polymorphism.
Schema tries to provide some of the OO flexibility in
use of instances through type substitution and
substitution groups.
99
Type Substitution
The most basic mechanism for “polymorphism” is type
substitution.
In essence this says that if a particle (in a content
model, say) is declared to be an element with a
particular type, then the corresponding element item in
the instance document may have type derived from the
particle type.
Actually this only introduces new possibilities if the
derivation involves extension.
100
A Basis for Extension
Suppose we have the complex type declaration:
<xsd:complexType name="figureType">
<xsd:attribute name="source" type="xsd:anyURI"/>
</xsd:complexType>
and suppose this is used as follows:
<xsd:element name="figure" type="figureType"/>
<xsd:element name="report">
  <xsd:complexType>
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element ref="paragraph"/>
      <xsd:element ref="figure"/>
    </xsd:choice>
  </xsd:complexType>
</xsd:element>
i.e. a report is a sequence of interleaved paragraph and figure
elements, and a figure just has an attribute referencing a source
image file.
101
Extension Example
Now suppose that, without modifying any existing
definitions and declarations, we want to allow figures in
reports to have captions. We can do this if we
introduce the extended type:
<xsd:complexType name="captionFigureType">
  <xsd:complexContent>
    <xsd:extension base="figureType">
      <xsd:sequence>
        <xsd:element name="caption" type="xsd:string"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
• This complex type inherits the attribute source from its base
type, and adds a nested caption element.
102
Example Instance Document
<report xmlns="http://www.grid2004.org/ns/report4"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.grid2004.org/ns/report4
report4.xsd">
<paragraph>Recently uncovered documents prove...
</paragraph>
<figure xsi:type="captionFigureType" source="notafake.jpg">
<caption>Irrefutable proof of ancient XML.</caption>
</figure>
</report>
103
xsi:type
As illustrated above, the element information item may
have any type derived by extension from the type the
element was declared with.
• In general it may be derived by a mixture of extension and
restriction.
This isn’t quite a free lunch, though. There is no way
for an XML processor to automatically infer the type of
an element instance; instead this approach requires that the
XML author explicitly specify the intended type using
the xsi:type attribute.
• This limits the attractiveness of this approach to
“polymorphism”.
104
Substitution Groups
A more author-friendly approach to document
polymorphism is based on element declarations.
This approach uses so-called substitution groups.
• Each substitution group is a set of element declarations.
• One of these is singled out as the head declaration.
Where a content model includes a reference to the head
as a particle, the instance document can have any
member of the associated substitution group.
105
Substitution Group Example
Suppose the earlier definitions of figureType, <figure/>,
<report/>, and captionFigureType are in effect.
Now suppose we declare a new element
<captionFigure/>, having type captionFigureType, and
belonging to a substitution group headed by <figure/>.
Then a possible instance document would be:
<report … >
<paragraph>Recently uncovered documents prove...
</paragraph>
<captionFigure source="notafake.jpg">
<caption>Irrefutable proof of ancient XML.</caption>
</captionFigure>
</report>
106
Remarks
Important things to note:
• Again we haven’t modified the original declaration of the
report element, which still says it contains figure elements.
• Because captionFigure is in the substitution group of figure,
automatically it is allowed to appear in place of figure in the
instance.
• We no longer need the clumsy xsi:type attribute; the actual
type of the information element can now be easily inferred
from the element name (through its declaration, described
shortly).
107
Creating Substitution Groups
Groups are implicit: the implementation is more like a
new kind of inheritance hierarchy—one relating
element declarations rather than type definitions.
A new element declaration specifies at most one direct
substitution group affiliation. This is another element
declaration. The “affiliation” now heads a group
containing the new declaration.
• In practice an affiliation works almost exactly like a base
type, except it involves element declarations, not types.
• If the affiliation itself belongs to a different group, the new
declaration automatically joins that group—generally an
element can be in several (perfectly nested) groups.
108
Group Creation Example
In our example we could declare captionFigure, as
follows:
<xsd:element name="captionFigure"
type="captionFigureType"
substitutionGroup="figure" />
• This says <figure/> is the substitution group affiliation of
<captionFigure/>.
• Or in other words <captionFigure/> is in the substitution
group headed by <figure/>.
• The type attribute here may be omitted: the type defaults to
that of the substitution group affiliation (again emphasizing
the analogy with inheritance).
109
Notional Substitution Group Hierarchy
This way of looking at things isn’t part of the XML
Schema specification, but it may be mnemonic:
<xsd:any/>
    <figure/>
        <captionFigure/>
    <report/>
110
Substitution and Type Inheritance
It is required that all elements in a substitution group
headed by element <Name/> have either the same type
as <Name/>, or a type derived from it by steps of
extension and restriction.
Note that substitution may be used without type
inheritance.
• In other words, all elements in the substitution group may
have the same type as their head.
• Consider the example of internationalization: you might want
many interchangeable elements with identical structure but
different names (for different languages).
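For instance, assuming the earlier figure declaration, interchangeable names could be declared as in this sketch. The element names abbildung and figura are hypothetical.

```xml
<!-- No type attribute is given, so each element takes the type of its
     substitution group affiliation, figure. -->
<xsd:element name="abbildung" substitutionGroup="figure"/>
<xsd:element name="figura" substitutionGroup="figure"/>
```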
111
Blocking Substitutions
We have described two kinds of substitution involving
an element: the structure of an element can be
substituted using xsi:type, or the whole element can be
substituted by a member of its substitution group.
It is quite likely that a schema writer will want to block
some such substitutions.
• Many applications will require elements to have exactly the
originally specified form.
• We need a way to prevent this form being corrupted by (say)
unexpected addition of an element to a substitution group.
112
block Attribute of <xsd:element/>
The value of the block attribute on <xsd:element/>
should be a list containing a possibly empty subset of
the values extension, restriction, and substitution (or
simply #all).
It defines the disallowed substitutions for this element.
• If a particle in a content model has substitution in its
disallowed substitutions, the document instance may not
replace the element by members of its substitution group.
• If an element has extension in its disallowed substitutions,
then neither xsi:type nor a substitution group substitution
allows the instance to validate against a type whose derivation
from the particle type involves steps of extension.
• Appearance of restriction in the disallowed substitutions has
an analogous effect.
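Applied to the earlier figure element, two alternative declarations might be sketched as follows. They are alternatives, not declarations to be used together in one schema.

```xml
<!-- Alternative 1: no member of the substitution group (such as
     captionFigure) may replace figure, but xsi:type is still permitted. -->
<xsd:element name="figure" type="figureType" block="substitution"/>

<!-- Alternative 2: the instance may not validate against any type
     derived from figureType by extension, whether selected through
     xsi:type or through a substitution group member. -->
<xsd:element name="figure" type="figureType" block="extension"/>
```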
113
block Attribute of <xsd:complexType/>
A block attribute may also be specified on the
<xsd:complexType/> element. Its value is a list containing a
subset of the values extension and restriction (or simply #all).
It defines the prohibited substitutions for this type.
• If the type of an element has extension in its prohibited substitutions, then
neither xsi:type nor a substitution group substitution is allowed to
validate the instance against a type whose derivation from the particle
type involves any extension steps.
Such validation is also prevented if the prohibited substitutions of any
intervening types in the chain of derivation include extension.
• Appearance of restriction in the prohibited substitutions has an analogous
effect.
Note the block attributes of <complexType/> and <element/>
are independent, and constraints from both must be satisfied.
• But it is “as if” an element acquires all blocked substitutions of its type.
114
blockDefault Attribute of <xsd:schema/>
Unless otherwise specified, all substitutions are allowed.
You may want to change this globally to something
more conservative.
Do this by specifying the blockDefault attribute on the
<xsd:schema/> element.
• Allowed values for this attribute are the same as for the block
attribute on <xsd:element/>.
115
Prohibiting Derivation
The final attribute on <xsd:complexType/> works in
the same way as the corresponding attribute for
<xsd:simpleType/>.
Its value may be either a list containing a subset of the
values extension and restriction, or simply #all. It
prohibits either or both kinds of derivation using this
type as base.
Although final and block can be used to similar ends,
their modus operandi are quite different:
• final controls how you define new types derived from this
type.
• block controls how you substitute elements of this type in the
document instance.
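As a sketch of final, here is the earlier figureType with derivation by extension prohibited:

```xml
<!-- With final="extension", a definition such as captionFigureType,
     which extends figureType, would be rejected as a schema error. -->
<xsd:complexType name="figureType" final="extension">
  <xsd:attribute name="source" type="xsd:anyURI"/>
</xsd:complexType>
```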
116
Substitution Group Exclusions
An <xsd:element/> declaration likewise allows a final
attribute, with the same allowed values as final on
<xsd:complexType/>.
Its value defines the substitution group exclusions for
this element, which control its use as the head of a
substitution group.
• If an element has extension in its substitution group
exclusions, it may not be the substitution group affiliation of
another element whose type is derived from the type of this
element by steps including extension.
• Appearance of restriction in the substitution group exclusions
has an analogous effect.
By all rights, it should be possible to put substitution in
this set. But it isn’t!
117
finalDefault Attribute of <xsd:schema/>
For completeness we mention that the <xsd:schema/>
element allows a finalDefault attribute, which works in
a way very much analogous to the blockDefault
attribute.
118
Still to Come on Inheritance
By no means have we yet covered every aspect of
inheritance.
Notably we haven’t discussed what exactly is a legal
restriction or extension of a complex type (particularly
with respect to the content model).
This is quite complicated in general, and it will be
covered in the final section.
119
XML Schema Identity Constraints
120
Identifiers and References Revisited
Slightly extended version of an example from the
lectures on DTDs:
<agency>
<agent name="Alice" boss="Alice"/>
<agent name="Bob" boss="Alice"/>
<agent name="Carole" boss="Alice"/>
<agent name="Dave" boss="Bob"/>
</agency>
Using DTDs, we assumed attribute name was declared with type
ID, and attribute boss was declared with type IDREF.
[Diagram of the agents: Alice, Bob, Carole, Dave.]
121
Identity Constraints
Recall that the attribute types ID and IDREF imply
interesting constraints on values of those attributes:
• Within any individual XML document, every attribute of type
ID must be specified with a different value from every other
attribute of type ID.
• The value of any attribute of type IDREF must be the same as
the value of an attribute of type ID specified somewhere in the
same document.
These properties are obviously very useful and natural
if we need to identify individual elements in a
document.
XML Schema supports the ID and IDREF simple types.
But it also introduces additional, much more general
mechanisms for achieving similar ends.
122
Use of XPath
In an earlier lecture-set we gave a brief introduction to
XPath.
• Recall that XPath is a notation for representing a subset of
nodes in a single XML document.
The basic idea of XML Schema identity constraints is
to use XPath expressions to identify groups of “fields”
within an XML document that act as either identifiers
or references.
• Uniqueness/existence constraints hold within/across these
groups.
More flexible than the DTD mechanism, because:
• XPath allows one to single out more refined sets of fields.
• May have multiple categories of identifier in the same
document.
123
Example
<xsd:element name="agency">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="agent"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:key name="agentName">
    <xsd:selector xpath="agent"/>
    <xsd:field xpath="@name"/>
  </xsd:key>
  <xsd:keyref refer="agentName" name="agentBoss">
    <xsd:selector xpath="agent"/>
    <xsd:field xpath="@boss"/>
  </xsd:keyref>
</xsd:element>
124
General Remarks
The element <xsd:key/> defines a key field called
agentName.
The element <xsd:keyref/> defines a key reference field
called agentBoss.
These definitions are inside the declaration of the
element <agency/>.
• This implies that the scope of the uniqueness and related
constraints is an individual <agency/> element.
• This may or may not be the top-level element of a document.
The fields themselves are specified by XPath
expressions (details follow).
125
Defining a Key
We have the example:
<xsd:key name="agentName">
<xsd:selector xpath="agent"/>
<xsd:field xpath="@name"/>
</xsd:key>
• The name of the key is agentName.
• The <xsd:selector/> element defines the set of nodes labeled
by this key.
In our case, it is the set of all agent elements nested
directly in the agency element.
• The <xsd:field/> element defines the field within each labeled
node that acts as the key.
In our case, the name attribute of the node.
126
Validity Constraints on Keys
Every node identified by the XPath expression in the
<xsd:selector/> element must have exactly one
descendant node identified by the XPath expression in
the <xsd:field/> element.
• This descendant, whose value is the key field, must be an
attribute or an element with simple type.
No two nodes identified by <xsd:selector/> may have
the same value for their key fields.
• This constraint holds within the body of the scope element
(the <agency/> element in our example).
• But the same value of the key field is allowed on different
<agent/> nodes inside different <agency/> elements.
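The scoping rule can be illustrated with the instance sketch below. The agencies wrapper element is hypothetical, introduced only so that two agency elements can appear in one document.

```xml
<agencies>
  <agency>
    <agent name="Alice" boss="Alice"/>
    <agent name="Bob" boss="Alice"/>
  </agency>
  <agency>
    <!-- OK: this Alice is in a different agency element, and so in a
         different key scope from the Alice above. -->
    <agent name="Alice" boss="Alice"/>
    <!-- Invalid: a second agent named Alice within the same agency. -->
    <agent name="Alice" boss="Alice"/>
  </agency>
</agencies>
```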
127
Defining a Key Reference
We have the example:
<xsd:keyref refer="agentName" name="agentBoss">
<xsd:selector xpath="agent"/>
<xsd:field xpath="@boss"/>
</xsd:keyref>
• The refer attribute is the name of the key to which we refer.
• The <xsd:selector/> and <xsd:field/> elements identify the
nodes whose values are the actual references.
They work in essentially the same way as in <xsd:key/>.
The two-stage approach to identifying the relevant fields is
less obviously natural in this case. But it supports the
generalization to multiple key fields, described below.
• The name of the key reference is agentBoss. This attribute is
required (though it is unclear what the name is used for).
128
Multiple Key Fields
A <xsd:key/> element can have multiple <xsd:field/>
elements, e.g.:
<xsd:key name="fullName">
<xsd:selector xpath=".//person"/>
<xsd:field xpath="@firstName"/>
<xsd:field xpath="@lastName"/>
</xsd:key>
• For validity, this implies every <person/> element in scope
has firstName and lastName attributes, and that no two such
elements have the same combination of the two values.
A <xsd:keyref/> element that refers to this key must
have exactly the same number of <xsd:field/> elements.
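A matching key reference might be sketched as below. The name spouseRef and the attributes spouseFirstName and spouseLastName are hypothetical.

```xml
<!-- Two fields, corresponding in order to the two fields of the
     fullName key. -->
<xsd:keyref name="spouseRef" refer="fullName">
  <xsd:selector xpath=".//person"/>
  <xsd:field xpath="@spouseFirstName"/>
  <xsd:field xpath="@spouseLastName"/>
</xsd:keyref>
```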
129
Relating Key References to Keys
The fact that keys and key references are scoped to
element declarations introduces some “interesting”
complications.
Things might be straightforward if a <keyref/> always
referred to a <key/> defined in the same element
declaration.
You might be forgiven for thinking this should
“obviously” be the case. But actually the Schema
specification allows a <keyref/> to refer to a <key/>
defined in a different element declaration.
130
Referencing Keys in Nested Elements
Suppose a key, Key, is defined in the declaration of
element B.
Also suppose a key reference, Ref, refers to this key and
is defined in the declaration of element A.
Now a field of Ref—scoped to an instance of A—is
allowed to point to fields of Key scoped to an instance of
B that is a descendant of the A instance.
• This can lead to ambiguous references, because the key
uniqueness constraints apply only within a single B instance,
and there could be several Bs nested in the A instance.
• The specification gives a slightly clumsy recipe for resolving
such ambiguities (illustrated below).
131
Features
The rule on the previous slide can introduce interesting
behavior even when the <xsd:keyref/> and the
<xsd:key/> are defined in the same element
declaration.
• This can happen if instances of the element can nest inside
one another.
In the example on the next slide, the key is the value of
<key/> elements directly nested inside a <scope/>
element, and the reference is the value of a <ref/>
element directly nested in a <scope/> element. The
<scope/> elements are also allowed to self-nest.
132
An Interesting Case
<xsd:element name="scope">
<xsd:complexType>
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="key"/>
<xsd:element ref="ref"/>
<xsd:element ref="scope"/>
</xsd:choice>
</xsd:complexType>
<xsd:key name="key">
<xsd:selector xpath="key"/>
<xsd:field xpath="."/>
</xsd:key>
<xsd:keyref refer="key" name="ref">
<xsd:selector xpath="ref"/>
<xsd:field xpath="."/>
</xsd:keyref>
</xsd:element>
133
Examples
Example 1:
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <ref>keyval</ref>
</scope>
Example 2:
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <key>keyval</key>
  <ref>keyval</ref>
</scope>
Example 3 (illegal):
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <scope>
    <key>keyval</key>
  </scope>
  <ref>keyval</ref> <!-- Illegal! -->
</scope>
Example 4:
<scope>
  <scope>
    <key>keyval</key>
  </scope>
  <scope>
    <key>keyval</key>
  </scope>
  <key>keyval</key>
  <ref>keyval</ref>
</scope>
134
Remarks
Examples here follow the rules in the section of the
XML Schema specification called: Schema Information
Set Contribution: Identity-constraint table.
The rule is basically that a key reference can refer to a
key field scoped to a descendant element. But if there
are conflicts, you ignore any potential reference targets
arising from children (this rule applies recursively).
In the 3rd example, all potential targets
arise from children, and are conflicting, so they should
be ignored. Thus the reference is illegal.
• The 2nd and 4th examples are OK: conflicts are resolved by
ignoring targets from children, leaving just the local target.
• Xerces 2.6.2, however, also accepts the 3rd example!
135
Uniqueness Constraints
The <xsd:unique/> element works almost exactly like
the <xsd:key/> element, except that it is not required
that the identifying fields exist for every node identified
by the selector.
• If fields exist in the node instance, they must be unique across
all selected nodes.
A unique constraint cannot be the target of a keyref.
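A minimal sketch of a uniqueness constraint. The catalog and item elements and the sku attribute are hypothetical names chosen for illustration, and the xsd prefix is assumed to be bound to the XML Schema namespace. Items without a sku are accepted, but any sku values that do appear must differ from one another:

```xml
<xsd:element name="catalog">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="item" maxOccurs="unbounded">
        <xsd:complexType>
          <!-- sku is optional: xsd:unique tolerates nodes with no field value -->
          <xsd:attribute name="sku" type="xsd:string"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Hypothetical constraint name; with xsd:key instead, every item would need a sku -->
  <xsd:unique name="uniqueSku">
    <xsd:selector xpath="item"/>
    <xsd:field xpath="@sku"/>
  </xsd:unique>
</xsd:element>
```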
136
Namespaces
The examples given in this section were simplified in
that the XPath expressions did not allow for a target
namespace.
Recall that unprefixed names in XPath expressions never
pick up a default namespace. If you are using identity
constraints in a schema with a target namespace, you must
declare a prefix for that namespace, and use that prefix on
(say) element names appearing in the xpath attributes.
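As a sketch, here is the earlier scope example moved into a target namespace. The namespace URI http://example.org/scope and the prefix ex are assumptions made for illustration. Note that the prefix is needed on the element names in the xpath attributes and also on the refer attribute, whose value is a qualified name:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:ex="http://example.org/scope"
            targetNamespace="http://example.org/scope"
            elementFormDefault="qualified">
  <xsd:element name="key" type="xsd:string"/>
  <xsd:element name="ref" type="xsd:string"/>
  <xsd:element name="scope">
    <xsd:complexType>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="ex:key"/>
        <xsd:element ref="ex:ref"/>
        <xsd:element ref="ex:scope"/>
      </xsd:choice>
    </xsd:complexType>
    <xsd:key name="key">
      <!-- xpath="key" would select elements in no namespace, so it would match nothing -->
      <xsd:selector xpath="ex:key"/>
      <xsd:field xpath="."/>
    </xsd:key>
    <xsd:keyref name="ref" refer="ex:key">
      <xsd:selector xpath="ex:ref"/>
      <xsd:field xpath="."/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
```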
137
Imports and Includes
[To Be Added]
138
“Particle Derivation OK”
139
Inheritance in OOP and XML
We saw that XML Schema makes heavy use of a concept of type
inheritance.
This concept is clearly inspired by the corresponding concept in
Object Oriented Programming. But the analogy between XML
and OOP is by no means exact.
In OOP, a class has a set of disjoint, essentially independent,
named members (fields and methods).
• In derivation, this set can be extended, or named members can be
individually overridden.
In XML, a complex type has a set of attributes and a content
model.
• The attributes behave much like the independent members of a class, and
the set of attributes can naturally be extended during derivation.
• The analogy works much less well for content models. The complex
ordering and nesting relations within element content limit the options for
extension.
• And, while perhaps more “mathematically natural” than extension, we
will see restriction of content models has its own implementation problems.
140
Extension and Restriction
Unlike typical OOP languages, XML Schema
distinguishes two different forms of type derivation, called
extension and restriction.
• The analogy between Schema type extension and OOP inheritance should
be fairly clear.
• The analogy between Schema type restriction and OOP inheritance may be
less obvious.
• It is based on the insight that when a new class is derived, the new
constructors and methods generally introduce new sets of constraints or
restrictions (“invariants”) on members already in the base class.
Consider a class Square, which may be derived from a base class
Rectangle. The derived class imposes the new invariant width=height.
So OOP inheritance includes aspects of both extension and
restriction.
141
Attributes and Complex Type Extension
Recall typical syntax for extension is like:
<complexType>
<complexContent>
<extension base="base-type">
extra element content
extra attributes
</extension>
</complexContent>
</complexType>
The extra attributes are generally just added to the set of
attributes of base-type.
Some attributes in extra attributes may have the same name (and
namespace) as attributes in base-type; any such attribute must
also have identical type to its namesake in base-type.
• But the new version could have a different default value, say.
If extra attributes includes an attribute wildcard, it must
represent a superset of any attribute wildcard in base-type.
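A small illustration of the syntax above, using hypothetical type names (BaseType, ExtendedType) and assuming the xsd prefix is bound to the XML Schema namespace. The derived type keeps the id attribute of the base and adds a new lang attribute:

```xml
<xsd:complexType name="BaseType">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:ID"/>
</xsd:complexType>

<xsd:complexType name="ExtendedType">
  <xsd:complexContent>
    <xsd:extension base="BaseType">
      <!-- extra element content -->
      <xsd:sequence>
        <xsd:element name="summary" type="xsd:string"/>
      </xsd:sequence>
      <!-- extra attribute, added to the inherited id -->
      <xsd:attribute name="lang" type="xsd:language"/>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
```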
142
Attributes and Complex Type Restriction
If an attribute appearing in a restriction of a complex type is also
an explicitly declared attribute of the base-type, then:
• The simple type of the attribute in the new type must be identical to the
attribute’s type in the base-type, or derived from it by steps of restriction.
• If the attribute is fixed in the base-type, it must be fixed with the same
value in the new type.
• If the attribute is required in the base-type, so must it be in the new type.
Otherwise, there must be a wild-card in the base-type that
matches the attribute declared in the new type. Note:
• If an attribute is required in the base-type, it must be an explicitly declared
attribute of the new type.
• If an attribute was optional in the base-type, it may be specified in the new
type with use="prohibited". This is the same as omitting the attribute in
the new type (and the attribute might still be allowed by a wildcard!)
If there is an attribute wild-card in the restricted type, it must be
a subset of a wild-card in base-type.
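A sketch of these rules in use, with hypothetical type and attribute names (Measurement, MetricMeasurement, unit, precision). The restriction narrows the type of unit by a step of simple type restriction and makes it required, and it prohibits the optional precision attribute:

```xml
<xsd:complexType name="Measurement">
  <xsd:attribute name="unit" type="xsd:string"/>
  <xsd:attribute name="precision" type="xsd:integer"/>
</xsd:complexType>

<xsd:complexType name="MetricMeasurement">
  <xsd:complexContent>
    <xsd:restriction base="Measurement">
      <!-- optional in the base, required here; type restricted from xsd:string -->
      <xsd:attribute name="unit" use="required">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="m"/>
            <xsd:enumeration value="kg"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
      <!-- optional in the base, so it may be prohibited in the new type -->
      <xsd:attribute name="precision" use="prohibited"/>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
```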
143
Content Models and Extension
Consider an extension of a complex type with complex
content that adds non-empty extra element content.
The extra element content must be a particle, and the
element content of the new type is
<xsd:sequence>
base-type element content
extra element content
</xsd:sequence>
(unless the base-type content model was empty, in which
case it is just the extra element content). Notes:
• This would be illegal if the base-type element content was an
<xsd:all/> particle. You can’t extend such content.
• If the base-type element content is an <xsd:choice/>, there is
no way to extend the set of choices: can only add extra
particles in sequence.
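To illustrate the point about choices, here is a sketch with hypothetical names (ShapeChoice, ShapeChoicePlus, and the three element names). The extension does not produce a three-way choice; the new particle is appended in sequence after the base choice:

```xml
<xsd:complexType name="ShapeChoice">
  <xsd:choice>
    <xsd:element name="circle" type="xsd:string"/>
    <xsd:element name="square" type="xsd:string"/>
  </xsd:choice>
</xsd:complexType>

<xsd:complexType name="ShapeChoicePlus">
  <xsd:complexContent>
    <xsd:extension base="ShapeChoice">
      <xsd:choice>
        <xsd:element name="triangle" type="xsd:string"/>
      </xsd:choice>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

<!-- Effective content of ShapeChoicePlus:
     (circle or square), followed by triangle.
     A lone triangle is NOT valid content. -->
```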
144
Content Models and Restriction
The idea of restricting a content model is fairly intuitive, e.g.:
• Where there is an <xsd:choice/> of several particles, the restricted model
may offer a reduced choice—perhaps it replaces the <xsd:choice/> with
just one of the particles it contained.
• Where there is an optional particle (say minOccurs="0" and
maxOccurs="1") the restricted model might make the particle mandatory
(minOccurs="1") or, conversely, simply omit it.
More generally the restricted model may subset the
minOccurs..maxOccurs range as it sees fit.
• Where there is an <xsd:any/> wildcard (or an element particle that heads
a substitution group) the restricted model might replace it by a more
specific element particle.
Although these ideas seem intuitive, it isn’t particularly easy to
prove automatically that one content model is a valid restriction
of another.
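A sketch of a simple content model restriction, using hypothetical type names (Article, ShortArticle). Note that the restricted type must restate the whole content model it wants to keep. Here the optional subtitle is omitted and the occurrence range of paragraph is narrowed from 0..unbounded to 1..10:

```xml
<xsd:complexType name="Article">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="subtitle" type="xsd:string" minOccurs="0"/>
    <xsd:element name="paragraph" type="xsd:string"
                 minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="ShortArticle">
  <xsd:complexContent>
    <xsd:restriction base="Article">
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <!-- subtitle was optional in the base, so it may be left out -->
        <xsd:element name="paragraph" type="xsd:string"
                     minOccurs="1" maxOccurs="10"/>
      </xsd:sequence>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
```

Every instance that is valid against ShortArticle is also valid against Article, which is the intuition behind restriction.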
145
Particle Derivation OK
Defining the conditions under which one particle is a
legal restriction of another particle is one of the more
complex parts of the (generally quite complex) XML
Schema specification.
You will find the rules in the section of the specification
called Constraints on Particle Schema Components.
The relevant subsections start with the rule called
Particle Valid (Restriction). This gives some rules for
reducing particles to a “canonical” form, then delegates
to more specialized rules with names like Particle
Derivation OK (X:Y – R), where X, Y, R depend on the
case.
146
Canonical Form
Before comparing two particles to see if one is a valid
restriction of the other, both should be reduced to a
certain canonical form:
• Any occurrence of an element particle that is the head of a
substitution group is replaced by an explicit <xsd:choice/>
between element particles for all members of the substitution
group.
• Empty groups are discarded.
• Redundant singleton <xsd:sequence/>, <xsd:choice/>,
<xsd:all/> particles are replaced by the single particle they
contain.
• An “associative rule” is applied to eliminate
<xsd:sequence/> particles nested inside other
<xsd:sequence/> particles (subject to some conditions on
minOccurs, maxOccurs). Likewise for <xsd:choice/>.
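As a rough illustration of the singleton rule (a sketch only, assuming all groups have the default minOccurs="1" and maxOccurs="1", and reusing the title and paragraph element names from the other examples), the first model below would be treated like the second before any comparison is made:

```xml
<!-- As written: a singleton sequence and a singleton choice -->
<xsd:sequence>
  <xsd:sequence>
    <xsd:element ref="title"/>
  </xsd:sequence>
  <xsd:choice>
    <xsd:element ref="paragraph"/>
  </xsd:choice>
</xsd:sequence>

<!-- After reduction: each singleton group is replaced by its one particle -->
<xsd:sequence>
  <xsd:element ref="title"/>
  <xsd:element ref="paragraph"/>
</xsd:sequence>
```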
147
Comparing <sequence/> with <sequence/>
There are many specific versions of the Particle
Derivation OK rule—basically one for every kind of
particle you might try to restrict to any other kind of
particle.
We don’t attempt to mention all of them here—just a
couple of interesting cases.
For example, consider the case where you are trying to
restrict an <xsd:sequence/> particle in an existing
content model to an <xsd:sequence/> particle in a new
content model.
The exact rule that takes care of this case is called
Particle Derivation OK (All:All, Sequence:Sequence—
Recurse).
148
All:All, Sequence:Sequence—Recurse
The occurrence ranges (minOccurs, maxOccurs) of the
original and restricted <sequence/> must be consistent
with restriction.
Less trivially, there exists an order-preserving mapping
from the particles in the restricted <sequence/> to
particles of the original <sequence/>, such that:
• Each particle in the restricted <sequence/> is a valid
restriction of its image particle (under the map).
Here we recursively apply the definition of the Particle
Derivation OK, hence the Recurse in the title.
• Any particle of the original <sequence/> that is not in the
range of the map is emptiable—i.e. can match empty content.
It happens that the same rule is used for <all/> groups,
hence the All:All in the title.
149
Schematic Example
Original:
<xsd:sequence>
<xsd:element ref="title"/>
<xsd:element ref="paragraph"
minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="figure"
minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
Restricted:
<xsd:sequence>
<xsd:element ref="title"/>
<xsd:element ref="captionFigure"
minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
An order-preserving map with the required properties maps
title to title and captionFigure to figure:
• title particle is (trivially) a valid restriction of title particle, and
captionFigure is a valid restriction of figure.
• The original paragraph particle is not in the range of the map, but is
emptiable (because minOccurs is 0).
150
Determinism?
The requirement in the Sequence:Sequence—Recurse
rule that “there exists” a suitable map looks rather
cavalier: how are we to actually discover whether this
map exists?
• In other words, the rule doesn’t seem to give a deterministic
prescription for checking whether one model is a restriction
of the other.
151
A Prescription
A “greedy” prescription that will sometimes find a
suitable, order-preserving map is this:
• Visit the particles of the restricted model in turn, trying to
find a match for each. At any time we have a “next
candidate” particle from the original model, for possible
matching (initially the first particle of the original model).
• If the current particle in the restricted model is a valid
restriction of the “next candidate”, take the candidate as the
mapping of the current particle and carry on to the next
particles in both models.
• Otherwise, if the current particle is not a valid restriction of
the candidate, but the candidate is emptiable, try again with
the immediately following particle in the original model as
“next candidate”.
• Otherwise, this prescription fails to find a map.
152
A Case Where that Prescription Fails
Original:
<xsd:sequence>
<xsd:element ref="paragraph"
minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="figure"
minOccurs="0"
maxOccurs="unbounded"/>
<xsd:element ref="paragraph"/>
</xsd:sequence>
Restricted:
<xsd:sequence>
<xsd:element ref="paragraph"/>
</xsd:sequence>
The “greedy” prescription will try to match the paragraph
particle in the restricted model to the first paragraph particle of
the original model. But the resulting map is unsatisfactory,
because then the final paragraph particle of the original model is
not in the range of the map, nor is it emptiable.
Meanwhile, in fact, “there exists” a suitable map: just map the
paragraph particle of the restricted model to the final particle of
the original.
153
Unique Particle Attribution to the Rescue!?
But, the “Original” model on the previous slide is an
illegal content model according to the Unique Particle
Attribution rule!
• Recall this is the XML Schema analogue of a rule about
DTDs, which says content models must be “deterministic”.
While the XML Schema specification doesn’t spell this
out, it seems semi-plausible that, if content models
satisfy the Unique Particle Attribution rule, then a
simple greedy prescription will find the order-preserving
mapping required by Particle Derivation
OK, if such a mapping exists.
• This makes checking Particle Valid (Restriction) tractable.
154
Clause 1.5
Finally, we note that there is a slightly mysterious
clause in the section of the Schema specification called
Schema Component Constraint: Derivation Valid
(Extension), which is supposed to ensure that, in a chain
of derivation, nothing removed by a restriction may be
added back by a subsequent extension.
• We omit the details here! The rule isn’t very clearly stated in
the specification (IMHO).
155
Conclusion
In this section we have just briefly touched on the issues
of what constitutes a valid extension or restriction of a
content model.
The general rules are complicated. If you intend to use
these capabilities of XML Schema in non-trivial ways,
expect surprises!
156