SE 5145 – eXtensible Markup Language (XML)
XML Schema
2011-12/Spring, Bahçeşehir University, Istanbul
3rd Assignment: Validating XML with
DTD & XML Schema (page 1/2)
The goal of this exercise is to understand the basic concepts of XML Schema and how it extends the capabilities of
DTDs. You will use your XML Resume (CV) that you provided in Assignment 2.
Task 1. XML Schema: Write an XML schema definition for your XML Resume satisfying the following
requirements:



For any date in your Resume XML, make sure that your XML Schema checks for a valid date value. Try to
avoid xs:string as much as possible, or if you think that something really is a string, use your own string type
which for example could take care of checking for a maximum length and some character set (a xs:pattern
could be used to achieve the latter).
Make sure that at least one of your types is used by more than one element (because reuse is good). In real-life
applications, you would start to design a type library, and then start using it when constructing your schema
from the ground up.
Use minOccurs and maxOccurs to restrict the cardinality of some elements.
See next slide..
2
3rd Assignment: Validating XML with
DTD & XML Schema (page 2/2)
The following are recommended (but optional) for this assignment:

Depending on how similar or different your employer and education entries are, try to think of a way how you
could find some structural similarity between these entries and then represent this similarity using complex type
derivation.

Try to add a targetNamespace to your schema, so that your Resume schema now is a full-grown schema with
its own namespace. Don't forget that you have to change the instance (by using the namespace there) to match
the schema when you do that.

Identity constraints could be used to check various aspects of the Resume , depending on what you think should
be unique, a key, or a reference to an existing key. A typical example would be to have a key for institutions
(educational or companies), and then have each of your skills reference this key so that you can represent where
you have acquired each skill.
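
The identity-constraint idea above could be sketched roughly as follows. This is only an illustration, not a solution to the assignment: the element names (resume, institutions, institution, skills, skill) and the attributes id and acquiredAt are assumptions, and your own Resume will look different.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="resume">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="institutions">
          <xs:complexType>
            <xs:sequence>
              <!-- hypothetical: each institution carries an id attribute -->
              <xs:element name="institution" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:token">
                      <xs:attribute name="id" type="xs:NCName" use="required"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="skills">
          <xs:complexType>
            <xs:sequence>
              <!-- hypothetical: each skill names the institution where it was acquired -->
              <xs:element name="skill" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:token">
                      <xs:attribute name="acquiredAt" type="xs:NCName" use="required"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- every institution id must be present and unique -->
    <xs:key name="institutionKey">
      <xs:selector xpath="institutions/institution"/>
      <xs:field xpath="@id"/>
    </xs:key>
    <!-- every skill must point to an existing institution id -->
    <xs:keyref name="skillInstitutionRef" refer="institutionKey">
      <xs:selector xpath="skills/skill"/>
      <xs:field xpath="@acquiredAt"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

The sketch has no targetNamespace. If you add one, the selector and field XPath expressions need namespace prefixes for the elements they select.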
Task 3 – Validate XML: Use a tool to validate your XML Resume (*.xml) using your XML schema (*.xsd).

A suitable online tool is http://www.xmlvalidation.com/. On the first page, provide the XML document and
select ‘Validate against external XML schema’. Click ‘Validate’ and provide the XML schema on the second
page.

Another tool is Altova XML Spy (you can download and use a trial version)

Alternatively, you can remember and use the recommended tools described by Melike (validator.w3.org) and
Erokan (iexmltls.exe, msval.vbs) from the last presentations.
3
XML Schemas

“Schemas” is a general term; DTDs are a form of XML schema
    According to the dictionary, a schema is “a structured framework or plan”
When we say “XML Schemas,” we usually mean the W3C XML Schema Language
    This is also known as the “XML Schema Definition” language, or XSD
    It was introduced to overcome some of the commonly observed limitations of DTDs, most notably the lack of typing
DTDs, XML Schemas, RELAX NG and Schematron are all XML schema languages
4
What’s Wrong with DTDs?

DTDs do not support application-level datatypes
    XML for B2B is very data-centric and needs typing
    SGML was created for documents where typing was less important
DTDs do not support any relationships between markup constructs
    content models cannot be reused
    attribute lists cannot be reused
    structural relationships cannot be exploited in the DTD
DTDs provide a very weak specification language
    No restrictions on text content
    Very little control over mixed content (text plus elements)
    Little control over ordering of elements
DTDs are written in a strange (non-XML) format
    You need separate parsers for DTDs and XML
5
Why XML Schemas?


XML Schema Definition language (XSD) solves these problems
    XSD allows you to constrain the content of XML documents as DTDs do, but it is much more powerful and sophisticated
    XSD allows a much finer level of control over structure and content
    XSD is written using XML syntax instead of a custom syntax like the one DTDs use
XML Schema's simple data types provide some semantics
    a formerly undescribed attribute can now be described as being an xs:date
    it can be understood as being a date and inserted into a calendar
    but what kind of date is it? a birthday? an order date? a shipping date?
    that is a question of the context in which the xs:date appears
XML Schema better supports model-level information
    an XML Schema is usually better than a DTD, because it contains types
    types provide information about the basic datatypes being used
    however, XML Schema also captures only part of the application semantics
    additional semantics (e.g., different kinds of dates) must be documented elsewhere
6
Why not XML schemas?

DTDs have been around longer than XSD
    Therefore they are more widely used
    Also, more tools support them
Power of XSD comes with a price:
    XSD is a little harder and more verbose to write than DTDs, even by XML standards
    More advanced XML Schema instructions can be nonintuitive and confusing
Nevertheless, XSD is not likely to go away quickly
7
Validation and Typing

XML Schema does two things at the same time:
1. Validation checks for structural integrity (is the
document schema-valid?)


checking elements and attributes for proper usage (as with DTDs)
checking element contents and attribute values for proper values
2. Type annotations make the types available to applications


instead of having to look at the schema, applications get the Post-Schema
Validation Infoset (PSVI)
type-based applications (such as XSLT 2.0) can work on the typed
instance
8
Schema-Validation and Applications
9
Anatomy of a Schema




Schema uses the namespace defined by
http://www.w3.org/2001/XMLSchema and usually uses xsd or xs
prefix in the XML code
The file extension is .xsd
The root element is <schema>
XSD starts like this:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
10
Referring to a schema

To refer to a DTD in an XML document, the reference goes before the root
element:


<?xml version="1.0"?>
<!DOCTYPE rootElement SYSTEM "url">
<rootElement> ... </rootElement>
To refer to an XML Schema in an XML document, the reference goes in the
root element:

<?xml version="1.0"?>
<rootElement
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="url.xsd">
  ...
</rootElement>
The xmlns:xsi declaration (the XML Schema Instance namespace) is required.
The xsi:noNamespaceSchemaLocation attribute says where your XML Schema definition can be found.
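
To make the link between instance and schema concrete, here is a minimal sketch of what a complete url.xsd could contain. The content of rootElement (a single title child) is an assumption made only for this illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- global declaration of the document's root element -->
  <xs:element name="rootElement">
    <xs:complexType>
      <xs:sequence>
        <!-- hypothetical child element, used only for illustration -->
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance that references this schema is then schema-valid only if rootElement contains exactly one title element.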
11
TYPES

A type is a set of values



the values can be enumerated (home, mobile, office)
the values can be described by extension (intervals, regular expressions)
DTDs have (almost) no types




element content is always #PCDATA (any number of any characters)
attributes most often are CDATA (any number of any characters)
attributes may have enumerated types (but no extensional types)
attributes may use ID/IDREF
12
TYPES
                        DTD                          XML Schema
Concepts                some conceptual model (formal/informal), common to both
Types                   ID/IDREF and (#P)CDATA       Hierarchy of Simple and Complex Types
Markup Constructs       Element Type Declarations    Element Definitions
                        <!ELEMENT order ...          <xs:element name="order"> ...
Instances (Documents)   <order date=""> [ order content ] </order>, common to both
13
“Simple” and “complex” elements

A “simple” element is one that contains text and nothing
else





A simple element cannot have attributes
A simple element cannot contain other elements
A simple element cannot be empty
However, the text can be of many different types, and may
have various restrictions applied to it
If an element isn’t simple, it’s “complex”


A complex element may have attributes
A complex element may be empty, or it may contain text,
other elements, or both text and other elements
14
“Simple” types

Simple types describe values not structured by XML markup
    they describe attribute values (date="2006-10-03")
    they describe element content (<phone>+1-510-6432253</phone>)
Simple types can be used for elements or attributes
    XML Schema treats contents in elements and attributes equally
    simple type libraries can be designed independent of their eventual use
Simple types are available in three flavors
    atomic types: one value of one type (one number in some range)
    union types: one value of a union of types (a number or the string undefined)
    list types: a whitespace-separated list of values (phone type="home office")
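
The three flavors could look roughly like this in a schema. All type names here (percentType, percentOrUndefinedType, phoneKindType, phoneKindListType) are made up for the illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- atomic type: one number in some range -->
  <xs:simpleType name="percentType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- union type: a number in that range or the string "undefined" -->
  <xs:simpleType name="percentOrUndefinedType">
    <xs:union memberTypes="percentType">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="undefined"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <!-- item type for the list below -->
  <xs:simpleType name="phoneKindType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="home"/>
      <xs:enumeration value="mobile"/>
      <xs:enumeration value="office"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- list type: whitespace-separated values such as "home office" -->
  <xs:simpleType name="phoneKindListType">
    <xs:list itemType="phoneKindType"/>
  </xs:simpleType>
</xs:schema>
```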
15
Named vs. Anonymous



Types can be named or anonymous
named types have a name and can be referenced (and thus be reused)
anonymous types have no name and can only be used where they are defined
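
A small sketch of the difference, using assumed names (shortTextType, jobTitle, companyName, rating): the named type is shared by two elements, while the anonymous type exists only inside the one element that defines it.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- named type: can be referenced from several declarations -->
  <xs:simpleType name="shortTextType">
    <xs:restriction base="xs:token">
      <xs:maxLength value="80"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="jobTitle" type="shortTextType"/>
  <xs:element name="companyName" type="shortTextType"/>

  <!-- anonymous type: defined in place, cannot be reused elsewhere -->
  <xs:element name="rating">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="5"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```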
16
Type Definitions

Simple types are sets of values
    named simple types are sets of values with a name (and thus reusable)
    anonymous simple types are sets of values defined where they are needed
Simple types are defined to represent model-level information
    in most cases, they will have restrictions associated with them
    they may also simply be tags for semantics (fax and phone numbers share the same value space)
XML Schema has a library of built-in datatypes
    ur-types are the conceptual grounding of all types
    primitive types are the types that are there by definition
    derived types are based on primitive types
    users can derive their own types using simple type restriction
17
Type Hierarchy
18
Built-In Types
19
Declaring Elements with Schema

Elements can be declared as having a simple or
complex type



Types can be either built-in or defined by your Schema
Elements can also have mixed, empty, or element
content, just like in DTDs
Elements can be given a minimum and maximum
number of times that they are allowed to occur
20
Declaring Elements with Schema
21
Defining a simple element


A simple element is defined as
<xs:element name="name" type="type" minOccurs="number" maxOccurs="number|unbounded" />
where:
    name is the name of the element
    the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, xs:time
    minOccurs and maxOccurs are optional (note the capital O); the default value of each is 1
Other attributes a simple element may have:
    default="default value"    used if no other value is specified
    fixed="value"              no other value may be specified
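
As a sketch of how these attributes combine, the declarations below use assumed Resume element names (resume, fullName, birthDate, email, country). They are placed inside a sequence because minOccurs and maxOccurs are not allowed on global (top-level) element declarations.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="resume">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly once: minOccurs and maxOccurs both default to 1 -->
        <xs:element name="fullName" type="xs:string"/>
        <!-- optional: may be left out -->
        <xs:element name="birthDate" type="xs:date" minOccurs="0"/>
        <!-- one or more -->
        <xs:element name="email" type="xs:string" maxOccurs="unbounded"/>
        <!-- the default applies when the element is present but empty -->
        <xs:element name="country" type="xs:string" default="Turkey"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```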
22
Custom Simple Types with Restrict

You can define your own custom simple types by deriving them from existing simple types with restriction.
    the base type must be a simple type
    the derived type will be a simple type
    all simple types form a tree, rooted at anySimpleType
Restrictions are based on facets
    each restriction can use 0-n facets
    facets can be refined in further simple type restrictions
XML Schema designers should try to restrict types as much as possible. Why?
23
Restrictions

The general form for putting a restriction on a text value is:
<xs:element name="name">        (or xs:attribute)
  <xs:simpleType>
    <xs:restriction base="type">
      ... the restrictions ...
    </xs:restriction>
  </xs:simpleType>
</xs:element>
For example:
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="140"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
24
Facets



Facets define a certain way of restricting a simple type
Facets may be repeated in different levels of the type
hierarchy
Not all facets are applicable to all types

the applicability depends on the primitive type being used
25
Restrictions on numbers

minInclusive -- number must be ≥ the given value

minExclusive -- number must be > the given value

maxInclusive -- number must be ≤ the given value

maxExclusive -- number must be < the given value

totalDigits -- number must have no more than value digits in total

fractionDigits -- number must have no more than value digits
after the decimal point
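
A short sketch combining several of these facets; the type name salaryType and its limits are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a non-negative amount with at most 8 digits, 2 of them after the point -->
  <xs:simpleType name="salaryType">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:totalDigits value="8"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="salary" type="salaryType"/>
</xs:schema>
```

With this type, 123456.78 is accepted, while 1234567.89 (nine digits in total) and 12.345 (three fraction digits) are rejected.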
26
Restrictions on strings

length -- the string must contain exactly value characters

minLength -- the string must contain at least value characters

maxLength -- the string must contain no more than value characters

pattern -- the value is a regular expression that the string must match

whiteSpace -- not really a “restriction”--tells what to do with whitespace

value="preserve"
Keep all whitespace

value="replace"
Change all whitespace characters to spaces

value="collapse"
Remove leading and trailing whitespace, and replace
all sequences of whitespace with a single space
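
The string facets can be combined into your own string types, as the assignment suggests. The following is only a sketch: the type names (phoneNumberType, summaryType), the maximum lengths and the phone number pattern are assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- international phone number such as +1-510-6432253 -->
  <xs:simpleType name="phoneNumberType">
    <xs:restriction base="xs:token">
      <xs:maxLength value="20"/>
      <xs:pattern value="\+[0-9]{1,3}(-[0-9]+)+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- free text whose whitespace is collapsed before the length is checked -->
  <xs:simpleType name="summaryType">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
      <xs:minLength value="1"/>
      <xs:maxLength value="500"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="phone" type="phoneNumberType"/>
  <xs:element name="summary" type="summaryType"/>
</xs:schema>
```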
27
Patterns

Patterns restrict the lexical space of simple types



most other facets restrict the value space (e.g., intervals of numbers)
in many cases, patterns are useful additions to value-oriented facets
Patterns are regular expressions



they support many common regex constructs and Unicode
the language pattern allows de, de-CH, and other tags
the pattern checks for lexical correctness, not against a code list
([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*
28
Facet Limitations

Facets limit one dimension of a type's value space
    using pattern, the lexical space can also be restricted
    restrictions should be made as specific as possible
    no limitations are possible beyond the predefined facets
There is no connection to the context within the document
    facets cannot make references to other values (e.g., neighboring attributes)
Additional constraints should be documented
    documentation enables applications to implement constraint checking
    other schema languages (like Schematron) may be used to express these constraints
29
Enumeration


An enumeration restricts content to allowable
choices
Example:

<xsd:element name="season">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Spring"/>
<xsd:enumeration value="Summer"/>
<xsd:enumeration value="Autumn"/>
<xsd:enumeration value="Fall"/>
<xsd:enumeration value="Winter"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
30
Simple Type Examples
31
What is a Complex Type ?

Complex types describe the allowed element content
    they describe what the element may contain (the element's content model)
    they describe the attributes that an element may have (the element's attribute list)
Complex types do not define the element name
    they define which content is allowed for the element
    the element definition uses the complex type to define the allowed element content
Complex types have similar properties to simple types
    they can be named or anonymous
    Complex Type Derivation can be used to construct a type hierarchy
32
Declaring Complex Elements

To declare the elements with complex type:


Use the xsd:anyType value for the type attribute
Use the <xsd:complexType> tag in the definition
Structure:
<xs:element name="name">
<xs:complexType>
... information about the complex type...
</xs:complexType>
</xs:element>

Remember that attributes are always simple types
33
Complex elements

Example:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:sequence> says that elements must occur in this order
34
Complex Type Example
35
Complex Types & Content Types

Complex types can have different kinds of content



simple content refers to simple type content using additional
attributes
complex content is anything else (anything beyond simple type
content)
Complex Type Derivation heavily depends on this
classification
36
DTD Content Models

Defining Elements in DTDs uses a compact syntax
    DTDs allow elements to be mandatory, optional, repeatable, or optional and repeatable
        XML Schema allows the cardinality to be specified
    DTDs allow sequences (,) and alternatives (|)
        XML Schema supports the same facilities with a more verbose syntax
XML Schema adds features which DTDs do not support
    XML Schema introduces a (very limited) operator for all groups
    Apart from the syntax, XML Schema content models are not very different
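
To compare the two syntaxes side by side, here is a sketch of one content model written both ways. The element names (entry, title, company, school, note, skill) are invented for the example, and the DTD declaration is shown in a comment.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- DTD version: <!ELEMENT entry (title, (company | school), note?, skill*)> -->
  <xs:element name="entry">
    <xs:complexType>
      <xs:sequence>
        <!-- mandatory, exactly once -->
        <xs:element name="title" type="xs:string"/>
        <!-- alternative: corresponds to (company | school) -->
        <xs:choice>
          <xs:element name="company" type="xs:string"/>
          <xs:element name="school" type="xs:string"/>
        </xs:choice>
        <!-- optional: corresponds to note? -->
        <xs:element name="note" type="xs:string" minOccurs="0"/>
        <!-- optional and repeatable: corresponds to skill* -->
        <xs:element name="skill" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Unlike the DTD operators, minOccurs and maxOccurs can also express cardinalities such as "between 2 and 5".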
37
Empty Content

DTDs have a special keyword for empty elements
    instead of the content model, the keyword EMPTY is used
    empty elements may still have attribute lists associated with them
XML Schema empty types are defined implicitly
    there is no explicit keyword for defining an empty type
    if a type has no model group inside it, it is empty (it still may have attributes)
Declaring empty elements
<xs:element name="myEmptyElement">
<xs:complexType>
</xs:complexType>
</xs:element>
38
Mixed Content

DTDs define mixed content by mixing #PCDATA into the content model
    DTDs always require mixed content to use the form ( #PCDATA | a | b )*
    the occurrence of elements in mixed content cannot be controlled
XML Schema defines mixed content outside of the content model
    the content model is defined like an element-only content model
    the mixed attribute on the type marks the type as being mixed
Example: (only one subtitle is allowed, why ?)
39
Mixed Content

XML Schema mixed content can use all model groups


it is possible to constrain element occurrences in the same way as in
element-only content
in practice, this feature is rarely used (mixed content often is very loosely
defined)
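
A minimal sketch of mixed content with a constrained element occurrence. This is not the example from the original slide; the element names heading and subtitle are assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="heading">
    <!-- mixed="true" allows text between the child elements -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <!-- the model group still limits subtitle to at most one occurrence -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Text may appear before and after the subtitle, but a second subtitle makes the heading invalid. A DTD mixed content model cannot express this.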
40
Defining an attribute



Attributes are always declared as simple types
Any of the simple types that can be used for elements can also be
used for attributes.
An attribute is defined as
<xs:attribute name="name" type="type" />
where:

name and type are the same as for xs:element
41
Defining an attribute

Other attributes an attribute declaration may have:
    default="default value"    used if no other value is specified
    fixed="value"              no other value may be specified
    use="required"             attribute must be present
    use="optional"             attribute is not required (default)
    use="prohibited"           attribute cannot be used
Example:

<xsd:attribute name="city" type="xsd:string"
use="optional" default="istanbul"/>
42
Adding attributes to the elements

Adding attributes to an element that has an empty
content model
43
Adding attributes to the elements

Adding attributes to an element that only has character
data content
44
Adding attributes to the elements

Adding attributes to an element that has an element or
mixed content model
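
For element content, the attribute declarations follow the model group inside the complex type. A minimal sketch with assumed names (employer, name, city, current):

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employer">
    <xs:complexType>
      <!-- the content model comes first -->
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
      </xs:sequence>
      <!-- attribute declarations come after the model group -->
      <xs:attribute name="current" type="xs:boolean"
                    use="optional" default="false"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

For a mixed content model the structure is the same, with mixed="true" added to xs:complexType.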
45
Global and local definitions




Elements declared at the “top level” of a <schema> are available for
use throughout the schema
Elements declared within a xs:complexType are local to that type
Thus, in
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
the elements firstName and lastName are only locally declared
The order of declarations at the “top level” of a <schema> does not
specify the order in the XML data document
46
Declaration and use


So far we’ve been talking about how to declare types,
not how to use them
To use a type we have declared, use it as the value of
type="..."

Examples:



<xs:element name="student" type="person"/>
<xs:element name="professor" type="person"/>
Scope is important: you cannot use a type if it is local to some
other type
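
The student and professor declarations above assume that a type called person has been declared at the top level of the schema. A sketch of what that could look like, reusing the firstName and lastName content from the earlier person example:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a named, global type: visible throughout the schema -->
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- two different elements share the same type -->
  <xs:element name="student" type="person"/>
  <xs:element name="professor" type="person"/>
</xs:schema>
```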
47
Declaring elements with element content



Sequence: child elements must appear in order
All: child elements can occur in any order
Choice: any one of the child elements from a list
48
sequence


child elements must appear in a specific order:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
49
xs:all




Child elements can appear in any order
<xs:element name="person">
<xs:complexType>
<xs:all>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:all>
</xs:complexType>
</xs:element>
Despite the name, the members of an xs:all group can occur once
or not at all
You can use minOccurs="n" and maxOccurs="n" to specify how many
times an element may occur (default value is 1)

In this context, n may only be 0 or 1
50
xs:choice

Any one of the child elements from a list
<xs:element name="vehicle">
<xs:complexType>
<xs:choice>
<xs:element name="car" type="xs:integer" />
<xs:element name="bus" type="xs:string" />
<xs:element name="van" type="xs:string" />
<xs:element name="motorcycle" type="xs:integer" />
</xs:choice>
</xs:complexType>
</xs:element>
51
Referencing


Once you have defined an element or attribute (with
name="..."), you can refer to it with ref="..."
Example:



<xs:element name="person">
<xs:complexType>
<xs:all>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:all>
</xs:complexType>
</xs:element>
Inside another content model, refer to it with:
<xs:element ref="person"/>
(name="..." and ref="..." cannot be combined on the same xs:element)
52
Text element with attributes

If a text element has attributes, it is no longer a simple
type


<xs:element name="population">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="year" type="xs:integer"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
53
Empty elements


Empty elements are (ridiculously) complex
<xs:complexType name="counter">
  <xs:complexContent>
    <xs:restriction base="xs:anyType">
      <xs:attribute name="count" type="xs:integer"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
54
Mixed elements




Mixed elements may contain both text and elements
We add mixed="true" to the xs:complexType element
The text itself is not mentioned in the element, and
may go anywhere (it is basically ignored)
<xs:complexType name="paragraph" mixed="true">
<xs:sequence>
<xs:element name="someName" type="xs:anyType"/>
</xs:sequence>
</xs:complexType>
55
Extensions


You can base a complex type on another complex type
<xs:complexType name="newType">
<xs:complexContent>
<xs:extension base="otherType">
...new stuff...
</xs:extension>
</xs:complexContent>
</xs:complexType>
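
As a sketch of the structural similarity mentioned in the assignment, the employment and education entries of a Resume could share a base type and extend it. All names here (entryType, employmentType, educationType and their child elements) are assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- what both kinds of entry have in common -->
  <xs:complexType name="entryType">
    <xs:sequence>
      <xs:element name="organization" type="xs:string"/>
      <xs:element name="startDate" type="xs:date"/>
      <xs:element name="endDate" type="xs:date" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- employment entries add a position -->
  <xs:complexType name="employmentType">
    <xs:complexContent>
      <xs:extension base="entryType">
        <xs:sequence>
          <xs:element name="position" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- education entries add a degree -->
  <xs:complexType name="educationType">
    <xs:complexContent>
      <xs:extension base="entryType">
        <xs:sequence>
          <xs:element name="degree" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="employment" type="employmentType"/>
  <xs:element name="education" type="educationType"/>
</xs:schema>
```

In an instance, the elements added by the extension appear after the elements inherited from the base type.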
56
Predefined string types


Recall that a simple element is defined as:
<xs:element name="name" type="type" />
Here are a few of the possible string types:




xs:string -- a string
xs:normalizedString -- a string that doesn’t contain
tabs, newlines, or carriage returns
xs:token -- a string that doesn’t contain any
whitespace other than single spaces
Allowable restrictions on strings:

enumeration, length, maxLength, minLength,
pattern, whiteSpace
57
Predefined date and time types




xs:date -- A date in the format CCYY-MM-DD, for
example, 2002-11-05
xs:time -- A time in the format hh:mm:ss (hours,
minutes, seconds)
xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times:

enumeration, minInclusive, minExclusive, maxInclusive,
maxExclusive, pattern, whiteSpace
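
A brief sketch of a restricted date type; the name birthDateType and the two boundary dates are arbitrary assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a valid calendar date within an assumed plausible range -->
  <xs:simpleType name="birthDateType">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="1900-01-01"/>
      <xs:maxExclusive value="2012-01-01"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="birthDate" type="birthDateType"/>
</xs:schema>
```

Because the base type is xs:date, a value such as 1985-02-30 is already rejected as an invalid date before the range facets are checked.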
58
Predefined numeric types

Here are some of the predefined numeric types:
xs:decimal
xs:byte
xs:short
xs:int
xs:long

xs:positiveInteger
xs:negativeInteger
xs:nonPositiveInteger
xs:nonNegativeInteger
Allowable restrictions on numeric types:

enumeration, minInclusive, minExclusive, maxInclusive,
maxExclusive, fractionDigits, totalDigits, pattern,
whiteSpace
59
Practice..
60