Transcript Document
e-Science e-Business e-Government and their Technologies
XML Schema
Bryan Carpenter, Geoffrey Fox, Marlon Pierce
Pervasive Technology Laboratories
Indiana University
Bloomington IN 47404
January 12 2004
http://www.grid2004.org/spring2004

1 Introduction
We saw that DTDs provide an approach to validating XML documents: ensuring they have the structure expected for a particular application.
With the increasing use of XML for data-centric applications (e.g. XML formats for messages exchanged by Web Services), limitations of DTDs (which were inherited from SGML) soon became apparent.
XML Schema is a more recent validation framework for XML, which attempts to address the shortcomings of DTDs for data-centric applications, for example by providing a much richer set of data types.

2 Problems with DTDs
DTDs have some clear limitations:
• Restricted set of data types: attribute data is either general character data, name tokens, ID or IDREF (or arcane cases); element content is either general character data or nested elements or some mixture. For data-centric applications, we might want a value to be a well-formed number, date, etc.
• DTDs are not convenient for dealing with XML Namespaces, which are essential for modularity on the Web.
• The uniqueness and consistency requirements associated with ID, IDREF are powerful, but could be much more refined.
• There are various obscure constraints on element content specifications, needed purely for historical SGML compatibility.

3 XML Schema
XML Schema addresses all the issues mentioned on the previous slide.
• It also has the interesting property that an XML Schema is itself a well-formed XML document; some people consider this a significant advantage.
This is the good news. The less good news is that the XML Schema 1.0 specification is longer by almost an order of magnitude than the basic XML specification, DTDs and all.
4 General Comparison
• DTDs: A DTD defines all elements, etc, in one type of document.
  XML Schema: A schema defines all elements, etc, in a single namespace.
• DTDs: For documents with multiple namespaces, somehow patch together one large DTD.
  XML Schema: For documents with multiple namespaces, use multiple schemas.
• DTDs: Directly define structures of named elements.
  XML Schema: Define structures of complex types of element; then declare named element of that type.
• DTDs: Limited built-in data types for attributes.
  XML Schema: Extensive built-in simple types for attributes and element content.
• DTDs: Complex entity substitution mechanism.
  XML Schema: No entity substitution mechanism.

5 Reading Material
The XML Schema Specification itself comes in parts 0, 1, and 2. Parts 1 and 2 are long and tough to read, but part 0 is a reasonable (“non-normative”) introduction:
XML Schema Part 0: Primer, May 2001. http://www.w3.org/TR/xmlschema-0/
There are some good and bad books. A good one is:
Definitive XML Schema, Priscilla Walmsley, Prentice Hall, 2002.
There is a comprehensive (but again rather long) tutorial introduction to XML Schema by Roger Costello at: http://www.xfront.com/

6 “Report” Format Revisited
When discussing DTDs we described a simple “report” format. Here is a slightly expanded version of the DTD given there:
    <!DOCTYPE report [
      <!ELEMENT report (title, (paragraph | figure)*, bibliography?) >
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT paragraph (#PCDATA)>
      <!ELEMENT figure EMPTY>
      <!ATTLIST figure source CDATA #REQUIRED >
      <!ELEMENT bibliography (reference)* >
      …
    ]>
We begin our detailed discussion of schema by considering how to give an equivalent XML Schema for this document.

7 Declaring a paragraph Element
The report schema is surprisingly long: we will build up to it in several incremental steps. First consider the paragraph element.
Using DTD, we declared this element by:
    <!ELEMENT paragraph (#PCDATA)>
An equivalent declaration in XML schema might be:
    <xsd:element name="paragraph" type="xsd:string"/>
• xsd:element is itself an element in the XML Schema namespace; this example assumes we use xsd as the prefix for that namespace.
• xsd:string is a predefined type in that namespace.

8 xsd:string Primitive Type
XML Schema has a complex system of types. Different types may describe:
1. the allowed values of attributes,
2. the allowed content of elements, or
3. the allowed content and the allowed attributes of elements.
There is a subset of types, called the simple types, that can be used in either of the first two roles.
One of the simplest of all is string. Used as an attribute type, this is equivalent to the DTD type CDATA; used as an element type, this is equivalent to the DTD content specification (#PCDATA).

9 Declaring a report Element
We initially simplify to a schema in which a report consists only of a series of paragraphs. In DTD a possible declaration of the root element would be:
    <!ELEMENT report (paragraph)*>
An equivalent declaration in XML schema might be:
    <xsd:element name="report">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

10 Elements with Complex Type
This rather verbose declaration says:
• The element named report has complex type.
• The content associated with this complex type is a sequence of elements.
• This sequence consists of at least 0 and at most an unbounded number of occurrences of paragraph elements.
Here the xsd:element element has different roles:
• Outermost xsd:element declares the element named report.
• Innermost xsd:element uses the element named paragraph, declared elsewhere.
The role is determined by the presence or absence of the ref attribute.
11 Local Declarations
In fact xsd:element can act in a third role, which is considered to be a combined declaration and use, e.g.:
    <xsd:element name="report">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="paragraph" type="xsd:string"
                       minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
• Here the report element has its own local declaration of paragraph; no separate global declaration is necessary.

12 Global vs Local Element Declarations
Declarations that occur as children of the top-level schema element are global declarations.
• These are the only declarations that can actually be “used” from elsewhere.
“Local declarations”, like the one illustrated on the previous slide, are “used” exactly once at their point of declaration.
• This is different from the concept of local declarations in most programming languages.
• Local element declarations interact with namespaces in a non-obvious way: perhaps best avoided until you are sure you know what you are doing.

13 Global vs Local Type Definitions
The type of the report element was specified by an xsd:complexType element nested within the element declaration.
The type of the paragraph element was specified by a type attribute on the declaration, referencing a named type.
In fact types, like elements, can always be defined locally where they are used, or defined globally, then referenced from a point of use.
The following slide illustrates yet another way to declare report.

14 Named Type Definitions
In this version we introduce a named complex type called reportType, then declare the report element with this type:
    <xsd:complexType name="reportType">
      <xsd:sequence>
        <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
    <xsd:element name="report" type="reportType"/>
This abstraction facility, introducing new named types, is a central theme of XML Schema.
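To see the named-type version in action, here is a minimal Java sketch that checks a few instance documents against it. It rests on several assumptions that are not part of the lecture: a JDK that ships the javax.xml.validation package (added to JAXP after this lecture was written), an enclosing xsd:schema element with no target namespace, and a hypothetical class name NamedTypeDemo.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NamedTypeDemo {
    // The named-type declarations from the slide, wrapped in a minimal
    // xsd:schema element without a target namespace (an assumption).
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:complexType name='reportType'>"
      + "    <xsd:sequence>"
      + "      <xsd:element ref='paragraph' minOccurs='0' maxOccurs='unbounded'/>"
      + "    </xsd:sequence>"
      + "  </xsd:complexType>"
      + "  <xsd:element name='report' type='reportType'/>"
      + "  <xsd:element name='paragraph' type='xsd:string'/>"
      + "</xsd:schema>";

    // Returns true if the instance validates, false if the validator reports an error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // schema or instance error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two paragraphs: allowed by the sequence.
        System.out.println(isValid(
            "<report><paragraph>One</paragraph><paragraph>Two</paragraph></report>"));
        // No paragraphs: allowed because minOccurs is 0.
        System.out.println(isValid("<report/>"));
        // figure is not part of reportType, so this should be rejected.
        System.out.println(isValid("<report><figure/></report>"));
    }
}
```

The same three checks should give the same answers for the local-declaration and anonymous-type variants shown earlier, since all of them describe the same content model.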
15 A Complete XML Schema
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://www.grid2004.org/ns/report1"
                xmlns="http://www.grid2004.org/ns/report1">
      <xsd:element name="report">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="paragraph" type="xsd:string"/>
    </xsd:schema>

16 Remarks
Recall this schema is essentially equivalent to the DTD:
    <!DOCTYPE report [
      <!ELEMENT report (paragraph)* >
      <!ELEMENT paragraph (#PCDATA)>
    ]>
Clearly the schema has more baggage (or more added value, according to your point of view!)
Our schema declares two element names, report and paragraph, and puts them in a namespace called http://www.grid2004.org/ns/report1.

17 Namespace Considerations
The root element of any schema is a schema element from the http://www.w3.org/2001/XMLSchema namespace.
The targetNamespace attribute on this element specifies which namespace the elements declared here “go into”.
We have seen the other namespace attributes before:
• The xmlns:xsd attribute associates the prefix xsd with the XML Schema namespace.
• The xmlns attribute makes the default namespace http://www.grid2004.org/ns/report1 for this document.
• Often one uses xsd as the prefix for schema elements, and makes the target namespace the default namespace of the schema document, but neither is essential.

18 An XML Instance Document
    <?xml version="1.0"?>
    <report xmlns="http://www.grid2004.org/ns/report1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.grid2004.org/ns/report1 report1.xsd">
      <paragraph>Recently uncovered documents prove... </paragraph>
      <paragraph>The author is grateful to W3C for making this research possible.</paragraph>
    </report>

19 Namespace Considerations
Assuming the document vocabulary belongs to a namespace, we must declare this namespace.
• In this example http://www.grid2004.org/ns/report1 is declared as the default namespace.
If the instance document is to be validated against a schema, we must normally define where the schema for the namespace is located.
This is done here by putting an attribute schemaLocation on the root element of the document.
This attribute is itself defined in a standard namespace, called http://www.w3.org/2001/XMLSchema-instance. So we must introduce a prefix for this (xsi is traditional).

20 schemaLocation
The value of the schemaLocation attribute should be a pair of URIs: a namespace name and the URI of the corresponding schema.
• If the document uses more than one namespace, the value can be several consecutive pairs.
• All tokens are separated by white space.
In this example the schema should be in the file report1.xsd in the same directory as the instance document.

21 Schema Validation Using dom.Writer
If I save the instance document in a file called “xsdreport1.xml”, and the schema in a file called “report1.xsd”, I can validate the file with the Xerces parser by using the dom.Writer sample application as follows:
    > java dom.Writer -v -s -f xsdreport1.xml
• If validation is successful, this simply prints a formatted version of the input file. If schema validation fails, you will see error messages early in the output.
• The -v -s flags are needed here. Without -s the parser will try to do just DTD validation. -f means “full” schema validation, presumably a good thing.

22 Schema Validation from Java
Unfortunately it doesn’t seem to be possible to enable XML Schema validation in Xerces using the “vendor-neutral” JAXP API.
• The DOM Level 3 API will enable this, but it is not finalized or fully deployed at the time of this writing.
For now you must directly use the “proprietary” org.apache.xerces.parsers.DOMParser Xerces implementation class. Use is sketched on the next slide.
23 The Xerces DOMParser API
    import org.apache.xerces.parsers.DOMParser;
    import org.w3c.dom.*;
    …
    static final String VALIDATION_FEATURE_ID =
        "http://xml.org/sax/features/validation";
    static final String SCHEMA_VALIDATION_FEATURE_ID =
        "http://apache.org/xml/features/validation/schema";
    static final String SCHEMA_FULL_CHECKING_FEATURE_ID =
        "http://apache.org/xml/features/validation/schema-full-checking";
    …
    DOMParser parser = new DOMParser();
    // Turn Schema Validation on
    parser.setFeature(VALIDATION_FEATURE_ID, true);
    parser.setFeature(SCHEMA_VALIDATION_FEATURE_ID, true);
    parser.setFeature(SCHEMA_FULL_CHECKING_FEATURE_ID, true);
    // MyErrorHandler (not shown) is a user-supplied org.xml.sax.ErrorHandler
    parser.setErrorHandler(new MyErrorHandler());
    parser.parse(uri);   // uri is XML instance file
    Document document = parser.getDocument();
    …

24 More on Complex Types
If an element may have nested elements, or if it may have attributes, it must be described by a complex type.
• If neither of these conditions holds (the element has only character data content and no attributes) it is usually more convenient to use a simple type.
Attributes on complex types are specified by an attribute element, e.g.:
    <xsd:element name="figure">
      <xsd:complexType>
        <xsd:attribute name="source" type="xsd:string"/>
      </xsd:complexType>
    </xsd:element>

25 Attribute Declarations
Like element declarations, attributes may be declared globally, then used inside a complex type declaration, through an xsd:attribute element with a ref attribute.
• In contrast to the situation with elements, local declaration of attributes is often a natural choice.
The figure example above has a complex type with no content. In general attribute specifications go after the content specification, in the body of the xsd:complexType element.

26 Element Sequences and Choices
To finish this introductory foray into XML Schema, we restore our report element back to its original specification. The XML Schema declaration is given on the next slide.
Recall this is supposed to be equivalent to the DTD declaration:
    <!ELEMENT report (title, (paragraph | figure)*, bibliography?) >
• The use of the xsd:sequence and xsd:choice elements should be reasonably self-explanatory.
• Note how the minOccurs, maxOccurs attributes replace use of the *, ? operators: both have default values of 1.

27 Original report Element Structure
    <xsd:element name="report">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="title"/>
          <xsd:choice minOccurs="0" maxOccurs="unbounded">
            <xsd:element ref="paragraph"/>
            <xsd:element ref="figure"/>
          </xsd:choice>
          <xsd:element ref="bibliography" minOccurs="0"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

28 Simple Types: Schema Datatypes

29 XML Schema Simple Types
Recall simple types can be used to describe the values of attributes, or the content of elements that have no nested elements (“character data” content).
So far we only illustrated one simple type built in to XML Schema: namely string.
• As an attribute type this is similar to the DTD attribute type CDATA; as an element type, it is similar to the DTD content specification (#PCDATA).
Most of the details of simple types are defined in the W3C Recommendation XML Schema Part 2: Datatypes.

30 Built In and User-Defined Types
XML Schema provides over 40 built in simple types.
It also provides flexible mechanisms for creating your own simple types,
• which may in fact impose rather complex patterns on text content.
31 Schema Built In Types

32 Built In Simple Types
    Simple Type          Examples (comma separated)
    string               Confirm this is electric
    normalizedString     Confirm this is electric
    token                Confirm this is electric
    base64Binary         GpM7
    hexBinary            0FB7
    byte                 -1, 126
    unsignedByte         0, 126

33 Built In Simple Types (continued)
    Simple Type          Examples (comma separated)
    integer              -126789, -1, 0, 1, 126789
    positiveInteger      1, 126789
    negativeInteger      -126789, -1
    nonNegativeInteger   0, 1, 126789
    nonPositiveInteger   -126789, -1, 0
    int                  -1, 126789675
    unsignedInt          0, 1267896754
    long                 -1, 12678967543233
    unsignedLong         0, 12678967543233
    short                -1, 12678
    unsignedShort        0, 12678

34 Built In Simple Types (continued)
    Simple Type          Examples (comma separated)
    decimal              -1.23, 0, 123.4, 1000.00
    float                -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
    double               -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN
    boolean              true, false, 1, 0

35 Built In Simple Types (continued)
    Simple Type          Examples (comma separated)
    time                 13:20:00.000, 13:20:00.000-05:00
    dateTime             1999-05-31T13:20:00.000-05:00
    duration             P1Y2M3DT10H30M12.3S
    date                 1999-05-31
    gMonth               --05--
    gYear                1999
    gYearMonth           1999-02
    gDay                 ---31
    gMonthDay            --05-31

36 Built In Simple Types (continued)
    Simple Type          Examples (comma separated)
    Name                 shipTo
    QName                po:USAddress
    NCName               USAddress
    anyURI               http://www.example.com/, http://www.example.com/doc.html#ID5
    language             en-GB, en-US, fr

37 Built In Simple Types (continued)
    Simple Type          Examples (comma separated)
    ID
    IDREF
    IDREFS
    ENTITY
    ENTITIES
    NOTATION
    NMTOKEN              US, Brésil
    NMTOKENS             US UK, Brésil Canada Mexique

38 Creating New Simple Types
There are three basic approaches to building new simple types (deriving simple types):
• Restricting facets of an existing simple type.
• Creating a list type from an existing simple type.
• Creating a union type from some existing simple types.
The most sophisticated mechanism is the first: restriction using facets.
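Before moving on to derived types, the example values in the tables above can be spot-checked mechanically. The following is only a sketch under stated assumptions: it relies on the javax.xml.validation package of later JDKs (not available when this lecture was written), and the class name BuiltInTypeDemo and the element name v are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class BuiltInTypeDemo {
    // Builds a one-element schema in which element v has the given built-in
    // type, then validates <v>value</v> against it. The value is assumed to
    // contain no XML-special characters such as < or &.
    static boolean accepts(String builtInType, String value) {
        String xsd =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "  <xsd:element name='v' type='xsd:" + builtInType + "'/>"
          + "</xsd:schema>";
        String xml = "<v>" + value + "</v>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // value is not in the lexical space of the type
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[][] samples = {
            {"date", "1999-05-31"}, {"date", "31/05/1999"},
            {"unsignedByte", "126"}, {"unsignedByte", "-1"},
            {"boolean", "1"}, {"boolean", "yes"},
            {"positiveInteger", "0"}, {"hexBinary", "0FB7"},
        };
        for (String[] s : samples) {
            System.out.println(s[0] + " \"" + s[1] + "\": " + accepts(s[0], s[1]));
        }
    }
}
```

A check like this is a convenient way to explore the lexical rules of a type (for example, that boolean accepts 1 and 0 as well as true and false) without reading Part 2 of the specification in full.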
39 Facets
The 19 primitive types (the built in types derived directly from anySimpleType) have a set of constraining facets restricting allowed values.
The constraining facets of a simple type are a subset of:
• length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minExclusive, minInclusive, totalDigits, fractionDigits
• Restricted types have all the facets of their base types, though values of the facets may be different.
• There is no way for schema writers to introduce new facets; users cannot directly restrict anySimpleType.
• Technically simple types have additional fundamental facets, but values of these flags cannot be set directly. They are: equal, ordered, bounded, cardinality, numeric

40 Restriction
Here is a characteristic example of restriction:
    <xsd:simpleType name="singleDigit">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="-9"/>
        <xsd:maxInclusive value="9"/>
      </xsd:restriction>
    </xsd:simpleType>
This starts from the built in xsd:integer, and defines a derived type singleDigit by setting the facet minInclusive to -9 and the facet maxInclusive to 9.
Thus the type singleDigit represents a whole number between -9 and +9.

41 Length
The facets length, minLength, maxLength allow one to constrain the length of an item like a string (they also allow one to constrain the number of items in a list type, see later).
Values of length, minLength, maxLength should be non-negative integers.
Example:
    <xsd:simpleType name="state">
      <xsd:restriction base="xsd:string">
        <xsd:length value="2"/>
      </xsd:restriction>
    </xsd:simpleType>
defines a type state representing strings containing exactly two characters.
• These facets are supported by all primitive types other than boolean and the numeric and date- and time-related types. They are also supported by list types.

42 Pattern
Perhaps the most powerful facet is pattern, which allows one to specify a regular expression: any allowed value must satisfy the pattern of this expression.
Example
    <xsd:simpleType name="weekday">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="(Mon|Tues|Wednes|Thurs|Fri)day"/>
      </xsd:restriction>
    </xsd:simpleType>
defines a type weekday representing the names of the week days.

43 Regular Expressions
XML Schema has its own notation for regular expressions, but very much based on the corresponding Perl notation.
For the most part Schema uses a subset of the Perl 5 grammar for regular expressions.
• Includes most of the purely “declarative” features from Perl regular expressions, but omits many “procedural” features related to search, matching algorithm, substitution, etc.
XML Schema adds a few features of its own, e.g.:
• Matching characters legal in XML names.
• Character class subtraction.
• Inherits general XML escape mechanisms for Unicode characters, replacing analogous Perl mechanisms.

44 Metacharacters
The following characters, called metacharacters, have special roles in Schema regular expressions:
    . \ ? * + | { } ( ) [ ]
• Like Perl, but treats }, ] uniformly as metacharacters, and omits search-related metacharacters ^ and $.
To match these characters literally in patterns, you must escape them with \, e.g.:
• The pattern “2\+2” matches the string “2+2”.
• The pattern “f\(x\)” matches the string “f(x)”.

45 Escape Sequences
In general one should use XML character references to include hard-to-type characters. But for convenience Schema regular expressions allow:
• \n matches a newline character (same as &#xA;)
• \r matches a carriage return character (same as &#xD;)
• \t matches a tab character (same as &#x9;)
All other escape sequences (except \- and \^, used only in character class expressions) match any single character out of some set of possible values.
• For example \d matches any decimal digit, so the pattern “Boeing \d\d\d” matches the strings “Boeing 747”, “Boeing 777”, etc.

46 Multicharacter Escapes
The simplest patterns matching classes of characters are:
• . matches any character except carriage return or newline.
• \d matches any decimal digit.
• \s matches any white space character.
• \i matches any character that can start an XML name.
• \c matches any character that can appear in an XML name.
• \w matches any “word” character (excludes punctuation, etc.)
The escapes \D, \S, \I, \C and \W are negative forms, e.g. \D matches any character except a decimal digit.
• Similar to Perl, except: Perl doesn’t have \i, \I; Perl uses \c, \C for other things; detailed definitions of \w, \W are different.

47 Category Escapes
A large and interesting family of escapes is based on the Unicode standard. General form in Perl or Schema is \p{Name} where Name is a Unicode-defined class name.
• The negative form \P{Name} matches any character not in the class.
Simple examples include: \p{L} (any letter), \p{Lu} (upper case letters), \p{Ll} (lower case letters), etc.
More interesting cases are based on the Unicode block names for alphabets, e.g.:
• \p{IsBasicLatin}, \p{IsLatin-1Supplement}, \p{IsGreek}, \p{IsArabic}, \p{IsDevanagari}, \p{IsHangulJamo}, \p{IsCJKUnifiedIdeographs}, etc.

48 Character Class Expressions
Allow you to define terms that match any character from a custom set of characters.
Basic syntax is familiar from Perl and UNIX:
    [List-of-characters]
or the negative form:
    [^List-of-characters]
Here List-of-characters can include individual characters, and also ranges of the form First-Last where First and Last are characters.
Examples:
• [RGB] matches one of R, G, or B.
• [0-9A-F] or [\dA-F] match one of 0, 1, …, 9, A, B, …, F.
• [^\r\n] matches anything except CR, NL (same as . ).

49 Class Subtractions
A feature of XML Schema, not present in Perl 5.
A character class expression can take the form:
    [List-of-characters-Class-char-expr]
or:
    [^List-of-characters-Class-char-expr]
where Class-char-expr is another character class expression (including its own square brackets).
Example:
• [a-zA-Z-[aeiouAEIOU]] matches any consonant in the Latin alphabet.

50 Sequences and Alternatives
Finally, the universal core of regular expressions. If Pattern1 and Pattern2 are regular expressions, then:
• Pattern1Pattern2 matches any string made by putting a string accepted by Pattern1 in front of a string accepted by Pattern2.
• Pattern1|Pattern2 matches any string that would be accepted by Pattern1, or any string accepted by Pattern2.
Parentheses just group things together:
• (Pattern1) matches any string accepted by Pattern1.
An example given earlier:
• (Mon|Tues|Wednes|Thurs|Fri)day matches any of the strings Monday, Tuesday, Wednesday, Thursday, or Friday.
• Equivalent to Monday|Tuesday|Wednesday|Thursday|Friday.

51 Quantifiers
… and if Pattern1 is a regular expression:
• Pattern1? matches the empty string or any string accepted by Pattern1.
• Pattern1+ matches any string accepted by Pattern1, or by Pattern1Pattern1, or by Pattern1Pattern1Pattern1, or …
• Pattern1* matches the empty string or any string accepted by Pattern1+.
If n, m are numbers, Perl and XML Schema also allow the shorthand forms:
• Pattern1{n} is equivalent to Pattern1 repeated n times.
• Pattern1{m,n} matches any string accepted by Pattern1 repeated m times or m + 1 times or … or n times.
• Pattern1{m,} matches any string accepted by Pattern1 repeated m or more times.
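The regular expression features above are easiest to explore by trying them through the pattern facet itself, since java.util.regex does not implement the Schema-specific pieces (\i, \c, class subtraction in this syntax). The sketch below wraps a pattern in a throwaway schema and asks a validator whether a value matches. Assumptions: the javax.xml.validation package of later JDKs, and the hypothetical names PatternDemo, matches and v.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PatternDemo {
    // Escapes the characters that are special inside the generated XML text.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("'", "&apos;");
    }

    // True if value is accepted by a restriction of xsd:string whose only
    // constraining facet is the given Schema regular expression.
    static boolean matches(String pattern, String value) {
        String xsd =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "  <xsd:element name='v'>"
          + "    <xsd:simpleType>"
          + "      <xsd:restriction base='xsd:string'>"
          + "        <xsd:pattern value='" + esc(pattern) + "'/>"
          + "      </xsd:restriction>"
          + "    </xsd:simpleType>"
          + "  </xsd:element>"
          + "</xsd:schema>";
        String xml = "<v>" + esc(value) + "</v>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // no match (or the pattern itself is malformed)
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Backslashes are doubled only because these are Java string literals.
        System.out.println(matches("(Mon|Tues|Wednes|Thurs|Fri)day", "Tuesday"));
        System.out.println(matches("Boeing \\d\\d\\d", "Boeing 747"));
        System.out.println(matches("[a-zA-Z-[aeiouAEIOU]]", "e"));   // a vowel
        System.out.println(matches("\\i\\c*", "shipTo"));
    }
}
```

Note that a Schema pattern must match the whole value, which is why no ^ and $ anchors are needed: “Boeing \d\d\d” does not accept “Boeing 74” or “Boeing 7470”.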
52 Using Patterns in Restriction
All simple types (including lists and unions) support the pattern facet, e.g.:
    <xsd:simpleType name="multiplesOfFive">
      <xsd:restriction base="xsd:integer">
        <xsd:pattern value="[+-]?\d*[05]"/>
      </xsd:restriction>
    </xsd:simpleType>
defines a subtype of integer including all numbers ending with digits 0 or 5.
The pattern facet can appear more than once in a single restriction: interpretation is as if patterns were combined with |.
• Conversely if the pattern facet is specified in restriction of a base type that was itself defined using a pattern, allowed values must satisfy both patterns.

53 Enumeration
The enumeration facet allows one to select a finite subset of allowed values from a base type, e.g.:
    <xsd:simpleType name="weekday">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="Monday"/>
        <xsd:enumeration value="Tuesday"/>
        <xsd:enumeration value="Wednesday"/>
        <xsd:enumeration value="Thursday"/>
        <xsd:enumeration value="Friday"/>
      </xsd:restriction>
    </xsd:simpleType>
• Behaves like a very restricted version of pattern?
• All primitive types except boolean support the enumeration facet. List and union types also support this facet.

54 White Space
This facet controls how white space in a value received from the parser is processed, prior to Schema validation. It can take three values:
• preserve: no white space processing, beyond what base XML does.
• replace: Convert every white space character (Line Feed, etc) to a space character (#x20).
• collapse: Like replace. All leading or trailing spaces are then removed. Also sequences of spaces are replaced by single spaces.
Note analogies to “Normalization of Attribute Values” in base XML.
All simple types except union types have this facet, but usually you don’t explicitly set it in restriction: just inherit values from built in types.
All built in types have collapse, except string which has preserve and normalizedString which has replace.
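The practical effect of the whiteSpace facet shows up as soon as it is combined with another facet such as enumeration. The following sketch restricts xsd:string with an explicit whiteSpace value plus a single enumeration value, and tests a padded string against it. Assumptions: the javax.xml.validation package of later JDKs; the names WhiteSpaceDemo, accepts and v are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class WhiteSpaceDemo {
    // whiteSpace must be one of preserve, replace, collapse. The type allows
    // only the value Monday, checked after white space processing.
    static boolean accepts(String whiteSpace, String value) {
        String xsd =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
          + "  <xsd:element name='v'>"
          + "    <xsd:simpleType>"
          + "      <xsd:restriction base='xsd:string'>"
          + "        <xsd:whiteSpace value='" + whiteSpace + "'/>"
          + "        <xsd:enumeration value='Monday'/>"
          + "      </xsd:restriction>"
          + "    </xsd:simpleType>"
          + "  </xsd:element>"
          + "</xsd:schema>";
        String xml = "<v>" + value + "</v>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String ws : new String[] {"preserve", "replace", "collapse"}) {
            // Only collapse strips the leading and trailing spaces.
            System.out.println(ws + ": " + accepts(ws, " Monday "));
        }
    }
}
```

This is why an enumeration or pattern defined directly on xsd:string is unforgiving about stray spaces, while the same facets on xsd:token (which inherits collapse) are not.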
55 Other Facets
The facets maxInclusive, maxExclusive, minExclusive, minInclusive are supported only by numeric and date- and time-related types, and define bounds of value ranges.
The facets totalDigits, fractionDigits are defined for the primitive type decimal, and thus all numeric types derived from decimal, and for no other types.

56 List Types
We define a type representing a white-space-separated list of items using the list element, e.g.:
    <xsd:simpleType name="listOfDays">
      <xsd:list itemType="weekday"/>
    </xsd:simpleType>
this introduces a type that takes values like “”, “Monday”, “Monday Monday”, “Tuesday Wednesday Thursday”, etc.
List types can be restricted using the length-related facets, and the pattern, enumeration and whiteSpace facets.
The <list> element may contain an anonymous <simpleType> element instead of having an itemType attribute.
A list value is split according to its white space content prior to validation of the items in the list.

57 Union Types
A union type takes values from any one of a set of base types.
    <xsd:simpleType name="maxOccursType">
      <xsd:union memberTypes="xsd:integer">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="unbounded"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
The types in the union are specified in the list-valued attribute memberTypes, or by nested anonymous <simpleType> elements, or by a combination of the two, as above.
Union types can be restricted using the pattern or enumeration facets.

58 Prohibiting Derivation
You may, for some reason, have a simple type that you don’t want anybody to derive further types from.
Do this by specifying the final attribute on the <simpleType> element. Its value is a list containing a subset of values from list, union, restriction, extension.
• These specify which sorts of derivation are disallowed. Note extension is a way of deriving a complex type from a simple type. It will be discussed in the next section.
Give the final attribute the value “#all” for blanket prohibition of any derivation from this simple type.
• Can also prevent the value of individual facets from being changed in subsequent restrictions by specifying fixed="true" on the facet elements.

59 Complex Types

60 Element Content, and Attributes
Simple types allow us to declare elements that have only parsed character content (no nested elements). E.g. the declaration:
    <xsd:element name="dayItHappened" type="weekday"/>
might validate instance elements like:
    <dayItHappened> Monday </dayItHappened>
    <dayItHappened>Tuesday</dayItHappened>
(The padded form is accepted only if weekday collapses white space, for example if it is derived from xsd:token; with the xsd:string base used in the earlier definitions, white space is preserved and the padded value would be rejected.)
But if we need elements with element content, or elements with attributes, we must declare those elements to have complex type.

61 Complex Type Hierarchy
We saw that a set of built in simple types were derived from xsd:anySimpleType, and that new simple types could be derived from a base type by restriction, list, or union.
There are no built in complex types, other than the so-called ur-type, represented as xsd:anyType.
All other complex types are derived by one or more steps of restriction and extension from xsd:anyType.
• Complex types can also be created by extension of a simple type, but simple types are also notionally restrictions of xsd:anyType.

62 Restriction
A restriction of a base type is a new type. All allowed instances of the new type are also instances of the base type. But the restricted type doesn’t allow all possibilities allowed by the base type.
• Think of the example of restricting xsd:string to 4 characters using the length facet. Strings of length 4 are also allowed by xsd:string, but the new type is more restrictive.
• In the complex case, we might have a complex base type that allows attribute att optionally. A restricted type might not allow att at all. Another restriction of the same base might require att.
• Or we might have a base type that allows 0 or more nested elm elements. The restricted type might require exactly 1 nested elm element.
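The subset relationship described for the simple-type case can be checked directly. This sketch declares one element of the base type xsd:string and one of a restricted type with length 4, then validates the same values against both. Assumptions: the javax.xml.validation package of later JDKs; the names RestrictionDemo, fourChars, base and restricted are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class RestrictionDemo {
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='fourChars'>"
      + "    <xsd:restriction base='xsd:string'>"
      + "      <xsd:length value='4'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='base' type='xsd:string'/>"
      + "  <xsd:element name='restricted' type='fourChars'/>"
      + "</xsd:schema>";

    // Validates <elementName>value</elementName>; elementName is base or restricted.
    static boolean isValid(String elementName, String value) {
        String xml = "<" + elementName + ">" + value + "</" + elementName + ">";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] {"abcd", "abc", "abcde"}) {
            System.out.println(value + ": base=" + isValid("base", value)
                + " restricted=" + isValid("restricted", value));
        }
    }
}
```

Every value accepted by restricted is also accepted by base, but not the other way round, which is the defining property of restriction.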
63 Extension
An extension of a base type is a new type. An extension allows extra attributes or extra content that are not allowed in instances of the base type.
• At first brush this sounds like the opposite of restriction, but this isn’t strictly true.
• If, for example, type E extends a type B by adding a required attribute att, then instances of B are not allowed instances of E (because they don’t have the required attribute).
So we have that E is an extension of B, but there is no sense in which B could be a restriction of E.
Some such inverse relation exists if all extra attributes and content are optional in the extended type, but this isn’t a required feature of extension.

64 Complex Content and Simple Content
We have seen that XML Schema complex types define both some allowed nested elements, and some allowed attribute specifications.
Complex types that allow nested elements are said to have complex content.
But Schema distinguishes as a special case complex types having simple content: elements with such types may have attributes, but they cannot have nested elements.
This is presumably a useful distinction, but it does introduce one more layer of complexity into the syntax for complex type derivation.

65 Basic Forms of Complex Type Definition
(Table with columns Restriction and Extension; upper row complex content, lower row simple content.)

Restriction, complex content (upper left):
    <complexType>
      <complexContent>
        <restriction base="type">
          allowed element content
          allowed attributes
        </restriction>
      </complexContent>
    </complexType>

Extension, complex content (upper right):
    <complexType>
      <complexContent>
        <extension base="type">
          extra element content
          extra attributes
        </extension>
      </complexContent>
    </complexType>

Restriction, simple content (lower left):
    <complexType>
      <simpleContent>
        <restriction base="type">
          facet restrictions
          allowed attributes
        </restriction>
      </simpleContent>
    </complexType>

Extension, simple content (lower right):
    <complexType>
      <simpleContent>
        <extension base="type">
          extra attributes
        </extension>
      </simpleContent>
    </complexType>

66 Remarks
When one restricts a type one generally must specify all allowed element content and attributes.
When one extends a type one generally must specify just the extra element content and attributes.

67 Requirements on Base Type
The base type must be a complex type in all cases except simpleContent/extension (lower right in table), in which case the base can be a simple type.
If the derived type has complexContent, the base type must have complex content.
• True for extension or restriction.
• Under some conditions, using a special form described later, a base type with complex content can be restricted to a type with simple content.

68 Schematic Inheritance Diagram
[Diagram: derivation hierarchy rooted at xsd:anyType. Simple Types are derived by restriction, list and union; Complex Types with Simple Content by restriction† and extension; Complex Types with Complex Content by restriction and extension. † see later for syntax]

69 Defining a Complex Type with no Base?
In the introductory lecture we seemed to avoid this complexity: didn’t we just define complex types out of “thin air”?
Actually the XML Schema specification says that:
    <complexType>
      allowed element content
      allowed attributes
    </complexType>
is “shorthand” for
    <complexType>
      <complexContent>
        <restriction base="xsd:anyType">
          allowed element content
          allowed attributes
        </restriction>
      </complexContent>
    </complexType>
So in reality we were directly restricting the ur-type, which allows any attributes and any content!

70 Defining Element Content
Where we wrote allowed element content or extra element content in the syntax for complex type definitions, what should appear is a model group.
A model group is exactly one of:
• an <xsd:sequence/> element, or
• an <xsd:choice/> element, or
• an <xsd:all/> element.
(The element content appearing in the type definition may also be a globally defined model group, referenced through an <xsd:group/> element. The global definition, a named <xsd:group/> element, just contains one of the three elements above.)

71 Sequence
A <xsd:sequence/> model group contains a series of particles.
A particle is an <xsd:element/> element, another model group, or a wildcard. As expected, this model just says the element content represented by those items should appear in sequence. E.g. <xsd:sequence> <xsd:element ref="title"/> <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> says that exactly one occurrence of a title element is followed by any number of occurrences of paragraph elements. 72 Choice A <xsd:choice/> model group also contains a series of particles, with the same options as for sequence. The element information validated by this model should match exactly one of the particles in the choice. E.g. <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="paragraph"/> <xsd:sequence> <xsd:element ref="figure"/> <xsd:element ref="caption"/> </xsd:sequence> </xsd:choice> matches a sequence of paragraph elements interleaved with consecutive pairs of figure and caption elements. 73 All The <xsd:all/> model group is peculiar to XML Schema. All particles it contains must be <xsd:element/>s. The element information validated should match a sequence of the particles in any order. There are several constraints: • The maxOccurs attribute of each particle must be 1. • The minOccurs attribute of each particle must be 0 or 1. • The <xsd:all/> model group can only occur at the top level of a complex type’s content model, and must itself have minOccurs = maxOccurs = 1. In view of the fact minOccurs of a particle can be 0, subset might be a better name than all?? 74 Element Wildcard The element wildcard particle <xsd:any/> matches and validates any element in the instance document. • Though one can restrict the namespace of the matched element, as described below. E.g. <xsd:sequence minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="header"/> <xsd:any/> </xsd:sequence> matches a sequence of consecutive pairs of elements, where the first element in each pair is a header, and the second can be any kind of element.
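As a hedged illustration of the wildcard sequence above, here is a sketch of an instance fragment it could match. The container element sections is hypothetical (it is assumed to have the sequence above as its content model), header is assumed to have string content, and paragraph and figure are assumed to be globally declared elements, since a wildcard with the default strict processing needs a declaration for each element it matches.

```xml
<!-- Hypothetical instance fragment: "sections" is an assumed container
     whose content model is the header/any sequence shown above. -->
<sections>
  <header>Introduction</header>
  <paragraph>Text of the first section.</paragraph>  <!-- matched by xsd:any -->
  <header>Results</header>
  <figure source="plot.png"/>                        <!-- matched by xsd:any -->
</sections>
```

Each header is followed by exactly one arbitrary element. A header followed directly by another header would only validate if header itself is an element the wildcard is allowed to match.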
75 Options on <xsd:any/> The <xsd:any/> element takes the usual optional maxOccurs, minOccurs attributes. It allows a namespace attribute taking one of the values: • ##any (the default), • ##other (any namespace except the target namespace), • List of namespace names, optionally including either ##targetNamespace or ##local. This controls which elements the wildcard matches, according to namespace. It also allows a processContents attribute taking one of the values strict, skip, lax (default strict), controlling the extent to which the contents of the matched element are validated. 76 Parsing and Determinism Recall the rule about determinism of content models in DTDs. We claimed XML retained this purely for compatibility with SGML. Perhaps surprisingly, XML Schema retains exactly the same rule, calling it the Unique Particle Attribution constraint. It has to be imposed slightly more carefully here because of the possibility of wildcard particles and substitution groups (discussed later). • It is unclear why it was retained. Perhaps to improve the efficiency of parsing, especially in the presence of substitution groups? Or to simplify the Particle Derivation OK constraints for restriction of complex types (see later)? 77 Mixed Content XML Schema scores a big win over DTDs in the way mixed content is handled. One simply specifies the attribute mixed on the complexContent element, giving it the value true. • In the abbreviated form for restriction of the ur-type, the mixed attribute appears on the complexType element. This specifies that the element content defined by the model particles can be interleaved with character data (without limiting how the elements themselves are arranged).
78 Mixed Content Example This element declaration <xsd:element name="body"> <xsd:complexType mixed="true"> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="p"/> <xsd:element ref="a"/> </xsd:choice> </xsd:complexType> </xsd:element> allows the body element to contain <p/> and <a/> elements, with text interleaved anyhow between them. 79 mixed and Inheritance So an <xsd:complexContent/> with mixed="true" indicates a mixed complex type. And an <xsd:complexContent/> with mixed="false" (the default) indicates an element-only complex type. A mixed complex content type may be restricted to an element-only type (if the element content allows it). Perhaps surprisingly, an element-only complex content type may not be extended to a mixed type. 80 Restricting Mixed Content to Simple Content If the model group of a mixed complex type can match the empty sequence of elements, then the type may have content that is text-only. Then it is logically possible to restrict the type to one with simple content. There is a special syntax for this: <complexType> <simpleContent> <restriction base="mixed-complex-content-type"> <simpleType> usual content of simpleType element </simpleType> allowed attributes </restriction> </simpleContent> </complexType> 81 Expanded Complex Type Inheritance [Diagram: the earlier inheritance hierarchy rooted at xsd:anyType, expanded so that Complex Content types are divided into Mixed and Element-only, with restriction and extension arrows among Simple Content, Mixed and Element-only types.] 82 Empty Elements XML Schema doesn’t have any unique way of representing elements that must be empty. The simplest way to do this is to omit the allowed element content in a complex content restriction. Can such an element also be mixed (i.e. have pure text content)? • Logically it seems this should be possible (I believe it is allowed by Xerces). • But it seems to be forbidden by the XML Schema specification, which singles out this case and says such an element is strictly empty.
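A minimal sketch of the empty-element idiom just described, using the abbreviated form of complex type definition with no model group at all. The element name separator and its optional kind attribute are hypothetical names chosen for illustration.

```xml
<!-- Hypothetical declaration: no model group appears in the type,
     so the element must have empty content. -->
<xsd:element name="separator">
  <xsd:complexType>
    <xsd:attribute name="kind" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>

<!-- Instances:
     <separator/>                 valid
     <separator kind="thick"/>    valid
     <separator>text</separator>  invalid: content must be empty -->
```

This is the same shape as the figure declaration from the introductory lecture, which carries only a source attribute.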
83 Attributes and Local Declarations 84 Defining Allowed Attributes Where we wrote allowed attributes or extra attributes in the syntax for complex type definitions, what should appear is a sequence of attribute declarations in the form of <xsd:attribute/> elements. • These may be followed by an optional attribute wildcard. (The attribute declaration list may also include globally defined attribute groups, referenced through <xsd:attributeGroup/> elements. These will be discussed later.) 85 Simple Attribute Declarations A straightforward example of an attribute declaration was given in the introductory lecture: <xsd:element name="figure"> <xsd:complexType> <xsd:attribute name="source" type="xsd:string"/> </xsd:complexType> </xsd:element> In general the value of the type attribute can be any simple type. • Though unusual, it is also allowed to include an anonymous <xsd:simpleType/> definition in the body of the <xsd:attribute/>, instead of specifying the type attribute. 86 Default Rules As with DTDs, one can specify whether the use of an attribute is optional (the default) or required. One can also specify a default value (if the attribute is optional). Alternatively one can specify a fixed value for the attribute (whether the attribute is optional or required). • default and fixed are mutually exclusive. 87 DTD Attribute Defaults Revisited Attribute list declaration: <!ATTLIST a val CDATA "nothing" fix CDATA #FIXED "constant" req CDATA #REQUIRED opt CDATA #IMPLIED> Instances of element a: <a val="something" fix="constant" req="reading" opt="extra"/> <a req="no experience"/> <!-- OK: val = "nothing", fix = "constant", opt absent. --> <a fix="variable"/> <!-- Invalid! fix not "constant" and req unspecified.
--> 88 Schema Attribute Occurrence Equivalent Schema declaration: <xsd:attribute name="val" type="xsd:string" use="optional" default="nothing"/> <xsd:attribute name="fix" type="xsd:string" fixed="constant"/> <xsd:attribute name="req" type="xsd:string" use="required"/> <xsd:attribute name="opt" type="xsd:string"/> • Note fix and opt implicitly have use="optional" (we could have omitted this specification for val too). • Unlike DTDs, it is possible to have an attribute that is both fixed and required. 89 Complex Content Plus Attributes Putting things together, here is a declaration of a body element that allows mixed content plus a style attribute. <xsd:element name="body"> <xsd:complexType mixed="true"> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="p"/> <xsd:element ref="a"/> </xsd:choice> <xsd:attribute name="style" type="xsd:string"/> </xsd:complexType> </xsd:element> 90 Simple Content plus Attributes Here is a declaration of an anchor element that allows simple content plus an href attribute. <xsd:element name="anchor"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="href" type="xsd:anyURI"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element> 91 Attribute Wildcards An attribute wildcard is represented by an <xsd:anyAttribute/> element. There can be at most one such element in a complex type definition, and it must appear after any normal attribute declarations. Such a declaration allows any attribute, optionally limited by namespace. The namespace and processContents attributes on <xsd:anyAttribute/> work as for <xsd:any/>. 92 Attributes and Namespaces By default, attributes declared as we have illustrated (inside an <xsd:complexType/>) do not become part of the target namespace. • Instead these attributes are local properties of any element they are attached to. The element itself may or may not belong to a namespace.
In instance documents, names of these attributes must not be prefixed with a namespace prefix. 93 Creating Attributes in a Namespace There are three ways to put attributes into the target namespace: • Declare them “globally”, directly inside the top level <xsd:schema/> element. Reference the attribute declaration inside the complex type definition (like element references), or • specify the attribute form="qualified" on a local <xsd:attribute/> declaration, or • specify the attribute attributeFormDefault="qualified" on the <xsd:schema/> element. After this, these attributes must be prefixed in instance documents with a namespace prefix. • Recall default namespace declarations (xmlns="namespace") don’t work for attributes: you must introduce a non-empty prefix. 94 Locally Declared Elements XML Schema goes to some lengths to maintain symmetry between elements and attributes. Because the most natural way of declaring attributes is locally— private to a complex type—it must therefore be possible to declare elements local to the complex type. • Even if this is less obviously natural for elements—it leads to some clumsy constraints, e.g.: two local element declaration particles with the same name in the model group of the same complex type must have the same type. The same rules apply: if an element is declared locally (inside an <xsd:complexType/>), by default it does not belong to a namespace. In this case its name must not be prefixed with a namespace prefix in instance documents. 95 Creating Elements in a Namespace There are three ways to put elements into the target namespace: • Declare them “globally”, directly inside the top level <xsd:schema/> element. Reference the element declaration inside the complex type definition, or • specify the attribute form="qualified" on a local <xsd:element/> declaration, or • specify the attribute elementFormDefault="qualified" on the <xsd:schema/> element. 
After this, these elements must be prefixed in instance documents with a namespace prefix (or there must be a default namespace declaration in effect). 96 elementFormDefault and attributeFormDefault Summary: • These attributes on the <xsd:schema/> element take the values “qualified” or “unqualified” • The defaults for both are “unqualified”. • They control whether or not elements and attributes declared locally in <xsd:complexType/> definitions belong to the target namespace. • This property can also be controlled by form attributes on the individual declarations. None of these attributes has any effect on elements or attributes declared globally (at the top level in the <xsd:schema/> element)! Effectively such declarations are all qualified. 97 Inheritance and Substitution 98 Polymorphism? We have presented the mechanisms by which new types can be derived from old types (albeit we have omitted some details for complex types). Through these mechanisms, inheritance provides useful ways to recycle existing definitions. But it doesn’t in itself provide all the benefits of OOP— in particular we have not presented any analogue of polymorphism. Schema tries to provide some of the OO flexibility in use of instances through type substitution and substitution groups. 99 Type Substitution The most basic mechanism for “polymorphism” is type substitution. In essence this says that if a particle (in a content model, say) is declared to be an element with a particular type, then the corresponding element item in the instance document may have type derived from the particle type. Actually this only introduces new possibilities if the derivation involves extension. 
100 A Basis for Extension Suppose we have the complex type declaration: <xsd:complexType name="figureType"> <xsd:attribute name="source" type="xsd:anyURI"/> </xsd:complexType> and suppose this is used as follows: <xsd:element name="figure" type="figureType"/> <xsd:element name="report"> <xsd:complexType> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="paragraph"/> <xsd:element ref="figure"/> </xsd:choice> </xsd:complexType> </xsd:element> i.e. a report is a sequence of interleaved paragraph and figure elements, and a figure just has an attribute referencing a source image file. 101 Extension Example Now suppose that, without modifying any existing definitions and declarations, we want to allow figures in reports to have captions. We can do this if we introduce the extended type: <xsd:complexType name="captionFigureType"> <xsd:complexContent> <xsd:extension base="figureType"> <xsd:sequence> <xsd:element name="caption" type="xsd:string"/> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> • This complex type inherits the attribute source from its base type, and adds a nested caption element. 102 Example Instance Document <report xmlns="http://www.grid2004.org/ns/report4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid2004.org/ns/report4 report4.xsd"> <paragraph>Recently uncovered documents prove... </paragraph> <figure xsi:type="captionFigureType" source="notafake.jpg"> <caption>Irrefutable proof of ancient XML.</caption> </figure> </report> 103 xsi:type As illustrated above, the element information item may have any type derived by extension from the type the element was declared with. • In general it may be derived by a mixture of extension and restriction. This isn’t quite a free lunch, though. There is no way for an XML processor to automatically infer the type of an element instance; instead this approach requires the XML author to specify the intended type explicitly, using the xsi:type attribute.
• This limits the attractiveness of this approach to “polymorphism”. 104 Substitution Groups A more author-friendly approach to document polymorphism is based on element declarations. This approach uses so-called substitution groups. • Each substitution group is a set of element declarations. • One of these is singled out as the head declaration. Where a content model includes a reference to the head as a particle, the instance document can have any member of the associated substitution group. 105 Substitution Group Example Suppose the earlier definitions of figureType, <figure/>, <report/>, and captionFigureType are in effect. Now suppose we declare a new element <captionFigure/>, having type captionFigureType, and belonging to a substitution group headed by <figure/>. Then a possible instance document would be: <report … > <paragraph>Recently uncovered documents prove... </paragraph> <captionFigure source="notafake.jpg"> <caption>Irrefutable proof of ancient XML.</caption> </captionFigure> </report> 106 Remarks Important things to note: • Again we haven’t modified the original declaration of the report element, which still says it contains figure elements. • Because captionFigure is in the substitution group of figure, automatically it is allowed to appear in place of figure in the instance. • We no longer need the clumsy xsi:type attribute; the actual type of the information element can now be easily inferred from the element name (through its declaration, described shortly). 107 Creating Substitution Groups Groups are implicit: the implementation is more like a new kind of inheritance hierarchy—one relating element declarations rather than type definitions. A new element declaration specifies at most one direct substitution group affiliation. This is another element declaration. The “affiliation” now heads a group containing the new declaration. 
• In practice an affiliation works almost exactly like a base type, except it involves element declarations, not types. • If the affiliation itself belongs to a different group, the new declaration automatically joins that group—generally an element can be in several (perfectly nested) groups. 108 Group Creation Example In our example we could declare captionFigure, as follows: <xsd:element name="captionFigure" type="captionFigureType" substitutionGroup="figure" /> • This says <figure/> is the substitution group affiliation of <captionFigure/>. • Or in other words <captionFigure/> is in the substitution group headed by <figure/>. • The type attribute here may be omitted: the type defaults to that of the substitution group affiliation (again emphasizing the analogy with inheritance). 109 Notional Substitution Group Hierarchy This way of looking at things isn’t part of the XML Schema specification, but it may be mnemonic: <xsd:any/> <figure/> <report/> <captionFigure/> 110 Substitution and Type Inheritance It is required that all elements in a substitution group headed by element <Name/> have either the same type as <Name/>, or a type derived from it by steps of extension and restriction. Note that substitution may be used without type inheritance. • In other words, all elements in the substitution group may have the same type as their head. • Consider the example of internationalization: you might want many interchangeable elements with identical structure but different names (for different languages). 111 Blocking Substitutions We have described two kinds of substitution involving an element: the structure of an element can be substituted using xsi:type, or the whole element can be substituted by a member of its substitution group. It is quite likely that a schema writer will want to block some such substitutions. • Many applications will require elements to have exactly the originally specified form. 
• We need a way to prevent this form being corrupted by (say) unexpected addition of an element to a substitution group. 112 block Attribute of <xsd:element/> The value of the block attribute on <xsd:element/> should be a list containing a possibly empty subset of the values extension, restriction, and substitution (or simply #all). It defines the disallowed substitutions for this element. • If a particle in a content model has substitution in its disallowed substitutions, the document instance may not replace the element by members of its substitution group. • If an element has extension in its disallowed substitutions, then neither xsi:type nor a substitution group substitution allows the instance to validate against a type whose derivation from the particle type involves steps of extension. • Appearance of restriction in the disallowed substitutions has an analogous effect. 113 block Attribute of <xsd:complexType/> A block attribute may also be specified on the <xsd:complexType/> element. Its value is a list containing a subset of the values extension and restriction (or simply #all). It defines the prohibited substitutions for this type. • If the type of an element has extension in its prohibited substitutions, then neither xsi:type nor a substitution group substitution is allowed to validate the instance against a type whose derivation from the particle type involves any extension steps. Such validation is also prevented if the prohibited substitutions of any intervening types in the chain of derivation include extension. • Appearance of restriction in the prohibited substitutions has an analogous effect. Note the block attributes of <complexType/> and <element/> are independent, and constraints from both must be satisfied. • But it is “as if” an element acquires all blocked substitutions of its type. 114 blockDefault Attribute of <xsd:schema/> Unless otherwise specified, all substitutions are allowed.
You may want to change this globally to something more conservative. Do this by specifying the blockDefault attribute on the <xsd:schema/> element. • Allowed values for this attribute are the same as for the block attribute on <xsd:element/>. 115 Prohibiting Derivation The final attribute on <xsd:complexType/> works in the same way as the corresponding attribute for <xsd:simpleType/>. Its value may be either a list containing a subset of the values extension and restriction, or simply #all. It prohibits either or both kinds of derivation using this type as base. Although final and block can be used to similar ends, their modus operandi are quite different: • final controls how you define new types derived from this type. • block controls how you substitute elements of this type in the document instance. 116 Substitution Group Exclusions An <xsd:element/> declaration likewise allows a final attribute, with the same allowed values as final on <xsd:complexType/>. Its value defines the substitution group exclusions for this element, which control its use as the head of a substitution group. • If an element has extension in its substitution group exclusions, it may not be the substitution group affiliation of another element whose type is derived from the type of this element by steps including extension. • Appearance of restriction in the substitution group exclusions has an analogous effect. By all rights, it should be possible to put substitution in this set. But it isn’t! 117 finalDefault Attribute of <xsd:schema/> For completeness we mention that the <xsd:schema/> element allows a finalDefault attribute, which works in a way very much analogous to the blockDefault attribute. 118 Still to Come on Inheritance By no means have we yet covered every aspect of inheritance. Notably we haven’t discussed what exactly is a legal restriction or extension of a complex type (particularly with respect to the content model). 
This is quite complicated in general, and it will be covered in the final section. 119 XML Schema Identity Constraints 120 Identifiers and References Revisited Slightly extended version of an example from the lectures on DTDs: <agency> <agent name="Alice" boss="Alice"/> <agent name="Bob" boss="Alice"/> <agent name="Carole" boss="Alice"/> <agent name="Dave" boss="Bob"/> </agency> Using DTDs, we assumed attribute name was declared with type ID, and attribute boss was declared with type IDREF. [Diagram: the resulting hierarchy of agents Alice, Bob, Carole and Dave.] 121 Identity Constraints Recall that the attribute types ID and IDREF imply interesting constraints on values of those attributes: • Within any individual XML document, every attribute of type ID must be specified with a different value from every other attribute of type ID. • The value of any attribute of type IDREF must be the same as the value of an attribute of type ID specified somewhere in the same document. These properties are obviously very useful and natural if we need to identify individual elements in a document. XML Schema supports the ID and IDREF simple types. But it also introduces additional, much more general mechanisms for achieving similar ends. 122 Use of XPath In an earlier lecture-set we gave a brief introduction to XPath. • Recall that XPath is a notation for representing a subset of nodes in a single XML document. The basic idea of XML Schema identity constraints is to use XPath expressions to identify groups of “fields” within an XML document that act as either identifiers or references. • Uniqueness/existence constraints hold within/across these groups. More flexible than the DTD mechanism, because: • XPath allows one to single out more refined sets of fields. • May have multiple categories of identifier in the same document.
123 Example <xsd:element name="agency"> <xsd:complexType> <xsd:sequence> <xsd:element ref="agent" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> <xsd:key name="agentName"> <xsd:selector xpath="agent"/> <xsd:field xpath="@name"/> </xsd:key> <xsd:keyref refer="agentName" name="agentBoss"> <xsd:selector xpath="agent"/> <xsd:field xpath="@boss"/> </xsd:keyref> </xsd:element> 124 General Remarks The element <xsd:key/> defines a key field called agentName. The element <xsd:keyref/> defines a key reference field called agentBoss. These definitions are inside the declaration of the element <agency/>. • This implies that the scope of the uniqueness and related constraints is an individual <agency/> element. • This may or may not be the top-level element of a document. The fields themselves are specified by XPath expressions (details follow). 125 Defining a Key We have the example: <xsd:key name="agentName"> <xsd:selector xpath="agent"/> <xsd:field xpath="@name"/> </xsd:key> • The name of the key is agentName. • The <xsd:selector/> element defines the set of nodes labeled by this key. In our case, it is the set of all agent elements nested directly in the agency element. • The <xsd:field/> element defines the field within each labeled node that acts as the key. In our case, the name attribute of the node. 126 Validity Constraints on Keys Every node identified by the XPath expression in the <xsd:selector/> element must have exactly one descendant node identified by the XPath expression in the <xsd:field/> element. • This descendant, whose value is the key field, must be an attribute or an element with simple type. No two nodes identified by <xsd:selector/> may have the same value for their key fields. • This constraint holds within the body of the scope element (the <agency/> element in our example). • But the same value of the key field is allowed on different <agent/> nodes inside different <agency/> elements.
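To make these constraints concrete, here is a sketch of an instance that would violate the agentName key and the agentBoss key reference. It assumes the agency declaration from the example, and also assumes that the name attribute of agent is declared optional, so that the missing name is rejected by the key rather than by the attribute declaration.

```xml
<!-- Hypothetical invalid instance, annotated with the rule each agent breaks. -->
<agency>
  <agent name="Alice" boss="Alice"/>
  <agent name="Bob"   boss="Alice"/>
  <agent name="Bob"   boss="Alice"/>  <!-- invalid: key value "Bob" is repeated -->
  <agent              boss="Alice"/>  <!-- invalid: selected node has no key field -->
  <agent name="Eve"   boss="Zed"/>    <!-- invalid: "Zed" matches no agentName key -->
</agency>
```

A second agency element elsewhere in the same document could reuse the names Alice and Bob, because each agency element is a separate scope.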
127 Defining a Key Reference We have the example: <xsd:keyref refer="agentName" name="agentBoss"> <xsd:selector xpath="agent"/> <xsd:field xpath="@boss"/> </xsd:keyref> • The refer attribute is the name of the key to which we refer. • The <xsd:selector/> and <xsd:field/> elements identify the nodes whose values are the actual references. They work in essentially the same way as in <xsd:key/>. The two-stage approach to identifying the relevant fields is less obviously natural in this case. But it supports the generalization to multiple key fields, described below. • The name of the key reference is agentBoss: this attribute is required (though unclear what this name is used for??) 128 Multiple Key Fields A <xsd:key/> element can have multiple <xsd:field/> elements, e.g.: <xsd:key name="fullName"> <xsd:selector xpath=".//person"/> <xsd:field xpath="@firstName"/> <xsd:field xpath="@lastName"/> </xsd:key> • For validity, this implies every <person/> element in scope has firstName and lastName attributes with unique pairwise-combined values. A <xsd:keyref/> element that refers to this key must have exactly the same number of <xsd:field/> elements. 129 Relating Key References to Keys The fact that keys and key references are scoped to element declarations introduces some “interesting” complications. Things might be straightforward if a <keyref/> always referred to a <key/> defined in the same element declaration. You might be forgiven for thinking this should “obviously” be the case. But actually the Schema specification allows a <keyref/> to refer to a <key/> defined in a different element declaration. 130 Referencing Keys in Nested Elements Suppose a key, Key, is defined in the declaration of element B. Also suppose a key reference, Ref, refers to this key and is defined in the declaration of element A. Now a field of Ref, scoped to an instance of A, is allowed to point to fields of Key scoped to an instance of B that is a descendant of the A instance.
• This can lead to ambiguous references, because the key uniqueness constraints apply only within a single B instance, and there could be several Bs nested in the A instance. • The specification gives a slightly clumsy recipe for resolving such ambiguities (illustrated below). 131 Features The rule on the previous slide can introduce interesting behavior even when the <xsd:keyref/> and the <xsd:key/> are defined in the same element declaration. • This can happen if instances of the element can nest inside one another. In the example on the next slide, the key is the value of <key/> elements directly nested inside a <scope/> element, and the reference is the value of a <ref/> element directly nested in a <scope/> element. The <scope/> elements are also allowed to self-nest. 132 An Interesting Case <xsd:element name="scope"> <xsd:complexType> <xsd:choice minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="key"/> <xsd:element ref="ref"/> <xsd:element ref="scope"/> </xsd:choice> </xsd:complexType> <xsd:key name="key"> <xsd:selector xpath="key"/> <xsd:field xpath="."/> </xsd:key> <xsd:keyref refer="key" name="ref"> <xsd:selector xpath="ref"/> <xsd:field xpath="."/> </xsd:keyref> </xsd:element> 133 Examples 1st example (top left): <scope> <scope> <key>keyval</key> </scope> <ref>keyval</ref> </scope> 2nd example (top right): <scope> <scope> <key>keyval</key> </scope> <key>keyval</key> <ref>keyval</ref> </scope> 3rd example (bottom left), illegal: <scope> <scope> <key>keyval</key> </scope> <scope> <key>keyval</key> </scope> <ref>keyval</ref> </scope> 4th example (bottom right): <scope> <scope> <key>keyval</key> </scope> <scope> <key>keyval</key> </scope> <key>keyval</key> <ref>keyval</ref> </scope> 134 Remarks Examples here follow the rules in the section of the XML Schema specification called: Schema Information Set Contribution: Identity-constraint table. The rule is basically that a key reference can refer to a key field scoped to a descendant element.
But if there are conflicts, you ignore any potential reference targets arising from children (this rule applies recursively). In the 3rd example (bottom left), all potential targets arise from children, and are conflicting, so they should be ignored. Thus the reference is illegal. • The 2nd and 4th examples are OK: conflicts are resolved by ignoring targets from children, leaving just the local target. • Xerces 2.6.2, however, also accepts the 3rd example! 135 Uniqueness Constraints The <xsd:unique/> element works almost exactly like the <xsd:key/> element, except that it is not required that the identifying fields exist for every node identified by the selector. • If fields exist in the node instance, they must be unique across all selected nodes. A unique constraint cannot be the target of a keyref. 136 Namespaces The examples given in this section were simplified in that the XPath expressions did not allow for a target namespace. Recall that XPath ignores default namespace declarations: an element that belongs to a namespace can only be matched by a prefixed (qualified) name. If you are using identity constraints in a schema with a target namespace, you must declare a prefix for that namespace, and use that prefix on (say) element names appearing in the xpath attributes. 137 Imports and Includes [To Be Added] 138 “Particle Derivation OK” 139 Inheritance in OOP and XML We saw that XML Schema makes heavy use of a concept of type inheritance. This concept is clearly inspired by the corresponding concept in Object Oriented Programming. But the analogy between XML and OOP is by no means exact. In OOP, a class has a set of disjoint, essentially independent, named members (fields and methods). • In derivation, this set can be extended, or named members can be individually overridden. In XML, a complex type has a set of attributes and a content model. • The attributes behave much like the independent members of a class, and the set of attributes can naturally be extended during derivation. • The analogy works much less well for content models.
The complex ordering and nesting relations within element content limit the options for extension. • And, while perhaps more “mathematically natural” than extension, we will see restriction of content models has its own implementation problems. 140 Extension and Restriction Unlike typical OOP programming languages, XML Schema distinguishes two different forms of type derivation, called extension and restriction. • The analogy between Schema type extension and OOP inheritance should be fairly clear. • The analogy between Schema type restriction and OOP inheritance may be less obvious. • It is based on the insight that when a new class is derived, the new constructors and methods generally introduce new sets of constraints or restrictions (“invariants”) on members already in the base class. Consider a class Square, which may be derived from a base class Rectangle. The derived class imposes the new invariant width=height. So OOP inheritance includes aspects of both extension and restriction. 141 Attributes and Complex Type Extension Recall typical syntax for extension is like: <complexType> <complexContent> <extension base="base-type"> extra element content extra attributes </extension> </complexContent> </complexType> The extra attributes are generally just added to the set of attributes of base-type. Some attributes in extra attributes may have the same name (and namespace) as attributes in base-type; any such attribute must also have identical type to its namesake in base-type. • But the new version could have a different default value, say. If extra attributes includes an attribute wildcard, it must represent a superset of any attribute wildcard in base-type. 
142 Attributes and Complex Type Restriction If an attribute appearing in a restriction of a complex type is also an explicitly declared attribute of the base-type, then: • The simple type of the attribute in the new type must be identical to the attribute’s type in the base-type, or derived from it by steps of restriction. • If the attribute is fixed in the base-type, it must be fixed with the same value in the new type. • If the attribute is required in the base-type, so must it be in the new type. Otherwise, there must be a wild-card in the base-type that matches the attribute declared in the new type. Note: • If an attribute is required in the base-type, it must be an explicitly declared attribute of the new type. • If an attribute was optional in the base-type, it may be specified in the new type with use="prohibited". This is the same as omitting the attribute in the new type (and the attribute might still be allowed by a wildcard!) If there is an attribute wild-card in the restricted type, it must be a subset of a wild-card in base-type. 143 Content Models and Extension Consider an extension of a complex type with complex content that adds non-empty extra element content. The extra element content must be a particle, and the element content of the new type is <xsd:sequence> base-type element content extra element content </xsd:sequence> (unless the base-type content model was empty, when it is just the extra element content). Notes: • This would be illegal if the base-type element content was an <xsd:all/> particle. You can’t extend such content. • If the base-type element content is an <xsd:choice/>, there is no way to extend the set of choices: can only add extra particles in sequence. 
144 Content Models and Restriction The idea of restricting a content model is fairly intuitive, e.g.: • Where there is an <xsd:choice/> of several particles, the restricted model may offer a reduced choice; perhaps it replaces the <xsd:choice/> with just one of the particles it contained. • Where there is an optional particle (say minOccurs="0" and maxOccurs="1") the restricted model might make the particle mandatory (minOccurs="1") or, conversely, simply omit it. More generally the restricted model may subset the minOccurs..maxOccurs range as it sees fit. • Where there is an <xsd:any/> wildcard (or an element particle that heads a substitution group) the restricted model might replace it by a more specific element particle. Although these ideas seem intuitive, it isn’t particularly easy to prove automatically that one content model is a valid restriction of another. 145 Particle Derivation OK Defining the conditions under which one particle is a legal restriction of another particle is one of the more complex parts of the (generally quite complex) XML Schema specification. You will find the rules in the section of the specification called Constraints on Particle Schema Components. The relevant subsections start with the rule called Particle Valid (Restriction). This gives some rules for reducing particles to a “canonical” form, then delegates to more specialized rules with names like Particle Derivation OK (X:Y – R), where X, Y, R depend on the case. 146 Canonical Form Before comparing two particles to see if one is a valid restriction of the other, both should be reduced to a certain canonical form: • Any occurrence of an element particle that is the head of a substitution group is replaced by an explicit <xsd:choice/> between element particles for all members of the substitution group. • Empty groups are discarded. • Redundant singleton <xsd:sequence/>, <xsd:choice/>, <xsd:all/> particles are replaced by the single particle they contain.
• An “associative rule” is applied to eliminate <xsd:sequence/> particles nested inside other <xsd:sequence/> particles (subject to some conditions on minOccurs, maxOccurs). Likewise for <xsd:choice/>. 147 Comparing <sequence/> with <sequence/> There are many specific versions of the Particle Derivation OK rule: basically one for every kind of particle you might try to restrict to any other kind of particle. We don’t attempt to mention all of them here, just a couple of interesting cases. For example, consider the case where you are trying to restrict an <xsd:sequence/> particle in an existing content model to an <xsd:sequence/> particle in a new content model. The exact rule that takes care of this case is called Particle Derivation OK (All:All, Sequence:Sequence -- Recurse). 148 All:All, Sequence:Sequence -- Recurse The occurrence range (minOccurs, maxOccurs) of the restricted <sequence/> must lie within the occurrence range of the original <sequence/>. Less trivially, there must exist an order-preserving mapping from the particles in the restricted <sequence/> to particles of the original <sequence/>, such that: • Each particle in the restricted <sequence/> is a valid restriction of its image particle (under the map). Here we recursively apply the definition of the Particle Derivation OK, hence the Recurse in the title. • Any particle of the original <sequence/> that is not in the range of the map is emptiable, i.e. can match empty content. It happens that the same rule is used for <all/> groups, hence the All:All in the title.
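The two conditions of the Recurse rule can be made concrete with a small sketch. This is only an illustration under simplifying assumptions, not the specification's algorithm: particles are modelled as plain element particles with a name and an occurrence range, the recursive "valid restriction" test is replaced by a stand-in (same name, narrower range), "emptiable" is taken to mean minOccurs is 0, and "order-preserving" is read as strictly increasing positions. The names Particle, restricts and map_satisfies_recurse, and the elements a, b, c, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Particle:
    """Simplified element particle: a name plus an occurrence range."""
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"


def restricts(base: Particle, derived: Particle) -> bool:
    """Stand-in (assumption) for the recursive Particle Derivation OK check:
    same element name, and the derived range lies inside the base range."""
    if base.name != derived.name or derived.min_occurs < base.min_occurs:
        return False
    if base.max_occurs is None:
        return True
    return derived.max_occurs is not None and derived.max_occurs <= base.max_occurs


def map_satisfies_recurse(original: Sequence[Particle],
                          restricted: Sequence[Particle],
                          mapping: Sequence[int]) -> bool:
    """Check one candidate map; mapping[i] is the index in `original`
    of the image of restricted[i]."""
    if len(mapping) != len(restricted):
        return False
    if any(not 0 <= j < len(original) for j in mapping):
        return False
    # Order-preserving, read here as strictly increasing image positions.
    if any(a >= b for a, b in zip(mapping, mapping[1:])):
        return False
    # Each restricted particle must be a valid restriction of its image.
    if not all(restricts(original[j], restricted[i]) for i, j in enumerate(mapping)):
        return False
    # Every original particle outside the range of the map must be emptiable.
    used = set(mapping)
    return all(p.min_occurs == 0 for j, p in enumerate(original) if j not in used)


original = [Particle("a"), Particle("b", 0, 1), Particle("c", 1, None)]
restricted = [Particle("a"), Particle("c", 1, 3)]
print(map_satisfies_recurse(original, restricted, [0, 2]))  # a->a, c->c, b skipped
print(map_satisfies_recurse(original, restricted, [0, 1]))  # c->b is not a restriction
```

Note that this function only verifies a map that is handed to it; the rule itself merely requires that some such map exists.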
149 Schematic Example Original: <xsd:sequence> <xsd:element ref="title"/> <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="figure" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> Restricted: <xsd:sequence> <xsd:element ref="title"/> <xsd:element ref="captionFigure" minOccurs="0" maxOccurs="unbounded"/> </xsd:sequence> Arrows illustrate an order-preserving map with required properties: • title particle is (trivially) a valid restriction of title particle, and captionFigure is a valid restriction of figure. • The original paragraph particle is not in the range of the map, but is emptiable (because minOccurs is 0). 150 Determinism? The requirement in the Sequence:Sequence—Recurse rule that “there exists” a suitable map looks rather cavalier: how are we to actually discover whether this map exists? • In other words, the rule doesn’t seem to give a deterministic prescription for checking whether one model is a restriction of the other. 151 A Prescription A “greedy” prescription that will sometimes find a suitable, order-preserving map is this: • Visit the particles of the restricted model in turn, trying to find a match for each. At any time we have a “next candidate” particle from the original model, for possible matching (initially the first particle of the original model). • If the current particle in the restricted model is a valid restriction of the “next candidate”, take the candidate as the mapping of the current particle and carry on to the next particles in both models. • Otherwise, if the current particle is not a valid restriction of the candidate, but the candidate is emptiable, try again with the immediately following particle in the original model as “next candidate”. • Otherwise, this prescription fails to find a map. 
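The greedy prescription above can be sketched in a few lines. Again this is a hedged illustration rather than what any real validator does: particles are simplified to element particles with a name and an occurrence range, and the recursive restriction test is a stand-in. The substitutions set is an assumption that simply encodes the earlier statement that captionFigure is a valid restriction of figure. The final check on left-over particles is not one of the bullets; it is taken from the emptiability condition of the Recurse rule. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Particle:
    """Simplified element particle: a name plus an occurrence range."""
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"


def restricts(base: Particle, derived: Particle,
              substitutions: FrozenSet[Tuple[str, str]] = frozenset()) -> bool:
    """Stand-in (assumption) for the recursive restriction check."""
    same = base.name == derived.name or (base.name, derived.name) in substitutions
    if not same or derived.min_occurs < base.min_occurs:
        return False
    if base.max_occurs is None:
        return True
    return derived.max_occurs is not None and derived.max_occurs <= base.max_occurs


def greedy_map(original: Sequence[Particle], restricted: Sequence[Particle],
               substitutions: FrozenSet[Tuple[str, str]] = frozenset()
               ) -> Optional[List[int]]:
    """Return the positions in `original` matched to each restricted particle,
    or None if the greedy prescription fails to find a suitable map."""
    mapping: List[int] = []
    candidate = 0  # index of the "next candidate" in the original model
    for current in restricted:
        while True:
            if candidate >= len(original):
                return None  # ran out of candidates
            if restricts(original[candidate], current, substitutions):
                mapping.append(candidate)
                candidate += 1
                break
            if original[candidate].min_occurs == 0:
                candidate += 1  # candidate is emptiable: skip it and retry
            else:
                return None
    # Left-over original particles are outside the range of the map,
    # so each of them must be emptiable.
    if any(p.min_occurs != 0 for p in original[candidate:]):
        return None
    return mapping


# The schematic example: title, paragraph*, figure*  restricted to  title, captionFigure*
original = [Particle("title"), Particle("paragraph", 0, None), Particle("figure", 0, None)]
restricted = [Particle("title"), Particle("captionFigure", 0, None)]
subs = frozenset({("figure", "captionFigure")})
print(greedy_map(original, restricted, subs))  # [0, 2]
```

The result [0, 2] corresponds to the arrows in the schematic example: title maps to title, captionFigure maps to figure, and the skipped paragraph particle is emptiable.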
152 A Case Where that Prescription Fails Original: <xsd:sequence> <xsd:element ref="paragraph" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="figure" minOccurs="0" maxOccurs="unbounded"/> <xsd:element ref="paragraph"/> </xsd:sequence> Restricted: <xsd:sequence> <xsd:element ref="paragraph"/> </xsd:sequence> The “greedy” prescription will try to match the paragraph particle in the restricted model to the first paragraph particle of the original model. But the resulting map is unsatisfactory, because then the final paragraph particle of the original model is not in the range of the map, nor is it emptiable. Meanwhile, in fact, “there exists” a suitable map: just map the paragraph particle of the restricted model to the final particle of the original. 153 Unique Particle Attribution to the Rescue!? But, the “Original” model on the previous slide is an illegal content model according to the Unique Particle Attribution rule! • Recall this is the XML Schema analogue of a rule about DTDs, which says content models must be “deterministic”. While the XML Schema specification doesn’t spell this out, it seems semi-plausible that, if content models satisfy the Unique Particle Attribution rule, then a simple greedy prescription will find the order-preserving mapping required by Particle Derivation OK, if such a mapping exists. • This makes checking Particle Valid (Restriction) tractable. 154 Clause 1.5 Finally, we note that there is a slightly mysterious clause in the section of the Schema specification called Schema Component Constraint: Derivation Valid (Extension), which is supposed to ensure that, in a chain of derivation, nothing removed by a restriction may be added back by a subsequent extension. • We omit the details here! The rule isn’t very clearly stated in the specification (IMHO). 155 Conclusion In this section we have just briefly touched on the issues of what constitutes a valid extension or restriction of a content model.
The general rules are complicated. If you intend to use these capabilities of XML Schema in non-trivial ways, expect surprises! 156