Transcript Slide 1
XML, DTD, XML Schema, and XSLT
Jianguo Lu
University of Windsor

Where we are
• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

Name Conflict
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>

<table>
  <name>African Coffee Table </name>
  <width>80</width>
  <length>120</length>
</table>
• Solution: add a prefix to the tag names
<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>

<f:table>
  <f:name>African Coffee Table </f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

Name spaces
[Diagram: an HTML name space and a Furniture name space, each containing the name table; other names shown: td, html, tr, body, width, name, price, th, height, length]

XML namespace
• An XML document may use more than one schema;
• Since each structuring document was developed independently, name clashes may appear;
• The solution is to use a different prefix for each schema
  – prefix:name
<prod:product xmlns:prod="http://example.org/prod">
  <prod:number> 557 </prod:number>
  <prod:size system="US-DRESS"> 10 </prod:size>
</prod:product>

Namespace names
• Namespace names are URIs
  – Many namespace names are in the form of HTTP URIs.
• The purpose of a namespace name is not to point to a location where a resource resides.
  – It is intended to provide a unique name that can be associated with a particular organization.
  – The URI MAY point to a schema.
<prod:product xmlns:prod="http://example.org/prod">
  <prod:number> 557 </prod:number>
  <prod:size system="US-DRESS"> 10 </prod:size>
</prod:product>

Namespace declaration
• A namespace is declared using an attribute whose name starts with "xmlns".
• You can declare multiple namespaces in one instance.
<ord:order xmlns:ord="http://example.org/ord"
           xmlns:prod="http://example.org/prod">
  <ord:number> 123ABC123</ord:number>
  <prod:product>
    <prod:number> 557 </prod:number>
    <prod:size system="US-DRESS"> 10 </prod:size>
  </prod:product>
</ord:order>

Default namespace declaration
• A default namespace declaration maps unprefixed element names to a namespace.
<order xmlns="http://example.org/ord"
       xmlns:prod="http://example.org/prod">
  <number> 123ABC123 </number>
  <prod:product>
    <prod:number> 557 </prod:number>
    <prod:size system="US-DRESS"> 10 </prod:size>
  </prod:product>
</order>

Scope of namespace declaration
• A namespace declaration can appear in any start tag.
• Its scope is the element in which it is declared, including that element's descendants.
<order xmlns="http://example.org/ord">
  <number> 123ABC123 </number>
  <prod:product xmlns:prod="http://example.org/prod">
    <prod:number> 557 </prod:number>
    <prod:size system="US-DRESS"> 10 </prod:size>
  </prod:product>
</order>

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
Note on xmlns:xsd: the elements and datatypes that are used to construct schemas
  - schema
  - element
  - complexType
  - sequence
  - string
come from the http://…/XMLSchema namespace.
Note on targetNamespace: indicates that the elements defined by this schema
  - BookStore
  - Book
  - Title
  - Author
  - Date
  - ISBN
  - Publisher
are to go in the http://www.books.org namespace.
From Costello

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
  <xsd:element name="BookStore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:string"/>
  <xsd:element name="ISBN" type="xsd:string"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
Note on xmlns: the default namespace is http://www.books.org, which is the targetNamespace!
Note on ref="Book": this is referencing a Book element declaration. The Book in what namespace? Since there is no namespace qualifier, it is referencing the Book element in the default namespace, which is the targetNamespace! Thus, this is a reference to the Book element declaration in this schema.
From Costello

Import in XML Schema
• Now that we understand namespaces, we can introduce some more advanced features of XML Schema.
• The import element allows you to access elements and types in a different namespace.
[Diagram: A.xsd (Namespace A) and B.xsd (Namespace B) are both imported by C.xsd]
C.xsd:
<xsd:schema …>
  <xsd:import namespace="A" schemaLocation="A.xsd"/>
  <xsd:import namespace="B" schemaLocation="B.xsd"/>
  …
</xsd:schema>

Example
[Diagram: Nikon.xsd, Olympus.xsd, Pentax.xsd, and Camera.xsd]
From Costello

Nikon.xsd:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.nikon.com"
            xmlns="http://www.nikon.com"
            elementFormDefault="qualified">
  <xsd:complexType name="body_type">
    <xsd:sequence>
      <xsd:element name="description" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Olympus.xsd:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.olympus.com"
            xmlns="http://www.olympus.com"
            elementFormDefault="qualified">
  <xsd:complexType name="lens_type">
    <xsd:sequence>
      <xsd:element name="zoom" type="xsd:string"/>
      <xsd:element name="f-stop" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Pentax.xsd:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.pentax.com"
            xmlns="http://www.pentax.com"
            elementFormDefault="qualified">
  <xsd:complexType name="manual_adapter_type">
    <xsd:sequence>
      <xsd:element name="speed" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
From Costello

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.camera.org"
            xmlns:nikon="http://www.nikon.com"
            xmlns:olympus="http://www.olympus.com"
            xmlns:pentax="http://www.pentax.com"
            elementFormDefault="qualified">
  <xsd:import namespace="http://www.nikon.com" schemaLocation="Nikon.xsd"/>
  <xsd:import namespace="http://www.olympus.com" schemaLocation="Olympus.xsd"/>
  <xsd:import namespace="http://www.pentax.com" schemaLocation="Pentax.xsd"/>
  <xsd:element name="camera">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="body" type="nikon:body_type"/>
        <xsd:element name="lens" type="olympus:lens_type"/>
        <xsd:element name="manual_adapter" type="pentax:manual_adapter_type"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Camera.xsd
Note on type="nikon:body_type": here I am using the body_type that is defined in the Nikon namespace.
From Costello

<?xml version="1.0"?>
<c:camera xmlns:c="http://www.camera.org"
          xmlns:nikon="http://www.nikon.com"
          xmlns:olympus="http://www.olympus.com"
          xmlns:pentax="http://www.pentax.com" ……>
  <c:body>
    <nikon:description>Ergonomically designed casing for easy handling</nikon:description>
  </c:body>
  <c:lens>
    <olympus:zoom>300mm</olympus:zoom>
    <olympus:f-stop>1.2</olympus:f-stop>
  </c:lens>
  <c:manual_adapter>
    <pentax:speed>1/10,000 sec to 100 sec</pentax:speed>
  </c:manual_adapter>
</c:camera>
Camera.xml

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.olympus.com"
            xmlns="http://www.olympus.com"
            elementFormDefault="qualified">
  <xsd:complexType name="lens_type">
    <xsd:sequence>
      <xsd:element name="zoom" type="xsd:string"/>
      <xsd:element name="f-stop" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.nikon.com"
            xmlns="http://www.nikon.com"
            elementFormDefault="qualified">
  <xsd:complexType name="body_type">
    <xsd:sequence>
      <xsd:element name="description" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
From Costello

Include
• The include element allows you to access components in other schemas
  – All the schemas you include must have the same namespace as your schema (i.e., the schema that is doing the include)
  – The net effect of include is as though you had typed all the definitions directly into the containing schema
[Diagram: LibraryEmployee.xsd and LibraryBook.xsd are both included by Library.xsd]
Library.xsd:
<xsd:schema …>
  <xsd:include schemaLocation="LibraryBook.xsd"/>
  <xsd:include schemaLocation="LibraryEmployee.xsd"/>
  …
</xsd:schema>
From Costello

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.library.org"
            xmlns="http://www.library.org"
            elementFormDefault="qualified">
  <xsd:include schemaLocation="LibraryBook.xsd"/>
  <xsd:include schemaLocation="LibraryEmployee.xsd"/>
  <xsd:element name="Library">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Books">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="Book" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="Employees">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="Employee" maxOccurs="unbounded"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Library.xsd
Note on ref="Book" and ref="Employee": these are referencing element declarations in other schemas.
From Costello

XML Path
• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

XPath
• Language for addressing parts of an XML document.
  – It operates on the tree data model of XML
• XPath is a syntax for defining parts of an XML document
• XPath uses paths to define XML elements
  – It has a non-XML syntax
• XPath defines a library of standard functions and operators
  – Such as arithmetic expressions.
• XPath is a major element in XSLT and XML query languages
• XPath is a W3C Standard

What is XPath
• Like traditional file paths
• XPath uses path expressions to identify nodes in an XML document.
  These path expressions look very much like the expressions you see when you work with a computer file system:
  – public_html/569/xml.ppt
  – Books/book/author/name/FirstName
• Absolute path
  – /library/author/book
• Relative path
  – author/book

XML path example
Path expressions:
  /library
  /library/author
  //author
  /library/@location
  //book[@title="Artificial Intelligence"]
Document:
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>

XML Path Example
• Address all author elements
  – /library/author
  – Addresses all author elements that are children of the library element node, which resides immediately below the root
  – /t1/.../tn, where each ti+1 is a child node of ti, is a path through the tree representation
• Address all author elements
  – //author
  – Here // says that we should consider all elements in the document and check whether they are of type author
  – This path expression addresses all author elements anywhere in the document

XPath example
• Select the location attribute nodes within library element nodes
  – /library/@location
  – The symbol @ is used to denote attribute nodes
• Select all title attribute nodes within book elements anywhere in the document, which have the value "Artificial Intelligence"
  – //book/@title[.="Artificial Intelligence"]
• Select all books with title "Artificial Intelligence"
  – /library/author/book[@title="Artificial Intelligence"]
  – Test within square brackets: a filter expression
    • It restricts the set of addressed nodes.
  – Difference from the previous query:
    • This query addresses book elements, the title of which satisfies a certain condition.
    • The previous query collects title attribute nodes of book elements

[Tree diagram: root → library → three author nodes; each author has a name attribute (e.g., Henry …) and book children; each book has a title attribute (e.g., Artificial Intelligence)]

XPath syntax
• A path expression consists of a series of steps, separated by slashes
• A step consists of
  – An axis specifier,
  – A node test, and
  – An optional predicate
• An axis specifier determines the tree relationship between the nodes to be addressed and the context node
  – E.g., parent, ancestor, child (the default), following-sibling, attribute
  – // is shorthand for such an axis: descendant-or-self
  – child::book selects all book elements that are children of the current node
• A node test specifies which nodes to address
  – The most common node tests are element names
    • /library/author
  – E.g., * addresses all element nodes
    • /library/*
  – comment() selects all comment nodes
    • /library/comment()

XPath syntax
• Predicates (or filter expressions) are optional and are used to refine the set of addressed nodes
  – E.g., the expression [1] selects the first node
  – [position()=last()] selects the last node
  – [position() mod 2 = 0] selects the nodes at even positions
• XPath has a more complicated full syntax.
  – We have only presented the abbreviated syntax

More examples
• Address the first author element node in the XML document
  – //author[1]
• Address the last book element within the first author element node in the document
  – //author[1]/book[last()]
• Address all book element nodes without a title attribute
  – //book[not(@title)]

Where we are
• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

How to process XML
• XML does not DO anything
• Process XML using general purpose languages
  – Java, Perl, C++ …
  – DOM is the basis
• Process XML using special purpose languages
  – "Translate the stock XML file to an HTML table."
    • Transform the XML: XSLT
  – "Tell me the stocks that are higher than 100."
    • Query XML: XQuery

DOM (Document Object Model)
• What: DOM is an application programming interface (API) for processing XML documents
  – http://www.w3c.org/DOM/
• Why:
  – Unique interface.
  – Platform and language independence.
• How: It defines the logical structure of documents and the way to access and manipulate it
  – With the Document Object Model, one can
    • Create an object tree
    • Navigate its structure
    • Access, add, modify, or delete elements, etc.

XML tree hierarchy
• XML can be described by a tree hierarchy
[Diagram: Document → Unit → Sub-unit, labelled with the relationships Parent, Child, and Sibling]

DOM tree model
• Generic tree model
  – Node
    • Type, name, value
    • Attributes
    • Parent node
    • Previous, next sibling nodes
    • First, last child nodes
  – Many other entities extend Node
    • Document
    • Element
    • Attribute
    • ...
[Diagram: a Node linked to its Parent, Prev. Sibling, Next Sibling, First Child, and Last Child]
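The generic tree model above can be tried out directly from Java. The following is a minimal sketch, assuming the JAXP parser that ships with the JDK (`javax.xml.parsers`) and the standard `org.w3c.dom` interfaces; the class name `DomTreeSketch` and the small stocks tree it builds are made up for illustration. It creates an object tree, reads each node's type, name, and value, follows the parent/sibling/child links, and finally deletes a child.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical class name; only the javax.xml.parsers and org.w3c.dom calls are standard.
class DomTreeSketch {

    // Create an object tree in memory (nothing is parsed from a file here).
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element stocks = doc.createElement("stocks");
        doc.appendChild(stocks);

        Element first = doc.createElement("stock");
        first.setAttribute("exchange", "nasdaq");
        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("amazon corp"));
        first.appendChild(name);
        stocks.appendChild(first);

        Element second = doc.createElement("stock");
        second.setAttribute("exchange", "nyse");
        stocks.appendChild(second);
        return doc;
    }

    // Node characteristics: type, name, value.
    static String describe(Node n) {
        return n.getNodeType() + ":" + n.getNodeName() + ":" + n.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = build();
        Element root = doc.getDocumentElement();
        Node first = root.getFirstChild();
        Node second = first.getNextSibling();

        System.out.println(describe(root));                                   // 1:stocks:null
        System.out.println(describe(first.getFirstChild().getFirstChild()));  // 3:#text:amazon corp
        System.out.println(second.getParentNode().getNodeName());             // stocks
        System.out.println(second.getPreviousSibling().isSameNode(first));    // true
        System.out.println(root.getLastChild().isSameNode(second));           // true

        // Node modification: delete the second stock element.
        root.removeChild(second);
        System.out.println(root.getChildNodes().getLength());                 // 1
    }
}
```

Type 1 is `Node.ELEMENT_NODE` and type 3 is `Node.TEXT_NODE`; an element has no node value of its own, which is why the text lives in a separate child node.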
DOM class hierarchy
[Class diagram: Node with subtypes DocumentFragment, Document, CharacterData (with Text, CDATASection, and Comment below it), Attr, Element, DocumentType, Notation, Entity, EntityReference, and ProcessingInstruction; NodeList and NamedNodeMap shown separately]

JavaDoc of DOM API
http://xml.apache.org/xerces-j/apiDocs/index.html

Remarks on javadoc
• javadoc is a command included in the JDK;
• It is a useful tool that generates HTML descriptions of your programs, so that you can use a browser to look at the description of the classes;
• JavaDoc describes classes, their relationships, methods, attributes, and comments.
• When you write Java programs, the JavaDoc is the first place that you should look:
  – For core Java, there is JavaDoc to describe every class in the language;
  – To know how to use DOM, look at the JavaDoc of the org.w3c.dom package.
• If you are a serious Java programmer:
  – You should have the core JDK JavaDoc ready on your hard disk;
  – You should generate the JavaDoc for other people to look at.
• To run javadoc, type
  D>javadoc *.java
  This generates JavaDoc for all the classes in the current directory.

Methods in Node interface
• Three categories of methods
  – Node characteristics
    • name, type, value
  – Contextual location and access to relatives
    • parents, siblings, children, ancestors, descendants
  – Node modification
    • Edit, delete, re-arrange child nodes

XML parser and DOM
[Diagram: XML → XML parser → DOM Tree; your XML application works on the DOM tree through the DOM API]
• When you parse an XML document with a DOM parser, you get back a tree structure that contains all of the elements of your document;
• DOM also provides a variety of functions you can use to examine the contents and structure of the document.

DOM tree and DOM classes
[Tree diagram: a stocks element with two stock children; the first (exchange="nasdaq") has name "Amazon inc", symbol "amzn", and price 15.45; the second (exchange="nyse") has name "IBM" and price 105. Legend: Element, TextNode, Node, child]

Use Java to process XML
• Tasks:
  – How to construct the DOM tree from an XML text file?
  – How to get the list of stock elements?
  – How to get the attribute value of the second stock element?
• Construct the Document object:
  – Need to use an XML parser (XML4J);
  – Remember to import the necessary packages;
  – The benefit of DOM: the lines that construct the parser are the only difference if you use another DOM XML parser.

Get the first stock element
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>

Navigate to the next sibling of the first stock element
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>

Be aware of the Text objects between elements
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
Question: How many children does the stocks node have?
[Tree diagram: the stocks element has text nodes before, between, and after its two stock children; each stock element likewise has text nodes around its name, symbol, and price children, whose own text children hold "Amazon inc", "amzn", "16", "IBM inc", and "102"]

Remarks on XML parsers
• There are several different ways to categorise parsers:
  – Validating versus non-validating parsers;
    • It takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD;
    • If you only want to find tags and extract information, use a non-validating parser;
    • Validation can be turned on or off in parsers.
  – Parsers that support the Document Object Model (DOM);
  – Parsers that support the Simple API for XML (SAX);
  – Parsers written in a particular language (Java, C++, Perl, etc.).
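The three tasks listed earlier (build the DOM tree, get the list of stock elements, read the attribute of the second stock) and the remark about Text objects can be sketched as follows. This is not the code from the original slides, which used XML4J; it is a sketch under the assumption that the JDK's built-in JAXP parser is an acceptable stand-in. The class name `StockDomSketch` and its helper methods are hypothetical, and the document is held in a string so the example stays self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class and helper names; the parser is the JDK's JAXP one (an assumption, not XML4J).
class StockDomSketch {

    // The stock.xml document from the slides, kept inline for this sketch.
    static final String STOCK_XML =
          "<?xml version=\"1.0\" ?>\n"
        + "<stocks>\n"
        + "  <stock exchange=\"nasdaq\">\n"
        + "    <name>amazon corp</name>\n"
        + "    <symbol>amzn</symbol>\n"
        + "    <price>16</price>\n"
        + "  </stock>\n"
        + "  <stock exchange=\"nyse\">\n"
        + "    <name>IBM inc</name>\n"
        + "    <price>102</price>\n"
        + "  </stock>\n"
        + "</stocks>\n";

    // Construct the DOM tree from XML text.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // non-validating: we only want to find tags and extract data
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Get the stock element at the given position (0-based) in document order.
    static Element stockAt(Document doc, int index) {
        NodeList stocks = doc.getElementsByTagName("stock");
        return (Element) stocks.item(index);
    }

    // Skip over Text (and other non-element) siblings to reach the next element.
    static Element nextElementSibling(Node node) {
        Node sibling = node.getNextSibling();
        while (sibling != null && sibling.getNodeType() != Node.ELEMENT_NODE) {
            sibling = sibling.getNextSibling();
        }
        return (Element) sibling;
    }

    public static void main(String[] args) {
        Document doc = parse(STOCK_XML);
        System.out.println(doc.getElementsByTagName("stock").getLength());         // 2
        System.out.println(stockAt(doc, 1).getAttribute("exchange"));              // nyse

        Element first = stockAt(doc, 0);
        System.out.println(first.getNextSibling().getNodeName());                  // #text
        System.out.println(nextElementSibling(first).getAttribute("exchange"));    // nyse

        // Whitespace between tags becomes Text nodes: text, stock, text, stock, text.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());  // 5
    }
}
```

The last line answers the question on the slide for this particular layout: with line breaks between the tags, the stocks node has five children, not two, which is why `getNextSibling()` on the first stock returns a Text node rather than the second stock.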
Where we are
• XML
• DTD
• XML Schema
• XML Namespace
• XPath
• DOM Tree
• XSLT

History
[Diagram: XSL split into XSLT (low-precision graphics, e.g., HTML, text, XML) and XSL (high-precision graphics, e.g., PDF); related standards shown: XPath, XQuery, XLink/XPointer, XML Schemas]

XSLT (Extensible Stylesheet Language Transformations)
• XSLT Version 1.0 is a W3C Recommendation, 1999
  – http://www.w3.org/Style/XSL/
• XSLT is used to transform XML to other formats
[Diagram: one XML source is transformed by XSLT 1 into XML, by XSLT 2 into HTML, and by XSLT 3 into text]

XSLT basics
• An XSLT stylesheet is itself an XML document
• It is a tree transformation language
[Diagram: the XSLT processor takes the XML and the XSLT as input]
• It is a rule-based declarative language
  – An XSLT program consists of a sequence of rules.
  – It is a functional programming language.

XSLT Example: transform to another XML
stock.xml:
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp </name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
output:
<?xml version="1.0"?>
<companies>
  <company>
    <value>24 CAD </value>
    <name>amazon corp</name>
  </company>
  <company>
    <value>153 CAD </value>
    <name>IBM inc</name>
  </company>
</companies>
• Rename the elements
• Remove the attribute and the symbol element
• Change the order of name and price.
• Change US dollars to CAD.

A very simple XSLT

Template definition and call

If statement

XSLT rule: <xsl:template>
xslt template for <stock>:
<xsl:template match="stock">
  <company>
    <value> <xsl:value-of select="price*1.5"/> CAD </value>
    <name> <xsl:value-of select="name"/> </name>
  </company>
</xsl:template>
stock.xml:
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp </name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
Part of the output:
<company>
  <value> get the value of <price> * 1.5, i.e., 24 CAD </value>
  <name> get the value of <name>, i.e., amazon </name>
</company>

XSLT process model
toXML.xsl:
<xsl:template match="/">
  <companies>
    <xsl:apply-templates select="stocks/stock"/>
  </companies>
</xsl:template>
<xsl:template match="stock">
  <company>
    <value> <xsl:value-of select="price*1.5"/> CAD</value>
    <name> <xsl:value-of select="name"/> </name>
  </company>
</xsl:template>
xslt output:
apply template 1 to <stocks>
<companies>
  apply template 2 to <stock> 1
  <company>
    <value> get the value of <price>*1.5, i.e., 24 CAD </value>
    <name> get the value of <name>, i.e., amazon </name>
  </company>
  apply template 2 to <stock> 2
  <company>
    <value> get the value of <price>*1.5, i.e., 153 CAD </value>
    <name> get the value of <name>, i.e., IBM </name>
  </company>
</companies>

Transforming XML to HTML
toHTML.xsl

Running XSLT from the client side
• The browser gets the XML+XSLT and interprets them inside the browser.
[Diagram label: Web server]
• How to specify the XSL associated with the XML file?
  – <?xml-stylesheet type="text/xsl" href="stock.xsl"?>
• Advantages:
  – Easy to develop and deploy.
• Disadvantages:
  – Not every browser supports XML+XSL;
  – Browsers do not support all XSLT features;
  – Not secure: you may only want to show part of the XML data;
  – Not efficient.

Run XSLT from the server side
• The XSL processor transforms the XML with the XSLT into HTML, and the web server sends the HTML to the browser.
• Popular tool: xalan
  java -classpath xalan/bin/xalan.jar org.apache.xalan.xslt.Process -in stock.xml -xsl stock.xsl -out stock.html
[Diagram: XSL Processor → HTML → Web server → HTML]

Why XML is useful
• Data exchange
• Data integration

Why XML is useful (cont.)
• Present to different devices

XML references
• For XML and related specifications: www.w3c.org
• For Java support for XML, like XML parser, XSLT processor: www.apache.org
• For XML technologies: www.xml.com
• XML integrated development environment: www.xmlspy.com
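As a complement to the xalan command line shown for server-side processing, the same transformation can be driven from Java code. The sketch below assumes the XSLT 1.0 processor bundled with the JDK (`javax.xml.transform`) rather than a separately installed xalan.jar. The class name `StockTransformSketch` is hypothetical, and the two templates of toXML.xsl are wrapped in an `xsl:stylesheet` element with an `xsl:output` setting that the slides do not show.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class name; uses the JDK's built-in XSLT 1.0 processor (an assumption, not xalan.jar).
class StockTransformSketch {

    // stock.xml from the slides, kept inline for this sketch.
    static final String STOCK_XML =
          "<?xml version=\"1.0\" ?>\n"
        + "<stocks>\n"
        + "  <stock exchange=\"nasdaq\">\n"
        + "    <name>amazon corp</name><symbol>amzn</symbol><price>16</price>\n"
        + "  </stock>\n"
        + "  <stock exchange=\"nyse\">\n"
        + "    <name>IBM inc</name><price>102</price>\n"
        + "  </stock>\n"
        + "</stocks>\n";

    // The two rules of toXML.xsl, wrapped in the stylesheet element they need to run.
    static final String TO_XML_XSL =
          "<xsl:stylesheet version=\"1.0\"\n"
        + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">\n"
        + "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>\n"
        + "  <xsl:template match=\"/\">\n"
        + "    <companies><xsl:apply-templates select=\"stocks/stock\"/></companies>\n"
        + "  </xsl:template>\n"
        + "  <xsl:template match=\"stock\">\n"
        + "    <company>\n"
        + "      <value><xsl:value-of select=\"price*1.5\"/> CAD</value>\n"
        + "      <name><xsl:value-of select=\"name\"/></name>\n"
        + "    </company>\n"
        + "  </xsl:template>\n"
        + "</xsl:stylesheet>\n";

    // Apply the stylesheet text to the XML text and return the result as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The result should contain <value>24 CAD</value> and <value>153 CAD</value>.
        System.out.println(transform(STOCK_XML, TO_XML_XSL));
    }
}
```

The rule for "/" fires first and emits the companies element; `xsl:apply-templates` then hands each stock element to the second rule, which is why the symbol element and the exchange attribute never reach the output.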