XML
Overview / Schema / DOM
Brent P. Christie
Major USMC
XML Overview
 What is XML?
– eXtensible Markup Language
– Meta-markup language defined by the World Wide Web
Consortium (W3C; specification at
www.w3.org/XML)
• Meta: you define your own markup language
• Supersedes other markup languages, such as HTML
– XML is a subset of Standard Generalized Markup
Language (SGML)
• XML is an easier-to-use subset of SGML
• HTML is an application of SGML
XML Overview
 What’s so great about it?
– Easy Data Exchange
• Non-proprietary, no patent or copyright (encoded)
• Stored as text, human readable
• Efficient storage: plain text can be compact, compared with bloated proprietary formats (e.g., Microsoft's)
– Customized Markup Languages
• Customized browsers or applications (IFX, CML)
– Self-Describing Data Specification
<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
<GREETING>
Hello From XML
</GREETING>
<MESSAGE>
Welcome to the wild and woolly world of XML.
</MESSAGE>
</DOCUMENT>
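As a minimal sketch of what "self-describing" can mean in practice, the greeting document can be read with Python's standard-library xml.etree.ElementTree using nothing but the tag names the document itself carries. The variable names (GREETING_XML, fields) are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The greeting document from the slide, as bytes so that the encoding
# declaration in the XML prolog is handled by the parser.
GREETING_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
"""

root = ET.fromstring(GREETING_XML)

# The tag names act as labels for the data they enclose, so a program
# can build a name -> value mapping without any outside description.
fields = {child.tag: child.text.strip() for child in root}

print(root.tag)
for tag, text in fields.items():
    print(f"{tag}: {text}")
```

Nothing here depends on a DTD or schema; the parser only needs the document to be well-formed.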
XML Overview
– Structured and Integrated Data
• The logical design of a document (content) should be separate from its visual
design (presentation)
• Separation of logical and visual design
– promotes sound typography
– encourages better writing
– is more flexible
• XML can be used to define the logical design, while XSL (eXtensible Stylesheet
Language) is used to define the visual design (usually by mapping XML into
HTML).
 How XML fits into the new HTML world:
– XML describes the logical structure of the document.
– CSS (Cascading Style Sheets) or other style language describes the visual
presentation of the document.
– The DOM (Document Object Model) allows scripting languages, such as
JavaScript, to access document objects.
XML Overview
 SGML = Standard Generalized Markup Language
– Defined by ISO 8879. Has been the standard, vendor-independent
way to maintain repositories of structured
documentation for more than a decade
– An SGML document carries with it a grammar called a
Document Type Definition (DTD). The DTD defines
the tags and the meaning of those tags
– Presentation is governed by a style sheet written in the
Document Style Semantics and Specification Language
(DSSSL)
– Note that HTML is a fixed SGML application, a hard-wired
set of about 70 tags and 50 attributes, and does
not need to have a DTD.
XML Overview
 XML is SGML Lite
– XML is a simplified subset of SGML rather than a fixed SGML application
like HTML. Because XML is extensible (it is also a metalanguage), a document
that is to be validated must be accompanied by its DTD or schema
– XML is a compromise between the non-extensible, limited
capabilities of HTML and the full power and complexity of SGML
– XML offers “80% of the benefits of SGML for 20% of its
complexity”
• XML designers tried to leave out all the SGML that would be rarely
used on the web
• Note that the XML specification is about 30 pages and the SGML specification
is about 500 pages.
– XML allows you to define your own tags and to describe nested
hierarchies of information
XML Overview
 Why XML?
– XML was created so that richly structured
documents could be used over the web; no existing
alternative was practical
– HTML, as we've already discussed, comes
bound with a set of semantics and does not
provide arbitrary structure
– SGML provides arbitrary structure, but is too
difficult/expensive to implement just for a web
browser.
XML Design Goals
1. XML shall be usable over the Internet
2. XML shall support a variety of applications
3. XML shall be compatible with SGML
4. It shall be easy to write programs that process XML documents
5. Optional features in XML shall be kept to the absolute minimum,
ideally zero
6. XML documents should be human-legible and reasonably clear
7. Design of XML should be prepared quickly
8. Design of XML shall be formal and concise
9. XML documents shall be easy to create
10. Terseness in XML markup is of minimal importance
XML and Related Acronyms
 Document Type Definition (DTD), which defines the tags and their
relationships
 Extensible Stylesheet Language (XSL) style sheets, which specify the presentation
of the document
 Cascading Style Sheets (CSS), a less powerful presentation technology without
tag mapping capability
 XPath, which specifies locations within a document
 XLink and XPointer, which define link-handling details
 Resource Description Framework (RDF), for document metadata
 Document Object Model (DOM), an API for converting the document to a tree
object in your program for processing and updating
 Simple API for XML (SAX), a fast-to-execute “serial access” protocol
for processing a document on the fly
 XML Namespaces, for an environment of multiple sets of XML tags
 XHTML, a definition of HTML tags for XML documents (which are then just
HTML documents)
 XML Schema, which offers a more flexible alternative to DTD
XML Overview
 XML constraints:
1. Well-formedness (W3C): if it's not well-formed, it's not
XML.
– “A data object is an XML document if it is well-formed, as
defined in this specification. A well-formed XML document
may in addition be valid if it meets certain further constraints.”
– Properly nested and non-abbreviated start and end tags
are used (syntax rules).
– Stops browsers from silently fixing markup bugs
– Allows parsing of the document
• By a well-defined encapsulation mechanism allowing
designated sections of the data to be accessed
programmatically.
2. Validity
– Obeys the DTD or schema
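The well-formedness constraint can be illustrated with a short sketch, assuming Python's standard-library xml.etree.ElementTree as the parser. The helper name is_well_formed and the sample strings are hypothetical. The check covers well-formedness only; it says nothing about validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<DOCUMENT><GREETING>Hello From XML</GREETING></DOCUMENT>"
# Improperly nested: GREETING is still open when DOCUMENT is closed.
bad_nesting = "<DOCUMENT><GREETING>Hello From XML</DOCUMENT></GREETING>"
# Abbreviated markup: the GREETING end tag is omitted, HTML-style.
bad_unclosed = "<DOCUMENT><GREETING>Hello From XML</DOCUMENT>"

for sample in (good, bad_nesting, bad_unclosed):
    print(is_well_formed(sample), sample)
```

Unlike a forgiving HTML browser, the parser refuses to guess at a repair and reports an error instead.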
DTD
 The DTD specifies the logical structure of the document; it is a
formal grammar describing document syntax and semantics
 The DTD does not describe the physical layout of the document;
this is left to the style sheets and the scripts
 It is no mean task to write a DTD, so most users will adopt
predefined DTDs (or write an XML document without a
DTD).
 DTDs can be written in separate files to facilitate re-use.
 Content-providers, industries and other groups can collaborate
to define sets of tags: the essence of “any” field (physics, music
…) is captured in a domain specific DTD
 DTDs treat all element content as text (#PCDATA), with no data types.
This lack of precision is one of the reasons XML schemas were developed.
<?xml version="1.0"?>
<!DOCTYPE BOOK [
<!ELEMENT p (#PCDATA)>
<!ELEMENT BOOK (OPENER, SUBTITLE?, INTRODUCTION?, (SECTION | PART)+)>
<!ELEMENT OPENER (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT INTRODUCTION (HEADER, p+)+>
<!ELEMENT PART (HEADER, CHAPTER+)>
<!ELEMENT SECTION (HEADER, p+)>
<!ELEMENT HEADER (#PCDATA)>
<!ELEMENT CHAPTER (CHAPTER_NUMBER, CHAPTER_TEXT)>
<!ELEMENT CHAPTER_NUMBER (#PCDATA)>
<!ELEMENT CHAPTER_TEXT (p)+>
]>
<BOOK>
<OPENER>
<TITLE_TEXT>
All About Me
</TITLE_TEXT>
</OPENER>
<PART>
<HEADER>Welcome To My Book</HEADER>
<CHAPTER>
<CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER>
<CHAPTER_TEXT>
<p>Glad you want to hear about me.</p>
<p>There's so much to say!</p>
<p>Where should we start?</p>
<p>How about more about me?</p>
</CHAPTER_TEXT>
</CHAPTER>
</PART>
</BOOK>
Schema vs. DTD
 Dismissing DTDs
– DTDs are simple and easy to use, but limited
– Schemas are far more powerful and precise
• Not only syntax, as in a DTD, but also:
– specify actual data types of each element’s content
» simple and complex (sub-type)
– inherit syntax from other schemas
– annotate schemas
– multiple namespaces
– min and max occurrence of element
– create list types
– create attribute groups
– restrict the ranges of values that elements can hold
– restrict what other schemas can inherit from yours
– merge fragments of multiple schemas together
– require that attribute or element values be unique
Purchase Order Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2000/08/XMLSchema">
<xsd:annotation>
<xsd:documentation>
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN"
use="fixed" value="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU"/>
</xsd:complexType>
</xsd:element> <!-- End item specification -->
</xsd:sequence> <!-- End sequence for items specification -->
</xsd:complexType> <!-- End items specification -->
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
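To make two of the schema's restrictions concrete, here is a rough sketch that mimics them by hand in Python: the SKU pattern and the quantity range (a positiveInteger with maxExclusive 100). Python's standard library does not include an XML Schema validator, so this is not schema validation; the function names is_valid_sku and is_valid_quantity are hypothetical.

```python
import re

# XSD patterns must match the whole value, hence fullmatch() below.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")


def is_valid_sku(value: str) -> bool:
    """Mimic the SKU simpleType: three digits, a hyphen, two capitals."""
    return SKU_PATTERN.fullmatch(value) is not None


def is_valid_quantity(value: str) -> bool:
    """Mimic positiveInteger with maxExclusive 100, i.e. 1 through 99."""
    if not re.fullmatch(r"\+?[0-9]+", value):
        return False
    return 0 < int(value) < 100


for sku in ("872-AA", "872-aa", "872-AAX"):
    print(sku, is_valid_sku(sku))
for qty in ("1", "99", "100", "0", "1.5"):
    print(qty, is_valid_quantity(qty))
```

A DTD could only declare both values as text, which is the lack of precision that schemas address.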
DOM vs. SAX
 SAX (Simple API for XML) and DOM (Document Object Model) were
created to serve the same purpose, which is giving you access to the
information stored in XML documents using any programming language (and
a parser for that language). However, they take very different
approaches to giving you access to your information.
 What is SAX?
– SAX gives you access to the information in your XML document not as
a tree of nodes, but as a sequence of events.
– SAX does not create a default Java object model on top of your XML
document (as DOM does).
• Faster
• Necessitates
– creation of your own custom object model
– creation of a class that listens to SAX events and properly creates your object model.
– In the case of DOM, the parser does almost everything: it reads the XML document in,
creates a Java object model on top of it, and then gives you a reference to this object
model (a Document object) so that you can manipulate it.
DOM vs. SAX
– All SAX requires is that the parser read in the XML document and fire a
series of events depending on what tags it encounters in the XML document
– You are responsible for interpreting these events by writing an XML document
handler class, which is responsible for making sense of all the tag events and
creating objects in your own object model. So you have to write:
• your custom object model to "hold" all the information in your XML document
• a document handler that listens to SAX events (which are generated by the SAX parser as
it's reading your XML document) and makes sense of these events to create objects in your
custom object model.
 What kinds of SAX events are fired by the SAX parser?
– The parser fires an event for every open tag and every close tag. It also fires events for
#PCDATA and CDATA sections
– SAX also fires events for processing instructions, DTDs, comments
– Your handler has to interpret these events (and the sequence of the events)
and make sense out of them.
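The two pieces described above, a custom object model and a handler that listens to events, might look roughly like this. The slides speak of Java; this sketch assumes Python's standard-library xml.sax, which follows the same event model. The Chapter and ChapterHandler classes and the shortened BOOK_XML sample are illustrative, not from the original deck.

```python
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Chapter:
    """Custom object model: holds just the parts of the book we care about."""
    number: str = ""
    paragraphs: list = field(default_factory=list)


class ChapterHandler(xml.sax.ContentHandler):
    """Listens to SAX events and builds Chapter objects from them."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._current = None  # Chapter being built, if inside <CHAPTER>
        self._text = []       # character data seen since the last start tag

    def startElement(self, name, attrs):
        self._text = []
        if name == "CHAPTER":
            self._current = Chapter()

    def characters(self, content):
        # May be called several times for one run of text.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if self._current is not None:
            if name == "CHAPTER_NUMBER":
                self._current.number = text
            elif name == "p":
                self._current.paragraphs.append(text)
            elif name == "CHAPTER":
                self.chapters.append(self._current)
                self._current = None
        self._text = []


BOOK_XML = b"""<BOOK>
  <OPENER><TITLE_TEXT>All About Me</TITLE_TEXT></OPENER>
  <PART>
    <HEADER>Welcome To My Book</HEADER>
    <CHAPTER>
      <CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER>
      <CHAPTER_TEXT>
        <p>Glad you want to hear about me.</p>
        <p>There's so much to say!</p>
      </CHAPTER_TEXT>
    </CHAPTER>
  </PART>
</BOOK>"""

handler = ChapterHandler()
xml.sax.parseString(BOOK_XML, handler)
print(handler.chapters)
```

The parser never holds the whole document in memory; only the Chapter objects the handler chooses to build survive the parse.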
DOM vs. SAX
 What is the Document Object Model (DOM)?
– DOM gives you access to the information stored in your XML document as a hierarchical
object model.
– DOM creates a tree of nodes (based on the structure and information in your XML document)
and you can access your information by interacting with this tree of nodes.
DOM vs. SAX
– Once a document object tree has been created (by the
XML parser, or your own code), you can access elements
in that tree and you can also modify, delete and create
leaves and branches by using the interfaces in the API.
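The access, modify, create and delete operations can be sketched with Python's standard-library xml.dom.minidom, which implements a subset of the W3C DOM interfaces. The shortened book document and the variable names are assumptions made for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<BOOK><PART><HEADER>Welcome To My Book</HEADER>"
    "<CHAPTER><CHAPTER_NUMBER>CHAPTER 1</CHAPTER_NUMBER>"
    "<CHAPTER_TEXT><p>Glad you want to hear about me.</p>"
    "<p>Where should we start?</p></CHAPTER_TEXT></CHAPTER></PART></BOOK>"
)

# Access: find elements in the tree by tag name.
chapter_text = doc.getElementsByTagName("CHAPTER_TEXT")[0]
paragraphs = chapter_text.getElementsByTagName("p")

# Modify: change the text node inside the HEADER element.
header = doc.getElementsByTagName("HEADER")[0]
header.firstChild.data = "Welcome Back To My Book"

# Create: build a new <p> branch and attach it to the tree.
new_p = doc.createElement("p")
new_p.appendChild(doc.createTextNode("How about more about me?"))
chapter_text.appendChild(new_p)

# Delete: remove the first paragraph from its parent.
chapter_text.removeChild(paragraphs[0])

# Collect the remaining paragraph text after the edits.
remaining = [p.firstChild.data for p in chapter_text.getElementsByTagName("p")]
print(remaining)
print(doc.documentElement.toxml())
```

The whole document is held in memory as a tree, which is what makes these in-place edits possible and also what makes DOM costly for large documents.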
DOM vs. SAX
 Things to think about
– DOM is standardized by the W3C
– The Level 3 recommendation will address content models
(DTD and schemas)
– Tree-based APIs put a great strain on system resources,
especially if the document is large. Furthermore, some applications need
to build their own, different data trees, and it is very
inefficient to build a tree of parse nodes, only to map it
onto a new tree
What do you think?
Questions