eXtensible Markup Language

Jesús Ibáñez, Toni Navarrete, Josep Blat
Universitat Pompeu Fabra
eXtensible Markup Language
• New Internet mark-up metalanguage
• Previously: SGML, HTML, DHTML’s
• Extensibility, structure and validation
• SGML adaptation for WWW
eXtensible Markup Language
• Defined as standard by W3C (Generic SGML Editorial Review Board - XML Working Group)
• XML != HTML++ ; XML == SGML--
• XML, DTD (Document Type Definition) and XSL (eXtensible Stylesheet Language)
Main Characteristics
• Describing document content semantically
• Uncoupling semantic description from presentation
• Allowing each user community to define its own labels, for instance: <PRICE>, <AUTHOR>, <SECTION>, <DATE>, <IMPORTANCE LEVEL="Expert">
XML Example (without DTD)
<?xml version="1.0" standalone="yes"?>
<conversation>
<greeting>Hello world!</greeting>
<answer>Stop it, I’m getting off!</answer>
</conversation>
Example with DTD (1)
<!DOCTYPE Book [
<!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
]>
Example with DTD (2)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Book SYSTEM "file://localhost/xmlcourse/xsl/Book.dtd">
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillan Publishing</Publisher>
</Book>
DTDs
• Allow new sets of tags to be defined
• Examples:
– <!ELEMENT Title (#PCDATA)>
– <!ELEMENT Disk (Disk)+> (1 or more)
– <!ELEMENT Book (Book)*> (0 or more)
– ? (0 or 1) and , (sequence)
– | (choice)
Attributes:
• <!ATTLIST ARTICLE DATE CDATA #IMPLIED>
(CDATA means Character Data)
• <!ATTLIST PERSON GENDER (male | female) #IMPLIED>
(optional)
• <!ATTLIST PERSON GENDER (male | female) #REQUIRED>
(required; a default value such as "male" may be given in place of #REQUIRED or #IMPLIED, but not together with them)
DTDs
<!DOCTYPE Discography [
<!ELEMENT Discography (Disk)*>
<!ELEMENT Disk (Title, Group, Song*)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Group (#PCDATA)>
<!ELEMENT Song (titleS, Duration)>
<!ELEMENT titleS (#PCDATA)>
<!ELEMENT Duration (#PCDATA)>
]>
DTDs
<Discography>
<Disk>
<Title>Brother in arms</Title>
<Group>Dire Straits</Group>
<Song>
<titleS>Money for nothing</titleS>
<Duration>5:20</Duration>
</Song>
<Song>
<titleS>So far away</titleS>
<Duration>4:10</Duration>
</Song>
...
</Disk>
<Disk>
<Title>On every street</Title>
<Group>Dire Straits</Group>
<Song>
...
</Song>
</Disk>
</Discography>
DTDs
<!DOCTYPE publications[
<!ELEMENT publications (disk | book)*>
<!ELEMENT book ... >
<!ELEMENT disk ... >
]>
DTDs
<publications>
<disk>
<titledisk>Brother in arms</titledisk>
<group>Dire Straits</group>
<song>
<titleS>Money for nothing</titleS>
<duration>5:20</duration>
</song>
...
</disk>
<book>
<titlebook>Cien años de soledad</titlebook>
<writer>Gabriel García Márquez</writer>
...
</book>
<book>
<titlebook>La ciudad de los prodigios</titlebook>
<writer>Eduardo Mendoza</writer>
...
</book>
</publications>
DTDs
<?xml version="1.0"?>
<!DOCTYPE file [
<!ELEMENT file (name+, surname+, address+, picture?)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST name sex (male|female) #IMPLIED>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT picture EMPTY>
]>
<file>
<name sex="male">Toni</name>
<surname>Navarrete</surname>
<surname>Terrasa</surname>
<address>Rambla 32</address>
</file>
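The occurrence indicators in the content model of "file" behave much like regular-expression operators. As a rough sketch only (this is not a DTD validator, and the regular expression below is written by hand for this one content model), the order and number of children can be checked in Python:

```python
import re
import xml.etree.ElementTree as ET

# Hand-written regex for the content model (name+, surname+, address+, picture?).
# Each child name is followed by one space so that names cannot run together.
FILE_MODEL = re.compile(r"(name )+(surname )+(address )+(picture )?")


def children_match(xml_text: str) -> bool:
    """Return True if the root's children follow the content model of 'file'."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return FILE_MODEL.fullmatch(child_names) is not None


ok = (
    "<file><name sex='male'>Toni</name><surname>Navarrete</surname>"
    "<surname>Terrasa</surname><address>Rambla 32</address></file>"
)
# Hypothetical broken document: the mandatory address is missing.
missing_address = "<file><name>Toni</name><surname>Navarrete</surname></file>"

print(children_match(ok))               # True
print(children_match(missing_address))  # False
```

A real validating parser also checks attribute values and element content, which this sketch ignores.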
Well formed vs valid
• Valid XML: the content conforms to the rules of the associated DTD.
– Completeness, correct format and attribute values of the XML data are ensured.
• Well formed: the document conforms to XML syntax
– An XML document without a DTD can be well formed but, of course, cannot be valid.
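A minimal sketch of a well-formedness check, assuming Python and its standard-library parser. That parser does not validate against a DTD, so it illustrates only the "well formed" half of the distinction; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, i.e. it is well formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<conversation><greeting>Hello world!</greeting></conversation>"
bad = "<conversation><greeting>Hello world!</conversation>"  # greeting is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```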
XML Schemata
• XML Schemata define the structure of XML documents (as DTDs do)
• BUT in XML syntax. Advantage: the same parser can validate them, and tools can create them dynamically
• Use of Namespaces
• Improved data type definition (41 instead of 10, plus user-defined)
• Object orientation allows new types by extension or restriction of previous ones
• Validation (a document with respect to a schema, a schema with respect to the schema of schemata)
Schema definition
• An XML document whose root is "schema" and within which elements and attributes are defined:
<?xml version="1.0"?>
<schema>
... elements and attributes definition
</schema>
• element definition
<element name="name of the element"
type="type of the element"
[options...]
>
Simple types of elements
– string: character string
– boolean (false, 0, true, 1)
– float (32 bits)
– double (64 bits)
– decimal (arbitrary-precision decimal number)
– timeDuration
– recurringDuration (several subtypes)
– binary
– uriReference (Uniform Resource Identifier)
And types derived from these basic ones
Data type structure
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookshop>
<book isbn="84-111-1111-1">
<title>El Quijote</title>
<author>Miguel de Cervantes</author>
<publisher>Plaza y Janés</publisher>
<character>Don Quijote</character>
<character>Sancho Panza</character>
<character>Dulcinea</character>
<character>Rocinante</character>
</book>
<book isbn="84-222-2222-2">
<title>La ciudad de los prodigios</title>
<author>Eduardo Mendoza</author>
<publisher>Seix-Barral</publisher>
<character>Onofre Boubila</character>
<character>Efren Castells</character>
</book>
<book isbn="84-333-3333-3">
<title>Cien años de soledad</title>
<author>Gabriel García Márquez</author>
<publisher>Planeta</publisher>
<character>Aureliano Buendía</character>
</book>
</bookshop>
Example: XML document written before its schema is defined
Building blocks: simple elements and cardinality
• Simple elements:
<element name="title" type="string" />
<element name="author" type="string" />
<element name="publisher" type="string" />
<element name="character"
minOccurs="0" maxOccurs="unbounded" />
• A DTD would be like:
<!ELEMENT title (#PCDATA)>
• The minOccurs and maxOccurs attributes replace the DTD cardinality symbols ?, * and +
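To make the cardinality rule concrete, here is a small Python sketch that counts the children of an element and compares the count with minOccurs and maxOccurs. It is an illustration only, not a schema validator; the function occurs_ok and the constant UNBOUNDED are made up for this example. Both bounds default to 1, as in XML Schema.

```python
import xml.etree.ElementTree as ET

UNBOUNDED = None  # stands for maxOccurs="unbounded"

# DTD symbol -> (minOccurs, maxOccurs)
#   ?  -> (0, 1)
#   *  -> (0, unbounded)
#   +  -> (1, unbounded)


def occurs_ok(parent, name, min_occurs=1, max_occurs=1):
    """Check that 'parent' has between min_occurs and max_occurs children called 'name'."""
    count = len(parent.findall(name))
    if count < min_occurs:
        return False
    return max_occurs is UNBOUNDED or count <= max_occurs


book = ET.fromstring(
    "<book><title>El Quijote</title>"
    "<character>Don Quijote</character>"
    "<character>Sancho Panza</character></book>"
)

print(occurs_ok(book, "title"))                    # True: exactly one title
print(occurs_ok(book, "character", 0, UNBOUNDED))  # True: any number allowed
print(occurs_ok(book, "author"))                   # False: author is missing here
```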
Building blocks: Complex types
• The element book is composite, thus we define it as a complex type:
<element name="book">
<complexType>
<sequence>
<element name="title" type="string" />
<element name="author" type="string" />
<element name="publisher" type="string" />
<element name="character" minOccurs="0"
maxOccurs="unbounded" />
</sequence>
</complexType>
</element>
Alternative: naming complex types
• We could also define a complex type with a name:
<element name="book" type="Booktype" />
<complexType name="Booktype">
<element name="title" type="string" />
<element name="author" type="string" />
<element name="publisher" type="string" />
<element name="character" minOccurs="0"
maxOccurs="unbounded" />
</complexType>
Remark: the combination of both is not allowed
<element name="book" type="Booktype">
<complexType name="Booktype">
<element name="title" type="string" />
<element name="author" type="string" />
<element name="publisher" type="string" />
<element name="character" minOccurs="0"
maxOccurs="unbounded" />
</complexType>
</element>
Building blocks: empty elements
• Elements such as the HTML tags <hr> or <img ...> are empty
<hr />
<img src="image.gif" />
• An empty element has to be declared with an implicit (unnamed) complex type
<element name="hr">
<complexType content="empty" />
</element>
<element name="img">
<complexType content="empty">
<attribute name="src" type="string" />
</complexType>
</element>
A level upwards ...
• Let us define "bookshop":
<element name="bookshop">
<complexType>
<element name="book"
minOccurs="0" maxOccurs="unbounded">
<complexType>
...
</complexType>
</element>
</complexType>
</element>
A schema definition is a BOTTOM-UP process
Attribute definition
• Elements can have attributes associated with them
• In DTDs, we would write:
<!ATTLIST book isbn CDATA #REQUIRED>
• In XML Schema:
<attribute name="name of the attribute"
type="type of the attribute"
[options of the attribute ...]
>
Attribute definition
• Attributes are declared at the end of the element definition
<element name="book" minOccurs="0" maxOccurs="unbounded">
<complexType>
<element name="title" type="string" />
<element name="author" type="string" />
<element name="publisher" type="string" />
<element name="character"
minOccurs="0" maxOccurs="unbounded" />
<attribute name="isbn" type="string" />
</complexType>
</element>
General ordering
• The definitions are ordered for better readability:
– 1) Simple types definition
– 2) Attributes definition
– 3) Complex types definition
Referencing the schema
• We then add the schema reference to the XML document: assuming the document is book.xml and the schema for bookshop is book.xsd, we would write:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookshop
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="book.xsd"
>
...
</bookshop>
Namespaces
• An XML Namespace is a collection of names (of elements and attributes) identified by a URI
• Namespaces are a very flexible tool. They promote the re-use and mixing of schemata and names.
• For instance, we could use elements from two namespaces
<BOOKS>
<bk:BOOK
xmlns:bk="urn:BookLovers.org:BookInfo"
xmlns:money="urn:Finance:Money">
<bk:TITLE>A Suitable Boy</bk:TITLE>
<bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
</bk:BOOK>
</BOOKS>
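As a sketch of how a parser sees these prefixes, the snippet below loads the same document with Python's standard library, which is an assumption of this example and not part of the original material. Prefixes are only shorthand: the parser expands every qualified name to the form {namespace URI}local-name.

```python
import xml.etree.ElementTree as ET

doc = """
<BOOKS>
  <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
           xmlns:money="urn:Finance:Money">
    <bk:TITLE>A Suitable Boy</bk:TITLE>
    <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  </bk:BOOK>
</BOOKS>
"""

root = ET.fromstring(doc)

# Prefix-to-URI map used only for the search below; the prefixes chosen
# here need not match the ones used inside the document.
ns = {"bk": "urn:BookLovers.org:BookInfo", "money": "urn:Finance:Money"}

price = root.find("bk:BOOK/bk:PRICE", ns)
print(price.tag)                                # {urn:BookLovers.org:BookInfo}PRICE
print(price.text)                               # 22.95
print(price.get("{urn:Finance:Money}currency")) # US Dollar
```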
Namespaces
• http://www.w3.org/2000/10/XMLSchema
– This is the Namespace for the schemata. The prefix xsd is usually used; if no prefix is given, it is the default namespace
• http://www.w3.org/2000/10/XMLSchema-instance
– Namespace for the documents instantiated from a schema. The prefix xsi is usually used.
Example
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"  (1)
targetNamespace="http://www.upf.es/namespaces/Book"  (2)
elementFormDefault="qualified"  (3)
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation=
"http://www.w3.org/2000/10/XMLSchema
http://www.w3.org/2000/10/XMLSchema.xsd"
xmlns:bk="http://www.upf.es/namespaces/Book">
(1) Indicates the default namespace, which is XMLSchema
(2) Indicates that the elements and attributes in this schema are defined in the namespace http://www.upf.es/namespaces/Book
(3) Indicates that all the elements declared in this schema must be namespace-qualified when used in the instantiated documents (had we used unqualified, only the global elements would be qualified)
Example (2)
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
targetNamespace="http://www.upf.es/namespaces/Book"
elementFormDefault="qualified"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"  (5)
xsi:schemaLocation=  (4)
"http://www.w3.org/2000/10/XMLSchema  (6)
http://www.w3.org/2000/10/XMLSchema.xsd"  (7)
xmlns:bk="http://www.upf.es/namespaces/Book">
(4) Indicates that this XML document is instantiated from the general Schema on Schemata
(5) This is the namespace where the attribute schemaLocation is defined
(6) The namespace for the general Schema on Schemata
(7) URI of this Schema on Schemata
Example (3)
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
targetNamespace="http://www.upf.es/namespaces/Book"
elementFormDefault="qualified"
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:schemaLocation=
"http://www.w3.org/2000/10/XMLSchema
http://www.w3.org/2000/10/XMLSchema.xsd"
xmlns:bk="http://www.upf.es/namespaces/Book">  (8)
(8) We give a prefix to the target namespace to make it easier to use in documents, for instance:
<element ref="bk:Title" minOccurs="1" maxOccurs="1"/>
Example (4)
• In the instantiated document:
<bookshop xmlns="http://www.upf.es/namespaces/Book"  (1)
xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"  (2)
xsi:schemaLocation="http://www.upf.es/namespaces/Book http://www.upf.es/namespaces/book.xsd">  (3)
(1) We define the default namespace of the document
(2) We include the namespace where schema instantiation is defined (xsi)
(3) With schemaLocation we specify where the Schema for this document is (book.xsd); the value pairs the namespace with the location of its schema
Other important concepts
• ID and IDREFS
• DOM (Document Object Model)
• XPath
• XPointer
• XLink
ID and IDREFS
• An ID attribute uniquely identifies an element; its role is similar to that of a URI. Example assigning the identity "attack":
<paragraph id="attack">Suddenly the skies were filled with aircraft</paragraph>
• An IDREF attribute (IDREFS for a list of references) is the easiest way of referring to an ID. Example: a DTD declares the employee attributes "empnumber" as an ID and "boss" as an IDREF; here we say that Hank's ID is emp126 and his boss is emp124 (defined earlier):
<employee empnumber="emp126" boss="emp124">Hank</employee>
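A minimal sketch of following such a reference in Python. The wrapper element "staff" and the boss's name "Rose" are invented for illustration. Without a DTD the parser does not know which attributes are IDs, so the lookup table is built by hand.

```python
import xml.etree.ElementTree as ET

# "staff" and "Rose" are hypothetical; only Hank's record comes from the slide.
doc = """
<staff>
  <employee empnumber="emp124">Rose</employee>
  <employee empnumber="emp126" boss="emp124">Hank</employee>
</staff>
"""

root = ET.fromstring(doc)

# Index every employee by its ID attribute.
by_id = {e.get("empnumber"): e for e in root.iter("employee")}

hank = by_id["emp126"]
boss = by_id[hank.get("boss")]  # follow the reference stored in "boss"

print(hank.text, "reports to", boss.text)  # Hank reports to Rose
```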
DOM (Document Object Model)
• DOM is a technology for accessing and manipulating parts of an XML document
• DOM models a document as a tree whose nodes are its elements
• The node objects expose properties and methods that allow this access and manipulation
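As a sketch of what these properties and methods look like, the snippet below uses Python's xml.dom.minidom, one DOM implementation among many and chosen here only for brevity. The element "farewell" is invented for the example.

```python
from xml.dom.minidom import parseString

dom = parseString(
    "<conversation>"
    "<greeting>Hello world!</greeting>"
    "<answer>Stop it, I'm getting off!</answer>"
    "</conversation>"
)

# Access: walk from the root element down to a text node.
root = dom.documentElement
greeting = root.getElementsByTagName("greeting")[0]
print(root.tagName)              # conversation
print(greeting.firstChild.data)  # Hello world!

# Manipulation: create a new element and attach it to the tree.
farewell = dom.createElement("farewell")  # hypothetical element
farewell.appendChild(dom.createTextNode("Bye"))
root.appendChild(farewell)

child_names = [node.tagName for node in root.childNodes]
print(child_names)  # ['greeting', 'answer', 'farewell']
```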
XPath
• XPath is a language for referencing parts of an XML document
• It is used, for instance, to transform a document through XSL
• XPath views a document as a tree of nodes, much as DOM does, and uses paths (similar to URLs) to reference parts of a document
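A short sketch of path expressions, assuming Python's ElementTree, which supports only a limited subset of XPath. The shortened book catalogue below is sample data for this example.

```python
import xml.etree.ElementTree as ET

catalogue = ET.fromstring("""
<BookCatalogue>
  <Book><Title>My Life and Times</Title><Date>July, 1998</Date></Book>
  <Book><Title>Illusions The Adventures of a Reluctant Messiah</Title><Date>1977</Date></Book>
  <Book><Title>The First and Last Freedom</Title><Date>1954</Date></Book>
</BookCatalogue>
""")

# A path selects every Title that is a child of a Book child of the root.
titles = [t.text for t in catalogue.findall("./Book/Title")]
print(len(titles))  # 3

# A predicate in square brackets filters by the text of a child element.
from_1977 = catalogue.find("./Book[Date='1977']/Title").text
print(from_1977)  # Illusions The Adventures of a Reluctant Messiah
```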
XPointer
• XPointer is a language for pointing at a part of an XML document
• XPointer uses XPath for pointing
• XPointer enables linking
Linking using XML: XLink
• XLink is a language for describing how to link resources in XML
• We put attributes from the xlink namespace ("http://www.w3.org/XML/XLink/1.0") on the linking element
• The attributes are used to describe endpoints, traversal, effect, resources
Tools
• XML Browsers (visualisers)
• XML Editors
• XML Parsers
• XML Servers
• Relational DB to XML converters
• XSL Editors
• XSL Processors
XSL
• Allows a design to be applied to an XML document, generating HTML, PDF, mail, SMS messages, ...
• Builds on CSS and DSSSL (SGML)
XSL
[Diagram: XML → Transformation Engine (XSL Parser) → HTML]
XSL
<?xml version="1.0"?>
<!DOCTYPE BookCatalogue SYSTEM "file://localhost/xml-course/xsl/BookCatalogue.dtd">
<BookCatalogue>
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>July, 1998</Date>
<ISBN>94303-12021-43892</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
</BookCatalogue>
XSL
Document /
  PI: <?xml version="1.0"?>
  DocumentType: <!DOCTYPE BookCatalogue ...>
  Element: BookCatalogue
    Element: Book
      Element: Title
        Text: My Life ...
      Element: Author
        Text: Paul McCartney
      Element: Date
        Text: July, 1998
      Element: ISBN
        Text: 94303-12021-43892
      Element: Publisher
        Text: McMillin Publishing
    Element: Book
      ...
    Element: Book
      ...
XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="BookCatalogue">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Book">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Title">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Author">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Date">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="ISBN">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Publisher">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>
XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<HTML>
<HEAD>
<TITLE>Book Catalogue</TITLE>
</HEAD>
<BODY>
<xsl:apply-templates/>
</BODY>
</HTML>
</xsl:template>
<xsl:template match="BookCatalogue">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Book">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Title">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Author">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Date">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="ISBN">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Publisher">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>
BookCatalogue.xsl (the HTML, HEAD, TITLE and BODY elements in the root template have been added)
XML-based formats
• XML is an architecture, not an application
– SMIL (Synchronized Multimedia Integration Language)
– RDF (Resource Description Framework) for metadata
– CDF (Channel Definition Format) for Microsoft channels
– MathML (Mathematical Markup Language)
– CML (Chemical Markup Language)
– BSML (Bioinformatic Sequence Markup Language)
– JML
– WIDL (B2B integration)
Processing
• Two approaches to processing XML documents using Java as the programming language:
– DOM (Document Object Model)
  · tree structure (nodes, elements and text), most used
– SAX (Serial Access with the Simple API for XML)
  · event based
  · fastest, lower memory requirements, more difficult to program
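To illustrate the event-based style, here is a minimal SAX handler. It uses Python's xml.sax rather than Java, purely to keep the sketch short; the callback names (startElement, characters, endElement) follow the same SAX model. The class TitleCollector is invented for this example.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


doc = (
    b"<bookshop>"
    b"<book><title>El Quijote</title></book>"
    b"<book><title>La ciudad de los prodigios</title></book>"
    b"</bookshop>"
)

handler = TitleCollector()
xml.sax.parseString(doc, handler)
print(handler.titles)  # ['El Quijote', 'La ciudad de los prodigios']
```

No tree is ever built, which is where the lower memory use comes from. The price is that the handler must track its own state, here the _in_title flag.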
Some references
• http://www.w3.org/
– Official website with all the standards
• http://www.xml.com/
– Website from the publisher O’Reilly. A lot of good documentation and resources.
• http://www.xfront.com/
– Very good tutorials on XSL and XML-Schema
• http://xml.apache.org
– Apache parsers and documentation (Xerces, Xalan, ...)
• XML and Java. B. McLAUGHLIN. O’Reilly, 2000
– Interesting on their combination using Apache parsers