XML – a meta language

Howell Istance and Peter Norris
School of Computing
De Montfort University
© De Montfort University, 2001
Origins of HTML
• Initial set of tags was defined for a limited job: cross-referencing scientific papers
• Tags carry little semantic content, although <address> does say something about what its content is
• Tag set was small and the HTML user community grew quickly – HTML was easy to learn and use
© De Montfort University, 2003
HTML
• Tags in HTML concerned with presentation of data
• Tags have no semantic meaning: <H1> indicates ‘header’
but contents of document could be anything
• Search engines rely on the <META> tag to provide keywords; they have no means of discerning content from the marked-up document itself
• Definition of tags is not extensible
• Cascading style sheets provide some separation of content
and presentation
• One of original aims of SGML was complete separation of
content and presentation
XML (eXtensible Markup Language)
• Simplified version of SGML, enabling users to define their
own language
• XML is not a tag set, but a meta language
• Tags not concerned with how to render data, but instead
define content
– (HTML) <b> indicates that text is bold
– (XML) <employee> indicates that data is about employees
• Definition is either implicit (deduced from document
structure) or is explicit (defined in Document Type
Definition or DTD)
students.html
<html>
  <head> </head>
  <body>
    <h2>Student List</h2>
    <ul>
      <li> 9906789 </li>
      <li>Adam</li>
      <li>[email protected]</li>
      <li>yes - final </li>
    </ul>
    <ul>
      <li> 9806791 </li>
      <li>Adrian</li>
      <li>[email protected]</li>
      <li>no</li>
    </ul>
  </body>
</html>
Specifies presentation
students.xml
<?xml version = "1.0"?>
<student_list>
  <student>
    <id> 9906789 </id>
    <name>Adam</name>
    <email>[email protected]</email>
    <bsc level="final">yes</bsc>
  </student>
  <student>
    <id> 9806791 </id>
    <name>Adrian</name>
    <email>[email protected]</email>
    <bsc>no</bsc>
  </student>
</student_list>
Specifies data structure
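Because the XML version names each piece of data, a program can ask for it by name rather than by position in a list. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name `bsc_students` is made up for illustration, and the XML declaration and email elements are left out of the embedded copy to keep it short.

```python
import xml.etree.ElementTree as ET

# Copy of students.xml from the slide (declaration and email elements omitted).
STUDENTS_XML = """\
<student_list>
  <student>
    <id> 9906789 </id>
    <name>Adam</name>
    <bsc level="final">yes</bsc>
  </student>
  <student>
    <id> 9806791 </id>
    <name>Adrian</name>
    <bsc>no</bsc>
  </student>
</student_list>
"""


def bsc_students(xml_text):
    """Return (id, name) for every student whose <bsc> element says 'yes'."""
    root = ET.fromstring(xml_text)
    selected = []
    for student in root.findall("student"):
        if student.findtext("bsc") == "yes":
            # The slide has spaces around the id, so strip them.
            selected.append((student.findtext("id").strip(),
                             student.findtext("name")))
    return selected


finalists = bsc_students(STUDENTS_XML)
print(finalists)
```

The same query against students.html would have to rely on the fourth `<li>` in each list happening to hold the BSc information.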
xml mark-up
• Start tag <… > possibly containing attributes and values
eg <page>, <page number="1">
• End tag </…> eg </page>
• Empty-element tag <… /> eg <author surname="Jones" />
• Document type declaration <!DOCTYPE …>
eg <!DOCTYPE page SYSTEM "my_page.dtd">
• xml declaration <?xml … ?>
eg <?xml version="1.0" encoding="UTF-8" ?>
• Comment <!-- … -->
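To see how a parser reports these kinds of mark-up, here is a small sketch using Python's standard `xml.dom.minidom`. The document is assembled from the examples on this slide; putting the author element inside page is an assumption made only to get a single root element.

```python
from xml.dom import Node, minidom

# Declaration, comment, start tag with attribute, empty-element tag, end tag.
DOC = """<?xml version="1.0" encoding="UTF-8" ?>
<!-- a comment before the root element -->
<page number="1">
  <author surname="Jones" />
</page>
"""

dom = minidom.parseString(DOC.encode("utf-8"))

# Top-level nodes of the document: the comment, then the root element.
top_level_kinds = [node.nodeType for node in dom.childNodes]

page = dom.documentElement
author = page.getElementsByTagName("author")[0]

print(page.tagName, page.getAttribute("number"))
print(author.getAttribute("surname"), author.hasChildNodes())
```

The empty-element tag produces an ordinary element node that simply has no children.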
Elements and trees
• Elements are house, downstairs and upstairs
<house>
  <downstairs> </downstairs>
  <upstairs> </upstairs>
</house>
• downstairs and upstairs are the ‘content’ of the element
house
• downstairs and upstairs themselves have no content (only white space)
• downstairs and upstairs are child elements of the root
element house
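The parent and child relationship can be inspected directly once the document is parsed. A minimal sketch with Python's standard `xml.etree.ElementTree`, using the house document from this slide:

```python
import xml.etree.ElementTree as ET

HOUSE = "<house><downstairs> </downstairs><upstairs> </upstairs></house>"

root = ET.fromstring(HOUSE)

# Iterating over an element yields its child elements in document order.
child_names = [child.tag for child in root]

# len(element) is the number of child elements it contains.
grandchild_counts = [len(child) for child in root]

print(root.tag, child_names, grandchild_counts)
```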
Xml node structure
[Figure: node tree for the house example]
Xml document
  house
    downstairs
      kitchen
      larder
      Dining_room
    upstairs
      bathroom
      bedroom
      bedroom
      bedroom
Well formed and valid
• xml document is ‘well-formed’ if the implicit structure in
the document is not ambiguous
• xml document is ‘valid’ if it conforms to a set of rules
which defines what each element is permitted to contain
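The difference between the two terms can be sketched in a few lines of Python. Well-formedness is checked by the parser itself; validity is imitated here with a hand-written table of permitted children, which is a hypothetical stand-in for a real set of rules such as a DTD (the parsers in Python's standard library do not validate).

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set (not a DTD): the child elements each element may contain.
PERMITTED_CHILDREN = {
    "house": {"downstairs", "upstairs"},
    "downstairs": set(),
    "upstairs": set(),
}


def classify(xml_text):
    """Say whether a document is well-formed, and if so whether it obeys the rules."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"
    for element in root.iter():
        allowed = PERMITTED_CHILDREN.get(element.tag)
        if allowed is None or any(child.tag not in allowed for child in element):
            return "well-formed but not valid"
    return "well-formed and valid"


print(classify("<house><upstairs></house>"))
print(classify("<house><garage/></house>"))
print(classify("<house><downstairs/><upstairs/></house>"))
```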
Text nodes
<house>
<downstairs> Rather gloomy</downstairs>
<upstairs> Brightly lit</upstairs>
</house>
• The content of elements may be other elements or text
• The text Rather gloomy and the text Brightly lit are each text nodes
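As a quick illustration, Python's standard `xml.etree.ElementTree` exposes the text at the start of an element through its `.text` attribute. Note that the leading space inside each element on the slide is part of the text.

```python
import xml.etree.ElementTree as ET

HOUSE = ("<house>"
         "<downstairs> Rather gloomy</downstairs>"
         "<upstairs> Brightly lit</upstairs>"
         "</house>")

root = ET.fromstring(HOUSE)

# Map each child element's name to the text it contains.
texts = {child.tag: child.text for child in root}

print(texts)
print(root.text)  # house contains only elements here, so it has no text of its own
```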
Attributes and values
<house postcode="LE1 9BH">
<downstairs> Rather gloomy</downstairs>
<upstairs> Brightly lit</upstairs>
</house>
• An attribute is a means of providing information about an element
• It has a name (postcode) and a value ("LE1 9BH")
• There may be several attributes; all must have unique names, and the order in which they appear is not important
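A short sketch of these points with Python's standard `xml.etree.ElementTree`, which presents attributes as a name-to-value dictionary. The second attribute, `storeys`, is invented for the example.

```python
import xml.etree.ElementTree as ET

# Same attributes, written in a different order.
first = ET.fromstring('<house postcode="LE1 9BH" storeys="2"/>')
second = ET.fromstring('<house storeys="2" postcode="LE1 9BH"/>')

postcode = first.get("postcode")
same_attributes = first.attrib == second.attrib

# Repeating an attribute name makes the document not well-formed.
try:
    ET.fromstring('<house postcode="LE1 9BH" postcode="LE2 7DR"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(postcode, same_attributes, duplicate_accepted)
```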
Rules for well formed xml
• There must be exactly one root element
• Each element start tag must have a matching end tag
• Elements may nest but must not overlap
• Attribute values must be in matching quotes
• An element may not have two attributes with the same name
• If present, the xml declaration must begin the document
• Comments and processing instructions must not appear inside tags
• Element content and attribute values must not contain a literal < or & (write &lt; and &amp; instead)
• Entities must not be used unless they have been declared (apart from the predefined ones: &lt; &gt; &amp; &apos; &quot;)
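Each rule can be tried out by handing a parser a document that breaks it. The following sketch uses Python's standard `xml.etree.ElementTree`, which reports any well-formedness error as `ParseError`; the one-line documents are invented for the purpose.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One deliberately broken document per rule.
BROKEN = {
    "two root elements": "<a/><b/>",
    "missing end tag": "<a><b></a>",
    "overlapping elements": "<a><b></a></b>",
    "unquoted attribute value": "<a x=1/>",
    "duplicate attribute name": '<a x="1" x="2"/>',
    "declaration not at start": ' <?xml version="1.0"?><a/>',
    "comment inside a tag": "<a <!-- c --> />",
    "literal & in content": "<a>fish & chips</a>",
    "undeclared entity": "<a>&nbsp;</a>",
}

results = {rule: is_well_formed(doc) for rule, doc in BROKEN.items()}
for rule, ok in results.items():
    print(f"{rule}: {'accepted' if ok else 'rejected'}")

# The corrected form of the '&' example is accepted.
print(is_well_formed('<a x="1">fish &amp; chips</a>'))
```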
Valid xml
• DTD defines permissible syntax of a document using
extended BNF
• A DTD is referenced as either Public or System
• Public – used by the world at large (say in the case of the
definition of SMIL)
• System – used, and usually found, on the local system
• Applications attempting to process an xml document can
compare it with the rules and process it only if it is valid
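The Public or System choice shows up in the document type declaration. As a rough sketch, Python's standard `xml.dom.minidom` records both identifiers without fetching or applying the DTD (it is not a validating parser). The public identifier and URL below are made-up placeholders, not a real published DTD.

```python
from xml.dom import minidom

# SYSTEM: only a system identifier (typically a local file or URL).
SYSTEM_DOC = '<!DOCTYPE page SYSTEM "my_page.dtd"><page/>'

# PUBLIC: a public identifier followed by a system identifier.
# Both identifiers here are hypothetical.
PUBLIC_DOC = ('<!DOCTYPE page PUBLIC "-//Example//DTD Page 1.0//EN" '
              '"http://example.org/page.dtd"><page/>')


def doctype_ids(xml_text):
    """Return (root name, public id, system id) from the DOCTYPE declaration."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.publicId, doctype.systemId


print(doctype_ids(SYSTEM_DOC))
print(doctype_ids(PUBLIC_DOC))
```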
Sample dtd
<!ELEMENT petshow (owner)>
<!ELEMENT owner (ownerName,pet)>
<!ELEMENT ownerName (#PCDATA)>
<!ELEMENT pet (petName,petType,petDateOfBirth,isAlive)>
<!ELEMENT petName (#PCDATA)>
<!ELEMENT petType (#PCDATA)>
<!ELEMENT petDateOfBirth (#PCDATA)>
<!ELEMENT isAlive EMPTY>
• petshow can have one and only one child, which must be an owner element
• #PCDATA – ‘parsed character data’ ie a text node
• No attributes included
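To make the sample DTD concrete, the sketch below pairs it with a document that follows it and a hand-written check. This is only an approximation of what a validating parser does: the content models are copied into a Python dictionary by hand, the names and values in the sample document are invented, and stray text between child elements is not checked.

```python
import xml.etree.ElementTree as ET

# Content models from the sample DTD as the exact child sequence expected.
# None stands for (#PCDATA); an empty list stands for EMPTY.
CONTENT_MODEL = {
    "petshow": ["owner"],
    "owner": ["ownerName", "pet"],
    "ownerName": None,
    "pet": ["petName", "petType", "petDateOfBirth", "isAlive"],
    "petName": None,
    "petType": None,
    "petDateOfBirth": None,
    "isAlive": [],
}

# Invented sample data that follows the DTD.
SAMPLE = """
<petshow>
  <owner>
    <ownerName>Jones</ownerName>
    <pet>
      <petName>Rex</petName>
      <petType>dog</petType>
      <petDateOfBirth>1999-05-01</petDateOfBirth>
      <isAlive/>
    </pet>
  </owner>
</petshow>
"""


def conforms(element):
    """Check an element and its descendants against CONTENT_MODEL."""
    if element.tag not in CONTENT_MODEL:
        return False
    expected = CONTENT_MODEL[element.tag]
    children = list(element)
    if expected is None:                 # #PCDATA: text only
        return not children
    if expected == []:                   # EMPTY: no children and no text
        return not children and not element.text
    if [child.tag for child in children] != expected:
        return False
    return all(conforms(child) for child in children)


print(conforms(ET.fromstring(SAMPLE)))
print(conforms(ET.fromstring("<petshow><owner/><owner/></petshow>")))
```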
Occurrence suffixes
• ?        0 or 1
• *        0 or more
• +        1 or more
• x,y,z    in this order
• x|y|z    one of these
• Used to define permissible number of occurrences within
DTD
Modified DTD showing occurrences
<!ELEMENT petshow (owner+)>
<!ELEMENT owner (ownerName,pet+)>
<!ELEMENT ownerName (#PCDATA)>
<!ELEMENT pet (petName,petType,petDateOfBirth,(isAlive|petDateOfDeath))>
<!ELEMENT petName (#PCDATA)>
<!ELEMENT petType (#PCDATA)>
<!ELEMENT petDateOfBirth (#PCDATA)>
<!ELEMENT isAlive EMPTY>
<!ELEMENT petDateOfDeath (#PCDATA)>
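The occurrence suffixes behave like the same symbols in regular expressions: a comma becomes concatenation and | becomes alternation. As an informal sketch (not how a real validating parser is implemented), the content models of this DTD can be rewritten by hand as regular expressions over the child element names and tested in Python.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the modified DTD, rewritten by hand as regular
# expressions. Each child name is followed by one space as a separator.
MODEL_AS_REGEX = {
    "petshow": r"(owner )+",
    "owner": r"ownerName (pet )+",
    "pet": r"petName petType petDateOfBirth (isAlive |petDateOfDeath )",
}


def children_match(element):
    """True if the element's child names fit its content model."""
    pattern = MODEL_AS_REGEX[element.tag]
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(pattern, child_names) is not None


two_pets = ET.fromstring("<owner><ownerName/><pet/><pet/></owner>")
no_pets = ET.fromstring("<owner><ownerName/></owner>")
dead_pet = ET.fromstring(
    "<pet><petName/><petType/><petDateOfBirth/><petDateOfDeath/></pet>")
both = ET.fromstring(
    "<pet><petName/><petType/><petDateOfBirth/><isAlive/><petDateOfDeath/></pet>")

print(children_match(two_pets), children_match(no_pets))
print(children_match(dead_pet), children_match(both))
```

The check looks only at the direct children of one element; descendants would need the same test applied in turn.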
Modified DTD showing attributes
<!ELEMENT petshow (owner+)>
<!ELEMENT owner (ownerName,pet+)>
<!ATTLIST owner
ownerID CDATA #REQUIRED>
<!ELEMENT ownerName (#PCDATA)>
<!ELEMENT pet (petName,petType,petDateOfBirth,(isAlive|petDateOfDeath))>
<!ATTLIST pet
petSid CDATA #REQUIRED
previousOwner IDREF #IMPLIED>
<!ELEMENT petName (#PCDATA)>
<!ELEMENT petType (#PCDATA)>
<!ELEMENT petDateOfBirth (#PCDATA)>
<!ELEMENT isAlive EMPTY>
<!ELEMENT petDateOfDeath (#PCDATA)>
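The effect of #REQUIRED and #IMPLIED can be sketched with a hand-written check in Python. The attribute declarations are copied from the DTD into a dictionary; the sample documents and attribute values are invented. The sketch does not check what an IDREF value points to; a validating parser would additionally expect each IDREF value to match an attribute declared with type ID elsewhere in the document.

```python
import xml.etree.ElementTree as ET

# Attribute declarations copied from the DTD: name -> (type, default).
ATTLIST = {
    "owner": {"ownerID": ("CDATA", "#REQUIRED")},
    "pet": {
        "petSid": ("CDATA", "#REQUIRED"),
        "previousOwner": ("IDREF", "#IMPLIED"),
    },
}


def attribute_problems(root):
    """List missing required attributes and attributes the DTD does not declare."""
    problems = []
    for element in root.iter():
        declared = ATTLIST.get(element.tag, {})
        for name, (_type, default) in declared.items():
            if default == "#REQUIRED" and name not in element.attrib:
                problems.append(f"{element.tag}: missing required attribute {name}")
        for name in element.attrib:
            if name not in declared:
                problems.append(f"{element.tag}: undeclared attribute {name}")
    return problems


good = ET.fromstring(
    '<petshow><owner ownerID="o1"><pet petSid="p1"/></owner></petshow>')
bad = ET.fromstring(
    '<petshow><owner><pet petSid="p1" colour="black"/></owner></petshow>')

print(attribute_problems(good))
print(attribute_problems(bad))
```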