XHTML: The New HTML - Centrum Wiskunde & Informatica
XHTML
Steven Pemberton
CWI, Amsterdam
Chair, W3C HTML Working Group
Overview
History
Philosophy
XML and related technologies
XHTML 1.0
Modularisation
XHTML Basic
XHTML 1.1
The Future
HTML 1
The original HTML was designed in the
early 1990s for scientific reports
Each document was a single resource (not
even <IMG>)
(This explains much about HTTP by the
way)
(HTML 1)
It is amazing how much we have been
able to do with a language with such
beginnings
It was described using SGML
HTML as an SGML Application
SGML: an international standard in 1986
It is a Meta-language that describes data
formats, using DTDs (Document Type
Definitions)
Describes structure, not presentation
<H1>HTML as SGML Application</H1>
Example of a DTD fragment
<!ELEMENT table
(caption?, (col*|colgroup*), thead?,
tfoot?, (tbody+|tr+))>
<!ELEMENT caption %Inline;>
<!ELEMENT thead (tr)+>
...
Attributes
<!ATTLIST TABLE
  %attrs;                      -- %coreattrs, %i18n, %events --
  summary   %Text;    #IMPLIED
  width     %Length;  #IMPLIED
  border    %Pixels;  #IMPLIED
  …
>
Entities
<!ENTITY % fontstyle
"TT | I | B | BIG | SMALL">
<!ENTITY % inline "#PCDATA | %fontstyle;
| %phrase; | %special; | %formctrl;">
<!ENTITY % Length "CDATA" -- nn for
pixels or nn% for percentage length -->
Problems with SGML
Arcane syntax
Very difficult to implement fully
No support for types
Changes to HTML
Netscape and Microsoft start adding to
HTML: mostly presentation-oriented tags
(like <BLINK>, <CENTER>), and frames
The World Wide Web Consortium (W3C)
started effort to:
Keep HTML Pure
Do presentation via Style Sheets
Separating content and
presentation
HTML was designed as a data-structuring
language, but the later changes
undermined this.
Separating content from presentation has
distinct advantages
For the author
Easier to write your documents
Easier to change your documents
Easy to change the look of your
documents
Access to professional designs
Your documents are smaller
Visible on more devices
Visible to more people
For the webmaster
Separation of concerns
Simpler HTML, less training
Cheaper to produce, easier to manage
Easy to change house style
Reach more people
Search engines find your stuff easier
Visible on more devices
For the reader
Faster download (one of the top 4 reasons
for liking a site)
Easier to find information
You can actually read the information if
you are sight-impaired
Information more accessible
You can use more devices
For the implementor
Improves the implementation (separation
of concerns)
Can produce smaller browsers
Changes to HTML (2)
Another change that Netscape made with
insufficient thought was Frames
Frames create significant problems with
web pages
The problems with frames
Can’t bookmark framesets
[Back] does odd things
[Page up] and [page down] work oddly
[Reload] often doesn’t work right
Security is compromised
Nested frames are hard to deal with (how
do you get out?)
What frames can do
Search and show interfaces
Keeping script variables in a hidden frame
Style languages
The first thing W3C did was to start
an activity on Style Sheets (Nov 1995)
This produced CSS1 initially (Dec 1996),
then CSS2 (May 1998) (CSS3 is in
preparation)
Later produced XSL, an XML-based
language, as complementary to CSS
CSS
CSS is a separate language from HTML
that allows you to specify how an HTML
document, or set of documents, should
look
Separates content from presentation
HTML can be a structure language again
Examples of CSS
h1 { font-weight: bold; font-size: 2em }
h2 { font-weight: bold; font-size: 1.5em }
em {background-color: yellow}
body {margin-left: 20%}
Using CSS
Use the following at the top of an XML
document:
<?xml-stylesheet type='text/css'
href='mystyle.css'?>
Or this in the <head> of an HTML
document:
<link rel="stylesheet" type="text/css"
href="mystyle.css" />
Advantages of CSS
Makes HTML easier to write (and read)
You can define a house style
Compatible: you can still see the content
on non-CSS browsers
Pages are much smaller
Accessible to sight-impaired
...
By the way...
Check your logs: more than 95% of
people browsing now use a CSS-enabled
browser
The current generation of browsers (IE 5,
NS 6, Opera 4) have excellent support for
CSS.
You never need to use the <FONT> element
(or its FACE attribute) again!
Documents
As mentioned, HTML was designed for
just one sort of document (scientific
reports), but is now being used for all
sorts of different documents
You could use SGML to define other sorts
of document, but SGML is notoriously
hard to fully implement
Enter XML
Enter XML
XML is a W3C effort to simplify SGML
It is a meta-language: a language for
defining languages
It is a subset of SGML
One of the aims is to allow everyone to
invent their own tags
DTD is optional: a DTD can be inferred
from a document
Consequences
The requirement of being able to infer a
DTD from a document has an effect on
the languages you can define:
Closing tags are now required
<LI>....</LI> <P>....</P>
Empty tags are marked specially
<IMG SRC="pic.gif"/> <BR/> <HR/> (or
<HR></HR> etc)
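A minimal sketch of what these two rules mean to a parser. It assumes Python's standard xml.etree.ElementTree module as the XML parser (the slides name no tool), and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# HTML-style markup: <LI> is never closed and <BR> has no end marker.
html_style = "<UL><LI>one<LI>two<BR></UL>"

# XML-style markup: every element is closed, empty ones end in "/>".
xml_style = "<UL><LI>one</LI><LI>two<BR/></LI></UL>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xml_style))    # True
print(is_well_formed("<HR></HR>"))  # True: the long form is also allowed
```

An XML parser needs no DTD to reject the first string: the missing end tags alone make it ill-formed.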
Consequences 2
CDATA sections must be marked as such
(only necessary if they contain “<”, “&”
etc.):
<SCRIPT>
<![CDATA[
... script content ...
]]>
</SCRIPT>
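A small sketch of why the CDATA marking matters, again assuming Python's standard xml.etree.ElementTree as the parser. The script text is an invented example.

```python
import xml.etree.ElementTree as ET

# Script content containing "<" and "&" is wrapped in a CDATA section,
# so the parser treats it as plain text rather than markup.
with_cdata = "<SCRIPT><![CDATA[if (a < b && c) run();]]></SCRIPT>"
without_cdata = "<SCRIPT>if (a < b && c) run();</SCRIPT>"

script = ET.fromstring(with_cdata)
print(script.text)  # if (a < b && c) run();

# Without the CDATA section the same content is not well-formed XML.
try:
    ET.fromstring(without_cdata)
    parsed = True
except ET.ParseError:
    parsed = False
print(parsed)  # False
```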
By the way:
<P> is not like <BR>
Not Like This
<H1>XML</H1>
An underlying problem
with HTML is that ...
<P>
You could use SGML to
define ...
But Like This
<H1>XML</H1>
<P>
An underlying problem
with HTML is that …
</P>
<P>
You could use SGML to
define ...</P>
Consequence of XML
Anyone can now design their own (Web-delivered) languages
CSS makes them viewable
<address>
<name>Steven Pemberton</name>
<company>CWI</company>
<street>Kruislaan 413</street>
<postcode>1098 SJ</postcode>
<city>Amsterdam</city>
<speaker/>
</address>
So do we still need HTML?
Workshop in May 1998
XML is still a meta-language
There is still a perceived need for a baseline mark-up
HTML has some useful semantics, both
implied and explicit (search engines gladly
use it, for instance)
HTML as XML application
Clean up (get rid of historical flotsam)
Modularise – split into separate parts
Allows other XML applications to use parts
Allows special purpose devices to use subset
Add any required new functionality
(forms, better event handling, Ruby)
The HTML Working group
International membership, around 20
members
Many major players (IBM, Microsoft,
Netscape, etc)
Meets weekly by phone, quarterly face-to-face
Group experience
There was more to be worked out than
we anticipated
XHTML is the first major application of
XML, so the world’s eyes are on us
XML still needs the wrinkles ironed out
Philosophy of XHTML
Transition from ‘old world’ to XML
Clean up the language
Return to structure only
Use generic XML as much as possible
Modularise
Address wider needs (International,
Accessibility)
Add new functionality
Plan of action
HTML 4.01: corrected version
XHTML 1.0: transitional version of HTML
4.01 in 3 flavours
Modularisation: agreement on split and
methodology
XHTML Basic: Small devices
XHTML 1.1: clean version of 1.0 strict
(plan of action)
Events: accessible and device-independent
Ruby: needed Asian markup
Forms: more control
XHTML 2.0: Putting it all together
Differences HTML:XHTML
Because of the difference between SGML
and XML, there are some necessary
differences, for instance:
Use lower case: <p> not <P>
Attributes are always quoted:
<th colspan="2">
Anchors use id attribute not name (and not
just on <a> by the way):
<a id="index"> <p id="top">
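A short sketch of two of these differences as an XML parser sees them. XML is case-sensitive, so <p> and <P> are different names (XHTML defines its elements in lower case), and unquoted attribute values are not well-formed. Python's standard xml.etree.ElementTree is assumed as the parser; the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Return True if the markup is accepted by an XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(parses('<table><tr><th colspan="2">x</th></tr></table>'))  # True
print(parses('<table><tr><th colspan=2>x</th></tr></table>'))    # False: unquoted value
print(parses('<p>text</P>'))                                      # False: <p> is not <P>
```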
Example XHTML 1.0
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en">
<head><title>Virtual Library</title></head>
<body>
<p>Moved to <a href="http://vlib.org/">vlib.org</a>.
</p>
</body>
</html>
Namespaces
Namespaces have been added to XML to
allow you to mix fragments from different
languages (e.g. HTML + Maths)
In the same way that object-oriented
languages allow you to identify which
function you are using, namespaces allow
you to identify which tags you are using.
Example of nesting
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>A Math Example</title></head>
<body>
<p>The following is MathML markup:</p>
<math xmlns="http://www.w3.org/TR/REC-MathML">
<apply><log/><logbase><cn> 3 </cn> </logbase>
<ci> x </ci>
</apply>
</math>
</body>
</html>
Example of colonising
<math xmlns="http://www.w3.org/TR/REC-MathML"
xmlns:html="http://www.w3.org/1999/xhtml">
<apply><log/><logbase><cn> 3 </cn> </logbase>
<ci> x </ci>
</apply>
<html:p>This is a paragraph</html:p>
</math>
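A sketch of how a namespace-aware processor tells the two vocabularies apart, assuming Python's standard xml.etree.ElementTree. The namespace URIs are copied from the slides; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/TR/REC-MathML"  # URI as used on the slide

doc = f"""<html xmlns="{XHTML_NS}">
  <head><title>A Math Example</title></head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="{MATHML_NS}">
      <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
    </math>
  </body>
</html>"""

root = ET.fromstring(doc)

# ElementTree reports each tag as "{namespace-uri}local-name".
by_namespace = {}
for element in root.iter():
    uri, _, local = element.tag[1:].partition("}")
    by_namespace.setdefault(uri, []).append(local)

print(by_namespace[XHTML_NS])   # ['html', 'head', 'title', 'body', 'p']
print(by_namespace[MATHML_NS])  # ['math', 'apply', 'log', 'logbase', 'cn', 'ci']

# The prefixed ("colonised") form names the same element: only the URI matters.
prefixed = ET.fromstring(
    f'<html:p xmlns:html="{XHTML_NS}">This is a paragraph</html:p>'
)
print(prefixed.tag == f"{{{XHTML_NS}}}p")  # True
```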
Namespaced attributes
Attributes normally come from the
element itself:
<html:a href="next.xml">
But you may also use ‘global’ attributes
from a namespace:
<pointer html:href="x.xml">
<music style="classical" html:style="color:
red">Beethoven’s 5th</music>
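A sketch showing that the element's own attribute and the 'global' namespaced attribute are kept apart, even though both are called style. Python's standard xml.etree.ElementTree is assumed.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

music = ET.fromstring(
    f'<music style="classical" xmlns:html="{XHTML_NS}" '
    'html:style="color: red">Beethoven\'s 5th</music>'
)

# The unprefixed attribute belongs to <music> itself.
print(music.get("style"))                 # classical
# The prefixed attribute is looked up by namespace URI plus local name.
print(music.get(f"{{{XHTML_NS}}}style"))  # color: red
```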
XML ‘namespace’
XML also has its own pseudo-namespace
for reserved attributes:
<para xml:lang="en">
Using ‘generic’ XML
Presentation: use CSS
Links: use XLink or Schemas
Forms: use CSS?
Images etc.: use XLink or Schemas
(Natural) language of elements: use the
xml:lang attribute
XLink?
HTML has several ‘built-in’ hyperlinks:
<a>, <img>, <object>, <link>, etc.
Since XML allows you to define your own
elements, a browser doesn’t know which
are links
XLink was started to solve this problem.
XLink
XLink started as a method of describing
which attributes of an element were a link
It later changed into a language of links,
so it could no longer be used to describe
XHTML
The current plan is now to introduce types
into Schemas to describe links
Example of XLink
<crossReference
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="students.xml"
xlink:role="studentlist"
xlink:title="Student List"
xlink:show="new"
xlink:actuate="onRequest">
Current List of Students
</crossReference>
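A sketch of how a generic processor could find links in a vocabulary it has never seen, by looking only at the XLink attributes. Python's standard xml.etree.ElementTree is assumed; the <course> wrapper, the <note> element and the function name find_simple_links are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

doc = f"""<course xmlns:xlink="{XLINK_NS}">
  <crossReference xlink:type="simple" xlink:href="students.xml"
                  xlink:title="Student List" xlink:show="new"
                  xlink:actuate="onRequest">Current List of Students</crossReference>
  <note>No link here</note>
</course>"""

def find_simple_links(root):
    """Yield (element name, href, title) for every simple XLink in the tree."""
    for element in root.iter():
        if element.get(f"{{{XLINK_NS}}}type") == "simple":
            yield (element.tag,
                   element.get(f"{{{XLINK_NS}}}href"),
                   element.get(f"{{{XLINK_NS}}}title"))

links = list(find_simple_links(ET.fromstring(doc)))
print(links)  # [('crossReference', 'students.xml', 'Student List')]
```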
Schemas
Schemas are a new technology to replace
much of DTDs.
Schemas are expressed in XML
They have support for data types
Much easier to parse and implement than
DTDs
Schemas: but
They don’t support the definition of
entities (such as &eacute;)
Not easy to read (or write)
Schema fragment
<elementType name='table'>
<refines>
<archetypeRef name='common'/>
<archetypeRef name='simpleBlockDisplay'/>
</refines>
<sequence>
<elementTypeRef name='caption' minOccur='0' maxOccur='1'/>
<choice>
<elementTypeRef name='col' minOccur='0' maxOccur='*'/>
<elementTypeRef name='colgroup' minOccur='0' maxOccur='*'/>
</choice>
<choice>
<sequence>
<elementTypeRef name='thead' minOccur='0' maxOccur='1'/>
<elementTypeRef name='tfoot' minOccur='0' maxOccur='1'/>
<elementTypeRef name='tbody' minOccur='1' maxOccur='*'/>
</sequence>
<elementTypeRef name='tr' minOccur='1' maxOccur='*'/>
</choice>
</sequence>
</elementType>
(equivalent DTD)
<!ELEMENT table
(caption?, (col*|colgroup*), thead?,
tfoot?, (tbody+|tr+))>
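A content model like this is a regular expression over the names of an element's children. The sketch below transcribes the table model into a Python regular expression to show how it constrains child order. This is only an illustration, not how a validating parser is implemented, and the function name valid_table_children is hypothetical.

```python
import re

# (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))
# Each child name is followed by one space so names cannot run together.
TABLE_MODEL = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)

def valid_table_children(children):
    """Check a list of child element names against the table content model."""
    sequence = "".join(name + " " for name in children)
    return TABLE_MODEL.fullmatch(sequence) is not None

print(valid_table_children(["caption", "thead", "tbody", "tbody"]))  # True
print(valid_table_children(["col", "col", "tr", "tr"]))              # True
print(valid_table_children(["col", "colgroup", "tr"]))               # False: col and colgroup mixed
print(valid_table_children(["thead"]))                               # False: no tbody or tr
```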
XHTML 1.0
XHTML 1.0 is an XML-ised version of
HTML 4.01
Just like HTML 4.01, there are 3 versions:
‘strict’, ‘transitional’ (loose), and ‘frameset’
Transitional version
XHTML 1.0 has been carefully designed to
make use of ‘quirks’ in existing HTML
browsers
Use of a small number of guidelines
allows XHTML to be served to HTML user
agents as well as XML user agents
Examples of Guidelines
Use space before / of empty elements:
<br /> <hr /> <img src="foo.gif" />
Don’t use <hr></hr> form
Use name= and id= on <a>:
<a name="index" id="index"> … </a>
Serving XHTML 1.0
An XHTML 1.0 document that follows the
guidelines can be served up either as
HTML, or as XML
But beware: CSS has slightly different
rules for HTML and XML
Similarly, the DOM has differences for
HTML and XML
Modularisation
XHTML has been divided into a number of
modules.
A module is a collection of elements
and/or attributes that can be used as
building blocks to build a DTD.
(modularisation)
A language can be built by using just
XHTML modules, or adding your own
We had originally defined Modularisation
just for our own use, but it has turned out
useful for other groups as well
XHTML modules
Structure: html, head, title, body
Text: abbr, acronym, address, blockquote,
br, cite, code, dfn, div, em, h1, h2, h3, h4,
h5, h6, kbd, p, pre, q, samp, span, strong,
var
Hypertext: a
List: ol, ul, dl, li, dt, dd
(modules)
Applet (deprecated): applet, param
Presentation: b, i, hr, big, small, sub, sup,
tt
Edit: del, ins
Bi-directional Text: bdo
(modules)
Basic Forms: simple forms
Forms: full forms
Basic Tables: simple tables
Tables: full tables
(modules)
Image: img
Client-side Image Map: map, +
Server-side Image Map: change to img
Object: object, param
Frames
Target: attribute
Iframe
(modules)
Intrinsic Events: adds events attributes
Metainformation: meta
Scripting: script
Stylesheet: style
Style Attribute
Link: link
(modules)
Base: base
Name Identification: name attribute
Legacy: basefont, center, font, s, strike, u,
plus loads of attributes (eg align)
Ruby: Asian markup
Note on modules
Note that some modules consist of a
single element, or just add some
attributes to existing elements
Not all modules are independent: if you
use some modules, they bring other
modules with them, or change other
modules
Future modules are planned (eg extended
forms, events)
The XHTML family
To still be called an XHTML language you
must use Structure, Hypertext, Basic Text,
and List modules (you may define your
own Structure module)
Example integration
languages
SMIL is planning a module to integrate
SMIL and HTML
Likewise for MathML
Creating a DTD
It is not expected that creating XHTML-based languages will be a daily activity
This is not the place to describe the
method: it depends on understanding DTDs.
The Modularisation document has
extensive examples
Future versions will also use Schemas (we
hope…)
XHTML Basic
XHTML Basic is the first XHTML family member to be defined using
Modularisation
It is designed for small devices, typically
mobile telephones
XHTML Basic Modules
Structure Module*
body, head, html, title
Text Module*
abbr, acronym, address,
blockquote, br, cite, code, dfn,
div, em, h1, h2, h3, h4, h5, h6,
kbd, p, pre, q, samp, span,
strong, var
(XHTML Basic Modules)
Hypertext Module*
a
List Module*
dl, dt, dd, ol, ul, li
Basic Forms Module
form, input, label, select,
option, textarea
(XHTML Basic Modules)
Basic Tables Module
caption, table, td, th, tr
Image Module
img
Object Module
object, param
(XHTML Basic Modules)
Metainformation Module
meta
Link Module
link
Base Module
base
XHTML Basic usage
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
XHTML 1.1
XHTML 1.1 is the second family member
to be defined using Modularisation
Its main aim is to present a cleaned-up,
non-transitional version of XHTML 1.0
strict (no frames)
It also adds Ruby markup
Otherwise: no new functionality
XHTML 1.1 Modules
Structure, Text, Hypertext, List, Object,
Presentation, Edit, Bidirectional Text,
Forms, Tables, Image, Client-side Image
Map, Server-side Image Map, Intrinsic
Events, Metainformation, Scripting,
Stylesheet Module, Style Attribute
(Deprecated), Link, Base, Ruby.
Example XHTML 1.1
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" >
<head> <title>Virtual Library</title> </head>
<body>
<p>Moved to <a
href="http://vlib.org/">vlib.org</a>.</p>
</body>
</html>
Ruby
Example Ruby markup
<ruby>
<rb>WWW</rb>
<rp>(</rp><rt>World Wide
Web</rt><rp>)</rp>
</ruby>
(Use CSS to describe presentation)
XHTML 2.0
XHTML 2.0 is still in preparation
New forms
New events
More accessibility
Forms
Being produced by a separate group
Consists of three parts:
data model
instances
user interface
Will allow you to
save and restore forms
download multi-page forms
(Forms)
Will include much more client-side
checking
Form data will be sent to the server as
XML
Separates content from presentation (e.g.
a radio button and a select box both allow
you to select one from many, and you
may want to use different choices on
different devices)
Events
Current events are almost all defined in
terms of the mouse: onclick, onmouseover,
onfocus, etc.
Future event model will be device
independent, and allow you to define your
own new events
Uses the DOM event model
The DOM
Document Object Model: how you access a
document via scripting
Currently only an XML DOM
An XHTML DOM is being investigated
Accessibility and
Internationalisation
W3C has an accessibility group that
checks that new recommendations
address people with accessibility needs
There is also an internationalisation group
that does the same for cultural issues
(which produced <ruby>)
Accessibility problems
A sighted person can work out the
structure from the visual presentation
A non-sighted person cannot: the
structure must be present in the markup
That is why new features were added to
forms and tables in HTML 4, like
<caption>
Structure
Text would also benefit from such a
treatment: not h1, h2 etc (which are
subject to misuse) but nested sections
with their own headings
Example of structure
<section>
<h>XHTML</h>
…
<section>
<h>Structure</h>
…
</section>
</section>
CSS can still handle it
section h { how an h1 should look }
section section h { h2 }
section section section h { h3 }
etc.
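The same idea can be sketched outside CSS: the heading level is simply the number of enclosing sections. This assumes Python's standard xml.etree.ElementTree; the function name heading_levels is hypothetical, and the <section> and <h> elements are the proposal from the slide, not an existing XHTML module.

```python
import xml.etree.ElementTree as ET

doc = """<section>
  <h>XHTML</h>
  <section>
    <h>Structure</h>
  </section>
</section>"""

def heading_levels(element, depth=0):
    """Yield (level, text) for each <h>; level = number of enclosing sections."""
    if element.tag == "section":
        depth += 1
    for child in element:
        if child.tag == "h":
            yield depth, child.text
        else:
            yield from heading_levels(child, depth)

print(list(heading_levels(ET.fromstring(doc))))
# [(1, 'XHTML'), (2, 'Structure')]
```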
Conclusions
XML with related technologies gives you
the freedom to define and deliver your
own document types
HTML is still needed as a base-line
markup
The new HTML gives a transition path to
the future
The State of Things
New generation of XML+CSS browsers
emerging
Many XML applications appearing
Major companies planning XML as output
(Adobe PDF, MS Office 2000)
Now: HTML4, XHTML 1.0, Modularisation,
Basic, 1.1
To Find Out More
All XHTML developments are made public
at www.w3.org/Markup
Members of W3C can also look at
www.w3.org/Markup/Group