XHTML: The New HTML - Centrum Wiskunde & Informatica

XHTML
Steven Pemberton
CWI, Amsterdam
Chair, W3C HTML Working Group
Overview
History
Philosophy
XML and related technologies
XHTML 1.0
Modularisation
XHTML Basic
XHTML 1.1
The Future
HTML 1
The original HTML was designed in the
early 1990s for scientific reports
Each document was a single resource (not
even <IMG>)
(This explains much about HTTP by the
way)
(HTML 1)
It is amazing how much we have been
able to do with a language with such
beginnings
It was described using SGML
HTML as an SGML Application
SGML: an international standard in 1986
It is a meta-language that describes data
formats, using DTDs (Document Type
Definitions)
Describes structure, not presentation
<H1>HTML as SGML Application</H1>
Example of a DTD fragment
<!ELEMENT table
(caption?, (col*|colgroup*), thead?,
tfoot?, (tbody+|tr+))>
<!ELEMENT caption %Inline;>
<!ELEMENT thead (tr)+>
...
Attributes
<!ATTLIST TABLE
  %attrs;    -- %coreattrs, %i18n, %events --
  summary  %Text;    #IMPLIED
  width    %Length;  #IMPLIED
  border   %Pixels;  #IMPLIED
  …
>
Entities
<!ENTITY % fontstyle
"TT | I | B | BIG | SMALL">
<!ENTITY % inline "#PCDATA | %fontstyle;
| %phrase; | %special; | %formctrl;">
<!ENTITY % Length "CDATA" -- nn for
pixels or nn% for percentage length -->
Problems with SGML
Arcane syntax
Very difficult to implement fully
No support for types
Changes to HTML
Netscape and Microsoft start adding to
HTML: mostly presentation-oriented tags
(like <BLINK>, <CENTER>), and frames
The World Wide Web Consortium (W3C)
started effort to:
Keep HTML Pure
Do presentation via Style Sheets
Separating content and
presentation
HTML was designed as a data-structuring
language, but the later changes
undermined this.
Separating content from presentation has
distinct advantages
For the author
Easier to write your documents
Easier to change your documents
Easy to change the look of your
documents
Access to professional designs
Your documents are smaller
Visible on more devices
Visible to more people
For the webmaster
Separation of concerns
Simpler HTML, less training
Cheaper to produce, easier to manage
Easy to change house style
Reach more people
Search engines find your stuff easier
Visible on more devices
For the reader
Faster download (one of the top 4 reasons
for liking a site)
Easier to find information
You can actually read the information if
you are sight-impaired
Information more accessible
You can use more devices
For the implementor
Improves the implementation (separation
of concerns)
Can produce smaller browsers
Changes to HTML (2)
Another change that Netscape made, with
insufficient thought, was Frames
Frames create significant problems with
web pages
The problems with frames
Can’t bookmark framesets
[Back] does odd things
[Page up] and [page down] work oddly
[Reload] often doesn’t work right
Security is compromised
Nested frames are hard to deal with (how
do you get out?)
What frames can do
Search and show interfaces
Keeping script variables in a hidden frame
Style languages
W3C’s first action was to start
an activity on Style Sheets (Nov 1995)
This produced CSS1 initially (Dec 1996),
then CSS2 (May 1998) (CSS3 is in
preparation)
Later produced XSL, an XML-based
language, as complementary to CSS
CSS
CSS is a separate language from HTML
that allows you to specify how an HTML
document, or set of documents, should
look
Separates content from presentation
HTML can be a structure language again
Examples of CSS
h1 { font-weight: bold; font-size: 2em }
h2 { font-weight: bold; font-size: 1.5em }
em {background-color: yellow}
body {margin-left: 20%}
Using CSS
Use the following at the top of an XML
document:
<?xml-stylesheet type='text/css'
href='mystyle.css'?>
Or this in the <head> of an HTML
document:
<link rel="stylesheet" type="text/css"
href="mystyle.css" />
Advantages of CSS
Makes HTML easier to write (and read)
You can define a house style
Compatible: you can still see the content
on non-CSS browsers
Pages are much smaller
Accessible to sight-impaired
...
By the way...
Check your logs: more than 95% of
people browsing now use a CSS-enabled
browser
The current generation of browsers (IE 5,
NS 6, Opera 4) have excellent support for
CSS.
You never need to use the <FONT> and
<FONTFACE> elements again!
Documents
As mentioned, HTML was designed for
just one sort of document (scientific
reports), but is now being used for all
sorts of different documents
You could use SGML to define other sorts
of document, but SGML is notoriously
hard to fully implement
Enter XML
Enter XML
XML is a W3C effort to simplify SGML
It is a meta-language: a language for
defining languages
It is a subset of SGML
One of the aims is to allow everyone to
invent their own tags
DTD is optional: a DTD can be inferred
from a document
Consequences
The requirement of being able to infer a
DTD from a document has an effect on
the languages you can define:
Closing tags are now required
<LI>....</LI> <P>....</P>
Empty tags are marked specially
<IMG SRC="pic.gif"/> <BR/> <HR/> (or
<HR></HR> etc)
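A minimal sketch of what these two rules mean in practice, using Python's standard xml.etree.ElementTree parser as a stand-in for any XML parser. The helper name is_well_formed and the sample strings are illustrative assumptions, not part of the talk.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# HTML-style markup: the <LI> elements are never closed.
html_style = "<UL><LI>one<LI>two</UL>"

# XML-style markup: every element is closed, and the empty <BR> ends in "/>".
xml_style = "<UL><LI>one</LI><LI>two<BR/></LI></UL>"

print(is_well_formed(html_style))   # the unclosed <LI> is rejected
print(is_well_formed(xml_style))    # accepted
print(is_well_formed("<HR></HR>"))  # the long form of an empty element is also accepted
```

An SGML parser needs the DTD to know that </LI> may be omitted; an XML parser has no DTD to consult, so the document itself has to say where every element ends.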
Consequences 2
CDATA sections must be marked as such
(only necessary if they contain “<”, “&”
etc.):
<SCRIPT>
<![CDATA[
... script content ...
]]>
</SCRIPT>
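The effect of the CDATA marking can be sketched with Python's standard xml.etree.ElementTree parser. The script content below is an invented placeholder; only the SCRIPT wrapper comes from the slide.

```python
import xml.etree.ElementTree as ET

# Script content containing "<" and "&", wrapped in a CDATA section.
with_cdata = "<SCRIPT><![CDATA[ if (a < b && c) run(); ]]></SCRIPT>"

# The same content without the CDATA section.
without_cdata = "<SCRIPT> if (a < b && c) run(); </SCRIPT>"

# Inside CDATA the parser treats "<" and "&" as ordinary characters.
script = ET.fromstring(with_cdata)
print(repr(script.text))

# Without CDATA the same characters are read as markup and the parse fails.
try:
    ET.fromstring(without_cdata)
    parsed = True
except ET.ParseError:
    parsed = False
print(parsed)
```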
By the way:
<P> is not like <BR>
Not Like This
<H1>XML</H1>
An underlying problem
with HTML is that ...
<P>
You could use SGML to
define ...
But Like This
<H1>XML</H1>
<P>
An underlying problem
with HTML is that …
</P>
<P>
You could use SGML to
define ...</P>
Consequence of XML
Anyone can now design their own (Web-delivered) languages
CSS makes them viewable
<address>
<name>Steven Pemberton</name>
<company>CWI</company>
<street>Kruislaan 413</street>
<postcode>1098 SJ</postcode>
<city>Amsterdam</city>
<speaker/>
</address>
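Because such a language is still plain XML, generic tools can read it without knowing anything about the invented tags. A small sketch with Python's standard xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

ADDRESS = """
<address>
  <name>Steven Pemberton</name>
  <company>CWI</company>
  <street>Kruislaan 413</street>
  <postcode>1098 SJ</postcode>
  <city>Amsterdam</city>
  <speaker/>
</address>
"""

root = ET.fromstring(ADDRESS)

# Map each child element name to its text; empty elements get "".
fields = {child.tag: (child.text or "") for child in root}

# The empty <speaker/> element acts as a flag: present or absent.
is_speaker = root.find("speaker") is not None

print(fields["name"], "-", fields["city"])
print(is_speaker)
```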
So do we still need HTML?
Workshop in May 1998
XML is still a meta-language
There is still a perceived need for a baseline mark-up
HTML has some useful semantics, both
implied and explicit (search engines gladly
use it, for instance)
HTML as XML application
Clean up (get rid of historical flotsam)
Modularise – split into separate parts
Allows other XML applications to use parts
Allows special purpose devices to use subset
Add any required new functionality
(forms, better event handling, Ruby)
The HTML Working group
International membership, around 20
members
Many major players (IBM, Microsoft,
Netscape, etc)
Meets weekly by phone, quarterly face-to-face
Group experience
There was more to be worked out than
we anticipated
XHTML is the first major application of
XML, so the world’s eyes are on us
XML still needs the wrinkles ironed out
Philosophy of XHTML
Transition from ‘old world’ to XML
Clean up the language
Return to structure only
Use generic XML as much as possible
Modularise
Address wider needs (International,
Accessibility)
Add new functionality
Plan of action
HTML 4.01: corrected version
XHTML 1.0: transitional version of HTML
4.01 in 3 flavours
Modularisation: agreement on split and
methodology
XHTML Basic: Small devices
XHTML 1.1: clean version of 1.0 strict
(plan of action)
Events: accessible and device-independent
Ruby: needed Asian markup
Forms: more control
XHTML 2.0: Putting it all together
Differences HTML:XHTML
Because of the difference between SGML
and XML, there are some necessary
differences, for instance:
Use lower case: <p> not <P>
Attributes are always quoted:
<th colspan="2">
Anchors use id attribute not name (and not
just on <a> by the way):
<a id="index"> <p id="top">
Example XHTML 1.0
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en">
<head><title>Virtual Library</title></head>
<body>
<p>Moved to <a href="http://vlib.org/">vlib.org</a>.
</p>
</body>
</html>
Namespaces
Namespaces have been added to XML to
allow you to mix fragments from different
languages (e.g. HTML + Maths)
In the same way that object-oriented
languages allow you to identify which
function you are using, namespaces allow
you to identify which tags you are using.
Example of nesting
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>A Math Example</title></head>
<body>
<p>The following is MathML markup:</p>
<math xmlns="http://www.w3.org/TR/REC-MathML">
<apply><log/><logbase><cn> 3 </cn> </logbase>
<ci> x </ci>
</apply>
</math>
</body>
</html>
Example of colonising
<math xmlns="http://www.w3.org/TR/REC-MathML"
xmlns:html="http://www.w3.org/1999/xhtml">
<apply><log/><logbase><cn> 3 </cn> </logbase>
<ci> x </ci>
</apply>
<html:p>This is a paragraph</html:p>
</math>
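The two styles (nesting with a default namespace, and colonising with a prefix) identify elements in exactly the same way once parsed. A sketch with Python's standard xml.etree.ElementTree, which exposes each element name as "{namespace-uri}localname". The namespace URIs are the ones used on the slides; the helper namespaces_used and the shortened documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/TR/REC-MathML"  # URI as written on the slides

# Nesting: each fragment declares its own default namespace.
nested = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="{MATHML_NS}"><apply><log/><ci> x </ci></apply></math>
  </body>
</html>
"""

# Colonising: one default namespace, the other bound to a prefix.
colonised = f"""
<math xmlns="{MATHML_NS}" xmlns:html="{XHTML_NS}">
  <apply><log/><ci> x </ci></apply>
  <html:p>This is a paragraph</html:p>
</math>
"""

def namespaces_used(markup):
    """Return the set of namespace URIs of all elements in the document."""
    root = ET.fromstring(markup)
    return {el.tag[1:].split("}")[0] for el in root.iter() if el.tag.startswith("{")}

# The prefix is only a local abbreviation; lookup is by namespace URI.
paragraph = ET.fromstring(colonised).find(f"{{{XHTML_NS}}}p")

print(namespaces_used(nested) == namespaces_used(colonised))
print(paragraph.text)
```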
Namespaced attributes
Attributes normally come from the
element itself:
<html:a href="next.xml">
But you may also use ‘global’ attributes
from a namespace:
<pointer html:href="x.xml">
<music style="classical" html:style="color:
red">Beethoven’s 5th</music>
XML ‘namespace’
XML also has its own pseudo-namespace
for reserved attributes:
<para xml:lang="en">
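A sketch of how a parser keeps these attributes apart, again with Python's standard xml.etree.ElementTree. The music element is the one from the slide with an xml:lang attribute added for illustration; the xml: prefix is bound by definition to http://www.w3.org/XML/1998/namespace and needs no declaration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # fixed binding of the xml: prefix

doc = f"""
<music xmlns:html="{XHTML_NS}" style="classical"
       html:style="color: red" xml:lang="en">Beethoven's 5th</music>
"""
music = ET.fromstring(doc)

# The element's own attribute has no namespace.
own = music.get("style")

# The 'global' attribute is looked up by namespace URI plus local name.
html_style = music.get(f"{{{XHTML_NS}}}style")

# xml:lang lives in the reserved XML namespace.
lang = music.get(f"{{{XML_NS}}}lang")

print(own, "|", html_style, "|", lang)
```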
Using ‘generic’ XML
Presentation: use CSS
Links: use XLink or Schemas
Forms: use CSS?
Images etc.: use XLink or Schemas
(Natural) language of elements: use
xml:lang attribute
XLink?
HTML has several ‘built-in’ hyperlinks:
<a>, <img>, <object>, <link>, etc.
Since XML allows you to define your own
elements, a browser doesn’t know which
are links
XLink was started to solve this problem.
XLink
XLink started as a method of describing
which attributes of an element were a link
It later changed into a language of links,
so it could no longer be used to describe
XHTML
The current plan is now to introduce types
into Schemas to describe links
Example of XLink
<crossReference
xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="students.xml"
xlink:role="studentlist"
xlink:title="Student List"
xlink:show="new"
xlink:actuate="onRequest">
Current List of Students
</crossReference>
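The point of the xlink: attributes is that software can find links without knowing the element names. A minimal sketch in Python with the standard xml.etree.ElementTree; the wrapping course and title elements and the helper find_simple_links are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

doc = f"""
<course xmlns:xlink="{XLINK_NS}">
  <title>Markup</title>
  <crossReference xlink:type="simple" xlink:href="students.xml"
                  xlink:title="Student List" xlink:show="new"
                  xlink:actuate="onRequest">Current List of Students</crossReference>
</course>
"""

def find_simple_links(markup):
    """Return (element name, href) for every element marked as a simple XLink."""
    root = ET.fromstring(markup)
    links = []
    for el in root.iter():
        # Any element, whatever its name, is a link if it carries xlink:type="simple".
        if el.get(f"{{{XLINK_NS}}}type") == "simple":
            links.append((el.tag, el.get(f"{{{XLINK_NS}}}href")))
    return links

print(find_simple_links(doc))
```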
Schemas
Schemas are a new technology to replace
much of what DTDs do.
Schemas are expressed in XML
They have support for data types
Much easier to parse and implement than
DTDs
Schemas: but
They don’t support the definition of
entities (&eacute;)
Not easy to read (or write)
Schema fragment
<elementType name='table'>
<refines>
<archetypeRef name='common'/>
<archetypeRef name='simpleBlockDisplay'/>
</refines>
<sequence>
<elementTypeRef name='caption' minOccur='0' maxOccur='1'/>
<choice>
<elementTypeRef name='col' minOccur='0' maxOccur='*'/>
<elementTypeRef name='colgroup' minOccur='0' maxOccur='*'/>
</choice>
<choice>
<sequence>
<elementTypeRef name='thead' minOccur='0' maxOccur='1'/>
<elementTypeRef name='tfoot' minOccur='0' maxOccur='1'/>
<elementTypeRef name='tbody' minOccur='1' maxOccur='*'/>
</sequence>
<elementTypeRef name='tr' minOccur='1' maxOccur='*'/>
</choice>
</sequence>
</elementType>
(equivalent DTD)
<!ELEMENT table
(caption?, (col*|colgroup*), thead?,
tfoot?, (tbody+|tr+))>
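A DTD content model is essentially a regular expression over child element names: "?" means optional, "*" zero or more, "+" one or more, "," sequence and "|" choice. The sketch below hand-translates the table model into a Python regular expression to make that reading concrete. It is an illustration under that assumption, not how a validating parser is implemented, and the helper name valid_table_children is hypothetical.

```python
import re

# (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))
# Each child name is followed by one space so names cannot run together.
TABLE_MODEL = re.compile(
    r"(caption )?"
    r"((col )*|(colgroup )*)"
    r"(thead )?"
    r"(tfoot )?"
    r"((tbody )+|(tr )+)"
)

def valid_table_children(children):
    """Check a list of child element names against the table content model."""
    sequence = "".join(name + " " for name in children)
    return TABLE_MODEL.fullmatch(sequence) is not None

print(valid_table_children(["caption", "col", "col", "thead", "tbody"]))
print(valid_table_children(["tr", "tr"]))
print(valid_table_children(["tbody", "tr"]))          # mixes the two alternatives
print(valid_table_children(["thead", "caption", "tr"]))  # caption out of order
```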
XHTML 1.0
XHTML 1.0 is an XML-ised version of
HTML 4.01
Just like HTML 4.01, there are 3 versions:
‘strict’, ‘transitional’ (also called ‘loose’), and ‘frameset’
Transitional version
XHTML 1.0 has been carefully designed to
make use of ‘quirks’ in existing HTML
browsers
Use of a small number of guidelines
allows XHTML to be served to HTML user
agents as well as XML user agents
Examples of Guidelines
Use space before / of empty elements:
<br /> <hr /> <img src="foo.gif" />
Don’t use <hr></hr> form
Use name= and id= on <a>:
<a name="index" id="index"> … </a>
Serving XHTML 1.0
An XHTML 1.0 document that follows the
guidelines can be served up either as
HTML, or as XML
But beware: CSS has slightly different
rules for HTML and XML
Similarly, the DOM has differences for
HTML and XML
Modularisation
XHTML has been divided into a number of
modules.
A module is a collection of elements
and/or attributes that can be used as
building blocks to build a DTD.
(modularisation)
A language can be built by using just
XHTML modules, or adding your own
We had originally defined Modularisation
just for our own use, but it has turned out
useful for other groups as well
XHTML modules
Structure: html, head, title, body
Text: abbr, acronym, address, blockquote,
br, cite, code, dfn, div, em, h1, h2, h3, h4,
h5, h6, kbd, p, pre, q, samp, span, strong,
var
Hypertext: a
List: ol, ul, dl, li, dt, dd
(modules)
Applet (deprecated): applet, param
Presentation: b, i, hr, big, small, sub, sup,
tt
Edit: del, ins
Bi-directional Text: bdo
(modules)
Basic Forms: simple forms
Forms: full forms
Basic Tables: simple tables
Tables: full tables
(modules)
Image: img
Client-side Image Map: map, +
Server-side Image Map: change to img
Object: object, param
Frames
Target: attribute
Iframe
(modules)
Intrinsic Events: adds events attributes
Metainformation: meta
Scripting: script
Stylesheet: style
Style Attribute
Link: link
(modules)
Base: base
Name Identification: name attribute
Legacy: basefont, center, font, s, strike, u,
plus loads of attributes (eg align)
Ruby: Asian markup
Note on modules
Note that some modules consist of a
single element, or just add some
attributes to existing elements
Not all modules are independent: if you
use some modules, they bring other
modules with them, or change other
modules
Future modules are planned (eg extended
forms, events)
The XHTML family
To still be called an XHTML language you
must use Structure, Hypertext, Basic Text,
and List modules (you may define your
own Structure module)
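The family rule amounts to a subset check on module names. A small sketch in Python; it assumes that the "Basic Text" module named above is the Text module of the module list, and the second language is an invented counter-example.

```python
# Modules a language must include to be called an XHTML family member
# (assumption: "Basic Text" on the slide is the Text module).
REQUIRED_MODULES = {"Structure", "Text", "Hypertext", "List"}

def missing_modules(modules):
    """Return the required modules that a language definition leaves out."""
    return REQUIRED_MODULES - set(modules)

def is_xhtml_family(modules):
    return not missing_modules(modules)

# The module list of XHTML Basic, as given in this talk.
XHTML_BASIC = {
    "Structure", "Text", "Hypertext", "List", "Basic Forms", "Basic Tables",
    "Image", "Object", "Metainformation", "Link", "Base",
}

# A hypothetical language that reuses only a few XHTML modules.
PICTURE_LANGUAGE = {"Structure", "Text", "Image"}

print(is_xhtml_family(XHTML_BASIC))
print(sorted(missing_modules(PICTURE_LANGUAGE)))
```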
Example integration
languages
SMIL is planning a module to integrate
SMIL and HTML
Likewise for MathML
Creating a DTD
It is not expected that creating XHTML-based languages will be a daily activity
This is not the place to describe the
method: it depends on understanding DTDs.
The Modularisation document has
extensive examples
Future versions will also use Schemas (we
hope…)
XHTML Basic
XHTML Basic is the first XHTML family member to be defined using
Modularisation
It is designed for small devices, typically
mobile telephones
XHTML Basic Modules
Structure Module*
body, head, html, title
Text Module*
abbr, acronym, address,
blockquote, br, cite, code, dfn,
div, em, h1, h2, h3, h4, h5, h6,
kbd, p, pre, q, samp, span,
strong, var
(XHTML Basic Modules)
Hypertext Module*
a
List Module*
dl, dt, dd, ol, ul, li
Basic Forms Module
form, input, label, select,
option, textarea
(XHTML Basic Modules)
Basic Tables Module
caption, table, td, th, tr
Image Module
img
Object Module
object, param
(XHTML Basic Modules)
Metainformation Module
meta
Link Module
link
Base Module
base
XHTML Basic usage
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
XHTML 1.1
XHTML 1.1 is the second family member
to be defined using Modularisation
Its main aim is to present a cleaned-up,
non-transitional version of XHTML 1.0
strict (no frames)
It also adds Ruby markup
Otherwise: no new functionality
XHTML 1.1 Modules
Structure, Text, Hypertext, List, Object,
Presentation, Edit, Bidirectional Text,
Forms, Tables, Image, Client-side Image
Map, Server-side Image Map, Intrinsic
Events, Metainformation, Scripting,
Stylesheet Module, Style Attribute
(Deprecated), Link, Base, Ruby.
Example XHTML 1.1
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en" >
<head> <title>Virtual Library</title> </head>
<body>
<p>Moved to <a
href="http://vlib.org/">vlib.org</a>.</p>
</body>
</html>
Ruby
Example Ruby markup
<ruby>
<rb>WWW</rb>
<rp>(</rp><rt>World Wide
Web</rt><rp>)</rp>
</ruby>
(Use CSS to describe presentation)
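The rp elements carry the fallback parentheses: a user agent that does not know ruby simply shows the text content in order. A sketch of that fallback in Python with the standard xml.etree.ElementTree; the markup is the slide's example written on one line so that no stray whitespace enters the text.

```python
import xml.etree.ElementTree as ET

RUBY = "<ruby><rb>WWW</rb><rp>(</rp><rt>World Wide Web</rt><rp>)</rp></ruby>"
ruby = ET.fromstring(RUBY)

# A ruby-aware renderer uses the base text and the annotation separately...
base = ruby.findtext("rb")
annotation = ruby.findtext("rt")

# ...while a renderer that ignores the tags just concatenates the text content.
fallback = "".join(ruby.itertext())

print(base, "/", annotation)
print(fallback)
```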
XHTML 2.0
XHTML 2.0 is still in preparation
New forms
New events
More accessibility
Forms
Being produced by a separate group
Consists of three parts:
data model
instances
user interface
Will allow you to
save and restore forms
download multi-page forms
(Forms)
Will include much more client-side
checking
Form data will be sent to the server as
XML
Separates content from presentation (e.g.
a radio button and a select box both allow
you to select one from many, and you
may want to use different choices on
different devices)
Events
Current events are almost all in terms of
mouse: onclick, onmouseover, onfocus,
etc.
Future event model will be device
independent, and allow you to define your
own new events
Uses the DOM event model
The DOM
Document Object Model: how you access a
document via scripting
Currently only an XML DOM
An XHTML DOM is being investigated
Accessibility and
Internationalisation
W3C has an accessibility group that
checks that new recommendations
address people with accessibility needs
There is also an internationalisation group
that does the same for cultural issues
(which produced <ruby>)
Accessibility problems
A sighted person can work out the
structure from the visual presentation
A non-sighted person cannot: the
structure must be present in the markup
That is why new features were added to
forms and tables in HTML 4, like
<caption>
Structure
Text would also benefit from such a
treatment: not h1, h2 etc (which are
subject to misuse) but nested sections
with their own headings
Example of structure
<section>
<h>XHTML</h>
…
<section>
<h>Structure</h>
…
</section>
</section>
CSS can still handle it
section h { how an h1 should look }
section section h { h2 }
section section section h { h3 }
etc.
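The same idea in code: with nested sections, the heading level is not written into the tag name but follows from the nesting depth, which is what the CSS selectors above count. A sketch in Python with the standard xml.etree.ElementTree; the function heading_levels is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

doc = """
<section>
  <h>XHTML</h>
  <section>
    <h>Structure</h>
  </section>
</section>
"""

def heading_levels(element, depth=0):
    """Yield (level, heading text); level is the number of enclosing sections."""
    if element.tag == "section":
        depth += 1
    for child in element:
        if child.tag == "h":
            yield depth, child.text
        else:
            yield from heading_levels(child, depth)

for level, text in heading_levels(ET.fromstring(doc)):
    print("h%d: %s" % (level, text))
```

Moving a section elsewhere in the document changes its depth, and the level of every heading inside it follows automatically.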
Conclusions
XML with related technologies gives you
the freedom to define and deliver your
own document types
HTML is still needed as a base-line
markup
The new HTML gives a transition path to
the future
The State of Things
New generation of XML+CSS browsers
emerging
Many XML applications appearing
Major companies planning XML as output
(Adobe PDF, MS Office 2000)
Now: HTML4, XHTML 1.0, Modularisation,
Basic, 1.1
To Find Out More
All XHTML developments are made public
at www.w3.org/Markup
Members of W3C can also look at
www.w3.org/Markup/Group