Will It All Fit Together? The need for standards and the technical challenges

B
Will It All Fit Together?
The need for standards and the technical challenges
Brian Kelly
UK Web Focus
[email protected]
and
Paul Miller
Interoperability Focus
[email protected]
UKOLN
University of Bath
Bath, BA2 7AY
http://www.ukoln.ac.uk/
UKOLN is funded by the Library and Information Commission, the Joint
Information Systems Committee (JISC) of the Higher Education Funding Councils,
as well as by project funding from the JISC and the European Union.
UKOLN also receives support from the University of Bath where it is based.
B
Contents
• Introduction
• Background:
• The Web
• Library Information
• Problems
• Solutions
• Deployment Challenges
• Conclusions
2
B
About Us
UK Web Focus
• Advises UK HE community on web developments
• JISC-funded
• Represents JISC on W3C
Interoperability Focus
• Advises on issues related to the deployment of
‘interoperable’ services across libraries, museums,
archives, etc.
• JISC and LIC funded
• Represents community on various international
metadata and standardisation initiatives
3
B
About You
How many are in the following groups:
“Webmasters”
Library catalogue/system Managers
Others
What do you hope to gain from the session?
…and if we use terms you don’t understand… ask!
4
B
Aims of this Session
The aims of this session are:
• To provide an update on web developments
• To illustrate ways in which the web relates
to other library–based electronic information
• To outline some of the advantages of
adopting a standardised solution to
problems
• To look at the ways in which things might
move in the near future
5
B
Standardisation
Relevant Bodies
Community
• Library groups
• Cultural Heritage
• Government
Proprietary
• De facto standards
• Often initially appealing (cf PowerPoint, PDF)
• May emerge as standards
Formal
• Formal international/national standards processes
• ISO, CEN, NISO, ECMA, ANSI, BSI…
• Can be slow-moving and bureaucratic
• Produce robust standards
W3C
• Produces W3C Recommendations
• Managed approach
• Protocols initially developed by W3C members
• Decisions made by W3C, influenced by member & public review
IETF
• Produces Internet Drafts on Internet protocols
• Bottom-up approach to developments
• Protocols developed by interested individuals
• "Rough consensus and working code"
Examples shown on the slide: 23950, PNG, HTML, Java, HTTP, URN, whois++
B
Background to the Web
The web was initially very successful due to its simplicity
Client (Netscape, IE, Lynx): "Give me foo.html from www.bath.ac.uk"
Server (Apache, IIS, ...): "Here it is" (HTML)
The web is based on three key architectural components:
Data Format: HTML (HyperText Markup Language)
Addressing: URLs (Uniform Resource Locators)
Transport: HTTP (Hypertext Transfer Protocol)
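The client's "Give me foo.html from www.bath.ac.uk" can be sketched as a few lines of Python. This is only an illustration of how the URL supplies the server name and the document path that the client then sends over HTTP. The function name build_get_request is made up for this sketch, and nothing is sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return the text of a minimal HTTP/1.0 GET request for a URL.

    Illustrative only: the URL (addressing) is split into a host and a
    path, and the path is placed in an HTTP request line (transport).
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the server's root document
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {parts.hostname}\r\n"
        "\r\n"  # a blank line ends the request headers
    )


if __name__ == "__main__":
    print(build_get_request("http://www.bath.ac.uk/foo.html"))
```

The server's "Here it is" would then be a status line, some headers and the HTML document itself.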
Background to Library Information
Long tradition of categorising information
• Card catalogue (local)
• OPAC (local-ish)
• WebPAC (potentially global)
Proven track record on formalising practice
• AACR (rules for cataloguing)
• MARC (rules for transfer)
• Z39.50 (linking and access)
8
P
B
Problems With the Web
Although the web has been successful,
there are problems:
• Performance - the web is too slow
• Resource discovery - lack of a metadata
architecture
• HTML’s lack of arbitrary structure
• Accessibility - difficulties in accessing
information for visually impaired people, people using
PDAs, etc.
• Functionality - difficult to deploy interactive
applications on the web
• Addressing
• etc.
9
B
Solutions (Today)
HTML 4.0 used in conjunction with CSS 2.0 (Cascading
Style Sheets) and the DOM provides an architecturally
pure, yet functionally rich environment
HTML 4.0 - W3C-Rec
• Improved forms
• Hooks for stylesheets
• Hooks for scripting languages
• Table enhancements
• Better printing
CSS 2.0 - W3C-Rec
• Support for all HTML formatting
• Positioning of HTML elements
• Multiple media support
DOM - W3C-Rec
• Document Object Model
• Hooks for scripting languages
• Permits changes to HTML & CSS properties and content
Problems
• Changes during CSS development
• Netscape & IE incompatibilities
• Continued use of browsers with known bugs
B
HTML's Limitations
HTML 4.0 / CSS 2.0 have limitations:
• Difficulties in introducing new elements
– Time-consuming standardisation process
(<ABBREV>)
– Dictated by browser vendor (<BLINK>, <MARQUEE>)
• Area may be inappropriate for standardisation:
– Covers specialist area (maths, music, ...)
– Application-specific (<STUD-NUM>)
• HTML is a display (output) format
• HTML's lack of arbitrary structure limits functionality:
– Find all memos copied to John Smith
– How many unique tracks on Jackson Browne CDs
11
B
XML
XML:
• Extensible Markup Language
• A lightweight SGML designed for network use
• Addresses HTML's lack of evolvability
• Arbitrary elements can be defined (<STUDENTNUMBER>, <PART-NO>, etc)
• Agreement achieved quickly - XML 1.0 became W3C Recommendation in Feb 1998
• Support from industry (SGML vendors, Microsoft, etc.)
• Support in Netscape 5 and IE 5
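As a rough illustration of what arbitrary elements make possible, the sketch below uses Python's standard xml.etree.ElementTree module to answer a question like "find all memos copied to John Smith". The element names (MEMOS, MEMO, TO, CC, SUBJECT) and the sample data are invented for this example and are not taken from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo collection; the element names are made up for this sketch.
MEMOS = """
<MEMOS>
  <MEMO><TO>Jane Doe</TO><CC>John Smith</CC><SUBJECT>Budget</SUBJECT></MEMO>
  <MEMO><TO>John Smith</TO><SUBJECT>Rota</SUBJECT></MEMO>
  <MEMO><TO>Ann Lee</TO><CC>John Smith</CC><CC>Jane Doe</CC><SUBJECT>Web site</SUBJECT></MEMO>
</MEMOS>
"""


def memos_copied_to(xml_text, name):
    """Return the subjects of all memos that carry a CC element for name."""
    root = ET.fromstring(xml_text)
    return [
        memo.findtext("SUBJECT")
        for memo in root.findall("MEMO")
        if any(cc.text == name for cc in memo.findall("CC"))
    ]


if __name__ == "__main__":
    print(memos_copied_to(MEMOS, "John Smith"))
```

The same query is hard to ask of an HTML page, because HTML records how the memo should be displayed rather than which part of it is the CC list.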
12
B
XML Deployment
Ariadne issue 15 has article
on "What Is XML?"
Describes how XML support
can be provided:
• Natively by new browsers
• Back end conversion
of XML to HTML
• Client-side conversion
of XML to HTML / CSS
• Java rendering of XML
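A back end conversion of XML to HTML might look something like the following sketch. It assumes a made-up BOOK record with TITLE and AUTHOR elements and a made-up function name, book_to_html. A real service would more likely use a stylesheet or a publishing tool than hand-written code.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML record held on the server.
BOOK_XML = "<BOOK><TITLE>Lord of the Flies</TITLE><AUTHOR>William Golding</AUTHOR></BOOK>"


def book_to_html(xml_text):
    """Convert a BOOK record into an HTML fragment that any browser can show."""
    book = ET.fromstring(xml_text)
    title = escape(book.findtext("TITLE", default=""))
    author = escape(book.findtext("AUTHOR", default=""))
    return f"<p><cite>{title}</cite> by {author}</p>"


if __name__ == "__main__":
    print(book_to_html(BOOK_XML))
```

Because the conversion happens before delivery, browsers with no XML support still receive ordinary HTML.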
Examples of intermediaries
See http://www.ariadne.ac.uk/issue15/what-is/
13
B
Namespaces and Linking
XML Namespaces
What if an XML document contains a <TITLE>
for the document and a <TITLE> for the name
of a book?
XML Namespaces enable such clashes to be
resolved
The naming conventions are defined at a URL
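The TITLE clash can be sketched with Python's standard xml.etree.ElementTree module. The two namespace URLs and the RECORD element below are placeholders invented for the example; the point is only that each TITLE is qualified by the namespace it belongs to.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace names, assumed for this sketch only.
DOC_NS = "http://example.org/ns/document"
BOOK_NS = "http://example.org/ns/book"

RECORD = f"""
<doc:RECORD xmlns:doc="{DOC_NS}" xmlns:book="{BOOK_NS}">
  <doc:TITLE>New acquisitions list</doc:TITLE>
  <book:TITLE>Lord of the Flies</book:TITLE>
</doc:RECORD>
"""

root = ET.fromstring(RECORD)

# ElementTree addresses a namespaced element as "{namespace}localname".
document_title = root.findtext(f"{{{DOC_NS}}}TITLE")
book_title = root.findtext(f"{{{BOOK_NS}}}TITLE")

if __name__ == "__main__":
    print("Document title:", document_title)
    print("Book title:", book_title)
```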
XSL stylesheet language will provide extensibility
and transformation facilities (e.g. create a table of
contents)
14
Challenges facing library information
Competition?
Amazon.co.uk
Many–MARC
Integration with other scholarly resources
• AHDS Gateway
• SOSIG
• Web of Science
Alternative delivery
• on–line document delivery?
Obfuscation?
Complication!
P
P
Addressing (Problems)
URLs (e.g. http://www.bristolpoly.ac.uk/depts/music/) have
limitations:
• Lack of long-term persistency
– Organisation changes name
– Department shut down or merged
– Directory structure reorganised
• Inability to support multiple versions of
resources (mirroring)
ISBN/ISSN also problematic:
• Not tied to the work
• Nor to the item at hand
P
Addressing (Solutions)
DOIs (Digital Object Identifiers):
• Proposed by publishing industry as a
solution
• Aimed at supporting rights ownership
• Business model needed
• Do two copies of a digital object get
separate DOIs?
PURLs (Persistent URLs):
• Provide single level of redirection
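The single level of redirection can be sketched as a lookup table that maps a persistent name to wherever the resource currently lives. The sketch is not the actual PURL software: the function name resolve, the table layout, the /music path and the example.ac.uk address are all assumptions made for illustration.

```python
# Hypothetical resolver table: persistent path -> current location.
PURL_TABLE = {
    "/music": "http://www.bristolpoly.ac.uk/depts/music/",
}


def resolve(purl_path, table):
    """Answer a request for a persistent URL with an HTTP-style redirect.

    Returns a (status, headers) pair: 302 with a Location header when the
    name is known, 404 otherwise.
    """
    target = table.get(purl_path)
    if target is None:
        return 404, {}
    return 302, {"Location": target}


if __name__ == "__main__":
    print(resolve("/music", PURL_TABLE))
    # When the organisation changes its name, only the table entry changes;
    # links that cite the persistent URL keep working.
    PURL_TABLE["/music"] = "http://www.example.ac.uk/music/"
    print(resolve("/music", PURL_TABLE))
```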
17
P
Joined–up thinking
• Users can be anywhere. They need to
search anywhere
• Physical locations at which digital data
are stored should not impinge upon
access
• Disciplinary boundaries should not be a
barrier
18
P
Z39.50
• International Standard (ISO 23950)
• Permits remote searching of databases
• Access via Z client or over web
• Relies upon ‘Profiles’
• Used outside the library
See http://www.ariadne.ac.uk/issue21/
19
P
Z39.50 Challenges
• Profiles for each discipline
• Defeats interoperability?
• Bib–1 bloat
• Largely invisible
• Seen as complicated
• Seen as expensive
• Seen as old–fashioned
• Surely no match for XML/RDF/whatever
20
P
Z39.50 Futures
• International Interoperability Profile
• Cross–Domain Attribute Set
• Attribute Architecture
• Bib–2
• XER
• DNER/RDNC/NGDF/ New Library?
21
P
When to use it?
• To provide remote access to a large
catalogue of material (an OPAC, a
museum collection management
system…)
• To facilitate/allow searching of your
resources alongside like resources from
elsewhere
22
P
What is ‘Metadata’?
– meaningless jargon
– or
a fashionable term for what we’ve always done
– or
“a means of turning data into information”
– and
“data about data”
– and
the name of a film director (‘Luc Besson’)
– and
the title of a book (‘The Lord of the Flies’).
23
What is ‘Metadata’?
Metadata exists for almost anything:
• People
• Places
• Objects
• Concepts
• Web pages
• Databases.
24
P
What is ‘Metadata’?
Metadata fulfils three main functions:
• Description of resource content
– “What is it?”
• Description of resource form
– “How is it constructed?”
• Description of resource use
– “Can I afford it?”.
25
P
Introducing the Dublin Core
• An attempt to improve resource
discovery on the Web
– now adopted more broadly
• Building an interdisciplinary consensus
about a core element set for resource
discovery
– simple and intuitive
– cross–disciplinary
– international
– flexible.
26
Introducing the Dublin Core
• 15 elements of descriptive metadata
• All elements optional
• All elements repeatable
• The whole is extensible
– offers a starting point for semantically
richer descriptions
• Interdisciplinary
– libraries, museums, archives…
• International
– available in 20 languages, with more on
the way...
27
Introducing the Dublin Core
• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
http://purl.org/dc/
28
P
Implementing the Dublin Core
• Normally thought of as being HTML
• Most recently possible in XML/RDF
• Dublin Core ‘view’ onto richer databases
• DC elements in Bib–1
• DC elements form basis of XD attribute set
• DC closely mapped to GILS
See http://purl.org/dc/
See http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD–dc–rdf/
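One way Dublin Core is commonly embedded in HTML is as meta elements in the head of a page, with names of the form DC.Title, DC.Creator and so on. The sketch below generates such tags from a simple record. The function name dublin_core_meta and the sample record are assumptions for this example, and the exact naming convention should be checked against current Dublin Core guidance.

```python
from html import escape


def dublin_core_meta(record):
    """Render a record as HTML meta tags, one tag per element value.

    Every element is optional, and a list of values yields one tag per
    value, reflecting that Dublin Core elements are repeatable.
    """
    lines = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        for value in values:
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Hypothetical description of this presentation.
RECORD = {
    "Title": "Will It All Fit Together?",
    "Creator": ["Brian Kelly", "Paul Miller"],
    "Publisher": "UKOLN",
}

if __name__ == "__main__":
    print(dublin_core_meta(RECORD))
```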
P
RDF
RDF - the metadata framework
• Based on a formal data model (directed labelled graphs)
• Syntax for interchange of data
• Schema model
[Figure: RDF Data Model. The resource page.html is shown with the properties Cost (value £0.05) and ValidUntil (value 11-May-98). Diagram labels: Resource, Property, PropName, PropObj, Value, InstanceOf, PropertyType.]
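The idea of a directed labelled graph can be sketched as a list of (resource, property, value) statements. This is a deliberately flattened illustration and not the RDF syntax or the full data model: in particular it assumes that ValidUntil is attached directly to page.html, which may not match the slide's diagram. The names Statement and properties_of are invented here.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One labelled arc in the graph: resource --property_type--> value."""
    resource: str
    property_type: str
    value: str


# Values taken from the slide; the flat structure is an assumption.
STATEMENTS = [
    Statement("page.html", "Cost", "£0.05"),
    Statement("page.html", "ValidUntil", "11-May-98"),
]


def properties_of(resource, statements):
    """Collect every property recorded for one resource."""
    return {s.property_type: s.value for s in statements if s.resource == resource}


if __name__ == "__main__":
    print(properties_of("page.html", STATEMENTS))
```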
P
Authentication
We can’t (can we?) just make all these resources
available for free.
Users need to authenticate.
ATHENS / Digital Signatures …
Authenticate once per site ?
Authenticate once per query per site ?
Complicated by Z39.50 searches
authenticate once per Target queried ?!
Ideally, authenticate once
when you log on in the morning!
31
B
Deployment
How do I deploy “the new stuff” in the real
world?
Barriers:
• Browser x doesn’t do CSS, …
• Authoring tools don’t do RDF
• I prefer the web as it is
• I haven’t the time to learn anything new
• This Z39.50 thing is just too hard
B
Approaches to Deployment
Various interesting new technologies have
been outlined
How can they be deployed in our environment?
Should we:
• Ignore them?
• Accept them fully?
• Accept them partly?
33
B
Ignore New Developments
We can choose to ignore new developments,
and continue to use, say, HTML 3.2:
Safe option, with no new training, support or
software costs
Experience in effectiveness, limitations, etc.
Fails to address current performance problems
Fails to address accessibility problems
Fails to provide new functionality
Service likely to look "old-fashioned" compared
with competition
34
B
Fully Accept New Developments
We can choose to move wholesale to, say, HTML
4.0 and CSS 2.0:
Can be exciting to be at leading edge
Performance benefits
Accessibility benefits
Based on open standards
Provides motivation for users to upgrade browsers
Likely to be solution at some point (cf. Gopher)
Backwards compatibility problems with old browsers
Costly to deploy new authoring tools, training, ...
Likely to be bugs and incompatibilities with new tools
and browsers
35
B
Implement "Safe" Solutions
An alternative is to use "safe" parts of
technologies which are backwards compatible
and avoid major browser bugs
Attractive sounding compromise position
Lose some functionality, but not all
Can be difficult or expensive to find "safe" options
(does .margin-left work on IE on SGI?)
Tools may not allow safe options to be chosen
Lack of validation tools for checking conformance
with a restricted subset of a specification
Note
See <URL: www.webreview.com/guides/style/insafegrid.htm> for unsafe CSS 2.0 properties
B
Decision Time
What would you opt for?
Stick with current technologies
Cheap, default option. Continuation of performance
and accessibility problems. Unlikely to be long
term solution.
Deploy new technologies
More expensive option. Functionality, performance
and accessibility benefits. Access problems for old
browsers.
Use "safe" new technologies
May require home-grown tools and support. Avoids
some of the problems of other solutions
37
B
An Alternative
An alternative approach to deploying new
technologies is available:
• Use more intelligent server-side software
• Use "proxies" to address limitations of browser
technologies. The term intermediary was used
in a paper [1] at the WWW 7 conference to
describe this approach
• Protocol solutions, such as Transparent
Content Negotiation (TCN) and CC/PP
[1] "Intermediaries: New Places For Producing
and Manipulating Web Content"
38
B
Intelligent Server Software
Simple model:
• Server receives request for resource
• Server delivers resource to client
More sophisticated model:
• Server receives request for resource
• Server processes header information from client
• Server delivers resource to client based on client
information
Can be implemented using server add-ons such as
PHP/FI and MS Active Server Pages or by use of
Content Management systems
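The more sophisticated model can be sketched as a function that inspects the client's request headers before choosing what to deliver. The selection rules and file names below are invented for illustration. A production service would need far more careful browser detection, or a protocol-level mechanism such as content negotiation.

```python
def choose_variant(headers):
    """Pick a version of a page based on the client's request headers.

    headers is a plain dict of HTTP request headers. All rules and file
    names are hypothetical.
    """
    user_agent = headers.get("User-Agent", "")
    accept = headers.get("Accept", "")
    if "text/xml" in accept:
        return "page.xml"             # client says it can handle XML
    if user_agent.startswith("Lynx"):
        return "page-text.html"       # text-only browser
    if user_agent.startswith("Mozilla/4"):
        return "page-html4-css.html"  # version 4 browsers: HTML 4.0 and CSS
    return "page-html32.html"         # safe default for everything else


if __name__ == "__main__":
    print(choose_variant({"User-Agent": "Lynx/2.8"}))
    print(choose_variant({"User-Agent": "Mozilla/4.0 (compatible; MSIE 4.01)"}))
```

The author maintains one source and the server decides which rendering each client receives, which is the same idea that proxies and intermediaries apply further along the chain.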
39
B
Web Conclusions
To conclude:
• New web protocols are still being developed
• Deployment of new technologies can be expensive
or time-consuming, but is likely to be needed
• Various deployment models:
– Don't implement
– Implement fully
– Implement via proxy
– Other solutions
• We can't do it all ourselves
• Experience in developing (wide-area) web
applications will help in developing intermediaries
40
P
Non–Web Conclusions
• Cross–domain interoperability is a laudable goal
• Technical developments continue in a rapidly
shifting environment
• Libraries are not alone
• To make an OPAC more widely available, look at
Z39.50
• To raise awareness of library web pages, or to
describe particular resources, look at a
‘metadata’ solution like Dublin Core
• We need to move beyond ‘traditional’ users (who
know where the library is and what it offers)…
41