Open for Business
Open Archives, OpenURL,
RSS and the Dublin Core
Andy Powell, UKOLN, University of Bath
UKSG 2004, Manchester
www.ukoln.ac.uk – a centre of expertise in digital information management
www.bath.ac.uk
UKOLN is supported by:
Contents
• context – metasearching and open ‘context sensitive’ linking
• bluffer’s guides to…
– Dublin Core
– OAI Protocol for Metadata Harvesting
– RSS
– OpenURL
• discussion about the benefits, problems and issues of using these standards in the publishing ‘business’ environment…
Things to note…
• this is a briefing session about
technologies…
• …but it is not intended to be overly
technical
• you should leave with an understanding
of what the key technologies are – but
not necessarily be expert in them!
Important
• this is a briefing session…
…please feel free to ask questions
as we go through!
Context: metasearching and
context sensitive linking
The ‘problem’…
• end-user often has access to a large number of heterogeneous collections (full-text, A&I, images, video, data, etc.), e.g. thru JISC licensing agreements
• however, experience of these collections
is less than optimal:
– end-users not aware of available content
– end-user has to interact with (search or
browse) multiple different Web sites to work
across range of content
– content ‘discovery’ services not joined-up
with delivery services
Or, to put it another way…
• from perspective of ‘data consumer’
– need to interact with multiple collections of
stuff - bibliographic, full-text, data, image,
video, etc.
– delivered thru multiple Web sites
– few cross-collection discovery services (with
exception of big search engines like Google, but still some
issues with use of Google – e.g. the ‘invisible Web’, the lack
of metadata, keywords with multiple meanings, etc.)
• from perspective of ‘data provider’
– few agreed mechanisms for disclosing
availability of content
A solution…
• an ‘information environment’
• framework of machine-oriented services
allowing the end-user to
– discover, access, use, publish resources across a range
of content providers
– move away from lots of stand-alone Web sites...
• content providers expose metadata for
– searching, harvesting, alerting
• develop end-user services and tools that
bring stuff together…
• …based on open ‘standards’
End-user services and tools
• tend to focus on library portal
(metasearch) tools (e.g. Encompass,
MetaLib or ZPortal)
• but, there will be lots of user-focused
services and tools…
– subject portals developed within academia
– reading list and other tools in VLE (e.g. externally hosted
by Sentient Discover)
– commercial ‘portals’ (ISI Web of Knowledge, ingenta, Bb
Resource Center, etc.)
– SFX service component (or other OpenURL resolver)
– personal desktop reference manager (e.g. Endnote)
Link resolvers
• ‘discovery’ is only part of the problem…
• in the case of books, journals, journal
articles, end-user wants access to the most
appropriate copy
• need to join up discovery services with
access/delivery services (local library OPAC,
ingentaJournals, Amazon, etc.)
• need localised view of available services
• linking services that provide access to the
most appropriate copy
– user and institutional preferences, cost, access
rights, location, etc.
A shared problem space
• the problems outlined here are shared
across sectors and communities
– student or researcher looking for information from variety
of bibliographic sources
– lecturer searching for e-learning resources from multiple
learning object repositories
– researcher working across multiple data-sets and
compute servers on the Grid
– a GP searching the National electronic Library for Health
– school child searching BBC, museum and library Web
sites for homework project
– someone searching across multiple e-government sites
– even someone looking to buy or sell a second-hand
car…
Technologies
• require global, standards-based, cross-domain solutions…
• cross-searching
– Z39.50 – Bath Profile, a profile of Z39.50
– SRW (Search and Retrieve Web-service) – Web services implementation of Z39.50
• harvesting
– OAI-PMH – Open Archives Initiative Protocol for Metadata Harvesting
• alerting
– RSS – RDF/Rich Site Summary
• linking
– OpenURL
…and cross-domain metadata
Bluffer’s Guide to…
Dublin Core
Bluffer’s guide to DC
http://dublincore.org/
1. DC short for Dublin Core
2. simple metadata standard,
supporting ‘cross-domain’
resource discovery
3. original focus on Web resources but that
is no longer the case – e.g. usage to
describe physical artefacts in museums
4. current usage across wide range of
sectors – academic, e-government,
museums, libraries, business, semantic
Web
Bluffer’s Guide to DC
5. ‘simple DC’ provides 15 elements (metadata properties)
6. multiple encoding syntaxes including HTML <meta> tags, XML and RDF/XML (XML schemas are available)
dc:title
dc:creator
dc:subject
dc:description
dc:contributor
dc:date
dc:type
dc:format
dc:source
dc:language
dc:relation
dc:coverage
dc:publisher
dc:identifier
dc:rights
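As a concrete illustration of the XML syntax, here is a minimal Python sketch that serialises a simple DC description with the standard `xml.etree.ElementTree` module. The `dc` namespace URI is the DCMI one; the `<record>` wrapper element and the sample values are assumptions for illustration, not a DCMI-defined container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise with the familiar dc: prefix

# The 15 simple DC elements listed above.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "contributor",
    "date", "type", "format", "source", "language",
    "relation", "coverage", "publisher", "identifier", "rights",
}

def simple_dc_xml(fields):
    """Serialise {element: [values]} as simple DC inside a hypothetical <record>.

    Every simple DC element is optional and repeatable; names outside
    the 15-element set are rejected.
    """
    root = ET.Element("record")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple DC element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

print(simple_dc_xml({
    "title": ["Open for Business"],
    "creator": ["Powell, Andy"],
    "date": ["2004"],
}))
```

Because every element is optional and repeatable, a plain dictionary of lists is enough to hold a simple DC description.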
Bluffer’s Guide to DC
7. relatively slow programme of adding
new terms to ‘qualified DC’
– new elements (e.g. dcterms:audience)
– element refinements (e.g.
dcterms:dateCopyrighted)
– encoding schemes (e.g. dcterms:LCSH and dcterms:W3CDTF)
– 48 elements and 17 encoding schemes
http://dublincore.org/documents/dcmi-terms/
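An encoding scheme tells a consumer how to read a value. For example, a date tagged with dcterms:W3CDTF is expected to follow the W3C Date and Time Formats profile of ISO 8601. The sketch below is a checker based on my reading of that profile (six granularities, from a year up to fractional seconds with a time zone designator). It is an illustration, not DCMI code.

```python
import re
from datetime import date

# W3CDTF granularities: YYYY, YYYY-MM, YYYY-MM-DD, and a complete date
# followed by Thh:mm, Thh:mm:ss or Thh:mm:ss.s plus a time zone (Z or +hh:mm).
W3CDTF = re.compile(
    r"(?P<y>\d{4})"
    r"(?:-(?P<m>\d{2})"
    r"(?:-(?P<d>\d{2})"
    r"(?:T(?P<hh>\d{2}):(?P<mi>\d{2})"
    r"(?::(?P<ss>\d{2})(?:\.\d+)?)?"
    r"(?:Z|[+-]\d{2}:\d{2})"
    r")?)?)?"
)

def is_w3cdtf(value: str) -> bool:
    """Syntax check plus basic range check of a dcterms:W3CDTF value."""
    m = W3CDTF.fullmatch(value)
    if m is None:
        return False
    month = int(m["m"]) if m["m"] else 1
    day = int(m["d"]) if m["d"] else 1
    try:
        date(int(m["y"]), month, day)  # rejects month 13, 30 February, etc.
    except ValueError:
        return False
    if m["hh"] is not None:
        if int(m["hh"]) > 23 or int(m["mi"]) > 59:
            return False
        if m["ss"] is not None and int(m["ss"]) > 59:
            return False
    return True

for v in ["2004", "2004-03-29", "2004-03-29T14:30Z", "2004-13", "29/03/2004"]:
    print(v, is_w3cdtf(v))
```

Note that a time without a time zone designator (e.g. `2004-03-29T14:30`) is rejected, because the profile requires one whenever hours and minutes are given.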
Bluffer’s Guide to DC
8. DC can be embedded into HTML pages
but almost none of the big search
engines will use it! Why? Lack of
trust…
– meta-spam
– meta-crap
– however, embedding DC in HTML may be
worthwhile if your own site search engine
uses it
9. however, simple DC forms baseline
metadata format for the OAI protocol…
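If your own site search engine does use embedded DC, the usual HTML form is a `<link rel="schema.DC">` declaration plus one `<meta name="DC.element">` tag per value. The Python sketch below renders that convention from a dictionary. The function name and input shape are illustrative assumptions.

```python
from html import escape

def dc_meta_tags(fields):
    """Render {element: [values]} as simple DC <link>/<meta> tags for a page <head>.

    Uses the 'schema.DC' / 'DC.element' naming convention; values are
    HTML-escaped so quotes and ampersands cannot break the attribute.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, values in fields.items():
        for value in values:
            lines.append(
                f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
            )
    return "\n".join(lines)

print(dc_meta_tags({"title": ["Open for Business"], "creator": ["Powell, Andy"]}))
```

Nothing here stops a page from asserting false metadata, which is exactly the meta-spam problem that leads the big search engines to ignore these tags.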
Bluffer’s Guide to
OAI Protocol for Metadata
Harvesting
OAI roots
• the roots of OAI lie in the development of
eprint archives…
– arXiv, CogPrints, NACA (NASA), RePEc, NDLTD,
NCSTRL
• each offered Web interface for deposit of
articles and for end-user searches
• difficult for end-users to work across
archives without having to learn multiple
different interfaces
• recognised need for single search
interface to all archives
– Universal Pre-print Service (UPS)
Searching vs. harvesting
• two possible approaches to building a
single search interface to multiple eprint
archives…
– cross-searching multiple archives based on protocol
like Z39.50
– harvesting metadata into one or more ‘central’ services
– bulk move data to the user-interface
• US digital library experience in this area
indicated that cross-searching not
preferred approach
– distributed searching of N nodes viable, but only for
small values of N
Harvesting requirements
• in order that harvesting approach can work
there need to be agreements about…
– transport protocols – HTTP vs. FTP vs. …
– metadata formats – DC vs. MARC vs. …
– quality assurance – mandatory elements,
mechanisms for naming of people, subjects,
etc., handling duplicated records, best-practice
– intellectual property and usage rights – who
can do what with the records
• work in this area resulted in the “Santa Fe
Convention”
Development of OAI-PMH
• 2 year metamorphosis thru various names
– Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
– OAI Protocol for Metadata Harvesting 2.0
• development steered by international
technical committee
• inter-version stability helped developer
confidence
• move from focus on eprints to more
generic protocol
– move from OAI-specific metadata schema to mandatory
support for DC
Bluffer’s guide to OAI
http://www.openarchives.org/
1. OAI-PMH short for Open Archives Initiative
Protocol for Metadata Harvesting
2. a low-cost mechanism for harvesting
metadata records
– from ‘data providers’ to ‘service providers’
3. allows ‘service provider’ to say ‘give me
some or all of your metadata records’
– where ‘some’ is based on date-stamps, sets,
metadata formats
4. eprint heritage but widely deployed
– images, museum artefacts, learning objects, …
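‘Give me some of your records’ is expressed as an HTTP GET with a verb and optional arguments (`metadataPrefix`, `from`, `until`, `set`). The following minimal Python sketch builds such a request URL. The base URL `http://archive.example.org/oai` and the set name `physics` are hypothetical.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}

def oai_request_url(base_url, verb, **args):
    """Build an OAI-PMH GET request URL.

    'from' is a Python keyword, so callers pass from_ and it is renamed here.
    Arguments set to None are left out of the query string.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "from_" in args:
        args["from"] = args.pop("from_")
    params = {"verb": verb}
    params.update({k: v for k, v in args.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"

# "Give me the simple DC records in set 'physics' changed since 1 Jan 2004."
print(oai_request_url("http://archive.example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc", from_="2004-01-01", set="physics"))
```

Date-stamps, sets and metadata formats map directly onto `from`/`until`, `set` and `metadataPrefix`. That is the whole selection vocabulary a service provider has.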
Bluffer’s guide to OAI
5. based on HTTP and XML
– simple, Web-friendly, fast deployment
6. OAI-PMH is not a search protocol
– but use can underpin search-based services
based on Z39.50 or SRW or SOAP or…
7. OAI-PMH carries only metadata
– content (e.g. full-text or image) made available
separately – typically at URL in metadata
8. mandates simple DC as record format
– but extensible to any XML format – IMS
metadata, IEEE LOM, ONIX, MARC, METS,
MPEG-21, etc.
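Because responses are plain XML over HTTP, a harvester is mostly a parser plus a loop that follows `resumptionToken`s until the list is complete. Here is a minimal sketch using only the Python standard library. The sample response is invented, and `fetch` is a hypothetical stand-in for whatever HTTP client you use, so the loop can be shown without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}

def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response.

    Each record is (identifier, datestamp, {dc element: [values]}).
    A missing or empty resumptionToken means the list is complete.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        identifier = header.findtext("oai:identifier", namespaces=NS)
        datestamp = header.findtext("oai:datestamp", namespaces=NS)
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # records with status="deleted" carry no metadata
            for element in dc:
                name = element.tag.split("}", 1)[-1]  # drop the dc namespace
                fields.setdefault(name, []).append(element.text or "")
        records.append((identifier, datestamp, fields))
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)

def harvest(fetch, base_url):
    """Yield every simple DC record, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        records, token = parse_list_records(fetch(f"{base_url}?{urlencode(params)}"))
        yield from records
        if token is None:
            return
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}

SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-03-29T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://archive.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1234</identifier>
        <datestamp>2004-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:identifier>http://archive.example.org/1234/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

The `dc:identifier` in the sample is a URL. This illustrates point 7: the content itself is fetched separately from that address, not carried by the protocol.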
Bluffer’s guide to OAI
9. metadata and ‘content’ often made freely
available – but not a requirement
– OAI-PMH can be used between closed
groups
– or, can make metadata available but restrict
access to content in some way
10. underlying HTTP protocol provides
– access control – e.g. HTTP BASIC
– compression mechanisms (for improving
performance of harvesters)
– could, in theory, also provide encryption if
required
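Because OAI-PMH rides on plain HTTP, a harvester can use standard HTTP features for access control and compression. The Python sketch below builds a request that asks for a gzip-compressed response and optionally sends HTTP Basic credentials. It also decompresses the body, since `urllib` does not do this automatically. The credentials and URL are hypothetical.

```python
import base64
import gzip
import urllib.request

def harvester_request(url, username=None, password=None):
    """Build an OAI-PMH request asking for gzip and, optionally, using HTTP Basic auth."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    if username is not None:
        creds = f"{username}:{password}".encode("utf-8")
        req.add_header("Authorization", "Basic " + base64.b64encode(creds).decode("ascii"))
    return req

def read_body(raw, content_encoding):
    """Decompress the body only if the server actually returned gzip."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw

# Usage (needs a live repository):
# req = harvester_request("http://archive.example.org/oai?verb=Identify", "harvester", "secret")
# with urllib.request.urlopen(req) as resp:
#     body = read_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Basic credentials are only base64-encoded, not encrypted. Where confidentiality matters, the same requests would be sent over HTTPS, which is the ‘encryption if required’ option mentioned above.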
Bluffer’s Guide to…
RSS
Bluffer’s guide to RSS
http://www.eevl.ac.uk/rss_primer/
1. simple XML application for sharing
(syndicating) ‘news’ feeds on the Web
2. RDF Site Summary or Rich Site
Summary (depending on who you ask)
3. ‘news’ can be interpreted quite loosely,
e.g. new items added to database
4. uses ‘channel’ and ‘item’ terminology
5. a ‘channel’ is an XML document that is
made available on a Web-site – to update
the channel, simply update the XML
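To show how little is involved, here is a minimal Python sketch that produces an RSS 2.0 channel document listing new database items. The channel and item values are hypothetical. Publishing the returned XML at a fixed URL, and regenerating it when items change, is all that ‘updating the channel’ means.

```python
import xml.etree.ElementTree as ET

def rss2_channel(title, link, description, items):
    """Build a minimal RSS 2.0 document.

    `items` is a list of (title, link, description) tuples, newest first.
    title, link and description are the required channel elements.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link, item_desc in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "description").text = item_desc
    return ET.tostring(rss, encoding="unicode")

print(rss2_channel(
    "New items", "http://example.org/",
    "Items recently added to the database",
    [("Item one", "http://example.org/1", "First new item")],
))
```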
Bluffer’s guide to RSS
6. each ‘item’ has simple metadata (title,
description) and URL link to resource
(news story or whatever)
7. RSS also provides channel branding
(logo, etc.)
8. three versions currently: 0.9, 1.0 and 2.0
– 1.0 is based on RDF and is more flexible (but slightly more complex)
(Also worth noting Atom – an attempt to
resolve some of the tensions in RSS)
9. no single registry of all channels yet
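The extra complexity of the RDF-based version shows up as soon as you read a feed. In RSS 2.0 the items sit inside `<channel>`. In RSS 1.0 they are siblings of `<channel>` under `<rdf:RDF>`, in the RSS 1.0 namespace. The sketch below reads title/link pairs from either form. It deliberately does not handle RSS 0.9, which uses a different namespace, and the sample feeds are invented.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"

def feed_items(xml_text):
    """Return [(title, link)] for an RSS 1.0 (RDF) or RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{RDF_NS}}}RDF":  # RSS 1.0: items are children of rdf:RDF
        ns = {"rss": RSS1_NS}
        return [(i.findtext("rss:title", namespaces=ns), i.findtext("rss:link", namespaces=ns))
                for i in root.findall("rss:item", ns)]
    if root.tag == "rss":  # RSS 2.0: items are children of channel, no namespace
        return [(i.findtext("title"), i.findtext("link")) for i in root.findall("channel/item")]
    raise ValueError(f"unrecognised feed root element: {root.tag}")

RSS1 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news">
    <title>Example channel</title>
    <link>http://example.org/news</link>
    <description>New items</description>
    <items><rdf:Seq><rdf:li rdf:resource="http://example.org/news/1"/></rdf:Seq></items>
  </channel>
  <item rdf:about="http://example.org/news/1">
    <title>First item</title>
    <link>http://example.org/news/1</link>
  </item>
</rdf:RDF>"""

RSS2 = """<rss version="2.0"><channel><title>Example channel</title>
<link>http://example.org/news</link><description>New items</description>
<item><title>First item</title><link>http://example.org/news/1</link></item>
</channel></rss>"""

print(feed_items(RSS1))
print(feed_items(RSS2))
```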
Bluffer’s guide to RSS
10. fairly widespread usage, e.g. channels
available from the BBC, Microsoft,
Apple, … as well as from several
academic sites and services (RDN,
LTSN, …)
11. easy to use within ‘portals’ (e.g. uPortal)
12. lots of software and toolkits available –
open source and commercial
Bluffer’s Guide to…
OpenURLs
OpenURL roots
• the context: a library perspective?
– distributed information environment (e.g. the
JISC IE)
– multiple A&I and other discovery services
– rapidly growing e-journal collection
– need to interlink available resources
• the problem
– links controlled by external info services
– links not sensitive to user’s context
(appropriate copy problem)
– links dependent on vendor agreements
– links don’t cover complete collection
The problem
• the context: a library perspective?
– distributed information environment (e.g. the
JISC IE)
– multiple A&I and other discovery services
– rapidly growing e-journal collection
– need to interlink available resources
• the REAL problem
– libraries have no say in linking
– libraries losing core part of ‘organising
information’ task
– expensive collection not used optimally
– users not well served
The solution…
• do NOT hardwire a link to a single service on the referenced item (e.g. a link from an A&I service to the corresponding full-text)
• BUT rather
– provide a link that transports metadata about the referenced item (the OpenURL)
– to another service that is better placed to provide service links (the OpenURL resolver, or link server)
Non-OpenURL linking
Diagram: the A&I service (link source) holds a reference and itself resolves the metadata into a single link (typically a URL) to the referenced work at a document delivery service (link destination).
OpenURL linking
Diagram: the A&I service (link source) holds a reference and provides a user-specific OpenURL; the OpenURL transports metadata & identifiers to the OpenURL resolver, which resolves them into services and offers links to several link destinations (e.g. a document delivery service).
Example 1
• journal article
• from Web of Science to ingenta Journals
Screenshot: button indicating OpenURL ‘link’ is available
Screenshot: OpenURL resolver offering context-sensitive links, including link to ingenta
Screenshot: also links to other services such as Google search for related information
Example 2
• book
• from University of Bath OPAC to Amazon
Screenshot: button indicating OpenURL ‘link’ is available
Screenshot: OpenURL resolver offering context-sensitive links, including link to Amazon
Screenshot: also links to other services such as Google search for related information
Summary…
Diagram: OpenURL sources (ISI Web of Science, University of Bath OPAC) → OpenURL resolver → OpenURL targets (ingenta, Amazon, Google)
Summary (2)
• OpenURL source
– a service that embeds OpenURLs into its user interface in order to enable linking to the most appropriate copy
• OpenURL resolver
– a service that links to appropriate copy(ies) and other
value added services based on metadata in OpenURL
• OpenURL target
– a service that can be linked to from an OpenURL
resolver using metadata in OpenURL
Bluffer’s guide to OpenURLs
http://www.niso.org/committees/committee_ax.html
1. standard for linking ‘discovery’ services
to ‘delivery’ services
2. supports linking from OpenURL ‘source’ (e.g. Web of Science) to OpenURL ‘target’ (e.g. ingenta) via the end-user’s OpenURL ‘resolver’
– example OpenURL, where http://www.bath.ac.uk/openurl is the BASEURL of the end-user’s resolver:
http://www.bath.ac.uk/openurl?genre=article&atitle=Information%20gateways:%20collaboration%20on%20content&title=Online%20Information%20Review&issn=1468-4527&volume=24&spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel
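The example OpenURL is simply the resolver’s BASEURL followed by the citation metadata as percent-encoded key/value pairs. Here is a minimal Python sketch of how a source might assemble it. The function name is an illustrative assumption. Spaces are encoded as `%20` and colons are left unescaped so that the output matches the slide.

```python
from urllib.parse import quote, urlencode

def openurl_v01(base_url, metadata):
    """Build an OpenURL 0.1 by appending citation metadata to the resolver's BASEURL.

    Keys are OpenURL 0.1 names (genre, atitle, title, issn, volume, spage, ...).
    """
    query = urlencode(metadata, quote_via=quote, safe=":")
    return f"{base_url}?{query}"

url = openurl_v01("http://www.bath.ac.uk/openurl", {
    "genre": "article",
    "atitle": "Information gateways: collaboration on content",
    "title": "Online Information Review",
    "issn": "1468-4527",
    "volume": "24",
    "spage": "40",
    "epage": "45",
    "artnum": "1",
    "aulast": "Heery",
    "aufirst": "Rachel",
})
print(url)
```

Only the base URL depends on who the user is. The metadata part is the same whichever resolver receives it.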
Bluffer’s guide to OpenURLs
3. the OpenURL is a URL that carries
metadata from the ‘source’ service to the
user’s preferred resolver
4. resolver typically offered by institution
5. currently deployed OpenURLs are often
version 0.1 - focus on bibliographic
resources (books and journal articles)
6. version 1.0 (the standard) – more
generic and extensible, e.g. could carry
metadata about learning objects or
research data
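To illustrate the move from 0.1 to 1.0, the sketch below rewrites the journal-article OpenURL from the previous slide in the 1.0 (Z39.88-2004) key/encoded-value style. In that style, the referent’s metadata gets an `rft.` prefix and a `rft_val_fmt` identifier names the metadata format. The mapping table is my own partial mapping, covering only the keys in the example, and should be checked against the Z39.88-2004 KEV journal format before real use. Unmapped keys are dropped.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit

# Partial, illustrative mapping from OpenURL 0.1 keys to the 1.0 KEV journal format.
V01_TO_V10_JOURNAL = {
    "genre": "rft.genre", "atitle": "rft.atitle", "title": "rft.jtitle",
    "issn": "rft.issn", "volume": "rft.volume", "spage": "rft.spage",
    "epage": "rft.epage", "artnum": "rft.artnum",
    "aulast": "rft.aulast", "aufirst": "rft.aufirst",
}

def upgrade_to_v10(openurl_v01):
    """Rewrite a 0.1 journal-article OpenURL as a 1.0 KEV OpenURL (sketch)."""
    parts = urlsplit(openurl_v01)
    pairs = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    for key, value in parse_qsl(parts.query):  # decodes %20 back to spaces
        if key in V01_TO_V10_JOURNAL:
            pairs.append((V01_TO_V10_JOURNAL[key], value))
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return f"{base}?{urlencode(pairs, quote_via=quote, safe=':/')}"

print(upgrade_to_v10(
    "http://www.bath.ac.uk/openurl?genre=article&title=Online%20Information%20Review"
    "&issn=1468-4527&volume=24&spage=40"
))
```

The format identifier is what makes 1.0 extensible. A different `rft_val_fmt` could, in principle, describe a learning object or a dataset instead of a journal article.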
Bluffer’s guide to OpenURLs
7. ‘sources’ need to maintain knowledge
about end-user’s preferred resolver
8. resolvers and targets need to share
knowledge about ‘link-to’ syntaxes
9. most library automation vendors will
either have (or be developing) an
OpenURL resolver solution for their
customers
10. some open-source solutions also
available – but expect to work quite
hard with these
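Points 7 and 8 describe two pieces of shared knowledge. The source knows which resolver each user prefers. The resolver knows how to ‘link to’ each target. The toy sketch below models both as lookup tables. Everything in it is hypothetical except the Bath BASEURL from the earlier slide and Google’s `q` search parameter. Real resolvers hold far richer knowledge bases about holdings, access rights and preferences.

```python
from urllib.parse import quote, urlencode

# Point 7: the source remembers each user's preferred resolver (hypothetical users).
PREFERRED_RESOLVER = {
    "bath-user": "http://www.bath.ac.uk/openurl",
    "guest": "http://resolver.example.org/openurl",
}

def source_link(user_id, metadata):
    """Source side: address the OpenURL to this user's own resolver."""
    base = PREFERRED_RESOLVER[user_id]
    return f"{base}?{urlencode(metadata, quote_via=quote, safe=':')}"

# Point 8: the resolver knows each target's 'link-to' syntax.
# The publisher template is hypothetical; "*" applies to every genre.
LINK_TO = {
    "article": [
        ("Full text (hypothetical publisher)",
         "http://journals.example.com/issn/{issn}/vol/{volume}/page/{spage}"),
    ],
    "*": [
        ("Google search", "http://www.google.com/search?q={atitle}"),
    ],
}

def resolve(metadata):
    """Resolver side: turn already-parsed OpenURL metadata into a menu of target links."""
    fields = {key: quote(value) for key, value in metadata.items()}
    templates = LINK_TO.get(metadata.get("genre"), []) + LINK_TO["*"]
    links = []
    for label, template in templates:
        try:
            links.append((label, template.format(**fields)))
        except KeyError:
            # This target needs metadata the OpenURL did not carry; skip it.
            continue
    return links

for label, link in resolve({"genre": "article",
                            "atitle": "Information gateways: collaboration on content",
                            "issn": "1468-4527", "volume": "24", "spage": "40"}):
    print(label, link)
```

Keeping the link-to templates in the resolver, rather than in every source, is the design point. When a target changes its URL syntax, only the resolver’s table needs updating.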
Discussion…
Summary
Summary
• protocols presented here fill space
between ‘information providers’ and other
services (‘portals’, VLEs, etc.)
– allow integration of remote information
resources more seamlessly
– allow separation of ‘discovery’ and ‘content
delivery’
– enable user-focused, context-sensitive linking
– can be viewed as ways of getting users to your
site
• but… there are some issues to beware of
What can you do?
• consider exposing metadata about your
content for harvesting (or searching)
• consider making ‘alerting’ channels
available
• consider supporting use of OpenURLs for
linking to appropriate-copy
• consider how your content will be used in
e-learning context
• consider how external services ‘link to’
your resources (i.e. support persistent
deep linking to your content)