
The OpenURL Quality
Problem & Project
Adam Chandler
Coordinator, Service Design Group
Glen Wiley
Metadata Librarian
Metadata Working Group
February 22, 2008
The Original Problem
• Reduce linking dead ends from one publisher’s content to another’s
• Show multiple subscriptions or relevant access points in one place
• Show the most appropriate version of the service (such as full text)
• Improve content visibility
• Possibly reduce document delivery costs
Brief History of OpenURL
• Originated by Herbert Van de Sompel at Univ. of Ghent, around 2000
– Became OpenURL Version 0.1
• Commercialized by ExLibris (SFX) in 2001
• Fast-tracked by NISO
– Released as Version 1.0, officially ANSI/NISO standard
Z39.88, in 2004
• OCLC is the maintenance agency as of June 2006
What is OpenURL?
• OpenURL is a syntax for querying a server
• to perform a service
• on a resource
– specified by attributes
• sensitive to context
– also specified by attributes
OpenURL is an "actionable" URL that transports resource metadata.
OpenURL Version 0.1 Example
http://linkresolver.library.cornell.edu:4550/resserv?genre=article&issn=01604120&title=Environment+International&volume=32&issue=1&date=20060101&atitle=The+United+States+Department+of+Energy's+Regional+Carbon+Sequestration+Partnerships+program.&spage=128&pages=128144&sid=EBSCO:aph&aulast=Litynski
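A link resolver's first step is to split such a URL into its key-value pairs. The following is a minimal sketch using Python's standard library, with the example URL above rejoined onto one line. The function name `parse_openurl_01` is illustrative and not part of any resolver's API.

```python
from urllib.parse import urlsplit, parse_qs

# The version 0.1 example from the slide, rejoined onto one line.
EXAMPLE = (
    "http://linkresolver.library.cornell.edu:4550/resserv?genre=article"
    "&issn=01604120&title=Environment+International&volume=32&issue=1"
    "&date=20060101&atitle=The+United+States+Department+of+Energy's"
    "+Regional+Carbon+Sequestration+Partnerships+program."
    "&spage=128&pages=128144&sid=EBSCO:aph&aulast=Litynski"
)

def parse_openurl_01(url):
    """Return (base_url, metadata) for a version 0.1 OpenURL."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs decodes '+' and %XX escapes; keep the first value of each key.
    metadata = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return base, metadata

base, meta = parse_openurl_01(EXAMPLE)
print(base)
for key in ("genre", "issn", "title", "volume", "issue", "spage", "sid"):
    print(f"{key:7} {meta[key]}")
```

The base URL identifies the resolver; everything after the question mark is the resource metadata the resolver acts on.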
OpenURL Version 1.0 Example
http://linkresolver.library.cornell.edu:4550/resserv?url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=giordanino,+m&rft.epage=377&rft.stitle=knowl+eng+rev&rft.date=2007&rft_id=info:doi/10.1017%2fs0269888907001233/&url_ver=z39.88-2004&rft.issn=02698889&rft.aulast=uren&rft.title=knowledge+engineering+review&rft.genre=article&rft.issue=4&rft.spage=361&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.volume=22&rft.auinit=v&rft.atitle=the+usability+of+semantic+search+tools%3a+a+review
How does it work?
OpenURL Version 0.1
• OpenURL 0.1 is a de facto standard that is built around
scholarly bibliographic data only
• An accepted “standard” syntax for creating a link between an
information source and a link resolver
• Pre-defines sets of data elements to use in describing an
“item”
• Relies on HTTP protocol for transmission
• The concept of context-sensitive linking implemented for a
specific class of resources: (some) scholarly assets
OpenURL Version 0.1
Limitations:
• Pre-defined metadata genres and elements mean that new ones
cannot be defined to meet emerging needs (e.g., for image databases)
• Only provides for key-value pair (HTTP GET or POST) representation
of metadata.
• OpenURL 0.1 is tied to HTTP transport
• Lack of implementation guidelines means that support for OpenURL
is loosely defined
OpenURL Version 1.0
– Complicated and highly abstract
– Designed for greater flexibility
– Slower uptake
– Supports richer data formats/genres
• Journal, Article, Proceeding, Preprint, Book, Report, Document, Patent,
Dissertation, etc
– Provides more complete context description
– Supports transport mechanisms other than HTTP
• like SOAP, OAI-PMH, HTTPS
– A generic specification that allows OpenURL Applications to be implemented
• OpenURL Applications: networked applications that implement the
concept of context-sensitive services for a certain class of resources
Understanding OpenURL Version 1.0
[Diagram: a networked resource contains a reference about a Referent. A ContextObject (a description of the Referent and its context) is carried by a Transport to a Resolver, which returns services pertaining to the Referent.]
Diagram is from Herbert Van de Sompel’s OpenURL Tutorial at the
Olybris 2005 Ex Libris Seminar, Kos, Greece, April 18th 2005.
Understanding OpenURL Version 1.0
• OpenURL 1.0 divides ContextObject into six entities (including the resource)
– Each entity has attributes to identify it
– Each entity has schema for those attributes
• Each entity affects URL resolution
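In the KEV format, the entity that a key describes is signalled by its prefix: `rft` (Referent), `rfe` (ReferringEntity), `req` (Requester), `svc` (ServiceType), `res` (Resolver) and `rfr` (Referrer). The sketch below groups the keys of a query string by entity; the function name and the "Administrative" bucket for keys such as `url_ver` are illustrative choices, and the query is a shortened form of the Version 1.0 example.

```python
from urllib.parse import parse_qsl

# KEV key prefixes for the six ContextObject entities.
ENTITY_PREFIXES = {
    "rft": "Referent",
    "rfe": "ReferringEntity",
    "req": "Requester",
    "svc": "ServiceType",
    "res": "Resolver",
    "rfr": "Referrer",
}

def group_by_entity(query):
    """Group KEV key-value pairs by the entity their prefix names."""
    groups = {}
    for key, value in parse_qsl(query):
        prefix = key[:3]
        # Entity keys look like "rft.volume" or "rft_id".
        if prefix in ENTITY_PREFIXES and len(key) > 3 and key[3] in "._":
            entity = ENTITY_PREFIXES[prefix]
        else:
            entity = "Administrative"
        # Simplification: a repeated key (e.g. several rft.au) keeps the last value.
        groups.setdefault(entity, {})[key] = value
    return groups

QUERY = (
    "url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx"
    "&rfr_id=info:sid/www.isinet.com:wok:wos"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article"
    "&rft.issn=02698889&rft.volume=22&rft.issue=4&rft.spage=361"
)

for entity, pairs in group_by_entity(QUERY).items():
    print(entity, sorted(pairs))
```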
Problems with the Standard & Documentation
• Tough read
• Key/Encoded-Value (KEV) “Implementation Guidelines” are
helpful, but complex
• Not specific enough in many ways. Some mention of best practices for
metadata values like:
– UTF-8 encoding for special characters
– DCMI Type Vocabulary for Referent Type (rft.type)
– MIME type for Referent Format (rft.format)
Miriam Blake citation and the Known Issues
• M.E. Blake, F.L. Knudson. Metadata and Reference Linking. Libr. Coll.
Acq. & Tech. Serv. 26 (2002) 219–230, p. 229
• Goals for the future:
– Increased consistency in metadata within a single database and
across databases.
– Increased communication between primary publishers and
secondary publishers.
– Increased awareness of bibliographic/citation standards by
authors.
– Increased outreach by librarians to authors emphasizing and
promoting the importance of citation standards for electronic
document retrieval.
Link Resolvers and the Serials Supply Chain
[UKSG Report] -- 2007
• Description of the Supply Chain
• Issues and Barriers
– Lack of awareness
– Lack of Co-operation
– Inaccurate/Incomplete Data
– Content Package Issues
– Responsibility of Data Quality
– Lack of Data Standards
– Inbound Linking Issues
– Etc…
• Recommendations
Problems Persist
1. Wrong start or end date in the local library's holdings database
2. Wrong link-to syntax in link resolver
3. Inaccurate or missing Crossref DOI URL (often the DOI
registration process is out of sync with the mounting of articles)
Problems Persist
4. Semantically inaccurate metadata from the OpenURL origin (wrong
ISSN, for example)
Problems Persist
5. Syntactically incorrect metadata from the OpenURL origin
Problems Persist
6. Subscription and embargo errors (especially in January)
– For each month that passes, the chance of the link working
increases by over 8%
Characteristics of a solution to the OpenURL
quality problem
• empirical
• network level problem: so it needs to be solved at
the network level
• sanctioned, officially recognized
• offer value to librarians and content providers
• narrow scope
Model: Open Language Archives Community
Metadata Quality Evaluation: Experience from the Open Language Archives
Community, Baden Hughes, Department of Computer Science and Software
Engineering, University of Melbourne,
Abstract. We describe the motivation, design and implementation of an
infrastructure to support metadata quality assessment within a specialised Open
Archives Initiative (OAI) sub-domain, the Open Language Archives
Community (OLAC). While services for structural validation of metadata are
widely used, there is little corresponding work regarding services which
evaluate the semantic and syntactic content of metadata from a qualitative
perspective. We posit that any measure of metadata quality benefits from both
contextual and referential assessment - metadata on a per record and per
collection basis is legitimately assessed against the baseline of broader
community practice, as well as for compliance to any external standard. In this
paper we describe the implementation of a metadata quality assessment scheme,
and the corresponding interfaces to the evaluation tool.
http://eprints.infodiv.unimelb.edu.au/archive/00001408/01/ICADL2004-PUBLISHED.pdf
Model: Open Language Archives Community
Metrics
• code existence score, 0-1 (bonus for using controlled vocabulary)
• element absence penalty, 0-1 (penalty for missing core elements)
• per metadata record weighted aggregate, max 10
• archive level derivative metrics
• archive diversity metric (use of controlled vocabulary across the archive)
• metadata quality score metric (derived from individual scores)
• core elements per record metric
• core element usage metric
• code usage metrics
• code and element usage metrics
• “star rating” (derived from average item score in archive)
http://eprints.infodiv.unimelb.edu.au/archive/00001408/01/ICADL2004-PUBLISHED.pdf
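To make the idea of a per-record score concrete, here is a rough sketch of how OLAC-style metrics might be adapted to OpenURL metadata. It does not reproduce the formulas in the Hughes paper: the choice of core elements, the use of standard identifiers in place of controlled-vocabulary codes, and the equal weights are all assumptions made for illustration.

```python
# Hypothetical choices, not taken from OLAC or from the OpenURL standard.
CORE_ELEMENTS = ("issn", "date", "volume", "spage")
IDENTIFIERS = ("issn", "isbn", "id")

def record_score(metadata, max_score=10.0):
    """Score one OpenURL's metadata on a 0..max_score scale."""
    # "Code existence" analogue: 1 if any standard identifier is present.
    code_existence = 1.0 if any(metadata.get(k) for k in IDENTIFIERS) else 0.0
    # Element absence penalty: fraction of core elements that are missing.
    missing = sum(1 for k in CORE_ELEMENTS if not metadata.get(k))
    absence_penalty = missing / len(CORE_ELEMENTS)
    # Equal weights are an arbitrary assumption for this sketch.
    return max_score * (0.5 * code_existence + 0.5 * (1.0 - absence_penalty))

def archive_score(records):
    """Source-level metric: the mean of the per-record scores."""
    scores = [record_score(r) for r in records]
    return sum(scores) / len(scores) if scores else 0.0

sample = [
    {"issn": "0160-4120", "date": "2006", "volume": "32", "spage": "128"},
    {"title": "Environment International", "date": "2004"},
]
for record in sample:
    print(record_score(record), record)
print("source average:", archive_score(sample))
```

A score like this only measures completeness. It says nothing about whether the values present are correct.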
Case Study: L'Année philologique
Log file provided by Professor Eric Rebillard, Director of Graduate
Studies, Field of Classics
http://www.annee-philologique.com/aph/
126 OpenURLs in sample
Observations: log file scan
[Log file is not available in PowerPoint version.
Please contact Adam Chandler for more information]
Observations: date
Date of publication in ISO 8601 form YYYY, YYYY-MM or YYYY-MM-DD [p.56]
NOTE: "chron" Indications of chronology in a non-ISO8601
form (like "Spring" or "1st quarter") should
be carried in this element; the element content is not
normalized. Where numeric ISO8601 dates are also
available, they should be provided in the "date"
element. As such, a recorded date of publication of
"Spring, 1992" becomes "date=1992" and
"chron=spring". Chronology information can also be
provided in the "ssn" and "quarter" elements [p. 57]
log examples:
2000-2001
2000-2001
2000-2001
2004-2005
2004-2005
2003-2004
2004-2005
1998-1999
2004-2005
2004-2005
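A check along these lines would flag the year ranges in the log sample. This is a minimal sketch: it tests only the shape of the value (YYYY, YYYY-MM or YYYY-MM-DD), not whether the day exists in that month, and the function name is illustrative.

```python
import re

# YYYY, YYYY-MM or YYYY-MM-DD, as described in the KEV guidelines.
ISO_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")

def is_valid_openurl_date(value):
    """True if value has one of the three accepted ISO 8601 shapes."""
    return ISO_DATE.fullmatch(value) is not None

for candidate in ("2004", "2004-05", "1992-04-15", "2000-2001", "Spring, 1992"):
    status = "ok" if is_valid_openurl_date(candidate) else "INVALID"
    print(f"{candidate:14} {status}")
```

Values such as "2000-2001" fail because the part after the hyphen is read as a month.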
Observations: volume and issue
Volume is usually expressed as a number but
could be roman numerals or non-numeric, e.g.
"124", or "VI". "4" [p.57]
Issue: This is the designation of the published
issue of a journal, corresponding to the actual
physical piece in most cases. While usually
numeric, it could be non-numeric. Note that
some publications use chronology in the place
of enumeration, i.e. Spring, 1998. [p.58]
log examples:
N.%20S.%2055%20(1)
7%20(1)
43%20(3-4)
N.%20S.%2055%20(2)
4a%20ser.%203%20(1)
N%B0%20152
N%B0%2054
7%20(2)
133%20(2)
13-14
4a%20ser.%203%20(1)
31%20(1)
133%20(2)
38%20(3)
98%20(1)
N.%20S.%2055%20(1)
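Many of the log examples above pack volume and issue into a single value of the form "volume (issue)". The sketch below decodes a value and splits it where that pattern applies. It assumes the source sends single-byte Latin-1 escapes (`%B0` is the degree sign in Latin-1), and the function name is illustrative.

```python
import re
from urllib.parse import unquote

# "VOLUME (ISSUE)"; the volume part may carry series text such as "N. S. 55".
VOLUME_ISSUE = re.compile(r"^(?P<volume>.+?)\s*\((?P<issue>[^)]+)\)$")

def split_volume_issue(raw):
    """Decode a percent-encoded value and split it into (volume, issue)."""
    # Assumption: the escapes are Latin-1 bytes, not UTF-8.
    text = unquote(raw, encoding="latin-1")
    match = VOLUME_ISSUE.match(text)
    if match:
        return match.group("volume"), match.group("issue")
    # No parenthesised issue: return the decoded text unchanged as the volume.
    return text, None

for raw in ("43%20(3-4)", "N.%20S.%2055%20(1)", "4a%20ser.%203%20(1)",
            "N%B0%20152", "13-14"):
    print(raw, "->", split_volume_issue(raw))
```

Values without a parenthesised issue, such as "13-14", are left alone because it is unclear whether they denote a volume range or a volume and issue.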
Observations: spage
"spage=" is missing; it is more useful than the "pages" field when
linking to full text
First page number of a start/end (spage-epage) pair. Note
that pages are not always numeric [p.58]
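Where a source sends only a page range, the start page can often be derived from it. A minimal sketch, assuming the range uses a hyphen or an en dash as its separator; the sample values are hypothetical and the function name is illustrative.

```python
import re

# "START-END" with a hyphen or en dash; pages need not be numeric.
PAGE_RANGE = re.compile(r"^\s*([^\s\-–]+)\s*[-–]\s*([^\s\-–]+)\s*$")

def pages_to_spage_epage(pages):
    """Return (spage, epage) derived from a 'pages' value."""
    match = PAGE_RANGE.match(pages)
    if match:
        return match.group(1), match.group(2)
    # A single page, or something unrecognised: treat it as the start page.
    return (pages.strip() or None), None

for value in ("128-144", "xii–xv", "57"):
    print(repr(value), "->", pages_to_spage_epage(value))
```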
Observations: missing ISSNs
International Standard Serial Number (ISSN). ISSN
numbers may contain a hyphen, e.g. "1041-5653" [p. 59]
"issn=" is missing in some requests; ISSNs are easier to resolve than
titles, especially titles that contain special characters
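When an ISSN is present, its check digit gives a cheap syntactic test. The sketch below applies the standard ISSN rule (weights 8 down to 2, modulus 11, with "X" standing for 10); the function name is illustrative.

```python
def is_valid_issn(issn):
    """Validate an ISSN's length, characters and check digit."""
    s = issn.replace("-", "").upper()
    if len(s) != 8 or not s[:7].isdigit() or not (s[7].isdigit() or s[7] == "X"):
        return False
    # Weight the first seven digits 8, 7, ..., 2.
    total = sum(int(digit) * weight for digit, weight in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return s[7] == ("X" if check == 10 else str(check))

for candidate in ("1041-5653", "01604120", "02698889", "1041-5654"):
    print(candidate, is_valid_issn(candidate))
```

A valid check digit only shows that the ISSN is well formed. An ISSN that is well formed but belongs to a different journal still passes, so this does not catch the semantic errors described earlier.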
Observations: character encoding
Character encoding:
Use UTF-8
Specify character encoding this way in OpenURL 1.0: info:ofi/enc:UTF-8
Source: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Character+Encodings
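A resolver or quality checker can test whether a percent-encoded value is valid UTF-8. In this sketch, the log value `N%B0%2054` fails because `%B0` is a lone Latin-1 byte, while the same text encoded as UTF-8 passes. The function name is illustrative.

```python
from urllib.parse import quote, unquote_to_bytes

def is_utf8_encoded(raw_value):
    """True if the percent-decoded bytes form valid UTF-8."""
    try:
        unquote_to_bytes(raw_value).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

from_log = "N%B0%2054"           # as seen in the log sample
as_utf8 = quote("N\u00b0 54")    # the same text, percent-encoded as UTF-8

print(from_log, is_utf8_encoded(from_log))
print(as_utf8, is_utf8_encoded(as_utf8))
```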
Observations: Missing WorldCat numbers
Including OCLC WorldCat numbers would help to resolve title level
ambiguities, especially when the request is routed to InterLibrary Loan
Data from title matching in WorldCat
17 titles without an ISSN
To do this means moving from OpenURL 0.1 to 1.0 format
info:ofi/nam:info:oclcnum:
Source: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Namespaces
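A rough sketch of what the move to 1.0 could look like for a journal citation follows. It assumes the journal metadata format, maps each 0.1 key to the same name under the `rft.` prefix (the current guidelines prefer `rft.jtitle` over `rft.title` for journal titles), and carries the OCLC number as an `rft_id` in the form `info:oclcnum/<number>`. The function name, the input record and the OCLC number are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def to_openurl_10(base_url, metadata_01, oclc_number=None):
    """Build a version 1.0 KEV OpenURL from version 0.1 style metadata."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("url_ctx_fmt", "info:ofi/fmt:kev:mtx:ctx"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # assumes a journal item
    ]
    for key, value in metadata_01.items():
        if key == "sid":
            # The 0.1 source identifier becomes the Referrer identifier.
            params.append(("rfr_id", "info:sid/" + value))
        else:
            params.append(("rft." + key, value))
    if oclc_number:
        params.append(("rft_id", "info:oclcnum/" + str(oclc_number)))
    return base_url + "?" + urlencode(params)

record = {"genre": "article", "title": "Example Journal", "volume": "7",
          "issue": "1", "sid": "example:source"}   # hypothetical values
url = to_openurl_10("http://linkresolver.library.cornell.edu:4550/resserv",
                    record, oclc_number="12345678")  # hypothetical number
print(url)
```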
Analysis of L'Année philologique titles in the log sample that are
held in WorldCat libraries
Total titles analyzed: 81
Total confirmed held by Cornell in WorldCat: 53 (margin of error)
Unconfirmed in or out of WorldCat: 6
Median number of libraries that hold these titles: 67
Thus, even if the metadata were perfect, finding the title through
ILL, especially without an identifier (ISSN, ISBN, WorldCat number), is
expensive.
Caveat: Not all of a library’s holdings are in WorldCat, especially
journals.
The scale of the OpenURL quality problem
Cornell link resolver activity, December 3, 2007 – February 8,
2008: 53,062 OpenURLs were sent to the link resolver.
Discussion
Notes and links
http://library4.library.cornell.edu/openurl/index.html
How many openurls came into Cornell dec – feb?
http://www.language-archives.org/index.html
http://www.niso.org/standards/standard_detail.cfm?std_id=783
http://erms.library.cornell.edu/webbridge/edit
beforelinks.html