Dia 1 - Institute for the Study of Labor

Persistent identifiers – an Overview
Juha Hakala
The National Library of Finland
2011-02-01
Traditional identifiers
• Traditional (bibliographic) identifiers are systems like
ISBN (International Standard Book Number) which
provide unique and persistent identification for
certain types of resources (books, serials, etc.)
• They were designed for printed resources before the
Internet was invented, so the fit with digital
resources and the Web may be a forced one
• These identifiers are well established international
standards with relatively clear roles
• It is not always clear how to apply them to e-resources, except
that the identified resources themselves should be persistent
Persistent identifiers (PIDs)
• A new category of identifiers which are actionable on
the Internet, that is, they enable persistent linking
(resolution) to the resource or a surrogate such as a
bibliographic description of the resource
• Most PIDs are also “traditional” identifiers
• When using a DOI, one can identify a book either with a DOI that
embeds an ISBN or with a DOI that uses a local ID string
• URN is the only exception to this; URNs must
include a traditional identifier
• URN namespaces inherit the rules of the traditional identifier
used; there is no need to discuss the scope of the URN itself
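As an illustration of a PID that embeds a traditional identifier, the following is a minimal Python sketch that checks an ISBN-13 check digit and wraps the number in the `urn:isbn:` namespace. The function names are hypothetical, the sample ISBN is used purely for illustration, and dropping the hyphens is a simplification made here rather than a rule from the slides.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def isbn_to_urn(isbn: str) -> str:
    """Wrap a valid ISBN-13 in the urn:isbn: namespace (hyphens dropped)."""
    if not is_valid_isbn13(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return "urn:isbn:" + "".join(c for c in isbn if c.isdigit())


print(isbn_to_urn("978-0-306-40615-7"))  # → urn:isbn:9780306406157
```

Because the URN is derived mechanically from the ISBN, the assignment rules of the ISBN carry over unchanged, which is the point made above.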
Traditional versus persistent identifiers
• Assigning a traditional identifier such as ISBN is
(should be?) a controlled process with precise rules
• What is identified, by whom
• Assigning a PID such as ARK may or may not be a
controlled process and the rules of application may
be vague
• Sometimes the rules are different:
• A book must have just one ISBN, but it may have two PIDs (for
instance, ARK and DOI)
• The National Library of Finland uses Handles in its DSpace system, but URN
is the “official” identifier of these resources
Recommendations
• Conflicts between the two identifier groups should be
avoided at all costs
• If a traditional identifier can be assigned to the resource, use that
identifier as a part of the PID
• It follows that PIDs that cannot (easily) incorporate traditional
identifiers may cause problems
• Any identifier (traditional / PID) should have explicit
implementation guidelines
• If no general guidelines exist, rules must be developed locally;
such rules should eventually be aligned at the level of the PID
community
Persistent identifiers and the Web: Cool URIs
• From the library point of view, cool URIs (URLs) are
not proper identifiers at all
• The same resource may be available from many URLs
• Over time, different resources or variant versions of the same
resource may be available at the same URI
• There is absolutely no control over cool URI assignment
• A user cannot know if a URI is cool or not (most of them aren’t)
• Instead, cool URIs are just shelf marks
• What is a realistic time frame for cool URI persistence?
• Cool URIs can support only resolution; persistent
identifiers can be more versatile in this respect
• Match with the current / future long term preservation systems
Services provided by PIDs
• Basic question: what services do we need?
• Some examples:
• Find all locations (URLs) related to the PID
• Find bibliographic metadata related to the PID
• Retrieve the preservation commitment of the owning
organization (concerning the resource at hand)
• There is no overall framework / context within
which to design the resolution services
• Each PID provides a slightly different set
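The example services above could be sketched as a single resolver that answers different questions about the same PID. This is only an in-memory illustration under stated assumptions: `PidRecord`, `Resolver`, the service names and the `urn:example:` identifier are all hypothetical and do not correspond to the API of any existing PID system.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    """Everything this hypothetical resolver knows about one PID."""
    locations: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    preservation_commitment: str = "unknown"


class Resolver:
    def __init__(self) -> None:
        self._records: dict[str, PidRecord] = {}

    def register(self, pid: str, record: PidRecord) -> None:
        self._records[pid] = record

    def resolve(self, pid: str, service: str = "locations"):
        """Return the answer of one named service for the given PID."""
        record = self._records[pid]  # raises KeyError for an unknown PID
        services = {
            "locations": record.locations,
            "metadata": record.metadata,
            "commitment": record.preservation_commitment,
        }
        if service not in services:
            raise ValueError(f"unsupported service: {service}")
        return services[service]


resolver = Resolver()
resolver.register(
    "urn:example:report-1",
    PidRecord(
        locations=["https://example.org/a.pdf", "https://mirror.example.org/a.pdf"],
        metadata={"title": "Sample report"},
        preservation_commitment="permanent",
    ),
)
print(resolver.resolve("urn:example:report-1", "locations"))
```

A plain URL would support only the first of these services; the other two require a record kept about the identifier itself.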
PID-based services in the future
• Theoretical basis could be twofold:
• Functional Requirements for Bibliographic Records (FRBR)
model: work, expression, manifestation
• Current theory and practice of long-term preservation based on
the migration strategy (and a long tail of manifestations for each
work)
• This means it must be possible for instance to:
• Find all works related to the work at hand
• Find all expressions related to the work at hand
• Find all manifestations of the work at hand
• Find out differences between these manifestations
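The four queries listed above can be sketched against a small work / expression / manifestation hierarchy. This is a simplified illustration, not the FRBR model itself: the class fields, the `differences` helper and the `urn:example:` PIDs are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    pid: str
    file_format: str
    size_bytes: int


@dataclass
class Expression:
    pid: str
    language: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    pid: str
    title: str
    expressions: list[Expression] = field(default_factory=list)
    related_works: list[str] = field(default_factory=list)  # PIDs of other works


def manifestations_of(work: Work) -> list[Manifestation]:
    """Collect the manifestations of every expression of the work."""
    return [m for e in work.expressions for m in e.manifestations]


def differences(a: Manifestation, b: Manifestation) -> dict[str, tuple]:
    """Return the fields (other than the PID) on which two manifestations differ."""
    diffs = {}
    for name in ("file_format", "size_bytes"):
        if getattr(a, name) != getattr(b, name):
            diffs[name] = (getattr(a, name), getattr(b, name))
    return diffs


pdf = Manifestation("urn:example:m1", "PDF", 120_000)
epub = Manifestation("urn:example:m2", "EPUB", 95_000)
work = Work(
    "urn:example:w1",
    "Sample work",
    expressions=[Expression("urn:example:e1", "en", [pdf, epub])],
    related_works=["urn:example:w2"],
)
print([m.pid for m in manifestations_of(work)])
print(differences(pdf, epub))
```

Related works and expressions are reachable directly from the work record, so the first two queries reduce to reading `work.related_works` and `work.expressions`.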
PID-based services in the future (2)
• It should also be possible to
• Find out who is preserving the resource
• Retrieve the rights metadata related to the resource
• Retrieve the preservation metadata related to the
resource
• Retrieve the most original version (the oldest
preserved manifestation) of the resource
• Retrieve the latest (and supposedly the easiest to use)
manifestation of the resource
• …
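Under a migration strategy each work accumulates a chain of manifestations, so the preservation queries above amount to selecting from that chain. The following minimal Python sketch assumes each manifestation record carries a creation date, a custodian and a rights statement; the record layout, the sample values and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PreservedManifestation:
    pid: str
    created: date
    file_format: str
    custodian: str
    rights: str


def oldest(chain: list[PreservedManifestation]) -> PreservedManifestation:
    """The most original preserved manifestation."""
    return min(chain, key=lambda m: m.created)


def latest(chain: list[PreservedManifestation]) -> PreservedManifestation:
    """The most recent (and supposedly easiest to use) manifestation."""
    return max(chain, key=lambda m: m.created)


def custodians(chain: list[PreservedManifestation]) -> list[str]:
    """Who is preserving the resource, without duplicates."""
    return sorted({m.custodian for m in chain})


chain = [
    PreservedManifestation("urn:example:m1", date(1998, 5, 1), "WordPerfect", "Library A", "in copyright"),
    PreservedManifestation("urn:example:m2", date(2005, 3, 9), "PDF", "Library A", "in copyright"),
    PreservedManifestation("urn:example:m3", date(2010, 11, 2), "PDF/A", "Archive B", "in copyright"),
]
print(oldest(chain).pid, latest(chain).pid, custodians(chain))
```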
Example: qualitative social scientific data set
• The work itself should be described; one metadata
element should be the PID
• Expressions (translations to other languages) should have their
own PIDs, linked to the work level record
• There may be multiple manifestations (relational database, Excel
table, etc.) of each expression; each one should have its own
PID, and there should be links to the work / expressions
• In this environment, it would make sense to provide
links to the work, and let the users choose the
most appropriate manifestation
• Choice of the language, file format, etc.
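The choice described above, where the user follows a work-level link and then picks a manifestation, could look roughly like the sketch below. The fallback to any manifestation in the requested language is an assumption made for illustration, as are the `Candidate` type, the `choose` function and the `urn:example:` PIDs.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    pid: str          # PID of the manifestation
    language: str     # language of the expression it belongs to
    file_format: str


def choose(candidates: list[Candidate], language: str,
           file_format: str) -> Optional[Candidate]:
    """Prefer an exact language and format match, else any match on language."""
    for c in candidates:
        if c.language == language and c.file_format == file_format:
            return c
    for c in candidates:
        if c.language == language:
            return c
    return None


# Manifestations reachable from one work-level PID of a data set.
dataset = [
    Candidate("urn:example:fi-sql", "fi", "SQL dump"),
    Candidate("urn:example:fi-xlsx", "fi", "Excel"),
    Candidate("urn:example:en-xlsx", "en", "Excel"),
]
print(choose(dataset, "en", "Excel"))
```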
Recommendations (2)
• Services supported by PID systems need a face lift
• Many systems were designed 10+ years ago, when digital object
management systems were still in their infancy
• Upgrades must be done in a non-destructive manner (existing
implementations must be compliant with the new version)
• All aspects of PID systems should be standardized
• Some PIDs (e.g. ARK and PURL) have never reached standard
status, and at best only one part of the system (identifier syntax)
has been published as a standard
• More (and better) open source implementations are
needed
Conclusion
• There will be multiple PIDs in existence in the future
(just like there are now)
• Once a system has been chosen, you cannot give it up
• PID supporters and cool URI proponents will most
likely continue talking past one another for quite
some time, but:
• Given the time frame over which national libraries & archives must
preserve resources (centuries) and the technical complexity of
this task, cool URIs fall short of the requirements in several ways;
instead, PIDs must be used
• PID systems are to some extent “work in progress”