DOI System: Syntax


DOI SYSTEM: SYNTAX

Workshop on the DOI System

International DOI Foundation

Outline / Key concepts in this section

• Terminology
• Format
• Assignment and uniqueness
• Scope of the DOI System
• Relation to other identifier schemes
• Directory management
• The uses of prefixes for management
• Administrative granularity

Further reading on key concepts in this section

DOI Handbook Chapter 2, “Numbering” http://www.doi.org/handbook_2000/enumeration.html

DOI name

DOI name: the string that specifies a unique object (the referent) within the DOI System. Names may consist of alphanumeric characters in a sequence prescribed by the DOI syntax.

• The terms “identifier” and “number” are sometimes but not always used in the same sense and are to be avoided where ambiguity might arise.
• The unqualified use of “DOI” alone may also be ambiguous: the term should instead always be used in conjunction with a specific noun (DOI name, DOI system, etc).

DOI syntax

• A DOI name consists of a prefix and a suffix, e.g. 10.1223/4567
• DOI names are case insensitive
  – 10.123/ABC is identical to 10.123/AbC
  – This is a deliberate choice: see DOI Handbook 2.4
• Prefixes and suffixes use ASCII characters (letters and numbers)
  – In principle they can use any printable characters from the Universal Character Set (UCS-2) of ISO/IEC 10646, which is the character set defined by Unicode v2.0: it encompasses most characters used in every major language written today.
  – However, because of specific uses made of certain characters by some Internet technologies (these vary by browser!), it is recommended to keep to simple characters (A-Z, 0-9)
  – Note encoding requirements when a DOI name is used with HTML, URLs, and HTTP (special care with % “ # and [space], and use of pointed brackets < > in XML etc)
  – http://www.doi.org/handbook_2000/appendix_1.html#A1-E
• Prefixes are allocated to DOI name assigners; assigners then add the suffix. RAs oversee the process to ensure no duplication etc.
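The syntax rules above can be sketched in a few lines of Python. This is only an illustration, not an IDF tool: the function names are made up for this example, the split at the first "/" and the ASCII-only case folding are assumptions, and the encoding step simply uses the standard library's percent-encoding.

```python
from urllib.parse import quote


def split_doi_name(doi_name: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first "/" (assumed rule)."""
    prefix, sep, suffix = doi_name.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix pair: {doi_name!r}")
    return prefix, suffix


def ascii_fold(text: str) -> str:
    """Lower-case ASCII letters only; other characters are left untouched."""
    return "".join(c.lower() if c.isascii() else c for c in text)


def same_doi_name(a: str, b: str) -> bool:
    """DOI names are case insensitive, so compare their folded forms."""
    return ascii_fold(a) == ascii_fold(b)


def encode_for_url(doi_name: str) -> str:
    """Percent-encode characters such as % " # space < >, keeping "/"."""
    return quote(doi_name, safe="/")


print(split_doi_name("10.1223/4567"))
print(same_doi_name("10.123/ABC", "10.123/AbC"))
print(encode_for_url("10.1234/a b#c"))
```

Keeping to A-Z and 0-9 in the suffix means the encoding step leaves the name unchanged, which is the practical reason for the recommendation.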

DOI syntax: prefix

• Prefix always begins 10 (by convention)
  – In practice, 10 is the Handle system prefix allocated to the IDF
  – If it doesn’t begin 10, it’s not a DOI name (but it may be a Handle)
• Prefix may be any length, but currently using four digits, e.g. 10.1234/456-mydoc-456584893489
• Prefix may be further subdivided, e.g. 10.1234.456.7/4851234
  – Current DOI System practice is not to do so unless there is a specific requirement
  – Such subdivisions are peers (10.123 is the same level as 10.123.456), but can be specifically configured to be a hierarchy
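A minimal sketch of the prefix rules, assuming that the elements of a prefix are separated by dots and that at least one element follows the leading 10. The function names are hypothetical.

```python
def parse_prefix(prefix: str) -> list[str]:
    """Return the dot-separated elements of a DOI prefix, e.g. ["10", "1234"]."""
    elements = prefix.split(".")
    # Assumption: a usable prefix is "10" plus at least one non-empty element.
    if elements[0] != "10" or len(elements) < 2 or not all(elements):
        raise ValueError(f"not a DOI prefix: {prefix!r}")
    return elements


def is_doi_prefix(prefix: str) -> bool:
    """True if the string begins with 10 and is well formed under the rule above."""
    try:
        parse_prefix(prefix)
    except ValueError:
        return False
    return True


print(parse_prefix("10.1234"))        # a four-digit prefix
print(parse_prefix("10.1234.456.7"))  # a subdivided prefix
print(is_doi_prefix("20.1000"))       # may be a Handle, but not a DOI name
```

The parser returns a flat list on purpose: subdivisions are peers unless a hierarchy has been specifically configured, so nothing here treats 10.1234.456 as a child of 10.1234.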

DOI syntax: suffix

• Suffix may be any length.
• Suffix may incorporate another identifier numbering scheme (or may be new):
  – e.g. 10.1234/ISBN 0-7894-7764-5
  – the DOI System treats all DOI names as “dumb strings”
  – take care if the other identifier contains special characters (e.g. the SICI < >)
• If not using another identifier, then the assigner needs to devise some way of allocating numbers.
• Using DOI names may obviate the need to adopt or create a new scheme: e.g. in CrossRef:
  – Publisher A uses PII: S1384107697000225
  – Publisher B uses SICI: 0361-9230(1997)42:2.0.TX;2-B
  – Publisher C uses his own numbers: JoesPaper56
  – These three schemes are not at all interoperable, but become so in the DOI System as:
    doi:10.2345/S1384107697000225
    doi:10.4567/0361-9230(1997)42:2.0.TX;2-B
    doi:10.6789/JoesPaper56
• A particular Registration Agency may (and probably should) determine some specific rules or recommendations for its own DOI name registrants and applications.
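The CrossRef example can be sketched as follows. The helper name is hypothetical; it only shows that the existing identifier is carried unchanged as the suffix and that the resulting names are all handled the same way, whatever scheme the suffix came from.

```python
def make_doi_name(prefix: str, local_id: str) -> str:
    """Join an allocated prefix and an assigner-chosen suffix with "/"."""
    if not prefix.startswith("10."):
        raise ValueError(f"not a DOI prefix: {prefix!r}")
    if not local_id:
        raise ValueError("suffix must not be empty")
    # The suffix is treated as a "dumb string": it is not parsed or altered.
    return f"{prefix}/{local_id}"


# Prefixes and identifiers taken from the example above.
publishers = [
    ("10.2345", "S1384107697000225"),             # Publisher A: PII
    ("10.4567", "0361-9230(1997)42:2.0.TX;2-B"),  # Publisher B: SICI
    ("10.6789", "JoesPaper56"),                   # Publisher C: own numbers
]

for prefix, local_id in publishers:
    print("doi:" + make_doi_name(prefix, local_id))
```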

Visual presentation of DOI name

• When displayed on screen or in print, a DOI name is preceded by a lowercase "doi:" unless the context clearly indicates that a DOI name is implied.
  – EXAMPLE: the DOI name 10.1006/jmbi.1998.2354 is displayed as doi:10.1006/jmbi.1998.2354.
• The use of lowercase string “doi” follows the specification for representation as a URI; http://www.ietf.org/rfc/rfc2396.txt (as for e.g. "ftp:" and "http:").
• When displayed in web browsers the DOI name itself may be attached to the address for an appropriate proxy server, to enable resolution of the DOI name via a standard web hyperlink.
  – EXAMPLE: the DOI name 10.1006/jmbi.1998.2354 could be made an active link as http://dx.doi.org/10.1006/jmbi.1998.2354.
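The two presentations above can be produced mechanically. A small sketch, assuming the proxy address from the example and using percent-encoding for any characters that are unsafe in a URL; the function names are illustrative only.

```python
from urllib.parse import quote

PROXY = "http://dx.doi.org/"  # proxy address taken from the example above


def display_form(doi_name: str) -> str:
    """Form for screen or print: the DOI name preceded by lowercase "doi:"."""
    return "doi:" + doi_name


def proxy_link(doi_name: str) -> str:
    """Form for web browsers: the DOI name attached to the proxy address."""
    return PROXY + quote(doi_name, safe="/")


print(display_form("10.1006/jmbi.1998.2354"))
print(proxy_link("10.1006/jmbi.1998.2354"))
```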

Scope of the DOI System

• Digital Object Identifier = Digital [Object Identifier]
  – not [Digital Object] Identifier
• “The DOI® System provides an infrastructure for persistent unique identification of entities ... A DOI name is permanently assigned to an object, to provide a persistent link to current information about that object, including where the object, or information about it, can be found on the internet”.
• Because entities of interest may be physical, digital, or abstract.
  – e.g. CrossRef assigns DOI name to “article” irrespective of format
• Handle: Digital Object Architecture
  – Not a conflict: Any entity can be abstracted into a representation as a digital object

Scope of the DOI System

• A DOI name may be assigned to any object of any form whenever there is a functional need to distinguish it as a separate entity.
• Registration Agencies may specify more constrained rules for the assignment of DOI names to objects for DOI-related services.
• “The principal focus of assignment shall be to content-related entities exemplified by, but not limited to: text documents; data sets; sound carriers; books; photographs; serials; audio, video and audiovisual recordings; software; abstract works; artwork, etc., and related entities in their management, e.g. licences, parties”.

Uniqueness

• Each DOI name can specify one and only one referent in the DOI System.
  – A role of Registration Agencies is to provide a service to registrants which facilitates this.
  – However, the DOI System will not accept duplicate prefix+suffix and makes internal checks for uniqueness at the time of registration.
• A referent may be specified by more than one DOI name, though it’s recommended practice that each referent has only one DOI name.
  – Because it may not always be known that a DOI name already exists
  – Where multiple DOI names are assigned to the same referent, e.g. through assignment of DOI names by two different registration agencies, the IDF encourages registration agencies to collaborate in providing a unifying record for that referent.
• It is good practice never to reissue any unique identifier that has been once issued in error.
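The uniqueness rules can be illustrated with a toy in-memory registry. This is a sketch under stated assumptions, not how the DOI System is implemented: the class and its methods are hypothetical, and referents are represented by plain strings.

```python
class DoiRegistry:
    """Toy registry: one referent per DOI name, possibly many names per referent."""

    def __init__(self) -> None:
        self._referent_by_name: dict[str, str] = {}

    def register(self, doi_name: str, referent: str) -> None:
        key = doi_name.lower()  # DOI names are case insensitive
        if key in self._referent_by_name:
            # Duplicate prefix+suffix is rejected at registration time.
            raise ValueError(f"DOI name already registered: {doi_name}")
        self._referent_by_name[key] = referent

    def referent_of(self, doi_name: str) -> str:
        return self._referent_by_name[doi_name.lower()]

    def names_for(self, referent: str) -> list[str]:
        """All DOI names that specify the given referent (ideally just one)."""
        return sorted(n for n, r in self._referent_by_name.items() if r == referent)


registry = DoiRegistry()
registry.register("10.123/ABC", "article-1")
registry.register("10.456/xyz", "article-1")  # allowed, though not recommended
print(registry.names_for("article-1"))
```

A names_for result with more than one entry is the situation in which registration agencies are encouraged to provide a unifying record.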

Persistence

• No time limit for the existence of a DOI name shall be assumed in any assignment, service or application.
• A DOI name and its referent are unaffected by changes in the rights associated with the referent, or changes in the management responsibility of the referent object.
• The IDF implements rules for transfer of management responsibility between Registration Agencies, requirements on Registration Agencies for maintenance of records, default resolution services, and technical infrastructure resilience.
• The DOI System is not a means of archival preservation of identified entities.
• The DOI System provides a means to continue interoperability through exchange of meaningful information about identified entities and initiated actions between different systems through at minimum persistence of the DOI name and description of the referent.

[Figure: <indecs> diagram relating Party, Creation and Transaction, with the link labels “makes”, “uses”, “do” and “about”, and a marker labelled “Current DOI name uses”.]

DOI names with existing identifiers

• Identifier schemes already exist for many creations
  – ISBN, ISSN, ISRC, etc.
  – New ones: e.g. ISTC (textual abstractions e.g. “Robinson Crusoe by Daniel Defoe”)
• ISO standardisation of DOI System recognises this
• First example – “Bookland DOIs” from ISBNs
  – Name comes from “Bookland” bar codes from ISBNs
• Pilot scheme based on the new syntax of the ISBN-13
  – ISBN: 978-86-123-4567-8
  – DOI name to be: 10.978.86123/45678
• Second example - ISSN:
• Defined syntax for ISSNs in DOI names:
  – doi:10.5555/issnl.1234-5678 (linking ISSN: all media versions)
  – doi:10.5555/issn.1234-5678 (ISSN: specific media version)
• NB: Relevant information as to the identity of the referent is included in the metadata associated with the DOI name string.
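Both mappings can be written down as a short sketch. The ISBN rule below is inferred from the single pilot example above (prefix 10. + EAN prefix + "." + group and registrant elements; suffix = publication element + check digit) and is an assumption rather than a published specification; it needs a hyphenated ISBN-13 because the element boundaries cannot be seen otherwise. The function names are hypothetical, and 10.5555 is the example prefix used above.

```python
def bookland_doi(isbn13: str) -> str:
    """Build a "Bookland" DOI name from a hyphenated ISBN-13 (inferred rule)."""
    parts = isbn13.split("-")
    if len(parts) != 5 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a hyphenated ISBN-13: {isbn13!r}")
    ean_prefix, group, registrant, publication, check = parts
    return f"10.{ean_prefix}.{group}{registrant}/{publication}{check}"


def issn_doi(issn: str, linking: bool = False, prefix: str = "10.5555") -> str:
    """Build an ISSN-based DOI name: "issnl." for the linking ISSN, else "issn."."""
    label = "issnl" if linking else "issn"
    return f"{prefix}/{label}.{issn}"


print(bookland_doi("978-86-123-4567-8"))
print(issn_doi("1234-5678", linking=True))
print(issn_doi("1234-5678"))
```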

DOI names with existing identifiers

General case

• ISO standardisation of DOI System
  – “A DOI name is not intended as a replacement for other identifier schemes, but when used with them may enhance the identification functionality provided by those systems with additional functionality…”
• Incorporate the other identifier into the DOI syntax
  and/or
• Record the other identifier in the DOI metadata.
• Each scheme retains its autonomy but works together

DOI names for entities other than “creations”

• Parties
  – Authors: for disambiguation etc
  – Institutions: for licensing transactions, etc.
  – ISNI: International Standard Name Identifier (was: ISPI)
    • Based on InterParty “PIDI = Public identity identifier”
  – ITU Identity management Focus group
    • Any end point in the network (machines, users)
• Licences
  – ONIX for licensing work (with NISO/ERMI)
    • Electronic Resource Management Initiative
  – Contextual identification

Granularity

• Granularity: the extent to which a collection of information has been subdivided for purposes of identification (e.g. a collection; a book; tables and figures)
  – Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished
• Your functional granularity may not be my functional granularity:
  – A wants to distinguish “this book in any format”, but B wants to distinguish “the pdf version” from “the html version”, etc ….
• “It is a fundamental of almost any statistic that, to produce it, something, somewhere has been defined and identified. Never underestimate how much nuisance that small practical detail can cause. First, it has to be agreed what to count…. In maths numbers seem hard, pristine and bright, neatly defined around the edges. In life, we do better to think of something murkier and softer”
  – “The Tiger That Isn’t: Seeing Through a World of Numbers” (2007) Blastland & Dilnot
• You must know (say) PRECISELY WHAT is being identified

Granularity

• A DOI name may be assigned to any entity, regardless of the extent to which it may be a component part of some larger entity. DOI names may be assigned at arbitrary levels of granularity or abstraction.
• EXAMPLE: separate DOI names may be assigned to:
  – a novel as an abstract work;
  – a specific edition of that novel;
  – a specific chapter within that edition of the novel;
  – a single paragraph;
  – a specific image or quotation;
  – each specific manifestation in which any of those entities are published or otherwise made available,
  – “or any other level of granularity which a registrant deems to be appropriate”
• Assignment of a DOI name shall require the Registrant to record metadata describing the entity to which the DOI name is being assigned. The metadata shall describe the entity to the degree that is necessary to distinguish it as a separate entity within the DOI System. In certain cases (which shall be defined in the User Manual) it shall be allowable for no metadata declaration to be made.

Specifying what is identified

[Diagram: an “intangible Work” and the things that manifest it]

• Manuscript (MS): mss #ABC123
• Paper: journal/volume/page (Vol/page; ISBN; SICI, etc)
• Web page: URL
• Two things in one: physical manifestation of intangible work (which is identified?)
• Versions – separately identified?
• “work” used in analytical sense, not copyright sense

What are we identifying?

Document on screen:

• Abstract work?
• Manifestation of abstract work?
• Version?
• This HTML file?
• All/some of these?

Does it matter?

Yes, it can do. e.g.:

1. Practical use of data. Example – journal article
  – For the purpose of citation:
    • Count pdf, print, html as same
    • Citation refers to the abstract work (hence ISI, CrossRef)
  – For the purpose of purchase:
    • Count pdf, print, html as different
    • Purchase refers to the manifestation
  – Suppose I encounter a purchase system and try to use it for counting citations….
  – Can I rely on a system now if I don’t know what is being identified? Can others rely on the system long after I’m gone?
2. Legal implications: copyright “My A is the same as your B and is my copyright…”
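Point 1 can be made concrete with a small sketch. All DOI names and the mapping from manifestations to works below are invented for illustration; the point is only that the same events give different counts depending on what is taken to be identified.

```python
from collections import Counter

# Hypothetical metadata: each identified manifestation points at the
# abstract work it embodies.
WORK_OF = {
    "10.9999/article1.pdf": "10.9999/article1",
    "10.9999/article1.html": "10.9999/article1",
    "10.9999/article1.print": "10.9999/article1",
}


def count_by_manifestation(events: list[str]) -> Counter:
    """Purchase view: pdf, print and html are counted as different things."""
    return Counter(events)


def count_by_work(events: list[str]) -> Counter:
    """Citation view: pdf, print and html are counted as the same work."""
    return Counter(WORK_OF.get(name, name) for name in events)


events = ["10.9999/article1.pdf", "10.9999/article1.html", "10.9999/article1.pdf"]
print(count_by_manifestation(events))
print(count_by_work(events))
```

Using the purchase view to count citations would split one work's citations across its formats, which is the failure described above.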

The <indecs> framework

Principles:

• Unique Identification: every entity should be uniquely identified within an identified namespace.
• Functional Granularity: it should be possible to identify an entity whenever it needs to be distinguished
• Designated Authority: the author of an item of metadata should be securely identified.
• Appropriate Access: everyone requires access to the metadata on which they depend, and privacy and confidentiality for their own metadata from those who are not dependent on it.
• Definition of metadata (description): An item of metadata is a relationship that someone claims to exist between two referents

More on this: see “data model”

First class naming

• Many of the items we manage should be treated as “First-class objects”.
• First class = having an identity independent of any other item.
  – A key concept of Digital object architecture (e.g. Handles)

[Figure labels: www.acme.com/document456; Document456; Vanity Fair; Penguin Classics: Vanity Fair; ISBN-13: 978-0-141-43983-9; www.newco]

[Figure labels: www.acme.com/doc456; doc456; First class name; www.acme.com]

The use of prefixes

• A DOI name consists of a prefix and a suffix, e.g. 10.1223/4567
• A prefix can have unlimited suffixes
• So in theory, only one prefix is needed?
• Could a set of DOI names ever need to be managed differently – e.g. separated across DOI RAs, or different mirror servers, etc?
• CrossRef example: Prefix allocated to a publisher (imprint), not a journal
• Would it be better to have a separate prefix for each journal?
• Journals can move publisher.
• Easy to manage one prefix on an everyday basis (ISBN, etc)
  – Management of a whole customer’s DOI name set by one prefix
• But easiest to group DOI names by separate prefixes if you need to change them…
• A trade-off

Administrative granularity

• Who will need to administer the prefix?
• IDF Directory Manager
• RA manager
• Individual customer of RA (e.g. a publisher)
• Individual manager within a publisher (e.g. production manager)

• Prefixes can have a defined administrator
  – Similarly, URLs rely on one site administrator
• But also: DOI names can have any level of administrative granularity
• Every single DOI name could have a different manager!
• Handle System has various levels of administrator, and keys
• A choice which must depend on each application’s requirements

Handles resolve to typed data

Handle        Data type   Index   Handle data
10.123/456    URL         1       http://acme.com/….
              URL         2       http://a-books.com/….
              DLS         9       acme/repository
              HS_ADMIN    100     acme.admin/jsmith
              XYZ         12      1001110011110

Rules for data type construction: www.handle.net/overviews/types.html
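The typed-data record above can be modelled as a small data structure. This is a sketch only: it does not use any Handle System client library, the class and function names are hypothetical, and the values (including the elided URLs) are copied from the slide as they appear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    index: int      # position of the value within the handle record
    data_type: str  # e.g. URL, DLS, HS_ADMIN
    data: str       # the typed data itself


# One handle resolving to several typed values, as in the table above.
RECORDS = {
    "10.123/456": [
        HandleValue(1, "URL", "http://acme.com/…."),
        HandleValue(2, "URL", "http://a-books.com/…."),
        HandleValue(9, "DLS", "acme/repository"),
        HandleValue(100, "HS_ADMIN", "acme.admin/jsmith"),
        HandleValue(12, "XYZ", "1001110011110"),
    ]
}


def values_of_type(handle: str, data_type: str) -> list[HandleValue]:
    """All values of one data type for a handle, e.g. every URL."""
    return [v for v in RECORDS[handle] if v.data_type == data_type]


def value_at_index(handle: str, index: int) -> HandleValue:
    """The single value stored at a given index of a handle record."""
    for value in RECORDS[handle]:
        if value.index == index:
            return value
    raise KeyError(f"{handle} has no value at index {index}")


print([v.data for v in values_of_type("10.123/456", "URL")])
print(value_at_index("10.123/456", 100).data)
```

Asking by type or by index is what lets one name serve several purposes: a browser asks for URL values, while an administrative tool looks at HS_ADMIN.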
