… part of shared task to ensure ease and continuity of access
PEPRS: Recording The Extent Preserved
Peter Burnhill
EDINA, University of Edinburgh
with sincere thanks to Regina Reynolds
ALA Holdings Forum, New Orleans, 25th June 2011
“Universal and repurposed holdings information - emerging initiatives and projects”
4:00/5:30pm MCC Room 355
This presentation is in 3 parts
1. Why the interest in the ‘holdings statement’
   – ‘experiential knowledge’ from union catalogues
   – Moving from human-readable to computational
2. An introduction to PEPRS and peprs.org
   – What is available now or ‘real soon now’
   – What is unresolved but important and needs doing
3. Focus on record of extent preserved
   – Extent issued; extent held on shelf or digitally secured
(but first a little bit of ‘institutional’ background to start)
Brief introductions
1. EDINA
   – UK national academic data centre – http://edina.ac.uk
   – Designated and funded by JISC – http://www.jisc.ac.uk/
     * The agency for innovative use of digital technology for UK research and education
   – Based at University of Edinburgh
     * Research-led University, with Library founded in 1580
2. ISSN International Centre
   – Directs and coordinates the ISSN Network of 88 national ISSN Centres
   – Based in Paris, France
What is PEPRS?
• JISC-funded project
• led by EDINA & ISSN IC
• to provide an online registry on what e-journals are being preserved
  – who is doing this and how
  – and the extent of content preserved
• a registry of keepers of (e-)journal content
Experience and implications (1): Union catalogues
1. SALSER (union catalogue of serials in Scotland, est. 1994)
   – http://edina.ac.uk/salser/
   – all life is there
   – no de-duplication at the title level, nor at the holdings level
   – Holdings statements once described as “highly variable and mostly poor”
2. SUNCAT, the UK union catalogue of serials
• 80 largest research & university libraries
  – inc British Library, Cambridge, Oxford, Edinburgh, Glasgow
  – 3.5m ‘library records’: over 4.7m ‘item holding records’
  + 2.8m ‘titles’ in CONSER, ISSN & DOAJ databases
• FRBR-like matching to provide search at title-level
  – http://www.suncat.ac.uk/
• No comparison of information at holdings level
  – change in local holdings statement is biggest cause of updating
• Helping UK Research Reserve discover ‘candidate titles’ for print archiving
  – UKRR plans to keep minimum of 3 copies
    * OPAC holdings statements not reliable enough for disposal decisions
Importance of knowing what was & was not issued
• Always been a problem for librarians who need to claim back for what does not arrive
• Now a problem for the ‘preservers’!
• Exploring data flow on ‘issues’ into SUNCAT: to help librarians know what had not been issued(!)
  – ONIX for Serials and serials holdings format
Experience and implications (2): access to articles
The article has always been the ‘information object of desire’. Now with an established digital world (but not a ‘digital only’ world), the focus is on ‘entitlement’ & ‘access’ - not ‘holdings’
Assisting access to articles online remotely
1. A&I and machine-to-machine access
   – linking via OpenURL to articles online
2. Institutions arrange licence & remote access to publishers’ content via ERM (not the OPAC)
3. Recent focus on role of ERM, and union catalogues, to record ‘entitlement’ in event of cancellation
4. Renewed attention on ‘digital shelf for back copy’
   – for assurance of continuity of access
Scholarly Communication
(Retaining focus on formal (£) economy for licensed online access to article-length work published in journals – but conscious of the ‘open’)
[Diagram. Actors: Publisher, Library (serial), Reader (article), Serials managers. Objects: article, serial, issue. Metadata: ISSN & other metadata; ‘Holdings’ metadata; DOI & other metadata. Services: union catalogue, OPAC, A&I, OpenURL Resolver. Reader actions: ‘discover’, ‘locate/access’, ‘request’. Controls: Licence = authorisation; authentication (Shibboleth).]
P.Burnhill, EDINA/JISC, 2005 (updated 2011)
Is this a case of ‘middle child syndrome’?
an emotional scarring condition with neglect, forgotten dates, and sometimes in bad cases forgetting they even exist.
Middle children are known for ending up with things that are too big for the baby and too small for the oldest.
Holdings statements as the “middle child”
1. In OPACs and union catalogues, holding statements are difficult to understand, often regarded as wrong, and some think them unreformable.
2. The eldest (the journal title information) always takes precedence, but can help a lot if well defined
3. The youngest (the wild article child) is ‘just there’
PEPRS: Piloting an E-journal Preservation Registry Service
Idea of a registry raised in literature, ca. 2003/4, and then again in 2006:
“either .. clarity of public statement by each agency or through a registry by which it would be plain what content was being archived, and therefore what was not.”
(US) CLIR Report, 2006
PEPRS: Development
• Scoping study in 2007 by Rightscom and Loughborough University led on to a JISC-funded Project:
  – Partners: EDINA & ISSN International Centre
    * Phase 1: August 2008 – July 2010, ‘investigate, prototype and build’
    * Phase 2: August 2010 – July 2012, ‘preparing for service & governance’
  – Initially UK in scope, we now judge PEPRS as necessarily international
    * Literature is international – so is ISSN
    * Every nation needs one
    * Growing international support
On the road … and hosted at http://edina.ac.uk/presentations.html
1. JISC Journals Working Group (London, August 2008)
2. ISSN National Directors Meeting (Tunis, September 2008)
3. NASIG, 24th Annual Conference (Asheville NC, USA, 4 June 2009)
4. Library of Chinese Academy of Science (Beijing, 15 September 2009)
5. ISSN National Directors Meeting (Beijing, 17 September 2009)
6. PARSE.Insight Workshop (Darmstadt, Germany, 21 September 2009)
7. Knowledge Exchange Workshop (Edinburgh, October 2009)
8. E-journals are Forever Workshop, JISC/DPC (London, April 2010)
9. IFLA 2010 (Gothenburg, September 2010)
10. RLUK Conference (Edinburgh, 11 November 2010)
11. Columbia Univ. (NYC, 23 November 2010); UKSG (Spring 2011)
12. … ISSN Governing Body (Paris, April 2011)
13. … ARL (Montreal, May 2011) and welcomed invite to ALA, New Orleans
P.Burnhill, F.Pelle, P.Godefroy, F.Guy, M.Macgregor, A.Rusbridge & C.Rees. Piloting an e-journals preservation registry service. Serials 22(1) March 2009. [UK Serials Group]
P.Burnhill. Tracking e-journal preservation: archiving registry service anyone? Against the Grain 21(1) February 2009. pp. 32,34,36
Abstract Data Model: Figure 1 in reference paper in Serials, March 2009
[Diagram: Piloting an E-journals Preservation Registry Service. Labels: SERVICES: user requirements; E-J Preservation Registry Service; E-Journal Preservation Registry; (a) METADATA on extant e-journals; (b) METADATA on preservation action; Data dependency; ISSN Register; Digital Preservation Agencies, e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.]
Information about the archiving organisations
• Wanting to work with those who have ‘archival intent’, i.e., the keepers of content for the long term
• Five pilot participants:
  – British Library
  – CLOCKSS Archive
  – e-Depot [Koninklijke Bibliotheek (KB), Dutch Royal Library]
  – Global LOCKSS Network
  – Portico
* preparing to include more in some kind of self-registration
Participants self-state* the following:
• Overview & background: A short summary of each archiving initiative.
• Ingest & preservation workflow: Steps taken to ingest content & preserve it over time.
• Library access to content: In general terms, the conditions under which a library can access the content archived for each initiative.
• Auditing of content, policies and procedures (both internal and external activities): Steps taken to ensure the ongoing authenticity and accessibility of content and to monitor the development of the approach over time.
• Latest data: With direct link to the archiving agency's holdings information, or to the archiving agency's home page if the holdings information is not available.
[*PEPRS is not an audit*]
Public Beta now live!
after field-testing with archiving organisations [British Library, CLOCKSS, LOCKSS, KB & Portico] + associates
http://peprs.org
A Quick Look
Simple search shows that this journal is being preserved
get same result searching on (either) ISSN
Passing glance at the variation in ‘holdings information’, reflecting what the archiving organisations hold as metadata
* CLOCKSS also archives Springer content; not shown here
What happens when print ISSN is entered?
Note key role of ISSN-L; even if the ‘print ISSN’ is entered, the preservation status of the e-journal is found
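The role of ISSN-L described on this slide can be sketched as a two-step lookup: resolve whichever ISSN was entered to its linking ISSN, then look up preservation status under that key. This is only an illustration; the table names, the agency labels and every ISSN below are hypothetical placeholders, not PEPRS data or the PEPRS implementation.

```python
# Hypothetical lookup tables. The ISSNs are made-up placeholders and the
# agency names are illustrative labels, not real registry content.
ISSN_TO_ISSN_L = {
    "1111-1119": "1111-1119",  # 'print ISSN', here also acting as the linking ISSN
    "2222-2227": "1111-1119",  # 'electronic ISSN' of the same title
}

PRESERVED_BY = {
    "1111-1119": ["Agency A", "Agency B"],  # keyed by ISSN-L
}


def preservation_status(issn: str) -> list:
    """Return the agencies preserving a title, whichever of its ISSNs is given."""
    linking_issn = ISSN_TO_ISSN_L.get(issn)
    if linking_issn is None:
        return []  # ISSN not known to this (toy) register
    return PRESERVED_BY.get(linking_issn, [])


if __name__ == "__main__":
    print(preservation_status("1111-1119"))  # entered as print ISSN
    print(preservation_status("2222-2227"))  # entered as electronic ISSN
```

Keying the preservation data on the linking ISSN is what makes the two searches return the same answer; the design choice is to store status once per title rather than once per medium.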
* COMING SOON *
Allows a library to upload a list of ISSNs to check preservation status.
Being field-tested in the UK and by 2CUL (Columbia & Cornell)
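A list check of this kind might look roughly like the sketch below, which validates each ISSN with the standard ISSN check-digit rule (weights 8 down to 2, modulo 11, with X standing for 10) before looking it up. The function names, the status strings and the set of preserved ISSNs are assumptions made for illustration; the slide does not describe how the PEPRS upload feature works internally.

```python
import re

ISSN_PATTERN = re.compile(r"^([0-9]{4})-([0-9]{3})([0-9X])$")


def normalise(issn: str) -> str:
    """Upper-case an ISSN and put it in NNNN-NNNC form."""
    compact = issn.strip().upper().replace("-", "")
    return compact[:4] + "-" + compact[4:]


def is_valid_issn(issn: str) -> bool:
    """Apply the standard ISSN check-digit test to a normalised ISSN."""
    match = ISSN_PATTERN.match(issn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    remainder = total % 11
    if remainder == 0:
        expected = "0"
    elif remainder == 1:
        expected = "X"  # a check value of 10 is written as X
    else:
        expected = str(11 - remainder)
    return match.group(3) == expected


def check_issn_list(issns, preserved_issns):
    """Map each submitted ISSN to 'preserved', 'not found' or 'invalid ISSN'.

    `preserved_issns` is a hypothetical set standing in for the registry.
    """
    report = {}
    for raw in issns:
        issn = normalise(raw)
        if not is_valid_issn(issn):
            report[raw] = "invalid ISSN"
        elif issn in preserved_issns:
            report[raw] = "preserved"
        else:
            report[raw] = "not found"
    return report


if __name__ == "__main__":
    registry = {"1111-1119"}  # placeholder ISSN, not real registry data
    print(check_issn_list(["1111-1119", "22222227", "1111-1118"], registry))
```

Rejecting malformed ISSNs before the lookup keeps typing errors in a library's list from being reported as "not preserved".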
* COMING SOON *
We are exploring the standards to use for m2m use of the registry service, so PEPRS could be used within union catalogues and other serial services.
Variation in how ‘holdings’ are expressed to PEPRS by the agencies
The volume is often the work unit in archiving, plus whatever metadata there is at hand associated with that unit of effort
* Dates are in the metadata, not in the workflow *
More variation in a list from OUP
Mix of Arabic volume numbers & Roman numerals; dates are derived from metadata
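One way to cope with the mix of Arabic and Roman volume numbers noted here is to normalise both to integers before comparing holdings. The sketch below is an assumption about how that could be done, not a description of PEPRS processing; `volume_number` and `roman_to_int` are hypothetical helper names.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Accepts only well-formed Roman numerals from 1 to 3999.
ROMAN_PATTERN = re.compile(
    r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)


def roman_to_int(text: str) -> int:
    """Convert a well-formed, upper-case Roman numeral to an integer."""
    total = 0
    for current, following in zip(text, text[1:] + " "):
        value = ROMAN_VALUES[current]
        # A smaller value placed before a larger one is subtracted (IV = 4).
        if ROMAN_VALUES.get(following, 0) > value:
            total -= value
        else:
            total += value
    return total


def volume_number(label: str):
    """Return the volume as an int, or None if the label is neither form."""
    cleaned = label.strip().upper()
    if re.fullmatch(r"[0-9]+", cleaned):
        return int(cleaned)
    if cleaned and ROMAN_PATTERN.match(cleaned):
        return roman_to_int(cleaned)
    return None  # e.g. supplements or free-text labels need human review


if __name__ == "__main__":
    for label in ["12", "XIV", "xlii", "Suppl."]:
        print(label, "->", volume_number(label))
```

Returning `None` for anything unrecognised, rather than guessing, leaves irregular labels visible for manual checking.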
note (simple) variation in Publisher information, across the archiving agencies, and ISSN Register
Matters unresolved (1): things in initial project scope
PEPRS-specific
• What users ‘really want to know’ via release of Public Beta
  – about archiving agencies and their preservation policy & practices
  – feedback on functionality; opportunity for social media
• How to be an international registry of global keepers
  – Governance: UK (JISC/SCONUL/RLUK); EU (Knowledge Exchange; LIBER); USA (ARL); International (IFLA, ICOLC, ISSN-IC; EU) ??
Relevant for ‘Holdings Forum’
• Assigning ISSNs to preserved e-serials that are reported
  1. ‘E-journals’ that come to notice
     * ISSN-IC is devising workflow to assign ISSNs as required
  2. ‘D-journals’, digitised content from print journals
     * some have print ISSN, some not; problematic but essential to make progress
• Issues/volumes, not just titles
  – extent preserved; common/conversion [action in Phase 2]
Matters unresolved (2): challenging the scope of PEPRS
1. ‘Continuity of access’, not just preservation
   – archiving agencies may want to detail current access offer
     * how should PEPRS try to adapt?
2. What about repositories of digitized journals?
   – HathiTrust has over 210,000 titles
     * of which only about 1/3 have an ISSN in the record
3. What about print archiving?
   – CRL’s PAPR initiative, for print journals
     * significant proportion will not have had an ISSN assigned
Common challenges relevant for Holdings Forum
• All have serials where ISSN not yet assigned by ‘big sister’
  – If it is worth preserving it should have a serials identifier!
    * Good News: ISSN Network has issued over 80,000 already
• All tackling ‘middle sister’ problem of Issues/Volumes
‘holdings information’ in OPACs has ‘middle child’ conflict
‘holdings’ in OPAC conflates:
– information for humans (patrons/readers) about access to content
with
– possession of that content
Maybe OK for print journals but we need a different approach for journal content in digital format, where access and stewardship have differing requirements
What’s the way forward?
Let’s accept that the OPAC holding statement is just a ‘human-readable string’
We need radical reform, with means to ingest and store structured metadata on issues (with their tables of contents) that allows:
a) transformation to allow helpful display for humans
b) computation by software/agents to support lots more
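As a minimal sketch of the two uses named on this slide, the structured issue records below are (a) transformed into a display string for humans and (b) queried directly by software. The `Issue` fields, the "v.1-2; v.4" display convention and both function names are illustrative assumptions, not a proposed standard or the PEPRS data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """One issue held by a keeper (hypothetical minimal record)."""
    volume: int
    number: int
    year: int


def display_statement(issues) -> str:
    """(a) Transformation: compress held volumes into a human-readable string."""
    volumes = sorted({issue.volume for issue in issues})
    runs = []
    for volume in volumes:
        if runs and volume == runs[-1][1] + 1:
            runs[-1][1] = volume  # extend the current consecutive run
        else:
            runs.append([volume, volume])  # start a new run
    return "; ".join(
        f"v.{first}" if first == last else f"v.{first}-{last}"
        for first, last in runs
    )


def holds(issues, volume: int, number: int) -> bool:
    """(b) Computation: answer an exact question without parsing a string."""
    return any(i.volume == volume and i.number == number for i in issues)


if __name__ == "__main__":
    held = [Issue(1, 1, 2001), Issue(1, 2, 2001), Issue(2, 1, 2002), Issue(4, 1, 2004)]
    print(display_statement(held))
    print(holds(held, 2, 2))
```

The display string here works at volume level and so hides the missing second issue of volume 2, while the structured records still answer that question; that gap between string and data is the point the slide makes.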
Universal and repurposed holdings information …
• Information for machines on what is held by a keeper:
  • We are working on an ‘arithmetic’ representation
    • the norm/expectation being some matrix expression, with ‘additions’ and ‘subtractions’ about that norm
• Ingesting data flows from Publishers & Digitizers …
  … that can be parsed ‘volume by volume’
  … but expect the operational definition of ‘volumes’ to differ, as the workflows for Publishers and Digitizers are not the same, and so their respective ‘units of work’ differ:
  ① The issue as is published
  ② The bound volume as was digitized
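The ‘arithmetic’ idea on this slide is still described as work in progress, so the following is only one possible reading of it: treat the norm as a volume-by-issue grid, then apply additions and subtractions as set operations. All function names and the sample figures are hypothetical.

```python
def expected_issues(first_volume: int, last_volume: int, issues_per_volume: int) -> set:
    """The norm: every (volume, issue) pair in a regular publication pattern."""
    return {
        (volume, number)
        for volume in range(first_volume, last_volume + 1)
        for number in range(1, issues_per_volume + 1)
    }


def extent(norm, additions=(), subtractions=()) -> set:
    """Apply 'additions' and 'subtractions' about the norm."""
    return (set(norm) | set(additions)) - set(subtractions)


if __name__ == "__main__":
    # Assumed pattern: volumes 1 to 3, four issues per volume.
    norm = expected_issues(1, 3, 4)

    # Extent issued: one extra issue in volume 2, and 3(4) never appeared.
    issued = extent(norm, additions={(2, 5)}, subtractions={(3, 4)})

    # Extent held by a keeper: the norm, minus 1(1) and minus 3(4).
    held = extent(norm, subtractions={(1, 1), (3, 4)})

    # What was issued but is not held.
    print(sorted(issued - held))
```

Expressing both the extent issued and the extent held against the same norm lets software compute the gap directly, and distinguishes "never issued" from "issued but not kept".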
Concluding thoughts …
Our common task is to ensure ease and continuity of access
Because the role of libraries, individually and collectively, as trusted keepers of scholarly information has been challenged by the new economics of the digital …
Each Keeper needs to be sure about what it holds
– on a (digital) shelf held with ‘archival intent’
… doing so in ways that all others can know who is keeping what?
– publish that metadata so the machine can understand!
That’s true for e-journal content, and probably true for both digitized journal content and also of print …
hence interest in registries: peprs.org => thekeepers.org
THANK YOU
Acknowledgements due to all members of the PEPRS Project Team, and in particular to Morag Macgregor for the software engineering
And thanks again to Regina Reynolds for adding Expression to this Work [Manifestation/Item?]
Contact details:
[email protected] and [email protected]