Transcript CKRN09-07

E-Journal Archiving:
A Survey of the Landscape
(a study sponsored by CLIR)
Ann Okerson
CRKN
26 September 2007
7/16/2015
Digital preservation represents one of the grand challenges
facing higher education. Yet… the responsibility for
preservation is diffuse and the responsible parties have
been slow to identify and invest in the necessary
infrastructure. The shift from print to electronic publication
of scholarly journals is occurring at a particularly rapid
pace; the digital portion of the scholarly record is
increasingly at risk and solutions may require unique
arrangements within the academy for sharing preservation
responsibility.
Adapted from "Urgent Action Needed to Preserve Scholarly
Electronic Journals," Don Waters et al., 10/2005
Or, to put it another way: In an age of information
abundance and rapid growth, an age of immensely
ambitious digital resources, libraries neither own – nor
have much assurance of long-time access to – all this
glorious electronic content that we are making available
and delivering to our patrons.
The study: history & process
Fall 2005: idea emerges at ARL meeting
1/2006: study commissioned with Anne Kenney & Cornell
team
2/2006 - 6/2006, the team:
Conducted interviews with library directors
Did extensive literature and Web searches
Studied the journal e-archiving landscape and chose 12
representative initiatives
Surveyed the initiatives
Analyzed all information that was gathered
History & process (2)
Iteration with ARL directors at 5/2006 meeting
Extensive back and forth with stakeholders, interested
parties
Recommendations were developed in 6/2006
External readers and editorial review in summer of 2006
Publication date September 2006
Wide promulgation and discussion
ICOLC
ARL
JISC, and more
Contents
Includes: the "who, what, when, where, why, and
how" of significant archiving programs operated
by not-for-profit organizations in the domain of
peer reviewed journal literature published in
digital form.
Excludes: preservation efforts covering digitized
versions of print journals (e.g., JSTOR), library
conversion projects, publisher efforts, and
initiatives in planning stages.
The chosen dozen initiatives
Government mandated/funded (6):
KB - e-Depot (Dutch national deposit library). Started in
2000. 12 major publishers
Dutch Publishers Association, IBM
Kopal - DDB (National Library of Germany & Ministry
of Education & Research's project to accept journals
under legal deposit arrangement). Started in 2004
GNL, Göttingen, IBM, and others
CISTI - Csi (Canada's national science library; Canada's
scientific infostructure). Started in 2003.
The chosen dozen initiatives
Csi initiative:
Goal: universal, seamless, permanent access for Canadian research,
regardless of geographic location or affiliation
Permanent: “to resolve the concerns of librarians about enduring and
sovereign access to material for which they have already paid.”
“Continuing to make significant progress in building the digital
repository, technical infrastructure, and the tools and services to
manage the information; developing new and renewed strategic
partnerships and collaborations with stakeholders in the library and
information community” (undated)
Plan for 7M articles in 2006?
2006: An MOU with UdeM to archive its licensed e-journals
MOU for backup with Library & Archives Canada (LAC)
Dovetails into proposed Federal Science eLibrary
The chosen dozen (2)
Government mandated/funded (cont'd):
NLA-Pandora (Preserving and Accessing Networked
Documentary Resources of Australia). Started in 1996.
About 2,000 e-journals, mostly non-commercial; 10
partner libraries
PubMed Central (National Institutes of Health-National
Library of Medicine). Started in 2000. Last year about
250 titles with ambitions to become comprehensive
LANL-RL (Los Alamos National Laboratory Research
Library, Department of Energy). Started in 1995. Focuses on
physical sciences for local use and also serves a group of
external clients.
The chosen dozen (3)
Membership/subscription initiatives (4):
LOCKSS Alliance (Lots of Copies Keep Stuff Safe).
Started in 2000. Over 200 participating institutions in 20+
countries. Informal and “unregulated”
CLOCKSS (Controlled LOCKSS). Started in 2006. 7
libraries and 11 publishers to establish a comprehensive
dark archive. Intentional and comprehensive
OCLC-ECO: Started in 1997. Over 5,000 titles from 70
publishers; libraries can select their content
Portico: Membership-based 3rd party "dark archive"
service, includes 39? publishers, thousands of titles (2006)
The chosen dozen (4)
Consortial implementations, providing access for
library members (2):
OhioLink Electronic Journal Center: over
7,200 journals, 9.1M articles, from 100+
publishers, 85+ members. Started early 90s?
Ontario Scholars Portal: serves 20+ university
libraries in Ontario; 7,300 journals
Seven indicators of viability
Both an explicit mission & necessary mandate to perform
long-term archiving – has to come from somewhere
Must negotiate all rights and responsibilities to carry out its
obligations
Must identify exactly which titles are covered and for whom
Must provide a minimal set of defined services - receive,
store, verify integrity, guard against loss, be auditable
(certification)
Make information available under clearly stated conditions
Needs to be organizationally sound
Should work as part of a network
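The "verify integrity" and "be auditable" services named above are often implemented as a fixity check: record a checksum for each file on receipt, then recompute and compare it later. The Python sketch below is only an illustration under that assumption, not a description of how any of the twelve initiatives works. The function names (sha256_of, build_manifest, audit) and the directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root at the time of receipt."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored files with the manifest; list missing and altered files."""
    report: dict[str, list[str]] = {"missing": [], "altered": []}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            report["missing"].append(name)
        elif sha256_of(target) != expected:
            report["altered"].append(name)
    return report
```

An archive would call build_manifest once at ingest, keep the manifest separately from the content, and run audit on a schedule. An empty report means every file recorded at ingest is still present and unchanged.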
What about content coverage?
Proved difficult to identify which publications are
being archived, by whom
Not all publish lists; not all have complete, up to
date titles (this is complicated)
Not all of a publisher's titles necessarily included
in a collection (PubMed Central has largest
number of publishers & smallest number of titles)
Aggregators such as Muse, BioOne, etc., add
complexity
Content coverage (2)
Participation in the 12 (2006 data):
Number of unique publishers was 128
91 participated in only one program
20 participated in 2 programs
17 (major) publishers are in 3 or more programs
Lots of redundancy for STM
Other disciplines, smaller publishers, non-Roman, and
dynamic Web publications are less well represented and
less likely to have an archiving/preservation program
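As a quick check on the participation figures above, the three groups add up to the 128 unique publishers reported, and about 71 percent of them appear in only one program. The short Python sketch below repeats that arithmetic. The variable names are illustrative, and the only data used are the counts from this slide.

```python
# Counts from the slide (2006 data): publishers grouped by number of programs joined.
participation = {
    "one program": 91,
    "two programs": 20,
    "three or more programs": 17,
}

# The groups should account for every unique publisher in the survey.
total = sum(participation.values())

# Publishers whose content is held by more than one archiving program.
multi_program = total - participation["one program"]

# Share of each group as a fraction of all unique publishers.
shares = {label: count / total for label, count in participation.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Only 37 of the 128 publishers have the redundancy of a second archiving program, which is the basis for the point that coverage beyond the major STM publishers is thin.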
“Minimal” set of services?
This area of the report:
Is the longest
Is particularly clearly written
Represents the area that we know least about (much
technical activity with yet a long way to go to assure
perpetual availability)
Identifies emerging best practices and standards
Some areas covered: formats for ingestion, what content is
included, how to know it's all there, is it corrupted, cost
effectiveness, guard against loss/backup, etc.
Organizational viability?
Most of the 12 appear to have the necessary organizational
structure including:
Commitment
Documentation
Adherence to standards
Succession planning
Good business planning, models
Incoming revenue for support
However, mostly a limited track record (very new)
Part of a network?
Networks can be formal or informal and provide:
Idea exchange
Sharing of documents
Sharing software
Coordinating content selection
Reciprocal storage, mirroring
Backup if other archives fail
Shared resources, facilities
Some of these initiatives are communicating
productively with one or more other initiatives
Conclusions of the CLIR study
Trigger events will happen
Libraries cannot do this alone
Current license terms for libraries are mostly inadequate
(perpetual access does not equal preservation)
Viable options are emerging
No single archiving program will meet all needs
Coverage is uneven
Much content is at risk
Libraries can and should influence developments
Legislation needed -- legal deposit
All programs need greater support, transparency, etc.
What have we done right?
Contract language with publishers about archiving,
perpetual access
General guarantees for perpetual access
Request backup copy or permission to download
Trusted third party language
(Transfer titles are still dicey)
Self-archiving
Participated in solutions
Joined LOCKSS
Joined Portico, other
Tried to keep the issues alive
What haven’t we done right?
“Let others take care of preservation,” while we drop the
print and reap the (short-term) e-only savings
Otherwise generally passive, even on our own campuses
We’ve piled up publisher tapes (like ECO) without having
a clue what we might do with them
We’ve joined mass-digitizing projects without thinking
beyond the noses on our faces – too eager to get projects
going
What about e-books? And other e-formats?
What about all the rest of the cool internet stuff? Who’ll
take care of it in the long term?
(Lack of strategy)
Some questions:
Can libraries be responsible for preserving everything of
scholarly importance? Is this realistic?
Can we trust others (publishers?) to take up some of the
burden for us? Who, under what circumstances?
Should we be worried any longer about STM journals?
What preservation obligations are local? Consortial?
National? How do we decide?
Can we build consortia that are big enough when we
don’t really partner very well on so many other things
(ILS, repositories)?
Can we avoid the “not invented here” syndrome?
Can we learn to work with new partners?
Does it really matter after all? (piousness vs. action)
CLIR pub 138:
E-Journal Archiving Metes and Bounds:
A Survey of the Landscape
by Anne R. Kenney, Richard Entlich, Peter B. Hirtle, Nancy Y. McGovern, and Ellie L. Buckley
September, 2006. 120 pp. $30
ISBN 1-932326-26-X
ISBN 978-1-932326-26-0
<www.clir.org/pubs/reports/pub138/pub138.pdf>
CLIR White Paper (draft for comment till 10/5):
Preservation in the Age of Large-Scale
Digitization
by Oya Y. Rieger, Interim Assistant University Librarian for Digital Library and Information Technologies,
Cornell University Library
<http://www.clir.org/activities/details/mdpres.html>