Transcript CKRN09-07
E-Journal Archiving: A Survey of the Landscape (a study sponsored by CLIR)
Ann Okerson
CRKN, 26 September 2007
7/16/2015

Digital preservation represents one of the grand challenges facing higher education. Yet… the responsibility for preservation is diffuse and the responsible parties have been slow to identify and invest in the necessary infrastructure. The shift from print to electronic publication of scholarly journals is occurring at a particularly rapid pace; the digital portion of the scholarly record is increasingly at risk and solutions may require unique arrangements within the academy for sharing preservation responsibility.
Adapted from "Urgent Action Needed to Preserve Scholarly Electronic Journals," Don Waters et al., 10/2005

Or, to put it another way:
In an age of information abundance and rapid growth, an age of immensely ambitious digital resources, libraries neither own – nor have much assurance of long-time access to – all this glorious electronic content that we are making available and delivering to our patrons.

The study: history & process
- Fall 2005: idea emerges at ARL meeting
- 1/2006: study commissioned with Anne Kenney & Cornell team
- 2/2006 - 6/2006, the team:
  - Conducted interviews with library directors
  - Did extensive literature and Web searches
  - Studied the journal e-archiving landscape and chose 12 representative initiatives
  - Surveyed the initiatives
  - Analyzed all information that was gathered

History & process (2)
- Iteration with ARL directors at 5/2006 meeting
- Extensive back and forth with stakeholders, interested parties
- Recommendations were developed in 6/2006
- External readers and editorial review in summer of 2006
- Publication date September 2006
- Wide promulgation and discussion:
  - ICOLC
  - ARL
  - JISC, and more

Contents
- Includes: the "who, what, when, where, why, and how" of significant archiving programs operated by not-for-profit organizations in the domain of peer reviewed journal literature published in digital form.
- Excludes: preservation efforts covering digitized versions of print journals (e.g., JSTOR), library conversion projects, publisher efforts, and initiatives in planning stages.

The chosen dozen initiatives
Government mandated/funded (6):
- KB - e-Depot (Dutch national deposit library). Started in 2000.
  - 12 major publishers
  - Dutch Publishers Association, IBM
- Kopal - DDB (National Library of Germany & Ministry of Education & Research's project to accept journals under legal deposit arrangement). Started in 2004.
  - GNL, Göttingen, IBM, and others
- CISTI - Csi (Canada's national science library; Canada's scientific infostructure). Started in 2003.

The chosen dozen initiatives
Csi initiative:
- Goal: universal, seamless, permanent access for Canadian research, regardless of geographic location or affiliation
- Permanent: “to resolve the concerns of librarians about enduring and sovereign access to material for which they have already paid.”
- “Continuing to make significant progress in building the digital repository, technical infrastructure, and the tools and services to manage the information; developing new and renewed strategic partnerships and collaborations with stakeholders in the library and information community” (undated)
- Plan for 7M articles in 2006?
- 2006: An MOU with UdeM to archive its licensed e-journals
- MOU for backup with Library & Archives Canada (LAC)
- Dovetails into proposed Federal Science eLibrary

The chosen dozen (2)
Government mandated/funded (cont'd):
- NLA-Pandora (Preserving and Accessing Networked Documentary Resources of Australia). Started in 1996. About 2,000 e-journals, mostly non-commercial; 10 partner libraries
- PubMed Central (National Institutes of Health-National Library of Medicine). Started in 2000. Last year about 250 titles with ambitions to become comprehensive
- LANL-RL (Los Alamos National Laboratory Research Library, Department of Energy). Started in 1995. Focuses on physical sciences for local use and also serves a group of external clients.

The chosen dozen (3)
Membership/subscription initiatives (4):
- LOCKSS Alliance (Lots of Copies Keep Stuff Safe). Started in 2000. Over 200 participating institutions in 20+ countries. Informal and “unregulated”
- CLOCKSS (Controlled LOCKSS). Started in 2006. 7 libraries and 11 publishers to establish a comprehensive dark archive. Intentional and comprehensive
- OCLC-ECO: Started in 1997. Over 5,000 titles from 70 publishers; libraries can select their content
- Portico: Membership-based 3rd party "dark archive" service, includes 39? publishers, thousands of titles (2006)

The chosen dozen (4)
Consortial implementations, providing access for library members (2):
- OhioLink Electronic Journal Center: over 7,200 journals, 9.1M articles, from 100+ publishers, 85+ members. Started early 90s?
- Ontario Scholars Portal: serves 20+ university libraries in Ontario; 7,300 journals

Seven indicators of viability
- Both an explicit mission & necessary mandate to perform long-term archiving – has to come from somewhere
- Must negotiate all rights and responsibilities to carry out its obligations
- Must identify exactly which titles are covered and for whom
- Must provide a minimal set of defined services - receive, store, verify integrity, guard against loss, be auditable (certification)
- Make information available under clearly stated conditions
- Needs to be organizationally sound
- Should work as part of a network

What about content coverage?
- Proved difficult to identify which publications are being archived, by whom
- Not all publish lists; not all have complete, up to date titles (this is complicated)
- Not all of a publisher's titles necessarily included in a collection (PubMed Central has largest number of publishers & smallest number of titles)
- Aggregators such as Muse, BioOne, etc., add complexity

Content coverage (2)
Participation in the 12 (2006 data):
- Number of unique publishers was 128
  - 91 participated in only one program
  - 20 participated in 2 programs
  - 17 (major) publishers are in 3 or more programs
- Lots of redundancy for STM
- Other disciplines, smaller publishers, non-Roman, and dynamic Web publications are less well represented and less likely to have an archiving/preservation program

“Minimal” set of services?
This area of the report:
- Is the most lengthy
- Is particularly clearly written
- Represents the area that we know least about (much technical activity with yet a long way to go to assure perpetual availability)
- Identifies emerging best practices and standards
- Some areas covered: formats for ingestion, what content is included, how to know it's all there, is it corrupted, cost effectiveness, guard against loss/backup, etc.

Organizational viability?
Most of the 12 appear to have the necessary organizational structure including:
- Commitment
- Documentation
- Adherence to standards
- Succession planning
- Good business planning, models
- Incoming revenue for support
However, mostly a limited track record (very new)

Part of a network?
Networks can be formal or informal and provide:
- Idea exchange
- Sharing of documents
- Sharing software
- Coordinating content selection
- Reciprocal storage, mirroring
- Backup if other archives fail
- Shared resources, facilities
Some of these initiatives are communicating productively with one or more other initiatives

Conclusions of the CLIR study
- Trigger events will happen
- Libraries cannot do this alone
- Current license terms for libraries are mostly inadequate (perpetual access does not equal preservation)
- Viable options are emerging
- No single archiving program will meet all needs
- Coverage is uneven
- Much content is at risk
- Libraries can and should influence developments
- Legislation needed -- legal deposit
- All programs need greater support, transparency, etc.

What have we done right?
- Contract language with publishers about archiving, perpetual access
  - General guarantees for perpetual access
  - Request backup copy or permission to download
  - Trusted third party language
  - (Transfer titles are still dicey)
- Self-archiving
- Participated in solutions
  - Joined LOCKSS
  - Joined Portico, other
- Tried to keep the issues alive

What haven’t we done right?
- “Let others take care of preservation,” while we drop the print and reap the (short-term) e-only savings
- Otherwise generally passive, even on our own campuses
- We’ve piled up publisher tapes (like ECO) without having a clue what we might do with them
- We’ve joined mass-digitizing projects without thinking beyond the noses on our faces – too eager to get projects going
- What about e-books? And other e-formats?
- What about all the rest of the cool internet stuff? Who’ll take care of it in the long term? (Lack of strategy)

Some questions:
- Can libraries be responsible for preserving everything of scholarly importance? Is this realistic?
- Can we trust others (publishers?) to take up some of the burden for us? Who, under what circumstances?
- Should we be worried any longer about STM journals?
- What preservation obligations are local? Consortial? National? How do we decide?
- Can we build consortia that are big enough when we don’t really partner very well on so many other things (ILS, repositories)?
- Can we avoid the “not invented here” syndrome?
- Can we learn to work with new partners?
- Does it really matter after all? (piousness vs. action)

CLIR pub 138:
E-Journal Archiving Metes and Bounds: A Survey of the Landscape
by Anne R. Kenney, Richard Entlich, Peter B. Hirtle, Nancy Y. McGovern, and Ellie L. Buckley
September, 2006. 120 pp. $30
ISBN 1-932326-26-X
ISBN 978-1-932326-26-0
<www.clir.org/pubs/reports/pub138/pub138.pdf>

CLIR White Paper (draft for comment till 10/5):
Preservation in the Age of Large-Scale Digitization
by Oya Y. Rieger, Interim Assistant University Librarian for Digital Library and Information Technologies, Cornell University Library
<http://www.clir.org/activities/details/mdpres.html>