looking under the hood
Preservation Status of e-Resources:
A Potential Crisis in Electronic Journal Preservation
Oya Y. Rieger
AUL for Digital Scholarship & Preservation Services
Cornell University Library
Robert Wolven
AUL for Bibliographic Services and Collection Development,
Columbia University Libraries/Information Services
CNI Forum, December 2011
http://2cul.org/node/22
genesis of the study
• Cornell and Columbia spend more on e-materials than on other forms of content.
Cornell University Library, Annual Statistics Report, 2009/2010
• E-journal archiving responsibility is distributed and elusive
“Yet as the creation and use of digital information accelerate, responsibility for preservation is diffuse, and the responsible parties … have been slow to identify and invest in the necessary infrastructure to ensure that the published scholarly record represented in electronic formats remains intact over the long-term.”
Urgent Action Needed to Preserve Scholarly Electronic Journals, Donald J. Waters et al., 2005
Research Questions
• How do we participate in the LOCKSS alliance?
– Do we understand the difference between LOCKSS and CLOCKSS?
– Who is overseeing the coordination of preservation decisions?
• How do we keep track of which e-subscriptions are represented
in LOCKSS to understand their preservation status?
– How do we have back issue access when a journal is canceled?
– What kind of a mechanism do we have in place between the ERM/LMS and
the local LOCKSS box to support uninterrupted access to digital content?
• Can we do an analysis that compares Portico and LOCKSS coverage
of the 2CUL e-journals?
2CUL LOCKSS Assessment Study
Initiative Leads
• Oya Rieger, AUL, Digital Scholarship & Preservation Services, Cornell
• Patricia Renfro, Associate VP for Digital Programs and Technology Services, Columbia
Research Team
• Marty Kurth, Coordinator, Digital Scholarship Services, Cornell (now NYU)
• Jeff Carroll, Collections, Columbia
• Bill Kara, Central Library Operations, Cornell
• Bill Kehoe, Information Technology, Cornell
• Jim Spear, Technical Services Assistant, Cornell
• Breck Witte, Library Information Technology, Columbia
• Bob Wolven, Collection Development, Columbia
LOCKSS
• International community initiative that provides libraries with open-source digital preservation tools and support
• Facilitates easy and inexpensive collection and preservation of institutional copies of authorized e-content
• 200+ members and over 8,600 e-journal titles from 500 publishers
Portico
• Digital preservation service provided by ITHAKA, a not-for-profit organization with a mission to help the academic community use digital technologies to preserve the scholarly record
• 139 participating publishers, 718 partner libraries, 12,381 e-journal titles, and 123,586 e-book titles
Leveraging LOCKSS
• Only surface understanding of the preservation strategy and
its implications
• No formal process in place for identification of e-journals for
preservation consideration
• LOCKSS is currently being used for dark archiving
• Lack of organizational leadership to bring together related
parties from collections, IT, and scholarly communication
teams
Operational Aspects
• Neither Columbia nor Cornell currently uses its ERM to record
and manage details related to potential LOCKSS or Portico
access
• Identification of titles for which access has been triggered is
not handled through the ERMs at Cornell; Columbia tracks
CLOCKSS and Portico triggered content in Serials Solutions
• Neither library has taken advantage of LOCKSS so far, whether by
gaining access to a canceled subscription or a closed journal or by
participating in a failure-recovery test
LOCKSS & Portico Coverage Study
• The short version:
– “Only 13% (or 15%) of Cornell’s and Columbia’s
e-journals are currently being preserved.”
• A closer look under the hood:
– What we found
– What should be done about it
Disclaimers
• Not an evaluation of LOCKSS or Portico
• Not a complete survey of e-journal
preservation
• Not a rigorous research study
• Not up to the minute
• Set out to measure overlap; ended up …
LOCKSS and Portico coverage study
• Data for e-journal titles extracted from catalog
– Limited to titles with ISSN or e-ISSN (50%)
– 45,000+ titles for Cornell
– 55,000+ titles for Columbia
• Data sent to Portico for matching
– Cornell data also compared to LOCKSS
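The ISSN-based matching described above can be sketched as follows. This is a minimal illustration, not the procedure the study actually used: the record layout (`title`, `issn`, `eissn`), the function names, and the sample ISSNs are all hypothetical placeholders.

```python
import re

# Eight characters, optional hyphen, last character a digit or X.
# This checks the shape only; it does not verify the check digit.
ISSN_RE = re.compile(r"^(\d{4})-?(\d{3}[\dX])$")

def normalize_issn(raw):
    """Return the ISSN as 'NNNN-NNNC', or None if it does not look like one."""
    if not raw:
        return None
    m = ISSN_RE.match(raw.strip().upper())
    return f"{m.group(1)}-{m.group(2)}" if m else None

def match_titles(catalog, archive_issns):
    """Split catalog records into matched, unmatched, and skipped (no ISSN)."""
    archive = {normalize_issn(i) for i in archive_issns} - {None}
    matched, unmatched, skipped = [], [], []
    for rec in catalog:
        ids = {normalize_issn(rec.get("issn")),
               normalize_issn(rec.get("eissn"))} - {None}
        if not ids:
            skipped.append(rec["title"])      # cannot be matched at all
        elif ids & archive:
            matched.append(rec["title"])      # print or electronic ISSN matches
        else:
            unmatched.append(rec["title"])
    return matched, unmatched, skipped

# Made-up sample records and ISSNs, for illustration only.
sample_catalog = [
    {"title": "Journal A", "issn": "1234-5678", "eissn": None},
    {"title": "Journal B", "issn": None, "eissn": "2345678x"},
    {"title": "Newsletter C", "issn": None, "eissn": None},
    {"title": "Journal D", "issn": "9999-0000", "eissn": None},
]
sample_archive = ["12345678", "2345-678X"]

matched, unmatched, skipped = match_titles(sample_catalog, sample_archive)
print(matched, unmatched, skipped)
```

Records without any ISSN land in `skipped`, which mirrors the restriction noted on this slide: titles lacking an ISSN or e-ISSN fall outside the comparison entirely.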
LOCKSS and Portico Coverage
Cornell data
• LOCKSS only: 3.9%
• Portico only: 14.5%
• LOCKSS and Portico: 7.6%
– Not necessarily same holdings
• Total coverage: 26.1% of titles
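A short sketch of how such a breakdown can be computed from sets of title identifiers. The function name and the toy identifiers are assumptions for illustration; only the three percentages in the last lines come from the slide. Those three disjoint shares add to 26.0%, while the slide reports 26.1% in total; a gap of that size is what one would expect if each figure was rounded separately from the underlying counts, though the slide does not say.

```python
def coverage_shares(catalog, lockss, portico):
    """Percentage of catalog titles in LOCKSS only, Portico only, both, or either."""
    catalog = set(catalog)
    if not catalog:
        raise ValueError("catalog is empty")
    in_lockss = catalog & set(lockss)
    in_portico = catalog & set(portico)
    counts = {
        "lockss_only": len(in_lockss - in_portico),
        "portico_only": len(in_portico - in_lockss),
        "both": len(in_lockss & in_portico),
        "any": len(in_lockss | in_portico),
    }
    return {name: round(100 * n / len(catalog), 1) for name, n in counts.items()}

# Toy data: ten hypothetical title identifiers.
shares = coverage_shares(range(1, 11), lockss={1, 2, 3}, portico={3, 4, 5, 6})
print(shares)

# Sum of the three disjoint shares reported on the slide.
slide_sum = round(3.9 + 14.5 + 7.6, 1)
print(slide_sum)
```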
26% of What?
• Serial publications
• In digital form
• With ISSN or e-ISSN
• Titles
• Not content
• Not expenditures
Titles vs Holdings: South Asia Research
LOCKSS: vol. 25, 26, 27, 28
Portico: vol. 23(1), 24, 25, 26, 27(1), 28(3), 29(1)
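The gap between title-level and holdings-level coverage can be made concrete with a small comparison. The sketch below assumes that a bare volume number means the whole volume is held and that a number in parentheses, as in "23(1)", means only that issue is held; the slide does not spell this out, and the data structure and function names are illustrative.

```python
COMPLETE = "complete"  # marker for a volume held in full (assumed reading)

# Holdings for South Asia Research as listed on the slide.
lockss = {25: COMPLETE, 26: COMPLETE, 27: COMPLETE, 28: COMPLETE}
portico = {23: {1}, 24: COMPLETE, 25: COMPLETE, 26: COMPLETE,
           27: {1}, 28: {3}, 29: {1}}

def status(holdings, volume):
    """Classify one volume as absent, partial, or complete in an archive."""
    if volume not in holdings:
        return "absent"
    return "complete" if holdings[volume] == COMPLETE else "partial"

def compare(a, b):
    """Map each volume in either archive to its (status in a, status in b)."""
    return {v: (status(a, v), status(b, v)) for v in sorted(set(a) | set(b))}

comparison = compare(lockss, portico)
for volume, (in_lockss, in_portico) in comparison.items():
    print(f"vol. {volume}: LOCKSS {in_lockss}, Portico {in_portico}")

complete_in_both = [v for v, s in comparison.items()
                    if s == ("complete", "complete")]
```

Under that reading, only volumes 25 and 26 are held in full by both services, even though a title-level count lists the journal as covered by both.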
Serial publications
• Scholarly, peer-reviewed journals
• Trade publications, newsletters
• Annual reports
• Newspapers
• Government documents
• Conference proceedings
• Monographs in series
In digital form
• Current, from publisher
• Backfiles, from publisher
• Current or backfiles, from aggregator
• Historical, scanned by libraries, Google
• Historical, in commercial digital collections
• Published on the web
Breaking down the numbers: what’s not preserved (35,000-40,000 titles)
• Available through aggregators: 25-30%
• Miscellaneous freely accessible: 22-25%
• Newsletters: 10%
• East Asian: 10%
• Participating publishers: 8-9%
• Non-participating publishers: 4-5%
Breaking down the numbers
• Digitized collections with e-journals (commercial): 5%
• Digitized collections, library based (e.g. Hathi Trust): 4%
• Government, IGO (e.g. OECD): 3-4%
• Book series, conference proceedings: 2-3%
• Data errors (e.g., ISSN mismatch): 2%
A few examples
• Aggregator: Popular electronics
– In multiple databases
• Freely accessible: Jornal brasileiro de pneumologia
– In Scielo Brasil, 2004
• NGO: Yearbook … Balkan Human Rights Network
– In Central European Online Library, 2006
• Trade Newsletter: Malaysia Food & Drink Report
– In ABI/Inform, 2009
• East Asian: 대한산업공학회지
– In DBPIA
More examples
• Historical: Bulletin d’archeologie chretienne
– In Gallica, 1870-1876
• Book series: Developments in volcanology
– In ScienceDirect e-book collection
• Data error: Music and Medicine
– In SAGE Premier, 2009- (ISSN mismatch)
• Foundations of Computational Mathematics
– In SpringerLink 2001-present (LOCKSS, not Portico)
• Proceedings … User Services Conference
– In ACM Digital Library 1974-present
Breaking down the numbers: what’s not preserved (35,000-40,000 titles)
• Available through aggregators: 25-30%
– Difficult; 3rd-party agreements
– Important; libraries going e-only
• Miscellaneous freely accessible: 22-25%
– Questionable; many “acquired” en masse
• Newsletters: 10%
– Secondary? Ephemeral?
• East Asian: 10%
– Different legal, technical environment
Breaking down the numbers
• Participating publishers: 8-9%
– Publisher platforms as distributors (aggregators)
– Content not structured as journals
• Non-participating publishers: 4-5%
– Cost/benefit issues
• Government, IGO (e.g. OECD): 3-4%
– Whose responsibility?
• Data errors (e.g., ISSN mismatch): 2%
– Fewer than expected
Different preservation strategies
• Scholarly journals – LOCKSS; Portico
• Historical – HathiTrust; Portico digital collections
• Free on the web – web archiving; e-Depot
• University published – Institutional repository?
• Book series, conferences – as books
Next steps?
• Repeat, extend analysis
• Work with other libraries on priorities, strategies
• Work with publishers
• Work with LOCKSS, Portico, Keepers Registry
• Investigate international context
• Develop intersystem data exchange
We wish to thank the staff of LOCKSS and Portico for their
assistance in conducting this study.
Questions?