Digital Archiving at Elsevier Joep Verheggen, ScienceDirect ICSTI Conference, London, 17 May 2004

Digital Archiving at Elsevier
Joep Verheggen, ScienceDirect
ICSTI Conference, London, 17 May 2004
Agenda
• Short introduction about Elsevier
• Archiving; why is this so important and what is our position
• “YOAS” project
• “Technical aspects”
Note: this presentation focusses on journal content
Elsevier vision...
…to deliver superior information products and services that provide solutions for scientists, medical professionals and librarians ...
Archiving terminology
• there can be confusion when talking of archives between:
– (1) ongoing access to current services and
– (2) long-term storage and preservation of the intellectual content
• we provide for both in our licenses
• this presentation primarily relates to (2)
Long-term preservation
• significance of going “e-only”
• many university and corporate libraries have cancelled paper and use electronic only -- and this is increasing weekly
• e-only puts greater pressure on archival preservation -- and archiving of both the print and the electronic versions
• archiving high on the agenda of individual libraries and library groups
Responsibility for archiving
• Elsevier takes digital archiving seriously
– responsibility to authors
– responsibility for maintaining “the minutes of science”
– importance to the library community
– interest in maintaining an asset
Broad range of actions
• have participated in discussions, projects and committees related to digital archiving since 1995
• among the first (after AIP) to make a public archiving commitment and perhaps the first to incorporate it in our license
• currently making a multi-million dollar investment in internal back-up systems
Current license language
• since 1999, all ScienceDirect licenses for online service contain an annex specifying:
– we will maintain a permanent archive of the SD journals we own
– we will migrate the archive as the technology used for storage or access changes
– we will transfer the archive to an independent, librarian-approved depository if we cannot maintain it
Sizing the problem
• there are more than 1800 Elsevier journals on ScienceDirect
• we are retrodigitizing: creating digital backfiles from v. 1, n. 1 on all titles
• expect to have more than 6 million articles on ScienceDirect by the end of this year
• original size estimate of total file: 50 million pages, 6.5 to 7 terabytes
• Project started in 2001, completed in 2004
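As a rough sanity check on the size estimate above, the implied average storage per page can be worked out directly. This is only a sketch: it assumes decimal units (1 terabyte = 10^12 bytes), which the slide does not specify, and the function name is illustrative.

```python
# Back-of-the-envelope check of the slide's size estimate.
# Assumption: decimal units (1 terabyte = 10**12 bytes); the slide does not say.
PAGES = 50_000_000       # "50 million pages" from the slide
BYTES_PER_TB = 10**12


def bytes_per_page(total_terabytes: float, pages: int = PAGES) -> float:
    """Average number of bytes per page for a given total file size."""
    return total_terabytes * BYTES_PER_TB / pages


low = bytes_per_page(6.5)   # lower bound of the estimate
high = bytes_per_page(7.0)  # upper bound of the estimate
print(f"{low / 1000:.0f} to {high / 1000:.0f} KB per page")
```

Under that assumption the estimate corresponds to roughly 130 to 140 KB per page on average.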
Types of archives
• internal production “archive”
Electronic Warehouse, not ScienceDirect
• “de facto archives”
about 10 regular ScienceDirect OnSite (SDOS) customers worldwide who get everything or nearly everything for local loading (but make no archiving commitment beyond their constituency)
Types of archives -- continued
• self-designated “national” archives
libraries or other institutions that choose to maintain an archival copy locally as a national security measure; variation on SDOS license
• “official Elsevier archive”
formal, contractual relationship between Elsevier and a trusted archival institution to provide permanent retention and access to the digital files for future generations
Official Elsevier archives
• we did an investigative project with Yale University Library (with funding from the Mellon Foundation) which was completed in early 2002
• signed the first formal agreement for an official archive with the Koninklijke Bibliotheek (KB) in August, 2002
• likely to do 3-4 additional agreements (in North America, Asia and Europe)
Koninklijke Bibliotheek
• a recognized international leader in digital archiving investigations
• fortunately, also our national library
• Elsevier was already sending electronic files for its 351 Dutch imprint journals
• now expanding to the entire 1,800-title journal list, which the KB will archive “forever”
Official archive contract terms
• contract is different from a normal license for SD
– perpetual nature of an archive
– service level agreement
– trigger events -- public access
– financial terms
– format for submission
– comprehensiveness of archive (e.g., handling of “withdrawn” material)
• as standards for archival repositories develop, KB must meet these
Use of the official archives
• available for walk-in users now
• available remotely to anyone in the event we exit the business and no one else takes over
• in the event of a disaster that would result in ScienceDirect being down for a prolonged period, all libraries holding the journals (archives or SDOS) would be invited to open access to all (no access controls)
“Technical aspects”; LOCKSS principle
Hardware
• Dayton hosting system is located in a bunker that is tornado-, earthquake-, and aircraft-impact-proof
• Daily incremental backups, weekly complete backups
• Off-site copies of backups, extensive recovery procedures in place
• Migration to new hardware formats on every new version release
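The daily-incremental, weekly-complete backup pattern can be illustrated with a small sketch. It is not a description of the actual Dayton setup: the choice of Sunday for the complete backup, the assumption that each incremental captures changes since the previous backup, and all names below are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the weekly complete backup runs on Sunday; the slide does not
# say which day is used.
FULL_BACKUP_WEEKDAY = 6  # date.weekday(): Monday is 0, Sunday is 6


def backup_type(day: date) -> str:
    """Return which kind of backup runs on the given day."""
    return "complete" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"


def restore_chain(target: date) -> list[tuple[date, str]]:
    """List the backups needed to restore the state as of `target`:
    the most recent complete backup plus every incremental after it."""
    days_since_full = (target.weekday() - FULL_BACKUP_WEEKDAY) % 7
    last_full = target - timedelta(days=days_since_full)
    chain = []
    for offset in range(days_since_full + 1):
        day = last_full + timedelta(days=offset)
        chain.append((day, backup_type(day)))
    return chain


# Example: restoring to the date of this talk (a Monday).
for day, kind in restore_chain(date(2004, 5, 17)):
    print(day.isoformat(), kind)
```

With this scheme a restore never needs more than seven backup sets: one complete backup and at most six incrementals.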
“Technical aspects” – continued
Software: all formats are generally accepted standards/formats (developed to last and/or easy to migrate)
• Text: full SGML, migrating to XML this year
– Older content: “Head & tail” in SGML/XML
• Text: PDF (derived from PostScript file)
– Older content: laser printer quality (300 dpi scanning)
• Images: TIFF, JPEG, GIF (for web applications)
• Multi-media files: we support a small number of formats that will be usable in coming decades
Thank you!