Digital Preservation Best Practices
Lessons Learned From Across the Pond
Slavko Manojlovich
Associate University Librarian (IT) / Manager, Digital Archives Initiative
Memorial University of Newfoundland
Michael J. Bennett
Digital Projects Librarian / Institutional Repository Coordinator
University of Connecticut
Ontario Library Association Super Conference
Toronto
February 5, 2011
Outline
 What is digital preservation?
 Best practices information resources
 Open Archival Information System (OAIS) Model
 Preservation Planning
 Digital Preservation in Action (Archivematica)
What is digital preservation?
 Digital preservation is the series of actions and
interventions required to ensure continued and
reliable access to authentic digital objects for as long as
they are deemed to be of value. This encompasses not
just technical activities, but also all of the strategic and
organisational considerations that relate to the
survival and management of digital material.
 Disaster recovery strategies and backup systems are
not sufficient to ensure survival and access to
authentic digital resources over time.
JISC Digital Preservation Briefing Paper, November 2006
http://www.jisc.ac.uk/publications/briefingpapers/2006/pub_digipreservationbp.aspx
What is digital preservation?
 Digital preservation includes:
 Digitized analogue content (easy)
 Born-digital content (more difficult)
 Text
 Audio
 Video
 Email
 Web sites
 Research data
 Databases
 Container files
 Spreadsheets
 Software
 ....
What is digital preservation?
 Recent example from Memorial University (January 2011):
data stored in a variety of formats going back to 1977:
 Access Databases
 Paper Files (14 filing cabinets)
 Excel Spreadsheets
 Progeny files
 Cyrillic files
 Slides of various testing images
 JPEGs of various testing images
 PowerPoint presentations
 Researcher’s memory
“All of the above represents a vast resource which
cannot be lost from the University.”
Digital preservation best practices
 Best practices may not always be the best option for
your organization:
 British Library Microsoft Live Book Data Project
The DPT [Digital Preservation Team] have taken the view
that since the budget for hard drive storage for this project
has already been allocated, it would be impractical to
recommend a change in the specifics as far as file format is
concerned for this project...
JPEG 2000 files compressed to 70 dB PSNR for the
preservation copy.
British Library, Preservation Plan for Microsoft – Update
http://www.bl.uk/aboutus/stratpolprog/ccare/introduction/digital/digpresmicro.pdf
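The 70 dB target above is a peak signal-to-noise ratio (PSNR), a standard measure of how closely a lossy copy matches its source. A minimal Python sketch of the calculation follows, assuming 8-bit samples supplied as flat sequences of integers; the function name and inputs are illustrative only and are not taken from the British Library's workflow.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], compressed: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    Assumes 8-bit samples by default (max_value=255); pass 65535 for 16-bit data.
    """
    if len(original) != len(compressed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over every sample.
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical data: the copy is lossless
    return 10 * math.log10(max_value ** 2 / mse)


# Four samples, one of which is off by a single level.
print(round(psnr([100, 100, 100, 100], [100, 100, 100, 101]), 2))
```

The example prints 54.15. Working backwards, if the samples are 8-bit, a 70 dB target corresponds to a mean squared error of 255²/10⁷ ≈ 0.0065, which shows how close to lossless such a preservation copy is required to be.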
Digital preservation best practices
 Best practices may not always be the best option for
your organization:
 The National Gallery (UK) Preservation of Digital
Photographs of the Collection
The National Gallery has photographed its entire collection
using a high-end digital MARC camera capable of capturing and
rendering colour with an accuracy at least five times better than that of
traditional photography. It has selected the camera's proprietary raw
output format for long-term preservation because that format
supports an advanced level of colour management. The company
supporting the camera and associated software is very small and is
not a market leader.
Digital Futures Workshop Site Visit to National Gallery
Photography Department, April, 2010.
Best Practices Information Sources
 Subject Repositories: European collaboration in the
international context
London, UK / January 28-29, 2010
http://www.ariadne.ac.uk/issue62/bl-subject-repos-rpt/
 Digital Preservation – The Planets Way
London, UK / February 9, 2010
http://www.planets-project.eu/events/london-2010
 Digital Futures London 2010: From digitization to
delivery
King’s Digital Consultancy Services (KDCS)
King’s College, London, UK / April 19-23, 2010
http://www.kdcs.kcl.ac.uk/digifutures/overview.html
Best Practices Information Sources
 Eighth European Conference on Digital Archiving
Geneva, Switzerland / April 28-30, 2010
http://www.bar.admin.ch/eca2010/00732/index.html?lang=en
 My Leicestershire: JISC Funded Community Engagement
Digitization Project
Leicester, UK / May – November, 2010
http://www.myleicestershire.com
 Learning how to play nicely: Repositories and CRIS
Leeds, UK / May 7, 2010
http://www.wrn.aber.ac.uk/events/cris/program.html
Best Practices Information Sources
 Archiving 2010
The Hague, Netherlands / June 1-4, 2010
http://www.imaging.org/ist/conferences/archiving/Archiving%202010%20Preliminary%20Program.pdf
 Survive or Thrive: Making the Most of Your Digital
Content
Manchester, UK / June 8-9, 2010
http://www.surviveorthrive.org.uk/about/
 Digital Preservation Management: Implementing Short-term Solutions for Long-term Problems
Cambridge, MA, USA / June 13-18, 2010
http://www.icpsr.umich.edu/dpm/workshops/fiveday.html
Best Practices Information Sources
 OR2010: The 5th International Conference on Open
Repositories
Madrid, Spain / July 6-9, 2010
http://or2010.fecyt.es
 Talis Open Day: Linked Data and Libraries
London, UK / July 21, 2010
http://blogs.talis.com/nodalities/2010/07/linked-data-in-libraries-presentations.php
 iPRES2010: 7th International Conference on Preservation
of Digital Objects
Vienna, Austria / September 19-24, 2010
http://www.ifs.tuwien.ac.at/dp/ipres2010/index.html
International Journal of Digital Curation
http://www.ijdc.net/index.php/ijdc
International Journal of Digital Curation
Latest Issue
http://www.ijdc.net/index.php/ijdc
Digital Curation/Preservation Science
King’s College MA Digital Asset Management
Digital Curation/Preservation Science
 “The technical knowledge needed for digital
preservation is not widespread in Canada and there
are few opportunities in Canada for staff development
and education right now.”
Survey of Digital Preservation Practices in Canada,
Kathleen Shearer, March, 2009
http://www.collectionscanada.gc.ca/digitalinitiatives/012018-3100.06-e.html
Digital Curation/Preservation Science
University of Toronto Digital Curation Institute
Open Archival Information System (OAIS)
 Developed by the Consultative Committee for Space
Data Systems in 2002 and became an ISO standard in
2003 (ISO 14721:2003).
148 pages of heavy reading:
http://public.ccsds.org/publications/archive/650x0b1.pdf
 “Those who will implement OAIS archives or
administer them on a daily basis should read the
entire document.”
 OCLC claims OAIS compliance for their “Digital
Archive”.
http://cendi.dtic.mil/presentations/oais_kirche_12_11_01.ppt
Open Archival Information System (OAIS)
 Library and Archives Canada’s Trusted Digital
Repository is based on OAIS.
http://www.bar.admin.ch/eca2010/00732/00843/index.html?lang=en&download=M3wBPgDB/8ull6Du36WenojQ1NTTjaXZnqWfVp3Uhmfhnapmmc7Zi6rZnqCkkIN2fX18bKbXrZ6lhuDZz8mMps2gpKfo
 The National Library of the Netherlands (Koninklijke
Bibliotheek) e-Depot is an exemplary, world-class, OAIS-based
digital repository.
http://cendi.dtic.mil/presentations/oais_kirche_12_11_01.ppt
OAIS Reference Model
“The use of this reference model as the basis of any archive implementation is
recommended as it allows practitioners to use common language and potentially
common tools to address common problems.”
Tessella Technology & Consulting White Paper
http://www.digital-preservation.com/wp-content/uploads/DigitalArchiving.pdf
OAIS Reference Model
http://wiki.esipfed.org/images/0/0b/OAIS_FunctionalEntities.jpg
OAIS Reference Model
http://www.library.cornell.edu/dlit/MathArc/web/StoryFrameset.html
OAIS Reference Model - Actors
Digital Preservation Management: Implementing Short-term
Solutions for Long-term Problems
OAIS Reference Model - Objects
Digital Preservation Management: Implementing Short-term
Solutions for Long-term Problems
OAIS Reference Model - Actions
Digital Preservation Management: Implementing Short-term
Solutions for Long-term Problems
Preservation Planning
 Monitor designated community
 Monitor technology
 Develop Preservation Strategies and Standards
 Develop Packaging Designs and Migration Plans
Digital Preservation Management: Implementing Short-term
Solutions for Long-term Problems
Monitor Technology
Internet Archive: MUN Web Site 2008
http://web.archive.org/web/*/http://www.mun.ca
Monitor Technology
Internet Archive: MUN Web Site 2004
http://web.archive.org/web/*/http://www.mun.ca
Monitor Technology
Internet Archive: MUN Web Site 1999
http://web.archive.org/web/*/http://www.mun.ca
Monitor Technology
Google Drops H.264 Support in Chrome (Jan 11, 2011)
http://video-commerce.org/2011/01/google-chrome-h264-video/
Monitor Technology
Microsoft Adds H.264 Support to Chrome via a Plug-in (Feb 2, 2011)
Microsoft's Got H.264's Back, Releases Plug-in for Chrome Users - PCWorld
Plato: The PLANETS Preservation Planning Tool
 Developed by the PLANETS Consortium:
The British Library
The National Library of the Netherlands
Austrian National Library
The Royal Library of Denmark
State and University Library, Denmark
The National Archives of the Netherlands
The National Archives of England, Wales and the UK
Swiss Federal Archives
University of Cologne
University of Freiburg
HATII at the University of Glasgow
Vienna University of Technology
The Austrian Institute of Technology
IBM Netherlands
Microsoft Research Limited
Tessella Plc
Plato: The PLANETS Preservation Planning Tool
 A preservation plan defines a series of preservation actions to be taken by a
responsible institution due to an identified risk for a given set of digital objects
or records (called a collection).
 The Preservation Plan takes into account the preservation policies, legal
obligations, organisational and technical constraints, user requirements and
preservation goals and describes the preservation context, the evaluated
preservation strategies and the resulting decision for one strategy, including
the reasoning for the decision.
 It also specifies a series of steps or actions (called preservation action plan)
along with responsibilities and rules and conditions for execution on the
collection.
 Provided that the actions and their deployment as well as the technical
environment allow it, this action plan is an executable workflow definition.
 Access to a library of preservation plans.
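The elements listed above (a collection, an identified risk, evaluated strategies, a chosen strategy with its reasoning, and an executable action plan) can be modelled as a simple data structure. The following Python sketch is hypothetical: the class names, field names, and example values are illustrative and do not reflect Plato's actual plan format.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PreservationAction:
    """One step of the action plan: a rule for which objects it applies to, plus the step itself."""
    name: str
    applies_to: Callable[[str], bool]
    run: Callable[[str], str]  # returns a human-readable record of what was done


@dataclass
class PreservationPlan:
    plan_id: str
    collection: List[str]            # the digital objects the plan covers
    identified_risk: str
    evaluated_strategies: List[str]
    chosen_strategy: str
    reasoning: str                   # why the chosen strategy was selected
    action_plan: List[PreservationAction] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.chosen_strategy not in self.evaluated_strategies:
            raise ValueError("the chosen strategy must be one of the evaluated strategies")

    def execute(self) -> List[str]:
        """Run every action on each object whose rule matches; return an execution log."""
        log = []
        for action in self.action_plan:
            for item in self.collection:
                if action.applies_to(item):
                    log.append(f"[{self.plan_id}] {action.name}: {action.run(item)}")
        return log


# Hypothetical example values.
plan = PreservationPlan(
    plan_id="plan-001",
    collection=["logo.gif", "page1.tif"],
    identified_risk="GIF files judged at risk",
    evaluated_strategies=["keep as GIF", "migrate to PNG"],
    chosen_strategy="migrate to PNG",
    reasoning="PNG is lossless and widely supported",
    action_plan=[PreservationAction(
        name="GIF to PNG",
        applies_to=lambda path: path.lower().endswith(".gif"),
        run=lambda path: f"would migrate {path}",  # placeholder: no real conversion here
    )],
)
print(plan.execute())
```

Keeping the rule (`applies_to`) separate from the step (`run`) mirrors the idea that the plan records "rules and conditions for execution" alongside the actions themselves.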
Plato: TIFF to JPEG 2000 Case Study
http://digitalpreservation.gov/edge/edge_planets.html
 The British Library’s 2 million newspaper pages are stored as
uncompressed, high-quality TIFF-5 files of 40 MB per page.
 Plato experiment to test the image quality and file size of
TIFF-5 images converted to lossless JPEG 2000.
 Experiment results: lossless JPEG 2000 image quality is
as good as uncompressed TIFF-5, and file size is
reduced by 25-30 percent. JPEG derivatives made from TIFF-5 are
equivalent to JPEG derivatives made from lossless JPEG 2000.
http://digitalpreservation.gov/edge/edge_planets.html
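A quick worked calculation puts the case study's figures in perspective. The sketch assumes decimal units (1 TB = 1,000,000 MB) and that the 25-30 percent reduction applies uniformly to every page; neither assumption comes from the case study itself.

```python
PAGES = 2_000_000   # newspaper pages in the collection
MB_PER_PAGE = 40    # uncompressed TIFF-5 size per page


def total_tb(pages: int, mb_per_page: int, reduction_percent: int = 0) -> float:
    """Total storage in decimal terabytes after a whole-number percentage reduction."""
    total_mb = pages * mb_per_page * (100 - reduction_percent) // 100
    return total_mb / 1_000_000


for pct in (0, 25, 30):
    print(f"{pct:>2}% smaller: {total_tb(PAGES, MB_PER_PAGE, pct):.0f} TB")
```

Under these assumptions the collection occupies 80 TB as TIFF-5 and 56-60 TB as lossless JPEG 2000, a saving of roughly 20-24 TB with no loss of image quality.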
Planets Time Capsule
http://www.ifs.tuwien.ac.at/dp/timecapsule/timecapsule.html
E-Prints: Integration of Bit-Level and
Logical Preservation (New)
Connecting Preservation Planning and Plato with Digital Repository
Interface, iPRES 2010
GIF files will be migrated to PNG with the ImageMagick utility
E-Prints: Integration of Bit-Level and
Logical Preservation (New)
 Upload Plato preservation plan to E-Prints
 The prescribed preservation-plan action is applied to each set
of files classified as “at risk”
 E-Prints creates provenance metadata for every
preservation action (e.g. “File was migrated from ‘format A’
to ‘format B’ on this date according to
preservation plan NNN”).
Connecting Preservation Planning and Plato with Digital Repository
Interface, iPRES 2010
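E-Prints' own implementation is not shown on the slides; the following Python sketch only illustrates pairing a GIF-to-PNG migration with a provenance note worded like the example above. It assumes ImageMagick 6's `convert` command is on the PATH (ImageMagick 7 prefers `magick`), and the function names are hypothetical.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import List, Optional, Tuple


def migration_command(src: Path) -> Tuple[List[str], Path]:
    """Build an ImageMagick command that writes a PNG copy beside the GIF (the original is kept)."""
    dst = src.with_suffix(".png")
    return ["convert", str(src), str(dst)], dst


def provenance_note(src: Path, dst: Path, plan_id: str, when: date) -> str:
    """Provenance statement modelled on the wording described above."""
    return (f"File {src.name} was migrated from GIF to PNG ({dst.name}) "
            f"on {when.isoformat()} according to preservation plan {plan_id}.")


def migrate(src: Path, plan_id: str, when: Optional[date] = None) -> str:
    """Run the conversion, then return the provenance note for the repository to store."""
    cmd, dst = migration_command(src)
    # check=True raises CalledProcessError if ImageMagick reports a failure.
    # Note: an animated GIF may be written out as several numbered PNG files.
    subprocess.run(cmd, check=True)
    return provenance_note(src, dst, plan_id, when or date.today())
```

Usage would be along the lines of `migrate(Path("images/logo.gif"), "plan-042")`; the note is only produced after the conversion succeeds, so the provenance record never describes a migration that did not happen.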
Sample Media Type Preservation Plan
Archivematica Media Type Preservation Plans
Trustworthy Repositories Audit &
Certification (TRAC) Checklist
http://www.crl.edu/archiving-preservation/digital-archives/certification-and-assessment-digital-repositories
Trustworthy Repositories Audit &
Certification (TRAC) Checklist
http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/crl-ratings
Trustworthy Repositories Audit &
Certification (TRAC) Checklist
http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re
Digital Preservation in Action
Archivematica (version 0.6)
 Archivematica (http://archivematica.org) is an open-source
software toolkit that takes the OAIS model and turns its
conceptual entities into actionable functions.
 It can take Submission Information Packages (SIPs) and turn them
into Archival Information Packages (AIPs) and Dissemination
Information Packages (DIPs).
 Version 0.6 accomplishes this through a Unix pipeline design
pattern (think: Xubuntu’s Thunar file manager combined with
Bash and Python scripts).
 At various points along the pipeline (from SIP to AIP, etc.),
open-source utilities bundled within Archivematica are called upon.
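The watched-folder idea behind the 0.6 pipeline can be sketched in a few lines: each stage is a directory, and a SIP advances by being moved from one directory to the next, where that stage's scripts pick it up. This Python sketch is a simplified model, not Archivematica's code; it borrows the stage folder names from the demo slides, and the `advance` function is hypothetical.

```python
import shutil
from pathlib import Path

# Stage folders, in the order a SIP passes through them in the demo.
STAGES = ["reviewSIP", "quarantineSIP", "appraiseSIP", "prepareAIP", "reviewAIP"]


def advance(sip: Path, root: Path) -> Path:
    """Move a SIP directory from its current stage folder into the next stage folder."""
    stage = sip.parent.name
    if stage not in STAGES:
        raise ValueError(f"{sip} is not inside a known stage folder")
    position = STAGES.index(stage)
    if position == len(STAGES) - 1:
        raise ValueError(f"{sip.name} is already in the final stage")
    next_dir = root / STAGES[position + 1]
    next_dir.mkdir(parents=True, exist_ok=True)
    # A real pipeline would now trigger that stage's microservices on the moved SIP.
    return Path(shutil.move(str(sip), str(next_dir / sip.name)))
```

In this model, rejecting a SIP after appraisal would simply be a move to a different destination folder rather than to the next stage in the list.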
Archivematica & OAIS: what we’ll
look at – SIP > AIP
http://wiki.esipfed.org/images/0/0b/OAIS_FunctionalEntities.jpg
Archivematica & OAIS: what we’ll
look at – SIP > AIP
Diagram: SIP > AIP microservices and the open-source tools employed
http://archivematica.org/wiki/images/d/dc/Archivematica-architecture-7May20102.png
Archivematica & OAIS: the context of
our investigation, UConn case study
Basic needs assessment and outline for still-image file archiving. Note
that we’re not worried about the DIP here, just SIP > AIP (the institution
feels it already has sufficient DIP-like systems in place).
Archivematica Demo: SIP > AIP
Archivematica Demo: SIP
A SIP can contain an assortment of files, in this
case still-image files.
Archivematica Demo: SIP > reviewSIP
SIP successfully pasted into the review directory.
Archivematica Demo: reviewing SIP
SIP-level descriptive Dublin Core metadata and an MD5
checksum can be added at this point, or a pre-created
checksum can be verified.
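Adding or verifying a checksum amounts to hashing each file and comparing digests. A minimal Python sketch using the standard `hashlib` module follows; the function names are illustrative, and the assumption that a pre-created checksum list uses the common `md5sum` line format (`<digest>  <relative path>`) is ours, not a documented Archivematica format.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large images need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(sip_dir: Path) -> str:
    """md5sum-style listing for every file in the SIP, sorted by path."""
    files = sorted(p for p in sip_dir.rglob("*") if p.is_file())
    return "".join(f"{md5_of(p)}  {p.relative_to(sip_dir).as_posix()}\n" for p in files)


def verify(sip_dir: Path, listing: str) -> Dict[str, bool]:
    """Check a pre-created md5sum-style listing; map each listed path to pass (True) or fail."""
    results = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        name = rel_path.strip().lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = sip_dir / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

A missing file and a changed file both report `False`, which is the behaviour one would want before a SIP moves on to quarantine.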
Archivematica Demo: SIP > quarantineSIP
Once the SIP moves into quarantine, it remains there
for a preset time period while the files are checked
for viruses, file names are normalized, formats are
verified, and technical metadata are extracted.
Logs of all activity are written, stored, and added
to a METS.xml manifest that describes the contents
of the soon-to-be-created AIP.
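File-name normalization typically means reducing names to a safe character set so that they survive transfer between operating systems and tools. The rules in this Python sketch (transliterate accented letters, replace anything outside `A-Z a-z 0-9 . _ -` with an underscore, and de-duplicate collisions) are an assumption for illustration, not Archivematica's actual rules; the returned mapping is the kind of original-to-new record that would go into the logs.

```python
import re
import unicodedata
from typing import Dict, Iterable


def normalize_filename(name: str) -> str:
    """Reduce a file name to ASCII letters, digits, '.', '_' and '-'."""
    # Decompose accented letters (é -> e + combining accent), then drop the non-ASCII parts.
    ascii_only = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", ascii_only)
    return cleaned or "_"


def normalize_all(names: Iterable[str]) -> Dict[str, str]:
    """Map each original name to a unique normalized name, suffixing _1, _2, ... on collisions."""
    seen = set()
    mapping = {}
    for original in names:
        candidate = normalize_filename(original)
        stem, dot, ext = candidate.rpartition(".")
        if not dot:  # no extension
            stem, ext = candidate, ""
        counter = 1
        while candidate in seen:
            candidate = f"{stem}_{counter}{dot}{ext}"
            counter += 1
        seen.add(candidate)
        mapping[original] = candidate
    return mapping
```

For example, `normalize_filename("Café photo (1).tif")` returns `"Cafe_photo__1_.tif"`. Collision handling matters because two distinct originals (such as `a b.tif` and `a_b.tif`) would otherwise normalize to the same name and one file would overwrite the other.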
Archivematica Demo: quarantine > appraiseSIP
Once the SIP processing described on the previous slide
is finished, Archivematica’s scripts move the SIP out of
quarantine and into appraiseSIP.
Archivematica Demo: appraiseSIP
METS.xml describes the complete SIP contents. The
“objects” folder contains the still image files. The “logs”
folder contains the following…
Archivematica Demo: log files
Archivematica Demo: SIP > prepareAIP
After staff finish their appraisal, the SIP is
manually moved either to a rejection folder or
to prepareAIP.
Archivematica Demo: SIP > prepareAIP
After the SIP is placed into prepareAIP, the program
automatically generates preservation-format copies of
the original objects. All original formats are preserved
alongside the normalized copies.
Also added to the prepared AIP is a METS XML file that
contains FITS output, ingest log information, and Dublin
Core metadata. All of this is then…
Archivematica Demo: prepareAIP > reviewAIP
…zipped into a single file, using the Library of Congress’
BagIt format, and moved by the software to the
reviewAIP folder.
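BagIt is a simple packaging convention from the Library of Congress (later published as RFC 8493): a bag directory holds a `bagit.txt` declaration, a `data/` payload directory, and a manifest of payload checksums. The Python sketch below builds a minimal bag and zips it using only the standard library. It illustrates the format rather than the code Archivematica uses; the function names are hypothetical, and it writes version 1.0 tags following RFC 8493 (which recommends SHA-256 or SHA-512 over MD5, hence the default).

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(payload_dir: Path, bag_dir: Path, algorithm: str = "sha256") -> Path:
    """Copy payload_dir into bag_dir/data and write bagit.txt plus a payload manifest."""
    data_dir = bag_dir / "data"
    shutil.copytree(payload_dir, data_dir)  # fails if bag_dir/data already exists
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        # Reads each file whole: fine for a sketch, not for very large images.
        digest = hashlib.new(algorithm, path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / f"manifest-{algorithm}.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def zip_bag(bag_dir: Path) -> Path:
    """Zip the bag so that the archive contains a single top-level bag directory."""
    archive = shutil.make_archive(str(bag_dir), "zip",
                                  root_dir=bag_dir.parent, base_dir=bag_dir.name)
    return Path(archive)
```

Manifest paths are written relative to the bag root (for example `data/objects/img.tif`), which is what lets a recipient re-verify every payload file after unzipping.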
Archivematica Demo: reviewAIP
Contents of reviewAIP BagIt file.
Archivematica Demo: reviewAIP
Contents of data subfolder.
Archivematica Demo: AIP > Archival
Storage
http://wiki.esipfed.org/images/0/0b/OAIS_FunctionalEntities.jpg
Archivematica & OAIS: how did it do against
UConn's needs & what else do we need to consider?
Can we automate SIP creation?
Can this scale?...
AM (Archivematica) requires twice the disk space of a given SIP
in order to perform antivirus scanning, etc., so AM will need to
run in a suitably sized environment.
File sharing between Windows and Linux? Samba?
Wait for AM v. 0.7’s dashboard?
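The space requirement noted above can be checked before a SIP is submitted. A minimal Python sketch, reading the slide's rule as "free space of twice the SIP's size"; the function names and the choice of working directory are assumptions.

```python
import shutil
from pathlib import Path


def sip_size_bytes(sip_dir: Path) -> int:
    """Total size of all regular files in the SIP."""
    return sum(p.stat().st_size for p in sip_dir.rglob("*") if p.is_file())


def required_bytes(sip_dir: Path, factor: int = 2) -> int:
    """Free space needed to process the SIP, per the 'double the SIP size' rule of thumb."""
    return factor * sip_size_bytes(sip_dir)


def has_room(sip_dir: Path, work_dir: Path, factor: int = 2) -> bool:
    """True if the file system holding work_dir has enough free space for the SIP."""
    free = shutil.disk_usage(work_dir).free
    return free >= required_bytes(sip_dir, factor)
```

Under this reading, a 40 GB SIP would only be started on a working volume with at least 80 GB free.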
So, Why Did We Not Consider
Archivematica's DIP?
http://wiki.esipfed.org/images/0/0b/OAIS_FunctionalEntities.jpg
So, Why Did We Not Consider
Archivematica's DIP?
We’ve already got a CONTENTdm image repository/search
layer in place, http://images.lib.uconn.edu/
We’re also in the process of evaluating Digital Asset Management (DAM) systems for
possible implementation, http://digitalcommons.uconn.edu/libr_pubs/24/ &
http://digitalcommons.uconn.edu/libr_pres/24/
Preview of New Version of Archivematica (0.7)
We’re very interested in following
how this piece will be developed
Additional Open-Source OAIS
Software Packages
 National Archives of Australia’s Digital
Preservation Software Platform (DPSP),
http://dpsp.sourceforge.net/
 Florida Digital Archive’s DAITSS v.2 (anticipated
February 2011 release) http://daitss.fcla.edu/
 Portuguese National Archives’ RODA (no longer under
active development?)
http://redmine.keep.pt/projects/show/roda-public
Contact
Slavko Manojlovich
Associate University Librarian (IT)
Manager, Digital Archives Initiative
Memorial University of Newfoundland, St. John’s
&
Michael J. Bennett
Digital Projects Librarian &
Institutional Repository Coordinator
University of Connecticut Libraries, Storrs, CT, USA
*This presentation may be downloaded at:
http://digitalcommons.uconn.edu/libr_pres/29