Blue Template - King's College London

Digital Preservation and Management
Preserving Digital Resources: Why is it an Issue?
Technology obsolescence
Digital media life expectancy
Variety of file formats
Digital rights management
Costs
Organizational resistance
Assumptions
Digital preservation is more challenging and
complex than preservation of analog objects
Digital preservation is more than a technical
preservation strategy
“THE” solution doesn’t exist
Digital preservation needs to be integrated into
organizational culture
Assumptions
Change Happens
File formats matter
Non-proprietary is best; de facto standards are good
System architecture and documentation matters
Open systems that can be moved to other platforms
Technology isn’t the whole solution
Policies, planning, and resources
The community is just beginning to work on these issues –
and everything is new and is changing
Terms
Digital Object: Any resource that can be stored
or manipulated by a computer
Digitized Resources: Any resource that has been
digitized from an analog source
Born Digital: Any resource that was created
digitally and will be managed and preserved
digitally
Terms
Digital preservation/archiving: Storage,
maintenance, and access to a digital object over
the long term, usually as a consequence of
applying one or more preservation strategies
Terms
Viability: maintenance of the bitstream
Renderability: viewable by humans and
“processable” by computers
Understandability: interpretable by humans
Fixity: The state or quality of being fixed or
unchanged.
Reliability: the digital objects are created in a
trustworthy way. They are what they say they
are
Authenticity: the digital object remains reliable
over time
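Fixity is commonly checked by recording a cryptographic digest when an object is accepted and recomputing it later. A minimal sketch, assuming SHA-256 as the digest; the helper names `sha256_of` and `fixity_ok` are hypothetical and not part of the original slides:

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def fixity_ok(path, recorded_digest):
    """True if the file still matches the digest recorded earlier."""
    return sha256_of(path) == recorded_digest
```

A matching digest shows the bitstream is unchanged (viability and fixity); it says nothing about whether the object is still renderable or understandable.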
Digital Preservation Strategies
Bitstream Copying
Refreshing
Durable/Persistent Media
Technology Preservation
Digital Archaeology
Analog Backups
Migration
Replication
Reliance on Standards
Normalization
Canonicalization
Emulation
Encapsulation
Universal Virtual Computer
Trusted Digital Repositories
A repository whose mission is to provide reliable,
long term access to managed digital resources to
a community, now and in the future.
Trusted Digital Repositories
Attributes
Administrative responsibility
Organizational viability
Financial sustainability
Technological suitability
System security
Procedural accountability
OAIS compliant
Trusted Digital Repositories
Implementation approaches will vary
Approach will depend on:
Context
Users (designated community)
Underlying issue remains constant
Functionality
Reliability and authenticity
Open Archival Information System (OAIS) Reference Model
Conceptual framework for an archival system
dedicated to preserving and maintaining access
to digital information over the long term
Consists of people and systems
http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html (overview)
http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf (standard)
OAIS: What is it?
Any organization or system charged with the
task of preserving information over the long
term and making it accessible to a specific group
of users
An OAIS archive is expected to meet certain
minimum responsibilities
OAIS: Minimum Responsibilities
Negotiate and accept appropriate information
from information creators
Obtain sufficient control over the information to
ensure preservation
Determine the scope of the “Designated
Community” (the users)
Ensure that users can understand the
information without assistance from the
information creators
OAIS: Minimum Responsibilities
Follow documented policies and procedures
Ensure preservation
Authenticate information
Disseminate (provide access to) information
Make the information available to the
Designated Community
Preservation Planning
Monitoring technology and users; developing
preservation actions
Preservation planning is part of the
administration functions of any archival
program; OAIS has highlighted it as a distinct
function
The importance of constant, ongoing
management and planning for digital
preservation calls for this
Components of a Digital Preservation Program
TDR and OAIS imply that there are three
components of a digital preservation program
Resource Framework (trust)
Organizational Infrastructure (policy)
Technological Infrastructure (technology)
Resource Framework
Nothing is sustainable without ongoing
commitment of resources
A high level commitment to digital preservation
must demonstrate an adequate resource
commitment
Deliverables that meet the goals
Line item budgets
Staff commitment
Strategic planning
Projections for costs and funding scenarios
Resource Framework
Commitment of resources (time, money, staff) implies
organizational commitment and reflects organizational
priorities
Staffing is the expensive part!
Curatorial functions
Appraising, acquiring, processing, metadata creation,
ongoing management, access
Technical functions
Computer operation, system administrator, database
administrator, storage administrator, application
programmer, preservation expertise
Planning
Identify stakeholders and their roles
Educate
All partners need a desired outcome
Tangible or intangible
Buy-in
Mission, goals, outcomes
Organizational Infrastructure
Organizational and Curatorial Responsibilities
Policy framework
Operational Responsibilities
Planning framework
Functions and roles
Organizational and Curatorial Responsibilities –
Policy Framework
Strategic Plan
Collection Policy
Security Policy
Preservation Policy
Access Policy
Strategic Plan
Overview and scope of the digital preservation
program and its context
Mission/Purpose
High level goals and objectives
Commitment to OAIS and community best
practices
Related documentation and who is responsible
Administrative/Oversight structure
High level audience statement
Audience (Designated Community)
OAIS requirement
Explicit
All collections
Per collection
Audience = assumed knowledge and resources
Impacts of Audience Identification
The kinds of collections you will accept
The kind of descriptive information (metadata)
you will provide
The kind of services you will offer
Software, translators
The kind of preservation actions chosen
Significant properties
The access mechanisms you need to provide
Collection Policy
What kinds of digital resources are you going to
collect and digitally preserve?
Content considerations
Are you focusing on a specific content area?
Rights management considerations
Metadata responsibilities and requirements
Requirements for documenting acquisitions
Collection Policy
Technical considerations
Digitization with no physical counterpart
Digitization with a physical counterpart
Anything born digital
Born digital that can’t be reformatted to an
eye-readable form
Collection Policy
Are there further limitations on what you will collect?
(examples)
Non-proprietary formats only
Specific formats only (TIFF)
Systems/databases only
Distinct documents only
Minimum amount of metadata required at time of
acquisition
Materials that can be digitally reformatted in a specific
way
Move everything to TIFF?
Move everything to XML?
Documenting Acquisitions
OAIS requires agreements with depositors that address
acquisition, maintenance, access and withdrawal
Should already be using these kinds of agreements
May need to revise for digital materials, to include
What happens if functionality is lost?
Is reformatting to eye-readable an acceptable
preservation option?
What kind of access can you provide and is it
acceptable?
Are there digital-specific copyright issues to consider?
Documenting Acquisitions
May need to revise for digital materials, to
include
Metadata creation responsibilities
Rights management
What level of functionality will be available
from the digital repository?
Security Policy
System security
Physical environment
Backup and recovery
Fixity of the data (reliability)
Disaster preparedness and response
Planning and documentation requirements
Assign responsibility
Preservation Policy
Commitment to digital preservation
Goals of digital preservation
Scope of materials
Formats
Metadata suppliers
Access commitments
Preservation Policy
Definition of overall preservation strategy
Are there limitations?
What happens if preservation actions go
wrong?
Is reformatting to eye-readable an acceptable
preservation action? Under what
circumstances?
Planning and documentation requirements
Responsibilities assigned
Operational Responsibilities
Based on work done by the OAIS community to
define the principal obligations of an
OAIS-compliant repository
Appropriate planning documentation will be
necessary to carry out operations
Specific planning based on strategic plan and
policies
Operational Responsibilities
Acquisition
Physical and intellectual control
Determines audience (designated community)
Follows policies and procedures to assure
preservation of authentic information
Access
Promotes development of best practices and
standards
Acquisition
Development of collection policies
Includes specific required formats, if appropriate
Procedures and workflows for copyright clearance for
access and preservation
Metadata specifications and implementation
Procedures to ensure the authenticity of submitted
material
Assessment of the completeness of the submission
Documentation of all acquisition transactions
Control
Preparing the materials for storage
Content analysis
Significant properties
Verification of metadata
Unique and persistent identifier assigned
Authenticity and integrity check
Move to archival storage
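The control steps above can be sketched as a single ingest function. This is an illustration under stated assumptions, not a prescribed OAIS workflow: the required metadata fields, the use of a UUID as the identifier, the one-directory-per-object layout and the `record.json` file name are all hypothetical choices.

```python
import hashlib
import json
import shutil
import uuid
from pathlib import Path

# Hypothetical minimum metadata; a real repository sets this in its policies.
REQUIRED_FIELDS = ("title", "creator", "format")

def ingest(source, metadata, expected_sha256, archive_root):
    """Run the control steps on one submitted file; return its new identifier."""
    source = Path(source)
    # Verification of metadata: refuse submissions missing required fields.
    missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
    if missing:
        raise ValueError(f"missing metadata: {', '.join(missing)}")
    # Authenticity and integrity check against the depositor's digest.
    actual = hashlib.sha256(source.read_bytes()).hexdigest()
    if actual != expected_sha256:
        raise ValueError("checksum mismatch: file differs from what was submitted")
    # Unique identifier (a UUID here; it stays persistent only if never reused).
    object_id = uuid.uuid4().hex
    # Move to archival storage: one directory per object, content plus record.
    target = Path(archive_root) / object_id
    target.mkdir(parents=True)
    shutil.copy2(source, target / source.name)
    record = {"id": object_id, "sha256": actual, "metadata": metadata}
    (target / "record.json").write_text(json.dumps(record, indent=2))
    return object_id
```

Storing the digest alongside the object lets later fixity checks run without going back to the depositor.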
Preservation Actions
Monitoring of technology and the digital
materials
Technology watch
Preservation planning
Classes of material
Actions to be taken
Documentation of actions and results
Functionality considerations
Access
A system for resource discovery
Mechanism for authenticity check
Access control mechanisms
User support
Standards and Best Practices
Promote and utilize
Results in economies of scale
Creation of high quality digital resources that
are more amenable to preservation
Work with software suppliers, potential
depositors, designated communities
In-house
Significant investment
Technical expertise
Workflow impacts
Maintain physical control
Outsource
Can the service provider meet your needs and
requirements?
Less investment?
No cost models exist to show whether this is accurate
Less reliance on in-house technical expertise and
infrastructure
What happens if the service provider goes out of
business?
Combination
Build what you can
Build what you need that can’t be outsourced
Buy what you can’t build
Now, digital repositories…
OAIS Metadata Implications
Metadata is data that facilitates the
management, description, and preservation of a
digital object or aggregation of digital objects.
Standards and best practices are developed to
promote the creation of metadata so that it supports
interoperability and collaboration.
Metadata sets
Metadata encoding schema
Types of Metadata
Descriptive
Technical
Structural
Administrative
Preservation
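One way to picture the five types is as one record per digital object with a section for each type. A minimal sketch; the class name `ObjectMetadata`, the use of plain dictionaries and the example field names are assumptions for illustration, not a metadata standard:

```python
from dataclasses import asdict, dataclass, field

@dataclass
class ObjectMetadata:
    """One record per digital object, with a section per metadata type."""
    descriptive: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    structural: dict = field(default_factory=dict)
    administrative: dict = field(default_factory=dict)
    preservation: dict = field(default_factory=dict)

    def missing_types(self):
        """List the metadata types that have no entries yet."""
        return [name for name, values in asdict(self).items() if not values]

record = ObjectMetadata(
    descriptive={"title": "Annual report 2003"},
    technical={"format": "image/tiff"},
)
print(record.missing_types())  # ['structural', 'administrative', 'preservation']
```

A check like `missing_types` could back a collection policy that sets a minimum amount of metadata at the time of acquisition.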
Metadata
Each type of metadata will be needed to
facilitate the preservation and usability of born
digital material
Use standards and best practice metadata sets
Think interoperability
Technologically
Element sets
Immediate Actions
Get Your Team Together
Identify your needs
Do you really need a digital repository right
NOW?
Is there an interim solution until the field is
more settled?
Agree on vision and goals
Plan
Immediate Actions
Discuss strategy
Communication
Any institutional repository depends on a
relationship with IT staff
Priorities
Language barriers
Immediate Actions
Identify the organizational infrastructure
changes that need to be made
Investigate existing tools and digital repositories
Learn and experiment with existing tools
Make high level decisions
What kind of digital materials are we going to
commit to preserving?
Immediate Actions
Funding
Inventories of digital resources
Establish metadata standards and practices
Identify and understand users
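An inventory of digital resources can start as a simple count of files and bytes per file extension. A rough sketch, assuming the resources sit on a file system that can be walked; the function name `inventory_by_extension` is hypothetical, and an extension is only a hint at the real format:

```python
import os
from collections import Counter

def inventory_by_extension(root):
    """Count files and total bytes under root, grouped by file extension."""
    counts, sizes = Counter(), Counter()
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            # Lower-case the extension so .TIF and .tif are counted together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(folder, name))
    return counts, sizes
```

Calling `inventory_by_extension("/path/to/share")` returns two counters; sorting them with `counts.most_common()` shows which formats dominate and therefore which ones the high-level decisions need to cover first.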
Take Home Concepts
Use standards and best practices
The solution is complex; the tools are incomplete
Organizational and technological challenges
Learn about what others are doing and
build on it
Don’t reinvent the wheel
Take Home Concepts
Resources are the issue
People, not computers!
Expect and plan for change
This is all a work in progress
“First generation” technologies, tools,
understanding of issues
You will redo work
Existing Tools
Tools
Technical tools
Interfaces, infrastructure and technologies that allow
you to do the work necessary to create, manage and
preserve digital resources
Examples might include:
Metadata creation
File format verification
Algorithms for fixity checks
Appraisal/processing tools
Access tools – indexing, finding aids, etc.
Acquisition tools
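File format verification, at its simplest, compares the first bytes of a file with the signature its claimed format should carry. A minimal sketch covering five common formats; the table and the function names are illustrative assumptions, and a production tool would check far more than the header:

```python
# Leading byte signatures ("magic numbers") for a few common formats.
SIGNATURES = {
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian headers
    "PDF": (b"%PDF-",),
    "PNG": (b"\x89PNG\r\n\x1a\n",),
    "JPEG": (b"\xff\xd8\xff",),
    "GIF": (b"GIF87a", b"GIF89a"),
}

def identify_format(path):
    """Return the format name whose signature starts the file, or None."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for name, magics in SIGNATURES.items():
        # bytes.startswith accepts a tuple and matches any member.
        if header.startswith(magics):
            return name
    return None

def format_matches(path, claimed):
    """True if the file's signature agrees with the claimed format name."""
    return identify_format(path) == claimed
```

A signature match only shows that the file starts like the claimed format; it does not prove the rest of the file is well formed.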
Tools
Few currently exist
Options
Wait
Build your own
Modify existing tools
Use what there is
Tools
DSpace
Fedora™
LOCKSS
Greenstone
OCLC Digital Archive
DSpace
A specialized content management system that:
manages and distributes digital items
allows for creation, indexing and searching of
metadata
supports long term preservation of material
designed to make submission and
administration easy
DSpace
Developed by MIT and Hewlett Packard
Based on freely available software
can use proprietary software as well with
minor modifications
Customizable
Academic community is especially active in the
use of this implementation
UNIX based; written in Java
DSpace
No support available
Preservation is done locally and is not inherent
in the system
Downloads and specific information at
http://www.dspace.org
DSpace Demo - MIT Press
https://hpds1.mit.edu/handle/1721.1/1776
Fedora™
Flexible Extensible Digital Object and Repository
Architecture
“An Open-Source Digital Repository
Management System” – the architectural
underpinning or plumbing
Used to support institutional repositories, digital
libraries, content management, digital asset
management, scholarly publishing, and digital
preservation
Fedora™
Cornell and University of Virginia, funded by
Mellon
Freely available
Based on open source software and web based
technologies
Limited interfaces
Management
Access
Access Lite
Fedora™ Architectural Model
Data Object: Persistent ID (PID), System Metadata, Disseminators, Datastreams
Behavior Definition Object: Persistent ID (PID), System Metadata, Method Definition Metadata, Datastreams (specs)
Behavior Mechanism Object: Persistent ID (PID), System Metadata, Method Implementation Metadata, Datastreams (executables)
Fedora™
Installs on Windows PC
Packaged to get up and running quickly
Demo set of objects
Scales with hardware in a production
environment
No support available
Plumbing only; no inherent preservation
Downloads and information available at
http://www.fedora.info
LOCKSS
Lots of Copies Keeps Stuff Safe
Safeguards the web journals that libraries subscribe to
Mimics the way libraries manage paper
collections
Redundant, distributed, decentralized
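The idea behind keeping many copies can be illustrated with a simple majority vote: if most copies agree, the odd one out is treated as damaged and repaired from a good copy. This sketch is not the actual LOCKSS polling protocol; the in-memory `copies` mapping and the function names are assumptions made for illustration.

```python
import hashlib
from collections import Counter

def audit_copies(copies):
    """copies maps a site name to that site's bytes for one object.

    Returns (a site holding the majority version, list of sites that differ).
    """
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority there is no basis for choosing a good copy.
    if votes <= len(copies) / 2:
        raise ValueError("no majority: cannot tell which copy is good")
    good_site = next(site for site, d in digests.items() if d == winner)
    damaged = [site for site, d in digests.items() if d != winner]
    return good_site, damaged

def repair(copies):
    """Overwrite minority copies with the majority version; return repaired sites."""
    good_site, damaged = audit_copies(copies)
    for site in damaged:
        copies[site] = copies[good_site]
    return damaged
```

With three sites holding `b"x"`, `b"x"` and `b"y"`, `repair` replaces the third copy; with only two copies that disagree, it refuses to guess.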
LOCKSS
Works only for HTTP/HTML standard file types
(html, jpeg, gif, pdf, etc)
Open source code
It can be modified
Designed to be low cost, low time
Will run on a dedicated PC
PC specs available on the LOCKSS site
LOCKSS
Publishers can prevent LOCKSS from caching
their content
Publishers must give libraries permission
Licensing language available on the LOCKSS
web site
Freely available
No support (ease of use is highlighted)
Preservation is not inherent
http://lockss.stanford.edu/
Greenstone
A suite of software for building and distributing
digital library collections
Produced by the New Zealand Digital Library
Project at the University of Waikato
Developed and distributed in cooperation with
UNESCO and the Human Info NGO.
Open-source, multilingual software, issued
under the terms of the GNU General Public
License.
Greenstone
“Should in fact work on any Windows or Unix
system.”
“Local library”
“Web library”
Greenstone Librarian Interface
The “Organizer”
Greenstone
Documentation is available
Installer's Guide
Developer's Guide
Paper to Collection
Inside Greenstone Collections
MG/MG++
Workshops are also held
Listservs for implementors
Some technical support available
Not preservation oriented
http://www.greenstone.org/cgi-bin/library
OCLC Digital Archive
Standards based
OAIS compliant
METS encoded dissemination packages
Phased support for various formats and material types
Currently text and still image
Can integrate with current library selection and cataloging
activities
Content owner manages the archived objects and
determines access
Known costs
Offers bit preservation
OCLC Digital Archive Functions
Harvest from web
preview and review
Metadata creation
Ingest
From web or batch
Access management
public or restricted
Viewing
Dissemination
Reports
Periodic Audits of Objects in the Archive
Frequent Backups and Disaster Prevention
Digital Archive Web Services
End User Access
OCLC Digital Archive Development
Preservation policy and plans in progress
Expanding formats and object types accepted
Active in development of preservation metadata
standard and will comply
Active in developing digital repository certification
Additional information available at:
http://www.oclc.org/support/training/digitalarchive/
http://www.oclc.org/support/documentation/digitalarchive/
Other Tools
Australian PANDAS-PANDORA
CONTENTdm (content management)
SDSC Data Grid Technology
Web harvesting tools
E-records management software
Document management systems
Data warehousing technology
XML parsing tools
SDSC and others