Digital Repository Preservation Service ________________________ Digital Dissemination Task Force April 24, 2008 Meg Bellinger, AUL Roy Lechich, Audrey Novak, ILTS.


Digital Repository
Preservation Service
________________________
Digital Dissemination Task Force
April 24, 2008
Meg Bellinger, AUL
Roy Lechich, Audrey Novak, ILTS
Yale Cyber Infrastructure Architecture
Infrastructure Framework,
Protocols, Standards
Web services, Z39.50, OAI-PMH,
RSS, SRU/SRW, OAIS, Fedora
Common Services
Persistent identification, Authentication &
Authorization, Registries, Rights Management
Content Provision:
Services & Storage
For digital collections,
preservation,
metadata
Users
From library,
museums, research,
academic and
administrative
departments
Yale and global
Fusion:
Services, Tools, Applications
Presentation: Interfaces
Yale uPortal, Classesv2, Google,
Personal Information Environment,
Discipline specific, gallery, museum
and library sites
Brokers, aggregators, indexes,
catalogs, MetaLib, XSearch
Based on a graphic created by Lorcan Dempsey
Content
Sources
Yale University Library
Dissemination
Digital Repository Service
Full Text
Books
Google,
MSN, Yahoo …
Finding
Aids
Content
Image
Commons
Collections
Environment
Classes*v2
(Sakai)
Audio
& Video
University
Portal
Images and
Metadata
Integration
Services
E-Publishing
(Institutional
Repository)
Complex
Objects
Preservation
Archive
Collections
XSearch
Research
Data
Personal
Collections
Library,
MetaLib
Metadata
VITAL
Outline
______________________________________
• Introduction
• Background
• Digital Preservation Repository
– Phase I
– Additional Phases
• Within the Larger Landscape
24 Apr 2008
Intro: What is Digital Preservation?
__________________________________________________
“Digital preservation is the whole of
the activities and processes involved
in the physical and intellectual
protection and technical stabilization
of digital resources through time in
order to reproduce authentic copies of
these resources.” (YUL Digital
Preservation Policy)
Introduction: The Need
___________________________________________________________________
Mass
Digitization
At an ever
accelerating pace,
faculty, students,
and staff (e.g., the
Library) are
creating, sharing,
and storing digital
information for
teaching, learning,
research,
administrative, and
creative purposes.
Statistical Datasets
Images
Information in
digital form is
now integral to
Yale's core
mission.
Scientific &
Biomedical Data
Audio,
Video,
Podcasts
Web Sites
Introduction: The Need
__________________________________________________
• Digital resources are fragile and the preservation of
these resources is complex.
• Digital preservation is dynamic
– Responses to technological obsolescence or media decay
must be taken quickly.
• Digital preservation is proactive rather than reactive
– The prospects for successfully preserving digital resources
rest heavily upon decisions taken at each stage of their
life cycle, starting with creation.
Introduction: The Need
_____________________________________________________
Digital Landscapes Committee,
Cyberinfrastructure Survey (Oct 2006)
Ranking from 19 survey questions posed to faculty:
#1 Easier electronic access to scholarly materials
#2 Providing students with digital access to research
and instructional materials
#11 Ensuring the preservation of my scholarly digital
output (e.g., datasets, research notes, e-prints)
Introduction: The Need
_____________________________________________________
“The coolest thing that will be done
with your data someone else will
do.” (Open Repositories 08)
Background – YUL Related Initiatives
_____________________________________________________
• IAC Rescue Repository
– 2004 - present
• IAC Digital Preservation Committee
– Nov 2004 - Jan 2007
• IAC Metadata Committee
– Nov 2004 - Feb 2007
– PREMIS - Preservation Metadata Task Force
• April - Oct 2006
Rescue Repository
(May 2004 Requirements Report)
______________________________________
“An increasing number of projects in the YUL are
generating or acquiring digital content …”
“The digital masters for much of this material are in
immediate danger of permanent loss through
media decay, physical damage, technological
obsolescence, or difficulties in archival
management…”
“…in the interim, we propose a flexible and agile/quick
short-term solution…”
Rescue Repository Description
_____________________________________________________
• Managed, secure storage (disk-to-disk-to-tape).
• Resources are organized according to owning
library, collection, subcollection(s), file name.
• Activity is managed by simple ingest and
retrieval applications with basic file verification
and validation.
• A ~3 year temporary solution (May 2005 +3
yrs).
• Heavily used …
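The layout and verification steps described above can be sketched in a few lines. This is an illustration only, not the Rescue Repository's actual ingest application: the function names, the use of SHA-256, and the directory layout on disk are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, root: Path, library: str, collection: str,
           subcollections: tuple[str, ...] = ()) -> Path:
    """Copy `source` to root/library/collection/subcollection(s)/file name
    and confirm that the stored copy matches the original (hypothetical API)."""
    target_dir = root.joinpath(library, collection, *subcollections)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name

    expected = sha256_of(source)      # checksum before the copy
    shutil.copy2(source, target)      # copy file contents and timestamps
    if sha256_of(target) != expected: # basic file verification after the copy
        target.unlink()
        raise IOError(f"checksum mismatch after copying {source.name}")
    return target
```

A call such as ingest(Path("page001.tif"), Path("/repo"), "BRBL", "maps", ("atlas",)) would store the file under /repo/BRBL/maps/atlas/; the collection names here are invented.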
Users: BRBL, Div, E-Collections, Geo, LWL, MSS/A, Peabody,
Preservation, SSL, VRC, YUAG
RR Storage Usage (chart: storage in TB, 0 to 60, by date, Oct-05 to Dec-08)
Date: Oct-05, Jan-06, Jul-06, Jan-07, Mar-07, Jun-07, Oct-07, Nov-07, Jun-08, Sep-08, Dec-08
Total Storage: 13.6, 13.6, 13.6, 13.6, 13.6, 13.6, 28.8, 36, 36
Used Storage: 0.419, 0.698, 5.7, 8.4, 9.3, 9.9, 14.1, 19, 36
Available Storage: 13.1, 13, 8, 5.3, 4.5, 3.8, 14.7, 21, 0
Remaining values (series not labeled): 43.5, 53
Digital Preservation Committee
___________________________________________________________________
• Preservation Policy – Defines digital
preservation; establishes general principles
about what is preserved; promulgates our
commitment to standards.
• Best Practices – A dynamic suite of
documents that address current best
practices for preservation-related issues
such as format validation, registries, etc.
Metadata Committee
____________________________________________
Preservation Metadata Taskforce (PREMIS)
Report
• PREMIS (PREservation Metadata
Implementation Strategies) defines the
metadata needed to preserve digital
information assets for the long term.
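As a rough illustration of the kind of information preservation metadata records, the sketch below builds a small object description with an identifier, fixity, size, and format. The key names are modeled on PREMIS semantic units, but the record is a plain Python dictionary, not a schema-valid PREMIS document; the function name, the "local" identifier type, and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def premis_like_object(path: Path, identifier: str, format_name: str) -> dict:
    """Describe one file using key names modeled on PREMIS semantic units.

    Hypothetical helper for illustration; it does not validate against
    the PREMIS schema.
    """
    data = path.read_bytes()
    return {
        "objectIdentifier": {
            "objectIdentifierType": "local",  # assumed identifier scheme
            "objectIdentifierValue": identifier,
        },
        "objectCharacteristics": {
            "fixity": {
                "messageDigestAlgorithm": "SHA-256",  # assumed algorithm
                "messageDigest": hashlib.sha256(data).hexdigest(),
            },
            "size": len(data),  # size in bytes
            "format": {"formatDesignation": {"formatName": format_name}},
        },
    }
```

Recording fixity and format at deposit time is what later makes integrity checks and format migrations possible.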
Digital Preservation Need and Related
Initiatives Summary
_____________________________________________________
• The demand for a Digital Preservation Repository
from faculty, Rescue Repository users,
digitization operations and projects is heavy.
• The Rescue Repository and work by the IAC
Digital Preservation and Metadata/PREMIS
Committees laid the foundation.
• Rescue Repository is reaching its planned end of
life.
Digital Preservation Repository: Phase I
_________________________________________________________
• $500,000 funding from the Provost to establish
a Digital Preservation Repository prototype.
– Provide mechanisms and services for preservation
and access to the data.
– Create the scalable hardware infrastructure.
– Demonstrate an extensible repository service
model.
– Develop the resource (staff and economic) models.
– Establish the collaborative campus partnerships.
– Further the research and scholarship into digital
preservation issues.
Digital Preservation Repository: Phase I
_________________________________________________________
Working from two Use Cases:
1. YPED (Yale Protein Expression Database)*
• Protein profiling mass spectrometry data sets
generated by the Keck Lab
2. Images from the Rescue Repository
• Approximately 400,000 individual image files
from the Art Gallery, Beinecke, Divinity
Library, Lewis Walpole Library, Library Visual
Resources Collection, and Manuscripts and
Archives department.
* Proteomics is the large-scale study of proteins and is often considered
the next step in the study of biological systems, after genomics.
Digital Preservation Repository: Phase I
_________________________________________________________
1. Hardware Architecture
2. Software Design
3. Preservation Metadata
4. Use Case: YPED
5. Use Case: Images
Phase I - Hardware
______________________________________
• 20TB YPED and Images
• 30TB Microsoft mass digitization
• 10TB non-images (Rescue Repository)
• 40TB Annual growth with Library digitization
projects
_________
• 250TB Annual growth with Fortunoff video
digitization project
• 1000TBs (a petabyte) within 5 years
• Others?
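The figures above can be combined into a rough projection. The slide does not say how the numbers add up, so the sketch assumes that the first three items (20 + 30 + 10 = 60TB) are starting holdings and that the two growth figures (40 + 250 = 290TB per year) apply concurrently and linearly from year one. All names in the code are hypothetical.

```python
def projected_storage_tb(base_tb: float, annual_growth_tb: float,
                         years: int) -> list[float]:
    """Cumulative storage at the end of year 0, 1, ..., `years` (linear growth)."""
    return [base_tb + annual_growth_tb * year for year in range(years + 1)]


def years_to_reach(target_tb: float, base_tb: float,
                   annual_growth_tb: float) -> int:
    """Number of whole years until cumulative storage reaches `target_tb`."""
    if annual_growth_tb <= 0:
        raise ValueError("annual growth must be positive")
    years, total = 0, base_tb
    while total < target_tb:
        years += 1
        total += annual_growth_tb
    return years


# Assumed reading of the slide: starting holdings and concurrent annual growth.
BASE_TB = 20 + 30 + 10          # YPED and images, mass digitization, non-images
ANNUAL_GROWTH_TB = 40 + 250     # Library digitization plus Fortunoff video

print(projected_storage_tb(BASE_TB, ANNUAL_GROWTH_TB, 5))
# [60, 350, 640, 930, 1220, 1510]
print(years_to_reach(1000, BASE_TB, ANNUAL_GROWTH_TB))
# 4
```

Under these assumptions the total passes 1000TB during the fourth year, which is consistent with the slide's estimate of a petabyte within 5 years.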
Phase I - Hardware
______________________________________
Projected Growth in Storage (chart: TB, 0 to 900, by year, 2005 to 2012)
Yale Cyber Infrastructure Architecture
Infrastructure Framework,
Protocols, Standards
Web services, OAI-PMH, OAIS,
Fedora, METS, MVC, SOA
Common Services
Persistent identification, Authentication &
Authorization, Registries, Rights Management
Content Provision:
Services & Storage
For digital collections,
preservation,
metadata
Users
From library,
museums, research,
academic and
administrative
departments
Yale and global
Fusion:
Services, Tools, Applications
Presentation: Interfaces
Yale uPortal, Classesv2, Google,
Personal Information Environment,
Discipline specific, gallery, museum
and library sites
Brokers, aggregators, indexes,
catalogs, MetaLib, XSearch
Based on a graphic created by Lorcan Dempsey
Software Design
___________________________________________________
Phase I - Core Preservation Functionality
• Deposit, Normalization, Packaging, Validation,
Ingest, Storage (multiple copies, geographic
separation), Preservation Policy Management,
Authorization, OAI-PMH, SRW/SRU, Retrieval
• YPED and Image Use Case Requirements
Additional Phases - More Services
• Preservation actions
• All (or almost all) user-facing services
• Enhanced access & delivery through applications
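Of the Phase I functions listed above, OAI-PMH is a public harvesting protocol, so the shape of a request can be shown without knowing how the repository implements it. The sketch below builds a ListRecords request URL. The base URL is a placeholder, and the helper name and its parameters are assumptions for the example; the verb and argument names (verb, metadataPrefix, set, from, until) come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec          # restrict harvest to one set
    if from_date:
        params["from"] = from_date        # selective harvest: lower date bound
    if until_date:
        params["until"] = until_date      # selective harvest: upper date bound
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real Yale service.
print(list_records_url("https://repository.example.edu/oai",
                       from_date="2008-04-24"))
# https://repository.example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2008-04-24
```

A harvester such as an aggregator or index would issue requests like this one and follow any resumption tokens in the responses to page through the full record set.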
Repository
Deposit / Ingest (SIP)
• Flexible
• Accept Different Types of Data
• Collect Data and Metadata Components
• Normalize for Ingest Processing
• Verify Integrity
• Add Identifiers
• Add Preservation Metadata
Preservation / Storage (AIP)
• Continuous Integrity Checks
• Format Migrations (e.g. TIFF to JPEG 2000)
• Storage Migrations (to new or different type of physical media)
• Logging
• Reporting
Access (DIP)
• Authorization
• Validation
• OAI-PMH
• SRW/SRU
• Indexing
• Retrieval
• Logging
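The three stages in the diagram above follow the OAIS package types: a submission package (SIP) comes in, an archival package (AIP) is stored, and a dissemination package (DIP) goes out. A minimal sketch of that flow follows. The class fields, the function names, the UUID identifiers, and the SHA-256 fixity value are all assumptions for illustration, not the repository's design.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission package: what a depositor hands over."""
    filename: str
    content: bytes
    metadata: dict


@dataclass
class AIP:
    """Archival package: content plus identifier, fixity, and an event log."""
    identifier: str
    filename: str
    content: bytes
    metadata: dict
    checksum: str
    events: list = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination package: what an authorized user receives."""
    identifier: str
    filename: str
    content: bytes
    metadata: dict


def ingest_sip(sip: SIP) -> AIP:
    """Add an identifier and preservation metadata (here, only a checksum)."""
    checksum = hashlib.sha256(sip.content).hexdigest()
    return AIP(str(uuid.uuid4()), sip.filename, sip.content,
               dict(sip.metadata), checksum, events=["ingest"])


def verify(aip: AIP) -> bool:
    """Integrity check: recompute the checksum and log the outcome."""
    ok = hashlib.sha256(aip.content).hexdigest() == aip.checksum
    aip.events.append("fixity check: pass" if ok else "fixity check: fail")
    return ok


def disseminate(aip: AIP, authorized: bool) -> DIP:
    """Authorize, validate, then hand out a copy for access."""
    if not authorized:
        raise PermissionError("requester is not authorized")
    if not verify(aip):
        raise ValueError(f"{aip.identifier} failed its integrity check")
    aip.events.append("retrieval")
    return DIP(aip.identifier, aip.filename, aip.content, dict(aip.metadata))
```

Keeping the event log on the AIP mirrors the Logging and Reporting entries in the diagram: every check and retrieval leaves a record that can be reported on later.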
Digital Preservation Repository – Phase I
Summary
_____________________________________________________
Build:
• Hardware environment
• Core preservation repository services
• Project specific service components
needed for YPED and to replace Rescue
Repository
• Migration of Rescue Repository image
content
Additional Phases
_____________________________________________________
Examples:
• Full Rescue Repository migration
• More content (project/use cases)
– Project specific ingest and access
• More storage (950TBs)
• Preservation actions (integrity checks, format
migrations, etc.)
• Reporting
• Rights Management
5 years, 6FTE, ~7 million dollars
Larger Landscape
____________________________________________
Peer Institutions:
• Stanford, Harvard
• Rutgers
• DAITSS (Florida)
• Michigan
• Columbia
Internationally:
• European National Libraries
• Australia & New Zealand
Thank you
Q&A