DIGITAL PRESERVATION
IN HYDRA/FEDORA
March 24, 2015
GET AHEAD ON YOUR REPOSITORY
About Hydra/Fedora
• Flexible Extensible Digital Object Repository Architecture
• Open-source project
• Provides a platform for digital preservation and presentation
• Used by hundreds of organizations, with over 52 Fedora Members contributing financially; Yale is one of these.
• Originally developed at Cornell, now led by Fedora Project Steering Group under stewardship of DuraSpace.org (http://www.fedora-commons.org)
• Yale is also a Fedora development partner, and Mike Friscia serves on the Fedora Leadership Committee
• Currently actively engaged in development of Fedora 4
Hydra
• Began in 2008 as collaboration between Stanford,
UVA, Univ. of Hull, and Fedora Commons
• YUL joined in 2013 as 18th member. Membership
now up to around 27—recent additions include
Princeton, Cornell, Case Western
• Another 25 or more institutions are working in the
Hydra framework without yet being formal
members, including Brown, Johns Hopkins, Trinity
College Dublin, Oxford, UC Berkeley and others
Hydra Partners
[Chart: number of Hydra partners at each Open Repositories (OR) conference, OR09 through OR14; vertical axis 0 to 25]
• DuraSpace (f)
• Stanford University (f)
• University of Hull (f)
• University of Virginia (f)
• MediaShelf
• University of Notre Dame
• Northwestern University
• Columbia University
• Penn State University
• Indiana University
• London School of Economics and Political Science
• Rock and Roll Hall of Fame and Museum
• Royal Library of Denmark
• Data Curation Experts
• WGBH
• Boston Public Library
• Duke University
• Yale University
• Virginia Tech
• University of Cincinnati
• Princeton University Library
• Cornell University
• Oregon Digital (University of Oregon and Oregon State University)
• Case Western Reserve University
• Tufts University
• Duoc UC
• University of Alberta
A Worldwide Presence
Hydra at Yale
What Is Hydra?
1. A framework for repository-powered applications, with
   • multiple, tailored UIs, and a
   • robust repository back end
   "One body, many heads"
2. A set of solution bundles
3. A community
   "If you want to go fast, go alone. If you want to go far, go together."
[Diagram of the Hydra at Yale architecture. Labels: Data Import; Hydra Interface (IT use only); Single Image Zoom; Bookreader; Complex Object Display; Hydra-Head; Blacklight; Creating and managing objects (CRUD); Discovering and viewing objects (R); Downloadable PDF; Search/Facet Logic; Active Fedora and Solrizer; Hydra Access Controls; Image Request; Metadata; Images; Ladybird (Yale’s Cataloging Tool); Fedora (Preservation); Link to Images; RSS; SQL Server; Managed Storage; Solr (Index); Image Retrieval; Media Server]
Content model
Access Conditions
• Defined for each file in a content model
• Wide range of authorization definitions
• Customizable
• Example:
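The slide's own example is not reproduced in this transcript. As a purely hypothetical illustration of access conditions defined per file in a content model, the sketch below uses invented file names, group names, and a `FileAccess` shape; none of these come from Yale's actual configuration or from the Hydra access-controls API.

```python
from dataclasses import dataclass, field


@dataclass
class FileAccess:
    """Access condition for one file in a content model (hypothetical shape)."""
    read_groups: set = field(default_factory=set)  # groups allowed to view
    edit_groups: set = field(default_factory=set)  # groups allowed to modify


# Hypothetical content model: every file carries its own access condition.
CONTENT_MODEL = {
    "master.tif": FileAccess(read_groups={"staff"}, edit_groups={"staff"}),
    "access.jpg": FileAccess(read_groups={"public", "staff"}, edit_groups={"staff"}),
    "metadata.xml": FileAccess(read_groups={"public", "staff"}, edit_groups={"catalogers"}),
}


def can(action, filename, user_groups):
    """Return True if any of the user's groups is granted `action` on the file."""
    if action not in ("read", "edit"):
        raise ValueError(f"unknown action: {action}")
    access = CONTENT_MODEL.get(filename)
    if access is None:
        return False  # files missing from the model are denied by default
    allowed = access.read_groups if action == "read" else access.edit_groups
    return bool(allowed & set(user_groups))
```

Under these assumptions, `can("read", "access.jpg", ["public"])` is allowed while `can("read", "master.tif", ["public"])` is denied, which is the kind of per-file distinction the bullets describe.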
Ingest Workflow
Research Data into Hydra
• Colectica software exports contents in BagIt format
• Bag enters a watched folder in Ladybird
• Ladybird validates the bag contents
  • Checksum validation
  • File characterization
• Ladybird maintains the original file hierarchy as a collection of complex objects
• Each Ladybird object mapped to an Unstructured Content Model
• Each content model is then ingested into Hydra
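The checksum-validation step in the workflow above can be sketched roughly as follows. This is not Ladybird's code; it is a minimal standard-library illustration that assumes the bag carries a payload manifest named `manifest-<algorithm>.txt` whose lines pair a checksum with a path relative to the bag root, as BagIt payload manifests do. The function name `validate_bag_checksums` is hypothetical.

```python
import hashlib
from pathlib import Path


def validate_bag_checksums(bag_dir, algorithm="sha256"):
    """Compare each file listed in the payload manifest with its recorded checksum.

    Returns a list of problem descriptions; an empty list means every
    listed file was found and matched.
    """
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    if not manifest.is_file():
        return [f"missing {manifest.name}"]

    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            problems.append(f"malformed manifest line: {line!r}")
            continue
        expected, rel_path = parts[0].lower(), parts[1].strip()
        target = bag / rel_path
        if not target.is_file():
            problems.append(f"missing file: {rel_path}")
            continue
        digest = hashlib.new(algorithm)
        with target.open("rb") as fh:
            # Read in 1 MiB chunks so large files are not loaded whole.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

A watched-folder process could call `validate_bag_checksums(path_to_bag)` and reject the bag when the returned list is non-empty. The sketch only checks files named in the manifest; it does not verify `bagit.txt`, tag manifests, or unlisted files in the payload directory.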
Unstructured Content model
DPN
Digital Preservation in Hydra
Hydra Solution Bundles
• Sufia
• CurateND
• ScholarSphere
• HydraDAM
• Argo
• Chronos
Preservation Profiles
[Table: each preservation profile below is marked (X) against the columns Encrypted, Storage pillars (I, II, III, IV, V-VIII, IX, XI), and Integrity check; the per-profile marks are not recoverable here]
1: Storage without bit preservation
2: Digital born collection of material that has access restrictions
3: Legally deposited born digital material that is not in the Webarchive
4: Born digital collection material, without access restrictions
5: Retro digitized (expensive) materials with analog copies
6: Secret digital materials
7: Top secret digital materials
Future Development
Fedora 4
Roadmap:
• Audit Service
• Portland Common Data Model
• Migration Tools
• Asynchronous Storage
• Linked Data Platform
• Managed External Data Streams
Fedora 4 Auditing
• Track Events: agent, date, activity, entity
• Allow import/export of events
• High performance
• Stored separate from repository entities
• Export in RDF format
• Provide SPARQL-Query search endpoint
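To make the event fields and the RDF export concrete, here is a small sketch of an audit event with agent, date, activity, and entity, serialized as N-Triples. The choice of the W3C PROV-O vocabulary, the `AuditEvent` class, and the example URIs are assumptions for illustration; the slides do not say which vocabulary or API the Fedora 4 audit service uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


@dataclass(frozen=True)
class AuditEvent:
    """One audit record, kept under its own URI, separate from the entity it describes."""
    event_uri: str   # identifier of the event itself
    agent_uri: str   # who or what performed the activity
    entity_uri: str  # repository resource that was acted on
    activity: str    # human-readable activity name, e.g. "fixity check"
    date: datetime   # timezone-aware timestamp

    def to_ntriples(self):
        """Serialize the event as N-Triples using PROV-O terms (an assumed mapping)."""
        when = self.date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        # Escape backslashes and double quotes for the N-Triples string literal.
        label = self.activity.replace("\\", "\\\\").replace('"', '\\"')
        lines = [
            f"<{self.event_uri}> <{RDF_TYPE}> <{PROV}Activity> .",
            f"<{self.event_uri}> <{PROV}wasAssociatedWith> <{self.agent_uri}> .",
            f"<{self.event_uri}> <{PROV}used> <{self.entity_uri}> .",
            f'<{self.event_uri}> <{PROV}startedAtTime> "{when}"^^<{XSD_DATETIME}> .',
            f'<{self.event_uri}> <{RDFS_LABEL}> "{label}" .',
        ]
        return "\n".join(lines) + "\n"
```

Triples in this form could be loaded into any triplestore and queried with SPARQL, which matches the export and search goals listed above; the actual service may model events differently.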
Portland Common Data Model
HydraDAM2
http://news.indiana.edu/releases/iu/2014/12/neh-grants-digital-preservation.shtml
Hydra Infrastructure
Hydra Architecture
• Open source, community developed software
  • Fedora Commons
  • Apache Solr
  • Blacklight
  • MySQL
• Hydra Project open source, community developed software
• Locally developed software: Ladybird, Media Delivery Service
Repository Storage – Current State
[Diagram. Locations: New Haven/West Haven, CT; Rocky Hill, CT. Components: Repository; Yale ITS disk-based enterprise storage; Yale Library tape-based archival storage; Iron Mtn. offline replicated set (tape)]
Repository Storage – Current State
Risks of current state:
• Data resides in single region, the Northeast
• Tape media handling and refresh constraints at petabyte scale
• One month window in which primary and backup are in same location
Repository Storage – Future State
[Diagram. Locations: New Haven/West Haven, CT; out of region. Components: Repository; Yale ITS disk-based enterprise storage; out of region, Digital Preservation Network or cloud storage provider (ex. Amazon Glacier) or Yale ITS out-of-region storage]
Yale Hydra Roadmap
Migrations in Progress
Hydra Growth at Yale (TB)
[Chart: vertical axis 0 to 1200 TB]
Hydra Roadmap
• Complete Kissinger collection (1.7 million pages, 10 million files)
• Complete migration of legacy digital collections
• Discovery and display for curated research data
• Self-archiving (Sufia) project with ITS to support Yale faculty, student, and research content (first Fedora 4 collections)
• Move all collections to Fedora 4 (IIIF, RDF, auditing, other advanced features)
• Unified search
• Integration with ArchivesSpace (ArcLight Hydra project)
• ORCID support
• Online exhibitions in Spotlight
• Video streaming support, HydraDAM for video preservation
• DPN or other offsite copy support
Digital Preservation Services
• Multiple Copies
• Bit Preservation
• Secure Storage with Managed Access
• Provenance and Authenticity Assurance
• Standards Compliance
• Obsolescence Monitoring
• Format migration and emulation services
QUESTIONS?
“Not all digital objects are digital assets. Only those which store value and will
realise future benefit can be described as assets. Those which won’t are
liabilities.”
-4C Roadmap, “Investing in Curation: A Shared Path to Sustainability”
Resources
• http://digitalpowrr.niu.edu/tool-grid/
• http://libraries.ucsd.edu/chronopolis/_files/presentations/DPN_OR_2014.pdf
• http://www.avalonmediasystem.org/blog-post/hydradam2
• http://duraspace.org/articles/2119
• https://curate.nd.edu/
• https://scholarsphere.psu.edu/
• http://digital.case.edu/
• http://www.kb.dk/en/nb/afdelinger/db/index.html