DIGITAL PRESERVATION IN HYDRA/FEDORA
March 24, 2015
GET AHEAD ON YOUR REPOSITORY

About Hydra/Fedora
• Fedora: Flexible Extensible Digital Object Repository Architecture
• Open-source project
• Provides a platform for digital preservation and presentation
• Used by hundreds of organizations, with over 52 Fedora Members contributing financially; Yale is one of these.
• Originally developed at Cornell, now led by the Fedora Project Steering Group under the stewardship of DuraSpace.org (http://www.fedora-commons.org)
• Yale is also a Fedora development partner, and Mike Friscia serves on the Fedora Leadership Committee
• Currently actively engaged in the development of Fedora 4

Hydra
• Began in 2008 as a collaboration between Stanford, UVA, the University of Hull, and Fedora Commons
• YUL joined in 2013 as the 18th member. Membership is now around 27; recent additions include Princeton, Cornell, and Case Western.
• Another 25 or more institutions are working in the Hydra framework without yet being formal members, including Brown, Johns Hopkins, Trinity College Dublin, Oxford, UC Berkeley, and others.

Hydra Partners
[Chart: Hydra Partners by Open Repositories conference, OR09 through OR14; y-axis 0–25 partners. OR = Open Repositories Conference]
• DuraSpace (f)
• Stanford University (f)
• University of Hull (f)
• University of Virginia (f)
• MediaShelf
• University of Notre Dame
• Northwestern University
• Columbia University
• Penn State University
• Indiana University
• London School of Economics and Political Science
• Rock and Roll Hall of Fame and Museum
• Royal Library of Denmark
• Data Curation Experts
• WGBH
• Boston Public Library
• Duke University
• Yale University
• Virginia Tech
• University of Cincinnati
• Princeton University Library
• Cornell University
• Oregon Digital (University of Oregon and Oregon State University)
• Case Western Reserve University
• Tufts University
• Duoc UC
• University of Alberta

[Figure: A Worldwide Presence]

Hydra at Yale

What Is Hydra?
1. A framework for repository-powered applications, with
   • multiple, tailored UIs, and
   • a robust repository back end
   One body, many heads
2. A set of solution bundles
3. A community
"If you want to go fast, go alone. If you want to go far, go together."

[Diagram: Hydra architecture at Yale. Hydra-Head handles creating and managing objects (CRUD): data import and the Hydra interface (IT use only). Blacklight handles discovering and viewing objects (R): single image zoom, Bookreader, complex object display, downloadable PDF, and search/facet logic. Shared layers: Active Fedora and Solrizer; Hydra Access Controls. Other components: Ladybird (Yale's cataloging tool) with SQL Server and managed storage; Fedora (preservation); Solr (index); a media server for image requests and image retrieval; labels for metadata, images, link to images, and RSS.]

[Figure: Content model]

Access Conditions
• Defined for each file in a content model
• Wide range of authorization definitions
• Customizable

Example: Ingest Workflow for Research Data into Hydra
• Colectica software exports contents in BagIt format
• The bag enters a watched folder in Ladybird
• Ladybird validates the bag contents:
  • Checksum validation
  • File characterization
• Ladybird maintains the original file hierarchy as a collection of complex objects
• Each Ladybird object is mapped to an Unstructured Content Model
• Each content model is then ingested into Hydra

[Figure: Unstructured Content Model]

[Figure: DPN]

Digital Preservation in Hydra

Hydra Solution Bundles
• Sufia
• CurateND
• ScholarSphere
• HydraDAM
• Argo
• Chronos

Preservation Profiles
[Table: Preservation Profiles; columns: preservation profile, storage pillars I, II, III, IV, V–VIII, IX, XI, encrypted storage, integrity check]
1: Storage without bit preservation
2: Born-digital collection material that has access restrictions
3: Legally deposited born-digital material that is not in the Web archive
4: Born-digital collection material without access restrictions
5: Retro-digitized (expensive) materials with analog copies
6: Secret digital materials
7: Top secret digital materials

Future Development

Fedora 4 Roadmap:
• Audit Service
• Portland Common Data Model
• Migration Tools
• Asynchronous Storage
• Linked Data Platform
• Managed External Data Streams

Fedora 4 Auditing
• Track events: agent, date, activity, entity
• Allow import/export of events
• High performance
• Stored separately from repository entities
• Export in RDF format
• Provide a SPARQL query search endpoint

[Figure: Portland Common Data Model]

HydraDAM2
http://news.indiana.edu/releases/iu/2014/12/neh-grants-digital-preservation.shtml

Hydra Infrastructure

Hydra Architecture
• Open-source, community-developed software: Fedora Commons, Apache Solr, Blacklight, MySQL
• Hydra Project open-source, community-developed software
• Locally developed software: Ladybird, Media Delivery Service

Repository Storage – Current State
[Diagram labels: New Haven/West Haven, CT; Rocky Hill, CT; Repository; Yale ITS disk-based enterprise storage; Yale Library tape-based archival storage; Iron Mtn. offline replicated set (tape)]

Risks of the current state:
• Data resides in a single region, the Northeast
• Tape media handling and refresh constraints at petabyte scale
• A one-month window in which the primary and backup copies are in the same location

Repository Storage – Future State
[Diagram labels: New Haven/West Haven, CT; Repository; Yale ITS disk-based enterprise storage; Out-of-Region: Digital Preservation Network, or a cloud storage provider (e.g., Amazon Glacier), or Yale ITS out-of-region storage]

Yale Hydra Roadmap

[Figure: Migrations in Progress]

[Chart: Hydra Growth at Yale (TB); y-axis 0–1,200 TB]

Hydra Roadmap
• Complete the Kissinger collection (1.7 million pages, 10 million files)
• Complete the migration of legacy digital collections
• Discovery and display for curated research data
• Self-archiving (Sufia) project with ITS to support Yale faculty, student, and research content (first Fedora 4 collections)
• Move all collections to Fedora 4 (IIIF, RDF, auditing, other advanced features)
• Unified search
• Integration with ArchivesSpace (ArcLight Hydra project)
• ORCID support
• Online exhibitions in Spotlight
• Video streaming support; HydraDAM for video preservation
• DPN or other offsite copy support

Digital Preservation Services
• Multiple Copies
• Bit Preservation
• Secure Storage with Managed Access
• Provenance and Authenticity Assurance
• Standards Compliance
• Obsolescence Monitoring
• Format Migration and Emulation Services

QUESTIONS?

“Not all digital objects are digital assets. Only those which store value and will realise future benefit can be described as assets. Those which won’t are liabilities.”
– 4C Roadmap, “Investing in Curation: A Shared Path to Sustainability”

Resources
• http://digitalpowrr.niu.edu/tool-grid/
• http://libraries.ucsd.edu/chronopolis/_files/presentations/DPN_OR_2014.pdf
• http://www.avalonmediasystem.org/blog-post/hydradam2
• http://duraspace.org/articles/2119
• https://curate.nd.edu/
• https://scholarsphere.psu.edu/
• http://digital.case.edu/
• http://www.kb.dk/en/nb/afdelinger/db/index.html
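Sketch: bag checksum validation (Example: Ingest Workflow slide). Ladybird validates each incoming BagIt bag before mapping it into content models, but the deck does not show Ladybird's code. The following is only a minimal Python sketch of the checksum step. It assumes a bag laid out per the BagIt specification: a `bagit.txt` declaration, payload files under `data/`, and a `manifest-sha256.txt` whose lines read `<checksum> <path>`. The function names are hypothetical. File characterization (format identification) is a separate step and is not covered here.

```python
# Sketch only: assumes a standard BagIt layout; this is not Ladybird's actual code.
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in 1 MiB chunks so large payload files are never fully loaded."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_bag(bag_dir: str, algorithm: str = "sha256") -> list[str]:
    """Check a bag's payload against its manifest; an empty list means no problems found."""
    bag = Path(bag_dir)
    problems: list[str] = []
    if not (bag / "bagit.txt").is_file():
        problems.append("missing bagit.txt")
    manifest = bag / f"manifest-{algorithm}.txt"
    if not manifest.is_file():
        problems.append(f"missing {manifest.name}")
        return problems

    listed: set[str] = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.split(maxsplit=1)  # "<checksum> <path relative to bag root>"
        if len(parts) != 2:
            problems.append(f"malformed manifest line: {line!r}")
            continue
        expected, rel_path = parts
        listed.add(rel_path)
        target = bag / rel_path
        if not target.is_file():
            problems.append(f"missing payload file: {rel_path}")
        elif file_digest(target, algorithm) != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Completeness: every file under data/ must also appear in the manifest.
    data_dir = bag / "data"
    if data_dir.is_dir():
        for path in sorted(data_dir.rglob("*")):
            rel = path.relative_to(bag).as_posix()
            if path.is_file() and rel not in listed:
                problems.append(f"payload file not in manifest: {rel}")
    return problems
```

Returning a list of problems rather than raising on the first failure lets a watched-folder process report everything wrong with a bag in one pass before rejecting it.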
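Sketch: fixity audit across copies (Digital Preservation Services slide and the out-of-region storage plan). Two of the listed services, Multiple Copies and Bit Preservation, come down to periodically proving that every copy still matches the checksums recorded at ingest. An example is the primary disk copy plus an out-of-region copy. A minimal Python sketch follows. It assumes each copy is reachable as a local directory and that the expected SHA-256 values are already stored somewhere; the `expected` dictionary stands in for that hypothetical fixity store. The two-copy minimum in `meets_policy` is likewise an illustrative default, not a stated Yale policy.

```python
# Hypothetical: `expected` stands in for a stored fixity database, and each
# copy is assumed to be mounted as a local directory.
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_copies(expected: dict[str, str], copies: dict[str, Path]) -> dict[str, list[str]]:
    """Check every copy against the checksums recorded at ingest.

    expected: relative path -> SHA-256 hex digest
    copies:   copy name -> root directory of that copy
    Returns copy name -> problems found (an empty list means the copy is intact).
    """
    report: dict[str, list[str]] = {}
    for name, root in copies.items():
        problems: list[str] = []
        for rel_path, digest in sorted(expected.items()):
            target = root / rel_path
            if not target.is_file():
                problems.append(f"missing: {rel_path}")
            elif sha256_of(target) != digest:
                problems.append(f"corrupt: {rel_path}")
        report[name] = problems
    return report


def meets_policy(report: dict[str, list[str]], min_copies: int = 2) -> bool:
    """True when at least `min_copies` copies passed every fixity check."""
    intact = [name for name, problems in report.items() if not problems]
    return len(intact) >= min_copies
```

Comparing each copy against the recorded checksums, rather than comparing copies only with each other, means a corrupt copy is identified directly and can be repaired from one that still passes.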