The American Physical Society Project: Standards-based Mirroring of Digital Library Content

Jeroen Bekaert and Herbert Van de Sompel
Digital Library Research & Prototyping Team
Research Library, Los Alamos National Laboratory

This work was supported in part by the Library of Congress.

OAI-PMH for Resource Harvesting Tutorial
OAI4, October 20th 2005, CERN, Geneva, Switzerland

context

• Add APS collection to locally hosted LANL collection
  o Remain permanently synced
  o Ensure correctness of locally stored APS data
• Bigger picture:
  o Archive APS content
  o Create efficient content transfer/mirroring approach between information providers & LANL
  o NDIIPP: Create efficient content transfer/mirroring approach between heterogeneous content repositories.
    - Efficient mechanisms are largely non-existent.
    - Devise a standards-based approach:
      – MPEG-21 DIDL
      – OAI-PMH
      – W3C XML Signatures

Bigger picture: OAIS perspective

[Figure: OAIS view of archive 1 (APS) and archive 2 (LANL). Labels: preservation planning, data management, archival storage, administration (once per archive); access, ingest, OAI-PMH; AIP1, DIP, SIP, AIP2]

[Figure: structure of an OAI-PMH record]
OAI-PMH record
  header
    OAI-PMH datestamp
    OAI-PMH identifier
  metadata
    MPEG-21 DIDL document
      content-identifier
      datastream, XML Signature of datastream (shown three times)
  about
    XML Signature of DIDL document

APS / LANL mirroring process

[Figure: mirroring diagram. Labels: OAI-PMH request, OAI-PMH response, OAI-PMH harvester, APS repository, OAI-PMH repository, APS, LANL, LANL pre-ingest & ingest, aDORe repository]

• APS Digital Object represented as application-neutral MPEG-21 DIDL document & exposed through OAI-PMH front-end
• Each datastream provided via a DIDL document is accorded a digest.
  Digests delivered in DIDL document via W3C XML Signatures
• A complete DIDL document is accorded a digest; delivered in the OAI-PMH « about » container via W3C XML Signature

APS / LANL mirroring process

• Remain synced via OAI-PMH datestamp-based harvesting of DIDL documents:
  o New APS Digital Objects
  o Updated APS Digital Objects

APS / LANL mirroring process

• Datastreams delivered By-Value and/or By-Reference
  o By-Reference requires dereferencing of datastream post harvest
• Storage in pre-ingest area:
  o Harvested DIDL documents in XMLtape
  o Dereferenced content in ARC files

APS / LANL mirroring process

• Verification of digests:
  o DIDL document
  o Datastreams
• Digest correct: continue
• Digest incorrect: reharvest

APS / LANL mirroring process

• Ingest Digital Objects:
  o Map application-neutral DIDL documents to aDORe-profile DIDL documents
  o Insert digests per constituent datastream (W3C XML Signatures)
  o Store in aDORe XMLtape/ARCfile environment

APS / LANL mirroring process
• Recurrent introspection in both repositories
• Ability to harvest in both directions in case of problems with stored Digital Objects

software

• OAIResource: generic Java-based OAI-PMH resource harvesting software package:
  o Goal: gather resources by OAI-PMH harvesting first
  o Can deal with OAI-PMH repositories irrespective of their supported metadata formats
  o Plug-in structure makes the process of dereferencing datastreams configurable per OAI-PMH repository
  o Results of harvesting/gathering stored as follows:
    - OAI-PMH records concatenated into XMLtapes
    - Datastreams concatenated into Internet Archive ARC files
  o Log files:
    - List successful and unsuccessful harvesting/gathering
    - List relationship between OAI-PMH records in XMLtapes and datastreams in ARC files

Papers

• Jeroen Bekaert and Herbert Van de Sompel. A Standards-based Solution for the Accurate Transfer of Digital Assets. D-Lib Magazine, June 2005. http://dx.doi.org/10.1045/june2005-bekaert
• Jeroen Bekaert and Herbert Van de Sompel. Access Interfaces for Open Archival Information Systems based on the OAI-PMH and the OpenURL Framework for Context-Sensitive Services. 2005. Preprint at http://arxiv.org/abs/cs.DL/0509090. Draft of an accepted submission for PV 2005 "Ensuring Long-term Preservation and Adding Value to Scientific and Technical data".
• Herbert Van de Sompel, Jeroen Bekaert, Xiaoming Liu, Lyudmila Balakireva, and Thorsten Schwander. aDORe: a modular, standards-based Digital Object Repository. 2005. The Computer Journal. Preprint at arXiv:cs.DL/0502028.
  Computer Journal paper at doi:10.1093/comjnl/bxh114
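
Illustrative sketch (not part of the original slides): the mirroring-process slides describe datestamp-based harvesting followed by digest verification, with a reharvest when a digest does not match. The Python below is a minimal sketch of those two steps under stated assumptions, not the OAIResource implementation. The function names, the base URL `http://example.org/oai`, the metadata prefix `didl`, the retry limit, and the choice of SHA-1 as digest algorithm are all hypothetical. `verb`, `metadataPrefix`, `from`, and `resumptionToken` are standard OAI-PMH request arguments.

```python
import base64
import hashlib
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_datestamp=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent alone with the verb to continue a partial list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_datestamp is not None:
            # Selective harvesting: only records created or updated
            # on or after this datestamp are returned.
            params["from"] = from_datestamp
    return base_url + "?" + urlencode(params)


def digest_value(data, algorithm="sha1"):
    """Return the base64-encoded digest of a byte string.

    Base64 is the encoding XML Signature uses for DigestValue; the
    SHA-1 default is an assumption for this sketch.
    """
    return base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")


def fetch_verified(fetch, expected_digest, max_attempts=3):
    """Call fetch() until its bytes match expected_digest.

    fetch is any zero-argument callable returning bytes, for example a
    function that dereferences a By-Reference datastream. Returns the
    verified bytes, or None if every attempt fails verification.
    """
    for _ in range(max_attempts):
        data = fetch()
        if digest_value(data) == expected_digest:
            return data  # digest correct: continue
        # digest incorrect: reharvest on the next loop iteration
    return None


if __name__ == "__main__":
    # Incremental harvest of everything changed since the last sync.
    print(list_records_url("http://example.org/oai", "didl",
                           from_datestamp="2005-10-01"))
```

Passing the fetch step as a callable keeps the verification loop independent of how a datastream is obtained, which loosely mirrors the per-repository plug-in idea mentioned on the software slide.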