The American Physical Society Project: Standards-based Mirroring of Digital Library Content Jeroen Bekaert, and Herbert Van de Sompel Digital Library Research & Prototyping.


The American Physical Society Project:
Standards-based Mirroring of Digital Library Content
Jeroen Bekaert and Herbert Van de Sompel
Digital Library Research & Prototyping Team
Research Library, Los Alamos National Laboratory
This work was supported in part by the Library of Congress
OAI-PMH for Resource Harvesting Tutorial
OAI4, October 20th 2005, CERN, Geneva, Switzerland
context
• Add APS collection to locally hosted LANL collection
  o Remain permanently synced
  o Ensure correctness of locally stored APS data
• Bigger picture:
  o Archive APS content
  o Create efficient content transfer/mirroring approach between information providers & LANL
  o NDIIPP: Create efficient content transfer/mirroring approach between heterogeneous content repositories.
    - Efficient mechanisms are largely non-existent.
    - Devise a standards-based approach:
      – MPEG-21 DIDL
      – OAI-PMH
      – W3C XML Signatures
Bigger picture: OAIS perspective
[Figure: two OAIS archives side by side, archive 1 (APS) and archive 2 (LANL), each with preservation planning, data management, archival storage, and administration. Content flows from archive 1 to archive 2 over OAI-PMH (access, ingest); information packages are labeled AIP1, DIP, SIP, and AIP2.]
[Figure: structure of the OAI-PMH record exchanged between the archives]
OAI-PMH record
• header
  o OAI-PMH datestamp
  o OAI-PMH identifier
• metadata
  o MPEG-21 DIDL document
    - content-identifier
    - datastream, each with an XML Signature of the datastream (repeated per datastream)
• about
  o XML Signature of DIDL document
APS / LANL mirroring process
[Figure: the LANL OAI-PMH harvester sends OAI-PMH requests to the APS OAI-PMH repository, which fronts the APS repository, and receives OAI-PMH responses; harvested content passes through LANL pre-ingest & ingest into the aDORe repository.]
• APS Digital Object represented as application-neutral MPEG-21 DIDL document & exposed through OAI-PMH front-end
• Each datastream provided via a DIDL document is accorded a digest. Digests delivered in DIDL document via W3C XML Signatures
• A complete DIDL document is accorded a digest; delivered in the OAI-PMH «about» container via W3C XML Signature
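The digest idea above can be illustrated with a minimal sketch. It assumes that the digest is computed over the raw bytes of a datastream and carried base64-encoded, as in a ds:DigestValue element; full W3C XML Signature validation also involves canonicalization, transforms, and signature checking, none of which is shown here. The function names are hypothetical and are not taken from the APS or LANL software.

```python
import base64
import hashlib

# DigestMethod algorithm URIs as defined by the W3C XML Signature and
# XML Encryption specifications.
DIGEST_ALGORITHMS = {
    "http://www.w3.org/2000/09/xmldsig#sha1": hashlib.sha1,
    "http://www.w3.org/2001/04/xmlenc#sha256": hashlib.sha256,
}


def digest_value(data: bytes, algorithm_uri: str) -> str:
    """Return the base64-encoded digest of data (the ds:DigestValue form)."""
    hasher = DIGEST_ALGORITHMS[algorithm_uri]
    return base64.b64encode(hasher(data).digest()).decode("ascii")


def datastream_is_intact(data: bytes, algorithm_uri: str, expected: str) -> bool:
    """Compare a freshly computed digest with the one delivered by the provider."""
    return digest_value(data, algorithm_uri) == expected.strip()
```

A harvester would call `datastream_is_intact` with the bytes it received and the digest value found in the DIDL document for that datastream.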
APS / LANL mirroring process
• Remain synced via OAI-PMH datestamp-based harvesting of DIDL documents:
  o New APS Digital Objects
  o Updated APS Digital Objects
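Datestamp-based harvesting can be sketched as building an OAI-PMH ListRecords request whose `from` argument is the time of the previous harvest, so that only new and updated records are returned. The base URL and the metadata prefix `didl` below are placeholders, and the sketch assumes that the repository supports seconds granularity and that the caller passes a timezone-aware datetime.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str,
                     last_harvest: Optional[datetime]) -> str:
    """Build a ListRecords URL; restrict it to changes since last_harvest if given."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_harvest is not None:
        # OAI-PMH datestamps are expressed in UTC.
        utc = last_harvest.astimezone(timezone.utc)
        params["from"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{base_url}?{urlencode(params)}"


# First harvest: no "from" argument, so everything is returned.
full = list_records_url("http://example.org/oai", "didl", None)

# Later harvests: only records created or changed since the last run.
incremental = list_records_url(
    "http://example.org/oai", "didl",
    datetime(2005, 10, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```

Responses that span several pages would additionally need resumptionToken handling, which is left out of this sketch.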
APS / LANL mirroring process
• Datastreams delivered By-Value and/or By-Reference
  o By-Reference requires dereferencing of datastream post harvest
• Storage in pre-ingest area:
  o Harvested DIDL documents in XMLtape
  o Dereferenced content in ARC files
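The By-Value / By-Reference distinction can be sketched as follows. The example assumes the DIDL namespace URI shown in the code, a `ref` attribute on Resource elements for By-Reference delivery, and base64 or plain text content for By-Value delivery; By-Value resources that embed XML child elements are not handled. The sample document and URL are invented for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# Assumed namespace of MPEG-21 DIDL documents.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"

SAMPLE_DIDL = f"""<didl:DIDL xmlns:didl="{DIDL_NS}">
  <didl:Item>
    <didl:Component>
      <didl:Resource mimeType="text/plain" encoding="base64">aGVsbG8=</didl:Resource>
    </didl:Component>
    <didl:Component>
      <didl:Resource mimeType="application/pdf" ref="http://example.org/article.pdf"/>
    </didl:Component>
  </didl:Item>
</didl:DIDL>"""


def classify_resources(didl_xml: str):
    """Split Resource elements into inline bytes and URLs still to be fetched."""
    root = ET.fromstring(didl_xml)
    by_value, by_reference = [], []
    for resource in root.iter(f"{{{DIDL_NS}}}Resource"):
        ref = resource.get("ref")
        if ref is not None:
            # By-Reference: must be dereferenced after the harvest.
            by_reference.append(ref)
        elif resource.get("encoding") == "base64":
            by_value.append(base64.b64decode(resource.text or ""))
        else:
            by_value.append((resource.text or "").encode("utf-8"))
    return by_value, by_reference
```

The URLs in the second list are the ones a harvester would fetch after the OAI-PMH harvest and store alongside the harvested DIDL documents.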
APS / LANL mirroring process
• Verification of digests:
  o DIDL document
  o Datastreams
• Digest correct: continue
• Digest incorrect: reharvest
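The verify-or-reharvest rule can be sketched as a small retry loop. The slides only state that an incorrect digest triggers a reharvest; the SHA-1 choice, the retry limit, and the function name below are assumptions made for illustration.

```python
import base64
import hashlib
from typing import Callable


def fetch_verified(fetch: Callable[[], bytes], expected_b64_sha1: str,
                   max_attempts: int = 3) -> bytes:
    """Fetch content until its digest matches, giving up after max_attempts."""
    for _attempt in range(max_attempts):
        data = fetch()
        actual = base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")
        if actual == expected_b64_sha1:
            return data  # digest correct: continue with ingest
        # digest incorrect: loop around and harvest again
    raise ValueError(f"digest mismatch after {max_attempts} attempts")
```

Here `fetch` stands for whatever retrieves the DIDL document or datastream again, for example a repeated OAI-PMH request or a repeated dereferencing step.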
APS / LANL mirroring process
• Ingest Digital Objects:
  o Map application-neutral DIDL documents to aDORe-profile DIDL documents
  o Insert digests per constituent datastream (W3C XML Signatures)
  o Store in aDORe XMLtape/ARCfile environment
APS / LANL mirroring process
• Recurrent introspection in both repositories
• Ability to harvest in both directions in case of problems with stored Digital Objects
software
• OAIResource: generic Java-based OAI-PMH resource harvesting software package:
  o Goal: gather resources by OAI-PMH harvesting first
  o Can deal with OAI-PMH repositories irrespective of their supported metadata formats
  o Plug-in structure makes the process of dereferencing datastreams configurable per OAI-PMH repository
  o Results of harvesting/gathering stored as follows:
    - OAI-PMH records concatenated into XMLtapes
    - Datastreams concatenated into Internet Archive ARC files
  o Log files:
    - List successful and unsuccessful harvesting/gathering
    - List relationship between OAI-PMH records in XMLtapes and datastreams in ARC files
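The general idea of concatenating many records into one container file and keeping a lookup of where each record sits can be sketched as below. This is a simplified illustration only: it does not reproduce the actual XMLtape or ARC file formats, and the function names and identifiers are hypothetical.

```python
import io
from typing import BinaryIO, Dict, Iterable, Tuple

Index = Dict[str, Tuple[int, int]]  # identifier -> (byte offset, length)


def write_tape(records: Iterable[Tuple[str, bytes]], tape: BinaryIO) -> Index:
    """Append each record to the tape and remember its position."""
    index: Index = {}
    for identifier, payload in records:
        offset = tape.tell()
        tape.write(payload)
        index[identifier] = (offset, len(payload))
    return index


def read_record(tape: BinaryIO, index: Index, identifier: str) -> bytes:
    """Retrieve one record by seeking to its recorded offset."""
    offset, length = index[identifier]
    tape.seek(offset)
    return tape.read(length)


# Usage with an in-memory file; a real harvester would use a file on disk.
tape = io.BytesIO()
index = write_tape(
    [("oai:example.org:1", b"<record>one</record>"),
     ("oai:example.org:2", b"<record>two</record>")],
    tape,
)
second = read_record(tape, index, "oai:example.org:2")
```

Keeping the index separate from the container mirrors the role of the log that relates harvested records to the stored datastreams.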
Papers
• Jeroen Bekaert and Herbert Van de Sompel. A Standards-based Solution for the Accurate Transfer of Digital Assets. D-Lib Magazine, June 2005. http://dx.doi.org/10.1045/june2005-bekaert
• Jeroen Bekaert and Herbert Van de Sompel. Access Interfaces for Open Archival Information Systems based on the OAI-PMH and the OpenURL Framework for Context-Sensitive Services. 2005. Preprint at http://arxiv.org/abs/cs.DL/0509090 . Draft of an accepted submission for PV 2005 "Ensuring Long-term Preservation and Adding Value to Scientific and Technical data".
• Herbert Van de Sompel, Jeroen Bekaert, Xiaoming Liu, Lyudmila Balakireva, and Thorsten Schwander. aDORe: a modular, standards-based Digital Object Repository. The Computer Journal, 2005. Preprint at arXiv:cs.DL/0502028 . Computer Journal paper at doi:10.1093/comjnl/bxh114