Slide 1 - Uppsala University

Archiving Workflow between a Local Repository and the National Library Archive
Experiences from the DiVA Project
Eva Müller, Peter Hansson, Uwe Klosa, Stefan Andersson
Electronic Publishing Centre
Uppsala University Library, Sweden
1
Focus on
• Using URN:NBN as a persistent identifier for referencing and as an identifier in an archiving workflow
• The workflow between a local repository and the national library archive
2
Outline
• DiVA project and its objectives
• DiVA publishing system
• DiVA Archive and archiving workflow
• URN:NBN and its role in the workflow
• Conclusions and next steps
3
DiVA Project
Digitala Vetenskapliga Arkivet
(Digital Scientific Archive)
• Since September 2000
• Objectives:
– Technical solutions and a well-functioning workflow supporting full-text publication of doctoral theses, working papers…
– Explore ways to ensure the future use and understanding of the digital objects in the archive
4
DiVA Publishing System
• Focus on workflow
– Reuse and enhance the data taken directly from the source document, as originally created by the authors, to produce metadata and a digital master for the electronic and printed versions
– Store and checksum the files
– Assign a persistent identifier
– Send a copy to the National Library Archive
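The "store and checksum the files" step can be illustrated with a minimal Python sketch (DiVA itself is built on Java and XML). The slides do not name a checksum algorithm, so SHA-256 and the helper names `store_file`, `file_checksum` and `verify_file` are assumptions made for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_file(data: bytes, destination: Path) -> str:
    """Write a file into a (hypothetical) repository store and return its checksum."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(data)
    return file_checksum(destination)


def verify_file(path: Path, expected: str) -> bool:
    """Fixity check: recompute the checksum and compare it with the recorded value."""
    return file_checksum(path) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "theses" / "thesis.pdf"
        recorded = store_file(b"%PDF-1.4 example", target)
        print(recorded, verify_file(target, recorded))
```

Recording the checksum at storage time is what later makes it possible to detect silent corruption in any copy, including the one sent to the national library.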
5
[Figure: publishing workflow. Author → Word Processor (Word Processing Format template) → DiVA Document Format → DiVA Manager → Local Repository (DiVA Document Format) → long-term storage packages → Local Long-term Storage]
6
Implementation
• Java and XML technologies
• Currently an Oracle database is used for indexing and searching
• Architecture: component-based design
– Modularity and reusability of the components
– Possibility to seamlessly replace modules with improved implementations of the component
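The idea of replaceable modules can be sketched as code that depends only on an interface. In this Python sketch the names `SearchIndex`, `InMemoryIndex` and `Repository` are hypothetical; the in-memory index merely stands in for a component such as the Oracle-based indexing mentioned above.

```python
from typing import Protocol


class SearchIndex(Protocol):
    """Hypothetical interface for the indexing and search component."""

    def add(self, item_id: str, text: str) -> None: ...

    def search(self, term: str) -> list[str]: ...


class InMemoryIndex:
    """Simple stand-in implementation; a database-backed index could replace it."""

    def __init__(self) -> None:
        self._texts: dict[str, str] = {}

    def add(self, item_id: str, text: str) -> None:
        self._texts[item_id] = text.lower()

    def search(self, term: str) -> list[str]:
        needle = term.lower()
        return sorted(item_id for item_id, text in self._texts.items() if needle in text)


class Repository:
    """Depends only on the SearchIndex interface, so the index module is swappable."""

    def __init__(self, index: SearchIndex) -> None:
        self._index = index

    def publish(self, item_id: str, title: str) -> None:
        self._index.add(item_id, title)

    def find(self, term: str) -> list[str]:
        return self._index.search(term)
```

Swapping in an improved index means passing a different object to `Repository(...)`; nothing else in the calling code has to change.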
7
DiVA system
• Used by five universities in Sweden
– Stockholm, Umeå, Uppsala and Örebro universities, plus Södertörns högskola
• Soon one university in Denmark
– Århus University (Statsbiblioteket i Århus)
8
Issues
• How can we ensure the future access
and understanding of documents we
produce locally?
• What factors increase the potential for
success?
• Can these factors be integrated into an
automated and low-cost workflow?
9
How can we ensure
accessibility in the future?
• A stable point of reference
(persistent identifier)
• Use a human-readable, non-proprietary
storage format
• Storage in several locations
10
How can we minimize risks
for data loss?
• Multiple copies in different locations
• Mechanism to keep track of copies
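A mechanism to keep track of copies can be sketched as a registry that records the checksum reported for each storage location and then audits it. This is a hypothetical Python sketch: the location names, the `min_copies` threshold and the rule that the majority checksum is the correct one are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CopyRegistry:
    """Hypothetical registry: item identifier -> {storage location: recorded checksum}."""

    copies: dict[str, dict[str, str]] = field(default_factory=dict)

    def record(self, item_id: str, location: str, checksum: str) -> None:
        """Register (or update) the checksum reported for one copy of an item."""
        self.copies.setdefault(item_id, {})[location] = checksum

    def audit(self, item_id: str, min_copies: int = 2) -> dict:
        """Report how many copies exist and which locations disagree with the majority."""
        locations = self.copies.get(item_id, {})
        counts = Counter(locations.values())
        majority = counts.most_common(1)[0][0] if counts else None
        suspect = sorted(loc for loc, checksum in locations.items() if checksum != majority)
        return {
            "copies": len(locations),
            "enough_copies": len(locations) >= min_copies,
            "suspect_locations": suspect,
        }
```

An audit flags two distinct risks: too few copies, and a copy whose content has drifted from the others.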
11
Analysis of interests
– Authors
• Dissemination of their intellectual output
– Universities
• Track research output
• Reduce publishing cost
• Increase impact
– National libraries
• Legal deposit
12
Strategies
• Decision to use XML as a primary
storage format
• Decision to use URN:NBN
• Decision to cooperate with the Royal
Library
• Decision to fit all needs into an
automated workflow
13
DiVA Document Format
• Internal format
• Version 1.0 (described in XML Schema)
– 99 elements
– Component based
– Extensible
– Administrative elements are combined with descriptive elements
– DocBook DTD is used for the content part of the document
14
[Figure: dissemination from the local repository. The author's word processor template yields the DiVA Document Format, which feeds metadata dissemination services (Dublin Core, MARC 21, TEI Header, URN:NBN Register Format), content dissemination services (PDF, DocBook, TEI, HTML), export formats (Endnote, Reference Manager) and Web Services.]
15
Implementation of the
Archiving Workflow
• Assignment of the URN:NBN to the
resources
• Implementation of URN:NBN Resolution
Service
• URN:NBN as a unique identifier within the
archive
• URN:NBN as a naming convention for files,
directories and archival packages
• URN:NBN as a part of disseminated
metadata
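Carrying the URN:NBN in disseminated metadata can be sketched with a minimal Dublin Core record. In this Python sketch the `record` wrapper element and the field selection are assumptions; only the Dublin Core namespace and the resolver URL pattern (`http://urn.kb.se/resolve?urn=`) are taken as given.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
RESOLVER = "http://urn.kb.se/resolve?urn="  # resolver URL pattern used in this presentation
ET.register_namespace("dc", DC_NS)


def dublin_core_record(urn: str, title: str, creator: str, date: str) -> str:
    """Serialize a minimal Dublin Core record that carries the URN:NBN as dc:identifier."""
    record = ET.Element("record")  # hypothetical wrapper element
    fields = [
        ("identifier", urn),             # the persistent identifier itself
        ("identifier", RESOLVER + urn),  # an actionable link via the resolution service
        ("title", title),
        ("creator", creator),
        ("date", date),
    ]
    for name, value in fields:
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")
```

Publishing both the bare URN:NBN and the resolver link lets harvesters store the persistent identifier while users still get a clickable address.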
16
Assignment of the URN:NBN
• Subdomain managed locally
• Structure: URN:NBN:se:X:diva, where X identifies the institution (uu for Uppsala University)
• Example: URN:NBN:se:uu:diva + a locally managed serial number
• The URN:NBN is used as the identifier for each item; an item is a single publication, regardless of format
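The assignment scheme above can be sketched as a locally managed counter appended to the institution's prefix. In this Python sketch the hyphen between `diva` and the serial number is an assumption, as are the names `UrnAssigner` and `parse_urn`; a real service would also persist the counter so serial numbers are never reused.

```python
import itertools
import re

URN_PATTERN = re.compile(r"^urn:nbn:se:([a-z]+):diva-(\d+)$", re.IGNORECASE)


class UrnAssigner:
    """Assign URN:NBN:se:<institution>:diva-<serial> from a locally managed counter."""

    def __init__(self, institution: str, start: int = 1) -> None:
        if not re.fullmatch(r"[a-z]+", institution):
            raise ValueError(f"unexpected institution code: {institution!r}")
        self.institution = institution
        self._serials = itertools.count(start)  # in-memory only; persistence is out of scope

    def assign(self) -> str:
        # One identifier per publication (item), regardless of how many formats it has.
        return f"URN:NBN:se:{self.institution}:diva-{next(self._serials)}"


def parse_urn(urn: str) -> tuple[str, int]:
    """Split a DiVA-style URN:NBN into (institution code, serial number)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a DiVA-style URN:NBN: {urn!r}")
    return match.group(1).lower(), int(match.group(2))
```

Because the identifier is tied to the item rather than to a file, the PDF and XML manifestations of one thesis share the same URN:NBN.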
17
Implementation of URN:NBN
Resolution Service
• Only basic functionality so far; further development is planned
• Implemented as a Java servlet that contains a harvester, which can harvest URN-to-URL bindings from many different repositories
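The harvesting step can be sketched as fetching each repository's register and merging the bindings into one table. The actual URN:NBN Register Format is not described in the slides, so this Python sketch assumes a simple "URN URL" line format; case-insensitive URN matching and "later source wins" on conflicts are also assumptions.

```python
from typing import Callable, Iterable
from urllib.request import urlopen


def parse_bindings(text: str) -> dict[str, str]:
    """Parse a hypothetical register format: one 'URN URL' pair per line, '#' for comments."""
    bindings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip malformed lines instead of failing the whole harvest
        urn, url = parts
        bindings[urn.lower()] = url.strip()  # assumption: URNs compared case-insensitively
    return bindings


def fetch_register(url: str) -> str:
    """Download one repository's register file over HTTP."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def harvest(register_urls: Iterable[str],
            fetch: Callable[[str], str] = fetch_register) -> dict[str, str]:
    """Merge URN-to-URL bindings from several repositories into one mapping table."""
    registry: dict[str, str] = {}
    for register_url in register_urls:
        registry.update(parse_bindings(fetch(register_url)))  # later sources win on conflict
    return registry
```

Passing `fetch` as a parameter keeps the merge logic testable without network access.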
18
[Figure: URN:NBN resolution. A user sends a request, e.g. http://urn.kb.se/resolve?urn=, to the Royal Library URN:NBN resolution service and is redirected to a URL using the service's URN:NBN:se to URL mappings. Guided by a configuration file, the service requests URN:NBN Register Format files from DiVA and other repositories.]
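The request/redirect cycle in the figure can be sketched as a pure function that a web server or servlet would wrap. In this Python sketch the choice of HTTP status codes (302, 404, 400) is an assumption; the slides only state that the user is redirected.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def resolve(request_url: str, registry: dict[str, str]) -> tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target) for a resolve?urn=... request.

    registry maps lower-cased URN:NBNs to URLs, built by harvesting repository registers.
    """
    values = parse_qs(urlsplit(request_url).query).get("urn")
    if not values:
        return 400, None  # no (or empty) urn parameter supplied
    target = registry.get(values[0].strip().lower())
    if target is None:
        return 404, None  # unknown URN:NBN
    return 302, target  # redirect the user to the registered URL
```

A web layer would turn a `(302, url)` result into a response with a `Location: url` header.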
19
URN:NBN as a naming
convention for files,
directories and archival
packages
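Using the URN:NBN as a naming convention runs into one practical snag: colons are not allowed in file names on some file systems. The slides do not say how DiVA handles this, so this Python sketch assumes reversible percent-encoding; the helper names and the "one file per format inside a directory named after the URN:NBN" layout are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, unquote


def urn_to_name(urn: str) -> str:
    """Percent-encode characters such as ':' that some file systems reject."""
    return quote(urn, safe="")


def name_to_urn(name: str) -> str:
    """Invert urn_to_name so the URN:NBN can be recovered from a file or directory name."""
    return unquote(name)


def package_paths(urn: str, extensions: list[str]) -> list[str]:
    """Directory named after the URN:NBN, with one file per format, e.g. .xml and .pdf."""
    base = urn_to_name(urn)
    return [str(PurePosixPath(base) / f"{base}.{ext}") for ext in extensions]
```

Because the mapping is reversible, the identifier can always be recovered from a file or directory name alone.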
20
Workflow to the National
Library Archive
• Archiving packages
– Administrative and descriptive metadata stored in XML
– Content currently in PDF and, where possible, also in XML
– Each manifestation is bundled into an archiving package (AP)
• General data
• Format-specific data
21
Archiving Package
[Figure: archiving package named URN:NBN:se:[specific part], with a checksum and the parts metadata, content, stylesheets, schemas and checksums]
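The package layout can be sketched in Python as a ZIP archive whose top-level folder is named after the URN:NBN. The parts (metadata, content, stylesheets, schemas, checksums) come from the figure; the ZIP container, SHA-256, the manifest name `checksums.txt` and its line format are assumptions, since the slides do not specify them.

```python
import hashlib
import io
import zipfile

PACKAGE_PARTS = ("metadata", "content", "stylesheets", "schemas")
MANIFEST = "checksums.txt"  # hypothetical manifest file name


def build_package(urn: str, files: dict[str, bytes]) -> bytes:
    """Bundle files under a folder named after the URN:NBN, plus a checksum manifest.

    files maps relative paths such as 'metadata/record.xml' to their bytes.
    """
    buffer = io.BytesIO()
    manifest_lines = []
    with zipfile.ZipFile(buffer, "w") as archive:
        for rel_path in sorted(files):
            if rel_path.split("/", 1)[0] not in PACKAGE_PARTS:
                raise ValueError(f"unexpected package part in {rel_path!r}")
            data = files[rel_path]
            archive.writestr(f"{urn}/{rel_path}", data)
            manifest_lines.append(f"{hashlib.sha256(data).hexdigest()}  {rel_path}")
        archive.writestr(f"{urn}/{MANIFEST}", "\n".join(manifest_lines) + "\n")
    return buffer.getvalue()


def verify_package(package: bytes) -> bool:
    """Recompute every checksum listed in the manifest and compare."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        names = set(archive.namelist())
        manifest = next((n for n in names if n.endswith("/" + MANIFEST)), None)
        if manifest is None:
            return False
        root = manifest.rsplit("/", 1)[0]
        for line in archive.read(manifest).decode("utf-8").splitlines():
            if not line:
                continue
            digest, rel_path = line.split("  ", 1)
            member = f"{root}/{rel_path}"
            if member not in names or hashlib.sha256(archive.read(member)).hexdigest() != digest:
                return False
    return True
```

The receiving archive can run `verify_package` on arrival, so a damaged transfer is detected before the package is accepted.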
22
[Figure: delivery from the local metadata repository (metadata and content). XML long-term storage packages (urn:nbn:se:….) go to local long-term storage and to the library's long-term storage; MARC 21 records go to the library catalogue; a list of URN:NBN to URL mappings (urn:nbn:se:.. -> http://www…) goes to the central URN:NBN resolution service.]
23
Conclusions
• A low-cost system that supports a fully automated workflow from the point of submission works well
• Using a harvesting model for updates to the mapping registry makes the management of URN:NBNs simple
• Automatic creation of MARC 21 records makes cataloguing faster and less expensive
• A push model for delivering archival packages makes this process more reliable and easier to manage
24
…
• Use of XML for all metadata associated with each
archival package increases the likelihood of future
understanding of the digital objects in the archive
and offers the potential to easily extract document
metadata, if necessary
• The modularity of the technical solution offers the advantage of component reusability and is a solid basis for further local and cooperative development
• Long-term access to institutional research
publications can be assured with cooperation from
national libraries
25
Next steps
New project funded by Royal Library’s
Department for National Co-ordination and
Development (BIBSAM)
– Examine and evaluate current solutions
– Develop and implement a generalized archiving
workflow between a local repository and a
national archive focusing on the variety of
publishing platforms and systems
26
More information
• http://publications.uu.se/
• http://publications.uu.se/epcentre/
• http://publications.uu.se/conferences/ecdl2003/archiving_ECDL_2003.pdf
27