Data Archiving and Networked Services
DANS & Database Archiving
- MIXED
- SDFP
René van Horik
Program Manager
OPF Hackathon, Copenhagen, 7-9 February 2012
DANS is an institute of KNAW and NWO
Outline
1. DANS – Data Archiving & Networked Services
2. Data Archiving @ DANS
3. MIXED: a DANS Software project
4. SDFP Specification (Standard Data Format for Preservation)
5. Hackspirations
1. DANS
• DANS => “Data Archiving & Networked Services”
• Mission: Durable access to research data
• Project oriented
• Ca. 40 people
• DANS is an institute of KNAW and NWO
• http://www.dans.knaw.nl
2. Data Archiving @ DANS
• EASY = Trusted Digital Repository of DANS ->
http://easy.dans.knaw.nl
• Data Seal of Approval ->
http://www.datasealofapproval.org
• 25,000 datasets / > 1,000,000 files / 10 data
archivists
• DANS guarantees accessibility over time of
“Preferred File Formats”
• Audit & Certification of DANS digital archiving
infrastructure
• APARSEN Network of Excellence (EU project)
3. MIXED-project
• MIXED = Migration to Intermediate XML for
Electronic Data
• Implementation of “Smart Migration” strategy
• MIXED Software
Smart Migration strategy
• Conversion upon ingest of specific kinds of data
formats (such as spreadsheets and databases) to
an intermediate generic format expressed in the
XML data format. Upon dissemination the file is
converted from this generic format into a current
format of choice.
• Assumption: XML is a durable file format
• Smart migration can be considered as a
combination of normalisation and migration
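The strategy above can be sketched as a round trip: a source table is converted to an intermediate XML form on ingest and converted back to a current format on dissemination. This is only an illustration under assumptions. The element names (`table`, `columns`, `column`, `row`, `cell`) are hypothetical and are not the SDFP schema, and CSV stands in for both the ingested and the disseminated format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def ingest_to_xml(csv_text: str) -> str:
    """On ingest: convert a CSV table to a hypothetical intermediate XML form."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("the source table has no header row")
    header, records = rows[0], rows[1:]

    table = ET.Element("table")  # hypothetical element names, not SDFP
    columns = ET.SubElement(table, "columns")
    for name in header:
        ET.SubElement(columns, "column", name=name)
    for record in records:
        row = ET.SubElement(table, "row")
        for value in record:
            ET.SubElement(row, "cell").text = value
    return ET.tostring(table, encoding="unicode")


def disseminate_as_csv(xml_text: str) -> str:
    """On dissemination: convert the intermediate XML to a current format (here CSV)."""
    table = ET.fromstring(xml_text)
    header = [column.get("name") for column in table.find("columns")]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in table.findall("row"):
        # An empty cell is parsed back as None, so map it to an empty string.
        writer.writerow([cell.text or "" for cell in row.findall("cell")])
    return out.getvalue()
```

Only the XML form would be kept in the archive. Supporting a new dissemination format then means writing one new converter from the XML, not one converter per obsolete source format.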
[Figure: Smart Migration, snapshot]
[Figure: Smart Migration, timeline]
MIXED Software
• Generic framework with conversion plug-ins
• Several interfaces possible: web console (see
below), command line tool, web service, …
• Open source libraries (see below)
• Building block in preservation workflow
standard used!
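A generic framework with conversion plug-ins could be organised along the following lines. This is a minimal sketch and not the MIXED implementation: the registry, the `register` decorator, the `convert` function and the toy `lines` plug-in are all hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# A plug-in takes the raw bytes of a source file and returns intermediate XML.
Converter = Callable[[bytes], str]

# Hypothetical registry mapping a format name to its conversion plug-in.
_PLUGINS: Dict[str, Converter] = {}


def register(fmt: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a conversion plug-in under a format name."""
    def decorator(func: Converter) -> Converter:
        _PLUGINS[fmt.lower()] = func
        return func
    return decorator


@register("lines")
def lines_to_xml(data: bytes) -> str:
    """Toy plug-in: wrap each line of a UTF-8 text file in a <line> element."""
    root = ET.Element("document")
    for line in data.decode("utf-8").splitlines():
        ET.SubElement(root, "line").text = line
    return ET.tostring(root, encoding="unicode")


def convert(fmt: str, data: bytes) -> str:
    """Framework entry point: look up the plug-in for a format and run it."""
    try:
        plugin = _PLUGINS[fmt.lower()]
    except KeyError:
        raise ValueError(f"no conversion plug-in registered for {fmt!r}") from None
    return plugin(data)
```

Because every interface (web console, command line tool, web service) would call the same `convert` entry point, adding support for a format only requires registering one more plug-in.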
Libraries for reading and writing
obsolete binary file formats (Open Source)
• dBase http://dans-dbf-lib.sourceforge.net/
• DataPerfect http://dans-dp-lib.sourceforge.net/
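To give an impression of what reading an obsolete binary format involves, the sketch below parses the first bytes of a dBase (.dbf) file header. It does not use or reproduce the API of the DANS libraries listed above. The function and class names are hypothetical, and the field layout is the commonly documented dBase III header layout, taken here as an assumption.

```python
import struct
from typing import NamedTuple, Tuple


class DbfHeader(NamedTuple):
    version: int                       # format version byte
    last_update: Tuple[int, int, int]  # raw (year, month, day) bytes
    record_count: int                  # number of records in the table
    header_length: int                 # size of the header in bytes
    record_length: int                 # size of one record in bytes


def parse_dbf_header(data: bytes) -> DbfHeader:
    """Parse the fixed leading fields of a dBase file header."""
    if len(data) < 32:
        raise ValueError("a dBase header is at least 32 bytes long")
    # Little-endian: 4 single bytes, one 32-bit count, two 16-bit lengths.
    version, yy, mm, dd, count, header_len, record_len = struct.unpack_from(
        "<BBBBIHH", data, 0
    )
    # The year byte is commonly interpreted as an offset from 1900;
    # it is returned raw here so that the caller decides how to read it.
    return DbfHeader(version, (yy, mm, dd), count, header_len, record_len)
```

A full reader would continue with the field descriptors that follow these fixed fields and then decode the records themselves.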
4. SDFP Part 1
• “Standard Data Format for Preservation”
• Defines the features of the intermediate XML data
format. “Wrapper”
• Contains sets of XML schemas for various
significant data kinds and builds on existing XML
representations of file formats (e.g. SIARD / ODF)
• MIXED: concentrates on tabular data (spreadsheets
and databases)
• New data kinds can be added
SDFP Part 2
• Formalisation of SDFP during MIXED project
raised a lot of discussion
– To what extent do we have to replicate existing standards?
(SIARD / ODF)
– What about the provenance metadata?
– Who is our “designated community”?
• We need a specification that can serve as a basis
for the development of services such as MIXED (but
also for other services)
• => Data Dictionary – Metadata for the SDFP Data
Format
The SDFP data format
• The SDFP data format is optimized for
representing the content and structure of a
number of data kinds in a durable way
• Data kind: type of file format that has a structure
optimized for specific functions
• SDFP Data Dictionary is available -> Designed to
facilitate interoperability between systems,
services, and application software to support
long-term management of and continuing access
to data kinds
SDFP Data Dictionary
SDFP Groups of Data Elements
5. Hackspirations
• Working with the MIXED plugins / Libraries
• Working with the SDFP data dictionary
• Suggestions
Reference
René van Horik and Dirk Roorda, “Migration to Intermediate XML for
Electronic Data (MIXED): Repository of Durable File Format
Conversions”, The International Journal of Digital Curation,
volume 6, issue 2, 2011 (http://www.ijdc.net).
Thank you for your attention
For more information please contact
[email protected]
Data Archiving and Networked Services
Anna van Saksenlaan 10, 2593 HT The Hague. P.O. Box 93067, 2509 AB The Hague.
T +31 (0)70 3446 484, F +31 (0)70 3446 482, E [email protected]