The Planets Digital Preservation Project
DLF, April 2007
Adam Farquhar
Outline
The Digital Preservation problem
The Planets project
Goals
Status
Architecture
The Planets testbed
The Planets interoperability framework
Preservation planning
2
Digital information at risk
Our society risks a gaping hole in the cultural and
scientific record unless we act now
European National Libraries and Archives
Have the legal responsibility and the legislative
framework to safeguard digital information
Must provide sustained access to cultural and
scientific knowledge
Have limited ability to ensure that today’s digital
information will be accessible for future
generations
Meeting the challenge of preserving access goes
beyond the capabilities of any single institution
3
EU Support for digital preservation
Major initiative in the Information Society Technologies
(IST) Framework Programme 6 Call 5
Two Integrated Projects funded: Planets (BL), Caspar
(CCLRC)
Coordinated action: DPE (HATII at Glasgow)
Planets builds on strong digital archiving and preservation
programmes at European, National and institutional levels
Addresses core digital preservation challenges
Use an empirical approach to learn what works and why
Four-year project starting June 2006 with a €15m budget
4
Losing digital information hurts everyone
An NHS doctor needs a 1987 clinical study found on Google
Scholar
She tries to open the ‘dvi’ file, but can’t
A father shows his children the computer game he wrote in school
He wrote the game in PDP assembler
He stored the program on paper tape
A small business owner wants to market the energy saving device
it developed in 1985
She carefully stored all of the files
Now she doesn’t have the applications to read the documents,
spread-sheets, and CAD drawings
The CAD company is long out of business
5
Losing digital information costs opportunity
A university research lab has provided its data, technical
reports, software on-line since 1984 and on the web since
1990. The professor retires and closes the lab in 2004
A university IP officer wants to defend a patent challenge
A biographer wants to review the unpublished work
A former student wants to revive a line of research
The digital files
Some are damaged
Some rely on applications that are out-of-use
Some rely on hardware that is unavailable
Some rely on an environment that no longer exists
Some rely on information that no-one recorded
6
Losing digital information costs money
An oil company collected
extensive data for a reservoir
and wants to exploit it in 2007
All documents and data are
held in v1.3 of an integrated
management product
They now use v9.0 and can’t
read or access it
An oilfield services company
collects dipmeter data in the
1970s
Stored on 7-Track tapes
Recorded in optimised
formats
Difficult and expensive to
repeat measurement data
7
How big is the problem?
Who is touched by digital preservation problems?
Individual consumers
Small and medium sized enterprises
Large corporations
University libraries, faculties, institutes
Publishers
Libraries
Local, regional, national governments
… every person or organisation that keeps digital material for more
than 15 years!
Estimates suggest Europe loses €3bn per year in business value
8
Planets
Four-year EU-Funded (FP6) Digital Preservation research and
technology development project.
Increase Europe’s ability to ensure long-term access to its cultural
and scientific heritage
Improve decision-making about long term preservation
Ensure long-term access to valued digital content
Control the costs of preservation actions through increased
automation, scaleable infrastructure
Ensure wide adoption across the user community and establish a
marketplace for preservation services and tools
Build practical solutions
Integrate existing expertise, designs and tools
Share and build
9
Planets
Brings together Archives, Libraries, researchers and technology
companies
Builds on strong digital archiving and preservation programmes
Addresses core challenges
Focuses on needs of Libraries and Archives
Will provide an interoperable framework to enable
Third-parties to provide tools and services
Vendors to integrate preservation services
Content owners to ensure long-term access to their digital content
Will use an empirical approach to gather evidence
Outreach shows potential to make a difference
10
Planets partners I
The British Library
National Library, Netherlands
Austrian National Library
State and University Library,
Denmark
Royal Library, Denmark
National Archives, UK
Swiss Federal Archives
National Archives, Netherlands
11
Planets partners II
Tessella Plc
IBM Netherlands
Microsoft Research, Cambridge
Austrian Research Centers
HATII at University of Glasgow
University of Freiburg
Technical University of Vienna
University of Cologne
12
The Team
Kick-off meeting, June 2006
All Staff Meeting, Feb 2007
13
Planets approach
Planning services that empower organisations to define, evaluate,
and execute preservation plans
Methodologies, tools and services for Characterisation of digital
objects
Innovative solutions for Preservation Actions
An Interoperability Framework provides distributed
services
A Testbed enables objective evaluation of protocols, tools, services
and plans
Outreach, workshops and training to engage the user and vendor
communities
14
Project architecture reflects problem structure
Diagram components:
Preservation Planning Services
Preservation Action Services
Characterisation Services
Test Bed: evaluation and validation services
Interoperability Framework
Inputs: Digital Content, Org. Context, External Context
15
Status
Fall 06
Built the team
Gathered initial requirements
Conducted workshops and surveys
Winter 07
Built specifications
Evaluated component technologies
Spring 07
Finalised many technical and implementation decisions
Started to build tools and services
Summer 07
Initial prototypes completed
First experiments conducted
16
Key technology choices
Extensive use of XML throughout architecture
Extensive use of web services
Extensive use of Java and open source components
JSF (JavaServer Faces) for user interfaces
Workflows
BPEL – Business Process Execution Language to describe experiments
and plans
Eclipse BPEL workflow designer
Repository and interfaces
JSR-170 Repository API
Access to digital archives
Jackrabbit to manage intermediate storage and data
Sandbox technology for some third-party tools
JBoss application server
17
Testbed
Provides a foundation for objective evaluation
Load content
Experiment: collect data, evaluate results, compare outcomes
Validate preservation plans
Benchmark tools and services
Consists of
Data storage, hardware, Planets software, testbed software
Benchmark and other content
Provides resources for
The project partners
The preservation community
External organisations
Tool and service certification
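The experiment cycle above (load content, run tools, evaluate results, compare outcomes) can be sketched in a few lines. This is only an illustration under stated assumptions: the function `run_experiment`, the two toy tools and the `preserved_length` measure are all hypothetical and are not part of the Planets testbed software.

```python
def run_experiment(tools, sample_content, evaluate):
    """Run every tool over the sample content and return its mean score."""
    results = {}
    for tool_name, tool in tools.items():
        scores = [evaluate(item, tool(item)) for item in sample_content]
        results[tool_name] = sum(scores) / len(scores)
    return results


# Hypothetical "preservation tools" acting on plain strings.
def lossless_copy(text):
    return text


def truncating_tool(text):
    return text[:5]


# Hypothetical evaluation measure: fraction of the original length retained.
def preserved_length(original, migrated):
    return len(migrated) / len(original)


sample_content = ["digital", "preservation"]
results = run_experiment(
    {"lossless_copy": lossless_copy, "truncating_tool": truncating_tool},
    sample_content,
    preserved_length,
)
best_tool = max(results, key=results.get)
```

Running every candidate over the same benchmark content with the same measure is what makes the comparison objective; a real experiment would use richer measures than string length.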
18
Testbed – Screen Shot
Design
Experiment
19
Testbed – Screen Shot
Run
Experiment
20
Interoperability Framework
Provide the glue to hold the Planets tools and services
together
Provide service registries
Characterisation services
Preservation action services
Provide shared services
Security, authentication, authorisation
Monitoring, logging, auditing
Intermediate data, repository, file system space
Execute and manage workflows
Enable third-parties to provide tools and services
Enable vendors to integrate preservation services
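As a rough illustration of the service-registry idea above, the sketch below registers services by kind and looks them up again. It assumes a simple in-memory design; the class names, service names and example.org endpoints are hypothetical and do not describe the actual Planets registry, which the slides describe only as web-service based.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceEntry:
    name: str      # hypothetical service name
    kind: str      # e.g. "characterisation" or "preservation-action"
    endpoint: str  # hypothetical web-service URL


@dataclass
class ServiceRegistry:
    _entries: dict = field(default_factory=dict)

    def register(self, entry):
        """Add or replace a service, keyed by its name."""
        self._entries[entry.name] = entry

    def find(self, kind):
        """Return all registered services of one kind, sorted by name."""
        matches = (e for e in self._entries.values() if e.kind == kind)
        return sorted(matches, key=lambda e: e.name)


registry = ServiceRegistry()
registry.register(
    ServiceEntry("format-identifier", "characterisation", "http://example.org/identify")
)
registry.register(
    ServiceEntry("tiff-to-png", "preservation-action", "http://example.org/migrate")
)
actions = registry.find("preservation-action")
```

A workflow engine could query such a registry at run time, so third parties can add services without changes to the workflows that use them.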
21
Diagram components:
Applications: Characterisation Services, Preservation Planning, Testbed, Admin Tool, Workflow Designer, ….
Service Bus
Interoperability Framework: Repositories, Security, Monitoring, Service Registry, Workflow Execution Engine, Transaction Manager, Work Space, Exception Handling, Registry Services
Database Layer
22
Registries
Interoperability Framework – Workflow design
23
Preservation planning
Diagram components: Preservation Policy, Content Profile, Preservation Planner, Plans, Plan Evaluator, Usage Profile, Sample Content, Plan, Actions
24
Preservation plan execution
Diagram components: Plan, Adaptor, Delivery, Executor, Repository, Content
25
Content characterisation
Characterise content to support preservation
Reduce up-front metadata costs
E.g., Harvard segmented images based on tool parameters
Build on TNA's PRONOM for file-format identification
Define a characterisation language (XCDL)
Define an extraction language (XCEL)
Define a pluggable interpreter
Extend to measure loss due to actions
All transformations cause loss
Leverage understanding to improve file formats
Address a root cause of digital obsolescence
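One way to picture "measure loss due to actions" is to characterise an object before and after a transformation and compare the two property sets. The sketch below assumes characterisation yields a flat dictionary of properties; the function `measure_loss` and the property names are hypothetical and are not XCDL or XCEL.

```python
def measure_loss(before, after):
    """Return properties that changed or disappeared after an action.

    Each value is a pair (original value, new value); the new value is
    None when the property is missing from the transformed object.
    """
    loss = {}
    for name, original in before.items():
        if name not in after:
            loss[name] = (original, None)
        elif after[name] != original:
            loss[name] = (original, after[name])
    return loss


# Hypothetical characterisation output for a source file and its migrated copy.
source_props = {"width": 800, "height": 600, "colour_depth": 24, "embedded_fonts": True}
migrated_props = {"width": 800, "height": 600, "colour_depth": 8}

loss = measure_loss(source_props, migrated_props)
```

Here `loss` reports the reduced colour depth and the missing fonts, which is the kind of evidence a planner needs when choosing between migration tools.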
26
Preservation actions
Transform content
Wrap third-party transformation tools
Fill gaps with new tools
Preserve relational databases
Build on Swiss Archive work
Preserve Office content
Build on MSFT tools
Transform environments
Modular emulation of the full hardware/software environment
Provides full look-and-feel
Superb for highly dynamic content
Leverage Virtual Machine technology
Layered durable emulation
Build on IBM Universal Virtual Computer (UVC)
Establish abstract device drivers
27
Conclusion
Planets
Is a major EU co-funded digital preservation project
Addresses the needs of Libraries and Archives
Has made substantial progress towards a service-oriented
preservation infrastructure
Looks forward to working with the international Digital Library
community to test, evaluate, refine, improve
28
Questions?
Thanks!
http://www.planets-project.eu/
[email protected]
29