The Planets Digital Preservation Project
DLF, April 2007
Adam Farquhar
Outline
The Digital Preservation problem
The Planets project
- Goals
- Status
- Architecture
The Planets testbed
The Planets interoperability framework
Preservation planning
Digital information at risk
Our society risks a gaping hole in the cultural and scientific record unless we act now
European National Libraries and Archives
- Have the legal responsibility and the legislative framework to safeguard digital information
- Must provide sustained access to cultural and scientific knowledge
- Have limited ability to ensure that today’s digital information will be accessible for future generations
Meeting the challenge of preserving access goes beyond the capabilities of any single institution
EU Support for digital preservation
Major initiative in the Information Society Technologies (IST) Framework Programme 6 Call 5
- Two Integrated Projects funded: Planets (BL), Caspar (CCLRC)
- Coordinated action: DPE (HATII at Glasgow)
Planets builds on strong digital archiving and preservation programmes at European, National and institutional levels
- Addresses core digital preservation challenges
- Uses an empirical approach to learn what works and why
- Four-year project starting June 2006 with a €15m budget
Losing digital information hurts everyone
An NHS doctor needs a 1987 clinical study found on Google Scholar
- She tries to open the ‘dvi’ file, but can’t
A father shows his children the computer game he wrote in school
- He wrote the game in PDP assembler
- He stored the program on paper tape
A small business owner wants to market the energy-saving device she developed in 1985
- She carefully stored all of the files
- Now she doesn’t have the applications to read the documents, spreadsheets, and CAD drawings
- The CAD company is long out of business
Losing digital information costs opportunity
A university research lab has provided its data, technical reports, and software online since 1984 and on the web since 1990. The professor retires and closes the lab in 2004
- A university IP officer wants to defend a patent challenge
- A biographer wants to review the unpublished work
- A former student wants to revive a line of research
- The digital files
  - Some are damaged
  - Some rely on applications that are out of use
  - Some rely on hardware that is unavailable
  - Some rely on an environment that no longer exists
  - Some rely on information that no one recorded
Losing digital information costs money
An oil company collected extensive data for a reservoir and wants to exploit it in 2007
- All documents and data are held in v1.3 of an integrated management product
- They now use v9.0 and can’t read or access them
An oilfield services company collected dipmeter data in the 1970s
- Stored on 7-track tapes
- Recorded in optimised formats
- Measurements are difficult and expensive to repeat
How big is the problem?
Who is touched by digital preservation problems?
- Individual consumers
- Small and medium sized enterprises
- Large corporations
- University libraries, faculties, institutes
- Publishers
- Libraries
- Local, regional, national governments
… every person or organisation that keeps digital material for more than 15 years!
Estimates suggest Europe loses €3bn per year in business value
Planets
Four-year EU-funded (FP6) Digital Preservation research and technology development project.
Increase Europe’s ability to ensure long-term access to its cultural and scientific heritage
- Improve decision-making about long-term preservation
- Ensure long-term access to valued digital content
- Control the costs of preservation actions through increased automation and scalable infrastructure
- Ensure wide adoption across the user community and establish a marketplace for preservation services and tools
Build practical solutions
- Integrate existing expertise, designs and tools
- Share and build
Planets
Brings together Archives, Libraries, researchers and technology companies
- Builds on strong digital archiving and preservation programmes
- Addresses core challenges
- Focuses on needs of Libraries and Archives
Will provide an interoperable framework to enable
- Third parties to provide tools and services
- Vendors to integrate preservation services
- Content owners to ensure long-term access to their digital content
Will use an empirical approach to gather evidence
Outreach shows potential to make a difference
Planets partners I
The British Library
National Library, Netherlands
Austrian National Library
State and University Library, Denmark
Royal Library, Denmark
National Archives, UK
Swiss Federal Archives
National Archives, Netherlands
Planets partners II
Tessella Plc
IBM Netherlands
Microsoft Research, Cambridge
Austrian Research Centers
HATII at the University of Glasgow
University of Freiburg
Technical University of Vienna
University of Cologne
The Team
Kick-off meeting, June 2006
All Staff Meeting, Feb 2007
Planets approach
Planning services that empower organisations to define, evaluate, and execute preservation plans
Methodologies, tools and services for Characterisation of digital objects
Innovative solutions for Preservation Actions
An Interoperability Framework provides distributed services
A Testbed enables objective evaluation of protocols, tools, services and plans
Outreach, workshops and training to engage the user and vendor communities
Project architecture reflects problem structure
[Diagram components: Preservation Planning Services; Preservation Action Services; Characterisation Services; Test Bed (evaluation and validation services); Digital Content; Org. Context; External Context; Interoperability Framework]
Status
Fall 06
- Built the team
- Gathered initial requirements
- Conducted workshops and surveys
Winter 07
- Built specifications
- Evaluated component technologies
Spring 07
- Finalised many technical and implementation decisions
- Started to build tools and services
Summer 07
- Initial prototypes completed
- First experiments conducted
Key technology choices
Extensive use of XML throughout architecture
Extensive use of web services
Extensive use of Java and open source components
JSF (JavaServer Faces) for user interfaces
Workflows
- BPEL – Business Process Execution Language to describe experiments and plans
- Eclipse BPEL workflow designer
Repository and interfaces
- JSR-170 Repository API
- Access to digital archives
- Jackrabbit to manage intermediate storage and data
- Sandbox technology for some third-party tools
JBoss application server
Testbed
Provides a foundation for objective evaluation
- Load content
- Experiment: collect data, evaluate results, compare outcomes
- Validate preservation plans
- Benchmark tools and services
Consists of
- Data storage, hardware, Planets software, testbed software
- Benchmark and other content
Provides resources for
- The project partners
- The preservation community
- External organisations
- Tool and service certification
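The experiment cycle described for the testbed (load content, collect data, evaluate results, compare outcomes) could be sketched roughly as follows. This is an illustrative Python sketch only, not Planets testbed code: `Result`, `run_experiment`, and the two toy "tools" are hypothetical names invented for the example, and ranking by success count and then by elapsed time is an assumed evaluation rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import time


@dataclass
class Result:
    tool: str
    succeeded: int   # sample objects processed without error
    failed: int      # sample objects that raised an error
    seconds: float   # wall-clock time for the whole sample


def run_experiment(tools: Dict[str, Callable[[bytes], bytes]],
                   samples: List[bytes]) -> List[Result]:
    """Run every candidate tool over the same sample content and rank the outcomes."""
    results = []
    for name, tool in tools.items():
        ok = bad = 0
        start = time.perf_counter()
        for obj in samples:
            try:
                tool(obj)
                ok += 1
            except Exception:
                bad += 1
        results.append(Result(name, ok, bad, time.perf_counter() - start))
    # Assumed ranking rule: most successes first, then fastest.
    return sorted(results, key=lambda r: (-r.succeeded, r.seconds))


# Hypothetical stand-ins for wrapped preservation tools.
def upper_case(data: bytes) -> bytes:
    return data.upper()


def ascii_only(data: bytes) -> bytes:
    # Raises UnicodeDecodeError on non-ASCII input.
    return data.decode("ascii").encode("ascii")


samples = [b"plain text", "caf\u00e9".encode("utf-8")]
ranking = run_experiment({"upper_case": upper_case, "ascii_only": ascii_only}, samples)
for r in ranking:
    print(r.tool, r.succeeded, r.failed)
```

Running every tool against the same benchmark content is what makes the comparison objective; a real testbed would record far richer measurements than success counts and timings.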
Testbed – Screen Shot: Design Experiment
Testbed – Screen Shot: Run Experiment
Interoperability Framework
Provide the glue to hold the Planets tools and services together
- Provide service registries
  - Characterisation services
  - Preservation action services
- Provide shared services
  - Security, authentication, authorisation
  - Monitoring, logging, auditing
  - Intermediate data, repository, file system space
- Execute and manage workflows
- Enable third parties to provide tools and services
- Enable vendors to integrate preservation services
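To make the idea of a service registry concrete, here is a minimal in-memory sketch in Python. It is not the Planets design (which the slides describe as built on web services and Java): the class `ServiceRegistry`, its methods, the service-type strings, and the registered example service are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Service = Callable[[bytes], bytes]


class ServiceRegistry:
    """Hypothetical registry keyed by (service type, input format)."""

    def __init__(self) -> None:
        self._services: Dict[Tuple[str, str], Dict[str, Service]] = defaultdict(dict)

    def register(self, kind: str, input_format: str, name: str, service: Service) -> None:
        """Let a tool provider publish a service for a given format."""
        self._services[(kind, input_format)][name] = service

    def lookup(self, kind: str, input_format: str) -> List[str]:
        """Return the names of all services able to handle this format."""
        return sorted(self._services.get((kind, input_format), {}))

    def invoke(self, kind: str, input_format: str, name: str, data: bytes) -> bytes:
        """Call a registered service by name; raises KeyError if it is unknown."""
        return self._services[(kind, input_format)][name](data)


registry = ServiceRegistry()
# A toy "preservation action" standing in for a wrapped third-party tool.
registry.register("preservation-action", "text/plain", "normalise-newlines",
                  lambda data: data.replace(b"\r\n", b"\n"))

print(registry.lookup("preservation-action", "text/plain"))
print(registry.invoke("preservation-action", "text/plain",
                      "normalise-newlines", b"line one\r\nline two"))
```

Keeping lookup separate from invocation is what would let workflows choose among competing third-party services at run time.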
[Diagram: Interoperability Framework architecture. Layers shown: Applications (Characterisation Services, Preservation Planning, Testbed, Admin Tool, Workflow Designer, …); Service Bus; Interoperability Framework (Repositories, Security, Monitoring, Service Registry, Workflow Execution Engine, Transaction Manager, Work Space, Exception Handling, Registry Services); Database Layer]
Registries
Interoperability Framework – Workflow design
Preservation planning
[Diagram components: Preservation Policy; Content Profile; Usage Profile; Sample Content; Preservation Planner; Plans; Plan Evaluator; Plan; Actions]
Preservation plan execution
[Diagram labels: Plan, Adaptor, Delivery, Executor, Repository, Content]
Content characterisation
Characterise content to support preservation
- Reduce up-front metadata costs
- E.g., Harvard segmented images based on tool parameters
Build on TNA’s PRONOM for file-format identification
- Define a characterisation language (XCDL)
- Define an extraction language (XCEL)
- Define a pluggable interpreter
Extend to measure loss due to actions
- All transformations cause loss
Leverage understanding to improve file formats
- Address a root cause of digital obsolescence
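The idea of measuring loss due to an action can be illustrated by characterising an object before and after a transformation and comparing the extracted properties. The Python sketch below is a toy under stated assumptions: it does not use XCDL, XCEL, or PRONOM, and `characterise`, `measure_loss`, `to_ascii`, and the three properties are hypothetical choices made for the example.

```python
from typing import Dict, Tuple


def characterise(text: str) -> Dict[str, int]:
    """Extract a few toy properties; real characterisation is format-specific."""
    return {
        "characters": len(text),
        "lines": text.count("\n") + 1,
        "non_ascii": sum(1 for ch in text if ord(ch) > 127),
    }


def measure_loss(before: Dict[str, int],
                 after: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return each property whose value changed, as a (before, after) pair."""
    return {key: (before[key], after[key])
            for key in before if before[key] != after[key]}


def to_ascii(text: str) -> str:
    """A deliberately lossy 'migration': silently drops non-ASCII characters."""
    return text.encode("ascii", errors="ignore").decode("ascii")


original = "Caf\u00e9 menu\nCr\u00e8me br\u00fbl\u00e9e"
migrated = to_ascii(original)
loss = measure_loss(characterise(original), characterise(migrated))
print(loss)  # {'characters': (22, 18), 'non_ascii': (4, 0)}
```

The line count survives the migration while four accented characters do not, which is the kind of evidence a plan evaluator could weigh when comparing candidate actions.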
Preservation actions
Transform content
- Wrap third-party transformation tools
- Fill gaps with new tools
- Preserve relational databases
  - Build on Swiss Archive work
- Preserve Office content
  - Build on MSFT tools
Transform environments
- Modular emulation of the full hardware/software environment
  - Provides full look-and-feel
  - Superb for highly dynamic content
- Leverage Virtual Machine technology
- Layered durable emulation
  - Build on IBM Universal Virtual Computer (UVC)
  - Establish abstract device drivers
Conclusion
Planets
Is a major EU co-funded digital preservation project
Addresses the needs of Libraries and Archives
Has made substantial progress towards a service-oriented preservation infrastructure
Looks forward to working with the international Digital Library community to test, evaluate, refine, improve
Questions?
Thanks!
http://www.planets-project.eu/
[email protected]