The Planets Digital Preservation Project
DLF, April 2007
Adam Farquhar

Outline
- The Digital Preservation problem
- The Planets project
  - Goals
  - Status
  - Architecture
- The Planets testbed
- The Planets interoperability framework
- Preservation planning

Digital information at risk
- Our society risks a gaping hole in the cultural and scientific record unless we act now
- European National Libraries and Archives
  - Have the legal responsibility and the legislative framework to safeguard digital information
  - Must provide sustained access to cultural and scientific knowledge
  - Have limited ability to ensure that today’s digital information will be accessible for future generations
- Meeting the challenge of preserving access goes beyond the capabilities of any single institution

EU Support for digital preservation
- Major initiative in the Information Society Technologies (IST) Framework Programme 6 Call 5
  - Two Integrated Projects funded: Planets (BL), Caspar (CCLRC)
  - Coordinated action: DPE (HATII at Glasgow)
- Planets builds on strong digital archiving and preservation programmes at European, national and institutional levels
- Addresses core digital preservation challenges
- Uses an empirical approach to learn what works and why
- Four-year project starting June 2006 with a €15m budget

Losing digital information hurts everyone
- An NHS doctor needs a 1987 clinical study found on Google Scholar
  - She tries to open the ‘dvi’ file, but can’t
- A father shows his children the computer game he wrote in school
  - He wrote the game in PDP assembler
  - He stored the program on paper tape
- A small business owner wants to market the energy-saving device her business developed in 1985
  - She carefully stored all of the files
  - Now she doesn’t have the applications to read the documents, spreadsheets, and CAD drawings
  - The CAD company is long out of business

Losing digital information costs opportunity
- A university research lab has provided its data, technical reports, and software on-line since 1984 and on the web since 1990.
- The professor retires and closes the lab in 2004
- A university IP officer wants to defend a patent challenge
- A biographer wants to review the unpublished work
- A former student wants to revive a line of research
- The digital files
  - Some are damaged
  - Some rely on applications that are out of use
  - Some rely on hardware that is unavailable
  - Some rely on an environment that no longer exists
  - Some rely on information that no-one recorded

Losing digital information costs money
- An oil company collected extensive data for a reservoir and wants to exploit it in 2007
  - All documents and data are held in v1.3 of an integrated management product
  - They now use v9.0 and can’t read or access it
- An oilfield services company collected dipmeter data in the 1970s
  - Stored on 7-track tapes
  - Recorded in optimised formats
  - Difficult and expensive to repeat measurement data

How big is the problem?
- Who is touched by digital preservation problems?
  - Individual consumers
  - Small and medium-sized enterprises
  - Large corporations
  - University libraries, faculties, institutes
  - Publishers
  - Libraries
  - Local, regional, national governments
- … every person or organisation that keeps digital material for more than 15 years!
- Estimates suggest Europe loses €3bn per year in business value

Planets
- Four-year EU-funded (FP6) digital preservation research and technology development project.
- Increase Europe’s ability to ensure long-term access to its cultural and scientific heritage
  - Improve decision-making about long-term preservation
  - Ensure long-term access to valued digital content
  - Control the costs of preservation actions through increased automation and scalable infrastructure
  - Ensure wide adoption across the user community and establish a marketplace for preservation services and tools
- Build practical solutions
  - Integrate existing expertise, designs and tools
  - Share and build

Planets
- Brings together Archives, Libraries, researchers and technology companies
- Builds on strong digital archiving and preservation programmes
- Addresses core challenges
- Focuses on needs of Libraries and Archives
- Will provide an interoperable framework to enable
  - Third parties to provide tools and services
  - Vendors to integrate preservation services
  - Content owners to ensure long-term access to their digital content
- Will use an empirical approach to gather evidence
- Outreach shows potential to make a difference

Planets partners I
- The British Library
- National Library, Netherlands
- Austrian National Library
- State and University Library, Denmark
- Royal Library, Denmark
- National Archives, UK
- Swiss Federal Archives
- National Archives, Netherlands

Planets partners II
- Tessella Plc
- IBM Netherlands
- Microsoft Research, Cambridge
- Austrian Research Centers
- HATII at University of Glasgow
- University of Freiburg
- Technical University of Vienna
- University of Cologne

The Team
- Kick-off meeting, June 2006
- All Staff Meeting, Feb 2007

Planets approach
- Planning services that empower organisations to define, evaluate, and execute preservation plans
- Methodologies, tools and services for Characterisation of digital objects
- Innovative solutions for Preservation Actions
- An Interoperability Framework provides distributed services
- A Testbed enables objective evaluation of protocols, tools, services and plans
- Outreach, workshops and training to engage the user and vendor communities

Project architecture reflects problem structure
[Diagram labels: Preservation Planning Services; Preservation Action Services; Characterisation Services; Test Bed: evaluation and validation services; Interoperability Framework; Digital Content; Org. Context; External Context]

Status
- Fall 06
  - Built the team
  - Gathered initial requirements
  - Conducted workshops and surveys
- Winter 07
  - Built specifications
  - Evaluated component technologies
- Spring 07
  - Finalised many technical and implementation decisions
  - Started to build tools and services
- Summer 07
  - Initial prototypes completed
  - First experiments conducted

Key technology choices
- Extensive use of XML throughout architecture
- Extensive use of web services
- Extensive use of Java and open source components
  - JSF (Java Server Faces) for user interfaces
- Workflows
  - BPEL (Business Process Execution Language) to describe experiments and plans
  - Eclipse BPEL workflow designer
- Repository and interfaces
  - JSR-170 Repository API
  - Access to digital archives
  - Jackrabbit to manage intermediate storage and data
- Sandbox technology for some third-party tools
- JBoss application server

Testbed
- Provides a foundation for objective evaluation
  - Load content
  - Experiment: collect data, evaluate results, compare outcomes
  - Validate preservation plans
  - Benchmark tools and services
- Consists of
  - Data storage, hardware, Planets software, testbed software
  - Benchmark and other content
- Provides resources for
  - The project partners
  - The preservation community
  - External organisations
  - Tool and service certification

Testbed: Screen Shot
[Screenshot: Design Experiment]

Testbed: Screen Shot
[Screenshot: Run Experiment]

Interoperability Framework
- Provide the glue to hold the Planets tools and services together
- Provide service registries
  - Characterisation services
  - Preservation action services
- Provide shared services
  - Security, authentication, authorisation
  - Monitoring, logging, auditing
  - Intermediate data, repository, file system space
- Execute and manage workflows
- Enable third parties to provide tools and services
- Enable vendors to integrate preservation services

[Diagram labels: Applications (Characterisation Services, Preservation Planning, Testbed, Admin Tool, Workflow Designer, …); Service Bus; Interoperability Framework; Repositories; Security; Monitoring; Service Registry; Workflow Execution Engine; Transaction Manager; Work Space; Exception Handling; Registry Services; Database Layer]

Interoperability Framework: Workflow design
[Screenshot; visible label: Registries]

Preservation planning
[Diagram labels: Preservation Policy Content Profile Preservation Planner Plans Plan Evaluator Usage Profile Sample Content Plan Actions]

Preservation plan execution
[Diagram labels: Plan Adaptor Delivery Executor Repository Content]

Content characterisation
- Characterise content to support preservation
  - Reduce up-front metadata costs
  - E.g., Harvard segmented images based on tool parameters
- Build on TNA’s PRONOM for file-format identification
- Define a characterisation language (XCDL)
- Define an extraction language (XCEL)
- Define a pluggable interpreter
- Extend to measure loss due to actions
  - All transformations cause loss
- Leverage understanding to improve file formats
  - Address a root cause of digital obsolescence

Preservation actions
- Transform content
- Transform environments
- Wrap third-party transformation tools
- Modular emulation of the full hardware/software environment
  - Provides full look and feel
  - Superb for highly dynamic content
- Fill gaps with new tools
  - Preserve relational databases
    - Build on Swiss Archive work
  - Preserve Office content
    - Build on MSFT tools
- Leverage Virtual Machine technology
  - Layered durable emulation
  - Build on IBM Universal Virtual Computer (UVC)
  - Establish abstract device drivers

Conclusion
- Planets
  - Is a major EU co-funded digital preservation project
  - Addresses the needs of Libraries and Archives
  - Has made substantial progress towards a service-oriented preservation infrastructure
  - Looks forward to working with the international Digital Library community to test, evaluate, refine, improve

Questions? Thanks!
http://www.planets-project.eu/ [email protected]
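
The content characterisation slide talks about identifying file formats and measuring the loss caused by preservation actions. The sketch below illustrates both ideas in a very reduced form. It is not PRONOM's signature model, XCDL, or XCEL: the signature table, the function names `identify_format` and `compare_properties`, and the sample property names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative table of leading byte signatures ("magic numbers").
# A real registry such as PRONOM holds far richer signature definitions;
# this hypothetical table only covers a handful of common formats.
SIGNATURES: Dict[bytes, str] = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"PK\x03\x04": "ZIP container",
}


def identify_format(data: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def compare_properties(
    before: Dict[str, object], after: Dict[str, object]
) -> Tuple[List[str], Dict[str, Tuple[object, object]]]:
    """Compare characterisation properties recorded before and after an action.

    Returns the names of properties that disappeared, plus a mapping of
    properties whose values changed to (old value, new value).
    """
    dropped = sorted(key for key in before if key not in after)
    changed = {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }
    return dropped, changed


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.4\n"))

    # Hypothetical properties of an image before and after a migration.
    original = {"width": 640, "height": 480, "colour_depth": 24, "layers": 3}
    migrated = {"width": 640, "height": 480, "colour_depth": 8}
    print(compare_properties(original, migrated))
```

Comparing a fixed set of properties before and after an action is one simple way to make "all transformations cause loss" measurable; which properties matter for a given collection is a planning decision that this sketch leaves open.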