Transcript Slide 1
Where are we with Digital Preservation?
Andrew Waugh
Public Record Office Victoria

Where are we?
• It is not the end. It may not even be the beginning of the end. But it is undoubtedly the end of the beginning – Winston Churchill
• This talk will cover
  – Consensus views on digital preservation
  – Open questions and future challenges

What this presentation will cover
• Understanding (building systems)
• Storage (preserving the bit strings)
• Access (preserving the meaning)
• Metadata (preserving the context & authenticity)
• Transfer (overcoming system senescence)

Understanding
• Communication requires shared terminology and concepts
• Open Archival Information System (OAIS) reference model (ISO 14721:2003)
  – http://public.ccsds.org/publications/archive/650x0b1.pdf
  – High level terminology very widely used, but few use the detail in the model
  – Does not cover preservation
  – Pre-web, and the detail does not reflect actual implementations
  – Currently under review

Trusted digital repositories
• How can you be sure if an organisation (& its system) is up to holding your digital objects?
• Trustworthy Repositories Audit and Certification
  – CRL/NARA (2007)
    • http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=91
  – Administrative focus rather than technical – high level (cannot be tested)
  – Based on OAIS, basis for audit checklists

Audit checklists
• Provide tests to see if a repository can be trusted
  – Drambora: DCC/DPE (2007)
    • Risk based, self certification
    • http://www.repositoryaudit.eu/

Public domain digital repositories
• Public domain digital repository code
  – D-Space (http://www.dspace.org/)
  – Fedora (http://www.fedora-commons.org/)
• Both came out of the academic community and primarily support institutional repositories

Storage – preserving the bit string
• Fundamental task of digital preservation is ensuring that the bits that make up the digital objects are preserved
• “Solved” problem – large scale data repositories have existed for decades and there is lots of operational experience
• Archival twist: actively monitor health of stored objects using hashes

Storage - future challenges
• Reducing storage cost (and chance for error)
  – Swedish National Archives estimated in 2005 between 4 and 8 Euro per digitised page, mostly in system and support costs
  – http://www.tape-online.net/docs/Palm_Black_Hole.pdf
• Reducing risks
  – Administrator risk vs packaged risk
• Ideal storage system
  – Packaged (i.e. built in administration such as the Centera)
  – Open so that you can trust it and replace components
• CLOCKSS
  – Uses redundant copies at participating institutions to ensure preservation (LOCKSS)
  – http://www.clockss.org/clockss/Home

Access – preserving the meaning
• What do you do when you no longer have an application to open the data files?
• Current approach is either
  – Do nothing now with eventual migration
  – Normalisation upon accession
• Future approach might be emulation

Migration
• Save what you capture now and convert to new formats as required
  – Web harvesting (studies show web sites are mostly safe formats – HTML, XML, jpeg, gif, etc)
  – Formats (and software) proving surprisingly resilient

Normalisation
• Convert upon accession to small number of long term preservation formats
  – E.g. PDF/A (PROV), ODF (NAA)
  – Immediate cost upon accession, but expected lower long term management cost
  – Criteria for good LTPF (Library of Congress)
    • http://www.digitalpreservation.gov/formats/intro/intro.shtml

Challenges
• What is it? Tools to determine file formats
  – Pronom – repository of format descriptions and DROID (format classifier) http://www.nationalarchives.gov.uk/pronom/
  – JHOVE (Harvard) classifier and simple validation http://hul.harvard.edu/jhove/
• How accurate is the conversion?
• Is it a valid file according to the standard?

Metadata is better data
• Metadata is information about the bit string
  – What it is (semantic)
  – What it is (technical)
  – How it relates to other digital objects
  – What is its history?
  – How is it to be managed?
• Unfortunately, lots and lots of large metadata standards

Metadata standards
• For an excellent summary of metadata standards see the Metadata chapter in the DCC Digital Curation Manual
  – http://www.dcc.ac.uk/resource/curationmanual/chapters/metadata/metadata.pdf

Digital preservation metadata
• Data Dictionary for Preservation Metadata (PREMIS)
  – little descriptive information and nothing format specific
  – http://www.loc.gov/standards/premis/
• ISO 23081 (Metadata for records)
• National Archives Australia Recordkeeping Metadata Standard
  – http://www.naa.gov.au/Images/rkms_pt1_2_tcm2-1036.pdf

Future challenges
• Too many competing standards
  – Which do I implement?
• Too many elements
  – Increases cost of standard development and software implementation
• Few elements ever used
  – Too expensive and too hard to capture metadata

Transfer – overcoming system senescence
• Digital objects have a much longer life than the systems that hold them
  – Move objects to digital repositories where they can be properly managed
  – Move them from one digital repository to its replacement
• Storage is so cheap that holders may be tempted to keep digital objects (until it is too late)

Future challenges
• Current systems are not designed around the assumption that digital objects must be relocated
  – AIHT, Conceptual Issues from Practical Tests, Clay Shirky, D-Lib Magazine, Vol 11 No 12, December 2005, http://www.dlib.org/dlib/december05/shirky/12shirky.html
• ADRI-UN/CEFACT work on a standard to transfer custody of digital records

More information
• If I have whetted your appetite...
  – PADI Annotated bibliography of digital preservation (http://www.nla.gov.au/padi/)
  – D-Lib Magazine (http://www.dlib.org/)

Final thoughts
• We know about compasses, and we have some charts, but there are a lot of rocks out there… We are a long way from satellite navigation
• What about small/medium archives… personal archives?
• Are photographs better digital or as negatives?
  – http://www.wilhelm-research.com/
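
Note on the storage slides
The "archival twist" of actively monitoring the health of stored objects using hashes could look roughly like the sketch below. It is an illustration only, not the method used by PROV or by any particular repository product. The function names (sha256_of, build_manifest, audit), the choice of SHA-256, and the in-memory manifest are all assumptions made for the example; a real repository would persist the manifest separately from the objects it describes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a hash for every file under root (hypothetical manifest format)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the store against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        # Recorded at accession but no longer present in the store.
        "missing": sorted(set(manifest) - set(current)),
        # Present in the store but never recorded.
        "unexpected": sorted(set(current) - set(manifest)),
        # Present in both, but the bits have changed.
        "corrupted": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In use, build_manifest would be run once at accession and audit re-run on a schedule; any non-empty list in the report signals an object to restore from a redundant copy.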