Transcript Slide 1
Building a Digital Archives for the City of Vancouver
Glenn Dingwall
[email protected]
14 September, 2011

Project Context
• 2004-2006: VanRIMS Classification Project
• 2008-2009: VanDOCS ERDMS Project
• 2009-2010: Olympic Legacy Project

Project Phases
I - Proof of Concept (2008-2009)
• Public records
• Controlled creation environment
II - Prototype (2009-2010)
• Private records
• Uncontrolled creation environment

Initial Assumptions
• Use OAIS (Open Archival Information System Reference Model) as a starting point
• Progressively add to requirements, drawing from:
  – General preservation standards
    • InterPARES
    • RLG/OCLC Trusted Digital Repositories (TDR)
  – Task-specific standards
    • E.g., PREMIS metadata
  – Institution-specific requirements

CoV Digital Archives: Producers and Consumers

Digital Preservation: The Business Case
• Technology obsolescence
• Technology incompatibility
• Long-term access and usability

Alternatives – What's out there already?
Many free/open source tools are already available:
• Ingest tools: JHOVE, DROID, XENA
• Repository: DSpace, FEDORA, Greenstone
• Access: Archivists' Toolkit, ICA-AtoM
Each covers only a small part of the preservation chain; there is no single start-to-finish solution.

So, what can we do with the existing tools?
Can we piece the various components together into a complete digital preservation system?
Constraints:
• Use open source tools wherever possible
• Lightweight system architecture
• Architecturally independent components

What is OAIS?
OAIS (= Open Archival Information System)
• ISO 14721:2003
• A high-level reference model
• De facto standard for discussing digital preservation concepts at this level
• Important concepts include:
  – Information Model
  – Functional Entities
  – Mandatory Responsibilities

OAIS Information Model
Information Packages contain:
  – Content (records)
  – PDI = Preservation Description Information (metadata)
  – Packaging Information
Three types of Information Packages:
• SIP = Submission Information Package (what we get)
• AIP = Archival Information Package (what we preserve)
• DIP = Dissemination Information Package (what we provide)

Information Package Model
[Diagram: Packaging encloses two parts.]
• Content Information: File 1, File 2, ..., File n
• Preservation Description Information (PDI, "metadata"): Context, Fixity, Provenance, Reference

OAIS Responsibilities
• Accept submissions from Producer
• Establish control over material
• Implement long-term preservation policies
• Determine who the users are ("Designated Community")
• Ensure preserved information is understandable to users
• Provide access

OAIS Functional Entities
• Establishes the main functional components of the system
• Defines the relationships of the components to each other in terms of the information that passes between them

OAIS Functional Entities
[Diagram: Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration, Management. Labelled flows: SIP into Ingest, AIP into Archival Storage, DIP out of Access, plus Metadata, Search, and Queries.]

City of Vancouver Archives Implementation
[Diagram: the same functional entities, annotated with the implementation components ICA-AtoM, Archivematica, and a 50TB NAS.]

Archivematica
[Diagram: Archivematica Pipeline.]
SIP (at Ingest)
  - Content
  - Metadata
AIP (to Archival Storage)
  - Original Content
  - Metadata
  + Normalized Content
  + Preservation Metadata
DIP (to Access System)
  - Access Copies
  - Descriptive Metadata

Ingest Workflow Summary
1. Receive SIP
2. Audit SIP
3. Characterize Content
4. Appraise Content
5. Normalize Content
6. Package AIP
7. Store AIP / Upload DIP

Micro-services
• Characterize and extract metadata
• Scan for viruses in submission documentation
• Verify SIP compliance
• Set file permissions
• Characterize and extract metadata in submission documentation
• Assign file UUIDs and checksums
• Appraise SIP for preservation
• Normalize submission documentation
• Verify metadata directory checksums
• Scan for removed files post appraise SIP for preservation
• Remove files without PREMIS
• Remove thumbs.db files
• Create DIP directory
• Verify PREMIS checksums
• Create Dublin Core template
• Normalize
• Compile METS
• Set file permissions
• Set file permissions
• Add Dublin Core to METS
• Appraise SIP for submission
• Approve normalization
• Copy METS to DIP directory
• Scan for removed files post appraise SIP for submission
• Check for submission documentation
• Generate DIP
• Place in quarantine
• Move Submission Documentation into objects directory
• Set file permissions
• Remove from quarantine
• Assign file UUIDs and checksums to submission documentation
• Prepare AIP
• Extract packages
• Extract packages in submission documentation
• Upload DIP
• Sanitize file and directory names
• Sanitize file and directory names in submission documentation
• Store AIP
• Create SIP backup
• Scan for viruses

Media Type Preservation Plans
Media type | File formats | Preservation format(s) | Access format(s) | Normalization tool
Audio | AC3, AIFF, MP3, WAV, WMA | WAVE (LPCM) | MP3 | FFmpeg
Email | PST | MBOX | MBOX | readpst
Office Open XML | DOCX, PPTX, XLSX | Original format | PDF for PPTX | OpenOffice
Plain text | TXT | Original format | Original format | None
Portable Document Format | PDF | PDF/A | PDF | Ghostscript
Presentation files | PPT | ODF | PDF | OpenOffice
Raster images | BMP, GIF, JPG, JP2*, PCT, PNG*, PSD, TIFF, TGA | Uncompressed TIFF | JPEG | ImageMagick
Raw camera files/Digital Negative format** | 3FR, ARW, CR2, CRW, DCR, DNG, ERF, KDC, MRW, NEF, ORF, PEF, RAF, RAW, X3F | Original format | JPEG | ImageMagick/UFRaw
Spreadsheets | XLS | ODF | Original format | OpenOffice
Vector images | AI, EPS, SVG | SVG | PDF | Inkscape
Video | AVI, FLV, MOV, MPEG-1, MPEG-2, MPEG-4, SWF, WMV | MPEG-2 | MPG | FFmpeg
Word processing files | DOC, WPD, RTF | ODF | PDF | OpenOffice

GIS Preservation Questions
• Appropriate formats
• Acceptable losses during migration/normalization
• Availability of normalization software
• Availability of viewing software
• Necessary metadata

Archivematica Collaborators
• Artefactual Systems Inc.
• City of Vancouver Archives
• International Monetary Fund
• University of British Columbia Library
• Rockefeller Archive Center

Documentation Wikis
Vancouver Digital Archives Project
• http://artefactual.com/wiki/index.php?title=Vancouver_Digital_Archives
Archivematica
• http://archivematica.org/wiki
Qubit (ICA-AtoM)
• http://qubit-toolkit.org/wiki
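
Sketch: looking up a media type preservation plan
The Media Type Preservation Plans table pairs each incoming file format with a preservation format, an access format, and a normalization tool. A minimal Python sketch of that lookup follows. It copies a few rows from the table; the names Plan, PLAN_BY_EXT, and plan_for are hypothetical and chosen only for illustration, and this is not how Archivematica itself stores or applies its plans.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """One row of the preservation plan table (hypothetical structure)."""
    media_type: str
    preservation_format: str
    access_format: str
    tool: str


# A few rows copied from the table; keys are the listed file formats.
_ROWS = {
    ("AC3", "AIFF", "MP3", "WAV", "WMA"): Plan("Audio", "WAVE (LPCM)", "MP3", "FFmpeg"),
    ("PST",): Plan("Email", "MBOX", "MBOX", "readpst"),
    ("PDF",): Plan("Portable Document Format", "PDF/A", "PDF", "Ghostscript"),
    ("AI", "EPS", "SVG"): Plan("Vector images", "SVG", "PDF", "Inkscape"),
    ("DOC", "WPD", "RTF"): Plan("Word processing files", "ODF", "PDF", "OpenOffice"),
}

# Flatten to one entry per file extension for direct lookup.
PLAN_BY_EXT = {ext: plan for exts, plan in _ROWS.items() for ext in exts}


def plan_for(filename: str) -> Optional[Plan]:
    """Return the plan for a file based on its extension, or None if unlisted.

    Assumption: the extension alone identifies the format. A real ingest
    pipeline would rely on format identification tools instead.
    """
    ext = PurePath(filename).suffix.lstrip(".").upper()
    return PLAN_BY_EXT.get(ext)
```

For example, plan_for("minutes.doc") returns the word processing plan (preserve as ODF, provide access as PDF, normalize with OpenOffice), while a file whose extension is not in the mapping returns None and would need a manual decision.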