Building a Digital Archives for the City of Vancouver
Glenn Dingwall
14 September, 2011
Project Context
• 2004-2006: VanRIMS Classification Project
• 2008-2009: VanDOCS ERDMS Project
• 2009-2010: Olympic Legacy Project
Project Phases
I - Proof of Concept (2008-2009)
• Public records
• Controlled creation environment
II – Prototype (2009-2010)
• Private records
• Uncontrolled creation environment
Initial Assumptions
• Use OAIS (Open Archival Information System Reference Model) as a starting point
• Progressively add to requirements, drawing from:
  – General preservation standards
    • InterPARES
    • RLG/OCLC Trusted Digital Repositories (TDR)
  – Task-specific standards
    • E.g., PREMIS metadata
  – Institution-specific requirements
CoV Digital Archives: Producers and Consumers
Digital Preservation: The Business Case
• Technology obsolescence
• Technology incompatibility
• Long-term access and usability
Alternatives – What’s out there already?
Many free/open source tools are already available:
• Ingest tools: JHOVE, DROID, XENA
• Repository: DSpace, FEDORA, Greenstone
• Access: Archivists' Toolkit, ICA-AtoM
Each covers only a small part of the preservation chain; there is no single start-to-finish solution
So, what can we do with the existing tools?
Can we piece the various components together into a complete digital preservation system?
Constraints:
• Use open source tools wherever possible
• Lightweight system architecture
• Architecturally independent components
What is OAIS?
OAIS (=Open Archival Information System)
• ISO 14721:2003
• A high-level reference model
• De facto standard for discussing digital preservation concepts at this level
• Important concepts include
– Information Model
– Functional Entities
– Mandatory Responsibilities
OAIS Information Model
Information Packages contain:
– Content (records)
– PDI = Preservation Description Information (metadata)
– Packaging Information
Three types of Information Packages:
SIP = Submission Information Package (what we get)
AIP = Archival Information Package (what we preserve)
DIP = Dissemination Information Package (what we provide)
Information Package Model
• Packaging
  – Content Information
    • File 1
    • File 2
    • ...
    • File n
  – Preservation Description Information (PDI, "metadata")
    • Reference
    • Provenance
    • Context
    • Fixity
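The package model above can be sketched as a small data structure. This is only an illustration of the OAIS concepts named on the slides: the class names, field types, the default `packaging` value, and the sample values are assumptions made for the example, not part of OAIS or of any tool mentioned in this talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class PackageType(Enum):
    SIP = "Submission Information Package"      # what we get
    AIP = "Archival Information Package"        # what we preserve
    DIP = "Dissemination Information Package"   # what we provide


@dataclass
class PDI:
    """Preservation Description Information: the four categories on the slide."""
    reference: str = ""
    provenance: str = ""
    context: str = ""
    fixity: dict = field(default_factory=dict)  # assumed shape: file name -> checksum


@dataclass
class InformationPackage:
    package_type: PackageType
    content_files: list            # Content Information: File 1 .. File n
    pdi: PDI
    packaging: str = "directory"   # hypothetical description of the packaging


# Hypothetical example values, for illustration only.
sip = InformationPackage(
    package_type=PackageType.SIP,
    content_files=["report.doc", "photo.jpg"],
    pdi=PDI(reference="example-accession-001", provenance="Example department"),
)
print(sip.package_type.value, len(sip.content_files))
```

The same structure serves for all three package types; what changes between SIP, AIP, and DIP is which content and metadata the package carries.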
OAIS Responsibilities
• Accept submissions from Producer
• Establish control over material
• Implement long-term preservation policies
• Determine who the users are ("Designated Community")
• Ensure preserved information is understandable to users
• Provide access
OAIS Functional Entities
• Establishes the main functional
components of the system
• Defines the relationships of the
components to each other in terms of the
information that passes between them
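[Diagram: OAIS functional entities. Entities shown: Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration, Management. Flows shown: SIP into Ingest, AIP into and out of Archival Storage, DIP out of Access, plus metadata and search queries exchanged with Data Management.]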
City of Vancouver Archives Implementation
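[Diagram: the OAIS functional entities (Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration, Management) with their SIP, AIP, and DIP flows, annotated with the components used in this implementation: ICA-AtoM, Archivematica (labelled twice), and a 50TB NAS.]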
Archivematica Pipeline
• SIP goes to Ingest (Archivematica)
  – Content
  – Metadata
• AIP goes to Archival Storage
  – Original Content
  – Metadata
  – Normalized Content (added)
  – Preservation Metadata (added)
• DIP goes to the Access System
  – Access Copies
  – Descriptive Metadata
Ingest Workflow Summary
• Receive SIP
• Audit SIP
• Characterize Content
• Appraise Content
• Normalize Content
• Package AIP
• Store AIP/Upload DIP
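The workflow summary above can be read as an ordered chain of stages. The sketch below shows one way to model that in code. The stage names come from the slide; the `run_ingest` function, the handler convention, and the stop-on-failure behaviour are assumptions made for illustration and do not describe how Archivematica itself is implemented.

```python
INGEST_STAGES = [
    "Receive SIP",
    "Audit SIP",
    "Characterize Content",
    "Appraise Content",
    "Normalize Content",
    "Package AIP",
    "Store AIP/Upload DIP",
]


def run_ingest(package, handlers):
    """Run each stage in order and return the stages that completed.

    `handlers` maps a stage name to a function taking the package and
    returning True on success. Stages without a handler are treated as
    successful no-ops. Processing stops at the first failing stage
    (an assumption for this sketch).
    """
    completed = []
    for stage in INGEST_STAGES:
        handler = handlers.get(stage, lambda pkg: True)
        if not handler(package):
            break
        completed.append(stage)
    return completed


def audit(pkg):
    # Hypothetical audit rule: a SIP must contain at least one file.
    return len(pkg["files"]) > 0


done = run_ingest({"files": ["a.txt"]}, {"Audit SIP": audit})
stopped = run_ingest({"files": []}, {"Audit SIP": audit})
print(done[-1])
print(stopped)
```

With the sample audit rule, the first package passes through every stage, while the empty package is halted after "Receive SIP".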
Micro-services
• Characterize and extract metadata
• Scan for viruses in submission documentation
• Verify SIP compliance
• Set file permissions
• Characterize and extract metadata in submission documentation
• Assign file UUIDs and checksums
• Appraise SIP for preservation
• Normalize submission documentation
• Verify metadata directory checksums
• Scan for removed files post appraise SIP for preservation
• Remove files without PREMIS
• Remove thumbs.db files
• Create DIP directory
• Verify PREMIS checksums
• Create Dublin Core template
• Normalize
• Compile METS
• Set file permissions
• Set file permissions
• Add Dublin Core to METS
• Appraise SIP for submission
• Approve normalization
• Copy METS to DIP directory
• Scan for removed files post appraise SIP for submission
• Check for submission documentation
• Generate DIP
• Place in quarantine
• Move Submission Documentation into objects directory
• Set file permissions
• Remove from quarantine
• Assign file UUIDs and checksums to submission documentation
• Prepare AIP
• Extract packages
• Extract packages in submission documentation
• Upload DIP
• Sanitize file and directory names
• Sanitize file and directory names in submission documentation
• Store AIP
• Create SIP backup
• Scan for viruses
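To give a feel for the scale of an individual micro-service, here is a minimal sketch of three of the tasks named above: removing thumbs.db files, sanitizing names, and assigning file UUIDs and checksums. It is not Archivematica's code. The function names, the use of SHA-256 (the slides do not say which checksum algorithm is used), and the set of characters kept by `sanitize_name` are all assumptions.

```python
import hashlib
import re
import uuid
from pathlib import Path


def remove_thumbs_db(root):
    """Delete every thumbs.db file under root; return the names removed."""
    removed = []
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.name.lower() == "thumbs.db":
            path.unlink()
            removed.append(path.name)
    return removed


def sanitize_name(name):
    """Replace characters outside a conservative set with underscores.

    The allowed set (letters, digits, dot, underscore, hyphen) is an
    assumption for this sketch.
    """
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def assign_ids_and_checksums(root):
    """Return {relative path: {"uuid": ..., "sha256": ...}} for each file."""
    records = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            key = path.relative_to(root).as_posix()
            records[key] = {"uuid": str(uuid.uuid4()), "sha256": digest}
    return records
```

Each function does one small job on a directory tree, which fits the constraint of architecturally independent components stated earlier in the talk. A later verification step could recompute the SHA-256 values and compare them with the stored records.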
Media Type Preservation Plans
Media type | File formats | Preservation format(s) | Access format(s) | Normalization tool
Audio | AC3, AIFF, MP3, WAV, WMA | WAVE (LPCM) | MP3 | FFmpeg
Email | PST | MBOX | MBOX | readpst
Office Open XML | DOCX, PPTX, XLSX | Original format | PDF for PPTX | OpenOffice
Plain text | TXT | Original format | Original format | None
Portable Document Format | PDF | PDF/A | PDF | Ghostscript
Presentation files | PPT | ODF | PDF | OpenOffice
Raster images | BMP, GIF, JPG, JP2*, PCT, PNG*, PSD, TIFF, TGA | Uncompressed TIFF | JPEG | ImageMagick
Raw camera files/Digital Negative format** | 3FR, ARW, CR2, CRW, DCR, DNG, ERF, KDC, MRW, NEF, ORF, PEF, RAF, RAW, X3F | Original format | JPEG | ImageMagick/UFRaw
Spreadsheets | XLS | ODF | Original format | OpenOffice
Vector images | AI, EPS, SVG | SVG | PDF | Inkscape
Video | AVI, FLV, MOV, MPEG-1, MPEG-2, MPEG-4, SWF, WMV | MPEG-2 | MPG | FFmpeg
Word processing files | DOC, WPD, RTF | ODF | PDF | OpenOffice
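A preservation plan table like the one above is essentially a lookup from media type to target formats and tool. The sketch below encodes a few of its rows to show the idea. Matching on file extension is a simplification assumed for this example, and the `PLANS` structure and `plan_for` function are hypothetical names, not part of Archivematica.

```python
from pathlib import Path

# A subset of the rows from the table, keyed by media type.
PLANS = {
    "Audio": {
        "extensions": {"ac3", "aiff", "mp3", "wav", "wma"},
        "preservation": "WAVE (LPCM)",
        "access": "MP3",
        "tool": "FFmpeg",
    },
    "Email": {
        "extensions": {"pst"},
        "preservation": "MBOX",
        "access": "MBOX",
        "tool": "readpst",
    },
    "Vector images": {
        "extensions": {"ai", "eps", "svg"},
        "preservation": "SVG",
        "access": "PDF",
        "tool": "Inkscape",
    },
    "Word processing files": {
        "extensions": {"doc", "wpd", "rtf"},
        "preservation": "ODF",
        "access": "PDF",
        "tool": "OpenOffice",
    },
}


def plan_for(filename):
    """Return (media type, plan) for a file name, or None if no plan matches."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for media_type, plan in PLANS.items():
        if ext in plan["extensions"]:
            return media_type, plan
    return None


match = plan_for("memo.DOC")
print(match[0], "->", match[1]["preservation"], "via", match[1]["tool"])
```

Files with no matching plan return `None`, which leaves the decision about unrecognized formats to a separate step.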
GIS Preservation Questions
• Appropriate formats
• Acceptable losses during
migration/normalization
• Availability of normalization software
• Availability of viewing software
• Necessary metadata
Archivematica Collaborators
• Artefactual Systems Inc.
• City of Vancouver Archives
• International Monetary Fund
• University of British Columbia Library
• Rockefeller Archive Center
Documentation Wikis
Vancouver Digital Archives Project
• http://artefactual.com/wiki/index.php?title=Vancouver_Digital_Archives
Archivematica
• http://archivematica.org/wiki
Qubit (ICA-AtoM)
• http://qubit-toolkit.org/wiki