Building an Audio Preservation System at Indiana University Using Standards and Best Practices
Mike Casey, Archives of Traditional Music
Jon Dunn, Digital Library Program
Jenn Riley, Digital Library Program
April 14, 2008
The Problem
• Numbers
• Degradation
• Obsolescence
Audio/Video at IUB
• AAAMC
• Music Library
• University Archives
• ATM
• HPER
• Radio/TV
• Center for the Study of History and Memory
• Kinsey Institute
• Athletics
• Emeriti House
• Office of University Marketing
• Wells Library
• AISRI
• Office of Dean of the Faculties
• Lilly Library
• Alumni Relations
• School of Journalism
• School of Music
• School of Law
• Traditional Arts Indiana
• Department of History
• Department of Anthropology
• Department of Folklore/Ethnomusicology
• Black Film Center/Archive
Plus many more!
By the Numbers
• ATM: 110,000 (mostly) audio recordings
• Wells Library, Kent Cooper Room: 20,000 videos
• Music Library: 137,000 audio recordings
– 2,000 lacquer discs
– 8,000 DATs
– 50,000 open reel tapes
• CSHM: 3,200 audio recordings
• AAAMC: 5,900 audio recordings
Obsolescence
• Audio formats
• Equipment (playback machines, test devices)
• Repair parts
• Playback expertise
• Repair expertise
• Tools
• Supplies
Preservation in the Analog Domain
• Life expectancy critically important
• Predicting when a recording will fail
• Quest for the eternal carrier
• Target preservation format: mastering-quality open reel tape
• Standards set in the mid-1980s (ARSC/AAA)
The New Paradigm
• Eternal sound carriers never available
• Maintaining equipment long-term
unmanageable
Therefore, classical preservation strategy is hopeless
The New Paradigm
• Preserve the content, not the carrier
• The eternal file, not the eternal carrier
• Use digital mass storage systems
Longevity of carriers in mass storage systems of
minor importance
Standards and Best Practices
• Ensure Quality
• Provide Philosophical/Ethical Foundation
• Encourage Sustainability
• Foster Interoperability
• Provide a Migration Path
Preserving Digital Information
• Advantage: Digital information may be copied
without degradation
• Disadvantage: Digital information requires
active management in order to remain
accessible
Risks of Digital Information:
Bit Loss
• Degradation of physical media
– Optical, magnetic
• Damage or theft of physical media
• Media obsolescence
– Ability to read physical media
– Ability to read logical media format
Risks of Digital Information:
Semantic Loss
• Even if the bits are intact, can a file still be
understood?
• File format obsolescence
• Loss of context
– Insufficient metadata
Risks of Digital Information:
Integrity
• How do we know whether or not information
has been altered, whether intentionally or
unintentionally?
Methods of Mitigating Risks
• Migration
– Migration of data to new physical media
– Migration of data to new file formats
• Replication
– Multiple copies of data in multiple locations
• Validation
– Retain checksums for files, routinely retrieve
files and compare against checksums
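Replication and validation can be combined in one routine check. The following is a minimal sketch, not a description of any system discussed in this talk: it assumes checksums were recorded when the files were created, and that each replica is reachable as a directory. The names `file_digest` and `audit_replicas` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5"):
    """Hash a file in blocks so large audio files are never loaded whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def audit_replicas(stored, replicas, algorithm="md5"):
    """Compare every replica against the retained checksums.

    stored   -- {file name: checksum recorded at creation} (assumed layout)
    replicas -- directories that each hold one full copy
    Returns a list of (replica, file name, problem) tuples.
    """
    problems = []
    for root in replicas:
        for name, expected in stored.items():
            path = Path(root) / name
            if not path.is_file():
                problems.append((str(root), name, "missing"))
            elif file_digest(path, algorithm) != expected.lower():
                problems.append((str(root), name, "mismatch"))
    return problems
```

A copy that fails the audit could then be restored from a replica that still matches its checksum, which is why replication and validation are listed together.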
Scaling Digital Preservation
• Migration, replication, and validation require:
– Automated processes
– Ongoing monitoring, management, and
planning
– Ongoing funding for technology refresh
Digital Repositories
• Centrally-managed systems for storage (and
delivery) of digital information
• Leverage economies of scale for storage and
management costs
• Support preservation integrity functions
(migration, replication, validation)
• Much easier to manage than many little pockets
of digital information
OAIS:
Open Archival Information System
• ISO Standard 14721:2003
• Origins in space science community
• Conceptual framework for an archival system
dedicated to preserving and maintaining access
to digital information over the long term
• Basis for much work on digital preservation
within the library and archive community
OAIS Reference Model
Preservation Packages in OAIS
• Preservation package
– Digital content plus metadata
• SIP: Submission Information Package
• AIP: Archival Information Package
• DIP: Dissemination Information Package
From OAIS to Trusted Digital
Repositories
• 2002 OCLC-RLG task force report:
– Trusted Digital Repositories: Attributes and
Responsibilities
• What are the attributes of a trusted repository?
– OAIS compliance
– Administrative responsibility
– Organizational viability
– Financial sustainability
– System security
– Procedural accountability
Trusted Digital Repositories:
Auditing and Certification
• Digital Repository Audit Method Based on Risk
Assessment (DRAMBORA)
– http://www.repositoryaudit.eu/
• Trustworthy Repositories Audit & Certification
(TRAC): Criteria and Checklist
– OCLC/NARA/CRL report
– http://www.crl.edu/PDF/trac.pdf
Archives of Traditional Music
• Established 1948
• 110,000 recordings
• 1890s to present
• Field—30%
• World music traditions
• Endangered/extinct world languages
Sound Directions
Digital Preservation and Access for Global Audio Heritage
• Collaboration between Harvard University and
Indiana University
• Phase 1 an R&D project funded by NEH
• Focus on preservation
Sound Directions
Digital Preservation and Access for Global Audio Heritage
Project Partners
• Archives of Traditional Music, Indiana University
• Archive of World Music, Harvard University
• Harvard College Library Audio Preservation Services
• Digital Library Program, Indiana University
• Office for Information Systems, Harvard University
Sound Directions
Digital Preservation and Access for Global Audio Heritage
Objectives
• Research best practices in areas without standards or
best practices
• Develop best practices to meet existing and emerging
standards
• Test existing and emerging standards/best practices
with a real world project
Sound Directions
Digital Preservation and Access for Global Audio Heritage
Results
• Publication—Sound Directions: Best Practices for Audio Preservation
• Development of audio preservation system
• Software tools
• Preservation of field collections
Sound Directions
Digital Preservation and Access for Global Audio Heritage
Project Future
• “Preservation” Phase funded by NEH
• Increase throughput
• Simultaneous transfer
• Indiana automation
• Release ATMC
• Develop new access system for field collections
System / Project Planning & Development
Funding
Personnel / Vendor
Equipment
Selection for Preservation
Assess research value
Evaluate condition
Consider political, technical, and other issues
Establish priorities
Software Tools
Creation / maintenance of software and scripts
Cleaning or physical restoration as needed
Collection Setup
Gather and assess documentation
Evaluate collection needs / condition
Assess cataloging / descriptive metadata issues
Develop digitization plan
Assess and calibrate equipment
Preliminary Work / Pilot Project
Exploratory transfers and metadata collection
Quality control
Reassessment of digitization plan
Workflow management / scheduling
Digitization
Analog playback
A/D conversion
Creation of Preservation Master Files
Local filenames
Ingestion into / Copy to Long-Term Storage Solution
Preservation packages
Digitization
Technical metadata
Structural metadata
Checksums
Quality control
Local storage solution
Periodic Evaluation
Data integrity checking
Format obsolescence analysis
Workflow management
Post-Transfer Processing
Quality control
Generation of derivatives
Marking areas of interest in files
Signal processing (if appropriate)
Migration decision
Migration
New carrier
New format
Common sense definition of a system:
• Set of interacting units or elements
• Forms an integrated whole
• Performs a function
A few basic principles…
• Each element/part affects the whole
• Whole is greater than sum of parts
• Inputs and outputs
• Equifinality
What should we preserve?
Selection for Preservation
• Analysis of research value
• Evaluation of preservation condition and risk
Data Collection/Analysis
Condition/Risk (FACET) Score
Research Value Score
Combined Selection Score
Collection Ranking
Curatorial Review
Selection for Preservation
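The flow from two scores to a ranked list can be illustrated with a small sketch. The slides do not say how the condition/risk score and the research value score are combined, so the weighted average below, the `risk_weight` parameter, and the sample collection names are all assumptions for illustration only. This is not FACET's scoring method.

```python
def combined_score(risk, research_value, risk_weight=0.5):
    """Hypothetical combination: a weighted average of the two scores."""
    return risk_weight * risk + (1 - risk_weight) * research_value


def rank_collections(collections, risk_weight=0.5):
    """Order collections from highest to lowest combined score.

    Each collection is a dict with 'name', 'risk', and 'research_value'
    keys (an assumed layout, not one taken from the slides).
    """
    return sorted(
        collections,
        key=lambda c: combined_score(c["risk"], c["research_value"], risk_weight),
        reverse=True,
    )


# Made-up sample data on an arbitrary 0-5 scale.
sample = [
    {"name": "Collection A", "risk": 4, "research_value": 2},
    {"name": "Collection B", "risk": 5, "research_value": 5},
    {"name": "Collection C", "risk": 1, "research_value": 3},
]
ranking = [c["name"] for c in rank_collections(sample)]
```

The ranked list is only an input to curatorial review, which makes the final selection.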
FACET
• Software tool—point-based, collection level
• Analyzes data on condition of field formats
• Returns a risk assessment score
FACET Package
• Software
• Formats document
• Procedures manual
• FACET worksheets
Where should preservation
work be done?
• In-house or outsource?
• Issues: studio space, technical expertise,
amount of work, future location of expertise
• Critical listening spaces
• Development of preservation studio
Who should do preservation transfer
work?
• Audio engineer
• Importance of analog playback stage
• Audio examples
Who and Where Best Practices
• Use audio engineers in the workflow where
their skill is required
• Critical listening environment
• Use cleanest, most direct signal path to
converter
• Instant comparison from playback machine and
post A/D converter
• Test/calibration chain
What is the target preservation format?
• Digital file
• Broadcast Wave Format (BWF or BWAV)
Preservation involves a long-term responsibility to
the digital file
What do we look for in a file format?
• Disclosure
• Adoption
• Transparency
• Self-documentation
• External dependencies
• Impact of patents
• Technical protection mechanisms
http://www.digitalpreservation.gov/formats/sustain/sustain.shtml
Broadcast Wave Format
• Audio file format based on .wav files
• EBU 1996 for the exchange of files
• Non-proprietary
• Recommended by IASA, AES, NARAS, Sound Directions for preservation
• “Chunk” for metadata residing with the file
• Time stamp
Broadcast Wave Format
Metadata elements include:
• Description of the sound sequence
• Name of the originator
• Date/time
• Coding history (signal chain components)
• Format independent, sample accurate time stamp
• “Catastrophic” metadata
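The metadata elements above live in the file's "bext" chunk. The sketch below reads that chunk with only the Python standard library. It assumes the fixed 602-byte field layout of the EBU broadcast extension chunk and a plain RIFF/WAVE container (RF64 files are not handled). The function name `read_bext` and the dictionary keys are hypothetical.

```python
import struct

# Fixed part of the bext chunk: Description, Originator, OriginatorReference,
# OriginationDate, OriginationTime, TimeReference (low, high), Version,
# UMID, then loudness and reserved bytes. 602 bytes in total.
BEXT_FIXED = struct.Struct("<256s32s32s10s8sIIH64s190s")


def _text(raw):
    """Decode a null-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace").strip()


def read_bext(stream):
    """Return bext metadata from an open binary stream, or None if absent."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            return None  # reached end of file without finding a bext chunk
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        if chunk_id != b"bext":
            stream.seek(size + (size & 1), 1)  # chunks are padded to even length
            continue
        data = stream.read(size)
        if len(data) < BEXT_FIXED.size:
            raise ValueError("bext chunk is shorter than its fixed fields")
        (description, originator, reference, date, time,
         low, high, version, _umid, _rest) = BEXT_FIXED.unpack_from(data)
        return {
            "description": _text(description),
            "originator": _text(originator),
            "originator_reference": _text(reference),
            "origination_date": _text(date),
            "origination_time": _text(time),
            "time_reference": low | (high << 32),  # sample count since midnight
            "version": version,
            "coding_history": _text(data[BEXT_FIXED.size:]),
        }
```

Usage would look like `with open(path, "rb") as f: info = read_bext(f)`, where `path` points at a Broadcast Wave file.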
How do we define the files we create?
• What is in them?
• How are they created?
• What do they represent?
Preservation (Archival) Master Files
Best Practice Documents
• Unmodified
• No subjective alterations or improvements
• Preserve history, not re-write it
• As true to the original source as possible
Preservation (Archival) Master Files
• Complete, unaltered stream from playback
machine
• Carrier of raw material from transfer
• No editing, signal processing, data reduction,
gain manipulation, announcements (slates)
• 24 bit, 96 kHz
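The 24 bit, 96 kHz target has direct storage consequences. As a worked example, assuming uncompressed PCM and a two-channel transfer (the channel count and the one-hour duration are assumptions, not figures from the slides):

```python
def pcm_bytes(seconds, sample_rate_hz=96_000, bit_depth=24, channels=2):
    """Size of the raw audio data for uncompressed PCM, excluding headers."""
    return seconds * sample_rate_hz * (bit_depth // 8) * channels


one_hour = pcm_bytes(3600)
print(f"{one_hour:,} bytes, about {one_hour / 1e9:.2f} GB per hour")
# prints: 2,073,600,000 bytes, about 2.07 GB per hour
```

At roughly 2 GB per stereo hour, collections counted in the tens of thousands of recordings quickly reach a scale that calls for mass storage rather than local disks.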
Preservation (Archival) Master Files
Best Practices
• Define purpose of every digital file
• Written guidelines on characteristics of files
• Written guidelines on “technical” and content
edits
• Maintain common reference timeline
Data Integrity
• Data integrity checking
• “Checksums”
• MD5 hash or algorithm
A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
Data Integrity
• All files with enduring value
• As soon as possible
• Critical metadata stored in database and in
preservation package
• Verify before trusting
A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
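The checksum line shown on the slide pairs a hex digest with a file name, separated by " *". A minimal sketch of writing and checking such lines follows. It assumes this one-line-per-file layout and uses hypothetical function names. It is not the tool used at the ATM.

```python
import hashlib
from pathlib import Path


def md5_hex(path):
    """MD5 of a file, read in blocks, as uppercase hex like the slide example."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest().upper()


def manifest_line(path):
    """Create a checksum line as soon as the file exists."""
    return f"{md5_hex(path)} *{Path(path).name}"


def parse_manifest_line(line):
    """Split 'DIGEST *name' into (digest, name)."""
    digest, _, name = line.strip().partition(" ")
    return digest.upper(), name.lstrip(" *")


def verify(line, directory):
    """Verify before trusting: recompute and compare against the stored line."""
    expected, name = parse_manifest_line(line)
    target = Path(directory) / name
    return target.is_file() and md5_hex(target) == expected
```

Storing the line both in a database and inside the preservation package, as the slide recommends, means either copy can be used to check the other.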
How do we make the preserved content
understandable and manageable?
• Descriptive Metadata
• Administrative—Technical Metadata
• Administrative—Digital Provenance
• Administrative—Rights Management
• Structural Metadata
Audio Technical Metadata Collector
(ATMC)
• Enter/edit technical and structural metadata
• Audio object and process history metadata
• Enter/edit audio object evaluations
• Parse files to collect metadata
Quality Control and Assurance
• Quality control vs. quality assurance
• QC at ATM: aural, visual, software tools
• Collection setup—preliminary transfers
• Role of permanent staff
• QA at ATM
How do we store the data immediately
after capture?
• Local, interim storage
• Backup copies at each stage
• ATM NAS
• Additional redundant copy
Director
Archivist
Project Development
Selection for Preservation
Librarian
Cataloging Issues
Selection
Preview Collections
QC Documentation
Associate Director
Audio Engineer
Preservation Transfer
Preservation Master Files
Technical MD Collection
Checksums
BWAV MD
ADL’s
Signal Processing
Project Management
Selection—Format Issues
Scheduling Coordination
QC
R&D
Programmer
Software/Script Development
Digital Library Program
Preservation Repository Services
Deliverables
Access System
Project Assistant
Content Division
Production Masters
QC
ADL’s
Workflow Management
Collection Setup
Ingestion Process
The Role of Metadata in Digital
Preservation
What is metadata?
• “The stuff we need to know in order to discover
and manage data over the long term”
• Here’s a better definition:
“Metadata is structured information that
describes, explains, locates, or otherwise
makes it easier to retrieve, use, or manage
an information resource.”
NISO. “Understanding Metadata.” 2004.
<http://www.niso.org/standards/resources/UnderstandingMetadata.pdf>
Metadata standards
• Standards define mutually agreed-upon:
– Definitions of key terms
– “Fields” of data to record
– Rules for structuring data in these fields
• In this area, generally expressed in XML
• Allow us to benefit from community experience
• Promote preservation by providing for more
predictable data
Evaluating metadata standards
• Good fit for the type of material I have?
• Supports my access/management/preservation
needs?
• Are there existing tools to help me create it?
• Has it been used before in similar situations?
• Who maintains it?
• How quickly are the standards in this
environment changing?
Creating metadata
• Generally not done by humans encoding data
directly in the storage format
• Instead:
– Humans use tools designed for specific
purposes
– Derived computationally from the digital
resource itself
Technical metadata
• Tracks properties of a digital file necessary for
its rendering and processing
• Can also include data about the circumstances
of creation of a digital file
• Often format- or media-specific
• Much can be generated automatically from
digital file
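As an illustration of generating technical metadata automatically, the sketch below reads basic properties from a PCM WAV file with Python's standard `wave` module. The function name and the dictionary keys are hypothetical, and this is not how ATMC parses files. It only shows that such values can be derived from the file itself.

```python
import wave


def technical_metadata(source):
    """Collect basic rendering properties from a PCM WAV file.

    source may be a path or an open binary file object.
    """
    with wave.open(source, "rb") as audio:
        frames = audio.getnframes()
        rate = audio.getframerate()
        return {
            "channels": audio.getnchannels(),
            "bit_depth": audio.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "frames": frames,
            "duration_seconds": frames / rate,
        }
```

Values such as these would not need to be typed by hand, which leaves staff time for the metadata that only a person can supply.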
Digital provenance metadata
• Tracks the history of a set of related digital files
– Can include the methodology by which the
“master” file was created from an analog
source (overlap with technical metadata)
– What transformative processes have been
applied to the file
– Relationship of “derivative” files to the
“master”
Structural metadata
• Documents relationships within and between
digital files
– Locating the same intellectual content on
multiple representations
– Noting points of interest within a single
resource
– Grouping and sequencing multiple files that
make up a logical whole
Rights metadata
• Covers legal, moral/ethical, financial rights over
resources
– Rights holders
– Copyright status
– Conditions on access
– Usage fees/royalty payments
• Can be in human- or machine-readable format
Descriptive metadata
• Like “cataloging”
• Allows users and collection managers to find
and identify resources of interest
• Factual information such as creator, date
created, running time (overlap with technical
metadata)
• Constructed information such as title
• Subjective information such as topic, genre
Preservation metadata
• Some overlap with technical and process
history metadata
• Catch-all for all the metadata we need to
support the preservation process that’s not
recorded elsewhere
• Most important feature: tracking events that
occur during the preservation process
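Event tracking can be pictured as an append-only log keyed by object. The sketch below is an assumption-laden illustration: the field names, event types, and object identifiers are hypothetical and do not follow any particular schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class PreservationEvent:
    object_id: str    # identifier of the file or package (hypothetical)
    event_type: str   # e.g. "fixity check", "migration" (hypothetical values)
    outcome: str      # e.g. "success", "failure"
    date_time: str = field(default_factory=_now)
    detail: str = ""


def history(events, object_id):
    """All recorded events for one object, oldest first."""
    matching = [e for e in events if e.object_id == object_id]
    return sorted(matching, key=lambda e: e.date_time)
```

Because events are immutable once recorded, the log documents what was done to a file rather than only its current state.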
Preservation Packages
Types of preservation packages
• According to OAIS:
– Submission information package (SIP)
– Archival information package (AIP)
– Dissemination information package (DIP)
• The AIP is what is stored (potentially broken up
into pieces) in the IU repository
• Metadata Encoding and Transmission Standard
(METS) used to wrap various pieces together
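To show what "wrapping" means in practice, the following sketch builds a skeletal METS document with a file section and a structural map that points at each file. It uses the METS namespace and the standard element names (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`), but the file IDs, USE labels, and div TYPE are assumptions for illustration. It is not the package profile used in the IU repository.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(label, files):
    """files is a list of (file_id, use, href) tuples (hypothetical layout)."""
    root = ET.Element(q("mets"))
    file_sec = ET.SubElement(root, q("fileSec"))
    struct_map = ET.SubElement(root, q("structMap"), TYPE="physical")
    top = ET.SubElement(struct_map, q("div"), TYPE="recording", LABEL=label)
    groups = {}
    for file_id, use, href in files:
        group = groups.get(use)
        if group is None:
            group = groups[use] = ET.SubElement(file_sec, q("fileGrp"), USE=use)
        entry = ET.SubElement(group, q("file"), ID=file_id)
        ET.SubElement(entry, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # The structural map refers to files by ID, not by location.
        ET.SubElement(top, q("fptr"), FILEID=file_id)
    return root


package = build_mets(
    "Example cassette",
    [("FILE1", "preservation master",
      "atm_01007_cass6936_010101_pres_20080228.wav")],
)
xml_text = ET.tostring(package, encoding="unicode")
```

A real package would also carry descriptive and administrative metadata sections, which this sketch omits.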
Information representation
• Repository needs two simultaneous views of
the content it manages
– Physical files
– Functions the repository needs to support
Technical metadata
Also record for analog source object!
Audio Engineering Society, Core Audio Schema Draft. AES X098-B/SC-03-06.
Digital provenance metadata
Audio Engineering Society, Audio Processing History Draft. AES X098-C/SC-03-06.
Structural metadata (1)
Audio Engineering Society, Audio Decision List. AES 31-3
and
Metadata Encoding and Transmission Standard (METS), <structMap> section
Structural metadata (2)
Audio Engineering Society, Audio Decision List. AES 31-3
and
Metadata Encoding and Transmission Standard (METS), <structMap> section
Rights metadata
• For field audio collections, the ATM knows:
– Collector
– Terms of deposit governing access
• This area still under development for the IU repository
• No decision yet on metadata format; need more
thorough analysis of the functions this metadata
needs to support
Descriptive metadata (1)
MARCXML
Descriptive metadata (2)
METS reference to external Word document
Preservation metadata
• Still under investigation for IU repository, for all
formats of material
• Will need to implement before any preservation
events occur
• Will likely use PREservation Metadata: Implementation Strategies (PREMIS) data dictionary and schema
Need to share
• Copies in multiple repositories can help ensure
preservation
• Sound Directions did a test exchange of content
between IU and Harvard
– Different repository architectures
– Different preservation package structures
• ...demonstrated how different levels of
preservation are possible
Two Repositories Supported by the
Digital Library Program
• IUScholarWorks Repository
– “Institutional Repository”
• For preserving and providing access to IU’s
research output: articles, papers, etc.
– Based on DSpace software
• IU Digital Library Repository
– General-purpose digital content repository
– Based on Fedora software
Fedora
• Flexible Extensible Digital Object
Repository Architecture
• Open source digital repository software
developed by Cornell and the University of
Virginia
• Supported by new organization:
Fedora Commons
• Basis for IU Digital Library Repository
Moving Content to a Digital
Repository – Idealized Workflow
ATMC/Audio Workstation
Upload preservation package
Temporary Server Disk Storage
Master audio files in MDSS
Validate and ingest
Fedora Repository
Delivery audio files on streaming server
Metadata records on disk
IU Massive Data Storage System
(MDSS)
• Hierarchical storage management
– Some storage on hard disks
– Much more storage on automated tape
• Managed by UITS Research Technologies
• Servers in Bloomington and Indianapolis
connected via I-Light high-speed fiber link
• Total capacity: 2+ petabytes
• Need to build Fedora-MDSS connection
Repository Status
• Fedora is running in production
– Supporting access to image and text
collections
– Experiments with loading audio and video
• Need to improve tools for ingest and retrieval to
support audio projects
• Not yet a true preservation repository
Toward a Preservation Repository
• Need to add:
– File integrity validation
– Integration with MDSS – replication of data
– Eventually, file format obsolescence
monitoring and migration
• Self-audit and/or external certification as
Trusted Digital Repository
– DRAMBORA, TRAC
Access Systems
• Variations2
– variations2.indiana.edu
– Provides access to cataloged commercial
recordings from the Music Library and ATM
• Need access system to provide discovery and
delivery of field collections and other types of
archival audio collections
Questions?
• www.dlib.indiana.edu/projects/sounddirections/