digitizing images - Preservation Services

Digitizing Images
Stephen Chapman
Weissman Preservation Center
Harvard University Library
30 September 2003
Motivations
Necessity
More and more material is being produced in digital form;
more and more of our users want access to such materials.
Excellence
Because we value the highest levels of teaching and research
we will have to change our way of doing things.
Innovation
Technology enables uses not possible with analog formats.
Obligations are to create coherent, integrated collections and
to deliver them with tools that support innovative professional
practice.
Why images?
“The special opportunities presented by digital
technologies constitute the most fundamental
development in the potential for increased access and
flexibility of use since the advent of photographic
reproduction.”
Neil L. Rudenstine, April 2001
Digital technologies
World Wide Web
“This is what the Internet does well.”
Relational databases
Permit flexible approaches to cataloging — hierarchical
structures needed to manage and describe multiple versions
of a particular item, including surrogates.
Digital cameras and scanners
Proven capability to create digital surrogates “that faithfully
represent the originals in tone and color and provide a level of
detail that would enable advanced study.” David Remington
Products of image digitization
Images
The data for scholars to study. Single version of image rarely
suffices to meet all needs (compare, study, print).
Descriptions of images (text)
The “metadata” the user needs to locate and interpret images.
Descriptions of ownership and rights (text)
The metadata the owner uses to disclose terms and
conditions associated with using images.
Source materials
The principal assets valued by owners and users.
Infrastructure
Catalogs
Systems for comprehensive and controlled searching.
Persistent naming
Means to ensure image management and reliable access.
Repository
A trustworthy place to manage images over time.
Delivery
Systems to deliver digital images to authorized users.
Workflow
project planning, source selection, prep, surrogates, cataloging, imaging, deposit, linking
Project planning
There are no absolute rules for creating good
collections, objects or metadata. Every project is
unique and each has its own goals. The key to a
successful project is not to follow any particular path,
but to think strategically and make wise choices.
IMLS Framework of Guidance
Selection
For the kind of pictures we collect, individual public
domain analysis is expensive. [T]here is only one
practical methodology: accession policy must choose
the most conservative boundary as a functional bright
line that separates what is acceptable and what is not
acceptable.
Robert A. Baron
Prep
Whenever there is handling of original collections,
there is a need for the application of conservation
knowledge and practice.
Library of Congress NDLP and Conservation Division
Workflow
project planning, source selection, prep, surrogates, cataloging, imaging, deposit, linking
Cataloging practices
Local, but…
“Picture catalogs still tend to be incomplete, idiosyncratic, and
isolated.” Helena Zinkham
Movement to consolidate: union catalogs
AMICO, ArtSTOR, VIA, UCAI (UCSD, ArtSTOR, Harvard)
Emerging consensus and best practice
VRA Guide, “Cataloguing Cultural Objects”
http://www.vraweb.org/CCOweb/
Data standards “promote sharing, improve the management of
content, and reduce redundancy of effort.”
Descriptive metadata standards
Specific to topics or disciplines
Biology or art
Specific to kinds of materials
Moving pictures, encoded texts
Specific to support particular functions
Discovery, rights management, presentation
Descriptive metadata standards
Which “information pieces”
Data dictionaries (e.g., for OLIVIA)
CDWA, VRA Core, Dublin Core
How information is formed
Content standards and vocabularies:
VIA Working Group has identified over 20
How information is encoded for processing
Syntax (e.g., MARC, RDF)
Virtually no standards govern all of these aspects of metadata.
http://hul.harvard.edu/ois/systems/via/via_standards.html
Key decisions
Scope
Which catalog(s)? …HOLLIS, OASIS, VIA used at Harvard
Item- or group-level cataloging
Extent
Amount of cataloging (project and program policies)
Digital image production
Lights, camera…
Visual literacy and technical skill still absolutely critical
Pixels!
“The more one looks at image quality and ways to clearly
define it, the more parameters have to be taken into account.”
Frey and Reilly
- rendering intent
- tone reproduction
- detail and edge reproduction
- color reproduction
- noise
Digital image standards
Formats
DLF Global Digital Format Registry
Quality
I3A/IT10 Electronic Still Picture Imaging Committee
ISO speed, resolution (MTF), OECF, noise and color
measurement
ISO 3664:2000 Viewing conditions
Technical metadata for digital still images
NISO Z39.87-2002 AIIM 20-2002 (governed by LC)
Digital imaging practices
masters, delivery images, admin metadata, quality control
Masters
“support intended current and likely future use”
(IMLS Framework)
archival masters (optimized for processing, not viewing)
production masters (optimized for automation)
no compression for grayscale and color images
TIFF = format of choice
Digital imaging practices
masters, delivery images, admin metadata, quality control
Quality control
calibrated devices
calibrated environment
targets
checksums
validation software at repository
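As an illustration of the checksum item above, a fixity check could be sketched as follows. This is a minimal sketch, not the repository's actual procedure: the choice of SHA-256, the function names file_checksum and verify_deposit, and the chunk size are all assumptions made for the example.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_deposit(path, expected_checksum, algorithm="sha256"):
    """Hypothetical repository-side check: True if the file matches the
    checksum recorded when the image was produced."""
    return file_checksum(path, algorithm) == expected_checksum.lower()
```

In this sketch the producer would record file_checksum(path) at imaging time, and the repository would call verify_deposit with that value on receipt and again at intervals.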
Digital imaging practices
masters, delivery images, admin metadata, quality control
Admin metadata
“supports management of resources” (R. Wendler)
• ownership
• access restrictions
• technical attributes of files
XML format
produced and deposited in addition to images
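To make the idea of an XML administrative metadata record concrete, here is a minimal sketch covering the three categories listed above. The element names (adminMetadata, ownership, accessRestrictions, technicalAttributes) and the sample values are hypothetical; they do not represent the DRS schema or NISO Z39.87.

```python
import xml.etree.ElementTree as ET


def build_admin_metadata(owner, access, technical):
    """Build a small XML record; all element names here are illustrative only."""
    root = ET.Element("adminMetadata")
    ET.SubElement(root, "ownership").text = owner
    ET.SubElement(root, "accessRestrictions").text = access
    tech = ET.SubElement(root, "technicalAttributes")
    for name, value in technical.items():
        # One <attribute name="..."> element per technical property of the file.
        ET.SubElement(tech, "attribute", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


record = build_admin_metadata(
    owner="Example Library",            # hypothetical owner
    access="authorized users only",     # hypothetical restriction
    technical={"format": "TIFF", "bitsPerPixel": 24, "compression": "none"},
)
print(record)
```

A record like this would be deposited alongside the image file rather than embedded in it.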
Deposit
DRS preservation services provide active oversight to
ensure an indefinite lifespan for objects deposited in
approved formats. "Oversight" involves monitoring file
formats, assessing the vulnerability of digital
collections, and transforming files to maintain usability.
HUL DRS Policy Guide
Repository Storage Cost Gaps, Photographs, Example 1
Harvard Depository and OCLC Digital Archive (2003)
Cost in $ per photograph, per year:
OCLC (>1,000 GB rate), 24-bit TIFF (2) (32 MB): $0.47
OCLC (>1,000 GB rate), 24-bit PCD (2) (10.7 MB): $0.16
HD film vault, 35mm negative: $0.003
Current cost gap: digital 53-157X more expensive than film at OCLC,
18-52X more expensive at Harvard (DRS)
Repository Storage Cost Gaps, Photographs, Example 2
Harvard Depository and OCLC Digital Archive (2003)
Cost in $ per photograph, per year:
OCLC (>1,000 GB rate), 24-bit TIFF (229 MB): $3.35
HD film vault, 4 x 5 negative: $0.016
Current cost gap: digital 209X more expensive than film at OCLC,
70X more expensive at Harvard (DRS)
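The OCLC multipliers in the two examples can be reproduced by dividing the annual digital storage cost by the annual film storage cost. The sketch below uses only the dollar figures quoted on the slides; the pairing of $0.16 with the PCD file and $0.47 with the TIFF file is inferred from the file sizes, and the function name cost_gap is an assumption for illustration.

```python
def cost_gap(digital_cost, film_cost):
    """Return how many times more expensive digital storage is than film,
    rounded to the nearest whole multiple."""
    return round(digital_cost / film_cost)


# Annual cost per photograph in USD (OCLC digital storage vs. HD film vault).
example_1 = {"pcd": 0.16, "tiff": 0.47, "film_35mm": 0.003}
example_2 = {"tiff": 3.35, "film_4x5": 0.016}

print(cost_gap(example_1["pcd"], example_1["film_35mm"]))   # 53
print(cost_gap(example_1["tiff"], example_1["film_35mm"]))  # 157
print(cost_gap(example_2["tiff"], example_2["film_4x5"]))   # 209
```

The Harvard (DRS) multipliers cannot be checked the same way, because the slides do not give the DRS per-photograph costs.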
Closing cost gaps for repository storage
Compression
Investigate risks associated with using bit-for-bit lossless
compression instead of uncompressed formats as
preservation masters.
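One basic check in such an investigation is that a compressed master really does decompress to the identical bytes. The following is a minimal sketch using zlib as a stand-in for whichever lossless codec is under evaluation (an assumption; the slides do not name one), with hypothetical function names.

```python
import hashlib
import zlib


def is_bit_for_bit_lossless(original: bytes, level: int = 9) -> bool:
    """Compress, decompress, and confirm the restored bytes hash identically."""
    compressed = zlib.compress(original, level)
    restored = zlib.decompress(compressed)
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()


def compression_ratio(original: bytes, level: int = 9) -> float:
    """Return original size divided by compressed size."""
    return len(original) / len(zlib.compress(original, level))
```

A round-trip check like this covers only data integrity at the time of compression; it says nothing about other risks, such as long-term support for the codec or the greater impact of bit errors in compressed files.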
Cost metrics
Bill owners by a unit other than storage size (such as per GB) to
sustain the costs of running repository and preservation services.
Subsidies
Create common-good repositories and services (“safe
havens”) with secure, sustainable funding lines for items that
meet defined criteria.
Hybrid approach viable for still images
Digital Masters
Deposit digital master to repository, pay for annual
maintenance regardless of use.
Repurpose digital masters: produce delivery images, in
analog or digital formats, in advance and/or upon request.
Analog Masters
Deposit analog (e.g., film) master to repository, pay for annual
maintenance regardless of use.
Repurpose analog masters: produce delivery images, in
analog or digital formats, in advance and/or upon request.
Lessons learned
Building ArtSTOR into a trusted repository … will require
not only time and resources, but also collegiality and the
active participation of individuals from academic
institutions, museums, libraries, and research centers;
specialists in imaging and in building databases; others
experienced in the creation of digital resources; experts
in intellectual property rights; and wise generalists.
One clear conclusion is that working on this project
inspires humility!
William G. Bowen, President
Andrew W. Mellon Foundation
Resources
Your colleagues!
• Mellon Foundation, 2001 President’s Report, “ArtSTOR”
• Harvard University Library, LDI Program Origins
• David Remington, “HCL-DIG General Imaging Practice”
• Helena Zinkham, “Bridges & Whirlpools: Best Access
Practices for Pictures”