Digital Image Scanning
Instructor:
Geri Bunker Ingram
[email protected]
An Infopeople Workshop
August 2005
This Workshop Is Brought to You By
the Infopeople Project
Infopeople is a federally-funded grant project
supported by the California State Library. It
provides a wide variety of training to California
libraries. Infopeople workshops are offered
around the state and are open registration on a
first-come, first-served basis.
For a complete list of workshops, and for other
information about the Project, go to the
Infopeople website at infopeople.org.
Introductions
Please tell us again, your



Name
Library
Position and role within the Local History Project
Are there lingering questions from yesterday that we
should discuss?
August 2005
Digital Image Scanning
Geri Ingram, DiMeMa, Inc.
Learning Objectives





Understand the basics of digital imaging
Interpret and evaluate scanning specifications
for your project
Differentiate among different technology
options for various formats
Understand the significance of standard
metadata
Learn about display and navigation options.
Agenda

9:00–10:30 What is Digitization?
10:30–10:45 BREAK
10:45–12:00 Technology Infrastructure
12:00–1:00 LUNCH
1:00–2:30 Metadata, Rights, Quality Control
2:30–2:45 BREAK
2:45–4:00 Effectiveness





What is Digitization?
What is Digitization?

Process of digitization
resolution
bit depth



The Local History Project guidelines and
standards
The implications of these standards
A Refresher on Scanning




Scanning takes reflected light signals and
changes them to digital data.
The resulting digitized image is made up of a
grid of individual picture elements.
Picture elements are known as “pixels”. Pixels
are made up of binary digits (bits)
Each bit is expressed as either “0” or “1”.
Controlling Spatial Detail and
Accuracy

Two settings affect spatial detail and accuracy
during the scanning process
bit depth
resolution (the number of pixels sampled per inch)
Adjusting Bit Depth

Binary digit (bit) “depth”
number of bits used to define each pixel.
the greater the bit depth, the greater the number
of tones (grays or color)




Black and white (bitonal)=1 bit per pixel
Grayscale=8 bits per pixel (256 shades of
gray)
Color=24 bits per pixel (16.7 million color
tones)
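The tone counts above follow from the bit depth: each added bit doubles the number of values a pixel can hold. A minimal sketch of that arithmetic (the function name is only illustrative):

```python
def tone_count(bit_depth: int) -> int:
    """Number of distinct tones one pixel can represent at this bit depth."""
    return 2 ** bit_depth


for label, bits in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
    print(f"{label}: {bits} bits per pixel -> {tone_count(bits):,} tones")
```

For 24-bit color this gives 16,777,216 tones, which the slide rounds to 16.7 million.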
Adjusting Resolution


Resolution is a sampling rate—
how many dots per inch will you scan?


E.g., 400 dpi.
The effect:
the higher the rate, the smoother the image
the more it can be magnified before its individual
pixels become visible



High resolution = many dots per inch
Low resolution = fewer dots per inch
Sometimes Resolution Is Expressed
As Absolute Pixel Dimensions
Pixel dimensions =
(dpi x width) x (dpi x height)
Example: an 8” x 10” image scanned at 400 dpi
has pixel dimensions of
(400 x 8) x (400 x 10) = 3200 x 4000
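The pixel-dimension formula can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function name is an assumption made for illustration.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (pixels wide, pixels high) for an original measured in inches."""
    return round(dpi * width_in), round(dpi * height_in)


# The slide's example: an 8 x 10 inch original scanned at 400 dpi.
print(pixel_dimensions(8, 10, 400))  # (3200, 4000)
```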
Storing Your Images

Very high quality images create very large files
The higher the resolution, the greater the file size
The higher the bit depth, the greater the file size

For the exercise coming up:
two different formulas
to figure out how much disk space images need
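The exercise formulas themselves are not reproduced in this transcript. As a hedged stand-in, the sketch below uses the usual calculation for an uncompressed image: total pixels times bits per pixel, divided by 8 to get bytes. The function name and the decimal megabyte (1 MB = 1,000,000 bytes) are assumptions.

```python
def uncompressed_size_bytes(width_in: float, height_in: float,
                            dpi: int, bit_depth: int) -> int:
    """Estimate the size in bytes of an uncompressed scan (no file header)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    total_bits = pixels * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# An 8 x 10 inch original at 400 dpi, at the three bit depths from the slides.
for label, bits in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
    size = uncompressed_size_bytes(8, 10, 400, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Doubling the resolution quadruples the pixel count, so file size grows much faster with resolution than with bit depth.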
Three Or More Files For Every
Image

Master image
This is one you do not tamper with, and you
use a file format that does not lose data when you
save it.
Two derivatives:
access (service) image
small (thumbnail) image.

Master Files

Stored offline: it is valuable, and
usually too large for common bandwidth
Not uncommon to have multi-megabyte master
images.
The exception is the JPEG2000 format, which
enjoys a progressive display (details later).
Service or Access Images

By contrast, a common range for the service or
access image is
100 to 500 KB
Thumbnail: The Smallest Access
Image

A thumbnail may be only a few KB, and typically
is no larger than
about 150-200 pixels on a side
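To see what "no larger than about 200 pixels on a side" means for a given master, the scaling can be sketched as below. This only illustrates the arithmetic; it is not how any particular product computes its thumbnails, and the function name is hypothetical.

```python
def fit_within(width_px: int, height_px: int, max_side: int) -> tuple[int, int]:
    """Scale dimensions down so the longer side equals max_side, keeping proportions."""
    longest = max(width_px, height_px)
    if longest <= max_side:
        return width_px, height_px  # already small enough
    scale = max_side / longest
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# A 3200 x 4000 pixel master reduced to a 200-pixel thumbnail.
print(fit_within(3200, 4000, 200))  # (160, 200)
```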
For the Local History Project



Full resolution image and large service
image delivered directly to libraries
Import either of them to CONTENTdm to
derive a service image and thumbnail
Automatic with CONTENTdm software
Keeping Your Master


Retain on your local system, on the CDs
delivered, or in any other manner you like.
CDL will also receive a copy of both
master and derivative,


Store the master as your “preservation” copy.
Important to understand the storage
implications of your master images
Local History Project Scanning


A common specification has been developed
Scanning vendor (will have been) selected
It is still important to understand the
specification and infrastructure issues.
Exercise #1
Calculating File Sizes for
Digital Images
Technology Infrastructure
Technology Infrastructure


In this unit we will discuss the hardware,
software and networking requirements of
digital projects.
We will touch on data storage again briefly
and will delve into the question of
compression and file formats.
The Local History Project


Will run on computers located around the
state, connected through the Internet.
The smooth operation of this distributed
infrastructure involves not only hardware
and software, but also depends upon good
communication among people.
All Of This Takes Planning

All the partners in the project


including the info tech service providing
partners
Must demonstrate good communication
skills and consistently confer with each
other
Library Policies



Security
Intellectual property
Policies must be in synch with info tech
provider

regardless of whom that may be
CDL Will Be Providing Access To
Your Collections


They must be able to protect their
networks from misuse.
The end-users must be able to easily
access unrestricted material.
Distributed Architecture


Designed for the Local History Project, it
has local libraries feeding material into a
central databank
Fairly sophisticated, and yet divides the
labor according to appropriate tasks.
The Local History Project Will
Comprise A Set Of Collections



Each built locally
Using the CONTENTdm Acquisition Station
software, and stored on the CONTENTdm
server. The materials will be copied to the
CDL
Part of collaborative program for both
access and preservation
Local History Project Offers At
Least Three Outlets For Collections



The way your metadata will get into the
CDL is through the use of the
CONTENTdm export function.
A customized export/import mechanism
writes your metadata in the METS format
You will be trained in its use during your
CONTENTdm training session
Managing The Digital Files


Because your scanning will have been done
by a vendor, we will not discuss the attributes
of scanning software fully.
But you will need to know something about
the various pieces of software in use.
The Processing That Will Be Done
For You Includes:



Scanning: representing a print item as a digital
image. E.g., the software that runs your digital
camera or your scanner.
OCR Software: if you have text that you would like
made searchable, software such as Omnipage then
converts the words in the image to a text file that can
be searched.
Lastly, a Digital Asset Management System (e.g.
CONTENTdm) provides a way to organize the image
files, make derivatives and add metadata to each
image.
CONTENTdm Selected



High-performance tool
Easy-to-use interface
Will scale as the collections grow

i.e., it will continue to perform well and be
manageable even when there are millions of
objects
Hardware: From Scanning To
Storage


The lifecycle of collections now includes
preservation of the digital image.
Before scanning hardware or specifications
are set, consider the technical issues
for access AND for
long-term preservation of the digital image

Sustaining Collections Over Time


Data needs to be saved and protected at
every stage in its life-cycle
Many ways of accomplishing this are in
experimental stages
Preservation Of Digital Files

Data migration
e.g. moving files from CD to DVD
Backup and archiving plans
e.g. storing files online or on a central backup
server
Disaster recovery plans, for both analog and
digital resources
heaven forbid! The library burns down… what
happens to your CDs, your computers?
Preservation Repository Must Also
Be Managed


Sized, weeded, protected and moved
Because CDL is offering long-term
preservation,

your scans and metadata must meet the
standards set for the repository!
Choosing Among File Formats


One decision that affects a collection’s
accessibility and preservation potential is
the format of the files you choose to keep
Many File Formats
(LHDRP is requiring these *)






TIFF*
JPEG2000
GIF*
JPEG*
PDF
MrSID: proprietary, wavelet-based
compression for progressive display
Choosing among the file formats
means you need to understand
something about what the file
format specification implies.
Compression Used To Reduce File Sizes

Two kinds: lossy and lossless.
Lossy: an irrecoverable loss of data, but
considerable size reductions (JPEG).
Lossless (JPEG2000 and TIFF):
no loss of data.
TIFF: no loss of data, but the file size is not reduced
unless a lossless compression option is applied
JPEG2000: no loss of data in its lossless mode, and it can
also reduce the size of the file delivered for display, as it
is decompressed at the point of display.
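What "lossless" means can be shown with a round trip: compress, decompress, and compare with the original. The sketch below uses Python's built-in zlib module purely as a stand-in for a lossless scheme; it is not the compression used inside TIFF or JPEG2000 files, and the sample bytes are made up.

```python
import zlib

# Made-up "image" data: long runs of repeated pixel values compress well.
original = bytes([10, 10, 10, 10, 200, 200, 200, 200] * 1000)

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
print("identical after round trip:", restored == original)
```

A lossy format such as JPEG would fail the final comparison: the restored pixels are close to the originals but not identical, which is why it is kept for access copies and not for masters.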
TIFF: Tagged Image File Format





TIFF itu-t.7
Is a 24-bit storage format in widespread use.
Useful for both color and bitonal (black &
white) images
Provides a high level of detail. It is used for
archival files (masters).
When compression is used, it should be
lossless.
JPEG: Joint Photographic Experts
Group/JFIF
(JPEG File Interchange Format)

JPEGs are commonly used in bitmap image
editing programs (e.g., Paintshop),
in viewers, and, most important for our
project, in web browsers
24-bit, lossy compression format
Well suited for screen and print presentations.
JPEG2000


Provides highly detailed views of objects
Not a proprietary format





but not all software can handle a JPEG2000 file
both Photoshop and CONTENTdm have that capability
To view a file saved as JPEG2000, some products
require a browser “plug-in”.
CONTENTdm does not require one, but has a built-in
viewer in the extended server software.
CDL does not currently support JPEG2000, so for this
project, you will not create JPEG2000 files.
GIF: Graphics Interchange Format




8-bit, lossless compression format
Well-suited to low resolution screen display
Often used for thumbnails
Supported by all major computer platforms
and web browsers
PDF: Portable Document Format



Proprietary (Adobe) format, now
de facto standard (is actually several formats)
All need a plug-in or external application for
web display,


but that “reader” is free to download.
Widely used for printing and viewing multipage documents
A Word About File Naming



Best practice is to use the standard 8.3
convention, e.g., house178.txt.
Use lower-case characters only, as some
operating systems such as Unix are case-sensitive.
Avoid punctuation characters in filenames
altogether.
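These naming rules can be expressed as a simple check. The sketch below is one strict reading of the slide (up to 8 lower-case letters or digits, a dot, then up to 3 more); the pattern and function name are assumptions, and a project that allows underscores would add `_` to the character class.

```python
import re

# Up to 8 lower-case letters/digits, one dot, then a 1-3 character extension.
NAME_PATTERN = re.compile(r"[a-z0-9]{1,8}\.[a-z0-9]{1,3}")


def follows_convention(filename: str) -> bool:
    """True if the name is lower-case 8.3 with no other punctuation."""
    return NAME_PATTERN.fullmatch(filename) is not None


for name in ["house178.txt", "House178.TXT", "my-scan.tif", "verylongname.tif"]:
    print(name, follows_convention(name))
```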
File Naming


Simple—a single image
Compound—more than one image



Components need to be named and stored in
logical fashion
E.g., when assembling, page_01.jpg will
precede page_02.jpg (alphanumeric sort)
E.g., when assembling a hierarchy, items need
to be stored in logical directories
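The reason for the leading zero in page_01.jpg shows up as soon as a document has ten or more pages. A small sketch of how an alphanumeric sort treats padded and unpadded names (the file names are examples only):

```python
pages = (1, 2, 10)

unpadded = [f"page_{n}.jpg" for n in pages]
padded = [f"page_{n:02d}.jpg" for n in pages]

# Without padding, page 10 sorts ahead of page 2.
print(sorted(unpadded))  # ['page_1.jpg', 'page_10.jpg', 'page_2.jpg']

# With zero padding, the sort order matches the page order.
print(sorted(padded))    # ['page_01.jpg', 'page_02.jpg', 'page_10.jpg']
```

Choose the padding width for the largest item you expect: two digits cover up to 99 pages, three digits up to 999.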
Local History Project Conventions


Vendor must deliver files named with an
appropriate scheme

that works for your library

And for the Local History Project
Exercise will focus on file handling

File formats, naming and organization
Hardware
Digital project hardware components will
include at minimum



Servers
Desktop computers
Network components
Your CONTENTdm Environment



Server located and managed remotely for
the Local History Project.
Computer on your desktop
Network: IT provider uses components
e.g., routers, cables, access points, network
interface cards
to connect everything together and to the
internet.

Data Storage: Day-to-day, And
Over The Long Haul

As you populate your collections, it is
important to back up the workstations and
network drives regularly. At the site of the
CONTENTdm server, as well as at CDL,
servers will also be regularly backed up as
well.
Digital collection servers



Remember: form follows function.
Hardware is sized for the project and for the
environment,
After the software has been chosen.
CONTENTdm Server is Hosted by
OCLC



For LHDRP
One-year license
After that, depends on funding….if funded
could be extended
Considerations If You Run A Server





Processor style and speed
Minimum RAM
Minimum online storage
These variables always depend upon the context of
your organization, the operating system environments
supported, and the application requirements.
The minimum requirements for servers in general
assure good performance, i.e., you can very rapidly
search and retrieve dense data, and display to many
concurrent users.
CONTENTdm 4 Minimum Server
Requirements






CPU: Intel Pentium® 4 or greater
RAM 512 MB minimum
Operating Systems:
Linux, UNIX, Sun Solaris™ 8 or higher,
Windows 2000/2003
Dedicated Web server
(IIS 4.0 or later with Windows®, Apache with
UNIX)
Storage For Files

Both “derivatives” (service images and
thumbnails) are


kept online
The archival TIFF is stored offline
The Files Most Commonly Seen As
Derivative (Access) Files



JPEGs averaging 100 KB (with most CONTENTdm
collections)
Estimate 500 JPEGs will need about 50 MB of space
to store the access (service, derivative) images
To size a CONTENTdm server, assume that a
1 GB disk
will store 10,000 JPEGs for high-quality display
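The sizing rule of thumb above is simple multiplication, sketched here so you can plug in your own collection size. The 100 KB average comes from the slide; the function names and the decimal units (1 MB = 1,000 KB, 1 GB = 1,000,000 KB) are assumptions.

```python
AVG_ACCESS_JPEG_KB = 100  # average access image size quoted on the slide


def access_storage_mb(image_count: int, avg_kb: int = AVG_ACCESS_JPEG_KB) -> float:
    """Disk space in MB needed for this many access images."""
    return image_count * avg_kb / 1000


def images_per_disk(disk_gb: int, avg_kb: int = AVG_ACCESS_JPEG_KB) -> int:
    """How many access images fit on a disk of the given size."""
    return disk_gb * 1_000_000 // avg_kb


print(access_storage_mb(500))  # 50.0
print(images_per_disk(1))      # 10000
```

This covers only the online access images; the much larger master TIFFs stored offline need their own estimate.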
To Populate The Collections, On
The Desktop CONTENTdm Requires






Monitor capable of 1024 x 768 resolution
256 MB RAM (512 recommended)
Disk capacity to hold images (temporarily)
and software
i.e. 100 MB for installation of Acquisition
Station
Windows 2000 or XP
128 Kbps minimum network connection
A Desktop Wish List—not Required,
But Nice!






A dedicated computer for digitization with:
A 19” or 21” display monitor
1 GB RAM (for multi-media)
3.2GHz/800MHz processors optimized for image
manipulation
Graphics processors (up to 128 MB dedicated RAM)
for high quality video, multiple monitors, etc.
High-quality loupes, scales and updated targets
Digitizing Devices: Scanners and
Cameras


In this phase of the project, your scanning will
be outsourced
But info on scanners and cameras is included
here for future reference
We Will Discuss The Primary Types
Of These “Capture” Devices







Flatbed scanners
Transparency scanners
Overhead scanners
Wide format scanners
Cameras
Copy stand cameras
Camera “backs”
The Flatbed Scanner


Chances are you have one of these in your library (or
your home). They handle unbound material up to 11”
x 17” in size, and some come with automatic
document feeder attachments so that you can stack a
document for scanning.
The makes and models vary greatly in cost and
quality. Some have transparency adapters too, but if
you have a lot of film (slides) to scan, you may look
for a specialized scanner just for them.
Transparency Scanner



For transparent material, both negatives and slides,
there are many makes and models to choose from,
but a commonly used one is made by Nikon.
E.g., Nikon LS-2000 Film Scanner
36-bit color
58 MB file size
20 second scan speed
2700 dpi resolution
35mm film strip or slide format
Overhead Scanner


If you do a lot of interlibrary loan, you may
already own an overhead scanner. It was
designed for books and other bound documents,
so that the page is protected from touch by
the machine.
E.g., Minolta PS 3000 and PS 7000 are widely
in use
Cameras



For 3-dimensional items and sometimes for oversize
items, cameras are becoming very popular.
Discussions on various listservs such as “imagelib” are
lively with comparisons of cameras, from the
consumer models we carry on our vacations to high-quality professional setups.
E.g., Nikon COOLPIX 3100
Effective pixels 3.2 million (total pixels: 3.34 million)
Copy Stands

are used for long exposures, repeated
placement of objects, etc.






An example of a high quality camera and copy stand is the Leica
S1 Pro Digital Camera used in the digitization lab at the
University of Utah. It is described as:
Triple linear color CCD line, high-performance full step motor.
Full scan time is 185 seconds. Viewfinder offers laterally correct
image on a focusing screen with a grid.
Produces file sizes of
75MB at 36 bit color or
150MB at 48 bit color.
Camera Specification



Resolution for cameras is often given as the
total number of pixels delivered by a device.
For example, a camera may be described as ‘x
number of mega-pixels’
A mega-pixel is 1,000,000 pixels.
E.g., Canon’s S45 (4.5 Megapixel) maximum
resolution: 2272 x 1704, which, if you do the
math, is closer to 3.9 megapixels…
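"Doing the math" on a camera specification is one multiplication and one division. A short sketch (the function name is illustrative):

```python
def megapixels(width_px: int, height_px: int) -> float:
    """Total pixels delivered, expressed in millions."""
    return width_px * height_px / 1_000_000


# Maximum resolution quoted for the camera above.
print(round(megapixels(2272, 1704), 2))  # 3.87
```

The same check works for any device whose output is quoted in pixel dimensions, so advertised megapixel figures can be compared on equal terms.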
For Highest Quality Professional
Work

Photographers fit 4x5 traditional film
cameras with “camera backs” that store
the images digitally instead of in analog
format. E.g., PhaseOne PowerPhase-- a
digital back to a 4x5 view camera that can
produce resolutions of 10,000 x 12,000
pixels.
Camera vs. Scanner


Scanners and cameras share broadly similar
technologies, and at this point there are
negligible quality differences at the high end.
Of course scanners can only handle 2-dimensional
or flat images, while cameras can handle both
2-dimensional and 3-dimensional objects.
Digital Cameras—Versatile and Fast


They are preferred for delicate or fragile
originals and increasingly for large flat works
such as maps and aerial photos.
But the lighting is hard to control. To get
professional-quality work you may find yourself
hiring a professional photographer to come in.
Rare materials should not be subjected to strong
light of course, so if doing that sort of
photography in-house, you might use a strobe
light.
Prints From Digital: What Does The
User Need?



Many libraries are creating revenue
generating (cost-recovery) programs that
provide prints from the collection.
With the advent of digitization programs,
these prints are increasingly made from
digitized copies of the original.
Occasionally users even purchase the digital
file itself.
Cost of Commercial Printer

To serve the occasional professional user


Outsource to a commercial house or offer to sell
the digital image instead.
“Pro-sumer” photo-quality printers can be had
for under $100
e.g., Canon i560s
Some of your users may prefer to buy the TIFF
and print at home

IF You Print From Digital, You Will
House Large Files

The B&H Photo house in New York City estimates
these file sizes for good output:

Up to 3 MB: good for proofing, web use, presentations
3-20 MB: good for up to 8x10 prints
21-50 MB: good for up to 16x20 prints
51-99 MB: good for up to 24x30 prints
100-125 MB: good for over 24x30 prints
Networking Puts It All Together



To move your digital images from your
workstation to your CONTENTdm server, you
will use the internet.
Your connection should have sufficient
bandwidth for the digital formats you are
importing.
Your users will of course need to have
connections strong enough to download the
images in real time.
Speeds


T1: 1.544 million bits per second (Mbps)—this
bandwidth is sufficient for building the
collection.
T3: 45 Mbps – of course this is even better,
much faster.
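To judge whether a connection is fast enough, estimate how long one file takes to move. The sketch below is a best-case figure that ignores protocol overhead and shared traffic; the file sizes are illustrative assumptions (a 100 KB access image and a roughly 38 MB uncompressed master).

```python
T1_BPS = 1_544_000   # T1 line, bits per second
T3_BPS = 45_000_000  # T3 line, bits per second


def transfer_seconds(file_bytes: int, link_bits_per_second: int) -> float:
    """Best-case transfer time: file size in bits divided by line speed."""
    return file_bytes * 8 / link_bits_per_second


access_image = 100_000     # bytes, assumed access JPEG
master_tiff = 38_400_000   # bytes, assumed uncompressed master

for name, size in [("access image", access_image), ("master TIFF", master_tiff)]:
    print(f"{name}: {transfer_seconds(size, T1_BPS):.1f} s on T1, "
          f"{transfer_seconds(size, T3_BPS):.1f} s on T3")
```

On a T1 the access image moves in about half a second, while the master takes more than three minutes, which is one reason masters are kept offline.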
Wireless

The most popular wireless mode
802.11 b/g (WiFi)
shared 11 Mbps for “b” and 33-54 Mbps for “g”.
This should be quite adequate for your end-users to access your collections.
Security

Network access is made secure through
various methods,




IP ranges (addresses like 209.116.xxx.xxx)
Passwords
Mixed models
Integrated with a parent organization’s model!
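An IP-range rule is a test of whether the visitor's address falls inside a block of addresses. The sketch below uses Python's standard ipaddress module; the range 209.116.0.0/16 is a hypothetical reading of "209.116.xxx.xxx", not a real project setting.

```python
import ipaddress

# Hypothetical allowed range: every address beginning with 209.116.
ALLOWED_NETWORK = ipaddress.ip_network("209.116.0.0/16")


def is_allowed(address: str) -> bool:
    """True if the address falls inside the allowed range."""
    return ipaddress.ip_address(address) in ALLOWED_NETWORK


print(is_allowed("209.116.5.20"))  # True
print(is_allowed("10.0.0.1"))      # False
```

In a mixed model, a visitor outside the range would then be offered a username and password challenge and not simply turned away.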
Exercise #2
Materials Preparation
Metadata, Rights, Quality
Control
Metadata




Standards and schemes
Access and preservation
A full one-day workshop on the metadata
Template for Local History Project in Project
Guide
A Refresher:
What Do We Mean By “Metadata”?


Metadata is information about the digital
object.
Good metadata helps in finding and
preserving a digital object or aggregation
of digital objects.
Metadata Schema Examples





AACR2 (MARC format)
Dublin Core (DC)
Visual Resources Association Core (VRA Core)
Metadata Object Description Schema (MODS)
Encoded Archival Description (EAD)
Types of Metadata





Descriptive
Administrative
Structural
Technical
“Preservation”
Descriptive Metadata


Terms that say what the digital object
represents—what it is “about”
It’s what your users expect—it identifies
the information resources in a way that
allows them to be discovered.
Administrative Metadata



Facilitates both short-term and long-term
management and processing of digital
collections
Includes data pertinent to the creation of
the digital object
Includes rights management, access
control and use requirements
Structural Metadata


Facilitates navigation and presentation
Provides information about the internal structure
of resources


including page, section, chapter numbering, indexes,
and table of contents
Describes the relationship among materials (e.g.,
photograph B was the front of postcard A)
Technical Metadata

Describes the features of the digital file



e.g. resolution,
pixel dimensions,
and the compression factor used in saving the
file.
“Preservation” Metadata




The ability to preserve your digital
resources into the future depends in part
on how completely you’ve applied
metadata, especially
administrative
structural
technical metadata
LHDRP, CONTENTdm and the
Dublin Core









Title
Creator
Subject
Description
Publisher
Contributor
Date
Type







Format
Identifier
Source
Language
Relation
Coverage
Rights
CONTENTdm offers
Audience too
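As a picture of what an item-level record built from these elements might look like, here is a minimal sketch. The element names are the fifteen Dublin Core elements listed above; every value is invented for illustration, and this is not the export format of any particular software.

```python
DUBLIN_CORE_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]

# Hypothetical record for one scanned photograph (all values are made up).
record = {
    "Title": "Main Street looking north",
    "Creator": "Unknown photographer",
    "Subject": "Streets; Storefronts",
    "Date": "1912",
    "Type": "Image",
    "Format": "image/tiff",
    "Identifier": "house178.tif",
    "Rights": "Public domain",
}


def missing_elements(item: dict) -> list[str]:
    """List the Dublin Core elements that have no value in this record."""
    return [name for name in DUBLIN_CORE_ELEMENTS if not item.get(name)]


print(missing_elements(record))
```

A check like this makes gaps visible before export; which elements are actually required is set by the project's metadata template, not by this sketch.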
You Will Use These Elements To
Describe Your Collections

At the item level,
during the CONTENTdm building process.
Later, your collections will be
exported
imported to OAC
Metadata and CONTENTdm classes scheduled
There we will delve into applying the Dublin
Core element set
Rights: Metadata




When material needs to be restricted,
the reasons should be made clear to the end-users.
If possible, the right to access the objects
should be negotiated.
You will have to clear your materials of any
restrictions so that they can be freely displayed
on the CDL’s public access site(s).
Access To CONTENTdm Server



The Dublin Core Rights field can be used to
explain the rights situation for the item
Mechanisms are in place to allow you to restrict
access to materials at the item and the collection
level.
Some commonly used mechanisms for
controlling access to digital materials are user
name/password challenges and IP (internet
protocol) address ranges.
CONTENTdm



Uses both usernames/passwords and IP
ranges
Control access at the collection and the
item level
When your users are viewing your images
on a CONTENTdm server.
Quality control

Getting the materials off to the vendor


appropriately packed, tagged and flagged
Getting the materials back from the
vendor


what will you check for?
texts and photos—different things to look for
Texts And Images Of Texts


The scan produces a file in image format,
which in itself is not searchable
There are a number of ways to create
searchable text from images of text.
Converting Images To Text

Re-keying
very expensive, but high-quality
for handwritten text, or foreign language fonts,
you will have to create typescripts by hand.
OCR (Optical Character Recognition) is the
automated way
With correction, expensive,
but without correction lower accuracy
What is OCR?




OCR “engines” are
pattern recognition algorithms which can
convert images of alphanumeric characters
into machine-recognizable characters.
OCR Has Been Around Since The
1970s



Much research to improve accuracy and
extend the readable language sets.
Very expensive in the early days
Available to desktop consumers in the
mid-late 1990s
Now There Is Decent “Pro-sumer”
Desktop Software

Such as ABBYY FineReader


(e.g., this is offered as an extension of
CONTENTdm.)
Service bureaus (vendors) have also
developed proprietary software



get up to 90% accuracy
can handle large volumes
use filters, formulas and multi-pass methods
The Problem With OCR


When used on barely legible old texts,
film, etc., OCR creates “dirty” ASCII
“Guesses” are saved in a string
not intended for human view. (These should
be cleaned up if display is important.)
You can hide the dirty ASCII from display but
allow the search engine to index on it
To What Degree Is The Accuracy Of
The OCR Important?


This depends on the quality of the image
being processed, and on the intended use of
the captured text.
A rule of thumb: higher resolution and greater bit depth give more accurate OCR (and larger
file sizes).
Imaging Vendor Checklist:
Identifying Unacceptable Scans








Image not correct size
File name is incorrect
File format is incorrect
Loss of detail
Too light or too dark
Image cropped incorrectly
Image rotated incorrectly
Image reversed
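The first three items on this checklist can be checked mechanically; the rest (detail, exposure, cropping, rotation, reversal) need a person looking at the image. Below is a rough sketch of the mechanical part. The record layout and field names are hypothetical, and the format is judged by file extension only, which is a simplification.

```python
def find_problems(scan: dict, spec: dict) -> list[str]:
    """Compare one delivered scan against what the specification called for."""
    problems = []
    if (scan["width_px"], scan["height_px"]) != (spec["width_px"], spec["height_px"]):
        problems.append("Image not correct size")
    if scan["file_name"] != spec["file_name"]:
        problems.append("File name is incorrect")
    # Simplification: the format is inferred from the extension alone.
    if not scan["file_name"].lower().endswith(spec["extension"]):
        problems.append("File format is incorrect")
    return problems


# Hypothetical specification entry and delivered file.
spec = {"file_name": "house178.tif", "extension": ".tif",
        "width_px": 3200, "height_px": 4000}
delivered = {"file_name": "house187.jpg", "width_px": 3200, "height_px": 3000}

print(find_problems(delivered, spec))
```

An empty list means the file passed the mechanical checks only; it still needs a visual inspection.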
Identifying Correct Packaging Of
Digital Materials


Object identifier
The order of the compound object’s parts


corresponding file names and directory
structure
Verify to CALIFA
Exercise #3
Quality Control
Effectiveness
What Is Success ?




Best practices in the digitization process, evaluation
and quality control.
Usability testing
As technology changes,
as long as you are relying on agreed-upon
standards,
you will be able to go back and correct, improve and
expand.
User-driven Purposes



There are many reasons for undertaking a digitization
project.
All include improving and expanding end-user
access to your materials.
Even “preserving” the content and
“conserving the originals”
is done because someday a person may need to
access the resource
Late Turn-of-the-century History

Regular use of digitization in cultural
heritage organizations


such as libraries and archives
Leaders in the field like the California
Digital Library, the Digital Library
Federation documented “best practices”
Principles, Part 1






Leading practices proven over time
Scan at the highest resolution appropriate to the informational
content of the originals
Scan at an appropriate level of quality to avoid rescanning and
re-handling of the originals in the future—scan once
Create and store a master image file that can be used to
produce derivative image files and serve a variety of current and
future user needs
Use image file formats and compression techniques that conform
to industry standards
Create backup copies of all files on a stable medium
Principles, Part 2:







Create meaningful metadata for image files or
collections
Store media in an appropriate environment
Monitor and recopy data as necessary
Outline a migration strategy for transferring data
across generations of technology
Anticipate and plan for future technological
developments
Scan (or have your vendor scan) at the appropriate
settings for source material
Inspect master images at 100% magnification (all or
a sample)
Local History Project Standards





The California State Library, CALIFA and CDL
Partnered to create a set of standards for
digital imaging and metadata
To ensure that your collections are accessible
to your public and well-preserved into the
future.
Selected a digital collection management tool
Prepared a straightforward path for your
materials from CONTENTdm to the CDL
Let’s Revisit Our Project Plans
And make sure we chart our
course for the next steps!
Exercise #4
Assessing and Improving
Your
Local History Project
Conclusion


Please fill out your evaluation forms
See you in a few weeks for CONTENTdm
training!