Digital Image Scanning Instructor: Geri Bunker Ingram [email protected] An Infopeople Workshop August 2005 This Workshop Is Brought to You By the Infopeople Project Infopeople is a federally-funded.
Digital Image Scanning
Instructor:
Geri Bunker Ingram
[email protected]
An Infopeople Workshop
August 2005
This Workshop Is Brought to You By
the Infopeople Project
Infopeople is a federally-funded grant project
supported by the California State Library. It
provides a wide variety of training to California
libraries. Infopeople workshops are offered
around the state and are open registration on a
first-come, first-served basis.
For a complete list of workshops, and for other
information about the Project, go to the
Infopeople website at infopeople.org.
Introductions
Please tell us again your:
Name
Library
Position and role within the Local History Project
Are there lingering questions from yesterday that we
should discuss?
August 2005
Digital Image Scanning
Geri Ingram, DiMeMa, Inc.
Learning Objectives
Understand the basics of digital imaging
Interpret and evaluate scanning specifications
for your project
Differentiate among different technology
options for various formats
Understand the significance of standard
metadata
Learn about display and navigation options.
Agenda
9:00-10:30 What is Digitization?
10:30-10:45 BREAK
10:45-12:00 Technology Infrastructure
12:00-1:00 LUNCH
1:00-2:30 Metadata, Rights, Quality Control
2:30-2:45 BREAK
2:45-4:00 Effectiveness
What is Digitization?
What is Digitization?
Process of digitization
resolution
bit depth
The Local History Project guidelines and
standards
The implications of these standards
A Refresher on Scanning
Scanning takes reflected light signals and
changes them to digital data.
The resulting digitized image is made up of a
grid of individual picture elements.
Picture elements are known as “pixels”. Pixels
are made up of binary digits (bits)
Each bit is expressed as either “0” or “1”.
Controlling Spatial Detail and
Accuracy
Two settings affect spatial detail and accuracy
during the scanning process
bit depth
resolution (the number of samples, or pixels, taken per inch)
Adjusting Bit Depth
Binary digit (bit) “depth”
number of bits used to define each pixel.
the greater the bit depth, the greater the number
of tones (grays or color)
Black and white (bitonal)=1 bit per pixel
Grayscale=8 bits per pixel (256 shades of
gray)
Color=24 bits per pixel (16.7 million color
tones)
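The tone counts above follow from the bit depth: each extra bit doubles the number of values a pixel can hold. A minimal sketch of that arithmetic (the function name tone_count is only an illustration, not part of any scanning software):

```python
def tone_count(bit_depth: int) -> int:
    """Number of distinct tones one pixel can represent at this bit depth."""
    if bit_depth < 1:
        raise ValueError("bit depth must be at least 1")
    return 2 ** bit_depth


# The three bit depths named on the slide.
for label, bits in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
    print(f"{label}: {bits} bits per pixel -> {tone_count(bits):,} tones")
```
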
Adjusting Resolution
Resolution is a sampling rate—
how many dots per inch will you scan?
E.g., 400 dpi.
The effect:
the higher the rate, the smoother the image
the more it can be magnified before its individual
pixels become visible
High resolution = many dots per inch
Low resolution = fewer dots per inch
Sometimes Resolution Is Expressed
As Absolute Pixel Dimensions
Pixel dimensions =
(dpi x width) x (dpi x height)
Example: an 8” x 10” image scanned at 400 dpi
has pixel dimensions of
(400 x 8) x (400 x 10) = 3200 x 4000
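The pixel-dimension formula can be checked with a few lines of code. This is a small sketch; the function name pixel_dimensions is hypothetical, and it assumes the same dpi is used in both directions:

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple:
    """Return (width, height) in pixels for an original measured in inches."""
    return (round(dpi * width_in), round(dpi * height_in))


# The 8 x 10 inch example at 400 dpi.
print(pixel_dimensions(8, 10, 400))
```
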
Storing Your Images
Very high quality images create very large files
The higher the resolution, the greater the file size
The higher the bit depth, the greater the file size
For the exercise coming up
two different formulas
to figure out how much disk space images need
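The exercise formulas themselves are not reproduced in these notes. As a stand-in, the sketch below uses the usual relationship for an uncompressed image (pixels x bits per pixel, divided by 8 to get bytes), once starting from pixel dimensions and once starting from inches and dpi. The function names are hypothetical, and real files add a small header and may be compressed.

```python
def bytes_from_pixels(width_px: int, height_px: int, bit_depth: int) -> int:
    """Uncompressed size in bytes, rounded up to a whole byte."""
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8


def bytes_from_inches(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Same calculation, starting from the size of the original and the dpi."""
    return bytes_from_pixels(round(width_in * dpi), round(height_in * dpi), bit_depth)


# An 8 x 10 inch original at 400 dpi in 24-bit color.
size = bytes_from_inches(8, 10, 400, 24)
print(f"{size:,} bytes, about {size / 1_000_000:.1f} MB")
```

Doubling the resolution quadruples the size, while doubling the bit depth doubles it, which is why both settings matter for storage planning.
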
Three Or More Files For Every
Image
Master image
This is one you do not tamper with, and you
use a file format that does not lose data when you
save it.
Two derivatives:
access (service) image
small (thumbnail) image.
Master Files
Stored offline—
it is valuable,
usually too large for common bandwidth
Not uncommon to have multi-megabyte master
images.
The exception is the JPEG2000 format, which
enjoys a progressive display (details later).
Service or Access Images
By contrast, a common range for the service or
access image is
100 to 500 KB
Thumbnail: The Smallest Access
Image
A thumbnail may be only a few KB, and typically
is no larger than
about 150-200 pixels on a side
For the Local History Project
Full resolution image and large service
image delivered directly to libraries
Import either of them to CONTENTdm to
derive a service image and thumbnail
Automatic with CONTENTdm software
Keeping Your Master
Retain on your local system, on the CDs
delivered, or in any other manner you like.
CDL (the California Digital Library) will also receive a copy of both
master and derivative.
Store the master as your “preservation” copy.
Important to understand the storage
implications of your master images
Local History Project Scanning
A common specification has been developed
Scanning vendor (will have been) selected
It is still important to understand the
specification and infrastructure issues.
Exercise #1
Calculating File Sizes for
Digital Images
Technology Infrastructure
Technology Infrastructure
In this unit we will discuss the hardware,
software and networking requirements of
digital projects.
We will touch on data storage again briefly
and will delve into the question of
compression and file formats.
The Local History Project
Will run on computers located around the
state, connected through the Internet.
The smooth operation of this distributed
infrastructure involves not only hardware
and software, but also depends upon good
communication among people.
All Of This Takes Planning
All the partners in the project
including the info tech service providing
partners
Must demonstrate good communication
skills and consistently confer with each
other
Library Policies
Security
Intellectual property
Policies must be in sync with your info tech
provider,
regardless of who that may be
CDL Will Be Providing Access To
Your Collections
They must be able to protect their
networks from misuse.
The end-users must be able to easily
access unrestricted material.
Distributed Architecture
Designed for the Local History Project, it
has local libraries feeding material into a
central databank
Fairly sophisticated, and yet divides the
labor according to appropriate tasks.
The Local History Project Will
Comprise A Set Of Collections
Each built locally
Using the CONTENTdm Acquisition Station
software, and stored on the CONTENTdm
server. The materials will be copied to the
CDL
Part of collaborative program for both
access and preservation
Local History Project Offers At
Least Three Outlets For Collections
The way your metadata will get into the
CDL is through the use of the
CONTENTdm export function.
A customized export/import mechanism
writes your metadata in the METS format
You will be trained in its use during your
CONTENTdm training session
Managing The Digital Files
Because your scanning will have been done
by a vendor, we will not discuss the attributes
of scanning software fully.
But you will need to know something about
the various pieces of software in use.
The Processing That Will Be Done
For You Includes:
Scanning: representing a print item as a digital
image. E.g., the software that runs your digital
camera or your scanner.
OCR Software: if you have text that you would like
made searchable, software such as Omnipage then
converts the words in the image to a text file that can
be searched.
Lastly, a Digital Asset Management System (e.g.
CONTENTdm) provides a way to organize the image
files, make derivatives and add metadata to each
image.
CONTENTdm Selected
High-performance tool
Easy-to-use interface
Will scale as the collections grow
i.e., it will continue to perform well and be
manageable even when there are millions of
objects
Hardware: From Scanning To
Storage
The lifecycle of collections now includes
preservation of the digital image.
Before scanning hardware or specifications
are set
consider the technical issues
for access AND for
long-term preservation of the digital image
Sustaining Collections Over Time
Data needs to be saved and protected at
every stage in its life-cycle
Many ways of accomplishing this are in
experimental stages
Preservation Of Digital Files
Data migration
e.g. moving files from CD to DVD
Backup and archiving plans
e.g. storing files online or on a central backup
server
Disaster recovery plans, for both analog and
digital resources
Heaven forbid! The library burns down... what
happens to your CDs, your computers?
Preservation Repository Must Also
Be Managed
Sized, weeded, protected and moved
Because CDL is offering long-term
preservation,
your scans and metadata must meet the
standards set for the repository!
Choosing Among File Formats
One decision that affects a collection’s
accessibility and preservation potential is
the format of the files you choose to keep
Many File Formats
(LHDRP is requiring these *)
TIFF*
JPEG2000
GIF*
JPEG*
PDF
MrSID: proprietary, wavelet-based
compression for progressive display
Choosing among the file formats
means you need to understand
something about what the file
format specification implies.
Compression Used To Reduce File Sizes
Two kinds—lossy and lossless.
Lossy- an irrecoverable loss of data,
considerable size reductions (JPEG).
Lossless (JPEG2000 and TIFF),
no loss of data.
TIFF (uncompressed): no loss of data, but the file size is not reduced
JPEG2000 (lossless mode): no loss of data, but can also reduce the size
of the file delivered for display, as it is decompressed at
the point of display.
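The difference between the two kinds of compression can be seen on a toy example. The sketch below is only an illustration: it uses Python's zlib as a stand-in for a lossless scheme and a simple reduction to 16 gray levels as a stand-in for a lossy one. Neither is the actual method used inside TIFF, JPEG or JPEG2000 files, and the pixel data is made up.

```python
import zlib

# Made-up 64 x 64 grayscale "image": one byte (0-255) per pixel.
pixels = bytes((x * 7 + y * 3) % 256 for y in range(64) for x in range(64))


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress; every original byte comes back."""
    return zlib.decompress(zlib.compress(data, 9))


def lossy_quantize(data: bytes, step: int = 16) -> bytes:
    """Throw away fine tonal detail by rounding each pixel down to a multiple of step."""
    return bytes((value // step) * step for value in data)


restored = lossless_round_trip(pixels)
degraded = lossy_quantize(pixels)

print("lossless copy identical to original:", restored == pixels)
print("lossy copy identical to original:", degraded == pixels)
print("largest per-pixel error after lossy step:",
      max(abs(a - b) for a, b in zip(pixels, degraded)))
```

Once the lossy step has been applied, there is no way to get the discarded tones back, which is why the master file should never be saved in a lossy format.
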
TIFF: Tagged Image File Format
TIFF itu-t.7
Is a storage format in widespread use, typically holding 24-bit color.
Useful for both color and bitonal (black &
white) images
Provides a high level of detail. It is used for
archival files (masters).
When compression is used, it should be
lossless.
JPEG: Joint Photographic Experts
Group/JFIF
(JPEG File Interchange Format)
JPEGs are commonly used in
bitmap image editing programs, e.g., Paintshop
viewers
and, most important for our project, web browsers
24-bit, lossy compression format
Well suited for screen and print presentations.
JPEG2000
Provides highly detailed views of objects
Not a proprietary format
but not all software can handle a JPEG2000 file
both PhotoShop and CONTENTdm have that capability
To view a file saved as JPEG2000, some products
require a browser “plug-in”.
CONTENTdm does not require one, but has a built-in
viewer in the extended server software.
CDL does not currently support JPEG2000, so for this
project, you will not create JPEG2000 files.
GIF: Graphics Interchange Format
8-bit, lossless compression format
Well-suited to low resolution screen display
Often used for thumbnails
Supported by all major computer platforms
and web browsers
PDF: Portable Document Format
Proprietary (Adobe) format, now
de facto standard (is actually several formats)
All need a plug-in or external application for
web display,
but that “reader” is free to download.
Widely used for printing and viewing multipage documents
A Word About File Naming
Best practice is to use the standard 8.3
convention, e.g., house178.txt.
Use lower-case characters only, as some
operating systems such as Unix are case-sensitive.
Avoid punctuation characters in filenames
altogether.
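A naming rule like this is easy to check automatically before files go to or come back from a vendor. The sketch below is one possible reading of the rule, not a project requirement: up to 8 characters, a dot, and up to 3 characters, all lower-case letters or digits. It also allows the underscore, which is an assumption made here because later examples such as page_01.jpg use it. The function name is hypothetical.

```python
import re

# 1-8 name characters, a dot, then a 1-3 character extension.
_SAFE_NAME = re.compile(r"[a-z0-9_]{1,8}\.[a-z0-9]{1,3}")


def is_safe_filename(name: str) -> bool:
    """True if the name fits the lower-case 8.3 pattern sketched above."""
    return _SAFE_NAME.fullmatch(name) is not None


for candidate in ["house178.txt", "House178.txt", "my house.jpg", "page_01.jpg"]:
    print(candidate, "->", is_safe_filename(candidate))
```
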
File Naming
Simple—a single image
Compound—more than one image
Components need to be named and stored in
logical fashion
E.g., when assembling, page_01.jpg will
precede page_02.jpg (alphanumeric sort)
E.g., when assembling a hierarchy, items need
to be stored in logical directories
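The alphanumeric sort is the reason page numbers in file names are padded with leading zeros. A short sketch (the helper name page_filename is hypothetical) shows what happens with and without padding:

```python
def page_filename(page_number: int, width: int = 2, ext: str = "jpg") -> str:
    """Build a zero-padded name such as page_01.jpg."""
    return f"page_{page_number:0{width}d}.{ext}"


pages = (1, 2, 10)

# Without padding, page 10 sorts ahead of page 2.
unpadded = sorted(f"page_{n}.jpg" for n in pages)

# With padding, the sorted order matches the page order.
padded = sorted(page_filename(n) for n in pages)

print(unpadded)
print(padded)
```

For objects with more than 99 pages, a wider pad (width=3 or more) keeps the same property.
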
Local History Project Conventions
Vendor must deliver files named with an
appropriate scheme
that works for your library
And for the Local History Project
Exercise will focus on file handling
File formats, naming and organization
Hardware
Digital project hardware components will
include at minimum
Servers
Desktop computers
Network components
Your CONTENTdm Environment
Server located and managed remotely for
the Local History Project.
Computer on your desktop
Network: IT provider uses components
e.g., routers, cables, access points, network
interface cards
to connect everything together and to the
internet.
Data Storage: Day-to-day, And
Over The Long Haul
As you populate your collections, it is
important to back up the workstations and
network drives regularly. At the site of the
CONTENTdm server, as well as at CDL,
servers will also be regularly backed up.
Digital collection servers
Remember: form follows function.
Hardware is sized for the project and for the
environment,
after the software has been chosen.
CONTENTdm Server is Hosted by
OCLC
For LHDRP
One-year license
After that, depends on funding….if funded
could be extended
Considerations If You Run A Server
Processor style and speed
Minimum RAM
Minimum online storage
These variables always depend upon the context of
your organization, the operating system environments
supported, and the application requirements.
The minimum requirements for servers in general
assure good performance, i.e., you can very rapidly
search and retrieve dense data, and display to many
concurrent users.
CONTENTdm 4 Minimum Server
Requirements
CPU: Intel Pentium® 4 or greater
RAM: 512 MB minimum
Operating Systems:
Linux, UNIX, Sun Solaris™ 8 or higher,
Windows 2000/2003
Dedicated Web server
(IIS 4.0 or later with Windows®, Apache with
UNIX)
Storage For Files
Both “derivatives” (service images and
thumbnails) are
kept online
The archival TIFF is stored offline
The Files Most Commonly Seen As
Derivative (Access) Files
JPEGs averaging 100 K (with most CONTENTdm
collections)
Estimate 500 jpgs will need about 50 MB space
to store the access (service, derivative) images
To size a CONTENTdm server, assume that a
1 GB disk
Will store 10,000 jpgs for high-quality display
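These sizing figures can be reproduced with simple arithmetic. The sketch below assumes the slide's average of 100 KB per access image and uses round decimal units (1 MB = 1,000 KB, 1 GB = 1,000,000 KB), which is what the slide's numbers imply. The function names are hypothetical.

```python
def storage_needed_mb(image_count: int, avg_kb: float = 100) -> float:
    """Estimated space in MB for a number of access images."""
    return image_count * avg_kb / 1000


def images_per_disk(disk_gb: float, avg_kb: float = 100) -> int:
    """Estimated number of access images that fit in a given disk size."""
    return int(disk_gb * 1_000_000 // avg_kb)


print(storage_needed_mb(500), "MB for 500 access images")
print(images_per_disk(1), "access images per 1 GB")
```

These estimates cover access images only; master TIFFs are far larger and are sized separately.
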
To Populate The Collections, On
The Desktop CONTENTdm Requires
Monitor capable of 1024 x 768 resolution
256 MB RAM (512 recommended)
Disk capacity to hold images (temporarily)
and software
i.e. 100 MB for installation of Acquisition
Station
Windows 2000 or XP
128 Kbps minimum network connection
A Desktop Wish List—not Required,
But Nice!
A dedicated computer for digitization with:
A 19” or 21” display monitor
1 GB RAM (for multi-media)
3.2GHz/800MHz processors optimized for image
manipulation
Graphics processors (up to 128 MB dedicated RAM)
for high quality video, multiple monitors, etc.
High-quality loupes, scales and updated targets
Digitizing Devices: Scanners and
Cameras
In this phase of the project, your scanning will
be outsourced
But info on scanners and cameras is included
here for future reference
We Will Discuss The Primary Types
Of These “Capture” Devices
Flatbed scanners
Transparency scanners
Overhead scanners
Wide format scanners
Cameras
Copy stand cameras
Camera “backs”
The Flatbed Scanner
Chances are you have one of these in your library (or
your home). They handle unbound material up to 11”
x 17” in size, and some come with automatic
document feeder attachments so that you can stack a
document for scanning.
The makes and models vary greatly in cost and
quality. Some have transparency adapters too, but if
you have a lot of film (slides) to scan, you may look
for a specialized scanner just for them.
Transparency Scanner
For transparent material, both negatives and slides,
there are many makes and models to choose from,
but a commonly used one is made by Nikon.
E.g., Nikon LS-2000 Film Scanner
36-bit color
58mb file size
20 second scan speed
2700 dpi resolution
35mm film strip or slide format
Overhead Scanner
If you do a lot of interlibrary loan, you may
already own an overhead scanner. It was
designed for books and other bound documents,
so that the page is protected from touch by
the machine.
E.g., Minolta PS 3000 and PS 7000 are widely
in use
Cameras
For 3-dimensional items and sometimes for oversize
items, cameras are becoming very popular.
Discussions on various listservs such as “imagelib” are
lively with comparisons of cameras, from the
consumer models we carry on our vacations to high-quality professional setups.
E.g., Nikon COOLPIX 3100
Effective pixels 3.2 million (total pixels: 3.34 million)
Copy Stands
are used for long exposures, repeated
placement of objects, etc.
An example of a high quality camera and copy stand is the Leica
S1 Pro Digital Camera used in the digitization lab at the
University of Utah. It is described as:
Triple linear color CCD line, high-performance full step motor.
Full scan time is 185 seconds. Viewfinder offers laterally correct
image on a focusing screen with a grid.
Produces file sizes of
75MB at 36 bit color or
150MB at 48 bit color.
Camera Specification
Resolution for cameras is often given as the
total number of pixels delivered by a device.
For example, a camera may be described as ‘x
number of mega-pixels’
A mega-pixel is 1,000,000 pixels.
E.g., Canon’s S45 (4.5 Megapixel) maximum
resolution: 2272 x 1704 which, if you do the
math, is closer to 3.9 megapixels…
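"Doing the math" on a camera specification just means multiplying the pixel dimensions and dividing by one million. A small sketch (the function name megapixels is hypothetical):

```python
def megapixels(width_px: int, height_px: int) -> float:
    """Total pixel count expressed in millions of pixels."""
    return width_px * height_px / 1_000_000


# The maximum resolution quoted on the slide.
print(f"2272 x 1704 = {2272 * 1704:,} pixels, or {megapixels(2272, 1704):.2f} megapixels")
```
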
For Highest Quality Professional
Work
Photographers fit 4x5 traditional film
cameras with “camera backs” that store
the images digitally instead of in analog
format. E.g., PhaseOne PowerPhase-- a
digital back to a 4x5 view camera that can
produce resolutions of 10,000 x 12,000
pixels.
Camera vs. Scanner
Scanners and cameras share broadly similar
technologies, and at this point there are
negligible quality differences at the high end.
Of course scanners can only handle 2-dimensional or flat images, while cameras can
handle both
2-dimensional and 3-dimensional objects.
Digital Cameras—Versatile and Fast
They are preferred for delicate or fragile
originals and increasingly for large flat works
such as maps and aerial photos.
But the lighting is hard to control; to get
professional-quality work you may find yourself
hiring a professional photographer to come in.
Rare materials should not be subjected to strong
light of course, so if doing that sort of
photography in-house, you might use a strobe
light.
Prints From Digital: What Does The
User Need?
Many libraries are creating revenue
generating (cost-recovery) programs that
provide prints from the collection.
With the advent of digitization programs,
these prints are increasingly made from
digitized copies of the original.
Occasionally users even purchase the digital
file itself.
Cost of Commercial Printer
To serve the occasional professional user
Outsource to a commercial house or offer to sell
the digital image instead.
“Pro-sumer” photo-quality printers can be had
for under $100
e.g., Canon i560s
Some of your users may prefer to buy the TIFF
and print at home
IF You Print From Digital, You Will
House Large Files
The B&H Photo house in New York City estimates
these file sizes for good output:
Up to 3 MB: Good for proofing, web use, presentations
3-20 MB: Good for up to 8x10 prints
21-50 MB: Good for up to 16x20 prints
51-99 MB: Good for up to 24x30 prints
100-125 MB: Good for over 24x30 prints
Networking Puts It All Together
To move your digital images from your
workstation to your CONTENTdm server, you
will use the internet.
Your connection should have sufficient
bandwidth for the digital formats you are
importing.
Your users will of course need to have
connections strong enough to download the
images in real time.
Speeds
T1: 1.544 million bits per second (Mbps)—this
bandwidth is sufficient for building the
collection.
T3: 45 Mbps – of course this is even better,
much faster.
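To get a feel for these speeds, it helps to estimate how long one file takes to move. The sketch below gives a best-case figure only: it assumes the whole line is available to one transfer and ignores protocol overhead, so real uploads will be slower. File sizes are in decimal megabytes, and the function name is hypothetical.

```python
def transfer_seconds(file_megabytes: float, link_mbps: float) -> float:
    """Best-case time to move a file over a link of the given speed."""
    file_bits = file_megabytes * 8_000_000   # 1 MB = 8,000,000 bits
    return file_bits / (link_mbps * 1_000_000)


# A 38.4 MB master (8 x 10 inches, 400 dpi, 24-bit) and a 100 KB access image.
for label, size_mb in [("38.4 MB master", 38.4), ("100 KB access image", 0.1)]:
    for line, mbps in [("T1", 1.544), ("T3", 45)]:
        print(f"{label} over {line}: about {transfer_seconds(size_mb, mbps):.1f} s")
```
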
Wireless
The most popular wireless mode
802.11 b/g (WiFi)
shared 11 Mbps for “b” and 33-54 Mbps for “g”.
This should be quite adequate for your end-users to access your collections.
Security
Network access is made secure through
various methods,
IP ranges (addresses like 209.116.xxx.xxx)
Passwords
Mixed models
Integrated with a parent organization’s model!
Exercise #2
Materials Preparation
Metadata, Rights, Quality
Control
Metadata
Standards and schemes
Access and preservation
A full one-day workshop on the metadata
Template for Local History Project in Project
Guide
A Refresher:
What Do We Mean By “Metadata”?
Metadata is information about the digital
object.
Good metadata helps in finding and
preserving a digital object or aggregation
of digital objects.
Metadata Schema Examples
AACR2 (MARC format)
Dublin Core (DC)
Visual Resources Association Core (VRA Core)
Metadata Object Description Schema (MODS)
Encoded Archival Description (EAD)
Types of Metadata
Descriptive
Administrative
Structural
Technical
“Preservation”
Descriptive Metadata
Terms that say what the digital object
represents—what it is “about”
It’s what your users expect—it identifies
the information resources in a way that
allows them to be discovered.
Administrative Metadata
Facilitates both short-term and long-term
management and processing of digital
collections
Includes data pertinent to the creation of
the digital object
Includes rights management, access
control and use requirements
Structural Metadata
Facilitates navigation and presentation
Provides information about the internal structure
of resources
including page, section, chapter numbering, indexes,
and table of contents
Describes the relationship among materials (e.g.,
photograph B is the front of Postcard A)
Technical Metadata
Describes the features of the digital file
e.g. resolution,
pixel dimensions,
and the compression factor used in saving the
file.
“Preservation” Metadata
The ability to preserve your digital
resources into the future depends in part
on how completely you’ve applied
metadata, especially
administrative
structural
technical metadata
LHDRP, CONTENTdm and the
Dublin Core
Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights
CONTENTdm offers
Audience too
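One way to picture an item-level record is as a set of element/value pairs, one per Dublin Core element. The sketch below is only an illustration: the sample values are invented, the record layout is not the CONTENTdm or CDL format, and the helper missing_elements is hypothetical. It simply lists which of the fifteen elements a record has left empty.

```python
# The fifteen Dublin Core elements, in the order listed on the slide.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record: dict) -> list:
    """Return the Dublin Core elements that are absent or empty in a record."""
    return [name for name in DUBLIN_CORE_ELEMENTS if not record.get(name)]


# Invented sample values for one scanned photograph.
sample_record = {
    "title": "Main Street looking north",
    "creator": "Unknown photographer",
    "date": "circa 1910",
    "type": "Image",
    "format": "image/tiff",
    "identifier": "house178.tif",
    "rights": "Contact the owning library for reproduction information",
}

print("Still to fill in:", ", ".join(missing_elements(sample_record)))
```
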
You Will Use These Elements To
Describe Your Collections
At the item level
during the CONTENTdm building process.
Later, your collections will be
exported
imported to OAC
Metadata and CONTENTdm classes scheduled
There we will delve into applying the Dublin
Core element set
Rights: Metadata
When material needs to be restricted
The reasons should be made clear to the end-users.
If possible, the right to access the objects
should be negotiated.
You will have to clear your materials of any
restrictions so that they can be freely displayed
on the CDL’s public access site(s).
Access To CONTENTdm Server
The Dublin Core Rights field can be used to
explain the rights situation for the item
Mechanisms are in place to allow you to restrict
access to materials at the item and the collection
level.
Some commonly used mechanisms for
controlling access to digital materials are user
name/password challenges and IP (internet
protocol) address ranges.
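An IP address range check works by asking whether the visitor's address falls inside a block of addresses the library has approved. The sketch below shows the idea with Python's standard ipaddress module. It is not how CONTENTdm implements the check, and the range 209.116.0.0/16 is a hypothetical example, not a real project setting.

```python
import ipaddress

# Hypothetical approved range: every address that starts with 209.116.
ALLOWED_NETWORKS = [ipaddress.ip_network("209.116.0.0/16")]


def ip_is_allowed(address: str) -> bool:
    """True if the address falls inside any approved range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ALLOWED_NETWORKS)


for visitor in ["209.116.4.20", "209.117.0.1"]:
    print(visitor, "->", "allowed" if ip_is_allowed(visitor) else "blocked")
```

A user name and password challenge can be layered on top for users who connect from outside the approved ranges.
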
CONTENTdm
Uses both usernames/passwords and IP
ranges
Control access at the collection and the
item level
When your users are viewing your images
on a CONTENTdm server.
Quality control
Getting the materials off to the vendor
appropriately packed, tagged and flagged
Getting the materials back from the
vendor
what will you check for?
texts and photos—different things to look for
Texts And Images Of Texts
The scan produces a file in image format,
which in itself is not searchable
There are a number of ways to create
searchable text from images of text.
Converting Images To Text
Re-keying
handwritten text, or foreign language fonts
very expensive, but high-quality
you will have to create typescripts by hand.
OCR (Optical Character Recognition) is the
automated way
With correction, expensive,
but without correction, lower accuracy
What is OCR?
OCR “engines” are
pattern recognition algorithms which can
convert images of alphanumeric characters
into machine-recognizable characters.
OCR Has Been Around Since The
1970s
Much research to improve accuracy and
extend the readable language sets.
Very expensive in the early days
Available to desktop consumers in the
mid-late 1990s
Now There Is Decent “Pro-sumer”
Desktop Software
Such as ABBYY FineReader
(e.g., this is offered as an extension of
CONTENTdm.)
Service bureaus (vendors) have also
developed proprietary software
get up to 90% accuracy
can handle large volumes
use filters, formulas and multi-pass methods
The Problem With OCR
When used on barely legible old texts,
film, etc., creates “dirty” ASCII—
“Guesses” are saved in a string
not intended for human view. (These should
be cleaned up if display is important.)
can hide the dirty ASCII from display but
allow the search engine to index on it
To What Degree Is The Accuracy Of
The OCR Important?
This depends on the quality of the image
being processed, and on the intended use of
the captured text.
A rule of thumb: higher resolution and greater bit depth give more accurate OCR (and larger
file sizes).
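To judge whether OCR output is accurate enough for a given use, you can compare a sample against a hand-checked transcription. The sketch below is a rough, illustrative measure only: it uses Python's standard difflib to count the characters of the correct text that also appear, in order, in the OCR output. It is not the metric any particular OCR vendor reports, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters matched, in order, by the OCR output."""
    if not reference:
        raise ValueError("reference text must not be empty")
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# A typical OCR confusion: the letters l and i read as the digit 1.
score = character_accuracy("local history", "1ocal h1story")
print(f"character accuracy: {score:.0%}")
```

If the text is only indexed for searching and hidden from display, a lower score may be acceptable than if the text is shown to users.
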
Imaging Vendor Checklist:
Identifying Unacceptable Scans
Image not correct size
File name is incorrect
File format is incorrect
Loss of detail
Too light or too dark
Image cropped incorrectly
Image rotated incorrectly
Image reversed
Identifying Correct Packaging Of
Digital Materials
Object identifier
The order of the compound object’s parts
corresponding file names and directory
structure
Verify to CALIFA
Exercise #3
Quality Control
Effectiveness
What Is Success?
Best practices in the digitization process, evaluation
and quality control.
Usability testing
As technology changes,
as long as you are relying on agreed-upon
standards,
You will be able to go back and correct, improve and
expand.
User-driven Purposes
There are many reasons for undertaking a digitization
project.
All include improving and expanding end-user
access to your materials.
Even “preserving” the content and
“conserving the originals”:
it is because someday a person may need to
access the resource
Late Turn-of-the-century History
Regular use of digitization in cultural
heritage organizations
such as libraries and archives
Leaders in the field like the California
Digital Library and the Digital Library
Federation documented “best practices”
Principles, Part 1
Leading practices proven over time
Scan at the highest resolution appropriate to the informational
content of the originals
Scan at an appropriate level of quality to avoid rescanning and
re-handling of the originals in the future—scan once
Create and store a master image file that can be used to
produce derivative image files and serve a variety of current and
future user needs
Use image file formats and compression techniques that conform
to industry standards
Create backup copies of all files on a stable medium
Principles, Part 2:
Create meaningful metadata for image files or
collections
Store media in an appropriate environment
Monitor and recopy data as necessary
Outline a migration strategy for transferring data
across generations of technology
Anticipate and plan for future technological
developments
Scan (or have your vendor scan) at the appropriate
settings for source material
Inspect master images at 100% magnification (all or
a sample)
Local History Project Standards
The California State Library, CALIFA and CDL
Partnered to create a set of standards for
digital imaging and metadata
To ensure that your collections are accessible
to your public and well-preserved into the
future.
Selected a digital collection management tool
Prepared a straightforward path for your
materials from CONTENTdm to the CDL
Let’s Revisit Our Project Plans
And make sure we chart our
course for the next steps!
Exercise #4
Assessing and Improving
Your
Local History Project
Conclusion
Please fill out your evaluation forms
See you in a few weeks for CONTENTdm
training!