Digital Image Scanning
Instructor: Geri Bunker Ingram [email protected]
An Infopeople Workshop, August 2005

This Workshop Is Brought to You By the Infopeople Project
  Infopeople is a federally funded grant project supported by the California State Library. It provides a wide variety of training to California libraries.
  Infopeople workshops are offered around the state and are open for registration on a first-come, first-served basis.
  For a complete list of workshops, and for other information about the Project, go to the Infopeople website at infopeople.org.

Introductions
  Please tell us again your:
    Name
    Library
    Position and role within the Local History Project
  Are there lingering questions from yesterday that we should discuss?

August 2005 Digital Image Scanning Geri Ingram, DiMeMa, Inc.

Learning Objectives
  Understand the basics of digital imaging
  Interpret and evaluate scanning specifications for your project
  Differentiate among technology options for various formats
  Understand the significance of standard metadata
  Learn about display and navigation options

Agenda
  9:00-10:30   What is Digitization?
  10:30-10:45  BREAK
  10:45-12:00  Technology Infrastructure
  12:00-1:00   LUNCH
  1:00-2:30    Metadata, Rights, Quality Control
  2:30-2:45    BREAK
  2:45-4:00    Effectiveness

What is Digitization?
  Process of digitization
    resolution
    bit depth
  The Local History Project guidelines and standards
  The implications of these standards

A Refresher on Scanning
  Scanning takes reflected light signals and changes them to digital data.
  The resulting digitized image is made up of a grid of individual picture elements.
  Picture elements are known as “pixels”.
  Pixels are made up of binary digits (bits).
  Each bit is expressed as either “0” or “1”.
Controlling Spatial Detail and Accuracy
  Two settings affect spatial detail and accuracy during the scanning process:
    bit depth (the number of bits used to define each pixel)
    resolution (the sampling rate, in dots per inch)

Adjusting Bit Depth
  Binary digit (bit) “depth” is the number of bits used to define each pixel.
  The greater the bit depth, the greater the number of tones (grays or colors).
    Black and white (bitonal) = 1 bit per pixel
    Grayscale = 8 bits per pixel (256 shades of gray)
    Color = 24 bits per pixel (16.7 million color tones)

Adjusting Resolution
  Resolution is a sampling rate: how many dots per inch will you scan? E.g., 400 dpi.
  The effect: the higher the rate,
    the smoother the image
    the more it can be magnified before its individual pixels become visible
  High resolution = many dots per inch
  Low resolution = fewer dots per inch

Sometimes Resolution Is Expressed As Absolute Pixel Dimensions
  Pixel dimensions = (dpi x width) x (dpi x height)
  Example: using the formula, 3200 x 4000 would be the pixel dimensions of an 8” x 10” image scanned at 400 dpi.

Storing Your Images
  Very high quality images create very large files.
  The higher the resolution, the greater the file size.
  The higher the bit depth, the greater the file size.
  For the exercise coming up: two different formulas to figure out how much disk space images need.

Three Or More Files For Every Image
  Master image
    This is the one you do not tamper with, and you use a file format that does not lose data when you save it.
  Two derivatives:
    access (service) image
    small (thumbnail) image
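The pixel-dimension formula and the bit-depth and file-size relationships above can be sketched in a few lines of Python. This is only an illustration, not one of the workshop's exercise formulas: the function names are made up for this example, and the size calculation assumes an uncompressed image with no file header overhead.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel dimensions = (dpi x width) x (dpi x height)."""
    return round(width_in * dpi), round(height_in * dpi)


def tone_count(bit_depth):
    """Number of distinct tones a pixel can take at a given bit depth."""
    return 2 ** bit_depth


def uncompressed_size_bytes(width_in, height_in, dpi, bit_depth):
    """Approximate raw size: total pixels x bits per pixel, converted to bytes.

    Assumption: no compression and no header or metadata overhead.
    """
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# The 8" x 10" original scanned at 400 dpi from the slide above.
print(pixel_dimensions(8, 10, 400))             # (3200, 4000)
print(tone_count(8), tone_count(24))            # 256 16777216
print(uncompressed_size_bytes(8, 10, 400, 24))  # 38400000 bytes, about 38.4 MB
print(uncompressed_size_bytes(8, 10, 400, 1))   # 1600000 bytes for a bitonal scan
```

Doubling the dpi quadruples the pixel count, which is why resolution drives file size faster than bit depth does.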
Master Files
  Stored offline: the master is valuable, and usually too large for common bandwidth.
  It is not uncommon to have multi-megabyte master images.
  The exception is the JPEG2000 format, which enjoys a progressive display (details later).

Service or Access Images
  By contrast, a common range for the service or access image is 100 to 500 KB.

Thumbnail: The Smallest Access Image
  A thumbnail may be only a few KB, and typically is no larger than about 150-200 pixels on a side.

For the Local History Project
  Full resolution image and large service image delivered directly to libraries
  Import either of them to CONTENTdm to derive a service image and thumbnail
  Automatic with CONTENTdm software

Keeping Your Master
  Retain on your local system, on the CDs delivered, or in any other manner you like.
  CDL will also receive a copy of both master and derivative.
  Store the master as your “preservation” copy.
  It is important to understand the storage implications of your master images.

Local History Project Scanning
  A common specification has been developed.
  Scanning vendor (will have been) selected.
  It is still important to understand the specification and infrastructure issues.

Exercise #1
  Calculating File Sizes for Digital Images

Technology Infrastructure
  In this unit we will discuss the hardware, software and networking requirements of digital projects.
  We will touch on data storage again briefly and will delve into the question of compression and file formats.
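To make the thumbnail guideline from the slides above concrete (no larger than about 150-200 pixels on a side), here is a small sketch of how derivative dimensions could be computed from a master's pixel dimensions. The function name and the 200-pixel limit are assumptions chosen for illustration; CONTENTdm derives thumbnails automatically and its actual method is not described in this workshop.

```python
def fit_within(width_px, height_px, max_side):
    """Scale dimensions so the longer side is at most max_side pixels.

    The aspect ratio is preserved; images already small enough are unchanged.
    """
    longest = max(width_px, height_px)
    if longest <= max_side:
        return width_px, height_px
    new_width = max(1, round(width_px * max_side / longest))
    new_height = max(1, round(height_px * max_side / longest))
    return new_width, new_height


# A 3200 x 4000 pixel master reduced to a thumbnail of at most 200 pixels.
print(fit_within(3200, 4000, 200))  # (160, 200)
```

The same function with a larger limit gives dimensions for a service image; only the master keeps the full scanned resolution.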
The Local History Project
  Will run on computers located around the state, connected through the Internet.
  The smooth operation of this distributed infrastructure involves not only hardware and software, but also depends upon good communication among people.

All Of This Takes Planning
  All the partners in the project, including the info tech service providing partners,
  must demonstrate good communication skills and consistently confer with each other.

Library Policies
  Security
  Intellectual property
  Policies must be in sync with the info tech provider, regardless of who that may be.

CDL Will Be Providing Access To Your Collections
  They must be able to protect their networks from misuse.
  The end-users must be able to easily access unrestricted material.

Distributed Architecture
  Designed for the Local History Project, it has local libraries feeding material into a central databank.
  Fairly sophisticated, and yet divides the labor according to appropriate tasks.

The Local History Project Will Comprise A Set Of Collections
  Each built locally using the CONTENTdm Acquisition Station software, and stored on the CONTENTdm server.
  The materials will be copied to the CDL.
  Part of a collaborative program for both access and preservation.

Local History Project Offers At Least Three Outlets For Collections
  The way your metadata will get into the CDL is through the use of the CONTENTdm export function.
  A customized export/import mechanism writes your metadata in the METS format.
  You will be trained in its use during your CONTENTdm training session.
Managing The Digital Files
  Because your scanning will have been done by a vendor, we will not discuss the attributes of scanning software fully.
  But you will need to know something about the various pieces of software in use.

The Processing That Will Be Done For You Includes:
  Scanning: representing a print item as a digital image. E.g., the software that runs your digital camera or your scanner.
  OCR software: if you have text that you would like made searchable, software such as Omnipage converts the words in the image to a text file that can be searched.
  Lastly, a Digital Asset Management System (e.g., CONTENTdm) provides a way to organize the image files, make derivatives and add metadata to each image.

CONTENTdm Selected
  High-performance tool
  Easy-to-use interface
  Will scale as the collections grow, i.e., it will continue to perform well and be manageable even when there are millions of objects.

Hardware: From Scanning To Storage
  The lifecycle of collections now includes preservation of the digital image.
  Before scanning hardware or specifications are set, consider the technical issues for access AND for long-term preservation of the digital image.

Sustaining Collections Over Time
  Data needs to be saved and protected at every stage in its life-cycle.
  Many ways of accomplishing this are in experimental stages.

Preservation Of Digital Files
  Data migration
    e.g., moving files from CD to DVD
  Backup and archiving plans
    e.g., storing files online or on a central backup server
  Disaster recovery plans, for both analog and digital resources
    Heaven forbid! The library burns down... what happens to your CDs, your computers?
Preservation Repository Must Also Be Managed
  Sized, weeded, protected and moved
  Because CDL is offering long-term preservation, your scans and metadata must meet the standards set for the repository!

Choosing Among File Formats
  One decision that affects a collection’s accessibility and preservation potential is the format of the files you choose to keep.

Many File Formats (LHDRP is requiring these *)
  TIFF*
  JPEG2000
  GIF*
  JPEG*
  PDF
  MrSid: proprietary, wavelet-based compression for progressive display
  Choosing among the file formats means you need to understand something about what the file format specification implies.

Compression Used To Reduce File Sizes
  Two kinds: lossy and lossless.
  Lossy (JPEG): an irrecoverable loss of data, with considerable size reductions.
  Lossless (JPEG2000 and TIFF): no loss of data.
    TIFF (saved uncompressed): no loss of data, but the file size is not reduced.
    JPEG2000: no loss of data, but can also reduce the size of the file delivered for display, as it is decompressed at the point of display.

TIFF: Tagged Image File Format
  TIFF itu-t.7 is a 24-bit storage format in widespread use.
  Useful for both color and bitonal (black & white) images.
  Provides a high level of detail.
  It is used for archival files (masters).
  When compression is used, it should be lossless.

JPEG: Joint Photographic Experts Group / JFIF (JPEG File Interchange Format)
  JPEGs are commonly used in bitmap image editing programs (e.g., Paintshop), in viewers, and, most important for our project, in web browsers.
  24-bit, lossy compression format
  Well suited for screen and print presentations.
JPEG2000
  Provides highly detailed views of objects.
  Not a proprietary format, but not all software can handle a JPEG2000 file.
    Both Photoshop and CONTENTdm have that capability.
  To view a file saved as JPEG2000, some products require a browser “plug-in”. CONTENTdm does not require one, but has a built-in viewer in the extended server software.
  CDL does not currently support JPEG2000, so for this project, you will not create JPEG2000 files.

GIF: Graphics Interchange Format
  8-bit, lossless compression format
  Well-suited to low resolution screen display
  Often used for thumbnails
  Supported by all major computer platforms and web browsers

PDF: Portable Document Format
  Proprietary (Adobe) format, now a de facto standard (is actually several formats).
  All need a plug-in or external application for web display, but that “reader” is free to download.
  Widely used for printing and viewing multipage documents.

A Word About File Naming
  Best practice is to use the standard 8.3 convention, e.g., house178.txt.
  Use lower-case characters only, as some operating systems such as Unix are case-sensitive.
  Avoid punctuation characters in filenames altogether.

File Naming
  Simple: a single image
  Compound: more than one image
    Components need to be named and stored in logical fashion.
    E.g., when assembling, page_01.jpg will precede page_02.jpg (alphanumeric sort).
    E.g., when assembling a hierarchy, items need to be stored in logical directories.
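The naming advice above (8.3 names, lower case only, zero-padded page numbers so that an alphanumeric sort keeps pages in order) can be checked mechanically. The sketch below is one possible way to do that; the helper names are invented for this example, and it treats the underscore as acceptable because the slides' own example (page_01.jpg) uses one.

```python
import re


def page_filename(stem, page_number, ext="jpg", digits=2):
    """Build a lower-case, zero-padded component name such as page_01.jpg."""
    return f"{stem}_{page_number:0{digits}d}.{ext}".lower()


def is_8_3(filename):
    """True if the name fits 8.3: up to 8 characters, a dot, up to 3 characters.

    Assumption: only lower-case letters, digits and underscore are allowed.
    """
    return re.fullmatch(r"[a-z0-9_]{1,8}\.[a-z0-9]{1,3}", filename) is not None


# Zero padding keeps an alphanumeric sort in page order.
padded = sorted(page_filename("page", n) for n in (10, 2, 1))
print(padded)    # ['page_01.jpg', 'page_02.jpg', 'page_10.jpg']

# Without padding, page 10 sorts before page 2.
unpadded = sorted(["page_1.jpg", "page_2.jpg", "page_10.jpg"])
print(unpadded)  # ['page_1.jpg', 'page_10.jpg', 'page_2.jpg']

print(is_8_3("house178.txt"))   # True
print(is_8_3("House 178.TXT"))  # False: upper case and a space
```

A check like this is most useful when run over a whole delivery from the vendor, so that naming problems surface before import.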
Local History Project Conventions
  Vendor must deliver files named with an appropriate scheme that works for your library and for the Local History Project.
  Exercise will focus on file handling: file formats, naming and organization.

Hardware
  Digital project hardware components will include at minimum:
    Servers
    Desktop computers
    Network components

Your CONTENTdm Environment
  Server: located and managed remotely for the Local History Project.
  Computer: on your desktop.
  Network: the IT provider uses components (e.g., routers, cables, access points, network interface cards) to connect everything together and to the internet.

Data Storage: Day-to-day, And Over The Long Haul
  As you populate your collections, it is important to back up the workstations and network drives regularly.
  At the site of the CONTENTdm server, as well as at CDL, servers will also be regularly backed up.

Digital Collection Servers
  Remember: form follows function.
  Hardware is sized for the project and for the environment, after the software has been chosen.

CONTENTdm Server is Hosted by OCLC For LHDRP
  One-year license
  After that, depends on funding... if funded, could be extended.

Considerations If You Run A Server
  Processor style and speed
  Minimum RAM
  Minimum online storage
  These variables always depend upon the context of your organization, the operating system environments supported, and the application requirements.
  The minimum requirements for servers in general assure good performance, i.e., you can very rapidly search and retrieve dense data, and display to many concurrent users.
CONTENTdm 4 Minimum Server Requirements
  CPU: Intel Pentium® 4 or greater
  RAM: 512 MB minimum
  Operating systems: Linux, UNIX, Sun Solaris™ 8 or higher, Windows 2000/2003
  Dedicated Web server (IIS 4.0 or later with Windows®, Apache with UNIX)

Storage For Files
  Both “derivatives” (service images and thumbnails) are kept online.
  The archival TIFF is stored offline.

The Files Most Commonly Seen As Derivative (Access) Files
  JPEGs averaging 100 K (with most CONTENTdm collections)
  Estimate that 500 jpgs will need about 50 MB of space to store the access (service, derivative) images.
  To size a CONTENTdm server, assume that a 1 GB disk will store 10,000 jpgs for high-quality display.

To Populate The Collections, On The Desktop CONTENTdm Requires
  Monitor capable of 1024 x 768 resolution
  256 MB RAM (512 recommended)
  Disk capacity to hold images (temporarily) and software, i.e., 100 MB for installation of Acquisition Station
  Windows 2000 or XP
  128 Kbps minimum network connection

A Desktop Wish List: Not Required, But Nice!
  A dedicated computer for digitization with:
    A 19” or 21” display monitor
    1 GB RAM (for multi-media)
    3.2GHz/800MHz processors optimized for image manipulation
    Graphics processors (up to 128 MB dedicated RAM) for high quality video, multiple monitors, etc.
    High-quality loupes, scales and updated targets

Digitizing Devices: Scanners and Cameras
  In this phase of the project, your scanning will be outsourced.
  But info on scanners and cameras is included here for future reference.
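The sizing rule of thumb for derivative files earlier in this section (JPEGs averaging 100 K, so 500 of them need about 50 MB and a 1 GB disk holds about 10,000) is simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes decimal units (1 MB = 1,000 KB, 1 GB = 1,000,000 KB), which is what makes the slide's round numbers come out exactly.

```python
def derivative_storage_mb(image_count, avg_kb=100):
    """Estimated online storage, in MB, for access (service) images."""
    return image_count * avg_kb / 1000


def images_per_disk(disk_gb, avg_kb=100):
    """How many access images of the average size fit on a disk of disk_gb."""
    return int(disk_gb * 1_000_000 // avg_kb)


print(derivative_storage_mb(500))  # 50.0 MB for 500 jpgs
print(images_per_disk(1))          # 10000 jpgs on a 1 GB disk
```

Changing `avg_kb` shows how sensitive the estimate is: at the upper end of the 100 to 500 KB service-image range, the same disk holds only a fifth as many images.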
We Will Discuss The Primary Types Of These “Capture” Devices
  Flatbed scanners
  Transparency scanners
  Overhead scanners
  Wide format scanners
  Cameras
    Copy stand cameras
    Camera “backs”

The Flatbed Scanner
  Chances are you have one of these in your library (or your home).
  They handle unbound material up to 11” x 17” in size, and some come with automatic document feeder attachments so that you can stack a document for scanning.
  The makes and models vary greatly in cost and quality.
  Some have transparency adapters too, but if you have a lot of film (slides) to scan, you may look for a specialized scanner just for them.

Transparency Scanner
  For transparent material, both negatives and slides, there are many makes and models to choose from, but a commonly used one is made by Nikon.
  E.g., Nikon LS-2000 Film Scanner
    36-bit color
    58 MB file size
    20 second scan speed
    2700 dpi resolution
    35mm film strip or slide format

Overhead Scanner
  If you do a lot of interlibrary loan, you may already own an overhead scanner.
  It was designed for books and other bound documents, so that the page is protected from touch by the machine.
  E.g., Minolta PS 3000 and PS 7000 are widely in use.

Cameras
  For 3-dimensional items and sometimes for oversize items, cameras are becoming very popular.
  Discussions on various listservs such as “imagelib” are lively with comparisons of cameras, from the consumer models we carry on our vacations to high-quality professional set-ups.
  E.g., Nikon COOLPIX 3100
    Effective pixels: 3.2 million (total pixels: 3.34 million)

Copy Stands
  Copy stands are used for long exposures, repeated placement of objects, etc.
  An example of a high-quality camera and copy stand is the Leica S1 Pro Digital Camera used in the digitization lab at the University of Utah. It is described as:
    Triple linear color CCD line, high-performance full step motor.
    Full scan time is 185 seconds.
    Viewfinder offers laterally correct image on a focusing screen with a grid.
    Produces file sizes of 75 MB at 36-bit color or 150 MB at 48-bit color.

Camera Specification
  Resolution for cameras is often given as the total number of pixels delivered by a device.
  For example, a camera may be described as ‘x number of mega-pixels’.
  A mega-pixel is 1,000,000 pixels.
  E.g., Canon’s S45 (4.5 Megapixel) has a maximum resolution of 2272 x 1704, which, if you do the math, is closer to 3.9 megapixels.

For Highest Quality Professional Work
  Photographers fit 4x5 traditional film cameras with “camera backs” that store the images digitally instead of in analog format.
  E.g., PhaseOne PowerPhase: a digital back to a 4x5 view camera that can produce resolutions of 10,000 x 12,000 pixels.

Camera vs. Scanner
  Scanners and cameras share broadly similar technologies, and at this point there are negligible quality differences at the high end.
  Of course scanners can only handle 2-dimensional or flat images, while cameras can handle both 2-dimensional and 3-dimensional objects.

Digital Cameras: Versatile and Fast
  They are preferred for delicate or fragile originals and increasingly for large flat works such as maps and aerial photos.
  But the lighting is hard to control. To get professional quality work, you may find yourself hiring a professional photographer to come in.
  Rare materials should not be subjected to strong light, of course, so if doing that sort of photography in-house, you might use a strobe light.
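The megapixel arithmetic in the Camera Specification slide above is easy to verify. The helper below is a minimal illustration (the name is made up for this example) that multiplies the pixel dimensions and divides by 1,000,000, using the two devices quoted in the slides.

```python
def megapixels(width_px, height_px):
    """Total pixels delivered, expressed in millions (1 megapixel = 1,000,000 pixels)."""
    return width_px * height_px / 1_000_000


# Maximum resolution quoted for the Canon S45.
print(round(megapixels(2272, 1704), 2))  # 3.87

# Resolution quoted for the PhaseOne PowerPhase digital back.
print(megapixels(10_000, 12_000))        # 120.0
```

Comparing the computed figure with the advertised one is a quick way to see whether a specification is quoting total or effective pixels.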
Prints From Digital: What Does The User Need?
  Many libraries are creating revenue generating (cost-recovery) programs that provide prints from the collection.
  With the advent of digitization programs, these prints are increasingly made from digitized copies of the original.
  Occasionally users even purchase the digital file itself.

Cost of Commercial Printer
  To serve the occasional professional user, outsource to a commercial house or offer to sell the digital image instead.
  “Pro-sumer” photo-quality printers can be had for under $100, e.g., Canon i560s.
  Some of your users may prefer to buy the TIFF and print at home.

IF You Print From Digital, You Will House Large Files
  The B&H Photo house in New York City estimates these file sizes for good output:
    Up to 3 MB    Good for proofing, web use, presentations
    3-20 MB       Good for up to 8x10 prints
    21-50 MB      Good for up to 16x20 prints
    51-99 MB      Good for up to 24x30 prints
    100-125 MB    Good for over 24x30 prints

Networking Puts It All Together
  To move your digital images from your workstation to your CONTENTdm server, you will use the internet.
  Your connection should have sufficient bandwidth for the digital formats you are importing.
  Your users will of course need to have connections strong enough to download the images in real time.

Speeds
  T1: 1.544 million bits per second (Mbps). This bandwidth is sufficient for building the collection.
  T3: 45 Mbps. Of course this is even better, much faster.

Wireless
  The most popular wireless mode is 802.11 b/g (WiFi): shared 11 Mbps for “b” and 33-54 Mbps for “g”.
  This should be quite adequate for your end-users to access your collections.
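To get a feel for what the link speeds above mean in practice, the following sketch estimates how long a file takes to move across a connection. It is a best-case estimate under stated assumptions: the function name is invented, megabytes are decimal (1 MB = 8,000,000 bits), and protocol overhead and sharing of the link are ignored, so real transfers will be slower.

```python
def transfer_seconds(file_megabytes, link_mbps):
    """Best-case transfer time: file size in bits divided by link speed in bits/s."""
    bits = file_megabytes * 8_000_000
    return bits / (link_mbps * 1_000_000)


T1_MBPS = 1.544
T3_MBPS = 45

# A 38.4 MB file: an uncompressed 24-bit image of 3200 x 4000 pixels.
print(round(transfer_seconds(38.4, T1_MBPS)))     # 199 seconds over a T1
print(round(transfer_seconds(38.4, T3_MBPS), 1))  # 6.8 seconds over a T3

# A 100 KB (0.1 MB) access image over a T1.
print(round(transfer_seconds(0.1, T1_MBPS), 2))   # 0.52 seconds
```

The contrast between the two file sizes is the practical reason master files stay offline while the small access images are served to users.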
Security
  Network access is made secure through various methods:
    IP ranges (addresses like 209.116.xxx.xxx)
    Passwords
    Mixed models
    Integrated with a parent organization’s model!

Exercise #2
  Materials Preparation

Metadata, Rights, Quality Control

Metadata
  Standards and schemes
  Access and preservation
  A full one-day workshop on the metadata
  Template for Local History Project in Project Guide

A Refresher: What Do We Mean By “Metadata”?
  Metadata is information about the digital object.
  Good metadata helps in finding and preserving a digital object or aggregation of digital objects.

Metadata Schema Examples
  AACR2 (MARC format)
  Dublin Core (DC)
  Visual Resources Association Core (VRA Core)
  Metadata Object Description Schema (MODS)
  Encoded Archival Description (EAD)

Types of Metadata
  Descriptive
  Administrative
  Structural
  Technical
  “Preservation”

Descriptive Metadata
  Terms that say what the digital object represents, what it is “about”.
  It’s what your users expect: it identifies the information resources in a way that allows them to be discovered.

Administrative Metadata
  Facilitates both short-term and long-term management and processing of digital collections.
  Includes data pertinent to the creation of the digital object.
  Includes rights management, access control and use requirements.
Structural Metadata
  Facilitates navigation and presentation.
  Provides information about the internal structure of resources, including page, section, chapter numbering, indexes, and table of contents.
  Describes the relationship among materials (e.g., photograph B was the front of Postcard A).

Technical Metadata
  Describes the features of the digital file, e.g., resolution, pixel dimensions, and the compression factor used in saving the file.

“Preservation” Metadata
  The ability to preserve your digital resources into the future depends in part on how completely you’ve applied metadata, especially:
    administrative
    structural
    technical metadata

LHDRP, CONTENTdm and the Dublin Core
  Title
  Creator
  Subject
  Description
  Publisher
  Contributor
  Date
  Type
  Format
  Identifier
  Source
  Language
  Relation
  Coverage
  Rights
  CONTENTdm offers Audience too.

You Will Use These Elements To Describe Your Collections
  At the item level, during the CONTENTdm building process.
  Later, your collections will be exported and imported to OAC.
  Metadata and CONTENTdm classes are scheduled. There we will delve into applying the Dublin Core element set.

Rights: Metadata
  When material needs to be restricted, the reasons should be made clear to the end-users.
  If possible, the right to access the objects should be negotiated.
  You will have to clear your materials of any restrictions so that they can be freely displayed on the CDL’s public access site(s).

Access To CONTENTdm Server
  The Dublin Core Rights field can be used to explain the rights situation for the item.
  Mechanisms are in place to allow you to restrict access to materials at the item and the collection level.
  Some commonly used mechanisms for controlling access to digital materials are user name/password challenges and IP (internet protocol) address ranges.

CONTENTdm
  Uses both usernames/passwords and IP ranges.
  Controls access at the collection and the item level when your users are viewing your images on a CONTENTdm server.

Quality Control
  Getting the materials off to the vendor
    appropriately packed, tagged and flagged
  Getting the materials back from the vendor
    what will you check for?
    texts and photos: different things to look for

Texts And Images Of Texts
  The scan produces a file in image format, which in itself is not searchable.
  There are a number of ways to create searchable text from images of text.

Converting Images To Text
  Re-keying
    very expensive, but high-quality
    for handwritten text or foreign language fonts, you will have to create typescripts by hand
  OCR (Optical Character Recognition) is the automated way
    with correction, expensive
    without correction, lower accuracy

What is OCR?
  OCR “engines” are pattern recognition algorithms which can convert images of alphanumeric characters into machine-recognizable characters.

OCR Has Been Around Since The 1970s
  Much research to improve accuracy and extend the readable language sets.
  Very expensive in the early days.
  Available to desktop consumers in the mid-to-late 1990s.

Now There Is Decent “Pro-sumer” Desktop Software
  Such as ABBYY FineReader (this is offered as an extension of CONTENTdm).
  Service bureaus (vendors) have also developed proprietary software that can:
    get up to 90% accuracy
    handle large volumes
    use filters, formulas and multi-pass methods

The Problem With OCR
  When used on barely legible old texts, film, etc., it creates “dirty” ASCII.
  “Guesses” are saved in a string not intended for human view. (These should be cleaned up if display is important.)
  You can hide the dirty ASCII from display but allow the search engine to index on it.

To What Degree Is The Accuracy Of The OCR Important?
  This depends on the quality of the image being processed, and on the intended use of the captured text.
  A rule of thumb: high resolution and greater bit depth give more accurate OCR (and larger file sizes).

Imaging Vendor Checklist: Identifying Unacceptable Scans
  Image not correct size
  File name is incorrect
  File format is incorrect
  Loss of detail
  Too light or too dark
  Image cropped incorrectly
  Image rotated incorrectly
  Image reversed

Identifying Correct Packaging Of Digital Materials
  Object identifier
  The order of the compound object’s parts
    corresponding file names and directory structure
  Verify to CALIFA

Exercise #3
  Quality Control

Effectiveness

What Is Success?
  Best practices in the digitization process, evaluation and quality control.
  Usability testing
  As technology changes, as long as you are relying on agreed-upon standards, you will be able to go back and correct, improve and expand.

User-driven Purposes
  There are many reasons for undertaking a digitization project.
  All include improving and expanding end-user access to your materials.
  Even “preserving” the content and “conserving the originals”: it is because someday a person may need to access the resource.

Late Turn-of-the-century History
  Regular use of digitization in cultural heritage organizations such as libraries and archives.
  Leaders in the field like the California Digital Library and the Digital Library Federation documented “best practices”.

Principles, Part 1
  Leading practices proven over time:
    Scan at the highest resolution appropriate to the informational content of the originals.
    Scan at an appropriate level of quality to avoid rescanning and re-handling of the originals in the future: scan once.
    Create and store a master image file that can be used to produce derivative image files and serve a variety of current and future user needs.
    Use image file formats and compression techniques that conform to industry standards.
    Create backup copies of all files on a stable medium.

Principles, Part 2
    Create meaningful metadata for image files or collections.
    Store media in an appropriate environment.
    Monitor and recopy data as necessary.
    Outline a migration strategy for transferring data across generations of technology.
    Anticipate and plan for future technological developments.
    Scan (or have your vendor scan) at the appropriate settings for source material.
    Inspect master images at 100% magnification (all or a sample).

Local History Project Standards
  The California State Library, CALIFA and CDL:
    Partnered to create a set of standards for digital imaging and metadata, to ensure that your collections are accessible to your public and well-preserved into the future.
    Selected a digital collection management tool.
    Prepared a straightforward path for your materials from CONTENTdm to the CDL.

Let’s Revisit Our Project Plans
  And make sure we chart our course for the next steps!

Exercise #4
  Assessing and Improving Your Local History Project

Conclusion
  Please fill out your evaluation forms.
  See you in a few weeks for CONTENTdm training!