Overwhelmed by Large-Scale Digitization Projects?
Xiaocan (Lucy) Wang
Digital Repository Librarian
Eric Holt
University Archivist
Cunningham Memorial Library
Indiana State University
Agenda
Project background
Implementation
Equipment
Software choices
Process
Ingestion
Workflow
Outcome
Lessons learned
Conclusion
Project Background
Indiana State University
Project Background (cont.)
ETD (electronic theses and dissertations)
ETD Digital Initiative
2010 and onward
Access
Project background (cont.)
RTD (retrospective theses and dissertations)
Number: 3,802
Where: Archives + Library basement
Condition: most in usable condition, but…
Access
Project Background (cont.)
Purposes
Centralize ETD & RTD
Improve access, search and retrieval
Support teaching, learning and research
Improve preservation
Project Background (cont.)
Considerations
Format
Copyright
Privacy
Equipment
Bookdrive DIY
Disclosure
Not currently or previously an employee of the corporations whose products I discuss
I am not compensated for my comments or opinions
Older software version being used
Capture New Book window
Capture in action
Batch entry
IrfanView
GIMP
Open-source alternative to Photoshop
Batch processing requires an additional plug-in
Supervisor unfamiliarity
Photoshop
Can record action to perform batch processing
Graphical interface for setting up the recorded action
Changing DPI
Color
Grayscale
B/W
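To see why the DPI and color-mode choices above matter at this scale, the sketch below computes pixel dimensions and uncompressed page size. It assumes a US Letter page (8.5 x 11 in) and the standard bit depths (24-bit color, 8-bit grayscale, 1-bit B/W); the page size and DPI values are illustrative, not the project's actual settings.

```python
# Sketch: how scan resolution (DPI) and color mode drive raw image size.
# Assumption: US Letter pages; DPI values are examples only.

BITS_PER_PIXEL = {
    "color": 24,      # 8 bits each for R, G, B
    "grayscale": 8,   # one 8-bit luminance channel
    "bw": 1,          # bilevel (black/white), 1 bit per pixel
}


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_in: float, height_in: float, dpi: int, mode: str) -> int:
    """Uncompressed size of one page image, before any PDF compression."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    bits = w * h * BITS_PER_PIXEL[mode]
    return (bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    for dpi in (300, 600):
        for mode in BITS_PER_PIXEL:
            size_mb = raw_size_bytes(8.5, 11, dpi, mode) / 1_000_000
            print(f"{dpi} dpi {mode:>9}: {size_mb:6.1f} MB per page")
```

Halving the DPI quarters the pixel count, and moving from color to B/W divides the raw size by 24, which is why grayscale or B/W is worth choosing wherever color carries no meaning.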
PDF Compression
All items being converted are compressed
Some formats compress better than others
Compression artifacts can also become visible
Original image of page is visible
Searchable text layer is hidden
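A toy illustration of "some formats compress better than others": the sketch compresses a synthetic clean bilevel page and a synthetic grayscale page with faint scanner noise using zlib (Deflate, the algorithm behind PDF's FlateDecode filter). Both pages are invented for the demo, and two things differ at once (bits per pixel and noise). Real PDF tools typically use specialized codecs instead, such as CCITT Group 4 or JBIG2 for bilevel images and JPEG or JPEG 2000 for grayscale and color.

```python
# Toy comparison: Deflate (zlib) on a clean 1-bit page vs. a noisy 8-bit page.
# Both "pages" are synthetic; only the relative result is meaningful.
import random
import zlib

WIDTH, HEIGHT = 2550, 3300    # letter-size page at 300 dpi
ROW_BYTES = (WIDTH + 7) // 8  # 1-bit rows are padded to whole bytes


def bilevel_page() -> bytes:
    """Mostly white 1-bit page with bands of dark 'text' pixels.
    In this sketch a set bit is white and a clear bit is black."""
    white_row = b"\xff" * ROW_BYTES
    text_row = b"\xff" * 40 + b"\x00\xff" * 100 + b"\xff" * (ROW_BYTES - 240)
    # Every third 20-row band carries "text"; the rest is blank paper.
    rows = [text_row if (y // 20) % 3 == 0 else white_row for y in range(HEIGHT)]
    return b"".join(rows)


def noisy_grayscale_page(seed: int = 0) -> bytes:
    """8-bit page of near-white paper with faint random scanner noise."""
    rng = random.Random(seed)
    # Map each random byte into the range 235..255.
    table = bytes(235 + (b % 21) for b in range(256))
    return rng.randbytes(WIDTH * HEIGHT).translate(table)


if __name__ == "__main__":
    for name, page in (("bilevel", bilevel_page()),
                       ("grayscale + noise", noisy_grayscale_page())):
        packed = zlib.compress(page, 9)
        ratio = len(page) / len(packed)
        print(f"{name:>17}: {len(page):>9,} -> {len(packed):>9,} bytes ({ratio:.0f}:1)")
```

The noise is the point: near-random pixel values leave Deflate little redundancy to exploit, while the clean bilevel page shrinks to a tiny fraction of its raw size.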
First Review
All pages present?
All text legible?
No shadows covering text?
Page in focus?
Essential color elements retained?
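The first-review questions can be recorded per volume so that failures are easy to tally and route back to the student. This is a minimal sketch; the class and field names are hypothetical and simply mirror the checklist above.

```python
from dataclasses import dataclass, fields


@dataclass
class FirstReview:
    """Answers to the first-review questions for one volume.
    Field names are hypothetical; True means the check passed."""
    all_pages_present: bool
    all_text_legible: bool
    no_shadows_over_text: bool
    pages_in_focus: bool
    essential_color_retained: bool

    def failures(self) -> list[str]:
        """Names of the checks that did not pass, in checklist order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passed(self) -> bool:
        """In this sketch, a volume advances only if every check passes."""
        return not self.failures()


review = FirstReview(True, True, False, True, True)
print(review.passed())    # False
print(review.failures())  # ['no_shadows_over_text']
```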
PDF/A
Copy saved to Archives server
Only accessible to staff
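When the PDF/A copy is saved to the Archives server, a checksum can confirm the copy is bit-for-bit identical and give later fixity checks a baseline. The sketch below assumes the server is reachable as a mounted directory; the function names and the `.sha256` sidecar file are hypothetical choices. Restricting access to staff is a matter of server permissions and is not handled here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large PDFs fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_copy(pdf: Path, archive_dir: Path) -> str:
    """Copy pdf into archive_dir, verify the copy, and return its checksum."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / pdf.name
    if dest.exists():
        raise FileExistsError(dest)  # never silently overwrite an archived copy
    shutil.copy2(pdf, dest)          # copy2 also preserves timestamps
    src_sum, dest_sum = sha256_of(pdf), sha256_of(dest)
    if src_sum != dest_sum:
        dest.unlink()
        raise OSError(f"checksum mismatch for {pdf.name}")
    # Sidecar line in the format used by `sha256sum` (hash, two spaces, name).
    (archive_dir / (pdf.name + ".sha256")).write_text(f"{dest_sum}  {pdf.name}\n")
    return dest_sum
```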
Final Review and Cleanup
Review metadata
Correct if necessary
Approve and publish
Remove original camera images, processed images, and extra copies of the PDF
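The cleanup step can be scripted once a volume is published and its archival copy is verified. This sketch assumes a hypothetical per-volume layout (`camera/`, `processed/`, `pdf/`) and defaults to a dry run that only lists what would be deleted; it refuses to run if the PDF/A file to keep is missing.

```python
from pathlib import Path

# Hypothetical working layout for one volume:
#   <volume>/camera/     original camera images
#   <volume>/processed/  batch-processed images
#   <volume>/pdf/        PDF drafts plus the PDF/A file to keep
INTERMEDIATE_DIRS = ("camera", "processed")


def cleanup_volume(volume_dir: Path, keep_pdf: str, dry_run: bool = True) -> list[Path]:
    """List (and, unless dry_run, delete) intermediate files for a
    published volume. Empty folders are left in place."""
    pdf_dir = volume_dir / "pdf"
    if not (pdf_dir / keep_pdf).is_file():
        # Safety check: never clean up if the file to keep is missing.
        raise FileNotFoundError(pdf_dir / keep_pdf)
    targets: list[Path] = []
    for sub in INTERMEDIATE_DIRS:
        folder = volume_dir / sub
        if folder.is_dir():
            targets.extend(p for p in folder.rglob("*") if p.is_file())
    targets.extend(p for p in pdf_dir.glob("*.pdf") if p.name != keep_pdf)
    targets.sort()
    if not dry_run:
        for p in targets:
            p.unlink()
    return targets
```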
Workflow
Imaging original theses or dissertations
Workflow (cont.)
Processing image files
Workflow (cont.)
Converting to PDF/A
Workflow (cont.)
Publishing on ISU IR
Outcomes
Volumes finished: 848
Average volume size: 96 pages
Average student time: 1.3 hours
Average supervisor time: 5-10 minutes
Average file size: 5.5 MB
Total disk space: 4.6 GB
Approximate cost: $15-18
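A quick arithmetic check on these figures, assuming the averages are per volume. The projection to all 3,802 RTDs is an extrapolation that assumes the remaining volumes resemble those already done; the cost figure is left out because its unit is not stated.

```python
# Back-of-the-envelope check of the outcome figures (per-volume averages assumed).
VOLUMES_DONE = 848
TOTAL_RTD = 3_802
AVG_FILE_MB = 5.5
AVG_STUDENT_HOURS = 1.3

disk_done_gb = VOLUMES_DONE * AVG_FILE_MB / 1000  # decimal GB
student_hours_done = VOLUMES_DONE * AVG_STUDENT_HOURS
remaining = TOTAL_RTD - VOLUMES_DONE

print(f"Disk used so far:      {disk_done_gb:.2f} GB")                    # 4.66 GB
print(f"Student hours so far:  {student_hours_done:,.0f}")                 # 1,102
print(f"Volumes remaining:     {remaining:,}")                             # 2,954
print(f"Projected disk (all):  {TOTAL_RTD * AVG_FILE_MB / 1000:.1f} GB")   # 20.9 GB
print(f"Projected extra hours: {remaining * AVG_STUDENT_HOURS:,.0f}")      # 3,840
```

The computed 4.66 GB is roughly consistent with the reported 4.6 GB total.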
Worth it?
Centralize
Improve access
Via:
Digital repository
Search engines
Digital repository registries
WorldCat
Worth it? (cont.)
Support teaching, learning and research
Improve preservation strategies
Multiple digital copies
Backup
Bitstream preservation
Distributed preservation network via the MetaArchive Cooperative
Lessons learned
Control quality of monochrome and grayscale images
Supervise students
Add MARC 856 field
Secure continued funding
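The MARC 856 field (Electronic Location and Access) is what lets a catalog record, and records shared onward to WorldCat, link to the digitized copy. The sketch below builds the field as one line of MarcEdit-style mnemonic text; the function name and URL are hypothetical, and the indicator defaults (first indicator 4 = HTTP, second indicator 1 = version of resource) assume the record describes the print thesis and the link points to its digitized version.

```python
from typing import Optional


def marc_856(url: str, public_note: Optional[str] = None,
             ind1: str = "4", ind2: str = "1") -> str:
    """Return an 856 field as MarcEdit-style mnemonic text.
    ind1 '4' = HTTP access; ind2 '1' = version of resource."""
    for value in (url, public_note or ""):
        if "$" in value:
            # '$' is the subfield delimiter in mnemonic text; escape it first.
            raise ValueError("'$' must be escaped before building the field")
    field = f"=856  {ind1}{ind2}$u{url}"  # $u = Uniform Resource Identifier
    if public_note:
        field += f"$z{public_note}"       # $z = public note
    return field


print(marc_856("https://example.edu/handle/0000/0000", "Digitized version"))
# =856  41$uhttps://example.edu/handle/0000/0000$zDigitized version
```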
Conclusion
Complex
Various issues
Funding
Technical standards
Quality control
Format selection
In-house vs. outsourcing
Metadata
Delivery
Preservation
Rights management
Workflow development
Contact info
Xiaocan (Lucy) Wang
[email protected]
Eric Holt
[email protected]