UC Libraries and the Implications of Mass Digitization Robin L. Chandler User’s Council May 11, 2007


UC Libraries and the Implications
of Mass Digitization
Robin L. Chandler
User’s Council
May 11, 2007
Seek to achieve in this talk:
• Status report on UC Libraries’ mass
digitization projects
• Impact of mass digitization on our
collections and our users
UC Libraries’
Mass Digitization Projects
• Overview of two projects
– Microsoft/Internet Archive
– Google Books
• Look at Operations
• April 2007 status report on scanning
Understanding Participant Roles
• UC Libraries
– supply & curate books
– preserve digital files created
– supply onsite scanning facilities when
appropriate
• Third-parties (Google, Microsoft, Yahoo)
– provide funding for book scanning
– manage digitization vendor
Microsoft / Internet Archive
• Production scanning began April 2006
• Internet Archive: Digitization Agent
• Projected scope 100 K books (public
domain) per year
– Scanning books from all campus libraries
• Scanning Centers (20 scanning machines)
– Location: UC at NRLF and SRLF
Google
• Production scanning began October 2006
– Scanning books from NRLF currently
• Projected Scope
– 2.5 million books during 6 year period
– Public domain / in-copyright
• Scanning Center
– Books transported to offsite scanning facility
– Over 3K books per day
Workflow Steps (1)
• #1 Project management
• #2 Select, retrieve, inspect, mass charge / physical
charge, physical transfer
• #3 Sharing bibliographic records (over 3K daily)
• #4 Digitization: creating content files & metadata
– JPEG 2000, PDF, OCR
– Metadata created during scanning including
image coordinates
Workflow Steps (2)
• #5 Mass discharge / manual charge; books
returned to shelves
• #6 Quality control on digital files prior to
ingest
• #7 Ingest of metadata and content files for
preservation storage
• #8 Enhance union and local catalog records
with link to hosted content
Motives: UC Libraries exploring models
• Collection Management: Digital reformatting can
help support our efforts to build shared print
collections
• Curating through Collaboration: Digitization of
local materials creates access (for our patrons) to
third-party materials not currently available
• Funding Reallocation: Funds invested in licensing
online collections of out of copyright materials
could be reallocated to digital reformatting our
unique content
Mass Digitization Collection
Advisory Group (MDCAG)
• Approved by University Librarians
• First meeting March 2007
• Charge:
– Develop process for selection of book
collections for scanning from across UC
Libraries
• Collection Development Committee (CDC)
will approve collection selections
April 2007 Status Report
• Google
– 249,485 books transferred
– 235,633 books scanned
– 11,320 books rejected
– 55,264 books live
• Microsoft/Internet Archive
– 84,315 books transferred
– 58,543 books scanned / books live
– 25,772 books rejected
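
The April 2007 counts can be turned into rates with a few lines of arithmetic. The sketch below is illustrative only: the function name and the "unaccounted" label are assumptions, and the slide does not say what state the remaining Google books are in (for example, still in process).

```python
def scan_summary(transferred, scanned, rejected):
    """Summarize one project's counts from the April 2007 status slide.

    "unaccounted" is a label chosen for this sketch: books transferred
    but reported neither as scanned nor as rejected.
    """
    return {
        "scanned_pct": round(100 * scanned / transferred, 1),
        "rejected_pct": round(100 * rejected / transferred, 1),
        "unaccounted": transferred - scanned - rejected,
    }


# Figures copied from the status report slide.
google = scan_summary(transferred=249_485, scanned=235_633, rejected=11_320)
ms_ia = scan_summary(transferred=84_315, scanned=58_543, rejected=25_772)

print("Google:", google)
print("Microsoft/Internet Archive:", ms_ia)
```

On these figures, about 94.4% of the books transferred to Google had been scanned and about 4.5% rejected, with 2,532 in neither category. For Microsoft/Internet Archive, scanned plus rejected equals the number transferred, and the rejection rate is about 30.6%.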
Success due to our Systemwide
collaboration!
• UCB & UCLA Libraries / Northern and Southern
Regional Library Facility teams
• UC Library systemwide groups: ULs, SOPAG,
CDC, PAG, HOPS, Bibliographer Groups
• Mass Digitization Collection Advisory Group
(MDCAG)
• CDL Programs: Bibliographic Services,
Collections, Data Acquisitions, Digital
Preservation Repository
Microsoft: Sample Book
Internet Archive: Sample Book
Google: Sample Book
Impacts of Mass Dig
• Will we redefine our collections?
• How should we make collections available
to our users?
Mass Dig: Collections & Users:
• All Libraries can be bigger than before
– Leveraging the collections of other libraries to bring content to our
users
• Leveraging our collections à la the Long Tail
– Libraries can learn from Netflix
• Digitize local content – we all have special stuff!
– Unique holdings support specialized disciplines
• Prepare: demand for the physical item may increase
– Digital access may increase relevance of analog
• Book discovery increasingly happens outside the library
– Information discovery (Google, MSN, Yahoo!)
– Bibliographic discovery (Amazon)
Our Users Today
• Faculty, Graduates and
Undergrads
• Working in range of
disciplines
• Seeking efficiencies
• Define their tool space
• Resource needs are diverse
– Can vary day by day
• They judge a resource’s worth
Dawn of the Embedded Library (1)
• Web services embed library content into the browsing
experience of users
– Enable discovery, locate, request, and delivery
– Library content must be exposed to aggregators
• Examples: LibraryThing, NCSU’s CatalogWS, LibX
Firefox, Google Book Search
– Integrating web services for users and customizing software
– Leveraging Catalog, OpenURL, COinS, APIs, etc.
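
As one illustration of embedding library data in a web page, the sketch below builds a COinS span, which carries an OpenURL ContextObject in the title attribute of an empty HTML span so that tools such as LibX can detect it. The function name and the sample book are assumptions made for this example, not part of the talk.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, author, year, isbn=None):
    """Return a COinS <span> describing a book (illustrative helper)."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),                    # OpenURL 1.0 version tag
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # book metadata format
        ("rft.genre", "book"),
        ("rft.btitle", title),
        ("rft.au", author),
        ("rft.date", str(year)),
    ]
    if isbn:
        pairs.append(("rft.isbn", isbn))
    # urlencode percent-encodes the values; escape() then converts "&" to
    # "&amp;" so the string is safe inside an HTML attribute.
    return '<span class="Z3988" title="%s"></span>' % escape(urlencode(pairs))


# Sample public-domain title, chosen only for illustration.
span = coins_span("Moby Dick", "Herman Melville", 1851)
print(span)
```

A browser extension that recognizes the Z3988 class can read the span and offer a link to the user's own library resolver, which is the "locate" and "request" step described above.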
Dawn of the Embedded Library (2)
• Providing user services
– Find in a library, POD, download to mobile
devices, ILL, order from Amazon, etc.
• Expose our content to aggregators and
consume the data of others
– OAI-PMH, SRU, Google Sitemap, OpenSearch,
RSS feeds, mobile device searching
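
To make "consume the data of others" concrete, the sketch below builds OAI-PMH ListRecords request URLs of the kind a harvester would send. The base URL, set name, and token value are hypothetical placeholders. The verb and argument names follow the OAI-PMH 2.0 protocol, in which a resumptionToken is sent without other arguments besides the verb.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.edu/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token:
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


first_page = list_records_url(BASE_URL, set_spec="books", from_date="2007-04-01")
next_page = list_records_url(BASE_URL, resumption_token="abc123")
print(first_page)
print(next_page)
```

A harvester would fetch the first URL, read any resumptionToken element in the XML response, and keep requesting follow-up pages until no token is returned.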
LibraryThing: Catalog Your Books
Online – social bookmarking
NCSU’s CatalogWS
LibX: Providing direct access to
your library’s resources
Mass Dig & New Library Services
• What systems are required to extract meaning
from massive text collections?
– Machine translation, data mining, etc.
• What new modes of reading, representation and
understanding are needed to interact with texts?
– Linguistic, visual, and statistical processing
• What collaborations between librarians, computer
scientists and scholars are needed to do this
exploration?
– Standards, search queries, visualization, social
networks
Epilogue: Mr. Peabody’s
WABAC (wayback) Machine
• 1992 Conference on “Technology, Scholarship and the
Humanities: The Implications of Electronic Information”
asked certain questions:
• Will scholarship be better if it takes advantage of
technology?
• How will technology affect
– The book?
– The lecture?
– The library?
– The classroom?
1992: Historical Context
• Cold War formally ended & US lifted trade sanctions
against China
• Bill Clinton was elected U.S. President
• Four police officers were acquitted in Rodney King Trial
• Johnny Carson left the Tonight Show
• Earth Summit held in Rio de Janeiro
• CD sales surpassed cassette tapes
• OPACs and Gopher were in the library and a text-based
web browser was first made available to the public…
Technology, Scholarship, & Humanities
Conference: Viewpoints (1)
• Richard A. Lanham, Professor of English, UCLA
“As traditionally taught, each class exists in a temporal, conceptual
and social vacuum…but if an electronic library were
employed…students could read papers submitted in earlier classes,
read scholarly articles on the same topics, read before-and-after
examples of revised work, do searches of Shakespearean texts for
imagery or rhetorical figuration, and make excerpts of videotaped
performances to illustrate their papers – all without going to the
campus library. Most importantly, a course like this would have a
history and could be accessed by people in other courses; it would
constitute a continuing society, its students becoming citizens of a
commonwealth”
Technology, Scholarship, & Humanities
Conference: Viewpoints (2)
• William Y. Arms, VP Computing Services
Carnegie Mellon
“The scientific community has long-funded its capital-intensive
projects with support from government and industry. In contrast, only
2 percent of humanities research funding comes from the U.S.
government. As a result, the humanities can undertake few large,
interdisciplinary projects unless the government and other funding
agencies perceive the outcome to benefit the entire academic
community…”
Thank you
• Please feel free to contact me at
[email protected]