Transcript Document
Massively Digitizing UC Library Collections: Google, Microsoft, and More
Learning in Retirement
Libraries – The Intersection of Tradition and Innovation
April 10, 2008
Ivy Anderson & Heather Christenson
California Digital Library
“11th University Library,” founded 1997
Part of UC Office of the President
Two Complementary Roles
Facilitate library collaboration across the ten campuses of the UC system (e.g. shared collection development)
Distinctive services emphasizing digital stewardship, innovation in scholarly publishing, and open-access digital collections
Three Audiences
UC libraries
Broader UC community
External constituencies and the general public
Five Programs
Collection Development and Management (Licensed Content, Shared Print Collections, Mass Digitization)
Bibliographic Services (Melvyl Catalog, SFX)
Preservation (Digital Preservation Repository, Web Archiving)
Digital Special Collections (Calisphere, Online Archive of California)
Publishing Services (eScholarship Repository, eScholarship Editions, collaboration with UC Press)
Digitization of Library Collections
Special Collections
Manuscripts, archival collections, photographs, etc.
Image: Berkeley, University of California, Bancroft Library, UCB 150, f. 252v
CDL / UC Libraries
Online Archive of California
Calisphere
Digitization of Library Collections
Specialized Texts and Corpora
Making of America: 10,000 texts in 10 years
CDL eScholarship Editions
Digitization of Library Collections
Commercial Partnerships
EEBO: 100,000 important early English texts
Licensed access via ProQuest
Image: Satans stratagems, 1648; copy from UCLA Library
…and Along Came Google
Google Library Project
2005:
The ‘Google Five:’
Harvard, Oxford, New York Public Library, Stanford, University of Michigan
2008:
20 library partners in 5 countries
Google Publisher Partner Program
…and the Open Content Alliance
October 2005
Founders: Internet Archive, University of California, U of Toronto…
Large-scale digitization of out-of-copyright works only
A project of the Internet Archive
…and Microsoft
Out-of-Copyright Works Only
UC Involvement
Founding Member of Open Content Alliance: October 2005
UC Joins Google Library Project: August 2006
Microsoft Digitization Agreement: March 2007
So: Three Projects, One Goal
Goal: Mass digitization of library book collections
Google
In-copyright and out-of-copyright works
Available via Google search engine and Google Book Search
Microsoft
Out-of-copyright works only
Available via Microsoft Live Search
Open Content Alliance
Out-of-copyright works only
Available (via the Internet Archive website) to any and all search engines
Library and grant-funded
Why Are They Doing It?
Google’s vision: To put all the world’s information online
Google and Microsoft: To gain market share and competitive advantage for their search (and online advertising) services
It’s all about Search
OCA: To put the world’s information online, for free, forever
It’s all about the public good
Why Are We Doing It?
To enhance student and faculty research
To put our collections where our users are – in Google!
Mass digitization of these materials enhances access. It can make people aware of books they may not have discovered otherwise and lead them, through an internet search, back to our libraries.
To support deeper textual analysis and research
Scholars can trace the evolution of ideas and perform other sophisticated textual analysis when the full text is indexed and searchable by computer, opening scholarship in new ways.
To fulfill our public service mission
Many books of enduring general interest – including classic works of literature and more unique items such as early histories of the settlement of California and the West – can now be read by anyone, anywhere, anytime.
To preserve and protect our collections
In earthquake- and fire-prone California, digitizing books in our collections may also help protect the university from catastrophic loss should disaster someday strike our libraries.
Microsoft/OCA Project Contributors
Northern Regional Library Facility (NRLF)
Southern Regional Library Facility (SRLF)
UC Berkeley, Bancroft Library
UCLA
Google Project Contributors
Northern Regional Library Facility (NRLF) + UC Berkeley Systems
UC Santa Cruz
UC San Diego
CDL’s role, on behalf of UC
Liaison with partners
Planning & coordination
Funding
Stewardship of digital content
New services
Campuses Provide the Books
The Book Digitization Process
A world of barcodes, logistics, loading docks, packing materials, and scanning machines!
Reasons books might get rejected (images)
Costs to the UC Libraries
Staffing (2-5 FTE at each of 5 locations)
Physical space & facilities: scanning centers (where scanning machines are housed), book processing, queue storage (book trucks)
Costs to run campus systems
CDL servers for inventory database, digital preservation
Digital files
Images
OCR - Text
OCR - Page coordinates
Metadata
What sort of books are being digitized?
American history
Humanities
Science
Cookbooks
Children’s books
East Asian & Pacific Rim collections
Where can you access the books?
Google Book Search: http://books.google.com/
Microsoft Live Search Books: http://search.live.com/results.aspx?q=&scope=books
Internet Archive: http://www.archive.org/details/university_of_california_libraries
Test version of UC Union catalog:
http://melvyl-test.cdlib.org:8164/F
Copyright status is a factor
Out of copyright, pre-1923
“Orphan works,” 1923-1964
1965 - present
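As a rough illustration only, the three date ranges on this slide can be written as a lookup by publication year. The function name and the returned labels are hypothetical, the boundaries are simply the slide's ranges, and an actual copyright determination depends on more than the year of publication.

```python
def copyright_bucket(publication_year: int) -> str:
    """Map a publication year to one of the three ranges named on the slide.

    Hypothetical helper for illustration; not a legal determination.
    """
    if publication_year < 1923:
        return "out of copyright (pre-1923)"
    if publication_year <= 1964:
        return "possible orphan work (1923-1964)"
    return "1965 - present"


# Example: bucket a few sample years.
for year in (1890, 1950, 1990):
    print(year, "->", copyright_bucket(year))
```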
At the frontier…
What’s ahead
Digital preservation – storage, storage, storage
Copyright determination
Print on demand
New modes of access & critical mass of digital books will transform scholarship
Full text search – new form of book discovery
Beyond search – text mining, computationally assisted research
Machines can interact with massive amounts of text and provide new structures
Questions?
Heather Christenson, CDL Mass Digitization Project Manager
Ivy Anderson, CDL Director of Collections
For more information: http://www.cdlib.org/inside/projects/massdig/