Massively Digitizing UC Library Collections: Google, Microsoft, and More

Learning in Retirement

Libraries – The Intersection of Tradition and Innovation

April 10, 2008

Ivy Anderson & Heather Christenson

California Digital Library

“11th University Library,” founded 1997

Part of UC Office of the President

Two Complementary Roles

Facilitate library collaboration across the ten campuses of the UC system (e.g. shared collection development)

Distinctive services emphasizing digital stewardship, innovation in scholarly publishing, and open-access digital collections

Three Audiences

UC libraries

Broader UC community

External constituencies and the general public

Five Programs

Collection Development and Management (Licensed Content, Shared Print Collections, Mass Digitization)

Bibliographic Services (Melvyl Catalog, SFX)

Preservation (Digital Preservation Repository, Web Archiving)

Digital Special Collections (Calisphere, Online Archive of California)

Publishing Services (eScholarship Repository, eScholarship Editions, collaboration with UC Press)

Digitization of Library Collections

Special Collections: manuscripts, archival collections, photographs, etc.

Image: Berkeley, University of California, Bancroft Library, UCB 150, f. 252v

CDL / UC Libraries: Online Archive of California, Calisphere

Digitization of Library Collections

Specialized Texts and Corpora

Making of America: 10,000 texts in 10 years

CDL: eScholarship Editions

Digitization of Library Collections

Commercial Partnerships

EEBO: 100,000 important early English texts

Licensed access via ProQuest

Image: Satans stratagems, 1648. Copy from UCLA Library

…and Along Came Google

Google Library Project

2005: The “Google Five”: Harvard, Oxford, New York Public Library, Stanford, University of Michigan

2008: 20 library partners in 5 countries

Google Publisher Partner Program

…and the Open Content Alliance

October 2005

Founders: Internet Archive, University of California, U of Toronto…

Large-scale digitization of out-of-copyright works only

A project of the Internet Archive

…and Microsoft

Out-of-Copyright Works Only

UC Involvement

Founding Member of Open Content Alliance: October 2005

UC Joins Google Library Project: August 2006

Microsoft Digitization Agreement: March 2007

So: Three Projects, One Goal

Goal: Mass digitization of library book collections

Google: In-copyright and out-of-copyright works. Available via Google search engine and Google Book Search.

Microsoft: Out-of-copyright works only. Available via Microsoft Live Search.

Open Content Alliance: Out-of-copyright works only. Available (via the Internet Archive website) to any and all search engines. Library and grant-funded.

Why Are They Doing It?

Google’s vision: To put all the world’s information online

Google and Microsoft: To gain market share and competitive advantage for their search (and online advertising) services

It’s all about Search

OCA: To put the world’s information online, for free, forever 

It’s all about the public good

Why Are We Doing It?

To enhance student and faculty research

To put our collections where our users are: in Google! Mass digitization of these materials enhances access. It can make people aware of books they may not have discovered otherwise and lead them, through an internet search, back to our libraries.

To support deeper textual analysis and research. Scholars can trace the evolution of ideas and perform other sophisticated textual analysis when the full text is indexed and searchable by computer, opening scholarship in new ways.

To fulfill our public service mission. Many books of enduring general interest, including classic works of literature and more unique items such as early histories of the settlement of California and the West, can now be read by anyone, anywhere, anytime.

To preserve and protect our collections. In earthquake- and fire-prone California, digitizing books in our collections may also help protect the university from catastrophic loss should disaster someday strike our libraries.

Microsoft/OCA Project Contributors

Northern Regional Library Facility (NRLF)

Southern Regional Library Facility (SRLF)

UC Berkeley, Bancroft Library

UCLA

Google Project Contributors

Northern Regional Library Facility (NRLF) + UC Berkeley Systems

UC Santa Cruz

UC San Diego

CDL’s role, on behalf of UC

Liaison with partners

Planning & coordination

Funding

Stewardship of digital content

New services

Campuses Provide the Books

The Book Digitization Process

 A world of barcodes, logistics, loading docks, packing materials, and scanning machines!

Reasons books might get rejected (images)

Costs to the UC Libraries

Staffing (2-5 FTE at each of 5 locations)

Physical space & facilities: scanning centers (where scanning machines are housed), book processing, queue storage (book trucks)

Costs to run campus systems

CDL servers for inventory database, digital preservation

Digital files

Images

OCR: Text

OCR: Page coordinates

Metadata
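As a rough illustration of how these four kinds of files relate, here is a minimal Python sketch of one digitized book. The class names, field names, file name, and bounding-box convention are all assumptions made for this example; none of them come from the actual Google, Microsoft, or OCA file formats.

```python
from dataclasses import dataclass, field


@dataclass
class OcrWord:
    text: str
    # Assumed convention: (left, top, right, bottom) in image pixels.
    box: tuple[int, int, int, int]


@dataclass
class PageRecord:
    image_file: str                      # the scanned page image
    words: list[OcrWord] = field(default_factory=list)

    def text(self) -> str:
        """Plain OCR text: the words joined in reading order."""
        return " ".join(w.text for w in self.words)

    def find(self, term: str) -> list[tuple[int, int, int, int]]:
        """Boxes of every word matching term, e.g. for highlighting a hit."""
        wanted = term.lower()
        return [w.box for w in self.words if w.text.lower() == wanted]


@dataclass
class BookRecord:
    metadata: dict[str, str]             # descriptive metadata for the book
    pages: list[PageRecord] = field(default_factory=list)


# Hypothetical sample data, loosely based on a title shown earlier.
page = PageRecord(
    image_file="00000001.jp2",
    words=[
        OcrWord("Satans", (10, 20, 90, 40)),
        OcrWord("stratagems", (100, 20, 220, 40)),
    ],
)
book = BookRecord(metadata={"title": "Satans stratagems", "year": "1648"}, pages=[page])
```

The page coordinates are what let a viewer highlight a search hit on the page image itself, not just in the extracted text.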

What sort of books are being digitized?

American history

Humanities

Science

Cookbooks

Children’s books

East Asian & Pacific Rim collections

Where can you access the books?

Google Book Search: http://books.google.com/

Microsoft Live Search Books: http://search.live.com/results.aspx?q=&scope=books

Internet Archive: http://www.archive.org/details/university_of_california_libraries

Test version of UC Union catalog: http://melvyl-test.cdlib.org:8164/F

Copyright status is a factor

Out of copyright, pre-1923

“Orphan works,” 1923-1964

1965-present
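The three date bands can be sketched as a simple lookup on publication year. This is only an illustration of the bands listed here, not a legal determination: the function name and the returned labels are assumptions, and a real copyright review looks at much more than the year.

```python
def copyright_band(publication_year: int) -> str:
    """Map a publication year to one of the three bands listed on the slide.

    Hypothetical helper: the labels are shorthand for the bands,
    not a statement of a book's actual legal status.
    """
    if publication_year < 1923:
        return "pre-1923: out of copyright"
    if publication_year <= 1964:
        return "1923-1964: possible orphan work"
    return "1965-present"


for year in (1648, 1923, 1964, 1965):
    print(year, "->", copyright_band(year))
```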

At the frontier…

What’s ahead

Digital preservation: storage, storage, storage

Copyright determination

Print on demand

New modes of access & critical mass of digital books will transform scholarship

Full text search: new form of book discovery

Beyond search: text mining, computationally assisted research

Machines can interact with massive amounts of texts, and provide new structures
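To make "full text search" concrete, here is a minimal Python sketch of an inverted index, the basic structure behind word-level search over book text. The book IDs and sample texts are invented for illustration, and this toy version is an assumption about the general technique, not a description of how Google, Microsoft, or the Internet Archive build their indexes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(books):
    """Map each word to the set of book IDs whose text contains it."""
    index = defaultdict(set)
    for book_id, text in books.items():
        for word in tokenize(text):
            index[word].add(book_id)
    return index


def search(index, query):
    """Return the IDs of books containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR text for three digitized books.
books = {
    "book-1": "An early history of the settlement of California",
    "book-2": "A cookbook of the American West",
    "book-3": "California and the West: a history",
}
index = build_index(books)
print(sorted(search(index, "California history")))
```

Once the words are indexed this way, the same structure supports simple text mining, such as counting how many books mention a term.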

Questions?

  

Heather Christenson, CDL Mass Digitization Project Manager

Ivy Anderson, CDL Director of Collections

For more information: http://www.cdlib.org/inside/projects/massdig/