This Library Never Forgets Preservation, Cooperation, and the Making of HathiTrust Digital Library Jeremy York Project Librarian HathiTrust Digital Library Archiving 2009 May 5-8, 2009


This Library Never Forgets
Preservation, Cooperation, and the
Making of HathiTrust Digital Library
Jeremy York
Project Librarian
HathiTrust Digital Library
Archiving 2009
May 5-8, 2009
What is HathiTrust?
• California Digital Library
• Indiana University
• Michigan State University
• Northwestern University
• The Ohio State University
• Penn State University
• Purdue University
• UC Berkeley
• UC Davis
• UC Irvine
• UCLA
• UC Merced
• UC Riverside
• UC San Diego
• UC San Francisco
• UC Santa Barbara
• UC Santa Cruz
• The University of Chicago
• University of Illinois
• University of Illinois at Chicago
• The University of Iowa
• University of Michigan
• University of Minnesota
• University of Wisconsin-Madison
• University of Virginia
Current Holdings
• As of May 5
• 2,823,385 volumes
• 448,413 in the public domain (~16%)
How it came to be
University of Michigan
• Large Scale Production Environments
– JSTOR
– Making of America
– PEAK
– Humanities Text Initiative
Committee on Institutional Cooperation
• Long history of successful cooperation
• Voluntary partnership
• Build strengths of all for benefit of all
University of California
• System-wide planning
• Shared storage, cataloging
• Standards – preservation and access
University of Virginia
• Electronic Text Center 1992
• Focus on the scholar
• Innovation and Research
• User-centered orientation
Origins - Chronology
• UM in 2004
• “…U of M shall have the right to use the U of
M Digital Copy, in whole or in part at U of M's
sole discretion, as part of services offered in
cooperation with partner research libraries
such as the institutions in the Digital Library
Federation…”
Origins - Chronology
• 2007 CIC/Google Agreement
• Shared Digital Repository
• 2008 University of California and University of
Virginia join
• Launched October, 2008
Goals and Aspirations
How we are doing
Partnership
• Grow
• Voluntary/Flexible
• Stable
Governance Model
• Executive Committee
• Strategic Advisory Board
Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, University
of Michigan
• John King, Vice Provost for Academic Information, University of
Michigan
• Patricia Steele, Dean of Libraries, Indiana University
• Brad Wheeler, Chief Information Officer, Indiana University
• Paula Kaufman, University Librarian and Dean of Libraries,
University of Illinois at Urbana-Champaign
• Laine Farley, Executive Director, California Digital Library
• Brian Schottlaender, University Librarian, University of California,
San Diego Libraries
• John Wilkin, Executive Director of HathiTrust and Associate
University Librarian, Library Information Technology, University of
Michigan
Strategic Advisory Board
• Guiding hand of HathiTrust
• At least 4 members from the CIC, 3 members
from the University of California
Strategic Advisory Board
– Ed Van Gemert (Chair), Director of Libraries, University of
Wisconsin-Madison
– John Butler, Associate University Librarian for Information
Technology, University of Minnesota
– Patricia Cruse, Director, Preservation, California Digital Library
– Robin Dale, Associate University Librarian for Collections and
Library Information Systems, University of California, Santa Cruz
– R. Bruce Miller, University Librarian, University of California,
Merced
– Sarah Pritchard, University Librarian, Northwestern University
– Paul Soderdahl, Director, Library Information Technology,
University of Iowa
– John Wilkin, Executive Director, HathiTrust (ex officio)
Partnership/Cost Model
• HathiTrust funded for an initial 5-year period
(2008-2013)
• Base funding from member institutions
• 3-year review
• Constitutional Convention
– Members by September 2010
– Contribute content by March 2011
How much does it cost?
• Infrastructure Costs
• Estimate content over 5 years
• Calculate proportional cost
• Calculate average per-year cost
• < $0.15 per volume
• One-time fee (25% of yearly cost)
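The steps on this slide can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name, the inputs, and every figure below are hypothetical, and the slide does not say exactly how the proportional share is computed. Here the share is assumed to be a partner's projected volumes divided by all projected volumes, applied to each year's infrastructure cost.

```python
def partner_cost(partner_volumes, all_volumes, infrastructure_costs,
                 one_time_rate=0.25):
    """Return (average yearly cost, one-time fee) for one partner.

    Each argument is a list with one entry per year of the projection.
    The proportional rule used here is an assumption, not a documented
    HathiTrust formula.
    """
    if not (len(partner_volumes) == len(all_volumes) == len(infrastructure_costs)):
        raise ValueError("all inputs must cover the same number of years")

    # Proportional cost: the partner pays for its share of the content each year.
    yearly = [
        cost * mine / everyone
        for mine, everyone, cost in zip(partner_volumes, all_volumes,
                                        infrastructure_costs)
    ]

    # Average per-year cost over the projection period.
    average = sum(yearly) / len(yearly)

    # One-time fee, taken as a fraction (25% on the slide) of the yearly cost.
    return average, one_time_rate * average


# Hypothetical five-year projection; none of these numbers come from the talk.
EXAMPLE_PARTNER = [100_000, 200_000, 300_000, 400_000, 500_000]
EXAMPLE_ALL = [2_000_000, 4_000_000, 6_000_000, 8_000_000, 10_000_000]
EXAMPLE_COSTS = [200_000, 400_000, 600_000, 800_000, 1_000_000]

if __name__ == "__main__":
    print(partner_cost(EXAMPLE_PARTNER, EXAMPLE_ALL, EXAMPLE_COSTS))
```

Averaging over the whole period smooths the charge, so a partner whose content grows quickly pays the same amount each year rather than a steeply rising one.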
Repository and Content
• Sustainable curation of library content
• Community Building
• Support content beyond books and journals
• Grow
Sustainable Curation
• Fund repository with base funds from member
institutions
• Two active storage sites with backup
• Based on standards and best practices for
archival repositories
– OAIS
– METS/PREMIS
– Ingest Validation (GROOVE)
– Periodic fixity checks using MD5
• Rights Database
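A periodic fixity check of the kind named above can be sketched as follows. This is a minimal illustration, not HathiTrust's implementation: the function names and the manifest layout (a mapping from file path to the MD5 recorded at ingest) are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Return the paths whose current MD5 differs from the recorded one.

    `manifest` is a hypothetical dict of {path: expected_md5_hex}.
    A missing file is reported as a failure as well.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or md5_of(path) != expected.lower():
            failures.append(path)
    return failures
```

Run on a schedule, a check like this reports files that have silently changed or disappeared since ingest, so they can be restored from the other storage site or from backup.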
OAIS Reference Model: Sustainable curation of library content
[Diagram: HathiTrust components laid out against the OAIS Reference Model. Labels, in extraction order: GROOVE (JHOVE), MARC record extensions (Aleph), Rights DB, Page Turner, HathiTrust API, OAI, GeoIP DB, CNRI Handles, [Solr], Google, [OCA], In-house Conversion, GRIN, Internal Data Loading, METS/PREMIS object, TIFF G4/JPEG2000, OCR, MD5 checksums, Isilon, Site Replication, TSM, MD5 checksum validation, METS object, PNG, OCR, PDF]
Community Building
• Shared Collection Development
– Unified core collection
– Certification of volumes
Support content beyond books and journals
• Born-digital
• Native XML
• Encoded Text
Grow
Services
• Catalog
• Page Turner
• Bibliographies and Saved Collections
• Users with Print Disabilities
• Computational Research (sample datasets)
• Ability to build applications with Library content
• Large-scale Search
Upcoming Plans
• Expand partnership
• Begin work on shared collection development
and de-duplication
• Complete Data API
• Create Development Sandbox
• Configure for Computational Research
• WorldCat Local Catalog
• Prepare for TRAC
Thank you very much!
http://www.hathitrust.org
http://catalog.hathitrust.org