This Library Never Forgets Preservation, Cooperation, and the Making of HathiTrust Digital Library Jeremy York Project Librarian HathiTrust Digital Library Archiving 2009 May 5-8, 2009
What is HathiTrust?
• California Digital Library
• Indiana University
• Michigan State University
• Northwestern University
• The Ohio State University
• Penn State University
• Purdue University
• UC Berkeley
• UC Davis
• UC Irvine
• UCLA
• UC Merced
• UC Riverside
• UC San Diego
• UC San Francisco
• UC Santa Barbara
• UC Santa Cruz
• The University of Chicago
• University of Illinois
• University of Illinois at Chicago
• The University of Iowa
• University of Michigan
• University of Minnesota
• University of Wisconsin-Madison
• University of Virginia

Current Holdings
• As of May 5
• 2,823,385 volumes
• 448,413 in the public domain (~16%)

How it came to be

University of Michigan
• Large Scale Production Environments
– JSTOR
– Making of America
– PEAK
– Humanities Text Initiative

Committee on Institutional Cooperation
• Long history of successful cooperation
• Voluntary partnership
• Build strengths of all for benefit of all

University of California
• System-wide planning
• Shared storage, cataloging
• Standards – preservation and access

University of Virginia
• Electronic Text Center 1992
• Focus on the scholar
• Innovation and Research
• User-centered orientation

Origins - Chronology
• UM in 2004
• “…U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered in cooperation with partner research libraries such as the institutions in the Digital Library Federation…”

Origins - Chronology
• 2007 CIC/Google Agreement
• Shared Digital Repository
• 2008 University of California and University of Virginia join
• Launched October, 2008

Goals and Aspirations

How we are doing

Partnership
• Grow
• Voluntary/Flexible
• Stable

Governance Model
• Executive Committee
• Strategic Advisory Board

Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, University of Michigan
• John King, Vice Provost for Academic Information, University of Michigan
• Patricia Steele, Dean of Libraries, Indiana University
• Brad Wheeler, Chief Information Officer, Indiana University
• Paula Kaufman, University Librarian and Dean of Libraries, University of Illinois at Urbana-Champaign
• Laine Farley, Executive Director, California Digital Library
• Brian Schottlaender, University Librarian, University of California, San Diego Libraries
• John Wilkin, Executive Director of HathiTrust and Associate University Librarian, Library Information Technology, University of Michigan

Strategic Advisory Board
• Guiding hand of HathiTrust
• At least 4 members from the CIC, 3 members from the University of California

Strategic Advisory Board
– Ed Van Gemert (Chair), Director of Libraries, University of Wisconsin-Madison
– John Butler, Associate University Librarian for Information Technology, University of Minnesota
– Patricia Cruse, Director, Preservation, California Digital Library
– Robin Dale, Associate University Librarian for Collections and Library Information Systems, University of California, Santa Cruz
– R. Bruce Miller, University Librarian, University of California, Merced
– Sarah Pritchard, University Librarian, Northwestern University
– Paul Soderdahl, Director, Library Information Technology, University of Iowa
– John Wilkin, Executive Director, HathiTrust (ex officio)

Partnership/Cost Model
• HathiTrust funded for initial 5-year period (2008-2013)
• Base funding from member institutions
• 3-year review
• Constitutional Convention
– Members by September 2010
– Contribute content by March 2011

How much does it cost?
• Infrastructure Costs
• Estimate content over 5 years
• Calculate proportional cost
• Calculate average per-year cost
• < $0.15 per volume
• One-time fee (25% of yearly cost)

Repository and Content
• Sustainable curation of library content
• Community Building
• Support content beyond books and journals
• Grow

Sustainable Curation
• Fund repository with base funds from member institutions
• Two active storage sites with backup
• Based on standards and best practices for archival repositories
– OAIS
– METS/PREMIS
– Ingest Validation (GROOVE)
– Periodic fixity checks using MD5
• Rights Database

OAIS Reference Model

Sustainable curation of library content
[Architecture diagram; labels: GROOVE (JHOVE); MARC record extensions (Aleph); Rights DB; Page Turner; HathiTrust API; OAI; GeoIP DB; CNRI Handles; [Solr]; Google; [OCA]; In-house Conversion; GRIN; Internal Data Loading; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; Isilon Site Replication; TSM; MD5 checksum validation; METS object; PNG; OCR; PDF]

Community Building
• Shared Collection Development
– Unified core collection
– Certification of volumes

Support content beyond books and journals
• Born-digital
• Native XML
• Encoded Text

Grow

Services
• Catalog
• Page Turner
• Bibliographies and Saved Collections
• Users with Print Disabilities
• Computational Research (sample datasets)
• Ability to build applications with Library content
• Large scale Search

Upcoming Plans
• Expand partnership
• Begin work on shared collection development and de-duplication
• Complete Data API
• Create Development Sandbox
• Configure for Computational Research
• WorldCat Local Catalog
• Prepare for TRAC

Thank you very much!
[email protected] [email protected]
http://www.hathitrust.org
http://catalog.hathitrust.org
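
Note on the "Periodic fixity checks using MD5" item under Sustainable Curation: the slides do not describe how HathiTrust implements these checks, so the following is only a minimal sketch of the general technique, assuming a simple manifest that maps each stored file to the MD5 digest recorded at ingest. The function names `md5_of_file` and `check_fixity` and the manifest layout are hypothetical and are not taken from the HathiTrust codebase.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large page images do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Compare stored digests with freshly computed ones.

    `manifest` is a hypothetical dict of {file path: expected MD5 hex digest}.
    Returns the list of paths that are missing or whose digest has changed.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file():
            failures.append(str(path))
        elif md5_of_file(path) != expected.lower():
            failures.append(str(path))
    return failures
```

Run on a schedule, an empty result from `check_fixity` would indicate that every listed file still matches its recorded checksum; any returned path would be a candidate for repair from a replica or backup copy.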