HATHITRUST A Shared Digital Repository HathiTrust Overview MichALL Spring Meeting May 18, 2012 Jeremy York, Project Librarian, HathiTrust.
Partnership
Arizona State University; Baylor University; Boston College; Boston University; California Digital Library; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Johns Hopkins University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Texas A&M University; Universidad Complutense de Madrid; University of Arizona; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Florida; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Maryland; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Washington University; Yale University Library

Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
– 10,302,450 total volumes
– 5,462,709 book titles
– 271,014 serial titles
– 2,994,286 public domain (~29%)

The Name
• The meaning behind the name
– Hathi (hah-tee): Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy

Mission
• To
contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Collections and Collaboration
• Comprehensive collection - Preservation…with Access
• Shared strategies
– Copyright
– Collection management, development
– Preservation
– Discovery / Use
– Bibliographic Indeterminacy
– Efficient user services
• Public Good

Content

Content Distribution
• 72% (label missing from transcript)
• "Public Domain" 28%
– Public Domain (worldwide) 14%
– U.S. Federal Government Documents (worldwide) 4%
– Public Domain (US) 10%
• Open Access .1%
• Creative Commons .01%

Content Sources
[Pie chart of content by contributing institution. Legible shares: Michigan 45%, California 33%. The other contributors shown (LC, Minnesota, Yale, UNC-Chapel Hill, Harvard, Madrid, Virginia, Utah State, Indiana, Chicago, NCSU, Columbia, Northwestern, Duke, Illinois, Penn State, NYPL, Princeton, Purdue, Cornell, Wisconsin) each account for 5% or less.]

Dates
• 0-1500: 0%
• 1500-1599: 0%
• 1600-1699: 0%
• 1700-1799: 1%
• 1800-1849: 3%
• 1850-1899: 8%
• 1900-1909: 4%
• 1910-1919: 4%
• 1920-1929: 4%
• 1930-1939: 4%
• 1940-1949: 4%
• 1950-1959: 6%
• 1960-1969: 11%
• 1970-1979: 13%
• 1980-1989: 15%
• 1990-1999: 14%
• 2000-2009: 10%

Language Distribution (1)
• The top 10 languages make up ~86% of all content
– English 48%
– German 9%
– French 7%
– Spanish 5%
– Chinese 4%
– Russian 4%
– Japanese, Italian, Latin, Arabic: between 1% and 3% each (individual values not cleanly recoverable from the transcript)
• Remaining Languages 14%

Language Distribution (2)
• The next 40 languages make up ~13% of total
[Pie chart; individual percentages are not recoverable from the transcript. Languages shown: Ancient-Greek, Ukrainian, Bulgarian, Panjabi, Catalan, Multiple, Malayalam, Romanian, Armenian, Telugu, Undetermined, Marathi, Malay, Greek, Vietnamese, Finnish, Slovak, Serbian, Polish, Hungarian, Sanskrit, Portuguese, Norwegian, Dutch, Music, Bengali, Tamil, Persian, Croatian, Unknown, Czech, Danish, Hebrew, Hindi, Thai, Turkish, Urdu, Korean, Swedish, Indonesian.]

Preservation and Access

Preservation with Access
• Cost effective preservation and access services
• Preservation
– TRAC-certified
– Robust infrastructure
– Long-term commitments on digital
content facilitate planning, decision-making

Preservation with Access (2)
• Discovery
– Bibliographic and full-text search of all materials
– Extended discovery (ProQuest, EBSCO, OCLC, Ex Libris)
– Mechanisms for local loading of records

Preservation with Access (3)
• Access and Use
– Public domain and open access works
– Full download of materials where possible*
– Print on demand
– Collections and APIs
– Research Center*
– Lawful uses of in-copyright works*

Lawful uses
• Access to users who have print disabilities
• Section 108 uses of materials
• Access to orphan works

Terms of Access
• Available to students, faculty, staff of partnering institutions
– On library premises or authenticated into HathiTrust
• Partner libraries own a print copy
– One simultaneous user per print copy owned
• Users must be on U.S. soil
• One page at a time download

How do we facilitate uses?
• Fundamental issues of
– Identification
– Description
– Rights

Copyright

Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
– Public domain worldwide
• US works published before 1923, US federal government publications, non-US works published prior to 1872
– Public domain in the United States
• Non-US works published prior to 1923

Manual Rights Determination
• IMLS-funded CRMS project
– US-published works 1923-1963
– Conformance with formalities
– Expanding to non-US works
– Double-blind review with expert review for conflicts
– Staff at 4 HathiTrust partner institutions (15 will take part in non-US)
– As of February 2012 ~190,000 reviewed, more than 100,000 opened
• Rights Holder Permissions

Breakdown of HathiTrust book corpus by publication date
[Figure. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of "Rights" in Digital Collection Building, 2/2011]

Copyright status of books published pre-1923
and US works published 1923-1963
[Figure repeated across several slides; annotations on the later versions: "?", "In Print ?"]

Collection Management

A global change in the library environment
[Figure: Academic print book collection already substantially duplicated in mass digitized book corpus. Y-axis: % of Titles in Local Collection. X-axis: Rank in 2008 ARL Investment Index. Annotations: June 2009 median duplication 19%; June 2010 median duplication 31%.]

Digitized Books in Shared Repositories
[Figure: monthly series, Sep-09 to Jun-10. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M Unique Titles; ~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories.]

Collection Management, Development
• Overlap
– More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• Pricing model based on Print holdings
– Requires print holdings database
– Also support expansion of legal uses, efforts in deduplication
– Facilitate individual and collaborative collection development and management operations

Holdings relevant to Law
• Reports
• Committee Hearings
• Government Documents
• Law Reviews
• Information around court cases

How to find out more
• Web site “About” section
– http://www.hathitrust.org/about
• HathiTrust Research Center
– http://www.hathitrust.org/htrc
• Twitter
– http://twitter.com/hathitrust
• Monthly newsletter
– http://www.hathitrust.org/updates
– RSS: http://www.hathitrust.org/updates_rss
• Contact us: [email protected]
• Blogs: http://www.hathitrust.org/blogs
– Large-scale search
– Perspectives from HathiTrust

Thank you very much!
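
Supplementary sketch: Automatic Rights Determination
The date and origin rules listed on the Automatic Rights Determination slide can be read as a small decision procedure. The Python sketch below is a hypothetical illustration only. The `Work` fields, the `automatic_rights` function, and the returned labels are assumptions made for this example, not HathiTrust's actual code or rights codes. It also assumes that any work the slide's rules do not cover is simply left undetermined by the automatic pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    """Hypothetical minimal record; field names are assumptions."""
    year: int                      # publication year
    us_published: bool             # True if published in the United States
    us_federal_document: bool = False


def automatic_rights(work: Work) -> str:
    """Apply the cutoffs stated on the slide (labels are illustrative)."""
    # US federal government publications: public domain worldwide.
    if work.us_federal_document:
        return "public domain worldwide"
    if work.us_published:
        # US works published before 1923: public domain worldwide.
        if work.year < 1923:
            return "public domain worldwide"
        return "not determined automatically"
    # Non-US works published prior to 1872: public domain worldwide.
    if work.year < 1872:
        return "public domain worldwide"
    # Non-US works published prior to 1923: public domain in the US only.
    if work.year < 1923:
        return "public domain in the United States"
    return "not determined automatically"


if __name__ == "__main__":
    samples = [
        Work(year=1910, us_published=True),
        Work(year=1950, us_published=True, us_federal_document=True),
        Work(year=1900, us_published=False),
        Work(year=1860, us_published=False),
        Work(year=1940, us_published=True),
    ]
    for sample in samples:
        print(sample, "->", automatic_rights(sample))
```

Under these assumptions, a US work from 1940 falls outside the automatic rules, which is the range (US-published works 1923-1963) that the slides assign to manual review.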