HATHITRUST A Shared Digital Repository HathiTrust: An Above Campus Solution Sarah Michalak RLUK Birmingham November 14, 2014
11/6/2015

Today’s Discussion - HathiTrust
• Mission and partnership
• Collections
• Services
• HathiTrust Research Center
• Benefits for Libraries

The Name
• The meaning behind the name
• Hathi (hah-tee): Hindi for elephant
• Never forgets
• Full of wisdom
• Secure
• Trustworthy
• Big, strong

The Mission and Partnership

Mission
To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
Efforts include, but are not limited to
…building comprehensive collections co-owned and managed by partners.
…enabling access by users with print disabilities.
…supporting computational research with the collections.
…stimulating shared collection storage strategies among libraries.

HathiTrust Members
Allegheny College; Arizona State University; Baylor University; Boston College; Boston University; Brandeis University; Brown University; California Digital Library; Carnegie Mellon University; Colby College; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University**; Getty Research Institute; Harvard University Library; Indiana University; Iowa State University; Johns Hopkins University; Kansas State University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; Montana State University; Mount Holyoke College; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Rutgers University; Stanford University; Syracuse University; Temple University; Texas A&M University; Texas Tech; Tufts University; Universidad Complutense de Madrid; University of Alabama; University of Alberta; University of Arizona; University of British Columbia; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Delaware; University of Florida; University of Houston; University of Illinois; University of Illinois at Chicago; The University of Iowa**; University of Kansas**; University of Maine; University of Maryland**; University of Massachusetts, Amherst; University of Miami; University of Michigan; University of Minnesota**; University of Missouri**; University of Nebraska-Lincoln**; University of New Mexico; The University of North Carolina at Chapel Hill**; University of Notre Dame; University of Oklahoma; University of Pennsylvania; University of Pittsburgh; University of Queensland; University of Tennessee, Knoxville**; University of Texas; University of Utah; University of Vermont; University of Virginia**; University of Washington**; University of Wisconsin-Madison**; Utah State University**; Vanderbilt University; Virginia Tech; Wake Forest University; Washington University; Yale University Library
November 3, 2014

How Are Costs Shared?
• Public domain volumes: All partners share in infrastructure costs for each item.
• In-copyright volumes: Partners share costs based on their holdings.
• Infrastructure cost: ~$0.168 per volume per year.
• All partners pay an additional amount above costs to fund new programs and investigations.
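
The cost-sharing rules above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the slide gives the per-volume cost (~$0.168 per year) but not the exact formula, so the equal split among all partners for public domain volumes and the equal split among holding partners for in-copyright volumes are assumptions. The function names, the partner count of 100, and the sample holdings are hypothetical, and the additional program fee is left out.

```python
# Approximate annual infrastructure cost per volume in USD (figure from the slide).
COST_PER_VOLUME = 0.168


def public_domain_share(pd_volumes, num_partners, cost=COST_PER_VOLUME):
    """One partner's annual share of public domain infrastructure costs.

    Assumption: the cost of every public domain volume is split equally
    among all partners.
    """
    return pd_volumes * cost / num_partners


def in_copyright_share(holder_counts, cost=COST_PER_VOLUME):
    """One partner's annual share of in-copyright infrastructure costs.

    Assumption: the cost of an in-copyright volume is split equally among
    the partners that hold it. holder_counts has one entry per volume the
    partner holds: the number of partners holding that same volume.
    """
    return sum(cost / holders for holders in holder_counts)


if __name__ == "__main__":
    # 4.6 million public domain volumes comes from the slides;
    # 100 partners is a round, hypothetical figure.
    pd_fee = public_domain_share(4_600_000, 100)
    # Hypothetical partner holding three in-copyright volumes that are
    # held by 1, 2, and 4 partners respectively.
    ic_fee = in_copyright_share([1, 2, 4])
    print(f"public domain share: ${pd_fee:,.2f}")
    print(f"in-copyright share:  ${ic_fee:.3f}")
```

Under these assumptions a widely held in-copyright volume costs each holder less than a rarely held one, while the public domain cost per partner falls as the partnership grows.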
Collections and Access

HATHITRUST.ORG
12.5 million total volumes
6.4 million book titles
327,000 serial titles
575,889 government publications
4.6 million volumes in the public domain (~37%)

Link takes you to HathiTrust
Records loaded into DPLA, local library catalogs, and commercial databases

Collective Stewardship
• Leverage expertise across institutions
• Distributed Functions and Services
  • Preservation repository and access services
    • University of Michigan
    • Mirror site: Indiana University
  • Metadata management services
    • California Digital Library
  • HathiTrust Research Center
    • Indiana University and University of Illinois
5 November 2014

Collection Sources
Michigan, 37.54%; California, 28.63%; Harvard, 6.15%; Wisconsin, 4.47%; Indiana, 4.19%; Cornell, 4.02%; Illinois (UC), 2.45%; NYPL, 2.35%; Princeton, 2.02%; PSU, 1.19%; Minnesota, 1.11%; Universidad Complutense, 0.92%; LoC, 0.87%; Keio, 0.72%; Columbia, 0.52%; Northwestern, 0.45%; Ohio State, 0.42%; Virginia, 0.41%; Chicago, 0.41%; Purdue, 0.38%; Yale, 0.19%; UNC Chapel Hill, 0.14%; Getty Research Institute, 0.13%; Massachusetts, 0.09%; Florida, 0.08%; Duke, 0.06%; Connecticut, 0.04%; Boston College, 0.03%; NC State, 0.03%; McGill, 0.01%; Texas A&M, 0.01%; Alberta, < 0.01%; Delaware, < 0.01%; Utah State, < 0.01%

Dates
0-1500, 0.04%
1500-1599, 0.07%
1600-1699, 0.01%
1700-1799, 0.01%
1800-1849, 3%
1850-1899, 10%
1900-1909, 4%
1910-1919, 4%
1920-1929, 4%
1930-1939, 4%
1940-1949, 4%
1950-1959, 6%
1960-1969, 11%
1970-1979, 13%
1980-1989, 14%
1990-1999, 14%
2000-2009, 10%

Language Distribution (1)
The top 10 languages make up ~87% of all content
English, 49%; German, 9%; French, 7%; Spanish, 5%; Chinese, 4%; Russian, 4%; Japanese, 3%; Italian, 3%; Arabic, 2%; Latin, 1%; Remaining Languages, 13%

Language Distribution (2)
The next 40 languages make up ~12% of total
Portuguese, 7%; Polish, 7%; Dutch, 5%; Hebrew, 5%; Hindi, 5%; Indonesian, 4%; Korean, 4%; Swedish, 4%; Croatian, 3%; Czech, 3%; Danish, 3%; Turkish, 3%; Urdu, 3%; Thai, 3%; Norwegian, 2%; Bengali, 2%; Hungarian, 2%; Tamil, 2%; Persian, 2%; Greek, Modern (1453-), 2%; Sanskrit, 2%
1% each: Greek, Ancient (to 1453); Bulgarian; Serbian; Multiple languages; Telugu; Catalan; Malayalam; Panjabi; Finnish; Romanian; Malay; Slovak; Marathi; Armenian; Ukrainian; Vietnamese; Slovenian; Yiddish; Turkish, Ottoman
Nepali, 0%

Copyright Distribution
• In Copyright or undetermined: 63%
• “Public domain”: 38%
• Public Domain Worldwide: 21%
• Public Domain (US): 11%
• US Government Documents: 5%
• Creative Commons: 0.06%
• Open Access: 0.06%

Services

Preservation with Access
• Preservation
  – TRAC-certified
  – Long-term commitments to preserve digital content facilitate planning, decision-making
• Discovery
  – Bibliographic and full-text search of all materials
  – Mechanisms for local loading of records
• Access and Use
  – Full text search (all users)
  – Public domain and open access works (all users)
  – Collections and APIs (all users)
  – Lawful uses of in-copyright works (members)
10 September, 2014

Access: Lawful uses of in-copyright works
• Sensitive to multiple legal regimes
  – Full-text search (everyone everywhere)
  – Access to users who have print disabilities (through member proxy in US, and where law permits)**
  – Access works that are damaged or missing and also out of print and unavailable (members in US only)
**Terms and conditions at http://www.hathitrust.org/access_use#icaccess

Collective Action: Copyright Review
• Copyright Review Management System
  – Systematic manual review of copyright registrations to determine status of portions of the HathiTrust Collection
  – CRMS-US: Published in US, 1923-1963
    • 316,396 reviewed / 166,753 PD (~53%)
  – CRMS-World: Published in UK (1874-1944), Canada, Australia (1894-1964)
    • 145,804 reviewed / 75,775 PD-world (~52%)
22 October 2014

HathiTrust Research Center
• http://www.hathitrust.org/htrc
• Operated by the University of Illinois, Urbana-Champaign and Indiana University, with additional financial support from HathiTrust.
• Co-led by Beth Plale (Indiana) and Stephen Downie (Illinois).
• Goal: enable researchers world-wide to carry out computational investigation of HT repository.

Aims of the HTRC
• Focus on developing services to researchers
• Develop model for access: the ‘workset’
• Develop tools that facilitate research by digital humanities and informatics communities
• Develop secure cyberinfrastructure that allows computational investigation of entire copyrighted and public domain HathiTrust repository

Example Projects Supported by HTRC
• Muñoz, Trevor, University of Maryland. “Distributed Metadata Correction and Annotation.”
  – Correction, annotation and enhancement of HT records and export as linked data
• Page, Kevin, Oxford University. “ElEPHãT: Early English Print in HathiTrust, a Linked Semantic Workset Prototype”
  – Development of secondary worksets based on both HT and the Early English Books Online Text Creation Partnership (EEBO-TCP).
• Burton, Vernon. “The South as ‘Other,’ the Southerner as ‘Stranger.’”
  – Explore how attitudes expressed in print about slavery, southerners, and non-southerners have changed over both time and space.
• Ted Underwood, Associate Professor of English at the University of Illinois, Urbana-Champaign.
  – Using public domain texts received from HathiTrust to explore changing relationships in literary genres from 1700-1899.

HathiTrust overall benefits to libraries
• Digital Curation
  – Drive costs down
  – Reduce “bibliographic indeterminacy”
  – Make meaningful decisions about formats and quality
  – Increase discoverability, use
  – Consolidate development talent
  – Improve strength of archiving
• Print Curation
  – Means to associate our print holdings
  – Coordinated record-keeping
• Subsidiary benefits
  – Quantify problems
  – Collective attention to solving shared problems
  – Understanding relationship between collective and local

Benefits for UNC-Chapel Hill
• Preservation solution for UNC digitized books and journals.
• Online access to hundreds of thousands of titles we do not have in our collection.
• Live links to Hathi materials in our catalog are a convenience for users and enrich our collections.
• Hathi-led “community developments” provide tools and expertise we might not have otherwise.
• Digital humanities scholars and other researchers have the benefit of computational research over the large-scale corpus.

The HathiTrust Digital Library
Large Scale Digital Preservation and Access
For the Public Good