HATHITRUST: A Shared Digital Repository
More, Better, Together: HathiTrust Accomplishments and Aspirations
The Researcher of Tomorrow, Universidad Complutense de Madrid, December 4, 2012
Jeremy York, Project Librarian, HathiTrust
Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.

Partnership
Arizona State University; Baylor University; Boston College; Boston University; Brandeis University; California Digital Library; Carnegie Mellon University; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Johns Hopkins University; Kansas State University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Syracuse University; Texas A&M University; Universidad Complutense de Madrid; University of Arizona; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Delaware; University of Florida; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Maryland; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Vermont; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Virginia Tech; Washington University; Yale University Library

Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
– 10.6 million total volumes
– 5.5 million book titles
– 275,000 serial titles
– 3.2 million public domain (~31%)

The Name
• The meaning behind the name
– Hathi (hah-tee), Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy

Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

[Diagram labels: HathiTrust; Universal Library; Common Goal; Single Entity, Many Partners]

Collections and Collaboration
• Comprehensive collection
– Preservation…with Access
• Shared strategies
– Copyright
– Collection management, development
– Preservation
– Discovery / Use
– Bibliographic Indeterminacy
– Efficient user services
• Public Good

More

Content Sources
[Pie chart: share of content by contributing source. Michigan 45%; California 33%; each remaining source 5% or less: Wisconsin, Cornell, NYPL, Princeton, Purdue, Penn State, Illinois, Duke, Northwestern, Columbia, NCSU, Chicago, Indiana, Utah State, Virginia, Madrid, Harvard, UNC-Chapel Hill, Yale, Minnesota, LC.]

Language Distribution (1)
[Pie chart: English 48%; German 9%; French 7%; Spanish 5%; Chinese 4%; Russian 4%; Japanese, Italian, Latin and Arabic between 1% and 3% each; remaining languages 14%.]
The top 10 languages make up ~86% of all content

Language Distribution (2)
[Pie chart: Ancient Greek, Ukrainian, Bulgarian, Panjabi, Catalan, Multiple, Malayalam, Romanian, Armenian, Telugu, Undetermined, Marathi, Malay, Greek, Vietnamese, Finnish, Slovak, Serbian, Polish, Hungarian, Sanskrit, Portuguese, Norwegian, Dutch, Music, Bengali, Tamil, Persian, Croatian, Unknown, Czech, Danish, Hebrew, Hindi, Thai, Turkish, Urdu, Korean, Swedish, Indonesian.]
The next 40 languages make up ~13% of total

Dates
0-1500       0%
1500-1599    0%
1600-1699    0%
1700-1799    1%
1800-1849    3%
1850-1899    8%
1900-1909    4%
1910-1919    4%
1920-1929    4%
1930-1939    4%
1940-1949    4%
1950-1959    6%
1960-1969   11%
1970-1979   13%
1980-1989   15%
1990-1999   14%
2000-2009   10%

[Chart: vertical axis 0% to 100%; series: Yale, Utah State, UNC-Chapel Hill, Penn State, Purdue, Northwestern, NCSU, Illinois, Duke, Chicago, Minnesota, Virginia, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL]

TDR

Preservation... with Access

1. Rights Determination
• Automated
• Manual
[Diagram labels: Manual; Bibliographic]

2. Sensitivity to diverse rights regimes
• Public domain worldwide
• Public domain in the United States
• Public domain in ....

              Opened     Total Reviewed   Percent
CRMS-US       178,872    338,463          53%
CRMS-World    15,181     30,965           49%
Total         194,053    369,428          53%

3. Permissions
• Open Access
• Preservation
• Born-digital

Copyright Distribution
In-copyright or undetermined: 69%
"Public Domain": 31%
– Public Domain (worldwide): 15%
– Public Domain (US): 11%
– U.S. Federal Government Documents (worldwide): 4%
– Open Access: .1%
– Creative Commons: .04%

4. Lawful access
• U.S.
– Print disabilities
– Damaged or lost
• Extensible

Relation to print, pricing model
• Pay by overlap
• Pay per GB
• Full view = shared
• Limited = if "held"

Holdings Database
• Single-part monographs
– OCLC #; Local system ID; Holding Status; Condition
• Multi-part monographs
– Include enumeration and chronology
• Serials
– OCLC #; Local system ID; ISSN

More
• Growing the collection
• Expanding rights review
• Expanding access
• Partner-wide and local strategies

Better
Leveraging expertise
• Preservation
• Access
• Relationships

1. Preservation
• Difficult questions
• Collective experience
• Shared resources

2. Access
• Access
– Full-text search
– Collections
– APIs
– Datasets

3. Relationships
• Relationships
– Bibliographic records
– Bib records and objects
– Digital objects
– Digital and print
• Strategies
– Metadata management
– Targeted efforts

Better
• Leveraging expertise and resources for
– Preservation
– Access
– Relationships

Together

HathiTrust Functional Framework
[Diagram; box labels in the order extracted:]
Governance; Budget, Finances; Decision-making; Policy; Enterprise Management; Repository Administration; Communication and Coordination with partner institutions; Hardware configuration and maintenance; Data management (content storage, backup, integrity checks, deletion); Project management; Planning; Web and application server configuration and maintenance; Security; Hardware selection and replacement; Content and Metadata specifications; Permissions; Rights Management; Bibliographic Data Management; Copyright determination; Entity description (record-level); Copyright review; Object identification (item-level); Copyright information management (database); Data availability

Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)

Rightsholder permissions; Disaster Recovery; Logging; Processes for ensuring content integrity; e-Commerce; Print on Demand; Content Ingest; Content Access; Quality Assurance; User Services; Transformation; PageTurner; Quality Review; Usability; Validation; Collection Builder; Content Certification; User support (helpdesk); Large-scale Search; Financial contributions of partners; Research Center; Bibliographic Catalog; APIs

Outreach; Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC)

Legal; Risk management (use of materials); Partner agreements; Advocacy

More
• Content
• Rights
• Determinations
• Permissions
• Lawful access
• Collaborate globally, plan locally

Better
• Shared expertise/resources for:
• Preservation
• Access
• Relationships

Together
?

Thank you!