HATHITRUST A Shared Digital Repository HathiTrust: On TRAC ICPSR Applied Data Science Repository Requirements and Assessment: HathiTrust July 26, 2012 Jeremy York, Project Librarian, HathiTrust.
HATHITRUST A Shared Digital Repository
HathiTrust: On TRAC
ICPSR Applied Data Science Repository Requirements and Assessment: HathiTrust July 26, 2012 Jeremy York, Project Librarian, HathiTrust
Partnership
Arizona State University; Baylor University; Boston College; Boston University; California Digital Library; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Johns Hopkins University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Texas A&M University; Universidad Complutense de Madrid; University of Arizona; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Florida; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Maryland; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Washington University; Yale University Library
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
– 10.4 million volumes
– 5.5 million book titles
– 270,000+ serial titles
– 3.1 million public domain volumes (~30%)
Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
HathiTrust
• Universal Library
• Common Goal
• Single Entity, Many Partners
Collections and Collaboration
• Comprehensive collection
• Preservation…with Access
• Shared strategies
Public Good
Services
• Long-term preservation
– Bit-level and migration
• Bibliographic search
• Full-text search
• Reading and download capabilities
• Print on demand
• Collections
• Datasets, Research Center
Governance
• 12-member Board of Governors
– April 2012
• Manages budget and finances
• Budget separately held within the University of Michigan
• Strategic Advisory Board
• Working Groups and Committees
CRL Audit
• Why
– Value Community Standards
– Accountability, Openness, Transparency
• Desire to know how we were doing, and let the community know
What is TRAC
• Trusted Digital Repositories (OCLC, RLG) 2002
– A framework of attributes and responsibilities
– One of the recommended items was a process for certifying digital repositories
• TRAC (RLG, NARA) 2007
– CRL, nestor, DCC, National Library of Australia
• Administered by CRL in the US
CRL Audit (2)
• Guided by criteria included in TRAC, as well as other metrics developed by CRL
• HathiTrust’s practices are sound…appropriate to the content being archived and the general needs of the CRL community.
What was involved?
• Timeline
– Data gathering: November 2009 - December 2010
– Site visit: May 2010
• Logistics
– Questions by email, documentation
– Phone conversations
– Staff: Project Librarian, Digital Preservation Librarian, Executive Director
Where we were
• Developmental stages
– Changing, growing
• Core pieces in place
Results
• Organizational Infrastructure (2)
– Mission statement, succession plan, staff, assessment, accountability, business plan, agreements
• Digital Object Management (3)
– Properties preserved, SIP, AIP, validation, naming conventions, identifiers, understandability, preservation strategies, logging, access policies
• Technologies, Technical Infrastructure, Security (4)
– Hardware, software, error-handling, change management, security, staff roles, disaster preparedness
Key Issues
• Staff/Organization
• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen quality assurance and print archiving components of HathiTrust program
Executive Committee – Budget/Finances, Decision-making
Strategic Advisory Board – Guidance on Policy, Planning
Collective Work: Working Groups and Committees
• Operational
• Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
HathiTrust Functional Framework
• Governance
– Budget, Finances
– Decision-making
– Policy
– Planning
• e-Commerce
– Print on Demand
– Financial contributions of partners
• Enterprise Management
– Communication and Coordination with partner institutions
– Project management
• Content Ingest
– Transformation
– Validation
• Repository Administration
– Hardware configuration and maintenance
– Web and application server configuration and maintenance
– Security
– Permissions
– Logging
• Content Access
– PageTurner
– Collection Builder
– Large-scale Search
– Research Center
– Bibliographic Catalog
– APIs
• Repository Administration
– Data management (content storage, backup, integrity checks, deletion)
– Hardware selection and replacement
– Content and Metadata specifications
– Disaster Recovery
– Processes for ensuring content integrity
• Quality Assurance
– Quality Review
– Content Certification
• Rights Management
– Copyright determination
– Copyright review
– Copyright information management (database)
– Rightsholder permissions
• User Services
– Usability
– User support (helpdesk)
• Bibliographic Data Management
– Entity description (record-level)
– Object identification (item-level)
– Data availability
• Outreach
– Project website
– Monthly newsletter
– Papers and presentations
– Communication with potential partners
– Surveys, general inquiries
– Repository evaluation and audit (e.g., DRAMBORA, TRAC)
• Collection Development
– Digital: expansion beyond books and journals (born-digital, images and maps, audio)
– Digital: selection of content (for non-Google volume ingest and pilot projects)
– Print: Cloud Library (effect of digital on print)
• Legal
– Risk management (use of materials)
– Partner agreements
– Advocacy
Key Issues (2)
• Rights and ownership of HathiTrust enterprise assets
• Succession plan
• Clarify and strengthen quality assurance and print archiving components of HathiTrust program
Future Work
• Disaster Recovery
• Change Management
– Moving to new formats: image, audio, born-digital
• Governance
• Certification updates