HATHITRUST A Shared Digital Repository More, Better, Together: HathiTrust Accomplishments and Aspirations The Researcher of Tomorrow Universidad Complutense de Madrid December 4, 2012 Jeremy York, Project Librarian, HathiTrust

HATHITRUST
A Shared Digital Repository
More, Better, Together:
HathiTrust Accomplishments
and Aspirations
The Researcher of Tomorrow
Universidad Complutense de Madrid
December 4, 2012
Jeremy York, Project Librarian, HathiTrust
Unless otherwise noted, these slides and their contents are licensed under a Creative Commons
Attribution Unported License.
Partnership
Arizona State University
Baylor University
Boston College
Boston University
Brandeis University
California Digital Library
Carnegie Mellon University
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Florida State University
Getty Research Institute
Harvard University Library
Indiana University
Johns Hopkins University
Kansas State University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
McGill University
Michigan State University
New York Public Library
New York University
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Syracuse University
Texas A&M University
Universidad Complutense de Madrid
University of Arizona
University of Calgary
University of California
– Berkeley
– Davis
– Irvine
– Los Angeles
– Merced
– Riverside
– San Diego
– San Francisco
– Santa Barbara
– Santa Cruz
The University of Chicago
University of Connecticut
University of Delaware
University of Florida
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Maryland
University of Miami
University of Michigan
University of Minnesota
University of Missouri
University of Nebraska-Lincoln
The University of North Carolina at Chapel Hill
University of Notre Dame
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Vermont
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Virginia Tech
Washington University
Yale University Library
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal
content
– 10.6 million total volumes
– 5.5 million book titles
– 275,000 serial titles
– 3.2 million public domain (~31%)
The Name
• The meaning behind the name
– Hathi (hah-tee)--Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy
Mission
• To contribute to the common good by collecting,
organizing, preserving, communicating, and
sharing the record of human knowledge
HathiTrust
Universal Library
Common Goal
Single Entity, Many Partners
Collections and Collaboration
• Comprehensive collection
– Preservation…with Access
• Shared strategies
– Copyright
– Collection management, development
– Preservation
– Discovery / Use
– Bibliographic Indeterminacy
– Efficient user services
• Public Good
More
Content Sources
[Pie chart of content by contributing institution: Michigan 45%, California 33%; the other labeled sources (Cornell, Wisconsin, Harvard, Illinois, Penn State, NYPL, Princeton, Purdue, Columbia, Northwestern, Duke, Chicago, Indiana, NCSU, LC, Minnesota, Yale, UNC-Chapel Hill, Madrid, Virginia, Utah State) each contribute 5% or less]
Language Distribution (1)
• English: 48%
• German: 9%
• French: 7%
• Spanish: 5%
• Chinese: 4%
• Russian: 4%
• Japanese: 3%
• Arabic, Latin, Italian: 1% to 3% each
• Remaining languages: 14%
The top 10 languages make up ~86% of all content
Language Distribution (2)
The next 40 languages make up ~13% of total
[Pie chart of the next 40 language categories: Ancient Greek, Armenian, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Korean, Malay, Malayalam, Marathi, Multiple, Music, Norwegian, Panjabi, Persian, Polish, Portuguese, Romanian, Sanskrit, Serbian, Slovak, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Undetermined, Unknown, Urdu, Vietnamese]
Dates
• 0-1500: 0%
• 1500-1599: 0%
• 1600-1699: 0%
• 1700-1799: 1%
• 1800-1849: 3%
• 1850-1899: 8%
• 1900-1909: 4%
• 1910-1919: 4%
• 1920-1929: 4%
• 1930-1939: 4%
• 1940-1949: 4%
• 1950-1959: 6%
• 1960-1969: 11%
• 1970-1979: 13%
• 1980-1989: 15%
• 1990-1999: 14%
• 2000-2009: 10%
[Chart with a 0% to 100% axis; labels: LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Madrid, Virginia, Minnesota, Chicago, Duke, Illinois, NCSU, Northwestern, Purdue, Penn State, UNC-Chapel Hill, Utah State, Yale, TDR]
Preservation...with Access
1. Rights Determination
• Automated
• Manual
Manual
Bibliographic
2. Sensitivity to diverse rights regimes
• Public domain worldwide
• Public domain in the United States
• Public domain in ....
              Opened     Reviewed   Percent
CRMS-US       178,872    338,463    53%
CRMS-World     15,181     30,965    49%
Total         194,053    369,428    53%
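The review figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes the three numeric columns are volumes opened, volumes reviewed, and percent opened, and that the rows are CRMS-US, CRMS-World, and Total; the variable and function names are illustrative and not part of any HathiTrust tool.

```python
# Review counts as reported on the slide: (opened, reviewed) per CRMS project.
crms_counts = {
    "CRMS-US": (178_872, 338_463),
    "CRMS-World": (15_181, 30_965),
}


def percent_opened(opened: int, reviewed: int) -> float:
    """Share of reviewed volumes that were opened, as a percentage."""
    return 100.0 * opened / reviewed


# Sum both projects to reproduce the Total row.
total_opened = sum(opened for opened, _ in crms_counts.values())
total_reviewed = sum(reviewed for _, reviewed in crms_counts.values())

for name, (opened, reviewed) in crms_counts.items():
    print(f"{name}: {percent_opened(opened, reviewed):.1f}% opened")

print(
    f"Total: {total_opened:,} of {total_reviewed:,} "
    f"({percent_opened(total_opened, total_reviewed):.1f}% opened)"
)
```

Under these assumptions the two project rows sum to the reported totals, and each computed percentage rounds to the whole-number value shown on the slide.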
3. Permissions
• Open Access
• Preservation
• Born-digital
Copyright Distribution
• In-copyright or undetermined: 69%
• "Public Domain": 31%
– Public Domain (worldwide): 15%
– Public Domain (US): 11%
– U.S. Federal Government Documents (worldwide): 4%
– Open Access: .1%
– Creative Commons: .04%
4. Lawful access
• U.S.
– Print disabilities
– Damaged or lost
• Extensible
Relation to print, pricing model
Pay by overlap
Pay per GB
Full view = shared
Limited = if “held”
Holdings Database
• Single-part monographs
– OCLC #; Local system ID; Holding Status; Condition
• Multi-part monographs
– Include enumeration and chronology
• Serials
– OCLC #; Local system ID; ISSN
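As a rough illustration, the three holdings record types listed above could be modeled as follows. This is a minimal sketch, not HathiTrust's actual submission format: the class names, field names, and example values are hypothetical, and it assumes that multi-part monograph records carry the same fields as single-part records plus enumeration and chronology.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class SinglePartMonograph:
    oclc_number: str                      # "OCLC #" on the slide
    local_system_id: str                  # "Local system ID"
    holding_status: Optional[str] = None  # "Holding Status" (allowed values not given)
    condition: Optional[str] = None       # "Condition" (allowed values not given)


@dataclass(frozen=True)
class MultiPartMonograph(SinglePartMonograph):
    # Assumption: same fields as a single-part record, plus enumeration/chronology.
    enumeration_chronology: Optional[str] = None


@dataclass(frozen=True)
class Serial:
    oclc_number: str
    local_system_id: str
    issn: Optional[str] = None


# Hypothetical example values, for illustration only.
volume = MultiPartMonograph(
    oclc_number="12345678",
    local_system_id="local-0001",
    holding_status="held",
    condition=None,
    enumeration_chronology="v.2 (1950)",
)
journal = Serial(oclc_number="87654321", local_system_id="local-0002", issn="1234-5679")

print(asdict(volume))
print(asdict(journal))
```

Keeping the OCLC number and local system ID on every record type mirrors the slide, where those two identifiers are the fields common to all three lists.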
More
• Growing the collection
• Expanding rights review
• Expanding access
• Partner-wide and local strategies
Better
Leveraging expertise
• Preservation
• Access
• Relationships
1. Preservation
• Difficult questions
• Collective experience
• Shared resources
2. Access
• Access
– Full-text search
– Collections
– APIs
– Datasets
3. Relationships
• Relationships
– Bibliographic records
– Bib records and objects
– Digital objects
– Digital and print
• Strategies
– Metadata management
– Targeted efforts
Better
• Leveraging expertise and resources for
– Preservation
– Access
– Relationships
Together
Governance
Budget, Finances
Decision-making
Policy
Enterprise Management
Repository Administration
Communication and Coordination with partner institutions
Hardware configuration and maintenance
Data management (content storage, backup, integrity checks, deletion)
Project management
Planning
Web and application server configuration and maintenance
Security
Hardware selection and replacement
Content and Metadata specifications
Permissions
Rights Management
Bibliographic Data Management
Copyright determination
Entity description (record-level)
Copyright review
Object identification (item-level)
Copyright information management (database)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Rightsholder permissions
Disaster Recovery
Logging
Processes for ensuring content integrity
e-Commerce
Print on Demand
Content Ingest
Content Access
Quality Assurance
User Services
Transformation
PageTurner
Quality Review
Usability
Validation
Collection Builder
Content Certification
User support (helpdesk)
Large-scale Search
Financial contributions of partners
Research Center
Bibliographic Catalog
APIs
HathiTrust Functional Framework
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
More
• Content
• Rights
– Determinations
– Permissions
– Lawful access
• Collaborate globally, plan locally
Better
• Shared expertise/resources for:
– Preservation
– Access
– Relationships
Together
?
Thank you!