HATHITRUST A Shared Digital Repository HathiTrust: An Above Campus Solution Sarah Michalak RLUK Birmingham November 14, 2014


HATHITRUST
A Shared Digital Repository
HathiTrust:
An Above Campus Solution
Sarah Michalak
RLUK Birmingham
November 14, 2014
11/6/2015
Today’s Discussion - HathiTrust
• Mission and partnership
• Collections
• Services
• HathiTrust Research Center
• Benefits for Libraries
The Name
• The meaning behind the name
• Hathi (hah-tee): Hindi for elephant
• Never forgets
• Full of wisdom
• Secure
• Trustworthy
• Big, strong
The Mission and Partnership
Mission
To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
Efforts include, but are not limited to:
• building comprehensive collections co-owned and managed by partners.
• enabling access by users with print disabilities.
• supporting computational research with the collections.
• stimulating shared collection storage strategies among libraries.
HathiTrust Members
Allegheny College
Arizona State University
Baylor University
Boston College
Boston University
Brandeis University
Brown University
California Digital Library
Carnegie Mellon University
Colby College
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Florida State University**
Getty Research Institute
Harvard University Library
Indiana University
Iowa State University
Johns Hopkins University
Kansas State University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
McGill University
Michigan State University
Montana State University
Mount Holyoke College
New York Public Library
New York University
North Carolina Central University
North Carolina State University
Northwestern University
November 3, 2014
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Rutgers University
Stanford University
Syracuse University
Temple University
Texas A&M University
Texas Tech
Tufts University
Universidad Complutense de Madrid
University of Alabama
University of Alberta
University of Arizona
University of British Columbia
University of Calgary
University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz)
The University of Chicago
University of Connecticut
University of Delaware
University of Florida
University of Houston
University of Illinois
University of Illinois at Chicago
The University of Iowa**
University of Kansas**
University of Maine
University of Maryland**
University of Massachusetts, Amherst
University of Miami
University of Michigan
University of Minnesota**
University of Missouri**
University of Nebraska-Lincoln**
University of New Mexico
The University of North Carolina at Chapel Hill**
University of Notre Dame
University of Oklahoma
University of Pennsylvania
University of Pittsburgh
University of Queensland
University of Tennessee, Knoxville**
University of Texas
University of Utah
University of Vermont
University of Virginia**
University of Washington**
University of Wisconsin-Madison**
Utah State University**
Vanderbilt University
Virginia Tech
Wake Forest University
Washington University
Yale University Library
How Are Costs Shared?
• Public domain volumes: All partners share in infrastructure costs for each item.
• In-copyright volumes: Partners share costs based on their holdings.
• Infrastructure cost: ~$0.168 per volume per year.
• All partners pay an additional amount above costs to fund new programs and investigations.
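The cost-sharing rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HathiTrust's actual fee formula: it assumes the cost of each public domain volume is split equally among all partners, and the cost of each in-copyright volume is split equally among the partners that hold it. The function name `annual_shares` and the sample partner names are hypothetical. The ~$0.168 figure comes from the slide, and the additional amount for new programs is not modeled.

```python
from collections import defaultdict

# Approximate infrastructure cost in USD per volume per year (from the slide).
COST_PER_VOLUME = 0.168


def annual_shares(partners, public_domain_volumes, in_copyright_holdings):
    """Return {partner: annual infrastructure share in USD}.

    partners: list of partner names.
    public_domain_volumes: number of public domain volumes in the repository.
    in_copyright_holdings: one set of partner names per in-copyright volume,
        naming the partners that hold that volume.

    Assumption (not stated on the slide): costs are split equally among
    the partners responsible for a volume.
    """
    if not partners:
        raise ValueError("at least one partner is required")

    shares = defaultdict(float)

    # Public domain: every partner pays an equal share of every volume.
    pd_share = COST_PER_VOLUME * public_domain_volumes / len(partners)
    for partner in partners:
        shares[partner] += pd_share

    # In copyright: only the holders of a volume pay for it.
    for holders in in_copyright_holdings:
        if not holders:
            continue  # no holder recorded; left unallocated in this sketch
        for partner in holders:
            shares[partner] += COST_PER_VOLUME / len(holders)

    return dict(shares)


if __name__ == "__main__":
    example = annual_shares(
        partners=["Library A", "Library B", "Library C"],
        public_domain_volumes=300,
        in_copyright_holdings=[{"Library A", "Library B"}, {"Library A"}],
    )
    for name, amount in sorted(example.items()):
        print(f"{name}: ${amount:.3f} per year")
```

Under these assumptions, a partner's share grows with the number of in-copyright volumes it holds, while the public domain portion is the same for every partner.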
Collections and Access
HATHITRUST.ORG
12.5 million total volumes
6.4 million book titles
327,000 serial titles
575,889 government publications
4.6 million volumes in the public domain (~37%)
Link takes you to HathiTrust
Records loaded into DPLA, local library catalogs, and commercial databases
Collective Stewardship
• Leverage expertise across institutions
• Distributed functions and services
– Preservation repository and access services: University of Michigan (mirror site: Indiana University)
– Metadata management services: California Digital Library
– HathiTrust Research Center: Indiana University and University of Illinois
5 November 2014
Collection Sources
Michigan, 37.54%
California, 28.63%
Harvard, 6.15%
Wisconsin, 4.47%
Indiana, 4.19%
Cornell, 4.02%
Illinois (UC), 2.45%
NYPL, 2.35%
Princeton, 2.02%
PSU, 1.19%
Minnesota, 1.11%
Universidad Complutense, 0.92%
LoC, 0.87%
Keio, 0.72%
Columbia, 0.52%
Northwestern, 0.45%
Ohio State, 0.42%
Virginia, 0.41%
Chicago, 0.41%
Purdue, 0.38%
Yale, 0.19%
UNC Chapel Hill, 0.14%
Getty Research Institute, 0.13%
Massachusetts, 0.09%
Florida, 0.08%
Duke, 0.06%
Connecticut, 0.04%
Boston College, 0.03%
NC State, 0.03%
McGill, 0.01%
Texas A&M, 0.01%
Alberta, < 0.01%
Delaware, < 0.01%
Utah State, < 0.01%
Dates
0-1500, 0.04%
1500-1599, 0.07%
1600-1699, 0.01%
1700-1799, 0.01%
1800-1849, 3%
1850-1899, 10%
1900-1909, 4%
1910-1919, 4%
1920-1929, 4%
1930-1939, 4%
1940-1949, 4%
1950-1959, 6%
1960-1969, 11%
1970-1979, 13%
1980-1989, 14%
1990-1999, 14%
2000-2009, 10%
Language Distribution (1)
The top 10 languages make up ~87% of all content
English, 49%
German, 9%
French, 7%
Spanish, 5%
Chinese, 4%
Russian, 4%
Japanese, 3%
Italian, 3%
Arabic, 2%
Latin, 1%
Remaining languages, 13%
Language Distribution (2)
The next 40 languages make up ~12% of total
Portuguese, 7%
Polish, 7%
Dutch, 5%
Hebrew, 5%
Hindi, 5%
Swedish, 4%
Indonesian, 4%
Korean, 4%
Croatian, 3%
Czech, 3%
Danish, 3%
Turkish, 3%
Urdu, 3%
Thai, 3%
Norwegian, 2%
Bengali, 2%
Hungarian, 2%
Tamil, 2%
Persian, 2%
Greek, Modern (1453-), 2%
Sanskrit, 2%
Greek, Ancient (to 1453), 1%
Bulgarian, 1%
Serbian, 1%
Telugu, 1%
Multiple languages, 1%
Catalan, 1%
Malayalam, 1%
Panjabi, 1%
Finnish, 1%
Romanian, 1%
Malay, 1%
Slovak, 1%
Marathi, 1%
Armenian, 1%
Ukrainian, 1%
Vietnamese, 1%
Slovenian, 1%
Turkish, Ottoman, 1%
Yiddish, 1%
Nepali, 0%
Copyright Distribution
In copyright or undetermined, 63%
“Public domain” (categories below combined), 38%
– Public Domain Worldwide, 21%
– Public Domain (US), 11%
– US Government Documents, 5%
– Creative Commons, 0.06%
– Open Access, 0.06%
Services
Preservation with Access
• Preservation
– TRAC-certified
– Long-term commitments to preserve digital content facilitate planning, decision-making
• Discovery
– Bibliographic and full-text search of all materials
– Mechanisms for local loading of records
• Access and Use
– Full text search (all users)
– Public domain and open access works (all users)
– Collections and APIs (all users)
– Lawful uses of in-copyright works (members)
10 September, 2014 | 20
Access: Lawful uses of in-copyright works
• Sensitive to multiple legal regimes
– Full-text search (everyone everywhere)
– Access for users who have print disabilities (through member proxy in US, and where law permits)**
– Access to works that are damaged or missing and also out of print and unavailable (members in US only)
**Terms and conditions at http://www.hathitrust.org/access_use#icaccess
Collective Action: Copyright Review
• Copyright Review Management System
– Systematic manual review of copyright registrations to determine status of portions of the HathiTrust Collection
– CRMS-US: Published in US, 1923-1963
• 316,396 reviewed / 166,753 PD (~53%)
– CRMS-World: Published in UK (1874-1944), Canada, Australia (1894-1964)
• 145,804 reviewed / 75,775 PD-world (~52%)
HathiTrust Research Center
• http://www.hathitrust.org/htrc
• Operated by the University of Illinois, Urbana-Champaign and Indiana University, with additional financial support from HathiTrust.
• Co-led by Beth Plale (Indiana) and Stephen Downie (Illinois).
• Goal: enable researchers worldwide to carry out computational investigation of the HathiTrust repository.
Aims of the HTRC
• Focus on developing services to researchers
• Develop model for access: the ‘workset’
• Develop tools that facilitate research by digital humanities and informatics communities
• Develop secure cyberinfrastructure that allows computational investigation of the entire HathiTrust repository, both copyrighted and public domain
Example Projects Supported by HTRC
• Muñoz, Trevor, University of Maryland. “Distributed Metadata Correction and Annotation.”
– Correction, annotation and enhancement of HT records and export as linked data
• Page, Kevin, Oxford University. “ElEPHãT: Early English Print in HathiTrust, a Linked Semantic Workset Prototype”
– Development of secondary worksets based on both HT and the Early English Books Online Text Creation Partnership (EEBO-TCP).
• Burton, Vernon. “The South as ‘Other,’ the Southerner as ‘Stranger.’”
– Explore how attitudes expressed in print about slavery, southerners, and non-southerners have changed over both time and space.
• Ted Underwood, Associate Professor of English at the University of Illinois, Urbana-Champaign.
– Using public domain texts received from HathiTrust to explore changing relationships in literary genres from 1700-1899.
HathiTrust overall benefits to libraries
• Digital Curation
– Drive costs down
– Reduce “bibliographic indeterminacy”
– Make meaningful decisions about formats and quality
– Increase discoverability, use
– Consolidate development talent
– Improve strength of archiving
• Print Curation
– Means to associate our print holdings
– Coordinated record-keeping
• Subsidiary benefits
– Quantify problems
– Collective attention to solving shared problems
– Understanding relationship between collective and local
Benefits for UNC-Chapel Hill
• Preservation solution for UNC digitized books and journals.
• Online access to hundreds of thousands of titles we do not have in our collection.
• Live links to Hathi materials in our catalog are a convenience for users and enrich our collections.
• Hathi-led “community developments” provide tools and expertise we might not have otherwise.
• Digital humanities scholars and other researchers have the benefit of computational research over the large-scale corpus.
The HathiTrust Digital Library
Large Scale Digital Preservation and Access
For the Public Good