HTRC Workshop 101
THATCamp Gainesville
April 24, 2014
Outline
• HathiTrust and HathiTrust Research Center overview
• How to Use the HTRC Portal
– Workset Builder
– Algorithm Analysis
• Opportunities to connect you with the HathiTrust Research Center
HathiTrust “Wow” Numbers
• 11,135,776 total volumes
• 5,801,121 book titles
• 290,893 serial titles
• 3,897,521,600 pages
• 499 terabytes
• 132 miles
• 9,048 tons
• Public domain: 3,743,574 volumes (~34% of total)
http://www.hathitrust.org
Content Distribution: Dates
• 0-1500: 0%
• 1500-1599: 0%
• 1600-1699: 0%
• 1700-1799: 1%
• 1800-1849: 3%
• 1850-1899: 8%
• 1900-1909: 4%
• 1910-1919: 4%
• 1920-1929: 4%
• 1930-1939: 4%
• 1940-1949: 4%
• 1950-1959: 6%
• 1960-1969: 11%
• 1970-1979: 13%
• 1980-1989: 15%
• 1990-1999: 14%
• 2000-2009: 10%
Language Distribution
The top 10 languages make up ~86% of all content.
HathiTrust Organizational Structure
• Governance: Board of Governors, Executive Committee, Executive Director
• HathiTrust Digital Library: University of Michigan, with 90+ partners
– Data Copy #1: University of Michigan
– Data Copy #2: Indiana University
• HathiTrust Research Center: Indiana University and University of Illinois
HathiTrust Collection Builder
HTRC Portal
www.hathitrust.org/htrc
Log In to the HTRC Portal
Create a Login
How to Start a Workset
Log In Again to the Workset Builder
Workset Builder
Why Worksets?
• The result of a first-level, rough filter
• Better scale for intensive analytics
• Provides essential scope for certain analytics
– e.g., word frequency scoped to Bacon’s essays
• Some tools (are trained to) work best on a narrow, homogeneous workset
• Eliminates noise that would otherwise arise from asking questions across the whole of HathiTrust
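To make the "word frequency scope" point concrete, here is a minimal Python sketch (not HTRC's actual word-frequency algorithm) that counts words over a small workset only, rather than over all of HathiTrust. It assumes each volume's plain text is already available locally as a string keyed by a volume ID; the function name workset_word_frequencies, the volume IDs, and the sample texts are hypothetical placeholders.

```python
import re
from collections import Counter


def workset_word_frequencies(volumes: dict[str, str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count word frequencies across a small, narrowly scoped workset.

    `volumes` maps a hypothetical volume ID to that volume's plain text.
    Returns the `top_n` most frequent tokens with their counts.
    """
    counts = Counter()
    for text in volumes.values():
        # Lowercase the text and treat runs of letters/apostrophes as tokens.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical two-volume workset; IDs and text are placeholders.
    workset = {
        "vol-001": "Of Truth. What is truth? said jesting Pilate.",
        "vol-002": "Of Death. Men fear death, as children fear to go in the dark.",
    }
    for word, n in workset_word_frequencies(workset, top_n=5):
        print(word, n)
```

Because the counts come only from the volumes placed in the workset, the same function run over a different workset yields a different, equally scoped frequency list; that scoping is what the workset provides.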
Workset Search
Select Items
Create Worksets
Analysis in the HTRC Portal
Choose Algorithm
Choose Collection(s) for Analysis
Run the Analysis…
Results!
View Results
Looking into the future
• Non-consumptive research on copyrighted texts
• Bookworm tool development:
http://sandbox.htrc.illinois.edu/bookworm/
• Improvement of metadata through the Workset Creation for Scholarly Analysis (WCSA) study
• Documentation and user guides forthcoming
Acknowledgements: HTRC Team
• HTRC @ Illinois (GSLIS and the University Library):
Stephen Downie, Tim Cole, Loretta Auvil, Sayan Bhattacharyya, Boris Capitanu, Colleen Fallaw, Katrina Fenlon, Harriett Green, Peter Organisciak, Megan Senseney, Craig Willis
• Indiana University: led by Beth Plale
Get Involved!
HTRC Announcements: htrc-announce-l@list.indiana.edu
HTRC User Group: htrc-usergroup-l@list.indiana.edu
Questions?
Harriett Green
English and Digital Humanities Librarian
University of Illinois at Urbana-Champaign