HathiTrust:
Building the Universal Collection
John Wilkin
18 May 2009
Presentation structure
• Quick background on where we are
• A few pieces of what’s in the hopper
• Development work underway
• New collaborative structures
• Explore HathiTrust as a vehicle for collaboration in
the realm of collections
www.hathitrust.org
Mission and Goals
• to contribute to the common good by collecting, organizing,
preserving, communicating, and sharing the record of human
knowledge
– materials converted from print
– improve access …to meet the needs of the co-owning
institutions
– reliable and accessible electronic representations
– coordinate shared storage strategies
– “public good” … free-riders.
– simultaneously …centralized …open
current members
• California Digital Library
• Indiana University
• Michigan State University
• Northwestern University
• The Ohio State University
• Penn State University
• Purdue University
• UC Berkeley
• UC Davis
• UC Irvine
• UCLA
• UC Merced
• UC Riverside
• UC San Diego
• UC San Francisco
• UC Santa Barbara
• UC Santa Cruz
• The University of Chicago
• University of Illinois
• University of Illinois at Chicago
• The University of Iowa
• University of Michigan
• University of Minnesota
• University of Wisconsin-Madison
• University of Virginia
Governance Model
• Executive Committee
• Strategic Advisory Board
• Coordinated input from groups of members
– Hathi/CIC Steering Committee
– UC library directors
Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, University
of Michigan
• Laine Farley, Executive Director, California Digital Library
• Paula Kaufman, University Librarian and Dean of Libraries,
University of Illinois at Urbana-Champaign
• John King, Vice Provost for Academic Information, University of
Michigan
• Brian Schottlaender, University Librarian, University of California,
San Diego Libraries
• Patricia Steele, Dean of Libraries, Indiana University
• Brad Wheeler, Chief Information Officer, Indiana University
• John Wilkin, Executive Director of HathiTrust and Associate
University Librarian, Library Information Technology, University of
Michigan
Strategic Advisory Board
– Ed Van Gemert (Chair), Director of Libraries, University of
Wisconsin-Madison
– John Butler, Associate University Librarian for Information
Technology, University of Minnesota
– Patricia Cruse, Director, Preservation, California Digital Library
– Robin Dale, Associate University Librarian for Collections and
Library Information Systems, University of California, Santa Cruz
– R. Bruce Miller, University Librarian, University of California,
Merced
– Sarah Pritchard, University Librarian, Northwestern University
– Paul Soderdahl, Director, Library Information Technology,
University of Iowa
– John Wilkin, Executive Director, HathiTrust (ex officio)
Preservation: OAIS Reference Model
[Diagram: HathiTrust components mapped to the OAIS reference model. Labels shown: GROOVE (JHOVE); MARC record extensions (Aleph); Rights DB; Page Turner; HathiTrust API; OAI; GeoIP DB; CNRI Handles; [Solr]; Google; [OCA]; In-house Conversion; GRIN; Internal Data Loading; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; Isilon; Site Replication; TSM; MD5 checksum validation; METS object; PNG; OCR; PDF]
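The diagram names MD5 checksums at ingest and MD5 checksum validation in storage, but the slide does not say how the check is run. As a rough illustration only, a fixity check of this kind might look like the sketch below. The function names, the manifest layout (relative path mapped to an expected hex digest), and the file names are assumptions made for the example, not HathiTrust's actual tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose MD5 differs.

    `manifest` is a hypothetical mapping of relative file path to the
    checksum recorded at ingest; an empty result means every file matched.
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(relative_path)
    return failures
```

Running `validate_manifest` on a schedule against each storage copy, and repairing any reported file from a replica, is one common way such a check is used; whether HathiTrust does it this way is not stated in the slides.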
growth trajectory
accomplishments to date
1. 25 partners
2. successful ingest and millions of vols online
3. mirroring and backup
4. rich access
5. collection builder
6. Catalog beta and WCL working group
What next?
• Data API and other strategies for increased
openness
• Internet Archive/OCA ingest followed by misc.
non-Google ingest
• Full text search over entire repository
• Extending our services through Shibboleth
• Creating research corpus
• Deeper collaborative strategies
Where next with collaboration?
• Begin sharing actual development, cf. ingest of
Internet Archive content
– Specifications
– Validation routines?
– Packaging?
• Collaboratively develop a collaborative
framework
– SAB and working group charges
Working groups?
• Security
• Collection management
• Non-Consumptive Research
• Digital preservation
• Discovery (bibliographic and full text)
• Externally-facing repository APIs
• Bibliographic metadata management
• Rights Management
Universal collection
• What is a collection?
• Bibliographic identity
• Certification (and for specific purposes)
– Object as content
– Object as artifact
Toward the Cloud Library
• Shared Print repository or repositories with all the
best attributes (service, treatment, management)
• Shared digital repository with all the best attributes
(compliance with TRAC, accessible in every sense, a
foundation for services)
• … and even some coordination between the two
• … and even (particularly for in-copyright works
where we don’t have permissions) a viewable copy in
GBS
Expectations and plans?
• How would we define our requirements for
satisfaction with each?
• What would the business model be?
• How would we build our local collections in
light of the presence of something like this?
• What would we do on the “core” or shared
collections?
Next steps for libraries
• Case study library: NYU Library
• ReCap storage facility in Princeton, NJ
• HathiTrust digital repository
• CLIR as broker and RLG Research as agent
• Futures that depend on looking beyond the
local to the shared, from the shared as “you”
to the shared as “we”
Thank you!
• http://www.hathitrust.org/
• RSS feed for updates
• [email protected]