NCSU Libraries
Digital Repository Projects at the North Carolina State University Libraries
James Jackson Sanborn
Jim Tuttle
Open Repositories/DSpace User Group ’07
Early Repository Planning
• Digital Repository Planning Committee
• What it wouldn’t be (at least to start)
– Distributed community structure
– Open submission
– ‘Institutional’ Repository
• What it would be (at least to start)
– Library-managed collections
– Building block for campus partnership
– Learning opportunity
Repository Building Blocks
• NCSU Electronic Theses and Dissertations
– Started 1997
– Mandatory since 2002
– Virginia Tech’s ETD-db
– ~3,000 ETDs
• NCSU Authors Database
– Started 1995
– Access database/ColdFusion front end
– ~22,000 citations
Repository Building Blocks (cont’d)
• Technical Reports Print Collection
– Campus Institutes and Departments
– Massive fall-off in print distribution
• Special Collections Research Center
– Digitized texts and photographs
– Campus Newsletters
• GIS Data
– Library-managed/acquired data collection
– Homegrown data layer database/discovery tools
Repository Plan
• Target ‘Research’ collections first
– Technical Reports
– ETDs
– Faculty Publications/Citations
• Treat each collection as its own project
• Actively pursue common technological solutions
Technical Reports
• DSpace Application
• Lightly Customized
• Library Harvested
– Local Cataloging/Metadata database
– Scripted Ingest Object Creation
– Batch Ingest
• Mix of ongoing submission by institute/departmental personnel and Library capture
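The scripted ingest step can be pictured with DSpace's Simple Archive Format, the directory layout read by DSpace's batch item importer: one sub-directory per item holding a dublin_core.xml file, a "contents" file listing the bitstreams, and the files themselves. This is a minimal sketch, not the Libraries' actual script; the record structure (a list of (element, qualifier, value) tuples plus file paths) is a hypothetical stand-in for rows from the local cataloging database.

```python
import os
import shutil
from xml.etree import ElementTree as ET


def write_dublin_core(item_dir, fields):
    """Write dublin_core.xml from (element, qualifier, value) tuples."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        # Simple Archive Format uses qualifier="none" for unqualified fields.
        dcvalue = ET.SubElement(root, "dcvalue",
                                {"element": element, "qualifier": qualifier or "none"})
        dcvalue.text = value
    ET.ElementTree(root).write(os.path.join(item_dir, "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)


def build_archive(records, archive_dir):
    """Create one item directory per record inside archive_dir.

    Each record is a hypothetical dict: {"fields": [...], "files": [paths]}.
    """
    for n, record in enumerate(records):
        item_dir = os.path.join(archive_dir, "item_%03d" % n)
        os.makedirs(item_dir)
        write_dublin_core(item_dir, record["fields"])
        names = []
        for path in record["files"]:
            shutil.copy(path, item_dir)
            names.append(os.path.basename(path))
        # The "contents" file lists the bitstreams to load, one per line.
        with open(os.path.join(item_dir, "contents"), "w") as f:
            f.write("\n".join(names) + "\n")
    return sorted(os.listdir(archive_dir))
```

The resulting archive directory would then be passed to the importer along with a target collection and a map file; the exact command-line form varies by DSpace version and is left out here.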
[Screenshot: Technical Reports]
[Screenshot: Technical Reports item detail]
Electronic Theses & Dissertations
• Partnership with Graduate School
• Hybrid System: DSpace and ETD-db
– ETD-db submission/approval/management
– Direct database extract for DSpace Ingest Object creation
– Scheduled Batch Ingest process
• DSpace Considerations/Alterations
– Metadata Mapping
– Author Browse (excluding dc.contributor.advisor)
– Various interface changes
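The metadata mapping and the author-browse change can be sketched together. The ETD-db column names below are assumptions, not the real ETD-db schema; the point is that the advisor is mapped to dc.contributor.advisor but left out of the set of fields that feeds the author browse.

```python
# Hypothetical ETD-db column names mapped to DSpace Dublin Core fields.
ETD_TO_DC = {
    "title": "dc.title",
    "author": "dc.contributor.author",
    "advisor": "dc.contributor.advisor",
    "abstract": "dc.description.abstract",
    "degree_date": "dc.date.issued",
}

# Fields that feed the author browse; dc.contributor.advisor is deliberately
# left out so that advisors do not appear as authors.
AUTHOR_BROWSE_FIELDS = ("dc.contributor.author", "dc.creator")


def map_etd(row):
    """Turn one ETD-db row (column -> value or list of values) into DC pairs."""
    pairs = []
    for column, dc_field in ETD_TO_DC.items():
        value = row.get(column)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        pairs.extend((dc_field, v) for v in values)
    return pairs


def author_browse_values(pairs):
    """Values that would be indexed for the author browse."""
    return [v for field, v in pairs if field in AUTHOR_BROWSE_FIELDS]
```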
[Screenshot: ETD-db]
[Screenshot: ETDs in DSpace]
Faculty Publications
• Built on Existing Author Database
– Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
• Re-modeled data
• Added Functionality
– OpenURL
– ‘Vita-like’ citation display
– Full-text or submission links
– Full-text stored in DSpace
• Citation metadata and file exported by script
• DSpace identifier (Handle) currently entered manually
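OpenURL links let a citation hand off to a library link resolver. Below is a minimal sketch of an OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value link for a journal article; the resolver base URL and the citation dictionary keys are placeholders, not the Faculty Publications database's actual fields.

```python
from urllib.parse import urlencode

# Placeholder link-resolver address, not a real NCSU URL.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def openurl_for_article(citation, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 KEV link for a journal article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Map hypothetical citation keys onto KEV journal-format keys.
    for key, kev in (("article_title", "rft.atitle"), ("journal", "rft.jtitle"),
                     ("issn", "rft.issn"), ("volume", "rft.volume"),
                     ("start_page", "rft.spage"), ("year", "rft.date"),
                     ("author_last", "rft.aulast")):
        if citation.get(key):
            params.append((kev, str(citation[key])))
    return base + "?" + urlencode(params)
```

Empty or missing citation fields are simply omitted, so the resolver receives only what the database actually holds.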
Faculty Publications Schematic
[Diagram. Elements shown: Scholar (submit citations and/or text); Web Submission Form; S+R Citations web interface (PHP); Oracle Faculty Publications DB (citations); Handle IDs; DSpace, Java/JSP (full text only), with PostgreSQL (metadata), File System (files) and DSpace Item Display (view full text); data sources Access, ISI, annual reports, etc.; Cataloging and Collection Management (add/edit data).]
[Screenshot: FacPubs search screen]
[Screenshot: FacPubs results]
[Screenshot: FacPubs item]
Repository Governance
• Internal
– Digital Repository Planning Committee
– Data Repository Architect
• External
– Faculty Repository Advisory Committee
– Partnerships with departments and institutes
NCGDAP: Overview
• NDIIPP: National Digital Information Infrastructure and Preservation Program
• Collaboration with the Library of Congress
• One of eight three-year projects studying long-term (50+ years) digital preservation
• Objective: engage existing state/federal geospatial data infrastructures in preservation
• Project approaches: Technical and Social
Repository Requirements
• Dim archive with possible future access
– minimal IR/access component
• Minimal repository imprint on data
– repository-agnostic ingest and export
• Simple digital curation functions
– Periodic MD5 checksum validation
– Structured metadata index
• Expected archived-data exchange
• Leverage existing investments
• Free Software with active community
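Periodic MD5 validation amounts to recomputing each file's digest and comparing it with a stored value. In this sketch the stored checksums are a plain dictionary of relative path to hex digest, which is an assumption; in practice they could come from DSpace's bitstream records or from an external database.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest without reading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, root):
    """Compare stored checksums (relative path -> hex MD5) with files on disk.

    Returns a dict of problems: path -> "missing" or "mismatch".
    """
    problems = {}
    for rel_path, expected in manifest.items():
        full = os.path.join(root, rel_path)
        if not os.path.exists(full):
            problems[rel_path] = "missing"
        elif md5_of(full) != expected.lower():
            problems[rel_path] = "mismatch"
    return problems
```

Run on a schedule, an empty result means every file still matches its recorded checksum.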
Automation:
Threat and format analysis, validation
Python wrappers for the following:
• Anti-virus – ClamAV
• Compressed files (tar, zip, gzip, bzip)
• At-risk formats
• Executable files (magic numbers)
• JHOVE validation
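Detecting executables and compressed files by magic number means reading a file's first bytes and comparing them with known signatures. The table below is a small illustrative subset (ELF and DOS/Windows executables, shebang scripts, zip, gzip, bzip2, and tar's "ustar" marker at offset 257), not the project's rule set, and it would sit alongside the ClamAV and JHOVE wrappers rather than replace them.

```python
# Leading-byte signatures ("magic numbers"): an illustrative subset only.
SIGNATURES = [
    (b"\x7fELF", ("executable", "ELF")),
    (b"MZ", ("executable", "DOS/Windows")),
    (b"#!", ("executable", "script with shebang")),
    (b"PK\x03\x04", ("compressed", "zip")),
    (b"\x1f\x8b", ("compressed", "gzip")),
    (b"BZh", ("compressed", "bzip2")),
]


def classify(path):
    """Return (category, format) for a flagged file, or None if nothing matches."""
    with open(path, "rb") as f:
        head = f.read(512)
    for magic, result in SIGNATURES:
        if head.startswith(magic):
            return result
    # POSIX/GNU tar headers carry "ustar" at byte offset 257; the slide
    # groups tar with the compressed formats, so it is labeled that way here.
    if head[257:262] == b"ustar":
        return ("compressed", "tar")
    return None
```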
Automation:
Archive package organization
• ESRI ArcGIS toolbar for selected formats
Automation:
Archive package organization
• Rule-based Python logic
– filestem
– extension relationships (multifile format validation)
– directory structure
• Manual intervention
• NOID assignment
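Two of the package-organization rules lend themselves to short examples: grouping files by filestem so that multifile formats can be checked as a unit, and assigning identifiers with NOID (the Nice Opaque Identifier tool from the California Digital Library). The sketch uses the ESRI shapefile, whose .shp, .shx and .dbf components are required (a .prj file is common but optional), and the NOID check-character algorithm. The slides do not say whether check characters were used, and the function names are hypothetical, so treat this as illustrative.

```python
import os
from collections import defaultdict

# Required components of an ESRI shapefile (.prj and others are optional).
SHAPEFILE_REQUIRED = {".shp", ".shx", ".dbf"}

# NOID "extended digits" used by the NOID check-character algorithm.
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def group_by_stem(filenames):
    """Group files that share a filestem, e.g. roads.shp / roads.dbf."""
    groups = defaultdict(set)
    for name in filenames:
        stem, ext = os.path.splitext(name)
        groups[stem].add(ext.lower())
    return dict(groups)


def missing_shapefile_parts(extensions):
    """Return required shapefile components absent from a filestem group."""
    if ".shp" not in extensions:
        return set()  # not a shapefile group; other rules would apply
    return SHAPEFILE_REQUIRED - extensions


def noid_check_char(identifier):
    """Sum of (position x ordinal) mod 29, with positions counted from 1.

    Characters outside XDIGITS (such as '/') count as ordinal 0.
    """
    total = sum(pos * (XDIGITS.index(ch) if ch in XDIGITS else 0)
                for pos, ch in enumerate(identifier, start=1))
    return XDIGITS[total % 29]
```

For example, noid_check_char("13030/xf93gt2") returns "q", giving the identifier 13030/xf93gt2q; a group such as parcels.shp plus parcels.dbf would be flagged for its missing .shx.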
Metadata:
Seed file form
• 'Transfer set' metadata captured in a 'seed file'
– communicates with the DSpace back end and generates XML used by later scripts
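A seed file can be thought of as a small XML document written once per transfer set and read back by later scripts. The element names, field names and function names below are hypothetical; the slide only says that the form talks to the DSpace back end and produces XML for downstream steps.

```python
from xml.etree import ElementTree as ET


def write_seed_file(path, transfer_set):
    """Write transfer-set metadata (a flat dict) as hypothetical seed-file XML."""
    root = ET.Element("seed")
    for name, value in transfer_set.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = str(value)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_seed_file(path):
    """Read the seed file back into a dict, as a later script might."""
    root = ET.parse(path).getroot()
    return {field.get("name"): field.text for field in root.iter("field")}
```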
Metadata:
Communities and Collections
• Search by type for 100+ communities
• Facilitates creation and reduces errors
Curation Processing
• At-risk format migration, original retained
• Agency-specific XML templates in ArcCatalog with synchronization flags
• Provenance and curation metadata scripted
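Scripted provenance for a format migration can be as simple as composing a note that names the original and the derivative, records their checksums, and states that the original was kept. DSpace's dc.description.provenance field is one natural home for such a note; the wording, the function name and the tool name in the example are assumptions.

```python
import hashlib
from datetime import datetime, timezone


def migration_provenance(original_name, original_bytes, derived_name,
                         derived_bytes, tool, when=None):
    """Compose a provenance note for an at-risk format migration.

    `when` is expected to be a timezone-aware UTC datetime; it defaults to now.
    """
    when = when or datetime.now(timezone.utc)
    return ("Migrated %s (MD5 %s) to %s (MD5 %s) using %s on %s; original retained."
            % (original_name, hashlib.md5(original_bytes).hexdigest(),
               derived_name, hashlib.md5(derived_bytes).hexdigest(),
               tool, when.strftime("%Y-%m-%dT%H:%M:%SZ")))
```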
Source Metadata Translation
• Repository-agnostic approach
• Spokes for each transformation
• Facilitates export from DSpace into other repositories
• Generate DSpace QDC and METS; populate Workflow database
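The hub-and-spoke idea is that each source record is first normalized into one neutral form (the hub) and each output format is a separate transformation (a spoke), so adding a target repository means adding a spoke rather than changing the others. This sketch shows two spokes, DSpace qualified Dublin Core and a workflow-database row; a METS spoke would plug in the same way. The record keys, row columns and function names are hypothetical.

```python
from xml.etree import ElementTree as ET


def to_dspace_qdc(record):
    """Spoke: DSpace dublin_core.xml (qualified Dublin Core) as a string."""
    root = ET.Element("dublin_core")
    for (element, qualifier), key in (
        (("title", "none"), "title"),
        (("date", "issued"), "date"),
        (("coverage", "spatial"), "place"),
        (("identifier", "other"), "noid"),
    ):
        if key in record:
            dcvalue = ET.SubElement(root, "dcvalue",
                                    {"element": element, "qualifier": qualifier})
            dcvalue.text = record[key]
    return ET.tostring(root, encoding="unicode")


def to_workflow_row(record):
    """Spoke: a flat row for the workflow database."""
    return {"noid": record.get("noid"), "title": record.get("title"),
            "agency": record.get("agency")}


# Registry of spokes; adding a target means adding one entry here.
SPOKES = {"dspace_qdc": to_dspace_qdc, "workflow_db": to_workflow_row}


def translate(record, spokes=SPOKES):
    """Run one hub record through every registered spoke."""
    return {name: spoke(record) for name, spoke in spokes.items()}
```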
Extra-repository AIP management
• Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub
• External tracking of NOID, Handle, ISO keywords, and other metadata for interaction with other systems
• Integrates with existing GIS Lookup tool
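Tracking AIPs outside the repository needs little more than a table keyed by NOID that also records the Handle and keyword terms, so that other tools can find packages without querying DSpace. The schema and function names below are assumptions, shown with SQLite for brevity; the slides do not describe the actual WMD's database engine or structure.

```python
import sqlite3

# Hypothetical schema: one row per archived package, keyed by NOID, with the
# DSpace Handle and ISO topic keywords tracked outside DSpace.
SCHEMA = """
CREATE TABLE IF NOT EXISTS aip (
    noid   TEXT PRIMARY KEY,
    handle TEXT UNIQUE,
    title  TEXT
);
CREATE TABLE IF NOT EXISTS aip_keyword (
    noid    TEXT REFERENCES aip(noid),
    keyword TEXT,
    PRIMARY KEY (noid, keyword)
);
"""


def open_wmd(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_aip(conn, noid, handle, title, keywords):
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO aip (noid, handle, title) VALUES (?, ?, ?)",
                     (noid, handle, title))
        conn.executemany("INSERT INTO aip_keyword (noid, keyword) VALUES (?, ?)",
                         [(noid, k) for k in keywords])


def handles_for_keyword(conn, keyword):
    """A lookup another system (such as a GIS lookup tool) might run."""
    rows = conn.execute(
        "SELECT a.handle FROM aip a JOIN aip_keyword k ON a.noid = k.noid "
        "WHERE k.keyword = ? ORDER BY a.handle", (keyword,))
    return [r[0] for r in rows]
```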
Repository Architecture Overview
[Diagram. Elements shown: PostgreSQL (one shared username; separate database for each app); repository Tomcat instance; Faculty Publications (PHP/DSpace hybrid); Tomcat; DSpace Internal Repository (Technical Reports, ETDs); Collections (DSpace): SCRC, Course Catalogs, Green ‘N’ Growing; NDIIPP (DSpace); SCRC (DSpace); Asset Store/ATABeast (sub-directory for each DSpace app).]
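The arrangement of one shared username, a separate database for each app, and an asset-store sub-directory for each DSpace app can be expressed as a few per-instance dspace.cfg overrides. db.url, db.username and assetstore.dir are standard DSpace 1.x configuration keys; the instance names, host and paths below are placeholders, and the mount point merely stands in for the ATABeast storage.

```python
import posixpath

# Placeholder values, not production settings.
SHARED_DB_USER = "dspace"
ASSETSTORE_ROOT = "/mnt/assetstore"  # stands in for the shared storage array


def instance_config(app, host="localhost", port=5432):
    """dspace.cfg overrides for one DSpace instance: the shared username,
    a PostgreSQL database of its own, and its own asset-store sub-directory."""
    return {
        "db.url": "jdbc:postgresql://%s:%d/%s" % (host, port, app),
        "db.username": SHARED_DB_USER,
        "assetstore.dir": posixpath.join(ASSETSTORE_ROOT, app),
    }


def as_cfg_lines(settings):
    """Render settings as 'key = value' lines in dspace.cfg style."""
    return ["%s = %s" % (key, value) for key, value in sorted(settings.items())]
```

Each instance keeps its own database and files while sharing one database login; passwords and all other keys are omitted from the sketch.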
Upcoming Repository Related Projects
• Enhancements to current system
– XTF search interface
– Inter-archive exchange
• Digital Collections Repository
– Special Collections Research Center
– Other non-faculty collections
• Data Repository
– Scientific data
– Statistical resources
For More Information:
• James Jackson Sanborn
– [email protected]
• Jim Tuttle
– [email protected]