NCSU Libraries
Digital Repository Projects at the North Carolina State University Libraries
James Jackson Sanborn
Jim Tuttle
Open Repositories/DSpace User Group ‘07

Early Repository Planning
• Digital Repository Planning Committee
• What it wouldn’t be (at least to start)
  – Distributed community structure
  – Open submission
  – ‘Institutional’ Repository
• What it would be (at least to start)
  – Library-managed collections
  – Building block for campus partnership
  – Learning opportunity

Repository Building Blocks
• NCSU Electronic Theses and Dissertations
  – Started 1997
  – Mandatory since 2002
  – Virginia Tech’s ETD-db
  – ~3,000 ETDs
• NCSU Authors Database
  – Started 1995
  – Access database with ColdFusion front end
  – ~22,000 citations

Repository Building Blocks (cont’d)
• Technical Reports Print Collection
  – Campus Institutes and Departments
  – Massive fall-off in print distribution
• Special Collections Research Center
  – Digitized texts and photographs
  – Campus Newsletters
• GIS Data
  – Library-managed/acquired data collection
  – Homegrown data layer database/discovery tools

Repository Plan
• Target ‘Research’ collections first
  – Technical Reports
  – ETDs
  – Faculty Publications/Citations
• Treat each collection as its own project
• Actively pursue common technological solutions

Technical Reports
• DSpace Application
• Lightly Customized
• Library Harvested
  – Local Cataloging/Metadata database
  – Scripted Ingest Object Creation
  – Batch Ingest
• Mix of ongoing submission by institute/departmental personnel and Library capture
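The Technical Reports slide mentions scripted ingest object creation followed by batch ingest, but does not say which package layout the scripts produced. As a minimal sketch, assuming DSpace's Simple Archive Format (one directory per item holding a `dublin_core.xml` file, a `contents` file listing the bitstreams, and the files themselves), a script turning one catalog record into such a directory might look like this. The `build_item` helper and the record fields are hypothetical.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_item(archive_dir, item_name, metadata, files):
    """Write one item directory in DSpace Simple Archive Format.

    metadata: iterable of (element, qualifier, value); qualifier may be None.
    files: paths of bitstreams to copy into the item directory.
    Returns the Path of the new item directory.
    """
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True)  # fail loudly if the item already exists

    # dublin_core.xml holds one <dcvalue> element per metadata value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        node = ET.SubElement(root, "dcvalue", element=element,
                             qualifier=qualifier or "none")
        node.text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)

    # contents lists one bitstream file name per line.
    names = []
    for f in files:
        src = Path(f)
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a scanned report.
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "report.pdf"
        report.write_bytes(b"%PDF-1.4 placeholder")
        item = build_item(Path(tmp) / "archive", "item_000",
                          [("title", None, "Example Technical Report"),
                           ("date", "issued", "2006")],
                          [report])
        print(sorted(p.name for p in item.iterdir()))
        # ['contents', 'dublin_core.xml', 'report.pdf']
```

A directory of such items would then be handed to DSpace's batch item importer; the exact import command and options depend on the DSpace version in use.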

Technical Reports Screenshot

Technical Reports Item Detail

Electronic Theses & Dissertations
• Partnership with Graduate School
• Hybrid System: DSpace and ETD-db
  – ETD-db submission/approval/management
  – Direct database extract for DSpace Ingest Object creation
  – Scheduled Batch Ingest process
• DSpace Considerations/Alterations
  – Metadata Mapping
  – Author Browse (exclude contributor.advisor)
  – Various interface changes

ETD-db Screenshot

ETD DSpace Screenshot

Faculty Publications
• Built on Existing Author Database
  – Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
• Re-modeled data
• Added Functionality
  – OpenURL
  – ‘Vita-like’ citation display
  – Full-text or submission links
  – Full-text stored in DSpace
• Citation metadata and file exported by script
• DSpace Identifier currently entered manually

Faculty Publications Schematic (diagram)
• Scholar: Submit Citations and/or Text (Web Submission Form); View full-text
• Web interface (PHP): S+R Citations
• Oracle Faculty Publications DB (citations)
  – Add/Edit data: Cataloging and Coll. Mgt.
  – Sources: Access, ISI, Ann. Reps, etc.
• DSpace Java/JSP (full-text only)
  – PostgreSQL (metadata)
  – File System (files)
  – DSpace Item Display
• Handle IDs
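The Faculty Publications slide lists OpenURL among the added functions without describing how links are built. A minimal sketch, assuming an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article, might look like this; the resolver base URL and the citation dictionary keys are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical link-resolver base URL; substitute the local resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Citation keys copied into the query as rft.<key> when present.
ARTICLE_KEYS = ("atitle", "jtitle", "volume", "issue", "spage",
                "date", "issn", "aulast")


def openurl_for_article(citation, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 KEV link for a journal-article citation dict.

    Missing or empty values are left out of the query string.
    """
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for key in ARTICLE_KEYS:
        value = citation.get(key)
        if value:
            params.append(("rft." + key, str(value)))
    # urlencode percent-escapes reserved characters such as ':' and '/'.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(openurl_for_article({"atitle": "Soil carbon", "jtitle": "Ecology",
                               "volume": 12, "spage": "101", "date": "2006"}))
```

Keeping the resolver address in one constant means a citation display (such as the ‘vita-like’ view) only has to pass citation fields, and the resolver can change without touching stored records.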

Faculty Publications Search Screen

Faculty Publications Result Screenshot

Faculty Publications Item Screenshot

Repository Governance
• Internal
  – Digital Repository Planning Committee
  – Data Repository Architect
• External
  – Faculty Repository Advisory Committee
  – Partnerships with departments and institutes

NCGDAP (North Carolina Geospatial Data Archiving Project): Overview
• NDIIPP: National Digital Information Infrastructure and Preservation Program
• Collaboration with Library of Congress
• One of eight three-year projects to study long-term (50+ years) digital preservation
• Objective: engage existing state/federal geospatial data infrastructures in preservation
• Project approaches: Technical and Social

Repository Requirements
• Dim archive with possible future access
  – Minimal IR/access component
• Minimal repository imprint on data
  – Repository-agnostic ingest and export
• Simple digital curation functions
  – Periodic MD5 checksum validation
  – Structured metadata index
• Expected archived-data exchange
• Leverage existing investments
• Free Software with active community

Automation: Threat and Format Analysis, Validation
Python wrappers for the following:
• Anti-virus (ClamAV)
• Compressed files (tar, zip, gzip, bzip)
• At-risk formats
• Executable files (magic numbers)
• JHOVE validation

Automation: Archive Package Organization
• ESRI ArcGIS toolbar for selected formats

Automation: Archive Package Organization (cont’d)
• Rule-based Python logic
  – Filestem
  – Extension relationships (multifile format validation)
  – Directory structure
• Manual intervention
• NOID assignment

Metadata: Seed File Form
• ‘Transfer set’ metadata capture in ‘seed file’
  – Communicates with DSpace backend; generates XML used to inform later scripts

Metadata: Communities and Collections
• Search by type for 100+ communities
• Facilitates creation and reduces errors

Curation Processing
• At-risk format migration, original retained
• Agency-specific XML templates in ArcCatalog with synchronization flags
• Provenance and curation metadata scripted

Source Metadata Translation
• Repository-agnostic approach
• Spokes for each transformation
• Facilitates export from DSpace into other repositories
• Generate DSpace QDC, METS; populate Workflow database

Extra-repository AIP Management
• Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub
• External tracking of NOID, Handle, ISO keywords, and other metadata for interaction with other systems
• Integrates with existing GIS Lookup tool

Repository Architecture Overview (diagram)
• PostgreSQL: one shared username; separate database for each app
• Repository Tomcat instance
• Faculty Publications: PHP/DSpace hybrid
• Internal Repository (DSpace): Technical Reports; ETDs
• Collections (DSpace): SCRC (Course Catalogs; Green ‘N’ Growing)
• NDIIPP (DSpace)
• Asset Store on ATABeast (sub-directory for each DSpace app)

SCRC (DSpace) Screenshot

Upcoming Repository-Related Projects
• Enhancements to current system
  – XTF search interface
  – Inter-archive exchange
• Digital Collections Repository
  – Special Collections Research Center
  – Other non-faculty collections
• Data Repository
  – Scientific data
  – Statistical resources

For More Information:
• James Jackson Sanborn – [email protected]
• Jim Tuttle – [email protected]
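The Repository Requirements slide lists periodic MD5 checksum validation among the simple curation functions. The slides do not say where checksums were stored, so this minimal stand-alone sketch assumes a hypothetical manifest mapping each file's relative path to its hex digest, and compares a stored manifest against the files currently on disk.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative POSIX path) to its MD5 digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): md5_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def validate(root, manifest):
    """Compare files under root with a stored manifest.

    Returns (changed, missing, unexpected) as sorted lists of relative paths.
    """
    current = build_manifest(root)
    changed = sorted(k for k in manifest
                     if k in current and current[k] != manifest[k])
    missing = sorted(k for k in manifest if k not in current)
    unexpected = sorted(k for k in current if k not in manifest)
    return changed, missing, unexpected
```

Run on a schedule, `validate` returns three empty lists when the archive is intact; any non-empty list is a fixity event to investigate.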
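The Threat and Format Analysis slide mentions Python wrappers that flag compressed files and detect executables by magic number. As a minimal sketch (the ClamAV and JHOVE wrappers are not reproduced), the check reads a file's leading bytes and compares them with known signatures regardless of file extension. The signature table is short and illustrative, not the one used in the project.

```python
# Leading-byte signatures checked at offset 0. Deliberately incomplete.
# Short signatures such as b"MZ" or b"#!" can match ordinary text files,
# so a hit is a flag for review rather than a verdict.
SIGNATURES = [
    (b"\x7fELF", "executable", "ELF binary"),
    (b"MZ", "executable", "DOS/Windows executable"),
    (b"#!", "executable", "script with shebang"),
    (b"PK\x03\x04", "compressed", "zip"),
    (b"\x1f\x8b", "compressed", "gzip"),
    (b"BZh", "compressed", "bzip2"),
]
TAR_MAGIC_OFFSET = 257  # POSIX ustar headers carry "ustar" at byte 257


def sniff(path):
    """Classify a file by magic number, ignoring its extension.

    Returns a (category, description) tuple, or None if nothing matched.
    tar is an archive rather than a compression format, but it is grouped
    with compressed files here, as on the slide.
    """
    with open(path, "rb") as fh:
        head = fh.read(512)
    for magic, category, description in SIGNATURES:
        if head.startswith(magic):
            return category, description
    if head[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + 5] == b"ustar":
        return "compressed", "tar"
    return None
```

Matching on content rather than extension catches renamed files, for example an executable submitted as `data.dat`.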
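The Archive Package Organization slide describes rule-based Python logic keyed on filestem and extension relationships for multifile format validation. The slides do not give the rules; as a hypothetical illustration, an ESRI shapefile is a natural case, since its .shp, .shx and .dbf parts must travel together. The rule table and function names below are assumptions.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical rule table; only shapefiles are shown.
SHAPEFILE_RULE = {
    "required": {"shp", "shx", "dbf"},
    "optional": {"prj", "sbn", "sbx", "cpg", "shp.xml"},
}


def group_by_stem(names):
    """Group file names by stem, splitting at the first dot, case-insensitively.

    Assumption: stems contain no dots, so 'roads.shp.xml' yields
    stem 'roads' and extension 'shp.xml'.
    """
    groups = defaultdict(set)
    for name in names:
        stem, _, ext = Path(name).name.lower().partition(".")
        groups[stem].add(ext)
    return dict(groups)


def check_shapefiles(names, rule=None):
    """For each stem with a .shp part, list missing and unrecognized parts."""
    rule = rule or SHAPEFILE_RULE
    report = {}
    for stem, exts in group_by_stem(names).items():
        if "shp" not in exts:
            continue  # not a shapefile group; other rules would handle it
        report[stem] = {
            "missing": sorted(rule["required"] - exts),
            "extra": sorted(exts - rule["required"] - rule["optional"]),
        }
    return report
```

A group with a non-empty "missing" or "extra" list would be routed to the manual intervention step the slide mentions, rather than packaged automatically.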