Transcript Document
The Open Archives Initiative and OAIster: Past, Present and Future
Kat Hagedorn
University of Michigan Libraries
April 6, 2006

The oy(ai)ster and the hare
  Well, if oysters had feet…
  Other projects move faster (think Google)
  OAI still building speed
  Follows the punctuated equilibrium model…
  * © Johnny Hart!

[Chart: OAIster records over time. x-axis: months (Jun-02 to Feb-06); y-axis: # records (0 to 8,000,000)]

[Chart: OAIster repositories over time. x-axis: months (Jun-02 to Feb-06); y-axis: # repositories (0 to 700)]

Why OAIster? And why the silly name?
  Initially, wanted to build the Academic HotBot (yup, you read that right)
  Essentially, a union catalog of those “objects” that couldn’t easily be spidered
  Currently, have more records that link to “objects” than there are records in our OPAC

What does OAIster contain?
  Harvest everything available except obvious test repositories
  Keep nearly everything
    must have a digital object link
    must have decent metadata
    must be scholarly or informational
  For example…

Why do (should) people use it?
  It’s big-- over 7 million last month
  It’s varied-- contains articles, books, images of artwork, datasets, videos, audios, finding aids, manuscripts
  It keeps growing-- as long as they keep paying my salary

One interface to rule them all?
  If you don’t know this…
    www.oaister.org
    www.oaister.umdl.umich.edu/o/oaister
  …how do you get to the content?
We consider part of our mission making this metadata as widely available as possible, so…
  Approached us as part of a big content appropriation push
  Send them our metadata monthly-- takes them about a week to include it in the search index
  For example--

SRU interface
  Federated search engines are “it” now-- trying to solve problem of how to search simultaneously
  Perfect place for OAIster
  Built SRU interface (Z39.50 deemed older tech at this point)
  ExLibris building connector for MetaLib tool
  For example--

OAI: what it is (finally)
  Stands for Open Archives Initiative
  “…develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.”
  Includes a Protocol for Metadata Harvesting (PMH), i.e., what we use to fill OAIster
  Consists of data providers and service providers

OAI: what it is not
  OAI ≠ open access
  “…defining and promoting machine interfaces that facilitate the availability of content from a variety of providers. Openness does not mean ‘free’ or ‘unlimited’ access to the information repositories that conform to the OAI-PMH.”
  However, a large majority of OAIster records are available to all and sundry
  Perfect opportunity-- freely sharing free stuff

OAIster and open access
  We harvest a large number of open access “self-publishing” repositories, e.g.,
    DSpace: 68
    EPrints: 113
    OJS: 21
  Plus green and gold standard peer-reviewed digital object records from repositories like PLOS and arXiv

OAI-PMH model
  Data providers:
    XML UTF-8 metadata records hosted by shareware software
  Service providers:
    discover the data provider
    harvest that metadata
    transform it…
    index it and make it searchable

Transformation tool
  Remove “no digital object” records
  Add normalized fields for limiting search
    currently resource type normalized to 5 values: text, image, audio, video, dataset
    planning on date normalization
  Maps Simple Dublin Core to our own DLXS Bibliographic Class for indexing

System design
  XSL stylesheets (per source type)
  OAI-enabled DC records
  UM harvester
  Record storage
  BibClass indexes
  XSLT transformation tool
  Search interface (XPAT)

MODS / Aquifer portals
  Only harvest Simple Dublin Core for OAIster
  Experimenting with harvesting MODS

Why MODS?
  Is the metadata standard of choice among richer, enhanced formats
  Offers more focused ability to search and retrieve records
  Based on MARC, but human-readable
  Digital Library Federation (we’re members) is pushing for its use

What’d we do with MODS?
  Mapping MODS to DLXS Bibliographic Class with many modifications
    adding attributes-- handle display title (The quick fox…) vs. sort title (quick fox…, The)
    merging fields-- nameParts
    splitting out subject fields-- topical, name, geographical, hierarchical
  Not all that perfect
    merged fields don’t always make sense
    not fully leveraging the richer fields in search

What else?
  Added bookbag functions
  Added thumbnails
  Created better search interface

Next…
  tackle date normalization
  downloading of MODS directly from interface
  port useful features and widgets to OAIster

Onwards…
  Receive grant to work on metadata remediation…
  …meaning ways to cluster and classify metadata so it is more easily searchable and browseable
  And continue to work on best practices for data providers

Who will win?*
  * kidding…?

Questions?
Kat Hagedorn
University of Michigan Libraries
Digital Library Production Service
www.oaister.org
[email protected]
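
Sketch of the service-provider steps

The "OAI-PMH model" and "Transformation tool" slides describe harvesting Simple Dublin Core, dropping "no digital object" records, and normalizing resource type to 5 values. A minimal Python sketch of those steps follows. It is not OAIster's actual tooling, which the slides describe as XSLT-based. The ListRecords verb, the oai_dc metadata prefix, the resumptionToken argument, and the two namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. Everything else is an assumption made for illustration: the function names, the TYPE_MAP lookup table, the rule that a dc:identifier holding an http(s) URL counts as a digital object link, and the example.org sample data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical lookup table: raw dc:type strings -> the 5 normalized values.
TYPE_MAP = {
    "text": "text", "article": "text", "book": "text",
    "image": "image", "stillimage": "image",
    "audio": "audio", "sound": "audio",
    "video": "video", "movingimage": "video",
    "dataset": "dataset",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def normalize_type(raw_types):
    """Return the first raw type that maps to a normalized value, else None."""
    for raw in raw_types:
        key = raw.strip().lower().replace(" ", "")
        if key in TYPE_MAP:
            return TYPE_MAP[key]
    return None


def object_link(identifiers):
    """Assumed rule: the first identifier that is an http(s) URL is the object link."""
    for ident in identifiers:
        if ident.strip().lower().startswith(("http://", "https://")):
            return ident.strip()
    return None


def transform(xml_text):
    """Parse a ListRecords response, drop records without a link, normalize type."""
    root = ET.fromstring(xml_text)
    kept = []
    for rec in root.iterfind(".//oai:record", NS):
        ids = [e.text or "" for e in rec.iterfind(".//dc:identifier", NS)]
        link = object_link(ids)
        if link is None:
            continue  # "no digital object" record
        types = [e.text or "" for e in rec.iterfind(".//dc:type", NS)]
        kept.append({
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "link": link,
            "type": normalize_type(types),
        })
    return kept


# Made-up response with one linked record and one catalog-only record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A harvested article</dc:title>
          <dc:type>Article</dc:type>
          <dc:identifier>http://example.org/objects/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A catalog-only record</dc:title>
          <dc:type>Text</dc:type>
          <dc:identifier>call number QA76</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    for record in transform(SAMPLE):
        print(record)
```

In a real harvest, the response to each request would be fetched over HTTP and the loop would continue while the response carries a non-empty resumptionToken. The sketch leaves the network step out so it can run offline against the sample.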