The Open Archives Initiative
and OAIster:
Past, Present and Future
Kat Hagedorn
University of Michigan Libraries
April 6, 2006
The oy(ai)ster and the hare

Well, if oysters had feet…

Other projects move faster (think Google)
OAI still building speed
Follows the punctuated equilibrium model…


* © Johnny Hart!
OAIster records over time
(chart: # records, 0 to 8,000,000; months, Jun 2002 to Feb 2006)
OAIster repositories over time
(chart: # repositories, 0 to 700; months, Jun 2002 to Feb 2006)
Why OAIster?

And why the silly name?

Initially, wanted to build the Academic
HotBot (yup, you read that right)
Essentially, a union catalog of those
“objects” that couldn’t easily be spidered
Currently, have more records that link to
“objects” than there are records in our
OPAC


What does OAIster contain?

Harvest everything available
 except obvious test repositories

Keep nearly everything
 must have a digital object link
 must have decent metadata
 must be scholarly or informational

For example…
Why do (should) people use it?



It’s big-- over 7 million records last month
It’s varied-- contains articles, books, images
of artwork, datasets, video, audio, finding
aids, manuscripts
It keeps growing-- as long as they keep
paying my salary
One interface to rule them all?

If you don’t know this…
 www.oaister.org
 www.oaister.umdl.umich.edu/o/oaister


…how do you get to the content?
We consider it part of our mission to make this
metadata as widely available as possible,
so…



Approached us as part of a big content
appropriation push
Send them our metadata monthly-- takes
them about a week to include it in the
search index
For example--
SRU interface





Federated search engines are “it” now-- trying
to solve the problem of how to search
many sources simultaneously
Perfect place for OAIster
Built SRU interface (Z39.50 deemed older
tech at this point)
ExLibris building connector for MetaLib tool
For example--
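A minimal sketch of what a request to an SRU interface looks like. The parameter names (operation, version, query, startRecord, maximumRecords, recordSchema) come from the SRU 1.1 specification; the base URL, the `build_sru_url` helper and the `dc` schema short name are assumptions for illustration, not OAIster's actual endpoint or configuration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not OAIster's real SRU address.
SRU_BASE_URL = "http://example.org/sru/oaister"


def build_sru_url(cql_query, start=1, maximum=10, base_url=SRU_BASE_URL):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # CQL, e.g. 'dc.title = "oysters"'
        "startRecord": start,        # 1-based position of the first record
        "maximumRecords": maximum,   # page size requested from the server
        "recordSchema": "dc",        # assumed short name for Dublin Core
    }
    return base_url + "?" + urlencode(params)


print(build_sru_url('dc.title = "oysters"'))
```

A federated search tool would issue a URL like this to each target and merge the XML responses, which is why a plain HTTP interface is easier to plug in than Z39.50.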
OAI: what it is (finally)

Stands for Open Archives Initiative
 “…develops and promotes interoperability standards
that aim to facilitate the efficient dissemination of
content.”


Includes a Protocol for Metadata Harvesting
(PMH), i.e., what we use to fill OAIster
Consists of data providers and service
providers
OAI: what it is not

OAI ≠ open access
 “…defining and promoting machine interfaces that
facilitate the availability of content from a variety of
providers. Openness does not mean ‘free’ or ‘unlimited’
access to the information repositories that conform to
the OAI-PMH.”


However, a large majority of OAIster records
are available to all and sundry
Perfect opportunity-- freely sharing free stuff
OAIster and open access

We harvest a large number of open access
“self-publishing” repositories, e.g.,
 DSpace: 68
 EPrints: 113
 OJS: 21

Plus green and gold standard peer-reviewed
digital object records from repositories like
PLOS and arXiv
OAI-PMH model

Data providers:
 XML UTF-8 metadata records
 hosted by shareware software

Service providers:
 discover the data provider
 harvest that metadata
 transform it…
 index it and make it searchable
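The harvest step can be sketched roughly as follows. The `ListRecords` verb, the `oai_dc` metadata prefix and the `resumptionToken` element are defined by OAI-PMH 2.0; the function names, the `fetch` callable and the sample response are assumptions for illustration and say nothing about how the UM harvester is actually written.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(".//oai:record", NS):
        records.append({
            "id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [t.text for t in record.findall(".//dc:title", NS)],
            "identifiers": [i.text for i in record.findall(".//dc:identifier", NS)],
        })
    # An empty or missing resumptionToken means the list is complete.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


def harvest(base_url, fetch):
    """Follow resumption tokens until the data provider has no more records.

    `fetch` is any callable that takes a URL and returns the response text.
    """
    records, token = [], None
    while True:
        batch, token = parse_list_records(fetch(list_records_url(base_url, token)))
        records.extend(batch)
        if not token:
            return records


# Hypothetical response from a data provider, trimmed to the essentials.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
          <dc:identifier>http://example.org/objects/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(parse_list_records(SAMPLE))
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library.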
Transformation tool


Remove “no digital object” records
Add normalized fields for limiting search
 currently resource type normalized to 5 values:
text, image, audio, video, dataset
 planning on date normalization

Maps Simple Dublin Core to our own DLXS
Bibliographic Class for indexing
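A rough sketch of the first two transformation steps: dropping records with no digital object link and adding a normalized resource type. The five target values are the ones listed above; the keyword rules, the URL test and the record layout (plain dicts with `identifiers` and `types`) are assumptions for illustration, not the rules in the actual XSLT tool.

```python
# Checked in order, so "moving image" is matched as video before "image".
# The keywords are illustrative guesses, not OAIster's normalization table.
TYPE_RULES = [
    ("video", ("video", "moving image", "film")),
    ("audio", ("audio", "sound")),
    ("dataset", ("dataset", "data set")),
    ("image", ("image", "photograph")),
    ("text", ("text", "article", "book", "thesis", "manuscript")),
]


def normalize_type(raw_types):
    """Map free-text dc:type values to one of the five values, or None."""
    for raw in raw_types:
        lowered = raw.lower()
        for normalized, keywords in TYPE_RULES:
            if any(keyword in lowered for keyword in keywords):
                return normalized
    return None


def has_digital_object(record):
    """Assume any dc:identifier that looks like a URL links to a digital object."""
    return any(
        identifier.lower().startswith(("http://", "https://", "ftp://"))
        for identifier in record.get("identifiers", [])
    )


def transform(records):
    """Drop "no digital object" records and add a normalized type field."""
    kept = []
    for record in records:
        if not has_digital_object(record):
            continue
        normalized = normalize_type(record.get("types", []))
        kept.append({**record, "normalized_type": normalized})
    return kept


harvested = [
    {"identifiers": ["http://example.org/1"], "types": ["Moving Image"]},
    {"identifiers": ["call number 42"], "types": ["Text"]},
    {"identifiers": ["http://example.org/3"], "types": ["Journal Article"]},
]
for record in transform(harvested):
    print(record["identifiers"][0], record["normalized_type"])
```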
System design
(diagram) OAI-enabled DC records -> UM harvester -> Record storage -> XSLT transformation tool (XSL stylesheets, per source type) -> BibClass indexes -> Search interface (XPAT)
MODS / Aquifer portals


Only harvest Simple Dublin Core for OAIster
Experimenting with harvesting MODS
Why MODS?




Is the metadata standard of choice among
richer, enhanced formats
Offers more focused ability to search and
retrieve records
Based on MARC, but human-readable
Digital Library Federation (we’re members)
is pushing for its use
What’d we do with MODS?

Mapping MODS to DLXS Bibliographic
Class with many modifications
 adding attributes-- handle display title (The
quick fox…) vs. sort title (quick fox…, The)
 merging fields-- nameParts
 splitting out subject fields-- topical, name,
geographical, hierarchical
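As a sketch of two of these modifications, the display/sort title split and the nameParts merge might look like this. The MODS elements used (`titleInfo`, `nonSort`, `title`, `name`, `namePart` with `type="family"` or `type="given"`) are part of the MODS schema; the function names, the output format and the sample record are assumptions for illustration, not the actual mapping to the DLXS Bibliographic Class.

```python
import xml.etree.ElementTree as ET

MODS = {"mods": "http://www.loc.gov/mods/v3"}


def titles(mods_root):
    """Return (display_title, sort_title) from the first titleInfo element."""
    info = mods_root.find("mods:titleInfo", MODS)
    non_sort = info.findtext("mods:nonSort", default="", namespaces=MODS).strip()
    title = info.findtext("mods:title", default="", namespaces=MODS).strip()
    display = f"{non_sort} {title}".strip()
    # Move the leading article to the end so sorting ignores it.
    sort = f"{title}, {non_sort}" if non_sort else title
    return display, sort


def merged_names(mods_root):
    """Merge each name's nameParts into a single string per name."""
    names = []
    for name in mods_root.findall("mods:name", MODS):
        parts = {
            part.get("type"): (part.text or "").strip()
            for part in name.findall("mods:namePart", MODS)
        }
        if "family" in parts or "given" in parts:
            pieces = (parts.get("family"), parts.get("given"))
            names.append(", ".join(p for p in pieces if p))
        else:
            # Untyped or other nameParts: join whatever text is there.
            names.append(" ".join(v for v in parts.values() if v))
    return names


# Hypothetical MODS record, trimmed to the fields used above.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The</nonSort><title>quick fox</title></titleInfo>
  <name type="personal">
    <namePart type="family">Doe</namePart>
    <namePart type="given">Jane</namePart>
  </name>
</mods>"""

root = ET.fromstring(SAMPLE_MODS)
print(titles(root))
print(merged_names(root))
```

The fallback branch in `merged_names` hints at why merged fields do not always make sense: once dates or terms of address are folded into one string, the distinctions MODS recorded are lost.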

Not all that perfect
 merged fields don’t always make sense
 not fully leveraging the richer fields in search
What else?


Added bookbag functions
Added thumbnails
Created better search interface

Next…

 tackle date normalization
 downloading of MODS directly from interface
 port useful features and widgets to OAIster
Onwards…



Received grant to work on
metadata remediation…
…meaning ways to cluster
and classify metadata so it is
more easily searchable and
browseable
And continue to work on best
practices for data providers
Who will win?*
* kidding…?
Questions?





Kat Hagedorn
University of Michigan Libraries
Digital Library Production Service
www.oaister.org