
OAI Overview
DLESE OAI Workshop
April 29-30, 2002
John Weatherley ([email protected])

Workshop Schedule

Day 1
- Morning: Overview of OAI; Look at OAI tools and resources
- Afternoon: DLESE OAI software installation, configuration and setup

Day 2
- Morning: Overview of NSDL and DLESE interoperability architecture; NSDL metadata overview; Metadata and OAI
DLESE OAI April 29-30, 2002

Resources
Workshop presentation slides, links to tools, and other OAI resources are located at: http://oai.dlese.org

What is DLESE and NSDL?
- DLESE (Digital Library for Earth System Education): provides access to digital resources for learning about the Earth system
- NSDL (National Science Digital Library, spanning STEM): a network of scholarly and educational digital libraries related to science; DLESE will be part of this network

1. What is the OAI?
- What is the Open Archives Initiative (OAI)?
  - An organization dedicated to solving problems of digital library interoperability by defining simple protocols and standards
  - Grew out of the e-prints (arXiv) community at Los Alamos
- What is the OAI Protocol for Metadata Harvesting (OAI-PMH)?
  - A protocol to transfer metadata from a source archive to a destination archive
- How is the OAI-PMH being used by the NSDL and DLESE?
  - It has been adopted as a primary means of gathering and sharing metadata among contributors
  - It is also used to facilitate internal management of metadata stores

What is Metadata?
- Data refers to digital objects, e.g. the resources themselves
- Metadata is data about data, e.g. a description of a resource rather than the resource itself
- OAI is used to transmit metadata

2. Definitions / Concepts
- Basic Principles
  - Harvesting vs. Federation
  - Data Providers vs. Service Providers
- Underlying Technology
  - HTTP and XML
  - XML Namespaces and Schema
- Protocol Policies and Conventions
  - Basic Policies
  - Sets

Harvesting vs. Federation
- Competing approaches to interoperability
  - Federation: services such as searching are run remotely
  - Harvesting: metadata is transferred from remote sources to the destination where the services are located
- Federation requires more effort at the remote site but is easier for the local system
- Harvesting requires less effort at the remote site; services are provided by the local system
- OAI uses the harvesting model

Data Providers vs. Service Providers
- Data Providers are entities that possess metadata and are willing to share it with others (e.g. collection builders)
- Service Providers are entities that harvest metadata from Data Providers in order to provide higher-level services to users (e.g. searching, browsing, recommender systems). The NSDL and DLESE are examples.

Features of the OAI Approach
- Lightweight: low overhead for Data Providers
  - The protocol is relatively simple to implement
  - Many plug-and-play tools are publicly available
- Transports any metadata framework that can be expressed in XML (details to come)
- Searching, browsing, annotation and other advanced services are handled by the Service Provider

Metadata Harvesting Framework
[Diagram: Data Providers (collection builders) expose XML records through the OAI protocol (over HTTP) to a Service Provider (DLESE, NSDL), which holds the harvested records and serves Library Users.]
1. The Service Provider polls the Data Providers periodically for new records
2. New records are downloaded and cached by the Service Provider
3. The Service Provider offers searching, browsing, and other services over the data
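
Step 2 of the framework (download and cache) can be sketched in Python as a cache keyed by OAI identifier. This is a minimal illustration under assumptions: `merge_into_cache` is a hypothetical helper, records arrive as `(identifier, datestamp, metadata)` tuples, and a cached copy is replaced only when the harvested datestamp is newer.

```python
from datetime import date


def merge_into_cache(cache: dict, harvested: list) -> int:
    """Store harvested (identifier, datestamp, metadata) tuples; return how many changed.

    A record replaces the cached copy only when its datestamp is newer.
    """
    changed = 0
    for identifier, stamp, metadata in harvested:
        cached = cache.get(identifier)
        if cached is None or stamp > cached[0]:
            cache[identifier] = (stamp, metadata)
            changed += 1
    return changed


cache = {}
merge_into_cache(cache, [("oai:example:1", date(2002, 4, 1), "record v1")])
# A later poll returns an updated copy of the same record plus a new record.
n = merge_into_cache(cache, [("oai:example:1", date(2002, 4, 20), "record v2"),
                             ("oai:example:2", date(2002, 4, 20), "record v1")])
print(n, len(cache))  # 2 2
```

Keying the cache on the OAI identifier works because each record from a Data Provider carries a unique, persistent ID.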

HTTP and XML
- The OAI-PMH is an almost stateless request/response protocol
- Requests and responses are sent via HTTP
- Requests are encoded as HTTP GET or POST operations
- Responses are well-formed XML documents
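
A minimal Python sketch of how a harvester might encode an OAI-PMH request as an HTTP GET URL. `build_request` is a hypothetical helper; actually sending the request (for example with `urllib.request.urlopen`) is left out. The base URL and arguments are taken from the GetRecord sample URL in these slides.

```python
from urllib.parse import urlencode

# Base URL of the DLESE data provider, as used in the sample URLs of this workshop.
BASE_URL = "http://oai.dlese.org/provider"


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Encode an OAI-PMH request (verb plus arguments) as a GET URL."""
    params = {"verb": verb, **args}
    # urlencode percent-escapes reserved characters, e.g. ':' becomes %3A.
    return base_url + "?" + urlencode(params)


url = build_request(BASE_URL, "GetRecord",
                    identifier="dlese:DLESE-000-000-000-002",
                    metadataPrefix="dlese_ims")
print(url)
```

Because `from` is a Python keyword, date-range arguments have to be passed as a dictionary, e.g. `build_request(BASE_URL, "ListRecords", **{"from": "2002-01-01", "metadataPrefix": "dlese_ims"})`.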

Well-formed and Valid XML
Correct:
<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994</year>
  <owner>
    <name>you</name>
    <plate>CO</plate>
  </owner>
</car>

Incorrect (not well-formed: <year> is never closed, and </car> appears before </owner>, so the elements are not properly nested):
<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994
  <owner>
    <plate>CO</plate>
    <name>you</name>
</car>
</owner>
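
Well-formedness can be checked mechanically. The Python sketch below parses both car examples with the standard library's `xml.etree.ElementTree`; `is_well_formed` is a hypothetical helper name. Note that this checks only well-formedness: validating against a DTD or XML Schema requires a validating parser, which ElementTree does not provide.

```python
import xml.etree.ElementTree as ET

CORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994</year>
  <owner><name>you</name><plate>CO</plate></owner>
</car>"""

INCORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994
  <owner><plate>CO</plate><name>you</name>
</car>
</owner>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(CORRECT))    # True
print(is_well_formed(INCORRECT))  # False
```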

DTD, Schemas & Namespaces
DTDs (Document Type Definitions):
- Describe the elements of XML instance documents
- Are not themselves written in XML
- Limited data-typing
- Namespaces are harder to deal with
Schemas:
- Describe the elements of XML instance documents
- Are themselves well-formed XML
- Strong data-typing
- Namespaces are easier to deal with
Namespace:
- A collection of related element names, identified by a URI and usually referred to through a short prefix (e.g. dc)

XML Namespaces and Schema
- Consistency and data quality are ensured by using an XML Schema description for each possible response
- XML namespaces are used where necessary to clearly distinguish the parts of a response that are actual metadata from the parts that support the OAI-PMH
- Example:
  http://www.cstc.org/cgibin/OAI/CSTC.pl?verb=GetRecord&identifier=oai%3ACSTC%3A103&metadataPrefix=oai_dc
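
The Python sketch below shows how namespaces let a harvester separate the protocol envelope from the metadata it carries. Assumptions: the response is a hypothetical, heavily abbreviated GetRecord reply, and the namespace URIs follow OAI-PMH 2.0 and the Dublin Core element set; the OAI-PMH 1.x responses current at the time of this workshop used different envelope URIs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as defined for OAI-PMH 2.0 and Dublin Core (assumed, see above).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, abbreviated GetRecord response.
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example:103</identifier>
        <datestamp>2002-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example resource</dc:title>
          <dc:creator>Example author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

root = ET.fromstring(RESPONSE)
record = root.find("oai:GetRecord/oai:record", NS)

# Protocol envelope: elements in the OAI-PMH namespace.
identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
# Actual metadata: elements in the Dublin Core namespace.
titles = [t.text for t in record.iterfind("oai:metadata/oai_dc:dc/dc:title", NS)]

print(identifier)  # oai:example:103
print(titles)      # ['Example resource']
```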

Basic OAI Policies and Conventions
- Each metadata record from a given Data Provider must have a unique ID (the OAI identifier is not necessarily the same as the record's native ID)
- Each metadata record must be persistent so that Service Providers can always refer back to the source
- Each record must have a datestamp indicating its creation/modification date
  - Datestamps provide a mechanism for incremental, continuous transfer of metadata: a harvester requests only the records that have changed since its previous harvest
- Flow control: resumption tokens can be used to return partial results; the client is issued a token, which it presents to the server to receive more results
- Multiple metadata formats are allowed
  - Examples: Dublin Core, DLESE IMS
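
Datestamp-based incremental harvesting amounts to a date-range filter on the provider side. A minimal Python sketch, assuming a hypothetical in-memory repository that maps OAI identifiers to last-change dates; `changed_since` is a hypothetical helper, and this sketch treats both `from` and `until` bounds as inclusive.

```python
from datetime import date

# Hypothetical in-memory repository: OAI identifier -> datestamp of last change.
RECORDS = {
    "oai:example:1": date(2002, 3, 1),
    "oai:example:2": date(2002, 4, 15),
    "oai:example:3": date(2002, 4, 28),
}


def changed_since(records, from_date, until_date=None):
    """Identifiers whose datestamp lies in [from_date, until_date], both inclusive."""
    return sorted(
        ident for ident, stamp in records.items()
        if stamp >= from_date and (until_date is None or stamp <= until_date)
    )


# A harvester whose last harvest ran on 2002-04-01 asks only for newer changes.
print(changed_since(RECORDS, date(2002, 4, 1)))  # ['oai:example:2', 'oai:example:3']
```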

Sets
- An OAI-PMH mechanism that allows harvesting of subcollections
- The semantics of sets are defined outside the protocol
- Sets are defined by conventions established between data and service providers
  - Example sets within DLESE might be: DWEL, COMET, LDEO, etc.
  - Example sets within the NSDL might be: DLESE, DLESE:DWEL, DLESE:COMET, DLESE:LDEO, etc.
- Sets can be established that enable querying (e.g. by topic, author name, subject area, etc.)
  - Example: The Open Digital Library (Suleman, 2001)
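
The NSDL examples use a colon to separate levels of a set hierarchy (e.g. `DLESE:DWEL`). The Python sketch below shows one way a provider could decide set membership under that convention. `in_set` is a hypothetical helper, and treating a record in `DLESE:DWEL` as also belonging to `DLESE` is an assumption about how the hierarchy is interpreted.

```python
def in_set(record_sets, requested):
    """True if a record belongs to the requested set or to one of its subsets.

    Set names use ':' to separate hierarchy levels, e.g. 'DLESE:DWEL'.
    """
    return any(s == requested or s.startswith(requested + ":") for s in record_sets)


record = ["DLESE:DWEL"]
print(in_set(record, "DLESE"))        # True: DWEL is a subset of DLESE
print(in_set(record, "DLESE:COMET"))  # False
```

Matching on `requested + ":"` rather than a bare prefix keeps an unrelated set such as `DLESEX` from being treated as part of `DLESE`.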

3. Requirements to be a Data Provider
- Source of metadata
  - Human or automated resource catalogers
- Metadata mappings
  - Crosswalks from native formats to DC or other formats
- Server technology
  - Handled by the OAI software
- Datestamps
- Deletions
- Unique identifiers
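
A crosswalk is essentially a field-by-field mapping from a native record format to a target format. The Python sketch below maps a native record onto an `oai_dc:dc` element. Assumptions: the native field names and the `CROSSWALK` table are hypothetical, and the namespace URIs are those of OAI-PMH 2.0's `oai_dc` format and the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical mapping from native field names to Dublin Core elements.
CROSSWALK = {
    "resource_title": "title",
    "author": "creator",
    "summary": "description",
    "url": "identifier",
}


def to_oai_dc(native: dict) -> ET.Element:
    """Map a native record onto an oai_dc:dc element, skipping unmapped fields."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, value in native.items():
        dc_name = CROSSWALK.get(field)
        if dc_name is not None:
            ET.SubElement(root, f"{{{DC}}}{dc_name}").text = value
    return root


native = {"resource_title": "Plate Tectonics Primer", "author": "J. Doe",
          "internal_id": "42", "url": "http://example.org/plates"}
print(ET.tostring(to_oai_dc(native), encoding="unicode"))
```

Fields without a Dublin Core counterpart (here `internal_id`) are simply dropped, which is typical of crosswalks into a simpler format.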
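
4. The OAI-PMH
- Service Requests
  - Identify
  - ListMetadataFormats
  - ListSets
  - GetRecord
  - ListIdentifiers
  - ListRecords
- Date Ranges
- Resumption Tokens
Parameter notation on the following slides: (R) required, (O) optional, (X) exclusive, i.e. must be the only argument besides the verb.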

Identify
- Purpose: Return general information about the archive and its policies
- Parameters: None
- Sample URL: http://oai.dlese.org/provider?verb=Identify

ListMetadataFormats
- Purpose: List the metadata formats supported by the archive, along with their schema locations and namespaces
- Parameters: identifier – for a specific record (O)
- Sample URL: http://oai.dlese.org/provider?verb=ListMetadataFormats

ListSets
- Purpose: Provide a hierarchical listing of the sets in which records may be organized
- Parameters: None
- Sample URL: http://oai.dlese.org/provider?verb=ListSets

GetRecord
- Purpose: Return the metadata for a single identifier in the form of an OAI record
- Parameters:
  - identifier – id for the record (R)
  - metadataPrefix – metadata format (R)
- Sample URL: http://oai.dlese.org/provider?verb=GetRecord&identifier=dlese%3ADLESE-000-000-000-002&metadataPrefix=dlese_ims

ListIdentifiers
- Purpose: List the unique identifiers of all records in the repository
- Parameters:
  - from – start date (O)
  - until – end date (O)
  - resumptionToken – flow control mechanism (X)
- Sample URL: http://oai.dlese.org/provider?verb=ListIdentifiers
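
Flow control with resumption tokens can be sketched as a loop that keeps reissuing the request until the provider stops returning a token. In this Python sketch the HTTP call and XML parsing are replaced by a hypothetical `fetch` function that returns one page of identifiers and the next token; `list_all_identifiers`, `fake_fetch`, and the token values are illustrative only.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results plus the token for the next page (None or "" when done).
Page = Tuple[List[str], Optional[str]]


def list_all_identifiers(fetch: Callable[[dict], Page]) -> Iterator[str]:
    """Follow resumption tokens until the provider signals the list is complete."""
    args = {"verb": "ListIdentifiers"}
    while True:
        identifiers, token = fetch(args)
        yield from identifiers
        if not token:
            break
        # resumptionToken is exclusive: the follow-up request carries only verb + token.
        args = {"verb": "ListIdentifiers", "resumptionToken": token}


# Hypothetical stub standing in for HTTP requests to a provider that pages results.
PAGES = {None: (["oai:example:1", "oai:example:2"], "page2"),
         "page2": (["oai:example:3"], None)}


def fake_fetch(args: dict) -> Page:
    return PAGES[args.get("resumptionToken")]


print(list(list_all_identifiers(fake_fetch)))
```

Writing the loop as a generator lets a harvester process each page as it arrives instead of holding the whole list in memory.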

ListRecords
- Purpose: Retrieve metadata for multiple records
- Parameters:
  - from – start date (O)
  - until – end date (O)
  - resumptionToken – flow control mechanism (X)
  - set – set to harvest from (O)
  - metadataPrefix – metadata format (R)
- Sample URL: http://oai.dlese.org/provider?verb=ListRecords&metadataPrefix=dlese_ims
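
The parameter lists for the six verbs can be collected into one table and used to check incoming requests. This Python sketch encodes only the arguments listed on these slides (later versions of the protocol differ; for example, OAI-PMH 2.0 also requires metadataPrefix for ListIdentifiers). `VERB_ARGS` and `validate` are hypothetical names, and the error messages are illustrative rather than the protocol's official error codes.

```python
# Arguments per verb as listed on these slides:
# "R" required, "O" optional, "X" exclusive (must be the only argument besides verb).
VERB_ARGS = {
    "Identify": {},
    "ListMetadataFormats": {"identifier": "O"},
    "ListSets": {},
    "GetRecord": {"identifier": "R", "metadataPrefix": "R"},
    "ListIdentifiers": {"from": "O", "until": "O", "resumptionToken": "X"},
    "ListRecords": {"from": "O", "until": "O", "set": "O",
                    "resumptionToken": "X", "metadataPrefix": "R"},
}


def validate(args: dict) -> list:
    """Return a list of problems with an OAI-PMH request; empty means valid."""
    verb = args.get("verb")
    if verb not in VERB_ARGS:
        return [f"illegal verb: {verb!r}"]
    allowed = VERB_ARGS[verb]
    given = {k for k in args if k != "verb"}
    problems = [f"unknown argument: {k}" for k in sorted(given - set(allowed))]
    exclusive = {k for k in given if allowed.get(k) == "X"}
    if exclusive:
        # An exclusive argument replaces all others, so required ones are not checked.
        if len(given) > 1:
            problems.append(f"{sorted(exclusive)[0]} must be used alone")
        return problems
    problems += [f"missing required argument: {k}"
                 for k, kind in allowed.items() if kind == "R" and k not in given]
    return problems


print(validate({"verb": "ListRecords", "metadataPrefix": "dlese_ims"}))  # []
print(validate({"verb": "GetRecord", "identifier": "dlese:DLESE-000-000-000-002"}))
```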

DLESE Architecture
[Diagram components: DLESE Portal, Library Users, Search & Discovery, Services (e.g. What's New), Metadata Repository, Direct Entry, Resources, Collections, NSDL; several of the connections are labeled OAI.]

References
1. "Building Interoperable Digital Libraries: A Practical Guide to Creating Open Archives," Hussein Suleman ([email protected]), JCDL 2001 Tutorial.
2. "A Framework for Building Open Digital Libraries," Hussein Suleman and Edward A. Fox, D-Lib Magazine, December 2001. http://www.dlib.org/dlib/december01/suleman/12suleman.html
3. The Open Archives Initiative. http://www.openarchives.org