OAI Overview
DLESE OAI Workshop
April 29-30, 2002
John Weatherley ([email protected])
Workshop Schedule
Day 1
Morning
Afternoon
Overview of OAI
Look at OAI tools and resources
DLESE OAI software installation, configuration and setup
Day 2
Morning
Overview of NSDL and DLESE interoperability architecture
NSDL metadata overview
Metadata and OAI
DLESE OAI April 29-30, 2002
Resources
Workshop presentation slides, links to tools, and other OAI resources are located at:
http://oai.dlese.org
What is DLESE and NSDL?
DLESE: Digital Library for Earth System Education
Provides access to digitally accessible resources for learning about the Earth system
NSDL: National Science (STEM) Digital Library
Network of scholarly and educational digital libraries related to science (DLESE will be part of this network)
1. What is the OAI?
What is the Open Archives Initiative (OAI)?
Organization dedicated to solving problems of digital library interoperability by defining simple protocols and standards
Grew out of the e-prints (arXiv) community at Los Alamos
What is the OAI Protocol for Metadata Harvesting (OAI-PMH)?
Protocol to transfer metadata from a source archive to a destination archive
How is the OAI-PMH being used by the NSDL and DLESE?
The OAI-PMH has been adopted as a primary means of gathering and sharing metadata among contributors
Also used to facilitate internal management of metadata stores
What is Metadata?
Data refers to digital objects, e.g. the resources themselves
Metadata is data about data, e.g. a description of a resource, not the resource itself
OAI is used to transmit metadata
2. Definitions / Concepts
Basic Principles: Harvesting vs. Federation; Data Providers vs. Service Providers
Underlying Technology: HTTP and XML; XML Namespaces and Schema
Protocol Policies and Conventions: Basic Policies; Sets
Harvesting vs. Federation
Competing approaches to interoperability
Federation is when services such as searching are run remotely
Harvesting is when metadata is transferred from remote sources to the destination where the services are located
Federation requires more effort at the remote site but is easier for the local system
Harvesting requires less effort at the remote site; services are provided by the local system
OAI uses the harvesting model
Data Providers vs. Service Providers
Data Providers are entities that possess metadata and are willing to share it with others (e.g. collection builders)
Service Providers are entities that harvest metadata from Data Providers in order to provide higher-level services to users (e.g. searching, browsing, recommender systems). The NSDL and DLESE are examples.
Features of the OAI Approach
Lightweight: low overhead for Data Providers
Protocol is relatively simple to implement
Many plug-and-play tools are publicly available
Transports any metadata framework that can be made available in XML form (details to come)
Details of searching, browsing, annotation and other advanced services are handled by the Service Provider
Metadata Harvesting Framework
[Diagram: Data Providers (collection builders) hold XML metadata records; the Service Provider (DLESE, NSDL) collects them as Harvested Records using the OAI protocol (over HTTP); the Library User works with the Service Provider]
1. Service Provider polls periodically for new records
2. New records downloaded and cached by the Service Provider
3. Provide searching, browsing, and other services over the data.
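The three numbered steps of this framework can be sketched in Python. This is a minimal in-memory illustration, not the DLESE or NSDL harvester: the `ServiceProvider` class, the `fetch_since` callable, the `demo_provider` function, and the two sample records are all hypothetical names and data introduced for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A harvested record: (identifier, datestamp "YYYY-MM-DD", metadata text).
Record = Tuple[str, str, str]


class ServiceProvider:
    """Toy service provider: polls a data provider, caches records, searches them."""

    def __init__(self, fetch_since: Callable[[Optional[str]], List[Record]]):
        # fetch_since stands in for an OAI request over HTTP (hypothetical).
        self.fetch_since = fetch_since
        self.cache: Dict[str, Record] = {}
        self.last_harvest: Optional[str] = None

    def poll(self) -> int:
        """Steps 1 and 2: request records changed since the last harvest, then cache them."""
        records = self.fetch_since(self.last_harvest)
        for record in records:
            self.cache[record[0]] = record
        if records:
            # ISO-style dates compare correctly as plain strings.
            self.last_harvest = max(r[1] for r in records)
        return len(records)

    def search(self, term: str) -> List[str]:
        """Step 3: a simple service built over the cached metadata."""
        term = term.lower()
        return sorted(i for i, (_, _, meta) in self.cache.items() if term in meta.lower())


def demo_provider(since: Optional[str]) -> List[Record]:
    """Stand-in data provider holding two made-up records."""
    records = [
        ("oai:demo:1", "2002-04-01", "Plate tectonics lesson"),
        ("oai:demo:2", "2002-04-20", "Ocean currents animation"),
    ]
    return [r for r in records if since is None or r[1] >= since]


provider = ServiceProvider(demo_provider)
provider.poll()
print(provider.search("ocean"))
```

Keeping the fetch step behind a callable lets the same loop run against a real HTTP client or a stand-in like the one here.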
HTTP and XML
The OAI-PMH is an almost stateless request/response protocol
Requests and responses are sent via the HTTP protocol
Requests are encoded as GET/POST operations
Responses are well-formed XML documents
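As a rough sketch of the two request encodings, the same arguments can be placed in a GET query string or in a POST form body using Python's standard library. The helper names `build_get` and `build_post` are made up for this example, the base URL is taken from the sample URLs in these slides, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://oai.dlese.org/provider"  # base URL from the sample URLs


def build_get(params: dict) -> Request:
    """Encode the arguments in the query string of a GET request."""
    return Request(f"{BASE_URL}?{urlencode(params)}")


def build_post(params: dict) -> Request:
    """Encode the same arguments as a form body, which makes this a POST request."""
    body = urlencode(params).encode("ascii")
    return Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


get_req = build_get({"verb": "Identify"})
post_req = build_post({"verb": "Identify"})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Either object could be passed to `urllib.request.urlopen` to perform the request; the response body would then be an XML document to parse.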
Well-formed and Valid XML
Correct
<car>
<make>Dodge</make>
<model>Spirit</model>
<year>1994</year>
<owner>
<name>you</name>
<plate>CO</plate>
</owner>
</car>
Incorrect
<car>
<make>Dodge</make>
<model>Spirit</model>
<year>1994
<owner>
<plate>CO</plate>
<name>you</name>
</car>
</owner>
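The difference between the two samples can be checked mechanically. The sketch below feeds both to Python's standard XML parser; the helper name `is_well_formed` is made up for this example. It tests well-formedness only: the standard-library parser does not validate a document against a DTD or schema.

```python
import xml.etree.ElementTree as ET

CORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994</year>
  <owner>
    <name>you</name>
    <plate>CO</plate>
  </owner>
</car>"""

# Same content as the "Incorrect" sample: <year> is never closed and
# </car> appears before </owner>.
INCORRECT = """<car>
  <make>Dodge</make>
  <model>Spirit</model>
  <year>1994
  <owner>
    <plate>CO</plate>
    <name>you</name>
</car>
</owner>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(CORRECT), is_well_formed(INCORRECT))
```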
DTD, Schemas & Namespace
DTDs (Document Type Definitions)
Describe the elements of XML instance documents
Are not themselves well-formed XML
Some data-typing
Namespaces are harder to deal with
Schemas
Describe the elements of XML instance documents
Are themselves well-formed XML
Strong data-typing
Namespaces are easier to deal with
Namespace
Collection of related element names identified by a name label (e.g. dc)
XML Namespaces and Schema
Consistency and data quality are ensured by using XML Schema descriptions for each possible response
XML Namespaces are used where necessary to clearly define which parts of the responses are actual metadata and which support the OAI-PMH.
Example:
http://www.cstc.org/cgibin/OAI/CSTC.pl?verb=GetRecord&identifier=oai%3ACSTC%3A103&metadataPrefix=oai_dc
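To illustrate how namespaces separate protocol elements from metadata elements, here is a small sketch that parses a hand-written response. It assumes the OAI-PMH 2.0 namespace URIs, which may differ from the protocol version in use at the time of this workshop. The identifier is the decoded one from the example URL; the datestamp and title are invented for the illustration and are not the actual content of that record.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the URIs are assumed from OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (datestamp and title are made up).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:CSTC:103</identifier>
        <datestamp>2002-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

root = ET.fromstring(SAMPLE)
# Protocol elements live in the OAI namespace ...
identifier = root.findtext(".//oai:header/oai:identifier", namespaces=NS)
# ... while the actual metadata lives in the Dublin Core namespace.
title = root.findtext(".//oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
print(identifier, title)
```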
Basic OAI Policies and Conventions
Each metadata record from a given Data Provider must have a unique ID (OAI ID is not necessarily the same as the record ID)
Each metadata record must be persistent so that Service Providers can always refer back to the source
Each record must have a date stamp indicating creation / modification date
Dates provide a mechanism for incremental and continuous transfer of metadata by only requesting records that have changed since the previous harvest
Flow control: resumption tokens can be used to return partial results; the client is issued a token which may be presented to the server to receive more results
Multiple formats of metadata are allowed
Examples: Dublin Core, DLESE IMS
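The flow-control policy can be sketched as a loop that keeps presenting the token until the server stops issuing one. This is an illustration under stated assumptions: the responses use the OAI-PMH 2.0 namespace and element layout, and `harvest_identifiers`, `page`, and `fake_fetch` are hypothetical helpers, with `fake_fetch` standing in for a real HTTP request.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # assumed OAI-PMH 2.0 namespace


def harvest_identifiers(fetch: Callable[[Dict[str, str]], str],
                        first_request: Dict[str, str]) -> Iterator[str]:
    """Yield record identifiers, following resumption tokens until none is returned."""
    params = dict(first_request)
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:  # missing or empty token: the list is complete
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": first_request["verb"], "resumptionToken": token.strip()}


def page(ids, token=""):
    """Build a small response document for the stand-in server below."""
    headers = "".join(f"<header><identifier>{i}</identifier></header>" for i in ids)
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
            f"{headers}<resumptionToken>{token}</resumptionToken>"
            "</ListIdentifiers></OAI-PMH>")


def fake_fetch(params: Dict[str, str]) -> str:
    """Stand-in for an HTTP request: serves two pages of made-up results."""
    if params.get("resumptionToken") == "page2":
        return page(["oai:demo:3"])
    return page(["oai:demo:1", "oai:demo:2"], token="page2")


request = {"verb": "ListIdentifiers", "from": "2002-04-01"}
print(list(harvest_identifiers(fake_fetch, request)))
```

The `from` date in the first request is what makes the harvest incremental: a harvester would set it to the date of its previous harvest.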
Sets
OAI-PMH mechanism to allow for harvesting of subcollections
Semantics for sets are defined outside of the protocol
Sets are defined by conventions established between data and service providers
Example sets within DLESE might be: DWEL, COMET, LDEO, etc.
Example sets within the NSDL might be: DLESE, DLESE:DWEL, DLESE:COMET, DLESE:LDEO, etc.
Sets can be established that enable querying (e.g. by topic, author name, subject area, etc.)
Example: The Open Digital Library (Suleman, 2001)
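The colon-separated names in the NSDL example suggest a hierarchy in which DLESE:DWEL is a sub-set of DLESE. A minimal sketch of that reading follows; the `in_set` helper, the record names, and the "OTHER" set are hypothetical, and since set semantics are defined outside the protocol, a given provider may organize its sets differently.

```python
def in_set(record_set: str, requested_set: str) -> bool:
    """True if record_set equals requested_set or is one of its descendants."""
    record_path = record_set.split(":")
    requested_path = requested_set.split(":")
    return record_path[:len(requested_path)] == requested_path


# Made-up records mapped to the set each one belongs to.
records = {
    "rec-1": "DLESE:DWEL",
    "rec-2": "DLESE:COMET",
    "rec-3": "OTHER",
}

dlese = sorted(r for r, s in records.items() if in_set(s, "DLESE"))
dwel = sorted(r for r, s in records.items() if in_set(s, "DLESE:DWEL"))
print(dlese, dwel)
```

Comparing path segments rather than string prefixes keeps a set such as "DLESEX" from being mistaken for part of "DLESE".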
3. Requirements to be a Data Provider
Source of metadata: human or automated resource catalogers
Metadata mappings: crosswalks from native formats to DC or other formats
Server technology
Handled by the OAI software
Datestamps
Deletions
Unique identifiers
4. The OAI-PMH
Service Requests
Identify
ListMetadataFormats
ListSets
GetRecord
ListIdentifiers
ListRecords
Date Ranges
Resumption Tokens
Identify
Purpose: Return general information about the archive and its policies
Parameters: None
Sample URL: http://oai.dlese.org/provider?verb=Identify
ListMetadataFormats
Purpose: List metadata formats supported by the archive as well as their schema locations and namespaces
Parameters ( R = required, O = optional, X = exclusive ):
identifier – for a specific record ( O )
Sample URL: http://oai.dlese.org/provider?verb=ListMetadataFormats
ListSets
Purpose: Provide a hierarchical listing of sets in which records may be organized
Parameters: None
Sample URL: http://oai.dlese.org/provider?verb=ListSets
GetRecord
Purpose: Returns the metadata for a single identifier in the form of an OAI record
Parameters:
identifier – id for the record ( R )
metadataPrefix – metadata format ( R )
Sample URL: http://oai.dlese.org/provider?verb=GetRecord&identifier=dlese%3ADLESE-000-000-000-002&metadataPrefix=dlese_ims
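The `%3A` in the sample URL is the URL-encoded form of the colon in the record identifier. A short sketch of building such a request URL with Python's standard library, so that the escaping is handled automatically; the `get_record_url` helper is a made-up name for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build a GetRecord request URL, escaping reserved characters in the arguments."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


url = get_record_url(
    "http://oai.dlese.org/provider",
    "dlese:DLESE-000-000-000-002",  # the colon becomes %3A in the URL
    "dlese_ims",
)
print(url)

# Decoding the query string recovers the original identifier.
decoded = parse_qs(urlsplit(url).query)
print(decoded["identifier"][0])
```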
ListIdentifiers
Purpose: List all unique identifiers corresponding to the records in the repository
Parameters:
from – start date ( O )
until – end date ( O )
resumptionToken – flow control mechanism ( X )
Sample URL: http://oai.dlese.org/provider?verb=ListIdentifiers
ListRecords
Purpose: Retrieves metadata for multiple records
Parameters:
from – start date ( O )
until – end date ( O )
resumptionToken – flow control mechanism ( X )
set – set to harvest from ( O )
metadataPrefix – metadata format ( R )
Sample URL: http://oai.dlese.org/provider?verb=ListRecords&metadataPrefix=dlese_ims
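The parameter markers can be turned into a small argument check. The sketch below reads ( R ) as required, ( O ) as optional, and ( X ) as exclusive, meaning the resumptionToken is sent without the other arguments; that reading is an assumption based on how the OAI-PMH specification describes its arguments, and the function name `list_records_params` is hypothetical.

```python
from typing import Dict, Optional


def list_records_params(metadata_prefix: Optional[str] = None,
                        from_date: Optional[str] = None,
                        until: Optional[str] = None,
                        set_spec: Optional[str] = None,
                        resumption_token: Optional[str] = None) -> Dict[str, str]:
    """Build ListRecords arguments following the ( R ), ( O ), ( X ) markers."""
    others = {"metadataPrefix": metadata_prefix, "from": from_date,
              "until": until, "set": set_spec}
    given = {k: v for k, v in others.items() if v is not None}
    if resumption_token is not None:
        # ( X ): the token replaces every other argument.
        if given:
            raise ValueError("resumptionToken is exclusive; drop " + ", ".join(given))
        return {"verb": "ListRecords", "resumptionToken": resumption_token}
    # ( R ): a first request must name the metadata format.
    if "metadataPrefix" not in given:
        raise ValueError("metadataPrefix is required")
    # ( O ): a date range, when both ends are given, must be ordered.
    if from_date and until and from_date > until:
        raise ValueError("from must not be later than until")
    return {"verb": "ListRecords", **given}


print(list_records_params(metadata_prefix="dlese_ims", from_date="2002-04-01"))
print(list_records_params(resumption_token="abc"))
```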
DLESE Architecture
[Architecture diagram; labels: DLESE Portal, Library Users, Search & Discovery, Services (e.g. What’s New), Metadata Repository, NSDL, Direct Entry, Resources, Collections, with three connections labeled OAI]
References
1. “Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives,” Hussein Suleman ([email protected]), JCDL 2001 Tutorial.
2. “A Framework for Building Open Digital Libraries,” Hussein Suleman and Edward A. Fox, in D-Lib Magazine, December, 2001. http://www.dlib.org/dlib/december01/suleman/12suleman.html
3. The Open Archives Initiative, http://www.openarchives.org