OAI-PMH at Yale

Report on the
DLF OAI Training Session
November 10, 2005
Charlottesville, VA
Overview
• Review of the protocol
• OAI best practices
• Potential Yale applications
• Next steps for the Metadata Committee
OAI-PMH v.2.0
Basic Concepts
• Data provider: administers systems
that expose metadata
• Service provider: uses metadata to
build value-added services
• Harvester: a client application that
issues OAI-PMH requests
• Repository: a network accessible server
that can process OAI-PMH requests
OAI-PMH v.2.0
Basic Concepts
• Resource: the physical or digital object
that metadata is "about"
• Item: a constituent of a repository from
which metadata about a resource can
be disseminated
• Record: metadata in a specific format
• Identifier: a unique identifier that
unambiguously identifies an item in a
repository; must conform to URI syntax
OAI-PMH v.2.0
Harvesting
• Deleted records
• Sets
• Datestamps
– ISO 8601
– UTC
• Selective harvesting
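A minimal sketch of selective harvesting, assuming a hypothetical base URL (http://example.org/oai) and set name (maps). The argument names from, until, set, and metadataPrefix come from the protocol; the function names are illustrative only.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def utc_datestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC datestamp (seconds granularity)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def selective_harvest_url(base_url, metadata_prefix, since=None, until=None, set_spec=None):
    """Build a ListRecords request restricted by datestamp range and/or set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = utc_datestamp(since)
    if until is not None:
        params["until"] = utc_datestamp(until)
    if set_spec is not None:
        params["set"] = set_spec
    # urlencode percent-encodes reserved characters such as ":" in the datestamp.
    return base_url + "?" + urlencode(params)


# Hypothetical repository and set, for illustration only.
BASE_URL = "http://example.org/oai"
since = datetime(2005, 11, 10, 9, 30, 0, tzinfo=timezone.utc)
url = selective_harvest_url(BASE_URL, "oai_dc", since=since, set_spec="maps")
print(url)
```

A harvester would typically store the datestamp of its last successful harvest and pass it as from on the next run, so that only new, changed, or deleted records come back.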
OAI-PMH v.2.0
Protocol Features: HTTP
• Request
– GET baseURL?key=value&…&key=value
– POST baseURL
Content-Type: application/x-www-form-urlencoded
Content-Length: number of bytes in the message body
key=value&…&key=value
• Response
– XML document in message body or error code
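The two request styles above can be sketched with Python's standard library. The base URL and the identifier are hypothetical, and nothing is sent over the network: the code only constructs the request objects.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(base_url, **arguments):
    """GET: the key=value pairs travel in the query string."""
    return Request(base_url + "?" + urlencode(arguments), method="GET")


def build_post(base_url, **arguments):
    """POST: the same key=value pairs travel form-encoded in the message body."""
    body = urlencode(arguments).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),  # length of the body in bytes
    }
    return Request(base_url, data=body, headers=headers, method="POST")


# Hypothetical repository and item identifier, for illustration only.
BASE_URL = "http://example.org/oai"
get_request = build_get(BASE_URL, verb="Identify")
post_request = build_post(
    BASE_URL,
    verb="GetRecord",
    identifier="oai:example.org:123",
    metadataPrefix="oai_dc",
)
print(get_request.full_url)
print(post_request.data.decode("ascii"))
```

Passing either object to urllib.request.urlopen would issue the request; POST is mainly useful when the arguments would make a GET URL too long.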
OAI-PMH v.2.0
Protocol Features: XML Response
• XML declaration
<?xml version="1.0" encoding="UTF-8"?>
• OAI-PMH root element with these attributes:
– Default namespace declaration
xmlns="http://www.openarchives.org/OAI/2.0/"
– Schema instance declaration
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
– Schema location
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"
OAI-PMH v.2.0
Protocol Features: XML Response
• responseDate element
– YYYY-MM-DDThh:mm:ssZ
• request element
<request key="value" key="value"
key="value">baseURL</request>
• response element
– It has the same name as the verb used in
the request.
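To make the response layout concrete, the sketch below parses a small hand-written Identify response and checks the three parts described above: responseDate, request, and the element named after the verb. The sample document, repository name, and base URL are made up for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hand-written sample response; the repository details are hypothetical.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2005-11-10T14:30:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
  </Identify>
</OAI-PMH>
"""


def summarize_response(xml_bytes):
    """Pull out the common parts of an OAI-PMH response."""
    root = ET.fromstring(xml_bytes)
    ns = {"oai": OAI_NS}
    request = root.find("oai:request", ns)
    verb = request.get("verb")
    # The payload element carries the same name as the verb in the request.
    payload = root.find(f"oai:{verb}", ns) if verb else None
    return {
        "root": root.tag,
        "responseDate": root.findtext("oai:responseDate", namespaces=ns),
        "baseURL": request.text,
        "verb": verb,
        "has_verb_element": payload is not None,
    }


summary = summarize_response(SAMPLE_RESPONSE)
print(summary)
```

Because the root element declares a default namespace, every lookup has to be namespace-qualified; forgetting this is a common reason a parser appears to find nothing.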
OAI-PMH v.2.0
Protocol Features
• Multiple metadata formats
– metadataPrefix
• Flow control
– resumptionToken
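Flow control with resumptionToken can be sketched as a loop: request a page, process it, and repeat with the token until the repository returns an empty one. The base URL, token value, and canned pages below are invented so the loop can run without a network; fetch is the only function that would contact a real repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    """Default page fetcher: issue the HTTP request and return the raw bytes."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix, fetch_page=fetch):
    """Yield every identifier, following resumption tokens until the list is complete."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch_page(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # a missing or empty token marks the end of the list
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


def _page(identifiers, token):
    """Build a canned response page (illustration only)."""
    headers = "".join(
        f"<header><identifier>{i}</identifier></header>" for i in identifiers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"{headers}<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    ).encode("utf-8")


# Two canned pages keyed by the URL the harvester is expected to request.
PAGES = {
    "http://example.org/oai?verb=ListIdentifiers&metadataPrefix=oai_dc":
        _page(["oai:example.org:1", "oai:example.org:2"], "token-2"),
    "http://example.org/oai?verb=ListIdentifiers&resumptionToken=token-2":
        _page(["oai:example.org:3"], ""),
}

harvested = list(
    harvest_identifiers("http://example.org/oai", "oai_dc", fetch_page=PAGES.__getitem__)
)
print(harvested)
```

Injecting the fetcher keeps the paging logic testable; a production harvester would also need error handling and retries, which are omitted here.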
OAI-PMH v.2.0
Requests & Responses
• GetRecord
• Identify
• ListIdentifiers
• ListMetadataFormats
• ListRecords
• ListSets
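As a quick reference for the six verbs, the sketch below records which arguments each one takes and checks a request against that table. The table reflects the protocol's argument rules as understood here; the function name and message wording are illustrative, though badVerb and badArgument are the protocol's own error codes.

```python
# Arguments per verb. resumptionToken is handled separately because it is exclusive.
VERB_ARGUMENTS = {
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set()},
    "Identify": {"required": set(), "optional": set()},
    "ListIdentifiers": {"required": {"metadataPrefix"}, "optional": {"from", "until", "set"}},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListRecords": {"required": {"metadataPrefix"}, "optional": {"from", "until", "set"}},
    "ListSets": {"required": set(), "optional": set()},
}

# Verbs whose responses may be split into pages.
RESUMABLE = {"ListIdentifiers", "ListRecords", "ListSets"}


def check_request(verb, arguments):
    """Return a list of problems with a request; an empty list means it looks valid."""
    if verb not in VERB_ARGUMENTS:
        return [f"badVerb: {verb!r} is not an OAI-PMH verb"]
    names = set(arguments)
    if "resumptionToken" in names:
        if verb not in RESUMABLE:
            return [f"badArgument: resumptionToken is not allowed with {verb}"]
        if names != {"resumptionToken"}:
            return ["badArgument: resumptionToken must be the only argument"]
        return []
    spec = VERB_ARGUMENTS[verb]
    problems = [
        f"badArgument: missing {name}" for name in sorted(spec["required"] - names)
    ]
    problems += [
        f"badArgument: unexpected {name}"
        for name in sorted(names - spec["required"] - spec["optional"])
    ]
    return problems


print(check_request("GetRecord", {"identifier": "oai:example.org:1"}))
```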
Current Work: Resource Harvesting
within the OAI-PMH Framework
• Datestamps
– Updated record vs. updated resource
• Locating the resource
– Multiple URLs: splash page, resource,
etc.
– Multiple elements used inconsistently:
dc.identifier, dc.format, dc.relation
Current Work: Resource Harvesting
within the OAI-PMH Framework
• Complex object formats
– FOXML
– METS
– MPEG-21 DID
– SCORM
• Other implementations
– mod_oai
OAI Best Practices
DLF OAI Implementers Workshop
Handouts from the session
1. Project Abstract
2. The Case for OAI
3. OAI “Cheat Sheet”: A Taxonomy of Rapid OAI
Deployment Strategies
4. Summary of OAI Metadata Best Practices
5. Summary of the DLF Aquifer MODS Profile
6. OAI Tools
7. OAI Implementation: Administrative Planning
OAI Best Practices
Implementation Decisions
• Collections
– Develop criteria. Prioritize according to
ease of implementation, associated risk,
logical dependencies among items, etc.
• Metadata formats
– Decide which formats to support.
• Technical infrastructure
– E.g., use a gateway that provides a base
URL for multiple individual collections.
OAI Best Practices
Deployment Options
• Emory’s Metadata Migrator
• Static repositories
• UIUC’s OAI FileMakerPro Gateway
• Fedora
• Luna Insight
OAI Best Practices
for Data Providers
• Identifiers
– Should be persistent & unique.
– Should not be reused.
– Specification and XML Schema
• Datestamps
– Use UTC.
– Support seconds granularity, if possible.
• Deleted records
– Provide persistent support, if possible.
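From the harvester's side, deleted-record support shows up as a status="deleted" attribute on the record header. The sketch below separates live from deleted identifiers in a hand-written ListIdentifiers response; the identifiers and datestamps are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written sample; identifiers and datestamps are hypothetical.
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2005-11-01T12:00:00Z</datestamp>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2005-11-09T08:15:00Z</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def split_by_status(xml_bytes):
    """Return (live, deleted) identifier lists from a response's record headers."""
    live, deleted = [], []
    for header in ET.fromstring(xml_bytes).iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)  # the service provider should drop this record
        else:
            live.append(identifier)
    return live, deleted


live, deleted = split_by_status(SAMPLE)
print("live:", live)
print("deleted:", deleted)
```

This only works if identifiers are never reused: a service provider removes whatever it previously stored under the deleted identifier.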
OAI Best Practices
for Data Providers
• Resumption tokens
– For repositories > 2 MB
• Sets
– Service providers harvest by set.
– How should sets be organized?
• About containers
– Rights
– Provenance (for 3rd party aggregators)
Implementation Guidelines
Includes:
Guidelines for Repository Implementers
Guidelines for Harvester Implementers
OAI Validation
• Reap: OAI command line harvesting
• Repository explorer: for data providers
& service providers to test harvesting &
searching
• W3C validator for XML schema
• Utf8conditioner: for character encoding
problems
• See OAI Tools handout for more info.
OAI Best Practices
for Shareable Metadata
The four C’s of shareable metadata
• Consistency
• Coherence
• Context
• Conformance
OAI Best Practices
for Shareable Metadata
• Metadata in a shared environment
– Context & coherence
– Don’t assume a local user.
• Granularity of description
– Appropriate for access to the resource
– Don’t expose records for subordinate items.
• Use of multiple metadata formats
– Need to be expressed as XML schema
– Stepped crosswalking to simpler formats.
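Stepped crosswalking can be pictured as chaining simple field mappings, richest format first. The sketch below is an assumption-laden illustration: the local field names, the sample record, and both mapping tables are invented, and the MODS-to-Dublin-Core pairs are a small simplified subset, not an official crosswalk.

```python
# Hypothetical local schema -> MODS element paths (illustration only).
LOCAL_TO_MODS = {
    "main_title": "titleInfo/title",
    "photographer": "name/namePart",
    "date_taken": "originInfo/dateIssued",
    "usage_terms": "accessCondition",
}

# Simplified subset of a MODS -> simple Dublin Core mapping (illustration only).
MODS_TO_DC = {
    "titleInfo/title": "title",
    "name/namePart": "creator",
    "originInfo/dateIssued": "date",
    "accessCondition": "rights",
}


def crosswalk(record, mapping):
    """Map a record (field -> list of values) into a target format's fields."""
    target_record = {}
    for source_field, values in record.items():
        target_field = mapping.get(source_field)
        if target_field is None:
            continue  # no equivalent in the target format, so the detail is dropped
        target_record.setdefault(target_field, []).extend(values)
    return target_record


# Made-up local record.
local_record = {
    "main_title": ["View of Old Campus"],
    "photographer": ["Unknown"],
    "shelf_location": ["Box 12"],
}

mods_record = crosswalk(local_record, LOCAL_TO_MODS)  # step 1: local -> MODS
dc_record = crosswalk(mods_record, MODS_TO_DC)        # step 2: MODS -> simple DC
print(dc_record)
```

Each step loses some detail (here, the shelf location), which is one reason to expose the richer format alongside the simpler one.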
OAI Best Practices
for Shareable Metadata
• Relating versions of a resource
– One-to-One Principle
– Multiple strategies/compromises
• Document metadata creation practices
– In OAI responses
– In external documentation
• Communication with service providers
Potential Applications at Yale
Implementation Goals
• Improve user experience
– Federated search
• Improve management of resources
– Finding aids
• Collaborate with institutional partners
– AMEEL
• Develop digital library infrastructure
– At Yale and beyond
Potential Applications at Yale
Resources & Roles
• Resources
– Commitment of stakeholders
– Analysis of deployment options
– Server infrastructure
– Staff hours
• Roles
– OAI-PMH Implementation Manager
– Programmers & technical staff
– Metadata specialists
– Digital collection curators
Potential Applications at Yale
Sharing Metadata
• 3rd Party Aggregators
– OAIster
– DLF Portal
– MODS Portal
• Registries
– Registered OAI repositories
– Institutional Archives Registry
– OAI Registry at UIUC
Next Steps for the
Metadata Committee
• Centralized implementation at Yale?
If yes,
– Relate to other digital library initiatives.
– Create buy-in.
• Service provider needs
– Consult with IAC committees.
• Data provider needs
– Consult with digital collection curators.
Next Steps for the
Metadata Committee
• Metadata recommendations
– Recommend multiple formats
– Decide upon a common format
• YES? MODS?
• Stepped crosswalking from other formats
– Content & encoding guidelines
– Metadata creation tools
– Staffing