
UC Digital Library Forum
August 5, 2002
UCLA Digital Library
http://digital.library.ucla.edu
Presenter: Curtis Fornadley
Senior Programmer/Analyst UCLA Library
[email protected]
• Hardware and Software Architecture
• Project Architecture
• What is the Open Archives Initiative?
• The OAI Sheet Music Harvester
UCLA Digital Library Hardware Architecture
UCLA Digital Library Software Architecture
- Java 2 Enterprise Edition (J2EE) (v1.3-1.4)
- Oracle 8i (9i Fall 2002)
- Oracle Intermedia Tool Kit
- JRun Application Server (v3.1)
- XML, XSLT
- MS Access – for Metadata collection
- Microsoft NT4 and Win 2000
Digital Library Projects
Web based applications to search and present digital content and
metadata.
Project Type   Production   Development
Text           4            3+2
Image          5            3
Audio          0            1 (planned WQ 03)
Video          0            None planned
All projects share similar design patterns
Combining Text (XML) and Format (XSLT) to Create HTML
Archive of Popular American Music (APAM)
• APAM contains ~ 450,000 pieces of Sheet Music
• Metadata collected in UCLA Core. No pre-existing Metadata
• Content is digitized in house (about 850 sheets so far)
• Sheet music hosted as a PDF file.
• All covers and PDFs are hosted from the Oracle DB as BFILEs
• Dynamic sizing of Cover images through Oracle InterMedia Tools.
• http://digital.library.ucla.edu/sheetmusic
• In production, last updated March 2002.
The basis for the OAI Sheet Music Harvester Project
Open Archives Initiative
Protocol for Metadata Harvesting
(OAI Version 2.0)
“The OAI protocol facilitates metadata harvesting”
http://www.openarchives.org
OAI Requests and Responses
OAI Requests and Responses use HTTP - “just like the web”
OAI Requests
Use either the HTTP GET or POST methods.
OAI Responses
Formatted as HTTP Responses.
Every OAI Response is valid XML
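
As a rough illustration of the point above, an OAI request sent with HTTP GET is just a URL with the verb and its arguments in the query string. The sketch below is in Python for brevity; the base URL and the set name "sheetmusic" are hypothetical placeholders, not addresses or names from this project.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL; a real harvester would use the
# address published by each data provider.
BASE_URL = "http://example.org/oai"


def build_oai_request(base_url, verb, **arguments):
    """Return the URL of an HTTP GET request for one OAI verb."""
    # The verb goes first, followed by the other arguments in sorted
    # order so the resulting URL is deterministic.
    query = [("verb", verb)] + sorted(arguments.items())
    return base_url + "?" + urlencode(query)


# "sheetmusic" is an assumed set name used only for illustration.
url = build_oai_request(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", set="sheetmusic"
)
print(url)
```

The same arguments could instead be sent in the body of an HTTP POST; either way the repository answers with an XML document.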
Important OAI “Verbs”
The meat of the OAI is six “verbs” issued in a request to harvest metadata.
1) GetRecord - to retrieve an individual record
2) Identify - to retrieve information about a repository
3) ListIdentifiers - to retrieve the identifiers of records that can be harvested from a repository.
4) ListMetadataFormats - to retrieve the metadata formats available from a repository. Dublin Core (DC) is the minimum requirement.
5) ListRecords - to harvest records from a repository.
6) ListSets - to retrieve the set structure in a repository.
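
Each of the six verbs takes its own arguments under OAI-PMH 2.0. The following is a minimal sketch, in Python, of how a harvester might check a request before sending it. The function name and the error wording are illustrative assumptions, and the resumptionToken argument (used to continue a partial list) is deliberately left out to keep the example short.

```python
# Required and optional arguments for each verb, per OAI-PMH 2.0.
# resumptionToken is omitted from this simplified table.
VERB_ARGUMENTS = {
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "Identify": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListSets": (set(), set()),
}


def check_request(verb, arguments):
    """Return a list of problems with a request; empty means none found."""
    if verb not in VERB_ARGUMENTS:
        return ["unknown verb: " + verb]
    required, optional = VERB_ARGUMENTS[verb]
    given = set(arguments)
    problems = []
    for name in sorted(required - given):
        problems.append("missing argument: " + name)
    for name in sorted(given - required - optional):
        problems.append("unexpected argument: " + name)
    return problems


# GetRecord needs both an identifier and a metadataPrefix.
print(check_request("GetRecord", {"identifier": "oai:example.org:1"}))
```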
Important OAI “Nouns”
Repository - a server to which OAI protocol requests can be submitted. The
repository outputs metadata in the form of a record.
Record - an XML-encoded byte stream that is returned by a repository in
response to an OAI request for metadata from an item in that repository.
At a minimum, repositories must be able to return records with metadata expressed
in unqualified Dublin Core.
Set - A construct for grouping items in a repository for the purpose of selective
harvesting of records.
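
To make the record and set concepts concrete, here is a small Python sketch that reads one record from a GetRecord response and pulls out its header and unqualified Dublin Core fields. The XML namespaces are those defined by OAI-PMH 2.0 and Dublin Core; the sample response itself (identifier, datestamp, set name, title, creator) is invented for illustration and is not taken from any real repository.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response; every value below is made up.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-08-05T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:sheetmusic-1"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:sheetmusic-1</identifier>
        <datestamp>2002-03-15</datestamp>
        <setSpec>sheetmusic</setSpec>
      </header>
      <metadata>
        <oai_dc:dc
            xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Waltz</dc:title>
          <dc:creator>Composer, Example</dc:creator>
          <dc:type>Sheet music</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def parse_record(xml_text):
    """Extract the header and Dublin Core fields of the first record."""
    root = ET.fromstring(xml_text)
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    parsed = {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "dc": {},
    }
    dc_root = record.find("oai:metadata/oai_dc:dc", NS)
    for element in dc_root:
        # Tags look like "{namespace}title"; keep only the local name.
        name = element.tag.split("}", 1)[1]
        # Dublin Core elements are repeatable, so collect values in lists.
        parsed["dc"].setdefault(name, []).append(element.text)
    return parsed


record = parse_record(SAMPLE_RESPONSE)
print(record["identifier"], record["sets"], record["dc"]["title"])
```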
Sheet Music OAI Data Providers
• UCLA – Currently online (Java)
• Library of Congress - Currently online
• Johns Hopkins University – any day now
• Indiana University - September 2002
• Duke – within the next 12 months
• Brown – within the next 12 months
Each participating institution is responsible for creating its own
OAI-compliant sheet music repository.
Major hurdles to becoming a Data Provider:
-Programming
-Data Mapping
High Level Design of OAI Sheet Music Service Provider
OAI Sheet Music Project
Development Goals and Challenges
• Leveraging UIUC Harvester code
• Challenge of reverse engineering and extending code
• Being flexible - combine relational and XML text indexing
• Performance vs. Functionality: an on-going challenge
• Testing of 0.1 Service Provider – August 2002
• Debut of the pilot - late Fall 2002
Hypothetical User Interface for Sheet Music Service Provider
The biggest challenge is to create a Service Provider
that extends the usable services offered to users.
Conceptualize -> Design -> Implement
Summary
John Ober’s Charge: “Discuss architecture and standards used in
projects and the technical challenges yet to be faced.”
Challenges:
• Metadata collection – Automated vs. Manual
• Meeting infrastructure storage needs: Online – Nearline - Backup
Personal Challenges and Thoughts:
• Many challenges are not technical
• Developing a personal filter on information
• Risk assessment: when is the right time to adopt a new technology
• Surface knowledge vs. Deep understanding. Islands of knowledge
• No stable body of knowledge to turn to for advice or help