The Fedora Project

Tim Sigmon
University of Virginia
JA-SIG Winter Conference
December 9, 2003
This Fedora Project is not the
Red Hat Fedora project.
The Fedora Project
• Fedora Digital Object Repository System
– Extensible digital object model
– Repository System exposed via Web service APIs
– Scalable, persistent storage for content and metadata
– Local and remote content
– Associate services with objects
– Content versioning
• Fedora Use cases
– Content Management (CMS)
– Digital Library architecture
– Digital Asset Management
– Institutional Repository
– Scholarly publishing
– Preservation
• Open source software
Priorities for digital libraries
• Managing digital resources as if they
are all the same
• Delivering digital resources as if they
are all unique and free to participate in
any number of contexts
• Supporting digital scholarship wherever
it may lead
Shortcomings of commercial digital
library products
• Narrow focus on specific media formats (e.g.
image databases, document management)
• Fail to effectively address interrelationships
among digital entities
• Fail to address interoperability
• Fail to provide facilities for managing programs
and tools that deliver digital content.
• Not extensible; do not enable easy integration of
new tools and services
Fedora History
• Research (1997-present) :
– DARPA and NSF-funded research project at Cornell
(Carl Lagoze and Sandy Payette)
– Reference implementation developed at Cornell
• First Application (1999-2001) :
– University of Virginia digital library prototype
(Thorny Staples and Ross Wayland)
– Scale/stress testing for 10,000,000 objects
• Open Source Software (2002-present):
– Andrew W. Mellon Foundation granted Virginia and Cornell $1
million to develop a production-quality Fedora system
– Fedora 1.0 released in May 2003
– www.fedora.info
Fedora 1.x
• Architecture
• Software
• Release 1.2 Features
• Demo Use Cases
Digital Object Model
Architectural View
• Persistent ID (PID)
– Globally unique persistent id
• Disseminators
– Public view: access methods for obtaining “disseminations” of digital object content
• System Metadata
– Internal view: metadata necessary to manage the object
• Datastreams
– Protected view: content that makes up the “basis” of the object
Digital Object Model
Example Disseminators
• Persistent ID (PID)
• Disseminators
– Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
– Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
• System Metadata
• Datastreams
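
The four parts of the object model can be sketched as plain data classes. This is an illustration only, not Fedora's internal representation: the class names, field names, the PID "demo:image-1", and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """Protected view: one piece of content that forms the basis of the object."""
    ds_id: str
    mime_type: str
    location: str  # hypothetical: a local path or a remote URL


@dataclass
class Disseminator:
    """Public view: a named group of access methods."""
    name: str
    methods: list[str]


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent id
    system_metadata: dict[str, str] = field(default_factory=dict)  # internal view
    datastreams: dict[str, Datastream] = field(default_factory=dict)
    disseminators: dict[str, Disseminator] = field(default_factory=dict)

    def list_methods(self) -> list[str]:
        """Return every access method as 'disseminator/method', in insertion order."""
        return [
            f"{d.name}/{m}"
            for d in self.disseminators.values()
            for m in d.methods
        ]


# Build an image object that carries the Simple Image disseminator from the slide.
obj = DigitalObject(pid="demo:image-1", system_metadata={"state": "active"})
obj.datastreams["THUMB"] = Datastream("THUMB", "image/jpeg", "http://example.org/thumb.jpg")
obj.disseminators["Simple Image"] = Disseminator(
    "Simple Image", ["Get Thumbnail", "Get Medium", "Get High", "Get VeryHigh"]
)
print(obj.list_methods())
```

Keeping disseminators separate from datastreams mirrors the public/protected split: a client asks for a method by name and never needs to know which datastream backs it.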
Object Behavior Contracts
• Behavior Definition Object
– Persistent ID (PID)
– System Metadata
– Datastreams: Behavior Definition Metadata
• Data Object
– Persistent ID (PID)
– Disseminators
– System Metadata
– Datastreams
• Behavior Mechanism Object
– Persistent ID (PID)
– System Metadata
– Datastreams: Service Binding Metadata (WSDL)
[Diagram: the objects are linked by a “behavior contract”; the Behavior Mechanism Object connects to a Web Service.]
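
One way to read the behavior contract is that a mechanism must bind every method its behavior definition declares. The sketch below checks that condition. It rests on that assumption and is not Fedora's validation logic; the class names, PIDs, and endpoint URL are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BehaviorDefinition:
    pid: str
    methods: frozenset[str]  # abstract method names the contract promises


@dataclass
class BehaviorMechanism:
    pid: str
    bdef_pid: str  # the behavior definition this mechanism subscribes to
    bindings: dict[str, str]  # method name -> endpoint (stands in for WSDL binding metadata)


def unmet_methods(bdef: BehaviorDefinition, bmech: BehaviorMechanism) -> list[str]:
    """Return the contract methods the mechanism does not bind, sorted by name."""
    if bmech.bdef_pid != bdef.pid:
        raise ValueError("mechanism subscribes to a different behavior definition")
    return sorted(bdef.methods - set(bmech.bindings))


image_bdef = BehaviorDefinition("demo:bdef-image", frozenset({"getThumbnail", "getMedium"}))
image_bmech = BehaviorMechanism(
    "demo:bmech-image",
    "demo:bdef-image",
    {"getThumbnail": "http://example.org/image-service?size=thumb"},
)
print(unmet_methods(image_bdef, image_bmech))
```

An empty result means the mechanism satisfies the contract; here `getMedium` is reported because it has no binding.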
DEMO: Basic Use Cases
Image (multiple datastreams)
Image (MrSID)
EAD (Rita Mae Brown papers)
Text conversion (TEI to PDF)
Basic Search
Users access data objects through
behaviors (or disseminations).
[Diagram: Users and an Application reach the object (PID) through its Behaviors; each Datastream points to a File. Other labels: Objects, services, Dynamic data.]
Managers have direct access to each
component of a data object.
[Diagram: Managers connect directly to the PID, each Behavior, and each Datastream with its File. Other labels: Behavior Definition, Behavior Mechanism.]
Fedora and Web Services
• Fedora Repository system is a web service
– Access/Search (API-A) and Management (API-M)
– Service descriptions published using WSDL
– Both SOAP and HTTP bindings
• Back-end services
– Digital object behaviors implemented as linkages to
other distributed web services
– Service binding metadata (WSDL) stored in special
Fedora Behavior Mechanism objects.
– Fedora acts as mediator to these services.
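
The mediator role can be sketched as a lookup followed by a dispatch: the repository finds the service bound to a behavior, hands it the object's content, and returns the result. This is a minimal sketch in which plain Python functions stand in for distributed web services; the `Mediator` class, its method names, and the PID are hypothetical.

```python
from typing import Callable

Service = Callable[[bytes], bytes]


def uppercase_service(content: bytes) -> bytes:
    """Stand-in for a remote content transform service."""
    return content.upper()


class Mediator:
    """Holds object content and dispatches behavior requests to bound services."""

    def __init__(self) -> None:
        self._content: dict[str, bytes] = {}
        self._bindings: dict[tuple[str, str], Service] = {}

    def ingest(self, pid: str, content: bytes) -> None:
        self._content[pid] = content

    def bind(self, pid: str, method: str, service: Service) -> None:
        # In Fedora this linkage would come from service binding metadata (WSDL).
        self._bindings[(pid, method)] = service

    def disseminate(self, pid: str, method: str) -> bytes:
        try:
            service = self._bindings[(pid, method)]
        except KeyError:
            raise LookupError(f"{pid} has no behavior named {method}") from None
        # The client never calls the back-end service directly.
        return service(self._content[pid])


repo = Mediator()
repo.ingest("demo:1", b"hello")
repo.bind("demo:1", "getUpper", uppercase_service)
print(repo.disseminate("demo:1", "getUpper"))
```

Because clients only name a PID and a method, the back-end service can be replaced without changing any client.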
Fedora Repository System
Client and Web Service Interactions
[Diagram: on the Frontend, users with web browsers and client applications call the Fedora Repository System. On the Backend, the repository's Web Service Dispatch calls back-end web services, including Content Transform Services.]
Fedora Repository
Service Interfaces
• Management Service (API-M)
– Ingest - XML-encoded object submission
– Create - interactive object creation via API requests
– Maintain - interactive object modification via API requests
– Validate - application of integrity rules to objects
– Identify - generate unique object identifiers
– Security - authentication and access control
– Preserve - automatic content versioning and audit trail
– Export - XML-encoded object formats
• Access Service (API-A and API-A-LITE)
– Search - search repository for objects
– Object Reflection - what disseminations can the object provide?
– Object Dissemination - request a view of the object’s content
• OAI-PMH Provider Service
– OAI-DC records
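
An Object Dissemination request over the HTTP binding comes down to building a URL from a PID, a behavior definition, and a method name. The helper below is a sketch only: the `/get/PID/bDefPID/method` path layout is an assumption that should be checked against the API-A-LITE documentation for the release in use, and the host, PIDs, method name, and `size` parameter are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode


def dissemination_url(
    base: str,
    pid: str,
    bdef_pid: str,
    method: str,
    params: dict[str, str] | None = None,
) -> str:
    """Build a dissemination URL; the path layout is assumed, not confirmed by the slides."""
    # Keep ':' unescaped so PIDs such as 'demo:5' stay readable in the path.
    parts = [quote(p, safe=":") for p in (pid, bdef_pid, method)]
    url = base.rstrip("/") + "/get/" + "/".join(parts)
    if params:
        url += "?" + urlencode(params)
    return url


print(dissemination_url("http://localhost:8080/fedora", "demo:5", "demo:1", "getThumbnail"))
```

Method parameters, when a behavior takes any, go in the query string under this assumed layout.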
Fedora Repository System
[Architecture diagram. Clients (Client Application, Batch Program, Web Browser, Server Application) reach the Web Service Exposure Layer (Manage, Access, Search, OAI Provider) over HTTP and SOAP. Behind it sit the Management, Security, Access, and Storage Subsystems. Component labels: Session Management, User Authentication, Object Reflection, Component Mgmt, Policy Enforcement, Object Dissemination, Object Validation, Users/Groups, PID Generation, Policies, Object Mgmt, Policy Mgmt. Storage labels: Digital Objects, XML Files, Datastreams, Relational DB. An External Content Retriever reaches External Content Sources over HTTP and FTP. Service labels: Local Service, Remote Service (SOAP).]
Fedora 1.2 Software Feature Set
• Open Fedora APIs
– Repository as web services (REST and SOAP bindings); WSDL interface defs
• Flexible Digital Object Model
– Content View: objects as bundle of items (content and metadata)
– Service View: objects as a set of service methods (“behaviors”)
– Extensible functionality by associating services with objects
• Repository System
– Core Services: Management, Access/Search, OAI-PMH
– Storage: XML object store; relational db object cache; relational db object registry
– Mediation: auto-dispatching to distributed web services for content transformation
– Auto-Indexing: system metadata and DC record of each object
– HTTP Basic Authentication and Access Control
– Built-in disseminator services: XSLT x-form, image manipulation, xml-to-PDF
• Content Versioning
– Automatic version control (saves version of content/metadata when modified)
– Enables date-time stamped API requests (see object as it looked at a point in time)
• Clients
– Fedora Administrator: GUI client to create/maintain objects
– Default Web browser interface: search; access objects via default disseminator
– Command line utilities (batch load, ingest, purge, others)
– Migration Utility: mass export/ingest
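
The Content Versioning feature, where a date-time stamped request sees the object as it looked at that moment, can be sketched as an append-only list of versions searched by timestamp. This is an illustration of the idea rather than Fedora's storage code; the class and method names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedDatastream:
    """Keeps every saved version and answers 'as of' lookups by timestamp."""

    def __init__(self) -> None:
        self._stamps: list[datetime] = []  # kept in increasing order
        self._contents: list[bytes] = []

    def modify(self, content: bytes, when: datetime) -> None:
        """Save a new version instead of overwriting the previous one."""
        if self._stamps and when <= self._stamps[-1]:
            raise ValueError("versions must be saved in increasing time order")
        self._stamps.append(when)
        self._contents.append(content)

    def as_of(self, when: datetime) -> bytes:
        """Return the latest version saved at or before the given time."""
        i = bisect_right(self._stamps, when)
        if i == 0:
            raise LookupError("no version existed at that time")
        return self._contents[i - 1]


ds = VersionedDatastream()
ds.modify(b"first draft", datetime(2003, 5, 1))
ds.modify(b"revised text", datetime(2003, 10, 1))
print(ds.as_of(datetime(2003, 7, 1)))
```

Because the timestamps stay sorted, each lookup is a binary search rather than a scan of all versions.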
Fedora Software Distribution Package
• Open Source (Mozilla Public License)
• 100% Java (Sun Java J2SDK1.4)
• Supporting Technologies
– Apache Tomcat 4.1 and Apache Axis (SOAP)
– Xerces 2-2.0.2 for XML parsing and validation
– Saxon 6.5 for XSLT transformation
– Schematron 1.5 for validation
– MySQL and Mckoi relational database
– Oracle 9i support
• Deployment Platforms
– Windows 2000, NT, XP
– Solaris
– Linux
DEMO: Basic Use Cases
Image (multiple datastreams)
Image (MrSID)
EAD (Rita Mae Brown papers)
Text conversion (TEI to PDF)
Basic Search
Projects using Fedora
• University of Virginia: digital library (images, EAD, e-texts)
• Tufts University: educational (VUE/concept maps); digital
library
• VTLS: basis for new commercial product (library system)
• Indiana University: EVIA Digital Archive (video)
• Northwestern: academic technologies (images, art, video, e-texts)
• Rutgers University: digital library (e-journals, numeric data)
• Yale University: Electronic Records Archive
• New York University: Humanities Computing Group
Fedora Downloads since May 2003
• Total downloads: >1500
• Average downloads per day: 9
• # Countries: 32
• Types of orgs:
– Universities: libraries, IT, departments
– Software and technology companies
– Defense/military
– Banks
– National libraries and archives
– Publishers
– Research labs
– Library automation vendors
– Scholarly societies
Future Software Releases
December 2003 – December 2004
• Fedora Object XML (FOXML)
– Internal storage format; direct expression of Fedora object model
– Better support for relationships (“kinship” metadata)
– Better support for audit trail (event history)
– Format identifiers for dynamic service binding
• Shibboleth authentication
• Policy Enforcement
– XACML expression language
– Fedora policy enforcement module
• Web interface for easy content submission
• Batch object modification utility
• Administrative Reporting
• Object Event History (ABC/RDF disseminations)
• Better support for “collections”
• New ingest and export formats (METS 1.3, DIDL)
Future Development Proposals
• Digital Library in a Box
– Full-featured DL application with “Fedora inside”
– Optimized for common set of content types
• Fedora Power Server
– Integrity Management Tools
– Service and link liveness checker
– Fault Tolerance
– Mirroring and Replication
– Peer-to-peer interoperability features
– Repository clustering
– Load balancing
• Object Creation Tools
– Workflow applications based on content models
– Web interface for document/content submission
Questions?
www.fedora.info