Architecting Extensible Digital Repository Services Robert Chavez, Robert Dockins, Anoop Kumar, Matthew Mcvey, Ranjani Saigal, Nikolai Schwertner Tufts University, Medford, MA Fedora Users Conference, Rutgers University, May.


Architecting Extensible Digital Repository Services
Robert Chavez, Robert Dockins, Anoop Kumar, Matthew Mcvey, Ranjani Saigal, Nikolai Schwertner
Tufts University, Medford, MA
Fedora Users Conference, Rutgers University, May 13, 2005
An Overview
Digital Collections at Tufts
Reasons for developing Tufts Digital Repository (TDR)
Some design requirements and goals
The TDR architecture and services
Applications that interface with TDR
– Tufts Digital Library
– VUE
Future Directions
A Brief History of Digital Collections at Tufts
Pre-existing Digital Projects/Libraries/Collections
– Perseus Digital Library
– Tufts University Science Knowledgebase (TUSK-Medicine)
– Artifact Image Library (Art History)
– Miscellaneous projects: Crime and Punishment, Faculty Publications, Faculty Datasets, many and varied content management systems
Digital Collections and Archives (DCA)
– steward of the University's permanently valuable digital records and collections
– many and varied digital collections
– university records
Why TDR?
Digital collections and materials are continually growing, with content being added in a variety of formats.
Original architectures and systems were not built to accommodate such expansion.
Original architectures and systems were not built to facilitate interoperability or sharing of resources.
Needed a university-wide digital repository that could manage the ever-increasing content while continuing to serve discipline-specific needs and to leverage existing and new tools and services.
Need for DCA to support digital data warehouse services and digital archival storage services for digital content of enduring value.
Who?
Digital Collections and Archives (DCA) and Academic Technology (AT)
– partnered to create a digital repository and digital library application for managing content while supporting teaching and learning at the university.
Roles (a bit over-simplified):
– DCA: content developers, collection and deposit policy creators, managers of the repository
– AT: content developers, application and overall system architects and developers
Design Requirements
Persistence:
– Enforce unique persistent identifiers
– Manage identifiers for multiple projects
– Assurance that the data will be preserved and retrievable over time
Ingest:
– Enforce archival standards
– Ability to incorporate appraisal
– Automated ingest workflow
Management:
– Use of information packages to facilitate storage and dissemination
– Incorporate content models
– Rights/access management
Access/Interoperability:
– Digital resources should be accessible to multiple applications and systems
– Authorization policies must be enforced
Scalability
(Re)Usability
– Leverage existing and new tools and services
Requirements → System Services
Unique and persistent identification of materials → Naming Service
Adherence to the concept of archival information packages (AIP) → Digital Object Provider (DOP) Service (Fedora)
Adherence to the concept of submission information packages (SIP) → Drop Box, Ingestion Service
Adherence to the concept of dissemination information packages (DIP) → DOP Service (Fedora)
Authentication and integrity checking → DOP Service, Ingestion Service
Dissemination → Disseminators, Caching Service, TDL, Search Service
Access → TDL and other applications
TDR Architecture
[Figure: TDR architecture diagram. Components: Drop Box, Ingestion Service, Naming Service, Fedora Repository Service, Fedora Client, Indexing Service, Search Index, Search Service, Search Interface, Caching Service, Interfacing Services, Application Interfaces. Legend: P = Data Provider, A = Administrator, U = User; arrows represent flow of data.]
Services of TDR (Component: Role)
Drop Box and Ingestion Service: validation, preprocessing, appraisal, transfer/deposit
Naming Service: unique persistent identifiers (URNs) mapped to objects; management of URNs; management of repositories; mapping of existing URN schemas to the Fedora schema
Fedora Repository Service: management and access framework for digital objects
Indexing and Search Services: metadata and full-text index creation; search API and application
Bridge Services: mechanisms for external applications to interface with the repository
Current System Architecture
TDL Application
How it all fits together: a working application
– http://dl.tufts.edu
[Figure: General TDL application search transaction process. Components: user (U); TDL App search interface [JSP]; Search Service with Oracle query builder and results collation [Java app.]; Search Index with main index and XML index [Oracle]; Naming Service for URN-PID resolution [MySQL]; TDL App disseminator viewer [JSP]; Repository Service for object dissemination [Fedora].]
TDL Architecture
Drop Box and Ingestion Service
Naming Service
Fedora Repository Service at Tufts
Indexing and Search Services
Interfacing Services
Drop Box and Ingestion Service
automate the process of preparing materials for ingest
validate materials before ingest
primarily for large-scale ingests
not an object factory (i.e., not a tool for building individual objects)
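To make the validation step concrete, here is a minimal Java sketch of a pre-ingest check on one drop-box submission. It assumes, purely for illustration, that every submission must carry a file named metadata.xml and a manifest of SHA-256 checksums; the file name, the manifest, and the class name DropBoxValidator are hypothetical and do not describe the actual TDR Drop Box rules.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-ingest check for one drop-box submission; not the TDR implementation. */
class DropBoxValidator {
    // Assumption: every submission must contain a metadata file with this name.
    static final List<String> REQUIRED_FILES = List.of("metadata.xml");

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the problems found; an empty list means the submission may move on to ingest.
     * files maps file name to content; manifest maps file name to its expected SHA-256 digest.
     */
    static List<String> validate(Map<String, byte[]> files, Map<String, String> manifest) {
        List<String> problems = new ArrayList<>();
        for (String name : REQUIRED_FILES) {
            if (!files.containsKey(name)) {
                problems.add("missing " + name);
            }
        }
        for (Map.Entry<String, String> entry : manifest.entrySet()) {
            byte[] content = files.get(entry.getKey());
            if (content == null) {
                problems.add("manifest lists absent file " + entry.getKey());
            } else if (!sha256Hex(content).equals(entry.getValue())) {
                problems.add("checksum mismatch for " + entry.getKey());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        byte[] xml = "<dc/>".getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> files = Map.of("metadata.xml", xml);
        System.out.println(validate(files, Map.of("metadata.xml", sha256Hex(xml)))); // []
        System.out.println(validate(files, Map.of("metadata.xml", "0000")));
        // [checksum mismatch for metadata.xml]
    }
}
```

Collecting every problem instead of stopping at the first one suits large-scale ingests, where a per-batch report is more useful than a single error.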
Naming Service
Assigns, reserves and resolves URNs
– The URN has a very flexible structure that can be tailored to the needs of a particular naming convention.
– Example: namespace1:namespace2:namespace3:object_id
Manages repositories
– multiple production repositories, backup repositories, etc.
Tufts URN format examples
tufts:dca:central:MS102:33.1345
Perseus:text:1999.04.0006
97.5224.77-1729-47
URN Properties
– Provides a unique ID to objects deposited into the repository
– The service assures resolution to a unique resource.
Implementation
– MySQL, Java class, JSP management console
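The assign/reserve/resolve behavior can be sketched in Java with an in-memory map standing in for the MySQL table. The class name NamingServiceSketch and the PID value are hypothetical, and segments() simply treats a URN as colon-delimited, following the namespace example on the slide; this is an illustration, not the Tufts implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical in-memory stand-in for the naming service's assign/reserve/resolve operations.
 * The Tufts service kept this mapping in MySQL; the PID used below is invented.
 */
class NamingServiceSketch {
    private final Map<String, String> urnToPid = new HashMap<>();
    private final Set<String> reserved = new HashSet<>();

    /** Splits a colon-delimited URN into its namespace segments; the last one is the object id. */
    static List<String> segments(String urn) {
        return Arrays.asList(urn.split(":"));
    }

    /** Reserves an unused URN for a future deposit; false if it is already reserved or assigned. */
    boolean reserve(String urn) {
        if (urnToPid.containsKey(urn)) {
            return false;
        }
        return reserved.add(urn);
    }

    /** Binds a URN to a repository PID; refuses to rebind a URN that is already assigned. */
    boolean assign(String urn, String pid) {
        if (urnToPid.containsKey(urn)) {
            return false;
        }
        reserved.remove(urn);
        urnToPid.put(urn, pid);
        return true;
    }

    /** Resolves a URN to the PID of the unique object it names, if any. */
    Optional<String> resolve(String urn) {
        return Optional.ofNullable(urnToPid.get(urn));
    }

    public static void main(String[] args) {
        NamingServiceSketch names = new NamingServiceSketch();
        String urn = "tufts:dca:central:MS102:33.1345";
        names.reserve(urn);
        names.assign(urn, "tufts:1"); // invented PID
        System.out.println(names.resolve(urn).orElse("unresolved")); // tufts:1
        System.out.println(segments("Perseus:text:1999.04.0006"));   // [Perseus, text, 1999.04.0006]
    }
}
```

Keeping reservation separate from assignment lets an ingest batch claim identifiers before its objects exist, while resolve() only answers for URNs that are actually bound to an object.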
Tufts Naming Service
Fedora Repository Service
Fedora met many of our critical needs:
– Modular nature of the repository service
– Management of digital content over time (versioning, etc.)
– Aggregation of mixed, possibly distributed, data into complex objects
– The ability to specify multiple content disseminations of these objects
– The ability to associate rights management schemes with these disseminations
Fedora Repository Service, cont…
Tufts Implementation Details:
– External data stores
– Modeling behaviors and content
– A piece of a larger architecture, not an out-of-the-box solution
Tufts Repository Models/Policies
Fedora @ Tufts serves several purposes:
– Archival/institutional repository: guarantee functional preservation
– Data warehouse: guarantee bitstream preservation
– Active repository: active workspace; constantly updated content (e.g., faculty data sets, faculty pubs, content mapping)
Behavior Definitions
Atomic units: sets of standardized behaviors
Building blocks of content models
Allow for flexible reuse of data
Contribute to inter-repository sharing of objects
Dissemination of standard output: XML, plain text, binary format
Rendering/processing of disseminations is the responsibility of applications implemented over the repository.
BDefs → Methods
tuftsAssetDef: getPreview, getLabel, getDescription, getFullView, getDefaultContent, getDescMetadata, getAdminMetadata
tuftsText: getTOC, getChunkList, getChunk, getHeader
tuftsBasicImage: getThumbnail, getScreensize, getMaxSize, getDynamicView
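One way to read the BDef table is as data that an application can query to learn which behavior definition supplies a given method. The Java sketch below, with the hypothetical class name BDefRegistry, copies the table into an ordered map; in Fedora itself BDefs are repository objects, so this only illustrates the lookup, not how Fedora stores them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * The BDef/method table held as plain data, so an application can ask which
 * behavior definition supplies a method. Illustrative only: Fedora BDefs are
 * repository objects, not Java maps.
 */
class BDefRegistry {
    static final Map<String, List<String>> BDEFS = new LinkedHashMap<>();

    static {
        BDEFS.put("tuftsAssetDef", List.of("getPreview", "getLabel", "getDescription",
                "getFullView", "getDefaultContent", "getDescMetadata", "getAdminMetadata"));
        BDEFS.put("tuftsText", List.of("getTOC", "getChunkList", "getChunk", "getHeader"));
        BDEFS.put("tuftsBasicImage", List.of("getThumbnail", "getScreensize", "getMaxSize",
                "getDynamicView"));
    }

    /** Returns the first BDef, in table order, that declares the given method. */
    static Optional<String> bdefFor(String method) {
        for (Map.Entry<String, List<String>> entry : BDEFS.entrySet()) {
            if (entry.getValue().contains(method)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(bdefFor("getChunk").orElse("none"));     // tuftsText
        System.out.println(bdefFor("getThumbnail").orElse("none")); // tuftsBasicImage
        System.out.println(bdefFor("getPage").orElse("none"));      // none
    }
}
```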
Content Models
Unique content models built from content modeling components.
Digital objects that subscribe to a given content model inherit all methods established by that model's behavior definitions.
Digital objects can subscribe to content models that suit their type or class.
Functional, not presentation-specific
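Inheritance through content models amounts to taking the union of the methods of every BDef a model is built from. The Java sketch below assumes two hypothetical models, imageModel and textModel, and abbreviates each BDef's method list; it shows the generic tuftsAssetDef building block being reused by both models.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Content models as lists of BDefs; an object subscribing to a model inherits every
 * method of every BDef in it. Model names are hypothetical; method lists are abbreviated.
 */
class ContentModelSketch {
    static final Map<String, List<String>> BDEFS = Map.of(
            "tuftsAssetDef", List.of("getPreview", "getLabel", "getDescription"),
            "tuftsBasicImage", List.of("getThumbnail", "getScreensize", "getMaxSize"),
            "tuftsText", List.of("getTOC", "getChunk"));

    // Hypothetical models that reuse the same generic BDef as a building block.
    static final Map<String, List<String>> CONTENT_MODELS = Map.of(
            "imageModel", List.of("tuftsAssetDef", "tuftsBasicImage"),
            "textModel", List.of("tuftsAssetDef", "tuftsText"));

    /** Collects, in order and without duplicates, the methods an object of this model inherits. */
    static Set<String> inheritedMethods(String contentModel) {
        Set<String> methods = new LinkedHashSet<>();
        for (String bdef : CONTENT_MODELS.getOrDefault(contentModel, List.of())) {
            methods.addAll(BDEFS.getOrDefault(bdef, List.of()));
        }
        return methods;
    }

    public static void main(String[] args) {
        System.out.println(inheritedMethods("imageModel"));
        // [getPreview, getLabel, getDescription, getThumbnail, getScreensize, getMaxSize]
        System.out.println(inheritedMethods("textModel"));
        // [getPreview, getLabel, getDescription, getTOC, getChunk]
    }
}
```

Because the methods come from the BDefs rather than from the objects, adding a method to tuftsAssetDef would make it available to every object of both models at once.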
Implementation Challenges
Processing large (>10MB) XML Documents
– XML databases
Processing large images
– Imaging servers
Streaming Media
GIS data
Modeling Collections
Advanced Searching
“Shopping cart” searching
Caching Disseminations
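As an illustration of the “Caching Disseminations” challenge, a least-recently-used cache keyed by the dissemination request can be built on java.util.LinkedHashMap in access order. This is a generic technique, not the design of the TDR Caching Service; the class name DisseminationCache, the key format, and the demo PIDs are invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Minimal LRU cache for dissemination results, keyed by a request string such as
 * PID/BDef/method. Illustrates the caching idea only; not the TDR Caching Service.
 */
class DisseminationCache {
    private final int capacity;
    private final LinkedHashMap<String, byte[]> entries;

    DisseminationCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.entries = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > DisseminationCache.this.capacity;
            }
        };
    }

    /** Returns the cached result, or computes it with fetch (the expensive call) and caches it. */
    byte[] get(String key, Function<String, byte[]> fetch) {
        return entries.computeIfAbsent(key, fetch);
    }

    /** Checks presence without counting as an access. */
    boolean contains(String key) {
        return entries.containsKey(key);
    }

    public static void main(String[] args) {
        DisseminationCache cache = new DisseminationCache(2);
        int[] fetches = {0};
        // Stand-in for an expensive dissemination call; it just returns the key's bytes.
        Function<String, byte[]> fetch = key -> {
            fetches[0]++;
            return key.getBytes(StandardCharsets.UTF_8);
        };
        cache.get("demo:1/tuftsBasicImage/getThumbnail", fetch);
        cache.get("demo:1/tuftsBasicImage/getThumbnail", fetch); // cache hit, no fetch
        cache.get("demo:2/tuftsBasicImage/getThumbnail", fetch);
        cache.get("demo:3/tuftsBasicImage/getThumbnail", fetch); // evicts demo:1, least recently used
        System.out.println(fetches[0]); // 3
        System.out.println(cache.contains("demo:1/tuftsBasicImage/getThumbnail")); // false
    }
}
```

A production cache would also need an invalidation rule, because a dissemination changes when the underlying object is modified.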
Indexing and Search Service
Indexing
– Digital objects piped through from the ingestion service
– Metadata index
– Full-text index
– Specialized XML index
Implementation
– Java indexing application
– Oracle database
Supported Types of Search
– Basic full-text
– Basic metadata
– Advanced metadata
Accessing the service
– HTTP GET/POST
– SOAP
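What full-text index creation produces can be shown with a toy inverted index: each term maps to the set of object PIDs containing it, and a basic full-text query intersects those sets. TDR stored its indexes in Oracle; this in-memory Java class (FullTextIndexSketch, a hypothetical name) only illustrates the idea and ignores metadata fields, stemming, and ranking. The PIDs and text are invented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy in-memory inverted index: term -> PIDs of objects containing it.
 * Not the Oracle-based TDR index; PIDs and text below are invented.
 */
class FullTextIndexSketch {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Lower-cases the text, splits on runs of non-letters/digits, and records each term for this PID. */
    void index(String pid, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(pid);
            }
        }
    }

    /** Basic full-text search: PIDs containing every query term, sorted for stable output. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> hits = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits); // AND semantics: keep PIDs matching all terms so far
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        FullTextIndexSketch idx = new FullTextIndexSketch();
        idx.index("demo:1", "Crime and Punishment collection");
        idx.index("demo:2", "Faculty publications on punishment");
        System.out.println(idx.search("punishment"));       // [demo:1, demo:2]
        System.out.println(idx.search("crime punishment")); // [demo:1]
    }
}
```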
Interfacing Services
An important design requirement for TDR was to allow current digital library applications to interface easily with TDR and to provide seamless access to the content in the digital repository within their own environments.
Current applications like VUE can interface with this service so that their tools can disseminate the content that resides in TDL.
The service is being designed not only to support current applications but also to accommodate the needs of future, yet-to-be-defined applications such as course management systems, learning tools, and portals.
Fedora OKI Bridge (Fedora → OKI)
PID → Shared.Id
DR → DigitalRepository
FedoraObject → Asset
FedoraObjectIterator → AssetIterator
BehaviorInfoStructure → InfoStructure
Behavior → InfoRecord
DisseminationInfoPart → InfoPart
Dissemination → InfoField
ParameterInfoPart → InfoPart
Parameter → InfoField
DataOutputStreamInfoPart → InfoPart
DataOutputStream(MIMETypeStream) → InfoField
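The bridge table is essentially an adapter mapping: each Fedora-side class presents itself through an OKI interface. The Java sketch below shows that pattern for two rows (FedoraObject → Asset, FedoraObjectIterator → AssetIterator). The Asset and AssetIterator interfaces here are simplified stand-ins written for illustration, not the actual OKI OSID signatures; FedoraObject holds only a PID and a label, and the PIDs are invented.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Demo entry point; listed first so single-file launchers find main. */
class BridgeDemo {
    public static void main(String[] args) {
        AssetIterator assets = new FedoraAssetIterator(List.of(
                new FedoraObject("demo:1", "First object"),
                new FedoraObject("demo:2", "Second object")));
        while (assets.hasNextAsset()) {
            Asset asset = assets.nextAsset();
            System.out.println(asset.getId() + " " + asset.getDisplayName());
        }
    }
}

// Simplified stand-ins for OKI interfaces; NOT the real OSID DR signatures.
interface Asset {
    String getId();          // plays the role of an OKI Shared.Id
    String getDisplayName();
}

interface AssetIterator {
    boolean hasNextAsset();
    Asset nextAsset();
}

/** Minimal view of a Fedora object: its PID and label. */
final class FedoraObject {
    private final String pid;
    private final String label;

    FedoraObject(String pid, String label) {
        this.pid = pid;
        this.label = label;
    }

    String pid() { return pid; }
    String label() { return label; }
}

/** Bridge class: presents a FedoraObject through the Asset interface. */
class FedoraAsset implements Asset {
    private final FedoraObject object;

    FedoraAsset(FedoraObject object) { this.object = object; }

    public String getId() { return object.pid(); }          // Fedora PID -> Shared.Id
    public String getDisplayName() { return object.label(); }
}

/** Bridge class: walks Fedora objects as an AssetIterator. */
class FedoraAssetIterator implements AssetIterator {
    private final Iterator<FedoraObject> objects;

    FedoraAssetIterator(List<FedoraObject> objects) { this.objects = objects.iterator(); }

    public boolean hasNextAsset() { return objects.hasNext(); }

    public Asset nextAsset() {
        if (!objects.hasNext()) {
            throw new NoSuchElementException();
        }
        return new FedoraAsset(objects.next()); // wrap on demand, one Asset per Fedora object
    }
}
```

The application only ever sees Asset and AssetIterator, so another repository could sit behind the same interfaces with its own bridge classes.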
Applications Accessing TDR Content
Tufts Digital Library Application
– http://dl.tufts.edu/
Visual Understanding Environment (VUE)
– http://vue.tccs.tufts.edu/
VUE Overview
[Figure: VUE overview. Learning theories: constructivism, active learning, individualized learning. Support: faculty needs, learners' needs. Extend: digital libraries, OKI standards. Technical infrastructure: VUE, the OKI DR API, the OKI-FEDORA Bridge, and DR implementations (FEDORA and other digital repositories).]
Future Directions
Revised search service (Zebra?)
XML database for metadata and XML objects (eXist)
Customization and enhancement to address a wide variety of needs (e.g., university records)
Object factory: a workbench for building certain classes of objects
Automated browsing service for the repository
Authentication and authorization modules
Asset Definitions
Collection Modeling
Federation
Asset Definitions
The purpose of the Fedora Asset Definition is to define and expose the content types and methods of objects/assets in a repository in a standard way. The goal is to facilitate access between applications and digital repositories, between digital repositories themselves, and so on.
Some of the questions we asked ourselves during repository and application development helped us form the concept of an “Asset Definition.” For example:
How can an application find out which objects/assets are in a particular repository, and how does it figure out how to refer to these objects?
Given an object/asset in a repository, how should it be described so that other applications can understand what they can do with it?
Asset Definitions, cont…
getFullAssetDefinition
getPreview
getDescription
getFullView
getDefaultContent
getDescMetadata
getAdminMetadata
getThumbnail
getScreenSize
getMaxSize
getDynamicView
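To suggest what a getFullAssetDefinition-style call might return, here is a Java sketch that describes an asset's type and callable methods as a small XML document. The class name AssetDefinitionSketch and the element and attribute names (assetDefinition, method, pid, type) are assumptions made for this example; the slides do not specify a format.

```java
import java.util.List;

/**
 * Sketch of an asset description an application could read to learn what it can
 * do with an object. Element and attribute names are invented for illustration.
 */
class AssetDefinitionSketch {
    /** Escapes the five XML special characters; '&' goes first so escapes are not re-escaped. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Builds one XML document naming the asset's PID, its type, and its callable methods. */
    static String describe(String pid, String assetType, List<String> methods) {
        StringBuilder xml = new StringBuilder();
        xml.append("<assetDefinition pid=\"").append(escape(pid))
           .append("\" type=\"").append(escape(assetType)).append("\">\n");
        for (String method : methods) {
            xml.append("  <method name=\"").append(escape(method)).append("\"/>\n");
        }
        xml.append("</assetDefinition>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Invented PID and type; the method names come from the list above.
        System.out.println(describe("demo:1", "image",
                List.of("getPreview", "getThumbnail", "getMaxSize")));
    }
}
```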
Collection Modeling
Object Relationships
– Extend Fedora RDF to create collection networks
– Recursive disseminators to track paths in the network
– Facilitate access to sets of materials
– Facilitate management of digital objects
– Facilitate browsing of sets of materials
– http://nikolai.tccs.tufts.edu:1980/fedora/get/demo:collectionAll/demo:Collection/viewMembers/
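A recursive walk over membership relationships shows how paths through a collection network can be tracked. In this Java sketch (hypothetical class CollectionWalk) the edges are a hard-coded map rather than parsed Fedora RDF, every PID except demo:collectionAll is invented, and the visited set and per-path check are there because collections can overlap or form cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Recursive traversal of a collection network. Membership edges would come from the
 * objects' RDF relationships; here they are hard-coded, with invented PIDs.
 */
class CollectionWalk {
    // collection PID -> direct member PIDs (hypothetical data)
    static final Map<String, List<String>> MEMBERS = Map.of(
            "demo:collectionAll", List.of("demo:photos", "demo:texts"),
            "demo:photos", List.of("demo:img1", "demo:img2"),
            "demo:texts", List.of("demo:txt1", "demo:photos")); // collections may overlap

    /** Adds every object reachable from pid; the visited set stops repeats and cycles. */
    static void collect(String pid, Set<String> visited) {
        if (!visited.add(pid)) {
            return; // already seen
        }
        for (String member : MEMBERS.getOrDefault(pid, List.of())) {
            collect(member, visited);
        }
    }

    /** Records the root-to-object path for every object, e.g. for browsing breadcrumbs. */
    static void paths(String pid, List<String> path, List<String> out) {
        if (path.contains(pid)) {
            return; // pid already on this path: a cycle
        }
        path.add(pid);
        out.add(String.join(" / ", path));
        for (String member : MEMBERS.getOrDefault(pid, List.of())) {
            paths(member, path, out);
        }
        path.remove(path.size() - 1); // backtrack before exploring siblings
    }

    public static void main(String[] args) {
        Set<String> members = new LinkedHashSet<>();
        collect("demo:collectionAll", members);
        System.out.println(members);
        // [demo:collectionAll, demo:photos, demo:img1, demo:img2, demo:texts, demo:txt1]
        List<String> out = new ArrayList<>();
        paths("demo:collectionAll", new ArrayList<>(), out);
        out.forEach(System.out::println);
    }
}
```

collect() answers "what is in this collection", while paths() keeps every route, so an object that belongs to two collections appears once in the first result and under both parents in the second.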