The Supporting Digital Scholarship Project

Fedora Commons:
Introduction and Update
Swedish National Library
June 24, 2008
The Flexible Extensible Digital Object Repository Architecture
• A repository management system
• A foundation for many information management applications
• Designed to make data “durable” over the long term
• A set of abstractions that can be used to represent different kinds of data
The Fedora Project
• Developed at Cornell under an NSF grant
• UVA Library re-interpreted the architecture and created the first practical implementation
• 3-year project funded in 2001 by the Andrew W. Mellon Foundation to create an open-source system
• Another 3 years of development funded by Mellon in 2004
Fedora Commons, Inc.
• 501(c)(3) private, non-profit company
• 4-year project funded by the Moore Foundation to become self-sustaining
• Continuing software development
• Moving towards community-based software development
• Establishing “solution councils” for the development of solution bundles
The world we work in…
• Scholarly and Scientific Collections
• Data Curation, Linking, Publishing
• Preservation and Archiving
• Education, Knowledge Spaces
• Blogs and Wikis
• and more…
127 Current Known Users
• Consortia – 5
• Corporations – 12
• Government agencies – 2
• IT-Related Institutions – 6
• Medical Centers and Libraries – 4
• Museums and Cultural Organizations – 4
• National Libraries and Archives – 16
• Professional Societies – 2
• Publishing – 4
• Research Groups and Projects – 11
• Semantic and Virtual Library Projects – 6
• University Libraries and Archives – 55
7 Known Vendors and Integrators:
• Acuity Unlimited
• Atos Origin, France
• CARE Affiliates
• FIZ Karlsruhe
• MediaShelf, LLC
• Sun Microsystems
• VTLS
A data object is one unit of content. It contains:
• Persistent ID (PID)
• System Metadata
• Datastreams managed by the system: Policies, Relationships
• Datastreams for the components of the content: Local Content
Datastream Types
• Inline XML: content is stored in the FOXML object
• Managed Content: content is managed by the repository
• Externally Referenced: the URL of remote content is stored in the FOXML object
• Redirect Referenced: external, but content is not disseminated through Fedora; clients are redirected to the URL
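The four types differ in where the bytes live and whether they pass through the repository when served. The sketch below encodes that distinction; the single-letter values follow, to our understanding, the FOXML `CONTROL_GROUP` codes (X, M, E, R), and the two helper functions are hypothetical illustrations.

```python
from enum import Enum


class ControlGroup(Enum):
    """Datastream types keyed by their FOXML control-group letter."""
    INLINE_XML = "X"  # XML stored inside the FOXML object itself
    MANAGED = "M"     # bytes stored and managed by the repository
    EXTERNAL = "E"    # URL stored; repository fetches and serves the content
    REDIRECT = "R"    # URL stored; clients are sent to the URL instead


def stored_in_repository(group: ControlGroup) -> bool:
    """True if the content bytes live in repository storage."""
    return group in (ControlGroup.INLINE_XML, ControlGroup.MANAGED)


def served_through_repository(group: ControlGroup) -> bool:
    """True if content passes through the repository on dissemination."""
    return group is not ControlGroup.REDIRECT


for g in ControlGroup:
    print(g.value, stored_in_repository(g), served_through_repository(g))
```

Only the redirect type bypasses the repository entirely, which is why it suits very large or streaming content served by another system.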
Datastream Characteristics
• An object can have any number of datastreams, of multiple types
• Versioned automatically by default
• Checksummed automatically by default
• Formal identifier
• Alternate identifiers
• Audit trail maintained for all datastream actions
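Versioning, checksums, and the audit trail work together: each change adds a version, each version carries a checksum that can later be re-verified, and every action is logged. The following is a hypothetical sketch of that bookkeeping, assuming SHA-256 and a `DSID.N` version-naming scheme chosen for illustration; it is not Fedora's storage code.

```python
import hashlib
from datetime import datetime, timezone


class VersionedDatastream:
    """Hypothetical sketch of a datastream that keeps every version,
    records a checksum for each one, and logs every action."""

    def __init__(self, dsid: str, algorithm: str = "sha256"):
        self.dsid = dsid
        self.algorithm = algorithm
        self.versions = []  # (version_id, timestamp, content, checksum)
        self.audit = []     # (timestamp, action, detail)

    def _log(self, action: str, detail: str) -> None:
        self.audit.append((datetime.now(timezone.utc), action, detail))

    def modify(self, content: bytes) -> str:
        """Store new content as a new version; older versions are kept."""
        version_id = f"{self.dsid}.{len(self.versions)}"
        checksum = hashlib.new(self.algorithm, content).hexdigest()
        self.versions.append((version_id, datetime.now(timezone.utc), content, checksum))
        self._log("modify", version_id)
        return version_id

    def verify(self, index: int = -1) -> bool:
        """Recompute a stored version's checksum and compare it."""
        version_id, _, content, checksum = self.versions[index]
        ok = hashlib.new(self.algorithm, content).hexdigest() == checksum
        self._log("verify", f"{version_id} ok={ok}")
        return ok


ds = VersionedDatastream("TEXT")
ds.modify(b"first draft")
ds.modify(b"second draft")
print([v[0] for v in ds.versions])  # ['TEXT.0', 'TEXT.1']
print(ds.verify(0))                 # True
print([a[1] for a in ds.audit])     # ['modify', 'modify', 'verify']
```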
Relationships Among Objects
• Describe adjacency relationships among objects
• RDF data of the form:
  PID – typeOfRelationship – relatedObjectPID
• Can be used to assemble aggregations of objects
• Can build graphs of relationships to feed into user interfaces
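Because each relationship is a triple, the triples can be turned directly into a graph. This sketch builds an adjacency list and walks it breadth-first, as a user interface might when showing everything a page belongs to. The PIDs and the relationship names `isPartOf` and `isMemberOfCollection` are illustrative assumptions, and plain tuples stand in for a real RDF store.

```python
from collections import defaultdict, deque

# Hypothetical triples: (PID, typeOfRelationship, relatedObjectPID).
triples = [
    ("demo:page1", "isPartOf", "demo:book"),
    ("demo:page2", "isPartOf", "demo:book"),
    ("demo:book", "isMemberOfCollection", "demo:library"),
]


def build_graph(triples):
    """Adjacency list: object -> list of (relationship, related object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Breadth-first walk over outgoing relationships from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def members_of(triples, parent, predicate="isPartOf"):
    """Assemble an aggregation: every subject pointing at `parent`."""
    return sorted(s for s, p, o in triples if p == predicate and o == parent)


graph = build_graph(triples)
print(reachable(graph, "demo:page1"))    # ['demo:page1', 'demo:book', 'demo:library']
print(members_of(triples, "demo:book"))  # ['demo:page1', 'demo:page2']
```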
Optional Object Behaviors
• Data objects can have different views or transformations
• Sets of abstract behaviors that different kinds of objects can subscribe to
• Corresponding sets of services that specific objects can execute
• The business logic is hidden behind an abstraction
General Image Object
• Persistent ID (PID)
• System Metadata
• Datastreams: thumbnail image file, med. res. image file, high res. image file, max res. image file

Service Mechanism for General Image Objects
• get-thumbnail-sized-image
• get-med-sized-image
• get-high-res-image
• get-max-sized-image

JPEG2000 Image Object
• Persistent ID (PID)
• System Metadata
• Datastream: JPEG2000 image file

Service Description (the same abstract behaviors for both kinds of image object)
• get-thumbnail-sized-image
• get-med-sized-image
• get-high-res-image
• get-max-sized-image

Service Mechanism for JPEG2000 Image Objects
• get-smallest-JPEG2000-size
• get-midrange-JPEG2000-size
• get-high-res-JPEG2000-size
• get-max-JPEG2000-size
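The image example is an interface with two implementations: one service description, two mechanisms. The sketch below expresses that pattern with an abstract base class; only two of the four behaviors are shown to keep it short, and the JPEG2000 decoder is a hypothetical stand-in function, not a real library.

```python
from abc import ABC, abstractmethod


class ImageServiceDefinition(ABC):
    """Abstract behaviors any image object can subscribe to."""

    @abstractmethod
    def get_thumbnail_sized_image(self) -> bytes: ...

    @abstractmethod
    def get_max_sized_image(self) -> bytes: ...


class GeneralImageMechanism(ImageServiceDefinition):
    """Serves pre-computed files stored as separate datastreams."""

    def __init__(self, datastreams: dict):
        self.ds = datastreams  # e.g. {"thumbnail": ..., "max": ...}

    def get_thumbnail_sized_image(self) -> bytes:
        return self.ds["thumbnail"]

    def get_max_sized_image(self) -> bytes:
        return self.ds["max"]


class Jpeg2000Mechanism(ImageServiceDefinition):
    """Derives every size from a single JPEG2000 file."""

    def __init__(self, jp2: bytes, extract):
        self.jp2 = jp2
        self.extract = extract  # hypothetical decoder: (bytes, level) -> bytes

    def get_thumbnail_sized_image(self) -> bytes:
        return self.extract(self.jp2, "smallest")  # get-smallest-JPEG2000-size

    def get_max_sized_image(self) -> bytes:
        return self.extract(self.jp2, "max")       # get-max-JPEG2000-size


def fake_extract(data: bytes, level: str) -> bytes:
    """Stand-in for a real JPEG2000 decoder."""
    return f"{level}:{len(data)}".encode()


objects = [
    GeneralImageMechanism({"thumbnail": b"t", "max": b"m"}),
    Jpeg2000Mechanism(b"jp2-bytes", fake_extract),
]
for o in objects:
    print(o.get_thumbnail_sized_image())  # b't', then b'smallest:9'
```

The caller only knows `get_thumbnail_sized_image`; whether that reads a stored file or decodes a JPEG2000 level is hidden behind the abstraction.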
Content Models
• Create classes of data objects
• Expressed as Cmodel objects
• A Cmodel object defines the number and types of datastreams for objects of that class
• A Cmodel object binds to service objects so that data objects inherit the appropriate behaviors
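A content model acts like a class definition that objects can be checked against. This hypothetical sketch represents a Cmodel as required datastream IDs with allowed MIME types plus a list of bound service definitions, and validates an object's datastreams against it. All PIDs, datastream IDs, and the validation rule itself are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ContentModel:
    """Hypothetical Cmodel: required datastreams plus bound service definitions."""
    pid: str
    required_datastreams: dict  # dsid -> set of allowed MIME types
    service_definitions: list = field(default_factory=list)  # behaviors members inherit


def validate(datastreams: dict, cmodel: ContentModel) -> list:
    """Return a list of problems; an empty list means the object conforms.

    `datastreams` maps dsid -> MIME type for one data object.
    """
    problems = []
    for dsid, allowed in cmodel.required_datastreams.items():
        if dsid not in datastreams:
            problems.append(f"missing datastream {dsid}")
        elif datastreams[dsid] not in allowed:
            problems.append(f"{dsid} has type {datastreams[dsid]}")
    return problems


image_model = ContentModel(
    pid="demo:ImageCModel",
    required_datastreams={"THUMBNAIL": {"image/jpeg", "image/png"},
                          "MAX": {"image/tiff", "image/jpeg"}},
    service_definitions=["demo:ImageSDef"],
)
print(validate({"THUMBNAIL": "image/jpeg", "MAX": "image/tiff"}, image_model))  # []
print(validate({"THUMBNAIL": "image/gif"}, image_model))
# ['THUMBNAIL has type image/gif', 'missing datastream MAX']
```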
Diagram: a Service Definition Object, a Cmodel Object, Data Objects, and a Service Mechanism Object, each with a Persistent ID (PID), System Metadata, and Datastreams. Labeled components include Service Definition Metadata, RDF data, a service contract, and Service Binding Metadata (WSDL) that connects the Service Mechanism Object to a Web Service.
A behavior call has the form:
Object PID + BDef Name + Method Name
Other components include:
• Parameter values used by the method
• A datetime stamp to request an earlier version
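One way to see the parts of a behavior call is to assemble them into a request path. In the sketch below, the path layout and the `asOfDateTime` parameter name are assumptions modeled on the Fedora 3.x REST dissemination interface; check the API documentation for the version in use. The PIDs and method name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import quote, urlencode


@dataclass
class BehaviorCall:
    """Object PID + BDef name + method name, plus the optional extras."""
    pid: str
    bdef: str                    # PID of the behavior (service) definition
    method: str
    params: dict = field(default_factory=dict)  # parameter values for the method
    as_of: Optional[str] = None  # datetime stamp selecting an earlier version

    def to_path(self) -> str:
        # Assumed layout, modeled on the Fedora 3.x REST dissemination URL.
        parts = (quote(p, safe=":") for p in (self.pid, self.bdef, self.method))
        path = "/objects/{}/methods/{}/{}".format(*parts)
        query = dict(self.params)
        if self.as_of is not None:
            query["asOfDateTime"] = self.as_of
        return path + ("?" + urlencode(query) if query else "")


call = BehaviorCall("demo:book", "demo:ImageSDef", "getThumbnail",
                    params={"width": "120"}, as_of="2008-06-01T00:00:00Z")
print(call.to_path())
# /objects/demo:book/methods/demo:ImageSDef/getThumbnail?width=120&asOfDateTime=2008-06-01T00%3A00%3A00Z
```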
Objects Representing Aggregations
• Creating parent objects for complex resources
• Representing explicit collections
• Representing implicit collections
• Creating digital surrogates for physical entities
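The difference between explicit and implicit collections is whether membership is stated or computed. In this hypothetical sketch, an explicit collection is the set of objects with a membership relationship to a parent object, while an implicit collection is whatever currently matches a query over object metadata. The PIDs, the `isMemberOfCollection` name, and the `subject` field are illustrative assumptions.

```python
# Hypothetical relationships and metadata for a handful of objects.
triples = [
    ("demo:photo1", "isMemberOfCollection", "demo:fieldTrip"),
    ("demo:photo2", "isMemberOfCollection", "demo:fieldTrip"),
]
metadata = {
    "demo:photo1": {"subject": "birds"},
    "demo:photo2": {"subject": "rivers"},
    "demo:photo3": {"subject": "birds"},
}


def explicit_collection(parent):
    """Members are stated outright by relationships pointing at the parent."""
    return sorted(s for s, p, o in triples
                  if p == "isMemberOfCollection" and o == parent)


def implicit_collection(predicate):
    """Members are whatever objects currently satisfy a query."""
    return sorted(pid for pid, md in metadata.items() if predicate(md))


print(explicit_collection("demo:fieldTrip"))                    # ['demo:photo1', 'demo:photo2']
print(implicit_collection(lambda md: md["subject"] == "birds"))  # ['demo:photo1', 'demo:photo3']
```

Adding a new "birds" object changes the implicit collection automatically, while the explicit collection changes only when someone adds a relationship.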
The Fedora Service Framework
• Fedora Repository Service
• GSearch
• OAI
• Simple
• JMS
• DirIngest
• Preserve
These are the core service components we distribute.
Solution Councils
• Community group that creates and maintains the vision for a solution bundle in an area
• Led by a “champion”
• Small group that gets things done
• Gathers resources to create software for the solution
• Coordinates development with the FC Architecture Council
Solution Areas
• Preservation and Archiving – Ron Jantz, from Rutgers
• Data Curation – Sayeed Choudhury, from Johns Hopkins University
• Publishing – Rich Cave, from PLOS
• Integration Services – Matt Zumwalt, from MediaShelf, LLC
Other Possible Community Groups
• Other software development groups
• News and Publications Outreach group that works with our Communications Director
• Issue/advocacy groups that work on standards important to the community
Collaboration Discussion with DSpace
• Conversation has just begun
• DSpace will experiment with Fedora in Google Summer of Code 2008
• Possibilities:
– DSpace 2.0 expresses its data model using Fedora objects (DSpace could be the “reference” IR solution bundle)
– Shared development of services used by both
– Shared administration
– ????
http://www.fedora-commons.org/