Transcript Slide 1

Presentations
• Introduction
• Case Studies:
– Policies, Services, Interoperability, Mashups:
• BNF, DCAPE, PoDRI, e-Legacy
– RENCI Federated Data Projects:
• NARA TPAP, RENCI VO, TIP
– Interfaces:
• Islandora, Jargon, CDR
iRODS federates major collections
From Ken Arnold, SHAMAN project
A Unified
Web interface for
Browsing or searching
User Sees Single Hierarchy
Flickr file system
YouTube
New Service
/flickr/commons/
Using flickr API, a
RESTful web API
Media accessible
through API
Mountable file system:
Hulu, photobucket, etc.
Each /flickr/commons/Institution “folder” translates to the result of one or two calls to the Flickr
API, presented to iRODS as if it were a file system.
For a collection to integrate, it needs a remote API that we can write a driver
for and one or more ways to map that collection into a tree.
Each mountable service is made into a resource with all relevant info (location, resource type, etc.).
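The driver idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: VirtualTree and FakeCommonsDriver are hypothetical names, not part of iRODS or the Flickr API, and the "remote" data is an in-memory dict so the sketch runs offline. A real driver would make one or two remote API calls per listing.

```python
class FakeCommonsDriver:
    """Hypothetical driver; a real one would call a remote RESTful API."""

    def __init__(self, albums):
        self._albums = albums  # institution name -> list of photo names

    def list_dir(self, rel_path):
        parts = [p for p in rel_path.split("/") if p]
        if not parts:
            # Mount root: one "folder" per institution.
            return sorted(self._albums)
        if len(parts) == 1 and parts[0] in self._albums:
            return list(self._albums[parts[0]])
        raise FileNotFoundError(rel_path)


class VirtualTree:
    """Routes a logical path to whichever driver is mounted above it."""

    def __init__(self):
        self._mounts = {}  # mount point -> driver

    def mount(self, mount_point, driver):
        self._mounts[mount_point.rstrip("/")] = driver

    def list_dir(self, path):
        path = path.rstrip("/")
        # Longest matching mount point wins.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            if path == mount_point or path.startswith(mount_point + "/"):
                return self._mounts[mount_point].list_dir(path[len(mount_point):])
        raise FileNotFoundError(path)


tree = VirtualTree()
tree.mount("/flickr/commons", FakeCommonsDriver({
    "LibraryA": ["photo1.jpg", "photo2.jpg"],
    "MuseumB": ["map.png"],
}))
institutions = tree.list_dir("/flickr/commons/")
photos = tree.list_dir("/flickr/commons/LibraryA")
```

Mounting a second service (for example a video site) would mean writing another driver with the same list_dir method and mounting it under a different prefix.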
iRODS Shows Unified “Virtual Collection”
User
With Client Views
& Manages Data
User Sees Single “Virtual Collection”
My Data
My Data
Partner’s Data
Disk, Tape, Database,
Filesystem, etc.
Disk, Tape, Database,
Filesystem, etc.
Remote Disk, Tape,
Filesystem, etc.
The iRODS Data System can install in a “layer” over existing or new data, letting
you view, manage, and share part or all of diverse data in a unified Collection.
Accessing Data in the iRODS System
User
“I need data!”
With iRODS Client
searches CATALOG to find
and get Data
“Finds the data.”
“Gets data to user.”
iRODS Data System
iRODS Metadata
Catalog
Keeps track of data
Data Server
Disk, Tape, Database,
Filesystem, etc.
Users can search for, access, add/extract metadata, annotate, analyze & process,
replicate, copy, share data, manage & track access, subscribe, and more.
Overview of iRODS Components
User Interface
Web or GUI Client to
Access and Manage Data
& Metadata*
iRODS Server
Data on Disk
iRODS Rule
Engine
Implements
Policies
iRODS Metadata
Catalog
Database
Tracks state of data
*Access data with: Web-based Browser, iRODS GUI, Command Line clients,
Dspace, Fedora, Kepler workflow, WebDAV, user level file system, etc.
"Layers" in iRODS: From Users to Storage
Community
Policies
Decides how to
manage shared
Collection(s)
Express goals for data
access, sharing,
preservation, etc.
Administrator/User
Applies Rules
Rules
iRODS Server
Executes Microservices
Implement Policies in
computer-actionable form
Micro-services
Operate on remote data
Under the hood - a glimpse
NC State
Meta Data
Catalog
Duke
Chapel Hill
DB
iRODS Server
Rule Engine
iRODS Server Rule
Engine
iRODS Server
Rule Engine
• User asks for data (using logical properties)
• Data request goes to 1st Server
• Server looks up information in catalog
• Catalog reports that the 2nd federated server has the data
• 1st server asks 2nd server for data
• 2nd server applies Rules and serves data
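The request flow in the bullets above can be modeled with a toy catalog and two servers. This is a sketch, not the iRODS protocol: Catalog, Server, and audit_log are hypothetical names, and the "rules" step is reduced to writing a log entry on the server that holds the data.

```python
class Catalog:
    """Hypothetical catalog: logical name -> (server name, physical path)."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, server_name, physical_path):
        self._locations[logical_name] = (server_name, physical_path)

    def locate(self, logical_name):
        return self._locations[logical_name]


class Server:
    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog
        self.peers = {}      # server name -> Server
        self.storage = {}    # physical path -> bytes
        self.audit_log = []  # written by this server's rule step

    def put(self, logical_name, physical_path, data):
        self.storage[physical_path] = data
        self.catalog.register(logical_name, self.name, physical_path)

    def serve(self, physical_path, requester):
        # The server holding the data applies its own rules before serving.
        self.audit_log.append((requester, physical_path))
        return self.storage[physical_path]

    def get(self, logical_name, user):
        # The server that received the request looks up the location,
        # then asks whichever server holds the data.
        server_name, physical_path = self.catalog.locate(logical_name)
        holder = self if server_name == self.name else self.peers[server_name]
        return holder.serve(physical_path, requester=user)


catalog = Catalog()
first, second = Server("first", catalog), Server("second", catalog)
first.peers["second"] = second
second.put("/zone/home/alice/report.txt", "/vault/0001", b"hello")

data = first.get("/zone/home/alice/report.txt", user="alice")
```

The user only ever names the logical path; which server holds the bytes is resolved through the catalog.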
Policies in iRODS
• Policies: Express community goals for data access and sharing,
management, long-term preservation, uses, etc.
• Policy Examples
– Run a particular workflow when a “set of files” is ingested into a collection
(e.g. make thumbnails of images, post to website).
– Automatically replicate a file added to a collection into 3 geographically
distributed sites.
– Automatically extract metadata for a file of a certain type and store in
metadata catalog.
– Periodically check integrity of files in a Collection and repair/replace if
needed/possible.
– Automatically pick a certain storage location based on user or collection or
size or type.
– Let a user access a collection only if using certificate-based login.
– Send a notification when a certain file is ingested.
– etc.
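Two of the policy examples above (replicate on ingest, extract metadata by file type) can be sketched as functions bound to an event. This is plain Python, not the iRODS rule language; the rule decorator, the fire function, and the site names are assumptions made for illustration.

```python
from collections import defaultdict

rules = defaultdict(list)  # event name -> functions to run


def rule(event):
    """Register a function to run whenever `event` fires."""
    def register(fn):
        rules[event].append(fn)
        return fn
    return register


@rule("ingest")
def replicate_to_three_sites(obj, state):
    # Policy: every ingested file gets three replicas (site names are made up).
    state["replicas"][obj["name"]] = ["site-a", "site-b", "site-c"]


@rule("ingest")
def extract_image_metadata(obj, state):
    # Policy: record metadata only for files of a certain type.
    if obj["name"].lower().endswith((".jpg", ".png")):
        state["metadata"][obj["name"]] = {"type": "image"}


def fire(event, obj, state):
    for fn in rules[event]:
        fn(obj, state)
    return state


state = {"replicas": {}, "metadata": {}}
fire("ingest", {"name": "scan.JPG"}, state)
fire("ingest", {"name": "notes.txt"}, state)
```

Both files are replicated, but only scan.JPG gets a metadata entry, because the second rule checks the file type before acting.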
Policies, Services,
Interoperability, Mashups:
Richard Marciano, SILS
e-Legacy Mashup
RSS
Feed
Reader
Data Grid
(SRB/iRODS)
e-Legacy Demo
Appraisal
Subscribe
to RSS
Description
Arrangement
Preservation
Review
Received Entry
Preserve to
iRODS
Yes
Share
and Tag
Meet
Preservation
Criteria
National Library of France:
Distributed Archiving & Preservation System (SPAR)
BNF: French National Library
• Three rules:
– Import
• Import an input document into iRODS
• Add import date and checksum as AVU-triplet metadata
• Replicate to other resources
– Get
• Locate a copy of the record
• Return if physical checksum .eq. stored checksum
• If not, delete replica, copy a good one over it
– Audit
• Locate all replicas of a data object
• Compute a physical checksum using the system’s MD5
• Compare the result with the checksum stored in user metadata
• All stale copies are removed and then replicated from another good copy
• When all copies are audited, a clean copy is staged onto a specific FS directory
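The Import and Audit rules can be sketched with Python's hashlib. This is an illustration, not the BNF rule set: the store layout, function names, and resource names are assumptions, and replicas are byte strings held in memory rather than files on storage resources.

```python
import hashlib


def md5sum(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def import_object(store, name, data, resources, import_date):
    """Import: record date and checksum as metadata, then replicate."""
    store[name] = {
        "metadata": {"import_date": import_date, "md5": md5sum(data)},
        "replicas": {resource: data for resource in resources},
    }


def audit(store, name):
    """Audit: recompute each replica's checksum and repair stale copies.

    Returns the list of resources whose replica had to be replaced.
    """
    record = store[name]
    expected = record["metadata"]["md5"]
    replicas = record["replicas"]
    good = [r for r, d in replicas.items() if md5sum(d) == expected]
    stale = [r for r in replicas if r not in good]
    if stale and not good:
        raise ValueError(f"no good copy of {name} left to repair from")
    for resource in stale:
        replicas[resource] = replicas[good[0]]
    return stale


store = {}
import_object(store, "doc-001", b"original content",
              ["res1", "res2", "res3"], import_date="2010-01-01")
store["doc-001"]["replicas"]["res2"] = b"corrupted"  # simulate bit rot
repaired = audit(store, "doc-001")
```

After the audit, every replica matches the checksum stored at import time again.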
BNF: French National Library
• Micro-Services
– Add metadata to an iRODS object
– Import an object into iRODS, compute MD5 checksum and validate
against the supplied one. Once validated, add MD5SUM and import
date as metadata. If invalid, content is removed from iRODS
– Return the value of an iRODS object metadata attribute
– Prepare to retrieve a metadata attribute for a resource
– Prepare to retrieve a metadata attribute for an object
– Get the input resources belonging to a zone name
– Get iCAT results regarding location info for a record
– Execute MD5SUM on the physical content and return value
– Return a pseudo random string of specified length
– Delete a stale replica and replicate over it from another fresh copy
– Stale replica replacement can be eager (synchronous execution) or
lazy (delayed execution)
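The difference between eager and lazy stale-replica replacement can be shown with a small queue. This is a sketch under assumptions: the delayed queue and the function names are hypothetical, and iRODS delayed execution is not modeled here beyond "run it later".

```python
from collections import deque

delayed = deque()  # hypothetical queue of repairs to run later


def replace_stale(replicas, stale, good, lazy=False):
    """Overwrite replicas[stale] with replicas[good], now or later."""
    def repair():
        replicas[stale] = replicas[good]

    if lazy:
        delayed.append(repair)  # delayed execution
    else:
        repair()                # synchronous execution


def run_delayed():
    """Drain the queue; returns how many repairs ran."""
    count = 0
    while delayed:
        delayed.popleft()()
        count += 1
    return count


replicas = {"r1": b"good", "r2": b"bad!", "r3": b"bad?"}
replace_stale(replicas, "r2", "r1")             # eager: fixed immediately
replace_stale(replicas, "r3", "r1", lazy=True)  # lazy: still stale for now
before = dict(replicas)
ran = run_delayed()
```

Eager repair keeps the caller waiting but leaves no window with a stale copy; lazy repair returns at once and accepts that window.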
DCAPE
PoDRI: Policy-Driven Repository Interoperability
RENCI Federated Data Projects
Leesa Brieger, RENCI
RENCI VO Data Grid
Duke
NCSU
iRODS Server
iRODS Server
ECU
iRODS Server
UNC-A
iRODS Server
UNC-CH
iRODS Server
RENCI, Europa Center
iRODS Server
• Client asks for data
• Data request goes to iRODS server
• Server looks up information in iCAT
• iCAT tells which iRODS server has data
• Data is retrieved from physical location
and delivered to client
DB
Metadata
Catalog (iCAT)
National Archives and Records Administration
Transcontinental Persistent Archive Prototype (TPAP)
Federation of Seven
Independent Data Grids
NARA I
NARA II
iCAT
iCAT
Georgia Tech
iCAT
Rocket Center
iCAT
UNC
iCAT
UMD
iCAT
UCSD
iCAT
• Extensible Environment:
can federate with additional research and education sites.
• Each data grid uses different vendor products.
TUCASI Infrastructure Project (TIP)
Federated Repositories
TUCASI Infrastructure Project (TIP)
Goals
• Leverage data resources for competitive
research and leadership
• Support research and education efforts in a wide range of disciplines and
domains
• National leadership in next-generation data management
• Model for long term campus storage
• Architecture and design; hardware, software
• Operations and support
• Data policies
– Selection and retention
– Ingest, curation and preservation
– Collections and repository management
A Test
Classroom content on a DICE/RENCI data grid
Panopto
Elluminate
Interfaces
Jargon, Web, REST, SOAP
Mike Conway, DICE Center
Jargon, Java, Interface Developer
Goals
Make integration simple by creating a clear,
familiar service API.
Make iRODS a familiar, easy-to-use
resource for mid-tier Java developers.
Develop a REST/SOAP service model for
common use-cases using mature tools.
Create an out-of-the-box web interface that
makes iRODS easy for administrators and
archivists.

Currently...
•Jargon is a pure-Java API that talks to iRODS
over Java sockets.
•Jargon is fairly low-level and can be tricky
at first.
•Used in multiple projects, including a
WebDAV interface and integration
with the Fedora repository via the
irodsfedora library.
Jargon (next...)
• Jargon-core: Jargon re-factored
– High-level service API, POJOs, Spring-friendly
– Emphasis on testability
• Jargon-akubra: Implementation of an Akubra
module for iRODS via Jargon
• Jargon-lingo: Application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and Web interfaces to iRODS.
Conceptual Diagram
iRODS Service Model
SOAP/REST
Web Frameworks
Jargon-lingo
DuraSpace
Jargon-akubra
Jargon-core
iRODS Grid
Custom code (Java, Groovy, Jython, JRuby, etc.)
TRLN Partners Questionnaire
NC State: Jim Tuttle
Duke: Seth Shaw
Duke: Winston Atkins
Duke: Russell Koonts
UNC: Will Owen
1. Preservation
Projects
• Geo NDIIPP
• Images
• e-Theses
• Dissertations
• records
• TRAC
• 30 criteria
• Fedora  iRODS
• checksum
• 2 copies
• CDR
2. Status
• Planned
• planned
• production
• ½ way
• testing phase
• near production
3. Preservation
Challenges
• permission
• auditing
• replication
• search/browse
• version control
• policies
• tiered storage
• getting the backlog
• generating metadata
• consolidating metadata
• preservation planning
• system reliability
4. iRODS
• no
• no
• no
• yes
• yes
5. iRODS
Challenges
• NA
• NA
• NA
• none
• rules syntax
6. Questions
• documentation
• production configuration
• stable release
None
None
None
• working with archivists
• maintenance releases
• iRODS book