Transcript Slide 1
Presentations
• Introduction
• Case Studies:
  – Policies, Services, Interoperability, Mashups:
    • BNF, DCAPE, PoDRI, e-Legacy
  – RENCI Federated Data Projects:
    • NARA TPAP, RENCI VO, TIP
  – Interfaces:
    • Islandora, Jargon, CDR

iRODS federates major collections
From Ken Arnold, SHAMAN project
• A unified Web interface for browsing or searching
• User sees a single hierarchy
[Diagram labels: Flickr file system, YouTube, New Service]

/flickr/commons/
• Using flickr API, a RESTful web API
• Media accessible through API
• Mountable file system: Hulu, photobucket, etc.
• Each /flickr/commons/Institution “folder” translates to the result of one or two calls to the flickr API, presented to iRODS as if it were a file system.
• For a collection to integrate, it would need some remote API that we could write a driver for, and one or more ways to map that collection into a tree.
• Each mountable service is made into a resource with all relevant info (location, resource type, etc.).

iRODS Shows Unified “Virtual Collection”
• User With Client Views & Manages Data
• User Sees Single “Virtual Collection”
  – My Data: Disk, Tape, Database, Filesystem, etc.
  – My Data: Disk, Tape, Database, Filesystem, etc.
  – Partner’s Data: Remote Disk, Tape, Filesystem, etc.
The iRODS Data System can install in a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.

Accessing Data in the iRODS System
• User: “I need data!”
• With iRODS Client, searches CATALOG to find and get Data
• “Finds the data.” “Gets data to user.”
• iRODS Data System
  – iRODS Metadata Catalog: Keeps track of data
  – Data Server: Disk, Tape, Database, Filesystem, etc.
Users can search for, access, add/extract metadata, annotate, analyze & process, replicate, copy, share data, manage & track access, subscribe, and more.
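
The access path on the slide above (the client searches the catalog, the catalog finds the data, a data server delivers it) can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not the iRODS API: `MetadataCatalog`, `DataServer`, `register`, `search`, `find`, and `get`, as well as the example paths and the `project` attribute, are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical stand-in for a storage resource (disk, tape, database, ...)."""
    name: str
    files: dict = field(default_factory=dict)  # physical path -> bytes

    def read(self, physical_path: str) -> bytes:
        return self.files[physical_path]


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: maps logical names to locations and holds metadata."""
    locations: dict = field(default_factory=dict)  # logical -> (server, physical path)
    metadata: dict = field(default_factory=dict)   # logical -> {attribute: value}

    def register(self, logical, server, physical_path, **attrs):
        self.locations[logical] = (server, physical_path)
        self.metadata[logical] = dict(attrs)

    def search(self, **attrs):
        """Return logical names whose metadata matches every given attribute."""
        return sorted(
            name for name, meta in self.metadata.items()
            if all(meta.get(k) == v for k, v in attrs.items())
        )

    def find(self, logical):
        return self.locations[logical]


def get(catalog, logical):
    """Client-side 'get': ask the catalog where the data is, then fetch it."""
    server, physical_path = catalog.find(logical)
    return server.read(physical_path)


# Two storage systems hidden behind one logical namespace (made-up contents).
disk = DataServer("disk1", {"/vault/a1b2": b"temperature readings"})
tape = DataServer("tape1", {"/t/0007": b"archived survey"})

catalog = MetadataCatalog()
catalog.register("/home/alice/readings.csv", disk, "/vault/a1b2", project="climate")
catalog.register("/home/alice/survey.dat", tape, "/t/0007", project="census")

hits = catalog.search(project="climate")   # search by metadata, not by location
data = get(catalog, hits[0])               # the catalog resolves the location
```

The user only ever handles logical names and metadata; which server and which physical path hold the bytes stays inside the catalog, which is the point of the "virtual collection" view.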

Overview of iRODS Components
• User Interface: Web or GUI Client to Access and Manage Data & Metadata*
• iRODS Server: Data on Disk
• iRODS Rule Engine: Implements Policies
• iRODS Metadata Catalog: Database Tracks state of data
*Access data with: Web-based Browser, iRODS GUI, Command Line clients, DSpace, Fedora, Kepler workflow, WebDAV, user-level file system, etc.

"Layers" in iRODS: From Users to Storage
• Community: Decides how to manage shared Collection(s)
• Policies: Express goals for data access, sharing, preservation, etc.
• Administrator/User: Applies Rules
• Rules: Implement Policies in computer-actionable form
• iRODS Server: Executes Microservices
• Micro-services: Operate on remote data

Under the hood - a glimpse
[Diagram: NC State, Duke, and Chapel Hill, each with an iRODS Server and Rule Engine; Meta Data Catalog DB]
• User asks for data (using logical properties)
• Data request goes to 1st Server
• Server looks up information in catalog
• Catalog reports that a 2nd federated server has the data
• 1st server asks 2nd server for data
• 2nd server applies Rules and serves data

Policies in iRODS
• Policies: Express community goals for data access and sharing, management, long-term preservation, uses, etc.
• Policy Examples
  – Run a particular workflow when a “set of files” is ingested into a collection (e.g. make thumbnails of images, post to website).
  – Automatically replicate a file added to a collection into 3 geographically distributed sites.
  – Automatically extract metadata for a file of a certain type and store in metadata catalog.
  – Periodically check integrity of files in a Collection and repair/replace if needed/possible.
  – Automatically pick a certain storage location based on user or collection or size or type.
  – Let a user access a collection only if using certificate-based login.
  – Send a notification when a certain file is ingested.
  – etc.
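
Two of the policy examples above (replicate a new file to three sites; periodically check integrity and repair from a good copy) can be sketched together. This is a minimal Python model under stated assumptions: replicas are plain byte strings held in a dict, the trusted checksum is an MD5 digest recorded at ingest, and `ingest`, `audit_and_repair`, and the site names are hypothetical, not iRODS rules or micro-services.

```python
import hashlib


def md5(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def ingest(data: bytes, sites: list) -> dict:
    """Store one replica per site and record the checksum at ingest time."""
    return {
        "checksum": md5(data),
        "replicas": {site: data for site in sites},
    }


def audit_and_repair(obj: dict) -> list:
    """Recompute each replica's checksum and overwrite stale replicas.

    Returns the names of the sites that were repaired. Raises if no replica
    matches the recorded checksum, since nothing can be repaired in that case.
    """
    expected = obj["checksum"]
    good = [s for s, d in obj["replicas"].items() if md5(d) == expected]
    if not good:
        raise RuntimeError("no valid replica left to repair from")
    stale = [s for s in obj["replicas"] if s not in good]
    for site in stale:
        # Copy a known-good replica over the stale one.
        obj["replicas"][site] = obj["replicas"][good[0]]
    return stale


# Ingest to three (made-up) sites, then simulate silent corruption at one.
obj = ingest(b"scanned page 42", ["site-a", "site-b", "site-c"])
obj["replicas"]["site-b"] = b"scanned page 4?"
repaired = audit_and_repair(obj)
```

In a real deployment the audit would be scheduled by the rule engine rather than called by hand; the sketch only shows the compare-and-replace logic the policy implies.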

Policies, Services, Interoperability, Mashups
Richard Marciano, SILS

e-Legacy Mashup
[Diagram: RSS Feed Reader; Data Grid (SRB/iRODS)]

e-Legacy Demo
[Workflow diagram labels: Appraisal; Subscribe to RSS; Description; Arrangement; Preservation; Review Received Entry; Preserve to iRODS; Yes; Share and Tag; Meet Preservation Criteria]

National Library of France: Distributed Archiving & Preservation System (SPAR)

BNF: French National Library
• Three rules:
  – Import
    • Import an input document into iRODS
    • Add import date and checksum as AVU-triplet metadata
    • Replicate to other resources
  – Get
    • Locate a copy of the record
    • Return it if the physical checksum equals the stored checksum
    • If not, delete the replica and copy a good one over it
  – Audit
    • Locate all replicas of a data object
    • Compute a physical checksum using the system’s MD5
    • Compare the result with the checksum stored in user metadata
    • All stale copies are removed and then replicated from another good copy
    • When all copies are audited, a clean copy is staged onto a specific FS directory

BNF: French National Library
• Micro-Services
  – Add metadata to an iRODS object
  – Import an object into iRODS, compute MD5 checksum and validate against the supplied one. Once validated, add MD5SUM and import date as metadata.
    If invalid, content is removed from iRODS.
  – Return the value of an iRODS object metadata attribute
  – Prepare to retrieve a metadata attribute for a resource
  – Prepare to retrieve a metadata attribute for an object
  – Get the input resources belonging to a zone name
  – Get iCAT results regarding location info for a record
  – Execute MD5SUM on the physical content and return value
  – Return a pseudo random string of specified length
  – Delete a stale replica and replicate over it from another fresh copy
  – Stale replica replacement can be eager (synchronous execution) or lazy (delayed execution)

DCAPE

PoDRI: Policy-Driven Repository Interoperability

RENCI Federated Data Projects
Leesa Brieger, RENCI

RENCI VO Data Grid
[Diagram: iRODS Servers at Duke, NCSU, ECU, UNC-A, UNC-CH, and RENCI, Europa Center; DB Metadata Catalog (iCAT)]
• Client asks for data
• Data request goes to iRODS server
• Server looks up information in iCAT
• iCAT tells which iRODS server has data
• Data is retrieved from physical location and delivered to client

National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP)
Federation of Seven Independent Data Grids
[Diagram: NARA I, NARA II, Georgia Tech, Rocket Center, UNC, UMD, UCSD, each with its own iCAT]
• Extensible Environment: can federate with additional research and education sites.
• Each data grid uses different vendor products.
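
A federation of independent data grids, each with its own iCAT, can be pictured as request forwarding between grids. The sketch below is a simplified Python model, not iRODS code: it assumes the first component of a logical path names the grid (zone) that owns the object, and the `Zone` class, its methods, and the example path are hypothetical.

```python
class Zone:
    """Hypothetical model of one independent data grid with its own catalog."""

    def __init__(self, name):
        self.name = name
        self.icat = {}    # logical path -> data held by this grid
        self.peers = {}   # zone name -> federated Zone

    def federate(self, other):
        """Make two grids aware of each other."""
        self.peers[other.name] = other
        other.peers[self.name] = self

    def put(self, logical_path, data):
        self.icat[logical_path] = data

    def get(self, logical_path):
        """Serve from the local catalog, or forward to the grid named in the path."""
        zone_name = logical_path.strip("/").split("/")[0]
        if zone_name == self.name:
            return self.icat[logical_path]
        if zone_name not in self.peers:
            raise LookupError(
                f"zone {zone_name!r} is not federated with {self.name!r}"
            )
        # The remote grid consults its own catalog and serves the data.
        return self.peers[zone_name].get(logical_path)


# Two grids; the names reuse sites from the slide, the path is made up.
unc = Zone("unc")
umd = Zone("umd")
unc.federate(umd)
umd.put("/umd/home/records/box1.tar", b"records")

# A client connected to the UNC grid asks for data that lives at UMD.
result = unc.get("/umd/home/records/box1.tar")
```

Each grid keeps its own catalog and remains independently managed; federation adds only the ability to route a request to the grid that owns the requested object.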

TUCASI Infrastructure Project (TIP)
Federated Repositories

TUCASI Infrastructure Project (TIP) Goals
• Leverage data resources for competitive research and leadership
• Support research and education efforts in a wide range of disciplines and domains
• National leadership in next-generation data management
• Model for long term campus storage
• Architecture and design; hardware, software
• Operations and support
• Data policies
  – Selection and retention
  – Ingest, curation and preservation
  – Collections and repository management

A Test
Classroom content on a DICE/RENCI data grid
• Panopto
• Elluminate

Interfaces
Jargon, Web, REST, SOAP
Mike Conway, DICE Center

Jargon, Java, Interface Developer Goals
• Make integration simple by creating a clear, familiar service API.
• Make iRODS a familiar, easy-to-use resource to mid-tier Java developers.
• Develop a REST/SOAP service model for common use-cases using mature tools.
• Create an out-of-the-box web interface that makes iRODS easy for administrators and archivists.

Currently...
• Jargon is a pure-Java API that talks to iRODS over Java sockets.
• Jargon is fairly low-level and can be tricky at first.
• Used in multiple projects including the WebDAV interface, as well as integration with the Fedora repository via the irodsfedora library.

Jargon (next...)
• Jargon-core: Jargon re-factored
  – High level service API, POJOs, Spring-friendly
  – Emphasis on testability
• Jargon-akubra: Implementation of an Akubra module for iRODS via Jargon
• Jargon-lingo: Application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and Web interfaces to iRODS.

Conceptual Diagram
[Diagram labels: iRODS Service Model; SOAP/REST; Web Frameworks; Jargon-lingo; DuraSpace; Jargon-akubra; Jargon-core; iRODS Grid; Custom code (Java, Groovy, Jython, JRuby, etc.)]

TRLN Partners Questionnaire
Respondents: Jim Tuttle (NC State), Seth Shaw (Duke), Winston Atkins (Duke), Russell Koonts (Duke), Will Owen (UNC)

1. Preservation Projects
• Geo NDIIPP • Images • e-Theses • Dissertations • records • TRAC • 30 criteria • Fedora iRODS • checksum • 2 copies • CDR

2. Status
• Planned • planned • production • ½ way • testing phase • near production

3. Preservation Challenges
• permission • auditing • replication • search/browse • version control • policies • tiered storage • getting the backlog • generating metadata • consolidating metadata • preservation planning • system reliability

4. iRODS
• no • no • no • yes • yes

5. iRODS Challenges
• NA • NA • NA • none • rules syntax

6. Questions
• documentation • production configuration • stable release • None • None • None • working with archivists • maintenance releases • iRODS book