Transcript Slide 1
iRODS - integrated Rule Oriented Data System
Reagan Moore

Slide 2
Development Team
• DICE team
  • Arcot Rajasekar - iRODS Development Lead
  • Mike Wan - iRODS Chief Architect
  • Wayne Schroeder - iRODS Product Mgr., Developer
  • Bing Zhu - Fedora, Windows
  • Mike Conway - Java (Jargon)
  • Paul Tooby - Documentation, Foundation
  • Sheau-Yen Chen - Data Grid Administration
  • Reagan Moore - PI
• Preservation
  • Richard Marciano - Preservation Development Lead
  • Chien-Yi Hou - Preservation Micro-services
  • Antoine de Torcy - Preservation Micro-services

Slide 3
Overview of iRODS Architecture
User can search, access, add, and manage data & metadata*
iRODS Data System
• iRODS Data Server - disk, tape, etc.
• iRODS Rule Engine - track policies
• iRODS Metadata Catalog - track information
* Access data with a Web-based browser, the iRODS GUI, or command-line clients.

Slide 4
Scale of iRODS Data Grid
• Number of files
  • Tens to millions to hundreds of millions of files
• Size of data
  • Gigabytes to hundreds of terabytes to petabytes of data
• Number of policy enforcement points
  • 64 actions define when policies are checked
• System state information
  • 112 metadata attributes for system information per file
• Number of functions
  • 185 composable micro-services
• Number of storage systems that are linked
  • One to tens to a hundred storage resources
• Number of data grids
  • One to federation of tens of data grids

Slide 5
Data are Inherently Distributed (Demo-1)
• Distributed sources
  • Projects span multiple institutions
• Distributed analysis platforms
  • Grid computing
• Distributed data storage
  • Minimize risk of data loss, optimize access
• Distributed users
  • Caching of data near user
• Multiple stages of data life cycle
  • Data repurposing for use in broader context

Slide 6
Organize Distributed Data into a Sharable Collection
• Project repository
  • MotifNet - manage collection of analysis products
• Institutional repository
  • Carolina Digital Repository for UNC collections
• Regional collaboration
  • RENCI Data Grid linking resources across North Carolina
• National collaboration
  • NSF Temporal Dynamics of Learning Center
  • Australian Research Collaboration Service
• National Library
  • French National Library
• National Archive
  • NARA Transcontinental Persistent Archive Prototype, Taiwan
• International collaboration
  • BaBar High Energy Physics (SLAC-IN2P3)
  • National Optical Astronomy Observatory (Chile-US)

Slide 7
Logical Name Spaces (Demo-2)
Data Access Methods (C library, Unix shell, Java, Fedora)
Institution Repository
Storage Repository                      Data Grid
• Storage location                      • Logical resource name space
• User name                             • Logical user name space
• File name                             • Logical file name space
• File context (creation date, …)       • Logical context (metadata)
• Access constraints                    • Access controls
Community controls the name spaces

Slide 8
Social Challenges
• Every community prefers their user interface
  • Unix shell commands - icommands
  • Java I/O library - JARGON / JUX
  • C I/O library
  • Portals - EnginFrame
  • Digital Libraries - Fedora / DSpace
  • Workflows - Kepler / Taverna
  • Transport - GridFTP / Parrot
  • Web browsers / Windows browser
  • Load libraries - Python (Pyrods)
  • User level file systems - FUSE / WebDAV / PetaFS
  • Grid APIs - JSAGA
  • Web services - URSpace / VOSpace
  • Future ports - Islandora / iDROP

Slide 9
Heterogeneity Challenges
• Many types of operating systems
  • Unix variants, 32-bit/64-bit
  • Mac OSX/IntelPC, Mac OSX/PowerPC
  • Linux
  • Windows XP, Vista
• Many types of storage systems
  • File systems
  • Tape archives
  • Cloud storage
• Different administrative domains
  • Challenge-response authentication
  • Kerberos
  • GSI - Grid Security Infrastructure (PKI certificates)
  • Shibboleth

Slide 10
Data Virtualization
Layers, from top to bottom: Access Interface, Standard Micro-services, Data Grid, Standard Operations, Storage Protocol, Storage System
• Map from the actions requested by the access method to a standard set of micro-services.
• Map the standard micro-services to standard operations.
• Map the operations to the protocol supported by the operating system.
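The three mappings on the Data Virtualization slide can be pictured as a chain of lookup tables. The Python sketch below is only an illustration of that layering, not iRODS code: every action, micro-service, operation, and protocol name in it is a hypothetical placeholder.

```python
# Illustrative sketch only: all names in these tables are hypothetical,
# chosen to mirror the three mapping layers described on the slide.

# Layer 1: action requested by an access method -> standard micro-services
ACTION_TO_MICROSERVICES = {
    "put_file": ["check_access", "store_data"],
    "get_file": ["check_access", "read_data"],
}

# Layer 2: standard micro-service -> standard operations
MICROSERVICE_TO_OPERATIONS = {
    "check_access": ["stat"],
    "store_data": ["create", "write", "close"],
    "read_data": ["open", "read", "close"],
}

# Layer 3: standard operation -> call in the protocol of one storage type
OPERATION_TO_PROTOCOL = {
    "unix_fs": {
        "stat": "fs_stat", "create": "fs_create", "open": "fs_open",
        "read": "fs_read", "write": "fs_write", "close": "fs_close",
    },
    "tape_archive": {
        "stat": "tape_stat", "create": "tape_create", "open": "tape_stage",
        "read": "tape_read", "write": "tape_write", "close": "tape_release",
    },
}


def resolve(action, storage_type):
    """Expand one requested action into protocol calls for one storage type."""
    calls = []
    for microservice in ACTION_TO_MICROSERVICES[action]:
        for operation in MICROSERVICE_TO_OPERATIONS[microservice]:
            calls.append(OPERATION_TO_PROTOCOL[storage_type][operation])
    return calls
```

Because only the last table depends on the storage type, the same action resolves to a different call sequence for each storage system without any change to the upper two layers.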

Slide 11
iRODS - Policy-based Management
• Turn policies into computer actionable rules
  • Compose rules by chaining micro-services
• Manage state information as attributes on namespaces:
  • Files / collections / users / resources / rules
• Validate assessment criteria
  • Queries on state information, parsing of audit trails
• Automate administrative functions

Slide 12
iput With Replication
NASA Center for Computational Sciences
[Diagram labels: Client, iput, data, iCAT, metadata, Resource 1, Resource 2, Rule Base (one per resource), /<filesystem>. Caption: Rule added to Rule database.]

Slide 13
Under the hood - a glimpse
[Diagram: iRODS servers, each with a Rule Engine, in Austin, San Diego, and Chapel Hill; Metadata Catalog with DB]
• User asks for data (using logical properties)
• Data request goes to the 1st server
• Server looks up information in the catalog
• Catalog reports that a 2nd federated server has the data
• 1st server asks the 2nd server for the data
• 2nd server applies rules and serves the data

Slide 14
iRODS Distributed Data Management

Slide 15
iRODS Wiki
• Presentations, papers, tutorials
  • http://irods.diceresearch.org
• Open source software - BSD license
  • Contributed clients, software
  • Performance assessments
• Download source code
  • Windows - binary release
  • Unix / Mac / Linux build from source
• iRODS Primer
  • Morgan & Claypool
  • Synthesis Lectures on Information Concepts, Retrieval, and Services

Slide 16
iRODS Shows Unified "Virtual Collection"
User with client views and manages data; user sees a single "Virtual Collection":
• My Data - disk, tape, database, filesystem, etc.
• Project Data - disk, tape, database, filesystem, etc.
• Reference Data - remote disk, tape, filesystem, etc.
The iRODS Data System can install in a "layer" over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.
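The "virtual collection" idea and the lookup steps on the "Under the hood" slide can be illustrated with a toy catalog that maps logical names to physical replicas. This is a minimal Python sketch under stated assumptions: the zone name, host names, and paths are invented for illustration, and a real data grid keeps this state in the iCAT database rather than in a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    server: str         # hypothetical host that stores the bytes
    physical_path: str  # hypothetical location on that host's storage


# Toy stand-in for the metadata catalog: logical name -> replicas.
CATALOG = {
    "/demoZone/home/alice/my_data/notes.txt": [
        Replica("server1.example.org", "/disk/vault/alice/notes.txt"),
    ],
    "/demoZone/home/alice/project_data/run42.dat": [
        Replica("server2.example.org", "/tape/archive/000123"),
    ],
    "/demoZone/home/alice/reference_data/atlas.db": [
        Replica("server3.example.org", "/remote/fs/atlas.db"),
    ],
}


def list_collection(collection):
    """List logical names under a collection, regardless of where they live."""
    prefix = collection.rstrip("/") + "/"
    return sorted(path for path in CATALOG if path.startswith(prefix))


def locate(logical_path, contacted_server):
    """Return (replica, forwarded): forwarded is True when the contacted
    server does not hold the data and must ask another server for it."""
    replicas = CATALOG[logical_path]
    for replica in replicas:
        if replica.server == contacted_server:
            return replica, False
    return replicas[0], True
```

The user only ever handles the logical path; which server and which physical path serve the request is decided by the catalog lookup.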

Slide 17
Infrastructure Independence
• Manage properties of the collection independently of the choice of technology
  • Access, authentication, authorization, description, location, distribution, replication, integrity, retention
• Enforce policies across all storage locations
  • Rule Engine resident at each storage site
• Apply procedures at each remote storage site
  • Chain encapsulated operations into workflows
• Use infrastructure independence to enable use of new technology without interruption
  • Integrate new access methods, new storage systems, new network protocols, new authentication systems

Slide 18
Data Grid Security (Demo-3)
• Manage name spaces for:
  • {users, files, storage}
• Assign access controls as constraints imposed between two logical name spaces
  • Access controls remain invariant as files are moved within the data grid
  • Controls on: Files / Storage systems / Metadata
• Authenticate each user access
  • PKI, Kerberos, challenge-response, Shibboleth
  • Use internal or external identity management system
• Authorize all operations
  • ACLs (Access Control Lists) on users and groups
  • Separate condition for execution of each rule
  • Internal approval flags (IRB) within a rule

Slide 19
iRODS Rules and Micro-services
Reagan W. Moore

Slide 20
Rule Base
• Rules stored in core.irb file
  • Separate copy of core.irb installed at each storage location
  • Can have storage or site specific rules
• Each rule is associated (through its name) with a specific event in the iRODS framework (64 hooks)
  • acPreProcForPut
  • acPostProcForPut
  • acDeleteUser
• Can also execute user-defined rules through the irule command

Slide 21
Variables
• Session variables
  • Define parameters associated with the client session, such as:
    • $userNameClient
    • $rodsZoneClient
• Workflow variables
  • Define parameters used within the workflow
    • *A, *CollName
    • stdout
• Persistent state information
  • Maintained across sessions, stored in iCAT
  • DATA_NAME, DATA_SIZE, COLL_NAME, DATA_CHECKSUM
  • META_DATA_ATTR_NAME, META_DATA_ATTR_UNITS

Slide 22
iRODS Rules
• Each rule defines
  • An action for an event
  • Condition
  • Action chains (micro-services and rules)
  • Recovery chains
• Invoked by servers to enforce policies
• Invoked by clients to run workflows on servers
• Rule types
  • Atomic -- applied immediately
  • Deferred -- run at a later time in the background
  • Periodic -- run at a fixed time interval

Slide 23
Format of a Rule
• Action | Condition | MS1, …, MSn | RMS1, …, RMSn
• Action
  • Name of action to be performed
  • Name known to the server and invoked by the server
• Condition - condition under which the rule applies
• Micro-services - if the condition applies, the micro-services are executed
• Recovery micro-services - if any micro-service fails, the recovery micro-service(s) are executed to maintain transactional consistency
• Example of MS/RMS
  • createFile(*F), recovered by removeFile(*F)
  • ingestMetadata(*F,*M), recovered by rollback

Slide 24
Condition
• Condition under which this Rule applies
• Examples
  • $rescName == demoResc8
  • $objPath like /x/y/z/*
• Many operators
  • ==, !=, >, <, >=, <=
  • %%, !! (and, or)
  • expr like reg-expr, expr not like reg-expr, expr ::= string

Slide 25
Micro-services (MSs)
• Well-defined server-side procedures and functions
• C functions on servers
• MSs can be chained to form a workflow using '##'
    msiDataObjOpen(*A,*S_FD)##
    msiDataObjRead(*S_FD,10000,*R_BUF)##
    msiDataObjClose(*S_FD,*stat)
• Flow control
  • whileExec - while loop
  • forExec - for loop
  • forEachExec - for each in the table or list
  • break
  • ifExec - if-else

Slide 26
Micro-services - flow control examples
• whileExec
    assign(*A,0)##whileExec( *A < 20, writeLine(stdout,*A)##assign(*A, *A + 4), nop##nop)
• forExec
    forExec(assign(*A,0), *A < 20 , assign(*A,*A + 4), writeLine(stdout,*A),nop)
• ifExec
    ifExec(*A > *D, assign(*A,*D),nop,assign(*D,*A),nop)

Slide 27
Other Micro-services
• delayExec - execute MSs at a later time
  • Executed by the iRODS batch server (irodsReServer) in the background
  • Example
      delayExec(<PLUSET>1m</PLUSET>,msiReplColl(*desc_coll,*desc_resc,backupMode,*outbuf),nop)
  • Time keywords
    • PLUSET - execute after the specified time has passed
    • ET - execute at the specified time (<ET>23:00</ET>)
    • EF - repeat execution at the specified frequency
    • Can be combined
        <PLUSET>1m</PLUSET><EF>5m</EF>
• remoteExec - execute MSs on remote servers
      remoteExec(andal.sdsc.edu,null,msiSleep(10,0)##writeLine(stdout,open remote write in andal),nop)
• assign - assign a value to a parameter
• writeString - write a string to stdout buffer
• writeLine - write a line (with end of line) to stdout buffer

Slide 28
Micro-services parameters
• Micro-services communicate through:
  • Arguments/Parameters
    • Input from the initiator (client/server)
    • Literals
    • Variables
      • Start with *
      • Output of a MS can be used as input of another MS in a MS chain
  • System session parameters
    • Start with "$"
    • Valid across rule invocations
  • Persistent data - iCAT
    • Query the iCAT
    • Valid across sessions
  • XMessages - out-of-band communications
    • Sender obtains send/receive tickets
    • Pass receive ticket to receivers
    • Receivers use the ticket to read the message
    • Message exchange
      • Between parallel sessions
      • Between the batch manager and the task manager on the task status

Slide 29
Example of passing parameters between Micro-services
• trimColl.ir file:
    myTestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiDataObjTrim(*B,tgReplResc,null,1,null,*C),nop)|nop##nop
    *Action=trim%*Condition= COLL_NAME = '/tempZone/home/rods/loopTest'
    *Action%*Condition
• irule -F trimColl.ir

Slide 30
Using the rulegen parser
• See: https://www.irods.org/index.php/HELP.rulegen
• Uses a nicer rule language and converts it into the core.irb version
• rulegen -s rX.r
  • This converts from the rulegen syntax to the core.irb syntax and displays the result on your screen
• rulegen -s rX.r > rX.ir
  • This converts from the rulegen syntax to the core.irb syntax and stores the result in the file rX.ir
• irule -F rX.ir
  • Executes the policy

Slide 31
Adding metadata values
    mytestrule{
      msiString2KeyValPair("FILETYPE_STATUS2=FTPASS",*kvp);
      msiAssociateKeyValuePairsToObj(*kvp,*path,"-d");
    }
    INPUT *Att=$FILETYPE,*Val=$text,*path=/renci/home/rods/listMS.ir
    OUTPUT ruleExecOut
Note that there cannot be any spaces around the "=" sign within the msiString2KeyValPair micro-service. Spaces are interpreted as part of the attribute name and attribute value.
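To see why the note about spaces matters, the following Python sketch mimics the behaviour the slide describes: the string is split at the "=" sign and nothing is trimmed. It is an illustration of the stated rule only, not the implementation of msiString2KeyValPair; the function name is hypothetical.

```python
def string_to_key_value(text):
    """Split 'name=value' at the first '=' WITHOUT trimming whitespace,
    mirroring the behaviour described in the note above (an assumption
    about the effect, not the micro-service's actual code)."""
    name, separator, value = text.partition("=")
    if not separator:
        raise ValueError("expected a string of the form name=value")
    return name, value


# No spaces: the attribute name and value are exactly what was intended.
clean = string_to_key_value("FILETYPE_STATUS2=FTPASS")

# Spaces around '=': they end up inside the name and the value, so a later
# query for the attribute 'FILETYPE_STATUS2' would not match this entry.
padded = string_to_key_value("FILETYPE_STATUS2 = FTPASS")
```

Here `clean` is `("FILETYPE_STATUS2", "FTPASS")`, while `padded` is `("FILETYPE_STATUS2 ", " FTPASS")`, with a trailing space in the name and a leading space in the value.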

Slide 32
Adding Metadata
    mytestrule{
      msiString2KeyValPair("*attrname=*attrvalue",*kvp);
      assign(*A,*path/*obj);
      writeLine(stdout,*A);
      msiAssociateKeyValuePairsToObj(*kvp,*path/*obj,"-d");
    }
    INPUT *path=/renci/home/rods,*obj=$listMS.ir,*attrname="FILETYPE", *attrvalue="25"
    OUTPUT ruleExecOut

Slide 33
Reading user-defined metadata
    acGetDataObjAVU{
      msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName'", *Query);
      msiExecStrCondQuery(*Query, *GenQOut);
      forEachExec(*GenQOut){
        msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
        msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
        msiGetValByKey(*GenQOut, DATA_NAME, *name);
        writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
      }
    }
    INPUT *CollName="$/renci/home/rods"
    OUTPUT ruleExecOut
This lists all of the user-defined metadata values for all of the files in the named collection.

Slide 34
Example of multiple conditions
    acGetDataObjAVU{
      msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName' and META_DATA_ATTR_NAME = '*AttrName'", *Query);
      msiExecStrCondQuery(*Query, *GenQOut);
      forEachExec(*GenQOut){
        msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
        msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
        msiGetValByKey(*GenQOut, DATA_NAME, *name);
        writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
      }
    }
    INPUT *CollName="$/renci/home/rods", *AttrName="FILETYPE"
    OUTPUT ruleExecOut
This lists only the files that have the specified attribute name.

Slide 35
Simple rule to list files
testlist.ir:
    mytestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiGetValByKey(*B,DATA_NAME,*D)##msiGetValByKey(*B,COLL_NAME,*E)##writeLine(stdout,*E/*D),nop)|nop##nop
    *C=/renci/home/rods%*Action=list%*Condition=COLL_NAME = '*C'
    ruleExecOut
Try:
    irule -F testlist.ir prompt
    irule -F testlist.ir 'yourpathname'
    irule -F testlist.ir *C='yourpathname'

Slide 36
Converting String to AVU triplet
    testrule||
    msiDataObjChksum(*objPath,null,*ChksumStr)##
    msiGetSystemTime(*Date,human)##
    msiString2KeyValPair(Checksum.*Date=*ChksumStr,*KVPair)##
    msiAssociateKeyValuePairsToObj(*KVPair,*objPath,-d)|nop
    *objPath=/tempZone/home/antoine/tmp.txt
    ruleExecOut

Slide 37
Installation of iRODS
Chien-Yi Hou

Slide 38
iRODS Wiki
• http://irods.diceresearch.org
• Descriptions of the technology
• Publications / presentations
• Download
• Performance tests
• Tinderbox system (tracks upgrades)
• irods-chat page

Slide 39
iRODS installation
• Download the appropriate installation manual from the iRODS Wiki, http://irods.diceresearch.org
• Installation procedure will take
  • Up to 30 minutes for server/catalog/clients
  • Up to 10 minutes for server/clients
  • About 3 minutes for clients
• We will do a client install

Slide 40
Windows Installation
• From the URL https://www.irods.org/index.php/windows go to the section labeled Windows i-Commands and click on the file 10-29-09: Windows i-commands 2.2
• This will download the file win_icmds_2_2.zip
• Uncompress the file

Slide 41
Detailed Windows Install
• Extract the exe files. This will be a long list of separate executable commands, one for each type of operation that you may need to perform. The list will include:
    iadmin - used by the data grid administrator to set up resources and accounts
    icd    - change to a different directory in the data grid
    ils    - list files in a data grid directory
• To use these icommands, you will need to set up an environment variable file which has default settings for the data grid that the class will use.
• Note the directory name where you have put the executables

Slide 42
Detailed Windows Install
• On the URL https://www.irods.org/index.php/windows there are instructions in the section labeled Setting up the iRODS User Environment file in Windows (for icommands only)
• To create the .irodsEnv file:
  * Launch a "Command Prompt" by navigating to the menu "Start" -> "Accessories" -> "Command Prompt".
  * Change directory to the user home directory.
      > cd %HOMEDRIVE%%HOMEPATH%
  * Type the following Windows commands to create a folder, ".irods", and move into this directory.
      > md .irods
      > cd .irods
      > Notepad .irodsEnv
• This will launch Notepad and create a text file named ".irodsEnv".

Slide 43
Detailed Windows Install
• Enter the following information into Notepad and click save.
    irodsHost 'iren.renci.org'
    irodsPort 1247
    irodsDefResource 'renci-vault1'
    irodsHome '/RENCI/home/usertutor1'
    irodsCwd '/RENCI/home/usertutor1'
    irodsUserName 'usertutor1'
    irodsZone 'renci'
• These are the environment variables for a user account on the data grid 'RENCI'
• You will need to replace the three occurrences of 'usertutor1' with your iRODS account name on lines 4, 5, 6

Slide 44
Detailed Windows Install
• To run i-commands in any directory on a Windows machine, the path to where the i-commands reside should be set in the Windows PATH environment variable.
• To do this, launch the System dialogue via:
  * Start -> Settings -> Control Panel.
  * Click the "System" icon.
  * In the "Advanced" tab, click the "Environment variables" button.
• Add the path name for the i-commands directory to the "PATH" in either the user category or the system category. The path name can be found from the window that shows the icommand executables. Add a semi-colon and this path name to the end of the PATH text.
• Then close the window and start a new command prompt window. You will be able to execute the icommands from any directory on your system.

Slide 45
Detailed Windows Install
• To connect to the data grid, type
    iinit
• To change your password, type
    ipasswd
  You will be prompted for your current password.
  You will then be asked for the new password.

Slide 46
iRODS - Unix/Linux/Mac Installation
• https://www.irods.org/download.html
• Fill out form for:
  • BSD license
  • Registration / agreement
• Tar file
  • Installation script (Linux, Solaris, Mac OSX)
  • Automated download of PostgreSQL, ODBC
  • Installation of PostgreSQL, ODBC, iRODS
  • Initiation of iRODS collection

Slide 47
iRODS Installation - Unix
• Unpack the release tar file
    gzip -d irods.tgz
    tar xf irods.tar
• cd into the top directory and execute
    ./irodssetup
• It will prompt for a few parameters

Slide 48
irodssetup
    Set up iRODS
    ------------------------------------------------------------------------
    iRODS is a flexible data archive management system that supports many
    different site configurations. This script will ask you a few questions,
    then automatically build and configure iRODS.

    There are four main components to iRODS:
      1. An iRODS server that manages stored data.
      2. An iCAT catalog that manages metadata about the data.
      3. A database used by the catalog.
      4. A set of 'i-commands' for command-line access to your data.

    You can build some, or all of these, in a few standard configurations.
    For new users, we recommend that you build everything.

Slide 49
iRODS Client Installation
    iRODS configuration setup
    ----------------------------------------------------------------
    This script prompts you for key iRODS configuration options.
    Default values (if any) are shown in square brackets [ ] at each
    prompt. Press return to use the default, or enter a new value.

    For flexibility, iRODS has a lot of configuration options. Often
    the standard settings are sufficient, but if you need more control
    enter yes and additional questions will be asked.

    Include additional prompts for advanced settings [no]?
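The prompt convention described above (default shown in square brackets, empty input selects the default) is simple to model. The Python sketch below illustrates the convention only; it is not taken from the irodssetup script, and both function names are hypothetical.

```python
def format_prompt(question, default=None):
    """Render a question with its default in square brackets, if any."""
    if default is None:
        return f"{question}? "
    return f"{question} [{default}]? "


def resolve_answer(typed_text, default):
    """Return what the user typed, or the default when they just press return."""
    answer = typed_text.strip()
    return answer if answer else default
```

For example, `format_prompt("Include GSI", "no")` produces the prompt text `Include GSI [no]? `, and `resolve_answer("", "no")` yields `no`, which is what pressing return at that prompt selects.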

Slide 50
iRODS Client Installation
    iRODS configuration (advanced)
    ------------------------------
    iRODS consists of clients (e.g. i-commands) with at least one iRODS
    server. One server must include the iRODS metadata catalog (iCAT).

    For the initial installation, you would normally build the server with
    the iCAT (an iCAT-Enabled Server, IES), along with the i-commands.

    After that, you might want to build another Server to support another
    storage resource on another computer (where you are running this now).
    You would then build the iRODS server non-ICAT, and configure it with
    the IES host name (the servers connect to the IES for ICAT operations).

    If you already have iRODS installed (an IES), you may skip building
    the iRODS server and iCAT, and just build the command-line tools.

    Build an iRODS server [yes]? no

Slide 51
iRODS Client Installation
    iRODS can make use of the Grid Security Infrastructure (GSI)
    authentication system in addition to the iRODS secure
    password system (challenge/response, no plain-text).
    In most cases, the iRODS password system is sufficient but
    if you are using GSI for other applications, you might want
    to include GSI in iRODS. Both the clients and servers need
    to be built with GSI and then users can select it by setting
    irodsAuthScheme=GSI in their .irodsEnv files (or still use
    the iRODS password system if they want).

    Include GSI [no]? no

Slide 52
iRODS Client Installation
    Confirmation
    ------------
    Please confirm your choices.
    --------------------------------------------------------
    GSI not selected
    Build iRODS command-line tools
    --------------------------------------------------------
    Save configuration (irods.config) [yes]?
    Saved.
    Start iRODS build [yes]?

Slide 53
iRODS Client Installation
    Build and configure
    -------------------
    Preparing...
    Configuring iRODS...
    Step 1 of 4: Enabling modules...
        properties
    Step 2 of 4: Verifying configuration...
        No database configured.
    Step 3 of 4: Checking host system...
        Host OS is Mac OS X.
        Perl: /usr/bin/perl
        C compiler: /usr/bin/gcc (gcc)
          Flags: none
        Loader: /usr/bin/gcc
          Flags: none
        Archiver: /usr/bin/ar
        Ranlib: /usr/bin/ranlib
        64-bit addressing not supported and automatically disabled.

Slide 54
iRODS Client Installation
    Step 4 of 4: Updating configuration files...
        Updating config.mk...
        Created /iRODS/config/config.mk
        Updating platform.mk...
        Created /iRODS/config/platform.mk
        Updating irods.config...
        Updating irodsctl...
    Compiling iRODS...
    Step 1 of 2: Compiling library and i-commands...
    Step 2 of 2: Compiling tests...
    Done!

Slide 55
iRODS Client Installation
    -----
    To use the iRODS command-line tools, update your PATH:
      For csh users:
        set path=(/iRODS/clients/icommands/bin $path)
      For sh or bash users:
        PATH=/iRODS/clients/icommands/bin:$PATH

    Please see the iRODS documentation for additional notes on how
    to manage the servers and adjust the configuration.
• Change the path name to your installation path

Slide 56
Environment Variables
• In home directory
    cd ~/.irods
    vi .irodsEnv
• Default values to describe settings for interacting with your data grid

Slide 57
Environment File
    # iRODS personal configuration file.
    #
    # This file was automatically created during iRODS installation.
    # Created Fri Jan 18 10:01:48 2008
    #
    # iRODS server host name:
    irodsHost 'iren.renci.org'
    # iRODS server port number:
    irodsPort 1247
    # Home directory in iRODS:
    irodsHome '/RENCI/home/usertutor1'
    # Current directory in iRODS:
    irodsCwd '/RENCI/home/usertutor1'
    # Account name:
    irodsUserName 'usertutor1'
    # Zone:
    irodsZone 'renci'

Slide 58
User Configuration
• To use the iRODS 'i-commands', update your PATH:
  • For csh users:
      set path=(/storage-site/iRODS/clients/icommands/bin $path)
  • For sh or bash users:
      PATH=/storage-site/iRODS/clients/icommands/bin:$PATH

Slide 59
irodsctl - script to control iRODS
• Usage is:
    ./irods/irodsctl [options] [commands]
• Help options:
    --help      Show this help information
• Verbosity options:
    --quiet     Suppress all messages
    --verbose   Output all messages (default)
• iRODS server commands:
    istart      Start the iRODS servers
    istop       Stop the iRODS servers
    irestart    Restart the iRODS servers

Slide 60
irodsctl options
• Database commands:
    dbstart     Start the database servers
    dbstop      Stop the database servers
    dbrestart   Restart the database servers
    dbdrop      Delete the iRODS tables in the database
    dboptimize  Optimize the iRODS tables in the database
    dbvacuum    Same as 'optimize'
• General commands:
    start       Start the iRODS and database servers
    stop        Stop the iRODS and database servers
    restart     Restart the iRODS and database servers
    status      Show the status of iRODS and database servers
    test        Test the iRODS installation
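
When irodsctl is driven from a script, it can help to check the command name against the list on these slides before anything is run. The Python sketch below only builds an argument list in the usage form shown above; the helper name and the choice to reject unknown commands are assumptions for illustration, and nothing here is part of iRODS itself.

```python
# Commands and descriptions copied from the irodsctl slides above.
IRODSCTL_COMMANDS = {
    "istart": "Start the iRODS servers",
    "istop": "Stop the iRODS servers",
    "irestart": "Restart the iRODS servers",
    "dbstart": "Start the database servers",
    "dbstop": "Stop the database servers",
    "dbrestart": "Restart the database servers",
    "dbdrop": "Delete the iRODS tables in the database",
    "dboptimize": "Optimize the iRODS tables in the database",
    "dbvacuum": "Same as 'optimize'",
    "start": "Start the iRODS and database servers",
    "stop": "Stop the iRODS and database servers",
    "restart": "Restart the iRODS and database servers",
    "status": "Show the status of iRODS and database servers",
    "test": "Test the iRODS installation",
}


def build_irodsctl_argv(command, quiet=False, script="./irods/irodsctl"):
    """Build the argument list for: irodsctl [options] [commands].

    Hypothetical helper: it only assembles and validates the arguments,
    it does not execute anything.
    """
    if command not in IRODSCTL_COMMANDS:
        raise ValueError(f"unknown irodsctl command: {command!r}")
    argv = [script]
    if quiet:
        argv.append("--quiet")  # --verbose is the default per the usage text
    argv.append(command)
    return argv
```

On a host where iRODS is installed, the resulting list could be handed to `subprocess.run(argv, check=True)`; keeping the build step separate makes it easy to log or review a command such as dbdrop before it is executed.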