Transcript slides
Slide 1 – Ceph: status update and xrootd testing
Alastair Dewhurst, Tom Byrne
Tom Byrne, 12th November 2014

Slide 2 – Introduction
• On 15th October we gave an overview talk on the plans for Ceph at the RAL Tier 1.
• This talk provides an update on progress, focusing on the xrootd deployment and testing.
• The current Ceph cluster has 7 nodes of 2013-generation hardware.

Slide 3 – S3 gateway
• At the last meeting we had the S3 gateway running on a virtual machine.
• We hope to have the firewall holes and X.509 authentication working by next week.
• The S3 gateway 'does its own thing' with files, which makes it difficult to use with other plugins.
• We will investigate writing our own WebDAV gateway.

Slide 4 – CERN plugins
• CERN have four plugins based on XRootD for Ceph:
  • radosfs (implements files and directories in RADOS)
  • xrootd-rados-oss (interfaces radosfs as an OSS plug-in)
  • xrootd-diamond-ofs (adds checksumming and third-party copy)
  • xrootd-auth-change-id (adds NFS-server-style authentication to xrootd)
• Our work has been on xrootd-diamond-ofs.
• Setup instructions can be found at: https://github.com/cern-eos/eos-diamond/wiki

Slide 5 – Xrootd deployment
• Used the RPMs provided on the wiki to set up the XRootD gateway.
• Had to set up a cache tier because the plugin does not currently work directly with erasure-coded pools.
  • This is because the file is opened and then appended to; CERN are working on patching it to work with EC pools.
• There are two pools: data and metadata.

Slide 6 – Cache tier
• The cache tier uses mostly default settings:
  • 3 replicas of the data.
  • A 'cold' erasure-coded copy is created immediately.
  • An LRU algorithm cleans up the cached data.
• We would prefer not to use a cache tier and instead have direct access to the erasure-coded pool.
  • It would be possible to put a ~10% cache tier in front of the storage.
  • We believe an erasure-coded pool should work well, as we are not appending to files.
• A hedged CLI sketch of this cache-tier layout follows below.
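To make the deployment on slides 5 and 6 concrete, here is a minimal sketch of a replicated cache tier in front of an erasure-coded pool using the (firefly-era) Ceph CLI. The pool names diamond-data and diamond-cache, the PG counts, the EC profile parameters and the size cap are illustrative assumptions, not the values used at RAL:

  # Erasure-coded base pool (profile parameters are illustrative).
  ceph osd erasure-code-profile set diamond-profile k=8 m=3 ruleset-failure-domain=host
  ceph osd pool create diamond-data 1024 1024 erasure diamond-profile

  # Replicated cache pool holding the 3 copies mentioned on the slide.
  ceph osd pool create diamond-cache 1024
  ceph osd pool set diamond-cache size 3

  # Put the cache in writeback mode in front of the base pool, so the
  # xrootd gateway's open-then-append writes land on replicas rather than
  # hitting the EC pool directly.
  ceph osd tier add diamond-data diamond-cache
  ceph osd tier cache-mode diamond-cache writeback
  ceph osd tier set-overlay diamond-data diamond-cache

  # Hit-set tracking lets the tiering agent decide what to flush and evict.
  ceph osd pool set diamond-cache hit_set_type bloom
  ceph osd pool set diamond-cache target_max_bytes 1099511627776  # ~1 TB cap, illustrative

With this overlay in place, clients keep addressing diamond-data and Ceph redirects the I/O through the cache, which is what lets the unpatched plugin work on top of erasure-coded storage.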
Slide 7 – Diamond data
• The plugin splits each file into chunks, which are stored under a GUID in Ceph:
  • This makes it hard to manage files and to write other plugins.

[root@gdss540 ~]# rados -p diamond-data ls | grep 774b1a83-14d0-4fb9-a6c0-10e36c32febf | sort
774b1a83-14d0-4fb9-a6c0-10e36c32febf
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000001
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000002
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000003
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000004
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000005
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000006
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000007
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000008
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000009
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000a
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000b
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000c
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000d
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000e
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000f

Slide 8 – Diamond meta-data
• See: https://indico.cern.ch/event/305441/session/5/contribution/37/material/slides/0.pdf

Slide 9 – Testing
• Have tried commands from:
  • a UI (using xrootd v3.3.6)
  • a node (using xrootd v4.0.4)
• Can copy files in and out:

[root@gdss540 ~]# xrdcp ./ivukotic\:group.test.hc.NTUP_SMWZ.root root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root.1
[760.2MB/760.2MB][100%][==================================================][95.03MB/s]
[root@gdss540 ~]# xrdcp root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root ./ivukotic\:group.test.hc.NTUP_SMWZ.root
[760.2MB/760.2MB][100%][==================================================][58.48MB/s]

Slide 10 – "Filesystem"
• Can create directories with UNIX-style permissions:

xrdfs gdss541 mkdir "/atlas/?owner=10763&group=1307"

[root@gdss540 ~]# xrdfs gdss541 ls /atlas/
/atlas/ivukotic:group.test.hc.NTUP_SMWZ.root
/atlas/test

• The setup is "fragile" – we frequently need to restart xrootd.
  • It dies when doing "ls -l".

Slide 11 – Direct read
• Code from Wahid:
  • git clone https://[email protected]/reps/FAX
• Wanted to try 4 tests:
  • Read 10% of the file with a 30MB cache
  • Read 100% of the file with a 30MB cache
  • Read 10% of the file with a 100MB cache – CRASHED!
  • Read 100% of the file with a 100MB cache – CRASHED!
• Results with the 30MB cache:

  30MB cache          1st      2nd      3rd      Average
  100% read
    CPU time /s       31.13    31.13    30.5     30.92
    Disk IO MB/s      112.654  112.951  113.094  112.8997
  10% read
    CPU time /s       15.9     16.35    16.04    16.09667
    Disk IO MB/s      110.737  112.13   112.056  111.641

Slide 12 – Future plans
• Three threads of development:
  • Get a simplified xrootd to work.
  • Look into a GridFTP gateway – we have spoken to Brian Bockelman, who has made an equivalent for HDFS.
  • Look into a WebDAV gateway – instructions to get started are on the Ceph wiki, and we will speak to the DPM developers.
• Need to start looking at xattrs (a hedged rados sketch follows after the summary slide).
• We have procured a Mac mini for future Calamari builds.

Slide 13 – Summary
• We got the S3 gateway to work, but it wasn't quite what we wanted.
• We are testing the Diamond plugin with help from CERN; we do not need all of its features.
• Question: why do all the plugins create their own data formats?
• If we go with an object store we will have to write our own plugins, but this does not appear to be an impossible task.
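Picking up the xattr item from the future plans and the summary's point about writing our own plugins: storing each file as a single RADOS object, with its metadata in extended attributes, would avoid a Diamond-style chunk format entirely. Below is a minimal sketch using the stock rados CLI; the pool name atlas-data and the object naming scheme are illustrative assumptions (the owner/group values are the ones from the mkdir example on slide 10):

  # Store a whole file as one object, keyed by its logical path
  # (pool and object names are hypothetical).
  rados -p atlas-data put atlas/group.test.hc.NTUP_SMWZ.root ./ivukotic:group.test.hc.NTUP_SMWZ.root

  # Keep ownership as extended attributes instead of a plugin-specific
  # metadata format.
  rados -p atlas-data setxattr atlas/group.test.hc.NTUP_SMWZ.root owner 10763
  rados -p atlas-data setxattr atlas/group.test.hc.NTUP_SMWZ.root group 1307

  # List and read the attributes back.
  rados -p atlas-data listxattr atlas/group.test.hc.NTUP_SMWZ.root
  rados -p atlas-data getxattr atlas/group.test.hc.NTUP_SMWZ.root owner

  # Retrieve the file in one piece; no chunk reassembly needed.
  rados -p atlas-data get atlas/group.test.hc.NTUP_SMWZ.root /tmp/NTUP_SMWZ.root

Because the object name is the logical path, a future GridFTP or WebDAV gateway could map requests straight onto objects, which is one way the "write our own plugins" route could stay tractable.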