Ceph – status update
and xrootd testing
Alastair Dewhurst, Tom Byrne
Tom Byrne, 12th November 2014
Introduction
• On 15th October we gave an overview talk on plans for Ceph at the RAL Tier 1.
• This talk aims to provide an update on progress, focusing on the xrootd deployment and testing.
• The current Ceph cluster has 7 nodes using 2013-generation hardware.
S3 gateway
• At the last meeting we had the S3 gateway running on a virtual machine:
  • We hope to have the firewall holes and X.509 authentication working by next week.
• The S3 gateway ‘does its own thing’ with files, which makes it difficult to use with other plugins.
• We will investigate writing our own WebDAV gateway.
CERN plugins
• CERN have four plugins based on XRootD for Ceph:
  • radosfs (implements files and directories in RADOS)
  • xrootd-rados-oss (interfaces radosfs as an OSS plug-in)
  • xrootd-diamond-ofs (adds checksumming and third-party copy, TPC)
  • xrootd-auth-change-id (adds NFS-server-style authentication to xrootd)
• Our work has been on xrootd-diamond-ofs.
• Setup instructions can be found at: https://github.com/cern-eos/eos-diamond/wiki
Xrootd deployment
• We used the RPMs provided on the wiki to set up the XRootD gateway.
• We had to set up a cache tier because the plugin currently doesn’t work directly with erasure coded pools.
  • This is because the file is opened and then appended to. CERN are working on patching it to work with EC.
• There are two pools:
  • Data and meta-data
Cache Tier
• The cache tier is using mostly default settings:
  • 3 replicas of the data.
  • A ‘cold’ erasure coded copy is created immediately.
  • An LRU algorithm cleans up data.
• We would prefer not to use a cache tier and instead have direct access to the erasure coded pool.
  • It would be possible to have a ~10% cache tier in front of the storage.
  • We believe the erasure coded pool should work well as we are not appending to files.
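The cache tier behaviour described on this slide (immediate cold copy, LRU clean-up) can be illustrated with a toy model. This is only a sketch to show the idea, not Ceph code and not how the Ceph tiering agent is implemented; the class name `ToyCacheTier` and the capacity counted in objects are assumptions made for the example.

```python
from collections import OrderedDict


class ToyCacheTier:
    """Toy model of a hot cache in front of a cold pool (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed unit: number of objects kept hot
        self.hot = OrderedDict()   # ordered from least to most recently used
        self.cold = {}             # stands in for the erasure coded pool

    def write(self, name, data):
        self.hot[name] = data
        self.hot.move_to_end(name)   # newly written object is most recent
        self.cold[name] = data       # the 'cold' copy is made immediately
        self._evict()

    def read(self, name):
        if name in self.hot:
            self.hot.move_to_end(name)   # cache hit: mark as most recent
            return self.hot[name]
        data = self.cold[name]           # cache miss: promote from cold pool
        self.hot[name] = data
        self._evict()
        return data

    def _evict(self):
        # Drop least recently used objects until the cache fits again.
        while len(self.hot) > self.capacity:
            self.hot.popitem(last=False)


tier = ToyCacheTier(capacity=2)
tier.write("a", b"1")
tier.write("b", b"2")
tier.read("a")           # "a" becomes the most recently used object
tier.write("c", b"3")    # cache is full, so "b" is evicted
print(sorted(tier.hot))   # ['a', 'c']
print(sorted(tier.cold))  # ['a', 'b', 'c']
```

Because every write also lands in the cold pool, evicting from the hot tier never loses data in this model; a later read of "b" simply promotes it again.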
Diamond data
• The plugin splits each file into chunks, which are stored in Ceph under a GUID (as in the listing below).
• This makes it hard to manage files and to write other plugins.
[root@gdss540 ~]# rados -p diamond-data ls | grep 774b1a83-14d0-4fb9-a6c0-10e36c32febf | sort
774b1a83-14d0-4fb9-a6c0-10e36c32febf
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000001
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000002
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000003
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000004
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000005
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000006
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000007
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000008
774b1a83-14d0-4fb9-a6c0-10e36c32febf//00000009
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000a
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000b
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000c
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000d
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000e
774b1a83-14d0-4fb9-a6c0-10e36c32febf//0000000f
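The naming pattern in this listing (the bare GUID, then GUID//<8 hex digits>) can be reproduced and parsed with a few lines of Python. This is a sketch inferred only from the listing above: the helper names are hypothetical, and treating the bare GUID object as chunk 0 is an assumption rather than something the plugin documentation confirms.

```python
import re
from collections import defaultdict

# Matches "<guid>" or "<guid>//<8 hex digits>", as seen in the rados listing.
CHUNK_RE = re.compile(r"^(?P<guid>[0-9a-f-]{36})(?://(?P<index>[0-9a-f]{8}))?$")


def chunk_names(guid, n_objects):
    """Object names for one file: the bare GUID, then hex-numbered chunks."""
    return [guid] + [f"{guid}//{i:08x}" for i in range(1, n_objects)]


def group_by_guid(object_names):
    """Map each GUID to the sorted chunk indices found for it."""
    groups = defaultdict(list)
    for name in object_names:
        match = CHUNK_RE.match(name)
        if match is None:
            continue  # not a Diamond-style object name
        index = match.group("index")
        # Assumption: the bare GUID object counts as chunk 0.
        groups[match.group("guid")].append(int(index, 16) if index else 0)
    return {guid: sorted(indices) for guid, indices in groups.items()}


guid = "774b1a83-14d0-4fb9-a6c0-10e36c32febf"
names = chunk_names(guid, 16)
print(names[-1])
print(group_by_guid(names)[guid] == list(range(16)))
```

A grouping like this is the kind of bookkeeping any other tool would need in order to treat the chunks as one file, which is why the format makes management harder.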
Diamond meta-data
https://indico.cern.ch/event/305441/session/5/contribution/37/material/slides/0.pdf
Testing
• We have tried commands from:
  • UI (using xrootd v3.3.6)
  • Node (using xrootd v4.0.4)
• We can copy files in and out:
[root@gdss540 ~]# xrdcp ./ivukotic\:group.test.hc.NTUP_SMWZ.root root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root.1
[760.2MB/760.2MB][100%][==================================================][95.03MB/s]
[root@gdss540 ~]# xrdcp root://gdss541//root/ivukotic:group.test.hc.NTUP_SMWZ.root /ivukotic\:group.test.hc.NTUP_SMWZ.root
[760.2MB/760.2MB][100%][==================================================][58.48MB/s]
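The transfer times implied by these two progress lines can be estimated from the size and rate they report. The sketch below assumes the printed rate is the average over the whole transfer, which the slide does not state; the function name `transfer_seconds` is hypothetical.

```python
import re

# Pulls total size and rate out of a progress line such as
# "[760.2MB/760.2MB][100%][====][95.03MB/s]".
PROGRESS_RE = re.compile(
    r"\[(?P<done>[\d.]+)MB/(?P<total>[\d.]+)MB\].*\[(?P<rate>[\d.]+)MB/s\]"
)


def transfer_seconds(line):
    """Estimated duration in seconds: total size divided by reported rate."""
    match = PROGRESS_RE.search(line)
    if match is None:
        raise ValueError("not a recognised progress line")
    return float(match.group("total")) / float(match.group("rate"))


write_line = "[760.2MB/760.2MB][100%][=====][95.03MB/s]"
read_line = "[760.2MB/760.2MB][100%][=====][58.48MB/s]"
print(round(transfer_seconds(write_line), 1))  # 8.0
print(round(transfer_seconds(read_line), 1))   # 13.0
```

Under that assumption, copying the 760.2MB file in took about 8 seconds and copying it out about 13 seconds.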
“Filesystem”
• We can create directories with UNIX-style permissions:
xrdfs gdss541 mkdir "/atlas/?owner=10763&group=1307"
[root@gdss540 ~]# xrdfs gdss541 ls /atlas/
/atlas/ivukotic:group.test.hc.NTUP_SMWZ.root
/atlas/test
• The setup is “fragile”: we frequently need to restart xrootd.
  • It dies when doing “ls -l”.
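The mkdir command on this slide passes the owner and group as a query string appended to the path. A small sketch of building that argument in Python follows; the helper name `mkdir_argument` is hypothetical, and only the `owner` and `group` keys seen on the slide are used, since the slide does not say which other keys the plugin accepts.

```python
import shlex
from urllib.parse import urlencode


def mkdir_argument(path, owner, group):
    """Path plus owner/group query string, as in the xrdfs mkdir command."""
    return f"{path}?{urlencode({'owner': owner, 'group': group})}"


# Argument list suitable for subprocess.run(); not executed here.
command = ["xrdfs", "gdss541", "mkdir", mkdir_argument("/atlas/", 10763, 1307)]
print(shlex.join(command))
```

Passing the command as a list avoids shell quoting problems with the `?` and `&` characters in the path.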
Direct Read
• Code from Wahid:
  • git clone https://[email protected]/reps/FAX
• We wanted to try 4 tests:
  • Read 10% of the file and use a 30MB cache
  • Read 100% of the file and use a 30MB cache
  • Read 10% of the file and use a 100MB cache – CRASHED!
  • Read 100% of the file and use a 100MB cache – CRASHED!
Results with 30MB cache:

Read   Metric           1st       2nd       3rd       Average
100%   CPU time / s     31.13     31.13     30.5      30.92
100%   Disk IO / MB/s   112.654   112.951   113.094   112.8997
10%    CPU time / s     15.9      16.35     16.04     16.09667
10%    Disk IO / MB/s   110.737   112.13    112.056   111.641
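The averages in the 30MB-cache results can be re-derived from the three runs. The sketch below uses the figures from the slide; the row labels (which numbers belong to the 100% read and which to the 10% read) follow the order in which they appear and are an assumption about the original table layout.

```python
from statistics import mean

# Three runs per measurement, as listed on the slide.
runs = {
    ("100%", "CPU time / s"): [31.13, 31.13, 30.5],
    ("100%", "Disk IO / MB/s"): [112.654, 112.951, 113.094],
    ("10%", "CPU time / s"): [15.9, 16.35, 16.04],
    ("10%", "Disk IO / MB/s"): [110.737, 112.13, 112.056],
}

# Averages as reported on the slide.
reported = {
    ("100%", "CPU time / s"): 30.92,
    ("100%", "Disk IO / MB/s"): 112.8997,
    ("10%", "CPU time / s"): 16.09667,
    ("10%", "Disk IO / MB/s"): 111.641,
}

for key, values in runs.items():
    average = mean(values)
    matches = abs(average - reported[key]) < 1e-3
    print(key, round(average, 4), "matches slide" if matches else "differs")
```

All four reported averages agree with the three individual runs to within rounding.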
Future plans
• 3 threads of development:
  • Get a simplified xrootd to work.
  • Look into a GridFTP gateway. We have spoken to Brian Bockelman, who has made an equivalent for HDFS.
  • Look into a WebDAV gateway. There are instructions to get started on the Ceph wiki, and we will speak to the DPM developers.
• We need to start looking at extended attributes (xattr).
• We have procured a Mac mini for future Calamari builds.
Summary
• We got the S3 gateway to work, but it wasn’t quite what we wanted.
• We are testing the Diamond plugin with help from CERN. We do not need all of its features.
• Question: why do all the plugins create their own data formats?
• If we go with an object store we will have to write our own plugins, but this does not appear to be an impossible task.