CASTOR / GridFTP Emil Knezo PPARC-LCG-Fellow CERN IT-ADC


CASTOR / GridFTP
Emil Knezo
PPARC-LCG-Fellow
CERN IT-ADC
GridPP 7th Collaboration Meeting, Oxford UK
July 1st 2003
Outline of this talk
Introduction to CASTOR HSM
CASTOR/GridFTP approach
GridFTP problems
CASTOR/GridFTP test service
Configuration issues
Usage examples
Plan for CASTOR/GridFTP service
1/7/2003
CASTOR & GridFTP / Emil Knezo CERN
CASTOR
CASTOR Mass Storage System evolved from SHIFT (tape management system of the 1990s)
CASTOR is an HSM (Hierarchical Storage Manager)
Today at CERN: 2066.37 TB of data in 10.51 M files stored in CASTOR
CASTOR provides to users:
Name space
File names are in the form:
/castor/domain_name/experiment_name/…
for example: /castor/cern.ch/cms/
/castor/domain_name/user/…
for example: /castor/cern.ch/user/k/knezo
POSIX-compliant I/O: RFIO
+ 64-bit support, streaming mode
- security (weak point)
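The name-space convention above can be illustrated with a minimal sketch. The helper names are hypothetical, and the rule that the user directory is grouped under the first letter of the login name is an assumption read off the single example /castor/cern.ch/user/k/knezo.

```python
import posixpath


def castor_experiment_path(domain, experiment, *parts):
    """Build /castor/<domain>/<experiment>/... from its components."""
    return posixpath.join("/castor", domain, experiment, *parts)


def castor_user_path(domain, user, *parts):
    """Build /castor/<domain>/user/<initial>/<user>/... (initial-letter rule is assumed)."""
    return posixpath.join("/castor", domain, "user", user[0], user, *parts)


print(castor_experiment_path("cern.ch", "cms"))  # /castor/cern.ch/cms
print(castor_user_path("cern.ch", "knezo"))      # /castor/cern.ch/user/k/knezo
```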
CASTOR current layout
[Diagram: CASTOR components: RFIO client, name server, VDQM server, RTCOPY client, stager, TPDAEMON (PVR), RTCPD (tape mover), RFIOD (disk mover), MSGD, volume manager, disk pool]
GridFTP for CASTOR
Motivation for GridFTP interface to CASTOR
LCG
Data-movement protocol to couple different HSM systems of Tier-1 centers
Used by Replica Management System
Experiments
Offer experiments a secure alternative to rfio and FTP
Support CMS world-wide production starting in July
Mid-July 2003: 1 TB per day to CASTOR from 12 regional centers
February 2004: several TB per day from/to CASTOR
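To put the 1 TB per day target in perspective, the following sketch converts a daily volume into the average rate a service would have to sustain around the clock. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load, neither of which is stated in the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(terabytes_per_day):
    """Average rate in MB/s needed to move the given volume in one day (decimal units)."""
    bytes_per_day = terabytes_per_day * 1e12
    return bytes_per_day / SECONDS_PER_DAY / 1e6


print(f"{sustained_rate_mb_per_s(1):.1f} MB/s")  # 11.6 MB/s
```

Real traffic is bursty, so peak rates would need to be higher than this average.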
Approach for GridFTP interface to CASTOR
Modification of an external GridFTP server to act as an rfio client to CASTOR
Solution already proven for FTP servers
Not enough manpower to develop and maintain our own server
Development time restriction
Selected GridFTP server
Globus Toolkit GridFTP-1.5 server
Based on wu-ftp 2.6.2
Widely used
Good support expected
Supported GridFTP extensions:
EBLOCK mode
PARALLEL transfer
REST STREAM
DCAU
ERET, ESTO
Also supported:
Third-party transfer
PBSZ, PROT
MDTM
Not supported GridFTP extensions:
STRIPING, SPAS, SPOR
ABUF, SBUF
[Diagram: GridFTP control and data channels (port 2811) to the GridFTP server process, which uses RFIO to reach the CASTOR stager and tapes]
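EBLOCK mode is what makes parallel transfer possible: each block carries its own offset, so blocks arriving out of order on different data connections can still be written to the right place. The sketch below packs and unpacks such a block, assuming the header layout given in the GridFTP protocol extensions document (1-byte descriptor, 8-byte byte count, 8-byte offset, network byte order). The helper names are illustrative and are not part of the Globus code.

```python
import struct

# Assumed header layout for extended block mode:
# descriptor (1 byte), byte count (8 bytes), offset (8 bytes), big-endian.
EBLOCK_HEADER = struct.Struct(">BQQ")

EOD = 0x08  # descriptor flag: last block sent on this data connection


def pack_block(payload, offset, descriptor=0):
    """Prefix payload with an extended-block header."""
    return EBLOCK_HEADER.pack(descriptor, len(payload), offset) + payload


def unpack_block(data):
    """Return (descriptor, offset, payload) for one block at the start of data."""
    descriptor, count, offset = EBLOCK_HEADER.unpack_from(data)
    start = EBLOCK_HEADER.size
    return descriptor, offset, data[start:start + count]


block = pack_block(b"abc", offset=10, descriptor=EOD)
print(unpack_block(block))  # (8, 10, b'abc')
```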
GridFTP problems
Firewalls
Bi-directional data transfer in EBLOCK mode
Cannot open data-connection – blocked by firewall
Firewalls with NAT
GSI mutual authentication errors
HSM
Data existing in the HSM name space is not always readily accessible:
An idle control-channel socket may be disconnected by some firewalls
Third-party transfer from an HSM suffers from the data-connection accept timeout at the data-receiving end
Solution
HSM:
Always pre-stage your data in the HSM before transfer
Currently with the CASTOR “stagein” command; later with the SRM interface, when available
Firewall:
Do not use firewalls with NAT
Do not block data-connections in firewall
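When a transfer fails to open a connection, a first diagnostic step is to check whether a plain TCP connection to a known listening port succeeds at all. The helper below is a generic sketch using only the Python standard library. It is not a GridFTP client, and a successful probe only shows that this one port is reachable in this one direction.

```python
import socket


def can_open_tcp(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_open_tcp("wacdr002d.cern.ch", 2811)` would probe the control port of the test server named in this talk.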
External network connection
[Diagram: network paths between the external links (GEANT 1 Gb/s, US-link 622 Mb/s, DataTAG 2.5 Gb/s) and the GridFTP server: routers, PIX (350 Mb/s half-duplex) and HTAR (1 Gb/s)]
Data connections of the CASTOR GridFTP server are routed via the 1 Gb/s High Throughput Access Route (HTAR)
Control connections are routed via PIX
TCP window size is fixed to 64 kB if a data connection goes via PIX
Only connections on high-numbered ports to/from the CASTOR GridFTP server are routed via HTAR
Configuration issue
Port number interval currently applicable: <50k, 51k>
External GridFTP clients or servers must also select data-connection port numbers from the interval of HTAR-routed ports, otherwise the data channel will go via PIX!
LCG guidelines for data-connection port numbers could solve this kind of configuration issue.
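A client-side sketch of the port constraint follows. It assumes that the interval on the slide means ports 50000 through 51000, and it relies on the `GLOBUS_TCP_PORT_RANGE` environment variable, which Globus tools read to restrict the ports they listen on. The helper names are illustrative.

```python
import os

# Assumption: "<50k,51k>" on the slide means ports 50000 through 51000.
HTAR_PORT_RANGE = (50000, 51000)


def in_htar_range(port, port_range=HTAR_PORT_RANGE):
    """True if a data-connection port falls inside the HTAR-routed interval."""
    low, high = port_range
    return low <= port <= high


def globus_port_range_env(port_range=HTAR_PORT_RANGE):
    """Copy of the current environment with GLOBUS_TCP_PORT_RANGE set to "min,max"."""
    env = dict(os.environ)
    env["GLOBUS_TCP_PORT_RANGE"] = "{},{}".format(*port_range)
    return env


print(in_htar_range(50500), in_htar_range(2811))  # True False
```

The returned environment could be passed to `subprocess.run(..., env=...)` when launching a transfer client.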
CASTOR/GridFTP test-service
[Diagram: GridFTP traffic reaches the server wacdr002d at CERN over the 1 Gbit/s GEANT link (1 Gbit/s via HTAR since mid-May); wacdr002d accesses CASTOR via rfio]
Test service in operation since mid-January 2003
Installation based on
EDG Globus, rel. 24 (January to mid-June)
VDT 1.1.8 (since mid-June)
Supports
All EDG GridFTP clients, globus-url-copy
Still on the server-code to-do list
64-bit file support (currently no files > 2 GB)
CWD and CDUP fail on the CASTOR name space (“..” problem)
In the meantime, clients must use full paths for CASTOR files
Internal “ls”: a patched version of CASTOR’s nsls client is currently used
Test some GridFTP commands that are not yet in use (ESTO, ERET)
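Until the server handles CWD and CDUP, a client can work around both open items itself. This minimal sketch resolves relative paths to full CASTOR paths on the client side and rejects files above the size limit. The function names are hypothetical, and the exact limit (2^31 - 1 bytes) is an assumption about what "no files > 2 GB" means.

```python
import posixpath

MAX_FILE_BYTES = 2**31 - 1  # assumed signed 32-bit size limit behind "no files > 2 GB"


def full_castor_path(path, base):
    """Resolve a possibly relative path (including "..") against base, purely lexically."""
    resolved = posixpath.normpath(posixpath.join(base, path))
    if not resolved.startswith("/castor/"):
        raise ValueError(f"not a CASTOR path: {resolved}")
    return resolved


def fits_size_limit(nbytes):
    """True if a file of nbytes can be handled by a server without 64-bit file support."""
    return nbytes <= MAX_FILE_BYTES


print(full_castor_path("../cms/file.name", "/castor/cern.ch/atlas"))
# /castor/cern.ch/cms/file.name
```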
Evolution of CASTOR/GridFTP service
Set of configurations extended by UID-to-stager mapping
DNS load balancing (still to be verified)
Stager-response logging
Increased data-connection accept timeout (20 min)
[Diagram: GridFTP traffic arrives via HTAR at a DNS load-balanced set of servers Serv_1, Serv_2, … Serv_n, each running a GridFTP daemon that talks rfio to CASTOR; a UID-to-stager mapping selects the stager: stageatlas, cms001d, … stagepublic]
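The UID-to-stager mapping can be pictured as a simple lookup with a public fallback. In this sketch the stager names are the ones shown on the slide, but which account is served by which stager is purely hypothetical, as is the choice of stagepublic as the default.

```python
# Hypothetical mapping from local account name to stager host.
# Stager names are taken from the slide; the pairing with accounts is assumed.
STAGER_BY_ACCOUNT = {
    "atlas001": "stageatlas",
    "cms001": "cms001d",
}
DEFAULT_STAGER = "stagepublic"  # assumed fallback for accounts without a dedicated stager


def stager_for(account):
    """Return the stager host a server would use for the given local account."""
    return STAGER_BY_ACCOUNT.get(account, DEFAULT_STAGER)


print(stager_for("atlas001"))  # stageatlas
print(stager_for("lhcb001"))   # stagepublic
```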
Performance and statistics
Performance
CERN-internal transfers: previously 5 MB/s in/out, now 7 MB/s in/out
Transfers from NIKHEF: previously 3 MB/s in/out, current figure not yet available
Standard CERN TCP configuration (64kB TCP buffer size)
Not via HTAR
10 parallel streams
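The 64 kB buffer explains why parallel streams matter. A TCP stream can have at most one window of data in flight per round trip, so its rate is bounded by window size divided by round-trip time. The sketch below evaluates that bound. The 20 ms round-trip time is a made-up illustrative value, not a measurement from the talk, and 64 kB is taken as 64 KiB.

```python
def window_limited_rate(window_bytes, rtt_seconds, streams=1):
    """Upper bound in bytes/s: each stream has at most one window in flight per round trip."""
    return streams * window_bytes / rtt_seconds


WINDOW = 64 * 1024  # 64 kB TCP buffer, taken as 64 KiB
RTT = 0.020         # hypothetical 20 ms round-trip time

one = window_limited_rate(WINDOW, RTT)
ten = window_limited_rate(WINDOW, RTT, streams=10)
print(f"1 stream: {one / 1e6:.1f} MB/s, 10 streams: {ten / 1e6:.1f} MB/s")
```

With these assumed numbers a single stream is capped at roughly 3.3 MB/s, and ten streams raise the bound tenfold. Disk, stager and network contention can keep real throughput well below it.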
Statistics
Not properly kept
Ftp-xferlog file: file size is recorded incorrectly for outbound traffic
GridFTP-xferlog: the file record is repeated for every parallel stream of a transfer
Example: two-week statistics, May 26 to June 9:
Transferred 1480 files (1217 inbound, 263 outbound)
627.425 GB stored to CASTOR via the GridFTP service on wacdr002d
Main user: ATLAS
gppui04.gridpp.rl.ac.uk, aftpexp.bnl.gov, lscf.nbi.dk
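As a rough reading of the two-week figures, the sketch below derives the average inbound file size and the average inbound rate. It assumes the stored volume is 627.425 GB in decimal units, that it corresponds to the 1217 inbound files, and that the period is counted as 14 full days.

```python
from datetime import date

files_in, files_out = 1217, 263
stored_gb = 627.425  # volume stored to CASTOR, read as 627.425 GB
period_days = (date(2003, 6, 9) - date(2003, 5, 26)).days  # 14

avg_file_mb = stored_gb * 1000 / files_in
avg_rate_mb_s = stored_gb * 1000 / (period_days * 86400)

print(f"average inbound file: {avg_file_mb:.0f} MB")
print(f"average inbound rate: {avg_rate_mb_s:.2f} MB/s")
```

Under these assumptions the service was busy only a small fraction of the time compared with the per-transfer rates quoted above.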
DN -- User mapping
EDG-mechanisms used
grid-mapfile at the moment
Mapping granularity on VO-level (LDAP URL)
User-level granularity is currently unmaintainable
No dynamic pool accounts
edg-gridmap.conf:
group ldap://grid-vo.nikhef.nl/ou=testbed1,o=alice,dc=eu-datagrid,dc=org alice001
group ldap://grid-vo.nikhef.nl/ou=testbed1,o=atlas,dc=eu-datagrid,dc=org atlas001
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=cms,dc=eu-datagrid,dc=org cms001
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=lhcb,dc=eu-datagrid,dc=org lhcb001
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=biomedical,dc=eu-datagrid,dc=org biome001
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=earthob,dc=eu-datagrid,dc=org ob001
group ldap://marianne.in2p3.fr/ou=ITeam,o=testbed,dc=eu-datagrid,dc=org iteam001
group ldap://marianne.in2p3.fr/ou=wp6,o=testbed,dc=eu-datagrid,dc=org wpsix001
It is up to the VO admin to create subsets of users for other UIDs
One DN, one user restriction
Hard to sell to experiments
VOMS should solve the problem
VOMS provides <DN + role> based UID mapping
VOMS to be tested with the CASTOR GridFTP server (configuration issue)
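The VO-level granularity is visible in the configuration itself: each line maps a whole LDAP group to one local account. A minimal parsing sketch, assuming each entry has the form "group <ldap-url> <account>" on one line, and keying the result by the `o` and `ou` attributes of the LDAP path (a choice made here for illustration only):

```python
from urllib.parse import urlparse

SAMPLE = """\
group ldap://grid-vo.nikhef.nl/ou=testbed1,o=atlas,dc=eu-datagrid,dc=org atlas001
group ldap://grid-vo.nikhef.nl/ou=tb1users,o=cms,dc=eu-datagrid,dc=org cms001
"""


def parse_group_lines(text):
    """Return {(o, ou): account} for every 'group <ldap-url> <account>' line."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] != "group":
            continue  # skip blank lines and other directives
        dn = urlparse(fields[1]).path.lstrip("/")
        attrs = dict(part.split("=", 1) for part in dn.split(","))
        mapping[(attrs["o"], attrs["ou"])] = fields[2]
    return mapping


print(parse_group_lines(SAMPLE))
# {('atlas', 'testbed1'): 'atlas001', ('cms', 'tb1users'): 'cms001'}
```

Every member of a group ends up under the same account, which is the single shared UID per VO discussed above.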
Umask and usage examples
Umask 002 => “rw-rw-r--” permissions on CASTOR
Per-server umask configuration
CASTOR at the moment still requires world-readable files
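The relation between umask and the resulting permission string can be checked with a short calculation. This sketch assumes files are requested with mode 0666, the usual default for regular files, which the talk does not state.

```python
import stat


def mode_string(umask, requested=0o666):
    """Permission string of a regular file created with the given umask."""
    mode = requested & ~umask
    return stat.filemode(stat.S_IFREG | mode)[1:]  # drop the leading file-type character


print(mode_string(0o002))  # rw-rw-r--
print(mode_string(0o022))  # rw-r--r--
```

Both results keep the world-readable bit. A umask such as 007 would clear it.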
Usage examples
Prestage file
stagein [-h wacdr002d] -M /castor/cern.ch/atlas/subdirectory/file.name
stageqry [-h wacdr002d] -M /castor/cern.ch/atlas/subdirectory/file.name
Will be replaced by the SRM prepareToGet call
Retrieve file from CASTOR
globus-url-copy [-p 10]
gsiftp://wacdr002d.cern.ch/castor/cern.ch/atlas/subdirectory/file.name
file:///home/knezo/file.name
Third party transfer from CASTOR
globus-url-copy [-p 10]
gsiftp://wacdr002d.cern.ch/castor/cern.ch/atlas/subdirectory/file.name
gsiftp://spider.usatlas.bnl.gov/usatlas/workarea/knezo/file.name
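For scripted use, the commands above can be assembled as argument lists, with the bracketed options made optional. This sketch only builds and prints the command lines using the flags shown on this slide. The function names are hypothetical and nothing is executed.

```python
import shlex


def stagein_cmd(castor_path, stager_host=None):
    """Argument list for pre-staging one CASTOR file."""
    cmd = ["stagein"]
    if stager_host:
        cmd += ["-h", stager_host]
    return cmd + ["-M", castor_path]


def url_copy_cmd(source_url, dest_url, parallel=None):
    """Argument list for globus-url-copy, optionally with parallel streams."""
    cmd = ["globus-url-copy"]
    if parallel:
        cmd += ["-p", str(parallel)]
    return cmd + [source_url, dest_url]


path = "/castor/cern.ch/atlas/subdirectory/file.name"
print(shlex.join(stagein_cmd(path, "wacdr002d")))
print(shlex.join(url_copy_cmd("gsiftp://wacdr002d.cern.ch" + path,
                              "file:///home/knezo/file.name", parallel=10)))
```

Each list could be handed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed.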
Plan for CASTOR/GridFTP service
One year horizon
Support for CMS world-wide production
This is now a high-priority task
Performance challenge for server
Requires TCP-tuning, likely dedicated stager, maybe NAPI
DNS load-balanced server cluster
Sufficient for users with no strict throughput requirements for the coming year (ATLAS, LHCb, EDG)
Service To-Do list
Performance tuning
DNS-load balancing configuration tests
Integrate with CERN monitoring, plus scripts to create server usage statistics
VOMS to improve DN–User mapping
Logging still to be improved
Synchronisation of package upgrades with EDG
Prepare user and admin documentation, plus RPMs
Interest shown by external institutes: INFN, IFAE, IFIC
Beyond one year
Need to understand how the Globus GridFTP server will evolve.
Conclusions
A GridFTP interface to CASTOR already exists
A ready-to-use service still requires solving:
Configuration issues
Performance issues
Admin issues
The service has the potential to satisfy CASTOR users for the next year