Transcript: LCG AA Internal Review

ACAT05

Summary Session I René Brun

27 May 2005

Outline

19 presentations:

• Data Analysis, Data Acquisition and Tools: 6
• GRID Deployment: 4
• Applications on the GRID: 5
• High Speed Computing: 4

Data Analysis, Acquisition, Tools

• Evolution of the BaBar configuration database design
• DAQ software for the SND detector
• Interactive analysis environment of Unified Accelerator Libraries
• DaqProVis, a toolkit for acquisition, analysis, visualisation
• The Graphics Editor in ROOT
• Parallel interactive and batch HEP data analysis with PROOF

Evolution of the Configuration Database Design

Andrei Salnikov, SLAC For BaBar Computing Group

ACAT05 – DESY, Zeuthen

BaBar database migration

• BaBar was using the Objectivity/DB ODBMS for many of its databases
• About two years ago the migration from Objectivity to ROOT started for the event store; it was a success and an improvement
• There is no reason to keep the pricey Objectivity only for the "secondary" databases
• The migration effort started in 2004 for the conditions, configuration, prompt reconstruction, and ambient databases

Configuration database API

• Main problem of the old database: the API exposed too much of the implementation technology
  • persistent objects, handles, class names, etc.
• The API has to change, but we don't want to make the same mistakes again (new mistakes are more interesting)
• Pure transient-level abstract API, independent of any specific implementation technology (a sketch follows below)
• Always make abstract APIs to avoid problems in the future (this may be hard and need a few iterations)
• Client code should be free from any specific database implementation details
• Early prototyping could answer a lot of questions, but five years of experience count too
• Use different implementations for clients with different requirements
• The implementation would benefit from features currently missing in C++: reflection, introspection (or from a completely new language)
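To make "pure transient-level abstract API" concrete, here is a minimal sketch of what such an interface could look like. The names (ConfigKey, ConfigSnapshot, ConfigDb) are invented for illustration and do not correspond to the actual BaBar classes; the point is only that clients see transient C++ objects and never the persistence technology behind them.

// Hypothetical sketch of a transient-level configuration API.
// None of these names come from the BaBar code base.
#include <string>
#include <vector>

struct ConfigKey {
    std::string name;      // logical configuration name
    unsigned    revision;  // revision number of that configuration
};

class ConfigSnapshot {
public:
    virtual ~ConfigSnapshot() {}
    // Return the value of a named parameter, or defaultValue if absent.
    virtual std::string get(const std::string& parameter,
                            const std::string& defaultValue = "") const = 0;
};

// Clients program against this interface only; concrete implementations
// (Objectivity, ROOT files, a relational DB, ...) live behind it and can
// be swapped without touching client code.
class ConfigDb {
public:
    virtual ~ConfigDb() {}
    virtual std::vector<ConfigKey> keys() const = 0;
    // The caller owns the returned snapshot object.
    virtual ConfigSnapshot* read(const ConfigKey& key) const = 0;
};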

DAQ software for SND detector

M. Achasov, A. Bogdanchikov, A. Kim, A. Korol (Budker Institute of Nuclear Physics, Novosibirsk)

Main data flow

[Data-flow diagram: readout and event building (1 kHz, 4 kB events) → event packing (1 kHz, 1 kB) → event filtering (100 Hz) → storage]

Expected rates (a quick arithmetic check follows below):
• Event fragments: 4 MB/s read from the I/O processors over Ethernet
• Event building: 4 MB/s
• Event packing: 1 MB/s
• Event filtering (90% rejection): 100 kB/s
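For reference, the quoted bandwidths follow directly from the event rates and sizes above (assuming 4 kB raw events, 1 kB packed events, and 90% filter rejection):

4 kB/event × 1 kHz = 4 MB/s    (readout and event building)
1 kB/event × 1 kHz = 1 MB/s    (after packing)
1 kB/event × 100 Hz = 100 kB/s (after filtering)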

DAQ architecture

[Architecture diagram: detector → front-end electronics (KLUKVA crates, ×16 and ×12) → readout & event building → buffer → TLT computers → event filtering → filtered events to off-line storage; auxiliary blocks: visualization, database, system support, calibration process, backup]

Interactive Analysis Environment of Unified Accelerator Libraries

V. Fine, N. Malitsky, R. Talman

Abstract

Unified Accelerator Libraries (UAL, http://www.ual.bnl.gov) software is an open accelerator simulation environment addressing a broad spectrum of accelerator tasks, ranging from efficient online-oriented models to full-scale realistic beam dynamics studies. The paper introduces a new package integrating UAL simulation algorithms with the Qt-based Graphical User Interface and an open collection of data analysis and visualization components. The primary user application is implemented as an interactive and configurable Accelerator Physics Player whose extensibility is provided by a plug-in architecture. Its interface to data analysis and visualization modules is based on the Qt layer (http://root.bnl.gov) developed and supported by the STAR experiment. The present version embodies the ROOT framework (http://root.cern.ch) and the Coin3D (http://www.coin3d.org) graphics library.

Accelerator Physics Player

// Create the player widget and attach it to the UAL shell:
UAL::USPAS::BasicPlayer* player = new UAL::USPAS::BasicPlayer();
player->setShell(&shell);

// Make it the main widget of the Qt application and start the event loop:
qApp.setMainWidget(player);
player->show();
qApp.exec();

An open collection of viewers An open collection of algorithms


Examples of the Accelerator-Specific Viewers

• Turn-by-turn BPM data (based on ROOT TH2F or TGraph)
• Twiss plots (based on ROOT TGraph)
• Bunch 2D distributions (based on ROOT TH2F)
• Bunch 3D distributions (based on Coin3D)

Parallel Interactive and Batch HEP-Data Analysis with PROOF

Maarten Ballintijn (MIT), Marek Biskup (CERN), Rene Brun (CERN), Philippe Canal (FNAL), Derek Feichtinger (PSI), Gerardo Ganis (CERN), Guenter Kickinger (CERN), Andreas Peters (CERN), Fons Rademakers (CERN)

ROOT Analysis Model

The standard model: files are analyzed on a local computer; remote data are accessed via a remote file server (rootd/xrootd) or other remote-access protocols (dCache, CASTOR, RFIO, Chirp). An illustration with TFile::Open follows below.

[Diagram: client reading a local file directly and a remote file through a rootd/xrootd server]
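As an illustration (not taken from the talk), the same ROOT code reads a local or a remote file simply by changing the URL passed to TFile::Open; the host and file names below are made up.

// read_file.C -- minimal ROOT macro; server and file names are hypothetical.
#include "TFile.h"

void read_file()
{
   // Local file on the client machine:
   TFile *f1 = TFile::Open("data.root");

   // The same file served remotely through rootd/xrootd:
   TFile *f2 = TFile::Open("root://myserver.example.org//data/data.root");

   // Or through another supported protocol, e.g. RFIO on a CASTOR pool:
   TFile *f3 = TFile::Open("rfio:/castor/example.org/user/data.root");

   if (f2 && !f2->IsZombie()) f2->ls();   // list the contents of the remote file
}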

PROOF Basic Architecture


Single-cluster mode:
• The master divides the work among the slaves
• After the processing finishes, it merges the results (histograms, scatter plots)
• ...and returns the result to the client

[Diagram: the client sends commands and scripts to the master; the master distributes the work to the slaves, which read the files; histograms and plots flow back to the client]

PROOF and Selectors

• The code is shipped to each slave; SlaveBegin(), Init(), Process() and SlaveTerminate() are executed there
• Many trees are processed, and each slave is initialized
• The user has no control over the entry loop
• The same code also works without PROOF (a skeleton selector is sketched below)
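To make the selector interface concrete, here is a minimal sketch of a TSelector skeleton. The class name, histogram and the event-data handling are invented for illustration; a real selector is normally generated with TTree::MakeSelector and then filled in by the user.

// MySelector.h -- illustrative skeleton, not code from the talk.
#include "TSelector.h"
#include "TTree.h"
#include "TH1F.h"

class MySelector : public TSelector {
public:
   TTree *fChain;   // the tree (or chain) currently being processed
   TH1F  *fHist;    // example output histogram

   MySelector() : fChain(0), fHist(0) {}
   virtual ~MySelector() {}

   virtual Int_t  Version() const { return 2; }
   virtual void   Begin(TTree *) { }                 // called once on the client
   virtual void   SlaveBegin(TTree *) {              // called once on each slave
      fHist = new TH1F("h", "example", 100, 0., 100.);
      fOutput->Add(fHist);                           // objects in fOutput are merged by the master
   }
   virtual void   Init(TTree *tree) { fChain = tree; }  // called for each new tree/file
   virtual Bool_t Process(Long64_t entry) {             // called for every entry
      fChain->GetEntry(entry);
      // ... fill fHist from the event data ...
      return kTRUE;
   }
   virtual void   SlaveTerminate() { }               // end of processing on a slave
   virtual void   Terminate() { if (fHist) fHist->Draw(); }  // back on the client

   ClassDef(MySelector, 0)
};

The same selector runs locally with chain->Process("MySelector.C+") and, after connecting to a PROOF cluster, with the corresponding PROOF processing call, without changing the analysis code.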

Analysis session snapshot

What we are implementing (an example timeline: Monday at 10h15 and 16h25 in a ROOT session on my laptop, Wednesday at 8h40 in a session on any web browser):
• AQ1: a 1 s query produces a local histogram
• AQ2: a 10 min query submitted to PROOF1
• AQ3-AQ7: short queries
• AQ8: a 10 h query submitted to PROOF2
• BQ1: browse results of AQ2
• BQ2: browse temporary results of AQ8
• BQ3-BQ6: submit four 10 min queries to PROOF1
• CQ1: browse results of AQ8 and BQ3-BQ6

ROOT Graphics Editor by Ilka Antcheva

The ROOT graphics editor can be:
• Embedded – connected only with the canvas in the application window
• Global – has its own application window and can be connected to any canvas created in a ROOT session

Focus on Users

• Novices (for a short time)
  • Theoretical understanding, no practical experience with ROOT
  • Impatient with learning concepts; patient with performing tasks
• Advanced beginners (many people remain at this level)
  • Focus on a few tasks and learn more on a need-to-do basis
  • Perform several given tasks well
• Competent performers (fewer than the previous class)
  • Know and perform complex tasks that require coordinated actions
  • Interested in solving problems and tracking down errors
• Experts (identified by others)
  • Able to find solutions in complex functionality
  • Interested in the theories behind the design
  • Interested in interacting with other expert systems

DaqProVis (M. Morhac)

• DaqProVis is a toolkit for acquisition, interactive analysis, processing and visualization of multidimensional data
• Basic features:
  • DaqProVis is well suited for interactive analysis of multiparameter data from small and medium-sized experiments in nuclear physics
  • The data acquisition part of the system allows one to acquire multiparameter events either directly from the experiment or from a list file, i.e. the system can work in either on-line or off-line acquisition mode
  • In on-line acquisition mode, events can be taken directly from CAMAC crates or from a VME system that cooperates with DaqProVis in the client-server working mode
  • In off-line acquisition mode the system can analyze event data even from big experiments, e.g. from Gammasphere
  • Event data can also be read from another DaqProVis system; the capability of DaqProVis to work simultaneously in both the client and the server working mode makes it possible to build remote as well as distributed nuclear data acquisition, processing and visualization systems, and thus to create multilevel configurations

DaqProVis (Visualisation)


DaqProVis (continued)

• DaqProVis and ROOT teams are already cooperating.

• Agreement during the workshop to extend this cooperation

GRID deployment

• Towards the operation of the Italian Tier-1 for CMS: lessons learned from the CMS Data Challenge
• GRID technology in production at DESY
• Grid middleware configuration at the KIPT CMS Linux cluster
• Storage resources management and access at Tier-1 CNAF

Towards the operations of the Italian Tier-1 for CMS: lessons learned from the CMS Data Challenge

D. Bonacorsi (on behalf of the INFN-CNAF Tier-1 staff and the CMS experiment)

ACAT 2005 – X Int. Workshop on Advanced Computing & Analysis Techniques in Physics Research, May 22nd-27th 2005, DESY, Zeuthen, Germany

DC04 outcome

(grand summary + focus on INFN T1)
• Reconstruction / data transfer / analysis may run at 25 Hz
• Automatic registration and distribution of data, key role of the TMDB
  • it was the embryonic PhEDEx!
• Support a (reasonable) variety of different data transfer tools and set-ups
  • Tier-1s: different performances, related to operational choices
  • SRB, LCG Replica Manager and SRM investigated: see the CHEP04 talk
  • INFN T1: good performance of the LCG-2 chain (PIC T1 also)
• Register all data and metadata (POOL) in a world-readable catalogue
  • RLS: good as a global file catalogue, bad as a global metadata catalogue
• Analyze the reconstructed data at the Tier-1s as data arrive
  • LCG components: dedicated BDII + RB; UIs, CEs + WNs at CNAF and PIC
  • real-time analysis at Tier-2s was demonstrated to be possible
  • ~15k jobs submitted
  • the time window between reco data availability and the start of analysis jobs can be reasonably low (i.e. 20 min)
• Reduce the number of files (i.e. increase <#events>/<#files>)
  • more efficient use of bandwidth
  • reduced command overhead
• Address the scalability of MSS systems (!)

Learn from DC04 lessons…

• Some general considerations may apply:
  • although a DC is experiment-specific, maybe its conclusions are not
  • an "experiment-specific" problem is better addressed if conceived as a "shared" one in a shared Tier-1
  • an experiment DC just provides hints, real work gives insight → crucial role of the experiments at the Tier-1
• Find weaknesses of the CASTOR MSS system in particular operating conditions
• Stress-test the new LSF farm with official CMS production jobs
• Test DNS-based load balancing by serving data for production and/or analysis from CMS disk-servers
• Test new components, newly installed/upgraded Grid tools, etc.
• Find bottlenecks and scalability problems in DB services
• Give feedback on monitoring and accounting activities
• ...

PhEDEx at INFN

• INFN-CNAF is a T1 'node' in PhEDEx
• The CMS DC04 experience was crucial to start up PhEDEx in INFN
  • CNAF node operational since the beginning
• First phase (Q3/4 2004): agent code development + focus on operations: T0 → T1 transfers
  • >1 TB/day T0 → T1 demonstrated feasible
  • ...but the aim is not to achieve peaks, but to sustain such rates in normal operations
• Second phase (Q1 2005): PhEDEx deployment in INFN to Tier-n, n>1
  • "distributed" topology scenario
  • Tier-n agents run at the remote sites, not at the T1: know-how required, T1 support
  • already operational at Legnaro, Pisa, Bari, Bologna
  • An example of data flow to T2s in daily operations (a test with ~2000 files, 90 GB, no optimization): CNAF T1 → LNL T2 at ~450 Mbps, CNAF T1 → Pisa T2 at ~205 Mbps
• Third phase (Q>1 2005): many issues, e.g. stability of the service, dynamic routing, coupling PhEDEx to the CMS official production system, PhEDEx involvement in SC3 phase II, etc.

Storage resources management and access at TIER1 CNAF

Ricci Pier Paolo, Lore Giuseppe, Vagnoni Vincenzo on behalf of INFN TIER1 Staff [email protected]

ACAT 2005 May 22-27 2005 DESY Zeuthen, Germany

TIER1 INFN CNAF Storage

[Storage overview diagram. Main elements: HSM (~400 TB) managed by CASTOR, with a STK L5500 robot (5500 slots, 6 IBM LTO-2 and 2 (4) STK 9940B drives) and RFIO access from Linux SL 3.0 clients (100-1000 nodes) over the WAN or TIER1 LAN; NAS (~20 TB: NAS1/NAS4 3ware IDE SAS 1800+3200 GB, NAS2 9000 GB and NAS3 4700 GB on Procom 3600 FC), served via NFS; SAN 1 (~200 TB: IBM FastT900/DS4500 3/4 x 50000 GB, Infortrend 5 x 6400 GB SATA A16F-R1211-M2 + JBOD) and SAN 2 (~40 TB: Infortrend 4 x 3200 GB SATA A16F-R1A2-M1, STK BladeStore ~25000 GB, Axus Browie ~2200 GB), attached to diskservers with Qlogic FC HBA 2340 through 2 Gadzoox Slingshot 4218 (18-port) and 2 Brocade Silkworm 3900 (32-port) FC switches, exported via NFS, RFIO, GridFTP and others, with high availability; backup on a W2003 server running LEGATO Networker with a STK180 library holding 100 LTO-1 tapes (10 TB native).]

CASTOR HSM

[CASTOR HSM configuration diagram. Recoverable elements: STK L5500 robot (2000 + 3500 mixed slots) with 6 LTO-2 drives (20-30 MB/s) and 2 9940B drives (25-30 MB/s); 1300 LTO-2 and 650 9940B cartridges (200 GB native each); 8 tapeservers (Linux RH AS 3.0, Qlogic 2300 HBA) on point-to-point 2 Gb/s FC connections; ACSLS 7.0 on a Sun Blade V100 (2 internal IDE disks in software RAID-0, Solaris 9.0); 1 CASTOR (CERN) central services server (RH AS 3.0); 6 stagers with diskserver (RH AS 3.0, 15 TB local staging area); 1 Oracle 9i rel. 2 DB server (RH AS 3.0); 8 or more RFIO diskservers (RH AS 3.0, at least 20 TB staging area) on SAN 1 and SAN 2, with full redundancy (dual-controller hardware and Qlogic SANsurfer path-failover software), reachable over the WAN or TIER1 LAN. Per-experiment staging-area (TB) and tape-pool (TB native) figures for ALICE, ATLAS, CMS, LHCb and BaBar/AMS+others were shown in a table whose alignment did not survive extraction.]

DISK access (2)

We have different protocols in production for accessing the disk storage. On our diskservers and Grid SE front-ends we currently have:

1. NFS on a local filesystem. Advantages: easy client implementation and compatibility, possibility of failover (RH 3.0). Disadvantages: bad performance scalability for a high number of accesses (1 client: 30 MB/s; 100 clients: 15 MB/s throughput).

2. RFIO on a local filesystem. Advantages: good performance, compatibility with Grid tools, possibility of failover. Disadvantages: no scalability of the front-ends for a single filesystem, no possibility of load balancing.

3. Grid SE GridFTP/RFIO over GPFS (CMS, CDF). Advantages: separation between the GPFS servers (accessing the disks) and the SE GPFS clients; load balancing and HA on the GPFS servers, and the possibility to implement the same on the Grid SE services (see next slide). Disadvantages: GPFS layer requirements on the OS and certified hardware for support.

4. Xrootd (BaBar). Advantages: good performance. Disadvantages: no possibility of load balancing for the single-filesystem backends, not Grid compliant (at present...).

NOTE: IBM GPFS 2.2 is a CLUSTERED FILESYSTEM, so many front-ends (i.e. gridftp or rfio servers) can access the SAME filesystem simultaneously. It can also use bigger filesystem sizes (we use 8-12 TB).

Generic Benchmark (here shown for 1 GB files)

[Benchmark table, 1 GB files: write and read rates in MB/s for 1, 5, 10 and 50 simultaneous client processes, comparing GPFS 2.3.0-1, native Lustre 1.4.1, NFS, native RFIO and RFIO; the individual numbers did not survive extraction.]
• Numbers are reproducible with small fluctuations
• Lustre tests with NFS export not yet performed

Grid Technology in Production at DESY

Andreas Gellrich, DESY

ACAT 2005, 24 May 2005, DESY, Zeuthen

Grid @ DESY

• With the HERA-II luminosity upgrade, the demand for MC production rapidly increased, while the outside collaborators moved their computing resources towards LCG
• The ILC group plans to use Grids for its computing needs
• The LQCD group develops a Data Grid to exchange data
• DESY considers a participation in LHC experiments
⇒ EGEE and D-GRID
⇒ dCache is a DESY / FNAL development
⇒ Since spring 2004 an LCG-2 Grid infrastructure has been in operation

Grid Infrastructure @ DESY …

• DESY installed (SL 3.04, Quattor, yaim) and operates a complete, independent Grid infrastructure which provides generic (non-experiment-specific) Grid services to all experiments and groups
• The DESY Production Grid is based on LCG-2_4_0 and includes:
  • Resource Broker (RB), Information Index (BDII), Proxy (PXY)
  • Replica Location Services (RLS)
  • in total 24 + 17 WNs (48 + 34 = 82 CPUs)
  • a dCache-based SE with access to the entire DESY data space
• VO management for the HERA experiments ('hone', 'herab', 'hermes', 'zeus'), LQCD ('ildg'), ILC ('ilc', 'calice') and astro-particle physics ('baikal', 'icecube')
• Certification services for DESY users in cooperation with GridKa


Grid Middleware Configuration at the KIPT CMS Linux Cluster

S. Zub, L. Levchuk, P. Sorokin, D. Soroka

Kharkov Institute of Physics & Technology, 61108 Kharkov, Ukraine

http://www.kipt.kharkov.ua/~cms [email protected]

What is our specificity?

• Small PC farm (KCC)
• Small scientific group of 4 physicists, combining their work with system administration
• Oriented towards CMS tasks
• No commercial software installed
• Security provided in-house
• Narrow-bandwidth communication channel
• Limited traffic

Summary

• The enormous data flow expected in the LHC experiments forces the HEP community to resort to Grid technology
• The KCC is a specialized PC farm constructed at the NSC KIPT for computer simulations within the CMS physics program and for preparation of the CMS data analysis
• Further development of the KCC is planned, with a considerable increase of its capacities and deeper integration into the LHC Grid (LCG) structures
• Configuration of the LCG middleware can be troublesome (especially at small farms with poor internet connections), since this software is neither universal nor "complete", and one has to resort to special tips
• Scripts were developed that facilitate the installation procedure at a small PC farm with narrow internet bandwidth

Applications on the Grid

• The CMS analysis chain in a distributed environment
• Monte Carlo mass production for ZEUS on the Grid
• Metadata services on the Grid
• Performance comparison of the LCG2 and gLite file catalogues
• Data Grids for lattice QCD

The CMS analysis chain in a distributed environment

Nicola De Filippis, on behalf of the CMS collaboration

ACAT 2005, DESY, Zeuthen, Germany, 22nd-27th May 2005

The CMS analysis tools

Overview:

Data management
• Data transfer service: PhEDEx
• Data validation: ValidationTools
• Data publication service: RefDB/PubDB

Analysis strategy
• Distributed software installation: XCMSI
• Analysis job submission tool: CRAB

Job monitoring
• System monitoring: BOSS
• Application job monitoring: JAM

The end-user analysis workflow

The user provides the dataset (runs, #events, ...) and private code from the UI.

• CRAB discovers the data and the sites hosting them by querying the dataset catalogue (RefDB/PubDB)
• CRAB prepares, splits and submits the jobs to the Resource Broker through the Workload Management System (a toy splitting sketch follows below)
• The RB sends the jobs to the sites (Computing Elements and Worker Nodes) hosting the data, provided the CMS software (XCMSI) was installed
• CRAB automatically retrieves the output files of the job (e.g. from the Storage Element)
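The splitting step is conceptually simple: the requested event range is cut into chunks of a configurable size and each chunk becomes one Grid job. The following toy function illustrates the idea only; it is not the actual CRAB implementation, and the numbers in the example are made up.

// split_jobs.cpp -- toy illustration of event-range job splitting.
#include <cstdio>
#include <utility>
#include <vector>

std::vector<std::pair<long, long> > splitIntoJobs(long totalEvents, long eventsPerJob)
{
    std::vector<std::pair<long, long> > jobs;   // (firstEvent, lastEvent) per job
    for (long first = 0; first < totalEvents; first += eventsPerJob) {
        long last = first + eventsPerJob - 1;
        if (last >= totalEvents) last = totalEvents - 1;
        jobs.push_back(std::make_pair(first, last));
    }
    return jobs;
}

int main()
{
    // Example: 100000 requested events, at most 25000 events per job -> 4 jobs.
    std::vector<std::pair<long, long> > jobs = splitIntoJobs(100000, 25000);
    for (size_t i = 0; i < jobs.size(); ++i)
        std::printf("job %lu: events %ld-%ld\n",
                    (unsigned long)i, jobs[i].first, jobs[i].second);
    return 0;
}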

Conclusions

• The first CMS working prototype for Distributed User Analysis is available and used by real users
• PhEDEx, PubDB, ValidationTools, XCMSI, CRAB, BOSS and JAM are under development, deployment and in production at many sites
• CMS is using the Grid infrastructure for physics analyses and Monte Carlo production
  • tens of users, 10 million of analysed data, 10000 jobs submitted
• CMS is designing a new architecture for the analysis workflow


Metadata Services on the GRID

Nuno Santos, ACAT'05, May 25th, 2005

Metadata on the GRID

• Metadata is data about data
• Metadata on the GRID:
  • mainly information about files
  • other information necessary for running jobs
  • usually lives in databases
• A simple interface for metadata access is needed (a hypothetical interface sketch follows below). Advantages:
  • easier to use by clients - no SQL, only metadata concepts
  • common interface - clients don't have to reinvent the wheel
• Must be integrated in the File Catalogue
• Also suitable for storing information about other resources
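As a rough illustration of what a "metadata concepts only" client interface could look like, here is a purely hypothetical C++ sketch; the names below are invented and this is not the actual ARDA/gLite metadata API.

// Hypothetical client-side metadata interface -- a sketch only.
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> AttributeMap;   // attribute name -> value

class MetadataCatalog {
public:
    virtual ~MetadataCatalog() {}

    // Attach attributes to a logical file name; no SQL is visible to the client.
    virtual void setAttributes(const std::string& lfn,
                               const AttributeMap& attributes) = 0;

    // Read back selected attributes of one entry.
    virtual AttributeMap getAttributes(const std::string& lfn,
                                       const std::vector<std::string>& names) = 0;

    // Find entries matching a simple predicate expressed in metadata terms,
    // e.g. "run>1000 and quality='good'".
    virtual std::vector<std::string> query(const std::string& predicate) = 0;
};

Concrete implementations of such an interface could then talk to any backend (Oracle, PostgreSQL, SQLite) over either of the two frontends described on the next slide.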

ARDA Implementation

• Backends: currently Oracle, PostgreSQL, SQLite
• Two frontends:
  • TCP streaming - chosen for performance
  • SOAP - a formal requirement of EGEE; allows comparing SOAP with TCP streaming
• Also implemented as a standalone Python library, with data stored on the filesystem

[Diagram: clients talk SOAP or TCP streaming to the metadata server, which uses the Oracle, PostgreSQL or SQLite backends; another client uses the metadata Python API in a Python interpreter with a filesystem backend]

SOAP Toolkits performance

• Test of the communication performance (no work done on the backend), on a switched 100 Mbit/s LAN
• Language comparison:
  • TCP-S shows similar performance in all languages
  • SOAP performance varies strongly with the toolkit
• Protocol comparison:
  • keepalive improves performance significantly
  • in Java and Python, SOAP is several times slower than TCP-S

[Bar chart: time for 1000 pings with TCP-S and gSOAP, with and without keepalive, for C++ (gSOAP), Java (Axis) and Python (ZSI)]


High speed Computing

• InfiniBand
• Analysis of SCTP- and TCP-based communication in a high-speed cluster
• The apeNEXT project
• Optimisation of lattice QCD codes for the Opteron processor

Forschungszentrum Karlsruhe (in the Helmholtz Association)

InfiniBand – Experiences at Forschungszentrum Karlsruhe

A. Heiss, U. Schwickerath

Credits: Inge Bischoff-Gauss, Marc García Martí, Bruno Hoeft, Carsten Urbach

Outline: InfiniBand overview; hardware setup at IWR; HPC applications (MPI performance, lattice QCD, LM); HTC applications (rfio, xrootd)

Lattice QCD benchmark: Gigabit Ethernet vs. InfiniBand

A memory- and communication-intensive application. Benchmark by C. Urbach; see also the CHEP04 talk given by A. Heiss. Significant speedup is obtained by using InfiniBand.

Thanks to Carsten Urbach, FU Berlin and DESY Zeuthen

RFIO/IB point-to-point file transfers (64-bit); PCI-X and PCI-Express throughput

Notes:
• best results with PCI-Express: >800 MB/s raw transfer speed, >400 MB/s file transfer speed with RFIO/IB
• see ACAT03, NIM A 534 (2004) 130-134
• solid curves: file transfers cache -> /dev/null; dashed curves: network + protocol only
• disclaimer on PPC64: not an official IBM product, technology prototype (see also slides 5 and 6)

Xrootd and InfiniBand

Notes:
• IPoIB: dual Opteron V20z, Mellanox Gold drivers, subnet manager on an InfiniCon 9100, same nodes as for GE
• Native IB: proof-of-concept version based on Mellanox VAPI using IB_SEND, dedicated send/receive buffers, same nodes as above
• 10GE: IBM xSeries 345 nodes (32-bit Xeon, single CPU, 2.66 GHz clock speed, 1 and 2 GB RAM), Intel PRO/10GbE LR cards, used for long-distance tests
• First preliminary results

TCP vs. SCTP in high-speed cluster environment

Miklos Kozlovszky, Budapest University of Technology and Economics (BUTE)

TCP vs. SCTP

Both protocols are:
• IPv4 & IPv6 compatible
• reliable and connection oriented
• offering acknowledged, error-free, non-duplicated transfer
• using almost the same flow and congestion control

TCP: byte-stream oriented; 3-way handshake connection init; old (more than 20 years).
SCTP: message oriented; 4-way handshake connection init (with cookie); quite new (2000-); multihoming; path-MTU discovery. A socket-level sketch follows below.
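At the socket API level the difference is small; the following minimal sketch (assuming a Linux system with SCTP kernel support, e.g. lksctp) just creates one socket of each kind.

// tcp_vs_sctp.cpp -- create a TCP and a one-to-one style SCTP socket.
// Assumes Linux with SCTP support loaded in the kernel (lksctp).
#include <cstdio>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main()
{
    // Classic TCP stream socket:
    int tcp_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    // SCTP socket in one-to-one style: the same SOCK_STREAM usage pattern,
    // while message boundaries, the 4-way handshake and multihoming are
    // handled by the protocol underneath.
    int sctp_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);

    if (tcp_fd < 0)  std::perror("TCP socket");
    if (sctp_fd < 0) std::perror("SCTP socket (is SCTP support available?)");

    if (tcp_fd >= 0)  close(tcp_fd);
    if (sctp_fd >= 0) close(sctp_fd);
    return 0;
}

From here on, bind/listen/connect/send/recv are used exactly as with TCP; SCTP-specific features such as multihoming and multiple streams are reached through additional socket options and the sctp_* helper calls of the SCTP library.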

Summary

• SCTP inherited all the "good features" of TCP
• SCTP wants to behave like a next-generation TCP
• It is more secure than TCP and has many attractive features (e.g. multihoming)
• Theoretically it can work better than TCP, but TCP is faster for now ("poor" SCTP implementations so far)
• Well standardized, and can be useful for clusters


My Impressions

Concerns

• Only a small fraction of the Session I talks correspond to the original spirit of the AIHEP/ACAT Session I talks.

• In particular, many of the GRID talks about deployment and infrastructure should be given at CHEP, not here.

• The large LHC collaborations have their own ACAT a few times/year.

• The huge experiment software frameworks do not encourage cross-experiment discussions or tools.

• For the next ACAT, the key people involved in the big experiments should work together to encourage more talks or reviews.


Positive aspects

• ACAT continues to be a good opportunity to meet other cultures. Innovation may come from small groups or non-HENP fields.

• Contacts (even sporadic) with Session III or plenary talks are very beneficial, in particular to young people.


The Captain of Köpenick

• Question to the audience:
• Is Friedrich Wilhelm Voigt (the Captain of Köpenick) an ancestor of Voigt, the father of the Voigt function?