Status of LHCb-INFN Computing

CSN1, Catania, September 18, 2002
Domenico Galli, Bologna
LHCb Computing Constraints


- Urgent need to produce and analyse large numbers of MC data sets in a short time, for:
  - the LHCb-light detector design;
  - the trigger design and the TDRs.
- Need to optimize the hardware and software configuration to minimize dead time and system administration effort.
LHCb Farm Architecture (I)

- Article in press in Computer Physics Communications: "A Beowulf-class computing cluster for the Monte Carlo production of the LHCb experiment".
- Disk-less computing nodes, with the operating systems centralized on a file server (Operating System Server).
  - Very flexible configuration: nodes can be added to or removed from the system without any local installation (a minimal sketch of how a new node could be registered is shown below).
  - Useful for computing resources shared among different experiments.
- Extremely stable system: no side effects at all in more than one year of operation.
- System administration duties minimized.
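To illustrate how lightly a disk-less node can be added or removed, the following is a minimal Python sketch that generates the DHCP/PXE host entries the Operating System Server would need for new nodes. The host names, MAC addresses, IP addresses and boot file name are invented placeholders, not values from the real farm.

```python
# Hypothetical sketch: generate dhcpd.conf host entries for disk-less farm nodes.
# Adding a node to the farm then amounts to appending one entry on the
# Operating System Server and booting the node via PXE -- no local installation.
# All MAC/IP values below are invented placeholders, not real farm data.

NODES = [
    # (hostname, MAC address, private IP)
    ("node01", "00:11:22:33:44:01", "192.168.100.1"),
    ("node02", "00:11:22:33:44:02", "192.168.100.2"),
]

HOST_TEMPLATE = """host {name} {{
    hardware ethernet {mac};
    fixed-address {ip};
    next-server 192.168.100.254;   # Operating System Server (TFTP/PXE)
    filename "pxelinux.0";         # PXE boot loader served over TFTP
}}"""

def dhcp_entries(nodes):
    """Return the dhcpd.conf host stanzas for all disk-less nodes."""
    return "\n\n".join(
        HOST_TEMPLATE.format(name=name, mac=mac, ip=ip) for name, mac, ip in nodes
    )

if __name__ == "__main__":
    print(dhcp_entries(NODES))
```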
LHCb Farm Architecture (II)

- Security:
  - Use of private IP addresses and a Virtual LAN.
  - High level of isolation from the Internet.
  - External accesses (AFS servers, bookkeeping database, CASTOR tape library at CERN) go through Network Address Translation on a Gateway node.
- Potential "single points of failure" of the system are equipped with redundant disk configurations:
  - RAID-5 (the 2 NAS units);
  - RAID-1 (Gateway and Operating System Server).
LHCb Farm Architecture (III)
[Diagram: farm network layout. A Gateway (Red Hat 7.2, kernel 2.4.18, mirrored RAID-1 disks) provides DNS and NAT (IP masquerading) between the public VLAN (uplink) and the private VLAN. On the private VLAN, a Fast Ethernet switch connects: the Control Node, a disk-less node (CERN Red Hat 6.1, kernel 2.2.18) running the PBS master, the MC control server and the farm monitoring; the disk-less Processing Nodes 1..n; two 1 TB RAID-5 NAS units; and the Master Server (mirrored RAID-1 disks), which also controls an Ethernet power distributor.]

[Diagram: farm node configuration and hardware. The disk-less nodes (CERN Red Hat 6.1, kernel 2.2.18, PBS slaves) boot from the Master Server (Red Hat 7.2), which exports the OS file systems and the home directories and provides various services (PXE remote boot, DHCP, NIS). Hardware: rack of 1U dual-processor motherboards, Fast Ethernet switch, 1 TB NAS, Ethernet-controlled power distributor (32 channels).]
Data Storage

- Files containing reconstructed events (OODST-ROOT format) are transferred to CERN using bbftp and automatically stored in the CASTOR tape library.
- Data transfer from CNAF to CERN reaches a maximum throughput of 70 Mb/s (on a 100 Mb/s link), to be compared with ~15 Mb/s using plain ftp (see the estimate below).
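To give a feeling for what the throughput difference means in practice, here is a small Python estimate of the transfer time of one OODST file; the 120 MB file size is the one quoted in the PVFS test later in this report, and the helper function is ours, not part of the production scripts.

```python
# Rough transfer-time estimate for one reconstructed-event file (OODST-ROOT),
# comparing the measured bbftp throughput with plain ftp on the same link.
# The 120 MB file size is the OODST size quoted in the PVFS test slides.

FILE_SIZE_MB = 120          # one OODST file, in megabytes
BBFTP_MBPS = 70             # measured bbftp throughput, megabits per second
FTP_MBPS = 15               # measured ftp throughput, megabits per second

def transfer_time_s(size_mb: float, rate_mbps: float) -> float:
    """Seconds needed to move size_mb megabytes at rate_mbps megabits/s."""
    return size_mb * 8 / rate_mbps

if __name__ == "__main__":
    for name, rate in [("bbftp", BBFTP_MBPS), ("ftp", FTP_MBPS)]:
        print(f"{name:>5}: {transfer_time_s(FILE_SIZE_MB, rate):6.1f} s per 120 MB file")
    # bbftp: ~13.7 s per file; ftp: ~64 s per file on the same 100 Mb/s link.
```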
2002 Monte Carlo Production

- Target:
  - Production of large event statistics for the design of the LHCb-light detector and of the trigger system (trigger TDR).
- Software:
  - Simulation (FORTRAN) and reconstruction (C++) code to be used in the production, supplied in July.
- LHCb Data Challenge ongoing (August-September).
  - Participating computing centres: CERN, INFN-CNAF, Liverpool, IN2P3-Lyon, NIKHEF, RAL, Bristol, Cambridge, Oxford, ScotGrid (Glasgow & Edinburgh).
Status of Summer LHCb-Italy Monte Carlo
Production (Data Challenge)

Events produced in Bologna (August 1 - September 12): 1,053,500

  Channel                                                      Events
  Bd0 -> pi+ pi-                                               79,000
  Bd0 -> D*-(D0_bar(K+ pi-) pi-) pi+                           19,000
  Bd0 -> K+ pi-                                                55,500
  Bs0 -> K- pi+                                                 8,000
  Bs0 -> K+ K-                                                  8,000
  Bs0 -> J/psi(mu+ mu-) eta(gamma gamma)                        8,000
  Bd0 -> phi(K+ K-) Ks0(pi+ pi-)                                8,000
  Bs0 -> mu+ mu-                                                8,000
  Bd0 -> D+(K- pi+ pi+) D-(K+ pi- pi-)                          8,000
  Bs0 -> Ds-(K+ K- pi-) K+                                      8,000
  Bs0 -> J/psi(mu+ mu-) phi(K+ K-)                              8,000
  Bs0 -> J/psi(e+ e-) phi(K+ K-)                                8,000
  Minimum bias                                                 47,500
  c c_bar -> inclusive (at least one c hadron in 400 mrad)    275,500
  b b_bar -> inclusive (at least one b hadron in 400 mrad)    505,000
Distribution of Produced Events Among
Production Centers (August, 1–September, 12)
[Bar chart: fraction of produced events per production centre (vertical scale 0-50%), for CERN, INFN-CNAF, IN2P3-Lyon and RAL.]

The other centres mentioned above started late with respect to the Data Challenge start date.
Usage of the CNAF Tier-1 Computing
Resources

- Computing, control and service nodes:
  - 130 PIII CPUs (clocks ranging from 866 MHz to 1.4 GHz).
- Disk storage servers:
  - 1 TB NAS (14 x 80 GB IDE disks + hot spare, in RAID-5);
  - 1 TB NAS (7 x 170 GB SCSI disks + hot spare, in RAID-5).
- All of this equipment is running at a very high duty cycle.

[Plot: CPU load of the farm nodes.]
Plan for Analysis Activities


- The analysis of the data produced during the Data Challenge is foreseen for the autumn.
- The development environment of the analysis code (DaVinci, C++) has already been completely ported to Bologna and has been in use on a mini-farm for two months.
  - The analysis mini-farm needs to be extended to a greater number of nodes to meet the needs of the Italian LHCb collaboration.
- Data produced in Bologna are kept stored on the Bologna disks; data produced in the other centres need to be transferred to Bologna on user demand with an automatic procedure.
- Analysis jobs (on ~100 CPUs) need an I/O throughput (~100 MB/s) greater than what a NAS supplies (~10 MB/s); a rough estimate is sketched below.
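A minimal back-of-the-envelope check of that requirement, in Python; the per-job read rate of ~1 MB/s is an assumption of ours, used only to reproduce the ~100 MB/s figure quoted above.

```python
# Back-of-the-envelope I/O requirement for the analysis farm.
# Assumption (ours, for illustration): each analysis job reads OODST data
# at roughly 1 MB/s, so ~100 concurrent jobs need ~100 MB/s aggregate I/O.

N_JOBS = 100                 # concurrent analysis jobs (~100 CPUs)
MB_PER_S_PER_JOB = 1.0       # assumed per-job read rate (illustrative)
NAS_MB_PER_S = 10            # measured throughput of a single 100Base-T NAS

required = N_JOBS * MB_PER_S_PER_JOB
print(f"required aggregate I/O : ~{required:.0f} MB/s")
print(f"single NAS supplies    : ~{NAS_MB_PER_S} MB/s "
      f"({required / NAS_MB_PER_S:.0f}x too little)")
```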
High Performance I/O System (I)
- An I/O parallelization system (based on a parallel file system) was successfully tested: PVFS (Parallel Virtual File System).
  - Data files are striped across the local disks of several I/O servers (IONs).
  - Scalable system: aggregate throughput ~ 100 Mbit/s x n_ION (see the estimate after the diagram).
[Diagram: PVFS layout. A management node (MGR) and the I/O nodes (ION 1 .. ION n) serve the client compute nodes (CN 1 .. CN m) over the network.]
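A small Python sketch of the scaling expectation, using only numbers quoted in these slides (100 Mbit/s per ION, and the measured 110 MB/s aggregate with 10 IONs reported on the next slide).

```python
# Expected PVFS aggregate throughput vs number of I/O nodes (IONs),
# assuming each ION is limited by its Fast Ethernet link (~100 Mbit/s).

ION_LINK_MBIT_S = 100        # Fast Ethernet per I/O node

def expected_aggregate_mb_s(n_ion: int) -> float:
    """Ideal aggregate throughput in MB/s for n_ion I/O nodes."""
    return n_ion * ION_LINK_MBIT_S / 8.0

if __name__ == "__main__":
    for n_ion in (1, 5, 10, 20):
        print(f"{n_ion:2d} IONs -> ~{expected_aggregate_mb_s(n_ion):5.1f} MB/s ideal")
    # With 10 IONs the ideal limit is ~125 MB/s; the measured aggregate
    # I/O of 110 MB/s (30 clients reading) is close to that limit.
```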
High Performance I/O System (II)


- With 10 IONs we reached an aggregate I/O of 110 MB/s (30 client nodes reading data), within a single file hierarchy.
- To be compared with:
  - 20-40 MB/s (local disk);
  - 10 MB/s (100Base-T NAS);
  - 50 MB/s (1000Base-T NAS).
[Plot: PVFS performance with 10 I/O servers: aggregate I/O (MB/s, reaching ~110) versus the number of clients (0-30).]
Test of a PVFS-Based Analysis Facility (I)




- Test performed using the OO DaVinci algorithm for the B0 -> pi+ pi- selection.
- 44.5k signal events and 484k bb inclusive events analysed in 25 minutes (to be compared with 2 days on a single PC).
- Entirely performed on the Bologna farm, parallelizing the analysis algorithm over 106 CPUs (80 x 1.4 GHz PIII CPUs + 26 x 1 GHz PIII CPUs).
- The DaVinci processes read the OODSTs from PVFS.
Test of a PVFS-Based Analysis Facility (II)
[Diagram: analysis data flow. The OODST files are striped over PVFS across 10 I/O nodes (ION 1 .. ION 10) plus a management node (MGR); 106 compute nodes (CN 1 .. CN 106) run DaVinci, reading the OODSTs from PVFS and producing n-tuples; a login node controls the jobs.]
Test of a PVFS-Based Analysis Facility (III)

- 106 DaVinci processes reading from PVFS.
- 968 files of 120 MB each (500 OODST events per file).
- 116 GB read and processed in 1500 s (see the sketch below).
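A short Python sketch of how such a run could be organized and of the aggregate rate implied by the quoted numbers; the round-robin file assignment is illustrative, not the actual production script.

```python
# Illustrative splitting of the PVFS analysis test over the farm:
# 968 OODST files of ~120 MB each, processed by 106 DaVinci workers.
# The round-robin assignment below is a sketch, not the real job scheduler.

N_FILES = 968
FILE_MB = 120
N_WORKERS = 106
WALL_TIME_S = 1500           # measured wall time of the test

files = [f"oodst_{i:04d}.root" for i in range(N_FILES)]   # placeholder names

# Round-robin assignment of files to workers.
assignment = {w: files[w::N_WORKERS] for w in range(N_WORKERS)}

files_per_worker = max(len(v) for v in assignment.values())
total_gb = N_FILES * FILE_MB / 1000
aggregate_mb_s = N_FILES * FILE_MB / WALL_TIME_S

print(f"files per worker   : up to {files_per_worker}")
print(f"total data read    : ~{total_gb:.0f} GB")
print(f"aggregate read rate: ~{aggregate_mb_s:.0f} MB/s sustained over {WALL_TIME_S} s")
```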
B+–: Pion Momentum Resolution
p / p for identified pions
coming from B0
p / p
|p / p| vs p for identified pions
coming from B0
FWMH  0.01
p / p
Status of LHCb-INFN Computing, 18
Domenico Galli
p [GeV/c]
B0 Mass Plots
[Plots: pi+ pi- invariant mass (MeV/c²) for all pi+ pi- pairs: with no cuts; with all cuts; and with all cuts, magnified around the B0 mass (FWHM ≈ 66 MeV/c²). Cuts applied: Pt > 800 MeV/c, d/σd > 1.6, lB0 > 1 mm. Event counts shown on the plots: 3425 and 105.]
bb Inclusive Background Mass Plot




- Total number of events: 484k.
- Only events with a single interaction are taken into account at the moment: ~240k.
- 213 events remain in the mass region after all cuts; 32 of the 213 are ghosts.

[Plot: pi+ pi- invariant mass (GeV/c²) for all pi+ pi- pairs with all cuts, bb inclusive sample.]
Signal Efficiency and Mass Plots for Tighter Cuts

- Final efficiency (tighter cuts) at zero bb inclusive background (240k events): 871/22271 ≈ 4%.
- Rejection against the bb inclusive background: > 1 - 1/240000 = 99.9996%.
- Tighter cuts: Pt > 2.8 GeV/c, d/σd > 2.5, lB0 > 0.6 mm.
- 871 signal events in the mass region; 16 background events from the signal sample in the mass region (all ghosts).

[Plots: pi+ pi- invariant mass (GeV/c²) with the tighter cuts, for the signal and bb inclusive samples.]
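A minimal Python check of the efficiency and rejection arithmetic quoted above, using only the event counts from this slide.

```python
# Arithmetic check of the tighter-cut selection figures quoted on this slide.

SIGNAL_SELECTED = 871            # signal events left in the mass region
SIGNAL_TOTAL = 22271             # denominator quoted on the slide
BB_BACKGROUND_TOTAL = 240_000    # bb inclusive events considered (single interaction)

efficiency = SIGNAL_SELECTED / SIGNAL_TOTAL
# Zero background events survive, so the rejection is at least 1 - 1/N.
rejection_lower_bound = 1 - 1 / BB_BACKGROUND_TOTAL

print(f"signal efficiency : {efficiency:.1%}")                # ~3.9%, i.e. ~4%
print(f"bb rejection      : > {rejection_lower_bound:.6%}")   # > 99.999583%
```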
Conclusions







- The MC production farm has been running stably (with increasing resources) for more than one year.
- The INFN Tier-1 is the second most active LHCb MC production centre (after CERN).
- The collaboration with the CNAF staff is excellent.
- We are not yet using GRID tools in production, but we plan to move to them as soon as the detector design is stable.
- An analysis mini-farm for interactive work has been running for more than one month, and we plan to extend the number of nodes depending on the availability of resources.
- A massive-analysis system architecture has already been tested, using a parallel file system and 106 CPUs.
- We need at least to keep the present computing power at CNAF in order to supply the analysis facility to the LHCb Italian collaboration (more resources, to keep production running in parallel with massive analysis activities, would be welcome).