
The LHCb Software and Computing
GridPP10 meeting, June 2nd 2004
Ph. Charpentier, CERN
Outline

- Core software
- LHCb applications
- Production and analysis tools
- Data Challenge plans
- Summary
Software Strategy

- Develop an architecture ('blueprint') and a framework (real code) to be used at all stages of LHCb data processing
  - high-level triggers, simulation, reconstruction, analysis
  - a single framework used by all members of the collaboration
- Avoid fragmentation and duplication of computing efforts
  - common vocabulary, better understanding of the system
  - better specification of what needs to be done
  - identify and build common components
  - guidelines and coordination for SD groups
- Transparent use of third-party components wherever possible
  - leverage the LCG applications area software
  - GUI, persistency, simulation, ...
- Applications are developed by customising the framework
Gaudi Architecture and Framework
[Architecture diagram: the Application Manager steers the job. Algorithms read and write objects in transient stores (Transient Event Store, Transient Detector Store, Transient Histogram Store), each served by its data service (Event Data Service, Detector Data Service, Histogram Service). Persistency Services and Converters move objects between the transient stores and the data files, and the Event Selector chooses the input events. Common utility services include the Message Service, the JobOptions Service, the Particle Properties Service and other services. A schematic sketch of this pattern follows below.]
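To make the pattern concrete, here is a minimal Python sketch of the algorithm / service / transient-store idea. The real framework is C++, and every class, method and path name below is invented for illustration; this is not the Gaudi API, only the shape of it: algorithms read and write objects in a transient store, and an application manager drives the event loop.

```python
# Minimal sketch of the Gaudi-style pattern; all names are invented for
# illustration and do not correspond to the real (C++) Gaudi interfaces.

class TransientEventStore:
    """Per-event scratch space, addressed by path and cleared between events."""
    def __init__(self):
        self._data = {}
    def register(self, path, obj):
        self._data[path] = obj
    def retrieve(self, path):
        return self._data[path]
    def clear(self):
        self._data.clear()

class Algorithm:
    """Base class: concrete algorithms implement execute()."""
    def __init__(self, name, evt_store):
        self.name = name
        self.evt = evt_store
    def execute(self):
        raise NotImplementedError

class TrackCounter(Algorithm):
    """Toy algorithm reading reconstructed tracks from the transient store."""
    def execute(self):
        tracks = self.evt.retrieve("/Event/Rec/Tracks")
        print(f"{self.name}: {len(tracks)} tracks in this event")

class ApplicationManager:
    """Drives the event loop and calls each algorithm once per event."""
    def __init__(self, algorithms, evt_store, event_source):
        self.algorithms = algorithms
        self.evt = evt_store
        self.event_source = event_source  # stands in for the Event Selector
    def run(self, max_events):
        for i, raw_event in enumerate(self.event_source):
            if i >= max_events:
                break
            self.evt.clear()                                  # new event
            self.evt.register("/Event/Rec/Tracks", raw_event["tracks"])
            for alg in self.algorithms:                       # algorithm sequence
                alg.execute()

# Toy usage with two fake events
store = TransientEventStore()
events = [{"tracks": [1, 2, 3]}, {"tracks": [4, 5]}]
ApplicationManager([TrackCounter("TrackCounter", store)], store, iter(events)).run(2)
```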
Impact of LCG projects
[Same architecture diagram, with LCG components plugged in: the Application Manager and core services (Message Service, JobOptions Service) come from SEAL, the Particle Properties Service is based on HepPDT, POOL provides the Persistency Services for event data and histograms, the Histogram Service is based on AIDA, the detector-description data is handled by an LCG persistency service, and further LCG services replace other in-house ones.]
Core Software - Status and Outlook

- Sept '98: project started, GAUDI team assembled
- Feb '99: GAUDI first release (v1)
- Nov '99: GAUDI adopts an open-source style
  - experiment-independent web and release area
  - ATLAS started contributing to its development
  - used by ATLAS, HARP, GLAST
- Jun '02: all the basic functionality available
  - object persistency, detector description, data visualization, etc.
- Dec '03: POOL used for object persistency
- More work still needed on:
  - detector conditions, distributed computing (Grid), interactive environment, etc.
  - integrating more LCG SEAL services (plugin manager, ...)
Applications and datasets

[Data-flow diagram: all applications are built on Gaudi and share the event model / physics event model, the detector description and the conditions database, with input from the detector groups. Gauss (simulation) produces generator particles (GenParts), MCHits and MCParts; Boole (digitisation) produces Digits and RawData; Brunel (reconstruction & HLT) produces the DST; DaVinci (analysis) produces AOD and MiniDST output.]
Gauss - Geant4-based simulation application

- Gaudi application
- Uses:
  - Pythia 6.205 and EvtGen (from BaBar) for event generation
  - HepMC as exchange model
  - Geant4 as simulation engine
- Simulation framework: GiGa
  - converts HepMC to Geant4 input
  - interfaces with all Gaudi services (geometry, magnetic field, ...)
  - converts Geant4 trajectories to the LHCb event model (MCHits, MCParticles and MCVertices); a toy illustration of this kind of conversion follows below
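As a toy illustration of the kind of conversion GiGa performs (generator-level records in, linked MC-truth objects out), here is a hedged Python sketch; the classes below are simplified stand-ins invented for illustration, not the actual HepMC or LHCb event-model types.

```python
# Hypothetical sketch: a flat generator record is turned into linked
# MC-truth objects (MCVertex/MCParticle-like). All names are invented toys.
from dataclasses import dataclass, field

@dataclass
class MCVertex:
    position: tuple                      # (x, y, z) in mm
    products: list = field(default_factory=list)

@dataclass
class MCParticle:
    pdg_id: int
    momentum: tuple                      # (px, py, pz) in MeV/c
    origin: MCVertex

def convert(gen_event):
    """Attach one MCParticle per generator particle to the primary vertex."""
    primary = MCVertex(position=gen_event["vertex"])
    for p in gen_event["particles"]:
        primary.products.append(
            MCParticle(pdg_id=p["pdg_id"], momentum=p["p"], origin=primary))
    return primary

# Toy usage: one generated event with two particles at the primary vertex
gen = {"vertex": (0.0, 0.0, 1.2),
       "particles": [{"pdg_id": 321, "p": (1200.0, 0.0, 5000.0)},
                     {"pdg_id": -211, "p": (-800.0, 100.0, 3000.0)}]}
vtx = convert(gen)
print(f"{len(vtx.products)} MC particles attached to the primary vertex")
```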
Boole & Brunel
Digitisation and reconstruction

Boole (digitisation)
- from MCHits to digits (Raw buffer format); see the digitisation sketch below
- runs the trigger algorithms
- output: Raw buffer & MC truth (+ relations)

Brunel (reconstruction)
- complete pattern recognition
  - charged tracks: long, upstream, downstream
  - calorimeter clusters & electromagnetic particle identification
  - RICH particle identification
  - muon particle identification
- output: DST (ESD) format based on the LHCb event model
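The digitisation step can be pictured as turning deposited energy into detector digits above a threshold while keeping a link back to the MC truth (the "+ relations"). The sketch below is a hypothetical illustration of that idea only; the thresholds, units and field names are invented, not Boole's actual code.

```python
# Hypothetical illustration of digitisation: MC hits (energy deposits in
# detector cells) become digits above a threshold, with a relation kept
# back to the MC truth. Names, units and thresholds are invented.
def digitise(mc_hits, threshold_adc=3, adc_per_mev=10.0):
    """Return (digits, relations): digits are (cell_id, adc) pairs."""
    digits, relations = [], {}
    for hit in mc_hits:
        adc = int(hit["energy_mev"] * adc_per_mev)
        if adc >= threshold_adc:
            digit = (hit["cell_id"], adc)
            digits.append(digit)
            # Remember which MC particle produced this digit.
            relations[hit["cell_id"]] = hit["mc_particle"]
    return digits, relations

hits = [{"cell_id": 101, "energy_mev": 0.9, "mc_particle": "pi+ #7"},
        {"cell_id": 102, "energy_mev": 0.1, "mc_particle": "noise"},
        {"cell_id": 205, "energy_mev": 2.4, "mc_particle": "K- #3"}]
digits, rel = digitise(hits)
print(digits)   # [(101, 9), (205, 24)] -- cell 102 is below threshold
```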
DaVinci - The LHCb Analysis Framework

- Gaudi application
- Physics event model for describing all physics-related objects produced by the analysis algorithms
  - keeps a loose connection to reconstruction entities (tracks, clusters)
  - facilitates migration of algorithms from analysis to reconstruction
- Physicists only manipulate abstract objects (particles and vertices)
  - concentrate on functionality rather than on technicalities
  - manipulation and analysis tools for general use
  - (illustrated in the sketch below)
- Interactive analysis through Python scripting
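The "abstract particles and vertices" idea could look roughly like the following Python sketch. These are not the actual DaVinci interfaces: the classes, cuts, toy kinematics and the crude mass combination are all invented for illustration.

```python
# Hypothetical illustration of analysing abstract particles; class and
# attribute names are invented, not the real DaVinci interfaces.
from dataclasses import dataclass

@dataclass
class Particle:
    pid: str       # particle hypothesis, e.g. "K+", "pi-"
    pt: float      # transverse momentum in MeV/c
    p4: tuple      # four-momentum (E, px, py, pz) in MeV

def filter_particles(candidates, pid, min_pt):
    """Select candidates of a given hypothesis above a pT threshold."""
    return [c for c in candidates if c.pid == pid and c.pt > min_pt]

def invariant_mass(particles):
    """Combine candidates by summing four-momenta (no real vertex fit)."""
    E = sum(p.p4[0] for p in particles)
    px = sum(p.p4[1] for p in particles)
    py = sum(p.p4[2] for p in particles)
    pz = sum(p.p4[3] for p in particles)
    m2 = E * E - (px * px + py * py + pz * pz)
    return max(m2, 0.0) ** 0.5

# Toy usage: combine a K+ and a pi- candidate
cands = [
    Particle("K+", 1200.0, (5166.0, 1200.0, 0.0, 5000.0)),
    Particle("pi-", 806.0, (3109.6, -800.0, 100.0, 3000.0)),
]
kaons = filter_particles(cands, "K+", min_pt=500.0)
pions = filter_particles(cands, "pi-", min_pt=300.0)
if kaons and pions:
    m = invariant_mass([kaons[0], pions[0]])
    print(f"K+ pi- candidate mass: {m:.0f} MeV")
```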
DaVinci - toolset

- To access and filter data:
  - Physics Desktop, Particle Filters (PID, kinematics, etc.), Particle Stuffer
- Vertexing and constrained fitters:
  - Geometrical Vertex Fitter, Mass-Constrained Vertex Fitter, Primary Vertex, Kinematic Fitter, ...
- MC analysis tools:
  - MC Decay Finder, Associators, ...
- Utilities:
  - Geometrical Tools, Particle Transporter, Debug Tool
- Long list of decay selections at different levels of development
- Tutorial attended by a total of 60 physicists
- All results shown in the '03 TDRs were obtained with this software
  - ROOT (or PAW) used to produce the final plots
Panoramix - Event & Geometry Display

- Panoramix package, based on OpenInventor
- Able to display:
  - geometry from XML files
  - MC data objects
  - reconstruction objects
- Scripting based on Python
- Gaudi application, hence can be integrated with e.g. DaVinci algorithms
Event Viewing
[Full-screen screenshot of the event display.]
Dirac - Workload management software
[Architecture diagram, in three layers:
- User interfaces: production manager, job monitor, user CLI, GANGA UI, bookkeeping (BK) query web page, FileCatalog browser.
- DIRAC services: DIRAC Job Management Service, JobMonitorSvc, JobAccountingSvc with its AccountingDB, BookkeepingSvc, FileCatalogSvc, InformationSvc, MonitoringSvc.
- DIRAC resources: DIRAC sites, where Agents feed jobs to the DIRAC CEs; LCG, reached through an Agent and the Resource Broker in front of the LCG CEs; and DIRAC Storage (disk files accessed via gridftp, bbftp or rfio).
A sketch of the pull-model Agent follows below.]
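The DIRAC pull model can be sketched as follows. This is a hypothetical illustration of the scheduling idea only: an Agent at a site asks a central job-management service for work suited to its resources, runs it, and reports the status back. The class and method names are invented, not the real DIRAC services.

```python
# Hypothetical sketch of a pull-model agent; all interfaces are invented.
import queue
import subprocess

class JobManagementService:
    """Stand-in for the central service holding a queue of waiting jobs."""
    def __init__(self):
        self._jobs = queue.Queue()
    def submit(self, job):
        self._jobs.put(job)
    def request_job(self, site_capabilities):
        # A real matcher would compare job requirements with the capabilities.
        try:
            return self._jobs.get_nowait()
        except queue.Empty:
            return None
    def report_status(self, job_id, status):
        print(f"[service] job {job_id}: {status}")

class Agent:
    """Site agent: pulls jobs and runs them on the local computing element."""
    def __init__(self, site, service):
        self.site = site
        self.service = service
    def cycle(self):
        job = self.service.request_job({"site": self.site})
        if job is None:
            return False
        self.service.report_status(job["id"], f"running at {self.site}")
        result = subprocess.run(job["command"], capture_output=True, text=True)
        status = "done" if result.returncode == 0 else "failed"
        self.service.report_status(job["id"], status)
        return True

# Toy usage: one waiting job, one agent polling once
svc = JobManagementService()
svc.submit({"id": 42, "command": ["echo", "simulating 500 events"]})
Agent("GB-RAL", svc).cycle()
```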
Catalogs

- File metadata (bookkeeping)
  - LHCb bookkeeping database (BKDB)
    - successfully tested during DC'03
    - supports full cross-referencing of datasets
    - full traceability of data history
    - XML-RPC remote access + web browser (see the query sketch below)
  - supported by a GridPP-funded position (Carmine Cioffi)
- File and replica catalog
  - replica table in the LHCb BKDB
  - all files also entered into an AliEn catalog
- Data management tools
  - being developed, based on the catalogs above
  - essential for distributed analysis
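Since the bookkeeping database is reachable over XML-RPC, a metadata query from a script could look roughly like the sketch below. The endpoint URL, the queryDatasets method and its arguments are invented for illustration; only the use of XML-RPC is taken from the slide.

```python
# Hypothetical sketch of querying a bookkeeping service over XML-RPC.
import xmlrpc.client

def list_datasets(endpoint, event_type, year):
    """Ask the bookkeeping service for datasets matching simple metadata."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    # Hypothetical remote method returning a list of logical file names.
    return proxy.queryDatasets({"eventType": event_type, "year": year})

if __name__ == "__main__":
    try:
        files = list_datasets("http://bookkeeping.example.org:8080/RPC2",
                              event_type="B generic", year=2004)
        for lfn in files[:10]:
            print(lfn)
    except (xmlrpc.client.Error, OSError) as err:
        print(f"bookkeeping query failed: {err}")
```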
Ganga - Interfacing Gaudi to the Grid

Goal
- Simplify the management of analysis and production jobs for end-user physicists by developing a tool for accessing Grid services with built-in knowledge of how Gaudi works

Required functionality
- job preparation and configuration
- job submission, monitoring and control
- resource browsing, booking, etc.

- Developed in collaboration with ATLAS
- Uses Grid middleware services
- Aim: to be used for the DC'04 analysis

[Diagram: Ganga sits between the user GUI and the Gaudi program, handling job options, algorithms, histograms, monitoring and results, and talking to the collective and resource Grid services.]
Ganga - Functionality for DC'04

- Prepare the job
  - prepare the options defining the processing (workflow, algorithm flow, DLLs)
  - set algorithm tuning parameters (criteria, cuts, ...)
  - select datasets
  - submit the job
  - all settings can be saved and retrieved for further use
- Collaborate with Grid services
  - file/metadata catalogs
  - workload management
- Jobs submitted to
  - an interactive run
  - the local batch system
  - the LCG Grid (possibly via DIRAC)

[Workflow diagram for preparing a DaVinci job: select a workflow; starting from the AlgOptions catalog, prepare the AlgFlowOptions and DLLs for the DaVinci algorithm flow; edit the AlgorithmFlow and the AlgParamOptions; select datasets through the metadata catalog to obtain the DatasetOptions; then prepare the sandbox (DLLs, JobOptions, FileCatalog slice from the file catalog) and submit the job. A hypothetical script-level sketch of these steps follows below.]
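As a hypothetical script-level sketch of the preparation steps above (save and retrieve the settings, pick datasets, choose a backend, submit): the class and attribute names are invented for illustration and are not the actual Ganga interface.

```python
# Hypothetical sketch of job preparation/submission; not the real Ganga API.
from dataclasses import dataclass, field
import json

@dataclass
class AnalysisJob:
    application: str = "DaVinci"
    options_files: list = field(default_factory=list)   # algorithm flow / options
    parameters: dict = field(default_factory=dict)      # cuts, tuning criteria
    datasets: list = field(default_factory=list)        # logical file names
    backend: str = "Interactive"                        # or "LocalBatch", "LCG"

    def save(self, path):
        """Persist the configuration so it can be retrieved and reused."""
        with open(path, "w") as f:
            json.dump(self.__dict__, f, indent=2)

    def submit(self):
        """Hand the job to the chosen backend (stubbed out here)."""
        print(f"Submitting {self.application} job with "
              f"{len(self.datasets)} input files to {self.backend}")

job = AnalysisJob(
    options_files=["DaVinciAlgFlow.opts", "MySelectionCuts.opts"],
    parameters={"mass_window_MeV": 50.0},
    datasets=["LFN:/lhcb/dc04/b_signal_0001.dst"],
    backend="LCG",
)
job.save("selection_job.json")
job.submit()
```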
Ganga: the project

- Ganga is an ATLAS-LHCb joint venture
- Mainly funded by GridPP
  - shared ATLAS-LHCb positions (GridPP): Alexander Soroko, Karl Harrison, Alvin Tan
  - LHCb bookkeeping & Ganga integration (GridPP): Carmine Cioffi
- Project management
  - David Adams (ATLAS-BNL), Ulrik Egede (LHCb-Imperial College)
- Committed to ARDA
  - strong interaction with dataset selection
  - EGEE NA4 manpower integrated in the Ganga team (middleware integration and testing)
Data Challenges

- Plan a series of Data Challenges to
  - measure the quality (# crashes / # events) and performance of the software
  - run scalability tests for simulation, reconstruction and analysis
  - run production tests over the Grid using all LHCb regional centres

- DC'03 (TDR production), Feb-Apr 2003: 5·10^7 events (2%), quality target 1 crash in < 20k events
- DC'04 (LCG2 functionality test), May-Jul 2004: 2·10^8 events (5%), 1 crash in < 200k events
- DC'05 (LCG3 production test), Jun-Aug 2005: 5·10^8 events (10%), 1 crash in < 2M events
- DC'06 (large-scale test over the fully deployed Grid), Jul-Sep 2006: ~10^9 events (~20%), 1 crash in < 20M events
Data Challenge '03

- Still using Geant3 for simulation
- Used for the 2003 LHCb TDR studies
- 59 days needed (90 foreseen)
- ~5·10^7 events produced
- 17 sites used
- 36 600 jobs launched (success rate 92%)
- 80% of the production outside CERN
  - 2/3 of it in the UK
Data Challenge ‘04
Computing Data Challenge 2004 (planning spreadsheet, last updated 26/01/04):
- Start date: 2004/04/01; end date: 2004/06/30; duration: 90 days
- All times normalised to 1 GHz PIII processors

Events requested:
- Minimum bias, standard: 75 000 000
- Minimum bias, special: 11 000 000
- B generic: 50 000 000
- B signal: 20 950 000
- Total: 156 950 000

Simulation (CPU hours | oosim size, MB):
- Minimum bias, standard: 69 444 | 650 000
- Minimum bias, special: 189 444 | 1 540 000
- B generic: 3 444 444 | 27 500 000
- B signal: 1 443 222 | 11 522 500
- Total: 5 146 556 | 41 212 500

Digitisation and reconstruction (digitisation CPU hours | oodigi size, MB | reconstruction CPU hours | stripping factor | oodst size, MB):
- Minimum bias, standard: 22 917 | 22 200 000 | 110 417 | 0.001 | 4 800
- Minimum bias, special: 3 361 | 3 256 000 | 16 194 | 0.001 | 704
- B generic: 26 389 | 25 200 000 | 222 222 | 1.000 | 8 100 000
- B signal: 11 057 | 10 558 800 | 93 111 | 1.000 | 3 393 900
- Total: 63 724 | 61 214 800 | 441 944 | - | 11 499 404

Total CPU time: 5 652 224 hours (2 260 889 444 SPECint2k·hours)
Average CPUs used (total / number of days): 2 617 1 GHz PIII processors, i.e. 1 046 708 SPECint2k (a worked check of these averages follows below)

CPU time by site (SPECint2k·hours | share of available capacity | % of the needed capacity | storage required, GB):
- CERN: 851 450 800 | 20.7% | 37.7% | 20 051
- BR: 0 | 0.0% | 0.0% | 0
- CH: 19 400 000 | 0.5% | 0.9% | 249
- DE-Karlsruhe: 650 000 000 | 15.8% | 28.7% | 8 350
- ES: 205 200 000 | 5.0% | 9.1% | 2 636
- FR-Lyon: 367 200 000 | 8.9% | 16.2% | 4 717
- GB-Imperial: 400 000 000 | 9.7% | 17.7% | 5 138
- GB-Liverpool: 398 000 000 | 9.7% | 17.6% | 5 113
- GB-RAL: 366 000 000 | 8.9% | 16.2% | 4 702
- GB-ScotGrid: 47 000 000 | 1.1% | 2.1% | 604
- IT-Bologna: 432 000 000 | 10.5% | 19.1% | 5 549
- NL-NIKHEF: 162 000 000 | 3.9% | 7.2% | 2 081
- PL: 75 600 000 | 1.8% | 3.3% | 971
- RU: 129 600 000 | 3.2% | 5.7% | 1 665
- Total: 4 103 450 800 | 100.0% | 181.5% | 61 825
(The combined UK figure in the by-country column reads 53.6%; the slide highlights that ~53% of the CPU power is pledged by the UK.)

Storage summary (oosim | oodst, GB):
- Minimum bias: 2 190 | 6
- B generic: 27 500 | 8 100
- B signal: 11 523 | 3 394
- Total: 41 213 | 11 499
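A quick arithmetic check of the quoted averages, using only the totals from the table above:

```python
# Recompute the averages quoted above from the table totals.
total_cpu_hours = 5_652_224          # simulation + digitisation + reconstruction
total_specint2k_hours = 2_260_889_444
days = 90

wall_clock_hours = days * 24
print(round(total_cpu_hours / wall_clock_hours))        # ~2617 1 GHz PIII CPUs
print(round(total_specint2k_hours / wall_clock_hours))  # ~1 046 708 SPECint2k
```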
Summary

- A software framework (Gaudi) with a full set of services has been developed for use in all event-processing applications
- A common set of high-level components has been developed
  - e.g. detector geometry, event model, interactive/visualization tools
  - provides guidelines and minimises the work for physicists developing detector and physics algorithms
- Migration of the LHCb software to Geant4 and the LCG-AA software is completed
  - POOL used for persistency; full SEAL integration still to come
- Production architecture (DIRAC) and toolset
  - DIRAC using LCG2 is in place and running (~50% of the processing power)
- A set of Data Challenges is planned for future deployment and validation of the computing infrastructure and software
  - the production of large datasets of ~200 M events is in progress (DC'04)
- Main challenge for 2004: distributed analysis (post-DC'04)