Transcript Document

The LHCb Way of Computing
The approach to its organisation and development
John Harvey
CERN/ LHCb
DESY Seminar
Jan 15th, 2001
Talk Outline
 Brief introduction to the LHCb experiment
 Requirements on data rates and cpu capacities
 Scope and organisation of the LHCb Computing Project
 Importance of reuse and a unified approach
 Data processing software
 Importance of architecture-driven development and software frameworks
 DAQ system
 Simplicity and maintainability of the architecture
 Importance of industrial solutions
 Experimental Control System
 Unified approach to controls
 Use of commercial software
 Summary
J. Harvey : LHCb Computing
Slide 2
Overview of LHCb Experiment
The LHCb Experiment
 Special purpose experiment to measure precisely CP
asymmetries and rare decays in B-meson systems
 Operating at the most intensive source of Bu, Bd, Bs and Bc, i.e.
the LHC at CERN
 LHCb plans to run with an average luminosity of 2x10^32 cm^-2 s^-1
 Events dominated by single pp interactions - easy to analyse
 Detector occupancy is low
 Radiation damage is reduced
 High performance trigger based on
 High pT leptons and hadrons (Level 0)
 Detached decay vertices (Level 1)
 Excellent particle identification for charged particles
 K/π separation: ~1 GeV/c < p < 100 GeV/c
J. Harvey : LHCb Computing
Slide 4
The LHCb Detector
 At high energies b- and b̄-hadrons are produced in the same forward cone
 Detector is a single-arm spectrometer with one dipole
 θ_min = ~15 mrad (beam pipe and radiation)
 θ_max = ~300 mrad (cost optimisation)
[Figure: polar angles of b- and b̄-hadrons calculated using PYTHIA]
J. Harvey : LHCb Computing
Slide 5
LHCb Detector Layout
J. Harvey : LHCb Computing
Slide 6
Typical Interesting Event
J. Harvey : LHCb Computing
Slide 8
The LHCb Collaboration
49 institutes, 513 members
Countries: Brazil, Finland, France, Germany, Italy, Netherlands, PRC, Poland, Romania, Russia, Spain, Switzerland, UK, Ukraine
LHCb in numbers
 Expected rate from inelastic p-p collisions is ~15 MHz
 Total b-hadron production rate is ~75 kHz
 Branching ratios of interesting channels range between 10^-5 and 10^-4, giving an interesting physics rate of ~5 Hz
Bunch crossing rate:        40 MHz
Level 0 accept rate:        1 MHz
Level 1 accept rate:        40 kHz
Level 2 accept rate:        5 kHz
Level 3 accept rate:        200 Hz
Number of channels:         1.1 M
Event size:                 150 kB
Readout rate:               40 kHz
Event building bandwidth:   6 GB/s
Data rate to storage:       50 MB/s
Total raw data per year:    125 TB
Total ESD per year:         100 TB
Simulation data per year:   350 TB
Level 2/3 CPU:              35 kSI95
Reconstruction CPU:         50 kSI95
Analysis CPU:               10 kSI95
Simulation CPU:             500 kSI95
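As a quick cross-check added here (simple arithmetic, not taken from the slides), the event-building bandwidth follows directly from the readout rate and the event size:

\[
40\,\mathrm{kHz} \times 150\,\mathrm{kB/event} = 6\times10^{6}\,\mathrm{kB/s} = 6\,\mathrm{GB/s},
\qquad
200\,\mathrm{Hz} \times 150\,\mathrm{kB/event} = 30\,\mathrm{MB/s}.
\]

The 50 MB/s to storage presumably also covers reconstruction output on top of the ~30 MB/s of accepted raw data; that reading is an assumption.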
Timescales
 LHCb experiment approved in September 1998
 Construction of each component scheduled to start after
approval of corresponding Technical Design Report (TDR) :
 Magnet, Calorimeter and RICH TDRs submitted in 2000
 Trigger and DAQ TDRs expected January 2002
 Computing TDR expected December 2002
 Expect nominal luminosity (2x10^32 cm^-2 s^-1) soon after LHC turn-on
 Exploit physics potential from day 1
 Smooth operation of the whole data acquisition and data processing chain
will be needed very quickly after turn–on
 Locally tuneable luminosity → long physics programme
 Cope with long life-cycle of ~ 15 years
J. Harvey : LHCb Computing
Slide 11
LHCb Computing
Scope and Organisation
Requirements and Resources
 More stringent requirements …
 Enormous number of items to control - scalability
 Inaccessibility of detector and electronics during datataking - reliability
 Intense use of software in triggering (Levels 1, 2, 3) - quality
 Many orders of magnitude more data and CPU - performance
 Experienced manpower very scarce
 Staffing levels falling
 Technology evolving very quickly (hardware and software)
 Rely very heavily on very few experts (1 or 2) - bootstrap approach
 The problem - a more rigorous approach is needed but this is
more manpower intensive and must be undertaken under
conditions of dwindling resources
J. Harvey : LHCb Computing
Slide 13
Importance of Reuse
 Put extra effort into building high quality components
 Become more efficient by extracting more use out of these
components (reuse)
 Many obstacles to overcome
 too broad functionality / lack of flexibility in components
 proper roles and responsibilities not defined ( e.g. architect )
 organisational - reuse requires a broad overview to ensure unified approach
 we tend to split into separate domains each independently managed
 cultural
 - don't trust others to deliver what we need
 - fear of dependency on others
 - fail to share information with others
 - developers fear loss of creativity
 Reuse is a management activity - need to provide the right organisation
to make it happen
J. Harvey : LHCb Computing
Slide 14
Traditional Project Organisation
[Diagram: traditional project organisation]
Online System: DAQ Hardware, DAQ Software, Detector Control System, Event Display, Message System, Detector Description
Offline System: Simulation, Analysis, Event Display, Detector Description, Message System
The same components (Event Display, Detector Description, Message System) appear in both domains, each developed independently.
J. Harvey : LHCb Computing
Slide 15
A Process for reuse
Manage: plan, initiate, track, coordinate; set priorities and schedules, resolve conflicts
Build: develop architectural models, choose integration standards, engineer reusable components
Support: support development; manage and maintain components; validate, classify, distribute; document, give feedback
Assemble: design the application, find and specialise components, develop missing components, integrate components
Inputs: requirements (existing software and hardware); outputs: systems
J. Harvey : LHCb Computing
Slide 16
LHCb Computing Project Organisation
[Organisation chart]
Governing bodies: National Computing Board, Computing Steering Group, Technical Review
Build (frameworks):
- GAUDI Framework: architecture spec., detector description, visualisation, GEANT4, XML, ...
- Controls Framework: architecture spec., SCADA, OPC, ...
- DAQ Framework: architecture spec., simulation model, TTC, NP, NIC, ...
Assemble (applications): Simulation, Reconstruction, Analysis, Trigger
Operations:
- Experiment Control System: detector controls, safety system, run control system
- DAQ System: timing & fast control, readout unit, event builder, event filter farm
- Distributed Computing Facilities: CPU farms, data storage, computing model, production tools, GRID
Support: Software Development Support - code mgt, release mgt, tools, training, documentation, web
Roles: C = Computing Coordinator, RC = Regional Centre Rep., M = Project Manager, A = Software Architect, E = Project Engineer
J. Harvey : LHCb Computing
Slide 17
Data Processing Software
Software architecture
 Definition of [software] architecture [1]
 Set of significant decisions about the organization of the software
system
 Selection of the structural elements and their interfaces which
compose the system
 Their behavior -- collaboration among the structural elements
 Composition of these structural and behavioral elements into
progressively larger subsystems
 The architectural style that guides this organization
 The architecture is the blue-print (architecture description
document)
[1] I. Jacobson et al., “The Unified Software Development Process”, Addison-Wesley, 1999
J. Harvey : LHCb Computing
Slide 19
Software Framework
 Definition of [software] framework [2,3]
 A kind of micro-architecture that codifies a particular domain
 Provides the suitable knobs, slots and tabs that permit clients to
customise it for specific applications within a given range of behaviour
 A framework realizes an architecture
 A large O-O system is constructed from several cooperating
frameworks
 The framework is real code
 The framework should be easy to use and should provide a lot
of functionality
[2] G. Booch, “Object Solutions”, Addison-Wesley 1996
[3] E. Gamma, et al., “Design Patterns”, Addison-Wesley 1995
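To make the "knobs, slots and tabs" idea concrete, here is a minimal generic C++ sketch (an illustration, not LHCb code; all class names are invented): the framework owns the control flow and most of the code, and a client customises it by implementing a small interface.

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Interface ("slot") defined by the framework: the only thing a client must provide.
class IAlgorithm {
public:
  virtual ~IAlgorithm() = default;
  virtual bool initialize() = 0;
  virtual bool execute()    = 0;   // called once per event by the framework
  virtual bool finalize()   = 0;
};

// The framework owns the event loop (inversion of control).
class EventLoopMgr {
public:
  void add(std::unique_ptr<IAlgorithm> alg) { algs_.push_back(std::move(alg)); }
  void run(int nEvents) {
    for (auto& a : algs_) a->initialize();
    for (int i = 0; i < nEvents; ++i)
      for (auto& a : algs_) a->execute();
    for (auto& a : algs_) a->finalize();
  }
private:
  std::vector<std::unique_ptr<IAlgorithm>> algs_;
};

// A client "application" plugs its physics code into the framework hooks.
class TrackFit : public IAlgorithm {
public:
  bool initialize() override { std::cout << "TrackFit: initialize\n"; return true; }
  bool execute()    override { std::cout << "TrackFit: fit tracks for one event\n"; return true; }
  bool finalize()   override { std::cout << "TrackFit: finalize\n"; return true; }
};

int main() {
  EventLoopMgr mgr;
  mgr.add(std::make_unique<TrackFit>());
  mgr.run(3);   // framework drives; client code only fills in the hooks
}
```

The real GAUDI interfaces are richer (StatusCode returns, properties, services), but the customisation principle, inversion of control through well-defined hooks, is the same.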
J. Harvey : LHCb Computing
Slide 20
Benefits
 Having an architecture and a framework:
 Common vocabulary, better specifications of what needs to be done,
better understanding of the system.
 Low coupling between concurrent developments. Smooth integration.
Organization of the development.
 Robustness, resilient to change (change-tolerant).
 Fostering code re-use
[Diagram: layers - architecture, framework, applications]
J. Harvey : LHCb Computing
Slide 21
What’s the scope?
 Each LHC experiment needs a framework to be used in their event data processing applications:
 - physics/detector simulation
 - high level triggers
 - reconstruction
 - analysis
 - event display
 - data quality monitoring, ...
 The experiment framework will incorporate other
frameworks: persistency, detector description, event
simulation, visualization, GUI, etc.
J. Harvey : LHCb Computing
Slide 22
Software Structure
- Applications (high level triggers, reconstruction, simulation, analysis): built on top of the frameworks and implementing the required physics algorithms
- Frameworks: one main framework plus various specialized frameworks (visualization, persistency, interactivity, simulation, etc.)
- Toolkits
- Foundation libraries: a series of basic libraries widely used (STL, CLHEP, etc.)
J. Harvey : LHCb Computing
Slide 23
GAUDI Object Diagram
[Object diagram]
- Application Manager, with Message Service and JobOptions Service
- Event Data Service managing the Transient Event Store, fed via the Event Selector and a Persistency Service with Converters reading data files
- Detector Data Service managing the Transient Detector Store, with its own Persistency Service and data files
- Histogram Service managing the Transient Histogram Store, with its own Persistency Service and data files
- Algorithms, which read and write objects in the transient stores
- Particle Property Service and other services
J. Harvey : LHCb Computing
Slide 24
GAUDI Architecture: Design Criteria
 Clear separation between data and algorithms
 Three basic types of data: event, detector, statistics
 Clear separation between persistent and transient data
 Computation-centric architectural style
 User code encapsulated in few specific places:
algorithms and converters
 All components with well defined interfaces and as
generic as possible
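A schematic C++ sketch of how these criteria look from the user-code side (names such as SelectHighPtTracks, MemoryEventDataSvc and the store paths are invented for illustration; this is not the exact GAUDI API): the algorithm only reads and writes objects in a transient store through a narrow service interface and never touches persistency directly.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Illustrative data object living in the transient event store.
struct Track { double px, py, pz; };
using Tracks = std::vector<Track>;

// Narrow service interface to the transient event store (data/algorithm separation).
class IEventDataSvc {
public:
  virtual ~IEventDataSvc() = default;
  virtual const Tracks* retrieve(const std::string& path) const = 0;
  virtual void registerObject(const std::string& path, std::unique_ptr<Tracks> obj) = 0;
};

// Toy in-memory store standing in for the transient event store + persistency service.
class MemoryEventDataSvc : public IEventDataSvc {
public:
  const Tracks* retrieve(const std::string& path) const override {
    auto it = store_.find(path);
    return it == store_.end() ? nullptr : it->second.get();
  }
  void registerObject(const std::string& path, std::unique_ptr<Tracks> obj) override {
    store_[path] = std::move(obj);
  }
private:
  std::map<std::string, std::unique_ptr<Tracks>> store_;
};

// User code is confined to an algorithm; it only talks to the store via the interface.
class SelectHighPtTracks {
public:
  explicit SelectHighPtTracks(IEventDataSvc& svc) : evtSvc_(svc) {}
  bool execute() {
    const Tracks* in = evtSvc_.retrieve("/Event/Rec/Tracks");
    if (!in) return false;
    auto out = std::make_unique<Tracks>();
    for (const auto& t : *in)
      if (t.px * t.px + t.py * t.py > 1.0)          // toy transverse-momentum cut
        out->push_back(t);
    evtSvc_.registerObject("/Event/Phys/HighPtTracks", std::move(out));
    return true;
  }
private:
  IEventDataSvc& evtSvc_;
};

int main() {
  MemoryEventDataSvc evtSvc;
  evtSvc.registerObject("/Event/Rec/Tracks",
                        std::make_unique<Tracks>(Tracks{{0.5, 0.2, 1.0}, {2.0, 1.5, 3.0}}));
  SelectHighPtTracks alg(evtSvc);
  alg.execute();
  std::cout << "selected " << evtSvc.retrieve("/Event/Phys/HighPtTracks")->size()
            << " high-pT track(s)\n";
}
```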
J. Harvey : LHCb Computing
Slide 25
Status
 Sept 98 – project started, GAUDI team assembled
 Nov 25 '98 - 1-day architecture review
 goals, architecture design document, URD, scenarios
 chair, recorder, architect, external reviewers
 Feb 8 ’99 - GAUDI first release (v1)
 first software week with presentations and tutorial sessions
 plan for second release
 expand GAUDI team to cover new domains (e.g. analysis toolkits,
visualisation)
 Nov ’00 – GAUDI v6
 Nov 00 – BRUNEL v1
 New reconstruction program based on GAUDI
 Supports C++ algorithms (tracking) and wrapped FORTRAN
 FORTRAN gradually being replaced
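A minimal sketch of what "wrapped FORTRAN" means in practice (the routine name TRFIT and its signature are hypothetical, not real SICb code): the existing Fortran routine is called from C++ through an extern "C" declaration, so it can run inside the C++ framework until a native replacement exists.

```cpp
#include <iostream>
#include <vector>

// Hypothetical Fortran routine (illustrative only):
//   SUBROUTINE TRFIT(NHITS, HITS, CHI2)   ! fit a track from NHITS (x,y,z) hits
// Fortran passes arguments by reference and (with g77/gfortran) appends '_' to the symbol.
extern "C" void trfit_(int* nhits, double* hits, double* chi2);

// C++ wrapper: presents a C++ interface and hides the Fortran calling convention,
// so a Gaudi-style algorithm can call it like any other C++ function.
double fitTrack(const std::vector<double>& xyzHits) {
  int nhits = static_cast<int>(xyzHits.size() / 3);
  std::vector<double> hits(xyzHits);        // mutable, contiguous array for Fortran
  double chi2 = 0.0;
  trfit_(&nhits, hits.data(), &chi2);
  return chi2;
}

// Stand-in body so this sketch links on its own; in reality the symbol comes
// from the compiled Fortran object file.
extern "C" void trfit_(int* nhits, double* /*hits*/, double* chi2) { *chi2 = 1.5 * *nhits; }

int main() {
  std::vector<double> hits = {0, 0, 0,  1, 1, 1,  2, 2, 2};
  std::cout << "chi2 = " << fitTrack(hits) << "\n";   // prints chi2 = 4.5
}
```

The Fortran body can later be replaced by a C++ implementation without changing any of the callers.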
J. Harvey : LHCb Computing
Slide 26
Collaboration with ATLAS
 Now ATLAS also contributing to the development of GAUDI
 Open-Source style, expt independent web and release area,
 Other experiments are also using GAUDI
 HARP, GLAST, OPERA
 Since we cannot provide all the functionality ourselves, we rely
on contributions from others
 Examples: Scripting interface, data dictionaries, interactive analysis, etc.
 Encouragement to put more quality into the product
 Better testing in different environments (platforms, domains,..)
 Shared long-term maintenance
 Gaudi developers mailing list
 tilde-majordom.home.cern.ch/~majordom/news/gaudi-developers/index.html
J. Harvey : LHCb Computing
Slide 27
Data Acquisition System
Trigger/DAQ Architecture
[Diagram: Trigger/DAQ architecture]
- LHC-B detector (VDET, TRACK, ECAL, HCAL, MUON, RICH): 40 MHz bunch crossing rate, 40 TB/s of data
- Level-0 trigger: fixed latency 4.0 µs, accept rate 1 MHz
- Level-1 trigger: variable latency < 1 ms, accept rate 40 kHz; front-end electronics and Front-End Multiplexers (FEM) deliver 1 TB/s onto the front-end links
- Timing & Fast Control distributes the L0/L1 decisions; a throttle signal protects the buffers
- Read-out Units (RU) feed the Read-out Network (RN) with 6 GB/s event-building bandwidth
- Sub-Farm Controllers (SFC) and the CPU farm run trigger Levels 2 & 3 (event filter): variable latency, L2 ~10 ms, L3 ~200 ms
- Accepted events go to storage at 50 MB/s; Control & Monitoring supervises the whole chain
J. Harvey : LHCb Computing
Slide 29
Event Building Network
 Requirements
 - 6 GB/s sustained bandwidth
 - Scalable
 - ~120 inputs (RUs)
 - ~120 outputs (SFCs)
 - Commercial and affordable (COTS, commodity?)
[Diagram: candidate event-building network of four Foundry BigIron 15000 switches, 60x1GbE ports on each edge pair, interconnected by 12x10GbE links]
 Readout Protocol
 Pure push-through protocol of complete events to one CPU of the farm
 Destination assignment following identical algorithm in all RUs (belonging to
one partition) based on event number
 Simple hardware and software
 No central control → perfect scalability
 Full flexibility for high-level trigger algorithms
 Larger bandwidth needed (+~50%) compared with phased event-building
 Avoiding buffer overflows via ‘throttle’ to trigger
 Only static load balancing between RUs and SFCs
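The "no central control" point rests on every RU being able to compute the destination on its own. A small sketch of such a static assignment rule (assumed for illustration; the actual LHCb assignment function and partition handling are not specified here): each RU in a partition applies the same function of the event number, so all fragments of one event converge on the same SFC without any negotiation.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Static destination assignment: identical in every RU of a partition.
// All RUs see the same event number (from the TFC system), so all fragments
// of event N are pushed to the same Sub-Farm Controller.
struct Partition {
  std::vector<int> sfcIds;   // SFCs belonging to this partition
  int destination(std::uint64_t eventNumber) const {
    return sfcIds[eventNumber % sfcIds.size()];   // simple round-robin by event number
  }
};

int main() {
  Partition p{{0, 1, 2, 3}};                      // a toy partition with 4 SFCs
  for (std::uint64_t evt = 100; evt < 106; ++evt)
    std::cout << "event " << evt << " -> SFC " << p.destination(evt) << "\n";
  // Every RU running this code independently computes the same SFC for a given
  // event, giving pure push-through event building with only static load balancing.
}
```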
J. Harvey : LHCb Computing
Slide 30
Readout Unit using Network Processors
[Diagram: DAQ Readout Unit built from IBM NP4GS3 network processors]
- Four FEM input links enter via GbE Phy/GMII interfaces
- Two IBM NP4GS3 network processors, each with external memory, connected by the switch bus and to PCI
- ECS interface via a credit-card PC (CC-PC) on Ethernet
- GbE output towards the Readout Network (RN)
IBM NP4GS3:
- 4 x 1 Gb full-duplex Ethernet MACs
- 16 RISC processors @ 133 MHz
- Up to 64 MB external RAM
- Used in routers
RU functions: event building and formatting, ~7.5 µs/event, ~200 kHz event rate
J. Harvey : LHCb Computing
Slide 31
Sub Farm Controller (SFC)
Smart NIC: Alteon Tigon 2
- Dual R4000-class processor running at 88 MHz
- Up to 2 MB memory
- GigE MAC + link-level interface
- PCI interface
- ~90 kHz event fragments/s
Development environment
- GNU C cross compiler with a few special features to support the hardware
- Source-level remote debugger
[Diagram: the SFC is a 'standard' PC; a smart NIC on the Readout Network (GbE, ~50 MB/s) and a control NIC on the Controls Network (FEth, ~0.5 MB/s) connect via the PCI bridge to memory and CPU, with a further NIC onto the Subfarm Network (GbE)]
J. Harvey : LHCb Computing
Slide 32
Control Interface to Electronics
 Select a reduced number of solutions to
interface Front-end electronics to LHCb’s
control system:
 No radiation (counting room):
Ethernet to credit card PC on modules
 Low level radiation (cavern):
10Mbits/s custom serial LVDS twisted pair
SEU immune antifuse based FPGA interface chip
 High level radiation (inside detectors):
CCU control system made for CMS tracker
Radiation hard, SEU immune, bypass
 Provide support (HW and SW) for the
integration of the selected solutions
[Diagram: a credit-card PC on Ethernet acts as master, driving JTAG, I2C and parallel interfaces on the module; in radiation areas a serial master/slave link provides the same JTAG, I2C and parallel interfaces]
J. Harvey : LHCb Computing
Slide 33
Experiment Control System
Control and Monitoring
[Diagram: the same Trigger/DAQ architecture as Slide 29, here highlighting the Control & Monitoring system, which spans the full chain from front-end electronics to storage]
J. Harvey : LHCb Computing
Slide 35
Experimental Control System
 The Experiment Control System will be used to control and
monitor the operational state of the detector, of the data
acquisition and of the experimental infrastructure.
 Detector controls
 - High and low voltages
 - Crates
 - Cooling and ventilation
 - Gas systems etc.
 - Alarm generation and handling
 DAQ controls
 RUN control
 Setup and configuration of all readout components
(FE, Trigger, DAQ, CPU Farm, Trigger algorithms,...)
J. Harvey : LHCb Computing
Slide 36
System Requirements
 Common control services across the experiment
 - System configuration services – coherent information in a database
 - Distributed information system – control data archival and retrieval
 - Error reporting and alarm handling
 - Data presentation – status displays, trending tools etc.
 - Expert system to assist the shift crew
 Objectives
 Easy to operate – a shift crew of 2-3 people to run the complete experiment
 Easy to adapt to new conditions and requirements
 Implies integration of DCS with the control of DAQ and data
quality monitoring
J. Harvey : LHCb Computing
Slide 37
Integrated System – trending charts
[Screenshots: trending charts from the DAQ and Slow Control systems]
J. Harvey : LHCb Computing
Slide 38
Integrated system – error logger
[Screenshot: the ALEPH error logger (ERRORS + MONITOR + ALARM) showing DAQ and Slow Control messages, e.g. a missing source, a trigger protocol error (TMO_Wait_No_Busy) and a VME crate fault]
J. Harvey : LHCb Computing
Slide 39
Scale of the LHCb Control system
 Parameters
 - Detector control: O(10^5) parameters
 - FE electronics: a few parameters x 10^6 readout channels
 - Trigger & DAQ: O(10^3) DAQ objects x O(10^2) parameters
 - Implies a high-level description of control components (devices/channels)
 Infrastructure
 100-200 Control PCs
 Several hundred credit-card PCs.
 By itself a sizeable network (ethernet)
J. Harvey : LHCb Computing
Slide 40
LHCb Controls Architecture
[Diagram: three-layer controls architecture]
- Supervision layer: SCADA servers and user stations on a LAN/WAN; storage for configuration DB, archives and log files; links to other systems (LHC, safety, ...)
- Process management layer: controllers/PLCs and VME, communicating via OPC over the LAN
- Field management layer: fieldbuses and devices attached to the experimental equipment
J. Harvey : LHCb Computing
Slide 41
Supervisory Control And Data Acquisition
 Used virtually everywhere in industry including very large and
mission critical applications
 Toolkit including:
 Development environment
 Set of basic SCADA functionality (e.g. HMI, Trending, Alarm
Handling, Access Control, Logging/Archiving, Scripting, etc.)
 Networking/redundancy management facilities for distributed
applications
 Flexible & open architecture
 - Multiple communication protocols supported
 - Support for major Programmable Logic Controllers (PLCs), but not VME
 - Powerful Application Programming Interface (API)
 - Open Database Connectivity (ODBC)
 - OLE for Process Control (OPC)
J. Harvey : LHCb Computing
Slide 42
Benefits/Drawbacks of SCADA
 Benefits
 - Standard framework => homogeneous system
 - Support for large distributed systems
 - Buffering against changes in technology, operating systems, platforms, etc.
 - Saving of development effort (50-100 man-years)
 - Stability and maturity – available immediately
 - Support and maintenance, including documentation and training
 - Reduction of work for the end users
 Drawbacks
 - Not tailored exactly to the end application
 - Risk of the company going out of business
 - Company may develop unwanted features
 - Licences have to be paid for
J. Harvey : LHCb Computing
Slide 43
Commercial SCADA system chosen
 Major evaluation effort
 technology survey looked at ~150 products
 PVSS II chosen from an Austrian company (ETM)
 Device oriented, Linux and NT support
 The contract foresees:
 Unlimited usage by members of all institutes participating in LHC
experiments
 10 years maintenance commitment
 Training provided by company - to be paid by institutes
 Licenses available from CERN from October 2000
 PVSS II will be the basis for the development of the control
systems for all four LHC experiments (Joint Controls Project, JCOP)
J. Harvey : LHCb Computing
Slide 44
Controls Framework
 LHCb aims to distribute a framework together with the SCADA system, to:
 Reduce to a minimum the work to be performed by the sub-detector
teams
 Ensure work can be easily integrated despite being performed in
multiple locations
 Ensure a consistent and homogeneous DCS
 Engineering tasks for the framework (see the FSM sketch below):
 - Definition of the system architecture (distribution of functionality)
 - Modelling of standard device behaviour
 - Development of configuration tools
 - Templates and symbol libraries, e.g. power supply, rack, etc.
 - Support for system partitioning (uses FSM)
 - Guidelines on the use of colours, fonts, page layout, naming, ...
 - Guidelines for alarm priority levels, access control levels, etc.
 First prototype released end 2000
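As referenced in the task list above, a sketch of what modelling "standard device behaviour" could look like as a finite state machine (a generic illustration, not the actual PVSS/framework implementation): a power-supply-like device with a few states and commands, which the framework can instantiate per channel and hook into the partitioning tree.

```cpp
#include <iostream>
#include <map>
#include <utility>

// Generic FSM for a 'standard device' (e.g. a power supply channel) in the controls framework.
enum class State   { Off, RampingUp, On, Error };
enum class Command { SwitchOn, RampDone, SwitchOff, Fault, Reset };

class DeviceFSM {
public:
  State state() const { return state_; }

  bool handle(Command c) {
    // Allowed transitions: (current state, command) -> next state.
    static const std::map<std::pair<State, Command>, State> table = {
      {{State::Off,       Command::SwitchOn},  State::RampingUp},
      {{State::RampingUp, Command::RampDone},  State::On},
      {{State::RampingUp, Command::Fault},     State::Error},
      {{State::On,        Command::SwitchOff}, State::Off},
      {{State::On,        Command::Fault},     State::Error},
      {{State::Error,     Command::Reset},     State::Off},
    };
    auto it = table.find({state_, c});
    if (it == table.end()) return false;   // command not allowed in this state
    state_ = it->second;
    return true;
  }

private:
  State state_ = State::Off;
};

int main() {
  DeviceFSM hvChannel;
  hvChannel.handle(Command::SwitchOn);
  hvChannel.handle(Command::RampDone);
  std::cout << "device is "
            << (hvChannel.state() == State::On ? "ON" : "not ON") << "\n";
}
```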
J. Harvey : LHCb Computing
Slide 45
Application Architecture
[Diagram: hierarchical control tree - ECS at the top; beneath it the DCS branch (Vertex, Tracker, Muon, ... each with HV, Temp, GAS, ...) and the DAQ branch (Vertex, Tracker, Muon, ... each with FE and RU), plus links to LHC and SAFETY]
J. Harvey : LHCb Computing
Slide 46
Run Control
J. Harvey : LHCb Computing
Slide 47
Summary
 Organisation has important consequences for cohesion,
maintainability, manpower needed to build system
 Architecture driven development maximises common
infrastructure and results in systems more resilient to change
 Software frameworks maximise the level of reuse and simplify
distributed development by many application builders
 Use of industrial components (hardware and software) can
reduce development effort significantly
 DAQ is designed with simplicity and maintainability in mind
 Maintain a unified approach – e.g. same basic infrastructure
for detector controls and DAQ controls
J. Harvey : LHCb Computing
Slide 48
Extra Slides
J. Harvey : LHCb Computing
Slide 50
Typical Interesting Event
J. Harvey : LHCb Computing
Slide 51
J. Harvey : LHCb Computing
Slide 52
LHCb Collaboration
France:       Clermont-Ferrand, CPPM Marseille, LAL Orsay
Germany:      Tech. Univ. Dresden, KIP Univ. Heidelberg, Phys. Inst. Univ. Heidelberg, MPI Heidelberg
Italy:        Bologna, Cagliari, Ferrara, Firenze, Frascati, Genova, Milano, Univ. Roma I (La Sapienza), Univ. Roma II (Tor Vergata)
Netherlands:  NIKHEF
Poland:       Cracow Inst. Nucl. Phys., Warsaw Univ.
Spain:        Univ. Barcelona, Univ. Santiago de Compostela
Switzerland:  Univ. Lausanne, Univ. Zürich
UK:           Univ. Bristol, Univ. Cambridge, Univ. Edinburgh, Univ. Glasgow, IC London, Univ. Liverpool, Univ. Oxford, RAL
CERN
Brazil:       UFRJ
China:        IHEP (Beijing), Tsinghua Univ. (Beijing)
Romania:      IFIN-HH Bucharest
Russia:       BINR (Novosibirsk), INR, ITEP, Lebedev Inst., IHEP, PNPI (Gatchina)
Ukraine:      Inst. Phys. Tech. (Kharkov), Inst. Nucl. Research (Kiev)
J. Harvey : LHCb Computing
Slide 53
Requirements on Data Rates and
Computing Capacities
LHCb Technical Design Reports
Three TDRs:
- Submitted January 2000, recommended by LHCC March 2000, approved by RB April 2000
- Submitted September 2000, recommended November 2000
- Submitted September 2000, recommended November 2000
J. Harvey : LHCb Computing
Slide 55
Defining the architecture
 Issues to take into account:
 - Object persistency
 - User interaction
 - Data visualization
 - Computation
 - Scheduling
 - Run-time type information
 - Plug-and-play facilities
 - Networking
 - Security
J. Harvey : LHCb Computing
Slide 56
Architectural Styles
 General categorization of systems [2]
- user-centric: focus on the direct visualization and manipulation of the objects that define a certain domain
- data-centric: focus on preserving the integrity of the persistent objects in the system
- computation-centric: focus on the transformation of the objects that are of interest to the system
Our applications have elements of all three. Which one dominates?
J. Harvey : LHCb Computing
Slide 57
Getting Started
 First crucial step was to appoint an architect - ideal skills:
 OO mentor, domain specialist, leadership, visionary
 Started with small design team ~ 6 people, including :
 developers, librarian, use case analyst
 Control activities through visibility and self discipline
 meet regularly - in the beginning every day, now once per week
 Collect URs and scenarios, use to validate the design
 Establish the basic design criteria for the overall architecture
 architectural style, flow of control, specification of interfaces
J. Harvey : LHCb Computing
Slide 58
Development Process
 Incremental approach to development
 new release every few (~ 4) months
 software workshop timed to coincide with new release
 Development cycle is user-driven
 Users define priority of what goes in the next release
 Ideally they use what is produced and give rapid feedback
 Frameworks must do a lot and be easy to use
 Strategic decisions taken following a thorough review (~1/year)
 Releases accompanied by complete documentation
 presentations, tutorials
 URD, reference documents, user guides, examples
J. Harvey : LHCb Computing
Slide 59
Possible migration strategies
[Diagram: possible migration paths (1, 2, 3) from the Fortran-based SICb program to the C++ Gaudi framework: fast translation of the Fortran into C++; wrapping the Fortran; and a gradual migration through a framework development phase, a transition phase, a hybrid phase and a consolidation phase]
J. Harvey : LHCb Computing
Slide 60
How to proceed?
 Physics Goal:
 To be able to run new tracking pattern recognition algorithms written
in C++ in production with standard FORTRAN algorithms in time to
produce useful results for the RICH TDR.
 Software Goal
 To allow software developers to become familiar with GAUDI and to
encourage the development of new software algorithms in C++.
 Approach
 choose strategy 3
 start with migration of reconstruction and analysis code
 simulation will follow later
J. Harvey : LHCb Computing
Slide 61
New Reconstruction Program - BRUNEL
 Benefits of the approach
 A unified development and production environment
 As soon as C++ algorithms are proven to do the right thing, they can be
brought into production in the official reconstruction program
 Early exposure of all developers to Gaudi framework
 Increasing functionality of OO ‘DST’
 As more and more of the event data become available in Gaudi, it will
become more and more attractive to perform analysis with Gaudi
 A smooth transition to a C++ only reconstruction
J. Harvey : LHCb Computing
Slide 62
Integrated System - databases
[Diagram: linked configuration databases]
- Slow Control Database: SCDetector, SCCrate, SCDevType, SCDevice, SCChannel
- Readout System Database: VMECrate, VMEModule, ModuleType, VICCable, VSBCable
- The detector description ties the two together, so that e.g. "the power supply on that VME crate" can be resolved across both databases
J. Harvey : LHCb Computing
Slide 63
Frontend Electronics
 Data buffering for Level-0 latency
 Data buffering for Level-1 latency
 Digitization and zero suppression
 Front-end multiplexing onto front-end links
 Push of data to the next higher stage of the readout (DAQ)
J. Harvey : LHCb Computing
Slide 64
Timing and Fast Control
Functions:
- Provide a common and synchronous clock to all components needing it (LHC clock, BC and BCR fanout)
- Provide the Level-0 and Level-1 trigger decisions
- Provide commands synchronous in all components (resets)
- Provide trigger hold-off capabilities in case buffers are getting full (throttle)
- Provide support for partitioning (switches, ORs)
[Diagram: TFC system - L0 and L1 trigger decisions (optionally from a local trigger) reach the Readout Supervisors through the TFC switch; L0 and L1 throttle switches and throttle ORs collect back-pressure from the front-ends; TTCtx transmitters and optical couplers distribute clock and trigger signals via the TTC system to TTCrx receivers in the front-end electronics (FE chips, L1 buffers, ADCs, DSPs) and on towards the DAQ]
J. Harvey : LHCb Computing
Slide 65
IBM NP4GS3
 Features
 4 x 1Gb full duplex Ethernet
MACs
 16 special purpose RISC
processors @ 133 MHz with 2
hw threads each
 4 processors (8 threads) share
3 co-processors for special
functions
 Tree search
 Memory move
 Etc.
 Integrated 133 MHz Power PC
processor
 Up-to 64 MB external RAM
J. Harvey : LHCb Computing
Slide 66
Event Building Network Simulation
 Simulated technology: Myrinet
 Nominal 1.28 Gb/s
 Xon/Xoff flow control
 Switches:
 - ideal cross-bar
 - 8x8 maximum size (currently)
 - wormhole routing
 - source routing
 - no buffering inside switches
[Diagram: simulation model - a trigger signal drives RU data generators; fragments pass through buffers and NICs (Lanai) into a composite switching network, with a throttle back to the trigger; on the far side NICs and buffers feed the fragment assemblers in the SFCs]
 Software used: Ptolemy discrete event
framework
 Realistic traffic patterns
 variable event sizes
 event building traffic
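For orientation, a tiny generic sketch of the discrete-event idea behind such a model (an illustration, not the actual Ptolemy model; all names are invented): timestamped events sit in a time-ordered queue, and each handler may schedule further events, e.g. a fragment arriving at a switch port being forwarded after a link-dependent delay.

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

// A scheduled event: a timestamp plus the action to perform at that time.
struct Event {
  double time;
  std::function<void()> action;
  bool operator>(const Event& o) const { return time > o.time; }
};

// Minimal discrete-event engine: pop the earliest event, run it, repeat.
class Simulator {
public:
  void schedule(double t, std::function<void()> a) { queue_.push({t, std::move(a)}); }
  void run() {
    while (!queue_.empty()) {
      Event e = queue_.top(); queue_.pop();
      now_ = e.time;
      e.action();                 // the action may schedule further events
    }
  }
  double now() const { return now_; }
private:
  std::priority_queue<Event, std::vector<Event>, std::greater<Event>> queue_;
  double now_ = 0.0;
};

int main() {
  Simulator sim;
  const double linkDelayUs = 1.2;           // toy serialisation delay per fragment

  // Two RUs emit a fragment of the same event at slightly different times;
  // each arrival schedules the forwarding of the fragment towards the SFC.
  for (int ru = 0; ru < 2; ++ru) {
    double emitTime = 0.5 * ru;
    sim.schedule(emitTime, [&sim, ru, linkDelayUs] {
      std::printf("t=%.2f us: RU%d fragment enters switch\n", sim.now(), ru);
      sim.schedule(sim.now() + linkDelayUs, [&sim, ru] {
        std::printf("t=%.2f us: RU%d fragment delivered to SFC\n", sim.now(), ru);
      });
    });
  }
  sim.run();
}
```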
J. Harvey : LHCb Computing
Slide 67
Event Building Activities
 Studied Myrinet
 - Tested NIC event-building
 - Simulated a switching fabric of the size suitable for LHCb
 - Results show that the switching network could be implemented (provided buffers are added between levels of switches)
[Plot: efficiency relative to installed bandwidth vs switch size (8-128) from the Myrinet simulation, with and without 256 kb FIFOs]
 Currently focussing on xGb Ethernet
 - Studying smart NICs (-> Niko's talk)
 - Possible switch configuration for LHCb with ~today's technology (to be simulated...): e.g. four Foundry BigIron 15000 switches with 60x1GbE ports on each edge, interconnected by 12x10GbE links - multiple paths between sources and destinations!
J. Harvey : LHCb Computing
Slide 68
Network Simulation Results
Results do not depend strongly on the specific technology (Myrinet), but rather on its characteristics (flow control, buffering, internal speed, etc.)

Efficiency relative to installed bandwidth:
Switch Size | FIFO Size | Switching Levels | Efficiency
8x8         | NA        | 1                | 52.5%
32x32       | 0         | 2                | 37.3%
32x32       | 256 kB    | 2                | 51.8%
64x64       | 0         | 2                | 38.5%
64x64       | 256 kB    | 2                | 51.4%
96x96       | 0         | 3                | 27.6%
96x96       | 256 kB    | 3                | 50.7%
128x128     | 0         | 3                | 27.5%
128x128     | 256 kB    | 3                | 51.5%

FIFO buffers between the switching levels allow scalability to be recovered
~50% efficiency is a 'law of nature' for these characteristics
J. Harvey : LHCb Computing
Slide 69
Alteon Tigon 2
 Features
 Dual R4000-class processor
running at 88 MHz
 Up to 2 MB memory
 GigE MAC+link-level interface
 PCI interface
 Development environment
 GNU C cross compiler with few
special features to support the
hardware
 Source-level remote debugger
J. Harvey : LHCb Computing
Slide 70
Controls System
Common integrated controls system
 Detector controls
 - High voltage, low voltage
 - Crates
 - Alarm generation and handling
 - etc.
 DAQ controls
 - Run control
 - Setup and configuration of all components (FE, Trigger, DAQ, CPU Farm, Trigger algorithms, ...)
 Consequent and rigorous separation of the controls and DAQ paths
 Same system for both functions!
 By itself a sizeable network: ~100-200 control PCs (most likely Ethernet) and many 100s of credit-card PCs
[Diagram: control PCs (CPC) and PLCs on a LAN connect the readout system (ROC), sub-detectors and experimental equipment to a master with configuration DB, archives and log files, plus storage and a WAN link to other systems (LHC, safety, ...)]
J. Harvey : LHCb Computing
Slide 71