
A 40 MHz Trigger-free Readout Architecture for the LHCb experiment
16th IEEE-NPSS Real Time Conference, 10-15 May 2009, Beijing, China
Federico Alessio, CERN
Zbigniew Guzik, IPJ, Swierk, Poland
Richard Jacobsson, CERN
Future of LHCb
“The LHCb Readout System and Real-Time Event Management”, TDA2-1, Thursday, 8h30
LHCb @ LHC
- Instantaneous luminosity at the IP: tunable from 2x10^32 cm^-2 s^-1 to 5x10^32 cm^-2 s^-1 (a factor 50 less than the nominal LHC luminosity)
- Expected ∫L = 10 fb^-1 collected after 5 years of operation
- Probe/measure New Physics at the 10% level of sensitivity
- Measurements limited by statistics and by the detector itself, NOT BY LHC
LHCb @ S-LHC
- Collect ∫L = 100 fb^-1 → a factor 10 increase in data sample, in a reasonable time
- Probe New Physics down to the percent level
- Increase luminosity by a factor 10 @ LHCb, up to 2x10^33 cm^-2 s^-1
- Assuming the same bunch structure:
  30 MHz effective interaction rate @ S-LHCb vs. 10 MHz @ LHCb
  1 MHz bb-pair rate @ S-LHCb vs. 100 kHz @ LHCb
How to survive?
- Pile-up problem: the current LHCb is not designed for multiple interactions per crossing
  <N> = 0.5 @ 2x10^32 cm^-2 s^-1 and <N> = 4 @ 20x10^32 cm^-2 s^-1
- Higher radiation damage over time
- Spill-over not completely minimized
- First-level trigger limited for hadronic modes at >2x10^32 cm^-2 s^-1 → 25% efficiency vs. 75% for muonic modes
- Increase hadron trigger efficiency by at least a factor 2
- At S-LHCb, 1 MHz bb-pair rate → Trigger-free
How?
- Original LHCb performance as a baseline; new technologies for the subdetectors to be replaced:
  More radiation hard
  Reduced spill-over
  Improved granularity
- Continuous 40 MHz Trigger-free Readout Architecture
- All detector data passed through the readout network
- All detector data available to the High-Level Trigger (HLT)
In practice
LHCb Readout System Upgraded
[Figure: block diagram of the LHCb readout system. Labelled blocks: Detector (VELO, ST, OT, RICH, ECal, HCal, Muon); FE Electronics and Readout Board, one of each per sub-detector; L0 Trigger; Timing & Fast Control System, with labels L0 trigger, LHC clock, Front-End and MEP Request; READOUT NETWORK; SWITCHes and CPUs of the HLT farm and MON farm; Event building. Overlay text: Rethink / Redraw / Adapt / Upgrade / Replace.]
Architectures, Old vs New
[Figure: two block diagrams. Old: LHC CLOCK, L0 TRIGGER, TFC, ECS, FE, ROB and FARM, with EVENT REQUESTS, TFC DATA BANK and DATA paths. New: S-LHC CLOCK, S-TFC, S-ECS, S-FE, S-ROB and S-FARM, with EVENT REQUESTS, TFC DATA BANK, DATA and a combined "TFC, DATA, ECS" path.]
- No L0 trigger
- Point-to-point bidirectional high-speed optical links
- Same technology and protocol type for readout, TFC and throttle
- Number of links to the FE reduced by relaying ECS and TFC information via the ROB
Overall Requirements
- Need to define protocols
  Very likely the FE-ROB readout link and its protocol will be based on the CERN GigaBit Transceiver (GBT)
- Need to define buffer sizes and a truncation scheme compliant with the worst possible scenario (big consecutive events that could overflow memories)
- Need to fully control the phase of the recovered clock at the FE
  The clock phase must be reproducible each time the system is switched off/on
  The jitter of the reconstructed clock must be very small (< 10 ps RMS)
- Need to control the rate in order to allow a “staged” installation
- Partitioning is crucial for parallel stand-alone tests and sub-detector development (test-bench support)
Implications for Front-End
Courtesy Ken Wyllie, LHCb
The S-FE records and transmits data @ 40 MHz via an optical link @ 4.8 Gb/s (3.2 Gb/s data)
Zero suppression must be performed in the rad-hard FE
- Asynchronous data transfer
- Data has to be tagged with identifiers in the header
- Realigned in the Readout Boards
NZS data: event size is 400 kB @ 40 MHz = ~16 TB/s!!
[Figure: S-FE logical scheme, with blocks labelled ADC, ZERO SUPPRESS and DERANDOMIZING BUFFER]
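The NZS figure quoted on this slide can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope illustration: it assumes 1 kB = 1000 bytes and ignores all protocol overhead, and the link count it prints is not a number from the talk. The function names are hypothetical.

```python
# Back-of-envelope check of the non-zero-suppressed (NZS) data rate.
EVENT_SIZE_BYTES = 400_000          # 400 kB per crossing (slide value; 1 kB = 1000 B assumed)
CROSSING_RATE_HZ = 40_000_000       # 40 MHz readout (slide value)
LINK_PAYLOAD_BPS = 3_200_000_000    # 3.2 Gb/s of data per 4.8 Gb/s link (slide value)


def nzs_rate_bytes_per_s(event_size=EVENT_SIZE_BYTES, rate_hz=CROSSING_RATE_HZ):
    """Aggregate data rate in bytes per second."""
    return event_size * rate_hz


def links_needed(bytes_per_s, link_payload_bps=LINK_PAYLOAD_BPS):
    """Minimum number of links, ignoring protocol overhead (ceiling division)."""
    bits_per_s = bytes_per_s * 8
    return -(-bits_per_s // link_payload_bps)


rate = nzs_rate_bytes_per_s()
print(f"NZS rate: {rate / 1e12:.0f} TB/s")
print(f"Links needed without zero suppression: {links_needed(rate)}")
```

Under these assumptions the raw stream would need tens of thousands of links, which illustrates why zero suppression is placed in the front-end.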
Timing and Fast Control (1)
- The readout system requires timing, synchronization and various synchronous and asynchronous commands
[Figure: block diagram of the new architecture: S-LHC CLOCK, S-TFC, S-ECS, S-FE, S-ROB and S-FARM, with EVENT REQUESTS, TFC DATA BANK, DATA and "TFC, DATA, ECS" paths]
- Receive, distribute and align the LHC clock and revolution frequency to the readout electronics
- Transmit synchronous reset commands and calibration sequences, and control the latency of commands
- Back-pressure mechanism from the S-ROBs to handle network congestion
  1. Effectively, throttle the readout rate
  2. Possibly implement an “intelligent” throttle mechanism, capable of distinguishing interesting physics events locally in each S-ROB
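The back-pressure idea can be illustrated with a toy model of one S-ROB buffer. This is a sketch under stated assumptions, not the S-TFC design: the buffer depth, the two watermarks and the hysteresis rule are all hypothetical, and "throttling" is modelled simply as refusing new fragments while the flag is set.

```python
from dataclasses import dataclass


@dataclass
class RobBuffer:
    capacity: int            # buffer depth in fragments (hypothetical)
    high_mark: int           # assert throttle at or above this occupancy
    low_mark: int            # release throttle at or below this occupancy
    occupancy: int = 0
    throttled: bool = False

    def step(self, arriving: int, draining: int) -> int:
        """Advance one tick; return the number of fragments accepted."""
        free = self.capacity - self.occupancy
        accepted = 0 if self.throttled else min(arriving, free)
        self.occupancy += accepted
        self.occupancy -= min(draining, self.occupancy)
        # Hysteresis keeps the throttle flag from toggling on every tick.
        if self.occupancy >= self.high_mark:
            self.throttled = True
        elif self.occupancy <= self.low_mark:
            self.throttled = False
        return accepted


def run(buffer, arrivals, drain_per_tick):
    """Feed a sequence of arrivals; return (accepted total, peak occupancy)."""
    accepted_total = 0
    peak = 0
    for arriving in arrivals:
        accepted_total += buffer.step(arriving, drain_per_tick)
        peak = max(peak, buffer.occupancy)
    return accepted_total, peak


buf = RobBuffer(capacity=16, high_mark=12, low_mark=4)
accepted, peak = run(buf, arrivals=[3] * 20, drain_per_tick=1)
print(f"accepted {accepted} of {3 * 20} fragments, peak occupancy {peak}")
```

With input arriving faster than the network drains it, the throttle caps the occupancy below the buffer depth at the cost of rejected fragments. An “intelligent” throttle would choose which events to drop instead of dropping blindly.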
Timing and Fast Control (2)
[Figure: block diagram of the new architecture: S-LHC CLOCK, S-TFC, S-ECS, S-FE, S-ROB and S-FARM, with EVENT REQUESTS, TFC DATA BANK, DATA and "TFC, DATA, ECS" paths]
Farm has to grow in size, speed and bandwidth
- Destination Control for the event packets, to let the S-ROBs know where to send each event (to which IP address)
- Request Mechanism (EVENT REQUESTS) to let the destination controller in the TFC system know whether a node is available. Such a readout scheme is defined as a “push protocol with a passive pull”
- A data bank has to contain information about the identity of an event and its trigger source. This information is added to each event (TFC DATA BANK)
- The new TFC system (prototype of S-TFC) has to be ready well before the rest of the electronics, to allow development and testing and to validate conformity with the overall specs
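The “push protocol with a passive pull” can be pictured as a credit scheme: farm nodes announce free slots through event requests, and the destination controller hands out one destination per event. The sketch below is a minimal illustration under that assumption. The class, its methods and the node addresses are hypothetical and do not describe the actual S-TFC logic.

```python
from collections import deque


class DestinationController:
    """Toy destination controller driven by EVENT REQUESTS from farm nodes."""

    def __init__(self):
        # One queue entry per free event slot, served first come, first served.
        self.credits = deque()

    def event_request(self, node_address: str, slots: int = 1) -> None:
        """A farm node declares that it can take `slots` more events."""
        for _ in range(slots):
            self.credits.append(node_address)

    def assign(self, event_id: int):
        """Return (event_id, destination), or None if no node is available."""
        if not self.credits:
            return None  # no credit left: the readout would have to be throttled
        return event_id, self.credits.popleft()


controller = DestinationController()
controller.event_request("10.0.0.1", slots=2)   # hypothetical farm node addresses
controller.event_request("10.0.0.2")
for event_id in range(4):
    print(event_id, controller.assign(event_id))
```

The S-ROBs push data as soon as a destination is known, while the farm only pulls passively by declaring availability. That split is what the slide's wording suggests.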
S-TFC Architecture
(i.e. the new s-heartbeat of the LHCb experiment)
[Figure: S-TFC architecture. Recoverable labels:
- Readout chain: Detector (VELO, ST, OT, RICH, ECal, HCal, Muon), FE Electronics, Readout Boards, READOUT NETWORK with SWITCHes, HLT farm and MON farm (CPUs), Event building; S-LHC Timing & Info, LHC clock, TFC System, Front-End, MEP Request.
- S-TFC Master: Master FPGA (STRATIX IV GX) with TFC-Master logic instantiations (x6), Programmable Switch layer (Partitioning), Switch Logic and built-in GX Transceivers layer; Slave FPGA (NIOS II on CYCLONE III); TFC SERVER; Clock Fanout; PHY; TFC+CONF link to S-ECS.
- S-TFC Master to S-TFC Interface: 2.4-3.0 Gb/s optical, ~20 m distance, #links = #LHCb sub-systems.
- S-TFC Interface: Master FPGA (ARRIA II GX) with FAN-OUT/FAN-IN logic plus optional TFC Master logic, TFC ENCODER/DECODER and built-in SERDES layer.
- To S-ROBs: TFC @ 2.4-3.0 Gb/s via GX transceiver and electrical FAN-OUT. From S-ROBs: Throttle @ 1.6 Gb/s via SERDES (#S-ROB/crate).
- S-ROB (crate of e.g. 20): Readout Logic; CERN-GBT links to the S-FE (single slice); DATA+MON to the S-FARM.]
S-TFC Protocols
S-TFC Master ↔ S-TFC Interface link
- TFC control fully synchronous: 60 bits @ 40 MHz → 2.4 Gb/s (max 75 bits @ 40 MHz → 3.0 Gb/s)
  Frame: EVENT ID (4-12 bits) | TFC information (40-32 bits) | Reed-Solomon FEC (16 bits)
  1. Reed-Solomon encoding used on the TFC links for maximum reliability (header ~16 bits) (ref. CERN-GBT)
  2. Asynchronous data → TFC info must carry the Event ID
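The 60-bit TFC word can be sketched as simple bit packing. The field widths below are one choice within the ranges given on the slide (12-bit event ID, 32-bit TFC information, 16-bit FEC). The field order and the function names are assumptions, and the FEC field is carried as an opaque placeholder: no Reed-Solomon code is computed here.

```python
EVENT_ID_BITS = 12   # slide allows 4-12 bits; 12 chosen for illustration
TFC_INFO_BITS = 32   # slide allows 40-32 bits; shrinks as the event ID grows
FEC_BITS = 16        # Reed-Solomon FEC field; treated as opaque here
WORD_BITS = EVENT_ID_BITS + TFC_INFO_BITS + FEC_BITS   # 60 bits per clock cycle


def _check(name, value, bits):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack_tfc_word(event_id, tfc_info, fec=0):
    """Pack the three fields into one integer, event ID in the top bits (assumed order)."""
    _check("event_id", event_id, EVENT_ID_BITS)
    _check("tfc_info", tfc_info, TFC_INFO_BITS)
    _check("fec", fec, FEC_BITS)
    return (event_id << (TFC_INFO_BITS + FEC_BITS)) | (tfc_info << FEC_BITS) | fec


def unpack_tfc_word(word):
    """Inverse of pack_tfc_word: return (event_id, tfc_info, fec)."""
    fec = word & ((1 << FEC_BITS) - 1)
    tfc_info = (word >> FEC_BITS) & ((1 << TFC_INFO_BITS) - 1)
    event_id = word >> (TFC_INFO_BITS + FEC_BITS)
    return event_id, tfc_info, fec


def line_rate_bps(word_bits=WORD_BITS, clock_hz=40_000_000):
    """Payload rate of one word per 40 MHz clock cycle."""
    return word_bits * clock_hz


word = pack_tfc_word(event_id=0xABC, tfc_info=0x12345678)
print(unpack_tfc_word(word))
print(line_rate_bps() / 1e9, "Gb/s")
```

One word per 40 MHz clock cycle reproduces the 2.4 Gb/s figure for 60 bits and 3.0 Gb/s for the 75-bit maximum.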

Throttle (“trigger”) protocol
  Frame: EVENT ID (4-12 bits) | THROTTLE information (20 bits) | OTHERS | Reed-Solomon FEC (16 bits)
  1. Must be synchronous (currently asynchronous)
     → Protocol will require alignment

TFC control protocol incorporated on the link between S-FE and S-ROB (i.e. CERN GBT)
S-TFC Interface ↔ S-ROB
- Copper or backplane technology (in practice, 20 HI-CAT bidirectional links)
- TFC synchronous control protocol same as on the S-TFC Master ↔ S-TFC Interface link
- One GX transmitter with external 20x fan-out (PHYs, electrical)
- Throttle (“trigger”) protocol using 20x SERDES interfaces at <1.6 Gb/s
Reaching the requirements: phase control
Use of commercial electronics:
- Clock fully recovered from data transmission (lock-to-data mode)
- Phase adjusted via a register on the PLL
- Jitter mostly due to transmission over fibres; could be minimized at the sending side
[Figure: S-TFC Master (STRATIX IV GX master FPGA, NIOS II on CYCLONE III slave FPGA), S-TFC Interface (ARRIA II GX master FPGA) and S-ROB block diagram, with the 2.4-3.0 Gb/s optical TFC links, the 1.6 Gb/s SERDES throttle links and the CERN-GBT links]
1. Use commercial or custom-made Word-Aligner output
[Figure: word-aligner scheme between the S-TFC Master FPGA (TX[0], TX[1]) and the S-TFC Interface FPGA (RX[0], RX[1]). Labels: external clock, PLL, word aligner (WA), # of bit-slips, BIT SLIP, clock phase shift (left), BIT X-1.]
2. Scan the phase of clock within “eye diagram”
[Figure: phase scan in the S-ROB FPGA. Labels: WA, clock output, WA bitslip out, phase shift (right), bit positions BIT X-2, BIT X-1, BIT X, BIT X+1.]
Still investigating feasibility and fine precision
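Scanning the clock phase within the eye diagram can be sketched as a search over phase steps for the widest error-free window, followed by picking its centre. This is only an illustration of the general technique: the error counts are made-up input, wrap-around of the phase is ignored, and nothing here reflects the PLL registers or the word-aligner interface of a real device.

```python
def widest_error_free_window(errors):
    """Return (start, length) of the longest run of zero-error phase steps.

    errors[i] is the number of bit errors counted at phase step i.
    Wrap-around between the last and first step is ignored (assumption).
    """
    best_start, best_len = None, 0
    start = None
    # A trailing non-zero sentinel closes a window that reaches the last step.
    for i, count in enumerate(list(errors) + [1]):
        if count == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def best_phase_step(errors):
    """Pick the centre of the widest error-free window as the sampling phase."""
    start, length = widest_error_free_window(errors)
    if start is None:
        raise ValueError("no error-free phase setting found")
    return start + (length - 1) // 2


# Made-up scan result: errors at the eye edges, clean in the middle.
scan = [5, 2, 0, 0, 0, 0, 0, 3, 9, 0, 0, 4]
print("selected phase step:", best_phase_step(scan))
```

Choosing the centre rather than the first clean step leaves equal margin on both sides against jitter. Whether the available phase granularity is fine enough is the open question noted on the slide.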
Simulation
- Full simulation framework to study buffer occupancies, memory sizes, latency, configuration and logical blocks
Summary
New approach towards a 40 MHz trigger-free architecture
- Evaluated the old system and carried over the experience
- Use of point-to-point optical link technology for the entire readout system
- Maximum flexibility reached by using FPGA-based boards
  No need for complex routing, no need for a large number of “different” boards
  GX transceivers as IP cores from Altera
- No First-Level Trigger and no direct link to the FE from the TFC
- Essential to fully control the phase and latency of the clock and TFC info
Validation with prototypes is underway
- TFC System prototypes must be ready before any other
- Developing a full simulation framework
Thanks for your attention
Backup
Intro: Giving out Numbers
From today (2009) to the near future (2013):
- Clock rate of 40 MHz, effective event rate of 10 MHz
- Expected to collect 10 fb^-1, which allows a wide range of analyses with high sensitivity to new physics. Spectacular progress foreseen in heavy flavour physics
- Readout Supervisor based on 4 fully programmable FPGAs (total of ~25k logic elements), customizable (40/80 MHz clock speed and output based on 1 Gbit/s cards)
- Readout network based on optical links (200 MB/s, ~400 links)
- ~16000 CPU cores foreseen for the LHCb Online Farm; ~1000 3 GHz Intel Harpertown quad-cores (~4500 individual cores) at present
- Storage system of > 50 TB at 400/500 MB/s
- Uninterrupted readout of data at 1 MHz, effective reduction of a factor 10 in selecting events. Event size of 3.5x10^4 bytes “dumped” to the Grid
- Full (and 100% reliable) Readout Control System in place (ETM PVSS II), able to start, configure and control ~20000 elements across the FARM, FEE and ROBs
S-TFC Master, specs
Board with one big central FPGA (Altera Stratix IV GX, or alternatively Stratix II GX for R&D)
- Instantiate a set of TFC Master cores to guarantee partitioning control for the subdetectors
  The TFC switch is a programmable patch fabric: a layer in the FPGA
  No need for complex routing, no need for “discrete” electronics
- Shared functionalities between instantiations (fewer logic elements)
- More I/O interfaces based on bidirectional transceivers
  Number depends on the number of S-ROB crates
- No direct links to the FE
- Common server that talks directly to each instantiation:
  TCP/IP server in NIOS II
- Flexibility to implement (and modify) any protocol
  GX transceivers as IP cores from Altera
- Bunch structure (predicted/measured) rate control
- State machines for sequencing resets and calibrations
- Information exchange interface with the LHC
S-TFC Interface, specs
Board with an FPGA entirely devoted to fanning out TFC information and fanning in throttle information
- Controlled clock recovery
- Shared network for (intelligent) throttling and TFC distribution
- All links bidirectional
  1 link to the S-TFC Master, 2.4-3.0 Gb/s, optical
  1 link per S-ROB, 20 max per board (full crate)
- Technology for the S-ROB links could be backplane (e.g. xTCA) or copper HI-CAT
- Protocol flexible: compatible with the flexibility of the S-TFC Master
- We will provide the TFC transceiver block for the S-ROBs’ FPGA to bridge data to the FE through the S-FE ↔ S-ROB readout link
- For stand-alone test benches, the Super-TFC Interface would do the work of a single TFC Master instantiation