LHCb Trigger and Data Acquisition System


Management of the LHCb Online Network Based on SCADA System
Guoming Liu*†, Niko Neufeld†
* University of Ferrara, Italy
† CERN, Geneva, Switzerland
Outline
- Introduction to the LHCb Online system
- LHCb online network
- Network management based on a SCADA system
- Summary
ICALEPCS2009
Guoming Liu
LHCb online system
- LHCb is one of the large particle physics experiments at the LHC at CERN
- The Online system is one of the infrastructures of LHCb, providing IT services for the entire experiment
- Three major components:
  - Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to permanent storage
  - Timing and Fast Control (TFC): provides the fast clock and drives all stages of the data readout of the LHCb detector between the front-end electronics and the online processing farm
  - Experiment Control System (ECS): controls and monitors all parts of the experiment
LHCb online system
[Figure: LHCb online system. Detector sub-systems (VELO, ST, OT, RICH, ECal, HCal, Muon) with their front-end electronics (FEE) and readout boards; the L0 trigger and the LHC clock feed the TFC system; MEP requests, the readout network and switches for event building; CPUs of the HLT farm and the MON farm; CASTOR; the Experiment Control System (ECS). Legend: event data; timing and fast control signals; control and monitoring data.]
LHCb Online Network
- Two dedicated networks:
  - Control network: general-purpose network for the Experiment Control System; connects all the Ethernet devices in LHCb
  - Data network: dedicated to data acquisition; performance-critical
LHCb Online Network
Two geographic parts, surface and underground, connected by two 10 Gb/s links
LHCb Online Network
On the surface:
- Core CTRL routers
- Core DAQ router
- DAQ access switches (~50)
- CTRL access switches (~100)
Network Monitoring System based on SCADA
- Motivation
  - This large network needs sophisticated monitoring
  - The monitoring should be integrated coherently into the LHCb ECS
  - It should provide homogeneous interfaces for the non-expert shift crew
- Commercial network management software?
  - Expensive
  - Integration with the ECS?
Network Monitoring System: Architecture
- Supervisory layer
  - PVSS II: commercial SCADA system
  - JCOP: Joint Controls Project for the LHC experiments
- Front-end processes: SNMP, sFlow, syslog
- Data communication: DIM (Distributed Information Management)
[Diagram labels: supervisory layer (PVSS II / JCOP); DIM; front-end processes; SNMP / sFlow / syslog]
Network Monitoring System: FSM
- All behaviors are modeled as Finite State Machines (FSMs)
- Hierarchical structure: states propagate up and commands propagate down the tree
- Device Units:
  - Device description
  - Device access
  - Based on PVSS II datapoints: alarm handling, archiving, trending, etc.
- Control Units:
  - Abstract behavior modeling
  - Each represents its associated sub-tree
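The hierarchy above can be illustrated with a small Python sketch. It is not the JCOP FSM toolkit used in PVSS II; it only shows the idea under assumptions: device units report a state, control units summarize their sub-tree by taking the most severe child state (an assumed summarization rule), and commands travel down to every child. The state names, the `DeviceUnit`/`ControlUnit` classes and the `RESET_COUNTERS` command are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical state names, ordered from least to most severe.
SEVERITY = {"READY": 0, "WARNING": 1, "ERROR": 2}


@dataclass
class DeviceUnit:
    """Leaf node: wraps one device (e.g. a switch) and reports its state."""
    name: str
    state: str = "READY"
    log: List[str] = field(default_factory=list)

    def summary_state(self) -> str:
        return self.state

    def command(self, cmd: str) -> None:
        # A real device unit would act on the hardware; here we only record it.
        self.log.append(cmd)


@dataclass
class ControlUnit:
    """Inner node: abstracts the behaviour of its whole sub-tree."""
    name: str
    children: List[Union["ControlUnit", DeviceUnit]] = field(default_factory=list)

    def summary_state(self) -> str:
        # States propagate up: the most severe child state wins.
        states = [c.summary_state() for c in self.children]
        return max(states, key=SEVERITY.__getitem__, default="READY")

    def command(self, cmd: str) -> None:
        # Commands propagate down to every child.
        for child in self.children:
            child.command(cmd)


sw1 = DeviceUnit("switch-01")
sw2 = DeviceUnit("switch-02", state="WARNING")
daq = ControlUnit("DAQ_NETWORK", [sw1, sw2])
top = ControlUnit("NETWORK", [daq, ControlUnit("CTRL_NETWORK", [DeviceUnit("switch-03")])])

print(top.summary_state())  # WARNING
top.command("RESET_COUNTERS")
print(sw1.log)              # ['RESET_COUNTERS']
```

Because each control unit only looks at its direct children, a shifter can read the state of the top node and drill down only into the sub-tree that is not in the nominal state.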
Network Monitoring System
Major items being monitored:
- Physical topology
  - Discovery of the network topology based on the Link Layer Discovery Protocol (LLDP)
  - Discovery of the network nodes based on information in the switches (ARP and MAC forwarding tables)
- Traffic
  - Octet/packet counters
  - Discard/error counters
  - ...
- Switch status: CPU/memory, temperature, power supply, ...
- Data paths for the DAQ
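The two discovery steps for the physical topology can be sketched as follows, assuming the LLDP neighbour tables, the ARP table and the MAC forwarding tables have already been read from the devices (e.g. via SNMP) into plain Python structures. The record layouts, device names and port names are hypothetical. The host-location rule (take the port where the MAC is learned that is not an inter-switch link) is one common approach, not necessarily the one used in LHCb.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical LLDP neighbour record: (local switch, local port, remote switch, remote port)
LldpRecord = Tuple[str, str, str, str]
Port = Tuple[str, str]  # (switch, port)


def build_topology(records: List[LldpRecord]) -> Dict[str, Set[str]]:
    """Undirected switch-to-switch adjacency from LLDP neighbour tables."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for local, _lport, remote, _rport in records:
        graph[local].add(remote)
        graph[remote].add(local)
    return dict(graph)


def uplink_ports(records: List[LldpRecord]) -> Set[Port]:
    """Ports on which an LLDP neighbour was seen, i.e. inter-switch links."""
    ports: Set[Port] = set()
    for local, lport, remote, rport in records:
        ports.add((local, lport))
        ports.add((remote, rport))
    return ports


def locate_hosts(arp: Dict[str, str],
                 fdb: Dict[str, List[Port]],
                 uplinks: Set[Port]) -> Dict[str, Port]:
    """Join ARP (IP -> MAC) with forwarding tables (MAC -> ports it was learned on).

    A MAC address is learned on every switch along the path to the host;
    the host's own access port is the one that is not an inter-switch link.
    """
    located: Dict[str, Port] = {}
    for ip, mac in arp.items():
        for port in fdb.get(mac, []):
            if port not in uplinks:
                located[ip] = port
                break
    return located


records = [("core-01", "Te1/1", "acc-01", "Te0/1")]
arp = {"10.0.0.5": "00:11:22:33:44:55"}
fdb = {"00:11:22:33:44:55": [("core-01", "Te1/1"), ("acc-01", "Gi0/7")]}

print(build_topology(records))                        # {'core-01': {'acc-01'}, 'acc-01': {'core-01'}}
print(locate_hosts(arp, fdb, uplink_ports(records)))  # {'10.0.0.5': ('acc-01', 'Gi0/7')}
```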
Network Monitoring Snapshot (1): Topology
Network Monitoring Snapshot (2): Traffic
Summary
- The network management system has been implemented based on the commercial SCADA system PVSS II and the JCOP framework
- It provides sophisticated monitoring of the network aspects that are essential for our operation, e.g. switch status and traffic
- It also provides a homogeneous operation interface and intuitive displays
- Currently only monitoring is provided; some control commands for the switches are still to be integrated
Thanks for your attention!
Backup
NMS Architecture: front-end processes
- SNMP: Simple Network Management Protocol
  - Used for general network monitoring and configuration
- sFlow:
  - A sampling mechanism to capture traffic data
  - Implemented in hardware
  - Two kinds of sFlow samples: flow samples and counter samples
  - Used on the core switch to collect traffic counters, because SNMP polling is too slow and consumes a lot of CPU and memory
- Syslog: event notification messages
  - Three distinct parts: priority, header and message
  - The priority part encodes both the facility and the severity of the message
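Octet counters, whether read by SNMP polling or carried in sFlow counter samples, only become traffic rates after differencing two samples. A minimal sketch, assuming the counter width is known (SNMP Counter32 objects such as `ifInOctets` wrap at 2^32, while the high-capacity `ifHCInOctets` is 64-bit); the function names are hypothetical:

```python
def counter_delta(prev: int, curr: int, bits: int = 32) -> int:
    """Increase of a counter that wraps at 2**bits (assumes at most one wrap)."""
    return (curr - prev) % (1 << bits)


def rate_bps(prev_octets: int, curr_octets: int,
             interval_s: float, bits: int = 32) -> float:
    """Average throughput in bit/s between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev_octets, curr_octets, bits) * 8 / interval_s


# A 32-bit octet counter that wrapped between two samples taken 10 s apart:
print(counter_delta(4_294_967_000, 704))    # 1000
print(rate_bps(4_294_967_000, 704, 10.0))   # 800.0

# Time for a 32-bit octet counter to wrap on a saturated 10 Gb/s link:
print((1 << 32) * 8 / 10e9)                 # about 3.4 s
```

The modulo arithmetic cannot detect more than one wrap between samples. On a saturated 10 Gb/s link a 32-bit octet counter wraps in about 3.4 s, which is why 64-bit counters (or frequent counter samples) matter on fast core links.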
Network Monitoring: hardware/system
- Syslog can collect some information not covered by SNMP
- A syslog server is set up to receive the syslog messages from the network devices and parse them
- Alarm information:
  - Hardware: temperature, fan status, power supply status
  - System: CPU, memory, login authentication, etc.
- All messages with a priority higher than warning are sent to PVSS for further processing
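The filtering step can be sketched from the standard syslog priority encoding, where the leading `<PRI>` value equals facility × 8 + severity and lower severity numbers are more severe (0 = emergency, 4 = warning). This is a minimal sketch, not the LHCb syslog server: it assumes "higher than warning" means strictly more severe than warning, and the function names are hypothetical. The sample message is the classic example from the syslog RFCs.

```python
import re
from typing import Optional, Tuple

# Standard syslog severities: lower number = more severe.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]
WARNING = SEVERITIES.index("warning")  # 4

PRI_RE = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)


def parse_pri(line: str) -> Optional[Tuple[int, int, str]]:
    """Split '<PRI>rest' into (facility, severity, rest); None if malformed."""
    m = PRI_RE.match(line)
    if m is None:
        return None
    pri = int(m.group(1))
    if pri > 191:  # 24 facilities (0-23) x 8 severities -> PRI 0..191
        return None
    facility, severity = divmod(pri, 8)  # PRI = facility * 8 + severity
    return facility, severity, m.group(2)


def should_forward(line: str, threshold: int = WARNING) -> bool:
    """Forward only messages strictly more severe than `threshold`."""
    parsed = parse_pri(line)
    return parsed is not None and parsed[1] < threshold


msg = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(parse_pri(msg)[:2])                      # (4, 2): facility 4 (auth), severity 2 (critical)
print(should_forward(msg))                     # True
print(should_forward("<28>fan tray warning"))  # False: 28 = 3*8 + 4 -> warning
```

Passing `threshold=5` would also forward warnings, if the "higher than warning" rule is meant to be inclusive.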
Network Monitoring: IP routing
- Monitoring the status of the routing using "ping"/"arping"
- Three stages for the DAQ:
  1. From the readout boards to the HLT farm
  2. From the HLT farm to the LHCb online storage
  3. From the online storage to CERN CASTOR
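A per-stage reachability check can be sketched as below. This is an illustration under assumptions: the stages follow the list above, but the target host names are hypothetical placeholders, the slides do not say from which machines the probes are sent, and only ICMP `ping` (with Linux iputils options `-c 1 -W 1`) is shown, not `arping`. The probe is injectable so the logic can be exercised without sending real traffic.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical probe targets for each DAQ stage (real host names are not given).
STAGES: Dict[str, List[str]] = {
    "1: readout boards -> HLT farm": ["hlt-node-01", "hlt-node-02"],
    "2: HLT farm -> online storage": ["storage-01"],
    "3: online storage -> CASTOR": ["castor-gw"],
}


def ping_probe(host: str) -> bool:
    """Send one ICMP echo request with a 1 s timeout (Linux iputils ping)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "1", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_stages(stages: Dict[str, List[str]],
                 probe: Callable[[str], bool] = ping_probe) -> Dict[str, List[str]]:
    """Return the unreachable targets of each stage (empty list = stage OK)."""
    return {stage: [h for h in hosts if not probe(h)]
            for stage, hosts in stages.items()}


# Dry run with a fake probe instead of real ICMP traffic:
down = {"storage-01"}
report = check_stages(STAGES, probe=lambda h: h not in down)
for stage, failed in report.items():
    print(stage, "OK" if not failed else f"unreachable: {failed}")
```

Reporting per stage rather than per host lets the display point the shift crew directly at the broken segment of the data path.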
[Figure: the LHCb online system diagram (detector sub-systems, front-end electronics, readout boards, readout network, HLT and MON farms, CASTOR), annotated with the three DAQ stages 1, 2 and 3. Legend: event data; timing and fast control signals; control and monitoring data.]