SMI++ A Finite State Machine Toolkit

The LHCb Run Control System
An Integrated and Homogeneous Control System
Clara Gaspar, May 2010
The Experiment Control System
❚ Is in charge of the Control and Monitoring of all parts
of the experiment
[Figure: the Experiment Control System spans DCS Devices (HV, LV, GAS, Cooling, etc.), Detector Channels, L0, TFC, Front End Electronics, Readout Network, HLT Farm, Storage, Monitoring Farm, DAQ, and External Systems (LHC, Technical Services, Safety, etc.)]
Some Requirements
❚ Large number of devices/IO channels
➨ Need for Distributed Hierarchical Control
❘ De-composition in Systems, sub-systems, … , Devices
❘ Local decision capabilities in sub-systems
❚ Large number of independent teams and very different
operation modes
➨ Need for Partitioning Capabilities (concurrent usage)
❚ High Complexity & Non-expert Operators
➨ Need for Full Automation of:
❘ Standard Procedures
❘ Error Recovery Procedures
➨ And for Intuitive User Interfaces
Design Steps
❚ In order to achieve an integrated System:
❙ Promoted HW Standardization
(so that common components could be re-used)
❘ Ex.: Mainly two control interfaces to all LHCb electronics
〡Credit Card sized PCs (CCPC) for non-radiation zones
〡A serial protocol (SPECS) for electronics in radiation areas
❙ Defined an Architecture
❘ That could fit all areas and all aspects of the monitoring and
control of the full experiment
❙ Provided a Framework
❘ An integrated collection of guidelines, tools and components
that allowed the development of each sub-system
coherently in view of its integration in the complete system
Generic SW Architecture
[Figure: control tree. Commands flow down from the ECS; Status & Alarms flow up.]
ECS
  INFR.
  DCS
    SubDet1 DCS
      SubDet1 LV (Device Units: LV Dev1, LV Dev2, …, LV DevN)
      SubDet1 TEMP
      SubDet1 GAS
    SubDet2 DCS
    …
    SubDetN DCS
  TFC
  DAQ
    SubDet1 DAQ
      SubDet1 FEE (Device Units: FEE Dev1, FEE Dev2, …, FEE DevN)
      SubDet1 RO
    SubDet2 DAQ
    …
    SubDetN DAQ
  HLT
  LHC
Legend: Control Unit, Device Unit
The Control Framework
❚ The JCOP* Framework is based on:
❙ SCADA System - PVSSII for:
❘ Device Description (Run-time Database)
❘ Device Access (OPC, Profibus, drivers)
❘ Alarm Handling (Generation, Filtering, Masking, etc)
❘ Archiving, Logging, Scripting, Trending
❘ User Interface Builder
❘ Alarm Display, Access Control, etc.
❙ SMI++ providing:
❘ Abstract behavior modeling (Finite State Machines)
❘ Automation & Error Recovery (Rule based system)
* JCOP: the Joint COntrols Project (between the 4 LHC experiments and the CERN Control Group)
Device Units
❚ Provide access to “real” devices:
❙ The Framework provides (among others):
❘ “Plug and play” modules for commonly used
equipment. For example:
〡CAEN or Wiener power supplies (via OPC)
〡LHCb CCPC and SPECS based electronics (via DIM)
❘ A protocol (DIM) for interfacing
“home made” devices. For example:
〡Hardware devices like a calibration source
〡Software devices like the Trigger processes
(based on LHCb’s offline framework – GAUDI)
❘ Each device is modeled as a Finite State Machine
Hierarchical control
❚ Each Control Unit:
❙ Is defined as one or more Finite State Machines
❙ Can implement rules based on its children’s states
❙ In general it is able to:
❘ Summarize information (for the levels above)
❘ “Expand” actions (to the lower levels)
❘ Implement specific behaviour
& Take local decisions
〡Sequence & Automate operations
〡Recover errors
❘ Include/Exclude children (i.e. partitioning)
〡Excluded nodes can run in stand-alone mode
❘ User Interfacing
〡Present information and receive commands
[Figure: example tree. DCS has children Tracker DCS and Muon DCS; Muon DCS has children Muon LV, …, Muon GAS]
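The Control Unit behaviour listed above (summarize upwards, expand commands downwards, include/exclude children) can be sketched in a few lines of Python. This is only an illustration and not the SMI++ or PVSS API: the `Node` class, its fields, and the summary rule (ERROR if any included child is in ERROR, READY only if all are READY) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in the control tree (hypothetical class, not the SMI++ API)."""
    name: str
    state: str = "NOT_READY"
    included: bool = True
    children: List["Node"] = field(default_factory=list)
    received: List[str] = field(default_factory=list)

    def summarize(self) -> str:
        """Derive this node's state from its included children (assumed rule)."""
        active = [c.summarize() for c in self.children if c.included]
        if not active:
            return self.state  # a leaf (device), or nothing is included
        if "ERROR" in active:
            self.state = "ERROR"
        elif all(s == "READY" for s in active):
            self.state = "READY"
        else:
            self.state = "NOT_READY"
        return self.state

    def command(self, action: str) -> None:
        """Record a command and expand it to all included children."""
        self.received.append(action)
        for child in self.children:
            if child.included:
                child.command(action)


muon_lv = Node("Muon LV", state="READY")
muon_gas = Node("Muon GAS", state="ERROR")
muon_dcs = Node("Muon DCS", children=[muon_lv, muon_gas])

print(muon_dcs.summarize())     # ERROR: one included child is in ERROR
muon_gas.included = False       # exclude the faulty child (partitioning)
print(muon_dcs.summarize())     # READY: only Muon LV is still included
muon_dcs.command("Switch_OFF")  # reaches Muon LV but not the excluded Muon GAS
```

Excluding a child removes it from both the state summary and the command fan-out, which is what lets an excluded sub-tree be operated independently.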
Control Unit Run-Time
❚ Dynamically generated operation panels
(Uniform look and feel)
❚ Configurable User Panels and Logos
❚ “Embedded” standard partitioning rules:
❙ Take
❙ Include
❙ Exclude
❙ Etc.
Operation Domains
❚ Three Domains have been defined:
❙ DCS
❘ For equipment whose operation and stability are
normally related to a complete running period.
Example: GAS, Cooling, Low Voltages, etc.
❙ HV
❘ For equipment whose operation is normally related
to the Machine state. Example: High Voltages
❙ DAQ
❘ For equipment whose operation is related to a RUN.
Example: Readout electronics, High Level Trigger
processes, etc.
FSM Templates
❚ DCS Domain
❙ States: OFF, NOT_READY, READY, ERROR
❙ Commands: Switch_ON, Switch_OFF, Recover
❚ HV Domain
❙ States: OFF, STANDBY1, STANDBY2, READY, RAMPING_STANDBY1, RAMPING_STANDBY2, RAMPING_READY, NOT_READY, ERROR
❙ Commands: Go_STANDBY1, Go_STANDBY2, Go_READY, Switch_OFF, Recover
❚ DAQ Domain
❙ States: UNKNOWN, NOT_READY, CONFIGURING, READY, RUNNING, ERROR
❙ Commands: Configure, Start, Stop, Reset, Recover
❚ All Devices and SubSystems have been implemented using one
of these templates
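As an illustration of what such a template amounts to, the DAQ Domain can be written as a transition table. The state and command names below are taken from the template; which command is accepted in which state, the target of each arrow, and the `done()` hook are assumptions for this sketch, since the slide's arrows do not survive in the transcript.

```python
# (state, command) -> next state. State and command names come from the
# DAQ Domain template; the arrows themselves are ASSUMED for illustration.
DAQ_TRANSITIONS = {
    ("NOT_READY", "Configure"): "CONFIGURING",
    ("READY", "Start"): "RUNNING",
    ("RUNNING", "Stop"): "READY",
    ("READY", "Reset"): "NOT_READY",
    ("ERROR", "Recover"): "NOT_READY",
}


class DaqFsm:
    """Minimal table-driven state machine (hypothetical, not SMI++ code)."""

    def __init__(self, state="NOT_READY"):
        self.state = state

    def handle(self, command):
        """Apply a command, rejecting it if not allowed in the current state."""
        key = (self.state, command)
        if key not in DAQ_TRANSITIONS:
            raise ValueError(f"{command} not allowed in state {self.state}")
        self.state = DAQ_TRANSITIONS[key]
        return self.state

    def done(self):
        """Hypothetical hook: the device reports that configuring finished."""
        if self.state == "CONFIGURING":
            self.state = "READY"
        return self.state


fsm = DaqFsm()
trace = [fsm.handle("Configure"), fsm.done(), fsm.handle("Start"), fsm.handle("Stop")]
print(trace)
```

Keeping the behaviour in a table rather than in code is what makes a single template reusable across many devices and sub-systems.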
ECS: Run Control
❚ Size of the Control Tree:
❙ Distributed over ~150 PCs
❘ ~100 Linux
(50 for the HLT)
❘ ~50 Windows
❙ >2000 Control Units
❙ >50000 Device Units
[Figure: control tree with ECS at the root and HV, DCS, TFC, DAQ, HLT and LHC below it; DCS and DAQ each split into SubDet1 … SubDetN; two entries are marked X]
❚ The Run Control can be seen as:
❙ The Root node of the tree
➨ If the tree is partitioned there can be
several Run Controls.
Partitioning
❚ Creating a Partition
❙ Allocate = Get a “slice” of:
❘ Timing & Fast Control (TFC)
❘ High Level Trigger Farm (HLT)
❘ Storage System
❘ Monitoring Farm
❚ ECS Domain
❙ States: NOT_ALLOCATED, ALLOCATING, NOT_READY, CONFIGURING, READY, ACTIVE, RUNNING, ERROR
❙ Commands: Allocate, Deallocate, Recover, Configure, Reset, StartRun, StopRun, StartTrigger, StopTrigger
[Figure: LHCb data-flow diagram showing the Detector (VELO, ST, OT, RICH, ECal, HCal, Muon), L0 Trigger, LHC clock, TFC System, FE Electronics, Readout Boards, Front-End, MEP Request, Readout Network (switches), Event building, HLT farm and MON farm (CPUs), and the Experiment Control System (ECS). Legend: Event data; Timing and Fast Control Signals; Control and Monitoring data]
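The idea of allocating a “slice” of a shared resource to a partition can be sketched as follows. This is a minimal illustration under stated assumptions: the `ResourcePool` class, the sub-farm names and the slice sizes are hypothetical and do not describe how the LHCb ECS actually books TFC, HLT, storage or monitoring resources.

```python
class ResourcePool:
    """Hypothetical pool of shared units, e.g. HLT sub-farms."""

    def __init__(self, name, units):
        self.name = name
        self.free = set(units)
        self.owner = {}  # unit -> name of the partition that holds it

    def allocate(self, partition, count):
        """Give `count` free units to a partition, or fail if too few remain."""
        if count > len(self.free):
            raise RuntimeError(f"not enough free {self.name} units")
        taken = sorted(self.free)[:count]
        for unit in taken:
            self.free.remove(unit)
            self.owner[unit] = partition
        return taken

    def deallocate(self, partition):
        """Return every unit held by a partition to the free set."""
        for unit in [u for u, p in self.owner.items() if p == partition]:
            del self.owner[unit]
            self.free.add(unit)


hlt = ResourcePool("HLT", [f"subfarm{i}" for i in range(4)])
lhcb_slice = hlt.allocate("LHCb", 3)  # a global partition takes three units
velo_slice = hlt.allocate("VELO", 1)  # a stand-alone partition takes the rest
print(lhcb_slice, velo_slice)
hlt.deallocate("VELO")
```

Because each unit has at most one owner, two partitions can run concurrently without sharing any part of the slice.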
Run Control
❚ Matrix: Domain x Sub-Detector
❚ Activity: used for configuring all Sub-Systems
Sub-Detector Run Control
❚ “Scan” Run
LHCb Operations
❚ Two operators on shift:
❙ Data Manager
❙ Shift Leader has 2 views of the System:
❘ Run Control
❘ Big Brother
❚ Big Brother
❙ Manages the LHC <-> LHCb dependencies
❙ SubDetector Voltage x LHC State table
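A SubDetector Voltage x LHC State table can be pictured as a simple lookup that yields the HV commands to send when the LHC state changes. In the sketch below the LHC state names and every table entry are made-up placeholders, and `commands_for` is a hypothetical helper; only the HV state and command names come from the HV Domain template.

```python
# LHC state -> {sub-detector: requested HV state}. All entries are made-up
# placeholders; only the HV state names come from the HV Domain template.
VOLTAGE_TABLE = {
    "INJECTION": {"VELO": "STANDBY1", "Muon": "STANDBY1"},
    "PHYSICS": {"VELO": "READY", "Muon": "READY"},
    "DUMP": {"VELO": "STANDBY1", "Muon": "STANDBY2"},
}

# HV target state -> command that requests it (names from the HV template).
HV_COMMAND = {
    "OFF": "Switch_OFF",
    "STANDBY1": "Go_STANDBY1",
    "STANDBY2": "Go_STANDBY2",
    "READY": "Go_READY",
}


def commands_for(lhc_state, current):
    """Return the HV commands needed to match the table for this LHC state."""
    wanted = VOLTAGE_TABLE[lhc_state]
    return {
        det: HV_COMMAND[target]
        for det, target in wanted.items()
        if current.get(det) != target
    }


cmds = commands_for("PHYSICS", {"VELO": "STANDBY1", "Muon": "READY"})
print(cmds)  # {'VELO': 'Go_READY'}
```

Only sub-detectors whose current HV state differs from the requested one receive a command, so repeated LHC state notifications cause no redundant actions.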
Automation
❚ Automation at several levels:
❙ HLT
❙ LHCb
❙ BigBrother
❙ Autopilot
❚ Always done by the FSM (not by the panels)
Conclusions
❚ LHCb has designed and implemented a coherent and
homogeneous control system
❚ The Run Control:
❙ Can configure, monitor and operate the full experiment
❙ Can run any combination of sub-detectors in parallel in stand-alone mode
❙ Can be completely automated (when we understand the machine)
❚ Some of its main features:
❙ Sequencing, Automation, Error recovery, Partitioning
➨ Come from the usage of SMI++ (integrated with PVSS)
❚ It’s being used daily for Physics data taking and other
global or sub-detector activities