
First attempt at ECS training
Work in progress…
A lot of material was borrowed!
Thanks!
Objectives
Get familiar with routine operation.
Get familiar with routine problem recovery.
Get familiar with the way to work inside a complex,
nearly chaotic, highly distributed
environment: rules must be followed…
Get familiar with the language.
Avoid details.
After the training you need to study
the TWiki documentation…
(and possibly contribute to it…).
Warnings
We are probably leaving aside many important things…
Many things are changing… and some will change a lot…
This tutorial is only meant as a broad overview.
The aim is to learn the basics for SD operation;
not to learn to develop parts of the ECS…
The other aim is to learn common usage and rules.
What is ECS?
LHC Era Control Technologies
Layer Structure
[Diagram: the standard layered control architecture. A Supervision layer (SCADA, FSM) sits on the LAN/WAN, using commercial technologies for storage (configuration DB, archives, log files, etc.) and custom ones (FSM, DIM), and talking to other systems (LHC, Safety, ...). Below it, a Process Management layer hosts controllers/PLCs, VME and PLC/UNICOS nodes, reached via communication protocols such as OPC and DIM. At the bottom, a Field Management layer connects field buses and nodes to the experimental equipment (sensors/devices).]
Based on an original idea from LHCb
P.C. Burkimsher, PVSS & JCOP Framework Course, May 2006
ECS Scope
[Diagram: the Experiment Control System covers everything: the DCS devices (HV, LV, GAS, temperatures, etc.), the detector channels, L0, the TFC, the front-end electronics, the readout network, the high-level trigger and storage (the DAQ), as well as the external systems (LHC, technical services, safety, etc.).]
Clara Gaspar, March 2006
ECS Generic Architecture
[Diagram: a tree of abstract levels. The ECS node sits at the top, interfaced to the external systems (T.S., LHC, GAS, DSS, ...). Its DCS and DAQ branches fan out into detector nodes (DetDcs1 ... DetDcsN, DetDaq1 ...), sub-systems (SubSys1, SubSys2 ... SubSysN) and device nodes (Dev1, Dev2, Dev3 ... DevN), down to the devices themselves (HW or SW). Commands flow down the tree; status & alarms flow up.]
Clara Gaspar, March 2006
Control Units
❚ Each node is able to:
❙ Summarize information (for the above levels)
❙ “Expand” actions (to the lower levels)
❙ Implement specific behaviour & take local decisions
❘ Sequence & automate operations
❘ Recover errors
❙ Include/Exclude children (i.e. partitioning)
❘ Excluded nodes can run in stand-alone
❙ User Interfacing
❘ Present information and receive commands
[Diagram: an example control tree: DCS → Tracker (HV, Temp) and Muon (HV, GAS).]
Clara Gaspar, March 2006
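To make the summarize/expand idea concrete, here is a minimal Python sketch (purely illustrative, with invented names; the real framework does this through SMI++ objects) of a control unit that derives its state from its included children and fans commands out to them:

```python
class Leaf:
    """Toy leaf node standing in for a device unit."""
    def __init__(self, name, state="NOT_READY"):
        self.name, self._state = name, state

    def state(self):
        return self._state

    def send_command(self, cmd):
        # Toy behaviour: "Configure" drives the leaf to READY.
        if cmd == "Configure":
            self._state = "READY"

class ControlUnit:
    """Toy control unit: summarizes children upward, expands commands downward."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.excluded = set()            # excluded children run stand-alone

    def exclude(self, child):
        self.excluded.add(child.name)    # partitioning: stop following this child

    def include(self, child):
        self.excluded.discard(child.name)

    def state(self):
        """Summarize: READY only if every included child is READY."""
        states = {c.state() for c in self.children if c.name not in self.excluded}
        if not states:
            return "EXCLUDED"
        if "ERROR" in states:
            return "ERROR"
        return "READY" if states == {"READY"} else "NOT_READY"

    def send_command(self, cmd):
        """Expand: forward the command to every included child."""
        for c in self.children:
            if c.name not in self.excluded:
                c.send_command(cmd)

tracker = ControlUnit("Tracker", [Leaf("HV"), Leaf("Temp")])
dcs = ControlUnit("DCS", [tracker])
dcs.send_command("Configure")            # expands down the tree
print(dcs.state())                       # READY — summarized back up
```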
Device Units
❚ Device Units
❙ Provide the interface to real devices:
(Electronics Boards, HV channels, trigger algorithms, etc.)
❘ Can be enabled/disabled
❘ In order to integrate a device within the FSM:
〡Deduce a STATE from device readings (in DPs)
〡Implement COMMANDS as device settings
❘ Commands can apply the recipes previously defined
[Diagram: device units Dev1 … DevN at the leaves of the control tree.]
Clara Gaspar, March 2006
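A hedged sketch of the two integration steps named above (deduce a STATE from readings, implement COMMANDS as settings), again in toy Python with invented names and recipe values rather than the real framework API:

```python
# Hypothetical recipe store; in reality recipes come from the Configuration DB.
RECIPES = {"PHYSICS": {"v_target": 900.0}, "TEST": {"v_target": 300.0}}

class FakeHW:
    """Stand-in for a real HV channel that settles instantly."""
    def __init__(self): self.v = 0.0
    def read_voltage(self): return self.v
    def set_voltage(self, v): self.v = v

class HVChannelUnit:
    def __init__(self, hw):
        self.hw = hw
        self.v_target = 0.0
        self.enabled = True              # device units can be enabled/disabled

    def state(self):
        """Deduce a STATE from the device readings."""
        if not self.enabled:
            return "DISABLED"
        v = self.hw.read_voltage()
        if v < 10.0:
            return "OFF"
        if abs(v - self.v_target) < 0.05 * self.v_target:
            return "READY"
        return "RAMPING"

    def command(self, name, recipe=None):
        """Implement COMMANDS as device settings, optionally from a recipe."""
        if recipe is not None:
            self.v_target = RECIPES[recipe]["v_target"]
        if name == "SWITCH_ON":
            self.hw.set_voltage(self.v_target)
        elif name == "SWITCH_OFF":
            self.hw.set_voltage(0.0)

ch = HVChannelUnit(FakeHW())
ch.command("SWITCH_ON", recipe="PHYSICS")
print(ch.state())                        # READY (the fake hardware is instant)
```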
The Control Framework
❚ The FwFSM Component is based on:
❙ PVSS for:
❘ Device Description (Run-time Database)
❘ Device Access (OPC, Profibus, drivers)
❘ Alarm Handling (Generation, Filtering, Masking, etc)
❘ Archiving, Logging, Scripting, Trending
❘ User Interface Builder
❘ Alarm Display, Access Control, etc.
❙ SMI++ providing:
❘ Abstract behavior modeling (Finite State Machines)
❘ Automation & Error Recovery (Rule based system)
Clara Gaspar, March 2006
SMI++ Run-time Environment
❙ Device Level: Proxies
❘ drive the hardware:
〡deduceState
〡handleCommands
❘ C, C++, PVSS ctrl scripts
❙ Abstract Levels: Domains
❘ Implement the logical model
❘ Dedicated language - SML
❘ A C++ engine: smiSM
❙ User Interfaces
❘ For User Interaction
❙ All Tools available on:
❘ Windows, Unix (Linux)
❘ All communications are transparent
and dynamically (re)established
[Diagram: SMI domains — networks of cooperating objects — sit above the proxies, which interface the hardware devices.]
Features of PVSS/SMI++
❚ Error Recovery Mechanism
❙ Bottom Up
❘ SMI Objects react to changes of their children
〡In an event-driven, asynchronous fashion
❙ Distributed
❘ Each Sub-System recovers its errors
〡Each team knows how to recover local errors
❙ Hierarchical/Parallel recovery
❙ Can provide complete automation even for
very large systems
Clara Gaspar, March 2006
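To picture the bottom-up, event-driven reaction, here is a small Python sketch (illustrative only; SMI++ expresses this with "when" rules in SML) in which a parent object is notified of its children's state changes and recovers an ERROR without polling:

```python
class SmiObject:
    """Toy SMI-like object: notifies subscribers on every state change."""
    def __init__(self, name):
        self.name = name
        self.state = "READY"
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set_state(self, state):
        self.state = state
        for cb in self._subscribers:     # event-driven: no polling loop
            cb(self)

class RecoveringParent(SmiObject):
    def watch(self, child):
        child.subscribe(self.on_child_change)

    def on_child_change(self, child):
        # Rule in the spirit of "when any child in_state ERROR do RECOVER".
        if child.state == "ERROR":
            print(f"{self.name}: recovering {child.name}")
            child.set_state("READY")     # stand-in for a real recovery action

hv = SmiObject("HV_Channel_3")
subsys = RecoveringParent("HV_SubSystem")
subsys.watch(hv)
hv.set_state("ERROR")                    # the parent reacts immediately
```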
Sub-detector FSM Guidelines
❚ Started defining naming conventions.
❚ Defined standard “domains” per sub-detector:
❙ DCS
❘ DCS Infrastructure (Cooling, Gas, Temperatures, pressures, etc) that is
normally stable throughout a running period
❙ HV
❘ High Voltages or in general components that depend on the status of the
LHC machine (fill related)
❙ DAQ
❘ All Electronics and components necessary to take data (run related)
❙ DAQI
❘ Infrastructure necessary for the DAQ to work (computers, networks,
electrical power, etc.) in general also stable throughout a running period.
❚ And standard states & transitions per domain.
❚ Doc available in EDMS:
❘ https://edms.cern.ch/document/655828/1
Clara Gaspar, March 2006
Hierarchy & Conf. DB
[Diagram: the ECS tree (ECS → LHC, Infrast., DCS, HV, DAQI, DAQ, L0, TFC, HLT; the DCS/HV/DAQI/DAQ domains fan out per sub-detector, e.g. VELO DCS, MUON DCS, VELO HV, MUON HV, ..., down to VELO_DCS_1, VELO_DCS_2, VELO_DAQ_1, VELO_DAQ_2 and devices VELO Dev1 ... DevN), with the Configuration DB alongside. The numbered arrows show the configuration sequence:
1. Configure/mode=“PHYSICS” is sent down the tree;
2. the nodes get the “PHYSICS” settings from the Conf. DB;
3. the settings are applied to the devices.]
Clara Gaspar, March 2006
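A minimal Python sketch of that three-step sequence (node and setting names are invented; the real settings live in the Configuration DB):

```python
# Stand-in for the Configuration DB: (node, mode) -> settings.
CONF_DB = {
    ("VELO_DAQ_1", "PHYSICS"): {"threshold": 12, "zero_suppress": True},
    ("VELO_DAQ_1", "TEST"):    {"threshold": 0,  "zero_suppress": False},
}

class DaqNode:
    def __init__(self, name):
        self.name = name
        self.settings = {}

    def configure(self, mode):
        recipe = CONF_DB[(self.name, mode)]  # step 2: get the "PHYSICS" settings
        self.apply(recipe)                   # step 3: apply the settings

    def apply(self, recipe):
        self.settings.update(recipe)         # here the hardware would be programmed

node = DaqNode("VELO_DAQ_1")
node.configure("PHYSICS")                    # step 1: Configure/mode="PHYSICS"
print(node.settings)                         # {'threshold': 12, 'zero_suppress': True}
```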
LHC Era Control Technologies: Layer Structure (slide repeated; see the layer diagram above).
What is JCOP?
• JCOP stands for “Joint Controls Project”
• A grouping of representatives from the four big
LHC experiments.
• Aims to reduce the overall manpower cost
required to produce and run the experiment
control systems
P.C. Burkimsher
PVSS & JCOP Framework Course May 2006
What is JCOP Framework?
• A layer of software components
– Produced in collaboration, components shared
– Produced using common tools, components
that work together
P.C. Burkimsher
PVSS & JCOP Framework Course May 2006
What is PVSS?
• The Supervisory Control And Data Acquisition
(SCADA) system chosen by JCOP.
– In-depth evaluation of products available (commercial
or open-source)
– JCOP (i.e. the experiments, i.e. you) chose PVSS
– Commercial product from ETM, Austria
– Since then, PVSS has been widely adopted across
CERN, not just used by the experiments
• PVSS is a TOOL, not a control system!
– You have to build your own system
P.C. Burkimsher
PVSS & JCOP Framework Course May 2006
What is PVSS (cont.)?
• PVSS II has capabilities for:
– Device Description
• Data Points, and Data Point items
– Device Access
• OPC, ProfiBus, Drivers
– Alarm Handling
• Generation, Masking, etc
– Alarm Display, Filtering, Summarising
– Archiving, Trending, Logging
– User Interface Builder
– Access Control
P.C. Burkimsher
PVSS & JCOP Framework Course May 2006
What is PVSS not?
• PVSS II does not have tools specifically for:
– Abstract behaviour modelling
• Finite State Machines
– Automation & Error Recovery
• Expert System
• But…
– FSM (SMI++) does
P.C. Burkimsher
PVSS & JCOP Framework Course May 2006
PVSS
Clara Gaspar, March 2006
PVSS Features
❚ Open Architecture
❙ We can write our own managers
➨It can be interfaced to anything (FSM, DIM)
❚ Highly Distributed
❙ 130 Systems (PCs) tested
➨No major problem found
❚ Standard Interface
❙ All data of all sub-systems defined as
DataPoints!
Clara Gaspar, March 2006
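Everything hinges on that last point: each piece of data is a DataPoint element that managers read, write and subscribe to (CTRL scripts use dpGet/dpSet/dpConnect for this). As a rough Python analogy (not the PVSS API; the datapoint name below is invented), a store with change subscriptions looks like:

```python
class DataPointStore:
    """Python analogy of PVSS datapoints: named elements + change callbacks."""
    def __init__(self):
        self._values = {}
        self._callbacks = {}

    def dp_set(self, dpe, value):
        self._values[dpe] = value
        for cb in self._callbacks.get(dpe, []):
            cb(dpe, value)                   # like dpConnect: fires on change

    def dp_get(self, dpe):
        return self._values[dpe]

    def dp_connect(self, dpe, callback):
        self._callbacks.setdefault(dpe, []).append(callback)

store = DataPointStore()
store.dp_connect("RICH1_HV_Ch03.actual.vMon",
                 lambda dpe, v: print(f"{dpe} changed to {v}"))
store.dp_set("RICH1_HV_Ch03.actual.vMon", 850.0)   # triggers the callback
```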
What is FSM?
❚ Finite State Machine (FSM)
❙ Abstract representation of your experiment.
What state is it in? Is it taking data? Is it in
standby? Is it broken? Is it switched off?
What triggers it to move from one of these
states to another?
❙ JCOP chose the State Management
Interface (SMI++) developed for the
DELPHI experiment.
❙ SMI = a tool to build an FSM + an expert system.
Vital for controlling & recovering large experiments.
Clara Gaspar, March 2006
Implementation of the ECS
A mixed Win/Linux cluster,
with shared resources (network disks, via SAMBA).
PCs:
– Controls PCs: used to directly control some device.
– Control Room consoles: used to connect to the controls PCs.
– General servers: gateways to the external world, etc…
The mixed cluster means:
you need to master the basics of both Win and Linux.
Interfacing the HW:
– CCPC (Credit Card PC), Linux, integrated in the cluster;
local intelligence on electronics boards: UKL1 and HV.
– SPECS system (in radiation areas): → Antonis.
Computing Environment at IP8
Access via the gateways
(lbgw for Linux, lbts for Windows).
The LHCb gateways are only visible from inside the
CERN network/firewall.
Users have personal logins on the LHCb network.
Online administrators:
[email protected]
The login and all computing infrastructure is common
across both Linux (including CCPC) and Windows.
Note that from inside the LHCb network the external
world is not, in general, accessible.
Computing Environment at IP8
There is an area set aside for common RICH software:
/group/rich/ and G:\rich respectively.
Group-wide login profile for the Linux systems at
/group/rich/scripts/rich_login.sh
See the TWiki for file protection issues… (important).
The group area must only be used for files
used for running the detectors!
Remote Access to ECS PC
After logging into the LHCb network,
any ECS PC can be accessed as follows.
Windows to Windows: use remote desktop.
Linux to Linux: use ssh,
X sessions are not yet enabled (???) on the ECS PC.
Windows to Linux (including CCPCs):
– start the Exceed X server on the local PC;
default options are normally ok:
mode: passive,
security: any host access,
display: multiple plus display in localhost;
– logon via ssh with PuTTY; enable:
X11 forwarding and X display location = localhost.
Other
The oper folder in the group area
contains a lot of useful shortcuts for common operations.
Generic rich_shift account:
must only be used when logging on
to the consoles in the control room.
It will be treated as scratch: for example,
files stored by this user can be deleted at any time.
I strongly suggest that everybody use their own account…
Which tools?
Web Console (healthiness of software components).
FSM panel (routine operation).
ECS manager panel (routine debugging).
Expert on-call (routine problem fixing…).
Logbook (identify yourself using your own account only!).
When everything else fails …
Which tools?
Carmelo!
Routine Checks/Operations
Such a complex system needs daily babysitting…
– many routine checks must be carried out,
to identify and/or try to prevent problems.
A routine check-list is to be defined…
Everything relevant must be precisely written
in the logbook: this might save you time next time,
and it will certainly save somebody else's time…
Write down the issue, write down the fix!
Every problem must be reported
to the appropriate list of people.
Warnings
Always be very careful:
in a distributed system, non-local effects may happen!
PVSS implementation
Distributed system across Win/Linux: some PVSS projects
run on Windows, some on Linux (all CCPC-related ones).
Projects are installed on local disks: L:\pvss | /localdisk/pvss.
FW and RICH components are installed in the group area.
PVSS projects run as system services (Win only, so far).
The basic process is PVSS00pmon:
check via TaskManager | ps.
PVSS is basically running in the background; connect to it!
Beware: PVSS is everywhere: every problem will show up in
PVSS; this does not mean that there is a problem with PVSS!
PVSS console: shows the managers and allows controlling them.
The components of ECS
Sub-Systems
– DCS MONITORING
– DCS LV and SiBias
– HV
– DAQ L0
– DAQ L1
– FSM
– Configuration DB
– Conditions DB
Interface to Gas, Cooling & Ventilation, DSS, Magnet.
ECS operation
Distributed system:
all systems can talk together and exchange data.
Many (but not - yet - all) operations can be done
from a single machine:
no need to log on to the Controls PCs
(there are currently still many limitations!).
Some PVSS-related operations
RICH-ECS web panel (Mozilla) → slide
PVSS Web Console
Normal operations are handled via the FSM view:
→ Antonis
Normal debugging (also routine debug operations)
is done via the ECS-Manager panels:
local/remote functions useful for debugging…
It complements and integrates the FSM panels;
it is intended more for easy and quick access
to a number of functions and tools
required outside routine operation and for debugging.
→ slide: a miscellany of panels
Normal Operation: the FSM tree
See Antonis.
Used for routine operation:
– Everything must be accessible navigating the tree.
– Everything shall go via simple FSM commands.
– To be used by LHCb shifters also:
simple, clear, robust and mistake-protected.
– Normal operations, including error recovery,
must not require the operator to navigate the tree
nor perform any complex actions.
DSS info?
Not everything is done, nor final, nor bug-free/perfect.
We need to exercise and stress the system
to spot problems which cannot be seen
at the current stage…
Many things need to be finalized
and the system must be stress-tested.
Reaction to alarm situations not yet complete.
Documentation not yet complete.
To do afterwards!
All in the TWiki: study!
The HV control
CCPC program:
– log onto the CCPC;
– type HVSetup;
– follow the messages
(after having studied the instructions in the TWiki).
The PVSS interface…
HV PVSS Controls
The interface to the HW is done by the CCPC program;
the PVSS project is only a flexible interface to the CCPC
program.
A first production version of the PVSS controls is
available at the pit:
– Monitoring of the CCPC data and the ELMB voltage
measurements;
– Full control of the CCPC:
single-channel control;
all-channel control via the FSM and recipes:
– TEST / COMMISSIONING / PHYSICS…
– Many trace plots…
Warnings
If you make changes via the CCPC program, PVSS
gets confused: it does not (yet) receive the read-back settings.
The FSM states are not always (yet) properly evaluated:
take them with care and report issues:
– I am trying to take care of a lot of information…
– No real test outside the pit is good enough…
WARNING means: I have contradictory information,
keep watching; it is often a temporary state.
Always read the TWiki for updates…
Make sure not to confuse:
– The ISEG channel (0-19);
– The physical column
(which the ELMB monitoring refers to).
HV Controls: automatic actions
The CCPC server will switch off in case of OvCurr;
it will also switch off in case of UnCurr, OvVolt or UnVolt.
Other actions must be coordinated by PVSS,
if they need information not available to the CCPC.
Currently: PVSS gets information from the ELMB monitoring.
• Very simple objects with simple functions.
• Avoid making more complex Device Units
and objects to introduce alarm handling.
[Diagram: the simple object trees: an EM (ELMB monitoring) branch with EM_0/EM_1, the columns Col_0/Col_1 and alarm objects AL_0/AL_1, plus an HV branch with HV_0/HV_1, each mapped onto the HW.]
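As an illustration of such an automatic protection (a sketch only, with invented limit values; the real logic lives inside the CCPC server), a trip rule of this kind boils down to:

```python
# Toy interlock: switch a channel off when a reading leaves its allowed window.
# The limit values below are invented for the example.
LIMITS = {"OvCurr": 2.0e-6, "UnCurr": 0.0, "OvVolt": 950.0, "UnVolt": 0.0}

def check_channel(channel_id, i_mon, v_mon, switch_off):
    """Trip the channel on any out-of-window reading; return the fault name."""
    fault = None
    if i_mon > LIMITS["OvCurr"]:
        fault = "OvCurr"
    elif i_mon < LIMITS["UnCurr"]:
        fault = "UnCurr"
    elif v_mon > LIMITS["OvVolt"]:
        fault = "OvVolt"
    elif v_mon < LIMITS["UnVolt"]:
        fault = "UnVolt"
    if fault:
        switch_off(channel_id)               # automatic action, no operator needed
    return fault

# Example: channel 3 draws too much current and is switched off.
fault = check_channel(3, i_mon=5.0e-6, v_mon=900.0,
                      switch_off=lambda ch: print(f"channel {ch} OFF"))
print(fault)                                 # OvCurr
```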
TWiki link