Transcript Document
First attempt at ECS training. Work in progress… A lot of material was borrowed, thanks! (Borrowed material: Clara Gaspar, March 2006; P.C. Burkimsher, PVSS & JCOP Framework Course, May 2006.)

Objectives
Get familiar with routine operation.
Get familiar with routine problem recovery.
Get familiar with the way to work inside a complex, nearly chaotic, highly distributed environment: rules must be followed…
Get familiar with the language.
Avoid details: after the training you need to study the TWiki documentation… (and possibly contribute to it…).

Warnings
We are probably leaving aside many important things…
Many things are changing… and some will change a lot…
This tutorial is only meant as a broad overview. The aim is to learn the basics for sub-detector (SD) operation, not to learn to develop parts of the ECS… The other aim is to learn common usage and rules.

What is ECS?
[Diagram: LHC-era control technologies, in layers. Supervision: SCADA and FSM, linked over WAN/LAN to other systems (LHC, safety, …). Process management: controllers/PLCs (VME, PLC/UNICOS), with communication protocols such as DIM and OPC. Field management: field buses and nodes, down to the experimental equipment and sensors/devices. Storage technologies, commercial and custom: configuration DB, archives, log files, etc. Based on an original idea from LHCb.]

ECS Scope
[Diagram: the Experiment Control System covers the DCS devices (HV, LV, gas, temperatures, etc.), the detector channels, L0, TFC, the front-end electronics, the readout network, the high-level trigger and storage (the DAQ), and the external systems (LHC, technical services, safety, etc.).]

ECS Generic Architecture
[Diagram: the abstract levels of the control tree. Commands flow down from the ECS node through domain nodes (DetDcs1…DetDcsN, DetDaq1, …) and sub-systems (SubSys1…SubSysN) to the devices (Dev1…DevN, HW or SW); status & alarms flow up. External systems (T.S., LHC, GAS, DSS, …) attach at the top.]

Control Units
❚ Each node is able to:
❙ Summarize information (for the above levels)
❙ “Expand” actions (to the lower levels)
❙ Implement specific behaviour & take local decisions
❘ Sequence & automate operations
❘ Recover errors
❙ Include/exclude children (i.e. partitioning)
❘ Excluded nodes can run in stand-alone
❙ User interfacing
❘ Present information and receive commands
[Diagram: an example sub-tree: DCS above Tracker and Muon, with children such as HV, Temp and Gas.]

Device Units
❚ Device Units
❙ Provide the interface to real devices: electronics boards, HV channels, trigger algorithms, etc.
❘ Can be enabled/disabled
❘ In order to integrate a device within the FSM:
〡Deduce a STATE from device readings (in DPs) (see the CTRL sketch below)
〡Implement COMMANDS as device settings
❘ Commands can apply the recipes previously defined
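Since a Device Unit deduces a STATE from device readings held in datapoints and implements COMMANDS as device settings, a minimal PVSS CTRL sketch can make the idea concrete. This is only an illustration: the DP names ("RichHv_Ch00…") and the thresholds are invented here, and in practice Device Units are generated with the framework tools rather than hand-written like this.

```
main()
{
  // re-evaluate the state whenever the monitored voltage changes
  dpConnect("deduceState", "RichHv_Ch00.readings.vMon");
}

void deduceState(string dp, float vMon)
{
  string state;

  if (vMon > 900.0)
    state = "READY";      // at nominal voltage (invented threshold)
  else if (vMon > 10.0)
    state = "RAMPING";    // somewhere in between
  else
    state = "OFF";        // effectively no voltage

  // publish the deduced state so the FSM layer can pick it up
  dpSet("RichHv_Ch00.fsm.state", state);
}
```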
The Control Framework
❚ The FwFSM component is based on:
❙ PVSS for:
❘ Device Description (run-time database)
❘ Device Access (OPC, Profibus, drivers)
❘ Alarm Handling (generation, filtering, masking, etc.)
❘ Archiving, Logging, Scripting, Trending
❘ User Interface Builder
❘ Alarm Display, Access Control, etc.
❙ SMI++ providing:
❘ Abstract behaviour modelling (finite state machines)
❘ Automation & error recovery (rule-based system)

SMI++ Run-time Environment
❙ Device level: proxies
❘ Drive the hardware:
〡deduceState
〡handleCommands
❘ Written in C, C++ or PVSS ctrl scripts
❙ Abstract levels: domains
❘ Implement the logical model
❘ A dedicated language: SML
❘ A C++ engine: smiSM
❙ User interfaces
❘ For user interaction
❙ All tools available on Windows and Unix (Linux)
❘ All communications are transparent and dynamically (re)established
[Diagram: SMI domains containing objects, layered above proxies which talk to the hardware devices.]

Features of PVSS/SMI++
❚ Error recovery mechanism
❙ Bottom-up
❘ SMI objects react to changes of their children
〡In an event-driven, asynchronous fashion
❙ Distributed
❘ Each sub-system recovers its own errors
〡Each team knows how to recover local errors
❙ Hierarchical/parallel recovery
❙ Can provide complete automation even for very large systems

Sub-detector FSM Guidelines
❚ Started defining naming conventions.
❚ Defined standard “domains” per sub-detector:
❙ DCS
❘ DCS infrastructure (cooling, gas, temperatures, pressures, etc.) that is normally stable throughout a running period
❙ HV
❘ High voltages, or in general components that depend on the status of the LHC machine (fill related)
❙ DAQ
❘ All electronics and components necessary to take data (run related)
❙ DAQI
❘ Infrastructure necessary for the DAQ to work (computers, networks, electrical power, etc.), in general also stable throughout a running period
❚ And standard states & transitions per domain.
❚ Doc available in EDMS: https://edms.cern.ch/document/655828/1

Hierarchy & Conf. DB
[Diagram: the control hierarchy (ECS above Infrast., LHC, L0, TFC, HLT and the per-detector MUON/VELO DCS, HV, DAQI and DAQ domains, down to devices such as VELO DCS_1, VELO DCS_2, VELO DAQ_1, VELO DAQ_2 and VELO Dev1…DevN) working with the Configuration DB: (1) the command Configure/mode=“PHYSICS” is sent down the tree; (2) the “PHYSICS” settings are fetched from the Conf. DB; (3) the settings are applied to the devices. A sketch of sending such a command from a script follows at the end of this part.]

What is JCOP?
• JCOP stands for “Joint Controls Project”.
• A grouping of representatives from the 4 big LHC experiments.
• Aims to reduce the overall manpower cost required to produce and run the experiment control systems.

What is the JCOP Framework?
• A layer of software components:
– produced in collaboration, with components shared;
– produced using common tools, with components that work together.

What is PVSS?
• The Supervisory Control And Data Acquisition (SCADA) system chosen by JCOP.
– In-depth evaluation of the products available (commercial or open-source).
– JCOP (i.e. the experiments, i.e. you) chose PVSS.
– A commercial product from ETM, Austria.
– Since then, PVSS has been widely adopted across CERN, not just by the experiments.
• PVSS is a TOOL, not a control system!
– You have to build your own system.
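To tie the FSM material to something concrete, the sketch below shows how a script might query a Control Unit's state and send the Configure/mode=“PHYSICS” command from the Hierarchy & Conf. DB slide above. It is a hedged sketch: it assumes the fwCU helper functions of the JCOP FSM component (names, signatures and the exact parameter syntax may differ between framework versions), and the tree node name "RICH_DAQ" is invented.

```
#uses "fwCU.ctl"  // fwCU helpers of the JCOP FSM component (assumed name)

main()
{
  string state;

  // read the current state of a (hypothetical) Control Unit
  fwCU_getState("RICH_DAQ", state);
  DebugN("RICH_DAQ is in state:", state);

  // send Configure with a mode parameter: the CU fetches the
  // "PHYSICS" settings (recipes) from the Configuration DB
  // and applies them down the tree
  fwCU_sendCommand("RICH_DAQ", "Configure/mode=PHYSICS");
}
```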
What is PVSS (cont.)?
• PVSS II has capabilities for:
– Device description: data points and data point items
– Device access: OPC, ProfiBus, drivers
– Alarm handling: generation, masking, etc.
– Alarm display, filtering, summarising
– Archiving, trending, logging
– User interface builder
– Access control

What is PVSS not?
• PVSS II does not have tools specifically for:
– Abstract behaviour modelling (finite state machines)
– Automation & error recovery (expert system)
• But… FSM (SMI++) does.

PVSS Features
❚ Open architecture
❙ We can write our own managers
➨ It can be interfaced to anything (FSM, DIM)
❚ Highly distributed
❙ 130 systems (PCs) tested
➨ No major problem found
❚ Standard interface
❙ All data of all sub-systems are defined as DataPoints! (see the sketch after this section)

What is FSM?
❚ Finite State Machine (FSM)
❙ An abstract representation of your experiment: what state is it in? Is it taking data? Is it in standby? Is it broken? Is it switched off? What triggers it to move from one of these states to another?
❙ JCOP chose the State Management Interface (SMI++), developed for the DELPHI experiment.
❙ SMI = a tool to build an FSM + an expert system. Vital for controlling & recovering large experiments.

Implementation of the ECS
A mixed Windows/Linux cluster, with shared resources (network disks, via SAMBA). PCs:
– Controls PCs: used to directly control some device.
– Control Room consoles: used to connect to the controls PCs.
– General servers: gateways to the external world, etc.
The mixed cluster means you need to master the basics of both Windows and Linux.
Interfacing the HW:
– CCPC (Credit Card PC): Linux, integrated in the cluster; local intelligence on the electronics boards: UKL1 and HV.
– SPECS system (in radiation areas): Antonis.

Computing Environment at IP8
Access is via the gateways (lbgw for Linux, lbts for Windows). The LHCb gateways are only visible from inside the CERN network/firewall.
Users have personal logins on the LHCb network. Online administrators: [email protected]
The login and all computing infrastructure are common across both Linux (including the CCPCs) and Windows.
Note that from inside the LHCb network the external world is not, in general, accessible.

Computing Environment at IP8 (cont.)
There is an area set aside for common RICH software: /group/rich/ and G:\rich respectively.
The group-wide login profile for the Linux systems is at /group/rich/scripts/rich_login.sh
See the TWiki for file protection issues… (important).
The group area must only be used for files needed for running the detectors!

Remote Access to ECS PCs
After logging into the LHCb network, any ECS PC can be accessed as follows.
Windows to Windows: use Remote Desktop.
Linux to Linux: use ssh; X sessions are not yet enabled (???) on the ECS PCs.
Windows to Linux (including CCPCs):
– start the Exceed X server on the local PC; the default options are normally OK (mode: passive; security: any host access; display: multiple, plus display in localhost);
– log on via ssh with PuTTY; enable X11 forwarding and set the X display location to localhost.

Other
The oper folder in the group area contains a lot of useful shortcuts for common operations.
The generic rich_shift account must only be used when logging on to the consoles in the control room. It is treated as scratch: for example, files stored by this user can be deleted at any time. I strongly suggest that everybody uses their own account…
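Because all data of all sub-systems are defined as DataPoints (the “Standard interface” point above), generic CTRL code can enumerate and read any of them in a uniform way. A minimal sketch; the DP name pattern and element name are invented:

```
main()
{
  // enumerate all datapoints matching an (invented) name pattern
  dyn_string dps = dpNames("RichHv_Ch*");

  // CTRL dyn arrays are indexed from 1
  for (int i = 1; i <= dynlen(dps); i++)
  {
    float vMon;
    dpGet(dps[i] + ".readings.vMon", vMon);
    DebugN(dps[i], "vMon =", vMon);
  }
}
```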
Which tools?
Web Console (healthiness of the software components).
FSM panel (routine operation).
ECS manager panel (routine debugging).
Expert on-call (routine problem fixing…).
Logbook (identify yourself only using your own account!).
When everything else fails… Carmelo!

Routine Checks/Operations
Such a complex system needs daily babysitting:
– many routine checks must be carried out, to identify problems and/or to try to prevent them.
A routine check-list is to be defined…
Everything relevant must be written precisely in the logbook: this might save you time next time, and it will certainly save time for somebody else… Write down the issue, write down the fix!
Every problem must be reported to the appropriate list of people.

Warnings
Always be very careful: in a distributed system, non-local effects may happen!

PVSS implementation
A distributed system across Windows/Linux: some PVSS projects run on Windows, some on Linux (all the CCPC-related ones).
Projects are installed on local disks: L:\pvss | /localdisk/pvss.
FW and RICH components are installed in the group area.
PVSS projects run as system services (Windows only, so far).
The basic process is PVSS00pmon: check it via the Task Manager (Windows) or ps (Linux).
PVSS basically runs in the background; connect to it!
Beware: PVSS is everywhere, so every problem will reflect on PVSS; this does not mean that there is a problem with PVSS!
The PVSS console shows the managers and allows controlling them.

The components of ECS
Sub-systems:
– DCS MONITORING
– DCS LV and SiBias
– HV
– DAQ L0
– DAQ L1
– FSM
– Configuration DB
– Conditions DB
Interfaces to Gas, Cooling & Ventilation, DSS, Magnet.

ECS operation
A distributed system: all systems can talk together and exchange data.
Many (but not yet all) operations can be done from a single machine: no need to log on to the controls PC (there are currently still many limitations!).

Some PVSS-related operations
[Slide: the RICH-ECS web panel (Mozilla), i.e. the PVSS Web Console.]
Normal operations are handled via the FSM view: Antonis.
Normal debugging (also routine debug operations) is done via the ECS-Manager panels: local/remote functions useful for debugging. They complement and integrate the FSM panels, and are intended more for easy and quick access to a number of functions and tools required outside routine operation and for debugging.
[Slide: a miscellanea of panels.]

Normal Operation: the FSM tree
See Antonis. Used for routine operation:
– Everything must be accessible by navigating the tree.
– Everything shall go via simple FSM commands.
– To be used by LHCb shifters as well: simple, clear, robust and mistake-protected.
– Normal operations, including error recovery, must not require the operator to navigate the tree nor do any complex actions.
DSS info?

Not everything is done, nor final, nor bug-free/perfect. We need to exercise and stress the system to spot problems which cannot be seen at the current stage… Many things need to be finalized and the system must be stress-tested. The reaction to alarm situations is not yet complete. The documentation is not yet complete.

To do afterwards! It is all in the TWiki: study it.

The HV control CCPC program:
– log onto the CCPC;
– type HVSetup;
– follow the messages (after having studied the instructions in the TWiki).
The PVSS interface…

HV PVSS Controls
The interface to the HW is done by the CCPC program; the PVSS project is only a flexible interface to the CCPC program.
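Since PVSS only steers the HV through the CCPC program, a control action from the PVSS side boils down to writing settings into datapoints that the CCPC interface picks up. A minimal hedged sketch of a single-channel action; the DP naming scheme, element names and values are all invented:

```
main()
{
  int channel = 7;                     // ISEG channel number (0-19)
  string dp = "RichHv_Ch" + channel;   // invented DP naming scheme

  dpSet(dp + ".settings.v0", 850.0);   // demand voltage (V), invented value
  dpSet(dp + ".settings.onOff", true); // ask the CCPC server to switch it on
}
```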
A first production version of the PVSS controls is available at the pit:
– monitoring of the CCPC data and of the ELMB voltage measurements;
– full control of the CCPC:
• single channel control;
• all-channels control via the FSM and recipes: TEST / COMMISSIONING / PHYSICS…
– many trace plots…

Warnings
If you make changes via the CCPC program, PVSS gets confused: it does not (yet) receive the read-back settings.
The FSM states are not always (yet) properly evaluated: take them with care and report issues:
– I am trying to take care of a lot of information…
– No real test outside the pit is good enough…
WARNING means: I have contradictory information, keep watching; it is often a temporary state.
Always read the TWiki for updates…
Make sure not to confuse:
– the ISEG channel (0-19);
– the physical column (which the ELMB monitoring refers to).

HV Controls: automatic actions
The CCPC server will switch off in case of OvCurr.
The CCPC server will switch off in case of UnCurr, OvVolt or UnVolt.
Other actions must be coordinated by PVSS if they need information not available to the CCPC. Currently, PVSS gets that information from the ELMB monitoring (see the sketch at the end of the document).
• Very simple objects with simple functions.
• Avoid making more complex Device Units and objects to introduce alarm handling.
[Diagram: the HV FSM tree: EM and HV above EM_0/EM_1, Col_0/Col_1, AL_0/AL_1 and HV_0/HV_1, down to the HW.]

TWiki Link
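As referenced under “HV Controls: automatic actions”, protection actions that depend on ELMB data have to be coordinated by PVSS, since the CCPC cannot see that information. A hedged sketch of what such a PVSS-side action could look like; the DP names, the column-to-channel association and the trip threshold are all invented:

```
main()
{
  // re-check whenever the ELMB current reading of a column changes
  dpConnect("checkColumnCurrent", "RichElmb_Col00.readings.iMon");
}

void checkColumnCurrent(string dp, float iMon)
{
  const float TRIP_LIMIT = 50.0;  // invented trip threshold

  if (iMon > TRIP_LIMIT)
  {
    DebugN("Over-current on column 0: switching off the HV channel");
    dpSet("RichHv_Ch00.settings.onOff", false);
  }
}
```

In reality such logic would live behind the simple alarm-handling objects described above, not in a free-standing script.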