First attempt of ECS training
Work in progress…
A lot of material was borrowed!
Thanks!
Objectives
Get familiar with routine operation.
Get familiar with routine problem recovery.
Get familiar with the way to work inside a complex, nearly chaotic, highly distributed environment: rules must be followed…
Get familiar with the language.
Avoid details.
After the training you need to study the TWiki documentation… (and possibly contribute to it…).
Warnings
We are probably leaving aside many important things…
Many things are changing… and some will change a lot.
This tutorial is only meant as a broad overview.
The aim is to learn the basics for SD operation; not to learn to develop parts of the ECS…
The other aim is to learn common usage and rules.
What is ECS?
P.C. Burkimsher PVSS & JCOP Framework Course May 2006
LHC era Control Technologies
[Figure: layer structure of the control technologies, based on an original idea from LHCb. Layers: Supervision, Process Management, Field Management. Elements: Experimental equipment, Sensors/devices, Field buses & Nodes, Controller/PLC, VME, PLC/UNICOS, LAN, WAN, Storage, Configuration DB, Archives, Log files, etc., Other systems (LHC, Safety, ...). Technologies (commercial and custom): SCADA, FSM, OPC, DIM, Communication Protocols.]
Clara Gaspar, March 2006
ECS Scope
[Figure: the Experiment Control System spans DCS Devices (HV, LV, GAS, Temperatures, etc.), Detector Channels, L0, TFC, Front End Electronics, Readout Network, High Level Trigger, Storage, DAQ, and External Systems (LHC, Technical Services, Safety, etc.).]
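ECS Generic Architecture
[Figure: hierarchical control tree. ECS at the top; below it DCS and DAQ, next to the external systems (LHC, T.S., GAS, DSS, ...); then DetDcs1 ... DetDcsN and DetDaq1 ...; then SubSys1, SubSys2 ... SubSysN; then Dev1, Dev2, Dev3 ... DevN, connected to the devices (HW or SW). Arrows: Commands; Status & Alarms. The upper part of the tree is labelled "Abstract levels".]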
Control Units
❚ Each node is able to:
❙ Summarize information (for the above levels)
❙ “Expand” actions (to the lower levels)
❙ Implement specific behaviour & take local decisions
❘ Sequence & automate operations
❘ Recover errors
❙ Include/Exclude children (i.e. partitioning)
❘ Excluded nodes can run in stand-alone mode
❙ User interfacing
❘ Present information and receive commands
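The abilities listed above can be illustrated with a small Python model. This is only a sketch: the `Node` class, the state names and the "worst state wins" rule are assumptions made for illustration, not the FwFSM or SMI++ API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical severity order used to summarize children (worst state wins).
SEVERITY = ["READY", "NOT_READY", "ERROR"]


@dataclass
class Node:
    """A node of the control tree (illustrative model, not the FwFSM API)."""
    name: str
    state: str = "NOT_READY"
    children: List["Node"] = field(default_factory=list)
    included: bool = True  # excluded nodes are ignored by their parent

    def summarize(self) -> str:
        """Summarize information for the level above: report the worst child state."""
        active = [c for c in self.children if c.included]
        if active:
            self.state = max((c.summarize() for c in active), key=SEVERITY.index)
        return self.state

    def command(self, action: str) -> None:
        """'Expand' an action to the lower levels; leaves apply it themselves."""
        if not self.children:
            # Leaf: pretend the device obeys immediately.
            self.state = "READY" if action == "CONFIGURE" else "NOT_READY"
        for child in self.children:
            if child.included:
                child.command(action)
```

Setting `included = False` on a child models partitioning: the parent no longer sends it commands or counts its state, and the child keeps its own state so it can be driven stand-alone.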
[Figure: example control-unit tree with nodes DCS, Tracker, Muon, Temp, HV, GAS.]
Device Units
❚ Device Units
❙ Provide the interface to real devices (Electronics Boards, HV channels, trigger algorithms, etc.)
❘ Can be enabled/disabled
❘ In order to integrate a device within the FSM:
〡 Deduce a STATE from device readings (in DPs)
〡 Implement COMMANDS as device settings
❘ Commands can apply the recipes previously defined
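As a rough illustration of the two duties of a device unit (deduce a STATE from readings, implement COMMANDS as settings), here is a Python sketch for one HV channel. The class name, the `vmon`/`vset` items, the state names and the 5 V tolerance are all assumptions; the real device units work on PVSS data points.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HvChannelUnit:
    """Hypothetical device unit for one HV channel (names are illustrative)."""
    readings: Dict[str, float] = field(default_factory=lambda: {"vmon": 0.0})
    settings: Dict[str, float] = field(default_factory=lambda: {"vset": 0.0})
    enabled: bool = True
    tolerance: float = 5.0  # volts; assumed value

    def deduce_state(self) -> str:
        """Deduce a STATE from the device readings."""
        if not self.enabled:
            return "DISABLED"
        vmon, vset = self.readings["vmon"], self.settings["vset"]
        if vset == 0.0:
            return "OFF" if vmon < self.tolerance else "RAMPING"
        return "ON" if abs(vmon - vset) <= self.tolerance else "RAMPING"

    def handle_command(self, command: str, recipe: Dict[str, float]) -> None:
        """Implement a COMMAND as device settings, taken from a recipe."""
        if not self.enabled:
            return  # a disabled device ignores commands
        if command == "SWITCH_ON":
            self.settings["vset"] = recipe["vset"]
        elif command == "SWITCH_OFF":
            self.settings["vset"] = 0.0
        else:
            raise ValueError(f"unknown command: {command}")
```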
The Control Framework
❚ The FwFSM Component is based on:
❙ PVSS for:
❘ Device Description (Run-time Database)
❘ Device Access (OPC, Profibus, drivers)
❘ Alarm Handling (Generation, Filtering, Masking, etc)
❘ Archiving, Logging, Scripting, Trending
❘ User Interface Builder
❘ Alarm Display, Access Control, etc.
❙ SMI++ providing:
❘ Abstract behavior modeling (Finite State Machines)
❘ Automation & Error Recovery (Rule based system)
[Figure labels: Device Units, Control Units.]
SMI++ Run-time Environment
[Figure: Hardware Devices driven by Proxies; objects (Obj) grouped into SMI Domains.]
❙ Device Level: Proxies
❘ drive the hardware:
〡 deduceState
〡 handleCommands
❘ C, C++, PVSS ctrl scripts
❙ Abstract Levels: Domains
❘ Implement the logical model
❘ Dedicated language - SML
❘ A C++ engine: smiSM
❙ User Interfaces
❘ For User Interaction
❙ All Tools available on:
❘ Windows, Unix (Linux)
❘ All communications are transparent and dynamically (re)established
Features of PVSS/SMI++
❚ Error Recovery Mechanism
❙ Bottom Up
❘ SMI Objects react to changes of their children
〡 In an event-driven, asynchronous fashion
❙ Distributed
❘ Each Sub-System recovers its errors
〡 Each team knows how to recover local errors
❙ Hierarchical/Parallel recovery
❙ Can provide complete automation even for very large systems
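A minimal Python sketch of bottom-up recovery follows. It is an illustration only: `SmiObject`, the two states and the `recover` callback are hypothetical, and the calls are synchronous, whereas the real SMI++ objects react asynchronously.

```python
from typing import Callable, List, Optional


class SmiObject:
    """Hypothetical stand-in for an SMI object; not the SMI++ API."""

    def __init__(self, name: str, parent: Optional["SmiObject"] = None,
                 recover: Optional[Callable[[], bool]] = None):
        self.name, self.parent, self.state = name, parent, "READY"
        self.children: List["SmiObject"] = []
        self.recover = recover  # local recovery action supplied by the sub-system team
        if parent is not None:
            parent.children.append(self)

    def set_state(self, state: str) -> None:
        """Record a state change and notify the parent (the 'event')."""
        self.state = state
        if self.parent is not None:
            self.parent.on_child_change(self)

    def on_child_change(self, child: "SmiObject") -> None:
        """Rule: try to recover an ERROR child locally; escalate only if that fails."""
        if child.state == "ERROR" and child.recover is not None and child.recover():
            child.state = "READY"
        worst = "ERROR" if any(c.state == "ERROR" for c in self.children) else "READY"
        if worst != self.state:
            self.set_state(worst)  # propagate bottom-up only when something changed
```

An error that is recovered locally never reaches the top of the tree; only unrecovered errors travel upwards, one level at a time.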
Sub-detector FSM Guidelines
❚ Started defining naming conventions.
❚ Defined standard “domains” per sub-detector:
❙ DCS
❘ DCS Infrastructure (Cooling, Gas, Temperatures, pressures, etc.) that is normally stable throughout a running period
❙ HV
❘ High Voltages or in general components that depend on the status of the LHC machine (fill related)
❙ DAQ
❘ All Electronics and components necessary to take data (run related)
❙ DAQI
❘ Infrastructure necessary for the DAQ to work (computers, networks, electrical power, etc.), in general also stable throughout a running period.
❚ And standard states & transitions per domain.
❚ Doc available in EDMS:
❘ https://edms.cern.ch/document/655828/1
Hierarchy & Conf. DB
[Figure: ECS at the top; columns Infrast., DCS, HV, DAQI, DAQ, L0, TFC, HLT, LHC. Sub-detector domain nodes: VELODCS, VELOHV, VELODAQI, VELODAQ, MUONDCS, MUONHV, MUONDAQI, MUONDAQ. Below them VELODCS_1, VELODCS_2, VELODAQ_1, VELODAQ_2, then the devices VELODev1 ... VELODevN, connected to the Conf. DB. Steps numbered 1 to 3 in the figure: Configure/mode=“PHYSICS”; (Get “PHYSICS” Settings); Apply Settings.]
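The configure sequence in the figure (configure for a mode, get the settings for that mode, apply them) could look roughly like this in Python. The in-memory `CONF_DB` dictionary, its values and the `configure` function are invented for the example; the real settings live in the Configuration DB.

```python
from typing import Dict

# Hypothetical configuration DB: mode -> device -> settings (all values invented).
CONF_DB: Dict[str, Dict[str, Dict[str, float]]] = {
    "PHYSICS": {"VELODev1": {"vset": 100.0}, "VELODevN": {"vset": 120.0}},
    "TEST": {"VELODev1": {"vset": 10.0}, "VELODevN": {"vset": 10.0}},
}


def configure(devices: Dict[str, Dict[str, float]], mode: str) -> None:
    """Configure every device for `mode`: get its settings, then apply them."""
    recipes = CONF_DB[mode]             # get the settings stored for this mode
    for name, hardware in devices.items():
        hardware.update(recipes[name])  # apply the settings to the device
```

Keying the settings by mode means the command sent down the tree only has to carry the mode name; each device finds its own values.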
What is JCOP?
• JCOP stands for “Joint Controls Project”
• Grouping of representatives from the 4 big LHC experiments.
• Aims to reduce the overall manpower cost required to produce and run the experiment control systems
What is JCOP Framework?
• A layer of software components
  – Produced in collaboration, components shared
  – Produced using common tools, components that work together
What is PVSS?
• The Supervisory Control And Data Acquisition (SCADA) system chosen by JCOP.
  – In-depth evaluation of products available (commercial or open-source)
  – JCOP (i.e. the experiments, i.e. you) chose PVSS
  – Commercial product from ETM, Austria
  – Since then, PVSS has been widely adopted across CERN, not just used by the experiments
• PVSS is a TOOL, not a control system!
  – You have to build your own system
What is PVSS (cont.)?
• PVSS II has capabilities for:
  – Device Description
    • Data Points, and Data Point items
  – Device Access
    • OPC, ProfiBus, Drivers
  – Alarm Handling
    • Generation, Masking, etc
  – Alarm Display, Filtering, Summarising
  – Archiving, Trending, Logging
  – User Interface Builder
  – Access Control
What is PVSS not?
• PVSS II does not have tools specifically for:
  – Abstract behaviour modelling
    • Finite State Machines
  – Automation & Error Recovery
    • Expert System
• But…
  – FSM (SMI++) does
PVSS
PVSS Features
❚ Open Architecture
❙ We can write our own managers
➨ It can be interfaced to anything (FSM, DIM)
❚ Highly Distributed
❙ 130 Systems (PCs) tested
➨ No major problem found
❚ Standard Interface
❙ All data of all sub-systems defined as DataPoints!
What is FSM?
❚ Finite State Machine (FSM)
❙ Abstract representation of your experiment. What state is it in? Is it taking data? Is it in standby? Is it broken? Is it switched off? What triggers it to move from one of these states to another?
❙ JCOP chose the State Management Interface (SMI++) developed for the DELPHI experiment.
❙ SMI = tool to build an FSM + Expert system. Vital for controlling & recovering large experiments
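The questions above (what state is it in, what moves it to another state) map directly onto a transition table. The Python sketch below uses invented state and command names; the real ones are defined per domain in the FSM guidelines.

```python
from typing import Dict, Tuple

# Hypothetical states and commands; the real ones are defined per domain.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("OFF", "SWITCH_ON"): "STANDBY",
    ("STANDBY", "START_RUN"): "TAKING_DATA",
    ("TAKING_DATA", "STOP_RUN"): "STANDBY",
    ("STANDBY", "SWITCH_OFF"): "OFF",
    ("ERROR", "RECOVER"): "STANDBY",
}


def step(state: str, event: str) -> str:
    """Return the next state; a FAULT always leads to ERROR, unknown events are ignored."""
    if event == "FAULT":
        return "ERROR"
    return TRANSITIONS.get((state, event), state)
```

For example, `step("OFF", "START_RUN")` leaves the machine in `OFF`, because that command is not allowed from that state.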
Implementation of the ECS
A mixed Win/Linux cluster, with shared resources (network disks, via SAMBA).
PCs:
– Controls PC: used to directly control some device.
– Control Room consoles: used to connect to the Controls PCs.
– General servers: gateways to the external world, etc…
The mixed cluster means: you need to master the basics of both Win and Linux.
Interfacing the HW:
– CCPC (Credit Card PC), Linux, integrated in the cluster; local intelligence on electronics boards: UKL1 and HV.
– SPECS system (in radiation areas): Antonis.
Computing Environment at IP8
Access via the gateways (lbgw for Linux, lbts for Windows).
The LHCb gateways are only visible from inside the CERN network/firewall.
Users have personal logins on the LHCb network.
Online administrators: [email protected]
The login and all computing infrastructure is common across both Linux (including CCPC) and Windows.
Note that from inside the LHCb network the external world is not, in general, accessible.
Computing Environment at IP8
There is an area set aside for common RICH software: /group/rich/ (Linux) and G:\rich (Windows).
Group-wide login profile for the Linux systems at /group/rich/scripts/rich_login.sh
See TWiki for file protection issues… (important).
The group area must only be used for files needed to run the detectors!
Remote Access to ECS PC
After logging into the LHCb network, any ECS PC can be accessed as follows.
Windows to Windows: use remote desktop.
Linux to Linux: use ssh; X sessions are not yet enabled (???) on the ECS PC.
Windows to Linux (including CCPCs):
– start the Exceed X server on the local PC; default options are normally ok: mode: passive, security: any host access, display: multiple plus display in localhost;
– logon via ssh with PuTTY; enable: X11 forwarding and X display location = localhost.
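For the Linux to Linux case, the two logins (gateway, then ECS PC) can be combined in one OpenSSH command. The helper below only builds the command line; it is a sketch under assumptions: the target host name and user are placeholders, `-J` needs OpenSSH 7.3 or later, and whether the gateway allows jump connections is not stated in this training.

```python
from typing import List


def ssh_command(user: str, target: str, gateway: str = "lbgw",
                x11: bool = False) -> List[str]:
    """Build an OpenSSH command line that reaches `target` through `gateway`."""
    argv = ["ssh"]
    if x11:
        argv.append("-X")                # request X11 forwarding
    argv += ["-J", f"{user}@{gateway}"]  # jump through the Linux gateway
    argv.append(f"{user}@{target}")
    return argv
```

The list can be passed to `subprocess.run`. X11 forwarding is off by default here, since X sessions may not be enabled on the ECS PCs.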
Other
The oper folder in the group area contains a lot of useful shortcuts for common operations.
Generic rich_shift account: must only be used when logging on to the consoles in the control room. It will be treated as scratch: for example, files stored by this user can be deleted at any time.
I strongly suggest that everybody uses their own account…
Which tools?
Web Console (health of software components).
FSM panel (routine operation).
ECS manager panel (routine debugging).
Expert on-call (routine problem fixing…).
Logbook (identify yourself only using your account!).
When everything else fails…
Which tools?
Carmelo!
Routine Checks/Operations
Such a complex system needs daily babysitting…
– many routine checks must be carried out, to identify and/or try to prevent problems.
A routine check-list is to be defined…
Everything relevant must be precisely written in the logbook: this might save you time next time and it will certainly save somebody else's time…
Write the issue, write the fix!
Every problem must be reported to the appropriate list of people.
Warnings
Always be very careful: in a distributed system non-local effects may happen!
PVSS implementation
Distributed system across Win/Linux: some PVSS projects run on Windows, some on Linux (all CCPC-related).
Projects are installed on local disks: L:\pvvs | /localdisk/pvss.
FW and RICH components are installed in the group area.
PVSS projects run as system services (Win only, so far).
The basic process is PVSS00pmon: check via TaskManager | ps.
PVSS is basically running in the background: connect to it!
Beware: PVSS is everywhere: every problem will be reflected in PVSS; this does not mean that there is a problem with PVSS!
PVSS console: shows the managers and allows controlling them.
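On Linux, the `ps` check for PVSS00pmon can be scripted. The Python sketch below is an illustration only: the function names are invented, and it simply searches the `ps` listing for the process name given above.

```python
import subprocess
from typing import List


def pmon_lines(ps_output: str, process: str = "PVSS00pmon") -> List[str]:
    """Return the lines of `ps` output whose text mentions the process name."""
    return [line for line in ps_output.splitlines() if process in line]


def pmon_running() -> bool:
    """Run `ps` (Linux) listing pid and command line, and look for PVSS00pmon."""
    out = subprocess.run(["ps", "-e", "-o", "pid=", "-o", "args="],
                         capture_output=True, text=True, check=True).stdout
    return bool(pmon_lines(out))
```

One matching line is expected per running project; no line at all means the project is not running on that machine.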
The components of ECS
Sub-Systems
– DCS MONITORING
– DCS LV and SiBias
– HV
– DAQ L0
– DAQ L1
– FSM
– Configuration DB
– Conditions DB
Interface to Gas, Cooling&Ventilation, DSS, Magnet.
ECS operation
Distributed system: all systems can talk together and exchange data.
Can do many (but not - yet - all) operations from a single machine: no need to log on the Controls PC (there are still currently many limitations!).
Some PVSS-related operations
RICH-ECS web panel (Mozilla): slide
PVSS Web Console
Normal Operations are handled via the FSM view: Antonis
Normal Debugging (also routine debug operations) is via the ECS-Manager panels: local/remote functions useful for debugging…
It complements and integrates the FSM panels; it is intended more for easy and quick access to a number of functions and tools required outside routine operation and for debugging.
- slide -
A miscellanea of panels
Normal Operation: the FSM tree
See Antonis.
Used for routine operation:
– Everything must be accessible navigating the tree.
– Everything shall go via simple FSM commands.
– To be used by LHCb shifters also: simple, clear, robust and mistake-protected.
– Normal operations, including error recovery, must not require the operator to navigate the tree nor do any complex actions.
DSS info
?
Not everything is done, nor final, nor bug-free/perfect.
We need to exercise and stress the system to spot problems which cannot be seen at the current stage…
Many things need to be finalized and the system must be stress-tested.
Reaction to alarm situations not yet complete.
Documentation not yet complete.
To do after!
All in TWiki: study
The HV control
CCPC program:
– log onto the CCPC;
– type HVSetup;
– follow the message (after having studied the instructions in TWiki).
The PVSS interface…
HV PVSS Controls
The interface to the HW is done by the CCPC program; the PVSS project is only a flexible interface to the CCPC program.
A first production version of the PVSS controls is available at the pit:
– Monitoring of the CCPC data and the ELMB voltage measurements;
– Full control of the CCPC:
  Single channel control;
  All channels control via the FSM and recipes:
  – TEST / COMMISSIONING / PHYSICS ..
– Many trace plots..
Warnings
If you make changes via the CCPC program, PVSS is confused: it does not (yet) receive read-back settings.
The FSM states are not always (yet) properly evaluated: take them with care and report issues:
– I am trying to take care of a lot of information…
– No real test outside the pit is good enough…
WARNING means: I have contradictory information, keep watching; it is often a temporary state.
Always read TWiki for updates…
Make sure not to confuse:
– The ISEG channel (0-19);
– The physical column (which the ELMB monitoring refers to).
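The meaning of WARNING (two sources that contradict each other) can be written down as a small Python sketch. The function name, the state names `ON`/`OFF` and the 50 V threshold are assumptions for illustration; the real evaluation in the PVSS project uses more information.

```python
def evaluate_hv_state(ccpc_on: bool, elmb_voltage: float,
                      on_threshold: float = 50.0) -> str:
    """Combine two sources; contradictory information yields WARNING.

    `on_threshold` (volts) is an assumed value, not taken from the real system.
    """
    elmb_on = elmb_voltage >= on_threshold
    if ccpc_on and elmb_on:
        return "ON"
    if not ccpc_on and not elmb_on:
        return "OFF"
    return "WARNING"  # sources disagree: keep watching, often temporary
```

During a ramp the CCPC may already report the channel as on while the ELMB still measures a low voltage, which is one way a temporary WARNING can arise.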
HV Controls: automatic actions
The CCPC server will switch-off in case of OvCurr:
The CCPC server will switch-off in case of (UnCurr, OvVolt, UnVolt).
Other actions must be coordinated by PVSS, if they need information not available to the CCPC.
Currently: PVSS gets information from the ELMB monitoring.
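The split of responsibilities described above can be sketched in Python. The fault names are the ones on the slide; the function names, the returned action strings and the idea that PVSS switches off on a bad ELMB reading are assumptions made for the example.

```python
from typing import Iterable

# Fault flags named on the slide.
CCPC_FAULTS = {"OvCurr", "UnCurr", "OvVolt", "UnVolt"}


def ccpc_must_switch_off(flags: Iterable[str]) -> bool:
    """True if any flag reported by a channel is one the CCPC server acts on."""
    return any(flag in CCPC_FAULTS for flag in flags)


def pvss_action(flags: Iterable[str], elmb_ok: bool) -> str:
    """Decide who acts: the CCPC on its own faults, PVSS on ELMB-only information."""
    if ccpc_must_switch_off(flags):
        return "CCPC_SWITCH_OFF"
    # Hypothetical PVSS reaction, based on information the CCPC does not have.
    return "NONE" if elmb_ok else "PVSS_SWITCH_OFF"
```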
[Figure: per-column objects. Col_0 with HV_0, EM_0 and AL_0 above the HW; Col_1 with HV_1, EM_1 and AL_1 above the HW. Labels: HV, EM.]
• Very simple objects with simple functions.
• Avoid making more complex Device Units and objects to introduce alarm handling.
TWiki
Link