C. Fernández Bedoya, C. Battilana (a.k.a. 76020, to be called by the DT FM), I. Redondo. DDU: Torino group (168681, to be called by the DT FM). DAQ/DCS: S. Ventura,


Post on 19-Jan-2018



C. Fernández Bedoya, C. Battilana (a.k.a. 76020, to be called by the DT FM), I. Redondo
DDU: Torino group (168681, to be called by the DT FM)
DAQ/DCS: S. Ventura, MarinaP (to be called by the DT FM)

DT training 07/07/

- The DT readout system is relatively complex.
- The interventions you can do are limited, but doing them correctly has an impact on data efficiency and quality.
- Your monitoring is crucial.
- Log your observations! Don't be afraid of the amount of information.
- IMHO a pro-active attitude (but safety first!) is more than half of the way.
- Talk to us in case of any doubt.
- We need your feedback.

1. A minicrate is off, would a resynch help?
2. Which of the following actions can I do on the DT readout while in running state without having to stop the run (only a resynch is needed to get back to the optimal state)?
   1. MC reconfiguration
   2. ccbrobreset on a MC
   3. SC PC reboot
3. A DDU is reporting an error to the cDAQ, can a ROS be the source of the problem?
4. I see ROS errors, can the problem come from a minicrate?
5. Build a table with the granularities of the detector: for each item below, give the part of the detector affected (or where to find it).
   Clock, DT DAQ Function Manager, TTCci partition, PVSS partition, DDU board, DDU channel, ROS board, ROS channel, 1 ROB, 1 FEB, LV PS board, HV PS board, LV PS crate, TSC board, OptoRX, 1 TRB, DDU PC, SC PC, DCS PC.

THINK ABOUT THE SYSTEM GRANULARITY

You may be called for RO issues if:
- DT is in Out of Synch -> ask for a Resynch (no need to pause).
- Resynch does not work -> ask for a DT Reconfiguration from cDAQ.
- DT cannot be configured -> call the DT FM.
- In case of an LV problem caused by a MC trip, you should try to recover the chamber once (see MarinaG's talk).
- Be aware of the Central DQM instructions: https://twiki.cern.ch/twiki/bin/view/CMS/DQMShiftDT
You should know which errors are happening routinely that week and which ones should be looked at.
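The call-handling rules above form a short escalation ladder. The following is only a minimal sketch of that ladder in Python; the enum members, `ACTIONS`, `LADDER`, `next_action` and `escalate` are hypothetical names chosen for illustration and do not come from any DT tool. The action texts are the ones on the slide.

```python
from enum import Enum, auto
from typing import Optional


class RoIssue(Enum):
    """Hypothetical labels for the read-out situations listed on the slide."""
    OUT_OF_SYNCH = auto()
    RESYNCH_FAILED = auto()
    CANNOT_CONFIGURE = auto()
    LV_TRIP_FROM_MC = auto()


# First action for each situation, worded as on the slide.
ACTIONS = {
    RoIssue.OUT_OF_SYNCH: "Ask for a Resynch (no need to pause)",
    RoIssue.RESYNCH_FAILED: "Ask for a DT Reconfiguration from cDAQ",
    RoIssue.CANNOT_CONFIGURE: "Call the DT FM",
    RoIssue.LV_TRIP_FROM_MC: "Try to recover the chamber once",
}

# The three rungs that follow one another when the previous action fails.
LADDER = (
    RoIssue.OUT_OF_SYNCH,
    RoIssue.RESYNCH_FAILED,
    RoIssue.CANNOT_CONFIGURE,
)


def next_action(issue: RoIssue) -> str:
    """Return the first action to take for the given situation."""
    return ACTIONS[issue]


def escalate(issue: RoIssue) -> Optional[RoIssue]:
    """Return the next rung if the action for `issue` did not help, else None."""
    if issue not in LADDER or issue is LADDER[-1]:
        return None
    return LADDER[LADDER.index(issue) + 1]


if __name__ == "__main__":
    step: Optional[RoIssue] = RoIssue.OUT_OF_SYNCH
    while step is not None:
        print(f"{step.name}: {next_action(step)}")
        step = escalate(step)
```

Running the module walks the ladder from the Out of Synch state to the call to the DT FM. The LV trip case is kept outside the ladder because the slide treats it as a separate, one-shot recovery.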
If they call you for noise problems, no immediate action is needed unless the noise is reducing data-taking efficiency (DDU in busy too often).

One important thing to remember: basically all the expert actions that can be done on the read-out require the RUN to be PAUSED/STOPPED (NO L1As being sent). Examples: reconfiguring a Minicrate, sending a robreset, recovering a GOL, etc.
- Inform the DT run field manager.
- Send an e-log!

- Brief description of the DT Read-Out system. Know your neighbours. What is it made of? What is its job?
- Your duties: what you should monitor on-line, what to do when called, how to search for past behaviour.
- What is normal?

The Central DAQ Empire configures and controls us (and everyone).
- For cDAQ, our 10 DDUs are instances of FEDs (Front End Drivers) that send input data to FRLs (Frontend Readout Links).
- cDAQ builds a CMS event by putting together everyone's data.
- Detector Control System for Minicrates: being conquered by cDAQ.

The Global Trigger Alliance fans out the clock and trigger signals.
- For GT we are 3 TTC (Trigger Timing Control) partitions and 10 TTS (Trigger Throttling System) signals that can block the trigger.
- GT also knows about us through the trigger chain.
- Detector Control System for Minicrates: being conquered by GT.

- Sector Collector crates (2 per wheel), shared with the DT trigger, sit in the detector cavern and hold the 60 ROS boards.
- 1 DDU crate in the USC cavern holds the 10 DDU boards (2 per wheel).
- 250 Minicrates attached to the DT chambers hold the 1500 ROBs (a few in each).

Read-out chain, from UXC55 to USC55:
- Chambers: 5 wheels, 60 sectors, 250 chambers, 660 super-layers, 1640 layers, ~ channels. The chambers are sensitive to collision muons, cosmic muons, punch-throughs and noise.
- Minicrates: 1500 ROBs, 128 channels per ROB. Time digitization: they measure the hits with 0.7 ns resolution during a 1 µs readout window from the trigger.
- Minicrate to Sector Collector link: m copper, 240 Mbps, ~16 Mbps throughput.
- Sector Collector: 60 ROS, 25 channels per ROS => 1 sector. Each ROS merges the data from one sector, converts it to optical and ships it out to USC. It monitors and enforces data quality by blocking malfunctioning or noisy channels. ~260 bytes muon event size per ROS.
- Sector Collector to DDU link: 100 m optical, 800 Mbps, ~80 Mbps throughput.
- DDU (FED): 10 DDUs, 12 channels per DDU. Each DDU currently merges the data from half a wheel (this may move to one wheel per DDU in the future) and gives it to cDAQ. It monitors data quality, talks to the Central Systems through TTS, and can block the run to ensure data quality. ~0.7 kB muon event size per DDU.
- DDU to cDAQ link: S-LINK MB/s, ~200 MB/s throughput.

Minicrate: attached to the DT chambers, it contains the first-level read-out, trigger and full chamber control electronics.
- CCB: full chamber control and monitoring (configures, sets thresholds, reads temperatures, etc.).
- CCBlink: connects the CCB to the external DT DCS system.
- TRB: searches for track segments and performs bunch identification.
- SB: performs track selection and transmits to the TSC.
- ROB: time digitization of the signals coming from the chambers.
- ROLINK: collects the outputs from the ROBs and sends them to the ROS.
New Minicrate firmware means new CCB firmware.
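The counts quoted in the read-out chain can be cross-checked with a little arithmetic, which is one way to approach the granularity exercise. A minimal sketch, using only numbers from the slides; the constant and function names are made up for this example. Two caveats: `robs_per_minicrate` is an average (the slides only say "a few in each"), and `rob_input_channels` is the total ROB input capacity, which is an upper bound and not necessarily the number of connected chamber channels.

```python
# Counts quoted in the slides.
N_WHEELS = 5
N_SECTORS = 60
N_MINICRATES = 250
N_ROB = 1500
N_ROS = 60
N_DDU = 10
CH_PER_ROB = 128  # input channels per ROB
CH_PER_ROS = 25   # input channels per ROS
CH_PER_DDU = 12   # input channels per DDU


def granularity() -> dict:
    """Derive how much of the detector sits behind each read-out unit."""
    ros_per_ddu = N_ROS // N_DDU
    return {
        "sectors_per_wheel": N_SECTORS // N_WHEELS,
        "robs_per_minicrate": N_ROB / N_MINICRATES,  # average only
        "robs_per_ros": N_ROB // N_ROS,
        "ros_per_ddu": ros_per_ddu,
        "ddu_inputs_unused": CH_PER_DDU - ros_per_ddu,
        "rob_input_channels": N_ROB * CH_PER_ROB,  # capacity, upper bound
    }


if __name__ == "__main__":
    for name, value in granularity().items():
        print(f"{name}: {value}")
```

The results are consistent with the slides: 25 ROBs per ROS matches the 25 channels per ROS, and 6 ROS per DDU (half of a 12-sector wheel) uses half of the 12 DDU inputs, which leaves room for the possible move to one wheel per DDU.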
Covered in Sandro's talk:
- Configuration database
- Temperature history
- CCB server status
- DT DAQ Function Manager status
- TTC status
- Nice DT system diagrams (computer network-centric)

DCS dashboard (tunnel into the P5 network to reach it): http://vmepcs2g16-14/dtdcs/dtdcsmon.html

More documentation (from Sandro) in this twiki: https://twiki.cern.ch/twiki/bin/view/CMS/DTshiftFAQ2010x07x06
Prepare your setup before the shift: you need an internet connection to be on call!

Check whether it is a known problem: https://twiki.cern.ch/twiki/bin/view/CMS/DTshiftKnownProblem

To find out what the particular problem is, go to the DCS_GUI and, in the configuration tab, right-click on the Minicrate with problems and select checktdc. This can be done during the run; it does not disturb data taking. It takes some time, but after a while the information appears in the box on the right. If you get any message in red, write it in the e-log; that particular Minicrate will need to be reconfigured (only the DAQ (ROB) part). At present, call the expert.
- ssh -Y ~dtdqm/DCS_GUI.sh
- Beware: many running GUIs will eat the PC memory.
- TDC errors are fixed automatically at run stop.

Alternatively, the quick buttons script should be checked. It is the standard monitoring tool.
- Remote command (linux): xterm -e "ssh -t 'ssh t './status_remote.sh'' " &
- Large events are noise that can fill the buffers!
The Status.sh script parses the ROS and DDU status web pages, which are html dumps from the xdaq programs.
- The XDAQ programs are started at DT configuration.
- If the XDAQs are running, the script automatically produces the web pages and forces autorefresh.

ROS_YB-2=http://vmepcs2g16-06.cms:40000/urn:xdaq-application:lid=11/wheelWebPage
ROS_YB-1=http://vmepcs2g16-07.cms:40000/urn:xdaq-application:lid=11/wheelWebPage
ROS_YB0=http://vmepcs2g16-08.cms:40000/urn:xdaq-application:lid=11/wheelWebPage
ROS_YB+1=http://vmepcs2g16-09.cms:40000/urn:xdaq-application:lid=11/wheelWebPage
ROS_YB+2=http://vmepcs2g16-10.cms:40000/urn:xdaq-application:lid=11/wheelWebPage
DDU=http://vmepcs2g18-16.cms:40000/urn:xdaq-application:lid=11/wheelWebPage

What the errors mean (in order of importance): https://twiki.cern.ch/twiki/bin/viewauth/CMS/DTshiftFAQ2010x05x21

ROS board errors (may affect 1 sector). In both cases you will probably lose one sector. These errors are only cleared after a Reconfiguration (if possible). Rare, but it happens.
- FPGA programmed: should be 0x1F.
- GOL & QPLL: if different from 0x28, 0x29 or 0x2A, there is a problem either with the clock distribution or with the optical transmitter (GOL).

ROS input channel errors (may affect one or more ROBs).
Errors that block the channel. If the source of the error is transient (e.g. an RPC trip), the channels can be recovered with a resynch.
- MBXX Channels Timed Out:
- MBXX Channels have Unlock: LV trip? Noise?
- MBXX Fifos Full Flag: Noise?
- MBXX Max Words Reached: Noise?
Errors that may block the channel:
- MBXX EventID Misalignment Err.:
- MBXX Fifos PAF Flag Reg.: Noise?

TTS (Trigger Throttling System)
The DDU can fire a TTS action that is received by the Central DAQ/Trigger system, and you could be called.
- Warning Overflow, Busy: either the amount of data or the trigger rate is too high and the buffers are close to filling up. The problem may be in our system (we have a lot of noise) or in the DAQ part (for some reason their FRL module is having problems). The usual action is to verify the trigger rates and the noise in the system.
- Out-of-synch: usually related to an event misalignment between ROS and DDU, or due to many ROS errors. You can ask the Central DAQ shifter for a Resynch command (no need to pause), which should clear the errors if they are not persistent. Different registers keep track of what has happened.
Note: a resynch is cheap in downtime. To reconfigure the DTs, on the other hand, the run has to be stopped, which may upset other systems and cause large downtime. Exercise careful judgement during physics running.

What is normal?
- The DT Shift e-log is the place to log the status of the detector and studies. Do not forget to follow the shift e-log and read the commissioning hypernews once per day.
- Savannah tickets for problems: https://savannah.cern.ch/projects/dtproblems/
- The wave has a lot of expert (sometimes random) discussion: https://wave.google.com/ (cmsdtonline user)
- Run summary in WBM: https://cmswbm.web.cern.ch/cmswbm/RunSummary.html
- Directly in DQM or WBM.

Private DT DQM
- Data integrity contains summaries, within the run, of the ROS/DDU status registers.
- ROchannels has a trend of the enabled channels within the run.
The ROchannels trend is useful to check whether a resynch worked.

Central DQM online: https://cmsweb.cern.ch/dqm/online/session/QpBuQQ

Examples in Quick Collection (DT/00-DataIntegrity and DT/00-DataIntegrity/FEDXX/ROSXX):
- Chamber LV OFF: YB0 S4 MB2, 6 ROBs not read.
- ROB-ROS link problem: YB-1 S11 MB4 ROB 23, 1 ROB not read.

VERY IMPORTANT: ROS MISSING!!! Seen in DT/00-DataIntegrity.

Running MiniDAQ is trivial (when it works). It is important for us to detect when it gets broken. A MiniDAQ random/cosmics run once a week at a suitable time (consult the DT RFM, who could teach you the first time) ensures that the tool is available to us when needed (i.e. access). Then you could help us from upstairs. Ideally also TP, but first things first.
Tips:
- cDAQ has to destroy us before we can initialize MiniDAQ.
- The LTC xdaq used for the LTC rate monitor cannot be running. It should be killed by hand; this FAQ helps: https://twiki.cern.ch/twiki/bin/view/CMS/DTshiftFAQ2009x07x01
- Advanced documentation: not happening.

The dduMonitor.py script parses the DDU status web pages (html dumps from the xdaq programs): https://twiki.cern.ch/twiki/bin/view/CMS/DTexpertFAQ2009x12x04
- dduMonitor.py -r also displays information for past runs from the DB.
- If Status.sh is running, it should take care of loading the htmls when the DT FM is destroyed.

[Diagram: DT system overview. Hardware: chamber and Minicrate; SC crate with the SC (TOP: sectors 1-6, BOTTOM: sectors 7-12); DDU; PHTF, WS, BS; fibre and copper links. Software: PVSS program (high and low voltage); DCS program (Minicrate communication); TTCci program (Clk, L1A; TTCci +, TTCci 0, TTCci -); TS (Trigger Supervisor; TSC and DTTF configuration and monitoring); TSC supervisor; DAQ program (ROS configuration and monitoring); DAQ program (DDU configuration and monitoring); GT (Global Trigger). Host shown: vmepcs2b16-09.]

WBM: https://cmswbm.web.cern.ch/cmswbm/
- Browse runs
- Browse runs within fills
- Search for downtimes

- Every 2 minutes, the content of the ROS and DDU registers is stored.
- Errors at the end of the run.
- At the bottom, also the status of LV and HV.
https://cmswbm.web.cern.ch/cmswbm/cmsdb/servlet/DTSummary?MAIN=1&FUNCTION=RUNSUMMARY&RUN=136091
Easy browsing:
- Change the run number in the URL.
- Navigate from the RunSummary link.

TDC error: not critical, but it is advisable to reconfigure the Minicrates when possible. Don't worry.
- Seen in DT/00-DataIntegrity/DataIntegrityTDCSummary and DT/00-DataIntegrity/FEDXX/ROSXX.
- Also seen in the DCS GUI.
- DCS xdaq v6 automatically reconfigures MCs with a TDC error at run stop.
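When searching past behaviour, the ROS board register values quoted earlier (FPGA programmed should be 0x1F; GOL & QPLL should be 0x28, 0x29 or 0x2A) can be checked mechanically over the register snapshots stored every 2 minutes. The following is a minimal sketch under stated assumptions: the snapshot format (a list of dicts with the keys `fpga_programmed` and `gol_qpll`) and the function names are hypothetical, and do not describe how WBM or the DT scripts actually expose the data.

```python
from typing import Dict, List, Optional, Tuple

# Expected values, as quoted in the error description.
FPGA_PROGRAMMED_OK = 0x1F
GOL_QPLL_OK = {0x28, 0x29, 0x2A}


def ros_board_problems(fpga_programmed: int, gol_qpll: int) -> List[str]:
    """Return a list of problems for one ROS board (empty if all is well)."""
    problems = []
    if fpga_programmed != FPGA_PROGRAMMED_OK:
        problems.append(
            f"FPGA programmed is 0x{fpga_programmed:02X}, should be 0x1F"
        )
    if gol_qpll not in GOL_QPLL_OK:
        problems.append(
            f"GOL & QPLL is 0x{gol_qpll:02X}: clock distribution or "
            "optical transmitter (GOL) problem"
        )
    return problems


def first_bad_snapshot(
    snapshots: List[Dict[str, int]],
) -> Optional[Tuple[int, List[str]]]:
    """Return (index, problems) of the first snapshot with a problem, or None.

    Assumes the snapshots are ordered in time (hypothetical input format).
    """
    for index, snap in enumerate(snapshots):
        problems = ros_board_problems(snap["fpga_programmed"], snap["gol_qpll"])
        if problems:
            return index, problems
    return None


if __name__ == "__main__":
    history = [
        {"fpga_programmed": 0x1F, "gol_qpll": 0x28},
        {"fpga_programmed": 0x1F, "gol_qpll": 0x2B},
    ]
    print(first_bad_snapshot(history))
```

If the snapshots really are 2 minutes apart and start with the run, the returned index times 2 minutes gives a rough time into the run at which the board went bad. Since these errors are only cleared by a Reconfiguration, every later snapshot in the same run is expected to show the same problem.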