DCC Out of Sync Problems Stan Durkin, Ohio State

Page 1

DCC Out of Sync Problems

Stan Durkin, Ohio State

Page 2

In recent high-rate cosmic runs (July 18-23, 2010), DCCs have gone into an Out-of-Sync condition 7 times.

Counts per FMM (W = Warning, B = Busy, S = Out-of-Sync, E = Error):

FMM    W      B     S    E
750    82     28    1    0
752    0      0     0    0
754    1023   14    6    0
756    107    33    2    0

Analyze study run 141291 (specifically 490 s to 540 s).

4,230,000 events through each RUI; 5102 events in CMSSW data (~0.1% of events saved).

Rate (from slopes): 79.5 kHz

[Plot: L1As vs. time (seconds)]
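Both numbers on this slide can be cross-checked in a few lines. This is a minimal sketch, assuming the quoted rate is the least-squares slope of cumulative L1A count versus time; `saved_fraction` and `rate_from_slope` are hypothetical helpers, and the sample points in the second call are synthetic, not run data.

```python
def saved_fraction(saved_events: int, total_events: int) -> float:
    """Fraction of the events seen by an RUI that ended up in CMSSW data."""
    return saved_events / total_events


def rate_from_slope(times_s: list[float], l1a_counts: list[float]) -> float:
    """Least-squares slope of cumulative L1A count vs. time, in Hz."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_c = sum(l1a_counts) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_s, l1a_counts))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


# Numbers from the slide: 5102 of 4,230,000 events saved.
print(f"saved: {100 * saved_fraction(5102, 4_230_000):.2f} %")  # saved: 0.12 %

# Synthetic, perfectly linear counts at 79.5 kHz (illustration only).
print(rate_from_slope([0.0, 1.0, 2.0, 3.0], [0, 79_500, 159_000, 238_500]))  # 79500.0
```

The saved fraction comes out at about 0.12%, consistent with the "~0.1%" on the slide.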

Page 3

DCC FIFO Overflows at High Data Rates

SLINK FIFO: 1 MB

Input_FIFO: 248 KB

CSC DCC and DDU headers carry FMM information.

Page 4

CSC DCC sTTS state machine:

- SLINK_FIFO goes to Half_Full: set WARNING.
- SLINK_FIFO drops back to Almost_Empty: reset WARNING.
- IN_FIFO goes to Half_Full and L1A Buffer in WARNING: set BUSY.
- IN_FIFO goes to Half_Full but SLINK_FIFO not in WARNING: set WARNING.
- IN_FIFO stays Half_Full for more than 3.2 ms: set BUSY.
- IN_FIFO reaches Almost_Full: set Out_Of_Sync.
- IN_FIFO or SLINK_FIFO reaches Full: set Out_Of_Sync.
- L1A Buffer > 1536: set WARNING; reset WARNING when it drops to 1280.
- L1A Buffer > 1920: set BUSY; reset BUSY when it drops to 1536.
- L1A Buffer > 2016: set Out_Of_Sync.

- WARNING and BUSY stop L1A triggers (latency ~1 sec).
- Out_Of_Sync stops the run for a resync.
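The rules above can be sketched as a small state model. This is an illustration under stated assumptions, not the DCC firmware: the FIFO flags are collapsed into a hypothetical ordered `Level` enum, states are ranked READY < WARNING < BUSY < OUT_OF_SYNC (ranks only, not the hardware sTTS codes), "drops to N" is read as "reaches N or below", and Out_Of_Sync is not latched until a resync. `Level`, `TTS` and `DccTTS` are hypothetical names.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered encoding of the FIFO status flags."""
    EMPTY = 0
    ALMOST_EMPTY = 1
    PARTIAL = 2  # between Almost_Empty and Half_Full
    HALF_FULL = 3
    ALMOST_FULL = 4
    FULL = 5


class TTS(IntEnum):
    """sTTS states ranked by severity (ranks only, not hardware codes)."""
    READY = 0
    WARNING = 1
    BUSY = 2
    OUT_OF_SYNC = 3


class DccTTS:
    """Sketch of the DCC sTTS rules (assumed reading, not the firmware)."""

    HALF_FULL_TIMEOUT_S = 3.2e-3

    def __init__(self) -> None:
        self.slink_warning = False
        self.l1a_warning = False
        self.l1a_busy = False
        self.in_half_since = None  # time IN_FIFO reached Half_Full

    def update(self, t: float, slink: Level, in_fifo: Level, l1a_count: int) -> TTS:
        # SLINK_FIFO: WARNING at Half_Full, released only at Almost_Empty.
        if slink >= Level.HALF_FULL:
            self.slink_warning = True
        elif slink <= Level.ALMOST_EMPTY:
            self.slink_warning = False

        # L1A buffer thresholds, each with hysteresis.
        if l1a_count > 1536:
            self.l1a_warning = True
        elif l1a_count <= 1280:
            self.l1a_warning = False
        if l1a_count > 1920:
            self.l1a_busy = True
        elif l1a_count <= 1536:
            self.l1a_busy = False

        # Remember when IN_FIFO first reached Half_Full.
        if in_fifo >= Level.HALF_FULL:
            if self.in_half_since is None:
                self.in_half_since = t
        else:
            self.in_half_since = None

        state = TTS.READY
        if self.slink_warning or self.l1a_warning:
            state = TTS.WARNING
        if self.l1a_busy:
            state = TTS.BUSY
        if in_fifo >= Level.HALF_FULL:
            if self.l1a_warning:
                state = max(state, TTS.BUSY)
            elif not self.slink_warning:
                state = max(state, TTS.WARNING)
            if t - self.in_half_since > self.HALF_FULL_TIMEOUT_S:
                state = max(state, TTS.BUSY)
        # Almost_Full also covers Full for IN_FIFO.
        if in_fifo >= Level.ALMOST_FULL or slink >= Level.FULL or l1a_count > 2016:
            state = TTS.OUT_OF_SYNC
        return state


fsm = DccTTS()
print(fsm.update(0.000, Level.EMPTY, Level.HALF_FULL, 0).name)    # WARNING
print(fsm.update(0.004, Level.EMPTY, Level.HALF_FULL, 0).name)    # BUSY
print(fsm.update(0.005, Level.EMPTY, Level.ALMOST_FULL, 0).name)  # OUT_OF_SYNC
```

Taking the maximum over the individual conditions keeps each rule independent, so a rule can be changed without reordering the others.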

Page 5

FMM Log 141491

t (s)            dt (s)         FMM
139.384429875    0.436721600    1
139.386232725    0.001802850    8
140.119162225    0.732929500    1
140.120998750    0.001836525    8
144.130565900    4.009567150    1
144.132397975    0.001832075    8
146.057188825    1.924790850    1
146.058872650    0.001683825    8
148.779290350    2.720417700    1
148.781143125    0.001852775    8
152.496441950    3.715298825    1
152.498013425    0.001571475    8
152.817810300    0.319796875    1
152.819979975    0.002169675    8
153.590204650    0.770224675    1
153.592016100    0.001811450    8
154.189867650    0.597851550    1
154.191494650    0.001627000    8
… repeats 90 times …
191.300884525    0.001097700    8
191.301140075    0.000255550    1
191.303430625    0.002290550    2

FMM Throttling Seems to be Working

[Plot: time FMM 1 asserted (msec); typical value ~1.8 msec]

Transition FMM 1 -> 2: 2.290±0.005 msec
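The warning durations can be read off the FMM log mechanically. A minimal sketch, assuming the FMM values follow the standard CMS sTTS 4-bit encoding (8 = Ready, 4 = Busy, 2 = Out-of-Sync, 1 = Warning, 12 = Error), which the slide does not spell out; only the rows printed in the log are used, and the head and tail are kept apart because of the elided middle. `warning_spans` is a hypothetical helper.

```python
# Assumed sTTS encoding (standard CMS 4-bit codes; not stated on the slide).
STTS = {8: "Ready", 4: "Busy", 2: "Out_Of_Sync", 1: "Warning", 12: "Error"}

# Rows copied from the FMM log: (t in s, FMM code).
LOG_HEAD = [
    (139.384429875, 1), (139.386232725, 8),
    (140.119162225, 1), (140.120998750, 8),
    (144.130565900, 1), (144.132397975, 8),
    (146.057188825, 1), (146.058872650, 8),
    (148.779290350, 1), (148.781143125, 8),
    (152.496441950, 1), (152.498013425, 8),
    (152.817810300, 1), (152.819979975, 8),
    (153.590204650, 1), (153.592016100, 8),
    (154.189867650, 1), (154.191494650, 8),
]
LOG_TAIL = [(191.300884525, 8), (191.301140075, 1), (191.303430625, 2)]


def warning_spans(log):
    """Return (duration in s, name of next state) for every Warning entry."""
    spans = []
    for (t0, code0), (t1, code1) in zip(log, log[1:]):
        if code0 == 1:
            spans.append((t1 - t0, STTS.get(code1, f"unknown({code1})")))
    return spans


head = warning_spans(LOG_HEAD)
mean_ms = 1e3 * sum(d for d, _ in head) / len(head)
print(f"{len(head)} warnings, mean {mean_ms:.2f} ms")  # 9 warnings, mean 1.80 ms
duration, final = warning_spans(LOG_TAIL)[-1]
print(f"last warning: {1e3 * duration:.2f} ms, then {final}")  # last warning: 2.29 ms, then Out_Of_Sync
```

Every warning in the printed head returns to Ready after about 1.8 ms, while the last one lasts 2.29 ms and ends in Out-of-Sync, matching the two numbers quoted on the slide.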

Page 6

Data Rates Aren’t Large Enough to Cause Overflows

Average event sizes:

RUI    Size (bytes)
750    884
751    993
752    861
753    1129
754    843
755    1163
756    821
757    988

L1A rate: 78.5 kHz, i.e. ~78.5 MB/s

[Plot: theoretical probability of >50 events in queue, Log10(P) vs. rate (MB/s)]

SLINK FIFO: 1 Mbyte (diagram labels: 600 MB/s, 480 MB/s)

To fill the SLINK FIFO in 2.29 msec requires >200 MB/s, even if the output is stopped.
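The arithmetic behind this slide can be sketched as follows. Assumptions: 1 Mbyte is read as 10^6 bytes; the 2.29 msec interval is the Half_Full (WARNING) to Full (Out_Of_Sync) interval, so only half the FIFO has to fill, which gives ~218 MB/s and matches the ">200 MB/s" bound; and for the queue probability, a textbook M/M/1 model with a 480 MB/s output (one of the diagram labels) is swapped in. That model is an illustration, not necessarily the one behind the plot.

```python
import math

# Average event sizes per RUI in bytes, from the slide.
EVENT_BYTES = {750: 884, 751: 993, 752: 861, 753: 1129,
               754: 843, 755: 1163, 756: 821, 757: 988}
L1A_RATE_HZ = 78.5e3
SLINK_FIFO_BYTES = 1_000_000   # assumption: 1 Mbyte = 10**6 bytes
WARN_TO_OOS_S = 2.29e-3        # FMM 1 -> 2 transition time from the FMM log
OUTPUT_MB_S = 480.0            # assumption: output bandwidth (diagram label)


def mb_per_s(n_bytes: float, seconds: float) -> float:
    """Data rate needed to move n_bytes in the given time, in MB/s."""
    return n_bytes / seconds / 1e6


def mm1_tail(rho: float, n: int) -> float:
    """P(more than n events in system) for an M/M/1 queue, utilisation rho < 1."""
    return rho ** (n + 1)


per_rui = {rui: size * L1A_RATE_HZ / 1e6 for rui, size in EVENT_BYTES.items()}
worst = max(per_rui.values())
print(f"largest per-RUI rate: {worst:.1f} MB/s")  # largest per-RUI rate: 91.3 MB/s
# Half_Full -> Full means only half of the FIFO has to fill.
print(f"to fill half in 2.29 ms: {mb_per_s(SLINK_FIFO_BYTES / 2, WARN_TO_OOS_S):.0f} MB/s")  # 218 MB/s
print(f"to fill all in 2.29 ms:  {mb_per_s(SLINK_FIFO_BYTES, WARN_TO_OOS_S):.0f} MB/s")  # 437 MB/s
print(f"log10 P(>50 queued) ~ {math.log10(mm1_tail(worst / OUTPUT_MB_S, 50)):.1f}")
```

Even the largest RUI rate is well below the fill rate needed, which is the point the slide makes.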

Page 7

60 Events in Run 141491 CMSSW data show bad transmission

1960 826d bc50 bc500000 8000 bc50 bc500080 0000 bc50 bc508000 8000 bc50 bc500000 0000 bc50 bc500080 2c1e bc50 bc50c0de c000 bc50 bc50 1560 826d 6d0f 50800000 8000 0001 80000080 0000 1014 3f7f8000 8000 ffff 80000000 0000 0000 20000080 2210 0006 a000

Two independent 3.2 Gbit/s links

Good data vs. bad data (0xBC50 idle code)

Transfer problem on 3.2 Gbit/s backplane
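Bad transmissions can be flagged by looking for the idle code inside the event payload. A minimal sketch, assuming the dump is read as 16-bit hex words and that 8-digit tokens in this transcript (e.g. `bc500000`) are two words run together; 0xBC50 is presumably the 8b/10b K28.5/D16.2 idle ordered set sent by the transceiver between packets. `parse_words` and `idle_positions` are hypothetical helpers.

```python
IDLE_WORD = 0xBC50  # idle code seen in the bad events


def parse_words(dump: str) -> list[int]:
    """Split a hex dump into 16-bit words; 8-digit tokens are split in two."""
    words = []
    for tok in dump.split():
        if len(tok) % 4:
            raise ValueError(f"unexpected token {tok!r}")
        words.extend(int(tok[i:i + 4], 16) for i in range(0, len(tok), 4))
    return words


def idle_positions(words: list[int]) -> list[int]:
    """Indices at which the idle code appears inside the event data."""
    return [i for i, w in enumerate(words) if w == IDLE_WORD]


fragment = "1960 826d bc50 bc500000 8000"  # start of the dump on this slide
print(idle_positions(parse_words(fragment)))  # [2, 3]
```

An event with any hit would be counted as a bad transmission.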

Page 8

f308 7342 76b2 516401f0 5ae0 0e36 d9001960 734d 5064 c0de0000 8000 8000 76b20080 0000 3f7f 00018000 8000 8000 10140000 0000 2000 ffff0080 be16 a000 0000c0de c000 c000 00061960 86bd 5064 c0de0000 8000 8000 76b30080 0000 3f7f 00018000 8000 8000 10140000 0000 2000 ffff0080 2a10 a000 0000c0de c000 c000 00061960 916d 5064 c0de0000 8000 8000 76b40080 0000 3f7f 00018000 8000 8000 10140000 0000 2000 ffff0080 5039 a000 0000c0de c000 c000 00061960 960d 5064 c0de

How do we prove these events are causing the problem?

last column shift

Viewed several hundred bad-transmission events. Only a small number of DDU->DCC links gave problems.

RUI    DDU    Bad events
755    25     most
757    33     many
751    7      a few
751    3      a few
756    35     one
755    16     one

We will swap DDU 25 and see if the problems go away.
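The link survey above amounts to a frequency count over bad events. A minimal sketch, assuming each bad event has already been attributed to an (RUI, DDU) link; `rank_links` and the sample list are hypothetical and are not the counts behind the table.

```python
from collections import Counter


def rank_links(bad_events):
    """bad_events: iterable of (rui, ddu) pairs, one per bad-transmission event.

    Returns ((rui, ddu), count) pairs, most frequent link first."""
    return Counter(bad_events).most_common()


# Hypothetical attributions, for illustration only.
sample = [(755, 25)] * 5 + [(757, 33)] * 3 + [(751, 7)]
print(rank_links(sample))  # [((755, 25), 5), ((757, 33), 3), ((751, 7), 1)]
```

Rerunning the same count after swapping a board shows whether the problem follows the board or stays with the slot.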

Page 9

Possible Remedies to Problem

• Fix problem boards

• Reconfigure Xilinx RocketIOs

Channel bonding: lock-step data transmission; 16-bit -> 32-bit transfers keep data packets together

• Change Clock Frequency in Firmware (divide by 2)

We don’t need 800 Mbyte/s.
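The 800 Mbyte/s figure follows from two 3.2 Gbit/s links: 2 × 3.2 Gbit/s / 8 bits per byte. A minimal sketch of the headroom after dividing the clock by 2; the optional `coding_efficiency=0.8` factor assumes 8b/10b line coding, and `link_bandwidth_mb_s` is a hypothetical helper.

```python
def link_bandwidth_mb_s(links: int, gbit_per_s: float, divide: int = 1,
                        coding_efficiency: float = 1.0) -> float:
    """Aggregate bandwidth in MB/s.

    coding_efficiency=0.8 models 8b/10b line coding (assumption)."""
    return links * gbit_per_s * 1e3 / 8 / divide * coding_efficiency


print(link_bandwidth_mb_s(2, 3.2))            # raw rate, both links
print(link_bandwidth_mb_s(2, 3.2, divide=2))  # clock divided by 2
print(link_bandwidth_mb_s(2, 3.2, divide=2, coding_efficiency=0.8))  # payload if 8b/10b
```

Even the halved payload figure (~320 MB/s) stays well above the largest per-RUI rate implied by the event sizes on the data-rate slide (1163 bytes at 78.5 kHz, about 91 MB/s).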

This is not urgent. We will proceed with caution.