HLT architecture

Upload: emiko

Post on 02-Feb-2016


TRANSCRIPT

Page 1: HLT architecture

HLT architecture

Page 2: HLT architecture

TPC FEE

• Detector: gating grid, anode wire, pad plane; drift region 88 µs; 570132 pads
• L1: 5 µs, 200 Hz
• L2: < 100 µs, 200 Hz
• FEC (Front End Card) - 128 channels (close to the readout plane)
• Signal chain: PASA, ADC, digital circuit, RAM
– PASA: CSA, semi-Gaussian shaper; 1 MIP = 4.8 fC; S/N = 30 : 1; dynamic = 30 MIP; gain = 12 mV / fC; FWHM = 190 ns
– ADC: 10 bit, < 10 MHz
– Digital circuit: baseline corr., tail cancell., zero suppr.
– RAM: multi-event memory
• Custom IC (CMOS 0.35 µm): 8 chips x 16 ch / chip
• Custom IC (CMOS 0.25 µm): 8 chips x 16 ch / chip
• DDL (4096 ch / DDL)
• Power consumption: < 40 mW / channel
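The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes every FEC and every DDL is fully populated, which the slide does not state, so the card and link counts are lower bounds and the power figure is an upper bound. All variable names are illustrative.

```python
import math

# Numbers quoted on the slide
pads = 570132                    # readout pads (one channel each)
channels_per_fec = 128           # channels per Front End Card
channels_per_ddl = 4096          # channels per DDL
max_power_per_channel_w = 0.040  # "< 40 mW / channel"

# Lower bounds, assuming every card and every link is fully populated
min_fecs = math.ceil(pads / channels_per_fec)
min_ddls = math.ceil(pads / channels_per_ddl)
fecs_per_ddl = channels_per_ddl // channels_per_fec

# Upper bound on front-end power from the per-channel limit
max_power_kw = pads * max_power_per_channel_w / 1000

print(min_fecs, min_ddls, fecs_per_ddl, round(max_power_kw, 1))
```

The ratio of 4096 to 128 means one DDL serves 32 front-end cards when fully loaded.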

Page 3: HLT architecture
Page 4: HLT architecture

TPC electronics: ALICE TPC READOUT CHIP (ALTRO)

[Figure: Digital tail cancellation performance. Upper panel: filter input and threshold. Lower panel, "Filtered data and fixed threshold": filter output and threshold. Vertical axes: ADC counts (-50 to 200); horizontal axis: time samples (170 ns), 0 to 700.]

[Block diagram: ALTRO processing chain]

• ADC: 10-bit, 20 MSPS
• Adaptive Baseline Correct. I: 11-bit CA2 arithmetic
• Tail Cancel.: 18-bit CA2 arithmetic
• Adaptive Baseline Correct. II: 11-bit arithmetic
• Data Format.: 40-bit format
• Multi-Event Memory: 40-bit format
• Sampling clock 20 MHz, readout clock 40 MHz
• Digital processor & control logic, 8 ADCs, memory
• 0.25 µm (ST), area: 64 mm2
• Power: 29 mW / ch
• SEU protection
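The idea of comparing baseline-corrected data with a fixed threshold can be illustrated in a few lines. This is a minimal sketch, not the ALTRO algorithm: it assumes a constant, known baseline and omits tail cancellation entirely. The function name `zero_suppress` and the sample values are hypothetical.

```python
def zero_suppress(samples, baseline, threshold):
    """Keep only runs of baseline-subtracted samples above a fixed threshold.

    Returns a list of (start_index, values) pairs, one per surviving run.
    Assumes a constant baseline (an illustration, not the adaptive scheme).
    """
    runs = []
    current = None
    for index, sample in enumerate(samples):
        value = sample - baseline
        if value > threshold:
            if current is None:
                current = (index, [])
                runs.append(current)
            current[1].append(value)
        else:
            current = None  # run ended; the next hit starts a new one
    return runs


# Hypothetical ADC counts: one three-sample pulse and one single-sample pulse
adc_counts = [50, 51, 50, 80, 120, 90, 52, 50, 50, 70, 49]
runs = zero_suppress(adc_counts, baseline=50, threshold=5)
print(runs)
```

Storing the start index with each run keeps the time information that would otherwise be lost when the empty samples are dropped.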

Page 5: HLT architecture

Data compression: Entropy coder

• Variable Length Coding
– short codes for frequent values
– long codes for infrequent values

[Figure: Probability distribution of 8-bit TPC data]

Results:
– NA49: compressed event size = 72%
– ALICE: compressed event size = 65%

(Arne Wiebalck, diploma thesis, Heidelberg)
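Variable length coding of this kind is commonly built as a Huffman code. The sketch below is a textbook construction in Python, given only to show how frequent values end up with short codes. It is not the coder from the thesis, and the sample distribution is made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code(samples):
    """Build a prefix-free code: frequent values get short bit strings."""
    freq = Counter(samples)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie_breaker, {value: code_so_far})
    heap = [(count, i, {value: ""}) for i, (value, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count0, _, codes0 = heapq.heappop(heap)
        count1, _, codes1 = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in codes0.items()}
        merged.update({v: "1" + c for v, c in codes1.items()})
        heapq.heappush(heap, (count0 + count1, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical, strongly peaked distribution of 8-bit values
samples = [0] * 60 + [1] * 20 + [2] * 10 + [3] * 6 + [4] * 4
codes = huffman_code(samples)
encoded_bits = sum(len(codes[s]) for s in samples)
ratio = encoded_bits / (8 * len(samples))
print(codes, encoded_bits, ratio)
```

The compression ratio depends entirely on how peaked the distribution is, so the ratio printed here says nothing about the NA49 or ALICE figures quoted above.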

Page 6: HLT architecture

TPC - RCU

Page 7: HLT architecture

RCU design – control flow

• State machines

[Diagram labels:]
– RCU resource & priority manager
– TTCrx
– FEE bus controller
– SIU controller
– DDL command decoder
– FEESC
– Slow control
– Watch dog: health agent
– Debugger
– PCI core
– Huffman encoder

Page 8: HLT architecture

RCU design - data flow

• Shared memory modules

[Diagram labels:]
– TTCrx registers
– Event memory
– Event fragment pointer list
– Configuration memory
– TTC controller
– FEE bus controller (appears three times)
– Slow control
– SIU controller
– fifo
– SIU
– Huffman encoder

Page 9: HLT architecture

Data compression: TPC - RCU

• TPC front-end electronics system architecture and readout controller unit.
• Pipelined Huffman Encoding Unit, implemented in a Xilinx Virtex 50 chip*

* T. Jahnke, S. Schoessel and K. Sulimma, EDA group, Department of Computer Science, University of Frankfurt

Page 10: HLT architecture

RCU prototypes

• Prototype I
– Commercial OEM-PCI board
– FEE-board test (ALTRO + FEE bus)
– SIU integration
– Qtr 3, 2001 – Qtr 2, 2002
• Prototype II
– Custom design
– All functional blocks
– PCB: Qtr 2, 2002
– Implementation of basic functionality (FEE-board -> SIU): Qtr 2, 2002
– Implementation of essential functionality: Qtr 4, 2002
• Prototype III
– SRAM FPGA -> masked version or Antifuse FPGA (if needed)
• RCU production
– Qtr 2, 2003

Page 11: HLT architecture

RCU prototype I

• Commercial OEM-PCI board
– ALTERA FPGA APEX EP20K400
– SRAM 4 x 32k x 16 bits
– PMC I/O connectors (178 pins)
– Buffered I/O (72 pins)

Page 12: HLT architecture

RCU prototype I

• Implementation of basic test functionality
– FEE-board test (ALTRO + FEE bus)
– SIU integration

[Diagram labels: PCI core, SIU card, PCI bus, FPGA APEX20k400, internal SRAM, I/O, onboard SRAM 4 x 32k x 16, FLASH EEPROM, FEE-bus daughter board, PMC, FEE boards, trigger]

Page 13: HLT architecture

RCU prototype II

• Implementation of essential functionality
– Custom design
– All functional blocks

[Diagram labels: PCI core, SIU-CMC interface, SIU, PCI bus, FPGA, internal SRAM, Memory D32 > 2 MB, FLASH EEPROM, SC, TTC, FEE-bus]

Page 14: HLT architecture

RCU prototype II - schematics

[Schematic labels: connectors JN1, JN2, JN2A, JN3, JN4, JN5; APEX; Flash (x3); SRAM (x8); SDRAM; Power (1.8 V gen.); CIA; miscellaneous; Connectors]

Page 15: HLT architecture

RCU prototype II – RCU mezzanine

RCU Mezzanine Card

Components on top side

No maximum height restriction

Front-End Bus Conn 1

Front-End Bus Conn 2

Page 16: HLT architecture

RCU prototype II - schematics

[Schematic labels: connectors JN1, JN2, JN2A, JN3, JN4, JN5; APEX; Flash (x3); SRAM (x8); SDRAM; Power (1.8 V gen.); CIA; miscellaneous; Connectors]

SIU / DIU mezzanine card (1/2 CMC)

RCU Mezzanine Card

Components on top side

No maximum height restriction

Front-End Bus Conn 1

Front-End Bus Conn 2

Page 17: HLT architecture

Programming model

• Development version – status December 2001

[Diagram labels:]
– PC: LINUX RH7.1 (2.4.2)
– PCI-tools, RCU-API, device driver
– PLDA board
– PCI core, mailbox memory
– SIU controller
– FEE bus controller
– FEE bus
– ALTRO emulator (appears twice)
– SIU
– DDL

Page 18: HLT architecture

SIU-RORC integration

[Diagram labels:]

RCU prototype I
– SIU controller, PCI core, SIU interface
– PCI bus, FPGA, SRAM
– LINUX/NT, PLDA/PCI-tools, RCU-API, device driver
– SIU

pRORC
– PCI bridge, glue logic, DIU interface
– PCI bus
– LINUX, DDL/PCI-tools, pRORC-API, device driver
– DIU

DDL

Page 19: HLT architecture

SIU-RORC integration

• Result

[Diagram: data and control flow]
– Data path labels: PC1 memory block, RCU internal SRAM, SIU, DDL, DIU, PC2 "bigphys" memory area
– PC1: write memory block to FPGA internal SRAM
– SIU controller: wait for READY-TO-RECEIVE
– PC2: allocate bigphys area, init link + pRORC
– PC2: send DDL-FEE command READY-TO-RECEIVE
– SIU controller: strobe data into SIU
– pRORC: copy data into bigphys area via DMA

Page 20: HLT architecture

RCU system for TPC test 2002

[Diagram labels:]

RCU prototype II/I
– FEE-bus controller, SIU controller, PCI core, SIU interface
– PCI bus, FPGA, SRAM, ext. SRAM, FLASH
– Manager
– FEE-bus, Trigger, FEE-boards
– LINUX RH7.x, DATE, PLDA/PCI-tools, RCU-API, device driver
– SIU

pRORC
– PCI bridge, glue logic, DIU interface
– PCI bus
– LINUX RH7.x, DATE, DDL/PCI-tools, pRORC-API, device driver
– DIU

DDL

Page 21: HLT architecture

Programming model

• TPC test version – summer 2002

[Diagram labels:]
– PC: LINUX RH7.1 (2.4.2)
– DATE, FEE configurator
– PCI-tools, RCU-API, device driver
– Prototype II (Prototype I)
– PCI core, mailbox memory
– RCU resource & priority manager
– SIU controller
– FEE bus controller
– FEE bus
– FEE boards
– SIU
– DDL

Page 22: HLT architecture

TPC PCI-RORC

[Diagram labels: PCI bridge, glue logic, DIU-CMC interface, DIU card, PCI bus, FPGA coprocessor, internal SRAM, Memory D32 2 MB (x2), FLASH EEPROM]

Page 23: HLT architecture

HLT architecture overview

• Not a specialized computer, but a generic large scale (>500 node) multi processor cluster
• A few nodes have additional hardware (PCI RORC)
• Has to be operational in off-line mode also
• Use of commodity processors
• Use of commodity networks
• Reliability and fault tolerance is mandatory
• Use standard OS (Linux)
• Use of on-line disks as mass storage

[Diagram labels: Optical Links to Front-End; Receiver Processors / HLT Processors (PCs with RcvBd, NIC, PCI); HLT Network; HLT Processors (PCs with NIC, PCI); Monitoring Server; Distributed Farm Controller]

Page 24: HLT architecture

HLT - Cluster Slow Control

Features:
• Battery backed: completely independent of host
• Power controller: remote powering of host
• Reset controller: remote physical RESET
• PCI bus: perform PCI bus scans, identify devices
• Floppy/flash emulator: create remotely defined boot image
• Keyboard driver: remote keyboard emulation
• Mouse driver: remote mouse emulation
• VGA: replace graphics card
• Price: very low cost

Functionality:
• complete remote control of PC like terminal server but already at BIOS level
• intercept port 80 messages (even remotely diagnose dead computer)
• interoperate with remote server, providing status/error information
• watch dog functionality
• identify host and receive boot image for host
• RESET/Power maintenance

Page 25: HLT architecture

HLT Networking (TPC only)

All data rates in kB/sec (readout not included here)

Assume 40 Hz coincidence trigger plus 160 Hz TRD pretrig with 4 sectors per trigger

[Diagram labels:]
• Input: 180 links, 200 Hz
• Cluster finder: 180+36 nodes
• Track segments: 108+36 nodes
• Track merger: 72+36 nodes
• Global L3: 12 nodes
• Per-link rates shown: 92 000 (x4), 65 000 (x2), 7 000 (x2)
• Aggregate rates shown: 17 000 000, 2 340 000, 252 000, ?
• Spare (x3)
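The aggregate figures on this slide can be reproduced from the per-link rates. The sketch below assumes the 92 000 kB/sec rate applies to each of the 180 input links and that the 65 000 and 7 000 kB/sec rates are each multiplied by 36. The slide does not state these multipliers, but they reproduce the quoted aggregates. Variable names are illustrative.

```python
# Figures from the slide (rates in kB/sec)
input_links = 180
event_rate_hz = 200
rate_per_input_link = 92_000

# Aggregate input: the slide rounds this to 17 000 000
aggregate_input = input_links * rate_per_input_link

# Data volume per link and event at 200 Hz
kb_per_link_per_event = rate_per_input_link / event_rate_hz

# Assumed multiplier of 36 for the two downstream rates
aggregate_65 = 36 * 65_000
aggregate_7 = 36 * 7_000

# Node counts per stage as written on the slide
stage_nodes = {
    "cluster finder": 180 + 36,
    "track segments": 108 + 36,
    "track merger": 72 + 36,
    "global L3": 12,
}
total_nodes = sum(stage_nodes.values())

print(aggregate_input, kb_per_link_per_event, aggregate_65, aggregate_7, total_nodes)
```

The aggregate input of about 17 GB/sec is of the same order as the 20 GB/sec input rate quoted elsewhere in the talk.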

Page 26: HLT architecture

HLT Interfaces

[Diagram labels: L3 Trigger Processor, Detectors, Event Builder, DCS, EC (Experiment Control), On-line/off-line Software, L3-API, L2A, L3A, Logging, Monitoring, DATA Grid]

HLT internal, input and output interface

Publish/subscribe:

[Diagram labels: Publisher, Subscriber Proxy, Publisher Process, Subscriber, Publisher Proxy, Subscriber Process]

• When local do not move data – exchange pointers only
• Separate processes, multiple subscribers for one publisher
• Network API and architecture independent
• Fault tolerant (can lose node)
• Consider monitoring
• Standard within HLT and for input and output
• Demonstrated to work on both shared memory paradigm and sockets
• Very light weight

• HLT is autonomous system with high reliability standards (part of data path)
• HLT has a number of operating modes
– on-line trigger
– off-line processor farm
– possibly combination of both
• Very high input data rates (20 GB/sec)
• High internal networking requirements
• HLT front-end is first processing layer
• Goal: same interface for data input, internal data exchange and data output
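The "exchange pointers only" rule of the publish/subscribe interface can be illustrated in a single process. The sketch below is not the HLT framework: the `Publisher` class and its methods are hypothetical, and a Python `memoryview` stands in for a pointer into shared memory.

```python
class Publisher:
    """Hands subscribers a reference to the event buffer instead of a copy."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # Any number of subscribers may attach to one publisher
        self._subscribers.append(callback)

    def publish(self, event_id, buffer):
        view = memoryview(buffer)  # a view onto the data, not a copy
        for callback in self._subscribers:
            callback(event_id, view)


event_buffer = bytearray(b"cluster data for event 1")
seen_by_tracker = []
seen_by_monitor = []

publisher = Publisher()
publisher.subscribe(lambda event_id, view: seen_by_tracker.append((event_id, view)))
publisher.subscribe(lambda event_id, view: seen_by_monitor.append((event_id, view)))
publisher.publish(1, event_buffer)

# Both subscribers refer to the same underlying buffer
print(seen_by_tracker[0][1].obj is event_buffer, seen_by_monitor[0][1].obj is event_buffer)
```

Across nodes the same interface would have to move the data over the network, which is where the proxy objects in the diagram would come in. That part is not sketched here.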

Page 27: HLT architecture

HLT system structure

[Diagram labels:]
– TPC: fast cluster finder + fast tracker; Hough transform + cluster evaluator; Kalman fitter
– TRD trigger
– Dimuon trigger
– Trigger detectors
– Pattern Recognition
– Dimuon arm tracking
– PHOS trigger
– Extrapolate to ITS
– Extrapolate to TOF
– Extrapolate to TRD
– ...
– Level-1
– Level-3
– (Sub)-event Reconstruction

Page 28: HLT architecture

Preprocessing per sector

[Diagram labels:]
– detector front-end electronics
– raw data, 10 bit dynamic range, zero suppressed
– RCU: Huffman encoding (and vector quantization)
– RORC: Huffman decoding, unpacking, 10-to-8 bit conversion
– fast cluster finder: simple unfolding, flagging of overlapping clusters
– cluster list, raw data
– receiver node: fast vertex finder; fast track finder initialization (e.g. Hough transform); Hough histograms; Peakfinder
– global node: vertex position

Page 29: HLT architecture

FPGA coprocessor: cluster finder

• Fast cluster finder
– up to 32 padrows per RORC
– up to 141 pads/row and up to 512 timebins/pad
– internal RAM: 2x512x8bit
– timing (in clock cycles, e.g. 5 nsec) [1]: #(cluster-timebins per pad) / 2 + #clusters
– outer padrow: 150 nsec/pad, 21 µsec/row
– centroid calculation: pipelined array multiplier

1. Timing estimates by K. Sulimma, EDA group, Department of Computer Science, University of Frankfurt
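The timing figures on this slide are mutually consistent, as the short check below shows. The cluster occupancy passed to `cycles_per_pad` is a made-up example chosen to reproduce 150 nsec per pad, and the charge-weighted mean in `centroid` is the usual definition of a cluster centroid, assumed here rather than taken from the slide.

```python
CLOCK_NS = 5          # example clock period from the slide
PADS_PER_ROW = 141    # maximum pads per row from the slide
PAD_TIME_NS = 150     # quoted time per pad for an outer padrow


def cycles_per_pad(cluster_timebins, clusters):
    """Slide formula: #(cluster-timebins per pad) / 2 + #clusters."""
    return cluster_timebins / 2 + clusters


def centroid(charges, positions):
    """Charge-weighted mean position (assumed centroid definition)."""
    total = sum(charges)
    return sum(q * p for q, p in zip(charges, positions)) / total


cycles_for_quoted_pad_time = PAD_TIME_NS / CLOCK_NS      # clock cycles per pad
row_time_us = PADS_PER_ROW * PAD_TIME_NS / 1000          # microseconds per row

# Hypothetical occupancy: 50 cluster-timebins and 5 clusters on one pad
example_cycles = cycles_per_pad(50, 5)

# Hypothetical cluster: charges 10, 30, 10 on pads 4, 5, 6
pad_centroid = centroid([10, 30, 10], [4, 5, 6])

print(cycles_for_quoted_pad_time, row_time_us, example_cycles, pad_centroid)
```

With 141 pads at 150 nsec each the row time comes out at 21.15 µsec, matching the rounded figure on the slide.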

Page 30: HLT architecture

FPGA coprocessor: Hough transformation

• Fast track finder: Hough transformations [2]
– (row,pad,time)-to-(2/R,,) transformation
– (n-pixel)-to-(circle-parameter) transformation
– feature extraction: local peak finding in parameter space

2. E.g. see Pattern Recognition Algorithms on FPGAs and CPUs for the ATLAS LVL2 Trigger, C. Hinkelbein et al., IEEE Trans. Nucl. Sci. 47 (2000) 362.
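A circle Hough transform with peak finding in parameter space can be sketched as follows. This is a generic illustration, not the FPGA algorithm: it assumes tracks are circles through the origin, parametrised by emission angle psi and curvature kappa = 1/R, so that a point at polar coordinates (r, phi) satisfies kappa = 2 sin(phi - psi) / r. The binning, the function names, and the test track are all hypothetical.

```python
import math
from collections import Counter


def hough_circle_votes(points, psi_bins=90, kappa_bins=40, kappa_max=0.02):
    """Fill a (psi, kappa) histogram for circles through the origin."""
    votes = Counter()
    for x, y in points:
        r = math.hypot(x, y)
        phi = math.atan2(y, x)
        for i in range(psi_bins):
            # Candidate emission angle at the centre of bin i
            psi = -math.pi / 2 + math.pi * (i + 0.5) / psi_bins
            kappa = 2.0 * math.sin(phi - psi) / r
            if abs(kappa) < kappa_max:
                j = int((kappa + kappa_max) / (2 * kappa_max) * kappa_bins)
                votes[(i, j)] += 1
    return votes


def track_points(psi, kappa, radii):
    """Generate points on a circle through the origin (test data only)."""
    points = []
    for r in radii:
        phi = psi + math.asin(kappa * r / 2.0)
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points


points = track_points(psi=0.3, kappa=0.004, radii=range(90, 250, 10))
votes = hough_circle_votes(points)
best_bin, best_count = votes.most_common(1)[0]
print(best_bin, best_count, len(points))
```

All points of one track vote for the same cell, so the peak height equals the number of points. With coarse bins several neighbouring cells can tie, which is why a real implementation needs a proper local peak finder rather than a plain maximum.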

Page 31: HLT architecture

Processing per sector

[Diagram labels:]
– RORC
– raw data, 8 bit dynamic range, decoded and unpacked
– receiver node
– vertex position, cluster list
– slicing of padrow-pad-time space into sheets of pseudo-rapidity, subdividing each sheet into overlapping patches
– sub-volumes in r,,
– fast track finder A: track follower
– fast track finder B: 1. Hough transformation
– fast track finder B: 2. Hough maxima finder, 3. tracklet verification
– track segments
– cluster deconvolution and fitting
– updated vertex position, updated cluster list, track segment list