ALICE Online upgrade
DESCRIPTION
ALICE Online upgrade. Pierre VANDE VYVRE. October 03, 2012 Offline Meeting, CERN.

TRANSCRIPT
ALICE Online upgrade
Pierre VANDE VYVRE
October 03, 2012 Offline Meeting, CERN
Requirements: Event Size and Rate

Detector   | Event size after zero suppression (MB) | Event size after data compression (MB) | Input to online system (GB/s) | Peak output to local data storage (GB/s) | Avg output to Computing Center (GB/s)
ITS        | 0.8  | 0.20 | 40     | 10.0 | 1.6
TPC        | 20.0 | 1.00 | 1000   | 50.0 | 8.0
TRD        | 1.6  | 0.20 | 81.5   | 10.0 | 1.6
Others (1) | 0.5  | 0.25 | 25     | 12.5 | 2.0
Total      | 22.9 | 1.65 | 1146.5 | 82.5 | 13.2
(1) Under study
• Expected peak Pb-Pb minimum bias (MB) rate after LS2: 50 kHz. The system must be able to scale with a safety factor of 2 (for the readout part from the start). Expected average Pb-Pb MB rate over a fill after LS2: 20 kHz.
• Global data compression strategy: optimize the overall cost by doing zero suppression on the detectors; the data throughput is further reduced in the online farm by data compression (no event selection).
• Combined efficiency of ALICE and the LHC: ~10^6 s of data taking per month, so one month of heavy-ion running yields ~2.0 x 10^10 events (cross-checked in the sketch below).
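A minimal Python sketch cross-checking the rate and volume figures above; the variable names are illustrative and every input number is taken from this slide:

```python
# Cross-check of the rates and data volumes quoted above (inputs from the slide).
peak_rate_hz     = 50e3    # peak Pb-Pb minimum bias rate after LS2
avg_rate_hz      = 20e3    # average MB rate over a fill
live_time_s      = 1e6     # ~10^6 s of data taking per month (ALICE x LHC efficiency)
event_size_zs_mb = 22.9    # total event size after zero suppression (MB)
event_size_c_mb  = 1.65    # total event size after data compression (MB)

events_per_month = avg_rate_hz * live_time_s              # ~2.0e10 events per HI month
input_gb_s       = peak_rate_hz * event_size_zs_mb / 1e3  # ~1145 GB/s into the online system
peak_out_gb_s    = peak_rate_hz * event_size_c_mb / 1e3   # ~82.5 GB/s to local data storage

print(events_per_month, input_gb_s, peak_out_gb_s)
```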
Online System Requirements
• Common DAQ and HLT farm for cost reasons; it should also be usable as an offline Tier 1.
• Detector readout: detector throughput at 50 kHz is 9 Tbit/s, plus a safety factor of 2 for 100 kHz; capacity: 25 Tbit/s (~2500 detector links at 10 Gb/s).
• First-Level Processors (FLPs): 12 inputs at 10 Gb/s per FLP, so ~250 FLPs are needed.
• Event building: input to global reconstruction is 50 kHz x ~4.5 MB/event = ~225 GB/s; output to data storage is ~82.5 GB/s; total network throughput is ~310 GB/s or 2.5 Tb/s; 250 x 2 links at 10 Gb/s (or 1 link at 40 Gb/s) to event building and HLT.
• Event-Building and Processing Nodes (EPNs): the current HLT has ~200 nodes with GPUs (~2500 cores); the computing power requirement increases by ~100 in 6 years, while technology evolution ("Moore's law" extension to computers) gives a factor of ~16, so ~1250 EPNs with GPUs are needed, with ~1250 links at 10 Gb/s or 40 Gb/s to the network (see the sizing sketch after this list).
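The following back-of-envelope Python sketch reproduces the FLP/EPN counts and network throughput quoted above; the names are illustrative and all inputs come from this slide:

```python
# Rough sizing of the online farm from the figures above.
n_links           = 2500   # detector links (DDL3) at 10 Gb/s
links_per_flp     = 12     # inputs per First-Level Processor
current_hlt_nodes = 200    # current HLT nodes with GPUs
compute_growth    = 100    # computing power increase needed in 6 years
moore_factor      = 16     # expected gain from technology evolution

n_flps = n_links / links_per_flp   # ~208 fully loaded FLPs; the slide quotes ~250, i.e. with margin
n_epns = current_hlt_nodes * compute_growth / moore_factor   # 1250 EPNs

ebuild_in_gb_s  = 50e3 * 4.5 / 1e3                               # ~225 GB/s into global reconstruction
ebuild_out_gb_s = 82.5                                           # GB/s to data storage
total_tb_s      = (ebuild_in_gb_s + ebuild_out_gb_s) * 8 / 1e3   # ~2.5 Tb/s of network traffic

print(n_flps, n_epns, total_tb_s)
```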
Upgrade Online
[Architecture diagram, DAQ and HLT: the trigger detectors feed the Fast Trigger Processor (FTP), which distributes L0/L1; the detectors (ITS, TPC, TRD, TOF, EMCal, PHOS, Muon) send their data over ~2500 DDL3 links at 10 Gb/s to RORC3 cards hosted in ~250 FLPs; each FLP connects with 2 x 10 or 40 Gb/s links through the farm network to ~1250 EPNs, which write to data storage through the storage network.]
Dataflow
• Combination of continuous and triggered readout
• Fast Trigger Processor (FTP) to complement/replace the present CTP
  • L0: minimum bias trigger for every interaction
  • L1: selective trigger
  • LHC clock and L0 trigger distributed for data tagging and test purposes
• Detector triggering and electronics
  • Continuous readout for TPC and ITS when the average inter-event arrival time is shorter than the TPC drift time
  • At 50 kHz, ~5 events pile up in the TPC during its drift time of 92 µs (see the sketch after this list)
  • TRD: MB L0 trigger, max 50 kHz
  • TOF: MB L0 trigger, max 400 kHz
  • EMC, PHOS, Muon: L1 rare trigger
• Detector readout
  • RORC3: DDL3 interface and cluster finder in the same FPGA, shared by DAQ and HLT
• Event building and processing
  • FLPs: build clusters of sub-events for the triggered detectors and time windows for the continuous-readout detectors
  • EPNs: tracking of time windows; association of clusters to events, which is only possible after the tracking; final step of event building after the online reconstruction
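A short Python sketch of the pile-up argument above (illustrative names; the rate and drift time are taken from the slide):

```python
# How many interactions overlap one TPC drift window at the upgraded rate.
interaction_rate_hz = 50e3     # peak Pb-Pb minimum bias rate after LS2
tpc_drift_time_s    = 92e-6    # TPC drift time

mean_interarrival_us = 1e6 / interaction_rate_hz                 # ~20 us between interactions
events_in_drift      = interaction_rate_hz * tpc_drift_time_s    # ~4.6, i.e. ~5 events

print(mean_interarrival_us, events_in_drift)
```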
DDL and RORC evolution

Run 1 (2010-2012), operation phase
  Detector end (DDL):       DDL1, 2 Gb/s, daughter card
  Online system end (RORC): RORC1, 2 DDLs/board (1 to DAQ and 1 to HLT), PCI-X and PCIe Gen1 x4 (1 GB/s)

Run 2 (~2014-2016), prototyping phase
  Detector end (DDL):       DDL2, 6 Gb/s, compatible with DDL1, VHDL + optical transceiver
  Online system end (RORC): RORC2, 12 DDLs/board (6 to DAQ and 6 to HLT), PCIe Gen2 x8 (4 GB/s), used by the HLT for the transition to PCIe

Run 3 (~2018-2020), technology investigation phase
  Detector end (DDL):       DDL3, 10 Gb/s
  Online system end (RORC): RORC3, 10-12 DDLs/board (all to the common DAQ/HLT), PCIe Gen3 x16 (16 GB/s) or PCIe Gen4
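A small Python sketch checking that the RORC3 figures above are consistent with the ~2500-link readout requirement from the earlier slide; the names are illustrative:

```python
# Consistency check of the RORC3 generation against the readout requirement.
n_links        = 2500        # DDL3 detector links at 10 Gb/s
links_per_rorc = 12          # upper end of the 10-12 DDLs/board range
link_gb_s      = 10 / 8      # 1.25 GB/s per 10 Gb/s link
pcie_gen3_x16  = 16.0        # GB/s, from the table above

boards_needed  = n_links / links_per_rorc     # ~208 RORC3 boards
board_input    = links_per_rorc * link_gb_s   # 15 GB/s into one fully loaded board

print(boards_needed, board_input, board_input <= pcie_gen3_x16)
```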
FLP I/O Throughput
• First-Level Processor (FLP): a powerful I/O machine
• Most recent architecture of Intel dual-socket servers (Sandy Bridge): dual socket, 40 PCIe Gen3 lanes connected directly to each processor (40 GB/s), memory bandwidth of 6-17 GB/s per memory bank
[Block diagram: two CPU sockets connected by QPI, each with four DDR3 memory banks and 40 PCIe Gen3 lanes attached directly to the processor.]
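A minimal Python sketch of the per-FLP I/O budget, combining the link counts from the "Online System Requirements" slide with the platform figures above; the names are illustrative:

```python
# Per-FLP I/O budget against the 40 GB/s of PCIe Gen3 lanes per socket.
inputs_per_flp = 12
link_gb_s      = 10 / 8        # 1.25 GB/s per 10 Gb/s DDL3
output_gb_s    = 2 * 10 / 8    # 2 x 10 Gb/s to event building (or 1 x 40 Gb/s -> 5 GB/s)
pcie_gb_s      = 40.0          # PCIe throughput per socket quoted above

flp_input = inputs_per_flp * link_gb_s   # 15 GB/s of detector data in
flp_total = flp_input + output_gb_s      # ~17.5 GB/s of link traffic per FLP

print(flp_input, flp_total, flp_total < pcie_gb_s)
```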
C-RORC Prototyping
• C-RORC, aka the RORC2 prototype
• Performance test of the PCIe Gen2 interface on two generations of Intel CPUs
• The new CPU (Sandy Bridge) provides significantly better performance than the previous one (Nehalem), in particular for small event sizes
• Fulfils the needs: one PCIe Gen2 x8 interface provides 2.3-3.2 GB/s
[Plot of H. Engel: measured C-RORC throughput on the two CPU generations]
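For context, a quick Python sketch comparing the measured range with the raw PCIe Gen2 x8 bandwidth (standard PCIe Gen2 parameters; the 4 GB/s figure matches the RORC2 entry in the table above):

```python
# Raw PCIe Gen2 x8 bandwidth versus the measured C-RORC throughput.
lanes    = 8
gt_s     = 5.0        # PCIe Gen2 signalling rate per lane (GT/s)
encoding = 8 / 10     # 8b/10b line encoding

raw_gb_s = lanes * gt_s * encoding / 8     # 4.0 GB/s, as quoted for RORC2
measured = (2.3, 3.2)                      # GB/s, from the measurements above

print(raw_gb_s, [round(m / raw_gb_s, 2) for m in measured])  # roughly 0.6-0.8 of the raw rate
```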
Network Topology
• Two different topologies are considered for the computing farm network
• Fat-tree network with one central director switch or router and several edge switches
  + one single box to configure
  - one central point of failure
• Spine-and-leaf network with several spine switches interconnecting leaf switches
  + cheaper cost per port, lower power consumption, more ports per rack unit (RU)
  + graceful degradation in case of failure
  - more cabling
[Diagram: fat-tree topology with one director switch connected by uplinks to m edge switches, each serving n nodes.]
[Diagram: spine-and-leaf topology with p spine switches; each of the m leaf switches serves n nodes and has uplinks to the spines.]
Network Layout: InfiniBand
• Two network technologies are being considered, InfiniBand and Ethernet; both can be used with both topologies
• InfiniBand network using a fat-tree topology
• Edge switches: SX6025 (36 ports, 4.03 Tb/s); 32 ports for data traffic to/from the nodes, 2 ports for data traffic to/from the director switch
• Director switch: IS5200 (216 ports, 17.3 Tb/s)
• Total throughput: 2 x 48 x 40 Gb/s = 3.8 Tb/s
• Total ports: 48 x 32 = 1536 ports at 40 Gb/s
[Diagram: 48 edge switches, each with 32 node-facing ports and 2 x 40 Gb/s uplinks to the director switch.]
Network requirements
• Ports: ~250 FLPs and ~1250 EPNs, i.e. ~1500 ports at 40 Gb/s in total
• Total throughput: EPN input 50 kHz x ~4.5 MB/event = ~225 GB/s; EPN output ~82.5 GB/s; total ~310 GB/s or 2.5 Tb/s
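A small Python sketch checking this InfiniBand layout against the requirements above (all inputs from the slide; the names are illustrative):

```python
# Does the InfiniBand fat-tree layout cover the port and throughput requirements?
edge_switches = 48
node_ports    = 32      # 40 Gb/s ports per edge switch towards the nodes
uplinks       = 2       # 40 Gb/s uplinks per edge switch to the director switch

total_ports = edge_switches * node_ports                  # 1536 ports at 40 Gb/s
uplink_tb_s = edge_switches * uplinks * 40 / 1e3          # ~3.8 Tb/s aggregate uplink capacity

needed_ports = 250 + 1250    # FLPs + EPNs at 40 Gb/s
needed_tb_s  = 2.5           # total throughput requirement

print(total_ports >= needed_ports, uplink_tb_s >= needed_tb_s)   # True, True
```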
Network Layout: Ethernet
• Ethernet network using a fat-tree topology
• Leaf switches: Z9000 (128 ports 10 GbE); 75 ports 10 GbE for data traffic to/from the nodes, 4 ports 40 GbE for data traffic to/from the other switches
• Spine switches: Z9000 (32 ports 40 GbE)
• Total throughput: 4 x 24 x 40 Gb/s = 3.8 Tb/s
• Total ports: 24 x 75 = 1800 ports at 10 Gb/s
[Diagram: 24 leaf switches, each with 75 x 10 GbE node-facing ports and 4 x 40 GbE uplinks to the spine switches.]
Network requirements
• Ports: ~2 x 250 for the FLPs and ~1250 for the EPNs, i.e. ~1750 ports at 10 Gb/s in total
• Total throughput: EPN input 50 kHz x ~4.5 MB/event = ~225 GB/s; EPN output ~82.5 GB/s; total ~310 GB/s or 2.5 Tb/s
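The same check for the Ethernet layout, as a minimal Python sketch (inputs from this slide; the names are illustrative):

```python
# Does the Ethernet fat-tree layout cover the port and throughput requirements?
leaf_switches = 24
node_ports    = 75      # 10 GbE ports per leaf switch towards the nodes
uplinks       = 4       # 40 GbE uplinks per leaf switch to the spine switches

total_ports = leaf_switches * node_ports            # 1800 ports at 10 Gb/s
uplink_tb_s = leaf_switches * uplinks * 40 / 1e3    # ~3.8 Tb/s aggregate uplink capacity

needed_ports = 2 * 250 + 1250   # 2 ports per FLP plus the EPNs, at 10 Gb/s
print(total_ports >= needed_ports, uplink_tb_s >= 2.5)   # True, True
```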
Software and Firmware
• The ALICE online and offline software frameworks are to be redesigned and rewritten:
  • New requirements (common farm, higher rates and throughput, a large fraction of the "offline" processing performed online, etc.)
  • New computing platforms requiring much more parallelism
  • DAQ-HLT-Offline framework panel
• A common firmware framework is to be defined:
  • Detector readout
  • Cluster finder
  • Accommodate the HLT modes
Next steps (Project)
• 2012-2014: Strategy definition and R&D. Kick-off meeting during the October ALICE week: Wednesday 10th October, 14:00-18:00. Definition of the strategy to achieve a common DAQ, HLT and offline software framework. R&D on key hardware, firmware and software technologies.
• 2013-2016: Simulation, demonstrators and prototypes. Choice of technologies. Development and exploitation of a program simulating the trigger and dataflow architecture. Development of demonstrators and prototypes for the key hardware and firmware technologies. Development of the new common software framework. Selection of the technologies to be used in production. Technical Design Report.
• 2017-2020: Production and procurement. Staged deployment. Production of the hardware developed by the projects. Market surveys, tendering and procurement of commercial equipment. Staged deployment with a profile compatible with the detector installation for the readout part (DDL3 and FLPs) and with the accelerator luminosity for the processing part (EPNs, network and data storage).