The ATLAS Trigger and Data Acquisition: a brief overview of concept, design and realization
John Erik Sloper, ATLAS TDAQ group, CERN - Physics Dept.


Flow of data from the ATLAS detector to mass storage: a brief overview of concept, design and realization
Overview:
Introduction
Challenges & Requirements
ATLAS TDAQ Architecture
Readout and LVL1
LVL2 Triggering & Region of Interest
Event Building & Event filtering
Current status of installation
With the TDAQ group for 3½ years
Computer science background; currently enrolled at the University of Warwick, Coventry, for a PhD in engineering
Today:
Practical viewpoint
Using the real ATLAS TDAQ system as an example
We will go through the entire architecture of the TDAQ system from readout to storage
A brief overview of the status of the installation
Receiving data from all read-out links for the entire detector
Processing
“Building the events” – Collecting all data that correspond to a single event
Serving the triggering system with data
Storing
Triggering
The trigger has the job of selecting the bunch-crossings of interest for physics analysis, i.e. those containing interactions of interest
What is an “event” anyway?
In high-energy particle colliders (e.g. Tevatron, HERA, LHC), the particles in the counter-rotating beams are bunched
Bunches cross at regular intervals
Interactions only occur during the bunch-crossings
In this presentation “event” refers to the record of all the products of a given bunch-crossing
The term “event” is not uniquely defined!
Some people use the term “event” for the products of a single interaction between the incident particles
Trigger menus
Typically, trigger systems select events according to a “trigger menu”, i.e. a list of selection criteria
An event is selected by the trigger if one or more of the criteria are met
Different criteria may correspond to different signatures for the same physics process
Redundant selections lead to high selection efficiency and allow the efficiency of the trigger to be measured from the data
Different criteria may reflect the wish to concurrently select events for a wide range of physics studies
HEP “experiments” — especially those with large general-purpose “detectors” (detector systems) — are really experimental facilities
The menu has to cover the physics channels to be studied, plus additional event samples required to complete the analysis:
Measure backgrounds, check the detector calibration and alignment, etc.
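As an illustration of the "OR of criteria" logic described above, here is a minimal toy sketch; the menu items, thresholds and the passes_any helper are invented for illustration and are not the actual ATLAS trigger menu.

```python
# Toy trigger menu: an event is selected if ANY criterion in the menu fires.
# The criteria and thresholds below are invented for illustration only.
toy_menu = [
    ("single_muon", lambda ev: any(pt > 20.0 for pt in ev["muon_pt"])),
    ("single_em",   lambda ev: any(et > 25.0 for et in ev["em_et"])),
    ("missing_et",  lambda ev: ev["missing_et"] > 70.0),
]

def passes_any(event, menu):
    """Return the names of the menu items that fired; the event is kept if the list is non-empty."""
    return [name for name, criterion in menu if criterion(event)]

event = {"muon_pt": [4.2, 23.1], "em_et": [11.0], "missing_et": 32.0}
print(passes_any(event, toy_menu))   # ['single_muon'] -> event is selected
```

Redundant menu items select the same physics through different signatures, which is what allows trigger efficiencies to be measured from the data themselves.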
ATLAS/CMS physics requirements
Triggers in the general-purpose proton–proton experiments, ATLAS and CMS, will have to:
Retain as many as possible of the events of interest for the diverse physics programmes of these experiments
Higgs searches (Standard Model and beyond)
e.g. H → ZZ → leptons, H → γγ; also H → ττ, H → bb
SUSY searches
With and without R-parity conservation
Searches for other new physics
Using inclusive triggers that one hopes will be sensitive to any unpredicted new physics
Precision physics studies
dNch/dη = no. of charged particles per unit of pseudorapidity (η)
nch = no. of charged particles per interaction
Nch = total no. of charged particles per bunch crossing (BC)
Ntot = total no. of particles per BC
nch = dNch/dη × Δη = 6 × 7 ≈ 42
Nch = nch × 23 interactions ≈ 900
Ntot = Nch × 1.5 ≈ 1400
The LHC flushes each detector with ~1400 particles every 25 ns
(p-p operation; 25 ns bunch spacing ≈ 7.5 m)
and without knowing where to look:
the Higgs could be anywhere up to ~1 TeV, or even nowhere…
[Event display: Higgs → 4μ overlaid with 30 minimum-bias interactions; no. of particles in ATLAS per 25 ns ≈ 1400]
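The back-of-the-envelope arithmetic above can be written down as a short sketch; the factors 6, 7, 23 and 1.5 are the values quoted on the slide.

```python
# Back-of-the-envelope pile-up estimate (values quoted on the slide)
dn_ch_deta = 6            # charged particles per unit of pseudorapidity
delta_eta = 7             # eta range covered
interactions_per_bc = 23  # overlapping p-p interactions per bunch crossing
neutral_factor = 1.5      # total particles / charged particles

n_ch = dn_ch_deta * delta_eta        # ~42 charged particles per interaction
N_ch = n_ch * interactions_per_bc    # ~900 charged particles per bunch crossing
N_tot = N_ch * neutral_factor        # ~1400 particles every 25 ns

print(n_ch, N_ch, N_tot)             # 42 966 1449.0
```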
Data throughput:
--> after the LVL1 accept: O(100) GB/s
--> into mass storage: O(100) MB/s
Very high rate
New data every 25 ns – virtually impossible to make real time decisions at this rate.
Not even time for signals to propagate through electronics
Amount of data: TB/s
This obviously cannot be stored directly: no hardware or networks exist (or at least none that are affordable!) that can handle this amount of data
The TDAQ system must reduce the amount of data by several orders of magnitude
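To put "several orders of magnitude" in numbers, here is a rough sketch using figures quoted elsewhere in this talk (1.5 MB per built event, one bunch crossing every 25 ns, O(100) MB/s to mass storage); the exact factor is only indicative.

```python
# Required overall data reduction, using numbers quoted in this talk
event_size_mb = 1.5             # MB per fully built event
bunch_crossing_rate = 40e6      # Hz (one crossing every 25 ns)
storage_bandwidth_mb_s = 100.0  # O(100) MB/s into mass storage

raw_rate_mb_s = event_size_mb * bunch_crossing_rate      # ~60,000,000 MB/s = ~60 TB/s
reduction_factor = raw_rate_mb_s / storage_bandwidth_mb_s

print(f"raw: {raw_rate_mb_s/1e6:.0f} TB/s, reduction needed: ~{reduction_factor:.0e}")
# raw: 60 TB/s, reduction needed: ~6e+05
```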
Event Building
Event filtering
Global view
The ATLAS TDAQ architecture is based on a three-level trigger hierarchy
Level 1
Level 2
Event filter
It uses a Level 2 selection mechanism based on a subset of event data -> the Region of Interest
This reduces the amount of data needed for LVL2 filtering
Therefore, there is a much reduced demand on dataflow power
Note that ATLAS differs from CMS on this point
Fast first-level trigger (custom electronics)
Needs high efficiency, but rejection power can be comparatively modest
High overall rejection power to reduce output to mass storage to affordable rate
Progressive reduction in rate after each stage of selection allows use of more and more complex algorithms at affordable cost
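A sketch of this progressive rate reduction, using numbers quoted in this talk (40 kHz LVL1 accept rate, ×30 LVL2 rejection, 1.5 MB events, O(100) MB/s to storage); the Event Filter output rate is derived here, not quoted on the slides.

```python
# Progressive rate reduction through the three trigger levels
bc_rate_hz = 40e6       # bunch crossings per second (every 25 ns)
lvl1_accept_hz = 40e3   # LVL1 accept rate quoted for high luminosity
lvl2_rejection = 30     # LVL2 rejection factor quoted in this talk
event_size_mb = 1.5     # MB per built event
storage_mb_s = 100.0    # O(100) MB/s into mass storage

lvl2_accept_hz = lvl1_accept_hz / lvl2_rejection   # ~1.3 kHz into event building
ef_accept_hz = storage_mb_s / event_size_mb        # ~70 Hz sustainable into storage

print(f"LVL1: {lvl1_accept_hz:.0f} Hz, LVL2: {lvl2_accept_hz:.0f} Hz, EF: ~{ef_accept_hz:.0f} Hz")
```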
[Diagram: LVL1 trigger fed by the calorimeters (Calo) and the muon trigger chambers (MuTrCh)]
LVL1 selection criteria
Features that distinguish new physics from the bulk of the cross-section for Standard Model processes at hadron colliders are:
In general, the presence of high-pT particles (or jets)
e.g. these may be the products of the decays of new heavy particles
In contrast, most of the particles produced in minimum-bias interactions are soft (pT ~ 1 GeV or less)
More specifically, the presence of high-pT leptons (e, m, t), photons and/or neutrinos
e.g. the products (directly or indirectly) of new heavy particles
These give a clean signature compared with the low-pT hadrons in the minimum-bias case, especially if they are “isolated” (i.e. not inside jets)
The presence of known heavy particles
e.g. W and Z bosons may be produced in Higgs particle decays
Leptonic W and Z decays give a very clean signature
Also interesting for physics analysis and detector studies
High-pT muons
Identified beyond the calorimeters; need a pT cut to control the rate from π → μν and K → μν decays, as well as from semi-leptonic beauty and charm decays
High-pT photons
Identified as narrow EM calorimeter clusters; need a cut on ET; cuts on isolation and a hadronic-energy veto strongly reduce the rates from high-pT jets
High-pT electrons
Same as photons (a matching track is required in the subsequent selection)
High-pT taus (decaying to hadrons)
Identified as narrow cluster in EM+hadronic calorimeters
High-pT jets
Identified as cluster in EM+hadronic calorimeter — need to cut at very high pT to control rate (jets are dominant high-pT process)
Large missing ET or total scalar ET
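A toy sketch of how such LVL1 signatures might be expressed as threshold-plus-isolation cuts; all numbers are invented, and the real LVL1 runs in custom electronics on coarse-granularity data, not in software like this.

```python
# Toy LVL1-style e/gamma candidate selection: ET threshold, isolation and hadronic veto.
# All thresholds are invented for illustration.
def accept_em_candidate(et, isolation_et, had_et,
                        et_cut=25.0, iso_cut=4.0, had_cut=2.0):
    """Keep narrow EM clusters with high ET, little surrounding energy and little hadronic leakage."""
    return et > et_cut and isolation_et < iso_cut and had_et < had_cut

print(accept_em_candidate(et=32.0, isolation_et=1.5, had_et=0.4))   # True
print(accept_em_candidate(et=32.0, isolation_et=9.0, had_et=0.4))   # False: fails isolation
```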
Cable length ~100 meters …
[Diagram: detector dimensions 44 m and 22 m]
Total Level-1 latency (TOF + cables + processing + distribution) = 2.5 μs
For 2.5 μs, all signals must be stored in electronics pipelines (there are ~10^8 channels!)
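The pipeline requirement follows directly from these numbers; a minimal sketch (assuming one stored sample per channel per bunch crossing, which is my assumption for the sketch).

```python
# Pipeline depth implied by the Level-1 latency (one stored sample per channel
# per bunch crossing is an assumption of this sketch)
lvl1_latency_s = 2.5e-6   # total Level-1 latency: TOF + cables + processing + distribution
bunch_spacing_s = 25e-9   # one bunch crossing every 25 ns
n_channels = 1e8          # order of magnitude of detector channels quoted above

pipeline_depth = lvl1_latency_s / bunch_spacing_s   # = 100 bunch crossings buffered per channel
print(pipeline_depth, n_channels * pipeline_depth)  # 100.0 samples, ~1e10 pipeline cells in flight
```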
[Diagram: LVL1 trigger (Calo and MuTrCh inputs) and detector read-out]
LVL1 trigger rate (high lum) = 40 kHz
Total event size = 1.5 MB
Total no. ROLs = 1600
[Table: LVL1 trigger rates per channel]
The Level-1 selection is dominated by local signatures
Based on coarse granularity (calo, mu trig chamb), w/out access to inner tracking
Important further rejection can be gained with local analysis of full detector data
Region of Interest - Why?
The geographical addresses of interesting signatures identified by the LVL1 (Regions of Interest)
Allow access to the local data of each relevant detector
Typically, there are 1-2 RoIs per event accepted by LVL1
<RoIs/ev> = ~1.6
a few % of the Level-1 throughput
Lookup: η-φ region <-> ROB number(s) (for each detector)
-> for each RoI, the list of ROBs with the corresponding data from each detector is quickly identified (in the LVL2 processors)
This mechanism provides a powerful and economic way to add an important rejection factor before full Event Building
[Diagram: 4 RoIs; their η-φ addresses are distributed as control traffic …]
RoI mechanism - Implementation
Note that this example is atypical; the average number of RoIs/ev is ~1.6
There are typically only 1-2 RoIs per event
Only the RoI Builder and ROB input see the full LVL1 rate
(same simple point-to-point link)
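A minimal sketch of the region-to-ROB lookup idea described above; the region granularity, detector names and ROB identifiers are invented for illustration and are not the real ATLAS mapping.

```python
# Toy RoI -> ROB lookup: for each detector, a static map from (eta, phi) region to ROB ids.
# The regions and ROB numbers below are invented for illustration only.
ROI_TO_ROBS = {
    "calo": {(0, 0): [0x210001], (0, 1): [0x210002], (1, 0): [0x210003]},
    "muon": {(0, 0): [0x610001], (0, 1): [0x610002], (1, 0): [0x610003]},
}

def robs_for_roi(eta_region, phi_region):
    """Return, per detector, the ROBs that hold data for this RoI's region."""
    return {det: table.get((eta_region, phi_region), []) for det, table in ROI_TO_ROBS.items()}

# LVL2 then requests only these ROB fragments instead of building the full event
print(robs_for_roi(0, 1))   # {'calo': [2162690], 'muon': [6356994]}
```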
ROS systems
Key LVL2 parameters:
the amount of data required: 1-2% of the total
the overall time budget in the L2P: 10 ms average
the rejection factor: × 30
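Some rough consequences of these parameters, as an order-of-magnitude sketch; it also uses the 40 kHz LVL1 rate and 1.5 MB event size quoted elsewhere in the talk.

```python
# Order-of-magnitude LVL2 sizing from the quoted parameters
lvl1_rate_hz = 40e3         # LVL1 accept rate (high luminosity)
mean_l2_time_s = 10e-3      # average LVL2 time budget per event
roi_data_fraction = 0.015   # 1-2% of the full event is requested via RoIs
event_size_mb = 1.5

events_in_flight = lvl1_rate_hz * mean_l2_time_s                    # ~400 events processed concurrently
l2_input_mb_s = lvl1_rate_hz * event_size_mb * roi_data_fraction    # ~900 MB/s of RoI data into LVL2
eb_input_rate_hz = lvl1_rate_hz / 30                                # ~1.3 kHz after the x30 rejection

print(events_in_flight, l2_input_mb_s, eb_input_rate_hz)            # 400.0 900.0 ~1333
```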
[Dataflow diagram: LVL1 and detector read-out (ROD -> ROB), RoI Builder (ROIB), LVL2 supervisor (L2SV), LVL2 processors (L2P), LVL2 network (L2N), Event Building network (EBN) with SFIs and SFOs; the ROS systems connect via GbEthernet through the L2, EB and cross switches (HLT dataflow)]
1600 fragments of ~ 1 kByte each
All fully equipped with ROBINs
PC properties
• Motherboard: Supermicro X6DHE-XB
• RAM: 512 MB
Redundant power supply
Remote management via IPMI
4 GbE on PCI-Express card
1 for LVL2 + 1 for event building
Spares:
27 PCs purchased (7 for new detectors)
ROBINs (Read-Out Buffer Inputs)
Input: 3 ROLs (S-Link I/F); Output: 1 PCI I/F and 1 GbE I/F (for upgrade if needed)
Buffer memory : 64 Mbytes (~32k fragments per ROL)
~700 ROBINs installed and spares
+ 70 ROBINs ordered (new detectors and more spares)
System is complete : no further purchasing/procurement foreseen
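A rough feel for that buffer depth, assuming each ROL receives about one fragment per LVL1 accept (an assumption, not stated on the slide).

```python
# How long a ROBIN can buffer fragments, under the stated depth and LVL1 rate
fragments_per_rol = 32_000   # ~32k fragments of buffering per read-out link
lvl1_rate_hz = 40e3          # one fragment per ROL per LVL1 accept (assumption)

buffer_time_s = fragments_per_rol / lvl1_rate_hz
print(f"{buffer_time_s:.2f} s of buffering")   # ~0.80 s, vs. a ~10 ms average LVL2 decision
```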
Trigger / DAQ architecture
• Force10 E600: 1 chassis + blades for 1 and 10 GbE as required
For July 2008: 14 blades in 2 chassis, ~700 GbE ports at line speed
Final system: extra blades, following the EF and full-system evolution
~500 PCs for L2
~1800 PCs for EF
Recently decided: connected to both L2 and EF networks
161 XPU PCs installed: 130 with 8 cores (RAM: 1 GB/core, i.e. 8 GB) + 31 with 4 cores
Remote management via IPMI
Network: 2 GbE onboard, 1 for the control network, 1 for the data network
VLAN for the connection to both data and back-end networks
For July 2008: a total of 9 L2 + 27 EF racks, as from the TDR for the 2007 run (1 rack = 31 PCs)
Final system: a total of 17 L2 + 62 EF racks, of which 28 (of 79) racks with XPU connection
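As a quick cross-check of the quoted farm sizes (a sketch; 31 PCs per rack is the figure given above).

```python
# Final-system farm size implied by the rack counts quoted above
pcs_per_rack = 31
l2_racks, ef_racks = 17, 62

print(l2_racks * pcs_per_rack, ef_racks * pcs_per_rack)   # 527 and 1922 PCs,
# consistent with the "~500 PCs for L2" and "~1800 PCs for EF" quoted earlier
```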
Timing Trigger Control (TTC)
• Motherboard: Intel
• RAM: 4 GB
Remote management via IPMI
Network:
3 GbE on PCIe card for data
System is complete, no further purchasing foreseen
required b/width (300 MB/s) already available, to facilitate detector commissioning and calibrations in the early phase
Routinely used for
Debugging and standalone commissioning of all detectors after installation
TDAQ Technical Runs - use physics selection algorithms in the HLT farms on simulated data pre-loaded in the Read-Out System
Commissioning Runs of integrated ATLAS - take cosmic data through the full TDAQ chain (up to Tier-0) after final detector integration
Source: B. Gorini, "Integration of the Trigger and Data Acquisition Systems in ATLAS", CHEP 07
Run Control
Essentially all detectors integrated
Shifters and systems readiness
60% of the final EB installed; GbE bandwidth is the limit
[Plots: measured distributions with means of ~83 ms, 26.5 ms, >94.3 ms, 5.3 and 31.5 ms]
Preliminary EF measurements: mean ≈ 1.57 s
Only a snapshot of one particular setup, still far from being representative of the final hardware setup, typical high luminosity trigger menu, and actual LHC events!
In June we had a 14-day combined cosmic run, with no magnetic field.
It included the following systems:
Muons – RPC (~1/32)
Calorimetry – EM (LAr) (~50%) & Hadronic (Tile) (~75%)
Tracking – Transition Radiation Tracker (TRT) (~6/32 of the barrel of the final system)
We seem to be in business,
the ATLAS TDAQ system is doing its job
at least so far…
The LVL2 trigger makes use of the Region-of-Interest mechanism
--> an important reduction of data movement
The system design is complete, but is open to:
Optimization of the I/O at the Read-Out System level
Optimization of the deployment of the LVL2 and Event Builder networks
The architecture has been validated via deployment of full systems:
On test bed prototypes
And via detailed modeling to extrapolate to full size