Experience with multi-threaded C++ applications in the ATLAS DataFlow. Szymon Gadomski, University of Bern, Switzerland and INP Cracow, Poland, on behalf of the ATLAS Trigger/DAQ DataFlow. CHEP 2003 conference. Performance problems found and solved: STL containers, thread scheduling, other.


TRANSCRIPT

Page 1

Experience with multi-threaded C++ applications in the ATLAS DataFlow

Szymon Gadomski
University of Bern, Switzerland and INP Cracow, Poland
on behalf of the ATLAS Trigger/DAQ DataFlow, CHEP 2003 conference

Performance problems found and solved:
• STL containers
• thread scheduling
• other

Page 2

CHEP, March 03, S. Gadomski, "Experience with multi-threaded C++ in ATLAS DataFlow"

ATLAS DataFlow software

• Flow of data in the ATLAS DAQ system
– Data to LVL2 (part of event), to EF (whole event), to mass storage.
– See talks by Giovanna Lehman (overview of DataFlow) and by Stefan Stancu (networking).
• PCs, standard Linux, applications written in C++ (so far compiled only with gcc), standard network technology (Gb Ethernet).
• "Soft" real-time system, no guaranteed response time. The average response time is what matters.
• Common tasks (exchanging messages, state machine, access to the configuration db, reporting errors, …) are handled by a framework (well, actually two…).

Page 3

ATLAS Data Flow software (2)

• State of the project:
– development done mostly in 2001-2002,
– measurements for the Technical Design Report: performance,
– preparation for beam test support: stability, robustness and deployment.
• 7 kinds of applications (+3 kinds of controllers).
• Always several threads (independent processes within one application without their own resources).
• Roles, challenges and use of threads very different.
• In this short talk only a few examples
– use of threads, problems, solutions.

Page 4

Testbed at CERN

[Photo of the testbed. Labels: 1U PCs >= 2 GHz; 4U PCs >= 2 GHz; FPGA traffic generators]

Page 5

LVL2 processing unit (L2PU) - role

[Diagram: the L2PU, a DataFlow application with an interface to the control software. Other boxes: L2SV, ROS with ROBs (detector data), pROS, mass storage. Message labels: L1 + RoI data; data request (RoI only); data; LVL2 decision; detailed LVL2 result. Multiplicity labels: 1600x, 10x, up to 500x, 1x, 140x (multiplicities are indicative only). One label reads "Open choice."]

The L2PU:
• gets LVL1 decision
• asks for data
• gets it
• makes LVL2 decision
• sends it
• sends detailed result

Page 6

L2PU design

[Diagram: inside the L2PU, one Input Thread and several Worker Threads connected by an event queue. External boxes and message labels: L2SV (LVL1 Result, LVL2 Decision); ROSs (RoI Data Requests, RoI Data); pROS (LVL2 Result). Step labels: Assemble RoI Data; Add to Event Queue; Get next Event from Queue; Run LVL2 Selection code; Request data + wait; Continue Selection code; Send Decision; If Accept send Result; If complete restart Worker.]
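The input-thread and worker-thread arrangement named in the diagram labels can be sketched roughly as follows. This is a minimal illustration, not the L2PU code: `Event`, `EventQueue`, `select_event` and `run_l2pu_sketch` are hypothetical names, the "selection" is a toy threshold, and the data request and wait step is left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a LVL1 result handed to the processing unit.
struct Event {
    int id;
    int energy;  // made-up quantity used by the toy selection below
};

// Minimal blocking queue shared by the input thread and the workers.
class EventQueue {
public:
    void push(const Event& e) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // pushing after close() is a usage error
            queue_.push(e);
        }
        ready_.notify_one();
    }

    // Signals that no more events will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an event is available.
    // Returns false once the queue is closed and drained.
    bool pop(Event& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Event> queue_;
    bool closed_ = false;
};

// Toy selection: accept events above a threshold.
inline bool select_event(const Event& e) { return e.energy > 50; }

// Runs one input thread and n_workers worker threads.
// Returns the number of accepted events.
inline int run_l2pu_sketch(const std::vector<Event>& input, std::size_t n_workers) {
    EventQueue queue;
    std::mutex result_mutex;
    int accepted = 0;

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n_workers; ++i) {
        workers.emplace_back([&queue, &result_mutex, &accepted] {
            Event e{};
            while (queue.pop(e)) {          // get next event from queue
                if (select_event(e)) {      // run selection code
                    std::lock_guard<std::mutex> lock(result_mutex);
                    ++accepted;             // stands in for "send result"
                }
            }
        });
    }

    std::thread input_thread([&queue, &input] {
        for (const Event& e : input) queue.push(e);  // add to event queue
        queue.close();
    });

    input_thread.join();
    for (std::thread& w : workers) w.join();
    return accepted;
}
```

Calling `run_l2pu_sketch(events, 4)` processes the events with four workers. In a real unit each worker would also block while waiting for RoI data, which is one reason to run several of them.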

Page 7

Sub-farm Interface (SFI) - role

[Diagram: the SFI, a DataFlow application with an interface to control. Other boxes: DFM, ROS, EF, mass storage. Message labels: LVL2 accepts and rejects; assign; EoE; clear; data request; data; request; complete event. Multiplicity labels: 50x, 140x, 1x (multiplicities are indicative only).]

The SFI:
• gets event id (L2 accept)
• asks for all event data
• gets it
• builds complete event
• buffers it
• sends it to Event Filter

Page 8

SFI Design

[Diagram: threads inside the SFI: Input Thread, Request Thread, Assembly Thread, Event Handler. External boxes: DFM, ROS, EF. Message labels: Event Assigns; End of Event; Data Requests; Event Data; Full Event. Internal labels: Assigns; Reask Fragment IDs; Fragments; Events. EB rate per SFI ~50 Hz.]

• Different threads for requesting and receiving data.
• Threads for assembly and for sending to Event Handler.

Page 9

Lesson with L2PU and SFI – STL containers

[Figure: axis label "# threads", annotated "time blocked!"]

• With no apparent dependence between threads in the code, the threads were observed not to run independently. Adding more threads had no effect.
• VisualThreads, using an instrumented pthread library:
– STL containers use a memory pool, by default one per executable. The pool has a lock, so threads may block each other.

Page 10

Lesson with L2PU and SFI – STL containers (2)

• The solution is to use the pthread allocator: independent memory pools for each thread, no lock, no blocking.
• Use it for all containers used at event rate.
• Be careful with creating objects in one thread and deleting them in another.

[Figure: axis label "# threads", annotated "blocked less often"]
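The idea of one memory pool per thread can be illustrated with today's standard library. This sketch does not use the pthread allocator from the talk. It swaps in C++17 `std::pmr::unsynchronized_pool_resource`, held in a `thread_local` variable, as a modern analogue. The names `thread_pool`, `fill_and_sum` and `run_in_threads` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <thread>
#include <vector>

// One pool per thread. unsynchronized_pool_resource takes no lock,
// so it must only ever be used from the thread that owns it.
inline std::pmr::memory_resource* thread_pool() {
    thread_local std::pmr::unsynchronized_pool_resource pool;
    return &pool;
}

// Fills a container from the calling thread's pool and returns the sum
// of its elements. The container is created and destroyed in the same
// thread, so its memory goes back to the pool it came from.
inline long fill_and_sum(int n) {
    assert(n >= 0);
    std::pmr::vector<int> values(thread_pool());
    for (int i = 0; i < n; ++i) values.push_back(i);
    return std::accumulate(values.begin(), values.end(), 0L);
}

// Runs fill_and_sum in n_threads threads, each allocating from its own pool.
inline std::vector<long> run_in_threads(std::size_t n_threads, int n) {
    std::vector<long> sums(n_threads, 0);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < n_threads; ++t) {
        threads.emplace_back([&sums, t, n] { sums[t] = fill_and_sum(n); });
    }
    for (std::thread& th : threads) th.join();
    return sums;
}
```

The warning about creating objects in one thread and deleting them in another applies here as well: a container built on one thread's pool must not be handed to another thread for destruction, because the pool is unsynchronized and disappears when its thread exits.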

Page 11

SFI History

Date        Change                                      EB         EB + Output to EF
30 Oct '02  First integration on testbed                0.5 MB/s   -
13 Nov      Sending data requests at a regular pace     8.0 MB/s   -
14 Nov      Reduce the number of threads                15 MB/s    -
20 Nov      Switch off hyper-threading                  17 MB/s    -
21 Nov      Introduce credit based traffic shaping      28 MB/s    -
13 Dec      First try on throughput                     -          14 MB/s
17 Jan      Chose pthread allocator for STL object      53 MB/s    18 MB/s
29 Jan      DC Buffer recycling when sending            56 MB/s    19 MB/s
05 Feb      IOVec storage type in the EFormat library   58 MB/s    46 MB/s
21 Feb      Buffer pool per thread                      64 MB/s    48 MB/s
21 Feb      Grouping interthread communication          73 MB/s    51 MB/s
26 Feb      Avoiding one system call per message        80 MB/s    55 MB/s

[On the slide, several of these changes are marked "threads".]

Most improvements (and most problems) are related to threads.
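The "Grouping interthread communication" entry can be illustrated by a channel that hands messages to the other thread in groups, so that the lock is taken and the receiver activated once per group rather than once per message. This is only a sketch under assumptions: `BatchingChannel` and its methods are hypothetical, and waking the consumer (for example with a condition variable, once per hand-off) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical channel that hands messages over in groups.
class BatchingChannel {
public:
    explicit BatchingChannel(std::size_t group_size) : group_size_(group_size) {
        assert(group_size > 0);
    }

    // Producer thread only: buffers locally, hands over once per full group.
    void send(int message) {
        pending_.push_back(message);
        if (pending_.size() >= group_size_) flush();
    }

    // Producer thread only: hands over whatever is buffered.
    void flush() {
        if (pending_.empty()) return;
        std::lock_guard<std::mutex> lock(mutex_);
        shared_.insert(shared_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        ++handoffs_;  // a consumer would be woken once per hand-off
    }

    // Consumer thread: takes everything delivered so far in one go.
    std::vector<int> receive_all() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<int> out;
        out.swap(shared_);
        return out;
    }

    // Number of times the lock was taken to hand a group over.
    std::size_t handoffs() {
        std::lock_guard<std::mutex> lock(mutex_);
        return handoffs_;
    }

private:
    const std::size_t group_size_;
    std::vector<int> pending_;  // touched by the producer only, no lock needed
    std::mutex mutex_;
    std::vector<int> shared_;   // protected by mutex_
    std::size_t handoffs_ = 0;  // protected by mutex_
};
```

With a group size of 100, sending 250 messages followed by one `flush()` results in three hand-offs instead of 250. The price is added latency for the messages that wait in the local buffer.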

Page 12

Lessons from SFI

• Traffic shaping (limiting the number of outstanding requests for data) eliminates packet loss.
• Grouping interthread communication decreases the frequency of thread activation.
• Some improvements in more predictable areas:
– avoiding copies and system calls,
– avoiding creations by recycling buffers,
– avoiding contention, each thread has its own buffers.
• Optimizations driven by measurements with full functionality.
• Effective development: developer works on a good testbed, tests and optimizes, short cycle.
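Traffic shaping by limiting the number of outstanding requests can be sketched with a credit counter: a request may only be sent while a credit is available, and each reply gives one credit back. The class `RequestCredits` and the helper `run_shaped_requests` are hypothetical, and the data source is played by a second thread that simply answers as fast as it can.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical credit counter: one credit per outstanding data request.
class RequestCredits {
public:
    explicit RequestCredits(std::size_t max_outstanding)
        : max_(max_outstanding), available_(max_outstanding) {
        assert(max_outstanding > 0);
    }

    // Blocks until a credit is free, then takes it. Call before sending
    // a request. Returns the number of requests outstanding afterwards.
    std::size_t acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        freed_.wait(lock, [this] { return available_ > 0; });
        --available_;
        return max_ - available_;
    }

    // Gives a credit back. Call when the reply to a request has arrived.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(available_ < max_);
            ++available_;
        }
        freed_.notify_one();
    }

    // Number of requests sent but not yet answered.
    std::size_t outstanding() {
        std::lock_guard<std::mutex> lock(mutex_);
        return max_ - available_;
    }

private:
    const std::size_t max_;
    std::size_t available_;
    std::mutex mutex_;
    std::condition_variable freed_;
};

// Sends n_requests while a second thread plays the data source and answers
// them. Returns the largest number of requests outstanding at any one time.
inline std::size_t run_shaped_requests(std::size_t n_requests,
                                       std::size_t max_outstanding) {
    RequestCredits credits(max_outstanding);
    std::thread source([&credits, n_requests] {
        std::size_t answered = 0;
        while (answered < n_requests) {
            if (credits.outstanding() > 0) {
                credits.release();  // the reply "arrives"
                ++answered;
            } else {
                std::this_thread::yield();
            }
        }
    });

    std::size_t peak = 0;
    for (std::size_t i = 0; i < n_requests; ++i) {
        peak = std::max(peak, credits.acquire());  // request would be sent here
    }
    source.join();
    return peak;
}
```

Whatever the timing of the two threads, the value returned by `run_shaped_requests(1000, 4)` never exceeds 4. Bounding the data in flight this way keeps it within what the switches and the receiving node can buffer.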

Page 13

Performance of the SFI

[Plot: EB rate (Hz), 0 to 60, versus #ROLs/ROS, 0 to 10, EB only. Annotations: "95 MB/s – IO limited"; "Throughput CPU limited (2.4 GHz CPU)"]

• Reaching the I/O limit at 95 MB/s, otherwise CPU limited.
• 35% performance gain with at least 8 ROLs/ROS.
• Will approach the I/O limit for 1 ROL/ROS with a faster CPU.

Page 14

Readout System (ROS) - role

[Diagram: ROS controller, I/O Manager, ROBins with ~12 buffers for data. Labels: LVL2 or EB; Data request; request data; data.]

RoI collection and partial event building. Not exactly like SFI:

                ROS                      SFI
Request rate    24 kHz L2, 3 kHz EB      50 Hz
Data per req.   2 kB LVL2, 8 kB EB       1.5 MB
Data rate       72 MB/s                  75 MB/s

All numbers approximate.
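The data rates in the comparison follow from the request rates and the data per request. A small check of that arithmetic, assuming 1 MB = 1000 kB (the function names are hypothetical):

```cpp
#include <cassert>

// Data rate in kB/s from a request rate in Hz and a reply size in kB.
constexpr long rate_kb_per_s(long requests_per_s, long kb_per_request) {
    return requests_per_s * kb_per_request;
}

// ROS: 24 kHz of LVL2 requests at 2 kB plus 3 kHz of EB requests at 8 kB.
constexpr long ros_rate_kb_per_s() {
    return rate_kb_per_s(24000, 2) + rate_kb_per_s(3000, 8);
}

// SFI: 50 Hz of events at 1.5 MB (1500 kB) each.
constexpr long sfi_rate_kb_per_s() {
    return rate_kb_per_s(50, 1500);
}

// Compares the computed rates with the approximate numbers on the slide.
inline void check_against_slide() {
    assert(ros_rate_kb_per_s() == 72000);  // 48 MB/s + 24 MB/s = 72 MB/s
    assert(sfi_rate_kb_per_s() == 75000);  // 75 MB/s
}
```

The two applications move a similar volume of data, but the ROS does so in roughly 500 times more, and much smaller, transactions. Per-request overhead therefore matters far more in the ROS than in the SFI.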

Page 15

IOManager in ROS

[Diagram. Legend: thread, process, Linux scheduler. Labels: Trigger; Requests (L2, EB, Delete); Request Queue; Request Handlers; RobIns; Control, error.]

The number of request handlers is configurable.

Page 16

Thread scheduling problem

[Plot: request rate (kHz), 0 to 100, versus # request handlers, 0 to 12, with curves "patch" and "no patch"]

• System without interrupts: poll and yield.
• 20 µs latency for getting data.
• The standard Linux scheduler puts the thread away until the next time slice, up to 10 ms.

The solution is to change scheduling in the kernel:
• For 2.4.9 kernels there exists an unofficial patch (tested on CERN RH7.2).
• For CERN RH7.3 there is a CERN-certified patch, linux_2.4.18_18_sched.yield.patch.

This is an evolving field; we need to continue evaluating thread-related changes of Linux kernels.
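A "poll and yield" loop of the kind described can be sketched as follows. `FakeDevice`, `poll_and_yield` and `run_poll_sketch` are hypothetical: a second thread stands in for the hardware, and `std::this_thread::yield()` stands in for the yield call the application makes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a device that sets a flag when data is ready.
struct FakeDevice {
    std::atomic<bool> data_ready{false};
    int payload = 0;
};

// Polls the device without interrupts. Between polls the thread yields the
// CPU so that other request handlers can run. How soon the scheduler gives
// the CPU back after a yield decides the latency seen by the request.
inline int poll_and_yield(FakeDevice& dev, std::uint64_t& polls) {
    polls = 0;
    while (!dev.data_ready.load(std::memory_order_acquire)) {
        ++polls;
        std::this_thread::yield();
    }
    assert(dev.data_ready.load());
    return dev.payload;
}

// Runs one polling handler against a thread that plays the hardware.
inline int run_poll_sketch(int value) {
    FakeDevice dev;
    std::thread hardware([&dev, value] {
        dev.payload = value;
        dev.data_ready.store(true, std::memory_order_release);
    });
    std::uint64_t polls = 0;
    const int got = poll_and_yield(dev, polls);
    hardware.join();
    return got;
}
```

If a yielding thread is only rescheduled at the next time slice, each poll can cost milliseconds although the data is ready within microseconds. That would explain why the request rate depends so strongly on the scheduler patch.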

Page 17

Conclusions

• The DataFlow of ATLAS DAQ has a set of applications managing the flow of data.
• All prototypes exist, have been optimized, are used for performance measurements and are prepared for the beam test.
• Standard technology (Gb Ethernet, PCs, standard Linux, C++ with gcc, multi-threaded) meets ATLAS requirements.
• A few lessons were learned.

Page 18

Backup slides

Page 19

Data Flow Manager (DFM) - role

[Diagram: the DFM, a DataFlow application with an interface to the Online SW. Other boxes: L2SV, ROS, SFI, EF, SFO, disk files, mass storage. Message labels: LVL2 accepts and rejects; assign; EoE; clear; data request; data. Multiplicity labels: 100x, 200x, 16x, 1x, 30x (multiplicities are indicative only).]

Page 20

DFM Design

• Bulk of work done in the I/O thread.
• Cleanup thread identifies timed-out events.
• Fully embedded in the DC framework.

Threads allow for independent and parallel processing within an application.

[Diagram: inside the DFM, an I/O Thread and a Cleanup Thread. Internal labels: Load Balancing; Bookkeeping; L2 Decisions; EndOfEvent; SFI Assigns; Timeouts. External boxes and message labels: L2SV (L2 Decisions); SFI (Event Assigns, EndOfEvent); ROS (Clears). I/O rate ~4 kHz.]
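The split between an I/O thread that does the bookkeeping and a cleanup thread that looks for timed-out events can be sketched as a shared table of assignment times. `EventBookkeeping` and its methods are hypothetical and only illustrate the idea. The time is passed in explicitly so that the behaviour is easy to check.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical bookkeeping shared by an I/O thread and a cleanup thread.
class EventBookkeeping {
public:
    explicit EventBookkeeping(Clock::duration timeout) : timeout_(timeout) {
        assert(timeout > Clock::duration::zero());
    }

    // I/O thread: the event was assigned to a builder at time `now`.
    void assign(std::uint32_t event_id, Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        assigned_[event_id] = now;
    }

    // I/O thread: EndOfEvent received, the event is complete.
    void end_of_event(std::uint32_t event_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        assigned_.erase(event_id);
    }

    // Cleanup thread: removes and returns the ids of all events that have
    // been assigned for longer than the timeout.
    std::vector<std::uint32_t> collect_timed_out(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::uint32_t> late;
        for (auto it = assigned_.begin(); it != assigned_.end();) {
            if (now - it->second > timeout_) {
                late.push_back(it->first);
                it = assigned_.erase(it);
            } else {
                ++it;
            }
        }
        return late;
    }

    // Number of events still being built.
    std::size_t pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return assigned_.size();
    }

private:
    const Clock::duration timeout_;
    std::mutex mutex_;
    std::map<std::uint32_t, Clock::time_point> assigned_;  // protected by mutex_
};
```

A cleanup thread would call `collect_timed_out(Clock::now())` periodically and reassign or clear the returned events. Keeping that scan out of the I/O thread lets the I/O thread stay on the message path.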

Page 21

STL containers (3)

Page 22

SFI performance

• Input up to 95 MB/s (~3/4 of the 1 Gb line).
• Input and output at 55 MB/s (~1/2 line speed).

With all the logic of EventBuilding and all the objects involved, the performance is already close to the network limit (on a 2.4 GHz PC).

Page 23

Performance of Event Building

[Plot: total data rate of the EB [MB/s], 0 to 800, versus # of SFIs, 0 to 10]

Max EB rate with 8 SFIs ~350 Hz (17% of ATLAS EB rate).

Setup:
• N SFIs
• 1 DFM
• hardware emulators of ROS

Page 24

After the patch

Xeon/2 GHz - Linux 2.4.18 + CERN scheduling patch

[Plot: L2 request rate (kHz), 0 to 200, versus # request handlers, 0 to 40, one curve per simulated I/O latency: 2, 5, 10, 20, 50, 100 and 1000 µs. Conditions: 100% L2 requests; 1 ROL per L2 request; release grouping = 100]

Page 25

Flow of messages

[Sequence diagram between RoIB, L2SV, L2PU, ROS/ROB, pROS, DFM, SFI and EF. Messages:
1a: L2SV_LVL1Result
1b: L2PU_LVL2Decision
2a: L2PU_DataRequest
2b: ROS/ROB_Fragment
3a: L2PU_LVL2Result
3b: pROS_Ack
4a: L2SV_LVL2Decision
4b: DFM_Ack
5a: DFM_Decision
5a': DFM_SFIAssign
5b: SFI_EoE
6a: SFI_DataRequest
6b: ROS/ROB_EventFragment
7: DFM_Clear
Also shown: SFI_FlowControl; DFM_FlowControl. Multiplicity labels: 1..i, 1..n. Activity labels: wait LVL2 decision or time out; sequential processing or time out; receive or timeout; Build event; wait EoE; time-out event; reassign.]

Note: 6a: SFI_DataRequest associated with 5a: DFM_Decision is used for error recovery.

Page 26

Deployment view

[Diagram. Elements: RODs; RO{B,S}; LVL2 Switch; EB Switch; RoIB; LVL2 Supervisors; SV Switch; LVL2 Processors; DFMs; DFM Switch; SFIs; SubFarm Switches; EF Switches; Local EF Farms; To Remote EF Farm.]