FairRoot
Status and plans
Mohammad Al-Turany
6/25/13 M. Al-Turany, ALICE Offline Meeting 1
What is the FairRoot framework, and why is it needed?
http://fairroot.gsi.de
• Simulation, reconstruction, and analysis framework
• Started in 2003 as a two-person project for the CBM experiment at FAIR
• Long list of base classes and ready-to-use modules needed by particle physics experiments
Current hot topics in FairRoot
• Database interface
o Redesign of the database interface based on TSQLServer
• ZeroMQ integration
o Use of ZeroMQ as a communication layer
• Building, testing, and quality assurance systems
o Coverage tests, quality tests, and unit tests
• Online monitoring
o For test beams and detector prototypes
• GPU support and integration
• Time-based simulation
A long list of people have contributed pieces of code to FairRoot since the project started at the end of 2003.
FairRoot Developers:
Core team:
• Mohammad Al-Turany (IT)
• Denis Bertini (IT)
• Florian Uhlig (CBM / IT)
• Radek Karabowicz (PANDA / IT)
• Dmytro Kresan (R3B / IT)
• Tobias Stockmanns (PANDA, FZJ)
People who contributed major features:
• Ilse König (HADES)
• Volker Friese (CBM)
• Olaf Hartman (PANDA)
Students:
• Dennis Klein (finished 02.2013)
• Alexey Rybalchenko (EE)
FairRoot Group at the GSI
• Mohammad Al-Turany (IT)
• Denis Bertini (IT)
• Radoslaw Karabowicz (IT/PANDA)
• Dmytro Kresan (IT/R3B)
• Anar Manafov (IT)
• Alexey Rybalchenko (Master Student)
• Yago Gonzalez Rozas (Guest scientist)
• Florian Uhlig (IT/CBM)
• N.N. (Sep.2013) (IT)
Design
[Figure: FairRoot layered architecture. Source: Florian Uhlig, ROOT Users Workshop, Saas Fee, 13.03.13]
• External packages: ROOT (TEve, ROOT IO, TGeo, TVirtualMC, Cint, TTree, Proof, …), Geant3, Geant4, Geant4_VMC, VGM, libraries, …
• FairRoot: Run Manager, IO Manager, Runtime DB, DB Interface, Event Display, MC Application, Module, Detector, Task, Magnetic Field, Event Generator, …
• Experiment frameworks: CbmRoot, PandaRoot, AsyEosRoot, R3BRoot, SofiaRoot, MPDRoot, FopiRoot, EICRoot
FairRoot: Timeline
[Figure: timeline with axis marks 2004, 2006, 2010, 2011, 2012, 2013]
• Start testing the VMC concept for CBM
• First release of CbmRoot
• MPD (NICA) also starts using FairRoot
• ASYEOS joined (ASYEOSRoot)
• GEM-TPC separated from the PANDA branch (FOPIRoot)
• PANDA decided to join; FairRoot: same base package for different experiments
• R3B joined
• EIC (Electron Ion Collider, BNL): EICRoot
• 2012: SOFIA (Studies On Fission with Aladin)
• 2013: ENSAR-ROOT, a collection of modules used by structural nuclear physics experiments
Database Re-Design
Database in FairRoot:
The real database in FairRoot is completely hidden from the user and/or software developer.
• The runtime database is not a database in the classical sense, but a parameter manager.
• It knows the I/Os defined by the user and all parameter containers needed for the actual analysis and/or simulation.
• It manages the automatic initialization and saving of the parameter containers.
• After all initialization, the complete list of runs and related parameter versions is saved either to a database (Oracle, MySQL, …) or to ROOT files.
FairRoot DB Design (Old)
[Diagram: FairRoot Run Manager with its two I/O components]
• RunTime Database: ASCII file (configuration parameters), ROOT file (configuration parameters), Oracle
• IO Manager: ROOT file (MC points, digits, etc.)
FairRoot DB extended
[Diagram: FairRoot Run Manager with its two I/O components]
• RunTime Database: ASCII file (configuration parameters), ROOT file (configuration parameters), DB Interface via TSQLServer (Oracle, PostgreSQL, MySQL)
• IO Manager: ROOT file (MC points, digits, etc.)
Redesign of the database interface based on the ROOT Database Connectivity (RDBC) API, which provides a uniform interface to Oracle, MySQL, and PgSQL
• Database interface in FairRoot using TSQLServer
– (MySQL, Oracle, PostgreSQL, ...)
• Allows multiple connections to DBs at runtime
• Adds version management
– Data type: real and/or MC
– Detector type
– Date and time range
• Reduces SQL coding
– Simple predefined table
– Only simple SQL used
– Ultimately generic container
• Handles write/read access
Version Management
[Figure: validity time ranges (UTC) per detector and version, plotted against time; rows: STS CAL, MVD CAL, MVD TEMP]
• It must be possible to get a consistent set of information for any date (e.g. the start time of a certain run).
• It must be possible to answer the question: 'Which parameters were used when analyzing this run X years ago?' (The calibration might have been optimized several times since that date, and some bugs may have been detected and corrected in the meantime.)
Version Management: The Query Process
1. Context (Timestamp, Detector, Version) is the primary key
2. Context is converted to a unique SeqNo
3. SeqNo is used as the key to access all rows in the main table
4. System gives the user access to all such rows
[Figure: a context (RunID, time) is matched against the auxiliary validity table (SeqNo, validity frame); the matched SeqNo (e.g. 900001020) selects the rows (SeqNo, Col 1 … Col n) in the main table]
Reference: Bigtable: A Distributed Storage System for Structured Data, Google Inc., OSDI 2006
D. Bertini
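The four query steps can be sketched with a small in-memory database. This is only an illustration of the lookup idea, not the FairRoot schema: the table names, column names, and sample values below are assumptions, and SQLite stands in for Oracle/MySQL/PostgreSQL.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical auxiliary validity table and main table."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE validity (
            seqno    INTEGER PRIMARY KEY,
            detector TEXT,
            version  INTEGER,
            t_start  INTEGER,   -- start of validity range (UTC seconds)
            t_end    INTEGER    -- end of validity range, exclusive
        );
        CREATE TABLE main (seqno INTEGER, channel INTEGER, gain REAL);
        """
    )
    db.executemany(
        "INSERT INTO validity VALUES (?, ?, ?, ?, ?)",
        [(1, "STS", 0, 0, 1000), (2, "STS", 0, 1000, 2000), (3, "MVD", 0, 0, 2000)],
    )
    db.executemany(
        "INSERT INTO main VALUES (?, ?, ?)",
        [(1, 0, 1.00), (1, 1, 1.10), (2, 0, 0.95), (2, 1, 1.05), (3, 0, 2.00)],
    )
    return db


def query(db, timestamp, detector, version):
    """Resolve a context to a SeqNo, then return all main-table rows for it."""
    # Steps 1-2: the context (timestamp, detector, version) selects one SeqNo.
    row = db.execute(
        "SELECT seqno FROM validity "
        "WHERE detector = ? AND version = ? AND t_start <= ? AND ? < t_end",
        (detector, version, timestamp, timestamp),
    ).fetchone()
    if row is None:
        return []
    # Steps 3-4: the SeqNo is the key for all matching rows in the main table.
    return db.execute(
        "SELECT channel, gain FROM main WHERE seqno = ? ORDER BY channel", row
    ).fetchall()
```

With this layout, asking for the STS parameters at time 1500 returns the rows stored under SeqNo 2, while a timestamp outside every validity range returns nothing.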
New Data transfer layer for FairRoot
The Online Reconstruction and Analysis
[Figure: online data reduction with the rates 300 GB/s (20M Evt/s) → < 1 GB/s (25K Evt/s) and 1 TB/s → 1 GB/s, each needing > 60 000 CPU cores or equivalent GPU, FPGA, …]
We have the fastest algorithms, but:
• How to distribute the processes?
• How to manage the data flow?
• How to recover processes when they crash?
• How to monitor the whole system?
• …
Design constraints
• Highly flexible:
o Different data paths should be modeled.
• Adaptive:
o Subsystems are continuously under development and improvement.
• Should work for simulated and real data:
o Developing and debugging the algorithms.
• It should support all possible hardware where the algorithms could run (CPU, GPU, FPGA).
• It has to scale to any size, with minimal or ideally no effort.
Data transport
• How to handle dynamic components, i.e. pieces that go away temporarily?
• How to handle messages that we can't deliver immediately, particularly if we're waiting for a component to come back online?
• What if we need to use a different network transport, say multicast instead of TCP unicast, or IPv6? Do we need to rewrite the applications, or is the transport abstracted in some layer?
Before Re-inventing the Wheel
• What is available on the market and in the community?
o A very promising package: ZeroMQ has been available for 2 years.
• Do we intend to separate online and offline? NO
• Multi-threaded concept or multi-process concept based on message queues?
o Message-based systems allow us to decouple producers from consumers.
o We can spread the work to be done over several processes and machines.
o We can manage/upgrade/move around programs (processes) independently of each other.
ØMQ (zeromq)
• A socket library that acts as a concurrency framework.
• Carries messages across inproc, IPC, TCP, and multicast.
• Connect N-to-N via fanout, pubsub, pipeline, request-reply.
• Asynchronous I/O for scalable multicore message-passing apps.
• 30+ languages including C, C++, Java, .NET, Python.
• Most OSes including Linux, Windows, OS X, PPC405/PPC440.
• Large and active open source community.
• LGPL free software with full commercial support from iMatix.
What does it deliver?
• It handles I/O asynchronously, in background threads.
o These communicate with application threads using lock-free data structures.
o Concurrent ØMQ applications need no locks, semaphores, or other wait states.
• Components can come and go dynamically, and ØMQ will automatically reconnect.
o You can start components in any order.
o You can create "service-oriented architectures" (SOAs) where services can join and leave the network at any time.
• When a queue is full, ØMQ
o automatically blocks senders, or
o throws away messages, depending on the kind of messaging you are doing (the so-called "pattern").
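The two full-queue behaviours can be illustrated with a bounded queue from the Python standard library. This is a conceptual sketch only and does not use ZeroMQ; the function name `send` and the `lossy` flag are made up for the example.

```python
import queue


def send(q, msg, lossy):
    """Enqueue msg on a bounded queue; return True if it was queued.

    lossy=True  models a pattern that drops messages when the queue is full.
    lossy=False models a pattern that blocks the sender until there is room.
    """
    if lossy:
        try:
            q.put_nowait(msg)
            return True
        except queue.Full:
            return False  # message is thrown away, sender keeps running
    q.put(msg)  # blocks until a consumer frees a slot
    return True
```

With a queue of capacity 2 and no consumer, the third and fourth lossy sends return False, whereas a blocking send would wait for a consumer.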
What does it deliver?
• It does not impose any format on messages.
o They are blobs from zero bytes to gigabytes in size.
o You can use any other product (protocol) on top to represent your data (Google's Protocol Buffers, etc.).
• Applications talk to each other over arbitrary transports: TCP, multicast, in-process, inter-process.
o You don't need to change your code to use a different transport.
The built-in core ØMQ patterns are:
• Request-reply, which connects a set of clients to a set of services. (Remote procedure call and task distribution pattern)
• Publish-subscribe, which connects a set of publishers to a set of subscribers. (Data distribution pattern)
• Pipeline, which connects nodes in a fan-out / fan-in pattern that can have multiple steps and loops. (Parallel task distribution and collection pattern)
• Exclusive pair, which connects two sockets exclusively.
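As an illustration of the publish-subscribe pattern, the sketch below models a publisher that delivers each message to every subscriber whose subscription prefix matches, which is how ØMQ subscriptions filter messages. It is an in-process toy, not ZeroMQ code; the class and method names are assumptions.

```python
class Publisher:
    """Toy publisher: fans each message out to all matching subscribers."""

    def __init__(self):
        self._subscriptions = []  # list of (prefix, inbox) pairs

    def subscribe(self, prefix: bytes):
        """Register a subscriber; returns the list that receives its messages."""
        inbox = []
        self._subscriptions.append((prefix, inbox))
        return inbox

    def publish(self, message: bytes):
        """Deliver message to every subscriber whose prefix matches."""
        for prefix, inbox in self._subscriptions:
            if message.startswith(prefix):  # empty prefix matches everything
                inbox.append(message)
```

A subscriber registered with the prefix `b"STS"` sees only messages starting with those bytes, while one registered with `b""` sees everything.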
Current Status
• The framework delivers some components which can be connected to each other in order to optimize the data flow topology.
• All components share a common base class called Device (ZeroMQ class).
• Devices are grouped into three categories:
o Source: Sampler
o Message-based processors: Sink, BalancedStandaloneSplitter, StandaloneMerger, Buffer
o Content-based processor: Processor
Panda Example
Example for Panda online reconstruction hierarchy (scenario)
[Diagram: detector simulation provides MVD pixel data and MVD strip data to clusterers and then to a tracker, running on computing units and connected through REQ/REP and PUB/SUB sockets; a parameter database is attached via PUB/SUB. Labels in the figure: FairMQ package; experiment/detector-specific code; framework classes that can be used directly.]
Correct semantics for logging
[Diagram: several devices publish logs on XPUB sockets; a Log Aggregate (XSUB/XPUB) forwards them to a Log Writer (XSUB)]
• Pub/Sub sockets
• Never block
• Lossy! (if needed)
• Buffer sizes / locations configurable
• Arbitrary message size
Results
• A throughput of 940 Mbit/s was measured, which is very close to the theoretical limit of TCP/IPv4 over Gigabit Ethernet.
• The throughput of the named pipe transport between two devices on one node was measured at around 1.7 GB/s.
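For context, the theoretical limit can be estimated from the per-frame overhead. The sketch below assumes a standard 1500-byte MTU (no jumbo frames), 20-byte IPv4 and TCP headers, and 38 bytes of Ethernet framing per packet (preamble, header, frame check sequence, inter-frame gap); these assumptions are not stated on the slide.

```python
LINE_RATE_MBIT = 1000.0          # Gigabit Ethernet line rate
MTU = 1500                       # bytes of IP packet per Ethernet frame
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, header, FCS, inter-frame gap
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_goodput_mbit(tcp_options=0):
    """Maximum TCP payload rate in Mbit/s for full-size frames."""
    payload = MTU - IPV4_HEADER - TCP_HEADER - tcp_options
    on_wire = MTU + ETH_OVERHEAD
    return LINE_RATE_MBIT * payload / on_wire


no_options = tcp_goodput_mbit()                  # about 949 Mbit/s
with_timestamps = tcp_goodput_mbit(tcp_options=12)  # about 941 Mbit/s
```

Under these assumptions the limit is roughly 941 to 949 Mbit/s depending on TCP options, so a measured 940 Mbit/s leaves very little overhead attributable to the messaging layer.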
[Figure: payload in MByte/s as a function of message size (128 Byte to 512 kByte; vertical axis 0 to 1400); series: 10 Gbit, 56 Gbit IB]
Each message consists of the digits of one Panda event for one detector, with a size of a few kBytes.
ZeroMQ works on InfiniBand, but using IP over IB.
Integrating the existing software:
[Diagram: the same task interface in both execution models, with input from ROOT files, Lmd files, remote event server, …]
• ROOT (event loop): FairRootManager, FairRunAna, FairTasks with Init(), Re-Init(), Exec(), Finish()
• ZeroMQ: FairMQProcessorTask with Init(), Re-Init(), Exec(), Finish()
FairBase/examples/Tutorial3
Next to implement
• Local and central log processors
• Command channels and objects (messages)
• Automatic monitoring and configuration
(hopefully by the end of this year!)
Summary
• The ZeroMQ communication layer is integrated into our offline framework (FairRoot).
• In the short term, we will keep both options: the ROOT-based event loop and concurrent processes communicating with each other via ZeroMQ.
• In the long term, we are moving away from a single event loop to distributed processes.
Thank you!
Native InfiniBand/RDMA is faster than IP over IB
Implementing ZeroMQ over IB verbs will improve the performance.
Device
• Each processing stage of a pipeline is occupied by a process which executes an instance of the Device class.
Sampler
• Devices with no inputs are categorized as sources.
• A sampler loops (optionally: infinitely) over the loaded events and sends them through the output socket.
• A variable event rate limiter has been implemented to control the sending speed.
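A sampler with a rate limiter might look roughly like the generator below. It is a sketch of the idea, not the FairMQ Sampler: the function name, its parameters, and the injectable `clock` and `sleep` hooks are all assumptions made for the example.

```python
import itertools
import time


def sample(events, rate_hz, loops=1, clock=time.monotonic, sleep=time.sleep):
    """Yield the loaded events at no more than rate_hz events per second.

    loops=None repeats the event list forever, otherwise it is replayed
    `loops` times. In a real device each yielded event would be sent
    through the output socket.
    """
    interval = 1.0 / rate_hz
    next_send = clock()
    passes = itertools.repeat(None) if loops is None else range(loops)
    for _ in passes:
        for event in events:
            delay = next_send - clock()
            if delay > 0:
                sleep(delay)  # wait until the next send slot
            yield event
            next_send += interval
```

Passing a fake clock and sleep function makes the pacing testable without waiting in real time.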
Message format (Protocol)
• Potentially any content-based processor or any source can change the application protocol. Therefore, the framework provides a generic Message class that works with any arbitrary, contiguous chunk of memory (FairMQMessage).
• One has to pass a pointer to the memory buffer and the size in bytes, and can optionally pass a function pointer to a destructor, which will be called once the message object is discarded.
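The buffer, size, and optional destructor interface can be mimicked in a few lines. The class below is a hypothetical stand-in, not FairMQMessage; the method names and the `on_release` callback are assumptions. The same shape appears in libzmq's `zmq_msg_init_data`, which takes a data pointer, a size, and a free function.

```python
class Message:
    """Wraps an existing contiguous buffer without copying it."""

    def __init__(self, buffer, size, on_release=None):
        self._view = memoryview(buffer)[:size]  # zero-copy view of the buffer
        self._size = size
        self._on_release = on_release

    def data(self):
        """Return the zero-copy view of the payload."""
        return self._view

    def size(self):
        """Return the payload size in bytes."""
        return self._size

    def close(self):
        """Discard the message and call the destructor callback once."""
        if self._view is not None:
            self._view.release()
            self._view = None
            if self._on_release is not None:
                self._on_release()
```

The callback gives the owner of the buffer a chance to free or recycle it once the transport no longer needs the data.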
New simple classes without ROOT are used in the Sampler. This enables us to use non-ROOT clients and reduces the message size.
Processor design
[Diagram: a Processor with N data input sockets, N data output sockets, a log server, and a command client]
Content-based Processor
• The Processor device has at least one input and one output socket.
• A task is meant for accessing and potentially changing the message content.
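The relation between a content-based processor and its task can be sketched as follows. The class names, method names, and the example task are illustrative assumptions; plain lists stand in for the input and output sockets.

```python
class Task:
    """Hypothetical task interface with init / execute / finish hooks."""

    def init(self):
        pass

    def execute(self, payload: bytes) -> bytes:
        raise NotImplementedError

    def finish(self):
        pass


class ThresholdTask(Task):
    """Example task: keep only the bytes at or above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def execute(self, payload: bytes) -> bytes:
        return bytes(b for b in payload if b >= self.threshold)


class Processor:
    """Reads each input message, lets the task change it, emits the result."""

    def __init__(self, task):
        self.task = task

    def run(self, inbox):
        self.task.init()
        outbox = [self.task.execute(msg) for msg in inbox]
        self.task.finish()
        return outbox
```

Here the processor owns the message flow, while the task is the only part that looks inside the message content.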
Message-based Processor
• All message-based processors inherit from Device and operate on messages without interpreting their content.
• Four message-based processors have been implemented so far
Example of fan-out/fan-in of the data path for load balancing
[Diagram: a single chain MVD data → Clusterer → MVD Tracker, next to a load-balanced chain MVD data → FairMQBalancedStandaloneSplitter → three Clusterers → FairMQStandaloneMerger → three Trackers]
[Diagram variant: MVD data → FairMQBalancedStandaloneSplitter → three Clusterers → FairMQStandaloneMerger → a single MVD Tracker]
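The fan-out/fan-in idea can be sketched without any messaging library. The sketch assumes round-robin distribution for the splitter, which the slides do not confirm; the function names are made up and lists stand in for sockets.

```python
from itertools import cycle


def split_round_robin(messages, n_outputs):
    """Fan-out: distribute messages evenly over n_outputs branches."""
    outputs = [[] for _ in range(n_outputs)]
    for branch, msg in zip(cycle(outputs), messages):
        branch.append(msg)
    return outputs


def merge(branches):
    """Fan-in: collect the messages of all branches into one stream."""
    merged = []
    for branch in branches:
        merged.extend(branch)
    return merged
```

Neither function inspects the messages, matching the description of message-based processors as operating on messages without interpreting their content. Note that the merged stream is not in the original order, so downstream code must not rely on ordering.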