Post on 18-Dec-2015
Persistency at LHC
Vincenzo Innocente, CERN
History is as old as Persistency
April 18, 2023 Vincenzo Innocente LCB workshop
Sources and Contributions
• Presentations at the last RD45 workshop
• Presentations at the “Architecture Working Group”
• Experiments’ Web pages
• Contributions to this Workshop
• Focus on LHC experiments’ prototypes
• New generation experiments (BaBar, STAR, RunII): experience and plans
Persistency: what for?
[Figure labels: Process 1, Process 2, Process 3; Permanent Storage; Volatile Memory]
A process saves its state to be re-used later by
• the same process
• a different process running the same executable
• a different process running a different executable
Ideal persistency: Core Dump!
Use Cases
• Extended (in space and time) virtual memory:
  • proprietary format optimized for computational and storage performance of a single application
• Import/Export in a heterogeneous environment
  • “standard” application-independent format
  • conversion to/from internal application format
• Management of different versions (identification, query mechanism) and of concurrency (locking)
  • proprietary internal mechanism
  • rely on the file system
  • DBMS
Conversion is always required. What makes the difference is the level at which it is done:
• Operating System (or below)
• Persistency Service Provider
• Application Framework
• Application Code
Doing it at a given level does not imply that it has not also been done at a lower level.
Doing it at a higher level introduces flexibility but reduces performance.
Doing it at a lower level improves performance but requires tight integration (it binds to a given solution).
Caveat: concurrency is not only for banks...
“Myprog.cc changed on disk; really edit the buffer?” (emacs, not Oracle)
Object Persistency
[Figure labels: Permanent Storage; Volatile Memory; Event objects]
Objects
• are atomic entities
• have a state (data members including relationships)
• provide services (methods)
Persistent objects
• survive process boundaries
• when “retrieved”, have the same state and provide the same services as when they were “stored”
Object Persistency
• Persistency
  • Objects retain their state between two program contexts
• Storage entity is a complete object
  • State of all data members
  • Object class
• OO Language Support
  • Abstraction
  • Inheritance
  • Polymorphism
  • Parameterised Types (Templates)
OO Language Binding
• User had to deal with copying between program and I/O representations of the same data
• User had to traverse the in-memory structure
• User had to write and maintain specialised code for I/O of each new class/structure type
• Tight Language Binding
  • ODBMS allow persistent objects to be used directly as variables of the OO language
  • C++, Java and Smalltalk (heterogeneity)
• I/O on demand: no explicit store & retrieve calls
Problems with Naïve OP
• Storing services (methods ready to run) is non-trivial
  • persistency services are just object-data stores
  • configuration management takes care of code
  • frameworks can use dynamic loading to match data & code
• Clean and performant object design is difficult:
  • Different (partial) representations of the state of an object may be required to cope with computational, storage and I/O efficiencies (and code development efficiency)
• Object design and implementation evolve, persistent objects stay the same
  • “Old” persistent objects need to be converted
More Problems with Naïve OP
• Object granularity does not match raw I/O granularity (which in turn is device dependent)
  • small objects should be physically clustered according to users’ access patterns
• Object logical relationships do not necessarily reflect access patterns (the old rows vs columns dilemma)
• How objects become persistent
  • At construction time (user can control clustering)
  • By reachability: an object becomes persistent when “attached” to an already persistent object (clustering control is difficult)
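Persistence by reachability can be pictured as a graph traversal from a persistent root. The following is a minimal Python sketch, not the API of any ODBMS; `Node`, `refs` and `persistent_closure` are hypothetical names introduced only for illustration.

```python
class Node:
    """A toy object with a name and references to other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def persistent_closure(root):
    """Return the sorted names of all objects reachable from `root`.

    Iterative depth-first traversal; the `seen` map (keyed by object
    identity) keeps cycles from looping forever.
    """
    seen = {}
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        stack.extend(obj.refs)
    return sorted(o.name for o in seen.values())


# Hypothetical object graph: event -> track -> hit -> event (a cycle).
event, track, hit, scratch = Node("event"), Node("track"), Node("hit"), Node("scratch")
event.refs.append(track)
track.refs.append(hit)
hit.refs.append(event)

# `scratch` is never attached to the root, so it stays transient.
reachable = persistent_closure(event)
```

In this sketch the traversal order, not the user, decides which objects are visited together, which hints at why clustering is hard to control under this rule.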
Physical Model and Logical Model
• Physical model may be changed to optimise performance
• Existing applications continue to work
Realistic Object Persistency
[Figure labels: file; page; objects; compression?; conversion from/to computational optimal format?; conversion from/to machine dependent format; new shape]
Components of a POM
• Storage manager
  • manages the physical structure on “disk”
• Transaction/concurrency manager
  • client transaction, journaling, locking mechanisms (or rely on OS and file system protections)
• RTTI system
  • identifies the concrete type of object to retrieve/store
• Converters
  • from storage format to “user” format and vice versa
  • machine dependencies, schema evolution, user hooks
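How a type tag and converters might cooperate under schema evolution can be sketched as follows. This is a hedged Python illustration under stated assumptions: the `Track` class, its two schema versions, the JSON storage format and the `CONVERTERS` registry are all hypothetical and do not describe any of the products discussed here.

```python
import json
from dataclasses import dataclass


@dataclass
class Track:
    """Current in-memory ("user") shape: three momentum components."""
    px: float
    py: float
    pz: float


# Registry of converters keyed by (class name, schema version).
CONVERTERS = {}


def converter(class_name, version):
    """Decorator that registers a storage-to-user converter."""
    def register(func):
        CONVERTERS[(class_name, version)] = func
        return func
    return register


@converter("Track", 1)
def track_v1(payload):
    # Hypothetical old schema: a single list "p" of three floats.
    px, py, pz = payload["p"]
    return Track(px, py, pz)


@converter("Track", 2)
def track_v2(payload):
    # Hypothetical current schema: one named field per component.
    return Track(payload["px"], payload["py"], payload["pz"])


def store(obj):
    """Serialise with the type tag and the current schema version (2)."""
    return json.dumps({
        "class": type(obj).__name__,
        "version": 2,
        "payload": {"px": obj.px, "py": obj.py, "pz": obj.pz},
    })


def retrieve(record):
    """Pick the converter from the stored type tag and version."""
    doc = json.loads(record)
    return CONVERTERS[(doc["class"], doc["version"])](doc["payload"])


old_record = '{"class": "Track", "version": 1, "payload": {"p": [1.0, 2.0, 3.0]}}'
from_old = retrieve(old_record)
round_trip = retrieve(store(Track(4.0, 5.0, 6.0)))
```

The stored class name plays the role of the RTTI system, and the version number selects the schema-evolution path; old records are converted when read rather than rewritten.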
Components of a POM
• Application Cache manager
  • dynamic memory management with garbage collection
• Tools and (G)UI
  • naming, indexing, query mechanisms
  • interactive browsing and query
  • development tools
  • administration tools
Objectivity/DB
• ODBMS close to ODMG standard (library, not framework)
• Storage Manager based on fixed physical hierarchy
  • slot-page-container-database(file)-federation
• Lock-server and journals to manage transactions
• Proprietary parsing of an extension of C++ (ooddlx)
• Objects are converted when “opened”
  • schema-evolution effects: automatic or user defined
• Basic naming, indexing and query mechanisms
• Crude browsing and administration tools
  • but Objy is integrated with some third-party frameworks
ROOT
• Application Framework with embedded I/O
• Storage Manager based on logical hierarchy
  • TBasket-branch-tree
  • physical “logical-records” in files
• No transactions, no concurrency management
• Parsing of C++ subset via CINT
• Objects are converted when retrieved (Streamer)
  • Automatically or by user (schema evolution only by user)
• Basic naming, indexing or query mechanisms
  • and CINT scripting
• “Paw”erful interactive environment
(Wrapped O)RDBMS
• Powerful, reliable and efficient storage managers with full concurrency and transaction management
• SQL query mechanisms with transparent (hidden) indexing and naming
• User-friendly, fully integrated browsers and tools (for relational tables)
• Poor object integration (developers should be both OO and ER experts at the same time)
HEP Data
• Environmental data
  • Detector and Accelerator status
  • Calibrations, Alignments
• Event-Collection Meta-Data (luminosity, selection criteria, …)
• …
• Event Data, User Data
[Figure labels: Event Collection; Collection Meta-Data; Event; Electrons; Tracks; Tracker Alignment; Ecal calibration; User Tag (N-tuple)]
Environmental Data
[Figure labels: time axis; Calibration, Alignment, Geometry and Parameters items, each with successive versions (Version A, Version B, Version C)]
Snapshot of the environmental data items valid for the currently processed event.
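Selecting the snapshot valid for an event amounts to an interval-of-validity lookup per item. The sketch below is a minimal Python illustration, assuming each version is valid from its start time until the next version begins; `ValidityStore`, `snapshot` and the time values are hypothetical.

```python
import bisect


class ValidityStore:
    """Versions of one environmental item, each valid from a start time on."""

    def __init__(self):
        self._starts = []   # sorted start times
        self._values = []   # payloads, parallel to _starts

    def add(self, start, value):
        """Insert a version, keeping the start times sorted."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._values.insert(i, value)

    def at(self, time):
        """Return the version whose validity interval contains `time`."""
        i = bisect.bisect_right(self._starts, time) - 1
        if i < 0:
            raise LookupError("no version valid at time %r" % (time,))
        return self._values[i]


def snapshot(stores, event_time):
    """Collect, for every item, the version valid at `event_time`."""
    return {name: store.at(event_time) for name, store in stores.items()}


# Hypothetical validity ranges in arbitrary time units.
calibration = ValidityStore()
calibration.add(0, "A")
calibration.add(100, "B")

alignment = ValidityStore()
alignment.add(0, "A")
alignment.add(50, "B")
alignment.add(150, "C")

stores = {"calibration": calibration, "alignment": alignment}
early = snapshot(stores, 10)
late = snapshot(stores, 120)
```

The sketch covers validity in time only; managing several competing versions for the same interval would need an extra version key.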
Event Structure & Placement (BaBar)
[Figure labels: EventHeader; placement regions and databases Hdr, Sim, Raw, Rec, Esd, Aod, Tag; headers SimHeader, RawHeader, EmcHeader, TrkHeader, PidHeader, BetaHeader; data objects SimData, RawData, EmcData, TrkData, PidData, BetaData; TagEvs]
BaBar Event Structure
• Decoupling of placement & navigation
• Hierarchical Placement Regions
  • Sim (Simulated Data) ~100 kBytes/event
  • Tru (Simulated Truth Data) ~40 kBytes/event
  • Raw (Raw Data) ~30 kBytes/event
  • Rec (Reconstructed Data) ~100 kBytes/event
  • Esd (Event Summary Data) ~20 kBytes/event
  • Aod (Analysis Object Data) ~2 kBytes/event
  • Tag (Event Selection Tag) ~200 Bytes/event
• Navigation Trees
  • Minimize size of navigation headers
  • Allow for expansion of data without schema evolution
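The per-region sizes can be added up to see what separate placement buys. This is a small worked calculation in Python, assuming 1 kByte = 1000 bytes and taking the approximate figures quoted on the slide at face value; the function name is hypothetical.

```python
# Approximate bytes per event for each placement region (from the slide).
REGION_BYTES_PER_EVENT = {
    "Sim": 100_000,
    "Tru": 40_000,
    "Raw": 30_000,
    "Rec": 100_000,
    "Esd": 20_000,
    "Aod": 2_000,
    "Tag": 200,
}


def bytes_per_event(regions):
    """Sum the approximate per-event size of the given regions."""
    return sum(REGION_BYTES_PER_EVENT[name] for name in regions)


all_regions = bytes_per_event(REGION_BYTES_PER_EVENT)  # every region
analysis_only = bytes_per_event(["Aod", "Tag"])        # analysis-level data
```

With these figures all regions together come to about 292 kBytes per event, while a job that touches only Aod and Tag reads about 2.2 kBytes per event.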
Dynamic Load Balancing: Hierarchical Secure AMS
[Figure label: Dynamic Selection]
ODBMS-MSS Integration
• SLAC-Objy Plan
• Extensible AMS
  • Allows use of any type of filesystem via oofs layer
• Generic Authentication Protocol
  • Allows proper client identification
• Opaque Information Protocol
  • Allows passing of hints to improve filesystem performance
• Defer Request Protocol
  • Accommodates hierarchical filesystems
• Redirection Protocol
  • Accommodates terabyte+ filesystems
  • Provides for dynamic load balancing
One Technology for All?
• Event catalogues
  • Update (add and remove) items of a catalogue
  • Searchable: SQL or equivalent
• Event data
  • Write once, read many (WORM)
  • Often on tertiary (sequential) storage
  • Bulk data used by the entire collaboration (Raw, Rec, …)
  • User extracted data (N-tuples)
One Technology for All?
• Detector data
  • Updates of data items
  • Versioning of data items
  • Version configuration
• Statistical data
  • Understandable by interactive tools
A single coherent solution (non-optimal for all purposes), or an ad-hoc optimal product for each given type?
LHCb Event Persistency
[Figure labels: AppManager; Algorithm; Transient Event Store; Event Data Service; Persistency Service; OutputStream; SicbCnvSvc and RootCnvSvc, each with Converters; Sicb data files (Sicb/Zebra); Root data files (Root I/O)]
LHCb Generic Persistent Model
[Figure labels: 12-byte OID; fields Storage Type, Class ID, Entry ID, Link ID; Technology; Converter; lookup table from Link ID (<number>) to Link Info (DB/Cont. name, ...); steps (1) to (4)]
LHCb Link Tables
• One Link table per Storage technology per DB
• Link to Objy object
  • no link table
  • 8 Bytes are enough to hold ooRef directly
• Link to ROOT object
  • Link table entry must contain all navigation info
    • File name
    • Tree/Branch name
• Link to ZEBRA (SICB) object
  • Link Table contains file name + ZEBRA bank name
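The link-table idea can be sketched as a small lookup keyed by storage technology and link ID. This is a hedged Python illustration, not the GAUDI implementation: the `Ref` fields follow the names on the slides, while the table contents, file names and bank name are hypothetical, and the Objy case (no link table) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A compact persistent reference, with field names as on the slides."""
    storage_type: str
    class_id: int
    entry_id: int
    link_id: int


# One link table per storage technology (for a single hypothetical DB).
# Each entry holds the navigation info that does not fit in the reference.
LINK_TABLES = {
    "ROOT": {0: ("run1.root", "Events/Tracks")},   # file name, tree/branch name
    "ZEBRA": {0: ("run1.sicb", "TRAK")},           # file name, bank name
}


def resolve(ref):
    """Expand a reference into (file name, container name, entry id)."""
    file_name, container = LINK_TABLES[ref.storage_type][ref.link_id]
    return (file_name, container, ref.entry_id)


root_location = resolve(Ref("ROOT", class_id=7, entry_id=42, link_id=0))
zebra_location = resolve(Ref("ZEBRA", class_id=7, entry_id=3, link_id=0))
```

Keeping long names in the table lets the reference itself stay at a fixed, small size; the class ID would then select the converter, which this sketch does not model.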
Hybrid Event Store in STAR
• Adoption of ROOT I/O for the event store leaves Objectivity with one role left to cover: the true ‘database’ functions of the event store
  • Navigation among event collections, runs/events, event components
  • Data locality (now translates basically to file lookup)
  • Management of dynamic, asynchronous updating of the event store from one end of the processing chain to the other
    • From initiation of an event collection (run) in online through addition of components in reconstruction, analysis and their iterations
• But with the concerns and weight of Objectivity it is overkill for this role.
• So we went shopping…
  • looking to leverage the world around us, as always
  • and eyeing particularly the rising wave of Internet-driven tools and open software
• and came up with MySQL in May.
Data Retrieval and Analysis: MySQL + Files
[Figure labels: MySQL data catalog; user requests (Run123, Yr1Central); dataset lookup; dataset components (file references with event ranges); Grand Challenge managed retrieval; HPSS or file based data store; ROOT disk files for DST, RAW DAQ, Flow tag, DST hits, High Pt uDST, Flow uDST, High Pt tag; Analysis. New components and tags are created as new files and added to the catalog (to the original or a new dataset).]
ATLAS
• Used Objectivity in several test-bed applications
  • HCAL test-beam
  • ATLFAST++
  • 1TB Milestone (HPSS used as MSS)
• Plan to use Objectivity in future test-beams and Monte Carlo reconstruction
• The application framework will provide a “database” independent interface
CMS
• Uses Objectivity in production
  • Test Beam DAQ
  • Monte Carlo (GEANT3) reconstruction
• Objectivity fully integrated in Application Framework (CARF)
  • CARF manages transactions, physical clustering and the whole persistent object structure and its relations with the transient structure
  • users access persistent objects through C++ pointers
    • CARF takes care of pinning
  • leaf inheritance from ooObj often used
CMS
• Limited use of Objectivity “extensions”
  • associations, indexes, maps, query predicates, etc.
  • object copy, move, versions
• Schema evolution routinely used
  • No complex object conversion attempted so far
• Multi-federation environment to decouple
  • production
  • analysis
  • development
ALICE
• Simulation and reconstruction framework fully integrated in ROOT
• Used in Monte Carlo simulation and reconstruction
• Will be used in test beams
• Mockup Data Challenge done: 7 TB in seven days
• Use HPSS and/or CASTOR for file management
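The Data Challenge figure translates into a sustained rate with one line of arithmetic. The calculation below is a small Python sketch, assuming a decimal terabyte (10^12 bytes) and continuous running over the full seven days; neither assumption is stated on the slide.

```python
TB = 10**12                      # assumption: decimal terabyte
total_bytes = 7 * TB             # 7 TB written
seconds = 7 * 24 * 3600          # seven days of continuous running

# Average sustained rate in megabytes (10^6 bytes) per second.
rate_mb_per_s = total_bytes / seconds / 10**6
```

Under these assumptions the average comes to roughly 11.6 MB/s, the same order as the 10 MB/s links labelled in the ALICE DC II figure.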
ALICE DC II
[Figure labels: NA57 data source (9 PowerPC AIX LDCs, switch, 5 MB/s); ALICE DAQ data source (Intel/PC Linux + PowerPC/AIX + Sun LDCs, switch, 10 MB/s); switch; Computer Centre Intel/Linux PC cluster (10/15 nodes); DATE = GDC + LDC; GDC Event Builder; pipe; ROOT Objectifier; 10 MB/s; Gigabit Ethernet; HPSS; CASTOR ??]
LHCb
• Do not want to limit to one persistency technology
  • Speed, when you need speed
  • Functionality, when you need functionality
  • Ease migration to upcoming (superior) technologies
• Independence
  • Well defined interface to persistency technologies
  • Interface: abstract technology-independent API
  • Example: ODBC for relational DBs
LHCb
• LHCb application framework (GAUDI) is independent of the persistency technology
• Manages its own application caches (data services) specialized in
  • event data
  • detector data
  • statistical data
• Abstract interface for user-provided converters
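A technology-independent persistency interface can be sketched with an abstract base class and pluggable back-ends. The Python below is only an illustration of the pattern, not the GAUDI API: `ConversionService`, `PersistencyService` and the method names are hypothetical, and JSON and pickle stand in for real storage technologies.

```python
import json
import pickle
from abc import ABC, abstractmethod


class ConversionService(ABC):
    """Abstract interface every storage back-end must implement."""

    @abstractmethod
    def write(self, obj):
        """Convert a transient object to its stored bytes."""

    @abstractmethod
    def read(self, data):
        """Convert stored bytes back to a transient object."""


class JsonService(ConversionService):
    def write(self, obj):
        return json.dumps(obj).encode("utf-8")

    def read(self, data):
        return json.loads(data.decode("utf-8"))


class PickleService(ConversionService):
    def write(self, obj):
        return pickle.dumps(obj)

    def read(self, data):
        return pickle.loads(data)


class PersistencyService:
    """Dispatches to the back-end registered under a technology name."""

    def __init__(self):
        self._backends = {}

    def register(self, technology, service):
        self._backends[technology] = service

    def save(self, technology, obj):
        return self._backends[technology].write(obj)

    def load(self, technology, data):
        return self._backends[technology].read(data)


svc = PersistencyService()
svc.register("json", JsonService())
svc.register("pickle", PickleService())

event = {"run": 1, "tracks": [1.5, 2.5]}
restored = {tech: svc.load(tech, svc.save(tech, event)) for tech in ("json", "pickle")}
```

Application code sees only `save` and `load`; adding a technology means registering one more back-end, which is the kind of finite migration effort the interface is meant to guarantee.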
BaBar
• Taking data since May
• Use Objectivity for all kinds of data
  • many home-made tools to manage the database
• Complete decoupling between transient objects (seen by end user) and their persistent representations
• No schema evolution (explicit renaming of classes)
• Starting to use multiple federations to decouple running environments
STAR
• Moved away from Objectivity mainly because of configuration management issues
• Hybrid solution:
  • ROOT for event files
  • MySQL for event catalog and environmental data
  • MySQL under test for event tags as well
• HPSS (through Grand Challenge) for tertiary storage management
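The catalog half of such a hybrid store can be pictured as a relational table of file references with event ranges. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for MySQL; the table layout, column names and file names are hypothetical and are not taken from the STAR schema.

```python
import sqlite3

# In-memory database standing in for the MySQL catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE components (
        dataset     TEXT,
        component   TEXT,
        filename    TEXT,
        first_event INTEGER,
        last_event  INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO components VALUES (?, ?, ?, ?, ?)",
    [
        ("Run123", "DST", "run123_dst_0.root", 1, 1000),
        ("Run123", "DST", "run123_dst_1.root", 1001, 2000),
        ("Run123", "tag", "run123_tag_0.root", 1, 2000),
    ],
)


def files_for(dataset, component, event):
    """Return the files holding `component` of `dataset` for one event."""
    rows = conn.execute(
        "SELECT filename FROM components "
        "WHERE dataset = ? AND component = ? "
        "AND first_event <= ? AND last_event >= ? "
        "ORDER BY filename",
        (dataset, component, event, event),
    ).fetchall()
    return [row[0] for row in rows]


dst_files = files_for("Run123", "DST", 1500)
tag_files = files_for("Run123", "tag", 1500)
```

The event data stay in the files; the database answers only the question of which file to open, so new components can be added as new rows without touching existing files.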
Objectivity Burdens in STAR
• The list of burdens imposed by Objectivity grew as our experience and lessons from BaBar mounted
• Management and development burden imposed by ensuring a consistent schema in a single experiment-wide federation
• Schema evolution unusable if forward compatibility is desired (ability to run old executables on new data)
• Do-it-yourself access control, particularly with AMS
• Risk of major impact from platform lock-in due to porting delays; both Linux and Sun
• Scalability concerns (fall ‘98): lock manager performance issues in parallel usage?
Requirements: STAR 8/99 View
Requirement               Obj 97    Obj 99     ROOT 97        ROOT 99
C++ API                   OK        OK         OK             OK
Scalability               OK        ?          No file mgmt   MySQL
Aggregate I/O             OK        ?          OK             OK
HPSS                      Planned   OK?        No             OK
Integrity, availability   OK        OK         No file mgmt   MySQL
Recovery from lost data   OK        OK         No file mgmt   OK, MySQL
Versions, schema evolve   OK        Your job   Crude          Almost OK
Long term availability    OK?       ???        OK?            OK
Access control            OS        Your job   OS             OS, MySQL
Admin tools               OK        Basic      No             MySQL
Recovery of subsets       OK        OK         No file mgmt   OK, MySQL
WAN distribution          OK        Hard       No file mgmt   MySQL
Data locality control     OK        OK         OS             OS, MySQL
Linux                     No        OK         OK             OK
Fermi RUNII (CDF & DØ)
• Sequential access model based on RUNI experience
  • focus on efficient data access from hierarchical storage
  • clustering optimized to largest data volume access pattern
• Use
  • ROOT (CDF), EVpack (modified DSPACK) (DØ) for event files (MSQL and Oracle8 evaluated by DØ)
    • just I/O back-ends to EDM and DØOM
  • DØ uses SAM for event catalog and file management
    • Oracle8 supporting database
Data Organization (Fermi RunII)
[Figure labels: Physical Clustering; Metadata; Event Information Tiers; Warm Cache; User and physics group (derived) data]
From Oct 1997 Review - Lee Lueking
Data Access (Fermi RunII)
[Figure labels: Mass Storage; Pipeline; Consumers. Legend: Disk Storage, Tape Storage, File, Event, Data flow, Group of Users, Single User, Pipeline Name]
Lee Lueking - October 1997
[Figure: Stage IV, 2/23/98. Labels: RAW Data Archive (R1); Analysis Data (R2); Static Data Buffer (R3), Thumbnail; Dynamic Data Buffer (R4); User Analysis Disk (R5); Derived Data Cache (R6); PickWarm Cache (R7); PickHot Cache (R8); Reconstruction Farm (P-1); PickEvents (P-2); Analysis Processing (P-3); PickEvents (P-4); On Demand Files; Freight Train Data; EDU50; EDU250; To PickHot. The numeric bandwidth labels cannot be assigned to links from the extracted text.]
Season IV - aggregate bandwidths, summed from spreadsheet
Toward 2001 Milestone
“If the ODBMS industry flourishes it is very likely that by 2005 CMS will be able to obtain products, embodying thousands of man-years of work, that are well matched to its worldwide data management and access needs. The cost of such products to CMS will be equivalent to at most a few man-years. We believe that the ODBMS industry and the corresponding market are likely to flourish. However, if this is not the case, a decision will have to be made in approximately the year 2000 to devote some tens of man-years of effort to the development of a less satisfactory data management system for the LHC experiments.”
(CMS Computing Technical Proposal, section 3.2, page 22)
Commercial vs Open Source
Commercial:
• Robust, tested, maintained, well documented (is stable)
• Response to upgrade requests is often slow
  • They cannot jeopardize deployed applications
  • priority given to short-term profit
• difficult to understand internal details (no source), but in principle documentation should be enough
• can go out of business
Open Source:
• Good enough for physicists
• Requires internal certification
• Response to upgrade requests even too fast
  • old users usually ready to jump on new features
  • priority given to challenging requests...
• Open source: often you need it….
• Author could get bored
ODBMS
• Objectivity seems to satisfy HEP technical requirements
• Needs upgrade for
  • VLDB support
  • Mass storage interface
  • remote access and data distribution
• Not really a DBMS. More a DB access layer
  • needs to be integrated (or interfaced) with application frameworks and administration tools
• It is the only real ODBMS survivor on the market
  • how long will it last?
ROOT
• A physics analysis framework with I/O support
  • Classified also as a rapid-prototyping tool (B. Meyer)
• Not sufficient for the management of large data volumes (LHC major requirement)
  • an external DBMS is required to manage Meta-Data
• Limited experience so far (as POM in production)
• Many motivated users actively supported by the authors
• Requires major architectural changes to make it “modular” for those who do not want to use it as a framework
Yet Another POM
• Prototype required to
  • uncover requirements
  • understand problems
  • estimate development effort
• Usable as test-bed before asking a commercial partner for upgrades
• Usable as “light-pom”?
  • no transaction, no journaling, no schema, just data…
Personal Comments
• Event Data
  • object modeling and direct navigation OK
  • DBMS tools (query processor, smart-association, index, names, versions) more a burden than a help
• Event Catalog, Environmental data, Detector description
  • fit better standard (O)DBMS practices and tools
• Statistical data
  • simple I/O is not enough, need direct relations with event catalog and event data
• Relational models do not suit HEP applications
Personal Comments
• “Applications need to be independent of underlying technologies”
• Migration to a new technology should imply a finite effort:
  • Market survey: 0.5 PY
  • Learning: 1 year
  • Implementation: 1 PY
  • User Migration: 0?
(P stands for Person, not Peta!)
Personal Comments
My personal LEP experience brought me to the conclusion that a multitude of persistency solutions is difficult to manage and integrate properly.
In particular, a file-based event store (with filenames encoding metadata) does not scale.
My current (limited) experience tends to convince me more and more that a coherent approach to persistency is the only solution for LHC, given the resource constraints we have.
Discussion Items
• What kind of POM should be used for
  • raw data
  • reconstructed data
  • user data
  • meta-data
  • environmental data
• Is a coherent solution possible?
• Hybrid Solution
  • Have we evaluated all technical risks?
Discussion Items
• Persistency & Framework
  • which integration
  • conversion & transient cache: who should be responsible for them?
• Hierarchical storage:
  • Does it impose constraints on the data model and access model (and eventually on the POM)?
• What have we learned in using
  • Objectivity
  • ROOT
  • RDBMS
Discussion Items
• Non-technical risks:
  • benefits and risks in choosing between a Commercial and an Open-Source solution
• 2001 Milestone
  • What should really be decided in 2001?
  • Do we understand all technical aspects involved in choosing a POM?
  • Do we need any further R&D?