obk – an online high energy physics’ meta-data repository list of authors: dr. i.alexandrov, dr....

32
OBK – An Online High Energy Physics’ Meta-Data Repository List of authors : Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-Chromek, Dr. M.Caprini, Dr. M.Dobson, Dr. J.Flammer, Mr. R.Hart, Dr. R.Jones, Mr. A.Kazarov, Mr. S.Kolos, Dr. V.Kotov, Dr. D.Liko, Mr. L.Lucio, Dr. L.Mapelli, Mr. M.Mineev, Dr. L.Moneta, Dr. I.Papadopoulos, Ms. M.Nassiakou, Dr. N.Parrington, Mr. L.Pedro, Mr. A.Ribeiro, Dr. Yu.Ryabov, Mr. D.Schweiger, Mr. I.Soloviev, Dr. H.Wolters Presentation by: Levi Lúcio

Upload: shon-hudson

Post on 26-Dec-2015

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

OBK – An Online High Energy Physics’ Meta-Data Repository

List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-Chromek, Dr. M.Caprini, Dr. M.Dobson, Dr. J.Flammer, Mr. R.Hart, Dr. R.Jones, Mr. A.Kazarov, Mr. S.Kolos, Dr. V.Kotov, Dr. D.Liko, Mr. L.Lucio, Dr. L.Mapelli, Mr. M.Mineev, Dr. L.Moneta, Dr. I.Papadopoulos, Ms. M.Nassiakou, Dr. N.Parrington, Mr. L.Pedro, Mr. A.Ribeiro, Dr. Yu.Ryabov, Mr. D.Schweiger, Mr. I.Soloviev, Dr. H.Wolters

Presentation by: Levi Lúcio

Page 2: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Introduction (1) - CERN

Founded in 1954, CERN (European Organization for Nuclear Research) is a wide international collaboration (80 nationalities);

The objective of CERN is the experimental study of physics, in particular the study of matter and the forces that hold it together;

Within CERN’s lifetime, several important physics discoveries have been made, along with technology breakthroughs such as the WWW.

Page 3: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Introduction (2) - Accelerator

The LHC (Large Hadron Collider) accelerator is now being built at CERN to be ready in 2007. It will be the most powerful particle accelerator in the world and will allow breaking new barriers in HEP (High Energy Physics):

Page 4: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Introduction (3) - Detectors

Along the accelerator ring, several detectors (4) will be put in place. The ATLAS (A Toroidal LHC ApparatuS) is one of them:

Page 5: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Introduction (4) - Physics

Two particle beams travelling in the accelerator in opposite senses at 99.9999997% of the light speed meet head on in the detector, producing new particles;

The interaction (collision) of two particles and their final state products is called an event;

For ATLAS, many events need to be collected to have strong statistics that prove the theory - a very rare particle (Higgs boson) is searched for.

Page 6: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Introduction (5) - Triggers

The rate of events at ATLAS will be extremely high - 40 MHz;

Only a fraction of those events (1/107) is interesting - a powerful filter (trigger) is necessary;

This still means 100 events of 1Mbyte each per second - 100MByte/s storage;

The ATLAS is expected to produce 1PByte/year of event data.

LVL1

LVL2

HLT

40 MHz

100 KHz

1 KHz

100 Hz (100 M/s)

DBMS

Page 7: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Introduction (6) - OBK Online Book-keeper

Part of the Online Software system - online control, configuration and monitoring of the detector and triggers (thousands of machines);

Records and manages log data (meta-data) about the detector and trigger chain (diversified information);

Project undertaken in 1996 by the Lisbon FCUL / ATLAS group - L.Lucio, L.Pedro, A.Amorim, A.Ribeiro

ATLAS detectorand triggers

Online Software

OBK

DBMS

Page 8: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Databases in HEP (1) - History

Before the 1980s - database market not mature to handle size and complexity; in-house solutions in FORTRAN;

1980s - relational solutions to handle book-keeping data; interest in OO persistent data model;

1990s - standardization of OO databases (ODMG); investigation and consequent usage of commercial Objectivity/DB by LHC and other HEP experiments;

2002 - LHC experiments dropped Objectivity/DB and are searching for alternatives - Oracle 9i, homegrown ROOT?

Page 9: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Databases in HEP (2) Today’s needs

Management of large amounts of data (petabytes);

Support of addition of significative quantities of data on a daily basis;

Support of simultaneous queries;

Support of data access over international networks;

Flexible data model supporting versioning and schema evolution;

Adequate interfacing to tertiary storage.

Page 10: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Databases in HEP (3) Today’s trends

Indecision between homegrown (OO ROOT) or external (OR Oracle 9i) databases;

Not clear what data model to use (pure OO, Object-Relational?);

Heavy research on data distribution - replication, interfacing with GRID;

Homegrown/external

Data model OO/OR

Distribution

Page 11: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

The OBK (1) - Definition

Defined in the ATLAS technical proposal as the

component that “archives information about the data recorded to permanent storage by the data acquisition system. It records the information to be later used during data analysis1 on a per-run2 basis (run cataloger). It provides interfaces for retrieving and updating the information.” 1After being collected, event data is analyzed “manually”. 2A data taking period with a given machine parameterization.

Page 12: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

The OBK (2) Development approach

Prototypical spiral (3 prototypes - OBK/Objectivity, OBK/OKS and OBK/MySQL);

Well defined software development process + documentation production;

Usage of development support tools: CVS, CMT (platform management), Perl, Rose, documentation templates, etc.

Integration

Requirementsgathering

High leveldesign

Implementation

Testing

Requirementsdocument

DB and codediagrams

Developer anduser manuals

Test report

Page 13: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

The OBK (3) Online Software context

The OBK is part of the Databases super-component of the Online Software:

LVL1

Detector

DataFlow

SCADA

LVL2

EF

Online Sw.

Run Control

Messaging

Monitoring

Databases

Ancilliary

Page 14: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Requirements gatheringMain Use Cases

Data acquisition:Data acquisition: After being started with the Online Software, the OBK will acquire the specified data in an automatically without human intervention;

Information updating:Information updating:Users will want to add their own annotations to the acquired data;

Data access:Data access:It will be possible for several kinds of clients, such as humans, applications or offline data analysis frameworks to access the database adequately;

Data administration:Data administration:Users will want to manage and administrate the OBK database.

Page 15: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

High level design (1) Package overview

ISISInformation System

MRSMRSMessage Reporting System

ConfDBConfDBConfiguration Databases

DBMS

OBK acquisition software

C++ APIWeb BrowserAdministrative tools

Online Software

IS MRS ConfDB

Page 16: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

High level design (2)Logical database structure

Partition:Partition: subset of the detector and triggers that can acquire data independently.

Partition n-1 Partition n Partition n+1

Run n Run n+1Run n-1

AnnotationsIS

Meta-infoConfiguration

DataMRS

MessagesIS

Messages

Page 17: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation Languages and tools

C++ programming languageC++ programming languageUsed to code all OBK acquisition engines (including connections to the DBMSs) and API software;

STL (Standard Template Library)STL (Standard Template Library) Data containers and algorithm templates used as building blocks for C++ applications;

Objectivity/DBObjectivity/DBCommercial distributed object oriented database management system;

OKSOKSIn-memory persistent object manager implemented in-house to satisfy ATLAS’ needs in terms of configuration databases;

MySQLMySQLOpen source relational database management system;

PHPPHPGeneral purpose scripting language, specially adequate for web programming;

PerlPerlGeneral purpose scripting language;

ApacheApacheWidely used HTTP server.

Page 18: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation Objectivity prototype(1)

1

OBKSLCRun

1..*OBKRun

OBKRunWEvents

OBKAnnotation

OBKAuthor OBKMRSParam

OBKMRSMessage

OBKISInfo

OBKISAttribute

OBKISAttrBasic

OBKISAttrArray

OBKISDocument

OBKConfFiles

0..*

1 1

1

1

1

1

1

1..*

0..*

0..*

0..*

0..*

Federation

Database

Container

Object

Partition

Run

Object

class OBKRun : public ooContObj {

public: Run (); Run (uint32 runNumb); void setRunNumb (uint32 runNumb); uint32 getRunNumb ();

ooRef(OBKComment) runComms[] <-> commToRun[]; ooRef(Coordinator) runCoordinator <-> rCoordinated[]; ooRef(LockedStatus) runToLStat[] <-> lStatOfRun[]; ooRef(OBKConfdb)hasConfig <-> appliesToRuns[];

protected: uint32 m_runNumb; d_Timestamp m_startDate; d_Timestamp m_endDate;};

Inherits fromObjectivity class

References to accesspersistent objects

Page 19: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation Objectivity prototype(2)

CommentsComments Objectivity/DB makes available specialized engines to handle

connections and concurrency;

Very good integration between code and DMBS - minimal difference between persistent and transient objects;

The prototype makes use of Objectivity/DB transactions. A new transaction is started for each new run;

Objectivity/DB’s locking mechanism is used explicitely in the code to avoid incoherent reads/writes. MROW (Multiple Readers One Writer) facility used to read data as soon as it is written.

Page 20: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation OKS prototype(1)

Data model includes XML data files and objects; A data file is either in-memory or in disk (atomic loads); Database schema equivalent to OBK/Objectivity’s one.

DB Root (File System)

Partition 1 Partition 2 Partition 3

Run 21 Run 22 Run 23

Federationdata file

Partitiondata files

Rundata files

OksClass *PartitionInfo = new OksClass( "PartitionInfo",

false);{ OksAttribute *partitionName = new OksAttribute( "partitionName",

OksAttribute::string_type,false,"unknown",true);

PartitionInfo->add(partitionName); PartitionInfo->add(inUse);}

Definition of anew OKS “object”

Page 21: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation OKS prototype(2)

CommentsComments OKS is a persistency C++ library. No services other than the

ones included in the library are made available;

No concurrency management is available. In the OBK case concurrency was implemented using OS mechanisms;

No transactions are available. At the beginning of each run new data files are opened and at the end of the run closed;

The prototype includes optimized accesses to certain parameters which are very requested. They are kept in a special central data file (cache).

Page 22: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation MySQL prototype(1)

Relational modelRelational model: completely different database schema from previous OO approaches.

MYSQL *sock,mysql;MYSQL_RES *res;MYSQL_ROW tmp;string selectqbuf;char * date;

selectqbuf = ("SELECT MAX(StartDate) FROM run WHERE PartitionID =" + PartitionId);

if(mysql_query(sock,selectqbuf.c_str())){ userMessaging->m_obkErr(new string("Query: " + selectqbuf + " failed! " + (string)mysql_error(sock)),2); } if(!(res = mysql_store_result(sock))) {

userMessaging->m_obkErr(new string ("Couldn't get result from query: " + (string)mysql_error(sock)),2);

}

tmp = mysql_fetch_row(res); date =tmp[0];}

Query executionrequest to the

engine

Page 23: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation MySQL prototype(2)

CommentsComments As Objectivity/DB, MySQL also makes available an engine to deal with

queries;

Concurrency issues are managed transparently by the MySQL engine;

Transactions and atomic operations are made available by the MySQL engine - not used by the OBK though;

Indexes on certain key tables were created to accelerate queries (up to a factor of 45 speed difference);

XML used to deal with the difficulty of storing collection types.

Page 24: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Implementation - Data Access

Command line dumpCommand line dumpDebug situations, not many available resources;

C++ APIC++ APIShared library; uses STL for return structures;

Web-based browserWeb-based browserMore sophisticated, includes administrative tools. Heavier on resources than previous solutions.

Page 25: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Performance & Scalability (1)

While for the OBK/Objy and the OBK/OKS store times rise (check if the run already exists), the OBK/MySQL presents low and constant store times.

OBK/Objy

OBK/OKS

OBK/MySQLAll tests performed in unloaded linux7.1/gcc2.96 PIII/800MHz

Page 26: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Performance & Scalability (2)

The OBK/OKS is the fastest - the operation takes place in memory, no I/O accesses.

OBK/Objy

OBK/OKS

OBK/MySQL

Page 27: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Performance & Scalability (3)

While being accessed simultaneously by multiple IS servers the OBK/OKS presents the best performance - fastest IS storing time. Also, the Online Software is affected by OBK’s performance;

Worst performance in storage space by OBK/Objy - a container always allocates a predefined number of fixed size pages, even if they are not used.

Prototypevs. #of ISservers

10 20 50 100

OBK/Objy OK OK, butsometimesthe IS/MRSservers getblocked

Onlinesoftwarebecomesblocked

-

OBK/OKS OK OK OK, butsometimesthe IS/MRSservers getblocked

Onlinesoftwarebecomesblocked

OBK/MySQL

OK OK, butsometimesthe IS/MRSservers getblocked

Onlinesoftwarebecomesblocked

-

OBK/ Objy OBK/ OKS OBK/ MySQL

63.4 M 3.7 M 2.11 M

Page 28: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Performance & Scalability (4)

Best results by OBK/MySQL, due to the efficiency of the MySQL query engine, faster than hand-coded queries in the OO prototypes;

In query 2 the OBK/Objy presents the worse performance - the parameters which are searched for are cached in the case of OBK/Objy and OBK/OKS;

Local Remote

OBK/Objy 0.13 0.94OBK/OKS 0.38 1.42OBK/MySQL 0.39 0.70

Local Remote

OBK/Objy 18.13 116.08OBK/OKS 0.35 1.18OBK/MySQL 0.02 0.08

Page 29: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Performance & Scalability Overall Results

Best overall results by the MySQL OBK prototype;

Strong results also from the OKS OBK prototype, mainly due to its in-memory features;

Less optimal results achieved by the Objectivity/ DB prototype - requires deep know-how to be properly tuned.

Page 30: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Deployment

Large scale tests of the Online Software (simulated environment): 2001 (OBK/OKS): 111 nodes running on 111 machines; 2002 (OBK/MySQL): 210 nodes running on 210 machines.

Testbeams (real data acquisition with parts of the detector running): 2000 (OBK/Objy): 2 Gbytes acquired; 2001 (OBK/OKS): 3 Gbytes acquired; 2002 (OBK/MySQL): 5 Gbytes acquired (still running).

Page 31: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Some metrics

Effort(man/month)

Lines of Code

OBK/Objy 18 4166

OBK/OKS 4 8799

OBK/MySQL 3 6193

Effort(man/month)

OBK/Objy 1

OBK/OKS 1

OBK/MySQL 0.5

Requirements gathering: 2 man/month, 2 documents produced.

Documentation: 3 man/month, User & Developer’s manual,Test report.

Page 32: OBK – An Online High Energy Physics’ Meta-Data Repository List of authors: Dr. I.Alexandrov, Dr. A.Amorim, Ms. E.Badescu, Ms. M.Barczyk, Ms. D.Burckhart-

Levi Lucio - CERN EP/ATD, FCUL

Lessons learnt

Software Development ProcessSoftware Development ProcessFollowing a formal approach to the development of the three prototypes yielded: easy comparison of the OBKs; diminishment of the effort to build the latter prototypes; delivery of a quality OBK product;

TechnologyTechnologyOO DBMS technology is very flexible in terms of data mapping and provides natural integration with programming languages. It is possible to follow an OO development approach both for application and database. RDBMS technology is less elegant but very efficient…

Interaction with usersInteraction with usersGood and constant interaction with the final users of the system makes development simpler and faster. Continuous enhancement of the knowledge about the systems and the people the software interacts with is essencial while putting the problem under perspective.