POOL Project Status
GridPP 10th Collaboration Meeting
Radovan Chytracek
CERN IT/DB, GridPP, LCG AA


June 3rd 2004, GridPP 10th Collaboration Meeting

What is POOL?

• Pool Of persistent Objects for LHC
  – Develops a common object I/O for High Energy Physics applications in the LHC era
• Started in April 2002
  – In the context of the LHC Computing Grid (LCG) Application Area (AA)
• Joint project of the LHC experiments and the CERN IT/DB group
  – Several GridPP-funded people actively involved
• Successfully used in production
  – LHC data challenges in 2004

POOL project purpose

• To allow storage and retrieval of the multi-PB of experiment data and associated metadata in a distributed and Grid-enabled fashion
• Data comes in different volumes
  – Event data, physics and detector simulation
  – Detector data and bookkeeping data
• Data comes in various forms
  – Bulk data
  – Time-dependent data
  – Metadata
• This challenge is met by a hybrid technology approach
  – C++ object streaming technology for bulk data
      • Using the ROOT framework
  – Transaction-safe services for catalogs, collections and metadata
      • Using RDBMS systems such as Oracle, MySQL, …

POOL architecture

• POOL is a storage-technology-neutral API
  – It is a component-based system following the LCG Architecture Blueprint recommendations
• POOL is built from software components, which
  – Implement pure abstract C++ interfaces
      • Experiment framework user code is insulated from concrete implementation details and technologies
  – Expose minimal dependencies
      • Weak coupling ensured by interactions only via their abstract interfaces
  – Are loaded on demand
      • Using the SEAL plug-in management and component model
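
The component idea on this slide (pure abstract interfaces, weak coupling, loading on demand) can be sketched as follows. This is an illustration only: `IStorageBackend`, `RootBackend`, `RdbmsBackend`, `BackendRegistry` and `makeDefaultRegistry` are hypothetical names, and the toy registry stands in for a plug-in manager without reproducing the SEAL API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical pure abstract interface: client code depends only on this type.
class IStorageBackend {
public:
    virtual ~IStorageBackend() = default;
    virtual std::string technology() const = 0;
};

// Two hypothetical concrete back-ends hidden behind the interface.
class RootBackend : public IStorageBackend {
public:
    std::string technology() const override { return "ROOT"; }
};

class RdbmsBackend : public IStorageBackend {
public:
    std::string technology() const override { return "RDBMS"; }
};

// Toy registry standing in for a plug-in manager (an assumption, NOT the SEAL API).
class BackendRegistry {
public:
    using Factory = std::function<std::unique_ptr<IStorageBackend>()>;

    // Registers a factory under a name; nothing is constructed yet.
    void registerFactory(const std::string& name, Factory factory) {
        assert(factory && "factory must be callable");
        factories_[name] = std::move(factory);
    }

    // Constructs the component only when it is requested ("on demand").
    // Returns nullptr when no factory is registered under that name.
    std::unique_ptr<IStorageBackend> create(const std::string& name) const {
        const auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Builds a registry that knows the two example back-ends.
inline BackendRegistry makeDefaultRegistry() {
    BackendRegistry registry;
    registry.registerFactory("root", [] { return std::make_unique<RootBackend>(); });
    registry.registerFactory("rdbms", [] { return std::make_unique<RdbmsBackend>(); });
    return registry;
}
```

A caller would ask the registry for `"root"` or `"rdbms"` and use the result only through `IStorageBackend`, so switching technology does not touch the client code.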

POOL Work Package breakdown

• Storage Manager
  – Streams transient C++ objects into/from storage
  – Resolves a logical object reference into a physical object
• File Catalog
  – Maintains the information about POOL-accessible data files
  – Helps the Storage Manager to resolve the physical location of the data
      • Resolves a logical reference into a physical data source
  – For more details see the talk of Maria Girone later this morning in the Grid Data Management track
• Collections
  – Provides the tools to manage (potentially large) ensembles of objects stored via POOL persistence services
      • Explicit: server-side selection of objects from queryable collections
      • Implicit: defined by physical containment of the objects
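
The File Catalog role described above, resolving a logical reference into a physical data source, can be illustrated with a minimal in-memory sketch. `ToyFileCatalog` and its methods are hypothetical and do not reproduce the POOL FileCatalog interface; the assumption is simply that one logical file name maps to one or more physical replicas.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory catalog: logical file name -> list of physical replicas.
class ToyFileCatalog {
public:
    // Records one more physical location for a logical file name.
    void registerReplica(const std::string& lfn, const std::string& pfn) {
        assert(!lfn.empty() && !pfn.empty());
        replicas_[lfn].push_back(pfn);
    }

    // Resolves a logical name to the first registered replica.
    // Returns false and leaves pfn untouched when the name is unknown.
    bool resolve(const std::string& lfn, std::string& pfn) const {
        const auto it = replicas_.find(lfn);
        if (it == replicas_.end()) return false;
        pfn = it->second.front();
        return true;
    }

    // Number of replicas known for a logical name (0 if unknown).
    std::size_t replicaCount(const std::string& lfn) const {
        const auto it = replicas_.find(lfn);
        return it == replicas_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::vector<std::string>> replicas_;
};
```

In this sketch the storage layer would call `resolve` before opening a file. A Grid-aware catalog would forward the same lookup to a replica service instead of a local map.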

Interaction between POOL components

[Diagram: the POOL API on top of three components and their implementations]
• Storage Service: ROOT I/O Storage Svc, RDBMS Storage Svc
• FileCatalog: XMLCatalog, MySQLCatalog, RelationalCatalog, EDG Replica Location Service
• Collections: ExplicitCollection, ImplicitCollection

POOL and the Grid

• POOL is Grid-aware via the File Catalog component based on the LCG Replica Location Service (RLS)
  – File resolution and metadata queries are forwarded to the Grid middleware
      • See talks in the Grid Data Management session
• The POOL Storage Manager allows access to a remote file via ROOT framework remote I/O facilities
  – Such as RFIO or dCache
• POOL Grid access facilities might evolve
  – The new Grid File Access Library (GFAL) introduces uniform access to file catalog and mass storage services
  – GFAL integration into POOL is being discussed by all involved parties

POOL New Developments

• Changes triggered by evolution of foundation libraries
  – Integration of the latest features in SEAL software
  – Parallel development of a ROOT 4 based storage service
• New developments due to the new set of use cases and requirements
  – Prototyping Relational Access back-ends
  – Implementation of some existing components using the new Relational Access layer

SEAL & ROOT 4 Related Development

• Adapt to the interface changes in the SEAL PluginManager
  – Simplification of plug-in management code
• Pick up new interfaces of the SEAL component model
  – Improves internal component organization and run-time configuration
  – Performed in close collaboration with experiments to ensure minimal impact on the client code
• Integration with ROOT 4
  – Evaluation work has already started
  – Improves support for STL data types
  – Faster execution thanks to direct calls to the ROOT API
• Will prepare a migration plan with the experiments
  – Until agreement is reached with experiments on migration, ROOT 3.x will be used in the production releases
• POOL 1.7.0 still with ROOT 3.x (we are maintaining a parallel development branch for bug fixing)
  – But we will offer a development version with ROOT 4 as well

POOL Relational Abstraction (I)

• Motivation: independence from DB vendors
• Activity started for most parts only in March
  – Requirements collection
  – Domain decomposition
  – Draft project plan
• Addressing the needs of the existing POOL relational components (FileCatalog, Collection), the POOL object storage mechanism (StorageSvc) and eventually also the ConditionsDB (if requested by the experiments)
• The use cases and requirements are defined in close cooperation with experiments

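POOL Relational Abstraction (II)

[Diagram: layering of the relational components; box labels as shown on the slide]
• POOL interfaces: POOL::StorageService, POOL::FileCatalog, POOL::Collection
• Existing implementations: Root Storage Service; XML, MySQL, EDG Catalog; MySQL, Root Collection
• Relational implementations: RelationalStorageSvc, RelationalFileCatalog, RelationalCollection
• Relational layers: ObjectRelationalAccess, RelationalAccess
• Back-end plug-ins: OracleAccess, SQLiteAccess, MySQLAccess, ODBCAccess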

POOL Relational Abstraction (III)

• Relational Abstraction Layer status:
  – Base interfaces defined
  – AuthenticationService implementation provided
  – Oracle plug-in implemented and unit-tested using OCI 9
  – SQLite plug-in implemented; testing in progress
  – ODBC plug-in implementation in progress
  – Proof-of-concept RelationalFileCatalog implemented and tested using the Oracle plug-in and the FileCatalog component
• MySQL access is via ODBC
  – A direct implementation now would run into maintenance problems, as the MySQL API will change with MySQL 5
  – Until then POOL will access MySQL via the more generic ODBC plug-in
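
The routing decision on this slide (Oracle and SQLite through their own plug-ins, MySQL through the generic ODBC plug-in) can be sketched as a small lookup. The `Plugin` enum, `pluginName`, `selectPlugin` and the `"<technology>://..."` connection-string convention are all assumptions made for illustration; none of them is taken from the POOL RelationalAccess interfaces.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical identifiers for the back-end plug-ins named on the slide.
enum class Plugin { Oracle, SQLite, ODBC };

// Human-readable name of a plug-in.
inline std::string pluginName(Plugin p) {
    switch (p) {
        case Plugin::Oracle: return "Oracle";
        case Plugin::SQLite: return "SQLite";
        case Plugin::ODBC:   return "ODBC";
    }
    assert(false && "unhandled Plugin value");
    return "";
}

// Picks a plug-in from the technology prefix of a connection string.
// Assumed convention: "<technology>://<rest>".
// Returns false (and leaves 'out' untouched) for malformed or unknown input.
inline bool selectPlugin(const std::string& connection, Plugin& out) {
    static const std::map<std::string, Plugin> routing = {
        {"oracle", Plugin::Oracle},
        {"sqlite", Plugin::SQLite},
        {"odbc",   Plugin::ODBC},
        {"mysql",  Plugin::ODBC},  // MySQL is served by the generic ODBC plug-in
    };
    const std::string::size_type sep = connection.find("://");
    if (sep == std::string::npos) return false;
    const auto it = routing.find(connection.substr(0, sep));
    if (it == routing.end()) return false;
    out = it->second;
    return true;
}
```

Keeping the routing in one table means a later switch to a native MySQL plug-in would be a one-line change in this sketch, which is the maintenance argument the slide makes.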

POOL Object Relational Storage

• Current status
  – Object/Relational mapping mechanism defined
      • User-driven mapping with default rules
      • Command line tools which generate and store the mapping given a set of header files
  – Implementation of the mapping I/O almost complete
• Next steps
  – Basic object I/O within the next weeks
  – Functional POOL Relational Storage Service soon afterwards
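
A "user-driven mapping with default rules" could look roughly like the sketch below: each data member gets a column type from a default rule unless the user supplies an override. The `Field` struct, `defaultSqlType`, `createTableSql` and the specific type rules are hypothetical; the actual POOL mapping rules and tools are not described in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one data member of a persistent class.
struct Field {
    std::string name;     // member name, reused as the column name
    std::string cppType;  // C++ type spelled as text
};

// Assumed default rule: map a C++ type name to an SQL column type.
inline std::string defaultSqlType(const std::string& cppType) {
    if (cppType == "int") return "INTEGER";
    if (cppType == "double") return "DOUBLE PRECISION";
    if (cppType == "std::string") return "VARCHAR(255)";
    return "BLOB";  // fallback for types without a dedicated rule
}

// Builds a CREATE TABLE statement for a class.
// 'overrides' maps a field name to a user-chosen column type and
// takes precedence over the default rule.
inline std::string createTableSql(
    const std::string& className,
    const std::vector<Field>& fields,
    const std::map<std::string, std::string>& overrides = {}) {
    assert(!className.empty());
    std::string sql = "CREATE TABLE " + className + " (";
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const auto it = overrides.find(fields[i].name);
        const std::string type =
            (it != overrides.end()) ? it->second : defaultSqlType(fields[i].cppType);
        if (i != 0) sql += ", ";
        sql += fields[i].name + " " + type;
    }
    sql += ")";
    return sql;
}
```

In the real tool chain the field list would be generated from header files, as the slide says. Here it is passed in by hand to keep the sketch self-contained.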

Summary

• POOL's highest priority remains the support of this year's data challenge and test beam activities
  – Therefore only a few FTEs are dedicated to new major developments
• Main POOL developments this year
  – Following the developments in SEAL
  – Integrating with ROOT 4
  – The relational abstraction layer
  – The relational storage manager
• Development progress is close to the proposed POOL work plan
• GridPP contribution has made a significant impact on POOL