Outline: ARDA services
LHCb mini-workshop on Data Management and Production Tools
Ph. Charpentier

• The ARDA RTAG
• The ARDA services
• The proposed project
• Mapping to LHCb services

PhC, 16/10/03 ARDA services, LHCb workshop on Data Management 2

ARDA Mandate

Mandate for the ARDA RTAG

• To review the current DA activities and to capture their architectures in a consistent way
• To confront these existing projects with the HEPCAL II use cases and the user's potential work environments in order to explore potential shortcomings
• To consider the interfaces between Grid, LCG and experiment-specific services
  – Review the functionality of experiment-specific packages, state of advancement and role in the experiment
  – Identify similar functionalities in the different packages
  – Identify functionalities and components that could be integrated in the generic GRID middleware
• To confront the current projects with critical GRID areas
• To develop a roadmap specifying wherever possible the architecture, the components and potential sources of deliverables to guide the medium term (2 year) work of the LCG and the DA planning in the experiments

ARDA Schedule and Makeup

Schedule and Makeup of ARDA RTAG

The RTAG shall provide a draft report to the SC2 by September 03.
• It should contain initial guidance to the LCG and the experiments to inform the September LHCC manpower review, in particular on the expected responsibilities of
  – The experiment projects
  – The LCG (Development and interfacing work rather than coordination work)
  – The external projects

The final RTAG report is expected for October 03.

The RTAG shall be composed of
• Two members from each experiment
• Representatives of the LCG GTA and AA
• If not included above, the RTAG shall co-opt or invite representatives from the major Distributed Analysis projects and non-LHC running experiments with DA experience.

• Alice: Fons Rademakers and Predrag Buncic
• Atlas: Roger Jones and Rob Gardner
• CMS: Lothar Bauerdick and Lucia Silvestris
• LHCb: Philippe Charpentier and Andrei Tsaregorodtsev
• LCG GTA: David Foster, stand-in Massimo Lamanna
• LCG AA: Torre Wenaus
• GAG: Federico Carminati

ARDA Distributed Analysis Services

• Distributed Analysis in a Grid Services based architecture
  – ARDA Services should be OGSI compliant -- built upon OGSI middleware
  – Frameworks and applications use ARDA API with bindings to C++, Java, Python, PERL…
    – interface through UI/API factory -- authentication, persistent “session”
  – Fabric Interface to resources through CE, SE services
    – job description language, based on Condor ClassAds and matchmaking
  – Database(s) through Dbase Proxy provide statefulness and persistence

• We arrived at a decomposition into the following key services
  – Authentication, Authorization, Accounting and Auditing services
  – Workload Management and Data Management services
  – File and (event) Metadata Catalogues
  – Information service
  – Grid and Job Monitoring services
  – Storage Element and Computing Element services
  – Package Manager and Job Provenance services
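
The matchmaking idea behind a ClassAds-style job description can be illustrated with a minimal sketch. This is not Condor's ClassAd language and not an ARDA interface: the attribute names (`os`, `free_cpus`, `memory_mb`, `owner_vo`) and the helper `match` are hypothetical, and requirements are modelled as plain Python predicates rather than ClassAd expressions.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # an "ad" is just a bag of attributes (assumption)


def match(job: Ad, resources: List[Ad]) -> Optional[Ad]:
    """Two-sided matchmaking: keep the resources that satisfy the job's
    requirements AND whose own requirements accept the job, then return
    the one the job ranks highest (or None if nothing matches)."""
    candidates = [
        r for r in resources
        if job["requirements"](r) and r["requirements"](job)
    ]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


# Hypothetical Computing Element ads
ces = [
    {"name": "ce-a", "os": "linux", "free_cpus": 4, "memory_mb": 512,
     "requirements": lambda job: job["owner_vo"] == "lhcb"},
    {"name": "ce-b", "os": "linux", "free_cpus": 16, "memory_mb": 1024,
     "requirements": lambda job: True},
    {"name": "ce-c", "os": "solaris", "free_cpus": 64, "memory_mb": 2048,
     "requirements": lambda job: True},
]

# Hypothetical job ad: needs Linux and 512 MB, prefers more free CPUs
job = {
    "owner_vo": "lhcb",
    "requirements": lambda r: r["os"] == "linux" and r["memory_mb"] >= 512,
    "rank": lambda r: r["free_cpus"],
}

chosen = match(job, ces)
print(chosen["name"] if chosen else "no match")
```

Both sides state requirements, which is the point of matchmaking as opposed to a one-way resource query: a CE can refuse a job just as a job can refuse a CE.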

ARDA Key Services for Distributed Analysis

[Diagram: the ARDA services (Information Service, Authentication, Authorisation, Auditing, Accounting, Grid Monitoring, Workload Management, Metadata Catalogue, File Catalogue, Data Management, Computing Element, Storage Element, Job Monitor, Job Provenance, Package Manager, DB Proxy) reached through the User Interface and API, with interactions numbered 1 to 15.]

Numbers refer to time sequence of operations for a given use case:
• 1, 2, 3: Get access
• 4: Select dataset
• 5: Get PFMs
• 6: Submit job
• 7: Get files location
• 8: CE takes job
• 9, 10: SW is installed
• 11: Progress checked
• 12, 13: Get storage space
• 14, 15: Store output
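
The numbered sequence can be written down as data, which makes the ordering of the use case explicit. The sketch below only encodes the legend as given on the slide (including the label "Get PFMs" as printed); the names `USE_CASE_PHASES`, `phase_for_step` and `timeline` are hypothetical, and no mapping from step numbers to individual services is implied.

```python
from typing import List, Tuple

# Legend of the diagram: (step numbers, operation), in time order.
USE_CASE_PHASES: List[Tuple[Tuple[int, ...], str]] = [
    ((1, 2, 3), "Get access"),
    ((4,), "Select dataset"),
    ((5,), "Get PFMs"),
    ((6,), "Submit job"),
    ((7,), "Get files location"),
    ((8,), "CE takes job"),
    ((9, 10), "SW is installed"),
    ((11,), "Progress checked"),
    ((12, 13), "Get storage space"),
    ((14, 15), "Store output"),
]


def phase_for_step(step: int) -> str:
    """Return the operation that a numbered interaction belongs to."""
    for steps, phase in USE_CASE_PHASES:
        if step in steps:
            return phase
    raise ValueError(f"unknown step number: {step}")


def timeline() -> List[Tuple[int, str]]:
    """All 15 interactions in time order, each with its operation."""
    return [(step, phase_for_step(step)) for step in range(1, 16)]


for step, phase in timeline():
    print(f"{step:2d}  {phase}")
```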

API to Grid services

• Importance of API
  – Interface services to higher level software
    – Exp. framework
    – Analysis shells, e.g. ROOT
    – Grid portals and other forms of user interactions with environment
    – Advanced services e.g. virtual data, analysis logbooks etc
  – Provide experiment specific services
    – Data and Metadata management systems
  – Provide an API that others can project against

• Benefits of common API to framework
  – Goes beyond “traditional” UIs à la GANGA, Grid portals, etc
  – Benefits in interfacing to analysis applications like ROOT et al
  – Process to get a common API b/w experiments --> prototype
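
One way to picture the benefit of a common API is a framework that depends only on an abstract interface, with the grid backend swapped underneath. The sketch below is an illustration under assumptions, not the ARDA API: the class names `GridAPI` and `InMemoryGrid`, the three methods, and the function `run_analysis` are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class GridAPI(ABC):
    """Hypothetical common API that frameworks, shells and portals use."""

    @abstractmethod
    def select_dataset(self, query: str) -> List[str]:
        """Return the file names matching a metadata query."""

    @abstractmethod
    def submit_job(self, executable: str, inputs: List[str]) -> str:
        """Submit a job and return its identifier."""

    @abstractmethod
    def job_status(self, job_id: str) -> str:
        """Return the current status of a job."""


class InMemoryGrid(GridAPI):
    """Toy backend for illustration; a real one would call grid services."""

    def __init__(self, catalogue: Dict[str, List[str]]) -> None:
        self._catalogue = catalogue
        self._jobs: Dict[str, str] = {}

    def select_dataset(self, query: str) -> List[str]:
        return list(self._catalogue.get(query, []))

    def submit_job(self, executable: str, inputs: List[str]) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = "submitted"
        return job_id

    def job_status(self, job_id: str) -> str:
        return self._jobs[job_id]


def run_analysis(api: GridAPI, query: str, executable: str) -> str:
    """Framework-side code: it sees only the GridAPI interface."""
    files = api.select_dataset(query)
    if not files:
        raise LookupError(f"no files for dataset query {query!r}")
    return api.submit_job(executable, files)


grid = InMemoryGrid({"run=1": ["file-a", "file-b"]})
job_id = run_analysis(grid, "run=1", "analysis.py")
print(job_id, grid.job_status(job_id))
```

Because `run_analysis` never names a concrete backend, the same framework code would work against any implementation of the interface, which is the argument the slide makes for agreeing on the API first.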

On the road again

• No “evolutionary” path from GT2-based grids
  – David Foster at June 24th POB: We have a complex software infrastructure that needs simplifying … Cannot simply incrementally improve the software we have.
  – Based on Globus GT2 design (which is being replaced by OGSA GT3)

• Augment LCG-1 and other grid services
  – ARDA Services deployed and run together with existing ones on LCG1 resources
  – Keep possibility to bridge to existing services if feasible
    – Grid connectivity rather than interoperability
  – Use invaluable experience of LCG1 deployment for deploying ARDA

• ARDA provides decomposition into those services that address the LHC distributed analysis use cases

• Recommendation: build early a prototype based on re-factoring existing implementations

ARDA Roadmap for Prototype

• Prototype provides the initial blueprint
  – Do not aim for a full specification of all the interfaces

• 4-prong approach:
  – Re-factoring of AliEn, Dirac and possibly other services into ARDA
    – Initial release with OGSI::Lite/GT3 proxy, consolidation of API, release
    – Implementation of agreed interfaces, testing, release
  – GT3 modeling and testing (in parallel)
  – Interfacing to LCG-AA software like POOL, analysis shells like ROOT
    – Also opportunity for “early” interfacing to complementary projects
  – Interfacing to experiments frameworks
    – metadata handlers, experiment specific services

• Provide interaction points with community
  – Early releases and workshops every few months
  – Early strong feedback on API and services
  – Decouple from deployment issues

Experiments and LCG Involved in Prototyping

• ARDA prototype would define the initial set of services and their interfaces. Timescale: spring 2004

• Important to involve experiments and LCG at the right level
  – Initial modeling of GT3-based services
  – Interface to major cross-exp packages: POOL, ROOT, PROOF, others
  – Program experiment frameworks against ARDA API, integrate with experiment environments
  – Expose services and API to other LHC projects to allow synergies
  – Spend appropriate effort to document, package, release, deploy

• After the prototype is delivered, improve on
  – Scale up and re-engineer as needed: OGSI, databases, information services
  – Deployment and interfaces to site and grid operations, VO management etc
  – Build higher-level services and experiment specific functionality
  – Work on interactive analysis interfaces and new functionalities

Possible Strawman

Strawman workplan for ARDA prototype

[Table. Columns: "Week"; Core Developers; Modeling and Eval.; LCG s/w Interfaces; Doc., Pack., Test.; Experiments. Entries are listed per week in source order; the column of each entry is not fully recoverable from the transcript.]

• Week 1: OGSI::Lite implementation; Dummy ARDA model (GT3); Review API, POOL, ROOT interface; Identify projects to complem. ARDA; Look at API
• Week 4: Mini Workshop
• Week 5: Consolidate API; Evaluate GT3 model perf.; POOL interface prototype; python binding; Python binding, Ganga/Clarens(?) interface
• Week 7: AliEn/ARDA implementation (native perl and using java GT3 proxy)
• Week 12: Workshop [Verify architecture, API, performance, revise proposed services and extract preliminary interfaces]
• Week 13: Implement agreed interfaces and services; Document API & i/f, Packaging; interface exp. metadata cat & specific services
• Week 16: Deploy, test and maintain prototype; Test POOL i/f; Interface exp's frameworks
• Week 20: Stress testing, scalability, performance; Test exp. framework i/f
• Week 24: Workshop [expose architecture, API, applications, performance, deployment issues, extensibility, early user feedback]

Setting up the project

• Propose that ARDA now become an LCG project
• Project should start with a definition of the work areas, identifying where the effort will come from
• Core development team: 2-3 *good* (experienced) people plus 1 person from each experiment
• Estimate roughly total effort of some 10-15 people for the 6-month timescale to be practical
• Relevant experience and manpower coming from AliEn & Dirac developers, other LHC experiments, GTA, AA, …
• Alice & LHCb need to evaluate the impact on AliEn/Dirac planning and make a strong commitment to provide the relevant expertise

Outcome of the SC2 presentation (October 3rd)

• Good support from LCG management, even from EGEE reps
• Surprise of CERN management (what, this is Grid developers’ job…)
• Strange attitude of ATLAS (to a lesser extent CMS)
  – Their problem is that they have a lot of projects on this (Clarens, Dial, …)
  – Will take time to get their people onboard
• Written report expected end October
• In parallel, discussions are ongoing to set up the project
• Should be officially launched beginning of November
• IMPORTANT: there must be a strong participation besides ALICE… Otherwise, it will be plain AliEn
• We should not hesitate to invest, it may pay back…

LHCb view of ARDA services

[Diagram: the ARDA key services (Information Service, Authentication, Authorisation, Auditing, Accounting, Grid Monitoring, Workload Management, Metadata Catalogue, File Catalogue, Data Management, Computing Element, Storage Element, Job Monitor, Job Provenance, Package Manager, DB Proxy, User Interface, API) with interactions numbered 1 to 15, annotated with the LHCb components: Bookkeeping Database, Ganga, Production Manager, Dirac SW Installation, bbftp, Dirac agent, Castor, Production Database, LHCb Production Account, LSF, and "Handled by the running script". The placement of each annotation is not recoverable from the transcript.]

Dependencies of services for LHCb

• Gaudi will use POOL for data storage
  – POOL has to use the LHCb file catalog (or vice-versa)
  – File catalog has to be decoupled from Bookkeeping
  – Can be XML, MySQL, RLS… or ARDA-compatible catalog?
  – XML has to be used to test Gaudi/POOL (not suitable in production)

• File catalog interfaced to User interfaces (GANGA, shell commands)

• Data Management service - to be defined (file replication)
  – Directly handled by Dirac agent in a first instance?

• Bookkeeping
  – Is it possible to define an interface?
  – Experiment dependent interface? Is it a problem?
  – Could use the existing BKDB to start with. Evaluate others (ARDA prototype)
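
The requirement that the file catalog be decoupled, with several possible backends, can be sketched as one small interface and interchangeable implementations. Everything here is an assumption made for illustration: the `FileCatalog` interface, its two methods, and the XML layout (`<catalog>`, `<file>`, `<replica>`) are invented and are not the POOL, RLS or LHCb catalogue formats.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class FileCatalog(ABC):
    """Hypothetical catalogue interface: logical name -> replica locations."""

    @abstractmethod
    def register(self, logical_name: str, replica: str) -> None: ...

    @abstractmethod
    def replicas(self, logical_name: str) -> List[str]: ...


class DictFileCatalog(FileCatalog):
    """In-memory backend, standing in for a database-backed catalogue."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}

    def register(self, logical_name: str, replica: str) -> None:
        self._entries.setdefault(logical_name, []).append(replica)

    def replicas(self, logical_name: str) -> List[str]:
        return list(self._entries.get(logical_name, []))


class XmlFileCatalog(FileCatalog):
    """XML backend, e.g. for local tests (layout invented for this sketch)."""

    def __init__(self, xml_text: str = "<catalog/>") -> None:
        self._root = ET.fromstring(xml_text)

    def _find(self, logical_name: str) -> Optional[ET.Element]:
        for entry in self._root.findall("file"):
            if entry.get("name") == logical_name:
                return entry
        return None

    def register(self, logical_name: str, replica: str) -> None:
        entry = self._find(logical_name)
        if entry is None:
            entry = ET.SubElement(self._root, "file", name=logical_name)
        ET.SubElement(entry, "replica").text = replica

    def replicas(self, logical_name: str) -> List[str]:
        entry = self._find(logical_name)
        if entry is None:
            return []
        return [r.text or "" for r in entry.findall("replica")]

    def dump(self) -> str:
        """Serialise the catalogue so it can be reloaded later."""
        return ET.tostring(self._root, encoding="unicode")


# Client code is identical for both backends.
for catalog in (DictFileCatalog(), XmlFileCatalog()):
    catalog.register("/lhcb/prod/file1", "castor:/site-a/file1")
    catalog.register("/lhcb/prod/file1", "castor:/site-b/file1")
    print(type(catalog).__name__, catalog.replicas("/lhcb/prod/file1"))
```

With this shape, moving from an XML file used for testing to another backend would only change which class is instantiated, not the code in user interfaces or frameworks that queries the catalogue.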

Dependencies of services for LHCb

• Software installation
  – Should software be yet another file in the catalog, i.e. use the data management service to install it (à la AliEn)?
  – Otherwise is network connectivity necessary?

• Authentication, authorization
  – Should it be fully traceable?
  – Should CEs trust the workload management (à la Dirac)?
  – How to use a Grid security system and is it feasible at all?
  – This is one of the biggest issues (for analysis) as Computer Centres don’t want to give access to (even trusted) generic accounts
  – Could/should Dirac transfer credentials? How?
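
The "software as yet another file in the catalog" option can be sketched as an installer that resolves a package through the catalogue and fetches it through the data-management layer. This is an illustration only, not how AliEn or Dirac implement it: `install_package`, the `se://` locations, and the stubbed `fetch` function are hypothetical.

```python
import os
import tempfile
from typing import Callable, Dict, List


def install_package(name: str,
                    catalog: Dict[str, str],
                    fetch: Callable[[str], bytes],
                    install_dir: str) -> str:
    """Install `name` by looking it up in the catalogue and fetching it
    via the data-management layer; skip the transfer if already present."""
    target = os.path.join(install_dir, name)
    if os.path.exists(target):
        return target                  # already installed on this node
    location = catalog[name]           # KeyError if the package is unknown
    payload = fetch(location)          # data-management call (stubbed below)
    with open(target, "wb") as out:
        out.write(payload)
    return target


# Stubs standing in for a Storage Element and a file catalogue
storage = {"se://site/sw/app-v1.tar": b"fake tarball"}
catalog = {"app-v1.tar": "se://site/sw/app-v1.tar"}
calls: List[str] = []


def fetch(location: str) -> bytes:
    calls.append(location)             # record each transfer
    return storage[location]


with tempfile.TemporaryDirectory() as workdir:
    first = install_package("app-v1.tar", catalog, fetch, workdir)
    second = install_package("app-v1.tar", catalog, fetch, workdir)

print(first == second, len(calls))
```

The second call finds the package already in place and triggers no transfer, so a job landing on a node that has the software installed would not need to contact storage again.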