ARDA services
LHCb mini-workshop on Data Management and Production Tools
Ph. Charpentier

Outline:
- The ARDA RTAG
- The ARDA services
- The proposed project
- Mapping to LHCb services
PhC, 16/10/03 ARDA services, LHCb workshop on Data Management 2
ARDA Mandate

Mandate for the ARDA RTAG:
- To review the current DA activities and to capture their architectures in a consistent way
- To confront these existing projects with the HEPCAL II use cases and the users' potential work environments, in order to explore potential shortcomings
- To consider the interfaces between Grid, LCG and experiment-specific services:
  - Review the functionality of experiment-specific packages, their state of advancement and their role in the experiment
  - Identify similar functionalities in the different packages
  - Identify functionalities and components that could be integrated in the generic Grid middleware
- To confront the current projects with critical Grid areas
- To develop a roadmap specifying, wherever possible, the architecture, the components and potential sources of deliverables, to guide the medium-term (2-year) work of the LCG and the DA planning in the experiments
ARDA Schedule and Makeup

Schedule and makeup of the ARDA RTAG:
- The RTAG shall provide a draft report to the SC2 by September 03
  - It should contain initial guidance to the LCG and the experiments, to inform the September LHCC manpower review, in particular on the expected responsibilities of:
    - The experiment projects
    - The LCG (development and interfacing work rather than coordination work)
    - The external projects
- The final RTAG report is expected for October 03
- The RTAG shall be composed of:
  - Two members from each experiment
  - Representatives of the LCG GTA and AA
  - If not included above, the RTAG shall co-opt or invite representatives from the major Distributed Analysis projects and from non-LHC running experiments with DA experience

Members:
- Alice: Fons Rademakers and Predrag Buncic
- Atlas: Roger Jones and Rob Gardner
- CMS: Lothar Bauerdick and Lucia Silvestris
- LHCb: Philippe Charpentier and Andrei Tsaregorodtsev
- LCG GTA: David Foster, stand-in Massimo Lamanna
- LCG AA: Torre Wenaus
- GAG: Federico Carminati
ARDA Distributed Analysis Services

Distributed Analysis in a Grid-services-based architecture:
- ARDA services should be OGSI compliant, i.e. built upon OGSI middleware
- Frameworks and applications use the ARDA API, with bindings to C++, Java, Python, PERL…
- Interface through a UI/API factory: authentication, persistent "session"
- Fabric: interface to resources through CE and SE services
- Job description language based on Condor ClassAds and matchmaking
- Database(s) through a DB proxy, providing statefulness and persistence

We arrived at a decomposition into the following key services:
- Authentication, Authorisation, Accounting and Auditing services
- Workload Management and Data Management services
- File and (event) Metadata Catalogues
- Information service
- Grid and Job Monitoring services
- Storage Element and Computing Element services
- Package Manager and Job Provenance services
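To illustrate the ClassAd-style matchmaking mentioned above, here is a minimal sketch in Python. All names (`matches`, `match_job`, the ad fields) are invented for this example; neither Condor nor the ARDA workload management exposes this interface — the point is only that a job ad carries requirements that are evaluated against resource ads.

```python
# Illustrative matchmaking sketch (invented API, not Condor's):
# a job "ad" carries requirement predicates; a CE "ad" matches
# if it satisfies every one of them.

def matches(job_ad, ce_ad):
    """True if the CE ad satisfies every requirement in the job ad."""
    for key, predicate in job_ad["requirements"].items():
        if not predicate(ce_ad.get(key)):
            return False
    return True

def match_job(job_ad, ce_ads):
    """Return the name of the first CE whose ad matches the job."""
    for ce in ce_ads:
        if matches(job_ad, ce):
            return ce["name"]
    return None

job = {
    "executable": "DaVinci",
    "requirements": {
        "os": lambda v: v == "linux",
        "free_cpus": lambda v: v is not None and v >= 2,
        "software": lambda v: v is not None and "LHCb" in v,
    },
}
ces = [
    {"name": "ce.cern.ch", "os": "linux", "free_cpus": 8, "software": ["LHCb", "Alice"]},
    {"name": "ce.in2p3.fr", "os": "linux", "free_cpus": 1, "software": ["LHCb"]},
]
print(match_job(job, ces))  # ce.cern.ch (second CE fails the free_cpus requirement)
```

Real ClassAds are symmetric (the CE can also impose requirements on the job) and support ranking; this sketch shows only the one-sided match.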
ARDA Key Services for Distributed Analysis

[Diagram: the ARDA services (Information Service, Authentication, Authorisation, Auditing, Accounting, Grid Monitoring, Workload Management, Metadata Catalogue, File Catalogue, Data Management, Computing Element, Storage Element, Job Monitor, Job Provenance, Package Manager, DB Proxy, User Interface, API) connected by arrows numbered 1-15]
The numbers refer to the time sequence of operations for a given use case:
- 1,2,3: Get access
- 4: Select dataset
- 5: Get LFNs
- 6: Submit job
- 7: Get file locations
- 8: CE takes job
- 9,10: SW is installed
- 11: Progress checked
- 12,13: Get storage space
- 14,15: Store output
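The sequence above can be sketched as code. This is a hypothetical walk-through only: the class names and the assignment of a specific service to each step number are a plausible reading of the diagram, not taken from the ARDA interfaces.

```python
# Hypothetical trace of the 15-step analysis use case above.
# Everything here is a stand-in invented for illustration; it only
# records which service plausibly acts at each numbered step.

class Log:
    def __init__(self):
        self.steps = []

    def record(self, step, service, action):
        self.steps.append((step, service, action))

def run_analysis_use_case(log):
    log.record(1, "API/UserInterface", "open session")          # 1,2,3: get access
    log.record(2, "Authentication", "verify certificate")
    log.record(3, "Authorisation", "check VO rights")
    log.record(4, "MetadataCatalogue", "select dataset")         # 4,5: dataset -> LFNs
    log.record(5, "MetadataCatalogue", "return LFNs")
    log.record(6, "WorkloadManagement", "submit job (JDL)")      # 6
    log.record(7, "FileCatalogue", "resolve file locations")     # 7
    log.record(8, "ComputingElement", "take job")                # 8
    log.record(9, "PackageManager", "request software")          # 9,10: SW installed
    log.record(10, "PackageManager", "install software")
    log.record(11, "JobMonitor", "report progress")              # 11
    log.record(12, "StorageElement", "request storage space")    # 12,13
    log.record(13, "StorageElement", "grant storage space")
    log.record(14, "DataManagement", "store output")             # 14,15
    log.record(15, "FileCatalogue", "register output replica")
    return log.steps

steps = run_analysis_use_case(Log())
print(len(steps))  # 15
```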
API to Grid Services

Importance of the API:
- Interface services to higher-level software:
  - Experiment frameworks
  - Analysis shells, e.g. ROOT
  - Grid portals and other forms of user interaction with the environment
  - Advanced services, e.g. virtual data, analysis logbooks, etc.
- Provide experiment-specific services:
  - Data and metadata management systems
- Provide an API that others can program against

Benefits of a common API to frameworks:
- Goes beyond "traditional" UIs à la GANGA, Grid portals, etc.
- Benefits in interfacing to analysis applications like ROOT et al.
- Process to get a common API between experiments --> prototype
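A minimal sketch of the kind of thin Python binding argued for here: one common API, behind which the UI/API factory authenticates once and hands back a stateful session. The names (`connect`, `GridSession`, `query`, `submit`) are invented for this example and do not correspond to any real ARDA interface; the services are stubbed.

```python
# Hypothetical sketch of a common Python binding to the ARDA API.
# Names are invented; the metadata catalogue and workload services
# are stubbed out so the sketch is self-contained.

class GridSession:
    """Persistent, authenticated session returned by the UI/API factory."""

    def __init__(self, credentials):
        self.user = credentials
        self.jobs = []

    def query(self, metadata_selection):
        # Would call the metadata catalogue service; stubbed here.
        return [f"lfn:/lhcb/dst/{i}" for i in range(3)]

    def submit(self, jdl, input_lfns):
        # Would call the workload management service; stubbed here.
        job_id = len(self.jobs) + 1
        self.jobs.append({"id": job_id, "jdl": jdl, "input": input_lfns})
        return job_id

def connect(credentials):
    """UI/API factory: authenticate once, hand back a stateful session."""
    return GridSession(credentials)

session = connect("grid proxy certificate")
lfns = session.query("evtType=DST and year=2003")
job_id = session.submit({"executable": "DaVinci"}, lfns)
print(job_id)  # 1
```

The design point is that an experiment framework, a ROOT session, or a portal would all call this same binding, rather than each re-implementing access to the services.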
On the Road Again

- There is no "evolutionary" path from GT2-based grids (David Foster at the June 24th POB): "We have a complex software infrastructure that needs simplifying … Cannot simply incrementally improve the software we have."
  - Based on the Globus GT2 design (which is being replaced by OGSA GT3)
- Augment LCG-1 and other grid services:
  - ARDA services deployed and run together with existing ones on LCG-1 resources
  - Keep the possibility to bridge to existing services if feasible: grid connectivity rather than interoperability
  - Use the invaluable experience of the LCG-1 deployment for deploying ARDA
- ARDA provides a decomposition into those services that address the LHC distributed analysis use cases
- Recommendation: build early a prototype based on re-factoring existing implementations
ARDA Roadmap for Prototype

- The prototype provides the initial blueprint; do not aim for a full specification of all the interfaces
- 4-prong approach:
  - Re-factoring of AliEn, Dirac and possibly other services into ARDA:
    - Initial release with OGSI::Lite/GT3 proxy, consolidation of API, release
    - Implementation of agreed interfaces, testing, release
  - GT3 modeling and testing (in parallel)
  - Interfacing to LCG-AA software like POOL, and analysis shells like ROOT:
    - Also an opportunity for "early" interfacing to complementary projects
  - Interfacing to experiment frameworks:
    - Metadata handlers, experiment-specific services
- Provide interaction points with the community:
  - Early releases and workshops every few months
  - Early, strong feedback on API and services
  - Decouple from deployment issues
Experiments and LCG Involved in Prototyping

- The ARDA prototype would define the initial set of services and their interfaces; timescale: spring 2004
- Important to involve the experiments and LCG at the right level:
  - Initial modeling of GT3-based services
  - Interface to major cross-experiment packages: POOL, ROOT, PROOF, others
  - Program experiment frameworks against the ARDA API, integrate with experiment environments
  - Expose services and API to other LHC projects to allow synergies
  - Spend appropriate effort to document, package, release, deploy
- After the prototype is delivered, improve on it:
  - Scale up and re-engineer as needed: OGSI, databases, information services
  - Deployment and interfaces to site and grid operations, VO management, etc.
  - Build higher-level services and experiment-specific functionality
  - Work on interactive analysis interfaces and new functionalities
Possible Strawman

Strawman workplan for the ARDA prototype (work areas: Core Developers / Modeling and Eval. / LCG s/w Interfaces / Doc., Pack., Test. / Experiments):

Week 1:
- Core Developers: OGSI::Lite implementation
- Modeling and Eval.: dummy ARDA model (GT3)
- LCG s/w Interfaces: review API, POOL, ROOT interface
- Experiments: identify projects to complement ARDA; look at API

Week 4: ------- Mini Workshop -------

Week 5:
- Core Developers: consolidate API
- Modeling and Eval.: evaluate GT3 model perf.
- LCG s/w Interfaces: POOL interface prototype
- Doc., Pack., Test.: Python binding
- Experiments: Python binding, Ganga/Clarens(?)

Week 7:
- Core Developers: AliEn/ARDA implementation (native perl and using java GT3 proxy)

Week 12: ------- Workshop -------
[Verify architecture, API, performance; revise proposed services and extract preliminary interfaces]

Week 13:
- Core Developers: implement agreed interfaces and services
- Doc., Pack., Test.: document API & i/f, packaging
- Experiments: interface exp. metadata catalogue & specific services

Week 16:
- Core Developers: deploy, test and maintain prototype
- Doc., Pack., Test.: test POOL i/f
- Experiments: interface experiments' frameworks

Week 20:
- Core Developers: stress testing, scalability, performance
- Doc., Pack., Test.: test exp. framework i/f

Week 24: ------- Workshop -------
[Expose architecture, API, applications, performance, deployment issues, extensibility, early user feedback]
Setting up the project

- Propose that ARDA now become an LCG project
- The project should start with a definition of the work areas, identifying where the effort will come from
- Core development team: 2-3 *good* (experienced) people, plus 1 person from each experiment
- Rough estimate: a total effort of some 10-15 people is needed for the 6-month timescale to be practical
- Relevant experience and manpower coming from AliEn & Dirac developers, other LHC experiments, GTA, AA, …
- Alice & LHCb need to evaluate the impact on AliEn/Dirac planning and make a strong commitment to provide the relevant expertise
Outcome of the SC2 presentation (October 3rd)

- Good support from LCG management, even from the EGEE reps
- Surprise of CERN management ("what, this is Grid developers' job…")
- Strange attitude of ATLAS (to a lesser extent CMS):
  - Their problem is that they have a lot of projects in this area (Clarens, Dial, …)
  - It will take time to get their people onboard
- Written report expected end of October
- In parallel, discussions are ongoing to set up the project; it should be officially launched beginning of November
- IMPORTANT: there must be strong participation besides ALICE… otherwise it will be plain AliEn
- We should not hesitate to invest; it may pay back…
LHCb view of ARDA services
[Diagram: the same ARDA service picture, annotated with the LHCb production counterparts: Bookkeeping Database, Ganga, Production Manager, Dirac SW Installation, bbftp/Dirac agent, Castor, Production Database, LHCb Production Account, LSF; job monitoring handled by the running script]
Dependencies of services for LHCb

- Gaudi will use POOL for data storage:
  - POOL has to use the LHCb file catalog (or vice versa)
  - The file catalog has to be decoupled from the Bookkeeping
  - It can be XML, mySQL, RLS… or an ARDA-compatible catalog?
  - XML has to be used to test Gaudi/POOL (not suitable in production)
  - The file catalog is interfaced to user interfaces (GANGA, shell commands)
- Data Management service: to be defined (file replication)
  - Directly handled by the Dirac agent in a first instance?
- Bookkeeping:
  - Is it possible to define an interface? An experiment-dependent interface? Is that a problem?
  - Could use the existing BKDB to start with; evaluate others (ARDA prototype)
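The file-catalogue role discussed above — mapping logical file names (LFNs) to physical replicas, independently of the bookkeeping — can be sketched minimally in Python. The class and the XML layout are invented for this example; POOL's real XML catalogue schema differs.

```python
# Illustrative sketch of a file catalogue decoupled from the bookkeeping:
# it only maps LFNs to physical replicas (PFNs). The XML dump mimics the
# idea of an XML catalogue backend, not POOL's actual schema.

import xml.etree.ElementTree as ET

class FileCatalogue:
    def __init__(self):
        self._replicas = {}  # LFN -> list of PFNs

    def register(self, lfn, pfn):
        """Record a physical replica for a logical file name."""
        self._replicas.setdefault(lfn, []).append(pfn)

    def lookup(self, lfn):
        """Return all known physical replicas of an LFN (empty if none)."""
        return self._replicas.get(lfn, [])

    def to_xml(self):
        """Serialise the catalogue, as an XML backend would store it."""
        root = ET.Element("catalogue")
        for lfn, pfns in self._replicas.items():
            entry = ET.SubElement(root, "file", lfn=lfn)
            for pfn in pfns:
                ET.SubElement(entry, "pfn", name=pfn)
        return ET.tostring(root, encoding="unicode")

cat = FileCatalogue()
cat.register("lfn:/lhcb/prod/00001/dst.root",
             "castor:/castor/cern.ch/lhcb/00001/dst.root")
print(cat.lookup("lfn:/lhcb/prod/00001/dst.root")[0])
```

Because the interface is just register/lookup, the backend (XML file, mySQL, RLS, or an ARDA catalogue) could be swapped without touching Gaudi/POOL, which is the decoupling argued for above.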
Dependencies of services for LHCb (cont'd)

- Software installation:
  - Should software be yet another file in the catalog, i.e. use the data management service to install it (à la AliEn)?
  - Otherwise, is network connectivity necessary?
- Authentication, authorisation:
  - Should it be fully traceable?
  - Should CEs trust the workload management (à la Dirac)?
  - How to use a Grid security system, and is it feasible at all?
  - This is one of the biggest issues (for analysis), as computer centres don't want to give access to (even trusted) generic accounts
  - Could/should Dirac transfer credentials? How?