Page 1: Use of the gLite-WMS in CMS for production and analysis

Giuseppe Codispoti, on behalf of the CMS Offline and Computing
CHEP'09, 23-27 March 2009, INFN Bologna

Page 2: Outline

- BossLite: the common interface to Grid and batch systems for the CMS tools
- gLite usage through BossLite
- gLite integration in the CMS tools
- Issues and proposed solutions
- WMS usage in analysis and MC production activities
- Overall performance
- Conclusions

Page 3: Computing Model Overview

A complex system:
- Access to distributed resources through Grid middleware
- Access to local batch systems (e.g. local farms and the LSF-based CERN Analysis Facility, CAF, for high-priority tasks)
- Access to the CMS-specific Workload and Data Management tools

A high job rate:
- Large experimental community (3k people)
- Huge amount of data produced by the experiment (up to 2 PB/year)
- A comparable amount of Monte Carlo samples to be generated and accessed

See talk [192] by I. Fisk: Challenges for the CMS Computing Model in the First Year

Page 4: BossLite: a common Grid/batch interface with logging facilities

- CMS interface to different Grid [WLCG, OSG] and batch [LSF, ARC, SGE, ...] systems
- Database to track and log information in an entity-relation schema
- Information logically remapped into Python objects that can be used transparently by the CMS framework and tools
- High efficiency and safe operation required in a multithreaded environment:
  - Database interaction through safe sessions and connections
  - Pool of connections shared among threads (sketched below)
  - Thread safe
  - Focused on connection stability
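
For concreteness, a minimal sketch of the thread-safe session/pool pattern described above. All names here are illustrative assumptions, not the actual BossLite API, and sqlite3 stands in for the production database driver; the real layer adds reconnection logic for connection stability.

```python
import queue
import sqlite3   # stand-in driver; the production service uses a networked DB
import threading

class SafeConnectionPool:
    """Fixed-size pool of DB connections shared among threads (illustrative)."""

    def __init__(self, db_path, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # Each connection is used by one thread at a time, so this is safe.
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    def session(self):
        return _Session(self._pool)

class _Session:
    """Context manager: borrow a connection, commit or roll back, return it."""

    def __init__(self, pool):
        self._pool = pool

    def __enter__(self):
        self._conn = self._pool.get()    # blocks until a connection is free
        return self._conn

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._conn.commit()
        else:
            self._conn.rollback()        # keep the connection usable on error
        self._pool.put(self._conn)

# Usage from worker threads:
pool = SafeConnectionPool("bosslite.db")

with pool.session() as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER, status TEXT)")

def track_job(job_id, status):
    with pool.session() as conn:
        conn.execute("INSERT INTO jobs VALUES (?, ?)", (job_id, status))

threads = [threading.Thread(target=track_job, args=(i, "Submitted"))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```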

Page 5: BossLite Architecture

[Architecture diagram: a user task fans out into jobs; each job carries static info shared by all attempts, plus per-attempt runtime info (Submission 1, 2, 3, ...)]

- User Task description: identical jobs accessing different parts of a dataset or producing a part of a MC sample
- Plugins for transparent interaction with Grid and local batch systems
- Database backend for logging and bookkeeping
- Pool of DB connections shared among threads

A sketch of this entity layout follows below.
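
A minimal sketch of how the entities in the diagram (task, static job info, per-submission runtime info) can be remapped into Python objects. Class and attribute names are assumptions for illustration, not the actual BossLite schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RunningJob:
    """Runtime info for one submission attempt of a job."""
    submission: int                 # attempt counter (Submission 1, 2, ...)
    grid_id: Optional[str] = None   # identifier returned by the Grid/batch system
    status: str = "Created"
    destination: Optional[str] = None

@dataclass
class Job:
    """Static job info, shared across resubmissions."""
    job_id: int
    arguments: str                  # e.g. which part of the dataset to process
    attempts: List[RunningJob] = field(default_factory=list)

    def new_submission(self) -> RunningJob:
        run = RunningJob(submission=len(self.attempts) + 1)
        self.attempts.append(run)
        return run

@dataclass
class Task:
    """A user task: identical jobs over different parts of a dataset."""
    name: str
    jobs: List[Job] = field(default_factory=list)

# Usage: three jobs over three dataset blocks, then a resubmission of job 2
task = Task("analysis-task", [Job(i, f"block-{i}") for i in range(1, 4)])
task.jobs[1].new_submission()   # Submission 1
task.jobs[1].new_submission()   # Submission 2, after a failure
```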

Page 6: The BossLite interface to gLite

- Bulk submission, bulk match-making, bulk status query: faster, more efficient
- Access through the WMProxy Python API:
  - Needed to allow the association between BossLite jobs and their Grid identifiers (not trivial through the CLI)
  - No parsing of "human readable" streams
  - But the API is complex, the UI tools (e.g. UI configuration, input sandbox transfers, ...) are lost, and the code is exposed to Python compatibility issues
- Access to LB information through the API:
  - Easy and fast check of the job status
  - Easy to extract much more useful information at runtime: destination queue, status reason, scheduling timestamps, ...

Note: the CMS computing model uses its own data location system; the WMS match-making is only used to select among the available resources hosting the selected data (see the sketch below).
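
To illustrate the note above, a sketch of building a JDL collection for bulk submission whose Requirements restrict match-making to the sites reported by the data location system. The JDL fragment is schematic and the use of GlueCEInfoHostName for the site whitelist is an assumption; the real tools build an equivalent expression.

```python
def site_requirements(ce_hostnames):
    """Build a JDL Requirements clause matching only the listed CEs."""
    clauses = ['other.GlueCEInfoHostName == "%s"' % h for h in ce_hostnames]
    return " || ".join(clauses)

def make_collection_jdl(n_jobs, sites):
    """Assemble a schematic JDL collection for bulk submission."""
    nodes = ",\n    ".join(
        '[ NodeName = "job%d"; Executable = "cmsRun.sh"; Arguments = "%d"; ]'
        % (i, i)
        for i in range(1, n_jobs + 1)
    )
    jdl = (
        "[\n"
        '  Type = "collection";\n'
        "  Requirements = %s;\n"
        "  Nodes = {\n    %s\n  };\n"
        "]"
    ) % (site_requirements(sites), nodes)
    return jdl

print(make_collection_jdl(3, ["ce1.example.org", "ce2.example.org"]))
```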

Page 7: CMS use cases

- Monte Carlo production: automatic, parallelized system for the simulation/reconstruction of huge data samples
- Basic analysis tasks (single user): transparent usage of the Grid infrastructure as well as of local batch systems, integrated with the CMS workload management system
- Regime analysis and intensive analysis tasks: centralized system dealing with huge tasks, automating the analysis workflow and optimizing Grid usage; a high-concurrency system for a multi-user environment

Page 8: ProdAgent (MC production)

[Workflow diagram] Key features:
- Output files limited in size, produced directly at the CMS destination sites (merged later through ad hoc jobs)
- Sequential job submission (collections)
- Multi-threaded status query (sketched below)
- Multi-threaded output retrieval for log files and production reports

See poster [82] by F. Van Lingen: MC production and processing system – Design and experiences
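
A sketch of the multi-threaded status-query pattern: poll many jobs concurrently with a bounded thread pool. Here query_status is a hypothetical stand-in for the per-job LB call described earlier.

```python
import concurrent.futures

def query_status(grid_id: str) -> str:
    """Placeholder: a real implementation would contact the LB service."""
    return "Running"

def poll_jobs(grid_ids, max_workers=10):
    """Query all jobs concurrently and collect {grid_id: status}."""
    statuses = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(query_status, gid): gid for gid in grid_ids}
        for future in concurrent.futures.as_completed(futures):
            gid = futures[future]
            try:
                statuses[gid] = future.result()
            except Exception as err:        # keep polling the other jobs
                statuses[gid] = "QueryFailed: %s" % err
    return statuses

print(poll_jobs(["https://lb.example.org:9000/job%d" % i for i in range(5)]))
```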

Page 9: CMS Remote Analysis Builder (CRAB)

[Workflow diagram] All the UI functionalities are wrapped with the WMProxy/LB API, as sketched below.
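
What "wrapping the UI functionality" can look like: a minimal, hypothetical facade whose method bodies are stubs, with comments pointing at the WMProxy operations (proxy delegation, job registration and start) and LB queries they would drive. Nothing here is the actual CRAB code.

```python
class GridScheduler:
    """Hypothetical facade over the WMProxy/LB APIs."""

    def __init__(self, wms_endpoint: str):
        self.endpoint = wms_endpoint

    def submit(self, jdl: str) -> str:
        # Real code: delegate the proxy, register the JDL with WMProxy,
        # ship the input sandbox, then start the job.
        return "https://wms.example.org:9000/fake-job-id"

    def status(self, grid_id: str) -> dict:
        # Real code: an LB query; returns the status plus runtime details
        # such as destination queue, status reason, timestamps.
        return {"status": "Running", "destination": "ce1.example.org"}

    def output(self, grid_id: str, dest_dir: str) -> None:
        # Real code: fetch the output sandbox and purge the job on the WMS.
        pass

# Usage within a CMS tool: the framework sees only these three methods.
scheduler = GridScheduler("https://wms.example.org:7443/wmproxy")
job = scheduler.submit('[ Executable = "cmsRun.sh"; ]')
print(scheduler.status(job))
```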

Page 10: CRAB Analysis Server

[Workflow diagram] Key features:
- Direct ISB/OSB shipping from the WN: variable size, possible to implement CMS-specific policies bypassing the WMS (using gLite features!)
- Multi-threaded job submission (collections, many users concurrently; sketched below)
- Multi-threaded status query
- Multi-threaded output handling and WMS purge

See talk [77] by D. Spiga: Automatization of User Analysis Workflow in CMS
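
A sketch of the multi-threaded, multi-user submission pattern: worker threads consume per-user collections from a shared queue. submit_collection is a hypothetical stand-in for the WMProxy bulk submission.

```python
import queue
import threading

def submit_collection(user: str, jdl: str) -> str:
    """Placeholder for the WMProxy bulk-submission call."""
    return "https://wms.example.org:9000/%s-collection" % user

work = queue.Queue()

def submitter():
    while True:
        item = work.get()
        if item is None:            # sentinel: shut this worker down
            work.task_done()
            break
        user, jdl = item
        collection_id = submit_collection(user, jdl)
        print("submitted for %s: %s" % (user, collection_id))
        work.task_done()

workers = [threading.Thread(target=submitter) for _ in range(4)]
for w in workers:
    w.start()

for user in ["alice", "bob", "carol"]:
    work.put((user, '[ Type = "collection"; ... ]'))

work.join()                         # wait until all submissions are done
for _ in workers:
    work.put(None)                  # stop all workers
for w in workers:
    w.join()
```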

Page 11: Evolution

Fruitful collaboration with the gLite developers to fix bugs and implement the features CMS needs.

Proposed XML/JSON output for the CLI commands:
- Same level of detail for the job association
- Reuses all the UI functionalities: everything is already there, it just needs to be made accessible
- Specific error logging
- Accessible through a simple subprocess, encapsulating the environment/compatibility issues (see the sketch below)
- Simpler intermediate layer
- Reuse & simplify!
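
A sketch of the subprocess-based access proposed here, wrapping the glite-wms-job-submit CLI. The -a and -o flags are real options (automatic delegation, job-id output file); the --json flag stands for the *proposed* structured output and is an assumption, not an existing option.

```python
import subprocess

def submit(jdl_path, ids_file="jobids.txt"):
    """Run glite-wms-job-submit in a subprocess and return its output."""
    cmd = [
        "glite-wms-job-submit",
        "-a",               # delegate the proxy automatically
        "-o", ids_file,     # append the returned job identifier to a file
        "--json",           # hypothetical: the proposed structured output
        jdl_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("submission failed: %s" % result.stderr)
    return result.stdout    # with structured output this would parse directly,
                            # with no scraping of "human readable" text
```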

Page 12: Single WMS usage in everyday activities

[Plots: instantaneous load (running/idle jobs) and daily job rate (active, ended, aborted jobs) for a single WMS]

- Typical instantaneous load of a single WMS: up to 5k jobs handled simultaneously per WMS in everyday analysis
- Daily job rate, including ended jobs: the typical load for a single WMS may already reach 15k jobs; stress tests reached 30k jobs per day for a single WMS with no sign of a breaking point!
- MC production and analysis jobs are balanced over many WMSs: currently 7 for analysis and 4 for MC production

Page 13: Overall performance of the CMS tools

- The limits reached so far are mainly due to tracking and output retrieval/handling; some optimizations are already in place, other small tweaks are possible
- The WMS architecture is such that the system scales linearly with the number of WMSs: add as many WMSs to a CMS service as needed
- The CMS architecture is similar: deploy as many instances of ProdAgent and the CRAB server as needed
- No scaling problems are foreseen at the expected rates: 50-100k jobs/day for MC production and 100-200k jobs/day for analysis
- A single CRAB server instance in multi-user mode reached 50k jobs per day using 2 WMSs
- A single ProdAgent instance reached around 30k jobs per day; the lower performance is in the output copy from the WMS: we plan to reduce the size and number of the files to be retrieved

[Plot: daily job rate, peaking at 50k jobs]

Page 14: CMS Grid activity with the gLite WMS

- ~75k jobs per day: ~30k analysis, ~20k MC production, ~25k other activities
- Jobs uniformly distributed over more than 40 sites
- CMS has been using the gLite WMS for years, with increasing activity during the last year
- May 2008 challenge (CCRC08, see [312], [389]); figures cover May 2008 to March 2009

Poster [389]: CMS results from Computing Challenges and Commissioning of the computing infrastructure
Poster [312]: Commissioning Distributed Analysis at the CMS Tier-2 Centers

Page 15: Job distribution per activity

From May 2008 to March 2009: 23M total jobs submitted.

- 8.8M analysis jobs: 58% success, 25% application failures, 12% grid failures, 5% cancelled
  - About 78% of the total analysis jobs have been submitted with the gLite WMS for years (the rest mainly via Condor-G)
  - ~600 distinct real users in the last 3 months
- 5.3M MC production jobs: 81% success, ~9% application failures, 10% grid failures
- 6.6M JobRobot jobs: 87% success, 4% application failures, 7% grid failures, 2% cancelled
- Plus 2.3M jobs from other test activities

Page 16: Conclusions

- CMS successfully uses the WMS for Monte Carlo production and analysis tasks
- We are able to reach more than 30k jobs per day with a single WMS
- Each CMS application service may use as many WMSs in parallel as needed: up to 50k jobs per day from a single CMS server
- We are able to cover the CMS requests by deploying a few instances of CRAB/ProdAgent
- Long-standing everyday experience and usage allow us to improve the system and provide feedback to the WMS/gLite developers

Page 17: Author list

G. Codispoti, C. Grandi, A. Fanfani (INFN Bologna); D. Spiga, V. Miccio, A. Sciabà (CERN); F. Fanzago (CERN, INFN-CNAF); M. Cinquilli (INFN Perugia); F. Farina (INFN Milano, CERN); S. Lacaprara (INFN-LNL); S. Belforte (INFN Trieste); D. Bonacorsi, A. Sartirana, D. DonGiovanni, D. Cesini (INFN-CNAF); S. Lemaitre, M. Litmaath, E. Roche, Y. Calas (CERN); S. Wakefield (Imperial College London); J. Hernandez (CIEMAT)