3 Sept 2001 F HARRIS CHEP, Beijing 1

Moving the LHCb Monte Carlo production system to the GRID

D.Galli,U.Marconi,V.Vagnoni INFN Bologna

N Brook Bristol

E.van Herwijnen,P.Mato CERN

A.Khan Edinburgh

M.McCubbin,G.D.Patel Liverpool

A.Tsaregorodtsev Marseille

H.Bulten,S.Klous Nikhef

F.Harris Oxford

G.N.Patrick,R.A.Sansum RAL

Overview of presentation

• Functionality and distribution of the current system

• Experience with the use of Globus in tests and production

• Requirements and planning for the use of DataGrid middleware and security system

• Planning for interfacing GAUDI software framework to GRID services

• Conclusions

LHCb distributed computing environment (15 countries: 13 European + Brazil, China; 50 institutes)

• Tier-0 – CERN

• Tier-1 – RAL (UK), IN2P3 (Lyon), INFN (Bologna), Nikhef, CERN + ?

• Tier-2 – Liverpool, Edinburgh/Glasgow, Switzerland + ? (may grow to ~10)

• Tier-3 – ~50 throughout the collaboration

• Ongoing negotiations for centres (Tier-1/2/3)

– Germany, Russia, Poland, Spain, Brazil

• Current GRID involvement

– DataGrid (and national GRID efforts in UK, Italy, + ...)

– Active in WP8 (HEP Applications) of DataGrid

– Will use middleware (WP 1-5) + Testbed (WP6) + Network (WP7) + Security tools

Current MC production facilities

Centre     OS     Max. (av.) # of CPUs   Batch    Typical weekly production  Percentage submitted
                  used simultaneously    system   (# k events)               through Grid
CERN       Linux  315 (60)               LSF      85                         10%
RAL        Linux  50 (30)                PBS      35                         100%
IN2P3      Linux  225 (60)               BQS      35                         100%
Liverpool  Linux  300 (250)              Custom   150                        0%
Bologna    Linux  20 (20)                PBS      35                         0%

• The max # of CPUs used simultaneously is usually less than the capacity of the farm.

• Will soon extend to Nikhef, Edinburgh, Bristol
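As a rough cross-check of the table above, the per-centre figures can be combined into a collaboration-wide estimate of how much of the weekly production goes through the Grid. The sketch below assumes that each centre's Grid percentage applies uniformly to its typical weekly output, which the slide does not state explicitly.

```python
# Figures copied from the table:
# centre -> (typical weekly production in k events, % submitted through Grid)
centres = {
    "CERN": (85, 10),
    "RAL": (35, 100),
    "IN2P3": (35, 100),
    "Liverpool": (150, 0),
    "Bologna": (35, 0),
}

# Total weekly production over all centres, in k events.
total_k = sum(events for events, _ in centres.values())

# Weekly production submitted through the Grid, under the uniformity assumption.
grid_k = sum(events * pct / 100 for events, pct in centres.values())

grid_fraction = grid_k / total_k

print(f"total: {total_k}k events/week, via Grid: {grid_k:.1f}k ({grid_fraction:.0%})")
```

Under that assumption, about 78.5k of roughly 340k events per week, just under a quarter, are submitted through the Grid.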

Distributed MC production, today

• Submit jobs remotely via Web

• Execute on farm

• Monitor performance of farm via Web

• Update bookkeeping database (Oracle at CERN)

• Transfer data to CASTOR mass-store at CERN

• Data Quality Check on data stored at CERN

Distributed MC production in future (using DataGRID middleware)

Workflow steps:

• Submit jobs remotely via Web

• Execute on farm

• Monitor performance of farm via Web

• Update bookkeeping database

• Transfer data to CASTOR (and HPSS, RAL Datastore)

• Data Quality Check ‘Online’

DataGrid components labelled against these steps in the diagram:

• WP1 job submission tools

• WP4 environment

• WP3 monitoring tools

• WP2 metadata tools

• WP2 data replication

• WP5 API for mass storage

• Online histogram production using GRID pipes

Use of Globus in tests and production

• Use of Globus simplifies remote production

– Submit jobs through local Globus commands rather than remote logon

• Some teething problems in tests (some due to learning curve)

– Some limitations to the system (e.g. need large temporary space for running jobs)

– Some mismatches between Globus and the PBS batch system (job parameters ignored, submitting >100 jobs gives problems)

• DataGrid testbed organisation will ensure synchronisation of versions at sites + Globus support
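One way to work around the problems seen when more than 100 jobs are submitted at once is to split a production into smaller batches. The following is a minimal sketch only: the gatekeeper contact string, the script path and the batch size of 50 are hypothetical, and the command lines are built but not executed. It assumes the Globus Toolkit `globus-job-submit` client, invoked as `globus-job-submit <contact> <executable> [args]`.

```python
from typing import Iterator, List

GATEKEEPER = "farm.example.org/jobmanager-pbs"  # hypothetical contact string
SCRIPT = "/path/to/run_mc.sh"                   # hypothetical production script
BATCH_SIZE = 50                                 # assumed size, below the ~100-job limit


def build_commands(n_jobs: int) -> List[List[str]]:
    """Build one globus-job-submit command line per job (not executed here)."""
    return [["globus-job-submit", GATEKEEPER, SCRIPT, str(job_id)]
            for job_id in range(n_jobs)]


def batches(commands: List[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Yield consecutive slices of at most `size` commands."""
    for start in range(0, len(commands), size):
        yield commands[start:start + size]


if __name__ == "__main__":
    for i, batch in enumerate(batches(build_commands(120), BATCH_SIZE)):
        # In a real run each command would be handed to subprocess.run(),
        # waiting for the batch to drain before submitting the next one.
        print(f"batch {i}: {len(batch)} jobs, first: {' '.join(batch[0])}")
```

Keeping command construction separate from execution makes it possible to inspect the batches before anything is sent to a gatekeeper.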

Security

• M9 (October 2001...)

– Authorisation group working towards tool providing single log-on and single role for individual

– Individual will get certificate from national CA

– Must work out administration for this at start for experiment VO. Probably ~10 users for LHCb

• M21 (October 2002...)

– Single log-on firmly in place. Moved to structured VO with (group, individual) authorisation. Multiple roles

– Maybe up to ~50 users
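The M21 authorisation model, in which rights follow from group membership and an individual may hold several roles, can be pictured with a small data structure. This is an illustrative sketch only: the class names, role names and certificate subject below are invented for the example and are not part of any DataGrid tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Member:
    subject: str                                   # certificate subject from a national CA
    groups: Set[str] = field(default_factory=set)
    roles: Set[str] = field(default_factory=set)   # M21: several roles per individual


@dataclass
class VirtualOrganisation:
    name: str
    members: Dict[str, Member] = field(default_factory=dict)
    group_rights: Dict[str, Set[str]] = field(default_factory=dict)  # group -> permitted roles

    def add(self, member: Member) -> None:
        self.members[member.subject] = member

    def authorised(self, subject: str, role: str) -> bool:
        """A member may act in a role if they hold it and one of their groups permits it."""
        member = self.members.get(subject)
        if member is None or role not in member.roles:
            return False
        return any(role in self.group_rights.get(g, set()) for g in member.groups)


vo = VirtualOrganisation("lhcb", group_rights={"production": {"submit", "manage"}})
vo.add(Member("/O=Example/CN=Some User", groups={"production"}, roles={"submit"}))
print(vo.authorised("/O=Example/CN=Some User", "submit"))  # True
print(vo.authorised("/O=Example/CN=Some User", "manage"))  # False
```

The M9 model corresponds to the special case where every member holds exactly one role.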

Job Submission

• M9

– Use command line interface to WP1 JDL. ‘Static’ file specification.

– Use environment specification as agreed with WP1,4 (no cloning)

• M21

– Interface to WP1 Job Options via LHCb application (GANGA). Dynamic ‘file’ environment according to application navigation

– May require access to query language tools for metadata

– More comprehensive environment specification
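For the M9 stage, a job with a ‘static’ file specification could be described by a small JDL file written once per job. The sketch below generates such a description. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) follow the ClassAd-style JDL of DataGrid WP1 as far as we know it and should be checked against the WP1 documentation; the script and file names are hypothetical.

```python
def jdl_list(items):
    """Render a Python list as a JDL list of quoted strings."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_jdl(executable, arguments, input_files, output_files):
    """Return a JDL job description with a fixed ('static') set of files."""
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "job.out";',
        'StdError = "job.err";',
        f"InputSandbox = {jdl_list(input_files)};",
        # Standard output and error are always shipped back with the results.
        f"OutputSandbox = {jdl_list(['job.out', 'job.err'] + output_files)};",
    ]
    return "\n".join(lines)


# Hypothetical example: run a production script for 500 events.
print(make_jdl("run_mc.sh", "500", ["run_mc.sh", "job.opts"], ["events.dat"]))
```

Because every file is named explicitly, nothing in the description depends on what the application discovers at run time, which is the limitation the M21 ‘dynamic’ environment is meant to lift.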

Job Execution

• M9

– Will run on farms at CERN, Lyon, RAL for first tests

• Extend to Nikhef, Bologna, Edinburgh once we get stability

– Will use a very simple environment (binaries)

– ‘Production’ flavour for work

• M21

– Should be running on many sites (20?)

– Complete LHCb environment for production and development, without AFS (use WP1 ‘sandboxes’)

– Should be testing user analysis via GRID, as well as performing production (~50)

Job Monitoring and data quality checking

• M9

– Monitor farms with home-grown tools via Web

– Use home-grown data histogramming tools for data monitoring

• M21

– Integrate WP3 tools for farm performance (status of jobs)

– Combine LHCb ideas on state management and data quality checking with DataGrid software

Bookkeeping database

• M9

– Use current CERN-centric Oracle-based system

• M21

– Moved to WP2 metadata handling tools? (Use of LDAP, Oracle?)

– This will be distributed database handling using facilities of replica catalogue and replica management

– LHCb must interface applications view (metadata) to GRID tools. Availability of query tools?
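The application's view of the bookkeeping data is a metadata query that yields a set of logical file names, which Grid tools would then resolve to physical copies. A minimal in-memory sketch of that first step follows; the field names and records are invented, since the slides do not give the real schema.

```python
from typing import Dict, List

# Hypothetical bookkeeping records: one dictionary of metadata per logical file.
BOOKKEEPING: List[Dict[str, object]] = [
    {"lfn": "mc/run001.dat", "site": "CERN", "events": 500, "checked": True},
    {"lfn": "mc/run002.dat", "site": "RAL", "events": 500, "checked": False},
    {"lfn": "mc/run003.dat", "site": "RAL", "events": 250, "checked": True},
]


def query(records: List[Dict[str, object]], **criteria: object) -> List[str]:
    """Return the logical file names whose metadata match every criterion."""
    return [str(r["lfn"]) for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


print(query(BOOKKEEPING, site="RAL", checked=True))  # ['mc/run003.dat']
```

Returning logical rather than physical names keeps the application independent of where the files are actually stored.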

Data copying and mass storage handling

• M9

– WP2 GDMP tool via command line interface to transfer Zebra format files (control from LHCb scripts)

– WP5 interface to CASTOR

• M21

– GDMP will be replaced by smaller tools with API interface. Copy Zebra + Root + ?

– Tests of strategy driven copying via replica catalogue and replica management

– WP5 interfaces to more mass storage devices. (HPSS+RAL Datastore)
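Strategy-driven copying can be thought of as choosing, for each logical file, which physical replica to read from according to some policy. The sketch below uses an invented replica catalogue and an invented cost table with a "cheapest source" policy; neither reflects the actual WP2 interfaces.

```python
from typing import Dict, List

# Hypothetical replica catalogue: logical file name -> sites holding a copy.
REPLICAS: Dict[str, List[str]] = {
    "mc/run001.dat": ["CERN", "RAL"],
    "mc/run002.dat": ["IN2P3"],
}

# Hypothetical transfer cost from each source site to the destination (lower is better).
COST_TO_DEST: Dict[str, float] = {"CERN": 3.0, "RAL": 1.0, "IN2P3": 2.0}


def choose_source(lfn: str) -> str:
    """Pick the cheapest site holding a replica of `lfn`."""
    sites = REPLICAS.get(lfn)
    if not sites:
        raise KeyError(f"no replica registered for {lfn}")
    # Sites missing from the cost table are treated as infinitely expensive.
    return min(sites, key=lambda site: COST_TO_DEST.get(site, float("inf")))


for name in REPLICAS:
    print(name, "<-", choose_source(name))
```

Other strategies, such as preferring disk over mass storage, would only change the key function passed to `min`.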

Gaudi Architecture

[Architecture diagram. Components shown: Application Manager; Event Selector; Algorithms; Event Data Service with Transient Event Store; Detector Data Service with Transient Detector Store; Histogram Service with Transient Histogram Store; a Persistency Service with Converters and Data Files for each of the three stores; Message Service; JobOptions Service; Particle Properties Service; Other Services.]

GAUDI services linking to external services

[Architecture diagram. Gaudi components shown: Application Manager; Event Selector; Algorithms; Converters; Event Data Service, Detector Data Service and Histogram Service, each with a Persistency Service and a transient store (Transient Event Store, Transient Detector Store, Transient Histogram Store); Message Service; JobOptions Service; Particle Properties Service; Other Services. External services shown around them: Analysis Program; OS; Mass Storage; Event Database; PDG Database; DataSet DB; Monitoring Service; Histo Presenter; Job Service; Config. Service; Other.]

Another View

[Layer diagram. Gaudi Domain: Algorithms, Gaudi Services, API. Grid Domain: Application external Services, API.]
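The layering in this view, as we read it, has algorithms talking only to Gaudi services, with those services reaching external or Grid services through an API. That resembles an adapter pattern. GAUDI itself is a C++ framework; the Python below is only a language-neutral illustration with invented class and method names, not GAUDI or DataGrid code.

```python
from abc import ABC, abstractmethod
from typing import Dict


class PersistencyBackend(ABC):
    """What a Gaudi-side service needs from whatever stores the data."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...


class LocalBackend(PersistencyBackend):
    """Stands in for direct access to local data files."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, name: str) -> bytes:
        return self._files[name]


class GridBackend(PersistencyBackend):
    """Stands in for Grid services: resolve a logical name, then fetch it."""

    def __init__(self, catalogue: Dict[str, str], storage: Dict[str, bytes]) -> None:
        self._catalogue = catalogue   # logical name -> physical location
        self._storage = storage       # physical location -> content

    def read(self, name: str) -> bytes:
        return self._storage[self._catalogue[name]]


def run_algorithm(backend: PersistencyBackend, name: str) -> int:
    """An 'algorithm' that sees only the service interface, not where data lives."""
    return len(backend.read(name))


local = LocalBackend({"evt.dat": b"abc"})
grid = GridBackend({"evt.dat": "site-a:/data/evt.dat"},
                   {"site-a:/data/evt.dat": b"abc"})
print(run_algorithm(local, "evt.dat"), run_algorithm(grid, "evt.dat"))  # 3 3
```

The algorithm code is identical in both cases; only the backend handed to it changes, which is the property the two-domain picture is after.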

GANGA: Gaudi ANd Grid Alliance

[Diagram. Components shown: GANGA GUI; GAUDI Program; Collective & Resource Grid Services. Labels on the connections: JobOptions, Algorithms; Histograms, Monitoring, Results.]

Conclusions

• LHCb already has distributed MC production using GRID facilities for job submission

• Will test DataGrid M9 (Testbed1) deliverables in an incremental manner from October 15 using tools from WP1-5

• Have commenced defining projects to interface software framework (GAUDI) services (Event Persistency, Event Selection, Job Options) to GRID services

• Within the WP8 structure we will work closely with the other work packages (middleware, testbed, network) in a cycle of (requirements analysis, design, implementation, testing)

• http://lhcb-comp.web.cern.ch/lhcb-comp/

• http://datagrid-wp8.web.cern.ch/DataGrid-WP8/