
Ying Ying Li

Windows Implementation of LHCb Experiment Workload Management System DIRAC

LHCb is one of the four main high energy physics experiments at the Large Hadron Collider (LHC) at CERN, Geneva. LHCb is designed to investigate the matter-antimatter asymmetries seen in the Universe today, concentrating on studies of particles containing a b quark. Once it starts operation in 2007/8, LHCb will need to process data volumes of the order of petabytes per year, requiring tens of thousands of CPUs. To achieve this, a workload management system (DIRAC), implemented in Python, has been developed to allow coordinated use of globally distributed computing resources (the Grid). DIRAC currently coordinates LHCb jobs running on 6000+ CPUs shared with other experiments, distributed among 80+ sites across 4 continents. DIRAC has demonstrated its capabilities during a series of data challenges held since 2002, with a current record of 10,000+ jobs running simultaneously across the Grid. Most of the LHCb data-processing applications are tested under both Windows and Linux, but the production system has previously been deployed only on Linux platforms. This project will allow a significant increase in the resources available to LHCb by extending the DIRAC system to also use Windows machines.

The process of porting DIRAC to Windows has involved work in several areas, including automated installation of DIRAC, the DISET security module, automated LHCb application download, installation and running, and secure data transfer with .NetGridFTP. The result is a fully operational DIRAC system that is easily deployable in a Windows environment and allows an authorised user to submit jobs and to offer their CPU as an available resource to the LHCb experiment alongside Linux resources. The work described here has been developed on a Windows Compute Cluster consisting of four Shuttle SN95G5 boxes running Windows Server 2003 Compute Cluster Edition. This has also assisted in the extension of DIRAC's Compute Cluster backend computing element module. Tests have also been made on a Windows XP laptop, which demonstrates the flexibility and ease of deployment. The system has been deployed on small clusters at Cambridge and Bristol, and on a larger cluster (~100 CPUs) at Oxford. This project displays the platform independence of DIRAC and its potential. The DIRAC system has been used successfully with a subset of the LHCb applications. Current work focuses on deploying and testing the full set of LHCb applications under Windows to allow the running of production jobs.
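
Much of the port reduces to making installation and job launching platform-aware. As a minimal Python sketch of that idea, assuming hypothetical helper names rather than the actual DIRAC modules:

# Minimal sketch (hypothetical helpers, not the actual DIRAC modules) of the kind
# of platform dispatch the Windows port needs when wrapping and launching a job.
import os
import platform
import subprocess

def write_job_wrapper(job_dir, command):
    # Write a native wrapper script for the job and return its path.
    if platform.system() == 'Windows':
        wrapper = os.path.join(job_dir, 'job_wrapper.bat')
        with open(wrapper, 'w') as f:
            f.write('@echo off\n')
            f.write(command + '\n')
    else:
        wrapper = os.path.join(job_dir, 'job_wrapper.sh')
        with open(wrapper, 'w') as f:
            f.write('#!/bin/sh\n')
            f.write(command + '\n')
        os.chmod(wrapper, 0o755)       # executable bit only matters on Unix
    return wrapper

def run_job_wrapper(wrapper):
    # Launch the wrapper with the platform's command interpreter; return the exit code.
    return subprocess.call(wrapper, shell=True)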

Users can create jobs using a Python API or can directly write scripts in DIRAC's Job Definition Language (JDL); examples of both are shown at the bottom of this page. In either case, the user specifies the application to be run, the input data (if required), and any precompiled libraries. Applications developed by LHCb can be combined to form various types of jobs, ranging from production jobs (simulation + digitisation + reconstruction) to physics analysis.

DIRAC can be tailored to allow running of any type of application. The important applications for LHCb are based on a C++ framework called Gaudi:

GAUSS – Monte Carlo generator for simulation of particle collisions in the detector.
Boole – Produces the detector response to GAUSS ‘hits’.
Brunel – Reconstruction of events from Boole/detector.
DaVinci – Physics analysis (C++).
Bender – Physics analysis (Python, using bindings to C++).
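
As a rough sketch of how such a production chain could be expressed with the Dirac/Job API shown at the bottom of this page (application versions and option files are placeholders, and a real chain would feed each stage's output to the next):

# Rough sketch of a simulation -> digitisation -> reconstruction chain, built only
# from the Dirac/Job API calls shown in the API example below.  Versions and option
# files are placeholders, not real production settings.
from DIRAC.Client.Dirac import *

dirac = Dirac()

stages = [
    ('Gauss',  'v1r1', 'Gauss.opts'),    # simulation
    ('Boole',  'v1r1', 'Boole.opts'),    # digitisation
    ('Brunel', 'v1r1', 'Brunel.opts'),   # reconstruction
]

for name, version, options in stages:
    job = Job()
    job.setApplication(name, version)
    job.setInputSandbox([options])
    job.setOutputSandbox(['%s_%s.log' % (name, version)])
    dirac.submit(job)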

Jobs are submitted via DISET, the DIRAC security module built from OpenSSL tools and a modified version of pyOpenSSL. Authorisation makes use of certificate-based authentication. Input files are uploaded to the sandbox service on the DIRAC server.
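
As a minimal illustration of the certificate-based authentication DISET relies on, the following pyOpenSSL sketch builds a mutually authenticated SSL context (hypothetical file paths; this is not the DISET code itself):

# Minimal pyOpenSSL sketch of certificate-based mutual authentication, illustrating
# the mechanism DISET builds on.  File paths are hypothetical; this is not DISET.
from OpenSSL import SSL

def make_grid_ssl_context(cert_file, key_file, ca_file):
    ctx = SSL.Context(SSL.TLSv1_METHOD)
    ctx.use_certificate_file(cert_file)   # the user's Grid certificate
    ctx.use_privatekey_file(key_file)     # the matching private key
    ctx.load_verify_locations(ca_file)    # trusted CA certificates
    # Require the peer to present a certificate signed by a trusted CA.
    ctx.set_verify(SSL.VERIFY_PEER | SSL.VERIFY_FAIL_IF_NO_PEER_CERT,
                   lambda conn, cert, errnum, depth, ok: ok)
    return ctx

# Usage (hypothetical paths):
# context = make_grid_ssl_context('usercert.pem', 'userkey.pem', 'ca_certs.pem')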

[Poster diagram: Create Job → Submit Job → DIRAC server, with Grid data held in CASTOR storage.]

Once a job reaches the DIRAC server it is checked against the requirements placed by the owner, and waits for a suitably matched Agent from a free resource. DIRAC Agents link the distributed resources together: when a resource is free to process jobs it sends out a locally configured Agent, carrying the specifications of the resource, to request jobs from the central server. After a suitable job and Agent are matched, the Agent:

retrieves the job's JDL and sandbox, wraps the job in a Python script, and reports back;
if the resource is not a standalone CPU, checks the resource backend (LCG, Windows Compute Cluster, Condor, etc.) and submits the wrapper accordingly;
downloads and installs any required application when necessary;
downloads any required Grid data, for example from the CERN CASTOR system, using the GridFTP protocol and the LFC (LCG File Catalogue);
runs the job and reports on progress;
performs any requested data transfers.
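
This Agent cycle can be summarised in a heavily simplified Python sketch; the stub classes stand in for the real DIRAC services, and none of the names here are DIRAC's actual API:

# Heavily simplified sketch of the Agent pull cycle described above.  The stub
# classes stand in for the real DIRAC matcher/sandbox services and resource
# backends; all names are illustrative only.

def agent_cycle(server, backend, resource_spec):
    job = server.request_job(resource_spec)        # 1. advertise the free resource, ask for work
    if job is None:
        return False                               #    nothing matched; try again later
    jdl = server.get_jdl(job)                      # 2. retrieve the JDL ...
    sandbox = server.get_sandbox(job)              #    ... and the input sandbox
    wrapper = 'print("running %s: %s")' % (job, jdl['JobName'])   # wrap in a Python script
    server.report_status(job, 'matched')
    # 3. install the required application and fetch Grid input data if needed
    #    (GridFTP + LFC in the real system; omitted in this sketch).
    backend.submit(wrapper, sandbox)               # 4. hand over to the local backend
    server.report_status(job, 'running')           #    and report progress
    return True

class DummyServer(object):
    # Stand-in for the DIRAC job matcher and sandbox service.
    def request_job(self, spec):   return 'job-001'
    def get_jdl(self, job):        return {'JobName': 'DaVinci_1'}
    def get_sandbox(self, job):    return ['DaVinci.opts']
    def report_status(self, job, status):   print('%s -> %s' % (job, status))

class DummyBackend(object):
    # Stand-in for a resource backend (standalone CPU, LCG, Compute Cluster, Condor, ...).
    def submit(self, wrapper, sandbox):
        exec(wrapper)                              # a real backend would queue and schedule it

agent_cycle(DummyServer(), DummyBackend(), {'Platform': 'win32'})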

[Poster diagram: Agents connect the DIRAC server to the LHC Computing Grid (27 km LHC ring), clusters, and standalone desktops and laptops, with GridFTP data transfer.]

Users are able to monitor job progress from the monitoring web page: http://lhcb.pic.es/DIRAC/Monitoring/Analysis


JDL:

SoftwarePackages = { "DaVinci.v12r15" };
InputSandbox = { "DaVinci.opts" };
InputData = { "LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst" };
JobName = "DaVinci_1";
Owner = "yingying";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = { "std.out", "std.err", "DaVinci_v12r15.log" };
JobType = "user";

API:

import DIRAC
from DIRAC.Client.Dirac import *

dirac = Dirac()
job = Job()
job.setApplication('DaVinci', 'v12r15')
job.setInputSandbox(['DaVinci.opts'])
job.setInputData(['LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst'])
job.setOutputSandbox(['DaVinci_v12r15.log'])
dirac.submit(job)
