How to port an application on GRID: available tools and tricks of the trade



INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

How to port an application on GRID: available tools and tricks of the trade

Patricia Méndez Lorenzo

CERN (IT-PSS/ED)

Trieste, 10th February 2006

ICTP/INFM-Democritos Workshop on Porting Scientific Applications on Computational GRIDs


Outlook

◘ We will now look at two examples of new gridifications inside LCG/EGEE

➸ Geant4

➸ UNOSAT

◘ I would like to emphasize that they are CERN partner communities

➸ How they came to the attention of LCG is therefore obvious

➸ But the gridification procedure is the same for any other community


The Geant4 Toolkit

Generic toolkit for Monte Carlo simulation of particle interactions with matter (i.e. detectors)

◘ Application domains:

➙ High-Energy Physics: ATLAS, CMS and LHCb (LHC), BaBar (SLAC), etc

➙ Space Radiation: ESA

➙ Medical Physics: proton therapy, brachytherapy, etc.

◘ Object-oriented (C++) project, modular and extensible. Significantly improved with respect to its predecessor, Geant3, not only in software structure but above all in physics coverage

◘ The electromagnetic physics of Geant4, and even more so the hadronic physics, are complex fields. It is essential to test their models over the widest possible range of particles, materials and energies

This is where the Grid contributes


Geant4 Toolkit and the GRID Environment

◘ Electromagnetic and hadronic physics are fundamental features that must be properly simulated in High-Energy Physics and medical applications. However, they are extremely CPU-demanding

▪ CPU time depends on the number of events and on the energy:
► 1 event of 1 GeV ~ 0.03 sec (2.4 GHz)
► 1 event of 300 GeV ~ 9-10 sec

Geant4 wants to use the LCG environment to validate the software it provides to its users twice per year

● Two large productions per year
➸ Goal of the software validation: compare some shower observables between two different Geant4 versions and check for statistically significant changes

● Small productions (a few thousand jobs) throughout the year


Geant4 Toolkit and the Grid Environment

◘ Geant4 validates its software over a wide range of parameters:

➸ 7 simplified detectors

► FeSci, CuSci, PbSci, CuLAr, PbLAr, WLAr, PbWO4

➸ 8 different particles

► e-, pi+, pi-, k+, k-, k0L, p, n

➸ 23 different beam energies (GeV)

► 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 80, 100, 120, 150, 180, 200, 250, 300, 1000

➸ 5 physics lists

► LHEP, QGSP, QGSC, QGSP_BIC, QGSP_BERT

The combinations of all these parameters define the possible scenarios to check the software
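The size of this scenario space follows directly from the counts on the slide. A minimal sketch in Python, assuming one scenario per (detector, particle, energy, physics list) combination. The energies are represented only by the stated count of 23, because the list as transcribed here shows 24 values:

```python
from itertools import product

detectors = ["FeSci", "CuSci", "PbSci", "CuLAr", "PbLAr", "WLAr", "PbWO4"]
particles = ["e-", "pi+", "pi-", "k+", "k-", "k0L", "p", "n"]
physics_lists = ["LHEP", "QGSP", "QGSC", "QGSP_BIC", "QGSP_BERT"]
# Assumption: energies are indexed 0..22 to match the stated count of 23.
energy_indices = range(23)

# One scenario per combination of the four parameters.
scenarios = list(product(detectors, particles, energy_indices, physics_lists))
per_physics_list = len(scenarios) // len(physics_lists)

print(len(scenarios))     # 6440
print(per_physics_list)   # 1288
```

Both numbers agree with figures quoted later in the talk: 6440 jobs in the second production, sent in bunches of 1288 per physics list.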


Geant4 Toolkit and the GRID Environment

◘ Geant4 is an international project, but its management is based at CERN
➸ It is very well known by LCG
➸ It is a “lower level tool” for many HEP experiments
➸ Well-validated Geant4 software helps the experiments during their productions

◘ Its software is stable and very well tested on many platforms and systems
➸ This is attractive for LCG
➸ We would like to include it in our own software tests

◘ They are always short of time
➸ They produce the complete tar file 2 weeks before the release, so we have 2 weeks to run the whole production

◘ We have to provide them with tools
➸ To perform fast and reliable productions
➸ Automatic job submission
➸ Easy monitoring tools


Geant4 History

History:

◘ Geant4 contacted LCG in December 2004

◘ At that point, a support person was assigned to the group
➸ To introduce them to the project
➸ To run the productions with them
➸ To be the contact person with the deployment team and with the sites

◘ Between the 2 yearly productions, we keep working with them
➸ To improve the tools and the software provided to them
► The tools for the LCG implementation have been developed and provided by us
➸ To fully involve them in the LCG infrastructure
► Make them a new VO


Geant4 History

Geant4 has already run 3 productions with us

◘ First Production
➸ During the event production phase, 5635 jobs had to be run for each Geant4 version: 11270 jobs in total
➸ Finally, the statistical test suite was used to compare the corresponding Geant4 outputs from each version (this part ran outside the LCG resources)

◘ Second Production
➸ During this phase 6440 jobs had to be run
➸ This time each job contained the event production for each Geant4 version and the statistical test suite
➸ The whole production and analysis was done in a single job

◘ Third Production
➸ Same strategy as in the 2nd production


Stages of the Geant4 Production

1. Software installation: Installation of the Geant4 packages (with all the required external packages)
➸ Software provided as a tar file (copied to and registered in the GRID)
➸ Installation performed using GRID jobs
➸ The installation is validated through a small production
➸ After this, a tag is published in the Information System
➸ Fundamental request to the sites: a shared area between WNs and a well-defined software installation area (covered on the next slide)

2. Event production:
➸ Jobs sent in bunches of 1288, one bunch per physics list
➸ 5000 events were produced per job

3. Analysis (performed outside the GRID):
➸ Statistical tests to compare the two Geant4 versions
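A rough per-job CPU estimate helps explain why long queues matter for the event production stage. The sketch below combines the 5000 events per job with the per-event timings quoted earlier in the talk. It assumes a single Geant4 version per job and treats the quoted timings as representative, which the slides do not guarantee:

```python
EVENTS_PER_JOB = 5000

# Per-event CPU times quoted in the talk, in seconds
# (the 1 GeV figure is given for a 2.4 GHz CPU).
# The 300 GeV figure is a range (9 to 10 s), kept as (low, high).
SECONDS_PER_EVENT = {
    1: (0.03, 0.03),
    300: (9.0, 10.0),
}

def job_cpu_hours(energy_gev):
    """Return the (low, high) CPU hours needed by one job at this beam energy."""
    low, high = SECONDS_PER_EVENT[energy_gev]
    return (EVENTS_PER_JOB * low / 3600.0, EVENTS_PER_JOB * high / 3600.0)

for energy in sorted(SECONDS_PER_EVENT):
    low_h, high_h = job_cpu_hours(energy)
    print(f"{energy} GeV: {low_h:.2f} to {high_h:.2f} CPU hours per job")
```

Under these assumptions a 1 GeV job needs a few minutes, while a 300 GeV job needs roughly half a day of CPU time.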


Why a shared area?

◘ The experiment-specific software needed comes in huge packages
➸ Executables, input data, compilers, external libraries, etc.

◘ It is not efficient to ship the whole experiment software with each job
➸ And to do so for each user of the VO
➸ Solution: pre-installation of the experiment-specific software before each production

◘ What the VO production managers do:
➸ Pre-install the software at the site, on just one WN, through Grid jobs
➸ In an area visible from the WNs (the actual location does not matter)
➸ The area is normally mounted via NFS (shared among all WNs)
➸ It is accessed from the WN through an environment variable
➸ Only sgm (software manager) accounts are allowed to do this
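From inside a job, locating the shared area then comes down to reading that environment variable. The sketch below assumes the variable follows the pattern VO_<VONAME>_SW_DIR; the slide only says "an env variable", so both the pattern and the example path are assumptions for illustration:

```python
import os

def software_area(vo, environ=os.environ):
    """Return the shared software directory for a VO, or raise if it is unset."""
    # Assumed naming pattern: VO_<VONAME>_SW_DIR, with the VO name upper-cased.
    variable = f"VO_{vo.upper()}_SW_DIR"
    path = environ.get(variable)
    if not path:
        raise RuntimeError(f"{variable} is not set: no shared software area on this WN")
    return path

# Example with a stand-in environment; the path is purely illustrative.
fake_env = {"VO_GEANT4_SW_DIR": "/opt/exp_soft/geant4"}
print(software_area("geant4", fake_env))
```

Failing loudly when the variable is missing makes sites without a shared area show up at the start of a job rather than midway through it.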


Strategy inside LCG

◘ VO Configuration
➸ 1st Production: dteam (6 certificates, one as dteamsgm)
► This is a problem for the sites because of the SFT tests
► These are tests performed by the LCG team to test the sites
► Sites normally dedicate 1 CPU to these tests. If we take it for other purposes, their test results will suffer
➸ 2nd Production: alice (2 certificates, one as alicesgm)
► We cannot count on it anymore: ALICE and the rest of the experiments are in full production
➸ 3rd Production: geant4 (2 certificates, one as geantsgm)
► First production with their own VO

◘ Resources
➸ 1st Production: own RB+BDII+UI at CERN
➸ 2nd and 3rd Productions: lxplus resources and 2 RBs

◘ Outputs
➸ 1st Production: about 30 GB stored at CERN (lxn1183)
➸ 2nd and 3rd Productions: comparable quantity stored at CERN


Geant4 requirements

◘ Each production takes about 3-4 years of CPU time

◘ Very small output for the whole production: 15-20 GB in total (fully retrieved to CERN for analysis)

◘ As explained before, it is a CERN community
➸ CERN also supported them as a site

◘ We ask (as support), we provide (as CERN)
➸ Access to UI (provided at CERN)

➸ VO = Geant4 (provided at CERN, but sites should recognize it)

➸ RBs access (provided at CERN)

➸ CE (dedicated long queues at each site)

➸ SE (provided at CERN)

➸ Software area (2GB at each site)

➸ Access to the LFC catalog (centralized at CERN)


Tools developed for new Gridifications

Generation of a general framework consisting of 2 major tools:

1. Tool to perform the automatic job submission

2. Tool to retrieve and handle the corresponding output

1. Automatic job submission

◘ Given a user’s jdl, this tool performs the following actions:

➸ It lists all sites able to run the jdl provided by the user

➸ It automatically creates a jdl file based on the one provided by the user

➸ It submits the newly created jdl containing the user application(s)

➸ It also creates a subdirectory (named by the user) containing a list of the sites where the jobs have been submitted, the corresponding jdls and the job IDs
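The bookkeeping part of this workflow can be sketched as follows. This is not the actual tool: it assumes a job is pinned to a site by appending a Requirements clause on GlueCEUniqueID, and it takes the real submission step as a hypothetical callable `submit`, so the example runs without any Grid middleware. File names, CE names and the executable are illustrative:

```python
import os
import tempfile

def make_site_jdl(user_jdl, ce_id):
    """Return a copy of the user's JDL pinned to one Computing Element."""
    # Assumption: pinning is done with a Requirements clause on GlueCEUniqueID.
    # This simple sketch expects a user JDL without its own Requirements line.
    requirement = f'Requirements = other.GlueCEUniqueID == "{ce_id}";'
    return user_jdl.rstrip() + "\n" + requirement + "\n"

def prepare_submission(user_jdl, ce_ids, jobdir, submit):
    """Write one JDL per CE into jobdir, submit it, and record sites and job IDs."""
    os.makedirs(jobdir, exist_ok=True)
    records = []
    for index, ce_id in enumerate(ce_ids):
        jdl_path = os.path.join(jobdir, f"job_{index}.jdl")
        with open(jdl_path, "w") as handle:
            handle.write(make_site_jdl(user_jdl, ce_id))
        job_id = submit(jdl_path)  # hypothetical wrapper around the real submission
        records.append((ce_id, jdl_path, job_id))
    # Keep the list of sites, JDLs and job IDs together for later monitoring.
    with open(os.path.join(jobdir, "sites.txt"), "w") as handle:
        for ce_id, jdl_path, job_id in records:
            handle.write(f"{ce_id} {jdl_path} {job_id}\n")
    return records

# Dry run with a fake submitter that just invents a job ID.
user_jdl = 'Executable = "run_g4.sh";\nStdOutput = "std.out";\nStdError = "std.err";'
ces = ["ce01.example.org:2119/jobmanager-torque-geant4",
       "ce02.example.org:2119/jobmanager-torque-geant4"]
jobdir = os.path.join(tempfile.mkdtemp(), "G4_PROD")
records = prepare_submission(user_jdl, ces, jobdir,
                             submit=lambda path: "fake-id-" + os.path.basename(path))
print(len(records), sorted(os.listdir(jobdir)))
```

Keeping the site list, the generated jdls and the job IDs in one directory is what lets a separate tool pick up monitoring and output retrieval later.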


Tools developed for new Gridifications

◘ Additional Features:
➸ The user can define the queues where the jobs are submitted. These queues are checked to see whether they satisfy the job requirements
➸ Requested LFN files can be included. The corresponding TURLs are looked up and written to a file passed to the WN in the InputSandbox

◘ Applications
➸ This tool has been used for the 1st and 2nd phases of the production: software installation and event production

◘ Usage:

./submitter_general -vo geant4 -jdl jdlexample -jobfile G4_PROD -data /grid/geant4/production_software

➸ -vo: Mandatory
➸ -jdl: Mandatory. Give a jdl example
➸ -jobfile: Mandatory. It stores the created jdl, the job IDs and the list of used CEs
➸ -data: Not mandatory. Only needed if this LFN is required
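The option set in the usage line can be mirrored with a small argument parser. The talk does not say how submitter_general is implemented, so this is only a sketch of the interface: three mandatory options and one optional LFN.

```python
import argparse

def build_parser():
    """Build a parser mirroring the options shown in the usage line."""
    parser = argparse.ArgumentParser(prog="submitter_general")
    parser.add_argument("-vo", required=True, help="virtual organization")
    parser.add_argument("-jdl", required=True, help="example JDL provided by the user")
    parser.add_argument("-jobfile", required=True,
                        help="directory storing the created JDLs, job IDs and used CEs")
    parser.add_argument("-data", default=None, help="LFN required by the job (optional)")
    return parser

args = build_parser().parse_args(
    "-vo geant4 -jdl jdlexample -jobfile G4_PROD "
    "-data /grid/geant4/production_software".split()
)
print(args.vo, args.jdl, args.jobfile, args.data)
```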


Tools developed for new Gridifications

2. Output retrieval and handling

➸ The 2nd tool checks the status of the jobs using the job IDs stored in the directory given by the user

◘ Usage:

./get_output -jobfile G4_PROD -dest G4_PROD/outputs

➸ -jobfile: Mandatory. Directory holding the jobs to monitor
➸ -dest: Mandatory. Where to put the output

◘ Output

The job run in ramses.dcic.ups.es:2119/jobmanager-torque-dteam is in status: Scheduled
The job run in grid01.phy.ncu.edu.tw:2119/jobmanager-torque-dteam is in status: running
The job run in scaic10.scai.frauhofer.de:2119/jobmanager-torque-dteam is in status: over
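A report of this kind is easy to post-process, for example to find the jobs whose output can be collected. The sketch below assumes one status line per job in the format shown in the sample output, and it assumes that the status "over" means the job has finished; neither is stated explicitly on the slide.

```python
import re

STATUS_LINE = re.compile(r"The job run in (?P<ce>\S+) is in status: (?P<status>\w+)")

def parse_status_report(text):
    """Map each CE named in the report to its lower-cased job status."""
    statuses = {}
    for line in text.splitlines():
        match = STATUS_LINE.search(line)
        if match:
            statuses[match.group("ce")] = match.group("status").lower()
    return statuses

report = """\
The job run in ramses.dcic.ups.es:2119/jobmanager-torque-dteam is in status: Scheduled
The job run in grid01.phy.ncu.edu.tw:2119/jobmanager-torque-dteam is in status: running
The job run in scaic10.scai.frauhofer.de:2119/jobmanager-torque-dteam is in status: over
"""

statuses = parse_status_report(report)
# Assumption: "over" marks a finished job.
finished = [ce for ce, status in statuses.items() if status == "over"]
print(finished)
```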


Tools developed for new Gridifications

◘ Additional Features:

➸ It is possible to visualize the outputs on the web

➸ An html report is provided showing the files chosen by the user


Results and Discussion

◘ 1st Production (as dteam)
➸ We were learning how to bring Geant4 into LCG
➸ The software was successfully installed at 28 sites
➸ Efficiency around 70%

◘ 2nd Production (as alice)
➸ The software was successfully installed at 35 sites
➸ Efficiency around 70%

◘ 3rd Production (as geant4)
➸ The software was successfully installed at 5 sites
➸ Efficiency 99%
➸ At this moment we already have 11 sites, and OSG is getting involved


Results and Discussion

Strange results?... Not really

Main problems:

◘ Sites with no shared area, or with the area not even mounted
➸ It is not required for dteam (this was the largest problem during the 1st production)
➸ We have forced the sites to provide this area

◘ Unstable sites
➸ It is difficult to keep 28 or 35 sites under control
➸ During the 3rd production (5 sites), we helped the sites set up the Geant4 VO, and the contact with them was great

◘ Lack of time
➸ We have a short period of time
➸ Resubmissions are not possible
➸ A good follow-up of the sites is not possible


Next Production with DIANE

◘ Resource optimization layer which exploits a pull model via direct communication channel between Master and Workers

◘ Implemented for the next Geant4 production

[Figure: the MASTER on the user's PC and the SLAVES running on the WNs behind the CE of a SITE]

◘ The user runs the MASTER on his PC

◘ The MASTER submits slaves, NOT the jobs

➸ Slaves are normal GRID jobs

◘ Slaves begin to pull jobs from the MASTER
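The pull model itself can be illustrated without any Grid machinery. In the sketch below, threads in a single process stand in for the slaves and a thread-safe queue stands in for the MASTER's task list. This is not the DIANE API, only an illustration of workers asking for work until none is left:

```python
import queue
import threading

def run_master(tasks, n_workers):
    """Put all tasks in the master's queue and let workers pull until it is empty."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                task = pending.get_nowait()  # the worker asks for work (pull)
            except queue.Empty:
                return  # nothing left: the worker exits
            outcome = (name, task, task * task)  # stand-in for the real computation
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",))
               for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

results = run_master(tasks=range(10), n_workers=3)
print(sorted(task for _, task, _ in results))
```

Because each worker takes a new task only when it is free, fast workers naturally process more tasks than slow ones, which is the resource optimization the pull model aims for.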


DIANE: EXAMPLE


UNOSAT

Satellite imagery based web mapping service

◘ Objectives
➸ Easy access to quality geoinformation service
➸ Organize the demand for geoinformation
➸ Ensure cost-effective and timely products

◘ Core Services
➸ Humanitarian Mapping
➸ Image Processing

[Figure: sample imagery, VEGETATION (1 km) and IKONOS (1 m)]


UNOSAT

[Figure: data flow between data suppliers, ground station, UNOSAT Central Unit, WWW and USER]


Relief Projects of UNOSAT

◘ Case Study: Indian Ocean Tsunami Relief and Development

◘ 29th Dec 2004: First map distributed online to field users
➸ 14th Jan 2005: Imagery Bank online:
► 100 Tsunami-related maps (pre and post)
► 670 raw satellite images
➸ January: 200,000 tsunami maps downloaded in total

◘ UNOSAT has a huge amount of data to store

◘ CERN has provided a good amount of space for this purpose

◘ The collaboration with GRID began in summer 2005

◘ Running and storing data in LCG/EGEE can certainly help UNOSAT achieve its goals


First step: UNOSAT and CERN

◘ UNOSAT has been a CERN partner since 2002

◘ CERN supports them with network facilities, computing infrastructure and human (support) resources

◘ Asian Tsunami Example:
➸ Central Web Services at CERN under considerable strain
➸ Solution quickly found by CERN’s Internet Services Group
➸ Result: UNOSAT data remained available continuously

◘ UNOSAT provides its users with a web interface that finds the image files by clicking on images of the earth
➸ Attractive, easy....

◘ Something similar can be done with the GRID
➸ Certificates have to be dealt with, but it is possible


One step further: GRID

◘ Potential Bottlenecks:
➸ Limited capacity and processing power
➸ Multiple satellites being launched
➸ Can the Grid help?

◘ In summer 2005 we provided a whole structure at CERN for UNOSAT
➸ UNOSAT Virtual Organization (VO)
➸ 3.5 TB in CASTOR
➸ Computing Elements, Resource Brokers
➸ Collaboration with the ARDA group
➸ AFS area of 5 GB

◘ We have run some UNOSAT tests (image compression) inside the GRID environment (quite successful)

◘ The framework developed for Geant4 has been adapted to UNOSAT's needs

We have provided the whole GRID infrastructure at CERN


A GRID Metadata Catalogue

◘ LFC Catalogue
➸ Mapping of LFN to PFN

◘ UNOSAT requirements
➸ The user gives certain coordinates as input
➸ As output, the user wants the PFN for downloading

◘ The ARDA group is helping us set up the AMGA tool for UNOSAT

[Figure: lookup chain from coordinates to file. Components: ARDA application with Oracle DB (metadata), LFC, SRM, CASTOR. Flow: (x,y,z) → LFN → PFN]
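The two-step lookup described on this slide, from coordinates to LFN through the metadata catalogue and from LFN to PFN through the file catalogue, can be sketched with plain in-memory tables. Nothing here uses the AMGA or LFC interfaces; the entries, the box-shaped regions and the file names are hypothetical and serve only to show the chain:

```python
from typing import NamedTuple

class ImageEntry(NamedTuple):
    lfn: str
    lower: tuple  # (x, y, z) lower corner of the covered region
    upper: tuple  # (x, y, z) upper corner of the covered region

# Hypothetical metadata catalogue: region covered by each image -> LFN.
ENTRIES = [
    ImageEntry("/grid/unosat/images/tile_a.tif", (90.0, 0.0, 0.0), (100.0, 10.0, 1.0)),
    ImageEntry("/grid/unosat/images/tile_b.tif", (100.0, 0.0, 0.0), (110.0, 10.0, 1.0)),
]

# Hypothetical file catalogue: LFN -> PFN (the role played by the LFC).
FILE_CATALOGUE = {
    "/grid/unosat/images/tile_a.tif": "srm://se.example.org/castor/unosat/tile_a.tif",
    "/grid/unosat/images/tile_b.tif": "srm://se.example.org/castor/unosat/tile_b.tif",
}

def lfns_covering(point, entries):
    """Metadata step: return the LFNs of all images whose region contains the point."""
    return [entry.lfn for entry in entries
            if all(lo <= p <= hi for lo, p, hi in zip(entry.lower, point, entry.upper))]

def pfns_for(point, entries, catalogue):
    """Full chain: (x, y, z) -> LFN -> PFN."""
    return [catalogue[lfn] for lfn in lfns_covering(point, entries)]

print(pfns_for((95.0, 5.0, 0.0), ENTRIES, FILE_CATALOGUE))
```

Keeping the two steps separate mirrors the split of responsibilities on the slide: the metadata service knows about coordinates, while the file catalogue only knows about file names and locations.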


Future Plans with UNOSAT

◘ Collaboration between UNOSAT, ARDA and GD
➸ 1(2) ARDA and 2 UNOSAT students
➸ Still many discussions needed
➸ Support from other sites foreseen

[Figure: an application sending coordinates (x,y,z) to the GRID WORLD]

◘ Users can get the information on their laptops too

◘ AMGA is fundamental


Summary

◘ The Support team is ready to help projects outside the HEP communities get involved in the GRID

➸ We have different applications and frameworks ready to give such support

◘ Two different communities are already fully involved in the environment

➸ Geant4 and UNOSAT, using ARDA applications normally used by large HEP experiments

◘ Together with these communities we are gaining confidence in the procedure

➸ Now we can say we have a structure ready to do it