ganga tutorial


Upload: elijah-guerra

Post on 02-Jan-2016

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

Ganga Tutorial

Hurng-Chun Lee

EGEE Tutorial, Manchester

Agenda

• Part I: Ganga introduction

• Part II: Ganga hands-on

• Part III: More about Ganga

Part I: Ganga Introduction

Ganga overview

Motivation

• In practice users deal with multiple computing backends

[Figure: user communities (LHCb users, ATLAS users, Biomed, ...) working with multiple backends: local PC, PBS, LSF, SGE, PANDA]

Motivation

• FAQ: running applications on multiple computing backends

– I must learn many interfaces
– How to configure my applications?
– Do I get a consistent view on all my jobs?

[Figure: the same set of backends: local PC, PBS, LSF, SGE, PANDA]

User requirements

• User requirements
– interact with all backend systems in a very similar way: submit, kill, monitor jobs
– configure the applications easily and transparently across desired backends
– organize work
  • job history: keep track of what the user did
  • save job outputs in a consistent way
  • reuse configuration of previously submitted jobs

Ganga

Wish list

• Wish list
– easy to learn and use
– powerful: not limiting the features of the backends
– close to the application (which is typically compiled locally)
– not imposing a single working style: scripts, command line, GUI, ...

Ganga

Considerations of scientific application development

• Realities
– Computing environment is heterogeneous
– Computing technology is evolving
– User requirements are also evolving

• Requirement
– Application users prefer to learn as few tools as possible; the tools should be lightweight, handy and well integrated with each other.

• Ganga tries to answer the questions:
– How to minimize the developer's effort in building applications?
– How to minimize the user's effort in running applications?

Ganga

• Ganga: Job Management Interface
– a utility which you download to your computer, or which is already installed at your institute in a shared area
  • for example: /nfs/sw/ganga/install/4.2.14
– it is an add-on to installed software
– comes with a set of plugins for ATLAS and LHCb
– open: other applications and backends may be easily added
  • even by users

[Figure: the Ganga framework with its application plugins and backend plugins, next to the installed application software, LSF client and LCG UI]

Ganga Job

Where the Ganga journey starts …

[Figure: the components of a Ganga job, each marked as mandatory or optional]

Plug-in based design

Common interface

Specific implementation

Ease user’s experience in switching between different technologies

Concentrate developer’s effort in specific domain
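The split between a common interface and specific implementations can be sketched in plain Python. This is an illustrative toy under stated assumptions, not Ganga's actual plugin code: the names `Backend`, `LocalBackend`, `FakeBatchBackend` and `run_everywhere` are hypothetical, and the "submission" only builds a string.

```python
from abc import ABC, abstractmethod
from typing import List


class Backend(ABC):
    """Common interface: every backend plugin offers the same operation."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Accept a command and return a backend-specific job identifier."""


class LocalBackend(Backend):
    """Specific implementation (hypothetical): pretend to run on this machine."""

    def submit(self, command: str) -> str:
        return f"local:{command}"


class FakeBatchBackend(Backend):
    """Specific implementation (hypothetical): pretend to queue on a batch system."""

    def __init__(self, queue: str) -> None:
        self.queue = queue

    def submit(self, command: str) -> str:
        return f"batch[{self.queue}]:{command}"


def run_everywhere(command: str, backends: List[Backend]) -> List[str]:
    """User code only sees the common interface, so backends are interchangeable."""
    return [backend.submit(command) for backend in backends]


if __name__ == "__main__":
    for job_id in run_everywhere("echo hello", [LocalBackend(), FakeBatchBackend("short")]):
        print(job_id)
```

The user-facing function never inspects which backend it holds, which is what lets a user switch technologies without relearning the interface, while each developer only maintains one subclass.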

Applications & backends

The Ganga development team

- Support for development work
- Core team: F. Brochu (Cambridge), U. Egede (Imperial), J. Elmsheuser (Munich), K. Harrison (Cambridge), H.C. Lee (ASGC Taipei), D. Liko (CERN), A. Maier (CERN), J.T. Moscicki (CERN), A. Muraru (Bucharest), W. Reece (Imperial), A. Soroko (Oxford), CL. Tan (Birmingham)
- Ganga is an ATLAS/LHCb joint project

Part I: Introduction to Ganga

Using Ganga

Download, Install, First launch

Download & Install

# download installer
wget http://ganga.web.cern.ch/ganga/download/ganga-install

# --prefix: installation prefix
# --extern: installation of external modules
# last argument: Ganga version
python ganga-install \
    --prefix=~/opt/ganga \
    --extern=GangaAtlas,GangaGUI,GangaPlotter \
    4.2.8

First Launch

export PATH=$HOME/opt/ganga/install/slc3_gcc323/4.2.8/bin:$PATH

# start Ganga with inline configurations
ganga -o'[LCG]ENABLE_EDG=False' -o'[LCG]ENABLE_GLITE=False'

*** Welcome to Ganga ***
Version: Ganga-4-2-8
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

In [1]:
Do you really want to exit ([y]/n)?

The "In [1]:" prompt is the Ganga CLIP; press <ctrl>-D to exit Ganga CLIP.

Get your hands dirty ...

• Job().submit()    submit and run a test job on the local machine
• Job(backend=LCG()).submit()    submit and run a test job on LCG
• jobs    browse the created jobs (job history)
• j = jobs[1]    get the first job from the job history
• j    print the details of the job and see what you can set for a job
• j.copy().submit()    make a copy of the job and submit the new job
• j.<tab>    see what you can do with the job

Configurations

Syntax: Python ConfigParser standard

[Configuration]
TextShell = IPython
...
[LCG]
EDG_ENABLE = True
...

How to set configurations

• release config: hardcoded configurations
• site config:
export GANGA_CONFIG_PATH=/some/physics/subgroup.ini:GangaLHCb/LHCb.ini
ganga --config-path=/some/physics/subgroup.ini:GangaLHCb/LHCb.ini
• user config:
~/.gangarc
ganga -o

Override sequence: user config > site config > release config
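The override sequence (user config > site config > release config) can be illustrated with Python's standard `configparser` module, the Python 3 name of ConfigParser. This is a minimal sketch, not a description of how Ganga merges its files internally: the three layer contents are invented for the example, and `load_layers` is a hypothetical helper.

```python
import configparser

# Hypothetical contents of the three configuration layers.
RELEASE_CONFIG = """
[Configuration]
TextShell = IPython
[LCG]
EDG_ENABLE = True
"""

SITE_CONFIG = """
[LCG]
VirtualOrganisation = gilda
"""

USER_CONFIG = """
[LCG]
EDG_ENABLE = False
"""


def load_layers(*layers: str) -> configparser.ConfigParser:
    """Read layers from lowest to highest priority; later values win."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    for text in layers:
        parser.read_string(text)
    return parser


config = load_layers(RELEASE_CONFIG, SITE_CONFIG, USER_CONFIG)
print(config["LCG"]["EDG_ENABLE"])           # user value overrides the release value
print(config["LCG"]["VirtualOrganisation"])  # site value is kept
print(config["Configuration"]["TextShell"])  # release default is kept
```

Reading the layers in increasing priority means no explicit merge logic is needed: each later `read_string` call simply overwrites options that an earlier layer had set.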

The ‘gangadir’

• created at the first launch within the $HOME directory

• To locate it in a different directory:
– metadata of jobs: [DefaultJobRepository] local_root = /alternative/gangadir/repository
– data of jobs: [FileWorkspace] topdir = /alternative/gangadir/workspace

User interfaces

CLIP

*** Welcome to Ganga ***
Version: Ganga-4-2-8
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

In [1]: jobs
Out[1]: Statistics: 1 jobs
--------------
# id status name subjobs application backend backend.actualCE
# 1 completed Executable LCG lcg-compute.hpc.unimelb.edu.au:2119/jobmanage

GUI

GPI & Scripting

#!/usr/bin/env ganga
#-*-python-*-
import time
j = Job()
j.backend = LCG()
j.submit()
# poll every 30 seconds until the job reaches a final state
while j.status not in ['completed', 'failed']:
    print('job still running')
    time.sleep(30)

Ways to run the script:

./myjob.exec
ganga ./myjob.exec
In [1]: execfile("myjob.exec")
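The polling loop in the script above waits forever if a job never reaches a final state. A possible variant with a timeout is sketched below. It runs outside Ganga, so the job is replaced by a hypothetical `FakeJob` stand-in that replays a scripted list of states; `wait_for` and its parameters are likewise assumptions made for illustration.

```python
import time


class FakeJob:
    """Hypothetical stand-in for a job: reports a scripted sequence of states."""

    def __init__(self, states):
        self._states = list(states)

    @property
    def status(self):
        # Advance through the scripted states, then stay on the last one.
        if len(self._states) > 1:
            return self._states.pop(0)
        return self._states[0]


def wait_for(job, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll job.status until it is final or the timeout expires."""
    waited = 0.0
    while True:
        status = job.status
        if status in ("completed", "failed"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still '{status}' after {waited:.0f} s")
        print("job still running")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    job = FakeJob(["submitted", "running", "completed"])
    # A no-op sleep keeps the demonstration instantaneous.
    print(wait_for(job, sleep=lambda seconds: None))
```

Passing the sleep function as a parameter keeps the loop testable without real waiting; in a real script the default `time.sleep` would be used.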

Some handy functions

• <tab> completion

• <page up/down> for cmd history

• system command integration

In[1]: j = jobs[1]
In[2]: cat $j.outputdir/stdout
Hello World

• Job template

In[1]: t = JobTemplate(name='lcg_simple')
In[2]: t.backend = LCG(middleware='EDG')
In[3]: templates
Out[3]: Statistics: 1 templates
--------------
# id status name subjobs application backend backend.actualCE
# 3 template lcg_simple Executable LCG
In[4]: j = Job(templates[3])
In[5]: j.submit()

• In[1]: plugins()
– plugins('backends')

• In[2]: help()

• etc.
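The idea behind a job template, a stored configuration from which independent jobs are created, can be sketched with a deep copy. This is a toy model under stated assumptions and says nothing about how JobTemplate is implemented: `JobSettings` and `job_from` are hypothetical names, and the backend is reduced to a dictionary.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JobSettings:
    """Hypothetical stand-in for the settings carried by a job or a template."""
    name: str = ""
    application: str = "Executable"
    backend: Dict[str, str] = field(default_factory=lambda: {"type": "Local"})


def job_from(template: JobSettings) -> JobSettings:
    """Create an independent job whose settings start as a copy of the template."""
    return copy.deepcopy(template)


template = JobSettings(name="lcg_simple", backend={"type": "LCG", "middleware": "EDG"})
job = job_from(template)
job.backend["middleware"] = "GLITE"    # changing the job ...
print(template.backend["middleware"])  # ... leaves the template untouched
```

A deep copy matters here because the backend settings are nested: a shallow copy would share the inner dictionary, and editing one job would silently change the template.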

Real Application: the ATLAS data analysis application

CLIP mode / Scripting mode

# application
j = Job()
j.application = Athena()
j.application.option_file = 'myOpts.py'
j.application.prepare(athena_compile = False)

# inputdata
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'interestingDataset.AOD.v12003104'
j.inputdata.type = 'DQ2_Local'

# outputdata
j.outputdata = AthenaOutputDataset()
j.outputdata.outputdata = 'myOutput.root'

# splitter & merger
j.splitter = AthenaSplitterJob(numsubjobs=2)
j.merger = AthenaOutputMerger()

j.backend = LCG( CE='ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas' )
j.submit()
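The roles of a splitter and a merger can be sketched in plain Python: the splitter divides the input among subjobs, and the merger combines the per-subjob outputs. This is a conceptual toy, not how AthenaSplitterJob or AthenaOutputMerger work internally; `split_files`, `process`, `merge_counts` and the file names are hypothetical.

```python
from typing import Dict, List


def split_files(files: List[str], numsubjobs: int) -> List[List[str]]:
    """Deal the input files round-robin into at most numsubjobs non-empty groups."""
    if numsubjobs < 1:
        raise ValueError("numsubjobs must be at least 1")
    groups = [files[i::numsubjobs] for i in range(numsubjobs)]
    return [group for group in groups if group]


def process(files: List[str]) -> Dict[str, int]:
    """Stand-in for the real analysis: count the files seen by one subjob."""
    return {"files_processed": len(files)}


def merge_counts(outputs: List[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-subjob results by summing values key by key."""
    merged: Dict[str, int] = {}
    for output in outputs:
        for key, value in output.items():
            merged[key] = merged.get(key, 0) + value
    return merged


if __name__ == "__main__":
    inputs = ["AOD_1.root", "AOD_2.root", "AOD_3.root"]  # hypothetical file names
    subjobs = split_files(inputs, numsubjobs=2)
    print(subjobs)
    print(merge_counts([process(group) for group in subjobs]))
```

Because each subjob only sees its own group of files, the subjobs can run independently on different worker nodes, and the merge step is the only place where results meet.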

Behind the scenes ...

Application plugin

generic logic

specific backend binding

A good example: python/GangaTutorial/Lib/*.py

Splitter

Merger

Datasets
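The division of an application plugin into generic logic and a specific backend binding can be sketched as a registry keyed by application and backend. This is an illustrative toy, not the code found in GangaTutorial: `HANDLERS`, `binding`, `build_command`, `prepare` and the `submit-to-queue` command are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Registry of backend-specific bindings, keyed by (application name, backend name).
Handler = Callable[[List[str]], str]
HANDLERS: Dict[Tuple[str, str], Handler] = {}


def binding(application: str, backend: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one application/backend pair."""
    def register(handler: Handler) -> Handler:
        HANDLERS[(application, backend)] = handler
        return handler
    return register


def build_command(exe: str, args: List[str]) -> List[str]:
    """Generic logic: the same for every backend."""
    return [exe] + list(args)


@binding("Executable", "Local")
def executable_on_local(command: List[str]) -> str:
    """Specific binding (hypothetical): run the command directly."""
    return " ".join(command)


@binding("Executable", "Batch")
def executable_on_batch(command: List[str]) -> str:
    """Specific binding (hypothetical): wrap the command in a made-up submission call."""
    return "submit-to-queue " + " ".join(command)


def prepare(application: str, backend: str, exe: str, args: List[str]) -> str:
    """Run the generic step, then hand over to the binding for the chosen backend."""
    return HANDLERS[(application, backend)](build_command(exe, args))


if __name__ == "__main__":
    print(prepare("Executable", "Local", "./myscript.sh", ["ganga"]))
    print(prepare("Executable", "Batch", "./myscript.sh", ["ganga"]))
```

Supporting a new backend then means registering one more binding; the generic step and the user-facing call stay unchanged.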

Summary

• What is Ganga?
– An easy-to-use front-end for job definition and management
– A generic framework extensible for specific applications
– A light-weight application component fully implemented in Python

• How can Ganga help in application development?
– Ganga provides a set of ready-to-use APIs for high-level job management
– It simplifies developers' effort in developing applications
– End-users can easily extend Ganga for their own purposes

Part II: Ganga hands-on

Step 0: launch Ganga CLIP

https://twiki.cern.ch/twiki/bin/view/ArdaGrid/EGEETutorialPackage

• Skip the installation step

• Start your Ganga CLIP session using the commands:

shell> ganga --config-path=GangaTutorial/Tutorial.ini \
             --option='[LCG]VirtualOrganisation=gilda'

Step 1: Your first Ganga job - an arbitrary shell script

In [1]: !vi myscript.sh

In [2]: !chmod +x myscript.sh

In [2]: j = Job()

In [3]: j.application = Executable()

In [4]: j.application.exe = File('myscript.sh')

In [5]: j.application.args = ['ganga']

In [6]: j.backend = Local()

In [7]: j.submit()

In [8]: jobs

In [9]: j.peek()

In [10]: cat $j.outputdir/stdout

myscript.sh:

#!/bin/sh
echo "hello! ${1}"
echo $HOSTNAME
cat /proc/cpuinfo | grep 'model name'
cat /proc/meminfo | grep 'MemTotal'

./myscript.sh ganga

Step 2: your first Ganga job on the Grid

In [11]: j = j.copy()

In [12]: j.backend = LCG()

In [13]: j.application.args = ['grid']

In [14]: j.submit()

In [15]: j

In [16]: cat $j.backend.loginfo(verbosity=1)

In [17]: jobs

Exercise: Prime number lookup

• Use the following Ganga components for this exercise:
– Application: PrimeFactorizer()
– Dataset: PrimeTableDataset()
– Splitter: PrimeFactorizerSplitter()

Hints of the exercise

• Create a Ganga job

• Application specification

• Input dataset specification

• Splitter specification

• Backend specification

• Submit the job
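As background for the exercise, the computation itself can be sketched without Ganga: factorize a number against a table of primes, with the table cut into chunks so that each subjob scans only its own part. This is an assumption about the idea behind the components, not the GangaTutorial implementation; `prime_table`, `split_table` and `factors_in_chunk` are hypothetical names. The sketch only finds factors that appear in the table.

```python
from typing import List


def prime_table(limit: int) -> List[int]:
    """Sieve of Eratosthenes: all primes up to and including limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def split_table(table: List[int], chunk_size: int) -> List[List[int]]:
    """Cut the prime table into consecutive chunks, one per subjob."""
    return [table[i:i + chunk_size] for i in range(0, len(table), chunk_size)]


def factors_in_chunk(number: int, chunk: List[int]) -> List[int]:
    """One subjob: list the primes of this chunk that divide number, with multiplicity."""
    found = []
    for p in chunk:
        while number % p == 0:
            found.append(p)
            number //= p
    return found


if __name__ == "__main__":
    number = 1386  # 2 * 3 * 3 * 7 * 11
    chunks = split_table(prime_table(50), chunk_size=5)
    # "Merge" step: concatenate the factors found by each subjob.
    factors = [p for chunk in chunks for p in factors_in_chunk(number, chunk)]
    print(factors)
```

Each chunk is processed from the original number, so the subjobs do not depend on each other's results and could run in parallel on the Grid.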

Part III: More about Ganga

Integration with frameworks

Job statistics on 2580 grid jobs from Ganga

Web portal for biologists

• Interface created by biologists (Model-View-Controller design pattern)
– The Model makes use of Ganga as a submission tool and DIANE to better handle docking jobs on the Grid
– The Controller organizes a set of actions to perform the virtual screening pipeline
– The View represents biological aspects

GANGA Activities

• Main Users

• Other activities

HARP, Garfield

More than job submission: Monitoring & Accounting

submission tool

More than 500 unique Users

• ~550 different users, ~100 users weekly
• the monitoring started at the end of 2006
• ~60% ATLAS, ~25% LHCb, ~15% others

[Figure: usage plot; annotation: Easter]

Ganga usage

over 50 local sites

CLIP and scripts most popular

More info.

• Ganga Home: http://cern.ch/ganga

• Official Ganga User’s Guide: http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/

• Tutorial for ATLAS data analysis using Ganga: https://twiki.cern.ch/twiki/bin/view/Atlas/DistributedAnalysisUsingGanga

• Looking for help:
– ATLAS user support: [email protected]
– direct support from developers: [email protected]

Working with Ganga …

• Your applications developed in testbed environments can be smoothly migrated to production environments

• Your jobs are managed in a systematic way

• Your grid jobs benefit from a hidden job wrapper instrumented for
– advanced input/output control
– runtime progress monitoring

• New technologies will be transparent for you without changing your way of running applications