Ganga Status Update Will Reece

Upload: lindsay-stevens

Post on 14-Dec-2015

TRANSCRIPT

Page 1:

Ganga Status Update

Will Reece, Imperial College London

Page 2:

Outline

• User Statistics

• User Experiences

• New Features in 4.3.0

• Upcoming Features

• Reference Manual

• Testing Tools

• Summary

Page 3:

User Statistics

• 557 Unique Users Since Jan 1, ~110 per Week

• 113 LHCb Users, ~25 Unique per Week

http://gangamon.cern.ch:8888/

25 Users

Page 4:

User Experiences

• Feedback from Active LHCb Users
– Helps prioritize features

• Tells us what Needs Improvement…
– …and what is already good!

• Mailing Lists Good Source

• Will Look at Some Case Studies

Page 5:

Robert Lambert

• Used Gauss to Generate 70m Events
– Studying final state asymmetries → custom decay
– Needed 10⁻³ precision across 10 pT bins

• Compared Custom Decay with DC06

• Used Ganga and DIRAC → ~4000 Jobs
– 2 Years of CPU Time!

• Very Happy with DIRAC Success Rate
• Ganga Front-end – “Really Easy!”
• Likes SplitByFiles (but Replica Issues)
• Wants Merge of Subjobs

Page 6:

Eduardo Rodrigues

• Toy MC Used for Sensitivity Studies
– BsDs, BsDsK channels
– Needed large data set → Used Ganga and LCG

• Uses ROOT and RooFit → Root App
– Ran ~3000 toy experiments
– Each experiment takes 2-3 hours → 1 year CPU!
– Had some problems with LCG → Planning to use DIRAC

• Using PyROOT for e.g. Simplified Studies
– Root App and LCG Backend with standard python modules

Has had good experience both with LSF and Grid

Page 7:

Mitesh Patel

• Uses Ganga to Study Small Backgrounds

• B± (D0/D0)(K,KK,)K± (LHCB-2006-066)
– Looking at suppressed (10⁻⁷) decays to measure

• Bd K*as New Physics Probe (LHCB-2007-038)
– Uses full sample b, b and bc to ntuple

• Likes Splitters but Would Like More Warnings
• Has Submitted 1000s of Jobs
• Benefited from Developer Support
• More Examples Would be Nice

Page 8:

New in 4.3.0

• GNU GPL License

• Sun Grid Engine Support

• Core Updates
– Oracle backend for remote repository
– Subjob access to job repository optimized

• DIRAC Support for Root Application

• PyROOT
– Run python jobs using the ROOT libraries

• Gaudi Updates: ROOT Map files

• Many Bugfixes → Improved Stability!
– Testing framework

http://ganga.web.cern.ch/ganga/release/4.3.0/

Page 9:

Ganga Goes GPL

• 4.3.0 is First GPL Release
– Aim is to protect project

• Applies to Future Releases

• Ganga Used Commercially
– Clear license needed

http://www.gnu.org/licenses/gpl.html

Page 10:

SGE Backend Now Supported

• Sun Grid Engine Support Added
– Common batch system

• Can Use Following Applications
– Executable
– Root
– Any Gaudi

Page 11:

DIRAC Submission for ROOT

• Submit Jobs Using ROOT to DIRAC
– Uses new functionality in DIRAC v2r13

• DIRAC Recommended for Remote ROOT Jobs
– Improved reliability
– Superior job debugging info
– Excellent job monitoring

DIRAC is LHCb Standard for Distributed Analysis

Page 12:

PyROOT Support

• ROOT Provides Python Bindings
– Python is quick and easy to write → Productive!

• Ganga Now Supports Use in Root App

• Need Correct Python Version for ROOT
– Determined Automatically

• LHCb Configuration: uses LCG versions
– /afs/cern.ch/sw/lcg/external/
– Can be controlled in .gangarc file


Page 14:

PyROOT Support

• Root Documentation Updated
– help(Root) in Ganga

Page 15:

Gaudi Updates – ROOT Map

• ROOT Map used to Auto-load Libraries
– Found via CMT

• Now Preparing for 4.3.x
– Expect new LHCb Functionality in 4.3.2

Page 16:

Upcoming Features

• Framework for Job Merging
– Merge text and ROOT files

• Job Slices

• LFC Aware Splitter for Gaudi
– Caching for Datasets

• Summary Printing of Objects

• Improved Credential Management

Features planned for 4.3.x or 4.4:

https://twiki.cern.ch/twiki/bin/view/ArdaGrid/GangaIndex#GangaFourFour

Page 17:

Merging of Jobs and Subjobs

• Jobs may have Many Subjobs

• Hand Merge?
– Time Consuming and Error Prone → Automate

• Merge Subjobs
– Combines subjob output

• Can Run on Master Job Completion…

• …or from Command Line

• Merging Text and ROOT Files Supported
– What else is needed?

• Can Merge Lists of Jobs

Page 18:

Automatic Merge

• Attach Merge Object to Job
– Merge run on completion

Page 19:

Command Line Merge

• Create List of Jobs to Merge
– Will recursively merge subjobs

• Run Merge on Command Line

• Support Job Slices in Ganga 4.4
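The recursive behaviour mentioned on this slide can be pictured with a small stand-alone sketch. It does not use Ganga itself: `SimpleJob` and `collect_outputs` are hypothetical names invented for illustration, and the sketch assumes each job simply holds a list of output file paths and a list of subjobs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleJob:
    """Hypothetical stand-in for a job: its own output files plus subjobs."""
    outputs: List[str] = field(default_factory=list)
    subjobs: List["SimpleJob"] = field(default_factory=list)


def collect_outputs(jobs: List[SimpleJob], name: str) -> List[str]:
    """Walk a list of jobs and all nested subjobs, gathering files called `name`."""
    found = []
    for job in jobs:
        found.extend(path for path in job.outputs if path.endswith("/" + name))
        found.extend(collect_outputs(job.subjobs, name))  # recurse into subjobs
    return found


# A master job with two subjobs, plus a single job without subjobs.
master = SimpleJob(subjobs=[
    SimpleJob(outputs=["/out/1/0/stdout", "/out/1/0/hist.root"]),
    SimpleJob(outputs=["/out/1/1/stdout", "/out/1/1/hist.root"]),
])
single = SimpleJob(outputs=["/out/2/stdout"])
to_merge = collect_outputs([master, single], "stdout")
```

The resulting list is what a merge step would then combine; how Ganga actually stores outputs and subjobs is not shown on the slides.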

Page 20:

Types of Merge

• TextMerger – Concatenate Text
– Unordered, but adds headers

• RootMerger – Combines ROOT Files
– Uses hadd → Adds histograms and trees

• MultipleMerger – Chain Merge Objects

• SmartMerger – Merge by Extension
– Associations in .gangarc file
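As a rough illustration of the two simplest ideas on this slide, the sketch below concatenates text files with a header per file and picks a merge function from the file extension. It is not Ganga's implementation: `merge_text`, `smart_merge` and the `MERGERS_BY_EXTENSION` table are hypothetical, the real associations are said to live in the .gangarc file (whose format is not shown here), and ROOT merging with hadd is left out because it needs an external tool.

```python
import os
import tempfile


def merge_text(paths, out_path):
    """Concatenate text files, writing a header line before each one."""
    with open(out_path, "w") as out:
        for path in paths:
            out.write("# ---- %s ----\n" % path)
            with open(path) as src:
                out.write(src.read())


# Hypothetical extension-to-merger table, standing in for configuration.
MERGERS_BY_EXTENSION = {".txt": merge_text, ".log": merge_text}


def smart_merge(paths, out_path):
    """Pick a merge function from the file extension of the output name."""
    ext = os.path.splitext(out_path)[1]
    try:
        merger = MERGERS_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError("no merger registered for %r" % ext)
    merger(paths, out_path)


# Create two small input files in a temporary directory and merge them.
workdir = tempfile.mkdtemp()
inputs = []
for i, text in enumerate(["first subjob\n", "second subjob\n"]):
    path = os.path.join(workdir, "sub%d.txt" % i)
    with open(path, "w") as handle:
        handle.write(text)
    inputs.append(path)

merged_path = os.path.join(workdir, "merged.txt")
smart_merge(inputs, merged_path)
with open(merged_path) as handle:
    merged = handle.read()
```

Chaining several such functions, in the spirit of the MultipleMerger bullet, would amount to calling each one in turn on its own set of files.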

Page 21:

Job Slices

• Change Semantics of jobs Object
– Support slices jobs[-1], jobs[0:5]
– Index by Job ID → use __call__, e.g. jobs(45)

• Allow Job Operations on Slices
– copy, fail, kill, peek, remove, resubmit, submit

• Job Subjobs also a Job Slice

• Can Create Job Slice with select
– select(time='yesterday')
– select(status='failed')

https://twiki.cern.ch/twiki/bin/view/ArdaGrid/GangaJobIndexingSlices
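The indexing semantics described on this slide map directly onto Python's `__getitem__` and `__call__` hooks. The following is a minimal sketch under that assumption, not Ganga code: `JobRecord` and `JobSlice` are hypothetical classes, and `select` here only matches attributes exactly (a selection such as time='yesterday' is not modelled).

```python
class JobRecord:
    """Hypothetical minimal job: just an ID and a status string."""
    def __init__(self, job_id, status):
        self.id = job_id
        self.status = status


class JobSlice:
    """Container indexed by position with [] and by job ID with ()."""
    def __init__(self, jobs):
        self._jobs = list(jobs)

    def __len__(self):
        return len(self._jobs)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return JobSlice(self._jobs[index])   # slicing yields another slice
        return self._jobs[index]                 # positional access, e.g. [-1]

    def __call__(self, job_id):
        for job in self._jobs:                   # lookup by ID, e.g. jobs(45)
            if job.id == job_id:
                return job
        raise KeyError("no job with id %d" % job_id)

    def select(self, **criteria):
        """Return a new slice of jobs whose attributes equal all criteria."""
        return JobSlice(j for j in self._jobs
                        if all(getattr(j, k) == v for k, v in criteria.items()))

    def ids(self):
        """List the job IDs held in this slice, in order."""
        return [j.id for j in self._jobs]


jobs = JobSlice([JobRecord(40, "completed"), JobRecord(45, "failed"),
                 JobRecord(46, "failed"), JobRecord(50, "running")])
last_job = jobs[-1]
first_two = jobs[0:2]
failed = jobs.select(status="failed")
```

Because `select` and slicing both return another `JobSlice`, bulk operations such as resubmit could be defined once on the slice and applied to any selection.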

Page 22:

LFC Aware Splitter for Gaudi

• Gaudi Provides SplitByFiles
– Splits job into subjobs with subset of data files

• Data Files not Available at all Sites
– Some subjobs are unrunnable

• DIRAC v2r14 Allows Query of LFC
– Sort files by location → optimal splitting

• New DiracSplitter
– Splits files by file locations. Must use LFNs
– Protects against mistyped file names → Error
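One way to picture splitting by file location is to group files that share the same set of sites and only then cut each group into subjobs. The sketch below does exactly that; it is an illustrative assumption about the approach rather than the DiracSplitter algorithm, and `split_by_location`, the LFNs and the site names are all made up.

```python
from collections import defaultdict


def split_by_location(replicas, files_per_job):
    """Group LFNs that share the same set of sites, then chunk each group.

    `replicas` maps each LFN to the sites holding a copy. LFNs with no
    known replica are reported separately instead of producing a subjob.
    """
    groups = defaultdict(list)
    missing = []
    for lfn, sites in sorted(replicas.items()):
        if not sites:
            missing.append(lfn)          # e.g. a mistyped file name
        else:
            groups[frozenset(sites)].append(lfn)

    subjobs = []
    for sites, lfns in sorted(groups.items(), key=lambda item: sorted(item[0])):
        for start in range(0, len(lfns), files_per_job):
            subjobs.append((sorted(sites), lfns[start:start + files_per_job]))
    return subjobs, missing


# Made-up LFNs and site names, purely for illustration.
replicas = {
    "/lhcb/demo/a.dst": ["CERN", "RAL"],
    "/lhcb/demo/b.dst": ["CERN", "RAL"],
    "/lhcb/demo/c.dst": ["CERN", "RAL"],
    "/lhcb/demo/d.dst": ["CNAF"],
    "/lhcb/demo/typo.dst": [],
}
subjobs, missing = split_by_location(replicas, files_per_job=2)
```

Every subjob produced this way has at least one site holding all of its files, and files without replicas surface as an explicit list rather than as an unrunnable subjob.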

Page 23:

Performance of LFC Replica Query

• Last SW Week
– DIRAC v2r13: LFC Query Slow
– ~0.5s per file → 5 min for 600 files

• DIRAC v2r14: Bulk Query
– Much Improved Performance
– Factor 10 times faster
– 30s for 600 files

• Thanks to DIRAC Team!

[Plot legend: DIRAC v2r13 Single Query; DIRAC v2r14 Multiple Query]
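The quoted figures are consistent with each other, as a few lines of arithmetic show. The variable names are only for illustration; the numbers are the ones given on the slide.

```python
FILES = 600
per_file_seconds = 0.5          # DIRAC v2r13, one query per file
bulk_total_seconds = 30.0       # DIRAC v2r14, bulk query, as quoted

single_total_seconds = FILES * per_file_seconds      # total for v2r13
single_total_minutes = single_total_seconds / 60     # the "5 min" figure
speed_up = single_total_seconds / bulk_total_seconds # the "factor 10"
```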

Page 24:

Performance of LFC Replica Query

• Further Speed Up Needed?
– Multithreaded query worse
– Limited by LFC
– Queue system used?

• Use Replica Caching
– Cache stored per file
– Cache date stored

• Users Query with Dataset
– updateReplicaCache()

• DiracSplitter Still Slow
– Will print time estimate at start

[Plot: 1397 Unique Files Queried; error bars show σ of 5 measurements]
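A per-file cache that also records when each entry was stored could look roughly like the sketch below. This is an assumption about the general idea only: `ReplicaCache`, its `update` method and the one-hour lifetime are hypothetical and are not the behaviour of Ganga's updateReplicaCache().

```python
import time


class ReplicaCache:
    """Hypothetical per-file cache that remembers when each entry was stored."""

    def __init__(self, lookup, max_age_seconds, clock=time.time):
        self._lookup = lookup            # function: list of LFNs -> {lfn: sites}
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}               # lfn -> (sites, time stored)

    def update(self, lfns):
        """Query only the files that are missing or stale, in one bulk call."""
        now = self._clock()
        stale = [lfn for lfn in lfns
                 if lfn not in self._entries
                 or now - self._entries[lfn][1] > self._max_age]
        if stale:
            for lfn, sites in self._lookup(stale).items():
                self._entries[lfn] = (sites, now)
        return {lfn: self._entries[lfn][0] for lfn in lfns if lfn in self._entries}


calls = []


def fake_lookup(lfns):
    """Stand-in for a catalogue query; records what was asked for."""
    calls.append(list(lfns))
    return {lfn: ["CERN"] for lfn in lfns}


now = [1000.0]                     # controllable clock for the demonstration
cache = ReplicaCache(fake_lookup, max_age_seconds=3600, clock=lambda: now[0])
cache.update(["a.dst", "b.dst"])   # both files are queried
cache.update(["a.dst", "c.dst"])   # only c.dst is new
now[0] += 7200                     # two hours later the entries are stale
result = cache.update(["a.dst"])   # a.dst is queried again
```

Storing the date with each entry is what makes it possible to refresh only the stale files rather than the whole dataset.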

Page 25:

Printing Summary of Objects

• Printing Verbose
– E.g. Job object with many subjobs

• Summary as Default
– Lists show length
– Objects define own summary

• Get Full Print
– full_print(j)
– Same on object attributes
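The split between a short default summary and an explicit full print can be sketched in plain Python by overriding `__repr__`. The classes and the `full_print_text` helper below are hypothetical and only mimic the behaviour described; they are not Ganga's `full_print`.

```python
class SummaryList(list):
    """List whose default repr shows only its length."""
    def __repr__(self):
        return "<list of %d items>" % len(self)


class DemoJob:
    """Hypothetical object that defines its own short summary."""
    def __init__(self, job_id, subjobs=()):
        self.id = job_id
        self.subjobs = SummaryList(subjobs)

    def __repr__(self):
        return "DemoJob(id=%d, subjobs=%r)" % (self.id, self.subjobs)


def full_print_text(obj, indent=0):
    """Return the verbose form: every subjob is expanded recursively."""
    pad = " " * indent
    lines = ["%sDemoJob id=%d" % (pad, obj.id)]
    for sub in obj.subjobs:
        lines.append(full_print_text(sub, indent + 2))
    return "\n".join(lines)


j = DemoJob(7, [DemoJob(0), DemoJob(1), DemoJob(2)])
summary = repr(j)              # short form, list collapsed to its length
verbose = full_print_text(j)   # long form, one line per subjob
```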


Page 27:

Improved Credential Management

• Ganga Manages Credentials That Expire
– AFS Token, Grid Proxy

• Expiring Tokens Affect Ganga Session

• Ganga May Not Clean-Up Services on Exit

• Introducing InternalService Objects
– Ensures correct clean-up
– Services not used when expired

• Alert Users Before Credentials Expire

• Ganga Shuts Down Gracefully
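The policy on this slide (warn before expiry, stop using services once a credential has expired) can be expressed in a few lines. This is a sketch under stated assumptions: `Credential`, `check_credentials`, `services_allowed` and the one-hour warning threshold are hypothetical and do not describe the InternalService objects themselves.

```python
class Credential:
    """Hypothetical credential with a fixed expiry time (seconds since epoch)."""
    def __init__(self, name, expires_at):
        self.name = name
        self.expires_at = expires_at

    def time_left(self, now):
        """Seconds of validity remaining, never negative."""
        return max(0.0, self.expires_at - now)


def check_credentials(credentials, now, warn_below=3600):
    """Sort credentials into usable, expiring-soon and expired."""
    report = {"ok": [], "warn": [], "expired": []}
    for cred in credentials:
        left = cred.time_left(now)
        if left == 0:
            report["expired"].append(cred.name)
        elif left < warn_below:
            report["warn"].append(cred.name)
        else:
            report["ok"].append(cred.name)
    return report


def services_allowed(report):
    """Services that need credentials should pause once anything has expired."""
    return not report["expired"]


now = 10000.0
creds = [Credential("AFS token", now + 600), Credential("Grid proxy", now + 40000)]
report = check_credentials(creds, now)         # AFS token is about to expire
later = check_credentials(creds, now + 700)    # AFS token has expired by now
```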

Page 28:

Upcoming Feature – Remote Workspaces

• Roaming Ganga Profile

• Store Workspace Remotely
– Access input and output files anywhere
– Work across multiple machines

• Local Cache Created on Demand

• Currently at Prototyping Stage
– Exciting new functionality!

• Release Schedule is Uncertain

Page 29:

The Ganga Reference Manual

• Aim is to Show Ganga Help Online
– Same information as help in Ganga

• Documentation Generated from Source

• Have Prototype Online
– Missing documentation to be filled in → on-going!

• Manual will be Generated with Release

• Feedback on Documentation Appreciated
– Let us know if anything is not clear

http://ganga.web.cern.ch/ganga/user/GPI/


Page 31:

Testing Tools

• Use Test Framework
– Based on unittest

• Reports with Release

• Helps Find Bugs!

• Now Collect Coverage
– Use Figleaf Library
– Should improve testing
– Identifies untested code
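For readers unfamiliar with unittest, a test case in that style looks like the sketch below. The `chunk` function and `ChunkTest` class are hypothetical examples, not part of the Ganga test suite, and coverage collection is not shown.

```python
import unittest


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


class ChunkTest(unittest.TestCase):
    def test_even_split(self):
        self.assertEqual(chunk([1, 2, 3, 4], 2), [[1, 2], [3, 4]])

    def test_remainder(self):
        self.assertEqual(chunk([1, 2, 3], 2), [[1, 2], [3]])

    def test_bad_size(self):
        with self.assertRaises(ValueError):
            chunk([1], 0)


# Load and run the test case programmatically, keeping the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ChunkTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The `result` object records how many tests ran and which failed, which is the kind of information a per-release test report can be built from.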


Page 33:

The LHCb Distributed Analysis Mailing List

• Replaces Current List for LHCb Users
– [email protected]

• [email protected]
– Can sign up at http://simba2.cern.ch

• Encourages User Community
– Less support burden for developers!

https://mmm.cern.ch/public/archive-list/l/lhcb-distributed-analysis/

Page 34:

Summary

• User Statistics: 557 Unique Users in ’07

• Ganga is the de facto Grid front-end tool for LHCb

• Ganga has New Features in 4.3.0
– DIRAC Handler for Root, PyROOT Support, etc.

• Interesting Features Upcoming
– Merge framework, DiracSplitter

• Reference Manual Coming Soon

http://ganga.web.cern.ch/ganga/