Ganga Status Update
Will Reece - Imperial College London
Outline
• User Statistics
• User Experiences
• New Features in 4.3.0
• Upcoming Features
• Reference Manual
• Testing Tools
• Summary
User Statistics
• 557 Unique Users Since Jan 1, ~110 per Week
• 113 LHCb Users, ~25 Unique per Week
http://gangamon.cern.ch:8888/
User Experiences
• Feedback from Active LHCb Users
– Helps prioritize features
• Tells Us What Needs Improvement…
– …and what is already good!
• Mailing Lists Are a Good Source
• Will Look at Some Case Studies
Robert Lambert
• Used Gauss to Generate 70M Events
– Studying final-state asymmetries in a custom decay
– Needed 10⁻³ precision across 10 pT bins
• Compared Custom Decay with DC06
• Used Ganga and DIRAC: ~4000 Jobs
– 2 Years of CPU Time!
Very Happy with DIRAC Success Rate
Ganga Front-end: “Really Easy!”
Likes SplitByFiles (but Replica Issues)
Wants Merge of Subjobs
Eduardo Rodrigues
• Toy MC Used for Sensitivity Studies
– Bs → Dsπ, Bs → DsK channels
– Needed a large data set, so used Ganga and LCG
• Uses ROOT and RooFit via Root App
– Ran ~3000 toy experiments
– Each experiment takes 2-3 hours: ~1 year of CPU!
– Had some problems with LCG, so planning to use DIRAC
• Using PyROOT for, e.g., Simplified Studies
– Root App and LCG Backend with standard Python modules
Has had good experiences with both LSF and the Grid
Mitesh Patel
• Uses Ganga to Study Small Backgrounds
• B± → (D0/D̄0)(Kπ, KK, ππ)K± (LHCB-2006-066)
– Looking at suppressed (10⁻⁷) decays to measure γ
• Bd → K*μμ as New Physics Probe (LHCB-2007-038)
– Uses full sample b, b and bc to ntuple
Likes Splitters but Would Like More Warnings
Has Submitted 1000s of Jobs
Benefited from Developer Support
More Examples Would Be Nice
New in 4.3.0
• GNU GPL License
• Sun Grid Engine Support
• Core Updates
– Oracle backend for remote repository
– Subjob access to job repository optimized
• DIRAC Support for Root Application
• PyROOT
– Run Python jobs using the ROOT libraries
• Gaudi Updates: ROOT Map files
• Many Bugfixes: Improved Stability!
– Testing framework
http://ganga.web.cern.ch/ganga/release/4.3.0/
Ganga Goes GPL
• 4.3.0 is First GPL Release
– Aim is to protect the project
• Applies to Future Releases
• Ganga Used Commercially
– Clear license needed
http://www.gnu.org/licenses/gpl.html
SGE Backend Now Supported
• Sun Grid Engine Support Added
– Common batch system
• Can Use Following Applications
– Executable
– Root
– Any Gaudi
DIRAC Submission for ROOT
• Submit Jobs Using ROOT to DIRAC
– Uses new functionality in DIRAC v2r13
• DIRAC Recommended for Remote ROOT Jobs
– Improved reliability
– Superior job debugging info
– Excellent job monitoring
DIRAC is LHCb Standard for Distributed Analysis
PyROOT Support
• ROOT Provides Python Bindings
– Python is quick and easy to write: productive!
• Ganga Now Supports PyROOT in the Root Application
• Need Correct Python Version for ROOT
– Determined automatically
• LHCb Configuration: uses LCG versions
– /afs/cern.ch/sw/lcg/external/
– Can be controlled in .gangarc file
PyROOT Support
• Root Documentation Updated
– help(Root) in Ganga
Gaudi Updates – ROOT Map
• ROOT Map Used to Auto-load Libraries
– Found via CMT
• Now Preparing for 4.3.x
– Expect new LHCb functionality in 4.3.2
Upcoming Features
• Framework for Job Merging
– Merge text and ROOT files
• Job Slices
• LFC Aware Splitter for Gaudi
– Caching for Datasets
• Summary Printing of Objects
• Improved Credential Management
Features planned for 4.3.x or 4.4:
https://twiki.cern.ch/twiki/bin/view/ArdaGrid/GangaIndex#GangaFourFour
Merging of Jobs and Subjobs
• Jobs May Have Many Subjobs
• Hand Merge?
– Time consuming and error prone, so automate
• Merge Subjobs
– Combines subjob output
• Can Run on Master Job Completion…
• …or from Command Line
• Merging Text and ROOT Files Supported
– What else is needed?
• Can Merge Lists of Jobs
Automatic Merge
• Attach Merge Object to Job
– Merge run on completion
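A minimal sketch of the "attach a merger, run it on completion" idea, assuming hypothetical `Job` and `SubJob` classes and treating a merger as any callable that takes the list of subjob outputs. This is an illustration only, not Ganga's implementation or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-ins, not Ganga's real Job or merger classes.
Merger = Callable[[List[str]], str]


@dataclass
class SubJob:
    output: str                      # text this subjob produced
    status: str = "submitted"


@dataclass
class Job:
    subjobs: List[SubJob] = field(default_factory=list)
    merger: Optional[Merger] = None  # the attached merge object
    merged_output: Optional[str] = None

    @property
    def status(self) -> str:
        # The master job counts as completed only when every subjob has.
        if self.subjobs and all(sj.status == "completed" for sj in self.subjobs):
            return "completed"
        return "running"

    def set_subjob_status(self, index: int, status: str) -> None:
        self.subjobs[index].status = status
        # On master-job completion, run the attached merger exactly once.
        if self.status == "completed" and self.merger and self.merged_output is None:
            self.merged_output = self.merger([sj.output for sj in self.subjobs])
```

For example, `Job(subjobs=[SubJob("a\n"), SubJob("b\n")], merger="".join)` produces `"a\nb\n"` once both subjobs are marked completed, and nothing before that.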
Command Line Merge
• Create List of Jobs to Merge
– Will recursively merge subjobs
• Run Merge on Command Line
• Support Job Slices in Ganga 4.4
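The recursive step of a command-line merge can be sketched as a walk over the job tree. The `Job` dataclass and `collect_outputs` helper below are hypothetical illustrations, not Ganga classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """Hypothetical job node: either has subjobs or produced one output file."""
    id: int
    output_file: Optional[str] = None
    subjobs: List["Job"] = field(default_factory=list)


def collect_outputs(jobs: List[Job]) -> List[str]:
    """Recursively gather leaf output files from a list of jobs and their subjobs."""
    files: List[str] = []
    for job in jobs:
        if job.subjobs:
            files.extend(collect_outputs(job.subjobs))  # descend into the subjobs
        elif job.output_file is not None:
            files.append(job.output_file)
    return files
```

The flattened file list would then be handed to a merger such as TextMerger or RootMerger.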
Types of Merge
• TextMerger: Concatenates Text
– Unordered, but adds headers
• RootMerger: Combines ROOT Files
– Uses hadd: adds histograms and trees
• MultipleMerger: Chains Merge Objects
• SmartMerger: Merges by Extension
– Associations in .gangarc file
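A sketch of how a text merger and a SmartMerger-style dispatch by file extension could fit together. Assumptions, all hypothetical: the header format, the `[SmartMergerSketch]` config section (the real .gangarc option names are not given here), and every helper name. `root_merge` shells out to ROOT's `hadd`, which must be installed for that path to work.

```python
import configparser
import os
import subprocess
import tempfile
from typing import Callable, Dict, List

MergeFn = Callable[[List[str], str], None]


def text_merge(inputs: List[str], output: str) -> None:
    """Concatenate text files, writing a header line before each input."""
    with open(output, "w") as out:
        for path in inputs:  # order follows the input list in this sketch
            out.write(f"# merged from {path}\n")  # hypothetical header format
            with open(path) as f:
                out.write(f.read())


def root_merge(inputs: List[str], output: str) -> None:
    """Combine ROOT files with hadd (needs ROOT's hadd on PATH); -f overwrites output."""
    subprocess.run(["hadd", "-f", output, *inputs], check=True)


MERGERS: Dict[str, MergeFn] = {"TextMerger": text_merge, "RootMerger": root_merge}


def load_associations(ini_text: str) -> Dict[str, str]:
    """Read extension -> merger-name pairs from a hypothetical config section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["SmartMergerSketch"])


def smart_merge(inputs: List[str], output: str, associations: Dict[str, str]) -> str:
    """Pick a merger from the output file's extension, run it, and return its name."""
    ext = os.path.splitext(output)[1].lstrip(".").lower()
    name = associations.get(ext)
    if name is None:
        raise ValueError(f"no merger associated with extension '{ext}'")
    MERGERS[name](inputs, output)
    return name


if __name__ == "__main__":
    assoc = load_associations("[SmartMergerSketch]\ntxt = TextMerger\nroot = RootMerger\n")
    with tempfile.TemporaryDirectory() as d:
        parts = []
        for i, text in enumerate(["first\n", "second\n"]):
            parts.append(os.path.join(d, f"stdout.{i}.txt"))
            with open(parts[-1], "w") as f:
                f.write(text)
        print(smart_merge(parts, os.path.join(d, "merged.txt"), assoc))  # TextMerger
```

Keeping the extension-to-merger table in configuration, rather than in code, is what lets users add associations without touching the mergers themselves.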
Job Slices
• Change Semantics of jobs Object
– Support slices: jobs[-1], jobs[0:5]
– Index by job ID using __call__, e.g. jobs(45)
• Allow Job Operations on Slices
– copy, fail, kill, peek, remove, resubmit, submit
• A Job's Subjobs Are Also a Job Slice
• Can Create Job Slice with select
– select(time='yesterday')
– select(status='failed')
https://twiki.cern.ch/twiki/bin/view/ArdaGrid/GangaJobIndexingSlices
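The slice semantics above map naturally onto Python's `__getitem__` (by position) and `__call__` (by job ID). A minimal sketch with a hypothetical `JobSlice` class, not Ganga's code; `select` here only does exact attribute matching, so a time-based query such as `select(time='yesterday')` would need its own predicate.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass
class Job:
    """Hypothetical minimal job record."""
    id: int
    status: str = "new"

    def kill(self) -> None:
        self.status = "killed"


class JobSlice:
    """Hypothetical slice: [] indexes by position, () looks up by job ID."""

    def __init__(self, jobs: Iterable[Job]):
        self._jobs: List[Job] = list(jobs)

    def __len__(self) -> int:
        return len(self._jobs)

    def __getitem__(self, key: Union[int, slice]):
        # Positional access: jobs[-1] is the last job, jobs[0:5] a new slice.
        if isinstance(key, slice):
            return JobSlice(self._jobs[key])
        return self._jobs[key]

    def __call__(self, job_id: int) -> Job:
        # Lookup by job ID, e.g. jobs(45).
        for job in self._jobs:
            if job.id == job_id:
                return job
        raise KeyError(f"no job with id {job_id}")

    def select(self, **attrs) -> "JobSlice":
        # Keep jobs whose attributes equal every keyword, e.g. select(status='failed').
        return JobSlice(j for j in self._jobs
                        if all(getattr(j, k) == v for k, v in attrs.items()))

    def ids(self) -> List[int]:
        return [j.id for j in self._jobs]

    def kill(self) -> None:
        # Bulk operation: apply to every job in the slice.
        for job in self._jobs:
            job.kill()
```

Because `select` and slicing both return a `JobSlice`, bulk operations such as `jobs.select(status='failed').kill()` compose without extra code.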
LFC Aware Splitter for Gaudi
• Gaudi Provides SplitByFiles
– Splits job into subjobs, each with a subset of the data files
• Data Files Not Available at All Sites
– Some subjobs are unrunnable
• DIRAC v2r14 Allows Query of LFC
– Sort files by location for optimal splitting
• New DiracSplitter
– Splits files by file location; must use LFNs
– Protects against mistyped file names: they raise an error
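One simple way to split "by file location" is to group LFNs that share the same set of hosting sites and then cap the number of files per subjob. This is a sketch under that assumption, not DiracSplitter's actual algorithm. The `replicas` mapping stands in for the LFC query result, and `split_by_location` and `max_files` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Subjob = Tuple[Tuple[str, ...], List[str]]  # (sites holding the files, LFNs)


def split_by_location(lfns: List[str], replicas: Dict[str, List[str]],
                      max_files: int = 2) -> List[Subjob]:
    """Group LFNs by the set of sites holding them, then chunk each group."""
    missing = [lfn for lfn in lfns if lfn not in replicas]
    if missing:
        # An LFN unknown to the catalogue (e.g. a typo) is an error up front,
        # instead of producing a subjob that can never run.
        raise ValueError(f"LFNs not found in catalogue: {missing}")
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for lfn in lfns:
        groups[tuple(sorted(replicas[lfn]))].append(lfn)
    subjobs: List[Subjob] = []
    for sites, files in groups.items():
        for start in range(0, len(files), max_files):
            subjobs.append((sites, files[start:start + max_files]))
    return subjobs
```

Every subjob then carries a site list where all of its files are available, so it can be sent to one of those sites.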
Performance of LFC Replica Query
• Last SW Week
– DIRAC v2r13: LFC query slow
– ~0.5 s per file: 5 min for 600 files
• DIRAC v2r14: Bulk Query
– Much improved performance
– Factor of 10 faster
– 30 s for 600 files
• Thanks to DIRAC Team!
[Plot: LFC replica query time, DIRAC v2r13 single query vs. DIRAC v2r14 multiple query]
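The quoted timings are consistent with each other:

```latex
\begin{aligned}
t_{\text{v2r13}} &\approx 600 \times 0.5\,\text{s} = 300\,\text{s} = 5\,\text{min},\\
t_{\text{v2r14}} &\approx 30\,\text{s} \;\Rightarrow\; 30\,\text{s}/600 = 0.05\,\text{s per file},\\
\frac{t_{\text{v2r13}}}{t_{\text{v2r14}}} &\approx \frac{300\,\text{s}}{30\,\text{s}} = 10.
\end{aligned}
```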
Performance of LFC Replica Query
• Further Speed-up Needed?
– Multithreaded query worse
– Limited by LFC
– Queue system used?
• Use Replica Caching
– Cache stored per file
– Cache date stored
• Users Query with Dataset
– updateReplicaCache()
• DiracSplitter Still Slow
– Will print time estimate at start
[Plot: 1397 unique files queried; error bars show σ of 5 measurements]
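A sketch of per-file replica caching with a stored fetch date, assuming a hypothetical `bulk_query` callable in place of the DIRAC bulk LFC query. Only missing or stale entries are sent, in a single bulk call. `ReplicaCache` and its `update` method are illustrative analogues of `updateReplicaCache()`, not its implementation.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

BulkQuery = Callable[[List[str]], Dict[str, List[str]]]  # LFNs -> {lfn: sites}


class ReplicaCache:
    """Hypothetical per-file replica cache; each entry remembers when it was fetched."""

    def __init__(self, bulk_query: BulkQuery, max_age_s: float = 86400.0,
                 clock: Callable[[], float] = time.time):
        self._query = bulk_query
        self._max_age = max_age_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}  # lfn -> (fetched at, sites)

    def _is_fresh(self, lfn: str) -> bool:
        entry = self._entries.get(lfn)
        return entry is not None and self._clock() - entry[0] < self._max_age

    def update(self, lfns: Iterable[str]) -> int:
        """Refresh missing or stale entries with one bulk query; return how many were queried."""
        stale = [lfn for lfn in dict.fromkeys(lfns) if not self._is_fresh(lfn)]
        if stale:
            fetched_at = self._clock()
            for lfn, sites in self._query(stale).items():
                self._entries[lfn] = (fetched_at, sites)
        return len(stale)

    def sites(self, lfn: str) -> Optional[List[str]]:
        entry = self._entries.get(lfn)
        return None if entry is None else entry[1]
```

Injecting the clock keeps expiry testable. Re-running a split over an already cached dataset then costs no LFC queries until entries age out.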
Printing Summary of Objects
• Printing Is Verbose
– E.g. Job object with many subjobs
• Summary as Default
– Lists show length
– Objects define own summary
• Get Full Print
– full_print(j)
– Same on object attributes
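A minimal sketch of summary-by-default printing, assuming a hypothetical `Summarised` base class. `repr` shows only the length of list attributes, while `full_text`/`full_print` expand them. The names mimic the slide, but this is not Ganga's code.

```python
from typing import Any, List


class Summarised:
    """Hypothetical base class: the default repr summarises lists by length."""

    def __repr__(self) -> str:
        parts = []
        for name, value in vars(self).items():
            if isinstance(value, list):
                parts.append(f"{name}=<{len(value)} items>")  # summary: length only
            else:
                parts.append(f"{name}={value!r}")
        return f"{type(self).__name__}({', '.join(parts)})"


class Job(Summarised):
    """Hypothetical job with many subjobs."""

    def __init__(self, job_id: int, subjobs: List[Any]):
        self.id = job_id
        self.subjobs = subjobs


def full_text(obj: Any) -> str:
    """Expand every attribute, listing each list element on its own line."""
    lines = [f"{type(obj).__name__}:"]
    for name, value in vars(obj).items():
        if isinstance(value, list):
            lines.append(f"  {name} = [")
            lines.extend(f"    {item!r}," for item in value)
            lines.append("  ]")
        else:
            lines.append(f"  {name} = {value!r}")
    return "\n".join(lines)


def full_print(obj: Any) -> None:
    print(full_text(obj))
```

With this split, `Job(7, list(range(1000)))` prints as a single short line, and the full listing is only produced on request.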
Improved Credential Management
• Ganga Manages Credentials That Expire
– AFS Token, Grid Proxy
• Expiring Tokens Affect Ganga Session
• Ganga May Not Clean Up Services on Exit
• Introducing InternalService Objects
– Ensures correct clean-up
– Services not used when expired
• Alert Users Before Credentials Expire
• Ganga Shuts Down Gracefully
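A sketch of the InternalService idea under stated assumptions. A hypothetical `Credential` has an absolute expiry time, and a hypothetical service warns when fewer than `warn_before_s` seconds remain, shuts itself down on expiry, and refuses further work. The clock is injectable so the behaviour can be tested. None of these names are Ganga's API.

```python
import time
import warnings
from typing import Callable, TypeVar

T = TypeVar("T")


class Credential:
    """Hypothetical expiring credential such as an AFS token or Grid proxy."""

    def __init__(self, name: str, expires_at: float,
                 clock: Callable[[], float] = time.time):
        self.name = name
        self.expires_at = expires_at
        self._clock = clock

    def time_left(self) -> float:
        return max(0.0, self.expires_at - self._clock())


class InternalService:
    """Hypothetical service that warns before, and stops after, credential expiry."""

    def __init__(self, name: str, credential: Credential, warn_before_s: float = 600.0):
        self.name = name
        self.credential = credential
        self.warn_before_s = warn_before_s
        self.running = True

    def check(self) -> bool:
        left = self.credential.time_left()
        if left <= 0:
            self.shutdown()
            return False
        if left < self.warn_before_s:
            warnings.warn(f"{self.credential.name} expires in {left:.0f} s")
        return True

    def run(self, task: Callable[[], T]) -> T:
        # Never use the service with an expired credential.
        if not self.running or not self.check():
            raise RuntimeError(f"{self.name} disabled: {self.credential.name} expired")
        return task()

    def shutdown(self) -> None:
        # Idempotent clean-up, safe to call on expiry and again at exit.
        self.running = False
```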
Upcoming Feature – Remote Workspaces
• Roaming Ganga Profile
• Store Workspace Remotely
– Access input and output files anywhere
– Work across multiple machines
• Local Cache Created on Demand
• Currently at Prototyping Stage
– Exciting new functionality!
• Release Schedule is Uncertain
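Since the feature is still a prototype, the following only illustrates "local cache created on demand". A hypothetical `CachedWorkspace` copies a remote file into a local cache directory the first time that file is requested. The `fetch` callable is an assumption standing in for whatever remote transport a real implementation would use.

```python
import os
import shutil
import tempfile
from typing import Callable

Fetch = Callable[[str, str], None]  # (remote path, local path) -> copies the file


class CachedWorkspace:
    """Hypothetical workspace whose files live remotely and are cached locally on demand."""

    def __init__(self, fetch: Fetch, cache_dir: str):
        self._fetch = fetch
        self._cache_dir = cache_dir

    def local_path(self, remote_path: str) -> str:
        local = os.path.join(self._cache_dir, remote_path.lstrip("/"))
        if not os.path.exists(local):  # first access: populate the cache
            os.makedirs(os.path.dirname(local), exist_ok=True)
            self._fetch(remote_path, local)
        return local


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as remote, tempfile.TemporaryDirectory() as cache:
        with open(os.path.join(remote, "out.txt"), "w") as f:
            f.write("result\n")

        def copy_from_remote(src: str, dst: str) -> None:
            shutil.copy(os.path.join(remote, src.lstrip("/")), dst)

        ws = CachedWorkspace(copy_from_remote, cache)
        with open(ws.local_path("/out.txt")) as f:
            print(f.read(), end="")
```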
The Ganga Reference Manual
• Aim Is to Show Ganga Help Online
– Same information as help in Ganga
• Documentation Generated from Source
• Have Prototype Online
– Missing documentation being filled in (ongoing)
• Manual Will Be Generated with Each Release
• Feedback on Documentation Appreciated
– Let us know if anything is not clear
http://ganga.web.cern.ch/ganga/user/GPI/
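Generating reference text from the source can be sketched with Python's standard `inspect` module, which reads the same docstrings that `help()` shows. `render_reference` is a hypothetical helper, not the tool used to build the Ganga manual. Undocumented members get a placeholder, mirroring the missing documentation noted on the slide.

```python
import inspect
from typing import Iterable, Optional

PLACEHOLDER = "(no documentation yet)"


def _summary(obj: object) -> str:
    doc: Optional[str] = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else PLACEHOLDER


def render_reference(classes: Iterable[type]) -> str:
    """Build a plain-text reference page from class and public method docstrings."""
    lines = []
    for cls in classes:
        lines += [cls.__name__, "=" * len(cls.__name__), _summary(cls)]
        for name, func in inspect.getmembers(cls, inspect.isfunction):
            if name.startswith("_"):
                continue  # skip private and special methods
            lines.append(f"  {name}{inspect.signature(func)}")
            lines.append(f"      {_summary(func)}")
        lines.append("")
    return "\n".join(lines)
```

Because the page is derived from the docstrings, regenerating it at release time keeps the online manual and in-session `help()` in step.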
Testing Tools
• Use Test Framework
– Based on unittest
• Reports with Release
• Helps Find Bugs!
• Now Collect Coverage
– Use Figleaf library
– Should improve testing
– Identifies untested code
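A minimal example of the `unittest` style the framework is based on. The function under test, `merge_with_headers`, is hypothetical. Coverage collection (Figleaf in the talk) is not shown.

```python
import unittest
from typing import List


def merge_with_headers(parts: List[str]) -> str:
    """Hypothetical function under test: prefix each part with a numbered header."""
    return "".join(f"# part {i}\n{part}" for i, part in enumerate(parts))


class MergeWithHeadersTest(unittest.TestCase):
    def test_headers_are_added(self):
        self.assertEqual(merge_with_headers(["a\n", "b\n"]),
                         "# part 0\na\n# part 1\nb\n")

    def test_empty_input(self):
        self.assertEqual(merge_with_headers([]), "")


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MergeWithHeadersTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    unittest.main()
```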
The LHCb Distributed Analysis Mailing List
• Replaces Current List for LHCb Users – [email protected]
• [email protected]
– Can sign up at http://simba2.cern.ch
• Encourages User Community
– Less support burden for developers!
https://mmm.cern.ch/public/archive-list/l/lhcb-distributed-analysis/
Summary
• User Statistics: 557 Unique Users in ’07
• Ganga Is the de facto Grid Front-end Tool for LHCb
• Ganga Has New Features in 4.3.0
– DIRAC Handler for Root, PyROOT Support, etc.
• Interesting Features Upcoming
– Merge framework, DiracSplitter
• Reference Manual Coming Soon
http://ganga.web.cern.ch/ganga/