
ALICE T1/T2 workshop, 4-6 June 2013, CCIN2P3 Lyon

Famous last words


ALICE T1/T2 workshops
• Yearly event
  • 2011 – CERN
  • 2012 – KIT (Germany)
  • 2013 – CCIN2P3 (Lyon)
• Aims to bring together middleware and storage software developers, Grid operations and network experts, and site administrators involved in ALICE computing activities


Some stats of the Lyon workshop
• 46 registered participants, 45 attended
• Good attendance; clearly these venues are still popular and needed
• 24 presentations over 5 sessions
  • 9 general, on operations, software and procedures
  • 15 site-specific
• Appropriate number of coffee and lunch breaks and social events
• Ample time for questions (numerous) and discussion (lively), in true workshop style


Themes
• Operations summary
• WLCG middleware/services
• Monitoring
• Networking: LHCONE and IPv6
• Storage: xrootd v4 and EOS
• CVMFS and AliRoot
• Site operations, upgrades, (new) projects and gripes (actually none…)


Messages digest from the presentations
• The original slides are available on the workshop Indico page
• Operations
  • Successful year for ALICE and Grid operations: smooth and generally problem-free; incident handling is mature and fast
  • No changes foreseen to the operations principles and communication channels
  • 2013/2014 (LHC LS1) will be years of data reprocessing and infrastructure upgrades
  • The focus is on analysis: how to make it more efficient


Messages (2)
• WLCG middleware
  • CVMFS is installed on many sites; leverage the ALICE deployment and tuning through the existing task force (TF)
  • The WLCG VO-box is available and everyone should update
  • All EMI-3 products can be used
  • SHA-2 is on the horizon; services must be made compatible
  • glExec – hey, it is still alive!
• Agile Infrastructure – IaaS, SaaS (for now)
  • OpenStack (Cinder, Keystone, Nova, Horizon, Glance)
  • Management through Puppet (Foreman, MPM, PuppetDB, Hiera, git) … and Facter
  • Storage with Ceph
  • All of the above: prototyping and tests, ramping up


Messages (3)
• Site dashboard
  • http://alimonitor.cern.ch/siteinfo/issues.jsp
  • Go to the above link and start fixing if your site is on the list
• LHCONE
  • The figure speaks for itself
  • All T2s should get involved
  • Instructions and expert lists are in the presentation


Messages (4)
• IPv6 and ALICE
  • IPv4 address space is almost depleted; IPv6 is being deployed (CERN and 3 ALICE sites already)
  • Not all services are IPv6-ready; testing and adjustments are needed
  • Cool history of the network bandwidth evolution
• Xrootd 4.0.0
  • Complete client rewrite: new caching, non-blocking requests (client callbacks), new user classes for metadata and data operations, IPv6-ready
  • Impressive speedup for large operations
  • API redesigned, no backward compatibility; some CLI commands change names
  • ROOT plugin ready and being tested
  • Mid-July release target


Messages (5)
• EOS
  • Main disk storage manager at CERN: 45 PB deployed, 32 PB used (9.9/8/3 ALICE)
  • Designed to work with cheap storage servers; uses software RAID (RAIN), with a ppm-level probability of file loss
  • Impressive array of control and service tools (built with operations in mind)
  • Even more impressive benchmarks…
  • Site installation: read the pros and cons carefully to decide if it is right for you
  • Support: best effort, xrootd-style


Messages (6)
• ALICE production and analysis software
  • AliRoot is "one software to rule them all" in ALICE offline
  • >150 developers; analysis ~1M SLOC; reconstruction, simulation, calibration, alignment and visualization ~1.4M SLOC; supported on many platforms and flavors
  • In development since 1998
  • Sophisticated MC framework with embedded physics generators, using G3 and G4
  • Incorporates the full calibration code, which is also run online and in the HLT (code sharing)
  • Fully encapsulates the analysis; a lot of work is going into improving it, and more quality and control checks are needed
  • Efforts to reduce memory consumption in reconstruction
  • G4 and Fluka in MC


Messages (7)
• CVMFS – timeline and procedures
  • Mature, scalable and supported product
  • Used by all other LHC experiments (and beyond)
  • Based on the proven CernVM family
  • Enabling technology for clouds, CernVM as a user interface, Virtual Analysis Facilities, opportunistic resources and volunteer computing, and part of Long Term Data Preservation
  • April 2014: CVMFS on all sites, the only method of software distribution for ALICE


Sites Messages (1)
• UK
  • GridPP (T1+19); RAL, Oxford and Birmingham serve ALICE
  • Smooth operation; ALICE can (and does) run beyond its pledge; occasional problems with job memory
  • Small-scale cloud test
• RMKI_KFKI
  • Shared CMS/ALICE (170 cores, 72 TB disk)
  • Good resource delivery
  • Fast turnaround of experts; good documentation on operations is a must (done)


Sites Messages (2)
• KISTI
  • Extended support team of 8 people
  • Tape system tested with RAW data from CERN
  • Network still to be debugged, but not a showstopper
  • CPU to be ramped up x2 in 2013
  • Well on its way to being the first T1 since the big T1 bang
• NDGF
  • Lose some (PDC), get some more cores (CSC)
  • Smooth going; dCache will stay and will get location information to improve efficiency
  • The 0.0009 (reported, not real) efficiency at DCSC/KU is still a mystery; it hurts NDGF as a whole and must be fixed


Sites Messages (3)
• Italy
  • New head honcho – Domenico Elia (thanks, Massimo!)
  • Funding is tough; National Research Projects help a lot with manpower, and PON helps with hardware in the south
  • 6 T2s and a T1: smooth delivery and generally no issues
  • Torino is a hotbed of new technology – clouds (OpenNebula, GlusterFS, OpenWRT)
  • TAF is open for business, completely virtual (surprise!)
• Prague
  • The city is (partially) under water
  • Currently 3.7cores and 2 PB disk, shared LHC/D0; contributes ~1.5% of the Grid resources of ALICE+ATLAS
  • Stable operation, distributed storage
  • Funding situation is degrading


Sites Messages (4)
• US
  • LLNL+LBL resource purchasing is complementary and fits well to cover changing requirements
  • CPU pledges fulfilled; SE a bit underused, but usage is on the rise
  • Infestation of 'zombie grass' jobs; this is California, so something of this sort was to be expected…
  • Possibility of tape storage at LBL (potential T1)
• France
  • 8 T2s and 1 T1, providing 10% of WLCG power, steady operation
  • Emphasis on common solutions for services and support
  • All centres are in LHCONE (7 PB in + 7 PB out have already passed through it)
  • Flat resource provisioning for the next 4 years


Sites Messages (5)
• India (Kolkata)
  • Provides about 1.2% of ALICE resources
  • Innovative cooling solution; all past issues solved, stable operation
  • Plans for steady resource expansion
• Germany
  • 2 T2s and 1 T1 – the largest T1 in WLCG, providing ~50% of ALICE T1 resources
  • Good centre names: Hessisches Hochleistungsrechenzentrum Goethe Universität (requires 180 IQ to say it)
  • The T2s have heterogeneous installations (both batch and storage), support many non-LHC groups, are well integrated in the ALICE Grid and deliver smoothly


Sites Messages (6)
• Slovakia
  • In ALICE since 2006
  • Serves ALICE/ATLAS/HONE
  • Upgrades planned for air-conditioning and power, later CPU and disk; expert support is a concern
  • Reliable and steady resource provision
• RDIG
  • RRC-KI (toward T1): hardware (CPU/storage) rollout, service installation and validation, personnel in place, pilot testing with ATLAS payloads
  • 8 T2s + JRAF + PoD@SPbSU deliver ~5% of the ALICE Grid resources and historically support all LHC VOs
  • Plans for steady growth and site consolidation
  • As with all others, reliable and smooth operation


Social events


Victory!

How are you so cool under pressure?

I work at a T1!


The group