S. Gadomski, "ATLAS computing in Geneva", journée de réflexion, 14 Sept 2007

Page 1:

ATLAS computing in Geneva

Szymon Gadomski

• description of the hardware
• the functionality we need
• the current status
• list of projects

Page 2:

The cluster at Uni Dufour (1)

Page 3:

The cluster at Uni Dufour (2)

12 worker nodes in 2005

21 in 2006

and 20 in 2007!

Page 4:

The cluster at Uni Dufour (3)

power and network cabling of worker nodes

three nodes for services (grid, batch, storage abstraction)

direct line from CERN

Page 5:

The cluster in numbers

• 61 computers to manage – 53 workers, 5 file servers, 3 service nodes

• 188 CPU cores in the workers
• 75 TB of disk storage
• can burn up to 30 kW (power supply specs; a rough estimate follows below)
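As a rough cross-check of the 30 kW figure, a back-of-the-envelope sketch; the per-box wattages below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope power estimate for the cluster at full load.
# The per-box wattages are assumptions for illustration only.
W_WORKER = 400    # assumed draw per worker node, in watts
W_FILESRV = 900   # assumed draw per SunFire X4500 file server
W_SERVICE = 300   # assumed draw per service node

total_w = 53 * W_WORKER + 5 * W_FILESRV + 3 * W_SERVICE
print(f"estimated full-load draw: {total_w / 1000:.1f} kW")  # ~26.6 kW, within the 30 kW spec
```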

Page 6:

The functionality we need

• our local cluster computing
  – log in and have an environment to work with ATLAS software, both offline and trigger
    • develop code, compile
    • interact with the ATLAS software repository at CERN
  – work with nightly releases of ATLAS software, normally not distributed off-site but visible on /afs
  – disk space
  – use of final analysis tools, in particular ROOT
  – a convenient way to run batch jobs (a minimal submission sketch follows this list)
• grid computing
  – tools to transfer data from CERN as well as from and to other Grid sites worldwide
  – ways to submit our jobs to other grid sites
  – a way for ATLAS colleagues to submit jobs to us
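As an illustration of the batch-job item above, a minimal sketch of submitting a job from Python, assuming the Torque batch system mentioned later in this talk; the queue name and the job script are hypothetical placeholders:

```python
# Minimal sketch: submit a job script to a Torque batch system with qsub
# and return the job id that Torque prints on stdout.
import subprocess

def submit(job_script: str, queue: str = "atlas") -> str:
    # "atlas" is a hypothetical queue name for this cluster
    result = subprocess.run(
        ["qsub", "-q", queue, job_script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()  # e.g. "1234.master-node"

if __name__ == "__main__":
    print(submit("run_athena.sh"))  # run_athena.sh is a placeholder job script
```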

Page 7:

The system in production

Description at https://twiki.cern.ch/twiki/bin/view/Atlas/GenevaATLASClusterDescription

• 1 file server (+1 if needed), 3 login machines and 18 batch worker nodes
• 30 ATLAS people have accounts
  – ATLAS GE + friends and relations
  – people rely on the service
• maintenance of the system (0.3 FTE, top priority)
  – creation of user accounts,
  – web-based documentation for users,
  – installation of ATLAS releases,
  – maintenance of worker nodes, file servers and the batch system,
  – assistance to users executing data transfers to the cluster (see the transfer sketch after this list),
  – help with problems related to running ATLAS software off-CERN-site, e.g. access to databases at CERN, firewall issues etc.,
  – RAID recovery from hardware failures.
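For the data-transfer item above, a minimal sketch of the kind of copy a user would run, using the standard GridFTP client globus-url-copy; the endpoints and paths below are placeholders, not the real CERN or Geneva locations:

```python
# Minimal sketch: copy one file from a GridFTP server at CERN to local
# disk. A valid grid proxy is assumed; both URLs are placeholders.
import subprocess

SRC = "gsiftp://some-server.cern.ch/castor/cern.ch/atlas/some_dataset.root"
DST = "file:///data/atlas/some_dataset.root"

subprocess.run(["globus-url-copy", SRC, DST], check=True)
```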

Page 8:

Our system in the Grid

• Geneva has been in NorduGrid since 2005

• In the company of Berne and Manno (our Tier 2)

Page 9:

One recent setback

• We used to have a system of up to 35 machines in production.

• Problems with power to our racks since last August
  – A major blackout in the Plainpalais area on August 2nd; the UPS in the machine room gave up after 10 minutes; all University services went down. A major disaster.
  – When recovering, we lost power again the next day. No explanation from the DINF.
  – A slightly smaller system has been in use since then. Power was lost again on Friday Sept 7th.

• Right now we run only a minimal service. We need to work with the DINF to measure the power consumption of our machines under full load, and to understand the limits of the infrastructure.

• Another power line is being laid for our 20 new worker nodes, the “blades”. The power cut has nothing to do with that.

Page 10:

Things to do (and to research)

a) Configuration of worker nodes:
  • configuration of the CERN Scientific Linux system,
  • Torque batch system software,
  • other added software, as requested by the users.

b) General cluster management issues:
  • security,
  • a way to install the system on multiple machines (three types of worker nodes),
  • automatic shutdown when the UPS turns on,
  • monitoring of temperature, CPU use and network use (see the monitoring sketch after this list).

c) Storage management:
  • operating system for the SunFire X4500 file servers (Sun Solaris or CERN Scientific Linux),
  • a solution for storage management (e.g. dCache or DPM).

d) Grid nodes and grid software:
  • configuration of CERN Scientific Linux for the grid interface nodes,
  • choice and installation of a batch system,
  • choice and installation of grid middleware.

e) Tools for interactive use of multiple machines (e.g. PROOF, Ganga):
  • grid job submission interfaces (e.g. Ganga, GridPilot; see the Ganga sketch after this list).
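For the monitoring item in b), a minimal sketch of polling load on the worker nodes over ssh; the hostnames are placeholders, and a production setup would more likely use a dedicated tool such as Ganglia or Nagios:

```python
# Minimal sketch: sample the 1-minute load average of each worker node
# by reading /proc/loadavg over ssh. Hostnames are placeholders.
import subprocess

NODES = ["wn01", "wn02", "wn03"]

for node in NODES:
    out = subprocess.run(
        ["ssh", node, "cat", "/proc/loadavg"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"{node}: 1-min load {float(out.split()[0]):.2f}")
```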
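For the job-submission interfaces in e), a minimal Ganga sketch, typed inside a Ganga session where Job, Executable and the backend classes are predefined; /bin/echo is a trivial placeholder, and a real ATLAS analysis would use the Athena application plugin instead:

```python
# Minimal sketch of grid submission through Ganga (run inside a Ganga
# session). /bin/echo stands in for a real analysis executable.
j = Job()
j.application = Executable(exe="/bin/echo", args=["hello from the grid"])
j.backend = LCG()    # grid backend; use Local() to test on a login node
j.submit()
```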

Page 11:

Diagram of the system for 1st data

all the hardware is in place (not all powered up)

some open questions

the biggest new issue is storage management with multiple servers

Page 12:

Summary

• The ATLAS cluster in Geneva is a large Tier 3
  – now 188 CPU cores in the workers and 75 TB of disk
  – not all hardware is integrated yet

• A part of the system is in production
  – a Grid site since 2005; runs ATLAS simulation like a Tier 2; we plan to continue that
  – in constant interactive use by the Geneva group since Spring; we plan to continue and to develop this further. The group needs local computing.
• A busy program over the next several months to get all hardware integrated. With a larger scale come new issues to deal with.

Page 13:

Comments about future evolution

• Interactive work is vital.
  – Everyone needs to log in somewhere.
  – The more we can do interactively, the better for our efficiency.
  – A larger fraction of the cluster will be available for login.

• Plan to remain a Grid site.
  – Bern and Geneva have been playing the role of a Tier 2 in ATLAS. We plan to continue that.
• Data transfers are too unreliable in ATLAS.
  – Need to find ways to make them work much better.
  – Data placement from FZK directly to Geneva would be welcome. There is no way to do that (LCG→NorduGrid) at the moment.
• Be careful with extrapolations from present experience. The real data volume will be 200x larger than a large Monte Carlo production.