TRANSCRIPT
Computing Christmas 2010 meeting
David Hutchcroft, on behalf of John Bland, Rob Fay, Steve Jones and Mike Houlden [ret.]
Thanks for the hard work
John, Rob and Steve have made the local system and the grid work extremely well this year
The result of a very large amount of work by the team is that we have had excellent reliability
Local grid performance over the last 3 months
Things done this year (partial list)
- Replaced MAP2 with HAMMER
- Moved to KVM for grid services
- 216 helpdesk tickets handled
- Helped Maths install their new cluster in our server room
- Replaced printers
- Made scratch self-cleaning and introduced quotas for hepstore
- Upgraded user area
- Added 40TiB to GridPP storage
- All DPM on SL5 and tuned for maximum performance
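The scratch self-cleaning mentioned above could be implemented as a simple age-based sweep. This is only a hedged sketch, not the actual Liverpool script: the path, the 90-day threshold, and the demo files are assumptions, and a real deployment would email owners during the grace period rather than just printing candidates.

```shell
#!/bin/sh
# Illustrative scratch cleaner: warn about, then delete, files untouched
# for more than MAX_AGE_DAYS. Path and threshold are made up for this sketch.
SCRATCH=${SCRATCH:-/tmp/scratch-demo}
MAX_AGE_DAYS=90

mkdir -p "$SCRATCH"
# Demo setup: one stale file (backdated) and one fresh file.
touch -d "120 days ago" "$SCRATCH/old.dat"
touch "$SCRATCH/fresh.dat"

# List deletion candidates older than the threshold
# (a real script would email the owners here, then wait).
find "$SCRATCH" -type f -mtime +"$MAX_AGE_DAYS" -print

# Delete them after the grace period; fresh files survive.
find "$SCRATCH" -type f -mtime +"$MAX_AGE_DAYS" -delete
ls "$SCRATCH"
```

Run from cron, this keeps scratch usable for temporary files without manual tidying.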
Hep services running well
Since the new HEP server was installed [August 2009], the load from email, Twiki etc. has been much smaller
Local batch jobs running (chart, Jan to Dec): peak was ~85
Spillover for LCG jobs (chart, Jan to Dec)
Cluster room
- The DELL nodes became scrap: mostly sold at £5 each, others given away
- Around 10 tons of scrap removed
- The air-conditioning is getting toward the end of its rated life: a lower load gives overhead
- All racks are currently air cooled
- Officially not allowed to put more into the server room until the next university financial year
Link to CSD cluster
- Linked to 60 nodes in the CSD computer room: dual quad-core AMD Opteron 2.3GHz 'Barcelona' CPUs, 32GB memory (4GB per core)
- Switched from SUSE 10.3 to CentOS 5 to make the ATLAS software stack happy
- 10 gigabit link to storage in our cluster room
- Will be seen as a separate NW grid site; jobs run there are credited to us
- Expect to get a “substantial” portion of the nodes for grid work
- Slide is from last year: the link is still used very little
ATLAS : Hammercloud tests
HammerCloud simulates the grid usage of ATLAS data analysis jobs
Obviously we now have real jobs as well, but we are still one of the leading sites working on optimising Tier 2s for ATLAS jobs
Stuff anticipated for 2011
- LHC and T2K continue data taking; generally expect an increase in local data storage
- Please use DPM for data rather than hepstore or scratch
- People should always use the grid for running on large datasets if available: local disk and batch facilities are limited, and we cannot store a full ATLAS skim here...
- Conversion to 64-bit processing for LHC is nearly complete; there are already moves to use only 64-bit SLC5 nodes on the grid
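Given the move to 64-bit-only SLC5 nodes, a quick check of whether a particular machine already runs a 64-bit kernel is just the reported machine architecture. A minimal sketch (the label strings are my own, not any official tool's output):

```shell
#!/bin/sh
# Report whether this machine runs a 64-bit kernel.
# uname -m prints the hardware architecture, e.g. x86_64 or i686.
arch=$(uname -m)
case "$arch" in
  x86_64|aarch64) echo "64-bit ($arch)" ;;
  *)              echo "32-bit or other ($arch)" ;;
esac
```

Running this across the batch nodes would show which ones are candidates for retirement as 32-bit support winds down.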
Upgrades in 2011
- Linux desktop upgrades: SL5/64-bit on all Linux analysis machines
- Rack/network upgrades for the Tier 2: possibly beginning to bottleneck on the link to SuperJANET
- Storage upgrades: we are behind on our disk quota for GridPP; future money to be spent on that
Network weather map: 2009 (chart)
Network weather map: now (chart)
Reminder of disks available
- scratch: for rapid access and temporary files. Now cleaned, with emails sent before file deletion
- hepstore: for medium/long term storage of files. Quotas imposed; presently 64% full
- dpm: where all large experimental datasets must reside. See the twiki page for how to use it: https://hep.ph.liv.ac.uk/twiki/bin/view/Computing/GridStorageGuide
(charts: hepstore free space, scratch free space)
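Before dumping a large dataset on hepstore or scratch, it is worth checking how full the area already is. A minimal sketch using standard tools (the mount point is an assumption; here it defaults to the current directory so the example runs anywhere):

```shell
#!/bin/sh
# Print the percentage used for a filesystem. "/hepstore" would be the
# real target; we default to "." so this sketch works on any machine.
AREA=${AREA:-.}
# df -P forces POSIX single-line output; field 5 is the Use% column.
df -P "$AREA" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
```

The `quota -s` command gives the per-user view on areas where quotas are imposed.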
GridPP 4
- The GridPP 4 grant starts in April next year; it pays for 1.8 people [up from 0.8 last time]
- Approved as a 2+2 (i.e. only two years initially)
- Large cuts to Tier 2 hardware budgets
- Sometime soon the metrics used to determine service levels should be published
- Some anomalies: still asking for 80% of the disk to be empty, a target from before LHC data taking...
- If we do well, more money flows in future tranches
Conclusion
- A very successful year for the local team: excellent performance rewarded by GridPP money
- Issues are:
  - The future of 32-bit nodes is now very limited
  - LHC data taking and analysis is stress-testing the grid more thoroughly
  - We need to maintain excellent performance, as small or underperforming sites may not be funded
Please thank these people for making everything work