Status of the WLCG Tier-2 Centres

M.C. Vetterli, Simon Fraser University and TRIUMF
WLCG Overview Board, CERN, October 27th 2008

Page 1: Status of the WLCG Tier-2 Centres

M.C. Vetterli – WLCG-OB, CERN; October 27, 2008 – #1

Simon Fraser

Status of the WLCG Tier-2 Centres

M.C. Vetterli
Simon Fraser University and TRIUMF

WLCG Overview Board, CERN, October 27th 2008

Page 2: Status of the WLCG Tier-2 Centres

Sources of Information

Discussions with experiment representatives in July

APEL monitoring portal http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php

WLCG reliability reports http://lcg.web.cern.ch/LCG/accounts.htm

October GDB mtg; dedicated to Tier-2 issues http://indico.cern.ch/conferenceDisplay.py?confId=20234

Talks from the last OB & LHCC

Slides labeled with a * are from MV’s LHCC rapporteur talk

Page 3: Status of the WLCG Tier-2 Centres

Tier-2 Performance Summary*

Overall, the Tier-2s are contributing much more now

Significant fractions of the Monte Carlo simulations are being done in the T2s for all experiments

Reliability is better, but still needs to improve

The CCRC’08 exercise is generally considered a success for the Tier-2s

Page 4: Status of the WLCG Tier-2 Centres

Tier-2 Centres in CCRC’08 – General*

Overall, the Tier-2s and the experiments considered the CCRC’08 exercise to be a success

The networking/data transfers were tested extensively; some FTS tuning was needed, but it worked out

Experiments tended to continue other activities in parallel, which is a good test of the system, although the load was not as high as anticipated

While CMS did include significant user analysis activities, the chaotic use of the Grid by a large number of inexperienced people is still to be tested

Page 5: Status of the WLCG Tier-2 Centres

Tier-2 Issues/Concerns

As of the last CB and the meetings with the experiments this summer

Communications: Do Tier-2s have a voice? Is there a good mechanism for disseminating information?

Better monitoring: Pledges vs actual vs used

Hardware acquisitions: What should be bought? kSI2006?

Tier-2 capacity: Size of datasets? Effect of LHC delay?

Page 6: Status of the WLCG Tier-2 Centres

Tier-2 Issues/Concerns

Upcoming onslaught of users: Some user analysis tests have been done but scaling is a concern

User Support: Ticketing system exists but it is not really used for user support issues. This affects Tier-2s especially.

Federated Tier-2s: Tools to federate? Monitoring? (averaging)

Interoperability of EGEE, OSG, and NDGF should be improved

Software/Middleware updates: Could be smoother; too frequent

Page 7: Status of the WLCG Tier-2 Centres

Communications for Tier-2s

Identified by the T2s at the last CB as a serious problem. Interesting to me that many in experiment computing management did not share this concern.

Should communication be organized according to experiment or to Tier-1 association? There are also differing opinions on this.

There are two issues:
- Grid middleware/operations
- Experiment software

My view after studying this is that the situation is OK for “tightly coupled” Tier-2s, but not for remote and smaller Tier-2s that are not well coupled to a Tier-1.

Page 8: Status of the WLCG Tier-2 Centres

Communications for Tier-2s

Many lines of communication do indeed exist.

Some examples are:

CMS has two Tier-2 coordinators: Ken Bloom (Nebraska) and Giuseppe Bagliesi (INFN)
- attend all operations meetings
- feed T2 issues back to the operations group
- write T2-relevant minutes
- organize T2 workshops

ALICE has designated 1 Core Offline person in 3 to have privileged contact with a given T2 site manager
- weekly coordination meetings
- Tier-2 federations provide a single contact person
- A Tier-2 coordinates with its regional Tier-1

Page 9: Status of the WLCG Tier-2 Centres

Communications for Tier-2s

ATLAS uses its cloud structure for communications
- Every Tier-2 is coupled to a Tier-1
- 5 national clouds; others have foreign members (e.g. “Germany” includes Krakow, Prague, Switzerland; Netherlands includes Russia, Israel, Turkey)
- Each cloud has a Tier-2 coordinator

Regional organizations, such as:
+ France Tier-2/3 technical group:
  - coordinates with Tier-1 and with experiments
  - monthly meetings
  - coordinates procurement and site management
+ GRIF: Tier-2 federation of 5 labs around Paris
+ Canada: Weekly teleconferences of technical personnel (T1 & T2) to share information and prepare for upgrades, large production, etc.
+ Many others exist; e.g. in the US and the UK

Page 10: Status of the WLCG Tier-2 Centres

Communications for Tier-2s

Tier-2 Overview Board reps: Michel Jouvin and Atul Gurtu have just been appointed to the OB to give the Tier-2s a voice there.

Tier-2 mailing list: Actually exists and is being reviewed for completeness & accuracy

Tier-2 GDB: The October GDB was dedicated to Tier-2 issues
+ reports from experiments: role of the T2s; communications
+ talks on regional organizations
+ discussion of accounting
+ technical talks on storage, batch systems, middleware

Seems to have been a success; repeat a couple of times per year?

Page 11: Status of the WLCG Tier-2 Centres

Page 12: Status of the WLCG Tier-2 Centres

Page 13: Status of the WLCG Tier-2 Centres

Tier-2 Installed Resources

But how much of this is a problem of under-use rather than under-contribution?
+ A task force has been set up to extract installed capacities from the Glue schema

Monthly APEL reports still undergo significant modifications from the first draft.
+ Good, because communication with the T2s is better
+ Bad, because APEL accounting still has problems

Accounting seems to be very finicky; it breaks when the CE or MON box is upgraded

How are jobs distributed to the Tier-2s?
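The distinction between under-use and under-contribution can be made concrete with two ratios. The following is only a sketch: it assumes that pledged, installed, and used capacity are available per site in one consistent unit, and the site names and numbers are invented for illustration; they are not taken from the APEL reports.

```python
from dataclasses import dataclass


@dataclass
class SiteAccounting:
    """Hypothetical per-site record; all three fields share one CPU unit."""
    name: str
    pledged: float    # capacity the site pledged
    installed: float  # capacity actually installed
    used: float       # capacity consumed by delivered work

    def contribution(self) -> float:
        """installed / pledged: a value below 1 points to under-contribution."""
        return self.installed / self.pledged

    def utilisation(self) -> float:
        """used / installed: a value below 1 points to under-use."""
        return self.used / self.installed


# Invented numbers: both sites deliver under half of their pledge,
# but for different reasons.
sites = [
    SiteAccounting("T2-A", pledged=1000, installed=1000, used=400),
    SiteAccounting("T2-B", pledged=1000, installed=500, used=450),
]

for s in sites:
    print(
        f"{s.name}: installed/pledged={s.contribution():.2f}, "
        f"used/installed={s.utilisation():.2f}, "
        f"used/pledged={s.used / s.pledged:.2f}"
    )
```

In this made-up example T2-A has met its pledge but is under-used, while T2-B is well used but has installed only half of its pledge. A used-versus-pledged comparison alone cannot tell the two cases apart, which is why the installed capacity is needed.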

Page 14: Status of the WLCG Tier-2 Centres

Tier-2 Hardware Questions

How does the LHC delay affect the requirements and pledges for 2009?
+ We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU capacity and we have seen this for storage as well

Page 15: Status of the WLCG Tier-2 Centres

Tier-2 Hardware Questions

How does the LHC delay affect the requirements and pledges for 2009?
+ We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU and we are now starting to see this for storage as well

We need to use something other than SpecInt2000!
+ this benchmark is totally out-of-date & useless for new CPUs
+ continued delays in SpecHEP can cause sub-optimal decisions

Page 16: Status of the WLCG Tier-2 Centres

Tier-2 Hardware Questions

Networking to the nodes is now an issue.
+ with 8 cores per node, 1 GigE connection ≈ 16.8 MB/sec/core
+ Tier-2 analysis jobs run on reduced data sets and can do rather simple operations; we have seen 7.5 MB/sec at ATLAS and much more (x10?)
+ Do we need to go to Infiniband?
+ We certainly need increased capability for the uplinks; we should have a minimum of fully non-blocking GigE to the worker nodes.

We need more guidance from the experiments. The next round of purchases is now!
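The per-core bandwidth figure in the networking bullet can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: one job per core, no protocol overhead, and 1 MB = 10^6 bytes. The function names are illustrative. Taking Gigabit Ethernet at its nominal line rate of 10^9 bit/s gives about 15.6 MB/s per core; the slide's ≈ 16.8 MB/s matches a reading of "1 Gbit" as 2^30 bits.

```python
def per_core_bandwidth_mb_s(link_bits_per_s: float, cores: int) -> float:
    """Share of one node's network link per core, in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_s = link_bits_per_s / 8
    return bytes_per_s / cores / 1e6


def link_utilisation(job_rate_mb_s: float, jobs: int, link_bits_per_s: float) -> float:
    """Fraction of the link needed if `jobs` jobs each read at `job_rate_mb_s`."""
    demand_bits_per_s = job_rate_mb_s * 1e6 * 8 * jobs
    return demand_bits_per_s / link_bits_per_s


CORES_PER_NODE = 8  # from the slide
GIGE_LINE_RATE = 1e9  # nominal Gigabit Ethernet line rate, bit/s

decimal_share = per_core_bandwidth_mb_s(GIGE_LINE_RATE, CORES_PER_NODE)
binary_share = per_core_bandwidth_mb_s(2**30, CORES_PER_NODE)

print(f"{decimal_share:.1f} MB/s/core")  # 15.6
print(f"{binary_share:.1f} MB/s/core")   # 16.8

# Eight jobs at the 7.5 MB/s seen at ATLAS, and at ten times that rate.
print(f"{link_utilisation(7.5, CORES_PER_NODE, GIGE_LINE_RATE):.2f}")   # 0.48
print(f"{link_utilisation(75.0, CORES_PER_NODE, GIGE_LINE_RATE):.2f}")  # 4.80
```

Under these assumptions, eight jobs at 7.5 MB/s would take roughly half of a single GigE link, while a tenfold higher rate would need several times its capacity. That is the concern behind the Infiniband and uplink questions.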

Page 17: Status of the WLCG Tier-2 Centres

Summary

The role of the Tier-2 centres has increased markedly in the last year; >50% of Monte Carlo simulation is done in the T2s now.

The CCRC’08 exercise is considered a success by the Tier-2s and by the experiments.

Availability and reliability are up, but still need improvement.

Resource acquisition vs pledges is better but still needs work

Issues for Tier-2s:
- communication should be (& is being) improved
- work should ramp up on chaotic user analysis
- reporting actual resources should be established
- improved user support is needed