WLCG Operational Costs

M. Dimou, J. Flix, A. Forti, A. Sciabà
WLCG Operations Coordination Team
GDB, NIKHEF, 11th March 2015

WLCG Operational Costs
‣ The WLCG Ops Coordination team was asked to launch this project in order to understand how effectively Grid Operations for the LHC experiments are organized, both centrally and at the computing sites
‣ Feedback from the experiments was collected at the end of 2014
‣ A Site Survey was also circulated at the end of 2014:
  ‣ Each site provided one (detailed and anonymous) answer
  ‣ All of these complete and detailed answers are useful for all sites
  ‣ Allow sites to give their feedback and suggestions on how to improve Ops
‣ ~100 sites answered the Survey
‣ The input received is very useful to draw indications on what could be done to make WLCG operations less expensive for the sites, for the experiments, and for the central operations team

WLCG Site Survey
‣ The Survey focused mainly on 5 areas:
  ‣ FTE effort spent on operating services used for WLCG and on other activities related to WLCG operations
  ‣ Service upgrades & changes
  ‣ Communications
  ‣ Monitoring
  ‣ Services administration
‣ The answers are (still) being analyzed
  ‣ The final report will be provided at the WLCG Collaboration Workshop in Okinawa (April 2015)
  ‣ Today we show some (preliminary) results for the FTE effort and Communication areas

FTE Effort in WLCG

FTE Effort on WLCG Ops
‣ Effort is quantified in FTEs
‣ The number of FTEs is defined as the number of hours spent on the task in a year divided by 1,600 hours
‣ These estimates can be affected by a large uncertainty (including misinterpretation, which was apparent in a few cases), so be careful in drawing strong conclusions from them

Supported VOs: tickets and effort
Observations:
‣ Support via tickets for WLCG is not clearly scaling with the number of LHC VOs supported by the sites
‣ Total FTE effort reported by the sites is not clearly scaling with the number of LHC VOs supported by the sites

Effort per area (T0/T1s)
‣ T0/1s: ~12.3 FTE; average for all categories: 0.7 FTE
‣ Dominated by core services: CVMFS S0 & S1, FTS3, LFC, VOMS, WMS...
‣ Dominated by: Exp. services development, Virtualization, HW provisioning, OS & configuration management, trackers & version control...

Effort per area (T2s)
‣ T2s: 2.8 FTE; average for all categories: 0.2 FTE
‣ Dominated by: Exp. developments, Exp. specific tasks, HW provisioning, OS & configuration management...
‣ Dominated by: APEL, WMS, VOMS...

FTE Effort vs. Size of the sites 1/2
‣ CPU: size of the sites taken from 2014 accounting, per day
‣ Disk & Tape for T0/T1s taken from WLCG monthly accounting
‣ Disk for T2s from Fed. pledges available; no breakdown of installed/site

FTE Effort vs. Size of the sites 2/2
‣ A clear correlation is visible for T0/1s: more FTEs for bigger sites
‣ Not so clear for T2 sites
‣ Includes only Operations of Services; excludes meetings, new tech., TFs, WGs...

FTE Effort Observations
‣ Not unexpectedly, storage is the service that requires the highest amount of effort, at all of the sites
‣ Core Grid/Exp. services in T0/T1s take more effort than in T2s
  ‣ APEL is the most frequently mentioned service in the "other Grid services" category for T2s
‣ Exp. services development / Virtualization / HW provisioning / OS & configuration mgt take more effort in T0/T1s than in T2s
‣ Networking effort is similar in T0/T1s and T2s
‣ Infrastructure services such as perfSONAR, Squid, ARGUS/GUMS take very little manpower

Communication in WLCG

Communications 2/12
What could be done to improve the communication between the site and WLCG operations? (free text)
Some answers:
‣ Distinguish official requirements approved by WLCG ops vs. suggestions from experiments (it is not always obvious to distinguish)
‣ Creation of a GGUS support unit for WLCG operations
‣ Establishing a WLCG Ops bulletin
‣ More feedback from sites before requests to sites are made
‣ Important service requests, like XRootD or WebDAV protocols, should come from WLCG as formal requests, assuming new services are discussed within the WLCG management board and properly endorsed

Communications 4/12
How would you improve the sharing of information across WLCG sites? (free text)
Some answers:
‣ More sites participating in HEPiX and GDB
  ‣ HEPiX is seen as quite effective indeed
‣ Creating new e-groups to share information on site-specific services and/or issues
  ‣ Acknowledged the relevance of LCG-ROLLOUT
‣ Consolidate the relevant information in open WLCG twikis
  ‣ Look into fewer pages, find more (and more relevant) information
‣ Mini-workshops on specific topics, and/or the creation of an annual Tier1 or WLCG sites Jamboree (site oriented)

Communications 5/12
‣ HEPiX, LCG-ROLLOUT, GDB, CHEP, WLCG Ops Coord (T1s), private chats

Communications 7/12
What changes do you think would make the meeting more effective and interesting for you as a site? (free text)
Some answers:
‣ To be more focused (WLCG Ops Coord. meeting): avoid reports from TFs with little progress
‣ Shorten to 1h, maximum
‣ Time slot: the current one does not allow for Asia participation; US would like it to be a bit later (16:00 CET)
‣ Adding actions from/to sites in the meeting minutes
  ‣ Sometimes it is not clear to sites what they are supposed to do when reading the minutes

Communications 9/12
If your site is not involved in a TF or WG, please indicate the main reason(s) (free text)
Some answers:
‣ Many sites answered: lack of manpower
‣ Some:
  ‣ Not funded work / part of the WLCG commitment
  ‣ Time-zone difference
  ‣ Problems in operating the site, but willing to participate

Communications 10/12
What improvements would you like to see in GGUS? (free text)
Some answers:
‣ Easy programmatic access to current and historical content
‣ Improvements in the interface
‣ Every piece of middleware should be supported via GGUS

Communications 11/12
When WLCG expects a certain action from a site (service upgrades and reconfiguration, etc.), what channels do you want to be used, in order of importance?
‣ Broadcasts and GGUS tickets are considered by far the best methods to communicate requests to sites
‣ Operations meetings are far behind

Communication Observations
Well covered domains:
‣ WLCG Ops communication from/to sites is ok, but it might be improved
‣ Information sharing across sites: some sites are asking to have a (yearly?) dedicated WLCG sites meeting
‣ The channels seem good and sufficient (meetings, wikis, etc.)
  ‣ Suggestions to explore collaborative tools...
‣ WLCG TFs and WGs are considered useful
‣ GGUS is very much appreciated as a support tool
‣ The 3pm Ops call is considered useful by the T0/T1s; the frequency is fine
Domains to improve:
‣ The role of WLCG Ops between the Experiments and their collaborating Sites
‣ The content of the fortnightly WLCG Ops Coord meeting (66/101 sites Never or Rarely attend)
‣ How to attract more site attention, in particular T2s

Conclusions so far
‣ WLCG Workshop (Okinawa), next month
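
The FTE definition used in the survey (hours spent on a task in a year, divided by 1,600 hours) can be sketched in Python as below. This is only an illustration of the arithmetic: the helper name `fte_from_hours` and the 400-hour input are assumptions made for the example, not values taken from the survey.

```python
# Hours corresponding to one FTE-year, as stated in the survey definition.
HOURS_PER_FTE_YEAR = 1600


def fte_from_hours(hours_per_year: float) -> float:
    """Convert hours spent on a task in one year into FTEs.

    Hypothetical helper for illustration; not part of any WLCG tool.
    """
    if hours_per_year < 0:
        raise ValueError("hours_per_year must be non-negative")
    return hours_per_year / HOURS_PER_FTE_YEAR


# Illustrative input only: 400 hours in a year spent on one service.
print(fte_from_hours(400))  # 0.25
```

Under this definition, a site reporting a quarter of the 1,600 reference hours on a service would count as 0.25 FTE for that service.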