GGUS summary (4 weeks)


Upload: ruby

Post on 22-Jan-2016


Page 1: GGUS summary (4 weeks)

GGUS summary (4 weeks)

VO       User  Team  Alarm  Total
ALICE       5     0      1      6
ATLAS      28   215      6    249
CMS         6     1      1      8
LHCb        2    38      1     41
Totals     41   254      9    304
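The figures in the summary table can be cross-checked mechanically. The following is a minimal sketch, assuming the counts are typed in by hand from the table above; the names `counts`, `reported_totals` and `check_table` are illustrative and not part of GGUS or any reporting tool.

```python
# Per-VO ticket counts copied from the summary table: (User, Team, Alarm, Total).
counts = {
    "ALICE": (5, 0, 1, 6),
    "ATLAS": (28, 215, 6, 249),
    "CMS": (6, 1, 1, 8),
    "LHCb": (2, 38, 1, 41),
}
# The "Totals" row as reported: (User, Team, Alarm, Total).
reported_totals = (41, 254, 9, 304)


def check_table(counts, reported_totals):
    """Return a list of inconsistencies; an empty list means the table adds up."""
    problems = []
    # Each row: User + Team + Alarm must equal that row's Total.
    for vo, (user, team, alarm, total) in counts.items():
        if user + team + alarm != total:
            problems.append(f"{vo}: row sum {user + team + alarm} != {total}")
    # Each column: the sum over all VOs must equal the reported Totals row.
    column_sums = tuple(sum(col) for col in zip(*counts.values()))
    if column_sums != reported_totals:
        problems.append(f"column sums {column_sums} != {reported_totals}")
    return problems


problems = check_table(counts, reported_totals)
print(problems or "table is consistent")
```

With the numbers above, every row and every column adds up to the reported totals.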


Page 2: GGUS summary (4 weeks)

04/21/23 WLCG MB Report WLCG Service Report 2

Support-related events since last MB

• We need WLCG shifters, alarmers and management to give us meaningful values for the GGUS ‘Problem Type’ field, so that periodic reporting can better expose weak areas in support.

• GGUS:61440 (CNAF-BNL network problem) was re-opened by ATLAS until the network problem is fully understood.

• EMI insists on changing the GGUS supporters’ privileges so that only the EGI DMSU (Deployed Middleware SU) can assign tickets to middleware-related Support Units (SUs). Although this matches the ‘Service Desk’ spirit, it might slow things down. As we no longer have a USAG, we need the WLCG community’s input offline a.s.a.p.

• There were 9 ALARM tickets since the Sept. 28th MB (4 weeks), 5 of which were real, all submitted by ATLAS. No ALARMs since the Oct 12th MB (where the WLCG report was not given). Details follow…

Page 3: GGUS summary (4 weeks)

ATLAS ALARM->CERN-CNAF TRANSFERS

•https://gus.fzk.de/ws/ticket_info.php?ticket=62761


What time UTC What happened

2010/10/05 9:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_Italy.

2010/10/05 10:23 Site acknowledges ticket and finds a StoRM backend problem.

2010/10/05 12:03 Service restored. Site puts the ticket to ‘solved’ and refers to GGUS:62745 for details.

2010/10/11 9:48 Submitter of ticket GGUS:62745 sets status ‘verified’. Neither of the 2 tickets explains what the problem, diagnosis or solution actually was…

Page 4: GGUS summary (4 weeks)

ATLAS ALARM->TRANSFERS TO .FR CLOUD

•https://gus.fzk.de/ws/ticket_info.php?ticket=62871


What time UTC What happened

2010/10/08 5:56 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to NGI_France.

2010/10/08 6:31 Site acknowledges ticket and finds a network problem preventing all DB server access.

2010/10/08 7:29 Service restored.

2010/10/08 10:41 Site puts ticket to status ‘solved’.

2010/10/14 8:39 Submitter sets the ticket to status ‘verified’.

Page 5: GGUS summary (4 weeks)

ATLAS ALARM-> CERN SLOW LSF

•https://gus.fzk.de/ws/ticket_info.php?ticket=62467

What time UTC What happened

2010/09/27 15:34 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.

2010/09/27 16:01 Operator acknowledges ticket and contacts the expert.

2010/09/27 16:37 Expert’s 1st diagnosis: too many queries.

2010/09/27 20:10 Service manager kills another experiment’s home-made robot, which was launching a very large number of bjob queries, and puts the ticket to status ‘solved’.

2010/09/28 12:21 Submitter sets ticket to status ‘verified’.

Page 6: GGUS summary (4 weeks)

ATLAS ALARM-> CERN SLOW AFS

•https://gus.fzk.de/ws/ticket_info.php?ticket=62662


What time UTC What happened

2010/10/01 7:13 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.

2010/10/01 7:33 Operator acknowledges ticket and contacts the expert.

2010/10/01 9:37 IT Service manager re-classifies in CERN Remedy PRMS.

2010/10/11 15:33 Still ‘in progress’. Reminder sent during this drill.

2010/10/25 15:56 Still ‘in progress’. No reaction to the Oct 11th reminder.
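To make the "still in progress" entries concrete, the age of this ticket at the last check can be computed from the timestamps in the table. This is a rough sketch that assumes the times are all UTC, as the table header states; the helper `elapsed` and the variable names are illustrative only.

```python
from datetime import datetime

# Timestamp layout used in the timeline tables, e.g. "2010/10/01 7:13".
FMT = "%Y/%m/%d %H:%M"


def elapsed(start, end):
    """Return the time between two timeline entries as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


opened = "2010/10/01 7:13"       # ALARM ticket opened
reminder = "2010/10/11 15:33"    # reminder sent
last_check = "2010/10/25 15:56"  # last entry, still 'in progress'

age = elapsed(opened, last_check)
since_reminder = elapsed(reminder, last_check)
print(f"open for {age}, {since_reminder} since the reminder")
```

On these figures the ticket had been open for a little over 24 days at the last check, 14 of them after the reminder.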

Page 7: GGUS summary (4 weeks)

ATLAS ALARM-> CERN CASTOR

•https://gus.fzk.de/ws/ticket_info.php?ticket=62688


What time UTC What happened

2010/10/01 16:24 GGUS ALARM ticket opened, automatic email notification to [email protected] AND automatic assignment to ROC_CERN.

2010/10/01 16:41 Operator acknowledges ticket and contacts the expert.

2010/10/01 16:42 Expert starts investigation.

2010/10/01 17:23 Solved. The ‘Put DONE’ in SRM was not propagated to CASTOR; it was done by hand.

2010/10/01 17:45 Submitter sets status ‘verified’. Shifter added a cross-reference to GGUS:62705.
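The five ALARM timelines can be compared by the delay between ticket opening and first acknowledgement. The sketch below assumes the opening and acknowledgement times are copied from the tables in this report; the dictionary `tickets` and the helper `minutes_between` are illustrative names, not part of GGUS.

```python
from datetime import datetime

# Timestamp layout used in the timeline tables (all times UTC).
FMT = "%Y/%m/%d %H:%M"

# Ticket -> (time opened, time acknowledged), copied from the timelines above.
tickets = {
    "GGUS:62761": ("2010/10/05 9:13", "2010/10/05 10:23"),   # CERN-CNAF transfers
    "GGUS:62871": ("2010/10/08 5:56", "2010/10/08 6:31"),    # transfers to .FR cloud
    "GGUS:62467": ("2010/09/27 15:34", "2010/09/27 16:01"),  # CERN slow LSF
    "GGUS:62662": ("2010/10/01 7:13", "2010/10/01 7:33"),    # CERN slow AFS
    "GGUS:62688": ("2010/10/01 16:24", "2010/10/01 16:41"),  # CERN CASTOR
}


def minutes_between(start, end):
    """Whole minutes elapsed between two timeline entries."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


ack_minutes = {
    ticket: minutes_between(opened, acked)
    for ticket, (opened, acked) in tickets.items()
}
slowest = max(ack_minutes, key=ack_minutes.get)

for ticket, minutes in sorted(ack_minutes.items(), key=lambda kv: kv[1]):
    print(f"{ticket}: acknowledged after {minutes} min")
```

Acknowledgement is only one measure: the slow AFS ticket was acknowledged within 20 minutes yet was still ‘in progress’ weeks later, so time to resolution would need to be tracked separately.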