LCG WLCG Accounting: Update, Issues, and Plans John Gordon RAL Management Board, 19 December 2006

Update: Much of this section was reported to the GDB in December (link fragment: resId=1&materialId=slides&confId=a057712).

LCG WLCG Accounting: Update, Issues, and Plans

John Gordon, RAL

Management Board, 19 December 2006

last update 05/04/23 12:07

LCG

[email protected] – MB 19Dec06

Overview

Update
  WLCG Reporting
  APEL Portal, APEL Sensors
  Other Sensors
  DGAS
  User Level Accounting
  Storage Accounting
  Future work

Issues

Going Forward

WLCG Reporting

WLCG official accounting is currently done manually, only by Tier1s.

By the end of 2006 the T0 and all existing Tier1s, except NorduGrid, are reporting monthly per VO and per site on normalised CPU time, wallclock time, disk allocated and used, and tape used.

The reports are consolidated, compared to MoU pledges (with some efficiencies assumed), and published via the LCG Bulletin https://twiki.cern.ch/twiki/bin/view/LCG/LcgBulletins
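The comparison against pledges can be sketched as follows. This is a minimal illustration only: the slides do not give the normalisation formula, the units, or the efficiency value, so the kSI2K-hour unit, the `si2k_per_core` rating, and the 0.8 efficiency are all assumptions, and the site name and figures are made up.

```python
from dataclasses import dataclass


@dataclass
class MonthlyReport:
    site: str             # hypothetical site name
    vo: str
    cpu_hours: float      # raw CPU time in hours
    si2k_per_core: float  # assumed benchmark rating used for normalisation


def normalised_cpu(report: MonthlyReport) -> float:
    """Scale raw CPU hours by the benchmark rating (assumed unit: kSI2K-hours)."""
    return report.cpu_hours * report.si2k_per_core / 1000.0


def fraction_of_pledge(used: float, pledged_ksi2k: float,
                       hours_in_month: float, efficiency: float) -> float:
    """Compare delivered work with the pledge, derated by an assumed efficiency."""
    available = pledged_ksi2k * hours_in_month * efficiency
    return used / available


report = MonthlyReport(site="EXAMPLE-T1", vo="atlas",
                       cpu_hours=96000.0, si2k_per_core=1500.0)
used = normalised_cpu(report)
share = fraction_of_pledge(used, pledged_ksi2k=500.0,
                           hours_in_month=720.0, efficiency=0.8)
print(f"{report.site} {report.vo}: {share:.0%} of pledge")
```

Derating the pledge rather than inflating the usage keeps the reported usage figure identical to what the site published.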

APEL Portal

The APEL Accounting Portal at RAL has been storing CPU accounting results for WLCG for more than a year.

CESGA has taken over development of the various reports, so the EGEE view is now the definitive one.

CESGA also monitor which sites are publishing and raise trouble tickets in GGUS when a site fails to publish for 30 days.

Manual checking of published results has revealed some gaps in data. SAM tests for APEL are under development to compare the results stored locally in the RGMA MON box with the central data.
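The two checks described on this slide (sites silent for 30 days, and gaps between local and central data) can be sketched roughly as below. This is not the CESGA monitoring or the SAM test itself; the function names, the per-day record counts, and the site names are hypothetical.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=30)


def stale_sites(last_published, today):
    """Sites whose most recent publication is 30 or more days old."""
    return sorted(site for site, last in last_published.items()
                  if today - last >= STALE_AFTER)


def days_with_gaps(local_counts, central_counts):
    """Days on which the central store holds fewer records than the local one."""
    return sorted(day for day, n in local_counts.items()
                  if central_counts.get(day, 0) < n)


# Made-up sample data for illustration.
last_published = {
    "SITE-A": date(2006, 12, 10),
    "SITE-B": date(2006, 11, 1),
}
local = {date(2006, 12, 1): 120, date(2006, 12, 2): 95}
central = {date(2006, 12, 1): 120}

print(stale_sites(last_published, today=date(2006, 12, 19)))
print(days_with_gaps(local, central))
```

A site flagged by `stale_sites` would be a candidate for a trouble ticket; a day flagged by `days_with_gaps` would be a candidate for republishing.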

Accounting Portal

APEL Sensors

APEL2 was released in production in gLite 3.0.2 Update 10 last Monday. Main new features are:

More reliable publisher which can handle TCP connection timeouts with the archiver.

Encryption of UserDN using a 1024-bit RSA key, ready for user-level accounting.

Support for the Blah accounting file on the gLite CE.

gLite 3.0.2u10 also contains patches to the gLite CE to correct errors in the Blah accounting log. The APEL sensors now work correctly with the gLite CE.

Other Sensors

Not all sites report CPU accounting via the APEL sensors. Some interrogate their own site accounting databases and publish directly using R-GMA. Advice on how to do this is available at http://goc.grid-support.ac.uk/gridsite/accounting/faq.html

DGAS

INFN uses DGAS to collect accounting information and stores it in its own repositories (HLR) for each site. A new development, DGAS2APEL, takes information from the site HLR and publishes it via R-GMA to the APEL repository. This is deployed in production at 3 INFN sites, and usage records are being successfully transferred to the central APEL repository.

User Level Accounting

APEL2 encrypts the user DN in the Usage Record. When a site switches on external user publishing, the encrypted DN is sent to the central repository, where it is decrypted to allow aggregation and then re-encrypted.

A prototype portal has been developed to show information to the roles identified (see GDB talks from October and December). No userDN information will be made available until the relevant policy documents are in place, approved, and signed by the relevant individuals.

User-Level Accounting

Development of a prototype user-level reporting display based on the “Five Actors” described: VO Resource Manager VO Member User Site Administrator GOC Developer.

Screenshots demonstrate this in action.

VO-Resource Manager

Table shows CPU, WCT and job efficiency of the top 10 anonymised users.

This example shows that the user with the largest WCT has a job efficiency of 10%; clearly the VO Manager may wish to contact this person.
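A table of this kind could be produced along the following lines. This is a sketch under assumptions: job efficiency is taken to be CPU time divided by wallclock time, and a truncated SHA-256 hash stands in for the anonymisation (the portal's actual scheme is not described in the slides). The DNs and numbers are invented.

```python
import hashlib
from collections import defaultdict


def pseudonym(user_dn: str) -> str:
    """Stand-in anonymisation: a short hash of the DN (assumption, not APEL's scheme)."""
    return "user-" + hashlib.sha256(user_dn.encode("utf-8")).hexdigest()[:8]


def top_users_by_wct(jobs, n=10):
    """Aggregate (dn, cpu_hours, wct_hours) jobs per user and rank by wallclock time."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for dn, cpu, wct in jobs:
        totals[dn][0] += cpu
        totals[dn][1] += wct
    rows = []
    for dn, (cpu, wct) in totals.items():
        efficiency = cpu / wct if wct > 0 else 0.0  # assumed definition: CPU / WCT
        rows.append((pseudonym(dn), cpu, wct, efficiency))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


jobs = [
    ("/C=UK/O=Example/CN=alice", 4.0, 40.0),
    ("/C=UK/O=Example/CN=alice", 6.0, 60.0),
    ("/C=UK/O=Example/CN=bob", 45.0, 50.0),
]
for name, cpu, wct, eff in top_users_by_wct(jobs):
    print(f"{name}  CPU={cpu:.1f}h  WCT={wct:.1f}h  eff={eff:.0%}")
```

Ranking by wallclock time rather than CPU time is what surfaces the low-efficiency user: they occupy the most slots while doing the least work.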

VO-Resource Manager

Cumulative CPU of the Top 10

Relative Share of Top 10 compared to the VO Total

Site Admin View

The Site Administrator can view usage of anonymous grid users who executed jobs at the site.

User

Each Grid User can interrogate their own accounting data.

Tables showing what they did and when:
  Number of jobs, CPU and WCT per month (per VO)
  Average job efficiency per VO
  Accumulative Njobs, CPU and WCT per VO
  The sites which executed the jobs, and when they were done

The following table shows the distribution of the Total number of Your Jobs grouped by VO and DATE
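The grouping by VO and date behind such a table might look like the sketch below. The record layout (VO, finish date, CPU hours, WCT hours) and the sample values are assumptions for illustration, not the portal's schema.

```python
from collections import defaultdict
from datetime import date


def usage_by_vo_and_month(jobs):
    """Count jobs and sum CPU and WCT per (VO, YYYY-MM) bucket."""
    table = defaultdict(lambda: {"njobs": 0, "cpu": 0.0, "wct": 0.0})
    for vo, finished, cpu_hours, wct_hours in jobs:
        row = table[(vo, finished.strftime("%Y-%m"))]
        row["njobs"] += 1
        row["cpu"] += cpu_hours
        row["wct"] += wct_hours
    return dict(table)


# Made-up job records for one user.
my_jobs = [
    ("atlas", date(2006, 11, 3), 2.0, 4.0),
    ("atlas", date(2006, 11, 20), 1.0, 1.0),
    ("cms", date(2006, 12, 1), 3.0, 6.0),
]
for (vo, month), row in sorted(usage_by_vo_and_month(my_jobs).items()):
    print(vo, month, row["njobs"], row["cpu"], row["wct"])
```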

Storage Accounting

GridPP in the UK has developed storage accounting using values published in GLUE and harvesting them from the BDII.

The results are published and summarised in the same way as CPU, and some example visualisations (by CESGA) are shown using data from GridPP sites.

A roadmap exists for further development of the portal: http://goc02.grid-support.ac.uk/accountingDisplay/

This storage accounting has recently been extended to all EGEE sites.

OSG are developing their own solution.

Storage Accounting Display

Visualisation of storage used per VO for disk and tape: http://goc02.grid-support.ac.uk/accountingDisplay/view.php

Select resources via a tree.

Select time interval (last year, last month, last week, last day).

Storage Accounting Display

Looking at data for RAL-LCG2.

Storage units are 1 TB = 10^6 MB.

Tape Used + Disk Used = Total

Sensor drop-outs have been fixed.

[Chart: Total Used Storage (TB), with series Tape Used and Disk Used]
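The unit convention and the total on this slide amount to the following arithmetic. The function names and the sample figures are illustrative only.

```python
MB_PER_TB = 10**6  # decimal convention stated on the slide: 1 TB = 10^6 MB


def to_tb(megabytes: float) -> float:
    """Convert megabytes to terabytes using the decimal convention."""
    return megabytes / MB_PER_TB


def total_used_tb(tape_used_mb: float, disk_used_mb: float) -> float:
    """Total = Tape Used + Disk Used, expressed in TB."""
    return to_tb(tape_used_mb) + to_tb(disk_used_mb)


# Made-up values, not RAL-LCG2 data.
print(total_used_tb(tape_used_mb=2_500_000, disk_used_mb=1_500_000))
```

Using 10^6 rather than 2^20 matters at this scale: the two conventions differ by about 5% per terabyte.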

Issues: CPU Reporting

NorduGrid is not reporting any CPU use.
Reporting should be extended to Tier2s.
Correctness of data needs checking.
Completeness of data needs checking.
CPU versus wallclock.
How many accounting solutions do we need?
Use of VOMS.
Local versus Grid.

Issues: User Level Accounting

We need sites to deploy gLite 3.0.2u10 and start publishing encrypted DNs.
The relevant policies to be formulated and approved.
Feedback on the reporting suggested at December GDB.

Issues: Storage Accounting

GLUE 1.3 introduces new SE reporting concepts.
Are they sufficient for storage accounting?
Can they be implemented across all SEs?
Can we ever account shared space on SEs correctly?

Going Forward

I suggest the priorities are:

Introduce T2 reporting using APEL for CPU (now) and for storage (hopefully soon).
Sites to check the data being published for storage.
Roll out DGAS2APEL across INFN so that information from Italy is collected centrally.
Check results from Storage Accounting and develop information providers further.
Persuade NDGF to start publishing Tier1 accounting.
Roll out user-level accounting.