GridPP: the UK's contribution to the international collaboration

building a worldwide Grid, the LHC Computing Grid

GridPP – is the system usable?

Tony Doyle

Usable Systems, 21 September 2006

Tony Doyle - University of Glasgow

Summary
• GridPP runs a major part of the EGEE/LCG Grid, which supports ~3000 users
• The Grid is not (yet) as transparent as end-users want it to be
• The underlying overall failure rate is ~10%
• User (interface)s, middleware and operational procedures (need to) adapt
• (see talks by Dave Britton and Stephen Burke for more info. on performance and operations [now])
• Procedures to manage the underlying problems such that system is usable are highlighted
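As a rough illustration of why a ~10% failure rate can still leave the system usable once procedures such as resubmission are in place, the sketch below computes the chance that a job succeeds within a given number of attempts. It assumes each attempt fails independently with probability 0.10, which is a simplification not stated in the talk; the function name is hypothetical.

```python
def success_within(attempts: int, failure_rate: float = 0.10) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # All attempts fail with probability failure_rate ** attempts.
    return 1.0 - failure_rate ** attempts


for n in (1, 2, 3):
    print(f"{n} attempt(s): {success_within(n):.1%} chance of success")
```

Under that independence assumption a single resubmission lifts the success chance from 90% to 99%. Correlated failures (for example a misconfigured site) would make the real gain smaller.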

“Active” User requires thousands of CPU hours
[Figure: EGEE CPU hours (1 April 2006 to 31 July 2006); label: 5 million hours]

Virtual Organisations
• Users are grouped into Virtual Organisations
– Users/VO varies from 1 to 806 members (and growing..)
• Broadly four classes of VO
– LHC experiments
– EGEE supported
– Worldwide (mainly non-LHC particle physics)
– Local/regional e.g. UK PhenoGrid
• Sites can choose which VOs to support, subject to MOU/funding commitments
– Most GridPP sites support ~20 VOs
– GridPP nominally allocates 1% of resources to EGEE non-HEP VOs
– GridPP currently contributes 30% of the EGEE CPU resources

User View?
• Perspective matters
• This talk is not
– a usability survey
– unbiased
– representative
• Straw poll
– users overcame initial registration hurdles within ~two weeks
– users adapt to Grid in (un-)coordinated ways
– The Grid was sufficiently flexible for many analysis applications

Physics Analysis
[Diagram. Data labels: Raw Data; ESD: Data or Monte Carlo; Event Tags; Event Selection; Calibration Data; Analysis Object Data (AOD); Physics Objects. Activity labels: Analysis, Skims; Physics Analysis. Tiers: Collaboration-wide Tasks; Analysis Groups; Individual Physicists. Arrow label: INCREASING DATA FLOW]

User evolution

Number of UK Grid users (exc. Deployment Team)
Quarter:  05Q4  06Q2  06Q3
Value:    1342  1831  2777
Many EGEE VOs supported c.f. 3000 EGEE target

Number of active users (> 10 jobs per month)
Quarter:  05Q4  06Q1  06Q2
Value:    83    166   201
Fraction: 6.2% 11.0%

Viewpoint: growing fairly rapidly, but not as active as they could be? depends on the “active” definition
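The two "Fraction" values can be reproduced from the figures on this slide. The short sketch below divides active users by total users for each quarter that appears in both series; the variable and function names are illustrative only.

```python
# Figures quoted on the slide.
total_users = {"05Q4": 1342, "06Q2": 1831, "06Q3": 2777}
active_users = {"05Q4": 83, "06Q1": 166, "06Q2": 201}


def active_fraction(quarter: str) -> float:
    """Active users as a fraction of all UK Grid users in one quarter."""
    return active_users[quarter] / total_users[quarter]


# Only quarters present in both series can be compared.
common = sorted(set(total_users) & set(active_users))
for quarter in common:
    print(f"{quarter}: {active_fraction(quarter):.1%}")
```

The quarters common to both series are 05Q4 and 06Q2, and the computed fractions round to 6.2% and 11.0%. This suggests the two quoted fractions belong to those two quarters.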

Know your users? UK-enabled VOs

Members per VO:
806 atlas, 763 dzero, 577 cms, 566 dteam, 150 lhcb, 131 alice, 75 bio, 65 dteamsgm, 41 esr, 31 ilc, 27 atlassgm, 27 alicesgm, 21 cmsprg, 18 atlasprg, 17 fusn, 15 zeus, 13 dteamprg, 13 cmssgm, 11 hone, 9 pheno, 9 geant, 7 babar, 6 aliceprg, 5 lhcbsgm, 5 biosgm, 3 babarsgm, 2 zeussgm, 2 t2k, 2 geantsgm, 2 cedar, 1 phenosgm, 1 minossgm, 1 lhcbprg, 1 ilcsgm, 1 honesgm, 1 cdf

User Interface
• The GUI is relatively low-level (jobs, file collections)
• Dynamic panels for higher level functions

[Screenshot of the Ganga GUI. Labelled panels: Job details; Logical Folders; Job Monitoring; Log window; Job builder; Scriptor; Dockable windows]

Complex Applications

ATLAS
• GANGA software framework (jointly with LHCb)
• data challenges
• producing Monte Carlo data
• 10 million CPU hours per year

CMS
• Monte Carlo production, data transfer, job submission
• CMS transfers have topped a petabyte a month for the last three months

LHCb
• DIRAC software to submit analysis jobs using Grid
• 2006 analysis job completion efficiency improved to 91%
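To give a feel for the scale of these figures, the sketch below converts them into sustained rates. It assumes a 30-day month, decimal units (1 PB = 10^15 bytes) and CPUs kept busy all year round; none of these assumptions come from the talk, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
HOURS_PER_YEAR = 365 * 24


def average_rate_mb_per_s(petabytes: float, days: float = 30) -> float:
    """Mean transfer rate in MB/s for a volume moved over `days` days."""
    total_bytes = petabytes * 1e15  # decimal petabytes
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


def equivalent_busy_cpus(cpu_hours_per_year: float) -> float:
    """Number of CPUs that would deliver this total if busy all year."""
    return cpu_hours_per_year / HOURS_PER_YEAR


print(f"1 PB/month is about {average_rate_mb_per_s(1):.0f} MB/s sustained")
print(f"10 million CPU hours/year is about {equivalent_busy_cpus(10e6):.0f} busy CPUs")
```

On these assumptions, a petabyte a month corresponds to a few hundred MB/s sustained, and 10 million CPU hours a year to somewhat over a thousand continuously busy CPUs.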

WLCG MoU
• Particle physicists collaborate, play roles and delegate
– e.g. “prg” production group, “sgm” software group managers
• Underpinned by Memoranda of Understanding
• Current MoU signatories: China, France, Germany, Italy, India, Japan, Netherlands, Pakistan, Portugal, Romania, Taiwan, UK, USA
• Pending signatures: Australia, Belgium, Canada, Czech Republic, Nordic, Poland, Russia, Spain, Switzerland, Ukraine
• Negotiation w.r.t. resource and service level
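The “prg” and “sgm” roles also show up as suffixes in the UK-enabled VO list earlier in the talk (atlassgm, atlasprg and so on). The sketch below groups a few of those names by base VO. It assumes the suffix always marks a role account of the VO named before it, which is an inference from the naming pattern rather than something the slides state; `split_role` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# Role suffixes named on the slide.
ROLE_SUFFIXES = {"sgm": "software group managers", "prg": "production group"}


def split_role(name: str) -> Tuple[str, Optional[str]]:
    """Split 'atlassgm' into ('atlas', 'sgm'); plain VO names get role None."""
    for suffix in ROLE_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], suffix
    return name, None


# A few names taken from the UK-enabled VO list.
names = ["atlas", "atlassgm", "atlasprg", "lhcb", "lhcbsgm", "lhcbprg", "cdf"]

roles_by_vo: Dict[str, List[str]] = {}
for name in names:
    vo, role = split_role(name)
    roles_by_vo.setdefault(vo, [])
    if role is not None:
        roles_by_vo[vo].append(role)

for vo, roles in roles_by_vo.items():
    print(vo, roles)
```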

Resource allocation
• Need to assign quotas and priorities to VOs and measure delivery
• VOMS provides group/role information in the proxy
• Tools to control quotas and priorities in site services being developed
– So far only at whole-VO level
– Maui batch scheduler is flexible, easy to map to groups/roles
– Sites set the target shares
– Can publish VO/group-specific values in GLUE schema, hence the RB can use them for scheduling
• Accounting tool (APEL) measures CPU use at global level (UK task)
– Storage accounting currently being added
– GridPP monitors storage across UK
– Privacy issues around user-level accounting, being solved by encryption
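The idea of setting target shares and then measuring delivery can be sketched as a simple comparison of target against delivered CPU fractions. This is not the Maui fair-share algorithm or the APEL interface; the function name and all numbers below are made up for illustration.

```python
from typing import Dict


def share_report(target: Dict[str, float], cpu_hours: Dict[str, float]) -> Dict[str, float]:
    """Delivered share minus target share per VO (positive means over-served)."""
    total_target = sum(target.values())
    total_used = sum(cpu_hours.get(vo, 0.0) for vo in target)
    report = {}
    for vo, weight in target.items():
        wanted = weight / total_target
        delivered = cpu_hours.get(vo, 0.0) / total_used if total_used else 0.0
        report[vo] = delivered - wanted
    return report


# Hypothetical site configuration and accounting totals.
target_shares = {"atlas": 50, "cms": 30, "lhcb": 19, "non_hep": 1}
measured_hours = {"atlas": 6000, "cms": 2500, "lhcb": 1400, "non_hep": 100}

for vo, delta in share_report(target_shares, measured_hours).items():
    print(f"{vo}: {delta:+.1%} relative to target")
```

A scheduler could use such a difference to raise the priority of under-served VOs, although a real fair-share policy would typically also weight recent usage over time.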

User Support
• Becoming vital as the number of users grows
– But modest effort available in the various projects
• Global Grid User Support (GGUS) portal at Karlsruhe provides a central ticket interface
– Problems are categorised
• Tickets are classified by an on-duty Ticket Process Manager, and assigned to an appropriate support unit
– UK (GridPP) contributes support effort
• GGUS has a web-service interface to ticketing systems at each ROC
– Other support units are local mailing lists
– Mostly best-effort support, working hours only
• Currently ~tens of tickets/week
– Manageable, but may not scale much further
– Some tickets slip through the net

Documentation & Training
• Need documentation and training for both system managers and users
– Mostly expert users up to now, but user community is expanding
– Induction of new VOs is a particular problem – no peer support
– EGEE is running User Fora for users to share experience
   • Next in Manchester in May ’07 (with OGF)
– EGEE has a dedicated training activity run by NeSC/Edinburgh
• Documentation is often a low priority, little dedicated effort
– The rapid pace of change means that material requires constant review
• Effort on documentation is now increasing
– GridPP has appointed a documentation officer
   • GridPP web site, wiki
– Installation manual for admins is good
   • There is also a wiki for admins to share experience
– Focus is now on user documentation
   • New EGEE web site – coming soon

Alternative view?

• The number of users in the Grid School for the Gifted is ~manageable now

• The system may be too complex, requiring too much work by the “average user”?

• Or the (virtual) help desk may not be enough?

• Or the documentation may be misleading?

• Or..

• Having smart users helps (the current ones are)