GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Page 1: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

GridPP Deployment Status, User Status and Future Outlook

Tony Doyle

Page 2: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

INFNGrid Meeting, 20 December 2006. Tony Doyle - University of Glasgow

Introduction

A. What is the deployment status?
B. Is the system usable?
C. What is the future of GridPP?

Wot no middleware?

Page 3: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

GridPP Middleware is..

Security

Network Monitoring

Information Services

Grid Data Management

Storage Interfaces

Workload Management

Middleware

Page 4: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

e.g. LCG monitoring applet

• Monitor:
  – resource brokers
  – virtual organisations
    • ATLAS
    • CMS
    • LHCb
    • DTeam
    • Other

• SQL queries to logging and book-keeping database

Middleware

Page 5: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

e.g. APEL and R-GMA

R-GMA structure

• used in accounting system (GOCDB)

• For gLite the sensors are provided by DGAS via DGAS2APEL

• the EGEE portal for accounting data is provided by CESGA

Middleware

Page 6: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Resources

[Figure: UK total published job slots versus date, June 2004 to October 2006; vertical axis 0 to 7000 job slots.]

17/12/06: EGEE total slots 34141 => UKI is 6949, ~20% of the total
17/12/06: EGEE jobs running 21291 => UKI is 2912, ~14% of jobs
Max EGEE = 42517, Max UKI = 8176
(N.B. hyperthreading distorts 1:1 job:CPU core relation – reduces UKI numbers by ~500)

http://goc.grid.sinica.edu.tw/gstat/UKI.html

Sunday’s STATUS

        totalCPU  freeCPU  runJob  waitJob  seAvail TB  seUsed TB  maxCPU  avgCPU
Total       6949     3210    2912    77321         246        313    8176    6716
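The regional shares quoted above follow directly from the gstat snapshot. A minimal Python sketch of that arithmetic, using only the figures on this slide (the helper name `share` is illustrative and not part of any gstat tool):

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

# Snapshot of 17/12/06, as quoted on the slide.
EGEE_SLOTS, UKI_SLOTS = 34141, 6949
EGEE_RUNNING, UKI_RUNNING = 21291, 2912

slot_share = share(UKI_SLOTS, EGEE_SLOTS)      # UKI fraction of EGEE job slots
job_share = share(UKI_RUNNING, EGEE_RUNNING)   # UKI fraction of running jobs

print(f"UKI share of job slots:    {slot_share:.1f}%")
print(f"UKI share of running jobs: {job_share:.1f}%")
```

The hyperthreading caveat is not modelled here; subtracting ~500 from the UKI slot count would lower the first figure slightly.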

Steady climb since 2004 towards target of ~10,000 CPU (cores) (~job slots)

Page 7: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php

Resources

2006 CPU Usage by Region

Via APEL accounting

Page 8: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

(not all records are being accounted)

http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php

Resources

Page 9: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

2006 CPU Usage by experiment

http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php

Resources

Total CPU used 52,876,788 kSI2k-hours!

Page 10: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

[Figure: % job slots used versus date, June 2004 to October 2006; vertical axis 0% to 120%; series: % EGEE slots used, % UK slots used.]

(Estimated utilisation based on gstat job slots/usage)

UKI mirrors overall EGEE utilisation
Average utilisation for Q3 2006: 66%
Compared to target of ~70%
CPU utilisation was a T2 issue, but now improving..

Utilisation

Page 11: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

(measured by UK Tier-1 for all VOs)

~90% CPU efficiency due to i/o bottlenecks is OK
Concern that this is currently ~75%

Efficiency

Each experiment needs to work to improve their system/deployment practice, anticipating e.g. hanging gridftp connections during batch work

[Figure: CPU efficiency plot with the target level marked.]
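CPU efficiency is taken here to mean CPU time divided by wall-clock time, the usual batch-system definition; the slide does not spell out the formula, so treat that as an assumption. The job records below are hypothetical and only illustrate how one job stalled on I/O (for example a hanging gridftp connection) pulls the aggregate figure down.

```python
from dataclasses import dataclass

@dataclass
class JobRecord:
    cpu_seconds: float    # CPU time consumed by the job
    wall_seconds: float   # wall-clock time the job held its slot

def cpu_efficiency(jobs: list[JobRecord]) -> float:
    """Aggregate efficiency: total CPU time over total wall-clock time."""
    total_wall = sum(j.wall_seconds for j in jobs)
    if total_wall == 0:
        return 0.0
    return sum(j.cpu_seconds for j in jobs) / total_wall

# Hypothetical records: two healthy jobs and one stalled on a transfer.
jobs = [
    JobRecord(cpu_seconds=3400, wall_seconds=3600),
    JobRecord(cpu_seconds=3300, wall_seconds=3600),
    JobRecord(cpu_seconds=600, wall_seconds=3600),
]
efficiency = cpu_efficiency(jobs)
print(f"CPU efficiency: {efficiency:.0%}")
```

Summing before dividing weights long jobs more heavily than short ones, which seems the appropriate choice when the question is how well the occupied slots were used.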

Page 12: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

(is still an issue for Tier-1 and Tier-2s)

http://www.gridpp.ac.uk/storage/status/gridppDiscStatus.html

• Utilisation is low (~30%) at T2s and accounting [by VO] is not (yet) there

Storage

Page 13: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

GOCDB Accounting Display - under development

• Looking at data for RAL-LCG2
• Storage units are 1TB = 10^6 MB
• Tape Used + Disk Used = Total

Sensor Drop Outs have been fixed

[Figure: Total Used Storage (TB), split into Tape Used and Disk Used.]
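The unit convention and the tape-plus-disk total on this slide can be written down in a few lines. This is only a sketch of the arithmetic, not code from the GOCDB display; the function names and the sample readings are hypothetical.

```python
MB_PER_TB = 10**6  # the display's convention: 1 TB = 10^6 MB

def mb_to_tb(megabytes: float) -> float:
    """Convert a value in MB to TB using the display's convention."""
    return megabytes / MB_PER_TB

def total_used_tb(tape_used_mb: float, disk_used_mb: float) -> float:
    """Total used storage in TB: tape used plus disk used."""
    return mb_to_tb(tape_used_mb) + mb_to_tb(disk_used_mb)

# Hypothetical sensor readings in MB, for illustration only.
print(total_used_tb(tape_used_mb=150_000_000, disk_used_mb=50_000_000))
```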

Storage

Page 14: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

• SRM at T1: ~200TB of disk (deployment problem in 2006)
  – ~100% usage (problem for 2006 service challenges)
  – Castor 2.1

• SRM at all T2s: ~200TB of disk in total
  – ~30% usage: difficult to calculate
  – dCache 1.7.0 and DPM v1.5.10
  – Dedicated disk servers advised (storage should be robust)

• Need to make sure sites are running the latest GIP plugins (https://twiki.cern.ch/twiki/bin/view/EGEE/GIP-Plugins)

• New GOC storage accounting system being put in place
  – being deployed at Tier-2s
• SRM v2.2 is being implemented: need to test interoperability

Storage

Page 15: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

(individual rates)

[Figure: transfer rate in Mb/s (axis 0 to 900) per site: Lancaster, RALPP, Birmingham, Glasgow, Edinburgh, Oxford, Manchester, Sheffield, Cambridge, Bristol, UCL-CENTRAL, Durham, QMUL, IC-HEP, IC-LeSC, UCL-HEP, RHUL, Brunel, Liverpool; series: Inbound Q1, Inbound Q3, Outbound Q1, Outbound Q3.]

Aim: to maintain data transfers at a sustainable level as part of experiment service challenges

http://www.gridpp.ac.uk/wiki/Service_Challenge_Transfer_Test_Summary

File Transfers

Current goals:
>250Mb/s inbound-only
>250Mb/s outbound-only
>200Mb/s inbound and outbound
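A site's measured rates can be checked against these goals mechanically. The sketch below assumes the third goal means both directions exceeding 200 Mb/s during a simultaneous test; the function name and the sample measurements are hypothetical.

```python
# Goals from the slide, in Mb/s.
GOALS_MBPS = {
    "inbound_only": 250,
    "outbound_only": 250,
    "simultaneous": 200,   # assumed to apply to both directions at once
}

def meets_goals(inbound_only: float, outbound_only: float,
                sim_inbound: float, sim_outbound: float) -> dict[str, bool]:
    """Compare one site's measured rates (Mb/s) with the current goals."""
    return {
        "inbound_only": inbound_only > GOALS_MBPS["inbound_only"],
        "outbound_only": outbound_only > GOALS_MBPS["outbound_only"],
        # The slower direction decides whether the simultaneous goal is met.
        "simultaneous": min(sim_inbound, sim_outbound) > GOALS_MBPS["simultaneous"],
    }

# Hypothetical measurements for a single site.
result = meets_goals(inbound_only=310, outbound_only=240,
                     sim_inbound=220, sim_outbound=205)
print(result)
```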

Page 16: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

• Approval for new (shared) machine room – ETA Summer 2008. Space for 300 racks.

• Procurement
  – March 06: 52 AMD 270 units, 21 disk servers (168TB data capacity)
  – FY 06/07: 47 disk servers (282TB disk capacity), 64 twin dual-core Intel Woodcrest 5130 units (550 kSI2k)
  – FY 06/07 upcoming: further 210 TB disk capacity plus high-availability systems (redundant PSUs, hot-swappable paired HDDs)

• Storage commissioning saga
  – Ongoing problems with the March kit; firmware updates have now solved the problem. (Disks on Areca 1170 in RAID 6 experienced multiple dropouts during testing of WD drives)

• Move to CASTOR
  – Very support heavy but made available for CSA06 and performing well

General
- Air-con problems with high temperatures triggering high-pressure cut-outs in refrigerator gas circuits (summers are warmer even in the UK...)
- July security incident
- 10Gb CERN line in place. Second 10Gb line scheduled in 07Q1

Tier-1 Resource

Page 17: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

e.g. Glasgow: UKI-SCOTGRID-GLASGOW

• 800 kSI2k
• 100 TB DPM

Needed for LHC start-up

August 28

September 1

October 13

October 23

T2 Resources

IC-HEP
• 440 kSI2k
• 52 TB dCache

Brunel
• 260 kSI2k
• 5 TB DPM

Page 18: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Could also be 2006

T2 Resources

As overheard at one T2 site..

Page 19: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

A. “Usability” (Prequel)

• GridPP runs a major part of the EGEE/LCG Grid, which supports ~3000 users
• The Grid is not (yet) as transparent as end-users want it to be
• The underlying overall failure rate is ~10%
• User (interface)s, middleware and operational procedures (need to) adapt
• Procedures to manage the underlying problems such that the system is usable are highlighted

Page 20: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Virtual Organisations

• Users are grouped into VOs
  – Users/VO varies from 1 to 806 members (and growing..)

• Broadly four classes of VO
  – LHC experiments
  – EGEE supported
  – Worldwide (mainly non-LHC particle physics)
  – Local/regional e.g. UK PhenoGrid

• Sites can choose which VOs to support, subject to MOU/funding commitments
  – Most GridPP sites support ~20 VOs
  – GridPP nominally allocates 1% of resources to EGEE non-HEP VOs
  – GridPP currently contributes 30% of the EGEE CPU resources

Page 21: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

User evolution

Number of users of the UK Grid (exc. Deployment Team)

Quarter:  05Q4  06Q2  06Q3
Value:    1342  1831  2777

Many EGEE VOs supported, c.f. 3000 EGEE target

Number of active users (> 10 jobs per month)

Quarter:  05Q4  06Q1  06Q2
Value:      83   166   201
Fraction: 6.2% (05Q4), 11.0% (06Q2)

Viewpoint: growing fairly rapidly, but not as active as they could be? Depends on the “active” definition.
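The two fractions quoted above can be reproduced from the user counts on this slide. A fraction exists only for quarters that appear in both series, which is why there are two values rather than three. A short sketch (variable names are illustrative):

```python
# Figures from the slide, keyed by quarter.
registered = {"05Q4": 1342, "06Q2": 1831, "06Q3": 2777}
active = {"05Q4": 83, "06Q1": 166, "06Q2": 201}

# A fraction can only be formed for quarters present in both series.
fractions = {
    q: 100.0 * active[q] / registered[q]
    for q in sorted(active.keys() & registered.keys())
}

for quarter, pct in fractions.items():
    print(f"{quarter}: {pct:.1f}% active")
```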

Page 22: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Know your users? UK-enabled VOs

806 atlas, 763 dzero, 577 cms, 566 dteam, 150 lhcb, 131 alice, 75 bio, 65 dteamsgm, 41 esr, 31 ilc, 27 atlassgm, 27 alicesgm, 21 cmsprg, 18 atlasprg, 17 fusn, 15 zeus, 13 dteamprg, 13 cmssgm, 11 hone, 9 pheno, 9 geant, 7 babar, 6 aliceprg, 5 lhcbsgm, 5 biosgm, 3 babarsgm, 2 zeussgm, 2 t2k, 2 geantsgm, 2 cedar, 1 phenosgm, 1 minossgm, 1 lhcbprg, 1 ilcsgm, 1 honesgm, 1 cdf

Page 23: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Resource allocation

• Assign quotas and priorities to VOs and measure delivery, but further work required on VOMS-roles/groups within each VO

• VOMS provides group/role information in the proxy

• Tools to control quotas and priorities in site services being developed
  – So far only at whole-VO level
  – Maui batch scheduler is flexible, easy to map to groups/roles
  – Sites set the target shares
  – Can publish VO/group-specific values in GLUE schema, hence the RB can use them for scheduling

• Accounting tool (APEL) measures CPU use at global level (UK task)
  – Storage accounting currently being added
  – GridPP monitors storage across UK
  – Privacy issues around user-level accounting, being solved by encryption
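"Measure delivery" against site-set target shares amounts to comparing each VO's fraction of delivered CPU time with its target. The sketch below shows only that comparison; it is not Maui's fair-share algorithm nor APEL code, and the VO shares and CPU-hour totals are hypothetical.

```python
def delivered_shares(cpu_hours: dict[str, float]) -> dict[str, float]:
    """Fraction of the site's delivered CPU time used by each VO."""
    total = sum(cpu_hours.values())
    if total == 0:
        return {vo: 0.0 for vo in cpu_hours}
    return {vo: hours / total for vo, hours in cpu_hours.items()}

def share_deficit(target: dict[str, float],
                  cpu_hours: dict[str, float]) -> dict[str, float]:
    """Target share minus delivered share; positive means under-served."""
    delivered = delivered_shares(cpu_hours)
    return {vo: target[vo] - delivered.get(vo, 0.0) for vo in target}

# Hypothetical site policy and accounting totals (not from the slides).
target = {"atlas": 0.40, "cms": 0.30, "lhcb": 0.20, "other": 0.10}
used = {"atlas": 500.0, "cms": 300.0, "lhcb": 100.0, "other": 100.0}

deficits = share_deficit(target, used)
for vo, gap in deficits.items():
    print(f"{vo}: {gap:+.2f}")
```

The same comparison could be made per VOMS group or role once accounting is available below whole-VO level.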

Page 24: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

User Support

• Becoming vital as the number of users grows
  – But modest effort available in the various projects

• Global Grid User Support (GGUS) portal at Karlsruhe provides a central ticket interface
  – Problems are categorised

• Tickets are classified by an on-duty Ticket Process Manager, and assigned to an appropriate support unit
  – UK (GridPP) contributes support effort

• GGUS has a web-service interface to ticketing systems at each ROC
  – Other support units are local mailing lists
  – Mostly best-effort support, working hours only

• Currently ~tens of tickets/week
  – Manageable, but may not scale much further
  – Some tickets slip through the net

Page 25: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Documentation & Training

• Need documentation and training for both system managers and users
  – Mostly expert users up to now, but user community is expanding
  – Induction of new VOs is a particular problem – no peer support
  – EGEE is running User Fora for users to share experience
    • Next in Manchester in May ’07 (with OGF)
  – EGEE has a dedicated training activity run by NeSC/Edinburgh

• Documentation is often a low priority, little dedicated effort
  – The rapid pace of change means that material requires constant review

• Effort on documentation is now increasing
  – GridPP has appointed a documentation officer
    • GridPP web site, wiki
  – Installation manual for admins is good
    • There is also a wiki for admins to share experience
  – Focus is now on user documentation
    • New EGEE web site – coming soon

Page 26: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Alternative view?

• The number of users in the Grid School for the Gifted is ~manageable now

• The system may be too complex, requiring too much work by the “average user”?

• Or the (virtual) help desk may not be enough?

• Or the documentation may be misleading?

• Or..

• Having smart users helps (the current ones are)

Page 27: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Timeline – 1

Proposal Writing, Proposal Defence

Apr May Jun Jul Aug Sep Oct

31st March – PPARC Call
16th June – GridPP16 at QMUL
13th July – Bid Submitted
6th September – 1st PPRP review
1st November – GridPP17
8th November – PPRP “visiting panel”

CB OC CB

Future?

Year-long process to define future LHC exploitation

http://www.gridpp.ac.uk/docs/gridpp3/

Page 28: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Scenario Planning – Resource Requirements [TB, kSI2k]

GridPP requested a fair share of global requirements, according to experiment requirements

Changes in the LHC schedule prompted a(nother) round of resource planning - presented to CRRB on Oct 24th

New UK resource requirements have been derived and incorporated in the scenario planning e.g. Tier-1

Tier1 CPU [kSI2k]    2008    2009    2010
ALICE               10230   18430   22930
ATLAS               18123   28423   49573
CMS                 12400   16900   36900
LHCb                 1770    4870    6740
TOTAL               42523   68623  116143

Tier1 Disk [TB]      2008    2009    2010
ALICE                5220    7940    9870
ATLAS                9939   19686   39488
CMS                  5600    8500   13700
LHCb                 1025    2759    3250
TOTAL               21784   38885   66308

Tier1 Tape [TB]      2008    2009    2010
ALICE                7030   13980   20930
ATLAS                7694   14950   28698
CMS                 13100   23500   36600
LHCb                  860    3070    5864
TOTAL               28684   55500   92092
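The TOTAL rows can be cross-checked by summing the per-experiment requirements for each year. The sketch below copies the figures from the tables above; the unit labels assume that the "[TB, kSI2k]" in the slide title means CPU in kSI2k and disk and tape in TB.

```python
YEARS = (2008, 2009, 2010)

# Tier-1 requirements per experiment, copied from the slide.
requirements = {
    "CPU [kSI2k]": {
        "ALICE": (10230, 18430, 22930),
        "ATLAS": (18123, 28423, 49573),
        "CMS": (12400, 16900, 36900),
        "LHCb": (1770, 4870, 6740),
    },
    "Disk [TB]": {
        "ALICE": (5220, 7940, 9870),
        "ATLAS": (9939, 19686, 39488),
        "CMS": (5600, 8500, 13700),
        "LHCb": (1025, 2759, 3250),
    },
    "Tape [TB]": {
        "ALICE": (7030, 13980, 20930),
        "ATLAS": (7694, 14950, 28698),
        "CMS": (13100, 23500, 36600),
        "LHCb": (860, 3070, 5864),
    },
}

# TOTAL rows as quoted on the slide.
quoted_totals = {
    "CPU [kSI2k]": (42523, 68623, 116143),
    "Disk [TB]": (21784, 38885, 66308),
    "Tape [TB]": (28684, 55500, 92092),
}

def column_totals(table: dict[str, tuple[int, ...]]) -> tuple[int, ...]:
    """Sum each year's column over all experiments."""
    return tuple(sum(column) for column in zip(*table.values()))

for resource, table in requirements.items():
    totals = column_totals(table)
    status = "ok" if totals == quoted_totals[resource] else "MISMATCH"
    print(resource, dict(zip(YEARS, totals)), status)
```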

Page 29: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Input to Scenario Planning – Hardware Costing

• Empirical extrapolations with extrapolated (large) uncertainties
• Hardware prices have been re-examined following recent Tier-1 purchase
• CPU (Woodcrest) was cheaper than expected based on extrapolation of previous 4 years of data

[Figure: CPU Costs. Ln(K£/kSI2k) versus date, January 2002 to March 2010; series: past CPU purchases, best fit to past purchases, 29 month extrapolation, 20 month extrapolation, future price estimates (max, min), GridPP3 submission, CERN.]

[Figure: Disk Costs. Ln(K£/TB) versus date, January 2002 to March 2010; series: past purchases, best fit (21.7 months), 24 months, 19 months, future price estimates (upper limit, lower limit), GridPP3 proposal, CERN.]
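A straight line on a plot of ln(price) against date corresponds to an exponential price decline. The sketch below assumes that the month figures on the plots (19, 21.7 and 24 months for disk) are price-halving times; the slides do not state this explicitly, and the starting price is a hypothetical placeholder.

```python
import math

def extrapolate_price(price_now: float, months_ahead: float,
                      halving_months: float) -> float:
    """Exponential decline: the price halves every `halving_months`."""
    return price_now * 0.5 ** (months_ahead / halving_months)

def log_slope_per_month(halving_months: float) -> float:
    """Slope of ln(price) versus time for a given halving time."""
    return -math.log(2) / halving_months

# Hypothetical starting price in thousands of pounds per TB.
PRICE_NOW = 1.0

# Halving times read from the disk plot labels (an assumption, see above).
for halving in (19.0, 21.7, 24.0):
    projected = extrapolate_price(PRICE_NOW, months_ahead=24,
                                  halving_months=halving)
    print(f"halving time {halving} months: "
          f"{projected:.3f} kGBP/TB after 24 months")
```

Running the faster and slower halving times side by side gives the kind of upper and lower price envelope that the slide describes as large extrapolated uncertainties.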

Page 30: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Scenario Planning

An example 70% scenario based on Experiment Inputs [£m]

                                 GridPP3 Proposal   70% Scenario
WG  Area         Item            Cost               Frac   Cost
A   Tier-1       Staff            4.99              93%    4.62
A   Tier-1       Hardware        11.72              85%    7.20
B   Tier-2       Staff            3.29              89%    2.94
B   Tier-2       Hardware         5.12              85%    4.35
C   Support      Staff            4.50              69%    3.10
D   Operations   Staff            1.89              88%    1.66
E   Management   Staff            1.17              90%    1.06
F   Outreach     Staff            0.37              74%    0.28
G   Travel etc   Other            0.84              75%    0.63
    (sum of items A to G)        33.89                    25.84
    Working Allowance             1.25               0%    0.00
    Project Cost                 35.14                    25.84
    Contingency                   4.15                     5.17
    Running Costs                 2.50                     2.50
    Full Approval Cost           41.79                    33.51
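The cost roll-up above can be checked by summing the items and adding the working allowance, contingency and running costs. The sketch uses Python's `decimal` module so that the two-decimal figures add exactly; the roll-up order is inferred from the row labels on the slide, and the Frac column is not recomputed because the slide does not define it.

```python
from decimal import Decimal as D

# (GridPP3 proposal, 70% scenario) item costs in £m, copied from the slide.
ITEMS = [
    (D("4.99"), D("4.62")),    # A Tier-1 staff
    (D("11.72"), D("7.20")),   # A Tier-1 hardware
    (D("3.29"), D("2.94")),    # B Tier-2 staff
    (D("5.12"), D("4.35")),    # B Tier-2 hardware
    (D("4.50"), D("3.10")),    # C Support
    (D("1.89"), D("1.66")),    # D Operations
    (D("1.17"), D("1.06")),    # E Management
    (D("0.37"), D("0.28")),    # F Outreach
    (D("0.84"), D("0.63")),    # G Travel etc
]
WORKING_ALLOWANCE = (D("1.25"), D("0.00"))
CONTINGENCY = (D("4.15"), D("5.17"))
RUNNING_COSTS = (D("2.50"), D("2.50"))

def project_cost(column: int) -> D:
    """Sum of items A to G plus the working allowance."""
    return sum(row[column] for row in ITEMS) + WORKING_ALLOWANCE[column]

def full_approval_cost(column: int) -> D:
    """Project cost plus contingency and running costs."""
    return project_cost(column) + CONTINGENCY[column] + RUNNING_COSTS[column]

print("GridPP3 proposal:", project_cost(0), full_approval_cost(0))
print("70% scenario:    ", project_cost(1), full_approval_cost(1))
```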

Page 31: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Timeline – 2

Nov Dec Jan Feb Mar Apr May

8th Nov – PPRP Visiting Panel
6th Dec – PPRP recommend to SC

PPARC Council

Science Committee

Grants etc.

GridPP2+ outcome (1/9/07-31/3/08) now known: emphasis on operations (modest middleware support)

GridPP3 outcome (1/4/08-31/3/11) expected to be known in the New Year

Back to the Future?

Page 32: GridPP Deployment Status, User Status and Future Outlook Tony Doyle

Conclusion

A. What is the deployment status? (snapshot)

See e.g. “Performance of the UK Grid for Particle Physics” http://www.gridpp.ac.uk/papers/GridPP_IEEE06.pdf for more info.

B. Is the system usable?
Yes, but more work required from end-user perspective

C. What is the future of GridPP?
Operations-led activity, working with EGEE/EGI (EU) and NGS (UK)