GridPP: Executive Summary

Tony Doyle

Oversight Committee, 8 February 2007, Tony Doyle - University of Glasgow

Outline
• Exec2 Summary
• Grid status
• High level view
• 2006 Outturn
• Performance Monitoring
• Outlook for 2007
• Beyond GridPP2

Exec2 Summary
• 2006 was the second full year for the UK Production Grid
• More than 5,000 CPUs and more than 1/2 Petabyte of disk storage
• The UK is the largest CPU provider on the EGEE Grid, with total CPU used of 15 GSI2k-hours in 2006
• The GridPP2 project has met 69% of its original targets with 92% of the metrics within specification
• The initial LCG Grid Service is now starting and will run for the first 6 months of 2007
• The aim is to continue to improve reliability and performance ready for startup of the full Grid service on 1st July 2007
• The GridPP2 project has been extended by 7 months to April 2008
• The outcome of the GridPP3 proposal to PPARC is awaited
• We anticipate a challenging period from Sept. 2007 onwards

Grid Overview
Aim: by 2008 (full year’s data taking)
- CPU ~100MSI2k (100,000 CPUs)
- Storage ~80PB
- Involving >100 institutes worldwide
- Build on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)

1. Prototype went live in September 2003 in 12 countries
2. Extensively tested by the LHC experiments in September 2004
3. February 2006: 25,547 CPUs, 4398 TB storage

Status in February 2007: 177 sites, 32,412 CPUs, 13,282 TB storage
Monitoring via Grid Operations Centre

Resources
[Figure: 2006 CPU usage by region, via APEL accounting]
http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_egee.php

2006 Outturn
Definitions:
• "Promised" is the total that was planned at the Tier-1/A (in the March 2005 planning) and Tier-2s (in the October 2004 Tier-2 MoU) for CPU and storage
• "Delivered" is the total that was physically installed for use by GridPP, including LCG and SAMGrid at Tier-2 and LCG and BaBar at Tier-1/A
• "Available" is available for LCG Grid use, i.e. declared via the EGEE mechanisms with storage via an SRM interface
• "Used" is as accounted for by the Grid Operations Centre

Resources Delivered

              CPU (KSI2K)                   Storage (TB)
              Promised  Delivered  Ratio    Promised  Delivered  Ratio
Brunel             155        480   310%          21        6.3    30%
Imperial          1165        807    69%        93.3       60.3    65%
QMUL               917       1209   132%        58.5         18    31%
RHUL               204        163    80%        23.2        8.8    38%
UCL                 60        121   202%         0.7        1.1   149%
Lancaster          510        484    95%        86.7         72    83%
Liverpool          605        592    98%        80.3        2.8     3%
Manchester        1305       1840   141%       372.6        145    39%
Sheffield          183        183   100%           3          2    67%
Durham              86         99   115%           5          4    79%
Edinburgh            7         11   152%        70.5         20    28%
Glasgow            246        800   325%        14.8         40   270%
Birmingham         196        223   114%         9.3        9.3   100%
Bristol             39         12    31%         1.9        3.8   200%
Cambridge           33         40   123%         4.4        4.4   101%
Oxford             414        150    36%        24.5         27   110%
RAL PPD            199        320   161%        17.4       66.1   381%

London            2501       2780   111%       196.7       94.4    48%
NorthGrid         2602       3099   119%       542.6      221.8    41%
ScotGrid           340        910   268%        90.3         64    71%
SouthGrid          880        745    85%        57.5      110.6   192%
Total             6322       7534   119%       887.1      490.8    55%

Tier-1            1604       1034    64%        1495        712    48%
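The Ratio columns in the table above appear to be delivered capacity divided by promised capacity. As a minimal sketch, the following Python reproduces the quoted percentages for the Total and Tier-1 rows only; the function name ratio_percent is illustrative and does not come from the slides.

```python
def ratio_percent(delivered: float, promised: float) -> int:
    """Delivered capacity as a percentage of promised capacity, rounded."""
    return round(100 * delivered / promised)


# (promised, delivered) pairs copied from the "Total" and "Tier-1" rows.
rows = {
    "Tier-2 total CPU (KSI2K)": (6322, 7534),
    "Tier-2 total storage (TB)": (887.1, 490.8),
    "Tier-1 CPU (KSI2K)": (1604, 1034),
    "Tier-1 storage (TB)": (1495, 712),
}

for label, (promised, delivered) in rows.items():
    print(f"{label}: {ratio_percent(delivered, promised)}%")
```

The same check shows at a glance where the shortfall sits: Tier-2 CPU is over-delivered, while storage at both tiers is roughly half of what was promised.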

Tier-1 and Tier-2 total delivery is impressive and usage is improved

Available

CPU: 8.5 MSI2k

Storage: 1.7 PB

Disk: 0.54 PB

Delivery of Tier-1 disk

Used

CPU: 15 GSI2k-hours

Disk: 0.26 PB

Usage of Tier-2 CPU, disk

Request: PPARC acceptance of the 2006 outturn

LCG CPU Usage

                Available (KSI2K)       Used (KSI2K Hours)                          Ratio
                1Q06  2Q06  3Q06  4Q06       1Q06       2Q06       3Q06       4Q06     1Q06     2Q06     3Q06     4Q06
Brunel           116   116   116   480     12,811    105,014    159,082    643,806    0.70%   41.30%   62.60%   61.20%
Imperial         203   203   203   642     16,828     83,627     82,593    557,943    2.20%   18.80%   18.60%   39.70%
QMUL             281  1209  1209  1209    214,335    612,564    459,427  1,259,446   13.70%   23.10%   17.40%   47.60%
RHUL             163   163   163   163     25,085     21,940    176,046    147,360   17.20%    6.10%   49.30%   41.30%
UCL              121   121   121   121     42,217     51,106     73,763    156,576   16.00%   19.30%   27.80%   59.10%
Lancaster        485   476   473   473    248,463    402,774    210,432    297,550   14.90%   38.60%   20.30%   28.70%
Liverpool        572   592   592   592     56,218    455,727     40,551    164,222   11.80%   35.20%    3.10%   12.70%
Manchester       480   720  1152  1840    380,857  1,042,154    248,704    370,567    0.30%   66.10%    9.90%    9.20%
Sheffield        240   183   183   183     38,411     59,860     78,795    127,039    2.00%   15.00%   19.70%   31.80%
Durham            72    80    80    80     36,699     58,185     33,671     59,123    4.30%   33.20%   19.20%   33.70%
Edinburgh          6     6     6     6     14,829      4,637      3,641      4,918   34.40%   35.30%   27.70%   37.40%
Glasgow          104   104    47   800     75,774     50,462     72,105    155,986    6.30%   22.20%   70.10%    8.90%
Birmingham        62    23    23    23     38,473     31,795     28,299     53,930   13.20%   62.00%   55.20%  105.20%
Bristol            1     7     7     7        842      7,208      8,982      6,593   15.10%   45.70%   57.00%   41.80%
Cambridge         40    38    38    38        654      2,228      2,442      1,811    0.00%    2.70%    2.90%    2.20%
Oxford            67    65    65    65     94,093     92,841     82,284     63,959   34.30%   65.40%   58.00%   45.10%
RAL PPD           73    73   320   320    106,919    132,046    143,648    235,172   59.90%   82.10%   20.50%   33.60%

London           884  1812  1812  2615    311,276    874,251    950,911  2,765,131   10.30%   22.00%   24.00%   48.30%
NorthGrid       1777  1971  2399  3087    723,949  1,960,515    578,482    959,378   10.10%   45.40%   11.00%   14.20%
ScotGrid         182   190   133   886    127,302    113,284    109,417    220,027    6.40%   27.20%   37.60%   11.30%
SouthGrid        243   207   453   453    240,981    266,118    265,655    361,465   31.00%   58.80%   26.80%   36.40%
Total Tier-2    3086  4179  4797  7041  1,403,508  3,214,168  1,904,465  4,306,001   12.20%   35.10%   18.10%   27.90%

Tier-1           415   620   651   848    624,636  1,089,917  1,393,022    992,106   68.70%   80.20%   97.60%   53.40%
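The slides do not state how the Ratio column of the LCG CPU Usage table is derived. A reading that reproduces the Total Tier-2 and Tier-1 figures for 4Q06 is used KSI2K-hours divided by available KSI2K times the hours in a quarter, taking a quarter as 365 × 24 / 4 = 2190 hours. That quarter length is an inference from the numbers, not something the source states, and it has been checked here only against the two rows shown. The sketch also sums the quarterly usage to compare with the "15 GSI2k-hours" quoted in the summary.

```python
# Assumption: a quarter is treated as a quarter of a 365-day year (2190 hours).
HOURS_PER_QUARTER = 365 * 24 / 4


def usage_ratio(used_ksi2k_hours: float, available_ksi2k: float) -> float:
    """Fraction of the available capacity that was used during one quarter."""
    return used_ksi2k_hours / (available_ksi2k * HOURS_PER_QUARTER)


# Used KSI2K-hours per quarter (1Q06..4Q06), from the two total rows.
tier2_used = [1_403_508, 3_214_168, 1_904_465, 4_306_001]
tier1_used = [624_636, 1_089_917, 1_393_022, 992_106]

tier2_4q06 = usage_ratio(tier2_used[3], 7041)  # 7041 KSI2K available in 4Q06
tier1_4q06 = usage_ratio(tier1_used[3], 848)   # 848 KSI2K available in 4Q06

# 1 GSI2k-hour = 1e6 KSI2K-hours.
total_gsi2k_hours = (sum(tier2_used) + sum(tier1_used)) / 1e6

print(f"Tier-2 4Q06 ratio: {tier2_4q06:.1%}")
print(f"Tier-1 4Q06 ratio: {tier1_4q06:.1%}")
print(f"Total 2006 usage: {total_gsi2k_hours:.1f} GSI2k-hours")
```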

Efficiency
(measured by UK Tier-1 for all VOs)

[Figure: CPU efficiency, with the target level marked]

• ~90% CPU efficiency is OK, given i/o bottlenecks
• Concern that this is currently ~75%
• Each experiment needs to work to improve its system/deployment practice, anticipating e.g. hanging gridftp connections during batch work
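The slide does not define CPU efficiency. A common definition, assumed here, is CPU time divided by wall-clock time for a job, so a job stalled on a hanging transfer accumulates wall-clock time without CPU time and pulls the figure down. The sketch below uses that assumed definition; the job timings are hypothetical and serve only to show the ~90% target and the ~75% level mentioned above.

```python
TARGET_EFFICIENCY = 0.90  # the ~90% level described as acceptable on the slide


def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """CPU time over wall-clock time (assumed definition, not from the slides)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def meets_target(cpu_seconds: float, wall_seconds: float) -> bool:
    return cpu_efficiency(cpu_seconds, wall_seconds) >= TARGET_EFFICIENCY


# Hypothetical one-hour jobs: the second spends 15 minutes waiting on i/o.
for cpu, wall in [(3240, 3600), (2700, 3600)]:
    print(f"{cpu_efficiency(cpu, wall):.0%} meets target: {meets_target(cpu, wall)}")
```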

Utilisation
(Tier-1 CPUs brought online on Jan 10)
• Tier-1 CPU fully utilised throughout 2006 (Grid & non-Grid)
• Added 64 Intel twin dual-core Woodcrests on Jan 10
• Busy with Grid jobs within 30 minutes

Utilisation
[Figure: % job slots used (scale 0% to 120%) against date, June 2004 to October 2006. Series: % EGEE slots used, % UK slots used]
(Estimated utilisation based on gstat job slots/usage)
• UKI mirrors overall EGEE utilisation
• Average utilisation for 3Q06: 66%, compared to target of ~70%
• CPU utilisation was a major T2 issue, but now improving

CPU by experiment

          Used at Tier-2 (KSI2K Hours)        Used at Tier-1 (KSI2K Hours)
             1Q06     2Q06     3Q06     4Q06     1Q06     2Q06     3Q06     4Q06
ALICE                    0      432      187     9031    17122    30139    36757
ATLAS      852014  1131060   569879   800194   156114   323195   256979   253869
CMS        125794   236409   489122   368427    77025   176198   407072   170784
LHCb       164339  1237858   718268  1072838    21210   404707   634341   396417
BaBar       41854    72932     9159    31454   254775    61636    15853      501
CDF                      1        5       17
D0          93373   102602    40541   221069    95963    53091    22433    27515
H1           3851    44018    23013     8460     3058    17083    18459       80
ZEUS         4965    23170     4140    60736     6906    20953     1353    19815
Other      115407   232299     4867    11341      548    15932     6384      478
LHC       1142147  2605327  1777701  2241646   263380   921222  1328531   857827
Total     1401597  3080349  1859426  2574723   624630  1089917  1393013   906216
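As a rough cross-check of the table above, the LHC share of 2006 UK CPU usage can be computed from the LHC and Total rows. This is only a sketch using the quarterly figures as transcribed; the variable names are illustrative. The result, about 86%, agrees with the sum of the ALICE, ATLAS, CMS and LHCb slices quoted for the 2006 CPU usage pie chart (1% + 34% + 16% + 35%).

```python
# KSI2K-hours per quarter (1Q06..4Q06), copied from the "LHC" and "Total" rows.
lhc = {
    "Tier-2": [1142147, 2605327, 1777701, 2241646],
    "Tier-1": [263380, 921222, 1328531, 857827],
}
total = {
    "Tier-2": [1401597, 3080349, 1859426, 2574723],
    "Tier-1": [624630, 1089917, 1393013, 906216],
}


def yearly_sum(rows: dict) -> int:
    """Sum all quarters over both tiers."""
    return sum(sum(quarters) for quarters in rows.values())


lhc_hours = yearly_sum(lhc)
all_hours = yearly_sum(total)
lhc_share = lhc_hours / all_hours

print(f"LHC experiments: {lhc_hours:,} of {all_hours:,} KSI2K-hours ({lhc_share:.1%})")
```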

UK Resources: 2006 CPU Usage by experiment
• ALICE: 1%
• ATLAS: 34%
• CMS: 16%
• LHCb: 35%
• BaBar: 4%
• CDF: 0%
• D0: 5%
• H1: 1%
• ZEUS: 1%
• Other: 3%

LCG Disk Usage

                Available (TB)              Used (TB)                   Ratio
                 1Q06   2Q06   3Q06   4Q06   1Q06   2Q06   3Q06   4Q06     1Q06     2Q06     3Q06     4Q06
Brunel                   1.5    1.1    4.7           0.1    0.2    4.3             6.70%   18.10%   91.10%
Imperial          0.3    3.2    5.6   35.4    0.3    2.2    2.9   25.5   88.80%   69.40%   51.70%   72.00%
QMUL             18.2   15.9   18.2   18.2   14.3    3.6    3.4    4.8   78.40%   22.60%   18.40%   26.40%
RHUL              2.7    2.7    2.7    5.5    2.5    0.3    0.2    1.5   90.50%   10.60%    7.70%   27.30%
UCL               1.1      0      1      2    0.9      0    0.3    1.4   82.60%   54.30%   32.60%   70.00%
Lancaster        63.4   53.1   47.7     60   29.9   13.1   26.9   12.8   47.10%   24.70%   56.30%   21.30%
Liverpool                2.8    0.6    2.8      0      0    0.1    1.4             0.80%   16.30%   50.00%
Manchester              66.9   67.6  176.8      0    1.9    3.9    5.4             2.80%    5.80%    3.10%
Sheffield         4.5    3.9    2.3    2.2    4.4    1.2    0.3    0.1   95.80%   32.10%   12.40%    4.50%
Durham            1.9    1.9    3.5    3.5    0.6    1.3    0.9    1.2   30.90%   68.10%   25.40%   34.30%
Edinburgh          31     30     29     20   16.6   13.5    2.8    3.9   53.60%   45.10%    9.50%   19.50%
Glasgow           4.3    4.3    1.6     34    3.8    0.6    1.1    4.1   89.90%   15.00%   70.80%   12.10%
Birmingham        1.8    1.8    1.9    1.8    1.3    0.6    0.8    1.3   73.30%   31.80%   41.60%   72.20%
Bristol           0.2    0.2    2.1    1.8    0.2      0    0.3    0.4   89.60%   12.00%   16.00%   22.20%
Cambridge         3.2    3.2      3    3.1    3.1      0    0.8    2.1   94.70%    0.60%   26.30%   67.70%
Oxford            3.2    1.6    3.2    3.2    2.5      0      0    0.5   80.10%    1.10%    0.00%   15.60%
RAL PPD           6.8    6.8    6.4   16.6    6.4    0.6    0.3   13.5   93.50%    9.40%    4.20%   81.30%

London           22.4   23.4   28.7   65.8   17.9    6.2      7   37.5   80.30%   26.60%   24.40%   57.00%
NorthGrid        67.9  126.7  118.2  241.8   34.2   16.2   31.2   19.7   50.40%   12.80%   26.40%    8.10%
ScotGrid         37.1   36.2   34.1   57.5     21   15.5    4.8    9.2   56.60%   42.80%   14.00%   16.00%
SouthGrid        15.2   13.6   16.6   26.5   13.4    1.3    2.2   17.8   88.60%    9.30%   13.20%   67.20%
Total Tier-2    142.5  199.8  197.5  391.6   86.6   39.2   45.1   84.2   60.70%   19.60%   22.80%   21.50%

Tier-1          121.1  114.4  123.1  145.3   56.4  107.2  149.4  177.7   46.60%   93.70%  121.40%  122.30%

File Transfers
(individual rates)

[Figure: transfer rate in Mb/s (scale 0 to 900) per site. Series: Inbound Q1, Inbound Q3, Outbound Q1, Outbound Q3. Sites: Lancaster, RALPP, Birmingham, Glasgow, Edinburgh, Oxford, Manchester, Sheffield, Cambridge, Bristol, UCL-CENTRAL, Durham, QMUL, IC-HEP, IC-LeSC, UCL-HEP, RHUL, Brunel, Liverpool]

Aim: to maintain data transfers at a sustainable level as part of experiment service challenges

Current goals:
• >250Mb/s inbound-only
• >250Mb/s outbound-only
• >200Mb/s inbound and outbound

http://www.gridpp.ac.uk/wiki/Service_Challenge_Transfer_Test_Summary
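The three transfer goals can be expressed as a simple check on a site's measured rates. This is a sketch only: the function and its parameter names are hypothetical, the example rates are invented for illustration rather than taken from the transfer tests, and reading "inbound and outbound" as both directions exceeding 200 Mb/s at the same time is an assumption.

```python
# Goals quoted on the slide, in Mb/s.
INBOUND_ONLY_GOAL = 250
OUTBOUND_ONLY_GOAL = 250
BIDIRECTIONAL_GOAL = 200


def goals_met(inbound_only: float, outbound_only: float,
              simultaneous_in: float, simultaneous_out: float) -> dict:
    """Compare one site's sustained rates (Mb/s) with the three goals."""
    return {
        "inbound-only": inbound_only > INBOUND_ONLY_GOAL,
        "outbound-only": outbound_only > OUTBOUND_ONLY_GOAL,
        # Assumption: both directions must exceed the goal at the same time.
        "inbound and outbound": min(simultaneous_in, simultaneous_out) > BIDIRECTIONAL_GOAL,
    }


# Hypothetical rates for an unnamed example site.
print(goals_met(inbound_only=300.0, outbound_only=240.0,
                simultaneous_in=210.0, simultaneous_out=205.0))
```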

Tier-1 Resource
• Approval for new (shared) machine room – ETA Summer 2008. Space for 300 racks.
• Procurement
  – March 06: 52 AMD 270 units, 21 disk servers (168TB data capacity)
  – FY 06/07: 47 disk servers (282TB disk capacity), 64 twin dual-core Intel Woodcrest 5130 units (550kSI2K)
  – FY 06/07 upcoming: further 210 TB disk capacity plus high-availability systems (redundant PSUs, hot-swappable paired HDDs)
• Storage commissioning saga
  – Ongoing problems with March kit. Firmware updates have now solved problem. (Disks on Areca 1170 in RAID 6 experienced multiple dropouts during testing of WD drives)
• Move to CASTOR
  – Very support heavy but made available for CSA06 and performing well

General
- Air-con problems with high-temperatures triggering high pressure cut-outs in refrigerator gas circuits
- July security incident
- 10Gb CERN line in place. Second 10Gb line scheduled in 07Q1

T2 Resources
e.g. Glasgow: UKI-SCOTGRID-GLASGOW
• 800 kSI2k
• 100 TB DPM
Needed for LHC start-up

August 28, September 1, October 13, October 23

IC-HEP
• 440 KSI2K
• 52 TB dCache

Brunel
• 260 KSI2K
• 5 TB DPM

Middleware
GridPP Middleware incorporates:
• Security
• Network Monitoring
• Information Services
• Grid Data Management
• Storage Interfaces
• Workload Management

MSN Outlook
• The results of the GridPP2+ project extension proposal to PPARC were made known to GridPP in November 2006
• The effects on MSN are significant and particularly damaging with the overall effort reduced by more than a third from 13 to 8.3 FTEs
• WMS testing and contributions to EGEE SA3 will reduce
• GridPP work on metadata will cease and UK leadership will be lost, but this is known to be an area the experiments are keen to see tackled
• The reduction in Information and Monitoring effort will severely impact re-engineering work and support for R-GMA and compromises UK obligations in fulfilling the EGEE contract
• GridPP has recognised the importance of finishing the R-GMA re-engineering, thus meeting the R-GMA deliverables to EGEE and has therefore agreed (in consultation with PPARC) to meet the costs of maintaining the current staffing levels to the end of EGEE-II from within existing allocations
• The reduction in networking activities is likely to impact GridPP’s ability to optimise its use of the underlying JANET network
• Staff whose contracts will not be extended beyond the end of August 2007 have been informed

e.g. ATLAS Tier-2 Testing

• Most of the experiments are now well advanced in tackling highly pragmatic deployment issues, particularly in advance of the LHC data at the end of 2007

Applications Outlook
• Products developed by GridPP are in mainstream use, and will form a vital component of the computing system of each LHC experiment for first data-taking and analysis
• However, almost all explicit funding for the further development and support of such products will terminate in September 2007, since it is now clear that this area will be supported neither via GridPP3 (as planned) nor the Rolling Grants round (as requested)
• This is a matter of concern both for the UK collaborations and the experiments as a whole
• Recovery plans are being prepared within each experiment, attempting to use non-specialist RA effort in tension with physics and hardware support, but there will be profound negative consequences for the continuation and maintenance of these projects

Dissemination Outlook
• Dissemination was one of the areas not fully funded in GridPP2+
• The Dissemination Officer post was funded at 0.5 FTE (as at present), but the PPRP did not allocate funds to continue the Events Officer position
• Due to a large number of events and activities planned for the end of 2007, we aim to fund this position for some months out of the current dissemination budget

Hardware Outlook
Planning for 2007..
• A profiled ramp-up of resources is planned throughout 2007 to meet the UK requirements of the LHC and other experiments
• The results are available for the Tier-1 and Tier-2s
• The Tier-1/A Board reviewed UK input to International MoU negotiations for the LHC experiments as well as providing input to the International Finance Committee for BaBar
• An impasse was reached in planning for 2007
• No new investment in the BaBar Tier A analysis facility hardware is planned
• For LCG, the 2007 commitment for disk and CPU capacity can be met out of existing hardware already delivered

Timeline
~10 month process to propose/defend/define future programme

[Figure: timeline, April to October and beyond, in two phases: Proposal Writing, then Proposal Defence. Also marked: CB, OC, CB]
• 31st March – PPARC Call
• 16th June – GridPP16 at QMUL
• 13th July – Bid Submitted
• 6th September – 1st PPRP review
• 1st November – GridPP17
• 8th November – PPRP “visiting panel”
• 30th November – GridPP2+ outcome
• ~February – GridPP3 outcome

Future?

http://www.gridpp.ac.uk/docs/gridpp3/

Scenario Planning – Resource Requirements [TB, kSI2k]

GridPP requested a fair share of global requirements, according to experiment requirements

Changes in the LHC schedule prompted a(nother) round of resource planning - presented to CRRB on Oct 24th

New UK resource requirements have been derived and incorporated in the scenario planning e.g. Tier-1

Tier1 CPU      2008     2009     2010
ALICE         10230    18430    22930
ATLAS         18123    28423    49573
CMS           12400    16900    36900
LHCb           1770     4870     6740
TOTAL         42523    68623   116143

Tier1 Disk     2008     2009     2010
ALICE          5220     7940     9870
ATLAS          9939    19686    39488
CMS            5600     8500    13700
LHCb           1025     2759     3250
TOTAL         21784    38885    66308

Tier1 Tape     2008     2009     2010
ALICE          7030    13980    20930
ATLAS          7694    14950    28698
CMS           13100    23500    36600
LHCb            860     3070     5864
TOTAL         28684    55500    92092
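The TOTAL rows of the three Tier-1 requirement tables can be checked against the per-experiment figures. A minimal sketch, with the numbers transcribed from the tables above and dictionary names chosen only for illustration:

```python
# Tier-1 requirements for 2008, 2009, 2010 (CPU in kSI2k, disk and tape in TB).
requirements = {
    "CPU": {"ALICE": (10230, 18430, 22930), "ATLAS": (18123, 28423, 49573),
            "CMS": (12400, 16900, 36900), "LHCb": (1770, 4870, 6740)},
    "Disk": {"ALICE": (5220, 7940, 9870), "ATLAS": (9939, 19686, 39488),
             "CMS": (5600, 8500, 13700), "LHCb": (1025, 2759, 3250)},
    "Tape": {"ALICE": (7030, 13980, 20930), "ATLAS": (7694, 14950, 28698),
             "CMS": (13100, 23500, 36600), "LHCb": (860, 3070, 5864)},
}
quoted_totals = {
    "CPU": (42523, 68623, 116143),
    "Disk": (21784, 38885, 66308),
    "Tape": (28684, 55500, 92092),
}


def column_totals(table: dict) -> tuple:
    """Sum each year's column over the four experiments."""
    return tuple(sum(year) for year in zip(*table.values()))


for resource, table in requirements.items():
    matches = column_totals(table) == quoted_totals[resource]
    print(f"Tier1 {resource}: totals {'match' if matches else 'differ'}")
```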

Input to Scenario Planning – Hardware Costing

• Empirical extrapolations with extrapolated (large) uncertainties
• Hardware prices have been re-examined following recent Tier-1 purchase
• CPU (Woodcrest) was cheaper than expected based on extrapolation of previous 4 years of data

[Figure: CPU Costs. Ln(K£/KSI2K), scale -4 to 1, against date, 01-Jan-02 to 20-Mar-10. Series: Past CPU Purchase; Best fit to past purchases; 29 month extrapolation; 20 month extrapolation; Future price estimates; Max; Min; GridPP3 submission; CERN]

[Figure: Disk Costs. Ln(K£/TB), scale -2 to 3, against date, 01-Jan-02 to 20-Mar-10. Series: Past Purchases; Best fit (21.7 months); 24 months; 19 months; Future price estimates; Upper Limit; Lower limit; GridPP3 Proposal; CERN]
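The cost plots show the natural log of unit price against date, so a straight-line fit corresponds to an exponential fall in price. The slides quote fits and extrapolations in months (for example 21.7 months for disk) without saying what the period denotes. The sketch below assumes it is a price-halving time; that is an interpretation, not something the source confirms, and the function names and example price are hypothetical.

```python
import math


def extrapolated_price(price_now: float, months_ahead: float,
                       halving_months: float) -> float:
    """Unit price after months_ahead, assuming it halves every halving_months."""
    return price_now * 0.5 ** (months_ahead / halving_months)


def ln_price_slope_per_month(halving_months: float) -> float:
    """Slope of the straight line seen on a Ln(price) against date plot."""
    return -math.log(2) / halving_months


# Hypothetical starting price of 1.0 (K£ per TB), projected 36 months ahead
# under the three disk periods quoted on the slide.
for months in (19, 21.7, 24):
    print(months, round(extrapolated_price(1.0, 36, months), 3))
```

Under this reading, the spread between the shortest and longest period is what produces the widening band of future price estimates on the plots.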

Scenario Planning
An example 70% “minimum viable level” scenario [£m]

                            GridPP3 Proposal    70% Scenario
WG  Area        Item                    Cost    Frac    Cost
A   Tier-1      Staff                   4.99     93%    4.62
A   Tier-1      Hardware               11.72     85%    7.20
B   Tier-2      Staff                   3.29     89%    2.94
B   Tier-2      Hardware                5.12     85%    4.35
C   Support     Staff                   4.50     69%    3.10
D   Operations  Staff                   1.89     88%    1.66
E   Management  Staff                   1.17     90%    1.06
F   Outreach    Staff                   0.37     74%    0.28
G   Travel etc  Other                   0.84     75%    0.63
Total (A-G)                            33.89           25.84
Working Allowance                       1.25      0%    0.00
Project Cost                           35.14           25.84
Contingency                             4.15            5.17
Running Costs                           2.50            2.50
Full Approval Cost                     41.79           33.51
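The cost table rolls up as follows, judging by the arithmetic: the work-group items sum to a subtotal, the working allowance is added to give the project cost, and contingency plus running costs are added to give the full approval cost. The sketch below checks both columns with exact decimal arithmetic. The row order and the roll_up helper are a reconstruction from the figures in £m, not a procedure stated on the slide.

```python
from decimal import Decimal


def to_decimals(values):
    return [Decimal(v) for v in values]


# Work-group items A to G in table order, in £m.
proposal_items = to_decimals(
    ["4.99", "11.72", "3.29", "5.12", "4.50", "1.89", "1.17", "0.37", "0.84"])
scenario_items = to_decimals(
    ["4.62", "7.20", "2.94", "4.35", "3.10", "1.66", "1.06", "0.28", "0.63"])


def roll_up(items, working_allowance, contingency, running_costs):
    """Return (subtotal, project cost, full approval cost)."""
    subtotal = sum(items)
    project_cost = subtotal + Decimal(working_allowance)
    full_cost = project_cost + Decimal(contingency) + Decimal(running_costs)
    return subtotal, project_cost, full_cost


print(roll_up(proposal_items, "1.25", "4.15", "2.50"))
print(roll_up(scenario_items, "0.00", "5.17", "2.50"))
```

On these figures the example scenario's full approval cost is about 80% of the proposal's (33.51 against 41.79).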

Beyond GridPP2
• The separation between GridPP2+ and GridPP3 was primarily designed to ensure an early decision could be made on the extension in order to retain key staff
• Approval for the extension was received in late November but included major cuts in the middleware support area
• This is problematic in two ways:
  – EU-CCLRC contractual obligation
  – crucial 7 month ramp-up period - the worst time to cut back
• Problems are severely compounded by the outcome of the Rolling Grant round where much of the Applications support work will be lost during this same critical period
• We currently await the outcome of the GridPP3 bid in order to be able to assess the whole picture
• We anticipate a highly challenging period from September 2007 onwards