
6 September 2005 Collaboration Meeting Tony Doyle - University of Glasgow

GridPP Overview

(emphasis on beyond GridPP)

From the 3 February 2005 Science Committee Meeting:

• “2004 was a pivotal year, marked by extraordinary and rapid change with respect to Grid deployment, in terms of scale and throughput. The scale of the Grid in the UK is more than 2000 CPUs and 1PB of disk storage (from a total of 9,000 CPUs and over 5PB internationally), providing a significant fraction of the total resources required by 2007. A peak load of almost 6,000 simultaneous jobs in August, with individual Resource Brokers able to handle up to 1,000 simultaneous jobs, gives confidence that the system should be able to scale up to the required 100,000 CPUs by 2007. A careful choice of sites leads to acceptable (>90%) throughput for the experiments, but the inherent complexity of the system is apparent and many operational improvements are required to establish and maintain a production Grid of the required scale. Numerous issues have been identified that are now being addressed as part of GridPP2 planning in order to establish the required resource for particle physics computing in the UK.”

• Most projects fail in going from prototype to production…

• There are many issues: a methodical approach is required.

Executive Summary II

At the end of GridPP2 Year 1, the initial foundations of “The Production Grid” are built. The focus is on “efficiency”.

Some Open Questions..

What will it take to build upon this foundation? What are the underlying problems?


Open Questions

1. "LCG Service Challenges" (plans for SC4 based on experience of SC3 - how do we all prepare?)
2. "Running Applications on the Grid" (Why won't my jobs run?)
3. "Grid Documentation" (What documentation is needed/missing? Is it a question of organisation?)
4. "What value does GridPP add?"
5. "Beyond GridPP2 and e-Infrastructure" (What is the current status of planning?)
6. "Managing Large Facilities in the LHC era" (What works? What doesn't? What won't?)
7. "What is a workable Tier-2 Deployment Model?"
8. "What is Middleware Support?" (really all about)

Aim: to recognise the problems (at all levels), respond accordingly, and define appropriate actions.


Beyond GridPP2..

Funding from September 2007 will be incorporated as part of PPARC's request for planning input for LHC exploitation from the LHC experiments and GridPP. This will be considered by a Panel consisting of Prof. G. Lafferty (Chair), Prof. S. Watts and Dr. P. Harris, meeting over the summer to provide input to Science Committee in the Autumn.

An important issue to note is the need to ensure matching funding is fully in place for the full term of EGEE-2, anticipated to be 1st April 2006 to 31st March 2008. Such funding for SA1 and JRA1 is currently provided by PPARC through GridPP2, but this will terminate under current arrangements at the end of GridPP2 in August 2007.


LCG Tier-1 Planning (CPU & Storage)

[Charts: planned Tier-1 CPU (kSI2K) per year, 2006-2010, broken down by site - PIC, Barcelona; FNAL, US; BNL, US; RAL, UK; ASGC, Taipei; Nordic Data Grid Facility; NIKHEF/SARA, NL; CNAF, Italy; CC-IN2P3, France; GridKA, Germany; TRIUMF, Canada - alongside total offered (external Tier-1s) versus total requested.]

Experiment requests are large: e.g. in 2008, CPU ~50 MSI2k and storage ~50 PB! They can be met globally except in 2008. The UK is expected to contribute ~7%. [Currently more]

First LCG Tier-1 Compute Law: CPU:Storage ~ 1 [kSi2k/TB]
Second LCG Tier-1 Storage Law: Disk:Tape ~ 1
(The number to remember is.. 1)
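These rules of thumb can be sanity-checked with a few lines of arithmetic, using only figures quoted on these slides (the ~50 MSI2k / ~50 PB global 2008 request above, and the RAL 2008 "minimal Grid" numbers from the pledge table later in this talk). A minimal sketch, assuming "Storage" in the first law means disk plus tape (consistent with the ~50 PB figure); the variable names are illustrative:

```python
# Sanity-check the Tier-1 "laws" from figures quoted on these slides.
# Assumption: "Storage" in the first law means disk + tape.

cpu_global = 50_000        # ~50 MSI2k global 2008 CPU request, in kSI2K
storage_global = 50_000    # ~50 PB global 2008 storage request, in TB

ral_cpu, ral_disk, ral_tape = 3943, 2232, 2115   # RAL 2008, minimal Grid (b)

# First law: CPU:Storage ~ 1 [kSi2k/TB]
print(cpu_global / storage_global)        # 1.0
print(ral_cpu / (ral_disk + ral_tape))    # ~0.91

# Second law: Disk:Tape ~ 1
print(ral_disk / ral_tape)                # ~1.06

# UK (RAL) share of the global CPU request
print(ral_cpu / cpu_global)               # ~0.079, the "~7% [currently more]"
```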


LCG Tier-1 Planning (Storage)

[Charts: planned Tier-1 disk and tape (TB) per year, 2006-2010, broken down by the same sites as above, each alongside total offered versus total requested.]


LCG Tier-1 Planning

[Chart: planned Tier-1 CPU (kSI2K) per year, 2006-2010, per site as above, with the RAL, UK contribution highlighted.]

RAL, UK - pledged and planned to be pledged, under plans (a) and (b) below:

              2006    2007 (a)/(b)    2008 (a)/(b)    2009 (a)/(b)    2010 (a)/(b)
CPU (kSI2K)    980    1492 / 1234     2712 / 3943     4206 / 6321     5857 / 10734
Disk (TB)      450     841 /  630     1484 / 2232     2087 / 3300     3020 /  5475
Tape (TB)      664    1080 /  555     2074 / 2115     3934 / 4007     5710 /  6402

2006: March 2005 detailed planning (bottom up), v26b [uncertainty on when within 2006 - bid to PPARC]. PPARC signatures required in Q4 2005.
2007-10: (a) March 2005 detailed planning (bottom up), v26b [current plan]; (b) August 2005 minimal Grid (top down) [input requiring LHC-UK experiments' support, further iteration(s)..]


LCG Tier-2 Planning

2006: October 2004 Institute MoU commitments [deployment, 2005]. The requirement is currently less than "planned"; reduced CPU and disk are currently being delivered - need to monitor this.. PPARC signatures required in Q4 2005.
2007-10: (a) 2007 MoU, followed by pessimistic guess [current plan]; (b) August 2005 minimal Grid (top down) [input requiring LHC-UK experiments' support, further iteration(s)..]

UK, Sum of all Federations - pledged and planned to be pledged, under plans (a) and (b) above:

              2006    2007 (a)/(b)    2008 (a)/(b)    2009 (a)/(b)    2010 (a)/(b)
CPU (kSI2K)   3800    3840 / 1592     4830 / 4251     5410 / 6127     6010 / 9272
Disk (TB)      530     540 /  258      600 / 1174      660 / 2150      720 / 3406

Third LCG Tier-2 Compute Law: Tier-1:Tier-2 CPU ~ 1
Zeroth LCG Law: There is no Zeroth law – all is uncertain
Fifth LCG Tier-2 Storage Law: CPU:Disk ~ 5 [kSi2k/TB]
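As with the Tier-1 laws, these can be checked roughly against the UK "minimal Grid" numbers quoted elsewhere in these slides. A sketch, illustrative only - the laws are order-of-magnitude rules, and the CPU:Disk ratio visibly drifts year by year:

```python
# Order-of-magnitude check of the Tier-2 "laws" against the UK
# minimal-Grid requests quoted elsewhere in these slides.

t1_cpu_2008 = 3943                        # UK Tier-1 CPU, kSI2K
t2_cpu_2007, t2_cpu_2008 = 1592, 4251     # UK Tier-2 CPU, kSI2K
t2_disk_2007, t2_disk_2008 = 258, 1174    # UK Tier-2 disk, TB

# Third law: Tier-1:Tier-2 CPU ~ 1
print(t1_cpu_2008 / t2_cpu_2008)          # ~0.93

# Fifth law: Tier-2 CPU:Disk ~ 5 [kSi2k/TB] - roughly, and year-dependent
print(t2_cpu_2007 / t2_disk_2007)         # ~6.2
print(t2_cpu_2008 / t2_disk_2008)         # ~3.6
```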


Cascaded Pledges..

• T2 resource LCG pledges depend upon MoU commitments

• Current (Q2) status:
  • SouthGrid has (already) met its MoU commitment
  • Other T2s have not
• The Q3 status will be reported to PPARC as the Year 1 outturn (the information must be correct)


"What value does GridPP add?"

High Level Value added by GridPP

LCG

1 Enabling a rapid start for the LCG Project

Middleware

2 Generic Metadata Development.

3 To provide common storage solutions for the UK.

4 To provide and maintain a central Workload Management system in the UK.

5 Security in the Grid environment.

6 Information Monitoring System.

7 Network development.

Applications

8 Integration of the LHC experiment applications.

9 Ganga Development.

10 Integration with running experiments.

11 Connecting with the Theory Community

12 Grid Portal

Infrastructure

13 The Deployment Team

14 The Tier-2 structures

15 The Tier-1 infrastructure

16 Grid Support

17 Service Challenges

Coordination

18 The GridPP Website

19 The GridPP Identity

20 Dissemination

21 Management

22 The UK Grid


"What happens when GridPP disappears?"


"Beyond GridPP2 and e-Infrastructure"

LHC EXPLOITATION PLANNING REVIEW

Input is requested from the UK project spokespersons: for ATLAS and CMS, for each of the financial years 2008/9 to 2011/12; and for LHCb, ALICE and GridPP, for 2007/8 to 2011/12.

Physics programme: Please give a brief outline of the planned physics programme. Please also indicate how this planned programme could be enhanced with additional resources. In total this should be no more than 3 sides of A4. The aim is to understand the incremental physics return from increasing resources.

Input was based upon the PPAP roadmap input “E-Science and LCG-2” (26 Oct 2004) and feedback from the CB (12 Jan & 7 July 2005).

A 3-page description, “The Grid for LHC Exploitation”, was submitted in August 2005.


Beyond GridPP2..

• 3 page description: “The Grid for LHC Exploitation”

• “In order to calculate the minimum amount of resource required at the UK Tier-1 and Tier-2 we have taken the total Tier-1 and Tier-2 requirements of the experiments multiplied by a UK ‘share’.”

• Experiments should determine the “incremental physics return from increasing resources”.


UK Support for the LHC Experiments

• The basic functionality of the Tier-1 is:
  • ALICE - Reconstruction, Chaotic Analysis
  • ATLAS - Reconstruction, Scheduled Analysis/Stripping, Calibration
  • CMS - Reconstruction
  • LHCb - Reconstruction, Scheduled Stripping, Chaotic Analysis
• The basic functionality of the Tier-2s is:
  • ALICE - MC Production, Chaotic Analysis
  • ATLAS - Simulation, Analysis, Calibration
  • CMS - Analysis, All Simulation Production
  • LHCb - MC Production, No Analysis


Support for the LHC Experiments in 2008

• UK Tier-1 (~7% of Global Tier-1), 2008/9:

                 ALICE   ATLAS    CMS   LHCb    SUM
  CPU (kSI2K)       84    1571    449    644   2748
  Disk (TB)         46     887    227    343   1503
  Tape (TB)         46    1033    670    346   2095

• UK Tier-2 (pre-SRIF3):

                  CPU (kSI2K)               Disk (TB)
              ALICE  ATLAS   CMS  LHCb   ALICE  ATLAS  CMS  LHCb
  London          0    553   651   217       0     39   58    18
  NorthGrid       0   1353     0   144       0    297    0    20
  ScotGrid        0    123     0   131       0      4    0    65
  SouthGrid     124    269   145   167       5     15    9    11

Status of current UK planning by experiment.


Tier-1 Requirements

• Minimal UK Grid – each experiment may wish to increase its share (tape omitted for clarity)

Requirements - CPU (kSI2K):

                UK Share    2007    2008    2009    2010    2011    2012
ALICE                       4900   12300   16000   24300   29850   36040
  UK                  1%      49     123     160     243     299     360
ATLAS                       4070   23970   42930   72030   91460  113744
  UK                 10%     407    2397    4293    7203    9146   11374
CMS                         7600   15200   20700   40700   62950   76850
  UK                  5%     380     760    1035    2035    3148    3843
LHCb                        2650    4420    5550    8350    9800   11623
  UK                 15%     398     663     833    1253    1470    1743
LHC Total (UK)              1234    3943    6321   10734   14062   17321
Other                20%     247     789    1264    2147    2812    3464
UK Total                    2400    4732    7585   12880   16874   20785

Requirements - Disk (TB):

                UK Share    2007    2008    2009    2010    2011    2012
ALICE                       2941    7353    9559   12426   15735   18801
  UK                  1%      29      74      96     124     157     188
ATLAS                       2771   14434   22449   40614   50453   62607
  UK                 10%     277    1443    2245    4061    5045    6261
CMS                         2100    7000   10500   15700   19900   24330
  UK                  5%     105     350     525     785     995    1217
LHCb                        1459    2432    2897    3363    4082    4700
  UK                 15%     219     365     435     504     612     705
LHC Total (UK)               630    2232    3300    5475    6810    8370
Other                20%     126     446     660    1095    1362    1674
UK Total                    1300    2678    3960    6570    8172   10044
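The table follows the method quoted on the "Beyond GridPP2" slide: the UK requirement is each experiment's total requirement multiplied by a UK share, summed over experiments, plus a 20% allowance for other (non-LHC) use. A minimal sketch reproducing the 2008 CPU column above (names are illustrative):

```python
# Reproduce the 2008 CPU column of the table above:
# UK requirement = sum over experiments of (total x UK share), + 20% other.

uk_share = {"ALICE": 0.01, "ATLAS": 0.10, "CMS": 0.05, "LHCb": 0.15}
total_2008 = {"ALICE": 12300, "ATLAS": 23970, "CMS": 15200, "LHCb": 4420}  # kSI2K

lhc_total = sum(total_2008[e] * uk_share[e] for e in uk_share)
other = 0.20 * lhc_total

print(round(lhc_total))  # 3943 -> the "LHC Total (UK)" row
print(round(other))      # 789  -> the "Other 20%" row
```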


Tier-2 Requirements

• Initial requirements can be met via SRIF3 (2007-08..)

• Uncertain beyond this..

Requirements - CPU (kSI2K):

                UK Share    2007    2008    2009    2010    2011    2012
ALICE                       5800   14400   18700   24300   30750   36730
  UK                  1%      58     144     187     243     308     367
ATLAS                       3650   19940   31770   53010   67070   83061
  UK                 10%     365    1994    3177    5301    6707    8306
CMS                         9600   19300   32300   51600   62950   76850
  UK                  5%     480     965    1615    2580    3148    3843
LHCb                        4590    7650    7650    7650    7650    7650
  UK                 15%     689    1148    1148    1148    1148    1148
LHC Total (UK)              1592    4251    6127    9272   11310   13663
Other                20%     318     850    1225    1854    2262    2733
UK Total                    1910    5101    7352   11126   13571   16396

Requirements - Disk (TB):

                UK Share    2007    2008    2009    2010    2011    2012
ALICE                       2042    5106    6638    8629   10927   13056
  UK                  1%      20      51      66      86     109     131
ATLAS                       1607    8748   15905   25815   32964   40942
  UK                 10%     161     875    1591    2582    3296    4094
CMS                         1500    4900    9800   14700   18850   23300
  UK                  5%      75     245     490     735     943    1165
LHCb                          14      23      23      23      23      23
  UK                 15%       2       3       3       3       3       3
LHC Total (UK)               258    1174    2150    3406    4352    5393
Other                20%      52     235     430     681     870    1079
UK Total                     310    1409    2580    4087    5222    6472


Manpower

• Input requirements for a “minimal” Grid
• Supports LHC and other experiments
• Does not include the wider e-Infrastructure (EGEE and beyond)

FTE Table
(FY05 = Apr/05-Mar/06; FY06 = Apr/06-Mar/07. For FY07, the GridPP and Other columns cover Apr/07-Aug/07 and the New column covers Sep/07 onward; FY08-FY12 are all new funding.)

                        ------ FY05 ------   ------ FY06 ------   -------- FY07 ---------    FY08   FY09   FY10   FY11   FY12
                        GridPP  Other Total   GridPP  Other Total   GridPP  Other   New Total    New    New    New    New    New
Tier-1 Operation         13.50   3.00 16.50    13.50   3.00 16.50     5.63   1.25  8.75 15.63  15.00  15.00  15.00  15.00  15.00
Tier-2 Operation          9.00   6.00 15.00     9.00   6.00 15.00     3.75   2.50  8.75 15.00  15.00  15.00  15.00  15.00  15.00
Grid Operations           1.00   7.00  8.00     1.00   7.00  8.00     0.42   2.92  5.83  9.17   8.00   8.00   8.00   8.00   8.00
Management                3.40   0.00  3.40     3.40   0.00  3.40     1.42   0.00  1.46  2.88   2.50   2.50   1.50   1.50   1.50
Application Interfaces   18.50   3.00 21.50    18.50   3.00 21.50     7.71   1.25  4.67 13.63   8.00   7.00   7.00   7.00   7.00
Middleware Support       19.50  10.00 29.50    19.50  10.00 29.50     8.13   4.17  7.58 19.88  13.00  10.00   8.00   8.00   8.00
Total                    64.90  29.00 93.90    64.90  29.00 93.90    27.04  12.08 37.04 76.17  61.50  57.50  54.50  54.50  54.50
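The otherwise puzzling FY07 columns appear to prorate effort around the end of GridPP2 in August 2007: Apr-Aug is 5/12 of the year under GridPP2 funding, Sep-Mar is 7/12 under new funding. A minimal check for the Tier-1 Operation row (the proration is inferred from the numbers, not stated on the slide):

```python
# FY07 proration check for the Tier-1 Operation row: GridPP2 ends Aug 2007,
# so Apr/07-Aug/07 is 5/12 of the year and Sep/07-Mar/08 is 7/12.
gridpp_rate, other_rate, new_rate = 13.50, 3.00, 15.00  # full-year FTE rates

print(gridpp_rate * 5 / 12)   # 5.625  -> the 5.63 "GridPP" entry
print(other_rate * 5 / 12)    # 1.25   -> the "Other" entry
print(new_rate * 7 / 12)      # 8.75   -> the "New" entry
print(gridpp_rate * 5 / 12 + other_rate * 5 / 12 + new_rate * 7 / 12)
                              # 15.625 -> the 15.63 "Total" entry
```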


Estimated Costs

• Naïve Full Economic Cost approach ~£10m p.a.

FEC Table [k£] (inflation factor 1.05)

                          FY07     FY08     FY09     FY10     FY11     FY12
Average Salary           £44.5k   £46.7k   £49.1k   £51.5k   £54.1k   £56.8k
Effective FEC fraction     100%     100%     100%     100%     100%     100%
Total FEC per FTE        £89.0k   £93.5k   £98.1k  £103.0k  £108.2k  £113.6k
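The FEC rows appear to be generated by escalating the FY07 salary by the 1.05 inflation factor each year, with FEC per FTE equal to twice the average salary (a 100% overhead on salary); the factor of two is inferred from the numbers, not stated on the slide. A sketch:

```python
# Regenerate the FEC table: salary escalates by 1.05 per year; FEC per FTE
# looks like 2 x average salary (inferred, not stated on the slide).
salary = 44.5  # FY07 average salary, in £k
for fy in range(7, 13):
    print(f"FY{fy:02d}: salary {salary:.1f}k, FEC/FTE {2 * salary:.1f}k")
    salary *= 1.05  # apply the 1.05 inflation factor

# Cross-check against the cost table below: 15 Tier-1 FTEs in FY08
print(15 * 93.5)  # 1402.5 -> the FY08 "Tier-1 Staff" figure of £1,402k
```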

Cost Table [k£]

                          FY07      FY08      FY09      FY10      FY11      FY12

Tier-1 Staff £779k £1,402k £1,472k £1,545k £1,623k £1,704k

Tier-1 Hardware £2,196k £2,041k £2,721k £1,793k £1,435k £1,416k

Tier-1 Running Costs £86k £170k £273k £464k £607k £748k

Tier-2 Staff £779k £1,402k £1,472k £1,545k £1,623k £1,704k

Tier-2 Hardware £1,484k £957k £1,286k £870k £605k £632k

Tier-2 Running Costs £69k £184k £265k £401k £489k £590k

Grid Operations £519k £748k £785k £824k £865k £909k

Management £130k £234k £245k £155k £162k £170k

Application Interfaces £415k £748k £687k £721k £757k £795k

Middleware Support £675k £1,215k £981k £824k £865k £909k

Travel and Operations £211k £294k £290k £289k £299k £309k

Grand Total £7,343k £9,393k £10,478k £9,431k £9,331k £9,886k


Cost Breakdown: 2008 Estimated Costs

[Pie chart of the FY08 estimate: Tier-1 Staff 15%, Tier-1 Hardware 22%, Tier-1 Running Costs 2%, Tier-2 Staff 15%, Tier-2 Hardware 10%, Tier-2 Running Costs 2%, Grid Operations 8%, Management 2%, Application Interfaces 8%, Middleware Support 13%, Travel and Operations 3%. Total: £9,393k]
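The chart's percentages follow directly from the FY08 column of the cost table above; a sketch recomputing them (the small gap between the summed items and the quoted £9,393k total is rounding in the individual entries):

```python
# Recompute the 2008 cost-breakdown percentages from the FY08 cost table.
fy08 = {
    "Tier-1 Staff": 1402, "Tier-1 Hardware": 2041, "Tier-1 Running Costs": 170,
    "Tier-2 Staff": 1402, "Tier-2 Hardware": 957, "Tier-2 Running Costs": 184,
    "Grid Operations": 748, "Management": 234, "Application Interfaces": 748,
    "Middleware Support": 1215, "Travel and Operations": 294,
}  # £k

total = sum(fy08.values())  # 9395, vs the quoted 9,393 (entry-level rounding)
for item, cost in fy08.items():
    print(f"{item}: {100 * cost / total:.0f}%")  # 15, 22, 2, ... as in the chart
```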


Viewpoint: Enabling Grids for E-science in Europe is “E-Infrastructure”

Deliver a 24/7 Grid service to European science:
• build a consistent, robust and secure Grid network that will attract additional computing resources
• continuously improve and maintain the middleware in order to deliver a reliable service to users
• attract new users from industry as well as science, and ensure they receive the high standard of training and support they need

• 100 million euros over 4 years, funded by the EU
• >400 software engineers plus service support
• 70+ European partners


Phase 2 Overview

EGEE is the Grid Infrastructure Project in Europe:
• Take the lead in developing roadmaps, white papers and collaborations
• Organise European flagship events
• Collaborate with other projects (including CPS)
• Start date: 1 April 2006
• UK partners: CCLRC + NeSC + PPARC (+TCD) (n.b. UK e-Science, not only HEP)
  • NeSC: Training, Dissemination & Applications
  • NeSC: Networking
  • CCLRC: Grid Operations, Support & Management
  • CCLRC: Middleware Engineering (R-GMA)
• UK Phase 2 added partners: Glasgow, ICSTM, Leeds(?), Manchester, Oxford (+QMUL)
• Funded effort dedicated to deploying regional grids (+dissemination)
• UK T2 coordinators (+newsletter)


Summary

• This meeting aims to address the uncertain areas of developing and maintaining a Production Grid

• Long-term planning (2007-12) is one of these (particularly) uncertain areas

• LCG MoUs will be signed shortly, based upon worldwide planning

• GridPP is providing PPARC with planning input for the LHC Exploitation Grid (+input from ALICE, ATLAS, CMS, LHCb)

• The (full economic) costs involved for even a minimal LHC Computing Grid are significant

• GridPP needs to demonstrate its wider significance (in order to strengthen the case for PPARC funding at a higher level)

• EGEE-2 is starting, but the period beyond EGEE requires more planning

• Real work is required for "Beyond GridPP2 and e-Infrastructure" - open for (tomorrow's) discussion..