GridPP From Prototype to Production David Britton 21/Sep/06 1. Context – Introduction to GridPP 2. Performance of the GridPP/EGEE/wLCG Grid 3. Some Successes and Challenges



GridPP
From Prototype to Production
David Britton, 21/Sep/06

1. Context – Introduction to GridPP
2. Performance of the GridPP/EGEE/wLCG Grid
3. Some Successes and Challenges


21/Sep/06 GridPP D. Britton

The CERN LHC

4 Large Experiments

The world’s most powerful particle accelerator - 2007


The LHC Experiments

ALICE
- Heavy-ion collisions, to create quark-gluon plasmas
- 50,000 particles in each collision

LHCb
- To study the differences between matter and antimatter
- Will detect over 100 million b and b-bar mesons each year

ATLAS
- General purpose
- Origin of mass
- Supersymmetry
- 2,000 scientists from 34 countries

CMS
- General purpose
- 1,800 scientists from over 150 institutes

“One Grid to Rule Them All”?


Why do particle physicists need the Grid?

One year’s data from LHC would fill a stack of CDs 20 km high
(for scale: Concorde, 15 km; Mt. Blanc, 4.8 km)

• 100 million electronic channels
• 800 million proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year (10 million GBytes = 14 million CDs)
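
As a rough cross-check of the figures on this slide, the sketch below converts 10 PBytes a year into a CD count and a stack height, and turns the Higgs rate into a waiting time. The CD capacity (700 MB), the disc thickness (1.2 mm), the decimal unit convention and all function names are assumptions for illustration; none of them appear on the slide.

```python
# Back-of-envelope check of the slide's data-volume figures.
# Assumed (not from the slide): 700 MB per CD, 1.2 mm per bare disc,
# decimal units (1 PB = 10^6 GB).

PB_IN_GB = 1_000_000       # decimal convention (assumption)
CD_CAPACITY_GB = 0.7       # assumed capacity of one CD
CD_THICKNESS_MM = 1.2      # assumed thickness of one bare disc
MM_PER_KM = 1_000_000


def cds_needed(petabytes: float) -> float:
    """Number of CDs needed to hold the given data volume."""
    return petabytes * PB_IN_GB / CD_CAPACITY_GB


def stack_height_km(n_cds: float) -> float:
    """Height in km of a stack of n_cds bare discs."""
    return n_cds * CD_THICKNESS_MM / MM_PER_KM


def seconds_per_event(rate_per_second: float) -> float:
    """Average waiting time between events occurring at the given rate."""
    return 1.0 / rate_per_second


if __name__ == "__main__":
    n = cds_needed(10)
    print(f"{n / 1e6:.1f} million CDs")
    print(f"{stack_height_km(n):.1f} km stack")
    print(f"one Higgs every {seconds_per_event(0.0002):.0f} s on average")
```

Under these assumptions the count comes out at about 14 million CDs, in line with the slide, and the stack at roughly 17 km, the same order as the quoted 20 km. A rate of 0.0002 per second corresponds to roughly one Higgs every 5,000 seconds.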


Why do particle physicists need the Grid?

Example from LHC: starting from this event…

…we are looking for this “signature”

Selectivity: 1 in 10^13

Like looking for 1 person in a thousand world populations

Or for a needle in 20 million haystacks


Who are GridPP?

19 UK universities and CCLRC (RAL & Daresbury), funded by PPARC.

GridPP1 2001-2004 "From Web to Grid"

GridPP2 2004-2007 "From Prototype to Production"

GridPP3 2008-2011 "From Production to Exploitation"


Global Context

2001 2002 2003 2004 2005 2006 2007

EDG, EGEE-I, EGEE-II, EGI ?
GridPP1, GridPP2, GridPP3
LHC Data Taking

GridPP, EDG, EGEE, LCG, wLCG
(Many)

Evolving standards
Developing requirements
Changing costs and budgets
Experience


Tier Structure

Detector, online system, offline farm

Tier 0: CERN computer centre
Tier 1 (National centres): RAL, UK; Italy; USA; France; Germany
Tier 2 (Regional groups): ScotGrid, NorthGrid, SouthGrid, London
Tier 3 (Institutes): Glasgow, Edinburgh, Durham


UK Tier-1/A Centre

• High quality data services
• National and International Role
• UK focus for International Grid development
• 1000 Dual CPU
• 330 TB Disk
• 532 TB Tape

Grid Operations Centre


UK Tier-2 Centres

ScotGrid: Durham, Edinburgh, Glasgow
NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
London: Brunel, Imperial, QMUL, RHUL, UCL


Grid Performance

Health of the Grid continuously monitored:
- Site Functional Test (SFT)
- Grid Status Monitor (GSTAT)
- Resource Broker Logging
- CPU/Storage accounting
- Migration to “Service Availability Monitoring” (SAM)


Job Slots

Figure: UK total job slots. Number of published UK job slots from GSTAT, by date from June 2004 to June 2006 (y-axis: published job slots, 0 to 6000).

Significant increase in early 2006


Availability

Average monthly SITE and CPU availability from SFT.

Large contribution from UKI


Availability by UK Site

Figure: Stacked % of SFTs failed each month (January to May), by UK site (y-axis 0 to 14). Sites shown: RAL-LCG2, IC LeSC*, RALPP, Glasgow, IC HEP, Queen Mary, UL, UCL-HEP, Durham, Manchester, Lancaster, Liverpool, Bristol, Oxford, Birmingham, Cambridge, Edinburgh, Brunel, Sheffield, Royal Holloway, UL, UCL-Central.


Usage

Usage over a week (16-27 June 2006) from the Resource Broker Logs

Publishing problem.


Active Users (EGEE-wide) by LHC experiment

ALICE (8)

CMS (150)

ATLAS (70)

LHCb (40)


Active users at RAL

Number of registered users (exc. DTEAM)
Quarter: 05Q4 06Q1 06Q2 06Q3
Value: 1342 1831 2777

Number of active users (> 10 jobs)
Quarter: 05Q4 06Q1 06Q2 06Q3
Value: 83 166 201
Fraction: 6.2% 11.0%


Virtual Organisations Supported

Figure: Number of enabled VOs per site, Jan-06 and Jun-06 (y-axis 0 to 25). Sites shown: RALPP, Edinburgh, Durham, Brunel, RAL Tier-1, Lancaster, Royal Holloway, UL, UCL-CCC, Oxford, IC-LESC, IC-HEP, Manchester, Glasgow, Birmingham, Queen Mary, UL, Sheffield, Cambridge, Bristol, UCL-HEP, Liverpool.


2005 CPU Usage/Efficiency

Figure: % occupancy by month (Jan to Dec), raw and scaled (y-axis 0 to 110).


2006 CPU Usage/Efficiency


2006 Job Efficiency

Major LHC experiments achieve ~90%


Tier Centre Efficiency

Figure: Successful hours (from RB logs) for each Tier Centre (Jan to Apr 2006). Total hours, split into success hours and failed hours, for London Tier-2, NorthGrid, ScotGrid, SouthGrid and RAL Tier-1 (y-axis 0 to 800,000).

Figure: Efficiencies for each Tier Centre, for the period [2006-01-01, 2006-04-30] (y-axis 0 to 1.2; annotated at 90%).

Here, “Efficiency” = Successful Time / Total Time
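
The efficiency definition on this slide (successful time divided by total time) can be sketched as below. The sketch assumes that total time is the sum of successful and failed hours, which is how the stacked plot appears to be built. The centre names and hour values are made-up placeholders, not numbers read from the plots, and the function name is hypothetical.

```python
# Sketch of the slide's definition: efficiency = successful time / total time.
# Assumption: total time = successful hours + failed hours.
# The example hours are placeholders, NOT values taken from the slide.


def efficiency(success_hours: float, failed_hours: float) -> float:
    """Fraction of total job hours that were spent in successful jobs."""
    total = success_hours + failed_hours
    if total == 0:
        raise ValueError("no hours recorded")
    return success_hours / total


if __name__ == "__main__":
    example = {
        "Centre A": (450_000.0, 50_000.0),
        "Centre B": (90_000.0, 30_000.0),
    }
    for name, (ok, bad) in example.items():
        eff = efficiency(ok, bad)
        status = "at or above" if eff >= 0.90 else "below"
        print(f"{name}: {eff:.1%} ({status} the 90% mark)")
```

Weighting by hours rather than by job count means a few long failed jobs pull the figure down more than many short ones.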


Storage Accounting

Figure: Used and unused storage at the Tier-1 and at the Tier-2s.


Resources Delivered by the Tier-2 Sites

Figure: CPU delivered (05Q4 and 06Q1). % CPU delivered vs scheduled for Sept 05, by Tier-2 site, for Q405 and Q106 (y-axis 0% to 250%). Sites shown: UCL, Sheffield, Durham, Cambridge, Birmingham, Edinburgh, Lancaster, QMUL, RHUL, Brunel, RAL PPD, Glasgow, Imperial, Oxford, Bristol, Manchester, Liverpool.

Figure: Disk delivered (05Q4 and 06Q1). % disk delivered vs scheduled for Sept 05, by Tier-2 site, for Q405 and Q106 (y-axis 0% to 160%). Sites shown: UCL, Cambridge, Birmingham, Oxford, Bristol, Lancaster, Glasgow, Sheffield, QMUL, RAL PPD, Edinburgh, Durham, RHUL, Imperial, Manchester, Liverpool, Brunel.

Disk utilisation at Tier-2 sites has been low (and new purchases were strategically delayed).

Situation addressed in 2006 (initial one-to-one relationship with LHC experiments). Occupancy rose to ~40% level by Q2 2006.


Tickets Raised

Figure: Number of new tickets per site, Q1-2006 and Q2-2006 (y-axis 0 to 18).

Figure: Average time to close tickets (hrs) per site, Q1-2006 and Q2-2006 (y-axis 0 to 160).

Sites shown in both plots: RAL-LCG2, QMUL-eScience, Manchester, Durham, IC-LeSC, Oxford, Lancaster, IC-HEP, Birmingham, Brunel, Liverpool, Edinburgh, RHUL, UCL-Central, RALPP, Cambridge, Glasgow, UCL-HEP, Sheffield, Bristol.


Scheduled Downtime

Figure: Stacked % scheduled down each month (January to May), by site (y-axis 0 to 160). Sites shown: UCL-Central, Brunel, Sheffield, IC LeSC*, Cambridge, Queen Mary, UL, UCL-HEP, Royal Holloway, UL, Durham, Liverpool, Edinburgh, IC HEP, Glasgow, RALPP, Birmingham, RAL-LCG2, Bristol, Oxford, Lancaster, Manchester.


Upgrades

Figure: Number of sites at each release (LCG-2_4_0, LCG-2_6_0, LCG-2_7_0, GLITE-3_0_0), at two-week intervals from 09/04/2005 to 17/06/2006 (y-axis 0 to 40).


Tier-0 to Tier-1 Data Transfers

• Worldwide data transfers > 950MB/s for 1 week

• Peak transfer rate from CERN of >1.6GB/s

• Ongoing experiment transfers as part of current service challenges


Tier-1 to Tier-2 Data Transfers

• UK data transfers >1000Mb/s for 3 days
• Peak transfer rate from RAL of >1.5Gb/s
• Need high data rate transfers to/from RAL as a routine activity
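
To put the sustained rates in context, the sketch below converts a constant rate and a duration into a total volume. It is applied to the 950 MB/s for 1 week quoted for the Tier-0 to Tier-1 transfers and the 1000 Mb/s for 3 days quoted here. It assumes decimal prefixes and that the slides' capitalisation is meaningful (MB/s as megabytes, Mb/s as megabits). The function name is hypothetical, and because the slides give the rates as lower bounds, the volumes are lower bounds too.

```python
# Convert a sustained transfer rate into the total volume moved.
# Assumptions: decimal prefixes; "MB/s" = megabytes/s, "Mb/s" = megabits/s.

SECONDS_PER_DAY = 86_400
BYTES_PER_UNIT = {
    "MB/s": 1e6,       # one megabyte per second
    "Mb/s": 1e6 / 8,   # one megabit per second, expressed in bytes
}


def volume_tb(rate: float, unit: str, days: float) -> float:
    """Terabytes moved at a constant rate over the given number of days."""
    if unit not in BYTES_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit}")
    total_bytes = rate * BYTES_PER_UNIT[unit] * days * SECONDS_PER_DAY
    return total_bytes / 1e12


if __name__ == "__main__":
    print(f"Tier-0 to Tier-1: {volume_tb(950, 'MB/s', 7):.0f} TB in a week")
    print(f"Tier-1 to Tier-2: {volume_tb(1000, 'Mb/s', 3):.1f} TB in 3 days")
```

Read this way, the worldwide test moved at least about 575 TB in the week and the UK test at least about 32 TB in three days. The factor of eight between bits and bytes is the main thing to keep straight when comparing the two slides.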


Summary

“From Prototype to Production” is about understanding and improving performance.

Monitoring, understanding, and improving performance of a Grid is, in itself, a Grid challenge.

Many tools and metrics have been, and are being, developed to measure and monitor the performance of the GridPP/EGEE/wLCG Grid, and they are now providing feedback.


Many Successes….

95% efficiencies

3 PB of data

GridPP3 Proposal

Industry

Schools

Physics!

MOU signed

Einstein


Security

Wiki


…and many Challenges

A year later: Progress in all areas but the challenges remain.