GridPP: From Prototype to Production
David Britton 21/Sep/06
1. Context – Introduction to GridPP
2. Performance of the GridPP/EGEE/wLCG Grid
3. Some Successes and Challenges
21/Sep/06 GridPP D. Britton
The CERN LHC
4 Large Experiments
The world’s most powerful particle accelerator - 2007
The LHC Experiments
ALICE
- heavy ion collisions, to create quark-gluon plasmas
- 50,000 particles in each collision
LHCb
- to study the differences between matter and antimatter
- will detect over 100 million b and b-bar mesons each year
ATLAS
- General purpose
- Origin of mass
- Supersymmetry
- 2,000 scientists from 34 countries
CMS
- General purpose
- 1,800 scientists from over 150 institutes
“One Grid to Rule Them All”?
Why do particle physicists need the Grid?
One year’s data from LHC would fill a stack of CDs 20 km high
(for scale: Concorde at 15 km, Mt. Blanc at 4.8 km)
• 100 million electronic channels
• 800 million proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year (10 million GBytes = 14 million CDs)
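The figures on this slide can be cross-checked with a short back-of-envelope calculation. The sketch below assumes a CD capacity of about 700 MB and a disc thickness of 1.2 mm; neither number appears on the slide, so the result should be read as an order-of-magnitude check only.

```python
# Back-of-envelope check of the slide's data-volume figures.
# ASSUMPTIONS (not from the slide): 700 MB per CD, 1.2 mm per disc.
BYTES_PER_YEAR = 10e15        # 10 PBytes of data a year (from the slide)
CD_CAPACITY_BYTES = 700e6     # assumed CD capacity
CD_THICKNESS_M = 1.2e-3       # assumed thickness of a bare disc

INTERACTIONS_PER_SECOND = 800e6   # from the slide
HIGGS_PER_SECOND = 0.0002         # from the slide


def cds_needed(total_bytes, cd_bytes=CD_CAPACITY_BYTES):
    """Number of CDs needed to hold total_bytes."""
    return total_bytes / cd_bytes


def stack_height_km(n_cds, thickness_m=CD_THICKNESS_M):
    """Height in km of a stack of n_cds discs."""
    return n_cds * thickness_m / 1000.0


n_cds = cds_needed(BYTES_PER_YEAR)
height_km = stack_height_km(n_cds)
interactions_per_higgs = INTERACTIONS_PER_SECOND / HIGGS_PER_SECOND

print(f"CDs per year: {n_cds / 1e6:.1f} million")
print(f"Stack height: {height_km:.1f} km")
print(f"Interactions per Higgs: {interactions_per_higgs:.1e}")
```

With these assumed values the CD count comes out close to the slide's 14 million, and the stack height lands in the same range as the quoted 20 km; the exact height depends on the thickness assumed per disc.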
Why do particle physicists need the Grid?
Example from LHC: starting from this event…
…we are looking for this “signature”
Selectivity: 1 in 10^13
Like looking for 1 person in a thousand world populations
Or for a needle in 20 million haystacks
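The two analogies can be checked against the stated selectivity. This is a rough sketch that assumes a world population of about 6.5 billion in 2006 (a round figure, not taken from the slide).

```python
import math

SELECTIVITY = 1e13             # 1 in 10^13 (from the slide)
WORLD_POPULATION = 6.5e9       # ASSUMED round figure for 2006
N_HAYSTACKS = 20e6             # 20 million haystacks (from the slide)

# "1 person in a thousand world populations"
people_in_thousand_worlds = 1000 * WORLD_POPULATION

# "a needle in 20 million haystacks": implied number of straws per haystack
straws_per_haystack = SELECTIVITY / N_HAYSTACKS

print(f"A thousand world populations: {people_in_thousand_worlds:.1e} people")
print(f"Order of magnitude: 10^{round(math.log10(people_in_thousand_worlds))}")
print(f"Implied straws per haystack: {straws_per_haystack:,.0f}")
```

Under that assumption, a thousand world populations is within an order of magnitude of 10^13, which is all the analogy needs.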
Who are GridPP?
19 UK universities and CCLRC (RAL & Daresbury), funded by PPARC.
GridPP1 2001-2004 "From Web to Grid"
GridPP2 2004-2007 "From Prototype to Production"
GridPP3 2008-2011 "From Production to Exploitation"
Global Context
[Figure: timeline 2001 to 2007 showing EDG, EGEE-I, EGEE-II and "EGI ?"; GridPP1, GridPP2 and GridPP3; and LHC data taking]
[Figure: GridPP in relation to EDG, EGEE, LCG, wLCG and (many) other projects]
Evolving standards
Developing requirements
Changing costs and budgets
Experience
Tier Structure
Detector, online system
Tier 0: offline farm, CERN computer centre
Tier 1 (national centres): RAL (UK), Italy, USA, France, Germany
Tier 2 (regional groups): ScotGrid, NorthGrid, SouthGrid, London
Tier 3 (institutes): Glasgow, Edinburgh, Durham
UK Tier-1/A Centre
• High quality data services
• National and International Role
• UK focus for International Grid development
• 1000 Dual CPU
• 330 TB Disk
• 532 TB Tape
Grid Operations Centre
UK Tier-2 Centres
ScotGrid: Durham, Edinburgh, Glasgow
NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
London: Brunel, Imperial, QMUL, RHUL, UCL
Grid Performance
Health of the Grid continuously monitored:
- Site Functional Test (SFT)
- Grid Status Monitor (GSTAT)
- Resource Broker Logging
- CPU/Storage accounting
- Migration to “Service Availability Monitoring” (SAM)
Job Slots
[Figure: UK total job slots. Number of published UK job slots from GSTAT (y-axis 0 to 6000) versus date, June 2004 to June 2006]
Significant increase in early 2006
Availability
Average monthly SITE and CPU availability from SFT.
Large contribution from UKI
Availability by UK Site
[Figure: stacked % of SFTs failed each month (January to May), by UK site: RAL-LCG2, IC LeSC*, RALPP, Glasgow, IC HEP, Queen Mary UL, UCL-HEP, Durham, Manchester, Lancaster, Liverpool, Bristol, Oxford, Birmingham, Cambridge, Edinburgh, Brunel, Sheffield, Royal Holloway UL, UCL-Central]
Usage
Usage over a week (16-27 June 2006) from the Resource Broker Logs
Publishing problem.
Active Users (EGEE-wide) by LHC experiment
ALICE (8)
CMS (150)
ATLAS (70)
LHCb (40)
Active users at RAL
Number of registered users (exc. DTEAM)
Quarter: 05Q4 06Q1 06Q2 06Q3
Value: 1342 1831 2777
Number of active users (> 10 jobs)
Quarter: 05Q4 06Q1 06Q2 06Q3
Value: 83 166 201
Fraction: 6.2% 11.0%
Virtual Organisations Supported
[Figure: number of enabled VOs (y-axis 0 to 25) in Jan-06 and Jun-06, by UK site: RALPP, Edinburgh, Durham, Brunel, RAL Tier-1, Lancaster, Royal Holloway UL, UCL-CCC, Oxford, IC-LESC, IC-HEP, Manchester, Glasgow, Birmingham, Queen Mary UL, Sheffield, Cambridge, Bristol, UCL-HEP, Liverpool]
2005 CPU Usage/Efficiency
[Figure: monthly % occupancy, raw and scaled (y-axis 0 to 110), January to December]
2006 CPU Usage/Efficiency
2006 Job Efficiency
Major LHC experiments achieve ~90%
Tier Centre Efficiency
[Figure: total hours, split into failed hours and success hours (y-axis 0 to 800000), for London Tier-2, NorthGrid, ScotGrid, SouthGrid and RAL Tier-1]
Successful hours (from RB logs) for each Tier Centre (Jan to Apr 2006)
[Figure: efficiency (y-axis 0 to 1.2) for London Tier-2, NorthGrid, ScotGrid, SouthGrid and RAL Tier-1 for the period [2006-01-01, 2006-04-30], with 90% marked]
Efficiencies for each Tier Centre.
Here, “Efficiency” = Successful Time / Total Time
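The efficiency definition on this slide can be written as a small function. This is a minimal sketch that assumes total time is the sum of the successful and failed hours (as the stacked chart suggests); the centre names and hour counts below are hypothetical placeholders, not values read from the plot.

```python
def efficiency(success_hours, failed_hours):
    """Efficiency = successful time / total time.

    ASSUMPTION: total time = successful hours + failed hours.
    """
    total_hours = success_hours + failed_hours
    if total_hours <= 0:
        raise ValueError("total hours must be positive")
    return success_hours / total_hours


# HYPOTHETICAL inputs for illustration only: (success hours, failed hours).
hours_by_centre = {
    "Centre A": (450_000.0, 50_000.0),
    "Centre B": (90_000.0, 30_000.0),
}

efficiencies = {
    name: efficiency(success, failed)
    for name, (success, failed) in hours_by_centre.items()
}

for name, value in efficiencies.items():
    print(f"{name}: {value:.0%}")
```

Weighting by hours rather than by job count means that one long failed job lowers the efficiency more than many short ones, which matches a definition based on time.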
Storage Accounting
[Figure: used and unused storage at the Tier-1 and at the Tier-2s]
Resources Delivered by the Tier-2 Sites
[Figure: CPU delivered (05Q4 and 06Q1). % CPU delivered vs scheduled for Sept 05 (y-axis 0% to 250%), by site: UCL, Sheffield, Durham, Cambridge, Birmingham, Edinburgh, Lancaster, QMUL, RHUL, Brunel, RAL PPD, Glasgow, Imperial, Oxford, Bristol, Manchester, Liverpool]
[Figure: Disk delivered (05Q4 and 06Q1). % disk delivered vs scheduled for Sept 05 (y-axis 0% to 160%), by site: UCL, Cambridge, Birmingham, Oxford, Bristol, Lancaster, Glasgow, Sheffield, QMUL, RAL PPD, Edinburgh, Durham, RHUL, Imperial, Manchester, Liverpool, Brunel]
Disk utilisation at Tier-2 sites has been low (and new purchases were strategically delayed).
Situation addressed in 2006 (initial one-to-one relationship with LHC experiments). Occupancy rose to ~40% level by Q2 2006.
Tickets Raised
[Figure: number of new tickets (y-axis 0 to 18) in Q1-2006 and Q2-2006, by site: RAL-LCG2, QMUL-eScience, Manchester, Durham, IC-LeSC, Oxford, Lancaster, IC-HEP, Birmingham, Brunel, Liverpool, Edinburgh, RHUL, UCL-Central, RALPP, Cambridge, Glasgow, UCL-HEP, Sheffield, Bristol]
[Figure: average time to close tickets in hours (y-axis 0 to 160) in Q1-2006 and Q2-2006, for the same sites]
Scheduled Downtime
[Figure: stacked % scheduled down each month (January to May; y-axis 0 to 160), by site: UCL-Central, Brunel, Sheffield, IC LeSC*, Cambridge, Queen Mary UL, UCL-HEP, Royal Holloway UL, Durham, Liverpool, Edinburgh, IC HEP, Glasgow, RALPP, Birmingham, RAL-LCG2, Bristol, Oxford, Lancaster, Manchester]
Upgrades
[Figure: number of sites at each release (y-axis 0 to 40) versus date, 09/04/2005 to 17/06/2006, for LCG-2_4_0, LCG-2_6_0, LCG-2_7_0 and GLITE-3_0_0]
Tier-0 to Tier-1 Data Transfers
• worldwide data transfers > 950MB/s for 1 week
• peak transfer rate from CERN of >1.6GB/s
• Ongoing experiment transfers as part of current service challenges
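To put the sustained rate in context, the sketch below converts 950 MB/s held for one week into a total volume, and compares it with the average rate implied by the 10 PBytes a year quoted earlier in the talk. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a 365-day year; the comparison itself is illustrative and is not made on the slide.

```python
SECONDS_PER_DAY = 86_400


def volume_tb(rate_mb_per_s, days):
    """Total volume in TB moved at a constant rate (decimal units assumed)."""
    return rate_mb_per_s * 1e6 * days * SECONDS_PER_DAY / 1e12


def average_rate_mb_per_s(total_bytes, days):
    """Average rate in MB/s needed to move total_bytes in the given days."""
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6


week_volume_tb = volume_tb(950, 7)                     # rate from the slide
yearly_average = average_rate_mb_per_s(10e15, 365)     # 10 PB/year, earlier slide

print(f"950 MB/s for 1 week: {week_volume_tb:.1f} TB")
print(f"10 PB spread over a year: {yearly_average:.0f} MB/s on average")
```

A constant-rate model ignores downtime and bursts, so the yearly average is only a lower bound on the peak capacity a real service would need.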
Tier-1 to Tier-2 Data Transfers
• UK data transfers >1000Mb/s for 3 days
• Peak transfer rate from RAL of >1.5Gb/s
• Need high data rate transfers to/from RAL as a routine activity
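These rates are quoted in bits per second, while the Tier-0 to Tier-1 figures use bytes. The sketch below assumes the lower-case "b" on this slide does mean bits, and uses decimal prefixes, to express the numbers in bytes for comparison.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_BYTE = 8


def mbit_to_mbyte_per_s(rate_mbit_per_s):
    """Convert a rate from megabits per second to megabytes per second."""
    return rate_mbit_per_s / BITS_PER_BYTE


def volume_tb_from_mbit(rate_mbit_per_s, days):
    """Total volume in TB moved at a constant rate given in Mb/s."""
    bytes_per_s = mbit_to_mbyte_per_s(rate_mbit_per_s) * 1e6
    return bytes_per_s * days * SECONDS_PER_DAY / 1e12


sustained_mbyte = mbit_to_mbyte_per_s(1000)     # 1000 Mb/s from the slide
peak_mbyte = mbit_to_mbyte_per_s(1500)          # 1.5 Gb/s from the slide
three_day_tb = volume_tb_from_mbit(1000, 3)

print(f"Sustained: {sustained_mbyte:.1f} MB/s, peak: {peak_mbyte:.1f} MB/s")
print(f"1000 Mb/s for 3 days: {three_day_tb:.1f} TB")
```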
Summary
“From Prototype to Production” is about understanding and improving performance.
Monitoring, understanding, and improving performance of a Grid is, in itself, a Grid challenge.
Many tools and metrics have been, and are being, developed to measure and monitor the performance of the GridPP/EGEE/wLCG Grid, and they are now providing feedback.
Many Successes….
95% efficiencies
3 PB of data
GridPP3 Proposal
Industry
Schools
Physics!
MOU signed
Einstein
Security
Wiki
…and many Challenges
A year later: Progress in all areas but the challenges remain.