Status of the LHC Computing Project
DESY, 26.1.2004
Matthias Kasemann / DESY
Seminar "Datenverarbeitung in der Hochenergiephysik" (Data Processing in High Energy Physics)




Outline
- Introduction
- LCG Overview: Resources
- The CERN LCG computing fabric
- Grid R&D and deployment: deploying the LHC computing environment; new projects – EGEE
- The LCG Applications: status of sub-projects
- And Germany…


The Large Hadron Collider Project
4 detectors: ALICE, ATLAS, CMS, LHCb

Requirements for world-wide data analysis:
- Storage – raw recording rate 0.1 – 1 GBytes/sec, accumulating at 5-8 PetaBytes/year; 10 PetaBytes of disk
- Processing – 100,000 of today’s fastest PCs
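A rough cross-check of these numbers (illustrative, not from the slides): assuming ~1e7 seconds of effective data taking per year, a common rule of thumb rather than a figure from the talk, the quoted raw rates give the PB/year scale directly:

# Back-of-the-envelope check (not from the talk): raw recording rate vs.
# accumulated data per year.  The ~1e7 s of effective data taking per
# year is an assumed rule of thumb, not a number from the slide.
SECONDS_OF_DATA_TAKING_PER_YEAR = 1e7

for rate_gb_per_s in (0.1, 0.5, 1.0):
    pb_per_year = rate_gb_per_s * SECONDS_OF_DATA_TAKING_PER_YEAR / 1e6  # 1 PB = 1e6 GB
    print(f"{rate_gb_per_s:3.1f} GB/s  ->  ~{pb_per_year:4.1f} PB/year")

# 0.1 GB/s gives ~1 PB/year and 1 GB/s gives ~10 PB/year, bracketing the
# 5-8 PB/year quoted for the experiments combined.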


LHC Computing Hierarchy
Emerging Vision: A Richly Structured, Global Dynamic System

- Experiment / Online System: ~PByte/sec off the detector, ~100-1500 MBytes/sec into the Tier 0 centre.
- Tier 0 + 1 – CERN Center: PBs of disk, tape robot; tens of Petabytes by 2007-8, an Exabyte ~5-7 years later.
- Tier 1 centres (FNAL, IN2P3, INFN, RAL): linked at 2.5-10 Gbps.
- Tier 2 centres: linked at ~2.5-10 Gbps.
- Tier 3 – institutes: physics data cache, 0.1 to 10 Gbps.
- Tier 4 – workstations.

CERN/outside resource ratio ~1:2; Tier0 : (all Tier1) : (all Tier2) ~ 1:1:1.


LCG - Goals
The goal of the LCG project is to prototype and deploy the computing environment for the LHC experiments.

Two phases:
- Phase 1: 2002 – 2005: build a service prototype, based on existing grid middleware; gain experience in running a production grid service; produce the TDR for the final system.
- Phase 2: 2006 – 2008: build and commission the initial LHC computing environment.

LCG is not a development project – it relies on other grid projects for grid middleware development and support.


LHC Computing Grid Project
The LCG Project is a collaboration of the LHC experiments, the Regional Computing Centres and physics institutes, working together to prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data.

This includes:
- support for applications: provision of common tools, frameworks, environment, data persistency;
- the development and operation of a computing service, exploiting the resources available to LHC experiments in computing centres, physics institutes and universities around the world, and presenting this as a reliable, coherent environment for the experiments.

The goal is to enable the physicists to concentrate on science, unaware of the details and complexity of the environment they are exploiting.


Requirements setting: RTAG status

RTAG1   Persistency Framework; completed
RTAG2   Managing LCG Software; completed
RTAG3   Math Library Review; completed
RTAG4   GRID Use Cases; completed
RTAG5   Mass Storage; completed
RTAG6   Regional Centres; completed
RTAG7   Detector geometry & materials description; completed
RTAG8   LCG blueprint; completed
RTAG9   Monte Carlo Generators; completed
RTAG10  Detector Simulation; completed
RTAG11  Architectural Roadmap towards Distributed Analysis; completed

Reports of the Grid Applications Group:
- HEPCAL I (LCG-SC2-20-2002): Common Use Cases for a Common Application Layer
- HEPCAL II (LCG-SC2-2003-032): Common Use Cases for a Common Application Layer for Analysis


Human Resources used to October 2003 – CERN + apps. at other institutes

LCG Project - Human Resources: experience-weighted FTE-years

Resources used (FTE-years):
Area                       2002   1Q03   2Q03   3Q03   Total
Applications (with ROOT)   37.7   13.3   15.6   15.7    82.3
CERN Fabric                32.2   10.4   10.3   10.7    63.5
Grid Deployment             7.5    4.1    4.6    4.9    21.1
Project Technology          1.2    0.3    0.8    1.4     3.6
LCG Management              5.9    1.8    1.7    1.6    11.0
Total                      84.5   29.9   33.0   34.2   181.6


LCG Project - Human Resources: unweighted FTEs in October 2003

Applications:
  Software Process Infrastructure      8.7
  Object Persistency (POOL)           12.3
  Core Libraries and Services (SEAL)   5.4
  Physics Interfaces (PI)              5.1
  Simulation                          16.0
  GRID interfacing                     4.2
  Architecture                         0.2
  Management                           1.7
  Total                               53.6
ROOT                                   5.7

Includes resources at CERN and at other institutes.


LCG Project - Human Resources: unweighted FTEs in October 2003

CERN Fabric:
  System Management & Operations                     10.4
  Development (e.g. Monitoring)                       9.6
  Data Storage Management                             7.5
  Grid Security                                       4.0
  Grid-Fabric Interface                               8.0
  Internal Networking for Physics                     1.0
  External Networking for Physics                     0.6
  Management                                          1.3
  Total                                              42.4

Project Technology:
  Grid Technology, modelling, evaluations             5.0
  Total                                               5.0

Grid Deployment:
  Integration and Certification                      14.5
  Grid Infrastructure, Operations and User Support   10.0
  Total                                              24.5

LCG Management                                        5.6

Includes all staff for LHC services at CERN. Does NOT include EDG middleware.


Personnel resources
Includes:
- all staff for LHC services at CERN (incl. networking)
- staff at other institutes in the applications area
Without Regional Centres and EDG middleware.

Current staff (FTEs) by project area:
  Applications         43%
  CERN Fabric          31%
  Grid Deployment      18%
  Project Technology    4%
  Project Management    4%

Current staff (FTEs) by funding source:
  LCG Special Funding          43%
  EU (Fabric)                   2%
  CERN Support                 44%
  CERN - Experiments            5%
  Other Insts. - Experiments    6%


Phase 1+2 – Human Resources at CERN

[Chart: LCG Human Resources – Budget & Funding Estimates, in FTEs (0-180) per year for 2003-2010. Series shown:
- Experiments – level of August 2003
- Estimated net resources from EGEE
- Guess about EGEE follow-on
- Phase II in budget – but NOT funded
- Commitments for special funding in Phase 1 at CERN
- Physics services funded at CERN (EP, IT)
Overlaid requirement curves: applications (estimated evolution) and computing service.]


LCG Regional Centre Capacity 2Q2004

Country       CPU capacity   Disk capacity   LCG support   Online tapes   Diff. to CPU capacity
              (kSI2k)        (TB)            (FTE)         (TB)           announced May 2003
CERN           700            100             10.0          1000              0
Czech Rep.      30              6              1.0             2            -30
France         150             24              1.0           160           -270
Germany        305             44              9.3            74             98
Holland         30              1              3.0            20            -94
Hungary                                                                     -70
Italy          895            110             13.3           100            388
Japan          125             40                             50            -95
Poland          48              5              1.8             5            -38
Russia         102             12.9            4.0            26            -18
Taipei          40              5.8            4.6            30           -180
UK             486             55.3            3.4           100          -1170
USA¹            80             25                            100           -721
Switzerland     18              4              1.4            20             -8
Spain          150             30              4.0           100              0
Sweden                                                                      -179
Sum           3159            463             57            1787          -2388

Today: ~1000 SI2k/box. In 2007: ~8000 SI2k/box.
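An illustrative conversion (not from the slides) of these capacities into approximate box counts, using the SI2k-per-box rule of thumb quoted above:

# Illustrative conversion: kSI2k capacities from the table above turned
# into an approximate number of boxes, using the rule of thumb on the
# slide (~1000 SI2k/box today, ~8000 SI2k/box expected in 2007).
SI2K_PER_BOX_2004 = 1000
SI2K_PER_BOX_2007 = 8000

capacities_ksi2k = {"CERN": 700, "Germany": 305, "Italy": 895, "Sum of all centres": 3159}

for centre, ksi2k in capacities_ksi2k.items():
    boxes_2004 = ksi2k * 1000 / SI2K_PER_BOX_2004
    boxes_2007 = ksi2k * 1000 / SI2K_PER_BOX_2007
    print(f"{centre:20s} {ksi2k:5d} kSI2k  ~{boxes_2004:6.0f} boxes today, "
          f"~{boxes_2007:4.0f} boxes with 2007 hardware")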


Storage (II)

CASTOR: CERN development of a Hierarchical Storage Management system (HSM) for LHC
- Two teams are working in this area: Developer (3.8 FTE) and Support (3 FTE). Support to other institutes is currently under negotiation (LCG, HEPCCC).
- Usage: 1.7 PB of data in 13 million files, a 250 TB disk layer and 10 PB of tape storage.
  Central Data Recording and data processing: NA48 0.5 PB, COMPASS 0.6 PB, LHC experiments 0.4 PB.
- The current CASTOR implementation needs improvements – new CASTOR stager:
  – a pluggable framework for intelligent and policy-controlled file access scheduling;
  – an evolvable storage resource sharing facility framework rather than a total solution;
  – detailed workplan and architecture available, presented to the user community in summer.
- Carefully watching tape technology developments (not really commodity); in-depth knowledge and understanding is key.
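A quick sanity check of these usage figures (illustrative arithmetic, not from the talk):

# CASTOR usage figures from this slide: 1.7 PB in 13 million files with a
# 250 TB disk layer.  Average file size and the fraction of the data the
# disk layer can hold at once (decimal units).
data_pb = 1.7
files = 13e6
disk_tb = 250.0

avg_file_mb = data_pb * 1e9 / files          # 1 PB = 1e9 MB
disk_fraction = disk_tb / (data_pb * 1e3)    # 1 PB = 1e3 TB

print(f"average file size ~{avg_file_mb:.0f} MB")             # ~130 MB
print(f"disk layer holds ~{disk_fraction:.0%} of the data")   # ~15%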




Network

Network infrastructure based on Ethernet technology.

For 2008 we need a completely new (performance) backbone in the centre, based on 10 Gbit technology. Today very few vendors offer such a multiport, non-blocking 10 Gbit router. We already have an Enterasys product under test (openlab, prototype).

Timescale is tight:
- Q1 2004: market survey
- Q2 2004: install 2-3 different boxes, start thorough testing; prepare new purchasing procedures, finance committee, vendor selection, large order
- Q3 2005: installation of 25% of the new backbone
- Q3 2006: upgrade to 50%
- Q3 2007: 100% new backbone


Wide Area Network

- Currently 4 lines: 21 Mbit/s, 622 Mbit/s, 2.5 Gbit/s (GEANT), and a dedicated 10 Gbit/s line (Starlight Chicago, DATATAG); next year a full 10 Gbit/s production line.
- Needed for import and export of data and for the Data Challenges; today's data rate is 10 – 15 MB/s.
- Tests of mass storage coupling are starting (Fermilab and CERN).
- Next year more production-like tests with the LHC experiments: the CMS-IT data streaming project inside the LCG framework; tests on several layers: bookkeeping/production scheme, mass storage coupling, transfer protocols (gridftp, etc.), TCP/IP optimization.
- 2008: multiple 10 Gbit/s lines will be available with the move to 40 Gbit/s connections. CMS and LHCb will export the second copy of the raw data to the T1 centres; ALICE and ATLAS want to keep the second copy at CERN (still under discussion).
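Illustrative arithmetic (not from the slide): comparing today's 10-15 MB/s with the capacity of a 10 Gbit/s line shows how much headroom the Data Challenges have to fill:

# Today's 10-15 MB/s transfer rate vs. the capacity of the dedicated
# 10 Gbit/s line mentioned above (decimal units).
line_gbit_per_s = 10
line_mb_per_s = line_gbit_per_s * 1000 / 8   # 10 Gbit/s = 1250 MB/s

for rate_mb_per_s in (10, 15):
    utilisation = rate_mb_per_s / line_mb_per_s
    print(f"{rate_mb_per_s} MB/s is ~{utilisation:.1%} of a 10 Gbit/s line")

# Current production transfers use only about 1% of the available
# bandwidth; the Data Challenges and 2008 running will need far more.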


Comparison: 2003 status vs. 2008 prediction

                                2003 status                  2008 prediction
Hierarchical Ethernet network   2 GB/s                       280 GB/s
Mirrored disks                  2000 (0.25 PB)               ~8000 (4 PB)
CPU nodes                       2000 nodes (1.2 MSI2000)     ~3000 dual-CPU nodes (20 MSI2000)
Tape drives                     50 drives (0.8 GB/s)         ~170 drives (4 GB/s)
Tape storage                    10 PB                        ~25 PB

The CMS HLT alone will consist of about 1000 nodes with 10 million SI2000!
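A quick check (illustrative, not from the slides) of the per-node capacity implied by the CPU rows above:

# Per-node capacity implied by the 2003 and 2008 CPU rows in the
# comparison table.
nodes_2003, msi2k_2003 = 2000, 1.2
nodes_2008, msi2k_2008 = 3000, 20.0

per_node_2003 = msi2k_2003 * 1e6 / nodes_2003   # ~600 SI2000 per node
per_node_2008 = msi2k_2008 * 1e6 / nodes_2008   # ~6700 SI2000 per node

print(f"2003: ~{per_node_2003:.0f} SI2000/node")
print(f"2008: ~{per_node_2008:.0f} SI2000/node "
      f"(~{per_node_2008 / per_node_2003:.0f}x more per node)")

# Roughly consistent with the ~1000 SI2k/box today vs ~8000 SI2k/box in
# 2007 rule of thumb quoted earlier in the talk.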


External Fabric relations

- CERN IT: main fabric provider.
- LCG: hardware resources, manpower resources.
- EDG WP4: installation, configuration, monitoring, fault tolerance.
- GRID technology and deployment: common fabric infrastructure, fabric-GRID interdependencies.
- GDB working groups: site coordination, common fabric issues.
- Collaboration with industry (openlab: HP, INTEL, IBM, Enterasys, Oracle): 10 Gbit networking, new CPU technology, possibly new storage technology.
- External network: DataTag, Grande; data streaming project with Fermilab.
- Collaboration with India: filesystems, Quality of Service.
- Collaboration with CASPUR: hardware and software benchmarks and tests, storage and network.
- LINUX: via HEPIX, RedHat license coordination inside HEP (SLAC, Fermilab), certification and security.
- CASTOR: SRM definition and implementation (Berkeley, Fermi, etc.); mass storage coupling tests (Fermi); scheduler integration (Maui, LSF); support issues (LCG, HEPCCC).
- Online-offline boundaries: workshop and discussions with the experiments; Data Challenges.


FA Timeline (2004 – 2008)

Milestones on the fabric-area timeline:
- LCG Computing TDR
- Decision on storage solution
- Decision on batch scheduler
- Network backbone: 25%, then 50%, then 100%
- Power and cooling: 0.8 MW, then 1.6 MW, then 2.5 MW
- Phase 2 installations (tape, CPU, disk): 30%, then 60%
- LHC start

Throughout: preparations, benchmarks, data challenges, architecture verification, evaluations, computing models.


Resources and Requests

[Chart: CPU capacity in kSI2000 (0-5000) versus month of 2004 (months 1-6), comparing the requests of ALICE, ATLAS, CMS and LHCb with the possible resources and the Tier resources.]


LHC Computing Grid Service

Tier 0: CERN
Tier 1 centres: Brookhaven National Lab, CNAF Bologna, Fermilab, FZK Karlsruhe, IN2P3 Lyon, Rutherford Appleton Lab (UK), University of Tokyo, CERN
Other centres: Academia Sinica (Taipei), Barcelona, Caltech, GSI Darmstadt, Italian Tier 2s (Torino, Milano, Legnaro), Manno (Switzerland), Moscow State University, NIKHEF Amsterdam, Ohio Supercomputing Centre, Sweden (NorduGrid), Tata Institute (India), Triumf (Canada), UCSD, UK Tier 2s, University of Florida – Gainesville, University of Prague, …

Initial sites are deploying now; others will be ready in the next 6-12 months.


Elements of a Production LCG Service

Middleware:
- Testing and certification
- Packaging, configuration, distribution and site validation
- Support – problem determination and resolution; feedback to middleware developers

Operations:
- Grid infrastructure services
- Site fabrics run as production services
- Operations centres – trouble and performance monitoring, problem resolution – 24x7 globally
- RAL is leading the sub-project on developing operations services
- Initial prototype: basic monitoring tools; mail lists and rapid communications/coordination for problem resolution

Support:
- Experiment integration – ensure optimal use of the system
- User support – call centres/helpdesk – global coverage; documentation; training
- FZK is leading the sub-project to develop user support services
- Initial prototype: web portal for problem reporting; the expectation is that initially experiments will triage problems and experts will submit LCG problems to the support service
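As an illustration of what such a "basic monitoring tool" might look like (a hypothetical sketch, not LCG code; the host names and port number are placeholders):

# Minimal sketch of a site-availability check of the kind an operations
# centre prototype might run: try to open a TCP connection to each
# site's gatekeeper port and report sites that need follow-up.
import socket

SITES = {
    "site-a.example.org": 2119,   # hypothetical gatekeeper endpoints
    "site-b.example.org": 2119,
}

def site_is_reachable(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host, port in SITES.items():
        status = "OK" if site_is_reachable(host, port) else "UNREACHABLE"
        print(f"{host}:{port}  {status}")
    # In a real operations centre this result would feed a trouble-ticket
    # or mailing-list workflow rather than just being printed.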


Timeline for the LCG services (2003 – 2006)

- 2003: agree the LCG-1 specification; LCG-1 service opens; event simulation productions.
- 2004: LCG-2 with upgraded middleware, management etc.; service for the Data Challenges, batch analysis, simulation; stabilize, expand, develop; evaluation of 2nd-generation middleware; computing model TDRs.
- 2005: LCG-3 – full multi-tier prototype batch + interactive service; validation of computing models; TDR for Phase 2.
- 2006: acquisition, installation, testing of the Phase 2 service; Phase 2 service in production.


Deployment Activities: Human Resources

Activity                               CERN/LCG   External
Integration & Certification            6          External testbed sites
Debugging / development / mw support   3
Testing                                3          2 + VDT testers group
Experiment Integration & Support       5
Deployment & Infrastructure Support    5.5        RC system managers
Security / VO Management               2          Security Task Force
Operations Centres                                RAL + GOC Task Force
Grid User Support                                 FZK + US Task Force
Management                             1
Totals                                 25.5

Notes:
- A team of 3 Russians keeps 1 person at CERN at any given time (3-month rotations).
- Operations Centres and Grid User Support are run in collaboration with the task forces above; refer to the Security and Operations Centre talks for details.
- The GDA team has been very understaffed – only now has this improved with 7 new fellows.
- There are many opportunities for more collaborative involvement in operational and infrastructure activities.


2003 Milestones

Project Level 1 deployment milestones for 2003:
- July: introduce the initial publicly available LCG-1 global grid service
  • with 10 Tier 1 centres on 3 continents
- November: expanded LCG-1 service with resources and functionality sufficient for the 2004 Computing Data Challenges
  • additional Tier 1 centres, several Tier 2 centres – more countries
  • expanded resources at Tier 1s
  • agreed performance and reliability targets

The idea was:
- Deploy a service in July, then take several months to gain experience (operations, experiments).
- By November: meet the performance targets (30 days running) – experiment verification; expand resources – regional centres and compute resources; upgrade functionality.


Milestone Status

- The July milestone was 3 months late – late middleware, slow take-up in regional centres.
- The November milestone will be partially met:
  – LCG-2 will be a service for the Data Challenges;
  – Regional Centres were added to the level anticipated (23), including several Tier 2s (Italy, Spain, UK);
  – but: lack of operational experience; experiments have only just begun serious testing.
- LCG-2 will be deployed in December:
  – functionality required for the DCs;
  – meet the verification part of the milestone with LCG-2 early next year;
  – experiments must drive the addition of resources into the service;
  – address operational and functional issues (stability, adding resources) in parallel with operations;
  – this will be a service for the Data Challenges.


Sites in LCG-1 – 21 Nov

BNL, Budapest, CERN, CNAF, Torino, Milano, FNAL, FZK, Krakow, Moscow, Prague, RAL, Imperial College, Cavendish, Taipei, Tokyo, PIC-Barcelona, IFIC Valencia, Ciemat Madrid, UAM Madrid, USC Santiago de Compostela, UB Barcelona, IFCA Santander.

Sites to enter soon: CSCS Switzerland, Lyon, NIKHEF; more Tier 2 centres in Italy and the UK.
Sites preparing to join: Pakistan, Sofia.


Achievements – 2003

- Put in place the Integration and Certification process:
  – essential to prepare middleware for deployment – the key tool in trying to build a stable environment;
  – used seriously since January for LCG-0, 1, 2 – also provided crucial input to EDG; LCG is more stable than earlier systems.
- Set up the deployment process:
  – the tiered deployment and support system is working;
  – currently the support load on the small team is high and must devolve to the GOC.
- Supported experiment deployment on LCG-0, 1:
  – the user support load is high – it must move into the support infrastructure (FZK);
  – CMS use of LCG-0 in production;
  – produced a comprehensive User Guide.
- Put in place security policies and agreements – particularly important agreements on registration requirements.
- Basic Operations Centre and Call Centre frameworks are in place.
- Expect to be ready for the 2004 DCs:
  – the essential infrastructures are ready, but have not yet been production tested;
  – improvements will happen in parallel with operating the system.


Issues

- Middleware is not yet production quality: although a lot has been improved, it is still unstable and unreliable; some essential functionality was not delivered and LCG had to address it.
- Deployment tools are not adequate for many sites: hard to integrate into existing computing infrastructures; too complex, hard to maintain and use.
- Middleware limits a site's ability to participate in multiple grids – something that is now required for many large sites supporting many experiments and other applications.
- We are only now beginning to try to run LCG as a service – beginning to understand and address the missing tools, etc. for operation.
- Delays have meant that we could not yet address these fundamental issues as we had hoped to this year.


Expected Developments in 2004

- General: LCG-2 will be the service run in 2004 – aim to evolve incrementally; the goal is to run a stable service.
- Some functional improvements:
  – extend access to MSS – tape systems and managed disk pools;
  – distributed replica catalogs with Oracle back-ends, to avoid reliance on single service instances.
- Operational improvements:
  – monitoring systems – move towards proactive problem finding and the ability to take sites on/offline;
  – continual effort to improve reliability and robustness;
  – develop accounting and reporting.
- Address integration issues:
  – with large clusters and with storage systems;
  – ensure that large clusters can be accessed via LCG;
  – the issue of integrating with other experiments and applications.


Services in 2004 – EGEE

- LCG-2 will be the production service during 2004:
  – it will also form the basis of the initial EGEE production service;
  – EGEE will take over operations during 2004 – but with the same teams;
  – it will be maintained as a stable service, peering with other grid infrastructures, and will continue to be developed.
- Expect in parallel a development service from Q2 2004:
  – based on EGEE middleware prototypes;
  – run as a service on a subset of EGEE/LCG production sites;
  – demonstrating the new architecture and functionality to the applications;
  – additional resources to achieve this come from EGEE.
- The development service must demonstrate superiority in all aspects: functionality, operations, maintainability, etc.
- End 2004: the hope is that the development service becomes the production service.


Deployment: Summary

- LCG-1 is deployed to 23+ sites, despite the complexities and problems.
- The LCG-2 functionality will support production activities for the Data Challenges and will allow basic tests of analysis activities; the infrastructure is there to provide all aspects of support.
- The staffing situation at CERN is now better: most of what was done so far was done with limited staffing; the situation in the regional centres still needs to be clarified.
- We are now in a position to address the complexities.


EGEE-LCG Relationship

Enabling Grids for e-Science in Europe – EGEE:
- EU project approved to provide partial funding for the operation of a general e-Science grid in Europe, including the supply of suitable middleware.
- EGEE provides funding for 70 partners, the large majority of which have strong HEP ties.

Agreement between LCG and EGEE management on very close integration.

OPERATIONS:
- LCG operates the EGEE infrastructure as a service to EGEE, ensuring compatibility between the LCG and EGEE grids.
- In practice, the EGEE grid will grow out of LCG.
- The LCG Grid Deployment Manager (Ian Bird) serves also as the EGEE Operations Manager.


EGEE-LCG Relationship (ii)

MIDDLEWARE:
- The EGEE middleware activity provides a middleware package satisfying requirements agreed with LCG (HEPCAL, ARDA, ..) and equivalent requirements from other sciences.
- Middleware here means the tools that provide functions that are of general application – not HEP-special or experiment-special – and that we can reasonably expect to come in the long term from public or commercial sources (cf. internet protocols, unix, html).
- Very tight delivery timescale dictated by LCG requirements: start with the LCG-2 middleware; rapid prototyping of a new round of middleware, with the first "production" version in service by end 2004.
- The EGEE Middleware Manager (Frédéric Hemmer) serves also as the LCG Middleware Manager.


PARTNERS
70 partners organized in nine regional federations. Coordinating and lead partner: CERN.
CENTRAL EUROPE – FRANCE – GERMANY & SWITZERLAND – ITALY – IRELAND & UK – NORTHERN EUROPE – SOUTH-EAST EUROPE – SOUTH-WEST EUROPE – RUSSIA – USA

STRATEGY
- Leverage current and planned national and regional Grid programmes
- Build on existing investments in Grid technology by the EU and the US
- Exploit the international dimension of the HEP-LCG programme
- Make the most of the planned collaboration with the NSF CyberInfrastructure initiative

A seamless international Grid infrastructure to provide researchers in academia and industry with a distributed computing facility.

ACTIVITY AREAS
- SERVICES: deliver "production level" grid services (manageable, robust, resilient to failure); ensure security and scalability.
- MIDDLEWARE: professional Grid middleware re-engineering activity in support of the production services.
- NETWORKING: proactively market Grid services to new research communities in academia and industry; provide the necessary education.


EGEE: partner federations

9 regional federations covering 70 partners in 26 countries.
Goals: integrate regional grid efforts; represent the leading grid activities in Europe.


EGEE Proposal

- Proposal submitted to the EU IST 6th Framework call on 6th May 2003 (executive summary: 10 pages; full proposal: 276 pages):
  http://agenda.cern.ch/askArchive.php?base=agenda&categ=a03816&id=a03816s5%2Fdocuments%2FEGEE-executive-summary.pdf
- Activities:
  • Deployment of Grid infrastructure: provide a grid service for science research; the initial service will be based on LCG-1; aim to deploy re-engineered middleware at the end of year 1.
  • Re-engineering of grid middleware: OGSA environment – well-defined services, interfaces, protocols; in collaboration with US and Asia-Pacific developments; using LCG and the HEP experiments to drive US-EU interoperability and common solutions; a common design activity should start now.
  • Dissemination, training and applications: initially HEP & Bio.

EGEE: timeline

• May 2003: proposal submitted
• July 2003: positive EU reaction
• September 2003: negotiation started; approx. 32 M€ over 2 years
• December 2003: EU contract signed
• April 2004: project starts



Group Level Analysis in an HENP experiment

The user specifies job information including:
• selection criteria;
• the Metadata Dataset (input);
• information about software (library) and configuration versions;
• the output AOD and/or TAG Dataset (typical);
• the program to be run.

The user submits the job; the program is run; the selection criteria are used for a query on the Metadata Dataset; the Event IDs satisfying the selection criteria and the Logical Dataset Names of the corresponding Datasets are retrieved; the input Datasets are accessed; events are read; the algorithm (program) is applied to the events; the output Datasets are uploaded; the experiment Metadata is updated; a report summarizing the output of the jobs is prepared for the group (e.g. how many events went to which stream, ...), extracting the information from the application and the Grid middleware.

Grid services touched along the way: authentication, authorization, metadata catalog, workload management, package manager, compute element, file catalog, data management, storage element, job provenance, auditing.
(HENP Distributed Analysis; existing implementations referenced: LCG, AliEn, EDG, VDT.)
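A minimal sketch of this workflow in code (purely illustrative; the function and service names are hypothetical placeholders, not any experiment's framework, and the grid services are simulated by in-memory stubs):

# Toy version of the group-level analysis workflow described above.
def query_metadata(selection, metadata_catalog):
    """Metadata catalog: event IDs and logical dataset names matching the selection."""
    hits = [(eid, lfn) for eid, (stream, lfn) in metadata_catalog.items() if stream == selection]
    event_ids = [eid for eid, _ in hits]
    input_lfns = sorted({lfn for _, lfn in hits})
    return event_ids, input_lfns

def run_job(program, event_ids):
    """Compute element: apply the algorithm to the selected events."""
    return {"events_processed": len(event_ids), "output_lfn": "lfn:/group/output/aod_001"}

def group_level_analysis(selection, metadata_catalog):
    # 1. query the metadata catalog with the selection criteria
    event_ids, input_lfns = query_metadata(selection, metadata_catalog)
    # 2. run the program on the input datasets (workload mgmt + compute element)
    result = run_job(program="my_analysis", event_ids=event_ids)
    # 3. "upload" the output dataset and update the experiment metadata
    metadata_catalog[f"out:{result['output_lfn']}"] = (selection, result["output_lfn"])
    # 4. report back to the group
    return f"{result['events_processed']} events from {len(input_lfns)} datasets -> {result['output_lfn']}"

if __name__ == "__main__":
    catalog = {1: ("dimuon", "lfn:/data/run1"), 2: ("dimuon", "lfn:/data/run2"), 3: ("jets", "lfn:/data/run1")}
    print(group_level_analysis("dimuon", catalog))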

LHC Computing Grid: A Roadmap towards Distributed Analysis
• based on a review of the experiments' needs
• input to the EGEE middleware re-engineering

Proposed: Key Services for HEP Distributed Analysis
Information Service, Authentication, Authorization, Auditing, Accounting, Grid Monitoring, Workload Management, Metadata Catalogue, File Catalogue, Data Management, Computing Element, Storage Element, Job Monitor, Job Provenance, Package Manager, DB Proxy, User Interface, API.
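To make the decomposition concrete, here is a purely illustrative sketch (not the ARDA or EGEE interface definitions) of how a few of the listed services could be expressed as abstract interfaces:

# Illustrative abstract interfaces for a handful of the proposed services.
from abc import ABC, abstractmethod

class FileCatalogue(ABC):
    """Maps logical file names to physical replicas."""
    @abstractmethod
    def replicas(self, logical_name: str) -> list[str]: ...

class MetadataCatalogue(ABC):
    """Answers selection queries with matching logical dataset names."""
    @abstractmethod
    def query(self, selection: str) -> list[str]: ...

class WorkloadManagement(ABC):
    """Accepts jobs and schedules them onto computing elements."""
    @abstractmethod
    def submit(self, program: str, inputs: list[str]) -> str:
        """Return a job identifier."""

class JobProvenance(ABC):
    """Records what ran, where, and on which inputs (feeds auditing)."""
    @abstractmethod
    def record(self, job_id: str, inputs: list[str], outputs: list[str]) -> None: ...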



ARDA, EGEE & LCG Middleware possible evolution

LCG2 LCG2 ’’

12/2003 12/2004

ARDAStarts

ARDAPrototype

EGEEStarts

EGEERelease

Candidate

LCG2 ’

LCG


Applications Area Organisation

- Applications manager: leads the area.
- Architects forum: AA manager (chair), experiment architects, project leaders, ROOT leader; makes decisions and sets strategy; ~2 meetings/month; public minutes.
- Applications area meeting: consultation; ~3 meetings/month; open to all; 25-50 attendees.
- Projects: SPI, POOL, SEAL, PI, Simulation; organized into work packages.
- ROOT: user-provider relationship; now developing a more collaborative ROOT relationship than user/provider.


Apps Area Projects and their Relationships

[Diagram: the LCG Applications Area comprises Software Process & Infrastructure (SPI), Core Libraries & Services (SEAL), Persistency (POOL), Physicist Interface (PI), Simulation, ...; it interfaces to the LHC experiments and to the other LCG areas.]


Applications Domain Decomposition

[Diagram: domains include Event Generation (EvtGen), Detector Simulation (Engine), Reconstruction (Algorithms), Analysis (GUI, Fitter, NTuple), Calibration, Geometry, Event Model, Core Services (Dictionary, Whiteboard, PluginMgr, Scheduler, Monitor, Scripting), Persistency (StoreMgr, FileCatalog), Grid Services, Interactive Services (Modeler), and Foundation and Utility Libraries; underpinned by external products such as ROOT, GEANT4, FLUKA, DataGrid, Python, Qt, MySQL, ...]

Products mentioned are examples, not a comprehensive list.
Project activity exists in all expected areas except grid services (coming with ARDA).
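As an illustration of what a "plugin manager" core service in such a decomposition does (a toy sketch, not the SEAL PluginMgr), components are registered under a name so that frameworks can instantiate them without compile-time coupling:

# Toy plugin manager: maps component names to factories.
from typing import Callable, Dict, Any

class PluginManager:
    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        """Register a factory under a component name."""
        self._factories[name] = factory

    def create(self, name: str) -> Any:
        """Instantiate a component by name (raises KeyError if unknown)."""
        return self._factories[name]()

if __name__ == "__main__":
    class AsciiNTupleWriter:          # hypothetical component
        def write(self, row): print("ntuple row:", row)

    pm = PluginManager()
    pm.register("NTupleWriter", AsciiNTupleWriter)
    writer = pm.create("NTupleWriter")
    writer.write({"pt": 12.3})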


Level 1 and Highlighted Level 2 milestones

- Jan 03: SEAL, PI workplans approved
- Mar 03: Simulation workplan approved
- Apr 03: SEAL V1 (priority POOL needs)
- May 03: SPI software library fully deployed
- Jun 03: General release of POOL (LHCC Level 1) – functionality requested by the experiments for production usage
- Jul 03: SEAL framework services released (experiment directed)
- Jul 03: CMS POOL/SEAL integration
- Sep 03: ATLAS POOL/SEAL integration
- Oct 03: CMS POOL/SEAL validation (~1M events/week written)
- Dec 03: LHCb POOL/SEAL integration
- Jan 04: ATLAS POOL/SEAL validation (50 TB DC1 POOL store)
- May 04: Generator event database beta in production
- Oct 04: Generic simulation framework production release
- Dec 04: Physics validation document
- Mar 05: Full function release of POOL (LHCC Level 1)

The earlier milestones are marked as completed on the slide; see the supplemental slide for current milestone performance.


L1+L2 milestone counts by quarter

[Chart: number of Level 1 and Level 2 milestones per quarter, 2002 Q2 – 2005 Q1 (0-16 per quarter).]

2004 milestones will be fleshed out by the workplan updates due this quarter and by the finalization of the slate of ~10 L2 milestones for the next quarter (~2 per project) that we do before each quarterly report. New milestones will be added in the ARDA planning.


Applications Area Personnel Resources

- LCG applications area hires are complete: 21 working; the target in the September 2001 LCG proposal was 23.
- Contributions from the UK, Spain, Switzerland, Germany, Sweden, Israel, Portugal, US, India, and Russia.
- Similar contribution levels from CERN and the experiments (in FTEs; the experiment number includes CERN people working on experiments).
- See the supplemental slide for detail on personnel sources.


Personnel Distribution

Effort levels match the need anticipated in the blueprint RTAG.


And in Germany...

We have GridKa, GSI and DESY.

And we are starting to increase the HEP Grid footprint: on Feb. 11 there will be a meeting of the German HEP groups … to establish more flags on the German Grid map.


Computing for Particle Physics: DESY

At DESY, HEP computing is part of the physics programme. DESY operates Tier 0 and Tier 1 centres for H1, ZEUS, HERMES, HERA-B, LC, and computing for theory.

Services & Development (highlights):
- Distributed Monte Carlo production since 1993
- Data handling on the Grid: dCache
- OO simulation: Geant4
- Large computing clusters and data storage