
LHC Computing Grid Project - LCG

The LHC Computing Grid

First steps towards a Global Computing Facility for Physics

18 September 2003

Les Robertson, LCG Project Leader
CERN, European Organization for Nuclear Research
Geneva, Switzerland

LHC Computing Grid Project

The LCG Project is a collaboration of:
• the LHC experiments
• the Regional Computing Centres
• physics institutes

.. working together to prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data.

This includes support for applications:
• provision of common tools, frameworks, environment, data persistency

.. and the development and operation of a computing service:
• exploiting the resources available to LHC experiments in computing centres, physics institutes and universities around the world
• presenting this as a reliable, coherent environment for the experiments

Operational Management of the Project

Joint with EGEE

Applications Area
• Development environment
• Joint projects
• Data management
• Distributed analysis

Middleware Area
• Provision of a base set of grid middleware: acquisition, development, integration, testing, support

CERN Fabric Area
• Large cluster management
• Data recording
• Cluster technology
• Networking
• Computing service at CERN

Grid Deployment Area
• Establishing and managing the Grid Service: middleware certification, security, operations, registration, authorisation, accounting

LCG Applications Area Projects

• Software Process and Infrastructure (SPI) (A.Aimar): librarian, QA, testing, developer tools, documentation, training, …
• Persistency Framework (POOL) (D.Duellmann): POOL hybrid ROOT/relational data store
• Core Tools and Services (SEAL) (P.Mato): foundation and utility libraries, basic framework services, object dictionary and whiteboard, math libraries, (grid enabled services)
• Physicist Interface (PI) (V.Innocente): interfaces and tools by which physicists directly use the software; interactive analysis, visualization, (distributed analysis & grid portals)
• Simulation (T.Wenaus et al): generic framework, Geant4, FLUKA integration, physics validation, generator services

Close relationship with ROOT (R.Brun): ROOT I/O event store; analysis package

Group currently working on distributed analysis requirements


LHC data

• 40 million collisions per second

• After filtering, 100-200 collisions of interest per second

• 1-10 Megabytes of data digitised for each collision = recording rate of 0.1-1 Gigabytes/sec

• 10^10 collisions recorded each year = ~15 Petabytes/year of data

Experiments: CMS, LHCb, ATLAS, ALICE

1 Megabyte (1 MB): a digital photo
1 Gigabyte (1 GB) = 1000 MB: a DVD movie
1 Terabyte (1 TB) = 1000 GB: world annual book production
1 Petabyte (1 PB) = 1000 TB: annual production of one LHC experiment
1 Exabyte (1 EB) = 1000 PB: world annual information production
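The rate and volume figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check using the slide's own numbers and decimal units; the variable names are illustrative. Note that the naive upper product (200 collisions/s at 10 MB each) comes to 2 GB/s, above the quoted 1 GB/s, so the quoted range presumably assumes the two extremes do not coincide.

```python
# Back-of-envelope check of the slide's figures (decimal units: 1 GB = 1000 MB).
MB = 1e6
GB = 1e9
PB = 1e15

collisions_per_sec = (100, 200)   # collisions of interest per second after filtering
event_size_mb = (1, 10)           # megabytes digitised per collision

# Extremes of rate x size, expressed in GB/s.
min_rate = collisions_per_sec[0] * event_size_mb[0] * MB / GB
max_rate = collisions_per_sec[1] * event_size_mb[1] * MB / GB

# Average size per recorded collision implied by 10^10 collisions and ~15 PB per year.
collisions_per_year = 1e10
annual_volume_pb = 15
implied_event_size_mb = annual_volume_pb * PB / collisions_per_year / MB

print(f"recording rate between {min_rate:.1f} and {max_rate:.1f} GB/s")
print(f"implied average event size: {implied_event_size_mb:.1f} MB")
```

The implied average of about 1.5 MB per recorded collision sits at the low end of the 1-10 MB range quoted above.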


LHC data

~15 PetaBytes – about 20 million CDs each year!

[Figure: height comparison. Mt. Blanc (4.8 km), Concorde (15 km), CD stack with 1 year of LHC data (~20 km), Balloon (30 km)]

Its analysis will need the computing power of ~ 100,000 of today's fastest PC processors!

Where will the experiments store all of this data?

and where will they find this computing power?
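The CD comparison can be reproduced roughly as follows. This is a sketch under stated assumptions: the slide does not say which CD capacity or thickness it used, so the 700 MB capacity and 1.2 mm disc thickness below are assumed nominal values, and the result only needs to land in the same range as the slide's "about 20 million CDs" and "~20 km".

```python
# Rough check of the CD comparison, using assumed CD properties.
ANNUAL_DATA_BYTES = 15e15    # ~15 PB per year, from the slide
CD_CAPACITY_BYTES = 700e6    # assumption: a standard 700 MB CD
CD_THICKNESS_MM = 1.2        # assumption: nominal disc thickness, bare discs with no cases

n_cds = ANNUAL_DATA_BYTES / CD_CAPACITY_BYTES
stack_height_km = n_cds * CD_THICKNESS_MM / 1e6   # 1 km = 1e6 mm

print(f"{n_cds / 1e6:.1f} million CDs, stack of about {stack_height_km:.0f} km")
```

With these assumptions the stack comes out somewhat taller than 20 km, which is consistent with the slide using rounded figures.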


The CERN Computing Centre

• ~2,000 processors
• ~100 TBytes of disk
• ~1 PB of magnetic tape

Even with technology-driven improvements in performance and costs, CERN can provide nowhere near enough capacity for LHC!
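To put the gap in numbers, the sketch below compares the CERN figures with the LHC estimates quoted earlier in the talk. It is illustrative only: it treats a CERN processor as equivalent to one of "today's fastest PC processors", which is an assumption, and the variable names are made up for this example.

```python
# Figures quoted on the slides (all approximate).
cern_processors = 2_000
processors_needed = 100_000
cern_tape_pb = 1
lhc_data_pb_per_year = 15

# Factor by which processing capacity would have to grow.
processor_gap = processors_needed / cern_processors

# Days of LHC data taking that would fill the existing tape capacity.
tape_fill_days = cern_tape_pb / lhc_data_pb_per_year * 365

print(f"processing gap: factor of {processor_gap:.0f}")
print(f"existing tape would fill in about {tape_fill_days:.0f} days")
```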


Computing for LHC

Solution: Computing centres, which were isolated in the past, will be connected into a computing grid

Europe: 267 institutes, 4603 users

Elsewhere: 208 institutes, 1632 users

-- uniting the computing resources of particle physicists in the world!  

LCG Regional Centres

First wave centres:
• CERN
• Academia Sinica Taiwan
• Brookhaven National Lab
• PIC Barcelona
• CNAF Bologna
• Fermilab
• FZK Karlsruhe
• IN2P3 Lyon
• KFKI Budapest
• Moscow State University
• University of Prague
• Rutherford Appleton Lab (UK)
• University of Tokyo

Other centres:
• Caltech
• GSI Darmstadt
• Italian Tier 2s (Torino, Milano, Legnaro)
• JINR Dubna
• Manno (Switzerland)
• NIKHEF Amsterdam
• Ohio Supercomputing Centre
• Sweden (NorduGrid)
• Tata Institute (India)
• Triumf (Canada)
• UCSD
• UK Tier 2s
• University of Florida, Gainesville
• ……

Pilot production service: 2003 to 2005

Some of the Sources for Middleware & Tools used by LCG

The Virtual Data Toolkit - VDT

Goals for LCG-1, 2003-04

The Pilot Grid Service for LHC Experiments

• Be the principal service for Data Challenges in 2004
  - initially focused on batch production work
  - and later also interactive analysis
• Get experience in close collaboration between the Regional Centres
  - learn how to maintain and operate a global grid
• Focus on a production-quality service
  - robustness, fault-tolerance, predictability, and supportability take precedence; additional functionality gets prioritized
• LCG should be integrated into the sites’ mainline physics computing services; it should not be something apart
  - this requires coordination between participating sites in: policies and collaborative agreements; resource planning and scheduling; operations and support

Elements of a Production LCG Service

Middleware:
• Integration, testing and certification
• Packaging, configuration, distribution and site validation
• Support: problem determination and resolution; feedback to middleware developers

Operations:
• Grid infrastructure services
• Site fabrics run as production services
• Operations centres: trouble and performance monitoring, problem resolution, 24x7 globally

Support:
• Experiment integration: ensure optimal use of system
• User support: call centres/helpdesk, global coverage; documentation; training

LCG Service Status

• Certification and distribution process established
• Middleware package, with components from:
  - European DataGrid (EDG)
  - US (Globus, Condor, PPDG, GriPhyN): the Virtual Data Toolkit
• Agreement reached on principles for registration and security
• Rutherford Lab (UK) to provide the initial Grid Operations Centre
• FZK (Karlsruhe) to operate the Call Centre
• Pre-release middleware deployed in July to the initial 10 centres
• The “certified” release was made available to 13 centres on 1 September: Academia Sinica Taipei, BNL, CERN, CNAF, FNAL, FZK, IN2P3 Lyon, KFKI Budapest, Moscow State Univ., Prague, PIC Barcelona, RAL, Univ. Tokyo

Next steps:
- Get the experiments going
- Expand to other centres

Preliminary full simulation and reconstruction tests with ALICE

• ALIEN submitting work to the LCG service
• Aliroot 3.09.06 - fully reconstructed events
• CPU-intensive, RAM-demanding (up to 600 MB), long-lasting jobs (average 14 hours)

Outcome: > 95% successful job submission, execution and output retrieval

LCG-1 at the Time of First Release

• Impressive improvement in stability w.r.t. old 1.x EDG releases and corresponding testbeds
• Lots of room for further improvements
• Additional features to be added before the end of the year, in preparation for the data challenges of 2004
• As more centres join, scalability will surely become a major issue

Resources in Regional Centres

• Resources planned for the period of the data challenges in 2004
• CERN ~12% of the total capacity
• Numbers have to be refined: different standards used by different countries
• Efficiency of use is a major question mark: reliability, efficient scheduling, sharing between Virtual Organisations (user groups)

Resources committed for 1Q04

Country/lab     CPU (kSI2K)  Disk (TB)  Support (FTE)  Tape (TB)
CERN                    700        160           10.0       1000
Czech Republic           60          5            2.5          5
France                  420         81           10.2        540
Germany                 207         40            9.0         62
Holland                 124          3            4.0         12
Italy                   507         60           16.0        100
Japan                   220         45            5.0        100
Poland                   86          9            5.0         28
Russia                  120         30           10.0         40
Taiwan                  220         30            4.0        120
Spain                   150         30            4.0        100
Sweden                  179         40            2.0         40
Switzerland              26          5            2.0         40
UK                     1656        226           17.3        295
USA                     801        176           15.5       1741
Total                  5600       1169          120.0       4223
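The table can be sanity-checked programmatically. The sketch below transcribes the rows as they appear in this transcript, sums each column, and computes CERN's share of the stated CPU total. One observation worth flagging: the listed rows add up to the stated tape total, but fall short of the stated totals for CPU, disk and support, so some contributions may be missing from the transcript or the totals may come from a different revision of the numbers. The variable names are illustrative.

```python
# Figures transcribed from the "Resources committed for 1Q04" table.
# Tuple order: CPU (kSI2K), disk (TB), support (FTE), tape (TB).
resources = {
    "CERN": (700, 160, 10.0, 1000),
    "Czech Republic": (60, 5, 2.5, 5),
    "France": (420, 81, 10.2, 540),
    "Germany": (207, 40, 9.0, 62),
    "Holland": (124, 3, 4.0, 12),
    "Italy": (507, 60, 16.0, 100),
    "Japan": (220, 45, 5.0, 100),
    "Poland": (86, 9, 5.0, 28),
    "Russia": (120, 30, 10.0, 40),
    "Taiwan": (220, 30, 4.0, 120),
    "Spain": (150, 30, 4.0, 100),
    "Sweden": (179, 40, 2.0, 40),
    "Switzerland": (26, 5, 2.0, 40),
    "UK": (1656, 226, 17.3, 295),
    "USA": (801, 176, 15.5, 1741),
}
stated_total = (5600, 1169, 120.0, 4223)

# Column sums over the rows that survive in the transcript.
row_sums = tuple(round(sum(column), 1) for column in zip(*resources.values()))

# CERN's share of the stated total CPU capacity.
cern_cpu_share = resources["CERN"][0] / stated_total[0]

print("sum of listed rows:", row_sums)
print("stated totals:     ", stated_total)
print(f"CERN share of stated CPU total: {cern_cpu_share:.1%}")
```

The CERN share of the stated CPU total works out to 12.5%, in line with the "~12%" quoted on the slide.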


From LCG-1 to LHC Startup

Where are we now with Grid Technology?

For LHC:
• we now understand the basic requirements for batch processing
• and we have a prototype solution developed by Globus and Condor in the US and the DataGrid and related projects in Europe

It is more difficult than was expected:
• reliability, scalability, monitoring, operation, ..
• and we are not yet seeing useful industrial products

But we are ready to start re-engineering the components:
• part of the large EGEE project proposal submitted to the EU
• re-write of Globus using a web-services architecture is now available

Many more practical problems will be discovered now that we start running a grid as a sustained round-the-clock service and the LHC experiments begin to use it for doing real work

Grid Middleware for LCG in the Longer Term

Requirements:
• A second round of specification of the basic grid requirements is being completed now - HEPCAL II
• A team has started to specify the higher level requirements for distributed analysis, batch and interactive, and define the HEP-specific tools that will be needed

For basic middleware the current strategy is to assume:
• that the US DoE/NSF will provide a well supported Virtual Data Toolkit based on Globus Toolkit 3
• that the EGEE project, approved for EU 6th framework funding, will develop the additional tools needed by LCG

And the LCG Applications Area will develop higher-level HEP-specific functionality

ARDA: An Architectural Roadmap Towards Distributed Analysis

Work in progress!

[Diagram: ARDA service components, linked by interactions numbered 1 to 15]
• Information Service
• Authentication
• Authorisation
• Auditing
• Grid Monitoring
• Workload Management
• Metadata Catalogue
• File Catalogue
• Data Management
• Computing Element
• Storage Element
• Job Monitor
• Job Provenance
• Package Manager
• DB Proxy
• User Interface
• API
• Accounting

How close are we to LHC startup?

[Timeline diagram: 2003, 2004, 2005, 2006, 2007 (first data); rows labelled physics and computing service]
• agree spec. of initial service (LCG-1)
• open LCG-1
• used for simulated event productions
• stabilise, expand the service; develop operations centre, etc.

Callout: starter toolkit, with components from VDT and EDG

* TDR - technical design report

Timeline for the LCG computing service

[Timeline diagram: 2003, 2004, 2005, 2006, 2007 (first data); rows labelled physics and computing service]
• agree spec. of initial service (LCG-1)
• open LCG-1
• used for simulated event productions
• principal service for LHC data challenges - batch analysis and simulation
• LCG-2: upgraded middleware, mgt. s/w
• stabilise, expand the service; develop operations centre, etc.
• Computing model TDRs*

Callout: this is the full complement of the first generation middleware from VDT/EDG, hardened to provide a stable, reliable service

* TDR - technical design report

Timeline for the LCG computing service

[Timeline diagram: 2003, 2004, 2005, 2006, 2007 (first data); rows labelled physics and computing service]
• agree spec. of initial service (LCG-1)
• open LCG-1
• used for simulated event productions
• principal service for LHC data challenges - batch analysis and simulation
• validation of computing models
• LCG-2: upgraded middleware, mgt. s/w
• LCG-3: full multi-tier prototype service, batch and interactive
• stabilise, expand the service; develop operations centre, etc.
• Computing model TDRs*
• testing, hardening of 2nd generation middleware
• TDR for the Phase 2 grid

Callout: at this point we expect to start deploying second generation middleware components, building this up during the year to attain the full base functionality required for LHC startup

* TDR - technical design report

Timeline for the LCG computing service

[Timeline diagram: 2003, 2004, 2005, 2006, 2007 (first data); rows labelled physics and computing service]
• agree spec. of initial service (LCG-1)
• open LCG-1
• used for simulated event productions
• principal service for LHC data challenges - batch analysis and simulation
• validation of computing models
• LCG-2: upgraded middleware, mgt. s/w
• LCG-3: full multi-tier prototype service, batch and interactive
• TDR for the Phase 2 grid
• acquisition, installation, commissioning of Phase 2 service (for LHC startup)
• Phase 2 service in production
• stabilise, expand the service; develop operations centre, etc.
• testing, hardening of 2nd generation middleware
• experiment setup & preparation
• Computing model TDRs*

Callout: at CERN the acquisition process will have started already during 2004 with a market survey

* TDR - technical design report

Timeline for the LCG computing service

[Timeline diagram: 2003, 2004, 2005, 2006, 2007 (first data); rows labelled physics and computing service]
• agree spec. of initial service (LCG-1)
• open LCG-1
• used for simulated event productions
• principal service for LHC data challenges - batch analysis and simulation
• validation of computing models
• LCG-2: upgraded middleware, mgt. s/w
• LCG-3: full multi-tier prototype service, batch and interactive
• TDR for the Phase 2 grid
• acquisition, installation, commissioning of Phase 2 service (for LHC startup)
• Phase 2 service in production
• stabilise, expand the service; develop operations centre, etc.
• testing, hardening of 2nd generation middleware
• experiment setup & preparation
• Computing model TDRs*

* TDR - technical design report

Evolution of the Base Technology

• These are still very early days, with very few grids providing a reliable, round-the-clock “production” service
• And few applications that are managing gigantic distributed databases
• Although the basic ideas and tools have been around for a long time, we are only now seeing these applied to large scale services
• Developing the grid concept continues to attract substantial interest and public funding
• There are major changes taking place in architecture and frameworks, e.g. the Open Grid Services Architecture and Infrastructure (OGSA, OGSI), and there will be more to come as experience grows
• There is a lot of commercial interest from potential software suppliers (IBM, HP, Microsoft, ..), but no clear sight of useful products

Adapting to the changing landscape

• In the short term there will be many grids and several middleware implementations; for LCG, inter-operability will be a major headache
  - Will we all agree on a common set of tools? Unlikely!
  - Or will we have to operate a grid of grids, some sort of federation?
  - Or will computing centres be part of several grids?
• The Global Grid Forum promises to provide a mechanism for evolving architecture and agreeing on standards, but this is a long-term process
• In the medium term, until there is substantial practical experience with different architectures and different implementations, de facto standards will emerge
  - How quickly will we recognise the winners?
  - Will we have become too attached to our own developments to change?

Access Rights and Security

• The grid idea assumes global authentication, and authorisation based on the user’s role in his virtual organisation
  -- one set of credentials for everything you do
• The agreement for LHC is that all members of a physics collaboration will have access to all of its resources
  -- the political implications of this have still to be tested!
• Could be an attractive target for hackers
  -- this is probably the greatest risk that we take by adopting the grid model for LHC computing
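The access model described on this slide, one credential per user with authorisation derived from the user's role in a virtual organisation (VO), might be sketched as below. This is purely illustrative and is not how the LCG middleware implements it: the `Credential`, `Resource` and `is_authorised` names, the role names, and the role-to-action table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical single grid credential: identity plus VO membership and role."""
    subject: str
    vo: str
    role: str


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource at some site, allocated to one virtual organisation."""
    name: str
    vo: str


# Assumed mapping from VO role to permitted actions (made up for this example).
ROLE_ACTIONS = {
    "member": {"read", "submit"},
    "production": {"read", "submit", "write"},
}


def is_authorised(cred: Credential, resource: Resource, action: str) -> bool:
    """Grant access if the user belongs to the resource's VO and the role allows the action."""
    # Members of a collaboration can reach all of that collaboration's resources,
    # wherever they are hosted; users from other VOs are refused.
    if cred.vo != resource.vo:
        return False
    # The role within the VO decides which actions are permitted.
    return action in ROLE_ACTIONS.get(cred.role, set())


user = Credential(subject="physicist-1", vo="ALICE", role="member")
print(is_authorised(user, Resource("disk-pool", "ALICE"), "read"))   # same VO, allowed action
print(is_authorised(user, Resource("disk-pool", "CMS"), "read"))     # different VO
```

Because the same credential is honoured at every participating site, a compromised credential exposes all of a collaboration's resources at once, which is the risk the slide points to.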

Key LCG goals for next 12 months

• Take-up by the experiments of the first versions of common applications: starts NOW
• Evolve the LCG-1 service into a production facility for LHC experiments, validated in data challenges
• Establish the requirements and a credible implementation plan for baseline distributed grid analysis for 2007-08
  - the model
  - HEP-specific tools
  - base grid technology (middleware)
  to support the computing models of the experiments; Technical Design Reports due end 2004

Summary

• The LCG Project has a clear goal of providing the environment and services for recording and analysing the LHC data when the accelerator starts operation in 2007
• The computational requirements of LHC dictate a geographically distributed solution, taking maximum advantage of the facilities available to LHC around the world: a computational GRID
• A pilot service, LCG-1, has been opened to learn how to use this technology to provide a reliable, efficient service encompassing many independent computing centres
• It is already clear that the current middleware will have to be re-engineered or replaced to achieve the goals of reliability and scalability
• In the medium term we expect to get this new middleware from EU and US funded projects
• But the technology is evolving rapidly, and LCG will have to adapt to a changing environment while we keep a strong focus on providing a continuing service for the LHC Collaborations