
Alain Romeyer - Dec. 2004 1

Grid computing for CMS

What is the Grid?

Let’s start with an analogy

How does it work? (Some basic ideas)

The Grid for the LHC and the CMS computing model

Conclusion

Alain Romeyer (Mons - Belgium)


What is the Grid?

What is not a Grid? A cluster, a network-attached storage device, a scientific instrument, a network, etc. Each may be an important component of a Grid, but by itself does not constitute a Grid.

For us: a new way of doing science!

An integrated, advanced cyber-infrastructure that delivers: computing capacity, data capacity, communication capacity

Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations:
• no centralized control
• use of standard and open protocols and interfaces
• delivery of nontrivial qualities of service


An analogy: the electric power grid (on-demand access)

[Figure: electric power supply over time, annotated “Quality, economies of scale”]


By analogy

• Decouple production and consumption
• Enable on-demand access
• Achieve economies of scale
• Enhance consumer flexibility
• Enable new devices

On a variety of scales: department, campus, enterprise, Internet


Not a perfect analogy… I import electricity, but I must export data

“Computing” is not interchangeable but highly heterogeneous: computers, data, sensors, services, …

So the story is more complicated

But, more significantly, the whole can be greater than the sum of its parts:
• dynamic allocation of resources
• access to distributed services
• virtualization & distributed service management


How does it work? Grid responsibilities

Security infrastructure
• Authentication (identity)
• Authorization (rights)

Management
• Information management: soft state, registration, discovery, selection, monitoring
• Resource management: remote service invocation, reservation, allocation, resource specification
• Data management: high-performance remote data access; cataloguing, replication, staging
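Soft-state registration, listed above under information management, means that a resource's entry in the information service survives only as long as the resource keeps re-registering; a resource that crashes or disappears simply ages out. The sketch below is a minimal illustration of that idea, assuming a fixed time-to-live; the class `SoftStateRegistry`, the TTL and the host names are hypothetical and not taken from any Grid middleware.

```python
import time


class SoftStateRegistry:
    """Toy information service: entries expire unless refreshed (soft state)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable clock, so expiry can be demonstrated
        self._entries = {}    # name -> (attributes, expiry time)

    def register(self, name, attributes):
        # Registering again simply refreshes the entry and pushes its deadline.
        self._entries[name] = (dict(attributes), self.clock() + self.ttl)

    def discover(self, predicate=lambda attrs: True):
        # Drop stale entries first, then return the live ones that match.
        now = self.clock()
        self._entries = {n: e for n, e in self._entries.items() if e[1] > now}
        return sorted(n for n, (attrs, _) in self._entries.items() if predicate(attrs))


# Usage with a fake clock instead of real time.
fake_now = [0.0]
registry = SoftStateRegistry(ttl_seconds=60, clock=lambda: fake_now[0])
registry.register("ce01.example.org", {"type": "CE", "free_cpus": 12})
registry.register("se01.example.org", {"type": "SE", "free_tb": 3})

fake_now[0] = 30
registry.register("ce01.example.org", {"type": "CE", "free_cpus": 8})  # refresh

fake_now[0] = 70  # the SE never refreshed, so its entry has expired
print(registry.discover())                              # ['ce01.example.org']
print(registry.discover(lambda a: a["type"] == "SE"))   # []
```

Injecting the clock keeps the example deterministic; a real information service also has to handle concurrent updates and distributed queries, which this toy ignores.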


How does it work? Security - Authentication

Grid Security Infrastructure (GSI): public key infrastructure (asymmetric)

• Users must be associated with a Virtual Organisation (VO)
• Users need a certificate delivered by a Certification Authority (CA)

A certificate (X.509 international standard) is a digitally signed document attesting to the binding of a public key to an individual entity. It contains:
• a subject name (identifying the user/person)
• the user’s public key
• the identity of the CA
• the digital signature of the CA
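The CA signature is what makes the subject/public-key binding trustworthy: the CA hashes the certificate content into a message digest and encrypts that digest with its private key, and anyone holding the CA public key can check it. Below is a minimal sketch of that hash-then-sign idea using textbook RSA with publicly known Mersenne primes and no padding, so it is deliberately insecure. The field names, subject strings and key values are hypothetical, and this is not the X.509 encoding.

```python
import hashlib

# Toy textbook-RSA key pair for the CA. ASSUMPTION, illustration only:
# the primes are public Mersenne primes and there is no padding, so this is
# completely insecure; real CAs use standard X.509 tooling.
P, Q = 2**127 - 1, 2**521 - 1
CA_N = P * Q                                # CA public modulus
CA_E = 65537                                # CA public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))     # CA private exponent (Python 3.8+)


def digest(cert):
    """SHA-256 message digest of the fields the CA vouches for."""
    data = "|".join(cert[k] for k in ("subject", "public_key", "issuer"))
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big")


def ca_sign(request):
    """The CA 'encrypts' the digest with its private key: the digital signature."""
    cert = dict(request)
    cert["signature"] = pow(digest(request), CA_D, CA_N)
    return cert


def verify(cert):
    """Anyone holding the CA public key (CA_N, CA_E) can check the binding."""
    return pow(cert["signature"], CA_E, CA_N) == digest(cert)


request = {
    "subject": "/C=BE/O=BEGrid/CN=Jane Doe",      # hypothetical user
    "public_key": "user-public-key-placeholder",  # stands in for a real key
    "issuer": "/C=BE/O=Example CA",               # hypothetical CA
}
cert = ca_sign(request)
print(verify(cert))                                               # True
print(verify(dict(cert, subject="/C=BE/O=BEGrid/CN=Mallory")))    # False
```

Changing any signed field (here the subject) changes the digest, so the old signature no longer matches; that is exactly the property that lets a site trust a certificate without contacting the CA.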


How does it work? Security - Authentication

[Diagram: the user sends a Certificate Request to the CA; the CA performs the certificate signing (hash → Message Digest → Encrypt → Digital Signature) and returns a Public Certificate; the user registers with the VO]
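Authentication establishes who the user is; authorization, the other half of the security infrastructure, decides what that user may do, which on the Grid depends on VO registration. A minimal sketch, assuming a hypothetical table that maps an already-authenticated certificate subject to a VO and a local account, plus a hypothetical per-resource list of accepted VOs. It is similar in spirit to, but not the format of, the Globus grid-mapfile.

```python
# Hypothetical VO registry: certificate subject -> (VO name, local account).
VO_MEMBERS = {
    "/C=BE/O=BEGrid/CN=Jane Doe": ("cms", "cms001"),
    "/C=BE/O=BEGrid/CN=John Roe": ("biomed", "bio001"),
}

# Hypothetical site policy: which VOs each resource accepts.
SITE_POLICY = {
    "ce01.example.org": {"cms", "atlas"},
}


def authorize(subject, resource):
    """Return the local account to run under, or raise PermissionError.

    Assumes `subject` comes from a certificate that was already verified.
    """
    if subject not in VO_MEMBERS:
        raise PermissionError(f"{subject} is not registered in any VO")
    vo, account = VO_MEMBERS[subject]
    if vo not in SITE_POLICY.get(resource, set()):
        raise PermissionError(f"VO {vo!r} is not allowed on {resource}")
    return account


print(authorize("/C=BE/O=BEGrid/CN=Jane Doe", "ce01.example.org"))  # cms001
```

Keeping identity (certificate) and rights (VO membership, site policy) separate means a site can change who is allowed in without reissuing any certificate.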


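How does it work? Management

[Diagram components: Workload Manager (Network Server, Global Manager, job control via CONDOR-G); Information Service; Resource Location Service; Computing Elements and a Storage Element, each with its Local Resource Management System (LRMS)]

• Computing Elements and Storage Elements publish their characteristics, status, available services… to the Information Service
• The user sends a request, written in JDL (Job Description Language), to the Workload Manager
• The Workload Manager asks the Resource Location Service “Where?” and the Information Service “Status?”, then decides the best actions to satisfy the request: match-making, where to submit, Grid status
• Decision → job submission to a Computing Element, with data transfer
• End of job: the outputs are stored in your « sandbox »; you ask to download them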
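The match-making step can be sketched as filtering the published Computing Element descriptions against the request’s requirements and then ranking the survivors. This is an illustration under assumptions: the CE attributes, names and Python predicates are hypothetical, and real JDL expresses requirements and ranking in its own expression language rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    # Hypothetical attributes a CE might publish to the Information Service.
    name: str
    free_cpus: int
    max_wall_time_min: int
    close_storage: tuple  # Storage Elements "close" to this CE


def match_make(ces, requirements, rank):
    """Keep the CEs that satisfy `requirements`, then return the best by `rank`."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None  # no match: the job cannot be submitted anywhere
    return max(candidates, key=rank)


ces = [
    ComputingElement("ce-a", free_cpus=4, max_wall_time_min=2880, close_storage=("se-a",)),
    ComputingElement("ce-b", free_cpus=40, max_wall_time_min=720, close_storage=("se-b",)),
    ComputingElement("ce-c", free_cpus=25, max_wall_time_min=2880, close_storage=("se-b",)),
    ComputingElement("ce-d", free_cpus=10, max_wall_time_min=2880, close_storage=("se-b", "se-c")),
]

# Request: 24 h of wall time and input data on "se-b"; prefer the most free CPUs.
best = match_make(
    ces,
    requirements=lambda ce: ce.max_wall_time_min >= 24 * 60 and "se-b" in ce.close_storage,
    rank=lambda ce: ce.free_cpus,
)
print(best.name)  # ce-c
```

Note that `ce-b` has the most free CPUs but is rejected by the wall-time requirement: requirements filter first, and ranking only orders what is left.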


Some Grid e-science projects

Sloan Digital Sky Survey, ALMA, LHC (LHCb, ATLAS, ALICE, CMS)


EGEE (www.eu-egee.org)

Enabling Grids for E-science in Europe (2-year project)

Funded by the EU, with 3 core areas:
1) build a consistent, robust and secure Grid network that will attract additional computing resources;
2) continuously improve and maintain the middleware in order to deliver a reliable service to users;
3) attract new users from industry as well as science and ensure they receive the high standard of training and support they need.

Two pilot applications selected: Biomedical Grids (bioinformatics and healthcare data) and the Large Hadron Collider Computing Grid (LCG)


LHC Computing Grid (LCG)

A 2-phase project:
• Phase I (2002-2005): development phase + series of computing data challenges
• Phase II (2006-2008): real production and deployment phase

6,000 physicists working together

12-14 petabytes of data will be generated each year (20 million CDs, a stack ~20 km high)

Analysing this will require the equivalent of 70,000 of today's fastest PC processors (~192 years)

LCG goal: prepare the computing infrastructure for the simulation, processing and analysis of LHC data for the 4 experiments.
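A quick order-of-magnitude check of the CD comparison, assuming a 700 MB CD and a 1.2 mm disc thickness (neither value is given on the slide, and decimal units are used throughout):

```python
CD_CAPACITY_BYTES = 700e6  # assumption: one 700 MB CD-ROM
CD_THICKNESS_M = 1.2e-3    # assumption: 1.2 mm per disc


def cd_stack(petabytes):
    """Return (number of CDs, stack height in km) for a yearly data volume."""
    n_cds = petabytes * 1e15 / CD_CAPACITY_BYTES
    return n_cds, n_cds * CD_THICKNESS_M / 1e3


for pb in (12, 14):
    n_cds, km = cd_stack(pb)
    print(f"{pb} PB -> {n_cds / 1e6:.1f} million CDs, stack ~{km:.0f} km")
```

With these assumptions, 12 PB gives about 17 million CDs (~21 km) and 14 PB about 20 million CDs (~24 km), consistent with the slide’s rounded figures.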


LCG status

Status on 22/09/2004: 82 sites, 7269 CPUs, 6558 TB of storage in total


CMS data production at LHC

p-p collisions: 1 bunch crossing every 25 ns, i.e. 40 MHz (1000 TB/sec)

Level 1 Trigger: 75 kHz (50 GB/sec)

High Level Trigger: 100 Hz (100 MB/sec)

Data recording & offline analysis

Cluster for the trigger: ~1000-2000 PCs
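The rates above fix how much each trigger level must reject. A small worked calculation using only the slide’s numbers: the rejection factor is input rate divided by output rate, and dividing a data rate by its event rate gives an implied average size per event.

```python
# (stage, event rate in Hz, data rate in bytes/s), as quoted on the slide.
STAGES = [
    ("bunch crossings", 40e6, 1000e12),    # 40 MHz, 1000 TB/s
    ("Level 1 Trigger", 75e3, 50e9),       # 75 kHz, 50 GB/s
    ("High Level Trigger", 100.0, 100e6),  # 100 Hz, 100 MB/s
]

bunch_spacing_ns = 1 / STAGES[0][1] * 1e9  # time between crossings

# Rejection factor of each trigger level: input rate / output rate.
rejection = [a[1] / b[1] for a, b in zip(STAGES, STAGES[1:])]

# Implied average size per event: data rate / event rate.
event_mb = {name: bw / rate / 1e6 for name, rate, bw in STAGES}

print(f"bunch spacing: {bunch_spacing_ns:.0f} ns")              # 25 ns
print("rejection per level:", [round(r) for r in rejection])   # [533, 750]
print(f"overall: 1 event kept in {STAGES[0][1] / STAGES[-1][1]:,.0f}")
for name, size in event_mb.items():
    print(f"{name}: ~{size:.2f} MB/event")
```

Overall only 1 crossing in 400,000 is recorded. The implied event sizes (~25 MB, ~0.7 MB, ~1 MB) are not mutually consistent because the slide quotes rounded figures, so they should be read as orders of magnitude only.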


CMS computing model

[Diagram: tiered CMS computing model]
• Experiment → ~PByte/sec → Online System
• Online System → ~100-1500 MBytes/sec → Tier 0 +1: CERN Center (PBs of disk; tape robot)
• Tier 0 +1 → ~2.5-10 Gbps → Tier 1: FNAL, INFN, IN2P3 and RAL centers
• Tier 1 → ~2.5-10 Gbps → Tier 2 centers
• Tier 2 → ~2.5-10 Gbps → Tier 3: institutes (physics data cache)
• Tier 3 → 0.1 to 10 Gbps → Tier 4: workstations
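The tier links are quoted in Gbps while the online system output is in MBytes/sec. A small conversion helper puts them on the same footing and estimates how long a petabyte takes to move, assuming decimal units and a link that is fully used with no protocol overhead (an idealisation, so real transfers take longer).

```python
def gbps_to_mbytes_per_s(gbps):
    """Convert a link speed in Gbit/s to MByte/s (decimal units, 8 bits per byte)."""
    return gbps * 1e9 / 8 / 1e6


def transfer_days(petabytes, gbps):
    """Days to move `petabytes` over a link fully used at `gbps` (idealised)."""
    seconds = petabytes * 1e15 * 8 / (gbps * 1e9)
    return seconds / 86400


for link in (2.5, 10):
    print(f"{link} Gbps = {gbps_to_mbytes_per_s(link):.1f} MB/s; "
          f"1 PB takes ~{transfer_days(1, link):.1f} days")
```

A 2.5-10 Gbps link carries roughly 310-1250 MB/s, the same range as the ~100-1500 MBytes/sec leaving the online system, and moving a single petabyte still takes between about 9 and 37 days.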

Physicists work on analysis “channels”; the data for these channels should be cached by the institute server.
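The institute-level cache can be sketched as a server that answers channel requests locally and goes upstream only on a miss. This is a minimal illustration under assumptions not found on the slides: a least-recently-used eviction policy, a capacity counted in datasets, and a hypothetical `fetch_from_tier2` upstream call; the channel names are also made up.

```python
from collections import OrderedDict


class InstituteCache:
    """Toy institute server: serves channel data locally, fetching upstream on a miss."""

    def __init__(self, capacity, fetch_upstream):
        self.capacity = capacity              # max number of cached datasets (assumption)
        self.fetch_upstream = fetch_upstream  # e.g. a request to a Tier 2 centre
        self._store = OrderedDict()           # channel -> data, oldest use first
        self.misses = 0

    def get(self, channel):
        if channel in self._store:
            self._store.move_to_end(channel)  # mark as recently used
            return self._store[channel]
        self.misses += 1
        data = self.fetch_upstream(channel)
        self._store[channel] = data
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict the least recently used
        return data


def fetch_from_tier2(channel):  # hypothetical upstream call
    return f"<events for {channel}>"


cache = InstituteCache(capacity=2, fetch_upstream=fetch_from_tier2)
for channel in ["H->ZZ", "H->gg", "H->ZZ", "top", "H->gg"]:
    cache.get(channel)
print(cache.misses)  # 4
```

Because physicists in an institute tend to reuse the same few channels, even a small cache avoids repeated transfers over the wide-area links.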


DC04 Data Challenge (March-April 2004)

T0 at CERN in DC04:
• 25 Hz input event rate
• Reconstruct quasi-realtime
• Events filtered into streams
• Distribute data to T1s

T1 centres in DC04: PIC Barcelona, FZK Karlsruhe, CNAF Bologna, RAL Oxford, IN2P3 Lyon, FNAL Chicago
• Pull data from T0 to T1 and store
• Make data available to PRS
• Demonstrate quasi-realtime “fake” analysis
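The DC04 data flow can be illustrated with a toy model: T0 “reconstructs” events and files them into streams in an export buffer, and each T1 pulls only the events it has not yet copied. The stream names, the per-T1 subscriptions and the classes `Tier0`/`Tier1` are hypothetical; the actual DC04 transfer system is not modelled here.

```python
from collections import defaultdict


def classify(event):
    """Hypothetical stream assignment: route each event by its trigger tag."""
    return event["trigger"]


class Tier0:
    def __init__(self):
        self.export_buffer = defaultdict(list)  # stream name -> events ready for export

    def process(self, raw_events):
        for event in raw_events:
            reco = dict(event, reconstructed=True)  # stand-in for real reconstruction
            self.export_buffer[classify(reco)].append(reco)


class Tier1:
    def __init__(self, name, streams):
        self.name = name
        self.streams = streams           # hypothetical stream subscriptions
        self.storage = []
        self._pulled = defaultdict(int)  # per stream: how many events already copied

    def pull(self, t0):
        """Copy only the events not yet pulled from the T0 buffer, then store them."""
        for stream in self.streams:
            events = t0.export_buffer[stream]
            self.storage.extend(events[self._pulled[stream]:])
            self._pulled[stream] = len(events)
        return len(self.storage)


t0 = Tier0()
t0.process([{"id": 1, "trigger": "muon"},
            {"id": 2, "trigger": "electron"},
            {"id": 3, "trigger": "muon"}])
fnal = Tier1("FNAL", ["muon"])
pic = Tier1("PIC", ["muon", "electron"])
print(fnal.pull(t0), pic.pull(t0))  # 2 3

t0.process([{"id": 4, "trigger": "muon"}])
print(fnal.pull(t0))  # 3: only the new event is copied
```

Pulling (rather than T0 pushing) lets each T1 fetch at its own pace and resume where it left off, which is one reason a pull model suits sites with different link speeds.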


DC04 Processing Rate

Processed about 30M events

[Figures: T0 events processed vs. days; T0 event processing rate (Hz)]

Got above 25 Hz on many short occasions, but only one full day above 25 Hz with the full system

DC04 demonstrated that the system can work… at least for well-controlled data flow / analysis, and for a few expert users

Next challenge: make it usable by average physicists… and demonstrate that the performance scales acceptably
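A back-of-envelope sketch (not DC04’s actual timeline) of why the sustained rate matters: at exactly the nominal 25 Hz, with no downtime, about 30M events correspond to roughly two weeks of uninterrupted T0 processing.

```python
def days_needed(n_events, rate_hz):
    """Wall-clock days to process n_events at a constant rate (no downtime)."""
    return n_events / rate_hz / 86400


print(f"{days_needed(30e6, 25):.1f} days")  # 13.9 days
```

Any time spent below 25 Hz stretches this figure, which is why holding the rate for full days, not just short bursts, was the real test.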


Conclusion

The Grid is becoming a reality. Management is the crucial issue: it is not yet fully implemented, and this will be done by the EGEE project

For HEP, LCG II is already available and working

CMS DC04 has shown that the system is starting to work

The next data challenge will be crucial: the system must be usable by standard physicists, with performance reasonable for the LHC


Conclusion

Belgrid project (www.belgrid.be): « a Belgian Grid initiative »

Brings together academic, public and private partners. Goal: share the local computing resources using Grid technologies. Status: GridFTP between sites is working. Plan: distributed computing

BEgrid (Belnet): grid computing for Belgian research

Belnet: official CA -> certificates also valid for use in EGEE; 5 universities connected (KULeuven, UA, UG, ULB and VUB); uses LCG II and follows the EGEE middleware