Alain Romeyer - Dec. 2004
Grid computing for CMS
What is the Grid?
Let's start with an analogy
How does it work? (some basic ideas)
Grid for LHC and the CMS computing model
Conclusion
Alain Romeyer (Mons - Belgium)
What is the Grid?
What is not a Grid? A cluster, a network-attached storage device, a scientific instrument, a network, etc. Each may be an important component of a Grid, but by itself none of them constitutes a Grid.
For us: a new way of doing science!
An integrated, advanced cyber-infrastructure that delivers:
Computing capacity
Data capacity
Communication capacity
Coordinated resource sharing and problem solving in dynamic virtual organisations:
no centralized control
standard and open protocols and interfaces
delivers nontrivial qualities of service
An analogy: electric power (on-demand access)
[Figure: growth of the electric power grid over time, delivering quality and economies of scale]
By analogy: decouple production and consumption
Enable on-demand access
Achieve economies of scale
Enhance consumer flexibility
Enable new devices
On a variety of scales: department, campus, enterprise, Internet
Not a perfect analogy… I import electricity but must export data
"Computing" is not interchangeable but highly heterogeneous: computers, data, sensors, services, …
So the story is more complicated
But more significantly, the sum can be greater than the parts:
Dynamic allocation of resources
Access to distributed services
Virtualization & distributed service management
How does it work? Grid responsibilities
Security infrastructure: authentication (identity), authorization (rights)
Information management: soft-state, registration, discovery, selection, monitoring
Resource management: remote service invocation, reservation, allocation, resource specification
Data management: high-performance remote data access; cataloguing, replication, staging
How does it work? Security - Authentication
Grid Security Infrastructure (GSI): a public key infrastructure (asymmetric cryptography)
You need to be associated with a Virtual Organisation (VO)
You need a certificate delivered by a Certification Authority (CA)
A certificate (X.509 international standard) is a digitally signed document attesting to the binding of a public key to an individual entity
It contains:
A subject name (identifying the user/person)
The user's public key
The identity of the CA
The digital signature of the CA
How does it work? Security - Authentication
[Diagram: the user sends a certificate request to the CA; the CA hashes the request into a message digest and encrypts that digest with its private key, producing the digital signature; the signed public certificate is returned to the user, who then registers with a VO]
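The hash-then-encrypt step in the signing flow can be illustrated with a toy RSA signature in Python. The parameters below are deliberately tiny and insecure, purely for illustration; a real CA uses 2048-bit keys and a padded signature scheme, not a bare modular exponentiation.

```python
import hashlib

# Toy RSA key (tiny primes for illustration only).
p, q = 61, 53
n = p * q        # modulus, part of the public key
e = 17           # public exponent
d = 2753         # private exponent: e * d == 1 (mod (p-1)*(q-1))

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy key's range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """CA side: encrypt the digest with the private key -> signature."""
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Relying party: decrypt with the public key, compare digests."""
    return pow(signature, e, n) == digest(message)

cert_body = b"subject=CN=Some User; pubkey=..."  # hypothetical content
sig = sign(cert_body)
assert verify(cert_body, sig)          # genuine signature verifies
assert not verify(cert_body, sig + 1)  # a forged signature fails
```

Anyone holding the CA's public key (e, n) can check the signature, but only the CA, holding d, can produce it; that is exactly the binding the slide describes.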
How does it work? Management
[Diagram: job submission through the Workload Manager]
The user submits a request, written in the Job Description Language (JDL), to the Network Server.
The Workload Manager (global manager) queries the Information Service (which resources exist, and their status) and the Resource Location Service (where the required data are).
It then decides the best actions to satisfy the request: match-making, where to submit, current Grid status.
The job is submitted through the job control system (CONDOR-G) to a Computing Element, which hands it to the Local Resource Management System (LRMS).
Storage Elements publish their characteristics, status and available services; data transfers are performed as needed.
At the end of the job, the outputs are stored in your "sandbox"; you ask to download them.
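A minimal JDL file for such a request might look like the sketch below. The attribute names follow the EDG/LCG job description language; the executable, arguments and file names are hypothetical.

```
[
  Executable     = "cmsAnalysis.sh";
  Arguments      = "dataset42";
  StdOutput      = "job.out";
  StdError       = "job.err";
  InputSandbox   = {"cmsAnalysis.sh", "config.card"};
  OutputSandbox  = {"job.out", "job.err", "histos.root"};
  Requirements   = other.GlueCEPolicyMaxCPUTime > 720;
  Rank           = -other.GlueCEStateEstimatedResponseTime;
]
```

The Requirements and Rank expressions are what the match-making step evaluates against the resource descriptions published in the Information Service.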
Some Grid e-science projects
Sloan Digital Sky Survey
ALMA
LHC experiments: ATLAS, CMS, ALICE, LHCb
EGEE (www.eu-egee.org)
Enabling Grids for E-science in Europe (2-year project), funded by the EU. Three core areas:
1) build a consistent, robust and secure Grid network that will attract additional computing resources;
2) continuously improve and maintain the middleware in order to deliver a reliable service to users;
3) attract new users from industry as well as science, and ensure they receive the high standard of training and support they need.
Two pilot applications selected: Biomedical Grids (bioinformatics and healthcare data) and the Large Hadron Collider Computing Grid (LCG).
LHC Computing Grid (LCG)
A 2-phase project:
Phase I (2002 - 2005): development phase + series of computing data challenges
Phase II (2006 - 2008): real production and deployment phase
6,000 physicists working together
12-14 PetaBytes of data will be generated each year (20 million CDs, a stack about 20 km high)
Analysing this will require the equivalent of 70,000 of today's fastest PC processors (~192 years)
LCG goal: prepare the computing infrastructure for the simulation, processing and analysis of LHC data for the 4 experiments.
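The CD comparison is easy to check. This back-of-the-envelope sketch assumes 700 MB per CD and 1.2 mm per disc (both assumptions on my part, not figures from the slides):

```python
annual_data_bytes = 14e15   # upper estimate from the slide: 14 PB/year
cd_capacity_bytes = 700e6   # assumed: 700 MB per CD
cd_thickness_m = 1.2e-3     # assumed: 1.2 mm per disc

n_cds = annual_data_bytes / cd_capacity_bytes
stack_height_km = n_cds * cd_thickness_m / 1000

print(f"{n_cds:.0f} CDs, stack ~{stack_height_km:.0f} km high")
# -> 20 million CDs, a stack roughly 24 km high, the same order
#    of magnitude as the "20 km" quoted on the slide
```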
LCG status
As of 22/09/2004: 82 sites, 7,269 CPUs, 6,558 TB of storage
CMS data production at LHC
p-p collisions: 1 bunch crossing every 25 ns
40 MHz (1000 TB/sec)
Level 1 Trigger
75 kHz (50 GB/sec)
High Level Trigger (a cluster of ~1000 - 2000 PCs)
100 Hz (100 MB/sec)
Data Recording & Offline Analysis
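From the rates above one can work out the rejection factor of each trigger level. A short sketch using the slide's numbers:

```python
bunch_crossing_rate = 40e6  # Hz: one crossing every 25 ns
l1_output_rate = 75e3       # Hz, after the Level 1 Trigger
hlt_output_rate = 100.0     # Hz, after the High Level Trigger

l1_rejection = bunch_crossing_rate / l1_output_rate
hlt_rejection = l1_output_rate / hlt_output_rate
total_rejection = bunch_crossing_rate / hlt_output_rate

print(f"L1 keeps 1 in {l1_rejection:.0f} crossings")
print(f"HLT keeps 1 in {hlt_rejection:.0f} L1-accepted events")
print(f"Overall: 1 event recorded per {total_rejection:.0f} crossings")
# L1 rejects ~533:1, the HLT 750:1, so only one crossing in
# 400,000 is written to permanent storage
```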
CMS computing model
Online system -> Tier 0 (+1) at the CERN Centre (PBs of disk; tape robot), fed at ~PByte/sec from the experiment, recording at ~100-1500 MBytes/sec
Tier 1: national centres (FNAL Center, INFN Center, IN2P3 Center, RAL Center), connected to CERN at ~2.5-10 Gbps
Tier 2: regional Tier2 Centers, connected at ~2.5-10 Gbps
Tier 3: institute servers (physics data cache), 0.1 to 10 Gbps
Tier 4: physicists' workstations
Physicists work on analysis "channels"; the data for these channels should be cached by the institute server.
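To get a feel for these link speeds, here is a rough calculation (my own illustration, not from the slides) of how long moving 1 TB between tiers takes at various bandwidths, ignoring protocol overhead and contention:

```python
def transfer_time_hours(size_bytes: float, link_gbps: float) -> float:
    """Idealised transfer time: bits to move divided by link speed."""
    return size_bytes * 8 / (link_gbps * 1e9) / 3600

dataset = 1e12  # 1 TB
for gbps in (0.1, 2.5, 10):
    print(f"{gbps:>5} Gbps: {transfer_time_hours(dataset, gbps):.2f} h")
# a 0.1 Gbps institute link needs most of a day for 1 TB,
# while a 10 Gbps Tier-1 link does it in under 15 minutes
```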
DC04 Data Challenge (March-April 2004)
T0 at CERN in DC04:
25 Hz input event rate
Reconstruct quasi-realtime
Events filtered into streams
Distribute data to T1's
T1 centres in DC04 (PIC Barcelona, FZK Karlsruhe, CNAF Bologna, RAL Oxford, IN2P3 Lyon, FNAL Chicago):
Pull data from T0 to T1 and store
Make data available to PRS
Demonstrate quasi-realtime "fake" analysis
DC04 Processing Rate
Processed about 30M events
Got above 25 Hz on many short occasions; only one full day >25 Hz with the full system
[Plots: T0 events processed vs. days; T0 event processing rate (Hz)]
DC04 demonstrated that the system can work… at least for well-controlled data flow / analysis, and for a few expert users
Next challenge: make it useable by average physicists… and demonstrate that the performance scales acceptably
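Putting the slide's numbers together (my own arithmetic, not from the slides): at the nominal 25 Hz, the ~30M processed events correspond to only about two weeks of continuous running, which is consistent with a two-month challenge that reached the target rate only intermittently.

```python
events = 30e6       # events processed during DC04
target_rate = 25.0  # Hz, the nominal T0 input rate

seconds_at_target = events / target_rate
days_at_target = seconds_at_target / 86400
print(f"{days_at_target:.1f} days of continuous 25 Hz running")
# -> roughly 14 days' worth of events over a ~2-month challenge
```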
Conclusion
The Grid is becoming a reality
Management is the crucial issue; it is not yet fully implemented and will be addressed by the EGEE project
For HEP, LCG II is already available and working
CMS DC04 has shown that the system is starting to work
The next data challenge will be crucial: usable by the average physicist, with performance reasonable for LHC
Conclusion: Belgrid project (www.belgrid.be), "a Belgian Grid initiative"
Brings together academic, public and private partners
Goal: share local computing resources using Grid technologies
Status: GridFTP between sites is working
Plan: distributed computing
BEgrid (Belnet): Grid computing for Belgian research
Belnet: official CA -> certificates also valid for use in EGEE
5 universities connected (KULeuven, UA, UG, ULB and VUB)
Runs LCG II and follows the EGEE middleware