CMS ATLAS LHCb


Slide 1

CMS ATLAS LHCb

Slide 2

Overview

 The first year of LCG
 Experiment perspectives on the LCG management bodies
 The Experiment Plans for Data Challenges
 Developing the Computing TDRs
 Summary: successes and outstanding questions


Slide 3

Opening Comments

 The LCG process has started reasonably well
 The Experiments, the CERN Divisions, the Regional Centers and the LCG Project itself are all working together to achieve the common goals
 Differences of approach and emphasis exist between all parties, but are much less significant than the commonalities
 Most importantly, these differences are not destructive to the process, but probably constructive


Slide 4

LCG Area 1: Applications Area (AA)

 Through the SC2/RTAG process we attempt to identify common requirements of our applications and to pull out common projects wherever appropriate
 AA projects or contributions in: Persistency service (POOL), Basic Services (SEAL), ROOT, Math Libraries, Simulation Services (including G4), Physicist Interface (PI), Infrastructure (SPI)
 Important deliverables are due in the next few months:
 ATLAS, CMS and LHCb have baselines to use the POOL software this summer
 There is little scope for accepting delays; the next 2-3 months will be critical if the AA projects are to be successful


Slide 5

The LCG Application Blueprint

 ALICE expects to build most of its application base on ROOT (and AliEn); they have a well-established program
 The creation of the EP/SFT ROOT group, the CERN commitment to ROOT and the addition of LCG manpower are solid contributions to ROOT development and support
 ATLAS, CMS and LHCb expect ROOT to be a major component, and have worked towards an architecture in which it can/will be used but is not the unique underpinning
 These three experiments also have well-established programs; the support for ROOT is also at their request; they work with the AA to develop new packages (such as POOL) and to bring their separate work to coherence (as in SEAL)
 The differences between the approaches of ALICE and ATLAS-CMS-LHCb cannot be denied, but they have the potential to be constructive
 The project and the experiments have developed working models to cope with these differences and to avoid stalemate situations


Slide 6

AA Resources

 All parties contribute extensively to the AA:
 Existing products and manpower to extend/support: ROOT, SCRAM
 Experienced manpower in design and implementation for POOL
 Sharing of existing products and development of a coherent base in SEAL
 Numerically, about 2-4 FTEs from each experiment in LCG/AA projects; qualitatively very important, as these are typically our best people
 We have the impression that the redirection of experiment and CERN resources has been more effective in this area than the addition of new resources. Time will tell
 The Software Process and Infrastructure (SPI) group has made an excellent start: useful tools deployed (Savannah…), extension of SCRAM to Windows underway, support of existing experiment projects and the new LCG projects
 The real test comes when we start to use the new products (POOL, SEAL, …)
 Experiments hope to benefit from significant help from LCG project members in this crucial phase (coming in the next few months)


Slide 7

LCG Area 2: Grid Deployment Area (GDA)

 Grid Deployment Board (GDB): strong participation from all experiments and centers; getting to grips with deploying an LCG1 service
 There is no stable common middleware. Experiments and Grid projects have learned a lot and made many improvements in "stress tests"; the success of LCG1 will depend on the ability to respond to change
 We are building a production system on a base that is still largely prototypal
 LCG1 must be capable of rapidly assimilating developments: this requires Development, Testing, Hardening and Production environments, and continuous migration of code
 There is a temptation to believe too much in the ability of the current round of middleware: "This is a production service so it must be stable" (but it can't be…)
 Difficult balance between forcing a limited set of initial conditions and allowing for the actual wide diversity in the regional centers' modes of operation
 GDA is establishing a solid base of expertise in LCG


Slide 8

LCG1

 LCG1 must develop over this year into a usable production service
 Data Challenges will depend on its functionality, even for some user use-cases in early 2004
 By 2004 it must have a lot of resources! For example, CMS estimates 600 CPUs this year, approaching 2000 in 2004
 Portability of the software is becoming critical: large farms of 64-bit processors are being built, but so far outside the possibilities of LCG1. The experiments go to great lengths to be able to run on multiple platforms, and we get better software from that investment
 Security policies: rapid convergence is required, but not trivial. Policies can enhance security (we want that!) and/or put obstacles in the path of legitimate use (we don't want that!)


Slide 9

Grid Applications Group (GAG)

 This impressively acronymed group was recently established by SC2
 Follow-on of the work done in the HEPCAL RTAG (HEP Common Application Layer), which described the important use cases for the experiments
 Track the match of deployed solutions with the HEPCAL use cases; extend HEPCAL
 Observation from a recent GDB: GAG is the experiments' way to maintain focus on some key issues such as middleware portability (operating systems etc.)
 Develop and promulgate a common understanding of as-yet barely addressed issues such as "Virtual Data"
 Great progress has been made with EDG and VDT software in the experiment stress tests. As with most progress, it is acquired mostly by mistakes


Slide 10

LCG Area 3: Grid Technology Area (GTA)

 Only recently has a coherent understanding of this activity started to be discussed
 The important work of specifying the LCG1 middleware has been undertaken in the GDB, but we expect this activity to migrate to the GTA
 A project plan has been presented, but it is still rather unclear who will do the work (David Foster is doing a great job; who will work with him?)
 Experiments have developed different models of working in a Grid environment:
 ATLAS works with EDG, US projects and NorduGrid
 ALICE uses AliEn (standalone and interfaced to EDG/VDT)
 CMS has run extensive tests with US and EDG products
 LHCb, an early tester of EDG, foresees new tests in real Monte Carlo production (DC03) this month
 Room for collaboration on grid/user portals; an RTAG is being planned


Slide 11

LCG Area 4: Fabric Technology Area

 Good collaboration between experiments and IT
 How does (can?) this group be made more relevant for regional centers?
 Tape I/O is obviously an issue for ALICE (1.25 GB/s) in heavy-ion mode (plus: ATLAS and CMS plan to write data at least at their pp rate during heavy-ion running)
 Even pp running can require this I/O rate; current tests run at ~300 MB/s (a rough ramp sketch follows this slide)
 Tape I/O is a vital component of the CERN T0. We appreciate the difficulties of the budget and the proposals to keep the overrun as low as possible
 Tape I/O tests must be able to track a reasonable ramp-up to build expertise and confidence
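To give a feel for the ramp-up at stake, here is a minimal Python sketch (ours, not from the slides) of the yearly growth factor the tape I/O tests would need to sustain. The 300 MB/s and 1.25 GB/s figures come from the slide above; the four-year window (2003 tests to 2007 startup) is our assumption for illustration.

```python
# Minimal sketch (not from the slides): yearly growth factor needed to take
# tape I/O tests from the current ~300 MB/s to the 1.25 GB/s ALICE heavy-ion
# requirement. The 2003 -> 2007 ramp window is an assumption.
current_rate_mb_s = 300    # current CERN test rate (from the slide)
target_rate_mb_s = 1250    # ALICE heavy-ion requirement (from the slide)
years = 4                  # assumed ramp window, 2003 -> 2007

factor_total = target_rate_mb_s / current_rate_mb_s
factor_per_year = factor_total ** (1 / years)
print(f"total ramp: x{factor_total:.2f}, i.e. x{factor_per_year:.2f} per year")
# -> total ramp: x4.17, i.e. x1.43 per year
```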


Slide 12

LCG Management Bodies

 SC2 is working quite well
 Most emphasis (too much?) on Applications; dominated by experiment representatives (because it is mostly Applications)
 Regional Center participants not very vocal so far; perceived difficulty of maintaining an overview
 A detailed WBS has been produced: necessarily complex; verification of milestones and their relative meaning is problematic
 PEB: some confusion, on both the project and the experiment side, on the role of experiment representatives has existed
 Some discussion between LCG management and experiment management may be required to ensure this does not become a long-term problem


Slide 13

Data Challenges

[Figure: CMS data-challenge resource profile, 2002-2009, on a log scale from 1 to 100,000 kSI95·Months, shown separately for CERN and offsite, with an average slope of x2.5/year. Milestones marked: DAQ TDR, DC04 (Physics TDR), DC05 (LCG TDR), DC06 (readiness), LHC at 2E33 and at 1E34. Time-shared resources give way to dedicated CMS resources.]
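As a quick sanity check (a minimal sketch of ours, not from the slides), the x2.5/year average slope in the figure is consistent with the statement on the next slide that current tests are still a factor of ~100 away from startup conditions, if one assumes roughly five years between the current tests and startup:

```python
# Minimal sketch: the x2.5/year average slope of the figure, compounded over
# an assumed ~5 years from the 2003 tests to startup conditions, reproduces
# the "factor of 100" gap quoted on the next slide.
slope_per_year = 2.5
years_to_startup = 5   # assumption: 2003 tests -> 2008 startup conditions
gap = slope_per_year ** years_to_startup
print(f"x{slope_per_year}/year over {years_to_startup} years = x{gap:.0f}")
# -> x2.5/year over 5 years = x98, roughly the quoted factor of 100
```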


Slide 14

Data Challenges

 All experiments have detailed DC plans, motivated by the need to test the scaling of solutions (hardware, middleware and experiment software)
 At CERN we count on continued support for MSS, in particular CASTOR
 Current tests are still a factor of 100 away from startup conditions (in kSI2k·Months)
 CERN has a special ramping difficulty going from shared systems in 2005/6 to separate systems in 2007; this introduces an even sharper slope change between Phase I and Phase II
 Data Challenges are moving towards Analysis Challenges
 Data Challenges will typically have two components:
 A production phase, which is onerous but not part of the challenge per se; it should not be allowed to fail, as it also provides physics data for the experiment
 A true challenge phase, which may fail and lead to revision of the Computing Model or some of its components
 Global scheduling is required for the two components: productions can run in parallel, but typically not the true challenge periods


Slide 15

Overview of Resource Requirements (CMS-DC04)

Estimates for CPU and storage requirements for the CMS Data Challenge DC04:

Computing power (kSI2K·Months)
                                          03Q3   03Q4   04Q1   04Q2
Total requirement for Simulation           900   1800      -      -
Total requirement for Digitization           -    135      -      -
Total requirement for Reconstruction         -      -    225      -
Total requirement for Analysis               -      -    450    450
Total previewed CERN/LCG capacity (Eck)    900    900    900   1890
CERN T0                                    300    645    225      -
CERN T1 (challenge related only)             -      -    150    150
Offsite T1+T2 (challenge only)             600   1290    300    300

Storage (TeraBytes)
                                          03Q3   03Q4   04Q1   04Q2
Data generated at CERN                      19     39     25      -
Data generated offsite                      39     78      -      -
Data transferred to CERN                    17     33      -      -
Sum of data stored at CERN                  36    108    133    133
Active data at CERN                         25     75    100    100
Assumed number of active offsite T1s         3      3      3      3
Sum of data stored offsite                  39    117    192    192
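The column alignment above is reconstructed from a flattened source table. As a check (a minimal sketch of ours, not part of the original slide), the per-quarter sum of the stated requirements matches the CERN T0 + CERN T1 + offsite breakdown exactly:

```python
# Minimal consistency check of the reconstructed DC04 table (kSI2K.Months):
# per quarter, summed requirements should equal the site breakdown.
quarters = ["03Q3", "03Q4", "04Q1", "04Q2"]
requirement = {
    "03Q3": 900,         # simulation
    "03Q4": 1800 + 135,  # simulation + digitization
    "04Q1": 225 + 450,   # reconstruction + analysis
    "04Q2": 450,         # analysis
}
breakdown = {
    "03Q3": 300 + 600,        # CERN T0 + offsite T1+T2
    "03Q4": 645 + 1290,       # CERN T0 + offsite T1+T2
    "04Q1": 225 + 150 + 300,  # CERN T0 + CERN T1 + offsite T1+T2
    "04Q2": 150 + 300,        # CERN T1 + offsite T1+T2
}
for q in quarters:
    assert requirement[q] == breakdown[q]
    print(q, requirement[q], "kSI2K.Months: requirement == breakdown")
```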


Slide 16

CMS LCG-1 "pilot" diagram (D. Stickland, Feb 03, POB)

[Diagram: CMS production tools on the LCG-1 pilot. MCRunJob with an EDG plug-in creates jobs from a dataset definition (dataset algorithm and input specifications, new dataset requests) and submits them as JDL plus scripts through the EDG UI / VDT client to the EDG Workload Management System (with EDG L&B), which assigns jobs to EDG CE / VDT server resources. Data management operations (read, write and copy data; input data location) go through the EDG Replica Manager and an RLS or Replica Catalogue to EDG SE / VDT servers. Resource status is published and retrieved via MDS (GLUE schema; LDAP?). Experiment software releases are downloaded and installed via PACMAN(?) and a PACMAN DB(?). Job monitoring and production monitoring use BOSS (with R-GMA?) and the BOSS-DB for job type definitions, job output filtering and dataset metadata updates; an EDT Monitor pushes or pulls resource-monitoring information.]


Slide 17

Computing TDRs

 Initial LHC computing is to be defined by four experiment CTDRs and an LCG TDR
 Respective roles are becoming clear (discussed in last week's SC2)
 The LCG TDR depends on the CTDR "Computing Models", but not on all their details
 Computing Models are a joint effort: the experiments matching their physics model, and LCG expertise matching solutions
 Timescale of the LCG TDR set for mid-2005, (just) in time for MoUs and purchasing
 Timescales for the experiment CTDRs ~6 months earlier (at least the Computing Model components)
 Experiment CTDRs are expected to contain non-binding capacity estimates from regional centers to set appropriate "budget targets"; these could also seed MoU deliberations
 Close connection/collaboration required between the LCG TDR and the experiment CTDRs


Slide 18

Computing Model (CMS Example)

The capacity available to CMS in a single T1:
                      2003   2004   2005   2006   2007   2008
CPU scheduled           17     45    143    240    400    717   kSI2K
CPU analysis            42    114    358    601   1000   1790   kSI2K
Total CPU               59    159    501    842   1400   2507   kSI2K
Disk                    22     57    163    284    465    813   TBytes
Active tape             21     48    111    241    383    555   TBytes
Archive tape            30     66    156    328    527    780   TBytes
Tape I/O                17     38     80    183    282    400   MB/s
Number of CPU boxes     66    151    358    463    580    696

The CERN T0 (capacity available to CMS):
                      2003   2004   2005  2006*  2007*   2008
CPU scheduled           75    202    635    693   1485   3176   kSI2K
Disk                    26     67    192    225    467    962   TBytes
Active tape             53    120    275    450    845   1376   TBytes
Archive tape            80    175    414    647   1239   2068   TBytes
Tape I/O                34     76    160    282    508    800   MB/s
Number of CPU boxes     83    192    453    340    561    882

The capacity available to CMS in a single T2:
                      2003   2004   2005   2006   2007   2008
CPU scheduled            4     10     32     54     90    161   kSI2K
CPU analysis             4     11     35     59     97    174   kSI2K
Total CPU                8     21     67    112    187    335   kSI2K
Active tape              0      0      0      0      0      0   TBytes
Archive tape             7     15     32     66    106    160   TBytes
Tape I/O                 4      9     20     46     71    100   MB/s
Number of CPU boxes      9     20     48     62     78     93

The current LCG funding shortfall (3.8 MCHF) permits either the T0 or the T1 at 20% scale, but not both.
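The asterisks on 2006/2007 in the T0 table presumably mark the shared-to-dedicated transition discussed on Slide 14 (our reading; the slide does not define them). A minimal sketch of ours, not from the slides, computing the year-over-year growth of the T0 scheduled-CPU row makes the slope change visible:

```python
# Minimal sketch: year-over-year growth of CERN T0 CPU capacity from the
# table above (values in kSI2K), showing the near-flat 2005->2006 step
# followed by a steeper ramp once dedicated systems begin.
years = [2003, 2004, 2005, 2006, 2007, 2008]
t0_cpu_ksi2k = [75, 202, 635, 693, 1485, 3176]
for y0, y1, c0, c1 in zip(years, years[1:], t0_cpu_ksi2k, t0_cpu_ksi2k[1:]):
    print(f"{y0}->{y1}: x{c1 / c0:.2f}")
# -> x2.69, x3.14, x1.09 (flat), then x2.14 per year
```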


Slide 19

Computing Manpower in Experiments

 Remains a problem in the short term
 Experiments have willingly contributed many of their most experienced developers to establishing the LCG projects
 In the medium to long term we count on this moderating our manpower requirements
 In the short term it exacerbates our manpower shortfalls and puts our milestones at risk; it even leads to duplicated effort to cover both interim and final solutions
 Manpower shortfalls of ~20 people exist across the experiments
 We assume we will fill the agreed six experiment-related LCG posts
 Finding ways to tackle the remaining shortfall is critical, and we will be looking to cooperate with the project leadership in finding solutions as soon as possible


Slide 20

Use of Outside Resources

 Concern that projects are focused at CERN, possibly disenfranchising some of the worldwide software-developer effort
 Probably an inevitable result of the necessary initial concentration on vital new developments
 We strongly encourage the use of all available tools to enable global participation
 CERN video-meeting resources are totally overstressed; there are not enough physically equipped rooms at key times of the day
 Tools and networks are much improved; this can be done effectively if the participants plan carefully and thoughtfully
 Geographic sharing of projects in the R&D and early production phases is very hard: most of our work still requires detailed collaboration at the project-definition level
 This is a similar problem for internal experiment developments and for LCG developments


Slide 21

Summary

 This talk has concentrated on the difficulties, but the overall assessment is one of positive motion
 Establishing stable middleware and ensuring its support is crucial
 Building a solid LCG1, while making allowance for flexibility, will be critical
 The full scale of the CERN and Regional Center Phase I capabilities is still not assured, and is vital to success:
 The CERN T0/T1 Phase I needs the additional 3.8 MCHF
 We require both T0 and T1 at scale to gain confidence that this will all work
 Tape I/O scale testing may still be at risk even with the 3.8 MCHF
 The entire acronym space is fully overloaded; we can't establish any new committees…