Computing Plans in CMS
Ian Willers
CERN
1. The Problem and Introduction
2. Data Challenge – DC04
3. Computing Fabric – Technologies evolution
4. Conclusions
The Problem
[Diagram: data flow from the detector through the event filter (selection & reconstruction) to raw data and processed data; reconstruction produces event summary data; event reprocessing and event simulation feed back into the chain; batch physics analysis extracts analysis objects by physics topic, which are used in interactive physics analysis.]
Regional Centres – a Multi-Tier Model
[Diagram: CERN is Tier 0, linked at 622 Mbps / 2.5 Gbps to Tier 1 regional centres such as FNAL, RAL and IN2P3; Tier 1 centres connect at 155 Mbps to Tier 2 centres (Lab a, Uni b, Lab c, … Uni n), which in turn serve department and desktop resources.]
6
Iterations /scenarios
Computing TDR StrategyPhysics Model
•Data model •Calibration•Reconstruction•Selection streams•Simulation•Analysis•Policy/priorities…
Computing Model
•Architecture (grid, OO,…)•Tier 0, 1, 2 centres•Networks, data handling•System/grid software•Applications, tools•Policy/priorities…
C-TDR• Computing model (& scenarios)• Specific plan for initial systems• (Non-contractual) resource planning
DC04 Data challenge
Copes with 25Hz at 2x10**33 for 1 month
TechnologiesEvaluation and
evolution Estimated AvailableResources
(no cost book for computing)
Requiredresources
SimulationsModel systems &
usage patterns
Validation of Model
1. The Problem and Introduction
2. Data Challenge – DC04
3. Proposed Computing Fabric
4. Conclusions
Data Challenge DC04
Pre-Challenge Production is starting now; the “true” DC04 runs in February 2004.
[Diagram: Pre-Challenge Production generates 50M events (75 TByte) into the CERN tape archive. A fake DAQ at CERN feeds the DC04 T0 challenge: first-pass reconstruction at 25 Hz and 1.5 MB/event (40 MByte/s, 3.2 TB/day), with raw data (25 Hz, 1 MB/event) and reconstructed DSTs (25 Hz, 0.5 MB/event) going via a disk cache to archive storage on CERN tape; a ~40 TByte CERN disk pool holds ~20 days of data. The DC04 calibration challenge sends a calibration sample to calibration jobs that update the MASTER conditions DB, replicated as conditions DB copies at the Tier 1 centres. The DC04 analysis challenge distributes event streams and TAG/AOD (20 kB/event) replicas to Tier 1 and Tier 2 centres for Higgs DST and SUSY background DST analyses, possibly behind an HLT filter; a Higgs background study requests new events from an event server.]
[Diagram (updated DC04 parameters): the same workflow, with the CERN disk pool holding ~10 days of data, first-pass reconstruction at 25 Hz and 2 MB/event (50 MByte/s, 4 TByte/day), TAG/AOD at 10-100 kB/event, and the Pre-Challenge Production (PCP) sample of 50M events (75 TByte) on the CERN tape archive.]
(The throughputs and daily volumes quoted on these two slides are simple rate × size products; a check is sketched below.)
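As a quick sanity check of the DC04 figures above, a minimal sketch of the arithmetic, using only the numbers quoted on the slides (the slides round the results slightly):

    # DC04 rate arithmetic, using the numbers quoted on the two DC04 slides.
    rate_hz = 25                        # events per second out of the fake DAQ

    for event_size_mb in (1.5, 2.0):    # event size in MB (first and updated slide)
        throughput_mb_s = rate_hz * event_size_mb        # MB/s into reconstruction
        volume_tb_day = throughput_mb_s * 86400 / 1e6    # TB written per day
        print(f"{event_size_mb} MB/evt -> {throughput_mb_s:.1f} MB/s, "
              f"{volume_tb_day:.1f} TB/day")

    # Pre-Challenge Production sample: 50M events at ~1.5 MB each -> 75 TB
    print(f"PCP sample: {50e6 * 1.5 / 1e6:.0f} TB")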
MCRunJob – Pre-Challenge Production with/without GRID
[Diagram: a physics group asks RefDB for an official dataset; the Production Manager defines assignments; a Site Manager starts an assignment, or a user starts a private production from the user's site (or a grid UI). MCRunJob then produces the jobs in whatever form the target resources understand (sketched below): shell scripts for a local batch manager, JDL for the EDG Scheduler, a DAG for DAGMan, or Chimera VDL for the Virtual Data Catalogue and Planner, running on a computer farm or on CMS/LCG-0.]
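The point of the diagram is that one production assignment can be rendered into whichever submission format the target system expects. The sketch below is purely illustrative: the job fields and the render_* helpers are hypothetical, not the actual MCRunJob interfaces, and the JDL/DAG snippets are only schematic.

    # Illustrative only: one abstract job description rendered for different
    # back ends (local batch, EDG/JDL, DAGMan). Field names are hypothetical.
    job = {"executable": "cmsim.sh", "arguments": "run=1234", "events": 250}

    def render_shell(job):
        # plain shell script for a local batch manager
        return f"#!/bin/sh\n./{job['executable']} {job['arguments']}\n"

    def render_jdl(job):
        # schematic EDG-style JDL (key = "value"; pairs)
        return (f'Executable = "{job["executable"]}";\n'
                f'Arguments  = "{job["arguments"]}";\n')

    def render_dag(jobs):
        # schematic DAGMan description: one node per job, run in sequence
        lines = [f"JOB step{i} step{i}.submit" for i in range(len(jobs))]
        lines += [f"PARENT step{i} CHILD step{i + 1}" for i in range(len(jobs) - 1)]
        return "\n".join(lines)

    print(render_shell(job))
    print(render_jdl(job))
    print(render_dag([job, job]))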
1. The Problem and Introduction
2. Data Challenge – DC04
3. Proposed Computing Fabric
4. Conclusions
HEP Computing
• High Throughput Computing
  – throughput rather than performance
  – resilience rather than ultimate reliability
  – long experience in exploiting inexpensive mass-market components
  – management of very large-scale clusters is a problem
CPU Servers
CPU capacity – Industry
• OpenLab study of 64-bit architecture
• Earth Simulator
  – number 1 computer in the Top 500
  – made in Japan by NEC
  – peak speed of 40 Tflops
  – leads the Top 500 list by almost a factor of 5
  – performance of the Earth Simulator equals the sum of the next 12 computers
  – the Earth Simulator runs at 90% efficiency (vs. 10-60% for PC farms)
  – Gordon Bell warned “Off-the-shelf supercomputing is a dead end”
Earth Simulator
Cited problems with farms used as supercomputers
• Lack of memory bandwidth
• Interconnect latency
• Lack of interconnect bandwidth
• Lack of high-performance (parallel) I/O
• High cost of ownership for large-scale systems
• For CMS – does this matter?
LCG Testbed Structure
[Diagram: the testbed used 100 CPU servers on GE and 300 on FE, 100 disk servers on GE (~50 TB) and 20 tape servers on GE. Backbone routers interconnect the groups – 100 GE CPU servers, 200 FE and 100 FE CPU servers, 64 and 36 disk servers, and 20 tape servers – over 1 GB, 3 GB and 8 GB lines.]
HEP Computing
• Mass Storage model
  – data resides on tape, cached on disk (see the sketch below)
  – light-weight private software for scalability, reliability, performance
  – petabyte-scale object persistency database products
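A minimal sketch of the tape-resident, disk-cached access pattern described above; the class and its methods are hypothetical illustrations, not any particular mass storage system's API.

    # Illustrative read-through disk cache in front of a tape store.
    # Names are invented; real mass storage systems differ in detail.
    class MassStore:
        def __init__(self, tape_recall):
            self.tape_recall = tape_recall   # function: filename -> bytes (slow)
            self.disk_cache = {}             # filename -> bytes (fast)

        def read(self, filename):
            # serve from disk if cached, otherwise recall from tape and cache it
            if filename not in self.disk_cache:
                self.disk_cache[filename] = self.tape_recall(filename)
            return self.disk_cache[filename]

    store = MassStore(tape_recall=lambda name: b"...")   # stand-in for a tape read
    store.read("run1234.raw")   # first read: recalled from tape, cached on disk
    store.read("run1234.raw")   # second read: served from the disk cache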
Mass Storage
Mass Storage - Industry
• OpenLab – StorageTek 9940B drives driven by CERN at 1.1 GB/s
• Tape only for backup
• Main data stored on disks
• Google example
Disk Storage
Disks – Commercial trends
• Jobs accessing files over the GRID
  – GRID copied files to a sandbox
  – new proposal for file access from the GRID
• OpenLab – IBM 28 TB TotalStorage using iSCSI disks
• iSCSI: SCSI over the Internet
• OSD: Object Storage Device = object-based SCSI
• Replication gives security and performance
File Access via Grid
• Access now takes place in steps (sketched below):
  1) find the site where the file resides using the replica catalogue
  2) check whether the file is on tape or on disk; if only on tape, move it to disk
  3) if you cannot open a remote file, copy the file to the worker node and use local I/O
  4) open the file
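A minimal sketch of those four steps, assuming hypothetical stand-ins (lookup_replicas, is_on_disk, stage_to_disk, open_remote, copy_to_worker_node) in place of the real replica-catalogue and data-transfer tools:

    # Hypothetical stand-ins for the grid middleware; a real setup would call the
    # replica catalogue and data-transfer tools instead of these stubs.
    def lookup_replicas(lfn):          return [("cern.ch", f"/data/{lfn}")]
    def is_on_disk(site, pfn):         return False
    def stage_to_disk(site, pfn):      pass                    # tape -> disk recall
    def open_remote(site, pfn):        raise IOError("no remote I/O available")
    def copy_to_worker_node(site, pfn):
        local = "/tmp/" + pfn.split("/")[-1]
        open(local, "wb").close()                              # fake local copy
        return local

    def open_grid_file(lfn):
        # 1) find the site where the file resides using the replica catalogue
        site, pfn = lookup_replicas(lfn)[0]
        # 2) if the file is only on tape, move (stage) it to disk first
        if not is_on_disk(site, pfn):
            stage_to_disk(site, pfn)
        # 3) open remotely if possible, otherwise copy to the worker node
        try:
            return open_remote(site, pfn)
        except IOError:
            local_path = copy_to_worker_node(site, pfn)
            # 4) open the (now local) file with ordinary local I/O
            return open(local_path, "rb")

    f = open_grid_file("run1234.root")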
Object Storage Device
Big disk, slow I/O tricks
[Diagram: the disk split into hot data and cold data regions.]
• Sequential access is faster than random – always read from start to finish (see the sketch below)
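A small, purely illustrative sketch of the point being made: reading a file from start to finish keeps the disk streaming, while the same volume read at random offsets pays a seek per read. The file name and sizes are arbitrary, and for a small, cached file the difference will be masked by the OS page cache; the effect matters for genuinely disk-resident data.

    # Compare sequential vs. random reads of the same total volume (illustrative).
    import os, random, time

    path, block, blocks = "bigfile.dat", 1024 * 1024, 64    # 64 x 1 MB test file
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(os.urandom(block) * blocks)              # create the test file

    def read_sequential(f):
        f.seek(0)
        for _ in range(blocks):
            f.read(block)                                    # one pass, start to finish

    def read_random(f):
        size = os.path.getsize(path)
        for _ in range(blocks):
            f.seek(random.randrange(0, size - block))        # a seek before every read
            f.read(block)

    with open(path, "rb") as f:
        for reader in (read_sequential, read_random):
            t0 = time.time()
            reader(f)
            print(reader.__name__, round(time.time() - t0, 3), "s")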
Network trends
• OpenLab: 755 MB/s over 10 Gbps Ethernet
• CERN/Caltech land speed record holders (in the Guinness Book of Records)
  – CERN to Chicago: IPv6 single stream, 983 Mbps
  – Sunnyvale to Geneva: IPv4 multiple streams, 2.38 Gbps
• Network Address Translation, NAT
• IPv6: IP address depletion, efficient packet handling, authentication, security, etc.
Port Address Translation
• PAT – a form of dynamic NAT that maps multiple unregistered IP addresses to a single registered IP address by using different ports (illustrated below)
• Avoids the IPv4 problem of limited addresses
• Mapping can be done dynamically, so adding nodes is easier
• Therefore easier management of the farm fabric?
IPv6
• IPv4: 32-bit address space, assigned very unevenly
  – 67% for the USA
  – 6% for Japan
  – 2% for China
  – 0.14% for India
• IPv6: 128-bit address space (see the comparison below)
• No longer a need for Network Address Translation, NAT?
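The difference in scale is easy to quantify; a trivial check of the two address-space sizes quoted above:

    # Size of the IPv4 vs. IPv6 address spaces.
    ipv4 = 2 ** 32
    ipv6 = 2 ** 128
    print(f"IPv4: {ipv4:,} addresses")      # ~4.3 billion
    print(f"IPv6: {ipv6:.3e} addresses")    # ~3.4e38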
1. The Problem and Introduction
2. Data Challenge – DC04
3. Proposed Computing Fabric
4. Conclusions
Conclusions
• CMS faces an enormous challenge in computing
  – short-term data challenges
  – long-term developments within the commercial and scientific world
• The year 2007 is still four years away
  – enough for a completely new generation of computing technologies to appear
• New inventions may revolutionise computing
  – CMS depends on this progress to make our computing possible and affordable