The Computing System for the Belle Experiment
Ichiro Adachi, KEK
representing the Belle DST/MC production group
CHEP03, La Jolla, California, USA, March 24, 2003
• Introduction: Belle
• Belle software tools
• Belle computing system & PC farm
• DST/MC production
• Summary
March 24, 2003 Ichiro Adachi, CHEP03 2
Introduction
• Belle experiment
  – B-factory experiment at KEK
  – study of CP violation in the B meson system; running since 1999
  – recorded ~120M B meson pairs (~120 fb-1) so far
  – the KEKB accelerator is still improving its performance
• The largest B meson data sample at the Υ(4S) region in the world
[Slide: Belle detector and an example of event reconstruction (a fully reconstructed event)]
Belle software tools
• Home-made kits
  – “B.A.S.F.” for the framework
    • Belle AnalySis Framework
    • unique framework for every step of event processing
    • event-by-event parallel processing on SMP machines
  – “Panther” for the I/O package
    • unique data format from DAQ to user analysis
    • bank system with zlib compression
  – reconstruction & simulation library
    • written in C++
• Other utilities
  – CERNLIB/CLHEP, …
  – Postgres for the database
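Panther's bank system stores data with zlib compression; as a minimal, hypothetical sketch of the idea (the 12-byte record layout below is invented for illustration, not Panther's actual format):

```python
import struct
import zlib

# Toy "bank" record with a zlib-compressed payload, loosely in the spirit
# of Panther; the header layout (name, sizes) is invented.
def write_bank(name, payload):
    blob = zlib.compress(payload)
    header = struct.pack("4sII", name.encode()[:4], len(payload), len(blob))
    return header + blob

def read_bank(buf):
    name, usize, csize = struct.unpack("4sII", buf[:12])
    payload = zlib.decompress(buf[12:12 + csize])
    assert len(payload) == usize          # sanity check on the round trip
    return name.decode(), payload

record = write_bank("MDST", b"\x00" * 1000)   # highly compressible payload
name, payload = read_bank(record)
print(name, len(record), len(payload))        # compressed record is far smaller
```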
[Diagram: B.A.S.F. event flow. Input and output via Panther; modules (unpacking/calibration, tracking/vertexing, clustering, particle ID, diagnosis) are loaded dynamically as shared objects]
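The event-by-event module pipeline can be pictured roughly as follows (this is not the actual B.A.S.F. API, which is C++ with modules loaded as shared objects; all names below are invented):

```python
# Minimal sketch of a B.A.S.F.-style framework: each reconstruction step is
# a "module" with a common interface, chained on a path and run event by
# event.  The real framework loads modules as shared objects at run time.
class Module:
    def begin_run(self, run_number):
        pass
    def event(self, evt):               # mutate the event record in place
        pass

class Unpacking(Module):
    def event(self, evt):
        evt["unpacked"] = True

class Tracking(Module):
    def event(self, evt):
        # tracking only makes sense on unpacked events
        evt["tracks"] = ["trk0", "trk1"] if evt.get("unpacked") else []

class Path:
    def __init__(self, modules):
        self.modules = modules
    def process(self, events, run_number=1):
        for m in self.modules:
            m.begin_run(run_number)
        for evt in events:              # event-by-event loop
            for m in self.modules:
                m.event(evt)
        return events

events = Path([Unpacking(), Tracking()]).process([{"id": i} for i in range(3)])
print(len(events[0]["tracks"]))  # 2
```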
Belle computing system
[Diagram: university resources (Tokyo, Nagoya, Tohoku) connected to KEK over the 1 Gbps Super-SINET; user analysis & storage system; computing network for batch jobs and DST/MC production. Components shown: Sun computing servers (500 MHz × 4 CPUs), 38 Compaq hosts, online tape servers, PC farms, 500 TB Fujitsu tape library, HSM server with a 120 TB HSM library, 4 TB disk and an 8 TB file server, work group servers (500 MHz × 4, 9 hosts), ~100 user PCs (1 GHz), interconnected with GbE switches]
Computing requirements
• Reprocess the entire beam data set in 3 months
  – once reconstruction code is updated or constants are improved, fast turn-around is essential to perform physics analyses in a timely manner
• MC sample at least 3 times larger than the real data
  – analyses are maturing, and understanding systematic effects in detail needs a sufficiently large MC sample
→ Added more PC farms and disks
PC farm upgrade
[Plot: total CPU (GHz) vs. time from Jan 1999 to Jan 2003, rising to ~1500 GHz; CPU power was boosted for DST & MC production, with one batch delivered in Dec. 2002 and another to come soon]
• Total CPU has tripled in the last two years; 60 TB (total) of disk has also been purchased for storage
• Total CPU = CPU processor speed (GHz) × number of CPUs × number of nodes
Belle PC farm CPUs: a heterogeneous system from various vendors (Intel Xeon / Pentium III / Pentium 4 / Athlon)

Vendor   Nodes    CPU                    Total CPU
Dell     36 PCs   Pentium III ~0.5 GHz   n/a
Compaq   60 PCs   Intel Xeon 0.7 GHz     168 GHz
Fujitsu  127 PCs  Pentium III 1.26 GHz   320 GHz
Appro    113 PCs  Athlon 2000+           380 GHz
NEC      84 PCs   Pentium 4 2.8 GHz      470 GHz (will come soon; the rest are set up)

[Pie chart: per-vendor share of total CPU]
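The per-vendor totals can be checked against the slide's formula; a small sketch (the CPUs-per-node counts and the ~1.67 GHz clock for the Athlon 2000+ are my assumptions, not from the slide):

```python
# Checking the formula Total CPU = clock (GHz) x CPUs x nodes against the
# quoted per-vendor totals.  CPUs-per-node (dual, quad for the Compaq Xeon
# servers) and the Athlon clock are assumptions made for this sketch.
farm = {
    # vendor: (nodes, GHz per CPU, assumed CPUs per node)
    "Fujitsu": (127, 1.26, 2),
    "Appro":   (113, 1.67, 2),
    "Compaq":  (60,  0.70, 4),
    "NEC":     (84,  2.80, 2),
}
totals = {v: round(nodes * ghz * cpus) for v, (nodes, ghz, cpus) in farm.items()}
print(totals)            # close to the quoted 320 / 380 / 168 / 470 GHz
print(sum(totals.values()), "GHz in these four batches")
```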
DST production & skimming scheme
[Diagram:
1. Production (reproduction): raw data is read by the PC farm, which writes DST data to disk along with histograms and log files; a Sun server handles the data transfer.
2. Skimming: the DST data is read back on Sun servers, which write skims such as the hadronic data sample to disk or HSM, plus histograms and log files, for user analysis.]
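The skimming pass can be sketched as a single loop that fans DST events out into named skim streams, so analysts need not read the whole sample; selection functions and event fields here are invented for illustration:

```python
# Hypothetical skimming step: classify each DST event into zero or more
# named skim streams (hadronic sample, physics channels, ...).
def is_hadronic(evt):
    return evt["ntracks"] >= 3            # toy selection, not Belle's real cut

skims = {
    "hadronic": is_hadronic,
    "two_track": lambda evt: evt["ntracks"] == 2,
}

def run_skimming(events):
    streams = {name: [] for name in skims}
    for evt in events:
        for name, select in skims.items():
            if select(evt):
                streams[name].append(evt)
    return streams

dst = [{"ntracks": n} for n in (1, 2, 3, 5)]
out = run_skimming(dst)
print(len(out["hadronic"]), len(out["two_track"]))  # 2 1
```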
Output skims
• Physics skims from reprocessing
  – “Mini-DST” (4-vector) format
  – Create a hadronic sample as well as typical physics channels (up to ~20 skims)
    • many users do not have to go through the whole hadronic sample
  – Write data directly onto disk at Nagoya (~350 km away from KEK) using NFS, thanks to the 1 Gbps Super-SINET link
[Diagram: reprocessing output at the KEK site produces full-recon mini-DST, hadronic mini-DST, and skims such as J/ψ inclusive, written over the 1 Gbps link to Nagoya, ~350 km from KEK]
Processing power & failure rate
• Processing power
  – Processing ~1 fb-1 per day with 180 GHz
  – Allocate 40 PC hosts (0.7 GHz × 4 CPUs) for daily production to catch up with the DAQ; 2.5 fb-1 per day is possible
  – Processing speed per event (MC case) on one 1 GHz CPU:
    • Reconstruction: 3.4 sec
    • Geant simulation: 2.3 sec
• Failure rate per B meson pair:
  – module crash: < 0.01%
  – tape I/O error: 1%
  – process communication error: 3%
  – network trouble / system error: negligible
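A back-of-the-envelope check of what the quoted per-event times imply for the farm (the linear clock-speed scaling is my assumption, not from the slide):

```python
# MC events per day a 180 GHz farm can handle, scaling the quoted per-event
# times (3.4 s reconstruction + 2.3 s Geant simulation on a 1 GHz CPU)
# linearly with aggregate clock speed -- a rough sketch only.
farm_ghz = 180.0
sec_per_event_at_1ghz = 3.4 + 2.3
events_per_day = farm_ghz / sec_per_event_at_1ghz * 86400
print(f"~{events_per_day / 1e6:.1f}M MC events/day")  # ~2.7M
```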
Reprocessing in 2001 & 2002
• Reprocessing
  – major library & constants update in April
  – sometimes we have to wait for constants
• The final batch of beam data taken before the summer shutdown has always been reprocessed in time
  – for summer 2001: 30 fb-1 in 2.5 months
  – for summer 2002: 78 fb-1 in 3 months
MC production
• Produce ~2.5 fb-1 per day with 400 GHz of Pentium III
  – resources at remote sites are also used
• Size: 15~20 GB per 1M events (4-vectors only)
• Run dependent
  – for each beam data file (run # xxx), a minimum set of generic MC (B0, B+B-, charm, and light-quark samples) is produced as mini-DST, using the run-dependent background and IP profile
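The run-dependent bookkeeping can be pictured as one job per generic sample per real-data run, each tagged with that run's conditions (a hypothetical sketch; all field names are invented):

```python
# One MC job per generic sample type for a given real-data run, carrying
# the run-dependent interaction-point profile and background overlay.
GENERIC_SAMPLES = ["B0", "B+B-", "charm", "light_quark"]

def mc_jobs_for_run(run_number, ip_profile):
    return [
        {
            "run": run_number,
            "sample": sample,
            "ip_profile": ip_profile,             # run-dependent IP shape
            "background": f"run{run_number}_bg",  # overlay from the same run
        }
        for sample in GENERIC_SAMPLES
    ]

jobs = mc_jobs_for_run(4711, ip_profile=(0.10, 0.005, 3.5))
print([j["sample"] for j in jobs])  # one job per generic sample
```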
MC production in 2002
• Keep producing generic MC samples
  – PC farm shared with DST production; switching from DST to MC production can be made easily
• Reached 1100M events in March 2003: samples 3 times larger than the 78 fb-1 data set are completed
[Plot annotations: major update, minor change]
MC production at remote sites
• Total CPU resources at remote sites (~300 GHz) are similar to KEK's
• 44% of the MC samples have been produced at remote sites
  – all data transferred to KEK via the network: 6~8 TB in 6 months
[Bar charts: CPU resources available (GHz) and MC events produced (M events) at KEK, Nagoya, TIT, Riken, Tohoku, Hawaii, Tokyo, and VPI]
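As a sanity check on the network load, 6~8 TB in 6 months is only a small average fraction of the 1 Gbps link (the 7 TB midpoint and 30-day months are my assumptions):

```python
# Average transfer rate implied by 6-8 TB moved to KEK in 6 months,
# compared with the 1 Gbps Super-SINET link.
terabytes = 7.0                          # midpoint of the quoted 6-8 TB
seconds = 6 * 30 * 86400                 # ~6 months
avg_mbit_s = terabytes * 8e6 / seconds   # 1 TB = 8e6 megabits
print(f"~{avg_mbit_s:.1f} Mbit/s average on a 1000 Mbit/s link")
```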
Future prospects
• Short term
  – Software: standardize utilities
  – Purchase more CPUs and/or disks if the budget permits…
  – Efficient use of resources at remote sites
    • from centralized at KEK to distributed Belle-wide
  – Grid computing technology: survey & application just started
    • data file management
    • CPU usage
• SuperKEKB project
  – Aim for a luminosity of 10^35 (or more) cm^-2 s^-1 from 2006
  – Physics rate of ~100 Hz for B meson pairs
  – 1 PB/year expected
  – A new computing system like those of the LHC experiments can be a candidate
Summary
• The Belle computing system has been working well. More than 250 fb-1 of real beam data has been successfully (re)processed.
• MC samples 3 times larger than the beam data have been produced so far.
• More CPU will be added in the near future for quick turn-around as we accumulate more data.
• Grid computing technology would be a good friend of ours; we have started considering its application in our system.
• For SuperKEKB, we need many more resources, which may have a rather big impact on our system.