
The Computing System for the Belle Experiment

Ichiro Adachi, KEK

representing the Belle DST/MC production group

CHEP03, La Jolla, California, USA
March 24, 2003

• Introduction: Belle
• Belle software tools
• Belle computing system & PC farm
• DST/MC production
• Summary


Introduction

• Belle experiment
  – B-factory experiment at KEK
  – studies CP violation in the B meson system; running since 1999
  – has recorded ~120M B meson pairs (~120 fb-1) so far
  – the KEKB accelerator is still improving its performance

The largest B meson data sample in the Υ(4S) region in the world

Belle detector

[Figure: example of event reconstruction in the Belle detector, showing a fully reconstructed event]

Belle software tools

• Home-made kits
  – “B.A.S.F.” for framework
    • Belle AnalySis Framework
    • unique framework for any step of event processing (a module sketch is given below)
    • event-by-event parallel processing on SMP
  – “Panther” for I/O package
    • unique data format from DAQ to user analysis
    • bank system with zlib compression (a compression sketch follows the event-flow diagram)
  – reconstruction & simulation library, written in C++
• Other utilities
  – CERNLIB/CLHEP…
  – Postgres for database
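To make the framework idea concrete, here is a minimal sketch of what a dynamically loaded B.A.S.F.-style module could look like. This is an assumption for illustration only: the class names, hooks, and factory symbol are invented and are not the actual B.A.S.F. API.

```cpp
// Hypothetical sketch: invented names, not the real B.A.S.F. interface.
#include <cstdio>

// Base class for a framework module; the framework would call hooks
// like these for each run and each event, with modules compiled as
// shared objects and loaded dynamically at run time.
class Module {
public:
  virtual ~Module() {}
  virtual void begin_run(int run) = 0;  // e.g. fetch calibration constants
  virtual void event(int evt) = 0;      // per-event processing
  virtual void end_run(int run) = 0;    // e.g. write histograms/log files
};

class TrackingModule : public Module {
public:
  void begin_run(int run) override { std::printf("run %d: constants loaded\n", run); }
  void event(int /*evt*/) override  { /* track finding and fitting here */ }
  void end_run(int run) override    { std::printf("run %d finished\n", run); }
};

// Factory symbol a framework could resolve with dlopen()/dlsym()
// after loading this module's shared object.
extern "C" Module* create_module() { return new TrackingModule; }
```

With modules built as shared objects like this, a processing chain (unpacking, tracking, clustering, …) can be assembled at run time, and event-by-event parallelism on an SMP host amounts to dispatching successive events to module instances in per-CPU worker processes.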

[Diagram: event flow through B.A.S.F., with input and output via Panther; the processing modules (unpacking, calibration, tracking, vertexing, clustering, particle ID, diagnosis) are shared objects loaded dynamically]
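Since Panther banks are stored with zlib compression, the snippet below shows the kind of one-shot zlib call such an I/O layer could make. It is illustrative only; the bank buffer and the compress_bank helper are invented, not Panther's actual code.

```cpp
// Illustrative only: compress a serialized "bank" buffer with zlib,
// in the spirit of Panther's compressed bank I/O.
#include <zlib.h>
#include <cstdio>
#include <vector>

std::vector<unsigned char> compress_bank(const std::vector<unsigned char>& raw) {
  uLongf out_len = compressBound(raw.size());   // worst-case compressed size
  std::vector<unsigned char> out(out_len);
  if (compress2(out.data(), &out_len, raw.data(), raw.size(),
                Z_DEFAULT_COMPRESSION) != Z_OK)
    return {};                                  // compression failed
  out.resize(out_len);                          // shrink to the actual size
  return out;
}

int main() {
  std::vector<unsigned char> bank(4096, 0x42);  // dummy bank payload
  std::vector<unsigned char> packed = compress_bank(bank);
  std::printf("bank: %zu -> %zu bytes\n", bank.size(), packed.size());
  return 0;
}
```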


Belle computing system

[Diagram of the system: university resources at Tokyo, Nagoya, and Tohoku connected over Super-SINET (1 Gbps); a computing network for batch jobs and DST/MC production with Sun computing servers (500 MHz × 4 CPUs), 38 Compaq hosts, online tape servers, the PC farms, and a 500 TB Fujitsu tape library on GbE switches; a user analysis & storage system with an HSM server (120 TB HSM library), 4 TB of disk, an 8 TB file server, work group servers (500 MHz × 4, 9 hosts), and ~100 user PCs (1 GHz) on GbE switches]


Computing requirements

• Reprocess the entire beam data in 3 months
  – once reconstruction code is updated or constants are improved, fast turn-around is essential to perform physics analyses in a timely manner
• MC sample size at least 3 times the real data
  – analyses are maturing, and understanding systematic effects in detail requires a sufficiently large MC sample

→ Added more PC farms and disks


PC farm upgrade

[Plot: total CPU (GHz) from Jan. 1999 to Jan. 2003, growing to ~1500 GHz; the latest batch was delivered in Dec. 2002 and one more will come soon]

• CPU power boosted for DST & MC production
• Total CPU has become 3 times bigger in the last two years
• 60 TB (total) of disk has also been purchased for storage

Total CPU = CPU processor speed (GHz) × number of CPUs per node × number of nodes
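As a worked example, take the 60 Compaq hosts from the next slide and assume they are quad-CPU like the 0.7 GHz × 4 CPU hosts quoted on the processing-power slide:

$$0.7\ \mathrm{GHz} \times 4\ \mathrm{CPUs} \times 60\ \mathrm{nodes} = 168\ \mathrm{GHz},$$

which matches the 168 GHz listed for the Compaq part of the farm.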


Belle PC farm CPUs: a heterogeneous system from various vendors, mixing Intel Xeon, Pentium III, Pentium 4, and Athlon processors.

• Dell: 36 PCs (Pentium III ~0.5 GHz)
• Compaq: 60 PCs (Intel Xeon 0.7 GHz), 168 GHz
• Fujitsu: 127 PCs (Pentium III 1.26 GHz), 320 GHz
• Appro: 113 PCs (Athlon 2000+), 380 GHz
• NEC: 84 PCs (Pentium 4 2.8 GHz), 470 GHz

Setting up is done for the most recent delivery; the NEC machines will come soon.

[Pie chart: CPU share by vendor; the largest slices are about 31%, 25%, 21%, and 11%, with the remaining vendors at a few percent each]


DST production & skimming scheme

[Diagram of the two-step scheme:]

1. Production (reproduction): raw data are staged from tape through the Sun servers and processed on the PC farm; the resulting DST data, histograms, and log files are written to disk.

2. Skimming: the DST data are read back through the Sun servers and skims, such as the hadronic data sample, are written to disk or HSM, together with histograms and log files, for user analysis (sketched below).
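To illustrate the skimming step, the sketch below routes each DST event into every skim whose predicate accepts it. The event structure and selection cuts are invented for the example; this is not Belle's production code.

```cpp
// Illustrative skimming loop: one input stream, several output skims;
// an event can enter any number of skims (names and cuts invented).
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

struct Event { int ntracks; double evis; };    // stand-in for a DST event

struct Skim {
  std::string name;
  std::function<bool(const Event&)> accept;    // selection predicate
  std::vector<Event> out;                      // stands in for an output file
};

int main() {
  std::vector<Skim> skims = {
    {"hadronic",  [](const Event& e) { return e.ntracks >= 3 && e.evis > 0.2; }, {}},
    {"two_track", [](const Event& e) { return e.ntracks == 2; }, {}},
  };
  std::vector<Event> dst = {{5, 0.6}, {2, 0.1}, {8, 0.9}};  // fake DST data

  for (const Event& e : dst)
    for (Skim& s : skims)
      if (s.accept(e)) s.out.push_back(e);     // route event to every match

  for (const Skim& s : skims)
    std::printf("%s: %zu events\n", s.name.c_str(), s.out.size());
  return 0;
}
```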


Output skims

• Physics skims from reprocessing
  – “Mini-DST” (4-vector) format
  – create a hadronic sample as well as typical physics channels (up to ~20 skims), so many users do not have to go through the whole hadronic sample
  – write data directly onto disk at Nagoya (350 km away from KEK) using NFS, thanks to the 1 Gbps Super-SINET link

[Diagram: at the KEK site, the reprocessing output is split into full-recon mini-DST and skims (hadronic mini-DST, J/ψ inclusive, D*, …), written over the 1 Gbps link to Nagoya, ~350 km from KEK]


Processing power & failure rate

• Processing power
  – process ~1 fb-1 per day with 180 GHz
  – allocate 40 PC hosts (0.7 GHz × 4 CPUs) for daily production to keep up with DAQ; 2.5 fb-1 per day is possible
• Processing speed per event (for MC) with one 1 GHz CPU
  – reconstruction: 3.4 sec
  – Geant simulation: 2.3 sec
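As a rough consistency check, assuming the two steps above run back-to-back on a single 1 GHz CPU:

$$\frac{86400\ \text{s/day}}{(3.4 + 2.3)\ \text{s/event}} \approx 1.5 \times 10^{4}\ \text{MC events per day per CPU}$$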

• Failure rate for one B meson pair
  – module crash: < 0.01%
  – tape I/O error: 1%
  – process communication error: 3%
  – network trouble / system error: negligible


Reprocessing 2001 & 2002

• Reprocessing
  – major library & constants update in April
  – sometimes we have to wait for constants
• The final batch of beam data taken before the summer shutdown has always been reprocessed in time
  – for summer 2001: 30 fb-1 in 2.5 months
  – for summer 2002: 78 fb-1 in 3 months


MC production

• Produce ~2.5 fb-1 per day with 400 GHz of Pentium III
  – resources at remote sites are also used
• Size: 15~20 GB per 1M events (4-vectors only)
• Run dependent: for each beam data file (Run# xxx), a minimal set of generic MC (B0, B+B-, charm, and light quark samples) is produced as mini-DST using that run's background and IP profile (a bookkeeping sketch follows below)
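The per-run bookkeeping described above can be sketched as follows; the structures, file names, and numbers are invented for illustration and are not Belle's production tooling.

```cpp
// Illustrative only: map each beam run to the minimal set of generic MC
// samples, carrying run-dependent conditions (background, IP profile).
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct RunConditions {
  std::string background_file;  // run-dependent background overlay
  double ip_x, ip_y, ip_z;      // interaction-point profile for this run
};

int main() {
  const std::vector<std::string> generic_mc =
      {"B0", "B+B-", "charm", "light_quark"};  // min. set of generic MC

  std::map<int, RunConditions> runs = {
      {101, {"bg_run101.dat", 0.0, 0.0, 0.35}},  // invented numbers
      {102, {"bg_run102.dat", 0.1, 0.0, 0.34}},
  };

  // For every run, produce each generic MC sample with that run's conditions.
  for (const auto& [run, cond] : runs)
    for (const std::string& sample : generic_mc)
      std::printf("run %d: generate %s MC with %s, IP z=%.2f\n",
                  run, sample.c_str(), cond.background_file.c_str(), cond.ip_z);
  return 0;
}
```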


MC production 2002

• Keep producing generic MC samples
  – PC farm shared with DST
  – switching from DST to MC production can be made easily
• Reached 1100M events in March 2003: samples 3 times larger than the 78 fb-1 beam data are completed

[Plot: accumulated MC events vs. time, with annotations marking a major library update and a minor change]


MC production at remote sites

• Total CPU resources at remote sites are similar to those at KEK (~300 GHz at remote sites)
• 44% of the MC samples have been produced at remote sites
  – all data are transferred to KEK via the network: 6~8 TB in 6 months

[Bar charts: CPU resources available (GHz) and MC events produced (M events) at KEK, Nagoya, TIT, Riken, Tohoku, Hawaii, Tokyo, and VPI; 44% of the MC events come from remote sites]


Future prospects

• Short term
  – software: standardize utilities
  – purchase more CPUs and/or disks if budget permits…
  – efficient use of resources at remote sites
• Centralized at KEK → distributed Belle-wide
  – grid computing technology: survey & application just started
    • data file management
    • CPU usage
• SuperKEKB project
  – aim at a luminosity of 10^35 (or more) cm^-2 s^-1 from 2006
  – physics rate of ~100 Hz for B meson pairs
  – 1 PB/year expected
  – a new computing system like those of the LHC experiments can be a candidate


Summary

• The Belle computing system has been working well: more than 250 fb-1 of real beam data have been successfully (re)processed.
• MC samples 3 times larger than the beam data have been produced so far.
• More CPUs will be added in the near future for quick turn-around as we accumulate more data.
• Grid computing technology would be a good friend of ours; we have started considering its application in our system.
• For SuperKEKB we will need far more resources, which may have a rather big impact on our system.