
Page 1: DØ RACE Introduction Current Status DØRAM Architecture Regional Analysis Centers Conclusions DØ Internal Computing Review May 9 – 10, 2002 Jae Yu

DØ RACE

• Introduction
• Current Status
• DØRAM Architecture
• Regional Analysis Centers
• Conclusions

DØ Internal Computing Review
May 9 – 10, 2002

Jae Yu


May 9, 2002, DØRACE, DØ Internal Review, Jae Yu

How Do You Want to Do?

• John Krane would say “I want to measure the inclusive jet cross section at my desk at ISU!!”
• Chip Brock would say “I want to measure the W cross section at MSU!!”
• Meena would say “I want to find the Higgs at BU!!”
• All of the above should be possible in the “near” future!!!
• What do we need to do to accomplish the above?


What is DØRACE, and Why Do We Need It?

• DØ Remote Analysis Coordination Efforts
• In existence to accomplish:
  – Set up and maintain the remote analysis environment
  – Promote institutional contributions made remotely
  – Allow remote institutions to participate in data analysis
  – Prepare for the future of data analysis
    • More efficient and faster delivery of multi-PB data
    • More efficient sharing of processing resources
    • Prepare for possible massive re-processing and MC production to expedite the process
    • Expedite physics result production


DØRACE Cont’d

  – Maintain self-sustained support amongst the remote institutions to construct a broader base of knowledge
  – Alleviate the load on experts by sharing the knowledge, and allow them to concentrate on preparing for the future
  – Improve communication between the experiment site and the remote institutions
  – Minimize travel around the globe for data access
  – Sociological issues of HEP people at the home institutions and within the field
• Primary goal is to allow individual desktop users to make significant contributions without being at the lab


From the Nov. Survey

• Identified Difficulties
  – Having a hard time setting up initially
    • Lack of updated documentation
    • Rather complicated setup procedure
    • Lack of experience; no forum to share experiences
  – OS version differences (RH6.2 vs 7.1), let alone different operating systems
  – Most of the established sites have an easier time updating releases
  – Network problems affecting successful completion of large releases (a 4GB release takes a couple of hours) (SA)
  – No specific responsible persons to ask questions
  – Availability of all necessary software via UPS/UPD
  – Time difference between continents affecting efficiencies


DØRACE Strategy

• Categorized remote analysis system setup by functionality
  – Desktop only
  – A modest analysis server
  – Linux installation
  – UPS/UPD installation and deployment
  – External package installation via UPS/UPD
    • CERNLIB
    • Kai-lib
    • Root
  – Download and install a DØ release
    • Tar-ball for ease of initial setup?
    • Use of existing utilities for latest release download
  – Installation of cvs
  – Code development
  – KAI C++ compiler
  – SAM station setup

Setup phases (diagram labels):
• Phase 0: Preparation
• Phase I: Rootuple Analysis
• Phase II: Executables
• Phase III: Code Dev.
• Phase IV: Data Delivery


What has been accomplished?

• Regular bi-weekly meetings, held every on-week Thursday
  – Remote participation through video conferencing (ISDN); moving toward switching over to VRVS per VCTF’s recommendation
  – Keep up with the progress via site reports
  – Provide a forum to share experience
• DØRACE home page established (http://www-hep.uta.edu/~d0race)
  – To lower the barrier posed by the difficulties in initial setup
  – Updated and simplified setup instructions available on the web; many institutions have participated in refining the instructions
  – Tools for DØ software download and installation made available
  – More tools identified and in the works (need to automate download and installation as much as we can, if possible as a one-button operation)


• Release Ready notification system activated
  – Success is defined by institutions
  – Pull system: institutions decide whether to download and install a specific release
• Build error log and dependency tree utility in place
• Release packet split to minimize network dependence
• Automated one-button release download and operation utility in the works
• DØRACE workshop with hands-on session in Feb.


DØRACE Status by Setup Phases

[Bar chart: number of institutions (vertical axis) in each setup phase (horizontal axis: No Interest, Phase 0, Phase I, Phase II, Phase III, Phase IV), shown for three snapshots: Nov. Survey, Before 2/11, and May 8, 2002. Annotation on the chart: Progressive.]


Where are we?

• DØRACE has entered the next stage
  – The compilation and running
  – Active code development
  – Propagation of setup to all institutions
• Instructions seem to be taking shape well
• Need to maintain them and keep them up to date
• Support to help with problems people encounter
• DØGRID
  – Prepare SAM and other utilities for transparent and efficient remote contribution
• Need to establish Regional Analysis Centers


Proposed DØRAM Architecture

[Diagram: tiers from top to bottom]
• Central Analysis Center (CAC)
• Regional Analysis Centers (RAC, RAC, …), which provide various services
• Institutional Analysis Centers (IAC, IAC, …)
• Desktop Analysis Stations (DAS, DAS, …)

Legend: normal interaction communication path; occasional interaction communication path
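The tiered layout in this diagram can be sketched as a small tree. This is only an illustration: the class, function, and site names are hypothetical, and the rule that "normal" paths join adjacent tiers while "occasional" paths skip a tier is an assumption about the legend, not something the slide text states.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Tier(IntEnum):
    """DØRAM tiers, ordered from the central site down to the desktop."""
    CAC = 0  # Central Analysis Center
    RAC = 1  # Regional Analysis Center
    IAC = 2  # Institutional Analysis Center
    DAS = 3  # Desktop Analysis Station


@dataclass
class Site:
    name: str
    tier: Tier
    # repr/compare are disabled on parent to avoid parent <-> child recursion.
    parent: Optional["Site"] = field(default=None, repr=False, compare=False)
    children: List["Site"] = field(default_factory=list)

    def attach(self, name: str) -> "Site":
        """Create a child exactly one tier below (raises ValueError below DAS)."""
        child = Site(name, Tier(self.tier + 1), parent=self)
        self.children.append(child)
        return child


def path_kind(a: Site, b: Site) -> str:
    """Classify a link. ASSUMED rule: adjacent tiers interact normally."""
    return "normal" if abs(a.tier - b.tier) == 1 else "occasional"


# Hypothetical example sites, one per tier.
cac = Site("cac", Tier.CAC)
rac = cac.attach("rac-1")
iac = rac.attach("iac-1")
das = iac.attach("das-1")
```

Under that assumed rule, `path_kind(das, iac)` is a normal path, while a desktop talking straight to the CAC would be classified as occasional.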


Why do we need a DØRAM?

• Total Run II data size reaches multiple PB
• Data should be readily available for transparent and expeditious analyses
  – Preferably disk resident so that time for caching is minimized
• Analysis processing compute power should be available without users having to rely on the CAC
• MC generation should be done transparently
• Should exploit compute resources at remote sites
• Should exploit human resources at remote sites
• Minimize resource needs at the CAC
  – Different resources will be needed


What is a DØRAC?

• An institute with large, concentrated, and available computing resources
  – Many 100s of CPUs
  – Many 10s of TBs of disk cache
  – Many 100Mbytes of network bandwidth
  – Possibly equipped with HPSS
• An institute willing to provide services to a few small institutes in the region
• An institute willing to provide increased infrastructure as the data from the experiment grows
• An institute willing to provide support personnel if necessary


Chip’s W x-sec Measurement

[Diagram with numbered steps; only the step labels 2, 3, and 4 survive in the transcript.]


What services do we want a DØRAC to provide?

1. Provide intermediary code distribution
2. Generate and reconstruct MC data sets
3. Accept and execute analysis batch job requests
4. Store data and deliver them upon request
5. Participate in re-reconstruction of data
6. Provide database access
7. Provide manpower support for the above activities


Code Distribution Service

• Current releases: 4GB total; will grow to >8GB?
• Why needed?
  – Downloading 8GB once every week is not a big load on network bandwidth
  – Efficiency of release updates relies on network stability
  – Exploit remote human resources
• What is needed?
  – Release synchronization must be done at all RACs every time a new release becomes available
  – Potentially need large disk space to keep releases
  – UPS/UPD deployment at RACs
    • FNAL specific
    • Interaction with other systems?
  – Need administrative support for bookkeeping
• Current DØRACE procedure works well, even for individual users; do not see the need for this service
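The claim that a weekly 8GB download is a light network load can be checked with a quick back-of-the-envelope calculation. The sketch below assumes decimal units (1GB = 10^9 bytes) and a download spread evenly over the period; the function name is hypothetical. The second figure uses the "4GB in a couple of hours" observation from the November survey, taking "a couple" to mean two hours.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 s


def average_rate_bytes_per_s(size_bytes: float, period_s: float) -> float:
    """Mean transfer rate needed to move size_bytes within period_s."""
    return size_bytes / period_s


# 8GB release once a week, averaged over the whole week (assumes 1GB = 1e9 bytes).
weekly_rate = average_rate_bytes_per_s(8e9, SECONDS_PER_WEEK)

# 4GB release pulled in about two hours (assumed reading of "a couple of hours").
burst_rate = average_rate_bytes_per_s(4e9, 2 * 3600)

print(f"weekly average: {weekly_rate / 1e3:.1f} kB/s")
print(f"two-hour burst: {burst_rate / 1e6:.2f} MB/s")
```

Averaged over a week the load is on the order of 13 kB/s, while the burst during an actual two-hour download is roughly 0.56 MB/s, which is why network stability matters more than raw bandwidth here.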


Generate and Reconstruct MC data

• Currently done 100% at remote sites
• Why needed?
  – Extremely self-contained
    • Code distribution done via a tar-ball
  – Demand will grow
  – Exploit available compute resources
• What is needed?
  – A mechanism to automate request processing
  – A Grid that can
    • Accept job requests
    • Package the job
    • Identify and locate the necessary resources
    • Assign the job to the located institution
    • Provide status to the users
    • Deliver or keep the results
• Perhaps the most indisputable task, but do we need a DØRAC?
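The Grid steps listed above (accept, package, locate resources, assign, report status, deliver or keep results) can be sketched as a toy broker. Everything here is hypothetical: `ToyGrid`, `Job`, the site names, and the "pick the site with the most idle CPUs" policy are illustrations, not an existing DØ or SAM interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    user: str
    cpus: int
    site: Optional[str] = None
    status: str = "accepted"


class ToyGrid:
    """Hypothetical broker; not an existing DØ or SAM interface."""

    def __init__(self, free_cpus: Dict[str, int]) -> None:
        self.free_cpus = dict(free_cpus)  # site name -> idle CPUs
        self.jobs: List[Job] = []

    def submit(self, user: str, cpus: int) -> Job:
        job = Job(user, cpus)             # accept and package the request
        self.jobs.append(job)
        site = self._locate(cpus)         # identify and locate resources
        if site is None:
            job.status = "queued"         # no site has enough idle CPUs
        else:
            self.free_cpus[site] -= cpus  # assign the job to that site
            job.site, job.status = site, "running"
        return job

    def _locate(self, cpus: int) -> Optional[str]:
        # Assumed policy: among sites that fit the job, take the least loaded.
        fits = [s for s, free in self.free_cpus.items() if free >= cpus]
        return max(fits, key=lambda s: self.free_cpus[s]) if fits else None

    def finish(self, job: Job) -> None:
        if job.site is not None:          # give the CPUs back to the site
            self.free_cpus[job.site] += job.cpus
        job.status = "done"               # the record stays for status queries
```

A user would call `submit`, then poll `job.status`; a real system would also have to handle data location, retries, and result delivery, which this sketch leaves out.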


Batch Job Processing

• Currently relies on FNAL resources
  – D0mino, ClueD0, CLUBS, etc.
• Why needed?
  – Bring the compute resources closer to the user
  – Distribute the computing load to available resources
  – Allow remote users to process their jobs expeditiously
  – Exploit the available compute resources
  – Minimize resource load at CAC
  – Exploit remote human resources


Batch Job Processing cont’d

• What is needed?
  – Sufficient computing infrastructure to process requests
    • Network
    • CPU
    • Cache storage
  – Access to relevant databases
  – A Grid that can:
    • Accept job requests
    • Package the job
    • Identify and locate the necessary resources
    • Assign the job to the located institution
    • Provide status to the users
    • Deliver or keep the results
• This task definitely needs a DØRAC
  – What do we do with input? Keep them at RACs?


Data Caching and Delivery

• Currently only at FNAL
• Why needed?
  – Limited disk cache at FNAL
    • Tape access needed
    • Latencies involved, sometimes very long
  – Delivering data within a reasonable time over the network to all the requests is imprudent
  – Reduce resource load on the CAC
  – Data should be readily available to the users with minimal latency for delivery


Data Caching and Delivery cont’d

• What is needed?
  – Need to know what data and how much we want to store
    • 100% TMB
    • 10-20% DST?
    • Any RAW data at all?
    • What about MC? 50% of the actual data
  – Should be on disk to minimize data caching latency
    • How much disk space? (~50TB if 100% TMB and 10% DST for Run IIa)
  – Constant shipment of data to all RACs from the CAC
    • Constant bandwidth occupation (14MB/sec for Run IIa RAW)
    • Resources from CAC needed
  – A Grid that can
    • Locate the data (SAM can do this already…)
    • Tell the requester about the extent of the request
    • Decide whether to move the data or pull the job over
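Two of the points above lend themselves to a small worked example: what a constant 14MB/sec stream amounts to per day, and how a Grid might decide between moving data and moving the job. The sketch assumes decimal units (1MB = 10^6 bytes, 1TB = 10^12 bytes); the `plan` policy and its wait threshold are purely hypothetical, since the slides do not specify how the decision would be made.

```python
RAW_RATE_BYTES_PER_S = 14e6  # 14MB/sec for Run IIa RAW, from the slide

# Volume shipped per day if that rate is sustained around the clock.
daily_tb = RAW_RATE_BYTES_PER_S * 86400 / 1e12


def transfer_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to ship size_bytes at a steady rate."""
    return size_bytes / rate_bytes_per_s / 3600


def plan(size_bytes: float, rate_bytes_per_s: float, max_wait_hours: float) -> str:
    """HYPOTHETICAL policy: ship the data only if it arrives soon enough."""
    if transfer_hours(size_bytes, rate_bytes_per_s) <= max_wait_hours:
        return "move data"
    return "move job"


print(f"{daily_tb:.2f} TB/day at 14MB/sec")
print(plan(1e12, RAW_RATE_BYTES_PER_S, max_wait_hours=24))   # 1TB request
print(plan(50e12, RAW_RATE_BYTES_PER_S, max_wait_hours=24))  # 50TB request
```

At this rate a 1TB request fits within a day, while a 50TB request would take well over a month, so sending the job to the data is the only sensible choice under the assumed threshold.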


Data Reprocessing Services

• These include:
  – Re-reconstruction of the actual and MC data
    • From DST?
    • From RAW?
  – Re-streaming of data
  – Re-production of TMB data sets
  – Re-production of roottree
  – ab initio reconstruction
• Currently done only at CAC offline farm


Reprocessing Services cont’d

• Why needed?
  – The CAC offline farm will be busy with fresh data reconstruction
    • Only 50% of the projected capacity is used for this but …
    • Going to be harder to re-reconstruct as more data accumulates
  – We will have to
    • Reconstruct a few times (>2) to improve data
    • Re-stream TMB
    • Re-produce TMBs from DST and RAW
    • Re-produce root-tree
  – It will take many months to re-reconstruct the large amount of data
    • 1.5 months with 500 4GHz machines for Run IIa
    • 7.5 to 9 months for full reprocessing of Run IIb
  – Exploit resources in remote institutions
  – Expedite re-processing for expeditious analyses
    • Cutting down the time by a factor of 2 to 3 will make a difference
  – Reduce the load on CAC offline farm
  – Just in case the CAC offline farm is having trouble, the RACs can even help out with ab initio reconstruction
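The reprocessing estimates above can be turned into simple arithmetic. The sketch assumes perfect linear scaling with the number of machines and equivalent machines at every site, neither of which the slides claim; the function names are hypothetical.

```python
# From the slide: 1.5 months on 500 machines for Run IIa.
RUN_IIA_MACHINE_MONTHS = 1.5 * 500


def reprocessing_months(machine_months: float, machines: int) -> float:
    """Wall-clock months, ASSUMING perfect scaling across machines."""
    return machine_months / machines


def speedup_range(months: float, low: float = 2.0, high: float = 3.0):
    """(best, worst) duration after cutting the time by a factor of low..high."""
    return months / high, months / low


# Adding remote CPUs equal to 1x or 2x the central farm gives the 2-3x factor.
print(reprocessing_months(RUN_IIA_MACHINE_MONTHS, 1000))  # factor of 2
print(reprocessing_months(RUN_IIA_MACHINE_MONTHS, 1500))  # factor of 3

# Run IIb estimate of 7.5 to 9 months, reduced by a factor of 2 to 3.
print(speedup_range(7.5), speedup_range(9.0))
```

Under these assumptions a factor of 2 to 3 means matching or doubling the 500-machine farm with remote CPUs, and it would bring the Run IIb estimate down from 7.5 to 9 months to somewhere between 2.5 and 4.5 months.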


Reprocessing Services cont’d

• What is needed?
  – Permanently store necessary data, because it would take a long time just to transfer data
    • DSTs
    • RAW
  – Large data storage
  – Constant data transfer from CAC to RACs as we take and reconstruct data
    • Dedicated file server for data distribution to RACs
    • Constant bandwidth occupation
    • Sufficient buffer storage at CAC in case network goes down
    • Reliable and stable network
  – Access to relevant databases
    • Calibration
    • Luminosity
    • Geometry and Magnetic Field Map


Reprocessing Services cont’d

  – Transfer of new TMB and Roottrees to other sites
  – Well synchronized reconstruction code
  – A grid that can
    • Identify resources on the net
    • Optimize resource allocation for most expeditious reproduction
    • Move data around if necessary
  – A dedicated block of time for concentrated CPU usage if disaster strikes
  – Questions
    • Do we keep copies of all data at the CAC?
    • Do we ship DSTs and TMBs back to CAC?
• This service is perhaps the most debatable one, but I strongly believe it is one of the most valuable functions of a RAC.


Database Access Service

• Currently done only at CAC
• Why needed?
  – For data analysis
  – For reconstruction of data
  – To exploit available resources
• What is needed?
  – Remote DB access software services
  – Some copy of DB at RACs
  – A substitute for Oracle DB at remote sites
  – A means of synchronizing DBs
• A possible solution is a proxy server at the central location, supplemented with a few replicated DBs for backup
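The proxy-plus-replica idea above can be sketched as a lookup that tries the central proxy first and falls back to replicated copies. This is a minimal sketch under stated assumptions: the function names, the use of `ConnectionError` to signal an unreachable source, and the dictionary-backed replica are all hypothetical stand-ins, not the DØ database interface.

```python
from typing import Callable, Dict, Sequence

# A lookup takes a key (e.g. a calibration tag) and returns a value.
Lookup = Callable[[str], str]


def query_with_fallback(key: str, proxy: Lookup, replicas: Sequence[Lookup]) -> str:
    """Try the central proxy first, then each replica in order."""
    for source in (proxy, *replicas):
        try:
            return source(key)
        except ConnectionError:
            continue  # this source is unreachable, try the next one
    raise ConnectionError("no database copy reachable")


# Hypothetical stand-ins for real database clients.
def down_proxy(key: str) -> str:
    raise ConnectionError("proxy unreachable")


def make_replica(table: Dict[str, str]) -> Lookup:
    """Build a replica lookup backed by an in-memory table."""
    return lambda key: table[key]
```

For example, `query_with_fallback("run-1", down_proxy, [make_replica({"run-1": "calib-A"})])` returns the replica's value when the proxy is down. The sketch ignores the synchronization question raised in the list: a stale replica would silently return old values.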


What services do we want a DØRAC to provide?

• Provide intermediary code distribution
• Generate and reconstruct MC data sets
• Accept and execute analysis batch job requests
• Store data and deliver them upon request
• Participate in re-reconstruction of data
• Provide database access
• Provide manpower support for the above activities


Progress in DØRAC Proposal

• Working group members: I. Bertram, R. Brock, F. Filthaut, L. Lueking, P. Mattig, M. Narain, P. Lebrun, B. Thooris, J. Yu, C. Zeitnitz
• A proposal document has been worked on
  – Target to release within two weeks, sufficiently prior to the Director’s review in June
  – Doc. at: http://www-hep.uta.edu/~d0race/d0rac-wg/d0rac-spec-050602.pdf


DØRAC Implementation Timescale

• Implement first RAC by Oct. 1, 2002
  – Cluster associated IACs
  – Transfer Thumbnail data set constantly from CAC to the RAC
• Workshop on RAC in Nov., 2002
• Implement the next set of RACs by Apr. 1, 2003


Conclusions

• DØRACE has been rather successful
• DØ must prepare for the large data set era
• Need to expedite analyses in a timely fashion
• Need to distribute data sets throughout the collaboration
• DØRAC proposal almost ready for release
• Establishing regional analysis centers will be the first step toward DØ Grid
  – By the end of Run IIa (2-3 years)