Reconstruction and Analysis on Demand:
A Success Story
Christopher D. Jones
Cornell University, USA
C. Jones CHEP03 2
Overview
• Describe “Standard” processing model
• Describe “On Demand” processing model
  – Similar to GriPhyN’s “Virtual Data Model”
• What we’ve learned
• User reaction
• Conclusion
Standard Processing System
• Designed for reconstruction
  – All objects are supposed to be created for each event
• Each processing step is broken into its own module
  – E.g., track finding and track fitting are separate
• The modules are run in a user-specified sequence
• Each module adds its data to the ‘event’ when the module is executed
• Each module can halt the processing of an event
[Diagram: InputModule → Track Finder → Track Fitter → OutputModule]
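The fixed-sequence model above can be sketched in a few lines of C++. This is an illustrative mock-up, not the actual CLEO framework API: the `Event` blackboard, `Module`, and `processEvent` names are assumptions made for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// The 'event' is a blackboard: each module adds its data products to it.
using Event = std::map<std::string, std::vector<double>>;

struct Module {
    std::string name;
    // Returns false to halt processing of the current event.
    std::function<bool(Event&)> run;
};

// Modules execute in exactly the order the user listed them; any module
// returning false stops the event (e.g., a filter rejecting it).
bool processEvent(Event& evt, const std::vector<Module>& sequence) {
    for (const auto& m : sequence)
        if (!m.run(evt)) return false;  // a module halted this event
    return true;
}
```

Note how the burden of ordering falls on the user: if `TrackFitter` were listed before `TrackFinder`, the job would silently operate on missing data.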
Critique of Standard Design
• Good
  – Simple mental model
    • Users can feel confident they know how the program works
  – Easy to debug
    • Simple to determine which module had a problem
• Bad
  – User must know inter-module dependencies in order to place the modules in the correct sequence
    • Users often run jobs with many modules they do not need in order to avoid missing a module they might need
  – Optimization of the module sequence must be done by hand
  – Reading back from storage is inefficient
    • Must create all objects from storage even if the job does not use them
On-demand System
• Designed for analysis batch processing
  – Not all objects need to be created each event
• Processing is broken into different types of modules
  – Providers
    • Source: reads data from a persistent store
    • Producer: creates data on demand
  – Requestors
    • Sink: writes data to a persistent store
    • Processor: analyzes and filters ‘events’
• Data providers register what data they can provide
• Processing sequence is set by the order of data requests
• Only Processors can halt the processing of an ‘event’
[Diagram: Source → Processor A → Processor B → Sink]
Data Model
A Record holds all data that are related by lifetime, e.g., the Event Record holds Raw Data, Tracks, Calorimeter Showers, etc.
A Stream is a time-ordered sequence of Records
A Frame is a collection of Records that describe the state of the detector at an instant in time.
All data are accessed via the exact same interface and mechanism
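The Record/Stream/Frame relationship can be sketched with minimal C++ types. The class names follow the slide, but the representations (string-keyed maps, raw pointers) are assumptions for illustration, not the actual CLEO classes:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A Record holds all data that share a lifetime
// (e.g., one event's worth of Raw Data, Tracks, Showers, ...).
struct Record {
    std::map<std::string, std::vector<double>> data;
};

// A Stream is a time-ordered sequence of Records of one kind
// (an event stream, a calibration stream, ...).
using Stream = std::vector<Record>;

// A Frame is a collection of Records, one per Stream, describing
// the state of the detector at an instant in time.
struct Frame {
    std::map<std::string, const Record*> records;  // stream name -> current Record
    const Record& record(const std::string& stream) const {
        return *records.at(stream);  // throws if no such stream
    }
};
```

The point of the Frame is uniform access: an event Record and a calibration Record are reached through the exact same `record(...)` call, matching the “same interface and mechanism” rule above.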
Data Flow: Frame as Data Bus
[Diagram: the Frame acts as a data bus connecting the Event Database, Calibration Database, TrackFinder, TrackFitter, SelectBtoKPi, EventDisplay, and Event List]
• Sources: data from storage
• Producers: data from algorithms
• Processors: analyze and filter data
• Sinks: store data
• Data Providers: return data when requested
• Data Requestors: run sequentially for each new Record from a source
Callback Mechanism
• Provider registers a Proxy for each data type it can create
• Proxies are placed in the Record and indexed with a key
  – Type: the object type returned by the Proxy
  – Usage: an optional string describing the use of the object
  – Production: an optional run-time settable string
• Users access data via a type-safe templated function call
    List<FitPion> pions;
    extract( iFrame.record(kEvent), pions );
  – (based on ideas from BaBar’s Ifd package)
• The extract call builds the key and asks the Record for the Proxy
• The Proxy runs its algorithm to deliver the data
  – Proxy caches the data in case of another request
  – If a problem occurs, an exception is thrown
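The proxy-with-caching mechanism can be sketched as follows. This is a simplified illustration of the idea, not the CLEO or Ifd code: the key is collapsed to a single string, the payload to `std::vector<double>`, and the `Proxy`, `Record`, and `extract` shapes are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct Proxy {
    std::function<std::vector<double>()> make;  // algorithm that creates the data
    bool cached = false;
    std::vector<double> value;                  // cached result for repeat requests
    const std::vector<double>& get() {
        if (!cached) { value = make(); cached = true; }  // run the algorithm once
        return value;
    }
};

struct Record {
    std::map<std::string, Proxy> proxies;  // key -> Proxy, registered by Providers
};

// extract(): build the key, ask the Record for the Proxy, and let the Proxy
// deliver the data on demand; a missing Proxy raises an exception.
const std::vector<double>& extract(Record& rec, const std::string& key) {
    auto it = rec.proxies.find(key);
    if (it == rec.proxies.end())
        throw std::runtime_error("no Proxy registered for " + key);
    return it->second.get();
}
```

The cache is what makes repeated requests cheap: the producing algorithm runs at most once per Record, no matter how many requestors ask for the same data.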
Callback Example: Algorithm
[Diagram: request chain — the Processor SelectBtoKPi requests fitted tracks from the Producer Track Fitter (FitPionsProxy, FitKaonsProxy, …), which requests tracks from the Track Finder (TracksProxy), which requests calibrated hits from the HitCalibrator (CalibratedHitsProxy), which in turn requests constants from the Source Calibration DB (PedestalProxy, AlignmentProxy, …) and raw data from the Raw Data File (RawDataProxy)]
Callback Example: Storage
[Diagram: the Processor SelectBtoKPi requests data directly from the Source Event Database (FitPionsProxy, FitKaonsProxy, RawDataProxy, …)]
In both examples, the same SelectBtoKPi shared object can be used
Critique of On-demand System
• Good
  – Can be used for all data access needs
    • Online software trigger, online data quality monitoring, online event display, calibration, reconstruction, MC generation, offline event display, analysis
  – Self-organizes the calling chain
    • Users can add Producers in any order
  – Optimizes access from storage
    • Sources only need to say when a new Record (e.g., event) is available
    • Data for a Record is retrieved/decoded on demand
• Bad
  – Can be harder to debug since there is no explicit call order
    • Use of exceptions is key to simplifying debugging
  – Performance testing is more challenging
What We Have Learned
• First release of the system was September 1998
• Callback mechanism can be made fast
  – Proxy lookup takes less than 1 part in 10⁷ of CPU time in a simple job that processed 2,000 events/s on a moderate computer
• Cyclical dependencies are easy to find and fix
  – Only happened once and was found immediately on the first test
• Do not need to modify data once it is created
  – Preliminary versions of data are given their own key
• Automatically optimizes performance of reconstruction
  – Trivially added a filter to remove junk events by using FoundTracks
• Optimize analysis by storing many small objects
  – Only need to retrieve and decode the data needed for the current job
User Reactions
• In general, user response has been very positive
  – Previously CLEO used a ‘standard system’ written in FORTRAN
• Reconstruction coders like the system
  – We have code skeleton generators for Proxy/Producer/Processor
    • Only need to add their specific code
  – Easy for them to test their code
• Analysis coders can still program the ‘old way’
  – All analysis code in the ‘event’ routine
• Some analysis coders are pushing the bounds
  – Place selectors (e.g., cuts for tracks) in Producers
    • Users share selectors via dynamically loaded Producers
  – Processor only used to fill Histograms/Ntuples
  – If selections are stored, only the Processor needs to be rerun when reprocessing data
Conclusion
• It is possible to build an ‘on demand’ system that is
  – efficient
  – debuggable
  – capable of dealing with all data (not just data in an event)
  – easy to write components for
  – good for reconstruction
  – acceptable to users
• Some reasons for success
  – Skeleton code generators
    • User only has to write new code, not infrastructure ‘glue’
  – Users do not need to register what data they may request
    • Data reads occur more frequently than writes
  – Simple rule for when algorithms run
    • If you add a Producer, it takes precedence over a Source
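The last rule — a Producer takes precedence over a Source for the same data — can be sketched as a registration policy. The `Registry` and `ProviderKind` names are illustrative assumptions, not the CLEO API:

```cpp
#include <cassert>
#include <map>
#include <string>

enum class ProviderKind { Source, Producer };

struct Registry {
    std::map<std::string, ProviderKind> providers;  // data key -> who delivers it

    void registerProvider(const std::string& key, ProviderKind kind) {
        auto it = providers.find(key);
        if (it == providers.end()) { providers[key] = kind; return; }
        // A Producer always displaces a Source for the same key, so freshly
        // computed data wins over stored data; a Source never displaces a
        // Producer that is already registered.
        if (kind == ProviderKind::Producer) it->second = kind;
    }
};
```

This one rule is what lets the same job configuration either recompute data (Producer loaded) or read it back from storage (Producer absent), with no change to the requesting code.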