
reduction
data treatment for ARCS

Tim Kelley, Caltech Materials Science

DANSE workshop—June 22, 2004

reduction

• components to reduce raw inelastic data from a time-of-flight chopper spectrometer

• essential processing—must be done for every experiment

• validation is essential

• descended from Pharos IDL procedures

why change?

• information sources badly mixed
  – data formats, instrument specs, algorithms
  – not reusable

• monolithic
  – unit testing difficult
  – difficult/impossible to tinker with

• slow

gross testing

recognizable?
  Yes → Success!
  No → oh, $%^!

gross testing useful only as long as it is not needed!

unit testing

• test every execution path

• programmer provides
  – standard inputs, expected results
  – means of comparing actual and expected results to determine pass/fail

• combine unit tests to create regression tests

• unit tests part of writing code!

example: execution paths

def theAnswer( yourInt):
    if yourInt == 42:
        return True
    else:
        return False

import os

def theAnswer2( yourInt):
    if 'Tim' in os.uname():
        raise RuntimeError("I hate that guy!")
    if yourInt == 42:
        return True
    else:
        return False
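To make the unit-testing idea concrete, here is a minimal sketch (not from the original talk) of tests covering both execution paths of theAnswer, assuming Python's standard unittest module; the test names are illustrative.

import unittest

def theAnswer( yourInt):
    if yourInt == 42:
        return True
    else:
        return False

class TestTheAnswer( unittest.TestCase):
    # one test per execution path
    def test_path_42( self):
        self.assertTrue( theAnswer( 42))
    def test_path_not_42( self):
        self.assertFalse( theAnswer( 41))

if __name__ == '__main__':
    unittest.main()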


two approaches

why change?

• information sources badly mixed
  – data formats, instrument specs, algorithms
  – not reusable

• monolithic
  – unit testing difficult
  – difficult/impossible to tinker with

• slow

change to what?

• Object oriented
  – separate, equalize information sources
  – histograms, instrument data, transformations

• Finer granularity, more layered
  – simpler components, easier to test, replace
  – add scientific context to generic components

• Unit tests; integrate NeXus, Pyre

example: layering

[layer diagram: C++ std::vector at the bottom, wrapped by the Python StdVector class; histogram classes build on that; S_QE sits at the top]

basic layers have no scientific content: can be reused for any science

higher layers add context
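As a rough pure-Python sketch of this layering (the class names here are hypothetical, not the actual DANSE classes): a bottom layer that only stores numbers, and a higher layer that adds scientific meaning.

# bottom layer: generic storage, no scientific content
# (stands in for the C++ std::vector exposed to Python as StdVector)
class StdVectorSketch:
    def __init__( self, data):
        self._data = list( data)
    def __len__( self):
        return len( self._data)
    def __getitem__( self, i):
        return self._data[i]

# higher layer: adds scientific context on top of the generic storage
class SQESketch:
    """Rough stand-in for an S(|Q|,E) histogram: counts plus axes."""
    def __init__( self, counts, qBoundaries, eBoundaries):
        self.counts = StdVectorSketch( counts)   # generic storage, reusable anywhere
        self.qBoundaries = qBoundaries           # |Q| bin boundaries (1/Angstrom)
        self.eBoundaries = eBoundaries           # energy bin boundaries (meV)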

reduction structure

[structure diagram: three sub-packages (histograms, instruments, transformations), each with a C++ layer and a Python layer]

Histograms

Store and access neutron scattering data
• multidimensional data sets
• 200-1400 MB

need
• ability to associate metadata
• efficient iteration
• availability, stability

Histograms: C++

Storage based on STL vector class
• ANSI standard C++, universally available
• Optimized by compilers

Adaptor classes extend to
• Different storage schemes
• Different dimensionality

Low-level routines: arithmetic, average, etc.

Histograms: Python

Group C++ containers to model complex data sets

• associate signal & error histograms, ...

Add context, preferred behaviors
• names, axes, units
• error propagation?
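A minimal sketch of what such a Python-layer dataset might look like, assuming (hypothetically) that signal and error are stored as paired containers and that independent errors add in quadrature; the class and attribute names are illustrative, not the actual interface.

import math

class DatasetSketch:
    """Pair a signal histogram with its errors, plus name, axis, and unit."""
    def __init__( self, name, signal, errors, eAxis, unit='counts'):
        self.name = name
        self.signal = list( signal)   # counts per energy bin
        self.errors = list( errors)   # one-sigma uncertainty per bin
        self.eAxis = eAxis            # energy bin boundaries (meV)
        self.unit = unit

    def __add__( self, other):
        # example of a "preferred behavior": propagate independent errors in quadrature
        signal = [a + b for a, b in zip( self.signal, other.signal)]
        errors = [math.sqrt( a*a + b*b) for a, b in zip( self.errors, other.errors)]
        return DatasetSketch( self.name, signal, errors, self.eAxis, self.unit)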

Instruments

Find, store, and serve instrument data
• detector & monitor configuration, properties

– pixel-sample distances, angles, dimensions, ...

• moderator, chopper, sample positions...

Need
• comprehensive description of instrument
• flexibility to describe different instruments

Instruments: C++

Abstract base classes parallel NeXus

Concrete subclasses specialize to given instrument, detector type, etc.

STL-based containers implement sorting
• e.g. pixels sorted by
  – pixel-sample distance
  – scattering angle(s)

Instruments: Python

Interface to instrument descriptions
• Hard-coded
• NeXus or other files

Pass info to C++ layer
• Python is better suited to parsing tasks
• Also useful for instrument diagnostics and simulation
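A sketch of the kind of parsing job that sits naturally in the Python layer; the file format and the final call into the compiled layer are hypothetical, purely to illustrate the division of labor.

def readDetectorTable( path):
    """Parse a plain-text table: pixelID, pixel-sample distance (m), 2theta (deg)."""
    pixelIDs, distances, angles = [], [], []
    with open( path) as table:
        for line in table:
            line = line.strip()
            if not line or line.startswith( '#'):
                continue                        # skip blanks and comments
            pid, dist, angle = line.split()
            pixelIDs.append( int( pid))
            distances.append( float( dist))
            angles.append( float( angle))
    return pixelIDs, distances, angles

# the flat arrays would then be handed to the C++ instrument layer,
# e.g. instrument.setPixels( pixelIDs, distances, angles)   # hypothetical call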

Transformations

Variable changes, reductions, slicing
• time-of-flight to energy (see the sketch after this slide)
• detector coordinates to scattering angle (powder)
• S(Qi, Qj) for Qi, Qj in {Qx, Qy, Qz, E}, ...

Need
• scientific integrity
• efficient (fast) operations
• flexibility
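For the first variable change listed above, the elastic kinematics are just E = (1/2) m (L/t)^2; a minimal Python sketch (not from the talk), ignoring frame offsets, emission-time corrections, and the like:

NEUTRON_MASS = 1.675e-27          # kg
JOULE_TO_MEV = 1.0 / 1.602e-22    # 1 meV = 1.602e-22 J

def tofToEnergy( flightPath_m, tof_s):
    """Neutron kinetic energy (meV) for a flight path (m) and time-of-flight (s)."""
    v = flightPath_m / tof_s                    # speed, m/s
    return 0.5 * NEUTRON_MASS * v * v * JOULE_TO_MEV

# e.g. 3.5 m of flight path and 2.5 ms time of flight:
# tofToEnergy( 3.5, 2.5e-3) is roughly 10 meV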

Transformations: C++

Modular design
• Building blocks perform simple tasks
  – BinOverlapCalculator
  – BinOverlapMultiplier

Generic (iterator) interface
• Independent of underlying container
• Independent of instrument

Compiled, optimized for maximum speed

Transformations: Python

Driver routines
• coordinate simpler objects to perform complex tasks
• EnergyRebinDriver, QRebinDriver
• easily modified by user

Experiment with pipeline strategies
• e.g. reduce data one pixel at a time
  – Trivially parallel for much of process
  – Real-time refinement of ARCS data!
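A sketch of the pixel-at-a-time idea, assuming a hypothetical rebinToEnergy callable that turns one pixel's time-of-flight spectrum into contributions on a common energy grid; because each pixel is processed independently, the loop could be farmed out to many processes.

def reducePixelByPixel( pixels, energyBoundaries, rebinToEnergy):
    """Accumulate S(E) one pixel at a time."""
    sE = [0.0] * (len( energyBoundaries) - 1)
    for pixel in pixels:                               # trivially parallel
        contribution = rebinToEnergy( pixel, energyBoundaries)
        for i, c in enumerate( contribution):
            sE[i] += c
    return sE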

example: a C++ transform

namespace PharosMath {
    template <typename NumT>
    void reduceSum2d( std::vector<NumT> const & vec2d,
                      std::vector<NumT> & vec1d,
                      std::vector<size_t> const & sizes,
                      size_t axis)
    ...

Reduce a 2d array to 1d by summing rows or columns, e.g. for a 3x4 array of ones:

    1 1 1 1        row sums:     4 4 4
    1 1 1 1        column sums:  3 3 3 3
    1 1 1 1
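The C++ body is elided above; here is a pure-Python stand-in for the same operation (not from the talk), assuming the 2d data arrive as a row-major flattened vector plus its dimensions. Which axis value means rows versus columns in the real code is an assumption here.

def reduceSum2dSketch( vec2d, sizes, axis):
    """Sum a row-major flattened 2d array along one axis, returning a 1d list."""
    nRows, nCols = sizes
    if axis == 1:
        # sum across each row: one value per row
        return [sum( vec2d[r*nCols:(r+1)*nCols]) for r in range( nRows)]
    # axis == 0: sum down each column: one value per column
    return [sum( vec2d[r*nCols + c] for r in range( nRows)) for c in range( nCols)]

# 3x4 array of ones, as in the picture above:
# reduceSum2dSketch( [1]*12, (3, 4), 1)  ->  [4, 4, 4]
# reduceSum2dSketch( [1]*12, (3, 4), 0)  ->  [3, 3, 3, 3]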

reduceSum2d unit tests

[test-harness diagram: test_reduceSum2d runs test_1 through test_5; each calls _runTest, which feeds input #1, input #2, and the input sizes to reduceSum2d and uses _compareVecs to compare actual and expected results]

exposing it to Python

template <typename NumT>
static void _callReduceSum2d( PyObject *py2dv, PyObject *py1dv,
                              std::vector<size_t> const & dimSizes,
                              size_t axis)
{
    std::vector<NumT> *p2dv =
        static_cast< std::vector<NumT>* >( PyCObject_AsVoidPtr( py2dv));
    std::vector<NumT> *p1dv =
        static_cast< std::vector<NumT>* >( PyCObject_AsVoidPtr( py1dv));
    Pharos::reduceSum2d<NumT>( *p2dv, *p1dv, dimSizes, axis);
    return;
}

calling it from Python

reduction.reduceSum2d( vector2d.handle(), vector2d.datatype(), vector1d.handle(), dimSizes, whichAxis)

a more interesting Python call

def S_QE_To_S_E( sQE):
    # sQE: instance of the class that encapsulates S(|Q|,E)
    # SE:  the class that encapsulates S(E)
    sE = SE( sQE.datatype(), sQE.numE())
    reduction.reduceSum2d( sQE.data(), sQE.datatype(), sE.data(),
                           sQE.sizes(), sQE.QAxis())
    return sE

Implementation

Category: Histograms
  Progress: STL routines 90%+; Pharos adaptors 100%; unit testing ~50%
  To do: "high-level" classes; abstract Adaptor hierarchy

Category: Instrument
  Progress: Pharos/ARCS C++ classes 100%
  To do: abstract "NeX-ified" hierarchy; unit testing ~20%

Category: Transforms
  Progress: S(|Q|, E) powders 98%; S(Q, E) single crystal 30%
  To do: revise interface; unit testing ~40%

Primary: C++ ~4000 lines; Python ~1800 lines. Auxiliary: Python-C bindings ~4500 lines; tests ~4800+ lines.

how much effort?

• "topline": original IDL-to-C/C++ conversion, 2-3 months

• topline reduction: 10-12 weeks

Still to do

• Complete unit tests ~ 3-4 weeks
• Another iteration ~ 4-6 weeks

– code cleaning/reorg (1 wk)

– iterator interfaces for transformations (1-2 wks)

– Python dataset classes (1-2 wks)

– Performance questions (1 wk)

• Test/integrate single crystal ~ 2 weeks
• Pyre integration ~ 3+ weeks
• Related, general tasks ~ 4-5 weeks

– NeXus C++ hierarchy (2 wks)

– Strengthening Python-C++ integration (2-3 wks)
