DIANE – Distributed Analysis Environment
Jakub T. Moscicki, CERN IT/API, [email protected]
CERN Computing Seminar, 11 Sept 2002, CERN IT/API, [email protected]
Distributed Analysis: Motivation
why do we want distributed data analysis?
move processing close to the data: for example, an ntuple job description is ~kB while the data itself is ~MB, GB, TB ...; rather than downloading gigabytes of data, let the remote server do the job
do it in parallel - faster: clusters of cheap PCs
this is the view of the analysis application provider
Computing Models
desktop computing: a personal computing resource; may lack CPU and high-speed access to networked databases, ...
"mainframe" computing: a shared supercomputer in a LAN; expensive and may have scalability problems
cluster computing: a collection of nodes in a LAN; complex and harder to manage
grid computing: a WAN collection of computing elements; even more complex
Cluster Computing at CERN
batch data analysis: e.g. lxbatch, currently in production; workload management system (e.g. LSF); automatic scheduling and load-balancing; batch jobs take hours or days to complete
interactive data analysis: currently on the desktop, will have to be distributed for LHC; tried in the past for ntuple analysis with PIAF (Parallel Interactive Analysis Facility), running copies of PAW on behalf of the user; 8 nodes and tight coupling with the application layer (PAW)
semi-interactive analysis becomes more important: minutes ... hours
HEP public/workgroup clusters
features: many users, many jobs; diverse applications (ntuple analysis, simulation, ...); interactive ... semi-interactive ... batch; ~100s of machines
dynamic environment: users may submit their own analysis code; mixed CPU- and I/O-intensive workloads; some applications may be preconfigured (general analysis, e.g. ntuple projections, or experiment-specific apps); load balancing is important
thanks to Anaphe team
Topology of I/O-Intensive Applications
ntuple analysis is mostly I/O-intensive rather than CPU-intensive
fast DB access from the cluster, slow network from the user to the cluster
very small amount of data exchanged between the tasks in comparison to the "input" data
Parallel Ntuple Analysis
data driven: all workers perform the same task (similar to SPMD); synchronization is quite simple (independent workers); master/worker model
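A minimal sketch of this data-driven master/worker idea (plain Python, not DIANE's actual API; all names are illustrative): every worker runs the same projection task on its own disjoint slice of the ntuple rows, and the master merges the partial histograms.

```python
def project(rows, nbins, lo, hi):
    """Worker task: histogram one column slice (same code on every worker)."""
    hist = [0] * nbins
    width = (hi - lo) / nbins
    for x in rows:
        if lo <= x < hi:
            hist[int((x - lo) / width)] += 1
    return hist

def merge(partials):
    """Master: sum the independent partial histograms."""
    return [sum(bins) for bins in zip(*partials)]

# the master splits the "ntuple" into disjoint row slices, one per worker
data = [i / 10.0 for i in range(100)]          # fake ntuple column, 100 rows
chunks = [data[i::4] for i in range(4)]        # 4 workers, disjoint slices
partials = [project(c, nbins=5, lo=0.0, hi=10.0) for c in chunks]
total = merge(partials)
assert total == project(data, 5, 0.0, 10.0)    # same result as a single scan
print(total)
```

Because the workers never talk to each other, only the split and the merge need any coordination, which is exactly why the synchronization stays simple.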
Simulation in Medical Applications
example: brachytherapy; optimization of treatment planning by MC simulation
features: CPU intensive; few users, few jobs; one preconfigured application; interactive: seconds .. minutes; ~10s of machines
ongoing collaboration with G4 and hospital units in Torino, Italy
thanks to M.G. Pia
Simulation in Space Science
LISA: MC simulation for a gravitational-wave experiment
Bepi Colombo mission: HERMES experiment
features: CPU intensive; big jobs (10 processor-years); preconfigured applications; batch: days; 1000+ machines
requirements: error recovery important; monitoring and diagnostics
thanks to A. Howard
Master/Worker model
the applications share the same computation model, so they also share a big part of the framework code, but they have different non-functional requirements
What DIANE is?
R&D project in IT/API: semi-interactive parallel analysis for LHC
middleware technology evaluation & choice: CORBA, MPI, Condor, LSF, ...; also see how to integrate API products with the GRID
prototyping (focus on ntuple analysis)
time scale and resources: Jan 2001: start (<1 FTE); June 2002: running prototype exists
sample ntuple analysis with Anaphe; event-level parallel Geant4 simulation
What DIANE is?
framework for parallel cluster computation
application-oriented: master-worker model common in HEP applications
application-independent: apps dynamically loaded in a plugin style; callbacks to applications via abstract interfaces
component-based: subsystems and services packaged into component libraries; core architecture uses CORBA and CCM (CORBA Component Model)
integration layer between applications and the GRID environment and deployment tools
What DIANE is not?
DIANE is not: a replacement for a GRID and its services; a hardwired analysis toolkit
DIANE and GRID
DIANE as a GRID computing element: via a gateway that understands Grid/JDL; Grid/JDL must be able to describe parallel jobs/tasks
DIANE as a user of (low-level) Grid services: authentication, security, load balancing, ...; and profit from existing 3rd-party implementations
the python environment is a rapid prototyping platform and may provide a convenient connection between DIANE and the Globus Toolkit via the pyGlobus API
Architecture Overview
layering: abstract middleware interfaces and components; plugin-style application loading
Client Side DIANE
thin client / lightweight XML job description protocol: just create a well-formed job description in XML; send it and read the results back as XML data messages
connection scenarios:
standalone clients: C++, python client apps; explicit connection from a shell prompt; flexibility and choice of command-line tools
clients integrated into an analysis framework: e.g. Lizard/python; hidden connection behind the scenes
Web access: Java-CORBA binding, SOAP (?); universal and easy access
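A sketch of what such a thin-client exchange could look like in python; the element and attribute names here (`job`, `ntuple`, `histogram`, the reply format) are invented for illustration, not DIANE's actual job-description schema.

```python
import xml.etree.ElementTree as ET

# build a well-formed job description (hypothetical schema)
job = ET.Element("job", application="NtupleProjection")
ET.SubElement(job, "ntuple", path="/data/sample.hbook")
ET.SubElement(job, "histogram", column="px", bins="100")
request = ET.tostring(job, encoding="unicode")
print(request)

# the thin client only needs to parse the XML data message that comes back
reply = '<result status="ok"><entries>37000</entries></result>'
root = ET.fromstring(reply)
assert root.get("status") == "ok"
print(root.findtext("entries"))
```

The point of the design is that the client needs nothing beyond an XML library and a transport: no analysis framework has to be installed on the client side.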
Data Exchange Protocol (1)
XDR concept in C++
specify the data format: type and order of data fields
data messages: sender and receiver agree on the format; the message is sent as an opaque object (any); the C++ type may be different on each side
interfaces with flexible data types: e.g. store a list of identifiers (unknown type)
Data Exchange Protocol (2)

class A : public DXP::DataObject
{
public:
  DXP::String name;   // predefined fundamental types
  DXP::Long index;
  DXP::SequenceDataObject<DXP::plain_Double> ratio;
  B b;                // nested complex object

  A(DXP::DataObject *parent)
    : DXP::DataObject(parent), name(this), index(this), ratio(this), b(this) {}
};
Data Exchange Protocol (3)
external streaming supported, e.g.: serialize as CORBA::byte_sequence; serialize to XML (ASCII string); Visitor pattern - new formats are easy
handles opaque objects (any) and typed objects - safe "casts":

DXP::TypedDataObject<A> a1, a2;  // explicit format
DXP::AnyDataObject x = a1;       // opaque object
a2 = x;
if (a2.isValid())                // "cast" successful
Server Side Architecture
CORBA Component Model (CCM): pluggable components & services make a truly component-based system on top of the core architecture
common interface to the service components: difficult due to the different nature of the service implementations
example: load-balancing service: Condor - process migration; LSF - black-box load balancing; custom PULL implementation - active load balancing
but first results show that it is feasible
DIANE & CORBA
CORBA: industry standard (mature and tested); scalable (we need 1000s of nodes and processes); language- and platform-independent (IDL): C, C++, Java, python, ...; many implementations, commercial and open source; directly supports OO and abstract interfaces
CORBA facilities: naming service, trading service, etc.
CORBA Component Model: supports component programming (evolution of OO)
Component Technology
components are not classes!
components are deployment units: they live in libraries, object files and binaries; they interact with the external world only via an abstract interface; total separation from the underlying implementation
classes are source-code organization units: they exist on different design levels and support different semantics: utility classes (e.g. STL vectors or smart pointers), mathematical classes (e.g. HepMatrix), complex domain classes (e.g. FML::Fitter)
but a class may implement a component
OO fails to reuse; component technology might help (hopefully)
Component Technology
component-container idiom: the run-time context is external to the definition of the component; components may be flexibly connected via ports to other components at run-time
[CCM component diagram: a business component exposes an interface and attributes; its ports - facets, receptacles, event sources and event sinks - are grouped into OFFERED and REQUIRED sides]
thanks to P. Merle / OMG
24
Server Side DIANE
CORBA and XML in Practice
inter-operability (shown in the prototype ntuple application)
cross-release (many thanks, XML!): client running Lizard/Anaphe 3.6.6, server running 4.0.0-pre1
cross-language (many thanks, CORBA!): python CORBA client (~30 lines), C++ CORBA server
compact XML data messages: 500 bytes to the server, 22k bytes of XML description from the server; a factor of 10^6 less than the original data (30 MB ntuple)
thin client: no need to run Lizard on the client side, as an alternative use-case scenario
Load balancing service
black-box (e.g. LSF): limited control -> submit jobs (black box); job queues with CPU limits; automatic load balancing and scheduling (task creation and dispatch); prototype: deployed (~10s of workers)
explicit PULL LB: custom daemons; more control -> explicit creation of tasks; load-balancing callbacks into the specific application; prototype: custom PULL load balancing (~10s of workers)
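The PULL scheme can be sketched with a shared task queue: idle workers pull the next task as soon as they finish one, so faster workers automatically end up processing more tasks. A toy standard-library version (not DIANE's daemons; the queue and worker names are illustrative):

```python
import queue
import threading

tasks = queue.Queue()
for t in range(20):                 # master creates 20 independent tasks
    tasks.put(t)

done = []
lock = threading.Lock()

def worker(wid):
    """Pull-style worker: ask for work whenever idle, stop when none is left."""
    while True:
        try:
            t = tasks.get_nowait()
        except queue.Empty:
            return
        with lock:
            done.append((wid, t))   # record which worker processed which task

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

assert sorted(t for _, t in done) == list(range(20))  # each task done exactly once
```

This is "active" load balancing in the slide's sense: the balancing emerges from workers pulling, with no central scheduler deciding placements up front.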
Dedicated Interactive Cluster (1)
daemons per node; dynamic process allocation
Dedicated Interactive Cluster (2)
daemons per user per node; thread pools, per-user policies
Error Recovery Service
the mechanisms:
daemon control layer: make sure that the core framework processes are alive; periodical ping - needs to be hierarchized to be scalable
worker sandbox: protect from seg-faults in the user applications (memory corruption, exceptions, signals); based on standard Unix mechanisms: child processes and signals
thanks to G. Chwajol
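The worker-sandbox idea, i.e. running user code in a child process so the framework survives a crash and can see which signal killed the worker, can be sketched with standard POSIX semantics (Linux/Unix only; the crashing "user code" is simulated, not real DIANE worker code):

```python
import signal
import subprocess
import sys

# run the (simulated) user code in a separate child process
crashing_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
child = subprocess.run([sys.executable, "-c", crashing_code])

# on POSIX, a negative return code means the child died from that signal;
# the parent framework process is untouched and can reschedule the task
if child.returncode < 0:
    print("worker killed by signal", signal.Signals(-child.returncode).name)
else:
    print("worker exited normally with", child.returncode)
```

The key design point is isolation: a memory corruption in the plugin can only take down the child, and the parent learns about it through the ordinary wait status.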
Other Services
interactive data analysis: connection-oriented vs connectionless; monitoring and fault recovery
user environment replication: do not rely on a common filesystem (e.g. AFS)
distribution of application code: binary exchange possible for homogeneous clusters
distribution of local setup data: configuration files, etc.; binary dependencies (shared libraries, etc.)
Optimization
optimizing distributed I/O access to data: clustering of the data in the DB on a per-task basis; depends on the experiment-specific I/O solution
load balancing: the framework does not directly address low-level issues, but the design must be LB-aware: partition the initial data set and assign data chunks to tasks; how big should the chunks be? static or adaptive algorithm? push vs pull model for dispatching tasks, etc.
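One concrete answer to "how big should the chunks be?" is static near-equal partitioning of the row range; a simple illustrative helper (an assumption for this sketch, not DIANE code):

```python
def partition(n_rows, n_chunks):
    """Split row indices 0..n_rows-1 into n_chunks contiguous, near-equal ranges."""
    base, extra = divmod(n_rows, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)   # spread the remainder evenly
        chunks.append((start, start + size))
        start += size
    return chunks

# e.g. the 37K-row ntuple from the projection example over 6 workers
ranges = partition(37000, 6)
print(ranges)
assert sum(e - s for s, e in ranges) == 37000   # every row assigned exactly once
```

Static chunks like these suit a push model; an adaptive scheme would instead hand out smaller and smaller chunks as workers pull, trading dispatch overhead against tail latency.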
Further Evolution
expect full integration and collaboration with LCG according to their schedule
software evolution and policy: distributed technology (CORBA, RMI, DCOM, sockets, ...); persistency technology (LCG RTAGs -> ODBMS, RDBMS, RIO); programming/scripting languages (C++, Java, python, ...)
evolution of GRID technologies and services: Globus, LCG, DataGrid, CrossGrid (interactive apps), ...
Limitations
model limited to Master/Worker: some particular CPU-intensive applications require more complex, fine-grained synchronization patterns between workers - this is NOT provided by the framework and must be achieved by other means (e.g. MPI)
intra-cluster scope: NOT a global metacomputer; a Grid-enabled gateway to enter the Grid universe; otherwise the framework is independent thanks to abstract interfaces
Similar Projects in HEP
PIAF (historical): using PAW
TOP-C: G4 examples for parallelism at event level
BlueOx: Java, using JAS for analysis; some space for commonality via AIDA
PROOF: based on ROOT
Summary
first prototype ready and working: proof of concept for up to 50 workers; ~1000 workers still needs to be checked
initial deployment: integration with the Lizard analysis tool; Geant4 simulation
active R&D in component architecture
relation to LCG - to be established
That's about it
cern.ch/moscicki/work cern.ch/anaphe aida.freehep.org
Facade for end-user analysis
3 groups of user roles:
developers of distributed analysis applications: brand-new applications, e.g. simulation
advanced users with custom ntuple analysis code: similar to a Lizard Analyzer; execute a custom algorithm on the parallel ntuple scan
interactive users doing the standard projections: just specify the histogram and the ntuple to project
user-friendly means: show only the relevant details; hide the complexity of the underlying system
Ntuple Projection Example
example of semi-interactive analysis
data: 30 MB HBOOK ntuple / 37K rows / 160 columns; time: minutes .. hours
timings:
desktop (400 MHz, 128 MB RAM) - ca. 4 minutes
standalone lxplus (800 MHz, SMP, 512 MB RAM) - ca. 45 sec
6 lxplus workers - ca. 18 sec
why do 6 workers give 18 sec rather than 45/6 ~ 7.5 sec? the job is small, so a big fraction of the time is compilation and dll loading rather than computation; pre-installing the application would improve the speed; caveat: the example runs on AFS and public machines
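The quoted timings are consistent with a simple fixed-overhead model (an assumption fitted to the two measurements, not something measured in the talk): each job pays a constant startup cost o (compilation, dll loading) plus perfectly parallel work w, so o + w = 45 s on one lxplus node and o + w/6 = 18 s on six workers. Solving the two equations:

```python
# two-point fixed-overhead model, fitted to the quoted 45 s / 18 s timings
t1, t6, n = 45.0, 18.0, 6
w = (t1 - t6) * n / (n - 1)   # parallel work: from o + w = t1 and o + w/n = t6
o = t1 - w                    # fixed startup overhead per job
print(f"overhead ~ {o:.1f} s, parallel work ~ {w:.1f} s")
```

Under this model roughly 12-13 s of every run is non-parallelizable startup, which is why pre-installing the application would help far more than adding workers.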