Parallel Tomography

Shava Smallen

CSE Dept.

U.C. San Diego

Parallel Tomography

Tomography: reconstruction of 3D image from 2D projections

Used for electron microscopy at National Center for Microscopy and Imaging Research (NCMIR)

Coarse-grain, embarrassingly parallel application

Goal of project: Achieve performance using AppLeS scheduling and Globus services

Parallel Tomography Structure

[Figure: off-line structure. Components shown: source, preprocessor, reader, ptomo (three instances), writer, dest, and driver. Solid lines = data flow; dashed lines = control.]


[Figure: on-line structure. Components shown: source, preprocessor, reader, ptomo (three instances), writer, dest, and driver. Solid lines = data flow; dashed lines = control.]


driver: directs communication among processes, controls work queue

preprocessor: serially formats raw data from microscope for parallel processing

reader: reads preprocessed data and passes it to a ptomo process

ptomo: performs tomography processing

writer: writes processed data to destination (e.g. visualization device, tape)
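The division of labor among these processes can be sketched as a small work-queue pipeline. This is only an illustration in Python, not the project's implementation: threads stand in for the separate processes, `fake_tomography` is a hypothetical placeholder for the real reconstruction step, and the slice data is made up.

```python
import queue
import threading

def fake_tomography(slice_data):
    # Hypothetical stand-in for the real reconstruction step.
    return sum(slice_data)

def reader(preprocessed, work_q, n_workers):
    # reader: passes preprocessed slices to the ptomo workers via the work queue.
    for index, slice_data in enumerate(preprocessed):
        work_q.put((index, slice_data))
    for _ in range(n_workers):
        work_q.put(None)  # one stop marker per worker

def ptomo(work_q, result_q):
    # ptomo: takes the next slice whenever it is free, so faster workers do more.
    while True:
        item = work_q.get()
        if item is None:
            result_q.put(None)  # tell the writer this worker is done
            break
        index, slice_data = item
        result_q.put((index, fake_tomography(slice_data)))

def writer(result_q, n_workers, dest):
    # writer: collects results until every worker has signalled completion.
    finished = 0
    while finished < n_workers:
        item = result_q.get()
        if item is None:
            finished += 1
        else:
            index, value = item
            dest[index] = value

def driver(preprocessed, n_workers=3):
    # driver: sets up the queues, then starts and joins the other roles.
    work_q, result_q, dest = queue.Queue(), queue.Queue(), {}
    threads = [
        threading.Thread(target=reader, args=(preprocessed, work_q, n_workers)),
        threading.Thread(target=writer, args=(result_q, n_workers, dest)),
    ]
    threads += [threading.Thread(target=ptomo, args=(work_q, result_q))
                for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [dest[i] for i in sorted(dest)]

if __name__ == "__main__":
    print(driver([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

Because each ptomo worker pulls its next slice only when it is idle, a loaded workstation simply processes fewer slices. That is one reason a shared work queue suits a coarse-grain, embarrassingly parallel application.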

NCMIR Environment

Platform: collection of heterogeneous resources

workstations (e.g. SGI, Sun)

NPACI supercomputer time (e.g. SP-2, T3E)

How can users achieve execution performance in heterogeneous, multi-user environments?

Performance and Scheduling

Users want to achieve minimum turnaround time in the following scenarios:

off-line: already collected data set

on-line: data streamed from electron microscope

Goal of our project is to develop adaptive scheduling strategies which promote both performance and flexibility for tomography in multi-user Globus environments

Adaptive Scheduling Strategy

Develop schedule which adapts to deliverable resource performance at execution time

Application scheduler will dynamically

select sets of resources based on user-defined performance measure

plan possible schedules for each set of feasible resources

predict the performance for each schedule

implement best predicted schedule on selected infrastructure
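The select / plan / predict / implement steps can be sketched as a short search over candidate resource sets. This is a minimal sketch under stated assumptions, not the AppLeS scheduler itself: the resource names, the forecast `rate` (slices per second) and `startup` (seconds) values, and the cost model (work split in proportion to forecast rate, turnaround set by the slowest resource) are all hypothetical.

```python
from itertools import combinations

def candidate_sets(resources):
    # Select: enumerate every non-empty subset of the available resources.
    names = sorted(resources)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            yield subset

def plan(subset, resources, total_slices):
    # Plan: split the slices in proportion to each resource's forecast rate.
    total_rate = sum(resources[name]["rate"] for name in subset)
    return {name: total_slices * resources[name]["rate"] / total_rate
            for name in subset}

def predict(schedule, resources):
    # Predict: turnaround is set by the slowest resource, startup delay included.
    return max(resources[name]["startup"] + share / resources[name]["rate"]
               for name, share in schedule.items())

def best_schedule(resources, total_slices):
    # Return (predicted_turnaround, schedule) for the best predicted schedule.
    best = None
    for subset in candidate_sets(resources):
        schedule = plan(subset, resources, total_slices)
        predicted = predict(schedule, resources)
        if best is None or predicted < best[0]:
            best = (predicted, schedule)
    return best

if __name__ == "__main__":
    # Hypothetical forecasts: two loaded workstations and one supercomputer slot.
    forecasts = {
        "ws1": {"rate": 1.0, "startup": 0.0},
        "ws2": {"rate": 0.5, "startup": 0.0},
        "sp2": {"rate": 4.0, "startup": 30.0},
    }
    predicted, schedule = best_schedule(forecasts, total_slices=120)
    print(round(predicted, 1), sorted(schedule))
```

Here `predict` plays the role of the user-defined performance measure (turnaround time). The "implement" step would hand the winning schedule to the process-startup layer, which this sketch leaves out. Exhaustive enumeration of subsets is only practical for a handful of resources.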

AppLeS = Application-Level Scheduling

Each AppLeS is an adaptive application scheduler

AppLeS + application = self-scheduling application

scheduling decisions based on

dynamic information (e.g. resource load forecasts)

static application and system information

Resource Selection

available resources

workstations: run immediately, but execution may be slow due to load

supercomputers: may have to wait in a queue, but execution is fast on dedicated nodes

We want to schedule using both types of resources together for improved execution performance

Allocation Strategy

We have developed a strategy which simultaneously schedules on both

workstations

immediately available supercomputer nodes

avoids wait time in the batch queue

availability information is exported by the batch scheduler

Overall, this strategy performs better than running on either type of resource alone

Preliminary Globus/AppLeS Tomography Experiments

Resources

6 workstations available at Parallel Computation Laboratory (PCL) at UCSD

immediately available nodes on SDSC SP-2 (128 nodes)

Maui scheduler exports the number of immediately available nodes

e.g. 5 nodes available for the next 30 mins, 10 nodes available for the next 10 mins

Globus installed everywhere
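The kind of decision these availability reports allow can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the offers are plain `(nodes, minutes)` tuples rather than any real Maui interface, and the runtime model assumes ideal speedup (work in node-minutes divided by node count).

```python
def usable_offers(offers, node_minutes):
    # Keep offers whose time window is long enough for the job to finish,
    # assuming ideal speedup: runtime = node_minutes / nodes.
    fits = []
    for nodes, window_minutes in offers:
        runtime = node_minutes / nodes
        if runtime <= window_minutes:
            fits.append((runtime, nodes, window_minutes))
    return fits

def pick_offer(offers, node_minutes):
    # Choose the fitting offer with the shortest predicted runtime, or None.
    fits = usable_offers(offers, node_minutes)
    if not fits:
        return None
    runtime, nodes, window_minutes = min(fits)
    return {"nodes": nodes,
            "window_minutes": window_minutes,
            "runtime_minutes": runtime}

if __name__ == "__main__":
    # The two example offers from the slide: 5 nodes for 30 mins, 10 nodes for 10 mins.
    offers = [(5, 30), (10, 10)]
    print(pick_offer(offers, node_minutes=120))
    print(pick_offer(offers, node_minutes=80))
```

With 120 node-minutes of work, only the 5-node offer fits (24 minutes inside a 30-minute window). With 80 node-minutes both fit and the 10-node offer wins. When nothing fits, a scheduler could fall back to workstations only or to the regular batch queue.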

[Figure: testbed diagram showing workstations (W), NCMIR, PCL, and the SDSC SP-2]

Allocation Strategies Compared

4 strategies compared:

SP2Immed/WS: workstations and immediately available SP-2 nodes

WS: workstations only

SP2Immed: immediately available SP-2 nodes only

SP2Queue(n): traditional batch queue submit using n nodes

experiments performed in production environment

ran experiments in sets, each set contains all strategies (e.g. SP2Immed, SP2Immed/WS, WS, SP2Queue(8))

within a set, experiments ran back-to-back

Results (8 nodes on SP-2)

Targeting Globus

AppLeS uses Globus services

GRAM, GSI, & RSL

process startup on workstations and batch queue systems

remote process control

Nexus

interprocess communication

multi-threaded callback functions for fault tolerance

Experiences with Globus

What we would like to see improved:

free node information in MDS (e.g. time availability, frequency)

steep learning curve for initial installation

knowledge of underlying Globus infrastructure

startup scripts

more documentation

more flexible configuration

Experiences with Globus

What worked:

once installed, software works well

responsiveness and willingness to help from Globus team

mailing list

web pages

Future work

Develop contention model to address network overloading, which includes

network capacity information

model of application communication requirements

Expansion of allocation policy to include additional resources

other Globus supercomputers, reserved resources (GARA)

availability of resources

NPACI Alpha project with NCMIR and Globus

AppLeS

AppLeS/NCMIR/Globus Tomography Project

Shava Smallen, Jim Hayes, Fran Berman, Rich Wolski, Walfredo Cirne (AppLeS)

Mark Ellisman, Marty Hadida-Hassan, Jaime Frey (NCMIR)

Carl Kesselman, Mei-Hui Su (Globus)

AppLeS Home Page: www-cse.ucsd.edu/groups/hpcl/apples

[email protected]
