
Page 1:

Multimedia Content Analysis on Clusters and Grids

Frank J. Seinstra (fjseins@cs.vu.nl)

Computer Systems Group, Faculty of Sciences,

Vrije Universiteit, Amsterdam

Page 2:

Overview (1)

Part 1: What is Multimedia Content Analysis (MMCA)?
Part 2: Why parallel computing in MMCA – and how?
Part 3: Software Platform: Parallel-Horus
Part 4: Example – Parallel Image Processing on Clusters

Page 3:

Overview (2)

Part 5: ‘Grids’ and their specific problems
Part 6: A Software Platform for MMCA on ‘Grids’?
Part 7: Large-scale MMCA applications on ‘Grids’
Part 8: Future research directions

Page 4:

Introduction

A Few Realistic Problem Scenarios

Page 5:

A Real Problem…

News broadcast - September 21, 2005:

Police Investigation: over 80,000 CCTV recordings
First match found only 2.5 months after the attacks

=> automatic analysis?

Page 6:

Another real problem…

Web Video Search:

Example query: “Sarah Palin”

Search based on annotations
Known to be notoriously bad (e.g., YouTube)
Instead: search based on video content

Page 7:

Are these realistic problems?

NFI (Dutch Forensics Institute, Den Haag):
Surveillance Camera Analysis
Crime Scene Reconstruction

Beeld&Geluid (Dutch Institute for Sound and Vision, Hilversum):
Interactive access to Dutch national TV history

Page 8:

But there are many more:

Healthcare

Astronomy

Remote Sensing

Entertainment (e.g. see: PhotoSynth.net)

….

Page 9:

Part 1

What is Multimedia Content Analysis?

Page 10:

Multimedia

Multimedia = Text + Sound + Image + Video + ….

Video = image + image + image + ….
In many (not all) multimedia applications:

calculations are executed on each separate video frame independently

So: we focus on Image Processing (+ Computer Vision)

Page 11:

What is a Digital Image?

“An image is a continuous function that has been discretized in spatial coordinates, brightness and color frequencies”

Most often: 2-D with ‘pixels’ as scalar or vector value

However: Image dimensionality can range from 1-D to n-D

Example (medical): 5-D = x, y, z, time, emission wavelength

Pixel dimensionality can range from 1-D to n-D
Generally: 1-D = binary/grayscale; 3-D = color (e.g. RGB)

n-D = hyper-spectral (e.g. remote sensing by satellites)
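To make the data layout concrete, a small sketch of such an image (hypothetical types for illustration only, not the Horus data structures): a 2-D image whose pixels are 3-D (RGB) vectors, stored row-major in one flat buffer.

   #include <vector>
   #include <cstddef>

   struct RGB { unsigned char r, g, b; };          // 3-D pixel (color)

   struct Image2D {
       std::size_t width, height;
       std::vector<RGB> pixels;                    // size = width * height

       RGB &at(std::size_t x, std::size_t y) {     // access pixel (x, y)
           return pixels[y * width + x];
       }
   };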

Page 12:

Image ===> (sub-) Image
Image ===> Scalar / Vector Value
Image ===> Array of S/V Values

===> Feature Vector

Complete A-Z Multimedia Applications

[Diagram: a complete A-to-Z application. In: image; Out: ‘meaning’ful result, e.g. “Blue Car”, “Supernova at X,Y,t…”, “Pres. Bush stepping off Airforce 1”. The pipeline is built from low level operations, intermediate level operations, and high level operations, labeled with (Parallel-) Horus and Impala.]

Page 13:

Low Level Image Processing Patterns (1)

Unary Pixel Operation (example: absolute value)
Binary Pixel Operation (example: addition)
Template / Kernel / Filter / Neighborhood Operation (example: Gauss filter)
N-ary Pixel Operation…
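As an illustration of the first two patterns, a minimal sequential sketch (hypothetical float-image type, not the Horus API): a unary and a binary pixel operation, each applied element-wise over all pixels.

   #include <vector>
   #include <cmath>
   #include <cstddef>

   using Image = std::vector<float>;   // hypothetical grayscale image

   // Unary pixel operation (example: absolute value).
   void unaryAbs(Image &dst, const Image &src) {
       for (std::size_t i = 0; i < src.size(); ++i) dst[i] = std::fabs(src[i]);
   }

   // Binary pixel operation (example: addition).
   void binaryAdd(Image &dst, const Image &a, const Image &b) {
       for (std::size_t i = 0; i < a.size(); ++i) dst[i] = a[i] + b[i];
   }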

Page 14:

Low Level Image Processing Patterns (2)

Reduction Operation (example: sum)
N-Reduction Operation (example: histogram)
Geometric Transformation (example: rotation, using a transformation matrix M)
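For the first two of these patterns, a minimal sequential sketch (hypothetical image type, illustration only): a full reduction to a single value, and an n-reduction to a histogram.

   #include <vector>
   #include <cstddef>

   using Image = std::vector<float>;   // hypothetical grayscale image

   // Reduction operation (example: sum of all pixel values).
   float reduceSum(const Image &im) {
       float sum = 0.0f;
       for (float p : im) sum += p;
       return sum;
   }

   // N-reduction operation (example: 256-bin histogram over pixel values in [0, 256)).
   std::vector<std::size_t> histogram(const Image &im) {
       std::vector<std::size_t> bins(256, 0);
       for (float p : im) {
           int b = static_cast<int>(p);
           if (b >= 0 && b < 256) ++bins[b];
       }
       return bins;
   }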

Page 15:

Example Application: Template Matching

Input Image

Template

Result Image

for all images {
   inputIm = readFile ( … );
   unaryPixOpI ( sqrdInIm, inputIm, "set" );
   binaryPixOpI ( sqrdInIm, inputIm, "mul" );
   for all symbol images {
      symbol = readFile ( … );
      weight = readFile ( … );
      unaryPixOpI ( filtIm1, sqrdInIm, "set" );
      unaryPixOpI ( filtIm2, inputIm, "set" );
      genNeighborhoodOp ( filtIm1, borderMirror, weight, "mul", "sum" );
      binaryPixOpI ( symbol, weight, "mul" );
      genNeighborhoodOp ( filtIm2, borderMirror, symbol, "mul", "sum" );
      binaryPixOpI ( filtIm1, filtIm2, "sub" );
      binaryPixOpI ( maxIm, filtIm1, "max" );
   }
   writeFile ( …, maxIm, … );
}

See: http://www.cs.vu.nl/~fjseins/ParHorusCode/

Page 16:

Part 2

Why Parallel Computing in MMCA (and how)?

Page 17:

The ‘Need for Speed’ in MMCA

Growing interest in international ‘benchmark evaluations’
Task: find ‘semantic concepts’ automatically

Example: NIST TRECVID (200+ hours of video)

A problem of scale: At least 30-50 hours of processing time per hour of video

Beeld&Geluid: 20,000 hours of TV broadcasts per year
NASA: over 10 TB of hyper-spectral image data per day
London Underground: over 120,000 years of processing…!!!

Page 18:

High Performance Computing

Solution: Parallel & distributed computing at a very large scale

GPUs

Accelerators

General Purpose CPUs

Clusters

Grids

Question: What type of high-performance hardware is most suitable?

Our initial choice: Clusters of general purpose CPUs (e.g. the DAS cluster)
For many pragmatic reasons…

Page 19:

For non-experts in Parallel Computing?

[Diagram: programming effort vs. parallel efficiency for a range of approaches:]
Automatic Parallelizing Compilers
Extended High Level Languages (e.g., HPF)
Parallel Languages (e.g., Occam, Orca)
Shared Memory Specifications (e.g., OpenMP)
Message Passing Libraries (e.g., MPI, PVM)
User Transparent Parallelization Tools
Parallel Image Processing Libraries
Parallel Image Processing Languages (e.g., Apply, IAL)

Page 20:

Existing Parallel Image Processing Libs

These suffer from many problems:

No ‘familiar’ programming model:
Identifying parallelism still the responsibility of the programmer (e.g. data partitioning [Taniguchi97], loop parallelism [Niculescu02, Olk95])

Reduced maintainability / portability:
Multiple implementations for each operation [Jamieson94]
Restricted to a particular machine [Moore97, Webb93]

Non-optimal efficiency of parallel execution:
Ignore machine characteristics for optimization [Juhasz98, Lee97]
Ignore optimization across library calls [all]

Page 21:

Our Approach

Sustainable software library for user-transparent parallel image processing

(1) Sustainability:
Maintainability, extensibility, portability (i.e. from Horus)
Applicability to commodity clusters

(2) User transparency:
Strictly sequential API (identical to Horus)
Intra-operation efficiency & inter-operation efficiency

Page 22:

Part 3 (a)

Software Platform: Parallel-Horus (parallel algorithms)

Page 23:

What Type(s) of Parallelism to support?

Data parallelism: “exploitation of concurrency that derives from the application of the same operation to multiple elements of a data structure” [Foster, 1995]

Task parallelism: “a model of parallel computing in which many different operations may be executed concurrently” [Wilson, 1995]

Page 24:

Why Data Parallelism (only)?

Natural approach for low level image processing

Scalability (in general: #pixels >> #different tasks)
Load balancing is easy
Finding independent tasks automatically is hard
In other words: it’s just the best starting point…

(but not necessarily optimal at all times)

Page 25:

Many Algorithms Embarrassingly Parallel

Parallel Operation on Image

{

Scatter Image (1)

Sequential Operation on Partial Image (2)

Gather Result Data (3)

}

[Diagram: the scatter (1), compute (2), and gather (3) steps shown on 2 CPUs.]

Works (with minor issues) for: unary, binary, n-ary operations & (n-) reduction operations
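A minimal MPI sketch of this scatter / sequential-operation / gather pattern (illustration only, not Parallel-Horus code): a row-wise partitioning of a grayscale image, assuming the image height is divisible by the number of CPUs.

   #include <mpi.h>
   #include <vector>

   int main(int argc, char **argv) {
       MPI_Init(&argc, &argv);
       int rank, size;
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);

       const int width = 1024, height = 1024;        // assume height % size == 0
       const int rowsPerCpu = height / size;

       std::vector<float> image;                     // full image on the root only
       if (rank == 0) image.assign(width * height, 1.0f);

       // (1) Scatter Image: each CPU receives a horizontal slice.
       std::vector<float> slice(width * rowsPerCpu);
       MPI_Scatter(image.data(), width * rowsPerCpu, MPI_FLOAT,
                   slice.data(), width * rowsPerCpu, MPI_FLOAT, 0, MPI_COMM_WORLD);

       // (2) Sequential Operation on Partial Image (example: unary pixel op).
       for (float &p : slice) p = p * 2.0f;

       // (3) Gather Result Data back on the root.
       MPI_Gather(slice.data(), width * rowsPerCpu, MPI_FLOAT,
                  image.data(), width * rowsPerCpu, MPI_FLOAT, 0, MPI_COMM_WORLD);

       MPI_Finalize();
       return 0;
   }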

Page 26:

Other only marginally more complex (1)

On 2 CPUs (without scatter / gather):

Parallel Filter Operation on Image

{

Scatter Image (1)

Allocate Scratch (2)

Copy Image into Scratch (3)

Handle / Communicate Borders (4)

Sequential Filter Operation on Scratch (5)

Gather Image (6)

}

Also possible: ‘overlapping’ scatter
But not very useful in iterative filtering
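Step (4), handling and communicating borders, can be sketched as a halo exchange with the upper and lower neighbours (illustration only; it assumes the row-wise partitioning of the previous sketch, with one extra scratch row above and below the local rows).

   #include <mpi.h>
   #include <vector>

   // scratch holds (rowsPerCpu + 2) rows of 'width' pixels; rows 1..rowsPerCpu are local data.
   void exchangeBorders(std::vector<float> &scratch, int width, int rowsPerCpu,
                        int rank, int size) {
       const int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
       const int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

       // Send our first real row up; receive the lower neighbour's first row
       // into our bottom scratch row.
       MPI_Sendrecv(&scratch[width * 1],                width, MPI_FLOAT, up,   0,
                    &scratch[width * (rowsPerCpu + 1)], width, MPI_FLOAT, down, 0,
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE);

       // Send our last real row down; receive the upper neighbour's last row
       // into our top scratch row.
       MPI_Sendrecv(&scratch[width * rowsPerCpu],       width, MPI_FLOAT, down, 1,
                    &scratch[width * 0],                width, MPI_FLOAT, up,   1,
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   }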

Page 27:

Other only marginally more complex (2)

On 2 CPUs (without broadcast / gather):

Potentially faster implementations exist for special cases

Parallel Geometric Transformation on Image

{

Broadcast Image (1)

Create Partial Image (2)

Sequential Transform on Partial Image (3)

Gather Result Image (4)

}


Page 28:

Challenge: Separable Recursive Filtering

[Diagram: a 2-D Gauss filter (template / kernel / filter / neighborhood operation) is equivalent to a 1-D filter in one direction … followed by … a 1-D filter in the other direction.]

Page 29:

Challenge: Separable Recursive Filtering

Separable filters: 1 x 2D becomes 2 x 1D
Drastically reduces sequential computation time

Recursive filtering: the result of each filter step (a pixel value) is stored back into the input image
So: a recursive filter uses (part of) its output as input
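A minimal sequential sketch of a separable recursive filter (a simple exponential-smoothing IIR filter, not the actual Gauss filter): the same 1-D recursive pass is applied first along x and then along y, and each output value depends on previously produced output.

   #include <vector>
   #include <cstddef>

   // One forward recursive (IIR) pass along a row or column:
   // out[i] = a * in[i] + (1 - a) * out[i - 1]  -- the output feeds back into the filter.
   static void recursivePass(float *data, std::size_t n, std::size_t stride, float a) {
       for (std::size_t i = 1; i < n; ++i)
           data[i * stride] = a * data[i * stride] + (1.0f - a) * data[(i - 1) * stride];
   }

   // Separable filtering: first all rows (x-direction), then all columns (y-direction).
   void separableRecursiveFilter(std::vector<float> &im, std::size_t w, std::size_t h, float a) {
       for (std::size_t y = 0; y < h; ++y) recursivePass(&im[y * w], w, 1, a);  // x-dir
       for (std::size_t x = 0; x < w; ++x) recursivePass(&im[x], h, w, a);      // y-dir
   }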

Page 30:

Parallel Recursive Filtering: Solution 1

[Diagram: SCATTER, FILTER X-dir, TRANSPOSE, FILTER Y-dir, GATHER]

Drawback: transpose operation is very expensive (esp. when nr. CPUs is large)

Page 31:

Parallel Recursive Filtering: Solution 2

[Diagrams: three partitionings of the image over CPUs P0–P2]

Loop carrying dependence at final stage (sub-image level):
minimal communication overhead, full serialization

Loop carrying dependence at innermost stage (pixel-column level):
high communication overhead, fine-grained wave-front parallelism

Tiled loop carrying dependence at intermediate stage (image-tile level):
moderate communication overhead, coarse-grained wave-front parallelism

Page 32:

Wavefront parallelism

[Diagram: wavefront execution over CPU 0 – CPU 3.]

Drawback: partial serialization, non-optimal use of the available CPUs

Page 33:

Parallel Recursive Filtering: Solution 3

[Diagram: multipartitioned image over CPU 0 – CPU 3.]

Multipartitioning: skewed cyclic block partitioning
Each CPU owns at least one tile in each of the distributed dimensions
All neighboring tiles in a particular direction are owned by the same CPU

Page 34:

Parallel Recursive Filtering: Solution 3

[Diagram: two filter sweeps over the multipartitioned image, CPU 0 – CPU 3.]

Full parallelism: first in one direction… and then in the other…
Border exchange at the end of each sweep
Communication at the end of a sweep is always with the same node

Page 35:

Part 3 (b)

Software Platform: Parallel-Horus (platform design)

Page 36:

Parallel-Horus: Parallelizable Patterns

[Layer diagram: Horus / Parallelizable Patterns / Parallel Extensions / MPI]

Minimal intrusion:
Sequential API
Re-use as much as possible the original sequential Horus library codes
Parallelization localized
Easy to implement extensions

Page 37:

Pattern implementations (old vs. new)

template<class …, class …, class …>
inline DstArrayT* CxPatUnaryPixOp(… dst, … src, … upo)
{
   if (dst == 0)
      dst = CxArrayClone<DstArrayT>(src);
   if (!PxRunParallel()) {                  // run sequential
      CxFuncUpoDispatch(dst, src, upo);
   } else {                                 // run parallel
      PxArrayPreStateTransition(src, …, …);
      PxArrayPreStateTransition(dst, …, …);
      CxFuncUpoDispatch(dst, src, upo);
      PxArrayPostStateTransition(dst);
   }
   return dst;
}

template<class …, class …, class …>
inline DstArrayT* CxPatUnaryPixOp(… dst, … src, … upo)
{
   if (dst == 0)
      dst = CxArrayClone<DstArrayT>(src);
   CxFuncUpoDispatch(dst, src, upo);
   return dst;
}

Page 38:

Inter-Operation Optimization

Lazy Parallelization (on the fly!):

Don’t do this:
Scatter ===> ImageOp ===> Gather ===> Scatter ===> ImageOp ===> Gather

Do this:
Scatter ===> ImageOp ===> ImageOp ===> Gather (avoid communication in between)

Page 39:

Finite State Machine

Communication operations serve as state transition functions between distributed data structure states

State transitions performed only when absolutely necessary

State transition functions allow correct conversion of legal sequential code to legal parallel code at all times

Nice features:
Requires no a priori knowledge of loops and branches
Can be done on the fly at run-time (with no measurable overhead)
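A minimal sketch of the idea (hypothetical names, not the actual Parallel-Horus implementation): each distributed image carries a state, and the pre-state-transition function scatters, broadcasts, or gathers only when the next operation actually requires a different state.

   // Illustration only: the distribution state of an image structure.
   enum class ArrayState { None, Full, Scattered, Broadcast };

   struct DistributedImage {
       ArrayState state = ArrayState::None;   // current distribution of the data
       // ... pixel data, partition info, etc.
   };

   // Hypothetical pre-state-transition: performed only when absolutely necessary.
   void preStateTransition(DistributedImage &im, ArrayState required) {
       if (im.state == required) return;      // lazy: data already in the required state
       switch (required) {
           case ArrayState::Scattered: /* scatter im over the CPUs   */ break;
           case ArrayState::Broadcast: /* broadcast im to all CPUs   */ break;
           case ArrayState::Full:      /* gather im on the root CPU  */ break;
           default:                                                     break;
       }
       im.state = required;
   }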

Page 40:

Part 4

Example – Parallel Image Processing on Clusters

Page 41:

Example: Curvilinear Structure Detection

Apply anisotropic Gaussian filter bank to input image

Maximum response when filter tuned to line direction

Here: 3 different implementations
fixed filters applied to a rotating image
rotating filters applied to a fixed input image: separable (UV) and non-separable (2D)

Depending on parameter space: a few minutes to several hours

Page 42:

Sequential = Parallel (1)

for all orientations theta {
   geometricOp ( inputIm, &rotatIm, -theta, LINEAR, 0, p, "rotate" );
   for all smoothing scales sy {
      for all differentiation scales sx {
         genConvolution ( filtIm1, mirrorBorder, "gauss", sx, sy, 2, 0 );
         genConvolution ( filtIm2, mirrorBorder, "gauss", sx, sy, 0, 0 );
         binaryPixOpI ( filtIm1, filtIm2, "negdiv" );
         binaryPixOpC ( filtIm1, sx*sy, "mul" );
         binaryPixOpI ( contrIm, filtIm1, "max" );
      }
   }
   geometricOp ( contrIm, &backIm, theta, LINEAR, 0, p, "rotate" );
   binaryPixOpI ( resltIm, backIm, "max" );
}

IMPLEMENTATION 1

Page 43:

Sequential = Parallel (2 & 3)

for all orientations theta {
   for all smoothing scales sy {
      for all differentiation scales sx {
         genConvolution ( filtIm1, mirrorBorder, "func", sx, sy, 2, 0 );
         genConvolution ( filtIm2, mirrorBorder, "func", sx, sy, 0, 0 );
         binaryPixOpI ( filtIm1, filtIm2, "negdiv" );
         binaryPixOpC ( filtIm1, sx*sy, "mul" );
         binaryPixOpI ( resltIm, filtIm1, "max" );
      }
   }
}

IMPLEMENTATIONS 2 and 3

Page 44:

Measurements (DAS-1)

[Charts (DAS-1): ‘Performance’ – execution time in seconds vs. nr. of CPUs (0–120) for Conv2D, ConvUV, and ConvRot (measured times include 2085.985, 666.720, 437.641, 25.837, 20.017, and 4.813 s); ‘Scaled Speedup’ – speedup vs. nr. of CPUs with a linear reference (measured speedups at 120 CPUs: 104.21, 90.93, and 25.80).]

512x512 image 36 orientations 8 anisotropic filters

=> Part of the efficiency of parallel execution always remains in the hands of the application programmer!

Page 45:

Measurements (DAS-2)

512x512 image 36 orientations 8 anisotropic filters

So: lazy parallelization (or: optimization across library calls) is very important for high efficiency!

LazyPar:      on        on        off       off
#Nodes        Conv2D    ConvUV    Conv2D    ConvUV
 1            425.115   185.889   425.115   185.889
 2            213.358    93.824   237.450   124.169
 4            107.470    47.462   133.273    79.847
 8             54.025    23.765    82.781    60.158
16             27.527    11.927    55.399    47.407
24             18.464     8.016    48.022    45.724
32             13.939     6.035    42.730    43.050
48              9.576     4.149    38.164    40.944
64              7.318     3.325    36.851    41.265

[Chart: speedup vs. #nodes (up to 64) for Conv2D and ConvUV, with and without lazy parallelization, plus a linear reference.]

Page 46:

Part 5

‘Grids’ and their Specific Problems

Page 47:

The ‘Promise of The Grid’

1997 and beyond: efficient and transparent (i.e. easy-to-use) wall-socket computing over a distributed set of resources

Compare electrical power grid:

Page 48:

Grid Problems (1)

Getting an account on remote compute clusters is hard!
Find the right person to contact…
Hope he/she does not completely ignore your request…
Provide proof of (a.o.) relevance, ethics, ‘trusted’ nationality…
Fill in and sign NDAs, Foreign National Information sheets, official usage documents, etc…
Wait for the account to be created, & the username to be sent to you…
Hope to obtain an initial password as well…

Getting access to an existing international Grid testbed is easier
But only marginally so…

Page 49:

Grid Problems (2)

Getting your C++/MPI code to compile and run is hard!
Copying your code to the remote cluster (‘scp’ often not allowed)…
Setting up your environment & finding the right MPI compiler (mpicc, mpiCC, … ???)…
Making the necessary include libraries available…
Finding out how to use the cluster reservation system…
Finding the correct way to start your program (mpiexec, mpirun, … and on which nodes ???)…
Getting your compute nodes to communicate with other machines (generally not allowed)…

So: Nothing is standardized yet (not even Globus)
A working application in one Grid domain will generally fail in all others

Page 50:

Grid Problems (3)

Keeping an application running (efficiently) is hard!
Grids are inherently dynamic: networks and CPUs are shared with others, causing fluctuations in resource availability
Grids are inherently faulty: compute nodes & clusters may crash at any time
Grids are inherently heterogeneous: optimization for run-time execution efficiency is by-and-large unknown territory

So: An application that runs (efficiently) at one moment should be expected to fail a moment later

Page 51:

Realizing the ‘Promise of the Grid’

Set of fundamental methodologies required
Each solving part of the Grid’s complexities

For most of these methodologies solutions exist today:
Ibis: IPL, SmartSockets, JavaGAT, IbisDeploy
Parallel-Horus (or the Ibis version: Jorus)

Page 52:

Part 6

A Software Platform for MMCA on Grids

Page 53:

Wide-area Multimedia Services

[Diagram: multiple Parallel-Horus clients sending requests to Parallel-Horus servers running on clusters world-wide.]

Services on clusters world-wide
Respond to client requests
Each server runs in a data parallel manner
Client requests executed fully asynchronously
Task parallel execution of data parallel services

Page 54:

Situation in 2005

[Diagram: Parallel-Horus clients and servers communicating via C++, MPI, Sockets, and SSH (incl. tunneling).]

Unstable / faulty communication
Execution on each cluster ‘by hand’
Connectivity problems
Code pre-installed at each cluster site

Page 55:

Situation in 2009

[Diagram: Parallel-Horus clients and Parallel-Jorus servers communicating via IPL / SmartSockets, deployed with IbisDeploy / JavaGAT.]

All Java / Ibis
Overall: C++ 10% faster than Java…
…but much easier to use on a worldwide scale
You have seen the video…

Page 56:

Part 7

Large-scale MMCA Applications on ‘Grids’

Page 57:

Color Based Object Recognition (1)

Our solution:
Place a ‘retina’ over the input image
Each of 37 ‘retinal areas’ serves as a ‘receptive field’
For each receptive field:
Obtain 6 local histograms, invariant to shading / lighting
Estimate Weibull parameters ß and γ for each histogram

=> scene description by a set of 37 x 6 x 2 = 444 parameters
=> 444-valued feature vector

Page 58:

Color Based Object Recognition (2)

Learning phase:
Set of 444 parameters is stored in a database
So: learning from 1 example, under a single visual setting

Recognition phase:
Validation by showing objects under at least 50 different conditions:
Lighting direction
Lighting color
Viewing position

[Example recognition result: “a hedgehog”]

Page 59:

Color Based Object Recognition (3)

In laboratory setting (1000 objects):
300 objects correctly recognized under all (!) visual conditions
700 remaining objects ‘missed’ under extreme conditions only

Page 60:

Page 61:

Page 62:

Results on DAS-2

[Charts: client-side speedup vs. nr. of CPUs, each with a linear reference; left: single cluster (up to 64 CPUs); right: four clusters (up to 96 CPUs).]

Recognition on a single machine: +/- 30 seconds
Using multiple clusters: up to 10 frames per second
Insightful: ‘distant’ clusters can be used effectively

Page 63:

Part 8

Future Research Directions

Page 64:

Current / Future Research

Applicability of graphics processors (GPUs) and other accelerators
NVIDIA, CELL Broadband Engine, FPGAs: can we make these ‘easily’ programmable?

Concurrent use of the complex set of heterogeneous and hierarchical hardware as available in ‘the real world’

Page 65:

The End

contact & information:

fjseins@cs.vu.nl

http://www.cs.vu.nl/~fjseins/

Page 66:

Appendix

Intermediate Level MMCA Algorithms

Page 67:

Feature Vectors

Feature Vector:
A labeled sequence of (scalar) values
Each (scalar) value represents an image-data-related property
Label: from user annotation or from automatic clustering

Example:
(histogram) = 1 2 3 4 5 6 6 6 6 5 4 3 3
Let’s call this: “FIRE”
(the histogram can be approximated by a mathematical function, e.g. a Weibull distribution; only 2 parameters ‘ß’, ‘γ’)

<FIRE, ß=0.93, γ=0.13>
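To make the idea concrete, a small sketch (hypothetical code, not the actual MMCA implementation): a labeled feature holding the two Weibull parameters, built from a histogram of a region; the actual ß/γ estimation is not shown and is only stubbed.

   #include <string>
   #include <vector>
   #include <cstddef>

   // Illustration only: a labeled feature as on the slide, e.g. <FIRE, ß=0.93, γ=0.13>.
   struct Feature {
       std::string label;   // "FIRE", from annotation or clustering
       double beta;         // Weibull ß
       double gamma;        // Weibull γ
   };

   // Histogram of a region of gray values in [0, 256).
   std::vector<std::size_t> regionHistogram(const std::vector<unsigned char> &region) {
       std::vector<std::size_t> bins(256, 0);
       for (unsigned char p : region) ++bins[p];
       return bins;
   }

   // Placeholder: fitting a Weibull distribution to this histogram would yield ß and γ;
   // the real estimator is not part of this sketch.
   Feature makeFeature(const std::string &label, const std::vector<unsigned char> &region) {
       std::vector<std::size_t> hist = regionHistogram(region);
       (void)hist;                                   // the fit would use this histogram
       return Feature{label, /*beta=*/0.0, /*gamma=*/0.0};
   }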

Page 68:

Annotation (low level)

[Example image with annotated regions: Sky, Sky, Sky, Road, USA Flag]

Annotation for low level ‘visual words’:
Define N low level visual concepts
Assign concepts to image regions
For each region, calculate a feature vector:
<SKY, ß=0.93, γ=0.13, … >
<SKY, ß=0.91, γ=0.15, … >
<SKY, ß=0.97, γ=0.12, … >
<ROAD, ß=0.89, γ=0.09, … >
<USA FLAG, ß=0.99, γ=0.14, … >

=> N human-defined ‘visual words’, each having multiple descriptions

Page 69:

Alternative: Clustering

Example:
Split the image in X regions, and obtain a feature vector for each
All feature vectors have a position in a high-dimensional space
A clustering algorithm is applied to obtain N clusters
-> N non-human ‘visual words’, each with multiple descriptions

[Diagram: clusters in feature space, labeled Label1 … Label5]

Page 70:

Feature Vectors for Full Images

Compute the similarity between each image region and each low level visual word (e.g. Sky, Grass, Road)…
…and count the number of region-matches with each visual word
e.g.: 3 x ‘Sky’; 7 x ‘Grass’; 4 x ‘Road’; …
This defines the accumulated feature vector for a full image
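A minimal sketch of this accumulation step (hypothetical code; the slide only says "similarity", so Euclidean distance is an assumption): assign each region's feature vector to its most similar visual word and count the matches per word, giving the accumulated feature vector for the full image.

   #include <vector>
   #include <cstddef>
   #include <limits>

   using FeatureVec = std::vector<double>;   // e.g. a region descriptor

   static double sqDist(const FeatureVec &a, const FeatureVec &b) {
       double d = 0.0;
       for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
       return d;
   }

   // Count, for every visual word, how many image regions match it best.
   std::vector<std::size_t> accumulate(const std::vector<FeatureVec> &regions,
                                       const std::vector<FeatureVec> &visualWords) {
       std::vector<std::size_t> counts(visualWords.size(), 0);
       for (const FeatureVec &r : regions) {
           std::size_t best = 0;
           double bestDist = std::numeric_limits<double>::max();
           for (std::size_t w = 0; w < visualWords.size(); ++w) {
               double d = sqDist(r, visualWords[w]);
               if (d < bestDist) { bestDist = d; best = w; }
           }
           ++counts[best];   // e.g. 3 x 'Sky'; 7 x 'Grass'; 4 x 'Road'
       }
       return counts;        // the accumulated feature vector for the full image
   }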

Page 71:

Annotation (high level)

Annotation for high level ‘semantic concepts’:
Define M high level visual concepts, e.g.:
‘sports event’, ‘outdoors’, ‘airplane’, ‘president Bush’, ‘traffic situation’, ‘human interaction’, …
For all images in a known (training) set, assign all appropriate high level concepts

[Example images annotated with: outdoors + traffic situation; president Bush + human interaction; outdoors + airplane + traffic situation]

Page 72:

‘Recognition’ by Classification

The new, accumulated feature vectors again define positions in a high-dimensional space
Classification defines a separation boundary in that space, given the known high-level concepts (e.g. ‘sports event’ vs. NOT ‘sports event’)
‘Recognition’: position the accumulated feature vector and see on which side of the boundary it is
The distance to the boundary defines a probability (so we can provide ranked results)
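As a sketch of this last step (illustration only; the actual classifier is not specified here), a linear separation boundary w·x + b = 0: the sign of the score tells on which side an accumulated feature vector lies, and the signed distance to the boundary can serve as the score used for ranking.

   #include <vector>
   #include <cmath>
   #include <cstddef>

   // Illustration only: a linear boundary w.x + b = 0 in feature space.
   struct LinearBoundary {
       std::vector<double> w;
       double b;
   };

   // Signed distance of feature vector x to the boundary:
   // sign -> which side ('sports event' or not), magnitude -> confidence for ranking.
   double signedDistance(const LinearBoundary &m, const std::vector<double> &x) {
       double dot = m.b, norm = 0.0;
       for (std::size_t i = 0; i < x.size(); ++i) {
           dot  += m.w[i] * x[i];
           norm += m.w[i] * m.w[i];
       }
       return dot / std::sqrt(norm);
   }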