
T.Q. Pham, in Proceedings of DICTA 2013, pp.19-26.


PARALLEL IMPLEMENTATION OF GEODESIC DISTANCE TRANSFORM WITH APPLICATION IN SUPERPIXEL SEGMENTATION

Tuan Q. Pham

Canon Information Systems Research Australia (CiSRA), 1 Thomas Holt Drive, North Ryde, NSW 2113, Australia.

[email protected]

ABSTRACT

This paper presents a parallel implementation of geodesic distance transform using OpenMP. We show how a sequential-based chamfer distance algorithm can be executed on parallel processing units with shared memory, such as multiple cores on a modern CPU. Experimental results show that a speedup of 2.6 times on a quad-core machine can be achieved without loss in accuracy. This work forms part of a C implementation for geodesic superpixel segmentation of natural images.

Index Terms— geodesic distance transform, OpenMP, superpixel segmentation

1. INTRODUCTION

Due to the raster-order organisation of pixels in an image, many image processing algorithms operate in a sequential fashion. This sequential processing is suitable for running on a single-processor system. However, even Personal Computers (PCs) now have multiple processing cores. In fact, the number of cores on a chip is likely to double every 18 months to sustain Moore's law [23]. As a result, there is a strong need to parallelise existing image processing algorithms to run more efficiently on multi-core hardware.

OpenMP (Open Multi-Processing) is a powerful yet simple-to-use application programming interface that supports many functionalities for parallel programming. OpenMP uses a shared-memory model, in which all threads share a common address space. Each thread can have additional private data under explicit user control. This shared-memory model simplifies the task of programming because it avoids the need to synchronise memory across different processors on a distributed system. The shared-memory model also fits well with the multi-core architecture of modern CPUs.
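As a minimal illustration of this fork-join, shared-memory model (our own sketch, not code from the paper), the following C function sums 0.5·i over i with an OpenMP worksharing loop; the loop variable and the loop-local x are private to each thread, while sum is shared and combined by the reduction clause:

```c
/* Minimal OpenMP fork-join sketch (illustrative, not from the paper):
 * the master thread forks a team at the pragma, the iterations are
 * divided among the threads, and the team joins after the loop. */
double parallel_sum(int n)
{
    double sum = 0.0;

    /* sum is shared and combined by the reduction clause; i and x are
     * private, so the threads do not interfere with each other. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        double x = i * 0.5;   /* declared in the loop body: thread-private */
        sum += x;
    }
    return sum;   /* equals 0.5 * n * (n - 1) / 2 with any thread count */
}
```

Compiled with OpenMP support (e.g. -fopenmp) the loop runs on all cores; without it, the pragma is ignored and the same result is computed sequentially.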

Parallel programming using OpenMP has gained significant interest in the image processing community in recent years. In 2010, the IEEE Signal Processing Society dedicated a whole issue of its flagship publication, the IEEE Signal Processing Magazine, to signal processing on multiple-core platforms. In this issue, Slabaugh et al. demonstrated a 2- to 4-time speedup of several popular image processing algorithms on a quad-core machine using OpenMP [25]. The demonstrated algorithms involve either pixel-wise processing (image warping, image normalisation) or small neighbourhood-wise processing (binary morphology, median filtering). All of these algorithms generate the output at each pixel independently of those at other output pixels. As a result, they are naturally extendable to parallel implementation. This type of data-independent task parallelisation can even be done automatically by a compiler [11]. Parallel implementation of sequential-based image processing algorithms, however, still requires manual adaptation by an experienced programmer.

In this paper, we present a parallel implementation of the Geodesic Distance Transform (GDT) using OpenMP. GDT accepts a greyscale cost image together with a set of seed points. It outputs a distance transform image whose intensity at each pixel is the geodesic distance from that pixel to a nearest seed point. The geodesic distance between two points is the sum of pixel costs along a minimum-cost path connecting these two points. The nearest seed mapping forms an over-segmentation of the input image [18, 29]. Fast image segmentation is the main reason why a parallel implementation of GDT is desirable [10, 2, 7, 28]. There are two main approaches to GDT estimation: a chamfer distance propagation algorithm [15] and a wavefront propagation algorithm [27]. Both algorithms are sequential in nature, i.e. they are not directly parallelisable. The chamfer algorithm was selected for parallelisation in this paper due to its simple raster-scan access over the image data.

The rest of the paper is organised as follows. Section 2 provides some background on GDT and the chamfer distance propagation algorithm. Section 3 reviews previous attempts in the literature to parallelise the (Euclidean) distance transform. Our proposed parallel implementation of GDT is presented in Section 4. Section 5 evaluates the speed and accuracy of our parallel implementation on different images and different computers. Section 6 presents an application of GDT in superpixel segmentation of images. Section 7 concludes the paper.


[Figure: a) cost image f(x, y) with a minimum-cost path (cost = 1.7) and a straight path (cost = 11.1) from a source to a destination; b) geodesic distance transform from a single seed.]

Fig. 1. Minimum-cost path versus straight path on an uneven cost surface generated by the membrane function in Matlab.

2. BACKGROUND ON GEODESIC DISTANCE TRANSFORM

Geodesic distance, or topographical distance [16], is a grey-weighted distance between two points on a greyscale cost surface. The geodesic distance is calculated as the sum of pixel costs along a minimum-cost path joining the two points. An example is illustrated in Figure 1a, where the image intensities f(x, y) represent the cost of traversing each pixel. Two different paths from a source point in the middle of the image to a destination point at the top-right corner are drawn. The minimum-cost path in dotted cyan, despite being a longer path, integrates over a smaller total cost than the straight path in magenta (1.7 versus 11.1). The cost image f can be seen as a terrain surface, where the red blob corresponds to a high mountain. Figure 1a basically illustrates that going across a steep mountain incurs a much higher cost than going around its flat base to reach the other side. Figure 1b shows the GDT of the image in Figure 1a given one seed point at the centre of the image. The intensity of each pixel represents the geodesic distance from that pixel to the central seed point.

2.1. Chamfer distance propagation algorithm

GDT can be estimated efficiently using chamfer distance propagation [21]. The path between two pixels is approximated by discrete line segments of 1- or √2-pixel length connecting a pixel with one of its eight immediate neighbours. Initially, the distance transform at every pixel is set to infinity, except at the locations of the seed points, where the distance transform is zero. The distance transform at every pixel is then updated by an iterative distance propagation process. Each iteration comprises two passes over the image. A forward pass scans the image rows from top to bottom, each row scanned from left to right (Figure 2a). A backward pass scans the image rows from bottom up, each row scanned from right to left (Figure 2b).

[Figure: a) forward propagation; b) backward propagation.]

Fig. 2. One iteration of distance propagation comprises a forward pass followed by a backward pass.

The forward pass propagates the distance transform of four causal neighbours (shaded grey in Figure 2a) to the current pixel P(x, y) according to equation (1):

    d(x, y) = min { d(x-1, y-1) + b·f(x, y),
                    d(x,   y-1) + a·f(x, y),
                    d(x+1, y-1) + b·f(x, y),
                    d(x-1, y)   + a·f(x, y) }        (1)

where a = 0.9619 ≈ 1 and b = 1.3604 ≈ √2 are optimal chamfer coefficients for a 3×3 neighbourhood [4]. Similarly, the backward pass propagates the distance transform from four anti-causal neighbours (shaded grey in Figure 2b) to the current pixel P(x, y) according to equation (2):

    d(x, y) = min { d(x+1, y+1) + b·f(x, y),
                    d(x,   y+1) + a·f(x, y),
                    d(x-1, y+1) + b·f(x, y),
                    d(x+1, y)   + a·f(x, y) }        (2)

Equations (1) and (2) apply to pixels which have a full set of 8 immediate neighbours. Pixels at the image border need a different treatment because some of their neighbours are out of bounds. These out-of-bound neighbours are simply ignored in the distance propagation equations (1) and (2).

2.2. Example

An example of GDT with more than one seed point is given in Figure 3. Figures 3a-b show an input image and its gradient energy, respectively. The gradient energy is used as a non-negative cost image, from which the GDT is computed. Four seed points are shown as circles of different colours in Figure 3b. Figures 3c-d show intermediate distance transforms after a first forward and a first backward pass through the cost image (blue = low distance, red = high distance). In the first forward pass, the top-left region of the distance transform is not updated because these pixels do not have a seed in their causal path. After the first backward pass, the distance transform gradually settles into its final form, converging at the twentieth iteration (which looks very similar to the GDT after 10 iterations in Figure 3e). Many iterations are required because the minimum-cost paths are usually not straight; they require multiple distance propagations from different directions. Fortunately, fewer iterations are required if


[Figure: (a) input (320×240); (b) gradient energy; (c) intermediate GDT after 1st forward pass; (d) intermediate GDT after 1st backward pass; (e) GDT after 10 iterations; (f) nearest seed label after 1st forward pass; (g) nearest seed label after 1st backward pass; (h) nearest seed label after 10 iterations.]

Fig. 3. Geodesic distance transform and nearest seed label computed from the gradient energy image with 4 seed points.

there are more seeds, because the geodesic paths generally become shorter and hence do not contain many twists and turns.

The last row of Figure 3 shows the corresponding nearest seed labels of the intermediate distance transforms in the second row. Each coloured segment corresponds to a set of pixels with a common nearest seed point. Pixels with the same coloured label should be connected because they are connected to the common seed point via some geodesic paths. Fragmentation happens in Figure 3g because this is an intermediate result. After the GDT converges, the segmentation boundaries generally trace out strong edges in the scene (Figure 3h). This leads to the geodesic image segmentation algorithm presented later in Section 6.

3. LITERATURE SURVEY ON PARALLEL DISTANCE TRANSFORM

Most previous techniques on parallel distance transform compute the Euclidean Distance Transform (EDT) instead of GDT. EDT accepts a binary image and returns the Euclidean distance from each pixel to a nearest nonzero pixel in the binary image. EDT is a special case of GDT where the cost image is constant and positive. A squared Euclidean distance r² can be decomposed into two components x² + y², each of which can be estimated independently using a Voronoi diagram of the nonzero pixels in the binary image [6]. A parallel implementation of EDT using OpenMP on a 24-core system achieves an 18-time speedup [14]. A parallel implementation of the chamfer EDT was presented by Shyu et al. in [24]. This method computes the EDT on a distributed system. As a result, the intermediate results across different processors have to be synchronised using the Message Passing Interface (MPI). Similar to the original chamfer algorithm in [21], Shyu et al.'s implementation requires two passes over the image: a forward pass to propagate the distance transform from causal neighbours, followed by a backward pass to propagate the distance transform from anti-causal neighbours.

Fig. 4. Image partitioning strategy for a parallel chamfer distance transform on a distributed system [24] (the distances of shaded pixels are transmitted across processors).

To parallelise these sequential passes, Shyu et al. partition the input image into bands; the distance computation of each band is assigned to a processor. At each processor, the image band is further partitioned into parallelograms. The label of each parallelogram in Figure 4 specifies its order of processing (partitions n and n′ are processed concurrently). Due to the propagation of causal information, the parallelogram labelled 3′ on the second band must wait for the result of the parallelogram labelled 2 on the first band. The EDT of the last row of parallelogram 2 (shaded grey) must be transmitted to the next processor before parallelogram 3′ can be processed. After this first data transmission, processors 1 and 2 can work in parallel on their partitions 3 and 3′, respectively. This process of local distance propagation followed by data transmission repeats for partitions 4 and 4′, and so on.

4. PARALLEL GEODESIC DISTANCE TRANSFORM

This section presents our parallel implementation of GDT using OpenMP. Our implementation is motivated by the parallel implementation of the chamfer distance transform in [24]. Shyu et al.'s implementation, however, targets distributed-memory systems, in which data need to be synchronised across processors by message passing. Using the shared-memory model present in multicore CPUs, we avoid the need to synchronise data.

The iterative nature of GDT also allows a simpler image partitioning strategy. Unlike EDT, GDT requires more than one iteration of forward+backward passes. As a result, the GDT can be propagated from one image band to the next in a subsequent iteration, rather than within the current pass as in [24]. Our implementation therefore only uses a band-based image partitioning across different processors. This fits well with the parallel for construct in OpenMP.


Fig. 5. Band-based image partitioning strategy for the parallel implementation of geodesic distance transform in OpenMP (shaded pixels are visited in the current propagation iteration).

Figure 5 illustrates our band-based image partitioning strategy for a forward propagation of the GDT. The first image row is processed by the master thread outside any parallel processing block. The first row is treated differently from the rest because pixels on the first row have only one causal neighbour. The remaining image rows are partitioned into non-overlapping bands of equal height (called the chunk size in OpenMP terminology). Each band is processed concurrently by a different thread. If there are more bands than the total number of threads, the unprocessed bands are assigned to threads in a round-robin fashion (static scheduling) or to the next available thread (dynamic scheduling).

Pseudocode of the parallel implementation of GDT in OpenMP is given in Algorithm 1. Details of the distance propagation are handled in the functions fwdProp(), forwardPropagationFirstRow(), bwdProp(), and backwardPropagationLastRow(). This pseudocode differs from a non-parallel implementation of GDT only in the shaded lines, where a compiler directive appears just before a standard for loop in C. This omp parallel for directive tells the master thread to create a team of parallel threads to process the for loop iterations. When the team of threads completes the statements in the for loop, they synchronise and terminate, leaving only the master thread running. This process is known as the fork-join model of parallel execution [5].

One important requirement in parallel programming is that the parallel region must be thread-safe. In other words, each iteration of the for loop should be executable independently, without interaction across different threads (e.g., no data dependencies). In GDT, this means the distance propagation within one band should not wait for the result of the previous band. Thread 2 in Figure 5, for example, should not wait until Thread 1 finishes the computation of band 1. This means the GDT of band 1 is not propagated to band 2 within the current iteration (it will be in the next iteration). To avoid data dependencies and race conditions, variables undergoing change within each thread should be declared in the private clause of the parallel for directive.

Algorithm 1. Parallel chamfer distance transform (shaded rows are compiler directives to enable parallel computation).

    for (iter = 0; iter < 10; iter++)
    {
        // Forward propagation
        forwardPropagationFirstRow( ... );
        #pragma omp parallel for private( ... private variable declarations ... )
        for (i = 1; i < height; i++) { fwdProp( ... ); }

        // Backward propagation
        backwardPropagationLastRow( ... );
        #pragma omp parallel for private( ... private variable declarations ... )
        for (i = height - 2; i >= 0; i--) { bwdProp( ... ); }
    }   // End of iterative chamfer distance propagation

Because the computed distances from one thread are not used by other threads within the current iteration, it may take longer for the GDT to propagate distances from the top band to the bottom band and vice versa. However, given a dense sampling of seed points, each seed point only has a limited spatial range of influence. In other words, the distance transform at one pixel is never propagated more than a few bands away. The range of influence depends on seed density and chunk size. In general, a few iterations of forward+backward propagation (fewer than 30) are sufficient for most cases.

5. EVALUATION

We compare three different implementations of the chamfer-based geodesic distance transform: non-parallel, parallel using OpenMP with static scheduling (i.e. round-robin assignment of iterations to threads), and parallel using OpenMP with dynamic scheduling (tasks are assigned to the next available thread). Given an input image, the cost image is computed from the gradient energy plus a constant regularisation offset (e.g., the median gradient energy value), and the seeds from local gradient minima. Low-amplitude random noise is added to the cost image to produce evenly distributed local minima even in flat image regions.

5.1. Task scheduling model and chunk size

OpenMP allows two main types of task scheduling: static scheduling, where blocks of iterations are assigned to threads in a round-robin fashion, and dynamic scheduling, where the next block of iterations is assigned to the next available thread. The size of each block, a.k.a. the chunk size, is configurable. For static scheduling, the default chunk size is the number of iterations (i.e. the number of image rows in our case) divided by the number of threads.
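The two policies can be requested explicitly with the schedule clause, as in this sketch (illustrative; process_row() is a hypothetical stand-in for one row of distance propagation, and 161 is the default chunk for 1288 rows on 8 threads):

```c
#include <string.h>

enum { ROWS = 1288 };
static int visited[ROWS];   /* records that every row is processed exactly once */

static void process_row(int y) { visited[y] += 1; }   /* one owner per row: no race */

/* Static scheduling: blocks of 161 iterations are dealt to the threads
 * round-robin, fixed before the loop starts. */
void run_static(void)
{
    #pragma omp parallel for schedule(static, 161)
    for (int y = 0; y < ROWS; y++) process_row(y);
}

/* Dynamic scheduling: each block of 161 iterations is handed to the
 * next thread that becomes idle. */
void run_dynamic(void)
{
    #pragma omp parallel for schedule(dynamic, 161)
    for (int y = 0; y < ROWS; y++) process_row(y);
}
```

Either way every iteration executes exactly once; the policies differ only in which thread gets which block, which is one reason the two runtimes measured below are so close.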

To compare different scheduling methods and chunk sizes, we ran GDT on a 1936×1288 cost image (the gradient energy of the image in Figure 9) with 1017 evenly distributed seeds and measured the runtimes. The seeds were selected as local minima of the cost image using non-maximum suppression (NMS) [19] with a suppression radius (i.e. minimum separation distance) of 20 pixels. The GDT converges in 30 to 31 iterations for all runs with a chunk size greater than 10. The same experiment was carried out on two different machines: an Intel Xeon 2.8 GHz quad-core processor with 12 GB of RAM and the Microsoft Visual Studio 2010 compiler, and an Intel Core 2 Duo P9400 2.4 GHz dual-core processor with 4 GB of RAM and the Microsoft Visual Studio 2005 compiler. The runtimes on these two machines are plotted in Figure 6 for different chunk sizes, where each data point is averaged over ten repeated runs.

[Figure: a) 2.8 GHz quad-core (8 threads); b) 2.4 GHz dual-core (2 threads).]

Fig. 6. Runtime as a function of chunk size for different parallel implementations of GDT on a 2MP image with 1017 seeds and roughly 30 iterations of distance propagation.

Several conclusions can be drawn from Figure 6. There is little difference between the runtimes of static and dynamic scheduling (the red and blue lines). Both parallel implementations are significantly faster than the non-parallel implementation (green line). The speedup factor of parallel versus non-parallel reaches a maximum of 2.6 times on a quad-core machine and 1.3 times on a dual-core one. This maximum speedup occurs at the default chunk size, which is 1288/8 = 161 for the quad-core and 1288/2 = 644 for the dual-core machine (there are eight threads on a quad-core processor due to Intel's hyper-threading technology). The highest speed gain is also achieved at integer fractions (i.e. 1/2, 1/3, 1/4, ...) of the default chunk size. This is when the total number of iterations (1288 image rows) is evenly distributed amongst all threads. In short, static scheduling with the default chunk size works best for GDT. This default chunk size is therefore used in all subsequent experiments.

5.2. Number of iterations until convergence

We now show that the number of distance propagation iterations depends on the density of seed points. As stated earlier, the seed points are selected as local minima of the cost image using non-maximum suppression. We varied the NMS radius from 5 to 100 pixels, which results in a number of seed points ranging from 14000 down to 30, respectively.
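Seed selection by NMS can be sketched as follows (our illustrative sketch, not the paper's implementation: a pixel is kept as a seed only if no strictly smaller cost exists within its (2r+1)×(2r+1) window; in practice the low-amplitude noise mentioned above breaks ties on flat plateaus):

```c
/* Pick seeds as local minima of a w-by-h cost image with suppression
 * radius r.  Returns the number of seeds written (at most max_seeds). */
int pick_seeds(const float *cost, int w, int h, int r,
               int *seed_x, int *seed_y, int max_seeds)
{
    int n = 0;
    for (int y = 0; y < h && n < max_seeds; y++)
        for (int x = 0; x < w && n < max_seeds; x++) {
            int is_min = 1;
            /* scan the suppression window, clipped to the image bounds */
            for (int v = y - r; v <= y + r && is_min; v++)
                for (int u = x - r; u <= x + r && is_min; u++) {
                    if (u < 0 || v < 0 || u >= w || v >= h) continue;
                    if (cost[v * w + u] < cost[y * w + x]) is_min = 0;
                }
            if (is_min) { seed_x[n] = x; seed_y[n] = y; n++; }
        }
    return n;
}
```

A larger radius r suppresses more neighbours, so the seed count falls as the radius grows, which matches the 14000-to-30 trend described above.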

[Figure: a) number of GDT iterations; b) speedup on a quad-core CPU.]

Fig. 7. Number of iterations until convergence and speedup factor as a function of the number of seed points on a 2MP image.

Figure 7a plots the number of distance propagation iterations versus the number of seed points for the same 2MP image used in the previous experiment. As the seeds get denser, the minimum geodesic paths become shorter. Fewer iterations are therefore required to propagate the GDT. If the seeds are sparsely sampled (e.g. fewer than 1000 seeds for a 2MP image), the parallel implementations require more iterations to complete the GDT compared to the non-parallel one. The reason for this was mentioned at the end of Section 4. For more than 500 seeds per megapixel, there is no difference in the number of iterations between the parallel and non-parallel implementations.

Because seed density affects the number of iterations, it also affects the speedup factor. Figure 7b plots the speedup factor of the two parallel implementations over the non-parallel one as a function of the number of seeds. Similar to the experiment in the previous subsection, the runtimes are averaged over ten identical runs to smooth out sudden glitches due to the processors being summoned for high-priority operating system tasks. The OpenMP implementations on a quad-core machine speed up GDT by a factor between 1.7 and 2.5. The maximum speedup is achieved when there are 500 seeds per megapixel (i.e. one seed for every 50×50 image block). The speedup factor reduces slightly when there are more than 500 seeds per megapixel.

5.3. Runtime for different image sizes

This subsection investigates the runtime and speedup factor of parallel GDT for different image sizes given the same seed selection strategy. Ten images with sizes ranging from 0.4 to 10 MP were chosen. For each image, the number of seeds is set to a default value equal to the square root of the number of pixels. Adaptive NMS (c_robust = 1) [3] is used on a negated cost image to produce an exact number of seed points. The runtime results are plotted in Figure 8, where the x-axis specifies the square root of the total number of pixels in the image (which is also the number of seed points, or the image width for square images).

[Figure: a) runtime; b) speedup factor.]

Fig. 8. Runtime and speedup factor for images of different sizes on a 2.8 GHz quad-core machine with 12 GB of RAM.

Figure 8a shows that it takes less than half a second to compute the GDT for a 3MP image. For a 10MP image, the runtime increases to 1.5 seconds. The runtime is linearly proportional to the number of pixels in the image (quadratically proportional to the image width, as shown in Figure 8a). However, the runtime is image-content dependent, as suggested by the two data points around an image width of 1500. Despite having a similar number of pixels, a 1936×1288 image took 0.28 seconds to compute its GDT, while a 1842×1380 image took 0.42 seconds (under static scheduling).

Figure 8b shows the speedup factor of the two parallel implementations over the non-parallel one. Once again, the speedup is image-content dependent. For 0.5MP images, the speedup factor ranges from 1 to 3 times. As the images get bigger, the speedup factor range shrinks to between 2 and 2.5 times. This variation is due to the different complexity of edges in each image.

6. APPLICATION: SUPERPIXEL SEGMENTATION

A superpixel is a group of connected pixels sharing some common properties such as intensity, colour or texture [20]. A useful superpixel segmentation partitions the image into regularly sized and shaped superpixels (i.e. close to round) that respect scene boundaries. This type of segmentation facilitates edge-preserving image processing because the processing can be done on individual superpixels, which do not include pixels across differently textured regions.

As mentioned earlier, GDT produces a label image in which each pixel is associated with its nearest seed label (nearest in terms of geodesic distance). Pixels with a common nearest seed are connected; together they form a superpixel. Using the strategy mentioned at the beginning of Section 5, where the cost image is the input image's gradient energy plus a small offset and the seed points are its local minima, the input image can be segmented into geodesic superpixels. To make the superpixels' shapes more regular, we move each seed point to its superpixel centroid [8] and rerun the geodesic distance transform. An example segmentation of a 2MP image into 1000 superpixels, using 3 iterations of seed recentroiding, each with 10 iterations of distance propagation, is given in Figure 9. Cyan lines denote the superpixel boundaries, and yellow dots denote the recentroidal seed points. The superpixel boundaries closely follow strong edges in the image. Note that these superpixels are not designed to cover every edge in the image, especially edges in highly textured areas. This is because geodesic superpixels are grown from well-separated seed points. They do not shrink to fit arbitrarily small regions commonly found in fine textures.

Fig. 9. 1000 geodesic superpixels on a 1936×1288 image.
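The recentroiding step can be sketched as follows (an illustrative sketch with hypothetical names, not the paper's code; label is assumed to be the nearest-seed label image produced by the GDT, with values in [0, nseeds)):

```c
#include <stdlib.h>   /* calloc, free */

/* Move each seed to the centroid of the pixels carrying its label, so
 * the superpixels become rounder when the GDT is rerun. */
void recentroid_seeds(const int *label, int w, int h,
                      int nseeds, int *seed_x, int *seed_y)
{
    long *sx  = calloc(nseeds, sizeof *sx);   /* per-label sum of x */
    long *sy  = calloc(nseeds, sizeof *sy);   /* per-label sum of y */
    long *cnt = calloc(nseeds, sizeof *cnt);  /* per-label pixel count */

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int l = label[y * w + x];
            sx[l] += x; sy[l] += y; cnt[l]++;
        }

    for (int l = 0; l < nseeds; l++)
        if (cnt[l] > 0) {   /* keep the old seed if its label vanished */
            seed_x[l] = (int)(sx[l] / cnt[l]);
            seed_y[l] = (int)(sy[l] / cnt[l]);
        }

    free(sx); free(sy); free(cnt);
}
```

Each recentroiding step is followed by a fresh distance transform, as in the 3-iteration schedule described above.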

We compared our superpixel segmentation result on a 968×644 image in Figure 10 against eight other segmentation methods:

• Watershed [16] with shallow region removal using Mathworks' Image Processing Toolbox (watershed and imhmin) and small region removal using our own Matlab implementation

• FH, i.e. graph-based segmentation [9], using a C implementation from the authors 1

• Quickshift [26] using a C implementation from VLFeat 2

• Entropy rate [13] using C/MEX code from the authors 3

• Centroidal Voronoi Tessellation (CVT) [8] using our own Matlab implementation

• Superpixel lattices [17] using a C/MEX implementation from the authors 4

• SLIC superpixels [1] using a command-line Windows executable from the authors 5

1 FH: http://people.cs.uchicago.edu/~pff/segment/
2 Quickshift: http://www.vlfeat.org/index.html
3 Entropy rate: http://www.umiacs.umd.edu/~mingyliu/
4 Superpixel lattices: http://web4.cs.ucl.ac.uk/research/vis/pvl/index.php?option=com_content&view=article&id=76:superpixel-lattices-code&catid=49:downloads&Itemid=62
5 SLIC: http://ivrg.epfl.ch/supplementary_material/RK_SLICSuperpixels/index.html


Fig. 10. Results of 9 different superpixel segmentation methods on a 968×644 image (images are ordered as in the table, # denotes the number of superpixels returned by the method).

• TurboPixels [12] using a Matlab implementation from the authors⁶

Default parameters were used for all methods, except for:

• FH: min area for region merging was tuned (=22) to produce a desired number of segments

• Quickshift: maxdist was tuned (=13) to produce a desired number of segments

• SLIC: spatial weight = 5 was chosen instead of 10 (default) for better edge-following superpixels

The results in Figure 10 show that only SLIC, TurboPixels and our method produce regular superpixels that follow scene boundaries. Watershed produces a good edge-following segmentation that rivals the recent graph-based and mean-shift techniques. Entropy rate superpixel segmentation produces irregular segments around flat image areas. CVT is regular but does not follow image edges. Superpixel lattices produce a blocky segmentation.

A close-up comparison of the three methods that produce the most regular, edge-following superpixels is given in Figure 11. SLIC superpixels follow edges well but have jaggy boundaries around textured areas. Our method produces the most regular and edge-following superpixels visually. TurboPixels produces more regular superpixels than SLIC but misses some strong edges. Geodesic superpixel segmentation is also the fastest method amongst the three presented. Ours is one order of magnitude faster than SLIC and two orders of

⁶ TurboPixels: http://www.cs.toronto.edu/~babalex/research.html

(a) SLIC superpixels (4.6 seconds)

(b) geodesic superpixels (0.64 seconds)

(c) TurboPixels (207 seconds)

Fig. 11. Comparison of 3 superpixel segmentation methods (runtime was measured on the full 2MP image in Figure 9).

magnitude faster than TurboPixels, using executables from the corresponding authors. This speed advantage is partly due to the parallel GDT implementation on a quad-core machine.

We also evaluated all nine superpixel methods using two measures of superpixel regularity. To measure size regularity, the standard deviation of all superpixels' areas is used. We normalised the standard deviation by the average superpixel area to yield a unit-free measure; the smaller the normalised standard deviation of superpixel size, the better. To measure shape regularity, we used a modified version of the isoperimetric quotient in [22]. The isoperimetric quotient is inverted so that a smaller measure means a more regular shape. This inverted isoperimetric quotient is computed as the ratio of superpixel Perimeter over the square root of its Area (P/√A). We averaged this ratio over all superpixels to obtain a single shape measure per method. The P/√A ratio has a theoretical lower bound of 2√π ≈ 3.54 for a circular segment. However, this lower bound is never achieved, since circles by themselves cannot form a 2D tessellation. Known tessellations such as the hexagonal and square grids have an average P/√A ratio of √(8√3) ≈ 3.72 and 4, respectively.

Figure 12 compares the size and shape regularity of the superpixels shown in Figure 10 over the whole image. As expected, CVT produces the smallest area deviation and average P/√A ratio. Irregular segmentation methods such as Watershed, FH and QuickShift, on the other hand, produce large values for both measures. Of the three edge-following superpixel methods, SLIC produces the most regularly sized but least regularly shaped superpixels, while TurboPixels produces the most regularly shaped but least regularly sized ones. Our geodesic method achieves a balance between size and shape regularity.

7. CONCLUSION

We have shown that the sequential chamfer algorithm for computing the geodesic distance transform can be modified for parallel implementation on multicore processors using OpenMP. The parallel implementations yield an exact GDT using a slightly higher number of iterations than a non-parallel implementation. However, the overall speed is increased when the parallel implementations are run on a multicore processor. A speedup factor of 1.3 is achieved on a dual-core machine and 2.6 on a quad-core machine. When


Fig. 12. Comparison of superpixel regularity from different methods (smaller is better).

applied to a gradient energy image with evenly distributed seeds, GDT can segment an image into regularly sized and shaped superpixels. Our geodesic superpixel segmentation produces regular, edge-following superpixels at a faster speed than many state-of-the-art methods.

8. ACKNOWLEDGMENT

The author would like to thank Khanh Doan and Ernest Wan for reviewing an earlier version of this paper.

9. REFERENCES

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods," PAMI, 34(11):2274–2282, 2012.

[2] X. Bai and G. Sapiro, "A geodesic framework for fast interactive image and video segmentation and matting," in Proc. of ICCV, 2007, pp. 510–517.

[3] M. Brown, R. Szeliski, and S. Winder, "Multi-image matching using multi-scale oriented patches," in Proc. of CVPR, 2005, pp. 510–517.

[4] M.A. Butt and P. Maragos, "Optimum design of chamfer distance transforms," IEEE Trans. on Image Processing, 7(10):1477–1484, 1998.

[5] B. Chapman, G. Jost, and R. van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, The MIT Press, 2007.

[6] D. Coeurjolly and A. Montanvert, "Optimal separable algorithms to compute the reverse Euclidean distance transformation and discrete medial axis in arbitrary dimension," PAMI, 29(3):437–448, Mar. 2007.

[7] A. Criminisi, T. Sharp, and A. Blake, "GeoS: Geodesic image segmentation," in Proc. of ECCV, 2008, pp. 99–112.

[8] Q. Du, V. Faber, and M. Gunzburger, "Centroidal Voronoi tessellations: Applications and algorithms," SIAM Review, 41(4):637–676, Dec. 1999.

[9] P.F. Felzenszwalb and D.P. Huttenlocher, "Efficient graph-based image segmentation," IJCV, 59(2):167–181, 2004.

[10] L. Grady, "Random walks for image segmentation," PAMI, 28(11):1768–1783, 2006.

[11] Intel, "Automatic parallelization with Intel compilers," in Intel guide for developing multithreaded applications. Intel Corporation, 2011.

[12] A. Levinshtein, A. Stere, K.N. Kutulakos, D.J. Fleet, S.J. Dickinson, and K. Siddiqi, "TurboPixels: Fast superpixels using geometric flows," PAMI, 31(12):2290–2297, 2009.

[13] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa, "Entropy rate superpixel segmentation," in Proc. of CVPR, 2011, pp. 2097–2104.

[14] D. Man, K. Uda, H. Ueyama, Y. Ito, and K. Nakano, "Implementations of parallel computation of Euclidean distance map in multicore processors and GPUs," in Proc. of the First Int'l Conf. on Networking and Computing, 2010, ICNC '10, pp. 120–127.

[15] P. Maragos and M.A. Butt, "Curve evolution, differential morphology, and distance transforms applied to multiscale and eikonal problems," Fundamenta Informaticae, 41(1-2):91–129, Jan. 2000.

[16] F. Meyer, "Topographic distance and watershed lines," Signal Processing, 38(1):113–125, July 1994.

[17] A.P. Moore, S. Prince, J. Warrell, U. Mohammed, and G. Jones, "Superpixel lattices," in Proc. of CVPR, 2008.

[18] G. Peyre, M. Pechaud, R. Keriven, and L.D. Cohen, "Geodesic methods in computer vision and graphics," Foundations and Trends in Computer Graphics and Vision, 5(3-4):197–397, 2010.

[19] T.Q. Pham, "Non-maximum suppression using fewer than two comparisons per pixel," in Proc. of ACIVS, 2010, pp. 438–451.

[20] X. Ren and J. Malik, "Learning a classification model for segmentation," in Proc. of ICCV, 2003.

[21] A. Rosenfeld and J.L. Pfaltz, "Distance functions on digital pictures," Pattern Recognition, 1(1):33–61, 1968.

[22] A. Schick, M. Fischer, and R. Stiefelhagen, "Measuring and evaluating the compactness of superpixels," in Proc. of ICPR, 2012, pp. 930–934.

[23] J. Shalf, J. Bashor, D. Patterson, K. Asanovic, K. Yelick, K. Keutzer, and T. Mattson, "The manycore revolution: Will HPC lead or follow?," SciDAC Review, 14:40–49, 2009.

[24] S.J. Shyu, T.W. Chou, and T.L. Chia, "Distance transformation in parallel," J. of Informatics & Electronics, 1(1):43–54, 2006.

[25] G. Slabaugh, R. Boyes, and X. Yang, "Multicore image processing with OpenMP," IEEE Signal Processing Magazine, 27(2):134–138, 2010.

[26] A. Vedaldi and S. Soatto, "Quick shift and kernel methods for mode seeking," in Proc. of ECCV (4), 2008, pp. 705–718.

[27] B.J. Verwer, P.W. Verbeek, and S.T. Dekker, "An efficient uniform cost algorithm applied to distance transforms," PAMI, 11(4):425–429, 1989.

[28] P. Wang, G. Zeng, R. Gan, J. Wang, and H. Zha, "Structure-sensitive superpixels via geodesic distance," IJCV, 103(1):1–21, 2013.

[29] G. Zeng, P. Wang, J. Wang, R. Gan, and H. Zha, "Structure-sensitive superpixels via geodesic distance," in Proc. of ICCV, 2011.