a topological approach to hierarchical segmentation using...

1
A Topological Approach to Hierarchical Segmentation using Mean Shift Sylvain Paris and Frédo Durand Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory Mean shift is a popular method to segment images and videos. Pixels are represented by feature points, and the segmentation is driven by the point density in feature space. In this paper, we introduce the use of Morse theory to interpret mean shift as a topological decomposition of the feature space into density modes. This allows us to build on the wa- tershed technique and design a new algorithm to compute mean-shift segmentations of images and videos. In addition, we introduce the use of topological persistence to create a segmentation hierarchy. We vali- dated our method by clustering images using color cues. In this context, our technique runs faster than previous work, especially on videos and large images. We evaluated accuracy with a classical benchmark which shows results on par with existing low-level techniques, i.e. we do not sacrifice accuracy for speed. ABSTRACT DEFINITION: MEAN-SHIFT SEGMENTATION CONTRIBUTIONS RUNNING TIMES (8 Mpixels) RESULTS Each pixel is associated to a feature point “position & color”. Iteration: Each point moves to the mean of its neighbors. Segmentation: Points with the same limit are grouped. Known result: This is an ascent process on the feature point density. OUR STRATEGY 1- Compute the density function on the whole feature space. 2- Extract the density modes. 3- Build a hierarchy using topological persistence. FAST COMPUTATION OF THE FEATURE POINT DENSITY We estimate the density function on a regular grid in fea- ture space. Since we use a Gaussian density estimator, the density function is band-limited. Thus, we can use a coarse grid to sample the feature space. Performances are further improved using a separable Gaussian convolution. We describe a fast method to compute a hierarchical segmentation of large , low-dimensional data sets, such as high-resolution digital photo- graphs and videos. We base our work on a new interpretation of the mean- shift algorithm as a topological decomposition of the feature space. FAST EXTRACTION OF THE DENSITY MODES We use an algorithm akin to watershed. We process grid nodes from high density to low density. We assign labels to a node depending on the number of different labels present in its 1-neighborhood. 0 label: The processed node is a local maximum. We create a new label. 1 label: All the ascending paths go to the same summit. We copy the label. 2+ labels: Boundary between two modes. We put a special marker. g 1 g 2 g 3 g 4 D Step 1: Create new label m 1 g 1 g 2 g 3 g 4 D Step 2: Create new label m 1 m 2 g 1 g 2 g 3 g 4 D Step 3: Copy label m 2 m 1 m 1 g 1 g 2 g 3 g 4 D Step 4: Mark boundary m 1 m 1 b m 2 BUILD A HIERARCHY D (a) stable manifolds D thr>p (b) Morse-Smale complex (c) simplified complex + + + - - D thr<p + + - D (d) simplified stable manifolds formal steps only, not executed in practice Persistence characterizes the importance of topological features. For a mode, it is defined as the difference between the height of its summit and the height of a saddle point linking to an adjacent mode. A hierarchy is built by merging adjacent modes. Low persistence modes are merged first. We prove that this is equivalent to simplifying the underlying Morse-Smale complex, and that mergers can be done on-the-fly. Our algorithm works well for low-dimensional feature spaces and large kernels. The Berkeley benchmark shows an accuracy on par with existing methods using color cues. We process 6 seconds of video in 5min51 with 5D feature points, in 40s with 4D feature points. Previous work needs about 10min. 10 2 100 200 8 16 32 64 128 256 EDISON (high speed setting) graph-based segmentation our method (off-line) our method (on-the-fly, thr = 1) our method (on-the-fly, thr = 1, PCA to 4D) our method (on-the-fly, thr = 1, PCA to 3D) spatial bandwidth in pixels (log scale) time in seconds (log scale) Input Sample levels of the hierarchy Our algorithm needs to be run only once to obtain all levels. sample 2D feature space mean-shift trajectories and mean-shift segments density of feature points evaluated on a coarse grid mean-shift trajectories on the 3D plot of the density function density density grid nodes

Upload: others

Post on 23-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Topological Approach to Hierarchical Segmentation using ...people.csail.mit.edu/sparis/publi/2007/cvpr/Paris... · A Topological Approach to Hierarchical Segmentation using Mean

A Topological Approach to Hierarchical Segmentation using Mean ShiftSylvain Paris and Frédo Durand

Massachusetts Institute of TechnologyComputer Science and Artificial Intelligence Laboratory

Mean shift is a popular method to segment images and videos. Pixels

are represented by feature points, and the segmentation is driven by the

point density in feature space. In this paper, we introduce the use of

Morse theory to interpret mean shift as a topological decomposition of

the feature space into density modes. This allows us to build on the wa-

tershed technique and design a new algorithm to compute mean-shift

segmentations of images and videos. In addition, we introduce the use

of topological persistence to create a segmentation hierarchy. We vali-

dated our method by clustering images using color cues. In this context,

our technique runs faster than previous work, especially on videos and

large images. We evaluated accuracy with a classical benchmark which

shows results on par with existing low-level techniques, i.e. we do not

sacrifice accuracy for speed.

ABSTRACT

DEFINITION: MEAN-SHIFT SEGMENTATION

CONTRIBUTIONS

RUNNING TIMES (8 Mpixels) RESULTS

Each pixel is associated to a feature point “position & color”.

Iteration: Each point moves to the mean of its neighbors.

Segmentation: Points with the same limit are grouped.

Known result: This is an ascent process on the feature point density.

OUR STRATEGY1- Compute the density function on the whole feature space.

2- Extract the density modes.

3- Build a hierarchy using topological persistence.

FAST COMPUTATION OF THE FEATURE POINT DENSITYWe estimate the density function on a regular grid in fea-

ture space. Since we use a Gaussian density estimator, the

density function is band-limited. Thus, we can use a coarse grid to sample the feature space. Performances are further

improved using a separable Gaussian convolution.

We describe a fast method to compute a hierarchical segmentation of

large, low-dimensional data sets, such as high-resolution digital photo-

graphs and videos. We base our work on a new interpretation of the mean-

shift algorithm as a topological decomposition of the feature space.

FAST EXTRACTION OF THE DENSITY MODESWe use an algorithm akin to watershed. We process grid nodes from high

density to low density. We assign labels to a node depending on the number

of different labels present in its 1-neighborhood.

0 label: The processed node is a local maximum. We create a new label.

1 label: All the ascending paths go to the same summit. We copy the label.

2+ labels: Boundary between two modes. We put a special marker.

g1 g2 g3 g4

D Step 1:Create new labelm1

g1 g2 g3 g4

DStep 2:Create new label

m1m2

g1 g2 g3 g4

DStep 3:Copy label

m2

m1

m1

g1 g2 g3 g4

DStep 4:Mark boundary

m1

m1 bm2

BUILD A HIERARCHY

D (a) stable manifolds D

thr>p

(b) Morse-Smale complex (c) simplified complex+ +

+-

-

D

thr<p

+ +

-

D (d) simplified stable manifolds

formal steps only, not executed in practice

Persistence characterizes the importance of topological features. For a

mode, it is defined as the difference between the height of its summit and the

height of a saddle point linking to an adjacent mode. A hierarchy is built by

merging adjacent modes. Low persistence modes are merged first.

We prove that this is equivalent to simplifying the underlying Morse-Smale

complex, and that mergers can be done on-the-fly.

Our algorithm works well for low-dimensional feature spaces

and large kernels.

The Berkeley benchmark shows an accuracy on par with existing methods using color cues.

We process 6 seconds of video in 5min51 with 5D feature points, in 40s with 4D feature points.

Previous work needs about 10min.

10

2

100

200

8 16 32 64 128 256

EDISON (high speed setting)graph-based segmentationour method (off-line)our method (on-the-fly, thr = 1)our method (on-the-fly, thr = 1, PCA to 4D)our method (on-the-fly, thr = 1, PCA to 3D)

spatial bandwidth in pixels (log scale)

time

in se

cond

s (lo

g sc

ale)

InputSample levels of the hierarchy

Our algorithm needs to be run only once to obtain all levels.

sample 2D feature space mean-shift trajectoriesand mean-shift segments

density of feature pointsevaluated on a coarse grid

mean-shift trajectorieson the 3D plot of the density function

dens

ityde

nsity

grid nodes