Content-based Image Retrieval using Intuitive Shape Partitioning
TRANSCRIPT
-
8/13/2019 Content-based Image Retrieval using intuitive Shape Partitioning
1/69
Technische Universität Hamburg-Harburg
Vision Systems
Prof. Dr.-Ing. R.-R. Grigat
Content-based Image Retrieval using
Intuitive Shape Partitioning
Studienarbeit
Andrey Galochkin
January 2007
In cooperation with Prof. Kamel, University of Waterloo
Declaration

I hereby declare that this work was produced by me independently, using only the listed aids.

Harburg, 5 January 2007
Abstract
In this thesis we present a novel query-by-example shape-based image retrieval system
that uses the correspondence of visual parts to assess the degree of similarity between
shapes. The visual parts are computed explicitly, based on cognitive principles of human perception. The developed method is robust to rotation, translation, scaling and moderate levels of noise. In addition, it can deal with articulated or partially occluded
shapes.
We compare our system with other part-based methods and evaluate its performance
using the MPEG-7 benchmark dataset.
Finally, we discuss the advantages and drawbacks of our system compared to global shape similarity measures, using the Contour Fourier method as an example.
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Problem definition
  1.2 Thesis outline

2 Background Theory
  2.1 Content-based image retrieval
    2.1.1 Architecture of CBIR systems
    2.1.2 Image descriptors
  2.2 Shape description techniques
    2.2.1 Demands on shape features
    2.2.2 Classification of shape descriptors
    2.2.3 Global descriptors
    2.2.4 Structural descriptors and partial shape matching

3 Cognitive Principles of Shape Partitioning
  3.1 The minima rule
  3.2 Boundary strength (minima salience)
  3.3 Cut length
  3.4 Relative area
  3.5 Protrusion
  3.6 Good continuation
  3.7 Convex partitioning
  3.8 Partitioning problems

4 The Developed System
  4.1 Definitions
  4.2 Overview
  4.3 Preprocessing
    4.3.1 Holes
    4.3.2 Reduce in size
    4.3.3 Extract boundary
    4.3.4 Adaptive smoothing
    4.3.5 Discrete curve evolution
    4.3.6 Insert auxiliary points
  4.4 Part segmentation
    4.4.1 SplitShape algorithm
    4.4.2 Merge parts
  4.5 Feature extraction
    4.5.1 Global features
    4.5.2 Local features
  4.6 Retrieval algorithm

5 Performance Evaluation
  5.1 Retrieval rate
  5.2 Time issues
    5.2.1 Feature extraction
    5.2.2 Retrieval
  5.3 Comparison to other part-based methods
    5.3.1 Shape tokens
    5.3.2 Skeletons
    5.3.3 Latecki NL

6 Conclusions

7 Future Work

Bibliography
List of Figures
2.1 Typical architecture of a content-based image retrieval system (reprinted from [19])
2.2 Classification of shape representation and description techniques (reprinted from [22])
2.3 An object (a) and its convex hull (b)
2.4 Reconstruction of a deer shape with an increasing number of FDs. The general form of an object can be described by the first few coefficients
2.5 A horse shape divided into tokens. The numbers corresponding to each token are its curvature and orientation (reprinted from [2])
2.6 The medial axis of a polygon is defined as the locus of centers of maximally inscribed disks (reprinted from [22])
2.7 The sensitivity of the medial axis to noise: small changes in the boundary may induce significant changes in the medial axis (reprinted from [18])
2.8 The parsing of the dog bone into parts at the branch points of the Medial Axis Transform (a) gives the same part structure to a rectangle (b) (reprinted from [15])
3.1 When two 3D shapes intersect, they generically create a concave crease at the locus of intersection (reprinted from [15])
3.2 Although any subset of an object is physically a part of it, human observers clearly find some parts perceptually natural (b), whereas others seem rather contrived (c) (reprinted from [15])
3.3 Sharper negative minima are stronger attractors of part cuts than weaker negative minima. In (b), a slight deviation of the part cut from negative minima looks clearly wrong. However, in (d) a deviation of identical magnitude appears less contrived (reprinted from [15])
3.4 The natural part cuts for the shape in (a) are shown in (b). Note that each of these cuts joins a negative minimum of curvature to a point of zero curvature. Simply joining the two negative minima, as in (c), leads to a perceptually unnatural parsing (adapted from [16])
3.5 The role of cut length in determining part cuts. The cut pq in (a) appears far more natural than the cut pr. This is also true in (b), where the areas of the two candidate parts have been equated (reprinted from [16])
3.6 An example of the role of good continuation in parsing. The horizontal cuts in (b) appear less natural than the vertical cuts in (c), even though the vertical cuts are longer (reprinted from [15])
3.7 (a) is naturally segmented using four part cuts (into a central core and four parts), whereas (b) is naturally segmented using two part cuts (into a large vertical body and two parts on the sides) [16]
4.1 A contour consisting of 27 points (P1 and P27 coincide)
4.2 A shape with the cutting segment P8P12. The part P8P12 is the sequence of points P8, P9, P10, P11, P12, P8
4.3 If holes in (a) are filled (b), the degree of similarity between (b) and other "lizzards" decreases. However, in some cases (c) holes should be filled (d)
4.4 Shapes before (a, c) and after (b, d) adaptive smoothing
4.5 A shape and its curvature. After smoothing only global extrema remain (red: maxima, blue: minima, green: inflection points)
4.6 A shape before (a) and after (b) discrete curve evolution
4.7 Contour of cellular_phone-04 after discrete curve evolution
4.8 Bad cuts (red). Because "good" points are missing, no "good" cuts exist here
4.9 After points have been inserted, intuitive partitioning is possible
4.10 (a) Incorrect partitioning of octopus-15. The part cut through the body is wrong, even though its start and end points are salient minima. (b) Correct partitioning
4.11 The correct partitioning (a) can be destroyed by an incorrect merge (b)
5.1 Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class (reprinted from [20])
5.2 Results of the MPEG-7 CE-Shape-1 part B test for each class, for both Contour Fourier descriptors and our part-based method
5.3 Twenty most similar images to device7-10 found by our method. Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black
5.4 Twenty most similar images found by the CFD method. Images are displayed as silhouettes because this method doesn't compute any parts
5.5 Twenty most similar images to ray-11 found by our method. Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black
5.6 Twenty most similar images to ray-11 found by the CFD method
5.7 Inconsistent partitioning makes it difficult to match shapes
List of Tables
5.1 Time needed to perform feature extraction and retrieval
Chapter 1
Introduction
1.1 Problem definition
Global shape similarity measures fail when the analyzed shapes are partially occluded, globally deformed, or have articulated parts. The solution to this problem is to apply part-based rather than whole-shape matching.
The main goal of this thesis project is to design algorithms that mimic the way humans
partition shapes, and then to carry out part-based matching that is robust to articulations and occlusions.
1.2 Thesis outline
The rest of this thesis is organized as follows:
Chapter 2 is a short survey of CBIR and image descriptors, with a focus on the shape descriptors that we used in our algorithms.
Chapter 3 explains some cognitive principles of shape partitioning.
Chapter 4 describes the image retrieval system developed in this project.
Chapter 5 evaluates the performance of the system.
Chapter 2
Background Theory
2.1 Content-based image retrieval
As digital image collections worldwide grow in size, searching such collections is becoming an important operation. In particular, there is an increasing need to describe the complex information of digital images by non-textual descriptions that can be used to efficiently search for similar images. The field within multimedia research that focuses on using information about the visual content of images (such as color, texture or shape) to search an image database is called content-based image retrieval (CBIR) [18].
One of the main advantages of the CBIR approach is that retrieval can be fully automatic, in contrast to the traditional keyword-based approach, which usually requires laborious and time-consuming prior annotation of the database images. CBIR technology has been used in applications such as fingerprint identification, biodiversity information systems, digital libraries, crime prevention, medicine and historical research.
2.1.1 Architecture of CBIR systems
Figure 2.1 shows a typical architecture of a content-based image retrieval system.
Figure 2.1: Typical architecture of a content-based image retrieval system (Reprinted
from [19])
Two main functionalities are supported: data insertion and query processing. The data
insertion subsystem is responsible for extracting appropriate features from images and
storing them into the image database (see dashed modules and arrows). This process is
usually performed off-line. The query processing, in turn, is organized as follows: the
interface allows a user to specify a query by means of a query pattern and to visualize the
retrieved similar images. The query-processing module extracts a feature vector from
a query pattern and applies a metric (such as the Euclidean distance) to evaluate the
similarity between the query image and the database images. Next, it ranks the database
images in a decreasing order of similarity to the query image and forwards the most
similar images to the interface module. Database images are often indexed according
to their feature vectors to speed up retrieval and similarity computation [19]. Note that
both the data insertion and the query processing functionalities use the feature vector
extraction module.
The CBIR system that we developed in this thesis is structured in a very similar way.
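The ranking step described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis; the image identifiers and the two-dimensional feature vectors are invented for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_database(query_vec, database):
    """Return (image_id, distance) pairs sorted by increasing distance,
    i.e. by decreasing similarity to the query image."""
    scored = [(img_id, euclidean(query_vec, vec))
              for img_id, vec in database.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical database: image identifiers mapped to stored feature vectors.
db = {"horse-01": [0.2, 0.9], "deer-03": [0.8, 0.1], "horse-02": [0.25, 0.85]}
print(rank_database([0.2, 0.9], db)[0][0])  # → horse-01
```

In a real system the sorted list would be truncated to the k most similar images and forwarded to the interface module.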
2.1.2 Image descriptors
An image descriptor is a pair consisting of a feature vector extraction function and a distance function, used to index images by similarity. The extracted feature vector subsumes the image properties, and the distance function measures the dissimilarity between two images with respect to those properties [19].
This section aims to present a brief overview of existing image descriptors. Even though
the image retrieval system developed in this thesis deals only with shape-based retrieval
for binary images, we will be discussing, for completeness, color and texture in addition
to shape features used in image retrieval.
2.1.2.1 Color
The color feature is one of the most widely used visual features in image retrieval.
It is relatively robust to background complication and independent of image size and
orientation [19].
Color description techniques can be grouped into two classes based on whether or not
they encode information related to the color spatial distribution.
Examples of descriptors that do not incorporate spatial color distribution include Color
Histogram, Color Moments and Color Sets. Color Histogram is the most commonly
used descriptor in image retrieval. Statistically, it denotes the joint probability of the
intensities of the color channels, e.g. RGB [12].
On the other hand, descriptors such as the Color Coherence Vector (CCV), Border/Interior Pixel Classification (BIC) and the Color Correlogram incorporate the spatial distribution of color [19].
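As an illustration of the simplest of these descriptors, a quantized joint RGB histogram can be computed as follows. This is a minimal sketch, not code from the thesis; the flat list of pixel triples and the number of bins are assumptions made for the example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Joint RGB histogram: each pixel (r, g, b) with channels in 0..255 is
    quantized to bins_per_channel levels per channel and counted in a flat
    histogram of bins_per_channel**3 cells."""
    n = bins_per_channel
    step = 256 // n
    hist = [0] * (n ** 3)
    for r, g, b in pixels:
        hist[(r // step) * n * n + (g // step) * n + (b // step)] += 1
    total = len(pixels)
    # Normalizing by the pixel count makes the descriptor independent of image size.
    return [count / total for count in hist]
```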
2.1.2.2 Texture
This image property can be characterized by the existence of basic primitives, whose
spatial distribution creates some visual patterns defined in terms of granularity, direc-
tionality and repetitiveness. There exist different approaches to extracting and representing textures. They can be classified into space-based models, frequency-based models, and texture signatures [19].
The co-occurrence matrix is one of the most traditional techniques for encoding texture information. It describes spatial relationships among grey levels in an image: the cell at position (i, j) registers the probability that two pixels of grey levels i and j occur at a given relative position. A set of co-occurrence statistics (such as energy, entropy and contrast) has been proposed to characterize textured regions. Another example of a space-based method is the use of Auto-Regressive Models.
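A minimal sketch of such a normalized co-occurrence matrix for a small grey-level image follows. It is illustrative only; the row-major list-of-lists image format and the single (row, col) offset are assumptions of the example.

```python
def cooccurrence(img, offset=(0, 1), levels=4):
    """Normalized grey-level co-occurrence matrix: entry (i, j) estimates the
    probability that a pixel of level i has a neighbour of level j at the
    given (row, col) offset."""
    dr, dc = offset
    rows, cols = len(img), len(img[0])
    mat = [[0.0] * levels for _ in range(levels)]
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                mat[img[r][c]][img[r2][c2]] += 1
                pairs += 1
    return [[v / pairs for v in row] for row in mat]
```

Statistics such as energy or entropy can then be computed directly from the matrix entries.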
Frequency-based texture descriptors include, for instance, the Gabor wavelet coefficients, which were found to be the best among the tested candidates at matching the results of human vision studies [12].
An example of a texture signature can be found in the proposal of Tamura et al. This descriptor aims to characterize texture information in terms of contrast, coarseness and directionality. The MPEG-7 initiative proposed three texture descriptors: the texture browsing descriptor, the homogeneous texture descriptor, and the local edge histogram descriptor [19].
2.2 Shape description techniques
Shape has been defined as "all the geometrical information that remains when location, scale and rotational effects are filtered out from an object" [17]. Shape description is the extraction of shape features in order to quantify important properties of the shape.
2.2.1 Demands on shape features
Petrakis et al. [10] state that, among others, the following properties are important for reliable shape matching and retrieval:
- invariance to translation, rotation and scale,
- robustness to noise and deformations,
- computational efficiency,
- compactness (the features require little storage space).
In our algorithms we used only features that meet these requirements.
2.2.2 Classification of shape descriptors
Shape descriptors are classified into boundary-based (or contour-based) and region-based methods. This classification takes into account whether shape features are extracted from the contour only or from the whole shape region. These two classes, in turn, can be divided into structural (local) and global descriptors, depending on whether the shape is represented as a whole or by segments/sections. Another possible classification divides shape description methods into spatial and transform domain techniques, depending on whether direct measurements of the shape are used or a transformation is applied [22].
Figure 2.2: Classification of shape representation and description techniques (Reprinted
from [22])
Next, we present an overview of the shape descriptors relevant for the rest of this thesis.
2.2.3 Global descriptors
Perimeter is the length of the shape boundary, i.e. the number of pixels on the boundary. This feature is often used to normalize curves to unit length (e.g. in the discrete curve
evolution that will be described later).
Area is the number of pixels constituting the shape.
Roundness is roughly correlated with the complexity of the contour and can be computed as R = 4π · Area / Perimeter². Roundness equals one for a circle and zero for a line segment.
Convex hull
A region R is convex if and only if, for any two points P1, P2 ∈ R, the whole line segment with end points P1 and P2 is also inside R. The convex hull of a region is the smallest convex region H that satisfies the condition R ⊆ H [22].
Figure 2.3: An object (a) and its convex hull (b)
Solidity is defined as the ratio of the shape's area to the area of its convex hull and measures the deviation of a shape from being totally convex.
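For concreteness, the roundness formula can be checked on a circle, for which it must equal one. This is a small sketch assuming the standard formula R = 4π · Area / Perimeter², not code from the thesis.

```python
import math

def roundness(area, perimeter):
    """R = 4*pi*Area / Perimeter**2: equals 1 for a circle and tends to 0
    as the contour becomes elongated or complex."""
    return 4 * math.pi * area / perimeter ** 2

# A circle of radius r has area pi*r**2 and perimeter 2*pi*r, so R = 1.
r = 5.0
print(roundness(math.pi * r * r, 2 * math.pi * r))  # ≈ 1.0
```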
2.2.3.1 Shape signatures
In general, a shape signature is any 1-D function representing 2-D areas or boundaries [21]. Assume the shape boundary coordinates (x(t), y(t)), t = 0, 1, ..., L−1, have been extracted in the preprocessing stage. Then we can define the
Complex coordinates function,
which is simply the complex number generated from the boundary coordinates:
z(t) = x(t) + i y(t)
In order to eliminate the effect of bias, we use the shifted coordinates function:
z(t) = [x(t) − xc] + i [y(t) − yc]

where (xc, yc) is the centroid of the shape, i.e. the average of the boundary coordinates:

xc = (1/L) Σ_{t=0}^{L−1} x(t),   yc = (1/L) Σ_{t=0}^{L−1} y(t)
This shift makes the shape representation invariant to translation. An important property of this representation is that it is information preserving, i.e. it allows full reconstruction of the contour of the shape [21].
2.2.3.2 Fourier descriptors
For a given shape signature s(t), t = 0, 1, ..., N−1, described as above and assumed to be normalized to N points in the sampling stage, the discrete Fourier transform of s(t) is given by

u_n = (1/N) Σ_{t=0}^{N−1} s(t) exp(−j 2π n t / N),   n = 0, 1, ..., N−1
The coefficients u_n, n = 0, 1, ..., N−1, are called the
Fourier descriptors (FD) of the shape, denoted FD_n, n = 0, 1, ..., N−1 [21]. Rotation invariance of the FDs is achieved by ignoring the phase information and taking only the magnitude values of the FDs.
For the complex coordinates signature, all N descriptors except the first one (the DC component) are needed to index the shape. The DC component depends only on the position of the shape; it is not useful for describing shape and is therefore discarded.
Scale normalization is achieved by dividing the magnitude values of all the other descriptors by the magnitude value of the second descriptor. The invariant feature vector used to index the shape is then given by [21]

f = [ |FD_2|/|FD_1|, |FD_3|/|FD_1|, ..., |FD_{N−1}|/|FD_1| ]
Typically, 10-15 descriptors are sufficient to describe a shape. We used N = 14 in our algorithms.
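The whole pipeline above (centroid-shifted complex signature, DFT, dropping the DC term, magnitude and scale normalization) can be condensed into a short sketch. It is illustrative, not the thesis implementation; a direct O(N²) DFT is used instead of an FFT for clarity.

```python
import cmath
import math

def fourier_descriptors(boundary, n_fds=14):
    """Invariant FD feature vector for a closed boundary given as (x, y) points.
    Steps: centroid-shifted complex signature -> DFT -> drop the DC term ->
    take magnitudes (rotation invariance) -> divide by |FD1| (scale invariance)."""
    L = len(boundary)
    xc = sum(x for x, _ in boundary) / L
    yc = sum(y for _, y in boundary) / L
    z = [complex(x - xc, y - yc) for x, y in boundary]

    def u(n):  # n-th DFT coefficient of the signature
        return sum(z[t] * cmath.exp(-2j * math.pi * n * t / L)
                   for t in range(L)) / L

    mags = [abs(u(n)) for n in range(1, n_fds + 1)]  # skip the DC term (n = 0)
    return [m / mags[0] for m in mags[1:]]
```

For a perfect circle the signature is a single complex exponential, so every normalized descriptor is (numerically) zero.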
Figure 2.4: Reconstruction of a deer shape with increasing number of FDs. The general
form of an object can be described by the first few coefficients.
2.2.4 Structural descriptors and partial shape matching
With the structural approach, shapes are broken down into boundary segments called primitives. Structural methods differ in the selection of
primitives and the organization of the primitives for shape representation
[22].
2.2.4.1 Shape tokens
In [2], the curvature zero-crossing points of a Gaussian-smoothed boundary are used to obtain primitives called tokens (Fig. 2.5). The features of each token are its maximum curvature and its orientation, and the similarity between two tokens is measured by a weighted Euclidean distance.
Figure 2.5: A horse shape has been divided into different tokens. The numbers cor-
responding to each token are the curvature and the orientation of the token. (Reprinted
from [2]).
Since the feature includes curve orientation, it is not rotation invariant.
The authors addressed the problem, but did not solve it.
Given a query shape, the retrieval of similar shapes from the database
takes two steps. The first step is token retrieval: for all N tokens of the query shape, similar tokens are found by traversing the index tree N times. The set of retrieved tokens having the same shape identifier
form a potential similar shape. The second step is to match the query shape and each potential similar shape using a model-by-model matching algorithm, which finds the best match between the tokens of the two shapes and involves O(MN) operations (M and N are the numbers of tokens of the two matching shapes, respectively). Matching of tokens in both steps involves thresholds that are ad hoc or empirical. Quantitative retrieval performance (precision and recall) and retrieval efficiency are reported for a shape database extracted from classical paintings. Since the tree is traversed a number of times during shape matching, it is not clear whether the indexing is better than model-by-model matching; only matching performance using different trees is reported. The matching efficiency also depends on the number of tokens of each shape and on the scale used in the smoothing stage [22].
2.2.4.2 Visual parts
Latecki et al. [9] presented a shape matching approach that works directly on closed boundaries. It is based on visual parts (VP): (part of) a database shape is simplified in the context of the query shape prior to matching. The simplification process eliminates particular points from the database shape such that the similarity to the query shape is maximized. The main disadvantage of this method is the high computational complexity of the matching algorithm, which is O(N³ log N), where N is the number of boundary points [1].
2.2.4.3 Skeletons
The basic idea is to eliminate redundant information while retaining only
the topological structure of the object. Skeletons can be computed by the medial axis transform. The medial axis is the locus of the centers of
maximal circles that fit within the shape, as illustrated in Fig. 2.6.
Figure 2.6: The medial axis of a polygon is defined as the locus of centers of maximally
inscribed disks. (Reprinted from [22]).
The skeleton is then segmented and represented as a graph according to certain criteria, and matching between shapes becomes a graph matching problem. This method is sensitive to noise and computationally expensive [22].
Figure 2.7: The sensitivity of the medial axis to noise: small changes in the boundary may induce significant changes in the medial axis. (Reprinted from [18]).
Another disadvantage of skeleton-based shape partitioning is that it can produce unintuitive results (Fig. 2.8).
Figure 2.8: The parsing of the dog bone into parts at the branch points of the Medial
Axis Transform (a) gives the same part structure to a rectangle (b).
(Reprinted from [15]).
Chapter 3
Cognitive Principles of Shape Partitioning
There is strong evidence from cognitive psychology that humans recog-
nize objects by first decomposing them into parts. Human vision orga-
nizes object shapes in terms of parts and their spatial relationships. We
perceive a human hand, for example, as a coherent perceptual object; but
also as a spatial arrangement of clearly defined parts: five fingers and a
palm. Hence, perceptual units exist at many levels: at the level of whole
objects, at the level of parts, and possibly smaller parts nested within
larger ones [15].
In this chapter we summarize the main findings in this research area be-
cause they were explicitly used in the design of our algorithms.
3.1 The minima rule
Cognitive experiments have shown that humans perceive as a boundary
between two "parts" a segment containing at least one point of negative
curvature [3]. The reason is that when two convex parts overlap, their
boundary in most cases contains one or two points of negative curvature
(see Fig. 3.1).
Figure 3.1: When two 3D shapes intersect, they generically create a concave crease at
the locus of intersection (reprinted from [15]).
Therefore we can define the
Minima Rule for Silhouettes:
Divide silhouettes into parts using points of negative minima of curvature
on their bounding contour as boundaries between parts [3].
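On a polygonal contour, curvature can be approximated by the signed turn angle at each vertex, and the minima rule then amounts to collecting the local minima with negative sign. The following is a sketch under these assumptions; the discrete turn-angle approximation is our illustration, not the thesis's curvature estimator.

```python
import math

def turn_angles(contour):
    """Signed turn angle at each vertex of a closed polygonal contour
    (positive at convex corners of a counter-clockwise contour); used
    here as a discrete stand-in for curvature."""
    n = len(contour)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = contour[i - 1], contour[i], contour[(i + 1) % n]
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        d = a2 - a1
        while d > math.pi:
            d -= 2 * math.pi
        while d < -math.pi:
            d += 2 * math.pi
        angles.append(d)
    return angles

def negative_minima(contour):
    """Indices of vertices that are local minima with negative curvature:
    candidate part boundaries under the minima rule."""
    k = turn_angles(contour)
    n = len(k)
    return [i for i in range(n)
            if k[i] < 0 and k[i] <= k[i - 1] and k[i] <= k[(i + 1) % n]]

# A square with a concave notch at the top; vertex (2, 2) is the notch.
shape = [(0, 0), (4, 0), (4, 4), (3, 4), (2, 2), (1, 4), (0, 4)]
print(negative_minima(shape))  # → [4]
```

For a counter-clockwise contour, convex corners give positive turn angles and concave notches give negative ones, so only the notch vertex is reported above.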
Figure 3.2: Although any subset of an object is physically a part of it, human observers clearly find some parts perceptually natural (b), whereas others seem rather contrived (c) (reprinted from [15]).
3.2 Boundary strength (minima salience)
The sharper a curvature minimum M, the more natural it is for a human observer to draw a cut through it [15].
Figure 3.3: Sharper negative minima are stronger attractors of part cuts than weaker negative minima. In (b), a slight deviation of the part cut from the negative minima looks clearly wrong. However, in (d) a deviation of identical magnitude appears less contrived (reprinted from [15]).
However, a good cut doesn't always connect two curvature minima, even
if they are very sharp. Thus geometric constraints in addition to the min-
ima rule are needed to define cuts, and hence the parts themselves.
For our current purposes, we take a part cut to be a straight-line segment joining two points on the outline of a silhouette such that
(1) at least one of the two points has negative curvature, and
(2) the entire segment lies in the interior of the shape.
Figure 3.4: The natural part cuts for the shape in (a) are shown in (b). Note that each of these cuts joins a negative minimum of curvature to a point of zero curvature. Simply joining the two negative minima, as in (c), leads to a perceptually unnatural parsing. (Adapted from [16])
3.3 Cut length
Consider the elbow in Figure 3.5. Cut pq on this elbow looks far more natural than cut pr. In Figure 3.5b, we have made the areas of the two segments equal, and pq is still the preferred cut, suggesting that the area of the parts is not what determines the cuts in these figures. Instead, examples like these suggest that human vision prefers to divide shapes into parts using the shortest possible cuts.
Figure 3.5: The role of cut length in determining part cuts. The cut pq in (a) appears far
more natural than the cut pr. This is also true in (b) where the areas of the two candidate
parts have been equated. (reprinted from [16])
3.4 Relative area
The salience of a part increases as the ratio of its visible area to the visible
area of the whole silhouette increases [15].
3.5 Protrusion
This factor is the degree to which a part sticks out from its object. Parts
that stick out more seem to be more salient [4]. It can be computed as the
ratio of the part perimeter to the length of the cutting segment.
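With the part boundary given as a polyline and the cut as a pair of points, this ratio is straightforward to compute. The following is a small sketch; the polyline representation is an assumption of the example.

```python
import math

def protrusion(part_boundary, cut):
    """Part salience factor: perimeter of the part's outer boundary divided
    by the length of the cutting segment; larger values mean the part
    sticks out more from its object."""
    perimeter = sum(math.dist(part_boundary[i], part_boundary[i + 1])
                    for i in range(len(part_boundary) - 1))
    return perimeter / math.dist(*cut)

# Three sides of a unit square, cut off by the fourth side:
print(protrusion([(0, 0), (0, 1), (1, 1), (1, 0)], ((0, 0), (1, 0))))  # → 3.0
```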
3.6 Good continuation
Consider the shape in Figure 3.6. Here the parsing induced by the shorter cuts (shown in Figure 3.6b) appears less natural than the one induced by the longer cuts (shown in Figure 3.6c).
Figure 3.6: An example of the role of good continuation in parsing. The horizontal cuts in (b) appear less natural than the vertical cuts in (c), even though the vertical cuts are longer. (Reprinted from [15])
There is another factor at play here, in addition to minimizing cut length: in Figure 3.6c each cut continues the directions of two tangents at the negative minima of curvature, whereas in Figure 3.6b it does not. Hence good continuation between a pair of tangents (one at each of the two part boundaries) is an important geometric factor for determining part cuts.
3.7 Convex partitioning
Rosin [11] showed that a partitioning scheme which maximizes the weighted sum of part convexities is closely related to Hoffman and Singh's part salience factors [4]. The idea is to produce a few solid parts with maximum relative area.
3.8 Partitioning problems
One of the main problems of intuitive shape partitioning is instability: small changes in a shape can cause significant changes in the part segmentation. In particular, partitioning is sensitive to the relative size of shape parts.
We addressed this issue when designing our algorithms.
Figure 3.7: (a) is naturally segmented using four part cuts (into a central core and four
parts), whereas (b) is naturally segmented using two part cuts (into a large vertical body
and two parts on the sides). [16]
Chapter 4
The Developed System
4.1 Definitions
Contour (Boundary): a sequence of points P1 P2 ... PN P1 that, when joined, forms a polygon without self-intersections (the contour of a shape). This sequence begins and ends with the same point to ensure that the polygon is closed.
Figure 4.1: A contour consisting of 27 points (P1 and P27 coincide)
Part:
Let M = Pi (the i-th point on the contour) and Q = Pj with j > i (mod N). Then the subset of the contour Pi Pi+1 ... Pj-1 Pj Pi is called the part MQ. The part has to have an area larger than the remaining area; if this is not the case, the part and the remaining shape are swapped. The segment MQ is called a cutting segment, or simply a cut.
Figure 4.2: A shape with the cutting segment P8P12. The part P8P12 is the sequence of points P8, P9, P10, P11, P12, P8.
As mentioned before, because of cognitive principles, a cutting segment always starts at a curvature minimum and must lie completely inside the shape contour.
The words "part" and "shape part" are used interchangeably within the scope of this thesis.
4.2 Overview
The developed system has two main functionalities: populating the feature database and retrieval.
In the populate_database method the visual parts of a shape are computed, their features are extracted, and the resulting matrix is saved to a file imagename_Features.mat.
Populate Database:
    read in an image
    reduce in size
    extract boundary
    compute global features
    compute curvature
    adaptive smoothing
    discrete curve evolution (leave only perceptually salient points)
    insert points enabling shortest or straight cuts
    iteratively split the shape using intuition
    merge incorrectly split parts
    extract features from each part
    save the feature matrix to a file

In this representation the rows are the found parts of the shape and the columns are the corresponding features.
Note: for a very solid shape, such as a circle, no parts will be found, i.e.
the shape will not be split. If two such shapes are to be matched, the
system simply computes the Euclidean distance between the two feature
vectors.
The retrieval algorithm is loosely based on the shape tokens retrieval scheme [2] and is described in detail in the next chapter.
The main idea is to match shapes on both the global and the local level:
    d1 : distance between global features
    d2 : distance between local features
    d = a·d1 + (1 - a)·d2
Note: the retrieval algorithm can only be run after the populate_database method, since it requires the extracted shape features.
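A minimal sketch of this weighted combination (Python; the weight a = 0.25 and the feature vectors below are illustrative, not values from the thesis):

```python
def combined_distance(global_q, global_c, part_dist, a=0.5):
    """Weighted sum d = a*d1 + (1 - a)*d2 of a global distance d1 and a
    part-based distance d2.

    global_q, global_c: global feature vectors of the query and the candidate.
    part_dist: precomputed part-based distance d2.
    a: weight of the global distance (default is a hypothetical choice).
    """
    d1 = sum((x - y) ** 2 for x, y in zip(global_q, global_c)) ** 0.5
    return a * d1 + (1 - a) * part_dist

# Identical global features (d1 = 0), so only the part-based term remains:
print(combined_distance([1.0, 2.0], [1.0, 2.0], part_dist=4.0, a=0.25))  # 3.0
```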
4.3 Preprocessing
4.3.1 Holes
In the pre-processing stage we did not deal with holes, because it is not always clear when holes should be filled or left open.
(a) (b) (c) (d)
Figure 4.3: If the holes in (a) are filled (b), the degree of similarity between (b) and other "lizzards" decreases. However, in some cases (c) holes should be filled (d).
4.3.2 Reduce in size
This step is required mainly for computational reasons, since the speed of all subsequent algorithms depends strongly on the number of boundary points. Empirically we found that N = 128·128 white pixels are sufficient to retain all perceptually important details of most images. Therefore, each image is pre-processed as follows:

    compute area (number of "turned on" pixels)
    if area > N
        reduce in size to make the area equal to N
    end
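Since isotropic scaling by a factor s changes the area by s², the required scale factor can be sketched as (Python; the function name is ours):

```python
def scale_factor(area, target=128 * 128):
    """Isotropic scale factor that brings a binary shape's area down to `target`.

    Scaling all lengths by s scales the area by s**2, so s = sqrt(target / area).
    Shapes that are already small enough are left unchanged.
    """
    return (target / area) ** 0.5 if area > target else 1.0

print(scale_factor(4 * 128 * 128))  # 0.5: halving both dimensions quarters the area
print(scale_factor(1000))           # 1.0: small shapes are not upscaled
```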
4.3.3 Extract boundary
Here we used the Matlab function bwtraceboundary with 8-connectivity to extract the parametrized coordinates [x, y] of the shape contour.
4.3.4 Adaptive smoothing
For further processing it is necessary to sufficiently reduce the level of
noise and to remove small details that decrease the degree of similarity
between shapes. The contour of the shape is iteratively smoothed until
the number of curvature extrema becomes sufficiently low (at most 20 curvature minima):

    set smoothing parameter P to an initial threshold
    repeat
        smooth the contour with parameter P
        count the curvature minima
        increase P
    until at most 20 curvature minima remain
Thus, complex shapes with many details are heavily smoothed whereas
"simple" shapes are left unchanged to prevent loss of information.
(a) (b)
(c) (d)
Figure 4.4: Shapes before (a,c) and after adaptive smoothing (b,d)
4.3.4.1 gauSmooth
The easiest and most computationally efficient way to smooth the boundary would be to simply reconstruct the curve using the previously computed Fourier descriptors Fz:
z=ifft(Fz); % inverse Fourier transform
xsmooth=real(z);
ysmooth=imag(z);
However, such reconstruction sometimes produces self-intersections of
the approximated curve and ringing, which would significantly affect the
outcome of the partitioning algorithm.
This is why we used Gaussian smoothing, which produces natural results (similar to the blur of a camera).
Here the curve point sequence is smoothed by circularly convolving it with a Gaussian

    f(x | μ, σ) = 1 / (σ√(2π)) · exp(-(x - μ)² / (2σ²))
sigma = smoothingParameter*length(x);
W = ceil(3*sigma);             % outside +-3*sigma the Gaussian is negligibly small
t = -W:W;                      % truncated support of the Gaussian
gau = normpdf(t, 0, sigma);
gau = gau/sum(gau);            % normalize so that the kernel sums to 1
xx = cconv(x, gau, length(x)); % circular convolution: smoothed x coordinates
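The same circular Gaussian smoothing can be sketched in plain Python (no toolbox functions; kernel truncated at ±3σ as above):

```python
import math

def gaussian_kernel(sigma):
    """Truncated, normalized Gaussian kernel sampled on [-3*sigma, 3*sigma]."""
    w = max(1, math.ceil(3 * sigma))
    g = [math.exp(-t * t / (2 * sigma * sigma)) for t in range(-w, w + 1)]
    s = sum(g)
    return [v / s for v in g]

def circular_smooth(x, sigma):
    """Circularly convolve the closed coordinate sequence x with a Gaussian."""
    g = gaussian_kernel(sigma)
    w = len(g) // 2
    n = len(x)
    return [sum(g[j + w] * x[(i + j) % n] for j in range(-w, w + 1))
            for i in range(n)]

# Smoothing a constant sequence leaves it unchanged, since the kernel sums to 1:
print(circular_smooth([1.0] * 8, sigma=1.0))  # eight values, all (close to) 1.0
```

The circular indexing `(i + j) % n` is what makes the convolution suitable for a closed contour: no boundary artifacts appear at the start and end of the point sequence.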
4.3.4.2 Compute curvature
Mathematically, a planar, continuous curve can be parameterized with respect to its arc length t and expressed as c(t) = {x(t), y(t)}.
Hence, the curvature κ(t) of c(t) at the point {x(t), y(t)} can be expressed as:

    κ(t) = (x'(t) y''(t) - x''(t) y'(t)) / (x'(t)² + y'(t)²)^(3/2)

The discrete derivatives are computed using the formulae

    x'(k) = (x(k+1) - x(k-1)) / 2
and

    x''(k) = (x'(k+1) - x'(k-1)) / 2
From these formulae one can see that the computed curvature is very sen-
sitive to noise, which results in a high number of detected extrema. Hence
the curve needs to be smoothed before further processing. The smooth-
ing parameter is adjusted as described in the adaptive smoothing function.
(a) (b)
Figure 4.5: A shape and its curvature. After smoothing only global extrema remain.
(red: maxima, blue: minima, green: inflection points).
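The discrete curvature computation can be sketched as follows (Python, on a circular point sequence; since the curvature formula is invariant under reparameterization, a circle of radius r should yield κ ≈ 1/r at every point):

```python
import math

def curvature(x, y):
    """Discrete curvature of a closed curve via central differences.

    Implements kappa = (x'*y'' - x''*y') / (x'^2 + y'^2)^(3/2), with the
    derivatives approximated as x'(k) = (x(k+1) - x(k-1)) / 2 on the
    circular point sequence.
    """
    n = len(x)
    d = lambda v, k: (v[(k + 1) % n] - v[(k - 1) % n]) / 2.0
    xt = [d(x, k) for k in range(n)]
    yt = [d(y, k) for k in range(n)]
    xtt = [d(xt, k) for k in range(n)]
    ytt = [d(yt, k) for k in range(n)]
    return [(xt[k] * ytt[k] - xtt[k] * yt[k]) /
            (xt[k] ** 2 + yt[k] ** 2) ** 1.5 for k in range(n)]

# Sanity check on a circle of radius 10 sampled at 200 points:
n, r = 200, 10.0
xs = [r * math.cos(2 * math.pi * k / n) for k in range(n)]
ys = [r * math.sin(2 * math.pi * k / n) for k in range(n)]
print(round(curvature(xs, ys)[0], 6))  # approximately 1/r = 0.1
```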
4.3.5 Discrete curve evolution
To reduce the computation time it is necessary that the shape boundary
consists of as few points as possible. A straight-forward downsampling
(take each n-th point of the boundary) has the disadvantage that some
perceptually significant points may be removed in areas of high detail
and too many insignificant points left in areas of low detail.
For example, to represent one period of a cosine wave, at least 5 points are
necessary. On the other hand, to downsample a line of arbitrary length,
just 2 points are enough.
We implemented the method described by Latecki [6,7].
In every evolution step, a pair of consecutive line segments s1, s2 is replaced by a single line segment joining the endpoints of s1 ∪ s2.
The key property of this evolution is the order of the substitutions. The substitution is performed according to a relevance measure K given by:

    K(s1, s2) = β(s1, s2) · l(s1) · l(s2) / (l(s1) + l(s2))

where the line segments s1, s2 are the polygon sides incident to a vertex v, β(s1, s2) is the turn angle at the common vertex of segments s1, s2, and l is the length function normalized with respect to the total length of the polygonal curve C. The main property of this relevance measure is [7, 9]: the higher the value of K(s1, s2), the larger the contribution of the arc s1 ∪ s2 to the shape. Given the input boundary polygon P with n vertices, DCE produces a sequence of simpler polygons P = Pn, Pn-1, ..., P3 such that Pn-(k+1) is obtained from Pn-k by removing the single vertex v whose shape contribution, measured by K, is smallest.
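A compact sketch of DCE with this relevance measure (Python; the `target` vertex count is a hypothetical parameter, whereas the thesis evolves the polygon all the way down to P3):

```python
import math

def turn_angle(p, v, q):
    """Absolute turn angle at vertex v between segments p-v and v-q."""
    a1 = math.atan2(v[1] - p[1], v[0] - p[0])
    a2 = math.atan2(q[1] - v[1], q[0] - v[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def dce(poly, target=3):
    """Discrete curve evolution: repeatedly delete the vertex with the smallest
    relevance K = beta * l1 * l2 / (l1 + l2), lengths normalized by the total
    polygon length."""
    poly = list(poly)
    while len(poly) > target:
        total = sum(math.dist(poly[i], poly[(i + 1) % len(poly)])
                    for i in range(len(poly)))
        def K(i):
            p, v, q = poly[i - 1], poly[i], poly[(i + 1) % len(poly)]
            l1 = math.dist(p, v) / total
            l2 = math.dist(v, q) / total
            return turn_angle(p, v, q) * l1 * l2 / (l1 + l2)
        poly.pop(min(range(len(poly)), key=K))
    return poly

# A collinear midpoint has turn angle 0, hence K = 0, and is removed first:
print(dce([(0, 0), (1, 0), (2, 0), (1, 2)], target=3))  # [(0, 0), (2, 0), (1, 2)]
```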
(a) (b)
Figure 4.6: A shape before (a) and after (b) discrete curve evolution
4.3.6 Insert auxiliary points
4.3.6.1 Motivation
Empirically we found that several points have to be inserted in order to partition a shape well. For example, if a cut has to be made through a point that has been removed by discrete curve evolution, then this good cut cannot be made.
Figure 4.7: Contour of cellular_phone-04 after discrete curve evolution.
For example, the shape of a cell phone apparently consists of two parts: the body and the antenna. However, since the body is rectangular, its sides are straight lines, and thus only the two endpoints of each line remain after the curve evolution.
Obviously, none of the possible cuts is intuitive.
Figure 4.8: Bad cuts (red). Because "good" points are missing, no "good" cuts exist here.
Therefore, we need to insert points that will most likely be used to build
part cuts. We used the "shortest cut" and "good continuation" rules.
4.3.6.2 insertShortestCut
As previously mentioned, humans prefer to partition shapes with segments of the shortest possible length. This means that such segments meet the opposite lines of the contour orthogonally. More formally:
for each curvature minimum M
    for each point Pi on the contour
        compute the vector v = Pi->Pi+1   % tangent at Pi
        compute the normal vector n ⊥ v   % orthogonal to Pi Pi+1

        L := M + k*n                      % line through M, in direction of n

        S = intersect(L, Pi Pi+1)         % intersection point of line L and
                                          % segment Pi Pi+1

        if between(S, Pi Pi+1)            % intersection within segment Pi Pi+1
            insert S                      % into the contour point sequence
        end
    end
end
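The core geometric step, finding the foot of the perpendicular dropped from a curvature minimum M onto a contour segment, can be sketched as (Python; names are ours):

```python
def perpendicular_foot(m, p, q):
    """Foot of the perpendicular from point m onto segment p-q,
    or None if the foot falls outside the segment."""
    px, py = p
    qx, qy = q
    mx, my = m
    dx, dy = qx - px, qy - py
    denom = dx * dx + dy * dy
    if denom == 0:
        return None                  # degenerate segment
    # Parameter t of the orthogonal projection of m onto the line p-q:
    t = ((mx - px) * dx + (my - py) * dy) / denom
    if not 0.0 <= t <= 1.0:
        return None                  # shortest cut would not hit this segment
    return (px + t * dx, py + t * dy)

print(perpendicular_foot((1, 2), (0, 0), (4, 0)))  # (1.0, 0.0)
print(perpendicular_foot((9, 2), (0, 0), (4, 0)))  # None: foot outside segment
```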
4.3.6.3 insertStraightCut
Continue each long line segment until it intersects some other segment.
Insert the intersection point into the sequence of boundary points.
Figure 4.9: After points have been inserted, intuitive partitioning is possible.
4.4 Part segmentation
4.4.1 SplitShape algorithm
The algorithm uses the previously mentioned cognitive principles, trying to split a shape the way a human would. The following pseudocode illustrates the main idea in strongly simplified form.
while remaining shape not convex
    for all curvature minima M
        for all points P on the boundary
            if admissible(cut M->P)
                compute
                    cutLength
                    areaC        % relative area of the candidate part
                    solC         % its solidity (= convexity)
                    solR         % solidity of the remaining shape
                    mnSalience   % salience of the start and end points

                F = areaC + solC + cutLength + solR + mnSalience

                save F           % value of the utility function for cut M->P
            end
        end
    end
    remove the parts with the highest F
end
A cut is admissible if it lies completely inside the shape boundary. To verify this for a cut MP, we designed a function lineInPoly that checks whether the segment MP intersects any of the existing line segments. (In other words, for all n, the points pn, pn+1 must lie on the same side of the line through M and P.) This solution is much better than the numerical one (generate 100 points between M and P and check whether all of them lie in the polygon).
Each component of the utility function F is cognitively motivated:
areaC
Part salience increases with relative area. Regions of a shape are nor-
mally perceived as parts only if they have some significant area relative to the original shape [15].
solC
Rosin [11] showed that a partitioning scheme which maximizes the weighted sum of part solidities is closely related to Hoffman and Singh's part salience factors [4]. Therefore, part salience increases with solidity.
solR
Empirically we found out that "good" parts, when removed from the orig-
inal shape, make it more solid. For example, the solidity of an X-shape is
about 0.3. However, when the four "legs" are removed, only the central
core remains, which has solidity=1.
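Solidity (shape area divided by convex hull area) can be sketched in pure Python; the plus-shaped test polygon below has area 5 and convex hull area 7, so its solidity is 5/7 ≈ 0.714 (the helpers and the toy polygon are ours):

```python
def shoelace(pts):
    """Absolute polygon area via the shoelace formula."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def convex_hull(pts):
    """Andrew's monotone-chain convex hull."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def half(points):
        h = []
        for p in points:
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1]) -
                                   (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = half(pts), half(pts[::-1])
    return lower[:-1] + upper[:-1]

def solidity(boundary):
    """Area of the shape divided by the area of its convex hull."""
    return shoelace(boundary) / shoelace(convex_hull(boundary))

# A plus/cross shape made of five unit squares of a 3x3 grid:
cross = [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (2, 2),
         (2, 3), (1, 3), (1, 2), (0, 2), (0, 1), (1, 1)]
print(round(solidity(cross), 3))  # 5/7 -> 0.714
```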
mnSalience
Hoffman and Singh [4] showed that sharper extrema of curvature are more powerful attractors of part cuts. Latecki [6, 7] demonstrated that the lengths of the corresponding tangents are also important.
Moreover, we found that the components are not significantly correlated
and thus all of them need to be computed. Each component has been
appropriately weighted, sometimes using nonlinear transformations.
After all part candidates have been examined, the algorithm:
- takes the part Pi with the highest value of the utility function, Fmax,
- finds all other parts Pj that are:
(1) disjoint with Pi (Pi and Pj have no more than 2 points in common),
(2) of utility function value Fj > 0.9·Fmax.
Criterion (1) ensures that any subregion of the shape belongs to exactly one part. Thus, it is forbidden to partition a leg (= foot + shin + hip) into (foot + shin) and (shin + hip).
Demand (2) is mainly due to computational reasons. To partition a shape in less time, it is better to remove several parts in one iteration. Moreover, if parts were removed one by one, the value of the utility function in later iterations would differ, making the overall result very noise-sensitive.
The described algorithm splits many simple shapes of the MPEG-7 dataset in just one iteration (such as device0..device7, apple, cell_phone, etc.).
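The selection step above can be sketched as follows (Python; the candidate scores and the overlap predicate below are toy stand-ins for the real part data, and this sketch also checks disjointness among all chosen parts):

```python
def select_parts(candidates, overlap, threshold=0.9):
    """Greedy selection: the best-scoring part plus all parts that are
    disjoint with the parts chosen so far and score within `threshold`
    of the maximum.

    candidates: list of (part_id, F) pairs.
    overlap(a, b): True if the two parts share more than two contour points.
    """
    best_id, f_max = max(candidates, key=lambda c: c[1])
    chosen = [best_id]
    for pid, f in candidates:
        if pid != best_id and f > threshold * f_max and \
           not any(overlap(pid, c) for c in chosen):
            chosen.append(pid)
    return chosen

# Toy example: "b" overlaps the winner "a" and is rejected despite its score.
pairs = {("a", "b"), ("b", "a")}
overlap = lambda x, y: (x, y) in pairs
print(select_parts([("a", 1.0), ("b", 0.95), ("c", 0.92), ("d", 0.5)], overlap))
# ['a', 'c']
```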
4.4.2 Merge parts
The main idea of this thesis is not only to partition shapes intuitively, but also to ensure that similar shapes are partitioned in the same way. Partitioning should also be robust to noise and moderate perturbations of the contour.
Since partitioning algorithms have no background knowledge about the
world, wrong cuts cannot be avoided.
(a) (b)
Figure 4.10: (a) incorrect partitioning of octopus-15. The part cut through the body is
wrong, even though its start and end points are salient minima. (b) Correct partitioning.
The merging algorithm consists of two phases. In the first phase, obvious errors are corrected, such as the splitting of a circle into two halves. These errors mainly occur in an attempt to make the remaining shape as convex as possible.
The key idea of the first phase is to merge two or more parts so that:
(1) the resulting part has high solidity,
(2) the merge is not "unfair" to other merge candidates.
For example, a regular pentacle (5-pointed star) is naturally split into a
pentagon with 5 triangles around it. We have to forbid a merge between
the pentagon and any of those triangles to remain "fair" and preserve
topology.
(a) (b)
Figure 4.11: the correct partitioning (a) can be destroyed by an incorrect merge (b)
The second phase of the algorithm is "topological" merge. Two or more
neighbouring parts are merged if each of them has more than one neigh-
bour. The rationale is that most living things or objects have their limbs
arranged around just one center.
(Another extension of the algorithm would be to merge several parts if they can be rotated in such a way that the result has high solidity. For example, the tail of a ray is basically a long cylinder which can be bent several times. Because of its low solidity, it will be split into about ten parts by the SplitShape algorithm. If all those parts are merged, the degree of similarity between a ray with a straight tail and one with a bent tail increases, making the retrieval robust to articulation of limbs.)
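The topological merge rule, in its simplest reading, can be sketched over a part-adjacency graph (Python; the graph below is a hypothetical beetle-like example, whereas the real algorithm operates on contour data):

```python
def topological_merge(neighbours):
    """Return the parts to be merged into a central core: all parts that
    have more than one neighbour.

    neighbours: dict mapping part_id -> set of adjacent part_ids.
    """
    return {p for p, ns in neighbours.items() if len(ns) > 1}

# Two torso halves touch each other and several legs; each leg touches
# only its torso half, so only the torso halves are merged:
graph = {
    "torso1": {"torso2", "leg1", "leg2"},
    "torso2": {"torso1", "leg3", "leg4"},
    "leg1": {"torso1"}, "leg2": {"torso1"},
    "leg3": {"torso2"}, "leg4": {"torso2"},
}
print(sorted(topological_merge(graph)))  # ['torso1', 'torso2']
```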
4.5 Feature extraction
4.5.1 Global features
The idea is to capture the overall appearance of the shape without going
into details. These features are computed right after the boundary extraction. As global features we took 14 normalized Fourier Descriptors
obtained by the Contour Fourier method [5]. The advantages of these
features are:
- robustness to noise,
- invariance to rotation, translation and scale,
- invariance to starting point on the boundary,
- computational efficiency.
4.5.2 Local features
To describe the segmented parts we used the following features:
1. Fourier descriptors (also 14, as in the case of global features)
2. roundness
3. relative area
4. solidity
5. number of neighbours
We found that Fourier descriptors are much better at describing convex shapes than features such as eccentricity. They can distinguish between shapes such as squares and hexagons, which is not the case with many other global features.
Roundness has very small values for elongated shapes such as sticks or pencils and is therefore good at detecting them.
The relative area indicates at which scale the parts should be matched during retrieval, since the system expects whole shapes as a query.
The solidity of most parts will be equal to one, because the SplitShape algorithm tries to remove only solid parts. However, during the topological merge several central parts can be merged together. For example, a beetle torso will most likely be split into two halves that are later merged together, because each half has several neighbours (legs or antennas).
The number of neighbours allows us to distinguish between classes such as device1 and device2. Both of these cogwheel-like structures have a central core with limbs around it. However, the former has six spikes whereas the latter has eight.
4.6 Retrieval algorithm
Once a reasonably intuitive shape partitioning has been found, partial
shape matching can be carried out which is more robust to articulations
and occlusions than whole shape matching.
For similarity computation we used a distance measure based on Eu-
clidean distance between feature vectors.
We found that with increasing solidity, partitioning becomes increasingly unstable and thus the part-based distance becomes unreliable. Therefore we decided to weight the distances depending on the shape's solidity.
compute the feature matrix of the query shape Q

for all part-feature matrices Mk in the database
    if Q has more rows than Mk
        swap(Mk, Q)
    end

    for each row ri of Q
        for each row rj of Mk
            d(j) = dist(ri, rj)   % Euclidean distance between two parts
        end
        totalDist += min(d)       % add the distance of the best matching parts
        remove the matched part from Mk
    end
    totalDist += penalty(unmatched parts)
end
Thus, the part-based shape distance is the sum of Euclidean distances between best-matching parts plus a penalty for unmatched parts.
The advantage of this algorithm is that once the shapes' features have been extracted, retrieval is fast, because only Euclidean distances have to be computed; this is in contrast to graph-matching algorithms, where most of the computation has to be done during retrieval. Typically it takes less than 10 ms to match two shapes of average complexity.
Another advantage is flexibility. By adjusting the penalty weight one can
implicitly set the threshold for tolerable occlusion percentage. For ex-
ample, if the penalty is set to zero (which is the case in the shape tokens
algorithm [2]), then it is enough to match all parts of one shape to com-
pute distance. Thus, the distance between a cogwheel and a circle would
be zero because the core of the cogwheel perfectly matches the circle.
However, we believe this is not the most intuitive result.
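The matching loop above can be sketched as follows (Python; the penalty weight of 1.0 per unmatched part and the toy feature rows are illustrative choices, not values from the thesis):

```python
def part_distance(Q, M, penalty=1.0):
    """Greedy part matching: each row of the smaller feature matrix is matched
    to its closest remaining row of the larger one; every unmatched row of the
    larger matrix then incurs a fixed penalty."""
    if len(Q) > len(M):
        Q, M = M, Q                      # ensure Q has fewer parts
    M = [list(r) for r in M]             # work on a copy
    total = 0.0
    for ri in Q:
        dists = [(sum((a - b) ** 2 for a, b in zip(ri, rj)) ** 0.5, j)
                 for j, rj in enumerate(M)]
        d, j = min(dists)
        total += d                       # distance of the best-matching part
        M.pop(j)                         # each part may be matched only once
    return total + penalty * len(M)      # penalize the unmatched parts

Q = [[0.0, 0.0]]
M = [[0.0, 1.0], [3.0, 4.0]]
print(part_distance(Q, M))  # best-match distance 1.0 + penalty 1.0 = 2.0
```

Setting `penalty=0.0` reproduces the zero-penalty behaviour discussed above, where a fully matched subset of parts already yields distance zero.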
Chapter 5
Performance Evaluation
5.1 Retrieval rate
To test the performance of our system we evaluated the retrieval rate on
the dataset created by the MPEG-7 committee for evaluation of shape
similarity measures [20]. The test set consists of 70 different classes of
shapes, each class containing 20 similar objects, usually (heavily) dis-
torted versions of a single base shape. The whole dataset therefore con-
sists of 1400 shapes. For example, each row in Figure 5.1 shows four shapes from the same class.
We focus our attention on the performance evaluation in experiments es-
tablished in Part B of the MPEG-7 CE-Shape-1 data set.
Each image was used as a query, and the retrieval rate is expressed by the so-called Bull's Eye Percentage (BEP): the fraction of images belonging to the same class among the top 40 matches. Since the maximum number of correct matches for a single query image is 20, the total number of correct matches is 28000.
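The score can be sketched as follows (Python; the toy rankings below are fabricated purely to exercise the formula, not MPEG-7 results):

```python
def bulls_eye(rankings, labels, top=40, per_class=20):
    """Bull's Eye Percentage: same-class shapes retrieved within the top
    `top` matches, summed over all queries and divided by the maximum
    possible number of correct matches (`per_class` per query)."""
    correct = sum(sum(1 for r in ranked[:top] if labels[r] == labels[q])
                  for q, ranked in rankings.items())
    return correct / (per_class * len(rankings))

# Toy database: 40 shapes in 2 classes of 20; two queries, each ranking
# all of its 20 classmates within the top 40:
labels = {i: i // 20 for i in range(40)}
rankings = {0: list(range(40)),
            20: list(range(20, 40)) + list(range(20))}
print(bulls_eye(rankings, labels))  # 1.0
```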
(a) (b)
Figure 5.1: Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class. (Reprinted from [20])
Strong shape variations within the same classes mean that no shape similarity measure achieves a 100% retrieval rate. See, e.g., the third row in (a) and the first and second rows in (b). The third row shows spoons that are more similar to shapes in other classes than to each other [20].
Figure 5.2: Results of the MPEG-7 CE-Shape-1 part B test for each class for both Contour Fourier descriptors and our part-based method (bar chart of the retrieval rate in % per class).
Figure 5.2 shows that our method (significantly) outperforms the CFD method for 55 classes. The Bull's Eye percentages are:
- our method: 63.536 %
- Contour Fourier method: 57.014 %
From Figure 5.2 one can see that our method performs best on shapes with a clear part structure and thus a stable (consistent) partitioning, such as device7, which is most logically split into a central core and 10 triangles around it.
Our method also deals better with occlusions and articulations than the CFD method (see Figure 5.5).
However, in the case of unstable partitioning the retrieval rate of our method decreases significantly.
5.2 Time issues
To run the Bull's Eye test we used an Intel Pentium 4 CPU at 2.26 GHz with 512 MB of RAM. The programs were unoptimized Matlab code.

                                         Our method    Contour Fourier method
Time to complete the Bull's Eye test     4 h 32 min    2 h 2 min
Average retrieval time for one query     11.65 s       5.23 s
Time to extract features                 117 min       20 min
Average time to extract features         5 s           0.85 s

Table 5.1: Time needed to perform feature extraction and retrieval.
5.2.1 Feature extraction
Table 5.1 shows that the proposed method is about six times slower than the CFD method. In fact, the CFD is a subset of our method (see the "compute global features" step). However, most of the computation during feature extraction is due to the convex hull construction. Nevertheless, this step cannot be left out because, as previously shown, convexity is one of the main features determining part saliency.
It also deserves mentioning that the CFD's speed is O(n) in the number of boundary pixels n and is completely independent of the shape (i.e. of the relative positions of the pixels). On the other hand, the speed of our partitioning algorithm depends strongly on the shape and is O(mn), where m is the number of curvature minima. However, this increase in complexity is bounded, because the adaptiveSmoothing and curveEvolution routines simplify the shape boundary and thus limit the number of minima.
In the best case the shape is convex and does not need to be split; then the algorithm is exactly as fast as the CFD. This explains why the feature extraction time ranges from 1 to 15 seconds per shape.
5.2.2 Retrieval
Here the retrieval times differ on average by a factor of two. Again, in the CFD case the speed is constant, because to compare two shapes only the distance between two feature vectors has to be computed.
Our system, on the other hand, needs to compute distances between each pair of parts in addition to the global feature vector. Thus, the time required grows quadratically in the number of shape parts. However, the number of parts is implicitly limited by the preceding curve smoothing, so that one can basically regard the increase in delay as a
constant factor.
We think that the retrieval speed can be improved by properly indexing
feature matrices. For example, one could sort them by the number of rows
(i.e. the number of shape parts) and then match only potential candidates.
We also believe that for a CBIR system the retrieval time is the most relevant, because this is the time the user has to wait for results. Moreover, feature extraction needs to be done only once, whereas the retrieval delay occurs every time.
5.3 Comparison to other part-based methods
We compared our system to Shape Tokens, Latecki NL and Skeleton-
based methods, which were briefly described in chapter 1. To obtain
the Bull's Eye scores we used http://give-lab.cs.uu.nl/sidestep. The authors of this website have reimplemented many popular shape-based algorithms, so we assume that the reported scores are correct.
5.3.1 Shape tokens
As mentioned before, this method is not rotation-invariant. It is also not robust to occlusions, since sufficiently large protrusions caused by noise can create extra inflection points and thus split shape tokens. Our method would simply regard such protrusions as new parts and remove them in the SplitShape method.
5.3.2 Skeletons
The main problems of skeleton-based matching are sensitivity to noise, high computational complexity and sometimes unintuitive partitioning (see chapter 1 for details). The advantage is robustness to articulations. Usually shock graphs need to be constructed and matched, which is very time-consuming. The Bull's Eye percentage reported by the aforementioned website is 68%.
5.3.3 Latecki NL
This method is more accurate than the previous two, with a Bull's Eye score of 72%. However, the price paid is O(n³ log(n)) computational complexity during matching.
Figure 5.3: Twenty most similar images to device7-10 found by our method, with distances d: device7-10 (0), device2-05 (1.71), device7-17 (1.73), device7-15 (1.83), device7-02 (1.88), device7-04 (1.9), device7-06 (2.23), device7-11 (2.45), device7-16 (2.54), device7-03 (2.55), device2-13 (2.58), device7-09 (2.64), device2-19 (2.66), device7-13 (2.66), device1-14 (2.72), octopus-18 (2.73), device2-17 (2.84), device7-19 (2.84), device7-01 (2.86), device7-05 (2.87). Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black.
Figure 5.4: Twenty most similar images to device7-10 found by the CFD method, with distances d: device7-10 (0), device2-05 (0.55), octopus-18 (1.31), device3-17 (1.4), device3-04 (1.42), device3-12 (1.42), hat-04 (1.44), device3-19 (1.48), device3-02 (1.53), device3-09 (1.54), device1-07 (1.6), device3-07 (1.6), device0-12 (1.63), device1-15 (1.63), device1-16 (1.65), device1-11 (1.66), device3-10 (1.66), device1-14 (1.66), device3-03 (1.66), device1-06 (1.67). Images are displayed as silhouettes because this method does not compute any parts.
Figure 5.5: Twenty most similar images to ray-11 found by our method, with distances d: ray-11 (0), ray-12 (0.98), ray-15 (1.55), ray-09 (1.56), ray-16 (1.57), ray-06 (1.75), ray-10 (1.78), ray-07 (1.94), ray-08 (1.97), ray-04 (1.98), cattle-03 (1.98), ray-13 (2.02), ray-03 (2.04), cattle-15 (2.05), cattle-07 (2.06), elephant-06 (2.07), cattle-02 (2.11), elephant-09 (2.11), elephant-03 (2.12), cattle-10 (2.12). Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black.
Figure 5.6: Twenty most similar images to ray-11 found by the CFD method, with distances d: ray-11 (0), ray-12 (0.6), ray-15 (1.04), ray-16 (1.05), butterfly-20 (1.32), butterfly-17 (1.4), ray-06 (1.46), elephant-09 (1.46), cattle-01 (1.46), ray-07 (1.48), elephant-06 (1.49), ray-19 (1.49), butterfly-14 (1.55), cattle-20 (1.56), horse-09 (1.57), horse-05 (1.57), camel-16 (1.59), ray-09 (1.63), horse-10 (1.64), deer-19 (1.64).
Figure 5.7: Partitionings of the shapes device6-01 through device6-20. Inconsistent partitioning makes it difficult to match shapes.
Chapter 6
Conclusions
As expected, the developed system performs best on shapes with a clear part structure (such as device1 or device7). It can distinguish between 5-, 8- and 10-pointed stars even in the presence of noise or articulations. Moreover, thanks to partial matching, it significantly outperforms global descriptors on partially occluded shapes (e.g. the classes ray, apple and octopus of the MPEG-7 Shape-1 dataset).

Problems may arise with shapes prone to unstable partitioning. Whenever a shape is split incorrectly (i.e. differently from the other members of its class), the result is a large shape distance. This explains why the retrieval rate is so low for the classes fly and dog.

To overcome this flaw, the developed system was extended to combine part-based and global descriptors: the shape distance is computed as a weighted sum of the global and part-based distances. Whenever a shape is prone to unstable partitioning (which is most often the case for high-solidity shapes), the algorithm gives the part-based distance less weight. In this way, the system tries to perform at least as well as the associated global descriptor.
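The combination described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names, the solidity threshold of 0.9 and the two weight values are assumptions chosen for the example.

```python
def solidity(shape_area, hull_area):
    """Solidity = area of the shape divided by the area of its convex hull."""
    return shape_area / hull_area

def combined_distance(d_part, d_global, sol, alpha_low=0.7, alpha_high=0.2):
    """Weighted sum of part-based and global shape distances.

    High-solidity shapes tend to be partitioned unstably, so the
    part-based distance receives less weight for them. The threshold
    (0.9) and the weights are illustrative assumptions.
    """
    alpha = alpha_high if sol > 0.9 else alpha_low
    return alpha * d_part + (1 - alpha) * d_global
```

With these illustrative weights, a compact high-solidity shape (e.g. sol = 0.95) is scored almost entirely by the global descriptor, while a shape with pronounced parts is scored mainly by the part-based distance.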
Chapter 7
Future Work
To improve the developed system, one could implement the following extensions:

- Take the relative positions of parts into account. Define features such as part orientation, and describe the position of each part in polar coordinates (distance from the shape centroid and polar angle).

- Merge part chains. Bent shapes (such as sea_snake) are currently split into many convex segments, although topologically such shapes have no part structure. Curved parts could be described by their solidity or bending energy. This would allow detecting the similarity between bone and broken_bone.

- To make matching more robust to partitioning errors, allow several parts to be matched to one part, or even N parts to M parts at once. Merge or split parts at runtime to better match the query shape. This extension would be relatively easy to implement because the algorithm already saves the computed parts in the database.
- Compute several representations for each shape (the most probable partitions). The shape distance is then the smallest distance over all pairs of such partitions. (This would, of course, slow down both the feature extraction and retrieval processes, scaling as O(n²).)
- Use a more powerful global descriptor, or a combination thereof, for example the multiscale Fourier descriptor [5].
- Use envelope detection, or simply the convex hull, before extracting shape boundaries. This would allow shapes such as the distorted pentagons, triangles and squares from the MPEG-7 Shape-1 dataset to be classified correctly.
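The first extension above — describing each part's position in polar coordinates relative to the shape centroid — could look roughly like this. It is only a sketch under assumptions: the centroid is approximated by the mean of the contour vertices, and the normalization by the maximum radius is one possible choice for scale invariance.

```python
import math

def centroid(points):
    """Mean of the vertex coordinates (a simplification of the true area centroid)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def part_position(shape_points, part_points):
    """Position of a part relative to the whole shape as (distance, angle).

    The distance is normalized by the shape's maximum radius so the
    feature is scale-invariant; the angle is in radians.
    """
    cx, cy = centroid(shape_points)
    px, py = centroid(part_points)
    r_max = max(math.hypot(x - cx, y - cy) for x, y in shape_points)
    return (math.hypot(px - cx, py - cy) / r_max,
            math.atan2(py - cy, px - cx))
```

To make such angles comparable across rotated shapes, one would additionally have to measure them relative to some reference orientation (e.g. the shape's principal axis) rather than the image axes.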
Contributions of this project thesis
- Design of a utility function that combines several cognitive principles of shape partitioning,
- Algorithm to find perceptually salient curvature extrema,
- Algorithm to check whether a cutting segment lies inside a polygon,
- Shape splitting and merging algorithms,
- Combining of part-based and global similarity measures,
- Image database retrieval architecture.
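The segment-in-polygon check listed among the contributions can be sketched as follows. This is a generic textbook construction, not the thesis's exact algorithm: a cut between two polygon vertices lies inside a simple polygon if it properly crosses no edge and its midpoint is inside.

```python
def point_in_polygon(p, poly):
    """Ray casting: is point p strictly inside the simple polygon poly?"""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            xcross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xcross:
                inside = not inside
    return inside

def segments_cross(a, b, c, d):
    """Do segments ab and cd properly (strictly) intersect?"""
    def orient(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    return o1 * o2 < 0 and o3 * o4 < 0

def cut_inside_polygon(p, q, poly):
    """A cutting segment pq between two polygon vertices lies inside the
    polygon iff it properly crosses no edge and its midpoint is inside."""
    n = len(poly)
    for i in range(n):
        if segments_cross(p, q, poly[i], poly[(i + 1) % n]):
            return False
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return point_in_polygon(mid, poly)
```

For an L-shaped polygon, for example, the diagonal cut across the concavity fails the midpoint test, while a cut through the interior passes both tests.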
Appendix
Results that turned out less useful
The original project title was "Shape representation and matching using geometric primitives (geons)". The main idea was to decompose a given 2-D binary shape into generalized rectangles and ellipses and to represent the shape as a directed graph or encode it as a number.
However, we found that this approach is only applicable to man-made objects with a clear geometric structure. In contrast, most natural objects cannot be reliably represented by simple geometric figures. Thus, this decomposition scheme is not robust and hence inappropriate for part-based matching.

Therefore, instead of describing parts in parametric form, we decided to extract global features from each part. This means that in the new approach the parts can have arbitrary form.
Bibliography
[1] N. Alajlan. Multi-Object Shape Retrieval Using Curvature Trees. PhD thesis, University of Waterloo, Canada, 2006.

[2] S. Berretti, A. Del Bimbo, and P. Pala. Retrieval by shape using multidimensional indexing structures. In ICIAP, pages 945–950, 1999.

[3] D. D. Hoffman and W. A. Richards. Parts of recognition. In T. F. Shipley and P. J. Kellman, editors, Cognition, chapter 18, pages 65–96. Elsevier Science, 1984.

[4] D. D. Hoffman and M. Singh. Salience of visual parts. Cognition, 63, pages 29–78, 1997.

[5] I. Kunttu, L. Lepisto, J. Rauhamaa, and A. Visa. Multiscale Fourier descriptor for shape classification. In ICIAP '03: Proceedings of the 12th International Conference on Image Analysis and Processing, pages 536–541, Washington, DC, USA, 2003. IEEE Computer Society.

[6] L. J. Latecki and R. Lakamper. Convexity rule for shape decomposition based on discrete contour evolution. Computer Vision and Image Understanding, 73(3):441–454, 1999.

[7] L. J. Latecki and R. Lakamper. Shape similarity measure based on correspondence of visual parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1185–1190, 2000.

[8] L. J. Latecki, R. Lakamper, and U. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 424–429, 2000.

[9] L. J. Latecki, R. Lakamper, and D. Wolter. Optimal partial shape similarity. Image and Vision Computing, 23(2):227–236, 2005.

[10] E. Petrakis, A. Diplaros, and E. Milios. Matching and retrieval of distorted and occluded shapes using dynamic programming, 2002.

[11] P. L. Rosin. Shape partitioning by convexity. In BMVC '99, pages 633–642, 1999.

[12] Y. Rui and T. S. Huang. Image retrieval: Current techniques, promising directions, and open issues. Journal of Visual Communication and Image Representation, 10, pages 39–62, 1999.

[13] M. Safar, C. Shahabi, and X. Sun. Image retrieval by shape: A comparative study. Technical Report 1, University of Southern California, 1999.

[14] K. Siddiqi and B. B. Kimia. Parts of visual form: Computational aspects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(3):239–251, 1995.

[15] M. Singh and D. D. Hoffman. From Fragments to Objects: Grouping and Segmentation in Vision, chapter 9, pages 401–459. Elsevier Science, 2001.

[16] M. Singh, G. Seyranian, and D. Hoffman. Parsing silhouettes: The short-cut rule. Perception and Psychophysics, 61, pages 636–660, 1999.

[17] M. B. Stegmann and D. D. Gomez. A brief introduction to statistical shape analysis. Technical report, Technical University of Denmark, 2002.

[18] M. Tanase-Avatavului. Shape Decomposition and Retrieval. PhD thesis, Utrecht University, The Netherlands, 2005.

[19] R. S. Torres and A. X. Falcao. Content-based image retrieval: Theory and applications. Revista de Informatica Teorica e Aplicada, 13(2):161–185, 2006.

[20] R. C. Veltkamp and L. J. Latecki. Properties and performances of shape similarity measures. In Tim Crawford and Remco C. Veltkamp, editors, Content-Based Retrieval, Dagstuhl Seminar Proceedings. IBFI, Schloss Dagstuhl, Germany, 2006.

[21] D. Zhang and G. Lu. A comparative study on shape retrieval using Fourier descriptors with different shape signatures, 2001.

[22] D. Zhang and G. Lu. Review of shape representation and description techniques. Pattern Recognition, 37(1):1–19, 2004.