Content-based Image Retrieval using Intuitive Shape Partitioning
TRANSCRIPT
-
8/13/2019 Content-based Image Retrieval using intuitive Shape Partitioning
1/69
Technische Universität Hamburg-Harburg
Vision Systems
Prof. Dr.-Ing. R.-R. Grigat
Content-based Image Retrieval using
Intuitive Shape Partitioning
Studienarbeit
Andrey Galochkin
January 2007
In cooperation with Prof. Kamel, University of Waterloo
Declaration

I hereby declare that this work was produced by me independently, using only the listed aids.

Harburg, 5 January 2007
Abstract
In this thesis we present a novel query-by-example shape-based image retrieval system
that uses the correspondence of visual parts to assess the degree of similarity between
shapes. The visual parts are computed explicitly, based on cognitive principles of human perception. The developed method is robust to rotation, translation, scaling and moderate levels of noise. In addition, it can deal with articulated or partially occluded
shapes.
We compare our system with other part-based methods and evaluate its performance
using the MPEG-7 benchmark dataset.
Finally, we discuss the advantages and drawbacks of our system compared to global shape similarity measures, using the Contour Fourier method as an example.
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Problem definition
  1.2 Thesis outline

2 Background Theory
  2.1 Content-based image retrieval
    2.1.1 Architecture of CBIR systems
    2.1.2 Image descriptors
  2.2 Shape description techniques
    2.2.1 Demands on shape features
    2.2.2 Classification of shape descriptors
    2.2.3 Global descriptors
    2.2.4 Structural descriptors and partial shape matching

3 Cognitive Principles of Shape Partitioning
  3.1 The minima rule
  3.2 Boundary strength (minima salience)
  3.3 Cut length
  3.4 Relative area
  3.5 Protrusion
  3.6 Good continuation
  3.7 Convex partitioning
  3.8 Partitioning problems

4 The Developed System
  4.1 Definitions
  4.2 Overview
  4.3 Preprocessing
    4.3.1 Holes
    4.3.2 Reduce in size
    4.3.3 Extract boundary
    4.3.4 Adaptive smoothing
    4.3.5 Discrete curve evolution
    4.3.6 Insert auxiliary points
  4.4 Part segmentation
    4.4.1 SplitShape algorithm
    4.4.2 Merge parts
  4.5 Feature extraction
    4.5.1 Global features
    4.5.2 Local features
  4.6 Retrieval algorithm

5 Performance Evaluation
  5.1 Retrieval rate
  5.2 Time issues
    5.2.1 Feature extraction
    5.2.2 Retrieval
  5.3 Comparison to other part-based methods
    5.3.1 Shape tokens
    5.3.2 Skeletons
    5.3.3 Latecki NL

6 Conclusions

7 Future Work

Bibliography
List of Figures
2.1 Typical architecture of a content-based image retrieval system (reprinted from [19])
2.2 Classification of shape representation and description techniques (reprinted from [22])
2.3 An object (a) and its convex hull (b)
2.4 Reconstruction of a deer shape with an increasing number of FDs. The general form of an object can be described by the first few coefficients
2.5 A horse shape divided into tokens. The numbers corresponding to each token are its curvature and orientation (reprinted from [2])
2.6 The medial axis of a polygon is defined as the locus of centers of maximally inscribed disks (reprinted from [22])
2.7 The sensitivity of the medial axis to noise: small changes in the boundary may induce significant changes in the medial axis (reprinted from [18])
2.8 The parsing of the dog bone into parts at the branch points of the Medial Axis Transform (a) gives the same part structure to a rectangle (b) (reprinted from [15])
3.1 When two 3D shapes intersect, they generically create a concave crease at the locus of intersection (reprinted from [15])
3.2 Although any subset of an object is physically a part of it, human observers clearly find some parts perceptually natural (b), whereas others seem rather contrived (c) (reprinted from [15])
3.3 Sharper negative minima are stronger attractors of part cuts than weaker negative minima. In (b), a slight deviation of the part cut from negative minima looks clearly wrong. However, in (d) a deviation of identical magnitude appears less contrived (reprinted from [15])
3.4 The natural part cuts for the shape in (a) are shown in (b). Note that each of these cuts joins a negative minimum of curvature to a point of zero curvature. Simply joining the two negative minima, as in (c), leads to a perceptually unnatural parsing (adapted from [16])
3.5 The role of cut length in determining part cuts. The cut pq in (a) appears far more natural than the cut pr. This is also true in (b), where the areas of the two candidate parts have been equated (reprinted from [16])
3.6 An example of the role of good continuation in parsing. The horizontal cuts in (b) appear less natural than the vertical cuts in (c), even though the vertical cuts are longer (reprinted from [15])
3.7 (a) is naturally segmented using four part cuts (into a central core and four parts), whereas (b) is naturally segmented using two part cuts (into a large vertical body and two parts on the sides) [16]
4.1 A contour consisting of 27 points (P1 and P27 coincide)
4.2 A shape with the cutting segment P8P12. The part P8P12 is the sequence of points P8, P9, P10, P11, P12, P8
4.3 If holes in (a) are filled (b), the degree of similarity between (b) and other "lizzards" decreases. However, in some cases (c) holes should be filled (d)
4.4 Shapes before (a, c) and after (b, d) adaptive smoothing
4.5 A shape and its curvature. After smoothing only global extrema remain (red: maxima, blue: minima, green: inflection points)
4.6 A shape before (a) and after (b) discrete curve evolution
4.7 Contour of cellular_phone-04 after discrete curve evolution
4.8 Bad cuts (red). Because "good" points are missing, no "good" cuts exist here
4.9 After points have been inserted, intuitive partitioning is possible
4.10 (a) Incorrect partitioning of octopus-15. The part cut through the body is wrong, even though its start and end points are salient minima. (b) Correct partitioning
4.11 The correct partitioning (a) can be destroyed by an incorrect merge (b)
5.1 Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class (reprinted from [20])
5.2 Results of the MPEG-7 CE-Shape-1 part B test for each class, for both Contour Fourier descriptors and our part-based method
5.3 Twenty most similar images to device7-10 found by our method. Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black
5.4 Twenty most similar images found by the CFD method. Images are displayed as silhouettes because this method doesn't compute any parts
5.5 Twenty most similar images to ray-11 found by our method. Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black
5.6 Twenty most similar images to ray-11 found by the CFD method
5.7 Inconsistent partitioning makes it difficult to match shapes
List of Tables
5.1 Time needed to perform feature extraction and retrieval
Chapter 1
Introduction
1.1 Problem definition
Global shape similarity measures fail when the analyzed shapes are partially occluded, globally deformed, or have articulated parts. The solution to this problem is to apply part-based rather than whole-shape matching.
The main goal of this thesis project is to design algorithms that mimic the way humans
partition shapes, and then to carry out part-based matching that is robust to articulations and occlusions.
1.2 Thesis outline
The rest of this thesis is organized as follows:
Chapter 2 is a short survey of CBIR and image descriptors, with a focus on the shape descriptors that we used in our algorithms.
Chapter 3 explains some cognitive principles of shape partitioning.
Chapter 4 describes the image retrieval system developed in this project.
Chapter 5 evaluates the performance of the system.
Chapter 2
Background Theory
2.1 Content-based image retrieval
As digital image collections worldwide grow in size, searching such collections is becoming an important operation. In particular, there is an increasing need to describe the complex information of digital images by non-textual descriptions that can be used to efficiently search for similar images. The field within multimedia research that focuses on using information about the visual content of images (such as color, texture or shape) to search an image database is called content-based image retrieval (CBIR) [18].
One of the main advantages of the CBIR approach is that retrieval can be fully automatic, in contrast to the traditional keyword-based approach, which usually requires laborious and time-consuming prior annotation of the database images. CBIR technology has been used in applications such as fingerprint identification, biodiversity information systems, digital libraries, crime prevention, medicine and historical research.
2.1.1 Architecture of CBIR systems
Figure 2.1 shows a typical architecture of a content-based image retrieval system.
Figure 2.1: Typical architecture of a content-based image retrieval system (Reprinted
from [19])
Two main functionalities are supported: data insertion and query processing. The data
insertion subsystem is responsible for extracting appropriate features from images and
storing them into the image database (see dashed modules and arrows). This process is
usually performed off-line. The query processing, in turn, is organized as follows: the
interface allows a user to specify a query by means of a query pattern and to visualize the
retrieved similar images. The query-processing module extracts a feature vector from
a query pattern and applies a metric (such as the Euclidean distance) to evaluate the
similarity between the query image and the database images. Next, it ranks the database
images in a decreasing order of similarity to the query image and forwards the most
similar images to the interface module. Database images are often indexed according
to their feature vectors to speed up retrieval and similarity computation [19]. Note that
both the data insertion and the query processing functionalities use the feature vector
extraction module.
The CBIR system that we developed in this thesis is structured in a very similar way.
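The ranking step described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis; the image identifiers and the two-dimensional feature vectors are invented for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_database(query_vec, database):
    """Return (image_id, distance) pairs sorted by increasing distance,
    i.e. by decreasing similarity to the query image."""
    scored = [(img_id, euclidean(query_vec, vec))
              for img_id, vec in database.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical database: image identifiers mapped to stored feature vectors.
db = {"horse-01": [0.2, 0.9], "deer-03": [0.8, 0.1], "horse-02": [0.25, 0.85]}
print(rank_database([0.2, 0.9], db)[0][0])  # → horse-01
```

In a real system the sorted list would be truncated to the k most similar images and forwarded to the interface module.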
2.1.2 Image descriptors
An image descriptor is a pair consisting of a feature vector extraction function and a distance function, used to index images by similarity. The extracted feature vector subsumes the image properties, and the distance function measures the dissimilarity between two images with respect to those properties [19].
This section aims to present a brief overview of existing image descriptors. Even though
the image retrieval system developed in this thesis deals only with shape-based retrieval
for binary images, we will be discussing, for completeness, color and texture in addition
to shape features used in image retrieval.
2.1.2.1 Color
The color feature is one of the most widely used visual features in image retrieval.
It is relatively robust to background complication and independent of image size and
orientation [19].
Color description techniques can be grouped into two classes based on whether or not
they encode information related to the color spatial distribution.
Examples of descriptors that do not incorporate spatial color distribution include Color
Histogram, Color Moments and Color Sets. Color Histogram is the most commonly
used descriptor in image retrieval. Statistically, it denotes the joint probability of the
intensities of the color channels, e.g. RGB [12].
On the other hand, descriptors such as the Color Coherence Vector (CCV), Border/Interior Pixel Classification (BIC) and the Color Correlogram incorporate the spatial distribution of color [19].
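As an illustration of the simplest of these descriptors, a quantized joint RGB histogram can be computed as follows. This is a minimal sketch, not code from the thesis; the flat list of pixel triples and the number of bins are assumptions made for the example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Joint RGB histogram: each pixel (r, g, b) with channels in 0..255 is
    quantized to bins_per_channel levels per channel and counted in a flat
    histogram of bins_per_channel**3 cells."""
    n = bins_per_channel
    step = 256 // n
    hist = [0] * (n ** 3)
    for r, g, b in pixels:
        hist[(r // step) * n * n + (g // step) * n + (b // step)] += 1
    total = len(pixels)
    # Normalizing by the pixel count makes the descriptor independent of image size.
    return [count / total for count in hist]
```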
2.1.2.2 Texture
This image property can be characterized by the existence of basic primitives, whose
spatial distribution creates some visual patterns defined in terms of granularity, direc-
tionality and repetitiveness. There exist different approaches to extracting and representing textures. They can be classified into space-based models, frequency-based models, and texture signatures [19].
The co-occurrence matrix is one of the most traditional techniques for encoding texture information. It describes spatial relationships among grey levels in an image: the cell at position (i, j) registers the probability that two pixels of grey levels i and j occur at a given relative position. A set of co-occurrence statistics (such as energy, entropy and contrast) has been proposed to characterize textured regions. Another example of a space-based method is the use of Auto-Regressive Models.
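A minimal sketch of such a normalized co-occurrence matrix for a small grey-level image follows. It is illustrative only; the row-major list-of-lists image format and the single (row, col) offset are assumptions of the example.

```python
def cooccurrence(img, offset=(0, 1), levels=4):
    """Normalized grey-level co-occurrence matrix: entry (i, j) estimates the
    probability that a pixel of level i has a neighbour of level j at the
    given (row, col) offset."""
    dr, dc = offset
    rows, cols = len(img), len(img[0])
    mat = [[0.0] * levels for _ in range(levels)]
    pairs = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                mat[img[r][c]][img[r2][c2]] += 1
                pairs += 1
    return [[v / pairs for v in row] for row in mat]
```

Statistics such as energy or entropy can then be computed directly from the matrix entries.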
Frequency-based texture descriptors include, for instance, the Gabor wavelet coefficients, which were found to be the best among the tested candidates at matching the results of human vision studies [12].
An example of a texture signature can be found in the proposal of Tamura et al. This descriptor aims to characterize texture information in terms of contrast, coarseness and directionality. The MPEG-7 initiative proposed three texture descriptors: the texture browsing descriptor, the homogeneous texture descriptor, and the local edge histogram descriptor [19].
2.2 Shape description techniques
Shape has been defined as "all the geometrical information that remains when location, scale and rotational effects are filtered out from an object" [17]. Shape description is the extraction of shape features in order to quantify important properties of the shape.
2.2.1 Demands on shape features
Petrakis et al. [10] state that, among others, the following properties are important for reliable shape matching and retrieval:
- invariance to translation, rotation and scale,
- robustness to noise and deformations,
- computational efficiency,
- compactness (the features require little storage space).
In our algorithms we used only features that meet these requirements.
2.2.2 Classification of shape descriptors
Shape descriptors are classified into boundary-based (or contour-based) and region-based methods. This classification takes into account whether shape features are extracted from the contour only or from the whole shape region. These two classes, in turn, can be divided into structural (local) and global descriptors, depending on whether the shape is represented as a whole or by segments/sections. Another possible classification divides shape description methods into spatial and transform domain techniques, depending on whether direct measurements of the shape are used or a transformation is applied [22].
Figure 2.2: Classification of shape representation and description techniques (Reprinted
from [22])
Next, we present an overview of the shape descriptors relevant for the rest of this thesis.
2.2.3 Global descriptors
Perimeter is the length of the shape boundary, i.e. the number of pixels on the boundary. This feature is often used to normalize curves to unit length (e.g. in the discrete curve
evolution that will be described later).
Area is the number of pixels constituting the shape.
Roundness is roughly correlated with the complexity of the contour and can be computed as R = 4π · Area / Perimeter². Roundness equals one for a circle and zero for a line segment.
Convex hull
A region R is convex if and only if, for any two points P1, P2 ∈ R, the whole line segment with end points P1 and P2 is also inside R. The convex hull of a region is the smallest convex region H that satisfies the condition R ⊆ H [22].
Figure 2.3: An object (a) and its convex hull (b)
Solidity is defined as the ratio of the shape's area to the area of its convex hull and measures the deviation of a shape from being totally convex.
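For concreteness, the roundness formula can be checked on a circle, for which it must equal one. This is a small sketch assuming the standard formula R = 4π · Area / Perimeter², not code from the thesis.

```python
import math

def roundness(area, perimeter):
    """R = 4*pi*Area / Perimeter**2: equals 1 for a circle and tends to 0
    as the contour becomes elongated or complex."""
    return 4 * math.pi * area / perimeter ** 2

# A circle of radius r has area pi*r**2 and perimeter 2*pi*r, so R = 1.
r = 5.0
print(roundness(math.pi * r * r, 2 * math.pi * r))  # ≈ 1.0
```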
2.2.3.1 Shape signatures
In general, a shape signature is any 1-D function representing 2-D areas or boundaries [21]. Assume the shape boundary coordinates (x(t), y(t)), t = 0, 1, ..., L−1, have been extracted in the preprocessing stage. Then we can define the
Complex coordinates function,
which is simply the complex number generated from the boundary coordinates:
z(t) = x(t) + i y(t)
In order to eliminate the effect of bias, we use the shifted coordinates function:
z(t) = [x(t) − xc] + i [y(t) − yc]

where (xc, yc) is the centroid of the shape, i.e. the average of the boundary coordinates:

xc = (1/L) Σ_{t=0}^{L−1} x(t),   yc = (1/L) Σ_{t=0}^{L−1} y(t)
This shift makes the shape representation invariant to translation. An important property of this representation is that it is information preserving, i.e. it allows full reconstruction of the contour of the shape [21].
2.2.3.2 Fourier descriptors
For a given shape signature s(t), t = 0, 1, ..., N−1, described as above and assumed to be normalized to N points in the sampling stage, the discrete Fourier transform of s(t) is given by

u_n = (1/N) Σ_{t=0}^{N−1} s(t) exp(−j 2π n t / N),   n = 0, 1, ..., N−1
The coefficients u_n, n = 0, 1, ..., N−1, are called the
Fourier descriptors (FD) of the shape, denoted FD_n, n = 0, 1, ..., N−1 [21]. Rotation invariance of the FDs is achieved by ignoring the phase information and taking only the magnitude values of the FDs.
For the complex coordinates signature, all N descriptors except the first one (the DC component) are needed to index the shape. The DC component depends only on the position of the shape; it is not useful for describing shape and is therefore discarded.
Scale normalization is achieved by dividing the magnitude values of all the other descriptors by the magnitude value of the second descriptor. The invariant feature vector used to index the shape is then given by [21]

f = [ |FD_2|/|FD_1|, |FD_3|/|FD_1|, ..., |FD_{N−1}|/|FD_1| ]
Typically, 10-15 descriptors are sufficient to describe a shape. We used N = 14 in our algorithms.
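The whole pipeline above (centroid-shifted complex signature, DFT, dropping the DC term, magnitude and scale normalization) can be condensed into a short sketch. It is illustrative, not the thesis implementation; a direct O(N²) DFT is used instead of an FFT for clarity.

```python
import cmath
import math

def fourier_descriptors(boundary, n_fds=14):
    """Invariant FD feature vector for a closed boundary given as (x, y) points.
    Steps: centroid-shifted complex signature -> DFT -> drop the DC term ->
    take magnitudes (rotation invariance) -> divide by |FD1| (scale invariance)."""
    L = len(boundary)
    xc = sum(x for x, _ in boundary) / L
    yc = sum(y for _, y in boundary) / L
    z = [complex(x - xc, y - yc) for x, y in boundary]

    def u(n):  # n-th DFT coefficient of the signature
        return sum(z[t] * cmath.exp(-2j * math.pi * n * t / L)
                   for t in range(L)) / L

    mags = [abs(u(n)) for n in range(1, n_fds + 1)]  # skip the DC term (n = 0)
    return [m / mags[0] for m in mags[1:]]
```

For a perfect circle the signature is a single complex exponential, so every normalized descriptor is (numerically) zero.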
Figure 2.4: Reconstruction of a deer shape with increasing number of FDs. The general
form of an object can be described by the first few coefficients.
2.2.4 Structural descriptors and partial shape matching
With the structural approach, shapes are broken down into boundary segments called primitives. Structural methods differ in the selection of
primitives and the organization of the primitives for shape representation
[22].
2.2.4.1 Shape tokens
In [2], the curvature zero-crossing points of a Gaussian-smoothed boundary are used to obtain primitives called tokens (Fig. 2.5). The features of each token are its maximum curvature and its orientation, and the similarity between two tokens is measured by a weighted Euclidean distance.
Figure 2.5: A horse shape has been divided into different tokens. The numbers cor-
responding to each token are the curvature and the orientation of the token. (Reprinted
from [2]).
Since the feature includes curve orientation, it is not rotation invariant.
The authors addressed the problem, but did not solve it.
Given a query shape, the retrieval of similar shapes from the database
takes two steps. The first step is token retrieval: for all N tokens of the query shape, similar tokens are found by traversing the index tree N times. The set of retrieved tokens having the same shape identifier
form a potential similar shape. The second step is to match the query shape and each potential similar shape using a model-by-model matching algorithm, which finds the best match between the tokens of the two shapes and involves O(MN) operations (M and N are the numbers of tokens of the two matching shapes, respectively). Matching of tokens in both steps involves thresholds that are ad hoc or empirical. Quantitative retrieval performance (precision and recall) and retrieval efficiency are reported for a shape database extracted from classical paintings. Since the tree is traversed a number of times during shape matching, it is not clear whether the indexing is better than model-by-model matching; only matching performance using different trees is reported. The matching efficiency also depends on the number of tokens of each shape and on the scale used in the smoothing stage [22].
2.2.4.2 Visual parts
Latecki et al. [9] presented a shape matching approach that works directly on closed boundaries. It is based on visual parts (VP): (part of) a database shape is simplified in the context of the query shape prior to matching. The simplification process eliminates particular points from the database shape such that the similarity to the query shape is maximized. The main disadvantage of this method is the high computational complexity of the matching algorithm, which is O(N³ log N), where N is the number of boundary points [1].
2.2.4.3 Skeletons
The basic idea is to eliminate redundant information while retaining only
the topological structure of the object. Skeletons can be computed by the medial axis transform. The medial axis is the locus of the centers of
maximal circles that fit within the shape, as illustrated in Fig. 2.6.
Figure 2.6: The medial axis of a polygon is defined as the locus of centers of maximally
inscribed disks. (Reprinted from [22]).
The skeleton is then segmented and represented as a graph according to certain criteria, and matching between shapes becomes a graph matching problem. This method is sensitive to noise and computationally expensive [22].
Figure 2.7: The sensitivity of the medial axis to noise: small changes in the boundary may induce significant changes in the medial axis. (Reprinted from [18]).
Another disadvantage of skeleton-based shape partitioning is that it can produce unintuitive results (Fig. 2.8).
Figure 2.8: The parsing of the dog bone into parts at the branch points of the Medial
Axis Transform (a) gives the same part structure to a rectangle (b).
(Reprinted from [15]).
Chapter 3
Cognitive Principles of Shape Partitioning
There is strong evidence from cognitive psychology that humans recog-
nize objects by first decomposing them into parts. Human vision orga-
nizes object shapes in terms of parts and their spatial relationships. We
perceive a human hand, for example, as a coherent perceptual object; but
also as a spatial arrangement of clearly defined parts: five fingers and a
palm. Hence, perceptual units exist at many levels: at the level of whole
objects, at the level of parts, and possibly smaller parts nested within
larger ones [15].
In this chapter we summarize the main findings in this research area be-
cause they were explicitly used in the design of our algorithms.
3.1 The minima rule
Cognitive experiments have shown that humans perceive as a boundary
between two "parts" a segment containing at least one point of negative
curvature [3]. The reason is that when two convex parts overlap, their
boundary in most cases contains one or two points of negative curvature
(see Fig. 3.1).
Figure 3.1: When two 3D shapes intersect, they generically create a concave crease at
the locus of intersection (reprinted from [15]).
Therefore we can define the
Minima Rule for Silhouettes:
Divide silhouettes into parts using points of negative minima of curvature
on their bounding contour as boundaries between parts [3].
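On a polygonal contour, curvature can be approximated by the signed turn angle at each vertex, and the minima rule then amounts to collecting the local minima with negative sign. The following is a sketch under these assumptions; the discrete turn-angle approximation is our illustration, not the thesis's curvature estimator.

```python
import math

def turn_angles(contour):
    """Signed turn angle at each vertex of a closed polygonal contour
    (positive at convex corners of a counter-clockwise contour); used
    here as a discrete stand-in for curvature."""
    n = len(contour)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = contour[i - 1], contour[i], contour[(i + 1) % n]
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        d = a2 - a1
        while d > math.pi:
            d -= 2 * math.pi
        while d < -math.pi:
            d += 2 * math.pi
        angles.append(d)
    return angles

def negative_minima(contour):
    """Indices of vertices that are local minima with negative curvature:
    candidate part boundaries under the minima rule."""
    k = turn_angles(contour)
    n = len(k)
    return [i for i in range(n)
            if k[i] < 0 and k[i] <= k[i - 1] and k[i] <= k[(i + 1) % n]]

# A square with a concave notch at the top; vertex (2, 2) is the notch.
shape = [(0, 0), (4, 0), (4, 4), (3, 4), (2, 2), (1, 4), (0, 4)]
print(negative_minima(shape))  # → [4]
```

For a counter-clockwise contour, convex corners give positive turn angles and concave notches give negative ones, so only the notch vertex is reported above.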
Figure 3.2: Although any subset of an object is physically a part of it, human observers clearly find some parts perceptually natural (b), whereas others seem rather contrived (c) (reprinted from [15]).
3.2 Boundary strength (minima salience)
The sharper a curvature minimum M, the more natural it is for a human observer to draw a cut through it [15].
Figure 3.3: Sharper negative minima are stronger attractors of part cuts than weaker negative minima. In (b), a slight deviation of the part cut from the negative minima looks clearly wrong. However, in (d) a deviation of identical magnitude appears less contrived (reprinted from [15]).
However, a good cut doesn't always connect two curvature minima, even
if they are very sharp. Thus geometric constraints in addition to the min-
ima rule are needed to define cuts, and hence the parts themselves.
For our current purposes, we take a part cut to be a straight-line segment joining two points on the outline of a silhouette such that
(1) at least one of the two points has negative curvature, and
(2) the entire segment lies in the interior of the shape.
Figure 3.4: The natural part cuts for the shape in (a) are shown in (b). Note that each of these cuts joins a negative minimum of curvature to a point of zero curvature. Simply joining the two negative minima, as in (c), leads to a perceptually unnatural parsing. (Adapted from [16])
3.3 Cut length
Consider the elbow in Figure 3.5. Cut pq on this elbow looks far more natural than cut pr. In Figure 3.5b, we have made the areas of the two segments equal, and pq is still the preferred cut, suggesting that the area of the parts is not what determines the cuts in these figures. Instead, examples like these suggest that human vision prefers to divide shapes into parts using the shortest possible cuts.
Figure 3.5: The role of cut length in determining part cuts. The cut pq in (a) appears far
more natural than the cut pr. This is also true in (b) where the areas of the two candidate
parts have been equated. (reprinted from [16])
3.4 Relative area
The salience of a part increases as the ratio of its visible area to the visible
area of the whole silhouette increases [15].
3.5 Protrusion
This factor is the degree to which a part sticks out from its object. Parts
that stick out more seem to be more salient [4]. It can be computed as the
ratio of the part perimeter to the length of the cutting segment.
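With the part boundary given as a polyline and the cut as a pair of points, this ratio is straightforward to compute. The following is a small sketch; the polyline representation is an assumption of the example.

```python
import math

def protrusion(part_boundary, cut):
    """Part salience factor: perimeter of the part's outer boundary divided
    by the length of the cutting segment; larger values mean the part
    sticks out more from its object."""
    perimeter = sum(math.dist(part_boundary[i], part_boundary[i + 1])
                    for i in range(len(part_boundary) - 1))
    return perimeter / math.dist(*cut)

# Three sides of a unit square, cut off by the fourth side:
print(protrusion([(0, 0), (0, 1), (1, 1), (1, 0)], ((0, 0), (1, 0))))  # → 3.0
```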
3.6 Good continuation
Consider the shape in Figure 3.6. Here the parsing induced by the shorter cuts (shown in Figure 3.6b) appears less natural than the one induced by the longer cuts (shown in Figure 3.6c).
Figure 3.6: An example of the role of good continuation in parsing. The horizontal cuts in (b) appear less natural than the vertical cuts in (c), even though the vertical cuts are longer. (Reprinted from [15])
There is another factor at play here, in addition to minimizing cut length: in Figure 3.6c each cut continues the directions of two tangents at the negative minima of curvature, whereas in Figure 3.6b it does not. Hence good continuation between a pair of tangents (one at each of the two part boundaries) is an important geometric factor for determining part cuts.
3.7 Convex partitioning
Rosin [11] showed that a partitioning scheme which maximizes the weighted sum of part convexities is closely related to Hoffman and Singh's part salience factors [4]. The idea is to produce a few solid parts with maximum relative area.
3.8 Partitioning problems
One of the main problems of intuitive shape partitioning is instability: small changes in a shape can cause significant changes in the part segmentation. In particular, partitioning is sensitive to the relative size of shape parts.
We addressed this issue when designing our algorithms.
Figure 3.7: (a) is naturally segmented using four part cuts (into a central core and four
parts), whereas (b) is naturally segmented using two part cuts (into a large vertical body
and two parts on the sides). [16]
Chapter 4
The Developed System
4.1 Definitions
Contour (Boundary): a sequence of points P1 P2 ... PN P1 that, when joined, forms a polygon without self-intersections (the contour of a shape). This sequence begins and ends with the same point to ensure that the polygon is closed.
Figure 4.1: A contour consisting of 27 points (P1 and P27 coincide)
Part:
Let M = Pi (the i-th point on the contour) and Q = Pj with j > i (mod N). Then the subset of the contour Pi Pi+1 ... Pj-1 Pj Pi is called the part MQ. The part has to have an area larger than the remaining area; if this is not the case, the part and the remaining shape are swapped. The segment MQ is called a cutting segment, or simply a cut.
Figure 4.2: A shape with the cutting segment P8P12. The part P8P12 is the sequence of points P8, P9, P10, P11, P12, P8.
As mentioned before, because of cognitive principles, a cutting segment always starts at a curvature minimum and must lie completely inside the shape contour.
The words "part" and "shape part" are used interchangeably within the scope of this thesis.
4.2 Overview
The developed system has two main functionalities: populating the feature database and retrieval.
In the populate_database method the visual parts of a shape are computed, their features are extracted, and the resulting matrix is saved to a file imagename_Features.mat.
Populate Database:
    read in an image
    reduce in size
    extract boundary
    compute global features
    compute curvature
    adaptive smoothing
    discrete curve evolution (leave only perceptually salient points)
    insert points enabling shortest or straight cuts
    iteratively split the shape using intuition
    merge incorrectly split parts
    extract features from each part
    save the feature matrix to a file

In this representation the rows are the found parts of the shape and the columns are the corresponding features.
Note: for a very solid shape, such as a circle, no parts will be found, i.e.
the shape will not be split. If two such shapes are to be matched, the
system simply computes the Euclidean distance between the two feature
vectors.
The retrieval algorithm is loosely based on the shape tokens retrieval scheme [2] and is described in detail in the next chapter.
The main idea is to match shapes on both the global and the local level:
    d1 : distance between global features
    d2 : distance between local features
    d = a·d1 + (1 - a)·d2
Note: the retrieval algorithm can only be run after the populate_database method, since it requires the extracted shape features.
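A minimal sketch of this weighted combination (Python; the weight a = 0.25 and the feature vectors below are illustrative, not values from the thesis):

```python
def combined_distance(global_q, global_c, part_dist, a=0.5):
    """Weighted sum d = a*d1 + (1 - a)*d2 of a global distance d1 and a
    part-based distance d2.

    global_q, global_c: global feature vectors of the query and the candidate.
    part_dist: precomputed part-based distance d2.
    a: weight of the global distance (default is a hypothetical choice).
    """
    d1 = sum((x - y) ** 2 for x, y in zip(global_q, global_c)) ** 0.5
    return a * d1 + (1 - a) * part_dist

# Identical global features (d1 = 0), so only the part-based term remains:
print(combined_distance([1.0, 2.0], [1.0, 2.0], part_dist=4.0, a=0.25))  # 3.0
```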
4.3 Preprocessing
4.3.1 Holes
In the pre-processing stage we did not deal with holes, because it is not always clear when holes should be filled or left open.
(a) (b) (c) (d)
Figure 4.3: If the holes in (a) are filled (b), the degree of similarity between (b) and other "lizzards" decreases. However, in some cases (c) holes should be filled (d).
4.3.2 Reduce in size
This step is required mainly for computational reasons, since the speed of all subsequent algorithms depends strongly on the number of boundary points. Empirically we found that N = 128·128 white pixels are sufficient to retain all perceptually important details of most images. Therefore, each image is pre-processed as follows:

    compute area (number of "turned on" pixels)
    if area > N
        reduce in size to make the area equal to N
    end
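Since isotropic scaling by a factor s changes the area by s², the required scale factor can be sketched as (Python; the function name is ours):

```python
def scale_factor(area, target=128 * 128):
    """Isotropic scale factor that brings a binary shape's area down to `target`.

    Scaling all lengths by s scales the area by s**2, so s = sqrt(target / area).
    Shapes that are already small enough are left unchanged.
    """
    return (target / area) ** 0.5 if area > target else 1.0

print(scale_factor(4 * 128 * 128))  # 0.5: halving both dimensions quarters the area
print(scale_factor(1000))           # 1.0: small shapes are not upscaled
```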
4.3.3 Extract boundary
Here we used the Matlab function bwtraceboundary with 8-connectivity to extract the parametrized coordinates [x, y] of the shape contour.
4.3.4 Adaptive smoothing
For further processing it is necessary to sufficiently reduce the level of
noise and to remove small details that decrease the degree of similarity
between shapes. The contour of the shape is iteratively smoothed until
the number of curvature extrema becomes sufficiently low (at most 20 curvature minima):

    set smoothing parameter P to an initial threshold
    repeat
        smooth the contour with parameter P
        count the curvature minima
        increase P
    until at most 20 curvature minima remain
Thus, complex shapes with many details are heavily smoothed whereas
"simple" shapes are left unchanged to prevent loss of information.
(a) (b)
(c) (d)
Figure 4.4: Shapes before (a,c) and after adaptive smoothing (b,d)
4.3.4.1 gauSmooth
The easiest and most computationally efficient way to smooth the boundary would be to simply reconstruct the curve using the previously computed Fourier descriptors Fz:
z=ifft(Fz); % inverse Fourier transform
xsmooth=real(z);
ysmooth=imag(z);
However, such reconstruction sometimes produces self-intersections of
the approximated curve and ringing, which would significantly affect the
outcome of the partitioning algorithm.
This is why we used Gaussian smoothing, which produces natural results (similar to the blur of a camera).
Here the curve point sequence is smoothed by circularly convolving it with a Gaussian

    f(x | μ, σ) = 1 / (σ√(2π)) · exp(-(x - μ)² / (2σ²))
sigma = smoothingParameter*length(x);
W = ceil(3*sigma);             % outside +-3*sigma the Gaussian is negligibly small
t = -W:W;                      % truncated support of the Gaussian
gau = normpdf(t, 0, sigma);
gau = gau/sum(gau);            % normalize so that the kernel sums to 1
xx = cconv(x, gau, length(x)); % circular convolution: smoothed x coordinates
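The same circular Gaussian smoothing can be sketched in plain Python (no toolbox functions; kernel truncated at ±3σ as above):

```python
import math

def gaussian_kernel(sigma):
    """Truncated, normalized Gaussian kernel sampled on [-3*sigma, 3*sigma]."""
    w = max(1, math.ceil(3 * sigma))
    g = [math.exp(-t * t / (2 * sigma * sigma)) for t in range(-w, w + 1)]
    s = sum(g)
    return [v / s for v in g]

def circular_smooth(x, sigma):
    """Circularly convolve the closed coordinate sequence x with a Gaussian."""
    g = gaussian_kernel(sigma)
    w = len(g) // 2
    n = len(x)
    return [sum(g[j + w] * x[(i + j) % n] for j in range(-w, w + 1))
            for i in range(n)]

# Smoothing a constant sequence leaves it unchanged, since the kernel sums to 1:
print(circular_smooth([1.0] * 8, sigma=1.0))  # eight values, all (close to) 1.0
```

The circular indexing `(i + j) % n` is what makes the convolution suitable for a closed contour: no boundary artifacts appear at the start and end of the point sequence.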
4.3.4.2 Compute curvature
Mathematically, a planar, continuous curve can be parameterized with respect to its arc length t and expressed as c(t) = {x(t), y(t)}.
Hence, the curvature κ(t) of c(t) at the point {x(t), y(t)} can be expressed as:

    κ(t) = (x'(t) y''(t) - x''(t) y'(t)) / (x'(t)² + y'(t)²)^(3/2)

The discrete derivatives are computed using the formulae

    x'(k) = (x(k+1) - x(k-1)) / 2
and

    x''(k) = (x'(k+1) - x'(k-1)) / 2
From these formulae one can see that the computed curvature is very sen-
sitive to noise, which results in a high number of detected extrema. Hence
the curve needs to be smoothed before further processing. The smooth-
ing parameter is adjusted as described in the adaptive smoothing function.
(a) (b)
Figure 4.5: A shape and its curvature. After smoothing only global extrema remain.
(red: maxima, blue: minima, green: inflection points).
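The discrete curvature computation can be sketched as follows (Python, on a circular point sequence; since the curvature formula is invariant under reparameterization, a circle of radius r should yield κ ≈ 1/r at every point):

```python
import math

def curvature(x, y):
    """Discrete curvature of a closed curve via central differences.

    Implements kappa = (x'*y'' - x''*y') / (x'^2 + y'^2)^(3/2), with the
    derivatives approximated as x'(k) = (x(k+1) - x(k-1)) / 2 on the
    circular point sequence.
    """
    n = len(x)
    d = lambda v, k: (v[(k + 1) % n] - v[(k - 1) % n]) / 2.0
    xt = [d(x, k) for k in range(n)]
    yt = [d(y, k) for k in range(n)]
    xtt = [d(xt, k) for k in range(n)]
    ytt = [d(yt, k) for k in range(n)]
    return [(xt[k] * ytt[k] - xtt[k] * yt[k]) /
            (xt[k] ** 2 + yt[k] ** 2) ** 1.5 for k in range(n)]

# Sanity check on a circle of radius 10 sampled at 200 points:
n, r = 200, 10.0
xs = [r * math.cos(2 * math.pi * k / n) for k in range(n)]
ys = [r * math.sin(2 * math.pi * k / n) for k in range(n)]
print(round(curvature(xs, ys)[0], 6))  # approximately 1/r = 0.1
```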
4.3.5 Discrete curve evolution
To reduce the computation time it is necessary that the shape boundary
consists of as few points as possible. A straight-forward downsampling
(take each n-th point of the boundary) has the disadvantage that some
perceptually significant points may be removed in areas of high detail
and too many insignificant points left in areas of low detail.
For example, to represent one period of a cosine wave, at least 5 points are
necessary. On the other hand, to downsample a line of arbitrary length,
just 2 points are enough.
We implemented the method described by Latecki [6,7].
In every evolution step, a pair of consecutive line segments s1, s2 is replaced by a single line segment joining the endpoints of s1 ∪ s2.
The key property of this evolution is the order of the substitutions. The substitution is performed according to a relevance measure K given by:

    K(s1, s2) = β(s1, s2) · l(s1) · l(s2) / (l(s1) + l(s2))

where the line segments s1, s2 are the polygon sides incident to a vertex v, β(s1, s2) is the turn angle at the common vertex of segments s1, s2, and l is the length function normalized with respect to the total length of the polygonal curve C. The main property of this relevance measure is [7, 9]: the higher the value of K(s1, s2), the larger the contribution of the arc s1 ∪ s2 to the shape. Given the input boundary polygon P with n vertices, DCE produces a sequence of simpler polygons P = Pn, Pn-1, ..., P3 such that Pn-(k+1) is obtained from Pn-k by removing the single vertex v whose shape contribution, measured by K, is smallest.
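A compact sketch of DCE with this relevance measure (Python; the `target` vertex count is a hypothetical parameter, whereas the thesis evolves the polygon all the way down to P3):

```python
import math

def turn_angle(p, v, q):
    """Absolute turn angle at vertex v between segments p-v and v-q."""
    a1 = math.atan2(v[1] - p[1], v[0] - p[0])
    a2 = math.atan2(q[1] - v[1], q[0] - v[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def dce(poly, target=3):
    """Discrete curve evolution: repeatedly delete the vertex with the smallest
    relevance K = beta * l1 * l2 / (l1 + l2), lengths normalized by the total
    polygon length."""
    poly = list(poly)
    while len(poly) > target:
        total = sum(math.dist(poly[i], poly[(i + 1) % len(poly)])
                    for i in range(len(poly)))
        def K(i):
            p, v, q = poly[i - 1], poly[i], poly[(i + 1) % len(poly)]
            l1 = math.dist(p, v) / total
            l2 = math.dist(v, q) / total
            return turn_angle(p, v, q) * l1 * l2 / (l1 + l2)
        poly.pop(min(range(len(poly)), key=K))
    return poly

# A collinear midpoint has turn angle 0, hence K = 0, and is removed first:
print(dce([(0, 0), (1, 0), (2, 0), (1, 2)], target=3))  # [(0, 0), (2, 0), (1, 2)]
```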
(a) (b)
Figure 4.6: A shape before (a) and after (b) discrete curve evolution
4.3.6 Insert auxiliary points
4.3.6.1 Motivation
Empirically we found that several points have to be inserted in order to partition a shape well. For example, if a cut has to be made through a point that has been removed by discrete curve evolution, then this good cut cannot be made.
Figure 4.7: Contour of cellular_phone-04 after discrete curve evolution.
For example, the shape of a cell phone apparently consists of two parts: the body and the antenna. However, since the body is rectangular, its sides are straight lines, and thus only the two endpoints of each line remain after the curve evolution.
Obviously, none of the possible cuts is intuitive.
Figure 4.8: Bad cuts (red). Because "good" points are missing, no "good" cuts exist here.
Therefore, we need to insert points that will most likely be used to build
part cuts. We used the "shortest cut" and "good continuation" rules.
4.3.6.2 insertShortestCut
As previously mentioned, humans prefer to partition shapes with segments of the shortest possible length. This means that such segments meet the opposite lines of the contour orthogonally. More formally:
for each curvature minimum M
    for each point Pi on the contour
        compute the vector v = Pi->Pi+1   % tangent at Pi
        compute the normal vector n ⊥ v   % orthogonal to Pi Pi+1

        L := M + k*n                      % line through M, in direction of n

        S = intersect(L, Pi Pi+1)         % intersection point of line L and
                                          % segment Pi Pi+1

        if between(S, Pi Pi+1)            % intersection within segment Pi Pi+1
            insert S                      % into the contour point sequence
        end
    end
end
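The core geometric step, finding the foot of the perpendicular dropped from a curvature minimum M onto a contour segment, can be sketched as (Python; names are ours):

```python
def perpendicular_foot(m, p, q):
    """Foot of the perpendicular from point m onto segment p-q,
    or None if the foot falls outside the segment."""
    px, py = p
    qx, qy = q
    mx, my = m
    dx, dy = qx - px, qy - py
    denom = dx * dx + dy * dy
    if denom == 0:
        return None                  # degenerate segment
    # Parameter t of the orthogonal projection of m onto the line p-q:
    t = ((mx - px) * dx + (my - py) * dy) / denom
    if not 0.0 <= t <= 1.0:
        return None                  # shortest cut would not hit this segment
    return (px + t * dx, py + t * dy)

print(perpendicular_foot((1, 2), (0, 0), (4, 0)))  # (1.0, 0.0)
print(perpendicular_foot((9, 2), (0, 0), (4, 0)))  # None: foot outside segment
```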
4.3.6.3 insertStraightCut
Continue each long line segment until it intersects some other segment.
Insert the intersection point into the sequence of boundary points.
Figure 4.9: After points have been inserted, intuitive partitioning is possible.
4.4 Part segmentation
4.4.1 SplitShape algorithm
The algorithm uses the previously mentioned cognitive principles, trying to split a shape the way a human would. The following pseudocode illustrates the main idea in strongly simplified form.
while remaining shape not convex
    for all curvature minima M
        for all points P on the boundary
            if admissible(cut M->P)
                compute
                    cutLength
                    areaC        % relative area of the candidate part
                    solC         % its solidity (= convexity)
                    solR         % solidity of the remaining shape
                    mnSalience   % salience of the start and end points

                F = areaC + solC + cutLength + solR + mnSalience

                save F           % value of the utility function for cut M->P
            end
        end
    end
    remove the parts with the highest F
end
A cut is admissible if it lies completely inside the shape boundary. To verify this for a cut MP, we designed a function lineInPoly that checks whether the segment MP intersects any of the existing line segments. (In other words, for all n, the points pn, pn+1 must lie on the same side of the line through M and P.) This solution is much better than the numerical one (generate 100 points between M and P and check whether all of them lie in the polygon).
Each component of the utility function F is cognitively motivated:
areaC
Part salience increases with relative area. Regions of a shape are nor-
mally perceived as parts only if they have some significant area relative to the original shape [15].
solC
Rosin [11] showed that a partitioning scheme which maximizes the weighted sum of part solidities is closely related to Hoffman and Singh's part salience factors [4]. Therefore, part salience increases with solidity.
solR
Empirically we found out that "good" parts, when removed from the orig-
inal shape, make it more solid. For example, the solidity of an X-shape is
about 0.3. However, when the four "legs" are removed, only the central
core remains, which has solidity=1.
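Solidity (shape area divided by convex hull area) can be sketched in pure Python; the plus-shaped test polygon below has area 5 and convex hull area 7, so its solidity is 5/7 ≈ 0.714 (the helpers and the toy polygon are ours):

```python
def shoelace(pts):
    """Absolute polygon area via the shoelace formula."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def convex_hull(pts):
    """Andrew's monotone-chain convex hull."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def half(points):
        h = []
        for p in points:
            while len(h) >= 2 and ((h[-1][0] - h[-2][0]) * (p[1] - h[-2][1]) -
                                   (h[-1][1] - h[-2][1]) * (p[0] - h[-2][0])) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = half(pts), half(pts[::-1])
    return lower[:-1] + upper[:-1]

def solidity(boundary):
    """Area of the shape divided by the area of its convex hull."""
    return shoelace(boundary) / shoelace(convex_hull(boundary))

# A plus/cross shape made of five unit squares of a 3x3 grid:
cross = [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (2, 2),
         (2, 3), (1, 3), (1, 2), (0, 2), (0, 1), (1, 1)]
print(round(solidity(cross), 3))  # 5/7 -> 0.714
```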
mnSalience
Hoffman and Singh [4] showed that sharper extrema of curvature are more powerful attractors of part cuts. Latecki [6, 7] demonstrated that the lengths of the corresponding tangents are also important.
Moreover, we found that the components are not significantly correlated
and thus all of them need to be computed. Each component has been
appropriately weighted, sometimes using nonlinear transformations.
After all part candidates have been examined, the algorithm:
- takes the part Pi with the highest value of the utility function, Fmax,
- finds all other parts Pj that are:
(1) disjoint with Pi (Pi and Pj have no more than 2 points in common),
(2) of utility function value Fj > 0.9·Fmax.
Criterion (1) ensures that any subregion of the shape belongs to exactly one part. Thus, it is forbidden to partition a leg (= foot + shin + hip) into (foot + shin) and (shin + hip).
Demand (2) is mainly due to computational reasons. To partition a shape in less time, it is better to remove several parts in one iteration. Moreover, if parts were removed one by one, the value of the utility function in later iterations would differ, making the overall result very noise-sensitive.
The described algorithm splits many simple shapes of the MPEG-7 dataset in just one iteration (such as device0..device7, apple, cell_phone, etc.).
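The selection step above can be sketched as follows (Python; the candidate scores and the overlap predicate below are toy stand-ins for the real part data, and this sketch also checks disjointness among all chosen parts):

```python
def select_parts(candidates, overlap, threshold=0.9):
    """Greedy selection: the best-scoring part plus all parts that are
    disjoint with the parts chosen so far and score within `threshold`
    of the maximum.

    candidates: list of (part_id, F) pairs.
    overlap(a, b): True if the two parts share more than two contour points.
    """
    best_id, f_max = max(candidates, key=lambda c: c[1])
    chosen = [best_id]
    for pid, f in candidates:
        if pid != best_id and f > threshold * f_max and \
           not any(overlap(pid, c) for c in chosen):
            chosen.append(pid)
    return chosen

# Toy example: "b" overlaps the winner "a" and is rejected despite its score.
pairs = {("a", "b"), ("b", "a")}
overlap = lambda x, y: (x, y) in pairs
print(select_parts([("a", 1.0), ("b", 0.95), ("c", 0.92), ("d", 0.5)], overlap))
# ['a', 'c']
```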
4.4.2 Merge parts
The main idea of this thesis is not only to partition shapes intuitively, but also to ensure that similar shapes are partitioned in the same way. Partitioning should also be robust to noise and moderate perturbations of the contour.
Since partitioning algorithms have no background knowledge about the
world, wrong cuts cannot be avoided.
(a) (b)
Figure 4.10: (a) incorrect partitioning of octopus-15. The part cut through the body is
wrong, even though its start and end points are salient minima. (b) Correct partitioning.
The merging algorithm consists of two phases. In the first phase, obvious errors are corrected, such as the splitting of a circle into two halves. These errors mainly occur in an attempt to make the remaining shape as convex as possible.
The key idea of the first phase is to merge two or more parts so that:
(1) the resulting part has high solidity,
(2) the merge is not "unfair" to other merge candidates.
For example, a regular pentacle (5-pointed star) is naturally split into a
pentagon with 5 triangles around it. We have to forbid a merge between
the pentagon and any of those triangles to remain "fair" and preserve
topology.
(a) (b)
Figure 4.11: the correct partitioning (a) can be destroyed by an incorrect merge (b)
The second phase of the algorithm is "topological" merge. Two or more
neighbouring parts are merged if each of them has more than one neigh-
bour. The rationale is that most living things or objects have their limbs
arranged around just one center.
(Another extension of the algorithm would be to merge several parts if they can be rotated in such a way that the result has high solidity. For example, the tail of a ray is basically a long cylinder which can be bent several times. Because of its low solidity, it will be split into about ten parts by the SplitShape algorithm. If all those parts are merged, the degree of similarity between a ray with a straight tail and one with a bent tail increases, making the retrieval robust to articulation of limbs.)
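The topological merge rule, in its simplest reading, can be sketched over a part-adjacency graph (Python; the graph below is a hypothetical beetle-like example, whereas the real algorithm operates on contour data):

```python
def topological_merge(neighbours):
    """Return the parts to be merged into a central core: all parts that
    have more than one neighbour.

    neighbours: dict mapping part_id -> set of adjacent part_ids.
    """
    return {p for p, ns in neighbours.items() if len(ns) > 1}

# Two torso halves touch each other and several legs; each leg touches
# only its torso half, so only the torso halves are merged:
graph = {
    "torso1": {"torso2", "leg1", "leg2"},
    "torso2": {"torso1", "leg3", "leg4"},
    "leg1": {"torso1"}, "leg2": {"torso1"},
    "leg3": {"torso2"}, "leg4": {"torso2"},
}
print(sorted(topological_merge(graph)))  # ['torso1', 'torso2']
```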
4.5 Feature extraction
4.5.1 Global features
The idea is to capture the overall appearance of the shape without going
into details. These features are computed right after the boundary extraction. As global features we took 14 normalized Fourier Descriptors
obtained by the Contour Fourier method [5]. The advantages of these
features are:
- robustness to noise,
- invariance to rotation, translation and scale,
- invariance to starting point on the boundary,
- computational efficiency.
4.5.2 Local features
To describe the segmented parts we used the following features:
1. Fourier descriptors (also 14, as in the case of global features)
2. roundness
3. relative area
4. solidity
5. number of neighbours
We found that Fourier descriptors are much better at describing convex shapes than features such as eccentricity. They can distinguish between shapes such as squares and hexagons, which is not the case with many other global features.
Roundness has very small values for elongated shapes such as sticks or pencils and is therefore good at detecting them.
The relative area indicates at which scale the parts should be matched during retrieval, since the system expects whole shapes as a query.
The solidity of most parts will be equal to one, because the SplitShape algorithm tries to remove only solid parts. However, during the topological merge several central parts can be merged together. For example, a beetle torso will most likely be split into two halves that are later merged together, because each half has several neighbours (legs or antennas).
The number of neighbours allows us to distinguish between classes such as device1 and device2. Both of these cogwheel-like structures have a central core with limbs around it. However, the former has six spikes whereas the latter has eight.
4.6 Retrieval algorithm
Once a reasonably intuitive shape partitioning has been found, partial
shape matching can be carried out which is more robust to articulations
and occlusions than whole shape matching.
For similarity computation we used a distance measure based on Eu-
clidean distance between feature vectors.
We found that with increasing solidity, partitioning becomes increasingly unstable and thus the part-based distance becomes unreliable. Therefore we decided to weight the distances depending on the shape's solidity.
compute the feature matrix of the query shape Q

for all part-feature matrices Mk in the database
    if Q has more rows than Mk
        swap(Mk, Q)
    end

    for each row ri of Q
        for each row rj of Mk
            d(j) = dist(ri, rj)   % Euclidean distance between two parts
        end
        totalDist += min(d)       % add the distance of the best matching parts
        remove the matched part from Mk
    end
    totalDist += penalty(unmatched parts)
end
Thus, the part-based shape distance is the sum of Euclidean distances between best-matching parts plus a penalty for unmatched parts.
The advantage of this algorithm is that once the shapes' features have been extracted, retrieval is fast, because only Euclidean distances have to be computed; this is in contrast to graph-matching algorithms, where most of the computation has to be done during retrieval. Typically it takes less than 10 ms to match two shapes of average complexity.
Another advantage is flexibility. By adjusting the penalty weight one can
implicitly set the threshold for tolerable occlusion percentage. For ex-
ample, if the penalty is set to zero (which is the case in the shape tokens
algorithm [2]), then it is enough to match all parts of one shape to com-
pute distance. Thus, the distance between a cogwheel and a circle would
be zero because the core of the cogwheel perfectly matches the circle.
However, we believe this is not the most intuitive result.
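The matching loop above can be sketched as follows (Python; the penalty weight of 1.0 per unmatched part and the toy feature rows are illustrative choices, not values from the thesis):

```python
def part_distance(Q, M, penalty=1.0):
    """Greedy part matching: each row of the smaller feature matrix is matched
    to its closest remaining row of the larger one; every unmatched row of the
    larger matrix then incurs a fixed penalty."""
    if len(Q) > len(M):
        Q, M = M, Q                      # ensure Q has fewer parts
    M = [list(r) for r in M]             # work on a copy
    total = 0.0
    for ri in Q:
        dists = [(sum((a - b) ** 2 for a, b in zip(ri, rj)) ** 0.5, j)
                 for j, rj in enumerate(M)]
        d, j = min(dists)
        total += d                       # distance of the best-matching part
        M.pop(j)                         # each part may be matched only once
    return total + penalty * len(M)      # penalize the unmatched parts

Q = [[0.0, 0.0]]
M = [[0.0, 1.0], [3.0, 4.0]]
print(part_distance(Q, M))  # best-match distance 1.0 + penalty 1.0 = 2.0
```

Setting `penalty=0.0` reproduces the zero-penalty behaviour discussed above, where a fully matched subset of parts already yields distance zero.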
Chapter 5
Performance Evaluation
5.1 Retrieval rate
To test the performance of our system we evaluated the retrieval rate on
the dataset created by the MPEG-7 committee for evaluation of shape
similarity measures [20]. The test set consists of 70 different classes of
shapes, each class containing 20 similar objects, usually (heavily) dis-
torted versions of a single base shape. The whole dataset therefore con-
sists of 1400 shapes. For example, each row in Figure 5.1 shows four shapes from the same class.
We focus our attention on the performance evaluation in experiments es-
tablished in Part B of the MPEG-7 CE-Shape-1 data set.
Each image was used as a query, and the retrieval rate is expressed by the so-called Bull's Eye Percentage (BEP): the fraction of images belonging to the same class among the top 40 matches. Since the maximum number of correct matches for a single query image is 20, the total number of correct matches is 28000.
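The score can be sketched as follows (Python; the toy rankings below are fabricated purely to exercise the formula, not MPEG-7 results):

```python
def bulls_eye(rankings, labels, top=40, per_class=20):
    """Bull's Eye Percentage: same-class shapes retrieved within the top
    `top` matches, summed over all queries and divided by the maximum
    possible number of correct matches (`per_class` per query)."""
    correct = sum(sum(1 for r in ranked[:top] if labels[r] == labels[q])
                  for q, ranked in rankings.items())
    return correct / (per_class * len(rankings))

# Toy database: 40 shapes in 2 classes of 20; two queries, each ranking
# all of its 20 classmates within the top 40:
labels = {i: i // 20 for i in range(40)}
rankings = {0: list(range(40)),
            20: list(range(20, 40)) + list(range(20))}
print(bulls_eye(rankings, labels))  # 1.0
```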
(a) (b)
Figure 5.1: Some shapes used in part B of MPEG-7 Core Experiment CE-Shape-1. Shapes in each row belong to the same class. (Reprinted from [20])
Strong shape variations within the same classes mean that no shape similarity measure achieves a 100% retrieval rate. See, e.g., the third row in (a) and the first and second rows in (b). The third row shows spoons that are more similar to shapes in other classes than to each other [20].
Figure 5.2: Results of the MPEG-7 CE-Shape-1 part B test for each class for both Contour Fourier descriptors and our part-based method (bar chart of the retrieval rate in % per class).
Figure 5.2 shows that our method (significantly) outperforms the CFD method for 55 classes. The Bull's Eye percentages are:
- our method: 63.536 %
- Contour Fourier method: 57.014 %
From Figure 5.2 one can see that our method performs best on shapes with a clear part structure and thus a stable (consistent) partitioning, such as device7, which is most logically split into a central core and 10 triangles around it.
Our method also deals better with occlusions and articulations than the CFD method (see Figure 5.5).
However, in the case of unstable partitioning the retrieval rate of our method decreases significantly.
5.2 Time issues
To run the Bull's Eye test we used an Intel Pentium 4 CPU at 2.26 GHz with 512 MB of RAM. The programs were unoptimized Matlab code.

                                         Our method    Contour Fourier method
Time to complete the Bull's Eye test     4 h 32 min    2 h 2 min
Average retrieval time for one query     11.65 s       5.23 s
Time to extract features                 117 min       20 min
Average time to extract features         5 s           0.85 s

Table 5.1: Time needed to perform feature extraction and retrieval.
5.2.1 Feature extraction
Table 5.1 shows that the proposed method is about six times slower than the CFD method. In fact, the CFD is a subset of our method (see the "compute global features" step). However, most of the computation during feature extraction is due to the convex hull construction. Nevertheless, this step cannot be left out because, as previously shown, convexity is one of the main features determining part saliency.
It also deserves mentioning that the CFD's speed is O(n) in the number of boundary pixels n and is completely independent of the shape (i.e. of the relative positions of the pixels). On the other hand, the speed of our partitioning algorithm depends strongly on the shape and is O(mn), where m is the number of curvature minima. However, this increase in complexity is bounded, because the adaptiveSmoothing and curveEvolution routines simplify the shape boundary and thus limit the number of minima.
In the best case the shape is convex and does not need to be split; then the algorithm is exactly as fast as the CFD. This explains why the feature extraction time ranges from 1 to 15 seconds per shape.
5.2.2 Retrieval
Here the retrieval times differ on average by a factor of two. Again, in the CFD case the speed is constant, because to compare two shapes only the distance between two feature vectors has to be computed.
Our system, on the other hand, needs to compute distances between each pair of parts in addition to the global feature vector. Thus, the time required grows quadratically in the number of shape parts. However, the number of parts is implicitly limited by the preceding curve smoothing, so that one can basically regard the increase in delay as a
constant factor.
We think that the retrieval speed can be improved by properly indexing
feature matrices. For example, one could sort them by the number of rows
(i.e. the number of shape parts) and then match only potential candidates.
We also believe that for a CBIR system the retrieval time is the most relevant, because this is the time the user has to wait for results. Moreover, feature extraction needs to be done only once, whereas the retrieval delay occurs every time.
5.3 Comparison to other part-based methods
We compared our system to Shape Tokens, Latecki NL and Skeleton-
based methods, which were briefly described in chapter 1. To obtain
the Bull's Eye scores we used http://give-lab.cs.uu.nl/sidestep. The authors of this website have reimplemented many popular shape-based algorithms, so we assume that the reported scores are correct.
5.3.1 Shape tokens
As mentioned before, this method is not rotation-invariant. It is also not robust to occlusions, since sufficiently large protrusions caused by noise can create extra inflection points and thus split shape tokens. Our method would simply regard such protrusions as new parts and remove them in the SplitShape method.
5.3.2 Skeletons
The main problems of skeleton-based matching are sensitivity to noise, high computational complexity and sometimes unintuitive partitioning (see chapter 1 for details). The advantage is robustness to articulations. Usually shock graphs need to be constructed and matched, which is very time-consuming. The Bull's Eye percentage reported by the aforementioned website is 68%.
5.3.3 Latecki NL
This method is more accurate than the previous two, with a Bull's Eye score of 72%. However, the price paid is O(n³ log(n)) computational complexity during matching.
Figure 5.3: Twenty most similar images to device7-10 found by our method, with distances d: device7-10 (0), device2-05 (1.71), device7-17 (1.73), device7-15 (1.83), device7-02 (1.88), device7-04 (1.9), device7-06 (2.23), device7-11 (2.45), device7-16 (2.54), device7-03 (2.55), device2-13 (2.58), device7-09 (2.64), device2-19 (2.66), device7-13 (2.66), device1-14 (2.72), octopus-18 (2.73), device2-17 (2.84), device7-19 (2.84), device7-01 (2.86), device7-05 (2.87). Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black.
Figure 5.4: Twenty most similar images to device7-10 found by the CFD method, with distances d: device7-10 (0), device2-05 (0.55), octopus-18 (1.31), device3-17 (1.4), device3-04 (1.42), device3-12 (1.42), hat-04 (1.44), device3-19 (1.48), device3-02 (1.53), device3-09 (1.54), device1-07 (1.6), device3-07 (1.6), device0-12 (1.63), device1-15 (1.63), device1-16 (1.65), device1-11 (1.66), device3-10 (1.66), device1-14 (1.66), device3-03 (1.66), device1-06 (1.67). Images are displayed as silhouettes because this method does not compute any parts.
Figure 5.5: Twenty most similar images to ray-11 found by our method, with distances d: ray-11 (0), ray-12 (0.98), ray-15 (1.55), ray-09 (1.56), ray-16 (1.57), ray-06 (1.75), ray-10 (1.78), ray-07 (1.94), ray-08 (1.97), ray-04 (1.98), cattle-03 (1.98), ray-13 (2.02), ray-03 (2.04), cattle-15 (2.05), cattle-07 (2.06), elephant-06 (2.07), cattle-02 (2.11), elephant-09 (2.11), elephant-03 (2.12), cattle-10 (2.12). Matched parts are displayed in the same color as the corresponding query parts. Parts for which no correspondence was found are painted black.
Figure 5.6: Twenty most similar images to ray-11 found by the CFD method, with distances d: ray-11 (0), ray-12 (0.6), ray-15 (1.04), ray-16 (1.05), butterfly-20 (1.32), butterfly-17 (1.4), ray-06 (1.46), elephant-09 (1.46), cattle-01 (1.46), ray-07 (1.48), elephant-06 (1.49), ray-19 (1.49), butterfly-14 (1.55), cattle-20 (1.56), horse-09 (1.57), horse-05 (1.57), camel-16 (1.59), ray-09 (1.63), horse-10 (1.64), deer-19 (1.64).
Figure 5.7: Partitionings of the shapes device6-01 through device6-20. Inconsistent partitioning makes it difficult to match shapes.
Chapter 6
Conclusions
As expected, the developed system performs best on shapes with a clear part structure (such as device1 or device7). It can distinguish between 5-, 8- and 10-pointed stars even in the presence of noise or articulations. Moreover, thanks to partial matching, it significantly outperforms global descriptors on partially occluded shapes (e.g. the classes ray, apple and octopus of the MPEG-7 Shape-1 dataset).

Problems may arise with shapes prone to unstable partitioning. Whenever a shape is split incorrectly (i.e. differently from the other members of its class), the result is a large shape distance. This explains why the retrieval rate is so low for the classes fly and dog.

To overcome this flaw, the developed system was extended to combine part-based and global descriptors: the shape distance is computed as a weighted sum of the global and part-based distances. Whenever a shape is prone to unstable partitioning (which is most often the case for high-solidity shapes), the algorithm gives the part-based distance less weight. In this way, the system tries to perform at least as well as the associated global descriptor.
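The combination described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names, the solidity threshold of 0.9 and the two weight values are assumptions chosen for the example.

```python
def solidity(shape_area, hull_area):
    """Solidity = area of the shape divided by the area of its convex hull."""
    return shape_area / hull_area

def combined_distance(d_part, d_global, sol, alpha_low=0.7, alpha_high=0.2):
    """Weighted sum of part-based and global shape distances.

    High-solidity shapes tend to be partitioned unstably, so the
    part-based distance receives less weight for them. The threshold
    (0.9) and the weights are illustrative assumptions.
    """
    alpha = alpha_high if sol > 0.9 else alpha_low
    return alpha * d_part + (1 - alpha) * d_global
```

With these illustrative weights, a compact high-solidity shape (e.g. sol = 0.95) is scored almost entirely by the global descriptor, while a shape with pronounced parts is scored mainly by the part-based distance.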
Chapter 7
Future Work
To improve the developed system, one could implement the following extensions:

- Take the relative positions of parts into account. Define features such as part orientation, and describe the position of each part in polar coordinates (distance from the shape centroid and polar angle).

- Merge part chains. Bent shapes (such as sea_snake) are currently split into many convex segments, although topologically such shapes have no part structure. Curved parts could be described by their solidity or bending energy. This would allow detecting the similarity between bone and broken_bone.

- To make matching more robust to partitioning errors, allow several parts to be matched to one part, or even N parts to M parts at once. Merge or split parts at runtime to better match the query shape. This extension would be relatively easy to implement because the algorithm already saves the computed parts in the database.
- Compute several representations for each shape (the most probable partitions). The shape distance is then the smallest distance over all pairs of such partitions. (This would, of course, slow down both the feature extraction and retrieval processes, scaling as O(n²).)
- Use a more powerful global descriptor, or a combination thereof, for example the multiscale Fourier descriptor [5].
- Use envelope detection, or simply the convex hull, before extracting shape boundaries. This would allow shapes such as the distorted pentagons, triangles and squares from the MPEG-7 Shape-1 dataset to be classified correctly.
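The first extension above — describing each part's position in polar coordinates relative to the shape centroid — could look roughly like this. It is only a sketch under assumptions: the centroid is approximated by the mean of the contour vertices, and the normalization by the maximum radius is one possible choice for scale invariance.

```python
import math

def centroid(points):
    """Mean of the vertex coordinates (a simplification of the true area centroid)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def part_position(shape_points, part_points):
    """Position of a part relative to the whole shape as (distance, angle).

    The distance is normalized by the shape's maximum radius so the
    feature is scale-invariant; the angle is in radians.
    """
    cx, cy = centroid(shape_points)
    px, py = centroid(part_points)
    r_max = max(math.hypot(x - cx, y - cy) for x, y in shape_points)
    return (math.hypot(px - cx, py - cy) / r_max,
            math.atan2(py - cy, px - cx))
```

To make such angles comparable across rotated shapes, one would additionally have to measure them relative to some reference orientation (e.g. the shape's principal axis) rather than the image axes.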
Contributions of this project thesis
- Design of a utility function that combines several cognitive principles of shape partitioning,
- Algorithm to find perceptually salient curvature extrema,
- Algorithm to check whether a cutting segment lies inside a polygon,
- Shape splitting and merging algorithms,
- Combining of part-based and global similarity measures,
- Image database retrieval architecture.
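The segment-in-polygon check listed among the contributions can be sketched as follows. This is a generic textbook construction, not the thesis's exact algorithm: a cut between two polygon vertices lies inside a simple polygon if it properly crosses no edge and its midpoint is inside.

```python
def point_in_polygon(p, poly):
    """Ray casting: is point p strictly inside the simple polygon poly?"""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            xcross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xcross:
                inside = not inside
    return inside

def segments_cross(a, b, c, d):
    """Do segments ab and cd properly (strictly) intersect?"""
    def orient(p, q, r):
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    return o1 * o2 < 0 and o3 * o4 < 0

def cut_inside_polygon(p, q, poly):
    """A cutting segment pq between two polygon vertices lies inside the
    polygon iff it properly crosses no edge and its midpoint is inside."""
    n = len(poly)
    for i in range(n):
        if segments_cross(p, q, poly[i], poly[(i + 1) % n]):
            return False
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return point_in_polygon(mid, poly)
```

For an L-shaped polygon, for example, the diagonal cut across the concavity fails the midpoint test, while a cut through the interior passes both tests.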
Appendix
Results that turned out less useful
The original project title was "Shape representation and matching using geometric primitives (geons)". The main idea was to decompose a given 2-D binary shape into generalized rectangles and ellipses and to represent the shape as a directed graph or encode it as a number.
However, we found that this approach is only applicable to man-made objects with a clear geometric structure. In contrast, most natural objects cannot be reliably represented by simple geometric figures. Thus, this decomposition scheme is not robust and hence inappropriate for part-based matching.

Therefore, instead of describing parts in parametric form, we decided to extract global features from each part. This means that in the new approach the parts can have arbitrary form.
Bibliography
[1] N. Alajlan. Multi-Object Shape Retrieval Using Curvature Trees. PhD thesis, University of Waterloo, Canada, 2006.

[2] S. Berretti, A. Del Bimbo, and P. Pala. Retrieval by shape using multidimensional indexing structures. In ICIAP, pages 945–950, 1999.

[3] D. D. Hoffman and W. A. Richards. Parts of recognition. In T. F. Shipley and P. J. Kellman, editors, Cognition, chapter 18, pages 65–96. Elsevier Science, 1984.

[4] D. D. Hoffman and M. Singh. Salience of visual parts. Cognition, 63, pages 29–78, 1997.

[5] I. Kunttu, L. Lepisto, J. Rauhamaa, and A. Visa. Multiscale Fourier descriptor for shape classification. In ICIAP '03: Proceedings of the 12th International Conference on Image Analysis and Processing, pages 536–541, Washington, DC, USA, 2003. IEEE Computer Society.

[6] L. J. Latecki and R. Lakamper. Convexity rule for shape decomposition based on discrete contour evolution. Computer Vision and Image Understanding, 73(3):441–454, 1999.

[7] L. J. Latecki and R. Lakamper. Shape similarity measure based on correspondence of visual parts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10):1185–1190, 2000.

[8] L. J. Latecki, R. Lakamper, and U. Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 424–429, 2000.

[9] L. J. Latecki, R. Lakamper, and D. Wolter. Optimal partial shape similarity. Image and Vision Computing, 23(2):227–236, 2005.

[10] E. Petrakis, A. Diplaros, and E. Milios. Matching and retrieval of distorted and occluded shapes using dynamic programming, 2002.

[11] P. L. Rosin. Shape partitioning by convexity. In BMVC '99, pages 633–642, 1999.

[12] Y. Rui and T. S. Huang. Image retrieval: Current techniques, promising directions, and open issues. Journal of Visual Communication and Image Representation, 10, pages 39–62, 1999.

[13] M. Safar, C. Shahabi, and X. Sun. Image retrieval by shape: A comparative study. Technical Report 1, University of Southern California, 1999.

[14] K. Siddiqi and B. B. Kimia. Parts of visual form: Computational aspects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(3):239–251, 1995.

[15] M. Singh and D. D. Hoffman. From Fragments to Objects: Grouping and Segmentation in Vision, chapter 9, pages 401–459. Elsevier Science, 2001.

[16] M. Singh, G. Seyranian, and D. Hoffman. Parsing silhouettes: The short-cut rule. Perception and Psychophysics, 61, pages 636–660, 1999.

[17] M. B. Stegmann and D. D. Gomez. A brief introduction to statistical shape analysis. Technical report, Technical University of Denmark, 2002.

[18] M. Tanase-Avatavului. Shape Decomposition and Retrieval. PhD thesis, Utrecht University, The Netherlands, 2005.

[19] R. S. Torres and A. X. Falcao. Content-based image retrieval: Theory and applications. Revista de Informatica Teorica e Aplicada, 13(2):161–185, 2006.

[20] R. C. Veltkamp and L. J. Latecki. Properties and performances of shape similarity measures. In Tim Crawford and Remco C. Veltkamp, editors, Content-Based Retrieval, Dagstuhl Seminar Proceedings. IBFI, Schloss Dagstuhl, Germany, 2006.

[21] D. Zhang and G. Lu. A comparative study on shape retrieval using Fourier descriptors with different shape signatures, 2001.

[22] D. Zhang and G. Lu. Review of shape representation and description techniques. Pattern Recognition, 37(1):1–19, 2004.