many-to-many feature matching for structural pattern recognition · 2009-02-27 · keselman, dr....
Many-to-Many Feature Matching for
Structural Pattern Recognition
Muhammed Fatih Demirci
Technical Report DU-CS-05-13
Department of Computer Science
Drexel University
Philadelphia, PA 19104
December, 2005
Many-to-Many Feature Matching for Structural Pattern Recognition
A Thesis
Submitted to the Faculty
of
Drexel University
by
Muhammed Fatih Demirci
in partial fulfillment of the
requirements for the degree
of
Doctor of Philosophy
December 2005
© Copyright 2005 Muhammed Fatih Demirci. All Rights Reserved.
Dedications
To my family
Acknowledgements
This research would not have been possible without the help of a number of people. First and
foremost, I would like to express my deepest gratitude to my advisor, Dr. Ali Shokoufandeh, for
his invaluable, friendly guidance, trust, patience, and constant encouragement. Being his
first Ph. D. student has been a great honor to me. I will always walk through my academic
life the way he taught me.
I would also like to express my gratitude to Dr. Sven Dickinson of the University of
Toronto for his collaboration, for providing timely advice and encouragement, and for
serving on my Ph. D. committee. I also would like to thank Dr. Ko Nishino for his
advice, his tireless effort in reviewing my thesis, and for spending his precious time on the Ph. D.
committee. Special thanks are due to Dr. Dario Salvucci and Dr. Kim Boyer of the Ohio
State University for reading this thesis and for serving on my committee. I thank Dr. Wei
Sun for his generous support and thoughtful feedback. Thanks are also due to Dr. Yakov
Keselman, Dr. Lars Bretzner, Bram Platel, and Nicu Cornea for their helpful collaboration.
I also would like to thank the members of the Applied Algorithms Lab, Trip Denton,
Jeff Abrahamson, and John Novatnack, for taking time to proofread most of my publi-
cations, including this thesis. The discussions I had in AAL provided valuable input to
this dissertation. I also thank Craig Schroeder for proofreading many parts of this the-
sis. I would like to thank my friends Kemal Birtek, Suleyman Teke, Necati Anaz, and
Yucel Savran for being so friendly and patient, and for keeping me company during my stay
in Philadelphia. I am also thankful to my brother-in-law, Dr. Sinan Akgul and my sister
Aysen Akgul for their help and hospitality during the first year of my study in Delaware.
Finally, my endless thanks are due to my parents Keziban and Musa Demirci and my
fiancée Elmashan for their support, patience, and encouragement during these many years.
I would have never dreamed of pursuing my career as a researcher without them.
Table of Contents
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Review of Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Graph Representations and Basic Terminology . . . . . . . . . . . . . . . 11
2.2 Graph Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Embedding Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Dimensionality Reduction Techniques . . . . . . . . . . . . . . . . . . . 22
2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3 Metric Embedding of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Construction of a Tree Metric from a Distance Matrix (Numerical Taxonomy Problem) . . . . . . . . . . . . 30
3.3 Embedding into Graph-Dependent Dimensionality . . . . . . . . . . . . . 33
3.3.1 Path Partition of a Graph . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Construction of the Embedding . . . . . . . . . . . . . . . . . . . 38
3.3.3 Bringing Point Distributions into the Same Normed Space . . . . 40
3.4 Embedding through Spherical Coding . . . . . . . . . . . . . . . . . . . 44
3.4.1 Construction of the Embedding . . . . . . . . . . . . . . . . . . . 46
3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 Encoding Directed Edges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1 Qualitative Shape Representation Using a Blob/Ridge Decomposition . . 54
4.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5 Distribution-Based Many-to-Many Matching . . . . . . . . . . . . . . . . . . . 63
5.1 Choosing an Appropriate Transformation . . . . . . . . . . . . . . . . . 65
5.2 The Final Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6 View-Based 3-D Object Recognition . . . . . . . . . . . . . . . . . . . . . . . 71
6.1 Many-to-Many Matching using Silhouettes . . . . . . . . . . . . . . . . . 71
6.2 Many-to-Many Matching using Ridge-and-Blob Decomposition Graphs . 78
6.3 Comparison to Other Approaches . . . . . . . . . . . . . . . . . . . . . . 84
6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7 Face Recognition Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.1 Discrete Representation of Top Points via Scale Space Tessellation . . . . 89
7.2 Catastrophe Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.3 Construction of the Graph . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8 3D Object Retrieval using Many-to-Many Matching of Curve Skeletons . . . . 101
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.2.1 The Curve-Skeleton . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.3.1 Base Classification and Object Retrieval . . . . . . . . . . . . . . 107
8.3.2 Part Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
9.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
9.3 Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . 118
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
List of Tables
6.1 Recognition rate as a function of increasing perturbation. Note that the baseline recognition rate (with no perturbation) is 98.0% for the COIL-20 and 98.5% for the ETH-80 datasets. . . . 83
7.1 Recognition rate as a function of Gaussian noise at different signal levels. 99
List of Figures
1.1 The need for many-to-many matching. In the two images, the two objects are similar, but the extracted features are not necessarily one-to-one. Specifically, the ends of the fingers in the left hand have been over-segmented in the hand of the right image. . . . 3
1.2 Object Recognition Domains Used in the Framework. From left-to-right: Silhouette, Multi-Scale Qualitative Shape Description, Top Point in Scale Space, 3-D Skeleton . . . 5
1.3 Overview of Many-to-Many Matching Procedure. . . . . . . . . . . . . . 6
1.4 A hierarchical relation between two features in a directed graph. . . . . . 8
1.5 Left: the silhouette and its shock graph. Right: the shock tree constructed from the shock graph. Darker nodes reflect larger radii. . . . 9
2.1 An example graph whose vertices represent different image regions and whose edges represent relations between the regions. . . . 13
2.2 One-to-one feature correspondences computed by Siddiqi et al. [99] . . . 17
2.3 Matching results between two pairs of objects computed by Sebastian et al. [86] . . . 19
2.4 Representing a 256 × 256 pixel image as a point in a 65,536-dimensional space. Each pixel shown by a square in (a) corresponds to an entry in the 65,536-size vector in (b). . . . 24
3.1 Metric tree representation of the Euclidean distances between nodes in a graph. The gesture image (a) consists of 6 regions (the region representing the entire hand is not shown). The complete graph in (b) captures the Euclidean distances between the centroids of the regions, while (c) is the metric tree representation of the multi-scale decomposition (with additional vertices). . . . 33
3.2 Path partition of a tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 (a) A sample tree with edge weights. (b) Embedded vertices are shown in 3-dimensional space. The Cartesian coordinates of the points are: a = (0, 0, 0), b = (1, 0, 0), c = (1.5, 0, 0), d = (0, 2, 0), e = (0, 3.5, 0), f = (0, 2, 23.1). . . . 40
3.4 The minimum distance d and minimum angle θ between 2 points. . . . . 45
3.5 An edge weighted tree and its spherical code in 2D. The Cartesian coordinates of the vertices are: a = (0, 0), b = (0, 1.0), c = (0, 1.5), d = (2.0, 0), e = (2.5, 0.87), f = (3.5, 0), g = (3.93, 0.25), and h = (4.5, 0). . . . 46
3.6 Trade-off between distortion and dimension for a given set of graphs. . . . 51
4.1 Feature Extraction: Extracted blobs and ridges at appropriate scales. . . . 56
4.2 Extracted blobs and ridges after removing multiple responses and ridge linking. . . . 57
4.3 The four edge relations: (a,b) two normalized distance measures, (c) relative orientation, and (d) bearing. . . . 59
4.4 Histogram creation for each directed graph relation . . . . . . . . . . . . 60
4.5 Part (a) shows a vertex and its neighbors with their attributes. Histograms created for each attribute are presented in parts (b) and (c). . . . 61
6.1 Left: the silhouette and its medial axis. Right: the medial axis tree constructed from the medial axis. Darker nodes reflect larger radii. . . . 73
6.2 Sample views of the 9 objects. . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Summary of many-to-many matchings of object silhouettes. Every entry of Table 1 corresponds to a set of 19 × 19 matching results between the views of the two objects associated with the row and the column. The shade of gray in each cell denotes the average matching distance of each 19 × 19 block, with black and white representing smallest and largest distances, respectively. Table 2 shows a close-up look at the matching results for four views of TEAPOT. Table 3 depicts a subset of results from three separate blocks. . . . 75
6.4 Illustration of the many-to-many correspondences computed for two adjacent views of the TEAPOT. Matched point clusters are shaded with the same color. . . . 76
6.5 The result of matching skeleton graphs for some shapes in the Rutgers Tools Database. Same colors indicate corresponding segments. Observe that the correspondence is intuitive in all cases. . . . 77
6.6 Applying our algorithm to the images in Figure 1.1. Many-to-many feature correspondences have been colored the same. . . . 79
6.7 Views of sample objects from the Columbia University Image Library (COIL-20) and the ETH Zurich (ETH-80) Image Set. . . . 80
6.8 Sample matching results for object 9 of the COIL-20 database, in which rows and columns can be interleaved to form the set of sequential views. The diagonal and next lower diagonal therefore represent the neighboring views of the query (row). Only one query, entry (10,8), was incorrectly matched. . . . 81
6.9 The matching results for the COIL-20 database. The rows represent the query views (36 views per object), and the columns represent model views (36 views per object). Each row represents the matching results for a query view against the whole database. The intensity of entries represents the quality of the matching, with black representing maximum similarity between the views and white minimum similarity. . . . 82
6.10 Sample views of objects from the Rutgers Tools Database. . . . . . . . . 85
6.11 Comparison to two leading graph matching algorithms: Pelillo et al. [76] (left), Sebastian et al. [87] (center), and our algorithm (right). In each case, the top seven matched database objects are sorted by their similarity to the query. Correct matches are colored yellow, while mismatched entries are colored red. . . . 86
7.1 The generic catastrophes in isotropic scale space. Left: an annihilation event. Right: a creation event. A positive charge (+) denotes an extremum, a negative charge (−) denotes a saddle, and a separate marker indicates the singular point. . . . 91
7.2 Visualization of the DAG construction algorithm. Left: the Delaunay triangulations at the scales of the nodes. Right: the resulting DAG (edge directions not shown). . . . 93
7.3 The right image shows the DAG obtained from applying Algorithm 7 to the critical paths and top points of the face on the left. . . . 94
7.4 Sample faces from 20 people. . . . . . . . . . . . . . . . . . . . . . . . . 94
7.5 Ten face images of one person from the database. . . . . . . . . . . . . . 95
7.6 Computing similarity between two given faces. (Matched point clusters are shaded with the same color.) See text. . . . 96
7.7 Table 1: Matching results of 20 people. The rows represent the queries and the columns represent the database faces (query and database sets are non-intersecting). Each row represents the matching results for the set of 10 query faces corresponding to a single individual matched against the entire database. The intensity of the table entries indicates matching results, with black representing maximum similarity between two faces and white representing minimum similarity. Table 2: Subset of the matching results with the pairwise distances shown. Table 3: Effect of presence or absence of glasses in the matching for the same person. . . . 97
7.8 Sample face image after adding Gaussian noise at different signal levels. Part (a) shows the original image. Parts (b), (c), (d), (e), and (f) show how the image looks after adding 1%, 2%, 4%, 8%, and 16% Gaussian noise, respectively. . . . 99
8.1 Some examples of 3D shapes and their computed skeletons. . . . . . . . . 104
8.2 Computing similarity between two given objects. . . . . . . . . . . . . . 106
8.3 Precision/Recall for many-to-many matching algorithm in object retrievalexperiment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.4 Models are sorted by the similarity to the query object. . . . . . . . . . . 112
8.5 Part Matching Example: computed distances between a query part (torso) versus several simple and composite objects. . . . 112
8.6 Correspondences in Part Matching: The query object in (a) is matched against each of the objects in (b). The correspondences between their skeletons are shown in red in (c). . . . 113
Abstract
Many-to-Many Feature Matching for Structural Pattern Recognition
Muhammed Fatih Demirci
Advisor: Ali Shokoufandeh, Ph. D.
Graph matching is an important component in many object recognition algorithms. Al-
though most graph matching algorithms seek a one-to-one correspondence between nodes,
it is often the case that a more meaningful correspondence exists between a subset of nodes
in one graph and a subset of nodes in the other. In this thesis we aim to develop a framework
to establish many-to-many correspondences between the nodes of two noisy, vertex-labeled
weighted graphs. The difficulty of providing such correspondences is due to the fact that
any subset of nodes in one graph may correspond to any subset of nodes in another. To
overcome this combinatorial challenge, we transform the graphs into an alternative domain
in which the many-to-many graph matching becomes that of matching point sets. Our
interest in transforming the many-to-many graph matching problem into that of many-to-
many point matching is motivated by the fact that a number of algorithms have proven
useful in establishing such correspondences in geometric spaces in polynomial time.
Our goal is to use one such algorithm to approximate the solution for the original graph
representations. The algorithm is based on recent developments in efficient low-distortion
metric embedding of graphs into normed vector spaces. We present two such embedding
algorithms, beginning with Matousek’s algorithm [66], in which the dimensionality of a
graph’s embedding is graph-dependent. Two graphs to be matched may yield embeddings
with different dimensionality, requiring a projection step to bring them to the same space.
We overcome this problem by introducing a novel embedding technique, using a spheri-
cal encoding of graph structure, that embeds both graphs into a single space of prescribed
dimensionality. By embedding weighted graphs into normed vector spaces, we reduce
the problem of many-to-many graph matching to the problem of computing a distribution-
based distance measure between graph embeddings. We use a specific measure, the Earth
Mover’s Distance, to compute distances between sets of weighted vectors. The computed
mass flows yield a set of many-to-many node correspondences between the original graphs.
Empirical evaluation of the algorithm on an extensive set of recognition trials, including a
comparison with competing graph matching approaches, demonstrates both the robustness
and efficiency of the overall approach.
1. Introduction
1.1 The Problem
Humans show a remarkable ability to recognize objects without effort, despite the fact
that objects may vary in color, texture, or size. We are even able to describe and recognize
objects that we have not seen before. While people can accomplish such recognition tasks
quickly and accurately, building computer-based recognition systems with this capability
remains difficult. Given a database and a query, one way of defining the problem of object
recognition is to classify the query as an instance of a particular category from the database.
More formally, the object recognition problem is often formulated as the process of ex-
tracting object features, such as silhouettes, corners, and skeletons, and finding correspon-
dences between them. For recognition purposes, objects are often represented as attributed
graphs whose nodes represent their features (or their abstractions) and whose edges rep-
resent relations (or constraints) between the features. These graph representations allow
us to express many perceptually significant object properties, such as geometric or hierar-
chical part structures. We will use the terms features, nodes, and vertices interchangeably
throughout the rest of this thesis.
In the computer vision community, many algorithms designed to solve the object recog-
nition problem use graph representations. One of the most important reasons for this is
that graphs capture hierarchical feature relations in a way that is invariant to viewpoint
changes. When graphs are used to represent objects, the object recognition problem can be
transformed into that of graph matching. Given two graphs, the objective of graph matching
algorithms is to establish correspondences between nodes. To evaluate the quality of
a match, one defines an overall distance measure whose value depends on both node and
edge similarity.
Due to the importance of the recognition problem, which is formulated in terms of
graph matching, there has been a growing interest in developing efficient algorithms for
matching graphs and measuring the similarity of objects using graph representations. Pre-
vious work on graph matching has typically focused on the problem of finding a one-to-one
correspondence between vertex sets of graphs. However, the assumption of one-to-one cor-
respondence is a very restrictive one, as it assumes that the primitive features (nodes) in the
two graphs agree in their level of abstraction. Unfortunately, a variety of conditions may
lead to graphs that represent visually similar image feature configurations, yet admit no
one-to-one node correspondence between their vertex sets.
The limitations of the one-to-one assumption are illustrated in Figure 1.1. In this ex-
ample an object is decomposed into a set of ridges and blobs extracted at appropriate
scales [95]. The ridges and blobs map to nodes in a directed graph, with parent/child
edges directed from coarser scale nodes to overlapping finer scale nodes, and sibling edges
between nodes that share a parent. Although the two images clearly contain similar ob-
jects, the decompositions are not identical. Specifically, the ends of the fingers in the right
hand have been over-segmented with respect to the left hand. It is quite common that due
to noise or segmentation errors inherent in any feature extraction method, a single feature
(node) in one graph can correspond to a collection of broken features (nodes) in another
graph. Also, due to scale differences, a single, coarse-grained feature in one graph can
correspond to a collection of fine-grained features in the other.
Figure 1.1: The need for many-to-many matching. In the two images, the two objects are similar, but the extracted features are not necessarily one-to-one. Specifically, the ends of the fingers in the left hand have been over-segmented in the hand of the right image.
1.2 Objective
The principal objective of the research reported in this thesis is to develop a novel
framework for establishing many-to-many feature correspondences between pairs of graphs.
The difficulty of finding such correspondences is due to the fact that any subset of nodes
in one graph may correspond to any subset of nodes in another. To overcome this combi-
natorial challenge, we transform the graphs into an alternative domain in which the many-
to-many feature matching becomes that of matching point sets. Our interest in transform-
ing the many-to-many graph matching problem into that of many-to-many point matching
is motivated by the fact that a number of algorithms have proven useful in establishing
such correspondences in geometric spaces in polynomial time. Our goal is to use one
such algorithm to approximate the solution for the original graphs. More specifically, we
draw on recent low-distortion graph embedding techniques, which embed the nodes of one
graph into points in a low-dimensional geometric space. The points (which represent graph
nodes) in this new space are positioned in such a way that the Euclidean distance between
pairs of points reflects the shortest path distances between their corresponding nodes in the
original graph.
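To make this distance-preservation property concrete, the sketch below (our own illustration, not the thesis's embedding algorithm; the toy graph and function names are invented) computes the shortest-path metric that a low-distortion embedding is meant to approximate:

```python
import math

def shortest_path_metric(n, weighted_edges):
    """All-pairs shortest paths (Floyd-Warshall) for an undirected weighted graph.

    Returns an n x n matrix d where d[u][v] is the graph metric that a
    low-distortion embedding should reproduce as Euclidean distance.
    """
    d = [[math.inf] * n for _ in range(n)]
    for u in range(n):
        d[u][u] = 0.0
    for u, v, w in weighted_edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Toy tree: path a-b (weight 1.0), b-c (0.5), plus a branch a-d (2.0)
edges = [(0, 1, 1.0), (1, 2, 0.5), (0, 3, 2.0)]
D = shortest_path_metric(4, edges)
# D[2][3] follows c-b-a-d: 0.5 + 1.0 + 2.0 = 3.5
```

An embedding of these four nodes into a normed space would then try to place points so that pairwise Euclidean distances match the entries of `D` as closely as possible.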
It must be pointed out that these embedding methods are applicable only to undirected
graphs, in which a metric (symmetric) distance can be defined between every pair of nodes.
In most attributed graphs, however, such as scale-space structures, edges are directed and
information is encoded by hierarchical, non-metric relations, such as parent/child or sib-
ling relations. To extend our framework to directed graphs, we move such non-metric rela-
tional information into nodes as feature histograms. This allows embedded nodes to encode
neighborhood information about their relations in the original graph representations.
Armed with a low-dimensional vector representation of an input graph’s structure,
many-to-many graph matching can now be reduced to the much simpler problem of match-
ing weighted distributions of points in a normed vector space. A number of algorithms
have been developed for matching weighted points in normed vector spaces and computing
similarities between them. We consider one such similarity measure, known as the Earth
Mover’s Distance (EMD) [82]. Intuitively, the EMD approach can be defined as follows.
Given a pair of weighted point sets, consider the first set as a mass of earth spread in space
and the other as a collection of holes in the same space. The EMD then computes the
minimum amount of work needed to fill the holes with earth. The work here refers to the
product of point weights that move from one point set into the other and the distances over
which they travel. Our goal is to show that the many-to-many vector mapping that realizes
Figure 1.2: Object Recognition Domains Used in the Framework. From left-to-right: Silhouette, Multi-Scale Qualitative Shape Description, Top Point in Scale Space, 3-D Skeleton
the minimum Earth Mover’s Distance corresponds to the desired many-to-many matching
between nodes of the original graphs. The result is a more efficient approach to many-to-
many graph matching that, in fact, includes the special case of one-to-one graph matching.
To demonstrate the effectiveness of the approach to shape retrieval, we apply it to four dif-
ferent object recognition domains: silhouettes, multi-scale qualitative shape descriptions,
top points in scale space, and 3-D skeletons. Figure 1.2 shows images from each of these
domains. A comparative study using silhouettes and 3-D skeletons shows that our method
outperforms all existing techniques reported for the same databases.
An overview of the approach is presented in Figure 1.3. A pair of views are first
represented by attributed graphs (Transition 1). The graphs are then mapped into a low-
dimensional vector space using a low-distortion graph embedding (Transition 2). Finally,
a many-to-many point (embedded graph node) correspondence is computed by the Earth
Mover’s Distance (Transition 3).
Figure 1.3: Overview of Many-to-Many Matching Procedure.
1.3 Thesis Overview
After defining the problem statement and the objectives of the research presented in this
chapter, we give a review of the related work in Chapter 2. The related work consists of
several existing graph representations, graph matching algorithms, embedding techniques,
and dimensionality reduction methods. Chapter 2 also introduces basic graph terminology.
Chapter 3 presents two low-distortion graph embedding techniques. Distortion, in this
context, is defined as the maximum factor by which any distance is changed by the em-
bedding algorithm. The first embedding technique is inspired by the general framework
proposed by Matousek [66]. The framework begins by transforming a graph into a metric
tree, which is then embedded into a normed vector space. Although low-distortion em-
bedding is achieved, the approach suffers from the significant limitation that each graph
is embedded into a vector space whose dimensionality is graph-dependent. Thus, before
the embeddings can be matched, a dimensionality reduction technique such as Principal
Component Analysis (PCA) is required. The aim of dimensionality reduction methods is
to represent high-dimensional data in lower dimensions without significant loss of infor-
mation. Since the original high-dimensional data cannot be represented exactly in lower
dimensions, dimensionality reduction methods introduce error. The second embedding
technique is a deterministic variation of the spherical coding algorithm [43]. This novel
linear-time procedure embeds metric trees into normed vector spaces of prescribed
dimensionality with minimal distortion. Since both embeddings are in the same space, they
can be matched directly without the need for a dimensionality reduction step.
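The distortion of a candidate embedding can be checked directly against this definition. The sketch below is our own illustration (the names and toy data are invented, not taken from the thesis); it compares the original graph distances with the Euclidean distances between embedded points:

```python
import math

def distortion(orig_dist, points):
    """Distortion of an embedding: the product of the worst-case expansion
    and the worst-case contraction of pairwise distances.

    orig_dist[i][j]: the metric on the graph nodes; points[i]: embedded vectors.
    An isometric (distance-preserving) embedding has distortion exactly 1.0.
    """
    expansion = 0.0    # max ratio embedded distance / original distance
    contraction = 0.0  # max ratio original distance / embedded distance
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            emb = math.dist(points[i], points[j])
            expansion = max(expansion, emb / orig_dist[i][j])
            contraction = max(contraction, orig_dist[i][j] / emb)
    return expansion * contraction

# The 3-node path a-b-c with edge weights 1.0 and 0.5 embeds
# isometrically on a line, so the distortion is 1.0:
D = [[0, 1.0, 1.5], [1.0, 0, 0.5], [1.5, 0.5, 0]]
pts = [(0.0,), (1.0,), (1.5,)]
# distortion(D, pts) -> 1.0
```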
Graph embedding methods approximate the distance metric defined on undirected edges
of the original graphs with minimal distortion. However, they fail to encode any oriented
relations, such as parent/child or sibling relations common to scale-space or coarse-to-fine
structures. This is due to the fact that oriented relations do not satisfy the symmetry prop-
erty of a metric. In Figure 1.4, for example, while the relative scale from feature B to
feature A is 0.5, it is 2.0 the other way around. To encode relational information in the
vector space, the embedding procedure should represent each input graph node as a point in an
asymmetric metric space. Due to the limited number of algorithms defined on asymmetric
metric distances, we will instead propose a method to encode relational information in the
metric space.
More specifically, after using a graph embedding algorithm to represent nodes as points
in the metric space, we encode non-metric relational information into nodes as feature dis-
tributions over the values of incident oriented edges. For one node, encoding the attributes
of its oriented edges requires computing distributions on the attributes and assigning them
to the node. The resulting attribute provides a contextual signature for the node. This
allows the framework to be applied to hierarchical structures represented as hierarchical
Figure 1.4: A hierarchical relation between two features in a directed graph.
graphs. This process will be explained in more detail in Chapter 4.
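A rough sketch of this encoding step is given below. This is our own illustration (the bin edges, attribute values, and helper names are invented); the thesis's actual histograms are over directed-edge attributes such as relative scale and bearing:

```python
def edge_attribute_histogram(node, directed_edges, bins):
    """Summarize the oriented edges incident to `node` as a fixed-length histogram.

    directed_edges: list of (parent, child, attribute_value) triples.
    bins: sorted bin edges; returns len(bins) - 1 counts. The histogram is
    stored at the node as a contextual signature, so the metric embedding
    never has to represent the asymmetric relations directly.
    """
    values = [a for (p, c, a) in directed_edges if p == node or c == node]
    counts = [0] * (len(bins) - 1)
    for v in values:
        for i in range(len(counts)):
            if bins[i] <= v < bins[i + 1]:
                counts[i] += 1
                break
    return counts

# Node B is the child of A (relative scale 0.5) and the parent of C (relative
# scale 2.0), mirroring the asymmetry illustrated in Figure 1.4:
edges = [("A", "B", 0.5), ("B", "C", 2.0)]
# edge_attribute_histogram("B", edges, [0.0, 1.0, 4.0]) -> [1, 1]
```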
Chapter 5 presents an overview of the Earth Mover’s Distance (EMD) framework [82]
for matching point sets in some geometric space. The EMD approach computes the
minimum amount of work, defined in terms of displacements of the point masses (weights),
needed to transform one distribution into another. The mass of each point is a
function of the histograms describing the node in the original graph. Here it is important
to consider the possibility that a point set may undergo a transformation with respect to the
other. To handle this, Cohen and Guibas [19] extended the definition of EMD, originally
applicable to pairs of fixed sets of points, to allow one of the sets to undergo a transforma-
tion. In Chapter 5 we show how to choose an appropriate transformation when matching
pairs of weighted point sets.
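For intuition, in the special case of two equal-size sets of unit-weight points, the EMD reduces to a minimum-cost perfect matching, which a brute-force sketch can compute exactly. This is our own illustration (names and data invented); practical EMD implementations solve the general transportation problem instead:

```python
import itertools
import math

def emd_unit_weights(xs, ys):
    """Earth Mover's Distance for the special case of two equal-size sets of
    unit-weight points: minimum-cost perfect matching by brute force.

    Work = sum over moved masses of (mass moved) x (distance traveled);
    with unit masses this is just the sum of matched pairwise distances,
    normalized here by the total mass.
    """
    assert len(xs) == len(ys)
    best = math.inf
    for perm in itertools.permutations(range(len(ys))):
        cost = sum(math.dist(xs[i], ys[perm[i]]) for i in range(len(xs)))
        best = min(best, cost)
    return best / len(xs)

# Two units of "earth" each move straight up by distance 1 to fill the "holes":
earth = [(0.0, 0.0), (1.0, 0.0)]
holes = [(0.0, 1.0), (1.0, 1.0)]
# emd_unit_weights(earth, holes) -> 1.0
```

The optimal flow found here (which point of earth fills which hole) is exactly the kind of mass-flow assignment that the framework reads back as many-to-many node correspondences.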
In Chapter 6 we evaluate the framework for each of the embedding techniques in two
different view-based recognition domains: silhouettes and multi-scale qualitative shape
descriptions. In the first domain, an object’s silhouette is represented by an undirected,
rooted, weighted graph in which nodes represent shocks [99] (also called skeleton points)
Figure 1.5: Left: the silhouette and its shock graph. Right: the shock tree constructed from the shock graph. Darker nodes reflect larger radii.
and edges connect adjacent shock points. Each point p on the discrete skeleton is labeled
by a 4-dimensional vector v�p �� � x � y � r� α � , where
�x � y � are the Euclidean coordinates
of the point, r is the radius of the maximal bi-tangent circle centered at the point, and α
is the angle between the normal to either bi-tangent and the linear approximation to the
skeleton curve at the point. The right of Figure 1.5 shows the shock graph constructed for
the left image. The second domain uses multi-scale qualitative shape descriptions in which
an image is decomposed into a set of blobs and ridges with automatic scale selection using
the algorithm described in [95]. We will explain these domains in more detail in Chapter 6.
In addition to these experiments, we show the applicability of the framework in two
other recognition domains: top points in scale-space and 3D skeletons in Chapter 7 and
Chapter 8, respectively. In Chapter 7 we describe a set of face recognition tests on a small
face database using scale-space top points. The embedding method used in this experiment
is based on spherical coding. The choice of spherical embedding is motivated by its better
performance than that of Matousek’s embedding. Each image in the database is represented
as a directed acyclic graph (DAG), where vertices represent the top points, and the edges
represent neighborhood structure between them. A DAG for each face image is constructed
using the algorithm described in [77].
In Chapter 8 we then apply the framework for 3D object retrieval using skeletal rep-
resentations of 3D volumetric objects. Each 3D object is represented as a curve skeleton,
which consists of a set of connected 1D curves (1 voxel thick). This representation has
a number of advantages: intuitiveness, part/component matching, registration, and artic-
ulated transformation invariance. One important contribution of this chapter is to show
the ability of our matching framework for part matching. More specifically, our goal is to
match a part within a complex whole in 3-dimensional space. This type of matching is par-
ticularly useful for CAD-type databases and also for recognition in laser-scanned images,
which tend to cluster objects together. It is also central to medical applications in which a
particular biological configuration is to be found somewhere in a larger object such as an
organ. At the end of Chapter 8 we present our preliminary part matching results.
Chapter 9 draws some conclusions and presents the potential of the proposed method
in a variety of computer vision and pattern recognition domains. In an object recognition
framework, the quality of feature extraction methods has a significant impact on both the
correctness and effectiveness of the recognition system. Hence, the experimental results also reflect the quality of the feature extraction methods used in the framework. In
Chapter 9 we will also discuss the limitations of the approach and identify some directions
for future work.
2. Review of Previous Work
In this chapter we present a review of previous work relevant to the research in this
thesis. Specifically, this chapter discusses a number of different techniques including graph
representations, graph matching algorithms, embedding techniques, and dimensionality re-
duction methods.
2.1 Graph Representations and Basic Terminology
A finite graph G is a pair (V, E), where V is a finite set of vertices and E is a set of edges between the vertices. An edge e = (u, v) consists of two vertices u, v ∈ V. A graph G = (V, E) is edge-weighted if each edge e ∈ E has a weight w(e). The size of a graph G is defined by its number of vertices, |V|, and number of edges, |E|. Edges are undirected when their corresponding relations are unordered, and a graph that contains these types of edges is called undirected. Similarly, for ordered relations, i.e., (u, v) ≠ (v, u), the graph is called directed. Directed edges are usually used to represent non-symmetric relations. In our framework, we use graphs that are either directed or undirected. A graph G = (V, E) is said to be complete if for any two vertices u, v ∈ V, where u ≠ v, there exists an edge (u, v) ∈ E. In some graphs, vertices and edges contain additional information. In an attributed graph, for example, while scale, orientation, and anisotropy may be associated with each vertex, an edge may contain the scale ratio, relative orientation, and normalized distance between two vertices.
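As a concrete illustration of these definitions, the following minimal sketch (ours; the attribute names such as scale and orientation are illustrative examples) stores an attributed, edge-weighted graph:

```python
# Minimal attributed, edge-weighted graph following the definitions above.
class AttributedGraph:
    def __init__(self, directed=False):
        self.directed = directed
        self.vertices = {}   # vertex id -> attribute dict
        self.edges = {}      # (u, v) -> {"weight": w, ...edge attributes}

    def add_vertex(self, v, **attrs):
        self.vertices[v] = attrs

    def add_edge(self, u, v, weight=1.0, **attrs):
        self.edges[(u, v)] = {"weight": weight, **attrs}
        if not self.directed:          # unordered relation: store both directions
            self.edges[(v, u)] = {"weight": weight, **attrs}

    def size(self):
        """Return (|V|, |E|); undirected edges are stored twice internally."""
        n_edges = len(self.edges) if self.directed else len(self.edges) // 2
        return len(self.vertices), n_edges

g = AttributedGraph()
g.add_vertex("palm", scale=2.0, orientation=0.3)
g.add_vertex("finger", scale=1.1, orientation=1.2)
g.add_edge("palm", "finger", weight=0.8, scale_ratio=2.0 / 1.1)
print(g.size())  # (2, 1)
```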
Graphs have proven to be useful for object representations. A wide range of scientific
areas such as computer vision, computational and molecular biology, linguistics, computer
networks, etc., use graph representations for their applications. When graphs are used
to represent objects, vertices typically represent features (or regions) of an object, while
edges represent relations (or constraints) between features. To give an example, in Figure 2.1, each image region is shown as a circle or an ellipse, and the relations between regions are shown by red lines. (The smallest circles represent the centers of regions.) The graph,
the result of a feature extraction process, contains four vertices and three edges. The first
vertex in this graph is a virtual feature corresponding to the whole image, while the others
represent the palm of the hand, the index finger, and tip of the finger. Edges show hierar-
chical information between the vertices. This type of graph representation is explained in
more detail in Chapter 4.
Many researchers encode structural representations of objects in graphs. To name a few,
Dickinson et al. [30] and Cyr et al. [25] used aspect graphs for 3D object representation.
Ioffe and Forsyth [49] employed trees to model people and for human tracking. The authors of [57, 97, 98] used the notion of shock graphs to represent 2D shapes.
Due to their common use in many fields, graph representations have received significant
attention for indexing into large databases. Messmer and Bunke [68] proposed a decision
tree mechanism for hierarchically partitioning a database. The decision tree is constructed
from the database graphs in a preprocessing step. A query graph is first matched to the
root and depending on the result of this match, the process is applied recursively to one of
the subtrees. The objective here is to determine if there is a subgraph isomorphism from a
query graph to one of the database graphs. Sengupta and Boyer [90] partitioned a database
of 3D models in a spectral graph decomposition framework, where the nodes in the graph
represented 3D patches.

Figure 2.1: An example graph whose vertices represent different image regions and whose edges represent relations between the regions.
A related approach to the partition framework is clustering, where the database is or-
ganized into a set of prototypes and one representative element is selected in each group.
Shapiro and Haralick [93] used a clustering approach based on a relational distance metric to organize a database of relational models. Sengupta and Boyer [89] presented a framework
for organizing large structural modelbases using an information theoretic criterion. The au-
thors constructed the hierarchical structure via clustering and computed the representative
elements of each cluster.
Recently, Sebastian et al. [88] proposed an indexing mechanism for retrieving candidate graphs from a large database. The framework was based on the use of a coarse-scale distance along with coarse-scale sampling. The authors showed that a coarse-scale distance measure resulted in a 50–100 times speed-up in distance computations, and overall the framework reduced the computational requirements in retrieving candidate graphs.
Shokoufandeh et al. [96] proposed a framework for indexing hierarchical image structures
that embedded the topological structure of a directed acyclic graph (DAG) into a low-
dimensional vector space. Encoding a DAG’s topology was derived from an eigenvalue
characterization of a DAG’s adjacency matrix. Costa and Shapiro [23] developed an ap-
proach where small relational subgraphs were used to retrieve model graphs from a large
database. Sossa and Horaud [100] proposed a scheme that used the coefficients of the d2-polynomial corresponding to the Laplacian matrix of a graph. Irniger and Bunke [52]
presented a method based on decision trees to filter a database of graphs for a given query
graph. The method extended their previous work on graph matching performance and graph
database filtering [50, 51] and it was used to tackle both graph and subgraph isomorphism
problems.
A problem related to indexing in the information management community is that of query processing over data that conforms to labeled graph data models. In this community, a number of techniques have focused on extracting structural summaries from the data [34, 38, 39, 71, 72]. During query evaluation for graph-structured data, these structural summaries play an important role [2, 15].
2.2 Graph Matching
The problem of finding the similarity between pairs of objects using their graph representations has been the focus of many researchers in the computer vision and pattern recognition communities for over twenty years. In this thesis we consider model-based object
recognition problems, where query and database objects are represented as two different
graphs. When objects are represented as graphs, the problem of object recognition can
be reformulated as that of graph matching. Graph matching has been used in a number
of applications, such as image analysis [64], document processing [63], and video analysis [17]. Given a pair of graphs, graph matching techniques are often required to compute the distance between them via a variety of functions (see [16, 31, 83, 92, 103]).
Previous work on graph matching has usually focused on finding one-to-one corre-
spondences between graph nodes. Barrow and Burstall [7] used association subgraphs
as an auxiliary graph to locate maximum common subgraphs. Vertices in an association
graph represent node-to-node correspondences between two input graphs. The goal, in this
work, was to find the maximal clique of an association graph to locate node correspon-
dences. Pelillo et al. [76] used a similar approach to match pairs of trees. Their method
first constructed an association graph using the concept of graph connectivity and obtained
a maximal subtree isomorphism through a maximal clique formulation. They proved that
there was a one-to-one correspondence between maximal cliques and maximal subtree isomorphisms.
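The association-graph idea can be sketched as follows. This is an illustrative brute-force version (ours, not the algorithm of Barrow and Burstall or of Pelillo et al.), feasible only for tiny graphs since maximum clique is NP-hard.

```python
from itertools import combinations

def association_graph(g1, g2, compatible):
    """Nodes are candidate correspondences (a, b); two candidates are
    adjacent when they are mutually consistent."""
    nodes = [(a, b) for a in g1 for b in g2]
    edges = {(p, q) for p, q in combinations(nodes, 2)
             if p[0] != q[0] and p[1] != q[1] and compatible(p, q)}
    return nodes, edges

def max_clique(nodes, edges):
    """Exhaustive maximum-clique search; exponential time, toy sizes only."""
    for r in range(len(nodes), 0, -1):
        for subset in combinations(nodes, r):
            if all((p, q) in edges or (q, p) in edges
                   for p, q in combinations(subset, 2)):
                return list(subset)
    return []

# Two tiny graphs given as adjacency sets; a pair of correspondences is
# compatible when it preserves adjacency.
g1 = {"a": {"b"}, "b": {"a"}}
g2 = {"x": {"y"}, "y": {"x"}}
compatible = lambda p, q: (q[0] in g1[p[0]]) == (q[1] in g2[p[1]])
nodes, edges = association_graph(g1, g2, compatible)
print(max_clique(nodes, edges))  # a size-2 clique such as [('a', 'x'), ('b', 'y')]
```

The returned clique encodes a consistent set of node-to-node correspondences.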
Shapiro and Haralick [91] proposed a framework to find a common subgraph isomor-
phism between two attributed graphs. The algorithm was based on comparing weighted
primitives (weighted attributes and weighted relation tuples) using a normalized distance
for each primitive property that was inexactly matched. The drawback of this method was
its computational cost, which was exponential in the number of graph nodes. To improve
the complexity of the framework, Grimson et al. [42] used a heuristic search technique that
terminated the search as soon as a solution meeting some minimum requirement was found
(near-optimal).
Gold and Rangarajan [37] developed a graduated assignment algorithm for one-to-one
graph matching by combining graduated non-convexity (deterministic annealing), two-way
(assignment) constraints, and sparsity. The technique is based on efficiently finding solu-
tions for optimization problems that use a match matrix denoting an assignment (corre-
spondence) between graph nodes.
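A core ingredient of graduated assignment is enforcing the two-way constraints by alternating row and column normalization of the match matrix (Sinkhorn balancing). The sketch below (ours) shows only that normalization step at a fixed annealing temperature, not the full algorithm of Gold and Rangarajan; the compatibility scores are illustrative.

```python
import numpy as np

def sinkhorn(M, iters=50):
    """Alternate row and column normalization until M is (approximately)
    doubly stochastic, softly enforcing two-way assignment constraints."""
    M = M.copy()
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)   # each node of the first graph matches once
        M /= M.sum(axis=0, keepdims=True)   # each node of the second graph matches once
    return M

# Exponentiated node-compatibility scores at a fixed annealing temperature beta.
beta = 5.0
compat = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
M = sinkhorn(np.exp(beta * compat))
print(M.round(2))  # near-permutation matrix favoring the diagonal
```

In the full algorithm, beta is gradually increased so the soft match matrix hardens toward a permutation.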
A number of approaches have been developed to solve the stereo correspondence problem in vision. One of the most important works in this area was proposed by Boyer and Kak [13]. The authors first extract structural descriptions of two two-dimensional scenes through a low-level process. These descriptions are derived from the radial-valued skeleton of binary images and are used in a stereo matching procedure via the consistent labeling problem [45]. The number of features in such a matching is shown to be much smaller than in more traditional feature-based frameworks.
Another class of graph matching techniques, known as spectral methods, represents
structural properties of graphs using eigenvalues and eigenvectors of graph adjacency ma-
trices. One of the most important advantages of using these techniques comes from the
fact that they are less computationally expensive than general combinatorial search procedures.
One work on spectral abstraction of hierarchical graph structures was proposed by Siddiqi
et al. [99]. In this work, the authors combined a bipartite matching framework with a spec-
tral decomposition of graph structure to match shock graphs. Shocks are organized into
a directed acyclic shock graph, which they characterize by a shock grammar defining the
process of reducing the shock graph into a rooted shock tree. The method then matches shock trees to locate the best set of corresponding nodes in polynomial time. Figure 2.2 shows one-to-one feature correspondences computed by their matching algorithm.

Figure 2.2: One-to-one feature correspondences computed by Siddiqi et al. [99]
Shokoufandeh et al. [95] extended this framework to directed acyclic graphs that arise
in multi-scale image representations. The algorithm computes both topological and geo-
metric similarity as well as node correspondence between two given graphs. Computing
the similarity and node correspondence were formulated as a function of structural, con-
textual, and node context similarities. Although the algorithm can be used to find explicit
one-to-one node correspondences further down the hierarchies, one-to-one node correspon-
dences at higher levels effectively define a many-to-many matching between their underly-
ing nodes. Belongie et al. [9] used a similar idea to encode the qualitative shape occupancy
characteristics of a neighborhood surrounding a point. In a bipartite matching framework,
correspondences were formed between points with similar shape contexts, despite the fact
that the neighbors could have differing numbers of points.
The problem of many-to-many graph matching has been addressed most often in the
context of edit-distance. The idea of edit-distance was originally introduced for graph
matching by Sanfeliu and Fu [84]. The authors defined a distance between attributed rela-
tional graphs based on a descriptive graph grammar. Messmer and Bunke [67] presented
a general error-tolerant matching algorithm for finding subgraph isomorphisms between
given pairs of graphs. Their goal was to modify the input graphs during the matching pro-
cess using edit operations. Liu and Geiger [62] proposed a framework for matching trees
on a many-to-many basis. The algorithm first represented each shape contour (silhouette)
as a tree structure derived from a shape axis model. An edit-distance based tree matching
schema was then used to find the best approximate match and a matching cost.
Myers, Wilson, and Hancock [69] used the edit-distance to model the probability dis-
tributions for structural errors in the graph-matching problem. The probability distribution
was used to locate matchings between graph nodes. Sebastian et al. [86] matched shock
graphs of 2D shapes by first representing each shape as a point in the shape space and defin-
ing the distance between shapes as the minimum cost of deformation path connecting one
shape to another. They present an efficient graph-edit distance algorithm for finding glob-
ally optimal paths between shapes. Sample matching results between two pairs of objects,
computed by their framework [86], are shown in Figure 2.3.
Figure 2.3: Matching results between two pairs of objects computed by Sebastian et al. [86]

Despite the fact that many edit-distance based matching frameworks exist, they all share the same objective: finding the minimal set of re-labelings, additions, deletions, merges, and splits of nodes and edges that transform one graph into another. As a result, applying a sequence of edit operations with minimum total cost makes two input graphs isomorphic with one another. Although powerful, the edit-distance approach has its drawbacks: 1) it is computationally expensive (polynomial-time algorithms are available only for trees); 2) the
method, in its current form, does not accommodate edge weights (most approaches used in
this context are heuristic in nature); 3) the method does not deal well with occlusion and
scene clutter, resulting in much effort spent in “editing out” extraneous graph structure; and
4) the cost of an editing operation often fails to reflect the underlying visual information
(for example, the visual similarity of a contour and its corresponding broken fragments
should not be penalized by the high cost of merging the fragments).
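A common simplification of edit-distance, sketched below (ours, not any of the cited methods), approximates the re-labeling, insertion, and deletion of nodes as a linear assignment problem and ignores edge structure entirely; the cost functions are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def node_edit_distance(labels_a, labels_b, sub, ins_del=1.0):
    """Approximate edit distance from node relabelings plus insertions and
    deletions, solved with the Hungarian algorithm. `sub(x, y)` is the
    cost of relabeling x as y."""
    big = 1e9  # effectively forbids off-diagonal delete/insert pairings
    n, m = len(labels_a), len(labels_b)
    C = np.zeros((n + m, n + m))
    C[:n, :m] = [[sub(x, y) for y in labels_b] for x in labels_a]
    C[:n, m:] = np.where(np.eye(n, dtype=bool), ins_del, big)  # delete a node of A
    C[n:, :m] = np.where(np.eye(m, dtype=bool), ins_del, big)  # insert a node of B
    rows, cols = linear_sum_assignment(C)                      # C[n:, m:] stays 0
    return C[rows, cols].sum()

labels_a = ["A", "B", "C"]
labels_b = ["A", "B"]
sub = lambda x, y: 0.0 if x == y else 1.0
print(node_edit_distance(labels_a, labels_b, sub))  # 1.0: delete the unmatched "C"
```

Restoring edge costs turns this back into the expensive general problem; the assignment relaxation is what keeps it polynomial.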
In the context of line and segment matching, Beveridge and Riseman [10] proposed
a framework to find the optimal many-to-many correspondence mapping between a line
segment model and image line segments through exhaustive local search. Performance
of the local search was presented in the presence of increasing model complexity, image
clutter, and additional model instances. Although their method found good matches reliably
and efficiently (due to their choice of the objective function and a small neighborhood
size), it is unclear how the approach can be generalized to other types of feature graphs and
objective functions.
Scott and Longuet-Higgins [85] presented an algorithm that maximized the inner product of two matrices, a pairing matrix and a proximity matrix, to find feature correspondences.
Elements of the proximity matrix described Gaussian weighted distances between pairs
of features. They showed that the eigenvectors of this matrix could be used to determine
correspondences between two given feature sets.
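The eigenvector pairing of Scott and Longuet-Higgins can be sketched as follows. This is our illustrative version using the SVD form, in which the singular values of the proximity matrix are replaced by ones; `sigma` and the toy points are assumptions.

```python
import numpy as np

def correspondences(A, B, sigma=1.0):
    """Pair features by building a Gaussian proximity matrix, replacing its
    singular values with ones, and accepting pairs (i, j) where the result
    is maximal in both its row and its column."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian-weighted proximities
    U, _, Vt = np.linalg.svd(G)
    k = min(G.shape)
    P = U[:, :k] @ Vt[:k, :]                  # orthogonal "pairing" matrix
    return [(i, j) for i in range(G.shape[0]) for j in range(G.shape[1])
            if P[i, j] == P[i, :].max() and P[i, j] == P[:, j].max()]

A = np.array([[0.0, 0.0], [5.0, 0.0]])
B = np.array([[5.1, 0.0], [0.2, 0.1]])  # roughly the same points, listed in reverse
print(correspondences(A, B))  # [(0, 1), (1, 0)]
```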
Kosinov and Caelli [58] used a similar approach, showing how inexact graph matching
could be solved using the renormalization of projections of vertices into the eigenspaces
of graphs combined with a form of relational clustering. In this framework, the authors’
goal was to formulate graph matching as clustering, which groups common local relational
structures between different graphs. Our framework differs from their approach in that (1)
it can handle information encoded in a graph’s nodes, which is desirable in many vision ap-
plications; (2) it does not require an explicit clustering step; (3) it provides a well-bounded,
low-distortion metric representation of graph structure; (4) it encodes both local and global
structure, allowing it to deal with noise and occlusion; and (5) it can accommodate multi-
scale representations.
2.3 Embedding Techniques
Low-distortion embedding techniques have received much attention in theoretical com-
puter science and have proven to be useful in a number of graph algorithms, including
clustering and, most recently, on-line algorithms. Indyk [47] provides a comprehensive
survey of recent advances and applications of low-distortion graph embedding. The main
applications can be grouped into the following classes.
General metrics into low-dimensional normed spaces: The goal here is to obtain a
low-dimensional representation of the original metric. This embedding technique enables
us to represent data points in the original metric with fewer bits. One of the applications of
this approach was given by Linial, London, and Rabinovich [60]. The authors introduced
metric embedding to obtain an approximation algorithm for the sparsest cut problem in
which the objective was to maximize the weighted number of pairs while minimizing the
cost of the cut. Using the notion of the embedding, an O(log k)-approximation algorithm
was obtained.
General metrics into tree metrics: This type of embedding enables us to embed finite
metrics into tree metrics instead of normed spaces. Applications of this approach include
both on-line and off-line algorithms. In a typical on-line algorithm, the objective is to
perform a set of requests without knowing future requests. Bartal et al. [8] presented a
randomized on-line algorithm for the Metrical Task System problem that is O(log² n)-competitive. Several researchers have also used this embedding
technique to find approximation algorithms for NP-hard problems. The studies have been
motivated by the fact that many problems that are NP-hard for general metrics have polyno-
mial time solutions for trees. As a result, this embedding approach enables the development
of good approximation algorithms.
Tree metrics into low-dimensional normed spaces: Instead of working with gen-
eral metrics, some embedding algorithms take tree metrics and embed them into low-
dimensional normed spaces. While many of these methods do not allow the dimension
of the target space to be specified, some of them embed tree metrics into normed spaces
with prescribed dimensionality. We will study two such embedding techniques, one in each
group, in Chapter 3. For recent approaches related to tree-metric embedding, see [43, 61,
66].
Specific metrics into normed spaces: Embedding specific metrics, such as Hausdorff
or edit-distance, into normed spaces allows us to use some well-known algorithms (cluster-
ing, for example) in the normed space to solve the problems in the original metric. Indyk
and Thaper [48] developed an embedding procedure in support of image retrieval. Image
feature distributions, such as color histograms, do not provide a convenient mechanism for
indexing into large image databases. In a two-step procedure, they first embed the feature
distribution in a vector and then use the Locality Sensitive Hashing (LSH) algorithm of
Gionis et al. [35] to retrieve nearby candidates. The embedding method was designed so
that the distance between two such embeddings mimics the Earth Mover’s Distance (EMD)
between their respective feature distributions. Grauman and Darrell [41] applied the frame-
work to match 2D contours as shape context-like distributions. Our earlier work that com-
bines low-distortion embedding and EMD was reported in [28], [29], and [56].
Low-distortion embedding continues to be a focal point in the theoretical computer
science community. Recent results related to properties of low-distortion embedding may
be found in [3, 61, 66].
2.4 Dimensionality Reduction Techniques
The goal of dimensionality reduction techniques is to map a set of points in a high-
dimensional space to a lower-dimensional space, with the aim of preserving important fea-
tures of the pointset (pairwise distances between data points, for example). Alternatively,
the goal can be defined as finding meaningful low-dimensional structures hidden in the
high dimensional data.
Dimensionality reduction techniques are useful tools especially for designing efficient
algorithms. One of the most important reasons for this comes from the fact that the running
times of most algorithms are proportional to the dimensionality of the space. Thus, re-
ducing the dimensionality of the space helps improve the running times. These techniques
are also used in clustering problems, where the objective is to find a set of representative (or canonical) points minimizing a certain function defined on the input set, such as finding the k-means or k-center. The main idea of using a dimensionality reduction technique for clustering is motivated by the following: data points that are close to each other (and thus should be grouped together) in d-dimensional space become closer in dimension d − 1. This, in turn, makes it easier to solve the clustering problem in lower dimensions.
Dimensionality reduction methods are also used to estimate the number of clusters.
Other application areas of dimensionality reduction techniques include visualization,
image processing, data compression, pattern recognition, data analysis, some biological
and physical sciences, and data mining. To give an example, let us assume that given a
pair of 256 × 256 pixel images, our goal is to compute the similarity between them. It is clear that each image corresponds to a point in a 65,536-dimensional space, as depicted in
Figure 2.4. Assume that we are also given some similarity measure in this space. To gain
accuracy and to speed up computation time, one needs only to extract relevant information
and discard unnecessary details, which correspond to dimensions that do not provide any
useful information. The process of reducing such unnecessary details and selecting useful
features from a high-dimensional data set can be done through a dimensionality reduction
process.
Figure 2.4: Representing a 256 × 256 pixel image as a point in a 65,536-dimensional space. Each pixel, shown by a square in (a), corresponds to an entry in the 65,536-dimensional vector in (b).
Many popular methods exist for dimensionality reduction of data distributions. Two
well-known techniques in this family include principal components analysis (PCA) [53]
and multidimensional scaling (MDS) [24]. A standard multidimensional scaling technique
takes an n × n matrix, operates by means of eigenvector analysis, and produces a layout
based on a linear combination of dimensions. On the other hand, a typical PCA-based
dimensionality reduction technique is based on a linear projection that maximizes the vari-
ance in the projected space. In other words, the goal of a PCA-based algorithm is to find
a linear lower-dimensional representation of the data such that the variance of the recon-
structed data is preserved. We will study one PCA-based dimensionality reduction ap-
proach in Section 3.3.3.
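A minimal PCA sketch via eigendecomposition of the covariance matrix illustrates the linear-projection view above (ours, not any particular cited implementation; the synthetic data is an assumption):

```python
import numpy as np

def pca(X, k):
    """Project data onto the k directions of maximum variance
    (the top eigenvectors of the covariance matrix)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:k]]   # top-k principal directions
    return Xc @ top

rng = np.random.default_rng(1)
# 3-D data that actually lives near a 1-D line: PCA recovers that direction.
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))
Y = pca(X, 1)
print(Y.shape)  # (200, 1)
```

Because the data is nearly one-dimensional, the single retained component captures almost all of the variance.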
PCA and MDS-based dimensionality reduction techniques are linear in nature, and are
widely used for classification and learning (via clustering) purposes. The linearity of these
methods is, in fact, considered to be one of their shortcomings. The high dimensional
representations of many experimental problems have more compact descriptions in terms
of lower dimensional manifolds. This makes the straight-line measurement of distances in
source space restrictive and will directly affect the results of classification and quality of
the resulting learning algorithms.
To address these shortcomings, Tenenbaum et al. [102] introduced the notion of ISOMAP
as a more sophisticated variation of multidimensional scaling. Here, the distances are mea-
sured based on geodesic shortest-paths along manifolds (or their approximations) of the
input data. To avoid global pairwise distance computation, Roweis and Saul [81] pro-
posed an eigenvalue-based method known as locally linear embedding (LLE). The method
characterizes each point in the data set by its local representation in terms of its neighbor-
hood patches that capture the local geometry of the manifold. The LLE then constructs a
neighborhood-preserving mapping based on the invariance properties of these local neigh-
borhoods. In the final step of the algorithm, each high-dimensional data point in the metric
space is mapped to a low-dimensional vector representing global internal coordinates on
the manifold.
2.5 Conclusions
Graphs have proven to be useful for object representations. When objects are repre-
sented as graphs, the problem of object recognition is reformulated as that of graph match-
ing. A number of approaches have been developed for matching pairs of graphs. Most of
these approaches, however, have focused on finding one-to-one feature (node) correspon-
dences. Due to limitations of these approaches mentioned above, they cannot be used in
more realistic cases, where a cluster of features of one graph correspond to a cluster of
features of another.
The problem of many-to-many graph matching has also been studied mostly in the
context of edit-distance. The general idea behind edit-distance is to find a minimal set of
re-labelings, additions, deletions, merges, and splits of nodes and edges that transform one
graph into another. Although the method has important potential for matching features
on a many-to-many basis, it suffers from a number of drawbacks, such as computational complexity and a failure to reflect the underlying visual information in the correspondences it provides.
The development of an efficient and reliable many-to-many matching framework, one that is also stable with respect to noise, remains an open issue. Our goal in this thesis is to develop a framework for establishing many-to-many feature correspondences between pairs
of attributed graphs.
We have also seen that graph embedding techniques have proven to be useful in a num-
ber of graph algorithms, such as clustering, online algorithms, etc. Broadly speaking, they
reduce problems defined over “difficult” metric spaces, to problems over “easier” normed
spaces. In our framework, we will use graph embedding techniques to reduce the many-to-many feature matching problem to that of many-to-many point matching, for which a
number of existing matching approaches are available.
3. Metric Embedding of Graphs
In this chapter we introduce the concept of graph embedding, review some notation
and definitions that will be useful in the rest of this thesis, and present two low-distortion
embedding algorithms. Distortion is defined as the maximum factor by which any distance
between any two vertices is changed by the embedding algorithm. The formal definition
of distortion is given below. Both embedding algorithms begin by transforming a graph
into a metric tree that is then embedded into a normed vector space. In the first embedding
technique, which is inspired by the general framework proposed by Matousek [66], each
graph is embedded into a vector space whose dimensionality is graph-dependent. Thus, be-
fore the embeddings can be matched, a dimensionality reduction technique (such as PCA)
is required. Since high dimensional data cannot be represented exactly in lower dimen-
sions, dimensionality reduction techniques introduce error. We overcome this problem by
introducing the second embedding technique, which is the deterministic variation of the
spherical coding algorithm [43]. This novel procedure embeds metric trees into normed
vector spaces of prescribed dimensionality, precluding the need for dimensionality reduc-
tion techniques.
3.1 Introduction
The difficulty with establishing many-to-many node correspondences is due to the fact
that any subgraph of one graph can be assigned to any subgraph of another, which makes
the problem intractable. Our interest in low-distortion graph embedding is motivated by its
ability to transform graphs to an alternative space in which establishing many-to-many cor-
respondences between embedded graph nodes is computationally tractable. To ensure that
the solution of the many-to-many point matching problem in the embedded space reflects
a meaningful solution to the many-to-many graph matching problem in the original graph
space, the geometric structure of the points must somehow reflect the topological structure
of the graph.
During the last decade, low-distortion embedding has become recognized as a very
powerful tool for designing efficient algorithms. In low-distortion embeddings of metric
spaces into normed spaces, we consider mappings f : V ��� , where V is a set of points
in the original metric space, with distance function � � ������� , � is a set of points in the d-
dimensional normed space ��������� k, and for any pair p � q � V we have
1c� � p � q ��� ��� f � p �!� f
�q �"��� k �#� � p � q � (3.1)
where c is known as the distortion. Intuitively, such an embedding will enable us to re-
duce problems defined over difficult metric spaces,�V �$�%� , to problems over easier normed
spaces,� �&�'��������� k � . Clearly, the closer c is to 1, the better the target set � mimics the origi-
nal set V . Consequently, the distortion parameter c is a critical characteristic of embedding
f .
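The distortion of Equation (3.1) can be measured empirically for a given embedding. In the sketch below (ours), the embedded points are implicitly rescaled so the map is non-expansive, making c the product of the worst expansion and worst contraction ratios.

```python
import numpy as np

def distortion(D, Y):
    """Empirical distortion c of an embedding: the product of the largest
    expansion factor and the largest contraction factor over all pairs,
    which is invariant to uniformly rescaling the embedded points."""
    n = len(Y)
    E = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    iu = np.triu_indices(n, 1)        # each unordered pair once
    d, e = D[iu], E[iu]
    return (e / d).max() * (d / e).max()

# Path metric on three points 0-1-2 embedded on a line: an isometry, so c = 1.
D = np.array([[0, 1, 2],
              [1, 0, 1],
              [2, 1, 0]], dtype=float)
Y = np.array([[0.0], [1.0], [2.0]])
print(distortion(D, Y))  # 1.0
```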
The most fundamental existence result in computational embedding is due to Bour-
gain [11].
Lemma 1. Any finite metric space $(V, \mathcal{D})$ can be embedded into a finite normed space $(\mathbb{R}^d, \|\cdot\|_2)$
of dimension at most $\log |V|$ with distortion $O(\log |V|)$.
Matousek [65] further extended this lemma for embedding finite metrics into $\ell_\infty^d$.

Lemma 2. For any positive integer $q$, any finite metric space $(V, \mathcal{D})$ can be embedded into
$\ell_\infty^d$ with distortion $2q - 1$, where $d = O(q\,n^{1/q}\log n)$.
These results are important since even an exponential matching algorithm, in terms of
number of dimensions of the target space, may be tractable. However, $O(\log |V|)$ is too
large a distortion, and we seek an embedding with much lower distortion.
The above definition of a low-distortion embedding maps a set of points in the original
metric space to a set of points in the target space. Since in our framework the original space
is based on graph representations, we must choose a suitable metric for our graphs, i.e., we
must define a nonnegative function describing the distance between any two vertices.
Given a graph $G = (V, E)$ and any three vertices $u, v, w \in V$, a metric $\mathcal{D}$ for the graph
satisfies the following properties:

1. $\mathcal{D}(u, v) = \mathcal{D}(v, u) \ge 0$

2. $\mathcal{D}(u, u) = 0$

3. $\mathcal{D}(u, v) \le \mathcal{D}(u, w) + \mathcal{D}(w, v)$

In general, there are many ways to define metric distances on a weighted graph. The
best-known metric is the shortest-path metric $\delta(\cdot,\cdot)$, i.e., $\mathcal{D}(u, v) = \delta(u, v)$, the shortest-path
distance between $u$ and $v$ for all $u, v \in V$.
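For concreteness, the shortest-path metric for a small edge-weighted graph can be computed with a standard all-pairs algorithm; the Python sketch below (the graph and function name are illustrative, not from the thesis) uses the Floyd-Warshall relaxation:

```python
# Illustrative sketch: the shortest-path metric D(u, v) = delta(u, v)
# for a small undirected edge-weighted graph, via Floyd-Warshall.
INF = float("inf")

def shortest_path_metric(weights):
    """weights: dict {(u, v): w} listing undirected weighted edges."""
    nodes = sorted({u for edge in weights for u in edge})
    D = {(u, v): (0 if u == v else INF) for u in nodes for v in nodes}
    for (u, v), w in weights.items():
        D[u, v] = D[v, u] = min(D[u, v], w)
    for k in nodes:                      # relax through intermediate k
        for u in nodes:
            for v in nodes:
                if D[u, k] + D[k, v] < D[u, v]:
                    D[u, v] = D[u, k] + D[k, v]
    return D

D = shortest_path_metric({("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 4.0})
assert D["a", "c"] == 3.0  # the path through b beats the direct edge
```

By construction the result is symmetric and satisfies the triangle inequality, so it is a metric in the sense of properties 1-3 above.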
The problem of low-distortion embedding has a long history for the case of planar
graphs, in general, and trees, in particular. The following conjecture shows the existence
of an $O(1)$-distortion embedding of planar graphs.
Conjecture 1. [44] Let $G = (V, E)$ be a planar graph, and let $M = (V, \mathcal{D})$ be the shortest-path
metric for the graph $G$. Then there is an embedding of $M$ into $(\mathbb{R}^d, \|\cdot\|_1)$ with $O(1)$ distortion.
This conjecture has only been proven for the case in which $G$ is a tree. Although the
existence of such a distortion-free embedding under $\|\cdot\|_k$-norms was established in [60],
no deterministic construction was provided. Several researchers have also studied the possibility
of embedding a tree into the $\|\cdot\|_2$ norm with $O(1)$ distortion. Bourgain [12] showed
that a complete binary tree cannot be embedded into $\|\cdot\|_2$ with less than $O(\sqrt{\log\log n})$
distortion. Matousek [66] then showed that Bourgain's bound is tight for all trees. More
generally, he proved that any tree can be embedded into $\|\cdot\|_d$ with $O((\log\log n)^{\min\{1/2,\,1/d\}})$ distortion.

One deterministic algorithm to embed a tree into a vector space is given by Matousek [66].
His framework suggests that if we can somehow map our graphs into trees, with small distortion,
we can then embed the resulting trees into a vector space. In the following section
our goal is to compute tree metrics from graph representations.
3.2 Construction of a Tree Metric from a Distance Matrix
(Numerical Taxonomy Problem)
Let $G = (V, E)$ denote an edge-weighted graph and $\mathcal{D}$ denote a shortest-path metric for
$G$, i.e., $\mathcal{D}(u, v) = \delta(u, v)$ for all $u, v \in V$. The problem of approximating (or fitting) an
$n \times n$ distance matrix $\mathcal{D}$ by a tree metric $\mathcal{T}$ is known as the Numerical Taxonomy problem.
In many fields, such as paleontology and evolutionary biology, approximating a distance
matrix by a tree metric plays an important role. Recall that a tree metric $\mathcal{T}$ is a metric
induced by an edge-weighted tree on its vertex set, where the distance between any pair of
vertices $u$ and $v$ is the length of the unique path between them.
The numerical taxonomy problem has received significant attention over many years,
with work going as far back as the beginning of the 20th century [6]. Waterman et al. [105]
showed that if there is a tree metric $\mathcal{T}$ coinciding exactly with the distance matrix $\mathcal{D}$, then it
is unique and can be constructed in linear time. Day [27] showed that for the $L_1$ and $L_2$ norms, the
numerical taxonomy problem is NP-hard. Since the numerical taxonomy problem is an
open problem for general distance metrics, we must explore approximation methods. The
numerical taxonomy problem can be approximated by converting the distance matrix $\mathcal{D}$ to
the weaker ultra-metric distance matrix.
An ultra-metric is a special type of tree metric defined on rooted trees, where the
distance to the root is the same for all leaves in the tree, an approximation that introduces
small distortion. A metric $\mathcal{D}$ is an ultra-metric if, for all points $x, y, z$, we have

$$\mathcal{D}[x, y] \le \max\{\mathcal{D}[x, z],\ \mathcal{D}[y, z]\}.$$

An ultra-metric can also be represented by a weighted tree
such that $\mathcal{D}[x, y]$ is the maximum edge weight on the path between points $x$ and $y$. Unfortunately,
an ultra-metric does not satisfy all the properties of a tree metric distance. To
create a general tree metric from an ultra-metric, we need to satisfy the 4-point condition
(see [14]), defined as

$$\mathcal{D}[x, y] + \mathcal{D}[z, w] \le \max\{\mathcal{D}[x, z] + \mathcal{D}[y, w],\ \mathcal{D}[x, w] + \mathcal{D}[y, z]\}$$

for all $x, y, z, w$. A metric that satisfies the 4-point condition is called an additive metric,
and a metric $\mathcal{D}$ is additive if and only if it is a tree metric (see [14]).
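Both conditions are straightforward to verify for a finite metric given as a distance table; a small sketch (helper names are ours, and the distance dictionary is assumed symmetric) exploits the fact that each condition is equivalent to the two largest of three quantities being equal:

```python
from itertools import combinations

def is_ultrametric(D, points):
    """D[x, y] <= max(D[x, z], D[y, z]) for every triple; equivalently,
    the two largest of the three pairwise distances are equal."""
    for x, y, z in combinations(points, 3):
        a, b, c = sorted([D[x, y], D[x, z], D[y, z]])
        if b != c:
            return False
    return True

def is_additive(D, points):
    """4-point condition: the two largest of the three pairwise sums
    are equal for every quadruple."""
    for x, y, z, w in combinations(points, 4):
        s = sorted([D[x, y] + D[z, w], D[x, z] + D[y, w], D[x, w] + D[y, z]])
        if s[1] != s[2]:
            return False
    return True

# Points on a line form a path (tree) metric: additive, but not ultra.
pos = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 6.0}
D = {(u, v): abs(pos[u] - pos[v]) for u in pos for v in pos}
assert is_additive(D, sorted(pos)) and not is_ultrametric(D, sorted(pos))
```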
Our construction of a tree metric consists of: 1) constructing an ultra-metric from $\mathcal{D}$,
and 2) modifying the ultra-metric to satisfy the 4-point condition. One such approximation
framework, called the centroid metric tree $\mathcal{T}$, has been given by Agarwala et al. [3]. The
construction of a tree metric in their algorithm is achieved by transforming the general tree
metric problem to that of ultra-metrics. Given a graph $G = (V, E)$ and a metric $\mathcal{D}$ defined
over $G$, the construction of an ultra-metric starts by computing the minimum spanning tree
$\mathcal{T}_{mst}$ of $G$. Let $e = (u, v)$ be the maximum-weight edge of $\mathcal{T}_{mst}$. Clearly, removing $e$ from
the tree $\mathcal{T}_{mst}$ results in two distinct subtrees $\mathcal{T}_1$ and $\mathcal{T}_2$. The ultra-metric $U$ has its root at
height $\mathcal{D}[u, v]/2$, and the subtrees of the root are the ultra-metric trees $U_1$ and $U_2$, recursively
defined on $\mathcal{T}_1$ and $\mathcal{T}_2$, respectively.
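The recursive split on the maximum-weight MST edge can be sketched as follows; this is a simplified illustration of the construction, not the authors' implementation, and it assumes the minimum spanning tree is already available as an edge list:

```python
def build_ultrametric(edges, vertices):
    """edges: list of MST edges (u, v, w). Returns a leaf vertex, or a
    tuple (height, left, right) whose root sits at height w/2, where w
    is the maximum edge weight in the current subtree."""
    if len(vertices) == 1:
        return next(iter(vertices))
    i = max(range(len(edges)), key=lambda j: edges[j][2])
    u, v, w = edges[i]
    rest = edges[:i] + edges[i + 1:]
    # Flood-fill the component containing u over the remaining edges.
    comp_u, changed = {u}, True
    while changed:
        changed = False
        for a, b, _ in rest:
            if (a in comp_u) != (b in comp_u):
                comp_u |= {a, b}
                changed = True
    comp_v = set(vertices) - comp_u
    left = build_ultrametric([e for e in rest if e[0] in comp_u], comp_u)
    right = build_ultrametric([e for e in rest if e[0] in comp_v], comp_v)
    return (w / 2, left, right)

mst = [("a", "b", 1.0), ("b", "c", 4.0), ("c", "d", 2.0)]
tree = build_ultrametric(mst, {"a", "b", "c", "d"})
assert tree[0] == 2.0  # root height is half the heaviest MST edge
```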
The algorithm presented by Agarwala et al. [3] follows the two-step procedure outlined
above, and generates an approximate tree metric $\mathcal{T}$ to an optimal additive metric in time
$O(n^2)$. It should be noted that this construction does not necessarily keep the vertex set
of $G$ invariant. The embedding process may add extra vertices generated during the metric
tree construction that must be removed prior to matching.

More specifically, let $\mathcal{D}$ be an $n \times n$ distance matrix and $\mathcal{T}$ be a tree that approximates
$\mathcal{D}$. The algorithm presented in [3] finds an additive tree $\mathcal{T}$ such that $\|\mathcal{T} - \mathcal{D}\|_\infty \le 3\varepsilon$, where
$\varepsilon$ is the distance from $\mathcal{D}$ to the closest tree metric under the $L_\infty$ norm. Moreover, the authors showed that it is
NP-hard to find a tree $\mathcal{T}$ such that $\|\mathcal{T} - \mathcal{D}\|_\infty < \frac{9}{8}\varepsilon$. The results were then generalized to
other norms.
An example of constructing a metric tree from a graph is shown in Figure 3.1, in which
a hierarchical blob decomposition of an image, shown in (a), yields a graph whose edge
weights reflect the Euclidean distances between the nodes (centroids of their corresponding
regions), shown in (b). The metric tree representation of the graph is shown in (c); note the
additional vertices (white) introduced by the construction, which will later be removed.

Figure 3.1: Metric tree representation of the Euclidean distances between nodes in a graph. The gesture image (a) consists of 6 regions (the region representing the entire hand is not shown). The complete graph in (b) captures the Euclidean distances between the centroids of the regions, while (c) is the metric tree representation of the multi-scale decomposition (with additional vertices).
3.3 Embedding into Graph-Dependent Dimensionality
Given a metric tree approximation of our original graph, we can now proceed with
the first embedding algorithm, which is inspired by the general framework proposed by
Matousek [66]. The algorithm maps the nodes in the metric tree to points in some low-dimensional
Euclidean space. The dimension of the Euclidean space is graph-dependent.
The construction of the embedding depends on the notion of a path partition of a graph. In
the following subsection, we introduce the concept of a path partition and later use it to
construct the embedding.
3.3.1 Path Partition of a Graph
The process for the embedding of a particular node is based upon the path from the
tree’s root to that particular node. Parts of that path will be unique to that node, while
other parts will be shared by paths to other nodes. A partitioning of these paths, called
a caterpillar decomposition, yields a set of “basis” paths defining the dimensionality of
the vector embedding. The path from the root to any node will traverse some weighted
combination of these basis paths, yielding the components of the vector, with the weights
reflecting how much of the basis path is traversed.
Specifically, given a weighted graph $G = (V, E)$ with metric distance $\mathcal{D}(\cdot,\cdot)$, let $\mathcal{T} = (V, E_\mathcal{T})$ denote a tree representation of $G$ whose vertex distances are consistent with $\mathcal{D}(\cdot,\cdot)$.
In the event that $G$ is a tree, $\mathcal{T} = G$; otherwise $\mathcal{T}$ is the centroid metric tree of $G$. To
construct the embedding, we will assume that $\mathcal{T}$ is a rooted tree. It will be clear from the
construction that the choice of the root does not affect the distortion of the embedding.

The dimensionality of the embedding of $\mathcal{T}$ depends on the caterpillar dimension [66],
denoted by $\mathrm{cdim}(\mathcal{T})$, and is recursively defined as follows. If $\mathcal{T}$ consists of a single vertex,
we set $\mathrm{cdim}(\mathcal{T}) = 0$. For a tree $\mathcal{T}$ with at least 2 vertices, $\mathrm{cdim}(\mathcal{T}) \le k + 1$ if there exist paths
$P_1, \ldots, P_r$ beginning at the root and otherwise pairwise disjoint, such that each component
$\mathcal{T}_j$ of $\mathcal{T} \setminus E(P_1) \setminus E(P_2) \setminus \cdots \setminus E(P_r)$ satisfies $\mathrm{cdim}(\mathcal{T}_j) \le k$. Here $\mathcal{T} \setminus E(P_1) \setminus E(P_2) \setminus \cdots \setminus E(P_r)$ denotes the tree $\mathcal{T}$ with the edges of the $P_i$'s removed, and the components $\mathcal{T}_j$ are
rooted at the single vertex lying on some $P_i$. The caterpillar dimension can be determined
in linear time for a rooted tree $\mathcal{T}$, and it is known that $\mathrm{cdim}(\mathcal{T}) \le \log(|V|)$ (see Lemma 3).
The construction of the vectors $f(v)$, for $v \in V$, depends on the notion of a path partition
of $\mathcal{T}$. The path partition $\mathcal{P}$ of $\mathcal{T}$ is empty if $\mathcal{T}$ is a single vertex; otherwise $\mathcal{P}$ consists of a set
of paths $P_1, \ldots, P_r$ as in the definition of $\mathrm{cdim}(\mathcal{T})$, plus the union of the path partitions of the
components of $\mathcal{T} \setminus E(P_1) \setminus E(P_2) \setminus \cdots \setminus E(P_r)$. The paths $P_1, \ldots, P_r$ have level 1, and the
paths of level $k \ge 2$ are the paths of level $k - 1$ in the corresponding path partitions of the
components of $\mathcal{T} \setminus E(P_1) \setminus E(P_2) \setminus \cdots \setminus E(P_r)$. Note that the paths in a path partition are
edge-disjoint and their union covers the edge-set of $\mathcal{T}$.
To illustrate these concepts, consider the tree shown in Figure 3.2. The three darkened
paths from the root represent three level 1 paths. Following the removal of the level 1 paths,
we are left with 6 connected components that, in turn, induce seven level 2 paths, shown
with lightened edges.1 Following the removal of the seven level 2 paths, we are left with
an empty graph. Hence, the caterpillar dimension $\mathrm{cdim}(\mathcal{T})$ is 2.
Lemma 3. Given a rooted tree $\mathcal{T}$, $\mathrm{cdim}(\mathcal{T}) \le \log(|V|)$.

Proof. It is known that among all trees, the complete binary tree $B_n$ has the largest caterpillar
dimension [66]. Thus, it is sufficient to show that $\mathrm{cdim}(B_n) \le \log(|V|)$. Let us remove
a root-leaf path $P_1$ from $B_n$. Note that $P_1$ is a level one path in the caterpillar decomposition.
After the removal of $P_1$, we will have at most $\log |V|$ subtrees. When we recursively
construct the caterpillar decomposition for each subtree, the longest root-leaf path of each
subtree will have a level which is at most one greater than the level of the path that has
just been removed. Therefore, for complete binary trees (and thus for all other trees), the
caterpillar dimension is bounded by $\log |V|$.

¹Note that the third node from the root in the middle level 1 branch is the root of a tree-component
consisting of four nodes that will generate two level 2 paths.
Figure 3.2: Path partition of a tree.
Given a rooted tree $\mathcal{T}$, we give the construction of its caterpillar decomposition in Algorithm 1.
Complexity Analysis of Algorithm 1
Since the embedding algorithms use the notion of the caterpillar decomposition, we
first analyze the running time of its construction as given in Algorithm 1. It is easy to see
that Steps 1 through 11 take $O(|V| + |E|) + O(|V|) + O(|E|)$ time, where $|E| = |V| - 1$. To find the
edge-disjoint paths in the caterpillar decomposition along with their levels, each edge in the
tree is visited exactly twice. This implies that the running time of the Steps 11 through 32
Algorithm 1 Caterpillar Decomposition Construction ($\mathcal{T}$: Edge-weighted Tree)
1: root ← root of $\mathcal{T}$
2: Call Breadth-First-Search (BFS) on $\mathcal{T}$
3: for each vertex v ∈ $\mathcal{T}$ do
4:   color[v] ← WHITE
5:   level[v] ← 0
6: end for
7: color[root] ← BLACK
8: for each edge e ∈ $\mathcal{T}$ do
9:   level[e] ← 0
10: end for
11: for each leaf v ∈ $\mathcal{T}$ do
12:   u ← predecessor of v
13:   create an empty array E′ ← {}
14:   no-edges ← 0
15:   while color[u] == WHITE do
16:     E′[no-edges++] ← edge (u, v)
17:     v ← u
18:     u ← predecessor of v
19:   end while
20:   if u == root then
21:     E′[no-edges++] ← edge (root, v)
22:   end if
23:   last-vertex ← u
24:   for (i ← 0; i < no-edges; i++) do
25:     let e = E′[i] be an edge between x and y, i.e., e = (x, y)
26:     level[e] ← level[last-vertex] + 1
27:     level[x] ← level[last-vertex] + 1
28:     level[y] ← level[last-vertex] + 1
29:     color[x] ← BLACK
30:   end for
31:   add the path specified by the edges in E′ to the set $\mathcal{P}$
32: end for
33: return $\mathcal{P}$
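For illustration, the same leaf-to-visited-vertex walk can be sketched compactly in Python (the names and the parent-map tree representation are ours, not the thesis code):

```python
def caterpillar_decomposition(parent, root):
    """parent: dict mapping each non-root vertex to its parent.
    Returns a list of paths; each path is a list of (child, parent)
    edges climbing from a leaf to the first already-visited vertex,
    mirroring the leaf loop of Algorithm 1."""
    children = {}
    for c, p in parent.items():
        children.setdefault(p, []).append(c)
    leaves = [v for v in parent if v not in children]
    visited = {root}
    paths = []
    for leaf in leaves:
        path, v = [], leaf
        while v not in visited:  # climb until a visited vertex is hit
            path.append((v, parent[v]))
            visited.add(v)
            v = parent[v]
        if path:
            paths.append(path)
    return paths

# A small tree rooted at "a": branches a-b-c and a-d-e.
parent = {"b": "a", "c": "b", "d": "a", "e": "d"}
paths = caterpillar_decomposition(parent, "a")
assert len(paths) == 2 and sum(len(p) for p in paths) == 4
```

Every edge is visited a constant number of times, matching the linear running time argued above.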
is linear in terms of the number of edges. The total running time of the algorithm then becomes:

$$T(n) = O(|V| + |E|) + O(|V|) + O(|E|) + O(|E|)$$
$$T(n) = O(|V|)$$

3.3.2 Construction of the Embedding
Given a path partition $\mathcal{P}$ of $\mathcal{T}$, we use $m$ to denote the number of levels (or caterpillar
dimension) in $\mathcal{T}$, and let $P(v)$ represent the unique path between the root and a vertex $v \in V$.
The first segment of $P(v)$, of weight $l_1$, follows some path $P_1$ of level 1 in $\mathcal{P}$; the second
segment, of weight $l_2$, follows a path $P_2$ of level 2; and the last segment, of weight $l_\alpha$, follows
a path $P_\alpha$ of level $\alpha \le m$. The sequences $\langle P_1, \ldots, P_\alpha \rangle$ and $\langle l_1, \ldots, l_\alpha \rangle$ will be referred to
as the decomposition sequence and the weight sequence of $P(v)$, respectively.

To define the embedding $f : V \to W$ under $\|\cdot\|_2$, we let the relevant coordinates in $W$ be
indexed by the paths in $\mathcal{P}$. The vector $f(v)$, $v \in V$, has non-zero coordinates corresponding
to the paths in the decomposition sequence of $P(v)$. Returning to Figure 3.2, the vector
$f(v)$ will have 10 components (defined by three level 1 paths and seven level 2 paths).
Furthermore, every vector $f(v)$ will have at most two non-zero components. Consider, for
example, the second lowest leaf node in the middle branch. Its path to the root will traverse
two level 2 edges corresponding to the fourth level 2 path, as well as three level 1 edges
corresponding to the second level 1 path.
Such embedding functions have become fairly standard in the metric space representation
of weighted graphs [61, 66]. In fact, Matousek [66] has proven that setting the $k$-th
coordinate of $f(v)$, corresponding to path $P_k$, $1 \le k \le \alpha$, in the decomposition sequence
$\langle P_1, \ldots, P_\alpha \rangle$, to

$$f(v)_{P_k} = \sqrt{l_k \left( l_k + \sum_{j=1}^{\alpha} \max\left\{0,\ l_j - \frac{l_k}{2m}\right\} \right)}$$

will result in a small distortion of at most $\sqrt{\log\log |V|}$. It should be mentioned that although
the choice of the path decomposition $\mathcal{P}$ is not unique, the resulting embeddings are
isomorphic up to a transformation that preserves ratios of distances. Given a metric tree
$\mathcal{T}$, the construction of this embedding is summarized in Algorithm 2. Figure 3.3 shows an
example of embedding a tree into 3-dimensional space using this algorithm.
Algorithm 2 Embedding into Graph-Dependent Dimensionality
1: Construct the path partition $\mathcal{P}$ of $\mathcal{T}$ according to Section 3.3.1.
2: m ← number of levels in $\mathcal{P}$ (caterpillar dimension)
3: for all v ∈ $\mathcal{T}$ do
4:   Compute its decomposition sequence $\langle P_1, \ldots, P_\alpha \rangle$ and weight sequence $\langle l_1, \ldots, l_\alpha \rangle$
5:   for k ← 1 to α do
6:     $f(v)_{P_k} \leftarrow \sqrt{l_k \left( l_k + \sum_{j=1}^{\alpha} \max\{0,\ l_j - l_k/(2m)\} \right)}$
7:   end for
8: end for
Complexity Analysis of Algorithm 2
As shown in the previous section, the construction of $\mathcal{P}$ has computational complexity
$O(|V|)$. Since we embed each vertex in $\mathcal{T}$ to a point in Euclidean space, Steps 4-7 are
executed $O(|V|)$ times. The number of paths in $\mathcal{P}$ is bounded by $O(|E|)$. Thus, the total
running time of the algorithm is $O(|V|) + O(|V| \cdot |E|) = O(|V| \cdot |E|)$.
Figure 3.3: (a) A sample tree with edge weights. (b) Embedded vertices shown in 3-dimensional space. The Cartesian coordinates of the points are: a = (0, 0, 0), b = (1, 0, 0), c = (1.5, 0, 0), d = (0, 2, 0), e = (0, 3.5, 0), f = (0, 2.23, 1).

3.3.3 Bringing Point Distributions into the Same Normed Space
It is important to note that embeddings produced by the above algorithm may be in
different dimensions and are defined only up to a distance-preserving transformation. Note
that a translated and rotated version of a graph embedding will also be a graph embedding.
Therefore, in order to match two embeddings, we must first perform a “registration” step
to project the two distributions into the same normed space.
Our transformation is based on Principal Components Analysis (PCA). Specifically,
the projection of the original vectors onto the subspace spanned by the first K right singu-
lar vectors of the covariance matrix retains the maximum information about the original
vectors among all projections onto subspaces of dimension K. Hence, projecting the two
distributions onto the first K right singular vectors of their covariance matrices will equalize
their dimensions while losing minimal information. Specifically, our PCA-based transformation is contained in the following theorem:
Theorem 4. Let $X = \{(x_1, w_1), \ldots, (x_n, w_n)\}$ and $Y = \{(y_1, w_1), \ldots, (y_m, w_m)\}$ be a pair
of weighted distributions in two different dimensions, $d$ and $d'$, and let $K$ be $\min(d, d')$.
Suppose moreover that

$$\mu_x = \Big(\sum_i w_i x_i\Big) \Big/ \sum_i w_i \qquad (3.1)$$
$$\mu_y = \Big(\sum_i w_i y_i\Big) \Big/ \sum_i w_i \qquad (3.2)$$
$$\sigma_x^2 = \Big(\sum_i w_i \|x_i - \mu_x\|^2\Big) \Big/ \sum_i w_i \qquad (3.3)$$
$$\sigma_y^2 = \Big(\sum_i w_i \|y_i - \mu_y\|^2\Big) \Big/ \sum_i w_i \qquad (3.4)$$
$$\Sigma_{xx} = \Big(\sum_i w_i (x_i - \mu_x)(x_i - \mu_x)^T\Big) \Big/ \sum_i w_i \qquad (3.5)$$
$$\Sigma_{xx} = U_x D_x V_x^T \ \text{is the SVD of}\ \Sigma_{xx} \qquad (3.6)$$
$$W_x = \text{first $K$ columns of } V_x \qquad (3.7)$$
$$\Sigma_{yy} = \Big(\sum_i w_i (y_i - \mu_y)(y_i - \mu_y)^T\Big) \Big/ \sum_i w_i \qquad (3.8)$$
$$\Sigma_{yy} = U_y D_y V_y^T \ \text{is the SVD of}\ \Sigma_{yy} \qquad (3.9)$$
$$W_y = \text{first $K$ columns of } V_y \qquad (3.10)$$

Then the embeddings $P_x(x_i) = W_x^T (x_i - \mu_x)/\sigma_x$ and $P_y(y_i) = W_y^T (y_i - \mu_y)/\sigma_y$ equalize their
dimensions² while losing minimal information.
²In the literature this is also known as whitening.

Proof. We represent the point sets $X$ and $Y$ by $d \times n$ and $d' \times m$ matrices, respectively.
Here $d$ and $d'$ reflect the dimensionality of the point sets. Without loss of generality, let us
assume that the dimensionality of set $X$ is greater than that of set $Y$, i.e., the value of $K$ used
in equations 3.7 and 3.10 is equal to the dimensionality of set $Y$ ($d > d'$ and $K = d'$). For
simplicity, suppose each point in sets $X$ and $Y$ has a uniform weight, i.e., $w_i = 1.0$. Then
clearly,

$$\sum_{i=1}^{n} w_i = n \qquad (3.11)$$

Using equation 3.11 (and its analogue $\sum_{i=1}^{m} w_i = m$ for set $Y$), equations 3.1 through 3.5 and equation 3.8 can be written as follows:

$$\mu_x = \frac{1}{n} \sum_i x_i \qquad (3.12)$$
$$\mu_y = \frac{1}{m} \sum_i y_i \qquad (3.13)$$
$$\sigma_x^2 = \frac{1}{n} \sum_i \|x_i - \mu_x\|^2 \qquad (3.14)$$
$$\sigma_y^2 = \frac{1}{m} \sum_i \|y_i - \mu_y\|^2 \qquad (3.15)$$
$$\Sigma_{xx} = \frac{1}{n} \sum_i (x_i - \mu_x)(x_i - \mu_x)^T \qquad (3.16)$$
$$\Sigma_{yy} = \frac{1}{m} \sum_i (y_i - \mu_y)(y_i - \mu_y)^T \qquad (3.17)$$
As shown in equation 3.6, computing the Singular Value Decomposition (SVD) of $\Sigma_{xx}$
yields three matrices, i.e., $\Sigma_{xx} = U_x D_x V_x^T$, where the columns of $U_x$ are the eigenvectors of
$\Sigma_{xx}\Sigma_{xx}^T$, the diagonal entries of $D_x$ are the square roots of the eigenvalues of both $\Sigma_{xx}\Sigma_{xx}^T$ and
$\Sigma_{xx}^T\Sigma_{xx}$, and the columns of $V_x$ are the eigenvectors of $\Sigma_{xx}^T\Sigma_{xx}$. Note that $V_x$ contains the
right singular vectors of $\Sigma_{xx}$, and thus $W_x$ consists of the first $K$ right singular vectors of the
covariance matrix $\Sigma_{xx}$. Similarly, after computing the Singular Value Decomposition of
$\Sigma_{yy}$, $W_y$ consists of the first $K$ right singular vectors of $\Sigma_{yy}$. Since, based on our assumption,
$K = d'$, $W_y$ contains all right singular vectors of $\Sigma_{yy}$.

It is clear that both $\Sigma_{xx}$ and $V_x$ are $d \times d$ matrices. Since $W_x$ includes the first $d'$ (or
$K$) columns of $V_x$, $W_x$ is a $d \times d'$ matrix. Similarly, $W_y$ is represented as a $d' \times d'$ matrix.
This, in turn, makes the final embeddings computed by both $P_x(x_i)$ and $P_y(y_i)$ have the same
dimension $d'$ ($P_x$ and $P_y$ are $d' \times n$ and $d' \times m$ matrices, respectively). Hence, this proves
the theorem for uniform-weight point sets.

Note that the proof can easily be generalized for arbitrary point weights. Assuming
that the point sets have integer weights (in the case of rational weights, we can multiply each
weight by their least common denominator), we replace each weighted point $(x_i, w_i)$ by
$w_i$ uniform-weight pairs. This results in two uniform-weight point sets $X'$ and $Y'$ with $n'$
and $m'$ elements, where $n' \ge n$ and $m' \ge m$ ($X'$ and $Y'$ are $d \times n'$ and $d' \times m'$ matrices,
respectively). We can then use the first part of the proof for sets $X'$ and $Y'$. This concludes
the theorem.
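A numerical sketch of this PCA-based registration, with uniform weights and random illustrative data (the function name is ours; only the covariance SVD of equations 3.5-3.7 is used):

```python
import numpy as np

def pca_register(X, K):
    """X: d x n matrix of column points (uniform weights). Projects
    onto the first K right singular vectors of the covariance matrix
    and scales by sigma, as in the theorem."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = np.sqrt(((X - mu) ** 2).sum(axis=0).mean())   # eq. 3.3
    cov = (X - mu) @ (X - mu).T / X.shape[1]              # eq. 3.5
    _, _, Vt = np.linalg.svd(cov)                         # eq. 3.6
    W = Vt[:K].T                                          # eq. 3.7
    return W.T @ (X - mu) / sigma

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 40))   # a 5-dimensional point set
Y = rng.standard_normal((3, 30))   # a 3-dimensional point set
K = min(X.shape[0], Y.shape[0])
Px, Py = pca_register(X, K), pca_register(Y, K)
assert Px.shape[0] == Py.shape[0] == K  # dimensions equalized
```

Because the covariance matrix is symmetric, the rows of `Vt` returned by the SVD are exactly its right singular vectors, so `W` matches the $W_x$, $W_y$ of the theorem.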
The above embedding has preserved both graph structure and edge weights, but has not
accounted for node information. To accommodate node information in our embedding, we
will associate a weight $w_v$ with each vector $f(v)$, for all $v \in V$. These weights will be defined
in terms of vertex labels which, in turn, encode image feature values. Note that nodes with
multiple feature values give rise to a vector of weights assigned to every point. We will
present an example of one such distribution in Chapter 6.
3.4 Embedding through Spherical Coding
The previous embedding procedure suffers from a significant drawback. Namely, each
graph is embedded into a vector space whose dimensionality is dependent on the graph
structure. Before the embeddings can be matched, a dimensionality reduction technique is
required. Since the original high-dimensional data cannot be represented exactly in lower
dimensions, dimensionality reduction methods introduce error. In this section we intro-
duce a novel, linear-time method to embed trees into vector spaces of prescribed dimen-
sionality, thereby avoiding the need for a dimensionality reduction step. As in the previous
embedding, this embedding is based on the caterpillar decomposition of the metric tree.
The paths of this decomposition will be embedded along maximally spaced rays in some
fixed-dimension metric space. In this construction the set of rays share the origin as their
end-points. The main step of the embedding is to identify the principal direction for each
ray to guarantee that the rays are maximally apart. In practice, this can be achieved by
placing maximally spaced points on the surface of a unit sphere and using the unit-length
vectors between the origin and these points as the principal directions of the rays. One
may observe that the first embedding method is a special case of this embedding when the
dimension of the embedding space is equal to the number of levels in the decomposition
(caterpillar dimension) and the corresponding rays form an orthogonal basis for the em-
bedding space. This new embedding algorithm is a deterministic version of the
algorithm presented in [43]. Given an object and its rotated view on a plane, this determin-
istic embedding algorithm guarantees that the principal directions of unit-length vectors
assigned to nodes that have the maximum distance from the root are the same.
Figure 3.4: The minimum distance d and minimum angle θ between 2 points.
A spherical code is a finite set of n points on the surface of a multi-dimensional unit
hypersphere. Given n, one may arrange the points on the sphere so as to minimize or
maximize a number of objective functions, such as the minimum distance between any
two points, the kissing number, the integration error, the indexing complexity, etc. The
choice of the objective function depends on the application. In this particular example, we
are interested in positioning the points on the sphere to maximize the minimum distance
between any pair of points [20]. Equivalently, one can try to minimize the radius $r$ of a
multi-dimensional sphere such that $n$ points can be placed on the surface, where any two
of the points are at spherical distance at least 2 from each other. Recall that the angular distance
between two points is the acute angle subtended by them at the origin. Figure 3.4 shows
the relationship between the minimum distance and minimum angle between two points.
The minimum distance of a spherical code indicates the quality of the code.
Figure 3.5: An edge-weighted tree and its spherical code in 2D. The Cartesian coordinates of the vertices are: a = (0, 0), b = (0, 1.0), c = (0, 1.5), d = (2.0, 0), e = (2.5, 0.87), f = (3.5, 0), g = (3.93, 0.25), and h = (4.5, 0).
3.4.1 Construction of the Embedding

The embedding framework is best illustrated with an example in which a weighted tree is
embedded into $\mathbb{R}^2$, as shown in Figure 3.5. To ease visualization, we will limit the discussion
to the first quadrant. The weighted tree contains 4 paths, $\{a, b, c\}$, $\{a, d, f, h\}$, $\{d, e\}$, and
$\{f, g\}$, in its caterpillar decomposition. In the embedding, the root is assigned to the origin.
Next, we seek a set of four vectors, one for each path in the caterpillar decomposition, such
that their inner products are minimized, i.e., their endpoints are maximally apart. These
vectors define the general directions in which the vertices on each path in the caterpillar
decomposition are embedded.

Three of the four vectors will be used by the caterpillar paths belonging to the subtree
rooted at vertex $d$, and one vector will be used by the path belonging to the subtree rooted
at vertex $b$. This effectively subdivides the first quadrant into two cones, $C_b$ and $C_d$. The
volume of these cones is a function of the number of caterpillar paths belonging to the
subtrees rooted at $b$ and $d$. The cone $C_d$, in turn, is divided into two smaller cones, $C_e$ and
$C_f$, corresponding to the subtrees rooted at $e$ and $f$, respectively. The extreme rays of the sub-cones
$C_b$, $C_e$, and $C_f$ correspond to the four directions defining the embedding. To complete
the embedding, we translate the sub-cones away from the origin along their directional rays
to positions defined by the path lengths in the tree. For example, to embed point $b$, we will
move along the extremal ray of $C_b$ and will embed $b$ at $(0, 1.0)$. Similarly, the sub-cone $C_d$
will be translated along the other extremal ray, embedding $d$ at $(2.0, 0)$.
In $d$-dimensional Euclidean space $\mathbb{R}^d$, computing the embedding $f : V \to W$ under $\|\cdot\|_2$
is more involved. Let $L$ denote the number of paths in the caterpillar decomposition. The
embedding procedure defines $L$ vectors in $\mathbb{R}^d$ that have a large angle with respect to each
other on the surface of a hypersphere $S^d$ of radius $r$. These vectors are chosen in such a
way that any two of their endpoints on the surface $\Sigma^d$ are at least spherical distance 2 from
each other. We refer to such vectors as well-separated. Consider the set of hyperplanes
$H_i = \{0, 2, 4, \ldots, 2i\}$, and let $\Sigma^d(i) = H_i \cap \Sigma^d$. Since each of the $\Sigma^d(i)$ are hypercircles, i.e.,
surfaces of spheres in dimension $d - 1$, we can recursively construct well-separated vectors
on each hypercircle $\Sigma^d(i)$. Our construction stops when the sphere becomes a circle and
the surface becomes a point in two dimensions. It is known that taking $r$ to be $O(d L^{1/(d-1)})$,
and the minimum angle between two vectors to be $2/r$, provides us with $L$ well-separated
vectors [20]. In Figure 3.5, we have four such vectors emanating from the origin.
Now that the embedding directions have been established, we can proceed with the em-
bedding of the vertices. The embedding procedure starts from the root (always embedded
at the origin) and embeds vertices following the embedding of their parents. For each vertex
in the metric tree $\mathcal{T}$, we associate with every subtree $\mathcal{T}_v$ a set of vectors $C_v$, such that
the number of vectors in $C_v$ equals the number of paths in the caterpillar decomposition of
$\mathcal{T}_v$. Initially, the root has the entire set of $L$ vectors. Consider a subtree rooted at vertex
$v$, and let us assume that vertex $v$ has $k$ children, $v_1, \ldots, v_k$. We partition the set of vectors
into $k$ subsets, such that the number of vectors in each subset $S_{v_l}$ equals the number of
leaves in $\mathcal{T}_{v_l}$. We then embed the vertex $v_l$ ($1 \le l \le k$) at the position $f(v) + w_l \cdot x_l$, where
$w_l$ is the length of the edge $(v, v_l)$ and $x_l$ is some vector in $C_v$. We recursively repeat the
same process for each subtree rooted at every child of $v$, and stop when there are no more
subtrees to consider.

It must be noted that the above algorithm is randomized, since the vectors are chosen
arbitrarily from each $C_v$. We will, however, use the non-randomized version of this embedding
in our framework. More specifically, after embedding vertex $v$ into $f(v)$, we consider
the subtree rooted at $v$ ($\mathcal{T}_v$) and compute the length from each leaf to its root, $v$. Among
the children of the root, we first start the embedding process from the vertex $v_l$ which lies
on the maximum-length root-leaf path in $\mathcal{T}_v$. We then continue embedding vertices in the
same fashion, i.e., the $i$th vertex to be embedded is the one which lies on the $i$th maximum-length
root-leaf path in $\mathcal{T}_v$. The embedding procedure is summarized in Algorithms 3 and 4.
Complexity Analysis of Algorithms 3 and 4
Since the running time of Algorithm 3 depends on that of Algorithm 4, we first ana-
lyze the complexity of Algorithm 4. One may notice that Algorithm 4 is called recursively
for each vertex in the tree. Since all the other steps of this algorithm take constant time, its
Algorithm 3 Embedding through Spherical Coding
1: Construct the path partition $\mathcal{P}$ of $\mathcal{T}$ according to Section 3.3.1.
2: L ← number of paths in $\mathcal{P}$
3: r ← $O(d L^{1/(d-1)})$
4: δ ← ρ/r, where ρ is at least 2
5: SphericalEmbedding(root)

Algorithm 4 SphericalEmbedding(u)
1: if u == root then
2:   f(u) ← 0
3: end if
4: Compute the set of vectors $C_u$ using r, δ, and ρ according to Section 3.4.
5: for all v ∈ Adj[u] do
6:   f(v) ← f(u) + w(u, v) · $x_u$, where $x_u \in C_u$
7:   SphericalEmbedding(v)
8: end for
running time becomes linear in terms of the number of vertices in the tree. Returning to
Algorithm 3, we have shown previously that, given a tree, the construction of its caterpillar
decomposition takes $O(|V|)$. Since Steps 2, 3 and 4 are constant-time operations, the
running time of Algorithm 3 is linear, $O(|V|)$.
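In the plane (d = 2, as in Figure 3.5) the recursion reduces to spreading one direction per leaf across the first quadrant and walking the tree; the sketch below is a simplified illustration of Algorithm 4, with the direction-assignment policy (widest subtree first) chosen by us for clarity rather than taken from the thesis:

```python
import math

def leaves_under(children, v):
    kids = children.get(v, [])
    if not kids:
        return [v]
    out = []
    for c in kids:
        out.extend(leaves_under(children, c))
    return out

def spherical_embed_2d(children, edge_w, root):
    """One maximally spaced unit direction per leaf in the first
    quadrant; each child consumes a contiguous block of directions
    sized by its leaf count, mirroring the cones of Figure 3.5."""
    L = len(leaves_under(children, root))
    step = (math.pi / 2) / max(L - 1, 1)
    dirs = [(math.cos(i * step), math.sin(i * step)) for i in range(L)]
    pos = {root: (0.0, 0.0)}

    def place(v, lo):
        x, y = pos[v]
        # widest subtree first, so long paths keep the extreme rays
        for c in sorted(children.get(v, []),
                        key=lambda c: -len(leaves_under(children, c))):
            dx, dy = dirs[lo]
            w = edge_w[v, c]
            pos[c] = (x + w * dx, y + w * dy)
            place(c, lo)
            lo += len(leaves_under(children, c))

    place(root, 0)
    return pos

# The tree of Figure 3.5 with illustrative edge weights.
children = {"a": ["b", "d"], "b": ["c"], "d": ["e", "f"], "f": ["g", "h"]}
edge_w = {("a", "b"): 1.0, ("b", "c"): 0.5, ("a", "d"): 2.0,
          ("d", "e"): 1.0, ("d", "f"): 1.5, ("f", "g"): 0.5, ("f", "h"): 1.0}
pos = spherical_embed_2d(children, edge_w, "a")
assert pos["a"] == (0.0, 0.0) and len(pos) == 8
```

Each child lands at its parent's position plus the edge weight times a unit direction, so the distance from a vertex to its parent equals the tree edge weight, as in Step 6 of Algorithm 4.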
Since the dimensionality of Euclidean space is taken as an input to the spherical em-
bedding algorithm, a natural question one may ask is what dimensionality of Euclidean
space should be chosen so that the embedding approximates pairwise distances with minimum
distortion. To answer this question, we conducted a set of experiments in which 200
trees constructed for some of the Columbia University Image Library (COIL-20) objects
(see [70]) were embedded into Euclidean spaces of varying dimensions, and measured the
average distortion in each dimension. For a given tree, we used the following method to
measure its average distortion in one particular dimension. First, we computed all of its
pairwise node distances before and after the embedding. We then measured the maximum
factor by which any pairwise distance was changed by the embedding algorithm. After
repeating this procedure for all trees, the average distortion for one particular dimension
was calculated. The trade-off between distortion and dimension is shown in Figure 3.6.
It should be noted that, while increasing the dimensionality of the embedding space will
improve the quality by decreasing the distortion, this trend does not continue indefinitely to
produce isometric embeddings. This can be attributed to the fact that the original distances
are non-additive, making an isometric embedding impossible.
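The distortion measurement described above can be made concrete as follows; the function name and data layout (`tree_dist`, `emb`) are illustrative, not part of the original experiments.

```python
import numpy as np

def max_distortion(tree_dist, emb):
    """Largest factor by which any pairwise distance is expanded or
    contracted by an embedding. `tree_dist[(u, v)]` holds the original
    tree distance and `emb[u]` the embedded coordinates; both names
    are illustrative."""
    worst = 1.0
    for (u, v), d_tree in tree_dist.items():
        d_emb = np.linalg.norm(emb[u] - emb[v])
        worst = max(worst, d_emb / d_tree, d_tree / d_emb)
    return worst

# A path a-b-c embedded exactly on a line has distortion 1; pulling c
# halfway toward b doubles the worst-case ratio.
dist = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 2.0}
exact = {"a": np.array([0.0]), "b": np.array([1.0]), "c": np.array([2.0])}
shrunk = dict(exact, c=np.array([1.5]))
```

Averaging this quantity over all trees for a fixed dimension gives one point of the curve in Figure 3.6.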
Figure 3.6: Trade-off between distortion and dimension for a given set of graphs.
3.5 Conclusions
This chapter introduced the notion of graph embedding and presented two low-distortion
graph embedding algorithms. Each algorithm takes a tree metric and embeds it into some
low-dimensional vector space with the aim of preserving pairwise distances between the
nodes. While the distances cannot be preserved exactly, we would like to approximate them
with minimal distortion.
The first embedding technique is inspired by the general framework proposed by Matousek [66].
The framework begins by transforming a graph into a metric tree, which is
then embedded into a normed vector space. Although low-distortion embedding is
achieved, the approach suffers from the significant limitation that each graph is embed-
ded into a vector space whose dimensionality is a property of the graph. Thus, before the
embeddings can be matched, a dimensionality reduction technique (such as PCA) is re-
quired. Since high-dimensional data cannot be represented exactly in lower dimensions,
dimensionality reduction techniques are prone to error.
The limitations of the first embedding technique bring us to a second embedding method,
which enables us to embed input trees into normed vector spaces of prescribed
dimensionality. The main advantage of this technique comes from the fact that because the nodes of
the trees are embedded into the same space, it avoids the need for a dimensionality reduc-
tion step. This novel linear-time embedding technique is based on a deterministic version of
the spherical coding algorithm [43]. The starting point of this embedding technique comes
from the first embedding method, where we use a new dimension orthogonal to all previous
ones for each path in the caterpillar decomposition. However, to restrict the dimension of
the target space, we need to relax the orthogonality constraint and compute L vectors that
have small inner product with each other, where L is the number of paths in the caterpillar
decomposition. Such vectors form good spherical codes as defined in [43], and the cost of
relaxing the orthogonality constraint is O(L^{1/(d−1)}).
As presented in this chapter, since the dimension of the target space is given as an input
to the algorithm, it raises an interesting question: What dimensionality of the target space
should be chosen to approximate pairwise distances with minimum distortion? We tried to
answer this question by conducting an experiment in which a number of trees were embedded
into Euclidean spaces of different dimensionality, and we computed the average distortion in
each dimension. The experimental results indicate that increasing the dimensionality of the
embedding space improves the quality of the embedding, but this trend does not continue
indefinitely to produce isometric embeddings.
To compare these two embedding methods to each other, we will develop two different
variations of the many-to-many matching framework and demonstrate the effectiveness of
each variation for shape retrieval and pose estimation in the experimental section of this
thesis. Since two different embedding techniques are used, graph nodes will be positioned
at different Cartesian coordinates in the vector space. We will also study the stability of the
matching framework for each variation in the presence of noise/occlusion in the following
chapters.
4. Encoding Directed Edges
Graph embedding methods approximate the distance metric defined on undirected edges
of the original graphs with minimal distortion. However, they fail to encode any oriented
relations, such as parent/child or sibling relations common to scale-space or coarse-to-
fine structures. This is due to the fact that oriented relations do not satisfy the symmetry
property of a metric. To encode relational information in the vector space, the embedding
procedure should represent each input graph node as a point in an asymmetric metric space.
Due to the limited number of algorithms defined on asymmetric metric distances, we will
instead propose a method to encode relational information in the metric space.
Our method retains this important information by moving it into the nodes as node
attributes, a technique used in the encoding of directed topological structure [99], directed
geometric structure [95], and shape context [9]. Encoding in a node the attributes of the
oriented edges incident to the node requires computing distributions on the attributes and
assigning them to the node. For example, a node with a single parent at a coarser scale and
two children at a finer scale might encode a relative scale distribution (histogram) as a node
attribute. The resulting attribute provides a contextual signature for the node which will be
used by the matching framework (Chapter 5) to reduce matching ambiguity.
4.1 Qualitative Shape Representation Using a Blob/Ridge Decomposition
We will motivate this encoding in the context of directed graphs for qualitative shape
representation using a blob/ridge decomposition; details can be found in [95]. Two examples
are shown in Figure 1.1. A blob (compact region) is graphically represented by a circle
defining a support region whose radius is proportional to its scale (√t). Blobs are detected
as local maxima in scale space of the square of the normalized Laplacian operator,

∇²_norm L = t (L_xx + L_yy).   (4.1)
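As a concrete illustration (not the detector implementation of [95]), the scale-normalized Laplacian of Eq. (4.1) can be evaluated with SciPy's `gaussian_laplace`, which supplies L_xx + L_yy at scale σ; the function name and synthetic test image below are illustrative.

```python
import numpy as np
from scipy import ndimage

def blob_responses(image, sigmas):
    """Squared scale-normalized Laplacian responses, one map per scale
    t = sigma**2; blobs are local maxima of this quantity in scale
    space (Eq. 4.1)."""
    out = []
    for s in sigmas:
        lap = ndimage.gaussian_laplace(image.astype(float), sigma=s)
        out.append((s * s * lap) ** 2)   # t * (Lxx + Lyy), squared
    return np.stack(out)

# Synthetic Gaussian blob of scale t0 = 9 (sigma0 = 3): the response at
# its center peaks at the matching scale.
y, x = np.mgrid[0:65, 0:65]
img = np.exp(-((x - 32.0) ** 2 + (y - 32.0) ** 2) / (2 * 3.0 ** 2))
resp = blob_responses(img, sigmas=[1.0, 2.0, 3.0, 5.0, 8.0])
best = int(np.argmax(resp[:, 32, 32]))   # index of sigma = 3
```

The scale normalization by t is what makes responses comparable across scales, so the argmax selects the blob's intrinsic scale.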
Ridges (elongated structures) are represented as ellipses, each defining a support region
whose width is proportional to its scale (√t). These elongated structures are localized
where the multi-scale ridge detector,

R_norm L = t^{3/2} (L_pp − L_qq)² = t^{3/2} ((L_xx − L_yy)² + 4 L²_xy),   (4.2)
assumes a local maximum in scale-space. For color images, the feature detection is per-
formed in the R, G, and B channels, respectively. To represent the spatial extent of a
detected image structure, a windowed second moment matrix,

Σ = ∫_{η∈ℝ²} ( L_x²  L_x L_y ; L_x L_y  L_y² ) g(η; t_int) dη,   (4.3)

is computed at the detected feature position and at an integration scale t_int proportional
to the scale t_det of the detected image feature. The orientation and the anisotropy of the
feature are estimated from the eigenvalues of Σ and the corresponding eigenvectors. The
spatial extent of the feature is thus given by the scale, the anisotropy, and the orientation.
Figure 4.1 shows an image of a hand with the extracted features superimposed.
Figure 4.1: Feature Extraction: Extracted blobs and ridges at appropriate scales.
The feature detection process may produce multiple overlapping responses originating
from the same image structure. Therefore, features are merged to remove such overlapping
responses. To detect overlapping features, we need a measure of inter-feature similarity.
For this purpose, each feature is associated with a 2-D Gaussian kernel g(x; Σ). When
two features are positioned near each other, their Gaussian functions will intersect. The
similarity measure between two such features can be defined as the disjunct volume D of
the two Gaussians, and is computed as

D(A, B) = (√(|Σ_A| + |Σ_B|) / 2) ∫_{η∈ℝ²} (g_A − g_B)² dx.   (4.4)
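A numerical sketch of this overlap measure is given below. The prefactor √(|Σ_A| + |Σ_B|)/2 follows the reconstruction of the garbled equation above and should be checked against [95]; the function names and grid parameters are illustrative.

```python
import numpy as np

def gauss2d(pos, mu, cov):
    """Normalized 2-D Gaussian evaluated on a grid of positions."""
    diff = pos - mu
    q = np.einsum('...i,ij,...j->...', diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * q) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

def disjunct_volume(muA, covA, muB, covB, lim=10.0, n=201):
    """Numerical evaluation of Eq. (4.4): integrate (gA - gB)^2 on a
    grid and apply the sqrt(|Sigma_A| + |Sigma_B|)/2 prefactor (a
    reconstruction; verify the normalization against [95])."""
    xs = np.linspace(-lim, lim, n)
    X, Y = np.meshgrid(xs, xs)
    pos = np.stack([X, Y], axis=-1)
    gA = gauss2d(pos, muA, covA)
    gB = gauss2d(pos, muB, covB)
    dx = xs[1] - xs[0]
    integral = np.sum((gA - gB) ** 2) * dx * dx
    return 0.5 * np.sqrt(np.linalg.det(covA) + np.linalg.det(covB)) * integral

# Identical Gaussians have zero disjunct volume; separated ones do not.
I2 = np.eye(2)
same = disjunct_volume(np.zeros(2), I2, np.zeros(2), I2)
apart = disjunct_volume(np.zeros(2), I2, np.array([4.0, 0.0]), I2)
```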
Similarly, the ridge detection will produce multiple responses on a ridge structure that
is long compared to its width. These ridges are linked together to form one long ridge, as
shown in Figure 4.2. Ridges are linked based on overlap and alignment. After the linking
is performed, we re-calculate the anisotropy and support region for the resulting ridge.
Figure 4.2: Extracted blobs and ridges after removing multiple responses and ridge linking.
The anisotropy is re-calculated as 1 − (w/l), where w is the width of the structure and l is
the length of the structure.
Once we construct the feature map, we then assemble the component features into a
directed acyclic graph. Algorithm 5 shows the graph construction.
Algorithm 5 Ridge and Blob Decomposition Graph Construction
1: Extract features, merge multiple feature responses, and link ridges.
2: Choose the coarsest-scale feature as the root.
3: Recursively define child nodes of the root based on spatial overlap; find parental and sibling edges.
4: Compute relations between the nodes in the graph; these serve as attributes of the edges.
5: For features not included in the graph, go to Step 3.
As outlined in the algorithm, after linking spatially overlapping aligned ridges and
merging spatially overlapping blobs, we build directed acyclic graphs in a coarse-to-fine
manner. Specifically, let G = (V, E) be a graph to be embedded. Each feature is represented
as a node in the graph and has a number of attributes, including position, orientation, and
support region. A feature at the coarsest scale is chosen as the root. Next, finer-scale
features that overlap with the root become its children through hierarchical edges. These
children, in turn, select overlapping features at finer scales to be their children, etc. From
the unassigned features, the feature at the coarsest scale is chosen as a new root. Children
of this root are selected from unassigned as well as assigned features and the process is
repeated until all features are assigned to a graph. This process creates the possibility that
a node may have multiple parents. In order to create one rooted graph which is needed in
the matching step, a virtual top root node is inserted as the parent of all root nodes in the
image.
There are a number of important geometric attributes associated with each edge. For
an edge e, directed from a vertex v_A representing feature f_A to a vertex v_B representing
feature f_B, we define the following attributes, as shown in Figure 4.3:
• Distance. Two measures of inter-feature distance are associated with the edge: 1) the
smallest distance d from the support region of f_A to the support region of f_B, normalized
to the larger of the radii r_A and r_B; and 2) the distance between their centers, normalized
to the radius r_A of f_A in the direction of the distance vector between their centers.
• Relative orientation. The relative orientation between f_A and f_B.
• Bearing. The bearing φ of a feature f_B, as seen from a feature f_A, defined as the angle
of the distance vector x_B − x_A with respect to the orientation of f_A, measured
counter-clockwise.
• Scale ratio. The scale-invariant relation between f_A and f_B, given as the ratio between
their scales t_A and t_B.
Figure 4.3: The four edge relations: (a,b) two normalized distance measures, (c) relative orientation, and (d) bearing.
Examples of graphs for hand images, showing hierarchical edges, are shown in Figure 1.1.
For every pair of vertices (u, v), we let R_{u,v} denote the attribute vector associated with
the pair. The entries of each vector represent the set of oriented relations R between u and v.
For a vertex u ∈ V, we let N(u) denote the set of vertices v ∈ V adjacent to u. For a relation
p ∈ R, we let H(u, p) denote the set of values of relation p between u and all vertices in
N(u); i.e., H(u, p) corresponds to entry p of vector R_{u,v} for v ∈ N(u). The feature vector H_u
for point u is the set of all H(u, p)'s for p ∈ R. Observe that every entry H(u, p) of vector
H_u can be considered as a local distribution (histogram) of feature p in the neighborhood
N(u) of u (see Figure 4.4). We adopt the method of [95], in which the distance function for
two such vectors H_u and H_{u′} is computed through a weighted combination of Hausdorff
distances between H(u, p) and H(u′, p) for all values of p.
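The construction of these per-node signatures can be sketched as follows; the data layout (`relations[p][(u, v)]`), bin count, and the choice to credit both endpoints of an edge are illustrative assumptions, not the exact scheme of [95].

```python
import numpy as np

def node_histograms(relations, bins=8):
    """Build, for each vertex, one histogram per oriented relation over
    the values that relation takes on edges incident to the vertex (the
    contextual signature described above). `relations[p][(u, v)]` is
    the value of relation p on edge (u, v); the layout is illustrative."""
    hist = {}
    for p, vals in relations.items():
        lo = min(vals.values())
        hi = max(vals.values())
        for (u, v), x in vals.items():
            # The edge contributes to the histograms of both endpoints.
            for node in (u, v):
                h = hist.setdefault(node, {}).setdefault(p, np.zeros(bins))
                idx = min(int((x - lo) / (hi - lo + 1e-12) * bins), bins - 1)
                h[idx] += 1.0
    return hist

# Two edges leaving vertex "a" with different bearing values.
rel = {"bearing": {("a", "b"): 0.1, ("a", "c"): 0.9}}
hist = node_histograms(rel, bins=2)
```

The resulting histograms serve as node attributes that the matching framework of Chapter 5 can compare.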
In Part (a) of Figure 4.5, we illustrate an example graph in which a vertex and its neighbors,
with their attributes, are shown.
Figure 4.4: Histogram creation for each directed graph relation.
Assuming there are only two attributes associated with
every vertex, Parts (b) and (c) show two histograms, one for each of these attributes.
4.2 Conclusions
As presented in Chapter 3, graph embedding methods approximate the distance metric
defined by undirected, weighted edges of the original graphs. Given two nodes u and v in
a tree, the embedding methods guarantee that the undirected (symmetric) distance between
them is within a bounded distortion factor of that in the target space. However, since oriented
relations, such as the parent/child or sibling relations common to scale-space structures, are
directed (asymmetric), they cannot be encoded by embedding algorithms. To overcome this
problem, we moved such hierarchical relations into the nodes as node attributes.
Figure 4.5: Part (a) shows a vertex and its neighbors with their attributes. Histograms created for each attribute are presented in parts (b) and (c).
More specifically, for every incoming and outgoing edge adjacent to a vertex, we created a
local histogram. For one particular vertex in a given graph, we used its histograms along
with its geometric location in the vector space to find its corresponding node(s) in the sec-
ond graph. The main advantage of this method, combined with a low-distortion tree embedding
method, is that it enables us to encode both the geometric and the topological structure of input
graphs during the matching process.
In our many-to-many matching framework, the steps that we presented so far represent
graph nodes as a set of points in a high-dimensional vector space. Our next goal is to
match these point sets and find the correspondences between them. Given two point sets,
the algorithm establishing such correspondences should also compute a similarity score
between them. Using this similarity value, we will gain information about how similar the
original graphs, and therefore, their corresponding shapes are. In the next chapter we will
first define the point matching algorithm used in the framework. We then show by a set of
experiments that many-to-many point correspondences using embedded node histograms
in the vector space yield meaningful many-to-many node correspondences in the original
graphs.
5. Distribution-Based Many-to-Many Matching
By embedding vertex-labeled graphs into normed spaces, we have reduced the problem
of many-to-many matching of graphs to that of many-to-many matching of weighted dis-
tributions of points in normed spaces. Given a pair of weighted distributions in the same
normed space, the Earth Mover’s Distance (EMD) framework [82] is then applied to find
an optimal match between the distributions. The EMD approach computes the minimum
amount of work (defined in terms of displacements of the masses associated with points)
it takes to transform one distribution into another. The EMD approach assumes that a dis-
tance measure between single features, called the ground distance, is given. The EMD
then “lifts” this distance from individual features to full distributions. The main advantage
of using EMD lies in the fact that it subsumes many histogram distances and permits par-
tial matches. This important property allows the similarity measure to deal with uneven
clusters and noisy datasets.
Computing the EMD is based on a solution to the well-known transportation prob-
lem [4], whose optimal value determines the minimum amount of “work” required to trans-
form one distribution into the other. More formally, let P = {(p_1, w_{p_1}), …, (p_m, w_{p_m})} be
the first distribution with m points, and let Q = {(q_1, w_{q_1}), …, (q_n, w_{q_n})} be the second
distribution with n points. Let D = [d_{ij}] be the ground distance matrix, where d_{ij} is the
ground distance between points p_i and q_j. Our objective is to find a flow matrix F = [f_{ij}],
with f_{ij} being the flow between points p_i and q_j, that minimizes the overall cost

Work(P, Q, F) = ∑_{i=1}^{m} ∑_{j=1}^{n} f_{ij} d_{ij}

subject to the following list of constraints:

f_{ij} ≥ 0,  1 ≤ i ≤ m, 1 ≤ j ≤ n
∑_{j=1}^{n} f_{ij} ≤ w_{p_i},  1 ≤ i ≤ m
∑_{i=1}^{m} f_{ij} ≤ w_{q_j},  1 ≤ j ≤ n
∑_{i=1}^{m} ∑_{j=1}^{n} f_{ij} = min( ∑_{i=1}^{m} w_{p_i}, ∑_{j=1}^{n} w_{q_j} )

The optimal value of the objective function Work(P, Q, F) defines the Earth Mover's Distance
between the two distributions.
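For small instances, this transportation program can be transcribed directly as a linear program; the sketch below uses SciPy's generic LP solver rather than the specialized network-flow algorithms used in practice, and the function name and dense matrix layout are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def emd(P, wp, Q, wq):
    """Earth Mover's Distance as a direct transcription of the
    transportation LP above (a dense formulation for small instances,
    not a network-flow solver). P (m x k) and Q (n x k) hold point
    coordinates, wp and wq their weights; the ground distance is
    Euclidean."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    wp, wq = np.asarray(wp, float), np.asarray(wq, float)
    m, n = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1).ravel()
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):                 # row sums <= w_{p_i}
        A_ub[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):                 # column sums <= w_{q_j}
        A_ub[m + j, j::n] = 1.0
    b_ub = np.concatenate([wp, wq])
    A_eq = np.ones((1, m * n))         # total flow = min of total weights
    b_eq = [min(wp.sum(), wq.sum())]
    res = linprog(d, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None))
    return res.fun, res.x.reshape(m, n)

# One unit of mass moved from (0, 0) to (3, 4) costs exactly 5.
work, flow = emd([[0.0, 0.0]], [1.0], [[3.0, 4.0]], [1.0])
```

The inequality row/column constraints are what permit the partial matches emphasized above when the two total weights differ.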
The above formulation assumes that the two distributions have been aligned. However,
recall that a translated and rotated version of a graph embedding will also be a graph em-
bedding. To accommodate pairs of distributions that are “not rigidly embedded”, Cohen
and Guibas [19] extended the definition of EMD, originally applicable to pairs of fixed
sets of points, to allow one of the sets to undergo a transformation. Assuming that a
transformation T(·) is applied to the second distribution, the distances d^T_{ij} are defined as
d^T_{ij} = d(p_i, T(q_j)), and the objective function becomes

Work(P, Q, F, T) = ∑_{i=1}^{m} ∑_{j=1}^{n} f_{ij} d^T_{ij}.

The minimal value of the objective function Work(P, Q, F, T) defines the Earth Mover's
Distance between two distributions that are allowed to undergo a transformation from the set 𝒯.
Cohen and Guibas [19] also suggested an iterative process (which they call FT, short
for “an optimal Flow and an optimal Transformation”) that achieves a local minimum
of the objective function. Starting with an initial transformation T^(0)(·), from a given
T^(k)(·) they compute the optimal flow F = F^(k) that minimizes the objective function
Work(P, T^(k)(Q), F), and from a given optimal flow F^(k) they compute an optimal
transformation T = T^(k+1)(·) that minimizes the objective function Work(P, T(Q), F^(k)). The
iterative process stops when the improvement in the objective function value falls below
a threshold. The resulting optimal pair (F, T) depends on the initial transformation T^(0).
Starting the iteration from several initial transformations increases the likelihood of
obtaining a global minimum.
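The FT alternation can be illustrated in a deliberately simplified setting: equal-size, uniformly weighted point sets (so the optimal flow is a one-to-one assignment) and a transformation restricted to a pure translation. This is a sketch of the alternation structure only; the full method of [19] alternates an EMD flow with a richer transformation, and all names below are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ft_iteration(P, Q, iters=10, tol=1e-8):
    """Sketch of the FT alternation for a simplified setting: equal-size,
    uniformly weighted point sets, with the transformation restricted
    to a pure translation of Q."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    t = np.zeros(P.shape[1])
    prev = np.inf
    for _ in range(iters):
        D = np.linalg.norm(P[:, None] - (Q + t)[None, :], axis=-1)
        rows, cols = linear_sum_assignment(D)   # optimal one-to-one flow
        cost = D[rows, cols].sum()
        if prev - cost < tol:                   # improvement below threshold
            break
        prev = cost
        # With a squared ground distance, the optimal translation is the
        # mean displacement under the current flow.
        t = (P[rows] - Q[cols]).mean(axis=0)
    return cost, t

# Q is a translated copy of P; the alternation recovers the offset.
P = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
cost, t = ft_iteration(P, P + np.array([1.0, 0.5]))
```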
5.1 Choosing an Appropriate Transformation
For our framework, the set 𝒯 of allowable transformations consists of only those
transformations that preserve distances. Therefore, we use a weighted version of the Least
Squares Estimation algorithm [104] to compute an optimal distance-preserving
transformation given a flow between the distributions. Specifically, the following theorem shows
how to compute the transformation parameters.
Theorem 5. Given a set of pairings {(x_i, y_i, w_i)} (the flow of weight w_i is sent from point
x_i to point y_i), the optimal transformation T(x) = cRx + t is defined as follows:
μ_x = (∑_i w_i x_i) / (∑_i w_i)   (5.1)
μ_y = (∑_i w_i y_i) / (∑_i w_i)   (5.2)
σ_x² = (∑_i w_i ‖x_i − μ_x‖²) / (∑_i w_i)   (5.3)
σ_y² = (∑_i w_i ‖y_i − μ_y‖²) / (∑_i w_i)   (5.4)
Σ_xy = (∑_i w_i (y_i − μ_y)(x_i − μ_x)^T) / (∑_i w_i)   (5.5)
R = UV^T, where UDV^T is the SVD of Σ_xy   (5.6)
c = σ_y / σ_x   (5.7)
t = μ_y − cRμ_x   (5.8)
Proof. The original proof of optimality of the transformation [104] is easily adapted to
the weighted case. Namely, assuming that the flows from the x_i's to the y_i's are integer,
each weighted pairing (x_i, y_i, w_i) can be replaced by w_i unweighted pairings (x_i^j, y_i^j),
which makes the original proof applicable. Collecting appropriate terms, we get weighted
versions of the original equations. Fractional flows are reduced to integer flows by
multiplying all fractions by their least common denominator. More formally, the proof can be
stated as follows.
Consider a set of pairs S and its weight set W:

S = {(x_1, y_1), …, (x_n, y_n)},  W = {w_1, …, w_n}.

Let us first assume that each pair (x_i, y_i) has a uniform weight, i.e., w_i = 1.0. Then clearly

∑_{i=1}^{n} w_i = n.   (5.9)

Using equation (5.9), the characteristics in (5.1)–(5.5) can be written as follows:
μ_x = (1/n) ∑_i x_i   (5.10)
μ_y = (1/n) ∑_i y_i   (5.11)
σ_x² = (1/n) ∑_i ‖x_i − μ_x‖²   (5.12)
σ_y² = (1/n) ∑_i ‖y_i − μ_y‖²   (5.13)
Σ_xy = (1/n) ∑_i (y_i − μ_y)(x_i − μ_x)^T   (5.14)
One may notice that equations (5.10)–(5.14) are the same as the ones in [104]. Thus,
we can follow the original proof for the uniformly weighted case. More generally, suppose
that each pair has an integer weight, i.e., w_i ≥ 1. (In the case of rational weights, we
multiply each weight by their least common denominator.) After replacing each weighted
pair (x_i, y_i, w_i) with w_i uniform-weight pairs, S and W can be written as follows:

S′ = {(x_1, y_1), …, (x_n, y_n), …, (x_m, y_m)},  W′ = {1, …, 1}.

In S′, some of the pairings (x_i, y_i) are repeated more than once. It is then easy to see that the
first part of the proof can be applied directly to S′. This concludes the proof.
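Equations (5.1)–(5.8) translate directly into a few lines of NumPy. This is a sketch under the theorem's assumptions; the reflection correction discussed in [104] is omitted, and the function name is illustrative.

```python
import numpy as np

def weighted_transform(X, Y, w):
    """Weighted least-squares similarity transform T(x) = cRx + t,
    following Eqs. (5.1)-(5.8). X and Y are (n x d) paired point sets
    and w the flow weights. The reflection correction of [104] is
    omitted in this sketch."""
    X, Y, w = np.asarray(X, float), np.asarray(Y, float), np.asarray(w, float)
    W = w.sum()
    mu_x = (w[:, None] * X).sum(0) / W                         # (5.1)
    mu_y = (w[:, None] * Y).sum(0) / W                         # (5.2)
    sx = np.sqrt((w * ((X - mu_x) ** 2).sum(1)).sum() / W)     # (5.3)
    sy = np.sqrt((w * ((Y - mu_y) ** 2).sum(1)).sum() / W)     # (5.4)
    Sxy = np.einsum('k,ki,kj->ij', w, Y - mu_y, X - mu_x) / W  # (5.5)
    U, _, Vt = np.linalg.svd(Sxy)
    R = U @ Vt                                                 # (5.6)
    c = sy / sx                                                # (5.7)
    t = mu_y - c * R @ mu_x                                    # (5.8)
    return c, R, t

# Recover a known similarity transform from weighted pairings.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
w = rng.uniform(0.5, 2.0, size=10)
th = 0.7
R_true = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
Y = 2.0 * X @ R_true.T + np.array([3.0, -1.0])
c, R, t = weighted_transform(X, Y, w)
```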
5.2 The Final Algorithm
Our algorithm for many-to-many matching is a combination of the previous procedures.
Specifically, given two vertex-labeled edge-weighted graphs G1 and G2, we first find low-
distortion embeddings of the graphs into low-dimensional normed spaces, obtaining two
weighted distributions. Depending on which embedding method is used, a dimensionality
reduction technique (such as PCA) is required to bring the embeddings into the same space.
We then “register” one distribution with respect to the other so as to minimize the (original)
EMD between them. Next, we apply the FT iteration of the transformation version of the
EMD framework [19] to minimize the (extended) EMD. The pairing of points minimizing
the EMD corresponds to a weighted many-to-many pairing of nodes. We summarize our
approach in Algorithm 6.
Algorithm 6 Many-to-many graph matching
1: Compute the metric tree T_i corresponding to G_i according to Chapter 3 (see [3] for details).
2: Construct a low-distortion embedding f_i(T_i) of T_i into (ℝ^k, ‖·‖_2) according to one of the algorithms presented in Sections 3.3 and 3.4.
3: Compute the EMD between the f_i's by applying the FT iteration, computing the optimal transformation T according to Chapter 5 (see [56] for details).
4: Interpret the resulting optimal flow between the f_i's as a many-to-many vertex matching between the G_i's.
Complexity Analysis of Algorithm 6
As we showed in Section 3.2, computing the metric tree T_i for a given graph G_i takes
O(|V|²). The complexity of Step 2 depends on which embedding algorithm is used. While
this may take O(|V| · |E|) using graph-dependent dimensionality, it can also be done in
linear time through spherical coding. Since computing the EMD is based on the
transportation problem, it can be solved using a network flow algorithm in O(|V|³). The FT
iteration alternates between finding the optimum transformation for a given flow and the
optimum flow for a given transformation. We observed in our experiments that this
procedure converges after five or six iterations. Finally, Step 4, the mapping of the EMD solution
back to the graph solution, is O(|V|). The overall complexity of the algorithm is therefore
O(|V|³). Note that the total running time can be further improved by using efficient
algorithms for the transportation problem. For example, Atkinson and Vaidya [5] presented an
O(n^{2.5} log n log W) algorithm for solving the transportation problem, where W is the
magnitude of the largest supply or demand in the EMD formulation and n is the total number
of nodes in G_1 and G_2.
5.3 Conclusions
After reducing the many-to-many feature matching problem to a point matching
problem, we use an existing framework, the Earth Mover's Distance, to find many-to-many
point correspondences in a vector space. Recall that pairwise distances between points in
the vector space reflect the shortest path distances between their corresponding nodes in
the original graphs. As mentioned in the previous chapters, we create local histograms
to encode directed relations in the original graphs. Given a point in the first embedding,
we use its local histograms and geometric coordinates in the EMD approach to locate its
corresponding point(s) from the second embedding.
During the matching process, it is important to consider the possibility that one point
set may undergo a transformation with respect to the other. To handle this, we use the EMD
under transformation to further minimize the objective function of the EMD formulation.
An important property of the EMD approach is that it subsumes many histogram dis-
tances and permits partial matches. This property is particularly useful when the total
weights (masses) of two distributions are not equal. In the experimental sections of the
thesis, we will use this property to locate a query object in a scene.
To experimentally verify that the many-to-many point correspondences reflect mean-
ingful many-to-many feature (node) correspondences in the original graphs and also to
show that the similarity score between point sets can be used as the similarity of input
graphs, we will perform a set of recognition and matching experiments on different do-
mains in the following chapters.
6. View-Based 3-D Object Recognition
To demonstrate the effectiveness of our many-to-many matching framework, we apply
it to the problem of view-based 3-D object recognition using two different graph-based
shape representations: silhouettes and ridge-and-blob decomposition graphs. In addition,
we compare our matching results in this chapter to two leading graph matching algorithms: a one-to-one
matching algorithm proposed by Pelillo et al. [76] (using association graphs) and a many-
to-many matching algorithm proposed by Sebastian et al. [87] (using graph-edit distance).
6.1 Many-to-Many Matching using Silhouettes
We first turn to the domain of view-based object recognition using silhouettes. For a
given view, an object’s silhouette is first represented by an undirected, rooted, weighted
graph, in which nodes represent shocks [99] (or, equivalently, skeleton points) and edges
connect adjacent shock points. Note that this representation is closely related to Siddiqi et
al.’s shock graph [99], except that our nodes (shock points) are neither clustered nor are
our edges directed. We will assume that each point p on the discrete skeleton is labeled
by a 4-dimensional vector v(p) = (x, y, r, α), where (x, y) are the Euclidean coordinates of
the point, r is the radius of the maximal bi-tangent circle centered at the point, and α is the
angle between the normal to either bitangent and the linear approximation to the skeleton
curve at the point.¹ This 4-tuple can be thought of as encoding local shape information of
the silhouette.
¹Note that this 4-tuple is slightly different from Siddiqi et al.'s shock point 4-tuple, where the latter's radius is assumed normal to the axis.
Skeletons with many points lead to graphs with many nodes. To reduce the size of the
graph, we first subdivide the skeleton into a number of small fragments of approximately 5
shock points each. Since the fragments are small, we can compute well-defined vector (4-
tuple) averages over the fragments. These averages become the labels of the corresponding
graph nodes. We define the distance between two nodes as the Euclidean distance between
their vector labels. For those pairs of nodes that correspond to adjacent skeleton fragments,
we define an edge whose weight is defined by the Euclidean distance between the pair. We
should mention here that the fragment size was chosen arbitrarily, and we expect that other
choices of similar magnitudes will work equally well.
To convert our shock graphs to shock trees, we compute the minimum spanning tree
of the weighted shock graph. Since the edges of the shock graph are weighted by the
Euclidean distances between corresponding nodes, the minimum spanning tree generates a
suitable tree approximation of the shock graph. The root of the tree is the node that minimizes
the sum of distances to all other nodes. Finally, each node is weighted proportionally to
its average radius, with the total tree weight being 1. An illustration of the procedure was
given in Figure 1.5 (it is shown again in Figure 6.1). The left portion shows the initial
silhouette and its shock points (skeleton). The right portion depicts the constructed shock
tree. Darker, heavier nodes correspond to fragments whose average radii are larger.
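The tree construction described above can be sketched with SciPy's graph utilities; the fragment labels, adjacency list, and function name below are illustrative, and the fragment-averaging step that produces the 4-tuples is assumed to have been done already.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

def shock_tree(labels, adjacency):
    """Sketch of the shock-tree construction: each fragment carries an
    averaged (x, y, r, alpha) label, adjacent fragments are joined by
    edges weighted with the Euclidean distance between their labels,
    the MST is taken, the root minimizes the sum of distances to all
    nodes, and node weights are proportional to the average radius r
    (total weight 1)."""
    labels = np.asarray(labels, float)
    n = len(labels)
    W = np.zeros((n, n))
    for i, j in adjacency:
        W[i, j] = W[j, i] = np.linalg.norm(labels[i] - labels[j])
    mst = minimum_spanning_tree(W)
    dist = shortest_path(mst, directed=False)
    root = int(np.argmin(dist.sum(axis=1)))
    weights = labels[:, 2] / labels[:, 2].sum()   # radius column
    return mst, root, weights

# Four fragments along a line; the chain itself is the spanning tree,
# and the root falls on one of the heavy middle fragments.
labels = [(0, 0, 1, 0), (1, 0, 2, 0), (2, 0, 2, 0), (3, 0, 1, 0)]
mst, root, wts = shock_tree(labels, [(0, 1), (1, 2), (2, 3)])
```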
We tested our many-to-many matching algorithm on a database of 1620 silhouettes
of 9 objects, with 180 views per object. A representative view of each object is shown
in Figure 6.2. For the experiments, we compute the shock tree representation of every
silhouette, and embed each tree into a normed space with low distortion. This procedure
results in a database of weighted point-sets, each representing an embedded graph.
Figure 6.1: Left: the silhouette and its medial axis. Right: the medial axis tree constructed from the medial axis. Darker nodes reflect larger radii.
Figure 6.2: Sample views of the 9 objects.
To test our approach, we randomly selected 19 equidistant views of each object and
computed distances between these views and each of the remaining database entries (the
distance between a view and itself is always zero). To compute the distance between
objects A and B, we first sum the distances from one view of object A to all views of object
B. After repeating this process for the other views of object A, we compute the average
distance between them. These object distances are summarized in Table 1, Figure 6.3. The
magnitudes of the distances are denoted by shades of gray, with black and white repre-
senting the smallest and largest distance, respectively. Due to symmetry of the resulting
distances, we only included the upper triangle of results. Intra-object distances, shown
along the main diagonal, are very close to zero. According to the table, inter-object
distances were near intra-object distances in only 3 out of 36 cases (BINOCULAR and CLOCK,
CAMERA and PHONE, and CAR and TEAPOT).
To better understand the differences in the recognition rates for different objects, we
have selected a subset of the matching results among the 4 views of TEAPOT, taken at
20°, 30°, 60°, and 90°, respectively, as shown in Table 2. Due to the highly symmetric
structure of the object, implying that neighboring views are more likely to be similar, the
distance between a view of TEAPOT and its neighboring view is smaller than its distance
to other objects' views. Conversely, Table 3 illustrates the fact that due to a low view
sampling resolution, certain views of certain objects are more similar to certain views of
other objects than they are to neighboring views of the same object. For example, the best
(non-identical) match for the third view of CUP is the first view of PHONE. Upon closer
inspection of these two degenerate views, it turns out that there is considerable similarity
in their shock tree representations. On the other hand, the first two views of CUP have been
optimally matched to each other, along with the last two views of PHONE.
Figure 6.4 illustrates the many-to-many correspondences that our matching algorithm
yields for two adjacent views (30° and 40°) of the TEAPOT. Corresponding clusters (many-
to-many mappings) have been shaded with the same color. Note that the extraneous branch
in the left view was not matched in the right view, reflecting the method’s ability to deal
with noise. More examples showing that the many-to-many feature matching results in an
intuitive pairing of shock segments are presented in Figure 6.5.
Based on the overall matching statistics, we observed that in 5.74% of the experiments,
the closest match selected by our algorithm was not a neighboring view of the correct ob-
ject. We expect that with increased view sampling resolution, ensuring that for each object
75
Figure 6.3: Summary of many-to-many matchings of object silhouettes. Every entry of Ta-ble 1 corresponds to a set of 19 � 19 matching results between the views of the two objectsassociated with the row and the column. The shade of gray in each cell denotes averagematching distance of each 19 � 19 block, with black and white representing smallest andlargest distances, respectively. Table 2 shows a close up look at the matching results forfour views of TEAPOT. Table 3 depicts a subset of results from three seperate blocks.
Figure 6.4: Illustration of the many-to-many correspondences computed for two adjacent views of the TEAPOT. Matched point clusters are shaded with the same color.
view there exists a similar neighboring view, this error rate would decrease significantly.
We repeated the experiment using the spherical embedding, resulting in a 4.9% error rate.
This is a clear improvement in performance, at a reduced computational cost.
It should be noted that both the embedding and matching procedures can accommodate
perturbation, such as noise and occlusion. This is due to the fact that the path partitions
for unperturbed portions of the graph are unaffected by perturbation. Moreover, the projec-
tions of unperturbed nodes will also be unaffected by perturbation. Finally, the matching
procedure is an iterative process driven by flow optimization which, in turn, depends only
on local features, whose attributes can act as matching constraints.
Figure 6.5: The result of matching skeleton graphs for some shapes in the Rutgers Tools Database. Same colors indicate corresponding segments. Observe that the correspondence is intuitive in all cases.
To test the sensitivity of the matching algorithm to perturbation of the query, we per-
formed the following experiment for each of the 9 objects. Each view, in turn, was used
as a query (with replacement) and perturbed by deleting a randomly selected connected
subset of the skeleton points whose size was chosen randomly to fall between 5% and 25%
of the total number of skeleton points. If the closest view to the query was the unperturbed
view, matching was scored as correct. Using Matousek's embedding, the average correct
score over the 9 objects was 89%, reflecting the algorithm's stability to missing data, a
form of occlusion. For the spherical embedding, we observed an average correct score of 91.4%.
6.2 Many-to-Many Matching using Ridge-and-Blob Decomposition Graphs
We now turn to the domain of blob graphs, a brief overview of which was presented
in Chapter 4. Let us first return to the example shown in Figure 1.1, where we observed
the need for many-to-many matching. The results of applying our method to these two
images are shown in Figure 6.6, in which many-to-many feature correspondences have
been colored the same. For example, a set of blobs and ridges describing a finger in the left
image is mapped to a set of blobs and ridges on the corresponding finger in the right image.
To provide a more comprehensive evaluation, we tested our framework on two separate
image libraries, the Columbia University COIL-20 (20 objects, 72 views per object) and
the ETH Zurich ETH-80 (8 categories, 10 exemplars per category, 41 views per exemplar).
A representative view of each object is shown in Figure 6.7. For each view, we compute a
multi-scale blob decomposition, using the algorithm described in [95]. Next, we compute
the tree metric corresponding to the complete edge-weighted graph defined on the regions
of the scale-space decomposition of the view. The edge weights are computed as a function
Figure 6.6: Applying our algorithm to the images in Figure 1.1. Many-to-many feature correspondences have been colored the same.
of the distances between the centroids of the regions in the scale-space representation. Fi-
nally, each tree is embedded into a normed space of prescribed dimension. This procedure
results in two databases of weighted point sets, each point set representing an embedded
graph.
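The graph-construction step described above can be sketched as follows. This is an illustrative simplification under stated assumptions: the function name `region_graph` is ours, and we use plain Euclidean distance between centroids as a stand-in for whatever distance function of centroids the thesis actually uses.

```python
import math

def region_graph(centroids):
    """Complete edge-weighted graph on scale-space regions.

    `centroids` maps a region id to its (x, y) centroid. Each pair of
    regions gets an edge whose weight is the Euclidean distance between
    their centroids -- a simple stand-in for the distance function of
    centroids described in the text.
    """
    ids = sorted(centroids)
    return {(a, b): math.dist(centroids[a], centroids[b])
            for i, a in enumerate(ids) for b in ids[i + 1:]}
```

The resulting complete weighted graph is what gets approximated by a tree metric and then embedded into the normed space.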
For the COIL-20 database, we begin by removing 36 (of the 72) representative views
of each object (every other view), and use these removed views as queries to the remaining
view database (the other 36 views for each of the 20 objects). We then compute the distance
between each “query” view and each of the remaining database views, using our proposed
matching algorithm. Ideally, for any given query view i of object j, denoted v(i:j), the matching
algorithm should return either v(i−1:j) or v(i+1:j) as the closest view. We will classify this as a
correct matching. Figure 6.8 presents a subset of the matching experiments for object 9 of
the COIL-20 database, with a correct matching in almost all cases.
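The evaluation protocol above can be summarized in a short sketch. Everything here is our own illustration (the function name and the abstract `distance` callable are assumptions): odd-indexed views act as queries against the even-indexed database, and a trial counts as correct when the nearest database view is an adjacent view of the same object.

```python
def neighbor_view_accuracy(distance, n_views, objects):
    """Fraction of queries whose closest database view is an adjacent view.

    Every other view (odd indices) serves as a query against the database
    of even-indexed views; a match v(i:j) -> v(i-1:j) or v(i+1:j) counts
    as correct. `distance(q, m)` is the matching distance between two
    (object, view) pairs.
    """
    correct = total = 0
    db = [(o, v) for o in objects for v in range(0, n_views, 2)]
    for obj in objects:
        for i in range(1, n_views, 2):  # the removed (query) views
            best = min(db, key=lambda m: distance((obj, i), m))
            # Views live on a circle of the viewing sphere's equator.
            correct += best in {(obj, (i - 1) % n_views),
                                (obj, (i + 1) % n_views)}
            total += 1
    return correct / total
```

With a distance that respects circular view adjacency and penalizes cross-object matches, the score is 1.0; real matching distances, of course, only approximate this ideal.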
Based on the overall matching statistics, we observe that in all but 4.8% of the exper-
iments, the closest match selected by our algorithm was a neighboring view. Moreover,
Figure 6.7: Views of sample objects from the Columbia University Image Library (COIL-20) and the ETH Zurich (ETH-80) Image Set.
among the mismatches, the closest view belonged to the same object in 81.02% of the
cases. In comparison, Matousek’s embedding yielded a 10.74% matching error where,
among the mismatches, the closest view belonged to the same object in 80.0% of the cases.
Figure 6.9 presents the result of this experiment, with darker points representing the closer
matches.
For the ETH-80 database, we chose a subset of 32 objects (4 from each of the 8 cate-
gories) with full sampling (41 views) per object. For each object, we removed each of its
41 views from the database, one view at a time, and used the removed view as a query to
the remaining view database. We then computed the distance between each query view and
each of the remaining database views. The criterion for correct classification was similar
to that of the COIL-20 experiment. Our experiments showed that in all but 6.2% of the experi-
ments, the closest match selected by our algorithm was a neighboring view. Among the
[Matrix of pairwise matching distances between query views (rows) and model views (columns) omitted; see the Figure 6.8 caption.]
Figure 6.8: Sample matching results for object 9 of the COIL-20 database, in which rows and columns can be interleaved to form the set of sequential views. The diagonal and next lower diagonal therefore represent the neighboring views of the query (row). Only one query, entry (10,8), was incorrectly matched.
mismatches, the closest view belonged to the same object in 77.19% of the cases, and the
same category in 96.27% of the cases. For Matousek's embedding, the matching error was
17.5%; among the mismatches, the closest view belonged to the correct object in 67.4% of the cases, and
Figure 6.9: The matching results for the COIL-20 database. The rows represent the query views (36 views per object), and the columns represent model views (36 views per object). Each row represents the matching results for a query view against the whole database. The intensity of entries represents the quality of the matching, with black representing maximum similarity between the views and white minimum similarity.
PERTURBATION              5%      10%     15%     20%
RECOGNITION RATE COIL-20  91.07%  88.13%  83.68%  77.72%
RECOGNITION RATE ETH-80   93.2%   90.1%   86.3%   82.2%
Table 6.1: Recognition rate as a function of increasing perturbation. Note that the baseline recognition rate (with no perturbation) is 98.0% for the COIL-20 and 98.5% for the ETH-80 datasets.
the same category in 81.3% of the cases. The results clearly demonstrate the improved
performance offered by the spherical embedding technique.
To demonstrate the framework’s robustness, we performed four perturbation experi-
ments on the COIL-20 and ETH-80 databases. The experiments are identical to the COIL-
20 and ETH-80 experiments described above, except that the query graph was perturbed
by adding/deleting 5%, 10%, 15%, and 20% of its nodes (and their adjoining edges). The
choice of spherical embedding was motivated by its better performance over that of Ma-
tousek’s embedding. The results are shown in Table 6.1, and reveal that, like our skeleton
tree matching example, the error rates increase gracefully as a function of increased pertur-
bation.
It should be pointed out that both skeleton tree and blob graph experiments can be con-
sidered worst case for two reasons. First, the sampling resolutions of the viewing sphere
were high in each case, meaning that more than the immediate neighbors of a particular
view may be similar to it. Given the high similarity among neighboring views, it could be
argued that our matching criterion is overly harsh, and that perhaps a measure of “view-
point distance”, i.e., “how many views away was the closest match” would be less severe.
In any case, we anticipate that with fewer samples per object, neighboring views would be
more dissimilar, and our matching results would improve. Second, and perhaps more im-
portantly, many of the objects are symmetric, and if a query neighbor has an identical view
elsewhere on the object, that view might be chosen (with equal distance) and scored as an
error. Many of the objects in the database are rotationally symmetric, yielding identical
views from each viewpoint.
6.3 Comparison to Other Approaches
In addition to demonstrating the effectiveness of our many-to-many matching algorithm
applied to shape retrieval, we compare our matching results to two leading graph matching
algorithms: a one-to-one matching algorithm proposed by Pelillo et al. [76] (using asso-
ciation graphs) and a many-to-many matching algorithm proposed by Sebastian et al. [87]
(using graph-edit distance). For the comparison, we use the Rutgers Tool Database [99],
which consists of 25 shapes organized into eight classes: brush, hammer, pliers, screw-
driver, wrench, hand, profile, and horse. Four of these classes, namely, hammer, pliers,
screwdriver, and wrench, can be further grouped into a broader “tools” category. Sam-
ple views from each class are shown in Figure 6.10. In the experiment, we remove the
first shape (the query) from the database and compare it to all remaining database shapes.
The shape is then put back in the database, and the procedure is repeated with the second
database shape, etc., until all 25 shapes have been used as a query. After computing the
similarity values between every database pair, we look at the top matches to see how many
of the within-category shapes belong to the same class as the query. Ideally, if an object
has n shapes in the database, the top n − 1 entries should belong to the same class as the
query.
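The scoring rule just described can be sketched as a short leave-one-out loop. This is our own illustration (the function name, the `labels` dictionary, and the abstract `distance` callable are assumptions, not thesis code): for each query, the remaining shapes are ranked by distance, and we count how many of the top n − 1 entries share the query's class.

```python
def within_class_score(distance, labels):
    """Leave-one-out retrieval score for a labelled shape database.

    `labels` maps a shape id to its class. For each query shape, the
    remaining shapes are ranked by distance; with n shapes in the query's
    class, the top n-1 entries should all share that class. Returns the
    pair (hits, possible) summed over all queries.
    """
    hits = possible = 0
    for q in labels:
        others = sorted((s for s in labels if s != q),
                        key=lambda s: distance(q, s))
        n = sum(1 for s in labels if labels[s] == labels[q])
        hits += sum(1 for s in others[:n - 1] if labels[s] == labels[q])
        possible += n - 1
    return hits, possible
```

A perfect retrieval gives hits == possible; each miss in the top n − 1 of some query counts as one of the errors tallied in the comparison below.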
Our results, along with those reported in [76] and [87], are presented in Figure 6.11,
Figure 6.10: Sample views of objects from the Rutgers Tools Database.
where correct matches retrieved from the database are colored yellow, while the mismatched
entries are colored red. Considering only the best matches, we observe that while in Pelillo
et al.’s shock tree approach there is a total of 3 mismatched entries, both Sebastian et al.’s
graph-edit distance framework and our approach yield only 1 mismatched entry. In addi-
tion, considering all within-category matches, both the shock tree and graph-edit distance
approaches yield a total of 5 errors, while our approach yields only 3 errors. Moreover, if
we further group the hammer, pliers, screwdriver, and wrench shapes into the same “tools”
category, our many-to-many matching approach produces a 100% correct matching, while
the other two approaches still have mismatched entries. One would expect that as the de-
gree to which correspondences are many-to-many increases, both the graph-edit distance
algorithm as well as our algorithm would yield improved scores.
Overall, it is clear from the results that our approach outperforms both the shock tree and
graph-edit distance approaches on the Rutgers Tools Database.
Figure 6.11: Comparison to two leading graph matching algorithms: Pelillo et al. [76] (left), Sebastian et al. [87] (center), and our algorithm (right). In each case, the top seven matched database objects are sorted by their similarity to the query. Correct matches are colored yellow, while mismatched entries are colored red.
6.4 Conclusions
In this chapter we experimentally verified that our matching framework yields mean-
ingful many-to-many feature correspondences between pairs of graphs representing 2D
shapes. The distance between graphs can be used as a dissimilarity measure between orig-
inal shapes. The effectiveness of the approach in the context of shape retieval using two
different recognition domains was demonstrated. Since we presented two embedding al-
gorithms in Chapter 3, we tested our matching framework using each of these embedding
techniques. In both domains, the experimental results clearly demonstrate the improved
performance offered by the spherical embedding technique over that of Matousek. This
can be attributed to the fact that embedded graph nodes can be directly matched in the
target space without the need for a dimensionality reduction process. As mentioned in
Chapter 3, an important trade-off exists between the distortion of the spherical embedding
and the dimension of the target space. The value of the dimension affects the distortion of
the embedding as well as the recognition rate. However, we should note that the higher the
dimensionality of the target space, the longer it takes the EMD to solve the transportation
problem. For practical purposes, we set the value of the dimension to 40.
We also tested the robustness of the framework by a set of perturbation experiments
in which the query graph was perturbed by adding/deleting 5%, 10%, 15%, and 20% of
its nodes and adjacent edges. According to the results, error rates increase gracefully as a
function of increased perturbation, which, in turn, demonstrates the framework's ability to
accommodate perturbation.
In addition to the recognition tests, we also performed a set of pose estimation experi-
ments, where the objective was to retrieve one of the neighboring views of the query. The
results show that for a given query, in more than 93% of the experiments, the algorithm
selects a correct neighboring view.
After demonstrating the effectiveness of our many-to-many matching algorithm applied
to shape retrieval, we will extend our framework to work with different feature extraction
algorithms and graph types. Here, our objective is to show the matching potential of our
framework in two different domains: face recognition and 3D object retrieval. We will
also compare our results to some existing approaches presented for these domains in the
experimental sections of the following chapters.
7. Face Recognition Experiments
In this chapter we evaluate our framework on a set of face recognition experiments.
We first begin by introducing a new feature extraction process and graph types. We then
show how we apply our matching framework to compute the similarity between pairs of
graphs of the new type. At the end of the chapter, we present the recognition performance
of our algorithm on a face database of 20 people with 10 faces per person for a total of
200 images. We also examine the stability of both the graph construction and matching
approaches in the experimental section.
7.1 Discrete Representation of Top Points via Scale Space Tessellation
Top points (singular points in the scale space representation of generic images) have
proven to be valuable sparse image descriptors that can be used for image reconstruction
[54, 73] and image matching [55, 78]. In this section, we take
an unstructured set of top points and impose a neighborhood structure on them. Inspired
by the work of Lifshitz and Pizer [59], we will encode the scale space structure of a set
of top points in a directed acyclic graph (DAG). Specifically, we combine the position-
based grouping of the top points provided by a Delaunay triangulation with the scale space
ordering of the top points to yield a directed acyclic graph. This new representation allows
us to utilize powerful graph matching algorithms to compare images represented in terms
of top point configurations, rather than using point matching algorithms to compare sets of
isolated top points. Specifically, we draw on our work in many-to-many graph matching
which reduces the matching problem to that of computing a distribution-based distance
measure between embeddings of labeled graphs.
We describe our construction by first elaborating on those basics of catastrophe theory
required to introduce the concept of a top point. Next, we formally define a top point,
and introduce a measure for its stability that will be later utilized in the matching algo-
rithm. Section 7.3 describes the construction of the DAG through a Delaunay triangulation
scheme. The details of this construction process can be found in [77].
7.2 Catastrophe Theory
Critical points are points, at any fixed scale, at which the gradient vanishes (∇u = 0).
The study of how these critical points change as certain control parameters change is called
catastrophe theory. A Morse critical point will move along a critical path when a control
parameter is continuously varied. In principle, the single control parameter in the models
of this work can be identified as the scale of the blurring filter. The only generic catastrophes
in Gaussian scale space are creations and annihilations of pairs of Morse hypersaddles
of opposite Hessian signature1 [26, 32]. An example of this is given in Figure 7.1.
The points at which creation and annihilation events take place are often referred to
as top points2. A top point is a critical point at which the determinant of the Hessian
degenerates:

    ∇u = 0,    det(H) = 0.        (7.1)
1 The Hessian signature is the sign of the determinant evaluated at the location of the critical point.
2 The terminology is reminiscent of the 1D case, in which only annihilations occur generically.
Figure 7.1: The generic catastrophes in isotropic scale space. Left: an annihilation event. Right: a creation event. A positive charge (+) denotes an extremum, a negative charge (−) denotes a saddle, and 0 indicates the singular point.
An easy way to find these top points is by means of zero-crossings in scale space. This
involves derivatives up to second order and yields sub-pixel results. Other, more elaborate
methods, can be used to find or refine the top point positions. For details, the reader is
referred to [32].
It is obvious that the positions of extrema at very fine scales are sensitive to noise. This,
in most cases, is not a problem. Most of these extrema are blurred away at coarse scales
and won’t affect our matching scheme. However, problems do arise in areas in the image
that consist of almost constant intensity (genericity implies that flat plateaus do not occur
in the image). One can imagine that the positions of the extrema (and thus the critical paths
and top points) are very sensitive to small perturbations in these areas. These unstable
critical paths and top points can continue up to very high scales since there is no structure
in the vicinity to interact with. To account for these unstable top points, we need to have a
measure of stability, so that we can either give unstable points a low weight in our matching
scheme, or disregard them completely.
7.3 Construction of the Graph
The goal of our construction is two-fold. First, we want to encode the neighborhood
structure of a set of points, explicitly relating nearby points to each other in a way that
is invariant to minor perturbations in point location. Moreover, when local neighborhood
structure does indeed change, it is essential that such changes not affect the encoded struc-
ture elsewhere in the graph (image). The Delaunay triangulation imposes a position-based
neighborhood structure with exactly these properties [79]. It represents a triangulation of
the points which is equivalent to the nerve of the cells in a Voronoi tessellation, i.e., that
triangulation of the convex hull of the points in the diagram in which every circumcircle
of a triangle is an empty circle [74]. The edge set of our resulting graph will be based on
the edges of the triangulation. Our second goal is to capture the scale space ordering of
the points to yield a directed acyclic graph, with coarser scale top points directed to nearby
finer scale top points.
A summary of this procedure is presented in Algorithm 7, and it is illustrated for a
simple image in Fig. 7.2. In the top two frames in the left figure, we show the transition
in the triangulation from v2 (point 2) to v3 (point 3); the root is shown as point 1. In the
upper right frame, the triangulation consists of three edges; correspondingly, G has three
edges: (1, 2), (1, 3), (2, 3), where (x, y) denotes an edge directed from node x to node y. In
the lower left figure, point 4 is added to the triangulation, and the triangulation recomputed;
correspondingly, we add edges (1, 4), (2, 4), (3, 4) to G (note that (1, 2) is no longer in the
triangulation, but remains in G). Finally, in the lower right frame, point 5 is added, and
Figure 7.2: Visualization of the DAG construction algorithm. Left: the Delaunay triangulations at the scales of the nodes. Right: the resulting DAG (edge directions not shown).
the triangulation recomputed. The new edges in the triangulation yield new edges in G:
(2,5),(4,5),(1,5). The right side of Figure 7.2 illustrates the resulting graph (note that the
directions of the edges are not shown). In Figure 7.3 the right image shows the result of
applying this construction to the left image.
7.4 Experimental Results
We conduct our experiments using a subset of the Olivetti Research Laboratory face
database. The database consists of faces of 20 people with 10 faces per person, for a total
of 200 images; each image in the database is 112 × 92 pixels. The face images are in
frontal view and differ by various factors such as gender, facial expression, hair style, and
presence or absence of glasses. A representative view of each face and all 10 face images of
one person from the database are shown in Figure 7.4 and Figure 7.5, respectively. Our goal
is to evaluate our proposed many-to-many matching framework on a set of face recognition
Figure 7.3: The right image shows the DAG obtained from applying Algorithm 7 to the critical paths and top points of the face in the left image.
Figure 7.4: Sample faces from 20 people.
experiments, where the objective is to select a correct face image belonging to the same
person as the query.
Fig. 7.6 presents an overview of the approach for these experiments. For a given face,
we first create its DAG according to Section 7.3 (Transition 1), and embed each vertex of
the DAG into a vector space of prescribed dimensionality using a deterministic spherical
coding (Transition 2). The choice of spherical coding is motivated by its better performance
over that of Matousek. (See Chapter 6). Finally (Transition 3), we compute the distance
Algorithm 7 Top point graph construction procedure
1: Detect the critical paths.
2: Extract the top points from the critical paths.
3: Label the extremum path continuing up to infinity as v1.
4: Label the rest of the nodes (critical paths, together with their top points) according to the scale of their top points, from high scale to low, as v2, . . . , vn.
5: For i = 2 to n, evaluate node vi:
6: Project the previous extrema into the scale of the considered node vi.
7: Calculate the 2D Delaunay triangulation of all the extrema at that scale.
8: All connections to vi in the Delaunay triangulation are stored as directed edges in G.
between the two distributions by the modified Earth Mover’s Distance under transforma-
tion. The dimension of the target space in Transition 2 has a direct effect on the quality of
the embedding. Specifically, as the dimensionality of the target space increases, the quality
of the embedding will improve. As mentioned in Chapter 3, there exists an asymptotic
bound beyond which increasing the dimensionality will no longer improve the quality of
the embedding.
For the experiments, we first group the faces in the database by individual; these will
represent our categories. Next, we remove the first image (face) from each group and
compare it (the query) to all remaining database images. The image is then put back in
Figure 7.5: Ten face images of one person from the database.
Figure 7.6: Computing the similarity between two given faces. (Matched point clusters are shaded with the same color.) See text.
the database, and the procedure is repeated with the second image from each group, etc.,
until all 10 face images of each of the 20 individuals have been used as a query. We say
the matching is correct if a query from one individual matches closest to another image
from the same individual, rather than an image from another individual. The results are
summarized in Table 1, Fig. 7.7. The magnitudes of the distances are denoted by shades
of gray, with black and white representing the smallest and largest distances, respectively.
Due to symmetry, only the lower half of the distance matrix is presented. Intra-object
distances, shown along the main diagonal, are very close to zero.
To better understand the differences in the recognition rates for different people, we
randomly selected a subset of the matching results among three people in the database,
as shown in Table 2, Fig. 7.7. Here, the (i, j)-th entry shows the actual distance between
face i and face j. It is important to note that the distance between two faces of the same
person is smaller than that of different people, as is the case for all query faces. In our
Figure 7.7: Table 1: Matching results of 20 people. The rows represent the queries and the columns represent the database faces (query and database sets are non-intersecting). Each row represents the matching results for the set of 10 query faces corresponding to a single individual matched against the entire database. The intensity of the table entries indicates matching results, with black representing maximum similarity between two faces and white representing minimum similarity. Table 2: Subset of the matching results with the pairwise distances shown. Table 3: Effect of the presence or absence of glasses on the matching for the same person.
experiments, one of our objectives was to see how various factors, such as the presence or
absence of glasses, affect the matching results for a single person. Accordingly, we took
a set of images from the database of one person, half with the same factor, and computed
the distances between each image pair. Our results show that images with the same factors
are more similar to each other than to others. Table 3 of Fig. 7.7 presents a subset of our
results. As can be seen from the table, images of the same person wearing glasses are more
similar to each other than to images of the same person without glasses. Still, in terms of categorical
matching, the closest face always belongs to the same person.
We also examine the stability of the proposed matching framework under additive Gaus-
sian noise at different signal levels applied to the original face images. For this experiment,
the database consists of the original 200 unperturbed images, while the query set consists
of noise-perturbed versions of the database images. Specifically, for each of the 200 im-
ages in the database, we create a set of query images by adding 1%, 2%, 4%, 8%, and
16% Gaussian noise. Figure 7.8 shows how an image looks after adding Gaussian noise
at different signal levels. Next, we compute the similarity between each query (perturbed
database image) and each image in the database, and score the trial as correct if its dis-
tance to the face from which it was perturbed is minimal across all database images. This
amounts to 40,000 similarity measurements for each noise level, for a total of 200,000 sim-
ilarity measurements. Our results show that the recognition rate decreases to 96.5%,
93%, 87%, 83.5%, and 74% for 1%, 2%, 4%, 8%, and 16% Gaussian noise, respectively
(see Table 7.1).
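The noise-injection step can be sketched as follows. The thesis does not spell out its exact noise convention, so this is an assumption made explicit in the code: we take "p% Gaussian noise" to mean zero-mean additive noise whose standard deviation is p% of the image's own intensity standard deviation. The function name is ours.

```python
import random

def add_gaussian_noise(image, percent, rng=None):
    """Perturb an image (list of pixel rows) with additive Gaussian noise.

    ASSUMPTION: "p% Gaussian noise" is interpreted as zero-mean noise with
    standard deviation equal to p% of the image's intensity standard
    deviation; the thesis does not state its precise convention.
    """
    rng = rng or random.Random()
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    std = (sum((v - mean) ** 2 for v in flat) / len(flat)) ** 0.5
    sigma = std * percent / 100.0
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]
```

Each of the 200 database images would be perturbed at the five noise levels to produce the query sets, and a trial is scored as correct when the perturbed query is closest to its own unperturbed original.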
Figure 7.8: Sample face image after adding Gaussian noise at different signal levels. Part (a) shows the original image. Parts (b), (c), (d), (e), and (f) show how the image looks after adding 1%, 2%, 4%, 8%, and 16% Gaussian noise, respectively.
GAUSSIAN NOISE    1%     2%     4%     8%     16%
RECOGNITION RATE  96.5%  93.0%  87.0%  83.5%  74.0%
Table 7.1: Recognition rate as a function of Gaussian noise at different signal levels.
7.5 Conclusions
In this chapter we first presented a method for imposing neighborhood structure on a
set of scale space top points. Drawing on the Delaunay triangulation of a set of points,
we generated a directed acyclic graph (DAG) whose edges were directed from top points at
coarser scales to nearby top points at finer scales. We then applied our matching framework
on the resulting DAGs to compute similarities between them. The approach was used in
face recognition for a database which consists of faces of 20 people with 10 faces per
person, for a total of 200 images. We computed an average similarity score between each
pair of people. Our experimental results show that the similarity score between a person
from the database and himself/herself is always greater than the similarity with any other
person. In other words, using average pairwise similarity values, our algorithm achieved 100%
accuracy. One of our objectives in the experiments was to see how various factors, such
as the presence or absence of glasses, affect the matching results for a single person. Our
results showed that images with the same factors are more similar to each other than to
others. We also studied the stability of the overall recognition approach with respect to
additive Gaussian noise at different signal levels. Generally, the matching scores indicate
the robustness of the framework against increasing levels of noise.
Overall, the experimental results demonstrate the performance of our matching frame-
work in a face recognition domain using singular points in the scale space representation
of generic images (top points). In the next chapter we will adapt our matching approach to
work with a different shape representation in a different recognition domain. Specifically,
we will use our matching algorithm to retrieve 3D volumetric objects using their skeletal
representations in a database of 1081 objects.
8. 3D Object Retrieval using Many-to-Many Matching of Curve Skeletons
In this chapter, we will adapt our many-to-many matching framework to 3D object re-
trieval. The objects used in this work are volumetric and are represented as 3D skeletons.
We demonstrate the performance of the approach on a large database of 3D objects con-
taining more than 1000 exemplars. The method is especially suited to matching objects
with distinct part structure and is invariant to part articulation. Skeletal matching has an
intuitive quality that helps in defining the search and visualizing the results. In particular,
the matching algorithm produces a direct correspondence between two skeletons and their
parts, which can be used for registration and juxtaposition.
One important contribution of this study is to demonstrate the ability of our matching frame-
work to perform part matching. More specifically, our goal is to match a part within a complex
whole in 3-dimensional space. This type of matching is particularly useful for CAD-type
databases and also for recognition in laser-scanned images, which tend to cluster objects
together. It is also central to medical applications in which a particular biological configu-
ration is to be found somewhere in a larger object such as an organ.
8.1 Introduction
3D object models are now widespread and are used in many diverse applications, such
as computer graphics, scientific visualization, CAD, computer vision, medical imaging,
etc. Large databases of 3D models are publicly available, such as the Princeton Shape
Benchmark Database [94] or the 3D Cafe repository [1], with datasets contributed by the
CAD community, computer graphics artists, and the scientific visualization community. Such
models include both polygonal representations (CAD objects, computer graphics imagery)
and volumetric data (medical images, scientific visualization datasets). The problem of
searching for a specific shape in a large database of 3D models is an important area of
research. Text descriptors associated with the 3D shapes can be used to drive the search
process, as is done for 2D images [40,80]. However, text descriptions may not be available and, furthermore, cannot be used for part matching or similarity-based matching.
Matching 3D objects is a difficult problem, with a complex relation to the 2D shape-
matching problem. While the 3D nature of the representation helps to remove some of
the viewpoint, lighting, and occlusion problems in computer vision, other issues arise. Of
course, the added dimension and the inherent increase in data size make the matching pro-
cess more computationally expensive. Furthermore, many of the models are degenerate,
containing holes, intersecting polygons, overly thin regions, etc. Moreover, there are many different types of matching that may be desirable. Given a query object, one may want to
search an entire database for a matching exemplar, if one exists. On the other hand, if the
database contains categorical models, one may want to find the category to which the query
exemplar belongs.
In this chapter we use the skeleton of a 3D shape for matching. The skeleton used here
is a stick-like simplification of the 3D object, which preserves the main topological features of the original object and provides information about the local structure in the form of the distance between each skeleton point and the object surface. It is an intuitive shape
representation, which captures the notion of parts or components of an object. This allows
the user to understand the nature of the match and to influence the matching process by emphasizing or de-emphasizing certain features of the object. We demonstrate the efficiency
of the matching framework on a database of about 1100 examples. While the performance
of our algorithm is comparable to that of other existing 3D matching methods (e.g., [75,94]),
the locality of our skeletal representation and matching algorithm has some other benefits,
such as enabling part matching and articulated matching.
8.2 Approach
The main steps of the skeleton matching process are as follows. First, we determine the
curve skeleton of the object. An overview of this step is presented in Section 8.2.1. Next,
we match the exemplar skeleton against all other skeletons in the database. Finally, we rank the results and visualize the best match. Details of the approach can be found in [22].
A skeleton is a useful shape abstraction that captures the essential topology of an ob-
ject in both two and three dimensions. It provides the following characteristics that are not
present in global shape descriptors.
Part/Component Matching: In contrast to a global shape measure, skeleton matching can
accommodate part matching, where the object to be matched is part of a larger object, or
vice versa. This feature can potentially give the user more control over the matching algo-
rithm, allowing them to specify what part of the object they would like to match or whether
the matching algorithm should weight one part of the object more than the rest.
Registration and visualization: The skeleton can be used to register the two matched objects
and visualize the result in a common space. This is very important in scientific applications
where one is interested in both finding a similar object and understanding the extent of the
similarity [101].
Figure 8.1: Some examples of 3D shapes and their computed skeletons.
Intuitiveness: The skeleton is an intuitive representation of shape and can be easily under-
stood by the user, providing more control in the matching process.
Articulated transformation invariance: The method presented here can be used for artic-
ulated object matching, because the skeleton topology does not change within limits as a
result of articulated motion. An example was shown in [101]. Note that most global shape
descriptors cannot accommodate such changes in object configuration.
8.2.1 The Curve-Skeleton
We utilize a curve skeleton for the matching. The curve skeleton is a concise represen-
tation of the object which is easy to understand and is used in many CAD and Computer
Graphics modeling programs. The curve skeleton is not unique in 3D, and its exact form depends on the application for which it is used. A full description and explanation
can be found in [21].
Our curve-skeleton extraction algorithm works on a volumetric representation of the 3D
object. It is based on the method presented by Chuang et al. [18], which uses a generalized
Newtonian potential field generated by charges placed on the surface of the object to extract
a 1D curve-skeleton from a 3D shape. The generalized potential at a point due to a nearby
point charge is defined as a repulsive force, pushing the point away from the charge with a
strength that is inversely proportional to some power of the distance between the point and
the charge. This step produces a vector field.
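A minimal sketch of this charge-based field, assuming unit charges and an inverse-power law with exponent m; the function name and parameter choices are illustrative, not the authors' implementation:

```python
import math

def potential_force(p, charges, m=2):
    """Force at interior point p due to unit point charges on the object
    surface: each charge repels p along the line joining them, with a
    strength inversely proportional to the m-th power of the distance.
    (Illustrative simplification of the field of Chuang et al. [18].)"""
    fx = fy = fz = 0.0
    for c in charges:
        dx, dy, dz = p[0] - c[0], p[1] - c[1], p[2] - c[2]
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        w = 1.0 / r ** (m + 1)  # unit direction (d/r) scaled by 1/r^m
        fx += w * dx
        fy += w * dy
        fz += w * dz
    return (fx, fy, fz)

# Evaluating the force at every interior voxel yields the 3D vector field.
```

A point equidistant from two charges feels equal and opposite pulls along their axis, so those components cancel and the net force pushes the point away from both charges, toward the medial region.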
Given a 3D vector field, we use concepts from vector field visualization to identify two
types of seed points that we will use to construct a curve-skeleton: critical points and high
divergence points. At critical points, the magnitude of the vector vanishes, which is why
they are also called zeros of the vector field. A full discussion of the visualization of vector-
field topology and the different types of critical points can be found in [36] and [46]. In
addition to critical points, we also use the divergence of the vector field to select new seed
points. We compute the divergence at each voxel inside the object and the user specifies
the percentage of the highest divergence points that will be used as seeds [21]. By varying
this parameter, one can generate an entire hierarchy of skeletons of various complexities
and select the best one for a given application. In the experiments presented in Section 8.3,
we used 40% of the highest divergence points as seeds for all our skeletons.
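The seed-selection step can be sketched as follows, assuming the field is sampled on an integer voxel grid; names and the central-difference scheme are our own illustrative choices:

```python
def divergence(field, x, y, z):
    """Central-difference divergence of a grid-sampled vector field,
    where field[x][y][z] = (fx, fy, fz) at integer voxel coordinates."""
    ddx = (field[x + 1][y][z][0] - field[x - 1][y][z][0]) / 2.0
    ddy = (field[x][y + 1][z][1] - field[x][y - 1][z][1]) / 2.0
    ddz = (field[x][y][z + 1][2] - field[x][y][z - 1][2]) / 2.0
    return ddx + ddy + ddz

def pick_seeds(div_by_voxel, fraction=0.4):
    """Keep the given fraction of voxels ranked by divergence value,
    mirroring the 40% threshold used in the experiments."""
    ranked = sorted(div_by_voxel, key=lambda item: item[1], reverse=True)
    keep = max(1, int(fraction * len(ranked)))
    return [voxel for voxel, _ in ranked[:keep]]
```

For the radial field F(x, y, z) = (x, y, z), whose divergence is 3 everywhere, the central differences recover the exact value.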
Skeleton segments are discovered using a force-following algorithm on the underlying
vector field, starting at each of the identified seed points. The force following process
evaluates the vector (force) value at the current point in the vector field and moves in the
direction of the vector with a small pre-defined step. For more details of this procedure,
see [21]. Figure 8.1 shows a few examples of 3D objects and their respective skeletons.
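The force-following step described above can be sketched as below; the step size, termination test, and function names are assumptions for illustration rather than the parameters of [21]:

```python
import math

def follow_force(seed, force_at, step=0.1, max_steps=1000, eps=1e-6):
    """Trace a skeleton segment from a seed: repeatedly evaluate the force
    at the current point and advance a small, fixed step in its direction,
    stopping when the field (nearly) vanishes, i.e. near a critical point."""
    path = [seed]
    p = seed
    for _ in range(max_steps):
        fx, fy, fz = force_at(p)
        mag = math.sqrt(fx * fx + fy * fy + fz * fz)
        if mag < eps:
            break
        p = (p[0] + step * fx / mag,
             p[1] + step * fy / mag,
             p[2] + step * fz / mag)
        path.append(p)
    return path
```

For a toy field that points toward the origin, the traced path marches from the seed straight to the origin and stops at the critical point there.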
The algorithm starts by computing the generalized potential function at each object voxel, producing a 3D vector field. Next, the critical points and high-divergence points of the vector field are detected as seeds for skeleton segments. Finally, the curve-skeleton is extracted using the force-following algorithm initiated at every seed point.
Figure 8.2: Computing similarity between two given objects.
The skeleton obtained using the above algorithm consists of a set of points sampled
by the force following algorithm. Each skeleton point is then equipped with a distance-
transform value [33], a real number specifying the distance to the closest point on the
surface of the object. This additional information is used by the many-to-many matching
process.
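A brute-force stand-in for attaching these values (the real implementation uses the volumetric distance transform of [33]; `math.dist` requires Python 3.8+):

```python
import math

def attach_distances(skeleton_points, surface_points):
    """Pair each skeleton point with the distance to its nearest surface
    point, i.e. the local object 'thickness' used by the matcher."""
    return [(p, min(math.dist(p, s) for s in surface_points))
            for p in skeleton_points]
```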
Figure 8.2 shows an example of matching between two objects: in step 1, the curve-
skeleton for each object is computed while in step 2, the many-to-many matching estab-
lishes the distance and the correspondence between the two skeletal representations. The
skeleton regions that were matched to each other are shown in the same color in Figure
8.2.
8.3 Experimental Results
To evaluate the utility of our skeletal representation and many-to-many matching algorithm, we performed two sets of experiments: 3D base classification and part matching.
8.3.1 Base Classification and Object Retrieval
We first tested our proposed approach to retrieving similar objects on a subset of 1081
objects from the Princeton Shape Benchmark Database [80], grouped into 99 non-empty
classes from both the test and train classifications [94]. In our experiments, we first created
3D skeletons for each object, using 40% of the highest divergence points as seeds for all
our skeletons. We then computed the distance from each object to the remaining database
entries using our many-to-many matching algorithm. If the conceptual classes correspond
to bodies which vary only in scale or by articulated transformation, our algorithm should return an object that belongs to the same class as the query; we classify such a result as a correct match. Based on the overall matching statistics, we observe that in 71.1% of
the experiments, the overall best match selected by our algorithm belonged to the same
class as the query (also known as the nearest neighbor criterion [94]). In 74.3% of the experiments, the best match belonged to the same parent class as that of the query.
In a second experiment, we asked how many of the models in the query’s class appear within the top T − 1 matches, where T is the size of the query’s class (first tier [94]). This number was 17.2%. Repeating the same experiment but considering the top 2(T − 1) matches (second tier [94]) covers 22.7% of the members of the class.
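The nearest neighbor and tier criteria can be computed directly from a ranked result list; a sketch with our own (hypothetical) names, following the definitions of [94]:

```python
def retrieval_scores(ranked, query_class, class_of, T):
    """Nearest neighbor: is the single best match in the query's class?
    First tier: fraction of the T-1 other class members found in the top
    T-1 results; second tier: the same fraction within the top 2(T-1).
    `ranked` lists database entries by distance, excluding the query."""
    in_class = [class_of[m] == query_class for m in ranked]
    nn = in_class[0]
    first = sum(in_class[: T - 1]) / (T - 1)
    second = sum(in_class[: 2 * (T - 1)]) / (T - 1)
    return nn, first, second
```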
Comparing these results with those reported by Shilane et al. [94] in Table 4 of their work, our method outperforms all of the methods they evaluate on the nearest neighbor criterion, but does not do as well on the first and second tier criteria. This is evident in the
precision-recall plot in Figure 8.3. The precision-recall plot shows the relation between
recall (the ratio of models from the class of the query returned within the top N matches)
and the precision (the ratio of the top N matches that belong to the query class) [94]. Figure
8.3 shows the precision-recall plot averaged over all models, considering only the first 20 best matches.
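The curve itself is straightforward to compute from a ranked list; a sketch with our own names:

```python
def precision_recall_curve(ranked, relevant, n_top=20):
    """After each of the first n_top matches, record (recall, precision):
    recall is the fraction of the query's class retrieved so far, and
    precision the fraction of retrieved matches belonging to the class."""
    hits, curve = 0, []
    for i, m in enumerate(ranked[:n_top], start=1):
        hits += m in relevant
        curve.append((hits / len(relevant), hits / i))
    return curve
```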
In Figure 8.4, we present the matching results for a small subset of objects.
The first column of each row shows the query object; the remaining elements of each row
represent the top 10 closest objects of the database determined by our matching algorithm.
Observe that in most cases, the closest object is an object from a similar class. In some
cases, while the algorithm has identified an object with similar structure as the best match,
it was still penalized for selecting an object from an incorrect category. The query object
(race car) in row one and its best-matched object are an example of such a case. Such mismatches can be attributed to the hierarchy of categories used by the Princeton Shape
Benchmark Database [80]. When similarity of shape is desired, a method which relies
on shape would help retrieve objects not normally associated with the exemplar and not
typically categorized with it.
8.3.2 Part Matching
Matching of a part within a complex whole is useful for CAD-type databases and also
for recognition in laser-scanned images, which tend to cluster objects together. It is also
central to medical applications in which a particular biological configuration is to be found
somewhere in a larger object such as an organ. Specifically, given a part of an object as a
query, one attempts to locate objects containing similar subparts. Here, the difficulty lies
in the fact that none of the database objects contains an exact copy of the query.
An important aspect of the part matching approach is the computation of correspon-
dence between the matched objects. Our many-to-many matching algorithm provides a
direct correspondence between the skeleton points of the query object and the skeleton
points in the matched objects. This allows one to register the query part into the composite
object. Global shape descriptors perform poorly at this task because global information
cannot preserve local correspondences.
In our next experiment, we used a query part (a torso) and matched it against several
simple objects in the database, some containing the query part. Aside from the simple
objects in the database, we created a number of composite objects by applying a union operation to pairs of simple objects – the kind of composition one would expect
to encounter in laser-scanned scenes. The query objects and some of the database objects
together with distance values computed by our matching algorithm are shown in Figure 8.5.
For every database object, we also show the parts it has in correspondence with the query object in Figure 8.6.
8.4 Conclusions
In this chapter we applied our matching framework to 3D object retrieval using skeletal
representations of volumetric objects. We demonstrated the performance of the method
on a database of over 1000 objects, with retrieval results comparable to the global shape
descriptor methods presented in [94].
The skeleton-based approach has a number of advantages over the global shape de-
scriptor methods. It is an intuitive representation of 3D objects that can be easily used to
understand the similarities present in the matched objects. Since our many-to-many match-
ing algorithm provides a direct correspondence between skeleton points in two matched
objects, one can use this correspondence for registration and juxtaposition. The skeleton
captures both global and local properties of the shape, so it can be used for many different
matching tasks.
One important contribution of this chapter is to demonstrate the ability of our matching framework to perform part matching, where only a portion of the skeleton is matched. Part matching is
also useful in a CAD environment, where a user may be interested in retrieving objects that
contain a certain part or component. Laser-scanned scenes also tend to merge together all
elements in the environment; in this situation, part matching can be used for segmentation.
Our part matching examples showed that many-to-many matching can be used to locate
a part in a database of composite objects. The inverse problem is also of interest, where
given a composite object, one would like to identify its component parts among the objects
of a database. We will focus on this problem in the future.
Figure 8.3: Precision/Recall for the many-to-many matching algorithm in the object retrieval experiment.
Figure 8.4: Models are sorted by the similarity to the query object. The first column of each row is the query; the remaining columns show the top 10 matched objects with their computed distances.
Figure 8.5: Part Matching Example: computed distances between a query part (torso) versus several simple and composite objects.
Figure 8.6: Correspondences in Part Matching: The query object in (a) is matched against each of the objects in (b). The correspondences between their skeletons are shown in red in (c).
9. Conclusions
9.1 Summary
There is a growing trend towards research in feature matching, often formulated as a
graph matching problem, whose goal is to establish node correspondences between pairs of
graphs. Depending on the way that these correspondences are established, graph matching
algorithms can be divided into two groups: one-to-one and many-to-many. Although powerful, algorithms providing one-to-one feature correspondences suffer from the significant limitation of assuming that one-to-one correspondences between the graphs of similar objects exist. Due to noise, segmentation errors, or articulation, such correspondences may not exist.
In this thesis, we presented a novel, efficient (polynomial-time) matching algorithm that establishes many-to-many correspondences between the nodes of two noisy, vertex-labeled
weighted graphs. To match two graphs, we began by constructing metric tree representa-
tions of the graphs. Next, we embedded them into a geometric space with low distortion
using a novel encoding of the graph’s vertices with the aim of preserving pairwise vertex
distances. While the distances could not be preserved exactly, they were approximated
with low distortion. We presented two low-distortion embedding algorithms, beginning
with one that was inspired by the general framework of Matousek [65]. In this algorithm,
the dimensionality of a graph’s embedding is a function of the graph. Specifically, the
number of paths in the caterpillar decomposition of the graph defines the dimension of the
target space. Two graphs to be matched may yield embeddings with different dimensionality, requiring a projection step to bring them into the same space. We overcame this problem by introducing a second embedding technique, based on a novel spherical encoding of graph structure, which embedded both graphs into a single space of prescribed dimensionality.
The second embedding algorithm is a deterministic variation of the embedding technique
presented in [43].
By embedding weighted graphs into normed vector spaces, we reduced the problem
of many-to-many graph matching to that of many-to-many geometric point matching, for
which the Earth Mover’s Distance algorithm is ideally suited. Moreover, by mapping a
node’s geometric and structural “context” in the graph to an attribute vector assigned to its
corresponding point, we extended the technique to deal with hierarchical graphs that repre-
sent multi-scale structure. The many-to-many point matching computed by the EMD yields
a set of many-to-many node correspondences between the original graphs. Although our framework was designed to establish many-to-many feature correspondences, it includes one-to-one matching as a special case.
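In the special case of unit masses and equal-size point sets, the transportation problem underlying the EMD reduces to a minimum-cost one-to-one assignment. The brute-force sketch below (feasible only for tiny sets, and our own illustration rather than the framework's solver) shows the quantity being minimized; the general many-to-many case instead solves the full transportation linear program:

```python
import itertools
import math

def emd_unit(points_a, points_b):
    """EMD between two equal-size point sets with unit mass per point:
    the cheapest one-to-one assignment cost, normalized by total mass.
    Brute force over permutations; real uses solve the transportation
    LP (or a min-cost assignment) instead."""
    n = len(points_a)
    best = min(
        sum(math.dist(points_a[i], points_b[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n
```

For two points at the bottom corners of a unit square matched against the two top corners, the optimal flow sends each point straight up, giving a distance of 1.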
We evaluated the framework using each embedding technique on two different object
recognition domains: silhouettes and multi-scale ridge and blob decompositions. The ex-
perimental results demonstrated the effectiveness of the approach for finding many-to-many
feature correspondences. Given a query and a database of more than one thousand entries,
a more comprehensive evaluation of our framework for shape retrieval demonstrated the
ability of our approach to estimate correct pose and to select the correct object in more
than 93% and 97% of the cases, respectively. A set of perturbation experiments showed the
stability of the overall framework. We also compared our approach to two leading graph
matching algorithms and presented the recognition rates of each approach and their top
seven matches. Considering within-category matches, these comparison tests showed that
our framework resulted in better recognition rates than the others.
In addition to these experiments, we presented the applicability of the framework to
two other recognition domains: face recognition, and 3D object retrieval using skeletal
representations of 3D volumetric objects. These experiments also produced encouraging
results, showing the potential of the developed method in a variety of computer vision
and pattern recognition domains. The stability of the overall approach against increasing
levels of Gaussian noise and our preliminary part matching results were also presented in
these works. In 3D object retrieval experiments, our matching framework outperformed
all existing frameworks on a database of more than one thousand objects for the nearest
neighbor criterion.
9.2 Contributions
This work makes several contributions that are valuable to many areas of computer vision. The specific contributions of the thesis are as follows:
1. We developed a novel framework for graph matching with many-to-many node correspondences. Specifically, we showed that the many-to-many graph matching problem can be reduced to that of many-to-many point matching in a vector space. This contribution is important
because this step enables us to transform an intractable problem in
graph space into a tractable one in vector space with some approxi-
mation.
2. We showed that the deterministic variation of the spherical embed-
ding method is a powerful technique that enables us to embed tree
metrics into a vector space of prescribed dimensionality. The main
advantage of this technique is due to the fact that the embedded
nodes can be matched directly without the need for a dimensionality
reduction process.
3. We showed that directed edge relations, such as hierarchies between
graph nodes, could be represented as node attributes. Encoding such
relations in node attributes allowed us to express the mass of each
node as a function of its local histograms, which, in turn, enabled
us to use the graph structure, while establishing correspondences.
More specifically, this process extended the technique to deal with
hierarchical graphs representing multi-scale structures.
4. Through a set of experiments, we showed that the many-to-many vector
mapping that realizes the minimum Earth Mover’s Distance corre-
sponds to the desired many-to-many matching between nodes of the
original graphs. In addition, shape retrieval experiments in various
computer vision domains showed that the distance computed by the
EMD could be used as a dissimilarity value between the original
graphs representing objects.
9.3 Discussion and Future Work
Our matching framework can be applied to any many-to-many graph matching problem, whether the graphs are directed or undirected. Still, the approach has its limitations. Finding
meaningful feature correspondences depends on appropriate edge weights in the original
graphs. Since these edge weights ultimately govern the proximity of the embedded points
and hence their propensity to being combined during the EMD step, the edge weights
(distances) are effectively a perceptual grouping or abstraction heuristic between features.
If they are chosen or defined poorly, the EMD step may not converge on a meaningful
solution.
The EMD is a global distance that tries to account for all the points. Although we
showed in our experiments that the framework is robust to perturbation of the graphs in
terms of missing/spurious features, overall the method is global. If a graph includes a node
representing an occluder with large mass, its presence will have an adverse effect on the computed flows, since the algorithm cannot selectively exclude the node. Note, however, that
if there are unique attributes shared by nodes to be matched, these attributes can act as
constraints on the EMD matching, ensuring that a pile of dirt with a particular “color” can
flow to holes of the same color.
It should also be noted that in an object recognition problem, the type of representation
used in describing the objects has a significant impact on both the correctness and effec-
tiveness of the recognition system. Hence, our recognition results show the quality of our
matching approach, as well as the feature extraction and representation methods used in
the framework.
The results of comparing our approach to existing frameworks are promising and motivate further exploration of the matching algorithm. We will study the effectiveness of
our algorithm on much larger datasets and compare our results to more leading matching
frameworks based on both one-to-one and many-to-many matchings. Although our algorithm finds many-to-many matchings in polynomial time, it takes about one minute on a 1.50 GHz Intel(R) Xeon(TM) computer to match two graphs with around 2000 nodes, which limits the number of graphs that can be practically matched. We plan to
improve the efficiency of the algorithm by optimizing the matching code and revising the
algorithm itself.
One way to revise the algorithm is to use a distance-preserving embedding algorithm
(isometric embedding) in the framework. For instance, one may embed tree metrics into l1
and compute the correspondences under this norm. Given tree metrics, this technique will
enable us to embed their nodes with no distortion. Thus, pairwise distances in the vector
space will be equal to those in the tree metrics. Another distance-preserving embedding
algorithm can be obtained by embedding the metrics defined on the input graphs into l∞. Since any finite metric space can be embedded isometrically into l∞, pairwise distances in the target
space will be equal to the original ones. In both of these approaches, our goal will be to solve the transportation problem under the l1 and l∞ norms, respectively, while accounting for the transformation.
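The l∞ case rests on the classical Fréchet embedding: each point is mapped to its vector of distances to all points, and pairwise l∞ distances then reproduce the original metric exactly, since |d(x, p) − d(y, p)| ≤ d(x, y) with equality at p = x. A sketch:

```python
def frechet_embed(points, dist):
    """Isometric embedding of a finite metric space into l_inf:
    x -> (d(x, p_1), ..., d(x, p_n)). By the triangle inequality each
    coordinate difference is at most d(x, y), and the coordinate for
    p = x attains it, so the l_inf distance equals the original one."""
    return {x: [dist(x, p) for p in points] for x in points}

def linf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))
```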
An alternative way of revising the algorithm is to define a different metric distance
on given weighted graphs. Recall that only the shortest-path metric was used in the frame-
work to compute distances between vertices of the input graphs. Since finding meaningful
feature correspondences depends on appropriate edge weights in the original graphs, the
type of metric distance will have a direct effect on the overall performance of the algorithm. When trying different metric distances, an interesting question is which ones yield better recognition scores than others. Across recognition domains, this question also involves finding the best metric distance for each feature extraction method.
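For reference, the shortest-path metric used in the framework can be computed with the standard Floyd–Warshall algorithm on the weighted graph; a self-contained sketch:

```python
def shortest_path_metric(n, edges):
    """All-pairs shortest-path distances on a weighted undirected graph
    with vertices 0..n-1; edges is a list of (u, v, weight) triples."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    # Relax through every intermediate vertex k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

On a triangle with a heavy direct edge, the metric correctly routes around it: with edges (0,1) and (1,2) of weight 1 and (0,2) of weight 5, the distance from 0 to 2 is 2.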
As demonstrated by a set of experiments, our framework can be used to locate a part in a database of composite objects. Although promising, these results are preliminary and require
further exploration. The inverse problem, namely, given a composite shape, identifying its
component parts from a part database, is also of interest. We will focus on these problems
in the future as well.
One of the objectives of our future work is also to develop an indexing mechanism
for improving the overall efficiency and effectiveness of the algorithm. One may observe
that such mechanisms may be constructed for both original graph representations and for
embedded point sets. A comparison study between these two methods is also of interest.
A key component of many research problems (such as feature tracking and morphing) is robust feature matching. In the future, we will design new vision algorithms based on our
many-to-many feature matching framework. We believe that our framework will have an
immediate impact on other computer vision problems.
Bibliography
[1] 3D Cafe. http://www.3dcafe.com/asp/freestuff.asp.
[2] S. Abiteboul. Querying semi-structured data. In ICDT, pages 1–18, 1997.
[3] R. Agarwala, V. Bafna, M. Farach, M. Paterson, and M. Thorup. On the approxima-bility of numerical taxonomy (fitting distances by tree metrics). SIAM Journal onComputing, 28(2):1073–1085, 1999.
[4] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms,and Applications, pages 4–7. Prentice Hall, Englewood Cliffs, New Jersey, 1993.
[5] D. S. Atkinson and P. M. Vaidya. Using geometry to solve the transportation problemin the plane. Algorithmica, 13(5):442–461, 1995.
[6] R. Barie. Lecons sur les fonctions discontinues. Paris, 1905.
[7] H.G Barrow and R.M. Burstall. Subgraph isomorphism, matching relational struc-tures and maximal cliques. Information Processing Letters, E76-A(4):83–84, 1975.
[8] Y. Bartal, A. Blum, C. Burch, and A. Tomkins. A polylog(n)-competitive algorithmfor metrical task systems. In STOC ’97: Proceedings of the twenty-ninth annualACM symposium on Theory of computing, pages 711–719, New York, NY, USA,1997. ACM Press.
[9] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition usingshape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence,24(4):509–522, April 2002.
[10] R. Beveridge and E. M. Riseman. How easy is matching 2D line models usinglocal search? IEEE Transactions on Pattern Analysis and Machine Intelligence,19(6):564–579, June 1997.
[11] J. Bourgain. On Lipschitz embedding of finite metric spaces into Hilbert space.Israel Journal of Mathematics, 52:46–52, 1985.
[12] J. Bourgain. The metrical interpretation of superreflexivity in Banach spaces. IsraelJournal of Mathematics, 56:222–230, 1986.
[13] K.L. Boyer and A.C. Kak. Structural stereopsis for 3-D vision. IEEE Transactionson Pattern Analysis and Machine Intelligence, 10(2):144–166, March 1988.
122
[14] P. Buneman. The recovery of trees from measures of dissimilarity. In F. Hodson,D. Kendall, and P. Tautu, editors, Mathematics in the Archaeological and HistoricalSciences, pages 387–395. Edinburgh University Press, Edinburgh, 1971.
[15] P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: a query language and algebrafor semistructured data based on structural recursion. VLDB Journal: Very LargeData Bases, 9(1):76–110, 2000.
[16] H. Bunke and K. Shearer. A graph distance metric based on the maximal commonsubgraph. Pattern Recognition Letters, 19(3-4):255–259, 1998.
[17] H.T. Chen, H. H. Lin, and T.L. Liu. Multi-object tracking using dynamical graphmatching. In Proceedings, IEEE Conference on Computer Vision and Pattern Recog-nition, pages II:210–217, 2001.
[18] J.H. Chuang, C. Tsai, and M.C. K. Skeletonization of three-dimensional object usinggeneralized potential field. IEEE Transactions on Pattern Analysis and MachineIntelligence, 22(11):1241–1251, 2000.
[19] S. D. Cohen and L. J. Guibas. The earth mover’s distance under transformationsets. In Proceedings, 7th International Conference on Computer Vision, pages 1076–1083, Kerkyra, Greece, 1999.
[20] J. H. Conway and N. J. A. Sloane. Sphere Packing, Lattices and Groups. Springer-Verlag, New York, 1998.
[21] N. Cornea, D. Silver, X. Yuan, and R. Balasubramanian. Computing hierarchicalcurve-skeletons of 3d objects. CAIP Technical Report CAIP-TR275, Nov 2004.
[22] N. D. Cornea, M. F. Demirci, D. Silver, A. Shokoufandeh, Y. Keselman, S. J. Dick-inson, and P. B. Kantor. 3d object retrieval using many-to-many matching of curveskeletons. In Shape Modeling and Applications, 2005.
[23] M. S. Costa and L. G. Shapiro. Relational indexing. In SSPR, pages 130–139, 1996.
[24] T. Cox and M. Cox. Multidimensional Scaling. Chapman and Hall, London, 1994.
[25] C. M. Cyr and B. B. Kimia. A similarity-based aspect-graph approach to 3d objectrecognition. Int. J. Comput. Vision, 57(1):5–22, 2004.
[26] J. Damon. Local morse theory for solutions to the heat equation and gaussian blur-ring. Journal of Differential Equations, 115(2):386–401, 1995.
123
[27] W. H. E. Day. Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of Mathematical Biology, 49(4):461–467, 1987.
[28] M. F. Demirci, A. Shokoufandeh, S. J. Dickinson, Y. Keselman, and L. Bretzner. Many-to-many feature matching using spherical coding of directed graphs. In ECCV (1), pages 322–335, 2004.
[29] M. F. Demirci, A. Shokoufandeh, Y. Keselman, S. J. Dickinson, and L. Bretzner. Many-to-many matching of scale-space feature hierarchies using metric embedding. In Scale-Space, pages 17–32, 2003.
[30] S. Dickinson, A. Pentland, and A. Rosenfeld. 3-D shape recovery using distributed aspect matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):174–198, 1992.
[31] M. A. Eshera and K. S. Fu. A graph distance measure for image analysis. IEEE Trans. SMC, 14:398–408, May 1984.
[32] L. Florack and A. Kuijper. The topological structure of scale-space images. J. Math. Imaging Vis., 12(1):65–79, 2000.
[33] N. Gagvani and D. Silver. Parameter controlled volume thinning. Graphical Models and Image Processing, 61(3):149–164, 1999.
[34] M. N. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri, and K. Shim. XTRACT: A system for extracting document type descriptors from XML documents. In SIGMOD Conference, pages 165–176, 2000.
[35] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In The VLDB Journal, pages 518–529, 1999.
[36] A. Globus, C. Levit, and T. Lasinski. Tool for visualizing the topology of three-dimensional vector fields. In IEEE Visualization, pages 33–40, 1991.
[37] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.
[38] R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In VLDB '97, Proceedings of the 23rd International Conference on Very Large Data Bases, pages 436–445. Morgan Kaufmann, 1997.
[39] R. Goldman and J. Widom. Approximate DataGuides, 1999.
[40] Google Image Search. http://www.google.com.
[41] K. Grauman and T. J. Darrell. Fast contour matching using approximate earth mover's distance. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition (CVPR04), pages I: 220–227, 2004.
[42] W. E. L. Grimson, T. Lozano-Perez, and D. P. Huttenlocher. Object Recognition by Computer: The Role of Geometric Constraints. MIT Press, 1990.
[43] A. Gupta. Embedding tree metrics into low dimensional Euclidean spaces. In Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, pages 694–700, 1999.
[44] A. Gupta, I. Newman, Y. Rabinovich, and A. Sinclair. Cuts, trees and l1 embeddings. Proceedings of the Symposium on Foundations of Computer Science, 1999.
[45] R. M. Haralick and L. G. Shapiro. The consistent labeling problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1:173–184, 1979.
[46] J. L. Helman and L. Hesselink. Visualizing vector field topology in fluid flows. IEEE Computer Graphics and Applications, 11(3):36–46, 1991.
[47] P. Indyk. Algorithmic aspects of geometric embeddings. In Proceedings, 42nd Annual Symposium on Foundations of Computer Science, 2001.
[48] P. Indyk and N. Thaper. Fast image retrieval via embeddings. In 3rd Intl. Workshop on Statistical and Computational Theories of Vision, 2003.
[49] S. Ioffe and D. A. Forsyth. Human tracking with mixtures of trees. In ICCV01, pages I: 690–695, 2001.
[50] C. Irniger and H. Bunke. Graph matching: Filtering large databases of graphs using decision trees. IAPR-TC15 Workshop on Graph-based Representation in Pattern Recognition, pages 239–249, 2001.
[51] C. Irniger and H. Bunke. Graph database filtering using decision trees. In Proceedings, 12th International Conference on Pattern Recognition, pages 383–388, 2004.
[52] C. Irniger and H. Bunke. Decision trees for error-tolerant graph database filtering. In GbRPR, pages 301–311, 2005.
[53] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
[54] F. Kanters, L. Florack, B. Platel, and B. M. ter Haar Romeny. Image reconstruction from multiscale critical points. In Scale-Space, pages 464–478, 2003.
[55] F. Kanters, B. Platel, L. Florack, and B. M. ter Haar Romeny. Content based image retrieval using multiscale top points. In Scale-Space, pages 33–43, 2003.
[56] Y. Keselman, A. Shokoufandeh, M. F. Demirci, and S. Dickinson. Many-to-many graph matching via low-distortion embedding. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, June 2003.
[57] B. B. Kimia, A. Tannenbaum, and S. W. Zucker. Shape, shocks, and deformations I: The components of two-dimensional shape and the reaction-diffusion space. Int. J. Computer Vision, 15:189–224, 1995.
[58] S. Kosinov and T. Caelli. Inexact multisubgraph matching using graph eigenspace and clustering models. In Proceedings of SSPR/SPR, volume 2396, pages 133–142. Springer, 2002.
[59] L. M. Lifshitz and S. M. Pizer. A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(6):529–540, 1990.
[60] N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its algorithmic applications. Proceedings of the 35th Annual Symposium on Foundations of Computer Science, pages 557–591, 1994.
[61] N. Linial, A. Magen, and M. E. Saks. Trees and Euclidean metrics. Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, pages 169–175, 1998.
[62] T.-L. Liu and D. Geiger. Approximate tree matching and shape similarity. In Proceedings, 7th International Conference on Computer Vision, pages 456–462, Kerkyra, Greece, 1999.
[63] J. Llados, E. Marti, and J. Villanueva. Symbol recognition by error-tolerant subgraph matching between region adjacency graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1137–1143, 2001.
[64] B. Luo and E. R. Hancock. Structural matching using the EM algorithm and singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:1120–1136, 2001.
[65] J. Matousek. On the distortion required for embedding finite metric spaces into normed spaces. Israel Journal of Mathematics, 93:333–344, 1996.
[66] J. Matousek. On embedding trees into uniformly convex Banach spaces. Israel Journal of Mathematics, 237:221–237, 1999.
[67] B. Messmer and H. Bunke. Efficient error-tolerant subgraph isomorphism detection. In D. Dori and A. Bruckstein, editors, Shape, Structure and Pattern Recognition, pages 231–240. World Scientific Publ. Co., 1995.
[68] B. T. Messmer and H. Bunke. Subgraph isomorphism in polynomial time. Technical Report IAM 95-003, 1995.
[69] R. Myers, R. Wilson, and E. Hancock. Bayesian graph edit distance. IEEE PAMI, 22(6):628–635, 2000.
[70] S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL-20). Technical Report CUCS-005-96, February 1996.
[71] S. Nestorov, S. Abiteboul, and R. Motwani. Extracting schema from semistructured data. Pages 295–306, 1998.
[72] S. Nestorov, J. D. Ullman, J. L. Wiener, and S. S. Chawathe. Representative objects: Concise representations of semistructured, hierarchical data. In ICDE, pages 79–90, 1997.
[73] M. Nielsen and M. Lillholm. What do features tell about images? In Scale-Space '01: Proceedings of the Third International Conference on Scale-Space and Morphology in Computer Vision, pages 39–50, London, UK, 2001. Springer-Verlag.
[74] A. Okabe and B. Boots. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley and Sons, New York, 1992.
[75] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin. Shape distributions. ACM Transactions on Graphics, 21(4):807–832, Oct. 2002.
[76] M. Pelillo, K. Siddiqi, and S. Zucker. Matching hierarchical structures using association graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11):1105–1120, November 1999.
[77] B. Platel, M. F. Demirci, A. Shokoufandeh, L. Florack, F. Kanters, B. M. ter Haar Romeny, and S. J. Dickinson. Discrete representation of top points via scale space tessellation. In Scale-Space, pages 73–84, 2005.
[78] B. Platel, F. Kanters, L. Florack, and E. Balmachnova. Using multiscale top points in image matching. In 11th International Conference on Image Processing, 2004.
[79] F. Preparata and M. Shamos. Computational Geometry. Springer-Verlag, New York, NY, 1985.
[80] Princeton Shape Retrieval and Analysis, 3D Model Search. http://shape.cs.princeton.edu/search.html.
[81] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, December 2000.
[82] Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000.
[83] A. Sanfeliu and K. S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13:353–362, May 1983.
[84] A. Sanfeliu and K. S. Fu. A distance measure between attributed relational graphs for pattern recognition. SMC, 13(3):353–362, May 1983.
[85] G. Scott and H. Longuet-Higgins. An algorithm for associating the features of two patterns. Proceedings of the Royal Society of London, B244:21–26, 1991.
[86] T. Sebastian, P. Klein, and B. Kimia. Recognition of shapes by editing shock graphs. In IEEE International Conference on Computer Vision, pages 755–762, 2001.
[87] T. Sebastian, P. N. Klein, and B. Kimia. Recognition of shapes by editing their shock graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):550–571, 2004.
[88] T. B. Sebastian, P. N. Klein, and B. B. Kimia. Shock-based indexing into large shape databases. In ECCV '02: Proceedings of the 7th European Conference on Computer Vision, Part III, pages 731–746, London, UK, 2002. Springer-Verlag.
[89] K. Sengupta and K. L. Boyer. Organizing large structural modelbases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(4):321–332, 1995.
[90] K. Sengupta and K. L. Boyer. Modelbase partitioning using property matrix spectra. Computer Vision and Image Understanding, 70(2):177–196, 1998.
[91] L. G. Shapiro and R. M. Haralick. Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3:504–519, 1981.
[92] L. G. Shapiro and R. M. Haralick. A metric for comparing relational descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7:90–94, January 1985.
[93] L. G. Shapiro and R. M. Haralick. Organization of relational models for scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(6):595–602, November 1982.
[94] P. Shilane, M. Kazhdan, P. Min, and T. Funkhouser. The Princeton Shape Benchmark. In Shape Modeling International, Genoa, Italy, June 2004.
[95] A. Shokoufandeh, S. J. Dickinson, C. Jonsson, L. Bretzner, and T. Lindeberg. On the representation and matching of qualitative shape at multiple scales. In Proceedings, 7th European Conference on Computer Vision, volume 3, pages 759–775, 2002.
[96] A. Shokoufandeh, D. Macrini, S. J. Dickinson, K. Siddiqi, and S. W. Zucker. Indexing hierarchical structures using graph spectra. PAMI, 27(7):1125–1140, July 2005.
[97] K. Siddiqi and B. B. Kimia. A shock grammar for recognition. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 507–513, 1996.
[98] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. Zucker. Shock graphs and shape matching. In Proceedings, IEEE International Conference on Computer Vision, pages 222–229, Bombay, January 1998.
[99] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. Zucker. Shock graphs and shape matching. International Journal of Computer Vision, 30:1–24, 1999.
[100] H. Sossa and R. Horaud. Model indexing: The graph-hashing approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Urbana-Champaign, Illinois, USA, June 1992.
[101] H. Sundar, D. Silver, N. Gagvani, and S. Dickinson. Skeleton based shape matching and retrieval. In Shape Modelling and Applications Conference, SMI 2003, Seoul, Korea, May 2003.
[102] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[103] A. Torsello, D. Hidovic, and M. Pelillo. Four metrics for efficiently comparing attributed trees. In Proceedings, International Conference on Pattern Recognition, pages 467–470, 2004.
[104] S. Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4):376–380, April 1991.
[105] M. S. Waterman, T. F. Smith, M. Singh, and W. A. Beyer. Additive evolutionary trees. J. Theoret. Biol., 64:199–213, 1977.
Vita
Muhammed Fatih Demirci was born in Turkey, in 1978. He received his B.S. from the
Department of Computer Engineering, Selcuk University, Turkey in 1999 and his M.S. in
Computer Science from Drexel University in Philadelphia in 2002. Mr. Demirci is the
recipient of the College of Engineering Outstanding Graduate Student Research Award,
Drexel University (2003–2004). His research interests include computer vision, statistical
and structural pattern recognition, feature tracking, and graph theory.
Selected Publications:
Cornea, Demirci, Silver, Shokoufandeh, Dickinson, Kantor. 3D Object Retrieval using Many-to-Many Matching of Curve Skeletons. IEEE International Conference on Shape Modeling and Applications 2005.

Platel, Demirci, Shokoufandeh, Florack, Kanters, Romeny, Dickinson. Discrete Representation of Top Points via Scale Space Tessellation. 5th International Conference on Scale-Space 2005: 73-84.

Demirci, Shokoufandeh, Dickinson, Keselman, Bretzner. Many-to-Many Feature Matching Using Spherical Coding of Directed Graphs. The 8th European Conference on Computer Vision - ECCV (1) 2004: 322-335.

Demirci, Shokoufandeh, Keselman, Dickinson, Bretzner. Many-to-Many Matching of Scale-Space Feature Hierarchies Using Metric Embedding. 4th International Conference on Scale-Space 2003: 17-32.

Keselman, Shokoufandeh, Demirci, Dickinson. Many-to-Many Graph Matching via Metric Embedding. IEEE Conference on Computer Vision and Pattern Recognition - CVPR (1) 2003: 850-857.