
MESH-based Active Monte Carlo Recognition (MESH-AMCR)

Felix v. Hundelshausen and H.J. Wünsche
University of the Federal Armed Forces Munich
Department of Aerospace Engineering
Autonomous Systems Technology
85577 Neubiberg, Germany

M. Block, R. Kompass and R. Rojas
Free University of Berlin
Department of Computer Science
14195 Berlin, Germany

Abstract

In this paper we extend Active Monte Carlo Recognition (AMCR), a recently proposed framework for object recognition. The approach is based on the analogy between mobile robot localization and object recognition. Up to now AMCR was only shown to work for shape recognition on binary images. In this paper, we significantly extend the approach to work on realistic images of real-world objects. We accomplish recognition under similarity transforms and even severe non-rigid and non-affine deformations. We show that our approach works on databases with thousands of objects, that it can better discriminate between objects than state-of-the-art approaches and that it has significant conceptual advantages over existing approaches: It allows iterative recognition with simultaneous tracking, iteratively guiding attention to discriminative parts, the inclusion of feedback loops, the simultaneous propagation of multiple hypotheses, multiple object recognition and simultaneous segmentation and recognition. While recognition takes place, triangular meshes are constructed that precisely define the correspondence between input and prototype object, even in the case of strong non-rigid deformations.

Introduction

Recently, Active Monte Carlo Recognition (AMCR), a new object-recognition approach exploiting the analogy between robot localization and object recognition, was proposed by (von Hundelshausen & Veloso 2006). Although the approach is conceptually interesting, it has so far only been shown to work for shape recognition on binary images. Furthermore, the proposed approach does not scale to large object databases.

In this paper we substantially extend the approach in several ways, such that it works on realistic images of real-world objects, scales to large object databases, and constructs a correspondence mesh between input and prototype object that recovers correspondence even in the case of large deformations. We show that our new approach, MESH-AMCR, can better discriminate between objects than state-of-the-art methods and that it performs better geometric verification than current approaches for both rigid and deformable objects.

Copyright © 2007, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

The field of object recognition can be divided into Object Class Recognition, the problem of recognizing categories of objects, such as cars, bikes or faces (see (Fei-Fei, Fergus, & Perona 2004), (Serre, Wolf, & Poggio 2005), (Mutch & Lowe 2006)), and Object Identification, the problem of recognizing specific, identical objects, such as identifying a specific face or a specific building. Our approach MESH-AMCR addresses the latter problem.

Figure 1: Palm Sunday, an ambiguous painting by the Mexican artist Octavio Ocampo

State-of-the-art approaches in object identification typically divide the task into two stages, Feature Matching and Geometric Verification. Prominent feature detection methods are (Lowe 2004), (Ke & Sukthankar 2004), (Mikolajczyk & Schmid 2004) and (Matas et al. 2002). For finding prototype images with matching features, different schemes have been tried, ranging from nearest neighbor search and Hough-based grouping (Lowe 2004) to the use of vocabulary trees (Nistér & Stewénius 2006). In this first stage, the goal is to retrieve a set of candidate images, without worrying about the geometric validity of feature correspondences. In Geometric Verification the geometric layout of feature correspondences is verified. Typically, RANSAC (Fischler & Bolles 1981) is used to find valid groups of feature correspondences, usually verifying their positions subject to the class of affine transformations or rigid epipolar geometry (Kushal & Ponce 2006). An exception is (Berg, Berg, & Malik 2005), where low distortion correspondences are found by solving an integer quadratic programming problem, similar to how correspondences were established in the work of (Belongie, Malik, & Puzicha 2002).

Although the referenced approaches have brought object identification to a state of maturity, they still have some significant drawbacks and conceptual problems. With the exception of (Belongie, Malik, & Puzicha 2002), their drawback is that they use only the feature descriptors to distinguish between objects. While the features are well suited to establishing hypothetical scale-invariant frames of reference, their descriptors capture only a small part of the information that could be used for discrimination. SIFT features, for example, are deliberately not detected at edges, but edges at locations where no SIFT features are detected could help identification. The other drawbacks can best be described by three criteria that should be met by an object recognition framework that is suited to be integrated into a real-time robotic system.

1. Iterative Recognition. The process of recognition should be distributable over several frames, performing only a small, rapidly computable amount of processing in each frame, such that other processes like tracking can take place simultaneously in real time. Using the information of several frames over time provides richer information for discrimination. None of the above approaches fulfills this criterion, because they are based on single images and perform all the processing in each image. Only by using an iterative object recognition approach and combining it with tracking can the continuity constraints that underlie successive images of real-world video be exploited.

2. Allowing Feedback. Figure 1 shows that the mind sees what it wants. The nose can either be a nose or an elbow, depending on what the mind wants to see. Since recognition is also a question of “will”, a recognition approach should offer an interface for this “will”. For example, it should offer an interface for specifying where attention is directed and for specifying a bias towards favored interpretations. Feedback means not only the control of attention but also the control of the type of features that are used for discrimination. Also, real motor actions, like gaze control, or even movements of a robot might be necessary to distinguish two objects. Although these ideas are not new, none of the current best object-identification programs really implements these concepts.

3. Propagating Multiple Hypotheses. The computer vision community has meanwhile well understood that segmentation cannot be accomplished independently from recognition. Thus, no unique ad hoc frame of reference can be established for comparison; rather, different hypothetical matches have to be propagated simultaneously. Rather than performing a depth-first examination, an iterative breadth-first examination has to take place. While different interpretations iteratively compete, this competition can be examined to determine where to guide attention and which features to use in order to discriminate between the competing hypotheses. None of the aforementioned approaches offers an interface for such a process.

We will show that MESH-AMCR allows all these concepts to be integrated. The remainder of the paper is organized as follows. In section 2 we briefly review Active Monte Carlo Recognition (AMCR). In section 3 we describe our new approach MESH-AMCR. In section 4 we show experimental results. Section 5 finally concludes our paper.

Active Monte Carlo Recognition (AMCR)

AMCR was developed as a consequence of recognizing that mobile robot localization and object identification are essentially the same problem, and of transferring the best known localization method, Monte Carlo Localization (MCL) (Thrun et al. 2001), to the problem of object recognition (von Hundelshausen & Veloso 2006). Similarly to how a robot explores its environment and localizes itself in a set of maps, the focus of attention explores an image, and recognition is posed as the problem of localizing the focus of attention in one of a set of prototype images. In the same way as MCL uses a set of particles to approximate a probability distribution for the robot’s position, AMCR uses a set of M-Particles to localize the position of the attentive V-Particle. Furthermore, AMCR does not use just one point of attention but employs a set of V-Particles to explore the input image, similarly to how an unknown area might be explored by a swarm of robots. To apply MCL, only two models have to be provided, a measurement model and a motion model. In AMCR, the measurement model specifies what kind of measurements are taken at the point in the image where a V-Particle is located. The motion model specifies the distribution of how the M-Particles move when the corresponding V-Particle moves. Additionally, AMCR requires the specification of a motion policy, that is, a mechanism of how the V-Particles should explore the input image.

In the original work (von Hundelshausen & Veloso 2006) only binary images of various shapes were used. Each V- and M-Particle was just described by an (x, y) location together with an orientation. The motion policy was defined by letting the V-Particles move in the direction of their orientation and letting them be reflected at edges. The measurement model was implemented by letting the particles perform a range scan, measuring the distance to the first edge for each scan line, similarly to how laser range scanners are used in mobile robot localization.

This instance of AMCR, RADIAL-AMCR, is hardly applicable to real-world images, since the measurement model is too simplistic. But it has further drawbacks: One is that the method does not scale well with the number of prototype objects, since for each V-Particle a whole set of M-Particles has to be instantiated in each prototype image. Another drawback is that in RADIAL-AMCR there was no mechanism to prevent a V-Particle from returning to a location it had already explored.


In our new approach MESH-AMCR we remove all these drawbacks by specifying a new measurement model and a new motion model, and by adding the idea of iteratively constructing hypothetical correspondence meshes.
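For reference, the range-scan measurement of RADIAL-AMCR described above might look roughly like the following minimal sketch. It is not the implementation of (von Hundelshausen & Veloso 2006): the function name `range_scan`, the ray-marching step, the number of rays and the definition of an "edge" as the first change of pixel value are all assumptions made for illustration.

```python
import math


def range_scan(image, x, y, heading, n_rays=8, max_range=50.0, step=0.5):
    """Return the distance to the first edge along n_rays scan lines.

    image is a list of rows of 0/1 values. As an assumption of this sketch,
    an "edge" is the first position where the pixel value differs from the
    value under the particle, or where the ray leaves the image.
    """
    height, width = len(image), len(image[0])
    start_value = image[math.floor(y)][math.floor(x)]
    distances = []
    for r in range(n_rays):
        # Scan lines are spread evenly around the particle's orientation.
        angle = heading + 2.0 * math.pi * r / n_rays
        dx, dy = math.cos(angle), math.sin(angle)
        d = 0.0
        while d < max_range:
            d += step
            px = math.floor(x + d * dx)
            py = math.floor(y + d * dy)
            outside = not (0 <= px < width and 0 <= py < height)
            if outside or image[py][px] != start_value:
                break
        distances.append(d)
    return distances
```

The returned list of distances plays the role of a laser range scan; how two such scans are compared to weight an M-Particle is left open here.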

Our new approach: MESH-AMCR

In RADIAL-AMCR a swarm of M-Particles starts forming a cluster in a prototype shape that corresponds to the input image, at a position that corresponds to the respective V-Particle. While the V-Particle is moving in the input image, the M-Particles move accordingly. Thus, a sequence of hypothetically corresponding points in the input image and the prototype images is retrieved. One basic idea of MESH-AMCR is to use these points to construct corresponding meshes. Here, each V-Particle has a triangular mesh, and each of its M-Particles has a corresponding triangular mesh with the same topology, but allowing a deformed geometric layout. Each cell of the meshes is a triangle, and each triangle in the V-Particle’s mesh has one unique corresponding triangle in the mesh of each of its connected M-Particles. When MESH-AMCR iteratively converges to a stable interpretation, the mesh of the best M-Particle shows how the input image has to be deformed in order to match the retrieved prototype image. Here, each triangle in the V-Particle’s mesh is linearly mapped to the corresponding M-Particle’s triangle. The meshes thus define an affine transformation that can change from cell to cell, such that the recovery of non-affine transformations is possible. Before we describe these mechanisms in detail, we first solve the problem of poor scalability to large object databases.

Candidate Object Retrieval

When using a large database of prototype images, we limit the number of images to be considered by applying the recently proposed vocabulary tree method of (Nistér & Stewénius 2006), but using PCA-SIFT features (Ke & Sukthankar 2004) instead of maximally stable regions (Matas et al. 2002). Here, we have to extract all PCA-SIFT features prior to starting MESH-AMCR, and by applying the tree-based scoring scheme we retrieve the k best candidate images. We then apply MESH-AMCR solely to these candidate images. At this point we are violating our own iterative recognition philosophy, but we believe that the PCA-SIFT feature extraction approach can be changed in future such that the process of extraction is itself reorganized into an iterative process and distributed over several frames. This would be an interesting topic for future research. Currently, we just extract the features in the first frame, get the candidate prototype images and then start our iterative method with the successive frames. In our experiments we take the 8 best candidate images out of a database of 1000 objects.

V- and M-particles

Both V- and M-Particles iteratively construct triangular meshes. The trick in MESH-AMCR is that in each iteration a V-Particle corresponds to exactly one cell in the V-Mesh, and each M-Particle corresponds to exactly one cell in the M-Mesh. But the cells that correspond to the particles change in each iteration. In fact, each mesh expands by one additional cell in each iteration, and it is always these last cells that correspond to the particles. Thus both V- and M-Particles are basically triangles, defined by three points. This is quite different from (von Hundelshausen & Veloso 2006), where the particles were just points plus an orientation. In this way each related pair of V- and M-particles naturally defines an affine transformation that represents the hypothesis that the V-triangle has to be linearly mapped to the M-triangle in order to make input and prototype image match each other. However, the validity of the affine transformation is restricted to the particular cells. Pixels outside the triangles have to be mapped by the affine transformations that are given by the neighboring mesh cells, if available. Of course, the choice to use triangles as mesh cells is not arbitrary: Two corresponding sets of three points constitute the smallest entity that uniquely defines an affine transformation.
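To make the last statement concrete, the following sketch computes the affine transformation fixed by a pair of corresponding triangles. It only illustrates the underlying linear algebra; the function names `affine_from_triangles` and `apply_affine` and the parameter layout `(a, b, c, d, tx, ty)` are chosen here and do not come from the paper.

```python
def affine_from_triangles(src, dst):
    """Return (a, b, c, d, tx, ty) such that
    x' = a*x + b*y + tx and y' = c*x + d*y + ty
    map the three corners of src onto the three corners of dst."""
    (x0, y0), (x1, y1), (x2, y2) = src
    (u0, v0), (u1, v1), (u2, v2) = dst
    # Edge vectors of the source triangle are the columns of a 2x2 matrix S.
    s11, s12 = x1 - x0, x2 - x0
    s21, s22 = y1 - y0, y2 - y0
    det = s11 * s22 - s12 * s21
    if abs(det) < 1e-12:
        raise ValueError("source triangle is degenerate")
    # Inverse of S.
    i11, i12 = s22 / det, -s12 / det
    i21, i22 = -s21 / det, s11 / det
    # Edge vectors of the destination triangle are the columns of D.
    d11, d12 = u1 - u0, u2 - u0
    d21, d22 = v1 - v0, v2 - v0
    # Linear part A = D * S^-1.
    a = d11 * i11 + d12 * i21
    b = d11 * i12 + d12 * i22
    c = d21 * i11 + d22 * i21
    d = d21 * i12 + d22 * i22
    # The translation is fixed by the first pair of corners.
    tx = u0 - (a * x0 + b * y0)
    ty = v0 - (c * x0 + d * y0)
    return a, b, c, d, tx, ty


def apply_affine(m, p):
    """Apply the transformation m to the point p = (x, y)."""
    a, b, c, d, tx, ty = m
    return (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)
```

With fewer than three point pairs the six parameters are underdetermined, and with collinear points the matrix S is singular, which is why the sketch rejects degenerate triangles.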

Summarizing, a V-particle $v_i$ is represented as

$$v_i := (\mathbf{p}_{i0}, \mathbf{p}_{i1}, \mathbf{p}_{i2}, \text{V-mesh}_i), \tag{1}$$

with $\mathbf{p}_{ik} = (x_{ik}, y_{ik})^T$, $k = 0, 1, 2$, being the corner points of the triangle and $\text{V-mesh}_i$ being a triangular mesh that initially only contains a copy of the V-Particle’s triangle. Similarly, an M-particle is represented as a triangle, a mesh, and two additional indices:

$$m_j := (\mathbf{p}'_{j0}, \mathbf{p}'_{j1}, \mathbf{p}'_{j2}, \text{M-mesh}_j, i_j, k_j), \tag{2}$$

where $i_j$ is the index of the V-particle to which the M-particle is connected and $k_j$ is the prototype image in which the M-particle is located.
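As an illustration of definitions (1) and (2), the two particle types could be held in data structures like the ones below. This is only a sketch: the class and field names are hypothetical, and representing a mesh as a plain list of triangles is an assumption (a real mesh would also store which cells share edges).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


@dataclass
class VParticle:
    """V-particle of equation (1): a triangle plus the mesh built so far."""
    triangle: Triangle
    mesh: List[Triangle] = field(default_factory=list)

    def __post_init__(self):
        # Initially the mesh only contains a copy of the particle's triangle.
        if not self.mesh:
            self.mesh.append(self.triangle)


@dataclass
class MParticle:
    """M-particle of equation (2): triangle, mesh and the two indices."""
    triangle: Triangle
    v_index: int      # i_j: index of the connected V-particle
    prototype: int    # k_j: prototype image the particle lies in
    mesh: List[Triangle] = field(default_factory=list)

    def __post_init__(self):
        if not self.mesh:
            self.mesh.append(self.triangle)
```

Because both meshes grow by one cell per iteration, the n-th triangle of an M-particle's mesh always corresponds to the n-th triangle of the mesh of V-particle `v_index`.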

Scale Invariant Features as Igniting Sparks

The M-Particles approximate a probability distribution for the V-Particle’s corresponding position and mesh configuration in the prototype maps. Since the meshes are iteratively expanded, the dimensionality of the space of M-Particles varies. Initially, the meshes start with a single triangle and thus the dimensionality is 6 + 1 = 7 (6 for the three points and one for the prototype index $k_j$) for the first iteration. Even this initial dimensionality is very large, and a huge number of M-Particles would be needed to let recognition converge to the correct solution when starting, for example, from a uniform distribution of M-Particles. To overcome this problem, the trick is to place V- and M-Particles at initial positions that have a high probability of being correct. We use the PCA-SIFT matches found in the candidate retrieval phase to define good starting points. Here, we place a V-Particle at each keypoint in the input image and connect it to M-Particles placed at possible matches in the prototype images. More precisely, given a keypoint in the input image, we can efficiently navigate down the vocabulary tree, reaching a leaf node that contains a bucket with all keypoints in the prototype images that define the same visual word (see (Nistér & Stewénius 2006)). These hypothetical relations between keypoints are the igniting sparks for MESH-AMCR.
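The initialization just described can be sketched as follows. Several details are assumptions of this sketch and are not specified in the text: the vocabulary tree is reduced to a dictionary `buckets` from visual-word id to prototype keypoints, each keypoint carries a scale and an orientation, the initial triangle is an equilateral triangle around the keypoint (`keypoint_triangle`, `radius_factor`), and input keypoints without any match are skipped.

```python
import math


def keypoint_triangle(x, y, scale, angle, radius_factor=3.0):
    """Hypothetical initial cell: an equilateral triangle centred on the
    keypoint, sized by its scale and rotated by its orientation."""
    r = radius_factor * scale
    return tuple(
        (x + r * math.cos(angle + k * 2.0 * math.pi / 3.0),
         y + r * math.sin(angle + k * 2.0 * math.pi / 3.0))
        for k in range(3)
    )


def ignite(input_keypoints, buckets):
    """Create the initial V- and M-particles from keypoint matches.

    input_keypoints: list of (word, x, y, scale, angle) in the input image.
    buckets: dict word -> list of (prototype, x, y, scale, angle), standing
             in for the leaf buckets of the vocabulary tree.
    Returns (v_particles, m_particles); an M-particle is a tuple
    (triangle, i, k) with V-particle index i and prototype index k.
    """
    v_particles, m_particles = [], []
    for word, x, y, scale, angle in input_keypoints:
        matches = buckets.get(word, [])
        if not matches:
            continue  # assumption: no spark without a hypothetical match
        i = len(v_particles)
        v_particles.append(keypoint_triangle(x, y, scale, angle))
        for k, px, py, pscale, pangle in matches:
            m_particles.append(
                (keypoint_triangle(px, py, pscale, pangle), i, k))
    return v_particles, m_particles
```

Using scale and orientation for the first triangle means that each matched keypoint pair already fixes a similarity transform between the two initial cells; other choices for the first cell are equally conceivable.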

A typical initial situation is illustrated in figure 2. The main notion behind the V- and M-particles is that a pictorial element in the input image can be interpreted in different ways: For instance, the nose in figure 2 can be interpreted either as a nose (prototype image of a face) or as an elbow (prototype image of an arm). We now define the motion policy, the measurement model and the motion model for MESH-AMCR.

Figure 2: After keypoint matching, V- and M-particles are set up according to probable keypoint matches returned as a byproduct of the candidate object retrieval phase. (Labels in the figure: input image with V-particles v1 and v2; prototype images with M-particles m1, m2 and m3.)

Motion Policy and Mesh Expansion

Consider the case of having only one V-particle $v := (\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2, \text{V-mesh})$. In each iteration, the V-particle performs a movement, i.e. the coordinates of its three points $(\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2)$ change. This movement is performed in such a way that the new triangle aligns with an edge of the old triangle, thereby expanding one of its edges and successively constructing a triangular mesh. While the V-particle moves in each iteration, the data structure V-mesh holds the corresponding mesh that is constructed. There are many different possibilities regarding the order in which edges are expanded. In our experiments we simply expand the boundary mesh cells in clockwise order.

As the V-Particle moves and expands its mesh by one cell, each M-particle performs a corresponding movement and mesh expansion. While the topology of the M-mesh corresponds exactly to that of its V-mesh, the exact position of the points can vary, thereby allowing deformed meshes, as illustrated in figure 3. The distribution of this variation is defined by the motion model. Thus, while the motion policy defines how a V-particle moves, the motion model describes the variations of movements the M-Particles can perform. Note that the expansion and position of the triangular cells is independent of the initial keypoints. Only the first cells are placed at the location of keypoint matches during initialization.
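A single expansion step of the V-mesh might be sketched as below. The text does not state how the free (third) point of the new triangle is chosen; this sketch assumes it is the old triangle's opposite corner reflected through the midpoint of the shared edge, which yields a regular mesh. The names `expand_triangle` and `signed_area` are hypothetical, and the clockwise traversal of boundary cells is not modelled.

```python
def signed_area(triangle):
    """Signed area of a triangle; the sign encodes the winding direction."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def expand_triangle(triangle, edge):
    """Return the new triangle attached to edge number `edge` (0, 1 or 2).

    Edge k joins corner k and corner (k + 1) % 3. Assumption of this
    sketch: the free point is the remaining corner reflected through the
    midpoint of the shared edge.
    """
    a = triangle[edge]
    b = triangle[(edge + 1) % 3]
    c = triangle[(edge + 2) % 3]
    free = (a[0] + b[0] - c[0], a[1] + b[1] - c[1])
    # Listing the shared edge in reverse keeps the winding direction.
    return (b, a, free)


def expand_mesh(mesh, cell_index, edge):
    """Append the expansion of mesh[cell_index] and return the new cell,
    which becomes the particle's current triangle."""
    new_cell = expand_triangle(mesh[cell_index], edge)
    mesh.append(new_cell)
    return new_cell
```

An M-particle would apply the same `cell_index` and `edge`, which keeps the two meshes topologically identical, while the position of its free point is left to the motion model.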

Motion Model and Mesh Variations

Because of the resampling process, good M-particles will produce multiple exact copies. When a V-particle performs a movement and mesh expansion step, all the M-particles will also perform a corresponding movement and mesh expansion step. However, each M-particle will do so in a slightly different way, such that variations arise in the newly added triangles of the M-meshes. Indeed, it is only the free point of the new triangle that varies among the M-meshes. However, after several iterations of measurement, resampling and expansion steps, the resulting meshes can be quite different, even when they all started out the same.

Figure 3: During expansion the meshes retain the same topological structure. However, the positions of the nodes of the M-meshes can vary, thereby allowing deformed objects to be matched. (Panels show the mesh of the V-particle and the mesh of the M-particle after iterations 1, 2, 5, 21 and 82.)
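One way to realise the variation of the free point is sketched below, under assumptions that are not taken from the paper: the free point of the V-particle's new triangle is expressed in barycentric coordinates of the old V-triangle, transferred to the old M-triangle (which is the same as applying the affine transformation of that cell pair), and then perturbed by isotropic Gaussian noise. The function names and the noise parameter `sigma` are hypothetical.

```python
import random


def barycentric(triangle, p):
    """Barycentric coordinates (l0, l1, l2) of point p w.r.t. triangle."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    l1 = ((p[0] - x0) * (y2 - y0) - (x2 - x0) * (p[1] - y0)) / det
    l2 = ((x1 - x0) * (p[1] - y0) - (p[0] - x0) * (y1 - y0)) / det
    return 1.0 - l1 - l2, l1, l2


def sample_free_point(v_old, m_old, v_free, sigma, rng):
    """Sample the free point of an M-particle's new triangle.

    v_old, m_old: the corresponding triangles before the expansion step.
    v_free: free point of the V-particle's new triangle.
    sigma: standard deviation of the assumed Gaussian variation (pixels).
    rng: a random.Random instance, so that runs are reproducible.
    """
    l0, l1, l2 = barycentric(v_old, v_free)
    # Prediction: same barycentric coordinates in the M-triangle.
    x = l0 * m_old[0][0] + l1 * m_old[1][0] + l2 * m_old[2][0]
    y = l0 * m_old[0][1] + l1 * m_old[1][1] + l2 * m_old[2][1]
    # Variation: only this free point is perturbed.
    return (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
```

Identical copies of an M-particle produced by resampling then diverge, because each copy draws its own free point; with `sigma = 0` all copies would follow the rigid affine prediction and no deformation could be recovered.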

Measurement Model

In each iteration, each M-particle gets a weight that represents the quality of the hypothetical relation to its connected V-particle. This weight is calculated by examining the corresponding pixels in the input and prototype image. The triangle of the V-particle and the triangle of the M-particle define an affine transformation, and we define a dissimilarity function that compares these triangles subject to the given affine transformation. Instead of comparing the pixels directly, we build histograms of edge orientations with 16 bins and compare these histograms. This idea is inspired by (Edelman, Intrator, & Poggio 1997), which is based upon a biological model of complex cells in primary visual cortex.
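The histogram comparison can be illustrated with the following simplified sketch. It is not the measurement model of the paper: the gradient operator (central differences), the weighting by gradient magnitude, the unsigned orientation range [0, π) and the L1 distance are all assumptions, and the warping of one triangle onto the other by the affine transformation is left out, so each histogram is simply built from the pixels inside one triangle.

```python
import math


def inside(triangle, x, y):
    """True if the point (x, y) lies inside or on the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    d0 = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
    d1 = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
    d2 = (x0 - x2) * (y - y2) - (y0 - y2) * (x - x2)
    has_neg = d0 < 0 or d1 < 0 or d2 < 0
    has_pos = d0 > 0 or d1 > 0 or d2 > 0
    return not (has_neg and has_pos)


def orientation_histogram(image, triangle, bins=16):
    """Normalised histogram of gradient orientations inside a triangle.

    image is a list of rows of grey values. Orientations are taken
    modulo pi and weighted by gradient magnitude (assumptions).
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if not inside(triangle, x, y):
                continue
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            theta = math.atan2(gy, gx) % math.pi
            index = min(int(theta / math.pi * bins), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def dissimilarity(h1, h2):
    """Half the L1 distance: 0 for equal, 1 for disjoint histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))
```

A weight for an M-particle could then be derived from the dissimilarity, for example by a decreasing function such as `exp(-dissimilarity / tau)` with a hypothetical temperature `tau`; the text does not specify this mapping.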

Resampling of the M-particles

In each iteration, the M-particles obtain a weight according to the measurement model and the M-particles are resampled. In this process a complete set of new M-particles is generated from the old one. Some of the old M-particles are discarded, and some are copied multiple times to the new set, with the total number of M-particles remaining constant. The expected number of copies of an M-particle is proportional to its weight. Resampling takes place in the set of all M-particles that are connected to the same V-particle, no matter in which prototype they lie. Consequently, it can happen that a prototype loses all its M-particles after some iterations, meaning that the respective prototype does not represent the given input image. Conversely, M-particles will cluster in prototypes matching the input image. In this way the computational power is concentrated on similar prototypes.
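A minimal sketch of this resampling step, using multinomial sampling with replacement, is given below. The text only requires that the expected number of copies be proportional to the weight; other schemes with the same property (for example low-variance resampling) would serve equally well. The function name `resample`, the tuple layout of a particle and the fallback for all-zero weights are assumptions.

```python
import random


def resample(m_particles, weights, rng):
    """Draw a new set of M-particles of the same size.

    m_particles: all M-particles connected to one V-particle, regardless
                 of the prototype they lie in; here immutable tuples
                 (prototype_index, payload).
    weights: non-negative weights from the measurement model.
    rng: a random.Random instance.
    """
    if len(m_particles) != len(weights):
        raise ValueError("one weight per particle is required")
    if sum(weights) <= 0.0:
        # Assumption: keep the set unchanged if no hypothesis has support.
        return list(m_particles)
    # Sampling with replacement: the expected number of copies of each
    # particle is proportional to its weight, the total stays constant.
    return rng.choices(m_particles, weights=weights, k=len(m_particles))
```

In a full implementation each copy would need its own mesh (a deep copy), since copies of the same particle are perturbed independently in the next expansion step.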

Experimental Results

In this section we compare our method with state-of-the-art object recognition systems and show its advantages. Since our goal is object identification, we do not compare to object class recognition approaches. We show that MESH-AMCR is better than existing approaches at both geometric verification and distinction between similar objects. To demonstrate this, we show that MESH-AMCR can deal with all cases in which traditional methods work, and that it can additionally solve cases in which state-of-the-art methods fail.

Example. Before describing our results on large object databases, we illustrate MESH-AMCR with an example. In this example we assume that an image of a deformed banknote has to be recognized among 4 candidate objects retrieved by the vocabulary tree scheme. One of the prototype images contains the same banknote, but non-deformed. While we assume that the prototype images are segmented, the input image is not segmented and contains background clutter. Figure 4 shows how recognition is performed and how deformations can be recovered by deforming the meshes of the M-Particles.

Efficient Implementation. Although several hypotheses are propagated, our method is still efficient. The number of V-particles is quite low (10-20). In each iteration only one triangular cell of each M-Mesh has to be compared to the corresponding cell of its V-Mesh. Since several M-Particles are linked to the same V-Particle, the pixels of each V-Particle’s cell have to be sampled only once. Comparing the triangular cells subject to their affine transformation can be done efficiently using standard graphics accelerator hardware (e.g. using DirectX).

Figure 4: Red (dark) triangles show the position of V- and M-Particles. The yellow (light) triangles show the corresponding meshes. The pixels within the M-Meshes are texture mapped from their corresponding V-Meshes. In (a) V- and M-Particles are initialized according to well-matching PCA-SIFT features. (b) shows the situation after 3 iterations. In (c) only the maximum a posteriori (MAP) estimate of the M-Particle and its corresponding V-Particle is shown. The meshes of the MAP V- and M-particles specify the correspondence between the input image and the correctly identified prototype image.

Experiments on rigid object databases. To evaluate our method we use the wide-baseline stereo dataset of the Amsterdam Library of Object Images (ALOI) (Geusebroek, Burghouts, & Smeulders 2005), containing realistic images of 1000 objects, each captured from three different viewpoints (left, center, right) rotated by 15 degrees each. We used all 1000 right images to learn the vocabulary tree. As test images we used the left images (differing by 30 degrees), which we additionally scaled down to 40 percent of the original size and rotated by 45 degrees. For the geometric verification phase, we implemented a reference method according to current approaches like (Lowe 2004) and (Ke, Sukthankar, & Huston 2004). They use affine transformations to explain the geometric layout of keypoint matches. In our reference method, we use RANSAC (Fischler & Bolles 1981) to verify the geometric validity of keypoint matches based on affine transformations. Using this approach we get a recognition rate of 72.8 percent on the 1000-object database. For comparison we applied MESH-AMCR to the same dataset and test images. Here we used only 6 iterations, leading to hypothetical correspondence meshes with 6 triangular cells each. As the recognized object we take the candidate image that has the most M-Particles at the end. In doing so, we get a recognition rate of 85.2 percent. Thus, MESH-AMCR performs significantly better. MESH-AMCR correctly recognized 46.1 percent of the cases that were falsely classified by the affine-transformation-based RANSAC method.

The failures of the reference method typically occur when the image of one of the objects, input object or prototype object, has only few keypoints. In the work of (Ke, Sukthankar, & Huston 2004), 5 keypoint matches are required as the minimal configuration that satisfies geometric constraints. In the approach of (Lowe 2004), as few as 3 keypoint matches constitute the minimal valid constellation. This design choice was made to allow recognition of objects with few keypoints. Unfortunately, the fewer keypoint matches are required, the higher the chance of finding a matching group in a wrong prototype image.

The advantage of our method is that geometric verification is not based on the keypoint descriptors but on the original image data referenced by the constructed correspondence meshes. Thus, our source of information is much richer for both discrimination and geometric verification. We only need one keypoint match as the igniting spark for mesh construction at the beginning. Thus, while requiring fewer keypoint matches to initiate verification, we still have richer information to check for geometric constraints.
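The decision rule used in the experiments above (the recognized object is the candidate image holding the most M-Particles after the last iteration) can be written in a few lines. The function name `recognized_prototype` and the representation of the particle set as a list of prototype indices are assumptions of this sketch, as is the handling of ties.

```python
from collections import Counter


def recognized_prototype(prototype_indices):
    """Return the prototype index k that holds the most M-particles.

    prototype_indices: the index k_j of every surviving M-particle.
    Ties are resolved in favour of the index encountered first, which is
    an arbitrary choice of this sketch.
    """
    counts = Counter(prototype_indices)
    if not counts:
        raise ValueError("no M-particles left to vote")
    return counts.most_common(1)[0][0]
```

Because resampling concentrates particles in well-matching prototypes, this vote is a simple estimate of which prototype carries most of the probability mass.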

Conclusions

In this paper, we have proposed MESH-AMCR, a new algorithm for object identification that has its origins in methods for mobile robot localization. Besides the advantages of better discrimination and recognition of deformable objects, our approach has some conceptual advantages: First, recognition is performed in an iterative way such that other algorithms like tracking can run in parallel. Second, it does not assume a prior segmentation of the input images. Third, it is able to propagate several hypotheses simultaneously. Hence, the simultaneous recognition of several objects is possible, too. Additionally, the algorithm yields precise correspondence in the form of corresponding mesh cells. There are plenty of possibilities for extending the approach: One of the most interesting is how to guide the V-Particles to parts that let us best discriminate among competing hypotheses, and how to dynamically adjust the measurement models of the V-Particles for best discrimination. Another interesting question is how the meshes could be used to recover shading models to allow larger variations in lighting conditions.

References

Belongie, S.; Malik, J.; and Puzicha, J. 2002. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4):509–522.

Berg, A. C.; Berg, T. L.; and Malik, J. 2005. Shape matching and object recognition using low distortion correspondence. In IEEE Computer Vision and Pattern Recognition (CVPR) 2005.

Edelman, S.; Intrator, N.; and Poggio, T. 1997. Complex cells and object recognition. Unpublished manuscript: http://kybele.psych.cornell.edu/edelman/archive.html.

Fei-Fei, L.; Fergus, R.; and Perona, P. 2004. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. 178–178.

Fischler, M. A., and Bolles, R. C. 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6):381–395.

Geusebroek, J. M.; Burghouts, G. J.; and Smeulders, A. W. M. 2005. The Amsterdam library of object images. Int. J. Comput. Vision 61(1):103–112.

Ke, Y., and Sukthankar, R. 2004. PCA-SIFT: A more distinctive representation for local image descriptors. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), volume 2, 506–513.

Ke, Y.; Sukthankar, R.; and Huston, L. 2004. Efficient near-duplicate and sub-image retrieval. In Proceedings of ACM Multimedia.

Kushal, A., and Ponce, J. 2006. Modeling 3D objects from stereo views and recognizing them in photographs. In European Conference on Computer Vision (ECCV).

Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2):91–110.

Matas, J.; Chum, O.; Urban, M.; and Pajdla, T. 2002. Robust wide baseline stereo from maximally stable extremal regions. In BMVC, 384–393.

Mikolajczyk, K., and Schmid, C. 2004. Scale and affine invariant interest point detectors. International Journal of Computer Vision 60(1):63–86.

Mutch, J., and Lowe, D. G. 2006. Multiclass object recognition with sparse, localized features. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), to appear.

Nistér, D., and Stewénius, H. 2006. Scalable recognition with a vocabulary tree. In CVPR 2006.

Serre, T.; Wolf, L.; and Poggio, T. 2005. Object recognition with features inspired by visual cortex. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 994–1000.

Thrun, S.; Fox, D.; Burgard, W.; and Dellaert, F. 2001. Robust Monte Carlo localization for mobile robots. Artificial Intelligence.

von Hundelshausen, F., and Veloso, M. 2006. Active Monte Carlo recognition. In Proceedings of the 29th Annual German Conference on Artificial Intelligence. Springer Lecture Notes in Artificial Intelligence.