geometric hashing visual recognition lecture 9 “answer me speedily” psalm, 17

Geometric Hashing

Visual Recognition

Lecture 9

“Answer me speedily” Psalm, 17

Main approaches to recognition:

Pattern recognition

Invariants

Alignment

Part decomposition

Functional description

Geometric Hashing

A technique for model-based recognition of 3-D objects from unknown viewpoints using single gray-scale images

Especially useful for recognition of scenes with overlapping and partially occluded objects

An efficient matching algorithm which assumes an affine approximation

The algorithm has an off-line model preprocessing phase and a recognition phase to reduce matching complexity

Successfully tested in recognition of flat industrial objects appearing in composite occluded scenes.

Definition of the Problem

Object recognition in a cluttered 3-D scene

The models of the objects to be recognized are assumed to be known in advance

The objects in the scene may overlap and also be partially occluded by other (unknown) objects

The image may be obtained from an arbitrary viewpoint

At this stage we will assume that we are dealing with flat objects

Pliers and their composite scene – observe different lengths of handles in the composite scene due to tilt

We assume that the depth of the centroids of the objects in the scene is large compared to the focal length of the camera, and that the depth variation of the objects is small compared to the depth of their centroids

Under these assumptions it is well known that the perspective projection is well approximated by a parallel (orthographic) projection with a scale factor

Hence, two different images of the same flat object are in an affine 2-D correspondence

There is a non-singular 2×2 matrix A and a 2-D (translation) vector b such that each point x in the first image is mapped to the corresponding point Ax + b in the second image
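As a minimal sketch of this correspondence, the following Python maps the points of a first view to a second view by x → Ax + b. The matrix `A`, the vector `b` and the sample points are made-up values chosen for illustration; they are not data from the lecture.

```python
def apply_affine(A, b, x):
    """Map a 2-D point x to A x + b (A is a 2x2 nested tuple, b a 2-tuple)."""
    return (A[0][0] * x[0] + A[0][1] * x[1] + b[0],
            A[1][0] * x[0] + A[1][1] * x[1] + b[1])

def is_nonsingular(A, tol=1e-12):
    """A must be invertible, otherwise the flat object collapses onto a line."""
    return abs(A[0][0] * A[1][1] - A[0][1] * A[1][0]) > tol

# Hypothetical transformation and interest points.
A = ((2.0, 1.0), (0.0, 1.5))
b = (3.0, -1.0)
first_view = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
second_view = [apply_affine(A, b, x) for x in first_view]
```

Because the map is affine, parallel lines stay parallel but lengths and angles may change, which is consistent with the different handle lengths seen in the tilted pliers scene.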

Our problem is:

to recognize the objects in the scene, and for each recognized object to find the affine transformation that gives the best least-squares fit between the model of the object and its transformed image in the scene.

Choice of ‘Interest Points’

The matching algorithm extracts ‘interest points’ both in the object model images and in the scene image to find the best match between those point sets

Point extraction methods should be database dependent. Different databases of models will suggest different natural ‘interest points’

For example, a DB of polyhedral objects naturally suggests the use of polyhedral vertices as ‘interest points’, while ‘curved’ objects suggest the use of sharp convexities, deep concavities and, maybe, zero-curvature points

Extracted interest points in the composite scene image

‘Interest points’ do not have to appear physically in the image. For example, a point may be taken as the intersection of two non-parallel line segments, which are not necessarily touching.

An ‘interest point’ does not necessarily have to correspond to a geometrical feature (e.g. an ‘interest operator’ based on high variance in intensity - Barnard 1980)

The basic assumption is that enough ‘interest points’ can be extracted in the relevant images

No special classification of these points is assumed

Recognition of a Single Model

An affine transformation of the plane is uniquely defined by the transformation of three non-collinear points

There is a unique affine transformation which maps any non-collinear triplet in the plane to another non-collinear triplet

Hence we may extract interest points on the model and the scene and try to match non-collinear triplets of such points to obtain candidate affine transformations

Each such transformation can be checked by matching the transformed model against the scene (classical alignment, see Huttenlocher and Ullman 87)
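The step from a matched pair of triplets to a candidate transformation can be sketched as follows. This is an illustrative solver only: the function name `affine_from_triplets` and the sample coordinates are assumptions and do not come from the lecture.

```python
def affine_from_triplets(src, dst):
    """Return (A, b) such that A p + b = q for the three point pairs (p, q)."""
    (p0, p1, p2), (q0, q1, q2) = src, dst
    # Edge vectors of the source and destination triangles.
    s1 = (p1[0] - p0[0], p1[1] - p0[1])
    s2 = (p2[0] - p0[0], p2[1] - p0[1])
    d1 = (q1[0] - q0[0], q1[1] - q0[1])
    d2 = (q2[0] - q0[0], q2[1] - q0[1])
    det = s1[0] * s2[1] - s2[0] * s1[1]
    if abs(det) < 1e-12:
        raise ValueError("source triplet is collinear")
    # A = D S^-1, where S = [s1 s2] and D = [d1 d2] hold the edge vectors as columns.
    a11 = (d1[0] * s2[1] - d2[0] * s1[1]) / det
    a12 = (d2[0] * s1[0] - d1[0] * s2[0]) / det
    a21 = (d1[1] * s2[1] - d2[1] * s1[1]) / det
    a22 = (d2[1] * s1[0] - d1[1] * s2[0]) / det
    # The translation follows from the first correspondence: b = q0 - A p0.
    b = (q0[0] - (a11 * p0[0] + a12 * p0[1]),
         q0[1] - (a21 * p0[0] + a22 * p0[1]))
    return ((a11, a12), (a21, a22)), b

# Hypothetical model triplet and the scene triplet it is matched to.
A, b = affine_from_triplets(((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)),
                            ((3.0, -1.0), (5.0, -1.0), (4.0, 0.5)))
```

In the alignment scheme every candidate (A, b) obtained this way would still have to be verified against the remaining scene points, which is the cost that geometric hashing tries to avoid.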

Quite Unfavorable Complexity

Given m points in the model and n points in the scene, the worst-case complexity is $(m \cdot n)^3 \, t$, where t is the complexity of matching the model against the scene

If we assume that m and n are of the same magnitude, and t is at least of magnitude m, the worst-case complexity is of order $n^7$!

One way to reduce complexity is to classify the points in a distinctive way, so that each triplet can match only a small number of other triplets (however, such a distinction might not exist or cannot be made in a reliable way)

A more efficient triplet matching algorithm: geometric hashing (GH)

The Algorithm

Two major steps:

A preprocessing step, which is applied to the model points, does not use any information about the scene, and is executed off-line before actual matching is attempted

Proper matching, which uses the data prepared by the first step to match the models against the scene. The execution time of this second step is the actual recognition time

Independence that allows comparison

An affine transformation is uniquely defined by the transformation of three non-collinear points

Consider a set of m points and pick any ordered subset of three non-collinear points

The two linearly independent vectors based on these points are a 2-D linear basis

The coordinates of all model points can be expressed in this basis

Affine Transformation

Any affine transformation applied to the point set will not change the set of coordinates based on the same ordered basis triplet

Let $e_{00}, e_{10}, e_{01}$ be an ordered affine basis triplet in the plane

The affine coordinates $(\xi, \eta)$ of a point v are:

$$v = \xi (e_{10} - e_{00}) + \eta (e_{01} - e_{00}) + e_{00}$$

Application of an affine transformation T will transform the point v to:

$$Tv = \xi (Te_{10} - Te_{00}) + \eta (Te_{01} - Te_{00}) + Te_{00}$$

Hence $Tv$ has the same coordinates $(\xi, \eta)$ in the basis triplet $Te_{00}, Te_{10}, Te_{01}$
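The invariance can be checked numerically. The sketch below solves the 2×2 system for the affine coordinates of a point and compares them before and after an affine map; the helper names and all numeric values are illustrative assumptions.

```python
def affine_coords(v, e00, e10, e01):
    """Solve v = xi*(e10 - e00) + eta*(e01 - e00) + e00 for (xi, eta)."""
    ax, ay = e10[0] - e00[0], e10[1] - e00[1]
    bx, by = e01[0] - e00[0], e01[1] - e00[1]
    wx, wy = v[0] - e00[0], v[1] - e00[1]
    det = ax * by - bx * ay  # zero exactly when the triplet is collinear
    if abs(det) < 1e-12:
        raise ValueError("basis triplet is collinear")
    # Cramer's rule for the 2x2 linear system.
    return ((wx * by - bx * wy) / det, (ax * wy - wx * ay) / det)

def T(p):
    """A hypothetical affine map x -> A x + b with A = ((2, 1), (0, 1.5)), b = (3, -1)."""
    return (2.0 * p[0] + 1.0 * p[1] + 3.0, 1.5 * p[1] - 1.0)

basis = ((1.0, 1.0), (4.0, 2.0), (2.0, 5.0))
v = (3.0, 3.0)
before = affine_coords(v, *basis)
after = affine_coords(T(v), *(T(e) for e in basis))
```

Here `before` and `after` agree up to floating-point rounding, which is the property that lets model and scene coordinates be compared through a hash table.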

Preprocessing

Given an image of a model, where m ‘interest points’ have been extracted

For each ordered non-collinear triplet of points, the coordinates of all other m-3 model points are computed taking this triplet as an affine basis of the 2-D plane

Each such coordinate (after a proper quantization) is used as an entry to a hash-table, where we record the number of the basis-triplet at which the coordinate was obtained and the number of the model (in case of more than one model)

The complexity of this preprocessing step is of order $m^4$ per model

New models added to the DB can be processed independently without re-computing the hash-table
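A minimal sketch of the preprocessing step is given below. It assumes a uniform quantization with a made-up bin size and identifies each basis triplet by the tuple of its point indices instead of a running number; neither choice is prescribed by the lecture.

```python
from collections import defaultdict
from itertools import permutations

def affine_coords(v, e00, e10, e01):
    """(xi, eta) of v in the basis triplet, or None if the triplet is collinear."""
    ax, ay = e10[0] - e00[0], e10[1] - e00[1]
    bx, by = e01[0] - e00[0], e01[1] - e00[1]
    wx, wy = v[0] - e00[0], v[1] - e00[1]
    det = ax * by - bx * ay
    if abs(det) < 1e-12:
        return None
    return ((wx * by - bx * wy) / det, (ax * wy - wx * ay) / det)

def quantize(coord, bin_size):
    """Hypothetical uniform quantization of an affine coordinate pair."""
    return (round(coord[0] / bin_size), round(coord[1] / bin_size))

def build_hash_table(models, bin_size=0.25):
    """models: dict mapping a model id to its list of (x, y) interest points."""
    table = defaultdict(list)
    for model_id, pts in models.items():
        # All ordered triplets: m(m-1)(m-2) bases, each with m-3 other points.
        for basis in permutations(range(len(pts)), 3):
            e00, e10, e01 = (pts[i] for i in basis)
            for k, p in enumerate(pts):
                if k in basis:
                    continue
                c = affine_coords(p, e00, e10, e01)
                if c is not None:  # skip collinear triplets
                    table[quantize(c, bin_size)].append((model_id, basis))
    return table

table = build_hash_table({"demo": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]})
```

With m = 4 points in general position the table receives 4·3·2·1 = 24 entries, matching the $m^4$ growth. A second model can be added by running the same loops over its points alone and appending to the existing table.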

Recognition

Given an image of a scene, where n ‘interest points’ have been extracted

Choose an arbitrary ordered triplet in the scene and compute the coordinates of the scene points taking this triplet as an affine basis

For each such coordinate check the appropriate entry in the hash-table, and for every pair (model number, basis-triplet number) which appears there, tally a vote for the model and the basis-triplet as corresponding to the triplet which was chosen in the scene (if there is only one model, we have to vote for the basis triplet alone)

If a certain pair (model, basis-triplet) scores a large number of votes, decide that this triplet corresponds to the one chosen in the scene

The uniquely defined affine transformation between these triplets is assumed to be the transformation between the model and the scene

If the current triplet doesn’t score high enough, pass to another basis-triplet in the scene
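The voting step can be sketched as below. The block rebuilds a small table so that it runs on its own; the bin size, the model name "pliers", the point coordinates and the two clutter points are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

BIN = 0.25  # hypothetical quantization step

def affine_coords(v, e00, e10, e01):
    """(xi, eta) of v in the basis triplet, or None if the triplet is collinear."""
    ax, ay = e10[0] - e00[0], e10[1] - e00[1]
    bx, by = e01[0] - e00[0], e01[1] - e00[1]
    wx, wy = v[0] - e00[0], v[1] - e00[1]
    det = ax * by - bx * ay
    if abs(det) < 1e-12:
        return None
    return ((wx * by - bx * wy) / det, (ax * wy - wx * ay) / det)

def bin_of(coord):
    return (round(coord[0] / BIN), round(coord[1] / BIN))

def build_table(models):
    """Off-line step: hash every model point under every ordered basis triplet."""
    table = defaultdict(list)
    for model_id, pts in models.items():
        for basis in permutations(range(len(pts)), 3):
            for k, p in enumerate(pts):
                if k in basis:
                    continue
                c = affine_coords(p, *(pts[i] for i in basis))
                if c is not None:
                    table[bin_of(c)].append((model_id, basis))
    return table

def vote(table, scene, scene_basis):
    """Tally votes for (model, basis) pairs given one ordered scene triplet."""
    votes = Counter()
    basis_pts = [scene[i] for i in scene_basis]
    for k, p in enumerate(scene):
        if k in scene_basis:
            continue
        c = affine_coords(p, *basis_pts)
        if c is None:  # collinear scene triplet: no usable basis
            break
        for entry in table.get(bin_of(c), ()):
            votes[entry] += 1
    return votes

model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (-1.0, 3.0)]
table = build_table({"pliers": model})
# Scene: the model under x -> A x + b with A = ((2, 1), (0, 1.5)), b = (3, -1),
# followed by two clutter points that belong to no model.
scene = [(2 * x + y + 3, 1.5 * y - 1) for x, y in model] + [(10.0, 10.0), (-4.0, 6.0)]
votes = vote(table, scene, (0, 1, 2))
```

The scene triplet (0, 1, 2) is the image of the model triplet (0, 1, 2), so the pair `("pliers", (0, 1, 2))` collects one vote for each of the two remaining model points, while the clutter points fall into other bins and add nothing to it. In practice the acceptance threshold on the vote count would have to be chosen for the data at hand.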

Some Remarks

For the algorithm to be successful it is enough, theoretically, to pick three non-collinear points in the scene belonging to one model.

The voting process, per triplet, is linear in the number of points in the scene.

Hence, the overall recognition time depends on the number of model points in the scene, and on the number of additional ‘interest points’ which belong to the scene but do not appear on any of the models

In the worst case, we might have an order of $n^4$ operations

When the number of models is small the algorithm will be much faster

If there are k model points in a scene of n points, the probability of not choosing a model triplet in t trials is approximately:

$$p = \left(1 - \left(\frac{k}{n}\right)^3\right)^t$$

Hence, for a given $\epsilon$, if we assume a lower bound $d$ on the ‘density’ of model points in a scene, $1 \ge \frac{k}{n} \ge d$, then the number of trials t giving $p < \epsilon$ is of order

$$\frac{\log \epsilon}{\log (1 - d^3)},$$

which is a constant independent of n

Since the verification process is linear in n we have an algorithm of complexity $O(n)$ which will succeed with probability of at least $1 - \epsilon$.
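To get a feeling for the size of this constant, the sketch below evaluates the miss probability and the number of trials for sample values. The density d = 0.5 and the tolerance 0.01 are arbitrary illustrative choices.

```python
import math

def miss_probability(k, n, t):
    """Approximate probability that t random triplets all miss the model."""
    return (1.0 - (k / n) ** 3) ** t

def trials_needed(d, eps):
    """Smallest integer t with (1 - d^3)^t <= eps, for model density at least d."""
    return math.ceil(math.log(eps) / math.log(1.0 - d ** 3))

t = trials_needed(d=0.5, eps=0.01)
p = miss_probability(k=8, n=16, t=t)
```

For these values t is 35: with half of the scene points belonging to the model, 35 random basis triplets already bring the failure probability below one percent, regardless of how large n is.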

Close Basis Points

Numerical errors in the point coordinates are more severe when the basis points are close to each other compared to the other model points in the scene

To overcome this problem:

If a certain basis triplet gets a number of votes which, on the one hand, is not enough to accept it as a ‘candidate’ basis but, on the other hand, does not justify total rejection, replace this triplet with another triplet consisting of points which were among the ‘voting’ coordinates and are more distant from each other than the previous basis points.

In the correct case this procedure will result in a growing match, as the numerical errors become less significant

Even if a basis-triplet belonging to some model did not get enough votes due to noisy data, we still have a chance to recover this model from another basis-triplet

Finding the Best Least-Squares Match

Assume that we are looking for an affine match between the sequences of planar points $\{u_j\}_{j=1}^n$ and $\{v_j\}_{j=1}^n$

We would like to find the affine transformation $Tu = Au + b$ of the plane which minimizes the $\ell^2$ distance between the sequences $\{Tu_j\}_{j=1}^n$ and $\{v_j\}_{j=1}^n$:

$$\min_T \sum_{j=1}^n \| Tu_j - v_j \|^2 .$$

To simplify the calculation, first translate the set $\{u_j\}$ so that

$$\sum_{j=1}^n u_j = 0 .$$

Then

$$\min_{A,b} \sum_{j=1}^n \| Au_j + b - v_j \|^2 = \min_{A,b} \left( \sum_{j=1}^n \| b - v_j \|^2 + 2 \sum_{j=1}^n Au_j \cdot b + \sum_{j=1}^n \| Au_j \|^2 - 2 \sum_{j=1}^n Au_j \cdot v_j \right) .$$

But

$$\sum_{j=1}^n b \cdot Au_j = b \cdot A \sum_{j=1}^n u_j = 0 .$$

Hence b and A appear independently in the sum above, and we can minimize their contributions separately.

To minimize over b we simply put

$$b = \frac{1}{n} \sum_{j=1}^n v_j .$$

As to $A = (a_{ij})$, $i = 1, 2$; $j = 1, 2$, denote

$$g(A) = g(a_{11}, a_{12}, a_{21}, a_{22}) = \sum_{j=1}^n \| Au_j \|^2 - 2 \sum_{j=1}^n Au_j \cdot v_j .$$

We have to find

$$\min_A g(A) = \min g(a_{11}, a_{12}, a_{21}, a_{22}) .$$

To find this minimum one has to solve the following system of 4 equations

$$(*) \qquad \frac{\partial g}{\partial a_{ij}} = 0, \qquad i = 1, 2; \; j = 1, 2 .$$

Since g is a quadratic function in each of its unknowns, (*) is a system of four linear equations with four unknowns.

(Actually two independent sets of two linear equations with two unknowns)

For i = 1, 2 define the following four n-dimensional vectors:

$$U^i = (u_j^i)_{j=1}^n, \qquad V^i = (v_j^i)_{j=1}^n ,$$

where $u_j^i$ and $v_j^i$ denote the i-th coordinates of $u_j$ and $v_j$.

The solution of (*) is given by

$$a_{11} = \frac{(U^1 \cdot V^1)(U^2 \cdot U^2) - (U^2 \cdot V^1)(U^1 \cdot U^2)}{\Delta}$$

$$a_{12} = \frac{(U^1 \cdot U^1)(U^2 \cdot V^1) - (U^1 \cdot V^1)(U^1 \cdot U^2)}{\Delta}$$

$$a_{21} = \frac{(U^1 \cdot V^2)(U^2 \cdot U^2) - (U^1 \cdot U^2)(U^2 \cdot V^2)}{\Delta}$$

$$a_{22} = \frac{(U^1 \cdot U^1)(U^2 \cdot V^2) - (U^1 \cdot V^2)(U^1 \cdot U^2)}{\Delta}$$

Where

$$\Delta = (U^1 \cdot U^1)(U^2 \cdot U^2) - (U^1 \cdot U^2)(U^1 \cdot U^2) .$$

As we can see, $\Delta$ depends only on one set of points (in this case the model points), so we can know in advance which sets of model points will give a solution for the minimum.
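The closed-form solution can be sketched in a few lines. The function below centers the first point set, evaluates the four dot-product formulas, and then shifts the translation back to the original (uncentered) coordinates; the name `best_affine_fit` and the test data are illustrative assumptions.

```python
def best_affine_fit(us, vs):
    """Least-squares (A, b) minimizing sum ||A u_j + b - v_j||^2 over paired points."""
    n = len(us)
    cu = (sum(p[0] for p in us) / n, sum(p[1] for p in us) / n)
    cv = (sum(p[0] for p in vs) / n, sum(p[1] for p in vs) / n)
    # Translate the u set so that its coordinates sum to zero.
    U1 = [p[0] - cu[0] for p in us]
    U2 = [p[1] - cu[1] for p in us]
    V1 = [p[0] for p in vs]
    V2 = [p[1] for p in vs]

    def dot(a, c):
        return sum(x * y for x, y in zip(a, c))

    delta = dot(U1, U1) * dot(U2, U2) - dot(U1, U2) ** 2
    if abs(delta) < 1e-12:
        raise ValueError("degenerate point set (all u points collinear)")
    a11 = (dot(U1, V1) * dot(U2, U2) - dot(U2, V1) * dot(U1, U2)) / delta
    a12 = (dot(U1, U1) * dot(U2, V1) - dot(U1, V1) * dot(U1, U2)) / delta
    a21 = (dot(U1, V2) * dot(U2, U2) - dot(U1, U2) * dot(U2, V2)) / delta
    a22 = (dot(U1, U1) * dot(U2, V2) - dot(U1, V2) * dot(U1, U2)) / delta
    # For centered u the optimal translation is the mean of v; undo the centering.
    b = (cv[0] - (a11 * cu[0] + a12 * cu[1]),
         cv[1] - (a21 * cu[0] + a22 * cu[1]))
    return ((a11, a12), (a21, a22)), b

# Hypothetical noise-free data: vs is us under A = ((2, 1), (0, 1.5)), b = (3, -1).
us = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (-1.0, 3.0)]
vs = [(2 * x + y + 3, 1.5 * y - 1) for x, y in us]
A, b = best_affine_fit(us, vs)
```

On noise-free data the fit recovers the generating transformation up to rounding. With noisy correspondences it would return the transformation of minimal squared error, which is why it serves as a refinement of the three-point estimate.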

Left – Pliers rotated and tilted in space (see different length of handles). Right – Extracted ‘interest points’

Left – A fit obtained by calculating the affine transformation from three basis points. Right – Same model is fitted using the best least-squares affine match based on 10 points (all of which were recovered by the transformation obtained in the left image)

Summary of the Algorithm

Our algorithm can be summarized as follows:

A. Represent the model objects by sets of ‘interest points’

B. For each non-collinear triplet of model points compute the coordinates of all the other model points according to this basis triplet and hash these coordinates into a table which stores all the pairs (model number, basis triplet number) for every coordinate

C. Given an image of a scene extract its interest points, choose a triplet of non-collinear points as a basis triplet and compute the coordinates of the other points in this basis. For each such coordinate vote for the pairs (model number, basis triplet number), and find the pairs which obtained the most coincidence votes. If a certain pair scored a large number of votes, decide that its model and basis triplet correspond to the one chosen in the scene. If not, continue by checking another basis triplet

D. For each candidate model and basis triplet from the previous step, establish a correspondence between the model points and the appropriate scene points, and find the affine transformation giving the best least-squares match for these corresponding sets. If the least-squares difference is too big go back to Step C for another candidate triplet. Finally, the transformed model is compared with the scene (this time we are considering not only previously extracted ‘interest points’). If this comparison gives a bad result go back again to Step C.

Recognition under Similarity

The situation when the viewing angle of the camera is the same both for the model and the image (e.g. an industrial setting)

Similarity is a special case of affine – no change is needed

A similarity transformation preserves angles, so two points are enough to form a basis which spans the 2-D plane (the third point is uniquely defined by the two)

Same algorithm with pairs instead of triplets

Complexity is reduced for preprocessing by a factor of m and for the worst case of the recognition by a factor of n
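A two-point basis can be sketched compactly by treating planar points as complex numbers, in which case a similarity (rotation, uniform scale and translation, without reflection) is the map z → a·z + c. This representation and all numeric values below are illustrative assumptions, not part of the lecture.

```python
import cmath

def similarity_coords(p, e0, e1):
    """Coordinate of p in the two-point basis (e0, e1), all as complex numbers."""
    return (p - e0) / (e1 - e0)

def similarity(z, scale, angle, shift):
    """Rotate by angle, scale uniformly, then translate (no reflection)."""
    return scale * cmath.exp(1j * angle) * z + shift

e0, e1, p = 0 + 0j, 2 + 0j, 1 + 3j
before = similarity_coords(p, e0, e1)
moved = [similarity(z, 1.7, 0.6, 4 - 2j) for z in (p, e0, e1)]
after = similarity_coords(*moved)
```

The coordinate is unchanged by the transformation (up to rounding), so ordered pairs of points can replace ordered triplets in both the preprocessing and the voting loops.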

Line Matching

Extraction of points might be quite noisy. A line is a more stable feature than a point. Whenever lines can be extracted in a reliable way, e.g. scenes of polyhedral objects, we can apply similar procedures to lines

All the point matching techniques apply directly to lines, since lines can be viewed as points in the dual space

Three lines which have no parallel pair are a basis of the affine space; each line has unique coordinates in this basis

We repeat exactly the matching procedure as is

We can use line segments to reduce the complexity of the matching algorithm

If the endpoints of line segments can be reliably extracted, then instead of a triplet of points or lines as a basis, we can take a line segment plus an additional point. The reduction of complexity is significant

Since an affine transformation maps collinear points into collinear points and points of line intersection into points of the same line intersection, we may develop algorithms which combine point and line information

For example, even if the algorithm utilizes point triplets as an affine basis, the verification can be done not only on other ‘interest points’ coordinates, but also on line equations, etc.

Experimental Results

Recognition results of a composite overlapping scene of both pliers, which was also significantly tilted

In the scene we have additional ‘interest points’ which are created by the superposition of the two objects. These points do not correspond to the ‘interest points’ of the original models

A number of the original ‘interest points’ are occluded in the scene

The total number of ‘interest points’ in the scene (next) is 28. 16 of them are unoccluded model points of the second pliers out of 21 original model points (see next)

Extracted interest points in the composite scene image

The recognition algorithm was run on all the possible basis triplets of the scene.

For each triplet we found the set of best (maximum vote) matching model triplets.

The number of points identified by such a triplet as model points is the so-called no. of votes in the first column of the table

The second column gives the number of triplets which obtained these votes

The third column gives the number of triplets which were verified as belonging to the model (correct triplets).

Experimental Results

Remarks:

a) Since we have 16 model points in the scene, we expect a maximum of 13 votes for a correct triplet.

b) Since all 6 ordered occurrences of the same unordered triplet will give the same voting result, unordered triplets are counted in the statistics. In the algorithm we are dealing with ordered triplets; thus, for example, we have 4x6=24 ordered basis triplets with the maximal number of votes.

The former composite pliers scene with an additional object which does not belong to the model database

Conclusions

The method is based on the representation of objects by point sets and matching corresponding sets of points

By applying geometric constraints these sets of points can be further represented by a small subset of points (basis points)

The size of the basis depends on the transformation applied to the models

A basis of 2 points is sufficient for 2-D scenes under rotation, translation and scale

A basis of 3 points is sufficient for the affine transformation approximating the perspective view

The process is divided into preprocessing and recognition – reduces complexity, enables off-line preprocessing

Error Analysis

Analysis of the effect of noise on the accuracy of the measurements obtained from the image

Feature coordinates are quantized before hashing

In the presence of noise there might be some error in the extracted values of the coordinates

This may result in accessing incorrect bins of the hash table

Calculate the range of hash table bins which are consistent with the feature coordinates extracted in the presence of noise

By accessing all these bins we ensure that votes for the correct solution will not be lost

Redundancy factor

The need to access a range of bins for a given coordinate results in an increased number of candidate (model,basis) pairs participating in the voting

Incorrect (model,basis) pairs might get a high vote at random

To estimate this effect on the likelihood of getting false matches to a given basis we estimate the size of the set of (model,basis) pairs counted for a given image point

The number of bins in the hash table which are consistent with a given coordinate, assuming a certain noise model, is defined as the redundancy factor

Estimate the redundancy factor for the case of point matching under various transformations and estimate the probability of a ‘random’ candidate solution scoring a relatively high vote.

Worst case analysis is assumed

The Probability of False Matches

In order to evaluate the efficiency of the voting stage we have to estimate the average number of solutions that may get a high vote

Given a certain vote threshold and a random pair (model,basis), we would like to know the probability that this random pair gets more votes than the threshold

Although such ‘false solutions’ will be rejected in the following verification stages, their expected number directly affects the computational efficiency of the technique

We assume that each of the bins in the hash table has equal probability to be picked in the voting procedure

Note that the coordinates of the points in different bases are dependent, hence the computation of their distribution is not straightforward, and the former assumption is simplistic

What is the probability that a certain random basis will obtain more than $\tau m$ votes?

Let k be the size of a basis; M the number of models; m the number of features on a model; n the number of features in the image; $\tau$ the fraction of model features serving as an acceptance threshold; N the size of the hash table; b the voting redundancy factor

Assuming one model in the DB, the entries of the hash table contain the information on the bases at which the address coordinates occurred

Given a k-feature basis in the image, the coordinates of all the other n - k features are computed, and each of them votes for a certain bin in the hash table.

Once a k-tuple of basis features in the image is chosen, the coordinates of the n - k other image features are computed, and for each such coordinate the hash table is accessed (approximately) b times

Since each model basis has m - k entries in the N bin table, we assume that each basis has a probability of

$$p = \frac{m - k}{N}$$

to be chosen in a single access

The probability to choose a certain basis B in b accesses is

$$p_b = 1 - (1 - p)^b$$

(for small p it is ~ bp)

The number of votes V scored by a basis B in n - k accesses can be computed using the Binomial Distribution with probability $p_b$, namely $V \sim B(n - k, p_b)$

The probability that V exceeds the threshold $\tau m$:

$$P(V > \tau m) = \sum_{j > \tau m}^{n-k} \binom{n-k}{j} p_b^j (1 - p_b)^{n-k-j}$$
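The chain from p to p_b to the binomial tail can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are made up for this example, the symbol `tau` stands for the acceptance fraction, and the value b = 10 for the redundancy factor is an arbitrary choice, not a figure from the lecture.

```python
from math import comb, floor

def single_access_probability(m, k, N):
    """p = (m - k) / N: chance that one table access hits a given model basis."""
    return (m - k) / N

def basis_hit_probability(p, b):
    """p_b = 1 - (1 - p)^b: chance of hitting the basis at least once in b accesses."""
    return 1.0 - (1.0 - p) ** b

def prob_votes_exceed(n, k, p_b, threshold):
    """P(V > threshold) for V ~ Binomial(n - k, p_b)."""
    trials = n - k
    first = floor(threshold) + 1  # smallest vote count strictly above the threshold
    return sum(
        comb(trials, j) * p_b**j * (1.0 - p_b) ** (trials - j)
        for j in range(first, trials + 1)
    )

# Illustrative values only: b = 10 is an assumed redundancy factor.
m, n, k, N, b, tau = 12, 20, 3, 7200, 10, 0.6
p = single_access_probability(m, k, N)
p_b = basis_hit_probability(p, b)
print(p, p_b, b * p)                          # p_b stays close to b * p for small p
print(prob_votes_exceed(n, k, p_b, tau * m))  # chance of a random basis passing
```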

Since $p_b$ is usually very small and n - k is large, the Binomial Distribution is well approximated by the Poisson Distribution with $\lambda = (n - k) p_b$. Hence the probability $P(V > \tau m)$ that the vote V exceeds the acceptance threshold $\tau m$ is well approximated by:

$$p_B = P(V > \tau m) = \sum_{j > \tau m} P(V = j) \approx 1 - \sum_{j=0}^{\lfloor \tau m \rfloor} \frac{\lambda^j}{j!} e^{-\lambda}$$

The calculation of $p_B$ gave us the probability of one specific basis to be voted as a correct match

However, we are interested in the average number of bases that will be accepted as a correct match

Let $n_B$ be the number of model bases that can be a-priori matched to a given k-tuple basis in the image

$n_B \approx m^2$, since each basis is defined by a pair of model points

Let X be the number of bases that received more than $\tau m$ votes

Then X is modeled by the Binomial Distribution $B(n_B, p_B)$

Hence the expected number of ‘accepted bases’ is $n_B p_B$.

The above calculation is for one basis k-tuple chosen in the image. It increases linearly with the number of image k-tuples (bases) examined and with the number of models M in the data base
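A hedged sketch of the Poisson approximation and of the expected number of accepted random bases follows. The function names, the parameters `image_bases` and `models`, the choice n_B = m^2 (bases defined by pairs of model points, so k = 2), and the value b = 10 are assumptions made for illustration. The tail is summed directly instead of as one minus the lower sum, which avoids cancellation when the tail is tiny.

```python
from math import exp, factorial, floor

def poisson_tail(lam, threshold, terms=200):
    """P(V > threshold) for V ~ Poisson(lam), with threshold >= 0."""
    first = floor(threshold) + 1            # smallest count strictly above the threshold
    term = exp(-lam) * lam**first / factorial(first)
    total = 0.0
    for j in range(first, first + terms):
        total += term
        term *= lam / (j + 1)               # P(V = j + 1) from P(V = j)
    return total

def expected_false_bases(m, n, k, N, b, tau, image_bases=1, models=1):
    """Expected number of random model bases voted above tau * m."""
    p_b = 1.0 - (1.0 - (m - k) / N) ** b    # hit probability in b accesses
    lam = (n - k) * p_b                     # Poisson parameter
    p_B = poisson_tail(lam, tau * m)        # one basis passes the threshold
    n_B = m**2                              # assumed: bases are pairs of model points
    return n_B * p_B * image_bases * models

# Illustrative values only; b = 10 is an assumed redundancy factor.
print(expected_false_bases(m=12, n=20, k=2, N=7200, b=10, tau=0.6))
```

The result scales linearly with `image_bases` and `models`, matching the remark about image k-tuples and the number of models M.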

The probability to score 0.6 m votes.

In the table one can see some typical examples for the expected number of ‘random’ bases achieving a 0.6 m vote. (The total number of bins in this case is 7,200.) One can see that these numbers are very small.

Coordinate Error Estimation

We assume 2-D recognition under affine transformation

We assume that the models can be acquired under ‘ideal’ circumstances (e.g. from a CAD model), hence the preprocessing step is noiseless

In the recognition step, image coordinates of interest points are measured and are represented by 2-D vectors

We may define a norm on this 2-D vector space. We will usually use either the Euclidean norm $L_2$ or the maximum coordinate norm $L_\infty$. Assume that image point measurements introduce an error of at most $\epsilon$ in the given norm.

The computation of the coordinates of an interest point $\tilde{d} = (d_1, d_2)$ in the affine basis (a,b,c) can be formulated as a solution of the linear system of 2 equations in 2 unknowns Ax=d

If a is the origin of the affine basis triplet, then the two columns of the matrix A are the difference vectors of the basis interest points b-a and c-a respectively, and the free vector is $d = \tilde{d} - a$.

These vectors are represented in image coordinates, while the solution vector x gives the representation of the point $\tilde{d}$ in the affine basis (a,b,c) coordinates
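The system Ax=d is small enough to solve in closed form. The following sketch uses Cramer's rule; the function name `affine_coordinates` and the sample points are illustrative assumptions. It also shows why these coordinates are useful as hash addresses: they do not change when the same affine map is applied to the basis and to the point.

```python
def affine_coordinates(a, b, c, point):
    """Solve A x = d, where A = [b - a | c - a] and d = point - a."""
    a11, a21 = b[0] - a[0], b[1] - a[1]     # first column: b - a
    a12, a22 = c[0] - a[0], c[1] - a[1]     # second column: c - a
    d1, d2 = point[0] - a[0], point[1] - a[1]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("basis points are collinear")
    # Cramer's rule for the 2x2 system
    return ((d1 * a22 - a12 * d2) / det, (a11 * d2 - a21 * d1) / det)

def affine_map(p):
    """An arbitrary invertible affine transformation, used only for the demonstration."""
    x, y = p
    return (2.0 * x + y + 5.0, 0.5 * x + 3.0 * y - 2.0)

a, b, c, point = (1.0, 1.0), (3.0, 1.0), (1.0, 4.0), (2.0, 2.5)
before = affine_coordinates(a, b, c, point)
after = affine_coordinates(*(affine_map(p) for p in (a, b, c, point)))
print(before, after)  # the two coordinate pairs agree up to rounding
```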

Taking the errors into account, our task can be formulated as the solution of the following linear system:

$$(A + \delta A)(x + \delta x) = d + \delta d$$

where $\delta A$, $\delta x$ and $\delta d$ are the errors of the matrix A and the vectors x and d respectively

By the nature of our point measurements, we may assume that the absolute values of entries of the matrix $\delta A$ and the vector $\delta d$ are less than some given measurement error $\epsilon$

Note that (e.g. Golub)

$$\frac{\|\delta x\|}{\|x\|} \le k(A) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta d\|}{\|d\|} \right) + O(\epsilon^2)$$

where $k(A) = \|A\| \, \|A^{-1}\|$ is the condition number of the matrix A

The above inequality holds for any vector norm and its corresponding matrix norm.
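A small numerical check of this perturbation bound, in the maximum coordinate norm, is sketched below. The matrix, the vectors, the perturbations and all function names are made-up illustrations; the bound is evaluated without its second-order term, so the comparison is meaningful only for small perturbations.

```python
def inverse_2x2(A):
    (p, q), (r, s) = A
    det = p * s - q * r
    return ((s / det, -q / det), (-r / det, p / det))

def solve_2x2(A, d):
    inv = inverse_2x2(A)
    return tuple(row[0] * d[0] + row[1] * d[1] for row in inv)

def norm_vec(v):
    """Maximum coordinate norm of a vector."""
    return max(abs(t) for t in v)

def norm_mat(A):
    """Matrix norm induced by the maximum coordinate norm (largest absolute row sum)."""
    return max(sum(abs(t) for t in row) for row in A)

def condition_number(A):
    return norm_mat(A) * norm_mat(inverse_2x2(A))

def first_order_bound(A, d, dA, dd):
    """k(A) * (|dA|/|A| + |dd|/|d|), ignoring the second-order term."""
    return condition_number(A) * (norm_mat(dA) / norm_mat(A) + norm_vec(dd) / norm_vec(d))

def relative_error(A, d, dA, dd):
    """Actual |dx| / |x| after perturbing A by dA and d by dd."""
    x = solve_2x2(A, d)
    A_noisy = tuple(tuple(t + e for t, e in zip(row, err)) for row, err in zip(A, dA))
    d_noisy = tuple(t + e for t, e in zip(d, dd))
    x_noisy = solve_2x2(A_noisy, d_noisy)
    return norm_vec([xn - xi for xn, xi in zip(x_noisy, x)]) / norm_vec(x)

A = ((40.0, 10.0), (5.0, 30.0))
d = (20.0, 25.0)
dA = ((1.0, -1.0), (-1.0, 1.0))   # every entry perturbed by one unit
dd = (1.0, -1.0)
print(relative_error(A, d, dA, dd), first_order_bound(A, d, dA, dd))
```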

The above inequality gives an estimate of the maximal relative error which can be introduced by the image measurement noise into the coordinates of the hash-table address x

Hence, the voting procedure reflects this noise:

For an address x all the bins with addresses in the $\delta x$ neighborhood of x participate in the voting.

This ensures that votes for a correct model basis are not missed due to noise

In practice, tighter bounds usually apply

Since appropriate voting bins for each address can be evaluated in advance, we do not expect a correct basis triplet to achieve fewer votes than the corresponding number of unoccluded model points

There still remains the possibility of a ‘random’ basis triplet achieving a large number of votes.

Such a ‘wrong’ candidate will be discovered by two verification procedures that are incorporated in the algorithm

Although ‘wrong’ candidates will be discovered in the verification step and discarded, we would still like to show that the probability of a ‘random’ configuration getting a high vote is small

See simulation results next

Simulation - Affine Transformation

Under the affine transformation we expect a greater effect of noise, since the condition number of the matrix A is no longer bounded by 2

We often have basis triplets with k(A) between 6 and 10

Basis triplets resulting in a matrix A with a relatively big condition number represent unstable solutions, hence are not too informative

Such bases can be eliminated from the recognition process without entering the voting procedure
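Discarding ill-conditioned basis triplets before voting could look like the sketch below. The cutoff `max_condition=10.0` is a hypothetical value chosen for the example (the slides do not state one), the condition number is taken in the maximum coordinate norm, and the function names are illustrative.

```python
from itertools import permutations

def condition_number_inf(a, b, c):
    """Condition number of A = [b - a | c - a] in the maximum coordinate norm."""
    p, r = b[0] - a[0], b[1] - a[1]     # first column of A
    q, s = c[0] - a[0], c[1] - a[1]     # second column of A
    det = p * s - q * r
    if det == 0:
        return float("inf")             # collinear points give no basis at all
    norm_a = max(abs(p) + abs(q), abs(r) + abs(s))
    norm_inverse = max(abs(s) + abs(q), abs(r) + abs(p)) / abs(det)
    return norm_a * norm_inverse

def stable_basis_triplets(points, max_condition=10.0):
    """Keep only the ordered triplets whose matrix is well conditioned."""
    return [t for t in permutations(points, 3)
            if condition_number_inf(*t) <= max_condition]

print(condition_number_inf((0, 0), (1, 0), (0, 1)))    # well-spread triplet
print(condition_number_inf((0, 0), (10, 0), (20, 1)))  # nearly collinear triplet
```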

Relative error in point coordinate for a given basis triplet

This error depends also on the distance of that point from the origin of the basis triplet (see inequality)

Hence, even if the condition number is of moderate size, $\|\delta x\| / \|x\|$ might still be relatively large

Hence, only coordinates with $\|\delta x\| / \|x\|$ under a prescribed threshold participate in the voting procedure

The threshold on $\|\delta x\| / \|x\|$ was taken to be 0.25, namely we have allowed x to deviate by at most 25% of its size in the $L_\infty$ norm

Such thresholding usually resulted in approximately 70% of all the possible coordinate values participating in the voting procedure.
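One way to apply the 0.25 threshold is to evaluate the first-order bound for each image point and let only points under the threshold vote. The sketch below is an assumption-laden illustration: it takes every entry of the matrix and vector errors to be at most `eps`, so that the matrix error norm is at most 2 * eps and the vector error norm at most eps in the maximum coordinate norm, and it drops the second-order term. The function names and sample points are made up.

```python
def relative_error_bound(a, b, c, point, eps):
    """First-order bound on |dx| / |x| for a point in the basis (a, b, c)."""
    p, r = b[0] - a[0], b[1] - a[1]             # first column of A
    q, s = c[0] - a[0], c[1] - a[1]             # second column of A
    d = (point[0] - a[0], point[1] - a[1])      # free vector
    det = p * s - q * r
    norm_d = max(abs(d[0]), abs(d[1]))
    if det == 0 or norm_d == 0:
        return float("inf")
    norm_a = max(abs(p) + abs(q), abs(r) + abs(s))
    norm_inverse = max(abs(s) + abs(q), abs(r) + abs(p)) / abs(det)
    # k(A) * (|dA| / |A| + |dd| / |d|) with |dA| <= 2 * eps and |dd| <= eps
    return norm_a * norm_inverse * (2.0 * eps / norm_a + eps / norm_d)

def participates_in_voting(a, b, c, point, eps, max_relative_error=0.25):
    return relative_error_bound(a, b, c, point, eps) <= max_relative_error

basis = ((0.0, 0.0), (40.0, 0.0), (0.0, 40.0))
print(participates_in_voting(*basis, (30.0, 35.0), eps=1.5))  # far from the origin a
print(participates_in_voting(*basis, (5.0, 3.0), eps=1.5))    # close to the origin a
```

The second point is rejected although the basis is perfectly conditioned, which illustrates the dependence on the distance from the basis origin.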

Percentages of coordinates which were obtained in three different simulations of the recognition having ratio $\|\delta x\| / \|x\|$ or less, with the parameters m=12, n=20, $\epsilon = 1.5$.

Simulation results for m=12, n=20, $\epsilon = 1.5$

Some representative results of the simulation experiments

Number of votes obtained by all the image bases

Estimated probability of random bases matches

The total number of possible model-image bases pairings is $M m^3 n^3$

The probabilities in both columns are of the same magnitude

Simulation results for M=1, m=15, $\epsilon = 2.0$

Although the absolute number of bases with a high vote may look large, we should note that in these cases the search space is much bigger than in the cases of the similarity transformation. Thus the probability of a randomly chosen image basis obtaining a high score remains very low.

Discussion

The recognition part of the Geometric Hashing technique is based on two major stages: voting and verification

Are they both necessary?

Can the voting procedure on its own recover the correct solution only, without introducing false ‘candidates’?

- The examples that we have examined strongly suggest that the voting procedure by itself can serve as a reliable recognition technique only for the case of rigid motion (rotation and translation) and for uncomplicated scenes under the similarity transformation. It cannot be the only procedure in complicated scenes under the affine transformation

Is the voting stage useful? Why not apply the verification stage directly to the candidate solutions?

- The voting stage is just a ‘filtering’ procedure which should eliminate a ‘big chunk’ of false candidate solutions before the direct verification is applied.

A reliable verification procedure is usually quite tedious and time consuming, hence a big time saving can be achieved by avoiding this procedure.

Thus, we have to examine the ratio of the ‘false candidates’ emerging from the voting stage compared with the total number of candidate solutions which have to be examined by direct verification.

This ratio is the ‘filtering factor’ of the voting stage.

The ‘filtering factor’ of the voting stage equals the probability that a false model basis will get a vote above the preset threshold.

The results show that the estimated ‘filtering factor’ of the Geometric Hashing voting stage is quite significant even for the more difficult affine transformation case

Note that the error analysis assumed a worst case error, so that no correct solution would be missed. By using a different (e.g. average case) error model, one can increase the time saved, although the recognition might be somewhat less reliable.

Conclusion: The application of the voting procedure causes a significant reduction in the complexity of recognition.

Extensions

3-D object recognition from range data can be accomplished by similar methods using 3 point bases.

Recognition of non-flat 3-D objects from 2-D images, using the following options:

1. Approximation of the model objects by ‘almost’ planar faces and treating each such face as a model. The problem then reduces to recognition of flat 3-D objects. This method will be especially favorable for polyhedral objects, however it will not apply to objects without a stable polyhedral approximation.

2. Discretization of the space into viewing directions. Given a viewing direction we are faced with a similarity transformation only, whose solution has a reduced complexity. However the procedure will have to register all allowed viewing directions.

3. Looking for 4 point correspondences between the 3-D model and 2-D image. Four non-coplanar points define a 3D basis. Other model points can be represented by their coordinates in this basis. Assuming the affine approximation of the viewing transformation, image points will have the same linear representation by the corresponding four point set. Note, however, that this set is not an affine 2-D basis but only a spanning set, hence the representation is not unique.
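Option 3 can be illustrated with a short sketch: coordinates of a model point in a four-point 3-D basis are computed by Cramer's rule, and the same coefficients reproduce the image of that point from the images of the four basis points under an affine viewing map. The function names, the sample points and the particular 2x3 matrix and translation are assumptions made for the example.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def coords_in_3d_basis(p0, p1, p2, p3, q):
    """Coefficients (x1, x2, x3) with q = p0 + x1 (p1-p0) + x2 (p2-p0) + x3 (p3-p0)."""
    cols = [[p[i] - p0[i] for i in range(3)] for p in (p1, p2, p3)]
    rhs = [q[i] - p0[i] for i in range(3)]

    def det_with(replaced):
        # Matrix whose column `replaced` is swapped for rhs (Cramer's rule)
        use = [rhs if j == replaced else cols[j] for j in range(3)]
        return det3([[use[j][i] for j in range(3)] for i in range(3)])

    base = det_with(None)
    if base == 0:
        raise ValueError("basis points are coplanar")
    return tuple(det_with(j) / base for j in range(3))

def affine_view(point, L, t):
    """Affine approximation of the viewing transformation: 2x3 matrix L plus shift t."""
    return tuple(sum(L[r][i] * point[i] for i in range(3)) + t[r] for r in range(2))

basis = [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (1.0, 4.0, 1.0), (1.0, 1.0, 5.0)]
q = (2.0, 2.5, 3.0)
x = coords_in_3d_basis(*basis, q)

L, t = ((1.0, 0.5, 0.2), (0.1, 0.9, -0.3)), (4.0, -1.0)
i0, i1, i2, i3 = (affine_view(p, L, t) for p in basis)
rebuilt = tuple(i0[r] + sum(x[j] * (img[r] - i0[r]) for j, img in enumerate((i1, i2, i3)))
                for r in range(2))
print(x, rebuilt, affine_view(q, L, t))  # rebuilt matches the projected point
```

Because three 2-D difference vectors are linearly dependent, other coefficient triples reproduce the same image point as well, which is the non-uniqueness noted above.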

Extensions – continued

Implementation of a similar matching procedure based on synthesis of point and line information

Affine invariant curve matching

Recognition of objects using parameterized models

Conclusions

The method is based on the representation of objects by point sets and matching corresponding sets of points

By applying geometric constraints these sets of points can be further represented by a small subset of points (basis points)

The size of the basis depends on the transformation applied to the models

A basis of 2 points is sufficient for 2-D scenes under rotation, translation and scale

A basis of 3 points is sufficient for the affine transformation, which approximates the perspective view

The process is divided into preprocessing and recognition: this reduces complexity and enables off-line preprocessing

Recognize!
