A Project on Generic Object Recognition

-- by Yatharth Saraf

Problem Definition and Background

Recognizing the generic class or category of a given object, as opposed to recognizing specific, individual objects
• Humans are much better at generic recognition
• Machines are more competitive at specific object recognition

Early work by Marr led to the ‘reconstruction school’
• Advocates 3-D reconstruction and modeling before further reasoning about a scene

Current work in object categorization tends to fall in the ‘recognition school’
• Works in the 2-D domain, with 2-D image features and descriptors
• e.g. bag-of-features approaches, and spatial 2-D geometry approaches as in the ‘constellation model’

Applications

• Image database annotation and retrieval
• Video surveillance
• Driver assistance, autonomous robots
• Cognitive support for disabled people

Related Work

Discriminative approaches
• SVM, subspace methods

Bag of features
• Representation of objects with point descriptors

Constellation model
• Representations that take into account the spatial (2-D) geometry of key points
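To make the contrast between these representations concrete, here is a minimal bag-of-features sketch in Python/NumPy: each point descriptor is assigned to its nearest codeword and the image is summarized as a normalized histogram of codeword counts, so the spatial geometry that the constellation model keeps is discarded entirely. The codebook and the function name `bag_of_features` are illustrative assumptions, not part of this project.

```python
import numpy as np

def bag_of_features(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize each descriptor to its nearest codeword; return a normalized histogram.

    descriptors: (N, D) local descriptors from one image.
    codebook:    (K, D) visual words (hypothetical; e.g. k-means centres).
    """
    # Squared Euclidean distance from every descriptor to every codeword: (N, K)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # where each descriptor was in the image is lost here
```

In practice the codebook is commonly learned by clustering descriptors pooled from many training images.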

Assumptions

Images are scale-normalized

Images are clean, i.e. no background clutter/occlusion
• (-) Implies segmentation is necessary as a pre-processing step
• (+) Avoids the problem of exponential search

Outline of the Method (Training)

Detect salient regions in all training images using the Kadir-Brady feature detector
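The Kadir-Brady detector scores a circular region by the Shannon entropy of its local intensity histogram, keeps the scales at which that entropy peaks, and weights each peak by how quickly the histogram changes across scale. The sketch below evaluates this at a single pixel; the bin count, the use of consecutive integer radii and the function name are assumptions, and a full detector evaluates every pixel and then groups nearby salient regions.

```python
import numpy as np

def kadir_brady_saliency(image, x, y, scales, bins=16):
    """Simplified Kadir-Brady saliency at pixel (x, y).

    scales: consecutive integer radii (assumption), e.g. list(range(3, 30)).
    Returns (best_scale, best_saliency); best_scale is None if entropy never peaks.
    """
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist2 = (xx - x) ** 2 + (yy - y) ** 2
    hists, entropies = [], []
    for s in scales:
        region = image[dist2 <= s * s]  # pixels inside a circle of radius s
        p, _ = np.histogram(region, bins=bins, range=(0, 256))
        p = p / p.sum()
        nz = p[p > 0]
        hists.append(p)
        entropies.append(-(nz * np.log2(nz)).sum())  # entropy H_D(s)
    best = (None, 0.0)
    for i in range(1, len(scales) - 1):
        # Only keep scales where the entropy is a strict local maximum
        if entropies[i] > entropies[i - 1] and entropies[i] > entropies[i + 1]:
            s = scales[i]
            # Inter-scale weight W_D(s) = s^2 / (2s - 1) * sum_d |p(d, s) - p(d, s - 1)|
            weight = s * s / (2 * s - 1) * np.abs(hists[i] - hists[i - 1]).sum()
            saliency = entropies[i] * weight
            if saliency > best[1]:
                best = (s, saliency)
    return best
```

Because the saliency depends on the histogram binning and the scale range, this also illustrates why the detector parameters need tuning.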

Extract the X, Y coordinates, scale and 11x11 intensity patches around the detected features
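One way to turn each detection into an (x, y, scale) triple plus a 121-value appearance vector is sketched below. Treating the detected scale as the half-width of the sampled square, and using nearest-neighbour resampling with clamping at the image border, are assumptions for illustration; the slides only state that 11x11 intensity patches are extracted. `extract_patch` and `features_for_image` are hypothetical names.

```python
import numpy as np

PATCH = 11  # 11x11 patches, i.e. 121 intensity values per feature

def extract_patch(image, x, y, scale):
    """Sample an 11x11 grid over the square [x - scale, x + scale] x [y - scale, y + scale].

    Nearest-neighbour sampling with border clamping (assumption).
    """
    offsets = np.linspace(-scale, scale, PATCH)
    cols = np.clip(np.rint(x + offsets).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(y + offsets).astype(int), 0, image.shape[0] - 1)
    return image[np.ix_(rows, cols)].astype(float)

def features_for_image(image, detections):
    """detections: non-empty list of (x, y, scale).

    Returns locations (N, 3) and flattened patches (N, 121).
    """
    locs = np.array([[x, y, s] for x, y, s in detections], dtype=float)
    patches = np.stack([extract_patch(image, x, y, s).ravel() for x, y, s in detections])
    return locs, patches
```

Sampling a fixed 11x11 grid regardless of scale is what makes patches from large and small regions directly comparable.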

Reduce the dimensionality of the appearance patches from 121 to 16 using PCA
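The 121-to-16 reduction can be sketched as ordinary PCA computed from an SVD of the mean-centred training patches; the learnt mean and basis would then be reused unchanged on test patches. Using the SVD route, and the names `fit_pca` and `project`, are assumptions rather than a description of how this project computed PCA.

```python
import numpy as np

def fit_pca(patches, k=16):
    """patches: (N, 121) training patches. Returns the mean and top-k principal axes (k, 121)."""
    mean = patches.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:k]

def project(patches, mean, components):
    """Map 121-D patches to k-D appearance vectors using the training mean and basis."""
    return (patches - mean) @ components.T
```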

Estimate model parameters
• A single full Gaussian for location
• One Gaussian per part for appearance
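A sketch of this estimation step follows, assuming the location vector stacks the (x, y) coordinates of P parts in a fixed order and that each part's appearance Gaussian has a full covariance (the slides do not say full or diagonal). The small ridge on each covariance, which keeps it invertible with only a few dozen training images, and all function names are assumptions. Summing the location and appearance log-probabilities into a total treats the two as independent, which is also an assumption of this sketch.

```python
import numpy as np

def fit_gaussian(X, ridge=1e-6):
    """Maximum-likelihood mean and full covariance of the rows of X (N, D), plus a small ridge."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True) + ridge * np.eye(X.shape[1])
    return mu, cov

def gaussian_logpdf(x, mu, cov):
    """log N(x; mu, cov), evaluated through a Cholesky factorization."""
    d = x - mu
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, d)
    return -0.5 * (z @ z) - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

def fit_model(locations, appearances):
    """locations: (N_images, 2P) stacked part coordinates; appearances: (N_images, P, 16)."""
    loc_model = fit_gaussian(locations)  # one joint, full-covariance Gaussian
    app_models = [fit_gaussian(appearances[:, p, :]) for p in range(appearances.shape[1])]
    return loc_model, app_models

def log_prob(model, location, appearance):
    """Returns (location log-prob, appearance log-prob) for one image."""
    loc_model, app_models = model
    loc_lp = gaussian_logpdf(location, *loc_model)
    app_lp = sum(gaussian_logpdf(appearance[p], *m) for p, m in enumerate(app_models))
    return loc_lp, app_lp
```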

Outline of the Method (Testing)

Extract features of the test images in the same manner as in the training phase

Use the learnt model to estimate the probability of detection

Use Bayes’ Decision Rule to classify
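The decision step can be sketched as picking the class with the largest log-likelihood plus log-prior, which is Bayes' decision rule under a 0-1 loss. The slides do not state which classes or priors were compared, so the `motorbike` and `background` labels and the numbers below are hypothetical.

```python
import math

def bayes_classify(log_likelihoods, priors):
    """Pick the class maximizing log p(x | c) + log p(c).

    log_likelihoods: dict class -> log p(x | c) from that class's learnt model.
    priors:          dict class -> p(c).
    """
    scores = {c: ll + math.log(priors[c]) for c, ll in log_likelihoods.items()}
    return max(scores, key=scores.get)

def posterior(log_likelihoods, priors):
    """Normalized posteriors p(c | x), computed stably with log-sum-exp."""
    scores = {c: ll + math.log(priors[c]) for c, ll in log_likelihoods.items()}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}
```

For example, with log-likelihoods of -120 (motorbike) and -125 (background) and equal priors the rule picks motorbike; with a motorbike prior of 0.001 the log-prior gap of about 6.9 outweighs the likelihood gap of 5 and the decision flips to background.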

Experiments

Careful tweaking of the detector parameters is needed

A single set of parameter settings may not be suitable for all categories

Starting scale: 3

Experiments (contd.)

47 clean motorbike images used to train the motorbike model

Sorting the extracted patches by X-coordinate helped (as opposed to sorting by saliency)
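A plausible reading of this result is that a fixed left-to-right order assigns features to parts more consistently across images than saliency rank does. The sketch below shows the two orderings being compared; first keeping only the `num_parts` most salient features is an assumption, as is the function name `order_features`.

```python
import numpy as np

def order_features(locs, saliency, patches, num_parts, by="x"):
    """Keep the num_parts most salient features and put them in a fixed order.

    locs: (N, 3) rows of (x, y, scale); saliency: (N,); patches: (N, D).
    by="x" sorts the kept features left to right; by="saliency" keeps saliency rank.
    """
    top = np.argsort(-saliency)[:num_parts]  # most salient first
    if by == "x":
        top = top[np.argsort(locs[top, 0], kind="stable")]
    elif by != "saliency":
        raise ValueError("by must be 'x' or 'saliency'")
    location_vector = locs[top, :2].ravel()  # (x1, y1, x2, y2, ...)
    return location_vector, patches[top]
```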

Appearance model not doing as well as the location model

9 test images used (1-4 motorbikes, 5-7 cars, 8-9 faces)

[Figure: log-probabilities of the 9 test images from the location model; features sorted by saliency vs. features sorted by X-coordinate]

[Figure: test images 5 and 9]

[Figure: appearance log-probabilities of the 9 test images]

[Figure: total log-probabilities of the 9 test images; features sorted by saliency vs. features sorted by X-coordinate]

Experiments (contd.)

Using a Mixture of Gaussians for the appearances of parts didn’t make much difference

3 mixture components per part (EM initialized with k-means and sample covariances)
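A NumPy-only sketch of such a mixture fit for one part's appearance vectors: k-means supplies hard initial assignments, the first M-step turns them into mixing weights, means and sample covariances, and regular EM iterations follow. The iteration counts, the ridge added to each covariance and the random seed are assumptions, and the function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; returns a hard cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(-1).argmin(1)
        centres = np.array([X[labels == j].mean(0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return labels

def log_gauss(X, mu, cov):
    """Row-wise log N(x; mu, cov) for X of shape (N, D)."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mu).T)  # (D, N)
    return (-0.5 * (z ** 2).sum(0) - np.log(np.diag(L)).sum()
            - 0.5 * X.shape[1] * np.log(2 * np.pi))

def fit_gmm(X, k=3, iters=50, ridge=1e-3, seed=0):
    n, d = X.shape
    resp = np.eye(k)[kmeans(X, k, seed=seed)]  # hard k-means responsibilities (N, k)
    for _ in range(iters + 1):
        # M-step (first pass = k-means weights, means and sample covariances)
        nk = resp.sum(0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.array([((resp[:, j, None] * (X - means[j])).T @ (X - means[j])) / nk[j]
                         + ridge * np.eye(d) for j in range(k)])
        # E-step: posterior responsibilities via log-sum-exp
        logp = np.stack([np.log(weights[j]) + log_gauss(X, means[j], covs[j])
                         for j in range(k)], 1)
        m = logp.max(1, keepdims=True)
        resp = np.exp(logp - (m + np.log(np.exp(logp - m).sum(1, keepdims=True))))
    return weights, means, covs

def gmm_logpdf(X, weights, means, covs):
    """Mixture log-density for each row of X."""
    logp = np.stack([np.log(w) + log_gauss(X, mu, c)
                     for w, mu, c in zip(weights, means, covs)], 1)
    m = logp.max(1)
    return m + np.log(np.exp(logp - m[:, None]).sum(1))
```

If a library implementation is preferred, scikit-learn's `GaussianMixture(n_components=3, covariance_type='full', init_params='kmeans')` uses a comparable k-means-based initialization.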

Experiments (contd.)

Levenshtein distances on the appearance patches worked quite nicely

• Each appearance patch is a single character

• Matching cost was computed using a straight SSD (sum of squared differences)

• Cost of inserting a gap = matching cost of the patch with a canonical 11x11 patch having a uniform intensity of 128
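The scheme above maps onto the standard Levenshtein dynamic program with two cost changes: substituting one patch for another costs their SSD, and inserting or deleting a patch costs its SSD against the uniform 128 patch. A minimal sketch follows; the function names are hypothetical, and the sequences are assumed to already be in a fixed feature order.

```python
import numpy as np

GAP_PATCH = np.full((11, 11), 128.0)  # canonical uniform-intensity patch

def ssd(a, b):
    """Sum of squared differences between two patches."""
    return float(((np.asarray(a, float) - np.asarray(b, float)) ** 2).sum())

def patch_edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences of 11x11 patches."""
    gap_a = [ssd(p, GAP_PATCH) for p in seq_a]
    gap_b = [ssd(p, GAP_PATCH) for p in seq_b]
    n, m = len(seq_a), len(seq_b)
    D = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        D[i, 0] = D[i - 1, 0] + gap_a[i - 1]  # delete every patch of a
    for j in range(1, m + 1):
        D[0, j] = D[0, j - 1] + gap_b[j - 1]  # insert every patch of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j - 1] + ssd(seq_a[i - 1], seq_b[j - 1]),  # match/substitute
                          D[i - 1, j] + gap_a[i - 1],                        # gap in b
                          D[i, j - 1] + gap_b[j - 1])                        # gap in a
    return float(D[n, m])
```

One consequence of this gap cost is that a nearly uniform mid-grey patch can be skipped almost for free, while high-contrast patches are expensive to leave unmatched.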

Conclusions and Future Work

Strong dependence on the feature detector

The appearance model doesn’t seem to be working too well

Levenshtein distances could be more promising

Experiments with more clean training and test data, and with multiple categories

Exponential search for dealing with clutter and occlusion

Questions?

-- Thank You
