visual object category recognition

Visual Object Category Recognition Ashish Gupta Centre for Vision, Speech, and Signal Processing

Upload: ashish-gupta

Post on 13-Jan-2015


DESCRIPTION

PhD qualification presentation on visual object category recognition

TRANSCRIPT

Page 1: Visual Object Category Recognition

Visual Object Category Recognition

Ashish Gupta

Centre for Vision, Speech, and Signal Processing

Page 2: Visual Object Category Recognition

Contents

• Introduction

• Related work

• Overview: Object recognition system

• Object classification & detection

• Conclusions

• Future work

Page 3: Visual Object Category Recognition

Introduction

Research Topic: Visual object category recognition using weakly supervised learning.

DIPLECS: Artificial cognitive system for autonomous systems.

• Interested in object interactions determined by their functional properties.

• All objects in same category have the same functional properties.

• Recognition is based on object’s visual properties.

Page 4: Visual Object Category Recognition

Introduction

Research Topic: Visual object category recognition using weakly supervised learning.

• A very large training set is required to learn the large appearance variation in a category.

• So we utilize huge image datasets like Flickr® and Google™ Images.

• The images are corrupt and incompletely labelled.

• Therefore, weakly supervised learning, which can handle corrupt and noisy training data, is utilized.

Page 5: Visual Object Category Recognition

Challenges

Intra-category appearance, pose, clutter, scale, occlusion, illumination, articulation, camouflage.

Page 6: Visual Object Category Recognition

Background

Page 7: Visual Object Category Recognition

Work done

Page 8: Visual Object Category Recognition

Visual Recognition System

Page 9: Visual Object Category Recognition

SIFT feature descriptor

Page 10: Visual Object Category Recognition

Occurrence frequency of visual words is characteristic of the object

Object model : bag-of-visual words

Creating a visual codebook
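A minimal sketch of how a visual codebook and a bag-of-visual-words histogram might be built. The slides do not name the clustering method; k-means is assumed here because it is a common choice. The function names are hypothetical, and short lists of floats stand in for real SIFT descriptors.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, codebook):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))

def build_codebook(descriptors, num_words, iterations=10, seed=0):
    """Cluster descriptors with plain k-means (an assumed choice).

    Returns num_words cluster centres, which act as the visual words.
    """
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, codebook)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def word_histogram(descriptors, codebook):
    """Normalized occurrence frequency of each visual word in one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

The histogram returned by word_histogram is the per-image occurrence frequency that the slide describes as characteristic of the object.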

Page 11: Visual Object Category Recognition

Object model : bag-of-visual words

A test image can be classified based on the distance of its normalized codebook from the codebooks of positive and negative training samples.

Figure panels: codebook of positive samples, codebook of negative samples, codebook of test image.
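The distance-based decision described on this slide can be sketched as follows. The slide does not state the distance measure or how the training histograms are combined. Euclidean distance and a per-class mean histogram are assumptions, and all names are hypothetical.

```python
import math

def l1_normalize(histogram):
    """Scale visual word counts so that they sum to one."""
    total = sum(histogram)
    return [v / total for v in histogram] if total else list(histogram)

def mean_histogram(histograms):
    """Element-wise mean of a list of equally sized histograms."""
    return [sum(col) / len(histograms) for col in zip(*histograms)]

def euclidean(a, b):
    """Euclidean distance between two histograms (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(test_counts, positive_counts, negative_counts):
    """Return True if the test image lies closer to the positive samples.

    Each argument holds raw visual word counts; the positive and negative
    training sets are summarized by their mean normalized histogram.
    """
    test = l1_normalize(test_counts)
    pos_mean = mean_histogram([l1_normalize(h) for h in positive_counts])
    neg_mean = mean_histogram([l1_normalize(h) for h in negative_counts])
    return euclidean(test, pos_mean) < euclidean(test, neg_mean)
```

Normalizing before comparison keeps an image with many key-points from dominating simply because its counts are larger.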

Page 12: Visual Object Category Recognition

Object model : bag-of-visual words

Visual codebooks for positive and negative samples of ‘car’ category in PASCAL VOC 2006

Page 13: Visual Object Category Recognition

Object model : bag-of-visual words

Visual codebooks for ‘car’ and ‘cow’ categories in PASCAL VOC 2009 dataset

Page 14: Visual Object Category Recognition

Classification

ROC (Receiver Operating Characteristics): evaluating classification performance.

ROC for ‘car’ category in PASCAL VOC 2006

The linear kernel, K(x, y) = x^T y, was used since it is fast.
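As an illustration, the linear kernel and the points of a ROC curve can be computed as below. This is a sketch under stated assumptions: labels are 1 (positive) or 0 (negative), both classes are present, and tied scores are treated as separate thresholds. The function names are hypothetical.

```python
def linear_kernel(x, y):
    """K(x, y) = x^T y, the dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs for a ROC curve.

    Samples are visited from highest to lowest classifier score, which is
    equivalent to lowering the decision threshold one sample at a time.
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, label in ranked:
        if label:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A curve that hugs the top-left corner (area close to 1) indicates good classification, while the diagonal (area 0.5) corresponds to chance.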

Page 15: Visual Object Category Recognition

Improve Classification

Larger Visual Codebook:

• More representative of category

• Higher computational cost

ROC of ‘car’ category in PASCAL VOC 2006 for codebook sizes from 20 to 20000 visual words.

Page 16: Visual Object Category Recognition

Improve Classification

Page 17: Visual Object Category Recognition

Improve Classification

Training and test images in the dataset scaled down by the same factor.

Training and test images scaled down by different factors.

Page 18: Visual Object Category Recognition

Improve Classification

Figure: test images marked as classified correctly (Y) or not (N) for training samples datasets 1 and 2 at scale-down factors /1 and /2.

Page 19: Visual Object Category Recognition

Improve Classification

ROC for 20 visual categories in PASCAL VOC 2009

The PASCAL VOC 2009 dataset is larger and more challenging than the 2006 dataset.

Page 20: Visual Object Category Recognition

Improve Classification

ROC for PASCAL VOC 2009 training and test images scaled down by a factor of 2

ROC for PASCAL VOC 2009 using a universal visual vocabulary

Page 21: Visual Object Category Recognition

Object localization using sliding window

The poor localization results are due to:

• Lack of structural information in the bag-of-words object model

• Classifier learning object background
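A rough sketch of sliding-window localization over a bag-of-words model. It assumes each key-point is given as (x, y, visual word index) and that each window is scored with a linear classifier over raw word counts. The weights, window size, and stride are hypothetical inputs, not values from the presentation.

```python
def window_histogram(keypoints, window, num_words):
    """Count the visual words of key-points that fall inside the window."""
    x0, y0, x1, y1 = window
    counts = [0] * num_words
    for x, y, word in keypoints:
        if x0 <= x < x1 and y0 <= y < y1:
            counts[word] += 1
    return counts

def sliding_window_localize(keypoints, image_size, window_size, stride,
                            weights, bias=0.0):
    """Return the highest scoring window and its score.

    The score is a linear function of the window's word counts. Only the
    counts matter, so the layout of key-points inside the window is ignored.
    """
    width, height = image_size
    win_w, win_h = window_size
    best_window, best_score = None, float("-inf")
    for y0 in range(0, height - win_h + 1, stride):
        for x0 in range(0, width - win_w + 1, stride):
            window = (x0, y0, x0 + win_w, y0 + win_h)
            counts = window_histogram(keypoints, window, len(weights))
            score = sum(w * c for w, c in zip(weights, counts)) + bias
            if score > best_score:
                best_window, best_score = window, score
    return best_window, best_score
```

Because several overlapping windows can contain the same key-points and receive the same score, the sketch also hints at why a model without structural information localizes poorly.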

Page 22: Visual Object Category Recognition

Visual codebook

Training images with bounding-boxes

Training images without bounding-boxes

Good Codebook with equal population of positive and negative visual words

Positive background different from negative images

Positive background similar to negative images

With no bounding-box utilized, the codebook consists of a majority of negative visual words.

Page 23: Visual Object Category Recognition

Visual codebook

Training images with bounding-boxes

Training images without bounding-boxes

Good Codebook with equal population of positive and negative visual words

Positive background different from negative images

Positive background similar to negative images

Classification based on object context (background) rather than object features.

Page 24: Visual Object Category Recognition

Improve Classification

The detection at each iteration estimates a bounding box, which provides a better visual codebook, which in turn leads to better detection.

Page 25: Visual Object Category Recognition

Object detection

• Key-point configurations form a discriminative object feature set.

• A configuration of visual words adds structural information to the bag-of-words model.

• Harvest frequent and discriminative configurations.

• Encode configurations as transaction vectors.

• An association between a transaction vector and the training type is an association rule.

• The Apriori algorithm finds association rules with high confidence in a support-confidence framework.

Figure: transaction vector encoding a key-point configuration.
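The support-confidence framework can be illustrated with a small sketch. It assumes each transaction is a pair (set of visual word ids, class label) and uses the standard association rule definitions. The slides do not show the actual encoding, so the data layout and names are hypothetical.

```python
def rule_support(configuration, label, transactions):
    """Fraction of all transactions that contain the configuration
    and carry the given label."""
    configuration = frozenset(configuration)
    hits = sum(1 for items, item_label in transactions
               if configuration <= items and item_label == label)
    return hits / len(transactions)

def rule_confidence(configuration, label, transactions):
    """Among transactions containing the configuration, the fraction
    that carry the given label."""
    configuration = frozenset(configuration)
    matching = [item_label for items, item_label in transactions
                if configuration <= items]
    if not matching:
        return 0.0
    return matching.count(label) / len(matching)

# Toy transactions: visual word ids of a key-point configuration plus
# the label of the training sample it came from.
transactions = [
    ({1, 2, 3}, "object"),
    ({1, 2}, "object"),
    ({1, 4}, "background"),
    ({2, 3}, "background"),
]
```

In this toy data the rule {1, 2} -> "object" holds in every transaction that contains words 1 and 2, whereas the shorter rule {1} -> "object" is less reliable.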

Page 26: Visual Object Category Recognition

Apriori algorithm

• Uses breadth-first search and tree structure.

• Longer configurations will have lower support as they are infrequent but higher confidence as they are more discriminative.

• Downward closure lemma: prune configurations with infrequent sub-sets.
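A compact sketch of the level-wise (breadth-first) Apriori search with downward-closure pruning. It mines frequent item sets only and omits the tree structure mentioned on the slide. The function name and the toy interface are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent item set to its support.

    Candidates of size k are built from frequent sets of size k - 1, and
    any candidate with an infrequent (k - 1)-subset is pruned before its
    support is counted (downward closure).
    """
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def keep_frequent(candidates):
        frequent = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / total
            if support >= min_support:
                frequent[candidate] = support
        return frequent

    level = keep_frequent({frozenset([item]) for t in transactions for item in t})
    all_frequent = dict(level)
    size = 2
    while level:
        previous = set(level)
        candidates = {a | b for a in previous for b in previous
                      if len(a | b) == size}
        candidates = {c for c in candidates
                      if all(frozenset(subset) in previous
                             for subset in combinations(c, size - 1))}
        level = keep_frequent(candidates)
        all_frequent.update(level)
        size += 1
    return all_frequent
```

Pruning matters because the number of possible configurations grows exponentially with their length, while most long configurations are too rare to be useful.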

Page 27: Visual Object Category Recognition

Object localization

Flowchart: transactions generated from the training data set are mined with Apriori to obtain association rules. Transactions generated from a test image of the test data set are each assigned a confidence using these rules, and the confidences are thresholded.

• A confidence is assigned to every key-point in the image.

• Key-points with sufficiently high confidence are retained.

• Key-points which occur on common background objects like doors and windows can have high confidence.
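The thresholding step can be sketched as follows. Two assumptions are made that the slides do not state: a key-point takes the highest confidence among the rules its transaction satisfies, and the object is localized by the bounding box of the retained key-points. All names are hypothetical.

```python
def keypoint_confidence(transaction, rules):
    """Highest confidence among rules whose configuration is contained
    in the key-point's transaction (0.0 if no rule matches)."""
    return max((confidence for configuration, confidence in rules
                if configuration <= transaction), default=0.0)

def retain_confident_keypoints(keypoints, rules, threshold):
    """Keep the positions of key-points whose confidence reaches the threshold."""
    return [(x, y) for x, y, transaction in keypoints
            if keypoint_confidence(transaction, rules) >= threshold]

def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the points."""
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Toy rules: (configuration of visual word ids, confidence).
rules = [(frozenset({1, 2}), 0.9), (frozenset({3}), 0.4)]
# Toy key-points: (x, y, transaction of visual word ids).
keypoints = [
    (10, 20, frozenset({1, 2, 5})),
    (30, 40, frozenset({1, 2})),
    (200, 5, frozenset({3, 7})),
]
```

A background key-point that happens to satisfy a high-confidence rule would survive the threshold and stretch the box, which matches the failure case noted in the last bullet.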

Page 28: Visual Object Category Recognition

Object classification using Apriori

Flowchart: transactions generated from the training data set are mined with Apriori to obtain association rules. Transactions generated from the test images of the test data set are each assigned a confidence using these rules, and the confidences are summed.

ROC ‘car’ in PASCAL VOC 2006

The summed confidence score depends upon object scale in the image, which explains the comparatively poor performance of this approach.
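A small sketch of why a summed confidence score depends on object scale. It assumes that a larger object yields proportionally more key-point transactions and that each transaction takes the highest confidence among matching rules. Both are illustrative assumptions, and the names are hypothetical.

```python
def summed_confidence(transactions, rules):
    """Image-level score: the sum of per-transaction confidences."""
    total = 0.0
    for transaction in transactions:
        total += max((confidence for configuration, confidence in rules
                      if configuration <= transaction), default=0.0)
    return total

# One toy rule and two images of the same object at different scales.
rules = [(frozenset({1, 2}), 0.9)]
small_object = [frozenset({1, 2})] * 3   # few key-points on the object
large_object = [frozenset({1, 2})] * 6   # same object, twice the key-points
```

Here the larger object scores twice as high even though both images show the same category, so a single decision threshold on the sum cannot suit both scales.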

Page 29: Visual Object Category Recognition

Conclusions

• The ‘bag-of-words’ model is good for classification, but poor for localization.

• Separate foreground-background for better visual codebooks.

• The good classification on the PASCAL VOC 2006 dataset is attributed to recognition of object context rather than object features.

• The dataset utilized should have sufficient variation in appearance of the object and its background.

• A larger visual vocabulary gives slightly better classification, but is computationally more expensive.

• The visual vocabulary built has a majority of background visual words since bounding-boxes are not utilized during training.

Page 30: Visual Object Category Recognition

Conclusions

• Improving the proportion of visual words representing the object in the vocabulary is vital for good classification.

• Incorporate the object boundary contour into the descriptor.

• Use of frequent and discriminative key-point configurations is a promising approach for object localization.

• A low quality dataset results in a weak visual codebook and classifiers biased to the training data.

• Classification using key-point configurations was poor compared to ‘bag-of-words’ for PASCAL VOC 2006.

Page 31: Visual Object Category Recognition

Future Work

• Improve a visual codebook by increasing the proportion of visual words pertaining to object features. Combine Apriori based localization and clustering for visual word selection in an iterative approach.

• Model visual scene information (use the GIST descriptor by Torralba). Learn co-occurrence statistics of a scene and a visual category. Recognition of the scene serves as a prior for object presence and improves object recognition performance.

• Improve object localization by using context priming.

• Model object contextual information to aid foreground-background disambiguation for better object localization.

Page 32: Visual Object Category Recognition

Future Work

• Share information of features between visual categories. The size of a universal visual vocabulary should increase sub-linearly with increase in number of visual categories.

• Combine image segmentation and classification to improve the object model to provide better classification performance.

• Build a hierarchical framework for visual categorization:

• Representation: combine local and global features.

• Model: combine semantic and structural object models.

• Classification: combine generative and discriminative approaches.

Page 33: Visual Object Category Recognition

Future Work

Page 34: Visual Object Category Recognition

Questions?