Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers
A Presentation Transcript
![Page 1: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/1.jpg)
Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers
Abhinav Gupta and Larry S. Davis
University of Maryland, College Park
Proceedings of ECCV 2008
Presented by: Debaleena Chattopadhyay
![Page 2: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/2.jpg)
Presentation Outline
- The Problem Definition
- The Novelty
- The Problem Solution
- The Results
![Page 3: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/3.jpg)
The Problem Definition
To learn visual classifiers for object recognition from weakly labeled data
Labels: city, mountain, sky, sun
Input: [Figure: image with the labels above]
Expected Output: [Figure: image regions labeled city, mountain, sky, sun]
![Page 4: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/4.jpg)
Novelty
To learn visual classifiers for object recognition from weakly labeled data, utilizing additional language constructs
Labels:
(Nouns) city, mountain, sky, sun
(Relations) below(mountain, sky), below(mountain, sun), above(sky, city), above(sun, city), brighter(sun, mountain), brighter(sun, city), behind(mountain, city), convex(sun, city), in(sun, sky), smaller(sun, sky)
Input: [Figure: image with the labels above]
Expected Output: [Figure: image regions labeled city, mountain, sky, sun]
![Page 5: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/5.jpg)
Related Work
Prior work:
- Learn classifiers for visual attributes from a training dataset of positive and negative images using a generative model [Ferrari et al.]
- Learn adjectives and nouns in two steps (adjectives first, nouns second) using a latent model [Barnard et al.]
Subsequent work:
- Mining Discriminative Adjectives and Prepositions for Natural Scene Recognition [Fei-Fei Li et al., CVPR 2009]
- Joint learning of visual attributes, object classes and visual saliency [Forsyth et al., ICCV 2009]
![Page 6: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/6.jpg)
Overview
Nouns: SEA, SUN, SKY
Pairs of Nouns: (SEA, SUN), (SEA, SKY), (SKY, SEA), (SKY, SUN), (SUN, SKY), (SUN, SEA)
Relationships: in, above, below
![Page 7: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/7.jpg)
Proposed Algorithm
- Dataset: a training set annotated with nouns and binary relationships (prepositions and comparative adjectives)
- Algorithm:
  - Each image is segmented into a set of image regions.
  - Each image region is represented by a set of features.
  - Classifiers for nouns are based on these features (CA).
  - Classifiers for relationships are based on differential features extracted from pairs of regions (CR).
  - An EM approach is used to learn the noun and relationship models simultaneously:
    - E-step: update assignments of nouns to image regions, given CA and CR.
    - M-step: update the model parameters (CA and CR), given the updated assignments.
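The alternation above can be illustrated with a toy version. This is a minimal sketch under strong simplifying assumptions: region features are 1-D numbers, a per-noun mean stands in for the appearance model CA, the relationship model CR is omitted, and assignments are hard rather than soft, so this is hard EM rather than the paper's exact procedure.

```python
def em_correspondence(images, n_iters=10):
    """Toy hard-EM for the noun-to-region correspondence problem.

    `images` is a list of (regions, labels) pairs, where `regions` is a
    list of 1-D feature values and `labels` the unordered nouns for that
    image.  A per-noun mean stands in for the appearance model CA; the
    relationship model CR is omitted from this sketch.
    """
    nouns = sorted({n for _, labels in images for n in labels})
    # Initialise each noun's model from every region it co-occurs with.
    means = {}
    for n in nouns:
        vals = [r for regions, labels in images if n in labels for r in regions]
        means[n] = sum(vals) / len(vals)
    for _ in range(n_iters):
        # E-step: assign each region to the closest noun among the image's labels.
        assigned = {n: [] for n in nouns}
        for regions, labels in images:
            for r in regions:
                best = min(labels, key=lambda n: abs(r - means[n]))
                assigned[best].append(r)
        # M-step: refit each noun's model from its assigned regions.
        for n in nouns:
            if assigned[n]:
                means[n] = sum(assigned[n]) / len(assigned[n])
    return means
```

Note how an image containing only one of the nouns breaks the initial symmetry between co-occurring labels, after which the alternation separates the models.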
![Page 8: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/8.jpg)
The Generative Model
[Figure: graphical model for image annotation, with noun nodes ns and np, relationship node r, region evidence Ij and Ik, pair evidence Ijk, and classifier parameters CA and CR]
![Page 9: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/9.jpg)
Learning the Model
EM-approach: Simultaneously solve for the correspondence problem and learn the parameters of classifiers (noun and relationship)
E-step: Compute the noun assignments using the parameters from the previous iteration:

$$P(\text{noun } i \text{ assigned to region } j) = \frac{P(A^{l}_{ij} \mid I^{l}, \Lambda^{old})}{\sum_{k} P(A^{l}_{ik} \mid I^{l}, \Lambda^{old})}$$

where

$$P(A^{l}_{ij} \mid I^{l}, \Lambda^{old}) \propto P(I^{l} \mid A^{l}_{ij}, \Lambda^{old})\, P(A^{l}_{ij} \mid \Lambda^{old})$$
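The E-step normalisation over regions can be sketched directly; `likelihood` here is a hypothetical stand-in for the unnormalised score of assigning a noun to a region under the current classifiers.

```python
def e_step_assignments(likelihood, regions, nouns):
    """Normalise per-(noun, region) scores into assignment probabilities,
    dividing each score by its sum over all regions k, as in the E-step.

    `likelihood(noun, region)` is a hypothetical callable standing in
    for the unnormalised assignment score under the current models.
    """
    probs = {}
    for i in nouns:
        scores = [likelihood(i, r) for r in regions]
        total = sum(scores)
        for j, s in enumerate(scores):
            probs[(i, j)] = s / total
    return probs
```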
![Page 10: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/10.jpg)
Learning the Model
![Page 11: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/11.jpg)
Learning the Model
EM-approach: Simultaneously solve for the correspondence problem and learn the parameters of classifiers (noun and relationship)
M-step: Update the model parameters based on the assignments computed in the E-step. The maximum-likelihood parameters depend on the classifier used.
To utilize contextual information when labeling test images, priors on relationships, P(r | ns, np), are also learned from a co-occurrence table after the relationship annotations are generated.
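Building such a prior from a co-occurrence table might look like the sketch below; the add-one smoothing is my assumption, not something stated in the talk.

```python
from collections import Counter

def relation_priors(annotations, smoothing=1.0):
    """Estimate P(r | ns, np) from generated relationship annotations.

    `annotations` is a list of (relation, subject_noun, object_noun)
    triples; counts are collected into a co-occurrence table and
    normalised per noun pair.  The smoothing term is an assumption
    added so unseen relations get non-zero probability.
    """
    relations = sorted({r for r, _, _ in annotations})
    counts = Counter((r, ns, np) for r, ns, np in annotations)
    pair_totals = Counter((ns, np) for _, ns, np in annotations)

    def prior(r, ns, np):
        num = counts[(r, ns, np)] + smoothing
        den = pair_totals[(ns, np)] + smoothing * len(relations)
        return num / den

    return prior
```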
![Page 12: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/12.jpg)
Inference (Labeling)
- Test images are divided into regions. Region j is associated with features Ij and noun nj.
- We know Ij and must estimate nj.
- The labeling problem is constrained by priors on relationships between pairs of nouns.
- A Bayesian network represents the labeling problem, with belief propagation for inference.
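For a tiny two-region example, the effect of the relationship constraint can be shown by exhaustively scoring every labeling. The paper uses a Bayesian network with belief propagation; brute-force enumeration here is a deliberate simplification that is only feasible for toy vocabularies, and `appearance` and `rel_prior` are hypothetical stand-ins for the learned models.

```python
from itertools import permutations

def label_two_regions(regions, vocab, appearance, rel_prior, relation):
    """Score every assignment of two distinct nouns to two regions by
    appearance likelihood times the relationship prior, and return the
    highest-scoring labeling (a brute-force stand-in for inference)."""
    best, best_score = None, -1.0
    for ns, np in permutations(vocab, 2):
        score = (appearance(ns, regions[0]) * appearance(np, regions[1])
                 * rel_prior(relation, ns, np))
        if score > best_score:
            best, best_score = (ns, np), score
    return best
```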
![Page 13: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/13.jpg)
Experimental Results
Dataset:
- Subset of the Corel5k training and test dataset
- For training, 850 images with nouns and hand-labelled relationships between a subset of pairs of nouns
- Nearest-neighbor and Gaussian-classifier-based likelihood models for nouns
- Decision-stump-based likelihood model for relationships
- 173 nouns
- 19 relationships: above, behind, below, beside, more textured, brighter, in, greener, larger, left, near, far from, on top of, more blue, right, similar, smaller, taller, shorter
- Image features used (30): area, x, y, boundary/area, convexity, moment of inertia, RGB (3), RGB stdev (3), L*a*b (3), L*a*b stdev (3), mean oriented energy in 30-degree increments (12)
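The slide lists per-region features but not the differential features feeding the relationship classifiers CR. A plausible illustrative subset, using centroid, area, and brightness fields of my own choosing, is:

```python
def differential_features(a, b):
    """Differential features for an ordered region pair (a, b).

    Each region is a dict with centroid (x, y), area, and brightness.
    The specific features here are illustrative guesses, not the
    paper's actual list.
    """
    return {
        "dx": a["x"] - b["x"],                                 # left / right evidence
        "dy": a["y"] - b["y"],                                 # above / below evidence
        "area_ratio": a["area"] / b["area"],                   # larger / smaller evidence
        "brightness_diff": a["brightness"] - b["brightness"],  # brighter evidence
    }
```

A decision stump for a relationship like "brighter" would then threshold a single one of these differential features.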
![Page 14: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/14.jpg)
Experimental Results
Resolution of Correspondence Ambiguities
- Evaluated on 150 images randomly sampled from the training dataset
- Compared with human labeling
- Performance measures:
  - Range of semantics identified: both algorithms give similar performance (left)
  - Frequency correct: the latter algorithm identifies nouns correctly more often (right)

[Figure: bar charts comparing "Nouns only", "Nouns & Relationships (learned)", and "Nouns & Relationships (human)" for the proposed EM algorithm bootstrapped by IBM Model 1 and bootstrapped by Duygulu et al.]
![Page 15: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/15.jpg)
Experimental Results
Reducing Correspondence Ambiguity

[Figure: example annotations, Duygulu et al. (left) vs. Beyond Nouns (right)]
![Page 16: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/16.jpg)
Experimental Results
Labeling New Images:
- Dataset: subset of 500 images from the Corel5k dataset, selected randomly from images annotated with words present in the learned vocabulary
- Performance measures:
  - Missed labels (left): compute St/Sg, where St = the set of annotations provided by the Corel dataset and Sg = the set of annotations generated by the algorithm. Using the proposed Bayesian model, missed labels decrease by 24% (IBM Model 1) and 17% (Duygulu et al.)
  - False labels (right): compared with human observers
![Page 17: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/17.jpg)
Experimental Results
Image Labeling: Constrained Bayesian Model

[Figure: example labelings, Duygulu et al. (left) vs. Beyond Nouns (right)]
![Page 18: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/18.jpg)
Experimental Results
Precision-Recall:
- Precision ratio: the number of images correctly annotated with a word divided by the number of images the algorithm annotated with that word (correctness judged with respect to human observers).
- Recall ratio: the number of images correctly annotated with a word by the algorithm divided by the number of images that should have been annotated with that word (with respect to the Corel annotations).
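With the per-word annotation sets represented as Python sets (the three arguments are hypothetical inputs, not data from the paper), the two ratios reduce to:

```python
def precision_recall(generated, human_correct, corel_truth):
    """Per-word precision and recall ratios as defined above.

    `generated`: image ids the algorithm annotated with the word;
    `human_correct`: image ids whose annotation with the word human
    observers judged correct;
    `corel_truth`: image ids carrying the word in the Corel annotations.
    """
    precision = len(generated & human_correct) / len(generated)
    recall = len(generated & corel_truth) / len(corel_truth)
    return precision, recall
```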
![Page 19: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/19.jpg)
Conclusion
- Most approaches to learning visual classifiers from weakly labeled data use a "bag of nouns" model and try to find correspondences using co-occurrence of image features and nouns. However, correspondence ambiguity remains.
- This paper proposes an EM-based method to simultaneously learn visual classifiers for nouns, prepositions and comparative adjectives.
- Experimental results show that using relationship words helps reduce correspondence ambiguity, and that using a constrained model leads to better labeling performance.
![Page 20: Beyond nouns eccv_2008](https://reader035.vdocuments.site/reader035/viewer/2022081519/55909d7e1a28ab75148b46a0/html5/thumbnails/20.jpg)
Thank you