![Page 1: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/1.jpg)
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed image
Vocabulary
Pinar Duygulu, Kobus Barnard, Nando de Freitas and David Forsyth
UC Berkeley Digital Library ProjectUBC Computer Science
Funding provided by NSF Digital Library Initiative II.Kobus Barnard also receives funding from NSERC (Canada)
Pinar Duygulu is also supported by TUBITAK (Turkey)
![Page 2: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/2.jpg)
•How to model?
Problems in Object Recognition
•Scale
•What is an object ?
![Page 3: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/3.jpg)
Our Approach
Object recognition on a large scale is linking words with image regions
tiger
grass
grass
grass
tiger
tiger grass cat
Use joint probability of words and pictures in largedatasets
![Page 4: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/4.jpg)
Auto-Annotating Images
tiger grass cat
Other related work : Maron 98, Mori 99
Barnard, Forsyth (ICCV 2001) , Barnard, Duygulu, Forsyth (CVPR 2001)
Finding words for the images
![Page 5: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/5.jpg)
Annotation vs Recognition
tiger cat grass?
Cannot be solved with one example
![Page 6: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/6.jpg)
Statistical Machine Translation
Data: Aligned sentences, but wordcorrespondences are unknown
“the beautiful sun”
“le soleil beau”
Brown, Della Pietra, Della Pietra & Mercer 93
![Page 7: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/7.jpg)
Statistical Machine Translation
Given the correspondences, we canestimate the translation p(sun|soleil)
Given the probabilities, we can estimate the correspondences
![Page 8: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/8.jpg)
Statistical Machine Translation
Enough data + EM, we canobtain the translation p(sun|soleil)=1
“the beautiful sun”
“le soleil beau”
![Page 9: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/9.jpg)
Multimedia Translation
“sun sea sky”
![Page 10: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/10.jpg)
392 CD’s, each consisting of 100 annotated images.
Corel Database
![Page 11: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/11.jpg)
Input
sun sky waves sea
Imageprocessing*
*Thanks to Blobworld team [Carson, Belongie, Greenspan, Malik], N-cuts team [Shi, Tal, Malik]
Each region is described by a set of features• Region size• Position• Color• Oriented energy (12 filters)• Simple shape features
![Page 12: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/12.jpg)
Tokenization
- Words � word tokens
- Image segments
•represented by 40 features(size, position, color, texture and shape)
•k-means to cluster features
•best cluster for the blob � blob tokens
![Page 13: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/13.jpg)
Data160 CD’s100 images in each
10 setseach :
randomly selected 80 CD’s~6000 training~2000 test150-200 word tokens500 blob tokens
Segmentationabout a month
![Page 14: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/14.jpg)
city mountain sky sun jet plane sky
jet plane sky
cat forest grass tiger
cat grass tiger waterbeach people sun water
![Page 15: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/15.jpg)
Assignments
“sun sea sky”
p(a1=1)
p(a1=2) p(a1=3)
p(a1=4)
Bn
Σ p(a1 = i) = 1i=1
![Page 16: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/16.jpg)
“sun sea sky”
p(a2=1)
p(a2=2) p(a2=3)
p(a2=4)
Bn
Σ p(a2 = i) = 1i=1
Assignments
![Page 17: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/17.jpg)
Assignments
“sun sea sky”
p(a3=1)
p(a3=2) p(a3=3)
p(a3=4)
Bn
Σ p(a3 = i) = 1i=1
![Page 18: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/18.jpg)
Initialization
Initialize translation table to blob-word cooccurences(emprical joint distribution of blobs and words)
.. ..
sun sea
![Page 19: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/19.jpg)
Expectation Maximization
Given the translation probabilities estimate the correspondences
Given the correspondences estimate the translation probabilities
Dempster et al., 77
![Page 20: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/20.jpg)
EM algorithmE step :
(for one pair)
b1 b3 b4
w1 w5
b2 b1 b5
w1 w2 w4
. . .
b1 b2
w1 w2 w6
. ...
w1b1
w2
b2
Predicting correspondences from translation probabilities
translation probabilities correspondences
![Page 21: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/21.jpg)
EM algorithmM step :
(for one pair)Predicting translation probabilities from correspondences
. ...
w1b1
w2
b2
translation probabilities
b1 b3 b4
w1 w5
. . .
b1 b2
w1 w2 w6
correspondences
b2 b1 b5
w1 w2 w4
![Page 22: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/22.jpg)
Dictionary
sun
sky
cat
horse
![Page 23: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/23.jpg)
Labeling Regions
On a new image
• Find the blob token
•Look at the word posterior given the blob
•For each region
•Segment the image
![Page 24: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/24.jpg)
Labeling Regions
tiger
cat
hors
egras
s
sun
fore
st
tiger
cat
hors
e
gras
s
sun
fore
st
![Page 25: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/25.jpg)
Labeling Regions
tiger
cat
hors
egras
s
sun
fore
st
Display only maximal probable word
tiger
![Page 26: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/26.jpg)
![Page 27: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/27.jpg)
![Page 28: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/28.jpg)
Measuring Performance
First strategy--score by hand
Second strategy--use annotation performance as a proxy.
![Page 29: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/29.jpg)
First StrategyScore by hand
Average performance is four times better than guessing the most common word
(“water”)
![Page 30: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/30.jpg)
Second Strategy Use Annotation
tiger cat grass water
Automatic : Don’t need to do by hand
![Page 31: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/31.jpg)
Annotating Images
. . .
![Page 32: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/32.jpg)
GRASS TIGER CAT FOREST
Predicted Words
Actual Keywords
CAT HORSE GRASS WATER
Measuring Annotation Performance
![Page 33: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/33.jpg)
GRASS TIGER CAT FOREST
Predicted Words
Actual Keywords
Measuring Annotation Performance
CAT HORSE GRASS WATER
![Page 34: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/34.jpg)
![Page 35: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/35.jpg)
![Page 36: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/36.jpg)
![Page 37: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/37.jpg)
Improving the System
•Refusing to predict
•Merging indistinguishable words
![Page 38: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/38.jpg)
Refusing to predict
if p(word | blob) > threshold
predict a wordotherwise
assign null
Null and fertility problemssimple solution to null - refusing to predict
![Page 39: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/39.jpg)
Examples (null threshold = 0.2)
![Page 40: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/40.jpg)
Recall and Precision(for null threshold from 0 to 0.5)
selected good words selected bad words
![Page 41: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/41.jpg)
Clustering Indistinguishable Words
merge words which can’t be told apart
e.g. locomotive vs. train
![Page 42: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/42.jpg)
Examples
![Page 43: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/43.jpg)
Future Directions(machine learning)
Estimate where a minimal amount of supervision can be most helpful (and provide it)
![Page 44: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/44.jpg)
Future Directions(computer vision)
Propose good features to differentiate words that are not distinguishable (e.g., eagle and jet)
![Page 45: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/45.jpg)
Future Directions(computer vision)
Propose region merging based on posterior word probabilities
Propose merging
![Page 46: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/46.jpg)
Conclusions
Recognition on the large scale
Unsupervised - using the available data efficiently
Learn what to recognize
![Page 47: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando](https://reader035.vdocuments.site/reader035/viewer/2022071010/5fc82145c7e6f93e73747f79/html5/thumbnails/47.jpg)
The End