object recognition as machine translation: learning a lexicon …duygulu/talks/eccv2002.pdf ·...

47
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando de Freitas and David Forsyth UC Berkeley Digital Library Project UBC Computer Science Funding provided by NSF Digital Library Initiative II. Kobus Barnard also receives funding from NSERC (Canada) Pinar Duygulu is also supported by TUBITAK (Turkey)

Upload: others

Post on 20-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed image

Vocabulary

Pinar Duygulu, Kobus Barnard, Nando de Freitas and David Forsyth

UC Berkeley Digital Library ProjectUBC Computer Science

Funding provided by NSF Digital Library Initiative II.Kobus Barnard also receives funding from NSERC (Canada)

Pinar Duygulu is also supported by TUBITAK (Turkey)

Page 2: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

•How to model?

Problems in Object Recognition

•Scale

•What is an object ?

Page 3: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Our Approach

Object recognition on a large scale is linking words with image regions

tiger

grass

grass

grass

tiger

tiger grass cat

Use joint probability of words and pictures in largedatasets

Page 4: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Auto-Annotating Images

tiger grass cat

Other related work : Maron 98, Mori 99

Barnard, Forsyth (ICCV 2001) , Barnard, Duygulu, Forsyth (CVPR 2001)

Finding words for the images

Page 5: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Annotation vs Recognition

tiger cat grass?

Cannot be solved with one example

Page 6: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Statistical Machine Translation

Data: Aligned sentences, but wordcorrespondences are unknown

“the beautiful sun”

“le soleil beau”

Brown, Della Pietra, Della Pietra & Mercer 93

Page 7: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Statistical Machine Translation

Given the correspondences, we canestimate the translation p(sun|soleil)

Given the probabilities, we can estimate the correspondences

Page 8: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Statistical Machine Translation

Enough data + EM, we canobtain the translation p(sun|soleil)=1

“the beautiful sun”

“le soleil beau”

Page 9: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Multimedia Translation

“sun sea sky”

Page 10: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

392 CD’s, each consisting of 100 annotated images.

Corel Database

Page 11: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Input

sun sky waves sea

Imageprocessing*

*Thanks to Blobworld team [Carson, Belongie, Greenspan, Malik], N-cuts team [Shi, Tal, Malik]

Each region is described by a set of features• Region size• Position• Color• Oriented energy (12 filters)• Simple shape features

Page 12: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Tokenization

- Words � word tokens

- Image segments

•represented by 40 features(size, position, color, texture and shape)

•k-means to cluster features

•best cluster for the blob � blob tokens

Page 13: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Data160 CD’s100 images in each

10 setseach :

randomly selected 80 CD’s~6000 training~2000 test150-200 word tokens500 blob tokens

Segmentationabout a month

Page 14: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

city mountain sky sun jet plane sky

jet plane sky

cat forest grass tiger

cat grass tiger waterbeach people sun water

Page 15: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Assignments

“sun sea sky”

p(a1=1)

p(a1=2) p(a1=3)

p(a1=4)

Bn

Σ p(a1 = i) = 1i=1

Page 16: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

“sun sea sky”

p(a2=1)

p(a2=2) p(a2=3)

p(a2=4)

Bn

Σ p(a2 = i) = 1i=1

Assignments

Page 17: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Assignments

“sun sea sky”

p(a3=1)

p(a3=2) p(a3=3)

p(a3=4)

Bn

Σ p(a3 = i) = 1i=1

Page 18: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Initialization

Initialize translation table to blob-word cooccurences(emprical joint distribution of blobs and words)

.. ..

sun sea

Page 19: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Expectation Maximization

Given the translation probabilities estimate the correspondences

Given the correspondences estimate the translation probabilities

Dempster et al., 77

Page 20: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

EM algorithmE step :

(for one pair)

b1 b3 b4

w1 w5

b2 b1 b5

w1 w2 w4

. . .

b1 b2

w1 w2 w6

. ...

w1b1

w2

b2

Predicting correspondences from translation probabilities

translation probabilities correspondences

Page 21: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

EM algorithmM step :

(for one pair)Predicting translation probabilities from correspondences

. ...

w1b1

w2

b2

translation probabilities

b1 b3 b4

w1 w5

. . .

b1 b2

w1 w2 w6

correspondences

b2 b1 b5

w1 w2 w4

Page 22: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Dictionary

sun

sky

cat

horse

Page 23: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Labeling Regions

On a new image

• Find the blob token

•Look at the word posterior given the blob

•For each region

•Segment the image

Page 24: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Labeling Regions

tiger

cat

hors

egras

s

sun

fore

st

tiger

cat

hors

e

gras

s

sun

fore

st

Page 25: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Labeling Regions

tiger

cat

hors

egras

s

sun

fore

st

Display only maximal probable word

tiger

Page 26: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando
Page 27: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando
Page 28: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Measuring Performance

First strategy--score by hand

Second strategy--use annotation performance as a proxy.

Page 29: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

First StrategyScore by hand

Average performance is four times better than guessing the most common word

(“water”)

Page 30: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Second Strategy Use Annotation

tiger cat grass water

Automatic : Don’t need to do by hand

Page 31: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Annotating Images

. . .

Page 32: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

GRASS TIGER CAT FOREST

Predicted Words

Actual Keywords

CAT HORSE GRASS WATER

Measuring Annotation Performance

Page 33: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

GRASS TIGER CAT FOREST

Predicted Words

Actual Keywords

Measuring Annotation Performance

CAT HORSE GRASS WATER

Page 34: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando
Page 35: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando
Page 36: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando
Page 37: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Improving the System

•Refusing to predict

•Merging indistinguishable words

Page 38: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Refusing to predict

if p(word | blob) > threshold

predict a wordotherwise

assign null

Null and fertility problemssimple solution to null - refusing to predict

Page 39: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Examples (null threshold = 0.2)

Page 40: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Recall and Precision(for null threshold from 0 to 0.5)

selected good words selected bad words

Page 41: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Clustering Indistinguishable Words

merge words which can’t be told apart

e.g. locomotive vs. train

Page 42: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Examples

Page 43: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Future Directions(machine learning)

Estimate where a minimal amount of supervision can be most helpful (and provide it)

Page 44: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Future Directions(computer vision)

Propose good features to differentiate words that are not distinguishable (e.g., eagle and jet)

Page 45: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Future Directions(computer vision)

Propose region merging based on posterior word probabilities

Propose merging

Page 46: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

Conclusions

Recognition on the large scale

Unsupervised - using the available data efficiently

Learn what to recognize

Page 47: Object Recognition as Machine Translation: Learning a Lexicon …duygulu/Talks/ECCV2002.pdf · Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu, Kobus Barnard, Nando

The End