

Page 1: Recurrent Image Annotator

Recurrent Image Annotator for Arbitrary Length Image Tagging
JIREN JIN, NAKAYAMA LAB

Page 2: Recurrent Image Annotator


1. Introduction to Automatic Image Annotation

Page 3: Recurrent Image Annotator


Automatic Image Annotation (AIA)

Page 4: Recurrent Image Annotator


Difficulties of the Task
Most previous work focuses on several problems:
• label sparsity
• label imbalance
• incorrect/incomplete labels

The basic way is to utilize:
• image-to-tag correlation
• tag-to-tag correlation

Page 5: Recurrent Image Annotator


Existing Methods
• generative models (distribution over image features and annotation tags), Yu et al.
• discriminatively trained classifiers, Claudio et al.
• K-nearest-neighbor (KNN) based, Guillaumin et al.
• object detection based, Song et al.

Page 6: Recurrent Image Annotator


2. The Missing Part: Annotation Length

Page 7: Recurrent Image Annotator


Missing Part: Annotation Length
Conventional evaluation uses a fixed annotation length:
• annotate the k most relevant keywords
• evaluate retrieval performance per keyword
• average over keywords
• typical k values are 5 or 3

Why is this done?
• for ease of comparison with previous results
• most existing methods cannot trivially predict the proper number of tags

Page 8: Recurrent Image Annotator


Why Annotation Length Matters
A fixed annotation length is:
• not the natural way that humans annotate images
• not how realistic images are actually labeled

Problem to solve: predict annotation results of arbitrary length.
(Example figure legend: AL = arbitrary length, T5 = top-5, GT = ground truth)

Page 9: Recurrent Image Annotator


3. Our Solution: Recurrent Image Annotator

Page 10: Recurrent Image Annotator


Sequence generation
• output the tags one by one -> arbitrary annotation length
• previous outputs influence the current output -> tag-to-tag correlation
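To make this concrete, here is a minimal sketch (in PyTorch, with hypothetical names such as decoder_step and tag_embedding, not the authors' code) of greedy tag-sequence decoding: tags are emitted one at a time, each prediction is fed back as the next input, and generation stops at a learned stop token, so the annotation length depends on the image.

```python
import torch

def decode_tags(decoder_step, image_feature, tag_embedding, stop_id, max_len=20):
    """Greedy tag-sequence decoding (sketch).

    decoder_step(input_vec, state) -> (tag_logits, state) is assumed to be one
    step of a trained RNN decoder; the image feature is used as the first input.
    """
    tags, state = [], None
    inp = image_feature                              # first step is conditioned on the image
    for _ in range(max_len):                         # hard cap as a safety net
        logits, state = decoder_step(inp, state)
        tag_id = int(logits.argmax())                # greedy choice of the next tag
        if tag_id == stop_id:                        # stop token -> arbitrary annotation length
            break
        tags.append(tag_id)
        inp = tag_embedding(torch.tensor([tag_id]))  # feed the previous output back in
    return tags
```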

Inspired by machine translation and image captioning:
• encoder input: an image, or a sentence in language A
• decoder output: an image description, or a sentence in language B

Natural Way for Arbitrary Length Outputs
(Figures from Karpathy et al., 2014, and Vinyals et al., 2014)

Page 11: Recurrent Image Annotator


What Else We Need
An order of the tags:
• Both image captioning and machine translation aim to generate sentences, which have a natural order.
• Unfortunately, in the image annotation task, such an order is not available.
• We have to choose or learn an order.

Points for a useful order "rule":
• should be based on semantic image and tag information
• tag sequences in each training example should follow the same rule when sorted
• easy to learn
• good for generation

Page 12: Recurrent Image Annotator


Contributions
1. Analyze the insufficiency of existing methods:
◦ unable to generate an image-dependent number of tags
2. First to formulate image annotation as a sequence generation problem:
◦ propose a novel RNN-based model, the Recurrent Image Annotator
3. Propose and evaluate several orders for sorting the tag inputs:
◦ show the importance of tag order in the tag sequence generation problem

Page 13: Recurrent Image Annotator


Recurrent Image Annotator (RIA)
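The architecture figure on this slide is not in the transcript. As a rough reconstruction of the pipeline described in the talk (CNN image features fed into an LSTM that emits tags until a stop token), here is a minimal PyTorch sketch; the module names, feature dimension, and vocabulary size are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class RIASketch(nn.Module):
    """Minimal sketch: CNN image feature -> LSTM -> tag softmax (sizes are assumptions)."""

    def __init__(self, feat_dim=4096, embed_dim=256, hidden_dim=512, vocab_size=261):
        super().__init__()
        self.project = nn.Linear(feat_dim, embed_dim)     # map the CNN feature into the tag embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)  # tag vocabulary plus a stop token
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, vocab_size)

    def forward(self, cnn_feat, tag_seq):
        """cnn_feat: (B, feat_dim); tag_seq: (B, T) sorted tag ids used for teacher forcing."""
        img = self.project(cnn_feat).unsqueeze(1)   # (B, 1, embed_dim), the first LSTM input
        emb = self.embed(tag_seq)                   # (B, T, embed_dim)
        inputs = torch.cat([img, emb], dim=1)       # image first, then the previous tags
        out, _ = self.lstm(inputs)                  # (B, T+1, hidden_dim)
        return self.classify(out)                   # logits for the next tag at every step
```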

Page 14: Recurrent Image Annotator


4. Submodules of Recurrent Image Annotator

Page 15: Recurrent Image Annotator


Neural Networks

Hidden layer: a linear transformation followed by a nonlinear activation function (e.g., the sigmoid function).
(Figure: a simple fully connected network, from Wikipedia)
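For concreteness, a hidden layer as described above (a linear transformation followed by a nonlinear activation) can be written in a few lines of PyTorch; the sizes are arbitrary.

```python
import torch
import torch.nn as nn

# One hidden layer: h = sigmoid(W x + b), i.e. a linear transformation plus a nonlinearity.
hidden = nn.Sequential(nn.Linear(10, 32), nn.Sigmoid())
x = torch.randn(4, 10)   # a batch of 4 input vectors
h = hidden(x)            # (4, 32) hidden activations
```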

Page 16: Recurrent Image Annotator


Convolutional Neural Networks

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

• local connectivity
• shared weights
• 3D volumes of neurons
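These three properties can be seen in a single convolutional layer; a small PyTorch example with arbitrary sizes:

```python
import torch
import torch.nn as nn

# A 3x3 convolution: each output unit sees only a local 3x3 patch (local connectivity),
# and the same 16 filters are reused at every spatial position (shared weights).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
image = torch.randn(1, 3, 224, 224)   # an RGB image as a 3D volume (channels x height x width)
features = conv(image)                # (1, 16, 224, 224): another 3D volume of neurons
```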

Page 17: Recurrent Image Annotator


Recurrent Neural Networks

http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
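The figure from the linked tutorial is not in the transcript; the recurrence it illustrates (the same weights applied at every time step, with a hidden state carried forward) can be sketched as follows, with arbitrary sizes:

```python
import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)   # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
h = torch.zeros(1, 16)                                # initial hidden state
for x_t in torch.randn(5, 1, 8):                      # a sequence of 5 input vectors
    h = rnn_cell(x_t, h)                              # the same weights are applied at every step
```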

Page 18: Recurrent Image Annotator


Long Short-Term Memory Networks
An improved version of the RNN:
• remembers information for long periods of time
• uses gating units to control information flow across time steps

S. Hochreiter and J. Schmidhuber, 1997

Core idea of LSTM: the cell state, along which it is easy for information to just flow unchanged.
(Figure from http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
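A minimal sketch of the same idea with a PyTorch LSTM cell: the cell state c is carried across time steps, and the gates decide what is written to it or erased from it.

```python
import torch
import torch.nn as nn

lstm_cell = nn.LSTMCell(input_size=8, hidden_size=16)
h, c = torch.zeros(1, 16), torch.zeros(1, 16)   # hidden state and cell state
for x_t in torch.randn(10, 1, 8):               # 10 time steps
    # The gates decide what is written to or erased from the cell state c,
    # which lets information persist across many time steps.
    h, c = lstm_cell(x_t, (h, c))
```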

Page 19: Recurrent Image Annotator


5. Experiments

Page 20: Recurrent Image Annotator


Dataset 1: Corel 5K
Vocabulary size: 260
Number of images: 4,493
Words per image: 3.4 (maximum is 5)
Images per word: 58.6 (maximum is 1004)

Page 21: Recurrent Image Annotator


Dataset 2: ESP Game
Vocabulary size: 269
Number of images: 18,689
Words per image: 4.7 (maximum is 15)
Images per word: 362.7 (maximum is 4553)

Page 22: Recurrent Image Annotator


Dataset 3: IAPR-TC12
Vocabulary size: 291
Number of images: 17,665
Words per image: 5.7 (maximum is 23)
Images per word: 347.7 (maximum is 4999)

Page 23: Recurrent Image Annotator


Evaluation Measures
• precision, P (averaged over classes)
• recall, R (averaged over classes)
• f-measure, F (averaged over classes)
• N+, the number of classes with non-zero recall
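For reference, per-class precision, recall, f-measure, and N+ are typically computed as in the sketch below (a generic illustration, not the authors' evaluation script), given a predicted tag set and a ground-truth tag set for each test image.

```python
from collections import defaultdict

def per_class_metrics(predictions, ground_truths, vocabulary):
    """predictions and ground_truths are lists of tag sets, one per test image."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for pred, gt in zip(predictions, ground_truths):
        for tag in pred:
            (tp if tag in gt else fp)[tag] += 1
        for tag in gt - pred:
            fn[tag] += 1
    precisions, recalls = [], []
    for tag in vocabulary:
        p = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        r = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        precisions.append(p)
        recalls.append(r)
    P = sum(precisions) / len(vocabulary)        # precision averaged over classes
    R = sum(recalls) / len(vocabulary)           # recall averaged over classes
    F = 2 * P * R / (P + R) if P + R else 0.0    # f-measure from the averaged P and R
    n_plus = sum(r > 0 for r in recalls)         # N+: classes with non-zero recall
    return P, R, F, n_plus
```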

Page 24: Recurrent Image Annotator


Different Orders for Tag Sequences
• dictionary order: alphabetical order
• random order: randomly sort the tags in each training example
• frequent-first order: put frequent tags ahead of rare tags
• rare-first order: put rare tags ahead of frequent tags
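A small sketch of how these four orders could be applied to one training example, using tag frequencies counted over the training set (function and variable names are illustrative):

```python
import random
from collections import Counter

def sort_tags(tags, order, freq):
    """Sort one image's tag list according to the chosen order."""
    if order == "dictionary":
        return sorted(tags)                            # alphabetical order
    if order == "random":
        return random.sample(tags, len(tags))          # a fresh random permutation
    if order == "frequent-first":
        return sorted(tags, key=lambda t: -freq[t])    # common tags first
    if order == "rare-first":
        return sorted(tags, key=lambda t: freq[t])     # rare tags first
    raise ValueError(order)

# freq would be a Counter over all tags in the training set, e.g.
# freq = Counter(t for tags in training_tag_lists for t in tags)
```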

Page 25: Recurrent Image Annotator


6. Analysis and Conclusion

Page 26: Recurrent Image Annotator


Arbitrary Length Annotation (1)

Page 27: Recurrent Image Annotator


Arbitrary Length Annotation (2)

Page 28: Recurrent Image Annotator


Arbitrary Length Annotation (3)

Page 29: Recurrent Image Annotator


Comparing the Influence of Different Orders

P: precision; R: recall; F: f-measure; N+: the number of classes with non-zero recall.
Larger values represent better performance.

Page 30: Recurrent Image Annotator


Analysis of Results for Different Orders
Why rare-first outperforms frequent-first:
• "rare" means rare in the dataset; for a single image, a rare tag may carry more importance
• frequent tags are naturally easier to predict than rare tags, so the frequent-first order makes the easy task easier but the difficult task more difficult
• correctly predicting rare tags matters more under the per-class evaluation measures

Page 31: Recurrent Image Annotator


Top-5 Annotation

P: precision; R: recall; F: f-measure; N+: the number of classes with non-zero recall.

Much faster testing speed: constant time (5 ms) per test image, instead of O(N) for KNN-based methods, where N is the number of training images.

Page 32: Recurrent Image Annotator


Conclusion
• transform image annotation into a sequence generation problem
• achieve performance comparable to state-of-the-art methods
• decide the appropriate annotation length automatically
• obtain a much faster testing speed
• confirm the importance of a proper tag sequence order

Page 33: Recurrent Image Annotator


Output of This Work
1. Accepted by the International Conference on Pattern Recognition (ICPR) 2016 (oral)
2. Web demo for RIA: www.nlab.ci.i.u-tokyo.ac.jp/annotator

Page 34: Recurrent Image Annotator


Future Work
Improve the strategy for obtaining the tag sequence order:
• e.g., use reinforcement learning to learn the order automatically

Extend to personal-preference annotation:
• consider eye-catching effects, etc.

Page 35: Recurrent Image Annotator
