

Page 1

Combining text/image in WikipediaMM task 2009

Christophe Moulin, Cécile Barat, Cédric Lemaître, Mathias Géry, Christophe Ducottet, Christine Largeron

Laboratoire Hubert Curien, Saint-Étienne, France

October 1st 2009

Christophe Moulin et al. (LaHC) Combining text/image in WikipediaMM task 2009 October 1st 2009 1 / 16

Page 2

Outline

1 Model overview
    Textual vector space model
    Visual vocabulary
    Combining text and image modalities

2 Experiments

3 Conclusion and future work


Page 3

Model overview

A textual/visual model based on the bag of words approach

[Diagram: documents → indexing (bag of words approach) → combining, with weights α and (1 − α)]


Page 4

Model overview Textual vector space model

Textual vocabulary creation

Main steps of the textual bag of words creation:

stop words filtering → Porter stemming → bag of words creation
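The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: `STOP_WORDS` is a tiny hypothetical stop list, and `crude_stem` is a simplified suffix stripper standing in for the Porter stemmer (it is not Porter's algorithm).

```python
import re
from collections import Counter

# Tiny hypothetical stop list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are", "with"}

# Simplified stand-in for Porter stemming (NOT Porter's algorithm).
# Longer suffixes come first so that "ings" is tried before "ing", etc.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "s", "ed")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text: str) -> Counter:
    """Tokenize, filter stop words, stem, then count the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [token for token in tokens if token not in STOP_WORDS]
    return Counter(crude_stem(token) for token in kept)


print(bag_of_words("The cats are sleeping on the mats"))
```

The resulting counts are the raw term frequencies that the weighting step turns into tf.idf weights.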


Page 5

Model overview Textual vector space model

Textual vector weighting

Salton's tf.idf weighting [1]

[Diagram: bag of words → vector of tf.idf weights [2]]

$w_{i,j} = tf_{i,j} \cdot idf_j$

$tf_{i,j}$: representativeness of term $j$ in document $i$

$idf_j$: discrimination power of term $j$

[1] Salton et al., A vector space model for automatic indexing, 1975
[2] Robertson et al., Okapi at TREC-3, 1994
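A minimal sketch of the weighting $w_{i,j} = tf_{i,j} \cdot idf_j$, assuming raw term counts for $tf_{i,j}$ and $idf_j = \log(N / df_j)$, where $N$ is the number of documents and $df_j$ the number of documents containing term $j$. The exact tf and idf variants used in the runs (the slide also cites Okapi [2]) are not given here, so these choices are placeholders.

```python
import math
from collections import Counter


def idf_weights(docs: list[Counter]) -> dict[str, float]:
    """idf_j = log(N / df_j): an assumed, textbook idf variant."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # each term counted once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(doc: Counter, idf: dict[str, float]) -> dict[str, float]:
    """w_{i,j} = tf_{i,j} * idf_j, with tf the raw count of term j in document i."""
    return {term: tf * idf.get(term, 0.0) for term, tf in doc.items()}


docs = [Counter({"cat": 2, "dog": 1}), Counter({"dog": 1})]
idf = idf_weights(docs)
print(tfidf_vector(docs[0], idf))
```

Here "dog" occurs in every document, so its idf, and therefore its weight, is 0: a term that appears everywhere has no discrimination power.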


Page 6

Model overview Textual vector space model

Exploiting the text around an image

Two sources of text: the image metadata + text extracted from the original Wikipedia articles

[Diagram: metadata of the Wikipedia image, as used in ImageCLEFwiki; original Wikipedia article (n characters around the image)]
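One way to build the enriched text of an image is sketched below. The slide only says "n char around the image", so the sketch assumes n characters on each side of the first occurrence of the image file name in the article text; `text_around_image` and `build_document_text` are hypothetical helpers. The runs in the results table use n = 50 and n = 100 ("50 char", "100 char").

```python
def text_around_image(article: str, image_name: str, n: int) -> str:
    """Up to n characters before and n after the first mention of image_name (assumed rule)."""
    pos = article.find(image_name)
    if pos == -1:
        return ""  # the image is not referenced in this article
    before = article[max(0, pos - n):pos]
    after_start = pos + len(image_name)
    after = article[after_start:after_start + n]
    return before + " " + after  # the space keeps words from the two sides apart


def build_document_text(metadata: str, article: str, image_name: str, n: int) -> str:
    """Concatenate the image metadata with the text extracted from the original article."""
    return (metadata + " " + text_around_image(article, image_name, n)).strip()


print(build_document_text("A cat", "A photo: cat.jpg shows a cat.", "cat.jpg", 9))
```

The concatenated string then goes through the same textual bag of words pipeline as the metadata alone.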


Page 7

Model overview Visual vocabulary

Visual representation

Similar to the text representation, using a visual codebook [3]

[Diagram. Visual vocabulary creation: descriptors → visual vocabulary. Image representation: descriptors → projection onto the visual vocabulary → bag of visual words → vector of tf.idf weights]

[3] Jurie et al., Creating efficient codebooks for visual recognition, 2005
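The projection step can be sketched as follows, assuming the visual vocabulary is already built and that each descriptor is assigned to its single nearest visual word by Euclidean distance (hard assignment). The slides do not state the assignment rule, and the function names are hypothetical.

```python
import math
from collections import Counter

Vector = tuple[float, ...]


def nearest_visual_word(descriptor: Vector, vocabulary: list[Vector]) -> int:
    """Index of the closest visual word (Euclidean distance, assumed hard assignment)."""
    return min(range(len(vocabulary)), key=lambda k: math.dist(descriptor, vocabulary[k]))


def bag_of_visual_words(descriptors: list[Vector], vocabulary: list[Vector]) -> Counter:
    """Project every descriptor of an image onto the vocabulary and count visual words."""
    return Counter(nearest_visual_word(d, vocabulary) for d in descriptors)


vocabulary = [(0.0, 0.0), (10.0, 10.0)]
print(bag_of_visual_words([(1.0, 1.0), (9.0, 8.0), (0.0, 2.0)], vocabulary))
```

The resulting counts play the role of term frequencies, so the image can then be weighted with tf.idf exactly like a textual bag of words.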


Page 8

Model overview Visual vocabulary

Visual features computation

Two different descriptors are used

Regions: regular partitioning (16 × 16 cells); interest regions based on the MSER detector

meanstd (6 dimensions: 9350 visual words)

sift1 (128 dimensions: 9303 visual words)

sift2 (128 dimensions: 9630 visual words)
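The slides describe meanstd only as a 6-dimensional descriptor. One natural reading, assumed here, is the mean and standard deviation of each of three color channels, computed in each cell of the regular partitioning. The sketch also assumes `cell` is the cell side in pixels; "16 × 16 cells" could equally mean a 16 × 16 grid. `meanstd_descriptors` is a hypothetical name.

```python
import statistics

Pixel = tuple[int, int, int]  # (R, G, B)


def meanstd_descriptors(image: list[list[Pixel]], cell: int = 16) -> list[tuple[float, ...]]:
    """One 6-D descriptor per cell: (mean, std) for each of the three color channels.

    ASSUMPTIONS: `image` is a list of pixel rows and `cell` is the cell side in pixels;
    pixels in an incomplete last row or column of cells are ignored.
    """
    height, width = len(image), len(image[0])
    descriptors = []
    for y0 in range(0, height - cell + 1, cell):
        for x0 in range(0, width - cell + 1, cell):
            pixels = [image[y][x] for y in range(y0, y0 + cell) for x in range(x0, x0 + cell)]
            features = []
            for channel in range(3):
                values = [p[channel] for p in pixels]
                features.append(statistics.fmean(values))   # channel mean
                features.append(statistics.pstdev(values))  # channel standard deviation
            descriptors.append(tuple(features))
    return descriptors


checker = [[(0, 0, 0), (2, 2, 2)], [(0, 0, 0), (2, 2, 2)]]
print(meanstd_descriptors(checker, cell=2))
```

These low-dimensional color statistics complement the 128-dimensional SIFT descriptors, which describe local texture and shape.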


Page 9

Model overview Combining text and image modalities

Score matching

Distance computed between query and document vectors

          query    document
score1    tf       tf.idf
score2    tf.idf   tf.idf
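The slide does not name the matching function. The sketch below assumes cosine similarity between sparse term-to-weight vectors and shows the only difference between the two scores: score1 matches the query's raw tf vector against the tf.idf document vector, while score2 first applies idf to the query as well. Function names are hypothetical.

```python
import math

SparseVector = dict[str, float]


def cosine(u: SparseVector, v: SparseVector) -> float:
    """Cosine similarity between two sparse vectors (assumed matching function)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score1(query_tf: SparseVector, doc_tfidf: SparseVector) -> float:
    """score1: raw tf query vector against the tf.idf document vector."""
    return cosine(query_tf, doc_tfidf)


def score2(query_tf: SparseVector, doc_tfidf: SparseVector, idf: SparseVector) -> float:
    """score2: the query is tf.idf weighted too, so common query terms count less."""
    query_tfidf = {term: tf * idf.get(term, 0.0) for term, tf in query_tf.items()}
    return cosine(query_tfidf, doc_tfidf)


query = {"red": 1.0, "car": 1.0}
doc = {"red": 1.0, "car": 1.0}
idf = {"red": 1.0, "car": 0.0}
print(score1(query, doc), score2(query, doc, idf))
```

With score2, a query term that occurs in every document ("car" here, idf 0) no longer contributes to the match.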


Page 10

Model overview Combining text and image modalities

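Linear combination of textual and visual scores

[Diagram: bag of words approach; the textual and visual scores are combined with weights α and (1 − α)]

α is fixed globally (a single value for all queries) and tuned on ImageCLEFwiki 2008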
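A minimal sketch of the fusion and of a global choice of α. The slides do not say which modality α multiplies; the sketch assumes α weights the visual score and 1 − α the textual one. Tuning α globally is modeled as a grid search that keeps the single α maximizing a caller-supplied evaluation function (for example MAP on the ImageCLEFwiki 2008 queries). All three helpers are hypothetical.

```python
from typing import Callable, Iterable


def combine(text_score: float, visual_score: float, alpha: float) -> float:
    """Linear fusion. ASSUMPTION: alpha weights the visual score, 1 - alpha the textual one."""
    return alpha * visual_score + (1 - alpha) * text_score


def rank_documents(scores: dict[str, tuple[float, float]], alpha: float) -> list[str]:
    """Sort document ids by fused score; `scores` maps id -> (text_score, visual_score)."""
    return sorted(scores, key=lambda doc: combine(*scores[doc], alpha), reverse=True)


def tune_alpha(evaluate: Callable[[float], float], grid: Iterable[float]) -> float:
    """Return the single global alpha that maximizes evaluate(alpha) on training queries."""
    return max(grid, key=evaluate)


scores = {"d1": (0.5, 0.0), "d2": (0.2, 1.0)}
print(rank_documents(scores, alpha=0.0), rank_documents(scores, alpha=0.5))
```

Because a single α serves every query, a query whose visual evidence is unreliable receives the same visual weight as one where it helps.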


Page 11

Experiments

Global results

rank  participant/score  text      image              MAP     num ret  num rel ret
1     deuceng            TXT       -                  0.2397  43052    1351
5     lahc/score2        100 char  meanstd (α=0.025)  0.2178  44993    1213
6     lahc/score2        50 char   meanstd (α=0.025)  0.2148  44993    1218
14    lahc/score2        metadata  sift2 (α=0.084)    0.1903  44993    1212
15    lahc/score2        100 char  -                  0.1890  38004    1205
16    lahc/score2        50 char   -                  0.1880  37041    1198
20    lahc/score2        metadata  meanstd (α=0.025)  0.1845  44993    1208
21    lahc/score2        metadata  sift1 (α=0.012)    0.1807  44995    1200
24    lahc/score2        metadata  meanstd (α=0.015)  0.1792  44993    1213
33    lahc/score2        metadata  -                  0.1667  35611    1192
44    lahc/score1        metadata  -                  0.1432  35611    1164
52    lahc/score2        metadata  sift2              0.0365  619      142
53    lahc/score2        metadata  meanstd            0.0338  574      76
54    lahc/score2        metadata  sift1              0.0321  637      120
57    sztaki             -         IMG                0.0068  44993    80

MAP: mean average precision; num ret: number of retrieved images; num rel ret: number of relevant images retrieved
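For reference, the MAP, num ret and num rel ret columns can be computed from ranked result lists and relevance judgments as sketched below. The sketch uses non-interpolated average precision, where relevant images that are never retrieved contribute zero precision, as in trec_eval. The `run` and `qrels` structures are illustrative assumptions.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated AP: sum of precision@k at each relevant rank, over all relevant items."""
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    # Dividing by every relevant item means unretrieved relevant items count as 0.
    return precision_sum / len(relevant) if relevant else 0.0


def evaluate_run(run: dict[str, list[str]], qrels: dict[str, set[str]]) -> tuple[float, int, int]:
    """Return (MAP, num ret, num rel ret); `run` maps query id -> ranked image ids."""
    aps = [average_precision(run.get(q, []), rel) for q, rel in qrels.items()]
    num_ret = sum(len(ranking) for ranking in run.values())
    num_rel_ret = sum(len(set(run.get(q, [])) & rel) for q, rel in qrels.items())
    mean_ap = sum(aps) / len(aps) if aps else 0.0
    return mean_ap, num_ret, num_rel_ret


run = {"q1": ["a", "b", "c"], "q2": ["x"]}
qrels = {"q1": {"a", "c"}, "q2": {"y"}}
print(evaluate_run(run, qrels))
```

This also explains why the visual-only runs (ranks 52 to 54) score so low: with only a few hundred images retrieved, most relevant images are never returned and contribute zero to MAP.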


Page 12

Experiments

Textual results

[Figure: curves for score1 (MAP: 0.1432), score2 (MAP: 0.1667), score2 50 char (MAP: 0.1880) and score2 100 char (MAP: 0.1890); x axis from 0 to 1, y axis from 0 to 0.7]

Improvements provided by additional text (15%)


Page 13

Experiments

Textual+visual results

[Figure: curves for score2 (MAP: 0.1667), score2 sift1 with α=0.012 (MAP: 0.1807), score2 meanstd with α=0.025 (MAP: 0.1845) and score2 sift2 with α=0.084 (MAP: 0.1903); x axis from 0 to 1, y axis from 0 to 0.7]

Descriptor ranking by MAP when combined with text: sift2 > meanstd > sift1


Page 14

Experiments

Best results

[Figure: curves for score2 50 char (MAP: 0.1880), score2 100 char (MAP: 0.1890), score2 50 char + meanstd (MAP: 0.2148) and score2 100 char + meanstd (MAP: 0.2178); x axis from 0 to 1, y axis from 0 to 0.8]

Improvements provided by visual information (15%)
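The relative gains behind the percentages quoted on pages 12 and 14 can be checked directly from the reported MAP values. This gives about 13% for adding 100 characters of article text to the metadata, and about 15% for adding meanstd visual words to that text run.

```python
def relative_gain(baseline_map: float, new_map: float) -> float:
    """Relative MAP improvement of new_map over baseline_map."""
    return (new_map - baseline_map) / baseline_map


# MAP values reported in the results table
print(f"{relative_gain(0.1667, 0.1890):.1%}")  # metadata only -> + 100 char of article text: 13.4%
print(f"{relative_gain(0.1890, 0.2178):.1%}")  # 100 char text -> + meanstd visual words: 15.2%
```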


Page 15

Conclusion and future work

Conclusion

Improvement of our last year's model

It works:

Text around the image in the original Wikipedia articles (+15%)

Addition of visual features (MSER + SIFT) (color/texture complementarity)

Text-image combination (+15%)


Page 16

Conclusion and future work

Future work

Combination with more than one visual descriptor.

Other fusion methods.

Learn α for each query.
